projects

This page mentions various projects with which I am affiliated. Other pages list my past and forthcoming publications and speaking engagements.

activities and affiliations

Check Yes to Release Your Data

http://checkyes.org/

Through a simple form, you can arrange to donate your organs to save a life. In an ideal world, we’d also be able to donate our health data to enhance medical research.

I’m working with Cory Nissen (co-founder of Foodborne Chicago) to help make that ideal a reality. Our project, called Check Yes to Release Your Data, aims to create a framework through which people may donate their health data to research upon their death. Our hope is to accelerate the search for medical breakthroughs by giving health researchers access to additional, real-world health data on which to build their efforts.

Data Science for Social Good

http://dssg.io/

In summer 2013, the Data Science for Social Good fellowship brought thirty-six budding data scientists to Chicago to work with non-profit organizations on real-world data projects. The teams applied data analysis – the very same techniques used in the commercial space – to domains as diverse as health care, disaster relief, education, and public transit.

As a volunteer mentor, I assisted the project teams on issues ranging from partner interactions, data analysis, and software development to career planning. My role also involved external communication, through articles posted on the fellowship website.

Foodborne Chicago

http://foodborne.smartchicagoapps.org/

The Foodborne Chicago service tracks tweets to find people who have come down with food poisoning at a restaurant, and helps them file an issue with the city.

The Foodborne Chicago app has its origins in an informal collaboration among the City of Chicago, Smart Chicago Collaborative, and a group of volunteers (myself included) interested in applying real-world data analysis skills to civic projects. Brett Goldstein (who was the city’s Chief Data Officer and CIO at the time) and I explored this collaboration in our Strata Conference talk, “Text-Mining Your City”.

software projects

I’ve had a hand in writing the following software tools:

forqlift: easy, command-line access to Hadoop SequenceFile archives

charcuterie: a collection of Pig UDFs, for crunching text

novi: build custom yum/Kickstart repos

segue: parallel R in the cloud! (hosted off-site)

factualR: bring Factual.com datasets right into R