BLOG@CACM
Computing Profession

The Frontier of Small Data

Valerie Barr, Professor of Computer Science, Union College

I recently had the opportunity to attend a talk by Deborah Estrin entitled "small, N = me, data". Her title is a play on our usual way of referring to problem size, in this case making the "n" of the problem size just a single person. Certainly Estrin is not turning her back on big data and all that can be learned from it. But she is very interested in how a single person’s data can be used to understand their situation. For example, in the medical realm, the Open mHealth effort is developing an architecture that can integrate data from an individual’s use of specific apps in order to help a health care provider make recommendations.

Particularly interesting is the small data effort that Estrin has undertaken at Cornell NYC Tech. Her definition of small data is "your row of their data". What observations would surface if you could analyze, in some combined way, your mobile usage, cable usage, utility usage, ecommerce activities, search activities, social media and email usage, automobile usage gathered from smart car data, and use of games, music, and video? What changes in the health and wellbeing of an aging parent might surface through analysis of their aggregated data? Would one be better able to compare the efficacy of different courses of medical treatment by looking at aggregated data? For example, data traces showing how far you walked and how early in the morning you left the house could indicate the relative effectiveness of an arthritis medication.
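To make the arthritis example concrete, here is a minimal Python sketch of how one person's daily traces might be summarized per treatment. It is not from Estrin's talk: the `DayTrace` record, its field names, the treatment labels, and the sample values are all hypothetical, and a real comparison would need far more data and proper statistical care.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class DayTrace:
    """One day of one person's data (hypothetical record layout)."""
    treatment: str         # label for the medication in use that day
    km_walked: float       # distance walked, e.g. from a phone's step data
    left_home_minute: int  # first departure, in minutes after midnight


def summarize(traces):
    """Group daily traces by treatment and average each signal."""
    by_treatment = {}
    for trace in traces:
        by_treatment.setdefault(trace.treatment, []).append(trace)
    return {
        name: {
            "mean_km": mean(d.km_walked for d in days),
            "mean_departure_min": mean(d.left_home_minute for d in days),
        }
        for name, days in by_treatment.items()
    }


# Made-up sample values, for illustration only.
traces = [
    DayTrace("A", 1.0, 600),
    DayTrace("A", 2.0, 620),
    DayTrace("B", 3.0, 540),
    DayTrace("B", 4.0, 500),
]
summary = summarize(traces)
```

In this invented sample, the days on treatment "B" show more walking and earlier departures than the days on treatment "A", which is the kind of signal the talk suggests a provider might find useful.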

Estrin headed off possible issues with the comment that "where there’s a privacy concern there’s an opportunity."  Her goal is to develop an "ecosystem of applications" that an individual can run over her or his own set of data streams, a collection she referred to as our "personal data vault." Her hope is that eventually we will be able to subscribe to our own individual data traces. Personal data APIs will allow for development of real-time personal data apps.
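One way to picture a vault that apps subscribe to is a small publish/subscribe store owned by the individual. The Python sketch below is purely illustrative: the `PersonalDataVault` class, its method names, and the stream names are assumptions of mine, not an API described in the talk, and it leaves out the security model Estrin names as a challenge.

```python
from collections import defaultdict


class PersonalDataVault:
    """Toy in-memory vault: stores records per stream and notifies apps."""

    def __init__(self):
        self._records = defaultdict(list)      # stream name -> stored records
        self._subscribers = defaultdict(list)  # stream name -> app callbacks

    def subscribe(self, stream, callback):
        """Register an app callback for new records on one stream."""
        self._subscribers[stream].append(callback)

    def publish(self, stream, record):
        """Store a record, then pass it to every app subscribed to the stream."""
        self._records[stream].append(record)
        for callback in self._subscribers[stream]:
            callback(record)

    def history(self, stream):
        """Return a copy of everything stored so far for one stream."""
        return list(self._records[stream])


vault = PersonalDataVault()
seen_by_app = []
vault.subscribe("steps", seen_by_app.append)  # a hypothetical walking app

vault.publish("steps", {"count": 4200})
vault.publish("email", {"sent": 12})  # no subscriber, so it is only stored
```

Here the app receives only the stream it subscribed to, while the vault keeps the full history of both streams for its owner.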

Estrin listed several key challenges:

  1. getting the data
  2. processing and making sense of noisy, diverse data
  3. secure models for the personal data vaults
  4. a testbed for app prototypes

You can read Estrin’s 2013 TEDMED talk online to get more information. She closed the talk I attended with a reminder that Cornell NYC Tech is actively recruiting graduate students, so pass along the link to their Admissions page! I know that I definitely have students who will be very interested in the possibility of working on this small data project.
