Technical Perspective: Programming With Differential Privacy

Posted Sep 1 2010

Government agencies worldwide are required to release statistical information about population, education, health, crime, and economic activities. In the U.S., protecting this data goes back to the 19th century, when Carrol Wright, the first head of the Bureau of Labor Statistics (established in 1885), argued that protecting the confidentiality of the Bureau’s data was necessary: if enterprises feared that the data the Bureau collected about them would be shared with competitors, investigators, or the tax authorities, data quality would suffer severely. The field of statistical disclosure limitation was born.⁴

Fast-forward a few decades, and Stanley Warner faced a similar conundrum. During interviews for market surveys, individuals would refuse to answer questions on sensitive or controversial issues "for reasons of modesty, fear of being thought bigoted, or merely a reluctance to confide secrets to strangers."⁷ His answer was a technique in which the interviewee flips a biased coin without showing the outcome to the interviewer. Depending on the outcome, the interviewee truthfully answers either the original yes/no question or its negation. This method intuitively protects the interviewee, since any given answer could have resulted from the coin landing on the other side.
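The coin-flip scheme can be sketched in a few lines of Python. This is a minimal illustration rather than Warner's own formulation: the function names, the coin bias p = 0.75, and the simulated population are all assumptions made for the example. The estimator relies on the fact that the expected fraction of "yes" reports is p·π + (1 − p)·(1 − π), where π is the true fraction of "yes" answers, which can be solved for π whenever p ≠ 1/2.

```python
import random


def randomized_response(truth: bool, p: float, rng: random.Random) -> bool:
    """Report the true answer with probability p, its negation otherwise."""
    return truth if rng.random() < p else not truth


def estimate_proportion(reports: list[bool], p: float) -> float:
    """Estimate the true 'yes' fraction from the randomized reports."""
    if p == 0.5:
        raise ValueError("a fair coin (p = 0.5) leaves nothing to estimate")
    observed_yes = sum(reports) / len(reports)
    # Invert observed = p * pi + (1 - p) * (1 - pi) to recover pi.
    return (observed_yes - (1 - p)) / (2 * p - 1)


# Simulated survey (made-up population in which about 30% would say "yes").
rng = random.Random(0)
true_answers = [rng.random() < 0.3 for _ in range(100_000)]
reports = [randomized_response(t, 0.75, rng) for t in true_answers]
estimate = estimate_proportion(reports, 0.75)
```

The interviewer sees only `reports`, in which any single answer may have been negated, yet the aggregate estimate stays close to the true proportion. A coin bias nearer to 1/2 protects each respondent more but makes the estimate noisier.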

Tore Dalenius formulated a very strong notion of protection a decade later:² "If the release of the statistic S makes it possible to determine the (microdata) value more accurately than without access to S, a disclosure has taken place…". This very strong notion of semantic security implies that data publishers should think about adversaries and their knowledge since the published data could give new information to an adversary.

Fast-forward a few more decades to the turn of the century. Statisticians have developed many different methods for limiting disclosure when publishing data, such as suppression, sampling, swapping, generalization (also called coarsening), synthetic data generation, data perturbation, and the publishing of marginals for contingency tables, just to name a few. These methods are applied in practice, but they do not provide formal privacy guarantees: they do not formally state how much an attacker can learn, and they preserve confidentiality by hiding the parameters used.

Fast-forward to 1999. In his Innovations Award Talk at the annual ACM SIGKDD Conference, Rakesh Agrawal posed the challenge of privacy-preserving data mining to the community. The following year, two papers with the same title, "Privacy Preserving Data Mining" (one by Agrawal and Srikant¹ and the other by Lindell and Pinkas⁵), were published, and the computer science community had entered the picture.

Computer scientists were especially intrigued by formal models of data privacy, that is, formal definitions of information leakage and attacker models of the kind pioneered and used in cryptography and computer security. The strongest formal definition of disclosure in use today is differential privacy, as pioneered by Dwork, McSherry, Nissim, and Smith.³ Differential privacy beautifully captures the intuitive notion that the published data should reveal little about an individual, whether or not that individual’s record was part of the input data.
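To make the intuition concrete: in its usual statement, a randomized mechanism M is ε-differentially private if, for any two data sets D and D′ that differ in one individual's record and any set S of possible outputs, Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D′) ∈ S]. The sketch below shows one standard way to meet this bound for a counting query, by adding Laplace noise with scale 1/ε. The function names and the sample data are assumptions made for illustration, not code from any particular system.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Return the number of matching records plus Laplace noise.

    Adding or removing one record changes a count by at most 1, so the
    query has sensitivity 1 and the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Made-up data: how many people are 40 or older?
rng = random.Random(42)
ages = [23, 35, 41, 29, 62, 57, 38, 44]
answer = noisy_count(ages, lambda age: age >= 40, epsilon=0.5, rng=rng)
```

Smaller values of ε add more noise and therefore give stronger protection at the cost of accuracy; the published `answer` is a perturbed version of the true count of 4.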

Since differential privacy was first proposed, much progress has been made in developing mechanisms that protect published data with differential privacy while maximizing information content. The national statistical offices have also started to pay attention; for example, OnTheMap, a U.S. Census Bureau application that provides maps showing where workers live and are employed, has now been published with a variant of differential privacy.⁶

The following paper by Frank McSherry introduces a system called PINQ that integrates differential privacy into the C# LINQ framework, which adds database query functionality to C#. PINQ enables queries over data while elegantly hiding the complexity of the underlying differential privacy mechanisms. Users of PINQ write programs that look almost identical to standard LINQ programs, but PINQ ensures that all query answers adhere to differential privacy, and it accumulates the information leakage of successive queries until the program's privacy budget is exhausted.
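The idea of a privacy budget can be illustrated with a small Python sketch. It is not PINQ's API (PINQ is a C# library, and none of the names below come from it); `BudgetedDataset` and `BudgetExhausted` are hypothetical names invented here. The sketch relies on sequential composition: answering an ε₁-differentially private query and an ε₂-differentially private query on the same data is (ε₁ + ε₂)-differentially private, so each query's ε can simply be subtracted from a running total.

```python
import random


class BudgetExhausted(Exception):
    """Raised when a query would exceed the remaining privacy budget."""


class BudgetedDataset:
    """Hypothetical wrapper that charges every noisy count to a budget."""

    def __init__(self, records, total_epsilon: float, seed=None):
        self._records = list(records)
        self.remaining = total_epsilon
        self._rng = random.Random(seed)

    def noisy_count(self, predicate, epsilon: float) -> float:
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self.remaining:
            raise BudgetExhausted("not enough privacy budget left")
        # Sequential composition: the costs of successive queries add up.
        self.remaining -= epsilon
        scale = 1.0 / epsilon  # a count has sensitivity 1
        noise = (self._rng.expovariate(1 / scale)
                 - self._rng.expovariate(1 / scale))  # Laplace(0, scale)
        return sum(1 for r in self._records if predicate(r)) + noise


# Made-up data and budget: two queries fit, the third is refused.
data = BudgetedDataset([23, 35, 41, 29, 62], total_epsilon=1.0, seed=7)
first = data.noisy_count(lambda age: age >= 40, epsilon=0.5)
second = data.noisy_count(lambda age: age < 30, epsilon=0.5)
try:
    data.noisy_count(lambda age: True, epsilon=0.5)
    refused = False
except BudgetExhausted:
    refused = True
```

The analyst never touches the raw records directly; every answer passes through the wrapper, which is what lets the total leakage be bounded no matter which queries are asked.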

Differential privacy and PINQ give only a glimpse into an exciting new area at the confluence of ideas from computer science, statistics, law, and social sciences. I believe we will see much further progress on formal privacy definitions and improved methods, and I hope that future data products from the national statistical offices will be published with some formal notion of disclosure control.

Carrol Wright would be amazed by the field today.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

DOI

10.1145/1810891.1810915

September 2010 Issue

Published: September 1, 2010

Vol. 53 No. 9

Page: 88
