Is Genomic Privacy Possible?

The darker aspect of genomics is privacy risk. The growing ability to peer into the human genome is nothing short of revolutionary, but lost or stolen genome data cannot be changed.

The growing ability to peer into the human genome to discern everything from ancestry to disease is nothing short of revolutionary. Advances in the field have fueled enormous breakthroughs in biology, bioengineering, and medicine.

However, there is a darker side to genomics: privacy risk. Unlike a credit card number or bank account, DNA is immutable. Lost or stolen genome data—which provides clues about everything from life expectancy to the likelihood of suffering from depression—cannot be changed.

Stolen DNA data is not an entirely abstract concept. In June 2018, DNA testing service MyHeritage revealed that hackers had breached more than 92 million user accounts. This data could conceivably be used (perhaps misused is a better term) to make decisions about insurance, medical treatments, the viability of long-term loans, and what amount individuals should pay for healthcare. Genome data could also be used by hackers to extract ransoms from organizations or individuals.

The repercussions are huge. As the popularity of businesses like MyHeritage, Ancestry.com, and 23andMe explodes, and as academic and commercial researchers increasingly tap genomic data, the privacy risks grow. Researchers must find ways to ensure anonymity and better secure data.

"If people fear that their private genome data will be breached, they're not as likely to participate in medical research and share their information," says Timothy Caulfield, Canada Research Chair in Health Law and Policy at the University of Alberta.

Beyond Genes

The widespread adoption of genome sequencing and analysis tools has been nothing short of remarkable. Global Market Insights projects the industry will reach $45 billion by 2024. Although high-profile firms that offer DNA sampling kits designed to reveal a person's ethnic heritage grab headlines, academic researchers as well as pharmaceutical, medical, and biomedical firms increasingly use genomic research to develop new procedures and new medicines. Bonnie Berger, a professor of mathematics, electrical engineering, and computer science at the Massachusetts Institute of Technology (MIT), points out that the number of human genomes collected is expected to swell from 500,000 in 2020 to more than 10 million by 2025.

"The increasing volume of genomic data that's available is great news because we can study it to better understand conditions things like cancer, diabetes, and heart disease," Berger says. However, "If people are uncertain or fearful about the privacy of their data they may not share it." What's more, genetic privacy risks are not limited to lost and stolen data. As data science advances, it is possible to infer identity with only a few pieces of information. In 2013, Harvard University professor George Church re-identified the names of 40% of an anonymous sample in a DNA study using data science techniques.

Conventional security techniques focus on locking down databases. However, Claire Marblestone, senior counsel at the law firm of Foley & Lardner in Los Angeles and a specialist in healthcare informatics and security, says standard security measures are no longer adequate and cannot scale to the level that genomics research requires. "We are struggling to keep up with the technology. As we become a more digitized society, we have to find better ways to protect this data."

Engineering Better Privacy

Berger and a team of researchers at MIT and Stanford University believe they have found a way to securely gather genomic data for large-scale biomedical research projects. The group has devised a computational protocol based on "secret sharing," which divides sensitive data and stores it across multiple parties. This distributed computational and cryptographic model is designed to protect millions of genomes and to allow researchers to securely perform analyses without anyone gaining access to the raw data. It breaks through the computational barriers that previously limited secret sharing and makes crowdsourced genomics research viable.

In basic secret sharing, a study participant with, say, genotype x may send a random number, say r, to one party, or server, and the difference, x - r, to another. Neither party is independently able to infer the private genotype x. Collectively, however, they can still perform the desired operations. If one party stores a group of r's and adds them together, and the other party adds up all the corresponding (x - r)'s, then adding up the two results yields the sum of all the x's. What makes the system so powerful is that no party in the group ever observes the value of any one x.
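The two-party summation just described can be sketched in a few lines of Python. This is a minimal illustration of basic additive secret sharing, not the MIT/Stanford team's protocol. The names `split` and `secure_sum`, the public modulus, and the sample genotype values are all assumptions made for the example. Working modulo a fixed number is a common way to keep each share uniformly random; the article's description uses plain subtraction.

```python
import secrets

# Assumption: a public modulus shared by all parties. Doing arithmetic
# modulo this value keeps each individual share uniformly random.
MODULUS = 2**61 - 1


def split(x, modulus=MODULUS):
    """Split a private value x into two additive shares: r and x - r."""
    r = secrets.randbelow(modulus)   # random mask, sent to party A
    return r, (x - r) % modulus      # masked difference, sent to party B


def secure_sum(values, modulus=MODULUS):
    """Sum private values without either party seeing any single value."""
    shares = [split(x, modulus) for x in values]
    total_a = sum(r for r, _ in shares) % modulus   # party A adds its r's
    total_b = sum(d for _, d in shares) % modulus   # party B adds its (x - r)'s
    # Only the two partial totals are combined; the masks cancel out.
    return (total_a + total_b) % modulus


# Hypothetical genotype values (for example, allele counts of 0, 1, or 2).
genotypes = [0, 1, 2, 1, 0, 2, 2]
print(secure_sum(genotypes) == sum(genotypes))
```

In this sketch each party sees only random-looking numbers, yet the combined totals equal the true sum as long as that sum is smaller than the modulus.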

Others are exploring the use of blockchain technology to anonymize and protect individual genome data, while making it available for scientific studies. Blockchain could allow individuals to provide direct consent for the use of their data, and allow large genomic datasets to be stored in a distributed and more secure manner. However, such a framework would still require individuals to trust researchers with their raw data. Hoon Cho, a doctoral student at MIT who worked with Berger to develop the genome crowdsourcing system, says their system removes this requirement. "There may be ways to combine the two technologies into a single framework," he says.

Concludes Caulfield, "Genomic privacy is a core concern for the public and researchers. It's a problem that must be addressed if we are going to achieve major improvements in healthcare. It's essential that people feel entirely secure about their data."

Samuel Greengard is an author and journalist based in West Linn, OR, USA.