When George Church, a geneticist at Harvard Medical School, decided to produce 70 billion copies of his book, Regenesis: How Synthetic Biology Will Reinvent Nature and Ourselves, he skipped printing presses, Kindles, and hard drives. The professor of genetics instead turned to a most unlikely medium: DNA, the same long molecule that serves as the building block for life on Earth. "It has worked remarkably well as a storage medium for 3.5 billion years," he says.
Welcome to the emerging world of data storage. While hard drive and solid-state drive manufacturers are attempting to increase storage densities and push the limits on speed and performance, a handful of researchers around the world are hard at work on the next generation of systems and devices that would upend standard thinking about storage. Some, like Church and the European Bioinformatics Institute (EBI), are focusing on DNA. Others, including a research group at the Massachusetts Institute of Technology (MIT), are examining molecular storage methods.
Both approaches have begun to take shape over the last few years, although the feasibility of DNA storage was first demonstrated in 1988. Over the next decade, new approaches to data storage could transform the way organizations, and society, manage and store huge volumes of data.
For perspective, all the data humans produce in a year could fit into about four grams of DNA. "There is an opportunity to create storage systems that are a million to a billion times more compact than existing technology and provide a level of longevity that is unheard of today," Church points out.
The DNA of Storage
The need for more efficient data storage methods is rooted in today's radically changing world. According to IBM, humans collectively produce about 2.5 exabytes of data each day; market research firm IDC says roughly three zettabytes of data exist in the digital world. Remarkably, 90% of the data in the world has been created over the last two years alone, say researchers at IBM. All this data requires increasingly large data centers and storage networks. It also presents challenges as storage devices and media change and data technologies become obsolete and prone to failure.
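As a rough consistency check, the sketch below combines IBM's estimate of 2.5 exabytes per day with the earlier claim that a year of data could fit in about four grams of DNA. The function name and the 365-day year are assumptions made for illustration, and the result is only the density implied by those two quoted figures, not a measured value.

```python
EXABYTES_PER_DAY = 2.5   # IBM estimate quoted in the article
GRAMS_OF_DNA = 4.0       # "about four grams" for one year of data
DAYS_PER_YEAR = 365      # assumption: a non-leap year


def implied_density_eb_per_gram(eb_per_day=EXABYTES_PER_DAY,
                                grams=GRAMS_OF_DNA,
                                days=DAYS_PER_YEAR):
    """Return the exabytes-per-gram density implied by the quoted figures."""
    yearly_eb = eb_per_day * days
    return yearly_eb / grams


yearly_total = EXABYTES_PER_DAY * DAYS_PER_YEAR
density = implied_density_eb_per_gram()
print(f"Data produced per year: {yearly_total:.1f} EB")
print(f"Implied density: {density:.0f} EB per gram of DNA")
```

Under these assumptions, a year of data comes to roughly 900 exabytes, which implies a density on the order of a couple of hundred exabytes per gram.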
Researchers hope to significantly alter the equation. Church and fellow researchers, including Sri Kosuri, a senior scientist at the Wyss Institute, and Yuan Gao, an associate professor of biomedical engineering at Johns Hopkins University, are forging into new territory with DNA storage research. They used sophisticated sequencing techniques to encode Church's book in 96-bit blocks, each containing a 19-bit address to assist with the reassembly process.
The data was built from code based on the four constituents of DNA: adenine (A), guanine (G), cytosine (C), and thymine (T), and converted to binary code. The non-living DNA contained 54,898 data blocks, each stored on an individual strand of DNA. The team then sent the data to Agilent Technologies, which used a 3D printer to attach the data to the DNA strands and build a physical storage device. Then the team accurately decoded the text and read it back. Remarkably, a billion copies of the book easily fit into the moisture on the bottom of a glass or small tube.
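The addressing idea described above can be sketched in a few lines of code. This is a minimal illustration, not the team's actual encoding: the two-bits-per-base mapping, the layout of a 19-bit address followed by 96 data bits, the single padding bit, and all function names are assumptions chosen to keep the example simple.

```python
import random

ADDRESS_BITS = 19   # address width quoted in the article
PAYLOAD_BITS = 96   # assumption: 96 data bits per block, stored after the address

# Assumption: a simple two-bits-per-base mapping, for illustration only.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data):
    """Split bytes into addressed blocks and map each block to a base string."""
    bits = "".join(f"{byte:08b}" for byte in data)
    strands = []
    for index, start in enumerate(range(0, len(bits), PAYLOAD_BITS)):
        if index >= 2 ** ADDRESS_BITS:
            raise ValueError("too many blocks for a 19-bit address")
        payload = bits[start:start + PAYLOAD_BITS].ljust(PAYLOAD_BITS, "0")
        # 19 + 96 = 115 bits is odd, so add one padding bit to form base pairs.
        block = f"{index:0{ADDRESS_BITS}b}" + payload + "0"
        strands.append("".join(BITS_TO_BASE[block[i:i + 2]]
                               for i in range(0, len(block), 2)))
    return strands


def decode(strands, length):
    """Reassemble the original bytes from strands given in any order."""
    blocks = {}
    for strand in strands:
        bits = "".join(BASE_TO_BITS[base] for base in strand)
        address = int(bits[:ADDRESS_BITS], 2)
        blocks[address] = bits[ADDRESS_BITS:ADDRESS_BITS + PAYLOAD_BITS]
    bits = "".join(blocks[address] for address in sorted(blocks))
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data[:length]  # drop the zero padding in the final block


message = b"Regenesis: How Synthetic Biology Will Reinvent Nature and Ourselves"
strands = encode(message)
random.shuffle(strands)  # strands in a tube have no inherent order
recovered = decode(strands, len(message))
print(recovered == message)
```

Because every strand carries its own address, the decoder can sort the blocks back into sequence no matter what order they are read in, which is the role the article attributes to the 19-bit address.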
Church says the DNA storage method is ideal for archival copies of huge datasets. The challenge is speeding the DNA fabrication process, possibly by turning to optical technology that writes with light enzymatically. For now, the writing process is expensive and slow, but the situation could flip as researchers invent better writing technologies. "We could see million-fold improvements in writing technology," he says. "Unlike Moore's Law, which results in improvements at a factor of about 1.5 per year, DNA sequencing is advancing at a 10-fold increase per year. This could translate into commercially viable technology within five years."
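To put the two growth rates Church quotes side by side, the following sketch compounds each one over time. It takes the quoted factors (1.5 per year and 10 per year) at face value and assumes they hold steady, which is a simplification; the function names are hypothetical.

```python
import math


def growth_after(years, yearly_factor):
    """Cumulative improvement after compounding a yearly factor."""
    return yearly_factor ** years


def years_to_reach(target_factor, yearly_factor):
    """Years of steady compounding needed to reach a target improvement."""
    return math.log(target_factor) / math.log(yearly_factor)


moore_5 = growth_after(5, 1.5)          # Moore's Law rate quoted by Church
dna_5 = growth_after(5, 10)             # DNA sequencing rate quoted by Church
years_dna = years_to_reach(1e6, 10)     # time to a million-fold gain at 10x/year
years_moore = years_to_reach(1e6, 1.5)  # same target at 1.5x/year

print(f"Five years at 1.5x per year: {moore_5:.1f}-fold")
print(f"Five years at 10x per year: {dna_5:,.0f}-fold")
print(f"Years to a million-fold gain at 10x per year: {years_dna:.1f}")
print(f"Years to a million-fold gain at 1.5x per year: {years_moore:.1f}")
```

On these assumptions, five years of compounding yields under an eight-fold gain at the Moore's Law rate but a 100,000-fold gain at the DNA rate, and the million-fold improvement Church mentions would take about six years rather than more than three decades.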
These writing and reading systems could attach to a computer using a USB or similar port. Although this storage technology probably would not be practical for everyday use, at least in the foreseeable future, it creates a media format that can last over the long term while eliminating the need to change media every few years as new devices and technologies supplant previous generations (such as when tape transitioned to CDs, then DVDs, and later to digital file formats). "This eliminates backward compatibility issues related to new generations of technology," Church explains. What's more, "It is possible to store the data for half a million years without electricity." Indeed, the technology could, in some cases, eliminate vast storage networks and provide significant environmental benefits.
Another group examining DNA storage is the European Bioinformatics Institute (EBI) in Hinxton, England. In January 2013, scientists there reported they had encoded DNA with a 26-second audio clip of Martin Luther King's "I Have a Dream" speech, a photograph, an academic paper, and 154 sonnets from Shakespeare. The DNA was dried onto glass sheets or vials. Researchers were able to retrieve the data with a 99.99% accuracy rate. Since then, they have corrected a "biological glitch" and they can now achieve near-100% accuracy.
In addition to bringing down the current cost of writing data to DNA, about €12,500 ($16,365) per megabyte, there is the challenge of building a system or technology that manages the data over long time spans. "One of the keys to making DNA storage work is establishing appropriate metadata and indexing systems," says Nick Goldman, group leader for EBI. "We need a Rosetta Stone equivalent that can span hundreds or thousands of years and make sure all the data is directed to the right file, device, or system as it is needed."
Other scientists also are examining possibilities related to DNA storage. For example, a research group at Stanford University has experimented with using living DNA cells in E. coli bacteria to store digital code. This approach could aid in studying cancer, aging, and organismal development, the group reports, although the approach is not particularly efficient or desirable for holding massive volumes of data. Church notes that if the living cells do not find an evolutionary advantage to the data, they will begin mutating it, and at some point, they will destroy it.
Molecular Data Storage
Next-generation storage techniques are advancing in other ways. At MIT, a group of researchers is diving into the realm of molecular storage. The group has found a way to create a new type of supramolecule from molecules specially assembled by the Indian Institute of Science Education and Research in Kolkata. This supramolecule binds two different types of atoms: fragments of graphene, comprised of thin sheets of carbon atoms, with zinc atoms. When these atoms are placed on a magnetic surface, the resulting magnetized supramolecule is about one nanometer in size and able to store data at a density of 1,000 terabytes per square inch (compared to a maximum capacity of less than one terabyte of data per square inch in current hard drive technology).
The experimental technology works in a somewhat different way than standard magnetic drives. Researchers placed a thin film of the molecular material they developed on a ferromagnetic electrode, and added a second ferromagnetic electrode on top. When a relative change in one electrode's magnetic orientation occurs, there is a sudden increase or decrease in the system's conductivity. These two states represent the 1s and 0s of binary code.
However, the MIT researchers observed two jumps in conductivity, even when the supramolecule had only one associated ferromagnetic electrode, rather than the pair. "This occurrence came as a complete surprise," says Jagadeesh S. Moodera, a senior research scientist in the MIT Department of Physics. The ability to alter the conductivity of the molecules with only one ferromagnetic electrode could drastically simplify the manufacture of molecular memory.
A 1,000-times increase in storage density could redefine everything from data centers to personal devices. "The problem with today's physical storage devices is that we are approaching their physical limits," says Karthik V. Raman, a research scientist at IBM India and part of the MIT team that invented the molecular storage technology. "Molecular storage could offer far better performance in terms of data retention, densities, and power use. It could result in much more powerful and smaller devices. A device the size of an iPhone could have a staggering amount of storage capacity."
Moodera says there is still considerable work to be done on the concept. While scientists have demonstrated the technology works, they eventually hope to show two stable and nonvolatile states for the molecules. In addition, the technology currently operates at a temperature of 9 degrees Fahrenheit, so-called "room temperature" in physics. Researchers will have to find a way to build the storage structure at higher temperatures to make it commercially viable.
However, the challenges do not end there. The researchers must also find a way to boost conductivity differences from the current 20% range to perhaps 50% or more. Getting to this point could take a decade or more, and will require both material innovation and fabrication advances. "We need to investigate further so we can achieve a deeper understanding at the molecular level," Moodera explains.
Storage: The Next Generation
In the end, it is not so much a question of if next-generation storage technologies will go mainstream, but when. As we continue to amass increasing stores of data, the need for new storage technologies becomes increasingly clear. Molecular and DNA storage could become the vehicles of choice, or something new could appear. Either way, "New generations of vastly more efficient storage systems could fundamentally change the way we approach data, manage complex tasks, and approach computing," Raman explains.
For now, researchers are looking to fill in the gaps in order to produce commercially viable systems. They are tapping expertise in every discipline from biology and quantum physics to software development to assemble all the pieces and build the storage medium of the future. Says MIT's Moodera: "The goal is to explore different molecules, different configurations, and different ways of applying computing technology. Although these future storage mediums are remarkably complex, we are on the doorstep of developing remarkable systems that will redefine the way we manage, store, and use data."
Raman, K.V., Kamerbeek, A.M., Mukherjee, A., Atodiresei, N., Sen, T.K., Lazić, P., Caciuc, V., Michel, R., Stalke, D., Mandal, S.K., Blügel, S., Münzenberg, M., Moodera, J.S.,
Interface-engineered templates for molecular spin memory devices, Nature, 493, 509–513, January 2013. http://www.nature.com/nature/journal/v493/n7433/full/nature11719.html
Goldman, N., Bertone, P., Chen, S., Dessimoz, C., LeProust, E.M., Sipos, B., Birney, E.,
Towards practical, high-capacity, low-maintenance information storage in synthesized DNA, Nature, January 2013. http://www.nature.com/nature/journal/vaop/ncurrent/full/nature11875.html
O'Driscoll, A., Sleator, R.D.,
Synthetic DNA: The Next Generation of Big Data Storage, Bioengineered, May/June 2013. http://www.landesbioscience.com/journals/bioe/2013BIOE-NV-43.pdf.
Church, G.M., Gao, Y., Kosuri, S.,
Next-Generation Digital Information Storage in DNA, Science, Sept. 2012, Vol. 337, no. 6102, p. 1628. http://www.sciencemag.org/content/337/6102/1628.abstract
©2013 ACM 0001-0782/13/08