A New Approach to Information Storage

Disk drives and solid-state drives have long served as the foundation for computer storage, but breakthroughs in molecular and DNA science could revolutionize the field.
Harvard geneticist George Church shows the amount of space needed to store 20 million copies of his book within DNA.

When George Church, a geneticist at Harvard Medical School, decided to produce 70 billion copies of his book, Regenesis: How Synthetic Biology Will Reinvent Nature and Ourselves, he skipped printing presses, Kindles, and hard drives. The professor of genetics instead turned to a most unlikely medium: DNA, the same long molecule that serves as the building block for life on Earth. “It has worked remarkably well as a storage medium for 3.5 billion years,” he says.

In Church’s case, a team of researchers used sequencing technology to format his 54,000-word book (with words, images, and a JavaScript program, it came down to 5.27 megabits, or 658.75 kilobytes) at a density of 5.5 petabits per cubic millimeter. While 70 billion physical copies of his book would fill nearly 3,500 New York City Public Libraries (including all branches), and a digital version would require roughly 46 petabytes, or about 46,000 1TB drives, all those copies of Church’s book fit on a piece of DNA no larger than a speck of dust. What’s more, the copies will last hundreds of thousands of years, perhaps even a million years, and do not require any special handling or temperature conditions.
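The size figures above can be checked with a few lines of arithmetic. This is a back-of-the-envelope sketch, assuming decimal units (1 kilobyte = 1,000 bytes, 1TB = 10^12 bytes); the variable names are illustrative only.

```python
# Back-of-the-envelope check of the book-size figures quoted in the article.
# Assumption: decimal (SI) units throughout.

BITS_PER_BOOK = 5_270_000        # 5.27 megabits, from the article
COPIES = 70_000_000_000          # 70 billion copies, from the article

bytes_per_book = BITS_PER_BOOK / 8
kilobytes_per_book = bytes_per_book / 1e3

total_bytes = bytes_per_book * COPIES
total_petabytes = total_bytes / 1e15
one_tb_drives = total_bytes / 1e12

print(f"One copy:   {kilobytes_per_book:.2f} kilobytes")
print(f"All copies: {total_petabytes:.1f} petabytes")
print(f"1TB drives: {one_tb_drives:,.0f}")
```

Under these assumptions, one copy works out to 658.75 kilobytes, and the full set of copies to roughly 46 petabytes.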

Welcome to the emerging world of data storage. While hard drive and solid-state drive manufacturers are attempting to increase storage densities and push the limits on speed and performance, a handful of researchers around the world are hard at work on the next generation of systems and devices that would upend standard thinking about storage. Some, like Church and the European Bioinformatics Institute (EBI), are focusing on DNA. Others, including a research group at the Massachusetts Institute of Technology (MIT), are examining molecular storage methods.

Both approaches have begun to take shape over the last few years, although the feasibility of DNA storage was first demonstrated in 1988. Over the next decade, new approaches to data storage could transform the way organizations, and society, manage and store huge volumes of data.

For perspective, all the data humans produce in a year could fit into about four grams of DNA. “There is an opportunity to create storage systems that are a million to a billion times more compact than existing technology and provide a level of longevity that is unheard of today,” Church points out.

The DNA of Storage

The need for more efficient data storage methods is rooted in today’s radically changing world. According to IBM, humans collectively produce about 2.5 exabytes of data each day; market research firm IDC says roughly three zettabytes of data exist in the digital world. Remarkably, 90% of the data in the world has been created over the last two years alone, say researchers at IBM. All this data requires increasingly large data centers and storage networks. It also presents challenges as storage devices and media change and data technologies become obsolete and prone to failure.

Researchers hope to significantly alter the equation. Church and fellow researchers, including Sri Kosuri, a senior scientist at the Wyss Institute, and Yuan Gao, an associate professor of biomedical engineering at Johns Hopkins University, are forging into new territory with DNA storage research. They used sophisticated sequencing techniques to encode Church’s book in 96-bit blocks, each containing a 19-bit address to assist with the reassembly process.

The data was first converted to binary code, which was then written using the four constituents of DNA: adenine (A), guanine (G), cytosine (C), and thymine (T). The non-living DNA contained 54,898 data blocks, each stored on an individual strand of DNA. The team then sent the data to Agilent Technologies, which used a 3D printer to attach the data to the DNA strands and build a physical storage device. Then the team read the DNA back and accurately decoded the text. Remarkably, a billion copies of the book easily fit into the moisture on the bottom of a glass or small tube.
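To make the block-and-address idea concrete, here is a minimal sketch in Python. It is not the team's actual pipeline. It assumes a one-bit-per-base mapping (A or C for 0, G or T for 1, picking whichever base does not repeat the previous one), prepends a 19-bit address to each 96-bit data block, and ignores everything else a real strand would need, such as flanking sequences for amplification and any error handling. All function names are hypothetical.

```python
import random

DATA_BITS = 96       # data bits per block, from the article
ADDRESS_BITS = 19    # address bits per block, from the article
ZERO_BASES = "AC"    # assumption: either base can stand for a 0
ONE_BASES = "GT"     # assumption: either base can stand for a 1

# 19 address bits are enough to label the 54,898 blocks mentioned above.
assert 54_898 < 2 ** ADDRESS_BITS


def bits_to_bases(bits, rng):
    """Map a bit string to bases, never repeating the previous base."""
    bases = []
    for bit in bits:
        choices = ONE_BASES if bit == "1" else ZERO_BASES
        options = [b for b in choices if not bases or b != bases[-1]]
        bases.append(rng.choice(options))
    return "".join(bases)


def bases_to_bits(bases):
    """Recover the bit string from a strand of bases."""
    return "".join("1" if b in ONE_BASES else "0" for b in bases)


def encode(data, seed=0):
    """Split bytes into addressed 96-bit blocks, one strand per block."""
    rng = random.Random(seed)
    bits = "".join(f"{byte:08b}" for byte in data)
    bits += "0" * (-len(bits) % DATA_BITS)   # pad the final block
    strands = []
    for start in range(0, len(bits), DATA_BITS):
        address = format(start // DATA_BITS, f"0{ADDRESS_BITS}b")
        strands.append(bits_to_bases(address + bits[start:start + DATA_BITS], rng))
    return strands


def decode(strands, length):
    """Reassemble the original bytes from strands in any order."""
    blocks = {}
    for strand in strands:
        bits = bases_to_bits(strand)
        blocks[int(bits[:ADDRESS_BITS], 2)] = bits[ADDRESS_BITS:]
    bits = "".join(blocks[i] for i in sorted(blocks))
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data[:length]


message = b"Regenesis"
strands = encode(message)
print(strands[0])
print(decode(strands, len(message)))
```

Because every strand carries its own address, the strands can be read back in any order and still be reassembled. As a consistency check on the article's figures, 54,898 blocks of 96 bits come to 5,270,208 bits, which matches the 5.27 megabits quoted for the book.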

Church says the DNA storage method is ideal for archival copies of huge datasets. The challenge is speeding the DNA fabrication process—possibly by turning to optical technology that writes with light enzymatically. For now, the writing process is expensive and slow, but the situation could flip as researchers invent better writing technologies. “We could see million-fold improvements in writing technology,” he says. “Unlike Moore’s Law, which results in improvements at a factor of about 1.5 per year, DNA sequencing is advancing at a 10-fold increase per year. This could translate into commercially viable technology within five years.”
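The gap between the two growth rates Church cites is easier to appreciate with a quick calculation. The sketch below assumes steady compounding at a fixed yearly factor, which is a simplification, and asks how long each rate would take to deliver a million-fold improvement. The function name is illustrative.

```python
import math


def years_to_reach(target_factor, yearly_factor):
    """Years of steady compounding needed to reach target_factor."""
    return math.log(target_factor) / math.log(yearly_factor)


MILLION = 1_000_000

moore_years = years_to_reach(MILLION, 1.5)   # about 1.5x per year, per the quote
dna_years = years_to_reach(MILLION, 10)      # 10x per year, per the quote

print(f"At 1.5x per year: {moore_years:.1f} years")
print(f"At 10x per year:  {dna_years:.1f} years")
```

At a sustained tenfold annual improvement, a million-fold gain takes six years; at 1.5 times per year it takes roughly 34.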

These writing and reading systems could attach to a computer using a USB or similar port. Although this storage technology probably would not be practical for everyday use—at least in the foreseeable future—it creates a media format that can last over the long term while eliminating the need to change media every few years as new devices and technologies supplant previous generations (such as when tape transitioned to CDs, then DVDs, and later to digital file formats). “This eliminates backward compatibility issues related to new generations of technology,” Church explains. What’s more, “It is possible to store the data for half a million years without electricity.” Indeed, the technology could, in some cases, eliminate vast storage networks and provide significant environmental benefits.

Another group examining DNA storage is the European Bioinformatics Institute (EBI) in Hinxton, England. In January 2013, scientists there reported they had encoded DNA with a 26-second audio clip of Martin Luther King’s “I Have a Dream” speech, a photograph, an academic paper, and 154 sonnets from Shakespeare. The DNA was dried onto glass sheets or vials. Researchers were able to retrieve the data with a 99.99% accuracy rate. Since then, they have corrected a “biological glitch” and they can now achieve near-100% accuracy.

In addition to bringing down the current cost of writing data to DNA (about €12,500, or $16,365, per megabyte), there is the challenge of building a system or technology that manages the data over long time spans. “One of the keys to making DNA storage work is establishing appropriate metadata and indexing systems,” says Nick Goldman, group leader for EBI. “We need a Rosetta Stone equivalent that can span hundreds or thousands of years and make sure all the data is directed to the right file, device, or system as it is needed.”

Other scientists also are examining possibilities related to DNA storage. For example, a research group at Stanford University has experimented with using the DNA of living E. coli cells to store digital code. This approach could aid in studying cancer, aging, and organismal development, the group reports, although it is not particularly efficient or desirable for holding massive volumes of data. Church notes that if the living cells do not find an evolutionary advantage in the data, they will begin mutating it, and at some point, they will destroy it.

Molecular Data Storage

Next-generation storage techniques are advancing in other ways. At MIT, a group of researchers is diving into the realm of molecular storage. The group has found a way to create a new type of supramolecule from molecules specially assembled by the Indian Institute of Science Education and Research in Kolkata. This supramolecule combines two components: fragments of graphene, which are thin sheets of carbon atoms, and zinc atoms. When these molecules are placed on a magnetic surface, the resulting magnetized supramolecule is about one nanometer in size and able to store data at a density of 1,000 terabytes per square inch (compared to a maximum density of less than one terabyte per square inch in current hard drive technology).

The experimental technology works in a somewhat different way than standard magnetic drives. Researchers placed a thin film of the molecular material they developed on a ferromagnetic electrode, and added a second ferromagnetic electrode on top. When a relative change in one electrode’s magnetic orientation occurs, there is a sudden increase or decrease in the system’s conductivity. These two states represent the 1s and 0s of binary code.

However, the MIT researchers observed two jumps in conductivity—even when the supramolecule had only one associated ferromagnetic electrode, rather than the pair. “This occurrence came as a complete surprise,” says Jagadeesh S. Moodera, a senior research scientist in the MIT Department of Physics. The ability to alter the conductivity of the molecules with only one ferromagnetic electrode could drastically simplify the manufacture of molecular memory.


A 1,000-times increase in storage density could redefine everything from data centers to personal devices. “The problem with today’s physical storage devices is that we are approaching their physical limits,” says Karthik V. Raman, a research scientist at IBM India and part of the MIT team that invented the molecular storage technology. “Molecular storage could offer far better performance in terms of data retention, densities, and power use. It could result in much more powerful and smaller devices. A device the size of an iPhone could have a staggering amount of storage capacity.”

Moodera says there is still considerable work to be done on the concept. While scientists have demonstrated the technology works, they eventually hope to show two stable and nonvolatile states for the molecules. In addition, the technology currently operates at a temperature of −9 degrees Fahrenheit (so-called “room temperature” in physics). Researchers will have to find a way to build the storage structure at higher temperatures to make it commercially viable.

However, the challenges do not end there. The researchers must also find a way to boost conductivity differences from the current 20% range to perhaps 50% or more. Getting to this point could take a decade or more, and will require both material innovation and fabrication advances. “We need to investigate further so we can achieve a deeper understanding at the molecular level,” Moodera explains.
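A toy model helps show why a larger conductivity difference matters when reading a bit. The sketch below is not a model of the device physics. It assumes the two states are told apart with a simple midpoint threshold, and that the higher-conductance state stands for 1; both are assumptions made here for illustration, as are the function names.

```python
def read_bit(conductance, low, high):
    """Classify a measured conductance as 0 or 1 using the midpoint threshold."""
    return 1 if conductance > (low + high) / 2 else 0


def noise_margin(relative_difference):
    """Largest measurement error, as a fraction of the low-state conductance,
    that a midpoint threshold can tolerate without misreading the bit."""
    return relative_difference / 2


LOW = 1.0  # low-state conductance in arbitrary units (assumption)

for difference in (0.20, 0.50):   # the 20% and 50% figures from the article
    high = LOW * (1 + difference)
    print(
        f"{difference:.0%} difference: states {LOW:.2f} and {high:.2f}, "
        f"tolerates errors up to {noise_margin(difference):.0%}"
    )
```

In this simplified picture, moving from a 20% to a 50% difference widens the tolerable measurement error from 10% to 25% of the low-state value, which suggests why the researchers want a bigger gap between the two states.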

Storage: The Next Generation

In the end, it is not so much a question of if next-generation storage technologies will go mainstream, but when. As we continue to amass increasing stores of data, the need for new storage technologies becomes increasingly clear. Molecular and DNA storage could become the vehicles of choice, or something new could appear. Either way, “New generations of vastly more efficient storage systems could fundamentally change the way we approach data, manage complex tasks, and approach computing,” Raman explains.

For now, researchers are looking to fill in the gaps in order to produce commercially viable systems. They are tapping expertise in every discipline from biology and quantum physics to software development to assemble all the pieces and build the storage medium of the future. Says MIT’s Moodera: “The goal is to explore different molecules, different configurations, and different ways of applying computing technology. Although these future storage mediums are remarkably complex, we are on the doorstep of developing remarkable systems that will redefine the way we manage, store, and use data.”

Further Reading

Raman, K.V., Kamerbeek, A.M., Mukherjee, A., Atodiresei, N., Sen, T.K., Lazić, P., Caciuc, V., Michel, R., Stalke, D., Mandal, S.K., Blügel, S., Münzenberg, M., Moodera, J.S.,
Interface-engineered templates for molecular spin memory devices, Nature, 493, 509–513, January 2013. http://www.nature.com/nature/journal/v493/n7433/full/nature11719.html

Goldman, N., Bertone, P., Chen, S., Dessimoz, C., LeProust, E.M., Sipos, B., Birney, E.,
Towards practical, high-capacity, low-maintenance information storage in synthesized DNA, Nature, January 2013. http://www.nature.com/nature/journal/vaop/ncurrent/full/nature11875.html

O’Driscoll, A., Sleator, R.D.,
Synthetic DNA: The Next Generation of Big Data Storage, Bioengineered, May/June 2013. http://www.landesbioscience.com/journals/bioe/2013BIOE-NV-43.pdf.

Church, G.M., Gao, Y., Kosuri, S.,
Next-Generation Digital Information Storage in DNA, Science, Sept. 2012, Vol. 337, no. 6102, p. 1628. http://www.sciencemag.org/content/337/6102/1628.abstract

Figures

UF1 Figure. Harvard geneticist George Church shows the amount of space needed to store 20 million copies of his book within DNA.

UF2 Figure. An artist’s depiction of graphene fragments, flat sheets of carbon attached to zinc atoms, which may be used in the manufacture of molecular memories.

Communications of the ACM (CACM) is now a fully Open Access publication.
