The U.S. Library of Congress expects to finish the initial stage of building a Twitter archive by the end of January. In April 2010, Twitter agreed to give the library an archive of every public tweet since the company went live in 2006. That initial four-year archive contained about 21 billion tweets, which take up 20 terabytes uncompressed, including their data fields.
The library now stores about 500 million new tweets a day and has added about 170 billion tweets to its collection in total. The focus will now shift to making the collection accessible to lawmakers and researchers. "It is clear that technology to allow for scholarship access to large data sets is lagging behind technology for creating and distributing such data," the library says.
The full archive now requires 133.2 terabytes for two compressed copies, which are stored on tape in separate locations for safekeeping. The library has already received 400 inquiries from researchers studying citizen journalism, vaccination rates, stock market trends, and other topics.
From IDG News Service
Abstracts Copyright © 2013 Information Inc., Bethesda, Maryland, USA