Twenty-five years ago, Tim Berners-Lee released to the world a few lines of code—the HyperText Transfer Protocol, the HyperText Markup Language (HTML), and a rudimentary Web browser—for linking and browsing information on the Internet. What he dubbed the World Wide Web now consists of 4 billion indexed pages at more than a billion Web sites used by 3.5 billion people, and the original code has been expanded and augmented with billions of lines of user software.
Despite the Web's huge success, Berners-Lee and a handful of Internet pioneers say it is flawed, and they now want to build a new Web. The goals are to offer reliability, integrity, and privacy not easily obtained today, and to preserve a complete and accurate archive of all Web activity.
The idea (really a multi-faceted vision supported by a host of mostly existing technologies) was presented at the Decentralized Web Summit in San Francisco by Brewster Kahle, founder of the Internet Archive. The vision appears in a white paper by Kahle, "Locking the Web Open: A Call for a Decentralized Web."
Kahle points out that while the Internet communications infrastructure is highly decentralized and redundant, the Web is not; most Web sites, in fact, are hosted on a single server in one geographic location. While the big Internet companies such as Google, Amazon, and Facebook run highly distributed systems, user authentication flows through a single channel potentially subject to scrutiny and control by corporations and governments.
The new Web would be based on technologies like WordPress, free software that can be used to develop and host a Web site on a computer owned or controlled by the user. It would employ a cryptography-based decentralized authentication system, and it might also employ blockchain technology, originally developed for Bitcoin, for encrypted distributed databases. Kahle describes it as a system "where Web sites are multi-homed, served out of multiple places, which can be ad hoc in configuration, and where there is no contractual obligation by the user."
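To make the blockchain idea concrete, here is a minimal sketch of its core ingredient, a chain of records in which each record stores a hash of the one before it. It is an illustration only and not a design from Kahle's white paper; the function names (`record_hash`, `append_record`, `chain_is_valid`) and the record layout are assumptions made for this example, and a real blockchain adds consensus among many machines on top of this.

```python
import hashlib
import json


def record_hash(record: dict) -> str:
    """Return the SHA-256 hex digest of a record's canonical JSON form."""
    canonical = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def append_record(chain: list, payload: str) -> None:
    """Append a record that stores the hash of the record before it."""
    # The first record has no predecessor, so it points at a dummy hash.
    prev = record_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "payload": payload, "prev_hash": prev})


def chain_is_valid(chain: list) -> bool:
    """Check that every record still points at the hash of its predecessor."""
    for i in range(1, len(chain)):
        if chain[i]["prev_hash"] != record_hash(chain[i - 1]):
            return False
    return True


# Hypothetical usage: three versions of a page, then a tampered first record.
chain = []
for text in ["page v1", "page v2", "page v3"]:
    append_record(chain, text)

print(chain_is_valid(chain))  # True
chain[0]["payload"] = "tampered"
print(chain_is_valid(chain))  # False
```

Because each record commits to its predecessor, changing an old record breaks every link after it, which is what lets copies held in multiple places be checked against one another. Note that this sketch cannot detect a change to the newest record, since nothing points at it yet.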
Jon Kleinberg, a computer science professor at Cornell University who specializes in large-scale social and information networks, applauds these notions for ensuring the integrity and stability of Web content. "A lot of the technologies [Kahle] talks about are cryptographic at their heart," he says. "When I read something, I want to be confident of its authorship, and I want to read things without worrying that other people are monitoring what I'm reading."
Yet Kleinberg warns of difficulties in achieving these goals. "The Web is a huge system; what sorts of systems at Web scale can realistically be built? It's hard to tell exactly what is going to be technologically feasible."
A Web with a Memory
A major objective of the new Web would be to create a complete, permanent record of all pages created. Today, the Internet Archive runs the Wayback Machine, which crawls public Web sites bi-monthly looking for changed pages and archives a billion pages a week. However, Kahle says, Web pages last just 100 days on average before being changed or deleted. A version of the Wayback Machine for the decentralized Web would archive pages as they are created, and it would be tightly and transparently integrated so users could retrieve a Web site's history without having to go to the archive.org site, as is now the case.
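The difference between crawling for changes and archiving pages as they are created can be sketched in a few lines. The example below is a toy, in-memory illustration and makes no claim about how the Wayback Machine is implemented; the class `PageArchive`, its methods, and the integer timestamps are all hypothetical. It saves a snapshot only when a page's content hash differs from the last one stored, and it can return the version that was current at a given time.

```python
import bisect
import hashlib


class PageArchive:
    """Hypothetical archive mapping url -> list of (timestamp, digest, content)."""

    def __init__(self):
        self._history = {}

    def save(self, url: str, content: str, timestamp: int) -> bool:
        """Store a snapshot only if the content changed since the last one.

        Assumes timestamps for a given URL arrive in non-decreasing order.
        Returns True if a new version was stored.
        """
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        versions = self._history.setdefault(url, [])
        if versions and versions[-1][1] == digest:
            return False  # unchanged page, nothing new to archive
        versions.append((timestamp, digest, content))
        return True

    def as_of(self, url: str, timestamp: int):
        """Return the content that was current at `timestamp`, or None."""
        versions = self._history.get(url, [])
        times = [t for t, _, _ in versions]
        i = bisect.bisect_right(times, timestamp)
        return versions[i - 1][2] if i else None


# Hypothetical usage with made-up timestamps.
archive = PageArchive()
archive.save("example.org/", "hello", 1)         # stored
archive.save("example.org/", "hello", 2)         # skipped, unchanged
archive.save("example.org/", "hello, world", 3)  # stored
print(archive.as_of("example.org/", 2))          # hello
```

In a self-archiving Web, something like `save` would run at publication time rather than during a later crawl, so a page that lives only a few weeks would still leave a record.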
"My interest is to preserve the WWW content in an easier way than with the massive efforts that the Internet Archive has had to mount," says Vint Cerf, Chief Internet Evangelist at Google (and a past president of ACM). "A naturally self-archiving system might be able to spread cost and risk more broadly while increasing the likelihood that content will still be discoverable years from now."
Kahle says the various pieces of technology required for this new Web mostly exist today, but they are not quite ready for integration into the kind of total system he envisions. "I thought of the [recent Decentralized Web Summit] as a call to build something, not an announcement of something being built," he says.
As in any big development effort, there will be difficulties, Cerf predicts. "I suppose it might not work—technically, economically or even politically. The Right to Be Forgotten concept might run into a problem in such a system. On the other hand, the archive might serve the Right to be Remembered. There are obvious complexities regarding expiring copyrights and relaxing of access controls upon such expiration. Expiration of software patents pose similar conundrums."
Gary Anthes is a technology writer and editor based in Arlington, VA.