Libraries Considered Hazardous

Do you remember the story of the room full of immortal monkeys typing on typewriters forever? Eventually they would produce all works ever written and that would ever be written. They would capture all truth but also everything that is false or only partly true. Were we to walk into such a place we would be confronted with an ultimate challenge: How to tell that which was true from everything else in this ultimate library?

In some ways, the contents of the Internet and especially the World Wide Web pose a similar challenge. About half the world’s population is now online according to estimates by the International Telecommunication Union.^a These approximately 3.8 billion people produce enormous quantities of information on Web pages, in databases, in social media, and other online platforms. While I do not mean to suggest these Internauts are no better than monkeys typing at random, there is a great deal of misinformation mixed in with very high-quality content. Some of that misinformation is a consequence of ignorance, but some is deliberately produced disinformation intended to confuse or to bend public opinion to achieve questionable ends. Ironically, some of the best quality, highly endorsed information is also wrong, not out of malevolent intent, but because it has been invalidated by the scientific method: theory, experiment, and measurement leading to proof or refutation.

If we are honest with ourselves, science is, at best, an approximation of reality. Even when they are not quite right, some theories can still be very useful. Newton’s laws are useful for many computations, but under conditions of acceleration, high speed, or intense gravity, one needs Einstein’s refinements. And when we get to the ultra-small, we must move to quantum theory, but it doesn’t account for gravity! The challenge for us is to know under what conditions the approximations are applicable.

How does all this apply to libraries? Libraries are organized accumulations of information. I almost wrote "knowledge" but that term seems to connote "truth" and we know now that not all information is true. As we accumulate more and more information, how can we curate this content so as to correctly distinguish truth from fiction? How do we cope with the discovery that what we thought was true is, in fact, false in the light of new information? Librarians have a role to play here as keepers of knowledge, but even they cannot be expected to be omniscient. What about digital content? What about online content? Can the curators of knowledge use online digital libraries to maintain and curate content, helping the users of the library to find truth and reject fiction (except, perhaps, when looking for entertainment)?

The task of curating the Internet’s contents is well beyond any one person’s ability, or even that of any particular group. If we are to curate this content, we will need widespread collaboration, some of it with automated tools based on AI and machine learning. The libraries of the future cannot merely be catalogs of digital (and older media) content. The objects in the digital library will need to interact in some fashion so that the truth value of their contents can be adjusted as new knowledge becomes available and is absorbed into the library. Such a process may actually prove feasible for factual knowledge but even there, fact can be elusive. Just as relativity theory shows us that two observers of the same two events may legitimately disagree as to the order in which these events occurred, it is not always clear what is factual and what is speculation.

All this tells us that persistent accumulation of knowledge requires care and curation over time. One might even imagine that digital online libraries might have the ability to update themselves as new knowledge is added. John McCarthy^b once said to me, "Do you know, 100 years from now they will say, ‘100 years ago they had books that didn’t talk to each other!’" It will be an enormous task to devise methods to accumulate and curate digital content and its relevant metadata including provenance and validity. Will computer, information, and library science be up to the task? We can but try.

February 2019 Issue

Communications of the ACM (CACM) is now a fully Open Access publication.
