When the Internet was being developed, scientists and engineers in academic and research settings drove the process. In their world, information was a medium of exchange. Rather than buying information from each other, they exchanged it. Patents were not the first choice for making progress; rather, open sharing of designs and protocols was preferred. Of course, there were instances where hardware and even software received patent and licensing treatment, but the overwhelming trend was to keep protocols and standards open and free of licensing constraints. The information-sharing ethic contributed to the belief that driving down barriers to information and resource sharing was an important objective. Indeed, the Internet as we know it today has driven the barrier to the generation and sharing of information to nearly zero. Smartphones, laptops, tablets, Web cams, sensors, and other devices share text, imagery, video, and other data with a tap of a finger or through autonomous operation. Blogs, tweets, social media, Web page updates, email, and a host of other communication mechanisms course through the global Internet in torrents (no pun intended). Much, if not most, of the information found on the Internet seems to me to be beneficial: a harvest of human knowledge. But there are other consequences of the reduced threshold for access to the Internet.
The volume of information is mind-boggling. I recently read one estimate that 1.7 trillion images were taken (and many shared) in the past year. The Twittersphere is alive with vast numbers of brief tweets. The social media have captured audiences and contributors measured in the billions. Incentives to generate and share content abound: some monetary, some for the sake of influence, some purely narcissistic, some to share beneficial knowledge, to list just a few. A serious problem is that the information comes in all qualities, from incalculably valuable to completely worthless and, in some cases, seriously damaging. Even setting aside malware, DDoS attacks, hacking, and the like, we still have misinformation, disinformation, "fake news," "post-truth alternate facts," fraudulent propositions, and a raft of other execrable material, often introduced to cause deliberate harm to victims around the world. The vast choice of information available to readers and viewers leads to bubble/echo chamber effects that reinforce partisan views, prejudices, and other societal ills.
There are few international norms concerning content. Perhaps child pornography qualifies as one type of content that is widely agreed to be unacceptable and that should be filtered and removed from the Internet. There are national norms that vary from country to country regarding legitimate and illegitimate/illegal content. The result is a cacophony of fragmentation and misinformation that pollutes the vast majority of useful or at least innocuous content to be found on the Internet. The question before us is what to do about the bad stuff. It is irresponsible to ignore it. It is impossible to filter in real time. YouTube alone gets 400 hours of video uploaded per minute (that is about 16.7 days of a 24-hour television channel). The platforms that support content are challenged to cope with the scale of the problem. Unlike other media, whose time and space limitations (page counts for newspapers and magazines; minutes for television and radio channels) make it more feasible to exercise editorial oversight, the Internet is limitless in time and space, for all practical purposes.
Moreover, automated algorithms are subject to error or can be misled by the action of botnets, for example, that pretend to be human users "voting" in favor of deliberate or accidental misinformation. Purely manual review of all the incoming content is infeasible. The consumers of this information might be able to use critical thinking to reject invalid content, but that takes work, and some people are unwilling or unable to do that work. If we are to cope with this new environment, we are going to need new tools, better ways to validate sources of information and factual data, and broader agreement on transnational norms, all the while striving to preserve freedom of speech and freedom to hear, as enshrined in the Universal Declaration of Human Rights. I hope our computer science community will find or invent ways to engage, using powerful computing, artificial intelligence, machine learning, and other tools to enable better quality assessment of the ocean of content contained in our growing online universe.
The Digital Library is published by the Association for Computing Machinery. Copyright © 2018 ACM, Inc.