Bitwise – Communications of the ACM

In 1960, physicist Eugene Wigner pondered "The Unreasonable Effectiveness of Mathematics in the Natural Sciences," asking why mathematics should provide the "miracle" of accurately modeling the physical world. Wigner remarked, "it is not at all natural that ‘laws of nature’ exist, much less that man is able to discover them." Nearly fifty years later, artificial intelligence researchers Alon Halevy, Peter Norvig, and Fernando Pereira paid homage to Wigner in their 2009 paper "The Unreasonable Effectiveness of Data," an essay describing Google’s ability to achieve higher-quality search results and ad relevance not primarily through algorithmic innovation but by amassing and analyzing orders of magnitude more data than anyone had before. The article both summarized Google’s successes to that date and presaged the leaps in "deep learning" that followed. With sufficient data and computing power, computer-constructed models obtained through machine learning raise the possibility of performing as well as, if not better than, human-crafted models of human behavior. And since machines can craft such models far more quickly than humans, data-driven analytics and machine learning appear to promise more efficient, accurate, and rich models, at the cost of transparency and modularity, which is why such systems are frequently seen as black boxes.

Ironically, Halevy, Norvig, and Pereira’s insights were driven by the ineffectiveness of mathematics in the life and social sciences. It is, I hope, an undisputed contention that models of biological and human behavior have nowhere near the predictive power of physical laws. While little that is tenable survives of Aristotle’s physics, the fourfold humoural classification of ancient Greece endures through the Jungian temperament classification systems employed by the majority of Fortune 500 companies. Where such folk theories still prevail, automated computer models have the potential to do better than our own. We do not need machine learning for physical laws, only for phenomena so apparently complex that we lack an unreasonably effective model of them.

I wrote my book Bitwise: A Life in Code (Pantheon) to chronicle my own struggle to reconcile the beautiful precision of computer science and mathematical models with the messiness of human existence. Yet the problems that I mused upon as a student of computer science and literature in the 1980s and 1990s grew far more relevant as the "datafication" of the world took place through the growth of the Internet and computing power. What is facing us, I argue, are several related phenomena:

The perpetual inadequacy of our provisional models of human behavior, psychology, sociology, and economics.
The tension between overly simplistic, reductionistic models and opaque or overfitted models, and the difficulty of finding a happy medium between them.
The rush to computationally regiment these models, however inadequate they may be, so that computers can view people, groups, and other phenomena in their terms.
The ad hoc, large-scale adoption of human- and computer-generated models of human behavior and psychology in the service of creating computationalized profiles and analyses of human beings.

Consequently, I suggest there is something paradoxical about "data science" as it exists today, in that it frequently begins with approximate, inaccurate models only to be faced with the choice of either reifying them or superseding them with more inscrutable models. While some methods offer a degree of scrutability and explanation, the difficulty of utilizing these methods to externally validate models arises from the very factor that enables their success: the sheer amount of data and processing being done.

Two consequences of this massive increase in data processing are a drive toward ubiquity of the models used, and an increasing human opacity to these models, whether or not such opacity is intended or inevitable. If our lives are going to be encoded (in the non-programming sense) by computers, computer science should assume reductionism, ubiquity, and opacity as intrinsic properties (and problems) of the models its methods generate.

Networks Reify Models

The present era of computing has been labeled with buzzwords like "big data" and "deep learning," the unifying thread among them being the centrality of enormous datasets and the creation of persistent, evolving networks to collect, store, and analyze them. I count not only machine learning networks among them, but the data stores and applications of Google, Facebook, Amazon, and myriad others. As an engineer at Google in the mid-2000s, I observed that systems, analytics, and machine learning were all fields that converged on the company’s fundamental goal of making the greatest possible use of its data, in part by collecting more of it.

Inasmuch as these persistent, evolving networks rely on overt and hidden properties of their data to achieve their particular results, models are central to such networks in a way they are not to algorithms. Whether one is performing a simple MapReduce or training a neural network, the choice of which features (or signals) to analyze and how to weigh and combine them constitutes guidance toward an implicit or explicit model of the data being studied. In other words, one is always working with and toward a model that relates the network’s data to its practical applications. No two such networks are alike because each is the product of the particular data on which it is built, trained, or deployed. And in the absence of some overarching coordination, we should not expect these models to be compatible with each other. Alan Perlis said, "Every program is a part of some other program and rarely fits," and the same is doubly true for models.
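To make the point about implicit models concrete, here is a toy sketch in Python. The page records, the signal names (`clicks`, `links`, `freshness`), and both weightings are hypothetical, invented for illustration and not drawn from Google's or anyone else's systems. The only claim is that two choices of which signals to weigh, applied to identical data, yield two different orderings, and so two different models of what the data means.

```python
from typing import Dict, List

# Hypothetical toy data: signal names and values are invented for illustration.
Record = Dict[str, float]

def score(record: Record, weights: Dict[str, float]) -> float:
    """Weighted sum over the chosen signals; signals absent from `weights` are ignored."""
    return sum(w * record.get(signal, 0.0) for signal, w in weights.items())

def rank(records: Dict[str, Record], weights: Dict[str, float]) -> List[str]:
    """Order record ids by descending score under one particular choice of weights."""
    return sorted(records, key=lambda rid: score(records[rid], weights), reverse=True)

pages = {
    "a": {"clicks": 0.9, "links": 0.2, "freshness": 0.1},
    "b": {"clicks": 0.3, "links": 0.8, "freshness": 0.5},
    "c": {"clicks": 0.5, "links": 0.5, "freshness": 0.9},
}

# Two hypothetical "models": each is nothing more than a choice of signals and weights.
popularity_model = {"clicks": 1.0}
authority_model = {"links": 0.7, "freshness": 0.3}

print(rank(pages, popularity_model))  # ['a', 'c', 'b']
print(rank(pages, authority_model))   # ['b', 'c', 'a']
```

Nothing in the data itself says which ordering is correct; the model lives entirely in the weights someone chose.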

The explicit models employed by the largest networks today are frequently simplistic in the extreme. For Twitter, they are primarily a collection of hashtags and other keywords which serve to lump users together into overlapping categories. For Facebook, the core models include the basic personal information about users, a set of demographic microcategories, a set of six emotional reactions, and a large set of products, hobbies, and cultural objects in which people can express interest. Amazon adopted existing taxonomies of consumer products and combined them with the user information common among other large networks. National identification and reputation systems such as China’s Social Credit System and India’s Aadhaar extend the sort of categorization employed by Facebook to organize citizens under governmentally adjudicated labels. The results, by general agreement, are paradoxically elaborate yet simple, discrete yet haphazard. True, such models are not meant to be definitive or "scientific," but forcing users into such taxonomies results in these taxonomies carrying an increasingly prescriptive element. They thus become ontologies in serving to regiment our reality. It is the irony of the data age that computers, with little to no understanding of the models they are employing, are increasingly acting as primary arbiters of the ontologies employed by humans.


The specific problem computer science faces in these "data network" scenarios is that of making the data "safe" and "accurate" for the networks. That is, if we assume that the fit of the data to reality is imprecise and insufficient, what general purpose techniques can computer science itself offer to mitigate the inevitable flaws of the resulting models?

Answers to this question tend to display one of two opposing tendencies: on the one hand, a reductionistic transparency; on the other, a complexified opacity. In practice, networks tend to use a combination of both. Consumers and users may be asked to categorize themselves, choosing from a shortlist of discrete options. Alternatively, statistical and machine learning methods may be used to fit humans into categories based on training data or tuned heuristics. Either way, these categories are generally specified explicitly by the creators of the network performing the analysis, meaning that no matter the complexity of the model, refinement or revision of the ontology remains a manual process, as with Facebook’s advertiser categories.
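A minimal sketch of the statistical route, assuming a simple nearest-centroid rule and a hypothetical three-label ontology (the labels, the two normalized features, and the centroid positions are all invented). However ambiguous a user's features, the function can only return one of the labels its designers wrote down, and revising the ontology means editing `CATEGORIES` by hand.

```python
import math
from typing import Dict, Tuple

# Hypothetical, human-specified ontology: labels and centroids are invented for illustration.
# Each vector is (normalized_age, normalized_travel_spend).
CATEGORIES: Dict[str, Tuple[float, float]] = {
    "young frequent traveler": (0.2, 0.8),
    "middle-aged homebody": (0.5, 0.1),
    "retired traveler": (0.9, 0.7),
}

def assign(user: Tuple[float, float]) -> str:
    """Force a user into the nearest predefined category (nearest-centroid rule).

    There is no "none of the above": every input maps to some existing label.
    """
    return min(CATEGORIES, key=lambda label: math.dist(user, CATEGORIES[label]))

print(assign((0.25, 0.75)))  # young frequent traveler
print(assign((0.55, 0.45)))  # middle-aged homebody
```

The second user sits at a comparable distance from all three centroids yet is still filed under one of them; the complexity of the fitting step does nothing to loosen the fixed set of labels.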

In other words, even machine-generated models are beholden to the strictures placed on them by humans. For example, Wisconsin’s COMPAS recidivism algorithm tended to err by classifying Black defendants as more likely to recidivate and white defendants as less likely. Similarly, Amazon trained a resume screening network that produced arbitrary and biased results, discriminating against women and making assessments based on linguistic choices rather than listed skills. While not entirely opaque, the models were evidently not fixable, as Amazon discontinued the project. I cite these two examples in the hope of showing that these problems are general enough to fall under the rubric of computer and data science rather than being treated as application-specific failures.
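Complaints like the one leveled at COMPAS are usually stated in terms of group-wise error rates. The sketch below uses a small invented dataset (groups `A` and `B` are hypothetical, not real defendants and not COMPAS figures) and computes each group's false positive rate (people who did not reoffend but were flagged high risk) and false negative rate (people who did reoffend but were not flagged). It shows only how such an asymmetry is measured, not how it arose.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Each record: (group, predicted_high_risk, actually_reoffended). Toy data only.
Record = Tuple[str, bool, bool]

def error_rates(records: Iterable[Record]) -> Dict[str, Dict[str, float]]:
    """Return the false positive rate and false negative rate for each group."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for group, predicted, actual in records:
        c = counts[group]
        if actual:
            c["tp" if predicted else "fn"] += 1
        else:
            c["fp" if predicted else "tn"] += 1
    rates = {}
    for group, c in counts.items():
        negatives = c["fp"] + c["tn"]  # people who did not reoffend
        positives = c["tp"] + c["fn"]  # people who did reoffend
        rates[group] = {
            "fpr": c["fp"] / negatives if negatives else float("nan"),
            "fnr": c["fn"] / positives if positives else float("nan"),
        }
    return rates

# Hypothetical records, eight per group.
records = [
    ("A", True, False), ("A", True, False), ("A", False, False), ("A", False, False),
    ("A", True, True), ("A", True, True), ("A", True, True), ("A", False, True),
    ("B", True, False), ("B", False, False), ("B", False, False), ("B", False, False),
    ("B", True, True), ("B", True, True), ("B", False, True), ("B", False, True),
]

print(error_rates(records))
```

In this toy data, group A's non-reoffenders are over-flagged (false positive rate 0.5 versus 0.25) while group B's reoffenders are more often missed (false negative rate 0.5 versus 0.25), the same shape of error described above.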

Unsupervised learning suggests that machine learning may increasingly be able to create its own feature distinctions, which would mitigate the anthropocentric biases of human-specified categories and features, at the cost of making such distinctions more opaque to humans. If people, instead of being classified into human-specified definitions such as "young women who travel to Greece" and "middle-aged empty nesters making over $100,000 a year," are divided into machine-created categories without any such descriptors, what amount of meaning can those categories have to humans? This is the problem of opacity: if computers offer a model to which humans can be better fit, there is every likelihood that we ourselves would not be able to employ it in our lives. We would be more comprehensible to machines than to one another.
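For contrast, an unsupervised method invents its own partition. The sketch below is plain k-means (Lloyd's algorithm) in dependency-free Python with a naive first-k initialization; the six user vectors are hypothetical. The output is nothing but integer cluster ids: the algorithm separates the users, but attaching a human-readable descriptor to cluster `0` or `1` is left entirely to us.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def kmeans(points: Sequence[Point], k: int, iterations: int = 20) -> List[int]:
    """Plain k-means (Lloyd's algorithm); returns a bare integer cluster id per point."""
    centroids = [list(p) for p in points[:k]]  # naive deterministic init: first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels

# Hypothetical behavioral feature vectors for six users.
users = [(0.1, 0.2), (0.15, 0.1), (0.2, 0.15), (0.8, 0.9), (0.85, 0.8), (0.9, 0.85)]
print(kmeans(users, k=2))  # [1, 1, 1, 0, 0, 0]
```

The grouping is clean, but "cluster 1" means nothing until a person decides what to call it, which is exactly the opacity problem at small scale.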

Models Become Opaque

For centuries now, there has been a stunning gap between the precision and accuracy of physical models of the world and that of our folk psychological models of people. Computers have inherited and exacerbated this gap. The game Dwarf Fortress has an entire fluid dynamics engine built into it to manage the flows of water and lava, but the non-player dwarves who people these environments are modeled on heuristic ideas drawn partly from medieval folk psychological theories. The comparative simplicity of the human models is due not to any failing on designer Tarn Adams’ part, but to the lack of definitive models of sentient behavior and the as-yet untamed complexity of the phenomena such models attempt to capture.

When it comes to coding complex phenomena and human phenomena in particular, it can appear as though we face an unenviable choice between simplistic, human-crafted models based in folk ontologies, and opaque, computer-crafted models that defy human explanation.


I discuss both at length in Bitwise, concluding that ironically, the former, reductionistic approach is closer to what we think of as "machine" and the latter, opaque approach is closer to what we think of as "human," as we successfully operate every day with vague, ambiguously underspecified systems of which none of us can give a complete or accurate account, natural language chief among them. Such human-employed systems are interpretable yet inscrutable, in that one can understand the potential purpose of saying a certain thing in a certain situation and yet remain unable to make predictions, unable to determine whether a person is about to say "It is raining" when it is raining, even if their saying so does not come as any surprise. There is no indication that the ontologies around which these systems are based have any ultimate scientific validity; what it is to be "raining" is vague and under-specified. If we are treating each other as black boxes, it is a tall order to ask computers to be transparent in their attempts to model and predict human behavior. It is a brave ideal, but one that demands honesty around its infeasibility. Inscrutability appears to be a mark of the human.

Our failure to translate our own inscrutable world models into computational terms is the primary driver of the need for the two kinds of inadequate models specified here. Of course, it is not as though these human systems are complete or accurate: few would say natural language is the ideal mechanism for expressing truths about the world. It just happens to be our shared, mutually comprehensible mechanism. If we are to cope with inscrutability in our machine-generated models, they must improve on these complex, human-used models. They must become more accurate and complete in describing phenomena while remaining comprehensible—though not necessarily wholly scrutable—to humans.

Conclusion

"What is a man so made that he can understand number, and what is number so made that a man can understand it?" This, according to Seymour Papert, was the question that guided Warren McCulloch’s life as he made some of the earliest steps, alongside Alan Turing and Norbert Wiener, toward a theory of artificial intelligence in the middle of the 20th century. Today we could invert the question: "What is data so made that it can represent the human, and what are humans so made that they can present themselves in data?" If the quantity of data and data processing is a key differentiator between provisional success and failure in the models created and utilized by computers, then the problem of inscrutability seems unavoidable to me. Rather, the more tenable problem may be one of synchronizing interpretability between machine networks and humans, so that even though we each are black boxes, humans can still tune and correct machines, and vice versa.