NLU Is Not NLP++

I have mentioned this elsewhere: there are now plenty of so-called natural language processing (NLP) “experts” who have never heard of language understanding puzzles such as intensionality, nominal compounds, scope ambiguities, and opaque contexts. Nor have they been exposed to centuries’ worth of work by the likes of Gottlob Frege, Bertrand Russell, Ludwig Wittgenstein, W. V. O. Quine, and Rudolf Carnap, not to mention more recent thinkers such as Richard Montague, Jon Barwise, Hans Kamp, Jerry Fodor, George Lakoff, and Jerry Hobbs. This is exactly like having an expert physicist who has never heard of the Third Law of Thermodynamics, or a physicist who has never heard of Isaac Newton or Albert Einstein.

As silly as this might sound, that is in fact the sad state of affairs. Many so-called NLP experts have only one skill: they know how to pull in a machine learning library, massage the data, and train a few models, and they expect a system that “understands” ordinary spoken language to come out the other end. The magic is supposed to happen simply because the data is BIG and the model is DEEP. But of course, neither the “big” nor the “deep” can yet understand a simple sentence that a 4-year-old can easily utter or effortlessly comprehend. The real problem is that this naivete is not exclusive to new graduates who are swept up by the media hype (and whose academic training, by the way, excludes solid scientific foundations!). It has also reached so-called NLP experts (the ones who created and perpetuated the hype), including some of today’s AI rock stars (the original proponents of everything that is “big” and “deep”).

My concern is specifically NLU (with a ‘U’), that is, natural language understanding, and not language “processing.” Consider tasks such as finding the number of named locations in a piece of text, the number of words surrounding “apple,” the number of times ‘Trump’ appears in a title, or the distribution and statistical correlation between “1-800” and “FREE.” These are all forms of language processing. In theory, however, they all belong to the same computational class as counting the pixels in an image that have the RGB color value (220, 0, 117). The fact that the data is made up of English (or other language) characters is secondary.

Understanding (or comprehending) language is a different problem altogether. It is not just a more capable or more powerful NLP. It is a different field of study, and it requires foundational knowledge beyond linguistics, grammars, and the like. Language is, literally, thought. I would even suggest replacing NLU with HTU (for “Human Thought Understanding”) to distinguish it from mere text processing.
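
The following minimal Python sketch makes the “same computational class” point concrete. The function `count_matches` and the toy data are hypothetical and serve only as an illustration. The same routine counts a word in a tokenized title and a color in a list of pixels, and nothing in it depends on the elements being English words.

```python
from typing import Hashable, Iterable


def count_matches(items: Iterable[Hashable], target: Hashable) -> int:
    """Count the elements of a sequence that are equal to `target`.

    The routine does not care what the elements are: words and RGB
    triples are both just symbols that can be compared for equality.
    """
    return sum(1 for item in items if item == target)


# Hypothetical toy data: a tokenized title and a tiny "image" of RGB pixels.
title_tokens = "Trump visits Trump Tower".split()
pixels = [(220, 0, 117), (0, 0, 0), (220, 0, 117), (255, 255, 255)]

print(count_matches(title_tokens, "Trump"))   # 2
print(count_matches(pixels, (220, 0, 117)))   # 2
```

The computation would be identical if the input were text in another language, or any other sequence of comparable symbols. In that sense such tasks are processing rather than understanding.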

To appreciate this point, consider the following:

(1) The ball didn’t fit in the brown suitcase because it was too

a. small
b. big

A 4-year-old (and one of my friends confirmed to me that his 2.5-year-old son does too) effortlessly understands the following: if (1) is followed by (a), then “it” refers to the suitcase, but if (1) is followed by (b), then “it” refers to the ball. Of course, one can easily change these commonsense preferences by changing a single word. For example, replacing “because” with “although,” or “didn’t” with “did,” or any combination of these, changes the entire “plausibility space” from the standpoint of common sense.

If one insists on treating language as characters (“data”), then the number of combinations that affect the choice of what “it” refers to in this simple pattern is above 40 million (nearly half of the sentences we hear in our entire lives!). So training on data and ‘learning’ patterns is beyond ridiculous. A 4-year-old, on the other hand, UNDERSTANDS what “it” refers to, even after hearing only a couple of similar sentences. They understand because they know how these objects function and how they relate to each other in the world we live in. In short, they have common sense.

And here is another (very serious) issue that so-called NLP experts ignore (or do not even know exists?). Consider the situation in the picture below:

In data-only approaches to language, one can easily alter reality and make false inferences. Take the sentence “I saw Mary teaching her little brother that 7 + 9 = 16.” Replacing ‘16’ with a value that is equal to it (data-wise) can turn this true statement into a false one. For example, it is not the same as “I saw Mary teaching her little brother that 7 + 9 = SQRT(256).” While the former is true, the latter is not. Yes, in high-level reasoning, the kind that language understanding requires, your high school teacher was wrong: you cannot always replace 16 with SQRT(256)!
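
Here is a minimal Python sketch of the substitution problem. It assumes, purely for illustration, that a report of what Mary taught stores the expression she actually taught, as a crude stand-in for its intensional content. The class `Taught` and the helper `rhs_value` are hypothetical names. The two right-hand sides are equal as data, yet the two reports are different.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Taught:
    """Hypothetical record of an observed teaching event.

    `content` keeps the expression that was actually taught, not just its
    numeric value (a crude stand-in for intensional content).
    """
    teacher: str
    learner: str
    content: str


def rhs_value(report: Taught) -> float:
    """Evaluate the right-hand side of the taught equation (toy evaluator)."""
    rhs = report.content.split("=", 1)[1].strip()
    # eval with no builtins; acceptable only for these hard-coded toy strings
    return float(eval(rhs, {"__builtins__": {}, "SQRT": math.sqrt}))


# What was observed, per the scenario in the text.
observed = Taught("Mary", "her little brother", "7 + 9 = 16")

# Naive "equals for equals" substitution inside the reported content.
substituted = Taught("Mary", "her little brother", "7 + 9 = SQRT(256)")

print(rhs_value(observed) == rhs_value(substituted))  # True: same data
print(observed == substituted)                        # False: different reports
```

A data-only view that keeps only `rhs_value` cannot tell the two reports apart. A representation that keeps what was actually taught can.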

In short, while the syntactic and semantic analysis of text might be challenging in NLP, in natural language understanding (NLU) it is actually the trivial part. The serious challenges in NLU relate to thinking about our metaphysical reality and its ontological structure, and about reasoning and human cognition in general.

If that is not the kind of science you are interested in, then there’s plenty to do in text processing (classification, filtering, search, etc.). If you are interested in the different problem of language understanding, then a deeper appreciation of the complexities of NLU and what the most brilliant logicians and cognitive scientists have worked on for several centuries will go a long way. And then, by all means, grab that “big” data and do all that “deep” stuff you love so much, and I, for one, am on the lookout for some of your breakthroughs!

Walid Saba is Principal AI Scientist at Astound.ai, where he works on Conversational Agents technology. Before Astound, he co-founded and was the CTO of Klangoo. He has published over 35 articles on AI and NLP, including an award-winning paper at KI-2008.