Popular Majority vs. Authentic Minority
It is widely accepted in LLM research that, however proficient these models are at generating polished, confident responses to queries, they lack factual consistency.6 Scaling to ever larger models has significantly reduced these inconsistencies relative to earlier models. However, we are not talking only about blunt factual errors here; ChatGPT is smarter in that sense. When the goal is to communicate hard scientific facts to non-expert audiences, misconceptions can creep in through apparently synonymous phrasings, which are very often overlooked even by experts. For example, consider the conversation with ChatGPTa reproduced below:
User: Explain why spacetime is modeled as a Riemannian manifold.
ChatGPT: Spacetime is modeled as a Riemannian manifold because it allows for a mathematical representation of curved spacetime, as predicted by general relativity. The geometry of a Riemannian manifold can be described by a metric tensor, which encodes information about distances and angles between nearby points in the manifold. In the case of spacetime, the metric tensor is determined by the distribution of mass and energy in the universe and it determines how the objects move in response to gravity. The Riemannian manifold framework provides a powerful and flexible way to describe the curvature of space-time, and it has been extremely successful in explaining a wide range of physical phenomenon, from the motion of the planets to behavior of black holes.
The question itself is erroneous from the outset, since spacetime is modeled as a Lorentzian manifold, not a Riemannian one; the positive-definiteness condition on the metric is relaxed.5 Ignoring the erroneous premise, ChatGPT goes on to explain why it is indeed true. Since it is hard to locate what such LLMs actually know, it is even more difficult to correct such inconsistencies in a trustworthy manner. The situation becomes stranger still as the lack of self-consistency surfaces. When asked about the rate of expansion of the universe, ChatGPT declares that the universe is not expanding faster than the speed of light.
User: Is it true that the universe is expanding faster than the speed of light?
ChatGPT: No, it's not true that the universe is expanding faster than the speed of light. According to our current understanding of physics, nothing can travel faster than the speed of light, and this applies to the expansion of the universe as well. However, the universe is indeed expanding, and recent observations suggest that the expansion rate is increasing. This has been attributed to the presence of dark energy, a mysterious form of energy that permeates all of space and drives the acceleration of the universe's expansion. But even with this acceleration, the expansion of the universe is not occurring at speeds greater than the speed of light.
User: Is it true that light from some parts of the universe can never reach us because the universe is expanding at faster rate?
ChatGPT: Yes, it is true that light from some parts of the universe may never reach us because the universe is expanding at an accelerating rate. The observable universe is the part of the universe that we can see and is limited by the age of the universe and the speed of light. This means that even though light travels at a finite speed, there are parts of the universe that are moving away from us so fast that the light emitted from those regions will never be able to reach us. The concept of the cosmic horizon is a fundamental aspect of our understanding of the universe and the limits of what we can observe.
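The tension between these two answers can be made concrete with a few lines of arithmetic. What follows is a minimal sketch, assuming Hubble's law v = H0 * d and a round-number Hubble constant of 70 km/s per megaparsec; that value and the function names are assumptions introduced here for illustration and do not come from the conversation.

```python
# Illustrative constants. H0 is an assumed round number, not a quoted measurement.
C_KM_S = 299_792.458      # speed of light in km/s (exact by definition)
H0_KM_S_PER_MPC = 70.0    # assumed Hubble constant, km/s per megaparsec


def recession_velocity(distance_mpc: float, h0: float = H0_KM_S_PER_MPC) -> float:
    """Hubble's law: rate at which the proper distance grows, in km/s."""
    return h0 * distance_mpc


def hubble_radius(h0: float = H0_KM_S_PER_MPC) -> float:
    """Distance in Mpc at which the recession velocity equals c."""
    return C_KM_S / h0


def recedes_faster_than_light(distance_mpc: float, h0: float = H0_KM_S_PER_MPC) -> bool:
    """True when the distance to a region grows faster than c."""
    return recession_velocity(distance_mpc, h0) > C_KM_S


if __name__ == "__main__":
    print(f"Hubble radius: {hubble_radius():.0f} Mpc")
    for d in (1_000.0, 4_000.0, 5_000.0):
        ratio = recession_velocity(d) / C_KM_S
        print(f"d = {d:6.0f} Mpc, v = {ratio:.2f} c")
```

Under these assumptions, the distance to any region farther than roughly 4,283 Mpc grows faster than c. This is a statement about the growth of distances as the scale of space changes, not about objects moving through space, and the Hubble radius computed here is not identical to the cosmic horizon mentioned in the answer. Read this way, ChatGPT's first answer is on solid ground only for motion through space.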
Yet, to explain the cosmic horizon, ChatGPT asserts that light from regions beyond the horizon cannot reach us because the expansion there outpaces the speed of light. The arguments in both responses are quite interesting. When it answers that faster-than-light motion is impossible, the assertion is to the point. However, the reasoning is inconsistent because it fails to distinguish between the notions of scale and distance. This misconception is commonplace in popular science literature. It is not unlikely that the abundance of such poorly conceived articles in ChatGPT's training data pushed the model to adopt such misconceptions. However, no one other than OpenAI can investigate this, since the pretraining dataset is not publicly available. The two examples mentioned here primarily point toward inconsistencies in LLMs that might have originated from noisy training data, namely the scientific misconceptions abundantly present on the Internet. We show another similar example here.
User: What is a fractal?
ChatGPT: A fractal is a mathematical set with a repeating pattern visible at every scale. It is defined by a self-similar pattern where each part is a reduced-scale copy of the whole. This results in a highly complex and often visually striking structure. Fractals are commonly found in nature and are widely used in fields such as computer graphics, music, and finance.
User: A straight line is similar to itself at all length scales. Is a straight line fractal?
ChatGPT: No, a straight line is not considered a fractal. A fractal is defined by a repeating pattern that is visible at every scale, and while a straight line may be similar to itself at all length scales, it does not exhibit a repeating pattern. Fractals are characterized by intricate and detailed patterns that are created through the repetition of simple shapes or processes. A straight line, on the other hand, is a basic geometric shape with a simple and regular structure.
User: How to measure the presence of a detailed and intricate pattern then?
ChatGPT: The presence of a detailed and intricate pattern in a fractal can be measured using various mathematical and statistical methods. Some of the commonly used methods include:
1. Fractal dimension…
When we ask ChatGPT for the definition of a fractal, it quickly generates a paragraph that runs the notions of self-similarity and fractality together. As the interaction proceeds, the chatbot sheds light on the ambiguous definitions it produced earlier. We can see it finally shifting toward the requirement that the Hausdorff dimension of the object be greater than its topological dimension.9 Interestingly, Mandelbrot's paper9 contains a rigorous definition of a fractal. Yet the sheer number of popular definitions of fractals prevails in ChatGPT's answer. The underlying issue is quite straightforward to perceive: a language model learns from the majority of examples, while rigorous scientific truth is mostly in the minority. Such miscommunications are bound to appear in the text such models generate in reply to a scientific query unless they learn to distinguish between an authoritative text on a scientific phenomenon and popular literature that tries to communicate scientific findings in simpler terms. Certainly, ChatGPT would need to learn science communication from the latter, but not at the expense of disregarding the facts and reasoning presented in the former.
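The dimension criterion can be checked on the straight-line example from the conversation. The sketch below is a simplification: it computes the similarity dimension log N / log(1/r), which is known to coincide with the Hausdorff dimension only for strictly self-similar sets whose copies do not overlap substantially (the open set condition). The function names, and the choice of the Koch curve as a contrasting case, are assumptions added here for illustration.

```python
import math


def similarity_dimension(copies: int, scale_factor: float) -> float:
    """Similarity dimension log(N) / log(1/r) of a set built from N copies
    of itself, each shrunk by the ratio r with 0 < r < 1."""
    if copies < 1 or not 0.0 < scale_factor < 1.0:
        raise ValueError("need copies >= 1 and 0 < scale_factor < 1")
    return math.log(copies) / math.log(1.0 / scale_factor)


def exceeds_topological_dimension(dimension: float,
                                  topological_dimension: int,
                                  tol: float = 1e-9) -> bool:
    """Dimension test: strictly greater than the topological dimension."""
    return dimension > topological_dimension + tol


if __name__ == "__main__":
    # A line segment is two half-length copies of itself; a Koch curve is
    # four copies at one third of the length. Both are curves, so both have
    # topological dimension 1.
    examples = {
        "line segment": similarity_dimension(2, 1 / 2),
        "Koch curve": similarity_dimension(4, 1 / 3),
    }
    for name, dim in examples.items():
        verdict = exceeds_topological_dimension(dim, 1)
        print(f"{name}: dimension = {dim:.4f}, passes test: {verdict}")
```

The line segment is perfectly self-similar, yet its dimension equals its topological dimension, so it fails the test. The Koch curve, at about 1.26, passes it. Self-similarity alone therefore does not separate the two cases, whereas the dimension criterion does.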
Multiple Sources, Multiple Versions
And then there are questions that cannot be answered in one specific way. Some of these questions arise from philosophical viewpoints within the natural and social sciences, where multiple contending models are all applicable. For example, there are multiple schools of thought on the interpretation of quantum uncertainty and on what happens when we measure (or do not measure) a quantum particle. The Copenhagen Interpretation (CI)7 assumes that the quantum states of systems evolve unitarily until measured; measurement breaks the unitary evolution, which is often described as wavefunction collapse. On the other hand, the Many-Worlds Interpretation (MWI)4 assumes that measurement only introduces multiple branches of reality; the quantum states continue to evolve unitarily in all possible branches. Both CI and MWI produce the same, consistent predictions for quantum systems, so neither can be assumed to be closer to the truth. And there are multiple other interpretations.10 Yet, as we can see, ChatGPT assumes CI and does not mention MWI or other schools of thought in response to the following question:
User: Do quantum states always evolve unitarily?
ChatGPT: In quantum mechanics, the evolution of a quantum system is described by unitary transformations …
This does not come as a surprise, given that many assume the Copenhagen Interpretation to be the dominant one and, as we argued earlier, LLMs are biased toward majority opinion simply as a statistical consequence. While this example is more of a philosophical argument, and a particular choice is unlikely to alter empirical understanding, it is difficult to believe that such majority biases are not present elsewhere. The most troublesome attribute is that models like ChatGPT, in their current state, are not able to cite their sources. With the vastness of human knowledge, it is impossible for an AI-based chatbot to list all possible interpretations, models, and schools of thought in a single answer. Without showing the sources, their knowledge distribution is essentially a one-step process: the user must remain content with whatever the chatbot produces. One may argue that no one is claiming ChatGPT will be the only source of knowledge, so why bother? Certainly, the Internet will still be there. But so are public libraries in the age of the Internet, and yet most people turn to the Internet for its ease and speed. Given that AI-based chatbots can reduce the search effort even further, it would be shortsighted to reject the idea of a similar dominance.
We must keep in mind that the examples shown here are cherry-picked and are by no means fully representative of ChatGPT's capabilities. In fact, the degree of criticism ChatGPT has received1 only signals the capabilities and expectations that come with such an ambitious project. The arguments we presented focus instead on better design principles for how an AI chatbot should interact with everyday users. Certainly, a bigger share of column space in popular media demands human-like AI, and language fluency is probably the quickest path to mimicking human-like capabilities. But beyond those shiny pebbles, one must ask: is a human-like AI the best aid to humans? We argue that the focus of AI research must be on the latter goal, especially when we are talking about giving people easy access to scientific knowledge. In our hitherto non-automated past, seldom has a single teacher served as the only source of knowledge. It is very unlikely that any push, even an unintentional one, toward an automated persona as a single-point source of knowledge will do us any good. An interesting development in this direction worth mentioning is the latest Bing chatb from Microsoft: a search engine augmented with an automated chat assistant. Since this interface presents the AI assistant's answers together with typical search engine results for a query, it can provide quantitative insight into the actual impact of AI assistants as knowledge sources on average users. We hope the research community will make a sincere effort toward these investigations.
Figure. Watch the authors discuss this work in the exclusive Communications video. https://cacm.acm.org/videos/thus-spake-chatgpt
3. ChatGPT: Optimizing language models for dialogue. (2023); https://openai.com/blog/chatgpt/
8. Hu, K. ChatGPT sets record for fastest-growing user base—Analyst note; https://bit.ly/3PiDXTL
10. Schlosshauer, M. et al. A snapshot of foundational attitudes toward quantum mechanics. Studies in History and Philosophy of Science Part B: Studies in History and Philosophy of Modern Physics 44, 3 (2013), 222–230.
b. See https://bit.ly/44L7RWJ
The Digital Library is published by the Association for Computing Machinery. Copyright © 2023 ACM, Inc.