The Sparks of AGI? Or the End of Science?

Originally published on The Road to AI We Can Trust

“Pride goes before destruction, a haughty spirit before a fall.”

– Proverbs 16:18

Microsoft put out a press release yesterday, masquerading as science, that claimed that GPT-4 was “an early (yet still incomplete) version of an artificial general intelligence (AGI) system”. It’s a silly claim, given that it is entirely open to interpretation (could a calculator be considered an early yet incomplete version of AGI? How about Eliza? Siri?). That claim would never survive serious scientific peer review. But in case anyone missed the point, they put out a similar, even more self-promotional tweet:

There is, as you might expect, the usual gushing from fans:

And also some solid critical points:

But I am not going to give you my usual critical blow-by-blow, because there is a deeper issue. As I have said before, I don’t really think GPT-4 has much to do with AGI. The strengths and weaknesses of GPT-4 are qualitatively the same as before. The problem of hallucinations is not solved; reliability is not solved; planning on complex tasks is (as the authors themselves acknowledge) not solved.

But there is a more serious concern that has been coalescing in my mind in recent days, and it comes in two parts.

The first is that the two giant OpenAI and Microsoft papers have been about a model about which absolutely nothing has been revealed, not the architecture, nor the training set. Nothing. They reify the practice of substituting press releases for science and the practice of discussing models with entirely undisclosed mechanisms and data.

Imagine if some random crank said, I have a really great idea, and you should give me a lot of scientific credibility for it, but I am not going to tell you a thing about how it works, just going to show you the output of my model. You would archive the message without reading further. The paper’s core claim— “GPT-4 attains a form of general intelligence [as] demonstrated by its core mental capabilities (such as reasoning, creativity, and deduction)”—literally cannot be tested with serious scrutiny, because the scientific community has no access to the training data. Everything must be taken on faith (and yet there already have been reports of contamination in the training data).

Worse, as Ernie Davis told me yesterday, OpenAI has begun to incorporate user experiments into the training corpus, killing the scientific community’s ability to test the single most critical question: the ability of these models to generalize to new test cases.

Perhaps all this would be fine if the companies weren’t pretending to be contributors to science, formatting their work as science with graphs and tables and abstracts as if they were reporting ideas that had been properly vetted. I don’t expect Coca-Cola to present its secret formula. But nor do I plan to give them scientific credibility for alleged advances that we know nothing about.

Now here’s the thing: if Coca-Cola wants to keep secrets, that’s fine; it’s not particularly in the public interest to know the exact formula. But what if they suddenly introduced a new self-improving formula with the potential, in principle, to end democracy, to give people potentially fatal medical advice, or to seduce people into committing criminal acts? At some point, we would want public hearings.

Microsoft and OpenAI are rolling out extraordinarily powerful yet unreliable systems with multiple disclosed risks and no clear measure either of their safety or how to constrain them. By excluding the scientific community from any serious insight into the design and function of these models, Microsoft and OpenAI are leaving the public in a situation in which those two companies alone are in a position to do anything about the risks to which they are exposing us all.

This cannot and should not stand. Even OpenAI’s CEO Sam Altman has recently publicly expressed fears about where this is all going. Microsoft and OpenAI are producing and deploying products with potentially enormous risks at mass scale, and making extravagant, unreviewed and unreviewable claims about them, without sketching serious solutions to any of the potential problems they themselves have identified.

We must demand transparency, and if we don’t get it, we must contemplate shutting these projects down.

Gary Marcus (@garymarcus), scientist, bestselling author, and entrepreneur, is deeply concerned about current AI but really hoping that we might do better.