The signs and portents of the computational oracle

On the uselessness of LLM critique

As you enter the chamber, the thick smell of incense makes you feel a little light-headed. The oracle seems to speak with one voice, but vague shapes behind the veil suggest more than one person is behind it. The dim candlelight makes it difficult to tell. When you leave, you will remember only the prophecies that seemed true and forget what seemed false. Someone you know asked the oracle the same question you did and got a completely different response, but you want to believe it's because the oracle sees further than you, and grasps the details of your friend's life in mysterious ways.

Does this seem like a mystification? If so, it is because OpenAI now deliberately mystifies the operation of their models. The comparison made by Jathan Sadowski of This Machine Kills before the release of GPT-4 turned out to be quite prescient; a veil of mystique has fallen over the latest version of the model at the exact time when it is being pushed relentlessly into our personal and working lives.

Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar.
OpenAI: GPT-4 Technical Report (section 2)

Thus speaks the oracle's attendant. OpenAI has embraced the AI safety critique precisely to the extent necessary to shield their rituals from scrutiny and commercial imitation, and not a nanometer more. It is not an exaggeration to say they are now engaged in a wholly pseudoscientific enterprise, publishing "data" and "results" that no one can possibly replicate.

The "oracle" framing has been embraced not just by critics like Sadowski, but also by writers and technologists more sympathetic to the creative and intellectual potential of LLMs:

LLMs when pushed to extremes will flood the zone with competing stories, causing all narrative sensemaking to break down into fever dream.
Gordon Brander, LLMs and hyper-orality

The priestly caste around the oracle makes prophetic assertions to the public about its powers. We are not meant to understand in the same way they do; we are not worthy of it, cannot be trusted with the forbidden knowledge used to inscribe thought onto electric webs of silicon. ChatGPT doesn't just predict words in a sequence: it is "reasoning." Its understanding of meaning provides a foundation so solid that we will soon build semantic towers (of Babel) atop it, towers that will eventually formalize away all of the vagueness inherent in ordinary language. The pageantry around the oracle sometimes suffices to get the neophytes of the AI cult to revise their theories of "truth" until they match the most fanciful interpretation of the latest impressive demo, a move they rationalize to themselves using the pseudo-Bayesian terminology of "updating their priors."

The force of whatever critique I might make matters little. We can try to identify shortcomings in how ChatGPT works, point out dubious epistemic and metaphysical assumptions underlying people's understanding of what it does, or use it as a programmable vibes-based text generator, but our ability to truly understand it is the same in any case. The model doesn't yield reliably deterministic output. Its internal mechanisms are now trade-secret information of a commercial venture bootstrapped out of what seems like an extremely clever accounting scheme applied to 501(c)(3) charitable contributions. All of us now, save for the lucky few researchers deemed worthy of OpenAI's initiation rites, are like the supplicants seeking insight and predictions. We have been unwittingly prepared for this role by what we popularly call "The Algorithm": we beg for the favor of the recommender system's attention, trying to make sense of why we have suddenly become unworthy of its gaze. Boosters and critics alike just perform post-hoc sensemaking about what the model does on the basis of the limited window we see, just like a YouTuber desperate to maintain relevance and a steady stream of ad revenue, searching for a reason why their videos don't land like they used to.
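To make the point about non-deterministic output concrete: at each step a language model produces a probability distribution over possible next tokens, and the deployed system typically samples from that distribution rather than always picking the most likely token. The sketch below illustrates the idea with an invented toy vocabulary and made-up probabilities (nothing here reflects OpenAI's actual implementation, which is exactly the problem); it only shows why two people asking the oracle the same question can receive different answers.

```python
import random

# Toy stand-in for a model's next-token distribution. The vocabulary and
# probabilities are invented for illustration; real models distribute mass
# over tens of thousands of tokens.
VOCAB_PROBS = {
    "oracle": 0.4,
    "model": 0.3,
    "machine": 0.2,
    "mirror": 0.1,
}

def sample_next_token(probs: dict, temperature: float, rng: random.Random) -> str:
    """Sample one token, rescaling the distribution by temperature."""
    if temperature == 0:
        # Greedy decoding: deterministic, always the most likely token.
        return max(probs, key=probs.get)
    # Raising probabilities to the power 1/T and renormalizing is the usual
    # temperature rescaling (equivalent to dividing logits by T).
    weights = [p ** (1.0 / temperature) for p in probs.values()]
    return rng.choices(list(probs), weights=weights, k=1)[0]

# Two "supplicants" with the same prompt can draw different tokens...
run_a = sample_next_token(VOCAB_PROBS, temperature=1.0, rng=random.Random(1))
run_b = sample_next_token(VOCAB_PROBS, temperature=1.0, rng=random.Random(2))

# ...while greedy decoding (temperature 0) is repeatable.
greedy = sample_next_token(VOCAB_PROBS, temperature=0, rng=random.Random())
print(run_a, run_b, greedy)
```

None of this, of course, tells us anything about the specific distributions the deployed models produce, which is the part kept behind the veil.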

As much as I might desire to argue against overblown claims, I now fear that critique of these models reinforces the hype around them. And as someone who questions whether they should be built and introduced at all, I do not think critics should devote time and energy to the enterprise of generating ideas that could be used to "fix" them. I remain wholly unconvinced that text sequence prediction models can simply "scale" their way to a generalized form of intelligence.

You may think I am incorrect about this, that I have failed to fully grasp the moment in history that I live in. If that's the case, I'd ask you to consider one important distinction. In a prior scientific revolution, the motion of celestial bodies was observable by anyone with a telescope and the time to carefully record patterns in their movements (certainly no small ask in the early modern period). Now, when we are said to be on the cusp of a revolution in our understanding of intelligence, its inner mechanisms are being pulled away from us, and we are reduced to myth-making and narrative interpretation rather than the testable hypotheses that are necessary (if not sufficient) conditions for any serious scientific enterprise. OpenAI wants to play the role of both Galileo and the Catholic church in the story of the Copernican revolution, lauding itself for its amazing discoveries while also suppressing the hidden knowledge that might lead to ruin in the hands of the heretics inspired by it. Perhaps I am right about language models. Perhaps I am wrong. The real question is: with OpenAI's obscurantist and paternalistic attitude towards the public, how are any of us supposed to tell the difference?