Large language models are stranger than most people think

Yes, they aren’t human. Yes, they are always making stuff up (always plausibly, sometimes even truthfully). But the strangeness runs deeper than that – after all, if you want “entities that make stuff up hoping to sound plausible,” you just need to turn to your nearest social network.

The key unexpected thing about large language models as widely used AIs is where their information comes from. Whatever meaning of “knowing” you choose to use, everything they know about anything you ask them comes from the accumulated textual content of the Internet.

Think about that for a second. Forget about architectures and algorithms and focus on information about the world in its most abstract sense. In these systems it doesn’t come from observation, experimentation, or curated databases of knowledge in some structured form. Companies and individuals, for their own reasons (none of which was to educate an AI), spent a few decades writing and posting online about pretty much everything, with no coordination, filter, or plan. And now we put all of that text through algorithmic blenders and bet that, through purely linguistic calculus — the patterns of words, not the patterns of the world — we can query this problematic avatar of the Internet’s collective textual record and get answers that make sense in the world.
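To make “the patterns of words, not the patterns of the world” concrete, here is a deliberately tiny sketch. It is purely illustrative (the corpus is invented, and real models use neural networks trained on vastly more text), but the spirit is the same: the program learns only which words tend to follow which words, and can then generate plausible-sounding sequences with no representation of cats, mats, or anything else behind them.

```python
# A toy "patterns of words" model: count which word follows which in a tiny
# invented corpus, then sample continuations from those counts alone.
# Nothing here models the world; it only models word-to-word transitions.
import random
from collections import defaultdict, Counter

corpus = (
    "the cat sat on the mat . "
    "the dog sat on the rug . "
    "the cat chased the dog ."
).split()

# The model's entire "knowledge": how often each word follows each other word.
transitions = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    transitions[current_word][next_word] += 1

def generate(start: str, length: int = 8, seed: int = 0) -> str:
    """Extend a prompt by repeatedly sampling a likely next word."""
    rng = random.Random(seed)
    words = [start]
    for _ in range(length):
        followers = transitions.get(words[-1])
        if not followers:
            break
        candidates, counts = zip(*followers.items())
        words.append(rng.choices(candidates, weights=counts, k=1)[0])
    return " ".join(words)

print(generate("the"))  # plausible word sequences, no model of the world behind them
```

Scaled up by many orders of magnitude in data and model capacity, this is still the bet described above: that querying word patterns alone yields answers that hold up in the world.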

To whatever degree this works — and it works better than it has any right to, and worse than early-hype promoters claim or reactive critics fear — it’s astounding philosophically and culturally. It bears on key debates in linguistics about how much can be learned from linguistic form alone. And it’s proof of the sheer uniqueness of the Internet as a civilizational fact, sitting at the intersection of enormous size and computational accessibility.

There’s also an issue of economics that’s usually discussed only tangentially. No company in the world could have come even close to building from scratch the text with which these models have been trained. These tools are built on top of a collective, public, largely unpaid cultural endeavor, mostly with Open Source tools, and on technical infrastructure originally developed for and by government and academia. While acknowledging the far-from-small algorithmic and engineering feats needed to build them, let’s not forget how much of it rests on the astounding, if sometimes poorly understood, managed, or defended, collectively built informational public good that is the Internet. (It’s also ironic, though not surprising, that most of the companies that have built these AIs would have been unable to do so in the sort of stunted, closed pseudo-Internet they are constantly trying to build.)

At its core, the success of large language models also opens up questions about the diversity of informational sources from which one can build smart entities, including, as somewhat idealized examples:

  • Expert systems: only experts’ codified knowledge, no data, text, or rules.
  • Traditional machine learning: only data about the world, no text, rules, or codified knowledge.
  • AlphaZero-like approaches: only the rules, no data, text, or codified knowledge (a toy illustration follows this list).
  • ChatGPT-like approaches: only text about the world, no data, rules, or codified knowledge.
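As a companion to the sketch above, here is an equally tiny illustration of the “only the rules” case. It is not AlphaZero’s actual method (which learns through self-play reinforcement learning with neural networks and tree search); it is just the simplest way to show competence derived from nothing but a game’s rules: exhaustive search over a toy game (Nim), with no data, text, or expert knowledge anywhere in the program. The game and its parameters are invented for illustration.

```python
# "Only the rules": given nothing but the rules of a toy game (Nim: take 1-3
# stones per turn, whoever takes the last stone wins), derive strong play by
# searching the game tree. No data, no text, no hand-coded expert knowledge.
from functools import lru_cache

def legal_moves(stones: int) -> tuple[int, ...]:
    """The rules: on your turn you may take 1, 2, or 3 stones, but not more than remain."""
    return tuple(take for take in (1, 2, 3) if take <= stones)

@lru_cache(maxsize=None)
def value(stones: int) -> int:
    """+1 if the player to move can force a win, -1 otherwise (negamax over the rules)."""
    if stones == 0:
        return -1  # no stones left: the previous player took the last one and already won
    return max(-value(stones - take) for take in legal_moves(stones))

def best_move(stones: int) -> int:
    """Choose the move that leaves the opponent the worst possible position."""
    return max(legal_moves(stones), key=lambda take: -value(stones - take))

print(best_move(10), value(10))  # -> 2 1: taking 2 leaves a multiple of 4, a losing position for the opponent
```

Exhaustive search only works because the toy game is tiny; AlphaZero-style systems approximate the same idea at scales where the full game tree is out of reach.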

All of these, and many mixtures and alternatives, seem to work, to different degrees and with different efficiencies across different problems, and there’s no reason to think we have exhausted the set of foundationally different ways in which intelligences can work. The sooner and more completely we shed our anthropomorphizing instincts — including asking “how human” something is instead of just figuring out what it can do and how — the more we will be able to appreciate, explore, and take advantage of what seems to be a world in which cognitive capabilities are not just easier to build than we assumed but also wildly, fascinatingly, usefully diverse.