March 27, 2023

Do Submarines Swim?

By now I’m sure everyone has seen enthusiastic AI boosters being completely wowed by ChatGPT and other Large Language Models (LLMs), and also AI detractors pointing out the flaws of these models. For example, here’s a big compilation of such LLM failures. I think I’m probably more on the critical side, but I think a lot of the criticisms fall flat. Or better: the examples are failures, but the conclusions people draw on the basis of those failures are flawed. (I’m going to use “AIs” and “LLMs” interchangeably, because pretty much all the examples I have in mind are LLMs.)

There are all sorts of difficulties with intellectual property regarding how these AIs are trained, and all sorts of difficulties with the potential uses they are being put to. Let’s leave those aside for now. A great deal has already been written on these subjects by people much better informed than I am. One influential criticism along these lines is the already classic (and already out-of-date!) Stochastic Parrots paper.

There are two specific criticisms I have in mind here that I want to address. On the basis of examples like those linked to above, people will say that the AI app “doesn’t understand” what it’s being asked, or that the AI “can’t think”. I don’t think these criticisms help, because they’re saying that the AI lacks some property (understanding, thinking) that we don’t really know how to define. I’ve already seen AI boosters complain that this is moving the goalposts. And it sort of is: if the criterion is something as vague and ill-defined as “understanding”, then of course the goalposts are going to move around.

The history of AI is basically the history of discovering that things we thought were tightly correlated with intelligence (whatever that means) are not, in fact, that tightly correlated with intelligence. Pretty much as soon as we invented computers, we discovered that the ability to do arithmetic is not the exclusive preserve of intelligent animals like us. We used to think that playing high-level chess was something that only intelligent creatures like us could do; we’ve had since the late 90s to reconcile ourselves to the mistakenness of that presumption. We used to think that writing convincing grammatical sentences, and responding appropriately to the input of a user, was something only intelligent beings could do, and it looks like ChatGPT and its ilk are demonstrating that that maybe isn’t right either. So if you’re on the side of the AIs, it’s not that surprising that this is starting to look like the critics are just No-True-Scotsmanning their way through the debate.

So “AIs don’t understand” and “AIs can’t think” are, I think, unhelpful criticisms. I can’t put it better than Edsger Dijkstra, who said “The question of whether a computer can think is no more interesting than the question of whether a submarine can swim.” Let’s put aside the question of whether these LLMs can think, and let’s start enumerating and dissecting the things they can do and the things they can’t.

One obvious thing that LLMs are notorious for is something that, for some damn reason, people are describing as “hallucinating”. I hate this term, but there it is. LLMs are no good at differentiating between when they’re relaying a fact and when they’re making stuff up. They produce, in the technical sense of Harry Frankfurt, bullshit. That is, they do not care whether the information they provide is true or false. (There are so many caveats I want to add to this last sentence that I’m going to write a whole paragraph at the end all about them.) What is missing, arguably, is any sort of model of the way the world is, or any sort of concept of a “fact”, or of truth and falsity. On top of that, LLMs are also missing any sort of introspection into what facts they know and, importantly, what they don’t know.

These features – a concept of a fact, and introspective access to our own state of knowledge – are obviously key features of our cognitive make-up, and arguably they are both important parts of whatever intelligence is. (Draw your own connections between introspection and Gödel, Escher, Bach here.)

The other thing that LLMs seem to be terrible at is, for want of a better term, common sense. They stumble over simple word puzzles and say completely ridiculous things, like claiming that it takes nine women one month to have a baby, or listing people six feet tall when asked for a list of celebrities under 5'11". What’s going on in these examples is that the AI is failing to infer obvious conclusions from the information it’s been given (and from background information any normal person would have), conclusions that would demonstrate that its answer is obviously incorrect. And again, this ability to draw conclusions from the information we have is something we might take to be a core part of intelligence or understanding.
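To see how trivial the missing check is, here’s a toy sketch in Python. The names and heights are invented for illustration; the point is just that the constraint the LLM fails to apply is a one-line comparison once the heights are put in a common unit.

```python
# Toy illustration of the constraint an LLM fails to apply when asked
# for celebrities under 5'11". Names and heights are invented examples.

def to_inches(feet: int, inches: int = 0) -> int:
    """Convert a height in feet and inches to inches."""
    return feet * 12 + inches

# Hypothetical data: (name, height in inches)
people = [
    ("Person A", to_inches(6, 0)),   # 72 inches: six feet tall
    ("Person B", to_inches(5, 9)),   # 69 inches
    ("Person C", to_inches(5, 11)),  # 71 inches: exactly at the cutoff
]

cutoff = to_inches(5, 11)  # 71 inches

# Keep only people strictly under 5'11". Person A (six feet) is excluded,
# which is exactly the inference the LLM failed to draw.
under_cutoff = [name for name, height in people if height < cutoff]
print(under_cutoff)  # ['Person B']
```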

What’s fascinating is that these aspects of cognition – facts, introspection, inference – are among the very first things that people thought an artificial intelligence would need in order to be considered as such. People writing in the fifties argued that these were going to have to be part of any computer system we might want to call intelligent. So this is not moving the goalposts. These have been the goalposts all along; we just got seduced by the bullshit and the ability to play games. We have decades of research on logic-based AI, expert systems and knowledge representation systems. And some of these ideas live on in projects like theorem provers, non-monotonic logics and ontologies (in the computer science sense).
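To make that contrast concrete, here’s a minimal sketch, in Python, of the sort of thing a logic-based system does: explicit facts, explicit rules, and conclusions derived by repeatedly applying the rules to the fact base. The facts and rules are invented for illustration, and real systems (Prolog, expert-system shells, description-logic reasoners) are vastly more sophisticated, but the basic shape is the same.

```python
# A toy forward-chaining inference engine: explicit facts, explicit rules,
# and conclusions derived by applying rules until nothing new follows.
# Facts and rules are invented for illustration.

facts = {"socrates_is_human"}

# Each rule: (set of premises, conclusion). If every premise is already
# in the fact base, the conclusion gets added to it.
rules = [
    ({"socrates_is_human"}, "socrates_is_mortal"),
    ({"socrates_is_mortal"}, "socrates_will_die"),
]

changed = True
while changed:
    changed = False
    for premises, conclusion in rules:
        if premises <= facts and conclusion not in facts:
            facts.add(conclusion)
            changed = True

print(sorted(facts))
# ['socrates_is_human', 'socrates_is_mortal', 'socrates_will_die']
```

The point being: a system like this only ever asserts what it can trace back to its facts and rules. It can’t bullshit, although admittedly it can’t do much else either.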

OK before concluding, let’s have that paragraph of caveats. You can skip this paragraph if nothing about the following sentence bothered you: “That is, they do not care whether the information they provide is true or false.” First up, sure, LLMs don’t “care” about anything: it’s sort of implicit that there’s an agent making assertions in Frankfurt’s characterisation of bullshit, but I think this is inessential. Second, on some construals of “information”, it’s only information if it’s true, otherwise it’s just sparkling speech act. Dretske, for example, said that “false information and mis-information are not kinds of information—any more than decoy ducks and rubber ducks are kinds of ducks”. I think he’s wrong here. Yeah, eat it, Dretske. Third, Frankfurt’s definition of bullshit also includes a clause that the speaker is trying to convince you of what they’re saying, and again, LLMs don’t have aims, so this seems to undermine my suggestion that LLMs publish bullshit. OK fine, then let’s say that LLMs and similar are engaging in something like Frankfurtian bullshit except that it’s purposeless: the important point is that they’re producing text which purports to state facts with no regard for their truth or falsity. This seemed too much for a footnote, and maybe I should have just left it out, but that pre-emptive quibble-quashing is so deeply ingrained in my former academic writing brain, I had to get it in there. Sorry. Also I don’t have a conclusion really. Sorry again.

Large Language Models (and other kinds of Machine-Learning-based AIs) are remarkably good at some things, and they’re getting better at an incredible rate. But how good they are at some things makes the obvious flaws they have all the more striking. Pointing to these flaws and concluding that LLMs “aren’t thinking” or “don’t understand” is an unhelpfully vague way to frame the criticism, and plays into the hands of AI defenders who accuse critics of moving the goalposts. It is possible to define more clearly what precisely the failures of these AIs are, and when we do that, it is striking that they appear to be bad at precisely the things that the old-fashioned logic-based AIs were good at. I’m not arguing for a return to Good Old-Fashioned AI, or that expert systems are a viable route to Artificial General Intelligence, but it really is striking how closely the most glaring flaws of these LLMs match up with the strengths of the symbolic AI approach.
