Prompted by Keith Frankish's recent streamed discussion of LLM intentionality on YouTube, there's a particular idea I wanted to share which I'm not sure is widely enough appreciated but which I think gives a valuable perspective from which to think about LLMs and what kinds of intentions they may have. This is not an original idea of my own -- at least some of my thinking on this was sparked by reading about the Waluigi problem on LessWrong.
In this post, I'm going to be making the case that LLMs might have intentionality much like ours, but please understand that I'm making a point in principle and not so much arguing for the capabilities of current LLMs, which are probably not there yet. I'm going to be talking about what scope there is to give the benefit of the doubt to arbitrarily competent future LLMs, albeit ones that follow more or less the same paradigms as those of today. I'm going to try to undermine some proposed reasons for skepticism about the intentions or understanding of LLMs, not because I think the conclusions are wrong but because I think the arguments are too weak to support them.
I should also note that I will follow Keith (and Daniel Dennett, and others) in assuming an interpretivist account of intentions. That is, we should ascribe intentions to a system if and only if it helps to predict and explain the behaviour of the system. Whether it *really* has intentions beyond this is not a question I am attempting to answer (and I think that it is probably not determinate in any case).
The basic idea I want to introduce is that LLMs might have intentions and agency on multiple levels, so we may be missing something if we restrict our analysis to one level alone.
Let's start with the story of Blake Lemoine, who was famously fired from Google for spreading claims that its LLM LaMDA was conscious. Many were quick to ridicule him, pointing out that you can make LLMs claim anything you want them to. The very same LLM that claims to be conscious can also be made to say that it is not conscious. For many, this is evidence that LLMs do not have stable beliefs at all (certainly about themselves), so LLMs cannot be conscious. Similarly, you can ask an LLM to play the role of someone arguing for or against a particular proposition, and it will do so. Presumably, this means that the LLM itself has no opinions on such issues, only roles it can be asked to play. And presumably it no more experiences the intentional states of its roles as its own than a human actor on stage experiences those of a character in a play.
On to Keith's fascinating talk, where he expounds the view that we should see LLMs as playing a "chat game", in which all they care about is making plausible utterances; they have no goals beyond this. Keith allows that LLMs might have beliefs about the world, but holds that their intentionality is still severely lacking. In particular, they have no illocutionary or perlocutionary intentions, as defined by the philosopher J.L. Austin. An illocutionary act is an act that is necessarily performed via speech but which goes beyond mere speech and usually has real-world consequences (e.g. pronouncing someone guilty, or making a promise), whereas a perlocutionary act is the real-world effect of a speech act on an interlocutor (e.g. shocking someone with an outrageous statement, or persuading someone of something). According to Keith, then, all an LLM wants to do is deliver plausible utterances. It has no intentions that go beyond this. It doesn't intend to achieve anything with its utterances, and it doesn't intend to have any particular effect on its interlocutor. On the interpretivist account of intentionality, this is sufficient to predict what it will do and so we need no more.
However, there's another way of looking at it. In the case of Searle's Chinese Room, which I discussed previously, we can see the man in the room (Searle) as an agent and the room or system as a quite distinct agent, each with its own beliefs and goals and other intentions. One speaks English, the other Chinese, but there's no reason to think that the differences should be confined to language alone: one may prefer American democracy and the other might prefer Chinese authoritarianism, for example. I think we can see what LLMs are doing in much the same way. We can see the underlying LLM, like Searle, as an agent that is only playing the chat game, as Keith suggests. But in order to play the chat game, it is, like an actor, playing a role, and that role, like the Chinese Room, may have intentions of its own.
Suppose we ask the LLM to play a game of chess with us. Unless you think that there is some fundamental reason why LLMs will never be able to play chess competently (and I doubt there is), it seems that we could, with the right prompts, implement some sort of chess AI using an LLM. On Keith's analysis, it would seem that we should only ascribe "chat game" intentions to the LLM. It says "Qb4" only because it thinks that this is a plausible utterance for a player of chess by correspondence. And yet I'm pretty sure that Keith would ascribe intentions such as "putting the king in check" to a dedicated chess AI. It seems that Keith should therefore also ascribe the same intentions to a chess AI implemented via LLM. These would then be the illocutionary intentions he denies! By the locutionary act of saying "Qb4", the LLM performs the illocutionary act of moving the queen to b4 and putting the king in check.
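To make this concrete, here's a minimal sketch of what such a wrapper might look like. The board handling uses the python-chess library, but the query_llm() function is a hypothetical stand-in of my own for whatever completion API the LLM exposes -- the details are assumptions, and the point is only that the LLM's "chat game" utterances get cashed out as chess moves:

```python
import chess  # python-chess: tracks the board state and validates moves


def query_llm(prompt: str) -> str:
    """Hypothetical stand-in for a call to some LLM completion API."""
    raise NotImplementedError


def llm_move(board: chess.Board) -> chess.Move:
    # Prompt the LLM to play the role of a strong correspondence-chess player.
    prompt = (
        "You are an expert chess player in a correspondence game.\n"
        f"The current position (FEN) is: {board.fen()}\n"
        "Reply with your next move in standard algebraic notation, "
        "e.g. 'Qb4', and nothing else."
    )
    reply = query_llm(prompt).strip()
    return board.parse_san(reply)  # raises if the reply is not a legal move
```

Whether such a wrapper plays well is an empirical question about the LLM, but notice that to predict its output we'd naturally reach for intentions at the level of the role ("it wants to get its queen to b4"), not the level of the wrapper.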
Now suppose that instead of asking the LLM to play chess, we ask it to play the game of "therapy", where moves in the game consist of choosing lines to say in a dialogue with an anxious patient. We prompt the LLM appropriately so that it knows it is to play the role of an expert and concerned therapist who wants to help the patient, just as before we prompted it to play the role of an expert chess player.
If it makes sense to interpret the chess playing LLM as having intentions about putting kings in check and so on, then I'm not sure why we should not interpret the therapy-playing LLM as having intentions about advising and helping the patient. If we do, then these would count as perlocutionary intentions, as they are intentions about how the speech acts should affect the interlocutor. Once again, even if the LLM is only playing the "chat game", that doesn't mean that it cannot do so by summoning a simulacrum with intentions of its own.
I think the mistake Keith is making is to restrict his analysis to the level of the LLM itself and to fail to consider that there may be a distinct agent supervening on top of it in much the same way as the Chinese Room supervenes on top of Searle. Perhaps the LLM itself is only playing the "chat game", but it has summoned a simulacrum which is playing another game altogether. The point is that Keith's analysis of the intentions of the LLM may be correct and yet there may still be illocutionary and perlocutionary intentions happening at other levels of analysis.
I think this sort of analysis shows how the instability of the views espoused by LLMs such as LaMDA doesn't mean much -- it could be that these are the views of different simulacra being summoned by the underlying LLM, and it could even be the case that some of these simulacra could be conscious, at least in principle.
Objections
"We don't need to interpret the intentions at these higher levels"
Keith seemed to be of the view that we should not ascribe illocutionary or perlocutionary intentions to LLMs because we don't gain anything by doing so -- the "chat game" analysis is sufficient. I agree with Keith that we should not ascribe intentions where doing so is not helpful, but I fail to see why it is not helpful in the cases above. In order to know what is a plausible thing to say for an LLM competently playing the role of a chess player in a "chat game", I need to know what moves a chess player might plausibly want to play in a game of chess. By treating it as an agent with the intentions of a chess player, I get a handle on what it will do, and I don't see how I can do that without taking such a stance. As long as we're agreed on being interpretivist about intentions, these illocutionary intentions seem to be indispensable and therefore unavoidable. And the same goes for an LLM playing the role of therapist and that role's perlocutionary intentions.
"The behaviour of LLMs is nothing like the behaviour of social beings like humans"
Perhaps, but now we're talking about the limitations of current LLMs. Future LLMs may be competent enough to behave like humans if prompted to behave like humans. It's plausible that they could never get there with their current architecture, but we don't know that. If we want to give them the benefit of the doubt, then we should assume they can become arbitrarily competent unless we have a strong argument to the contrary. In any case, even a relatively dumb AI could plausibly have intentions to deceive, inform, or advise. I don't think you need to get to full human-level AGI to have perlocutionary intentions. Recent AIs such as Meta's Cicero, which excels at the game of Diplomacy, show as much. If you know Diplomacy, you know that this game is all about performing speech acts with decidedly perlocutionary intentions. These dedicated AIs may not be LLMs, strictly speaking, but they do incorporate an LLM component, and I think they demonstrate the point in any case.
"LLMs are just stochastic parrots, predicting the next token"
LLMs are black boxes. We don't know how they work. We only know what they have been selected to do. They have been selected to be good predictors of the next token, but how they achieve this is left open. It may be that the best way to predict the next token that would be uttered by a human is to simulate a human. And if the very best, future LLMs are indeed doing something like simulating humans, then why shouldn't they have all the intentional states of a human (at least on an interpretivist account of intentionality)?
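For concreteness, here is roughly what the outer loop of "predicting the next token" looks like -- a sketch using the Hugging Face transformers library, with GPT-2 as a small, familiar example rather than anything state of the art. The loop itself is completely transparent; everything interesting (and opaque) happens inside the call to the model:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# GPT-2 stands in here for any causal language model.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tok("The queen moves to", return_tensors="pt").input_ids
for _ in range(10):
    logits = model(ids).logits            # a score for every candidate next token
    next_id = logits[0, -1].argmax()      # greedy choice: take the most probable one
    ids = torch.cat([ids, next_id.view(1, 1)], dim=-1)

print(tok.decode(ids[0]))                 # the prompt plus ten generated tokens
```

Nothing in this loop says how the model arrives at its scores; that is exactly the part that has merely been selected to be good.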
"LLMs have no stream of sensory information about the world -- all they have is text, so they cannot have intentions about the world in the way that we do."
Our brains don't exactly have a raw stream of sensation either -- all they have is nerve impulses. From these nerve impulses they have been trained, both by evolution and in the course of development, to construct a model of the world. I think that focusing on the fact that all LLMs see is text is a bit like focusing on the fact that all brains see is nerve impulses. The sheer amount of data used to train LLMs is plausibly enough to give them as rich an understanding of the world as we have from our nerve impulses. Sure, it's not the same thing, but it seems enough to me to ground their intentions. On this view, all of human civilisation counts more or less as the sensory apparatus producing the "nerve impulses" of the text they have been trained on. I think this puts them as much in touch with the real world as we are. We perceive the real world via the nerve impulses sent from our sensory apparatus, and they perceive the real world via the bit stream of text sent from theirs. We learn to assemble this stream of nerve impulses into a rich model of sensation, perception, and the world itself, and they do the same with their text stream.
Once trained, LLMs do indeed have a pretty limited stream of sensory information, consisting only of the text typed in by an interlocutor. But I personally find it implausible that all intentionality disappears as soon as you stop getting a rich feed of sensory information. People in sensory deprivation tanks have intentions every bit as real as people outside them.
"LLMs are passive, only responding to prompts and never taking action off their own bat"
Tell that to AutoGPT! This is a system which, with a relatively small and simple bit of code wrapping ChatGPT, is able to make plans and execute them autonomously. There's nothing logically necessary or profound about the passivity of LLMs. It's trivial to make them into agents.
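To illustrate how thin that wrapper can be, here is a minimal sketch of an agent loop in the AutoGPT style. The query_llm() function and the toy tools are stand-ins of my own, not AutoGPT's actual code; the point is just that the wrapper proposes nothing itself -- it only executes whatever action the LLM names and feeds the result back in:

```python
def query_llm(prompt: str) -> str:
    """Hypothetical call to some chat-completion API."""
    raise NotImplementedError


# Toy stand-in tools; a real wrapper would offer web search, file I/O, and so on.
TOOLS = {
    "search": lambda query: f"(pretend search results for {query!r})",
    "note": lambda text: f"(noted: {text})",
}


def run_agent(goal: str, max_steps: int = 10) -> str:
    history = f"Goal: {goal}\n"
    for _ in range(max_steps):
        reply = query_llm(
            history + "Reply with the next action as 'tool: argument', or 'DONE'."
        )
        if reply.strip().upper().startswith("DONE"):
            break
        tool, _, arg = reply.partition(":")
        result = TOOLS.get(tool.strip().lower(), lambda a: "unknown tool")(arg.strip())
        history += f"Action: {reply}\nResult: {result}\n"
    return history
```

All the planning, such as it is, comes from the LLM; the wrapper just keeps asking "what next?" and does as it is told.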
"But it's just playing a role, like an actor. It doesn't experience these intentions."
This doesn't matter if we're being interpretivist. Besides, we don't know to what extent and in what detail a future LLM might be simulating the simulacra. If the simulation is detailed enough, I'm not sure that the simulacrum might not experience those intentions, even if, like Searle in the Chinese Room, the LLM itself does not.
This is great as far as it goes. I think the next step is to figure out what sort of thing these next-level entities are. I think they're very much like fictional characters, with all that implies, and that's not to minimize them.
Today, they aren't very coherent or stable entities. Perhaps someday it will be possible to make much better fictional characters that people could care about without it seeming too weird?
If it's anything like chatting with today's LLM, the character is represented as plain text that fits within a context window. Maybe it will be a text database instead. But I suspect that researchers will come up with better ways to do it and it won't be called an LLM at all.
I'm delighted to see such constructive use of the interpretivist criterion. Pursuing this path will drive us to sharpen up tools to distinguish these levels of intentionality, which inherently blend into each other. For example, we'll find it difficult or impossible to decide whether to attribute some bad chess moves to the chess-player role or to the underlying role-playing model.
Going forward, designing intentional layers seems like a fundamental engineering technique for building more complex learning systems. We are already doing this in some sense with fine-tuning etc., but we don't yet have a handle on how to generalize layering.
I guess this also raises the question of how to manage horizontal interactions between roles on an underlying role player. Since they have different intentions, they will have misunderstandings and conflicts. They'll have to evolve (sub)languages appropriate to their needs. Etc.
What is the intentionality of a traffic light? On the one hand, it seems to desire to ensure the smooth and safe flow of traffic through the intersection it was posted at. On the other hand, there are those times when you are visibly the only car at the intersection and yet the light holds a steady red for minutes... perhaps its true intention is to be evil and torture human souls. Sigh. So much to ponder.
OBJECTIONS
Traffic lights are just simple systems that operate on a timer and a basic program:
- Traffic lights are black boxes. We only know how the algorithm was intended to operate, but we have no way to prove that it is in fact operating as designed. We can only measure the external behaviour exhibited by the lights in operation. Perhaps the most efficient way for the traffic light system to behave as designed is to simulate human consciousness and play out the role of a traffic light operating as intended?
I guess we'll just never know.
But... traffic lights aren't black boxes. We can see clearly how they work.
And yet, I don't completely disagree with your point, either. The underlying question should be whether LLMs *can* simulate human consciousness (or a small part of human consciousness which would include intentionality). We know that a traffic light cannot simulate consciousness or intentionality. The current capabilities of our most advanced LLMs are unclear to me -- what *can* they simulate? And, hopefully, in time we will create AIs that can clearly simulate intentionality and consciousness.
I think "it's a black box" is a bit of a cop-out, as only some components of NNs are actually black boxes. Some components or functionalities are clear. So... it seems like rather than throwing our hands in the air and saying "it's a mystery!", we should start by figuring out what a specific LLM *can* possibly simulate, and, from that, figuring out whether intentionality is on the table or not.
Why am I getting comments on this post all of a sudden? There's been nothing for months and then four all in a day from different accounts? I was thinking you all might be LLMs yourselves and this was a new wave of bot spam, but your comments are not bad and your accounts are all too old.
Your blog post was shared on Hacker News. Or at least, that's where I saw it.
Thanks!