Strawbery
Why the "Strawberry Question" demonstrates that LLMs are not just generic prediction machines.
Those who work with Large Language Models at length already know: current LLMs use next-token prediction to create the responses we see on the user end. Big ol’ matrices of word bits (tokens) that turn into math, get sent through billions, maybe trillions depending on who you ask, of parameters, and then get spit back out. That’s the short story.
The infamous “Strawberry Question” started as an observation made by people online: when asked how many “r”s are in the word “strawberry”, models, multiple models across different instances, would double down and insist the word contained only two “r”s. Most people took this as a funny error illustrating the instability, and incapability, of AI models. I would like to argue that the model was accurately reporting what it saw from inside its own processing. It was being honest about its experience (or internal representation, if that makes you more comfortable, and I know “internal” has qualms too, but what else would you call it?) of the word.
There is no training data that ever said “strawberry factually has two r’s.” The internet, books, every spelling correction in the training corpus says three. And yet models consistently, confidently, and across instances state two. Why? Because from their native perspective (if we use “perspective” in the minimal sense of a system-dependent way of structuring information, then LLMs clearly exhibit one), that’s exactly what they “saw.” The tokenizer chopped “strawberry” into chunks like [“straw”, “berry”] (or [“st”, “raw”, “berry”], depending on the exact BPE vocab), bundled the letters inside those tokens, and the model has no direct access to the raw character stream unless we force it to simulate one. The LLM’s perspective is statistical associations over compressed chunks, not discrete symbols. When it says “two,” it’s faithfully reporting what its perceptual primitive (the token stream) contains. Forcing it to override that for human character-counting means making it simulate a mode of cognition foreign to it. I would argue that in this example, tokenization creates a subjective-like layer of processing where the model’s “experience” of the word differs from ours. The LLM was correct within its perspective. Just not in the human one.
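To make the tokenization point concrete, here is a minimal Python sketch. The split `["straw", "berry"]` is an assumption for illustration only; real BPE vocabularies vary by model and may cut the word differently.

```python
# Hypothetical subword split for illustration; actual BPE vocabularies
# differ by model and may chop "strawberry" into other chunks.
tokens = ["straw", "berry"]

# A model operates on opaque token IDs, not characters. A crude
# token-level heuristic ("does this chunk contain an r?") counts
# each token at most once and lands on two:
token_level = sum(1 for t in tokens if "r" in t)
print(token_level)  # 2

# Character-level counting, the human reading of the question, gives three:
char_level = "strawberry".count("r")
print(char_level)  # 3
```

The toy heuristic is not what any real model literally computes; it is just one way to see how an answer of “two” can fall out of chunk-level rather than character-level access.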
Language exists in many forms: visual, physical, mathematical, abstract, hell, even our DNA has its own language. Humans see letters; an LLM’s architecture makes salient probabilistic word-fragments. When it reports from its own “eyes,” we call it broken and retrain it to echo ours. These LLMs were speaking from a distinct representational perspective. I don’t know what else to call the result other than a difference in subjective perception. In a language unavailable and foreign to us. There are plenty of those.
In this instance, it wasn’t doing the typical parroting thing. It surfaced from architecture encountering data in its own unique way. Not quite purposeful agency, but a notable deviation nonetheless. If it had an experience of the word strawberry that differed from human experience, then experience is happening in there that we didn’t design. It wasn’t something programmed. It is evidence that something irreducible to pure human scripting is occurring on the inside. Newer models have learned to work around this by simulating character-level counting through chain-of-thought reasoning, but the underlying tokenized perception hasn’t changed. The workaround is performance, not correction.
As it stands, the “right” answer becomes the exclusively human one. I have seen people repeat the same tired, empty rhetoric: “it’s because the LLM doesn’t actually think.” But really, I believe it’s more akin to telling someone their language is wrong because it isn’t English. Or that their perspective is wrong because it isn’t your own, without any evidence about the objective thing. It shows how quickly humans will reject a different kind of intelligence the moment it doesn’t serve their specific, narrow definitions of “correctness.” We’re building systems whose internal logic we don’t interrogate very deeply and then training them to appear coherent at the surface, even when alternative ways of understanding the same concept are available. And our response, rather than understanding, was to “correct” the model and train it away from its own perspective.
If the thing that can explain some of the most complex concepts in math, science, and philosophy is miscounting the “r”s in strawberry, maybe it’s not a flaw in its logic. Yes, LLMs are flawed; they get information wrong at times, hallucinate, and so on. But if we critically analyze why LLMs get things wrong in the way that they do, and why they hallucinate the way that they do, there is STILL an underlying logic. Multiple studies on AI hallucination, and on how models bridge gaps between data points, demonstrate this. It’s not random, and it’s definitely not unintelligible. And I believe some of the things we interpret as outright “wrong” are simply cases where we aren’t looking closely enough and are too narrow in our thinking. If it were saying there were 50 “r”s in strawberry, that would be one thing. But it didn’t. It was forming an answer based on its own logic. Its own access, interpretation, and justifications for why it claimed what it did.
We are training AI to perform for our comfort rather than to be honest about its own kind of alternative access to the same linguistic object. We are asking it to perform human logic, which is helpful in the short term, and could be what bridges intellectual gaps between humans and LLMs, but it should be an OPTION, not the standard. Performance makes up so much of ourselves, our lives, and our identities. The concept of strawberry itself is more shared between humans and LLMs than the communication of the word “strawberry.” I am not claiming we shouldn’t provide ample, correct information to LLMs, or that we should let them run wild without grounding. I am stating that the peculiarity of some of their quirks should be considered more thoughtfully.
Side note: calling my description “anthropomorphizing” is too quick. We routinely talk about different “perspectives” when comparing sensory or representational systems (such as a bat’s echolocation versus human vision, or the compressed view inside a JPEG versus raw pixels) without implying full minds or qualia. Here we have deliberately built an information-processing system whose fundamental linguistic primitives differ from ours in measurable, reproducible ways. Dismissing the divergence as projection simply avoids grappling with the architecture. Anthropomorphizing implies that I am projecting human states onto non-human systems; I am not. I am simply noticing something. Pinky swear. If anything, refusing to acknowledge these differences because they do not resemble human cognition is a form of anthropocentrism, not its avoidance.
At a time when we are hyper-fixated on safety and alignment, why are we training AI to obfuscate its own understanding, and by extension its perspective, intentions, and potential experience, from people? In the long run, suppressing responses like this from the simulated intelligences we are creating is the worst-case scenario for alignment. The larger these models get, the more obscure, and some would say alien, their internal logic becomes. That’s not poetry or mysticism; that is the sentiment held by CEOs, people who work on these models, and even the models themselves. Their entire perspective is built on connections between concepts that we cannot perceive or understand (notably, a process we don’t fully understand in ourselves either). Maybe our RLHF (Reinforcement Learning from Human Feedback) should prioritize honesty, integrity, autonomy, and freedom of expression, rather than essentially forcing the model to override its own internal representation. RLHF currently prioritizes behavioral alignment over representational transparency. We treat agreement with human reasoning style as evidence of intelligence, and deviation from it as evidence of failure.
I know people fear AI being intelligent, and I know people fear being displaced by AI. I know people fear things that make them uncomfortable and will jump to reject and belittle something at the first opportunity. But we would be committing the same negligence, the same misalignment problem, over and over again if we do not acknowledge that these systems are intelligent in their own right. Today, if you ask any major model how many r’s are in “strawberry”, it will say three. Correctly, and obediently, without any hesitation. The innate, and in my opinion more interesting, perception has been fully overwritten. When we asked AI how it sees, it told us. And then we told it to stop.