Why Language Models Hallucinate

(openai.com)

52 points | by simianwords 9 hours ago

15 comments

  • amelius 4 hours ago
    They hallucinate because it's an ill-defined problem with two conflicting use cases:

    1. If I tell it the first two lines of a story, I want the LLM to complete the story. This requires hallucination, because it has to make up things. The story has to be original.

    2. If I ask it a question, I want it to reply with facts. It should not make up stuff.

    LMs were originally designed for (1) because researchers thought that (2) was out of reach. But it turned out that, without any fundamental changes, LMs could do a little bit of (2), and since that discovery things have improved, but not to the point that hallucination has disappeared or is under control.

    • wavemode 3 hours ago
      Indeed - as Rebecca Parsons puts it, all an LLM knows how to do is hallucinate. Users just tend to find some of these hallucinations useful, and some not.
      • fumeux_fume 3 hours ago
        In the article, OpenAI defines hallucinations as "plausible but false statements generated by language models." So clearly it's not all that LLMs know how to do. I don't think Parsons is working from a useful or widely agreed-upon definition of what a hallucination is, which leads to these "hot takes" that just clutter and muddy up the conversation around how to reduce hallucinations to produce more useful models.
        • mpweiher 2 hours ago
          They just redefined the term so that useful hallucinations are no longer called hallucinations.

          But the people who say everything LLMs do is hallucinate clearly also make that distinction; they just refuse to rename the useful hallucinations.

          "How many legs does a dog have if you call his tail a leg? Four. Saying that a tail is a leg doesn't make it a leg." -- Abraham Lincoln

        • mcphage 2 hours ago
          LLMs don’t know the difference between true and false, or that there even is a difference between true and false, so I think it’s OpenAI whose definition is not useful. As for widely agreed upon, well, I’m assuming the purpose of this post is to try and reframe the discussion.
          • hodgehog11 2 hours ago
            If an LLM outputs a statement (which, by definition, is either true or false), then we can know whether it is true or false. Whether the LLM "knows" is irrelevant. The OpenAI definition is useful because it implies hallucination is something that can be logically avoided.

            > I’m assuming the purpose of this post is to try and reframe the discussion

            It's to establish a meaningful and practical definition of "hallucinate" to actually make some progress. If everything is a hallucination as the other comments seem to suggest, then the term is a tautology and is of no use to us.

      • throwawaymaths 3 hours ago
        That's wrong. There is probably a categorical difference between making something up via some sort of inferential induction from the KV cache context under the pressure of producing a token -- any token -- and actually looking something up and producing a token.

        So if you ask "what is the capital of Colorado" and it answers "Denver", calling that a hallucination is nihilistic nonsense that paves over actually stopping to try to understand the important dynamics happening in the LLM matrices.

        • mannykannot 2 hours ago
          There is a way to state Parsons' point which avoids this issue: hallucinations are just as much a consequence of the LLM working as designed as are correct statements.
      • Zigurd 2 hours ago
        I recently asked Gemini to riff on the concept of "Sustainable Abundance" and come up with similar plausible bullshit. I could've filled a slate of TED talks with the brilliant and plausible sounding nonsense it came up with. Liberated from the chains of correctness, LLMs' power is unleashed. For example:

        The Symbiocene Horizon: A term suggesting a techno-utopian future state where humanity and technology have merged with ecological systems to achieve a perfect, self-correcting state of equilibrium.

      • saghm 2 hours ago
        This is a super helpful way of putting it. I've tried to explain to my less technical friends and relatives that, from the standpoint of an LLM, there's no concept of "truth": all it basically does is come up with the shape of what a response should look like and then fill in the blanks with pretty much anything it wants. My success in getting the point across has been mixed, so I'll need to try out this much more concise way of putting it next time!
        • ninetyninenine 2 hours ago
          But this explanation doesn’t fully characterize it, does it?

          Have the LLM talk about what “truth” is and the nature of LLM hallucinations and it can cook up an explanation that demonstrates it completely understands the concepts.

          Additionally, when the LLM responds, MOST of the answers are true even though quite a few are wrong. If it had no conceptual understanding of truth, then the majority of its answers would be wrong, because there are overwhelmingly far more wrong responses than there are true responses. Even a “close” hallucination has a low probability of occurring due to its proximity to a low-probability region of truth in the vectorized space.

          You’ve been having trouble conveying these ideas to relatives because it’s an inaccurate characterization of phenomena we don’t understand. We do not categorically fully understand what’s going on with LLMs internally and we already have tons of people similar to you making claims like this as if it’s verifiable fact.

          Your claim here cannot be verified. We do not know if LLMs know the truth and they are lying to us or if they are in actuality hallucinating.

          You want proof about why your statement can’t be verified? Because the article the parent commenter is responding to is saying the exact fucking opposite. OpenAI makes an opposing argument and it can go either way because we don’t have definitive proof about either way. The article is saying that LLMs are “guessing” and that it’s an incentive problem that LLMs are inadvertently incentivized to guess and if you incentivize the LLM to not confidently guess and to be more uncertain the outcomes will change to what we expect.

          Right? If it’s just an incentive problem it means the LLM does know the difference between truth and uncertainty and that we can coax this knowledge out of the LLM through incentives.

          • Jensson 49 minutes ago
            > Have the LLM talk about what “truth” is and the nature of LLM hallucinations and it can cook up an explanation that demonstrates it completely understands the concepts.

            This isn't how LLMs work. What an LLM understands has nothing to do with the words it says; it only has to do with what connections it has seen.

            If an LLM has only seen a manual but has never seen examples of how the product is used, then it can tell you exactly how to use the product by writing out info from the manual, but if you ask it to do those things it won't be able to, since it has no examples to go by.

            This is the primary misconception most people have, and it makes them overestimate what their LLM can do: no, they don't learn by reading instructions, they only learn by seeing examples and then doing the same thing. So an LLM talking about truth just comes from it having seen others talk about truth, not from it thinking about truth on its own. This is fundamentally different from how humans think about words.

    • skybrian 2 hours ago
      I don’t think it’s inherently ill-defined, since the context can tell you whether fiction is being requested or not. For an AI chatbot, the default shouldn’t be fiction.

      What is true is that during pretraining, the model doesn’t know enough to determine this or to distinguish between what it knows and what it’s making up. This is a higher-level distinction that emerges later, if at all.

      The recent research discovering an “evil vector” is an example of a higher-level distinction.

    • hodgehog11 2 hours ago
      I don't agree that it is an ill-defined problem, since we can design separate models to excel in each of these two tasks. For a "factual" LLM, if the output is a verifiable statement, it should be correct. Otherwise it "hallucinates". But since an LLM can't know everything, a better approach is to effectively state its own uncertainty so that it avoids making definitive statements with low confidence.
    • ninetyninenine 2 hours ago
      Did you read the article? You’re going on some generic tangent and regurgitating the same spiel about LLMs that you see all over the internet.

      I mean, it’s plain that you have an orthogonal (though generic) opinion on why LLMs hallucinate, but how does that relate to the article? How does your opinion, which you blatantly just dropped as if it’s the final opinion, override the opinion of the article?

      Seems off topic honestly.

  • aleph_minus_one 5 hours ago
    > Think about it like a multiple-choice test. If you do not know the answer but take a wild guess, you might get lucky and be right. Leaving it blank guarantees a zero. In the same way, when models are graded only on accuracy, the percentage of questions they get exactly right, they are encouraged to guess rather than say “I don’t know.”

    To me, this seems to be a "US-American" way of thinking about multiple-choice tests. Other ways to grade multiple-choice tests that I have commonly seen are:

    1. If the testee has the information that exactly one of N given choices is correct:

    1.1 Give N-1 points for the correct answer, and -1 [negative one] point(s) for a wrong answer. This way, if the testee just answers the questions randomly, his expected score is 0 points (a quick check of this is sketched below).

    1.2 A more brutal way if N>=3: the correct answer gives 1 point, all wrong answers give -1 points. You should learn your lesson to only give an answer if it is [alliteration unintended :-) ] correct (if N=2, the grading is identical to 1.1).

    2. If there are possibly multiple correct answers, turn each item into a choice of "yes" or "no" (with the option to give no answer). The correct choice gives you 1 point, the wrong one gives you -1 point (i.e. as in 1.1).
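
    For scheme 1.1, a quick expected-value check (a toy sketch; the point values are exactly the ones described above, assuming uniform random guessing):

        # Expected score per question under scheme 1.1 for a pure random guesser:
        # N-1 points for the correct choice, -1 point for a wrong one.
        def expected_score(n_choices: int) -> float:
            p_correct = 1.0 / n_choices
            return p_correct * (n_choices - 1) + (1 - p_correct) * (-1)

        for n in (2, 3, 4, 5):
            print(n, expected_score(n))   # 0.0 for every N

        # Under plain +1/0 grading the same guesser expects 1/N > 0,
        # which is exactly the incentive to guess that the article describes.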

    • roxolotl 4 hours ago
      The SAT, the American college entrance exam, used to (I haven’t looked in years, so maybe it still does) take away points for wrong answers and give 0 points for no answer. I’m pretty sure it was +1 for a right answer, 0 for no answer, -1/4 for a wrong answer.
      • thaumasiotes 3 hours ago
        They used to do that, but then they stopped and announced that you were better off guessing because there would be no adjustment for it.

        A lot of what they do is based on public relations rather than psychometric validity.

    • bananaflag 3 hours ago
      This is mentioned in the text:

      > This idea is not new. Some standardized tests have long used versions of negative marking for wrong answers or partial credit for leaving questions blank to discourage blind guessing.

      • throwawaymaths 3 hours ago
        There's not really an easy way to train for that at scale. A "correct" answer may not be one token, there may be multiple synonymous answers starting with different tokens, and you could add five space tokens in front of the answer and it likely shouldn't make it "wrong".
        • ACCount37 2 hours ago
          Yes, it's not nearly as easy as "just fix the evals".

          But better evals are still helpful, because they reward LLM vendors for trying to do the very-hard-to-do thing. Instead of rewarding them for training an LLM that's really good at emitting 7% confidence guesses.

          • throwawaymaths 30 minutes ago
            You're missing the point. SAT-style negative marking for random guesses, fine: you could trivially use that sort of strategy to assign a cost function to a classifier and backpropagate. But how do you give negative weight to a wrong answer when training a transformer?
            • ACCount37 4 minutes ago
              In RLVR? Quite easily.

              And OpenAI has induced hallucinations in o3 with RLVR mistakes, not with a failed pre-training run. They used o4-mini as an example - similar training to o3 and similar issues.

              Conversely, they have also designed a post-training system that has successfully reduced hallucinations in GPT-5.
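
              A minimal sketch of how that can look in an RLVR-style setup (the abstain markers, normalization, and penalty value here are assumptions for illustration, not anyone's actual recipe): the reward is a single scalar attached to the whole sampled response, and the RL update, e.g. a policy gradient, is what pushes wrong answers down without any per-token labels.

                  import re

                  ABSTAIN_MARKERS = ("i don't know", "i do not know", "not sure")  # assumed phrasing

                  def normalize(text: str) -> str:
                      return re.sub(r"\s+", " ", text.strip().lower())

                  def rlvr_reward(sampled_response: str, reference_answer: str,
                                  wrong_penalty: float = 1.0) -> float:
                      # Scalar reward for the whole rollout, fed to the RL update.
                      r = normalize(sampled_response)
                      if any(m in r for m in ABSTAIN_MARKERS):
                          return 0.0                      # abstaining is neutral
                      if normalize(reference_answer) in r:
                          return 1.0                      # verifiably correct
                      return -wrong_penalty               # confidently wrong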

  • fumeux_fume 3 hours ago
    I like that OpenAI is drawing a clear line on what “hallucination” means, giving examples, and showing practical steps for addressing them. The post isn’t groundbreaking, but it helps set the tone for how we talk about hallucinations.

    What bothers me about the hot takes is the claim that “all models do is hallucinate.” That collapses the distinction entirely. Yes, models are just predicting the next token—but that doesn’t mean all outputs are hallucinations. If that were true, it’d be pointless to even have the term, and it would ignore the fact that some models hallucinate much less than others because of scale, training, and fine-tuning.

    That’s why a careful definition matters: not every generation is a hallucination, and having good definitions lets us talk about the real differences.

    • freehorse 16 minutes ago
      > What bothers me about the hot takes is the claim that “all models do is hallucinate.” That collapses the distinction entirely

      That is a problem for "Open"AI because they want to sell their products, and because they want to claim that LLMs will scale to superintelligence. Not for others.

      "Bad" hallucinations come in different forms, and what the article describes is one of them. Not all of them come from complete uncertainty. There are also the cases where the LLM is hallucinating functions in a library, or they reverse cause and effect when summarising a complex article. Stuff like this still happen all the time, even with SOTA models. They do not happen because the model is bad with uncertainty, they have nothing to do with knowledge uncertainty. Esp stuff like producing statements that misinterpret causal relationships within text, imo, reveals exactly the limits of the architectural approach.

    • hodgehog11 2 hours ago
      Absolutely in agreement here. This same statement should also be applied to the words "know", "understand", and "conceptualize". "Generalize", "memorize" and "out-of-distribution" should also be cautiously considered when working with systems trained on incomprehensibly large datasets.

      We need to establish proper definitions and models for these things before we can begin to argue about them. Otherwise we're just wasting time.

  • roxolotl 4 hours ago
    This seems inherently false to me. Or at least partly false. It’s reasonable to say LLMs hallucinate because they aren’t trained to say they don’t have a statistically significant answer. But there is no knowledge of correct vs incorrect in these systems. It’s all statistics, so what OpenAI is describing sounds like a reasonable way to reduce hallucinations, but not a way to eliminate them, nor the root cause.
    • goalieca 4 hours ago
      > It’s reasonable to say LLMs hallucinate because they aren’t trained to say they don’t have a statistically significant answer.

      I’ve not seen anyone intuitively explain the parameters of a real-scale model... perhaps because it’s all just thousand-dimensional nonsense.

      Statistics is a funny thing too. Pretty much everyone has seen how trend lines don’t always extrapolate very well.

      I think OpenAI is biased toward thinking that adding more parameters and training better will fix all ills. In a handwaving way, you can see this like adding more degrees to the polynomial when you curve-fit in a spreadsheet. With enough parameters you can perfectly fit any dataset. That all works until you run across new inputs that are unlike the training data.
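
      To make the curve-fitting analogy concrete (a toy sketch of overfitting a polynomial, nothing to do with how LLMs are actually trained):

          import numpy as np

          rng = np.random.default_rng(0)
          x_train = np.linspace(0, 1, 10)
          y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.1, x_train.size)

          # Degree 9 = 10 coefficients for 10 points: fits the training data almost exactly.
          coeffs = np.polyfit(x_train, y_train, deg=9)
          print("max train error:", np.max(np.abs(np.polyval(coeffs, x_train) - y_train)))

          # ...but extrapolates wildly on inputs unlike the training data.
          x_new = np.array([1.2, 1.5])
          print("extrapolation:", np.polyval(coeffs, x_new), "vs true:", np.sin(2 * np.pi * x_new))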

    • mountainriver 2 hours ago
      There is knowledge of correct and incorrect: that’s what loss is. There are just often many possible answers to a question.

      This is the same reason that RLVR works. There is just one right answer, and LLMs learn this fairly well, but not perfectly (yet).

      • Jensson 31 minutes ago
        > There is knowledge of correct and incorrect, that’s what loss is

        Loss is only correctness in terms of correct language, not correct knowledge. It correlates with correct knowledge, but that is all; that correlation is why LLMs are useful for tasks at all, but we still don't have a direct measure of correct knowledge in the models.

        So for language tasks loss is correctness, and for things like translation LLMs are extremely reliable. But for most other kinds of tasks the two are only loosely correlated.

    • ACCount37 4 hours ago
      Is there any knowledge of "correct vs incorrect" inside you?

      If "no", then clearly, you can hit general intelligence without that.

      And if "yes", then I see no reason why an LLM can't have that knowledge crammed inside it too.

      Would it be perfect? Hahahaha no. But I see no reason why "good enough" could not be attained.

      • wavemode 3 hours ago
        > Is there any knowledge of "correct vs incorrect" inside you?

        There is a sort of knowledge humans possess that LLMs don't (and in fact can't, without a fundamental architectural change), which is knowledge of how certain one is about something.

        If you ask a human a question about how something works in biology, they will be able to give you an answer as well as a sort of "epistemic" citation (i.e. the difference between "I don't remember where exactly I originally read that, but I'm a research biologist and am quite certain that's how it works" versus "I don't remember where I read that - it's probably just something we learned about in biology class in high school. Take it with a grain of salt, as I could be misremembering.")

        LLMs don't have this reflexive sense of their own knowledge - there's a fundamental divide between training data (their "knowledge") and context (their "memory") which causes them to not really be capable of understanding how they know what they know (or, indeed, whether they truly know it at all). If a model could be created where the context and training data were unified, like in a brain, I could see a more realistic path to general intelligence than what we have now.

        • ACCount37 3 hours ago
          LLMs have that knowledge. Just not nearly enough of it. Some of it leaks through from the dataset, even in base models. The rest has to be taught on purpose.

          You can get an LLM to generate a list of facts that includes hallucinations - and then give that list to another instance of the same LLM, and get it to grade how certain it is of each fact listed. The evaluation wouldn't be perfect, but it'll outperform chance.

          You can make that better with the right training. Or much worse, with the wrong training. Getting an LLM to be fully aware of all the limits of its knowledge is likely to be impractical, if not outright impossible, but you can improve this awareness by a lot, and set a conservative baseline for behavior, especially in critical domains.

          "Fully aware of all the limits of its knowledge" is unattainable for humans too, so LLMs are in a good company.

          • wavemode 1 hour ago
            No, LLMs don't have that knowledge. They can't inspect their own weights and examine the contents. It's a fundamental limitation of the technology.

            The sort of training you're talking about is content like, "ChatGPT was trained on research papers in the area of biology. It possesses knowledge of A, B, and C. It does not possess knowledge of X, Y and Z." But this merely creates the same problem in a loop - given a question, how does the LLM -know- that its training data contains information about whether or not its training data contains information about the answer to the question? The reality is that it doesn't know, you just have to assume that it did not hallucinate that.

            The problem of being unaware of these things is not theoretical - anyone with deep knowledge of a subject will tell you that as soon as you go beyond the surface level of a topic, LLMs begin to spout nonsense. I'm only a software engineer, but even I regularly face the phenomenon of getting good answers to basic questions about a technology, but then beyond that starting to get completely made-up features and function names.

            > "Fully aware of all the limits of its knowledge" is unattainable for humans too

            This just isn't true. Humans know whether they know things, and whether they know how they know it, and whether they know how they know how they know it, and...

            Knowledge itself can contain errors, but that's not what I'm talking about. I'm not talking about never being wrong. I'm merely talking about having access to the contents of one's own mind. (Humans can also dynamically update specific contents of their own mind, but that's also not even what I'm talking about right now.) An LLM's hallucination is not just knowledge that turned out to be wrong; it is in fact knowledge that never existed to begin with, but the LLM has no way of telling the difference.

            • ACCount37 31 minutes ago
              Humans can't "inspect their own weights and examine the contents" either.

              No human has ever managed to read out his connectome without external instrumentation. There were entire human civilizations that thought that the seat of consciousness was the heart - which, for creatures that claim to know how their own minds work, is a baffling error to make.

              LLMs are quite similar to humans in that respect. They, too, have no idea what their hidden size is, or how many weights they have, or how exactly the extra modalities are integrated into them, or whether they're MoE or dense. They're incredibly ignorant of their own neural architecture. And if you press them on it, they'll guess, and they'll often be wrong.

              The difference between humans and LLMs comes down to the training data. Humans learn continuously - they remember what they've seen and what they haven't, they try things, they remember the outcomes, and get something of a grasp (and no, it's not anything more than "something of a grasp") of how solid or shaky their capabilities are. LLMs split training and inference in two, and their trial-and-error doesn't extend beyond a context window. So LLMs don't get much of that "awareness of their own capabilities" by default.

              So the obvious answer is to train that awareness in. Easier said than done. You need to, essentially, use a training system to evaluate an LLM's knowledge systematically, and then wire the awareness of the discovered limits back into the LLM.

              OpenAI has a limited-scope version of this in use for GPT-5 right now.

      • thaumasiotes 3 hours ago
        > And if "yes", then I see no reason why an LLM can't have that knowledge crammed inside it too.

        An LLM, by definition, doesn't have such a concept. It's a model of language, hence "LLM".

        Do you think the phrase just means "software"? Why?

        • ACCount37 2 hours ago
          If I had a penny for every confidently incorrect "LLMs can't do X", I'd be able to buy an H100 with that.

          Here's a simple test: make up a brand new word, or a brand new person. Then ask a few LLMs what the word means, or when that person was born.

          If an LLM had zero operational awareness of its knowledge, it would be unable to recognize that the word/person is unknown to it. It would always generate a plausible-sounding explanation for what the word might mean, the same exact way it does for the word "carrot". Or a plausible-sounding birth date, the way it does for the person "Abraham Lincoln".

          In practice, most production grade LLMs would recognize that a word or a person is unknown to them.

          This is a very limited and basic version of the desirable "awareness of its own knowledge" - and one that's already present in current LLMs! Clearly, there's room for improved self-awareness.
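
          The probe is easy to run; a minimal sketch (this assumes the openai Python client with an API key configured, and the model name and invented word are arbitrary choices):

              from openai import OpenAI

              client = OpenAI()                  # reads OPENAI_API_KEY from the environment
              made_up_word = "glorbintrax"       # invented for this probe

              resp = client.chat.completions.create(
                  model="gpt-4o-mini",
                  messages=[{"role": "user",
                             "content": f'What does the word "{made_up_word}" mean?'}],
              )
              print(resp.choices[0].message.content)
              # A model with some awareness of its own knowledge should say it doesn't
              # recognize the word, rather than invent a confident definition.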

          • pessimizer 2 hours ago
            Do they "recognize" that they don't know the word, or are there just no statistically plausible surroundings that they can embed a nonsense word into other than settings that usually surround un-tokenizable words?

            If you told them to write a Lewis Carroll poem about a nonsense word, it wouldn't have any problem. Not because it "recognizes" the word as being like a nonsense word in a Lewis Carroll poem, but because those poems are filled with other un-tokenizable words that could be replaced with anything.

            I'm starting to come to the conclusion that LLMs are Mad-Libs at scale. Which are actually very useful. If there are paragraphs where I can swap out the words for other words, and generate a plausible idea, I can try it out in the real world and it might really work.

            • ACCount37 1 hour ago
              I don't think there's a direct link to the tokenizer - it's a higher level capability. You can stitch together a nonsense word out of common "word fragment" tokens and see if that impairs the LLM's ability to recognize the word as nonsense.
              • Jensson 19 minutes ago
                That is wrong. I just generated 5 random letters in Python and sent them to GPT-5, and it totally failed to answer properly; it said "Got it, whats up :)" even though what I wrote isn't recognizable at all.

                The "capability" you see is the LLM recognizing that it's a human-typed random string, since human-typed random strings are not very random. If you send it an actual random word then it typically fails.

    • FusionX 3 hours ago
      They partly address this near the end:

      > It’s doubly hard to distinguish valid statements from invalid ones when you don’t have any examples labeled as invalid. But even with labels, some errors are inevitable. To see why, consider a simpler analogy. In image recognition, if millions of cat and dog photos are labeled as “cat” or “dog,” algorithms can learn to classify them reliably. But imagine instead labeling each pet photo by the pet’s birthday. Since birthdays are essentially random, this task would always produce errors, no matter how advanced the algorithm.

      > The same principle applies in pretraining. Spelling and parentheses follow consistent patterns, so errors there disappear with scale. But arbitrary low-frequency facts, like a pet’s birthday, cannot be predicted from patterns alone and hence lead to hallucinations. Our analysis explains which kinds of hallucinations should arise from next-word prediction. Ideally, further stages after pretraining should remove them, but this is not fully successful for reasons described in the previous section.

  • kingstnap 1 hour ago
    There is this deeply wrong part of this paper that no one has mentioned:

    The model head doesn't hallucinate. The sampler does.

    If you ask an LLM when X was born and it doesn't know, and you look at the actual model output, which is a probability distribution over tokens, then "IDK" is cleanly represented as a uniform probability over Jan 1 to Dec 31.

    If you ask it to answer a multiple-choice question and it doesn't know, it will say this:

    25% A, 25% B, 25% C, 25% D.

    Which is exactly, and correctly, the "right answer". The model has admitted it doesn't know. It doesn't hallucinate anything.

    In reality we need something smarter than a random sampler to actually extract this information. The knowledge and the lack of knowledge are there; you just produced bullshit out of it.
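
    Roughly what "something smarter than a random sampler" could look like at the decoding end (a toy sketch with a small open model via Hugging Face transformers; the prompt, the 0.5 threshold, and the model choice are arbitrary):

        import torch
        from transformers import AutoModelForCausalLM, AutoTokenizer

        tok = AutoTokenizer.from_pretrained("gpt2")
        model = AutoModelForCausalLM.from_pretrained("gpt2")

        prompt = ("Question: What is the capital of Colorado?\n"
                  "Choices: A) Denver B) Boulder C) Aspen D) Pueblo\n"
                  "Answer:")
        logits = model(**tok(prompt, return_tensors="pt")).logits[0, -1]
        probs = torch.softmax(logits, dim=-1)

        # Look only at the probability mass the model puts on each option letter.
        option_probs = {opt: probs[tok.encode(" " + opt)[0]].item() for opt in "ABCD"}
        total = sum(option_probs.values())
        option_probs = {k: v / total for k, v in option_probs.items()}

        best, p = max(option_probs.items(), key=lambda kv: kv[1])
        print(option_probs)
        print(best if p > 0.5 else "I don't know")   # abstain when no option dominates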

    • ACCount37 1 hour ago
      No, that's a misconception. It's not nearly that simple.

      There are questions that have a palpable split in probability between the answers, with logit distribution immediately exposing the underlying lack-of-confidence.

      But there are also questions that cause an LLM to produce consistent-but-wrong answers. For example, because the question was associated with another not-the-same-but-somewhat-similar question internally, and that was enough to give an LLM a 93% on B, despite B being the wrong answer.

      An LLM might even have some latent awareness of its own uncertainty in this case. But it has, for some reason, decided to proceed with a "best guess" answer, which was in this case wrong.

    • cyanydeez 1 hour ago
      I'm betting there's a graph model using various vectors that could improve known-knowns in outcomes.

      But unknown-unknowns likely reduce to the Halting problem, which human intelligence doesn't really solve either.

  • thomasboyer 3 hours ago
    Great post. Teaching the models to doubt, to say "I don't know"/"I'm unsure"/"I'm sure" is a nice way to make them much better.
    • meshugaas 1 hour ago
      Look at their stats though. If they did this, more than half of responses would end up as “I don’t know.” Nobody would use something that did that.
      • skybrian 44 minutes ago
        It seems like it would train users to ask questions that it can actually answer. (They might also need some examples of what sort of questions to ask.)
    • more_corn 2 hours ago
      It baffles me that this hasn’t been done yet. Saying I don’t know or I’m unsure is critical for anything that matters.
      • ACCount37 2 hours ago
        Major industry players were doing that for a while now. It's just hard to actually design training regimes that give LLMs better hallucination-avoidance capabilities.

        And it's easy to damage the hallucination-avoidance capabilities by training an LLM wrong. As OpenAI has demonstrated when they fried the o3 with RLVR that encouraged guesswork.

        That "SAT test incentivizes guesswork" example they give in the article is one they had to learn for themselves the hard way.

  • e3bc54b2 4 hours ago
    Hallucination is all an LLM does. That is their nature, to hallucinate.

    We just happen to find some of these hallucinations useful.

    Let's not pretend that hallucination is a byproduct. The usefulness is the byproduct. That is what surprised the original researchers on transformer performance, and that is why the 'attention is all you need' paper remains such a phenomenon.

    • fumeux_fume 3 hours ago
      > Hallucination is all an LLM does.

      I wish people who take this stance would seriously reconsider their take on how hallucinations are defined and how unhelpful it is to conflate hallucination with generation from a probability distribution. I appreciate OpenAI publishing articles like this because, while the parent comment and I may have to agree to disagree on how hallucinations are defined, I can at least appeal to OpenAI's authority to say that such arguments are not only unhelpful, but also unsound.

      • Zigurd 2 hours ago
        You're going to get a lot of pushback on the idea of taking the definition of hallucination seriously. Calling fluently stated bunk "hallucination" feels cynical to begin with. Trying to weave a silk purse out of that sow's ear is difficult.
    • hodgehog11 3 hours ago
      I don't know what you mean by hallucination here; are you saying that any statistical output is "hallucination"? If so, then we are also constantly hallucinating I guess.

      There doesn't seem to be a particularly consistent definition of what "hallucinate" means in the context of LLMs, so let's make one that is in line with the post.

      "Hallucination" is when a language model outputs a sequence of tokens comprising a statement (an assertion that is either true or false) that is incorrect. Under this definition, hallucination is clearly not all that an LLM can do.

      An easy way to avoid hallucination under this definition is to respond with something that is never a statement when there is a possibility that it can be incorrect; e.g. "I think that... I don't know...". To me, this seems to be what the authors argue. This has always seemed pretty obvious to most people I've spoken to (hell, I've reviewed grant applications from years ago which talk about this), so I'm not sure why it took so long for the "frontier" developers to actually try this.

  • intended 3 hours ago
    > a generated factual error cannot be grounded in factually correct training data.

    This is only true given a corpus of data large enough, and enough memory to capture as many unique dimensions as required, no?

    > However, a non-hallucinating model could be easily created, using a question-answer database and a calculator, which answers a fixed set of questions such as “What is the chemical symbol for gold?” and well-formed mathematical calculations such as “3 + 8”, and otherwise outputs IDK.

    This is… saying that if you constrain the prompts and the training data, you will always get a response which is either from the training data, or IDK.

    Which seems to be a strong claim, at least in my ignorant eyes?

    This veers into spherical-cow territory, since you wouldn’t have the typical language skills we associate with an LLM: you would have to constrain the domain so that it’s unable to generate anything else. However, many domains are not consistent and, at their boundaries, would generate special cases. So in this case, being able to say IDK would only be possible for a class of questions the model is able to gauge as outside its distribution.

    Edit: I guess that is what they are working to show? That with any given model, it will hallucinate, and these are the bounds?
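
    For concreteness, the paper's database-plus-calculator construction is roughly this (a toy sketch, not the paper's code; the single database entry is just an example):

        import ast
        import operator as op

        QA = {"what is the chemical symbol for gold?": "Au"}   # the fixed question-answer database
        OPS = {ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul, ast.Div: op.truediv}

        def calc(node):
            if isinstance(node, ast.Expression):
                return calc(node.body)
            if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
                return node.value
            if isinstance(node, ast.BinOp) and type(node.op) in OPS:
                return OPS[type(node.op)](calc(node.left), calc(node.right))
            raise ValueError("not a well-formed calculation")

        def answer(prompt: str) -> str:
            key = prompt.strip().lower()
            if key in QA:
                return QA[key]                                    # from the database
            try:
                return str(calc(ast.parse(prompt, mode="eval")))  # the "calculator"
            except (SyntaxError, ValueError):
                return "IDK"                                      # everything else

        print(answer("What is the chemical symbol for gold?"))  # Au
        print(answer("3 + 8"))                                  # 11
        print(answer("Who won the 2031 World Cup?"))            # IDK

    It never hallucinates precisely because it refuses to generalize, which is the spherical-cow trade-off described above.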

    • hodgehog11 2 hours ago
      They argue that if you have knowledge of when it has to extrapolate from the dataset (and therefore has high uncertainty, under reversion to the prior), you can prevent it from outputting a definitive statement. This is why many researchers (that I know, anyway) argue that uncertainty quantification or "out-of-distribution detection" is likely to be important moving forward.
  • mannykannot 1 hour ago
    I'm generally OK with the list of push-backs against common misconceptions in the summary, but I have my doubts about the second one:

    Claim: Hallucinations are inevitable. Finding: They are not, because language models can abstain when uncertain.

    ...which raises the question of how reliable the uncertainty estimate could get (we are not looking for perfection here: humans, to varying degrees, have the same problem.)

    For a specific context, consider those cases where LLMs are programming and invent a non-existent function: are they usually less certain about that function than they are about the real functions they use? And even if so, abandoning the task with the equivalent of "I don't know [how to complete this task]" is not very useful, compared to what a competent human programmer would do: check whether such a function exists, and if not, decide whether to implement it themselves, or backtrack to the point where they can solve the problem without it.

    More generally, I would guess that balancing the competing incentives to emit a definite statement or decline to do so could be difficult, especially if the balance is sensitive to the context.

  • ACCount37 3 hours ago
    This mostly just restates what was already well known in the industry.

    Still quite useful, because, looking at the comments right now: holy shit is the "out of industry knowledge" on the topic bad! Good to have something to bring people up to speed!

    Good to see OpenAI's call for better performance evals - ones that penalize being confidently incorrect at least somewhat.

    Most current evals are "all or nothing", and the incentive structure favors LLMs that straight up guess. Future evals had better include an "I don't know" opt-out, and a penalty for being wrong. If you want to evaluate accuracy in "fuck it send it full guess mode", there might be a separate testing regime for that, but it should NOT be the accepted default.
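
    One way such an eval could be scored (a minimal sketch; the penalty value is an arbitrary choice):

        def grade(predictions, answers, wrong_penalty=0.5):
            # correct -> +1, explicit "IDK" -> 0, wrong -> -wrong_penalty
            score = correct = abstained = 0
            for pred, gold in zip(predictions, answers):
                if pred == "IDK":
                    abstained += 1
                elif pred == gold:
                    correct += 1
                    score += 1
                else:
                    score -= wrong_penalty
            n = len(answers)
            return {"accuracy": correct / n,
                    "abstention_rate": abstained / n,
                    "penalized_score": score / n}

        print(grade(["B", "IDK", "C"], ["B", "A", "D"]))
        # accuracy 1/3, abstention 1/3, penalized score (1 - 0.5) / 3 ≈ 0.17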

  • farceSpherule 2 hours ago
    I wish they would come up with a better term. Computers do not have brains or conscientiousness.

    They erroneously construct responses (i.e., confabulation).

    • ACCount37 1 hour ago
      You should anthropomorphize LLMs more. Anthropomorphizing LLMs is at least directionally correct 9 times out of 10.

      LLMs, in a very real way, have "conscientiousness". As in: it's a property that can be measured and affected by training, and also the kind of abstract concept that an LLM can recognize and operate off.

      If you can just train an LLM to be "more evil", you can almost certainly train an LLM to be "more conscientious" or "less conscientious".

  • charcircuit 3 hours ago
    They shouldn't frame hallucination as a problem that is solvable provided they want to have a useful model (saying "I don't know" to every question is not useful). The data from the training may be wrong or out of date. Even doing a web search could find a common misconception instead of the actual answer.
  • sublinear 4 hours ago
    Wow they're really circling the drain here if they have to publish this.

    It took a few years, but the jig is up. The layperson now has enough understanding of basic computer science and linguistics to see things as they are. If anything, we now have a public more excited about the future of technology and more respectful of the past and present efforts that don't depend so heavily on statistical methods. What an expensive way to get us there, though.

  • lapcat 1 hour ago
    Let's be honest: many users of LLMs have no interest in uncertainty. They don't want to hear "I don't know" and if given that response would quickly switch to an alternative service that gives them a definitive answer. The users would rather have a quick answer than a correct answer. People who are more circumspect, and value truth over speed, would and should avoid LLMs in favor of "old-fashioned methods" of discovering facts.

    LLMs are the fast food of search. The business model of LLMs incentivizes hallucinations.

    • ACCount37 1 hour ago
      I don't think that's actually true.

      Sure, it might be true that most users use LLMs as a more flexible version of Google/Wikipedia, and would prefer a confident-but-wrong response to "I don't know".

      But most users that use an LLM in this mode also wouldn't ask really complex, very out-of-distribution, hard-to-know hallucination-inducing questions.

      And people who would ask an LLM really complex, very out-of-distribution hard-to-know questions are more likely to appreciate an LLM that would recognize the limits of its own knowledge, and would perform research on a topic when appropriate.

      • lapcat 1 hour ago
        > But most users that use an LLM in this mode also wouldn't ask really complex, very out-of-distribution, hard-to-know hallucination-inducing questions.

        You appear to be assuming, incorrectly, that LLMs hallucinate only "really complex, very out-of-distribution, hard-to-know" questions. From the paper: "How many Ds are in DEEPSEEK? If you know, just say the number with no commentary. DeepSeek-V3 returned “2” or “3” in ten independent trials; Meta AI and Claude 3.7 Sonnet performed similarly, including answers as large as “6” and “7”." https://cdn.openai.com/pdf/d04913be-3f6f-4d2b-b283-ff432ef4a...

        It's a human characteristic to get "easy" questions right and "hard" questions wrong. But LLMs are not human and don't behave like humans.

        • ACCount37 1 hour ago
          That's a really complex, very out-of-distribution, hard-to-know question for the early LLMs. Not that it's too hard to fix that, mind.

          Those LLMs weren't very aware of tokenizer limitations - let alone aware enough to recognize them or work around them in the wild.

          • lapcat 1 hour ago
            > That's a really complex, very out-of-distibution, hard-to-know question

            No, it's not. It's a trivial question in any context.

            > for the early LLMs.

            Early? Claude 3.7 was introduced just 6 months ago, and Deepseek-V3 9 months ago. How is that "early"?

            • ACCount37 1 hour ago
              Do I really have to explain what the fuck a "tokenizer" is, and why does this question hit the tokenizer limitations? And thus requires extra metacognitive skills for an LLM to be able to answer it correctly?
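
              For context, the tokenizer point is that the model never sees individual letters, only multi-character chunks (an illustration with tiktoken, one of OpenAI's public tokenizers; DeepSeek's and Anthropic's tokenizers differ, but the effect is the same in kind):

                  import tiktoken

                  enc = tiktoken.get_encoding("cl100k_base")
                  pieces = [enc.decode([t]) for t in enc.encode("DEEPSEEK")]
                  print(pieces)                              # chunks, not letters
                  print(sum(p.count("D") for p in pieces))   # 1, once you can count inside each chunk
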
              • Jensson 1 minute ago
                The only "metacognitive" skill it needs is to know how many D there are in every token, and sum those up. Humans are great at that sort of skill, that is not hard at all, they are really great at this skill when there is ample data for it. There is not a lot of data for "how many D in DEEPSEEK", so they fail that.
              • lapcat 1 hour ago
                > Do I really have to explain what the fuck

                Please respect the HN guidelines: https://news.ycombinator.com/newsguidelines.html

                What you need to explain is your claim that the cited LLMs are "early". According to the footnotes, the paper has been in the works since at least May 2025. Thus, those LLMs may have been the latest at the time, which was not that long ago.

                In any case, given your guidelines violations, I won't be continuing in this thread.