It used to mean things like false positives in computer vision, where it is sort of appropriate: the AI is seeing something that’s not there.
Then the machine translation people started misusing the term for cases where their software mistranslated by adding something that was not present in the original text. They may already have been trying to be misleading with this term, because “hallucination” implies that the error happens when parsing the input text - which distracts from a very real concern: that what was added may have been plagiarized from the training dataset (which carries a risk of IP contamination).
Now, what’s happening is that language models are very often simply the wrong tool for the job. When you want to cite a court case as a precedent, you want a court case that actually existed - not a sample from the underlying probability distribution of possible court cases! LLM peddlers don’t want to ever admit that an LLM is the wrong tool for that job, so instead they pretend that it is the right tool that, alas, sometimes “hallucinates”.
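To make the lookup-versus-sampling distinction concrete, here is a toy sketch. Everything in it is made up for illustration: the case names, the `lookup`, `sample_case` and `support` helpers, and the "model", which is just independent draws of plaintiff and defendant and looks nothing like a real LLM. It only shows that a sampler happily emits plausible-looking citations that were never in the data.

```python
import itertools
import random

# Hypothetical "real" cases; the names are invented placeholders.
REAL_CASES = {"Alpha v. Beta", "Gamma v. Delta", "Epsilon v. Zeta"}

# A crude stand-in for "the distribution of possible court cases":
# the parties seen in the data, recombined freely.
PLAINTIFFS = sorted(c.split(" v. ")[0] for c in REAL_CASES)
DEFENDANTS = sorted(c.split(" v. ")[1] for c in REAL_CASES)


def lookup(name):
    """Retrieval: return the case only if it actually exists, else None."""
    return name if name in REAL_CASES else None


def sample_case(rng):
    """Sampling: draw a plausible-looking case name; existence is not checked."""
    return f"{rng.choice(PLAINTIFFS)} v. {rng.choice(DEFENDANTS)}"


def support():
    """Every case name the sampler is able to produce."""
    return {f"{p} v. {d}" for p, d in itertools.product(PLAINTIFFS, DEFENDANTS)}


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        name = sample_case(rng)
        print(name, "->", "real" if lookup(name) else "made up")
```

In this toy setup the sampler can produce nine different names and only three of them are real. The other six are not a malfunction of the sampler; producing them is exactly what sampling from the distribution means.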
I love the “criti-hype”. AI peddlers absolutely love any concerns that imply that the AI is really good at something.
Safety concern that LLMs would go Skynet? Say no more, I hear you and I’ll bring it up in Congress!
Safety concern that terrorists might use it to make bombs? Say no more! I agree that the AI is so great for making bombs! We’ll restrict it to keep people safe!
Sexual roleplay? Yeah, good point, I love it. Our technology is better than sex itself! We’ll restrict it to keep mankind from falling into the sin of robosexuality and going extinct! I mean, of course, you can’t restrict something like that, but we’ll try, at least until we release a hornybot.
But any concern about language modeling being fundamentally not the right tool for some job (do you want to cite a paper, or do you want to sample from the underlying probability distribution?), hey hey, how’s about we talk about the Skynet thing instead?