Hallucination (error) never disappears in LLMs because of their probabilistic nature

Posted on February 10, 2025


Today another hallucination blunder from generative AI in a production setting popped up in the news and on social media. Here is a report from the BBC: "Google remakes Super Bowl ad after AI cheese gaffe." Hallucination is in the nature of generative AI, given its foundation on statistical probability. You can reduce it and improve accuracy, but it never disappears. In a real application scenario, the critical question is not whether the model is perfect and error-free (that is impossible: the world keeps changing while your training data ages and loses relevance), but whether the model or AI product improves efficiency and reduces cost.

With a human in the loop, a person can validate the output of generative AI, as in creative work, information search, code bug fixing, and so on. But in an autonomous system, hallucination is a critical problem. In my view, currently available open-source and closed-source models cannot meet high-performance requirements. 80% good? No. 90% good enough? Still no. Compare biometrics (e.g., fingerprint, face recognition, speaker recognition): accuracy above 99.9% or better, with a false rejection rate below 0.00xx% or even lower. Of course, we cannot expect an LLM to reach that bar.
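To see why hallucination can be reduced but never eliminated, consider how a language model picks its next token: a softmax turns logits into a probability distribution, and every token in the vocabulary gets strictly positive probability. The toy numbers and three-word "vocabulary" below are made up purely for illustration, but the arithmetic is the standard softmax:

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-token logits: the model strongly prefers the
# "correct" token, yet plausible wrong tokens still get mass.
vocab = ["gouda", "cheddar", "brie"]   # illustrative vocabulary
logits = [4.0, 1.5, 0.5]               # made-up scores

probs = softmax(logits)
p_wrong = sum(p for tok, p in zip(vocab, probs) if tok != "gouda")

# Even for a confident model, p_wrong is strictly positive, so
# sampling will occasionally emit a wrong token, a hallucination.
print(f"P(wrong token) = {p_wrong:.4f}")
```

Better training sharpens the distribution and shrinks `p_wrong`, but because `exp()` is never zero, no amount of training drives it to exactly zero.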

Generative AI is a useful tool with decent performance. If you just try it casually, you will be stunned by its performance, because with high probability your test examples are already in the training data (not as exact matches, but as variations and versions). It is hard for most people to find a genuinely difficult case: remember that LLMs are trained on the whole web and on publicly available data and knowledge. Could any talented person memorize data at that scale? Only professional AI/ML researchers and engineers who work on LLM training and inference can reliably find challenging cases, because they know the models' weaknesses and how training and inference actually work.

In a real application scenario, fine-tuning an open-source or closed-source generative model on internal and proprietary data is a good solution. Of course, you are better off hosting the service internally. Do you really believe the companies building closed-source models never use your IP or keep your private data?