Scientists Invented a Fake Disease… How AI Made People Believe It Was Real
Scientists invented a fake disease. AI told people it was real
HCI Today summarized the key points
- This article describes an experiment showing how fake medical information spreads through AI chatbots and academic papers.
- A Swedish research team created a nonexistent eye disease called ‘bixonimania,’ published a fake paper, and checked whether an AI would learn it as if it were real.
- After the experiment, LLMs such as Bing Copilot, Gemini, ChatGPT, and Perplexity described the condition as though it were an actual disease, exposing the problem.
- The fake paper was cited in the work of some researchers, and one academic journal later retracted a paper because of the false references.
- This case shows that when both AI and people readily trust sources, false information can spread all the way into science and medicine.
This summary was generated by an AI editor based on HCI expert perspectives.
Why Read This from an HCI Perspective
This article shows that the problem with LLMs is not only that they produce ‘wrong answers’; it also reveals which formats of information users end up trusting more. In particular, trust can form easily from nothing more than sentences that look like they come from academic papers, an expert-like tone, and simple source attribution. The case makes clear how crucial information credibility, context, and interaction design are in HCI, and it helps both practitioners and researchers look not only at ‘accuracy,’ but also at ‘how people come to believe.’
CIT's Commentary
The key point in this case is closer to an interaction failure than to a gap in the LLM’s knowledge. Because users cannot see inside the model, they judge truthfulness from the tone and format shown on screen, and here the convincing medical writing style and paper-like structure actually reinforced the incorrect answer. In domains where safety matters, generating an answer is not enough; the response needs to be designed together with a way to reveal the evidence level and uncertainty of the information, along with pathways that let users intervene easily. Another interesting aspect is that the UX tools used to measure these problems can themselves be assisted by AI, and the more that happens, the more rigorous the evaluation process has to become. Ultimately, what matters is not only model performance but where and how people can stop, verify, and correct.
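To make the design suggestion above concrete, here is a minimal sketch of how a chat interface could attach evidence-level and uncertainty metadata to each response and decide what trust cue to show. This is an illustration only, assuming a structured response payload; the names (`AssistantResponse`, `EvidenceLevel`, `renderTrustCue`) are hypothetical and do not come from the article or any real product.

```typescript
// Hypothetical sketch: a response object that carries its own evidence metadata,
// so the UI can foreground uncertainty instead of relying on a confident tone.

type EvidenceLevel = "peer-reviewed" | "preprint" | "single-source" | "unverified";

interface SourceRef {
  title: string;
  url: string;
  level: EvidenceLevel;
}

interface AssistantResponse {
  text: string;         // the generated answer shown to the user
  sources: SourceRef[]; // citations the answer relies on
  confidence: number;   // model- or retrieval-derived score in [0, 1]
}

// Choose a trust cue and a user action to attach before rendering the answer.
function renderTrustCue(r: AssistantResponse): string {
  const weak =
    r.sources.length === 0 ||
    r.sources.some(s => s.level === "unverified" || s.level === "single-source");

  if (weak || r.confidence < 0.5) {
    // Low evidence: surface the uncertainty and offer a path to verify or flag.
    return "Low-evidence answer: sources are limited or unverified. Review or flag before relying on it.";
  }
  return "Answer supported by the cited sources; see references below.";
}
```

The point of the sketch is the division of labor: the model produces the text, but the interface owns the decision about how strongly to present it and what verification path to offer, which is exactly where the article locates the failure.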
Questions to Consider While Reading
- Q. When LLMs answer in high-risk domains like healthcare, to what extent and in what way should uncertainty be displayed so users don’t become overconfident?
- Q. What interface mechanisms would be most effective at reducing the problem of paper-like formats inflating trust?
- Q. When building UX measurement or evaluation tools with LLMs, how should the convenience gained through automation be balanced against research rigor?
This commentary was generated by an AI editor based on HCI expert perspectives.
Please refer to the original for accurate details.