Despite this, the chatbot still produced problematic answers to a quarter of the questions. It went wrong most frequently on topics concerning nutrition and athletic performance.
This is because these topics attract a lot of conflicting advice online and little solid scientific evidence.
The biggest problems arose with open-ended questions (those requiring detailed explanations): 32% of the AI chatbot's answers were rated as highly problematic, compared with only 7% for closed-ended questions (those requiring only yes/no answers).
This distinction is important because most real-world health questions are open-ended. We don't typically ask a chatbot whether something is true or false.
Instead, we ask something like: "Which supplement is best for overall health?"
These kinds of questions elicit eloquent and convincing answers, but they are potentially dangerous.
When the researchers asked each chatbot to list ten scientific references, the median completeness score was only 40%. None of the chatbots managed to produce a completely accurate reference list in 25 trials.
The errors ranged from incorrect author lists to non-functioning links to recommendations of completely fictitious papers.
This is dangerous because references can look like evidence. When casual readers see a tidy list of citations, they are less likely to doubt the AI's answers.
Why AI chatbots often make mistakes
There's a simple reason why chatbots often give incorrect medical answers: their language models don't truly "know" anything.
An AI model simply predicts the statistically most likely next word based on its training data and the current context. It doesn't weigh evidence or make normative judgments.
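To make this concrete, here is a minimal sketch in Python of what "predicting the most likely next word" means. It uses a made-up scrap of training text and simple word counts; real chatbots use neural networks trained on billions of documents, but the underlying principle is the same: frequency in the training data, not truth, drives the output.

```python
# Toy illustration (not any chatbot's actual code) of next-word
# prediction: pick the word that most often followed the current
# word in the training data.
from collections import Counter, defaultdict

# Made-up training text for illustration only.
training_text = (
    "vitamin d supports bone health "
    "vitamin d supports immune function "
    "vitamin c supports immune function"
)

# Count which word follows each word in the training data.
follower_counts = defaultdict(Counter)
words = training_text.split()
for current_word, next_word in zip(words, words[1:]):
    follower_counts[current_word][next_word] += 1

def predict_next(word: str) -> str:
    """Return the most frequent follower of `word` in the training data."""
    followers = follower_counts[word]
    return followers.most_common(1)[0][0] if followers else "<unknown>"

print(predict_next("vitamin"))   # -> "d" (seen twice, vs "c" once)
print(predict_next("supports"))  # -> "immune" (seen twice, vs "bone" once)
```

Notice that the toy model would just as confidently "predict" whatever happened to be most common in its data, whether or not it is medically correct.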
AI summarizes information from peer-reviewed scientific articles, Reddit threads, health blogs, and social media debates alike.
That's also why the researchers didn't just ask the AI neutral questions.