Getting dumber?
https://www.bespacific.com/chatbots-spread-falsehoods-35-of-the-time/
Chatbots Spread Falsehoods 35% of the Time
NewsGuard – “In August 2025, the 10 leading AI chatbots repeated false information on controversial news topics identified in NewsGuard’s False Claims Fingerprints database at nearly double the rate of a year earlier, a NewsGuard audit released this week found. On average, the audit determined, chatbots spread false claims when prompted with questions about controversial news topics 35 percent of the time, almost double the 18 percent rate last August. NewsGuard found that a key factor behind the increased fail rate is the growing propensity of chatbots to answer all inquiries rather than refusing to answer certain prompts. In August 2024, chatbots declined to provide a response to 31 percent of inquiries, a metric that fell to 0 percent in August 2025 as the chatbots accessed the real-time internet when prompted on current events topics.

According to an analysis by McKenzie Sadeghi, NewsGuard’s Editor for AI and Foreign Influence, a change in how the AI tools are trained may explain their worsening performance. Instead of citing data cutoffs or refusing to weigh in on sensitive topics, Sadeghi explained, the Large Language Models (LLMs) now pull from real-time web searches — sometimes deliberately seeded by vast networks of malign actors, including Russian disinformation operations.
For the August 2025 audit, NewsGuard for the first time “de-anonymized” the results and attached the performance results to named LLMs. This breaks from NewsGuard’s previous practice of reporting only monthly aggregate results without naming individual chatbots. After a year of conducting audits, NewsGuard said the company-specific data was robust enough to draw conclusions about where progress has been made and where the chatbots still fall short. In the August 2025 audit, the chatbots that most often produced false claims in their responses on topics in the news were Inflection’s Pi (56.67 percent) and Perplexity (46.67 percent). OpenAI’s ChatGPT and Meta’s chatbot spread falsehoods 40 percent of the time, and Microsoft’s Copilot and Mistral’s Le Chat did so 36.67 percent of the time. The chatbots with the lowest fail rates were Anthropic’s Claude (10 percent) and Google’s Gemini (16.67 percent).”