Friday, May 29, 2026

Sounds like a programming error to me.

https://arstechnica.com/ai/2026/05/llms-believe-false-statements-even-after-explicit-warnings-that-theyre-false/

LLMs believe false statements even after explicit warnings that they’re false

Imagine a kid who grows up reading history books where every page is stamped “WARNING: THIS BOOK IS LYING.” You’d expect them to come away skeptical, or at least uncertain. New research on so-called “negation neglect” finds that LLMs in a roughly analogous situation don’t behave that way. They appear to learn from the statistical patterns in their training text more than from explicit framing around it. Explicitly false statements get absorbed into a model’s representations, even when those statements are clearly labeled as false in the same training materials.



No comments: