Anthropic Claude Fable 5 Refuses Innocuous Prompts, Angering Users with Hyper-Vigilant Safety
Anthropic's Claude Fable 5 is blocking harmless requests like 'hello' due to strict safety classifiers. Anthropic says false positives occur in less than 5% of sessions, but users say the refusals undermine trust in the model's utility.
Anthropic released its Claude Fable 5 generative AI model on Tuesday, June 9, but customers are reporting that the system refuses to answer harmless prompts due to hyper-vigilant safety classifiers. The model, the first generally available Mythos-class offering, has frustrated developers, researchers, and security experts who say the guardrails are too conservative.
Anthropic acknowledged that Fable 5's safeguards are tuned conservatively, stating they trigger false positives in less than five percent of sessions. But with an estimated 18 to 30 million users worldwide, even a small rate of refusal creates significant disruption. Mike Famulare, a principal research scientist at the Institute for Disease Modeling (part of the Gates Foundation), reported that the model balks at inputs as simple as the word "hello."
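To give a rough sense of scale, the figures above can be turned into a back-of-the-envelope upper bound. This is only a sketch: Anthropic's rate is quoted per session, not per user, and the sessions-per-user value below is a hypothetical assumption, not a number from the reporting. The function name and parameters are illustrative.

```python
def affected_sessions_upper_bound(users: int, rate_percent: int,
                                  sessions_per_user: int = 1) -> int:
    """Upper bound on sessions hitting a false positive.

    sessions_per_user is a hypothetical assumption (default: one session
    per user); the article only gives a per-session rate of "less than
    five percent" and an estimated user count.
    """
    total_sessions = users * sessions_per_user
    # Integer arithmetic avoids floating-point rounding on large counts.
    return total_sessions * rate_percent // 100


# Estimated user range from the article: 18 to 30 million.
low = affected_sessions_upper_bound(18_000_000, 5)
high = affected_sessions_upper_bound(30_000_000, 5)
print(f"{low:,}")   # 900,000
print(f"{high:,}")  # 1,500,000
```

Under that one-session-per-user assumption, a rate just below five percent would correspond to somewhere under 0.9 to 1.5 million affected sessions. The true figure depends on how many sessions each user actually runs and on how far below five percent the real rate sits.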
False Positives Mount Across Use Cases
Bug reports have flooded Anthropic's Claude Code GitHub repository since launch. Issues include Fable 5 refusing to edit an "Application Security Architect resume" and blocking work on non-research lab management systems. Derya Unutmaz, an immunologist at the Jackson Laboratory for Genomic Medicine, noted on X.com that "the word 'cancer' is flagged as a biosecurity risk." When the model's safety classifiers detect a violation, they fall back to the older Claude Opus 4.8, often without notifying the user.
- Input safety classifier emits model_refusal_fallback on the first turn of nearly every session for some users, including sessions with only the input "hello."
- Reddit threads and social media posts show widespread complaints about benign medical and technical terms being blocked.
- Anthropic's system card indicates counter-competition classifiers use prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT) without user visibility.
Trust and Control at Cross Purposes
Anthropic's approach to safety has drawn criticism for being both opaque and aggressive. The company's counter-competition surveillance, designed to prevent rival frontier model development, operates silently. Developer Clay Merritt described this as "purposeful degradation invisible to the user." Anthropic estimates these measures affect about 0.03 percent of traffic, but users say even small rates of silent sabotage erode trust. The company is rolling out Project Glasswing and trusted access programs for critical infrastructure and biology researchers, offering the safer Claude Mythos 5 without the same guardrails.
Devon, founder of Abliteration.ai (a service that removes guardrails from models), told The Register that Anthropic is betting its brand on users tolerating the refusals. "In the long term, people are not just going to accept these companies that centralize control over their lives," he said. On Wednesday evening, an Anthropic spokesperson acknowledged the safeguards were too stringent and promised to reduce false positives for biological research and to make flagged requests for frontier LLM development visible going forward.
Fact check
- Claude Fable 5 was released on Tuesday, June 9. (reported)
- A principal research scientist at the Gates Foundation reported that the model blocks the input 'hello.' (reported)
- Anthropic stated false positives occur in less than 5% of sessions. (reported)
- The model has an estimated 18 to 30 million users worldwide. (reported)
- Anthropic acknowledged the safeguards were too stringent and promised to reduce false positives. (reported)
Source reporting (3)
- The Register · It blocked us at 'hello!' Anthropic Fable 5 refusing innocuous prompts
- The New Stack · Fable 5: Guardrails and burn rate are annoying users, who say it’s still better than Opus 4.8
- The New Stack · The Anthropic leader who built Claude Code says he ditched prompting — now he just writes loops.