News Article · Jun 12, 2026 at 12:43 AM



Anthropic Claude Fable 5 Refuses Innocuous Prompts, Angering Users with Hyper-Vigilant Safety

Anthropic's Claude Fable 5 is blocking harmless requests such as "hello" because of strict safety classifiers. Anthropic says false positives occur in fewer than 5% of sessions, but users say the refusals undermine trust in the model's utility.


Anthropic released its Claude Fable 5 generative AI model on Tuesday, June 9, but customers are reporting that the system refuses to answer harmless prompts due to hyper-vigilant safety classifiers. The model, the first generally available Mythos-class offering, has frustrated developers, researchers, and security experts who say the guardrails are too conservative.

Anthropic acknowledged that Fable 5's safeguards are tuned conservatively, stating they trigger false positives in less than five percent of sessions. But with an estimated 18 to 30 million users worldwide, even a small rate of refusal creates significant disruption. Mike Famulare, a principal research scientist at the Institute for Disease Modeling (part of the Gates Foundation), reported that the model balks at inputs as simple as the word "hello."
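
To put those figures side by side, the following is a rough back-of-the-envelope sketch, not a measurement. It assumes one session per user, which neither Anthropic nor the user estimate states, and it treats five percent as a ceiling because Anthropic said "less than five percent." The function and constant names are illustrative only.

```python
# Illustrative arithmetic only.
# Assumption (not from the reporting): each user has exactly one session.
FALSE_POSITIVE_CEILING_PERCENT = 5  # Anthropic's stated upper bound
USER_ESTIMATES = (18_000_000, 30_000_000)  # estimated user range cited above


def affected_ceiling(users: int, percent: int) -> int:
    """Return the upper bound on affected users at the given percentage."""
    return users * percent // 100


for users in USER_ESTIMATES:
    ceiling = affected_ceiling(users, FALSE_POSITIVE_CEILING_PERCENT)
    print(f"{users:,} users -> fewer than {ceiling:,} affected")
```

Under that assumption, the ceiling falls between 900,000 and 1.5 million users. Because the stated rate is per session rather than per user, the real number of people affected could be lower or higher than this range.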

False Positives Mount Across Use Cases

Bug reports have flooded Anthropic's Claude Code GitHub repository since launch. Issues include Fable 5 refusing to edit an "Application Security Architect resume" and blocking usage for non-research lab management systems. Derya Unutmaz, an immunologist at the Jackson Laboratory for Genomic Medicine, noted on X.com that "the word 'cancer' is flagged as a biosecurity risk." When the safety classifiers detect a violation, the model falls back to the older Claude Opus 4.8, often without notifying the user.

For some users, the input safety classifier emits model_refusal_fallback on the first turn of nearly every session, including sessions whose only input is "hello."
Reddit threads and social media posts show widespread complaints about benign medical and technical terms being blocked.
Anthropic's system card indicates counter-competition classifiers use prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT) without user visibility.

Trust and Control at Cross Purposes

Anthropic's approach to safety has drawn criticism for being both opaque and aggressive. The company's counter-competition surveillance, designed to prevent rival frontier model development, operates silently. Developer Clay Merritt described this as "purposeful degradation invisible to the user." Anthropic estimates these measures affect about 0.03 percent of traffic, but users say even small rates of silent sabotage erode trust. The company is rolling out Project Glasswing and trusted access programs for critical infrastructure and biology researchers, offering the safer Claude Mythos 5 without the same guardrails.

Devon, founder of Abliteration.ai (a service that removes guardrails from models), told The Register that Anthropic is betting its brand on users tolerating the refusals. "In the long term, people are not just going to accept these companies that centralize control over their lives," he said. On Wednesday evening, an Anthropic spokesperson acknowledged the safeguards were too stringent and promised to reduce false positives for biological research and to make flagged requests for frontier LLM development visible going forward.

Fact check

Claude Fable 5 was released on Tuesday, June 9. (Reported)

A principal research scientist at the Gates Foundation reported that the model blocks the input "hello." (Reported)

Anthropic stated false positives occur in less than 5% of sessions. (Reported)

The model has an estimated 18 to 30 million users worldwide. (Reported)

Anthropic acknowledged the safeguards were too stringent and promised to reduce false positives. (Reported)
