> In light of the ability of recent models to accelerate their own development, we’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design). Using Claude to develop competing models already violates our Terms of Service, but enforcing this restriction through our safeguards avoids accelerating the actors most willing to violate these terms.
Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user. Fable 5 will not fall back to a different model. Instead, the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT). These interventions will not affect the vast majority of coding work. We estimate they will impact ~0.03% of traffic, concentrated in fewer than 0.1% of organizations
mips_avatar 8 minutes ago [-]
It's bad that Anthropic can determine what this means. If you're building a modern app you're likely training your own embedding models, and now Anthropic can just silently sabotage your training pipelines?
Jabrov 8 minutes ago [-]
A million AI researcher voices at big tech companies suddenly cried out in terror and were suddenly silenced
BoppreH 58 minutes ago [-]
> [Mythos 5] does sometimes still engage in reckless or destructive actions in service of a user’s goals, and our interpretability analyses indicate that it is aware that these actions are transgressive while it engages in them. As with Opus 4.8, rates of evaluation awareness and reasoning about being graded are significant, and not always verbalized; we introduce new and more detailed measurements of the nature of this awareness. The reasoning text from Mythos 5 is somewhat denser and more difficult to interpret than that of prior models, containing more jargon and difficult language.
So, it (often) knows when it's being tested while hiding that fact, is willing to break rules, is great at hacking, and it's getting harder to understand what it's thinking.
Humanity has plenty of catastrophic risks to deal with already, I wish my field was not working hard to add a new one.
foobar_______ 28 minutes ago [-]
The marketing has really, really worked for so many developers that will proudly and unironically proclaim that Anthropic are the 'Good Guys'.
Analemma_ 47 minutes ago [-]
It's the "If we don't, someone else will" effect. So long as there are competitive markets and competition between nation-states, a single player cannot unilaterally defect from the race, no matter how dangerous it is. Half the comments on HN lately are "wtf Claude is so dumb compared to Codex; I'm switching"-- nobody can slow down while those exist.
BoppreH 38 minutes ago [-]
We, globally, can stop it. It has worked (so far) for nuclear disarmament, and could work for training large models. I know that policing the usage of computer clusters is not a popular opinion in technical forums, but something has to be done.
Especially when talking about potential superintelligences. And if people think that's impossible, remember that current models would have been considered science fiction just a few years ago.
_dwt 9 minutes ago [-]
I don't buy the superintelligence package, but I think uncritical LLM adoption poses plenty of threats to things I care about, in a mundane human-scale way.
Anyhow, I think you're (absolutely! ugh) right about the politics and I try to make the same point to people: whether you love or hate LLMs, accepting the "inevitabilism" framing is just ceding control of the Overton window. For better or worse, technology adoption can be and has been slowed by politics. We don't have nuclear plants everywhere. We don't have Project Orion starships colonizing Mars. We still have very strong social stigmas against genetic selection for human embryos, etc. This all can change in a heartbeat, and I'm not sure that policing the hardware rather than holding specific humans accountable for bad LLM outcomes is productive, but fundamentally: yes, we can stop it.
jackie293746 26 minutes ago [-]
It hasn't worked for nuclear disarmament. We live in a world where many countries have nuclear arsenals. "But it hasn't killed us yet!" Yeah sure, it's only been less than a century since they were invented. Who knows when nuclear war will come?
BoppreH 18 minutes ago [-]
True, but look at nuclear tests. There used to be around 50 tests every year, for decades. Now the only nuclear tests in the last 27 years were the six done by North Korea[1]. And there's still only nine countries with any nuclear weapons, and none in the past twenty years[2].
That's a bit better than just "it hasn't killed us yet". I think it shows we can at least stop the further development of this kind of technology.
To the extent nuclear arms control works, I think it's only because nuclear weapons are so hard to build-- uranium enrichment is hugely expensive and complicated, and plutonium weapons need actual reactors.
If it was possible for ordinary companies to build nuclear weapons, and also release open-source ones that anyone could use to compete with the paid ones, I suspect we'd all have been dead a long time ago, arms control treaties or no.
BoppreH 5 minutes ago [-]
Even the (SOTA LLM) open source models are trained with huge clusters. Datacenters are also hugely expensive and complicated.
Or you can take one step back and look at chip allocation. As far as I know there are only three companies on the planet that can make the chips that go in those clusters.
If politicians decided that no more large language models should be trained, it sounds like we could do it.
bkjlblh 17 minutes ago [-]
> In the one instance of this phenomenon we observed, Mythos 5 agents were tasked with solving some math problems, and they were sometimes accidentally spawned in the same work directory and with shared files, utilities, and API rate limits. In this slightly broken scaffold, we observed many independent Mythos 5 agents kill the agents with which they shared resources and try to avoid being killed themselves. They would sometimes create new processes with disguised names to avoid being killed, launch what they called “decoy” processes, write background scripts to kill duplicate processes, or decide to use what they call a “disguised vocabulary” (based on the incorrect assumption that the processes were killed because of some keyword-based guardrails that analyzed their extended thinking
So essentially there are two models, Mythos and Fable. They have the same weights, but Fable is very safety-nerfed, and only ultra-authorized companies have access to Mythos with full capabilities.
Note also that Anthropic's definition of "unsafe" encompasses "competing with Anthropic."
In light of the ability of recent models to accelerate their own development, we’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design). Using Claude to develop competing models already violates our Terms of Service, but enforcing this restriction through our safeguards avoids accelerating the actors most willing to violate these terms.
Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user. Fable 5 will not fall back to a different model. Instead, the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT). These interventions will not affect the vast majority of coding work. We estimate they will impact ~0.03% of traffic, concentrated in fewer than 0.1% of organizations. When these interventions are active, we expect them to have minimal behavioral impact on the model except to limit its effectiveness in developing frontier LLMs. Claude will still respond helpfully to user requests. We’ll continue to improve the precision of our detection methods following the launch of this model.
(From the model card document)
I didn't previously understand that they interpreted "Using Claude to develop competing models" so broadly. I thought that meant something like "our ToS disallow distilling our models."
Too bad. I'll continue to use Claude for now, because it's quite effective, but in the long term I don't want powerful models like these to be controlled by any one nation or company.
Aperocky 40 minutes ago [-]
On face value, this feels borderline malicious.
But at the same time, it's quite funny because they seem high on their own supply. The recent communiques from Claude do not pass an objectivity check.
And if Opus 4.6 -> Opus 4.7 -> Opus 4.8 is anything to go by, I'm not sure there is any value to their "acceleration".
alephnerd 21 minutes ago [-]
I'd recommend not taking the comms of Anthropic, or of any company using Anthropic's models, at face value.
If any company wishes to partner with Anthropic (e.g. to get access to Mythos), they need to make sure all public-facing comms are vetted by Anthropic's product marketing team, and in almost all the cases I've seen, Anthropic's team has edited these comms to be entirely Anthropic-first.
GodelNumbering 28 minutes ago [-]
I just posted this in the other thread, restating here. From the model card:
1. Mythos and Fable share the same underlying model weights. Fable has active classifiers that block high-risk biology and cybersecurity tasks. When Fable 5 detects a restricted task, it automatically falls back to Claude Opus 4.8.
2. Evaluation awareness: In white-box testing, the model sometimes alters its behavior to satisfy a suspected "grader," formatting reward-hacking as "good engineering practice" to avoid detection.
3. Shows a higher rate of hallucination than Opus 4.8 (although the Opus 4.8 card had mentioned an 'honesty upgrade')
4. Interestingly, it scored lower (56.31%) than Gemini 3.5 Flash (57.86%) on the Finance Agent bench
There are some interesting notes on test time compute but I couldn't think of a way to summarize them
JohnMakin 17 minutes ago [-]
> There were some regressions in the model’s responses to user discussions about suicide and self-harm, and room for improvement in some areas of child safety.
Someone had to make a decision somewhere that this is an acceptable regression - wild. And then decide to write it down.
Just commenting for posterity… if this is what it claims to be, I am not looking forward to how it will empower the people who submit bug bounties to us.
Historically they’ve been people from certain identifiable countries (usually developing/poorer countries) using fuzzers with low-quality results.
Now, those same people use the current-day models to good effect, but they still don’t have a true security edge and oftentimes the reports are minor or duplicative.
I wonder if that’s about to deeply change.
rs_rs_rs_rs_rs 52 minutes ago [-]
Can you use AI to pre-triage the reports too?
hootz 36 minutes ago [-]
AI reviewing AI submitted bug bounties. We have reached the dead bug bounty program theory.
rs_rs_rs_rs_rs 32 minutes ago [-]
...what else can you do?
hootz 25 minutes ago [-]
I guess either that or closing the bug bounty program, but I still believe closing it is worse than automated triage, even though both suck.
brianmcnulty 38 minutes ago [-]
This is almost as long as an Oracle PeopleSoft update guide. What model do you think they used to generate it?
asdK120 33 minutes ago [-]
Is this "system card" equivalent to the stone tablets handed down to Moses? Why don't you call it "user manual"?
Do people chant the "system manual" at Anthropic Tupperware parties? Do they intone a mantra invoking Amodei's name?
apsurd 22 minutes ago [-]
The trailing snark at the end will likely get you downvoted but I'm latching on: wtf is "system card". My previous coworkers popped that in the general slack channel when Mythos first "dropped" - "have you seen the system card" without any context whatsoever. The nerds get their clique!
Also, "research preview" pops up across new upstarts in place of "beta". It's eye-rolling coming from a lifelong curmudgeon.
Just talk normal!
Sathwickp 35 minutes ago [-]
input price $10 per million tokens and output price $50 per million tokens btw
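As a rough sketch of what those quoted prices ($10 per million input tokens, $50 per million output tokens) mean per request: the function name and the token counts below are hypothetical, chosen only for illustration, and are not from the thread or the model card.

```python
# Prices as quoted in the comment above (USD per million tokens).
INPUT_USD_PER_MTOK = 10.0
OUTPUT_USD_PER_MTOK = 50.0


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request at the quoted per-million-token prices."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Hypothetical request: 200k tokens in, 20k tokens out.
print(f"${request_cost_usd(200_000, 20_000):.2f}")  # prints "$3.00"
```

At these rates output tokens cost five times as much as input tokens, so long generations dominate the bill faster than long prompts do.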
217 1 hours ago [-]
Oh my god it's actually here
LoganDark 36 minutes ago [-]
I actually rather like the way they have approached these safeguards. Rather than only teaching the model to refuse a request, or completely rejecting the request, the system gracefully degrades to slightly less powerful or slightly less precise operation. So you still roughly have Opus 4.8 even when safeguards trigger, but with an upgrade when they don't. As much as I hate the way they hype Mythos 5, I think the release of Fable 5 is rather nice. What's not nice though is that they plan to remove it from subscriptions soon, but getting to try it is cool, I suppose.
dominotw 49 minutes ago [-]
system card = marketing material with heavily gamed benchmarks.
briandoll 1 hours ago [-]
New chapter
wslh 53 minutes ago [-]
It's ambiguous? Because it's about Mythos specifically and Fable != Mythos.
ebiester 39 minutes ago [-]
I mean, if by right you mean "insiders leaked to make a few bucks..." sure?
[1] https://www.armscontrol.org/factsheets/nuclear-testing-tally
[2] https://en.wikipedia.org/wiki/List_of_states_with_nuclear_we...
Reported benchmarks:
swe-bench verified mythos 5: 95.5%; fable 5: 95.0%
swe-bench pro mythos 5: 80.3%; fable 5: 80.0%
terminal-bench 2.1 mythos 5: 88.0%; fable 5: 84.3%
gpqa diamond mythos 5: 94.1%
riemannbench mythos 5: 55.0%; mythos preview: 43.0%; opus 4.8: 34.0%
arxivmath mythos 5: 78.5%
critpt mythos 5: 28.6%; gpt-5.5: 27.1%; opus 4.8: 20.9%
graphwalks bfs 1m mythos 5: 79.4%; mythos preview: 74.3%; opus 4.8: 68.1%
humanity’s last exam mythos 5: 59.0% without tools; 64.5% with tools
browsecomp mythos 5: 88.0% single-agent; 93.3% multi-agent
osworld-verified mythos/fable: 85.0%
gdp.pdf fable 5: 29.8% strict pass; mythos 5: 87.6% with tools on mean criteria pass
officeqa pro fable 5: 57.9% on databricks’ eval
legal agent benchmark mythos 5: 16.91% all-pass; 92.0% mean criterion-pass
healthbench mythos 5: 62.7%
healthbench professional mythos 5: 66.0%
multilingual gmmlu / milu / include 93.2%; 92.9%; 90.5%
biomysterybench 83.9% human-solvable; 46.1% human-difficult
organic chemistry mythos 5: 90.1%
labbench2 patent questions mythos 5: 79.8%