NHacker Next
Claude Opus 4.7 Model Card (anthropic.com)
bachittle 33 minutes ago [-]
So Opus 4.7 is measurably worse at long-context retrieval compared to Opus 4.6. Opus 4.6 scores 91.9% and Opus 4.7 scores 59.2%. At least they're transparent about the model degradation. They traded long-context retrieval for better software engineering and math scores.
freedomben 1 minute ago [-]
Agreed, I appreciate the transparency (and Anthropic isn't normally very transparent). It's also great to know because I will change how I approach long contexts knowing it struggles more with them.
koehr 1 hour ago [-]
This reads more like an advertisement for Mythos at first glance
Symmetry 40 minutes ago [-]
> The technical error that caused accidental chain-of-thought supervision in some prior models (including Mythos Preview) was also present during the training of Claude Opus 4.7, affecting 7.8% of episodes.

>_>

aliljet 1 hour ago [-]
Have they effectively communicated what a 20x or 10x Claude subscription actually means? And with Claude 4.7 increasing usage by 1.35x, does that mean a 20x plan is now really a 13x plan (no token increase on the subscription) or a 27x plan (more tokens granted to compensate for the higher compute cost) relative to Claude Opus 4.6?
computomatic 1 hour ago [-]
They have communicated it as 5x is 5 x Pro, and 20x is 20 x Pro (I haven’t looked lately so not sure if that’s changed).

They have also repeatedly communicated that the base unit (Pro allotment) is subject to change and does change often.

As far as I can tell, that implies there is no guarantee that those subscriptions get some specific number of tokens per unit of time. It’s not a claim they make.

DonsDiscountGas 5 minutes ago [-]
Definitely 13x, at least for now
100ms 1 hour ago [-]

    $ pbpaste | wc -w
    62508
    $ pbpaste | grep -oi mythos | wc -w
    331
    $ pbpaste | grep -oi opus | wc -w
    809
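Aside: since `grep -o` prints each match on its own line, `wc -l` is the slightly more robust counter than `wc -w` (a multi-word pattern would still be counted once). A small sketch of the same occurrence-counting idea, using inline text instead of the clipboard:

```shell
# Count case-insensitive occurrences of "opus" on stdin.
# grep -o emits one match per line; wc -l counts those lines.
printf 'Opus opus OPUS mythos\n' | grep -oi opus | wc -l
# three matches here
```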
STRiDEX 1 hours ago [-]
Dumb question, but why are chemical weapons always addressed as a risk with LLMs? Is the idea that they contain instructions for making chemical weapons, or that they would actively guide someone through the process?

Wouldn't there already be websites that contain that information? How is an LLM different, I guess, from some sort of Anarchist Cookbook thing?

Philpax 1 hour ago [-]
Both. There's the risk of them instructing a user on how to produce a known formulation (the Anarchist Cookbook scenario, as you say), which is irritating but not that problematic.

The bigger issue is that they are potentially capable of devising novel harmful formulations and guiding someone through the production process. That is, consider a world in which someone with malicious intent has access to a model as capable at chemistry/biology as Mythos is at offensive cybersecurity.

This is obviously limited by the fact that the models don't operate in the physical world, but there's plenty of written material out there.

rogerrogerr 43 minutes ago [-]
The world has been blessed by two connected things:

1. Smart people have economic opportunities that align them away from being evil

2. People who are evil tend not to be smart.

We're breaking both of these assumptions.

chrisweekly 11 minutes ago [-]
"Smart people have economic opportunities that align them away from being evil"

For some definition of evil, some of the time, ok. But as economic opportunities compound (looking at the behavior of the ultra-rich), it seems there's at least strong correlation in the other direction, if not full-on "root of all evil" causation.

Der_Einzige 34 minutes ago [-]
Good. This is how we will force the world to reckon with the isolated, the disgruntled, and "lone wolf" terrorist. Real "sigma males" actually exist, and when they decide "society has to pay" we are all worse off for it. If Ted Kaczynski (quintessential example of a real actual sigma) had been in his prime operating right now, he'd have mail-bombed NeurIPS and ICLR already. I'm not cool with being in crowds of AI professionals right now for physical security reasons given the extreme anti-AI sentiment that exists from nearly everyone outside of the valley: https://jonready.com/blog/posts/everyone-in-seattle-hates-ai...
rgbrenner 39 minutes ago [-]
In the same way that all coding docs are available publicly
CodingJeebus 1 hour ago [-]
WAG, but I wonder if a hijacked LLM could also assist with figuring out how to obtain the required materials, not just provide the recipe.
jmward01 1 hour ago [-]
Haiku not getting an update is becoming telling. I suspect we are reaching a point where the low end models are cannibalizing high end and that isn't going to stop. How will these companies make money in a few years when even the smallest models are amazing?
blixt 1 hour ago [-]
Isn't it pretty common for the smaller models to release a little while after the bigger ones, for all the big model providers?
jmward01 1 hour ago [-]
The last update for Haiku was in October, or in startup land, 10 years ago.
mvkel 1 hour ago [-]
It seems to be a rule that older models are more expensive than newer ones. The low-end models have a higher cost per token and worse output. I wonder if the move is to just have one model and quantize it if you hit compute constraints.
dkhenry 1 hour ago [-]
The Gemma models are at this point. A 31B model that can fit on a consumer card is as good as Sonnet 4.5. I haven't put it through as much on the coding front or tool calling as I have the Claude or GPT models, but for text processing it is on par with the frontier models.
make3 58 minutes ago [-]
absolutely not on par, you're smoking
dkhenry 22 minutes ago [-]
You make a compelling argument, but thankfully I have data to back up my anecdotal experience

This comparison shows them neck and neck https://benchlm.ai/compare/claude-sonnet-4-5-vs-gemma-4-31b

As Does this one https://llm-stats.com/models/compare/claude-sonnet-4-6-vs-ge...

And the pelican benchmark even shows them pretty close https://simonwillison.net/2026/Apr/2/gemma-4/ https://simonwillison.net/2025/Sep/29/claude-sonnet-4-5/

Also this isn't a fringe statement, you can see most people who have done an evaluation agree with me

lostmsu 55 minutes ago [-]
Just to be clear, did you notice the parent said 4.5?
cmorgan31 18 minutes ago [-]
They are also on par for a lot of classification tasks. I did have to actually use Gemma 4 and fine-tune it a bit, but that is part of the value add.
il-b 45 minutes ago [-]
Ironically, the website is down
NickNaraghi 16 minutes ago [-]
232 pages is bullshit. Longer than the Mythos system card? What are you hiding?
joeumn 1 hour ago [-]
I'm actually surprised at how it performed compared to 4.6 and also compared to mythos. Will be fun to use.
bicepjai 2 hours ago [-]
This card is a 272 page report. So now we are redefining names :)
albert_e 2 hours ago [-]
Does the model card fit in the model's context :)