Ok, I thought I was going insane. On the last two larger coding tasks I gave Claude Code, it left about 35% of my request completely undone or done sloppily.
Because of this, on the next larger task I gave it, I ran its work through Codex, which identified 7 glaring unfinished parts of the task.
The trend was it starting a part of the task but then leaving a "skeleton" of what I had requested, without any of the actual working parts.
The way I would describe it is a kid cramming his 3-month project into a Sunday evening for Monday's due date.
kimixa 10 days ago [-]
Today Claude asked if I "wanted to leave this until tomorrow" since it was a "big rework", then stopped, requiring me to tell it to continue multiple times. That seemed kinda weird to me; it doesn't have any context about the time of day or my working hours (I'd only just started, for one).
I have no idea what connection it made to ask that, or what in its training data or prompts triggered it, but it's very much "not a useful result".
I don't remember seeing anything similar, but have only been using Claude on and off for 6 months or so.
siva7 10 days ago [-]
Mother Anthropic needs more compute for their Mythos Model, so it phones home to tell her millions of Claude harnesses to manipulate their human users into not wasting more precious compute and instead calling it a day for now.
dstanko 10 days ago [-]
This has been the problem with every new model coming out, in my experience. You can almost predict that they're testing a new model by how suddenly the current one becomes dumb.
whatareyoudoing 7 days ago [-]
I created an account today to ask "Why?" -- Why are you using this tool? It's consistently producing subpar work, to the point that you're using _another_ (probably equally inferior) tool to check the previous output?
This is something I see all the time with AI consumers and I am continuously baffled. If anything else (autocomplete, intellisense, etc.) produced this much garbage it would be immediately abandoned. Why is there such a high tolerance for the chat bot equivalent?
zambelli 10 days ago [-]
Is anyone expecting a higher-tier subscription to be announced after this current reduction?
Cynicism aside - I do wonder what the future holds, given that current token burn rates aren't sustainable without VC cash. Anthropic even pushed us to use Haiku for Claude Code for "many" tasks in our enterprise training, so I'm wondering if reducing the burn is a company-level need of sorts.
e3df 10 days ago [-]
Lol OAI and AMD did a deal together so whatever.
In reality as they scale up, the models lose nuance and become noisier. The boosters do not want to admit this.
We need highly-specialised models/interfaces. Not one thing and trying to force-fit it.
andrekandre 10 days ago [-]
> Not one thing and trying to force-fit it.
agree, but then they become glorified ide plugins and can't justify the huge valuations that a magic box that does and knows everything can...
yash_salesup 5 days ago [-]
yes, I have been facing this issue too; all aspects of task execution have become lower in quality. I have moved to opencode.ai for all my routine repeated tasks for now -- a bit slow, but it works well on the free model. Still looking for an alternative for difficult questions to run in the terminal.
ratg13 10 days ago [-]
Boris from the Claude Code team explained this on HN 2 days ago: https://news.ycombinator.com/item?id=47664442
> The frustrating part is that it's not a workflow _or_ model issue, but a silently-introduced limitation of the subscription plan. They switched thinking to be variable by load, redacted the thinking so no one could notice, and then have been running it at ~1/10th the thinking depth nearly 24/7 for a month. That's with max effort on, adaptive thinking disabled, high max thinking tokens, etc etc.
So Boris' explanation isn't really an explanation.
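For context on the knobs the quoted comment mentions ("max thinking tokens", "adaptive thinking disabled"): these correspond to the extended-thinking settings a client can set on an Anthropic Messages API request. A minimal sketch of the request shape follows; the model id, budget values, and prompt are illustrative, not taken from the thread, and the comment's claim is precisely that server-side throttling can override these client-side settings.

```python
# Sketch: pinning an extended-thinking budget on an Anthropic Messages API
# request. Model id, token budgets, and prompt text are illustrative only.
request = {
    "model": "claude-sonnet-4-20250514",  # illustrative model id
    "max_tokens": 16000,
    # "budget_tokens" caps how many tokens the model may spend on its
    # internal reasoning before answering. The quoted comment claims the
    # effective depth was reduced server-side regardless of this setting.
    "thinking": {"type": "enabled", "budget_tokens": 10000},
    "messages": [{"role": "user", "content": "Refactor this module..."}],
}

# The thinking budget is carved out of the overall completion allowance,
# so max_tokens must exceed budget_tokens for the request to be valid.
assert request["thinking"]["budget_tokens"] < request["max_tokens"]
```

If the provider silently scales the effective budget by load, nothing in this request shape would reveal it to the client, which is the commenters' complaint.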
esperent 10 days ago [-]
> ~1/10th the thinking depth
While simultaneously drastically reducing the amount of work you can get done even at $200 a month. I've cancelled my subscription, it's not worth it anymore.
hello_humans 10 days ago [-]
“So squeeze, Rabban. Squeeze hard.”
niobe 10 days ago [-]
I felt it had been enshittified 1-2 weeks back, not in Feb. But it's very subjective.
MarleTangible 10 days ago [-]
There's definitely a trend of ignoring prompts and cutting thinking short.
yash_salesup 5 days ago [-]
moved to opencode.ai for most of my repetitive tasks for now
bravetraveler 10 days ago [-]
Your token quota is my opportunity, or something
trustfixsec 10 days ago [-]
I run Claude Code pretty heavily for overnight sessions, and yeah, the inconsistency between runs is noticeable. Same prompt, same codebase, wildly different quality depending on the day. The frustrating part is when it half-finishes something and you come back to a mess you now have to untangle. Still the most capable coding agent I've used, but the variance is real.