LLMs and LLM providers are massive black boxes. I get a lot of value from them and so I can put up with that to a certain extent, but these new "products"/features that Anthropic are shipping are very unappealing to me. Not because I can't see a use-case for them, but because I have 0 trust in them:
- No trust that they won't nerf the tool/model behind the feature
- No trust they won't sunset the feature (the graveyard of LLM-features is vast and growing quickly while they throw stuff at the wall to see what sticks)
- No trust in the company long-term. Both in them being around at all and them not rug-pulling. I don't want to build on their "platform". I'll use their harness and their models but I don't want more lock-in than that.
If Anthropic goes "bad" I want to pick up and move to another harness and/or model with minimal fuss. Buying in to things like this would make that much harder.
I'm not going to build my business or my development flows on things I can't replicate myself. Also, I imagine debugging any of this would be maddening. The value add is just not there IMHO.
EDIT: Put another way, LLM companies are trying to climb the ladder to be a platform, I have zero interest in that, I was a "dumb pipe", I want a commodity, I want a provider, not a platform. Claude Code is as far into the dragon's lair that I want to venture and I'm only okay with that because I know I can jump to OpenCode/Codex/etc if/when Anthropic "goes bad".
freedomben 25 minutes ago [-]
This echoes my thoughts exactly. I've tried to stay model-agnostic but the nudges and shoves from Anthropic continue to make that a challenge. No way I'm going that deep into their "cloud" services, unless it's a portable standard. I did MCP and skills because those were transferrable.
I also clearly see the lock-in/moat strategy playing out here, and I don't like it. It's classic SV tactics. I've been burned too many times to let it happen again if I can help it.
pc86 1 hours ago [-]
> - No trust that they won't nerf the tool/model behind the feature
To the contrary, they've proven again and again and again they'll absolutely do that the first chance they get.
rbalicki 1 hours ago [-]
You can lessen your dependence on the specific details of how /loop, code routines, etc. work by asking the LLM to do simpler tasks, and instead, having a proper workflow engine be in charge of the workflow aspects.
For example, this demo (https://github.com/barnum-circus/barnum/tree/master/demos/co...) converts a folder of files from JS to TS. It's something an LLM could (probably) do a decent job of, but 1. not necessarily reliably, and 2. you can write a much more complicated workflow (e.g. retry logic, timeout logic, adding additional checks like "don't use as casts", etc), 3. you can be much more token efficient, and 4. you can be LLM agnostic.
So, IMO, in the presence of tools like that, you shouldn't bother using /loop, code routines, etc.
spprashant 4 minutes ago [-]
I think it behooves us to be selective right now. Frontier labs maybe great at developing models, but we shouldn't assume they know what they are doing from a product perspective. They constantly throw several ideas on the wall and see what sticks (see Sora). They don't know how these things will play out long term. There is no reason to believe Co-work/Routines/Skills will survive 5 years from now. So it might just be better to not invest too much in ecosystem upfront.
mikepurvis 3 hours ago [-]
> I want to pick up and move to another harness and/or model with minimal fuss. Buying in to things like this would make that much harder.
Yes, I expect that is very much the point here. A bunch of product guys got on a whiteboard and said, okay the thing is in wide use but the main moat is that our competitors are even more distrusted in the market than we are; other than that it's completely undifferentiated and can be swapped out in a heartbeat for multiple other offerings. How do we do we persuade our investors we have a locked in customer base that won't just up-stakes in favour of other options or just running open source models themselves?
throwup238 2 hours ago [-]
I think they really knee capped themselves when they released Claude for Github integrations, which allows anyone to use their Claude subscription to run Claude Code in Github actions for code reviews and arbitrary prompts. Now they’re trying to back track that with a cloud solution.
JohnMakin 2 hours ago [-]
This is a similar sentiment I heard early on in the cloud adoption fever, many companies hedged by being “multi cloud” which ended up mostly being abandoned due to hostile patterns by cloud providers, and a lot of cost. Ultimately it didn’t really end up mattering and the most dire predictions of vendor lock in abuse didn’t really happen as feared (I know people will disagree with this, but specifically speaking about aws, the predictions vs what actually happened is a massive gap. note I have never and will never use azure, so I could be wrong on that particular one).
I see people making similar conclusions about various LLM providers. I suspect in the end it’ll shake out about the same way, the providers will become practically inoperable with each other either due to inconvenience, cost, or whatever. So I’ve not wasted much of my time thinking about it.
michaeldwan 1 hours ago [-]
I credit containerization, k8s, and terraform for preventing vendor lock in. Compute like EC2 or GCE are effectively interoperable. Ditto for managed services for k8s or Postgres. The new products Anthropic is shipping is more like Lambda. Vendor kool-aid lots of people will buy into.
What grinds my gears is how Anthropic is actively avoiding standards. Like being the only harness that doesn't read AGENTS.md. I work on AI infra and use different models all the time, Opus is really good, but the competition is very close. There's just enough friction to testing those out though, and that's the point.
JohnMakin 57 minutes ago [-]
I think there is lock-in, despite those things - for containerization, you're still a lot of the times beholden to the particular runtime that provider prefers, and whatever weird quirks exist there. Migrating can have some surprises. K8s, usually you will go managed there, and while they provide the same functionality, AKS != EKS != GKE at all, at least in terms of managing them and how they plug into everything else. In terraform, migrating from AWS provider to GCP provider will hold a lot of surprises for you for what looks like it should be the exact same thing.
My point was, I don't think it mattered much, and it feels like an ok comparison - cloud offerings are mostly the exact same things, at least at their core, but the ecosystem around them is the moat, and how expensive it is to migrate off of them. I would not be surprised at all if frontier AI model providers go much the same way. I'm pretty much there already with how much I prefer claude code CLI, even if half the time I'm using it as a harness for OpenAI calls.
fragmede 23 minutes ago [-]
There's a tiny amount of friction. Enough that I'll be honest and say that I spend the majority of my time with one vendor's system, but compared the to the fiction of moving from one cloud to another, eg AWS to GCP, the friction between opening Claude code vs codex is basically zero. Have an active subscription and have Claude.md say "read Agents.md".
Claude Code routines sounds useful, but at the same time, under AI-codepocalypse, my guess is it would take an afternoon to have codex reimplement it using some existing freemium SaaS Cron platform, assuming I didn't want to roll my own (because of the maintenance overhead vs paying someone else to deal with that).
robwwilliams 2 hours ago [-]
There are different level of who gets locked in. Almost every health care system in the USA is locked in to either an Epic/Oracle barrel or a Cerner barrel. I hope AI breaks this duopoly open soon.
phist_mcgee 24 minutes ago [-]
Let's see how it shakes out after Athropic and OpenAI fully stop subsidizing their plans, that may alter the calculus.
palata 3 hours ago [-]
> - No trust that they won't nerf the tool/model behind the feature
I actually trust that they will.
gardenhedge 3 hours ago [-]
Yeah, I build my workflows with two things in mind:
1) that AI will be more advanced in the future
2) that the AI I am using will be worse in the future
freedomben 22 minutes ago [-]
Same! I actually have some comments in my codebase now like this one:
# Note: This is inefficient, but deterministic and predictable. Previous
attempts at improvements led to hard-to-predict bugs and were
scrapped. TODO improve this function when AI gets better
I don't love it or even like it, but it is realistic.
dvfjsdhgfv 3 hours ago [-]
I believe the current game everybody plays is:
* make sure the model maxes out all benchmarks
* release it
* after some time, nerf it
* repeat the same with the next model
However, the net sum is positive: in general, models from 2026 are better than those from 2024.
snek_case 2 hours ago [-]
I guess there's a pretty clear incentive to nerf the current model right before the next model is about to come out.
chinathrow 2 hours ago [-]
Wouldn't that amount to fraud?
tomwojcik 2 hours ago [-]
Serious question, do we actually know what we're paying for? All I know is it's access to models via cli, aka Claude Code. We don't know what models they use, how system prompt changes or what are the actual rate limits (Yet Anthropic will become 1 trillion dollars company in a moment).
xienze 1 hours ago [-]
> We don't know what models they use, how system prompt changes or what are the actual rate limits (Yet Anthropic will become 1 trillion dollars company in a moment).
Not just that, but there’s really no way to come to an objective consensus of how well the model is performing in the first place. See: literally every thread discussing a Claude outage or change of some kind. “Opus is absolutely incredible, it’s one shotting work that would take me months” immediately followed by “no it’s totally nerfed now, it can’t even implement bubble sort for me.”
twobitshifter 33 minutes ago [-]
Did Apple slow down iPhones before the new release? I’m really asking. People used to say that and I can’t remember if it was proven or not?
DrewADesign 3 minutes ago [-]
Yeah, but they got sued over it and purportedly stopped. They claimed it was to protect battery health.
Suuuuuuure it was.
That said, I had way better experiences with old (but contemporary) Apple hardware than any other kind of old hardware.
2 hours ago [-]
ambicapter 1 hours ago [-]
Legally?
_blk 2 hours ago [-]
yup, after the token-increase from CC from two weeks ago, I'm now consistently filling the 1M context window that never went above 30-40% a few days ago. Did they turn it off? I used to see the Co-Authored by Opus 4.6 (1M Context Window) in git commits, now the advert line is gone. I never turned it on or off, maybe the defaults changed but /model doesn't show two different context sizes for Opus 4.6
I never asked for a 1M context window, then I got it and it was nice, now it's as if it was gone again .. no biggie but if they had advertised it as a free-trial (which it feels like) I wouldn't have opted in.
Anyways, seems I'm just ranting, I still like Claude, yes but nonetheless it still feels like the game you described above.
dr_kiszonka 1 hours ago [-]
The default prompt cache TTL changed from 1 hour to 5 minutes. Maybe this is what you are experiencing.
robwwilliams 2 hours ago [-]
Yep; second time in five months we have gone from 1 million back to 200 thousand.
_blk 1 hours ago [-]
hmm, I just reverted to 2.1.98 and now with /model default has the (1M context) and opus is without (200k) .. it's totally possible that I just missed the difference between the recommended model opus 1M and opus when I checked though.
troupo 1 hours ago [-]
They are now literally blaming users for using their product as advertised:
We defaulted to medium [reasoning] as a result of user feedback about Claude using too many tokens. When we made the change, we (1) included it in the changelog and (2) showed a dialog when you opened Claude Code so you could choose to opt out. Literally nothing sneaky about it — this was us addressing user feedback in an obvious and explicit way.
--- end quote ---
jeppester 24 minutes ago [-]
I always hated SEO because it was not an exact science - like programming was.
Too bad we've now managed to turn programming into the same annoying guesswork.
gbro3n 2 hours ago [-]
I have heard it said that tokens will become commodities. I like being able to switch between Open AI and Anthropics models, but I feel I'd manage if one of them disappeared. I'd probably even get by with Gemini. I don't want to lock in to any one provider any more than I want to lock in to my energy provider. I might pay 2x for a better model, but no more, and I can see that not being the case for much longer.
ahmadyan 2 hours ago [-]
> I'm not going to build my business or my development flows on things I can't replicate myself.
but you can replicate these yourself! i'm happy that ant/oai are experimenting to find pmf for "llm for dev-tools". After they figure out the proper stickyness, (or if they go away or nerf or raise prices, etc) you can always take the off-ramp and implement your own llm/agent using the existing open-source models. The cost of building dev-tools is near zero. it is not like codegen where you need the frontier performance.
nine_k 1 hours ago [-]
In this regard, the release of open-weight Gemma models that can run on reasonable local hardware, and are not drastically worse than Anthropic flagships, is quite a punch. An M2 Mac Mini with 32GB is about 10 months worth of Claude Max subscription.
Readerium 36 minutes ago [-]
In coding they are worse.
Chinese models (GLM, MiniMax) are better.
nine_k 24 minutes ago [-]
Anyway, there are a few model that are freely distributable, and that can reasonably run on consumer-grade local hardware.
It changes a number of things. Not all tasks require very high intelligence, but a lot of data may be sensitive enough to avoid sharing it with a third party.
chinathrow 4 hours ago [-]
Yeah so better to convert tokens into sw doing the job at close to zero costs running on own systems.
cush 3 hours ago [-]
You could so easily build your own /schedule. This is hardly a feature driving lock-in
tiku 2 hours ago [-]
I believe it doesn't matter, other companies will copy or improve it. The same happend with clawdbot, the amount of clones in a month was insane.
wookmaster 1 hours ago [-]
They're trying to find ways to lock you in
SV_BubbleTime 28 minutes ago [-]
Without getting too pedantic for no reason… I think it’s important to not call this an LLM.
This isn’t an LLM. It’s a product powered by an LLM. You don’t get access to the model you get access to the product.
An LLM can’t do a web search, an LLM can’t convert Excel files into something and then into PDF. Products do that.
I think it’s a mistake to say I don’t trust this engine to get me here, rather than it is to say I don’t trust this car. Because for the most part, the engine, despite giving you a different performance all the time is roughly doing the same thing over and over.
The product is the curious entity you have no control over.
sunnybeetroot 3 hours ago [-]
Isn’t that what LangChain/LangGraph is meant to solve? Write workflows/graphs and host them anywhere?
slopinthebag 2 hours ago [-]
They have to become a platform because that is their only hope of locking in customers before the open models catch up enough to eat their lunch. Stuff like Gemma is already good enough to replace ChatGPT for the average consumer, and stuff like GLM 5.1 is not too far off from replacing Claude/Codex for the average developer.
verdverm 4 hours ago [-]
I fully endorse building a custom stack (1) because you will learn a lot (2) for full control and not having Big Ai define our UX/DX for this technology. Let's learn from history this time around?
gritspants 3 hours ago [-]
Here's the problem I keep running into with AI and 'history'. We all know where this is going. We'll pick our winners and losers in the interim, but so far, this is a technology that mostly impacts tech practitioners. Most people don't care, in the sense that you're a taxi driver. Perhaps you have a manual transmission and the odd person comments on your prowess with it. No one cares. I see a bunch of boys making fools out of themselves otherwise.
dsf2aa 1 hours ago [-]
Theres something bizarre going on and many have completely lost their minds.
The funniest thing Ive heard is that now we have LLMs, Humanoid robots are on the horizon. Like wtf? People who jump to these conclusions were never deep thinkers in the first place. And thats OK, its good to signal that. So we know who to avoid.
alfalfasprout 18 minutes ago [-]
Yep. Trust is easy to lose, hard to earn. A nondeterministic black box that is likely buggy, will almost certainly change, and has a likelihood of getting enshittified is not a very good value proposition to build on top of or invest in.
Increasingly, we're also seeing the moat shrink somewhat. Frontier models are converging in performance (and I bet even Mythos will get matched) and harnesses are improving too across the board (OpenCode and Codex for example).
I get why they're trying to do that (a perception of a moat bloats the IPO price) but I have little faith there's any real moat at all (especially as competitors are still flush with cash).
andrewmcwatters 4 hours ago [-]
[dead]
crystal_revenge 1 hours ago [-]
This sounds like someone complaining about how Windows is a black box while ignoring the existence of Linux/BSD.
I'm currently hosting, on very reasonable consumer grade hardware, an LLM that is on par performance wise what every anyone was paying for about a year ago. Including all the layers in between the model and the user.
Llama.cpp serves up Gemma-4-26B-A4B, Open WebUI handles the client details: system prompt, web search, image gen, file uploading etc. With Conduit and Tailscale providing the last layer so I can have a mobile experience as robust as anything I get from Anthropic, plus I know how all the pieces works and can upgrade, enhance, etc to my hearts delight. All this runs from a pretty standard MBP at > 70 tokens/sec.
If you want to better understand the agent side of things, look into Hermes agent and you can start understanding the internals of how all this stuff is done. You can run a very competitive coding agent using modest hardware and open models. In a similar note, image/video gen on local hardware has come a long way.
Just like Linux, you're going to exchanging time for this level of control, but it's something anyone who takes LLMs seriously and has the same concerns can easily get started with.
Yet I still see comments like this that seem to complete ignore the incredible work in the open model community that has been perpetually improving and is starting to really be competitive. If you relax the "local" requirement and just want more performance from an LLM backend you can replace the llama.cpp part with a call to Kimi 2.5 or Minimax 2.7 (which you could feasibly run at home, not kimi though). You can still control all the additional part of the experience but run models that are very competitive with current proprietary SoTA offering, 100% under your control still and a fraction of the price.
andai 5 hours ago [-]
I'm a little confused on the ToS here. From what I gathered, running `claude -p <prompt>` on cron is fine, but putting it in my Telegram bot is a ToS violation (unless I use per-token billing) because it's a 3rd party harness, right? (`claude -p` being a trivial workaround for the "no 3rd party stuff on the subscription" rule)
This Routines feature notably works with the subscription, and it also has API callbacks. So if my Telegram bot calls that API... do I get my Anthropic account nuked or not?
joshstrange 4 hours ago [-]
Anthropic deserves to have this as the top comment on every HN post. It's absurd that they don't clarify this better and so many people are running around online saying the exact opposite from what their, confusing, docs say.
The Chilling Effect of this is real and it gets more and more frustrating that they can't or won't clarify.
throwup238 4 hours ago [-]
It’s also absurd that they’re doing their communication on a bunch of separate platforms like HN, Reddit, and Github with no coherent strategy or consistency as far as I can tell. Can’t I just get policy clarifications in my email like a normal business?
I downgraded my $200/mo sub to $20 this past week and I’m going to try out Codex’s Pro plans. Between the cache TTL (does it even affect me? No idea), changes in the rate limit, 429 rate limit HTTP status code during business hours, adaptive thinking (literally the worst decision they’ve ever made, as far as my line of work is concerned), dumb agent behavior silently creating batshit insane fallthroughs, clearly vibe coded harness/infrastructure, and their total lack of transparency, I think I’m done. It was fun while it lasted but I’m tired of paying for their mistakes in capacity planning and I feel like the big rug pull (from all three SOTA providers) is coming like a freight train.
sidrag22 3 hours ago [-]
I was "Claude only" for well over a year. Kinda crazy how they seem to be gaining a LOT of public attention the last few months, yet i see this type of sentiment from other devs/myself. for me it started with their opencode drama, and openai's decision to embrace opencode in response.
I didn't even know what opencode was prior to that drama, yet now here i am using opencode and a ton of crafted openai agents in my projects. Would love to have some claude agents in that mix, but i guess im stuck in Claude Code if i wanna even touch their models... I'd love to go back to just claude as i "trust" them more in a sorta less evil vibe manner, but if they are gonna prevent subscription usage to something people use to allow themselves more freedom, they gotta then close that gap with their own tools rather than pumping out stuff like this which scares me off given the past couple months.
I totally understand why they are cutting off 3pa access to stuff like openclaw, where the avg user is just a power user in comparison to avg claude user or whatever. I haven't kept up a ton with their opencode issues, but I just know i can't get behind a company actively trying to make my potential usage of tokens less optimized to keep me locked into their ecosystem.
Really just kinda hoping local models kill it all for devs after a few years, I'm not interested in perma relying on data centers for my workflow.
hayd 1 hours ago [-]
same. I use pi - and anthropic pricing change have made it usuable with Max. Codex works pretty much the same, not need to change development practice... apart from no 429s (so far).
stephbook 3 hours ago [-]
The ambiguity is intentional. Like Microsoft not banning volume licenses. They want to scare you, so you don't max out your subscription – which they sell at a loss.
Another comparison would be "unlimited storage", where "unlimited" means some people will abuse it and the company will soon limit the "unlimited."
pixel_popping 2 hours ago [-]
Literally yeah, the ambiguity is just so they can boycott anytime they want, people underestimate Anthropic too much, obviously they have insane amount of scrappers, bots... no comments online is made without their awareness and analyzed by a bunch of agents that then do prediction and for sure so much more. They know exactly what they are doing.
causal 2 hours ago [-]
Yeah in the span of a month or so we had:
- SDK that allows you to use OAuth authentication!
- Docs updated to say DO NOT USE OAUTH authentication unless authorized! [0]
- Anthropic employee Tweeting "That's not what we meant! It's fine for personal use!" [1]
- An email sent out to everyone saying it's NOT fine do NOT use it [2]
Wait we can't use claude -p around other tools? What is the point of the JSON SDK then? Anthropic is confusing here, ugh.
edit: And specifically i'm making an IDE, and trying to get ClaudeCode into it. I frankly have no clue when Claude usage is simply part of an IDE and "okay" and when it becomes a third party harness..
cortesoft 4 hours ago [-]
I was pretty sure that claude -p would always be fine, but I looked at the TOS and it is a bit unclear.
It says in the prohibited use section:
> Except when you are accessing our Services via an Anthropic API Key or where we otherwise explicitly permit it, to access the Services through automated or non-human means, whether through a bot, script, or otherwise.
So it seems like using a harness or your own tools to call claude -p is fine, AS LONG AS A HUMAN TRIGGERS IT. They don’t want you using the subscription to automate things calling claude -p… unless you do it through their automation tools I guess? But what if you use their automation tool to call your harness that calls claude -p? I don’t actually know. Does it matter if your tool loops to call claude -p? Or if your automation just makes repeated calls to a routine that uses your harness to make one claude -p call?
It is not nearly as clear as I thought 10 minutes ago.
Edit: Well, I was just checking my usage page and noticed the new 'Daily included routine runs' section, where it says you get 15 free routine runs with your subscription (at least with my max one), and then it switches to extra usage after that. So I guess that answers some of the questions... by using their routine functionality they are able to limit your automation potential (at least somewhat) in terms of maxing out your subscription usage.
Possibly, though at first i was entirely focusing (and still am) on Claude Code usage. Given that CC had an API, i figured its own SDK would update faster/better/etc to new Claude features that Anthropic introduces. I'm sure ACP is a flexible protocol, but nonetheless i was just aiming for direct Claude integration.. and you know, it's an official SDK, seemed quite logical to me.
It would be absurd to me if the same application is somehow allowed via ACP but not via official SDK. Though perhaps the official SDK offers data/features that they don't want you to use for certain scenarios? If that were they case though it would be nice if they actually published a per-SDK-API restrictions list.
They’re shooting themselves in the foot with these dumb restrictions.
taytus 4 hours ago [-]
They are not dumb restrictions. They just don't have the compute. That is the dumb part. Dario did not secure the compute they need so now they are obviously struggling.
joshstrange 4 hours ago [-]
The restrictions are dumb not because they're lower than any of us want them to be, but because they're unclear. Every time Claude comes up on Hacker News, someone asks this question. And every time people chime in to agree that they also are unclear or someone weighs in saying, no, it's totally clear, while proceeding not to point at any official resource and/or to "explain" the rules in a that is incompatible with official documentation.
There's another part that's bullshit: If you've paid for an annual subscription, for a given number of tokens, welp, now you're getting fewer tokens. They've decreased the limits mid-subscription. How is it not bait-and-switch to pay for something for a year only to have something else delivered?
taytus 3 hours ago [-]
You are arguing something different. My point is that they must apply these restrictions. Do I think they could have calculated their growth a little better? Yes, of course, but hindsight is 20/20.
joshstrange 2 hours ago [-]
We might be talking past each other, I promise I'm not just trying to argue.
> My point is that they must apply these restrictions.
I fully understand and respect they need restrictions on how you can use your subscription (or any of their offerings). My issue is not there there _are_ restrictions but that the restrictions themselves are unclear which leads to people being unsure where the line is (that they are trying not to cross).
Put simply: At what point is `claude -p` usage not allowed on a subscription:
- Running `claude -p` from the CLI?
- Running `claude -p` on a Cron?
- Running `claude -p` as a response to some external event? (GH action, webhook, etc?)
- Running `claude -p` when I receive a Telegram/Discord/etc message (from myself)?
Different people will draw the line in different places and Anthropic is not forthcoming about what is or is not allowed. Essentially, there is a spectrum between "Running claude by hand on the command line" and "OpenClaw" [0] and we don't know where they draw the line. Because of that, and because the banning process is draconian and final with no appeals, it leads to a lot of frustration.
[0] I do not use OpenClaw nor am I arguing it should be allowed on the subscription. It would be nice if it was but I'm not saying it should be. I'm just saying that OpenClaw clearly is _not_ allowed but `claude -p` wouldn't be usable at all with a subscription if it was completely banned so what can it (safely) be used for?
dgellow 4 hours ago [-]
Their growth over the past months has been more than insane. It’s completely expected they don’t have the compute. You don’t have infinite data centers around
taytus 3 hours ago [-]
Like or not, openai isn't having the same compute strain, meaning this was predictable.
comboy 3 hours ago [-]
Unrelated, but Claude was performing so tragically last few days, maybe week(s), but days mostly, that I had to reluctantly switch. Reluctantly because I enjoy it. Even the most basic stuff, like most python scripts it has to rerun because of some syntax error.
The new reality of coding took away one of the best things for me - that the computer always just does what it is told to do. If the results are wrong it means I'm wrong, I made a bug and I can debug it. Here.. I'm not a hater, it's a powerful tool, but.. it's different.
bluegatty 1 hours ago [-]
Codex with 5.4 xhigh. It's a bad communicator but does the job.
pacha3000 3 hours ago [-]
I'm the first to be tired of everyone, for every model, that says "uuuh became dumber" because I didn't believe them
... until this week! Opus is struggling worse than Sonnet those last two weeks.
saghm 27 minutes ago [-]
Forget the agent itself being dumber: right now I'm getting an "API error: usage limit exceeded" message whenever I try anything despite my usage showing as 26% for the session limit and 8% for the week (with 0/5 routines, which I guess is what this thread is about). This is with the default model and effort, and Claude Code is saying I need to turn on extra usage for it to work. Forget that, I just canceled my subscription instead.
There's utility in LLMs for coding, but having literally the entire platform vibe-coded is too much for me. At this point, I might genuinely believe they're not intentionally watering anything down, because it's incredibly believable that they just have no clue how any of it works anymore.
girvo 43 minutes ago [-]
My favourite was, Opus 4.6 last night (to be fair peak IST time, late afternoon my time), the first prompt with a small context: jams a copy-pasted function in between a bunch of import statements, doesn't even wire up it's own function and calls it done. Wild, I've not seen failure states like that since old Sonnet 4
jpcompartir 1 hours ago [-]
Likewise, I foolishly assumed everybody else was just doing it wrong.
But this week I've lost count of the times I've had to say something along the lines of:
"Can you check our plan/instructions, I'm pretty sure I said we need to do [this thing] but you've done [that thing]..."
And get hit with a "You're absolutely right...", which virtually never happened for me. I think maybe once since Opus 4-6.
qingcharles 28 minutes ago [-]
Is it? Or is it the task you're trying to do? Opus 4.6 has been staggeringly good for me this last week, both inside Claude Code and through Antigravity until I used up my quota.
combyn8tor 30 minutes ago [-]
In my experience Opus and Claude have declined significantly over the past few weeks. It actually feels like dealing with an employee that has become bored and intentionally cuts corners.
comboy 2 hours ago [-]
Pretty reassuring to hear that. I was skeptical too, there's a lot of variables like some crap added to memory specific skill or custom instructions interfering with the workflow and what not. But now it was like a toddler that consumes money when talking.
timacles 39 minutes ago [-]
It’s quite an interesting business model actually that the worse it performs to a degree the more money it makes you because of the token churn
Eldodi 4 hours ago [-]
Anthropic is really good at releasing features that are almost the same but not exactly the same as other features they released the week before
masto 47 minutes ago [-]
So management can cancel all of last week’s projects when they told us all we had to be using skills because the CEO read about them in the in flight magazine. Routines are the future, baby. DevOps already made a big announcement that they’re centralizing the Routines Hub. If you can’t keep up, we’ll get someone who says they can.
dymk 4 hours ago [-]
7 days is long enough for work to leave the context window, hence…
titzer 11 minutes ago [-]
Just wait until they get into the phase where they're big enough that they're eating all the baby startups and have to pick winners and losers amongst the myriad of overlapping features while also having the previous baby startups they acquired crank out new features.
We're watching a speed run of growthism, folks.
tclancy 4 hours ago [-]
And or things I’ve spent a bunch of time building already. And naming them the same. I should have trademarked “dispatch”!
dbish 3 hours ago [-]
you're telling me dispatchagents.ai :) (open to new names if anyone has cool ones, didn't expect anthropic to start using dispatch with their agents, naming is way too hard)
spelunker 4 hours ago [-]
> In the Desktop app, click New task and choose New remote task; choosing New local task instead creates a local Desktop scheduled task, which runs on your machine and is not a routine.
Oh uh... ok then.
minimaxir 5 hours ago [-]
Given the alleged recent extreme reduction in Claude Code usage limits (https://news.ycombinator.com/item?id=47739260), how do these more autonomous tools work within that constraint? Are they effectively only usable with a 20x Max plan?
EDIT: This comment is apparently [dead] and idk why.
giancarlostoro 3 hours ago [-]
I've been talking to friends about this extensively, and read all sorts of different social media posts on X where people deep dove things (I'm at work so I don't have any links handy - though I did submit one on HN, grain of salt, unsure how valid it is but it was interesting: https://news.ycombinator.com/item?id=47752049 ).
I think the real issue stems from the 1 Million token context window change. They did not anticipate the amount of load it would give you. That first few days after they released the new token window, I was making amazing things in one single session from nothing, to something (a new .NET based programming language inspired by Python, and a Virtual Actor framework in Rust). I think since then they've been trying too many things to tweak things, whilst irritating their users.
They even added a new "Max" thinking mode, and made "High" the old medium, which is ridiculous because you think you're using "High" but really you're not. There's a hidden config file to change their terrible defaults to let Claude be smarter still, and apparently you can toggle off the 1M tokens.
I think the real fix, and I'm surprised nobody there has done this yet, is to let the user trim down their context window.
Think about it, you used to have what? 350k tokens or so? Now Claude will keep sending your prompt from 30 minutes ago that's completely irrelevant to the back-end, whereas 3 months ago it would have been compacted by now.
Others have noted that similar prompting for some ungodly reason adds tens of thousands of extra garbage tokens (not sure why).
Edit looks like someone figured out that if you downgrade your version of Claude Code and change one single setting it unruins Claude:
Yea, I've realized that if I stay under 200k tokens I basically don't have usage issues any more.
A bit annoying, but not the end of the world.
dacox 3 hours ago [-]
Yeah, I have been seeing lots of comments, tweets, etc, but given everything I have learned about these models - i do not think the change to 1M was innocuous. I'm not sure what they've claimed publicly, but I'm fairly certain they must be doing additional quantization, or at minimum additional quantization of the KV cache. Plus, sequence length can change things even when not fully utilized. I had to manually re-enable the "clear context and continue" feature as well.
giancarlostoro 3 hours ago [-]
I used the heck out of it when it was announced, and it felt like I was using one of the best models I've ever used, but then so were all of their other customers, I don't think they accounted for such heavy load, or maybe follow up changes goofed something up, not sure. Like I said, the 1M token, for the first few days allowed me to bust out some interesting projects in one session from nothing to "oh my" in no time.
I'm thinking they should go back to all their old settings and as a user cap you at their old token limit, and ask you if you want to compact at your "soft" limit or burst for a little longer, to finish a task.
AI race to the bottom is a debt game now. Once the party is over somebody will have to pay the bill.
timacles 29 minutes ago [-]
It’s going to be crazy with the explanation they come up with why the us public has to pay to bail out AI for national security.
In a way, it’s true if china has superior AI then it’s dominance over US will materialize. But it’s not hard to see how this scenario is being used to essential lie and scam into trillions of debt.
Its interesting how the cutthroat space of big tech has manifested into an incidious hyper capitalist system where disrupting a system is it’s primary function. The system in this case is world order and western governments
breakingcups 4 hours ago [-]
You seem to be vouched for now, no longer dead for me.
minimaxir 4 hours ago [-]
Hmm, I can't edit the original comment to retract that edit either. Either my account is flagged for something or HN is being weird.
TacticalCoder 4 hours ago [-]
Everything looks good to me: you don't look like you have a flagged account (but then I don't work for HN).
cedws 2 hours ago [-]
This is the beginning of AI clouds in my estimation. Cloud services provide needed lock-in and support the push to provide higher level services over the top of models. It just makes sense, they'll never recoup the costs on just inference.
ctoth 5 hours ago [-]
You'd think that if they were compute-limited ... Trying to get people to use it less ... The rational thing to do would be to not ship features that will use more compute automatedly? Or does this use extra usage?
dpark 2 hours ago [-]
They are more worried about building a moat than anything else. They want people building integrations that are difficult to undo so that they lock into the platform.
lostmsu 1 hours ago [-]
> They want people building integrations that are difficult to undo so that they lock into the platform.
Ironically, they are now playing against their own models that can relatively easily build wrappers around any API shape into any other API shape.
whicks 5 hours ago [-]
I would imagine that this sort of scheduling allows them to have more predictable loads, and they may be hoping that people will schedule some of their tasks in “off hours” to reduce daytime load.
andai 5 hours ago [-]
It also beats OC's heartbeat where it auto-runs every 30 minutes and runs a bunch of prompts to see if it actually needed to run or not.
pkulak 4 hours ago [-]
Man, this just bit me too. I started playing with OC over the weekend (in a VM), and the spend was INSANE even though I wasn't doing anything. I don't see this as very useful as an "assistant" that wanders around and anticipates my needs. But I do like the job system, and the ability to make skills, then run them on a schedule or in response to events. But when I looked into what it was doing behind my back, 48 times a day it was packaging up 20K tokens of silly context ("Be a good agent, be helpful, etc, for 30 paragraphs"), shipping it off to the model, and then responding with a single HEARTBEAT_OK.
Luckily you can turn if off pretty easily, but I don't know why it's on by default to begin with. I guess holdover from when people used it with a $20 subscription and didn't care.
pletnes 4 hours ago [-]
Also you can schedule it a bit off. Every hour? Delay it a few seconds. Can’t do that with a chat message. Also, batch up a bunch of them, maybe save some compute that way? Latency is not an issue.
ctoth 5 hours ago [-]
I thought about that but I'm pretty sure that if the backlog is automatically clean and I don't need to run my skill for that when I start up in the morning that just means I can do the next task I would have done which will probably use Claude Code.
Your own, personal, Jevons.
iBelieve 5 hours ago [-]
Max accounts get 15 daily runs included, any runs above that will get billed as extra usage.
AlexCoventry 4 hours ago [-]
I don't think "usage" is exactly the metric they're going for, more like "usage in line with our developmental strategy." Transcripts of people using Claude to write code are probably far more valuable to them than transcripts of OpenClaw trying to set up a calendar invite.
fgkramer 3 hours ago [-]
I mean, they don’t train on your data unless you have the setting enabled.
Do you really think they are reading your prompts at all?
Free inference providers sure, but Anthropic?
dockerd 4 hours ago [-]
It's how they can lock more users into their eco-system.
eranation 4 hours ago [-]
I've been using it for a while (it was just called "Scheduled", so I assume this is an attempt to rebrand it?)
It was a bit buggy, but it seems to work better now. Some use cases that worked for me:
1. Go over a slack channel used for feedback for an internal tool, triage, open issues, fix obvious ones, reply with the PR link. Some devs liked it, some freaked out. I kept it.
2. Surprisingly non code related - give me a daily rundown (GitHub activity, slack messages, emails) - tried it with non Claude Code scheduled tasks (CoWork) not as good, as it seems the GitHub connector only works in Claude Code. Really good correlation between threads that start on slack, related to email (outlook), or even my personal gmail.
I can share the markdowns if anyone is interested, but it's pretty basic.
Very useful, (when it works).
twobitshifter 22 minutes ago [-]
It seemed OpenClaw is just Pi with Cron and hooks, and it seems like this is just Claude Code with Cron and hooks. Based on the superiority of Pi, I would not expect this to attract any one from OpenClaw, but it will increase token usage in Claude Code.
mellosouls 5 hours ago [-]
Put Claude Code on autopilot. Define routines that run on a schedule, trigger on API calls, or react to GitHub events...
We ought to come up with a term for this new discipline, eg "software engineering" or "programming"
avaer 4 hours ago [-]
Setting up your agent. This part doesn't deserve a name; there is no programming or engineering or really much thinking involved.
They support much of the same triggers and come with many additional security controls out of the box
eranation 2 hours ago [-]
+1 for that, having that said, because GH agentic workflows require a bit more handholding and testing to work, (and have way more guardrails, which is great, but limiting), and lack some basic connectors (for example - last time I tried it, it had no easy slack connector, I had to do it on my own). This is why I'm moving some of the less critical gh-aw (all the read only ones) to Claude Routines.
Why have I not heard of this? Was looking for a way to integrate LLM CLI's to do automated feature development + PR submission triggered by Github issues, seems like this would solve it.
eranation 26 minutes ago [-]
Built in Co-Pilot I believe can do this better than gh-aw (or a click away).
Cursor has that too by the way (issue -> remote coding session -> PR -> update slack)
eranation 29 minutes ago [-]
If anyone from anthropic reads it. I love this feature very much, when it works. And it mostly doesn't.
The main bugs / missing features are
1. It loses connection to it's connectors, mostly to the slack connector. It does all the work, then says it can't connect to slack. Then when you show it a screenshot of itself with the slack connector, it will say, oh, yeah, the tools are now loaded and does the rest of the routine.
2. ability to connect it to github packages / artifactory (private packages) - or the dangerous route of allowing access to some sort of vault (with non critical dev only secrets... although it's always a risk. But cursor has it...)
3. the GitHub MCP not being able to do simple things such as update release markdown (super simple use case of creating automated release notes for example)
You are so close, yet so far...
Terretta 6 minutes ago [-]
It's remarkable how often it refuses to introspect but a SCREENSHOT of itself and suddenly "yeah this works fine".
This happens in all their UIs, including, say, Claude in Excel, as well.
didn’t we have several antitrust cases where a vendor used its monopoly to disadvantage rivals? did not anthropic block openclaw?
Someone1234 4 hours ago [-]
They did not.
You can still use OpenClaw on their API pricing tier as much as you want. What they did is not allow subscriptions to be used to power automated third-party workloads, including OpenClaw.
Now, is their messaging around this confusing? Absolutely. The whole thing has been handled shambolically. Everyone knows that they lack the compute to keep up, and likely have lower margins on subscriptions than API; but they cannot just say that because investors may be skittish.
dmix 5 hours ago [-]
How is Anthropic a monopoly? The market is barely even fully developed and has multiple large and small competitors
andai 5 hours ago [-]
It's not blocked, you just can't use the Claude-only subscription endpoint with unauthorized 3rd party software. (You can use it via the regular API (7x more expensive) and pay per token just fine.)
...Except now you sorta-kinda can: now they auto-detect 3rd party stuff and bill you per-token for it?
The reason someone would use this vs. third-party alternatives is still the fact that the $200/mo subscription is markedly cheaper than per-token API billing.
Not sure how this works out in the long term when switching costs are virtually zero.
> Not sure how this works out in the long term when switching costs are virtually zero.
All these not really helpful, but vendor specific, "bonuses" sounds like a way to try to lock people in, to try to raise the switching cost.
I'm using, on purpose, a simple process so that at any time I can switch AI provider.
sminchev 3 hours ago [-]
Everything is big race! Each company is trying to do as much as possible, to provide as many tools as possible, to catch the wave and beat the concurrency. I remember how Antropic and OpenAI made releases in just 10-15 minutes of difference, trying to compete and gain momentum.
And because they use AI heavily, they produce new product every week. So fast, that I have no time to check, does it worth or not.
This one looks interesting. I have some custom commands that I execute manually weekly, for monitoring, audits, summary, reports.
It it can send reports on email, or generate something that I can read in the morning with my coffee, or after I finish with it ;) it might be a good tool.
The question is, do I really want to so much productive? I am already much better in performance with AI, compared with the 'old school' way...
Everything is just getting to much for me.
woeirua 1 hours ago [-]
I don't get the use case for these... Their primary customers are enterprises. Are most enterprises happy with running daily tasks on a third party cloud outside of their ecosystem? I think not.
So who are they building these for?
dbbk 48 minutes ago [-]
Not really any different to GitHub Actions
dispencer 4 hours ago [-]
This wild, one of the pieces I was lacking for a very openclaw-esque future. Now I think I have all the mcp tools I need (github, linear, slack, gmail, querybear), all the skills I need, and now can run these on a loop.
Are they going to mirror every tool software engineers were used to for decades, but in a mangled/proprietary form?
I think to become really efficient they'll have to invent new programming language to eliminate all the ambiguity and non-determinism. Call it "prompt language", with ai-subroutines, ai-labels and ai-goto.
causal 2 hours ago [-]
Haven't Github-triggered LLMs already been the source of multiple prompt injection attacks? Seems bad.
vessenes 5 hours ago [-]
This is one of the best features of OpenClaw - makes sense to swipe it into Claude Code directly. I wonder if Anthropic wants to just make claude a full stand-in replacement for openclaw, or just chip away at what they think the best features are, now that oAI has acquired.
mkw5053 5 hours ago [-]
What are some of the best use cases you've found? I have some gh actions set up to call claude code, but those have already been possible.
taw1285 4 hours ago [-]
I have a small team of 4 engineers, each of us is on the personal max subscription plan and prefer to stay this way to save cost.
Does anyone know how I can overcome the challenge with setting up Routines or Scheduled Tasks with Anthropic infra in a collaborate manner: ie: all teammates can contribute to these nightly job of cleaning up the docs, cleaning up vibe coding slops.
hallway_monitor 3 hours ago [-]
My team was doing this until recently but I think in February, Anthropic made team accounts available for subscription instead of API billing. Assuming that is the cost you mentioned.
srid 4 hours ago [-]
I just used this to summarize HN posts in last 24 hours, including AI summaries.
Seems like it only supports x86_64. It would be nice if they offered a way to bring your own compute, to be able to work on projects targeting arm64.
tills13 3 hours ago [-]
> react to GitHub events from Anthropic-managed cloud infrastructure
Oh cool! vendor lock-in.
egamirorrim 4 hours ago [-]
I wish they'd release more stuff that didn't rely on me routing all my data through their cloud to work. Obviously the LLM is cloud based but I don't want any more lock-in than that. Plus not everyone has their repositories in GitHub.
This is massive. Arguably will be the start of the move to openclaw-style AI.
I bet anthropic wants to be there already but doesn't have the compute to support it yet.
dpark 2 hours ago [-]
What’s massive about cron jobs and webhooks? I feel like I’m missing something. This is useful functionality but also seems very straightforward.
jcims 4 hours ago [-]
Is there a consensus on whether or not we've reached Zawinski's Law?
senko 4 hours ago [-]
I've had an AI assistant send me email digests with local news, and another watching a cron job, analyzing the logs and sending me reports if there's any problem.
I'd say that counts as yes.
(For clarity: neither are powered by Claude Code Routines. Rather, Claude Code coded them and they're simple cron jobs themselves.)
verdverm 4 hours ago [-]
TIL email is what I'm missing in my personal development (swiss army) tool
nico 5 hours ago [-]
Nice, could this enable n8n-style workflows that run fully automatically then?
outofpaper 5 hours ago [-]
Yes but much less efficiently. Having LLMs handle automation is like using a steam engine to heat your bath water. It will work most of the time but it's super inefficient and not really designed for that use and it can go horribly wrong from time to time.
meetingthrower 4 hours ago [-]
Correct. But the llm can also program you the exact automation you want! Much more efficiently than gui madness with N8N. And if you want observability just program that too!
meetingthrower 4 hours ago [-]
Already very possible and super easy if you do a little vibecoding. Although it will hit the api. Have a stack polling my email every five minutes, classifying email, taking action based on the types. 30 minute coding session.
desireco42 4 hours ago [-]
I think they are using Claude to come up with these and they will bringing one every second day... In fact, this is probably routine they set.
teucris 3 hours ago [-]
My only real disappointment with Claude is its flakiness with scheduling tasks. I have several Slack related tasks that I’ve pretty much given up trying to automate - I’ve tried Cowork and Claude Code remote agents, only to find various bugs with working with plugins and connectors. I guess I’ll give this a try, but I don’t have high hopes.
varispeed 5 hours ago [-]
Why would you use it if you don't know whether the model will be nerfed at that run?
ale 5 hours ago [-]
So MCP servers all over again? I mean at the end of the day this is yet another way of injecting data into a prompt that’s fed to a model and returned back to you.
verdverm 4 hours ago [-]
One gripe I have with Claude Code is that the CLI, Desktop app, and apparently the Webapp have a Venn Diagram of features. Plugins (sets of skills and more) are supported in Code CLI, maybe in Cowork (custom fail to import) but not Code Desktop. Now this?
The report that they are 90% Ai code generated seems more likely the more I attempt to use their products.
bottlepalm 2 hours ago [-]
Their source code leak showed how badly vibe coded Claude Code is, despite it being one of the best AI assistants.
But yea there's some annoying overlap here with Cowork which also has scheduled tasks, in Cowork the tasks can use your desktop, browser and accounts which is pretty useful - a big difference from these Claude Code Routines.
consumer451 4 hours ago [-]
meta:
Sorry, but I just have to ask. Why is u/minimaxir's comment dead? Is this somehow an error, an attack, or what?
This is a respected user, with a sane question, no?
I vouched, but not enough.
edit: His comment has arisen now. Leaving this up for reference.
irthomasthomas 3 hours ago [-]
We live in strange times!
crooked-v 5 hours ago [-]
The obvious functionality that seems to be missing here is any way to organize and control these at an organization rather than individual level.
qwertyuiop_ 39 minutes ago [-]
“Scheduled tasks and actions invoked by callback urls”
bpodgursky 5 hours ago [-]
OpenClawd had about a two week moat...
Feature delivery rate by Anthropic is basically a fast takeoff in miniature. Pushing out multiple features each week that used to take enterprises quarters to deliver.
nightpool 5 hours ago [-]
Do you mean a 3 months moat? Moltbot started going viral in January. That seems to be about a quarter to deliver to me : )
jcims 4 hours ago [-]
>Feature delivery rate by Anthropic is basically a fast takeoff in miniature.
I like to just check the release notes from time to time:
The velocity of shipping is wild. Though I cannot recall a novel feature they shipped first. Can you?
whalesalad 5 hours ago [-]
Hard to wanna go all-in on the Anthropic ecosystem with how inconsistent model output from their top-tier has been recently. I pay $$$ for api-level opus 4.6 to avoid any low-tier binning or throttling or subversive "its peak rn so we're gonna serve up sonnet in place of opus for the next few hours" but I still find that the quality has been really hit or miss lately.
The bell curve up and then back down has been so jarring that I am pivoting to fully diversifying my use of all models to ensure that no one org has me by the horns.
bpodgursky 5 hours ago [-]
yeah i mean nobody uses Claude anymore, the utilization is too high
chrisweekly 4 hours ago [-]
right, like the bar nobody goes to anymore bc it's always too crowded
renticulous 4 hours ago [-]
Anthropic is trying to be AI version of AWS.
twoodfin 4 hours ago [-]
That is a really tough business if you can't match AWS' efficiency & reliability at scale. Presumably AWS also wants to be the AI version of AWS.
(Amazon + Anthropic does seem like a much more compelling enterprise collaboration / acquisition than Microsoft + OpenAI ever did.)
dbbk 5 hours ago [-]
And yet none of them work properly and are unstable.
slopinthebag 5 hours ago [-]
You're delusional if you think these features would take competent programmers quarters to deliver.
buster 4 hours ago [-]
He said "enterprises" not "competent programmers".
slopinthebag 2 hours ago [-]
Is Anthropic not an enterprise?
unshavedyak 4 hours ago [-]
Maybe they were accounting for huge layers of red tape in large orgs. God knows those are far slower than "competent programmers" lol
slopinthebag 2 hours ago [-]
That red tape doesn't disappear when you start vibe coding tho.
hamuraijack 1 hours ago [-]
please, no more features. just fix context bloat.
Shinobis_dev 2 hours ago [-]
[dead]
KaiShips 3 hours ago [-]
[dead]
lo1tuma 3 hours ago [-]
[dead]
maroondlabs 3 hours ago [-]
[dead]
vdalhambra 3 hours ago [-]
[dead]
hankerapp 1 hours ago [-]
[dead]
SadErn 5 hours ago [-]
[dead]
Rendered at 22:24:17 GMT+0000 (Coordinated Universal Time) with Vercel.
- No trust that they won't nerf the tool/model behind the feature
- No trust they won't sunset the feature (the graveyard of LLM-features is vast and growing quickly while they throw stuff at the wall to see what sticks)
- No trust in the company long-term. Both in them being around at all and them not rug-pulling. I don't want to build on their "platform". I'll use their harness and their models but I don't want more lock-in than that.
If Anthropic goes "bad" I want to pick up and move to another harness and/or model with minimal fuss. Buying in to things like this would make that much harder.
I'm not going to build my business or my development flows on things I can't replicate myself. Also, I imagine debugging any of this would be maddening. The value add is just not there IMHO.
EDIT: Put another way, LLM companies are trying to climb the ladder to be a platform, I have zero interest in that, I was a "dumb pipe", I want a commodity, I want a provider, not a platform. Claude Code is as far into the dragon's lair that I want to venture and I'm only okay with that because I know I can jump to OpenCode/Codex/etc if/when Anthropic "goes bad".
I also clearly see the lock-in/moat strategy playing out here, and I don't like it. It's classic SV tactics. I've been burned too many times to let it happen again if I can help it.
To the contrary, they've proven again and again and again they'll absolutely do that the first chance they get.
For example, this demo (https://github.com/barnum-circus/barnum/tree/master/demos/co...) converts a folder of files from JS to TS. It's something an LLM could (probably) do a decent job of, but 1. not necessarily reliably, and 2. you can write a much more complicated workflow (e.g. retry logic, timeout logic, adding additional checks like "don't use as casts", etc), 3. you can be much more token efficient, and 4. you can be LLM agnostic.
So, IMO, in the presence of tools like that, you shouldn't bother using /loop, code routines, etc.
Yes, I expect that is very much the point here. A bunch of product guys got on a whiteboard and said, okay the thing is in wide use but the main moat is that our competitors are even more distrusted in the market than we are; other than that it's completely undifferentiated and can be swapped out in a heartbeat for multiple other offerings. How do we do we persuade our investors we have a locked in customer base that won't just up-stakes in favour of other options or just running open source models themselves?
I see people making similar conclusions about various LLM providers. I suspect in the end it’ll shake out about the same way, the providers will become practically inoperable with each other either due to inconvenience, cost, or whatever. So I’ve not wasted much of my time thinking about it.
What grinds my gears is how Anthropic is actively avoiding standards. Like being the only harness that doesn't read AGENTS.md. I work on AI infra and use different models all the time, Opus is really good, but the competition is very close. There's just enough friction to testing those out though, and that's the point.
My point was, I don't think it mattered much, and it feels like an ok comparison - cloud offerings are mostly the exact same things, at least at their core, but the ecosystem around them is the moat, and how expensive it is to migrate off of them. I would not be surprised at all if frontier AI model providers go much the same way. I'm pretty much there already with how much I prefer claude code CLI, even if half the time I'm using it as a harness for OpenAI calls.
Claude Code routines sounds useful, but at the same time, under AI-codepocalypse, my guess is it would take an afternoon to have codex reimplement it using some existing freemium SaaS Cron platform, assuming I didn't want to roll my own (because of the maintenance overhead vs paying someone else to deal with that).
I actually trust that they will.
1) that AI will be more advanced in the future
2) that the AI I am using will be worse in the future
* make sure the model maxes out all benchmarks
* release it
* after some time, nerf it
* repeat the same with the next model
However, the net sum is positive: in general, models from 2026 are better than those from 2024.
Not just that, but there’s really no way to come to an objective consensus of how well the model is performing in the first place. See: literally every thread discussing a Claude outage or change of some kind. “Opus is absolutely incredible, it’s one shotting work that would take me months” immediately followed by “no it’s totally nerfed now, it can’t even implement bubble sort for me.”
Suuuuuuure it was.
That said, I had way better experiences with old (but contemporary) Apple hardware than any other kind of old hardware.
I never asked for a 1M context window, then I got it and it was nice, now it's as if it was gone again .. no biggie but if they had advertised it as a free-trial (which it feels like) I wouldn't have opted in.
Anyways, seems I'm just ranting, I still like Claude, yes but nonetheless it still feels like the game you described above.
https://x.com/lydiahallie/status/2039800718371307603
--- start quote ---
Digging into reports, most of the fastest burn came down to a few token-heavy patterns. Some tips:
• Sonnet 4.6 is the better default on Pro. Opus burns roughly twice as fast. Switch at session start.
• Lower the effort level or turn off extended thinking when you don't need deep reasoning. Switch at session start.
• Start fresh instead of resuming large sessions that have been idle ~1h
• Cap your context window, long sessions cost more CLAUDE_CODE_AUTO_COMPACT_WINDOW=200000
--- end quote ---
https://x.com/bcherny/status/2043163965648515234
--- start quote ---
We defaulted to medium [reasoning] as a result of user feedback about Claude using too many tokens. When we made the change, we (1) included it in the changelog and (2) showed a dialog when you opened Claude Code so you could choose to opt out. Literally nothing sneaky about it — this was us addressing user feedback in an obvious and explicit way.
--- end quote ---
Too bad we've now managed to turn programming into the same annoying guesswork.
but you can replicate these yourself! i'm happy that ant/oai are experimenting to find pmf for "llm for dev-tools". After they figure out the proper stickyness, (or if they go away or nerf or raise prices, etc) you can always take the off-ramp and implement your own llm/agent using the existing open-source models. The cost of building dev-tools is near zero. it is not like codegen where you need the frontier performance.
Chinese models (GLM, MiniMax) are better.
It changes a number of things. Not all tasks require very high intelligence, but a lot of data may be sensitive enough to avoid sharing it with a third party.
This isn’t an LLM. It’s a product powered by an LLM. You don’t get access to the model you get access to the product.
An LLM can’t do a web search, an LLM can’t convert Excel files into something and then into PDF. Products do that.
I think it’s a mistake to say I don’t trust this engine to get me here, rather than it is to say I don’t trust this car. Because for the most part, the engine, despite giving you a different performance all the time is roughly doing the same thing over and over.
The product is the curious entity you have no control over.
The funniest thing Ive heard is that now we have LLMs, Humanoid robots are on the horizon. Like wtf? People who jump to these conclusions were never deep thinkers in the first place. And thats OK, its good to signal that. So we know who to avoid.
Increasingly, we're also seeing the moat shrink somewhat. Frontier models are converging in performance (and I bet even Mythos will get matched) and harnesses are improving too across the board (OpenCode and Codex for example).
I get why they're trying to do that (a perception of a moat bloats the IPO price) but I have little faith there's any real moat at all (especially as competitors are still flush with cash).
I'm currently hosting, on very reasonable consumer grade hardware, an LLM that is on par performance wise what every anyone was paying for about a year ago. Including all the layers in between the model and the user.
Llama.cpp serves up Gemma-4-26B-A4B, Open WebUI handles the client details: system prompt, web search, image gen, file uploading etc. With Conduit and Tailscale providing the last layer so I can have a mobile experience as robust as anything I get from Anthropic, plus I know how all the pieces works and can upgrade, enhance, etc to my hearts delight. All this runs from a pretty standard MBP at > 70 tokens/sec.
If you want to better understand the agent side of things, look into Hermes agent and you can start understanding the internals of how all this stuff is done. You can run a very competitive coding agent using modest hardware and open models. In a similar note, image/video gen on local hardware has come a long way.
Just like Linux, you're going to exchanging time for this level of control, but it's something anyone who takes LLMs seriously and has the same concerns can easily get started with.
Yet I still see comments like this that seem to complete ignore the incredible work in the open model community that has been perpetually improving and is starting to really be competitive. If you relax the "local" requirement and just want more performance from an LLM backend you can replace the llama.cpp part with a call to Kimi 2.5 or Minimax 2.7 (which you could feasibly run at home, not kimi though). You can still control all the additional part of the experience but run models that are very competitive with current proprietary SoTA offering, 100% under your control still and a fraction of the price.
This Routines feature notably works with the subscription, and it also has API callbacks. So if my Telegram bot calls that API... do I get my Anthropic account nuked or not?
The Chilling Effect of this is real and it gets more and more frustrating that they can't or won't clarify.
I downgraded my $200/mo sub to $20 this past week and I’m going to try out Codex’s Pro plans. Between the cache TTL (does it even affect me? No idea), changes in the rate limit, 429 rate limit HTTP status code during business hours, adaptive thinking (literally the worst decision they’ve ever made, as far as my line of work is concerned), dumb agent behavior silently creating batshit insane fallthroughs, clearly vibe coded harness/infrastructure, and their total lack of transparency, I think I’m done. It was fun while it lasted but I’m tired of paying for their mistakes in capacity planning and I feel like the big rug pull (from all three SOTA providers) is coming like a freight train.
I didn't even know what opencode was prior to that drama, yet now here i am using opencode and a ton of crafted openai agents in my projects. Would love to have some claude agents in that mix, but i guess im stuck in Claude Code if i wanna even touch their models... I'd love to go back to just claude as i "trust" them more in a sorta less evil vibe manner, but if they are gonna prevent subscription usage to something people use to allow themselves more freedom, they gotta then close that gap with their own tools rather than pumping out stuff like this which scares me off given the past couple months.
I totally understand why they are cutting off 3pa access to stuff like openclaw, where the avg user is just a power user in comparison to avg claude user or whatever. I haven't kept up a ton with their opencode issues, but I just know i can't get behind a company actively trying to make my potential usage of tokens less optimized to keep me locked into their ecosystem.
Really just kinda hoping local models kill it all for devs after a few years, I'm not interested in perma relying on data centers for my workflow.
Another comparison would be "unlimited storage", where "unlimited" means some people will abuse it and the company will soon limit the "unlimited."
- SDK that allows you to use OAuth authentication!
- Docs updated to say DO NOT USE OAUTH authentication unless authorized! [0]
- Anthropic employee Tweeting "That's not what we meant! It's fine for personal use!" [1]
- An email sent out to everyone saying it's NOT fine do NOT use it [2]
Sigh.
[0] https://code.claude.com/docs/en/agent-sdk/overview#get-start...
[1] https://www.reddit.com/r/ClaudeAI/comments/1r8et0d/update_fr...
[2] https://news.ycombinator.com/item?id=47633396
edit: And specifically i'm making an IDE, and trying to get ClaudeCode into it. I frankly have no clue when Claude usage is simply part of an IDE and "okay" and when it becomes a third party harness..
It says in the prohibited use section:
> Except when you are accessing our Services via an Anthropic API Key or where we otherwise explicitly permit it, to access the Services through automated or non-human means, whether through a bot, script, or otherwise.
So it seems like using a harness or your own tools to call claude -p is fine, AS LONG AS A HUMAN TRIGGERS IT. They don’t want you using the subscription to automate things calling claude -p… unless you do it through their automation tools I guess? But what if you use their automation tool to call your harness that calls claude -p? I don’t actually know. Does it matter if your tool loops to call claude -p? Or if your automation just makes repeated calls to a routine that uses your harness to make one claude -p call?
It is not nearly as clear as I thought 10 minutes ago.
Edit: Well, I was just checking my usage page and noticed the new 'Daily included routine runs' section, where it says you get 15 free routine runs with your subscription (at least with my max one), and then it switches to extra usage after that. So I guess that answers some of the questions... by using their routine functionality they are able to limit your automation potential (at least somewhat) in terms of maxing out your subscription usage.
It would be absurd to me if the same application is somehow allowed via ACP but not via official SDK. Though perhaps the official SDK offers data/features that they don't want you to use for certain scenarios? If that were they case though it would be nice if they actually published a per-SDK-API restrictions list.
That we're having to guess at this feels painful.
edit: Hah, hilariously you're still using the SDK even if you use ACP, since Claude doesn't have ACP support i believe? https://github.com/agentclientprotocol/claude-agent-acp
Example: https://news.ycombinator.com/item?id=47737924
> My point is that they must apply these restrictions.
I fully understand and respect they need restrictions on how you can use your subscription (or any of their offerings). My issue is not there there _are_ restrictions but that the restrictions themselves are unclear which leads to people being unsure where the line is (that they are trying not to cross).
Put simply: At what point is `claude -p` usage not allowed on a subscription:
- Running `claude -p` from the CLI?
- Running `claude -p` on a Cron?
- Running `claude -p` as a response to some external event? (GH action, webhook, etc?)
- Running `claude -p` when I receive a Telegram/Discord/etc message (from myself)?
Different people will draw the line in different places and Anthropic is not forthcoming about what is or is not allowed. Essentially, there is a spectrum between "Running claude by hand on the command line" and "OpenClaw" [0] and we don't know where they draw the line. Because of that, and because the banning process is draconian and final with no appeals, it leads to a lot of frustration.
[0] I do not use OpenClaw nor am I arguing it should be allowed on the subscription. It would be nice if it was but I'm not saying it should be. I'm just saying that OpenClaw clearly is _not_ allowed but `claude -p` wouldn't be usable at all with a subscription if it was completely banned so what can it (safely) be used for?
The new reality of coding took away one of the best things for me - that the computer always just does what it is told to do. If the results are wrong it means I'm wrong, I made a bug and I can debug it. Here.. I'm not a hater, it's a powerful tool, but.. it's different.
... until this week! Opus is struggling worse than Sonnet those last two weeks.
There's utility in LLMs for coding, but having literally the entire platform vibe-coded is too much for me. At this point, I might genuinely believe they're not intentionally watering anything down, because it's incredibly believable that they just have no clue how any of it works anymore.
But this week I've lost count of the times I've had to say something along the lines of: "Can you check our plan/instructions, I'm pretty sure I said we need to do [this thing] but you've done [that thing]..."
And get hit with a "You're absolutely right...", which virtually never happened for me. I think maybe once since Opus 4-6.
We're watching a speed run of growthism, folks.
Oh uh... ok then.
EDIT: This comment is apparently [dead] and idk why.
I think the real issue stems from the 1 Million token context window change. They did not anticipate the amount of load it would give you. That first few days after they released the new token window, I was making amazing things in one single session from nothing, to something (a new .NET based programming language inspired by Python, and a Virtual Actor framework in Rust). I think since then they've been trying too many things to tweak things, whilst irritating their users.
They even added a new "Max" thinking mode, and made "High" the old medium, which is ridiculous because you think you're using "High" but really you're not. There's a hidden config file to change their terrible defaults to let Claude be smarter still, and apparently you can toggle off the 1M tokens.
I think the real fix, and I'm surprised nobody there has done this yet, is to let the user trim down their context window.
Think about it, you used to have what? 350k tokens or so? Now Claude will keep sending your prompt from 30 minutes ago that's completely irrelevant to the back-end, whereas 3 months ago it would have been compacted by now.
Others have noted that similar prompting for some ungodly reason adds tens of thousands of extra garbage tokens (not sure why).
Edit looks like someone figured out that if you downgrade your version of Claude Code and change one single setting it unruins Claude:
https://news.ycombinator.com/item?id=47769879
A bit annoying, but not the end of the world.
I'm thinking they should go back to all their old settings and as a user cap you at their old token limit, and ask you if you want to compact at your "soft" limit or burst for a little longer, to finish a task.
In a way, it’s true if china has superior AI then it’s dominance over US will materialize. But it’s not hard to see how this scenario is being used to essential lie and scam into trillions of debt.
Its interesting how the cutthroat space of big tech has manifested into an incidious hyper capitalist system where disrupting a system is it’s primary function. The system in this case is world order and western governments
Ironically, they are now playing against their own models that can relatively easily build wrappers around any API shape into any other API shape.
Luckily you can turn if off pretty easily, but I don't know why it's on by default to begin with. I guess holdover from when people used it with a $20 subscription and didn't care.
Your own, personal, Jevons.
It was a bit buggy, but it seems to work better now. Some use cases that worked for me:
1. Go over a slack channel used for feedback for an internal tool, triage, open issues, fix obvious ones, reply with the PR link. Some devs liked it, some freaked out. I kept it.
2. Surprisingly non code related - give me a daily rundown (GitHub activity, slack messages, emails) - tried it with non Claude Code scheduled tasks (CoWork) not as good, as it seems the GitHub connector only works in Claude Code. Really good correlation between threads that start on slack, related to email (outlook), or even my personal gmail.
I can share the markdowns if anyone is interested, but it's pretty basic.
Very useful, (when it works).
We ought to come up with a term for this new discipline, eg "software engineering" or "programming"
airgramming plusgramming programming maxgramming studiogramming
and recently the brand new way of working: Neogramming !
Personally I stick for now with the "Programming " tier. Maybe will upgrade to "Maxgramming" later this year...
They support much of the same triggers and come with many additional security controls out of the box
Cursor has that too by the way (issue -> remote coding session -> PR -> update slack)
The main bugs / missing features are
1. It loses connection to it's connectors, mostly to the slack connector. It does all the work, then says it can't connect to slack. Then when you show it a screenshot of itself with the slack connector, it will say, oh, yeah, the tools are now loaded and does the rest of the routine.
2. ability to connect it to github packages / artifactory (private packages) - or the dangerous route of allowing access to some sort of vault (with non critical dev only secrets... although it's always a risk. But cursor has it...)
3. the GitHub MCP not being able to do simple things such as update release markdown (super simple use case of creating automated release notes for example)
You are so close, yet so far...
This happens in all their UIs, including, say, Claude in Excel, as well.
You can still use OpenClaw on their API pricing tier as much as you want. What they did is not allow subscriptions to be used to power automated third-party workloads, including OpenClaw.
Now, is their messaging around this confusing? Absolutely. The whole thing has been handled shambolically. Everyone knows that they lack the compute to keep up, and likely have lower margins on subscriptions than API; but they cannot just say that because investors may be skittish.
...Except now you sorta-kinda can: now they auto-detect 3rd party stuff and bill you per-token for it?
If I'm reading it right:
https://news.ycombinator.com/item?id=47633568
The reason someone would use this vs. third-party alternatives is still the fact that the $200/mo subscription is markedly cheaper than per-token API billing.
Not sure how this works out in the long term when switching costs are virtually zero.
All these not really helpful, but vendor specific, "bonuses" sounds like a way to try to lock people in, to try to raise the switching cost.
I'm using, on purpose, a simple process so that at any time I can switch AI provider.
And because they use AI heavily, they produce new product every week. So fast, that I have no time to check, does it worth or not.
This one looks interesting. I have some custom commands that I execute manually weekly, for monitoring, audits, summary, reports. It it can send reports on email, or generate something that I can read in the morning with my coffee, or after I finish with it ;) it might be a good tool.
The question is, do I really want to so much productive? I am already much better in performance with AI, compared with the 'old school' way...
Everything is just getting to much for me.
So who are they building these for?
Am I needed anymore?
I think to become really efficient they'll have to invent new programming language to eliminate all the ambiguity and non-determinism. Call it "prompt language", with ai-subroutines, ai-labels and ai-goto.
This PR was created by the Claude Code Routine:
https://github.com/srid/claude-dump/pull/5
The original prompt: https://i.imgur.com/mWmkw5e.png
Oh cool! vendor lock-in.
I bet anthropic wants to be there already but doesn't have the compute to support it yet.
I'd say that counts as yes.
(For clarity: neither are powered by Claude Code Routines. Rather, Claude Code coded them and they're simple cron jobs themselves.)
The report that they are 90% Ai code generated seems more likely the more I attempt to use their products.
But yea there's some annoying overlap here with Cowork which also has scheduled tasks, in Cowork the tasks can use your desktop, browser and accounts which is pretty useful - a big difference from these Claude Code Routines.
Sorry, but I just have to ask. Why is u/minimaxir's comment dead? Is this somehow an error, an attack, or what?
This is a respected user, with a sane question, no?
I vouched, but not enough.
edit: His comment has arisen now. Leaving this up for reference.
Feature delivery rate by Anthropic is basically a fast takeoff in miniature. Pushing out multiple features each week that used to take enterprises quarters to deliver.
I like to just check the release notes from time to time:
https://github.com/anthropics/claude-code/releases
and the equally frenetic openclaw:
https://github.com/openclaw/openclaw/releases
GPT-4.1 was released a year ago today. Sonnet 4 is ~11 months old. The claude-code cli was released last Feb. Gas Town is 3 months old.
This is a chart that simply counts the bullet points in the release notes of claude code since inception:
https://imgur.com/a/tky9Pkz
This is as bad and as slow as it's going to be.
The bell curve up and then back down has been so jarring that I am pivoting to fully diversifying my use of all models to ensure that no one org has me by the horns.
(Amazon + Anthropic does seem like a much more compelling enterprise collaboration / acquisition than Microsoft + OpenAI ever did.)