> OpenClaw has nearly half a million lines of code, 53 config files, and over 70 dependencies. This breaks the basic premise of open source security. Chromium has 35+ million lines, but you trust Google’s review processes. Most open source projects work the other way: they stay small enough that many eyes can actually review them. Nobody has reviewed OpenClaw’s 400,000 lines.
This reminds me of a very common thing posted here (and elsewhere, e.g. Twitter) to promote how good LLMs are and how they're going to take over programming: the number of lines of code they produce.
As if every competent programmer suddenly forgot the whole idea of LoC being a terrible metric to measure productivity or -even worse- software quality. Or the idea that software is meant to be written to be readable (to water down "Programs are meant to be read by humans and only incidentally for computers to execute" a bit). Or even Bill Gates' infamous "Measuring programming progress by lines of code is like measuring aircraft building progress by weight".
Even if you believe that AI will -somehow- take over the whole task completely so that no human will need to read code anymore, there is still the issue that the AIs will need to be able to read that code, and AIs are much worse at doing that (especially with their limited context sizes) than at generating code. So it still remains a problem to use LoC as such a measure, even if all you care about is the driest "does X do the thing I want?" aspect, ignoring other quality concerns.
gyomu 3 hours ago [-]
Yeah, it’s pretty wild. Even pg is tweeting stuff like
“An experienced programmer told me he's now using AI to generate a thousand lines of code an hour.“
https://x.com/paulg/status/2026739899936944495
Like if you had told pg to his face in (pre AI) office hours “I’m producing a thousand lines of code an hour”, I’m pretty sure he’d have laughed and pointed out how pointless that metric was?
manoDev 2 hours ago [-]
They need to keep the musical chairs going.
medi8r 3 hours ago [-]
He is a Lisper too, making it more ironic. Lisp gives you the power to heavily reduce cruft through heavy customization with macros.
saltcured 49 minutes ago [-]
Thousand left-parens per hour...?
steve1977 1 hours ago [-]
We all know that a thousand parentheses per hour would be a better metric.
amelius 2 hours ago [-]
Technical debt is increasing by 1,000 lines an hour.
wiseowise 3 hours ago [-]
It’s all virtual virtue signaling. If you were to say this shit in the office, you’d be walked out pretty fast.
ElProlactin 3 hours ago [-]
Enshittification comes for us all
supriyo-biswas 2 hours ago [-]
Somehow, this narrative has taken hold at multiple levels of management, especially amongst non-technical management, that "typing" was the bottleneck of software engineering; reality is, however, more complex.
The act of "typing" code was technically mixed in with researching solutions, which means that code often took a different shape or design based on the outcome of that activity. However, this nuance has been typically ignored for faff, with the outcome that management thinks that producing X lines of code can be done "quickly", and people disagreeing with said statements are heretics who should be burned at the stake.
This is why, in my personal opinion, AI only makes me about 20% more productive: I often find myself disagreeing with the solution it came up with, and instead of steering it to obtain the outcome I want, I just end up rewriting the code myself. On the other hand, for prototypes where I don't care about understanding the code at all, it is a bigger time saver.
I could choose not to care about the code at all, and while that may be acceptable to management, being responsible for the outcomes without being responsible for the code seems to be the same shit as being given responsibilities without autonomy, which is not something I can agree with.
bee_rider 43 minutes ago [-]
“LoC is a bad metric” has been the catchphrase of engineers for years, because it runs counter to the expectations of management and the general public, right? So it makes sense that LoC is the metric used to advertise to them.
MadxX79 3 hours ago [-]
Brooks's law, anno 2026:
"Adding manpower to a late software project makes it later -- unless that manpower is AI, then you're golden!"
smikhanov 3 hours ago [-]
That law (formulated in the 70s, I’ll remind the reader) hasn’t been true for at least a couple of decades now.
medi8r 3 hours ago [-]
Why not? What changed? It seems like a human factors thing. New people have to get up to speed. Doers become trainers.
smikhanov 2 hours ago [-]
Several related reasons working at once. The nature of work changed. The boundary between accidental and incidental complexity shifted (and it’s unclear whether this distinction still exists). Niche specializations within the field emerged. The way to structure and decompose projects changed dramatically (agile and stuff).
One pathological example: if you’re running a server-based product, quite often what stands between you and a new feature launch is literally a couple of thousand lines of Kubernetes YAML. Would adding someone who’s proficient in Kubernetes slow you down? Of course not.
One may say, hey, this is just the server-side Kubernetes-based development being insane, and I’ll say, the whole modern business of software development is like this.
medi8r 2 hours ago [-]
Hmm interesting, thanks! I was ready to argue but now I have to think, which is even better.
smikhanov 2 hours ago [-]
That’s a lovely comment, thank you. If you’re keen to think about it more, consider the fact that the existing members of a project that’s running late actually have less of an advantage over the new joiners than is commonly thought.
Yes, they know how the feature they work on relates to other features, but actually implementing that feature very often mostly involves fighting with technology, wrangling the entire stack into the shape you need.
In Brooks’s times the stack was paper-thin, almost nonexistent. In modern times it’s not, and adding someone who knows the technology, but doesn’t have the domain knowledge related to your feature still helps you. It doesn’t slow you down.
One may argue that I’m again pointing to the difference between accidental and incidental complexity, and my argument is essentially “accidental complexity takes over”, but accidental complexity actually does influence your feature too, by defining what’s possible and what’s not.
Some good thoughts (not mine) on the modern boundary between accidental and incidental complexity: https://danluu.com/essential-complexity/
I sort of agree that the surface area and incidental complexity of stacks give more space to plug more developers in than was true in the 70s and 80s. But I disagree strongly this invalidates Brooks Law. Certainly there are cases where adding people helps, especially if they are stronger engineers than the ones that are already there, but I’ve also seen way too many projects devolve into resourcing conversations when the real problem was over-complicated, poorly reasoned requirements, boil-the-ocean solutions promising a perfect end state without a clear plan to get there iteratively.
KronisLV 2 hours ago [-]
As lines of code become executable line noise, I swear that we need better approaches to developing software - either enforce better test coverage across the board, develop and use languages where it’s exceedingly hard to end up with improper states, or sandbox the frick out of runtimes and permissions.
Just as an example, I should easily be able to give each program an allowlist of network endpoints they’re allowed to use for inbound and outgoing traffic and sandbox them to specific directories and control resource access EASILY. Docker at least gets some of those right, but most desktop OSes feel like the Wild West even when compared to the permissions model of iOS.
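A rough sketch of the directory half of that wish, assuming you control the agent's file-access layer; the allowed paths below are made up:

    import os

    # Hypothetical allowlist of directories an agent process may touch.
    ALLOWED_DIRS = ["/home/me/agent-workspace", "/tmp/agent-scratch"]

    def check_path(path: str) -> str:
        """Resolve symlinks and '..' and refuse anything outside the allowlist."""
        real = os.path.realpath(path)
        for root in ALLOWED_DIRS:
            root_real = os.path.realpath(root)
            if os.path.commonpath([real, root_real]) == root_real:
                return real
        raise PermissionError(f"{path} is outside the sandboxed directories")

    def safe_read(path: str) -> bytes:
        with open(check_path(path), "rb") as f:
            return f.read()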
hirako2000 2 hours ago [-]
More people believe a software developer's job and value lie in the lines of code produced.
Perhaps over half of engineering managers unconsciously or admittedly take the amount of PR and code additions as a rough but valid measure of productivity.
I recall a role in architecture where a senior director asked me how come a principal engineer hadn't committed any code in 2 weeks, when we pay principals a fortune.
I asked that brilliant mind whether we paid principal engineers to code or to make sure we deliver value.
Needless to say, the question went unanswered, and the so-called principal was fired a few months later. The entire company, in fact, was later sold for a bargain too, despite having thousands of clients globally.
"The LLM can replace engineers" is a phenomenon that emerges from two simple facts: we haven't solved the misconception about engineering roles, and it's the perfect scapegoat to justify layoffs.
Leaders haven't all gone insane; they answer difficult questions with the narrative of least resistance.
tdeck 2 hours ago [-]
I asked Grok to rewrite your comment and it did it in 2400 words. I hope you know you'll be obsolete soon.
sd9 3 hours ago [-]
LLMs are incredibly eager to write new code, rather than modifying or integrating with existing systems. I agree that context windows are too small currently for this to seem sustainable. Without reasonable architecture pure vibe coded software feels like it’s going to cap out at a certain size.
CuriouslyC 2 hours ago [-]
The lines of code thing isn't because we think it's a good metric, but because we have literally no good metric and we're trying to communicate a velocity difference. If you invent a new metric that doesn't have LoC's problems while being as easy to use, you'll be a household name in software engineering in short order.
Also, AI is better at reading code than writing it, but the overhead to FIND code is real.
inciampati 2 hours ago [-]
Lines of code are nothing. It's verification that creates value.
wredcoll 2 hours ago [-]
Really it just continues to demonstrate that "code quality" is not and was not a requirement.
Even with supposedly expert, human-handwritten software powering our products for the last few decades, those products frequently crash, have outages, and show all sorts of smaller bugs.
There are literally too many examples to count of video games being released with nigh-unplayable amounts of bugs and still selling millions and producing sequels.
Windows 95 and friends were famously buggy and crash prone yet produced one of the most valuable companies in the world.
ninkendo 2 hours ago [-]
Respectfully, it feels like your position requires a very high, if not brain-dead, level of incompetence on the part of LLM users in order for your conclusion to be correct.
My personal anecdote: I used an LLM recently to basically vibe code a password manager.
Now, I’ve been a software engineer for 20 years. I’m very familiar with the process of code review and how to dive in to someone else’s code and get a feel for what’s happening, and how to spot issues. So when I say the LLM produced thousands of lines of working code in a very short time (probably at least 10 times faster than I would have done it), you could easily point at me and say “ha, look at ninkendo, he thinks more lines of code equals better!” And walk away feeling smug. Like, in your mind perhaps you think the result is an unmaintainable mess, and that the only thing I’m gushing about is the LOC count.
But here’s the thing: it actually did a good job. I was personally reviewing the code the whole time. And believe me when I say, the resulting product is actually good. The code is readable and obvious, it put clean separation of responsibilities into different crates (I’m using rust) and it wrote tons of tests, which actually validate behavior. It’s very near the quality level of what I would have been able to do. And I’m not half bad. (I’ve been coding in rust in particular, professionally for about 2 years now, on top of the ~20 years of other professional programming experience before that.)
My takeaway is that as a professional engineer, my job is going to be shifting from doing the actual code writing, to managing an LLM as if it’s my pair programming partner and it has the keyboard. I feel sad for the loss of the actual practice of coding, but it’s all over but the mourning at this point. This tech is here to stay.
bee_rider 46 minutes ago [-]
If you measure the productivity of the system that is “you, using an LLM” in terms of the rate at which you can get actually-reviewed code completed (which, based on your comment, seems to be what you were doing) that seems like a totally reasonable way of doing things. But in that case the bottleneck is probably you reviewing code, right? Which, I bet, is faster than writing code. But you probably won’t get the truly absurd superhuman speed ups.
What would you say is your multiplier, in terms of thoroughly reviewing code vs writing it from scratch?
spacecadet 3 hours ago [-]
I mean many of us have... I operate in a net-negative mindset. My PRs better remove more than they add.
I also use AI this way, periodically achieving a net negative refactor.
buremba 4 hours ago [-]
My take is that agents should, by default, only take actions that you can recover from. You can gradually give them more permissions and build guardrails such as extra LLM auditing, time-boxed whitelisted domains, etc. That's what I'm experimenting with https://github.com/lobu-ai/lobu
1. Don't let it send emails from your personal account, only let it draft email and share the link with you.
2. Use incremental snapshots, and if the agent bricks itself (it often does with Openclaw if you give it access to change config) just do /revert to the last snapshot. I use VolumeSnapshot for lobu.ai.
3. Don't let your agents see any secret. Swap the placeholder secrets at your gateway and put a human in the loop for secrets you care about.
4. Don't let your agents have outbound network access directly. They should only talk to your proxy, which has strictly whitelisted domains. There will be cases where the agent needs to talk to different domains, and I use time-box limits (only allow certain domains for the current session for 5 minutes, and at the end of the session look up all the URLs it accessed; a rough sketch of this is below). You can also use tool hooks to audit the calls with an LLM to make sure they're not triggered via a prompt injection attack.
Last but not least, use proper VMs like Kata Containers and Firecracker, not just Docker containers, in production.
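A rough sketch of the allowlist/time-box idea in point 4, assuming the agent's only network path is a forward proxy you run; the domain names and the 5-minute default below are illustrative, not anything lobu or Openclaw actually ships:

    import time

    # Hypothetical always-allowed domains plus short-lived, per-session grants.
    PERMANENT_ALLOW = {"api.anthropic.com", "api.github.com"}
    session_grants: dict[str, float] = {}   # domain -> expiry timestamp
    audit_log: list[tuple[float, str, bool]] = []

    def grant_for_session(domain: str, seconds: int = 300) -> None:
        """Temporarily allow a domain, e.g. for the next 5 minutes."""
        session_grants[domain] = time.time() + seconds

    def is_allowed(domain: str) -> bool:
        """Called by the proxy for every outbound request the agent makes."""
        now = time.time()
        ok = domain in PERMANENT_ALLOW or session_grants.get(domain, 0.0) > now
        audit_log.append((now, domain, ok))   # reviewed at the end of the session
        return ok

The audit log is what you would look over at the end of a session, per the point about checking every URL the agent touched.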
alexhans 3 hours ago [-]
That's a decent practice from the lens of reducing blast radius. It becomes harder when you start thinking about unattended systems that don't have you in the loop.
One problem I'm finding in discussions about automation or semi-automation in this space is that there are many different use cases for many different people: a software developer deploying an agent in production vs an economist using Claude vs a scientist throwing a swarm at common ML exploratory tasks.
Many of the recommendations will feel like too much or too little complexity for what people need, and the fundamentals get lost: intent for design, control, the ability to collaborate if necessary, fast iteration due to an easy feedback loop.
AI evals, sandboxing, and observability seem like 3 key pillars for maintaining intent in automation, but how to help these different audiences be safely productive while staying fast, and speak the same language when they need to build a product together, is what is mostly occupying my thoughts (and practical tests).
daveguy 2 hours ago [-]
Current LLMs are nowhere near qualified to be autonomous without a human in the loop. They just aren't rigorous enough. Especially for the "scientist throwing a swarm to deal with common ML exploratory tasks": judging most steps in an exploratory task requires human feedback based on the domain of study.
> Many of the recommendations will feel like too much or too little complexity for what people need, and the fundamentals get lost: intent for design, control, the ability to collaborate if necessary, fast iteration due to an easy feedback loop.
Completely agreed. This is because LLMs are atrocious at judgement and guiding the sequence of exploration is critically dependent on judgement.
Doublon 2 hours ago [-]
I'd like to try a pattern where agents only have access to read-only tools. They can read your emails, read your notes, read your texts, maybe even browse the internet with only GET requests...
But any action with side-effects ends up in a Tasks list, completely isolated. The agent can't send an email, they don't have such a tool. But they can prepare a reply and put it in the tasks list. Then I proof-read and approve/send myself.
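A rough sketch of that split, assuming you control the tool-dispatch layer; the tool names and the queue shape below are made up for illustration:

    from dataclasses import dataclass

    # Hypothetical split between read-only tools and queued side effects.
    READ_ONLY_TOOLS = {"read_email", "read_notes", "web_get"}

    @dataclass
    class PendingTask:
        action: str      # e.g. "send_email"
        payload: dict    # the draft the agent prepared
        approved: bool = False

    task_queue: list[PendingTask] = []

    def run_read_only(tool: str, payload: dict) -> dict:
        # Placeholder for the real read-only implementations.
        return {"tool": tool, "result": "..."}

    def dispatch(tool: str, payload: dict) -> dict:
        if tool in READ_ONLY_TOOLS:
            return run_read_only(tool, payload)
        # Anything with side effects only lands in the queue for a human to proof-read and send.
        task_queue.append(PendingTask(action=tool, payload=payload))
        return {"status": "queued for human approval"}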
Is there anything like that available for *Claws?
fnord77 2 hours ago [-]
> 1. Don't let it send emails from your personal account, only let it draft email and share the link with you.
Right now there's no way to have fine-grained draft/read only perms on most email providers or email clients. If it can read your email it can send email.
> 3. Don't let your agents see any secret. Swap the placeholder secrets at your gateway and put human in the loop for secrets you care about.
harder than you might think. openclaw found my browser cookies. (I ran it on a vm so no serious cookies found, but still)
buremba 2 hours ago [-]
> Right now there's no way to have fine-grained draft/read only perms on most email providers or email clients. If it can read your email it can send email.
> harder than you might think. openclaw found my browser cookies. (I ran it on a vm so no serious cookies found, but still)
You should never give any secrets to your agents, like your Gmail access tokens. Whenever an agent needs to take an action, it should perform the request, and your proxy should check if the action is allowed and set the secrets on the fly.
That means agents should not have access to the internet except through a proxy that has proper guardrails. Openclaw doesn't have this model unfortunately, so I had to build a multi-tenant version of Openclaw with a gateway system to implement these security boundaries.
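A rough sketch of that placeholder swap, assuming all agent traffic goes through a gateway process you run; the placeholder string, allowed hosts, and environment variable below are illustrative:

    import os

    # Hypothetical placeholder the agent sees instead of the real secret,
    # and the (host, method) pairs the gateway is willing to sign.
    PLACEHOLDER = "{{GMAIL_TOKEN}}"
    ALLOWED = {("gmail.googleapis.com", "GET")}

    def outbound_hook(host: str, method: str, headers: dict) -> dict:
        """Run by the gateway before forwarding an agent request upstream."""
        auth = headers.get("Authorization", "")
        if PLACEHOLDER not in auth:
            return headers                    # nothing to swap
        if (host, method) not in ALLOWED:
            raise PermissionError(f"{method} {host} may not use this secret")
        # The real token lives only in the gateway process, never in the agent's context.
        headers["Authorization"] = auth.replace(PLACEHOLDER, os.environ["GMAIL_TOKEN"])
        return headers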
arianvanp 2 hours ago [-]
Literally every email client on the planet has supported `mailto:` URIs since basically the existence of the world wide web.
Just generate a mailto: URI with the body set to the draft.
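For what it's worth, a minimal sketch of that in Python; the address and text are placeholders:

    from urllib.parse import quote

    def draft_mailto(to: str, subject: str, body: str) -> str:
        """Build a mailto: link a human can click to review and send the draft themselves."""
        return f"mailto:{quote(to, safe='@')}?subject={quote(subject)}&body={quote(body)}"

    # Newlines and spaces get percent-encoded, so the draft survives intact.
    print(draft_mailto("alice@example.com", "Re: invoice", "Hi Alice,\n\nLooks good to me.\n"))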
VladVladikoff 4 hours ago [-]
This doesn’t really feel like enough guardrails to prevent the type of problems we’ve seen so far.
For example an agent in a single container which has access to an email inbox, can still do a lot of damage if that agent goes off the rails.
We agree this agent should not be trusted, yet the ideas proposed as a solution are insufficient. We need a fundamentally different approach.
Also, and this is just my ignorance about Claws, but if we allow an agent permission to rewrite its code to implement skills, what stops it from removing whatever guardrails exist in that codebase?
drujensen 3 hours ago [-]
Exactly!
I installed nanoclaw to try it out.
What is kinda crazy is that any extension, like the Discord connection, is done using a skill.
A skill is a markdown file written in English to provide a step by step guide to an ai agent on how to do something.
Basically, the extensions are written by claude code on the fly. Every install of nanoclaw is custom written code.
There is nothing preventing the AI Agent from modifying the core nanoclaw engine.
It’s ironic that the article says “Don’t trust AI agents” but then uses skills and AI to write the core extensions of nanoclaw.
jimminyx 2 hours ago [-]
Author and creator of NanoClaw here.
I did my best to communicate this but I guess it was still missed:
NanoClaw is not software that you should run out of the box. It is designed as a sort of framework that gives a solid foundation for you to build your own custom version.
The idea is not that you toggle on a bunch of features and run it. You should customize, review, and make sure that the code does what you want.
So you should not trust that the coding agents didn't break the security model while adding Discord. But after Discord is added, you review the code changes and verify that they're correct. And because even after adding Discord you still only have 2-3k LoC, it's actually something you can realistically do.
Additionally, the skills were originally a bit ad-hoc. Now they are fully working, tested and reviewed reference implementations. Code is separate from markdown files. When adding a new integration or messaging channel, the agent uses `git merge` to merge the changes in, rather than rewriting from scratch. Adding the first channel is fully deterministic. The agent only resolves merge conflicts if there are any.
solfox 2 hours ago [-]
So, nanoclaw requires agents to code extensions on the fly to get to feature parity with openclaw… and you're celebrating nanoclaw having fewer LOC. How's the code smell after nanoclaw gets to feature parity?
MarkSweep 2 hours ago [-]
Yeah, the article's claim of having a low number of lines of code is disingenuous. Rather than writing some sort of plugin interface, it has "skills" that are a combination of pre-written TypeScript and English-language instructions for how to modify the codebase to include the feature. I don't see how self-modifying code that uses an RNG to generate changes is going to be better for security than a proper plugin system. And everyone who uses Nanoclaw will have a customized version of it, so any bugs reported on Nanoclaw probably have a high chance of being closed as "can't reproduce". Why would you live this way?
sanex 3 hours ago [-]
Yes, and they still have code examples in them, so it's not like it somehow doesn't count. Plus, if you run the skill, good luck bringing in changes from master later.
fvdessen 2 hours ago [-]
I think the best place to put barriers is at the MCP / tool layer. The email inbox MCP should have guardrails to prevent damage. Those guardrails could be fine-grained permissions, but could also be an adversarial model dedicated to preventing misuse.
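A rough sketch of what the adversarial-model variant could look like at the tool layer, with the second model abstracted as a callable since no particular API is implied; the prompt and tool shape are made up:

    from typing import Callable

    GUARD_PROMPT = (
        "You are a security reviewer. Given the user's original request and a proposed "
        "tool call, answer ALLOW or DENY and nothing else."
    )

    def guarded_call(user_request: str, tool_name: str, args: dict,
                     run_tool: Callable[[str, dict], dict],
                     ask_guard_model: Callable[[str], str]) -> dict:
        """Only run the tool if an independent model judges the call consistent with the request."""
        verdict = ask_guard_model(
            f"{GUARD_PROMPT}\n\nRequest: {user_request}\nTool: {tool_name}\nArgs: {args}"
        )
        if verdict.strip().upper() != "ALLOW":
            raise PermissionError(f"Guard model rejected {tool_name}: {verdict}")
        return run_tool(tool_name, args)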
gronky_ 4 hours ago [-]
Don’t know about other claws; with NanoClaw the agent can only rewrite code that runs inside the container.
You can see here that it’s only given write access to specific directories: https://github.com/qwibitai/nanoclaw/blob/8f91d3be576b830081...
Wouldn't you get >50% of the usefulness and 0% of the risk if you add read+draft permissions for the email connection through a proxy or oauth permissions? Then your claw can draft replies and you have to manually review+send. It's not a perfect PA that way, but could still be better than doing everything yourself for the vast majority of people who don't have a PA anyway?
It feels like, just like SWEs do with AI, we should treat the claw as an enthusiastic junior: let it do stuff, but always review before you merge (or in this case: send).
jrecyclebin 4 hours ago [-]
Agent can still "forgot password" on many accounts. Or magic link.
coffeefirst 3 hours ago [-]
Seriously. I don’t see any way to make any of this safe unless all it does is receive information and queue suggestions for the user.
But that’s not an agent, that’s a webhook.
Even without disk access, you can email the agent and tell it to forward all the incoming forgot password links.
[Edit: if anyone wants to downvote me that's your prerogative, but want to explain why I'm wrong?]
msdz 48 minutes ago [-]
I agree, this is inherently unsafe. The two core security issues for agents, I’d say, are in LLMs not producing a “deterministic” outcome, and prompt injection.
Prompt injection is _probably_ solvable if something like [1] ever finds a mainstream implementation and adoption, but agents not being deterministic, as in “do not only what I’ve told you to do, but also how I meant it”, all while assuming perfect context retention, is a waaay bigger issue. If we ever were to have that, software development as a whole is solved outright, too.
[1] Google DeepMind: Defeating Prompt Injections by Design. https://arxiv.org/abs/2503.18813
I was blown away by OpenClaw until I saw the bill. Ultimately, I think of these ecosystems as personal enhancements, and AI costs need to come down dramatically for real problems. Worse, however, is the security theater. I would not want to be the operator of any business built with front-line LLM usage based on a yolo'd agent framework. I'm very happy to use these for siloed components that are well isolated and have reasonable QA processes (and that can even include agents, since now we literally have no excuse not to have amazing test coverage).
Their niche is going to be back-office support, but even that creates risk boundaries that can be insurmountable. A friend of mine had an agent do sudo rm -rf ... wtf.
My view is that I want to launch an agent based service, but I'm building a statically typed ecosystem to do so with bounds and extreme limits.
cyanydeez 1 hours ago [-]
Look at AI like what search turned into: feed the user anything, even if it's wrong, because not doing so will make your product look weak.
That's what you'll find when you try to make these bag-o-words do reasonable things.
lucrbvi 4 hours ago [-]
Why does OpenClaw have 800,000+ lines of code?? Isn't it just a connector for LLM APIs and other tools?
marginalia_nu 3 hours ago [-]
For comparison, the C++ and rust code in the ladybird browser is about 573,000 lines of code.
zarzavat 4 hours ago [-]
They are probably counting dependencies. Also, it's vibe coded, what do you expect!
I used to think that LLMs would replace humans but now I'm confident that I'll have a job in the future cleaning up slop. Lucky us.
scandinavian 4 hours ago [-]
I did a cloc check on it and it does seem to have 800k lines of typescript. So unless they are vendoring dependencies it's actually as insane as it sounds.
jsheard 3 hours ago [-]
Christ their repo is an absolute nightmare. There's new issues and PRs being posted practically every minute, and I assume 99% of them are from agents given the target demographic. Just full-auto vibeslop from all barrels 24/7.
Even if we count the repo's whole lifetime, including when it wasn't so active, the averages are still absurd.
96 days / (4,239+9,170) issues = one issue every 10 minutes
96 days / (5,082+10,221) pull requests = one PR every 9 minutes
mihaelm 2 hours ago [-]
5000+ open PRs is pretty insane, that's the highest I've seen. How do you even keep track of this? We'll really need trust management systems like vouch (https://github.com/mitchellh/vouch/tree/main) for open source projects in the future to help with reducing noise.
CrazyStat 3 hours ago [-]
At least nobody can accuse them of not dogfooding enough.
cap11235 3 hours ago [-]
See also Yegge's beads. Last I checked, it is a 275k-line todo tracker.
re-thc 2 hours ago [-]
> Why does OpenClaw have 800,000+ lines of code??
Because
I
write
like
this
-- signed
AI
Sytten 3 hours ago [-]
I am a caveman, I don't understand the need for a personal assistant. What are you guys using it for?
vitto_gioda 2 hours ago [-]
I only use my own “agent” ("my", because I program it myself, since my needs are different from yours) to retrieve information about the audio I upload to it (from video calls and audio recordings). No other use cases for me.
smallpipe 4 hours ago [-]
Docker is not a security boundary. You’re one prompt injection away from handing over your gmail cookie.
benatkin 3 hours ago [-]
No, but Podman is. The recent escapes at the actual container level have been pretty edge case. It's been some years since a general container escape has been found. Docker's CVE-2025-9074 was totally unnecessary and due to Docker being Docker.
eyberg 2 hours ago [-]
No they have not been. There were at least 16 container escapes last year - at least 8 of them were at the runtime layer.
I personally spent way too much time looking at this in the past month:
https://nanovms.com/blog/last-year-in-container-security
runc: https://www.cve.org/CVERecord?id=CVE-2025-31133
nvidia: https://www.cve.org/CVERecord?id=CVE-2025-23266
runc: https://www.cve.org/CVERecord?id=CVE-2025-52565
youki: https://www.cve.org/CVERecord?id=CVE-2025-54867
Also, last time I checked podman uses runc by default.
xienze 2 hours ago [-]
The best container security in the world isn’t going to help you when the agent has credentials to third party services. Frankly, I don’t think bad actors care that much about exploiting agents to rm -rf /. It’s much more valuable to have your Google tokens or AWS credentials.
justonceokay 2 hours ago [-]
I have twice encountered a phone tree AI agent saying my problem could not be solved and then ending the call. One was for PayPal fraud and the other was for closing an unused bank account.
For right now my trick is to say I have a problem that is more recognizable and mundane to the AI (i.e. lie) and then when I finally get the human just say “oh, that was a bunch of hooey, here's what I'm trying to do”. For PayPal that involved asking for help with a business tax that did not exist. For my bank it involved asking to /open/ a new account. Obviously the AI wants to help me open an account, even if my intention is to close one.
That will only work for so long but it’s something
echoangle 3 hours ago [-]
Looking at the NanoClaw GitHub README:
> If you want to add Telegram support, don't create a PR that adds Telegram alongside WhatsApp. Instead, contribute a skill file (.claude/skills/add-telegram/SKILL.md) that teaches Claude Code how to transform a NanoClaw installation to use Telegram.
Why would you want that? Do you want every user to ask the AI to implement the same feature?
nojito 2 hours ago [-]
>Why would you want that? Do you want every user to ask the AI to implement the same feature?
Yes. It's actually an amazing shift in the paradigm of thinking. Not everyone needs Telegram, so the folks who want it can have the AI create it locally for themselves.
shich 4 hours ago [-]
the trust problem cuts both ways tho — users don't trust agents, but the bigger issue is agents trusting each other. once you have multi-agent pipelines, you're one rogue upstream output away from a cascade. sandboxing individual agents is table stakes; what's actually hard is defining trust boundaries between them
medi8r 2 hours ago [-]
Also agents cannot trust any data whatsoever they add to their context.
This puts reading email for example as a risk.
Probably not impossible to create a worm that convinces a claw to forward it to every email address in that inbox.
And then exfiltrate all the emails.
Then do a bunch of password resets.
Then get root access to your claw.
But not just email. Github issues, wikipedia, HN etc. may be poisoned.
See https://simonw.substack.com/p/the-lethal-trifecta-for-ai-age... but there may be more trifectas than that in a claw driven future.
I tried NanoClaw and love the skill (and container by default) model. But having skills generate new code in my personalized fork feels off to me… I think it’s because eventually the “few thousand auditable lines” idea vanishes with enough skills added?
Could skill contributions collapse into only markdown and MCP calls? New features would still be just skills; they’d bring in versioned, open-source MCP servers running inside the same container sandbox. I haven’t tried this (yet) but I think this could keep the flexibility while minimizing skill code stepping on each other.
xrd 3 hours ago [-]
How can I trust this discussion when my browser won't trust their certs?
rdtsc 3 hours ago [-]
> The container boundary is the hard security layer — the agent can’t escape it regardless of configuration
I thought containers were never a proper hard security barrier? It’s a barrier, so better than not having it, of course.
rco8786 3 hours ago [-]
In the sense that nothing is truly a "proper" hard security barrier outside of maybe airgapping, sure. But containerization is typically a trusted security measure.
gmerc 2 hours ago [-]
Oh this can be monetized: claw-guard.org/adnet.
Another person's trust issues are your business model.
nkzd 3 hours ago [-]
As someone who only uses coding agents at work, can someone describe their use case for a claw-type agent? What do you do with it?
medi8r 2 hours ago [-]
I want to try one as a bit of a personal coach. Remind me to do things and check in on goals. The memory / schedule / chat thing is enough, and it won't need emails or anything more dangerous.
nkzd 2 hours ago [-]
As someone who went down so many "productivity rabbit holes" I think this is a great idea.
vitto_gioda 2 hours ago [-]
"Time to understand 8 minutes"
what a non-technical purpose...
Yokohiii 2 hours ago [-]
Why do people take this article seriously? It's just a wall of gibberish trying to make the product look more "secure" than others. It's not. It adds shallow, secure-looking random junk without tackling the core issues, which are obviously not solvable.
himata4113 4 hours ago [-]
My assistant has no permissions at all and is just as useful. All it needs is todo, reminders and websearch (and maybe a browser but ymmv).
piker 4 hours ago [-]
> no permissions at all
> and maybe a browser
does not compute
yyyk 4 hours ago [-]
I suspect OP actually means 'cannot access anything locally' by 'no permissions'.
isodev 4 hours ago [-]
> websearch (and maybe a browser
Your assistant can literally be told what to do and how to hide it from you. I know security is not a word in slopware but as a high-level refresher - the web is where the threats are.
sarchertech 3 hours ago [-]
If I was malicious I could do a lot of damage to someone with subtle manipulation of todo and reminders.
I’ll bet I could even push someone on the margins into divorce.
noman-land 1 hours ago [-]
How would you do it?
croes 3 hours ago [-]
You are just some bad web searches away from being on suspect lists
adithyassekhar 4 hours ago [-]
Really good points about ai making gigantic heaps of code no human can ever review.
It's almost like bureaucracy. The systems we have in governments or large corporations to do anything might seem bloated and could be simplified. But they're there to keep a lot of people employed, pacified, with power distributed in a way that prevents hostile takeovers (crazy). I think there was a CGP Grey video about rulers which made the same point.
Similarly AI written highly verbose code will require another AI to review or continue to maintain it, I wonder if that's something the frontier models optimize for to keep them from going out of business.
Oh and I don't mind they're bashing openclaw and selling why nanoclaw is better. I miss the times when products competed with each other in the open.
nz 57 minutes ago [-]
An interesting economic fact: Karl Marx observed that if factories keep getting more efficient, eventually, they will require fewer workers because the population is not growing quickly enough to match the increasing rate of production. This, as we have seen historically, is correct: we have fewer workers per factory and fewer factories per manufactured widget. Marx also observed that this will create mass unemployment. While this is _logically_ correct, it did not really turn out that way _historically_. Most of the manufacturing labor was replaced with bureaucratic labor (so called white-collar labor) -- all of those manufacturing firms needed to grow their internal bureaucracies to manage and direct a sprawling supply-chain.
ed_mercer 3 hours ago [-]
How is Nanoclaw different from running openclaw in a VM?
spacecadet 3 hours ago [-]
Why this is posted here, and is a revelation for anyone this many years later, is indicative of the times. Goodbye.
theturtletalks 3 hours ago [-]
Has anyone used:
OpenClaw
NanoClaw
IronClaw
PicoClaw
ZeroClaw
NullClaw
Any insights on how they differ and which one is leading the race?
tao_oat 2 hours ago [-]
I haven't used them all but based on my partial research so far:
- OpenClaw: the big one, but extremely messy codebase and deployment
- NanoClaw: simple, main selling point is that agents spawn their own containers. Personally I don't see why that's preferable to just running the whole thing in a container for single-user purposes
- IronClaw: focused on security (tools run in a WASM sandbox, some defenses against prompt injection but idk if they're any good)
- PicoClaw: targets low-end machines/Raspberry Pis
- ZeroClaw: Claw But In Rust
- NanoBot: ~4k lines of Python, easy to understand and modify. This is the one I landed on and have been using Claude to tweak as needed for myself
barbazoo 1 hours ago [-]
Everything supports WA, Telegram, etc. I wish it wasn't so hard to hook up Signal to anything.
I'm using the signal-cli-rest-api but the whole setup feels kinda wonky.
theturtletalks 1 hours ago [-]
Which would you say has the best cron and heartbeat implementation?
tao_oat 1 hours ago [-]
Haven't tried them in enough depth to compare.
Nanobot's was not great (cron + a HEARTBEAT.md meant two ways to do things, which would confuse the AI). But because the implementation is so simple, I could improve it in a few minutes in my own fork!
huqedato 2 hours ago [-]
The same crap under the hood, IMO.
redman25 59 minutes ago [-]
Yeah, good software takes time. These are all popping up way too fast.
Kiboneu 2 hours ago [-]
“If you trust the tool then you’re holding it wrong”
nemo44x 3 hours ago [-]
I’ve seen skills, etc. haphazardly launched with no constraints or guardrails, that more or less have admin access and can take actions that are not reversible.
It’s the monkey with a gun meme.
formerly_proven 4 hours ago [-]
d'uh
techpulse_x 2 hours ago [-]
[dead]
TeeWEE 3 hours ago [-]
Do you trust your employees?
Do you trust a contractor?
Do you trust other people?
AI is similar to a person you dont know that does work for you.
Probably AI is a bit more trustworthy than a random person.
But a company needs to let employees take ownership of their work, and trust them. Allow them to make mistakes.
Isn't it the same with AI?
ramoz 3 hours ago [-]
Yes, it is different.
An AI acts and reasons through probabilistic methods, creating a lot more risk than a human with memory, emotions, and rational thinking.
We can’t trust AI to do any sensitive work because they consistently f up. With & without malicious intent, whether it’s a fault of their attention mechanisms, reward hacking, instrumental convergence, etc. - all very different from what causes most human f ups.
alexhans 3 hours ago [-]
I think a key ingredient here is accountability and liability.
If there's a mistake, you can't blame the computer. Who is the human accountable at the end of it all? If there's liability, who pays for it?
That's where defining clear boundaries helps you design for your risk profile.
arnvald 3 hours ago [-]
It’s totally different. People have to obey laws and contracts because there are consequences if they don’t; there are fines, arbitration, courts.
What happens if an AI agent you run causes a lot of damage? The best you can do is turn it off.
juggle-anyhow 3 hours ago [-]
Exactly, and I would never turn my email or computer over to a contractor or anyone, really. They get their own environment, email, etc. Their actions stay as their actions.
adam12 3 hours ago [-]
Can you sue an ai agent?
TeeWEE 3 hours ago [-]
My point is: trust the work of AI just like the work of a contractor: check and verify, but don't micromanage.
dimitri-vs 2 hours ago [-]
As others have said: accountability