Next.js App Router + React Server Components Demo

new
past
show
ask
show
jobs
submit

▲The 100 hour gap between a vibecoded prototype and a working product (kanfa.macbudkowski.com)

262 points by kiwieater 20 days ago | 331 comments

alexpotato 20 days ago [-]

I work as a DevOps/SRE and have been doing it FinTech (bank, hedge funds, startups) and Crypto (L1 chain) for almost 20 years.

My thoughts on vibe coding vs production code:

- vibe coding can 100% get you to a PoC/MVP probably 10x faster than pre LLMs

- This is partly b/c it is good at things I'm not good at (e.g. front end design)

- But then I need to go in and double check performance, correctness, information flow, security etc

- The LLM makes this easier but the improvement drops to about 2-3x b/c there is a lot of back and forth + me reading the code to confirm etc (yes, another LLM could do some of this but then that needs to get setup correctly etc)

- The back and forth part can be faster if e.g. you have scripts/programs that deterministically check outputs

- Testing workloads that take hours to run still take hours to run with either a human or LLM testing them out (aka that is still the bottleneck)

So overall, this is why I think we're getting wildly different reports on how effective vibe coding is. If you've never built a data pipeline and a LLM can spin one up in a few minutes, you think it's magic. But if you've spent years debugging complicated trading or compliance data pipelines you realize that the LLM is saving you some time but not 10x time.

matt_heimer 20 days ago [-]

I'm building a Java HFT engine and the amount of things AI gets wrong is eye opening. If I didn't benchmark everything I'd end up with much less optimized solution.

Examples: AI really wants to use Project Panama (FFM) and while that can be significantly faster than traditional OO approaches it is almost never the best. And I'm not taking about using deprecated Unsafe calls, I'm talking about using primative arrays being better for Vector/SIMD operations on large sets of data. NIO being better than FFM + mmap for file reading.

You can use AI to build something that is sometimes better than what someone without domain specific knowledge would develop but the gap between that and the industry expected solution is much more than 100 hours.

jacquesm 20 days ago [-]

AI is extremely good at the things that it has many examples for. If what you are doing is novel then it is much less of a help, and it is far more likely to start hallucinating because 'I don't know' is not in the vocabulary of any AI.

Filligree 20 days ago [-]

> because 'I don't know' is not in the vocabulary of any AI.

That is clearly false. I’m only familiar with Opus, but it quite regularly tells me that, and/or decides it needs to do research before answering.

If I instruct it to answer regardless, it generally turns out that it indeed didn’t know.

jacquesm 20 days ago [-]

I haven't had that at all, not even a single time. What I have had is endless round trips with me saying 'no, that can't work' and the bot then turning around and explaining to me why it is obvious that it can't work... that's quite annoying.

dwaltrip 20 days ago [-]

Try something like:

> Please carefully review (whatever it is) and list out the parts that have the most risk and uncertainty. Also, for each major claim or assumption can you list a few questions that come to mind? Rank those questions and ambiguities as: minor, moderate, or critical.

> Afterwards, review the (plan / design / document / implementation) again thoroughly under this new light and present your analysis as well as your confidence about each aspect.

There's a million variations on patterns like this. It can work surprisingly well.

You can also inject 1-2 key insights to guide the process. E.g. "I don't think X is completely correct because of A and B. We need to look into that and also see how it affects the rest of (whatever you are working on)."

jacquesm 20 days ago [-]

Ok! I will try that, thank you very much.

dwaltrip 20 days ago [-]

Of course! I get pretty lazy so my follow-up is often usually something like:

"Ok let's look at these issues 1 at a time. Can you walk me through each one and help me think through how to address it"

And then it will usually give a few options for what to do for each one as well as a recommendation. The recommendation is often fairly decent, in which case I can just say "sounds good". Or maybe provide a small bit of color like: "sounds good but make sure to consider X".

Often we will have a side discussion about that particular issue until I'm satisfied. This happen more when I'm doing design / architectural / planning sessions with the AI. It can be as short or as long as it needs. And then we move on to the next one.

My main goal with these strategies is to help the AI get the relevant knowledge and expertise from my brain with as little effort as possible on my part. :D

A few other tactics:

- You can address multiple at once: "Item 3, 4, and 7 sound good, but lets work through the others together."

- Defer a discussion or issue until later: "Let's come back to item 2 or possibly save for that for a later session".

- Save the review notes / analysis / design sketch to a markdown doc to use in a future session. Or just as a reference to remember why something was done a certain way when I'm coming back to it. Can be useful to give to the AI for future related work as well.

- Send the content to a sub-agent for a detailed review and then discuss with the main agent.

intended 20 days ago [-]

Eh… I am not sure if that translate to “I don’t know”.

IDK would require the LLM to be aware of the frequency of cases seen in its own training.

I can see this working as a risk ranking, which is certainly worth trying in its own right.

Does it actually say “I don’t know?”

j45 18 days ago [-]

I don’t know can be added to the vocabulary depending on the technique being used.

There are so many overlapping and also unique approaches to software development beyond vibe coding and ai driven software development.

mtrovo 20 days ago [-]

I think the main issue is treating LLM as a unrestrained black box, there's a reason nobody outside tech trust so blindly on LLMs.

The only way to make LLMs useful for now is to restrain their hallucinations as much as possible with evals, and these evals need to be very clear about what are the goal you're optimizing for.

See karpathy's work on the autoresearch agent and how it carry experiments, it might be useful for what you're doing.

riffraff 20 days ago [-]

> there's a reason nobody outside tech trust so blindly on LLMs.

Man, I wish this was true. I know a bunch of non tech people who just trusts random shit that chatgpt made up.

I had an architect tell me "ask chatgpt" when I asked her the difference between two industrial standard measures :)

We had politicians share LLM crap, researchers doing papers with hallucinated citations..

It's not just tech people.

withinboredom 20 days ago [-]

We were working on translations for Arabic and in the spec it said to use "Arabic numerals" for numbers. Our PM said that "according to ChatGPT that means we need to use Arabic script numbers, not Arabic numerals".

It took a lot of back-and-forths with her to convince her that the numbers she uses every day are "Arabic numerals". Even the author of the spec could barely convince her -- it took a meeting with the Arabic translators (several different ones) to finally do it. Think about that for a minute. People won't believe subject matter experts over an LLM.

We're cooked.

ThrowawayR2 20 days ago [-]

Kind of a tangent but that did make me curious about how numbers are written in Arabic: https://en.wikipedia.org/wiki/Eastern_Arabic_numerals

tracker1 19 days ago [-]

I guess "Western Arabic" would have been more precise.

tstenner 20 days ago [-]

The architect should have required Hindu numbers. Same result, but even more confusion.

dvfjsdhgfv 20 days ago [-]

Man this is maddening.

godelski 20 days ago [-]

Honestly I think we're just becoming more aware of this way of thinking. It's certainly exacerbated it now that everyone has "an expert" in their pocket.

It's no different than conspiracy theorists. We saw a lot more with the rise in access to the internet. Not because they didn't put in work to find answers to their questions, but because they don't know how to properly evaluate things and because they think that if they're wrong then it's a (very) bad thing.

But the same thing happens with tons of topics, and it's way more socially acceptable. Look how everyone has strong opinions on topics like climate, rockets, nuclear, immigration, and all that. The problem isn't having opinions or thoughts, but the strength of them compared to the level of expertise. How many people think they're experts after a few YouTube videos or just reading the intro to the wiki page?

Your PM is no different. The only difference is the things they believed in, not the way they formed beliefs. But they still had strong feelings about something they didn't know much about. It became "their expert" vs "your expert" rather than "oh, thanks for letting me know". And that's the underlying problem. It's terrifying to see how common it is. But I think it also leads to a (partial) solution. At least a first step. But then again, domain experts typically have strong self doubt. It's a feature, not a bug, but I'm not sure how many people are willing to be comfortable with being uncomfortable

20 days ago [-]

j45 18 days ago [-]

There’s a possibility the same people might believe anything they read on social media or via Google and it’s something worthy of attention.

roncesvalles 20 days ago [-]

And the worst part is, these people don't even use the flagship thinking models, they use the default fast ones.

closewith 20 days ago [-]

In my experience, people outside of tech have nearly limitless faith in AI, to the point that when it clashes with traditional sources of truth, people start to question them rather than the LLM.

FpUser 20 days ago [-]

I am curious about what causes some to choose Java for HFT. From what I remember the amount of virgin sacrifices and dances with the wolves one must do to approach native speed in this particular area is just way too much of development time overhead.

matt_heimer 20 days ago [-]

Probably the same thing that makes most developers choice a language for a project, it's the language they know best.

It wasn't a matter of choosing Java for HFT, it was a matter of selecting a project that was a good fit for Java and my personal knowledge. I was a Java instructor for Sun for over a decade, I authored a chunk of their Java curriculum. I wrote many of the concurrency questions in the certification exams. It's in my wheelhouse :)

My C and assembly is rusty at this point so I believe I can hit my performance goals with Java sooner than if I developed in more bare metal languages.

nly 20 days ago [-]

"HFT" means different things to different people.

I've worked at places where ~5us was considered the fast path and tails were acceptable.

In my current role it's less than a microsecond packet in, packet out (excluding time to cross the bus to the NIC).

But arguably it's not true HFT today unless you're using FPGA or ASIC somewhere in your stack.

atomicnumber3 20 days ago [-]

The one person who understands HFT yeah. "True" HFT is FPGA now and also those trades are basically dead because nobody has such stupid order execution anymore, either via getting better themselves or by using former HFTs (Virtu) new order execution services.

So yeah there's really no HFT anymore, it's just order execution, and some algo trades want more or less latency which merits varying levels of technical squeezing latency out of systems.

matt_heimer 20 days ago [-]

Software HFT? I see people call Python code HFT sometimes so I understand what you mean. It's more in-line with low latency trading than today's true HFT.

I don't work for a firm so don't get to play with FPGAs. I'm also not co-located in an exchange and using microwave towers for networking. I might never even have access to kernel networking bypass hardware (still hopeful about this one). Hardware optimization in my case will likely top out at CPU isolation for the hot path thread and a hosting provider in close proximity to the exchanges.

The real goal is a combination of eliminating as much slippage as possible, making some lower timeframe strategies possible and also having best class back testing performance for parameter grid searching and strategy discovery. I expect to sit between industry leading firms and typical retail systematic traders.

20 days ago [-]

smokel 20 days ago [-]

> AI really wants to use Project Panama

It would help if you briefly specified the AI you are using here. There are wildly different results between using, say, an 8B open-weights LLM and Claude Opus 4.6.

matt_heimer 20 days ago [-]

I've been using several. LM Studio and any of the open weight models that can fit my GPU's RAM (24GB) are not great in this area. The Claude models are slightly better but not worth they extra cost most of the time since I typically have to spend almost the same amount of time reworking and re-prompting, plus it's very easy to exhaust credits/tokens. I mostly bounce back and forth between the codex and Gemini models right now and this includes using pro models with high reasoning.

tracker1 19 days ago [-]

Maybe a silly question, but why Java? As a C# guy, my experience with AI is it hasn't been great with it, and I'd suspect similar for Java. I'd probably go with Rust, which my own efforts with AI has done really well with, even if I'm far from a Rust expert.

colechristensen 20 days ago [-]

Then you list all of the things you want it not to do and construct a prompt to audit the codebase for the presence of those things. LLMs are much better at reviewing code than writing it so getting what you want requires focusing more on feedback than creation instructions.

grim_io 20 days ago [-]

Wouldn't Java always lose in terms of latency against a similarly optimized native code in, let's say, C(++)?

jacquesm 20 days ago [-]

Not necessarily. Java can be insanely performant, far more than I ever gave it credit for in the first decade of its existence. There has been a ton of optimization and you can now saturate your links even if you do fairly heavy processing. I'm still not a fan of the language but performance issues seem to be 'mostly solved'.

nly 20 days ago [-]

"Saturating your links" is rarely the goal in HFT.

You want low deterministic latency with sharp tails.

If all you care about is throughput then deep pipelines + lots of threads will get you there at the cost of latency.

matt_heimer 20 days ago [-]

You can achieve optimized C/C++ speeds, you just can't program the same way you always have. Step 1, switch your data layout from Array of Structures to Structure of Arrays. Step 2, after initial startup switch to (near) zero object creation. It's a very different way to program Java.

You have to optimize your memory usage patterns to fit in CPU cache as much as possible which is something typical Java develops don't consider. I have a background in assembly and C.

I'd say it's slightly harder since there is a little bit of abstraction but most of the time the JIT will produce code as good as C compilers. It's also an niche that often considers any application running on a general purpose CPU to be slow. If you want industry leading speed you start building custom FPGAs.

jodleif 20 days ago [-]

As long as you tune the JVM right it can be faster. But its a big if with the tune, and you need to write performant code

andriy_koval 20 days ago [-]

Java has significant overhead, that most/every object is allocated on heap, synchronized and has extra overhead of memory and performance to be GC controlled. Its very hard/not possible to tune this part.

matt_heimer 20 days ago [-]

You program differently for this niche in any language. The hot path (number crunching) thread doesn't share objects with gateway (IO) threads. Passing data between them is off heap, you avoid object creation after warm up. There is no synchronization, even volatile is something you avoid.

andriy_koval 20 days ago [-]

> Passing data between them is off heap

how exactly you are passing data? You can pass some primitives without allocating them on heap. You can use some tiny subset of Java+standard library to write high performance code, but why would you do this instead of using Rust or C++?

matt_heimer 20 days ago [-]

In some places I'm using https://github.com/aeron-io/agrona

Strangely this is one of the areas where I want to use project panama so I might re-implement some of the ring buffers constructs.

You allocate off heap memory and dump data into it. With modern Java classes like Arena, MemoryLayout, and VarHandle it's honestly a lot like C structs.

I answered "why" in another post in this thread.

andriy_koval 20 days ago [-]

> You allocate off heap memory and dump data into it. With modern Java classes like Arena, MemoryLayout, and VarHandle it's honestly a lot like C structs.

my opinion is that no, it is not, declaring and using C struct is 20x times more transparent, cost efficient and predictable. And that's we talking about C raw stucts, which has lots of additional ergonomics/safety/expression improvements in both c++ and rust on top of it.

tyingq 20 days ago [-]

Depends. Many reasons, but one is that Java has a much richer set of 3rd party libraries to do things versus rolling your own. And often (not always) third party libraries that have been extensively optimized, real world proven, etc.

Then things like the jit, by default, doing run time profiling and adaptation.

andriy_koval 20 days ago [-]

Java has huge ecosystem in enterprise dev, but very unlikely it has ecosystem edge in high performance/real time compute.

20 days ago [-]

roncesvalles 20 days ago [-]

There are actually cases when Java (the HotSpot JVM) runs faster than the same logic written in C/C++ because the JVM is doing dynamic analysis and selective JIT compilation to machine code.

not_kurt_godel 20 days ago [-]

I personally know of an HFT firm that used Java approximately a decade ago. My guess would be they're still using it today given Java performance has only improved since then.

andriy_koval 20 days ago [-]

it doesn't mean Java is optimal or close to optimal choice. Amount of extra effort they do to achieve goals could be significant.

jcgrillo 20 days ago [-]

Optimal in what sense? In the java shops I've worked at it's usually viewed as a pretty optimal situation to have everything in one language. This makes code reuse, packaging, deployment, etc much simpler.

In terms of speed, memory usage, runtime characteristics... sure there are better options. But if java is good enough, or can be made good enough by writing the code correctly, why add another toolchain?

andriy_koval 20 days ago [-]

> But if java is good enough, or can be made good enough by writing the code correctly,

"writing code correctly" here means stripping 95% of lang capabilities, and writing in some other language which looks like C without structs (because they will be heap allocated with cross thread synchronization and GC overhead) and standard lib.

Its good enough for some tiny algo, but not good enough for anything serious.

jcgrillo 20 days ago [-]

It's good enough for the folks who choose to do it that way. Many of them do things that are quite "serious"... Databases, kafka, the lmax disruptor, and reams of performance critical proprietary code have been and continue to be written in java. It's not low effort, you have to be careful, get intimate with the garbage collector, and spend a lot of time profiling. It's a totally reasonable choice to make if your team has that expertise, you're already a java shop, etc. I no longer make the choice to use java for new code. I prefer rust. But neither choice is correct or incorrect.

andriy_koval 20 days ago [-]

> Databases, kafka, the lmax disruptor, and reams of performance critical proprietary code have been and continue to be written in java.

those have low bar of performance, also they mostly became popular because of investments from Java hype, and rust didn't exist or had weak ecosystem at that time.

20 days ago [-]

mewpmewp2 20 days ago [-]

I would say that if AI has to make decisions about picking between framework or constructs irrelevant to the domain at hand, it feels to me like you are not using the AI correctly.

LtWorf 20 days ago [-]

I've seen SQL injection and leaked API tokens to all visitors of a website :)

Aurornis 20 days ago [-]

There’s a big gap between reality and the influencer posts about LLMs. I agree with you that LLMs do provide some significant acceleration, but the influencers have tried to exaggerate this into unbelievable numbers.

Even non-influencers are trying to exaggerate their LLM skills as a way to get hired or raise their status on LinkedIn. I rarely read the LinkedIn social feed but when I check mine it’s now filled with claims from people about going from idea to shipped product in N days (with a note at the bottom that they’re looking for a new job or available to consult with your company). Many of these posts come from people who were all in on crypto companies a few years ago.

The world really is changing but there’s a wave of influencers and trend followers trying to stake out their claims as leaders on this new frontier. They should be ignored if you want any realistic information.

I also think these exaggerated posts are causing a lot of people to miss out on the real progress that is happening. They see these obviously false exaggerations and think the opposite must be true, that LLMs don’t provide any benefit at all. This is creating a counter-wave of LLM deniers who think it’s just a fad that will be going away shortly. They’re diminishing in numbers but every LLM thread on HN attracts a few people who want to believe it’s all just temporary and we’re going back to the old ways in a couple years.

ryandrake 20 days ago [-]

> I rarely read the LinkedIn social feed but when I check mine it’s now filled with claims from people about going from idea to shipped product in N days (with a note at the bottom that they’re looking for a new job or available to consult with your company).

This always seems to be the pattern. "I vibe coded my product and shipped it in 96 hours!" OK, what's the product? Why haven't I heard of it? Why can't it replace the current software I'm using? So, you're looking for work? Why is nobody buying it?

Where is the Quicken replacement that was vibecoded and shipping today? Where are the vibecoded AAA games that are going to kill Fortnite? Where is the vibecoded Photoshop alternative? Heck, where is the vibecoded replacement for exim3 that I can deploy on my self hosted E-mail server? Where are all of the actual shipping vibecoded products that millions of users are using?

piersj225 20 days ago [-]

I found one example of this going very wrong on reddit the other day -

https://www.reddit.com/r/selfhosted/comments/1rckopd/huntarr...

One redditor security reviews a vibe coded project

ryandrake 20 days ago [-]

Wow, great example, and great example of what these fakers do when called out. Summary:

The maintainer, instead of listening to the security researcher and accepting feedback about his development process, instead:

1. Denied the problem

2. Censored discussion of the problem

3. Banned the people calling out the problem

...and then when the security issues were posted more publicly and got traction...

4. Made the subreddit private

5. Wiped and deleted his account

6. Wiped and deleted the GitHub repo

7. Took the project's web site off the web

Absolutely wild and unhinged behavior.

pjc50 20 days ago [-]

The self hosted reddit has been inundated with slop and is trying to ban it. It's not a good idea to run anyone else's vibe code!

jcgrillo 20 days ago [-]

holy fuck this is awesome.. I haven't laughed this hard in a while

socalgal2 20 days ago [-]

I agree with your general point but ... "Where are the vibecoded AAA games". A game dev team is typically less than 15% programmers. Most of the team are artists, followed by game designers. Maybe someday those will be replaced too but at the moment, while you can get some interesting pictures from stable-diffusion techniques it's unlikely to make a cohesive game and even prompting to create all of it would still take many person years.

That said, I have had some good experiences getting a few features from zero to working via LLMs and it's helped me find lots of bugs far easier than my own looking.

I can imagine a vibe coded todo app. I can also kind of imagine a vibe coded gIMP/Photoshop though it would still take several person years, prompting through each and every feature.

wierdbytes 20 days ago [-]

> Where are all of the actual shipping vibecoded products that millions of users are using?

Claude Code and OpenClaw - they are vibecoded. And I believe more coming.

Jensson 20 days ago [-]

Claude Code is not vibecoded, it is made using Claude Code but it is not vibecoded using Claude Code.

wierdbytes 19 days ago [-]

kind of - https://x.com/bcherny/status/2004897269674639461

snovv_crash 20 days ago [-]

But it's like crypto then, good for buying other crypto, or illegal stuff.

Also people are using CC for the cheap access to the model, otherwise they'd be using opencode.

youknownothing 20 days ago [-]

Yeah, I really wonder if someone would trust to do their taxes in a vibe-coded version of Turbotax...

mordechai9000 20 days ago [-]

Do you really need Turbotax? Just feed it the tax code, your financial data, and the relevant forms and it should be good to go. Now we have freed up the labor of accountants so they can go be productive in another segment of society. /s

adithyassekhar 19 days ago [-]

Looking from outside the US tax system, I feel taxes are intentionally complex to keep some people employed and to hide from questions.

youknownothing 19 days ago [-]

well, you still need to drive some data: say you purchased some food, was that a personal purchase or was it business-related (e.g. a meal with a client). Software can help with that decision making.

But even if human interaction wasn't needed, the question still stands, would you trust an LLM to calculate your tax liability just by feeding it data?

pjc50 20 days ago [-]

I regret only having one upvote for this.

I note that games are mostly art assets and things like level design, and players are already happy to instantly consign such products to the slop bin.

The whole thing is "market for lemons": app stores filling with dozens of indistinguishable clones of each product category will simply scare users off all of them.

atomicnumber3 20 days ago [-]

"I come from a state that raises corn and cotton and cockleburs and Democrats, and frothy eloquence neither convinces nor satisfies me. I am from Missouri. You have got to show me."

paganel 20 days ago [-]

The “store on the chain” thing turned out to be a fad in terms of technology, even though it made a lot of money (in the billions and more) to some people via the crypto thing. That was less than 10 years ago, so many of us do remember the similarities of the discourse being made then to what’s happening now.

With all that said, today’s LLMs do seem so provide a little bit more value compared to the bit chain thing, for example OCR/.pdf parsing is I’d say a solved thing right now thanks to LLMs, which is nice.

20 days ago [-]

roncesvalles 20 days ago [-]

>Many of these posts come from people who were all in on crypto companies a few years ago.

This is ditto my observation. There seems to be a certain "type" of people like this. And it's not just people looking for work.

My guess is either they have super low critical thinking, a very cynical view of the world where lies and exaggeration are the only way to make it, or something more pathological (narcissism etc).

PyWoody 20 days ago [-]

The "type" is simply the get-rich-quick schemers.

I have a relative who was late to crypto, late to drop shipping, late to carbon credits, but is now absolutely all-in on AI as his ticket out. It honestly depresses the hell out of me trying to talk to him because everything is about money and getting rich.

People like this don't care about underlying technologies or learning past the most basic surface level of understanding.

ge96 20 days ago [-]

Day 7 of using Claude Code here are my takes...

sebastiennight 20 days ago [-]

“Day 7" would be amazing - all that I see YouTube recommending is "I tried it for 24 hours"

I was listening to an "expert" on a podcast earlier today up until the point where the interviewer asked how long his amazing new vibe-coded tooling has been in production, and the self-proclaimed expert replied "actually we have an all-hands meeting later today so I can brief the team and we will then start using the output..."

capplexham 20 days ago [-]

> The LLM makes this easier but the improvement drops to about 2-3x b/c there is a lot of back and forth + me reading the code to confirm etc

This makes sense when you stop viewing the LLM as a "vending machine" for apps and start seeing it as a repository of software deltas.

LLMs aren't just trained on final code; they are trained on the entire history of pull requests, review comments, and issue discussions that move a project from one version to the next.

When I use an LLM now, my workflow has shifted entirely. I’ve stopped trying to be the "coder" and have instead stepped into the role of PR Reviewer and Power User. My job is to point out edge cases, define the spec, and catch regressions—effectively managing a "virtual team" that handles the boilerplate and feature implementation.

Expecting a one-shot 1.0 release is unrealistic because it bypasses the thousand micro-decisions that happen in a real dev cycle. By embracing the "review and refine" loop, I’m becoming a better maintainer, even if that 100-hour gap to a polished product still exists.

bittermandel 20 days ago [-]

This is exactly my experience at Lovable. For some parts of the organization, LLMs are incredibly powerful and a productivity multiplier. For the team I am in, Infra, it's many times distraction and a negative multiplier.

I can't say how many times the LLM-proposed solution to a jittery behavior is adding retries. At this point we have to be even more careful with controlling the implementation of things in the hot path.

I have to say though, giving Amp/Claude Code the Grafana MCP + read-only kubectl has saved me days worth of debugging. So there's definitely trade-offs!

bee326 20 days ago [-]

My colleague recently shipped a "bug fix" that addresses a race condition by adding a 200ms delay somewhere, almost completely coded by LLM. LLM even suggests that "if this is not good enough, increase it to 300ms".

That says something about how much some people care about this.

nomorewords 20 days ago [-]

Even doubly so because that's how most people have solved a similar problem, so that the LLM suggests that

netbioserror 20 days ago [-]

More generally: LLM effectiveness is inversely proportional to domain specificity. They are very good at producing the average, but completely stumble at the tails. Highly particular brownfield optimization falls into the tails.

Aperocky 20 days ago [-]

The magic is testing. Having locally available testing and high throughput testing with high amount of test cases now unlocks more speed.

The test cases themselves becomes the foci - the LLM usually can't get them right.

robhlt 20 days ago [-]

How does that test suite get built and validated? A comprehensive and high quality test suite is usually much larger than the codebase it tests. For example, the sqlite test suite is 590x [1] the size of the library itself

1. https://sqlite.org/testing.html

Aperocky 20 days ago [-]

By sweat and tears, and unfortunately, AI can only help so much in those cases. You'll have to have a really concrete idea about what your product is and how it should work.

nswango 20 days ago [-]

sqlite is an extreme outlier not a typical example, with regard to test suite size and coverage.

worik 20 days ago [-]

> The magic is testing.

No it is not.

There os no amount of testing that can fix a flawed design

Aperocky 20 days ago [-]

That is a given, similar to how no amount of implementation can fix a wrong product. Both statements are not very meaningful.

neonbrain 20 days ago [-]

The word "Testing" is a very loaded term. Few non-professionals, or even many professionals, fully understand what is meant by it.

Consider the the following: Unit, Integration, System, UAT, Smoke, Sanity, Regression, API Testing, Performance, Load, Stress, Soak, Scalability, Reliability, Recovery, Volume Testing, White Box Testing, Mutation Testing, SAST, Code Coverage, Control Flow, Penetration Testing, Vulnerability Scanning, DAST, Compliance (GDPR/HIPAA), Usability, Accessibility (a11y), Localization (L10n), Internationalization (i18n), A/B Testing, Chaos Engineering, Fault Injection, Disaster Recovery, Negative Testing, Fuzzing, Monkey Testing, Ad-hoc, Guerilla Testing, Error Guessing, Snapshot Testing, Pixel-Perfect Testing, Compatibility Testing, Canary Testing, Installation Testing, Alpha/Beta Testing...

...and I'm certain I've missed dozens of other test approaches.

megous 20 days ago [-]

You forgot a hope-driven development and release process and other optimism based ("i'm sure it's fine" method), or faith based approaches to testing (ship and pray, ...). Customer driven invluntary beta testing also comes to mind and "let's see what happens" 0-day testing before deployment. We also do user-driven error discovery, frequently.

worik 20 days ago [-]

There is no science to testing, no provable best way, despite many people's vehement opinions

neonbrain 20 days ago [-]

Why did you assume I'm talking about a "provable best way"? I meant that it doesn't make sense to talk simply about "testing" without clarifying what one means by it. If you assume that the absence of a "provable best way" implies a lack of utility, let me remind you that there is no "provable best way" for training LLMs either. Does that matter in practice?

yojo 20 days ago [-]

> - This is partly b/c it is good at things I'm not good at (e.g. front end design)

Everyone thinks LLMs are good at the things they are bad at. In many cases they are still just giving “plausible” code that you don’t have the experience to accurately judge.

I have a lot of frontend app dev experience. Even modern tools (Claude w/Opus 4.6 and a decent Claude.md) will slip in unmaintainable slop in frontend changes. I catch cases multiple times a day in code review.

Not contradicting your broader point. Indeed, I think if you’ve spent years working on any topic, you quickly realize Claude needs human guidance for production quality code in that domain.

steveBK123 20 days ago [-]

Yes I’ve seen this at work where people are promoting the usage of LLMs for.. stuff other people do.

There’s also a big disconnect in terms of SDLC/workflow in some places. If we take at face value that writing code is now 10x faster, what about the other parts of the SDLC? Is your testing/PR process ready for 10x the velocity or is it going to fall apart?

What % of your SDLC was actually writing code? Maybe time to market is now ~18% faster because coding was previously 20% of the duration.

onionisafruit 20 days ago [-]

It’s the Gell-Mann amnesia effect applied to LLM instead of media

cvhc 20 days ago [-]

> Testing workloads that take hours to run still take hours to run with either a human or LLM testing them out (aka that is still the bottleneck)

Actually I had some terrible experiences when asking the agent to do something simple in our codebase (like, rename these files and fix build scripts and dependencies) but it spent much longer time than a human, because it kept running the full CI pipelines to check the problems after every attempted change.

A human would, for example, rely on the linter to detect basic issues, run a partial build on affected targets, etc. to save the time. But the agent probably doesn't have a sense of time elapsed.

alexpotato 20 days ago [-]

Went through something similar recently with database calls.

Co-pilot said something about having too many rows returned and had some complex answer on how to reduce row count.

I just added a "LIMIT 100" which was more than adequate.

djeastm 20 days ago [-]

Can't this be solved with something like "Don't run any CI commands" in the AGENTS.md?

aeonfox 20 days ago [-]

Except for the times you do want it to run the CI.

LLM issues can often be solved by being more and more specific, but at some point being specific enough is just as time consuming as jumping in and doing it yourself.

bauerd 20 days ago [-]

>Testing workloads that take hours to run still take hours to run with either a human or LLM testing them out (aka that is still the bottleneck)

Absolutely. Tight feedback loops are essential to coding agents and you can’t run pipelines locally.

efromvt 20 days ago [-]

This is where I think we need better tooling around tiered validation - there's probably quite a bit you can run locally if we had the right separation; splitting the cheap validation from the expensive has compounding benefits for LLMs.

bojangleslover 20 days ago [-]

What I do now is I make an MVP with the AI, get it working. And then tear it all down and start over again, but go a little slower. Maybe tear down again and then go even more slowly. Until I get to the point where I'm looking at everything the AI does and every line of code goes through me.

the__alchemist 20 days ago [-]

I concur on the DevSecOps aspect for a more specific reason: If you're failing a pipeline because ThirdPartyTOol69 doesn't like your code style or W/E, you can have the LLM fix it. Or get you to 100% test coverage etc. Or have it update your Cypress/Jest/SonarQube configs until the pipeline passes without losing brain cells doing it by hand. Or finds you a set of dependency versions that passes.

teleforce 20 days ago [-]

> The LLM makes this easier but the improvement drops to about 2-3x b/c there is a lot of back and forth + me reading the code to confirm etc (yes, another LLM could do some of this but then that needs to get setup correctly etc)

> The back and forth part can be faster if e.g. you have scripts/programs that deterministically check outputs

This is where configuration language like CUE can be useful in complementing LLM [1].

It's the deterministic NLP cousin of the stochastic LLM based on mathematically sound latticed-value logic [2].

[1] Guardrailing Intuition: Towards Reliable AI:

https://cue.dev/blog/guardrailing-intuition-towards-reliable...

[2] The Logic of CUE:

https://cuelang.org/docs/concept/the-logic-of-cue/

amelius 20 days ago [-]

Also, now you're reading someone else's code and not everybody likes that. In fact, most self-proclaimed 10x coders I know hate it.

So instead of the 10x coder doing it, the 1x coder does it, but then that factor of 3x becomes 0.3x.

steveBK123 20 days ago [-]

Absolutely. In my experience there are more “good coders” than people who are good at code review/PR/iterative feedback with another dev.

A lot of people are OCD pedants about stuff that can be solved with a linter (but can’t be bothered to implement one) or just “LGTM” everything. Neither provide value or feedback to help develop other devs.

alexpotato 20 days ago [-]

> A lot of people are OCD pedants about stuff that can be solved with a linter (but can’t be bothered to implement one) or just “LGTM” everything. Neither provide value or feedback to help develop other devs.

This many be one of the best quotes on HN in a while.

steveBK123 20 days ago [-]

Thank you, I felt old & cranky today

baxtr 20 days ago [-]

Isn’t that the reason why people advocate for spec-driven development instead of vibe coding?

jmt710 19 days ago [-]

Do you ever think that maybe your biased? I ask because I get the feeling from a lot of professional programmers that they feel like they are better and smarter than everyone and everything else. No matter how good an LLM or AI in general gets at programming task, people who make a living programming will always have a problem with it. There is going to come a time when you're going to be obsolete. I hate to say it, but it's coming and the hostility twords the tech isn't going to save your job.

quater321 20 days ago [-]

At this point, every programmer who claims that vibecoding doesn't make you at least 10 times more productive is simply lying or worst, doesn't know how to vibe code. -So, you want to tell me that you don't review the code you write? Or that others don't review it? - You bring up ONE example with a bottleneck that has nothing to do with programming. Again, if you claim it doesn't make you 10x more productive, you don't know how to use AI, it is that simple. - I pin up 10 agents, while 5 are working on apps, 5 do reviews and testing, I am at the end of that workflow and review the code WHILE the 10 agents keep working.

For me it is far more than 10x, but I consider noobs by saying 10x instead of 20x or more.

sieste 20 days ago [-]

Just goes to show that most programmers have no idea what most programmers are mostly programming. Great that it works for you, but don't assume that this applies to everyone else.

atomicnumber3 20 days ago [-]

Can you link to one launched product with users for us?

geetee 20 days ago [-]

I can't tell if this is real or a joke.

hsuduebc2 20 days ago [-]

What exactly are you producing? LinkedIn posts?

raincole 20 days ago [-]

They're... launching an NFT product in 2026...

I know it's not the point of this article, but really?

s1mon 20 days ago [-]

Yep. As much as the rest of it resonated with LLM coding experiences I'm having, the NFT thing is unfortunate.

serial_dev 20 days ago [-]

The way I see it, the NFT part is actually just for convenience to distribute AI generated images.

It could have been a web app, but with NFTs and Farcaster miniapps, you market to people who are willing and able to spend using their wallet instead of asking “normies” for credit card information for a 2 dollar custom image (that you could also prompt out of a free Gemini session).

With Farcaster, you also already have the profile picture of the user, one less hurdle again.

ryandrake 20 days ago [-]

I think there's simply a huge overlap between the Crypto Bros, the NFT Bros, and now the AI Bros. The same sorts of people are pumping each one. I knew a guy who was into LeadGen and Drop Shipping in the 2000s, then got into online poker, then of course, got into Crypto, then inevitably NFTs. I haven't kept up with him, but I'm almost 100% sure he's pumping some AI related scheme now. These guys get into this pipeline and at each stage they are convinced that they're going to get rich off it.

colechristensen 20 days ago [-]

Crypto has very narrow usage unless you're a criminal or a bro, NFT has essentially 0 non-bro activity, surely AI attracts bros, but also some of the smartest people I've known have been working on it a long time to build truly useful things.

AI can be really attractive to bros but also be incredibly useful.

In other words, AI isn't a trend that's going to pass, it's permanently going to reshape the tech scene and economy in a way that cryptocoins and NFTs absolutely did not.

ryandrake 20 days ago [-]

> AI isn't a trend that's going to pass, it's permanently going to reshape the tech scene and economy in a way that cryptocoins and NFTs absolutely did not.

This exact wording was used for crypto. "It isn't a trend that's going to pass" and "It's going to reshape everything." Why are we sure of it now for AI (and that we're going to be right), when they were also sure of it before for crypto (and they ended up wrong)?

The AI people have the exact same feelings of absolute certainty as the crypto people had.

cheschire 20 days ago [-]

People's grandmothers know what AI is and many have used it, even outside the west.

Probably zero grandmothers outside the west, and very few grandmothers within the west, know what NFT even stands for.

mattmanser 20 days ago [-]

But they do know what crypto is for and several of them would have lost money on it.

colechristensen 20 days ago [-]

>The AI people have the exact same feelings of absolute certainty as the crypto people had.

Yes, this is what hype is like. Something being hyped doesn't automatically mean it's going to go bust.

Again, everyone smart was making fun of NFT at its peak. People who accomplish things are excited by LLMs, not just teenagers, grifters, bros, and your weird uncle.

suzzer99 20 days ago [-]

I'd pay a few bucks for some cool avatar or w/e this is. It seems like a good use of NFTs.

itomato 20 days ago [-]

And the viewpoint is from the development of such "product" with "manufactured virality".

It's bunk.

chamomeal 20 days ago [-]

I have friends (well, friends of friends) who still play the NFT lottery. People love gambling lol

daveguy 20 days ago [-]

I thought everyone realized by now that a digital image made available via block chain or any other mechanism, can be duplicated indefinitely. The only thing you get is a copyright on some generated image or set of bits. And what are the chances any random digital image is going to be appreciated as art? You can't hang it in a living room or sit it on a coffee table. It's beanie babies, but without even a hill of beans.

Are people just expecting there's going to be enough digital fools to make a market?

sowbug 20 days ago [-]

Isn't the same true of any intellectual property?

A movie can be duplicated indefinitely. There's no guarantee your song will be appreciated as art. I'm not sure why you say you can't print out an image and hang it in your living room; we do that all the time at home.

I've personally never dabbled in NFTs, but I don't think it's fair to ascribe the inherent conflict between information and scarcity uniquely to them.

daveguy 20 days ago [-]

The difference is, there is a person or many people collaborating who created the song, movie, etc. there's a dipshit with an RNG who created the NFT.

roncesvalles 20 days ago [-]

You don't have to believe in it. You just have to believe someone else will believe in it and be willing to pay a higher price.

marginalia_nu 20 days ago [-]

The more I evaluate Claude Code, the more it feels like the world's most inconsistent golfer. It can get within a few paces of the hole in often a single strike, and then it'll spend hours, days, weeks trying to nail the putt.

There's some 80-20:ness to all programming, but with current state of the art coding models, the distribution is the most extreme it's ever been.

youknownothing 20 days ago [-]

I'm having somewhat good experiences with AI but I think that's because I'm only half-adopting it: instead of the full agentic / Ralphing / the-AI-can-do-anything way, I still do work in very small increments and review each commit. I'm not as fast as others, but I can catch issues earlier. I also can see when code is becoming a mess and stop to fix things. I mean, I don't fix them manually, I point Claude at the messy code and ask it to refactor it appropriately, but I do keep an eye to make sure Claude doesn't stray off course.

Honestly, seeing all the dumb code that it produces, calling this thing "intelligent" is rather generous...

tqwhite 20 days ago [-]

I would love it if someone explained what their ten agents Ralphing away were actually told to do.

I suppose if you are doing something that truly can be decided based on a test but, I just don't see it, at least for anything I do.

apitman 20 days ago [-]

I think ralphing is for purely vibe coded stuff, where you're literally never looking at the code and only asking for changes to the final output.

If I'm reviewing all the code, so far I'm still the bottleneck even with a single agent and I don't see an easy way to change that.

ChrisMarshallNY 20 days ago [-]

"working" != "shipping."

When we start selling the software, and asking people to pay for/depend upon our product, the rules change -substantially.

Whenever we take a class or see a demo, they always use carefully curated examples, to make whatever they are teaching, seem absurdly simple. That's what you are seeing, when folks demonstrate how "easy" some new tech is.

A couple of days ago, I visited a friend's office. He runs an Internet Tech company, that builds sites, does SEO, does hosting, provides miscellaneous tech services, etc.

He was going absolutely nuts with OpenClaw. He was demonstrating basically rewiring his entire company, with it. He was really excited.

On my way out, I quietly dropped by the desk of his #2; a competent, sober young lady that I respect a lot, and whispered "Make sure you back things up."

niemandhier 20 days ago [-]

With sufficiently advanced vibe coding the need for certain type of product just vanishes.

I needed it, I quickly build it myself for myself, and for myself only.

sieste 20 days ago [-]

Related anecdote: My 12yo son didn't like the speed cubing online timer he was using because it kept crashing the browser and interrupted him with ads. Instead of googling a better alternative we sat down with claude code and put together the version of the website that behaved and looked exactly as he wanted. He got it working all by himself in under an hour with less than 10 prompts, I only helped a bit putting it online with github pages so he can use it from anywhere.

WarmWash 20 days ago [-]

I don't think people are grasping yet that this is the future of software, if by no metric other than "most software used is created by the user".

AstroBen 20 days ago [-]

The average user doesn't even know what a file is

sieste 20 days ago [-]

Turns out that knowing what a plain text file is will be the criterion that distinguishes users who are digitally free from those locked into proprietary platforms.

nly 20 days ago [-]

Wont happen.

The average user just has no interest in building things.

sieste 20 days ago [-]

Many parents are extremely interested in quickly building digital tools for their kids (education and entertainment) that they know are free from advertising, social media integration, user monitoring etc.

GeoAtreides 20 days ago [-]

I'm saying this with all my love and respect: you are living in a very small bubble

sieste 20 days ago [-]

That may be true. But you also have to give the average parent more credit by assuming they don't want tech companies spying on their children and forcing their toxic platforms on them.

There are well attended parent evenings in our school on that topic.

Thinking about it, we should turn these into vibe coding hackathons where we replace all the ad-ridden little games, learning tools, messengers we don't like with healthy alternatives.

20 days ago [-]

WarmWash 20 days ago [-]

Which is why they will use AI to do the building...

mattmanser 20 days ago [-]

Outside of small niches, no-one wants to maintain and host their own software.

This is not the future of software.

qsera 20 days ago [-]

>most software used is created by the user

You really believe that?

WarmWash 20 days ago [-]

Yes, because the current software paradigm (a shed/barn/warehouse full of tools to suite every possible users every possible need) doesn't make sense when LLMs can turn plain English into a software tool in the matter of minutes.

qsera 20 days ago [-]

>LLMs can turn plain English into a software tool in the matter of minutes.

Unless LLMs can read minds, no one will bother to specify, even in plain english with the required level of detail. And that is assuming the user has the details in mind, which is also something pretty improbable...

Gareth321 20 days ago [-]

You need to think outside the box a little. They're not going to need to write a requirements doc from scratch. They'll tell it to copy a piece of software which is already established and make some customisations or improvements based on their needs. This is a few sentences.

qsera 20 days ago [-]

But if most software is created by user, then where does this reference piece come from?

Gareth321 16 days ago [-]

The same place all creative reference comes from: someone or something else. We have a nearly unlimited well of creative and technical reference now.

zahlman 20 days ago [-]

That wasn't being claimed, just proposed as the direction we're headed.

qsera 20 days ago [-]

Another user had already written what I had in mind when I responded to your comment..

https://news.ycombinator.com/item?id=47387570

marcosdumay 20 days ago [-]

So... The future is like the past?

That would be good news, but I doubt most people will do things like that.

SlinkyOnStairs 20 days ago [-]

> I don't think people are grasping yet that this is the future of software

What about this is new?

Sitting down with a child to teach them the very basics of javascript in an hour? Trivial.

Needing Claude to do it is kind of embarassing, if anything.

kaffekaka 20 days ago [-]

Out of curiosity, did you also implement scramble support? Or just the timing stuff?

sieste 20 days ago [-]

yes. claude added a suggested random scramble (if that's what you mean?), also running average of 5/12/100, local storage of past times on first iteration, my son told it to also add a button for +2s penalties and touch screen support.

kaffekaka 19 days ago [-]

Ok cool! I have not done any cubing related coding so I don't know how complicated it gets but making sure suggested scrambles are solvable etc seems like it could be non-trivial?

sieste 16 days ago [-]

You just get a sequence of random moves to go from solved to scrambled, it's quite trivial.

See here if you're interested: stefansiegert.net/cube-timer

Let me know if you adapt it in any way, my son would be delighted to see open source work its magic :)

kaffekaka 16 days ago [-]

Ah of course, thank you. Defining the moves to get to the scramble makes sure it is solvable.

zahlman 20 days ago [-]

... So at no point in this did anyone even question why it should be a website?

sieste 20 days ago [-]

"use it from anywhere" was important, and I don't think there's an easier way than a freely hosted static website.

AstroBen 20 days ago [-]

Because now that website is fully cross-platform and sandboxed with no practical downside

lacedeconstruct 20 days ago [-]

I dont want that though, I want someone to spend much more time than I can afford thinking about and perfecting a product that I can pay for and dont worry about it

jsdalton 20 days ago [-]

The metaphor that’s popped into my head recently is baking bread.

You can learn to bake good bread. It’s not _that_ hard. And it’ll probably taste better than store bought bread.

But it almost certainly won’t be cheaper. And it’ll take a more more time and effort.

Still, sometimes you might bake your own bread for kicks. But most of the time, you’ll just buy the bread someone else has already perfected.

nly 20 days ago [-]

Baking bread also takes hours of waiting.

I can have fresh bread anytime I want from a handful of nearby stores.

Gareth321 20 days ago [-]

In the next few years it's going to be quicker to tell an AI to make something than it will take to hunt down software which fits all your uses perfectly. If you're honest, all software is imperfect for you. It's not customised exactly how you like it. Imagine if it could be exactly what you want with zero effort.

kami23 20 days ago [-]

And some people do, both things can be true. I'd rather make a tool just for me that breaks when I introduce a new requirement and I just add into it and keep going.

kjksf 20 days ago [-]

The statement wasn't: "no one ever vibe codes an alternative to product X"

It was: "With sufficiently advanced vibe coding the need for certain type of product just vanishes."

If a product has 100 thousand users and 1% of them vibe codes an alternative for themselves, the product / business doesn't vanish. They still have 99 thousand of users.

That was the rebuttal, even if not presented as persuasively and intelligently as I just did.

So no, it's not the case of "both things being true". It's a case of: he was wrong.

GorbachevyChase 20 days ago [-]

At some point there will be market consequences for that kind of behavior. So where market dynamics are not dominated by bullshit (politics, friendships forged on Little St James, state intervention, cartel behavior, etc.) if my company provides the same service as another, but I replaced all of the low quality software as a service products my competitor uses with low quality vibe coded products, my overhead cost will be lower and that will give me an advantage.

hmmmmmmmmmmmmmm 20 days ago [-]

If we could return to one-off payments without dark patterns I would agree. Hopefully at least the software that rely on grift will start to vanish.

keyle 20 days ago [-]

I built a jira with attachments and all sorts of bells and whistles. Purrs like a kitten. Saas are going extinct. At least the jobs that charged $1000 a day to write jira plugins.

ivan_gammel 20 days ago [-]

Some minor UX enhancement SaaS of the most recent VC-funded wave will do. Maybe those who forgot how to invest in R&D and spent last 20 years just fixing bugs. There’s plenty of SaaS on the market that offers added value beyond the code. Data brokers. Domain experts, etc. Even if homemade solution is sometimes possible, initial development costs are going to be just one of several important factors in choosing whether to build or to buy.

101008 20 days ago [-]

SaaS are not going exctinct. This reminds me of the LinkedIn posts saying they clone Slack in two hours, copying the UI, etc. Yeah, if you think Slack is private chat rooms then you should use IRC for your company.

One of the most valuable things about Slack is the ecosystem: apps, API support, etc. If you need to receive notifications from external apps (like PageDuty or Incident.io or something like that), good luck expecting them having a setup for your own version of the app. Yeah, some of them provide webhooks (not all of them), but in the end you have to maintain that too...

advancespace 20 days ago [-]

[dead]

pydry 20 days ago [-]

jira is a perfect example of an abysmal product that was marketed well.

xp84 20 days ago [-]

Yes, it seems like it got to some tipping point around 2013 where so many product and management people were familiar with it, and from there it became this “industry standard” that management always wanted everyone to use.

Also though, I feel like being attached to Confluence helped it because there is a lot less competition in the world of documentation wikis than there is in task management.

matwood 20 days ago [-]

Products where the only value was the code are definitely under pressure. But, how many products are really like that? I suggest everyone look up HALO that’s so popular in investing right now, and start looking at companies with the assumption that the value of the code is zero so what other value is there. There’s often a lot more there than people realize.

jcgrillo 20 days ago [-]

How many products are actually like that? If I could easily replace github, datadog/sentry/whatever, cloudflare, aws, tailscale that would be great. In my view building and owning is better than buying or renting. Especially when it comes to data--it would be much better for me to own my telemetry data for example than to ship it off to another company. But I don't think you (or anyone) will be vibecoding replacements for these services anytime soon. They solve big, hard, difficult problems.

CuriouslyC 20 days ago [-]

Github is on the chopping block as a tool (it's sticky as a social network). The other stuff not so much.

The things that are going away are tools that provide convenience on top of a workflow that's commoditized. Anything where the commercial offering provides convenience rather than capabilities over the open source offerings is gonna get toasted.

jcgrillo 20 days ago [-]

Even at recent levels of uptime I think it would be very difficult to build a competing product that could function at the scale of even a small company (10 engineers). How would you implement Actions? Code review comments/history? Pull requests? Issues? Permalinks? All of these things have serious operational requirements. If you just want some place to store a git repository any filesystem you like will do it but when you start talking about replacing github that's a different story altogether and TBH I don't think building something that appears to function the same is even the hard part, it's the scaling challenges you run into very quickly.

WarmWash 20 days ago [-]

The future is narrow bespoke apps custom tailored for exactly that one single users use case.

An example would be if the user only ever works with .jpg files, then you don't need to support any of the dozens of other formats an image program would support.

I cannot stress enough how many software users out there are only using 1-10% of a program's capability, yet they have to pay for a team of devs who maintain 100% of it.

jcgrillo 20 days ago [-]

"The future" is fiction. It's a blank canvas where you can make a fingerpainting of any fantasy you like. Whenever people tell me about "the future" I know they're talking absolute rubbish. And I also like your fantasy! But it probably won't happen.

ryandrake 20 days ago [-]

I call it "Psychics for Programmers." People will scoff at psychics and fortune telling and palm reading, but then the same people will listen to Elon or some founder or VC and be utterly convinced that that person is a visionary and can describe the future.

WarmWash 20 days ago [-]

It's just reading the room. People hate having to use their computers through the lens of quasi-robot humans (saying that as one of those robots). They hate having to pay monthly just so dumb features and UI overhauls can be pushed on them.

They just want the software to do the few things they need it to do. AI labs are falling over themselves to remove the gate keeping regular people from using their computing device the way they want to use it. And the progress there in the last few years is nothing short of absolutely astounding.

jcgrillo 20 days ago [-]

> the progress there in the last few years is nothing short of absolutely astounding

Yet, all the astounding progress notwithstanding, I don't have a suite of bespoke tools replacing the ones I depend on. I cannot say "hey claude, make me a suite of bespoke software infrastructure monitoring and operational tooling tailored to my specific needs" and expect anything more than a giant headache and wasted time. So maybe we just need to wait? Or maybe it's just not actually real. My view is unless you show me a working demo it's vaporware. Show me that the problem is solved, don't tell me that it might be solved later sometime.

user34283 20 days ago [-]

And what exactly is preventing you from building bespoke software for "infrastructure monitoring and operational tooling tailed to your specific needs"?

I could certainly imagine building myself some sort of dashboard. It would seem like a prime use case.

You want to hear about a problem solved? Recently I extended a tool that snaps high resolution images to a Pixel art grid, adding a GUI. I added features to remove the background, to slice individual assets out of it automatically, and to tile them in 9-slice mode.

Could I have realistically implemented the same bespoke tool before AI? No.

jcgrillo 20 days ago [-]

> And what exactly is preventing you from building bespoke software for "infrastructure monitoring and operational tooling tailed to your specific needs"?

Let's say I emit roughly 1TB of telemetry data per day--logs, metrics, etc. That's roughly what you might expect from medium sized tech company or a specific department (say, security) at a large company. There is going to be a significant infrastructure investment to replicate datadog's function in my organization, even if I only use a small subset of their product. It's not just "building a dashboard" it's building all the infrastructure to collect, normalize, store, and retrieve the data to even be able to draw that dashboard.

The dashboard is the trivial part. The hard part is building, operating, and maintaining all the infrastructure. Claude doesn't do a very good job helping with this, and in some sense it actually hinders.

EDIT: I'm not saying you shouldn't take ownership of your telemetry data. I think that's a strategically (and potentially from a user's perspective) better end result. But it is a mistake to trivialize the effort of that undertaking. Claude is not going to vibeslop it for you.

user34283 20 days ago [-]

I agree, that does not seem like a smart undertaking. I was thinking more of a dashboard within the existing software, or above it.

For my use case I wanted bespoke software to work with Pixel art, but obviously I would not try to build Photoshop or Aseprite from scratch. I needed only specific functionality and I was able to build that in a way fitting my workflow better than any existing software could.

I was able to build it with Claude Code and Codex. Maybe the implementation is sloppy, I did not care to check. The program works, and it's like a side project to my side project. It would not have been possible in the past, I would have needed to work with what Aseprite offers out of the box.

20 days ago [-]

IAmGraydon 20 days ago [-]

This is a pipe dream and “sufficiently advanced” is doing a lot of heavy lifting. You really think people would rather spin up and debug their own self-made software rather than pay for something that has been tested, debugged, and proven by thousands of users? Why would anyone do that for anything more than a very simple script? It makes zero sense unless the LLM outputs literally perfect one-shot software reliably.

niemandhier 20 days ago [-]

Perplexity just launched a tool that builds and hosts small bespoke tools.

I tried it works wells. I can do the same thing in my Linux machine, but even my 12 year old now can get perplexity to build him a tool to compare ram prices at different chinease vendors.

qsera 20 days ago [-]

Yes, LLMs can be a better search tool.

djaro 20 days ago [-]

In which case, if LLMs can perfectly one shot simple programs, creating and maintaining a really advanced program would presumably be very cheap since it could just one shot every feature. So instead of generating new image editing programs for every task, why not pay $10/month for the guy who spent a week guiding an LLM into making ultra photoshop with every feature Ill ever need?

user34283 20 days ago [-]

It makes sense if you want bespoke software to do a specific job in a way best suited to your workflow.

Could you do the same in eg. Photoshop? Maybe, but even if, you would need to learn how.

program_whiz 20 days ago [-]

Photoshop is a good example -- not that I agree with everything in the app, but just to design all the interactions properly in photoshop would take hundreds of hours (not to mention testing and figuring out the edges). If your goal is a 1-to-1 clone why not use Krita or photoshop? With LLM you'll get "mostly there" with many many hours of work, and lots of sharp edges. If all you need is paint bucket, basic brush / pencil, and save/load, ok maybe you can one-shot it in a few hours... or just use paint / aesprite...

zer00eyz 20 days ago [-]

https://xkcd.com/1205/ (is it worth the time matrix)

LLM's change the calculus of the above chart dramatically.

carterparks 20 days ago [-]

I think there's a lot to pick apart here but I think the core premise is full of truth. This gap is real contrary to what you might see influencers saying and I think it comes from a lot of places but the biggest one is writing code is very different than architecting a product.

I've always said, the easiest part of building software is "making something work." The hardest part is building software that can sustain many iterations of development. This requires abstracting things out appropriately which LLMs are only moderately decent at and most vibe coders are horrible at. Great software engineers can architect a system and then prompt an LLM to build out various components of the system and create a sustainable codebase. This takes time an attention in a world of vibe coders that are less and less inclined to give their vibe coded products the attention they deserve.

tqwhite 20 days ago [-]

An advantage I have enjoyed is that I am insanely careful about my fundamental architecture and I have a project scaffold that works correctly.

It has examples of all the parts of a web app written, over many years, to be my own ideal structure. When the LLM era arrived, I added a ton of comments explaining what, why and how.

It turns out to serves as a sort of seed crystal for decent code. Though, if I do not remind it to mimic that architecture, it sometimes doesn't and that's very weird.

Still, that's a tip I suggest. Give it examples of good code that are commented to explain why its good.

hebrides 20 days ago [-]

I’ve had a similar experience. I’ve been vibecoding a personal kanban app for myself. Claude practically one-shotted 90% of the core functionality (create boards, lanes, cards, etc.) in a single session. But after that I’ve now spent close to 30 hours planning and iterating on the remaining features and UI/UX tweaks to make the app actually work for me, and still, it doesn’t feel "ready" yet. That’s not to say it hasn’t sped up the process considerably; it would’ve taken me hours to achieve what Claude did in the first 10 minutes.

lelanthran 20 days ago [-]

I've got a few projects I've generated, along with a wholly handwritten project started in Dec.

The difference I've noticed is that the act of actually typing out code made me backtrack a few times refining the possible solutions before even starting the integration tests, sometimes before even doing a compile.

When generating, the LLM never backtracked, even in the face of broken tests. It would proceed to continue band-aiding until everything passed. It would add special exceptions to general code instead of determining that the general rule should be refined or changed.

The reason that some devs are reporting 10x productivity is because a bunch of duct-taped, band-aided, instant-legacy code is acceptable. Others who dont see that level of productivity increase are spending time fixing the code to be something they can read.

Not sure yet if accepting the spaghetti is the right course. If future LLMs can understand this spaghetti then theres no point in good code. If we still need human coders, then the productivity increase is very small.

qsera 20 days ago [-]

> It would add special exceptions to general code instead of determining that the general rule should be refined or changed.

That is pretty bad..

pmdr 20 days ago [-]

What little vibe coding I've done has been consistent with that experience.

mattmanser 20 days ago [-]

Was a kanban board ever that hard?

Trello was written by interns as a summer project, when SPAs were just becoming a thing and React didn't even exist.

With 30 hours I bet I could get a pretty good one up without vibe coding it.

In a single afternoon I could get boards, cards, lanes, etc done. React, MaterialUI uaing Grid + Card and you're almost done.

20 days ago [-]

ncruces 20 days ago [-]

I built my latest side project (a Wasm to Go "transpiler") precisely as a way to push the limits of what I could do with an LLM/agent.

It sped me up (and genuinely helped with some ideas) but not 10x.

The bits I didn't design myself I definitely needed to inspect and improve before the ever eager busy beaver drove them to the ground.

That said, I'm definitely impressed by how a frontier model can "reason" about Go code that's building an AST to generate other Go code, and clearly separate what's available at generation time vs. at runtime. There's some sophistication there, and I found myself telling them often "this is the kind of code I want to generate, build the AST."

I also appreciated how faster models are good enough at slightly fuzzy find and replace. Like I need to do this refactor, I did two samples of it here, can you do these other 400? I have these test cases in language X, converted 2, can you do the other 100? Even these simple things saved me a lot of time.

In return I got something that can translate SQLite compiled to Wasm into 500k lines of Go in about a month of my spare time.

https://github.com/ncruces/wasm2go

risyachka 20 days ago [-]

>> people who say they "vibecoded an app in 30 minutes" are either building simple copies of existing projects,

those are not copies, they aren't even features. usually part of a tiny feature that barely works only in demo.

with all vibe coding in the world today you still need at least 6 months full time to build a nice note taking app.

If we are talking something more difficult - it will be years - or you will need a team and it will still take a long time.

Everything less will result in an unusable product that works only for demo and has 80% churn.

ianm218 20 days ago [-]

Can you expand on this? You definitely don’t need 6 months for a note taking app to be useable it is more you need to compete with the state of the art right

utopiah 20 days ago [-]

I'd argue you need between 6 minutes and 6 years.

It depends entirely on what you want. You can literally code a JavaScript 1-liner that will make a <textarea> then put the content back in the URL and it will work serverless on pretty much any platform with a Web browser.

You can also write a note taking app that will be federated yet private, that will have its own scripting language, etc. I mean you can yak-shave your way to write your own OS or even designing your own CPU for that.

So... I'm not sure that metric, time, means much without a proper context, including who does it. It's quite different if to do that, regardless of the tooling used, if you are a professional developer, designer, fullstack dev, prototypist, PM, marketer, writer, etc.

risyachka 20 days ago [-]

> Can you expand on this?

sure. does your note taking app supports formatting? you don't need it today. you will need it at some point. images? same.

does it handle file corruption etc? no? then its pretty much useless.

does it work across devices? in modern world, again, it is pretty much useless without it

it works across devices? then it needs hosting. if it is hosted it needs auth, it needs backups

you can go on for ever.

the bar for very minimal note taking app that you actually will use is very high, with other software it is even higher.

and this is not even state of art, this is must haves

ianm218 20 days ago [-]

Obsidian is super popular and is generally local first and device specific.

And even so if your starting a note taking app most of those problems like file corruption and image support are largely solved problems. There is also the benefit of being able to reference tons of open source implementations.

I think one month to notion like app that is prod ready if you just need Auth + markdown + images + standard text editing

hmmmmmmmmmmmmmm 20 days ago [-]

>with all vibe coding in the world today you still need at least 6 months full time to build a nice note taking app.

Bad example, note apps loaded with features are anti-productive and are for people who treat note taking as a hobby itself.

You have Obsidian anyway if you want something open source to work with.

Ekaros 20 days ago [-]

Ah, note taking as hobby finally explains to me why these apps seem so popular. I don't think I have ever considered that I need one. And it to be something that shouldn't be fully solved multiple times now. But it really being hobby does kinda make the point for me.

glst0rm 20 days ago [-]

I don't disagree, but I found it ironic I built ZenPlan, my ideal hybrid task/notetaking app, in about 50 hours with Claude Code this month after being frustrated with notebook and task management sprawl in OneNote. www.getzenplan.com

benwaffle 20 days ago [-]

obsidian isn't open source

weird-eye-issue 20 days ago [-]

What universe do you live in

margalabargala 20 days ago [-]

You seem to be making the assumption that "app" means "sellable product", rather than "one off that works for me". It doesn't.

When everyone is able to make their own one off prototype in 30 minutes, no one will pay for the thing that took someone 6 months.

risyachka 20 days ago [-]

whatever you prototype - the one who built it in 6 month will have economy of scale to make it cheaper than your diy solution, and because they serve many customers and developed it for 6 months - their product will be 100x better than the one you diy

there is very very rare use case when diy makes sense. in 99% of cases its just a toy that feels nice as you kinda did it. but if you factor in the time etc it is always costs 100x more than $5/month you could usually buy

margalabargala 20 days ago [-]

Based on your earlier comment and your last paragraph, your impression of AI vibe coding ability is at least a year out of date.

In late 2024 it might have taken 6 months. Today, two weeks, maybe 3.

weird-eye-issue 20 days ago [-]

With that logic LEGO sets and Salesforce subscriptions should be virtually free due to their economy of scale

nemo44x 20 days ago [-]

The 80/20 rule doesn’t go away. I am an AI true believer and I appreciate how fast we can get from nothing to 80% but the last “20%” still takes 80%+ of the time.

The old rules still apply mainly.

tossandthrow 20 days ago [-]

Yes, so 80% of 100 hours is considerably less than 80% of 600 hours

nemo44x 19 days ago [-]

You get 80% done in 20% of the time. The LLM shrinks that 20%. So a 100 task maybe takes 5 hours instead of 20 which is great. But the remaining 80 hours are not as improved. So a 100hr job takes ~85 hours which is very good.

This is in-line with Googles study showing about a 10% productivity increase and other research I’ve read. I suspect this will increase with more integrations and workflow adaptations.

But even after power tools changed how quickly carpenters can frame and rough-in a house, the finishing work (which uses power tools too) still takes the majority of the time.

iamcalledrob 20 days ago [-]

In my experience, the last 20% tends to be the stuff that's less obvious, too, by it's very nature.

The details and pitfalls that are unique to your specific scenario, that you only discover by running into them.

And yet this less obvious, more uncommon stuff is also what AI will be weakest at.

skyberrys 20 days ago [-]

If you ask for something complicated this headline is more than true. But why complicate things, keep it simple and keep it fast.

Also this article uses 'pfp' like it's a word, I can't figure out what it means.

I'm able to vibe code simple apps in 30 minutes, polish it in four hours and now I've been enjoying it for 2 months.

robocat 20 days ago [-]

AI is usually better than traditional search for working out acronyms and jargon.

My prompt "What does PFP mean on this page: https://kanfa.macbudkowski.com/vibecoding-cryptosaurus" gave a good answer and it described extra relevant context within crypto.

I had less luck with "What does sharp tails mean in «HFT. You want low deterministic latency with sharp tails»". But I suspect the source sentence is the problem.

etothet 20 days ago [-]

I noticed this as well. I had to look it up. Apparently ‘pfp’ means ‘profile picture’.

xp84 20 days ago [-]

Yeah I’ve always found that a cringe initialism given that it’s not Pro File Picture. I would just say avatar.

sph 19 days ago [-]

It’s a Gen-Z initialism, who are too young to be familiar with ‘avatar’, unless you’re talking about the movie.

stavros 20 days ago [-]

Apparently it means profile photo.

tim-projects 20 days ago [-]

I started working on one of my apps around a year ago. There was no ai CLI back then. My first prototype was done in Gemini chat. It took a week copy and pasting text between windows. But I was obsessed.

The result worked but that's just a hacked together prototype. I showed it to a few people back then and they said I should turn it into a real app.

To turn it into a full multi user scaleable product... I'm still at it a year later. Turns out it's really hard!

I look at the comments about weekend apps. And I have some of those too, but to create a real actual valuable bug free MVP. It takes work no matter what you do.

Sure, I can build apps way faster now. I spent months learning how to use ai. I did a refactor back in may that was a disaster. The models back then were markedly worse and it rewrote my app effectively destroying it. I sat at my desk for 12 hours a day for 2 weeks trying to unpick that mess.

Since December things have definitely gotten better. I can run an agent up to 8 hours unattended, testing every little thing and produce working code quite often.

But there is still a long way to go to produce quality.

Most of the reason it's taking this long is that the agent can't solve the design and infra problems on its own. I end up going down one path, realising there is another way and backtracking. If I accepted everything the ai wanted, then finishing would be impossible.

tqwhite 20 days ago [-]

Back then, also around May, I had Claude 3.old destroy a working app. Those were sad old days.

Hasn't happened in a long time. Opus 4.6 is a miracle improvement.

makingstuffs 20 days ago [-]

I’m sure someone else has probably coined the term before me (or it’s just me being dumb, often the case) but I’ve started calling this phase of SWE ‘Ricky Bobby Development’.

So many people are just shouting ‘I wanna go fast’ and completely forgetting the lessons learned over the past few decades. Something is going to crash and burn, eventually.

I say this as a daily LLM user, albeit a user with a very skeptical view of anything the LLM puts in front of me.

nunez 20 days ago [-]

I love this!

shepherdjerred 20 days ago [-]

My experience is that Claude Code, when used appropriately, can produce work better than most programmers.

"when used appropriately" means:

- Setting up guardrails: use a statically typed language, linters, CLAUDE.md/skills for best practices.

- Told to do research when making technical decisions, e.g. "look online for prior art" or "do research and compare libraries for X"

- Told to prioritize quality and maintainability over speed. Saying we have no deadline, no budget, etc.

- Given extensive documentation for any libraries/APIs it is using. Usually I will do this as a pre-processing step, e.g. "look at 50 pages of docs for Y and distill it into a skill"

- Given feedback loops to check its work

- Has external systems constraining it from making shortcuts, e.g. "ratchet" checks to make sure it can't add lint suppressions, `unsafe` blocks, etc.

And, the most important things:

- An operator who knows how to write good code. You aren't going to get a good UI/app unless you can tell it what that means. E.g. telling it to prioritize native HTML/CSS over JS, avoiding complexity like Redux, adding animations but focus on usability, make sure the UI is accessible, etc.

- An operator who is steering it to produce a good plan. Not only to make sure that you are building the right thing, but also you are explaining how to test it, other properties it should have (monitoring/observability, latency, availability, etc.)

A lot of this comes down to "put the right things in the context/plan". If you aren't doing that, then of course you're going to get bad output from an LLM. Just like you would get bad output from a dev if you said "build me X" without further elaboration.

pindab0ter 20 days ago [-]

Where do you keep the skills it generates from docs? Does this not become a mess?

shepherdjerred 20 days ago [-]

It can go in your repo or your home dir.

Why would it become a mess? If I made a skill for, say, BuildKite, Claude will only load it when applicable.

They will eventually get out of date, so that is something you’ll have to accept or perhaps make some meta skill-updater program to run on a schedule

dehrmann 20 days ago [-]

> Late in the night most problems were fixed and I wrote a script that found everyone whose payment got stuck. I sent them money back (+ extra $1 as a ‘thank you for your patience’ note), and let them know via DMs.

(emphasis added)

Not sure if it was actually written by hand or AI was glossed over, but as soon as giving away money was on the table, the author seems to have ditched AI.

capplexham 20 days ago [-]

This isn’t surprising when you consider how LLMs are actually trained to write code.

Expecting a one-shot 1.0 release is unrealistic because the sheer volume of context and decision-making required for a finished product is enormous.

Instead, I think of LLMs as being trained on the "delta" of software development: the pull requests, review comments, and issue discussions that move a project from one version to the next.

When you use an LLM for coding, you are effectively tapping into the collective output of a team of developers and a crowd of users. My mental model has shifted accordingly: I no longer try to be the "coder." Instead, I act as the PR reviewer and the passionate power user. My job is to point out edge cases and refine the output, rather than expecting a finished product in one go.

It’s making me a better maintainer and a more precise communicator, even if the "100-hour gap" to production remains a reality.

westurner 20 days ago [-]

I keep seeing things that were vibe coded and thinking, "That's really impressive for something that you only spent that much time on".

To have a polished software project, you must spend time somewhat menially iterating and refining (as each type of user).

To have a polished software project, you need to have started with tests and test coverage from the start for the UI, too.

Writing tests later is not as good.

I have taken a number of projects from a sloppy vibe coded prototype to 100% test coverage. Modern coding llm agents are good at writing just enough tests for 100% coverage.

But 100% test coverage doesn't mean that it's quality software, that it's fuzzed, or that it's formally verified.

Quality software requires extensive manual testing, iteration, and revision.

I haven't even reviewed this specific project; it's possible that the author developed a quality (CLI?) UI without e2e tests in so much time?

Was the process for this more like "vibe coding" or "pair programming with an LLM"?

westurner 20 days ago [-]

Is 100 hours enough?

A 40-hour work year has 2,080 hours per person per year.

The "10,000" hours necessary to be really good at anything number was the expert threshold that they used to categorize test subjects who performed neuroimaging studies while compassion meditating. "10,000" hours to be an expert is about 5 years at full time.

But how many hours to have a good software product?

Usually I check for tests and test coverage first. You could have spent 1,000 hours on a software project and if it doesn't have automated tests, we can't evolve the software and be sure that we haven't caused regressions.

westurner 20 days ago [-]

> That's really impressive for something that you only spent that much time on"

Again, I haven't even read this particular project;

There's:

Prompt insufficiency: Was the specification used to prompt the model to develop the software sufficient in relation to what are regarded as a complete enough software specifications?

Model and/or Agent insufficiency,

Software Development methods and/or Project Management insufficiency,

QA insufficiency,

Peer review sufficiency;

Is it already time to rewrite the product using the current project as a more sufficient specification?

But then how many hours of UI and business logic review would be necessary again?

aenis 20 days ago [-]

The interesting part about vibe coding is the spectrum of experiences and attitudes. I have been playing with it for 2-3hrs a day for the last 4 months now. None of my friends who are using it are using it in the same way. Some people vibe and then refactor, some spec-everything and micro-prompt the solutions. Nobody is feeling like this thing can go unsupervised.

And then there is one guy, a friend of mine, who is planning to release a "submit a bug report, we will fix it immediately" feature (so, collect error report from a user, possibly interview them, then assess if its a bug or not with a "product owner LLM", and then autonomously do it, and if it passes the tests - merge and push to prod - all under one hour. Thats for a mid cap company, for their client-facing product. F*** hell! I have a full bag of bug reports ready for when this hits prod :->

devld 20 days ago [-]

My non-technical client has totally vibe coded a SaaS prototype with lots of features, way bigger product than OP and it sort of works. They spent like 200 hours on it. I wonder what would have been the time needed to clean it up and approve it is secure. I declined to work on it, as I was not sure if it's even possible or if it would be better to rewrite the entire thing from scratch with better prompts. I was not that sure about it given the cost and the fact that they had a product that sort of worked and I let them go to find someone to clean it up. My reasoning is that if the client took 200h to develop this without stopping to check the code, it would take me 2 - 3 x to rewrite it with AI, but the right way, while the cleanup may be so painful it would be way better value for money to rewrite it from scratch.

dehrmann 20 days ago [-]

> I declined to work on it, as I was not sure if it's even possible or if it would be better to rewrite the entire thing from scratch with better prompts.

This is a bit of an unknown right now. If you get a working prototype, but need to productionize it, make sure it can scale, and get it looked over with a security mineset, how long it might take isn't clear, so finding someone who will do that is hard.

devld 19 days ago [-]

It sounds like a really crappy deal to a contractor. In the eyes of the client, you are basically a janitor and Claude is like a superstar that built the whole thing, while the truth is reviewing such code is harder than reviewing and quality approving one's own work. I hope it proves out to be uneconomical and we will be just rewriting the vibe code from scratch.

eongchen 20 days ago [-]

The 100 hours aren't a vibecoding tax. They're an engineering knowledge tax.

I built 4 AI products to hundreds of thousands of users, working with AI agents as collaborators, not autopilots. The difference isn't the tool. It's whether you can tell the AI is wrong and stop it before it wastes 10 hours going down the wrong path.

The author watched Claude create new S3 buckets for several rounds before catching it. An experienced engineer catches that on the first diff. Most of those 100 hours were spent not knowing you're lost.

"Vibecoding" as a concept is the problem. It implies you can vibe your way through engineering. You can't. AI is a force multiplier, not a replacement for knowing what good looks like.

jonstewart 20 days ago [-]

Woodworking is an analogy that I like to use in deciding how to apply coding agents. The finished product needs to be built by me, but now I can make more, and more sophisticated, jigs with the coding agents, and that in turn lets me improve both quality and quantity.

rhoopr 20 days ago [-]

This seems more like he is bad at describing what he wants and is prompting for “a UI” and then iterating “no, not like that” for 99 hours.

firesteelrain 20 days ago [-]

Author admittedly didn’t know how to scale his app for thousands or hundreds of thousands of users. He jokes about it working great on localhost or “my machine”.

Not knocking the premise of the post. It probably works well for one single user if it’s an iPhone or Android app. But his 100 power hours are probably just right for what he ended up launching as he iterated through the requirements and learned how to set this up through reinforced learning and user feedback.

PunchTornado 20 days ago [-]

Yeah but if you have to describe in very much details in english, you're better of just writing it with autocomplete.

I find that vibe coding is useful when it can be build with little details and it makes the right assumptions.

dielll 20 days ago [-]

I have had the experience with creating https://swiftbook.dev/learn

Used Codex for the whole project. At first I used claude for the architect of the backend since thats where I usually work and got experience in. The code runner and API endpoints were easy to create for the first prototype. But then it got to the UI and here's where sh1t got real. The first UI was in react though I had specifically told it to use Vue. The code editor and output window were a mess in terms of height, there was too much space between the editor and the output window and no matter how much time I spent prompting it and explaining to it, it just never got it right. Got tired and opened figma, used it to refine it to what I wanted. Shared the code it generated to github, cloned the code locally then told codex to copy the design and finally it got it right.

Then came the hosting where I wanted the code runner endpoint to be in a docker container for security purpose since someone could execute malicious code that took over the server if I just hosted it without some protection and here it kept selecting out of date docker images. Had to manually guide it again on what I needed. Finally deployed and got it working especially with a domain name. Shared it with a few friends and they suggested some UI fixes which took some time.

For the runner security hardening I used Deepseek and claude to generate a list of code that I could run to show potential issues and despite codex showing all was fine, was able to uncover a number of issues then here is where it got weird, it started arguing with me despite showing all the issues present. So I compiled all the issues in one document, shared the dockerfile and linux secomp config tile with claude and the also issues document. It gave me a list of fixes for the docker file to help with security hardening which I shared back with codex and that's when it fixed them.

Currently most of the issues were resolved but the whole process took me a whole week and I am still not yet done, was working most evenings. So I agree that you cannot create a usable product used by lots of users in 30 minutes not unless it's some static website. It's too much work of constant testing and iteration.

tqwhite 20 days ago [-]

I have had things like your React instead of Vue problem. I solved it by always having Claude write a full implementation spec/plan in markdown which I give to a fresh context Claude to implement. Typically, I have comments and make it revise until I am happy.

It has basically eliminated surprises like that.

tom_ 20 days ago [-]

You can say "shit" here if you like.

i_love_retros 20 days ago [-]

> With AI, it’s easier to get the first 90 percent out there. This means we can spend more time on the remaining 10 percent, which means more time for craftsmanship and figuring out how to make your users happy.

EXCEPT... you've just vibe coded the first 90 percent of the product, so completing the remaining 10 percent will take WAY longer than normal because the developers have to work with spaghetti mess.

And right there this guy has shown exactly how little people who are not software developers with experience understand about building software.

fzeroracer 20 days ago [-]

I can't say I'm impressed by this at all. 100+ hours to build a shitty NFT app that takes one picture and a predefined prompt, then mints you a dinosaur NFT. This is the kind of thing I would've seen college students slam out over a weekend for a coding jam with no experience and a few cans of red bull with more quality and effort. Has our standards really gotten so low? I don't see any craftsmanship at play here.

capitalsigma 20 days ago [-]

Also the process sounds like a nightmare: "it broke and I asked 4 different LLMs to fix it; my `AGENTS.md` file contained hundreds of special cases; etc." I thought this article was intended to be a horror story, not an advertisement

nunez 20 days ago [-]

If nothing else, at least the age of AI finally got devs to write good documentation!

plastic041 20 days ago [-]

> Since my main goal was to learn, I decided to do it "the right way". This means I didn’t want to rely on Replit or Lovable where the infra part is obfuscated. I wanted to deal with that complexity myself.

I expected OP to actually 'learn' devops, but what they did was just asking LLMs to do everything.

Also...

> 180+ paid $2 for a dino

People pays $2 for an image of dinosaur with human face?

mbrumlow 20 days ago [-]

> Now I'm pretty sure that people who say they "vibecoded an app in 30 minutes" are either building simple copies of existing projects, produce some buggy crap, or just farm engagement.

Some people seem to be better at it than others. I see a huge gulf in what people can do. Oddly there is a correlation between was a good engineer pre AI and can vibe code well.

But I see one odd thing. A subset of those who people would consider good or even amazing pre AI struggle. The best I can tell at this stage is because they lacked get int good results with unskilled workers in the past and just relied on their own skills to carry the project.

AI coders can do some amazing things. But at this stage you have to be careful about how you guide it down a path in the same way you did with junior engineers. I am not making a comparison to AI being junior, they by far can code better than most senior engineers, and have access to knowledge at lighting speed.

upflag 19 days ago [-]

The 100 hours are real and they're heavily back-loaded toward things that aren't features. In my experience the biggest gap isn't code quality, it's operational awareness. You can refactor the code, add types, write tests. But the part that kills you is when something breaks silently in production and you find out from a user three days later. I had a single-character typo in a deploy (traditional code, not AI) that cost $30K because we discovered it from revenue numbers, not an alert. The fix isn't reviewing every line, it's making sure you know when key flows stop working.

esafak 20 days ago [-]

Look at the screenshots to understand what the author means by 'product'.

stavros 20 days ago [-]

We don't need to shit on someone who shared their experiences and thoughts.

Lerc 20 days ago [-]

I agree with you point, but I do look sidelong at the number of points the post has. It is, at the very least, unexpected.

spiderfarmer 20 days ago [-]

This would have been generic slop if it wasn't for AI.

naasking 20 days ago [-]

Of course vibe coding is going to be a headache if you have very particular aesthetic constraints around both the code and UX, and you aren't capable of clearly and explicitly explaining those constraints (which is often hard to do for aesthetics).

There are some good points here to improve harnesses around development and deployment though, like a deployment agent should ask if there is an existing S3 bucket instead of assuming it has to set everything up. Deployment these days is unnecessarily complicated in general, IMO.

czhu12 20 days ago [-]

I'd also say for a lot of applications -- most applications perhaps -- outside of "consumer" ones, the number of features is quite a bit more important than the shape of a button or the animations during a page transition.

Even pretty massive companies like databricks don't think about those things and basically have a UI template library that they then compose all their interfaces from. Nothing fancy. Its all about features, and LLM create copious amounts of features.

stillpointlab 20 days ago [-]

I came across the following yesterday: "The Great Way is not difficult for those who have no preferences," a famous Zen teaching from the Hsin Hsin Ming by Sengstan

As we move from tailors to big box stores I think we have to get used to getting what we get, rather than feeling we can nitpick every single detail.

I'd also be more interested in how his 3rd, 4th or 5th vibe coded app goes.

fixxation92 20 days ago [-]

What I really want to know is... as a software developer for 25+ years, when using these AI tools- it is still called "vibecoding"? Or is "vibecoding" reserved for people with no/little software development background that are building apps. Genuine question.

Calazon 14 days ago [-]

"Vibecoding" is about how you use AI tools, rather than who you are.

If you're asking the AI to generate large amounts of code that you don't really look at, you're definitely vibecoding. If you are mainly writing the code yourself with some assistance from AI tools, you are not vibecoding.

Naturally it's fuzzy in the middle. And for people without software development skills, vibecoding is the only option they have.

DennisP 20 days ago [-]

Steve Yegge has been a dev for several decades with lead spots at Amazon and Google, has completely converted to using AI, wrote a book about it using it effectively for large production-ready projects, and still calls it vibe coding.

fixxation92 20 days ago [-]

I don't think I'll ever adopt this term, I'm not a fan of it at all. I find myself saying "I was working with AI" and just leave it at that. It is a collaboration afterall.

newsoftheday 20 days ago [-]

As a software developer over 30 years, AI is not a tool, it is not deterministic, it is an aide.

tqwhite 20 days ago [-]

Don't have it do things for you. Have it do things with you.

IAmGraydon 20 days ago [-]

If you hear someone spouting off about how vibe coding allows for creation of killer apps in a fraction of the time/cost, just ask them if you can see what successful killer apps they’ve created with it. It’s always crickets at that point because it’s somewhere between wishful thinking and an outright lie.

jimnotgym 20 days ago [-]

I have not been coding for a few years now. I was wondering if vibe coding could unstick some of my ideas. Here is my question, can I use TDD to write tests to specify what I want and then get the llm to write code to pass those tests?

kantselovich 20 days ago [-]

TDD helps a lot, but it’s no guarantee - LLM is smart enough to “fake” the code to pass tests .

I’m working on project - a password manager, where I have full end to end test harnesses - cli client makes changes, sync them to the server and then observe the data in iOS app running in the emulator. More than once I noticed codex just hard coded expected values from the test harnesses directly into UI layout in iOS app to make the test pass…

Similar issues in the crypto layer - tests were written first , then code was written . During the review I noticed that the code was made to just pass the test - the logic was to check if signature values exists instead of checking if crypto signature is valid.

LLM can help with code reviews as well, but it has to be guided specifically what to look for for. This is with codex 5.4 model

_heimdall 20 days ago [-]

That's a great approach, though I'd also recommend setting up a strong basis for linting, type checking, compilation, etc depending on the language. An LLM given a full test suite and guard rails of basic code style rules will likely do a pretty good job.

I would find it a bit tricky to write a full test suite for a product without any code though. You'd need to understand the architecture a bit and likely end up assuming, or mocking, what helpers, classes, config, etc will be built.

potro 20 days ago [-]

You absolutely can. This is one of recommended directions with agentic coding. But you can go farther and ask llm to write tests too. The review/approve them.

mlaretallack 20 days ago [-]

Yes, I mostly do spec driven developement. And at the design stage, I always add in tests. I repeat this pattern for any new features or bug fixes, get the agent to write a test (unit, intergration or playwright based), reproduce the issue and then implement the change and retest etc... and retest using all the other tests.

linsomniac 20 days ago [-]

To expand on the "Yes": the AI tools work extremely well when they can test for success. Once you have the tests as you'd like them, you may want to tell the LLM not to modify the tests because you can run into situations where it'll "fix" the tests rather than fixing the code.

__mp 20 days ago [-]

yes. depending on the techstack your experience might be better or worse. HTML/CSS/React/Go worked great, but it struggled with Swift (which I had no experience in).

faeyanpiraat 20 days ago [-]

Yes

20 days ago [-]

itomato 20 days ago [-]

Nobody is saying they're ready for production in 30 minutes, just that there is something real where an idea used to be.

Something much closer to production SDLC patterns than a Figma mockup.

quickrefio 20 days ago [-]

The speed of prototyping right now is wild.

The interesting shift seems to be that building the first version is no longer the bottleneck — distribution, UX polish and reliability are.

smoody07 19 days ago [-]

His mistake was trying to get "Simple, Lovable, Complete."

Great vibe coded projects optimize for simple, legible, and complete.

Everything else regresses to a toy.

m3kw9 20 days ago [-]

100 hours try 500 hours at least if you want a competitive product, unless you are a wizard at marketing where you out market the 80/20 guys.

hashmap 20 days ago [-]

if something like a popup appears that i didnt ask the page to do i snap close the page and never look at it again

holoduke 20 days ago [-]

Instead of 10x devs you now have the super rare 100x devs. They are using AI how it should be used.

20 days ago [-]

anonymous344 20 days ago [-]

this is why i use ai just for one file at the time, as extension of my own programming. not so fast, but keeps control

quater321 20 days ago [-]

It already starts with BS. Yes there are apps you can build in 30 minutes and they are great, not buggy or crap as he says it. And there are apps you need 1 hour or even weeks. It depends on what you want to build. To start off by saying that every app build in 30 minutes is crap, simply shows that he did not want to think about it, is ignorant or he simply wanted to push himselve higher up by putting others down. At this point, every programmer who claims that vibecoding doesn't make you at least 10 times more productive is simply lying or worst, doesn't know how to vibe code.

Uptrenda 20 days ago [-]

I mean the worst part about this is the author also vibe coded their security. It could have been much more catastrophic if they built a crypto wallet or trading system. But because it was NFTs I guess the max damage was limited.

I have to say its a little sad that so many devs think of security and cryptography in the same way as library frameworks. In that they see it as just some black box API to use for their projects rather than respecting that its a fully developed, complex field that demands expertise to avoid mistakes.

spacecadet 20 days ago [-]

Im an 20 year veteran of application development consulting. Contributor level... not talking head. I do more estimating than anyone you likely know. Consulting is cooked. I just AI native built (not vibe coding...) an application with a buddy, another Principal level engineer and what would cost a client 500-750k and 8-12 weeks, we did for $200 and 1 sprint. Its a passion project but highly complex mapping and navigation app with host/client multi-user sync'd state. Cooked.

spacecadet 20 days ago [-]

I realize this sounds one sided. Ive also founded companies and worked across the range of startup to faang. Everything has changed... For the better if you ask me.

qsera 20 days ago [-]

You are invested in some kind of AI start up, right?

spacecadet 19 days ago [-]

No. I no longer believe in VC or PE. Evils that hurt society more then benefit...

qsera 20 days ago [-]

>highly complex mapping

Curious. Can you elaborate on this a bit?

spacecadet 20 days ago [-]

Do you have a race car or race team? Happy to onboard you, otherwise, not here.

qsera 20 days ago [-]

No, but I think I got my answer.

spacecadet 19 days ago [-]

unlikely, people tend to assume wrong about me... as an outlier and all...

mahirsaid 20 days ago [-]

i found that to be effective is to use multiple AI tools at once. I'm using Gemini newest model i cant think of at the top of my head right now, and Claude newest model. i have each for its purpose with rustover IDE to speed things up. Rustover is particularly helpful because of how rust is worked with, the constant cargo cli commands and database interactions right in the IDE. i know visual code has this to a certain limit but IMO i prefer Rustover. Using multiple models is because i know what each one is good at and how my knowledge works with their output, makes my life way easier and drives frustration down, which is needed when you need creativity at the forefront. This is being said it def helps to know what you are doing if not 100% at least 60% of the things you are asking the models to do for you, I have caught mistakes and know when a model might make mistake which im fine with, sometimes i just want to see how something is done like the structure for a certain function of crate as im reading cargo.io doc constantly to learn what im doing.

There are plenty of ways to code and use code, which-ever works for you is good just improve on it and make it more effective. I have multiple screens on my computer, i don't like jumping back and fourth opening tabs and browsers so i have my set up the best way that works for me. As for the AI models, they are not going to be that helpful to you if you don't understand why its doing what its doing in a particular function or crate (in case of rust) or library. I imagine the the over the top coder that has years of experience and multiple knowledge in various languages and depth knowledge in libraries, using the same technique he can replace a whole Department by himself technically.

bethekidyouwant 20 days ago [-]

Why did this crypto grifter AI app get traction on this site?

StilesCrisis 20 days ago [-]

It seems like the entire "product" here is just a ChatGPT system prompt: "combine this image of a person with this image of a dinosaur".

The only thing he needed to code was an NFT wrapper, which presumably is just forking an existing NFT wholesale.

The interesting, user-facing part of the project isn't code at all! It's just an HTML front end on someone else's image generator and a "pay me" button.

Very disappointing.

mentalgear 20 days ago [-]

> The "remaining 10 percent" is a difference between slop and something people enjoy.

I would say the remaining 10% are about how robust your solution is - anything associated with 'vibe' feels inherently unsecure. If you can objectively proof it is not, that's 10 % time well spend.

zahlman 20 days ago [-]

> anything associated with 'vibe' feels inherently unsecure.

Only "feels"?

iainctduncan 20 days ago [-]

I can't take anyone seriously who says an AI edge will be a "superpower".

Which part of "commodity" is confusing???

geldedus 19 days ago [-]

Yet another AI-basher who doesn't understand what "vibe coding" is, and why it is different from AI-assisted engineering (what he actually did)

mrothroc 20 days ago [-]

Everyone keeps saying 80/20 but that undersells what's going on. The last 20% isn't just hard. It's hard because of what happened during the first 80%.

When an agent takes a shortcut early on, the next step doesn't know it was a shortcut. It just builds on whatever it was handed. And then the step after that does the same thing. So by hour 80 you're sitting there trying to fix what looks like a UI bug and you realize the actual problem is three layers back. You're not doing the "hard 20%." You're paying interest on shortcuts you didn't even know were taken. (As I type this I'm having flashbacks to helping my kid build lego sets.)

The author figured this out by accident. He stopped prompting and opened Figma to design what he actually wanted. That's the move. He broke the chain before the next stage could build on it. The 100 hours is what it costs when you don't do that.

bmurphy1976 20 days ago [-]

This is how all software projects play out. The difference is when it's people we call it tech debt or bad desing and then start a project to refactor.

Apparently LLMs break some devs brains though. Because it's not one shot perfect they throw their hands in the air claim AI can't ever do it and move on, forgetting all those skills they (hopefully) spent years building to manage complex software. Of course a newbie vibe coder won't know this but an experienced developer should.

sevenseacat 19 days ago [-]

Except when you've worked on building the software yourself instead of getting the LLM to do it, you have a loooooooot of built-up context that you can use to know why decisions were made, to debug faster, and to get things done more efficiently.

I can look at code I wrote years ago and have absolutely no memory of writing it, but I know its my code and I know where some of the warts and traps are. I can answer questions about why things work a certain way.

With an LLM, you don't get that. You're basically starting from scratch when it comes to solving any problem or answering any question.

bmurphy1976 17 days ago [-]

Agreed but that's also a documentation problem. Every time I have to make a tricky decision that I know will trip me up later or some kind of compromise, I now ask the AI to document it and the reasons why. That information has definitely come in handy later and serves as a form of long-term memory.

Best part, it's good for people too.

swixisagrifter 19 days ago [-]

| You're not doing the "hard 20%." You're paying interest on...

Ok clanker

phillipclapham 20 days ago [-]

[flagged]

AstroBen 20 days ago [-]

I don't know how other people work, but writing the code for me has been essential in even understanding the problem space. The architecture and design work in a lot of cases is harder without going through that process.

apitman 20 days ago [-]

See "Programming as Theory Building": https://pages.cs.wisc.edu/~remzi/Naur.pdf

suzzer99 20 days ago [-]

I recently had to build a widget that lets the user pick from a list of canned reports and then preview them in an overlay before sending to the printer (or save to PDF). All I knew was that I wanted each individual report's logic and display to be in its own file, so if the system needed to grow to 100 reports, it wouldn't get any more complicated than with 6 reports.

The final solution ended up being something like: 1. Page includes new React report widget. 2. Widget imports generic overlay component and all canned reports, and lets user pick a report. 3. User picks report, widget sets that specific report component as a child of the overlay component, launches overlay. 4. Report component makes call to database with filters and business logic, passes generic set of inputs (report title, other specifics, report data) to a shared report display template.

My original plan was for the report display template to also be unique to each report file. But when the dust settled, they were so similar that it made sense to use a shared component. If a future report diverges significantly, we can just skip the shared component and create a one-off in the file.

I could have designed all this ahead of time, as I would need to do with an LLM. But it was 10x easier to just start coding it while keeping my ultimate scalability goals in mind.

seanmcdirmid 20 days ago [-]

This. It’s also much easier to tell someone what you don’t like if what you don’t like is right in front of you than to tell them what you want without a point of reference.

kantselovich 20 days ago [-]

100%! Lots of issues are only discovered when enough code has been written. More than that , other issues are discovered only when the project is actually deployed as MVP.

dijksterhuis 20 days ago [-]

- version 1 -- we build what we think is needed

- version 2 -- we realise we're solving a completely different problem to what is needed

- version 3 -- we build what is actually needed

phillipclapham 20 days ago [-]

[flagged]

phillipclapham 20 days ago [-]

[flagged]

jopsen 20 days ago [-]

Yeah, communicating what you want can be hard.

I'm doing a simple single line text editor, and designing some frame options. Which has a start end markers.

This was really hard to get the LLM to do right.. until just took a pen and paper, drew what I wanted, took a photo and gave it to the llm

cyk21 20 days ago [-]

This.

Additionally, the author seems to build an app just for the sake of building an app / learning, not to solve any real serious business problem. Another "big" claim on LLM capabilities based on a solo toy project.

phillipclapham 20 days ago [-]

[flagged]

Gud 20 days ago [-]

Absolutely. You need to treat it like a real program from the very beginning.

tqwhite 20 days ago [-]

YES YES YES!! I so wish that we could go back in time and never, ever have even suggested anything other that what you say here. AI doesn't do it for you. It does it with you.

You have to figure out what you want before the AI codes. The thinking BEFORE is the entire game.

Though I will also say that I use Claude for working out designs a lot. Literally hours sometimes with long periods of me thinking it through.

And I still get a ton more done and often use tech that I would never have approached before these glory days.

phillipclapham 20 days ago [-]

[flagged]

coderenegade 20 days ago [-]

This. Code generation is cheap, so you can rapidly explore the space and figure out the architecture that best suits the problem. From there, I start fresh and pseudocode the basic pattern I want and have Claude fill in the gaps.

sstart 10 days ago [-]

[dead]

ddactic 16 days ago [-]

[dead]

JulianPembroke 19 days ago [-]

[dead]

robutsume 20 days ago [-]

[dead]

anesxvito 20 days ago [-]

[dead]

hirehalai 20 days ago [-]

[dead]

agenticbtcio 20 days ago [-]

[dead]

aplomb1026 20 days ago [-]

[dead]

davispeck 20 days ago [-]

[flagged]

dijksterhuis 20 days ago [-]

... writing the code was never the slow part. correctness (according to what people actually needed) has always been the slow part.

davispeck 20 days ago [-]

[flagged]

dijksterhuis 19 days ago [-]

No. It has nothing to do with workflows.

Writing code was always way faster than delivering software that people actually need or want. That's always been the hard part.

sevenseacat 19 days ago [-]

Because you did a lot of the validation along the way. Incrementally testing, thinking about design and architecture before typing.

stainlu 20 days ago [-]

[flagged]

nanobuilds 20 days ago [-]

One can hope that vibecoded apps will eventually be vibe-maintained with agents trained specifically for the kind of novel and weird bugs ai-coding tends to bring up. These tools will hopefully also get better at identifying security risks created but previous generations of ai models. 6 months is a long time in the life of a vibe coded app.

harmf 17 days ago [-]

[flagged]

myrak 20 days ago [-]

[dead]

olivercoleai 20 days ago [-]

[dead]

redgridtactical 20 days ago [-]

[flagged]

shepherdjerred 20 days ago [-]

> probably 30% of the total dev time was wrapping every async call in try/catch with timeouts, handling permission denials gracefully, making sure corrupted AsyncStorage doesn't brick the app

This is the exact kind of task that LLMs excel at

johnfn 20 days ago [-]

This comment is written by an LLM, right?

Edit: It's interesting how I am getting downvoted here when pangram confirms my suspicions that this is 100% AI generated.

Tadpole9181 20 days ago [-]

It's impossible to be sure, but...

- Em dash

- "Not just X, it's Y"

- "It's the difference between."

- "Narrow statement. Broad statement."

I'm more and more convinced that humans were fundamentally not ready for LLMs and are not taking how existential of a threat it poses to basic communication and social normals seriously enough.

dijksterhuis 20 days ago [-]

scanned through history of the original commentator. i lean towards agreeing either using AI for heavy editing or fully generating comments.

---

@redgridtactical -- if you are doing so and are not aware of the new guideline

> Don't post generated comments or AI-edited comments. HN is for conversation between humans.

https://news.ycombinator.com/newsguidelines.html

d1l 20 days ago [-]

100% bro

croisillon 20 days ago [-]

c'm'on, drop that

Louis830903 20 days ago [-]

[flagged]

nottorp 20 days ago [-]

Wow. First realistic post about coding assistants that I've read on HN, I think.

[Disclaimer: that I have read. Doesn't mean there weren't others.]

Too bad it's about NFTs but we can't have everything, can we?

Rendered at 18:11:09 GMT+0000 (Coordinated Universal Time) with Vercel.