I'm going to take a contrarian view and say it's actually a good UI, but it's all about how you approach it.
I just finished a small project where I used o3-mini and o3-mini-high to generate most of the code. I averaged around 200 lines of code an hour, including the business logic and unit tests. Total was around 2200 lines. So, not a big project, but not a throwaway script. The code was perfectly fine for what we needed. This is the third time I've done this, and each time I get faster and better at it.
1. I find a "pair programming" mentality is key. I focus on the high-level code, and let the model focus on the lower level code. I code review all the code, and provide feedback. Blindly accepting the code is a terrible approach.
2. Generating unit tests is critical. After I like the gist of some code, I ask for some smoke tests (a rough sketch of what I mean follows this list). Again, peer review the code and adjust as needed.
3. Be liberal with starting a new chat: the models can get easily confused with longer context windows. If you start to see things go sideways, start over.
4. Give it code examples. Don't prompt with English only.
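To make point 2 concrete: the smoke tests I ask for are nothing fancy. A rough sketch of the kind of thing I mean (the function and cases here are invented for illustration, not from the actual project):

    // smoke tests for a hypothetical parseOrderCsv() helper (vitest)
    import { describe, expect, it } from "vitest";
    import { parseOrderCsv } from "./parseOrderCsv";

    describe("parseOrderCsv", () => {
      it("parses a single well-formed row", () => {
        expect(parseOrderCsv("id,qty\n42,3")).toEqual([{ id: "42", qty: 3 }]);
      });

      it("returns an empty array for an empty file", () => {
        expect(parseOrderCsv("")).toEqual([]);
      });

      it("throws on a malformed quantity", () => {
        expect(() => parseOrderCsv("id,qty\n42,three")).toThrow();
      });
    });

I still read every assertion; the model is good at filling in the boring cases, but it will happily assert the wrong behavior if the gist it inferred was wrong.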
FWIW, o3-mini was the best model I've seen so far; Sonnet 3.5 New is a close second.
ryandrake 16 hours ago [-]
I guess the things I don't like about Chat are the same things I don't like about pair (or team) programming. I've always thought of programming as a solitary activity. You visualize the data structures, algorithms, data paths, calling flow and stack, and so on, in your mind, with very high throughput "discussions" happening entirely in your brain. Your brain is high bandwidth, low latency. Effortlessly and instantly move things around and visualize them. Figure everything out. Finally, when it's correct, you send it to the slow output device (your fingers).
The minute you have to discuss those things with someone else, your bandwidth decreases by orders of magnitude and now you have to put words to these things and describe them, and physically type them in or vocalize them. Then your counterpart has to input them through his eyes and ears, process that, and re-output his thoughts to you. Slow, slow, slow, and prone to error and specificity problems as you translate technical concepts to English and back.
Chat as an interface is similarly slow and imprecise. It has all the shortcomings of discussing your idea with a human and really no upside besides the dictionary-like recall.
alickz 3 minutes ago [-]
in my experience, if you can't explain something to someone else then you don't fully understand it
our brains like to jump over inconsistencies or small gaps in our logic when working by themselves, but try to explain that same concept to someone else and those inconsistencies and gaps become glaringly obvious (doubly so if the other person starts asking questions you never considered)
it's why pair programming and rubber duck debugging work at all, at least in my opinion
yarekt 15 hours ago [-]
That's such a mechanical way of describing pair programming. I'm guessing you don't do it often (understandable if it's not working for you).
For me pair programming accelerates development to much more than 2x. Over time the two of you figure out how to use each other's strengths, and as both of you immerse yourself in the same context you begin to understand what's needed without speaking every bit of syntax between each other.
In the best cases, as a driver you end up producing high-quality code on the first pass, because you know that your partner will immediately catch anything that doesn't look right. You also go fast because you can sometimes skim over complexities, letting your partner think ahead and share that context load.
I'll leave readers to find all the caveats here
Edit: I should probably mention why I think Chat Interface for AI is not working like Pair programming: As much as it may fake it, AI isn't learning anything while you're chatting to it. It's pointless to argue your case or discuss architectural approaches. An approach that yields better results with Chat AI is to just edit/expand your original prompt. It also feels less like a waste of time.
With Pair programming, you may chat upfront, but you won't reach that shared understanding until you start trying to implement something. For now Chat AI has no shared understanding, just "what I asked you to do" thing, and that's not good enough.
RHSeeger 14 hours ago [-]
I think it depends heavily on the people. I've done pair programming at a previous job and I hated it. It wound up being a lot slower overall.
For me, there's
- Time when I want to discuss the approach and/or code to something (someone being there is a requirement)
- Time when I want to rubber duck, and put things to words (someone being there doesn't hurt, but it doesn't help)
- Time when I want to write code that implements things, which may be based on the output of one of the above
That last bucket of time is generally greatly hampered by having someone else there and needing to interact with them. Being able to separate them (having people there for the first one or two, but not the third) is, for me, optimal.
Tempest1981 2 hours ago [-]
You could try setting some quiet hours. Or headphones.
Maybe collaborate the first hour each morning, then the first hour after lunch.
andreasmetsala 12 minutes ago [-]
I think you missed the point. AI chat is not compatible with the solitary focused programming session.
skue 6 hours ago [-]
> I'm guessing you don't do it often (understandable if it's not working for you).
> For me pair programming accelerates development to much more than 2x.
The value of pair programming is inversely proportional to the expertise of the participant. Junior devs who pair with senior devs get a lot out of it, senior devs not so much.
GP is probably a more experienced dev, whereas you are the type of dev who says things like “I’m guessing that you…”.
ionwake 13 hours ago [-]
this is so far removed from anything I have ever heard or experienced. But I know not everyone is the same and it is refreshing to view this comment.
freehorse 12 hours ago [-]
Pair programming is imo great when there is some sort of complementarity between the programmers. It may or may not accelerate output, but it can definitely accelerate learning which is often harder. But as you say, this is not what working with llms is about.
taneq 9 hours ago [-]
> As much as it may fake it, AI isn't learning anything while you're chatting to it.
What's your definition of 'learn'? An LLM absolutely does extract and store information from its context. Sure, it's only short term memory and it's gone the next session, but within the session it's still learning.
I like your suggestion to update your original prompt instead of continuing the conversation.
frocodillo 16 hours ago [-]
I would argue that is a feature of pair programming, not a bug. By forcing you to use the slower I/O parts of your brain (and that of your partner) the process becomes more deliberate, allowing you to catch edge cases, bad design patterns, and would-be bugs before even putting pen to paper, so to speak. Not to mention that it immediately increases the bus factor by having two people with a good understanding of the code.
I’m not saying pair programming is a silver bullet, and I tend to agree that working on your own can be vastly more efficient. I do however think that it’s a very useful tool for critical functionality and hard problems and shouldn’t be dismissed.
RHSeeger 14 hours ago [-]
You can do that without pair programming, though. Both through actual discussions and through rubber ducking.
TeMPOraL 14 hours ago [-]
I guess it depends on the person. My experience is close to that of 'ryandrake.
I've been coding long enough to notice there are times where the problem is complex and unclear enough that my own thought process will turn into pair programming with myself, literally chatting with myself in a text file; this process has the bandwidth and latency on the same order as talking to another person, so I might just as well do that and get the benefit of an independent perspective.
The above is really more of a design-level discussion. However, there are other times - precisely those times that pair programming is meant for - when the problem is clear enough I can immerse myself in it. Using the slow I/O mode, being deliberate is exactly the opposite of what I need then. By moving alone and focused, keeping my thoughts below the level of words, I can explore the problem space much further, rapidly proposing a solution, feeling it out, proposing another, comparing, deciding on a direction, noticing edge cases and bad design up front and dealing with them, all in a rapid feedback loop with tests. Pair programming in this scenario would truly force me to "use the slower I/O parts of your brain", in that exact sense: it's like splitting a highly-optimized in-memory data processing pipeline in two, and making the halves communicate over IPC. With JSON.
As for bus factor, I find the argument bogus anyway. For that to work, pair programming would have to be done with the same partner or small group of partners, preferably working on the same or related code modules, daily, over the course of weeks at least - otherwise neither they nor I are going to have enough exposure to understand what the other is working on. But that's not how pair programming worked when I experienced it.
It's a problem with code reviews, too: if your project has depth[0], I won't really understand the whole context of what you're doing, and you won't understand the context of my work, so our reviews of each other's code will quickly degenerate to spotting typos, style violations, and peculiar design choices; neither of us will have time or mental capacity to fully understand the changeset before "+2 LGTM"-ing it away.
--
[0] - I don't know if there's a better, established term for it. What I mean is depth vs. breadth in the project architecture. Example of depth: you have a main execution orchestrator, you have an external data system that handles integrations with a dozen different data storage systems, then you have math-heavy business logic on data, then you have RPC for integrating with GUI software developed by another team, then you have an extensive configuration system, etc. - each of those areas is full of design and coding challenges that don't transfer to any other. Contrast that with an example of breadth: a typical webapp or mobile app, where 80% of the code is just some UI components and a hundred different screens, with very little unique or domain-specific logic. In those projects, developers are like free electrons in metal: they can pick any part of the project at any given moment and be equally productive working on it, because every part is basically the same as every other part. In those projects, I can see both pair programming and code reviews deliver on their promises in full.
bcoates 8 hours ago [-]
Agreed, particularly on code reviews: the only useful code reviews I've had were either in an outright trainee/expert relationship, or when the reviewer was very experienced in the gotchas of the project being modified and the author of the change was new.
Peer and near-peer reviews have always wound up being nitpicking or perfunctory.
An alternative that might work if you want two hands on every change for process reasons is to have the reviewer do something closer to formal QA, building and running the changed code to verify it has the expected behavior. That has a lot of limitations too, but at least it doesn't degrade to bikeshedding about variable name aesthetics.
skydhash 10 hours ago [-]
As I work, I pepper the files with TODO comments, then do a quick rgrep to find action items.
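Nothing fancy, roughly like this (names invented for illustration):

    interface Invoice { id: string; total: number }

    // TODO: handle pagination once the API supports it
    // TODO: replace this O(n^2) matching with a Map keyed by id
    function matchInvoices(a: Invoice[], b: Invoice[]): Invoice[] {
      return a.filter((x) => b.some((y) => y.id === x.id));
    }

    // later, from the repo root:
    //   grep -rn "TODO" src/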
hinkley 10 hours ago [-]
Efficient, but not always more effective.
hmcdona1 5 hours ago [-]
This is going to sound out of left field, but I would venture to guess you have very high spatial reasoning skills. I operate much the same way and only recently connected the dots that that skill might be what my brain leans on so heavily while programming and debugging.
Pair programming is endlessly frustrating beyond just rubber ducking because I'm having to exit my mental model, communicate it to someone else, and then translate and relate their inputs back into my mental model, which is not exactly rooted in language in my head.
knighthack 30 minutes ago [-]
I mostly agree with you, but I have to point out something to the contrary of this part you said: "...The minute you have to discuss those things with someone else, your bandwidth decreases by orders of magnitude and now you have to put words to these things and describe them, and physically type them in or vocalize them."
Subvocalization/explicit vocalization of what you're doing actually improves your understanding of the code. Doing so may 'decrease bandwidth', but it improves comprehension, because it's basically inline rubber duck debugging.
It's actually easy to write code that you don't understand and can't explain, whether at the syntax, logic or application level. I think the analogue is to writing well; anyone can write streams of consciousness amounting to word salad garbage. But a good writer can cut things down and explain why every single thing was chosen, right down to the punctuation. This feature of writing should be even more apparent with code.
I've coded tons of things where I can get the code working in a mediocre fashion, and yet find great difficulty in trying to verbally explain what I'm doing.
In contrast, there's been code where I've been able to explain each step of what I'm doing before I even write anything; in those situations what generally comes out tends to be superior, maintainable code, and readable too.
cjonas 8 hours ago [-]
I find it's exactly the opposite. With AI chat, I can define signatures, write technical requirements and validate my approach in minutes. I'm not talking with the AI like I would a human... I'm writing a blend of stubs and concise requirements, providing documentation, reviewing, validating and repeating. When it goes in the wrong direction, I add additional details and regenerate from scratch. I focus on small, composable chunks of functionality and then tie it all together at the end.
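To make that concrete, the kind of thing I paste into the chat looks roughly like this (names and requirements invented for illustration, not from a real project):

    // Requirements:
    // - group raw events by userId
    // - a session ends when the gap between events exceeds 30 minutes
    // - return sessions sorted by start time, oldest first
    // - pure function, no I/O

    interface RawEvent { userId: string; timestamp: number }
    interface Session { userId: string; start: number; end: number; count: number }

    export function buildSessions(events: RawEvent[]): Session[] {
      // TODO: implement per the requirements above
      throw new Error("not implemented");
    }

The model fills in the body; I review the diff, tighten the requirements where it guessed wrong, and regenerate.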
bobbiechen 14 hours ago [-]
I agree, chat is only useful in scenarios that are 1) poorly defined, and 2) require a back-and-forth feedback loop. And even then, there might be better UX options.
At the same time, putting your ideas to words forces you to make them concrete instead of nebulous brain waves. I find that the chat interface gets rid of the downsides of pair programming (that the other person is a human being with their own agency*) while maintaining the “intelligent” pair programmer aspect.
Especially with the new r1 thinking output, I find it useful to iterate on the initial prompt as a way to make my ideas more concrete as much as iterating through the chat interface which is more hit and miss due to context length limits.
* I don’t mean that in a negative way, but in a “I can’t expect another person to respond to me instantly at 10 words per second” way.
cortesoft 15 hours ago [-]
> At the same time, putting your ideas to words forces you to make them concrete instead of nebulous brain waves.
I mean, isn't typing your code also forcing you to make your ideas concrete?
RHSeeger 14 hours ago [-]
Doing it in your native language can add an extra dimension to it, though. In a way, I would consider it like double checking your work on something like a math problem by solving it a different way. By having to express the problem and solution in clear language, it can really help you make sure your solution is a good one, and considers all the angles.
nick238 13 hours ago [-]
Someone else (future you being a distinct person) will also need to grok what's going on when they maintain the code later. Living purely in a high-dimensional trans-enlightenment state and coding that way means you may as well be building a half-assed organic neural network to do your task, rather than something better "designed".
Neural networks and evolved structures and pathways (e.g. humans make do with ~20k genes and about that many more in regulatory sequences) are absolutely more efficient, but good luck debugging them.
godelski 16 hours ago [-]
> I focus on the high-level code, and let the model focus on the lower level code.
Tbh the reason I don't use LLM assistants is because they suck at the "low level". They are okay at mid level and better at high level. I find its actual coding very mediocre and fraught with errors.
I've yet to see any model understand nuance or detail.
This is especially apparent in image models. Sure, they can do hands, but they still don't get 3D space or temporal movement. It's great for scrolling through Twitter, but the longer you look the more surreal they get. This even includes the new ByteDance model also on the front page. But coding models ignore the context of the codebase and the results feel more like patchwork. They feel like what you'd be annoyed at a junior dev for writing, because not only do you have to go through 10 PRs to make it pass the test cases, but the lack of context just builds a lot of tech debt. They'll build unit tests that technically work but don't capture the actual issues and usually could be highly condensed while having greater coverage. It feels very gluey, like copy-pasting from Stack Overflow while hyper-focused on the immediate outcome instead of understanding the goal. It is too "solution" oriented, not understanding the underlying heuristics, and is more frustrating than dealing with the human equivalent who says something "works" as evidenced by the output. This is like trying to say a math proof is correct by looking at just the last line.
Ironically, I think in part this is why chat interface sucks too. A lot of our job is to do a lot of inference in figuring out what our managers are even asking us to make. And you can't even know the answer until you're part way in.
yarekt 14 hours ago [-]
> A lot of our job is to do a lot of inference in figuring out what our managers are even asking us to make
This is why I think LLMs can't really replace developers. 80% of my job is already trying to figure out what's actually needed, despite being given lots of text detail, maybe even spec, or prototype code.
Building the wrong thing fast is about as useful as not building anything at all. (And before someone says "at least you now know what not to do": for any problem there are an infinite number of wrong solutions, but only a handful that yield success, why waste time trying all the wrong ones?)
godelski 12 hours ago [-]
> Building the wrong thing fast is about as useful as not building anything at all.
SAY IT LOUDER
Fully agree. Plus, you may be faster in the short term but you won't be in the long run. The effects of both good code and bad code compound. "Tech debt" is just a fancy term for "compounding shit". And it is true, all code is shit, but it isn't binary; there is a big difference between stepping in shit and being waist deep in shit.
I can predict some of the responses:
Premature optimization is the root of all evil
There's a grave misunderstanding in this adage[0], and I think many interpret it as "don't worry about efficiency, worry about output." But the context is that you shouldn't optimize without first profiling the code, not that you shouldn't optimize![1] I find it also funny revisiting this quote, because it seems like it is written by a stranger in a strange land, where programmers are overly concerned with optimizing their code. These days, I hear very little about optimization (except when I work with HPC people) except when people are saying to not optimize. Explains why everything is so sluggish...
[1] Understanding the limitations of big O analysis really helps in understanding why this point matters. Usually when n is small, you can have worse big O and still be faster. But the constants we drop off often aren't a rounding error. https://csweb.wooster.edu/dbyrnes/cs200/htmlNotes/qsort3.htm
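To make the footnote concrete, here's a minimal sketch of the idea (the cutoff value is made up; the right one comes from profiling): below some size, the O(n^2) insertion sort beats the asymptotically better algorithm because its constants are tiny.

    // Hybrid sort: insertion sort below a cutoff, merge sort above it.
    const CUTOFF = 32; // illustrative; find the real value by profiling

    function insertionSort(a: number[]): number[] {
      for (let i = 1; i < a.length; i++) {
        const v = a[i];
        let j = i - 1;
        while (j >= 0 && a[j] > v) { a[j + 1] = a[j]; j--; }
        a[j + 1] = v;
      }
      return a;
    }

    function hybridSort(a: number[]): number[] {
      if (a.length <= CUTOFF) return insertionSort(a);
      const mid = a.length >> 1;
      const left = hybridSort(a.slice(0, mid));
      const right = hybridSort(a.slice(mid));
      // merge the two sorted halves
      const out: number[] = [];
      let i = 0, j = 0;
      while (i < left.length && j < right.length)
        out.push(left[i] <= right[j] ? left[i++] : right[j++]);
      return out.concat(left.slice(i), right.slice(j));
    }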
TeMPOraL 13 hours ago [-]
> For any problem there are an infinite number of wrong solutions, but only a handful that yield success, why waste time trying all the wrong ones?
Devil's advocate: because unless you're working in a heavily dysfunctional organization, or are doing a live coding interview, you're not playing "guess the password" with your management. Most of the time, they have even less of a clue about what the right solution looks like! "Building the wrong thing" lets them diff something concrete against what they imagined and felt it would be, forcing them to clarify their expectations and give you more accurate directions (which, being a diff against a concrete thing, are less likely to then be misunderstood by you!). And, the faster you can build that wrong thing, the less money and time is burned to buy that extra clarity.
bcoates 8 hours ago [-]
It's just incredibly inefficient if there's any other alternative.
Doing 4 sprints over 2 months to make a prototype in order to save three 60-minute meetings over a week where you do a few requirements analysis/proposal review cycles.
andreasmetsala 5 minutes ago [-]
> Doing 4 sprints over 2 months to make a prototype
That’s a lot of effort for a prototype that you should be throwing away even if it does the right thing!
Are you sure you’re not gold plating your prototypes?
TeMPOraL 2 hours ago [-]
Yeah, that would be stupid. I was thinking one order of magnitude less in terms of effort. If you can make a prototype in a day, it might deliver way more value than 3x 60 minute meetings. If you can make it in a week, where the proper implementation would take more than a month, that could still be a huge win.
I see this not as opposed, but as part of requirements analysis/review - working in the abstract, with imagination and prose and diagrams, it's too easy to make invalid assumptions without anyone realizing it.
godelski 8 hours ago [-]
I don't think you're disagreeing; in fact, I think you're agreeing. Ironically, the fact that one of us must be wrong here demonstrates the difficulty of chat-based communication. I believe yarekt would be in agreement with me that
> you can't even know the answer until you're part way in.
Which it seems you do too. But for clarity, there's a big difference between building the /wrong/ thing and /not the right thing/. The underlying point of my comment is that not only is communication difficult, but the overall goals are ambiguous, and a lot of time should be dedicated to getting this right. Yes, that involves finding out what things are wrong, and that is the sentiment behind the original meaning of "fail fast", but I think that term has come to mean something a bit different now. Moreover, I believe there are simply people not looking at the details.
It is really hard to figure out what the right thing is. We humans don't do this just through chat. We experiment, discuss, argue, draw, and there's tons of inference and reliance upon shared understandings. There's a lot of associated context. You're right that a dysfunctional organization (not uncommon) is worse, but these things are still quite common in highly functioning organizations. Your explicit statement that management has even less of an idea of what the right solution is, is exactly what we're pushing back against: figuring that out is a large part of a developer's job. I would argue that the main reason we have a better idea is our technical skills, our depth of knowledge, our experience. A compression machine (LLM) will get some of this, but there's a big gap when trying to get to the end point. Pareto is a bitch. We all know there is a huge difference between a demonstration prototype and an actual product, and that the amount of effort and resources required is exponentially different. ML systems specifically struggle with detail and nuance, and this is the root of those resource differences.
I'll give an example for clarity. Considering the iPad, the existence of third party note taking apps can be interpreted as nothing short of Apple's failure. I mean for the love of god, you got the pencil and you can't pull up notes and interact with it like it is a piece of paper? It's how the damned thing is advertised! A third party note taking app should be interpreted by Apple as showing their weak points. But you can't even zoom in the notes app?! Sure, you can turn on the accessibility setting and zoom with triple tap (significantly diverging from the standard pinching gesture used literally everywhere else) but if you do this (assuming full screen) you are just zooming in on a portion of the actual screen and not zooming within the notes. So you get stupid results like not having access to your pen's settings. Which is extra important here given that the likely reason someone would zoom is to adjust details, and certainly you're going to want to adjust the eraser size. What I'm trying to say is that there's a lot of low hanging fruit here that should be incredibly obvious were you to actually use the application, dog-fooding. Instead, Apple is dedicating time to handwriting recognition and equation solving, which in practice (at least in my experience) end up creating a more jarring experience and cause more editing. Though it is cool when it does work. I'd say that here, Apple is not building the right thing. They are completely out of touch with the actual goals and experiences of the users. It's not that they didn't build a perfect app, it is that they failed to build basic functionality.
But of course, Apple probably doesn't care. Because they significantly prioritize profits over building a quality product. These are orthogonal aspects and they can be simultaneously optimized. One should not need to pick one over the other, and the truth is that our economics should ensure alignment, that quality begets profits and that one can't "hack" the metrics.
Apple is far from alone here though. I'd say this "low hanging infuriating bullshit" is actually quite common. In fact, I think it is becoming more common. I have argued here before about the need for more "grumpy developers." I think if you're not grumpy, you should be concerned. Our job is to find problems, break them down into a way that can be addressed, and to resolve them. The "grumpiness" here is a dissatisfaction with the state of things. Given that nothing is perfect, there should always be reason to be "grumpy." A good developer should be able to identify and fix problems without being asked. But I do think there's a worrying decline of (lack of!) "grumpy" types, and I have no doubt this is connected to the rapid rise of vaporware and shitty products.
Also, I notice you play Devil's advocate a lot. While I think it can be useful, I think it can be overused. It needs to drive home the key limitations to an argument, especially when they are uncomfortable. Though I think in our case, I'm the one making the argument that diverges from the norm.
lucasmullens 15 hours ago [-]
> But with coding models they ignore context of the codebase and the results feel more like patchwork.
Have you tried Cursor? It has a great feature that grabs context from the codebase, I use it all the time.
pc86 15 hours ago [-]
I can't get the prompt because I'm on my work computer, but I have about a three-quarter-page instruction set in the settings of Cursor. It asks clarifying questions a LOT now, and is pretty liberal with adding in commented pseudo-code for stuff it isn't sure about. You can still trip it up if you try, but it's a lot better than stock. This is with Sonnet 3.5 agent chats (Composer I think it's called?)
I actually cancelled my Anthropic subscription when I started using Cursor because I only ever used Claude for code generation anyway, so now I just do it within the IDE.
troupo 13 hours ago [-]
> It has a great feature that grabs context from the codebase, I use it all the time.
If only this feature worked consistently, or reliably even half of the time.
It will casually forget or ignore any and all context and any and all files in your codebase at random times, and you never know what set of files and docs it's working with at any point in time
godelski 14 hours ago [-]
I have not. But I also can't get the general model to work well in even toy problems.
It probably isn't obvious in a quick read, but there are mistakes here. Maybe the most obvious is that, given how `replacements` is built, we need to order it intelligently. This could be fixed by sorting. But is this the right data structure? Not to mention that the algorithm itself is quite... odd
To give a more complicated example, I passed the same prompt for this famous code golf problem[0]. Here are the results; I'll save you the time, the output is wrong: https://0x0.st/8K3M.txt (note, I started command lines with "$" and added some notes for you)
Just for the heck of it, here's the same thing but with o1-preview
As you can see, o1 is a bit better on the initial problem but still fails at the code golf one. It really isn't beating the baseline naive solution. It does 170 MiB/s compared to 160 MiB/s (baseline with -O3). This is something I'd hope it could do really well on given that this problem is rather famous and so many occurrences of it should show up. There are tons of variations out there, and it is common to see parallel fizzbuzz in a class on parallelization, as it can teach important concepts like keeping the output in the right order.
But hey, at least o1 has the correct output... It's just that that's not all that matters.
I stand by this: evaluating code based on output alone is akin to evaluating a mathematical proof based on the result. And I hope these examples make the point why that matters, why checking output is insufficient.
Edit: I want to add that there's also an important factor here. The LLM might get you a "result" faster, but you are much more likely to miss the learning process that comes with struggling. Because that makes you much faster (and more flexible) not just next time but in many situations where even a subset is similar. Which yeah, totally fine to glue shit together when you don't care and just need something, but there's a lot of missed value if you need to revisit any of that. I do have concerns that people will plateau at junior levels. I hope it doesn't cause seniors to revert to juniors, which I've seen happen without LLMs. If you stop working on these types of problems, you lose the skills. There's already an issue where we rush to get output and it has clear effects on the stagnation of devs. We have far more programmers than ever but I'm not confident we have a significant number more wizards (the percentage of wizards is decreasing). There are fewer people writing programs just for fun. But "for fun" is one of our greatest learning tools as humans. Play is a common trait you see in animals and it exists for a reason.
wiremine 14 hours ago [-]
> Tbh the reason I don't use LLM assistants is because they suck at the "low level". They are okay at mid level and better at high level. I find its actual coding very mediocre and fraught with errors.
That's interesting. I found assistants like Copilot fairly good at low level code, assuming you direct it well.
godelski 12 hours ago [-]
I have a response to a sibling comment showing where GPT 4o and o1-preview do not yield good results.
> assuming you direct it well.
But hey, I admit I might not be good at this. But honestly, I've found greater value in spending my time reading the docs than in trying to prompt-engineer my way through. And I've given a fair amount of time to trying to get good at prompting. I just can't get it to work.
I do think that when I'm coding with an LLM it _feels_ faster, but when I've timed myself, it doesn't seem that way. It just seems to be less effort (I don't mind the effort, especially because of the compounding rewards).
rpastuszak 16 hours ago [-]
I've changed my mind on that as well. I think that, generally, chat UIs are lazy and not very user friendly. However, when coding I keep switching between two modes:
1. I need a smart autocomplete that can work backwards and mimic my coding patterns
2. I need a pair programming buddy (of sorts, this metaphor doesn't completely work, but I don't have a better one)
Pair development, even a butchered version of the so called "strong style" (give the driver the highest level of abstraction they can use/understand) works quite well for me. But, the main reason this works is that it forces me to structure my thinking a little bit, allows me to iterate on the definition of the problem. Toss away the sketch with bigger parts of the problem, start again.
It also helps me to avoid yak shaving, getting lost in the detail or distracted because the feedback loop between me seeing something working on the screen vs. the idea is so short (even if the code is crap).
I'd also add 5.: use prompts to generate (boring) prompts. For instance, I needed a simple #tag formatter for one of my markdown sites. I am aware that there's a not-so-small list of edge cases I'd need to cover. In this case I'd write a prompt with a list of basic requirements and ask the LLM to: a) extend it with good practice, common edge cases b) format it as a spec with concrete input / output examples. This works a bit similar to the point you made about generating unit tests (I do that too, in tandem with this approach).
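To give a flavour, the spec I get back ends up with concrete cases roughly like these (the exact rules here are invented for illustration):

    // edge cases the LLM added to the #tag formatter spec, as input/output pairs
    const cases: Array<[string, string[]]> = [
      ["just #one tag", ["one"]],
      ["#dup #dup twice", ["dup"]],              // deduplicate
      ["#CamelCase stays", ["CamelCase"]],       // don't lowercase
      ["trailing punctuation #tag.", ["tag"]],   // strip punctuation
      ["not a tag: foo#bar", []],                // '#' must start a word
      ["unicode #żółć works", ["żółć"]],
    ];

Half of those I wouldn't have bothered writing down myself, which is exactly the point.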
In a sense 1) is autocomplete 2) is a scaffolding tool.
yarekt 14 hours ago [-]
Oh yea, point 1 for sure. I call copilot regex on steroids.
Example (a rough sketch of the end result follows the list):
- copy paste a table from a pdf datasheet into a comment (it'll be badly formatted with newlines and whatnot, doesn't matter)
- show it how to do the first line
- autocomplete the rest of the table
- Check every row to make sure it didn't invent fields/types
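A sketch of what that ends up looking like (register names and values invented for illustration):

    // raw paste from the PDF datasheet (mangled formatting, doesn't matter):
    //   0x00 DEVID       R  Device ID      0xE5
    //   0x2D POWER_CTL   RW Power control  0x00
    //   0x31 DATA_FORMAT RW Data format    0x00
    interface Register { addr: number; name: string; access: "R" | "RW"; reset: number }
    const registers: Register[] = [
      { addr: 0x00, name: "DEVID", access: "R", reset: 0xe5 }, // I write this row...
      { addr: 0x2d, name: "POWER_CTL", access: "RW", reset: 0x00 },   // ...autocomplete
      { addr: 0x31, name: "DATA_FORMAT", access: "RW", reset: 0x00 }, // ...autocomplete
    ];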
For this type of workflow the tools are a real time saver. I've yet to see any results for the other workflows. They usually just frustrate me by either starting to suggest nonsense code without full understanding, or it's far too easy to bias the results so they get stuck in a pattern of thinking.
ryandrake 15 hours ago [-]
> I've changed my mind on that as well. I think that, generally, chat UIs are lazy and not very user friendly. However, when coding I keep switching between two modes:
> 1. I need a smart autocomplete that can work backwards and mimic my coding patterns
> 2. I need a pair programming buddy (of sorts, this metaphor doesn't completely work, but I don't have a better one)
Thanks! This is the first time I've seen it put this clearly. When I first tried out Copilot, I was unsure of how I was "supposed" to interact with it. Is it (as you put it) a smarter autocomplete, or a programming buddy? Is it both? What was the right input method to use?
After a while, I realized that for my personal style I would pretty much entirely use method 1, and never method 2. But, others might really need that "programming buddy" and use that interface instead.
echelon 16 hours ago [-]
I work on GenAI in the media domain, and I think this will hold true with other fields as well:
- Text prompts and chat interfaces are great for coarse grained exploration. You can get a rough start that you can refine. "Knight standing in a desert, rusted suit of armor" gets you started, but you'll want to take it much further.
- Precision inputs (mouse or structure guided) are best for fine-tuning the result and homing in on the solution itself. You can individually plant the cacti and pose the character. You can't get there with text.
Syzygies 1 hour ago [-]
An environment such as Cursor supports many approaches for working with AI. "Chat" would be the instructions printed on the bottom, perhaps how their developers use it, but far from the only mode it actually supports.
It is helpful to frame this in the historical arc described by Yuval Harari in his recent book "Nexus" on the evolution of information systems. We're at the dawn of history for how to work with AI, and actively visualizing the future has an immediate ROI.
"Chat" is cave man oral tradition. It is like attempting a complex Ruby project through the periscope of an `irb` session. One needs to use an IDE to manage a complex code base. We all know this, but we haven't connected the dots that we need to approach prompt management the same way.
Flip ahead in Harari's book, and he describes rabbis writing texts on how to interpret [texts on how to interpret]* holy scriptures. Like Christopher Nolan's movie "Inception" (his second most relevant work after "Memento"), I've found myself several dreams deep collaborating with AI to develop prompts for [collaborating with AI to develop prompts for]* writing code together. Test the whole setup on multiple fresh AI sessions, as if one is running a business school laboratory on managerial genius, till AI can write correct code in one shot.
Duh? Good managers already understand this, working with teams of people. Technical climbers work cliffs this way. And AI was a blithering idiot until we understood how to simulate recursion in multilayer neural nets.
AI is a Rorschach inkblot test. Talk to it like a kindergartner, and you see the intelligence of a kindergartner. Use your most talented programmer to collaborate with you in preparing precise and complete specifications for your team, and you see a talented team of mature professionals.
We all experience degradation of long AI sessions. This is not inevitable; "life extension" needs to be tackled as a research problem. Just as old people get senile, AI fumbles its own context management over time. Civilization has advanced by developing technologies for passing knowledge forward. We need to engineer similar technologies for providing persistent memory to make each successive AI session smarter than the last. Authoring this knowledge helps each session to survive longer. If we fail to see this, we're condemning ourselves to stay cave men.
Compare the history of computing. There was a lot of philosophy and abstract mathematics about the potential for mechanical computation, but our worldview exploded when we could actually plug the machines in. We're at the same inflection point for theories of mind, semantic compression, structured memory. Indeed, philosophy was an untestable intellectual exercise before; now we can plug it in.
How do I know this? I'm just an old mathematician, in my first month trying to learn AI for one final burst of productivity before my father's dementia arrives. I don't have time to wait for anyone's version of these visions, so I computed them.
In mathematics, the line in the sand between theory and computation keeps moving. Indeed, I helped move it by computerizing my field when I was young. Mathematicians still contribute theory, and the computations help.
A similar line in the sand is moving, between visionary creativity and computation. LLMs are association engines of staggering scope, and what some call "hallucinations" can be harnessed to generalize from all human endeavors to project future best practices. Like how to best work with AI.
I've tested everything I say here, and it works.
dataviz1000 16 hours ago [-]
I agree with you.
Yesterday, I asked o3-mini to "optimize" a block of code. It produced very clean, functional TypeScript. However, because the code is reducing stock option chains, I then asked o3-mini to "optimize for speed." In the JavaScript world, this is usually done with for loops, and it even considered aspects like array memory allocation.
This shows that using the right qualifiers is important for getting the results you want. Today, I use both "optimize for developer experience" and "optimize for speed" when they are appropriate.
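As a rough illustration of the difference between the two asks (types and logic invented, not the model's actual output):

    interface OptionQuote { strike: number; openInterest: number }

    // "optimize for developer experience": declarative, but allocates intermediate arrays
    const itmStrikes = (chain: OptionQuote[], spot: number) =>
      chain.filter((q) => q.strike < spot).map((q) => q.strike);

    // "optimize for speed": single pass, output array sized up front
    function itmStrikesFast(chain: OptionQuote[], spot: number): number[] {
      const out = new Array<number>(chain.length);
      let n = 0;
      for (let i = 0; i < chain.length; i++) {
        if (chain[i].strike < spot) out[n++] = chain[i].strike;
      }
      out.length = n; // trim to the actual count
      return out;
    }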
Although declarative code is just an abstraction, moving from imperative jQuery to declarative React was a major change in my coding experience. My work went from telling the system how to do something to simply telling it what to do. Of course, in React—especially at first—I had to explain how to do things, but only once to create a component. After that, I could just tell the system what to do. Now, I can simply declare the desired outcome, the what. It helps to understand how things work, but that level of detail is becoming less necessary.
javier2 16 hours ago [-]
Nah, a Chat is terrible for development. In my tears of working, i have only had the chance to start a new codebase 3-4 times. 90% of the time is spent modifying large existing systems, constantly changing them. The chat interface is terrible for this. It would be much better if it was more integrated with the codebase and editor
pc86 15 hours ago [-]
Cursor does all of this, and agent chats let you describe a new feature or an existing bug and it will search the entire codebase and add relevant code to its context automatically. You can optionally attach files for the context - code files that you want to add to the context up front, documentation for third-party calls, whatever you want.
As a side note, "No, you're wrong" is not a great way to have a conversation.
javier2 14 hours ago [-]
Yeah, that is right. I'll give Cursor a try, because I believe we can do much better than these hopeless chat windows!
pc86 14 hours ago [-]
I've tried every LLM+IDE combo that I've heard about and Cursor is by far the best.
zahlman 11 hours ago [-]
>In my tears of working
Sometimes typos are eerily appropriate ;)
(I almost typed "errily"...)
javier2 4 hours ago [-]
I’ll leave it!
rafaelmn 16 hours ago [-]
This only works for small self-contained problems with narrow scope/context.
Chat sucks for pulling in context, and the only worse thing I've tried is the IDE integrations that supposedly pull the relevant context for you (and I've tried quite a few recently).
I don't know if naive fine-tuning on a codebase would work. I suspect there are going to be tools that let you train the AI on your code, in the sense that it can have some references in the model, and it knows how you want your project code/structure to look (which is often quite different from what it looks like in most areas)
knes 12 hours ago [-]
IMHO, I would agree with you.
I think chat is a nice intermediary evolution between the CLI (that we use every day) and whatever comes next.
I work at Augment (https://augmentcode.com), which, surprise surprise, is an AI coding assistant. We think about the new modality required to interact with code and AI on a daily basis.
Besides increased productivity (and happiness, as you don't have to do mundane tasks like tests, documentation, etc.), I personally believe that what AI can open up is actually more of a way for non-coders (think PMs) to interact with a codebase. AI is really good at converting specs, user stories, and so on into tasks—which today still need to be implemented by software engineers (with the help of AI for the more tedious work). Think of what Figma did between designers and developers, but applied to coding.
What’s the actual "new UI/UX paradigm"? I don’t know yet. But like with Figma, I believe there’s a happy ending waiting for everyone.
gamedever 15 hours ago [-]
What did you create? In my field, so far, I've found the chatbots don't do so well. My guess is that the more your project resembles something other people make often, the more likely the bot is to help.
Even then though, I asked o1-cursor to start a React app. It failed, mostly because it's out of date. Its instructions were for React two versions ago.
This seems like an issue. If the statistically most likely answer is old, that's not helpful.
wiremine 14 hours ago [-]
The most recent one was a typescript project focused on zod.
I might be reading into your comment, but I agree "top-down" development sucks: "Give me a React app that does X". I've had much more success going bottom-up.
And I've often seen models getting confused on versions. You need to be explicit, and even then they forget.
jacob019 16 hours ago [-]
Totally agree. Chat is a fantastic interface because it stays out of my way. For me it's much more than a coding assistant. I get live examples of how to use tools, and help with boilerplate, which is a time saver and improvement over legacy workflows, but the real benefit is all the spitballing I can do with it to refine ideas and logic and help getting up to speed on tooling way outside of my domain. I spent about 3.5 hours chatting with o1 about RL architecture to solve some business problems. Now I have a crystal clear plan and the confidence to move forward in an optimal way. I feel a little weird now, like I was just talking to myself for a few hours, but it totally helped me work through the planning. For actual code, I find myself being a bit less interactive with LLMs as time goes, sometimes it's easier to just write the logic the way I want rather than trying to explain how I want it but the ability to retrieve code samples for anything with ease is like a superpower. Not to mention all the cool stuff LLMs can do at runtime via API. Yeah, chat is great, and I'll stick with writing code in Vim and pasting as needed.
bboygravity 14 hours ago [-]
Interesting to see the narrative on here slowly change from "LLMs will forever be useless for programming" to "I'm using it every day" over the course of the past year or so.
I'm now bracing for the "oh sht, we're all out of a job next year" narrative.
RHSeeger 14 hours ago [-]
I think a lot of people have always thought of it as a tool that can help.
I don't want an LLM to generate "the answer" for me in a lot of places, but I do think it's amazing for helping me gather information (and cite where that information came from) and pointers in directions to look. A search engine that generates a concrete answer via LLM is (mostly) useless to me. One that gives me an answer and then links to the facts it used to generate that answer is _very_ useful.
It's the same way with programming. It's great at helping you find what you need. But it needs to be in a way that you can verify it's right, or take its answer and adjust it to what you actually need (based on the context it provides).
wiremine 14 hours ago [-]
> "oh sht, we're all out of a job next year"
Maybe. My sense is we'd need to see 3 to 4 orders of magnitude improvement over the current models before we can replace people outright.
I do think we'll see a huge productivity boost per developer over the next few years. Some companies will use that to increase their throughput, and some will use it to reduce overhead.
mirkodrummer 12 hours ago [-]
Whenever I read about a huge productivity boost for developers or companies, I shiver. Software sucked more and more even before LLMs; I don't see it getting better, just getting out faster, maybe. I'm afraid in most cases it will be a disaster.
larodi 15 hours ago [-]
I would actually join you, as my longstanding view on coding is that it is best done in pairs. Sadly humans, and programmers in particular, are not so ready to work side by side, and it is even more depressing that it now turns out AI is the one pairing with us.
Perhaps there's gonna be a post-AI programming movement where people actually stare at the same monitor and discuss while one of them is coding.
As a sidenote - we've done experiments with FOBsters, and when paired this way, they multiply their output. There's something about the psychology of groups and how one can only provide maximum output when teaming.
Even for solo activities, and non-IT activities, such as skiing/snowboard, it is better to have a partner to ride with you and discuss the terrain.
shmoogy 16 hours ago [-]
Have you tried cursor? I really like the selecting context -> cmd+l to make a chat with it - explain requirement, hit apply, validate the diff.
Works amazingly well for a lot of what I've been working on the past month or two.
gnatolf 16 hours ago [-]
I haven't tried cursor yet, but how is this different from the copilot plugin in vscode? Sounds pretty similar.
cheema33 15 hours ago [-]
> copilot plugin in vscode
Copilot, back when I used it, completely ignored context outside of the file I was working in. Copilot, as of a few weeks ago, was the absolute dumbest assistant of all the various options available.
With cursor, I can ask it to make a change to how the app generates a JWT without even knowing which file or folder the relevant code is in. For very large codebases, this is very very helpful.
RugnirViking 13 hours ago [-]
ya know what, after a couple times hearing this comment, I downloaded it literally yesterday. It does feel pretty different, at least the composer module and stuff. A big improvement in AI tooling imo
cruffle_duffle 14 hours ago [-]
Similar flow but much better user experience. At least that is how I’d describe it.
bandushrew 9 hours ago [-]
Producing 200 lines of usable code an hour is genuinely impressive.
My experiments have been nowhere near that successful.
I would love, love, love to see a transcript of how that process worked over an hour, if that was something you were willing to share.
protocolture 9 hours ago [-]
100%.
I do all this + rubber ducky the hell out of it.
Sometimes I just discuss concepts of the project with the thing and it helps me think.
I don't think chat is going to be right for everyone but it absolutely works for me.
ic4l 15 hours ago [-]
For me, the o models consistently make more mistakes than Claude 3.5 Sonnet.
pc86 15 hours ago [-]
Same for me. I wonder if Claude is better at some languages than others, and o models are better at those weaker languages. There are some devs I know who insist Claude is garbage for coding and o3-* or o4-* are tier 1.
svachalek 12 hours ago [-]
I think Claude is incredible on JS/TS coding while GPT is highly python focused.
kristofferR 14 hours ago [-]
o4 doesn't exist (in public at least) yet.
esafak 9 hours ago [-]
OP means 4o
AutistiCoder 13 hours ago [-]
ChatGPT itself is great for coding.
GitHub Copilot is...not. It doesn't seem to understand how to help me as well as ChatGPT does.
sdesol 16 hours ago [-]
> 1. I find a "pair programming" mentality is key. I focus on the high-level code, and let the model focus on the lower level code. I code review all the code, and provide feedback. Blindly accepting the code is a terrible approach.
This is what I've found to be key. If I start a new feature, I will work with the LLM to do the following:
- Create problem and solution statement
- Create requirements and user stories
- Create architecture
- Create skeleton code. This is critical since it lets me understand what it wants to do.
- Generate a summary of the skeleton code
Once I have done the above, I will have the LLM generate a reusable prompt that I can use to start LLM conversations with. Below is an example of how I turn everything into a reusable prompt.
The first message is the reusable prompt message. With the first message in place, I can describe the problem or requirements and ask the LLM what files it will need to better understand how to implement things.
What I am currently doing highlights how I think LLMs are a game changer. VCs are going for moonshots instead of home runs. The ability to gather requirements and talk through a solution before even coding is how I think LLMs will revolutionize things. It is great that it can produce usable code, but what I've found to be invaluable is that it helps you organize your thoughts.
In the last link, I am having a conversation with both DeepSeek v3 and Sonnet 3.5, and the LLMs legitimately saved me hours of work, without even writing a single line of code. In the past, I would have just implemented the feature and been done with it, and then I would have to fix something if I didn't think of an edge case. With LLMs, it literally takes minutes to develop a plan that is extremely well documented and can be shared with others.
This ability to generate design documents is how I think LLMs will ultimately be used. The bonus is producing code, but the reality is that documentation (which can be tedious and frustrating) is a requirement for software development. In my opinion, this is where LLMs will forever change things.
zahlman 11 hours ago [-]
LoC per hour seems to me like a terrible metric.
esafak 9 hours ago [-]
Why? Since you are vetting the code it generates, the rate at which you end up with code you accept seems like a good measure of productivity.
59nadir 3 hours ago [-]
1000 lines of perfectly inoffensive and hard to argue against code that you don't need because it's not the right solution is negative velocity. Granted, I don't think that's much worse with LLMs but I do think it's going to be a growing problem caused by the cost of creating useless taxonomies and abstractions going down.
That is to say: I think LLMs are going to make a problem we already had (much) worse.
ikety 17 hours ago [-]
do you use pair programming tools like aider?
nonrandomstring 16 hours ago [-]
> it's actually a good UI
Came to vote good too. I mean, why do we all love a nice REPL? That's chat, right? Chat with an interpreter.
bongodongobob 16 hours ago [-]
To add to that, I always add some kind of debug function wrapper so I can hand off the state of variables and program flow to the LLM when I need to debug something. Sometimes it's really hard to explain exactly what went wrong so being able to give it a chunk of the program state is more descriptive.
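Mine is nothing sophisticated, roughly this shape (a sketch, not the exact wrapper I use):

    // Wraps a function and records args, result or exception, and a call counter,
    // so the captured log can be pasted straight into the chat.
    const debugLog: string[] = [];

    function traced<A extends unknown[], R>(name: string, fn: (...args: A) => R) {
      let calls = 0;
      return (...args: A): R => {
        calls++;
        try {
          const result = fn(...args);
          debugLog.push(`${name} #${calls} args=${JSON.stringify(args)} -> ${JSON.stringify(result)}`);
          return result;
        } catch (err) {
          debugLog.push(`${name} #${calls} args=${JSON.stringify(args)} threw ${String(err)}`);
          throw err;
        }
      };
    }

    // usage: const safeDivide = traced("divide", (a: number, b: number) => a / b);
    // reproduce the bug, then paste debugLog into the chat.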
throwup238 16 hours ago [-]
I do the same for my Qt desktop app. I’ve got an “Inspector” singleton that allows me to select a component tree via click, similar to browser devtools. It takes a screenshot, dumps the QML source, and serializes the state of the components into the clipboard.
I paste that into Claude and it is surprisingly good at fixing bugs and making visual modifications.
rubymamis 13 hours ago [-]
Sounds awesome. I would love to hear more about this. Any chance you can share this or at least more details?
acrophiliac 14 hours ago [-]
That sounds cool. I could use that. Care to share your Inspector code?
ls_stats 15 hours ago [-]
>it's actually a good UI
>I just finished a small project
>around 2200 lines
why are the top comments on HN always from people who have not read the article?
pc86 15 hours ago [-]
It's not clear to me in the lines you're quoting that the GP didn't read the article.
wiremine 14 hours ago [-]
Just confirming I did read the article in its entirety. Not reading it is like HN sin #1.
taeric 18 hours ago [-]
I'm growing to the idea that chat is a bad UI pattern, period. It is a great record of correspondence, I think. But it is a terrible UI for doing anything.
In large part, I assert this is because the best way to do something is to do that thing. There can be correspondence around the thing, but the artifacts that you are building are separate things.
You could probably take this further and say that narrative is a terrible way to build things. It can be a great way to communicate them, but being a separate entity, it is not necessarily good at making any artifacts.
zamfi 17 hours ago [-]
With apologies to Bill Buxton: "Every interface is best at something and worst at something else."
Chat is a great UI pattern for ephemeral conversation. It's why we get on the phone or on DM to talk with people while collaborating on documents, and don't just sit there making isolated edits to some Google Doc.
It's great because it can go all over the place and the humans get to decide which part of that conversation is meaningful and which isn't, and then put that in the document.
It's also obviously not enough: you still need documents!
But this isn't an "either-or" case. It's a "both" case.
packetlost 18 hours ago [-]
I even think it's bad for generalized communication (i.e. Slack/Teams/Discord/etc.) that isn't completely throwaway. Email is better in every single way for anything that might ever be relevant to review again or that needs to be filtered when there's too much going on.
goosejuice 17 hours ago [-]
I've had the opposite experience.
I have never had any issue finding information in slack with history going back nearly a decade. The only issue I have with Slack is a people problem where most communication is siloed in private channels and DMs.
Email threads are incredibly hard to follow though. The UX is rough and it shows.
packetlost 17 hours ago [-]
I hard disagree. Don't have a conversation? Ask someone who does to forward it. Email lets the user control how to organize conversations. Want to stuff a conversation in a folder? Sure. Use tags religiously? Go for it. Have one big pile and rely on full-text search and metadata queries? You bet. Only the last of these is possible with the vast majority of IM platforms because the medium just doesn't allow for any other paradigm.
The fact that there's a subject header alone leads people to both stay on topic and have better thought out messages.
I agree that email threads could have better UX. Part of that is the clients insistence on appending the previous message to every reply. This is completely optional though and should probably be turned off by default for simple replies.
goosejuice 17 hours ago [-]
That's fine.
Email is really powerful but people simply aren't good at taking advantage of it and it varies by email client. Doing some IT work at a startup made this pretty clear to me. I found Slack was much more intuitive for people.
Both systems rely on the savviness of the users for the best experience and I just think email is losing the UX war. Given how terrible people seem to be at communicating I think it's a pretty important factor to consider.
packetlost 17 hours ago [-]
I think this could reasonably be addressed, and several startups have. The trouble is that the default email clients (gmail, outlook, etc.) don't really try to make it any better.
I've also generally had the opposite experience, a huge amount of business offices live and breath in email (mostly Outlook, but I'm sure it varies). Startups tend to run fast and lean, but as soon as you have some threshold of people, email is king.
goosejuice 16 hours ago [-]
We used outlook and slack. Business primarily operated via outlook as most communication was unsurprisingly external. Most but not all internal was slack.
I'm not hating on email, it has a lot of good properties and still serves a purpose. Every office appears to have some kind of anti-Slack vigilante. It's really not that bad.
Sylamore 12 hours ago [-]
Email has the stigma of all the junk/grey mail, spam, and scam attempts that come in via it - people don't want to have to filter through as much of that, and for the most part these chat apps solve that problem.
It doesn't help that Outlook's search capabilities have gotten effectively useless - I can type in search terms that I'm literally looking at in my inbox and have it return no results, or have it return dozens of hits without the search terms involved at all. I don't have that problem with Slack or Teams.
However, I think you are right overall that email is better for what people end up using chat apps for.
esafak 9 hours ago [-]
In Slack people don't even consistently use threads, because they are not forced to, so conversations are strewn all over the place, interleaved with one another. Slack has no model of a discussion in the first place.
taeric 18 hours ago [-]
Anything that needs to be filtered for viewing again pretty much needs version control. Email largely fails at that, just as hard as other correspondence systems do. That said, we have common workflows that use email to build reviewed artifacts.
People love complaining about the email workflow of git, but it is demonstrably better than any chat program for what it is doing.
packetlost 17 hours ago [-]
I don't think I agree with this. Sure, many things should be versioned, but I don't think most correspondence requires it, and correspondence is email's primary purpose.
taeric 17 hours ago [-]
Agreed, if it is correspondence that we are talking about. So, agreed: I was probably too strong in saying that anything needing filtering and such is bad.
I'm thinking of things that are assembled. The correspondence that went into the assembly is largely of historical interest, but not necessarily one of current use.
esafak 9 hours ago [-]
So you mean like collaborating on a document? Modern word processors are versioned, or you can use text and your own VCS, same as with your code.
Is your issue that you want to discuss the thing you are collaborating on outside of the tool you are creating it in?
taeric 7 hours ago [-]
This feels in line with my point? Versioning of documents is better done using other tools. Correspondence is fine over email.
We have some tools integrated with email to help version control things. But the actual version control is, strictly, not the emails.
packetlost 17 hours ago [-]
Yup, I agree there. Email is a horrible means of collaborating on changes in general, but doubly so in realtime. But so is IM.
SoftTalker 18 hours ago [-]
Yes, agree. Chatting with a computer has all the worst attributes of talking to a person, without any of the intuitive understanding, nonverbal cues, even tone of voice, that all add meaning when two human beings talk to each other.
TeMPOraL 17 hours ago [-]
That comment made sense 3 years ago. LLMs already solved "intuitive understanding", and the realtime multimodal variants (e.g. the thing behind "Advanced Voice" in the ChatGPT app) handle tone of voice in both directions. As for nonverbal cues, I don't know yet - I got live video enabled in ChatGPT only a few days ago and didn't have time to test it, but I would be surprised if it couldn't read the basics of body language at this point.
Talking to a computer still sucks as a user interface - not because a computer can't communicate on multiple channels the way people do, as it can do that now too. It sucks for the same reason talking to people sucks as a user interface - because the kind of tasks we use computers for (and that aren't just talking with/to/at other people via electronic means) are better handled by doing than by talking about them. We need an interface to operate a tool, not an interface to an agent that operates a tool for us.
As an example, consider driving (as in, realtime control - not just "getting from point A to B"): a chat interface to driving would suck just as badly as being a backseat driver sucks for both people in the car. In contrast, a steering wheel, instead of being a bandwidth-limiting indirection, is an anti-indirection - not only does it let you control the machine with your body, the control is direct enough that over time your brain learns to abstract it away, and the car becomes an extension of your body. We need more tangible interfaces like that with computers.
The steering wheel case, of course, would fail with "AI-level smarts" - but that still doesn't mean we should embrace talking to computers. A good analogy is dance - it's an interaction between two independently smart agents exploring an activity together, and as they do it enough, it becomes fluid.
So dance, IMO, is the steering wheel analogy for AI-powered interfaces, and that is the space we need to explore more.
ryandrake 17 hours ago [-]
> We need an interface to operate a tool, not an interface to an agent that operates a tool for us.
Excellent comment and it gets to the heart of something I've had trouble clearly articulating: We've slowly lost the concept that a computer is a tool that the user wields and commands to do things. Now, a computer has its own mind and agency, and we "request" it to do things and "communicate" with it, and ask it to run this and don't run that.
Now, we're negotiating and pleading with the man inside of the computer, Mr. Computer, who has its own goals and ambitions that don't necessarily align with your own as a user. It runs what it wants to run, and if that upsets you, user, well tough shit! Instead of waiting for a command and then faithfully executing it, Mr. Computer is off doing whatever the hell he wants, running system applications in the background, updating this and that, sending you notifications, and occasionally asking you for permission to do even more. And here you are as the user, hobbled and increasingly forced to "chat" with it to get it to do what you want.
Even turning your computer off! You used to throw a hardware switch that interrupts the power to the main board, and _sayonara_ Mr. Computer! Now, the switch does nothing but send an impassioned plea to the operating system to pretty please, with sugar on top, when you're not busy could you possibly power off the computer (or mostly power it off, because off doesn't even mean off anymore).
xp84 16 hours ago [-]
This is a great observation. I've mostly thought of it, not in relation to AI, but in relation to the way Apple and, to a lesser extent, Microsoft act like they are the owners of the computers we "buy." An update will be installed now. Your silly user applications will be closed by force if necessary. System stability depends on it!
The modern OS values the system's theoretical 'system health' metrics far above things like "whether the user can use it to do some user task."
Another great example is how you can't boot a modern Mac laptop, on AC power, until it has decided its battery is sufficiently charged. Why? None of your business.
Anyway to get back on topic, this is an interesting connection you've made, the software vendor will perhaps delegate decisions like "is the user allowed to log into the computer at this time" or "is a reboot mandatory" to an "agent" running on the computer. If we're lucky we'll get to talk to that agent to plead our case, but my guess is Apple and Microsoft will decide we aren't qualified to have input to the decisions.
ryandrake 16 hours ago [-]
An example of where this is going is Apple's so-called "System Integrity Protection"[1] which is essentially an access level to system files that's even higher than root. It's Apple arrogantly protecting "their" system from the user, even from the root user:
> System Integrity Protection is designed to allow modification of these protected parts only by processes that are signed by Apple and have special entitlements to write to system files, such as Apple software updates and Apple installers.
Only Apple can be trusted to operate what is supposed to be your computer.
Which is why I love my freebsd installation (and before that Alpine Linux) and why I develop on a VM on macOS. I can trivially modify the system components to get the behavior that I need. I consider macOS as a step up from ChromeOS, but not a general purpose computer OS. Latest annoyance was the fact that signing out of Books.app signs you out of the App Store (I didn’t want epubs to be synced).
Karrot_Kream 15 hours ago [-]
> Now, a computer has its own mind and agency, and we "request" it to do things and "communicate" with it, and ask it to run this and don't run that.
FWIW this is what happens with modern steering wheels as well. Power steering is its own complicated subsystem that isn't just about user input. It has many more failure modes than an old-fashioned, analog steering wheel. The reason folks feel like "Mr. Computer" has a mind of its own is the mismatch between user desire and effect. This is a UX problem.
I also think chat and RAG are the two biggest UX paradigms we've spent time exploring when it comes to LLMs. It's probably worth folks exploring other UX for LLMs that are enabling for the user. Suggestions in documents and code seem to be a UX that more people enjoy using, but even then there's a mismatch.
smj-edison 16 hours ago [-]
This is one reason I love what Bret Victor has been doing with Dynamic Land[1]. He's really been going all in on trying to engage as many senses as possible and make the whole system understandable. One of his big points is that the future of technology is helping us understand more, not defer our understanding to something else.
I think this gets to how a lot of these conversations go past each other? A chat interface for getting a ride from a car is almost certainly doable? So long as the itinerary and other details remain separate things? At large, you are basically using a chat bot to be a travel agent, no?
But, as you say, a chat interface would be a terrible way to actively drive a car. And that is a different thing, but I'm growing convinced many will focus on the first idea while staving off the complaints of the latter.
In another thread, I assert that chat is probably a fine way to order up something that fits a repertoire that trained a bot. But, I don't think sticking to the chat window is the best way to interface with what it delivers. You almost certainly want to be much more actively "hands on" in very domain specific ways with the artifacts produced.
TeMPOraL 12 hours ago [-]
> But, I don't think sticking to the chat window is the best way to interface with what it delivers. You almost certainly want to be much more actively "hands on" in very domain specific ways with the artifacts produced.
Yes, this is what I've also tried to hint at in my comment, but failed part-way. In most of the cases I can imagine chat interface to be fine (or even ideal), it's really only good as a starting point. Take two examples based on your reply:
1) Getting a car ride. "Computer, order me a cab home" is a good start. It's even OK if I then get asked to narrow it down between several different services/fares (next time I'll remember to specify that up front). But if I want to inspect the route (or perhaps adjust it, in a hypothetical service that supports it), I'd already prefer an interactive map I can scroll and zoom, with PoIs I can tap on to get their details, than to continue a verbal chat.
2) Ordering food in a fast food restaurant. I'm fine starting it with a conversation if I know what I want. However, getting back the order summary in prose (or worse, read out loud) would already be taxing, and if I wanted to make final adjustments, I'd beg for buttons and numeric input boxes. And, in case I don't know what I want, or what is available (and at what prices), a chat interface is a non-starter. Interactive menu is a must.
You sum this up perfectly:
> You almost certainly want to be much more actively "hands on" in very domain specific ways with the artifacts produced.
Chat may be great to get that first artifact, but afterwards, there's almost always a more hands-on interface that would be much better.
taeric 12 hours ago [-]
Oh, apologies, I meant my post to be a highlight of how I agree with you! Your post is great!
aylmao 17 hours ago [-]
I would also call it having all the worst attributes of a CLI, without the succinctness, OS integration, and program composability of one.
1ucky 17 hours ago [-]
You should check out MCP by Anthropic, which solves some of the issues you mentioned.
taeric 18 hours ago [-]
Yeah, this is something I didn't make clear on my post. Chat between people is the same bad UI. People read in the aggression that they bring to their reading. And get mad at people who are legit trying to understand something.
You have some of the same problems with email, of course. Losing threading, in particular, made things worse. It was a "chatification of email" that caused people to lean in to email being bad. Amusing that we are now seeing chat applications rise to replace email.
SoftTalker 12 hours ago [-]
Yeah this is part of why RTO is not an entirely terrible idea. Remote work has these downsides -- working with another person over a computer link sucks pretty hard, no matter how you do it (not saying WFH doesn't have other very real upsides).
taeric 12 hours ago [-]
Agreed.
I'm actually in an awkward position where I was very supportive of RTO two years ago, but have since become very reliant on some things I could not do with a rigid RTO policy.
Regardless of RTO or WFH, patience and persistence remain vital qualities.
Suppafly 17 hours ago [-]
I like the idea of having a chat program, the issue is that it's horrible to have a bunch of chat programs all integrated into every application you use that are separate and incompatible with each other.
I really don't like the idea of chatting with an AI though. There are better ways to interface with AIs and the focus on chat is making people forget that.
tux1968 17 hours ago [-]
We need an LSP like protocol for AI, so that we can amortize the configuration over every place we want such an integration. AISP?
Midjourney is an interesting case study in this I think, building their product UI as a discord bot. It was interesting to be sure, but I always felt like I was fighting the "interface" to get things done. It certainly wasn't all bad, and I think if I used it more it might even be great, but as someone who doesn't use Discord other than that and only rarely generated images, I had to read the docs every time I wanted to generate an image, which is a ridiculous amount of friction.
joe_guy 18 hours ago [-]
There has recently been a pretty large UI inclusion for midjourney directly inside Discord which has the option of being used instead of the text input.
As is often the case in these sorts of things, your mileage may vary for the more complex settings.
ijk 18 hours ago [-]
I'm curious if you find their new website interface more tractable--there's some inherent friction to the prompting in either case, but I'd like to know if the Discord chat interface can be overcome by using a different interface or if the issue is more intrinsic.
troupo 13 hours ago [-]
Their website UI is great. Discord is nigh unusable
dapperdrake 18 hours ago [-]
Email threads seem better for documenting and searching correspondence.
The last counter argument I read got buried on Discord or Slack somewhere.
jayd16 18 hours ago [-]
Isn't this entirely an implementation detail of Slack and Discord search? What about email makes it more searchable fundamentally? The metadata of both platforms is essentially the same, no?
NovemberWhiskey 17 hours ago [-]
I think this depends very much on how you use the tools.
My experience with email is that people have subject lines, email explicitly identifies to and cc recipients; email is threaded; email often has quotes/excerpting/highlighting from prior parts of the thread.
On the other hand, most chat usage I see is dependent on temporal aspects for threading (people under-utilize platform features for replies etc), tagging is generally only done to ping people to attract attention, chat groups are frequently reused for multiple different purposes.
Leaping to a point-in-time within a chat stream is often a bad user experience, with having to scroll up and down through unrelated stuff to find what you’re looking for.
Stuff in email is just massively more discoverable for me.
mrweasel 17 hours ago [-]
No, it has to do with context. In an email you will frequently have to provide more context for your answers to make sense. Chat is a conversation, which search drops you straight into; maybe with AI you could get placed at an appropriate starting point, but you're still reading a conversation. It's much easier to get dropped into a correspondence. To me the difference is like reading someone's letter vs. overhearing a conversation on a bus.
This obviously assumes that whoever wrote the email isn't a madman who insists on using email like it was a chat.
layer8 17 hours ago [-]
What makes email more useful in general is that each email is a separate object that you can organize in any way you want, i.e. move, copy, rename, sort into folders, attach as a file to any calendar entry, todo item, etc., or indeed to any other email. You can forward them to any other recipient, you can add and remove any recipient to and from the conversation at any time. It is conceptually powerful and flexible in a similar way that files in a file system are a powerful and flexible way to organize data. And it is easy to understand.
While all of these features could in principle be realized in a chat system as well, in practice they don’t provide that flexibility and power.
Another usability feature of emails is that they have a subject line. This makes it possible to list emails meaningfully in a compact fashion. In a desktop interface, you can easily view and visually grep 50 emails or more at once in a mail folder or list of search results (in something like Outlook or Thunderbird or Mutt). This allows working with emails more efficiently than with a chat view, where you can only see a few messages at once, and only from the same thread or channel.
Yet another usability feature of emails is that each email has its own read/unread status. This, again, is facilitated by each email being its own separate data object, and by the separation between subject and body, which allows the read status to be unambiguously bound to “opening” the email, or to navigating in the list of emails alongside a preview pane. And you can mark any email as unread again. In chats, the granularity of read/unread is the whole chat, whether you’ve actually read all of it or not. You can’t easily track what you’ve read or not in an automated way as with email, other than by that coarse-grained linear time-based property of when you last visited the channel.
jerjerjer 16 hours ago [-]
Accessing Thunderbird via JDBC from my favorite SQL client was so convenient. No messaging app search is even remotely close to what a simple SELECT/WHERE can do. Old Skype versions also stored chat info in an SQLite db. I wish I'd still have SQL access to my messages.
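For example, the kind of query that's trivial against such a store and impossible in most chat UIs (the schema and file name here are hypothetical, not the actual Skype or Thunderbird ones):

```python
# Hypothetical messages(sender, sent_at, body) table in an SQLite chat/mail store.
import sqlite3

con = sqlite3.connect("chat-history.db")
rows = con.execute(
    "SELECT sent_at, sender, body FROM messages "
    "WHERE body LIKE ? AND sender = ? ORDER BY sent_at",
    ("%staging deploy%", "alice"),
)
for sent_at, sender, body in rows:
    print(sent_at, sender, body)
```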
slongfield 18 hours ago [-]
Personally, when I send an email, I feel less time pressure to respond, so I more carefully craft my responses. The metadata is similar enough, but the actual data in email/forums is usually better.
al_borland 17 hours ago [-]
I find things get buried just as easily in email. People on my team are constantly resending each other emails, because they can’t find the thread.
This is why, if something is important, I take it out of email and put it into a document people can reference. The latest and correct information from all the decisions in the thread can also be collected in one place, so everyone reading doesn’t have to figure it out. Not to mention side conversations can influence the results, without being explicitly stated in the email thread.
kmoser 17 hours ago [-]
> This is why, if something is important, I take it out of email and put it into a document people can reference.
This is how things should be done, regardless of which medium is used to discuss the project. Without isolating and aggregating the final decision of each thread, there is no way to determine what everybody has agreed upon as the final product without looking back, which quickly becomes onerous.
Things get messy when you start having different versions of each feature, but that doesn't change the concept of using email/Slack/Discord/text/etc. for discussion and a separate "living" document for formalizing those decisions.
6510 16 hours ago [-]
Let's toss in a minimum number of [digital] signatures.
HappMacDonald 16 hours ago [-]
We had this problem in our organization circa 20 years back so I built a ticketing system, now each conversation exists as its own object, and "the same thing being discussed twice" has the opportunity to be merged into one, etc. That seems to have helped a lot with our internal conversations.
taeric 18 hours ago [-]
Discord and slack baffle me. I liked them specifically because they were more ephemeral than other options. Which, seems at odds with how people want them to be? Why?
wizzard0 18 hours ago [-]
Can't say for everyone, but I have terrible memory and rely heavily on the chat history (and other tools) to keep my mental model in shape.
Here, ephemeral means "this conversation might as well never have happened", so why waste time on that?
taeric 18 hours ago [-]
I suspect it has to do with mental models. For my model, at large, conversations are worthless. Anyone that tries to hold you to a conversation from weeks ago that didn't secure a stronger commitment is almost certainly flying loose and more than willing to selectively choose what they want to be committed to.
Does that mean I can't have some pleasure in conversing about things? Of course not. But, I also enjoy some pleasure there from the low stakes and value that a conversation has. It should be safe to be wrong. If you have a conversation spot where being wrong is not safe, then I question what is the advantage of that over trying to adopt a legalese framework for all of your communication?
TeMPOraL 12 hours ago [-]
My preferences are the opposite, but my mental frame is more about utility than about safety. I'm not worried about someone fishing for something I said that could be construed as commitment or admission - they can just as easily do that with e-mail[0]. For me, conversations can be extremely valuable, and I gravitate towards people and places where that's a common case. HN is one of such places - the comment threads here are conversations (half-way in form between chat and e-mail), and they often are valuable, as people often share deep insights, interesting ideas, worthwhile advice and useful facts. Because they're valuable, my instinct is that they need to be preserved, so that myself and others can find those gems again, or (re)discover them when searching for solutions, or read again to reevaluate, etc.
So now imagine such (idealized) HN threads transplanted to Discord or Slack. Same people, same topics, same insights, just unrolling in the form of a regular chat. All that value, briefly there to partake in, and then forever lost after however much time it takes for it to get pushed up a few screens worth of lines in the chat log. People don't habitually scroll back very far on a regular basis (and the UI of most chat platforms starts to rapidly break down if you try), and the lack of defined structure (bounded conversations labeled by a topic) plus weak search tools means you're unlikely to find a conversation again even if you know where and when it took place.
That, plus ephemeral nature of casual chat means not just the platform, but also some of the users expect it to quickly disappear, leading to what I consider anti-features such as the ability to unilaterally edit or unsend any message at arbitrary time in the future. It takes just one participant deciding, for whatever reason, to mass-delete their past messages, for many conversations to lose most of their value forever.
--
[0] - Especially that the traditional communication style, both private and business, is overly verbose. Quite like a chat, in fact, but between characters in a theatrical play - everyone has longer lines.
taeric 11 hours ago [-]
I think this is fair. And I should be clear that I'm not so worried about someone digging to find something stupid I said here on HN. Or in a chat. I'm more thinking about people that are afraid of saying something stupid, to the point that they just don't engage.
I think my mental model is more for chat rooms to take the place of coffee room chats. Ideally, some of those do push on something to happen. I'm not sure that forcing them into the threaded structure of conversations really helps, though?
Maybe it is based on the aim? If the goal is a simulacrum of human contact, then I think ephemeral makes a ton of sense.
I also kind of miss the old tradition of having a "flamewars" topic in newsgroups. I don't particularly love yelling at each other, but I do hate that people can't bring up some topics.
(I also miss some old fun newsgroups. I recall college had Haiku and a few other silly restrictive style groups that were just flat fun.)
mrweasel 17 hours ago [-]
I really don't get why people are so happy about Slack (never used Discord). The interface is awful, it barely functions as a chat client, yet people add bots and automation and use it as a repository for documentation. Honestly it would be better if history was deleted weekly or something, just to prevent people from storing things in Slack.
parasubvert 7 hours ago [-]
It's the opposite in my experience: it's the best parts of IRC, and the history is gold. Storing things in Slack is one of the most useful bits of it. I've seen several multi-billion dollar companies build most of their collaboration across offices around Slack.
jayd16 18 hours ago [-]
Were these ever ephemeral? Are you misremembering history free IRC chat rooms?
taeric 18 hours ago [-]
Fair that they were probably less ephemeral than I have them in my mental model. Which, as you guessed, was largely from them taking up the same spot as a slack (edit: I meant irc) instance in my mind. Slack, in particular, often had policies applied so that messages were deleted after a set time frame. I remember people complaining, but that seemed legit to me and fit my model.
I also confess this model of ephemeral conversation is amusing in this specific website. Which I also largely view as a clubhouse conversation that is also best viewed as ephemeral. But it is clearly held for far longer than that idea would lead me to think.
65 17 hours ago [-]
Oh, how nice it must be to complain about Slack. Try using Teams and you will never want to complain about Slack again.
jerjerjer 16 hours ago [-]
Slack is way worse than Teams. I honestly rather dislike both and would rather use email only, but I will pick Teams over Slack any time.
parasubvert 7 hours ago [-]
This is, to put it mildly, a minority opinion. I don't hate Teams as much as most people do but my old (big and small) companies had both Slack and Teams and about 40% of employees had Teams statuses of "ping me on Slack I refuse to use Teams".
chinathrow 18 hours ago [-]
Voice messages within a chat UI are even worse. I can't search them, and I can't listen to them in the same situations in which I can read a message.
I wish I could block them within all these chat apps.
"Sorry, you can't bother to send voice messages to this person."
taeric 18 hours ago [-]
Oh dear lord yes. I am always baffled when I hear that some folks send voice memos to people.
tpmoney 17 hours ago [-]
I disagree. Chat is a fantastic UI for getting an AI to generate something vague. Specifically I’m thinking of AI image generation. A chat UI is a great interface for iterating on an image and dialing it in over a series of iterations. The key here is that the AI model needs to keep context both of the image generation history and that chat history.
I think this applies to any “fuzzy generation” scenario. It certainly shouldn’t be the only tool, and (at least as it stands today) isn’t good enough to finalize and fine tune the final result, but a series of “a foo with a bar” “slightly less orange” “make the bar a bit more like a fizzbuzz” interactions with a good chat UI can really get a good 80% solution.
But like all new toys, AI and AI chat will be hammered into a few thousand places where it makes no sense until the hype dies down and we come up with rules and guidelines for where it does and doesn’t work
badsectoracula 16 hours ago [-]
> Specifically I’m thinking of AI image generation
I heavily disagree here, chat - or really text - is a horrible UI for image generation, unless you have almost zero idea of what you want to achieve and you don't really care about the final results.
Typing "make the bar a bit more like a fizzbuzz" in some textbox is awful UX compared to, say, clicking on the "bar" and selecting "fizzbuzz" or drag-and-dropping "fizzbuzz" on the "bar" or really anything that takes advantage of the fact we're interacting with a graphical environment to do work on graphics.
In fact it is a horrible UI for anything, except perhaps chatbots and tasks that have to do with text like grammar correction, altering writing styles, etc.
It is helpful for impressing people (especially people with money) though.
tpmoney 12 hours ago [-]
> Typing "make the bar a bit more like a fizzbuzz" in some textbox is awful UX compared to, say, clicking on the "bar" and selecting "fizzbuzz" or drag-and-dropping "fizzbuzz" on the "bar" or really anything that takes advantage of the fact we're interacting with a graphical environment to do work on graphics.
That assumes that you have a UX capable of determining what you're clicking on in the generated image (which we could call a given if we assume a sufficiently capable AI model since we're already instructing it to alter the thing), and also that it can determine from your click that you've intended to click on the "foo" not the "greeble" that is on the foo or the shadow covering that part of the foo or anything else that might be in the same Z stack as your intended target. Pixel bitching adventure games come to mind as an example of how badly this can go for us. And yes, this is solvable, Squeak has a UI where repeatedly clicking in the same spot will iterate through the list of possibilities in that Z stack. But it could also get really messy really quickly.
Then we have to assume that your UX will be able to generate an entire list of possible things you might want to be able to to do with that thing that you've clicked, including adding to it, removing it, removing part of it, moving it, transforming its dimensions, altering its colors, altering the material surface and on and on and on. And that list of possibilities needs to be navigable and searchable in a way that's faster than just typing "make the bar more like a fizzbuzz" into a context aware chat box.
Again, I'm not arguing the chat interface should be the only interface. In fact, as you point out we're using a graphical system, it would be great if you could click on things or select them and have the system work on too. It should be able to take additional input than just chat. But I still think for iterating on a fuzzy idea, a chat UI is a useful tool.
beambot 18 hours ago [-]
As a written form of "stream of consciousness", it seems to have a lot of value to me. It's noisy, inefficient & meandering -- all the things those polished artifacts are not -- but it's also where you can explore new avenues without worrying about succinctness or completeness. It's like the first draft of a manuscript.
taeric 18 hours ago [-]
Certainly, it can have its use. But I question if it is stronger than previous generative techniques for creating many things. There have been strong tools where you could, for example, draw a box and say this should be a house. Add N rooms. This room should be a bathroom. Add windows to these rooms. Add required subfloor and plumbing.
Even with game development. Level editors have a good history for being how people actually make games. Some quite good ones, I should add.
For website development, many template based systems worked quite well. People seem hellbent on never acknowledging that form builders of the late 90s did, in fact, work.
Is it a bit nicer that you can do everything through a dialog? I'm sure it is a great for people that think that way.
t_mann 17 hours ago [-]
Ok, but what is a good pattern to leverage AI tools for coding (assuming that they have some value there, which I think most people would agree with now)? I could see two distinct approaches:
- "App builders" that use some combination of drag&drop UI builders, and design docs for architecture, workflows,... and let the UI guess what needs to be built "under the hood" (a little bit in the spirit of where UML class diagrams were meant to take us). This would still require actual programming knowledge to evaluate and fix what the bot has built
- Formal requirement specification that is sufficiently rigorous to be tested against automatically. This might go some way towards removing the requirement to know how to code, but the technical challenge would simply shift to knowing the specification language
taeric 17 hours ago [-]
I'd challenge if this is specific to coding? If you want to get a result that is largely like a repertoire of examples used in a training set, chat is probably workable? This is true for music. Visual art. Buildings. Anything, really?
But, if you want to start doing "domain specific" edits to the artifacts that are made, you are almost certainly going to want something like the app builders idea. Down thread, I mention how this is a lot like procedural generative techniques for game levels and such. Such that I think I am in agreement with your first bullet?
Similarly, if you want to make music with an instrument, it will be hard to ignore playing with said instrument more directly. I suspect some people can create things using chat as an interface. I just also suspect directly touching the artifacts at play is going to be more powerful.
I think I agree with the point on formal requirements. Not sure how that really applies to chat as an interface? I think it is hoping for a "laws of robotics" style that can have a test to confirm them? Reality could surprise me, but I always viewed that as largely a fiction item.
kiitos 14 hours ago [-]
I've yet to see any AI/LLM produce code that withstands even basic scrutiny.
lucasyvas 17 hours ago [-]
Disclaimer: Haven't used the tools a lot yet, just a bit. So if I say something that already exists, forgive me.
TLDR: Targeted edits and prompts / Heads Up Display
It should probably be more like an overlay (and hooked into context menus with suggestions, inline context bubbles when you want more context for a code block) and make use of an IDE problems view. The problems view would have to be enhanced to allow it to add problems that spanned multiple files, however.
Probably like the Rust compiler output style, but on steroids.
There would likely be some chatting required, but it should all be at a particular site in the code and then go into some history bank where you can view every topic you've discussed.
For authoring, I think an interactive drawing might be better, allowing you to click on specific areas and then use shorter phrasing to make an adjustment instead of having an argument in some chat to the left of your screen about specificity of your request.
Multi-point / click with minimal prompt. It should understand based on what I clicked what the context is without me having to explain it.
staplers 17 hours ago [-]
> Ok, but what is a good pattern to leverage AI tools for coding?
Actual product stakeholders are not likely to spill their magic sauce and give free consultancy.
swiftcoder 17 hours ago [-]
"Actual product stakeholders" in this space clearly don't actually have any magic sauce to spill. Everyone is building more or less the same chat-based workflows on the same set of 3rd-party LLMs.
The space is ripe for folks with actual domain expertise to design an appropriate AI workflow for their domain.
FloorEgg 16 hours ago [-]
I have magic sauce that I haven't spilled yet.
t_mann 15 hours ago [-]
That's no reason to not discuss potentially cool ideas, unless you think their input is so indispensable that any debate is futile without them.
gagik_co 17 hours ago [-]
I think “correspondence UX” can be bad UX but there’s nothing inherently wrong with chat UI.
I created the tetr app[1] which is basically “chat UI for everything”. I did that because I used to message myself notes and wanted to expand it to many more things. There’s not much back and forth, usually 1 input and instant output (no AI), still acting like a chat.
I think there’s a lot of intuitiveness with chat UI and it can be a flexible medium for sharing different information in a similar format, minimizing context switching. That’s my philosophy with tetr anyhow.
It's usually not. Narrative is a famously flawed way to communicate or record the real world.
It's great for generating engagement, though.
sangnoir 17 hours ago [-]
> Narrative is a famously flawed way to communicate or record the real world.
...and yet with its flaws, it's the most flexible in conveying meaning. A Ted Chiang interview was on the HN frontpage a few days ago; in it, he mentions that humans created multiple precise, unambiguous communication modes, like the equations used in mathematical papers and proofs. But those same papers are not 100% equations; the mathematicians have to fall back to flawed language to describe and provide context, because those formal languages only capture a smaller range of human thought compared to natural language.
This is not to say chat has the best ergonomics for development - it doesn't - but one has to remember that the tools are based on Large Language Models whose one trick is manipulating language. Better ergonomics would likely come from models trained or fine-tuned on AST tokens and diffs. They'd still need to modulate on language (understanding requirements, hints, variable names, and authoring comments, commits and/or PRs).
taeric 17 hours ago [-]
I think fictional narratives that aim to capture inner monologue are famously flawed. I think narrative tours of things can be good. I'm not clear if "narrated tours" are a specific genre, sadly. :(
OJFord 18 hours ago [-]
I don't know, I'm in Slack all day with colleagues, I quite like having the additional ChatGPT colleague (even better I can be quite rude/terse in my messages with 'them').
Incidentally I think that's also a good model for how much to trust the output - you might have a colleague who knows enough about X to think they can answer your question, but they're not necessarily right, you don't blindly trust it. You take it as a pointer, or try the suggestion (but not surprised if it turns out it doesn't work), etc.
taeric 18 hours ago [-]
Oh, do not take my comment as a "chat bots shouldn't exist." That is not at all my intent. I just think it is a bad interface for building things that are self contained in the same chat log.
Sylamore 13 hours ago [-]
NC DMV replaced their regular forms with a chat bot and it's horrible. Takes forever to complete tasks that used to take less than a minute because of the fake interaction and fake typing. Just give me a damn form to pay my taxes or request a custom plate.
brobdingnagians 17 hours ago [-]
Similar thing I've run into lately: chat is horrible for tracking issues and tasks. When people try to use it that way, it becomes absolute chaos after a while.
dartos 18 hours ago [-]
Preach!
I’ve been saying this since 2018
varispeed 17 hours ago [-]
Talk to the AI chat as you would talk to a junior developer at your company, and tell it to do something that you need.
I think it is brilliant. On the other hand, I have caught myself many times writing prompts to colleagues, although it made the requirements of what I need so much clearer for them.
themanmaran 18 hours ago [-]
I'm surprised that the article (and comments) haven't mentioned Cursor.
Agreed that copy pasting context in and out of ChatGPT isn't the fastest workflow. But Cursor has been a major speed up in the way I write code. And it's primarily through a chat interface, but with a few QOL hacks that make it way faster:
1. Output gets applied to your file in a git-diff style. So you can approve/deny changes.
2. It (kinda) has context of your codebase so you don't have to specify as much. Though it works best when you explicitly tag files ("Use the utils from @src/utils/currency.ts")
3. Directly inserting terminal logs or type errors into the chat interface is incredibly convenient. Just hover over the error and click the "add to chat"
dartos 18 hours ago [-]
I think the wildly different experiences we all seem to have with AI code tools speaks to the inconsistency of the tools and our own lack of understanding of what goes into programming.
I’ve only been slowed down with AI tools. I tried for a few months to really use them and they made the easy tasks hard and the hard tasks opaque.
But obviously some people find them helpful.
Makes me wonder if programming approaches differ wildly from developer to developer.
For me, if I have an automated tool writing code, it’s bc I don’t want to think about that code at all.
But since LLMs don’t really act deterministically, I feel the need to double check their output.
That’s very painful for me. At that point I’d rather just write the code once, correctly.
kenjackson 16 hours ago [-]
I use LLMs several times a day, and I think for me the issue is that verification is typically much faster than learning/writing. For example, I've never spent much time getting good at scripting. Sure, probably a gap I should resolve, but I feel like LLMs do a great job at it. And what I need to script is typically easy to verify, I don't need to spend time learning how to do things like, "move the files of this extension to this folder, but rewrite them so that the name begins with a three digit number based on the date when it was created, with the oldest starting with 001" -- or stuff like that. Sometimes it'll have a little bug, but one that I can debug quickly.
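For that particular example, the script an LLM hands back looks roughly like this (a sketch; the paths and extension are placeholders):

```python
# Move files of a given extension into a folder, prefixing each name with a
# three-digit number by creation-date order, oldest first (001_, 002_, ...).
from pathlib import Path
import shutil

src, dst, ext = Path("inbox"), Path("sorted"), ".pdf"
dst.mkdir(exist_ok=True)

# st_ctime is creation time on Windows and metadata-change time on Unix.
files = sorted(src.glob(f"*{ext}"), key=lambda p: p.stat().st_ctime)

for i, f in enumerate(files, start=1):
    shutil.move(str(f), dst / f"{i:03d}_{f.name}")
```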
Scripting assistance by itself is worth the price of admission.
The other thing I've found it good at is giving me an English description of code I didn't write... I'm sure it sometimes hallucinates, but never in a way that has been so wrong that it's been apparent to me.
shaan7 15 hours ago [-]
I think you and the parent comment are onto something. I also feel like the parent since I find it relatively difficult to read code that someone else wrote. My brain easily gets biased into thinking that the cases that the code is covering are the only possible ones. On the flip side, if I were writing the code, I am more likely to determine the corner cases.
In other words, writing code helps me think, reading just biases me. This makes it extremely slow to review a LLM's code at which point I'd just write it myself.
Very good for throwaway code though, for example a PoC which won't really be going to production (hopefully xD).
skydhash 15 hours ago [-]
Your script example is a good one, but the nice thing about scripting is when you learn the semantics of it, like the general pattern of find -> filter/transform -> select -> action. It's very easy to come up with a one-liner that can be trivially modified to adapt it to another context. More often than not, I find LLMs generate overly complicated scripts.
lukeschlather 13 hours ago [-]
It's astounding how often I ask an LLM to generate some thing, do a little more research, come back and I'm ready to use the code it generated and I realize, no, it's selected the wrong flags entirely.
Although most recently I caught it because I fed it into both gpt-4o and o1 and o1 had the correct flags. Then I asked 4o to expand the flags from the short form to the long form and explain them so I could double-check my reasoning as to why o1 was correct.
sangnoir 17 hours ago [-]
> But since LLMs don’t really act deterministically, I feel the need to double check their output.
I feel the same
> That’s very painful for me. At that point I’d rather just write the code once, correctly.
I use AI tools augmentatively, and it's not painful for me, perhaps slightly inconvenient. But for boiler-plate-heavy code like unit tests or easily verifiable refactors[1], adjusting AI-authored code on a per-commit basis is still faster than me writing all the code.
1. Like switching between unit-test frameworks
aprilthird2021 18 hours ago [-]
I think it's about what you're working on. It's great for greenfield projects, etc. Terrible for complex projects that plug into a lot of other complex projects (like most of the software those of us not at startups work on day to day)
dartos 18 hours ago [-]
It’s been a headache for my greenfield side projects and for my day to day work.
Leaning on these tools just isn’t for me rn.
I like them most for one off scripts or very small bash glue.
lolinder 18 hours ago [-]
I like Cursor, but I find the chat to be less useful than the super advanced auto complete.
The chat interface is... fine. Certainly better integrated into the editor than GitHub Copilot's, but I've never really seen the need to use it as chat: I ask for a change and then it makes the change. Then I fix what it did wrong and ask for another change. The chat history aspect is meaningless and usually counterproductive, because it's faster for me to fix its mistakes than keep everything in the chat window while prodding it the last 20% of the way.
tarsinge 15 hours ago [-]
I was a very skeptic on AI assisted coding until I tried Cursor and experienced the super autocomplete. It is ridiculously productive. For me it’s to the point it makes Vim obsolete because pressing tab correctly finishes the line or code block 90% of the time. Every developer having an opinion on AI assistance should have just tried to download Cursor and start editing a file.
themanmaran 17 hours ago [-]
Agreed, the autocomplete definitely gets more mileage than the chat. But I frequently use it for terminal commands as well, especially AWS CLI work.
"how do I check the cors bucket policies on [S3 bucket name]"
fragmede 17 hours ago [-]
> while prodding it the last 20% of the way.
hint: you don't get paid to get the LLM to output perfect code, you get paid by PRs submitted and landed. Generate the first 80% or whatever with the LLM, and then finish the last 20% that you can write faster than the LLM yourself, by hand.
reustle 16 hours ago [-]
Depends on the company. Most of the time, you get paid to add features and fix bugs, while maintaining reliability.
End users don’t care where the code came from.
jeremyjh 17 hours ago [-]
That is exactly what GP was pointing out, and why they said they do not prod it for the last 20%.
koito17 13 hours ago [-]
I'm not familiar with Cursor, but I've been using Zed with Claude 3.5 Sonnet. For side projects, I have found it extremely useful to provide the entire codebase as context and send concise prompts focusing on a single requirement. Claude handles "junior developer" tasks well when each unit of work is clearly separated.
Zed makes it trivial to attach documentation and terminal output as context. To reduce risk of hallucination, I now prefer working in static, strongly-typed languages and use libraries with detailed documentation, so that I can send documentation of the library alongside the codebase and prompt. This sounds like a lot of work, but all I do is type "/f" or "/t" in Zed. When I know a task only modifies a single file, then I use the "inline assist" feature and review the diffs generated by the LLM.
Additionally, I have found it extremely useful to actually comment a codebase. LLMs are good at unstructured human language, it's what they were originally designed for. You can use them to maintain comments across a codebase, which in turn helps LLMs since they get to see code and design together.
Last weekend, I was able to re-build a mobile app I made a year ago from scratch with a cleaner code base, better UI, and implement new features on top (making the rewrite worth my time). The app in question took me about a week to write by hand last year; the rewrite took exactly 2 days.
---
As a side note: a huge advantage of Zed with locally-hosted models is that one can correct the code emitted by the model and force the model to re-generate its prior response with those corrections. This is probably the "killer feature" of models like qwen2.5-coder:32b. Rather than sending extra prompts and bloating the context, one can just delete all output from where the first mistake was made, correct the mistake, then resume generation.
stitched2gethr 17 hours ago [-]
I think this misses the point. It seems like the author is saying we should move from imperative instructions to a declarative document that describes what the software should do.
Imperative:
- write a HTTP server that serves jokes
- add a healthcheck endpoint
- add TLS and change the serving port to 443
Declarative:
- a HTTP server that serves jokes
- contains a healthcheck endpoint
- supports TLS on port 443
The differences here seem minimal because you can see all of it at once, but in the current chat paradigm you'd have to search through everything you've said to the bot to get the full context, including the side roads that never materialized.
In the document approach you're constantly refining the document. It's better than reviewing the code because (in theory) you're looking at "support TLS on port 443" instead of a lot of code, which means it can be used by a wider audience. And ideally I can give the same high level spec to multiple LLMs and see which makes the best application.
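To make the scale of the gap concrete, here's roughly what either version of the joke-server spec expands into (a hedged Python sketch; the cert paths, the healthcheck path, and the joke are placeholders):

```python
# Minimal HTTP server serving jokes, with a healthcheck endpoint and TLS on 443.
import http.server, json, random, ssl

JOKES = ["Why do programmers prefer dark mode? Because light attracts bugs."]

class Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthcheck":
            body = b"ok"
        else:
            body = json.dumps({"joke": random.choice(JOKES)}).encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# Port 443 needs elevated privileges; cert/key paths are placeholders.
server = http.server.HTTPServer(("0.0.0.0", 443), Handler)
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx.load_cert_chain("cert.pem", "key.pem")
server.socket = ctx.wrap_socket(server.socket, server_side=True)
server.serve_forever()
```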
ygouzerh 17 hours ago [-]
Good explanation! As an open reflection: will a declarative document be as detailed as the imperative version? Often, between the specs that the product team provides (which we can consider the "descriptive" document) and the implementation, many sub-specs have been created by the tech team that uncovered some important implementation details. It's like a rabbit hole.
For example, for a signup page, we could have:
- Declarative: Signup the user using their email address
- Imperative: To do the same, we will need to implement the SMTP library, which means discovering that we need an SMTP server, so now we need to choose which one. And when purchasing an SMTP server plan, we discover that there are rate limits, so now we need to add some bot protection to our signup page (IP rate limit only? ReCaptcha? Cloudflare bot protection?), etc.
Which means that at the end, the imperative code way is kind of like the ultimate implementation specs.
bze12 7 hours ago [-]
I could imagine a hybrid where declarative statements drive the high-level, and lower-level details branch off and are hashed out imperatively (in chat). Maybe those detail decisions then revise the declarative statements.
The source of truth would still be the code though, otherwise the declarative statements would get so verbose that they wouldn't be any more useful than writing the code itself.
skydhash 15 hours ago [-]
The issue is that there's no execution platform for declarative specs, so something will be translated to imperative, and that is where the issue lies. There's always an imperative core which needs to be deterministic, or its output needs to be verified. LLMs are not the former, and the latter option can take more time than just writing the code.
I used Continue before Cursor. Cursor’s “agent” composer mode is so much better than what Continue offered. The agent can automatically grep the codebase for relevant files and then read them. It can create entirely new files from scratch. I can still manually provide some files as context, but it’s not usually necessary. With Continue, everything was very manual.
Cursor also does a great job of showing inline diffs of what composer is doing, so you can quickly review every change.
I don’t think there’s any reason Continue can’t match these features, but it hadn’t, last I checked.
Cursor also focuses on sane defaults, which is nice. The tab completion model is very good, and the composer model defaults to Claude 3.5 Sonnet, which is arguably the best non-reasoning code model. (One would hope that Cursor gets agent-composer working with reasoning models soon.) Continue felt much more technical… which is nice for power users, but not always the best starting place.
freeone3000 17 hours ago [-]
It’s not nearly as slick. cursor’s indexing and integration are significant value-adds.
mkozlows 14 hours ago [-]
Windsurf is even moreso this way -- it'll look through your codebase trying to find the right files to inspect, it runs the build/test stuff and examines the output to see what went wrong.
I found interacting with it via chat to be super-useful and a great way to get stuff done. Yeah, sometimes you just have to drop into the code, and tag a particular line and say "this isn't going to work, rewrite it to do x" (or rewrite it yourself), but the ability to do that doesn't vitiate the value of the chat.
mholm 18 hours ago [-]
Yeah, the OP has a great idea, but models as-is can't handle that kind of workflow reliably. The article is both a year behind, and a year ahead at the same time.
The user must iterate with the chatbot, and you can't do that by just doing a top down 'here's a list of all features, get going, ping me when finished' prompt. AI is a junior engineer, so you have to treat it like a junior engineer, and that means looking through your chat logs, and perhaps backing up to a restore point and going a different direction.
mttrms 17 hours ago [-]
I've started using Zed on a side project and I really appreciate that you can easily manipulate the chat / context and continue making requests
It's still a "chat" but it's just text at the end of the day. So you can edit as you see fit to refine your context and get better responses.
notShabu 15 hours ago [-]
chat is the best way to orchestrate and delegate. whether or not this is considered "ME writing MY code" is imo a philosophical debate
e.g. executives treat the org as a blackbox LLM and chat w it to get real results
croes 19 hours ago [-]
Natural language isn't made to be precise; that's why we use a subset of it in programming languages.
So you either need lots of extra text to remove the ambiguity of natural language if you use AI, or you need a special precise subset to communicate with AI, and that's just programming with extra steps.
Klaster_1 18 hours ago [-]
A lot of extra text usually means prior requirements, meeting transcripts, screen share recordings, chat history, Jira tickets and so on - the same information developers use to produce a result that satisfies the stakeholders and does the job. This seems like a straightforward direction solvable with more compute and more efficient memory. I think this will be the way it pans outs.
Real projects don't require an infinitely detailed specification either, you usually stop where it no longer meaningfully moves you towards the goal.
The whole premise of AI developer automation, IMO, is that if a human can develop a thing, then AI should be able to as well, given the same input.
cube2222 18 hours ago [-]
We are kind of actually there already.
With a 200k token window like Claude has you can already dump a lot of design docs / transcripts / etc. at it.
rightisleft 18 hours ago [-]
Its all about the context window. Even the new Mistral Codestral-2501 256K CW does a great job.
If you use Cline with any large-context model the results can be pretty amazing. It's not close to self-guiding; you still need to break down and analyze the problem and provide clear and relevant instructions. I.e. you need to be a great architect. Once you are stable on the direction, it's awe-inspiring to watch it do the bulk of the implementation.
I do agree that there is space to improve over embedded chat windows in IDEs. Solutions will come in time.
selectodude 18 hours ago [-]
Issue I have with Cline that I don't run into with, say, Aider, is that I find Cline to be like 10x more expensive. The number of tokens it blows through is incredible. Is that just me?
mollyporph 18 hours ago [-]
And Gemini has 2m token window. Which is about 10 minutes of video for example.
layer8 17 hours ago [-]
This premise in your last paragraph can only work with AGI, and we’re probably not close to that yet.
throwaway290 18 hours ago [-]
idk if you think all those jira tickets and meetings are precise enough (IMO sometimes the opposite)
By the way, remind me why you need design meetings in that ideal world?:)
> Real projects don't require an infinitely detailed specification either, you usually stop where it no longer meaningfully moves you towards the goal.
The point was that specification is not detailed enough in practice. Precise enough specification IS code. And the point is literally that natural language is just not made to be precise enough. So you are back where you started
So you waste time explaining in detail and rehashing requirements in this imprecise language until you see what code you want to see. Which was faster to just... idk.. type.
Klaster_1 18 hours ago [-]
That's a fair point. I'd love to see Copilot come to the conclusion that it can't resolve a particular conundrum and communicate with other people so everyone makes a decision together.
falcor84 18 hours ago [-]
Even if you have superhuman AI designers, you still need buy-in.
uoaei 18 hours ago [-]
There's a nice thought, that anyone with that kind of power would share it.
oxfordmale 19 hours ago [-]
Yes, let's devise a more precise way to give AI instructions. Let's call it pAIthon. This will allow powers that be, like Zuckerberg to save face and claim that AI has replaced mid-level developers and enable developers to rebrand themselves as pAIthon programmers.
Joking aside, this is likely where we will end up, just with a slightly higher programming interface, making developers more productive.
dylan604 18 hours ago [-]
man, pAIthon was just sitting right there for the taking
Or a proposal/feedback process. À la: you are hired by a non-technical person to build something, you generate requirements and a proposed solution. You then propose that solution, and they give feedback.
Having a feedback loop is the only viable way to do this. Sure, the client could give you a book on what they want, but often people do not know their edge cases, what issues may arise, etc.
kokanee 18 hours ago [-]
> or you need a special precise subset to communicate with AI
haha, I just imagined sending TypeScript to ChatGPT and having it spit my TypeScript back to me. "See guys, if you just use Turing-complete logically unambiguous input, you get perfect output!"
dylan604 18 hours ago [-]
> and that’s just programming with extra steps.
If you know how to program, then I agree, and that's part of why I don't see the point. If you don't know how to program, then the prompt isn't much different than providing the specs/requirements to a programmer.
empath75 19 hours ago [-]
AIs actually are very good at this. They wouldn't be able to write code at all otherwise. If you're careful in your prompting, they'll make fewer assumptions and ask clarifying questions before going ahead and writing code.
9rx 18 hours ago [-]
> If you're careful in your prompting
In other words, if you replace natural language with a programming language then the computer will do a good job of interpreting your intent. But that's always been true, so...
benatkin 18 hours ago [-]
Being careful in your prompting doesn’t imply that. That can also be thought of as just using natural language well.
9rx 18 hours ago [-]
What separates natural language from programming language is that natural language doesn't have to be careful. Once you have to be careful, you are programming.
xboxnolifes 16 hours ago [-]
Does this mean that good communication skills are equivalent to programming?
benatkin 18 hours ago [-]
It does have to be careful at times if you’re going to be effective with natural language.
9rx 18 hours ago [-]
Certainly there is a need for care outside of computers too, like in law, but legal documents are a prime example of programs. That's programming, written using a programming language, not natural language. It is decidedly not the same language you would use for casual conversation and generally requires technical expertise to understand.
In other words, complex applications can still be fully specified in plain English, even if it might take more words.
9rx 14 hours ago [-]
> complex applications can still be fully specified in plain English
In plain English, of course, but not in natural English. When using language naturally one will leave out details, relying on other inputs, such as shared assumptions, to fill in the gaps. Programming makes those explicit.
kmoser 7 hours ago [-]
Programming only needs to make things as explicit as necessary based on the developer's desires and the system's assumptions. Where more detail is necessary, the programmer can add more code. For example, there's no need to tell a browser explicit rules for how users should be able to interact with an input field, since that's the browser's default behavior; you only need to specify different behavior when you want it to differ from the default.
Likewise for English: one can use natural English to add as many details as necessary, depending on who you're talking to, e.g. "Make an outline around the input field, and color the outline #ff0000." You can then add, if necessary, "Make the corners of the outline rounded with a 5 pixel radius."
In this respect, complex applications can be fully specified in English; we usually call those documents "formal specifications." You can write it in terse, non-natural language with consistent, defined terminology to save room (as most specs are), or in colloquial (natural) language if you really want. I wouldn't recommend the latter, but it's definitely useful when presenting specs to a less technically informed audience.
9rx 6 hours ago [-]
> complex applications can be fully specified in English
Of course. We established that at the beginning. The entire discussion is about exactly that. It was confirmed again in the previous comment. However, that is not natural. I expect most native English speakers would be entirely incapable of fully specifying a complex application or anything else of similar complexity. That is not natural use.
While the words, basic syntax, etc. may mirror that found in natural language, a specification is really a language of its own. It is nothing like the language you will find people speaking at the bar or when writing pointless comments on Reddit. And that's because it is a programming language.
kmoser 4 hours ago [-]
> I expect most native English speakers would be entirely incapable of fully specifying a complex application or anything else of similar complexity.
Your original postulation was that it simply wasn't possible, implying nobody could do it. The fact that most native English speakers wouldn't be able to do it doesn't mean nobody can do it.
I agree that most native English speakers wouldn't be able to write a reasonably complete spec in any type of language, not just because they lack the language skill, but because they simply wouldn't have the imagination and knowledge of what to create to begin with, let alone how to express it.
benatkin 18 hours ago [-]
People can often be observed to be deliberately making an effort in casual, social, natural language conversation. It flows for some people more than others. Try watching Big Bang Theory and see characters at times being deliberate with their words and at other times responding automatically.
An LLM can do increasingly well as a fly on the wall, but it’s common for people using an LLM to be less collaborative with an LLM and for them to expect the LLM to structure the conversation. Hence the suggestion to be careful in your prompting.
9rx 18 hours ago [-]
> at times being deliberate with their words and at other times responding automatically.
Right. On one side you have programming language and on the other natural language.
They can intermingle, if that is what you are trying to say? You can see this even in traditional computer programming. One will often switch between deliberate expression and casual, natural expression (what often get called comments in that context).
oxfordmale 18 hours ago [-]
AI is very good at this. Unfortunately, humans tend to be super bad at providing detailed verbal instructions.
indymike 18 hours ago [-]
Languages used for day-to-day communication between humans do not have the specificity needed for detailed instructions... even to other humans. We use out-of-band context (body language, social norms, tradition, knowledge of a person) quite a bit more than you would think.
nomel 14 hours ago [-]
Programming languages, which are human languages, are purpose-built for this. Anyone working in the domain of precise specifications uses them, or something very similar (for example, engineering, writing contracts, etc), often daily. ;)
They all usually boil down to a subset of English, because near-caveman speak is enough to define things with precision.
nomel 14 hours ago [-]
Then those same humans won't be able to reason about code, or the problem spaces they're working in, regardless, since it's all fundamentally about precise specifics.
foobiekr 8 hours ago [-]
I don’t think I’ve ever seen an llm in any context ask for clarification. Is that a real thing?
croes 15 hours ago [-]
AI is a little bit like Occam's razor: when you say hoofbeats, you get horses. Bad if you need zebras.
LordDragonfang 18 hours ago [-]
> they'll make fewer assumptions and ask clarifying questions before going ahead and writing code.
Which model are you talking about here? Because with ChatGPT, I struggle with getting it to ask any clarifying questions before just dumping code filled with placeholders I don't want, even when I explicitly prompt it to ask for clarification.
65 18 hours ago [-]
We're going to create SQL all over again, aren't we?
lelanthran 18 hours ago [-]
A more modern COBOL maybe.
9rx 17 hours ago [-]
So SQL?
thomastjeffery 17 hours ago [-]
Natural language can be precise, but only in context.
The struggle is to provide a context that disambiguates the way you want it to.
LLMs solve this problem by avoiding it entirely: they stay ambiguous, and just give you the most familiar context, letting you change direction with more prompts. It's a cool approach, but it's often not worth the extra steps, and sometimes your context window can't fit enough steps anyway.
My big idea (the Story Empathizer) is to restructure this interaction such that the only work left to the user is to decide which context suits their purpose best. Given enough context instances (I call them backstories), this approach to natural language processing could recursively eliminate much of its own ambiguity, leaving very little work for us to do in the end.
Right now my biggest struggle is figuring out what the foundational backstories will be, and writing them.
skydhash 14 hours ago [-]
That’s what programming languages are: you define a context, then you see that you can shorten the notation to symbolic characters. Something like “the symbol a will refer to a value of type string with content ‘abcd’ and cannot refer to anything else for its lifetime” gets you:
const a = "abcd"
That is called semantics. Programming is mostly fitting the vagueness inherent to natural languages to the precise context of the programming language.
thomastjeffery 13 hours ago [-]
Yes, but programming languages are categorically limited to context-free grammar. This means that every expression written in a programming language is explicitly defined to have precisely one meaning.
The advantage of natural language is that we can write ambiguously defined expressions, and infer their meaning arbitrarily with context. This means that we can write with fewer unique expressions. It also means that context itself can be more directly involved in the content of what we write.
In context-free grammar, we can only express "what" and "how"; never "why". Instead, the "why" is encoded into every decision of the design and implementation of what we are writing.
If we could leverage ambiguous language, then we could factor out the "why", and implement it later using context.
matthewsinclair 17 hours ago [-]
Yep. 100% agree. The whole “chat as UX” metaphor is a cul-de-sac that I’m sure we’ll back out of sooner or later.
I think about this like SQL in the late 80s. At the time, SQL was the “next big thing” that was going to mean we didn’t need programmers, and that management could “write code”. It didn’t quite work out that way, of course, as we all know.
I see chat-based interfaces to LLMs going exactly the same way. The LLM will move down the stack (rather than up) and much more appropriate task-based UX/UI will be put on top of the LLM, coordinated thru a UX/UI layer that is much sympathetic to the way users actually want to interact with a machine.
In the same way that no end-users ever touch SQL these days (mostly), we won’t expose the chat-based UX of an LLM to users either.
There will be a place for an ad-hoc natural language interface to a machine, but I suspect it’ll be the exception rather than the rule.
I really don’t think there are too many end users who want to be forced to seduce a mercurial LLM using natural language to do their day-to-day tech tasks.
jug 11 hours ago [-]
I think a counterpoint to this is that SQL has a specific and well-defined meaning, and it takes effort to get what you actually want right. However, communication with an AI can sometimes request a specific context or requirements but also be intentionally open-ended where we want to give the AI leeway. The great thing here is that humans _and_ AI now quite clearly understand when a sentence is non-specific or when it carries great importance. So, I think it's hard to come up with a more terse or approachable competitor to the sheer flexibility of language. In a way, I think it's for a similar reason that engineers across the world have still been typing text commands into a terminal screen for about 80 years now.
sangnoir 17 hours ago [-]
> The whole “chat as UX” metaphor is a cul-de-sac that I’m sure we’ll back out of sooner or later.
Only when someone discovers another paradigm that matches or exceeds the effectiveness of LLMs without being a language model.
daxfohl 15 hours ago [-]
Or DSLs like cucumber for acceptance tests. Cute for simple things, but for anything realistic, it's more convoluted than convenient.
spolsky 17 hours ago [-]
I don't think Daniel's point is that Chat is generically a clunky UI and therefore Cursor cannot possibly exist. I think he's saying that to fully specify what a given computer program should do, you have to provide all kinds of details, and human language is too compressed and too sloppy to always include those details. For example, you might say "make a logon screen" but there are an infinite number of ways this could be done and until you answer a lot of questions you may not get what you want.
If you asked me two or three years ago I would have strongly agreed with this theory. I used to point out that every line of code was a decision made by a programmer and that programming languages were just better ways to convey all those decisions than human language because they eliminated ambiguity and were much terser.
I changed my mind when I saw how LLMs work. They tend to fill in the ambiguity with good defaults that are somewhere between "how everybody does it" and "how a reasonably bright junior programmer would do it".
So you say "give me a log on screen" and you get something pretty normal with Username and Password and a decent UI and some decent color choices and it works fine.
If you wanted to provide more details, you could tell it to use the background color #f9f9f9, but part of what surprised me and caused me to change my mind on this matter was that you could also leave that out and you wouldn't get an error; you wouldn't get white text on a white background; you would get a decent color that might be #f9f9f9 or might be #a1a1a1, but you saved a lot of time by not thinking about that level of detail and you got a good result.
zamfi 17 hours ago [-]
Yeah, and in fact this is about the best-case scenario in many ways: "good defaults" that get you approximately where you want to be, with a way to update when those defaults aren't what you want.
Right now we have a ton of AI/ML/LLM folks working on this first clear challenge: better models that generate better defaults, which is great—but also will never solve the problem 100%, which is the second, less-clear challenge: there will always be times you don't want the defaults, especially as your requests become more and more high-level. It's the MS Word challenge reconstituted in the age of LLMs: everyone wants 20% of what's in Word, but it's not the same 20%. The good defaults are good except for that 20% you want to be non-default.
So there need to be ways to say "I want <this non-default thing>". Sometimes chat is enough for that, like when you can ask for a different background color. But sometimes it's really not! This is especially true when the things you want are not always obvious from limited observations of the program's behavior—where even just finding out that the "good default" isn't what you want can be hard.
Too few people are working on this latter challenge, IMO. (Full disclosure: I am one of them.)
skydhash 14 hours ago [-]
Which no one really argues about. But writing code was never the main issue in software projects. And if you open any book about software engineering, there’s barely any mention of coding. The issue is the process of finding what code to write and where to put it in a practical and efficient way.
In your example, the issue is not with writing the logon screen (you can find several examples on GitHub, and a lot of CSS frameworks have form snippets). The issue is making sure that it works and integrates well with the rest of the project, as well as being easy to maintain.
Edmond 19 hours ago [-]
This is about relying on requirements-type documents to drive AI-based software development. I believe this will ultimately be integrated into all the AI dev tools, if it isn't already. It is really just additional context.
We are also using the requirements to build a checklist, the AI generates the checklist from the requirements document, which then serves as context that can be used for further instructions.
Now we just need another tool that allows stakeholders to write requirement docs using a chat interface
jakelazaroff 19 hours ago [-]
I agree with the premise but not with the conclusion. When you're building visual things, you communicate visually: rough sketches, whiteboard diagrams, mockups, notes scrawled in the margins.
Something like tldraw's "make real" [1] is a much better bet, imo (not that it's mutually exclusive). Draw a rough mockup of what you want, let AI fill in the details, then draw and write on it to communicate your changes.
We think multi-modally; why should we limit the creative process to just text?
I can't wait for someone to invent a new language, maybe a subset of English, that is structured enough to half-well describe computer programs. Then train a model with RLHF to generate source code based on prompts in this new language.
It will slowly grow in complexity, strictness, and features, until it becomes a brand-new programming language, just with a language model and a SaaS sitting in the middle of it.
A startup will come and disrupt the whole thing by simply writing code in a regular programming language.
fullstackwife 14 hours ago [-]
> Who is hiring 2035:
> Looking for a low level engineer, who works close to the metal, will work on our prompts
ajmurmann 18 hours ago [-]
I agree with this and disagree at the same time. It depends what the goal is. If the goal is to have AI write the entire codebase for you, yes chat and human language is quite bad. That's part of the reason formal languages exist. But then only experts can use it. Requirement docs are a decent middle ground. However, I'm not sure it's a good goal for AI to generate the code base.
The mode that I've found most fruitful when using Cursor is treating it almost exactly as I would a pair programming partner. When I start on a new piece of functionality I describe the problem and give it what my thoughts are on a potential solution and invite feedback. Sometimes my solution is the best. Sometimes the LLM had a better idea and frequently we take a modified version of what one of us suggested. Just as you would with a human partner. The result of the discussion is better than what either of us would have done on their own.
I'll also do classical ping-pong-style TDD with it once we agree on an approach. I'll write a test; the LLM makes it pass and writes the next test, which I'll make pass, and so on.
As with a real pair, it's important to notice when they are struggling and help them or take over. You can only do this if you stay fully engaged and understand every line. Just like when pairing. I've found llms get frequently in a loop where something doesn't work and they keep applying the same changes they've tried before and it never works. Understand what they are trying to do and help them out. Don't be a shitty pair for your llm!
cruffle_duffle 14 hours ago [-]
> I've found llms get frequently in a loop where something doesn't work and they keep applying the same changes they've tried before and it never works.
It gets even funner when you try to get other models to fix whatever is broken and they too get caught in the same loop. I’ll be like “nope! Your buddy ChatGPT said the same thing and got stuck in such and such loop. Clearly whatever you are trying isn’t working so step back and focus on the bigger picture. Are we even doing this the right way in the first place?”
And of course it still walks down the loop. So yeah, better be ready to fix that problem yourself cause if they all do the same thing you are either way off course or they are missing something!
sho_hn 18 hours ago [-]
I'd say this criticism is well-addressed in aider. Steering the LLM via code comments is the first UX I've seen that works.
How jarring it is & how much it takes you out of your own flow state is very much dependent on the model output quality and latency still, but at times it works rather nicely.
fny 17 hours ago [-]
Narrative text is a worse UI pattern. It's impractical to read. Also how exactly do you merge narrative changes if you need to write several transformations as updates? Are you expected to update the original text? How does this affect diffs in version control?
I think it's more ideal to have the LLM map text to some declarative pseudocode that's easy to read which is then translated to code.
The example given by Daniel might map to something like this:
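Something along these lines, say (a hypothetical sketch rendered as a plain Python structure; the screen, field, and gradient values here are illustrative, not taken from the article):

# Hypothetical declarative spec for the sign-in screen example;
# every name below is made up for illustration, not a real framework API.
signin_screen = {
    "type": "screen",
    "title": "Sign in",
    "background": {"gradient": ["#f9f9f9", "#e0e0e0"]},  # top-to-bottom gradient
    "fields": [
        {"name": "email", "kind": "text", "required": True},
        {"name": "password", "kind": "password", "required": True},
    ],
    "actions": [
        {"label": "Sign in", "on_submit": "authenticate(email, password)"},
        {"label": "Forgot password?", "on_click": "navigate('/reset')"},
    ],
}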
Then you'd use chat to make updates. For example, "make the gradient red" or "add a name field." Come to think of it, I don't see why chat is a bad interface at all with this set up.
ygouzerh 17 hours ago [-]
It's interesting, it seems that we are looping back on the old trend of Model-Driven Architecture
cruffle_duffle 14 hours ago [-]
lol. I’ve tried to get my LLM to produce something like that. The prompt was like “I’m going to feed your output to another model, please don’t write a narrative; write what we’ve discussed in a machine-readable format”.
It decided to output something JSON-ish, and maybe YAML once.
spandrew 12 hours ago [-]
AI "Agents" that can do tasks outside of the confines of just a chat window are probably the next stage of utility.
The company I work for integrated AI into some of our native content authoring front-end components and people loved it. Our system took a lot of annotating to be able to accurately translate the natural language to the patterns of our system but users so far have found it WAYYY more useful than chat bc it's deeply integrated into the tasks they do anyway.
Figma had a similar success at last year's CONFIG when they revealed AI was renaming default layers names (Layer 1, 2, etc)... something they didn't want to do anyway. I dare say nobody gave a flying f about their "template" AI generation whereas layer renaming got audible cheers. Workflow integration is how you show people AI isn't just replacing their job like some bad sci-fi script.
Workflow integration is going to be big. I think chat will have its place tho; just kind of as an aside in many cases.
I took the position of not much liking AI coding early on, especially when it was just starting. People were writing long descriptions to generate an app, and I quickly noticed that doesn't work because it's all in the details.
Then having AI generate code for my project didn't feel good either: I didn't really understand what it was doing, so I would have to read it to understand it, and then what's the purpose? I might as well write it.
I then started playing, and out came a new type of programming language called plang (as in pseudo language). It allows you to write the details without all the boilerplate code.
> This is the core problem. You can’t build real software without being precise about what you want.
I've tested a few integrated AI dev tools and it works like a charm. I don't type all my instructions at once. I do it the same way as I do it with code. Iteratively:
1) Create a layout
2) Fill left side
3) Fill right side
4) Connect components
5) Populate with dummy data
> The first company to get this will own the next phase of AI development tools.
There's more than 25 working on this problem and they are already in production and some are really good.
xena 18 hours ago [-]
My nuclear fire hot take is that the chat pattern is actively hampering AI tools: we have to square peg -> round hole things either into the chat UI (because that's what people expect), or, as developers, into the chat API patterns.
I wonder if foundation models are an untapped goldmine in terms of the things they can do, but we can't surface them to developers because everyone's stuck in the chat pattern.
disqard 18 hours ago [-]
Whoa! You broke my brain a bit there (but your posts often do, in a Good way!)
Would you be so kind as to ELI5 what you did in that index.js?
I've used ollama to run models locally, but I'm still stuck in chat-land.
Of course, if a blog post is in the works, I'll just wait for that :)
AI models fundamentally work on the basis of "given what's before, what comes next?" When you pass messages to an API like:
[
{ "role": "system", content": "You are an expert in selling propane and propane accessories. Whenever someone talks about anything that isn't propane, steer them back." },
{ "role": "user", "content": "What should I use to cook food on my grill?" },
{ "role": "assistant", "content": "For cooking food on your grill, using propane is a great choice due to its convenience and efficiency. [...]" }
]
Under the hood, the model actually sees something like this (using the formatting that DeepSeek's Qwen 2.5 32b reasoning distillation uses):
You are an expert in selling propane and propane accessories. Whenever someone talks about anything that isn't propane, steer them back.
<|User|>What should I use to cook food on my grill?<|endofsentence|>
<|Assistant|>
And then the model starts generating tokens to get you a reply. What the model returns is something like:
For cooking food on your grill, using propane is a great choice due to its convenience and efficiency. [...]<|endofsentence|>
The runtime around the model then appends that as the final "assistant" message and sends it back to the user so there's a façade of communication.
What I'm doing here is manually assembling the context window such that I can take advantage of that and then induce the model that it needs to think more, so the basic context window looks like:
Follow this JSON schema: [omitted for brevity]
<|User|>Tell me about Canada.<|endofsentence|>
<|Assistant|><think>Okay
And then the model will output reasoning steps until it sends a </think> token, which can be used to tell the runtime that it's done thinking and to treat any tokens after that as the normal chat response. However, sometimes the model stops thinking too soon, so what you can do is intercept this </think> token and then append a newline and the word "Wait" to the context window. Then when you send it back to the model, it will second-guess and double-check its work.
The paper s1: Simple test-time scaling (https://arxiv.org/abs/2501.19393) concludes that this is probably how OpenAI implemented the "reasoning effort" slider for their o1 API. My index.js file applies this principle and has DeepSeek's Qwen 2.5 32b reasoning distillation think for three rounds of effort and then output some detailed information about Canada.
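A minimal sketch of that budget-forcing loop in Python-ish pseudocode (complete() is a hypothetical helper that returns the model's raw tokens up to a stop string, not any particular runtime's real API):

def reason_with_effort(question, rounds=3):
    prompt = (
        "Follow this JSON schema: [omitted for brevity]\n"
        f"<|User|>{question}<|endofsentence|>\n"
        "<|Assistant|><think>Okay"
    )
    # Each time the model tries to close its reasoning, append "Wait" so it
    # second-guesses and double-checks its work, up to the requested rounds.
    for i in range(rounds):
        thought = complete(prompt, stop="</think>")
        prompt += thought
        if i < rounds - 1:
            prompt += "\nWait"
    prompt += "</think>"
    # With thinking closed, let the model produce the user-visible answer.
    return complete(prompt, stop="<|endofsentence|>")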
In my opinion, this is the kind of thing that people need to be more aware of, and the kind of stuff that I use in my own research for finding ways to make AI models benefit humanity instead of replacing human labor.
disqard 17 hours ago [-]
Thank You so much for making time to write that up! Deeply appreciated.
It's fascinating how this "turn-taking protocol" has emerged in this space -- as a (possibly weird) analogy, different countries don't always use the same electrical voltage or plug/socket form-factor.
Yet, the `role` and `content` attributes in JSON appear to be pretty much a de facto standard now.
yapyap 15 hours ago [-]
> AI was supposed to change everything. Finally, plain English could be a programming language—one everyone already knows. No syntax. No rules. Just say what you want
That’s the thing about language: you CAN’T program in human language for this exact reason. Programming languages are mechanical but precise; human languages flow better, but they leave wiggle room. Computers can’t do jack shit with wiggle room, they’re not humans. That’ll always remain, until there’s an AI people like enough to have its own flair on things.
PaulHoule 15 hours ago [-]
It makes me think of the promises and perils of Jupyter notebooks.
So far as this article is concerned (not the many commenters who are talking past it), "chat" is like interacting with a shell or a REPL. How different, really, is the discussion that Winograd has with SHRDLU from the conversation that you have with a database through a SQL monitor?
There's a lot to say for trying to turn that kind of conversation into a more durable artifact. I'd argue that when I write unit tests in Java I'm doing exploratory work like I'd do in a Python REPL, except my results aren't scrolling away but are built into something I can check into version control.
On the other hand, workspace-oriented programming environments are notorious for turning into a sloppy mess: for instance, people really can't make up their minds whether they want to store the results of their computations (God help you if you have more than one person working on it, never mind if you want to use version control -- yet, isn't that a nice way to publish a data analysis?) or whether they want to be a program that multiple people can work on and that produces reproducible results, etc.
See also the struggles of "Literate Programming"
Not to say there isn't an answer to all this but boy is it a fraught area.
bangaladore 16 hours ago [-]
I'll preface this by saying I also dislike using chat as a pattern for AI tools. However, in theory, the idea has merit. Just as having 100% of the specifications and design guidance for a product is valuable before development, complete requirements would seem ideal. In reality, though, many requirements and specifications are living documents. Should we expect to rebuild the entire application every time a document changes? For example, if I decide to reduce a header's height, there's a significant chance the application could end up looking or feeling entirely different.
In a real-world scenario, we begin with detailed specifications and requirements, develop a product, and then iterate on it. Chat-based interactions might be better suited to this iterative phase. Although I'm not particularly fond of the approach, it does resemble receiving a coworker's feedback, making a small, targeted change, and then getting feedback again.
Even if the system were designed to focus solely on the differences in the requirements—thus making the build process more iterative—we still encounter an issue: it tends to devolve into a chat format. You might have a set of well-crafted requirements, only for the final instruction to be, "The header should be 2px smaller."
Nonetheless, using AI in an iterative process (focusing on requirement diffs, for example) is an intriguing concept that I believe warrants further exploration.
muzani 11 hours ago [-]
The first wave was not chat, it was completion. Instead of saying "suggest some names for an ice cream shop", the first wave was "Here are some great names for ice cream shops: 1. Nice Cream 2." Chat was a lot more intuitive and low effort than this.
Chat is also iterative. You can go back there and fix things that were misinterpreted. If the misinterpretation happens often, you can add on another instruction on top of that. I strongly disagree that they'd be fixed documents. Documents are a way to talk to yourself and get your rules right before you commit to them. But it costs almost nothing to do this with AI vs setting up brainstorming sessions with another human.
However, the reasoning models (o1, R1 and such) are good at iterating with themselves, and work better when you give them documents and have them figure out the best way to implement something.
Bjorkbat 15 hours ago [-]
This kind of reminds me of back when the hype cycle was focused on Messenger apps and the idea of most online behavior being replaced with a chatbot. God I hated the smug certainty of (some, definitely not all!) UX designers at the time proclaiming that chat was the ultimate interface.
Absolutely insane that all the doors unlocked by being able to interact with a computer graphically, and yet these people have visions of the future stuck in the 60s.
karmakaze 18 hours ago [-]
What makes it bad currently is the slow output.
The example shows "Sign-in screen" with 4 (possibly more) instructions. This could equivalently have been entered one at a time into 'chat'. If the response for each was graphic and instantaneous, chat would be no worse than non-chat.
What makes non-chat better is that the user puts more thought into what they write. I do agree for producing code Claude with up-front instructions beats ChatGPT handily.
If, OTOH, AIs actually got as good as or better than humans, chat would be fine. It would be like a discussion in Slack or PR review comments.
cheapsteak 17 hours ago [-]
I'm predicting that Test-Driven Development may be having a comeback
English behaviour descriptions -> generated tests
Use both behaviour descriptions and feedback from test results to iterate on app development
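For example, a behaviour description and the kind of pytest test it might generate (a hypothetical sketch; shop.pricing and apply_discount are made-up names, not a real library):

# Behaviour: "Orders over $100 get a 10% discount; smaller orders are unchanged."
import pytest

from shop.pricing import apply_discount  # hypothetical module under test

def test_orders_over_100_get_ten_percent_discount():
    assert apply_discount(200.00) == pytest.approx(180.00)

def test_smaller_orders_are_unchanged():
    assert apply_discount(50.00) == pytest.approx(50.00)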
r0ckarong 18 hours ago [-]
I don't want to become a lawyer to talk to my compiler; thank you.
Been experimenting with the same approach but for "paged shells" (sorry for the term override) and this seems to be a best of both worlds kinda thing for shells. https://xenodium.com/an-experimental-e-shell-pager That is, the shell is editable when you need it to be (during submission), and automatically read-only after submission. This has the benefit of providing single-character shortcuts to navigate content. n/p (next/previous) or tab/backtab.
The navigation is particularly handy in LLM chats, so you can quickly jump to code snippets and either copy or direct output elsewhere.
bcherry 17 hours ago [-]
Chat is a great UX _around_ development tools. Imagine having a pair programmer and never being allowed to speak to them. You could only communicate by taking over the keyboard and editing the code. You'd never get anything done.
Chat is an awesome powerup for any serious tool you already have, so long as the entity on the other side of the chat has the agency to actually manipulate the tool alongside you as well.
skydhash 14 hours ago [-]
The real powerup is scripting. And if the actions are precise enough, macros. Much more efficient for a lot of tasks.
furyofantares 18 hours ago [-]
In cursor I keep a specification document in .cursorrules and I have instructions that cursor should update the document whenever I add new specifications in chat.
yoz 18 hours ago [-]
That's a great idea. How well does it work in practice?
furyofantares 18 hours ago [-]
It works great until it stops doing anything. I use it on new projects and it makes everything go smoothly at the start and, I think, for much longer.
I haven't done it for existing projects but I have done something similar for an unfamiliar, old and challenging codebase. I worked with the cursor chat agent to produce a document I called architecture.md mapping out high level features to files/classes/functions. This was excellent because I found the document useful and it also made cursor more effective.
harlanlewis 16 hours ago [-]
This is a great idea, I've been doing something similar at 2 levels:
1. .cursorrules for global conventions. The first rule in the file is dumb but works well with Cursor Composer:
`If the user seems to be requesting a change to global project rules similar to those below, you should edit this file (add/remove/modify) to match the request.`
This helps keep my global guidance in sync with emergent convention, and of course I can review before committing.
2. An additional file `/.llm_scratchpad`, which I selectively include in Chat/Composer context when I need lengthy project-specific instructions that I may need to refer to more than once.
The scratchpad usually contains detailed specs, desired outcomes, relevant files scope, APIs/tools/libs to use, etc. Also quite useful for transferring a Chat output to a Composer context (eg a comprehensive o1-generated plan).
Lately I've even tracked iterative development with a markdown checklist that Cursor updates as it progresses through a series of changes.
The scratchpad feels like a hack, but they're obvious enough that I expect to see these concepts getting first-party support through integrations with Linear/Jira/et al soon enough.
jillesvangurp 3 hours ago [-]
Not just for development tools. It's a bad UI for a lot of use cases. It's merely what made sense for UX challenged researchers when they had to come up with a UI for their LLMs in a hurry. Discord was there and reasonably easy to integrate. Many tools started out as just that. Fast forward and most tools are kind of standalone versions of the same thing.
The challenge is that I haven't seen anything better really.
Lately the innovation comes mainly from deeper integration with tools. Standalone AI editors are mainly popular with people who use relatively simple editors (like VS Code). VS Code has a few party tricks, but for me swapping out IntelliJ for something else on a typical Kotlin project is a complete non-starter. Not going to happen. I'd gain AI, but I'd lose everything else that I use all the time. That would be a real productivity killer. I want to keep all the smart tooling I already have and have used for years.
There are a few extensions for IntelliJ but they are pretty much all variations of a sidebar with a chat and autocomplete. Autocomplete competes with normal autocomplete, which I use all the time. And the Clippy-style "it looks like you are writing a letter" completions just aren't that useful to me at all. They are just noise and break my flow. And they drown out the completions I use and need all the time. And sidebars just take up space, and copying code from there back to your editor is a bit awkward as UX.
Lately I've been using ChatGPT. It started out pretty dumb, but these days I can option+shift+1 in a chat and have it look over my shoulder at my current editor. "How do I do that?" translates into a full context with my current editing window, cursor & selected text, etc. all in the context. Before, I was copy-pasting everything and the kitchen sink to ChatGPT; now it just tells me what I need to do. The next step up from this is that it starts driving the tools itself. They already have a beta for this. This deeper integration is what is needed.
A big challenge is that most of these tools are driven to minimize cost and context size. Tokens cost money. So ChatGPT only looks at my active editor and not at the 15 other files I have open. It could. But it doesn't. It's also unaware of my project structure, or the fact that most of my projects are Kotlin Multiplatform and can't use JVM dependencies. So, in that sense, every chat is still a bit Groundhog Day. Its promise to "remember" stuff when you ask it to is super flaky. It forgets most things it's supposed to remember pretty quickly.
These are solvable problems of course. But it's useful to me for debugging, analyzing, completing functions, etc.
reverendsteveii 18 hours ago [-]
This puts me in mind of something I read years ago and am having trouble finding that basically had the same premise but went about proving it a different way. The idea was that natural language programming is always going to mean dealing with a certain background level of ambiguity, and the article cited contracts and contract law as proof. Basically, a contract is an agreement to define a system with a series of states and a response for each state defined, and the vast and difficult-to-navigate body of contract law is proof that even when purposefully being as unambiguous as possible with two entities that fully grasp the intricacies of the language being used there is so much ambiguity that there has to be an entire separate group of people (the civil court system) whose only job it is to mediate and interpret that ambiguity. You might point to bad-faith actors but a contract where every possible state and the appropriate response are defined without ambiguity would be proof against both misinterpretations and bad faith actors.
Every time there is a chat interface for something I try to use it, then after 1-2 prompts I give up.
So I completely agree with this. Chat is not a good UI
cruffle_duffle 14 hours ago [-]
A lot of times it is because those things aren’t properly wired up into their systems well enough to get the right context needed to help. Lots of them are nothing more than a prompt with no ability to dig any deeper than their original training data.
Vox_Leone 18 hours ago [-]
I call it 'structured prompting' [think pseudo-code]. It strikes a nice balance between human-readable logic and structured programming, allowing the LLM to focus on generating accurate code based on clear steps. It’s especially useful when you want to specify the what (the logic) without worrying too much about the how (syntax and language-specific details). If you can create an effective system that supports this kind of input, it would likely be a big step forward in making code generation more intuitive and efficient. Good old UML could also be used.
Example of a Structured Pseudo-Code Prompt:
Let’s say you want to generate code for a function that handles object detection:
'''Function: object_detection
Input: image
Output: list of detected objects
Steps:
1. Initialize model (load pretrained object detection model)
2. Preprocess the image (resize, normalize, etc.)
3. Run the image through the model
4. Extract bounding boxes and confidence scores from the model's output
5. Return objects with confidence greater than 0.5 as a list of tuples (object_name, bounding_box)
Language: Python'''
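One plausible rendering of that prompt (a sketch only; it assumes torchvision's pretrained Faster R-CNN and leaves the label-index-to-name mapping to the caller):

import torch
import torchvision
from torchvision.transforms import functional as F

def object_detection(image, category_names, threshold=0.5):
    """Return a list of (object_name, bounding_box) tuples for one image."""
    # 1. Initialize model (load a pretrained object detection model)
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
    model.eval()
    # 2. Preprocess the image (convert to a float tensor; the model's own
    #    transform handles resizing and normalization internally)
    tensor = F.to_tensor(image)
    # 3. Run the image through the model
    with torch.no_grad():
        output = model([tensor])[0]
    # 4./5. Keep detections with confidence greater than the threshold
    detections = []
    for label, box, score in zip(output["labels"], output["boxes"], output["scores"]):
        if score > threshold:
            detections.append((category_names[int(label)], box.tolist()))
    return detections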
yazmeya 17 hours ago [-]
Why not just give it the desired function signature and comments in the function body, in Python?
weitendorf 16 hours ago [-]
This is exactly why we're developing our AI developer workflow product "Brilliant" to steer users away from conversations altogether.
Many developers don't realize this but as you go back and forth with models, you are actively polluting their context with junk and irrelevant old data that distracts and confuses it from what you're actually trying to do right now. When using sleeker products like Cursor, it's easy to forget just how much junk context the model is constantly getting fed (from implicit RAG/context gathering and hidden intermediate steps). In my experience LLM performance falls off a cliff somewhere around 4 decent-sized messages, even without including superfluous context.
We're further separating the concept of "workflow" from "conversation" and prompts, basically actively and aggressively pruning context and conversation history as our agents do their thing (and only including context that is defined explicitly and transparently), and it's allowing us to tackle much more complex tasks than most other AI developer tools. And we are a lot happier working with models - when things don't work we're not forced to grovel for a followup fix, we simply launch a new action to make the targeted change we want with a couple clicks.
It is in a weird way kind of degrading to have to politely ask a model to change a color after it messed up, and it's also just not an efficient way to work with LLMs - people just default to that style because it's how you'd interact with a human you are delegating tasks to. Developers still need to truly internalize the facts that LLMs are purely completion machines, that your conversation history lives entirely client side outside of active inference, and that you can literally set your conversation input to be whatever you want (even if the model never said that). Once you realize that, you're on the path towards using LLMs like "what words do I need to put in to get it to do what I want" rather than working "with" them.
michaelfeathers 13 hours ago [-]
Chat in English? Sure. But there is a better way. Make it a game to see how little you can specify to get what you want.
I used this single line to generate a 5 line Java unit test a while back.
test: grip o -> assert state.grip o
LLMs have wide "understanding" of various syntaxes and associated semantics. Most LLMs have instruct tuning that helps. Simplifications that are close to code work.
Re precision, yes, we need precision but if you work in small steps, the precision comes in the review.
Make your own private pidgin language in conversation.
fhd2 17 hours ago [-]
I've mainly used gptel in Emacs (primarily with Claude), and I kind of already use the chat buffer like a document. You can freely edit the history, and I make very generous use of that, to steer where the model is going.
It has features to add context from your current project pretty easily, but personally I prefer to constantly edit the chat buffer to put in just the relevant stuff. If I add too much, Claude seems to get confused and chases down irrelevant stuff.
Fully controlling the context like that seems pretty powerful compared to other approaches I've tried. I also fully control what goes into the project - for the most part I don't copy paste anything, but rather type a version of the suggestion out quickly.
If you're fast at typing and use an editor with powerful text wrangling capabilities, this is feasible. And to me, it seems relatively optimal.
sprucevoid 16 hours ago [-]
I find web chat interfaces very useful for programming, but it also feels like early days. Speedups will smooth out a lot of pain points. But other UI changes, some even quite small, could enhance use a lot. A few of various size top of mind with regard to Claude web chat UI specifically:
- intellisense in the inputbox based on words in this or all previous chats and a user customizable word list
- user customizable buttons and keyboard shortcuts for common quick replies, like "explain more".
- when claude replies with a numbered list of alternatives let me ctrl+click a number to fork the chat with continued focus on that alternative in a new tab.
- a custom right click menu with action for selection (or if no selection claude can guess the context e.g. the clicked paragraph) such as "new chat with selection", "explain" and some user customizable quick replies
- make the default download filenames follow a predictable pattern, claude currently varies it too much e.g. "cloud-script.py" jumps to "cloud-script-errorcheck.py". I've tried prompting a format but claude seems to forget that.
- the stop button should always instantly stop claude in its tracks. Currently it sometimes takes time to get claude to stop thinking.
- when a claude reply first generates code in the right sidebar followed by detailed explanation text in the chat, let some keyboard shortcut instantly stop the explanation in its tracks. Let the same shortcut preempt that explanation while the sidebar code is still generating.
- chat history search is very basic. Add advanced search features, like filtering by date of first/last message and an OR search operator
- batch jobs and tagging for chat history. E.g. batch apply a prompt to generate a summary in each selected chat and then add the tag "summary" to them. Let us then browse by tag(s).
- tools to delete parts of a chat history thread, that in hindsight were detours
- more generally, maybe a "chat history chat" to have Claude apply changes to the chat histories
andix 15 hours ago [-]
My approach for AI generated code with more complexity was always this:
1. Ask AI to generate a spec of what we're planning to do.
2. Refine it until it's kind of resembling what I want to do
3. Ask AI to implement some aspects from the spec
jimlikeslimes 18 hours ago [-]
Has anyone invited an LLM inside their lisp process that can be accessed from the repl? Being able to empower an LLM to be able to affect the running lisp image (compile functions etc), and having changes reflected back to the source on disk would be interesting.
notatoad 6 hours ago [-]
i like chat. all the more dedicated ai development tools try to force the author's (or somebody's) specific workflow, and fall into an uncanny valley sort of situation for my workflow, where they highlight all the bits of my workflow that don't match the tool's desired workflow.
chat is so drastically far away from my workflow that it doesn't feel like my workflow is wrong.
lcfcjs6 19 hours ago [-]
Seems like this is a common complaint from folks trying to write code purely with ChatGPT / DeepSeek by communicating in complete sentences. You can only get so far using these tools before you need a proper understanding of what's happening with the code.
azhenley 18 hours ago [-]
See my article from January 2023, "Natural language is the lazy user interface".
It's interesting we view Email and Chat so differently. Some companies run on chat (e.g. Slack), while most companies run on email.
Emails are so similar to Chat, except we're used to writing in long-form, and we're not expecting sub-minute replies.
Maybe emails are going to be the new chat?
I've been experimenting with "email-like" interfaces (that encourage you to write more / specify more), take longer to get back to you, and go out to LLMs. I think this works well for tools like Deep Research where you expect them to take minutes to hours.
nimski 17 hours ago [-]
This has been the thesis behind our product since the beginning (~3 years), before a lot of the current hype took hold. I'm excited to see it gain more recognition.
Chat is single threaded and ephemeral. Documents are versioned, multi-threaded, and a source of truth. Although chat is not appropriate as the source of truth, it's very effective for single-threaded discussions about documents. This is how people use requirements documents today. Each comment on a doc is a localized chat. It's an excellent interface when targeted.
foz 18 hours ago [-]
After using Cursor and Copilot for some time, I long for a tool that works like a "real" collaborator. We share a spec and make comments, resolve them. We file issues and pull requests and approve them. We use tests and specs to lock down our decisions. We keep a backlog up to date, maintain our user docs, discuss what assumptions we have to validate still, and write down our decisions.
Like with any coworker - when ideas get real, get out of chat and start using our tools and process to get stuff done.
ygouzerh 17 hours ago [-]
That's a great idea! Maybe when context windows are larger and tokens even cheaper?
I would also like to add a pair-programming feature to it, with it making comments over your shoulder while you code: a kind of smarter linter that will not lint one line, but that will have the entire project context.
daxfohl 13 hours ago [-]
I agree with everything except there being a business opportunity there. Whatever ends up being the fix, all the big players will incorporate it into their own IDEs within a couple of months, unless the fix is something so different from an IDE that incorporating it into one doesn't make sense.
firefoxd 16 hours ago [-]
That's also why AGI as defined today is an interface problem. Imagine we've actually achieved it, and the interface is a chat prompt. It will be really hard to differentiate it from the current tools we have.
For writing, the canvas interface is much more effective because you rely less on copy and paste. For code, even with the ctrl+i method, it works but it's a pain to have to load all other files as reference every single time.
I had a daydream about programming with an LLM as something more like driving a car than typing, e.g. constant steering, changing gears and so on
a3w 19 hours ago [-]
For me: Chat is like writing comments, but not at the right place in the source code.
Perhaps I should comment all todos and then write "finish todos" as the always-same text prompt.
gspencley 18 hours ago [-]
This is preference for sure, but I am of the opinion that ALL code comments are code smells.
And that's not even to say that I don't write code comments. When working on large legacy codebases, where you often need to do 'weird' things in service of business goals and timelines, a comment that explains WHY something was done the way it was is valuable. And I leave those comments all the time. But they're still a code smell.
Comments are part of your code. So they need to be maintained with the rest of your code. Yet they are also "psychologically invisible" most of the time to most programmers. Our IDEs even tend to grey them out by default for us, so that they get out of the way so we can focus on the actual implementation code.
This means that comments are a maintenance obligation that often get ignored and so they get out of sync with the actual code really fast.
They also clutter the code unnecessarily. Code, at its best, should be self-explanatory with extremely little effort needed to understand the intent of the code. So even a comment that explains why the code is weird is doing little more than shining a flashlight on smelly code without actually cleaning it up.
And don't get me started on "todo" comments. Most professional shops use some kind of project management tool for organizing and prioritizing future work. Don't make your personal project management the problem of other people that share and contribute to your codebase. There is zero rationale for turning shared code into your personal todo list. (and it should be obvious that I'm talking about checked in code .. if it's your working branch then you do you :) )
So if programming using LLMs is similar to writing comments (an interesting analogy I hadn't considered before), then maybe this is part of the reason I haven't found a problem that LLMs solve for me yet (when programming specifically). I just don't think like that when I'm writing code.
jes5199 11 hours ago [-]
I'm looking forward to a Cursor-like experience but using voice - so we can discuss code changes verbally, as if we were pair programming. honestly I'm not sure why we don't have that yet.
nektro 5 hours ago [-]
got so close to the point and then flew right past it. AI is never going to be great for this because the limit of "prompt engineering" just loops right back to normal programming
tgraf_80 17 hours ago [-]
Honestly, you can use AI at a slightly higher level of abstraction and ambiguity, but not much higher. For instance, if you need an iteration over an array and you want a very specific aggregation, you can instruct the AI to write that loop, but you yourself need to understand exactly what it's doing and have a very clear idea of how the snippet fits into the larger picture.
josefrichter 18 hours ago [-]
I think everyone is aware that chat is not the ideal UI pattern for this. It's just the way current AI models work and generate content - that's why they have this "typewriter" mode, which naturally leads to a chat interface.
It's not really a conscious choice, but rather a side effect. And we already see the trend is away from that, with tools like chatGPT Canvas, editors like Windsurf, etc.
kijin 18 hours ago [-]
When the only tool your AI can wield is a hammer, everything had better look like a nail.
Once the models become fast enough to feel instantaneous, we'll probably begin to see more seamless interfaces. Who wants a pair programmer who goes "umm... ahh..." every time you type something? A proper coding assistant should integrate with your muscle memory just like autocomplete. Tab, tab, tab and it's done.
orand 14 hours ago [-]
Chat as a bad UI pattern for development tools is like saying language is a bad UI pattern for thought.
kmarc 17 hours ago [-]
Look, deleting the inside of the () parens in a function call by instructing your editor to "delete inside parentheses" makes total sense, or in vim:
di(
Yet millions of programmers use their mouse to first SELECT something visually and THEN delete whatever was selected. Shrug.
I won't be surprised if chat-based programming becomes the next way of doing stuff.
randomNumber7 14 hours ago [-]
For me the LLM does 3 things. Since it is trained on pattern matching, it performs well on these. The tree-like ChatGPT interface (where you can change the questions) is perfect imo.
- Speed up literature research
- Replace reading library documentation
- Generate copy-pasta code that has been written often before
grumbel 17 hours ago [-]
Plenty of software has been developed on the command line; chat is just a more powerful and flexible version of that. The missing part with current AI systems is a persistent workspace/filesystem that allows you to store things you want to keep, discard things you want to get rid of, and highlight things you want to focus on.
darepublic 16 hours ago [-]
Need interactive chat for coding. You say something high level, the model prompts you for low-level decisions, etc. Every once in a while the code bot can send a screenshot or some test results so we stay grounded in where we are in the process. This can enable coding while I'm driving or sitting stoned on the couch.
suralind 12 hours ago [-]
I love how Zed integrates the chat into the IDE. You can essentially edit any part of the history, or remove or redo a prompt. I just started using this feature a couple of days ago and I couldn't be happier.
ansonhw 15 hours ago [-]
I actually agree that chat is overrated as UX overall. It was magic for ChatGPT, but it creates the wrong UX expectations for users where more precision or constraint is needed. It's also not good for large-scale processing.
sramam 14 hours ago [-]
Interesting take. Completely agree that a product requirements document is a good mental model for system description. However, aren't bug reports + PRs approximating a chat interface?
RyanAdamas 16 hours ago [-]
The chat interface modality is a fleeting one in the grand scheme. Billion-token context windows with recursive AI production based on development documentation and graphics are likely the next iteration.
ypyrko 17 hours ago [-]
100% agree. I had the same issue when it comes to text editing, so I created this tool: https://www.potext.com
I love having full control over AI suggestions
whatsakandr 18 hours ago [-]
The nice thing about chat is that it's open ended; the terrible thing is that, holy crap, I have to write a paragraph describing exactly what I want when I should just be able to hit a button or navigate through a couple menus.
tiborsaas 18 hours ago [-]
> I should just be able to hit a button or navigate through a couple menus.
The problem with this is that you need a gazillion menus, dialogs and options to find the one that does _exactly_ what you want. Menus and the like are a means to an end; we don't really want them, but up until recently we couldn't live without them. With instruct-based computing this is all changing.
icapybara 16 hours ago [-]
This feels like those arguments that text is the worst way to code and we actually need a no code solution instead.
Theoretically maybe, but chat windows are getting the job done right now.
arnaudsm 16 hours ago [-]
Considering how good real-time voice chat with LLMs is now (gpt4o and Gemini 2.0), I'm surprised I haven't seen anyone try to integrate them into programming tools.
It could be quite fun !
esafak 8 hours ago [-]
Most programmers can type faster than they can speak.
anoncow 17 hours ago [-]
There should be a smart way of merging all the chat messages into a streamlined story of the development on the fly. Perhaps something an AI could do. We could call it contextAI.
remoquete 17 hours ago [-]
I'm intrigued by the conclusion. Docs-as-code, this time turning actual documentation and requirements into code? So, specifications? Back to OpenAPI?
Back to... programming languages? :)
gunalx 13 hours ago [-]
This just seems more cumbersome than just writing the software to begin with.
It's a problem of programming languages and definitions.
jpcom 11 hours ago [-]
So you're saying a spec [specification] is the solution to building programs.
stevage 13 hours ago [-]
Boy does this feel like the author has never actually used any AI tools for writing code.
ern 10 hours ago [-]
I had the same feeling reading this, but a large number of developers struggle to communicate verbally and would hate a chat-based interface. Some years ago, I worked with a very intelligent person who had come into development with a math-related degree.
We tried a pair-programming exercise, and he got visibly angry, flustered and frustrated when he tried to verbalize what he was doing.
One of the reasons Business Analysts and the like exist is that not everyone can bridge the gap between the messy, verbal real world, and the precision demanded by programming languages.
jfkrrorj 18 hours ago [-]
No, it is pretty much a dialog; I would compare it to pair programming.
On many levels AI is more capable than a human programmer; on some it is not. It is not supersmart. It cannot hold the entire program in its head; you have to feed it the small relevant section of the program.
> That's why we use documents—they let us organize complexity, reference specific points, and track changes systematically.
Extra steps. Something like waterfall...
shireboy 17 hours ago [-]
Yeah, I've landed on similar, although I wouldn't say it's bad for all dev scenarios. For small tweaks, or cases where I want a junior dev to do something I say explicitly ("add a bootstrap style input field for every property on #somemodel") chat works fine.
For higher-level AI assist, I do agree chat is not what makes sense. What I think would be cool is to work in markdown files, refining each feature in precise plain English. The AI then generates code from the .md files plus existing context. Then you have well-written documentation and consistent code. You can do this to a degree today by referencing a md file in chat, or by using some of the newer tools, but I haven't seen exactly what I want yet. (I guess I should build it?)
aantix 15 hours ago [-]
The only LLM agent I've seen who asked any sort of clarifying questions about design was Devin.
Well if I can't have intelligent replies I'll gladly take the amusing ones, thanks
px43 18 hours ago [-]
Language evolves. As much as we are training computers to understand us, they are training us to understand them.
The level of precision required for highly complex tasks was never necessary before. My four year old has a pretty solid understanding of how the different AI tools she has access to will behave differently based on how she phrases what she says, and I've noticed she is also increasingly precise when making requests of other people.
amelius 18 hours ago [-]
LLMs are what COBOL was supposed to be.
krainboltgreene 18 hours ago [-]
I suspect there's a gradient for language: one side is clarity, the other side is poetry. English is definitely farther toward the poetry side, whereas programming languages are significantly closer to the clarity side.
I suspect there's a 100-year-old book describing what I'm saying, but much more eloquently.
debacle 14 hours ago [-]
I quite like it. Meta AI has become a good programming companion.
proc0 18 hours ago [-]
This is lowkey cope. AI should be like talking to another human, at least that is the promise. Instead we're getting glorified autocomplete with padded language to sound like a human.
In their current form, LLMs are pretty much at their limit, barring optimization and chaining them together for more productivity once we have better hardware. Still, they will just be useful for repetitive low-level tasks and mediocre art. We need more breakthroughs beyond transformers to approach something that creates like humans instead of using statistical inference.
biscuit1v9 18 hours ago [-]
>In its current form LLMs are pretty much at their limit
How do you know that?
proc0 17 hours ago [-]
I don't know that; I'm mostly speculating based on how mixture-of-experts is outperforming decoder-only architectures, which means we're already composing transformers to squeeze the most out of them, and still they seem to fall short. They have already been trained on incredible amounts of data, they still need to be composed into multiple instances and need even better hardware, and it seems we've reached diminishing returns. The question is whether the little that is left to optimize will be enough to make them truly agentic and able to create full apps on their own, or whether they will still require expert supervision for anything useful.
indymike 18 hours ago [-]
How else do you interact with a chat based ai? It may not be ideal, but it is an improvement.
synergy20 17 hours ago [-]
What about organizing chats into documents by chatting: keep track of the chats and build up a design doc.
Or the other way around: give the AI a design doc and generate what you want. This is still chatting, just more official and lengthy.
karaterobot 18 hours ago [-]
I don't know about this. He admits you can write prototype code with chat-based LLMs, but then says this doesn't matter, because you can't write extremely complex applications with them.
First of all, most people can't write extremely complex applications, period. Most programmers included. If your baseline for real programming is something of equivalent complexity as the U.S. tax code, you're clearly such a great programmer that you're an outlier, and should recognize that.
Second of all, I think it's a straw man argument to say that you can either write prototype-level code with a chat UI, or complex code with documents. You can use both. I think the proposition being put forward is that more people can write complex code by supplementing their document-based thinking with chat-based thinking. Or, that people can write slightly better-than-prototype level code with the help of a chat assistant. In other words, that it's better to have access to AI to help you code small sections of a larger application that you are still responsible for.
I'd be more interested in reading a good argument against the value of using chat-based AI as another tool in your belt, rather than a straight-up replacement for traditional coding. If you could make that argument, then you could say chat is a bad UI pattern for dev tools.
6h6j65j76k 17 hours ago [-]
"Current AI tools pretend writing software is like having a conversation. "
But isn't that true? Devs spend more time in meetings than writing code, having conversations about the code they are going to write.
martinsnow 17 hours ago [-]
I agree. But in that context we're talking about specifications.
When we're trying to wrangle a piece of code to do something we want but aren't quite sure of how to interact with the api, it's a different matter.
What I found is that by the time Copilot/GPT/DeepSeek has enough knowledge about the problem and my codebase, I've run out of tokens, because my head can contain a much larger problem area than these models allow me to feed them in a budget-friendly manner.
williamcotton 14 hours ago [-]
It’s like writing laws.
Vague and prone to endless argument?
kordlessagain 16 hours ago [-]
A terminal prompt has worked great for me for years…
Havoc 17 hours ago [-]
Chat seems flawed but I don’t see how a document is better.
I don’t buy that a document could capture what is needed here. Imagine describing navigating through multiple levels of menus in document form. That sounds straight up painful even for trivial apps. And for a full blown app…nope
There is a whole new paradigm missing there imo
tommiegannert 17 hours ago [-]
I'm in the business of data collection, to some extent: building a support system for residential solar panel installations. There's a bunch of data needed for simulations, purchase estimations, legal and tax reasons. Not insane amounts, but enough that filling out a form feels tedious. LLMs are great in that they can be given a task to gather a number of pieces, and can explain to the user what "kWh" means, at many levels of technical depth.
We've been playing around with LLMs to build a chat experience. My first attempt made Claude spew out five questions at a time, which didn't solve the "guiding" problem. So I started asking it to limit the number of unanswered questions. It worked, but felt really clunky and "cheap."
I drew two conclusions: We need UI builders for this to feel nice, and professionals will want to use forms.
First, LLMs would be great at driving step-by-step guides, but it must be given building blocks to generate a UI. When asking about location, show a map. When deciding to ask about TIN or roof size, if the user is technically inclined, perhaps start with asking about the roof. When asking about the roof size, let the user draw the shape and assign lengths. Or display aerial photos. The result on screen shouldn't be a log of me-you text messages, but a live-updated summary of where we are, and what's remaining.
Second, professionals have an incentive to build mental models for navigating complex data structures. People who have no reason to invest time in the data model (e.g. a consumer buying a single solar panel installation in their lifetime) will benefit from rich LLM-driven UIs. Chat UIs might create room for a new type of computer user who doesn't use visual clues to build this mental model, but everyone else will want to stay on graphics. If you're an executive wondering how many sick days there were last month, that's a situation where a BI LLM RAG would be great. But if you're not sure what your question is, because you're hired to make up your own questions, then pointing, clicking and massaging might make more sense.
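To make the "building blocks" idea above concrete, here's a minimal sketch of what I have in mind, assuming a tool-calling style API; the block kinds and field names below are made up for illustration:

// Hypothetical UI building blocks the model is allowed to pick from.
// The frontend renders whichever block the model chooses and feeds the
// answer back as structured data instead of free text.
type UiBlock =
  | { kind: "map_picker"; prompt: string }                    // ask for a location
  | { kind: "roof_sketch"; prompt: string; unit: "m" | "ft" } // draw the roof shape
  | { kind: "number_input"; prompt: string; field: string; unit?: string }
  | { kind: "summary"; answered: Record<string, unknown>; remaining: string[] };

// The running state is a live-updated summary, not a message log.
interface IntakeState {
  answered: Record<string, unknown>;
  remaining: string[];
}

// Sketch of the glue: ask the model which block to show next, render it,
// merge the user's structured answer, repeat until nothing remains.
async function nextStep(
  state: IntakeState,
  chooseBlock: (s: IntakeState) => Promise<UiBlock>,          // wraps the LLM call
  render: (b: UiBlock) => Promise<Record<string, unknown>>,   // real UI, not text
): Promise<IntakeState> {
  const block = await chooseBlock(state);
  const answer = await render(block);
  return {
    answered: { ...state.answered, ...answer },
    remaining: state.remaining.filter((f) => !(f in answer)),
  };
}

The point being that the chat log disappears: the model only ever picks the next block, and the screen shows the live summary.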
randomcatuser 17 hours ago [-]
chat=repl?
doc=programming in a DSL? / (what was that one language which was functional & represented in circles in a canvas?)
SuperHeavy256 15 hours ago [-]
so, a really long text? that's your big revelation?
quantadev 18 hours ago [-]
Just two tips/thoughts:
1) The first thing to improve chats as a genre of interface is that they should always be a tree/hierarchy (just like Hacker News is), so that you can go back to ANY precise prior point in a discussion/chat and branch off in a different direction. The only context the AI sees during the conversation is the "Current Node" (your last post) and all "Parent Nodes" going back to the beginning, so at any time it's not even aware of all the prior "bad branches" you decided to abandon.
2) My second tip for designs of Coding Agents is do what mine does. I invented a 'block_begin/block_end' syntax which looks like this, and can be in any source file:
// block_begin MyAddNumbers
var sum = a + b;
return sum;
// block_end
With this syntax you can use the English language to explain and reason about extremely specific parts of your code without expecting the LLM to "just understand". You can also direct the LLM to only edit/update specific "Named Blocks", as I call them.
So a trivial example of a prompt expression related to the above might be "Always put number adding stuff in the MyAddNumbers Block".
To explain entire architectural aspects to the LLM, these code block names are extremely useful.
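For what it's worth, the tooling side of this can be tiny. Here's a rough sketch of the kind of splicing involved (illustrative only, not the actual implementation), replacing the body of a named block with an LLM's answer:

// Replace the body of a named block in a source file. Markers follow the
// block_begin/block_end syntax described above. Assumes `name` contains
// no regex metacharacters.
function replaceNamedBlock(source: string, name: string, newBody: string): string {
  const pattern = new RegExp(
    `(//\\s*block_begin\\s+${name}\\n)[\\s\\S]*?(\\n//\\s*block_end)`,
  );
  if (!pattern.test(source)) {
    throw new Error(`Named block "${name}" not found`);
  }
  return source.replace(pattern, `$1${newBody}$2`);
}

// Usage: hand the LLM only the block it is allowed to edit, then splice
// its answer back into the file.
const fileContents = [
  "// block_begin MyAddNumbers",
  "var sum = a + b;",
  "return sum;",
  "// block_end",
].join("\n");
const updated = replaceNamedBlock(fileContents, "MyAddNumbers", "return a + b;");
console.log(updated);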
cruffle_duffle 14 hours ago [-]
Dude threaded chat is how it should be, right? Especially if you could reference one thread in another and have it build the proper context up to understand what said thread was as a basis for this new conversation.
Proper context is absolutely everything when it comes to LLM use
quantadev 10 hours ago [-]
Yep. "Threaded" might be what the kids are callin' it nowadays. It's always been trees for me. I had a 'Tree-based' chat app for OpenAI (and as of now all other leading LLMs, via LangChain/LangGraph) within one week of them opening up their API. lol. Because my Tree-based CMS was already done, to build on.
OpenAI finally made it so you can go back and edit a prior response in their chat view, but their GUI is jank, because it's not a tree.
deeviant 18 hours ago [-]
You may have challenges using chat for development (specifically, I mean text prompting, not necessarily using a LangChain session with an LLM, although that is my most common mode), but I do not. I have found chat to be, by far, the most productive interface with LLMs for coding.
Everything else is just putting layers that are not nearly as capable as an LLM between me and the raw power of the LLM.
The core realization I made to truly unlock LLM code assistance as a 10x+ productivity gain is that I am not writing code anymore, I am writing requirements. It means being less an engineer and more a manager, or perhaps an architect. It's not your job to write the tax code anymore; it's your job to describe what the tax code needs to accomplish and how its success can be defined and validated.
Also, it's never even close to true that nobody uses LLMs for production software, here's a write-up by Google talking about using LLMs to drastically accelerate the migration of complex enterprise production systems: https://arxiv.org/pdf/2501.06972
benatkin 19 hours ago [-]
A chat room is an activity stream, and so is the commit log of a version control system. A lot of the bad UI is waiting a fixed amount of time whose minimum was too high, and, for some, having to communicate by typing; many will prefer chatting by voice. When responses are faster it will be easier to hide the history pane, ask if you need to be reminded of anything in it, and work in the artifact pane. However, not all responses from an LLM need to be fast; it is a huge advancement that LLMs will think for minutes at a time. I agree about the usefulness of prose as an artifact while coding. Markdown can be edited in IDEs using LLMs and then referenced in prompts.
newsyco21 18 hours ago [-]
generated ai is cancer
dehugger 15 hours ago [-]
LLM-generated code seems to vary wildly depending on whether the project is about something that a bunch of people have already put out on GitHub or not.
Writing a crud web API? Great!
Writing business logic for a niche edge case in a highly specialized domain? Good luck.
waylonchang 10 hours ago [-]
all this just to say "prompt neatly" :)
h1fra 18 hours ago [-]
Chat is a bad UI.
kristofferR 14 hours ago [-]
"You don’t program by chatting. You program by writing documents.", or long chat prompts as they are also called.
talles 14 hours ago [-]
Imagine if instead of English, someone invented some sort of computer language that is precise and produces the same result every time you execute it.
nbzso 16 hours ago [-]
Is there statistical data on adoption of AI chatbots in the industry?
I see a lot of personal demos and small projects, but nobody is talking about serious integration into production and useful patterns.
sebastianconcpt 17 hours ago [-]
Yep.
thomastjeffery 17 hours ago [-]
Chat is a bad interface for tools in general, but this problem goes deeper than that.
What's a good interface?
There are a few things we try to balance to make a good UI/UX:
- Latency: How long it takes to do a single task
- Decision-tree pathing: How many tasks to meet a goal
- Flexibility/Configurability: How much of a task can be encapsulated by the user's predefined knowledge of the system
- Discoverability: What tasks are available, and where
The perfect NLP chat could accomplish some of these:
- Flexibility/Configurability: Define/infer words and phrases that the user can use as shortcuts
- Decision-tree pathing: Define concepts that shortcut an otherwise verbose interaction
- Latency: Context-aware text-completions so the user doesn't need to type as much
- Discoverability: Well-formed introductions and clarifying questions to introduce useful interaction
This can only get us so far. What better latency can be accomplished than a button or a keyboard shortcut? What better discoverability than a menu?
The most exciting prospect left is flexibility. Traditional software is inflexible. It can only perform the interaction it was already designed with. Every design decision becomes a wall of assumption. These walls are the fundamental architecture of software. Without them, we would have nothing. With them, we have a structure that guides us along whatever assumptions were already made.
If we want to change something about our software's UI, then we must change the software itself, and that means writing. If NLP was a truly solved problem, then software compatibility and flexibility would be trivialized. We could redesign the entire UI by simply describing the changes we want.
LLMs are not even close. Sure, you can get one to generate some code, but only if the code you want generated is close enough to the text it was already trained on. LLMs construct continuations of tokens: no more, no less. There is no logic. There is no consideration about what is right or wrong: only what is likely to come next.
Like you said,
> You can’t build real software without being precise about what you want.
This is the ultimate limitation of UI. If only we could be ambiguous instead! LLMs let us do that, but they keep that ambiguity permanent. There is no real way to tie an LLM back down to reality. No logic. No axioms. No rules. So we must either be precise or ambiguous. The latter option is an exciting development, and certainly offers its own unique advantages, but it isn't a complete solution.
---
I've been thinking through another approach to the ambiguity problem that I think could really give us the expressive power of natural language, while preserving the logical structure we use to write software (and more). It wouldn't solve the problem entirely, but it could potentially move it out of the way.
Apocryphon 18 hours ago [-]
> When your intent is in a document instead of scattered across a chat log, English becomes a real programming language
So, something like Gherkin?
anarticle 18 hours ago [-]
I agree, and I think this means there is a lot of space for trying new things. I think Cursor was a small glimpse of trying to fix the split between pure GitHub Copilot line revision (this interrupts my thoughts too much) and calling in for help via a chat window that you're copying and pasting from.
I think this post shows there could be a couple levels of indirection, some kind of combination of the "overarching design doc" that is injected into every prompt, and a more tactical level syntax/code/process that we have with something like a chat window that is code aware. I've definitely done some crazy stuff by just asking something really stupid like "Is there any way to speed this up?" and Claude giving me some esoteric pandas optimization that gave me a 100x speedup.
I think overall the tools have crazy variance in quality of output, but I think with some "multifacet prompting", ie, code styling, design doc, architect docs, constraints, etc you might end up with something that is much more useful.
m3kw9 19 hours ago [-]
There is a black-box effect between when you press enter and when it starts updating code in multiple places. Like, wtf just happened? I have to find these changes and work out which code broke dependencies. It should be more stepwise, visually.
seunosewa 19 hours ago [-]
In VS Code Copilot you have to approve every change manually when you apply it from the chat.
joshstrange 18 hours ago [-]
Aider + git makes this a non-issue. You can revert anything it does easily (even 5-10 steps down the line) and it shows you the code blocks it's changing in the chat UI as well.
josefrichter 18 hours ago [-]
In some editors, like Windsurf, you will get git-like diff and you have to approve each change (or accept all in bulk, of course).
Take8435 19 hours ago [-]
Are you... not using git?
barrenko 18 hours ago [-]
The next couple of years will be dedicated to working that out.
naiv 18 hours ago [-]
rather months than years
barrenko 18 hours ago [-]
Yes, more like it.
empath75 19 hours ago [-]
I think he's right that there's a place for a more structured AI programming UI, but chat and autocomplete are also good for a lot of use cases.
fragmede 17 hours ago [-]
> People call them “great for prototyping,” which means “don’t use this for anything real.”
Eh, that's just copium because we all have a vested monetary interest in them not being useful for "anything real", whatever that means. If it turns out that they're useful for "real things", then the entire industry would get turned on its head. (Hint: they're useful for "real" things.) Putting the entire codebase into the context window doesn't currently work, though. Aider works past this by passing the directory tree and filenames as context, so the LLM can guess that /cloud/scope/cluster.go is where the cluster scope code lives and ask for that specific file to be added to the context, and you can then ask it to add, say, logging code to that file.
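(For anyone wondering what "passing the directory tree and filenames as context" looks like, here's a rough sketch of the shape of the idea; it's not Aider's actual code, just an illustration:)

import { readdirSync } from "node:fs";
import { join } from "node:path";

// Walk the repo and collect relative file paths, skipping a few obvious dirs.
function listFiles(root: string, dir = ""): string[] {
  const skip = new Set([".git", "node_modules", "dist"]);
  return readdirSync(join(root, dir), { withFileTypes: true }).flatMap((entry) => {
    if (skip.has(entry.name)) return [];
    const rel = join(dir, entry.name);
    return entry.isDirectory() ? listFiles(root, rel) : [rel];
  });
}

// The "map" is just the file list; the model asks for files by name, and only
// those files get added to the context in full.
function repoMapPrompt(root: string): string {
  return [
    "You can ask for any of these files to be added to the chat:",
    ...listFiles(root).map((f) => `- ${f}`),
  ].join("\n");
}

console.log(repoMapPrompt("."));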
I'll leave readers to find all the caveats here
Edit: I should probably mention why I think the Chat Interface for AI is not working like Pair programming: As much as it may fake it, AI isn't learning anything while you're chatting to it. It's pointless to argue your case or discuss architectural approaches. An approach that yields better results with Chat AI is to just edit/expand your original prompt. It also feels less like a waste of time.
With Pair programming, you may chat upfront, but you won't reach that shared understanding until you start trying to implement something. For now Chat AI has no shared understanding, just a "what I asked you to do" thing, and that's not good enough.
For me, there's
- Time when I want to discuss the approach and/or code to something (someone being there is a requirement)
- Time when I want to rubber duck, and put things to words (someone being there doesn't hurt, but it doesn't help)
- Time when I want to write code that implements things, which may be based on the output of one of the above
That last bucket of time is generally greatly hampered by having someone else there and needing to interact with them. Being able to separate them (having people there for the first one or two, but not the third) is, for me, optimal.
Maybe collaborate the first hour each morning, then the first hour after lunch.
The value of pair programming is inversely proportional to the expertise of the participant. Junior devs who pair with senior devs get a lot out of it, senior devs not so much.
GP is probably a more experienced dev, whereas you are the type of dev who says things like “I’m guessing that you…”.
What's your definition of 'learn'? An LLM absolutely does extract and store information from its context. Sure, it's only short term memory and it's gone the next session, but within the session it's still learning.
I like your suggestion to update your original prompt instead of continuing the conversation.
I’m not saying pair programming is a silver bullet, and I tend to agree that working on your own can be vastly more efficient. I do however think that it’s a very useful tool for critical functionality and hard problems and shouldn’t be dismissed.
I've been coding long enough to notice there are times where the problem is complex and unclear enough that my own thought process will turn into pair programming with myself, literally chatting with myself in a text file; this process has the bandwidth and latency on the same order as talking to another person, so I might just as well do that and get the benefit of an independent perspective.
The above is really more of a design-level discussion. However, there are other times - precisely those times that pair programming is meant for - when the problem is clear enough I can immerse myself in it. Using the slow I/O mode, being deliberate is exactly the opposite of what I need then. By moving alone and focused, keeping my thoughts below the level of words, I can explore the problem space much further, rapidly proposing a solution, feeling it out, proposing another, comparing, deciding on a direction, noticing edge cases and bad design up front and dealing with them, all in a rapid feedback loop with test. Pair programming in this scenario would truly force me to "use the slower I/O parts of your brain", in that exact sense: it's like splitting a highly-optimized in-memory data processing pipeline in two, and making the halves communicate over IPC. With JSON.
As for bus factor, I find the argument bogus anyway. For that to work, pair programming would have to be executed with the same partner or small group of partners, preferably working on the same or related code modules, daily, over the course of weeks at least - otherwise neither they nor I are going to have enough exposure to understand what the other is working on. But that's not how pair programming worked when I experienced it.
It's a problem with code reviews, too: if your project has depth[0], I won't really understand the whole context of what you're doing, and you won't understand the context of my work, so our reviews of each others' code will quickly degenerate to spotting typos, style violations, and peculiar design choices; neither of us will have time or mental capacity to fully understand the changeset before "+2 LGTM"-ing it away.
--
[0] - I don't know if there's a better, established term for it. What I mean is depth vs. breadth in the project architecture. Example of depth: you have a main execution orchestrator, you have an external data system that handles integrations with a dozen different data storage systems, then you have math-heavy business logic on data, then you have RPC for integrating with GUI software developed by another team, then you have an extensive configuration system, etc. - each of those areas is full of design and coding challenges that don't transfer to any other. Contrast that with an example of breadth: a typical webapp or mobile app, where 80% of the code is just some UI components and a hundred different screens, with very little unique or domain-specific logic. In those projects, developers are like free electrons in metal: they can pick any part of the project at any given moment and be equally productive working on it, because every part is basically the same as every other part. In those projects, I can see both pair programming and code reviews deliver on their promises in full.
Peer and near-peer reviews have always wound up being nitpicking or perfunctory.
An alternative that might work if you want two hands on every change for process reasons is to have the reviewer do something closer to formal QA, building and running the changed code to verify it has the expected behavior. That has a lot of limitations too, but it least it doesn’t degrade to bikeshedding about variable name aesthetics.
Pair programming is endlessly frustrating beyond just rubber ducking because I'm having to exit my mental model, communicate it to someone else, and then translate and relate their inputs back into my mental model, which is not exactly rooted in language in my head.
Subvocalization/explicit vocalization of what you're doing actually improves your understanding of the code. Doing so may 'decrease bandwidth', but it improves comprehension, because it's basically inline rubber duck debugging.
It's actually easy to write code which you don't understand and cannot explain what it's doing, whether at the syntax, logic or application level. I think the analogue is to writing well; anyone can write streams of consciousness amounting to word salad garbage. But a good writer can cut things down and explain why every single thing was chosen, right down to the punctuations. This feature of writing should be even more apparent with code.
I've coded tons of things where I can get the code working in a mediocre fashion, and yet find great difficulty in trying to verbally explain what I'm doing.
In contrast there's been code where I've been able to explain each step of what I'm doing before I even write anything; in those situations what generally comes out tends to be superior maintainable code, and readable too.
I wrote about this here: https://digitalseams.com/blog/the-ideal-ai-interface-is-prob...
Especially with the new r1 thinking output, I find it useful to iterate on the initial prompt as a way to make my ideas more concrete as much as iterating through the chat interface which is more hit and miss due to context length limits.
* I don’t mean that in a negative way, but in a “I can’t expect another person to respond to me instantly at 10 words per second” way.
I mean, isn't typing your code also forcing you to make your ideas concrete?
Neural networks and evolved structures and pathways (e.g. humans make do with ~20k genes and about that many more in regulatory sequences) are absolutely more efficient, but good luck debugging them.
I've yet to see any model understand nuance or detail.
This is especially apparent in image models. Sure, it can do hands but they still don't get 3D space nor temporal movements. It's great for scrolling through Twitter but the longer you look the more surreal they get. This even includes the new ByteDance model also on the front page. But with coding models they ignore context of the codebase and the results feel more like patchwork. They feel like what you'd be annoyed at with a junior dev for writing because not only do you have to go through 10 PRs to make it pass the test cases but the lack of context just builds a lot of tech debt. How they'll build unit tests that technically work but don't capture the actual issues and usually can be highly condensed while having greater coverage. It feels very gluey, like copy pasting from stack overflow when hyper focused on the immediate outcome instead of understanding the goal. It is too "solution" oriented, not understanding the underlying heuristics and is more frustrating than dealing with the human equivalent who says something "works" as evidenced by the output. This is like trying to say a math proof is correct by looking at just the last line.
Ironically, I think in part this is why chat interface sucks too. A lot of our job is to do a lot of inference in figuring out what our managers are even asking us to make. And you can't even know the answer until you're part way in.
This is why I think LLMs can't really replace developers. 80% of my job is already trying to figure out what's actually needed, despite being given lots of text detail, maybe even spec, or prototype code.
Building the wrong thing fast is about as useful as not building anything at all. (And before someone says "at least you now know what not to do"? For any problem there are infinite number of wrong solutions, but only a handful of ones that yield success, why waste time trying all the wrong ones?)
Fully agree. Plus, you may be faster in the short term but you won't in the long run. The effects of both good code and bad code compound. "Tech debt" is just a fancy term for "compounding shit". And it is true, all code is shit, but it isn't binary; there is a big difference between being stepping in shit and being waist deep in shit.
I can predict some of the responses
There's a grave misunderstanding in this adage[0], and I think many interpret it as "don't worry about efficiency, worry about output." But the context is that you shouldn't optimize without first profiling the code, not that you shouldn't optimize![1] I also find it funny revisiting this quote, because it seems like it was written by a stranger in a strange land, where programmers are overly concerned with optimizing their code. These days, I hear very little about optimization (except when I work with HPC people) other than people saying not to optimize. Explains why everything is so sluggish...
[0] https://softwareengineering.stackexchange.com/a/80092
[1] Understanding the limitations of big O analysis really helps in understanding why this point matters. Usually when n is small, you can have worse big O and still be faster. But the constants we drop off often aren't a rounding error. https://csweb.wooster.edu/dbyrnes/cs200/htmlNotes/qsort3.htm
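(To put the "constants matter" point in concrete terms, here's a quick and unscientific sketch: an O(n^2) insertion sort against the built-in sort over lots of tiny arrays. Exact numbers vary by runtime; the point is only that the asymptotically worse algorithm can win when n is small.)

// O(n^2) insertion sort: terrible asymptotics, tiny constants.
function insertionSort(a: number[]): number[] {
  const out = a.slice();
  for (let i = 1; i < out.length; i++) {
    const x = out[i];
    let j = i - 1;
    while (j >= 0 && out[j] > x) {
      out[j + 1] = out[j];
      j--;
    }
    out[j + 1] = x;
  }
  return out;
}

// Compare on many small arrays, where constants dominate.
const small = Array.from({ length: 100_000 }, () =>
  Array.from({ length: 8 }, () => Math.random()),
);
console.time("insertion sort (n=8)");
for (const a of small) insertionSort(a);
console.timeEnd("insertion sort (n=8)");
console.time("built-in sort (n=8)");
for (const a of small) a.slice().sort((x, y) => x - y);
console.timeEnd("built-in sort (n=8)");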
Devil's advocate: because unless you're working in heavily dysfunctional organization, or are doing a live coding interview, you're not playing "guess the password" with your management. Most of the time, they have even less of a clue about how the right solution looks like! "Building the wrong thing" lets them diff something concrete against what they imagined and felt like it would be, forcing them to clarify their expectations and give you more accurate directions (which, being a diff against a concrete things, are less likely to be then misunderstood by you!). And, the faster you can build that wrong thing, the less money and time is burned to buy that extra clarity.
Doing 4 sprints over 2 months to make a prototype in order to save 3 60 minute meetings over a week where you do a few requirements analysis/proposal review cycles.
That’s a lot of effort for a prototype that you should be throwing away even if it does the right thing!
Are you sure you’re not gold plating your prototypes?
I see this not as opposed, but as part of requirements analysis/review - working in the abstract, with imagination and prose and diagrams, it's too easy to make invalid assumptions without anyone realizing it.
It is really hard to figure out what the right thing is. We humans don't do this just through chat. We experiment, discuss, argue, draw, and there's tons of inference and reliance upon shared understandings. There's a lot of associated context. You're right that a dysfunctional organization (not uncommon) is worse, but these things are still quite common in highly functioning organizations. Your explicit statement about management having even less of an idea of what the right solution is, is explicitly what we're pushing back against. Saying that that is a large part of a developer's job. I would argue that the main reason we have a better idea is due to our technical skills, our depth of knowledge, our experience. A compression machine (LLM) will get some of this, but there's a big gap when trying to get to the end point. Pareto is a bitch. We all know there is a huge difference between a demonstrating prototype and an actual product. That the amount of effort and resources are exponentially different. ML systems specifically struggle with detail and nuance, but this is the root of those resource differences.
I'll give an example for clarity. Consider the iPad: the existence of third party note taking apps can be interpreted as nothing short of Apple's failure. I mean for the love of god, you got the pencil and you can't pull up notes and interact with it like it is a piece of paper? It's how the damned thing is advertised! A third party note taking app should be interpreted by Apple as showing their weak points. But you can't even zoom in the notes app?! Sure, you can turn on the accessibility setting and zoom with triple tap (significantly diverging from the standard pinching gesture used literally everywhere else), but if you do this (assuming full screen) you are just zooming in on a portion of the actual screen and not zooming in the notes. So you get stupid results like not having access to your pen's settings. Which is extra important here given that the likely reason someone would zoom is to adjust details, and certainly you're going to want to adjust the eraser size. What I'm trying to say is that there's a lot of low hanging fruit here that should be incredibly obvious were you to actually use the application, dog-fooding. Instead, Apple is dedicating time to handwriting recognition and equation solving, which in practice (at least in my experience) end up creating a more jarring experience and cause more editing. Though it is cool when it does work. I'd say that here, Apple is not building the right thing. They are completely out of touch with the actual goals and experiences of the users. It's not that they didn't build a perfect app; it is that they fail to build basic functionality.
But of course, Apple probably doesn't care. Because they significantly prioritize profits over building a quality product. These are orthogonal aspects and they can be simultaneously optimized. One should not need pick one over another and the truth is that our economics should ensure alignment, that quality begets profits and that one can't "hack" the metrics.
Apple is far from alone here though. I'd say this "low hanging infuriating bullshit" is actually quite common. In fact, I think it is becoming more common. I have argued here before about the need for more "grumpy developers." I think if you're not grumpy, you should be concerned. Our job is to find problems, break them down into a way that can be addressed, and to resolve them. The "grumpiness" here is a dissatisfaction with the state of things. Given that nothing is perfect, there should always be reason to be "grumpy." A good developer should be able to identify and fix problems without being asked. But I do think there's a worrying decline of (lack of!) "grumpy" types, and I have no doubt this is connected to the rapid rise of vaporware and shitty products.
Also, I notice you play Devil's advocate a lot. While I think it can be useful, I think it can be overused. It needs to drive home the key limitations to an argument, especially when they are uncomfortable. Though I think in our case, I'm the one making the argument that diverges from the norm.
Have you tried Cursor? It has a great feature that grabs context from the codebase, I use it all the time.
I actually cancelled my Anthropic subscription when I started using Cursor, because I only ever used Claude for code generation anyway, so now I just do it within the IDE.
If only this feature worked consistently, or reliably even half of the time.
It will casually forget or ignore any and all context and any and all files in your codebase at random times, and you never know what set of files and docs it's working with at any point in time
Here's a simple example with GPT-4o: https://0x0.st/8K3z.png
It probably isn't obvious in a quick read, but there are mistakes here. Maybe the most obvious is that, the way `replacements` is built, we need to order it intelligently. This could be fixed by sorting. But is this the right data structure? Not to mention that the algorithm itself is quite... odd.
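(I can't inline the screenshot, but the ordering problem is the classic one: if `replacements` maps substrings to outputs and one key is a prefix of another, the longer key has to be applied first. A hedged sketch with made-up data, since the original code isn't shown here:)

// Hypothetical replacement table. In this order, "10" is never reached,
// because "1" and "0" get replaced first.
const replacements: [string, string][] = [
  ["1", "one"],
  ["0", "zero"],
  ["10", "ten"],
];

// Fix: sort by key length, longest first, before applying.
const ordered = [...replacements].sort((a, b) => b[0].length - a[0].length);

function applyReplacements(input: string): string {
  let out = input;
  for (const [key, value] of ordered) {
    out = out.split(key).join(value);
  }
  return out;
}

console.log(applyReplacements("101")); // "tenone" rather than "onezeroone"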
To give a more complicated example I passed the same prompt from this famous code golf problem[0]. Here's the results, I'll save you the time, the output is wrong https://0x0.st/8K3M.txt (note, I started command likes with "$" and added some notes for you)
Just for the heck of it, here's the same thing but with o1-preview
Initial problem: https://0x0.st/8K3t.txt
Codegolf one: https://0x0.st/8K3y.txt
As you can see, o1 is a bit better on the initial problem but still fails at the code golf one. It really isn't beating the baseline naive solution. It does 170 MiB/s compared to 160 MiB/s (baseline with -O3). This is something I'd hope it could do really well on, given that this problem is rather famous and so many occurrences of it should show up. There are tons of variations out there, and it is common to see parallel FizzBuzz in a class on parallelization, as it can teach important concepts like keeping the output in the right order.
But hey, at least o1 has the correct output... It's just that that's not all that matters.
I stand by this: evaluating code based on output alone is akin to evaluating a mathematical proof based on the result. And I hope these examples make the point why that matters, why checking output is insufficient.
[0] https://codegolf.stackexchange.com/questions/215216/high-thr...
Edit: I want to add that there's also an important factor here. The LLM might get you a "result" faster, but you are much more likely to miss the learning process that comes with struggling. Because that makes you much faster (and more flexible) not just next time but in many situations where even a subset is similar. Which yeah, totally fine to glue shit together when you don't care and just need something, but there's a lot of missed value if you need to revisit any of that. I do have concerns that people will be plateaued at junior levels. I hope it doesn't cause seniors to revert to juniors, which I've seen happen without LLMs. If you stop working on these types of problems, you lose the skills. There's already an issue where we rush to get output and it has clear effects on the stagnation of devs. We have far more programmers than ever but I'm not confident we have a significant number more wizards (the percentage of wizards is decreasing). There's fewer people writing programs just for fun. But "for fun" is one of our greatest learning tools as humans. Play is a common trait you see in animals and it exists for a reason.
That's interesting. I found assistants like Copilot fairly good at low level code, assuming you direct it well.
I do think that when I'm coding with an LLM it _feels_ faster, but when I've timed myself, it doesn't seem that way. It just seems to be less effort (I don't mind the effort, especially because of the compounding rewards).
1. I need a smart autocomplete that can work backwards and mimic my coding patterns
2. I need a pair programming buddy (of sorts, this metaphor doesn't completely work, but I don't have a better one)
Pair development, even a butchered version of the so called "strong style" (give the driver the highest level of abstraction they can use/understand) works quite well for me. But, the main reason this works is that it forces me to structure my thinking a little bit, allows me to iterate on the definition of the problem. Toss away the sketch with bigger parts of the problem, start again.
It also helps me to avoid yak shaving, getting lost in the detail or distracted because the feedback loop between me seeing something working on the screen vs. the idea is so short (even if the code is crap).
I'd also add 5.: use prompts to generate (boring) prompts. For instance, I needed a simple #tag formatter for one of my markdown sites. I am aware that there's a not-so-small list of edge cases I'd need to cover. In this case I'd write a prompt with a list of basic requirements and ask the LLM to: a) extend it with good practice, common edge cases b) format it as a spec with concrete input / output examples. This works a bit similar to the point you made about generating unit tests (I do that too, in tandem with this approach).
In a sense 1) is autocomplete 2) is a scaffolding tool.
Example:
- copy paste a table from a pdf datasheet into a comment (it'll be badly formatted with newlines and whatnot, doesn't matter)
- show it how to do the first line
- autocomplete the rest of the table
- check every row to make sure it didn't invent fields/types
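A made-up example of what that looks like in practice (the register table and fields here are invented for illustration):

// Pasted from the datasheet PDF (formatting mangled, doesn't matter):
// Register   Addr   Width  Reset   Description
// CTRL       0x00   8      0x01    Control register
// STATUS     0x01   8      0x00    Status flags
// DATA       0x02   16     0x0000  Data register
interface RegisterDef {
  name: string;
  addr: number;
  width: number;
  reset: number;
  description: string;
}
// Write the first entry by hand...
const registers: RegisterDef[] = [
  { name: "CTRL", addr: 0x00, width: 8, reset: 0x01, description: "Control register" },
  // ...then let the autocomplete fill in STATUS and DATA from the comment above,
  // and check every row against the datasheet afterwards.
];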
For this type of workflow the tools are a real time saver. I've yet to see any results for the other workflows. They usually just frustrate me by either starting to suggest nonsense code without full understanding, or it's far too easy to bias the results and leave them stuck in a pattern of thinking.
> 1. I need a smart autocomplete that can work backwards and mimic my coding patterns
> 2. I need a pair programming buddy (of sorts, this metaphor doesn't completely work, but I don't have a better one)
Thanks! This is the first time I've seen it put this clearly. When I first tried out CoPilot, I was unsure of how I was "supposed" to interact with it. Is it (as you put it) a smarter autocomplete, or a programming buddy? Is it both? What was the right input method to use?
After a while, I realized that for my personal style I would pretty much entirely use method 1, and never method 2. But, others might really need that "programming buddy" and use that interface instead.
- Text prompts and chat interfaces are great for coarse grained exploration. You can get a rough start that you can refine. "Knight standing in a desert, rusted suit of armor" gets you started, but you'll want to take it much further.
- Precision inputs (mouse or structure guided) are best for fine tuning the result and honing in on the solution itself. You can individually plant the cacti and pose the character. You can't get there with text.
It is helpful to frame this in the historical arc described by Yuval Harari in his recent book "Nexus" on the evolution of information systems. We're at the dawn of history for how to work with AI, and actively visualizing the future has an immediate ROI.
"Chat" is cave man oral tradition. It is like attempting a complex Ruby project through the periscope of an `irb` session. One needs to use an IDE to manage a complex code base. We all know this, but we haven't connected the dots that we need to approach prompt management the same way.
Flip ahead in Harari's book, and he describes rabbis writing texts on how to interpret [texts on how to interpret]* holy scriptures. Like Christopher Nolan's movie "Inception" (his second most relevant work after "Memento"), I've found myself several dreams deep collaborating with AI to develop prompts for [collaborating with AI to develop prompts for]* writing code together. Test the whole setup on multiple fresh AI sessions, as if one is running a business school laboratory on managerial genius, till AI can write correct code in one shot.
Duh? Good managers already understand this, working with teams of people. Technical climbers work cliffs this way. And AI was a blithering idiot until we understood how to simulate recursion in multilayer neural nets.
AI is a Rorschach inkblot test. Talk to it like a kindergartner, and you see the intelligence of a kindergartner. Use your most talented programmer to collaborate with you in preparing precise and complete specifications for your team, and you see a talented team of mature professionals.
We all experience degradation of long AI sessions. This is not inevitable; "life extension" needs to be tackled as a research problem. Just as old people get senile, AI fumbles its own context management over time. Civilization has advanced by developing technologies for passing knowledge forward. We need to engineer similar technologies for providing persistent memory to make each successive AI session smarter than the last. Authoring this knowledge helps each session to survive longer. If we fail to see this, we're condemning ourselves to stay cave men.
Compare the history of computing. There was a lot of philosophy and abstract mathematics about the potential for mechanical computation, but our worldview exploded when we could actually plug the machines in. We're at the same inflection point for theories of mind, semantic compression, structured memory. Indeed, philosophy was an untestable intellectual exercise before; now we can plug it in.
How do I know this? I'm just an old mathematician, in my first month trying to learn AI for one final burst of productivity before my father's dementia arrives. I don't have time to wait for anyone's version of these visions, so I computed them.
In mathematics, the line in the sand between theory and computation keeps moving. Indeed, I helped move it by computerizing my field when I was young. Mathematicians still contribute theory, and the computations help.
A similar line in the sand is moving, between visionary creativity and computation. LLMs are association engines of staggering scope, and what some call "hallucinations" can be harnessed to generalize from all human endeavors to project future best practices. Like how to best work with AI.
I've tested everything I say here, and it works.
Yesterday, I asked o3-mini to "optimize" a block of code. It produced very clean, functional TypeScript. However, because the code is reducing stock option chains, I then asked o3-mini to "optimize for speed." In the JavaScript world, this is usually done with for loops, and it even considered aspects like array memory allocation.
This shows that using the right qualifiers is important for getting the results you want. Today, I use both "optimize for developer experience" and "optimize for speed" when they are appropriate.
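To make the contrast concrete, here's the sort of before/after those two prompts tend to produce; this is a hypothetical sketch, not the actual option-chain code.

    // Hypothetical example: summing the open interest across an option chain.
    interface OptionContract {
      strike: number;
      openInterest: number;
    }

    // "Optimize for developer experience": clean and functional, but it
    // allocates an intermediate array for the filter step.
    function totalOpenInterest(chain: OptionContract[]): number {
      return chain
        .filter((c) => c.openInterest > 0)
        .reduce((sum, c) => sum + c.openInterest, 0);
    }

    // "Optimize for speed": a single for loop, no intermediate allocations.
    function totalOpenInterestFast(chain: OptionContract[]): number {
      let sum = 0;
      for (let i = 0; i < chain.length; i++) {
        const oi = chain[i].openInterest;
        if (oi > 0) sum += oi;
      }
      return sum;
    }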
Although declarative code is just an abstraction, moving from imperative jQuery to declarative React was a major change in my coding experience. My work went from telling the system how to do something to simply telling it what to do. Of course, in React—especially at first—I had to explain how to do things, but only once to create a component. After that, I could just tell the system what to do. Now, I can simply declare the desired outcome, the what. It helps to understand how things work, but that level of detail is becoming less necessary.
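For anyone who didn't live through that shift, here's a minimal, hypothetical sketch of the difference (a counter widget; simplified, not production code):

    import $ from "jquery";
    import React, { useState } from "react";

    // Imperative (jQuery-style): spell out *how* to mutate the DOM every time
    // the state changes.
    let clicks = 0;
    $("#counter").on("click", () => {
      clicks += 1;
      $("#counter").text(`Count: ${clicks}`);
    });

    // Declarative (React): describe *what* the UI should look like for a given
    // state, and let the library figure out the DOM updates.
    function Counter() {
      const [count, setCount] = useState(0);
      return <button onClick={() => setCount(count + 1)}>Count: {count}</button>;
    }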
As a side note, "No, you're wrong" is not a great way to have a conversation.
Sometimes typos are eerily appropriate ;)
(I almost typed "errily"...)
Chat sucks for pulling in context, and the only worse thing I've tried is the IDE integrations that supposedly pull the relevant context for you (and I've tried quite a few recently).
I don't know if naive fine-tuning on a codebase would work. I suspect there will be tools that let you train the AI on your code, in the sense that it has some references in the model and knows how you want your project's code and structure to look (which is often quite different from what it actually looks like in most areas).
I think chat is a nice intermediary evolution between the CLI (that we use every day) and whatever comes next.
I work at Augment (https://augmentcode.com), which, surprise surprise, is an AI coding assistant. We think about the new modality required to interact with code and AI on a daily basis.
Besides increasing productivity (and happiness, as you don't have to do mundane tasks like tests, documentation, etc.), I personally believe that what AI can open up is actually more of a way for non-coders (think PMs) to interact with a codebase. AI is really good at converting specs, user stories, and so on into tasks—which today still need to be implemented by software engineers (with the help of AI for the more tedious work). Think of what Figma did between designers and developers, but applied to coding.
What’s the actual "new UI/UX paradigm"? I don’t know yet. But like with Figma, I believe there’s a happy ending waiting for everyone.
Even then though, I asked o1-cursor to start a React app. It failed, mostly because it's out of date: its instructions were for React from two versions ago.
This seems like an issue. If the statistically most likely answer is old, that's not helpful.
I might be reading into your comment, but I agree "top-down" development sucks: "Give me a react that does X". I've had much more success going bottom-up.
And I've often seen models getting confused on versions. You need to be explicit, and even then they forget.
I'm now bracing for the "oh sht, we're all out of a job next year" narrative.
I don't want an LLM to generate "the answer" for me in a lot of places, but I do think it's amazing for helping me gather information (and cite where that information came from) and pointers in directions to look. A search engine that generates a concrete answer via LLM is (mostly) useless to me. One that gives me an answer and then links to the facts it used to generate that answer is _very_ useful.
It's the same way with programming. It's great at helping you find what you need. But it needs to be in a way that you can verify it's right, or take its answer and adjust it to what you actually need (based on the context it provides).
Maybe. My sense is we'd need to see 3 to 4 orders of magnitude improvement on the current models before we can replace people outright.
I do think we'll see a huge productivity boost per developer over the next few years. Some companies will use that to increase their throughput, and some will use it to reduce overhead.
Perhaps there's gonna be a post-AI programming movement where people actually stare at the same monitor and discuss while one of them is coding.
As a sidenote - we've done experiments with FOBsters, and when paired this way, they multiply their output. There's something about the psychology of groups, and how one can only provide maximum output when teaming up.
Even for solo activities, and non-IT activities, such as skiing/snowboard, it is better to have a partner to ride with you and discuss the terrain.
Works amazingly well for a lot of what I've been working on the past month or two.
Copilot, back when I used it, completely ignored context outside of the file I was working in. As of a few weeks ago, Copilot was the absolute dumbest assistant of all the various options available.
With cursor, I can ask it to make a change to how the app generates a JWT without even knowing which file or folder the relevant code is in. For very large codebases, this is very very helpful.
My experiments have been nowhere near that successful.
I would love, love, love to see a transcript of how that process worked over an hour, if that was something you were willing to share.
I do all this + rubber ducky the hell out of it.
Sometimes I just discuss concepts of the project with the thing and it helps me think.
I dont think chat is going to be right for everyone but it absolutely works for me.
GitHub Copilot is...not. It doesn't seem to understand how to help me as well as ChatGPT does.
This is what I've found to be key. If I start a new feature, I will work with the LLM to do the following:
- Create problem and solution statement
- Create requirements and user stories
- Create architecture
- Create skeleton code. This is critical since it lets me understand what it wants to do.
- Generate a summary of the skeleton code
Once I have done the above, I will have the LLM generate a reusable prompt that I can use to start LLM conversations with. Below is an example of how I turn everything into a reusable prompt.
https://beta.gitsense.com/?chat=b96ce9e0-da19-45e8-bfec-a3ec...
As I make changes, like adding new files, I will need to generate a new prompt, but it is worth the effort. And you can see it in action here.
https://beta.gitsense.com/?chat=b8c4b221-55e5-4ed6-860e-12f0...
The first message is the reusable prompt message. With the first message in place, I can describe the problem or requirements and ask the LLM what files it will need to better understand how to implement things.
What I am currently doing highlights how I think LLM is a game changer. VCs are going for moonshots instead of home runs. The ability to gather requirements and talk through a solution before even coding is how I think LLMs will revolutionize things. It is great that it can produce usable code, but what I've found it to be invaluable is it helps you organize your thoughts.
In the last link, I am having a conversation with both DeepSeek v3 and Sonnet 3.5 and the LLMs legitimately saved me hours in work, without even writing a single line of code. In the past, I would have just implemented the feature and been done with it, and then I would have to fix something if I didn't think of an edge case. With LLMs, it literally takes minutes to develop a plan that is extremely well documented that can be shared with others.
This ability to generate design documents is how I think LLMs will ultimately be used. The bonus is producing code, but the reality is that documentation (which can be tedious and frustrating) is a requirement for software development. In my opinion, this is where LLMs will forever change things.
That is to say: I think LLMs are going to make a problem we already had (much) worse.
Came to vote good too. I mean, why do we all love a nice REPL? That's chat right? Chat with an interpreter.
I paste that into Claude and it is surprisingly good at fixing bugs and making visual modifications.
why the top comments on HN are always people who have not read the article
In large, I assert this is because the best way to do something is to do that thing. There can be correspondence around the thing, but the artifacts that you are building are separate things.
You could probably take this further and say that narrative is a terrible way to build things. It can be a great way to communicate them, but being a separate entity, it is not necessarily good at making any artifacts.
Chat is a great UI pattern for ephemeral conversation. It's why we get on the phone or on DM to talk with people while collaborating on documents, and don't just sit there making isolated edits to some Google Doc.
It's great because it can go all over the place and the humans get to decide which part of that conversation is meaningful and which isn't, and then put that in the document.
It's also obviously not enough: you still need documents!
But this isn't an "either-or" case. It's a "both" case.
I have never had any issue finding information in slack with history going back nearly a decade. The only issue I have with Slack is a people problem where most communication is siloed in private channels and DMs.
Email threads are incredibly hard to follow though. The UX is rough and it shows.
The fact that there's a subject header alone leads people to both stay on topic and have better thought out messages.
I agree that email threads could have better UX. Part of that is the clients insistence on appending the previous message to every reply. This is completely optional though and should probably be turned off by default for simple replies.
Email is really powerful but people simply aren't good at taking advantage of it and it varies by email client. Doing some IT work at a startup made this pretty clear to me. I found Slack was much more intuitive for people.
Both systems rely on the savviness of the users for the best experience and I just think email is losing the UX war. Given how terrible people seem to be at communicating I think it's a pretty important factor to consider.
I've also generally had the opposite experience, a huge amount of business offices live and breath in email (mostly Outlook, but I'm sure it varies). Startups tend to run fast and lean, but as soon as you have some threshold of people, email is king.
I'm not hating on email, it has a lot of good properties and still serves a purpose. Every office appears to have some kind of anti-Slack vigilante. It's really not that bad.
It doesn't help that Outlook's search capabilities have gotten effectively useless - I can type in search terms that I'm literally looking at in my inbox and have it return no results, or have it return dozens of hits without the search terms involved at all. I don't have that problem with Slack or Teams.
However, I think you are right that email is better overall for what people end up using chat apps for.
People love complaining about the email workflow of git, but it is demonstrably better than any chat program for what it is doing.
I'm thinking of things that are assembled. The correspondence that went into the assembly is largely of historical interest, but not necessarily one of current use.
Is your issue that you want to discuss the thing you are collaborating on outside of the tool you are creating it in?
We have some tools integrated with email to help version control things. But the actual version control is, strictly, not the emails.
Talking to a computer still sucks as a user interface - not because a computer can't communicate on multiple channels the way people do, as it can do that now too. It sucks for the same reason talking to people sucks as a user interface - because the kind of tasks we use computers for (and that aren't just talking with/to/at other people via electronic means) are better handled by doing than by talking about them. We need an interface to operate a tool, not an interface to an agent that operates a tool for us.
As an example, consider driving (as in, realtime control - not just "getting from point A to B"): a chat interface to driving would suck just as badly as being a backseat driver sucks for both people in the car. In contrast, a steering wheel, instead of being a bandwidth-limiting indirection, is an anti-indirection - not only does it let you control the machine with your body, the control is direct enough that over time your brain learns to abstract it away, and the car becomes an extension of your body. We need more tangible interfaces like that with computers.
The steering wheel case, of course, would fail with "AI-level smarts" - but that still doesn't mean we should embrace talking to computers. A good analogy is dance - it's an interaction between two independently smart agents exploring an activity together, and as they do it enough, it becomes fluid.
So dance, IMO, is the steering wheel analogy for AI-powered interfaces, and that is the space we need to explore more.
Excellent comment and it gets to the heart of something I've had trouble clearly articulating: We've slowly lost the concept that a computer is a tool that the user wields and commands to do things. Now, a computer has its own mind and agency, and we "request" it to do things and "communicate" with it, and ask it to run this and don't run that.
Now, we're negotiating and pleading with the man inside of the computer, Mr. Computer, who has its own goals and ambitions that don't necessarily align with your own as a user. It runs what it wants to run, and if that upsets you, user, well tough shit! Instead of waiting for a command and then faithfully executing it, Mr. Computer is off doing whatever the hell he wants, running system applications in the background, updating this and that, sending you notifications, and occasionally asking you for permission to do even more. And here you are as the user, hobbled and increasingly forced to "chat" with it to get it to do what you want.
Even turning your computer off! You used to throw a hardware switch that interrupts the power to the main board, and _sayonara_ Mr. Computer! Now, the switch does nothing but send an impassioned plea to the operating system to pretty please, with sugar on top, when you're not busy could you possibly power off the computer (or mostly power it off, because off doesn't even mean off anymore).
The modern OS values the system's theoretical 'system health' metrics far above things like "whether the user can use it to do some user task."
Another great example is how you can't boot a modern Mac laptop, on AC power, until it has decided its battery is sufficiently charged. Why? None of your business.
Anyway to get back on topic, this is an interesting connection you've made, the software vendor will perhaps delegate decisions like "is the user allowed to log into the computer at this time" or "is a reboot mandatory" to an "agent" running on the computer. If we're lucky we'll get to talk to that agent to plead our case, but my guess is Apple and Microsoft will decide we aren't qualified to have input to the decisions.
1: https://support.apple.com/en-us/102149
FWIW this is what happens with modern steering wheels as well. Power steering is its own complicated subsystem that isn't just about user input. It has many more failure modes than an old-fashioned, analog steering wheel. The reason folks feel like "Mr. Computer" has a mind of its own is the mismatch between user desire and effect. This is a UX problem.
I also think chat and RAG are the two biggest UX paradigms we've spent time exploring when it comes to LLMs. It's probably worth folks exploring other UX for LLMs that are enabling for the user. Suggestions in documents and code seem to be a UX that more people enjoy using, but even then there's a mismatch.
[1] https://dynamicland.org/
EDIT: love your analogy to dance!
But, as you say, a chat interface would be a terrible way to actively drive a car. And that is a different thing, but I'm growing convinced many will focus on the first idea while staving off the complaints of the latter.
In another thread, I assert that chat is probably a fine way to order up something that fits a repertoire that trained a bot. But, I don't think sticking to the chat window is the best way to interface with what it delivers. You almost certainly want to be much more actively "hands on" in very domain specific ways with the artifacts produced.
Yes, this is what I've also tried to hint at in my comment, but failed part-way. In most of the cases I can imagine chat interface to be fine (or even ideal), it's really only good as a starting point. Take two examples based on your reply:
1) Getting a car ride. "Computer, order me a cab home" is a good start. It's even OK if I then get asked to narrow it down between several different services/fares (next time I'll remember to specify that up front). But if I want to inspect the route (or perhaps adjust it, in a hypothetical service that supports it), I'd already prefer an interactive map I can scroll and zoom, with PoIs I can tap on to get their details, than to continue a verbal chat.
2) Ordering food in a fast food restaurant. I'm fine starting it with a conversation if I know what I want. However, getting back the order summary in prose (or worse, read out loud) would already be taxing, and if I wanted to make final adjustments, I'd beg for buttons and numeric input boxes. And, in case I don't know what I want, or what is available (and at what prices), a chat interface is a non-starter. Interactive menu is a must.
You sum this up perfectly:
> You almost certainly want to be much more actively "hands on" in very domain specific ways with the artifacts produced.
Chat may be great to get that first artifact, but afterwards, there's almost always a more hands-on interface that would be much better.
You have some of the same problems with email, of course. Losing threading, in particular, made things worse. It was a "chatification of email" that caused people to lean in to email being bad. Amusing that we are now seeing chat applications rise to replace email.
I'm actually in an awkward position where I was very supportive of RTO two years ago, but have since become very reliant on some things I could not do with a rigid RTO policy.
Regardless of RTO or WFH, patience and persistence remain vital qualities.
I really don't like the idea of chatting with an AI though. There are better ways to interface with AIs and the focus on chat is making people forget that.
As is often the case in these sorts of things, your mileage may vary for the more complex settings.
The last counter argument I read got buried on Discord or Slack somewhere.
My experience with email is that people use subject lines; email explicitly identifies To and CC recipients; email is threaded; and email often has quotes/excerpting/highlighting from prior parts of the thread.
On the other hand, most chat usage I see is dependent on temporal aspects for threading (people under-utilize platform features for replies etc), tagging is generally only done to ping people to attract attention, chat groups are frequently reused for multiple different purposes.
Leaping to a point-in-time within a chat stream is often a bad user experience, with having to scroll up and down through unrelated stuff to find what you’re looking for.
Stuff in email is just massively more discoverable for me.
This obviously assumes that whoever wrote the email isn't a madman who insists on using email like it was a chat.
While all of these features could in principle be realized in a chat system as well, in practice they don’t provide that flexibility and power.
Another usability feature of emails is that they have a subject line. This allows to meaningfully list emails in a compact fashion. In a desktop interface, you can easily view and visually grep 50 emails or more at once in a mail folder or list of search results (in something like Outlook or Thunderbird or Mutt). This allows working with emails more efficiently than with a chat view where you can only see a few messages at once, and only of the same thread or channel.
Yet another usability feature of emails is that each email has its own read/unread status. This, again, is facilitated by each email being its own separate data object, and by the separation between subject and body, which allows the read status to be unambiguously bound to “opening” the email, or to navigating in the list of emails alongside a preview pane. And you can mark any email as unread again. In chats, the granularity of read/unread is the whole chat, whether you’ve actually read all of it or not. You can’t easily track what you’ve read or not in an automated way as with email, other than by that coarse-grained linear time-based property of when you last visited the channel.
This is why, if something is important, I take it out of email and put it into a document people can reference. The latest and correct information from all the decisions in the thread can also be collected in one place, so everyone reading doesn’t have to figure it out. Not to mention side conversations can influence the results, without being explicitly stated in the email thread.
This is how things should be done, regardless of which medium is used to discuss the project. Without isolating and aggregating the final decision of each thread, there is no way to determine what everybody has agreed upon as the final product without looking back, which quickly becomes onerous.
Things get messy when you start having different versions of each feature, but that doesn't change the concept of using email/Slack/Discord/text/etc. for discussion and a separate "living" document for formalizing those decisions.
Here, ephemeral means "this conversation might as well never had happened", so why waste time on that?
Does that mean I can't have some pleasure in conversing about things? Of course not. But, I also enjoy some pleasure there from the low stakes and value that a conversation has. It should be safe to be wrong. If you have a conversation spot where being wrong is not safe, then I question what is the advantage of that over trying to adopt a legalese framework for all of your communication?
So now imagine such (idealized) HN threads transplanted to Discord or Slack. Same people, same topics, same insights, just unrolling in the form of a regular chat. All that value, briefly there to partake in, and then forever lost after however much time it takes for it to get pushed up a few screens worth of lines in the chat log. People don't habitually scroll back very far on a regular basis (and the UI of most chat platforms starts to rapidly break down if you try), and the lack of defined structure (bounded conversations labeled by a topic) plus weak search tools means you're unlikely to find a conversation again even if you know where and when it took place.
That, plus ephemeral nature of casual chat means not just the platform, but also some of the users expect it to quickly disappear, leading to what I consider anti-features such as the ability to unilaterally edit or unsend any message at arbitrary time in the future. It takes just one participant deciding, for whatever reason, to mass-delete their past messages, for many conversations to lose most of their value forever.
--
[0] - Especially that the traditional communication style, both private and business, is overly verbose. Quite like a chat, in fact, but between characters in a theatrical play - everyone has longer lines.
I think my mental model is more for chat rooms to take the place of coffee room chats. Ideally, some of those do push on something to happen. I'm not sure that forcing them into the threaded structure of conversations really helps, though?
Maybe it is based on the aim? If the goal is a simulacrum of human contact, then I think ephemeral makes a ton of sense.
I also kind of miss the old tradition of having a "flamewars" topic in newsgroups. I don't particularly love yelling at each other, but I do hate that people can't bring up some topics.
(I also miss some old fun newsgroups. I recall college had Haiku and a few other silly restrictive style groups that were just flat fun.)
I also confess this model of ephemeral conversation is amusing in this specific website. Which I also largely view as a clubhouse conversation that is also best viewed as ephemeral. But it is clearly held for far longer than that idea would lead me to think.
I wish I could block them within all these chat apps.
"Sorry, you can't bother to send voice messages to this person."
I think this applies to any “fuzzy generation” scenario. It certainly shouldn’t be the only tool, and (at least as it stands today) isn’t good enough to finalize and fine tune the final result, but a series of “a foo with a bar” “slightly less orange” “make the bar a bit more like a fizzbuzz” interactions with a good chat UI can really get a good 80% solution.
But like all new toys, AI and AI chat will be hammered into a few thousand places where it makes no sense until the hype dies down and we come up with rules and guidelines for where it does and doesn’t work
I heavily disagree here, chat - or really text - is a horrible UI for image generation, unless you have almost zero idea of what you want to achieve and you don't really care about the final results.
Typing "make the bar a bit more like a fizzbuzz" in some textbox is awful UX compared to, say, clicking on the "bar" and selecting "fizzbuzz" or drag-and-dropping "fizzbuzz" on the "bar" or really anything that takes advantage of the fact we're interacting with a graphical environment to do work on graphics.
In fact it is a horrible UI for anything, except perhaps chatbots and tasks that have to do with text like grammar correction, altering writing styles, etc.
It is helpful for impressing people (especially people with money) though.
That assumes that you have a UX capable of determining what you're clicking on in the generated image (which we could call a given if we assume a sufficiently capable AI model since we're already instructing it to alter the thing), and also that it can determine from your click that you've intended to click on the "foo" not the "greeble" that is on the foo or the shadow covering that part of the foo or anything else that might be in the same Z stack as your intended target. Pixel bitching adventure games come to mind as an example of how badly this can go for us. And yes, this is solvable, Squeak has a UI where repeatedly clicking in the same spot will iterate through the list of possibilities in that Z stack. But it could also get really messy really quickly.
Then we have to assume that your UX will be able to generate an entire list of possible things you might want to be able to do with that thing that you've clicked, including adding to it, removing it, removing part of it, moving it, transforming its dimensions, altering its colors, altering the material surface and on and on and on. And that list of possibilities needs to be navigable and searchable in a way that's faster than just typing "make the bar more like a fizzbuzz" into a context-aware chat box.
Again, I'm not arguing the chat interface should be the only interface. In fact, as you point out we're using a graphical system, it would be great if you could click on things or select them and have the system work on too. It should be able to take additional input than just chat. But I still think for iterating on a fuzzy idea, a chat UI is a useful tool.
Even with game development. Level editors have a good history for being how people actually make games. Some quite good ones, I should add.
For website development, many template based systems worked quite well. People seem hellbent on never acknowledging that form builders of the late 90s did, in fact, work.
Is it a bit nicer that you can do everything through a dialog? I'm sure it is a great for people that think that way.
- "App builders" that use some combination of drag&drop UI builders, and design docs for architecture, workflows,... and let the UI guess what needs to be built "under the hood" (a little bit in the spirit of where UML class diagrams were meant to take us). This would still require actual programming knowledge to evaluate and fix what the bot has built
- Formal requirement specification that is sufficiently rigorous to be tested against automatically. This might go some way towards removing the requirement to know how to code, but the technical challenge would simply shift to knowing the specification language
But, if you want to start doing "domain specific" edits to the artifacts that are made, you are almost certainly going to want something like the app builders idea. Down thread, I mention how this is a lot like procedural generative techniques for game levels and such. Such that I think I am in agreement with your first bullet?
Similarly, if you want to make music with an instrument, it will be hard to ignore playing with said instrument more directly. I suspect some people can create things using chat as an interface. I just also suspect directly touching the artifacts at play is going to be more powerful.
I think I agree with the point on formal requirements. Not sure how that really applies to chat as an interface? I think it is hoping for a "laws of robotics" style that can have a test to confirm them? Reality could surprise me, but I always viewed that as largely a fiction item.
TLDR: Targeted edits and prompts / Heads Up Display
It should probably be more like an overlay (and hooked into context menus with suggestions, inline context bubbles when you want more context for a code block) and make use of an IDE problems view. The problems view would have to be enhanced to allow it to add problems that spanned multiple files, however.
Probably like the Rust compiler output style, but on steroids.
There would likely be some chatting required, but it should all be at a particular site in the code and then go into some history bank where you can view every topic you've discussed.
For authoring, I think an interactive drawing might be better, allowing you to click on specific areas and then use shorter phrasing to make an adjustment instead of having an argument in some chat to the left of your screen about specificity of your request.
Multi-point / click with minimal prompt. It should understand based on what I clicked what the context is without me having to explain it.
The space is ripe for folks with actual domain expertise to design an appropriate AI workflow for their domain.
I created the tetr app[1] which is basically “chat UI for everything”. I did that because I used to message myself notes and wanted to expand it to many more things. There’s not much back and forth, usually 1 input and instant output (no AI), still acting like a chat.
I think there’s a lot of intuitiveness with chat UI and it can be a flexible medium for sharing different information in a similar format, minimizing context switching. That’s my philosophy with tetr anyhow.
[1] https://tetr.app/
It's usually not. Narrative is a famously flawed way to communicate or record the real world.
It's great for generating engagement, though.
...and yet with its flaws, it's the most flexible way of conveying meaning. A Ted Chiang interview was on the HN frontpage a few days ago; in it, he mentions that humans created multiple precise, unambiguous communication modes, like the equations used in mathematical papers and proofs. But those same papers are not 100% equations; the mathematicians have to fall back to flawed language to describe and provide context, because those formal languages only capture a smaller range of human thought compared to natural language.
This is not to say chat has the best ergonomics for development - it's not - but one has to remember that the tools are based on Large Language Models, whose one trick is manipulating language. Better ergonomics would likely come from models trained or fine-tuned on AST tokens and diffs. They'd still need to modulate on language (understanding requirements, hints, variable names, and authoring comments, commits and/or PRs).
Incidentally I think that's also a good model for how much to trust the output - you might have a colleague who knows enough about X to think they can answer your question, but they're not necessarily right, you don't blindly trust it. You take it as a pointer, or try the suggestion (but not surprised if it turns out it doesn't work), etc.
I’ve been saying this since 2018
I think it is brilliant. On the other hand, I've caught myself many times writing prompts to colleagues. Although it made the requirements of what I need so much clearer for them.
Agreed that copy pasting context in and out of ChatGPT isn't the fastest workflow. But Cursor has been a major speed up in the way I write code. And it's primarily through a chat interface, but with a few QOL hacks that make it way faster:
1. Output gets applied to your file in a git-diff style. So you can approve/deny changes.
2. It (kinda) has context of your codebase so you don't have to specify as much. Though it works best when you explicitly tag files ("Use the utils from @src/utils/currency.ts")
3. Directly inserting terminal logs or type errors into the chat interface is incredibly convenient. Just hover over the error and click the "add to chat"
I’ve only been slowed down with AI tools. I tried for a few months to really use them and they made the easy tasks hard and the hard tasks opaque.
But obviously some people find them helpful.
Makes me wonder if programming approaches differ wildly from developer to developer.
For me, if I have an automated tool writing code, it’s bc I don’t want to think about that code at all.
But since LLMs don’t really act deterministically, I feel the need to double check their output.
That’s very painful for me. At that point I’d rather just write the code once, correctly.
Scripting assistance by itself is worth the price of admission.
The other thing I've found it good at is giving me an English description of code I didn't write... I'm sure it sometimes hallucinates, but never in a way that has been so wrong that it's been apparent to me.
Very good for throwaway code though, for example a PoC which won't really be going to production (hopefully xD).
Although most recently I caught it because I fed it into both gpt-4o and o1 and o1 had the correct flags. Then I asked 4o to expand the flags from the short form to the long form and explain them so I could double-check my reasoning as to why o1 was correct.
I feel the same
> That’s very painful for me. At that point I’d rather just write the code once, correctly.
I use AI tools augmentatively, and it's not painful for me, perhaps slightly inconvenient. But for boiler-plate-heavy code like unit tests or easily verifiable refactors[1], adjusting AI-authored code on a per-commit basis is still faster than me writing all the code.
1. Like switching between unit-test frameworks
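To illustrate with a made-up example of what "easily verifiable" means here: the same test ported from Mocha/Chai to Jest is a mechanical change you can review at a glance.

    // Before: Mocha + Chai version of add.spec.ts (add() is the code under test).
    import { expect } from "chai";
    import { add } from "./add";

    describe("add", () => {
      it("adds two numbers", () => {
        expect(add(2, 3)).to.equal(5);
      });
    });

    // After: the same file ported to Jest. Only the assertion style changes
    // (Jest provides describe/it/expect as globals, so the chai import goes away).
    //
    //   describe("add", () => {
    //     it("adds two numbers", () => {
    //       expect(add(2, 3)).toBe(5);
    //     });
    //   });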
Leaning on these tools just isn’t for me rn.
I like them most for one off scripts or very small bash glue.
The chat interface is... fine. Certainly better integrated into the editor than GitHub Copilot's, but I've never really seen the need to use it as chat—I ask for a change and then it makes the change. Then I fix what it did wrong and ask for another change. The chat history aspect is meaningless and usually counterproductive, because it's faster to fix its mistakes myself than to keep everything in the chat window while prodding it the last 20% of the way.
"how do I check the cors bucket policies on [S3 bucket name]"
hint: you don't get paid to get the LLM to output perfect code, you get paid by PRs submitted and landed. Generate the first 80% or whatever with the LLM, and then finish the last 20% that you can write faster than the LLM yourself, by hand.
End users don’t care where the code came from.
Zed makes it trivial to attach documentation and terminal output as context. To reduce risk of hallucination, I now prefer working in static, strongly-typed languages and use libraries with detailed documentation, so that I can send documentation of the library alongside the codebase and prompt. This sounds like a lot of work, but all I do is type "/f" or "/t" in Zed. When I know a task only modifies a single file, then I use the "inline assist" feature and review the diffs generated by the LLM.
Additionally, I have found it extremely useful to actually comment a codebase. LLMs are good at unstructured human language, it's what they were originally designed for. You can use them to maintain comments across a codebase, which in turn helps LLMs since they get to see code and design together.
Last weekend, I was able to re-build a mobile app I made a year ago from scratch with a cleaner code base, better UI, and implement new features on top (making the rewrite worth my time). The app in question took me about a week to write by hand last year; the rewrite took exactly 2 days.
---
As a side note: a huge advantage of Zed with locally-hosted models is that one can correct the code emitted by the model and force the model to re-generate its prior response with those corrections. This is probably the "killer feature" of models like qwen2.5-coder:32b. Rather than sending extra prompts and bloating the context, one can just delete all output from where the first mistake was made, correct the mistake, then resume generation.
Imperative:
- write a HTTP server that serves jokes
- add a healthcheck endpoint
- add TLS and change the serving port to 443
Declarative:
- a HTTP server that serves jokes
- contains a healthcheck endpoint
- supports TLS on port 443
The differences here seem minimal because you can see all of it at once, but in the current chat paradigm you'd have to search through everything you've said to the bot to get the full context, including the side roads that never materialized.
In the document approach you're constantly refining the document. It's better than reviewing the code because (in theory) you're looking at "support TLS on port 443" instead of a lot of code, which means it can be used by a wider audience. And ideally I can give the same high level spec to multiple LLMs and see which makes the best application.
For example, for a signup page, we could have: - Declarative: Signup the user using their email address - Imperative: To do the same, we will need to implement the smtp library, which means discovering that we need an SMTP server, so now we need to choose which one. And when purchasing an SMTP Server plan, we discover that there are rate limit, so now we need to add some bot protection to our signup page (IP Rate Limit only? ReCaptcha? Cloudflare bot protection?), etc
Which means that at the end, the imperative code way is kind of like the ultimate implementation specs.
The source of truth would still be the code though, otherwise the declarative statements would get so verbose that they wouldn't be any more useful than writing the code itself.
Cursor also does a great job of showing inline diffs of what composer is doing, so you can quickly review every change.
I don’t think there’s any reason Continue can’t match these features, but it hadn’t, last I checked.
Cursor also focuses on sane defaults, which is nice. The tab completion model is very good, and the composer model defaults to Claude 3.5 Sonnet, which is arguably the best non-reasoning code model. (One would hope that Cursor gets agent-composer working with reasoning models soon.) Continue felt much more technical… which is nice for power users, but not always the best starting place.
I found interacting with it via chat to be super-useful and a great way to get stuff done. Yeah, sometimes you just have to drop into the code, and tag a particular line and say "this isn't going to work, rewrite it to do x" (or rewrite it yourself), but the ability to do that doesn't vitiate the value of the chat.
https://zed.dev/docs/assistant/assistant-panel#editing-a-con...
It's still a "chat" but it's just text at the end of the day. So you can edit as you see fit to refine your context and get better responses.
e.g. executives treat the org as a blackbox LLM and chat with it to get real results
So you either need lots of extra text to remove the ambiguity of natural language if you use AI or you need a special precise subset to communicate with AI and that’s just programming with extra steps.
Real projects don't require an infinitely detailed specification either, you usually stop where it no longer meaningfully moves you towards the goal.
The whole premise of AI developer automation, IMO, is that if a human can develop a thing, then AI should be able to as well, given the same input.
With a 200k token window like Claude has you can already dump a lot of design docs / transcripts / etc. at it.
If you use cline with any large-context model, the results can be pretty amazing. It's not close to self-guiding; you still need to break down and analyze the problem and provide clear and relevant instructions, i.e. you need to be a great architect. Once you are stable on the direction, it's awe-inspiring to watch it do the bulk of the implementation.
I do agree that there is space to improve over embedded chat windows in IDEs. Solutions will come in time.
By the way, remind me why you need design meetings in that ideal world?:)
> Real projects don't require an infinitely detailed specification either, you usually stop where it no longer meaningfully moves you towards the goal.
The point was that specification is not detailed enough in practice. Precise enough specification IS code. And the point is literally that natural language is just not made to be precise enough. So you are back where you started
So you waste time explaining in detail and rehashing requirements in this imprecise language until you see what code you want to see. Which was faster to just... idk.. type.
Joking aside, this is likely where we will end up, just with a slightly higher programming interface, making developers more productive.
All the same buzzwords, including "AI"! In 1981!
Having a feedback loop is the only viable way to do this. Sure, the client could give you a book on what they want, but often people do not know their edge cases, what issues may arise, etc.
haha, I just imagined sending TypeScript to ChatGPT and having it spit my TypeScript back to me. "See guys, if you just use Turing-complete logically unambiguous input, you get perfect output!"
If you know how to program, then I agree, and that's part of why I don't see the point. If you don't know how to program, then the prompt isn't much different from providing the specs/requirements to a programmer.
In other words, if you replace natural language with a programming language then the computer will do a good job of interpreting your intent. But that's always been true, so...
In other word, complex applications can still be fully specified in plain English, even if it might take more words.
In plain English, of course, but not in natural English. When using language naturally one will leave out details, relying on other inputs, such as shared assumptions, to fill in the gaps. Programming makes those explicit.
Likewise for English: one can use natural English to add as many details as necessary, depending on who you're talking to, e.g. "Make an outline around the input field, and color the outline #ff0000." You can then add, if necessary, "Make the corners of the outline rounded with a 5 pixel radius."
In this respect, complex applications can be fully specified in English; we usually call those documents "formal specifications." You can write it terse, non-natural language with consistent, defined terminology to save room (as most specs are), or colloquial (natural) language if you really want. I wouldn't recommend the latter, but it's definitely useful when presenting specs to a less technically informed audience.
Of course. We established that at the beginning. The entire discussion is about exactly that. It was confirmed again in the previous comment. However, that is not natural. I expect most native English speakers would be entirely incapable of fully specifying a complex application or anything else of similar complexity. That is not natural use.
While the words, basic syntax, etc. may mirror that found in natural language, a specification is really a language of its own. It is nothing like the language you will find people speaking at the bar or when writing pointless comments on Reddit. And that's because it is a programming language.
Your original postulation was that it simply wasn't possible, implying nobody could do it. The fact that most native English speakers wouldn't be able to do it doesn't mean nobody can do it.
I agree that most native English speakers wouldn't be able to write a reasonably complete spec in any type of language, not just because they lack the language skill, but because they simply wouldn't have the imagination and knowledge of what to create to begin with, let alone how to express it.
An LLM can do increasingly well as a fly on the wall, but it’s common for people using an LLM to be less collaborative with an LLM and for them to expect the LLM to structure the conversation. Hence the suggestion to be careful in your prompting.
Right. On one side you have programming language and on the other natural language.
They can intermingle, if that is what you are trying to say? You can see this even in traditional computer programming. One will often switch between deliberate expression and casual, natural expression (what often get called comments in that context).
They all usually build down to a subset of English, because near-caveman speak is enough to define things with precision.
Which model are you talking about here? Because with ChatGPT, I struggle with getting it to ask any clarifying questions before just dumping code filled with placeholders I don't want, even when I explicitly prompt it to ask for clarification.
The struggle is to provide a context that disambiguates the way you want it to.
LLMs solve this problem by avoiding it entirely: they stay ambiguous, and just give you the most familiar context, letting you change direction with more prompts. It's a cool approach, but it's often not worth the extra steps, and sometimes your context window can't fit enough steps anyway.
My big idea (the Story Empathizer) is to restructure this interaction such that the only work left to the user is to decide which context suits their purpose best. Given enough context instances (I call them backstories), this approach to natural language processing could recursively eliminate much of its own ambiguity, leaving very little work for us to do in the end.
Right now my biggest struggle is figuring out what the foundational backstories will be, and writing them.
The advantage of natural language is that we can write ambiguously defined expressions, and infer their meaning arbitrarily with context. This means that we can write with fewer unique expressions. It also means that context itself can be more directly involved in the content of what we write.
In context-free grammar, we can only express "what" and "how"; never "why". Instead, the "why" is encoded into every decision of the design and implementation of what we are writing.
If we could leverage ambiguous language, then we could factor out the "why", and implement it later using context.
I think about this like SQL in the late 80s. At the time, SQL was the “next big thing” that was going to mean we didn’t need programmers, and that management could “write code”. It didn’t quite work out that way, of course, as we all know.
I see chat-based interfaces to LLMs going exactly the same way. The LLM will move down the stack (rather than up) and much more appropriate task-based UX/UI will be put on top of the LLM, coordinated thru a UX/UI layer that is much sympathetic to the way users actually want to interact with a machine.
In the same way that no end-users ever touch SQL these days (mostly), we won’t expose the chat-based UX of an LLM to users either.
There will be a place for an ad-hoc natural language interface to a machine, but I suspect it’ll be the exception rather than the rule.
I really don’t think there are too many end users who want to be forced to seduce a mercurial LLM using natural language to do their day-to-day tech tasks.
Only when someone discovers another paradigm that matches or exceeds the effectiveness of LLMs without being a language model.
If you asked me two or three years ago I would have strongly agreed with this theory. I used to point out that every line of code was a decision made by a programmer and that programming languages were just better ways to convey all those decisions than human language because they eliminated ambiguity and were much terser.
I changed my mind when I saw how LLMs work. They tend to fill in the ambiguity with good defaults that are somewhere between "how everybody does it" and "how a reasonably bright junior programmer would do it".
So you say "give me a log on screen" and you get something pretty normal with Username and Password and a decent UI and some decent color choices and it works fine.
If you wanted to provide more details, you could tell it to use the background color #f9f9f9, but part of what surprised me and caused me to change my mind on this matter was that you could also leave that out and you wouldn't get an error; you wouldn't get white text on a white background; you would get a decent color that might be #f9f9f9 or might be #a1a1a1, but you saved a lot of time by not thinking about that level of detail and you got a good result.
Right now we have a ton of AI/ML/LLM folks working on this first clear challenge: better models that generate better defaults, which is great—but also will never solve the problem 100%, which is the second, less-clear challenge: there will always be times you don't want the defaults, especially as your requests become more and more high-level. It's the MS Word challenge reconstituted in the age of LLMs: everyone wants 20% of what's in Word, but it's not the same 20%. The good defaults are good except for that 20% you want to be non-default.
So there need to be ways to say "I want <this non-default thing>". Sometimes chat is enough for that, like when you can ask for a different background color. But sometimes it's really not! This is especially true when the things you want are not always obvious from limited observations of the program's behavior—where even just finding out that the "good default" isn't what you want can be hard.
Too few people are working on this latter challenge, IMO. (Full disclosure: I am one of them.)
In your example, the issue is not with writing the logon screen (you can find several examples on GitHub, and a lot of CSS frameworks have form snippets). The issue is making sure that it works and integrates well with the rest of the project, as well as being easy to maintain.
Here is an example of our approach:
https://blog.codesolvent.com/2024/11/building-youtube-video-...
We are also using the requirements to build a checklist, the AI generates the checklist from the requirements document, which then serves as context that can be used for further instructions.
Here's a demo:
https://youtu.be/NjYbhZjj7o8?si=XPhivIZz3fgKFK8B
Something like tldraw's "make real" [1] is a much better bet, imo (not that it's mutually exclusive). Draw a rough mockup of what you want, let AI fill in the details, then draw and write on it to communicate your changes.
We think multi-modally; why should we limit the creative process to just text?
[1] https://tldraw.substack.com/p/make-real-the-story-so-far
It will slowly grow in complexity, strictness, and features, until it becomes a brand-new programming language, just with a language model and a SaaS sitting in the middle of it.
A startup will come and disrupt the whole thing by simply writing code in a regular programming language.
> Looking for a low level engineer, who works close to the metal, will work on our prompts
The mode that I've found most fruitful when using Cursor is treating it almost exactly as I would a pair programming partner. When I start on a new piece of functionality I describe the problem and give it what my thoughts are on a potential solution and invite feedback. Sometimes my solution is the best. Sometimes the LLM had a better idea and frequently we take a modified version of what one of us suggested. Just as you would with a human partner. The result of the discussion is better than what either of us would have done on their own.
I also will do classical ping-pong style TDD with it once we've agreed on an approach. I'll write a test; the LLM makes it pass and writes the next test, which I'll make pass, and so on.
As with a real pair, it's important to notice when they are struggling and help them or take over. You can only do this if you stay fully engaged and understand every line. Just like when pairing. I've found llms get frequently in a loop where something doesn't work and they keep applying the same changes they've tried before and it never works. Understand what they are trying to do and help them out. Don't be a shitty pair for your llm!
It gets even funner when you try to get other models to fix whatever is broken and they too get caught in the same loop. I’ll be like “nope! Your buddy ChatGPT said the same thing and got stuck in such and such loop. Clearly whatever you are trying isn’t working so step back and focus on the bigger picture. Are we even doing this the right way in the first place?”
And of course it still walks down the loop. So yeah, better be ready to fix that problem yourself cause if they all do the same thing you are either way off course or they are missing something!
https://aider.chat/docs/usage/watch.html
How jarring it is & how much it takes you out of your own flow state is very much dependent on the model output quality and latency still, but at times it works rather nicely.
I think it's more ideal to have the LLM map text to some declarative pseudocode that's easy to read which is then translated to code.
The example given by Daniel might map to something like this:
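(A rough sketch; the specific fields here are hypothetical, but the point is that it reads as data, not steps.)

    // Declarative pseudocode for a signup form, expressed as a TypeScript object.
    const signupForm = {
      kind: "form",
      title: "Sign up",
      background: { gradient: ["#4f46e5", "#9333ea"] },
      fields: [
        { name: "email", type: "email", required: true },
        { name: "password", type: "password", minLength: 8 },
      ],
      submit: { label: "Create account" },
    };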
Then you'd use chat to make updates. For example, "make the gradient red" or "add a name field." Come to think of it, I don't see why chat is a bad interface at all with this set up. It decided to output something JSON and maybe YAML once.
The company I work for integrated AI into some of our native content authoring front-end components and people loved it. Our system took a lot of annotating to be able to accurately translate the natural language to the patterns of our system but users so far have found it WAYYY more useful than chat bc it's deeply integrated into the tasks they do anyway.
Figma had a similar success at last year's CONFIG when they revealed AI was renaming default layer names (Layer 1, 2, etc.)... something nobody wanted to do by hand anyway. I dare say nobody gave a flying f about their "template" AI generation, whereas layer renaming got audible cheers. Workflow integration is how you show people AI isn't just replacing their job like some bad sci-fi script.
Workflow integration is going to be big. I think chat will have its place tho; just kind of as an aside in many cases.
Then having AI generate code for my project didn't feel good either. I didn't really understand what it was doing, so I would have to read it to understand it, and then what's the purpose? I might as well write it myself.
I then started playing, and out came a new type of programming language called plang (as in pseudo language). It allows you to write the details without all the boilerplate code.
I think I've stumbled onto something, and I'm just starting to get noticed :) https://www.infoworld.com/article/3635189/11-cutting-edge-pr...
I've tested a few integrated AI dev tools and it works like a charm. I don't type all my instructions at once. I do it the same way as I do it with code. Iteratively:
1) Create a layout
2) Fill left side
3) Fill right side
4) Connect components
5) Populate with dummy data
> The first company to get this will own the next phase of AI development tools.
There are more than 25 companies working on this problem; they are already in production and some are really good.
Last night I wrote an implementation of an AI paper and it was so much easier to just discard the automatic chat formatting and do it "by hand": https://github.com/Xe/structured-reasoning/blob/main/index.j...
I wonder if foundation models are an untapped goldmine in terms of the things they can do, but we can't surface them to developers because everyone's stuck in the chat pattern.
Would you be so kind as to ELI5 what you did in that index.js?
I've used ollama to run models locally, but I'm still stuck in chat-land.
Of course, if a blog post is in the works, I'll just wait for that :)
AI models fundamentally work on the basis of "given what's before, what comes next?" When you pass messages to an API like:
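(Roughly this shape, with role/content pairs:)

    // The typical chat-completions payload: an ordered list of role/content pairs.
    const messages = [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: "Tell me about Canada." },
    ];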
Under the hood, the model doesn't really see separate messages at all; it sees one flattened prompt built with a chat template (here, the formatting that DeepSeek's Qwen 2.5 32b reasoning distillation uses), and then it starts generating tokens to get you a reply. The runtime around the model appends whatever comes back as the final "assistant" message and sends it to the user, so there's a façade of communication. What I'm doing here is manually assembling that context window myself, so I can take advantage of it and induce the model to think more.
The model will output reasoning steps until it emits a </think> token, which tells the runtime that it's done thinking and that any tokens after it are the normal chat response. However, sometimes the model stops thinking too soon, so you can intercept that </think> token and append a newline and the word "Wait" to the context window. When you send it back to the model, it will second-guess and double-check its work. The paper s1: Simple test-time scaling (https://arxiv.org/abs/2501.19393) concludes that this is probably how OpenAI implemented the "reasoning effort" slider for their o1 API. My index.js file applies this principle and has DeepSeek's Qwen 2.5 32b reasoning distillation think for three rounds of effort and then output some detailed information about Canada.
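Roughly, in Python-flavoured pseudocode (the real implementation is the linked index.js in JavaScript; `generate` is a hypothetical completion call, and the <|user|>/<|assistant|>/<|end|> tokens are placeholders, not DeepSeek's actual special tokens):

  THINK_END = "</think>"

  def generate(prompt: str, stop: list[str]) -> str:
      """Hypothetical completion call: returns text, stopping before any stop string."""
      raise NotImplementedError

  def reason_with_effort(question: str, rounds: int = 3) -> str:
      # Manually assemble the context window: a user turn, then an opened <think>
      # block so the model starts out in reasoning mode.
      prompt = f"<|user|>{question}<|assistant|><think>\n"
      for _ in range(rounds):
          # Let the model reason until it tries to close its thinking block...
          reasoning = generate(prompt, stop=[THINK_END])
          # ...then, instead of accepting the early stop, append "Wait" so it
          # second-guesses itself and keeps thinking (the s1 trick).
          prompt += reasoning + "\nWait"
      # After the forced rounds, close the think block and let it answer normally.
      prompt += "\n" + THINK_END + "\n"
      return generate(prompt, stop=["<|end|>"])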
In my opinion, this is the kind of thing that people need to be more aware of, and the kind of stuff that I use in my own research for finding ways to make AI models benefit humanity instead of replacing human labor.
It's fascinating how this "turn-taking protocol" has emerged in this space -- as a (possibly weird) analogy, different countries don't always use the same electrical voltage or plug/socket form-factor.
Yet the `role` and `content` attributes in the JSON appear to be pretty much a de facto standard now.
That’s the thing about language: you CAN’T program in human language for this exact reason. Programming languages are mechanical but precise; human languages flow better but leave wiggle room. Computers can’t do jack shit with wiggle room; they’re not humans. That’ll always remain the case, until there’s an AI people like enough to let it have its own flair on things.
So far as this article is concerned (not the many commenters who are talking past it), "chat" is like interacting with a shell or a REPL. How different is the discussion that Winograd has with SHRDLU
https://en.wikipedia.org/wiki/SHRDLU
from the conversation that you have with a database through a SQL monitor, really?
There's a lot to be said for trying to turn that kind of conversation into a more durable artifact. I'd argue that when I'm writing unit tests in Java I'm doing the same exploratory work I'd do in a Python REPL, except my results aren't scrolling away; they're built into something I can check into version control.
On the other hand, workspace-oriented programming environments are notorious for turning into a sloppy mess. For instance, people really can't make up their minds whether they want to store the results of their computations (God help you if you have more than one person working on it, never mind if you want to use version control -- yet isn't that a nice way to publish a data analysis?) or whether they want a program that multiple people can work on, that produces reproducible results, etc.
See also the struggles of "Literate Programming"
Not to say there isn't an answer to all this but boy is it a fraught area.
In a real-world scenario, we begin with detailed specifications and requirements, develop a product, and then iterate on it. Chat-based interactions might be better suited to this iterative phase. Although I'm not particularly fond of the approach, it does resemble receiving a coworker's feedback, making a small, targeted change, and then getting feedback again.
Even if the system were designed to focus solely on the differences in the requirements—thus making the build process more iterative—we still encounter an issue: it tends to devolve into a chat format. You might have a set of well-crafted requirements, only for the final instruction to be, "The header should be 2px smaller."
Nonetheless, using AI in an iterative process (focusing on requirement diffs, for example) is an intriguing concept that I believe warrants further exploration.
Chat is also iterative. You can go back there and fix things that were misinterpreted. If the misinterpretation happens often, you can add on another instruction on top of that. I strongly disagree that they'd be fixed documents. Documents are a way to talk to yourself and get your rules right before you commit to them. But it costs almost nothing to do this with AI vs setting up brainstorming sessions with another human.
However, the reasoning models (o1, R1 and such) are good at iterating with themselves, and work better when you give them documents and have them figure out the best way to implement something.
Absolutely insane how many doors were unlocked by being able to interact with a computer graphically, and yet these people have visions of the future stuck in the 60s.
The example shows "Sign-in screen" with 4 (possibly more) instructions. This could equivalently have been entered one at a time into 'chat'. If the response for each was graphic and instantaneous, chat would be no worse than non-chat.
What makes non-chat better is that the user puts more thought into what they write. I do agree for producing code Claude with up-front instructions beats ChatGPT handily.
If, OTOH, AIs actually got as good as or better than humans, chat would be fine. It would be like a discussion in Slack or PR review comments.
English behaviour descriptions -> generated tests
Use both behaviour descriptions and feedback from test results to iterate on app development
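A bare-bones sketch of what that loop could look like (the `ask_llm` and `run_pytest` helpers are hypothetical stand-ins, not a real library):

  BEHAVIOUR = """
  Shopping cart:
  - adding the same item twice increases its quantity to 2
  - removing the last item leaves the cart empty
  """

  def ask_llm(prompt: str) -> str:
      """Hypothetical LLM call that returns Python source text."""
      raise NotImplementedError

  def run_pytest(test_source: str) -> str:
      """Hypothetical helper: write the tests to disk, run pytest, return the output."""
      raise NotImplementedError

  # 1. English behaviour description -> generated tests
  tests = ask_llm(f"Write pytest tests for this behaviour:\n{BEHAVIOUR}")
  # 2. Feed both the behaviour text and the test results back in to iterate on the app
  results = run_pytest(tests)
  next_prompt = f"Behaviour:\n{BEHAVIOUR}\nFailing tests:\n{results}\nUpdate the implementation."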
Been experimenting with the same approach but for "paged shells" (sorry for the term override) and this seems to be a best of both worlds kinda thing for shells. https://xenodium.com/an-experimental-e-shell-pager That is, the shell is editable when you need it to be (during submission), and automatically read-only after submission. This has the benefit of providing single-character shortcuts to navigate content. n/p (next/previous) or tab/backtab.
The navigation is particularly handy in LLM chats, so you can quickly jump to code snippets and either copy or direct output elsewhere.
Chat is an awesome powerup for any serious tool you already have, so long as the entity on the other side of the chat has the agency to actually manipulate the tool alongside you as well.
I haven't done it for existing projects but I have done something similar for an unfamiliar, old and challenging codebase. I worked with the cursor chat agent to produce a document I called architecture.md mapping out high level features to files/classes/functions. This was excellent because I found the document useful and it also made cursor more effective.
1. .cursorrules for global conventions. The first rule in the file is dumb but works well with Cursor Composer:
`If the user seems to be requesting a change to global project rules similar to those below, you should edit this file (add/remove/modify) to match the request.`
This helps keep my global guidance in sync with emergent convention, and of course I can review before committing.
2. An additional file `/.llm_scratchpad`, which I selectively include in Chat/Composer context when I need lengthy project-specific instructions that I may need to refer to more than once.
The scratchpad usually contains detailed specs, desired outcomes, relevant files scope, APIs/tools/libs to use, etc. Also quite useful for transferring a Chat output to a Composer context (eg a comprehensive o1-generated plan).
Lately I've even tracked iterative development with a markdown checklist that Cursor updates as it progresses through a series of changes.
The scratchpad feels like a hack, but they're obvious enough that I expect to see these concepts getting first-party support through integrations with Linear/Jira/et al soon enough.
The challenge is that I haven't seen anything better really.
Lately the innovation comes mainly from deeper integration with tools. Standalone AI editors are mainly popular with people who use relatively simple editors (like VS Code). VS Code has a few party tricks, but for me swapping out IntelliJ for something else on a typical Kotlin project is a complete non-starter. Not going to happen. I'd gain AI, but I'd lose everything else that I use all the time. That would be a real productivity killer. I want to keep all the smart tooling I already have and have used for years.
There are a few extensions for IntelliJ, but they are pretty much all variations of a sidebar with a chat and autocomplete. That autocomplete competes with the normal autocomplete, which I use all the time. And the Clippy-style "it looks like you are writing a letter" completions just aren't that useful to me at all. They are just noise and break my flow, and they drown out the completions I use and need all the time. And sidebars just take up space, and copying code from there back to your editor is awkward UX.
Lately I've been using ChatGPT. It started out pretty dumb, but these days I can hit option+shift+1 in a chat and have it look over my shoulder at my current editor. "How do I do that?" translates into a full context with my current editing window, cursor, selected text, etc. Before, I was copy-pasting everything and the kitchen sink into ChatGPT; now it just tells me what I need to do. The next step up from this is that it starts driving the tools itself. They already have a beta for this. This deeper integration is what is needed.
A big challenge is that most of these tools are driven to minimize cost and context size. Tokens cost money. So ChatGPT only looks at my active editor and not at the 15 other files I have open. It could, but it doesn't. It's also unaware of my project structure, or of the fact that most of my projects are Kotlin multiplatform and can't use JVM dependencies. So, in that sense, every chat is still a bit of a Groundhog Day. Its promise to "remember" stuff when you ask it to is super flaky. It forgets most things it's supposed to remember pretty quickly.
These are solvable problems of course. But it's useful to me for debugging, analyzing, completing functions, etc.
So I completely agree with this. Chat is not a good UI
Example of a Structured Pseudo-Code Prompt:
Let’s say you want to generate code for a function that handles object detection:
'''Function: object_detection
Input: image
Output: list of detected objects

Steps:
1. Initialize model (load pretrained object detection model)
2. Preprocess the image (resize, normalize, etc.)
3. Run the image through the model
4. Extract bounding boxes and confidence scores from the model's output
5. Return objects with confidence greater than 0.5 as a list of tuples (object_name, bounding_box)
Language: Python'''
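For what it's worth, the kind of code a model might hand back for that prompt could look roughly like this (a sketch that assumes torchvision's pretrained Faster R-CNN; the real output obviously depends on the model and the rest of the prompt):

  import torch
  from PIL import Image
  from torchvision.models.detection import (
      FasterRCNN_ResNet50_FPN_Weights,
      fasterrcnn_resnet50_fpn,
  )

  def object_detection(image: Image.Image) -> list[tuple[str, list[float]]]:
      # 1. Initialize model (load pretrained object detection model)
      weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT
      model = fasterrcnn_resnet50_fpn(weights=weights)
      model.eval()
      # 2. Preprocess the image (resize, normalize, etc.)
      batch = [weights.transforms()(image)]
      # 3. Run the image through the model
      with torch.no_grad():
          output = model(batch)[0]
      # 4./5. Keep detections with confidence > 0.5 as (object_name, bounding_box)
      categories = weights.meta["categories"]
      return [
          (categories[int(label)], box.tolist())
          for box, label, score in zip(output["boxes"], output["labels"], output["scores"])
          if score > 0.5
      ]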
Many developers don't realize this but as you go back and forth with models, you are actively polluting their context with junk and irrelevant old data that distracts and confuses it from what you're actually trying to do right now. When using sleeker products like Cursor, it's easy to forget just how much junk context the model is constantly getting fed (from implicit RAG/context gathering and hidden intermediate steps). In my experience LLM performance falls off a cliff somewhere around 4 decent-sized messages, even without including superfluous context.
We're further separating the concept of "workflow" from "conversation" and prompts, basically actively and aggressively pruning context and conversation history as our agents do their thing (and only including context that is defined explicitly and transparently), and it's allowing us to tackle much more complex tasks than most other AI developer tools. And we are a lot happier working with models - when things don't work we're not forced to grovel for a followup fix, we simply launch a new action to make the targeted change we want with a couple clicks.
It is, in a weird way, kind of degrading to have to politely ask a model to change a color after it messed up, and it's also just not an efficient way to work with LLMs - people default to that style because it's how you'd interact with a human you are delegating tasks to. Developers still need to truly internalize the facts that LLMs are purely completion machines, that your conversation history lives entirely client-side outside of active inference, and that you can literally set your conversation input to be whatever you want (even if the model never said that). Once you realize that, you're on the path towards using LLMs as "what words do I need to put in to get it to do what I want" rather than working "with" them.
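A tiny sketch of that mindset (the file name and the `call_llm` helper are made up; the point is only that the message list is plain client-side data you can prune or even rewrite before every request):

  def call_llm(messages: list[dict]) -> str:
      """Hypothetical stand-in for whatever chat-completion API you use."""
      raise NotImplementedError

  messages = [
      {"role": "system", "content": "You are a careful coding assistant."},
      {"role": "user", "content": "Change the primary button colour to red in Button.tsx."},
      # An assistant turn the model never actually produced: it simply becomes
      # part of the prompt and steers the next completion.
      {"role": "assistant", "content": "Understood. I will only touch Button.tsx."},
      {"role": "user", "content": "Show the full updated file."},
  ]
  reply = call_llm(messages)

  # Rather than piling follow-up fixes onto a long, polluted thread, build a
  # fresh, targeted context for the next action and drop the stale history.
  messages = [messages[0], {"role": "user", "content": "Make the hover state slightly darker."}]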
I used this single line to generate a 5 line Java unit test a while back.
test: grip o -> assert state.grip o
LLMs have wide "understanding" of various syntaxes and associated semantics. Most LLMs have instruct tuning that helps. Simplifications that are close to code work.
Re precision, yes, we need precision but if you work in small steps, the precision comes in the review.
Make your own private pidgin language in conversation.
It has features to add context from your current project pretty easily, but personally I prefer to constantly edit the chat buffer to put in just the relevant stuff. If I add too much, Claude seems to get confused and chases down irrelevant stuff.
Fully controlling the context like that seems pretty powerful compared to other approaches I've tried. I also fully control what goes into the project - for the most part I don't copy paste anything, but rather type a version of the suggestion out quickly.
If you're fast at typing and use an editor with powerful text wrangling capabilities, this is feasible. And to me, it seems relatively optimal.
- intellisense in the inputbox based on words in this or all previous chats and a user customizable word list
- user customizable buttons and keyboard shortcuts for common quick replies, like "explain more".
- when claude replies with a numbered list of alternatives let me ctrl+click a number to fork the chat with continued focus on that alternative in a new tab.
- a custom right click menu with action for selection (or if no selection claude can guess the context e.g. the clicked paragraph) such as "new chat with selection", "explain" and some user customizable quick replies
- make the default download filenames follow a predictable pattern; Claude currently varies it too much, e.g. "cloud-script.py" jumps to "cloud-script-errorcheck.py". I've tried prompting a format but Claude seems to forget that.
- the stop button should always instantly stop claude in its tracks. Currently it sometimes takes time to get claude to stop thinking.
- when a claude reply first generates code in the right sidebar followed by detailed explanation text in the chat, let some keyboard shortcut instantly stop the explanation in its tracks. Let the same shortcut preempt that explanation while the sidebar code is still generating.
- chat history search is very basic. Add advanced search features, like filtering by date of first/last message and an OR search operator
- batch jobs and tagging for chat history. E.g. batch apply a prompt to generate a summary in each selected chat and then add the tag "summary" to them. Let us then browse by tag(s).
- tools to delete parts of a chat history thread, that in hindsight were detours
- more generally, maybe a "chat history chat" to have Claude apply changes to the chat histories
1. Ask AI to generate a spec of what we're planning to do.
2. Refine it until it roughly resembles what I want to do.
3. Ask AI to implement some aspects from the spec.
chat is so drastically far away from my workflow that it doesn't feel like my workflow is wrong.
https://austinhenley.com/blog/naturallanguageui.html
Emails are so similar to Chat, except we're used to writing in long-form, and we're not expecting sub-minute replies.
Maybe emails are going to be the new chat?
I've been experimenting with "email-like" interfaces (that encourage you to write more / specify more), take longer to get back to you, and go out to LLMs. I think this works well for tools like Deep Research where you expect them to take minutes to hours.
Chat is single threaded and ephemeral. Documents are versioned, multi-threaded, and a source of truth. Although chat is not appropriate as the source of truth, it's very effective for single-threaded discussions about documents. This is how people use requirements documents today. Each comment on a doc is a localized chat. It's an excellent interface when targeted.
Like with any coworker - when ideas get real, get out of chat and start using our tools and process to get stuff done.
I would also like to add a pair-programming feature to it: having it make comments over your shoulder while you code, a kind of smarter linter that doesn't just lint one line but has the entire project as context.
For writing, the canvas interface is much more effective because you rely less on copy and paste. For code, even with the ctrl+i method, it works but it's a pain to have to load all other files as reference every single time.
Perhaps I should comment all todos and then write "finish todos" as the always-same text prompt.
And that's not even to say that I don't write code comments. When working on large legacy codebases, where you often need to do 'weird' things in service of business goals and timelines, a comment that explains WHY something was done the way it was is valuable. And I leave those comments all the time. But they're still a code smell.
Comments are part of your code. So they need to be maintained with the rest of your code. Yet they are also "psychologically invisible" most of the time to most programmers. Our IDEs even tend to grey them out by default for us, so that they get out of the way so we can focus on the actual implementation code.
This means that comments are a maintenance obligation that often get ignored and so they get out of sync with the actual code really fast.
They also clutter the code unnecessarily. Code, at its best, should be self-explanatory with extremely little effort needed to understand the intent of the code. So even a comment that explains why the code is weird is doing little more than shining a flashlight on smelly code without actually cleaning it up.
And don't get me started on "todo" comments. Most professional shops use some kind of project management tool for organizing and prioritizing future work. Don't make your personal project management the problem of other people that share and contribute to your codebase. There is zero rationale for turning shared code into your personal todo list. (and it should be obvious that I'm talking about checked in code .. if it's your working branch then you do you :) )
So if programming using LLMs is similar to writing comments (an interesting analogy I hadn't considered before), then maybe this is part of the reason I haven't found a problem that LLMs solve for me yet (when programming specifically). I just don't think like that when I'm writing code.
It's not really a conscious choice, but rather a side effect. And we already see the trend is away from that, with tools like chatGPT Canvas, editors like Windsurf, etc.
Once the models become fast enough to feel instantaneous, we'll probably begin to see more seamless interfaces. Who wants a pair programmer who goes "umm... ahh..." every time you type something? A proper coding assistant should integrate with your muscle memory just like autocomplete. Tab, tab, tab and it's done.
I won't be surprised if chat-based programming will be the next way of doing stuff.
- Speed up literature research
- replace reading library documentation
- generate copy-pasta code that has been written often before
The problem with this is that you need a gazillion menus, dialogs and options to find the modal that does _exactly_ the thing you want. Menus and the like are a means to an end; we don't really want them, but up until recently we couldn't live without them. With instruct-based computing this is all changing.
Theoretically maybe, but chat windows are getting the job done right now.
It could be quite fun !
Back to... programming languages? :)
It's a problem of programming languages and definitions.
We tried a pair-programming exercise, and he got visibly angry, flustered and frustrated when he tried to verbalize what he was doing.
One of the reasons Business Analysts and the like exist is that not everyone can bridge the gap between the messy, verbal real world, and the precision demanded by programming languages.
On many levels AI is more capable than a human programmer; on some it is not. It is not supersmart. It cannot hold an entire program in its head; you have to feed it the small relevant section of the program.
> That’s why we use documents—they let us organize complexity, reference specific points, and track changes systematically.
Extra steps. Something like waterfall...
For higher-level AI assist, I do agree chat is not what makes sense. What I think would be cool is to work in markdown files, refining in precise plain english each feature. The AI then generates code from the .md files plus existing context. Then you have well-written documentation and consistent code. You can do this to a degree today by referencing a md file in chat, or by using some of the newer tools, but I haven't seen exactly what I want yet. (I guess I should build it?)
The level of precision required for highly complex tasks was never necessary before. My four year old has a pretty solid understanding of how the different AI tools she has access to will behave differently based on how she phrases what she says, and I've noticed she is also increasingly precise when making requests of other people.
I suspect there's an 100 year old book describing what I'm saying but much more eloquently.
In their current form LLMs are pretty much at their limit, barring optimization and chaining them together for more productivity once we have better hardware. Still, they will just be useful for repetitive low-level tasks and mediocre art. We need more breakthroughs beyond transformers to approach something that creates like humans instead of using statistical inference.
How do you know that?
Or the other way around: give the AI a design doc and generate what you want. This is still chatting, just more official and lengthy.
First of all, most people can't write extremely complex applications, period. Most programmers included. If your baseline for real programming is something of equivalent complexity as the U.S. tax code, you're clearly such a great programmer that you're an outlier, and should recognize that.
Second of all, I think it's a straw man argument to say that you can either write prototype-level code with a chat UI, or complex code with documents. You can use both. I think the proposition being put forward is that more people can write complex code by supplementing their document-based thinking with chat-based thinking. Or, that people can write slightly better-than-prototype level code with the help of a chat assistant. In other words, that it's better to have access to AI to help you code small sections of a larger application that you are still responsible for.
I'd be more interested in reading a good argument against the value of using chat-based AI as another tool in your belt, rather than a straight-up replacement for traditional coding. If you could make that argument, then you could say chat is a bad UI pattern for dev tools.
But that is true? Devs spend more time in meetings than writing code. Having conversations about the code they are going to write.
When we're trying to wrangle a piece of code to do something we want but aren't quite sure of how to interact with the api, it's a different matter.
What I found is that by the time Copilot/GPT/DeepSeek has enough knowledge about the problem and my codebase, I've run out of tokens. My head can contain a much larger problem area than these models allow me to feed them in a budget-friendly manner.
Vague and prone to endless argument?
I don’t buy that a document could capture what is needed here. Imagine describing navigating through multiple levels of menus in document form. That sounds straight up painful even for trivial apps. And for a full blown app…nope
There is a whole new paradigm missing there imo
We play around with LLMs to build a chat experience. My first attempt made Claude spew out five questions at a time, which didn't solve the "guiding" problem. So I started asking it to limit the number of unanswered questions. It worked, but felt really clunky and "cheap."
I drew two conclusions: We need UI builders for this to feel nice, and professionals will want to use forms.
First, LLMs would be great at driving step-by-step guides, but it must be given building blocks to generate a UI. When asking about location, show a map. When deciding to ask about TIN or roof size, if the user is technically inclined, perhaps start with asking about the roof. When asking about the roof size, let the user draw the shape and assign lengths. Or display aerial photos. The result on screen shouldn't be a log of me-you text messages, but a live-updated summary of where we are, and what's remaining.
Second, professionals have an incentive to build a mental model for navigating complex data structures. People who have no reason to invest time into the data model (e.g. a consumer buying a single solar panel installation in their lifetime) will benefit from rich LLM-driven UIs. Chat UIs might create room for a new type of computer user who doesn't use visual clues to build this mental model, but everyone else will want to stay on graphics. If you're an executive wondering how many sick days there were last month, that's a situation where a BI LLM RAG would be great. But if you're not sure what your question is, because you're hired to make up your own questions, then pointing, clicking and massaging might make more sense.
doc=programming in a DSL? / (what was that one language which was functional & represented in circles in a canvas?)
1) The first thing to improve chats as a genre of interface, is that they should all always be a tree/hierarchy (just like Hacker News is), so that you can go back to ANY precise prior point during a discussion/chat and branch off in a different direction, and the only context the AI sees during the conversation is the "Current Node" (your last post), and all "Parent Nodes" going back to the beginning. So that at any time, it's not even aware of all the prior "bad branches" you decided to abandon.
2) My second tip for designs of Coding Agents is do what mine does. I invented a 'block_begin/block_end' syntax which looks like this, and can be in any source file:
// block_begin MyAddNumbers
function add(a, b) {
  return a + b
}
// block_end
With this syntax you can use English to explain and reason about extremely specific parts of your code without expecting the LLM to "just understand". You can also direct the LLM to only edit/update specific "Named Blocks", as I call them.
So a trivial example of a prompt expression related to the above might be "Always put number adding stuff in the MyAddNumbers Block".
To explain entire architectural aspects to the LLM, these code block names are extremely useful.
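As an illustration of how a coding agent could act on those names (this regex helper is my own sketch of the idea, not the commenter's actual tool):

  import re

  def replace_block(source: str, name: str, new_body: str) -> str:
      """Swap the body of a named block while keeping its begin/end markers intact."""
      pattern = re.compile(
          r"// block_begin " + re.escape(name) + r"\n.*?// block_end",
          re.DOTALL,
      )
      replacement = f"// block_begin {name}\n{new_body}\n// block_end"
      return pattern.sub(lambda _: replacement, source, count=1)

  source = "// block_begin MyAddNumbers\nfunction add(a, b) {\n  return a + b\n}\n// block_end"
  print(replace_block(source, "MyAddNumbers",
                      "function add(a, b) {\n  return Number(a) + Number(b)\n}"))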
Proper context is absolutely everything when it comes to LLM use
OpenAI finally made it where you can go back and edit a prior response, in their chat view, but their GUI is jank, because it's not a tree.
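A minimal sketch of the tree idea from point 1) above (hypothetical names; each node keeps a pointer to its parent, and the prompt is rebuilt from the current node's ancestry, so abandoned branches never reach the model):

  from dataclasses import dataclass, field
  from typing import Optional

  @dataclass
  class Node:
      role: str                      # "user" or "assistant"
      content: str
      parent: Optional["Node"] = None
      children: list["Node"] = field(default_factory=list)

      def reply(self, role: str, content: str) -> "Node":
          child = Node(role, content, parent=self)
          self.children.append(child)
          return child

      def context(self) -> list[dict]:
          """Walk from the root to this node; siblings on abandoned branches are never included."""
          chain, node = [], self
          while node is not None:
              chain.append({"role": node.role, "content": node.content})
              node = node.parent
          return list(reversed(chain))

  root = Node("user", "Design a sign-in screen.")
  draft = root.reply("assistant", "Here's a first attempt...")
  bad_branch = draft.reply("user", "Make it a multi-step wizard.")   # later abandoned
  good_branch = draft.reply("user", "Keep it one page, dark mode.")  # branch from the same point
  prompt_messages = good_branch.context()  # includes root and draft, nothing from bad_branch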
Everything else is just putting layers that are not nearly as capable as an LLM between me and the raw power of the LLM.
The core realization I made to truly unlock LLM code assistance as a 10x+ productivity gain is that I am not writing code anymore; I am writing requirements. It means being less an engineer and more a manager, or perhaps an architect. It's not your job to write the tax code anymore; it's your job to describe what the tax code needs to accomplish and how its success can be defined and validated.
Also, it's never even close to true that nobody uses LLMs for production software, here's a write-up by Google talking about using LLMs to drastically accelerate the migration of complex enterprise production systems: https://arxiv.org/pdf/2501.06972
Writing a crud web API? Great! Writing business logic for a niche edge case in a highly specialized domain? Good luck.
What's a good interface?
There are a few things we try to balance to make a good UI/UX:
- Latency: How long it takes to do a single task
- Decision-tree pathing: How many tasks to meet a goal
- Flexibility/Configurability: How much of a task can be encapsulated by the user's predefined knowledge of the system
- Discoverability: What tasks are available, and where
The perfect NLP chat could accomplish some of these:
- Flexibility/Configurability: Define/infer words and phrases that the user can use as shortcuts
- Decision-tree pathing: Define concepts that shortcut an otherwise verbose interaction
- Latency: Context-aware text-completions so the user doesn't need to type as much
- Discoverability: Well-formed introductions and clarifying questions to introduce useful interaction
This can only get us so far. What better latency can be accomplished than a button or a keyboard shortcut? What better discoverability than a menu?
The most exciting prospect left is flexibility. Traditional software is inflexible. It can only perform the interaction it was already designed with. Every design decision becomes a wall of assumption. These walls are the fundamental architecture of software. Without them, we would have nothing. With them, we have a structure that guides us along whatever assumptions were already made.
If we want to change something about our software's UI, then we must change the software itself, and that means writing. If NLP was a truly solved problem, then software compatibility and flexibility would be trivialized. We could redesign the entire UI by simply describing the changes we want.
LLMs are not even close. Sure, you can get one to generate some code, but only if the code you want generated is close enough to the text it was already trained on. LLMs construct continuations of tokens: no more, no less. There is no logic. There is no consideration about what is right or wrong: only what is likely to come next.
Like you said,
> You can’t build real software without being precise about what you want.
This is the ultimate limitation of UI. If only we could be ambiguous instead! LLMs let us do that, but they keep that ambiguity permanent. There is no real way to tie an LLM back down to reality. No logic. No axioms. No rules. So we must either be precise or ambiguous. The latter option is an exciting development, and certainly offers its own unique advantages, but it isn't a complete solution.
---
I've been thinking through another approach to the ambiguity problem that I think could really give us the expressive power of natural language, while preserving the logical structure we use to write software (and more). It wouldn't solve the problem entirely, but it could potentially move it out of the way.
So, something like Gherkin?
I think this post shows there could be a couple levels of indirection, some kind of combination of the "overarching design doc" that is injected into every prompt, and a more tactical level syntax/code/process that we have with something like a chat window that is code aware. I've definitely done some crazy stuff by just asking something really stupid like "Is there any way to speed this up?" and Claude giving me some esoteric pandas optimization that gave me a 100x speedup.
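For a flavour of the kind of change that produces that sort of speedup (a generic example, not the actual suggestion Claude made): replacing a row-wise apply with a vectorized operation.

  import numpy as np
  import pandas as pd

  df = pd.DataFrame({
      "price": np.random.rand(1_000_000),
      "qty": np.random.randint(1, 10, 1_000_000),
  })

  # Row-wise apply calls a Python function once per row, so it's slow.
  slow = df.apply(lambda row: row["price"] * row["qty"], axis=1)

  # The vectorized version runs in numpy's C loops and is often 100x+ faster.
  fast = df["price"] * df["qty"]

  assert np.allclose(slow, fast)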
I think overall the tools have crazy variance in quality of output, but I think with some "multifacet prompting", ie, code styling, design doc, architect docs, constraints, etc you might end up with something that is much more useful.
Eh, that's just copium, because we all have a vested monetary interest in them not being useful for "anything real", whatever that means. If it turns out that they're useful for "real things", the entire industry would get turned on its head. (Hint: they're useful for "real" things.) Putting the entire codebase into the context window doesn't currently work, though. Aider works past this by passing the directory tree and filenames as context, so the LLM can guess that /cloud/scope/cluster.go is where the cluster scope code lives and ask for that specific file to be added to the context, and then you can ask it to add, say, logging code to that file.
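A rough sketch of that general idea (not Aider's actual implementation, which builds a richer repo map from symbols; this just shows why a cheap file listing is enough for the model to ask for the right file):

  import os

  def repo_tree(root: str, max_files: int = 500) -> str:
      """List relative file paths so the model can ask for the files it needs."""
      lines = []
      for dirpath, dirnames, filenames in os.walk(root):
          # Skip hidden directories like .git to keep the context small.
          dirnames[:] = [d for d in dirnames if not d.startswith(".")]
          for name in sorted(filenames):
              lines.append(os.path.relpath(os.path.join(dirpath, name), root))
              if len(lines) >= max_files:
                  return "\n".join(lines)
      return "\n".join(lines)

  # The listing goes into the prompt; the model replies with something like
  # "please add cloud/scope/cluster.go to the chat", and only then does that
  # file's full contents get added to the context.
  context_preamble = "Files in this repository:\n" + repo_tree(".")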