As a fellow "staff engineer", I find LLMs are terrible at writing or teaching how to write idiomatic code, and they are actually causing me to spend more time reviewing than before due to the influx of junior-to-senior engineers trying to sneak in LLM garbage.
In my opinion, using LLMs to write code is a Faustian bargain: you learn terrible practices and come to rely on code quantity, boilerplate, and nondeterministic outputs - all hallmarks of poor software craftsmanship. Until ML can actually go end to end from requirements to product and they fire all of us, you can't cut corners on building intuition as a human by forgoing reading and writing code yourself.
I do think there is a place for LLMs in generating ideas or exploring an untrusted knowledge base of information, but using code generated by an LLM is pure madness unless what you are building is truly going to be thrown away and rewritten from scratch, as is relying on one as a linting, debugging, or source-of-truth tool.
tokioyoyo 2 days ago [-]
I will probably get heavily crucified for this, but to the people who are ideologically opposed to AI-generated code: executives, directors, and managerial staff think the opposite. Being very anti-LLM code instead of trying to understand how it can improve your speed might be detrimental to your career.
Personally, I'm on the fence. But having conversations with others, and getting requests from execs to implement different AI utils into our processes, is making me lean toward the safer side of job security, rather than dismiss it and be adamantly against it.
feoren 2 days ago [-]
> executives, directors and managerial staff think the opposite
Executives, directors, and managerial staff have had their heads up their own asses since the dawn of civilization. Riding the waves of terrible executive decisions is unfortunately part of professional life. Executives like the idea of LLMs because it means they can lay you off; they're not going to care about your opinion on it one way or another.
> Being very anti-LLM code instead of trying to understand how it can improve the speed might be detrimental for your career.
You're making the assumption that LLMs can improve your speed. That's the very assumption being questioned by GP. Heaps of low-quality code do not improve development speed.
simonw 2 days ago [-]
I'm willing to stake my reputation on the idea that yes, LLMs can improve your speed. You have to learn how to use them effectively and responsibly but the productivity boosts they can give you once you figure that out are very real.
tokioyoyo 2 days ago [-]
I'm with you on this one. If one's experience is just using it as a general chatbot, then yeah, I can see why people are reluctant and think it's useless. I have a feeling a good chunk of people haven't tried using the latest models on a medium-small project from scratch, where you have to play around with their intricacies to build intuition.
It becomes some sort of muscle memory, where I can predict whether using an LLM would be faster or slower, or where it's more likely to give bad suggestions. Basically the same as googling skills.
simonw 2 days ago [-]
"It becomes some sort of muscle memory, where I can predict whether using LLM would be faster or slower"
Yeah, that intuition is so important. You have to use the models a whole bunch to develop it, but eventually you get a sort of sixth sense where you can predict if an LLM is going to be useful or harmful on a problem and be right about it 9/10 times.
My frustration is that intuition isn't something I can teach! I'd love to be able to explain why I can tell that problem X is a good fit and problem Y isn't, but often the answer is pretty much just "vibes" based on past experience.
tokioyoyo 2 days ago [-]
Totally! My current biggest problems:
1) Staying up to date with the latest capabilities - the pace has slowed a bit, and my biggest self-learning push was in August/September; most of that intuition still works. However, while I had the time to do it, it's hard to ask my team to drop 5-6 free weekends of their lives to get up to speed.
2) The transition period where not everyone is on the same page about LLMs - this one I think is much harder, because the expectations of the executives are very different from those of the on-the-ground developers using LLMs.
A lot of people could benefit from an alignment of expectations, but once again, it's hard to explain what is and isn't possible when your statements will be nullified a month later by a new AI model/product/feature.
th0ma5 2 days ago [-]
You're confusing intuition of what works or not with being too close to your problem to make a judgement on the general applicability of your techniques.
simonw 2 days ago [-]
I don't understand what you mean.
entropicdrifter 1 day ago [-]
They're saying it might work for you, but isn't generally applicable (because most people aren't going to develop this intuition, presumably).
Not sure I agree with that. I would say there are classes of problems where LLMs will generally help and a brief training course (1 week, say) would vastly improve the average (non-LLM-trained) engineer's ability to use it productively.
th0ma5 22 hours ago [-]
No, it's more like thinking that my prescribed way of doing things must be the way things work in general because it works for me. You give instructions specific to everything you did, but the person you give them to isn't your exact height or can't read your language, so you can easily assume they just don't get it. With these LLMs you also get this bias hidden from you as you inch closer to the solution at every turn. The result seems "obvious", but the outcomes were never guaranteed and will most likely be different for someone else if one thing at any point is different.
simonw 20 hours ago [-]
My whole thing about LLMs is that using them isn't "obvious". I've been banging that drum for over a year now - the single biggest misconception about LLMs is that they are easy to use and you don't need to put a serious amount of effort into learning how to best apply them.
th0ma5 17 hours ago [-]
To me it's more that the effort you put in is not a net gain. You don't advance a way of working with them that way, for myriad reasons ranging from ownership of the models, to the fundamentals of the resulting total probabilistic space of the interaction, to simple randomness even at low temperatures. The "learning how to best apply them" is not definable, because who is learning to apply what to what... The most succinct way I know to describe these issues is that, like startup success, you're saying "these are the lotto numbers that worked for me" in many of the assumptions you make about the projects you present.
In real, traditional, deterministic systems where you explicitly design a feature, even that has difficulty staying coherent over time as usage grows. Think of tab stops on a typewriter evolving from an improvised template, to metal tabs installed above the keyboard, to someone cutting and pasting incorrectly and accidentally reflowing a 200-page document to 212 pages because of tab characters...
If you create a system with these models that writes the code to process a bunch of documents in some way, or does some kind of herculean automation, you haven't improved the situation when it comes to clarity or simplicity, even if the task at hand finishes sooner for you in this moment.
Every token generated has an equal potential to spiral out into new complexities and whack-a-mole issues that tie you to assumptions about the system design, while providing this veneer that you have control over the intersections of these issues; but as this situation grows you create an ever-bigger problem space.
And I definitely hear you say that this is the point where you use some sort of full-stack, interoceptive, holistic intuition about how to persuade the system towards a higher-order concept of the system, and expand your ideas about how the problem could be solved, and let the model guide you... And that is precisely the mysticism I object to, because it isn't actually a kind of productiveness, but a struggle, a constant guessing, and any insight from this can be taken away, changed accidentally, censored, or packaged as a front-run against your control.
Additionally, the nature of not having separate in-band and out-of-band streams of data means that even with agents and reasoning and all of the avenues of exploration and improving performance, you still will not escape the fundamental question of... what is the total information contained in the entire probabilistic space? If you try to do out-of-band control in some way, like the latest thing I just read where they have a separate censoring layer, you either wind up having to use another LLM layer there, which still contains all of these issues, or you use some kind of non-transformer method like Bayesian filtering, and you get all of the issues outlined in the seminal spam.txt document...
So, given all of this, I think the kinds of feats you demonstrate are really neat, but I object that these issues can be boiled down to "putting a serious amount of effort into learning how to best apply them", because I just don't think that's a coherent view of the total problem, and it isn't actually achievable the way learning is in other subjects like math. I know it isn't answerable, but for me a guiding question remains: why do I have to work at all? Is the model too small to know what I want or mean without any effort? The pushback against prompt engineering and the rise of agentic stuff and reasoning all seem to essentially be saying that, but it too has hit diminishing returns.
chefandy 2 days ago [-]
Many bosses are willing to stake their subordinates’ reputations on it, too.
philipwhiuk 2 days ago [-]
The problem is, I don't know you well enough for that to be worth much.
My experience has been that it's slightly improved code completion and helped with prototyping.
re5i5tor 2 days ago [-]
Ummm .... try reading maybe? simon willison dot net
th0ma5 2 days ago [-]
I've read enough from Simon to know that he doesn't know how to build or maintain reliable real world systems.
jgalt212 2 days ago [-]
Django?
codr7 2 days ago [-]
Depends on your skills; and the more you use them, the less you learn and the more dependent you become.
Aeolun 2 days ago [-]
> Executives like the idea of LLMs because it means they can lay you off; they're not going to care about your opinion on it one way or another.
Probably, but when the time comes for layoffs, the first to go will be those hiding under a rock, claiming that there is no value in those LLMs even as they're being replaced.
codr7 2 days ago [-]
I can see it going the other way; as LLMs improve, the need for prompters will decrease.
The need for real coding skills however, won't.
tokioyoyo 2 days ago [-]
One's "real coding skills" don't get judged that much during a performance examination.
codr7 2 days ago [-]
That time will come once enough people have LLM'ed their software to hell and back and nothing works anymore.
Aeolun 2 days ago [-]
Just like the day the price of housing and bitcoin will crash. It’ll be here any day now I’m sure!
BerislavLopac 2 days ago [-]
In my mind, this dilemma is actually very simple:
First, what LLMs/GenAI do is automated code generation, plain and simple. We've had code generation for a very long time; heck, even compiling is automated generation of code.
What is new is that LLM code generation is non-deterministic, unlike traditional code generation tools; like a box of chocolates, you never know what you're going to get.
So, as long as you have tools and mechanisms that make that non-determinism irrelevant, using LLMs to write code is not a problem at all. In fact, guess what? Hand-coding is also non-deterministic, so we already have plenty of those mechanisms in place: automated tests, code reviews, etc.
subw00f 2 days ago [-]
I think I’m having the same experience as you. I’ve heard multiple times from execs in my company that “software” will have less value and that, in a few years, there won’t be as many developer jobs.
Don’t get me wrong—I’ve seen productivity gains both in LLMs explaining code/ideation and in actual implementation, and I use them regularly in my workflow now. I quite like it. But these people are itching to eliminate the cost of maintaining a dev team, and it shows in the level of wishful thinking they display. They write a snake game one day using ChatGPT, and the next, they’re telling you that you might be too slow—despite a string of record-breaking quarters driven by successful product iterations.
I really don’t want to be a naysayer here, but it’s pretty demoralizing when these are the same people who decide your compensation and overall employment status.
adamredwoods 2 days ago [-]
>> But these people are itching to eliminate the cost of maintaining a dev team, and it shows in the level of wishful thinking they display.
And this is the promise of AI, to eliminate jobs. If CEOs invest heavily in this, they won't back down because no one wants to be wrong.
I understand some people try to claim AI might make net more jobs (someday), but I just don't think that is what CEOs are going for.
nicoburns 2 days ago [-]
> If CEOs invest heavily in this, they won't back down because no one wants to be wrong.
They might not have to. If the results are bad enough then their companies might straight-up fail. I'd be willing to bet that at least one company has already failed due to betting too heavily on LLMs.
That isn't to say that LLMs have no uses. But just that CEOs willing something to work isn't sufficient to make it work.
Yoric 2 days ago [-]
Yes, but don't forget that higher-ups also control, to a large extent, the narrative. Whether laying off developers to replace them with LLMs was good for the company is largely uncorrelated to whether the person in charge of the operation will get promoted for having successfully saved all this money for the company.
Pre-LLM, that's how Boeing destroyed itself. By creating value for the shareholders.
subw00f 2 days ago [-]
It makes sense—we all know how capitalism works. But the thing is, how can you not apply the law of diminishing returns here? The models are getting marginally better with a significant increase in investment, except for DeepSeek’s latest developments, which are impressive but mostly for cost reasons—not because we’ve achieved anything remotely close to AGI.
If your experienced employees, who are giving an honest try to all these tools, are telling you it’s not a silver bullet, maybe you should hold your horses a little and try to take advantage of reality—which is actually better—rather than forcing some pipe dream down your bottom line’s throat while negating any productivity gains by demotivating them with your bullshit or misdirecting their efforts into finding a problem for a given solution.
mostertoaster 2 days ago [-]
I think with LLMs we will actually see the demand for software developers who can understand code and know how to use the tools skyrocket. There will ultimately be way more money in total going towards software developers, but average pay will be well above the median.
ukoki 2 days ago [-]
> I’ve heard multiple times from execs in my company that “software” will have less value and that, in a few years, there won’t be as many developer jobs.
If LLMs make average devs 10x more productive, Jevons paradox[1] suggests we'll just make 10x more software rather than have 10x fewer devs. You can now implement that feature only one customer cares about, or test 10x more prototypes before building your product. And if you instead decide to decimate your engineering team, watch out, because your competitors might not.
Just another way for the people on top to siphon money from everyone else. No individual contributor is going to be rewarded for any productivity increase beyond what is absolutely required to get them to fulfill the company's goals, and the goalposts will be moving so fast that keeping up will be a full-time job. As we see from the current job market, the supply of problems the commercial software market needs more coders to solve maybe isn't quite as bountiful as we thought, and maybe we won't need to perpetually ramp up the number of developers humanity has... maybe we even have too many already?
If a company's top developer can do the work of the 10 developers below them, their boss is going to happily fire the extra developers, not think of all the incredible other things they could use those developers for. A lot of developers assume that the one uber-productive developer left standing will be more valuable to the company than before, but now that developer is competing with 10 people who also know the code base and are willing to work for a lot cheaper. We get paid based on what the market will bear, not the amount of value we deliver, so the newfound profit goes to the top and the rest goes to reducing the price of the product to stay competitive with every other company doing the exact same thing.
Maybe I’m being overly cynical, but assuming this isn’t a race to the bottom and people will get rich being super productive ai-enhanced code monsters, to me, looks like a conceited white collar version of the hustle porn guys that think if they simultaneously work the right combo of gig apps at the right time of day in the right spots then they can work their way up to being wealthy entrepreneurs. Good luck.
dasil003 1 day ago [-]
Writing code isn't the bottleneck though. What LLMs do is knock out the floor, because you used to need a pretty significant baseline knowledge of a programming language to do anything. Now you don't need that, because you can just spray and pray prompts with no programming knowledge. This actually works to a point, since most business code is repetitive CRUD. The problem comes from the implicit expectations that the higher-level system run with a certain uptime and level of quality, and conform to any number of common-sense assumptions that no one but a good programmer was thinking about, until someone uses the system and says "why does it do this completely wrong thing". There are huge classes of these types of problems that an LLM will not be capable of resolving, and if you've been blasting ahead with mountains of LLM slop code, even the best human programmers might not be able to save you. In other words, I think LLMs will make it easy to paint yourself into a corner if you gut the technical core of your team.
chefandy 1 day ago [-]
But there's no hard ceiling above the people on the bottom. It's not a stratification, it's a spectrum. The lower-end developers replaced easily by LLMs aren't going to just give up and become task rabbits: they're going to update their skills, trying to qualify for the (probably temporarily) less vulnerable jobs above them. They might never be good enough to solve the really hard problems, but they'll put pressure on those just above them... which will echo up the entire industry. When everyone, regardless of the applicability of LLMs to their workflow, is suddenly facing competition from the developers just below them because of this upward pressure, the market gets a whole lot shittier. Damn near everybody I've spoken to thinks they're the special one that surely can't be heavily affected by LLMs because their job is uniquely difficult/quality-focused/etc. Even for the smallish percentage of people for whom that's true, the value of their skill set on the whole is still going to take a huge hit.
What seems far more likely to me is that computer scientists will be doing math research and wrangling LLMs, a vanishingly small number of dedicated software engineers will work on most practical textual coding tasks with engineering methodologies, and low/no-code tooling with the aid of LLMs will get good enough to make custom software something built mostly by less-technical people with domain knowledge, like spreadsheet scripting.
A lot of people in the LLM booster crowd think LLMs will replace specialists with generalists. I think that’s utterly ridiculous. LLMs easily have the shallow/broad knowledge generalists require, but struggle with the accuracy and trustworthiness for specialized work. They are much more likely to replace the generalists currently supporting people with domain-specific expertise too deep to trust to LLMs. The problem here is that most developers aren’t really specialists. They work across the spectrum of disciplines and domains but know how to use a very complex toolkit. The more accessible those tools are to other people, the more the skill dissolves into the expected professional skill set.
nyarlathotep_ 2 days ago [-]
Yeah it seems pretty obvious where this is all going and yet a sizable proportion of the programming population cheers on every recent advancement that makes their skills more and more of a commodity.
hansvm 2 days ago [-]
"Yes, of course, I'm using AI at every single opportunity where I think it'll improve my output"
<<never uses AI>>
giobox 2 days ago [-]
This simply doesn't work much of the time as an excuse - virtually all the corporate AI tool subscriptions provide per-user stats on how much each staff member is using the AI assist. This shouldn't come as a surprise - software tool purveyors need to demonstrate ROI to their customers' management teams, and as always this shows up in reporting tools.
I've already seen several rounds of Slack messages - "why aren't you using <insert LLM coding assistant name>?" - off the back of this reporting.
These assistants essentially spy on you working in many cases, if the subscription is coming from your employer and is not a personal account. For one service, I was able to see full logging of all the chats every employee ever had.
codr7 2 days ago [-]
The very second someone starts monitoring me like that I'm out. Let them write their own software.
tokioyoyo 2 days ago [-]
It's not necessarily just monitoring though. I actively ask that question when I see certain keys not being used, to inquire about their relevance. Basically taking feedback from some engineers and generalizing it. Obviously in my case we're doing it in good faith, and assuming people will try to get their work done with whatever tools we give them access to. Like, I see Anthropic keys get heavily used in the eng department, but I constantly get requests for OpenAI keys for Zapier connects etc. from business people.
rectang 2 days ago [-]
This has been true for every heavily marketed development aid (beneficial or not) for as long as the industry has existed. Managing the politics and the expectations of non-technical management is part of career development.
tokioyoyo 2 days ago [-]
Yeah, I totally agree, and you're 100% right. But the amount of integrations I've personally done and have instructed my team to do implies this one will be around for a while. At some point spending too much time on code that could be easily generated will be a negative point on your performance.
I've heard exactly the same stories from my friends in larger tech companies as well. At every all-hands there's a push for more AI integration, getting staff to use AI tools, etc., with the big expectation that development will get faster.
liamwire 2 days ago [-]
> At some point spending too much time on code that could be easily generated will be a negative point on your performance.
If we take the premise at face value, then this is a time management question, and that's a part of pretty much every performance evaluation everywhere. You're not rewarded for writing some throwaway internal tooling that's needed ASAP in assembly or with a handcrafted native UI, even if it's strictly better once done. Instead you bash it out in a day's worth of Electron shitfuckery and keep the wheels moving, even if it makes you sick.
Hyperbole aside, hopefully the point is clear: better is a business decision as much as a technical one, and if an LLM can (one day) do the 80% of the Pareto distribution, then you’d better be working on the other 20% when management come knocking. If I run a cafe, I need my baristas making coffee when the orders are stacking up, not polishing the machine.
Caveats for critical code, maintenance, technical debt, etc. of course. Good engineers know when to push back, but also, crucially, when it doesn’t serve a purpose to do so.
rectang 2 days ago [-]
I don't think AI is an exception. In organizations where there were top-down mandates for Agile, or OOP, or Test-Driven Development, or you-name-it, those who didn't take up the mandate with zeal were likely to find themselves out of favor.
tokioyoyo 2 days ago [-]
It's not necessarily top down. I genuinely don't know a single person in my organization who doesn't use LLMs one way or another. Obviously with different degrees of applications, but literally everyone does. And we haven't had a real "EVERYONE MUST USE AI!", just people suggesting and asking for specific model usages, access to apps like Cursor and so on.
(I know it because I'm in charge of maintaining all processes around LLM keys, their usages, Cursor stuff and etc.)
rectang 2 days ago [-]
> Being very anti-LLM code instead of trying to understand how it can improve the speed might be detrimental for your career.
> I'm in charge of maintaining all processes around LLM keys
Does management look to you for insight on which staffers are appropriately committed to leveraging AI?
tokioyoyo 2 days ago [-]
No, right now the only thing higher ups ask from me is general percentage usages for different types of model/software usages (Anthropic/OpenAI/Cursor and etc.), so we can reassess subscriptions to cut costs wherever it's needed. But to be fair, they have access to the same dashboards as I do, so if they want to, they can look for it.
aprilthird2021 2 days ago [-]
> executives, directors and managerial staff think the opposite
The entire reason they hire us is to let them know if what they think makes sense. No one is ideologically opposed to AI generated code. It comes with lots of negatives and caveats that make relying on it costly in ways we can easily show to any executives, directors, etc. who care about the technical feasibility of their feelings.
tokioyoyo 2 days ago [-]
> No one is ideologically opposed to AI generated code
Unfortunately, that hasn't been my experience. But I agree with your comment generally.
v3xro 2 days ago [-]
As a former "staff engineer": these executives can go and have their careers, and leave the people who want code they can understand and reason about, and who want to focus on quality software, well alone.
hinkley 2 days ago [-]
When IntelliJ was young, the autocomplete and automated refactoring were massive game changers. It felt like the dawn of a new age. But then, release after release, no new refactorings materialized. I don't know if they hit the Pareto limit or the people responsible moved on to new projects.
I think that's the sort of spot where better tools might be appropriate. I know what I want to do, but it's a mess to do it. I suspect that will be better at facilitating growth instead of stunting it.
jiggawatts 2 days ago [-]
Hmm… I wonder if there will be a category of LLM-assisted refactoring tools that combine mechanistic transformations with the more flexible capabilities of generative AI. E.g.: update the English text in comments automatically to reflect code structure changes.
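As a rough illustration of that idea (purely a hypothetical sketch, not an existing tool; ask_llm stands in for whatever model client you'd use): the mechanical rename is done first by the IDE, and the generative part only gets handed the comment lines to keep the English in sync.

    def refresh_comments_after_rename(renamed_source: str, old: str, new: str, ask_llm) -> str:
        """After a mechanical rename (old -> new) done by the IDE, ask a model to
        update any comment text that still describes the old name.
        ask_llm(prompt) -> str is a placeholder for a real model call."""
        comments = [ln for ln in renamed_source.splitlines() if ln.lstrip().startswith("#")]
        if not comments:
            return renamed_source
        prompt = (f"The function {old} was renamed to {new}. Rewrite each of these "
                  "comment lines so the English matches the new name; return them "
                  "in the same order, one per line:\n" + "\n".join(comments))
        # Sketch assumption: the model returns exactly one rewritten line per input line.
        for before, after in zip(comments, ask_llm(prompt).splitlines()):
            renamed_source = renamed_source.replace(before, after, 1)
        return renamed_source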
hinkley 2 days ago [-]
Little tools for things like pluralizing nouns, or converting adjectives to verbs (a function that takes data and arranges it into a response the adjective applies to), would help a lot with rename refactors.
esafak 2 days ago [-]
What refactoring do you want IntelliJ to do that it cannot?
billy99k 2 days ago [-]
I've seen the exact opposite. Management at my company has been trying to shove AI into everything. They even said that this year we would be dropping all vendors that didn't have some form of AI in their workflow.
satellite2 2 days ago [-]
I just don't fully understand this position at this level. Personally, I know exactly what the next 5 lines need to be, and whether I write them or autocomplete or some AI writes them doesn't matter. I'll only accept what I had in mind exactly. And with Copilot, for boilerplate and relatively trivial tasks, that happens pretty often. I feel I'm just saving time / old-age joint pain.
purerandomness 2 days ago [-]
If the next 5 lines of code are so predictable, do they really need to be written down?
If you're truly saving time by having an LLM write boilerplate code, is there maybe an opportunity to abstract things away so that higher-level concepts or more expressive code could be used instead?
jaredklewis 2 days ago [-]
Sure, but abstractions have a cost.
5 lines of code written with just the core language and standard library are often much easier to read and digest than a new abstraction or call to some library.
And it’s just an unfortunate fact of life that many of the common programming languages are not terribly ergonomic; it’s not uncommon for even basic operations to require a few lines of boilerplate. That isn’t always bad as languages are balancing many different goals (expressiveness, performance, simplicity and so on).
weitendorf 2 days ago [-]
I have lately been writing a decent amount of Svelte. Svelte and frontend in general is relatively new to me, but since I’ve been programming for a while now I can usually articulate what I want to do in English. LLMs are totally a game changer for me in this scenario - they basically take me from someone who has to look everything up all the time to someone who only does so a couple times a day.
In a way LLMs are ushering in a kind of boilerplate renaissance IMO. When you can have an LLM refactor a massive amount of boilerplate in one fell swoop it starts to not matter much if you repeat yourself - actually, really logically dense code would probably be harder for LLMs to understand and modify (not dissimilar from us…) so it’s even more of a liability now than in the past. I would almost always rather have simple, easy-to-understand code than something elegant and compact and “expressive” - and our tools increasingly favor this too.
Also I really don’t give a shit about how to best center a div nor do I want to memorize a million different markup tags and their 25 years of baggage. I don’t find that kind of knowledge gratifying because it’s more trivia than anything insightful. I’m glad that with LLMs I can minimize the time I spend thinking about those things.
everforward 1 day ago [-]
Some languages don't give you that opportunity. E.g. the "if err != nil" blocks in Go are effectively required and obvious, but they're mandated by the language.
Other things are complicated to abstract for the boilerplate they avoid. The kind of thing that avoids 100 lines of code but causes errors that take 20 minutes to understand because of heavy use of reflection/inferred types in generics/etc. The older I get, the more I think "clever" reflection is more of a sin than boring boilerplate.
axlee 2 days ago [-]
What's your stack? I have the complete opposite experience. LLMs are amazing at writing idiomatic code, less so at dealing with esoteric use cases.
And very often, if the LLM produces a poopoo, asking it to fix it again works just well enough.
Bjartr 2 days ago [-]
> asking it to fix it again works just well enough.
I've yet to encounter any LLM, from ChatGPT to Cursor, that doesn't choke and start to repeat itself, claim it changed code when it didn't, or get stuck changing something back and forth repeatedly inside of 10-20 minutes. Like, just a handful of exchanges and it's worthless. Are people who make this workflow effective summarizing and creating a fresh prompt every 5 minutes or something?
simonw 2 days ago [-]
One of the most important skills to develop when using LLMs is learning how to manage your context. If an LLM starts misbehaving or making repeated mistakes, start a fresh conversation and paste in just the working pieces that are needed to continue.
I estimate a sizable portion of my successful LLM coding sessions included at least a few resets of this nature.
sdesol 2 days ago [-]
> using LLMs is learning how to manage your context.
This is the most important thing in my opinion. This is why I switched to showing tokens in my chat app.
I treat tokens like the tachometer for a car's engine. The higher you go, the more gas you will consume, and the greater the chance you will blow up your engine. Different LLMs will have different redlines and the more tokens you have, the more costly every conversation will become and the greater the chance it will just start spitting gibberish.
So far, my redline for all models is 25,000 tokens, but I really do not want to go above 20,000. If I hit 16,000 tokens, I will start to think about summarizing the conversation and starting a new one based on the summary.
The initial token count is also important in my opinion. If you are trying to solve a complex problem that is not well known by the LLM, and you are starting with only 1,000 or fewer tokens, you will almost certainly not get a good answer. I personally think 7,000 to 16,000 is the sweet spot. For most problems, I won't have the LLM generate any code until I reach about 7,000, since that means it has enough files in context to properly take a shot at producing code.
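For what it's worth, here's a minimal sketch of how I think about that budgeting, assuming OpenAI's tiktoken library for counting (the 16k/25k thresholds are just my personal numbers from above, not anything official):

    import tiktoken

    SUMMARIZE_AT = 16_000   # start thinking about summarizing here
    REDLINE = 25_000        # past this, quality degrades for every model I use

    def conversation_tokens(messages, encoding_name="cl100k_base"):
        """Rough token count for a whole conversation (ignores per-message overhead)."""
        enc = tiktoken.get_encoding(encoding_name)
        return sum(len(enc.encode(m)) for m in messages)

    def budget_advice(messages):
        n = conversation_tokens(messages)
        if n >= REDLINE:
            return f"{n} tokens: past the redline, start a fresh conversation"
        if n >= SUMMARIZE_AT:
            return f"{n} tokens: summarize and carry the summary into a new chat"
        return f"{n} tokens: fine to keep going"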
kridsdale3 1 day ago [-]
I'm doing ok using the latest Gemini which is (apparently) ok with 1 million tokens.
dingnuts 2 days ago [-]
all that fiddling and copy pasting takes me longer than just writing the code most of the time
skydhash 2 days ago [-]
And for any project that's been around long enough, you find yourself mostly copy-pasting or searching for the one line you have to edit.
codr7 2 days ago [-]
Exactly, while not learning anything along the way.
liamwire 2 days ago [-]
Only if you assume one is blindly copy/pasting without reading anything, or is already a domain expert. Otherwise you've absolutely got the ability to learn from the process, but it's an active process you've got to engage with. Hell, ask questions along the way that interest you, as you would any other teacher. Just verify the important bits of course.
codr7 2 days ago [-]
No, learning means failing, scratching your head, banging your head against the wall.
Learning takes time.
liamwire 2 days ago [-]
I'd agree that's one definition of learning, but there exist entire subsets of learning that don't require you to be stuck on a problem. You can pick up simple, related concepts without first needing to struggle with them. Incrementally building on those moments is as true a form of learning as any other, I'd argue. I'd go as far as saying you can also have the moments you're describing while using an LLM, again with intentionality, not passively.
NicuCalcea 2 days ago [-]
Hm, I use LLMs almost daily, and I've never had one say it changed code and not do it. If anything, they will sometimes try to "improve" parts of the code I didn't ask them to modify. Most times I don't mind, and if I do, it's usually a quick edit to say "leave that bit alone" and resubmit.
> Are people who make this workflow effective summarizing and creating a fresh prompt every 5 minutes or something?
I work on one small problem at a time, only following up if I need an update or change on the same block of code (or something very relevant). Most conversations are fewer than five prompt/response pairs, usually one to three. If the LLM gets something wrong, I edit my prompt to explain what I want better, or to tell it not to take a specific approach, rather than correcting it in a reply. It gets a little messy otherwise, and the AI starts to trip up on its own past mistakes.
If I move on to a different (sub)task, I start a new conversation. I have a brief overview of my project in the README or some other file and include that in the prompt for more context, along with a tree view of the repository and the file I want edited.
I am not a software engineer and I often need things explained, which I tell the LLM in a custom system prompt. I also include a few additional instructions that suit my workflow, like asking it to tell me if it needs another file or documentation, if it doesn't know something, etc.
Aeolun 2 days ago [-]
Creating a new prompt. Sometimes it can go for a while without one, but the first response (with crafted context) is generally the best. Having context from the earlier conversation has its uses though.
knicholes 2 days ago [-]
The LLM you choose to work with in Cursor makes a big difference, too. I'm a fan of Claude 3.5 Sonnet.
slashdev 2 days ago [-]
In my experience you have to tell it what to fix. Sometimes how as well.
beepbooptheory 2 days ago [-]
Simply, it made my last job so nightmarish that for the first time in this career I absolutely dreaded even thinking about the codebase or having to work the next day. We can argue about the principle of it all day, or you can say things like "you are just doing it wrong", but ultimately it's the boots-on-the-ground experience of it that is going to leave the biggest impression on me, at least. It's just so bad to have to work alongside either the model itself or your coworker with the best of intentions but no domain knowledge.
It's like having to forever be the most miserable detective in the world; no mystery, only clues. A method that never existed, three different types that express the same thing, the cheeky smile of your coworker who says he can turn the whole backend into using an ORM in a day because he has Cursor, the manager who signs off on this, the deranged PR the next day. This continual sense that fewer and fewer people even know what's going on anymore...
"Can you make sure we support both Mongo and postgres?"
"Can you put this React component inside this Angular app?"
"Can you setup the kubernetes with docker compose?"
esafak 2 days ago [-]
Hiring standards are important, as are managers who get it. Your organization seems to be lacking in both.
the_mitsuhiko 2 days ago [-]
> but using code generated from an LLM is pure madness unless what you are building is truly going to be thrown away and rewritten from scratch, as is relying on it as a linting, debugging, or source of truth tool.
That does not match my experience at all. You obviously have to use your brain to review it, but for a lot of problems LLMs produce close to perfect code in record time. It depends a lot on your prompting skills though.
xmprt 2 days ago [-]
Perhaps I suck at prompting but what I've noticed is that if an LLM has hallucinated something or learned a fake fact, it will use that fact no matter what you say to try to steer it away. The only way to get out of the loop is to know the answer yourself but in that case you wouldn't need an LLM.
liamwire 2 days ago [-]
I've found a good way to get unstuck here is to use another model, either of comparable or superior quality, or interestingly sometimes even a weaker version of the same product (e.g. Claude Haiku vs. Sonnet*). My mental model here is similar to pair programming, or simply bringing in a colleague when you're stuck.
*I don’t know to what extent it’s worthwhile discussing whether you could call these the same model vs. entirely different, for any two products in the same family. Outside of simply quantising the same model and nothing else. Maybe you could include distillations of a base model too?
amalcon 2 days ago [-]
The idea of using a smaller version of the same (or a similar) model as a check is interesting. Overfitting is super basic, and tends to be less prominent in systems with fewer parameters. When this works, you may be finding examples of this exact phenomenon.
sdesol 2 days ago [-]
> The idea of using a smaller version of the same (or a similar) model as a check is interesting.
I built my chat app around this idea and to save money. When it comes to coding, I feel Sonnet 3.5 is still the best but I don't start with it. I tend to use cheaper models in the beginning since it usually takes a few iterations to get to a certain point and I don't want to waste tokens in the process. When I've reached a certain state or if it is clear that the LLM is not helping, I will bring in Sonnet to review things.
Here is an example of how the conversation between models will work.
The reason why this works for my application is, I have a system prompt that includes the following lines:
# Critical Context Information
Your name is {{gs-chat-llm-model}} and the current date and time is {{gs-chat-datetime}}.
When I make an API call, I will replace the template strings with the model and date. I also made sure to include instructions in the first user message to let the model know it needs to sign off on each message. So with the system prompt and message signature, you can say "what do you think of <LLM's> response".
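In case it helps anyone picture it, a minimal sketch of that substitution step (the messages use the generic role/content shape; nothing here is tied to a particular vendor SDK):

    from datetime import datetime, timezone

    SYSTEM_PROMPT = """# Critical Context Information
    Your name is {{gs-chat-llm-model}} and the current date and time is {{gs-chat-datetime}}."""

    def build_system_prompt(model_name: str) -> str:
        """Fill the template placeholders right before making the API call."""
        now = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
        return (SYSTEM_PROMPT
                .replace("{{gs-chat-llm-model}}", model_name)
                .replace("{{gs-chat-datetime}}", now))

    # Each model is asked to sign off its replies with its own name, so a later
    # turn can ask "what do you think of <other model>'s response?"
    messages = [
        {"role": "system", "content": build_system_prompt("claude-3-5-sonnet")},
        {"role": "user", "content": "Please sign off each reply with your model name."},
    ]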
codr7 2 days ago [-]
I would say prompting skills relative coding skills; and the more you rely on them, the less you learn.
the_mitsuhiko 2 days ago [-]
That is not my experience. I wrote recently [1] about how I use it and it’s more like an intern, pair programmer or rubber duck. None of which make you worse.
> it’s more like an intern, pair programmer or rubber duck. None of which make you worse.
Are you sure? I've definitely had cases where an inexperienced pair programmer made my code worse.
the_mitsuhiko 2 days ago [-]
That’s a different question. But you don’t learn less.
brandall10 2 days ago [-]
It's helpful to view working solutions and quality code as separate things to the LLM.
* If you ask it to solve a problem and nothing more, chances are the code isn't the best as it will default to the most common solutions in the training data.
* If you ask it to refactor some code idiomatically, it will apply most common idiomatic concepts found in the training data.
* If you ask it to do both at the same time you're more likely to get higher quality but incorrect code.
It's better to get a working solution first, then ask it to improve that solution, rinse/repeat in smallish chunks of 50-100 loc at a time. This is kinda why reasoning models are of some benefit, as they allow a certain amount of reflection to tie together disparate portions of the training data into more cohesive, higher quality responses.
jondwillis 2 days ago [-]
It isn't like you can't write tests or reason about the code, iterate on it manually, just because it is generated. You can also give examples of idioms or patterns you would like to follow. It isn't perfect, and I agree that writing code is the best way to build a mental model, but writing code doesn't guarantee intuition either. I have written spaghetti that I could not hope to explain many times, especially when exploring or working in a domain that I am unfamiliar with.
ajmurmann 2 days ago [-]
I described how I liked doing ping-pong pairing TDD with Cursor elsewhere. One of the benefits of that approach is that I write at least half the implementation and tests and review every single line. That means that there is always code that follows the patterns I want and it's right there for the LLM to see and base its work on.
Edit: fix typo in last sentence
scudsworth 2 days ago [-]
i love when the llm can be its work of
ajmurmann 2 days ago [-]
Ugh, sorry for the typo. That was supposed to be "can base its work on"
doug_durham 2 days ago [-]
I've had exactly the opposite experience with generating idiomatic code. I find that the models have a lot of information on the standard idioms of a particular language. If I'm having to write in a language I'm new in, I find it very useful to have the LLM do an idiomatic rewrite. I learn a lot and it helps me to get up to speed more quickly.
qqtt 2 days ago [-]
I wonder if there is a big disconnect partially due to the fact that people are talking about different models. The top-tier coding models (Sonnet, o1, DeepSeek) are all pretty good, but they require paid subscriptions, or 400GB of local memory to run DeepSeek.
All the other distilled models and qwen coder and similar are a large step below the above models in terms of most benchmarks. If someone is running a small 20GB model locally, they will not have the same experience as those who run the top of the line models.
Aeolun 2 days ago [-]
The top of the line models are really cheap though. Getting an anthropic key and $5 of credit costs you exactly that, and gives you hundreds of prompts.
arijo 2 days ago [-]
LLMs can work if you program above the code.
You still need to state your assertions with precision and keep a model of the code in your head.
It's possible to be precise at a higher level of abstraction as long as your prompts are consistent with a coherent model of the code.
elliotto 2 days ago [-]
> It's possible to be precise at a higher level of abstraction as long as your prompts are consistent with a coherent model of the code.
This is a fantastic quote and I will use it. I describe the future of coding as natural-language coding (or maybe syntax-agnostic coding). This does not mean that the LLM is a magic machine that understands all my business logic. It means what you've described - I can describe my function flow in abstracted English rather than requiring adherence to a syntax.
whatever1 2 days ago [-]
The counterargument that I hear is that since writing code is now so easy and cheap, there is no need to write pretty code that generalizes well. Just have the llm write a crappy version and the necessary tests, and once your requirements change you just toss everything and start fresh.
icnexbe7 2 days ago [-]
i’ve had some luck with asking conceptual questions about how something works if i am using library X with protocol Y. i usually get an answer that is either actually useful or at least gets me on the right path of what the answer should be. for code though, it will tell me to use non existent apis from that library to implement things
the_real_cher 2 days ago [-]
Code written by an LLM is really, really good if done right, i.e. reviewing every line of code as it comes out and prompt-guiding it in the right direction.
If you're getting junior devs just pooping out code and sending it to review, that's really bad and should be a PIP-able offense in my opinion.
baq 2 days ago [-]
I iterated on 1k lines of React slop in 4h the other day - changed table components twice, handled errors, loading widgets, modals, you name it. It'd easily take me a couple of days to get maybe 80% of that done.
The result works ok, nobody cares if the code is good or bad. If it’s bad and there are bugs, doesn’t matter, no humans will look at it anymore - Claude will remix the slop until it works or a new model will rewrite the whole thing from scratch.
Realized while writing this that I should've added an extract of the requirements in a comment in the package's index.ts, or maybe a README.CURSOR.md.
mrtesthah 2 days ago [-]
My experience having Claude 3.5 Sonnet or Google Gemini 2.0 Exp-12-06 rewrite a complex function is that it slowly introduces slippage of the original intention behind the code, and the more rewrites or refactoring, the more likely it is to do something other than what was originally intended.
At the absolute minimum this should require including a highly detailed function specification in the prompt context and sending the output to a full unit test suite.
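For example (a hypothetical function and spec, just to show what "pin the intent with tests before letting a model refactor" can look like):

    from decimal import Decimal, ROUND_HALF_UP
    import pytest

    def normalize_price(value):
        """Spec carried in the prompt: round half-up to 2 decimals,
        reject negatives, treat None as 0.00. Any LLM rewrite must keep this."""
        if value is None:
            return Decimal("0.00")
        d = Decimal(str(value))
        if d < 0:
            raise ValueError("price cannot be negative")
        return d.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

    @pytest.mark.parametrize("raw,expected", [
        (None, Decimal("0.00")),
        ("1.005", Decimal("1.01")),   # half-up, not banker's rounding
        ("2.5", Decimal("2.50")),
    ])
    def test_normalize_price_spec(raw, expected):
        assert normalize_price(raw) == expected

    def test_rejects_negative():
        with pytest.raises(ValueError):
            normalize_price(-1)

If a rewrite drifts from the stated intent, a suite like this catches it immediately instead of weeks later.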
n4r9 2 days ago [-]
> Sometimes the LLMs can't fix a bug so I just work around it or ask for random changes until it goes away.
Lordy. Is this where software development is going over the next few years?
mrtesthah 2 days ago [-]
In that case we can look forward to literally nothing of any complexity or reliability being produced.
aeonik 2 days ago [-]
It's actually where we have been the whole time.
mirkodrummer 2 days ago [-]
I'd pay to review one of your PRs. Maybe a consistent one, with proof of AI usage.
baq 2 days ago [-]
Would be great comedic relief for sure since I'm mostly working in the backend mines, where the LLM-friendly boilerplate is harder to come by, admittedly.
> I don’t do this a lot, but sometimes when I’m really stuck on a bug, I’ll attach the entire file or files to Copilot chat, paste the error message, and just ask “can you help?”
The "reasoning" models are MUCH better than this. I've had genuinely fantastic results with this kind of thing against o1 and Gemini Thinking and the new o3-mini - I paste in the whole codebase (usually via my https://github.com/simonw/files-to-prompt tool) and describe the bug or just paste in the error message and the model frequently finds the source, sometimes following the path through several modules to get there.
I have picked up the Cursor tool, which allows me to throw in relevant files with a drop-down menu. Previously I was copy-pasting files into the ChatGPT browser page, but now I point Cursor at o1 and do it within the IDE.
One of my favourite things is to ask it if it thinks there are any bugs - this helps a lot with validating any logic that I might be exploring. I recently ported some code to a different environment with slightly different interfaces and it wasn't working - I asked o1 to carefully go over each implementation in detail and explain why it might be producing a different output. It thought for 2 whole minutes and gave me a report of possible causes - the third of which was entirely correct and had to do with how my environment was coercing pandas data types.
There have been 10 or so wow moments over the past few years where I've been shocked by the capabilities of genai and that one made the list.
powersnail 2 days ago [-]
The "attach the entire file" part is very critical.
I've had the experience of seeing some junior dev posting error messages into ChatGPT, applying the suggestions of ChatGPT, and posting the next error message into ChatGPT again. They ended up applying fixes for 3 different kinds of bugs that didn't exist in the code base.
---
Another cause, I think, is that they didn't try to understand any of those (not the solutions, and not the problems that those solutions are supposed to fix). If they did, they would have figured out that the solutions were mismatches to what they were witnessing.
There's a big difference between using LLM as a tool, and treating it like an oracle.
theshrike79 2 days ago [-]
This is why in-IDE LLMs like Copilot are really good.
I just had a case where I was adding stuff to two projects, both open at the same time.
I added new fields to the backend project, then I swapped to the front-end side and the LLM autocomplete gave me 100% exactly what I wanted to add there.
And similar super-accurate autocompletes happen every day for me.
I really don't understand people who complain about "AI slop", what kind of projects are they writing?
jppope 2 days ago [-]
My experience is similar: great for boilerplate, great for autocomplete, starts to fall apart on complex tasks, doesn't do much as far as business logic goes (how would it know?). All in all very useful, but not replacing a decent practitioner any time soon.
LLMs can absolutely bust out some corporate docs super crazy fast too... probably a reasonable place to re-evaluate their value, though.
nicksergeant 2 days ago [-]
I've had kind of great experiences even doing complex tasks with lots of steps, as long as I tell it to take things slowly and verify each step.
I had a working and complete version of Apple MapKit JS rendering a map for an address (along with the server side token generation), and last night I told it I wanted to switch to Google Maps for "reasons".
It nailed it on the first try, and even gave me quick steps for creating the API keys in Google Dev Console (which is always _super_ fun to navigate).
As Simon has said elsewhere in these comments, it's all about the context you give it (a working example in a slightly different paradigm really couldn't be any better).
theshrike79 2 days ago [-]
Exactly. For unit/integration tests I've found it to be a pretty good assistant.
I have a project with a bunch of tests already; then I pick a test file, write `public Task Test`, and wait a few seconds. In most cases it writes a pretty sane basis for a test - and in a few cases it has figured out an edge case I missed.
delduca 2 days ago [-]
> Disclaimer: I work for GitHub, and for a year I worked directly on Copilot.
Ah, now it makes sense.
brianstrimp 2 days ago [-]
Yeah, the submission heading should indicate that there is a high risk of a sales pitch in there.
foobazgt 2 days ago [-]
I wonder if the first bullet point, "smart auto complete", is much less beneficial if you're already using a statically typed language with a good IDE. I already feel like Intellij's auto complete reads my mind most of the time.
Klathmon 2 days ago [-]
LLM autocomplete is an entirely different beast.
Traditional auto complete can finish the statement you started typing, LLMs often suggest whole lines before I even type anything, and even sometimes whole functions.
And static types can assist the LLM too. It's not like it's an either-or choice.
foobazgt 2 days ago [-]
The author says they do the literal opposite:
"Almost all the completions I accept are complete boilerplate (filling out function arguments or types, for instance). It’s rare that I let Copilot produce business logic for me"
My experience is similar, except I get my IDE to complete these for me instead of an LLM.
neeleshs 2 days ago [-]
I use LLMs to generate complete solutions to small technical problems: "Write an input stream implementation that skips lines based on a regex".
Hard for IDE autocomplete to do this.
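For a sense of what that kind of ask produces, a minimal sketch in Python (a hypothetical helper, not the model's actual output):

    import re
    from typing import Iterator, TextIO

    def skip_matching_lines(stream: TextIO, pattern: str) -> Iterator[str]:
        """Yield lines from a stream, dropping any line that matches the regex."""
        regex = re.compile(pattern)
        for line in stream:
            if not regex.search(line):
                yield line

    # Usage: drop comment lines while reading a (hypothetical) config file.
    with open("app.conf") as fh:
        for line in skip_matching_lines(fh, r"^\s*#"):
            print(line, end="")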
AOsborn 2 days ago [-]
Yeah absolutely.
I find Copilot is great if you add a small comment describing the logic or function. Taking 10s to write a one-line sentence in English can save 5-10 minutes of writing your code from scratch. Subjectively, it feels much faster to QA and review code that's already written.
Having good typing and DTOs helps too.
baq 2 days ago [-]
> Copilot
…needn’t say more.
Copilot was utter garbage when I switched to Cursor+Claude; it was like some alien tech upgrade at first.
unregistereddev 2 days ago [-]
Does IntelliJ try to complete the word you are typing, or does it suggest an entire line of code? Because newer versions of IntelliJ incorporate LLMs to beef up autocomplete. You may already be using it.
mrguyorama 2 days ago [-]
I know I'm not using it because Intellij is constantly complaining that my version does not support the AI plugins.
The "dumb" autogenerated stuff is incredible. It's like going from bad autocomplete to Intellisense all over again.
The world of python tooling (at least as used by my former coworkers) put my expectations in the toilet.
foobazgt 2 days ago [-]
The new LLM-based completion in Intellij is not useful. :(
hansvm 2 days ago [-]
It's completely different. If I start writing an itertools library with comptime inlining and my favorite selection of other features, completing map/reduce/take/skip/... exactly how I want them to look, LLM autocomplete can finish the rest of the library exactly as I would have written it, even for languages it doesn't otherwise know well, outside of the interesting bits (in the context of itertools, that'd be utilities with memory tradeoffs, like tee and groupby).
VenturingVole 2 days ago [-]
My first thought upon reading this was about the observation that software engineers are deeply split on this: how can they be so negative? A mixture of emotions.
Then I reflected on how very true it was. In fact, as of writing this there are 138 comments, and I started simply scrolling through what was shown to assess the negative/neutral/positive bias based upon a highly subjective personal assessment: 2/3 were negative, so I decided to stop.
As a profession, it seems many of us have become accustomed to dealing in absolutes when reality is subjective, judging LLMs prematurely with a level of perfectionism not even cast upon fellow humans... or at least, if it is cast upon humans, I'd be glad not to be their colleague.
Honestly, right now I would use this as a litmus test in hiring, and the majority would fail based upon their closed-mindedness and inability to understand how to effectively utilise the tools at their disposal. It won't exist as a signal for much longer, sadly!
notTooFarGone 2 days ago [-]
It boils down to responsibility.
We need to trust machines more than humans because machines can't take responsibility. That code you pushed that broke prod - you can't point at the machine.
It is also about predictability/growth in a sense. I can assess certain people, know what they will probably get wrong, and develop the person and adjust accordingly. If that person uses LLMs, it disguises that exposure of skill and makes for a very hard signal to read as a senior dev, hampering their growth.
VenturingVole 2 days ago [-]
I absolutely agree with your points - assuming "machines" here means the code as opposed to the LLMs. As a "Staff+" IC mentoring and training a couple of apprentice-level developers, I've already asked on several occasions "why did you do this?" and had a response of "oh, that's what the AI did." I'm very patient, but have made clear that's something to never utter - at least not until one has the experience to deeply understand boundaries/constraints effectively.
I did see a paper recently on the impact of AI/LLMs and the danger to critical thinking skills - it's a very real issue and I'm having to actively counter this seemingly natural tendency many have.
With respect to signals, mine was around the attitude in general. I'd much rather work with someone who goes "Yes, but.." than one who is outright dismissive.
Increasing awareness of the importance of context will be a topic for a long time to come!
mvdtnz 2 days ago [-]
> What about hallucinations? Honestly, since GPT-3.5, I haven’t noticed ChatGPT or Claude doing a lot of hallucinating.
See this is what I don't get about the AI Evangelists. Every time I use the technology I am astounded at the amount of incorrect information and straight up fantasy it invents. When someone tells me that they just don't see it, I have to wonder what is motivating them to lie. There is simply no way you're using the same technology as me with such wildly different results.
simonw 2 days ago [-]
> There is simply no way you're using the same technology as me with such wildly different results.
Prompting styles are incredibly different between different people. It's very possible that they are using the same technology that you are with wildly different results.
I think learning to use LLMs to their maximum effectiveness takes months (maybe even years) of effort. How much time have you spent with them so far?
mrguyorama 2 days ago [-]
> I have to wonder what is motivating them to lie.
Most of these people who aren't salesmen aren't lying.
They just cannot tell when the LLM is making up code. Which is very very sad.
That or they could literally be replaced by a script that copy/pastes from stack-overflow. My friend did that a lot and it definitely helped features ship but doesn't make maintainable code.
the_mitsuhiko 2 days ago [-]
> When someone tells me that they just don't see it, I have to wonder what is motivating them to lie. There is simply no way you're using the same technology as me with such wildly different results.
I don’t know what technology you are using but I know that I am getting very different results based on my own prompt qualities.
I also do not really consider hallucinations to be much of an issue for programming. It comes up so rarely and it’s caught by the type checker almost immediately. If there are hallucinations it’s often very minor things like imagining a flag that doesn’t exist.
mvdtnz 2 days ago [-]
A lot of you guys, including the author, will respond with these "you're holding it wrong" comments. But you never give examples of actual prompts that are somehow significantly different to what I use. The author gives a very small handful of his example prompts and I don't see anything in them that's fundamentally different to what I've tried. If anything his prompts are especially lazy compared to what I use and what I have read as best practice among "prompt engineers":
"is this idiomatic C?"
"not just “how does X work”, but follow-up questions like “how does X relate to Y”. Even more usefully, you can ask “is this right” questions"
"I’ll attach the entire file or files to Copilot chat, paste the error message, and just ask “can you help?”"
the_mitsuhiko 2 days ago [-]
> A lot of you guys, including the author, will respond with these "you're holding it wrong" comments. But you never give examples of actual prompts that are somehow significantly different to what I use.
It's hard, if not impossible, to discuss these things without a concrete problem at hand. Most of the prompt is the context provided. I can only speak from my own experience, which is that how you write the prompt matters.
If you share what you are trying to accomplish I could probably provide some more appropriate insights.
simonw 2 days ago [-]
I've shared a few hundred examples of how I'm using this stuff, with full prompt and response transcripts. You can find them linked from items on these tags on my blog - they're usually gists:
- ai-assisted-programming: https://simonwillison.net/tags/ai-assisted-programming/
- prompt-engineering: https://simonwillison.net/tags/prompt-engineering/
And my series on how I use LLMs: https://simonwillison.net/series/using-llms/
A lot of times they probably are holding it wrong. These things aren’t mind readers. You have to provide proper context and clear asks. They are just another computer interface to learn to use.
And they aren’t perfect, but they sure can save a lot of time once you know how to use them and understand what they are and aren’t good at.
stuartd 2 days ago [-]
> is this idiomatic C?
This is how I use AI at work for maintaining Python projects, a language in which I am not at all really versed. Sometimes I might add “this is how I would do it in …, how would I do this in Python?”
I find this extremely helpful and productive, especially as I have to pull the code onto a server to test it.
synthc 2 days ago [-]
This year I switched to a new job, using programming languages that I was less familiar with.
Asking an LLM to translate between languages works really well most of the time.
It's also a great way to learn which libraries are the standard solution for a language. It really accelerated my learning process.
Sure, there is the occasional too literal translation or hallucination, but I found this useful enough.
brianstrimp 2 days ago [-]
Have you noticed any difference in picking up the language(s) yourself? As in, do you think you'd be more fluent in it by now without all the help? Or perhaps less? Genuine question.
mewpmewp2 2 days ago [-]
I do tons of TypeScript in my side projects and in real life, and I usually feel heavy frustrations when I stray away.
When I stray out of this (e.g. I started doing a lot of IoT, ML and robotics projects, where I can't always use TypeScript), one key thing LLMs have helped me with is that I can ask why something is X without having to worry about sounding stupid or annoying.
So I think it has enabled me to get out of the TypeScript zone more worry-free without losing productivity. And I do think I learn a lot, although I'm relating a lot of it to my JS/TS-heavy experience.
To me the ability to ask stupid questions without fear of judgment or accidentally offending someone - it's just amazing.
I used to overthink a lot before LLMs, but they have helped me with that aspect, I think a lot.
I sometimes think that no one except LLMs would have the patience for me if I didn't always filter my thoughts.
n144q 2 days ago [-]
Well said. ChatGPT is almost the opposite of Stack Overflow -- you can ask a stupid question, or ask why a language is designed in such a way, and get a nice, patient, nuanced answer without judgment or starting a war.
brianstrimp 1 days ago [-]
And how much can you trust those replies?
n144q 24 hours ago [-]
At least 80% of the time.
I have a brain and can verify whether it's correct or not.
n144q 2 days ago [-]
Agree with many of the points here, especially the part about one-off, non-production code. I've had great experience letting ChatGPT write utility code. Once it provided Go code for an ad-hoc task that ran exactly as expected on the first try, when it would have cost me at least 30 minutes otherwise, mostly spent looking up APIs I'm not familiar with. Another time it created an HTTP server that worked with only minor tweaks. I don't want to think about life before LLMs existed.
One thing that is not mentioned -- code review. It is not great at it, often pointing out trivial or non issues. But if it finds 1 area for improvement out of 10 bullet points, that's still worth it -- most human code reviewers don't notice all the issues in the code anyway.
pgm8705 2 days ago [-]
I used to feel they just served as a great auto complete or stack overflow replacement until I switched from VSCode to Cursor. Cursor's agent mode with Sonnet is pretty remarkable in what it can generate just from prompts. It is such a better experience than any of the AI tools VSCode provides, imo. I think tools like this when paired with an experienced developer to guide it and oversee the output can result in major productivity boosts. I agree with the sentiment that it falls apart with complex tasks or understanding unique business logic, but do think it can take you far beyond boilerplate.
Prickle 2 days ago [-]
The main issue I am having here is that I can see a measurable drop in my ability to write code because of LLM usage.
I need to avoid LLM use to ensure my coding ability stays up to par.
Aeolun 2 days ago [-]
There’s no measurable drop in my ability to write code, but there’s a very significant one in my desire to.
fosterfriends 2 days ago [-]
"Proofreading for typos and logic mistakes: I write a fair amount of English documents: ADRs, technical summaries, internal posts, and so on. I never allow the LLM to write these for me. Part of that is that I think I can write more clearly than current LLMs. Part of it is my general distaste for the ChatGPT house style. What I do occasionally do is feed a draft into the LLM and ask for feedback. LLMs are great at catching typos, and will sometimes raise an interesting point that becomes an edit to my draft."
--
I work on Graphite Reviewer (https://graphite.dev/features/reviewer). I'm also partly dyslexic. I lean massively on Grammarly (using it to write this comment) and type-safe compiled languages. When I engineered at Airbnb, I caused multiple site outages due to typos in my ruby code that I didn't see and wasn't able to execute before prod.
The ability for LLMs to proofread code is a godsend. We've tuned Graphite Reviewer to shut up about subjective stylistic comments and focus on real bugs, mistakes, and typos. Fascinatingly, it catches a minor mistake in ~1/5 PRs in prod at real companies (we've run it on a few million PRs now). Those issues it catches result in a pre-merge code change 75% of the time, about equal to what a human comment does.
AIs aren't perfect, but I'm thrilled that they work as fancy code spell-checkers :)
elwillbo 2 days ago [-]
I'm in the same boat, having to write a significant amount of English documents. I always write them myself, and have ChatGPT analyze them as well. I just had a thought - I wonder if I could paste in technical documentation, and code, to validate my documentation? Will have to try that later.
CoPilot is used for simple boilerplate code, and also for the autocomplete. It's often a starting point for unit tests (but a thorough review is needed - you can't just accept it, I've seen it misinterpret code). I started experimenting with RA.Aid (https://github.com/ai-christianson/RA.Aid) after seeing a post on it here today. The multi-step actions are very promising. I'm about to try files-to-prompt (https://github.com/simonw/files-to-prompt) mentioned elsewhere in the thread.
For now, LLMs are a level-up in tooling, but not a replacement for developers (at least not yet).
devmor 2 days ago [-]
How I use LLMs as a senior engineer:
1. Try to write some code
2. Wonder why my IDE is providing irrelevant, confusing and obnoxious suggestions
3. Realize the AI completion plugin somehow turned itself back on
4. Turn it off
5. Do my job better than everyone that didn't do step 4
hinkley 2 days ago [-]
Some of the stuff in these explanations sounds like missing tools. There's a similar thread for a different article going around today, and I kept thinking at various points, "Maybe instead of having an LLM write you unit tests, you should check out Property Based Testing?"
The question I keep asking myself is, "Should we be making tools that auto-write code for us, or should we be using this training data to suss out the missing tools we have where everyone writes the same code 10 times in their careers?"
brianstrimp 2 days ago [-]
"as a staff engineer"
Such an unnecessary flex.
simonw 2 days ago [-]
It's entirely relevant here. The opinions of a staff engineer on this stuff should be interpreted very differently from the opinions of a developer with much less experience.
theoryofx 2 days ago [-]
Not really, because "Staff Software Engineer" has become the new "Senior Software Engineer" due to title inflation. It's become an essentially meaningless distinction at many companies.
Case in point: this person has around 7 years of professional experience at just two companies, Zendesk and GitHub. I don't mean this as a personal dig in any way (truly), but this simply isn't what we used to mean by a "Staff" level software engineer.
This person is early-to-mid career, which we used to just call "Software Engineer", then "Senior Software Engineer", and now (often enough) "Staff Software Engineer".
theshrike79 1 days ago [-]
Exactly. I've been in the business for 20+ years and I think my title is still "Senior Software Engineer".
Not because of lack of skill, but I don't care. I could ask for a fancier one and most likely get it, but why?
brianstrimp 1 days ago [-]
They are at a place in their career where it still feels relevant to mention that title.
It's not relevant here, because this is a post from someone who worked on copilot. It's a shady sales pitch, disguised as an engineer's honest opinion.
whoknowsidont 2 days ago [-]
Uhhh, I hate to break it to you but titles mean absolutely nothing in this industry.
nmat 2 days ago [-]
To be honest I was expecting the article to focus primarily on internal docs, meetings, long Slack posts, etc. Staff engineers spend a relatively small percentage of their time writing code. A lot of what it takes to be successful is knowing how to communicate with different audiences which AI should be really useful for.
mns 1 days ago [-]
What is a Staff engineer anyhow? Sometimes I feel like all these titles and roles pop up all of a sudden to replace existing ones that had become too boring.
simonw 1 days ago [-]
A lot of companies use it for their IC (Individual Contributor) track, to solve the problem of engineers being forced to move into management because otherwise their career progression stops.
Not really; once you get past senior, the "shape" of staff+ engineers varies greatly. At that level the scope is typically larger, which can limit the usefulness of LLMs - I'd agree that the greatest value I've gotten is from being able to quickly get up to speed on something I've been asked to have an opinion on, and from sanity-checking my work if I'm using an unfamiliar language or framework.
It also helps if you realize staff+ is just a way to financially reward people who don't want to be managers, so you end up with these unholy engineer/architect/project manager/product manager hybrids that have to influence without authority.
iamwil 2 days ago [-]
I'd been using 4o and 3o to read research papers and ask about topics that are a little bit out of my depth for a while now. I get massive amount of value out of that. What used to take me a week of googling and squinting at wikipedia or digging for slightly less theoretical blog posts, I get to just upload a paper or transcript of a talk and just keep asking it questions until I feel like I got all the insights and ah-ha moments.
At the end, I ask it to give me a quiz on everything we talked about and any other insights I might have missed. Instead of typing out the answers, I just use Apple Dictation to transcribe my answers directly.
It's only recently that I thought to take the conversation I just had and have it write a blog post of the insights and ah-ha moments. It takes a fair bit of curation to get it to do that, however. I can't just say, "write me a blog post on all we talked about". I have to first get it to write an outline with the key insights, then, based on the outline, write each section. And then I'll use ChatGPT's canvas to guide and fine-tune each section.
However, at no point do I have to specifically write the actual text. I mostly do curation.
I feel ok about doing this, and don't consider it AI slop, because I clearly mark at the top that I didn't write a word of it and that it's the result of a curated conversation with 4o. In addition, I think if most people did this as a result of their own Socratic methods with an AI, it'd build up enough training data for the next generation of AI to do a better job of writing pedagogical explanations, posts, and quizzes to help people learn topics that are just out of reach but where there haven't been many people able to bridge the gap.
> I feel ok about doing this, and don't consider it AI slop, because I clearly mark at the top that I didn't write a word of it
This is key - if it's marked clearly as AI-generated or assisted, it's not slop. I think this is an important part of AI ethics that most people can agree with.
iamwil 2 days ago [-]
Not just that. I also spent quite a bit of time curating what it focused on. The insights and ah-ha moments.
callamdelaney 2 days ago [-]
My experience of copilot is that it’s completely useless and almost completely incapable of anything. 4o is reasonable though.
t8sr 2 days ago [-]
I guess I'm officially listed as a "staff engineer". I have been at this for 20 years, and I work with multiple teams in pretty different areas, like the kernel, some media/audio logic, security, database stuff... I end up alternating a lot between using Rust, Java, C++, C, Python and Go.
Coding assistant LLMs have changed how I work in a couple of ways:
1) They make it a lot easier to context switch between e.g. writing kernel code one day and a Pandas notebook the next, because you're no longer handicapped by slightly forgetting the idiosyncrasies of every single language. It's like having smart code search and documentation search built into the autocomplete.
2) They can do simple transformations of existing code really well, like generating a match expression from an enum. They can extrapolate the rest from 2-3 examples of something repetitive, like converting from Rust types into corresponding Arrow types.
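(A Python stand-in for that second kind of transformation - a mechanical enum-to-type mapping where the model only needs one or two arms to extrapolate the rest. The enum and the Arrow-style type names below are illustrative, not the commenter's actual Rust code.)

```python
from enum import Enum, auto


class ColumnType(Enum):
    INT64 = auto()
    FLOAT64 = auto()
    STRING = auto()
    BOOL = auto()


# Given the first arm or two, completion tools reliably fill in the rest of a
# mapping like this (an Arrow-style type name per enum member):
ARROW_TYPE_NAME = {
    ColumnType.INT64: "int64",
    ColumnType.FLOAT64: "double",
    ColumnType.STRING: "utf8",
    ColumnType.BOOL: "bool",
}


def to_arrow_type_name(column_type: ColumnType) -> str:
    return ARROW_TYPE_NAME[column_type]
```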
I don't find the other use cases the author brings up realistic. The AI is terrible at code review and I have never seen it spot a logic error I missed. Asking the AI to explain how e.g. Unity works might feel nice, but the answers are at least 40% total bullshit and I think it's easier to just read the documentation.
I still get a lot of use out of Copilot. The speed boost and removal of friction lets me work on more stacks and, consequently, lead a much bigger span of related projects. Instead of explaining how to do something to a junior engineer, I can often just do it myself.
I don't understand how fresh grads can get use out of these things, though. Tools like Copilot need a lot of hand-holding. You can get them to follow simple instructions over a moderate amount of existing code, which works most of the time, or ask them to do something you don't exactly know how to do without looking it up, and then it's a crapshoot.
The main reason I get a lot of mileage out of Copilot is exactly because I have been doing this job for two decades and understand what's happening. People who are starting in the industry today, IMO, should be very judicious with how they use these tools, lest they end up with only a superficial knowledge of computing. Every project is a chance to learn, and by going all trial-and-error with a chatbot you're robbing yourself of that. (Not to mention the resulting code is almost certainly half-broken.)
chasd00 2 days ago [-]
This is pretty much how I use LLMs for coding. I already know what I want; I just don't want to type it out. I ask the LLM to do the typing for me, then I check it over, copy/paste it in, and make any adjustments or extensions.
theshrike79 1 days ago [-]
This is the way.
Just last night I did a quick test on Cursor (first time trying it). Opened up my IRC bot project and asked it to "add relevant handlers for IRC messages".
It immediately recognised the pattern I had used before and added CTCP VERSION, KICK, INVITE and 433 (nickname already in use). It didn't try to add everything under the sun and just added those. Took me 20 seconds.
bsder 2 days ago [-]
The big problem with LLMs as a "staff engineer" is that LLMs are precisely suited to the kind of tasks that I would normally assign to a junior engineer or cooperative engineering student.
That's bad because it makes "not training your juniors" the default path for senior people.
I can assign the task to one of my junior engineers and they will take several days of back and forth with me to work out the details--that's annoying but it's how you train the next generation.
Or I can ask the LLM and it will spit back something from its innards that got indexed from Github or StackOverflow. And for a "junior engineer" task it will probably be correct with the occasional hallucination--just like my junior engineers. And all I have to do for the LLM is click a couple of keys.
nbaugh1 1 days ago [-]
One thing I don't see mentioned often but is definitely true for me - I use Google like 90% less frequently now. I spend zero time crawling through various blogs or stack overflow questions to do things like understand an error I haven't seen before. Google is basically now a means of directing me to official docsites if I don't already know the URL
arcticfox 2 days ago [-]
> How I use LLMs as a staff engineer
With all the talk of o1-pro as a superb staff engineer-level architect, it took me a while to re-parse this headline to understand what the author, apparently a staff engineer, meant
prisenco 2 days ago [-]
I used Copilot for a while (since the beta whenever that was) but recently I stopped and I'm glad I did. I use Claude and DeepSeek for searching documentation and rubber ducking/pair programming style conversations when I'm stuck, but that's about it.
I stick to a "no copy & paste" rule and that includes autocomplete. Interactions are a conversation but I write all my code myself.
ggregoire 2 days ago [-]
Don't people actually enjoy writing code and solving problems on their own?
I would be so bored if my job consisted of writing prompts all day long.
simonw 2 days ago [-]
I use LLMs to support my programming all day, and I'm not "writing prompts all day long". I'm working just like I used to, only faster.
LLMs make it quicker for me to:
- Decipher obscure error messages
- Knock out a quick exploratory prototype of a new idea, both backend and frontend code
- Write boilerplate code against commonly used libraries
- Debug things: feeding a gnarly bug plus my codebase into Gemini (for long context) or o3-mini can save me a TON of frustration
- Research potential options for libraries that might help with a problem
- Refactor - they're so good at refactoring now
- Write tests. Sometimes I'll have the LLM sketch out a bunch of tests that cover branches I may have not bothered to cover otherwise.
I enjoy working like this a whole lot more than I enjoyed working without them, and I enjoyed programming a lot prior to LLMs.
n144q 2 days ago [-]
Using LLMs to write code for you is solving problems. The argument is almost like saying "using a third party library is not solving a problem on your own". If it gets the job done, it works.
I enjoy writing code, but I enjoy getting a feature out even more. In fact, I don't quite enjoy the part of writing basic logic or tweaking CSS, which an intern can easily do.
I don't think anybody is writing prompts all day long. If you don't actually know how to write code, maybe. But at this point, a professional software engineer still works with a code base with a hands-on approach most of the time, and even heavy LLM users still spend a lot of time hand writing code.
Aeolun 2 days ago [-]
I enjoy building things. Writing code, well, I could take or leave that.
nvarsj 2 days ago [-]
I'm trying to understand the point of the affix "as a staff engineer", but I cannot.
piuantiderp 2 days ago [-]
Anytime you read "as an X", your spidey senses should tingle and you should be careful. Caveat lector.
sangnoir 2 days ago [-]
Boosting their personal brand, perhaps?
nvarsj 2 days ago [-]
Yes perhaps. I guess we all need to hustle for that F U money :).
softwaredoug 2 days ago [-]
I think people mistakenly use LLMs as research tools, thinking in terms of search, when they're better as collaborators / co-creators of scaffolding you know you need to edit.
SquibblesRedux 2 days ago [-]
I have found LLMs to be "good enough" for:
- imprecise semantic search
- simple auto-completion (1-5 tokens)
- copying patterns with substitutions
- inserting commonly-used templates
ur-whale 2 days ago [-]
This article completely correlates with my so-far very positive experience of using LLMs to assist me in writing code.
asdev 2 days ago [-]
if you write your code with good dependency injection/abstraction, you can one-shot unit tests a lot of the time
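(A small sketch of why that tends to work: with the dependency injected, the test needs no patching or setup, so there's very little for the model to get wrong. The classes below are invented for illustration.)

```python
from typing import Protocol


class Clock(Protocol):
    def now_iso(self) -> str: ...


class GreetingService:
    """Business logic with its dependency injected, so tests need no patching."""

    def __init__(self, clock: Clock) -> None:
        self._clock = clock

    def greeting(self, name: str) -> str:
        return f"Hello {name}, it is {self._clock.now_iso()}"


class FakeClock:
    def now_iso(self) -> str:
        return "2025-01-01T00:00:00Z"


def test_greeting_includes_fixed_timestamp():
    service = GreetingService(clock=FakeClock())
    assert service.greeting("Ada") == "Hello Ada, it is 2025-01-01T00:00:00Z"
```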
why-el 2 days ago [-]
I was hoping the LLM was the staff engineer? The title can be read both ways.
ddgflorida 2 days ago [-]
You summed it up well and your experience matches mine.
floppiplopp 2 days ago [-]
Yes, LLMs are great at generating corporate bullshit to appease the clueless middle management. I wouldn't trust its code generation for production systems though, but I can see how inexperienced devs might do just that.
dlvhdr 2 days ago [-]
Another article that doesn’t introduce anything new
1) Being up to date with the latest capabilities - this one has slowed down a bit, and my biggest self-learning push was in August/September; most of that intuition still works. However, although I had the time to do so, it's hard to ask my team to drop 5-6 free weekends of their lives to get up to speed.
2) The transition period where not everyone is on the same page about LLMs - this, I think, is much harder, because the expectations of the executives are very different from those of the on-the-ground developers using LLMs.
A lot of people could benefit from an alignment of expectations, but once again, it's hard to explain what is and isn't possible if your statements will be nullified a month later by a new AI model/product/feature.
Not sure I agree with that. I would say there are classes of problems where LLMs will generally help and a brief training course (1 week, say) would vastly improve the average (non-LLM-trained) engineer's ability to use it productively.
In real, traditional, deterministic systems where you explicitly design a feature, even that has difficulty staying coherent over time as usage grows. Think of tab stops on a typewriter evolving from an improvised template, to metal tabs installed above the keyboard, to someone cutting and pasting incorrectly and accidentally reflowing a 200-page document to 212 pages because of tab characters...
If you create a system with these models that writes the code to process a bunch of documents in some way, or does some kind of herculean automation, you haven't improved the situation when it comes to clarity or simplicity, even if the task at hand finishes sooner for you in this moment.
Every token generated has an equal potential to spiral out into new complexities and whack-a-mole issues that tie you to assumptions about the system design, while providing this veneer that you have control over the intersections of these issues; but as the situation grows, you create an ever bigger problem space.
And I definitely hear you say: this is the point where you use some sort of full-stack, interoceptive, holistic intuition about how to persuade the system towards a higher-order concept of the system, expand your ideas about how the problem could be solved, and let the model guide you... And that is precisely the mysticism I object to, because it isn't actually a kind of productiveness but a struggle, a constant guessing, and any insight from it can be taken away, changed accidentally, censored, or packaged as a front-run against your control.
Additionally, the nature of not having separate in-band and out-of-band streams of data means that even with agents and reasoning and all the avenues of exploration, improving performance will still not escape the fundamental question of... what is the total information contained in the entire probabilistic space? If you try to do out-of-band control in some way, like the latest thing I just read where they have a separate censoring layer, you either wind up having to use another LLM layer there, which still contains all of these issues, or you use some kind of non-transformer method like Bayesian filtering, and you get all of the issues outlined in the seminal spam.txt document...
So, given all of this, I think the kinds of feats you demonstrate are really neat, but I object that these issues can be boiled down to "putting a serious amount of effort into learning how to best apply them", because I just don't think that's a coherent view of the total problem, and it's not actually achievable the way learning other subjects like math is. I know it isn't answerable, but for me a guiding question remains: why do I have to work at all - is the model too small to know what I want or mean without any effort? The pushback against prompt engineering and the rise of agentic stuff and reasoning all seem to essentially be saying that, but they too have hit diminishing returns.
My experience has been that it's slightly improved code completion and helped with prototyping.
Probably, but when the time comes for layoffs, the first to go will be those hiding under a rock, claiming that there is no value in those LLMs even as they're being replaced.
The need for real coding skills, however, won't.
First, what LLMs/GenAI do is automated code generation, plain and simple. We've had code generation for a very long time; heck, even compiling is automated generation of code.
What is new with LLM code generation is that it is non-deterministic, unlike traditional code generation tools; like a box of chocolates, you never know what you're going to get.
So, as long as you have tools and mechanisms to make that non-determinism irrelevant, using LLMs to write code is not a problem at all. In fact, guess what? Hand-coding is also non-deterministic, so we already have plenty of those in place: automated tests, code reviews etc.
Don’t get me wrong—I’ve seen productivity gains both in LLMs explaining code/ideation and in actual implementation, and I use them regularly in my workflow now. I quite like it. But these people are itching to eliminate the cost of maintaining a dev team, and it shows in the level of wishful thinking they display. They write a snake game one day using ChatGPT, and the next, they’re telling you that you might be too slow—despite a string of record-breaking quarters driven by successful product iterations.
I really don’t want to be a naysayer here, but it’s pretty demoralizing when these are the same people who decide your compensation and overall employment status.
And this is the promise of AI, to eliminate jobs. If CEOs invest heavily in this, they won't back down because no one wants to be wrong.
I understand some people try to claim AI might make net more jobs (someday), but I just don't think that is what CEOs are going for.
They might not have to. If the results are bad enough then their companies might straight-up fail. I'd be willing to bet that at least one company has already failed due to betting too heavily on LLMs.
That isn't to say that LLMs have no uses. But just that CEOs willing something to work isn't sufficient to make it work.
Pre-LLM, that's how Boeing destroyed itself. By creating value for the shareholders.
If your experienced employees, who are giving an honest try to all these tools, are telling you it’s not a silver bullet, maybe you should hold your horses a little and try to take advantage of reality—which is actually better—rather than forcing some pipe dream down your bottom line’s throat while negating any productivity gains by demotivating them with your bullshit or misdirecting their efforts into finding a problem for a given solution.
If LLMs make average devs 10x more productive, the Jevons paradox[1] suggests we'll just make 10x more software rather than have 10x fewer devs. You can now implement that feature only one customer cares about, or test 10x more prototypes before building your product. And if you instead decide to decimate your engineering team, watch out, because your competitors might not.
https://en.wikipedia.org/wiki/Jevons_paradox
Maybe I’m being overly cynical, but assuming this isn’t a race to the bottom and people will get rich being super productive ai-enhanced code monsters, to me, looks like a conceited white collar version of the hustle porn guys that think if they simultaneously work the right combo of gig apps at the right time of day in the right spots then they can work their way up to being wealthy entrepreneurs. Good luck.
What seems far more likely to me is that computer scientists will be doing math research and wrangling LLMS, a vanishingly small number of dedicated software engineers work on most practical textual coding tasks with engineering methodologies, and low or no code tooling with the aid of LLMs gets good enough to make custom software something made by mostly less-technical people with domain knowledge, like spreadsheet scripting.
A lot of people in the LLM booster crowd think LLMs will replace specialists with generalists. I think that’s utterly ridiculous. LLMs easily have the shallow/broad knowledge generalists require, but struggle with the accuracy and trustworthiness for specialized work. They are much more likely to replace the generalists currently supporting people with domain-specific expertise too deep to trust to LLMs. The problem here is that most developers aren’t really specialists. They work across the spectrum of disciplines and domains but know how to use a very complex toolkit. The more accessible those tools are to other people, the more the skill dissolves into the expected professional skill set.
<<never uses AI>>
I've already seen several rounds of slacks: "why aren't you using <insert LLM coding assistant name>?" off the back of this reporting.
These assistants essentially spy on you working in many cases, if the subscription is coming from your employer and is not a personal account. For one service, I was able to see full logging of all the chats every employee ever had.
I've heard exactly the same stories from my friends in larger tech companies as well. Every all hands there's a push for more AI integration, getting staff to use AI tools and etc., with the big expectation that development will get faster.
If we take the premise at face value, then this is a time management question, and that’s a part of pretty much every performance evaluation everywhere. You’re not rewarded for writing some throwaway internal tooling that’s needed ASAP in assembly or with a handcrafted native UI, even if it’s strictly better once done. Instead you bash it out in a day’s worth of Electron shitfuckery and keep the wheels moving, even if it makes you sick.
Hyperbole aside, hopefully the point is clear: better is a business decision as much as a technical one, and if an LLM can (one day) do the 80% of the Pareto distribution, then you’d better be working on the other 20% when management come knocking. If I run a cafe, I need my baristas making coffee when the orders are stacking up, not polishing the machine.
Caveats for critical code, maintenance, technical debt, etc. of course. Good engineers know when to push back, but also, crucially, when it doesn’t serve a purpose to do so.
(I know because I'm in charge of maintaining all processes around LLM keys, their usage, Cursor stuff, etc.)
> I'm in charge of maintaining all processes around LLM keys
Does management look to you for insight on which staffers are appropriately committed to leveraging AI?
The entire reason they hire us is to let them know if what they think makes sense. No one is ideologically opposed to AI generated code. It comes with lots of negatives and caveats that make relying on it costly in ways we can easily show to any executives, directors, etc. who care about the technical feasibility of their feelings.
Unfortunately, that hasn't been my experience. But I agree with your comment generally.
I think that's the sort of spot where better tools might be appropriate. I know what I want to do, but it's a mess to do it. I suspect that will be better at facilitating growth instead of stunting it.
If you're truly saving time by having an LLM write boiler plate code, is there maybe an opportunity to abstract things away so that higher-level concepts, or more expressive code could be used instead?
5 lines of code written with just the core language and standard library are often much easier to read and digest than a new abstraction or call to some library.
And it’s just an unfortunate fact of life that many of the common programming languages are not terribly ergonomic; it’s not uncommon for even basic operations to require a few lines of boilerplate. That isn’t always bad as languages are balancing many different goals (expressiveness, performance, simplicity and so on).
In a way LLMs are ushering in a kind of boilerplate renaissance IMO. When you can have an LLM refactor a massive amount of boilerplate in one fell swoop it starts to not matter much if you repeat yourself - actually, really logically dense code would probably be harder for LLMs to understand and modify (not dissimilar from us…) so it’s even more of a liability now than in the past. I would almost always rather have simple, easy-to-understand code than something elegant and compact and “expressive” - and our tools increasingly favor this too.
Also I really don’t give a shit about how to best center a div nor do I want to memorize a million different markup tags and their 25 years of baggage. I don’t find that kind of knowledge gratifying because it’s more trivia than anything insightful. I’m glad that with LLMs I can minimize the time I spend thinking about those things.
Other things are complicated to abstract for the boilerplate they avoid. The kind of thing that avoids 100 lines of code but causes errors that take 20 minutes to understand because of heavy use of reflection/inferred types in generics/etc. The older I get, the more I think "clever" reflection is more of a sin than boring boilerplate.
And very often, if the LLM produces a poopoo, asking it to fix it again works just well enough.
I've yet to encounter any LLM, from ChatGPT to Cursor, that doesn't choke and start to repeat itself, say it changed code when it didn't, or get stuck changing something back and forth repeatedly inside of 10-20 minutes. Just a handful of exchanges and it's worthless. Are people who make this workflow effective summarizing and creating a fresh prompt every 5 minutes or something?
I estimate a sizable portion of my successful LLM coding sessions included at least a few resets of this nature.
This is the most important thing in my opinion. This is why I switched to showing tokens in my chat app.
https://beta.gitsense.com/?chat=b8c4b221-55e5-4ed6-860e-12f0...
I treat tokens like the tachometer for a car's engine. The higher you go, the more gas you will consume, and the greater the chance you will blow up your engine. Different LLMs will have different redlines and the more tokens you have, the more costly every conversation will become and the greater the chance it will just start spitting gibberish.
So far, my redline for all models is 25,000 tokens, but I really do not want to go above 20,000. If I hit 16,000 tokens, I will start to think about summarizing the conversation and starting a new one based on the summary.
The initial token count is also important in my opinion. If you are trying to solve a complex problem that is not well known by the LLM and if you are only starting with 1000 or less tokens, you will almost certainly not get a good answer. I personally think 7,000 to 16,000 is the sweet spot. For most problems, I won't have the LLM generate any code until I reach about 7,000 since it means it has enough files in context to properly take a shot at producing code.
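(Not the commenter's code, but a minimal sketch of tracking that "tachometer" yourself, assuming an OpenAI-style message list and the tiktoken tokenizer; counts will only be approximate for other model families. The thresholds mirror the numbers above.)

```python
import tiktoken

SUMMARIZE_AT = 16_000   # start summarizing the conversation around here
REDLINE = 25_000        # past this, answers tend to degrade

_enc = tiktoken.get_encoding("cl100k_base")


def conversation_tokens(messages: list[dict]) -> int:
    """Rough token count for a list of {'role': ..., 'content': ...} messages."""
    return sum(len(_enc.encode(m["content"])) for m in messages)


def should_summarize(messages: list[dict]) -> bool:
    return conversation_tokens(messages) >= SUMMARIZE_AT
```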
Learning takes time.
> Are people who make this workflow effective summarizing and creating a fresh prompt every 5 minutes or something?
I work on one small problem at a time, only following up if I need an update or change on the same block of code (or something very relevant). Most conversations are fewer than five prompt/response pairs, usually one to three. If the LLM gets something wrong, I edit my prompt to explain what I want better, or to tell it not to take a specific approach, rather than correcting it in a reply. It gets a little messy otherwise, and the AI starts to trip up on its own past mistakes.
If I move on to a different (sub)task, I start a new conversation. I have a brief overview of my project in the README or some other file and include that in the prompt for more context, along with a tree view of the repository and the file I want edited.
I am not a software engineer and I often need things explained, which I tell the LLM in a custom system prompt. I also include a few additional instructions that suit my workflow, like asking it to tell me if it needs another file or documentation, if it doesn't know something, etc.
It's like having to forever be the most miserable detective in the world; no mystery, only clues. A method that never existed, three different types that express the same thing, the cheeky smile of your coworker who says he can turn the whole backend into using an ORM in a day because he has Cursor, the manager who signs off on this, the deranged PR the next day. This continual sense that fewer and fewer people even know what's going on anymore...
"Can you make sure we support both Mongo and postgres?"
"Can you put this React component inside this Angular app?"
"Can you setup the kubernetes with docker compose?"
That does not match my experience at all. You obviously have to use your brain to review it, but for a lot of problems LLMs produce close to perfect code in record time. It depends a lot on your prompting skills though.
*I don’t know to what extent it’s worthwhile discussing whether you could call these the same model vs. entirely different, for any two products in the same family. Outside of simply quantising the same model and nothing else. Maybe you could include distillations of a base model too?
I built my chat app around this idea and to save money. When it comes to coding, I feel Sonnet 3.5 is still the best but I don't start with it. I tend to use cheaper models in the beginning since it usually takes a few iterations to get to a certain point and I don't want to waste tokens in the process. When I've reached a certain state or if it is clear that the LLM is not helping, I will bring in Sonnet to review things.
Here is an example of how the conversation between models will work.
https://beta.gitsense.com/?chat=bbd69cb2-ffc9-41a3-9bdb-095c...
The reason this works for my application is that I have a system prompt that includes the following lines:
# Critical Context Information
Your name is {{gs-chat-llm-model}} and the current date and time is {{gs-chat-datetime}}.
When I make an API call, I will replace the template strings with the model and date. I also made sure to include instructions in the first user message to let the model know it needs to sign off on each message. So with the system prompt and message signature, you can say "what do you think of <LLM's> response".
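(A minimal sketch of that substitution step: the template strings are the ones quoted above, while the function name, model string, and timestamp format are assumptions for illustration.)

```python
from datetime import datetime, timezone

SYSTEM_PROMPT_TEMPLATE = """# Critical Context Information
Your name is {{gs-chat-llm-model}} and the current date and time is {{gs-chat-datetime}}.
"""


def render_system_prompt(model: str) -> str:
    """Fill in the template placeholders just before making the API call."""
    now = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return (
        SYSTEM_PROMPT_TEMPLATE
        .replace("{{gs-chat-llm-model}}", model)
        .replace("{{gs-chat-datetime}}", now)
    )


# e.g. render_system_prompt("claude-3-5-sonnet") before each request
```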
Are you sure? I've definitely had cases where an inexperienced pair programmer made my code worse.
* If you ask it to solve a problem and nothing more, chances are the code isn't the best as it will default to the most common solutions in the training data.
* If you ask it to refactor some code idiomatically, it will apply most common idiomatic concepts found in the training data.
* If you ask it to do both at the same time you're more likely to get higher quality but incorrect code.
It's better to get a working solution first, then ask it to improve that solution, rinse/repeat in smallish chunks of 50-100 loc at a time. This is kinda why reasoning models are of some benefit, as they allow a certain amount of reflection to tie together disparate portions of the training data into more cohesive, higher quality responses.
Edit: fix typo in last sentence
All the other distilled models and qwen coder and similar are a large step below the above models in terms of most benchmarks. If someone is running a small 20GB model locally, they will not have the same experience as those who run the top of the line models.
You still need to state your assertions with precision and keep a model of the code in your head.
It's possible to be precise at a higher level of abstraction as long as your prompts are consistent with a coherent model of the code.
This is a fantastic quote and I will use this. I describe the future of coding as natural-language coding (or maybe syntax-agnostic coding). This does not mean that the LLM is a magic machine that understands all my business logic. It means what you've described - I can describe my function flow in abstracted English rather than being required to adhere to a syntax.
If you're getting junior devs just pooping out code and sending it to review, that's really bad and should be a PIP-able offense, in my opinion.
The result works OK; nobody cares if the code is good or bad. If it's bad and there are bugs, it doesn't matter - no humans will look at it anymore. Claude will remix the slop until it works, or a new model will rewrite the whole thing from scratch.
Realized during writing this that I should’ve added the extract of requirements in the comment of the index.ts of the package, or maybe a README.CURSOR.md.
At the absolute minimum this should require including a highly detailed function specification in the prompt context and sending the output to a full unit test suite.
Lordy. Is this where software development is going over the next few years?
My defense is that Karpathy does the same thing, admitted himself in a tweet https://x.com/karpathy/status/1886192184808149383 - I know exactly what he means by this.
> I don’t do this a lot, but sometimes when I’m really stuck on a bug, I’ll attach the entire file or files to Copilot chat, paste the error message, and just ask “can you help?”
The "reasoning" models are MUCH better than this. I've had genuinely fantastic results with this kind of thing against o1 and Gemini Thinking and the new o3-mini - I paste in the whole codebase (usually via my https://github.com/simonw/files-to-prompt tool) and describe the bug or just paste in the error message and the model frequently finds the source, sometimes following the path through several modules to get there.
Here's a slightly older example: https://gist.github.com/simonw/03776d9f80534aa8e5348580dc6a8... - finding a bug in some Django middleware
One of my favourite things is to ask it if it thinks there are any bugs - this helps a lot with validating any logic that I might be exploring. I recently ported some code to a different environment with slightly different interfaces and it wasn't working - I asked o1 to carefully go over each implementation in detail why it might be producing a different output. It thought for 2 whole minutes and gave me a report of possible causes - the third of which was entirely correct and had to do with how my environment was coercing pandas data types.
There have been 10 or so wow moments over the past few years where I've been shocked by the capabilities of genai and that one made the list.
I've had the experience of seeing some junior dev posting error messages into ChatGPT, applying the suggestions of ChatGPT, and posting the next error message into ChatGPT again. They ended up applying fixes for 3 different kinds of bugs that didn't exist in the code base.
---
Another cause, I think, is that they didn't try to understand any of those (not the solutions, and not the problems that those solutions are supposed to fix). If they did, they would have figured out that the solutions were mismatches to what they were witnessing.
There's a big difference between using LLM as a tool, and treating it like an oracle.
I just had a case where I was adding stuff to two projects, both open at the same time.
I added new fields to the backend project, then I swapped to the front-end side and the LLM autocomplete gave me 100% exactly what I wanted to add there.
And similar super-accurate autocompletes happen every day for me.
I really don't understand people who complain about "AI slop", what kind of projects are they writing?
LLMs can absolutely bust out some corporate docs super crazy fast too... probably a reasonable thing to re-evaluate the value though
I had a working and complete version of Apple MapKit JS rendering a map for an address (along with the server side token generation), and last night I told it I wanted to switch to Google Maps for "reasons".
It nailed it on the first try, and even gave me quick steps for creating the API keys in Google Dev Console (which is always _super_ fun to navigate).
As Simon has said elsewhere in these comments, it's all about the context you give it (a working example in a slightly different paradigm really couldn't be any better).
I have a project with a bunch of tests already, then I pick a test file and write `public Task Test` and wait a few seconds, in most cases it writes down a pretty sane basis for a test - and in a few cases it figured out an edge case I missed.
Ah, now it makes sense.
Traditional auto complete can finish the statement you started typing, LLMs often suggest whole lines before I even type anything, and even sometimes whole functions.
And static types can assist the LLM too. It's not like it's an either or choice
"Almost all the completions I accept are complete boilerplate (filling out function arguments or types, for instance). It’s rare that I let Copilot produce business logic for me"
My experience is similar, except I get my IDE to complete these for me instead of an LLM.
Hard for an IDE auto complete to do this.
I find Copilot is great if you add a small comment describing the logic or function. Taking 10s to write a one line sentence in English can save 5-10 mins writing your code from scratch. Subjectively it feels much faster to QA and review code already written.
Having good typing and DTOs helps too.
…needn’t say more.
Copilot was utter garbage when I switched to cursor+claude, it was like some alien tech upgrade at first.
The "dumb" autogenerated stuff is incredible. It's like going from bad autocomplete to Intellisense all over again.
The world of python tooling (at least as used by my former coworkers) put my expectations in the toilet.
Then I reflected, how very true it was. In fact, as of writing this there are 138 comments and I started simply scrolling through what was shown to assess the negative/neutral/positive bias based upon a highly subjective personal assessment: 2/3 were negative and so I decided to stop.
As a profession, it seems many of us have become accustomed to dealing in absolutes when reality is subjective. Judging LLMs prematurely with a level of perfectionism not even cast upon fellow humans.. or at least, if cast upon humans I'd be glad not to be their colleagues.
Honestly right now - I would use this as a litmus test in hiring and the majority would fail based upon their closed-mindedness and ability to understand how to effectively utilise tools at their disposal. It won't exist as a signal for much longer, sadly!
We need to trust machines more than humans because machines can't get responsibility. That code that you pushed and broke prd - you can't point at the machine.
It is also predictability/growth in a sense. I can assess certain people and know what they will probably get wrong and develop the person and adjust it. If that person uses LLMs it disguises that exposure of skill and leads to a very hard signal to read as a senior dev, hampering their growth.
I did see a paper recently on the impact of AI/LLMs and danger to critical thinking skills - it's a very real issue and I'm having to actively counter this seemingly natural tendency many have.
With respect to signals, mine was around the attitude in general. I'd much rather work with someone who goes "Yes, but.." than one who is outright dismissive.
Increasing awareness of the importance of context will be a topic for a long time to come!
See this is what I don't get about the AI Evangelists. Every time I use the technology I am astounded at the amount of incorrect information and straight up fantasy it invents. When someone tells me that they just don't see it, I have to wonder what is motivating them to lie. There is simply no way you're using the same technology as me with such wildly different results.
Prompting styles are incredibly different between different people. It's very possible that they are using the same technology that you are with wildly different results.
I think learning to use LLMs to their maximum effectiveness takes months (maybe even years) of effort. How much time have you spent with them so far?
Most of these people who aren't salesmen aren't lying.
They just cannot tell when the LLM is making up code. Which is very very sad.
That or they could literally be replaced by a script that copy/pastes from stack-overflow. My friend did that a lot and it definitely helped features ship but doesn't make maintainable code.
I don’t know what technology you are using but I know that I am getting very different results based on my own prompt qualities.
I also do not really consider hallucinations to be much of an issue for programming. It comes up so rarely and it’s caught by the type checker almost immediately. If there are hallucinations it’s often very minor things like imagining a flag that doesn’t exist.
"is this idiomatic C?"
"not just “how does X work”, but follow-up questions like “how does X relate to Y”. Even more usefully, you can ask “is this right” questions"
"I’ll attach the entire file or files to Copilot chat, paste the error message, and just ask “can you help?”"
It’s hard to impossible to discuss these things without a concrete problem at hand. Most of the prompt is the context provided. I can only talk from my experience which is, that how you write the prompt matters.
If you share what you are trying to accomplish I could probably provide some more appropriate insights.
- ai-assisted-programming: https://simonwillison.net/tags/ai-assisted-programming/
- prompt-engineering: https://simonwillison.net/tags/prompt-engineering/
And my series on how I use LLMs: https://simonwillison.net/series/using-llms/
And they aren’t perfect, but they sure can save a lot of time once you know how to use them and understand what they are and aren’t good at.
This is how I use AI at work for maintaining Python projects, a language in which I am not really versed. Sometimes I might add “this is how I would do it in …, how would I do this in Python?”
I find this extremely helpful and productive, especially as I have to pull the code onto a server to test it.
Asking an LLM to translate between languages works really well most of the time. It's also a great way to learn which libraries are the standard solutions in a language. It really accelerated my learning process.
Sure, there is the occasional too literal translation or hallucination, but I found this useful enough.
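As a hedged illustration of what that kind of translation prompt tends to look like (the snippet is hypothetical, not from either commenter):

    # Prompt: "In JavaScript I'd write:
    #   const names = users.filter(u => u.active).map(u => u.name);
    # How would I do this in Python?"
    #
    # A typical answer - a list comprehension, the idiomatic Python equivalent:
    users = [
        {"name": "alice", "active": True},
        {"name": "bob", "active": False},
    ]
    names = [u["name"] for u in users if u["active"]]
    print(names)  # ['alice']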
When I stray out of this (e.g. I've started doing a lot of IoT, ML and robotics projects, where I can't always use TypeScript), one key thing LLMs have helped me with is that I can ask why something is X without having to worry about sounding stupid or annoying.
So I think they have given me a way to get out of the TypeScript zone more worry-free, without losing productivity. And I do think I learn a lot, although I'm relating much of it back to my JS/TS-heavy experience.
To me the ability to ask stupid questions without fear of judgment or accidentally offending someone - it's just amazing.
I used to overthink a lot before LLMs, but I think they have helped me a lot with that.
I sometimes think that no one except an LLM would have the patience for me if I didn't always filter my thoughts.
I have a brain and can verify whether it's correct or not.
One thing that is not mentioned -- code review. It is not great at it, often pointing out trivial issues or non-issues. But if it finds one real area for improvement out of 10 bullet points, that's still worth it -- most human code reviewers don't notice all the issues in the code anyway.
I need to avoid LLM use to ensure my coding ability stays up to par.
--
I work on Graphite Reviewer (https://graphite.dev/features/reviewer). I'm also partly dyslexic. I lean massively on Grammarly (using it to write this comment) and type-safe compiled languages. When I engineered at Airbnb, I caused multiple site outages due to typos in my ruby code that I didn't see and wasn't able to execute before prod.
The ability for LLMs to proofread code is a godsend. We've tuned Graphite Reviewer to shut up about subjective stylistic comments and focus on real bugs, mistakes, and typos. Fascinatingly, it catches a minor mistake in ~1/5 PRs in prod at real companies (we've run it on a few million PRs now). Those issues it catches result in a pre-merge code change 75% of the time, about equal to what a human comment does.
AIs aren't perfect, but I'm thrilled that they work as fancy code spell-checkers :)
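Not Graphite Reviewer's actual implementation, but a minimal sketch of the "fancy code spell-checker" idea using the OpenAI Python client; the model name and prompt wording here are assumptions:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    REVIEW_PROMPT = (
        "You are a code proofreader. Report only real bugs, typos, and mistakes "
        "in the following diff. Do not make stylistic or subjective comments. "
        "If you find nothing, reply with 'LGTM'."
    )

    def proofread_diff(diff: str) -> str:
        """Send a unified diff to the model and return its proofreading notes."""
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # assumption: any capable chat model would do
            messages=[
                {"role": "system", "content": REVIEW_PROMPT},
                {"role": "user", "content": diff},
            ],
        )
        return response.choices[0].message.content or ""

    if __name__ == "__main__":
        import sys
        print(proofread_diff(sys.stdin.read()))  # e.g. `git diff | python proofread.py`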
CoPilot is used for simple boilerplate code, and also for the autocomplete. It's often a starting point for unit tests (but a thorough review is needed - you can't just accept it, I've seen it misinterpret code). I started experimenting with RA.Aid (https://github.com/ai-christianson/RA.Aid) after seeing a post on it here today. The multi-step actions are very promising. I'm about to try files-to-prompt (https://github.com/simonw/files-to-prompt) mentioned elsewhere in the thread.
For now, LLMs are a level-up in tooling, but not a replacement for developers (at least not yet).
1. Try to write some code
2. Wonder why my IDE is providing irrelevant, confusing and obnoxious suggestions
3. Realize the AI completion plugin somehow turned itself back on
4. Turn it off
5. Do my job better than everyone that didn't do step 4
The question I keep asking myself is, "Should we be making tools that auto-write code for us, or should we be using this training data to suss out the tools we're missing - the ones that would stop everyone from writing the same code 10 times in their career?"
Such an unnecessary flex.
Case in point, this person has around 7 years of professional experience at just two companies, Zendesk and GitHub. I don't mean this as a personal dig in any way (truly), but this simply isn't what we used to mean by a "Staff" level software engineer.
This person is early-to-mid career, which we used to just call "Software Engineer", then "Senior Software Engineer", and now (often enough) "Staff Software Engineer".
Not because of a lack of skill, but because I don't care. I could ask for a fancier one and most likely get it, but why?
https://www.youtube.com/watch?v=AbUU-D2Hil0
I like this definition (which comes with a whole book): https://staffeng.com/
It also helps if you realize staff+ is just a way to financially reward people who don’t want to be managers, so you end up with these unholy engineer/architect/project manager/product manager hybrids that have to influence without authority.
At the end, I ask it to give me a quiz on everything we talked about and any other insights I might have missed. Instead of typing out the answers, I just use Apple Dictation to transcribe my answers directly.
It's only recently that I thought to take the conversation I just had and have it write a blog post of the insights and ah-ha moments I had. It takes a fair bit of curation to get it to do that, however. I can't just say, "write me a blog post on all we talked about". I have to first get it to write an outline with the key insights, then have it write each section based on the outline, and then use ChatGPT's canvas to guide and fine-tune each section.
However, at no point do I have to specifically write the actual text. I mostly do curation.
I feel ok about doing this, and don't consider it AI slop, because I clearly mark at the top that I didn't write a word of it and that it's the result of a curated conversation with 4o. In addition, I think if most people did this as a result of their own Socratic sessions with an AI, it would build up enough training data for the next generation of AI to do a better job of writing pedagogical explanations, posts, and quizzes - getting people learning topics that are just out of reach but where there haven't been many people able to bridge the gap.
The two I had it write are: Effects as Protocols and Contexts as Agents: https://interjectedfuture.com/effects-as-protocols-and-conte...
How free monads and functors represent syntax for algebraic effects: https://interjectedfuture.com/how-the-free-monad-and-functor...
This is key - if it's marked clearly as AI-generated or assisted, it's not slop. I think this is an important part of AI ethics that most people can agree with.
Coding assistant LLMs have changed how I work in a couple of ways:
1) They make it a lot easier to context switch between e.g. writing kernel code one day and a Pandas notebook the next, because you're no longer handicapped by slightly forgetting the idiosyncrasies of every single language. It's like having smart code search and documentation search built into the autocomplete.
2) They can do simple transformations of existing code really well, like generating a match expression from an enum. They can extrapolate the rest from 2-3 examples of something repetitive, like converting from Rust types into corresponding Arrow types.
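To make that second point concrete: the commenter's example is Rust-to-Arrow, but the same flavor of transformation, rendered in Python for illustration (the enum here is hypothetical), is stubbing out an exhaustive match from an enum definition:

    from enum import Enum

    class Status(Enum):
        PENDING = "pending"
        ACTIVE = "active"
        CLOSED = "closed"

    # Given the enum above, an assistant can reliably stub out the exhaustive
    # match statement (Python 3.10+) and you just fill in or adjust the bodies.
    def describe(status: Status) -> str:
        match status:
            case Status.PENDING:
                return "waiting to start"
            case Status.ACTIVE:
                return "in progress"
            case Status.CLOSED:
                return "finished"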
I don't find the other use cases the author brings up realistic. The AI is terrible at code review and I have never seen it spot a logic error I missed. Asking the AI to explain how e.g. Unity works might feel nice, but the answers are at least 40% total bullshit and I think it's easier to just read the documentation.
I still get a lot of use out of Copilot. The speed boost and removal of friction lets me work on more stacks and, consequently, lead a much bigger span of related projects. Instead of explaining how to do something to a junior engineer, I can often just do it myself.
I don't understand how fresh grads can get use out of these things, though. Tools like Copilot need a lot of hand-holding. You can get them to follow simple instructions over a moderate amount of existing code, which works most of the time, or ask them to do something you don't exactly know how to do without looking it up, and then it's a crapshoot.
The main reason I get a lot of mileage out of Copilot is exactly because I have been doing this job for two decades and understand what's happening. People who are starting in the industry today, IMO, should be very judicious with how they use these tools, lest they end up with only a superficial knowledge of computing. Every project is a chance to learn, and by going all trial-and-error with a chatbot you're robbing yourself of that. (Not to mention the resulting code is almost certainly half-broken.)
Just last night I did a quick test on Cursor (first time trying it). Opened up my IRC bot project and asked it to "add relevant handlers for IRC messages".
It immediately recognised the pattern I had used before and added CTCP VERSION, KICK, INVITE and 433 (nickname already in use). It didn't try to add everything under the sun and just added those. Took me 20 seconds.
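For a sense of what those handlers amount to, a hedged Python sketch (the send helper and parsed-message shape are assumptions, not the commenter's actual bot):

    def handle_line(send, nick: str, prefix: str, command: str, params: list[str]) -> None:
        """Dispatch one parsed IRC line. `send` writes a raw line back to the server."""
        if command == "PRIVMSG" and params[1] == "\x01VERSION\x01":
            # CTCP VERSION: reply with a NOTICE wrapped in \x01 delimiters
            sender = prefix.split("!", 1)[0]
            send(f"NOTICE {sender} :\x01VERSION mybot 1.0\x01")
        elif command == "INVITE":
            send(f"JOIN {params[1]}")      # accept channel invites
        elif command == "KICK" and params[1] == nick:
            send(f"JOIN {params[0]}")      # rejoin the channel we were kicked from
        elif command == "433":
            send(f"NICK {nick}_")          # nickname already in use: try an alternative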
That's bad because it makes "not training your juniors" the default path for senior people.
I can assign the task to one of my junior engineers and they will take several days of back and forth with me to work out the details--that's annoying but it's how you train the next generation.
Or I can ask the LLM and it will spit back something from its innards that got indexed from Github or StackOverflow. And for a "junior engineer" task it will probably be correct with the occasional hallucination--just like my junior engineers. And all I have to do for the LLM is click a couple of keys.
With all the talk of o1-pro as a superb staff-engineer-level architect, it took me a while to re-parse this headline and understand what the author, apparently a staff engineer, meant.
I stick to a "no copy & paste" rule and that includes autocomplete. Interactions are a conversation but I write all my code myself.
I would be so bored if my job consisted of writing prompts all day long.
LLMs make it quicker for me to:
- Decipher obscure error messages
- Knock out a quick exploratory prototype of a new idea, both backend and frontend code
- Write boilerplate code against commonly used libraries
- Debug things: feeding a gnarly bug plus my codebase into Gemini (for long context) or o3-mini can save me a TON of frustration
- Research potential options for libraries that might help with a problem
- Refactor - they're so good at refactoring now
- Write tests. Sometimes I'll have the LLM sketch out a bunch of tests that cover branches I may not have bothered to cover otherwise (a rough sketch of what that looks like is below).
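A rough, hypothetical sketch of that kind of branch-covering test (parse_port is made up purely for illustration):

    import pytest

    # A small helper with three branches: bad input, out-of-range, and the happy path.
    def parse_port(value: str) -> int:
        port = int(value)            # non-numeric input raises ValueError
        if not 1 <= port <= 65535:   # out-of-range ports are rejected
            raise ValueError(f"port out of range: {port}")
        return port

    @pytest.mark.parametrize("value,expected", [("1", 1), ("80", 80), ("65535", 65535)])
    def test_parse_port_valid(value, expected):
        assert parse_port(value) == expected

    @pytest.mark.parametrize("value", ["0", "65536", "-1", "http"])
    def test_parse_port_invalid(value):
        with pytest.raises(ValueError):
            parse_port(value)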
I enjoy working like this a whole lot more than I enjoyed working without them, and I enjoyed programming a lot prior to LLMs.
I enjoy writing code, but I enjoy getting a feature out even more. In fact, I don't particularly enjoy the parts - writing basic logic or tweaking CSS - that an intern could easily do.
I don't think anybody is writing prompts all day long. If you don't actually know how to write code, maybe. But at this point, a professional software engineer still works with a code base with a hands-on approach most of the time, and even heavy LLM users still spend a lot of time hand writing code.
- imprecise semantic search
- simple auto-completion (1-5 tokens)
- copying patterns with substitutions
- inserting commonly-used templates
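The "copying patterns with substitutions" case is easiest to show with a hypothetical example (the class and field names are made up): after one or two of the repetitive pieces are written by hand, the completion extrapolates the rest.

    class User:
        def __init__(self, data: dict[str, str]) -> None:
            self._data = data

        # After writing the first accessor or two by hand...
        def get_name(self) -> str:
            return self._data["name"]

        def get_email(self) -> str:
            return self._data["email"]

        # ...the completion reliably extrapolates the rest of the pattern:
        def get_created_at(self) -> str:
            return self._data["created_at"]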