I work for Google, and I just got done with my work day. I was just writing I guess what you'd call "AI generated code."
But the code completion engine is basically just good at finishing the lines I'm writing. If I'm writing "function getAc..." it's smart enough to complete to "function getActionHandler()", and maybe suggest the correct arguments and a decent jsdoc comment.
So basically, it's a helpful productivity tool but it's not doing any engineering at all. It's probably about as good, maybe slightly worse, than Copilot. (I haven't used it recently though.)
NotAnOtter 346 days ago [-]
I also work at google (until last Friday). Agree with what you said. My thoughts are
1. This quote is clearly meant to exaggerate reality, and they are likely including things like fully automated CL/PR's which have been around for a decade as "AI generated".
2. I stated before that if a team of 10 is equally as productive as a team of 8 utilizing things like copilot, it's fair to say "AI replaced 2 engineers", in my opinion. More importantly, Tech leaders would be making this claim if it were true. Copilot and its clones have been around long enough now for the evidence to be in, and no one is stating "we've replaced X% of our workforce with AI" - therefore my claim is (by 'denying the consequent'), using copilot does not materially accelerate development.
ahmedfromtunis 346 days ago [-]
> no one is stating "we've replaced X% of our workforce with AI"
Even if that's been happening, I don't think it would be politically savvy to admit it.
In today's social climate claiming to replace humans with AI would attract the wrong kind of attention from politicians (during an election year) and from the public in general.
This would be even more unwise to admit for a company like Google, which is an "AI producer". They may leave such language for closed meetings with potential customers during sales pitches though.
whywhywhywhy 346 days ago [-]
> and from the public in general
Don't think the public will be that concerned about people in Google's salary bracket losing their jobs.
jl6 345 days ago [-]
It’s a disservice to the public to assume they aren’t capable of understanding why AI job losses might be concerning even if they aren’t directly impacted. Most people aren’t so committed to class warfare that they will root for the apocalypse as long as it stomps a rich guy first.
wavewrangler 345 days ago [-]
You mean poor person. As long as it stomps a poor person. The rich don’t have a habit of getting stomped. They direct other poor people to stomp their contemporaries. The poor don’t have a chance.
whatshisface 345 days ago [-]
I don't think a lot of people realize how few people are "rich" in the sense of not being impacted by the labor market, or how virtually all of them are retirees. CFOs aren't looking forward to a massive shift in the labor market for accountants any more than CPAs. Warren Buffett has a "job," he writes those letters for BH and oversees the firm's investments at a high level... and most of the people who live off of investments have children in the workforce. Even most people whose children live off of their investments have kids in the (nonprofit) workforce.
tehjoker 345 days ago [-]
Software engineers and grocery store workers are in different income brackets, but in the same class (labor/proletarian). It is managers, executives, and investors that are in the capitalist class. Class is determined by your relationship to production.
barrkel 345 days ago [-]
Software engineer salaries and stock compensation can be enough to shift alignment somewhat, especially after many years of capital accumulation.
tehjoker 345 days ago [-]
if you make the majority of your earnings from passive income or you do not need to work to live you are more part of the leisure class
barrkel 345 days ago [-]
Two things: capitalists don't not work; and if you have a sizeable portfolio, you may not need to work and may earn plenty of passive income, yet still work because you add more value at the margin working than fiddling with stock allocations or angel investing or whatnot (vs index funds etc.).
datavirtue 345 days ago [-]
It's easy to get a capitalist to come out of retirement. Most of the time you just have to ask them to take a look at your business. Before you know it they accept a board position and shortly thereafter they are running point as President.
tehjoker 343 days ago [-]
For an illustrated example, you can watch Succession
DAGdug 342 days ago [-]
I’ve switched from manager to IC and vice-versa a few times at FAANG. Didn’t strike me as moving between the capitalist and proletariat classes, lol!
ytss 346 days ago [-]
The public might though be concerned that if they are being replaced, many in other positions at other companies will soon be replaced as well.
darth_avocado 345 days ago [-]
That’s not how the mind works. People cheered when Elon fired 80% of the Twitter staff. No one cares when people with high paying jobs suffer.
mmcdermott 345 days ago [-]
The people who cheered about the firing of 80% of the Twitter staff largely believed (rightly or wrongly) that they were being adversely affected by them. While Google may be seen with more wariness in tech circles, I don't think the average person believes that Google is actively harming them (again, rightly or wrongly).
ahmedfromtunis 345 days ago [-]
These aren't the same types of events. In Twitter's case, it was a one-off act, caused by one-off circumstances. With Google, it'd be more of a precursor to a new trend that might soon take root and impact me or those I care about.
almatabata 345 days ago [-]
I think Twitter is an outlier because people hated the employees already for various reasons.
And a lot of people heavily disagreed with how they handled moderation. You can take things like the Hunter Biden laptop suppression or, in the funny category, people getting banned for saying "learn to code" (https://reason.com/2019/03/11/learn-to-code-twitter-harassme...).
Take a random company without controversies and you will find less vitriol about its employees getting fired.
pjmlp 345 days ago [-]
No one cares about the impact of supermarket self-checkout on employees, until their own employer does something similar.
alsetmusic 343 days ago [-]
I care as a consumer who hates standing in long lines. My former bank branch had thirteen teller stations and two tellers. This wasn't on a bad day. This was for years.
whatshisface 345 days ago [-]
People in Google salary brackets get jobs at Google-1 salary brackets, pushing junior people at Google-1 to Google-2, all the way down to IT departments at non-tech firms. This impacts everybody who's in the industry or capable of switching.
ahmedfromtunis 345 days ago [-]
Why would the general public care about Google employees? Google is, however, a major SaaS provider. And people might start to worry that their employer is soon going to buy a subscription to whatever it is that Google used to automate jobs.
wbl 345 days ago [-]
The bank tellers didn't go away: they just became higher paid and higher skilled when cash management was no longer the job.
burningChrome 345 days ago [-]
>> Even if that's been happening, I don't think it would be politically savvy to admit it.
When I was working in RPA (robotic process automation) about 7 years ago, we were explicitly told not to say "You can reduce your team size by having us develop an automation that handles what they're doing!"
Even back then we were told to talk about how RPA (and by proxy AI) empowers your team to focus on the really important things. Automation just reduces the friction to getting things done. Instead of doing 4 hours of mindless data input or moving folders from one place to the other, automation gives you back those four hours so your team can do something sufficiently more important and focus on the bigger picture stuff.
Some teams loved the idea. Other leaders were skeptical and never adopted it. I spent the majority of those three years trying to sell them on the idea that automation was good and very little time actually coding. It's interesting seeing the paradigm shift and seeing this stuff everywhere now.
aleph_minus_one 345 days ago [-]
> Even back then we were told to talk about how RPA (and by proxy AI) empowers your team to focus on the really important things.
As a non-politically savvy person ;-) I have a feeling that this is a similarly dangerous message, since what prevents many teams from focusing on really important things is often far too long meetings with managers and similar "important" stakeholders.
ethbr1 345 days ago [-]
The reason you don't lead with headcount reduction is two-fold.
1. Almost every business has growing workload. That means reassigning good employees and not hiring new headcount, not firing existing headcount. Unipurpose, low-value offshore teams are the only ones who get cut (e.g. doing "{this} for every one of {these}" work).
2. Most operational automation is impossible to build well without deep process expertise from the SME currently performing it. If you fire that person immediately after automating their task, what do you think the next SME tells you, when you need their help?
Successfully scaling operational automation programs therefore relies on additional headcount avoidance (aka improving their volume:employee ratio) and value measurement (FTE-equivalent time savings) to justify/measure.
lenerdenator 345 days ago [-]
> I don't think it would be politically savvy to admit it.
Would it be? Do they care?
Sam Altman's been talking about how GenAI could break capitalism (maybe not the exact quote, but something similar), and these companies have been pushing out GenAI products that could obviously and easily be used to fake photographic or video evidence of things that have occurred in the real world. Elon's obsessed with making an AI that's trained to be a 20-year-old male edgelord from the sewer pits of the internet.
Compared to those things, "we've replaced X% of our workforce with AI" is absolutely anodyne.
agentultra 345 days ago [-]
100%.
Altman tells anyone who will listen to him that monopolies are the only path to success in business. He has a lot riding on making sure everyone is addicted to AI and that he’s the one selling the shovels.
Google isn’t far off.
Most capitalists have this fantasy that they can reduce their labour expenses with AI and continue stock buy-backs and ever-increasing executive payouts.
What sucks is that they rely on class divisions so that people don’t feel bad when the “overpaid” software developers get replaced. Problem is that software developers are also part of the proletariat and creating these artificial class divisions is breaking up the ability to organize.
It’s not AI replacing jobs, it’s capital holders. AI is just the smoke and mirrors.
ahmedfromtunis 345 days ago [-]
Sam's company is not a multi-trillion dollar behemoth that employs hundreds of thousands and has a practical (near-)monopoly on huge swaths of the digital economy.
rty32 346 days ago [-]
> I don't think it would be politically savvy to admit it.
Depends on who you ask.
If Trump wins and Elon Musk actually gets a new job, they would be bragging about replacing humans with AI all day long. And corporates are going to love it.
Not sure about what voters think though. But the fact that most of these companies are in California, New York etc means that it barely matters.
petre 345 days ago [-]
Yup, just like full self driving and ending the war in Ukraine in 24 hours.
sfink 345 days ago [-]
I find the boast about ending the war to be reasonably likely -- if it is clear the US is switching sides in the conflict, a negotiated capitulation could happen pretty quickly.
In a similar vein, solving world hunger is closer today than it's ever been. The previous best hope was global thermonuclear war, but honestly that would leave enough survivors as to be mostly ineffective, and much more likely to have the opposite result. Severe climate change has a better shot at fully eliminating [human] hunger.
ulfw 345 days ago [-]
Corporates will soon have to realise the hard reality that when masses of humans have been replaced there won't be masses of humans with salaries to buy said corporates' goods anymore.
datavirtue 345 days ago [-]
AI is socialism, and it's unstoppable. People are trying to stop progress and go back to the old days. Nothing about the universe permits this.
A new economy is forming and there is nothing that can stop it without causing major, unintended fallout.
burningChrome 345 days ago [-]
>> they would be bragging about replacing humans with AI all day long.
Has either bragged about this at all?
The only thing I've heard floated is Musk running a "government efficiency commission" which I just assumed meant he would be looking for ways to gut a lot of the never ending, never dying government programs. I've never heard him saying the commission's goal was to replace people with AI.
The former president said such an audit would be to combat waste and fraud and suggested it could save trillions for the economy.
As the first order of business, Trump said that this commission will develop an action plan to eliminate fraud and improper payments within six months.
datavirtue 345 days ago [-]
Trump and Musk will get bored quickly if elected. Once in office your power is checked.
tjahg 345 days ago [-]
[flagged]
lenerdenator 345 days ago [-]
That would be the way someone with no real awareness of the philosophies and realities of the two parties in the US would see it. And to be fair, that's a good description of a large chunk of the American electorate.
But you can't have a guy who literally used to relieve himself into a golden toilet take over your party and be anything but the party of big business and billionaires.
Thorrez 345 days ago [-]
>Despite widespread rumors, there is no verified evidence that Trump actually owns a gold toilet.
Still a guy who operated multiple luxury hotel and golf course properties that would laugh a working man out the front door if he asked for an affordable room.
onion2k 346 days ago [-]
> no one is stating "we've replaced X% of our workforce with AI"
That's only worth doing if you're trying to cut costs though. If the company has unmet ambitions there's no reason to shrink the headcount from 10 to 8 and have the same amount of output when you can keep 10 people and have the output of 12 by leveraging AI.
hyperpape 346 days ago [-]
Almost all the big tech companies have had layoffs over the past several years. I think it’s safe to say cost cutting is very much part of their goal.
lupire 346 days ago [-]
But the specific roles being laid off are arbitrary, and the overall headcount-reduction goal is driven by macroeconomic factors (I'm being generous there), not based on new efficiencies.
Note the difference between "cost cutting" (do less, to lower cost) and "efficiency" (do same, but with less cost).
theptip 345 days ago [-]
The goal of these cost cutting initiatives is not an absolute reduction in cost, but a relative one. They needed to show an improvement in operating margin, ie % of revenue spent on engineers.
If your engineers become 20% more efficient then your margins are better and your problem is solved. (Indeed if you have tech that can make any engineer 20% more efficient then you are back in the game of hiring as many as you can find, as long as each added engineer brings in enough additional revenue.)
ktnaWA 346 days ago [-]
Thanks, that is how I read the announcement. The powers that be decided that there must be some quota to be fulfilled, and magically that quota was fulfilled.
AI engineers will not yet get a Nobel prize for putting everyone out of work.
pj_mukh 345 days ago [-]
"we've replaced X% of our workforce with AI"
Most likely what is actually happening is that the X% of workforce you would lay off is being put to other projects and Google in general can take on X% more projects for the same labor $$. So there is no real reason to make that particular "replaced" statement.
Sparkyte 345 days ago [-]
Google has to sell its AI somehow. The problem is that businesses will see this and want to fire head count because they go, "Well I guess AI can do it for freeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee!". Nope, no way is it writing code freely.
wcoenen 345 days ago [-]
> including things like fully automated CL/PR's which have been around for a decade
I haven't seen this yet so I'm intrigued. Is this a commercial product, or internal tooling?
OkGoDoIt 345 days ago [-]
I’m assuming this refers to things analogous to dependabot on GitHub where maybe it automatically updates a library version reference and runs the tests and creates a PR if everything seems good, or similarly for fixing style issues or other stuff that is pretty trivial and has good test coverage.
When you maintain an open source project on GitHub you will occasionally get some open source automated bot that submits a PR to do things like this without you even asking, and I’m sure there’s plenty more you can sign up for or implement yourself.
I wouldn’t really call it AI, but it is automated. I agree with the parent comment that a journalist trying to push an angle would probably lump it in as AI in order to make the number seem larger.
NotAnOtter 343 days ago [-]
It's common at most mega-corps like google. For example, if a utility function in an internal library was deprecated and replaced with a different function that has the same functionality, a team might write a script which generates hundreds/thousands of PRs to make the migration to the new function.
You don't want a single PR that does that, because that would affect thousands of projects, and if something goes wrong with a single one, the whole PR needs to be rolled back.
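A rough sketch of that kind of script, to make the "one PR per project" idea concrete. Everything here is hypothetical: `planMigration`, the in-memory `projects` shape, and the function names being migrated are made up for illustration, and a real migration tool would rewrite the syntax tree and talk to the code review system instead of doing a regex replace on strings.

```javascript
// Sketch only: plans one change per project instead of one giant change,
// so a bad change can be rolled back without touching the other projects.
// `projects` maps project name -> { filePath: sourceText } (assumed shape).
// Assumes oldName and newName are plain alphanumeric identifiers.
function planMigration(projects, oldName, newName) {
  // Match calls to the deprecated function, e.g. "oldSum(".
  const callPattern = new RegExp(`\\b${oldName}\\(`, "g");
  const changes = [];
  for (const [project, files] of Object.entries(projects)) {
    const edits = {};
    for (const [path, source] of Object.entries(files)) {
      const updated = source.replace(callPattern, () => `${newName}(`);
      if (updated !== source) edits[path] = updated;
    }
    // Only projects that actually use the old function get a change.
    if (Object.keys(edits).length > 0) {
      changes.push({ project, title: `Migrate ${oldName} to ${newName}`, edits });
    }
  }
  return changes;
}

const plan = planMigration(
  {
    billing: { "invoice.js": "total = oldSum(items);" },
    search: { "index.js": "rank(results);" },
  },
  "oldSum",
  "sum"
);
console.log(plan.map((change) => change.project)); // [ 'billing' ]
```

Each entry in the returned list would become its own PR, which is what lets one project's breakage be reverted independently.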
nlehuen 346 days ago [-]
I also work at Google and I agree with the general sentiment that AI completion is not doing engineering per se, simply because writing code is just a small part of engineering.
However in my experience the system is much more powerful than you described. Maybe this is because I'm mostly writing C++ for which there is a much bigger training corpus than JavaScript.
One thing the system is already pretty good at is writing entire short functions from a comment. The trick is not to write:
function getAc...
But instead:
// This function smargls the bleurgh
// by flooming the trux.
function getAc...
This way the completion goes much farther and the quality improves a lot. Essentially, use comments as the prompt to generate large chunks of code, instead of giving minimum context to the system, which limits it to single line completion.
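To make that concrete with a less silly comment, here is a minimal sketch. The function name comes from the example upthread; the `Map`-based registry and the no-op fallback are assumptions made up for illustration, and the body is hand-written to show the kind of thing a completion might produce, not actual tool output.

```javascript
// Returns the handler registered for `actionName` in `handlers`
// (a Map from action name to function).
// Falls back to a no-op handler when the action is unknown.
function getActionHandler(handlers, actionName) {
  const handler = handlers.get(actionName);
  return typeof handler === "function" ? handler : () => undefined;
}

const handlers = new Map([["save", (doc) => `saved ${doc}`]]);
console.log(getActionHandler(handlers, "save")("notes.txt")); // saved notes.txt
console.log(getActionHandler(handlers, "delete")("notes.txt")); // undefined
```

The comment carries the parts a bare `function getAc...` cannot: the data structure involved and what to do in the failure case.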
Aachen 346 days ago [-]
This type of not having to think about the implementation, especially in a language that we've by now well-established can't be written safely by humans (including by Google's own research into Android vulnerabilities if I'm not mistaken), at least with the current level of LLM, worries me the most
Time will tell whether it outputs worse, equal, or better quality than skilled humans, but I'd be very wary of anything it suggests beyond obvious boilerplate (like all the symbols needed in a for loop) or naming things (function name and comment autocompletes like the person above you described)
munksbeer 345 days ago [-]
> worries me the most
It isn't something I worry about at all. If it doesn't work and starts creating bugs and horrible code, the best places will adjust to that and it won't be used or will be used more judiciously.
I'll still review code like I always do and prevent bad code from making it into our repo. I don't see why it's my problem to worry about. Why is it yours?
Aachen 345 days ago [-]
Because I do security audits
Functional bugs in edge cases are annoying enough, and I seem to run into these regularly as a user, but there's yet another class of people creating edge cases for their own purposes. The nonchalant "if it doesn't work"... I don't know whether that confirms my suspicion that not all developers are aware of (as a first step; let alone control for) the risks
twoWhlsGud 345 days ago [-]
And especially if it generates bugs in ways different from humans - human review might be less effective at catching it...
xp84 343 days ago [-]
It generates bugs in pretty similar ways. It’s based on human-written code, after all.
Edge cases will usually be the ones to get through. Most developers don’t correctly write tests that exercise the limits of each input (or indeed have time to both unit test every function that way, and integration test to be sure the bigger stories are correctly working). Nothing about ai assist changes any of this.
(If anybody starts doing significant fully unsupervised “ai” coding they would likely pay the price in extreme instability so I’m assuming here that humans still basically read/skim PRs the same as they always have)
mbfg 345 days ago [-]
Except that no one trusts Barney down the hall that has stack overflow open 24/7. People naturally trust AI implicitly.
caeril 345 days ago [-]
It's worrying, yes, but we've had stackoverflow copy-paste coding for over a decade now already, which has exactly the same effects.
This isn't a new concern. Thoughtless software development started a long time ago.
Aachen 345 days ago [-]
As a security consultant, I think I'm aware of security risks all the time, also when I'm developing code just as a hobby in spare time. I can't say that I've come across a lot of stackoverflow code that was unsafe. It happened (like unsafe SVG file upload handling advice) and I know of analyses that find it in spades, but I personally correct the few that I see (got enough stackoverflow rep to downvote, comment, or even edit without the user's approval though I'm not sure I've ever needed that) and the ones found in studies may be in less-popular answers that people don't come across as often because we should be seeing more of them otherwise, both personally and in the customer's code
So that's not to say there is nothing to be concerned about on stackoverflow, just that the risk seems manageable and understood. You also nearly always have to fit it to your own situation anyway. With the custom solutions from generative models, this is all not yet established and you're not having to customise (look at) it further if it made a plausible-looking suggestion
Perhaps this way of coding ends up introducing fewer bugs. Time will tell, but we all know how many wrong answers these things generate in text as well as what they were trained on, giving grounds for worry—while also gathering experience, of course. I'm not saying to not use it at all. It's a balance and something to be aware of
I also can't say that I find it to be thoughtless when I look for answers on stackoverflow. Perhaps as a beginning coder, you might copy bigger bits? Or without knowing what it does? That's not my current experience, though
miki123211 346 days ago [-]
This is a good idea even outside of Google, with tools like copilot and such.
Often when I don't know exactly what function / sequence of functions I need to achieve a particular outcome, I put in a comment describing what I want to do, and Copilot does the rest. I then remove the comment once I make sure that the generated code actually works.
I find it a lot less flow-breaking than stackoverflow or even asking an LLM.
It doesn't work all of the time, and sometimes you do have to Google still, but for the cases it does work for, it's pretty nice.
Aachen 346 days ago [-]
Why remove the comment that summarises the intent for humans? The compiler will ignore your comment anyway, so it's only there for the next human who comes along and will help them understand the code
miki123211 344 days ago [-]
Because the code, when written, is usually obvious enough.
Doesn't need an explanation, but when working in a language I don't know well, I might not remember whether I'm supposed to call orderBy on the query or on the ORM module and pass query as the argument, whether the kwarg is called "field" or "column", whether it wants a string or something like `User.name` as the column expression, how to specify the ordering and so on.
randomdata 346 days ago [-]
Like he says, the "comment" describes what he wants to do. That's not what humans are interested in. The human already knows "what he wants to do" when they read the code. It's the things like "why did he want to do this in the first place?" that is lacking in the code, and what information is available to add in a comment for the sake of humans.
Remember, LLMs are just compilers for programming languages that just so happen to have a lot of similarities with natural language. The code is not the comment. You still need to comment your code for humans.
JohnFen 345 days ago [-]
> Like he says, the "comment" describes what he wants to do. That's not what humans are interested in.
When I'm maintaining other people's code, or my own after enough time has gone by, I'm very interested in that sort of comment. It gives me a chance to see if the code as written does what the comment says it was intended to do. It's not valuable for most of the code in a project, but is incredibly valuable for certain key parts.
You're right that comments about why things were done the way they were are the most valuable ones, but this kind of comment is in second place in my book.
mithametacs 345 days ago [-]
Or for something that needs like a quick mathematical lemma or a worked example. A comment on the "what" is fantastic.
qwertox 346 days ago [-]
It's often unnecessarily verbose. If you read a comment and glance at the code that follows, you'll understand what it is supposed to do. But the comment you're giving as an instruction to an LLM usually contains information which will then be duplicated in the generated code.
Aachen 346 days ago [-]
I see. Might still be better to have a verbose comment than no comment at all, as well as a marker of "this was generated" so (by the age of the code) you have some idea of what quality the LLM was in that year and whether to proofread it once more or not
lupire 346 days ago [-]
External comments are API usage comments.
LLM prompts are also implementation proposals.
Implementation comments belong inside the implementation, so they should be moved over if not deleted.
cryptonym 346 days ago [-]
Next human will put the code in a prompt and ask what it does. Chinese Whispers.
Aachen 346 days ago [-]
I tried making a meme some months ago with exactly this idea, but for emails. One person would tell an LLM "answer that I'm fine with either option" and sends a 5 KB email, in response to which the recipient receives it and gets the automatic summary function to tell them (in a good case) "they're happy either way" or (in a bad case) "they don't give a damn". It didn't really work, too complex for meme format as far as my abilities went, but yeah the bad translator effect is something I'm very much expecting from people who use an LLM without disclosing it
_heimdall 346 days ago [-]
If someone is going to use an LLM to send me an email, I'd much rather them just send me the prompt directly. For the LLM message to be useful the prompt would have included all the context and details anyway, I don't need an LLM to make it longer and sound more "professional" or polite.
Aachen 345 days ago [-]
That is actually exactly my unstated point / the awareness I was hoping to achieve by trying to make that meme :D
mithametacs 345 days ago [-]
Not necessarily. Your prompt could include instructions to gather information from your emails and address book to tell your friend about all the relevant contacts you know in the shoe industry.
_heimdall 345 days ago [-]
Well that sounds reasonable enough. My only request is that you send me the prompt and let me decide if I want to comply...informed consent!
Wow, I love good, original programming jokes like these, even the ideas of the jokes. I used to browse r/ProgrammerHumor frequently, but it is too repetitive -- mostly recycled memes and there is hardly anything new.
(No need to Orientalize to defamiliarize, especially when a huge fraction of the audience is Chinese, so Orientalizing doesn't defamiliarize. Game of Whispers or Telephone works fine.)
protomolecule 346 days ago [-]
Do the Chinese call it English Whispers?
tessierashpool 345 days ago [-]
Chinese-Americans, at least, call it a game of Telephone, like everyone else in the English-speaking world except for the actual English.
We call it “Telephone” because “Chinese Whispers” not only sounds racist, it is also super confusing. You need a lot of cultural context to understand the particular way in which Chinese whispers would be different from any other set of whispers.
tessierashpool 332 days ago [-]
I happened to re-read this, and to be clear, I'm not Chinese-American. the "we" there means "everyone else in the English-speaking world except for the actual English."
ahoka 345 days ago [-]
It’s all Greek to them.
cryptonym 342 days ago [-]
Pardon my French.
jappgar 346 days ago [-]
I can guarantee you there is more publicly accessible javascript in the world than C++.
Copilot will autocomplete entire functions as well, sometimes without comments or even after just typing "f". It uses your previous edits as context and can assume what you're implementing pretty well.
infecto 346 days ago [-]
I can guarantee you that the author was referencing code within Google. That is, their tooling is trained off internal code bases. I am imagining c++ dwarfs javascript.
lupire 346 days ago [-]
Google does not write much publicly available JavaScript. They wrote their own special flavor. (Same for any huge legacy operation)
bilekas 346 days ago [-]
Can we get some more info on what you're referring to?
jkaptur 345 days ago [-]
They're probably talking about Closure Compiler type annotations [0], which never really took off outside Google, but (imo) were pretty great in the days before TypeScript. (Disclosure: Googler)
I find writing code to be almost relaxing, plus that's really a tiny fraction of dev work. Not too excited about potential productivity gains based purely on authoring snippets. I find it much more interesting for boosting maintainability, robustness and other quality metrics (not focusing on quality of AI output, actual quality of the code base).
xp84 343 days ago [-]
I frequently use copilot and also find that writing comments like you do, to describe what I expect each function/class/etc to do gives superb results, and usually eliminates most of the actual coding work. Obviously it adds significant specification work but that’s not usually a bad thing.
michaelbuckbee 346 days ago [-]
I don't work at Google, but I do something similar with my code: write comments, generate the code, and then have the AI tooling create test cases.
AI coding assistants are generally really good at ramping up a base level of tests which you can then direct to add more specific scenarios to.
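A small sketch of what that split can look like. The function under test (`parsePageSize`) and the tiny `check` helper are hypothetical, written by hand for illustration; no particular assistant or test framework is assumed.

```javascript
// Hypothetical function under test: parse a "page size" query parameter,
// clamping it to [min, max] and using a fallback for non-numeric input.
function parsePageSize(raw, { min = 1, max = 100, fallback = 20 } = {}) {
  const n = Number.parseInt(raw, 10);
  if (Number.isNaN(n)) return fallback;
  return Math.min(max, Math.max(min, n));
}

// Minimal assertion helper so the sketch needs no test framework.
function check(actual, expected, label) {
  if (actual !== expected) {
    throw new Error(`${label}: expected ${expected}, got ${actual}`);
  }
}

// Base-level cases: the obvious happy path and one bad input.
check(parsePageSize("25"), 25, "typical value");
check(parsePageSize("abc"), 20, "non-numeric falls back");

// Scenario-specific cases added afterwards: the limits of the input.
check(parsePageSize("0"), 1, "below minimum is clamped");
check(parsePageSize("100"), 100, "exactly at maximum");
check(parsePageSize("101"), 100, "above maximum is clamped");
check(parsePageSize("-5"), 1, "negative is clamped");
check(parsePageSize(undefined), 20, "missing parameter falls back");
```

The first group is the baseline; the second group is the part that usually has to be asked for explicitly.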
tomhallett 345 days ago [-]
Has anyone made a coding assistant which can do this based off audio which I’m saying out loud while I’m typing (interview/pairing style), so instead of typing the comment I can just say it?
hecanjog 345 days ago [-]
I had some success using this for basic input, but never took it very far. It's meant to be customizable for that sort of thing though: https://talon.wiki/quickstart/getting_started/ (Edit: just the voice input part)
alickz 346 days ago [-]
Comment Driven Programming might be interesting, as an offshoot of Documentation Driven Programming
gniv 346 days ago [-]
That's pretty nice. Does it write modern C++, as I guess it's expected?
So this is basically the google CEO saying "a quarter of our terminal inputs is written by a glorified tab completion"?
asdfman123 346 days ago [-]
Yes. Most AI hype is this bad. They have to justify the valuations.
remus 346 days ago [-]
"tab completion good enough to write 25% of code" feels like a pretty good hit rate to me! Especially when you consider that a good chunk of the other 75% is going to be the complex, detailed stuff where you probably want someone thinking about it fairly carefully.
rantallion 346 days ago [-]
The problem being that the time spent fixing the bugs in that 25% outweighs the time saved. Now that tools like Copilot are being widely used, studies are showing that they do not in fact boost productivity. All claims to the contrary seem to be either anecdotal or marketing fluff.
The AI tab completion is >100000% better than the coding assistants: it just saves you typing and doesn't introduce new bugs you need to fix, instead of writing buggy, shitty code from a text description.
red_admiral 346 days ago [-]
As far as I know, LLMs are a genuine boost for junior developers, but still not close to what senior/principal engineers get up to.
makestuff 345 days ago [-]
I have around 7 YOE, and I have found LLMs useful for very specific questions about syntax whenever I am working in a new language. For example, I needed to write some TypeScript recently and asked it how I can make a type that does X.
It is not as good with questions about API documentation for popular Java libraries though, and it will just hallucinate APIs/method names.
If I ask it a generic question like "how can I create a class in Java to invoke this API and store the data in this database" it is pretty useless. I'm sure I could spend more time giving it a better prompt but at that point I can just write the code myself.
Overall they are a better search engine for stackoverflow, but the LLMs are not really helping me code 30% faster or whatever the latest claim is.
_heimdall 346 days ago [-]
It'd be interesting to know how much of Google's code is written by junior engineers. I can't imagine 25% of the code is from juniors, at which point Google's CEO is either exaggerating what he considers LLM-generated code or more than just juniors are using it.
I agree with your take though, it does seem helpful to juniors but not beyond that (yet), and this OP stat seems dubious unless juniors are doing a big portion of the work.
red_admiral 346 days ago [-]
"rm re[TAB]" to remove a file called something like "report-accounting-Q1_2024.docx" is really helpful, especially when it adds quotes as required, but not exciting enough to get me out of bed any earlier in the morning.
I feel it's a bit like the old "measuring developer productivity in LoC" metric.
As I hinted at in another comment, in Java if you had a "private String name;" then the following:
/**
 * Returns the name.
 * @return The name.
 */
public String getName() {
    return this.name;
}
and the matching setter, are easy enough to generate automatically and you don't need an LLM for it. If AI can do that part of coding a bit better, sure it's helpful in a way, but I'm not worried about my job just yet (or rather, I'm more worried about the state of the economy and other factors).
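To make the "you don't need an LLM for it" point concrete, here is a minimal sketch of template-based accessor generation. It is written in Python purely for illustration; the function name `generate_accessors` and the regex are hypothetical, and it only handles simple declarations like the `private String name;` example above (no generics, no modifiers other than `private`).

```python
import re

# Matches simple declarations such as "private String name;".
# Assumption: generic types like List<String> are out of scope for this sketch.
FIELD_RE = re.compile(r"^\s*private\s+(\w+)\s+(\w+)\s*;\s*$")


def generate_accessors(field_declaration: str) -> str:
    """Return Java getter and setter source for one private field declaration."""
    match = FIELD_RE.match(field_declaration)
    if match is None:
        raise ValueError(f"unsupported field declaration: {field_declaration!r}")
    java_type, name = match.groups()
    suffix = name[0].upper() + name[1:]  # name -> Name
    getter = (
        "/**\n"
        f" * Returns the {name}.\n"
        f" * @return The {name}.\n"
        " */\n"
        f"public {java_type} get{suffix}() {{\n"
        f"    return this.{name};\n"
        "}\n"
    )
    setter = (
        "/**\n"
        f" * Sets the {name}.\n"
        f" * @param {name} The new {name}.\n"
        " */\n"
        f"public void set{suffix}({java_type} {name}) {{\n"
        f"    this.{name} = {name};\n"
        "}\n"
    )
    return getter + "\n" + setter


print(generate_accessors("private String name;"))
```

This is plain string templating, which is roughly what IDE "generate getters and setters" actions have done for a long time; nothing here is learned from data.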
Maxion 346 days ago [-]
For me it's really goddam satisfying having good autocomplete, especially when you are just writing boilerplate lines of code to get the code into a state where you actually get to work on the fun stuff (the harder problems).
amelius 346 days ago [-]
Also if your code gets sent to someone else's cloud?
infecto 346 days ago [-]
I don't care. The vast majority of code written in the private space is garbage and not unique. Products are usually not won because of the code.
Would I send the source of a trading algo or ChatGPT to a third party? Probably not, but those are the outliers. The code for your xyz SaaS does not matter.
I am probably an outlier in that I don't really care what corpus an LLM trains off of. If it's available in the public space, go for it.
mewpmewp2 346 days ago [-]
Have you ever had your code repository hosted by Github, Bitbucket, Gitlab or similar?
If so, all your code is sent to the cloud.
amelius 346 days ago [-]
Answer: yes, some code. But other code I and my company like to keep private.
mewpmewp2 346 days ago [-]
Where exactly is the repo hosted if there is one?
cesarb 346 days ago [-]
It's common for companies to have something like self-hosted GitHub Enterprise or self-hosted GitLab hidden behind the company's VPN.
mewpmewp2 345 days ago [-]
But where is the box where it's hosted? Is it in-house?
_heimdall 346 days ago [-]
There are alternatives out there for self-hosted git. I have a Gitea instance running on a mini PC at home for my own projects.
mewpmewp2 345 days ago [-]
Do you have backups of that as well? If something were to happen to your mini pc would you lose your code?
_heimdall 345 days ago [-]
Great question, yeah I do. Right now it backs up to a separate NAS on my home network. Every once in a while I'll copy the most important directories onto a microSD card backup, but it's usually going to be at least a few weeks out of date.
amelius 346 days ago [-]
Own servers.
mewpmewp2 345 days ago [-]
Do they manage their own servers? I wonder what proportion of companies would have in house servers managed by themselves.
amelius 345 days ago [-]
They are colocated in a data center and you need physical keys to access the rack.
red_admiral 346 days ago [-]
Internally hosted gitlab instances are a thing.
mewpmewp2 346 days ago [-]
They are, but frequently the boxes where they are hosted are in AWS or similar. Or do companies frequently have actual in-house servers for this purpose?
red_admiral 345 days ago [-]
Not in house, but in a "segmented" part of the cloud that comes with service level agreements and access control and restrictions on which countries the data can be hosted in and compliance procedures etc. etc.
An extreme example of this would be the AWS GovCloud for government/military applications.
keybored 346 days ago [-]
25% is a great win if you are prone to RSI. And for quicker feedback. But in terms of the overarching programming goal? Churning out code is a small part of it.
Code is often a liability.
shombaboor 345 days ago [-]
It would be funny if they had a metric for how much code is completed by CTRL+V
unglaublich 346 days ago [-]
Yes, isn't that the essential idea of industrialization and automation?
OtherShrezzing 346 days ago [-]
I think the critique here is that the AI currently deployed at Google hasn't meaningfully automated this user's life, because most IDEs already solved "very good autocomplete" more than a decade ago.
tormeh 346 days ago [-]
LLM autocomplete is on an entirely different level. It's not comparable to traditional autocomplete and mostly does not even compete with traditional autocomplete. LLM autocomplete will sometimes write entire blocks of code for you, with surprising skill. I often wonder how it knew what I wanted. It also generates some wrong code from time to time, but that's well worth it.
randomdata 346 days ago [-]
> LLM autocomplete is on an entirely different level.
Which is how they've surpassed 25% in new code, as compared to the 10% (made up number, but clearly non-zero) in the past. But incremental improvement, is all.
busterarm 345 days ago [-]
glorified, EXPENSIVE tab completion.
walthamstow 345 days ago [-]
I assume you're referring to the compute/energy used to run the completion?
busterarm 345 days ago [-]
to train the model
mmmpetrichor 346 days ago [-]
Yeah, but he wants people to hear "reduce headcount by 25% if you buy our shit!"
mewpmewp2 346 days ago [-]
How do you know that? You are creating this false sense of expectations and hype yourself.
I am going to argue the contrary. If AI increases productivity 2x, it opens up just as many new use cases that previously didn't seem worth doing for their cost. So overall there will just be more work.
JimDabell 346 days ago [-]
> I am going to argue the contrary. If AI increases productivity 2x, it opens up just as many new use cases that previously didn't seem worth doing for their cost. So overall there will just be more work.
This is the entire history of the computing industry. We’ve been automating our work away for decades and it just creates more demand.
mewpmewp2 346 days ago [-]
Yeah, this is only side projects, but I've been spending pretty much all of my free time now on side projects, largely because I feel much faster building them with LLMs and it has a compounding motivational effect. I also see so many use cases and work left to do, even with AI, the possibilities almost overwhelm me.
Well I do freelancing as well besides my usual day to day work, and that's also where direct benefits apply, and I'm getting more and more work, overwhelmingly so.
pawelmurias 346 days ago [-]
[flagged]
binkHN 346 days ago [-]
I wouldn't call it genius tab completion. Unfortunately, more than half of the time that the "genius" produces the code, I'm wasting my time reviewing code that is incorrect.
tguinot 346 days ago [-]
I'm sorry but I don't understand how people say LLMs are simply "tab completion".
They allow me to do much more than that thanks to all the knowledge they contain.
For instance, yesterday I wanted to write a tool that transfers any large file that is still being appended to to multiple remote hosts, with a fast throughput.
By asking Claude for help I obtained exactly what I want in under two hours.
I'm no C/C++ expert, yet I now have a functional program using libtorrent and libfuse.
By using libfuse my program creates a continuously growing list of virtual files (chunks of the big file).
A torrent is created to transfer the chunks to remote hosts.
Each chunk is added to the torrent as it appears on the file system thanks to the BEP46 mutable torrent feature in libtorrent.
On each receiving host, the program rebuilds the large file by appending new chunks as soon as they are downloaded through the torrent.
Now I can transfer a 25GB file (and growing) to 15 hosts as it is being written to.
Before LLM this would have taken me at least four days as I did not know those libraries.
LLMs aren't just parrots or tab completers, they actually contain a lot of useful knowledge and they're very good at explaining it clearly.
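The chunking idea in the comment above (split a still-growing file into fixed-size pieces, ship each piece as it completes, append on the receiving side) can be sketched without any of the libraries mentioned. This is not the commenter's program: libfuse and libtorrent are left out entirely, the transport is replaced by a direct function call, and the names `read_new_chunks` and `append_chunks` are made up for illustration.

```python
from pathlib import Path


def read_new_chunks(source: Path, offset: int, chunk_size: int):
    """Return (chunks, new_offset) for every complete chunk found past offset.

    A trailing partial chunk is left alone so it can be picked up on a later
    call, once the writer has appended enough bytes to complete it.
    """
    chunks = []
    with source.open("rb") as f:
        f.seek(offset)
        while True:
            data = f.read(chunk_size)
            if len(data) < chunk_size:
                break  # incomplete chunk: wait for more data
            chunks.append(data)
            offset += chunk_size
    return chunks, offset


def append_chunks(destination: Path, chunks) -> None:
    """Receiver side: rebuild the file by appending chunks in arrival order."""
    with destination.open("ab") as f:
        for chunk in chunks:
            f.write(chunk)
```

A sender would call `read_new_chunks` in a loop, remembering the returned offset between calls, and would need one final flush of the last partial chunk once the writer closes the file. Ordering, retries and integrity checks are exactly the parts a torrent-based transport would be handling in the setup described above.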
qwertox 346 days ago [-]
> By asking Claude for help I obtained exactly what I want in under two hours.
Did you use it in your editor or via the chat interface in the browser? Because they are two different approaches, and the one in the editor is mostly a (pretty awesome) tab completion.
When I tell an LLM to "create a script which does ..." I won't be doing this in the editor, even if copilot does have the chat interface. I'll be doing this in the browser because there I have a proper chat topic to which I can get back later, or review it.
tguinot 346 days ago [-]
I did not use copilot or cursor. I used the Claude interface. I'm planning to set up a proper editor tool such as Cursor, as I believe they got much better lately.
Last time I tried was 2023 and it was kind of a pain in the butt.
qwertox 346 days ago [-]
I tried Cursor this month but even though it is much better than copilot, it also tries to do too much. And both of them fail regularly at generating proper autocompletions, which makes Cursor a bigger annoyance because it messes up your code quite often, which copilot doesn't do. Cursor is too aggressive.
But using copilot as a better autocomplete is really helpful and well worth the subscription. Just while typing as well as giving it more precise instructions via comments.
It's like a little helper in the editor, while the ChatGPT/Claude in the browser are more like "thinking machines" which can generate really usable code.
tguinot 346 days ago [-]
good to know, thanks
lupire 346 days ago [-]
That's fine for your quick hack that is probably a reimplementation of an existing program you can't find.
But it's not a production quality implementation of a new need.
pizzafeelsright 345 days ago [-]
I am of the strong opinion most problems were solved 20-40 years ago and that most code written today is reimplementation using different languages.
That I have shipped production code using LLMs, in languages I did not study, approved by seasoned SWEs, is evidence that an acceleration is happening.
tguinot 346 days ago [-]
It's a knowledge base that can explain the knowledge it returns when you ask, how is that not useful in a professional environment for production code?
I mean if you assume all devs are script kiddies who simply copy paste what they find on google (or ChatGPT without asking for explanations) then yeah it's never gonna be useful in a prod setting.
Also you're very wrong to believe every technical need or combination of libraries has already been implemented in open source before.
rty32 345 days ago [-]
True, but hey, even if it's not production code, it may be an ad-hoc thing that never gets pushed to production, or it may be code reviewed by C++ experts and improved to production quality. At the very least, someone saved four days with it, and could use the time for something else, maybe something they are expert at. Isn't that still good?
mdavid626 345 days ago [-]
Most of the time, saving time is just an illusion. When that code needs to be changed, people will spend more than 4 days debugging and understanding it. The mental model of it was written by AI. It can make sense or not at all. You’ll figure it out after 4 days.
tguinot 345 days ago [-]
The code is 2 files of 80 lines each and is very clear.
There's no way any software developer needs 4 days to understand what it does.
Moreover Claude can explain the functions used very clearly (if you're too lazy to jump to definition in your editor)
LLMs are becoming actually useful to developers new to a language. Just as Google was 20 years ago.
mdavid626 344 days ago [-]
People are talking about completely different things. The article was about Google using LLMs to generate code, not people making 80 lines with them at home. There is a huge difference. I don’t see any problem with the latter, but with the former there are many problems.
znpy 346 days ago [-]
That sounds like a great idea, are you going to open source that?
tguinot 346 days ago [-]
I think I will. I don't have time to maintain additional software for other people right now, but I'm definitely planning on open sourcing it when I get time.
znpy 346 days ago [-]
Yeah, I see your point.
However, I think that you might open source the thing with a disclaimer of no maintenance. Whoever is willing to maintain it can just fork it and move along.
bitcharmer 345 days ago [-]
> thanks to all the knowledge they contain
This is what's problematic with modern "AI". Most people inexperienced with it, like the parent commenter, will uncritically assume these LLMs possess "knowledge".
This I find the most dangerous and prevalent assumption. Most people are oblivious to how bad LLMs are.
tguinot 345 days ago [-]
I know exactly how bad the output they give is, because I ask for output that I can understand, debug and improve.
People misusing tools don't make tools useless or bad. Especially since LLM designers never claimed the compressed information inside models is spotless or 100% accurate, or based on logical reasoning.
Any serious engineer with a modicum of knowledge about neural networks knows what can or can't be done with the output.
OnionBlender 346 days ago [-]
Do people find these AI auto complete things helpful? I was trying the Xcode one and it kept suggesting API calls that don't exist. I spent more time fixing its errors than I would have spent typing the correct API call.
_kidlike 346 days ago [-]
I really really dislike the ones that get in your way. Like I start typing something and it injects random stuff (yes in the auto-complete colors). I have a similar feeling to when you hear your voice back in a phone: completely disabling your thought process.
In IntelliJ thankfully you can disable that part of the AI, and keep the part that you trigger it when you want something from it.
frereubu 346 days ago [-]
> I have a similar feeling to when you hear your voice back in a phone: completely disabling your thought process.
This is a fantastic description of how it disturbs my coding practice which I hadn't been able to put into words. It's like someone is constantly interrupting you with small suggestions whether you want them or not.
gtirloni 346 days ago [-]
This is it. I have a picture in my mind and then it puts 10 lines of code in front of me and my brain can't ignore. When I'm done reviewing that, it's already tainted my idea.
mu53 346 days ago [-]
I find the simpler engines work better.
I want the end of the line completed with focus on context from the working code base, and I don't want an entire 5 line function completed with incomplete requirements.
It is really impressive when it implements a 5 line function correctly, but it's like hitting the lottery
ncruces 346 days ago [-]
I particularly like the part where it suggests changes to pasted code.
When I copy and paste code, very often it needs some small changes (like changing all xs to ys and at the same time widths to heights).
It's very good at this, and does the right thing the vast majority of the time.
It's also good with test code. Test code is supposed to be explicit, and not very abstracted (so someone only mildly familiar with a codebase that's looking at a failing test can at least figure the cause). This means it's full of boilerplate, and a smart code generator can help fill that in.
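For comparison, the "change all xs to ys and widths to heights" case can also be done mechanically with a single-pass regex swap. A rough sketch in Python, with a made-up helper name `swap_identifiers`; it only knows about whole-word matches, so it will happily rename inside strings and comments too, which is where a context-aware completion has the edge.

```python
import re


def swap_identifiers(code: str, pairs: dict[str, str]) -> str:
    """Swap each pair of identifiers in both directions in a single pass."""
    mapping = {}
    for a, b in pairs.items():
        mapping[a] = b
        mapping[b] = a
    # Longest names first so a longer identifier is tried before a shorter one.
    names = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    # One pass over the text, so x -> y and y -> x cannot undo each other.
    return pattern.sub(lambda m: mapping[m.group(1)], code)
```

Calling `swap_identifiers("x = left + width", {"x": "y", "width": "height"})` gives `"y = left + height"`, and the word boundaries keep names like `max` untouched.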
andyjohnson0 346 days ago [-]
Visual Studio "intellisense" has always been pretty good for me. Seemed to make good guesses about my intentions without doing anything wild. It seemed to use ad hoc rules and patterns, but it worked and then got out of the way.
Then it got worse a couple of years ago when they tried some early-stage AI approach. I turned it off. I expect that next time I update VS it'll have got substantially worse and it will have removed the option for me to disable it.
nobleach 345 days ago [-]
Agreed, the old Visual Basic, Visual C++, Borland Delphi, Visual C# experiences were how I dove into the deep end of several languages back in the late 90's/early 2000's. Things were VERY discoverable at that point. Obviously a deeper understanding of a language is necessary for doing real work, but noodling around just trying to get a feel for what can be done, is a great way to get started.
mcintyre1994 346 days ago [-]
I like Cursor, it seems very good at keeping its autocomplete within my code base. If I use its chat feature and ask it to generate new code that doesn’t work super well. But it’ll almost always autocomplete the right function name as I’m typing, and then infer the correct parameters to pass in if they’re variables and if the function is in my codebase rather than a library. It’s also unsurprisingly really good at pattern recognition, so if you’re adding to an enum or something it’ll autocomplete that sensibly too.
I think it’d be more useful if it was clipboard aware though. Sometimes I’ll copy a type, then add a param of that type to a function, and it won’t have the clipboard context to suggest the param I’m trying to add.
qeternity 346 days ago [-]
I really like Cursor but the more I use it the more frustrated I get when it ends up in a tight loop of wanting to do something that I do not want to do. There doesn’t seem to be a good way to say “do not do this thing or things like it for the next 5 minutes”.
M4v3R 346 days ago [-]
It probably depends on the tool you use and on the programming language. I use Supermaven autocomplete when writing Typescript and it’s working great, it often feels like it’s reading my mind, suggesting what I would write next myself.
vbezhenar 346 days ago [-]
I mostly use one-line completes and they are pretty good. Also I really like when Copilot generates boilerplate like
if err != nil {
	return fmt.Errorf("Cannot open settings: %w", err)
}
I_AM_A_SMURF 346 days ago [-]
I use the one at G and it's definitely helpful. It's not revolutionary, but it makes writing code less of a headache when I kinda know what that method is called but not quite.
skybrian 346 days ago [-]
I often delete large chunks of it unread if it doesn't do what I expected. It's much like copy and paste; deleting code doesn't take long.
card_zero 346 days ago [-]
So your test is "seems to work"?
skybrian 346 days ago [-]
No, what I meant is that, much like when copying code, I only keep the generated source code if it's written the way I would write it.
(By "unread" I meant that I don't look very closely before deleting if it looks weird.)
And then write tests. Or perhaps I wrote the test first.
card_zero 346 days ago [-]
Oh, if the AI doesn't do what you expected, got it.
binkHN 346 days ago [-]
Right now my opinion is that they're 60% unhelpful, so I largely agree with you. Sometimes I'll find the AI came up with a somewhat better way of doing something, but the vast majority of the time it does something wrong or does something that appears right, but it's actually wrong and I can only spot it with a somewhat decent code review.
guappa 346 days ago [-]
I suspect that if you work on trivial stuff that has been asked on stackoverflow countless times they work very nicely.
OnionBlender 345 days ago [-]
This is what I've been noticing. For C++ and Swift, it makes pretty unhelpful suggestions. For Python, its suggestions are fine.
Swift is especially frustrating because it will hallucinate the method name and/or the argument names (since you often have to specify the argument names when calling a method).
guappa 342 days ago [-]
Ah, I've had it hallucinate non-existing methods in Python rather often.
Or when I say I need to do something, it invents a library that conveniently happens to just do that thing and writes code to import and use it. Except there's no such library of course.
0points 346 days ago [-]
No, not at all.
"classic" intellisense is reliable, so why introduce random source in the process?
4lb0 345 days ago [-]
I use Codeium in NeoVim and yes, I find it very helpful. Of course, it is not 100% error free, but even when it has errors, most of the time it is easier for me to fix them than to write it from scratch.
sharpy 346 days ago [-]
Often yes. There were times when writing unit tests was just me naming the test case, with 99% of the test code auto-generated based on the existing code and the name.
simne 346 days ago [-]
Looks like the model is not trained well. In my experience, after making a few projects (2 looks like enough), even an older Xcode managed to give good suggestions in much more than 50% of cases.
karmasimida 346 days ago [-]
It is useful in our use case.
Realtime tab completion is good at some really mundane things within the current file.
You still need a chat model, like Claude 3.5, to do more exploratory things.
DecoySalamander 345 days ago [-]
I was evaluating it for a month and caught myself regularly switching to an IDE with non-AI intellisense because I wanted code that actually works.
mdavid626 346 days ago [-]
No, not at all. It’s just the hype. It doesn’t replace engineering.
saagarjha 346 days ago [-]
The one Xcode has is particularly bad, unfortunately.
myworkinisgood 346 days ago [-]
Copilot is very good.
cryptica 346 days ago [-]
This is my experience as well. LLMs are great to boost productivity, especially in the hands of senior engineers who have a deep understanding of what they're doing because they know what questions to ask, they know when it's safe to use AI-generated code and they know what issues to look for.
In the hands of a junior, AI can create a false sense of confidence and it acts as a technical debt and security flaw multiplier.
We should bring back the title "Software engineer" instead of "Software developer."
Many people from other engineering professions look down on software engineers as "Not real engineers" but that's because they have the same perspective on coding as typical management types have. They think all code is equal, it's unavoidable spaghetti. They think software design and architecture doesn't matter.
The problems a software engineer faces when building a software system are the same kinds of problems that a mechanical or electrical engineer faces when building any engine or system. It's about weighing up trade-offs and making a large number of nuanced technical decisions to ultimately meet operational requirements in the most efficient, cost-effective way possible.
alxjrvs 346 days ago [-]
In my day to day, this still remains the main way I interact with AI coding tools.
I regularly describe it as "The best snippet tool I've ever used (because it plays horseshoes)".
tomcam 346 days ago [-]
Horseshoes? As in “close enough”?
ttul 346 days ago [-]
Or, as in, “Ouch, man! You hit my foot!”
goykasi 346 days ago [-]
As long as hand grenades aren't introduced, I could live with that.
DanHulton 346 days ago [-]
Honestly, I don't think "close only counts in horseshoes, hand grenades, and production code" will ever catch on...
alxjrvs 345 days ago [-]
This is why I frame it as a "snippets" plugin, rather than a Code generation tool.
I would be very confused if someone told me that they uncritically used the generated code from a snippet program with no manual input or understanding, and I feel the same with Copilot. At best, it suggests an auto-complete that I read and interpret before accepting.
The closest I come to "code generation" is during test writing, where occasionally I will let the description generate some setup, but only in tests where there are a broad number of examples to follow, and I am still going to end up re-writing a decent chunk of it based on personal example. I would not "let it write the test suite for me" and then trust the green, and I suspect that would easily fail code review (though it would be an interesting experiment...).
Obviously your comment was a good goof and well made, but it does speak to a little bit of the disconnect between what is being touted as an "AI coding tool" and how I, a person who makes react native apps to pay my rent, actually use the dang thing (i.e., "A pretty good snippets plugin"). Is my code 'AI generated'? I wouldn't call it that, but who can say definitively? We're in a fun new semantic world now.
davedx 346 days ago [-]
I'm working on a CRM with a flexible data model, and ChatGPT has written most of the code. I don't use the IDE integrations because I find them too "low level" - I work with GPT more in a sort of "pair programming" session: I give it high level, focused tasks with bits of low level detail if necessary; I paste code back and forth; and I let it develop new features or do refactorings.
This workflow is not perfect but I am definitely building out all the core features way faster than if I wrote the code myself, and the code is in quite a good state. Quite often I do some bits of cleanup, refactorings, making sure typings are complete myself, then update ChatGPT with what the code now looks like.
I think what people miss is there are dozens of different ways to apply AI to your day-to-day as a software engineer. It also helps with thinking things through, architecture, describing best practices.
littlestymaar 346 days ago [-]
I share your sentiment. I've written three apps where I've used language models extensively (a different one for each: ChatGPT, Mixtral and Llama-70B) and while I agree that they were immensely helpful in terms of velocity, there are a bunch of caveats:
- it only works well when you write code from scratch, context length is too short to be really helpful for working on existing codebase.
- the output code is pretty much always broken in some way, and you need to be accustomed to doing code reviews to use them effectively. If you trust the output and had to debug it later it would be a painfully slow process.
Also, I didn't really notice a significant difference in code quality, even the best model (GPT-4) writes code that doesn't work, and I find it much more efficient to use open models on Groq due to the really fast inference. Looking at ChatGPT slowly typing is really annoying (I didn't test o1 and I have no interest in doing so because of its very low throughput).
davedx 346 days ago [-]
> context length is too short to be really helpful for working on existing codebase.
This is kind of true, my approach is I spend a fairly large amount of time copy-pasting code from relevant modules back and forth into ChatGPT so it has enough context to make the correct changes. Most changes I need to make don't need more than 2-3 modules though.
> the output code is pretty much always broken in some way, and you need to be accustomed to doing code reviews to use them effectively.
I think this really depends on what you're building. Making a CRM is a very well trodden path so I think that helps? But even when it came to asking ChatGPT to design and implement a flexible data model it did a very good job. Most of the code it's written has worked well. I'd say maybe 60-70% of the code it writes I don't have to touch at all.
The slow typing is definitely a hindrance! Sometimes when it's a big change I lose focus and alt-tab away, like I used to do when building large C++ codebases or waiting for big test suites to run. So that aspect saps productivity. Conversely though I don't want to use a faster model that might give me inferior results.
littlestymaar 346 days ago [-]
> approach is I spend a fairly large amount of time copy-pasting code from relevant modules back and forth into ChatGPT
It can work, but what a terrible developer experience.
> I'd say maybe 60-70% of the code it writes I don't have to touch at all
I used it to write web apps, so the ratio was even higher I'd say (maybe 80-90% of the code didn't need any modification), but the app itself wouldn't work at all if I didn't make those 10% changes. And you really need to read 100% of the code because you won't know upfront where those 10% will be.
> The slow typing is definitely a hindrance! Sometimes when it's a big change I lose focus and alt-tab away, like I used to do when building large C++ codebases or waiting for big test suites to run.
Yeah exactly, it's xkcd 303 but with “AI processing the response” instead of “compiling”. Having an instant response was a game changer for me in terms of focus, hence productivity.
> I don't want to use a faster model that might give me inferior results
As I said earlier, I didn't really feel the difference in quality so the switch was without drawbacks.
chrisjj 346 days ago [-]
> I'd say maybe 60-70% of the code it writes I don't have to touch at all.
...yet. Bugs can take time to surface.
michaelteter 346 days ago [-]
And this is equally true whether the code was entirely written by a human or not.
chrisjj 341 days ago [-]
... except "not" delivers this "the output code is pretty much always broken in some way".
creesch 346 days ago [-]
> Also, I didn't really notice a significant difference in code quality, even the best model (GPT-4) writes code that doesn't work,
Interesting, personally I have noticed a difference. Mostly in how well the models pick up small details and context. Although I do have to agree that the open Llama models are generally fairly serviceable.
Recently I have tended to lean towards Claude Sonnet 3.5 as it seems slightly better. Although that does differ per language as well.
As far as them being slow, I haven't really noticed a difference. I use them mostly through the API with open webui and the answers come quick enough.
mind-blight 345 days ago [-]
I use o1 for research rather than coding. If I have a complex question that requires combining multiple ideas or references and checking the result, it's usually pretty good at that.
Sometimes that results in code, but it's the research and cross referencing that's actually useful with it
_heimdall 346 days ago [-]
It's interesting to see these LLM tools turning developers into no-code customers. Where tools like visual site builders allowed those without coding experience to code a webpage, LLMs are letting those with coding experience avoid the step of coding.
There's not even anything wrong with that, don't take my comment the wrong way. It is an interesting question of what happens at scale though. We could easily find ourselves in a spot where very few people know how to code and most producing code don't actually know how it works and couldn't find or fix a bug if they needed to. It also means LLMs would be stuck with today's code for a training set until it can invent its own coding paradigms and languages, at which point we're all left in the dust trusting it to work right.
sampo 346 days ago [-]
> I paste code back and forth
There is this tool, Aider. It takes your prompt, adds code files (sometimes not all of your code files, but the files it figures are relevant), prepares one long prompt, sends it to an LLM, receives the response, and makes a git commit based on the response. If you'd rather review git commits, it can save you the back-and-forth copy-pasting. https://aider.chat/
maleldil 345 days ago [-]
Note that the default mode will automatically change and commit the code, which I found counter-intuitive. I prefer using the architect mode, where it first tells you what it is going to do, so you can iterate on it before making changes.
simplyluke 346 days ago [-]
This is exactly how I’ve used copilot for over a year now. It’s really helpful! Especially with repetitive code. Certainly worth what my employer pays for it.
The general public has a very different idea of that though and I frequently meet people very surprised the entire profession hasn’t been automated yet based on headlines like this.
arisAlexis 346 days ago [-]
Because you are using it like that doesn't mean that it can't be used for the whole stack and on its own. The public, including laymen such as the Nvidia CEO and Sam, think that yes, we (I'm a dev) will be replaced. Plan accordingly, my friend.
robertlagrant 346 days ago [-]
> Because you are using it like that doesn't mean that it can't be used for the whole stack
Well no, but we have no evidence it can be used for the whole stack, whatever that means.
arisAlexis 346 days ago [-]
Even last year's GPT-4 could make a whole iPhone app from scratch for someone who doesn't know how to code. You can find videos online. I think you are applying the ostrich method which is understandable. We need to adapt.
papichulo2023 346 days ago [-]
Complexity increases over time. I can create new features in minutes for my new self-hosted projects; equivalent work on my enterprise work takes days...
arisAlexis 346 days ago [-]
The new Gemini has a context window of millions of tokens. Think big and project 1-2 years ahead.
robertlagrant 346 days ago [-]
> I think you are applying the ostrich method which is understandable
Asking for evidence is not being an ostrich.
arisAlexis 345 days ago [-]
The ostrich method is avoiding existing evidence, available online and searchable, for full-stack LLM programming.
robertlagrant 345 days ago [-]
Making a simple app isn't evidence that it will replace people, any more than a 90%-good self-driving car is evidence that we'll get a 100%-good self-driving car.
ktnaWA 346 days ago [-]
Which industry would you pivot to? The only industry that is desperate for workers right now is the defense industry. But manufacturing shells for Ukraine and Israel does not seem appealing.
simplyluke 345 days ago [-]
I was a hacker before the entire stack I work in was common or released, and I’ll be one when all our tools change again in the future. I have family who programmed with punch cards.
But I doubt the predictions from men whose net worth depends on the hype they foment.
arisAlexis 345 days ago [-]
It's not tools. It's intelligent agents capable of human output.
arisAlexis 346 days ago [-]
The "laymen" was ironic, of course.
red_admiral 346 days ago [-]
A few years ago we called that IntelliSense, right?
I remember many years ago as a Java developer, Netbeans could do such things as complete `psvm` to "public static void main() {...}", or if you had a field "private String name;" you could press some key combination and it would generate you the getter and setter, complete with javadoc which was mandatory at that place because apparently you need "Returns the name.\n @return The name." on a method called getName() in case you wondered what it was for.
rty32 345 days ago [-]
I think most people define "Intellisense" as "IDE suggestions based on static analysis results". Sometimes it blends in a bit of heuristics/usage statistics as an added feature, depending on the tool. The suggestions are mostly deterministic, based on the actual AST of your code, and never hallucinate. They may not be helpful but can never be wrong.
On the other hand, LLMs are completely different -- based on machine learning, where everything is random and about statistics. The output depends on training data and context. It is more useful but makes a ton of mistakes.
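To make the deterministic half of that contrast concrete, here is a minimal Python sketch of prefix completion driven by an AST. It is only an illustration, not how any particular IDE works; `complete` and the sample source are made-up names.

```python
import ast
from typing import List


def complete(source: str, prefix: str) -> List[str]:
    """Return function names defined in `source` that start with `prefix`.

    Hypothetical helper: real IDE completion also uses types and scopes.
    """
    tree = ast.parse(source)
    names = {
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }
    # Sorting makes the result deterministic: same code in, same list out.
    return sorted(name for name in names if name.startswith(prefix))


SAMPLE = """
def get_action_handler(): ...
def get_account(): ...
def set_name(value): ...
"""

print(complete(SAMPLE, "get_ac"))  # ['get_account', 'get_action_handler']
```

Every suggestion comes from a name that actually exists in the parsed code, which is why this kind of completion can be unhelpful but not invented.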
_heimdall 346 days ago [-]
Yes, Copilot and other LLM coding tools are just a (much) better version of IntelliSense.
snowe2010 346 days ago [-]
Much worse imo.
_heimdall 346 days ago [-]
That could be too. I don't use LLMs so I'm just giving it the benefit of the doubt based on other commenters here.
skydhash 346 days ago [-]
Most jetbrains IDEs come with those snippets and if you’re using IDEA, the code will be 50%+ generated by the IDE.
peepee1982 345 days ago [-]
That's what I thought. In recent weeks, most of the code I’ve written has been AI-generated. But it was mostly JSDoc comments, type checking (I'm writing JavaScript), abstracting code if I see that I'm repeating myself a little too often, etc.
All things that I would consider tedious housekeeping, but nothing that needs serious reasoning.
It's basically a glorified LSP.
inanepenguin 345 days ago [-]
I know you're not saying anything revolutionary but this is the best succinct yet fair description of these tools that I've seen. They're not worthless but they're not job destroying.
peepee1982 345 days ago [-]
You're right, it's not revolutionary at all. But I'm glad you liked my summary!
hgomersall 346 days ago [-]
Before I go and rip out and replace my development workflow, is it notably better than auto complete suggestions from CoC in neovim (with say, rust-analyzer)? I'm generally pretty impressed how quickly it gives me the right function call or whatever, or it's one of the top few.
Leherenn 346 days ago [-]
It's more than choosing the right function call, it goes further than that. If your code has patterns, it recognises and suggests them.
For instance, one I find very useful is that we have this pattern of checking the result of a function call, logging the error and returning, or whatever. So now, every time you have `result = foo()`, it will auto suggest `if (!result) log_error...` with a generally very good error message.
Very basic, but damn convenient. The more patterns you use, the more helpful it becomes.
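For readers who haven't seen this kind of suggestion, the check-log-return pattern described above looks roughly like this in Python. Everything here is hypothetical (`foo`, `load`, the log message); it only shows the shape of the boilerplate an assistant tends to fill in.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def foo(path: str) -> Optional[str]:
    """Hypothetical call that returns None on failure."""
    return f"contents of {path}" if path else None


def load(path: str) -> bool:
    result = foo(path)
    # The repetitive part a completion engine typically suggests
    # right after a line like `result = foo(...)`:
    if result is None:
        logger.error("foo() failed for path %r", path)
        return False
    logger.info("loaded %d characters", len(result))
    return True
```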
ghostpepper 346 days ago [-]
Does it make you 25% more productive?
vundercind 346 days ago [-]
Between the fraction of my time I spend actually writing code, and how much of the typing time I’m using to think anyway, I dunno how much of an increase in my overall productivity could realistically be achieved by something that just helped me type the code in faster. Probably not 25% no matter how fast it made that part. 5% is maybe possible, for something that made that part like 2-3x faster, but much more than that and it’d run up against a wall and stop speeding things up.
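That ceiling is essentially Amdahl's law. A rough Python sketch, assuming about 10% of working time goes to typing code (an assumed figure for illustration, not a measurement; `overall_speedup` is a made-up name):

```python
def overall_speedup(typing_fraction: float, typing_speedup: float) -> float:
    """Amdahl's law: only the typing share of the work gets faster."""
    if not 0.0 <= typing_fraction <= 1.0:
        raise ValueError("typing_fraction must be between 0 and 1")
    if typing_speedup <= 0:
        raise ValueError("typing_speedup must be positive")
    return 1.0 / ((1.0 - typing_fraction) + typing_fraction / typing_speedup)


# Assumed share of working time spent typing code (not a measured figure).
TYPING_FRACTION = 0.10

for factor in (2, 3, 1_000_000):
    gain = (overall_speedup(TYPING_FRACTION, factor) - 1.0) * 100
    print(f"typing {factor}x faster -> about {gain:.1f}% more output")
```

Under that assumption, typing twice as fast gives roughly a 5% gain, and even infinitely fast typing tops out at about 11%, since the other 90% of the time is untouched.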
imchillyb 346 days ago [-]
I imagine that those who cherished the written word thought similar thoughts when the printing press was invented, when the typewriter was invented, and before Excel took over bookkeeping.
My productivity isn't so much enhanced. It's only 1%... 2%... 5%... globally, for each employee.
Have you ever dabbled with, mucked around in, a command line? Autocomplete functions there save millions of man-hour-typing-units per year. Something to think about.
A single employee, in a single task, for a single location may not equal much gained productivity, but companies now think on much larger scales than a single office location.
moron4hire 346 days ago [-]
This is a fallacy because there is no way to add up 1% savings across 100 employees into an extra full time employee.
Work gets scheduled on short time frames. 5% savings isn't enough to change the schedule for any one person. At most, it gives me time to grab an extra coffee. I can't string together "foregone extra coffees" into "more tasks/days in the schedule".
robertlagrant 346 days ago [-]
This. I had the same conversation years ago with someone who said "imagine if Windows booted 30s faster, all the productivity gains across the world!" And I said the same thing you did: people turn their computer on and then make a cup of tea.
Now making a kettle faster? That might actually be something.
rustcleaner 346 days ago [-]
If 25% of code was AI-written, wouldn't it be a 33[.333...]% increase in productivity?
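The arithmetic behind that figure, as a small Python sketch. It assumes, as the question implicitly does, that human output stays constant and the AI-written share is pure extra; `naive_gain` is a made-up name.

```python
from fractions import Fraction


def naive_gain(ai_share: Fraction) -> Fraction:
    """Relative increase in total output if humans write (1 - ai_share) of it."""
    if not 0 <= ai_share < 1:
        raise ValueError("ai_share must be in [0, 1)")
    return 1 / (1 - ai_share) - 1


gain = naive_gain(Fraction(1, 4))
print(gain, float(gain))  # 1/3 0.3333333333333333
```

This only counts lines; it says nothing about how much of the total working time those lines represent.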
PeterStuer 346 days ago [-]
It is not a direct correlation. I might write 80% of the lines of code in a week, then spend the next 6 months on the remaining 20%. If the AI was mostly helpful in that first week, overall productivity gain would be very low.
vundercind 345 days ago [-]
Who spends 100% of their time actually typing code?
It’s probably closer to 10% than 100%, especially at big companies.
One thing I would love to see is reports of benefits from various tools coming with one’s typing ability in WPM. I’d also like to see that on posts where people express a preference for “a quick call” or stopping by your desk rather than posting what they want in chat. I have some hypotheses I’d like to test out.
card_zero 346 days ago [-]
Not if there was also an 8.333̅% increase in slacking off.
Wait, no. That should be based on how much slacking off Google employees do ordinarily, an unknown quantity.
saagarjha 346 days ago [-]
You can just check Memegen traffic to figure that one out.
nycdatasci 346 days ago [-]
This is a great anecdote. SOTA models will not provide “engineering” per se, but they will easily double productivity of a product manager that is exploring new product ideas or technologies. They are much more than intelligent auto-complete. I have done more with side projects in the last year than I did in the preceding decade.
llm_trw 346 days ago [-]
One of my friends put it best: I just did a month's worth of experimentation in two hours.
Sateeshm 346 days ago [-]
I find this hard to believe. Can someone give me an example of something that takes months that AI can correctly do in hours?
jvanveen 346 days ago [-]
Not hours, but days instead of months: porting around 30k lines of a legacy LiveScript project to TypeScript. Most of the work is in tweaking a prompt for Claude (using Aider) so the porting process is done correctly.
cdchn 346 days ago [-]
Thankfully it seems like AI is best at automating the most tedious and arguably most useless endeavor in software engineering: rewriting perfectly good code in whatever the language du jour is.
disgruntledphd2 346 days ago [-]
Again, what AI is good at shows the revealed preferences of the training data, so it does make sense that it would excel at pointless rewrites.
protomolecule 345 days ago [-]
Legacy code in a dynamically typed language is never good.
llm_trw 345 days ago [-]
Use Undermind to gather a literature review of a field adjacent to the one you’re working in but with a wealth of information that you don’t yet know.
Use OpenAI to convert a few thousand lines of code from a language you're familiar with to one you’re not, as all the state-of-the-art tools in the field above use that language. Debug all the issues that arise from the impedance mismatch between the languages. Recreate the results from the seminal paper in the field to verify that the code works, and run it on your own problem. Write a stream-of-consciousness post without spell-checking, then throw it into GPT and ask it to fix it.
hnisoss 345 days ago [-]
sounds to me like you're tooting your own horn.
karmasimida 346 days ago [-]
I can totally see it.
It is actually a testament that parts of Google's code are ... kinda formulaic to some degree. Prior to the LLM takeover, we already heard praise for how Google's code search works wonders in helping its engineers write code. LLMs just brought that experience to the next level.
jb1991 345 days ago [-]
Long before this current AI hype cycle, we’ve had excellent code completion in editors for decades. So I guess by that definition, we’ve all been writing AI assisted code for a very long time.
fhd2 345 days ago [-]
I'd say so, and it's a bit misleading to leave that out. Code generation is almost as old as computing. So far, most of it happened to be deterministic.
player1234 345 days ago [-]
Yeah, but it didn't cost trillions and need its own nuclear power plant. No one disputes that LLMs/AI are cool and can be helpful, but at what cost? Where is the ROI?
afro88 346 days ago [-]
So more or less on par with continue.dev using a local starcoder2:3b model
hackerknew 346 days ago [-]
I wondered if this is the real context, i.e. they are just referring to code completion as AI-generated code. But the article seems like it is referring to more than that?
jszymborski 345 days ago [-]
Sounds like the JetBrains new local AI autocomplete. If it's anything like that, it's honestly my ideal application of generative deep learning.
awkward 345 days ago [-]
Stuff that works well with AI seems to correlate pretty well with high churn changes. I've had good luck using AI to port large numbers of features from version A to version B, or getting code with a lot of dependencies under mocked unit tests.
It's easy to see that adding up quickly to represent large percentages of the codebase by line, but it's not feature development or solving hard problems.
blindhippo 345 days ago [-]
Same things I use it for as well - crap like "update this class to use JDK21" or "re-implement this client to use AWS SDKv2" or whatever.
And it works maybe... 80% of the way and I spend all my time fixing the remaining 20%. Anecdotally I don't "feel" like this really accelerates me or reduces the time it would take me to do the change if I just implemented the translation manually.
awkward 345 days ago [-]
Amazon is publicly claiming that they have saved hundreds of millions on JVM upgrades using AI, so while it feels trivial - because before, that work would end up in the "just don't do it" pile - it's a relevant use case.
theodric 345 days ago [-]
I wonder how this works with IP rights in the USA. Like, is `function getAc` eligible for copyright protection, but `tionHandler()` isn't? After all, [1]
Thank you for this comment. So the code written in this manner isn't really "created by AI"; AI is just a nice additional feature of an editor.
I wonder if the enormous hype around AI is a good or bad thing; it's obviously both but will the good win out the bad, or will the disappointment eventually be so overwhelming as to extinguish any enthusiasm.
segasaturn 345 days ago [-]
How do you square this comment with the one right below it[1], which explicitly confirms the statement that Google is using GenAI via Gemini to write code? Lots of mixed signals coming from the Googlers here.
This is pretty much what I've found with Copilot as well. It's like a slightly smarter autocomplete in most cases. Copilot tends toward being a little eager sometimes, but it's easy enough to just ignore the suggestions when it starts going down a weird path.
ImaCake 346 days ago [-]
This autocomplete seems about on par with github copilot. Do you also get options for prompting it on specific chunks of code and performing specific actions such as writing docs or editing existing code? All things that come standard with gh copilot now.
grecy 345 days ago [-]
I'm confused, I've been doing similar tab completion for function names in eclipse since about 2003...
aforty 345 days ago [-]
We have this at our company too. I guess it’s useful but doesn’t really save a whole lot of time.
markstos 345 days ago [-]
Which editor is Google's AI code completion integrated with? VS Code?
hoveringhen 345 days ago [-]
Yeah
insane_dreamer 345 days ago [-]
also useful for writing unit tests, comments, descriptions, so if you count all of that as code, together with boilerplate stuff, then yeah, it could add up to 25%.
znpy 346 days ago [-]
> If I'm writing "function getAc..." it's smart enough to complete to "function getActionHandler()", and maybe suggest the correct arguments and a decent jsdoc comment.
I really mean no offense, but your example doesn't sound much different from what old IDEs (say, Netbeans) used to do 15 years ago.
I could design a Swing UI and it would generate the code, and if I wanted to override a method it would generate decent boilerplate (a getter, like in your example) along with the usual comments and a definitely correct parameter list (with correct types).
Is this "AI Code" thing something that appears new because at some point we abandoned IDEs with very strong intellisense (etc) ?
"Our overhyped Autocomplete Implementation (A.I.) is completing 25% of our lines of code so well that we need to fund nuclear reactors to power the server farms."
josh_carterPDX 345 days ago [-]
My first reaction to the title was, "That explains why things are broken." but this explanation makes so much sense. Thanks for clarifying.
But yeah, I wish the new version of Chrome worked better. ¯\_(ツ)_/¯
napierzaza 346 days ago [-]
[dead]
Galatians4_16 346 days ago [-]
Kerry said hi
ntulpule 347 days ago [-]
Hi, I lead the teams responsible for our internal developer tools, including AI features. We work very closely with Google DeepMind to adapt Gemini models for Google-scale coding and other Software Engineering use cases. Google has a unique, massive monorepo which poses a lot of fun challenges when it comes to deploying AI capabilities at scale.
1. We take a lot of care to make sure the AI recommendations are safe and have a high quality bar (regular monitoring, code provenance tracking, adversarial testing, and more).
2. We also do regular A/B tests and randomized control trials to ensure these features are improving SWE productivity and throughput.
3. We see similar efficiencies across all programming languages and frameworks used internally at Google, and engineers across all tenure and experience cohorts show similar gains in productivity.
I'm continually surprised by the amount of negativity that accompanies these sort of statements. The direction of travel is very clear - LLM based systems will be writing more and more code at all companies.
I don't think this is a bad thing - if this can be accompanied by an increase in software quality, which is possible. Right now it's very hit and miss and everyone has examples of LLMs producing buggy or ridiculous code. But once the tooling improves to:
1. align produced code better to existing patterns and architecture
2. fix the feedback loop - with TDD, other LLM agents reviewing code, feeding in compile errors, letting other LLM agents interact with the produced code, etc.
Then we will definitely start seeing more and more code produced by LLMs. Don't look at the state of the art now, look at the direction of travel.
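The "feeding in compile errors" part of item 2 can be sketched roughly like this in Python. This is only an illustration under stated assumptions: `fake_model` is a stand-in for a real LLM call, `generate_until_it_compiles` is a made-up name, and the check is a bare syntax check rather than a test run.

```python
from typing import Callable, Optional


def generate_until_it_compiles(
    propose: Callable[[Optional[str]], str], max_rounds: int = 3
) -> Optional[str]:
    """Ask `propose` for code, feeding back the last syntax error each round."""
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        candidate = propose(feedback)
        try:
            compile(candidate, "<candidate>", "exec")  # syntax check only
        except SyntaxError as err:
            feedback = f"line {err.lineno}: {err.msg}"
            continue
        return candidate
    return None  # gave up after max_rounds attempts


def fake_model(feedback: Optional[str]) -> str:
    """Stand-in for an LLM call: broken first, fixed once it sees feedback."""
    if feedback:
        return "def add(a, b):\n    return a + b\n"
    return "def add(a, b) return a + b"


print(generate_until_it_compiles(fake_model))
```

Passing a syntax check says nothing about whether the code is right, so a real loop would presumably run tests at the same point.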
latexr 347 days ago [-]
> if this can be accompanied by an increase in software quality
That’s a huge “if”, and by your own admission not what’s happening now.
> other LLM agents reviewing code, feeding in compile errors, letting other LLM agents interact with the produced code, etc.
What a stupid future. Machines which make errors being “corrected” by machines which make errors in a death spiral. An unbelievable waste of figurative and literal energy.
> Then we will definitely start seeing more and more code produced by LLMs.
We’re already there. And there’s a lot of bad code being pumped out. Which will in turn be fed back to the LLMs.
> Don't look at the state of the art now, look at the direction of travel.
That’s what leads to the eternal “in five years” which eventually sinks everyone’s trust.
danielmarkbruce 346 days ago [-]
> What a stupid future. Machines which make errors being “corrected” by machines which make errors in a death spiral. An unbelievable waste of figurative and literal energy.
Humans are machines which make errors. Somehow, we got to the moon. The suggestion that errors just mindlessly compound and that there is no way around it, is what's stupid.
latexr 346 days ago [-]
> Humans are machines
Even if we accept the premise (seeing humans as machines is literally dehumanising and a favourite argument of those who exploit them), not all machines are created equal. Would you use a bicycle to file your taxes?
> Somehow, we got to the moon
Quite hand-wavy. We didn’t get to the Moon by reading a bunch of text from the era then probabilistically joining word fragments, passing that around the same funnel a bunch of times, then blindly doing what came out, that’s for sure.
> The suggestion that errors just mindlessly compound and that there is no way around it
Is one that you made up, as that was not my argument.
danielmarkbruce 346 days ago [-]
LLMs are a lot better at a lot of things than a lot of humans.
We got to the moon using a large number of systems to a) avoid errors where possible and b) build in redundancies. Even an LLM knows this and knew what the statement meant:
> LLMs are a lot better at a lot of things than a lot of humans.
Sure, I'm a really poor painter, Midjourney is better than me. Are they better than a human trained for that task, on that task? That's the real question.
And I reckon the answer is currently no.
danielmarkbruce 346 days ago [-]
The real question is can they do a good enough job quickly and cheaply to be valuable. ie, quick and cheap at some level of quality is often "better". Many people are using them in the real world because they can do in 1 minute what might take them hours. I personally save a couple hours a day using ChatGPT.
latexr 346 days ago [-]
Ah, well then, if the LLM said so then it’s surely right. Because as we all know, LLMs are never ever wrong and they can read minds over the internet. If it says something about a human, then surely you can trust it.
You’ve just proven my point. My issue with LLMs is precisely people turning off their brains and blindly taking them at face value, even arduously defending the answers in the face of contrary evidence.
If you’re basing your arguments on those answers then we don’t need to have this conversation. I have access to LLMs like everyone else, I don’t need to come to HN to speak with a robot.
danielmarkbruce 345 days ago [-]
You didn't read the responses from an LLM. You've turned your brain off. You probably think self-driving cars are also a nonsense idea. Can't work. Too complex. Humans are geniuses without equal. AI is all snake oil. None of it works.
latexr 345 days ago [-]
You missed the mark entirely. But it does reveal how you latch on to an idea about someone and don’t let it go, completely letting it cloud your judgement and arguments. You are not engaging with the conversation at hand, you’re attacking a straw man you have constructed in your head.
Of course self-driving cars aren’t a nonsense idea. The execution and continued missed promises suck, but that doesn’t affect the idea. Claiming “humans are geniuses without equal” would be pretty dumb too, and is again something you’re making up. And something doesn’t have to be “all snake oil” to deserve specific criticism.
The world has nuance, learn to see it. It’s not all black and white and I’m not your enemy.
danielmarkbruce 345 days ago [-]
Nope, hit the mark.
Actually understand LLMs in detail and you'll see it isn't some huge waste of time and energy to have LLMs correct outputs from LLMs.
Or, don't, and continue making silly, snarky comments about how stupid some sensible thing is, in a field you don't understand.
malcolmgreaves 345 days ago [-]
> These LLMs seem so smart.
Yes, they do *seem* smart. My experience with a wide variety of LLM-based tools is that they are the industrialization of the Dunning-Kruger effect.
danielmarkbruce 345 days ago [-]
It's more likely the opposite. Humans rationalize their errors out the wazoo. LLMs are showing us we really aren't very smart at all.
Johanx64 346 days ago [-]
Humans are obviously machines. If not, what are humans then? Fairies?
Now once you've recognized that, you're better equipped for the task at hand - which is augmenting and ultimately automating away every task that humans-as-machines perform by building an equivalent or better machine that performs said tasks at a fraction of the cost!
People who want to exploit humans are the ones that oppose automation.
There's still a long way to go, but now we've finally reached a point where some tasks that were very elusive to automation are starting to show great promise of being automated, or at least being greatly augmented.
beepbooptheory 345 days ago [-]
Profoundly spiritual take. Why is that the task at hand?
The conceit that humans are machines carries with it such powerful ideology: humans are for something, we are some kind of utility, not just things in themselves, like birds and rocks. How is it anything other than an affirmation of metaphysical/theological purpose to particularly humans? Why is it like that? This must be coming from a religious context, right?
I cannot at least see how you could believe this while sustaining a rational, scientific mind about nature, cosmology, etc. Which is fine! We can all believe things, just know you can't have your cake and eat it too. Namely, if anybody should believe in fairies around here, it should probably be you!
danielmarkbruce 345 days ago [-]
> Why is that the task at hand?
Because it's boring stuff, and most of us would prefer to be playing golf/tennis/hanging out with friends/painting/etc. If you look at the history of humanity, we've been automating the boring stuff since the start. We don't automate the stuff we like.
Johanx64 345 days ago [-]
Where's the spiritual part?
Recognizing that humans, just like birds are self-replicating biological machines is the most level-headed way of looking at it.
It is consistent with observations and there are no (apparent) contradictions.
The spiritual beliefs are the ones with the fairies, binding of the soul, made of special substrate, beyond reason and understanding.
If you have a desire to improve the human condition (not everyone does) then the task at hand naturally arises - eliminate forced labour, aging, disease, suffering, death, etc.
This all naturally leads to automation and transhumanism.
lelanthran 346 days ago [-]
> Humans are obviously machines. If not, what are humans then? Fairies?
If humans are machines, then so are fairies.
kelnos 346 days ago [-]
The difference is that when we humans learn from our errors, we learn how to make them less often.
LLMs get their errors fed back into them and become more confident that their wrong code is right.
I'm not saying that's completely unsolvable, but that does seem to be how it works today.
danielmarkbruce 346 days ago [-]
That isn't the way they work today. LLMs can easily find errors in outputs they themselves just produced.
Start adding different prompts, different models and you get all kinds of ways to catch errors. Just like humans.
Lio 346 days ago [-]
I don’t think LLMs can easily find errors in their output.
There was a recent meme about asking LLMs to draw a wineglass full to the brim with wine.
Most really struggle with that instruction. No matter how much you ask them to correct themselves they can’t.
I’m sure they’ll get better with more input but what it reveals is that right now they definitely do not understand their own output.
I’ve seen no evidence that they are better with code than they are with images.
For instance, if the time to complete only scales with the length of the token and not the complexity of its contents, then it’s probably safe to assume it’s not being comprehended.
philipwhiuk 346 days ago [-]
> LLMs can easily find errors in outputs they themselves just produced.
No. LLMs can be told that there was an error and produce an alternative answer.
In fact LLMs can be told there was an error when there wasn't one and produce an alternative answer.
In my experience, if you confuse an LLM by deviating from the "expected", then all the shims of logic seem to disappear, and it goes into hallucination mode.
danielmarkbruce 345 days ago [-]
Try asking this question to a bunch of adults.
mavidser 340 days ago [-]
Tbf that was exactly my point. An adult might use 'inference' and 'reasoning' to ask clarification, or go with an internal logic of their choosing.
ChatGPT here went with a lexicographical order in Python for some reason, and then proceeded to make false statements from false observations, while also defying its own internal logic.
"six" > "ten" is true because "six" comes after "ten" alphabetically.
No.
"ten" > "seven" is false because "ten" comes before "seven" alphabetically.
No.
From what I understand of LLMs (which - I admit - is not very much), logical reasoning isn't a property of LLMs, unlike information retrieval. I'm sure this problem can be solved at some point, but a good solution would need development of many more kinds of inference and logic engines than there are today.
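For reference, the two quoted claims can be checked directly, assuming the comparison really was plain Python string comparison (the original prompt is not shown here):

```python
# Python compares strings lexicographically, code point by code point.
words = ["six", "ten", "seven"]

print("six" > "ten")    # False: 's' sorts before 't'
print("ten" > "seven")  # True: 't' sorts after 's'
print(sorted(words))    # ['seven', 'six', 'ten']
```

So both quoted statements get the result and the alphabetical justification backwards.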
cdchn 346 days ago [-]
Do you believe that the LLM understands what it is saying and is applying the logic that you interpret from its response, or do you think it's simply repeating similar patterns of words it's seen associated with the question you presented it?
danielmarkbruce 345 days ago [-]
If you take the time to build an (S?)LM yourself, you'll realize it's neither of these. "Understands" is an ill-defined term, as is "applying logic".
But an LLM is not "simply" doing anything. It's extremely complex and sophisticated. Once you go from tokens into high-dimensional embeddings... it seems these models (with enough training) figure out how all the concepts go together. I'd suggest reading the word2vec paper first, then thinking about how attention works. You'll come to the conclusion these things are likely to be able to beat humans at almost everything.
lomase 346 days ago [-]
You said humans are machines that make errors and that LLMs can easily find errors in output they themselves produce.
Are you sure you wanted to say that? Or is it the other way around?
danielmarkbruce 345 days ago [-]
Yes. Just like humans. It's called "checking your work" and we teach it to children. It's effective.
0points 346 days ago [-]
> LLMs can easily find errors in outputs they themselves just produced.
Really? That must be a very recent development, because so far this has been a reason for not using them at scale. And no one is.
Do you have a source?
danielmarkbruce 345 days ago [-]
Lots of companies are using them at scale.
reverius42 346 days ago [-]
To err is human. To err at scale is AI.
cetu86 346 days ago [-]
I fear that we'll see a lot of humans err at scale next Tuesday.
Global warming is another example of human error at scale.
fuzztester 345 days ago [-]
>next Tuesday.
USA (s)election, I guess.
danielmarkbruce 346 days ago [-]
To err at scale isn't unique to AI. We don't say "no software, it can err at scale".
munk-a 345 days ago [-]
CEOs embracing the marginal gains of LLMs by dumping billions into it are certainly great examples of humans erring at scale.
fuzztester 345 days ago [-]
yep, nano mega.
trod123 346 days ago [-]
It is by will alone that I set my mind in motion.
It is by the juice of Sapho that thoughts acquire speed, the lips become stained, the stains become a warning...
fuzztester 346 days ago [-]
err, "hallucinate" is the euphemism you're looking for. ;)
arkh 346 days ago [-]
I don't like the use of "hallucinate". It implies that LLMs have some kind of model of reality and sometimes get confused. They don't have any kind of model of anything, they cannot "hallucinate", they can only output wrong results.
fuzztester 345 days ago [-]
>They don't have any kind of model of anything, they cannot "hallucinate", they can only output wrong results.
it's even more fundamental than that.
even if they had any model, they would not be able to think.
thinking requires consciousness. only humans and some animals have it. maybe plants too.
machines? no way, jose.
fuzztester 345 days ago [-]
yeah, i get you. it was a joke, though.
that "hallucinate" term is a marketing gimmick to make it seem to the gullible that this "AI" (i.e. LLMs) can actually think, which is flat out BS.
as many others have said here on hn, those who stand to benefit a lot from this are the ones promoting this bullcrap idea (that they (LLMs) are intelligent).
greater fool theory.
picks and shovels.
etc.
In detective or murder novels, the cliche is "look for the woman".
in this case, "follow the money" is the translation, i.e. who really benefits (the investors and founders, the few), as opposed to who is grandly proclaimed to be the beneficiary (us, the many).
fuzztester 345 days ago [-]
s/grand/grandiose/g
from a search for grand vs grandiose:
When it comes to bigness, there's grand and then there's grandiose. Both words can be used to describe something impressive in size, scope, or effect, but while grand may lend its noun a bit of dignity (i.e., “we had a grand time”), grandiose often implies a whiff of pretension.
Indeed, and one of the most interesting errors some human machines are making is hallucinating false analogies.
danielmarkbruce 345 days ago [-]
It wasn't an analogy.
goatlover 346 days ago [-]
Machines are intelligently designed for a purpose. Humans are born and grow up, have social lives, a moral status and are conscious, and are ultimately the product of a long line of mindless evolution that has no goals. Biology is not design. It's way messier.
nuancebydefault 346 days ago [-]
Exactly my thought. Humans can correct humans. Machines can correct, or at least point to failures in the product of, machines.
paradox242 346 days ago [-]
I don't see how this is sustainable. We have essentially eaten the seed corn. These current LLMs have been trained by an enormous corpus of mostly human-generated technical knowledge from sources which we already know to be currently being polluted by AI-generated slop. We also have preliminary research into how poorly these models do when training on data generated by other LLMs. Sure, it can coast off of that initial training set for maybe 5 or more years, but where will the next giant set of unpolluted training data come from? I just don't see it, unless we get something better than LLMs which is closer to AGI or an entire industry is created to explicitly create curated training data to be fed to future models.
_DeadFred_ 346 days ago [-]
These tools also require the developer class that they are intended to replace to continue to do what they currently do (create the knowledge source to train the AI on). It's not like the AIs are going to be creating the accessible knowledge bases to train AIs on, especially for new language extensions/libraries/etc. This is a one and f'd development. It will give a one time gain and then companies will be shocked when it falls apart and there are no developers trained up (because they all had to switch careers) to replace them. Unless Google's expectation is that all languages/development/libraries will just be static going forward.
layer8 346 days ago [-]
One of my concerns is that AI may actually slow innovation in software development (tooling, languages, protocols, frameworks and libraries), because the opportunity cost of adopting them will increase, if AI remains unable to be taught new knowledge quickly.
mathw 346 days ago [-]
It also bugs me that these tools will reduce the incentive to write better frameworks and language features if all the horrible boilerplate is just written by an LLM for us rather than finding ways to design systems which don't need it.
The idea that our current languages might be as far as we get is absolutely demoralising. I don't want a tool to help me write pointless boilerplate in a bad language, I want a better language.
batty_alex 346 days ago [-]
This is my main concern. What's the point of other tools when none of the LLMs have been trained on it and you need to deliver yesterday?
It's an insanely conservative tool
jamil7 346 days ago [-]
You already see this if you use a language outside of Python, JS or SQL.
wahnfrieden 346 days ago [-]
that is solved via larger contexts
layer8 346 days ago [-]
It’s not, unless contexts get as large as comparable training materials. And you’d have to compile adequate materials. Clearly, just adding some documentation about $tool will not have the same effect as adding all the gigabytes of internet discussion and open source code regarding $tool that the model would otherwise have been trained on. This is similar to handing someone documentation and immediately asking questions about the tool, compared to asking someone who had years of experience with the tool.
Lastly, it’s also a huge waste of energy to feed the same information over and over again for each query.
wahnfrieden 345 days ago [-]
- context of millions of tokens is frontier
- context over training is like someone referencing docs vs vaguely recalling from decayed memory
- context caching
layer8 345 days ago [-]
You’re assuming that everything can be easily known from documentation. That’s far from the truth. A lot of what LLMs produce is informed by having been trained on large amounts of source code and large amounts of discussions where people have shared their knowledge from experience, which you can’t get from the documentation.
0points 346 days ago [-]
Yea, I'm thinking along the same lines.
The companies valuing the expensive talent currently working at Google will be the winners.
Google and others are betting big right now, but I feel the winners might be those who watch how it unfolds first.
brainwad 346 days ago [-]
The LLM codegen at Google isn't unsupervised. It's integrated into the IDE as both autocomplete and prompt-based assistant, so you get a lot of feedback from a) what suggestions the human accepts and b) how they fix the suggestion when it's not perfect. So future iterations of the model won't be trained on LLM output, but on a mixture of human written code and human-corrected LLM output.
As a dev, I like it. It speeds up writing easy but tedious code. It's just a bit smarter version of the refactoring tools already common in IDEs...
kelnos 346 days ago [-]
What about (c) the human doesn't realize the LLM-generated code is flawed, and accepts it?
monocasa 346 days ago [-]
I mean what happens when a human doesn't realize the human generated code is wrong and accepts the PR and it becomes part of the corpus of 'safe' code?
jaredsohn 346 days ago [-]
Presumably someone will notice the bug in both of these scenarios at some point and it will no longer be treated as safe.
skydhash 346 days ago [-]
Do you ask a junior to review your code or someone experienced in the codebase?
loki-ai 346 days ago [-]
maybe most of the code in the future will be very different from what we’re used to.
For instance, AI image processing/computer vision algorithms are being adopted very quickly given the best ones are now mostly transformers networks.
spockz 346 days ago [-]
My main gripe with this form of code generation is that it is primarily used to generate “leaf” code. Code that will not be further adjusted or refactored into the right abstractions.
It is now very easy to sprinkle in regexes to validate user input, like email addresses, on every controller instead of using a central lib/utility for that.
In the hands of a skilled engineer it is a good tool. But for the rest it mainly serves to output more garbage at a higher rate.
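To make the "central lib/utility" point concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the helper name `is_valid_email`, the two controller functions, and the deliberately loose pattern (real email validation is usually stricter or handed to a vetted library).

```python
import re

# Hypothetical shared validator: one deliberately simple pattern, defined once.
# It only checks for "something@something.something" and is not RFC-complete.
_EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def is_valid_email(value: str) -> bool:
    """Return True if value loosely looks like an email address."""
    return _EMAIL_RE.fullmatch(value) is not None


# Hypothetical controllers: both call the shared helper instead of
# each embedding its own copy of the regex.
def signup_controller(form: dict) -> str:
    return "ok" if is_valid_email(form.get("email", "")) else "invalid email"


def invite_controller(form: dict) -> str:
    return "ok" if is_valid_email(form.get("email", "")) else "invalid email"
```

If the rule ever has to change, it changes in one place rather than on every controller.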
cdchn 346 days ago [-]
>It is now very easy to sprinkle in regexes to validate user input, like email addresses, on every controller instead of using a central lib/utility for that.
Some people are touting this as a major feature. "I don't have to pull in some dependency for a minor function - I can just have AI write that simple function for me." I, personally, don't see this as a net positive.
spockz 346 days ago [-]
Yes, I have heard similar arguments before. It could be an argument for including the functionality in the standard lib for the language. There can be a long debate about dependencies, and then there is still the benefit of being able to vendor and prune them.
The way it is now just leads to bloat and cruft.
philipwhiuk 346 days ago [-]
> The direction of travel is very clear
And if we get 9 women we can produce a baby in a single month.
There's no guarantee such progression will continue. Indeed, there's much more evidence it is coming to a halt.
Towaway69 346 days ago [-]
It might also be an example of 80/20 - we're just entering the 20% of features that take 80% of the time & effort.
It might be possible, but will shareholders/investors foot the bill for the 80% that they still have to pay?
farseer 346 days ago [-]
It's not even been 2 years, and you think things are coming to a halt?
0points 346 days ago [-]
Yes. The models require training data and they have already been fed the internet.
More and more of the content generated since is LLM generated and useless as training data.
The models get worse, not better by being fed their own output, and right now they are out of training data.
This is why Reddit just went profitable, AI companies buy their text to train their models because it is at least somewhat human written.
Of course, even reddit is crawling with LLM generated text, so yes. It is coming to a halt.
CaptainFever 346 days ago [-]
Data is not the only factor. Architecture improvements, data filtering etc. matter too.
simianparrot 346 days ago [-]
I know for a fact they are because rate _and_ quality of improvement is diminishing exponentially. I keep a close eye on this field as part of my job.
lelanthran 346 days ago [-]
> Don't look at the state of the art now, look at the direction of travel.
That's what people are doing. The direction of travel over the most recent few (6-12) months is mostly flat.
The direction of travel when first introduced was a very steep line going from bottom-left to top-right.
We are not there anymore.
olalonde 346 days ago [-]
> I'm continually surprised by the amount of negativity
Maybe I'm just old, but to me, LLMs feel like magic. A decade ago, anyone predicting their future capabilities would have been laughed at.
Towaway69 346 days ago [-]
Magic Makes Money - the more magical something seems, the more people are willing to pay for that something.
The discussion here seems to bear this out: CEO claims AI is magical, here the truth becomes that it’s just an auto-complete engine.
guappa 346 days ago [-]
Nah, you just were not up to speed with the current research. Which is completely normal. Now marketing departments are on the job.
davedx 346 days ago [-]
Transformers were proposed in 2017. A decade ago none of this was predictable.
guappa 345 days ago [-]
emacs psychologist was there from before :D
And so were a lot of markov chain based chatbots. Also Doretta, the microsoft AI/search engine chatbot.
Were they as good? No. Is this an iteration of those? Absolutely.
protomolecule 345 days ago [-]
Kurzweil would disagree)
mmmpetrichor 346 days ago [-]
That's the hype isn't it. The direction of travel hasn't been proven to be more than a surface level yet.
randomNumber7 346 days ago [-]
Because there seems to be a fundamental misunderstanding producing a lot of nonsense.
Of course LLMs are a fantastic tool to improve productivity, but current LLMs cannot produce anything novel. They can only reproduce what they have seen.
visarga 346 days ago [-]
But they assist developers and collect novel coding experience from their projects all the time. Each application of LLM creates feedback to the AI code - the human might leave it as is, slightly change it, or refuse it.
0points 346 days ago [-]
> LLM based systems will be writing more and more code at all companies.
At Google, today, for sure.
I do believe we still are not across the road on this one.
> if this can be accompanied by an increase in software quality, which is possible. Right now its very hit and miss
So, is it really a smart move of Google to enforce this today, before quality has increased? Or did this set off their path to losing market share because their software quality will deteriorate further over the next couple years?
From the outside it just seems Google and others have no choice, they must walk this path or lose market valuation.
dogleash 345 days ago [-]
> I'm continually surprised by the amount of negativity that accompanies these sort of statements.
I'm excited about the possibilities and I still recoil at the refined marketer prose.
fallingknife 346 days ago [-]
I'm not really seeing this direction of travel. I hear a lot of claims, but they are always 3rd person. I don't know or work with any engineers who rely heavily on these tools for productivity. I don't even see any convincing videos on Youtube. Just show me one engineer sitting down with these tools for a couple hours and writing a feature that would normally take a couple of days. I'll believe it when I see it.
Roark66 346 days ago [-]
Well, I rely on it a lot, but not in the IDE. I copy/paste my code and prompts between the IDE and the LLM. By now I have a library of prompts in each project that I can tweak and reuse. It makes me 25% up to 50% faster. Does this mean every project is done in 50/75% of the time? No, the actual completion time is maybe 10% faster, but I do get a lot more time to spend on thinking about the overall design instead of writing boilerplate and reading reference documents.
Why no youtube videos though? Well, most dev YouTubers are actual devs that cultivate an image of "I'm faster than LLM, I never re-read library references, I memorise them on first read" and so on. If they then show you a video of how they forgot the syntax for this or that maven plugin config and how an LLM fills it in 10s instead of a 5min Google search, that makes them look less capable on their own. Why would they do that?
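A rough way to reconcile "25% up to 50% faster" with "maybe 10% faster" overall, sketched in Python. This is only one reading of the numbers: it assumes "faster" means time saved on the LLM-assisted part of the work, and that the rest of the project takes just as long as before. The function names are made up for this sketch.

```python
def overall_time_saved(assisted_fraction: float, assisted_saving: float) -> float:
    """Fraction of total project time saved when only part of the work speeds up."""
    return assisted_fraction * assisted_saving


def implied_assisted_fraction(overall_saving: float, assisted_saving: float) -> float:
    """Share of the project that must be LLM-assisted to see the given overall saving."""
    return overall_saving / assisted_saving


# Under these assumptions, a 10% overall saving corresponds to the assisted
# work being 40% of the project (at 25% saved) or 20% (at 50% saved).
for saving in (0.25, 0.50):
    print(saving, implied_assisted_fraction(0.10, saving))
```

In other words, the gap is what you would expect if typing and lookup are only a minority of the total project time.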
skydhash 346 days ago [-]
Why don’t you read reference documents? The thing with bite-sized information is that it never gives you a coherent global view of the space. It’s like exploring a territory by crawling instead of using a map.
fallingknife 345 days ago [-]
Can you give me an example of one of these useful prompts? I'd love to try it out.
fuzztester 346 days ago [-]
you said it, bro.
baxtr 346 days ago [-]
I think that at least partially the negativity is due to the tech bros hyping AI just like they hyped crypto.
reverius42 347 days ago [-]
To me the most interesting part of this is the claim that you can accurately and meaningfully measure software engineering productivity.
ozim 347 days ago [-]
You can - but not on the level of a single developer and you cannot use those measures to manage productivity of a specific dev.
For teams you can measure meaningful outcomes and improve team metrics.
You shouldn’t really compare teams but it also is possible if you know what teams are doing.
If you are some disconnected manager that thinks he can make decisions or improvements reducing things to single numbers - yeah that’s not possible.
deely3 347 days ago [-]
> For teams you can measure meaningful outcomes and improve team metrics.
How? Which metrics?
anthonyskipper 346 days ago [-]
My company uses the DORA metrics to measure the productivity of teams and those metrics are incredibly good.
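For anyone who hasn't seen them, the four DORA metrics are deployment frequency, lead time for changes, change failure rate, and time to restore service. A minimal sketch of computing them in Python follows; the `Deployment` record, its field names, and the sample data are all assumptions made up for illustration, not any particular tool's schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did it cause a production failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def dora_summary(deploys: list[Deployment], period_days: int) -> dict:
    """Summarise the four DORA metrics for a team over one period."""
    if not deploys:
        raise ValueError("need at least one deployment")
    lead_times = [d.deployed_at - d.committed_at for d in deploys]
    failures = [d for d in deploys if d.failed]
    restores = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deployment_frequency_per_day": len(deploys) / period_days,
        "median_lead_time": median(lead_times),
        "change_failure_rate": len(failures) / len(deploys),
        "mean_time_to_restore": (
            sum(restores, timedelta()) / len(restores) if restores else None
        ),
    }


# Made-up sample: four deployments over two days, one of which failed.
t0 = datetime(2024, 1, 1, 9, 0)
sample = [
    Deployment(t0, t0 + timedelta(hours=2)),
    Deployment(t0, t0 + timedelta(hours=4), failed=True,
               restored_at=t0 + timedelta(hours=5)),
    Deployment(t0, t0 + timedelta(hours=6)),
    Deployment(t0, t0 + timedelta(hours=8)),
]
summary = dora_summary(sample, period_days=2)
```

Note these are team-level numbers about the delivery pipeline; nothing here attributes output to an individual developer.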
Capricorn2481 345 days ago [-]
These are awesome, but feel more applicable to DevOps than anything else. Development can certainly affect these metrics, but assuming your code doesn't introduce a huge bug that crashes the server, this is mostly for people deploying apps.
I think it's harder to measure things like developer productivity. The closest thing we have is making an estimate and seeing how far off you are, but that doesn't account for hedging estimates or requirements suddenly changing. Changing requirements doesn't matter for DORA as it's just another sample to test for deployment.
neaanopri 346 days ago [-]
There's only one metric that matters at the end of the day, and that's $. Revenue.
Unfortunately there's a lot of lag
ImaCake 346 days ago [-]
> Unfortunately there's a lot of lag
A great generalisation and understatement! Often looking like you are becoming more efficient is more important than actually being more efficient, e.g you need to impress investors. So you cut back on maintenance and other cost centres and the new management can blame you in 6 years time for it when you are far enough away from it to not hurt you.
fuzztester 346 days ago [-]
s/Revenue/profit/g
ozim 347 days ago [-]
That is what we pay managers -to figure out- for. They should find out which and how by knowing the team, familiarity with domain knowledge, understanding company dynamics, understanding customer, understanding market dynamics.
seanmcdirmid 346 days ago [-]
That's basically a non-answer. Measuring "productivity" is a well known hard problem, and managers haven't really figured it out...
mdorazio 346 days ago [-]
It's not a non-answer. Good managers need to figure out what metrics make sense for the team they are managing, and that will change depending on the company and team. It might be new features, bug fixes, new product launch milestones, customer satisfaction, ad revenue, or any of a hundred other things.
seanmcdirmid 346 days ago [-]
I would want a specific example in that case rather than "the good managers figure it out" because in my experience, the bad managers pretend to figure it out while the good managers admit that they can't figure it out. Worse still, if you tell your reports what those metrics are, they will optimize them to death, potentially tanking the product (I can increase my bug fix count if there are more bugs to fix...).
ozim 346 days ago [-]
So for a specific example I would have to outline 1-2 years of history of a team and product as a starter.
Then I would have to go on outlining 6-12 months of trying stuff out.
Because if I just give "an example" I will get dozens of "smart ass" replies how this specific one did not work for them and I am stupid. Thanks but don't have time for that or for writing an essay that no one will read anyway and call me stupid or demand even more explanation. :)
seanmcdirmid 346 days ago [-]
I get it, you are a true believer. I just disagree with your belief, and the fact that you can't bring credible examples to the table just reinforces that disagreement in my mind.
hshshshshsh 346 days ago [-]
The thing is even bad managers can thrive in a company with a large userbase like Google. There is a lot of momentum built into product and engineering.
randomNumber7 346 days ago [-]
I heard lines of code is a hot one.
hshshshshsh 346 days ago [-]
So basically you have nothing useful to say?
ozim 346 days ago [-]
I have to say that there is no solution that will work for "every team on every product".
This seems to be useful to understand and internalize that there are no simple answers like "use story points!".
There are also loads of people who don't understand that, so I stand by it being useful and important to repeat on every possible occasion.
yorwba 346 days ago [-]
Economists are generally fine with defining productivity as the ratio of aggregate outputs to aggregate inputs.
Measuring it is not the hard part.
The hard part is doing anything about it. If you can't attribute specific outputs to specific inputs, you don't know how to change inputs to maximize outputs. That's what managers need to do, but of course they're often just guessing.
seanmcdirmid 346 days ago [-]
Measuring human productivity is hard since we can't quantify output beyond silly metrics like lines of code written or amount of time speaking during meetings. Maybe if we were hunter/gatherers we could measure it by amount of animals killed.
ozim 346 days ago [-]
Well I pretty much see which team members are slacking and which are working hard.
But I do code myself, I write requirements so I do know which ones are trivial and which ones are not. I also see when there are complex migrations.
If you work in a group of people you will also get feedback - doesn't have to be snitching but still you get the feel who is a slacker in the group.
It is hard to quantify the output if you want to be removed from the group "give me a number" manager. If you actually do the work of a manager so you get the feel of the group like who is "Hermione Granger" nagging that others are slacking and disregard their opinion, you see who is the "silent doer" or you see who is "we should do it properly" bullshitter you can make a lot of meaningful adjustments.
rightbyte 346 days ago [-]
> Maybe if we were hunter/gatherers we could measure it by amount of animals killed.
Even that would be hard since hunting is complex. If you are the one chasing the prey into the arms of someone else, you surely want it to be considered a team effort.
"You can [accurately and meaningfully measure software engineering productivity] - but not on the level of a single developer and you cannot use those measures to manage productivity of a specific dev."
At the level of a company like Google, it's easy: both inputs and outputs are measured in terms of money.
ozim 346 days ago [-]
As you point back to my comment.
I am not an Amazon person - but from my experience 2 pizza teams was what worked, and I never implemented it myself, it's just what I observed in the wild.
Measuring Google in terms of money is also flawed, there is loads of BS hidden there and lots of people paying big companies more just because they are big companies.
js8 346 days ago [-]
> Maybe if we were hunter/gatherers we could measure it by amount of animals killed.
So that's how animal husbandry came about!
beefnugs 346 days ago [-]
haha that is not what managers do. Managers follow their KPIs exactly. If their KPIs say they get paid a bonus if profit goes up, then manager does smart number stuff and sees "if we fire 15% of employees this year, my pay goes up 63%" and then that happens
hshshshshsh 346 days ago [-]
That sounds like a micro manager. I would imagine good engineers can figure out something for themselves.
zac23or 345 days ago [-]
I knew a superstar developer who worked on reports in an SQL tool. In the company metrics, the developer scored 420 points per month, the second developer scored 60 points. “Please learn how to score more points from the leader”, the boss would say.
The superstar developer’s secret… he would send blank reports to clients (who would only realize it days later, and someone else would end up redoing the report), and he would score many more points without doing anything.
I’ve seen this happen a lot in many different companies. As a friend of mine used to say, “it’s very rare, but it happens all the time.”
I have no doubt that AI can help developers, but I don’t trust the metrics of the CEO or people who work on AI, because they are too involved in the subject.
svieira 345 days ago [-]
> When people are pressured to meet a target value there are three ways they can proceed:
Honestly I doubt he got away with this for long (unless it was a very dysfunctional org). Being the best gets you noticed (in a good way), and screwing people over gets you noticed too (in a bad way), the combination of the two paints a target on your back.
GeoAtreides 345 days ago [-]
> Being the best gets you noticed (in a good way), and screwing people over gets you noticed too (in a bad way),
ah, to be young again...
torginus 345 days ago [-]
I don't know what you're implying - I have had a few instances in my career when I went above and beyond and while I didn't receive too much praise for my efforts directly, after a while I noticed people who had no business knowing who I was, actually did.
Now, I was really bad at capitalizing on it, so nothing much came of it, but still, there are some positive things that higher-ups do notice.
UncleMeat 347 days ago [-]
At scale you can do this in a bunch of interesting ways. For example, you could measure "amount of time between opening a crash log and writing the first character of a new change" across 10,000s of engineers. Yes, each individual data point is highly messy. Alice might start coding as a means of investigation. Bob might like to think about the crash over dinner. Carol might get a really hard bug while David gets a really easy one. But at scale you can see how changes in the tools change this metric.
None of this works to evaluate individuals or even teams. But it can be effective at evaluating tools.
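A toy simulation of why this works at scale, in Python. The numbers are invented: it assumes per-engineer "minutes from opening the crash log to first edit" is log-normally noisy around a typical value, and that a hypothetical new tool lowers the typical value by 20%. None of this is real data.

```python
import random
from statistics import median


def simulate_minutes_to_first_edit(n: int, typical_minutes: float,
                                   rng: random.Random) -> list[float]:
    """Hypothetical noisy samples: log-normal spread around a typical value."""
    return [typical_minutes * rng.lognormvariate(0.0, 1.0) for _ in range(n)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
old_tool = simulate_minutes_to_first_edit(10_000, 30.0, rng)
new_tool = simulate_minutes_to_first_edit(10_000, 24.0, rng)  # assumed 20% better

# Individual samples overlap heavily (Alice, Bob, Carol and David all differ),
# but the medians of the two cohorts separate clearly.
ratio = median(new_tool) / median(old_tool)
print(f"median ratio new/old: {ratio:.2f}")
```

With only a handful of engineers per cohort the same comparison would mostly be noise, which is the reason it says something about the tool and nothing about any one person.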
fwip 346 days ago [-]
There's lots of stuff you can measure. It's not clear whether any of it is correlated with productivity.
To use your example, a user with an LLM might say "LLM please fix this" as a first line of action, drastically improving this metric, even if it ruins your overall productivity.
valval 347 days ago [-]
You can come up with measures for it and then watch them, that’s for sure.
lr1970 347 days ago [-]
when metric becomes the target it ceases to be a good metric. when discovered how it works developers will type the first character immediately after opening the log.
edit: typo
joshuamorton 346 days ago [-]
Only if the developer is being judged on the thing. If the tool is being judged on the thing, it's much less relevant.
That is, I, personally, am not measured on how much AI generated code I create, and while the number is non-zero, I can't tell you what it is because I don't care and don't have any incentive to care. And I'm someone who is personally fairly bearish on the value of LLM-based codegen/autocomplete.
valval 345 days ago [-]
That was my point, veiled in an attempt to be cute.
LinuxBender 347 days ago [-]
Is AI ready to crawl through all open source and find / fix all the potential security bugs or all bugs for that matter? If so will that become a commercial service or a free service?
Will AI be able to detect bugs and back doors that require multiple pieces of code working together rather than being in a single piece of code? Humans have a hard time with this.
- Hypothetical Example: Authentication bugs in sshd that requires a flaw in systemd which then requires a flaw in udev or nss or PAM or some underlying library ... but looking at each individual library or daemon there are no bugs that a professional penetration testing organization such as the NCC group or Google's Project Zero would find. In other words, will AI soon be able to find more complex bugs in a year than Tavis has found in his career and will they start to compete with one another and start finding all the state sponsored complex bugs and then ultimately be able to create a map that suggests a common set of developers that may need to be notified? Will there be a table that logs where AI found things that professional human penetration testers could not?
0points 346 days ago [-]
No, that would require AGI. Actual reasoning.
Adversaries are already detecting issues tho, using proven means such as code review and fuzzing.
Google project zero consists of a team of rock star hackers. I don't see LLM even replacing junior devs right now.
paradox242 346 days ago [-]
Seems like there is more gain on the adversary side of this equation. Think nation-states like North Korea or China, and commercial entities like Pegasus Group.
AnimalMuppet 346 days ago [-]
Google's AI would have the advantage of the source code. The adversaries would not. (At least, not without hacking Google's code repository, which isn't impossible...)
saagarjha 346 days ago [-]
FWIW: NSO is the group, Pegasus is their product
nycdatasci 346 days ago [-]
You mention safety as #1, but my impression is that Google has taken a uniquely primitive approach to safety with many of their models. Instead of influencing the weights of the core model, they check core model outputs with a tiny and much less competent “safety model”. This approach leads to things like a text-to-image model that refuses to output images when a user asks to generate “a picture of a child playing hopscotch in front of their school, shot with a Sony A1 at 200 mm, f2.8”. Gemini has a similar issue: it will stop mid-sentence, erase its entire response and then claim that something is likely offensive and it can’t continue.
The whole paradigm should change. If you are indeed responsible for developer tools, I would hope that you’re actively leveraging Claude 3.5 Sonnet and o1-preview.
wslh 346 days ago [-]
As someone working in cybersecurity and actively researching vulnerability scanning in codebases (including with LLMs), I’m struggling to understand what you mean by “safe.” If you’re referring to detecting security vulnerabilities, then you’re either working on a confidential project with unpublished methods, or your approach is likely on par with the current state of the art, which primarily addresses basic vulnerabilities.
bcherny 346 days ago [-]
How are you measuring productivity? And is the effect you see in A/B tests statistically significant? Both of these were challenging to do at Meta, even with many thousands of engineers —- curious what worked for you.
assanineass 346 days ago [-]
Was this comment cleared by comms
bogwog 346 days ago [-]
Is any of the AI generated code being committed to Google's open source repos, or is it only being used for private/internal stuff?
fhdsgbbcaA 347 days ago [-]
I’ve been thinking a lot lately about how an LLM trained in really high quality code would perform.
I’m far from impressed with the output of GPT/Claude, all they’ve done is weight against stack overflow - which is still low quality code relative to Google.
What is the probability Google makes this a real product, or is it too likely to autocomplete trade secrets?
hshshshshsh 346 days ago [-]
Seems like everything is working out without any issues. Shouldn't you be a bit suspicious?
mysterydip 347 days ago [-]
I assume the amount of monitoring effort is less than the amount of effort that would be required to replicate the AI generated code by humans, but do you have numbers on what that ROI looks like? Is it more like 10% or 200%?
ActionHank 345 days ago [-]
Would you say that the efficiency gain is less than, equal to, or greater than the cost?
It's always felt like having AI in the cloud for better autocomplete is a lot for a small gain.
Twirrim 346 days ago [-]
> We work very closely with Google DeepMind to adapt Gemini models for Google-scale coding and other Software Engineering usecases.
Considering how terrible and frequently broken the code that the public-facing Gemini produces is, I'll have to be honest that that kind of scares me.
Gemini frequently fails at some fairly basic stuff, even in popular languages where it would have had a lot of source material to work from; where other public models (even free ones) sail through.
To give a fun, fairly recent example, here's a prime factorisation algorithm it produced for Python:
    # Find the prime factorization of n
    prime_factors = []
    while n > 1:
        p = 2
        while n % p == 0:
            prime_factors.append(p)
            n //= p
        p += 1
    prime_factors.append(n)
Can you spot all the problems?
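For comparison, a corrected trial-division sketch. This is not Gemini's output or anything from the original tool; the function name `prime_factors` and the handling of `n < 2` are choices made for this sketch. It avoids two things the snippet above appears to do: resetting `p` to 2 on every pass of the outer loop (so it never gets past 2), and appending the final `n` unconditionally (which adds a stray 1).

```python
def prime_factors(n: int) -> list[int]:
    """Return the prime factors of n, with multiplicity, by trial division."""
    if n < 2:
        return []  # 0, 1 and negatives have no prime factorisation here
    factors = []
    p = 2  # initialised once, outside the loop, so it can advance
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)  # whatever is left over is itself prime
    return factors


print(prime_factors(360))
```

Stopping at `p * p <= n` keeps the loop short for large primes; it is a standard optimisation, not something the original snippet attempted.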
ijidak 346 days ago [-]
I'm the first to say that AI will not replace human coders.
But I don't understand this attempt to tell companies/persons that are successfully using AI that no they really aren't.
In my opinion, if they feel they're using AI successfully, the goal should be to learn from that.
I don't understand this need to tell individuals who say they are successfully using AI that, "no you aren't."
It feels like a form of denial.
Like someone saying, "I refuse to accept that this could work for you, no matter what you say."
kgeist 346 days ago [-]
They probably use AI for writing tests, small internal tools/scripts, building generic frontends and quick prototypes/demos/proofs of concept. That could easily be that 25% of the code. And modern LLMs are pretty okayish with that.
gerash 346 days ago [-]
I believe most people use AI to help them quickly figure out how to use a library or an API without having to read all of its (often outdated) documentation, rather than to help them solve some mathematical challenge.
delfinom 346 days ago [-]
I've never had an AI not just make up an API when it didn't exist, instead of saying "it doesn't exist". Lol
taeric 346 days ago [-]
If the documentation is out of date, such that it doesn't help, this doesn't bode well for the training data of the AI helping it get it right, either?
macintux 346 days ago [-]
AI can presumably integrate all of the forum discussions talking about how people really use the code.
Assuming discussions don't happen in Slack, or Discord, or...
woodson 346 days ago [-]
Unfortunately, it often hallucinates wrong parameters (or gets their order wrong) if there are multiple different APIs for similar packages. For example, there are plenty of ML model inference packages, and the code suggestions for NVIDIA Triton Inference Server Python code are pretty much always wrong, as it generates code that’s probably correct for other Python ML inference packages with a slightly different API.
jon_richards 346 days ago [-]
I often find the opposite. Documentation can be up to date, but AI suggests deprecated or removed functions because there’s more old code than new code. Pgx v5 is a particularly consistent example.
randomNumber7 346 days ago [-]
And all the code on which it was trained...
Capricorn2481 345 days ago [-]
Forum posts can also be out of date.
randomNumber7 346 days ago [-]
I think that too but google claims something else.
calf 346 days ago [-]
We are sorely lacking a "Make Computer Science a Science" movement, the tech lead's blurb is par for the course, talking about "SWE productivity" with no reference to scientific inquiry and a foundational understanding of safety, correctness, verification, validation of these new LLM technologies.
almostgotcaught 346 days ago [-]
Did you know that Google is a for-profit business and not a university? Did you know that most places where people work on software are the same?
zifpanachr23 346 days ago [-]
So are most medical facilities. Somehow, the vibes are massively different.
almostgotcaught 346 days ago [-]
That's rich? Never heard of the opioid crisis? Or the over-prescription of imaging tests?
calf 346 days ago [-]
Did you know that Software Engineering is a university level degree? That it is a field of scientific study, with professors who dedicate their lives to it? What happens when companies ignore science and worse yet cause harm like pollution or medical malpractice, or in this case, spread Silicon Valley lies and bullshit???
Did you know? How WEIRD.
How about you not harass other commenters with such arrogantly ignorant sarcastic questions?? Or is that part of corporate "for-profit" culture too????
almostgotcaught 346 days ago [-]
> Did you know that Software Engineering is a university level degree? That it is a field of scientific study, with professors who dedicate their lives to it?
So is marketing? So is finance? So is petroleum engineering?
justinpombrio 346 days ago [-]
> Can you spot all the problems?
You were probably being rhetorical, but there are two problems:
- `p = 2` should be outside the loop
- `prime_factors.append(n)` appends `1` onto the end of the list for no reason
With those two changes I'm pretty sure it's correct.
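For reference, a minimal sketch of the snippet with just those two changes applied. The `factorize` wrapper, its signature, and the return statement are assumed here for illustration; they are not part of the quoted Gemini output.

```python
def factorize(n: int) -> list[int]:
    """Return the prime factors of n in ascending order, with multiplicity."""
    prime_factors = []
    p = 2  # change 1: initialised once, outside the outer loop
    while n > 1:
        # Divide out every copy of p before moving on to the next candidate.
        while n % p == 0:
            prime_factors.append(p)
            n //= p
        p += 1
    # change 2: no trailing append, because n is always 1 at this point
    return prime_factors
```

This is still plain trial division, so it is slow for inputs with a large prime factor; it is only meant to show the two fixes.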
kenjackson 346 days ago [-]
You don't need to append 'p' in the inner while loop more than once. Maybe instead of an array for keeping the list of prime factors do it in a set.
zeroonetwothree 346 days ago [-]
It’s valid to return the multiplicity of each prime, depending on the goal of this.
The implicit context that the poster removed (as you can tell from the indentation) was a function definition:
    def factorize(n):
        ...
        return prime_factors
dangsux 346 days ago [-]
[dead]
senko 346 days ago [-]
We collectively deride leetcoding interviews yet ask AI to flawlessly solve leetcode questions.
I bet I'd make more errors on my first try at it.
AnimalMuppet 346 days ago [-]
Writing a prime-number factorization function is hardly "leetcode".
senko 346 days ago [-]
I didn't say it's hard, but it's most definitely leetcode, as in "pointless algorithmic exercise that will only show you if the candidate recently worked on a similar question".
I would not expect a programmer of any seniority to churn out stuff like that and have it working without testing.
AnimalMuppet 346 days ago [-]
> "pointless algorithmic exercise that will only show you if the candidate recently worked on a similar question".
I've been able to write one, not from memory but from first principles, any time in the last 40 years.
senko 346 days ago [-]
Curious, I would expect a programmer of your age to remember Knuth's "Beware of bugs in the above code; I have only proved it correct, not tried it."
I'm happy you know math, but my point before this thread got derailed was that we're holding (coding) AI to a higher standard than actual humans, namely to expect to write bug-free code.
0points 346 days ago [-]
> my point before this thread got derailed was that we're holding (coding) AI to a higher standard than actual humans, namely to expect to write bug-free code
This seems like a very layman attitude and I would be surprised to find many devs adhering to this idea. Comments in this thread alone suggest that many devs on HN do not agree.
smrq 346 days ago [-]
I hold myself to a higher standard than AI tools are capable of, from my experience. (Maybe some people don't, and that's where the disconnect is between the apologists and the naysayers?)
Jensson 346 days ago [-]
Humans can actually run the code and know what it should output. The LLM can't, and putting it in a loop against code output doesn't work well either since the LLM can't navigate that well.
eesmith 346 days ago [-]
A senior programmer like me knows that primality-based problems like the one posed in your link are easily gamed.
Testing for small prime factors is easy - brute force is your friend. Testing for large prime factors requires more effort. So the first trick is to figure out the bounds to the problem. Is it int32? Then brute-force it. Is it int64, where you might have a value like the Mersenne prime 2^61-1? Perhaps it's time to pull out a math reference. Is it longer, like an unbounded Python int? Definitely switch to something like the GNU Multiple Precision Arithmetic Library.
In this case, the maximum value is 1,000, which means we can enumerate all distinct prime values in that range, and test for the presence of each in every input value, one by one:
That worked without testing, though I felt better after I ran the test suite, which found no errors. Here's the test suite:
    import unittest

    class TestExamples(unittest.TestCase):
        def test_example_1(self):
            self.assertEqual(distinctPrimeFactors([2,4,3,7,10,6]), 4)
        def test_example_2(self):
            self.assertEqual(distinctPrimeFactors([2,4,8,16]), 1)
        def test_2_is_valid(self):
            self.assertEqual(distinctPrimeFactors([2]), 1)
        def test_1000_is_valid(self):
            self.assertEqual(distinctPrimeFactors([1_000]), 2) # (2*5)**3
        def test_10_000_values_is_valid(self):
            values = _primes[:20] * (10_000 // 20)
            assert len(values) == 10_000
            self.assertEqual(distinctPrimeFactors(values), 20)

    @unittest.skipUnless(__debug__, "can only test in debug mode")
    class TestConstraints(unittest.TestCase):
        def test_too_few(self):
            with self.assertRaisesRegex(AssertionError, "size out of range"):
                distinctPrimeFactors([])
        def test_too_many(self):
            with self.assertRaisesRegex(AssertionError, "size out of range"):
                distinctPrimeFactors([2]*10_001)
        def test_num_too_small(self):
            with self.assertRaisesRegex(AssertionError, "num out of range"):
                distinctPrimeFactors([1])
        def test_num_too_large(self):
            with self.assertRaisesRegex(AssertionError, "num out of range"):
                distinctPrimeFactors([1_001])

    if __name__ == "__main__":
        unittest.main()
I had two typos in my test suite (an "=" for "==", and a ", 20))" instead of "), 20)"), and my original test_num_too_large() tested 10_001 instead of the boundary case of 1_001, so three mistakes in total.
If I had no internet access, I would compute that table thusly:
    _primes = [2]
    for value in range(3, 1000):
        if all(value % p > 0 for p in _primes):
            _primes.append(value)
Do let me know of any remaining mistakes.
What kind of senior programmers do you work with who can't handle something like this?
EDIT: For fun I wrote an implementation based on sympy's integer factorization:
    from sympy.ntheory import factorint

    def distinctPrimeFactors(nums: list[int]) -> int:
        distinct_factors = set()
        for num in nums:
            distinct_factors.update(factorint(num))
        return len(distinct_factors)
Here's a new test case, which takes about 17 seconds to run:
Empirical testing (for example: https://news.ycombinator.com/item?id=33293522) has established that the people on Hacker News tend to be junior in their skills. Understanding this fact can help you understand why certain opinions and reactions are more likely here. Surprisingly, the more skilled individuals tend to be found on Reddit (same testing performed there).
louthy 346 days ago [-]
I’m not sure that’s evidence; I looked at that and saw it was written in Go and just didn’t bother. As someone with 40 years of coding experience and a fundamental dislike of Go, I didn’t feel the need to even try. So the numbers can easily be skewed, surely.
atomic128 346 days ago [-]
Only individuals who submitted multiple bad solutions before giving up were counted as failing. If you look but don't bother, or submit a single bad solution, you aren't counted. Thousands of individuals were tested on Hacker News and Reddit, and surprisingly, it's not even close: Reddit is where the hackers are. I mean, at the time of the testing, years ago.
louthy 346 days ago [-]
That doesn’t change my point. It didn’t test every dev on all platforms, it tested a subset. That subset may well have different attributes to the ones that didn’t engage. So, it says nothing about the audience for the forums as a whole, just the few thousand that engaged.
Perhaps there could even be fewer Go programmers here, and some just took a stab at it even though they don’t know the language. So it could just select for which forum has the most Go programmers. Hardly rigorous.
So I’d take that with a pinch of salt personally
atomic128 346 days ago [-]
Agreed. But remember, this isn't the only time the population has been tested. This is just the test (from two years ago, in 2022) that I happen to have a link to.
louthy 346 days ago [-]
The population hasn’t been tested. A subset has.
59nadir 346 days ago [-]
It's also fine to be an outlier. I've been programming for 24 years and have been hanging out on HackerNews on and off for 11. HN was way more relevant to me 11 years ago than it is now, and I don't think that's necessarily only because the subject matter changed, but probably also because I have.
Izikiel43 346 days ago [-]
How is that thing testing?
Is it expecting a specific solution or actually running the code?
I tried some solutions and it complained anyways
atomic128 346 days ago [-]
The way the site works is explained in the first puzzle, "Hack This Site". TLDR, it builds and runs your code against a test suite. If your solutions weren't accepted, it's because they're wrong.
0xDEAFBEAD 346 days ago [-]
Where is the data?
freilanzer 346 days ago [-]
Yeah, this is useless.
gamesetmath 347 days ago [-]
[flagged]
pixxel 347 days ago [-]
[flagged]
devonbleak 346 days ago [-]
It's Go. 25% of the code is just basic error checking and returning nil.
QuercusMax 346 days ago [-]
In Java, 25% of the code is import statements and curly braces
layer8 346 days ago [-]
You generally don’t write those by hand though.
I’m pretty sure around 50% of the code I write is already auto-complete, without any AI.
amomchilov 346 days ago [-]
Exactly, you write them with AI
throwaway106382 346 days ago [-]
IDEs have been auto completing braces, inserting imports and generating framework boilerplate for decades.
We don’t need AI for this and it’s 10x the compute to do it slower with AI.
LLMs are useful but they aren’t a silver bullet. We don’t need to replace everything with it just because.
philipwhiuk 346 days ago [-]
Yeah, but the management achievement is to call 'autocomplete' AI.
AI doesn't mean LLM after all. AI means 'a computer thing'.
throwaway106382 346 days ago [-]
I’ve been calling if-statements AI since before I graduated college
jansan 346 days ago [-]
Simply stretch your definition of AI and voilà, you are writing it with AI.
rwmj 346 days ago [-]
The most important thing is to put out a press release about how half your code is written by AI.
contravariant 346 days ago [-]
In lisp about 50% of the code is just closing parentheses.
harry8 346 days ago [-]
Heh, but it can't be that, no reason to think llms can count brackets needing a close any more than they can count words.
int_19h 346 days ago [-]
LLMs can count words (and letters) just fine if you train them to do so.
Consider the fact that GPT-4 can generate valid XML (meaning balanced tags, quotes etc) in base64-encoded form. Without CoT, just direct output.
maleldil 345 days ago [-]
That's GPT-4, which you wouldn't use for in-line suggestions because it's too slow.
I don't know what model Copilot uses these days, but it constantly makes bracket mistakes in Python.
int_19h 345 days ago [-]
You don't need a GPT-4-sized model to count brackets. You just need to make sure that your training data includes enough cases like that for NN to learn it. My point is that GPT-4 can do much more complicated things than that, so there's nothing specific about LMs that preclude them from doing this kind of stuff right.
overhead4075 346 days ago [-]
Logically, it couldn't be 50% since that would imply that the other 50% would be open brackets and that would leave 0% room for macros.
philipwhiuk 346 days ago [-]
That's just a rounding error ;)
xxs 346 days ago [-]
Over 3 imports from the same package - use an asterisk.
NeoTar 346 days ago [-]
Does auto-code generation count as AI?
remram 346 days ago [-]
Another 60% is auto-generated protobuf/grpc code. Maybe protoc counts as "AI".
GeneralMayhem 346 days ago [-]
Google does not check in protoc-generated code. It's all generated on demand by Blaze/Bazel.
remram 345 days ago [-]
Oh thanks for the info.
On the other hand, that doesn't mean it doesn't count for the purpose of this press release/advertisement...
Groxx 342 days ago [-]
new headline: Protoc Generates Many Times More Code Than Humans Or AI
(because it's like 50k lines regenerated for every build, everywhere, all the time)
hiddencost 346 days ago [-]
Go is a very small fraction of the code at Google.
yangcheng 346 days ago [-]
Having worked at both FAANG companies and startups, I can offer a perspective on AI's coding impact in different environments.
At startups, engineers work with new tech stacks, start projects from scratch, and need to ship something quickly. LLMs can write way more code. I've seen ML engineers build React frontends without any previous frontend experience and Flutter developers write 100-line SQL queries for data analysis, with LLMs giving a 10x productivity boost for this type of work.
At FAANG companies, codebases contain years of business logic, edge cases, and 'not-bugs-but-features.' Engineers know their tech stacks well, and legacy constraints make LLMs less effective: they can generate wrong code that then needs to be fixed.
davnicwil 346 days ago [-]
It might not quite be there yet, but one key advantage large codebases have that I think LLMs in time will be able to better exploit is the detection of existing patterns - presuming they're consistent - and application to new code doing similar things or to fix bugs in existing code that deviates from the pattern in some way that causes a bug.
It's a different thing to what you're talking about, but it's one way I'd expect to see LLMs contribute a lot to productivity on larger codebases specifically.
mdgrech23 345 days ago [-]
large application codebase - consistent - have you worked in the field? I feel like usually there are 3 or 4 patterns from different people/teams at different points in time that spearheaded a particular ideology about how things "should" be done.
dep_b 345 days ago [-]
A quarter of all new code? Of course. Especially if you include all "smart autocomplete" code.
When dealing with a fermenting pile of technical debt? I expect very little. LLM's don't have application-wide context yet.
AI is definitely revolutionizing our field, but the same people that said that no-code tools and all of the other hype-of-the-decade technologies would make developers jobless are actually the people AI is making jobless.
Generate an opinion piece about how AI is going to make developers jobless, using AI? Less than a minute. And you don't need to maintain that article, once it's published, it's done.
While there's a tsunami of AI-generated almost-there projects coming that need to be moved to a shippable and sellable state. So I'm more afraid about the kind of work I'm going to get while still getting paid handsomely for my skills, than ever being jobless as the only guy that really understands the whole stack from top to bottom.
randomdata 345 days ago [-]
At the end of the day an LLM is just a compiler anyway. The developer isn't going away even if 100% of the code is generated by LLMs, just as the developer didn't go away when we stopped spending our days flipping toggle switches.
dep_b 345 days ago [-]
I'm actually surprised that _the others_ always think that the programmers will somehow make themselves obsolete first. If it gets cheaper to make software, more software will be made, until we again reach the point where we're running short on people capable enough to keep it all running.
fzysingularity 346 days ago [-]
While I get the MBA-speak of lines-of-code that AI is now able to accomplish, it does make me think about their highly-curated internal codebase that makes them well placed to potentially get to 50% AI-generated code.
One common misconception is that all LLMs are the same. The models are trained the same, but trained on wildly different datasets. Google, and more specifically the Google codebase is arguably one of the most curated, and iterated on datasets in existence. This is a massive lever for Google to train their internal code-gen models, that realistically could easily replace any entry-level or junior developer.
- Code review is another dimension of the process of maintaining a codebase that we can expect huge improvements with LLMs. The highly-curated commentary on existing code / flawed diff / corrected diff that Google possesses give them an opportunity to build a whole set of new internal tools / infra that's extremely tailored to their own coding standard / culture.
bqmjjx0kac 346 days ago [-]
> that realistically could easily replace any entry-level or junior developer.
This is a massive, unsubstantiated leap.
risyachka 346 days ago [-]
The issue is it doesn't really replace junior dev. You become one - as you have to babysit it all the time, check every line of code, and beg it to make it work.
In many cases it is counterproductive
throwaway106382 346 days ago [-]
I’d take pair programming with a junior over a GPT bot any day.
neaanopri 346 days ago [-]
I'd take coding by my own damn self over either a junior or a GPT bot
jtbetz22 346 days ago [-]
> Google codebase is arguably one of the most curated, and iterated on datasets in existence
I spent 12 years of my career in the Google codebase.
This assertion is technically correct in that google3 has been around for 20 years, and all code gets reviewed, but the implication that Google's codebase is a high-quality training set is not consistent with my experience.
unit149 346 days ago [-]
Philosophically, these models are akin to scholars prior to differentiation during their course of study. Throttling data, depending on one's course of study, and this shifting of the period in history step-by-step. Either it's a tit-for-tat manner of exchange that the junior developer is engaged in, when overseeing every edit that an LLM has modified, or I'd assume that there are in-built methods of garbage collection, that another LLM evaluating a hash function partly identifying a block of tokenized internal code would be responsible for.
morkalork 346 days ago [-]
Is the public Gemini code gen LLM trained on their internal repo? I wonder if one could get it to cough up proprietary code with the right prompt.
p1esk 346 days ago [-]
I’m curious if Microsoft lets OpenAI train on GH private repos.
happyopossum 346 days ago [-]
> Is the public gemini code gen LLM trained on their internal repo?
Nope
AlwaysRock 346 days ago [-]
If we are talking about the boilerplate code and autofill syntax code that copilot or any other "AI" will offer me when I start typing... Then sure. Sounds about right.
The other 75% is the stuff you actually have to think about.
This feels like saying linters impact x0% of code. This just feels like an extension of that.
creativenolo 346 days ago [-]
It probably does. But an amazing number of commenters think they are prompting, then copy & pasting, and hoping for the best.
Kalabasa 346 days ago [-]
Yep, a lot of headline readers here.
It's just a very advanced autocomplete, completely integrated into the internal codebase and IDE. You can read this on the research blog (maybe if everyone just read the blog).
e.g.
I start typing `var notificationManager`
It would suggest `= (NotificationManager) context.getSystemService(NOTIFICATION_SERVICE);`
If you've done Android then you know how much boilerplate there is to suggest.
I press Ctrl+Enter or something to accept the suggestion.
Voila, more than 50% of that code was written by AI.
> blindly committing AI code
Even before AI, no one blindly accepts autocomplete.
A lot of headline-readers seem to imagine some sort of semi-autonomous or prompt based code generation that writes whole blocks of code to then be blindly accepted by engineers.
skydhash 345 days ago [-]
It’s been a while since I’ve done Android, but I’m sure that this variable should be a property and be set as part of the lifecycle. And while Android (and any big project) is full of boilerplate, each line is subtly different or it would have already been abstracted in some base class. And even then, the code completion is already so good in Android Studio that you would have to be a complete junior (in which case you wouldn’t know whether the AI suggestion is good) to complain that writing code is slow. Most time is spent designing code, fixing subtle bugs, and refactoring to clean up the code.
esjeon 346 days ago [-]
> The other 75% is the stuff you actually have to think about.
I’m pretty sure the actual ratio is much lower than that. In other words, LLMs are currently not good enough to remove the majority of chores, even with the state of the art model trained on highly curated dataset.
AlwaysRock 341 days ago [-]
Have you used CoPilot with vs code? It's not perfect all the time but its autocomplete is right a significant amount of time.
imaginebit 347 days ago [-]
I think he's trying to promote AI, but somehow it raises questions among some about their code quality
dietr1ch 347 days ago [-]
I think it just shows how much noise there is in coding. Code gets reviewed anyway (although review quality was going down rapidly the more PMs were added to the team)
Most of the code must be what could be snippets (opening files and handling errors with absl::, and moving data from proto to proto).
One thing that doesn't help here, is that when writing for many engineers on different teams to read, spelling out simple code instead of depending on too many abstractions seems to be preferred by most teams.
I guess that LLMs do provide smarter snippets that I don't need to fill out in detail, and when the model understands types and whether things compile it gets quite good and "smart" when it comes to writing down boilerplate.
ryoshu 346 days ago [-]
Spoken like an MBA who counts lines of code.
pfannkuchen 346 days ago [-]
It’s replaced the 25% previously copy pasted from stack overflow.
rkagerer 346 days ago [-]
This may have been intended as a joke, but it's the only explanation that reconciles the quote for me.
brainwad 346 days ago [-]
The split is roughly 25% AI, 25% typed, 50% pasted.
ttul 346 days ago [-]
I wanted a new feature in our customer support console and the dev lead suggested I write a JIRA. I’m the CEO, so this is not my usual thing (and probably should not be). I told Claude what I wanted and pasted in a .js file from the existing project so that it would get a sense of the context. It cranked out a fully functional React component that actually looks quite nice too. Two new API calls were needed, but Claude helpfully told me that. So I pasted the code sample and a screenshot of the HTML output into the JIRA and then got Claude to write me the rest of the JIRA as well.
Everyone knows this was “made by AI” because there’s no way in hell I would ever have the time. These models might not be able to sit there and build an entire project from scratch yet, but if what you need is some help adding the next control panel page, Claude’s got your back on that.
simianparrot 346 days ago [-]
You’re also the CEO so chances are the people looking at that ticket aren’t going to tell you the absolute mess the AI snippet actually is and how pointless it was to include it instead of a simple succinct sentence explaining the requirements.
If you’re not a developer chances are very high the code it produces will look passable but is actually worthless — or worse, it’s misleading and now a dev has to spend more time deciphering the task.
ttul 344 days ago [-]
LoL, I really appreciate this comment. My team is very frank with me about code quality and they said Claude’s work looked pretty good — this time. But I’ll take your recommendation to heart for next time.
JonChesterfield 346 days ago [-]
> Everyone knows this was “made by AI” because there’s no way in hell I would ever have the time.
Doubtful. A decent fraction of the people reading it will guess that you've wasted your time writing incoherent nonsense in the jira. Engineers don't usually have much insight into what the C suite are doing. It would be a prudent move to spend the couple of seconds to write "something like this AI sketch:" before the copy&paste.
gloflo 346 days ago [-]
> Everyone knows this was “made by AI” because ...
They should know because you told them so.
Having to decipher weird code only to discover it was not written by a human is not nice.
zac23or 345 days ago [-]
> dev lead suggested I write a JIRA. I’m the CEO, so this is not my usual thing (and probably should not be)
Fascinating point of view.
nosbo 347 days ago [-]
I don't write code as I'm a sysadmin. Mostly just scripts. But is this like saying intellisense writes 25% of my code? Because I use autocomplete to shortcut stuff or to create a for loop to fill with things I want to do.
n_ary 347 days ago [-]
You just made it less attractive to the target corps who are supposed to buy this product from Google. Calling it IntelliSense means corps already have licenses for various tools like it, and some are even mostly free. Saying AI generates 25% of our code sounds more attractive to corps, because it feels like something new and novel, and you can imagine laying off 25% of the personnel to justify buying this product from Google.
When someone who uses a product says it, there is a 50% chance of it being true, but when someone far away from the user says it, it is 100% promotion of product and setup for trust building for a future sale.
Not what I thought when I heard "AI coding", but seems pretty neat.
stephenr 346 days ago [-]
> I don't write code as I'm a sysadmin. Mostly just scripts.
.... so what do you put in your scripts if not code?
nosbo 345 days ago [-]
Don't disagree. But I think it's pretty accepted that sysadminy scripts and full blown applications are different. Just don't want to give the wrong impression that I know what I'm talking about I guess.
bongodongobob 345 days ago [-]
The colloquial difference is a few lines, maybe a dozen or two, for maintenance and one off stuff, not a full blown application.
0xCAP 346 days ago [-]
People overestimate faang. There are many talents working there, sure, but a lot of garbage gets pumped into their codebases as well.
mattgreenrocks 345 days ago [-]
Devs who pride themselves on their capacity for rational thought seem to forget that regression to the mean applies everywhere...even to the places that they aspire to.
fuzzfactor 346 days ago [-]
>a lot of garbage gets pumped into their codebases
I would imagine it always has.
>Google CEO says more than a quarter of the company's new code is created by AI
It may very well be starting to become apparent anyway :\
summerlight 346 days ago [-]
In Google, there is a process called "Large Scale Change" which is primarily meant for trivial/safe but extremely tedious code changes that potentially span the entire monorepo, such as foundational API changes, trivial optimizations, code style, etc. This is perfectly suitable for LLM-driven code changes (in fact I'm seeing more and more LLM-generated LSCs) and I guess a large fraction of the mentioned "AI-generated code" can actually be attributed to this.
bubaumba 346 days ago [-]
yeah, but the main problem is the quality. with an algorithm, a bug can be fixed. with an llm it's more complicated. in practice they make some mistakes consistently, and in some cases cannot recover even with assistance. (don't get me wrong, I'm very happy with the results most of the time)
afro88 346 days ago [-]
You just fix the mistakes and keep moving. It's like autocomplete where you still need to fill in the blanks or select a different completion.
saagarjha 346 days ago [-]
Spotting and fixing mistakes in a LSC is no small feat.
drunken_thor 346 days ago [-]
A company that used to be the pinnacle of software development is now just generating code in order to sell their big data models. Horrifying. Devastating.
motoxpro 346 days ago [-]
People talk about how AI is bad at generating non-trivial code, but why are people using it to generate non-trivial code?
25% of coding is just the most basic boilerplate. I think of AI not as a thinking machine but as a 1000 WPM boilerplate typer.
If it is hallucinating, you're trying to make it do stuff that is too complex.
ghosty141 346 days ago [-]
But for this boilerplate, creating a few snippets in your editor generally works better. Especially if things change, you don't have to retrain your model.
That's my main problem: for trivial things it works but isn't much better than conventional tools, and for hard things it just produces incorrect code, such that writing it from scratch barely makes a difference.
motoxpro 346 days ago [-]
I think that's a great analogy.
What would it look like if I could have 300-500 snippets instead of 30? Those 300 are things that I do all over my codebase, e.g. the same basic where query but in the context of whatever function I am in, a click handler with the correct types for that purpose, etc.
There is no way I can have enough hotkeys or memorize that much, and I truly can't type faster than I can hit tab.
I don't need it to think for me. Most coding (front-end/back-end web) involves typing super basic stuff, not writing complex algorithms.
This is where the 10-20% speed-up comes in. On average I am just typing 20% faster by hitting tab.
globular-toast 346 days ago [-]
Were people seriously writing this boilerplate by hand up until this point? I started using snippets and stuff more than 15 years ago!
ausbah 347 days ago [-]
i would be way more impressed if LLMs could do code compression. more code == more things that can break, and when llms can generate boatloads of it with a click you can imagine what might happen
Scene_Cast2 347 days ago [-]
This actually sparked an idea for me. Could code complexity be measured as cumulative entropy as measured by running LLM token predictions on a codebase? Notably, verbose boilerplate would be pretty low entropy, and straightforward code should be decently low as well.
jeffparsons 347 days ago [-]
Not quite, I think. Some kinds of redundancy are good, and some are bad. Good redundancy tends to reduce mistakes rather than introduce them. E.g. there's lots of redundancy in natural languages, and it helps resolve ambiguity and fill in blanks or corruption if you didn't hear something properly. Similarly, a lot of "entropy" in code could be reduced by shortening names, deleting types, etc., but all those things were helping to clarify intent to other humans, thereby reducing mistakes. But some is copy+paste of rules that should be enforced in one place. Teaching a computer to understand the difference is... hard.
Although, if we were to ignore all this for a second, you could also make similar estimates with, e.g., gzip: the higher the compression ratio attained, the more "verbose"/"fluffy" the code is.
Fun tangent: there are a lot of researchers who believe that compression and intelligence are equivalent or at least very tightly linked.
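The gzip estimate mentioned above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `compression_ratio` and both sample strings are made up for the example, and the ratio is a crude proxy for "fluffiness", not an established complexity metric.

```python
import gzip
import random
import string


def compression_ratio(source: str) -> float:
    """Original size divided by gzip-compressed size (higher = more redundant)."""
    raw = source.encode("utf-8")
    if not raw:
        return 1.0  # nothing to compress; treat as neutral
    return len(raw) / len(gzip.compress(raw))


# Hypothetical inputs: highly repetitive "boilerplate" vs. pseudo-random text.
boilerplate = "self.value = value\n" * 200
noisy = "".join(random.Random(0).choices(string.ascii_letters, k=3800))

print(f"boilerplate ratio: {compression_ratio(boilerplate):.1f}")
print(f"noisy ratio:       {compression_ratio(noisy):.1f}")
```

As the comment notes, a high ratio does not say whether the redundancy is the helpful kind (names, types) or the harmful kind (copy-pasted rules).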
8note 347 days ago [-]
Interpreting this comment, it would predict low complexity for code copied unnecessarily.
I'm not sure though. If it's copied a bunch of times, and it actually doesn't matter because each usecase of the copying is linearly independent, does it matter that it was copied?
Over time, you'd still see copies being changed by themselves show up as increased entropy
david-gpu 346 days ago [-]
> Could code complexity be measured as cumulative entropy as measured by running LLM token predictions on a codebase? Notably, verbose boilerplate would be pretty low entropy, and straightforward code should be decently low as well.
WinRAR can do that for you quite effectively.
malfist 346 days ago [-]
Code complexity can already be measured deterministically with cyclomatic complexity. No need to use an AI fuzzy logic at this. Especially when they're bad at math.
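For reference, a minimal sketch of counting cyclomatic complexity deterministically, assuming Python source and the standard `ast` module. It approximates the metric as one plus the number of decision points; real tools differ in which constructs they count, and `cyclomatic_complexity` and `SAMPLE` are names invented for this example.

```python
import ast

# Node types treated as one decision point each (an approximation).
_BRANCH_NODES = (ast.If, ast.For, ast.AsyncFor, ast.While,
                 ast.ExceptHandler, ast.IfExp)


def cyclomatic_complexity(source: str) -> int:
    """Approximate cyclomatic complexity: 1 + number of decision points."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds two extra paths
            complexity += len(node.values) - 1
        elif isinstance(node, ast.comprehension):
            # one for the loop, one per `if` filter
            complexity += 1 + len(node.ifs)
    return complexity


SAMPLE = '''
def classify(n):
    if n < 0:
        return "negative"
    for d in range(2, n):
        if n % d == 0 and d > 1:
            return "composite"
    return "other"
'''

print(cyclomatic_complexity(SAMPLE))
```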
contravariant 346 days ago [-]
There's nothing fuzzy about letting an LLM determine the probability of a particular piece of text.
In fact it's the one thing they are explicitly designed to do, the rest is more or less a side-effect.
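To make the "probability of a piece of text" idea concrete, here is a small sketch of cumulative surprisal, i.e. the sum of -log2(p) over the probability assigned to each actual token. A toy adaptive bigram model stands in for the LLM (an assumption made purely so the example runs without a model); all function names and sample texts are hypothetical.

```python
import math
from collections import Counter, defaultdict


def surprisal_bits(probabilities) -> float:
    """Sum of -log2(p) over the probability given to each observed token."""
    return sum(-math.log2(p) for p in probabilities)


def bigram_probabilities(tokens, vocab_size):
    """Toy stand-in for an LLM: adaptive bigram model with add-one smoothing.

    Each token is scored before the model is updated with it.
    """
    counts = defaultdict(Counter)
    prev = None
    for tok in tokens:
        seen = counts[prev]
        yield (seen[tok] + 1) / (sum(seen.values()) + vocab_size)
        seen[tok] += 1
        prev = tok


def bits_per_token(text: str) -> float:
    tokens = text.split()
    vocab_size = len(set(tokens))
    return surprisal_bits(bigram_probabilities(tokens, vocab_size)) / len(tokens)


repetitive = "return self . value ; " * 50          # boilerplate-like
varied = " ".join(f"name{i}" for i in range(250))    # every token is new

print(f"repetitive: {bits_per_token(repetitive):.2f} bits/token")
print(f"varied:     {bits_per_token(varied):.2f} bits/token")
```

With a real LLM the per-token probabilities would come from the model's output distribution instead of the bigram counts; the summation stays the same.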
ks2048 347 days ago [-]
I agree. It seems like counting lines of generated code is like counting bytes/instructions of compiled code - who cares? If “code” becomes prompts, then AI should lead to much smaller code than before.
I’m aware that the difference is that AI-generated code can be read and modified by humans. But that quantity is bad because humans have to understand it to read or modify it.
TZubiri 347 days ago [-]
What's that line about accounting for lines of code on the wrong side of the balance sheet?
latexr 347 days ago [-]
> If “code” becomes prompts, then AI should lead to much smaller code than before.
What’s the point of shorter code if you can’t trust it to do what it’s supposed to?
I’ll take 20 lines of code that do what they should consistently over 1 line that may or may not do the task depending on the direction of the wind.
AlexandrB 347 days ago [-]
Exactly this. Code is a liability, if you can do the same thing with less code you're often better off.
EasyMark 347 days ago [-]
Not if it’s already stable and has been running for years. Legacy doesn’t necessarily mean “need replacement because of technical debt”. I’ve seen lots of people want to replace code that has been running basically bug free for years because “there are better coding styles and practices now”
8note 347 days ago [-]
How would it know which edge cases are being useful and which ones aren't?
I understand more code as being more edge cases
wvenable 346 days ago [-]
More code could just be useless code that no longer serves any purpose but still looks reasonable to the naked eye. An LLM can certainly figure out and suggest maybe some conditional is impossible given the rest of the code.
It can also suggest alternatives, like using existing library functions for things that might have been coded manually.
ekwav 346 days ago [-]
Or just refactor to use early returns
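A small illustration of the early-return refactor, as a sketch only. The order-shipping scenario and both function names are hypothetical; the two functions are meant to behave identically, with the second flattening the nesting by handling failure cases first.

```python
def ship_nested(order):
    """Nested version: the happy path is buried three levels deep."""
    if order is not None:
        if order.get("paid"):
            if order.get("items"):
                return "shipped"
            else:
                return "empty order"
        else:
            return "unpaid"
    else:
        return "no order"


def ship_early(order):
    """Same behavior with early returns: each guard exits immediately."""
    if order is None:
        return "no order"
    if not order.get("paid"):
        return "unpaid"
    if not order.get("items"):
        return "empty order"
    return "shipped"
```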
asah 347 days ago [-]
meh - the LLM code I'm seeing isn't particularly more verbose. And as others have said, if you want tighter code, just add that to the prompt.
fun story: today I had an LLM write me a non-trivial perl one-liner. It tried to be verbose but I insisted and it gave me one tight line.
randomNumber7 346 days ago [-]
I cannot imagine this to be true, cause imo current LLMs' coding abilities are very limited. It definitely makes me more productive to use it as a tool, but I use it mainly for boilerplate and short examples (where I had to read some library documentation before).
Whenever the problem requires thinking, it horribly fails because it cannot reason (yet). So unless this is also true for google devs, I cannot see that 25% number.
Wheatman 346 days ago [-]
My guess is that they counted each line of code made by an engineer using AI coding tools.
Besides, even google employees write a lot of boilerplate, especially android IIRC, not to mention simple but essential code, so AI can prevent carpal tunnel for the junior devs working on that.
zifpanachr23 346 days ago [-]
Roughly only one quarter (assuming they are outputting similar amounts of code as non AI using engineers) of engineers actually using AI regularly for coding is a statistic that is actually believable to me based on my own experience. A lot of small teams have their "AI guy" who has drunk the kool aid, but it's not as widespread as HackerNews would make you think.
chrisjj 346 days ago [-]
> My guess is that they counted each line of code made by an engineer using AI coding tools.
... and forgot to count the Delete presses.
jdefr89 345 days ago [-]
80% or more of the code you write day to day is just grunt work. Boring code that has, for the most part, already been written in some form such that it was copied from Google or StackOverflow. AI is basically a shortcut to using that stuff..
d_burfoot 345 days ago [-]
I'd be far more impressed if the CEO said "The AI deleted a quarter of our company's code".
zh3 345 days ago [-]
Yes, like the old story about why not to measure productivity by LoC generated.
Everyone here is arguing about the average AI code quality and I'm here just not believing the claim.
Is Google out there monitoring the IDE activity of every engineer, logging the amount of code created, by what, lines, characters, and how it was generated? Dubious.
Jyaif 346 days ago [-]
> Is Google out there monitoring the IDE activity of every engineer, logging the amount of code created, by what, lines, characters, and how it was generated
A good chunk (*) of their code goes in a centralized repo, and is written via a centralized web IDE. So measuring everything you mentioned is not hard.
(*) Android, Chrome, and other similar projects are exceptions.
avsteele 346 days ago [-]
How does this allow them to measure the % generated by AI tooling?
Jyaif 345 days ago [-]
The IDE integrates the AI generator, like copilot.
Yes, they'll miss AI-generated code that is copy pasted, so they only have a lower bound of AI-generated code.
kunley 346 days ago [-]
Very good point. How was the 25% measured?
sbochins 346 days ago [-]
It’s probably code that was previously machine generated that they’re now calling “AI Generated”.
frank_nitti 346 days ago [-]
That would make sense and be a good use case, essentially doing what OpenAPI generators do (or Yeoman generators of yore), but less deterministic I’d imagine. So optimistically I would guess it covers ground that isn’t already solved by mainstream tools.
For the example of generating an http app scaffolding from an openapi spec, it would probably account for at least 25% of the text in the generated source code. But I imagine this report would conveniently exclude the creation of the original source yaml driving the generator — I can’t imagine you’d save much typing (or mental overhead) trying to prompt a chatbot to design your api spec correctly before the codegen
xen0 345 days ago [-]
I really do wonder who these engineers are, that the current 'AI' tools are able to write so much of their code.
Maybe my situation is unusual; I haven't written all that much code at Google lately, but what I do write is pretty tied to specific details of the program and the AI auto completion is just not that useful. Sometimes it auto completes a method signature correctly, but it never gets the body right (or even particularly close).
And it routinely making up methods or fields on objects I want to use is counterproductive.
arethuza 346 days ago [-]
I'm waiting for some Google developer to say "More than a quarter of the CEO's statements are now created by AI"... ;-)
freilanzer 346 days ago [-]
I'd say most CEO statements are quite useless already, as they're mostly corporate newspeak.
prmoustache 346 days ago [-]
Aren't we just talking about auto completion?
In that case those 25% are probably the very same 25% that were automatically generated by LSP-based auto-completion.
alienchow 346 days ago [-]
When setting up unit tests traditionally took more time and LOC than the logic itself, LLMs are particularly useful.
1. Paste in my actual code.
2. Prompt: Write unit tests, test tables. Include scenarios: A, B, C, D, E. Include all other scenarios I left out, isolate suggestions for review.
I used to spend the majority of the coding time writing unit tests and mocking test data, now it's more like 10%.
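For readers unfamiliar with the "test tables" mentioned in the prompt above, here is a minimal sketch of a table-driven unit test using Python's standard `unittest` module. The function under test (`slugify`) and the scenarios are hypothetical placeholders, not anything from the commenter's codebase.

```python
import unittest


def slugify(title: str) -> str:
    """Hypothetical function under test: lowercase words joined by hyphens."""
    return "-".join(title.lower().split())


class SlugifyTableTest(unittest.TestCase):
    # Each row: (scenario name, input, expected output)
    CASES = [
        ("simple", "Hello World", "hello-world"),
        ("extra whitespace", "  Hello   World  ", "hello-world"),
        ("single word", "Hello", "hello"),
        ("empty", "", ""),
    ]

    def test_table(self):
        for name, title, expected in self.CASES:
            # subTest reports each failing row separately
            with self.subTest(name=name):
                self.assertEqual(slugify(title), expected)
```

Saved in a file, this could be run with `python -m unittest`. Adding a scenario is one more row in `CASES`, which is why this shape suits both generated and hand-reviewed test cases.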
arkh 346 days ago [-]
> Paste in my actual code.
> Prompt: Write unit tests
TDD in shambles. What you'd like is:
> Give your specs to some AI
> Get a test suite generated with all edge cases accounted for
> Code
alienchow 346 days ago [-]
Matter of preference. I've found TDD to be inflexible for my working style. But your suggestion would indeed work for a staunch TDD practitioner.
lysace 346 days ago [-]
GitHub Copilot had an outage for me this morning. It was kind of shocking. I now believe this metric. :-)
I'll be looking into ways of running a local LLM for this purpose (code assistance in VS Code). I'm already really impressed with various quite large models running on my 32 GB Mac Studio M2 Max via Ollama. It feels like having a locally running ChatGPT.
evoke4908 346 days ago [-]
Ollama, docker and "open webui".
It immediately works out of the box and that's it. I've been using local LLMs on my laptop for a while, it's pretty nice.
The only thing you really need to worry about is VRAM. Make sure your GPU has enough memory to run your model and that's pretty much it.
Also "open webui" is the worst project name I've ever seen.
kulahan 346 days ago [-]
I'm very happy to hear this; maybe it's finally time to buy a ton of ram for my PC! A local, private LLM would be great. I'd try talking to it about stuff I don't feel comfortable being on OpenAI's servers.
lysace 346 days ago [-]
Getting lots of ram will let you run large models on the CPU, but it will be so slow.
The Apple Silicon Macs have this shared memory between CPU and GPU that lets the GPU (relatively underpowered compared to a decent Nvidia GPU) run these models at decent speeds, compared with a CPU, when using llama.cpp.
This should all get dramatically better/faster/cheaper within a few years, I suspect. Capitalism will figure this one out.
kulahan 346 days ago [-]
Interesting, so this is a Mac-specific solution? That's pretty cool.
I assume, then, that the primary goal would be to drop in the beefiest GPU possible when on windows/linux?
evilduck 346 days ago [-]
There's nothing Mac specific about running LLMs locally, they just happen to be a convenient way to get a ton of VRAM in a single small power efficient package.
In Windows and Linux, yes you'll want at least 12GB of VRAM to have much of any utility but the beefiest consumer GPUs are still topping out at 24GB which is still pretty limiting.
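A back-of-the-envelope sketch of why 24GB is limiting: memory for the weights alone is roughly parameter count times bytes per parameter. The function names are made up for this example, and the `overhead` factor for context/KV cache is an assumed fudge factor, not a measured value; real usage depends on the runtime, quantization format, and context length.

```python
def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate size of the model weights in GB (1 GB = 10**9 bytes)."""
    return params_billions * bits_per_param / 8


def fits(params_billions: float, bits_per_param: int, vram_gb: float,
         overhead: float = 1.2) -> bool:
    """Rough check; `overhead` is an assumed allowance beyond the weights."""
    return weight_memory_gb(params_billions, bits_per_param) * overhead <= vram_gb


for params, bits in [(7, 16), (7, 4), (70, 16), (70, 4)]:
    size = weight_memory_gb(params, bits)
    print(f"{params}B @ {bits}-bit: ~{size:.1f} GB weights, "
          f"fits in 24 GB: {fits(params, bits, 24)}")
```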
lysace 346 days ago [-]
With Windows/Linux I think the issue is that NVidia is artificially limiting the amount of onboard RAM (they want to sell those devices for 10x more to openai, etc) and that AMD for whatever reason can't get their shit together.
I'm sure that there are other much more knowledgeable people here though, on this topic.
rustcleaner 345 days ago [-]
This is why the DMCA must be repealed.
makerofthings 346 days ago [-]
I keep trying to use these things but I always end up back in vim (in which I don't have any ai autocomplete set up.)
The AI is fine, but every time it makes a little mistake that I have to correct, it really breaks my flow. I might type a lot more boilerplate without it, but I get better flow, and overall that saves me time with fewer mistakes.
rcarmo 347 days ago [-]
There is a running gag among my friends using Google Chat (or whatever their corporate IM tool is now called) that this explains a lot of what they’re experiencing while using it…
tdeck 346 days ago [-]
I didn't know anyone outside Google actually used that...
skywhopper 346 days ago [-]
All this means is that 25% of code at Google is trivial boilerplate that would be better factored out of their process rather than tasking inefficient LLM tools with. The more they are willing to leave the “grunt work” to an LLM, the less likely they are to ever eliminate it from the process.
hollowturtle 346 days ago [-]
Sometimes I wonder why we would want LLMs to spit out human-readable code. Wouldn’t it be a better future if LLMs generated highly efficient machine code and eventually we read the “source map” for debugging? Wasn’t source code just for humans?
sparcpile 346 days ago [-]
You just reinvented the compiler.
palata 346 days ago [-]
Because you can't trust what the LLM generates, so you have to read it. Of course the question then is whether you can trust your developer or not.
hollowturtle 346 days ago [-]
I’d rather reply that LLMs just aren’t capable of that. They’re okay with Python and JS simply because there’s a lot of training data out in the open. My point was that it seems like we’re delegating the future to tools that could generate critical code using languages originally thought to be easy to learn... it doesn’t make sense.
mattxxx 346 days ago [-]
I think they spit out human-readable code because they've been trained on human authors.
But you make an interesting point: eventually AI will be making code for other AIs + machines, and human verification can be an afterthought.
standardUser 346 days ago [-]
I use it all the time for work. Not much for actual code that goes into production, but a lot for "tell me what this does" or "given x, how do I do y". It speeds me up a ton. I'll also have it do code review when I'm uncertain about something, asking if there are any bugs or inefficiencies in a given chunk of code. I've actually found it to be more reliable about code than more general topics. Though I'm using it in a fairly specific way with code, versus asking for deep information about history for example, where it frequently gets facts very wrong.
redbell 346 days ago [-]
Wait a second—didn't Google warn its employees against using AI-generated code? (https://news.ycombinator.com/item?id=36399021). What had changed?! Has Gemini now surpassed Bard in capabilities? Did they manage to resolve the copyright issues? Or maybe they've noticed a boost in productivity? I'm not sure, but let’s see if other big tech companies would follow this path.
KeplerBoy 346 days ago [-]
Different audiences.
You tell investors that AI is freaking magic and going to usher in an age of savings and productivity gains.
You tell your developers that it's a neat autocomplete they should use carefully.
SavageBeast 346 days ago [-]
Google needs to bolster their AI story and this is good click bait. I'm not buying it personally.
hggigg 346 days ago [-]
I reckon he’s talking bollocks. Same as IBM was when it was about to disguise layoffs as AI uplift and actually just shovelled the existing workload on to other people.
submeta 346 days ago [-]
Pandora's box has been opened.
Some say "this is mere tab completion", some say "it won't replace the senior engineer."
I can remember how many fiercely argued 2 years ago that GenAI and Copilot are producing garbage. But here we are: These systems improve the workflow of creating / editing code enormously. You seniors might not be affected, but there are endlessly many scenarios where it replaces the junior who'd write code to transform data, write scripts, write one-off scripts, or even write boilerplate, test code and what not.
And this is only after a short time. I cannot even imagine what we'll have ten years from now, when we can probably have much larger context windows where the system can "understand" the whole code base, not just parts.
I am sorry for low level engineering jobs, but I am super excited as well.
With GenAI I have been writing super complex Elisp code to automate workflow in Emacs, or VBA scripts in Excel, or Bash scripts I wouldn't have otherwise been able to write, or JavaScript, or quickly write Python code to solve very tricky problems (and I am very high level in Python), or even React code for web apps for my personal use.
The future looks exciting to me.
Capricorn2481 345 days ago [-]
> I can remember how many fiercely argued 2 years ago that GenAI and Copilot are producing garbage. But here we are: These systems improve the workflow of creating / editing code enormously
This is the disconnect. I, along with others, haven't seen this yet. I'm begging to see it because I'd love to automate my work away, but I can't. This comment comes off as hand-wavy to me because it says "here we are" as if Google saying their AI works is evidence itself and not a statement that requires evidence.
gmm1990 346 days ago [-]
I don’t fully understand the workflow where you hand boilerplate code off to a junior; wouldn’t the communication overhead be higher than writing it yourself? Certainly LLMs have valid uses, but I see them improving junior productivity more than senior productivity.
LudwigNagasena 346 days ago [-]
How much of that generated code is `if err != nil { return err }`?
ken47 341 days ago [-]
Without context, not very meaningful. Does this simply measure lines of code? Characters written? Is it “oversuggesting” code that it shouldn’t be confident in? Does this code make it into production or is a large percentage of it fixed by humans at great cost?
Google, and really, the whole financial machine has a vested interest in playing up the potential of AI. Unfortunate that it isn’t being given time to grow organically.
yearolinuxdsktp 346 days ago [-]
Of course when so much of it is written in verbose-as-fuck languages like Java and Go, you’d be stupid not to let computers generate large chunks of it. It’s sad, we as humans stopped trying to do better at better coding languages. At least Java is slowly making progress; maybe in another 10 years, it will finally become a high level language. Go never tried to be one. You surprised you need AI to tab complete your boilerplate?!
Financial incentives at large companies are not aligned with low volumes of code. There are no rewards for less code. People get rewarded for another bullshit framework to slap on their resume. Box me in, no, cube me in to a morass of a thick ingress layer, that uses 1/8th of my CPU.
holtkam2 346 days ago [-]
Can we also see the stats for how much code used to come from StackOverflow? Probably 25%
tgtweak 346 days ago [-]
I feel like, given my experience lately with all the API models currently available, that this is only a fact if the models google is using internally are SIGNIFICANTLY better than what is available publicly even on closed models.
Claude 3.5-sonnet (latest) is barely able to stay coherent on 500 LOC files, and easily gets tripped up when there are several files in the same directory.
I have tried similarly with o1-preview and 4o, and gemini pro...
If google is using a 5M token context window LLM with 100k+ token-output trained on all the code that is not public... then I can believe this claim.
This just goes to show how critical of an issue this is that these models are behind closed doors.
nomel 346 days ago [-]
> This just goes to show how critical of an issue this is that these models are behind closed doors.
How is competitive advantage, using in-house developed/funded tools, a critical issue? Every company has tools that only they have, that they pay significantly for to develop, and use extensively. It can often be the primary thing that really differentiates companies who are all doing similar things.
thelittleone 346 days ago [-]
I understand CEOs need to promote their companies, but it's notable that Google - arguably the world's leading information technology company - fell behind in AI development under Pichai's leadership. Now he's touting Google's internal capabilities, yet Gemini is being outperformed by relative newcomers like Anthropic and OpenAI.
His position seems secure despite these missteps, which highlights an interesting double standard: there appears to be far more tolerance for strategic failures at the CEO level compared to the rigorous performance standards expected of engineering staff.
jdefr89 345 days ago [-]
To be fair the paper that helped launch LLMs to a new level was from Google. “Attention Is All You Need”, Keras… They fell behind when it comes to marketing AI maybe…
mjhay 346 days ago [-]
100% of Sundar Pichai could be replaced by an AI.
elzbardico 346 days ago [-]
Well. When I developed in Java, I think that Eclipse did similar figures circa 2005.
sreitshamer 345 days ago [-]
Software development isn't a code-production activity, it's a knowledge-acquisition activity. It involves refactoring and deleting code too. I guess the AI isn't helping with that?
deterministic 346 days ago [-]
Not impressed. I currently auto generate 90% or more of the code I need to implement business solutions. With no AI involved. Just high level declarations of intent auto translated to C++/Typescript/…
syngrog66 345 days ago [-]
> "and we continue to be laser-focused on building great products."
NO! False. I can confirm they are not. I've known of several major obvious unfixed bugs/flaws in Google apps for years. And in the last year or so especially there's been an explosion in the number of head-scratching, jaw-dropping fails and UX anti-patterns in their code. GMail, Search, Maps and Android are now riddled with them.
on Sundar Pichai's watch he's been devolving Google to be yet another Microsoft type in terms of quality, care and taste.
agomez314 346 days ago [-]
I thought great engineers reduce the amount of new code in a codebase?
jeffbee 346 days ago [-]
It's quite amusing to me because I am old enough to remember when Copilot emerged the HN mainthought was that it was the death sentence for big corps, the scrappy independent hacker was going to run circles around them. But here we see the predictable reality: an organization that is already in an elite league in terms of developer velocity gets more benefit from LLM code assistants than Joe Hacker. These technologies serve to entrench and empower those who are already enormously powerful.
twis 346 days ago [-]
How much code was "written by" autocomplete before LLMs came along? From my experience, LLM integration is advanced autocomplete. 25% is believable, but misleading.
scottyah 346 days ago [-]
My linux terminal tab-complete has written 50% of my code
blibble 346 days ago [-]
this is the 2024 version of "25% of our code is now produced by outsourced resources"
arminiusreturns 346 days ago [-]
I was a luddite about the generative LLMs at first, as a crusty sysadmin type.
I came around and started experimenting. It's been a boon for me.
My conclusion is that we are at the first wave of a split between those who use LLMs to augment their abilities and knowledge, and those who delay. In cyberpunk terminology, it's aug-tech, not real AGI. (And the lesser one's coding abilities and the simpler the task, the more benefit; it's an accelerator.)
skatanski 346 days ago [-]
I think at this moment, this sounds more like "a quarter of the company's new code is created using stackoverflow and other forums". Many many people use all these tools to find information, as they did using stackoverflow a month ago, but now suddenly we can call it "created by AI". It'd be nice to have a distinction. I'm saying this, while being very excited about using LLMs as a developer.
jmartin2683 346 days ago [-]
I’m gonna bet this is a lie.
freedomben 346 days ago [-]
I don't think it's a lie, but I do think it's very misleading. With common languages probably 25% of code can be generated by an AI, but IME it's mostly just boilerplate or some pattern that largely just saves typing time, not programming/thinking time. In other words it's the 25% lowest hanging fruit, so thinking like "1/4 of programming is now done by AI" is misleading. It's probably more like 5 to 10 percent.
hsuduebc2 346 days ago [-]
I believe it is absolutely suitable for generating controllers in Java Spring or connecting to a database and making a simple query, which from my experience as an ordinary enterprise developer in Fintech is most of the job. Making these huge applications is a lot of repetitive work and integrations. Not work that usually requires some advanced logic.
sanj 345 days ago [-]
Caveat: I formerly worked at Google.
What's missing is that code being written by AI may have less of an impact than datasets that are developed or refined by AI. Consider examples like a utility function's coefficients, or the weights of a model.
As these are aggressively tuned using ML feedback, they'll influence far more systems than raw code.
nenadg 346 days ago [-]
Internet random person (me) says more than 99% of Google's 25%+ code written by AI has already been written by humans.
baalimago 345 days ago [-]
To me, programming assistants have two usecases:
1. Generate unit tests for modules which are already written to be tested
2. Generate documentation for interfaces
Both of these require quite deep knowledge in what to write, then it simply documents and fills in the blanks using the context which already has been laid out.
agilob 346 days ago [-]
So we're using LoC as a metric now?
piyuv 345 days ago [-]
I wish Tim Cook would reply with “more than half of all iMessages are created with autocomplete”
How do Google's IP lawyers feel about a quarter of the company's code not being copyrightable?
sjs382 345 days ago [-]
This was one of my first thoughts, too. In what ways can this contaminate their codebase? What if they use AI to add uncopyrightable code to GPL projects?
horns4lyfe 346 days ago [-]
I’d bet at least a quarter of their code is class definitions, constructors, and all the other minutiae files required for modern software, so that makes sense. But people weren’t writing most of that before either, we’ve had autocomplete and code gen for a long time.
ThinkBeat 346 days ago [-]
This is quite interesting to know.
I will be curious to see if it has any impact positive or negative
over a couple of years.
Will the code be more secure since the AI does not make the mistakes
humans do?
Or will the code, not well enough understood by the employees, expose
exploits that would not be there?
Will it change average up time?
kunley 346 days ago [-]
what makes you think that current direction of AI development would lead to making fewer mistakes than humans do, as opposed to repeating the same mistakes plus hallucinating more?
Starlevel004 346 days ago [-]
No wonder search barely works anymore
tabbott 346 days ago [-]
Without a clear explanation of methodology, this is meaningless. My guess is this statistic is generated using misleading techniques like classifying "code changes generated by existing bulk/automated refactoring tools" as "AI generated".
mastazi 346 days ago [-]
The auto-linter in my editor probably generates a similar percentage of the characters I commit.
nine_zeros 347 days ago [-]
Writing more code means more needs to be maintained and they are cleverly hiding that fact. Software is a lot more like complex plumbing than people want to admit:
More lines == more shit to maintain. Complex lines == the shit is unmanageable.
But wall street investors love simplistic narratives such as More X == More revenue. So here we are. Pretty clever marketing imo.
davidclark 346 days ago [-]
If I tab complete my function and variable symbols, does my lsp write 80%+ of my lines of code?
_spduchamp 346 days ago [-]
I can ask AI to generate the same code multiple times, and get new variations on programming style each time, and get the occasional solution that is just not quite right but sort of works. Sounds like a recipe for a gloppy mushy mess of style salad.
mjbale116 347 days ago [-]
If you manage to convince software engineers that you are doing them a favour by employing them then they will approach any workplace negotiations with a specific mindset which will make them grab the first number that gets thrown at them.
These statements are brilliant.
akira2501 346 days ago [-]
These statements rely on an unchallenged monopoly position. This is not sustainable. These statements will hasten the collapse.
hiptobecubic 346 days ago [-]
I've had mixed results writing "normal" business logic in c++, but i gotta say, for SQL it's pretty incredible. Granted SQL has a lot of boilerplate and predictable structure, but it saves a ton of time honestly.
echoangle 346 days ago [-]
Does protobuf count as AI now?
Terr_ 346 days ago [-]
My concern is that "frequently needed and immediately useful results" is strongly correlated to "this code should already be abstracted away into a library by now."
Search Copy-Paste as a Service is hiding a deeper issue.
fredgrott 346 days ago [-]
Kind of useless stat given how much code a typical dev refactors....
zxilly 346 days ago [-]
As a Go developer, Copilot writes 100% of "if err != nil" for me
Kiro 346 days ago [-]
I find it interesting that the people who dismiss the utility of AI are being so aggressive, sarcastic and hateful about it. Why all the anger? Where's the curiosity?
oglop 346 days ago [-]
No surprise. I give my career about 2 years before I’m useless.
k4rli 346 days ago [-]
Seems just overhyped tech to push up stock prices. It was already claimed 2 years ago that half of the jobs would be taken by "AI" but barely any have and AI has barely improved since GPT3.5. Latest Anthropic is only slightly helpful for software development, mostly for unusual bug investigations and logs analysis, at least in my experience.
phi-go 346 days ago [-]
They still need someone to write 75% of the code.
cebert 346 days ago [-]
Did AI have to go thru several rounds of Leetcode interviews?
> More than a quarter of new code created at Google is generated by AI, said CEO Sundar Pichai...
How do they know this? At face value, it sounds like a lot, but it only says "new code generated". Nothing about code making it into source control or production, or even which parts of Google's vast business units.
For all we know, this could be the result of some internal poll "Tell us if you've been using Goose recently" or some marketing analytics on the Goose "Generate" button.
It's a puff piece to put Google back in the limelight, and everyone is lapping it up.
wokkaflokka 346 days ago [-]
No wonder their products are getting worse and worse...
me551ah 346 days ago [-]
AI has boosted my productivity but only marginally. Earlier I used to copy paste stuff from stackoverflow and now AI generates that for me.
teknopaul 346 days ago [-]
I'd say the same. But 90% of my time is spent not writing code. It is mostly time wasted with GitHub and k8s build issues.
okokwhatever 345 days ago [-]
People still don't understand those who pay the bills are those who claim developers are less and less necessary. It doesn't matter how much we love our job and how much we care for quality, at the end those who pay take more care of reducing workforce for something potentially free or cheap.
We are less needed, less cared for and less seen as engineers. We are just a cost in a wrong column of QuickBooks.
Get used to it.
tremorscript 345 days ago [-]
Sounds about right and it explains a lot about the current quality of google products and google search. :-)
gilfoyle 345 days ago [-]
This is like saying more than a quarter of the code is from oss, examples and stackoverflow before LLMs.
meindnoch 346 days ago [-]
I saw code on master which was parsing HTML with regex. The author was proud that this code was mostly generated by AI.
:)
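For readers wondering why regex-based HTML parsing raises eyebrows, here is a minimal sketch using Python's standard `html.parser`. The sample markup, the `LinkCollector` class and the naive pattern are all made up for illustration; they are not taken from the code mentioned above.

```python
import re
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags using a real HTML tokenizer."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def links_via_parser(html):
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


# A naive pattern (hypothetical): it assumes lowercase tags, double quotes,
# and href as the first and only attribute.
NAIVE_HREF = re.compile(r'<a href="([^"]*)">')


def links_via_regex(html):
    return NAIVE_HREF.findall(html)


SAMPLE = '<A class="x" HREF=\'/one\'>1</A> <a href="/two">2</a>'

print(links_via_parser(SAMPLE))
print(links_via_regex(SAMPLE))
```

The parser finds both links, while the naive pattern silently misses the first one because of the uppercase tag, the extra attribute and the single quotes.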
matt3210 345 days ago [-]
NVIDIA CEO said there would be no more developers too and it totally wasn't a marketing thing.
nottorp 346 days ago [-]
The protobuf boilerplate, right? :)
erlend_sh 345 days ago [-]
Self-interested hyperbole aside, I think that’s a laughably low number for what is now effectively an ‘AI Company’. I’m sure >95% of Google employees use Google (well, at least until recent years).
If this stuff really works as well as these companies claim it does, wouldn’t their entire workforce excitedly be using these tools already?
flessner 345 days ago [-]
"AI generated code" essentially means using Github Copilot or an alternative - these barely write a function without errors, nor are they even close to implementing a new feature autonomously.
I expect these tools to improve productivity for new-ish developers, however for anyone that is literate in a programming language the effect is marginal at best ("Copilot pause" etc.)
hollywood_court 345 days ago [-]
Cursor and v0.dev write 95% of the code for myself and the two other devs on my team.
chabes 346 days ago [-]
When Google announced their big layoffs, I noted the timing in relation to some big AI announcements. People here told me I was crazy for suggesting that corporations could replace employees with AI this early. Now the CEO is confirming that more than a quarter of new code is created by AI. Can’t really deny that reality anymore folks.
hbn 346 days ago [-]
I'd suggest the bigger factor in those layoffs is the money was made in earlier covid years where money was flowing and everyone was overhiring to show off record growth, then none of those employees had any justification for being kept around and were just a money sink so they fired them all.
Not to mention Elon publicly demonstrated losing 80% of staff when he took over Twitter and - you can complain about his management all you want - as someone who's been using it the whole way through, from a technical POV their downtimes and software quality have not been any worse and they're shipping features faster. A lot of software companies are overstaffed, especially Google who has spent years paying people to make projects just to get a PO promoted, then letting the projects rot and die to be replaced by something else. That's a lot of useless work being done.
akira2501 346 days ago [-]
> Can’t really deny that reality anymore folks.
You have to establish that the CEO is actually aware of the reality and is interested in accurately conveying that to you. As far as I can tell there is absolutely no reason to believe any part of this.
paradox242 346 days ago [-]
When leaders without the requisite technical knowledge are making decisions then the question of whether AI is capable of replacing human workers is orthogonal to the question of whether human workers will be replaced by AI.
robohoe 346 days ago [-]
Who claims that he is speaking the truth and not some marketing jargon?
randomNumber7 346 days ago [-]
People who have replaced 25% of their brain with ai.
foobarian 346 days ago [-]
The real question is, what fraction of the company’s code is deleted by AI :-)
bryanrasmussen 346 days ago [-]
Public says more than a quarter of Google's search results are absolute crap.
Timber-6539 346 days ago [-]
All this talk means nothing until Google gives AI permissions to push to prod.
silexia 342 days ago [-]
Is this why Google search results are so bad now?
marstall 346 days ago [-]
maps with recent headlines about AI improving programmer productivity 20-30%.
which puts it in line with previous code-generation technologies i would imagine. I wonder which of these increased productivity the most?
- Assembly Language
- early Compilers
- databases
- graphics frameworks
- ui frameworks (windows)
- web apps
- code generators (rails scaffolding)
- genAI
akira2501 346 days ago [-]
Early Compilers. By a wide margin. They are the enabling factor for everything that comes below it. It's what allows you to share library interfaces and actually use them in a consistent manner and across multiple architectures. It entirely changed the shape of software development.
The gap between "high level assembly" and "compiled language" is about as large as it gets.
soperj 346 days ago [-]
The real question is how many lines of code was it responsible for removing.
haccount 346 days ago [-]
No wonder Gemini is a garbage fire if they had ChatGPT write the code for it.
1GZ0 346 days ago [-]
I wonder how much of that code is boilerplate vs. actual functionality.
defactor 346 days ago [-]
Try any AI tool to write basic factor code. It hallucinates most of the time.
otabdeveloper4 346 days ago [-]
That explains a lot about Google's so-called "quality".
zxvkhkxvdvbdxz 346 days ago [-]
I feel this made me lose the respect I still had for Google
niobe 346 days ago [-]
This explains a LOT about Google's quality decline.
mgaunard 346 days ago [-]
AI is pretty good at helping you manage a messy large codebase and making it even more messy and verbose.
Is that a good thing though? We should work on making code small and easy to manage without AI tools.
socrateslee 345 days ago [-]
or saying that most of the Google engineers are using tools like copilot, and they use the copilot just as everyone else.
fortylove 345 days ago [-]
Is this why we finally got darkmode in gcal?
rockskon 346 days ago [-]
No shit a quarter of Google's new code is created by AI. How else do you explain why Google search has been so aggressively awful for the past 5~ years?
Seriously. The penchant for outright ignoring user search terms, relentlessly forcing irrelevant or just plain wrong information on users, and the obnoxious UI changes on YouTube! If I'm watching a video on full screen I have explicitly made it clear that I want YouTube to only show me video! STOP BRINGING UP THE FUCKING VIDEO DESCRIPTION TO TAKE UP HALF THE SCREEN IF I TRY TO BRIEFLY SWIPE TO VIEW THE TIME OR READ A MESSAGE.
I have such deep-seated contempt for AI and its products for just how much worse it makes people's lives.
remram 346 days ago [-]
Yeah that might explain some of the loss of quality. Google apps and sites used to be solid, now they are full of not-breaking-but-annoying bugs like race conditions (don't press buttons too fast), display glitches, awful recommendations, and other usability problems.
Then again, their devices are also coming out with known fatal design flaws, like not being able to make phone calls, or the screen going black permanently.
nektro 346 days ago [-]
Google used to be respected, a place so highly sought after that engineers who worked there were revered like wizards. oh how they've fallen :(
fmardini 346 days ago [-]
Proto-plumbing is very LLM amenable
dickersnoodle 344 days ago [-]
That explains a lot, actually.
ThinkBeat 346 days ago [-]
So um.
With making this public statement,
can we expect that 25% of "the bottom" coders at Google
will soon be granted a lot more time and
ability to spend time with their loved ones?
shane_kerns 346 days ago [-]
It's no wonder that their search absolutely sucks now. Duckduckgo is so much better in comparison now.
marstall 346 days ago [-]
first thought is that much of that 25% is test code for non-ai-gen code...
evbogue 347 days ago [-]
I'd be turning off the autocomplete in my IDE if I was at Google. Seems to double as a keylogger.
marviel 346 days ago [-]
> 80% at Reasonote
octacat 344 days ago [-]
It is visible...
tylerchilds 345 days ago [-]
as a consumer, i never could have guessed
anacrolix 346 days ago [-]
Puts on Google
annlee2019 345 days ago [-]
google CEO doesn't write code
sheeshkebab 346 days ago [-]
and it shows… Google codebases I see in the wild are the worst - jumbled mess of hard to read code.
psunavy03 345 days ago [-]
And yet the 2024 State of DevOps report THAT GOOGLE PRODUCES has a butt-ton of caveats about the effectiveness of GenAI . . .
AI_beffr 346 days ago [-]
i like how people say that ai can only write "trivial" code well or without mistakes. but what about from the point of view of the AI? writing "trivial" code is probably almost exactly as much of a challenge as writing the most complex code a human could ever write. the scales are not the same. dont allow yourself to feel so safe..
Capricorn2481 345 days ago [-]
You think when people say AI can only write trivial code that they are writing from the perspective of AI, where trivial is actually impressive? That's backward-ass logic.
AI_beffr 344 days ago [-]
no im saying they are anthropomorphizing the capabilities of these AIs which disguises how advanced they really are.
Capricorn2481 343 days ago [-]
Not really what you said. In any case, people aren't doing that, they are just pointing out that AI writes poor code beyond very basic things. That's not anthropomorphizing.
AI_beffr 343 days ago [-]
it is what i said exactly and they are doing it.
jdmoreira 346 days ago [-]
I would prefer if he was more competent and made the stock price go up.
I guess grifters are going to grift
sigmonsays 346 days ago [-]
imho code that is written by AI is code that is not worth having.
hodder 345 days ago [-]
The market would be even more shocked to learn that another 30% is pasted in from Stack Overflow!
AmazingTurtle 345 days ago [-]
Yeah, go ahead and lay off another 25% of development staff and see how well AI coders perform.:))
fennecbutt 345 days ago [-]
That explains a lot.
est 346 days ago [-]
Now maintain quarter of your old code base with AI, don't shut down services randomly.
skrebbel 346 days ago [-]
In my experience, AIs can generate perfectly good code for relatively easy things, the kind you might as well copy&paste from stackoverflow, and they'll very confidently generate subtly wrong code for anything that's non-trivial for an experienced programmer to write. How do people deal with this? I simply don't understand the value proposition. Does Google now have 25% subtly wrong code? Or do they have 25% trivial code? Or do all their engineers babysit the AI and bugfix the subtly wrong code? Or are all their engineers so junior that an AI is such a substantial help?
Like, isn't this announcement a terrible indictment of how inexperienced their engineers are, or how trivial the problems they solve are, or both?
toasteros 346 days ago [-]
> the kind you might as well copy&paste from stackoverflow
This bothers me. I completely understand the conversational aspect - "what approach might work for this?", "how could we reduce the crud in this function?" - it worked a lot for me last year when I tried learning C.
But the vast majority of AI use that I see is...not that. It's just glorified, very expensive search. We are willing to burn far, far more fuel than necessary because we've decided we can't be bothered with traditional search.
A lot of enterprise software is poorly cobbled together using stackoverflow gathered code as it is. It's part of the reason why MS Teams makes your laptop run so hot. We've decided that power-inefficient software is the best approach. Now we want to amplify that effect by burning more fuel to get the same answers, but from an LLM.
It's frustrating. It should be snowing where I am now, but it's not. Because we want to frivolously chase false convenience and burn gallons and gallons of fuel to do it. LLM usage is a part of that.
jcgrillo 346 days ago [-]
What I can't wrap my head around is that making good, efficient software doesn't (by and large) take significantly longer than making bloated, inefficient enterprise spaghetti. The problem is finding people to do it with who care enough to think rigorously about what they're going to do before they start doing it. There's this bizarre misconception popular among bigtech managers that there's some tunable tradeoff between quality and development speed. But it doesn't actually work that way at all. I can't even count anymore how many times I've had to explain how taking this or that locally optimal shortcut will make it take longer overall to complete the project.
In other words, it's a skill issue. LLMs can only make this worse. Hiring unskilled programmers and giving them a machine for generating garbage isn't the way. Instead, train them, and reject low quality work.
aleph_minus_one 346 days ago [-]
> What I can't wrap my head around is that making good, efficient software doesn't (by and large) take significantly longer than making bloated, inefficient enterprise spaghetti. The problem is finding people to do it with who care enough to think rigorously about what they're going to do before they start doing it.
I don't think finding such programmers is really difficult. What is difficult is finding such people if you expect them to be docile to incompetent managers and other incompetent people involved in the project who, for example, got their position not by merit and competence, but by playing political games.
giantg2 346 days ago [-]
"What I can't wrap my head around is that making good, efficient software doesn't (by and large) take significantly longer than making bloated, inefficient enterprise spaghetti."
In my opinion the reason we get enterprise spaghetti is largely due to requirement issues and scope creep. It's nearly impossible to create a streamlined system without knowing what it should look like. And once the system gets to a certain size, it's impossible to get business buy-in to rearchitect or refactor to the degree that is necessary. Plus the full requirements are usually poorly documented and long forgotten by that time.
jcgrillo 346 days ago [-]
When scopes creep and requirements change, simply refactor. Where is it written in The Law that you have to accrue technical debt? EDIT: I'm gonna double down on this one. The fact that your organization thinks they can demand of you that you can magically weathervane your codebase to their changeable whims is evidence that you have failed to realistically communicate to them what is actually possible to do well. The fact that they think it's a move you can make to creep the scope, or change the requirements, is the problem. Every time that happens it should be studied within the organization as a major, costly failure--like an outage or similar.
> it's impossible to get business buy-in to rearchitect or refactor to the degree that is necessary
That's a choice. There are some other options:
- Simply don't get business buy-in. Do without. Form a terrorist cell within your organization. You'll likely outpace them. Or you'll get fired, which means you'll get severance, unemployment, a vacation, and the opportunity to apply to a job at a better company.
- Fight viciously for engineering independence. You business people can do the businessing, but us engineers are going to do the engineering. We'll tell you how we'll do it, not the other way.
- Build companies around a culture of doing good, consistent work instead of taking expedient shortcuts. They're rare, but they exist!
aleph_minus_one 346 days ago [-]
> Fight viciously for engineering independence.
Or simply find a position in an industry or department where you commonly have more independence. In my opinion this fight is not worth it - looking for another position instead is typically easier.
llm_trw 346 days ago [-]
>When scopes creep and requirements change, simply refactor.
Congratulations, you just refactored out a use case which was documented in a knowledge base which has been replaced by 3 newer ones since then, happens once every 18 months and makes the company go bankrupt if it isn't carried out promptly.
The type of junior devs who think that making code tidy is fixing the application are the type of dev who you don't let near the heart of the code base, and incidentally the type who are best replaced with code gen AI.
wpietri 346 days ago [-]
Refactoring is improving the design of existing code. It shouldn't change behavior.
And regardless, the way you prevent loss of important functionality isn't by hoping people read docs that no longer exist. It's by writing coarse-grained tests that make sure the software does the important things. If a programmer wants to change something that breaks a test like that, they go ask a product manager (or whatever you call yours) if that feature still matters.
And if nobody can say whether a feature still matters, the organization doesn't have a software problem, it has a serious management problem. Not all the coding techniques in the world can fix that.
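A minimal sketch of what such a coarse-grained test might look like, assuming a made-up invoicing rule (the function names, prices and tax rate are hypothetical). The check pins down what the software must do, not how, so the original and the refactored version both have to pass it unchanged.

```python
def invoice_total_cents(line_items, tax_percent):
    """Original version of a hypothetical business rule: subtotal plus tax."""
    subtotal = 0
    for quantity, unit_price_cents in line_items:
        subtotal += quantity * unit_price_cents
    return subtotal + subtotal * tax_percent // 100


def invoice_total_cents_refactored(line_items, tax_percent):
    """Refactored version: different structure, same observable behavior."""
    subtotal = sum(quantity * price for quantity, price in line_items)
    tax = subtotal * tax_percent // 100
    return subtotal + tax


def check_important_behaviour(total_fn):
    """Coarse-grained check: only inputs and outputs, no internals."""
    # 2 x 10.00 + 1 x 5.50 = 25.50, plus 10% tax = 28.05
    assert total_fn([(2, 1000), (1, 550)], tax_percent=10) == 2805
    # An empty invoice costs nothing.
    assert total_fn([], tax_percent=10) == 0


# The same check runs against both versions; a refactor that changes
# behavior would fail here and prompt the "does this still matter?" question.
for implementation in (invoice_total_cents, invoice_total_cents_refactored):
    check_important_behaviour(implementation)
```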
jcgrillo 346 days ago [-]
If you don't understand your systems well enough to comfortably refactor them, you're losing the war. I probably should have put "simply" in scare quotes, it isn't simple--and that's the point. Responding to unreasonable demands, like completely changing course at the 11th hour, shouldn't come at a low price.
galdosdi 346 days ago [-]
It's a market for lemons.
Without redoing their work or finding a way to have deep trust (which is possible, but uncommon at a bigcorp) it's hard enough to tell who is earnest and who is faking it (or buying their own baloney) when it comes to propositions like "investing in this piece of tech debt will pay off big time"
As a result, if managers tend to believe such plans, bad ideas drive out good and you end up investing in a tech debt proposal that just wastes time. Burned managers therefore cope by undervaluing any such proposals and preferring the crappy car that at least you know is crappy over the car that allegedly has a brand new 0 mile motor on it but you have no way of distinguishing from a car with a rolled back odometer. They take the locally optimal path because it's the best they can do.
It's taken me 15 years of working in the field and thinking about this to figure it out.
The only way out is an organization where everyone is trusted and competent and is worthy of trust, which again, hard to do at most random bigcorps.
This is my current theory anyway. It's sad, but I think it kind of makes sense.
jcgrillo 345 days ago [-]
Soviet vs NATO. The Soviet management style is micromanaging exactly how to do everything from the rear. The NATO style is delegating to the front line ranks.
Being good at the NATO style of management means focusing on the big picture--what, when, why--and leaving how to the people actually doing it.
wpietri 346 days ago [-]
Agreed.
The way I explain this to managers is that software development is unlike most work. If I'm making widgets and I fuck up, that widget goes out the door never to be seen again. But in software, today's outputs are tomorrow's raw materials. You can trade quality for speed in the very short term at the cost of future productivity, so you're really trading speed for speed.
I should add, though, that one can do the rigorous thinking before or after the doing, and ideally one should do both. That was the key insight behind Martin Fowler's "Refactoring: Improving the Design of Existing Code". Think up front if you can, but the best designs are based on the most information, and there's a lot of information that is not available until later in a project. So you'll want to think as information comes in and adjust designs as you go.
That's something an LLM absolutely can't do, because it doesn't have access to that flow of information and it can't think about where the system should be going.
jcgrillo 345 days ago [-]
> the best designs are based on the most information, and there's a lot of information that is not available until later in a project
This is an important point. I don't remember where I read it, but someone said something similar about taking a loss on your first few customers as an early stage startup--basically, the idea is you're buying information about how well or poorly your product meets a need.
Where it goes wrong is if you choose not to act on that information.
wpietri 344 days ago [-]
For sure. Or, worse, choose to run a company in such a way that anybody making choices is insulated from that information.
c0balt 346 days ago [-]
It's relatively easy to find a programmer(s) who can realize enterprise project X, it's hard to find a programmer(s) who cares about X. Throwing an increased requirement like speed at it makes this worse because it usually ends up burning out both ends of the equation.
jihadjihad 346 days ago [-]
> The problem is finding people to do it with who care enough to think rigorously
> ...
> train them, and reject low quality work.
I agree very strongly with both of these points.
But I've observed a truth about each of them over the last decade-plus of building software.
1) very few people approach the field of software engineering with anything remotely resembling rigor, and
2) there is often little incentive to train juniors and reject subpar output (move fast and break things, etc.)
I don't know where this takes us as an industry? But I feel your comment on a deep level.
jcgrillo 346 days ago [-]
> 1) very few people approach the field of software engineering with anything remotely resembling rigor
This is a huge problem. I don't know where it comes from, I think maybe sort of learned helplessness? Like, if systems are so complex that you don't believe a single person can understand it then why bother trying anyway? I think it's possible to inspire people to not accept not understanding. That motivation to figure out what's actually happening and how things actually work is the carrot. The stick is thorough, critical (but kind and fair) code--and, crucially, design--review, and demanding things be re-done when they're not up to par. I've been extremely lucky in my career to have had senior engineers apply both of these tools excellently in my general direction.
> 2) there is often little incentive to train juniors and reject subpar output (move fast and break things, etc.)
One problem is our current (well, for years now) corporate culture is this kind of gig-adjacent-economy where you're only expected to stick around for a few years at most and therefore in order to be worth your comp package you need to be productive on your first day. Companies even advertise this as a good thing "you'll push code to prod on your first day!" It reminds me of those scammy books from when I was a kid in the late 90s "Learn C In 10 Days!".
wpietri 346 days ago [-]
> This is a huge problem. I don't know where it comes from
I think it's a bunch of things, but one legitimate issue is that software is stupidly complex these days. I had the advantage of starting when computers were pretty simple and have had a chance to grow along with it. (And my dad started when you could still lift up the hood and look at each bit. [1])
When I'm working with junior engineers I have a hard time even summing up how many layers lie beneath what they're working on. And so much of what they have to know is historically contingent. Just the other day I had to explain what LF and CR mean and how it relates to physical machinery that they probably won't see outside of a museum: https://sfba.social/@williampietri/113387049693365012
So I get how junior engineers struggle to develop a belief that they can sort it all out. Especially when so many people end up working on garbage code, where little sense is to be had. It's no wonder so many turn to cargo culting and other superstitious rituals.
I agree as well. These are actually things that bother me a lot about the industry. I’d love to write software that should run problem-free in 2035, but the reality is almost no one cares.
I’ve had the good fortune of getting to write some firmware that will likely work well for a long time to come, but I find most things being written on computers are written with (or very close to) the minimum care possible in order to get the product out. Clean up is intended but rarely occurs.
I think we’d see real benefits from doing a better job, but like many things, we fail to invest early and crave immediate gratification.
karolinepauls 346 days ago [-]
> very few people approach the field of software engineering with anything remotely resembling rigor, and
I have this one opinion which I would not say at work:
In software development it's easy to feel smart because what you made "works" and you can show "effects".
- Does it wrap every failable condition in `except Exception`? Uhh, but look, it works.
- Does it define a class hierarchy for what should be a dictionary lookup? It works great tho!
- Does it create a cyclic graph of objects calling each other's methods to create more objects holding references to the objects that created them? And for what, to produce a flat dictionary of data at end of the day? But see, it works.
this is getting boring, maybe just skip past the list
- Does it stuff what should be local variables and parameters in self, creating a big stateful blob of an object where every attribute is optional and methods need to be called in the right order, otherwise you get an exception? Yes, but it works.
- Does it embed a browser engine? But it works!
The programmer, positively affirmed, continues spewing out crap, while the seniors keep fighting fires to keep things running, while insulating the programmer from the taste of their own medicine.
But more generally, it's hard to expect people to learn how to solve problems simply if they're given gigantic OO languages with all the features and no apparent cost to any of them. People learn how to write classes and then never get good at writing code with a clear data flow.
Even very bright people can fall for this trap because engineering isn't just about being smart but about using intelligence and experience to solve a problem while minmaxing correctly chosen properties. Those properties should generally be: dev time, complexity (state/flow), correctness, test coverage, ease of change, performance (anything else?). Anyway, "Affirming one's opinions about how things should be done" isn't one of them.
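Two of the patterns in the list above can be sketched in a few lines. This is an illustrative toy, not code from any real project: the shipping prices, class names and `ReportBuilder` are all invented for the example.

```python
# Pattern 1: a class hierarchy for what should be a dictionary lookup.
class Region:
    def shipping_cents(self):
        raise NotImplementedError


class Domestic(Region):
    def shipping_cents(self):
        return 500


class International(Region):
    def shipping_cents(self):
        return 2500


REGION_CLASSES = {"domestic": Domestic, "international": International}


def shipping_via_classes(region):
    return REGION_CLASSES[region]().shipping_cents()


# The same information as plain data.
SHIPPING_CENTS = {"domestic": 500, "international": 2500}


def shipping_via_dict(region):
    return SHIPPING_CENTS[region]


# Pattern 2: locals and parameters stuffed into self, so every attribute
# is optional and methods must be called in the right order.
class ReportBuilder:
    def __init__(self):
        self.rows = None
        self.total = None

    def load(self, rows):
        self.rows = rows

    def compute(self):
        # Raises TypeError if load() was not called first.
        self.total = sum(self.rows)


def report_total(rows):
    """Same result with explicit data flow: input in, output out."""
    return sum(rows)
```

Both versions "work"; the difference only shows up when someone has to read, test or change them.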
mos_basik 346 days ago [-]
The whole one about the stateful blob of an object with all optional attributes got me real good. Been fighting that for years. But the dev that writes this produces code faster than me and understands parts of the system no one else does and doesn't speak great English, so it continues. And the company is still afloat. So who's right in the end? And does it matter?
karolinepauls 346 days ago [-]
I don't know who's right but I know that it's the ergonomics of programming languages that make producing stateful blobs fast and easy that are in the wrong.
jcgrillo 345 days ago [-]
You know it's a problem when you have to read a book with a couple hundred pages to learn how to hold it right ;)
A4ET8a8uTh0 346 days ago [-]
<< Instead, train them, and reject low quality work.
Ahh, well, in order to save money, training is done via an online class with multiple choice questions, or, if your company is like mine and really committed to making sure that you know they take your training seriously, they put portions of a generic book on 'tech Z' in PDF spread over DRM-ridden web pages.
As for code, that is reviewed, commented and rejected by LLMs as well. It used to be turtles. Now it truly is LLMs all the way down.
That said, in a sane world, this is what should be happening for a company that actually wants to get good results over time.
noisy_boy 346 days ago [-]
> The problem is finding people to do it with who care enough to think rigorously about what they're going to do before they start doing it.
There is no incentive to do it. I worked that way, focused on quality and testing and none of my changes blew up in production. My manager opined that this approach is too slow and that it was ok to have minor breakages as long as they are fixed soon. When things break though, it's blame game all around. Loads of hypocrisy.
sethammons 346 days ago [-]
"Slow is smooth and smooth is fast"
jcgrillo 346 days ago [-]
It's true every single time.
chongli 346 days ago [-]
> we've decided we can't be bothered with traditional search
Traditional search (at least on the web) is dying. The entire edifice is drowning under a rapidly rising tide of spam and scam sites. No one, including Google, knows what to do about it so we're punting on the whole project and hoping AI will swoop in like deus ex machina and save the day.
photonthug 346 days ago [-]
Maybe it is naive but I think search would probably work again if they could roll back code to 10 or 15 years ago and just make search engines look for text in webpages.
Google wasn’t crushed by spam, they decided to stop doing text search and build search bubbles that are user specific, location-specific, decided to surface pages that mention search terms in metadata instead of in text users might read, etc. Oh yeah, and about a decade before LLMs were actually usable, they started to sabotage simple substring searches and kind of force this more conversational interface. That’s when simple search terms stopped working very well, and you had to instead ask yourself “hmm how would a very old person or a small child phrase this question for a magic oracle”
This is how we get stuff like: Did you mean “when did Shakespeare die near my location”? If anyone at google cared more about quality than printing money, that thirsty gambit would at least be at the bottom of the page instead of the top.
hughesjj 346 days ago [-]
I remember in like 5th grade rural PA schools learning about Boolean operators in search engines and falling in love with them. For context, they were presenting alta vista and yahoo kids search as the most popular with Google being a "simple but effective new search platform" we might want to check out.
By the time I graduated highschool you already couldn't trust that Boolean operators would be treated literally. By the time I graduated college, they basically didn't seem to do anything, at best a weak suggestion.
Nowadays quotes don't even seem to be consistently honored.
II2II 346 days ago [-]
Even though I miss using boolean operators in search, I doubt that it was ever sustainable outside of specialized search engines. Very few people seem to think in those terms. Many of those who do would still have difficulty forming complex queries.
I suspect the real problem is that search engines ceased being search engines when they stopped taking things literally and started trying to interpret what people mean. Then they became some sort of poor man's AI. Now that we have LLMs, of course it is going to replace the poor excuse for search engines that exist today. We were heading down that road already, and it actually summarizes what is out there.
jordanb 346 days ago [-]
People were learning. Just like with mice and menus, people are capable of learning new skills and querying search engines was one. I remember when it was considered a really "n00b" thing to type a full question into a search engine.
Then Google decided to start enforcing that, because they had this idea that they would be able to divine your "intent" from a "natural question" rather than just matching documents including your search terms.
layer8 346 days ago [-]
> just make search engines look for text in webpages.
Google’s verbatim search option roughly does that for me (plus an ad blocker that removes ads from the results page). I have it activated by default as a search shortcut.
(To activate it, one can add “tbs=li:1” as a query parameter to the Google search URL.)
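For anyone who wants to script this, here is a minimal Python sketch of building such a URL. The `tbs=li:1` parameter is the one mentioned above; the base URL and the `q` parameter are the usual shape of a Google search URL, and the helper name `verbatim_search_url` is made up for illustration. Google could change or drop the parameter at any time.

```python
from urllib.parse import urlencode

GOOGLE_SEARCH = "https://www.google.com/search"


def verbatim_search_url(terms: str) -> str:
    """Return a Google search URL with verbatim mode (tbs=li:1) switched on."""
    # safe=":" keeps the colon in "li:1" readable instead of percent-encoding it.
    query = urlencode({"q": terms, "tbs": "li:1"}, safe=":")
    return f"{GOOGLE_SEARCH}?{query}"


print(verbatim_search_url("shakespeare death date"))
```

The resulting URL can be saved as a bookmark or adapted into a browser search shortcut.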
alex1138 346 days ago [-]
To me the stupidest thing was the removal of things like + and -. You can say it's because of Google+ but annoyingly duckduckgo also doesn't seem to honor it. Kagi seems to and I hope they don't follow the others down the road of stupid
jcgrillo 346 days ago [-]
> ?tbs=li:1
Thank you, this is almost life-alteringly good to know.
photonthug 346 days ago [-]
Funny, I can’t even test this because I’d need to know another neat trick to get my browser to let me actually edit the URL.
Seems that Firefox on mobile allows editing the url for most pages, but on google search results pages, the url bar magically turns into a did-you-mean alternate search selector where I cannot see nor edit a url. Surprised but not surprised.
Sure, there’s a work around for this too, somehow. But I don’t want to spend my life collecting and constantly updating a huge list of temporary hacks to fix things that others have intentionally broken.
layer8 346 days ago [-]
You can select verbatim search manually on the Google results page under Search tools > All results > Verbatim. You can also have a bookmark with a dummy search activating it, so you can then type your search terms into the Google search field instead of into the address bar.
Yes, it’s annoying that you can’t set it as the default on Google search itself.
tru3_power 346 days ago [-]
Wow what? Thanks!
CapeTheory 346 days ago [-]
> Maybe it is naive but I think search would probably work again if they could roll back code to 10 or 15 years ago and just make search engines look for text in webpages.
Even more naive, but my personal preference: just ban all advertising. The fact that people will pay for ChatGPT implies people will also pay for good search if the free alternative goes away.
Atreiden 346 days ago [-]
It's working for Kagi
masfuerte 346 days ago [-]
Google results are not polluted with spam because Google doesn't know how to deal with it.
Google results are polluted with spam because it is more profitable for Google. This is a conscious decision they made five years ago.
chongli 346 days ago [-]
> because it is more profitable for Google
Then why are DuckDuckGo results also (arguably even more so) polluted with spam/scam sites? I doubt DDG is making any profit from those sites since Google essentially owns the display ad business.
JohnDone 346 days ago [-]
Ddg is actually Bing. Search as a service.
djvuvtgcuehb 346 days ago [-]
And Bing is google.
redwall_hp 346 days ago [-]
If you own the largest ad network that spam sites use and own the traffic firehose, pointing the hose at the spam sites and ensuring people spend more time clicking multiple results that point to ad-filled sites will make you more money.
Google not only has multiple monopolies, but a cut and dry perverse incentive to produce lower quality results to make the whole session longer instead of short and effective.
skissane 346 days ago [-]
I personally think a big problem with search is major search engines try to be all things to all people and hence suffer as a result.
For example: a beginner developer is possibly better served by some SEO-heavy tutorial blog post; an experienced developer would prefer results weighted towards the official docs, the project’s bug tracker and mailing list, etc. But since less technical and non-technical people vastly outnumber highly technical people, Google and Bing end up focusing on the needs of the former, at the cost of making search worse for the latter.
One positive about AI: if an AI is doing the search, it likely wants the more advanced material not the more beginner-focused one. It can take more advanced material and simplify it for the benefit of less experienced users. It is (I suspect) less likely to make mistakes if you ask it to simplify the more advanced material than if you just gave it more beginner-oriented material instead. So if AI starts to replace humans as the main clients of search, that may reverse some of the pressure to “dumb it down”.
photonthug 346 days ago [-]
> But since less technical and non-technical people vastly outnumber highly technical people, Google and Bing end up focusing on the needs of the former, at the cost of making search worse for the latter.
I mostly agree with your interesting comment, and I think your analysis basically jibes with my sibling comment.
But one thing I take issue with is the idea that this type of thing is a good faith effort, because it’s more like a convenient excuse. Explaining substring search or even include/exclude ops to children and grandparents is actually easy. Setting preferences for tutorials vs API docs would also be easy. But companies don’t really want user-directed behavior as much as they want to herd users to preferred content with algorithms, then convince the user it was their idea or at least the result of relatively static ranking processes.
The push towards more fuzzy semantic search and “related content” everywhere is not to cater to novice users but to blur the line between paid advertisement and organic user-directed discovery.
No need to give megacorp the benefit of the doubt on stuff like this, or make the underlying problems seem harder than they are. All platforms land in this place by convergent evolution wherein the driving forces are money and influence, not insurmountable technical difficulties or good intentions for usability.
consp 346 days ago [-]
> For example: a beginner developer is possibly better served by some SEO-heavy tutorial blog post
Good luck finding those, you end up with SEO spam and clone page spam. These days you have to look for unobvious hidden meanings which only relate to your exact problem to find what you are looking for.
I have the strong feeling search these days is back to the Altavista era. You'd have to use trickery to find what you were looking for back then as well. Too bad + no longer works in google due to their stupid naming of a dead product (no, literal is not the same and no replacement).
tru3_power 346 days ago [-]
Yeah but this is just the name of the game. How can you even stop SEO style gamification at this point? I’m sure even LLMs are vulnerable/have been trained on SEO bs. End of the day it takes an informed user. Remember back in the day? Don’t trust the internet? I think that mindset will become the main school of thought once again. Which tbh, I think maybe a good thing.
skydhash 346 days ago [-]
> Traditional search (at least on the web) is dying.
That's not my experience at all. While there are scammy sites, using the search engines as an index instead of an oracle still yields useful results. It only requires learning the keywords, which you can do by reading the relevant materials.
chongli 342 days ago [-]
How do you read the relevant materials if you haven’t found them yet? It’s a chicken and egg problem. If your goal is to become an expert in a subject but you’re currently a novice, search can’t help you if it’s only giving you terrible results until you “crack the code.”
rubyfan 346 days ago [-]
AI will make the problem of low quality, fake, fraudulent and arbitrage content way worse. I highly doubt it will improve searching for quality content at all.
AnimalMuppet 346 days ago [-]
But it can't save the day.
The problem with Google search is that it indexes all the web, and there's (as you say) a rising tide of scam and spam sites.
The problem with AI is that it scoops up all the web as training data, and there's a rising tide of scam and spam sites.
AtlasBarfed 346 days ago [-]
There's no way the search AI will beat out the spamgen AI.
Tailoring/retraining the main search AI will be so much more expensive than retraining the special-purpose spam AIs.
layer8 346 days ago [-]
Without a usable web search index, AI will be in trouble eventually as well. There is no substitute for it.
akoboldfrying 346 days ago [-]
>The entire edifice is drowning under a rapidly rising tide of spam and scam sites.
You make this claim with such confidence, but what is it based on?
There have always been hordes of spam and scam websites. Can you point to anything that actually indicates that the ratio is now getting worse?
chongli 346 days ago [-]
> There have always been hordes of spam and scam websites. Can you point to anything that actually indicates that the ratio is now getting worse?
No, there haven't always been hordes of spam and scam websites. I remember the web of the 90s. When Google first arrived on the scene every site on the results page was a real site, not a spam/scam site.
ShroudedNight 346 days ago [-]
That was PageRank flexing its capability. There were lots of sites with reams of honeypot text that caught the other search engines.
quickthrowman 346 days ago [-]
Google could fix the problem if they wanted to, but it’s not in their interests to fix it since the spam sites generally buy ads from Google and/or display Google ads on their spam websites. Google wants to maximize their income, so..
ponector 346 days ago [-]
>> No one, including Google, knows what to do about it
I'm sure they can. But they have no incentive. Try to Google an item, and it will show you a perfect match of sponsored ads and some other not-so-relevant non-sponsored results
petre 346 days ago [-]
AI will generate even more spam and scam sites more trivially.
ses1984 346 days ago [-]
What do you mean “will”, we are a few years past that point.
lokar 346 days ago [-]
It took the scam/spam sites a few years to catch up to Google search. Just wait a bit, equilibrium will return.
cyanydeez 346 days ago [-]
If only google was trying to solve search rather than shareholder value.
fmos 346 days ago [-]
Kagi has fixed traditional search for me.
romwell 346 days ago [-]
Narrator: it did not, in fact, save the day.
jihadjihad 346 days ago [-]
Another frustration I have with these models is that it is yet another crutch and excuse for turning off your brain. I was tagged on a PR a couple days ago where a coworker had added a GIN index to a column in Postgres, courtesy of GPT-4o, of course.
He couldn't pronounce the name of the extension, apparently not noticing that trgm == trigram, or what that might even be. Copying the output from the LLM and pasting it into a PR didn't result in anything other than him checking off a box, moving a ticket in Jira, and then onto the next thing--not even a pretense of being curious about what any of it all meant. But look at those query times now!
It's been possible for a while to shut off your brain as a programmer and blindly copy-paste from StackOverflow etc., but the level of enablement that LLMs afford is staggering.
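For what "trgm" refers to: a trigram is a run of three consecutive characters, and the pg_trgm extension compares strings by the trigrams they share. Below is a rough Python sketch of that idea, following the padding and lowercasing described in the pg_trgm documentation. It is a simplification for illustration, not the extension's actual code, and the function names are made up.

```python
import re


def trigrams(text: str) -> set[str]:
    """Split text into lowercase words and return the set of 3-character windows."""
    grams: set[str] = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = f"  {word} "  # two leading spaces, one trailing space
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams (a Jaccard index)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


print(sorted(trigrams("cat")))
print(similarity("postgres", "postgress"))
```

As far as I understand it, a GIN index over these trigrams lets Postgres look up candidate rows by the trigrams in the search pattern instead of scanning every row, which is where the improved query times for substring and fuzzy matches come from.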
tru3_power 346 days ago [-]
Out of curiosity- did it work though?
gonzobonzo 346 days ago [-]
Doesn't this get to one of the fundamental issues though, that many of these frameworks and languages are poorly constructed in the first place? A lot of the time, people turn to web searches, Stack Overflow, or AI because they want to do X, and there's no quick, clear, and intuitive way to do X. I write cheat sheets for opaque parts of various frameworks myself. A lot of them aren't fundamentally difficult once you understand them, but they're constructed in an extremely convoluted way, and there's usually extremely poor documentation explaining how to actually use them.
In fact, I'd say I use AI more for documentation than I do for code itself, because AI generated documentation is often superior to official documentation.
In the end, these things shouldn't be necessary (or barely necessary) if we had well constructed languages, frameworks, libraries and documentation, but it appears like it's easier to build AI than to make things non-convoluted in the first place.
GaggiX 346 days ago [-]
These models are simply much more powerful than a traditional search engine and stackoverflow, so many people use them for a reason. A friend of mine who had never tried ChatGPT until very recently managed to solve a problem with GPT-4o that he couldn't find a solution to on stackoverflow; next time he's probably going to ask the model directly.
toasteros 345 days ago [-]
I don't know what your friend's prompts were, but this probably speaks to the conversational aspect. I've found success in using LLMs to "search" for things I don't know how to search for - a 'tip of my tongue' type scenario.
"How do I do a for loop" though is a waste of time and energy and should be put into a search engine. There is no need to use the inefficient power needs of an LLM to answer that question. The search engine will have cached the results of that question, leading to a much faster discovery of the answer, and less power draw to do it, whereas an LLM needs to ponder your question EVERY. SINGLE. TIME. A huge waste.
Stop using LLMs for simple things.
braiamp 346 days ago [-]
> because we've decided we can't be bothered with traditional search
Traditional search was only Google, and Google figured out that they don't need to improve their tools to make it better, because everyone will continue to use it by force of habit (google is a verb!). Traditional search is being abandoned because traditional search isn't good enough for the kinds of search we need (also, while google may claim their search is very useful, people rarely search for stuff nowadays and instead prefer being passively fed content via recommendation algorithms (that also use AI!)).
dleeftink 346 days ago [-]
Algolia, Marginalia, Kagi, Scopus, ConnectedPapers, Lense[0] all stick to more or less traditional search and yield consistent high quality results. It shouldn't be one or the other, and I think the first one to combine both paradigms in a seamless fashion would be quite successful (it has been tried, I know, but it's still a niche in many cases).
> But the vast majority of AI use that I see is...not that. It's just glorified, very expensive search.
Since the collapse of Internet search (rose tinted hindsight - was it ever any good?) I have been using an LLM as my syntax advisor. I pay for my own tokens, and I can say it is astonishingly cheap.
It is also very good.
hawski 346 days ago [-]
A human can't be trusted to not make memory safety bugs. At the same time we can trust AI with logic bugs.
kelnos 346 days ago [-]
Since LLMs are just based on human output, we should trust LLMs (at best) as much as we trust the average human coder. And in reality we should probably trust them less.
Dalewyn 346 days ago [-]
>We are willing to burn far, far more fuel than necessary because we've decided we can't be bothered with traditional search.
That's because traditional search fucking sucks balls.
rpcope1 346 days ago [-]
I don't get it either. People will say all sorts of strange stuff about how it writes the code for them or whatever, but even using the new Claude 3.5 Sonnet or whatever variant of GPT4, the moment I ask it anything that isn't the most basic done-to-death boilerplate, it generates stuff that's wrong, and often subtly wrong. If you're not at least pretty knowledgeable about exactly what it's generating, you'll be stuck trying to troubleshoot bad code, and if you are it's often about as quick to just write it yourself. It's especially bad if you get away from Python, and try to make it do anything else. SQL especially, for whatever reason, I've seen all of the major players generate either stuff that's just junk or will cause problems (things that your run of the mill DBA will catch).
Honestly, I think it will become a better Intellisense but not much more. I'm a little excited because there's going to be so many people buying into this, generating so much bad code/bad architecture/etc. that will inevitably need someone to fix after the hype dies down and the rug is pulled, that I think there will continue to be employment opportunities.
solumunus 346 days ago [-]
Supermaven is an incredible intellisense. Most code IS trivial and I barely write trivial code anymore. My imports appear instantly, with high accuracy. I have lots of embedded SQL queries and it’s able to guess the structure of my database very accurately. As I’m writing a query the suggested joins are accurate probably 80% of the time. I’m significantly more productive and having to type much less. If this is as good as it ever gets I’m quite happy. I rarely use AI for non trivial code, but non trivial code is what I want to work on…
ta_1138 346 days ago [-]
This is all about the tooling most companies choose when building software: things with so much boilerplate that most code is trivial. We can build tools that have far less triviality and more density, where the distance between the code we write and business logic is very narrow... but then every line of code we write is hard, because it's meaningful, and that feels bad enough to many developers, so we end up with tools where we might not be more productive, but we might feel productive, even though most of that apparent productivity is trivially generated.
We also have the ceremonial layers of certain forms of corporate architecture, where nothing actually happens, but the steps must exist to match the holy box, box cylinder architecture. Ceremonial input massaging here, ceremonial data transformation over there, duplicated error checking... if it's easy for the LLM to do, maybe we shouldn't be doing it everywhere in the first place.
thfuran 346 days ago [-]
>but then every line of code we write is hard, because it's meaningful, and that feels bad enough to many developers,
I don't know that I've ever even met a developer who wants to be writing endless pools of trivial boilerplate instead of meaningful code. Even the people at work who are willing to say they don't want to deal with the ambiguity and high level design stuff and just want to be told what to do pretty clearly don't want endless drudgery.
Aeolun 346 days ago [-]
That, but boilerplate stuff is also incredibly easy to understand. As compared to high density, high meaning code anyway. I prefer more low density low meaning code as it makes it much easier to reason about any part of the system.
wruza 346 days ago [-]
So basically it’s a presentation problem.
We want to control code at the call site, boilerplate helps with that by being locally modifiable.
We also want to systematize chunks of code so that they don’t flicker around and mess with a reader.
We wanted this since forever and no one does anything because anything above simple text completion is traditionally seen as an overkill, not the true way, not unix, etc. All sorts of stubborn arguments.
This can be solved by simply allowing code trees instead of lines of code (tree vs table). You drop a boilerplate into code marked as “boilerplate ‘foo’ {…}” and edit it as you see fit, which creates a boilerplate-local patch. Then you can instantly see diffs, find, update boilerplates, convert them to and from regular functions, merge best practices from boilerplate libraries, etc. Problem solved.
It feels like the development itself got collectively stuck in some stupid principles that no one dares to question. Everything that we invent stumbles upon the simple fact that we don’t have any sensible devtime structure, apart from this “file” and “import file” dullness.
thfuran 345 days ago [-]
In and of itself, it's usually easy to understand. But the more fluff you stuff between the nontrivial bits, the harder it is to take in all at once. I think large quantities of simple boilerplate make the overall project harder to understand and debug. Though that is in comparison to an imagined alternative that's somehow exactly the same but with all the glue removed, so maybe that's not entirely fair.
codr7 346 days ago [-]
I think you just nailed the paradox of Go's popularity among developers, managers are obvious.
monksy 346 days ago [-]
I don't think that is the signal that I think most people are hoping for here.
When I hear that most code is trivial, I think of this as a language design or a framework related issue making things harder than they should be.
Throwing AI or generators at the problem just to claim that they fixed it is just frustrating.
throw234234234 346 days ago [-]
> When I hear that most code is trivial, I think of this as a language design or a framework related issue making things harder than they should be.
This was one of my thoughts too. If the pain of using bad frameworks and clunky languages can be mitigated by AI, it seems like the popular but ugly/verbose languages will win out since there's almost no point to better designed languages/framework. I would rather a good language/framework/etc where it is just as easy to just write the code directly. Similar time in implementation to a LLM prompt, but more deterministic.
If people don't feel the pain of AI slop why move to greener pastures? It almost encourages things to not improve at the code level.
solumunus 346 days ago [-]
I'm writing software independently, with an extremely barebones framework (just handles routing pretty much) and very lean architecture. Maybe I should re-phrase it, "a lot of characters in the code base are trivial". Imports, function declarations, variable declarations. Is this stuff code/logic? Barely, but it's completely unavoidable. It all takes time and it's now time I rarely have to spend.
Just as an example, I have "service" functions. They're incredibly simple, a higher order function where I can inject the DB handler, user permissions, config, etc. Every time I write one of these I have to import the ServiceDependencies type and declare which dependencies I need to write the service. I now spend close to zero time doing that and all my time focusing on the service logic. I don't see a downside to this.
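To make that pattern concrete, here is a minimal sketch of what such a "service" might look like. It is written in Python purely for illustration (the setup described here is not necessarily Python), and apart from the `ServiceDependencies` name everything in it (the fields, `make_list_invoices`, the permission string, the table) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ServiceDependencies:
    """Everything a service may need; the fields here are illustrative."""
    db: Callable[[str, tuple], list[dict]]  # runs parameterised SQL, returns rows
    permissions: frozenset[str]             # permissions of the current user
    config: dict[str, str]


def make_list_invoices(deps: ServiceDependencies) -> Callable[[int], list[dict]]:
    """Higher-order function: takes the dependencies, returns the actual service."""
    def list_invoices(customer_id: int) -> list[dict]:
        if "invoices:read" not in deps.permissions:
            raise PermissionError("missing invoices:read")
        return deps.db(
            "SELECT id, total FROM invoices WHERE customer_id = ?",
            (customer_id,),
        )
    return list_invoices


def fake_db(sql: str, params: tuple) -> list[dict]:
    """Stand-in DB handler so the sketch runs without a database."""
    return [{"id": 1, "customer_id": params[0], "total": 40}]


deps = ServiceDependencies(
    db=fake_db, permissions=frozenset({"invoices:read"}), config={}
)
list_invoices = make_list_invoices(deps)
print(list_invoices(7))
```

The import, the dependency declaration, and the outer function are the repetitive part that autocomplete fills in; only the body of the inner function is actual service logic.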
Most of my business logic is done in raw SQL, which can be complex, but the autocomplete often helps there too. It's not helping me figure out the logic, it's simply cutting down on my typing. I don't know how anyone could be offered "do you want to type significantly fewer characters on your keyboard to get the same thing done?" and say "no thanks". The AI is almost NEVER coding for me, it's just typing for me and it's awesome.
I don't care how lean your system is, there will at least be repetition in how you declare things. There will be imports, there will be dependencies. You can remove 90% of this repetitive work for almost no cost...
I've tried to use ChatGPT to "code for me", and I agree with you that it's not a good option if you're trying to do anything remotely complex and want to avoid bugs. I never do this. But integrated code suggestions (with Supermaven, NOT CoPilot) are incredibly beneficial and maybe you should just try it instead of trying to come up with theoretical arguments. I was also a non-believer once.
int_19h 346 days ago [-]
Well, Google did design Go...
Kiro 346 days ago [-]
Interesting that you believe your subjective experience outweighs the claims of all others who report successfully using LLMs for coding. Wouldn't a more charitable interpretation be that it doesn't fit the stuff you're doing?
Regardless, I do wonder how accurate those successful reports are. Do people take LLM output, use it verbatim, not notice subtle bugs, and report that as success?
Kiro 346 days ago [-]
There's a big difference between "I've seen X" and "I've not seen X". The latter does not invalidate the former, unless you believe the person is lying or being delusional.
tobyjsullivan 346 days ago [-]
I'm not a Google employee but I've heard enough stories to know that a surprising amount of code changes at google are basically updating API interfaces.
The way google works, the person changing an interface is responsible for updating all dependent code. They create PRs which are then sent to code owners for approval. For lower-level dependencies, this can involve creating thousands of PRs across hundreds of projects.
Google has had tooling to help with these large-scale refactors for decades, generally taking the form of static analysis tools. However, these would be inherently limited in their capability. Manual PR authoring would still be required in many cases.
With this background, LLM code gen seems like a natural tool to augment Google's existing process.
I expect Google is currently executing a wave of newly-unblocked refactoring projects.
If anyone works/worked at google, feel free to correct me on this.
cj 346 days ago [-]
Do they have tooling for generating scaffolding for various things (like unit/integration tests)?
If we’re guessing which code is both the easiest to write and the largest proportion of the codebase, my first guess would be test suites. Lots of lines of repetitive code patterns, which AI is decent at dealing with.
slibhb 346 days ago [-]
Most programming is trivial. Lots of non-trivial programming tasks can be broken down into pure, trivial sections. Then, the non-trivial part becomes knowing how the entire system fits together.
I've been using LLMs for about a month now. It's a nice productivity gain. You do have to read generated code and understand it. Another useful strategy is pasting a buggy function and asking for revisions.
I think most programmers who claim that LLMs aren't useful are reacting emotionally. They don't want LLMs to be useful because, in their eyes, that would lower the status of programming. This is a silly insecurity: ultimately programmers are useful because they can think formally better than most people. For the foreseeable future, there's going to be massive demand for that, and people who can do it will be high status.
tonyedgecombe 346 days ago [-]
>I think most programmers who claim that LLMs aren't useful are reacting emotionally.
I don't think that's true. Most programmers I speak to have been keen to try it out and reap some benefits.
The almost universal experience has been that it works for trivial problems, starts injecting mistakes for harder problems and goes completely off the rails for anything really difficult.
theshackleford 346 days ago [-]
> I don't think that's true. Most programmers I speak to have been keen to try it out and reap some benefits.
I’ve been seeing the complete opposite. So it’s out there.
gorjusborg 346 days ago [-]
> Most programming is trivial
That's a bold statement, and incorrect, in my opinion.
At a junior level software development can be about churning out trivial code in a previously defined box. I don't think it's fair to call that 'most programming'.
BobbyJo 346 days ago [-]
Probably overloading of the term "programming" is the issue here. Most "software engineering" is non-programming work. Most programming is not actually typing code.
Most of the time, when I am typing code, the code I am producing is trivial, however.
uh_uh 346 days ago [-]
Think of all the menial stuff you must perform regardless of experience level. E.g. you change the return type of a function and now you have to unpack the results slightly differently. Traditional automated tools fail at this. But if you show some examples to Cursor, it quickly catches on to the pattern and starts autocompleting semi-automatically (semi because you still have to put the cursor in the right place but then you can tab, tab, tab…).
gorjusborg 345 days ago [-]
Don't misunderstand. I am not making an assertion that GenAI tools for development are useless.
I am just pointing out that the thread parent started his logical climb at a step one that is incorrect: 'Most programming is trivial'.
Given that they got it wrong on step one, how good do you think step ten is?
r14c 346 days ago [-]
From my perspective, writing out the requirements for an AI to produce the code I want is just as easy as writing it myself. There are some types of boilerplate code that I can see being useful to produce with an LLM, but I don't write them often enough to warrant actually setting up the workflow.
Even with the debugging example, if I just read what I wrote I'll find the bug because I understand the language. For more complex bugs, I'd have to feed the LLM a large fraction of my codebase and at that point we're exceeding the level of understanding these things can have.
I would be pretty happy to see an AI that can do effective code reviews, but until that point I probably won't bother.
er4hn 346 days ago [-]
It's reasonable to say that LLMs are not completely useless. There is also a very valid case to make that LLMs are not good at generating production ready code. I have found asking LLMs to make me Nix flakes to be a very nice way to make use of Nix without learning the Nix language.
As an example of not being production ready: I recently tried to use ChatGPT-4 to provide me with a script to manage my gmail labels. The APIs for these are all online, I didn't want to read them. ChatGPT-4 gave me a workable PoC that was extremely slow because it was using inefficient APIs. It then lied to me about better APIs existing and I realized that when reading the docs. The "vibes" outcome of this is that it can produce working slop code. For the curious I discuss this in more specific detail at: https://er4hn.info/blog/2024.10.26-gmail-labels/#using-ai-to...
Aeolun 346 days ago [-]
I find a recurring theme in these kinds of comments where people seem to blame their laziness on the tool. The problem is not that the tools are imperfect, it’s that you apparently use them in situations where you expect perfection.
Does a carpenter blame their hammer when it fails to drive in a screw?
er4hn 346 days ago [-]
I'd argue that a closer analogy is that I bought a laser-based measuring device. I point it at a distant point and it tells me the distance from the tip of the device to that point. Many people are excited that this tool will replace rulers and measuring tapes because of the ease of use.
However this laser measuring tool is accurate only within a range. There are a lot of factors that affect its accuracy, like time of day, how you hold it, the material you point it at, etc. Sometimes these accuracy errors are minimal, sometimes they are pretty big. You end up getting a lot of measurements that seem "close enough", but you still need to ask if each one is correct. "Measure Twice, Cut Once" begins to require one measurement with the laser tool and one with the conventional tool when accuracy matters.
One could have a convoluted analogy where the carpenter has an electric hammer that for some reason has a rounded head that does cause some number of nails to not go in cleanly, but I like my analogy better :)
johnnyanmac 346 days ago [-]
>Does a carpenter blame their hammer when it fails to drive in a screw?
That's the exact problem. I have plenty of screwdrivers but there's so much pressure from people not in carpentry telling me to use this shiny new Swiss Army knife contraption. Will it work? Probably, if I'm just screwing in a few screws. Would I readily abandon my set of precision-built, magnetic-tip, etc. screwdrivers for it? Definitely not.
I'm sure it's great for non-carpenters to have so many tools in so small a space. But I developed skills and tools already. My job isn't just to screw in a few screws a day and call it quits. People wanting to replace me for a quarter the cost for this Swiss army carpenter will quickly see a quality difference and realize why it's not a solution to everything.
Or in the software sense, maybe they are fine with unlevel shelves and hanging nails in carpet. It's certainly not work I'd find acceptable.
johnnyanmac 346 days ago [-]
> I think most programmers who claim that LLMs aren't useful are reacting emotionally. They don't want LLMs to be useful because, in their eyes, that would lower the status of programming.
I think revealing the domain each programmer works in and asking in those domains would reveal obvious trends. I imagine if you work in Web that you'll get workable enough AI gen code, but something like High Performance computing would get slop worse than copying and pasting the first result on Stackoverflow.
A model is only as good as its learning set, and not all types of code are readily indexable.
adriand 346 days ago [-]
> Lots of non-trivial programming tasks can be broken down into pure, trivial sections. Then, the non-trivial part becomes knowing how the entire system fits together.
I think that’s exactly right. I used to have to create the puzzle pieces and then fit them together. Now, a lot of the time something else makes the piece and I’m just doing the fitting together part. Whether there will come a day when we just need to describe the completed puzzle remains to be seen.
boringg 346 days ago [-]
Trivial is fine but as you compound all the triviality the system starts to have a difficult time with putting it together. I don't expect it to nail it but then you have to unwind everything and figure out the issues so it isn't all gravy - fair bit of debug.
shinycode 346 days ago [-]
It’s always harder to build a mental model of the code written by someone else. No matter what, if you trust an LLM on small things in the long run you’ll trust it for bigger things. And the more code the LLM writes, the harder it is to build this mental construct. In the end it’ll be "it worked on 90% of cases so we trust it". And who will debug 300 million lines of code written by a machine that no one read, based on trust?
jolt42 346 days ago [-]
They are useful, but so far, I haven't seen LLMs being obviously more useful than stackoverflow. It might generate code closer to what I need than what I find already coded, but it also produces buggier code. Sometimes it will show me a function I wasn't aware of or approach I wouldn't have considered, but I have to balance that with all the other attempts that didn't produce something useful.
jerb 346 days ago [-]
Yes. Productivity tools make programmer time more valuable, not less. This is basic economics. You’re now able to generate more value per hour than before.
(Or if you’re being paid to waste time, maybe consider coding in assembly?)
So don’t be afraid. Learn to use the tools. They’re not magic, so stop expecting that. It’s like anything else, good at some things and not others.
Reason077 346 days ago [-]
A good farmer isn’t likely to complain about getting a new tractor. But it might put a few horses out of work.
derefr 346 days ago [-]
I would add that a lot of the time when I'm programming, I'm an expert on the problem domain but not the solution domain — that is, I know exactly what the pseudocode to solve my problem should look like; but I'm not necessarily fluent in the particular language and libraries/APIs I happen to have to use, in the particular codebase I'm working on, to operationalize that pseudocode.
LLMs are great at translating already-rigorously-thought-out pseudocode requirements, into a specific (non-esoteric) programming language, with calls to (popular) libraries/APIs of that language. They might make little mistakes — but so can human developers. If you're good at catching little mistakes, then this can still be faster!
For a concrete example of what I mean:
I hardly ever code in JavaScript; I'm mostly a backend developer. But sometimes I want to quickly fix a problem with our frontend that's preventing end-to-end testing; or I want to add a proof-of-concept frontend half to a new backend feature, to demonstrate to the frontend devs by example the way the frontend should be using the new API endpoint.
Now, I can sit down with a JS syntax + browser-DOM API cheat-sheet, and probably, eventually write correct code that doesn't accidentally e.g. incorrectly reject zero or empty strings because they're "false-y", or incorrectly interpolate the literal string "null" into a template string, or incorrectly try to call Element.setAttribute with a boolean true instead of an empty string (or any of JS's other thousand warts.) And I can do that because I have written some JS, and have been bitten by those things, just enough times now to recognize those JS code smells when I see them when reviewing code.
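To make the three pitfalls named above concrete, here is a minimal TypeScript sketch. The helper names (isPresentNaive, isPresent, greetNaive, greet, setBooleanAttribute) and the AttributeTarget interface are hypothetical, introduced only for illustration; AttributeTarget is a stand-in for the two DOM Element methods involved, so the snippet can also run outside a browser.

```typescript
// Pitfall 1: a truthiness check rejects 0 and "" along with null/undefined.
function isPresentNaive(value: unknown): boolean {
  return Boolean(value); // treats 0 and "" as missing
}

function isPresent(value: unknown): boolean {
  // Only null and undefined count as missing; 0 and "" are kept.
  return value !== null && value !== undefined;
}

// Pitfall 2: a template string renders null as the literal text "null".
function greetNaive(name: string | null): string {
  return `Hello, ${name}`;
}

function greet(name: string | null): string {
  // "guest" is an arbitrary fallback chosen for this example.
  return `Hello, ${name ?? "guest"}`;
}

// Pitfall 3: boolean attributes are switched on with an empty string and
// switched off by removing them, not by passing true/false.
// AttributeTarget is a hypothetical subset of the DOM Element interface.
interface AttributeTarget {
  setAttribute(name: string, value: string): void;
  removeAttribute(name: string): void;
}

function setBooleanAttribute(el: AttributeTarget, name: string, on: boolean): void {
  if (on) {
    el.setAttribute(name, "");
  } else {
    el.removeAttribute(name);
  }
}
```

In a browser, any Element satisfies AttributeTarget, so setBooleanAttribute(button, "disabled", true) would work as written. This is the kind of code that is easy to recognize as right on review, yet slow to recall from scratch.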
But just because I can recognize bad JS code, doesn't mean that I can instantly conjure to mind whole blocks of JS code that do everything right and avoid all those pitfalls. I know "the right way" exists, and I've probably even used it before, and I would know it if I saw it... but it's not "on the tip of my tongue" like it would be for languages I'm more familiar with. I'd probably need to look it up, or check-and-test in a REPL, or look at some other code in the codebase to verify how it's done.
With an LLM, though, I can just tell it the pseudocode (or equivalent code in a language I know better), get an initial attempt at the JS version of it out, immediately see whether it passes the "sniff test"; and if it doesn't, iterate just by pointing out my concerns in plain English — which will either result in code updated to solve the problem, or an explanation of why my concern isn't relevant. (Which, in the latter case, is a learning opportunity — but one to follow up in non-LLM sources.)
The product of this iteration process is basically the same JS code I would have written myself — the same code I wanted to write myself, but didn't remember exactly "how it went." But I didn't have to spend any time dredging my memory for "how it went." The LLM handled that part.
I would liken this to the difference between asking someone who knows anatomy but only ever does sculpture, to draw (rather than sculpt) someone's face; vs sitting the sculptor in front of a professional illustrator (who also knows anatomy), and having the sculptor describe the person's face to the illustrator in anatomical terms, with the sketch being iteratively improved through conversation and observation. The illustrator won't perfectly understand the requirements of the sculptor immediately — but the illustrator is still a lot more fluent in the medium than the sculptor is; and both parties have all the required knowledge of the domain (anatomy) to communicate efficiently about the sculptor's vision. So it still goes faster!
coliveira 346 days ago [-]
> people who can do it will be high status
They don't have high status even today, imagine in a world where they will be seen as just reviewers for AI code...
uh_uh 346 days ago [-]
> They don't have high status even today
Try putting on a dating website that you work at Google vs you work in agriculture and tell us which yielded more dates.
johnnyanmac 346 days ago [-]
Does it matter? I imagine the tanned shirtless farmer would get more hits than the pasty million dollar salary Googler anyway. (no offense to Googlers).
With so many hits, it's about hitting all the checkmarks instead of minmaxing on one check.
uh_uh 346 days ago [-]
You can't just arbitrarily change (confounding) variables like that for a proper experiment. All other factors (including physique) must remain the same while you change one thing only: occupation.
johnnyanmac 346 days ago [-]
"confounding" implies occupation doesn't influence other factors of your life. I'm sure everyone wants the supermodel millionaire genius who's perfectly in touch with the feelings of their partner. If that was the norm then sure, farmers would be in trouble.
My comment was more a critique on online dating culture and the values it weighs compared to in person meetups.
uh_uh 346 days ago [-]
I think it’s possible to create 2 dating profiles with the same pictures and change occupation only. It doesn’t have to be real to measure the impact of occupation.
wvenable 346 days ago [-]
> Or do they have 25% trivial code?
We all have probably 25% or more trivial code. AI is great for that. I have X (table structure, model, data, etc) and I want to make Y with it. A lot of code is pretty much mindless shuffling data around.
The other thing it's good for is anything pretty standard. If I'm using a new technology and I just want to get started with whatever is the best practice, it's going to do that.
If I ever have to do PowerShell (I hate PowerShell), I can get AI to generate pretty much whatever I want and then I'm smart enough to fix any issues. But I really don't like starting from nothing in a tech I hate.
lambdasquirrel 346 days ago [-]
I’ve already had one job interview where the applicant seemed broadly knowledgeable about everything we asked them during lead-in questions before actual debugging. Then when they had to actually dig deeper or demonstrate understanding while solving some problem, they fell short.
I’m pretty sure they weren’t the first and there’ve been others we didn’t know about. So now I don’t ask lead-in questions anymore. Surprisingly, it doesn’t seem to make much of a difference and I don’t need to get burned again.
randomNumber7 346 days ago [-]
Yes but then it would be more logical to say "AI makes our devs 25% more efficient". This is not what he said, but imo you are obviously right.
wvenable 346 days ago [-]
Not necessarily. If 25% of the code is written by AI but that code isn't very interesting or difficult, it might not be making the devs 25% more efficient. It could even possibly be more but, either way, these are different metrics.
johannes1234321 346 days ago [-]
The benefit doesn't translate 1:1. The generated code has to be read and verified and might require small adaptions. (Partially that can be done by AI as well)
But for me it massively improved all the boilerplate generic work. A lot of those things which are just annoying work, but not interesting.
Then I can focus on the bigger things, on the important parts.
groestl 346 days ago [-]
> do they have 25% trivial code?
From what I've seen on Google Cloud, both as a user and from leaked source code, 25% of their code is probably just packing and unpacking of protobufs.
hughesjj 346 days ago [-]
I'd bet at least 25% of code attributed to me in gitfarm at Amazon was generated by octane and/or bones.
God I miss that, thanks for the other person on HN introducing me to projen. Yeoman wasn't cutting it.
These days I write a surprising amount of shell script and awk with LLMs. I review and adapt it, of course, but for short snippets of low context scripting it's been a huge time saver. I'm talking like 3-4, up to 20 lines of POSIX shell.
Idk. Some day I'll actually learn AWK, and while I've gotten decent with POSIX shell (and bash), it's definitely been more monkey see monkey do than me going over all the libraries and reference docs like I did for python and the cpp FAQ.
akira2501 346 days ago [-]
> isn't this announcement a terrible indictment
Of obviously flawed corporate structures. This CEO has no particular programming expertise and most of his company's profits do not seem to flow from this activity. I strongly doubt he has a grip on the actual facts here and is uncritically repeating what was told to him in a meeting.
He should, given his position, have been the very _first_ person to ask the questions you've posed here.
nimithryn 346 days ago [-]
An example:
I'm looking for a new job, so I've been grinding leetcode (oof). I'm an experienced engineer and have worked at multiple FAANGs, so I'm pretty good at leetcode.
Today I solved a leetcode problem 95% of the way to completion, but there was a subtle bug (maybe 10% of the test cases failing). I decided to see if Claude could help debug the code.
I put the problem and the code into Claude and asked it to debug. Over the course of the conversation, Claude managed to provide 5 or 6 totally plausible but also completely wrong "fixes". Luckily, I am experienced enough at leetcode, and leetcode problems are simple enough, that I could easily tell that Claude was mistaken. Note that I am also very experienced with prompt engineering, as I ran a startup that used prompt engineering very heavily. Maybe it's a skill issue (my company did fail, hence why I need a job), but somehow I doubt it.
Eventually, I found the bug on my own, without Claude's help. But leetcode problems are super simple, with known answers, and probably mostly in the training set! I can't imagine writing a big system and using an LLM heavily.
Similarly, the other day I was trying to learn about e-graphs (the data structure). I went to Claude for help. I noticed that the more I used Claude, the more confused I became. I found other sources, and as it turns out, Claude was subtly wrong about e-graphs, an uncommon but reasonably well-researched data structure! Once again, it's lucky I was able to recognize that something was up. If the problem wasn't limited in scope, I'd have been totally lost!
I use LLMs to help me code. I'm pro new technology. But when I see people bragging on Twitter about their fully automated coding solutions, or coding complex systems, or using LLMs for medical records or law or military or other highly critical domains, I seriously question their wisdom and experience.
bluerooibos 346 days ago [-]
At what point are people going to stop shitting on the code that Copilot or other LLM tools generate?
> how trivial the problems they solve are
A single line of code IS trivial. Simple code is good code. If I write the first 3 lines of a complex method and I let Copilot complete the 4th, that's 25% of my code written by an LLM.
These tools have exploded in popularity for good reason. If they were no good, people wouldn't be using them.
I can only assume people making such comments don't actually code on a daily basis and use these tools daily. Either that or you haven't figured out the knack of how to make it work properly for you.
thegrim33 346 days ago [-]
> These tools have exploded in popularity for good reason. If they were no good, people wouldn't be using them.
You're saying anything that's ever been popular is popular for a good reason? You can't think of counter examples that disprove this?
You're saying anything that people decide to do is good, or else people wouldn't do it? People never act irrationally? People never blindly act on trends? People never sacrifice long-term results for short-term gain? You can't come up with any counter examples?
bluerooibos 336 days ago [-]
I don't really care to get into a philosophical debate about what's good or what people should use.
I use these tools daily and they help me immensely. If you prefer Googling for docs, browsing stack overflow, or even flicking through textbooks to find the answers/materials you need - that's great! Do what works for you. I value my time slightly more than that and prefer not to remain stuck in the past.
Perhaps you hold all the information you need in your head like an oracle and never need to learn new concepts or ever forget syntax? Wonderful. The rest of us aren't so naturally talented, so have found these new tools super helpful.
nijave 346 days ago [-]
remembers Bitcoin et al
ghosty141 346 days ago [-]
I haven't seen anybody use them and be more productive.
With C++ my experience is that the results are completely worthless. It saves you from writing a few keywords but nothing that really helps in a big way.
Yes Copilot CAN work, for example writing some JS or filter functions, but in my job these trivial snippets are rather uncommon.
I'd genuinely love to see some resources that show its usefulness that aren't just PR bs.
bluerooibos 336 days ago [-]
> I haven't seen anybody use them and be more productive.
What does that even mean? What are you expecting to see?
I've seen people who can't code ship entire new applications which actually work, in a few days or so. That to me seems more productive?
I use these tools daily in a FAANG level SWE role and they help me debug issues quickly - all the time, especially with tech I'm new to and have no experience with. I really don't understand the hate - it's like skipping stack overflow and giving you the ideal answer a lot faster.
Nobody likes to shout that they're using these tools but most people are.
fuzzy2 346 days ago [-]
I'll just answer here, but this isn't about this post in particular. It's about all of them. I've been struggling with a team of junior devs for the past months. How would I describe the experience? It's easy: just take any of these posts, replace "AI" with "junior dev", done.
Except of course AI at least can do spelling. (Or at least I haven't encountered a problem in that regard.)
I'm highly skeptical regarding LLM-assisted development. But I must admit: it works. If paired with an experienced senior developer. IMHO it must not be used otherwise.
palata 346 days ago [-]
Isn't the whole point of hiring a junior dev that they will learn and become senior devs eventually?
johnnyanmac 346 days ago [-]
Your mindset is sadly a decade out of touch. Companies long since shifted to a churn mentality. They not only slashed retention perks, they actively expect people to move around every few years. So they don't bother stopping them or counter offering unless they are a truly exceptional person.
alfiedotwtf 346 days ago [-]
> replace "AI" with "junior dev", done.
Damn, that’s a good way of putting it. But I’ll go one further:
replace "AI" with "junior dev who doesn’t like reading documentation or googling how things work so instead confidently types away while guessing the syntax and API so it kind of looks right"
hughesjj 346 days ago [-]
I've been saying it's like an intern who has an incredible breadth of knowledge but very little depth, is excessively over confident in their own abilities given the error rates they commit, and is anxious to the point they'll straight up lie to you rather than admit a mistake.
Currently, they don't learn skills as fast as a motivated intern. A stellar intern can go from no idea to "makes relevant contributions to our product with significant independence and low error rate" (hi Matt if you ever see this) in 3 months. LLMs, to my understanding, take significantly more attention from super smart people working long hours and an army of mechanical Turks, but won't be able to independently implement a feature and will still have a higher error rate in the same 3 months.
It's still super impressive what LLMs can do, but that same intern is going to keep growing at that faster rate in skills and competency as they go from jr->mid->sr. Sure the intern won't have as large of a competency pool, and takes longer to respond to any given question, but the scope of what they can implement is so much greater.
skissane 346 days ago [-]
> To my experience, AIs can generate perfectly good code relatively easy things, the kind you might as well copy&paste from stackoverflow, and they'll very confidently generate subtly wrong code for anything that's non-trivial for an experienced programmer to write. How do people deal with this?
Well, just in the last 24 hours, ChatGPT gave me solutions to some relatively complex problems that turned out to be significantly wrong.
Did that mean it was a complete waste of my time? I’m not sure. Its broken code gave me a starting point for tinkering and exploring and trying to understand why it wasn’t working (even if superficially it looked like it should). I’m not convinced I lost anything by trying its suggestions. And I learned some things in the process (e.g. asyncio doesn’t play well together with Flask-Sock)
JohnMakin 346 days ago [-]
> To my experience, AIs can generate perfectly good code relatively easy things, the kind you might as well copy&paste from stackoverflow,
This, imho, is what is happening. In the olden days, when StackOverflow + Google used to magically find the exact problem from the exact domain you needed every time - even then you'd often need to sift through the answers (top voted one was increasingly not what you needed) to find what you needed, then modify it further to precisely fit whatever you were doing. This worked fine for me for a long time until search rendered itself worthless and the overall answer quality of StackOverflow has gone down (imo). So, we are here, essentially doing the exact same thing in a much more expensive way, as you said.
Regarding future employment opportunities - this rot is already happening and hires are coming from it, at least from what I'm seeing in my own domain.
eco 346 days ago [-]
I'd be terribly scared to use it in a language that isn't statically typed with many, many compile time error checks.
Unless you're the type of programmer that is writing sabots all day (connecting round pegs into square holes between two data sources) you've got to be very critical of what these things are spitting out.
int_19h 346 days ago [-]
I can't help but think that Go might be one of the better languages for AI to target - statically typed, verbose with a lot of repeated scaffolding, yet generally not that easy to shoot yourself in the foot. Which might explain why this is a thing at Google specifically.
randomNumber7 346 days ago [-]
It is way more scary to use it for C or C++ than Python imo.
cybrox 346 days ago [-]
If you use it as advanced IntelliSense/auto-complete, it's not any worse than with typed languages.
If you just let it generate and run the code... yeah, probably, since you won't catch the issues at compile time.
aiforecastthway 346 days ago [-]
I decided to go into programming instead of becoming an Engineer because most Engineering jobs seemed systematic and boring. (Software Engineers weren't really a thing at the time.)
For most of my career, Software Engineering was a misnomer. The field was too young, and the tools used changed too quickly, for an appreciable amount of the work to be systematic and boring enough to consider it an Engineering discipline.
I think we're now at the point where Software Engineering is... actually Engineering. Particularly in the case of large established companies that take software seriously, like Google (as opposed to e.g. a bank).
Call it "trivial" and "boring" all you want, but at some point a road is just a road, and a train track is just a train track, and if it's not "trivial and boring" then you've probably fucked up pretty badly.
javaunsafe2019 346 days ago [-]
Since when is engineering boring? Strange ideas and claims you made.
I’m an engineer who has been writing code for 20 years and it’s far from trivial. Maybe doing web dev for a simple webshop is. Elsewhere software oftentimes has special requirements.
Be they technical or domain-wise, both make the process complex and not simple IMHO.
aiforecastthway 346 days ago [-]
Boring is the opposite of exciting/dynamic.
Not all engineering is boring. Also, boring is not bad.
A lot of my career has been spent working to make software boring. To the extent that I've helped contribute to the status quo, where we can build certain types of software in a relatively secure fashion and on relatively predictable timelines, I am proud to have made the world more boring!
(Also, complexity can be extraordinarily boring. Some of the most complex things are also the most boring. Nothing more boring than a set of business rules that has an irreducible complexity coming in at 5,211 lines of if-else blocks wrapped in two while loops! Give me a simple set of partial differential equations any day -- much more exciting to work with those! If you're the type of person who enjoys reading tax code, then we just have different definitions of boring; and if you're the type of person who doesn't think tax code is complex, then I'm just a dummy compared to you :))
But e.g. in the early naughts doing structural engineering work for residential new build projects was certainly less engaging and exciting work than building websites.
Most engineering work aims for repeatable and predictable outcomes. That's a good thing, and it's not easy to achieve! But if Software has reached the point where the process of building certain types of software is "repeatable and predictable", and if Google needs a lot of that type of software, then if the main criticism of AI code assistants is "it's only good for repeatable and predictable", well, then the criticism isn't exactly the indictment that skeptics think it is.
There is nothing wrong with boring in the sense I'm using it. Boring can be tremendously intellectually demanding. Also, predictable and repeatable processes are incredibly important if you want quality work at scale. Engineering is a good thing. Maturing as a field is a good thing.
But if we're maturing from "wild west everything is a greenfield project" to "70% of things are pretty systematic and repeatable" then that says something about the criticism of AI coding assistants as being only good for the systematic and repeatable stuff, right?
Also: the AI coding assistant paradigm is coming for structural/mechanical/civil engineering next, and in a big way!
sally_glance 346 days ago [-]
I was totally with you until "70% of things are pretty systematic and repeatable". This has not been my experience, and I think you acknowledged it yourself when you said "Google (as opposed to e.g. a bank)" - there are many more banks in the world than Googles. The main challenge will be transitioning all those "banks" to "Googles" and further still. They have 10y+ codebases written in 5 months by a single genius engineer (who later found his luck elsewhere), then hammered by multiple years of changing maintainers. That's the real "70% of things" :D
aiforecastthway 346 days ago [-]
No, I think we agree! Google SWE roles will be automated faster than SWE roles in the financial sector :)
grepLeigh 346 days ago [-]
I have a whole "chop wood, carry water" speech born from leading corporate software teams. A lot of work at a company of sufficient size boils down to keeping up with software entropy while also chipping away at some initiative that rolls up to an OKR. It can be such a demotivating experience for the type of smart, passionate people that FAANGs like to hire.
There's even a buzzword for it: KTLO (keep the lights on). You don't want to be spending 100% of your time on KTLO work, but it's unrealistic to expect to do none of it. Most software engineers would gladly outsource this type of scutwork.
girvo 346 days ago [-]
> KTLO (keep the lights on)
Some places also call this "RTB" for "run the business" type work. Nothing but respect for the engineers who enjoy that kind of approach, I work with several!
asdfman123 346 days ago [-]
No, AI is generating a quarter of all characters. It's an autocomplete engine. You press tab, it finishes the line. Doesn't do any heavy lifting at all.
Source: I work there, see my previous comment.
dmurray 346 days ago [-]
> Or do they have 25% trivial code?
Surely yes.
I (not at Google) rarely use the LLM for anything more than two lines at a time, but it writes/autocompletes 25% of my code no problem.
I believe Google have character-level telemetry for measuring things like this, so they can easily count it in a way that can be called "writing 25% of the code".
Having plenty of "trivial code" isn't an indictment of the organisation. Every codebase has parts that are straightforward.
Nasrudith 346 days ago [-]
I wouldn't call it an indictment necessarily, because so much is dependent upon circumstances. They can't all be "deep problems" in the real world. Projects tend to have two components: "deep" work, which is difficult, requires high skill, and cannot be made up for by using masses of inexperienced people; and "shallow" work, where being skilled doesn't really help, or doesn't help too much compared to throwing more bodies at the problem. To use an example, it is like advanced accounting vs just counting up sales receipts.
Even if their engineers were inexperienced that wouldn't be an indictment in itself so long as they had a sufficient necessary amount of shallow work. Using all experienced engineers to do shallow work is just inefficient, like having brain surgeons removing bunions. Automation is basically a way to transform deep work to a producer of "free" shallow work.
That said, the real impressive thing with code isn't in its creation but in its ability to losslessly delete code and maintain or improve functionality.
hifromwork 346 days ago [-]
25% trivial code sounds like a reasonable guess.
fzysingularity 346 days ago [-]
This seems reasonable - but I'm interpreting this as most junior-level coding needs will end and be replaced with AI.
mrguyorama 346 days ago [-]
And the non junior developers will then just magically appear from the aether! With 10 years experience in a four year old stack.
TacticalCoder 346 days ago [-]
> and they'll very confidently generate subtly wrong code for anything that's non-trivial for an experienced programmer to write
Thankfully I don't find it subtle but plain wrong for anything but trivial stuff. I use it (and pay an AI subscription) for things where false positives won't ruin the day, like parameter validation.
But for anything advanced, it's pretty hopeless.
I've talked with lawyers: same thing. With doctors: same thing.
Which ain't no surprise seeing how these things work.
> Like, isn't this announcement a terrible indictment of how inexperienced their engineers are, or how trivial the problems they solve are, or both?
Probably lots of highly repetitive boilerplate stuff everywhere. Which in itself is quite horrifying if you think about it.
sangnoir 346 days ago [-]
> Does Google now have 25% subtly wrong code?
How do you quantify "new code" - is it by lines of code or number of PRs/changesets generated? I can easily see it being the latter - if an AI workflow suggests 1 naming-change/cleanup commit to your PR made of 3 other human-authored commits, has it authored 25% of code? Arguably, yes - but it's trivial code that ought to be reviewed by humans. Dependabot is responsible for a good chunk of PRs already.
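As a rough illustration of why the unit of measurement matters, here is a minimal TypeScript sketch that scores the same hypothetical PR two ways. The Commit shape, the function names, and the per-commit line counts are all made up for the example; nothing here reflects how Google actually measures it.

```typescript
// Hypothetical record of one commit in a PR.
interface Commit {
  author: "human" | "ai";
  linesChanged: number;
}

// Share of commits that were AI-authored.
function shareByCommits(commits: Commit[]): number {
  if (commits.length === 0) return 0;
  const aiCommits = commits.filter((c) => c.author === "ai").length;
  return aiCommits / commits.length;
}

// Share of changed lines that came from AI-authored commits.
function shareByLines(commits: Commit[]): number {
  const total = commits.reduce((sum, c) => sum + c.linesChanged, 0);
  if (total === 0) return 0;
  const aiLines = commits
    .filter((c) => c.author === "ai")
    .reduce((sum, c) => sum + c.linesChanged, 0);
  return aiLines / total;
}

// Three human-authored commits plus one small AI cleanup commit.
// The line counts are invented purely for illustration.
const pr: Commit[] = [
  { author: "human", linesChanged: 120 },
  { author: "human", linesChanged: 80 },
  { author: "human", linesChanged: 45 },
  { author: "ai", linesChanged: 5 },
];

console.log(shareByCommits(pr)); // 1 of 4 commits
console.log(shareByLines(pr)); // 5 of 250 changed lines
```

With these made-up numbers the commit-based share is 25% while the line-based share is 2%, so the same PR supports very different headlines depending on what gets counted.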
Having a monorepo brings plenty of opportunities for automation when refactoring - whether it's AI, AST manipulation or even good old grep. The trick is not to merge the code directly, but have humans in the loop to approve, or take-over and correct the code first.
ants_everywhere 346 days ago [-]
Google's internal codebase is nicer and more structured than the average open source code base.
Their internal AI tools are presumably trained on their code, and it wouldn't surprise me if the AI is capable of much more internally than public coding AIs are.
geodel 346 days ago [-]
> Like, isn't this announcement a terrible indictment of how inexperienced their engineers are..
Well, Rob Pike said the same thing about experience and that seemed to piss off a lot of people endlessly.
However, I don't think of it as an indictment. It just seems very reasonable to me. In fact, 25% seems to be on the lower end. Amazon seems to have thousands of software engineers who are doing API calling API calling API.. kind of crap. Now their annual income might be more than my lifetime earnings. But to think that all these highly paid engineers are doing highly complex work that needs high skills seems just a myth that is useful to boost the egos of engineers and their employers alike.
afavour 346 days ago [-]
> Or do they have 25% trivial code?
If anything that's probably an underestimate. Not to downplay the complexity in much of what Google does but I'm sure they also do an absolute ton of tedious, boring CRUD operations that an AI could write.
herval 346 days ago [-]
In my experience, that was always the case with gpt3.5, most times the case with gpt4, sometimes the case with the latest sonnet. It’s getting better FAST, and the kind of code they can handle is increasing fast too
djvuvtgcuehb 346 days ago [-]
A better analogy is a self driving car where you need to keep your hands on the wheel in case something goes wrong.
For the most part, it drives itself.
Yes, the majority of my code is trivial. But I've also had ai iterate on some very non trivial work including writing the test suite.
It's basically autocomplete on steroids that predicts your next change in the file, not just the next change on the line.
The copy paste from stack overflow trope is a bit weird, I haven't done that in ten years and I don't think the code it produces is that low quality either. Copy paste from an open source repo on GitHub maybe?
pjmorris 346 days ago [-]
> Like, isn't this announcement a terrible indictment of how inexperienced their engineers are, or how trivial the problems they solve are, or both?
Or maybe there's a KPI around lines of code or commits.
fsckboy 346 days ago [-]
> Does Google now have 25% subtly wrong code?
maybe the ai generates 100% of the company's new code, and then by the time the programmers have fixed it, only 25% is left of the AI's ship of Theseus
rh2323o4jl234 346 days ago [-]
> Does Google now have 25% subtly wrong code?
I think you underestimate the amount of boiler-plate code that a typical job at Google requires. I found it soul-crushingly boring (though their pay is insane).
airstrike 346 days ago [-]
By definition, "trivial" code should make up a significant portion of any code base, so perhaps the 25% is precisely the bit that is trivial and easily automated.
Smaug123 346 days ago [-]
I don't think the word "definition" means what you think it means!
manquer 346 days ago [-]
If their sales and stock depend on saying that the new shiny thing is changing the world, then they have to say so, and say how it is changing their world.
It is not Netflix or Airbnb or Stripe etc. making this claim; Google managers have a vested interest in this.
If this metric were meaningful, one of two things should have happened: Google should have fired 25% of its developers or built 25% more product.
Either of these would be visible in their financial reporting, and neither has happened.
A metric like this depends on how you count; it is easily gamed and can be made to show any % between 0 and 99 you want. Off the top of my head:
- I could count all AI generated code used for training as new code
- consider compiler output to assembly as AI code by adding some meaningless AI step in it
- count code generated from boilerplate, perhaps even generated by an LLM now
- mix autocomplete with LLM prompts, and so on
The number only needs to be believable. 25 is believable now; it is not true, but you would believe it. More than 50 has psychological significance and is bad PR about machines replacing human jobs; less than 10 is bad for AI sales. 25 works, and all the commenters in this thread are testament to that.
jajko 346 days ago [-]
I can generate in eclipse pojo classes or their accessor methods. I can let maven build entire packages from say XSDs (I know I am talking old boring tech, just giving an example). I can copy&paste half the code (if not more) from stack overflow.
Now replace all this and much more with 'AI'. If they said AI helped them increase, say, ad effectiveness by 3-5%, I'll start paying attention.
signa11 346 days ago [-]
> Like, isn't this announcement a terrible indictment of how inexperienced their engineers are, or how trivial the problems they solve are, or both?
There is a third possibility as well: having spent a huge chunk of change on these techniques, why not overhype it (not outright lie about it) and hope to somewhat recoup the cost from the unsuspecting masses?
cybrox 346 days ago [-]
Depends if they include test code in this metric. I have found AI most valuable in generating test code. I usually want to keep tests as simple as possible, so I prefer some repetition over abstraction to make sure there are no issues with the test logic itself. AI makes this somewhat verbose process very easy and efficient.
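As a rough illustration of the "repetition over abstraction" style of test described above, here is a small sketch. The `apply_discount` function and its test cases are hypothetical, picked only to show the shape of the tests.

```python
def apply_discount(price_cents, percent):
    """Hypothetical function under test: integer-cents price after a discount."""
    return price_cents - price_cents * percent // 100


# Each test spells out its own inputs and expected value. There is no shared
# helper or loop, so there is no test logic that could itself be wrong.
def test_no_discount():
    assert apply_discount(1000, 0) == 1000


def test_ten_percent_discount():
    assert apply_discount(1000, 10) == 900


def test_full_discount():
    assert apply_discount(1000, 100) == 0


if __name__ == "__main__":
    test_no_discount()
    test_ten_percent_discount()
    test_full_discount()
    print("all tests passed")
```

The three tests are nearly identical, which is exactly the kind of verbose but low-risk code that an autocomplete-style tool tends to fill in well.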
sally_glance 346 days ago [-]
I guess the obvious response would be - yes, they have _at least_ 25% trivial code (as any other enterprise), and yes, they should have lots of engineers 'babysitting' (aka generating training data). So in another year or two there will be no manpower at all needed for the trivial tasks.
skeeter2020 346 days ago [-]
trivial code could very easily include the vast majority of most apps we're building these days. Most of it's just glue, and AI can probably stitch together a bunch of API calls and some UI as well as a human. It could also be a lot of non-product code, tooling, one-time things, etc.
Cthulhu_ 346 days ago [-]
You're quick to jump to the assertion that AI only generates SO style utility code to do X, but it can also be used to generate boring mapping code (e.g. to/from SQL datasets). I heard one ex-Google dev say that most of his job was fiddling with Protobuf definitions and payloads.
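For anyone who hasn't written this kind of thing, a minimal sketch of "boring mapping code" to and from SQL rows might look like the following. It uses Python's standard `sqlite3` module; the `User` type, table, and column names are invented for the example.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class User:
    id: int
    name: str
    email: str


def user_from_row(row):
    """Map a database row to a User (field-by-field, nothing clever)."""
    return User(id=row["id"], name=row["name"], email=row["email"])


def user_to_params(user):
    """Map a User to named SQL parameters."""
    return {"id": user.id, "name": user.name, "email": user.email}


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us index rows by column name
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute(
    "INSERT INTO users (id, name, email) VALUES (:id, :name, :email)",
    user_to_params(User(1, "Ada", "ada@example.com")),
)

row = conn.execute("SELECT id, name, email FROM users WHERE id = ?", (1,)).fetchone()
loaded = user_from_row(row)
print(loaded)
```

Multiply the two mapping functions by every table and every message type and it is easy to see how a large share of new lines could be this sort of code.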
aorloff 346 days ago [-]
It's been a while since I was really fully in the trenches, but not that long.
How people deal with this is they start by writing the test case.
Once they have that, debugging that 25% comes relatively easily, and after that it's basically packaging up the PR.
andyjohnson0 346 days ago [-]
I suspect that a lot of the hard, google-scale stuff has already been done and packaged as an internal service or library - and just gets re-used. So the AIs are probably churning out new settings dialogs and the like.
vkou 346 days ago [-]
How would you react to a tech firm that in 2018, proudly announced that 25% of their code was generated by IntelliJ/Resharper/Visual Studio's codegen and autocomplete and refactoring tools?
nwellinghoff 346 days ago [-]
They probably have ai that scans existing human written code and auto generates patches and fixes to improve performance or security. The 25% is just a top level stat with no real meaning without context.
jjtheblunt 346 days ago [-]
Maybe the trick is to hide vetted correct code, of whatever origin, behind function calls for documented functions, thereby iteratively simplifying the work a later-trained LLM would need to do?
uoaei 346 days ago [-]
I've suspected for a while now that the people who find value in AI-generated code don't actually have hard problems to solve. I wonder how else they might justify their salary.
dyauspitr 346 days ago [-]
This subtly wrong thing happens maybe 10% of the time in my experience and asking it to generate unit tests or writing your own ahead of time almost completely eliminates it.
notyourwork 346 days ago [-]
To your point, I don't buy the truth of the statement. I work in big tech and am convinced that 25% of the code being written is not coming from AI.
ZiiS 346 days ago [-]
Yes 25% of code is trivial; certainly for companies like Google that have always been a bit NIH.
tmoravec 346 days ago [-]
Does the figure include unit tests?
ithkuil 346 days ago [-]
Or perhaps that even for excellent engineers and complicated problems a quarter of the code one writes is stupid almost copy-pasteable boilerplate which is now an excellent target for the magic lArge text Interpolator
Kiro 346 days ago [-]
You're doing circular reasoning based on your initial concern actually being a problem in practice. In my experience it's not, which makes all your other speculations inherently incorrect.
toviahudson5 345 days ago [-]
[dead]
name_nick_sex_m 346 days ago [-]
[flagged]
llm_trw 346 days ago [-]
Or alternatively you don't know how to use AI to help you code and are in the 2020s equivalent of the 'Why do I need google when I have the yellow pages?' phase a lot of adults went through in the 2000s.
This is not a bad thing since you can improve, but constantly dismissing something that a lot of people are finding an amazing productivity boost should give you some pause.
johnnyanmac 346 days ago [-]
It's like blockchain right now. I'm sure there is some killer feature that can justify its problem space.
But as of now the field is full of swamps. Of grifters, of people with a solution looking for a problem. Of outright scams of questionable legality being challenged as we speak.
I'll wait until the swamps work itself out before evaluating an LLM workflow.
llm_trw 346 days ago [-]
Blockchain was always a solution looking for a problem.
LLMs are being used right now by a lot of people, myself included, to do tasks which we would have never bothered with before.
Again, if you don't know how to use them you can learn.
johnnyanmac 346 days ago [-]
And the same was said with the last fad when Blockchain was all investors wanted to hear about ("Big Data" I suppose). It's all a pattern.
It's a legal nightmare in my domain as of now, so I'll make sure the Sam Breaker-Friends are weeded out. If it's really all the hype it won't be going anywhere in 5 years.
llm_trw 345 days ago [-]
It's been 5 years since GPT2. I'm really struggling to understand the amount of negativity towards the biggest breakthrough in computing since the WWW.
johnnyanmac 345 days ago [-]
If you're unaware of the general mood towards big tech in the 2020s, the downward trend of the economy, extreme speculation in all of the tech sector over AI (which again, is not new), and the dozens of ethical quandaries around how LLMs obtain their data sets, then yes, I can see why you're struggling to understand. There's so much literature on each point that I will only implore you to research these things on your own time if you care to.
In a purely technical vacuum though: it is truly amazing tech. I will give it that. Although it both excites and alarms me that apparently the power output predicted to properly leverage this at scale is having tech companies consider an investment in nuclear power.
llm_trw 345 days ago [-]
Yes it's wonderful that AI will solve global warming as a side effect.
johnnyanmac 344 days ago [-]
This is kind of why I'm skeptical of AI. When supposed tech experts are wearing rose tinted lenses and missing the red flags, it's either because they want to wear them or because their livelihood depends on wearing them.
I won't blame people for the latter, I'd love a good quick way out of traditional work as well (gives me more time to hack on stuff without money troubles). But it's not a good model for curiosity and scrutiny. Again, I'll wait it out. Take care.
llm_trw 344 days ago [-]
The rose tinted glasses are everyone expecting batteries to become a major part of the grid so we don't have to shut it down when the sun isn't shining.
Investing trillions in carbon free energy for AI is the most benign form of bubble I can imagine. If the bubble pops we have enough base load for the next century and don't die from climate change. If it doesn't we have the expertise to keep building large nuclear power plants.
ajkjk 345 days ago [-]
Well yeah he sells AI and wants you to believe in it so the stock price stays good.
_hcuq 345 days ago [-]
Yeah. I just wrote 600 lines of SQL using a macro processor. Took 10 minutes.
throwaway290 346 days ago [-]
"More than a quarter of our code is created by autocomplete!"
That's not that much...
odinkara 345 days ago [-]
and it shows
jagged-chisel 346 days ago [-]
“Created by” or “with the assistance of”?
josephd79 346 days ago [-]
that explains everything.
DidYaWipe 346 days ago [-]
No wonder it sucks. Google's vaunted engineering has always been suspect, but their douchebaggery has been an accepted fact (even by them).
nephy 346 days ago [-]
Can we move on to the next grift yet?
bamboozled 346 days ago [-]
"Product has not improved, or maybe even become worse in that time"
martin82 346 days ago [-]
I guess that must be the reason for the shocking enshittification of Google
yapyap 346 days ago [-]
yikes
pixelat3d 347 days ago [-]
[flagged]
jrockway 347 days ago [-]
When I was there, way more than 25% of the code was copying one proto into another proto, or so people complained. What sort of memes are people making now that this task has been automated?
hn_throwaway_99 347 days ago [-]
I am very interested in how this 25% number is calculated, and if it's a lot of boilerplate that in the past would have just been big copy-paste jobs like a lot of protobuffers work. Would be curious if any Googlers could comment.
Not that I'm really discounting the value of AI here. For example, I've found a ton of value and saved time getting AI to write CDKTF (basically, Terraform in Typescript) config scripts for me. I don't write Terraform that often, there are a ton of options I always forget, etc. So asking ChatGPT to write a Terraform config for, say, a new scheduled task for example saves me from a lot of manual lookup.
But at the same time, the AI isn't really writing the complicated logic pieces for me. I think that comes down to the fact that when I do need to write complicated logic, I'm a decent enough programmer that it's probably faster for me to write it out in a high-level programming language than write it in English first.
dietr1ch 347 days ago [-]
I miss old memegen, but it got ruined by HR :/
rcarmo 347 days ago [-]
I am reliably told that it is alive and well, even if it’s changed a bit.
anon1243 346 days ago [-]
Memegen is there but unrecognizable now. A dedicated moderator team deletes memes, locks comments, bans people for mentioning "killing a process" (threatening language!) and contacts their managers.
dietr1ch 346 days ago [-]
Yup, I simply stopped using it, which means they won.
kev009 347 days ago [-]
I would hope a CEO, especially a technical one, would have enough sense to couple that statement to some useful business metric, because in isolation it might be announcement of public humiliation.
dmix 347 days ago [-]
The elitism of programmers who think the boilerplate code they write for 25% of the job, that's already been written before by 1000 other people before, is in fact a valuable use of company time to write by hand again.
IMO it's only really an issue if a competent human wasn't involved in the process, basically a person who could have written it if needed, then they do the work connecting it to the useful stuff, and have appropriate QA/testing in place...the latter often taking far more effort than the actual writing-the-code time itself, even when a human does it.
marcosdumay 347 days ago [-]
If 25% of your code is boilerplate, you have a serious architectural problem.
That said, I've seen even higher ratios. But never in any place that survived for long.
hn_throwaway_99 347 days ago [-]
Depends on how you define "boilerplate". E.g. Terraform configs count for a significant number of the total lines in one of my repos. It's not really "boilerplate" in that it's not the exact same everywhere, but it is boilerplate in the sense that setting up, say, a pretty standard Cloud SQL instance can take many, many lines of code just because there are so many config options.
marcosdumay 346 days ago [-]
Terraform is verbose.
It's only boilerplate if you write it again to set almost the same thing again. Which, granted, if you are writing bare Terraform config, it's probably both.
But in either case, if your Terraform config is repetitive and a large part of the code of an entire thing (not a repo, repos are arbitrary divisions; maybe "product", but that's also a bad name), then that thing is certainly close to useless.
cryptoz 347 days ago [-]
Android mobile development has gotten so …architectured that I would guess most apps have a much higher rate of “boilerplate” than you’d hope for.
Everything is getting forced into a scalable, general purpose way, that most apps have to add a ridiculous amount of boilerplate.
TheNewsIsHere 347 days ago [-]
To add: it’s been my experience that it’s the company that thinks the boilerplate code is some special, secret, proprietary thing that no other business could possibly have produced.
Not the developer who has written the same effective stanza 10 times before.
8note 347 days ago [-]
Is it though? It seems to me like a team ownership boundary question rather than an architecture question.
Architecturally, it sounds like different architecture components map somewhere close to 1:1 to teams, rather than teams hacking components to be closer coupled to each other because they have the same ownership.
I'd see too much boilerplate as being an organization/management issue rather than a code architecture issue.
wvenable 346 days ago [-]
25% of new code might be boilerplate. All my apps in my organization start out roughly the same way with all the same stuff. You could argue on day one that 100% of the code is boilerplate and by the end of the project it is only a small percentage.
dmix 347 days ago [-]
You're probably thinking of just raw codebases, your company source code repo. Programmers do far, far more boilerplate stuff than raw code they commit with git. Debugging, data processing, system scripts, writing SQL queries, etc.
Combine that with generic functions, framework boilerplate, OS/browser stuff, or explicit x-y-z code and your 'boilerplate' (i.e. repetitive, easily reproducible) easily gets to 25% of the code your programmers write every month. If your job is >75% pure human cognition problem solving you're probably in a higher tier of jobs than the vast majority of programmers on the planet.
kev009 347 days ago [-]
Doing the same thing but faster might just mean you are masturbating more furiously. Show me the money, especially from a CEO.
mistrial9 347 days ago [-]
you probably underestimate the endless miles of verbose code that are possible, by human or machine but especially by machine.
dyauspitr 347 days ago [-]
Or a statement of pride that the intelligence they created is capable of lofty tasks.
joeevans1000 347 days ago [-]
I read these threads and the usual 'I have to fix the AI code for longer than it would have taken to write it from scratch' and can't help but feel folks are truly trying to downplay what is going to eat the software industry alive.
steve_adams_86 346 days ago [-]
I’m not convinced it’s there yet. I think it’s actively eating part of the software industry, but I wonder where that’ll stop—at least for some time—and a new shape of the industry is settled upon.
There are still things I do in my IDE that I can’t seem to get AI to do. It’s not really close yet. I don’t doubt it could get there eventually, but I suppose I don’t believe it’s about to eat those parts of the industry.
I do anticipate a massive issue from lower skill software jobs vanishing. I don’t know what entry into the industry will look like. There will be a strange gap that’s filled by AI and some people who use it to do basic things but have no idea how it does it. They will be somewhat like data entry workers, knowing how to use a spreadsheet or word processor but having no idea how the program actually works let alone the underlying operating system. I fully expect that to happen, and I can’t properly imagine what the implications will be.
tylerchilds 347 days ago [-]
if the golden rule is that code is a liability, what does this headline imply?
eddd-ddde 347 days ago [-]
The code would be getting written anyway, it's an invariant. The difference is less time wasted typing keys (albeit a small amount of time) and, more importantly (in my experience), it helps A LOT with discoverability.
With g3's immense amount of context, LLMs can vastly help you discover how other people are using existing libraries.
tylerchilds 347 days ago [-]
my experience dabbling with the ai and code is that it is terrible at coming up with new stuff unless it already exists
in regards to how others are using libraries, that’s where the technology will excel— re-writing code. once it has a stable AST to work with, the mathematical equation it is solving is a refactor.
until it has that AST that solves the business need, the game is just prompt spaghetti until it hits altitude to be able to refactor.
JimDabell 347 days ago [-]
Nothing at all. The headline talks about the proportion of code written by AI. Contrary to what a lot of comments here are assuming, it does not say that the volume of code written has increased.
Google could be writing the same amount of code with fewer developers (they have had multiple layoffs lately), or their developers could be focusing more of their time and attention on the code they do write.
contravariant 346 days ago [-]
Well, either they just didn't spend as much time writing the code or they increased their liability by about 33%.
The truth is likely somewhere in between.
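The 33% figure falls out of simple arithmetic, sketched below under one stated assumption: humans kept writing the same volume of code as before and the AI's 25% share came on top of that. The function name is made up for illustration.

```python
from fractions import Fraction


def extra_code_ratio(ai_share):
    """Extra code relative to the human-written baseline.

    Assumes human output is unchanged and the AI-written code is added on top.
    If AI code is `ai_share` of the new total, humans wrote (1 - ai_share) of it,
    so the extra code relative to the human part is ai_share / (1 - ai_share).
    """
    if not 0 <= ai_share < 1:
        raise ValueError("ai_share must be in [0, 1)")
    return ai_share / (1 - ai_share)


growth = extra_code_ratio(Fraction(1, 4))
print(growth)         # 1/3
print(float(growth))  # roughly 0.33, i.e. about 33% more code
```

The other extreme is that total code volume stayed flat and people simply typed 25% less of it, in which case the liability did not grow at all.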
danielmarkbruce 347 days ago [-]
I'm sure google won't pay you money to take all their code off their hands.
AlexandrB 347 days ago [-]
But they would pay me money to audit it for security.
danielmarkbruce 347 days ago [-]
yup, you can get paid all kinds of money to fix/guard/check billion/trillion dollar assets..
an_d_rew 347 days ago [-]
Huh.
That may explain why google search has, in the past couple of months, become so unusable for me that I switched (happily) to kagi.
twarge 347 days ago [-]
Which uses Google results?
croes 347 days ago [-]
Related?
> New tool bypasses Google Chrome’s new cookie encryption system
Google is now mass-producing techdebt at rates not seen since Martin Fowler’s first design pattern blogposts.
nelup20 346 days ago [-]
We've now entered the age of exponential tech debt, it'll be a sight to behold
joeevans1000 347 days ago [-]
Not really technical debt when you will be able to regenerate 20K lines of code in a minute then QA and deploy it automatically.
1attice 347 days ago [-]
Assuming, of course:
- You know which 20K lines need changing
- You have perfect QA
- Nothing ever goes wrong in deployment.
I think there's a tendency in our industry to only take the hypotenuse of curves at the steepest point
TheNewsIsHere 347 days ago [-]
That is a fantastic way to put it. I’d argue that you’ve described a bubble, which fits perfectly with the topic and where _most_ of it will eventually end up.
kibwen 347 days ago [-]
So a fresh, new ledger of technical debt every morning, impossible to ever pay off?
Tier3r 347 days ago [-]
Google is getting enshittified. It's already visible in many small ways. I was just using Google maps and in the route they called X (bus) Interchange as X International. I can only assume this happened because they are using AI to summarise routes now. Why in the world are they doing that? They have exact location names available.
microtherion 347 days ago [-]
[flagged]
Tiktaalik 347 days ago [-]
[flagged]
calmbonsai 347 days ago [-]
[flagged]
pyuser583 347 days ago [-]
[flagged]
YPPH 347 days ago [-]
Actually 0%, assembly language is assembled to machine code, not compiled.
ndesaulniers 347 days ago [-]
Inline asm has to go through the compiler to get wired up by the register allocator.
floor_ 346 days ago [-]
So no one owns a quarter of the new code at google. It's going to be very funny when it hits 100%.
FactKnower69 347 days ago [-]
[flagged]
eob 347 days ago [-]
So GCS customers will trust their codegen product. (Engineers aren’t the buyer; corp suite is)
hn_throwaway_99 347 days ago [-]
I don't understand why you think this at all. Care to explain?
dartharva 347 days ago [-]
Why? Especially when said AI helpers are a part of what the company itself is selling?
joeevans1000 347 days ago [-]
These companies are competing to be the next codegen service provider.
foota 347 days ago [-]
Translation: They'd love to lay off all the engineers.
sfmz 347 days ago [-]
We should watch for dev layoffs as a sign/signal of the impact of generated code. I remember reading about an anime shop that fired 80% of its illustrators due to ai-images.
TheNewsIsHere 347 days ago [-]
By some intuitive measures, it’s surprising they have very many still writing their code. Google’s product quality isn’t what it once was. There is no amount of AI accelerators and energy they can burn through to fix that without humans.
lesuorac 347 days ago [-]
Well, the article has a paywall so it might go into this.
I'm not sure this stat is as important as people make it out to be. If I start off with `for` and the AI auto-completes `for(int i=0; i<args.length; i++) {` then a lot more than 25% of the code is AI written, but it's also not significant. I could've figured out how to write the for-loop, and it's also not a meaningful amount of time saved because most of the time goes to figuring things out and testing, which the AI doesn't do.
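A quick sketch of that for-loop case, counting by characters. This is only one possible way to attribute a line to "AI"; the per-character metric is an assumption made for the example, not a known Google methodology.

```python
typed = "for"                                     # what the human typed
completed = "for(int i=0; i<args.length; i++) {"  # line after accepting the suggestion

# The completion must extend what was typed for this attribution to make sense.
assert completed.startswith(typed)

ai_chars = len(completed) - len(typed)
ai_share = ai_chars / len(completed)

print(f"{ai_chars} of {len(completed)} characters came from the completion")
print(f"AI share of this line: {ai_share:.0%}")
```

By this count the line is overwhelmingly "AI written", even though the human decided a loop was needed and would have typed the same thing a few seconds later.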
dyauspitr 347 days ago [-]
I don’t think the public cares whether their code is written by machines or real people as long as the product works.
Nullabillity 347 days ago [-]
Just today, Google Calendar asked me whether I wanted the "easy" or "depressed" colour scheme.
mattigames 347 days ago [-]
It's for when you have an upcoming funeral; the calendar is just trying to dress appropriately.
Mistletoe 347 days ago [-]
Ironically, your comment brightened my day.
bakugo 347 days ago [-]
[flagged]
mergisi 347 days ago [-]
[dead]
xyst 346 days ago [-]
I remember Google used to market "lines of code" for their products. Chrome at one point had 6.7 LoC. Now the new marketing term is: "product was made with 1M lines of AI generated code (slop)!11!". Or "Chrome refactored with 10% AI" or some bs
klocksib 345 days ago [-]
it's quicker and easier than ever to generate a project to send to the Google Graveyard.
ultra_nick 347 days ago [-]
Why work at big businesses anymore? Let's just create more startups.
IAmGraydon 347 days ago [-]
Risk appetite.
game_the0ry 346 days ago [-]
Not so sure nowadays. Given how often big tech lays off employees and the abundance of recently laid off tech talent, trying to start your own company sounds a lot more appealing than ever.
I consider myself risk-averse and even I am contemplating starting a small business in the event I get laid off.
shiroiushi 346 days ago [-]
> trying to start your own company sounds a lot more appealing than ever.
It really isn't. Even if you get laid off from a large tech company, you probably didn't have to pay a cent to get the job there in the first place, and you started drawing a paycheck right away (after the initial delay due to the pay cycle). If you only work there for 6 months, you can save a really good amount of money if you have frugal habits.
Starting a company isn't nearly as easy, usually requires up-front investment, and there can be a long time before you generate any profit. Either you need some business idea that's going to generate profit (or at least enough revenue to give the founder(s) a paycheck), or a business loan or other funding, which means convincing someone to invest in your company somehow.
Starting your own company only sounds appealing if you ignore reality, or have the privilege of having plenty of cash saved up for such a venture.
game_the0ry 344 days ago [-]
I look at it differently.
If I was working a typical corp job, I would "quiet quit" and start using my excess savings to run small experiments. Maybe run like 3 in parallel a month and let them cook.
Example - start a niche blog based on my hobby, hire 3 writers and pay them up to $3k a month to write content for the blog. Let it cook for months. If it gets traffic, monetize with ads. If very profitable, quit job.
Starting an internet based business is fairly cheap nowadays, especially with Cursor and AI.
wayoverthecloud 346 days ago [-]
Interesting. I think the same thing but I wonder if the market is not ready for products created by the big guys, what can I offer? Have you thought in that line?
IAmGraydon 344 days ago [-]
You’re thinking about it wrong. Most large companies won’t put development time into an idea that would only make them $1-5M per year. On the other hand, $1-5M per year is great money to an individual. So there’s a lot of untouched markets that can make you rich but just don’t interest the big guys.
All of that said, there are a lot of products that are produced by large companies and are just bad. Don’t be afraid to go after a Goliath if you see an opportunity.
game_the0ry 344 days ago [-]
Well the whole point is that you have some edge that the "big guys" cannot compete with or you have discovered an opportunity they have not (making you ripe for acquisition).
New successful businesses are being created all the time. We just focus on the ones that have already been successful for a long time.
1oooqooq 347 days ago [-]
this only means employees sign up to use new toys and they are paying enough seats for all employees.
it's like companies paying all those todolist and tutorial apps left running on aws ec2 instances in 2007ish.
I'd be worried if i were a google investor. lol.
fragmede 347 days ago [-]
I'm not sure I get your point. Google created Gemini and whatever internal LLM their employees are using for code generation. Who are they paying, and for what seats? Not Microsoft or OpenAI or Anthropic...
For example they thought that twitter had a bloated workforce because of videos like this (https://www.youtube.com/watch?v=buF4hB5_rFs).
And a lot of people heavily disagreed with how they handled moderation. You can take things like the Hunter Biden laptop suppression, or in the funny category, people getting banned for saying "learn to code" (https://reason.com/2019/03/11/learn-to-code-twitter-harassme...).
Take a random company without controversies and you will find less vitriol about its people getting fired.
When I was working in RPA (robotic process automation) about 7 years ago, we were explicitly told not to say "You can reduce your team size by having us develop an automation that handles what they're doing!"
Even back then we were told to talk about how RPA (and by proxy AI) empowers your team to focus on the really important things. Automation just reduces the friction to getting things done. Instead of doing 4 hours of mindless data input or moving folders from one place to the other, automation gives you back those four hours so your team can do something sufficiently more important and focus on the bigger picture stuff.
Some teams loved the idea. Other leaders were skeptical and never adopted it. I spent the majority of those three years trying to sell them on the idea that automation was good and very little time actually coding. It's interesting seeing the paradigm shift and seeing this stuff everywhere now.
As a non-politically savvy person ;-) I have a feeling that this is a similarly dangerous message, since what prevents many teams from focusing on really important things is often far too long meetings with managers and similar "important" stakeholders.
1. Almost every business has growing workload. That means reassigning good employees and not hiring new headcount, not firing existing headcount. Unipurpose, low-value offshore teams are the only ones who get cut (e.g. doing "{this} for every one of {these}" work).
2. Most operational automation is impossible to build well without deep process expertise from the SME currently performing it. If you fire that person immediately after automating their task, what do you think the next SME tells you, when you need their help?
Successfully scaling operational automation programs therefore relies on additional headcount avoidance (aka improving their volume:employee ratio) and value measurement (FTE-equivalent time savings) to justify/measure.
Would it be? Do they care?
Sam Altman's been talking about how GenAI could break capitalism (maybe not the exact quote, but something similar), and these companies have been pushing out GenAI products that could obviously and easily be used to fake photographic or video evidence of things that have occurred in the real world. Elon's obsessed with making an AI that's trained to be a 20-year-old male edgelord from the sewer pits of the internet.
Compared to those things, "we've replaced X% of our workforce with AI" is absolutely anodyne.
Altman tells anyone who will listen that monopolies are the only path to success in business. He has a lot riding on making sure everyone is addicted to AI and that he’s the one selling the shovels.
Google isn’t far off.
Most capitalists have this fantasy that they can reduce their labour expenses with AI and continue stock buy-backs and ever-increasing executive payouts.
What sucks is that they rely on class divisions so that people don’t feel bad when the “overpaid” software developers get replaced. Problem is that software developers are also part of the proletariat and creating these artificial class divisions is breaking up the ability to organize.
It’s not AI replacing jobs, it’s capital holders. AI is just the smoke and mirrors.
Depends on who you ask.
If Trump wins and Elon Musk actually gets a new job, they would be bragging about replacing humans with AI all day long. And corporates are going to love it.
Not sure about what voters think though. But the fact that most of these companies are in California, New York etc means that it barely matters.
In a similar vein, solving world hunger is closer today than it's ever been. The previous best hope was global thermonuclear war, but honestly that would leave enough survivors as to be mostly ineffective, and much more likely to have the opposite result. Severe climate change has a better shot at fully eliminating [human] hunger.
A new economy is forming and there is nothing that can stop it without causing major, unintended fallout.
Has either bragged about this at all?
The only thing I've heard floated is Musk running a "government efficiency commission", which I just assumed meant he would be looking for ways to gut a lot of the never-ending, never-dying government programs. I've never heard him say the commission's goal was to replace people with AI.
https://www.newsnationnow.com/politics/2024-election/trump-m...
The former president said such an audit would be to combat waste and fraud and suggested it could save trillions for the economy.
As the first order of business, Trump said that this commission will develop an action plan to eliminate fraud and improper payments within six months.
But you can't have a guy who literally used to relieve himself into a golden toilet take over your party and be anything but the party of big business and billionaires.
https://royaltoiletry.com/does-trump-have-a-gold-toilet-unpa...
Still a guy who operated multiple luxury hotel and golf course properties that would laugh a working man out the front door if he asked for an affordable room.
That's only worth doing if you're trying to cut costs though. If the company has unmet ambitions there's no reason to shrink the headcount from 10 to 8 and have the same amount of output when you can keep 10 people and have the output of 12 by leveraging AI.
Note the difference between "cost cutting" (do less, to lower cost) and "efficiency" (do the same, but at lower cost).
If your engineers become 20% more efficient then your margins are better and your problem is solved. (Indeed if you have tech that can make any engineer 20% more efficient then you are back in the game of hiring as many as you can find, as long as each added engineer brings in enough additional revenue.)
AI engineers will not yet get a Nobel prize for putting everyone out of work.
Most likely what is actually happening is that the X% of workforce you would lay off is being put to other projects and Google in general can take on X% more projects for the same labor $$. So there is no real reason to make that particular "replaced" statement.
I haven't seen this yet so I'm intrigued. Is this a commercial product, or internal tooling?
When you maintain an open source project on GitHub you will occasionally get some open source automated bot that submits a PR to do things like this without you even asking, and I’m sure there’s plenty more you can sign up for or implement yourself.
I wouldn’t really call it AI, but it is automated. I agree with the parent comment that a journalist trying to push an angle would probably lump it in as AI in order to make the number seem larger.
You don't want a single PR that does that, because that would affect thousands of projects, and if something goes wrong with a single one, the whole PR needs to be rolled back.
However in my experience the system is much more powerful than you described. Maybe this is because I'm mostly writing C++ for which there is a much bigger training corpus than JavaScript.
One thing the system is already pretty good at is writing entire short functions from a comment. The trick is not to write:
But instead:
This way the completion goes much farther and the quality improves a lot. Essentially, use comments as the prompt to generate large chunks of code, instead of giving minimum context to the system, which limits it to single line completion.
Time will tell whether it outputs worse, equal, or better quality than skilled humans, but I'd be very wary of anything it suggests beyond obvious boilerplate (like all the symbols needed in a for loop) or naming things (function name and comment autocompletes like the person above you described)
It isn't something I worry about at all. If it doesn't work and starts creating bugs and horrible code, the best places will adjust to that and it won't be used or will be used more judiciously.
I'll still review code like I always do and prevent bad code from making it into our repo. I don't see why it's my problem to worry about. Why is it yours?
Functional bugs in edge cases are annoying enough, and I seem to run into these regularly as a user, but there's yet another class of people creating edge cases for their own purposes. The nonchalant "if it doesn't work"... I don't know whether that confirms my suspicion that not all developers are aware of (as a first step; let alone control for) the risks
Edge cases will usually be the ones to get through. Most developers don’t correctly write tests that exercise the limits of each input (or indeed have time to both unit test every function that way, and integration test to be sure the bigger stories are correctly working). Nothing about ai assist changes any of this.
(If anybody starts doing significant fully unsupervised “ai” coding they would likely pay the price in extreme instability so I’m assuming here that humans still basically read/skim PRs the same as they always have)
This isn't a new concern. Thoughtless software development started a long time ago.
So that's not to say there is nothing to be concerned about on stackoverflow, just that the risk seems manageable and understood. You also nearly always have to fit it to your own situation anyway. With the custom solutions from generative models, this is all not yet established and you're not having to customise (look at) it further if it made a plausible-looking suggestion
Perhaps this way of coding ends up introducing fewer bugs. Time will tell, but we all know how many wrong answers these things generate in text as well as what they were trained on, giving grounds for worry—while also gathering experience, of course. I'm not saying to not use it at all. It's a balance and something to be aware of
I also can't say that I find it to be thoughtless when I look for answers on stackoverflow. Perhaps as a beginning coder, you might copy bigger bits? Or without knowing what it does? That's not my current experience, though
Often when I don't know exactly what function / sequence of functions I need to achieve a particular outcome, I put in a comment describing what I want to do, and Copilot does the rest. I then remove the comment once I make sure that the generated code actually works.
I find it a lot less flow-breaking than stackoverflow or even asking an LLM.
It doesn't work all of the time, and sometimes you do have to Google still, but for the cases it does work for, it's pretty nice.
Something like:
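A hypothetical illustration of this kind of comment prompt (not the commenter's original snippet), written in Python with the built-in `sorted()` rather than a real ORM. The `User` class and `order_users_by_name` are names made up for this sketch; the comment is what you would type, and the function body is the sort of completion you would hope to get back:

```python
from dataclasses import dataclass


@dataclass
class User:
    name: str
    age: int


# Return the users ordered by name, case-insensitively, Z to A.
def order_users_by_name(users: list[User]) -> list[User]:
    # sorted() takes the sort key as a callable and the direction as reverse=...
    return sorted(users, key=lambda u: u.name.lower(), reverse=True)
```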
Doesn't need an explanation, but when working in a language I don't know well, I might not remember whether I'm supposed to call orderBy on the query or on the ORM module and pass query as the argument, whether the kwarg is called "field" or "column", whether it wants a string or something like `User.name` as the column expression, how to specify the ordering and so on.
Remember, LLMs are just compilers for programming languages that just so happen to have a lot of similarities with natural language. The code is not the comment. You still need to comment your code for humans.
When I'm maintaining other people's code, or my own after enough time has gone by, I'm very interested in that sort of comment. It gives me a chance to see if the code as written does what the comment says it was intended to do. It's not valuable for most of the code in a project, but is incredibly valuable for certain key parts.
You're right that comments about why things were done the way they were are the most valuable ones, but this kind of comment is in second place in my book.
Implementation comments belong inside the implementation, so they should be over if not deleted.
This is one that I really liked: https://www.reddit.com/r/ProgrammerHumor/comments/l5gg3t/thi...
We call it “Telephone” because “Chinese Whispers” not only sounds racist, it is also super confusing. You need a lot of cultural context to understand the particular way in which Chinese whispers would be different from any other set of whispers.
Copilot will autocomplete entire functions as well, sometimes without comments or even after just typing "f". It uses your previous edits as context and can assume what you're implementing pretty well.
0. https://github.com/google/closure-compiler/wiki/Annotating-J...
AI coding assistants are generally really good at ramping up a base level of tests, which you can then direct them to extend with more specific scenarios.
https://www.techspot.com/news/104945-ai-coding-assistants-do...
It is not as good with questions about API documentation for popular java libraries though and it will just hallucinate APIs/method names.
If I ask it a generic question like "how can I create a class in Java to invoke this API and store the data in this database" it is pretty useless. I'm sure I could spend more time giving it a better prompt but at that point I can just write the code myself.
Overall they are a better search engine for stackoverflow, but the LLMs are not really helping me code 30% faster or whatever the latest claim is.
I agree with your take though, it does seem helpful to juniors but not beyond that (yet), and this OP stat seems dubious unless juniors are doing a big portion of the work.
I feel it's a bit like the old "measuring developer productivity in LoC" metric.
As I hinted at in another comment, in Java if you had a "private String name;" then the following:
and the matching setter, are easy enough to generate automatically and you don't need an LLM for it. If AI can do that part of coding a bit better, sure it's helpful in a way, but I'm not worried about my job just yet (or rather, I'm more worried about the state of the economy and other factors).
Would I send the source of a trading algo or chatgpt to a third party, probably not but those are the outliers. The code for your xyz SAAS does not matter.
I am probably an outlier in that I don't really care what corpus an LLM trains off of. If it's available in the public space, go for it.
If so, all your code is sent to cloud.
An extreme example of this would be the AWS GovCloud for government/military applications.
Code is often a liability.
Which is how they've surpassed 25% in new code, as compared to the 10% (made up number, but clearly non-zero) in the past. But incremental improvement, is all.
I am going to argue the contrary. If AI increases productivity 2x, it opens up as many new use cases that previously didn't seem worth doing for their cost. So overall there will just be more work.
This is the entire history of the computing industry. We’ve been automating our work away for decades and it just creates more demand.
Well I do freelancing as well besides my usual day to day work, and that's also where direct benefits apply, and I'm getting more and more work, overwhelmingly so.
They allow me to do much more than that thanks to all the knowledge they contain.
For instance, yesterday I wanted to write a tool that transfers any large file that is still being appended to to multiple remote hosts, with a fast throughput.
By asking Claude for help I obtained exactly what I want in under two hours.
I'm no C/C++ expert yet I have now a functional program using libtorrent and libfuse.
By using libfuse my program creates a continuously growing list of virtual files (chunks of the big file).
A torrent is created to transfer the chunks to remote hosts.
Each chunk is added to the torrent as it appears on the file system thanks to the BEP46 mutable torrent feature in libtorrent.
On each receiving host, the program rebuilds the large file by appending new chunks as soon as they are downloaded through the torrent.
Now I can transfer a 25GB file (and growing) to 15 hosts as it is being written to.
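To make the chunking idea concrete, here is a minimal Python sketch of just the "split a growing file into fixed-size chunks and rebuild it by appending" part. It is not the program described above: there is no libtorrent or libfuse here, and `CHUNK_SIZE`, `read_new_chunks` and `append_chunks` are names invented for the illustration.

```python
import os

CHUNK_SIZE = 4  # bytes; tiny on purpose so the example is easy to follow


def read_new_chunks(path: str, next_index: int, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Return every complete chunk from next_index onward that the file holds so far."""
    # Only whole chunks count; a trailing partial chunk is left for a later call.
    complete = os.path.getsize(path) // chunk_size
    chunks = []
    with open(path, "rb") as f:
        f.seek(next_index * chunk_size)
        for _ in range(next_index, complete):
            chunks.append(f.read(chunk_size))
    return chunks


def append_chunks(dest: str, chunks: list[bytes]) -> None:
    """Receiver side: rebuild the file by appending chunks in arrival order."""
    with open(dest, "ab") as f:
        for chunk in chunks:
            f.write(chunk)
```

The sender would call `read_new_chunks` repeatedly with the index of the next chunk it has not yet published, and each receiver would call `append_chunks` as chunks arrive in order. How the chunks travel between hosts is left out entirely.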
Before LLM this would have taken me at least four days as I did not know those libraries.
LLMs aren't just parrots or tab completers, they actually contain a lot of useful knowledge and they're very good at explaining it clearly.
Did you use it in your editor or via the chat interface in the browser? Because they are two different approaches, and the one in the editor is mostly a (pretty awesome) tab completion.
When I tell an LLM to "create a script which does ..." I won't be doing this in the editor, even if copilot does have the chat interface. I'll be doing this in the browser because there I have a proper chat topic to which I can get back later, or review it.
But using copilot as a better autocomplete is really helpful and well worth the subscription. Just while typing as well as giving it more precise instructions via comments.
It's like a little helper in the editor, while the ChatGPT/Claude in the browser are more like "thinking machines" which can generate really usable code.
But it's not a production quality implementation of new need.
That I have shipped production code using LLMs in languages I did not study, approved by seasoned SWEs, is evidence that an acceleration is happening.
I mean if you assume all devs are script kiddies who simply copy paste what they find on google (or ChatGPT without asking for explanations) then yeah it's never gonna be useful in a prod setting.
Also you're very wrong to believe every technical need or combination of libraries has already been implemented in open source before.
Moreover Claude can explain the functions used very clearly (if you're too lazy to jump to definition in your editor)
LLMs are becoming actually useful to developers new to a language. Just as Google was 20 years ago.
However I think that you might open source the thing with a disclaimer of no maintenance. Whoever is willing to maintain it can just fork it and move along.
This is what's problematic with modern "AI". Most people inexperienced with it, like the parent commenter, will uncritically assume these LLMs possess "knowledge". This I find the most dangerous and prevalent assumption. Most people are oblivious to how bad LLMs are.
People misusing tools don't make tools useless or bad. Especially since LLMs designers never claimed the compressed information inside models is spotless or 100% accurate, or based on logical reasoning.
Any serious engineer with a modicum of knowledge about neural networks knows what can or can't be done with the output.
In IntelliJ, thankfully, you can disable that part of the AI and keep the part that you trigger when you want something from it.
This is a fantastic description of how it disturbs my coding practice which I hadn't been able to put into words. It's like someone is constantly interrupting you with small suggestions whether you want them or not.
I want the end of the line completed with focus on context from the working code base, and I don't want an entire 5 line function completed with incomplete requirements.
It is really impressive when it implements a 5 line function correctly, but it's like hitting the lottery
When I copy and paste code, very often it needs some small changes (like changing all xs to ys and at the same time widths to heights).
It's very good at this, and does the right thing the vast majority of the time.
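For comparison, the purely mechanical version of that edit is a whole-word search and replace. A rough Python sketch, where the `SWAPS` table and `to_vertical` are made-up names for the illustration:

```python
import re

# Hypothetical mapping for turning copied "horizontal" code into "vertical" code.
SWAPS = {"x": "y", "width": "height"}

# Match any key of SWAPS, but only as a whole word.
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, SWAPS)) + r")\b")


def to_vertical(code: str) -> str:
    """Replace whole-word identifiers only, in a single pass."""
    return _PATTERN.sub(lambda m: SWAPS[m.group(1)], code)
```

This turns `rect.x + rect.width` into `rect.y + rect.height`, but it leaves names like `max_x` or `boxWidth` untouched, which is roughly where a context-aware completion earns its keep.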
It's also good with test code. Test code is supposed to be explicit, and not very abstracted (so someone only mildly familiar with a codebase that's looking at a failing test can at least figure the cause). This means it's full of boilerplate, and a smart code generator can help fill that in.
Then it got worse a couple of years ago when they tried some early-stage AI approach. I turned it off. I expect that next time I update VS it'll have got substantially worse and it will have removed the option for me to disable it.
I think it’d be more useful if it was clipboard aware though. Sometimes I’ll copy a type, then add a param of that type to a function, and it won’t have the clipboard context to suggest the param I’m trying to add.
(By "unread" I meant that I don't look very closely before deleting if it looks weird.)
And then write tests. Or perhaps I wrote the test first.
Swift is especially frustrating because it will hallucinate the method name and/or the argument names (since you often have to specify the argument names when calling a method).
Or when I say I need to do something, it invents a library that conveniently happens to just do that thing and writes code to import and use it. Except there's no such library of course.
"classic" intellisense is reliable, so why introduce random source in the process?
Realtime tab completion is good at some really mundane things within the current file.
You still need a chat model, like Claude 3.5 to do more explorational things.
In the hands of a junior, AI can create a false sense of confidence and it acts as a technical debt and security flaw multiplier.
We should bring back the title "Software engineer" instead of "Software developer." Many people from other engineering professions look down on software engineers as "Not real engineers" but that's because they have the same perspective on coding as typical management types have. They think all code is equal, it's unavoidable spaghetti. They think software design and architecture doesn't matter.
The problems a software engineer faces when building a software system are the same kinds of problems that a mechanical or electrical engineer faces when building any engine or system. It's about weighing up trade-offs and making a large number of nuanced technical decisions to ultimately meet operational requirements in the most efficient, cost-effective way possible.
I regularly describe it as "The best snippet tool I've ever used (because it plays horseshoes)".
I would be very confused if someone told me that they uncritically used the generated code from a snippet program with no manual input or understanding, and I feel the same with Copilot. At best, it suggests an auto-complete that I read and interpret before accepting.
The closest I come to "code generation" is during test writing, where occasionally I will let the description generate some setup, but only in tests where there are a broad number of examples to follow, and I am still going to end up re-writing a decent chunk of it based on personal example. I would not "let it write the test suite for me" and then trust the green, and I suspect that would easily fail code review (though it would be an interesting experiment...).
Obviously your comment was a good goof and well made, but it does speak to a little bit of the disconnect between what is being touted as an "AI coding tool" and how I, a person who makes react native apps to pay my rent, actually use the dang thing (i.e., "A pretty good snippets plugin"). Is my code 'AI generated'? I wouldn't call it that, but who can say definitively? We're in a fun new semantic world now.
This workflow is not perfect but I am definitely building out all the core features way faster than if I wrote the code myself, and the code is in quite a good state. Quite often I do some bits of cleanup, refactorings, making sure typings are complete myself, then update ChatGPT with what the code now looks like.
I think what people miss is there are dozens of different ways to apply AI to your day-to-day as a software engineer. It also helps with thinking things through, architecture, describing best practices.
- it only works well when you write code from scratch, context length is too short to be really helpful for working on existing codebase.
- the output code is pretty much always broken in some way, and you need to be accustomed to doing code reviews to use them effectively. If you trust the output and had to debug it later it would be a painfully slow process.
Also, I didn't really notice a significant difference in code quality: even the best model (GPT-4) writes code that doesn't work, and I find it much more efficient to use open models on Groq due to the really fast inference. Looking at ChatGPT slowly typing is really annoying (I didn't test o1 and I have no interest in doing so because of its very low throughput).
This is kind of true, my approach is I spend a fairly large amount of time copy-pasting code from relevant modules back and forth into ChatGPT so it has enough context to make the correct changes. Most changes I need to make don't need more than 2-3 modules though.
> the output code is pretty much always broken in some way, and you need to be accustomed to doing code reviews to use them effectively.
I think this really depends on what you're building. Making a CRM is a very well trodden path so I think that helps? But even when it came to asking ChatGPT to design and implement a flexible data model it did a very good job. Most of the code it's written has worked well. I'd say maybe 60-70% of the code it writes I don't have to touch at all.
The slow typing is definitely a hindrance! Sometimes when it's a big change I lose focus and alt-tab away, like I used to do when building large C++ codebases or waiting for big test suites to run. So that aspect saps productivity. Conversely though I don't want to use a faster model that might give me inferior results.
It can work, but what a terrible developer experience.
> I'd say maybe 60-70% of the code it writes I don't have to touch at all
I used to write web apps so the ratio was even higher I'd say (maybe 80/90% of the code didn't need any modification) but the app itself wouldn't work at all if I didn't make those 10% changes. And you really need to read 100% of the code because you won't know upfront where those 10% will be.
> The slow typing is definitely a hindrance! Sometimes when it's a big change I lose focus and alt-tab away, like I used to do when building large C++ codebases or waiting for big test suites to run.
Yeah exactly, it's xkcd 303 but with “AI processing the response” instead of “compiling”. Having an instant response was a game changer for me in terms of focus, hence productivity.
> I don't want to use a faster model that might give me inferior results
As I said earlier, I didn't really feel the difference in quality so the switch was without drawbacks.
...yet. Bugs can take time to surface.
Interesting, personally I have noticed a difference. Mostly in how well the models pick up small details and context. Although I do have to agree that the open Llama models are generally fairly serviceable.
Recently I have tended to lean towards Claude Sonnet 3.5 as it seems slightly better. Although that does differ per language as well.
As far as them being slow, I haven't really noticed a difference. I use them mostly through the API with open webui and the answers come quick enough.
Sometimes that results in code, but it's the research and cross referencing that's actually useful with it
There's not even anything wrong with that, don't take my comment the wrong way. It is an interesting question of what happens at scale though. We could easily find ourselves in a spot where very few people know how to code and most producing code don't actually know how it works and couldn't find or fix a bug if they needed to. It also means LLMs would be stuck with today's code for a training set until it can invent its own coding paradigms and languages, at which point we're all left in the dust trusting it to work right.
There is this tool Aider. It takes your prompt, adds code files (sometimes not all of your code files but the files it figures are relevant) and prepares one long prompt, sends it to an LLM, receives the response, and makes a git commit based on the response. If you'd rather review git commits, it can save you the back-and-forth copy-pasting. https://aider.chat/
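The "prepare one long prompt" step is the easy part to picture. A minimal Python sketch of the general idea, with the caveat that this is not Aider's actual code or prompt format; `build_prompt` and the `--- path ---` separators are invented for the illustration, and the LLM call and git commit are left out:

```python
def build_prompt(request: str, files: dict[str, str]) -> str:
    """Concatenate the user's request and the chosen files into one prompt string."""
    parts = [
        "Apply the following change and reply with the full updated files.",
        "",
        request,
        "",
    ]
    # Sort by path so the same inputs always produce the same prompt.
    for path, source in sorted(files.items()):
        parts += [f"--- {path} ---", source.rstrip("\n"), ""]
    return "\n".join(parts)
```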
The general public has a very different idea of that though and I frequently meet people very surprised the entire profession hasn’t been automated yet based on headlines like this.
Well no, but we have no evidence it can be used for the whole stack, whatever that means.
Asking for evidence is not being an ostrich.
But I doubt the predictions from men whose net worth depends on the hype they foment.
I remember many years ago as a Java developer, Netbeans could do such things as complete `psvm` to "public static void main() {...}", or if you had a field "private String name;" you could press some key combination and it would generate you the getter and setter, complete with javadoc which was mandatory at that place because apparently you need "Returns the name.\n @return The name." on a method called getName() in case you wondered what it was for.
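That kind of generation is plain templating with no model involved. A rough Python sketch that emits the Java text for one field; `java_accessors` is a made-up helper for this illustration, not how Netbeans implements it:

```python
def java_accessors(java_type: str, field: str) -> str:
    """Emit a Java getter and setter, with boilerplate Javadoc, for one field."""
    cap = field[0].upper() + field[1:]  # name -> Name
    return (
        f"/**\n * Returns the {field}.\n * @return The {field}.\n */\n"
        f"public {java_type} get{cap}() {{\n    return {field};\n}}\n"
        f"\n"
        f"/**\n * Sets the {field}.\n * @param {field} The {field}.\n */\n"
        f"public void set{cap}({java_type} {field}) {{\n    this.{field} = {field};\n}}\n"
    )
```

Calling `print(java_accessors("String", "name"))` produces the `getName()` / `setName(String name)` pair, Javadoc included.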
On the other hand, LLMs are completely different -- based on machine learning, where everything is random and about statistics. It depends on training data and context. It is more useful but makes a ton of mistakes.
All things that I would consider tedious housekeeping, but nothing that needs serious reasoning.
It's basically a glorified LSP.
For instance, one I find very useful is that we have this pattern of checking the result of a function call, logging the error and returning, or whatever. So now, every time you have `result = foo()`, it will auto suggest `if (!result) log_error...` with a generally very good error message.
Very basic, but damn convenient. The more patterns you use, the more helpful it becomes.
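For anyone who hasn't worked in a codebase like that, the check-log-return pattern looks roughly like this. This is a Python rendering of the idea, not the commenter's code; `parse_port` and `connect` are hypothetical names:

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def parse_port(text: str) -> Optional[int]:
    """Hypothetical helper: return the port number, or None if text is not a valid port."""
    if text.isdecimal() and 0 < int(text) < 65536:
        return int(text)
    return None


def connect(host: str, port_text: str) -> bool:
    port = parse_port(port_text)
    # The repetitive part a completion engine can fill in:
    # check the result, log a specific error, bail out.
    if port is None:
        logger.error("connect(%s): invalid port %r", host, port_text)
        return False
    # ... the real work would go here ...
    return True
```

Once a file contains a few of these, the next `result = foo()` line gives the tool everything it needs to propose the `if` block and a plausible message.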
My productivity isn't so much enhanced. It's only 1%... 2%... 5%... globally, for each employee.
Have you ever dabbled with, mucked around in, a command line? Autocomplete functions there save millions of man-hour-typing-units per year. Something to think about.
A single employee, in a single task, for a single location may not equal much gained productivity, but companies now think on much larger scales than a single office location.
Work gets scheduled on short time frames. 5% savings isn't enough to change the schedule for any one person. At most, it gives me time to grab an extra coffee. I can't string together "foregone extra coffees" into "more tasks/days in the schedule".
Now making a kettle faster? That might actually be something.
It’s probably closer to 10% than 100%, especially at big companies.
One thing I would love to see is reports of benefits from various tools coming with one’s typing ability in WPM. I’d also like to see that on posts where people express a preference for “a quick call” or stopping by your desk rather than posting what they want in chat. I have some hypotheses I’d like to test out.
Wait, no. That should be based on how much slacking off Google employees do ordinarily, an unknown quantity.
Use OpenAI to convert a few thousand lines of code from a language you're familiar with to one you’re not, as all the state-of-the-art tools in the field above use that language. Debug all the issues that arise from the impedance mismatch between the languages. Recreate the results from the seminal paper in the field to verify that the code works, and run it on your own problem. Write a stream-of-consciousness post without spell-checking, then throw it into GPT and ask it to fix it.
It is actually a testament that parts of Google's code are ... kinda formulaic to some degree. Prior to the LLM takeover, we already heard praise for how Google's code search works wonders in helping its engineers write code; LLMs just brought that experience to the next level.
It's easy to see that adding up quickly to represent large percentages of the codebase by line, but it's not feature development or solving hard problems.
And it works maybe... 80% of the way and I spend all my time fixing the remaining 20%. Anecdotally I don't "feel" like this really accelerates me or reduces the time it would take me to do the change if I just implemented the translation manually.
[1] https://www.reuters.com/legal/ai-generated-art-cannot-receiv...
I wonder if the enormous hype around AI is a good or bad thing; it's obviously both, but will the good win out over the bad, or will the disappointment eventually be so overwhelming as to extinguish any enthusiasm?
1: https://news.ycombinator.com/item?id=41992028
I really mean no offense, but your example doesn't sound much different from what old IDEs (say, Netbeans) used to do 15 years ago.
I could design a Swing UI and it would generate the code, and if I wanted to override a method it would generate decent boilerplate (a getter, like in your example) along with the usual comments and a definitely correct parameter list (with correct types).
Is this "AI Code" thing something that appears new because at some point we abandoned IDEs with very strong intellisense (etc) ?
But yeah, I wish the new version of Chrome worked better. ¯\_(ツ)_/¯
1. We take a lot of care to make sure the AI recommendations are safe and have a high quality bar (regular monitoring, code provenance tracking, adversarial testing, and more).
2. We also do regular A/B tests and randomized control trials to ensure these features are improving SWE productivity and throughput.
3. We see similar efficiencies across all programming languages and frameworks used internally at Google, and engineers across all tenure and experience cohorts show similar gains in productivity.
You can read more on our approach here:
https://research.google/blog/ai-in-software-engineering-at-g...
I don't think this is a bad thing - if this can be accompanied by an increase in software quality, which is possible. Right now it's very hit and miss and everyone has examples of LLMs producing buggy or ridiculous code. But once the tooling improves to:
1. align produced code better to existing patterns and architecture
2. fix the feedback loop - with TDD, other LLM agents reviewing code, feeding in compile errors, letting other LLM agents interact with the produced code, etc.
Then we will definitely start seeing more and more code produced by LLMs. Don't look at the state of the art now, look at the direction of travel.
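As a rough illustration of the "feeding in compile errors" part of that loop, here is a minimal Python sketch. The `generate` callable is hypothetical (a stand-in for whatever model call you have), and the check is only Python's built-in `compile()`, so it catches syntax errors and nothing more.

```python
from typing import Callable, Optional


def generate_until_it_compiles(
    generate: Callable[[str, Optional[str]], str],  # hypothetical model call
    task: str,
    max_attempts: int = 3,
) -> Optional[str]:
    """Ask for code, and feed any syntax error back into the next attempt."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        candidate = generate(task, feedback)
        try:
            # Syntax check only; this does not run the candidate code.
            compile(candidate, "<candidate>", "exec")
        except SyntaxError as err:
            feedback = f"{err.msg} (line {err.lineno})"
            continue
        return candidate
    return None  # gave up after max_attempts
```

A real setup would presumably swap the syntax check for a test run or a type checker; the shape of the loop stays the same.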
That’s a huge “if”, and by your own admission not what’s happening now.
> other LLM agents reviewing code, feeding in compile errors, letting other LLM agents interact with the produced code, etc.
What a stupid future. Machines which make errors being “corrected” by machines which make errors in a death spiral. An unbelievable waste of figurative and literal energy.
> Then we will definitely start seeing more and more code produced by LLMs.
We’re already there. And there’s a lot of bad code being pumped out. Which will in turn be fed back to the LLMs.
> Don't look at the state of the art now, look at the direction of travel.
That’s what leads to the eternal “in five years” which eventually sinks everyone’s trust.
Humans are machines which make errors. Somehow, we got to the moon. The suggestion that errors just mindlessly compound and that there is no way around it, is what's stupid.
Even if we accept the premise (seeing humans as machines is literally dehumanising and a favourite argument of those who exploit them), not all machines are created equal. Would you use a bicycle to file your taxes?
> Somehow, we got to the moon
Quite hand wavey. We didn’t get to the Moon by reading a bunch of text from the era then probabilistically joining word fragments, passing that around the same funnel a bunch of times, then blindly doing what came out, that’s for sure.
> The suggestion that errors just mindlessly compound and that there is no way around it
Is one that you made up, as that was not my argument.
We got to the moon using a large number of systems to a) avoid errors where possible and b) build in redundancies. Even an LLM knows this and knew what the statement meant:
https://chatgpt.com/share/6722e04f-0230-8002-8345-5d2eba2e7d...
Putting "corrected" in quotes and saying "death spiral" implies error compounding.
https://chatgpt.com/share/6722e19c-7f44-8002-8614-a560620b37...
These LLMs seem so smart.
Sure, I'm a really poor painter, Midjourney is better than me. Are they better than a human trained for that task, on that task? That's the real question.
And I reckon the answer is currently no.
You’ve just proven my point. My issue with LLMs is precisely people turning off their brains and blindly taking them at face value, even arduously defending the answers in the face of contrary evidence.
If you’re basing your arguments on those answers then we don’t need to have this conversation. I have access to LLMs like everyone else, I don’t need to come to HN to speak with a robot.
Of course self-driving cars aren’t a nonsense idea. The execution and continued missed promises suck, but that doesn’t affect the idea. Claiming “humans are geniuses without equal” would be pretty dumb too, and is again something you’re making up. And something doesn’t have to be “all snake oil” to deserve specific criticism.
The world has nuance, learn to see it. It’s not all black and white and I’m not your enemy.
Actually understand LLMs in detail and you'll see it isn't some huge waste of time and energy to have LLMs correct outputs from LLMs.
Or, don't, and continue making silly, snarky comments about how stupid some sensible thing is, in a field you don't understand.
Yes, they do *seem* smart. My experience with a wide variety of LLM-based tools is that they are the industrialization of the Dunning-Kruger effect.
Now once you've recognized that, you're better equipped for the task at hand - which is augmenting and ultimately automating away every task that humans-as-machines perform by building an equivalent or better machine that performs said tasks at a fraction of the cost!
People who want to exploit humans are the ones that oppose automation.
There's still a long way to go, but now we've finally reached a point where some tasks that were very elusive to automation are starting to show great promise of being automated, or at least being greatly augmented.
The conceit that humans are machines carries with it such powerful ideology: humans are for something, we are some kind of utility, not just things in themselves, like birds and rocks. How is it anything other than an affirmation of a metaphysical/theological purpose for humans in particular? Why is it like that? This must be coming from a religious context, right?
At least, I cannot see how you could believe this while sustaining a rational, scientific mind about nature, cosmology, etc. Which is fine! We can all believe things, just know you can't have your cake and eat it too. Namely, if anybody should believe in fairies around here, it should probably be you!
Because it's boring stuff, and most of us would prefer to be playing golf/tennis/hanging out with friends/painting/etc. If you look at the history of humanity, we've been automating the boring stuff since the start. We don't automate the stuff we like.
Recognizing that humans, just like birds are self-replicating biological machines is the most level-headed way of looking at it.
It is consistent with observations and there are no (apparent) contradictions.
The spiritual beliefs are the ones with the fairies, binding of the soul, made of special substrate, beyond reason and understanding.
If you have a desire to improve the human condition (not everyone does) then the task at hand naturally arises - eliminate forced labour, aging, disease, suffering, death, etc.
This all naturally leads to automation and transhumanism.
If humans are machines, then so are fairies.
LLMs get their errors fed back into them and become more confident that their wrong code is right.
I'm not saying that's completely unsolvable, but that does seem to be how it works today.
Start adding different prompts, different models and you get all kinds of ways to catch errors. Just like humans.
There was a recent meme about asking LLMs to draw a wineglass full to the brim with wine.
Most really struggle with that instruction. No matter how much you ask them to correct themselves they can’t.
I’m sure they’ll get better with more input but what it reveals is that right now they definitely do not understand their own output.
I’ve seen no evidence that they are better with code than they are with images.
For instance, if the time to complete only scales with the length of the token and not the complexity of its contents, then it's probably safe to assume it’s not being comprehended.
No. LLMs can be told that there was an error and produce an alternative answer.
In fact LLMs can be told there was an error when there wasn't one and produce an alternative answer.
https://chatgpt.com/share/6722e41d-6b20-8002-8cbb-3012cd9179...
In my experience, if you confuse an LLM by deviating from the "expected", then all the shims of logic seem to disappear, and it goes into hallucination mode.
ChatGPT here went with a lexicographical order in Python for some reason, and then proceeded to make false statements from false observations, while also defying its own internal logic.
No. No. From what I understand of LLMs (which - I admit - is not very much), logical reasoning isn't a property of LLMs, unlike information retrieval. I'm sure this problem can be solved at some point, but a good solution would need development of many more kinds of inference and logic engines than there are today.
But a LLM is not "simply" doing anything. It's extremely complex and sophisticated. Once you go from tokens into high-dimensional embeddings... it seems these models (with enough training) figure out how all the concepts go together. I'd suggest reading the word2vec paper first, then think about how attention works. You'll come to the conclusion these things are likely to be able to beat humans at almost everything.
Are you sure you wanted to say that? Or is the other way around?
Really? That must be a very recent development, because so far this has been a reason for not using them at scale. And no one is.
Do you have a source?
USA (s)election, I guess.
It is by the juice of Sapho that thoughts acquire speed, the lips become stained, the stains become a warning...
it's even more fundamental than that.
even if they had any model, they would not be able to think.
thinking requires consciousness. only humans and some animals have it. maybe plants too.
machines? no way, jose.
that "hallucinate" term is a marketing gimmick to make it seem to the gullible that this "AI" (i.e. LLMs) can actually think, which is flat out BS.
as many others have said here on hn, those who stand to benefit a lot from this are the ones promoting this bullcrap idea (that they (LLMs) are intelligent).
greater fool theory.
picks and shovels.
etc.
In detective or murder novels, the cliche is "look for the woman".
https://en.m.wikipedia.org/wiki/Cherchez_la_femme
in this case, "follow the money" is the translation, i.e. who really benefits (the investors and founders, the few), as opposed to who is grandly proclaimed to be the beneficiary (us, the many).
from a search for grand vs grandiose:
When it comes to bigness, there's grand and then there's grandiose. Both words can be used to describe something impressive in size, scope, or effect, but while grand may lend its noun a bit of dignity (i.e., “we had a grand time”), grandiose often implies a whiff of pretension.
https://www.merriam-webster.com/dictionary/grandiose
Indeed, and one of the most interesting errors some human machines are making is hallucinating false analogies.
The idea that our current languages might be as far as we get is absolutely demoralising. I don't want a tool to help me write pointless boilerplate in a bad language, I want a better language.
It's an insanely conservative tool
Lastly, it’s also a huge waste of energy to feed the same information over and over again for each query.
- context over training is like someone referencing docs vs vaguely recalling from decayed memory
- context caching
The companies valuing the expensive talent currently working at Google will be the winners.
Google and others are betting big right now, but I feel the winners might be those who watch how it unfolds first.
As a dev, I like it. It speeds up writing easy but tedious code. It's just a bit smarter version of the refactoring tools already common in IDEs...
It is now very easy to sprinkle in regexes to validate user input, like email addresses, on every controller instead of using a central lib/utility for that.
In the hands of a skilled engineer it is a good tool. But for the rest it mainly serves to output more garbage at a higher rate.
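For contrast, a minimal sketch of the "central lib/utility" approach in Python. The pattern is deliberately simple and is an assumption for illustration (it is not a full RFC 5322 validator), and `signup_controller` is a hypothetical controller, not code from any real framework.

```python
import re

# Deliberately simple pattern (an assumption for illustration); it only
# checks for "something@domain.tld" with no whitespace or extra "@".
_EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def is_valid_email(value: str) -> bool:
    """The single shared check that every controller calls."""
    return _EMAIL_RE.fullmatch(value) is not None


def signup_controller(form: dict) -> str:
    # Hypothetical controller: it delegates instead of carrying its own regex.
    return "ok" if is_valid_email(form.get("email", "")) else "invalid email"
```

If the rule ever needs to change, there is one place to change it, which is the whole point of not pasting a fresh regex into each controller.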
Some people are touting this as a major feature. "I don't have to pull in some dependency for a minor function - I can just have AI write that simple function for me." I, personally, don't see this as a net positive.
The way it is now just leads to bloat and cruft.
And if we get 9 women we can produce a baby in a single month.
There's no guarantee such progression will continue. Indeed, there's much more evidence it is coming to a halt.
It might be possible but will shareholders/investors foot the bill for the 80% that they still have to pay?
More and more of the content generated since is LLM generated and useless as training data.
The models get worse, not better by being fed their own output, and right now they are out of training data.
This is why Reddit just went profitable, AI companies buy their text to train their models because it is at least somewhat human written.
Of course, even reddit is crawling with LLM generated text, so yes. It is coming to a halt.
That's what people are doing. The direction of travel over the most recent few (6-12) months is mostly flat.
The direction of travel when first introduced was a very steep line going from bottom-left to top-right.
We are not there anymore.
Maybe I'm just old, but to me, LLMs feel like magic. A decade ago, anyone predicting their future capabilities would have been laughed at.
The discussion here seems to bear this out: CEO claims AI is magical, here the truth becomes that it’s just an auto-complete engine.
And so were a lot of Markov chain based chatbots. Also Doretta, the Microsoft AI/search engine chatbot.
Were they as good? No. Is this an iteration of those? Absolutely.
Of course LLMs are a fantastic tool to improve productivity, but current LLMs cannot produce anything novel. They can only reproduce what they have seen.
At Google, today, for sure.
I do believe we still are not across the road on this one.
> if this can be accompanied by an increase in software quality, which is possible. Right now its very hit and miss
So, is it really a smart move of Google to enforce this today, before quality has increased? Or did this set off their path to losing market share because their software quality will deteriorate further over the next couple years?
From the outside it just seems Google and others have no choice, they must walk this path or lose market valuation.
I'm excited about the possibilities and I still recoil at the refined marketer prose.
Why no YouTube videos though? Well, most dev YouTubers are actual devs that cultivate an image of "I'm faster than LLM, I never re-read library references, I memorise them on first read" and so on. If they then show you a video of how they forgot the syntax for this or that maven plugin config and how an LLM fills it in 10s instead of a 5min Google search, that makes them look less capable on their own. Why would they do that?
For teams you can measure meaningful outcomes and improve team metrics.
You shouldn’t really compare teams but it also is possible if you know what teams are doing.
If you are some disconnected manager that thinks he can make decisions or improvements reducing things to single numbers - yeah that’s not possible.
How? Which metrics?
I think it's harder to measure things like developer productivity. The closest thing we have is making an estimate and seeing how far off you are, but that doesn't account for hedging estimates or requirements suddenly changing. Changing requirements doesn't matter for DORA as it's just another sample to test for deployment.
Unfortunately there's a lot of lag
A great generalisation and understatement! Often looking like you are becoming more efficient is more important than actually being more efficient, e.g. you need to impress investors. So you cut back on maintenance and other cost centres, and the new management can blame you for it in 6 years' time, when you are far enough away from it for it not to hurt you.
Then I would have to go on outlining 6-12 months of trying stuff out.
Because if I just give "an example" I will get dozens of "smart ass" replies how this specific one did not work for them and I am stupid. Thanks but don't have time for that or for writing an essay that no one will read anyway and call me stupid or demand even more explanation. :)
This seems to be useful to understand and internalize that there are no simple answers like "use story points!".
There are also loads of people who don't understand that, so I stand by that it is useful and important to repeat on every possible occasion.
Measuring it is not the hard part.
The hard part is doing anything about it. If you can't attribute specific outputs to specific inputs, you don't know how to change inputs to maximize outputs. That's what managers need to do, but of course they're often just guessing.
But I do code myself, I write requirements so I do know which ones are trivial and which ones are not. I also see when there are complex migrations.
If you work in a group of people you will also get feedback - doesn't have to be snitching but still you get the feel who is a slacker in the group.
It is hard to quantify the output if you want to be a "give me a number" manager who is removed from the group. If you actually do the work of a manager, you get a feel for the group: you see who is the "Hermione Granger" nagging that others are slacking (and can disregard their opinion), who is the "silent doer", and who is the "we should do it properly" bullshitter, and you can make a lot of meaningful adjustments.
Even that would be hard since hunting is complex. If you are the one chasing the prey into the arms of someone else, you surely want it to be considered a team effort.
You need like 'blueberries picked'.
"You can [accurately and meaningfully measure software engineering productivity] - but not on the level of a single developer and you cannot use those measures to manage productivity of a specific dev."
At the level of a company like Google, it's easy: both inputs and outputs are measured in terms of money.
I am not an Amazon person - but from my experience 2 pizza teams was what worked, and I never implemented it myself, it's just what I observed in the wild.
Measuring Google in terms of money is also flawed, there are loads of BS hidden there and lots of people paying big companies more just because they are big companies.
So that's how animal husbandry came about!
The superstar developer’s secret… he would send blank reports to clients (who would only realize it days later, and someone else would end up redoing the report), and he would score many more points without doing anything. I’ve seen this happen a lot in many different companies. As a friend of mine used to say, “it’s very rare, but it happens all the time.”
I have no doubt that AI can help developers, but I don’t trust the metrics of the CEO or people who work on AI, because they are too involved in the subject.
1) They can work to improve the system
2) They can distort the system
3) Or they can distort the data
https://commoncog.com/goodharts-law-not-useful/
ah, to be young again...
Now, I was really bad at capitalizing on it, so nothing much came of it, but still, there are some positive things that higher-ups do notice.
None of this works to evaluate individuals or even teams. But it can be effective at evaluating tools.
To use your example, a user with an LLM might say "LLM please fix this" as a first line of action, drastically improving this metric, even if it ruins your overall productivity.
edit: typo
That is, I, personally, am not measured on how much AI generated code I create, and while the number is non-zero, I can't tell you what it is because I don't care and don't have any incentive to care. And I'm someone who is personally fairly bearish on the value of LLM-based codegen/autocomplete.
Will AI be able to detect bugs and back doors that require multiple pieces of code working together rather than being in a single piece of code? Humans have a hard time with this.
- Hypothetical Example: Authentication bugs in sshd that require a flaw in systemd which then requires a flaw in udev or nss or PAM or some underlying library ... but looking at each individual library or daemon there are no bugs that a professional penetration testing organization such as the NCC Group or Google's Project Zero would find. In other words, will AI soon be able to find more complex bugs in a year than Tavis has found in his career and will they start to compete with one another and start finding all the state sponsored complex bugs and then ultimately be able to create a map that suggests a common set of developers that may need to be notified? Will there be a table that logs where AI found things that professional human penetration testers could not?
Adversaries are already detecting issues tho, using proven means such as code review and fuzzing.
Google Project Zero consists of a team of rock star hackers. I don't see LLMs even replacing junior devs right now.
The whole paradigm should change. If you are indeed responsible for developer tools, I would hope that you’re actively leveraging Claude 3.5 Sonnet and o1-preview.
I’m far from impressed with the output of GPT/Claude, all they’ve done is weight against stack overflow - which is still low quality code relative to Google.
What is the probability Google makes this a real product, or is it too likely to autocomplete trade secrets?
It's always felt like having AI in the cloud for better autocomplete is a lot for a small gain.
Considering how terrible and frequently broken the code that the public-facing Gemini produces is, I'll have to be honest that that kind of scares me.
Gemini frequently fails at some fairly basic stuff, even in popular languages where it would have had a lot of source material to work from; where other public models (even free ones) sail through.
To give a fun, fairly recent example, here's a prime factorisation algorithm it produced for Python:
Can you spot all the problems?
But I don't understand this attempt to tell companies/persons that are successfully using AI that no, they really aren't.
In my opinion, if they feel they're using AI successfully, the goal should be to learn from that.
I don't understand this need to tell individuals who say they are successfully using AI that, "no you aren't."
It feels like a form of denial.
Like someone saying, "I refuse to accept that this could work for you, no matter what you say."
Assuming discussions don't happen in Slack, or Discord, or...
Did you know? How WEIRD.
How about you not harass other commenters with such arrogantly ignorant sarcastic questions?? Or is that part of corporate "for-profit" culture too????
So is marketing? So is finance? So is petroleum engineering?
You were probably being rhetorical, but there are two problems:
- `p = 2` should be outside the loop
- `prime_factors.append(n)` appends `1` onto the end of the list for no reason
With those two changes I'm pretty sure it's correct.
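The original snippet isn't reproduced in this copy of the thread, so the following is only a sketch of what a plain trial-division version might look like with those two fixes applied. The function name and overall structure are assumptions, not the code Gemini actually produced.

```python
def prime_factors(n: int) -> list[int]:
    """Return the prime factors of n (with multiplicity) by trial division."""
    if n < 2:
        return []
    factors = []
    p = 2  # fix 1: initialised once, outside the loop
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:  # fix 2: append the remainder only if it is a prime, never 1
        factors.append(n)
    return factors
```

For example, `prime_factors(360)` gives `[2, 2, 2, 3, 3, 5]`.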
I bet I'd make more errors on my first try at it.
If that doesn't satisfy, here's a similar one at leetcode.com: https://leetcode.com/problems/distinct-prime-factors-of-prod...
I would not expect a programmer of any seniority to churn stuff like that and have it working without testing.
I've been able to write one, not from memory but from first principles, any time in the last 40 years.
I'm happy you know math, but my point before this thread got derailed was that we're holding (coding) AI to a higher standard than actual humans, namely to expect to write bug-free code.
This seems like a very layman attitude and I would be surprised to find many devs adhering to this idea. Comments in this thread alone suggest that many devs on HN do not agree.
Testing for small prime factors is easy - brute force is your friend. Testing for large prime factors requires more effort. So the first trick is to figure out the bounds to the problem. Is it int32? Then brute-force it. Is it int64, where you might have a value like the Mersenne prime 2^61-1? Perhaps it's time to pull out a math reference. Is it longer, like an unbounded Python int? Definitely switch to something like the GNU Multiple Precision Arithmetic Library.
In this case, the maximum value is 1,000, which means we can enumerate all distinct prime values in that range, and test for each one's presence in each input value, one by one:
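The code posted at this point did not survive in this copy of the thread. As a rough sketch of the approach described (not the commenter's original code), assuming the inputs are integers between 2 and 1,000 and that the goal is to count the distinct primes dividing any of them; all names here are made up for illustration:

```python
def primes_up_to(limit: int) -> list[int]:
    """Sieve of Eratosthenes for all primes <= limit (limit >= 1)."""
    is_prime = [True] * (limit + 1)
    is_prime[0:2] = [False, False]
    for i in range(2, int(limit ** 0.5) + 1):
        if is_prime[i]:
            for j in range(i * i, limit + 1, i):
                is_prime[j] = False
    return [i for i, flag in enumerate(is_prime) if flag]


MAX_VALUE = 1000  # assumed upper bound on each input value
SMALL_PRIMES = primes_up_to(MAX_VALUE)


def distinct_prime_factors(nums: list[int]) -> int:
    """Count the distinct primes that divide at least one value in nums."""
    for n in nums:
        if not 2 <= n <= MAX_VALUE:  # assumed input range
            raise ValueError(f"value out of range: {n}")
    # Brute force: test every small prime against every input value.
    return sum(1 for p in SMALL_PRIMES if any(n % p == 0 for n in nums))
```

With the bound this small, the brute-force table lookup is plenty; larger bounds would call for the heavier tools mentioned above.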
That worked without testing, though I felt better after I ran the test suite, which found no errors. Here's the test suite:
I had two typos in my test suite (an "=" for "==", and a ", 20))" instead of "), 20)"), and my original test_num_too_large() tested 10_001 instead of the boundary case of 1_001, so three mistakes in total.
If I had no internet access, I would compute that table thusly:
Do let me know of any remaining mistakes.
What kind of senior programmers do you work with who can't handle something like this?
EDIT: For fun I wrote an implementation based on sympy's integer factorization:
Here's a new test case, which takes about 17 seconds to run:
Perhaps even, there could be fewer Go programmers here and some just took a stab at it even though they don’t know the language. So it could just select for which forum has the most Go programmers. Hardly rigorous.
So I’d take that with a pinch of salt personally
I’m pretty sure around 50% of the code I write is already auto-complete, without any AI.
We don’t need AI for this and it’s 10x the compute to do it slower with AI.
LLMs are useful but they aren’t a silver bullet. We don’t need to replace everything with it just because.
AI doesn't mean LLM after all. AI means 'a computer thing'.
Consider the fact that GPT-4 can generate valid XML (meaning balanced tags, quotes etc) in base64-encoded form. Without CoT, just direct output.
I don't know what model Copilot uses these days, but it constantly makes bracket mistakes in Python.
On the other hand, that doesn't mean it doesn't count for the purpose of this press release/advertisement...
(because it's like 50k lines regenerated for every build, everywhere, all the time)
It's a different thing to what you're talking about, but it's one way I'd expect to see LLMs contribute a lot to productivity on larger codebases specifically.
When dealing with a fermenting pile of technical debt? I expect very little. LLMs don't have application-wide context yet.
AI is definitely revolutionizing our field, but the same people that said that no-code tools and all of the other hype-of-the-decade technologies would make developers jobless are actually the people AI is making jobless.
Generate an opinion piece about how AI is going to make developers jobless, using AI? Less than a minute. And you don't need to maintain that article, once it's published, it's done.
While there's a tsunami of AI-generated almost-there projects coming that need to be moved to a shippable and sellable state. So I'm more afraid about the kind of work I'm going to get while still getting paid handsomely for my skills, than ever being jobless as the only guy that really understands the whole stack from top to bottom.
One common misconception is that all LLMs are the same. The models are trained the same, but trained on wildly different datasets. Google, and more specifically the Google codebase is arguably one of the most curated, and iterated on datasets in existence. This is a massive lever for Google to train their internal code-gen models, that realistically could easily replace any entry-level or junior developer.
- Code review is another dimension of the process of maintaining a codebase where we can expect huge improvements with LLMs. The highly-curated commentary on existing code / flawed diff / corrected diff that Google possesses gives them an opportunity to build a whole set of new internal tools / infra that's extremely tailored to their own coding standard / culture.
This is a massive, unsubstantiated leap.
In many cases it is counterproductive
I spent 12 years of my career in the Google codebase.
This assertion is technically correct in that google3 has been around for 20 years, and all code gets reviewed, but the implication that Google's codebase is a high-quality training set is not consistent with my experience.
Nope
The other 75% is the stuff you actually have to think about.
This feels like saying linters impact x0% of code. This just feels like an extension of that.
It's just a very advanced autocomplete, completely integrated into the internal codebase and IDE. You can read this on the research blog (maybe if everyone just read the blog).
e.g.
I start typing `var notificationManager`
It would suggest `= (NotificationManager) context.getSystemService(NOTIFICATION_MANAGER);`
If you've done Android then you know how much boilerplate there is to suggest.
I press Ctrl+Enter or something to accept the suggestion.
Voila, more than 50% of that code was written by AI.
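To put a number on that, here is a back-of-the-envelope calculation in Python. It assumes the share is counted per character of the accepted line, which may not be how the headline figure is actually measured; the two strings are adapted from the example above.

```python
def suggested_share(typed: str, suggested: str) -> float:
    """Fraction of the final line's characters that came from the suggestion."""
    total = len(typed) + len(suggested)
    return len(suggested) / total if total else 0.0


typed = "var notificationManager"
suggested = " = (NotificationManager) context.getSystemService(NOTIFICATION_MANAGER);"

share = suggested_share(typed, suggested)
print(f"{share:.0%} of the line was autocompleted")
```

So a single accepted completion can easily push one line well past the halfway mark without any "engineering" having been done by the tool.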
> blindly committing AI code
Even before AI, no one blindly accepts autocomplete.
A lot of headline-readers seem to imagine some sort of semi-autonomous or prompt based code generation that writes whole blocks of code to then be blindly accepted by engineers.
I’m pretty sure the actual ratio is much lower than that. In other words, LLMs are currently not good enough to remove the majority of chores, even with the state of the art model trained on highly curated dataset.
Most of the code must be what could be snippets (opening files and handling errors with absl::, and moving data from proto to proto). One thing that doesn't help here, is that when writing for many engineers on different teams to read, spelling out simple code instead of depending on too many abstractions seems to be preferred by most teams.
I guess that LLMs do provide smarter snippets that I don't need to fill out in detail, and when it understands types and whether things compile it gets quite good and "smart" when it comes to writing down boilerplate.
Everyone knows this was “made by AI” because there’s no way in hell I would ever have the time. These models might not be able to sit there and build an entire project from scratch yet, but if what you need is some help adding the next control panel page, Claude’s got your back on that.
If you’re not a developer chances are very high the code it produces will look passable but is actually worthless — or worse, it’s misleading and now a dev has to spend more time deciphering the task.
Doubtful. A decent fraction of the people reading it will guess that you've wasted your time writing incoherent nonsense in the jira. Engineers don't usually have much insight into what the C suite are doing. It would be a prudent move to spend the couple of seconds to write "something like this AI sketch:" before the copy&paste.
They should know because you told them so.
Having to decipher weird code only to discover it was not written by a human is not nice.
Fascinating point of view.
When someone who uses a product says it, there is a 50% chance of it being true, but when someone far away from the user says it, it is 100% promotion of product and setup for trust building for a future sale.
Not what I thought when I heard "AI coding", but seems pretty neat.
.... so what do you put in your scripts if not code?
I would imagine it always has.
>Google CEO says more than a quarter of the company's new code is created by AI
It may very well be starting to become apparent anyway :\
25% of coding is just the most basic boilerplate. I think of AI not as a thinking machine but as a 1000 WPM boilerplate typer.
If it is hallucinating, you're trying to make it do stuff that is too complex.
That's my main problem: for trivial things it works but isn't much better than conventional tools; for hard things it just produces incorrect code, such that writing it from scratch barely makes a difference.
What would it look like if I could have 300-500 snippets instead of 30? Those 300 are things that I do all over my codebase e.g. same basic where query but in the context of whatever function I am in, a click handler with the correct types for that purpose, etc.
There is no way I can have enough hotkeys or memorize that much, and I truly can't type faster than I can hit tab.
I don't need it to think for me. Most coding (front-end/back-end web) involves typing super basic stuff, not writing complex algorithms.
This is where the 10-20% speed-up comes in. On average I am just typing 20% faster by hitting tab.
Although, if we were to ignore all this for a second, you could also make similar estimates with, e.g., gzip: the higher the compression ratio attained, the more "verbose"/"fluffy" the code is.
Fun tangent: there are a lot of researchers who believe that compression and intelligence are equivalent or at least very tightly linked.
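The gzip estimate mentioned above is easy to try. A minimal sketch in Python, using `zlib` (the same DEFLATE algorithm gzip uses); the getter boilerplate is a made-up sample, and the ratio is only a crude proxy for how "fluffy" code is.

```python
import zlib


def compression_ratio(source: str) -> float:
    """Uncompressed size divided by compressed size; higher means more repetitive."""
    raw = source.encode("utf-8")
    if not raw:
        return 1.0
    return len(raw) / len(zlib.compress(raw, 9))


# Made-up sample: 200 near-identical getters, i.e. classic boilerplate.
boilerplate = "".join(
    f"def get_field_{i}(self):\n    return self._field_{i}\n\n" for i in range(200)
)

print(f"boilerplate ratio: {compression_ratio(boilerplate):.1f}")
```

Running the same function over a real source tree and comparing files would show which parts of a codebase are mostly repetition.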
I'm not sure though. If it's copied a bunch of times, and it actually doesn't matter because each usecase of the copying is linearly independent, does it matter that it was copied?
Over time, you'd still see copies being changed by themselves show up as increased entropy
WinRAR can do that for you quite effectively.
In fact it's the one thing they are explicitly designed to do, the rest is more or less a side-effect.
I’m aware that the difference is that AI-generated code can be read and modified by humans. But that quantity is bad because humans have to understand it to read or modify it.
What’s the point of shorter code if you can’t trust it to do what it’s supposed to?
I’ll take 20 lines of code that do what they should consistently over 1 line that may or may not do the task depending on the direction of the wind.
I understand more code as being more edge cases
I can also suggest alternatives, like using existing library functions for things that might have been coded manually.
fun story: today I had an LLM write me a non-trivial perl one-liner. It tried to be verbose but I insisted and it gave me one tight line.
Whenever the problem requires thinking, it horribly fails because it cannot reason (yet). So unless this is also true for google devs, I cannot see that 25% number.
Besides, even google employees write a lot of boilerplate, especially android IIRC, not to mention simple but essential code, so AI can prevent carpal tunnel for the junior devs working on that.
... and forgot to count the Delete presses.
https://www.folklore.org/Negative_2000_Lines_Of_Code.html
Is Google out there monitoring the IDE activity of every engineer, logging the amount of code created, by what, lines, characters, and how it was generated? Dubious.
A good chunk (*) of their code goes in a centralized repo, and is written via a centralized web IDE. So measuring everything you mentioned is not hard.
(*) Android, Chrome, and other similar projects are exceptions.
Yes, they'll miss AI-generated code that is copy pasted, so they only have a lower bound of AI-generated code.
For the example of generating an http app scaffolding from an openapi spec, it would probably account for at least 25% of the text in the generated source code. But I imagine this report would conveniently exclude the creation of the original source yaml driving the generator — I can’t imagine you’d save much typing (or mental overhead) trying to prompt a chatbot to design your api spec correctly before the codegen
Maybe my situation is unusual; I haven't written all that much code at Google lately, but what I do write is pretty tied to specific details of the program and the AI auto completion is just not that useful. Sometimes it auto completes a method signature correctly, but it never gets the body right (or even particularly close).
And it routinely making up methods or fields on objects I want to use is anti productive.
In that case those 25% are probably the very same 25% that were automatically generated by LSP-based auto-completion.
1. Paste in my actual code.
2. Prompt: Write unit tests, test tables. Include scenarios: A, B, C, D, E. Include all other scenarios I left out, isolate suggestions for review.
I used to spend the majority of the coding time writing unit tests and mocking test data, now it's more like 10%.
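For anyone unfamiliar with "test tables", this is roughly the shape of output that kind of prompt is asking for. A minimal sketch: `normalize_username` and every case in the table are hypothetical, invented here for illustration, not taken from anyone's real code.

```python
def normalize_username(raw: str) -> str:
    """Hypothetical function under test: trim, lowercase, join words with '_'."""
    return "_".join(raw.strip().lower().split())


# Each row is (scenario name, input, expected output).
# Adding a scenario means adding one row, not one more test function.
CASES = [
    ("plain", "alice", "alice"),
    ("mixed case", "Alice", "alice"),
    ("surrounding spaces", "  alice  ", "alice"),
    ("inner spaces", "alice  smith", "alice_smith"),
    ("empty", "", ""),
]


def run_table() -> list[str]:
    """Run every row and return a description of each failing scenario."""
    failures = []
    for name, raw, expected in CASES:
        actual = normalize_username(raw)
        if actual != expected:
            failures.append(f"{name}: expected {expected!r}, got {actual!r}")
    return failures


if __name__ == "__main__":
    problems = run_table()
    print("all scenarios pass" if not problems else "\n".join(problems))
```

The part that still needs a human is the table itself: whether the listed scenarios are the right ones is exactly what has to be reviewed.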
> Prompt: Write unit tests
TDD in shambles. What you'd like is:
> Give your specs to some AI
> Get a test suite generated with all edge cases accounted for
> Code
I'll be looking into ways of running a local LLM for this purpose (code assistance in VS Code). I'm already really impressed with various quite large models running on my 32 GB Mac Studio M2 Max via Ollama. It feels like having a locally running chatgpt.
It immediately works out of the box and that's it. I've been using local LLMs on my laptop for a while, it's pretty nice.
The only thing you really need to worry about is VRAM. Make sure your GPU has enough memory to run your model and that's pretty much it.
Also "open webui" is the worst project name I've ever seen.
The Apple Silicon Macs have this shared memory between CPU and GPU that lets the GPU (relatively underpowered compared to a decent Nvidia GPU) run these models at decent speeds, compared with a CPU, when using llama.cpp.
This should all get dramatically better/faster/cheaper within a few years, I suspect. Capitalism will figure this one out.
I assume, then, that the primary goal would be to drop in the beefiest GPU possible when on windows/linux?
In Windows and Linux, yes you'll want at least 12GB of VRAM to have much of any utility but the beefiest consumer GPUs are still topping out at 24GB which is still pretty limiting.
I'm sure that there are other much more knowledgeable people here though, on this topic.
The AI is fine, but every time it makes a little mistake that I have to correct, it really breaks my flow. I might type a lot more boilerplate without it, but I get better flow, and overall that saves me time with fewer mistakes.
But you make an interesting point: eventually AI will be making code for other AIs and machines, and human verification can be an afterthought.
You tell investors that AI is freaking magic and going to usher in an age of savings and productivity gains.
You tell your developers that it's a neat autocomplete, they should use carefully.
Some say „this is mere tab completion“, some say „it won’t replace the senior engineer.“
I can remember how many fiercely argued 2 years ago that GenAI and Copilot were producing garbage. But here we are: these systems improve the workflow of creating / editing code enormously. You seniors might not be affected, but there are endlessly many scenarios where it replaces the junior who’d write code to transform data, write scripts, write one-off scripts, or even write boilerplate, test code and what not.
And this is only after a short time. I cannot even imagine what we’ll have ten years from now, when we can probably have much larger context windows where the system can „understand“ the whole code base, not just parts.
I am sorry for low level engineering jobs, but I am super excited as well.
With GenAI I have been writing super complex Elisp code to automate workflow in Emacs, or VBA scripts in Excel, or Bash scripts I wouldn’t have otherwise been able to write, or JavaScript, or quickly write Python code to solve very tricky problems (and I am very high level in Python), or even React code for web apps for my personal use.
The future looks exciting to me.
This is the disconnect. I, along with others, haven't seen this yet. I'm begging to see it because I'd love to automate my work away, but I can't. This comment comes off as hand-wavy to me because it says "here we are" as if Google saying their AI works is evidence itself and not a statement that requires evidence.
Google, and really, the whole financial machine has a vested interest playing up the potential of AI. Unfortunate that it isn’t being given time to grow organically.
Financial incentives at large companies are not aligned with low volumes of code. There are no rewards for less code. People get rewarded for another bullshit framework to slap on their resume. Box me in, no, cube me in to a morass of a thick ingress layer, that uses 1/8th of my CPU.
Claude 3.5-sonnet (latest) is barely able to stay coherent on 500 LOC files, and easily gets tripped up when there are several files in the same directory.
I have tried similarly with o1-preview and 4o, and gemini pro...
If google is using a 5M token context window LLM with 100k+ token-output trained on all the code that is not public... then I can believe this claim.
This just goes to show how critical of an issue this is that these models are behind closed doors.
How is competitive advantage, using in-house developed/funded tools, a critical issue? Every company has tools that only they have, that they pay significantly for to develop, and use extensively. It can often be the primary thing that really differentiates companies who are all doing similar things.
His position seems secure despite these missteps, which highlights an interesting double standard: there appears to be far more tolerance for strategic failures at the CEO level compared to the rigorous performance standards expected of engineering staff.
NO! False. I can confirm they are not. I've known of several major, obvious, unfixed bugs/flaws in Google apps for years, and in the last year or so especially there's been an explosion in the number of head-scratching, jaw-dropping fails and UX anti-patterns in their code. Gmail, Search, Maps and Android are now riddled with them.
On Sundar Pichai's watch, he's been devolving Google into yet another Microsoft type in terms of quality, care and taste.
My conclusion is that we are at the first wave of a split between those who use LLMs to augment their abilities and knowledge, and those who delay. In cyberpunk terminology, it's aug-tech, not real AGI. (And the lesser one's coding abilities and the simpler the task, the greater the benefit; it's an accelerator.)
What's missing is that code being written by AI may have less of an impact than datasets that are developed or refined by AI. Consider examples like a utility function's coefficients, or the weights of a model.
As these are aggressively tuned using ML feedback, they'll influence far more systems than raw code.
1. Generate unit tests for modules which are already written to be tested 2. Generate documentation for interfaces
Both of these require quite deep knowledge in what to write, then it simply documents and fills in the blanks using the context which already has been laid out.
I will be curious to see if it has any impact positive or negative over a couple of years.
Will the code be more secure since the AI does not make the mistakes humans do?
Or will the code, not well enough understood by the employees, expose exploits that would not be there otherwise?
Will it change average up time?
More lines == more shit to maintain. Complex lines == the shit is unmanageable.
But wall street investors love simplistic narratives such as More X == More revenue. So here we are. Pretty clever marketing imo.
These statements are brilliant.
Search Copy-Paste as a Service is hiding a deeper issue.
Alphabet ($GOOG) 2024 Q3 earnings release
https://news.ycombinator.com/item?id=41988811
How do they know this? At face value, it sounds like a lot, but it only says "new code generated". Nothing about code making it into source control or production, or even which parts of Google's vast business units.
For all we know, this could be the result of some internal poll "Tell us if you've been using Goose recently" or some marketing analytics on the Goose "Generate" button.
It's a puff piece to put Google back in the limelight, and everyone is lapping it up.
:)
If this stuff really works as well as these companies claim it does, wouldn’t their entire workforce excitedly be using these tools already?
I expect these tools to improve productivity for new-ish developers, however for anyone that is literate in a programming language the effect is marginal at best ("Copilot pause" etc.)
Not to mention Elon publicly demonstrated losing 80% of staff when he took over Twitter and - you can complain about his management all you want - as someone who's been using it the whole way through, from a technical POV their downtimes and software quality have not been any worse and they're shipping features faster. A lot of software companies are overstaffed, especially Google, which has spent years paying people to make projects just to get a PO promoted, then letting the projects rot and die to be replaced by something else. That's a lot of useless work being done.
You have to establish that the CEO is actually aware of the reality and is interested in accurately conveying that to you. As far as I can tell there is absolutely no reason to believe any part of this.
which puts it in line with previous code-generation technologies i would imagine. I wonder which of these increased productivity the most?
- Assembly Language
- early Compilers
- databases
- graphics frameworks
- ui frameworks (windows)
- web apps
- code generators (rails scaffolding)
- genAI
The gap between "high level assembly" and "compiled language" is about as large as it gets.
Is that a good thing though? We should work on making code small and easy to manage without AI tools.
Seriously. The penchant for outright ignoring user search terms, relentlessly forcing irrelevant or just plain wrong information on users, and the obnoxious UI changes on YouTube! If I'm watching a video on full screen I have explicitly made it clear that I want YouTube to only show me video! STOP BRINGING UP THE FUCKING VIDEO DESCRIPTION TO TAKE UP HALF THE SCREEN IF I TRY TO BRIEFLY SWIPE TO VIEW THE TIME OR READ A MESSAGE.
I have such deep-seated contempt for AI and its products for just how much worse it makes people's lives.
Then again, their devices are also coming out with known fatal design flaws, like not being able to make phone calls, or the screen going black permanently.
Like, isn't this announcement a terrible indictment of how inexperienced their engineers are, or how trivial the problems they solve are, or both?
This bothers me. I completely understand the conversational aspect - "what approach might work for this?", "how could we reduce the crud in this function?" - it worked a lot for me last year when I tried learning C.
But the vast majority of AI use that I see is...not that. It's just glorified, very expensive search. We are willing to burn far, far more fuel than necessary because we've decided we can't be bothered with traditional search.
A lot of enterprise software is poorly cobbled together using stackoverflow gathered code as it is. It's part of the reason why MS Teams makes your laptop run so hot. We've decided that power-inefficient software is the best approach. Now we want to amplify that effect by burning more fuel to get the same answers, but from an LLM.
It's frustrating. It should be snowing where I am now, but it's not. Because we want to frivolously chase false convenience and burn gallons and gallons of fuel to do it. LLM usage is a part of that.
In other words, it's a skill issue. LLMs can only make this worse. Hiring unskilled programmers and giving them a machine for generating garbage isn't the way. Instead, train them, and reject low quality work.
I don't think finding such programmers is really difficult. What is difficult is finding such people if you expect them to be docile to incompetent managers and other incompetent people involved in the project who, for example, got their position not by merit and competence, but by playing political games.
In my opinion the reason we get enterprise spaghetti is largely due to requirement issues and scope creep. It's nearly impossible to create a streamlined system without knowing what it should look like. And once the system gets to a certain size, it's impossible to get business buy-in to rearchitect or refactor to the degree that is necessary. Plus the full requirements are usually poorly documented and long forgotten by that time.
> it's impossible to get business buy-in to rearchitect or refactor to the degree that is necessary
That's a choice. There are some other options:
- Simply don't get business buy-in. Do without. Form a terrorist cell within your organization. You'll likely outpace them. Or you'll get fired, which means you'll get severance, unemployment, a vacation, and the opportunity to apply to a job at a better company.
- Fight viciously for engineering independence. You business people can do the businessing, but us engineers are going to do the engineering. We'll tell you how we'll do it, not the other way.
- Build companies around a culture of doing good, consistent work instead of taking expedient shortcuts. They're rare, but they exist!
Or simply find a position in an industry or department where you commonly have more independence. In my opinion this fight is not worth it - look for another position instead is typically easier.
Congratulations, you just refactored out a use case which was documented in a knowledge base which has been replaced by 3 newer ones since then, happens once every 18 months and makes the company go bankrupt if it isn't carried out promptly.
The type of junior devs who think that making code tidy is fixing the application are the type of dev who you don't let near the heart of the code base, and incidentally the type who are best replaced with code gen AI.
And regardless, the way you prevent loss of important functionality isn't by hoping people read docs that no longer exist. It's by writing coarse-grained tests that makes sure the software does the important things. If a programmer wants to change something that breaks a test like that, they go ask a product manager (or whatever you call yours) if that feature still matters.
And if nobody can say whether a feature still matters, the organization doesn't have a software problem, it has a serious management problem. Not all the coding techniques in the world can fix that.
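A minimal sketch of what "coarse-grained" means here, in Python. Everything in it is hypothetical: `build_invoice` stands in for whatever the real system does, and the amounts are made up. The point is only that the test pins down an outcome the business cares about and says nothing about how the code gets there.

```python
def build_invoice(items: list[tuple[int, int]], tax_percent: int) -> dict[str, int]:
    """Hypothetical stand-in for the system under test.

    items is a list of (quantity, unit_price_cents); all amounts are integer cents.
    """
    subtotal = sum(qty * price for qty, price in items)
    tax = subtotal * tax_percent // 100
    return {"subtotal": subtotal, "tax": tax, "total": subtotal + tax}


def test_customer_is_charged_tax_on_the_whole_order():
    # Asserts only the externally visible result. Internals can be refactored
    # freely; this fails only if the feature itself changes or disappears.
    invoice = build_invoice([(2, 1000), (1, 500)], tax_percent=20)
    assert invoice["total"] == 3000


if __name__ == "__main__":
    test_customer_is_charged_tax_on_the_whole_order()
    print("important behaviour still works")
```

If a refactor breaks a test like this, that is the moment to go ask whether the feature still matters, rather than hoping someone remembers a doc.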
Without redoing their work or finding a way to have deep trust (which is possible, but uncommon at a bigcorp) it's hard enough to tell who is earnest and who is faking it (or buying their own baloney) when it comes to propositions like "investing in this piece of tech debt will pay off big time"
As a result, if managers tend to believe such plans, bad ideas drive out good and you end up investing in a tech debt proposal that just wastes time. Burned managers therefore cope by undervaluing any such proposals and preferring the crappy car that at least you know is crappy over the car that allegedly has a brand new 0 mile motor on it but you have no way of distinguishing from a car with a rolled back odometer. They take the locally optimal path because it's the best they can do.
It's taken me 15 years of working in the field and thinking about this to figure it out.
The only way out is an organization where everyone is trusted and competent and is worthy of trust, which again, hard to do at most random bigcorps.
This is my current theory anyway. It's sad, but I think it kind of makes sense.
Being good at the NATO style of management means focusing on the big picture--what, when, why--and leaving how to the people actually doing it.
The way I explain this to managers is that software development is unlike most work. If I'm making widgets and I fuck up, that widget goes out the door never to be seen again. But in software, today's outputs are tomorrow's raw materials. You can trade quality for speed in the very short term at the cost of future productivity, so you're really trading speed for speed.
I should add, though, that one can do the rigorous thinking before or after the doing, and ideally one should do both. That was the key insight behind Martin Fowler's "Refactoring: Improving the Design of Existing Code". Think up front if you can, but the best designs are based on the most information, and there's a lot of information that is not available until later in a project. So you'll want to think as information comes in and adjust designs as you go.
That's something an LLM absolutely can't do, because it doesn't have access to that flow of information and it can't think about where the system should be going.
This is an important point. I don't remember where I read it, but someone said something similar about taking a loss on your first few customers as an early stage startup--basically, the idea is you're buying information about how well or poorly your product meets a need.
Where it goes wrong is if you choose not to act on that information.
> ...
> train them, and reject low quality work.
I agree very strongly with both of these points.
But I've observed a truth about each of them over the last decade-plus of building software.
1) very few people approach the field of software engineering with anything remotely resembling rigor, and
2) there is often little incentive to train juniors and reject subpar output (move fast and break things, etc.)
I don't know where this takes us as an industry? But I feel your comment on a deep level.
This is a huge problem. I don't know where it comes from, I think maybe sort of learned helplessness? Like, if systems are so complex that you don't believe a single person can understand it then why bother trying anyway? I think it's possible to inspire people to not accept not understanding. That motivation to figure out what's actually happening and how things actually work is the carrot. The stick is thorough, critical (but kind and fair) code--and, crucially, design--review, and demanding things be re-done when they're not up to par. I've been extremely lucky in my career to have had senior engineers apply both of these tools excellently in my general direction.
> 2) there is often little incentive to train juniors and reject subpar output (move fast and break things, etc.)
One problem is our current (well, for years now) corporate culture is this kind of gig-adjacent-economy where you're only expected to stick around for a few years at most and therefore in order to be worth your comp package you need to be productive on your first day. Companies even advertise this as a good thing "you'll push code to prod on your first day!" It reminds me of those scammy books from when I was a kid in the late 90s "Learn C In 10 Days!".
I think it's a bunch of things, but one legitimate issue is that software is stupidly complex these days. I had the advantage of starting when computers were pretty simple and have had a chance to grow along with it. (And my dad started when you could still lift up the hood and look at each bit. [1])
When I'm working with junior engineers I have a hard time even summing up how many layers lie beneath what they're working on. And so much of what they have to know is historically contingent. Just the other day I had to explain what LF and CR mean and how it relates to physical machinery that they probably won't see outside of a museum: https://sfba.social/@williampietri/113387049693365012
So I get how junior engineers struggle to develop a belief that they can sort it all out. Especially when so many people end up working on garbage code, where little sense is to be had. It's no wonder so many turn to cargo culting and other superstitious rituals.
[1] https://en.wikipedia.org/wiki/Magnetic-core_memory
I’ve had the good fortune of getting to write some firmware that will likely work well for a long time to come, but I find most things being written on computers are written with (or very close to) the minimum care possible in order to get the product out. Clean up is intended but rarely occurs.
I think we’d see real benefits from doing a better job, but like many things, we fail to invest early and crave immediate gratification.
I have this one opinion which I would not say at work:
In software development it's easy to feel smart because what you made "works" and you can show "effects".
- Does it wrap every failable condition in `except Exception`? Uhh, but look, it works.
- Does it define a class hierarchy for what should be a dictionary lookup? It works great tho!
- Does it create a cyclic graph of objects calling each other's methods to create more objects holding references to the objects that created them? And for what, to produce a flat dictionary of data at end of the day? But see, it works.
this is getting boring, maybe just skip past the list
- Does it stuff what should be local variables and parameters in self, creating a big stateful blob of an object where every attribute is optional and methods need to be called in the right order, otherwise you get an exception? Yes, but it works.
- Does it embed a browser engine? But it works!
The programmer, positively affirmed, continues spewing out crap, while the seniors keep fighting fires to keep things running, while insulating the programmer from the taste of their own medicine.
But more generally, it's hard to expect people to learn how to solve problems simply if they're given gigantic OO languages with all the features and no apparent cost to any of them. People learn how to write classes and then never learn to get good at writing code with a clear data flow.
Even very bright people can fall into this trap because engineering isn't just about being smart but about using intelligence and experience to solve a problem while minmaxing correctly chosen properties. Those properties should generally be: dev time, complexity (state/flow), correctness, test coverage, ease of change, performance (anything else?). Anyway, "Affirming one's opinions about how things should be done" isn't one of them.
Ahh, well, in order to save money, training is done via an online class with multiple choice questions, or, if your company is like mine and really committed to making sure that you know they take your training seriously, they put portions of a generic book on 'tech Z' in PDF form spread over DRM-ridden web pages.
As for code, that is reviewed, commented and rejected by LLMs as well. It used to be turtles. Now it truly is LLMs all the way down.
That said, in a sane world, this is what should be happening for a company that actually wants to get good results over time.
There is no incentive to do it. I worked that way, focused on quality and testing and none of my changes blew up in production. My manager opined that this approach is too slow and that it was ok to have minor breakages as long as they are fixed soon. When things break though, it's blame game all around. Loads of hypocrisy.
Traditional search (at least on the web) is dying. The entire edifice is drowning under a rapidly rising tide of spam and scam sites. No one, including Google, knows what to do about it so we're punting on the whole project and hoping AI will swoop in like deus ex machina and save the day.
Google wasn’t crushed by spam, they decided to stop doing text search and build search bubbles that are user specific, location-specific, decided to surface pages that mention search terms in metadata instead of in text users might read, etc. Oh yeah, and about a decade before LLMs were actually usable, they started to sabotage simple substring searches and kind of force this more conversational interface. That’s when simple search terms stopped working very well, and you had to instead ask yourself “hmm how would a very old person or a small child phrase this question for a magic oracle”
This is how we get stuff like: Did you mean “when did Shakespeare die near my location”? If anyone at google cared more about quality than printing money, that thirsty gambit would at least be at the bottom of the page instead of the top.
By the time I graduated high school you already couldn't trust that Boolean operators would be treated literally. By the time I graduated college, they basically didn't seem to do anything; at best they were a weak suggestion.
Nowadays quotes don't even seem to be consistently honored.
I suspect the real problem is that search engines ceased being search engines when they stopped taking things literally and started trying to interpret what people mean. Then they became some sort of poor man's AI. Now that we have LLMs, of course it is going to replace the poor excuse for search engines that exist today. We were heading down that road already, and it actually summarizes what is out there.
Then Google decided to start enforcing that, because they had this idea that they would be able to divine your "intent" from a "natural question" rather than just matching documents including your search terms.
Google’s verbatim search option roughly does that for me (plus an ad blocker that removes ads from the results page). I have it activated by default as a search shortcut.
(To activate it, one can add “tbs=li:1” as a query parameter to the Google search URL.)
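If you want to generate such links rather than hand-edit them, a minimal Python sketch follows. The `tbs=li:1` parameter is the one mentioned above; the base URL `https://www.google.com/search`, the `q` parameter name, and the helper name `verbatim_search_url` are assumptions for illustration, and Google could change any of this behaviour at any time.

```python
from urllib.parse import urlencode

# Assumed base URL for Google web search (not stated in the comment above).
GOOGLE_SEARCH = "https://www.google.com/search"


def verbatim_search_url(query: str) -> str:
    """Build a search URL with the verbatim option (tbs=li:1) switched on."""
    params = {"q": query, "tbs": "li:1"}
    # safe=":" keeps the colon in "li:1" readable instead of encoding it as %3A.
    return f"{GOOGLE_SEARCH}?{urlencode(params, safe=':')}"


print(verbatim_search_url("pg_trgm index"))
```

The same template, with a placeholder where the query goes, is what you would paste into a browser's custom search shortcut.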
Thank you, this is almost life-alteringly good to know.
Seems that Firefox on mobile allows editing the url for most pages, but on google search results pages, the url bar magically turns into a did-you-mean alternate search selector where I cannot see nor edit a url. Surprised but not surprised.
Sure, there’s a work around for this too, somehow. But I don’t want to spend my life collecting and constantly updating a huge list of temporary hacks to fix things that others have intentionally broken.
Yes, it’s annoying that you can’t set it as the default on Google search itself.
Even more naive, but my personal preference: just ban all advertising. The fact that people will pay for ChatGPT implies people will also pay for good search if the free alternative goes away.
Google results are polluted with spam because it is more profitable for Google. This is a conscious decision they made five years ago.
Then why are DuckDuckGo results also (arguably even more so) polluted with spam/scam sites? I doubt DDG is making any profit from those sites since Google essentially owns the display ad business.
Google not only has multiple monopolies, but a cut and dry perverse incentive to produce lower quality results to make the whole session longer instead of short and effective.
For example: a beginner developer is possibly better served by some SEO-heavy tutorial blog post; an experienced developer would prefer results weighted towards the official docs, the project’s bug tracker and mailing list, etc. But since less technical and non-technical people vastly outnumber highly technical people, Google and Bing end up focusing on the needs of the former, at the cost of making search worse for the latter.
One positive about AI: if an AI is doing the search, it likely wants the more advanced material not the more beginner-focused one. It can take more advanced material and simplify it for the benefit of less experienced users. It is (I suspect) less likely to make mistakes if you ask it to simplify the more advanced material than if you just gave it more beginner-oriented material instead. So if AI starts to replace humans as the main clients of search, that may reverse some of the pressure to “dumb it down”.
I mostly agree with your interesting comment, and I think your analysis basically jibes with my sibling comment.
But one thing I take issue with is the idea that this type of thing is a good faith effort, because it’s more like a convenient excuse. Explaining substring search or even include/exclude ops to children and grandparents is actually easy. Setting preferences for tutorials vs API docs would also be easy. But companies don’t really want user-directed behavior as much as they want to herd users to preferred content with algorithms, then convince the user it was their idea or at least the result of relatively static ranking processes.
The push towards more fuzzy semantic search and “related content” everywhere is not to cater to novice users but to blur the line between paid advertisement and organic user-directed discovery.
No need to give megacorp the benefit of the doubt on stuff like this, or make the underlying problems seem harder than they are. All platforms land in this place by convergent evolution wherein the driving forces are money and influence, not insurmountable technical difficulties or good intentions for usability.
Good luck finding those; you end up with SEO spam and clone page spam. These days you have to look for unobvious hidden meanings which only relate to your exact problem to find what you are looking for.
I have the strong feeling search these days is back to the Altavista era. You'd have to use trickery to find what you were looking for back then as well. Too bad + no longer works in google due to their stupid naming of a dead product (no, literal is not the same and no replacement).
That's not my experience at all. While there are scammy sites, using the search engines as an index instead of an oracle still yields useful results. It only requires learning the keywords, which you can do by reading the relevant materials.
The problem with Google search is that it indexes all the web, and there's (as you say) a rising tide of scam and spam sites.
The problem with AI is that it scoops up all the web as training data, and there's a rising tide of scam and spam sites.
Tailoring/retraining the main search AI will be so much more expensive than retraining the special-purpose spam AIs.
You make this claim with such confidence, but what is it based on?
There have always been hordes of spam and scam websites. Can you point to anything that actually indicates that the ratio is now getting worse?
No, there haven't always been hordes of spam and scam websites. I remember the web of the 90s. When Google first arrived on the scene every site on the results page was a real site, not a spam/scam site.
I'm sure they can. But they have no incentive. Try to Google an item, and it will show you a perfect match of sponsored ads and some other not-so-relevant non-sponsored results
He couldn't pronounce the name of the extension, apparently not noticing that trgm == trigram, or what that might even be. Copying the output from the LLM and pasting it into a PR didn't result in anything other than him checking off a box, moving a ticket in Jira, and then onto the next thing--not even a pretense of being curious about what any of it all meant. But look at those query times now!
It's been possible for a while to shut off your brain as a programmer and blindly copy-paste from StackOverflow etc., but the level of enablement that LLMs afford is staggering.
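For anyone wondering what the "trgm" in that extension name refers to, here is a minimal sketch of trigram matching. It is only loosely modeled on how PostgreSQL's pg_trgm is documented to behave (lowercase the text, pad each word with two leading spaces and one trailing space, compare sets of three-character runs). The function names `trigrams` and `similarity` in this TypeScript are illustrative, not the extension's implementation.

```typescript
// Simplified trigram extraction (an approximation, not pg_trgm itself):
// lowercase, split into alphanumeric words, pad each word with two leading
// spaces and one trailing space, then collect every run of three characters.
function trigrams(text: string): Set<string> {
  const result = new Set<string>();
  const words = text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((w) => w.length > 0);
  for (const word of words) {
    const padded = `  ${word} `;
    for (let i = 0; i + 3 <= padded.length; i++) {
      result.add(padded.slice(i, i + 3));
    }
  }
  return result;
}

// Similarity as shared trigrams divided by distinct trigrams across both inputs.
function similarity(a: string, b: string): number {
  const ta = trigrams(a);
  const tb = trigrams(b);
  if (ta.size === 0 && tb.size === 0) return 0;
  let shared = 0;
  ta.forEach((t) => {
    if (tb.has(t)) shared += 1;
  });
  return shared / (ta.size + tb.size - shared);
}
```

With this sketch, `trigrams("cat")` contains the four entries "  c", " ca", "cat" and "at ". An index over such sets is what lets a database answer fuzzy or substring-style queries without scanning every row, which is presumably why the query times improved.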
In fact, I'd say I use AI more for documentation than I do for code itself, because AI generated documentation is often superior to official documentation.
In the end, these things shouldn't be necessary (or barely necessary) if we had well constructed languages, frameworks, libraries and documentation, but it appears like it's easier to build AI than to make things non-convoluted in the first place.
"How do I do a for loop" though is a waste of time and energy and should be put into a search engine. There is no need to use the inefficient power needs of an LLM to answer that question. The search engine will have cached the results of that question, leading to a much faster discovery of the answer, and less power draw to do it, whereas an LLM needs to ponder your question EVERY. SINGLE. TIME. A huge waste.
Stop using LLMs for simple things.
Traditional search was only Google, and Google figured out that they don't need to improve their tools to make it better, because everyone will continue to use it by force of habit (google is a verb!). Traditional search is being abandoned because traditional search isn't good enough for the kinds of search we need. (Also, while Google may claim their search is very useful, people rarely search for stuff nowadays; they prefer being passively fed content via recommendation algorithms, which also use AI!)
[0]: https://www.lens.org/lens/search/
Since the collapse of Internet search (rose tinted hindsight - was it ever any good?) I have been using an LLM as my syntax advisor. I pay for my own tokens, and I can say it is astonishingly cheap.
It is also very good.
That's because traditional search fucking sucks balls.
Honestly, I think it will become a better Intellisense but not much more. I'm a little excited because there's going to be so many people buying into this, generating so much bad code/bad architecture/etc. that will inevitably need someone to fix after the hype dies down and the rug is pulled, that I think there will continue to be employment opportunities.
We also have the ceremonial layers of certain forms of corporate architecture, where nothing actually happens, but the steps must exist to match the holy box, box cylinder architecture. Ceremonial input massaging here, ceremonial data transformation over there, duplicated error checking... if it's easy for the LLM to do, maybe we shouldn't be doing it everywhere in the first place.
I don't know that I've ever even met a developer who wants to be writing endless pools of trivial boilerplate instead of meaningful code. Even the people at work who are willing to say they don't want to deal with the ambiguity and high level design stuff and just want to be told what to do pretty clearly don't want endless drudgery.
We want to control code at the call site, boilerplate helps with that by being locally modifiable.
We also want to systematize chunks of code so that they don’t flicker around and mess with a reader.
We wanted this since forever and no one does anything because anything above simple text completion is traditionally seen as an overkill, not the true way, not unix, etc. All sorts of stubborn arguments.
This can be solved by simply allowing code trees instead of lines of code (tree vs table). You drop a boilerplate into code marked as “boilerplate ‘foo’ {…}” and edit it as you see fit, which creates a boilerplate-local patch. Then you can instantly see diffs, find, update boilerplates, convert them to and from regular functions, merge best practices from boilerplate libraries, etc. Problem solved.
It feels like the development itself got collectively stuck in some stupid principles that no one dares to question. Everything that we invent stumbles upon the simple fact that we don’t have any sensible devtime structure, apart from this “file” and “import file” dullness.
When I hear that most code is trivial, I think of this as a language design or a framework related issue making things harder than they should be.
Throwing AI or generators at the problem just to claim that they fixed it is just frustrating.
This was one of my thoughts too. If the pain of using bad frameworks and clunky languages can be mitigated by AI, it seems like the popular but ugly/verbose languages will win out since there's almost no point to better designed languages/frameworks. I would rather have a good language/framework/etc. where it is just as easy to write the code directly: similar implementation time to an LLM prompt, but more deterministic.
If people don't feel the pain of AI slop why move to greener pastures? It almost encourages things to not improve at the code level.
Just as an example, I have "service" functions. They're incredibly simple, a higher order function where I can inject the DB handler, user permissions, config, etc. Every time I write one of these I have to import the ServiceDependencies type and declare which dependencies I need to write the service. I now spend close to zero time doing that and all my time focusing on the service logic. I don't see a downside to this.
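To make that concrete, here is a minimal sketch of what such a "service" function might look like in TypeScript. Only the name `ServiceDependencies` comes from the comment above; the fields, the `Db` interface, the `makeListInvoices` service and the SQL are all assumptions made up for illustration.

```typescript
// Hypothetical database handle; a real one would come from a driver or ORM.
interface Db {
  query(sql: string, params: unknown[]): Promise<unknown[]>;
}

// Hypothetical dependency bag. Field names are illustrative only.
interface ServiceDependencies {
  db: Db;
  permissions: { canRead(resource: string): boolean };
  config: { pageSize: number };
}

// Higher-order function: the outer call declares and receives only the
// dependencies this service needs, the inner function holds the service logic.
const makeListInvoices =
  ({ db, permissions }: Pick<ServiceDependencies, "db" | "permissions">) =>
  async (customerId: string): Promise<unknown[]> => {
    if (!permissions.canRead("invoices")) {
      throw new Error("forbidden");
    }
    return db.query("SELECT id, total FROM invoices WHERE customer_id = $1", [
      customerId,
    ]);
  };
```

The import, the `Pick<...>` declaration and the destructuring are the repetitive part that autocomplete tends to fill in; the permission check and the query are the part the author still writes.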
Most of my business logic is done in raw SQL, which can be complex, but the autocomplete often helps there too. It's not helping me figure out the logic, it's simply cutting down on my typing. I don't know how anyone could be offered "do you want to type significantly fewer characters on your keyboard to get the same thing done?" and say "no thanks". The AI is almost NEVER coding for me, it's just typing for me and it's awesome.
I don't care how lean your system is, there will at least be repetition in how you declare things. There will be imports, there will be dependencies. You can remove 90% of this repetitive work for almost no cost...
I've tried to use ChatGPT to "code for me", and I agree with you that it's not a good option if you're trying to do anything remotely complex and want to avoid bugs. I never do this. But integrated code suggestions (with Supermaven, NOT CoPilot) are incredibly beneficial and maybe you should just try it instead of trying to come up with theoretical arguments. I was also a non-believer once.
Regardless, I do wonder how accurate those successful reports are. Do people take LLM output, use it verbatim, not notice subtle bugs, and report that as success?
The way google works, the person changing an interface is responsible for updating all dependent code. They create PRs which are then sent to code owners for approval. For lower-level dependencies, this can involve creating thousands of PRs across hundreds of projects.
Google has had tooling to help with these large-scale refactors for decades, generally taking the form of static analysis tools. However, these would be inherently limited in their capability. Manual PR authoring would still be required in many cases.
With this background, LLM code gen seems like a natural tool to augment Google's existing process.
I expect Google is currently executing a wave of newly-unblocked refactoring projects.
If anyone works/worked at google, feel free to correct me on this.
If we’re guessing which code is both easiest to write and the largest proportion of the codebase, my first guess would be test suites. Lots of lines of repetitive code patterns, which AI is decent at dealing with.
I've been using LLMs for about a month now. It's a nice productivity gain. You do have to read generated code and understand it. Another useful strategy is pasting a buggy function and asking for revisions.
I think most programmers who claim that LLMs aren't useful are reacting emotionally. They don't want LLMs to be useful because, in their eyes, that would lower the status of programming. This is a silly insecurity: ultimately programmers are useful because they can think formally better than most people. For the foreseeable future, there's going to be massive demand for that, and people who can do it will be high status.
I don't think that's true. Most programmers I speak to have been keen to try it out and reap some benefits.
The almost universal experience has been that it works for trivial problems, starts injecting mistakes for harder problems and goes completely off the rails for anything really difficult.
I’ve been seeing the complete opposite. So it’s out there.
That's a bold statement, and incorrect, in my opinion.
At a junior level software development can be about churning out trivial code in a previously defined box. I don't think it's fair to call that 'most programming'.
Most of the time, when I am typing code, the code I am producing is trivial, however.
I am just pointing out that the thread parent started his logical climb at a step one that is incorrect: 'Most programming is trivial'.
Given that they got it wrong on step one, how good do you think step ten is?
Even with the debugging example, if I just read what I wrote I'll find the bug because I understand the language. For more complex bugs, I'd have to feed the LLM a large fraction of my codebase and at that point we're exceeding the level of understanding these things can have.
I would be pretty happy to see an AI that can do effective code reviews, but until that point I probably won't bother.
As an example of not being production ready: I recently tried to use ChatGPT-4 to provide me with a script to manage my gmail labels. The APIs for these are all online, I didn't want to read them. ChatGPT-4 gave me a workable PoC that was extremely slow because it was using inefficient APIs. It then lied to me about better APIs existing and I realized that when reading the docs. The "vibes" outcome of this is that it can produce working slop code. For the curious I discuss this in more specific detail at: https://er4hn.info/blog/2024.10.26-gmail-labels/#using-ai-to...
Does a carpenter blame their hammer when it fails to drive in a screw?
However this laser measuring tool is accurate within a range. There are a lot of factors that affect its accuracy like time of day, how you hold it, the material you point it at, etc. Sometimes these accuracy errors are minimal, sometimes they are pretty big. You end up getting a lot of measurements that seem "close enough", but you still need to ask if each one is correct. "Measure Twice, Cut Once" begins to require one measurement with the laser tool and one with the conventional tool when accuracy matters.
One could have a convoluted analogy where the carpenter has an electric hammer that for some reason has a rounded head that does cause some number of nails to not go in cleanly, but I like my analogy better :)
That's the exact problem. I have plenty of screwdrivers but there's so much pressure from people not in carpentry telling me to use this shiny new Swiss Army knife contraption. Will it work? Probably, if I'm just screwing in a few screws. Would I readily abandon my set of precision-built, magnetic-tip, etc. screwdrivers for it? Definitely not.
I'm sure it's great for non-carpenters to have so many tools in so small a space. But I developed skills and tools already. My job isn't just to screw in a few screws a day and call it quits. People wanting to replace me for a quarter the cost for this Swiss army carpenter will quickly see a quality difference and realize why it's not a solution to everything.
Or in the software sense, maybe they are fine with unlevel shelves and hanging nails in carpet. It's certainly not work I'd find acceptable.
I think revealing the domain each programmer works in and asking in those domains would reveal obvious trends. I imagine if you work in Web that you'll get workable enough AI gen code, but something like High Performance computing would get slop worse than copying and pasting the first result on Stackoverflow.
A model is only as good as its learning set, and not all types of code are readily indexable.
I think that’s exactly right. I used to have to create the puzzle pieces and then fit them together. Now, a lot of the time something else makes the piece and I’m just doing the fitting together part. Whether there will come a day when we just need to describe the completed puzzle remains to be seen.
(Or if you’re being paid to waste time, maybe consider coding in assembly?)
So don’t be afraid. Learn to use the tools. They’re not magic, so stop expecting that. It’s like anything else, good at some things and not others.
LLMs are great at translating already-rigorously-thought-out pseudocode requirements, into a specific (non-esoteric) programming language, with calls to (popular) libraries/APIs of that language. They might make little mistakes — but so can human developers. If you're good at catching little mistakes, then this can still be faster!
For a concrete example of what I mean:
I hardly ever code in JavaScript; I'm mostly a backend developer. But sometimes I want to quickly fix a problem with our frontend that's preventing end-to-end testing; or I want to add a proof-of-concept frontend half to a new backend feature, to demonstrate to the frontend devs by example the way the frontend should be using the new API endpoint.
Now, I can sit down with a JS syntax + browser-DOM API cheat-sheet, and probably, eventually write correct code that doesn't accidentally e.g. incorrectly reject zero or empty strings because they're "false-y", or incorrectly interpolate the literal string "null" into a template string, or incorrectly try to call Element.setAttribute with a boolean true instead of an empty string (or any of JS's other thousand warts.) And I can do that because I have written some JS, and have been bitten by those things, just enough times now to recognize those JS code smells when I see them when reviewing code.
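As a rough illustration of the three pitfalls named above, here is a small TypeScript sketch with a naive and a safer version of each. All function names are hypothetical, and `AttrTarget` is a stand-in for the two DOM `Element` methods involved, so the snippet runs without a browser.

```typescript
// Pitfall 1: a truthiness check rejects 0 and "" along with null/undefined.
function isMissingNaive(value: number | string | null | undefined): boolean {
  return !value; // wrong: 0 and "" are treated as missing
}
function isMissing(value: number | string | null | undefined): boolean {
  return value === null || value === undefined;
}

// Pitfall 2: template strings stringify null as the literal text "null".
function greetNaive(name: string | null): string {
  return `Hello, ${name}!`;
}
function greet(name: string | null): string {
  return `Hello, ${name ?? "guest"}!`;
}

// Pitfall 3: boolean attributes are signalled by presence, not by the value
// "true". The usual convention is an empty string to set, removal to clear.
// AttrTarget is a minimal stand-in for the relevant part of Element.
interface AttrTarget {
  setAttribute(name: string, value: string): void;
  removeAttribute(name: string): void;
}
function setBooleanAttribute(el: AttrTarget, name: string, on: boolean): void {
  if (on) {
    el.setAttribute(name, "");
  } else {
    el.removeAttribute(name);
  }
}
```

Each naive version passes the "looks right" test at a glance, which is why these are easier to recognize in review than to avoid when writing from memory.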
But just because I can recognize bad JS code, doesn't mean that I can instantly conjure to mind whole blocks of JS code that do everything right and avoid all those pitfalls. I know "the right way" exists, and I've probably even used it before, and I would know it if I saw it... but it's not "on the tip of my tongue" like it would be for languages I'm more familiar with. I'd probably need to look it up, or check-and-test in a REPL, or look at some other code in the codebase to verify how it's done.
With an LLM, though, I can just tell it the pseudocode (or equivalent code in a language I know better), get an initial attempt at the JS version of it out, immediately see whether it passes the "sniff test"; and if it doesn't, iterate just by pointing out my concerns in plain English — which will either result in code updated to solve the problem, or an explanation of why my concern isn't relevant. (Which, in the latter case, is a learning opportunity — but one to follow up in non-LLM sources.)
The product of this iteration process is basically the same JS code I would have written myself — the same code I wanted to write myself, but didn't remember exactly "how it went." But I didn't have to spend any time dredging my memory for "how it went." The LLM handled that part.
I would liken this to the difference between asking someone who knows anatomy but only ever does sculpture, to draw (rather than sculpt) someone's face; vs sitting the sculptor in front of a professional illustrator (who also knows anatomy), and having the sculptor describe the person's face to the illustrator in anatomical terms, with the sketch being iteratively improved through conversation and observation. The illustrator won't perfectly understand the requirements of the sculptor immediately — but the illustrator is still a lot more fluent in the medium than the sculptor is; and both parties have all the required knowledge of the domain (anatomy) to communicate efficiently about the sculptor's vision. So it still goes faster!
They don't have high status even today, imagine in a world where they will be seen as just reviewers for AI code...
Try putting on a dating website that you work at Google vs you work in agriculture and tell us which yielded more dates.
With so many hits, it's about hitting all the checkmarks instead of minmaxing on one check.
My comment was more a critique on online dating culture and the values it weighs compared to in person meetups.
We all have probably 25% or more trivial code. AI is great for that. I have X (table structure, model, data, etc) and I want to make Y with it. A lot of code is pretty much mindless shuffling data around.
The other thing it's good for is anything pretty standard. If I'm using a new technology and I just want to get started with whatever is the best practice, it's going to do that.
If I ever have to do PowerShell (I hate PowerShell), I can get AI to generate pretty much whatever I want and then I'm smart enough to fix any issues. But I really don't like starting from nothing in a tech I hate.
I’m pretty sure they weren’t the first and there’ve been others we didn’t know about. So now I don’t ask lead-in questions anymore. Surprisingly, it doesn’t seem to make much of a difference and I don’t need to get burned again.
But for me it massively improved all the boilerplate generic work. A lot of those things which are just annoying work, but not interesting.
Then I can focus on the bigger things, on the important parts.
From what I've seen on Google Cloud, both as a user and from leaked source code, 25% of their code is probably just packing and unpacking of protobufs.
God I miss that, thanks to the other person on HN for introducing me to projen. Yeoman wasn't cutting it.
These days I write a surprising amount of shell script and awk with LLMs. I review and adapt it, of course, but for short snippets of low context scripting it's been a huge time saver. I'm talking like 3-4, up to 20 lines of POSIX shell.
Idk. Some day I'll actually learn AWK, and while I've gotten decent with POSIX shell (and bash), it's definitely been more monkey see monkey do than me going over all the libraries and reference docs like I did for python and the cpp FAQ.
Of obviously flawed corporate structures. This CEO has no particular programming expertise and most of his company's profits do not seem to flow from this activity. I strongly doubt he has a grip on the actual facts here and is uncritically repeating what was told to him in a meeting.
He should, given his position, have been the very _first_ person to ask the questions you've posed here.
I'm looking for a new job, so I've been grinding leetcode (oof). I'm an experienced engineer and have worked at multiple FAANGs, so I'm pretty good at leetcode.
Today I solved a leetcode problem 95% of the way to completion, but there was a subtle bug (maybe 10% of the test cases failing). I decided to see if Claude could help debug the code.
I put the problem and the code into Claude and asked it to debug. Over the course of the conversation, Claude managed to provide 5 or 6 totally plausible but also completely wrong "fixes". Luckily, I am experienced enough at leetcode, and leetcode problems are simple enough, that I could easily tell that Claude was mistaken. Note that I am also very experienced with prompt engineering, as I ran a startup that used prompt engineering very heavily. Maybe it's a skill issue (my company did fail, hence why I need a job), but somehow I doubt it.
Eventually, I found the bug on my own, without Claude's help. But leetcode problems are super simple, with known answers, and probably mostly in the training set! I can't imagine writing a big system and using an LLM heavily.
Similarly, the other day I was trying to learn about e-graphs (the data structure). I went to Claude for help. I noticed that the more I used Claude, the more confused I became. I found other sources, and as it turns out, Claude was subtly wrong about e-graphs, an uncommon but reasonably well-researched data structure! Once again, it's lucky I was able to recognize that something was up. If the problem wasn't limited in scope, I'd have been totally lost!
I use LLMs to help me code. I'm pro new technology. But when I see people bragging on Twitter about their fully automated coding solutions, or coding complex systems, or using LLMs for medical records or law or military or other highly critical domains, I seriously question their wisdom and experience.
> how trivial the problems they solve are
A single line of code IS trivial. Simple code is good code. If I write the first 3 lines of a complex method and I let Copilot complete the 4th, that's 25% of my code written by an LLM.
These tools have exploded in popularity for good reason. If they were no good, people wouldn't be using them.
I can only assume people making such comments don't actually code on a daily basis and use these tools daily. Either that or you haven't figured out the knack of how to make it work properly for you.
You're saying anything that's ever been popular is popular for a good reason? You can't think of counter examples that disprove this?
You're saying anything that people decide to do is good, or else people wouldn't do it? People never act irrationally? People never blindly act on trends? People never sacrifice long-term results for short-term gain? You can't come up with any counter examples?
I use these tools daily and they help me immensely. If you prefer Googling for docs, browsing stack overflow, or even flicking through textbooks to find the answers/materials you need - that's great! Do what works for you. I value my time slightly more than that and prefer not to remain stuck in the past.
Perhaps you hold all the information you need in your head like an oracle and never need to learn new concepts or ever forget syntax? Wonderful. The rest of us aren't so naturally talented, so have found these new tools super helpful.
With c++ my experience is that the results are completely worthless. It saves you from writing a few keywords but nothing that really helps in a big way.
Yes Copilot CAN work, for example writing some JS or filter functions, but in my job these trivial snippets are rather uncommon.
I’d genuinely love to see some resources that show its usefulness that aren’t just PR bs.
What does that even mean? What are you expecting to see?
I've seen people who can't code ship entire new applications which actually work, in a few days or so. That to me seems more productive?
I use these tools daily in a FAANG level SWE role and they help me debug issues quickly - all the time, especially with tech I'm new to and have no experience with. I really don't understand the hate - it's like skipping stack overflow and giving you the ideal answer a lot faster.
Nobody likes to shout that they're using these tools but most people are.
Except of course AI at least can do spelling. (Or at least I haven't encountered a problem in that regard.)
I'm highly skeptical regarding LLM-assisted development. But I must admit: it works. If paired with an experienced senior developer. IMHO it must not be used otherwise.
Damn, that’s a good way of putting it. But I’ll go one further:
replace "AI" with "junior dev who doesn’t like reading documentation or googling how things work so instead confidently types away while guessing the syntax and API so it kind of looks right”
Currently, they don't learn skills as fast as a motivated intern. A stellar intern can go from no idea to "makes relevant contributions to our product with significant independence and low error rate" (hi Matt if you ever see this) in 3 months. LLMs, to my understanding, take significantly more attention from super smart people working long hours and an army of mechanical Turks, but won't be able to independently implement a feature and will still have a higher error rate in the same 3 months.
It's still super impressive what LLMs can do, but that same intern is going to keep growing at that faster rate in skills and competency as they go from jr->mid->sr. Sure the intern won't have as large of a competency pool, and takes longer to respond to any given question, but the scope of what they can implement is so much greater.
Well, just in the last 24 hours, ChatGPT gave me solutions to some relatively complex problems that turned out to be significantly wrong.
Did that mean it was a complete waste of my time? I’m not sure. Its broken code gave me a starting point for tinkering and exploring and trying to understand why it wasn’t working (even if superficially it looked like it should). I’m not convinced I lost anything by trying its suggestions. And I learned some things in the process (e.g. asyncio doesn’t play well together with Flask-Sock)
This, imho, is what is happening. In the olden days, when StackOverflow + Google used to magically find the exact problem from the exact domain you needed every time - even then you'd often need to sift through the answers (top voted one was increasingly not what you needed) to find what you needed, then modify it further to precisely fit whatever you were doing. This worked fine for me for a long time until search rendered itself worthless and the overall answer quality of StackOverflow has gone down (imo). So, we are here, essentially doing the exact same thing in a much more expensive way, as you said.
Regarding future employment opportunities - this rot is already happening and hires are coming from it, at least from what I'm seeing in my own domain.
Unless you're the type of programmer that is writing sabots all day (connecting round pegs into square holes between two data sources) you've got to be very critical of what these things are spitting out.
If you just let it generate and run the code... yeah, probably, since you won't catch the issues at compile time.
For most of my career, Software Engineering was a misnomer. The field was too young, and the tools used changed too quickly, for an appreciable amount of the work to be systematic and boring enough to consider it an Engineering discipline.
I think we're now at the point where Software Engineering is... actually Engineering. Particularly in the case of large established companies that take software seriously, like Google (as opposed to e.g. a bank).
Call it "trivial" and "boring" all you want, but at some point a road is just a road, and a train track is just a train track, and if it's not "trivial and boring" then you've probably fucked up pretty badly.
I’m an engineer who has been writing code for 20 years, and it’s far from trivial. Maybe web dev for a simple webshop is. Elsewhere, software often has special requirements, be they technical or domain-specific, and both make the process complex rather than simple, IMHO.
Not all engineering is boring. Also, boring is not bad.
A lot of my career has been spent working to make software boring. To the extent that I've helped contribute to the status quo, where we can build certain types of software in a relatively secure fashion and on relatively predictable timelines, I am proud to have made the world more boring!
(Also, complexity can be extraordinarily boring. Some of the most complex things are also the most boring. Nothing more boring than a set of business rules that has an irreducible complexity coming in at 5,211 lines of if-else blocks wrapped in two while loops! Give me a simple set of partial differential equations any day -- much more exciting to work with those! If you're the type of person who enjoys reading tax code, then we just have different definitions of boring; and if you're the type of person who doesn't think tax code is complex, then I'm just a dummy compared to you :))
But e.g. in the early aughts doing structural engineering work for residential new build projects was certainly less engaging and exciting work than building websites.
Most engineering work aims for repeatable and predictable outcomes. That's a good thing, and it's not easy to achieve! But if Software has reached the point where the process of building certain types of software is "repeatable and predictable", and if Google needs a lot of that type of software, then if the main criticism of AI code assistants is "it's only good for repeatable and predictable", well, then the criticism isn't exactly the indictment that skeptics think it is.
There is nothing wrong with boring in the sense I'm using it. Boring can be tremendously intellectually demanding. Also, predictable and repeatable processes are incredibly important if you want quality work at scale. Engineering is a good thing. Maturing as a field is a good thing.
But if we're maturing from "wild west everything is a greenfield project" to "70% of things are pretty systematic and repeatable" then that says something about the criticism of AI coding assistants as being only good for the systematic and repeatable stuff, right?
Also: the AI coding assistant paradigm is coming for structural/mechanical/civil engineering next, and in a big way!
There's even a buzzword for it: KTLO (keep the lights on). You don't want to be spending 100% of your time on KTLO work, but it's unrealistic to expect to do none of it. Most software engineers would gladly outsource this type of scutwork.
Some places also call this "RTB" for "run the business" type work. Nothing but respect for the engineers who enjoy that kind of approach, I work with several!
Source: I work there, see my previous comment.
Surely yes.
I (not at Google) rarely use the LLM for anything more than two lines at a time, but it writes/autocompletes 25% of my code no problem.
I believe Google have character-level telemetry for measuring things like this, so they can easily count it in a way that can be called "writing 25% of the code".
Having plenty of "trivial code" isn't an indictment of the organisation. Every codebase has parts that are straightforward.
Even if their engineers were inexperienced that wouldn't be an indictment in itself so long as they had a sufficient amount of shallow work. Using all experienced engineers to do shallow work is just inefficient, like having brain surgeons removing bunions. Automation is basically a way to transform deep work into a producer of "free" shallow work.
That said, the real impressive thing with code isn't in its creation but in its ability to losslessly delete code and maintain or improve functionality.
Thankfully I don't find it subtle but plain wrong for anything but trivial stuff. I use it (and pay an AI subscription) for things where a false positive won't ruin the day, like parameter validation.
But for anything advanced, it's pretty hopeless.
I've talked with lawyers: same thing. With doctors: same thing.
Which ain't no surprise, seeing how these things work.
> Like, isn't this announcement a terrible indictment of how inexperienced their engineers are, or how trivial the problems they solve are, or both?
Probably lots of highly repetitive boilerplate stuff everywhere. Which in itself is quite horrifying if you think about it.
How do you quantify "new code" - is it by lines of code or number of PRs/changesets generated? I can easily see it being the latter - if an AI workflow suggests 1 naming-change/cleanup commit to your PR made of 3 other human-authored commits, has it authored 25% of code? Arguably, yes - but it's trivial code that ought to be reviewed by humans. Dependabot is responsible for a good chunk of PRs already.
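To make the counting question concrete, here is a minimal sketch with made-up numbers. The `Commit` type, the author labels and the line counts are all assumptions for illustration, not anything Google has published. It only shows that the same PR can report very different "AI share" figures depending on the unit you count in.

```python
from dataclasses import dataclass


@dataclass
class Commit:
    author: str         # "human" or "ai" (hypothetical labels)
    lines_changed: int  # made-up sizes, for illustration only


def ai_share_by_commits(commits):
    """Fraction of commits authored by the AI workflow."""
    return sum(c.author == "ai" for c in commits) / len(commits)


def ai_share_by_lines(commits):
    """Fraction of changed lines that came from AI-authored commits."""
    total = sum(c.lines_changed for c in commits)
    ai = sum(c.lines_changed for c in commits if c.author == "ai")
    return ai / total


# Hypothetical PR: three human commits plus one AI naming/cleanup commit.
pr = [
    Commit("human", 120),
    Commit("human", 45),
    Commit("human", 30),
    Commit("ai", 5),
]

print(f"by commits: {ai_share_by_commits(pr):.1%}")  # 25.0%
print(f"by lines:   {ai_share_by_lines(pr):.1%}")    # 2.5%
```

Counted by commits, the AI "wrote 25%" of this PR; counted by lines, it wrote 2.5%. Neither number is wrong, which is why the headline figure says little without the counting rule.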
Having a monorepo brings plenty of opportunities for automation when refactoring, whether it's AI, AST manipulation or even good old grep. The trick is not to merge the code directly, but to have humans in the loop to approve, or take over and correct the code first.
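As a rough illustration of the AST-manipulation route, here is a small sketch using Python's standard `ast` module (3.9+ for `ast.unparse`). The function names `get_hndlr` and `get_handler` and the helper `propose_rename` are hypothetical, and this is not a description of any internal Google tooling. It only produces a proposed rewrite as text, so a human can still review it before anything is merged.

```python
import ast


class RenameFunction(ast.NodeTransformer):
    """Rename a function's definition and references to its name."""

    def __init__(self, old, new):
        self.old, self.new = old, new

    def visit_FunctionDef(self, node):
        self.generic_visit(node)  # also rewrite references inside the body
        if node.name == self.old:
            node.name = self.new
        return node

    def visit_Name(self, node):
        # Naive: renames every bare name that matches, including any
        # unrelated local variable that happens to share the name.
        if node.id == self.old:
            node.id = self.new
        return node


def propose_rename(source, old, new):
    """Return rewritten source as a proposal; nothing is written to disk."""
    tree = RenameFunction(old, new).visit(ast.parse(source))
    return ast.unparse(tree)


before = """
def get_hndlr(x):
    return x + 1

print(get_hndlr(41))
"""

after = propose_rename(before, "get_hndlr", "get_handler")
print(after)  # a human reviews this before anything is merged
```

Note that `ast.unparse` drops comments and original formatting, and the name matching here ignores scoping, so a real codemod would need a more careful tool. The human-in-the-loop step is what catches those cases.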
Their internal AI tools are presumably trained on their code, and it wouldn't surprise me if the AI is capable of much more internally than public coding AIs are.
Well, Rob Pike said the same thing about experience, and that seemed to piss a lot of people off endlessly.
However, I don't think of it as an indictment. It just seems very reasonable to me. In fact, 25% seems to be on the lower end. Amazon seems to have thousands of software engineers who are doing API-calling-API-calling-API kind of crap. Now, their annual income might be more than my lifetime earnings. But the idea that all these highly paid engineers are doing highly complex work that needs high skill seems to be just a myth that is useful for boosting the egos of engineers and their employers alike.
If anything that's probably an underestimate. Not to downplay the complexity in much of what Google does but I'm sure they also do an absolute ton of tedious, boring CRUD operations that an AI could write.
For the most part, it drives itself.
Yes, the majority of my code is trivial. But I've also had AI iterate on some very non-trivial work, including writing the test suite.
It's basically autocomplete on steroids that predicts your next change in the file, not just the next change on the line.
The copy paste from stack overflow trope is a bit weird, I haven't done that in ten years and I don't think the code it produces is that low quality either. Copy paste from an open source repo on GitHub maybe?
Or maybe there's a KPI around lines of code or commits.
maybe the ai generates 100% of the company's new code, and then by the time the programmers have fixed it, only 25% is left of the AI's ship of Theseus
I think you underestimate the amount of boiler-plate code that a typical job at Google requires. I found it soul-crushingly boring (though their pay is insane).
It is not Netflix or Airbnb or Stripe etc. making this claim; Google managers have a vested interest in this.
If this metric were meaningful, one of two things should have happened: Google should have fired 25% of its developers or built 25% more product.
Either of these would be visible in their financial reporting, and neither has happened.
Metrics like this depend on how you count, which is easily gamed and can be made to show any percentage between 0 and 99 you want. Off the top of my head:
- I could count all AI-generated code used for training as new code
- consider compiler output to assembly as AI code by adding some meaningless AI step to it
- count code generated from boilerplate, perhaps even generated by an LLM now
- mix autocomplete with LLM prompts, and so on
The number only needs to be believable. 25 is believable now; it is not true, but you would believe it. More than 50 has psychological significance and is bad PR about machines replacing humans' jobs; less than 10 is bad for AI sales. 25 works, and all the commenters in this thread are a testament to that.
Now replace all this and much more with 'AI'. If they said AI helped them increase say ad effectivity by 3-5%, I'll start paying attention.
there is a 3rd possibility as well: having spent a huge chunk of change on these techniques, why not overhype it (not outright lie about it) and hope to somewhat recoup the cost from the unsuspecting masses?
How people deal with this is they start by writing the test case.
Once they have that, debugging that 25% comes relatively easily, and after that it's basically packaging up the PR.
This is not a bad thing since you can improve, but constantly dismissing something that a lot of people are finding an amazing productivity boost should give you some pause.
But as of now the field is full of swamps. Of grifters, of people with a solution looking for a problem. Of outright scams of questionable legality being challenged as we speak.
I'll wait until the swamps work themselves out before evaluating an LLM workflow.
LLMs are being used right now by a lot of people, myself included, to do tasks which we would have never bothered with before.
Again, if you don't know how to use them you can learn.
It's a legal nightmare in my domain as of now, so I'll make sure the Sam Breaker-Friends are weeded out. If it's really all the hype it won't be going anywhere in 5 years.
In a purely technical vacuum though: it is truly amazing tech. I will give it that. Although it both excites and alarms me that apparently the power output predicted to properly leverage this at scale is having tech companies consider an investment in nuclear power.
I won't blame people for the latter, I'd love a good quick way out of traditional work as well (gives me more time to hack on stuff without money troubles). But it's not a good model for curiosity and scrutiny. Again, I'll wait it out. Take care.
Investing trillions in carbon free energy for AI is the most benign form of bubble I can imagine. If the bubble pops we have enough base load for the next century and don't die from climate change. If it doesn't we have the expertise to keep building large nuclear power plants.
That's not that much...
Not that I'm really discounting the value of AI here. For example, I've found a ton of value and saved time getting AI to write CDKTF (basically, Terraform in Typescript) config scripts for me. I don't write Terraform that often, there are a ton of options I always forget, etc. So asking ChatGPT to write a Terraform config for, say, a new scheduled task for example saves me from a lot of manual lookup.
But at the same time, the AI isn't really writing the complicated logic pieces for me. I think that comes down to the fact that when I do need to write complicated logic, I'm a decent enough programmer that it's probably faster for me to write it out in a high-level programming language than write it in English first.
IMO it's only really an issue if a competent human wasn't involved in the process, basically a person who could have written it if needed, then they do the work connecting it to the useful stuff, and have appropriate QA/testing in place...the latter often taking far more effort than the actual writing-the-code time itself, even when a human does it.
That said, I've seen even higher ratios. But never in any place that survived for long.
It's only boilerplate if you write it again to set up almost the same thing again. Which, granted, if you are writing bare Terraform config, it's probably both.
But in either case, if your Terraform config is repetitive and a large part of the code of an entire thing (not a repo, repos are arbitrary divisions; maybe "product", but that's also a bad name), then that thing is certainly close to useless.
Everything is getting forced into such a scalable, general-purpose shape that most apps have to add a ridiculous amount of boilerplate.
Not the developer who has written the same effective stanza 10 times before.
Architecturally, it sounds like different architecture components map somewhere close to 1:1 to teams, rather than teams hacking components to be closer coupled to each other because they have the same ownership.
I'd see too much boilerplate as being an organization/management issue rather than a code architecture issue.
Combine that with generic functions, framework boilerplate, OS/browser stuff, or explicit x-y-z code, and your 'boilerplate' (i.e. repetitive, easily reproducible) easily gets to 25% of the code your programmers write every month. If your job is >75% pure human cognition problem solving, you're probably in a higher tier of jobs than the vast majority of programmers on the planet.
There are still things I do in my IDE that I can’t seem to get AI to do. It’s not really close yet. I don’t doubt it could get there eventually, but I suppose I don’t believe it’s about to eat those parts of the industry.
I do anticipate a massive issue from lower skill software jobs vanishing. I don’t know what entry into the industry will look like. There will be a strange gap that’s filled by AI and some people who use it to do basic things but have no idea how it does it. They will be somewhat like data entry workers, knowing how to use a spreadsheet or word processor but having no idea how the program actually works let alone the underlying operating system. I fully expect that to happen, and I can’t properly imagine what the implications will be.
With g3's immense amount of context, LLMs can vastly help you discover how other people are using existing libraries.
in regards to how others are using libraries, that’s where the technology will excel: re-writing code. once it has a stable AST to work with, the mathematical equation it is solving is a refactor.
until it has that AST that solves the business need, the game is just prompt spaghetti until it hits altitude to be able to refactor.
Google could be writing the same amount of code with fewer developers (they have had multiple layoffs lately), or their developers could be focusing more of their time and attention on the code they do write.
The truth is likely somewhere in between.
That may explain why google search has, in the past couple of months, become so unusable for me that I switched (happily) to kagi.
> New tool bypasses Google Chrome’s new cookie encryption system
https://news.ycombinator.com/item?id=41988648
- You know which 20K lines need changing
- You have perfect QA
- Nothing ever goes wrong in deployment.
I think there's a tendency in our industry to only take the hypotenuse of curves at the steepest point
I'm not sure this stat is as important as people make it out to be. If I start typing `for` and the AI auto-completes `for(int i=0; i<args.length; i++) {`, then a lot more than 25% of the code is AI-written, but it's also not significant. I could've figured out how to write the for-loop, and it's also not a meaningful amount of time saved, because most of the time goes to figuring things out and testing, which the AI doesn't do.
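For what it's worth, the for-loop case can be put into numbers with a tiny sketch. The helper `completion_share` is a hypothetical name, and the counting rule (characters accepted divided by characters in the final line) is only an assumption about how such a stat might be computed, not a known description of anyone's telemetry.

```python
def completion_share(typed: str, accepted_line: str) -> float:
    """Fraction of the final line's characters that the human did not type."""
    if not accepted_line.startswith(typed):
        raise ValueError("completion must extend what was typed")
    return (len(accepted_line) - len(typed)) / len(accepted_line)


# The human types three characters; the tool fills in the rest of the line.
typed = "for"
line = "for(int i=0; i<args.length; i++) {"

share = completion_share(typed, line)
print(f"{share:.0%} of the characters on this line came from the completion")
```

By this rule the tool "wrote" the large majority of the line, yet the decision to loop over `args` at all was the human's. A character count measures typing saved, not engineering done.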
I consider myself risk-averse and even I am contemplating starting a small business in the event I get laid off.
It really isn't. Even if you get laid off from a large tech company, you probably didn't have to pay a cent to get the job there in the first place, and you started drawing a paycheck right away (after the initial delay due to the pay cycle). If you only work there for 6 months, you can save a really good amount of money if you have frugal habits.
Starting a company isn't nearly as easy, usually requires up-front investment, and there can be a long time before you generate any profit. Either you need some business idea that's going to generate profit (or at least enough revenue to give the founder(s) a paycheck), or a business loan or other funding, which means convincing someone to invest in your company somehow.
Starting your own company only sounds appealing if you ignore reality, or have the privilege of having plenty of cash saved up for such a venture.
If I was working a typical corp job, I would "quiet quit" and start using my excess savings to run small experiments. Maybe run like 3 in parallel a month and let them cook.
Example - start a niche blog based on my hobby, hire 3 writers and pay them up to $3k a month to write content for the blog. Let it cook for months. If it gets traffic, monetize with ads. If very profitable, quit job.
Starting an internet-based business is fairly cheap nowadays, especially with Cursor and AI.
All of that said, there are a lot of products that are produced by large companies and are just bad. Don’t be afraid to go after a Goliath if you see an opportunity.
New successful businesses are being created all the time. We just focus on the ones that have already been successful for a long time.
it's like companies paying all those todolist and tutorial apps left running on aws ec2 instances in 2007ish.
I'd be worried if i were a google investor. lol.