Once GPT is tuned more heavily on Lean (proof assistant) -- the way it is on Python -- I expect its usefulness for research level math to increase.
I work in a field related to operations research (OR), and ChatGPT 4o has ingested enough of the OR literature that it's able to spit out very useful Mixed Integer Programming (MIP) formulations for many "problem shapes". For instance, I can give it a logic problem like "i need to put i items in n buckets based on a score, but I want to fill each bucket sequentially" and it actually spits out a very usable math formulation. I usually just need to tweak it a bit. It also warns against weak formulations where the logic might fail, which is tremendously useful for avoiding pitfalls. Compare this to the old way, which is to rack my brain over a weekend to figure out a water-tight formulation of a MIP optimization problem (which is often not straightforward for non-intuitive problems). GPT has saved me so much time in this corner of my world.
Yes, you probably wouldn't be able to use ChatGPT well for this purpose unless you understood MIP optimization in the first place -- and you do need to break down the problem into smaller chunks so GPT can reason in steps -- but for someone who can and does, the $20/month I pay for ChatGPT more than pays for itself.
Aside: a lot of people who complain on HN that (paid/good - only Sonnet 3.5 and GPT4o are in this category) LLMs are useless to them probably (1) do not know how to use LLMs in a way that maximizes their strengths; (2) have expectations that are too high based on the hype, expecting one-shot magic bullets; or (3) work in a domain where LLMs really are not good. But many of the low-effort comments seem to mostly fall into (1) and (2) -- cynicism rather than cautious optimism.
Many of us who have discovered how to exploit LLMs in their areas of strength -- and know how to check for their mistakes -- often find them providing significant leverage in our work.
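For readers who have not seen one, here is a minimal sketch of what a "fill each bucket sequentially" formulation can look like. Everything in it is an assumption made for illustration, not the formulation ChatGPT produced: binary $x_{ib}$ assigns item $i$ to bucket $b$, binary $y_b$ marks bucket $b$ as opened, $C$ is a common bucket capacity, and $c_{ib}$ is some score-derived coefficient.

```latex
\begin{align}
\max \quad & \sum_{i=1}^{I} \sum_{b=1}^{n} c_{ib}\, x_{ib} \\
\text{s.t.} \quad
& \sum_{b=1}^{n} x_{ib} = 1 && \forall i \\
& \sum_{i=1}^{I} x_{ib} \le C\, y_b && \forall b \\
& \sum_{i=1}^{I} x_{ib} \ge C\, y_{b+1} && \forall b < n \\
& x_{ib},\, y_b \in \{0, 1\}
\end{align}
```

The third constraint carries the sequential logic: bucket $b+1$ may only be opened once bucket $b$ is completely full, so only the last opened bucket can be partially filled. A real problem would likely need per-bucket capacities and a more careful objective.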
WhatIsDukkha 115 days ago [-]
I entirely agree about their utility.
HN, and the internet in general, have become just an ocean of reactionary sandbagging and blather about how "useless" LLMs are.
Meanwhile, in the real world, I've found that I haven't written a line of code in weeks. Just paragraphs of text that specify what I want and then guidance through and around pitfalls in a simple iterative loop of useful working code.
It's entirely a learned skill; the models (and very importantly the tooling around them) have arrived at the baseline they needed.
Much Much more productive world by just knuckling down and learning how to do the work.
> Much Much more productive world by just knuckling down and learning how to do the work.
The fact everyone that say they've become more productive with LLMs won't say how exactly. I can talk about how Vim has made it more enjoyable to edit code (keybindings and motions), how Emacs is a good environment around text tooling (Lisp machine), how I use technical books to further my learning (so many great books out there). But no one really show how they're actually solving problems with LLMs and how the alternatives were worse for them. It's all claims that it's great with no further elaboration on the workflows.
> I haven't written a line of code in weeks. Just paragraphs of text that specify what I want and then guidance through and around pitfalls in a simple iterative loop of useful working code.
Code is intent described in terms of machinery actions. Those actions can be masked by abstracting them in more understandable units, so we don't have to write opcodes, but we can use Python instead. Programming is basically making the intent clear enough so that we know what units we can use. Software engineering is mostly selecting the units in a way to do minimal work once the intent changes or the foundational actions do.
Chatting with a LLM look to me like your intent is either vague or you don't know the units to use. If it's the former, then I guess you're assuming it is the expert and will guide you to the solution you seek, which means you believe it understands the problem more than you do. The latter is stranger, as it looks like playing around with car parts while ignoring the manuals they come with.
What about boilerplate and common scenarios? I agree that LLMs help a great deal with that, but the fact is that there are perfectly good tools that already helped with that, like snippets, templates, and code generators.
Nadya 115 days ago [-]
Ever seen someone try and search something on Google and they are just AWFUL at it? They can never find what they're looking for and then you try and can pull it up in a single search? That's what it is like watching some people try to use LLM's. Learning how to prompt an LLM is as much a learned skill as learning how to phrase internet searches is. And as much as people decried that "searching Google isn't a real skill", tech-savvy people knew better.
Same thing except now it's also many tech-savvy people joining in with the tech-unsavvy in saying that prompting isn't a real skill...but people who know better know that it is.
On average, people are awfully bad at describing exactly what it is they want. Ever speak with a client? And you have to go back and forward for a few hours to finally figure out what it is they wanted? In that scenario you're the LLM. Except the LLM won't keep asking probing questions and clarifications - it will simply give them what they originally asked for (which isn't what they want). Then they think the LLM is stupid and stop trying to ask it for things.
Utilizing an LLM to its full potential is a lot of iterative work and, at least for the time being, requires having some understanding of how it works underneath the hood (eg. would you get better results by starting a new session or asking it to forget previous, poorly worded instructions?).
skydhash 115 days ago [-]
I'm not arguing that you can't get results with LLMs; I'm just asking is it worth the actual effort, especially when there's a better way to get the result you're seeking (or whether the result is really something that you want).
An LLM is a word (token?) generator which can be amazingly consistent according to its model. But rarely is my end goal to generate text. It's either to do something, to understand something, or to communicate. For the first, there are guides (books, manuals, ...), for the second, there are explanations (again books, manuals,...), and the third is just using language to communicate what's on my mind.
That's the same thing with search engines. I use them to look for something. What I need first is a description of that something, not how to do the "looking for". Then once you know what you want to find, it's easier to use the tool to find it.
If your end goal can be achieved with LLMs, be my guest to use them. But I'm wary of people taking them at face value and then pushing the workload onto everyone else (like developers using Electron).
Nadya 115 days ago [-]
It's hard to quantify how much time learning how to search saves because the difference can range from infinite (finding the result vs not finding it at all) to basically no difference (1st result vs 2nd result). I think many people agree it is worth learning how to "properly search" though. You spend much less time searching and you get the results you're looking for much more often. This applies outside of just Google search: learning how to find and look up information is a useful skill in and of itself.
ChatGPT has helped me write some scripts for things that otherwise probably would have taken me at least 30+ minutes and it wrote them in <10 seconds and they worked flawlessly. I've also had times where I worked with it to develop something that ended up taking me 45 minutes to only ever get error-ridden code, where I had to fix the obvious errors and rewrite parts of it to get it working. Sometimes during this process it actually has taught me a new approach to doing something. If I had started from scratch coding it by myself it probably would have taken me only ~10 minutes. But if I was better at prompting, what if that 45 minutes was <10 minutes? It would go from a time loss to a time save and be worth using. So improving my ability to prompt is worthwhile as long as doing so trends towards me spending less time prompting.
Which is thankfully pretty easy to track and test. On average, as I get better at prompting, do I need to spend more or less time prompting to get the results I am looking for? The answer to that is largely that I spend less time and get better results. The models constantly changing and improving over time can make this messy - is it the model getting better or is it my prompting? But I don't think models change significantly enough to rule out that I spend less time prompting than I have in the past.
panarky 115 days ago [-]
> how much time learning how to search saves
>>> you do need to break down the problem into smaller chunks so GPT can reason in steps
To search well, you need good intuition for how to select the right search terms.
To LLM well, you can ask the LLM to break the problem into smaller chunks, and then have the LLM solve each chunk, and then have the LLM check its work for errors and inconsistencies.
And then you can have the LLM write you a program to orchestrate all of those steps.
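A minimal sketch of what such an orchestration program might look like, assuming a hypothetical `llm` callable that takes a prompt string and returns the model's reply (no particular vendor API is implied, and the prompt wording is made up):

```python
from typing import Callable, List

# Hypothetical interface: takes a prompt, returns the model's reply.
LLM = Callable[[str], str]


def solve_in_steps(problem: str, llm: LLM) -> str:
    """Decompose a problem, solve each chunk, then ask for a review pass."""
    # Step 1: ask the model to split the problem into small steps.
    plan = llm(f"Break this problem into small numbered steps:\n{problem}")
    steps: List[str] = [line.strip() for line in plan.splitlines() if line.strip()]

    # Step 2: solve each step, feeding earlier answers back in as context.
    answers: List[str] = []
    for step in steps:
        context = "\n".join(answers)
        answers.append(llm(f"Context so far:\n{context}\n\nSolve this step:\n{step}"))

    # Step 3: have the model check the combined work for errors.
    draft = "\n".join(answers)
    return llm(f"Check this solution for errors and inconsistencies, then return a corrected version:\n{draft}")
```

Whether the review pass actually catches mistakes depends on the model; the sketch only shows the control flow.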
gtirloni 114 days ago [-]
Yes you can. What was the name of the agent that was going to replace all developers? Devin or something? It was shown it took more time to iterate over a problem and created terrible solutions.
LLMs are in the evolutionary phase, IMHO. I doubt we're going to see revolutionary improvements from GPTs. So I say time and time again: the technology is here, show it doing all the marvelous things today. (btw, this is not directed at your comment in particular and I digressed a bit, sorry).
smallnamespace 115 days ago [-]
> asking is it worth the actual effort
If prompting ability varies then this is not some objective question, it depends on each person.
For me I've found more or less every interaction with an LLM to be useful. The only reason I'm not using it continually for 8 hours a day is because my brain is not able to usefully manage that torrent of new information and I need downtime.
rvnx 115 days ago [-]
It works quite nicely if you consider LLMs as a translator (and that’s actually why Transformers were created).
Enter technical specifications in English as input language, get code as destination language.
lolc 115 days ago [-]
English as input language works in simple scenarios but breaks down very very quickly. I have to get extremely specific and deliberate. At some point I have to write pseudocode to get the machine to get, say, double-checked locking right. Because I have enough experiences where varying the prompting didn't work, I revert to just writing the code when I see the generator struggling.
When I encounter somebody who says they do not write code anymore, I assume that they either:
1. Just don't do anything beyond the simplest tutorial-level stuff
2. or don't consider their post-generation edits as writing code
3. or are just bullshitting
I don't know which it is for each person in question, but I don't trust that their story would work for me. I don't believe they have some secret sauce prompting that works for scenarios where I've tried to make it work but couldn't. Sure, I may have missed some ways, and my map of what works and what doesn't may be very blurry at the border, but the surprises tend to be on the "doesn't work" side. And no, Claude doesn't change this.
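For readers unfamiliar with the double-checked locking pattern mentioned above, here is a minimal Python sketch. The `Config` class and `get_config` function are hypothetical names for illustration; in languages such as Java the shared field additionally needs to be declared `volatile` for the pattern to be safe.

```python
import threading
from typing import Optional


class Config:
    """Stand-in for an object that is expensive to build (hypothetical)."""


_instance: Optional[Config] = None
_lock = threading.Lock()


def get_config() -> Config:
    global _instance
    # First check, without the lock: the fast path once initialized.
    if _instance is None:
        with _lock:
            # Second check, under the lock: another thread may have
            # finished initializing while this one was waiting.
            if _instance is None:
                _instance = Config()
    return _instance
```

The subtle part is the second check: without it, two threads that both pass the first check would each build their own instance.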
ben_w 114 days ago [-]
I definitely still write code. But I also prefer to break down problems into chunks which are small enough that an LLM could probably do them natively, if only you can convince it to use the real API instead of inventing a new API each time — concrete example from ChatGPT-3.5, I tried getting it to make and then use a Vector2D class — in one place it had sub(), mul() etc., the other place it had subtract(), multiply() etc.
It can write unit tests, but makes similar mistakes, so I have to rewrite them… but it nevertheless still makes it easier to write those tests.
It writes good first-drafts for documentation, too. I have to change it, delete some stuff that's excess verbiage, but it's better than the default of "nobody has time for documentation".
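One way to reduce that kind of API drift is to write the interface once and paste it into every prompt. A minimal sketch of such a pinned interface follows; only the names `Vector2D`, `sub` and `mul` come from the example above, while `add` and `midpoint` are assumptions added for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vector2D:
    x: float
    y: float

    def add(self, other: "Vector2D") -> "Vector2D":
        return Vector2D(self.x + other.x, self.y + other.y)

    def sub(self, other: "Vector2D") -> "Vector2D":
        return Vector2D(self.x - other.x, self.y - other.y)

    def mul(self, scalar: float) -> "Vector2D":
        return Vector2D(self.x * scalar, self.y * scalar)


def midpoint(a: Vector2D, b: Vector2D) -> Vector2D:
    # Uses only the method names defined above: a + (b - a) * 0.5
    return a.add(b.sub(a).mul(0.5))
```

Code that calls `subtract()` or `multiply()` on this class fails immediately with an `AttributeError`, which makes an invented API easy to spot.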
andrepd 114 days ago [-]
Exactly! What is this job that you can get where you don't code and just copy-paste from ChatGPT? I want it!
My experience is just as you describe it: I ask a question whose answer is in stackoverflow or fucking geeks4geeks? Then it produces a good answer. Anything more is an exercise in frustration as it tries to sneak nonsense code past me with the same confident spiel with which it produces correct code.
ben_w 114 days ago [-]
It's absolutely a translator, but they're similarly good/bad/weird/hallucinaty at natural language translations, too.
"It's a translator. But they seem to be good/bad/weird/delusional in natural translations. I have a"
(Google translate stopped suddenly, there).
I've tried using ChatGPT to translate two Wikipedia pages from German to English, as it can keep citations and formatting correct when it does so; it was fine for the first 2/3rds, then it made up mostly-plausible statements that were not translated from the original for the rest. (Which I spotted and fixed before saving, because I was expecting some failure).
Don't get me wrong, I find them impressive, but I think the problem here is the Peter Principle: the models are often being promoted beyond their competence. People listen to that promotion and expect them to do far more than they actually can, and are therefore naturally disappointed by the reality.
People like me who remember being thrilled to receive a text adventure cassette tape for the Commodore 64 as a birthday or Christmas gift when we were kids…
…compared to that, even the Davinci model (that really was autocomplete) was borderline miraculous, and ChatGPT-3.5 was basically the TNG-era Star Trek computer.
But anyone who reads me saying that last part without considering my context, will likely imagine I mean more capabilities than I actually mean.
ben_w 115 days ago [-]
> On average, people are awfully bad at describing exactly what it is they want. Ever speak with a client? And you have to go back and forward for a few hours to finally figure out what it is they wanted?
With one of them, it was the entire duration of me working for them.
They didn't understand why it was taking so long despite constantly changing what they asked for.
codr7 115 days ago [-]
Building the software is usually like 10% of the actual job, we could do a better job of teaching that.
The other 90% is mostly mushy human stuff, fleshing out the problem, setting expectations etc. Helping a group of people reach a solution everyone is happy with has little to do with technology.
ben_w 115 days ago [-]
Mostly agree. Until ChatGPT, I'd have agreed with all of that.
> Helping a group of people reach a solution everyone is happy with has little to do with technology.
This one specific thing, is actually something that ChatGPT can help with.
It's not as good as the best human, or even a middling human with 5 years' business experience, but rather it's useful because it's good enough at so many different domains that it can be used to clarify thoughts and explain the boundaries of the possible -- Google Translate for business jargon, though like Google Translate it is also still often wrong -- the ultimate "jack of all trades, master of none".
codr7 114 days ago [-]
We're currently in the shiny toy stage, once the flaws are thoroughly explored and accepted by all as fundamental I suspect interest will fade rapidly.
There's no substance to be found, no added information; it's just repeating what came before, badly, which is exactly the kind of software that would be better off not written if you ask me.
The plan to rebuild society on top of this crap is right up there with basing our economy on manipulating people into buying shit they don't need and won't last so they have to keep buying more. Because money.
threecheese 114 days ago [-]
The worry I have is that the net value will become great enough that we’ll simply ignore the flaws, and probabilistic good-enough tools will become the new normal. Consider how many ads the average person wades through to scroll an Insta feed for hours - “we’ve” accepted a degraded experience in order to access some new technology that benefits us in some way.
To paraphrase comedian Mark Normand: “Capitalism!”
codr7 114 days ago [-]
Scary thought, difficult to unthink.
I'm afraid you might be right.
We've accepted a lot of crap lately just to get what we think we want, convenience is a killer.
ben_w 114 days ago [-]
Indeed, even if I were to minimise what LLMs can do, they are still achieving what "targeted advertising" very obviously isn't.
codr7 114 days ago [-]
They're both short sighted attempts at extracting profit while ignoring all negative consequences.
ben_w 114 days ago [-]
To the extent I agree, I think that's true for all tech since the plough, fire, axles.
But I would otherwise say that most (though not all*) AI researchers seem to be deeply concerned about the set of all potential negative consequences, including mutually incompatible outcomes where we don't know which one we're even heading towards yet.
* And not just Yann LeCun — though, given his position, it would still be pretty bad even if it was just him dismissing the possibility of anything going wrong
SmartHypercube 114 days ago [-]
> That's what it is like watching some people try to use LLM's.
Exactly. I made a game testing prompting skills a few days earlier, to share with some close friends, and it was your comment that inspired me to translate the game into English and submit it to HN. ( https://news.ycombinator.com/item?id=41545541 )
I am really curious about how other people write prompts, so while my submission only got 7 points, I'm happy that I can see hundreds of people's own ways to write prompts thanks to HN.
However, after reading most prompts (I may have missed some), I found exactly 0 prompts containing any kind of common prompting technique, such as "think step by step", explaining specific steps to solve the problem instead of only asking for final results, or few-shot examples (showing example inputs and outputs). Half of the prompts simply ask the AI to do the thing (at least asking correctly). The other half do not make sense: even if we showed the prompt to a real human, they wouldn't know what to reply with.
Well... I expected that SOME complaints about AI online are from people not familiar with prompting / not good at prompting. But now I realized there are a lot more people than I thought not knowing some basic prompting techniques.
Anyway, a fun experience for me! Since it was your comment that made me want to do this, I just want to share it with you.
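To make the techniques mentioned above concrete, here is a minimal sketch of a prompt builder that combines a "think step by step" instruction with few-shot examples. The function name and the `Input:`/`Output:` layout are assumptions for illustration, not a format any particular model requires.

```python
from typing import List, Tuple


def build_prompt(task: str, examples: List[Tuple[str, str]], query: str) -> str:
    """Assemble a few-shot prompt with a step-by-step instruction."""
    parts = [task, "Think step by step before giving the final answer.", ""]
    # Few-shot section: each example shows an input and the desired output.
    for example_input, example_output in examples:
        parts.append(f"Input: {example_input}")
        parts.append(f"Output: {example_output}")
        parts.append("")
    # The real query goes last, leaving the output for the model to fill in.
    parts.append(f"Input: {query}")
    parts.append("Output:")
    return "\n".join(parts)
```

For example, `build_prompt("Reverse the word.", [("abc", "cba")], "xyz")` yields a prompt that states the task, shows one worked example, and ends with the new input awaiting its output.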
atlex2 112 days ago [-]
Could you reference any youtube videos, blog posts, etc of people you would personally consider to be _really good_ at prompting? Curious what this looks like.
While I can compare good journalists to extremely great and intuitive journalists, I don't really have any references for this in the prompting realm (except for when the Dall-e Cookbook was circulating around).
Nadya 110 days ago [-]
Sorry for the late response - but I can't. I don't really follow content creators at a level where I can recall names or even what they are working on. If you browse AI-dominated spaces you'll eventually find people who include AI as part of their workflows and have gotten quite proficient at prompting them to get the results they desire very consistently. Most AI stuff enters into my realm of knowledge via AI Twitter, /r/singularity, /r/stablediffusion, and Github's trending tab. I don't go particularly out of my way to find it otherwise.
/r/stablediffusion used to (less so now) have a lot of workflow posts where people would share how they prompt and adjust the knobs/dials of certain settings and models to make what they make. It's not so different from knowing which knobs/dials to adjust in Apophysis to create interesting fractals and renders. They know what the knobs/dials adjust for their AI tools and so are quite proficient at creating amazing things using them.
People who write "jailbreak" prompts are a kind of example. There is some effort put into preventing people from prompting the models and removing the safeguards - and yet there are always people capable of prompting the model into removing its safeguards. It can be surprisingly difficult to do yourself for recent models and the jailbreak prompts themselves are becoming more complex each time.
For art in particular - knowing a wide range of artist names, names of various styles, how certain mediums will look, as well as mix & matching with various weights for the tokens can get you very interesting results. A site like https://generrated.com/ can be good for that as it gives you a quick baseline of how including certain names will change the style of what you generate. If you're trying to hit a certain aesthetic style it can really help. But even that is a tiny drop in a tiny bucket of what is possible. Sometimes it is less about writing an overly detailed prompt but rather knowing the exact keywords to get the style you're aiming for. Being knowledgeable about art history and famous artists throughout the years will help tremendously over someone with little knowledge. If you can't tell a Picasso from a Monet painting you're going to find generating paintings in a specific style much harder than an art buff.
pbrowne011 115 days ago [-]
> But no one really show how they're actually solving problems with LLMs and how the alternatives were worse for them. It's all claims that it's great with no further elaboration on the workflows.
To give an example, one person (a researcher at DeepMind) recently wrote about specific instances of his uses of LLMs, with anecdotes about alternatives to each example. [1] People on HN had different responses with similar claims with elaborations on how it has changed some of their workflows. [2]
While it would be interesting to see randomized controlled trials on LLM usage, hearing people's anecdotes brings to mind the (often misquoted) phrase: "The plural of anecdote is data". [3] [4]
Or people say "I've been pumping out thousands of lines of perfectly good code by writing paragraphs and paragraphs of text explaining what I want!" It's like, what are you programming, dog? And they will never tell you, and then you look at their GitHub and it's like a dead simple starter project.
I recently built a Brainfuck compiler and TUI debugger and I tested out a few LLMs just to see if I could get some useful output regarding a few niche and complicated issues, and it just gave me garbage that looked mildly correct. Then I'm told it's because I'm not prompting hard enough... I'd rather just learn how to do it at that point. Once I solve that problem, I can solve it again in the future in .25x the time.
rtsil 115 days ago [-]
Here's the thing. 99% of people aren't writing compilers or debuggers, they're writing glorified CRUDs. LLMs can save a lot of time for these people, just like 99% of people only use basic arithmetic operations, and MS Excel saves a lot of time for these people. It's not about solving new problems, it's about solving old and known problems very fast.
csmpltn 114 days ago [-]
> "99% of people aren't writing compilers or debuggers"
Look, I get the hype - but I think you need to step outside a bit before saying that 99% of the software out there is glorified CRUDs...
Think about the aerospace/defense industries, autonomous vehicles, cloud computing, robotics, sophisticated mobile applications, productivity suites, UX, gaming and entertainment, banking and payment solutions, etc. Those are not small industries - and the software being built there is often highly domain-specific, has various scaling challenges, and takes years to build and qualify for "production".
Even a simple "glorified CRUD", at a certain point, will require optimizations, monitoring, logging, debugging, refactoring, security upgrades, maintenance, etc...
There's much more to tech than your weekend project "Facebook but for dogs" success story, which you built with ChatGPT in 5 minutes...
I was the driver. I told it to parse and operate on the AST, to use a plugin pattern to reduce coupling, etc. The machine did the tippy-taps for me and at a much faster rate than I could ever dream of typing!
It’s all in a Claude Project and can easily and reliably create new modules for bash commands because it has the full scope of the system in context and a ginormous amount of bash commands and TypeScript in the training corpus.
KHRZ 115 days ago [-]
One good use case is unit tests, since they can be trivial while at the same time being cumbersome to make. I could give the LLM code for React components, and it would make the tests and setup all the mocks which is the most annoying part. Although making "all the tests" will typically involve asking the LLM again to think of more edge cases and be sure to cover everything.
hcks 114 days ago [-]
> I recently built a Brainfuck compiler and TUI debugger
Highly representative of what devs make all day indeed
sweeter 113 days ago [-]
Yea, obviously not, but the smaller problems this bigger project was composed of were things that you could see anywhere. I made heavy use of string manipulation that could be generally applied to basically anything
remoroid 115 days ago [-]
Really? Come on. You think trying to make it solve "niche and complicated issues" for a Brainfuck compiler is reasonable? I can't take this seriously. Do you know what most developer jobs entail?
I never need to type paragraphs to get the output I want. I don't even bother with correct grammar or spelling. If I need code for x crud web app who is going to type it faster, me or the LLM? This is really not hard to understand.
itsoktocry 114 days ago [-]
For many of us programming is a means to an end. I couldn't care less about compilers.
px1999 115 days ago [-]
Specifically within the last week, I have used Claude and Claude via cursor to:
- write some moderately complex powershell to perform a one-off process
- add typescript annotations to a random file in my org's codebase
- land a minor feature quickly in another codebase
- suggest libraries and write sample(ish) code to see what their rough use would look like to help choose between them for a future feature design
- provide text to fill out an extensive sales RFT spreadsheet based on notes and some RAG
- generate some very domain-specific realistic sounding test data (just naming)
- scaffold out some PowerPoint slides for a training session
There are likely others (LLMs have helped with research and in my personal life too)
All of these are things that I could do (and probably do better) but I have a young baby at the moment and the situation means that my focus windows are small and I'm time poor. With this workflow I'm achieving more than I was when I had fully uninterrupted time.
ben_w 115 days ago [-]
> But no one really show how they're actually solving problems with LLMs and how the alternatives were worse for them.
I'm an iOS dev, my knowledge of JS and CSS is circa 2004. I've used ChatGPT to convert some of my circa 2009 Java games into browser games.
> Chatting with a LLM look to me like your intent is either vague or you don't know the units to use
Or that you're moving up the management track.
Managers don't write code either. Some prefer it that way.
Workaccount2 115 days ago [-]
I have used ChatGPT to write test systems for our (physical) products. I have a pretty decent understanding of how code/programs work structurally, I just don't know the syntax/language (Python in this case).
So I can translate things like
"Create an array, then query this instrument for xyz measurements, then store those measurements in the array. Then store that array in the .csv file we created before"
It works fantastic and saved us from outsourcing.
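A minimal sketch of the kind of script that prompt describes. The `read_value` callable is a hypothetical stand-in for the instrument query (a real setup would go through whatever driver the instrument uses, for example a VISA library), and the one-row-per-run CSV layout is an assumption.

```python
import csv
from typing import Callable, List


def collect_measurements(read_value: Callable[[], float], count: int) -> List[float]:
    """Query the instrument `count` times and store the readings in a list."""
    return [read_value() for _ in range(count)]


def append_to_csv(path: str, label: str, values: List[float]) -> None:
    """Append one row (label followed by the readings) to an existing CSV file."""
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([label, *values])
```

Passing the instrument query in as a function keeps the script testable without hardware attached.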
joseluis 115 days ago [-]
The key difference is that this is a multidisciplinary conversational interface, and a tool in itself for interrelating structured meaning and reshaping it coherently enough so that it can be of great value both in the specific domain of the dialog, and in the potential to take it on any tangent in any way that can be expressed.
Of course it has limitations and you can't be asleep at the wheel, but that's true of any tool or task.
DiogenesKynikos 115 days ago [-]
For one, I spend less time on Stackoverflow. LLMs can usually give you the answer to little questions about programming or command-line utilities right away.
lifeformed 115 days ago [-]
I think people who are successfully using it to write code are just chaining APIs together to make the same web apps you see everywhere.
imiric 115 days ago [-]
The vast majority of software is "just chaining APIs together". It makes sense that LLMs would excel at code they've been trained on the most, which means they can be useful to a lot of people. This also means that these people will be the first to be made redundant by LLMs, once the quality improves enough.
elbear 114 days ago [-]
I would say all software is chaining APIs together.
imiric 114 days ago [-]
Well, that depends on how you look at it.
All software calls APIs, but some rely on literally "just chaining" these calls together more than writing custom behavior from scratch. After all, someone needs to write the APIs to begin with. That's not to say that these projects aren't useful or valuable, but there's a clear difference in the skill required for either.
You could argue that it's all APIs down to the hardware level, but that's not a helpful perspective in this discussion.
elbear 111 days ago [-]
> You could argue that it's all APIs down to the hardware level, but that's not a helpful perspective in this discussion.
Yes, that's what I'm arguing. Why isn't it useful? I think it's useful, because it demystifies things. You know that in order to do something, you need to know how to use the particular API.
> The fact everyone that say they've become more productive with LLMs won't say how exactly. But no one really show how they're actually solving problems with LLMs and how the alternatives were worse for them.
> The fact everyone that say they've become more productive with LLMs won't say how exactly.
I have Python scripts which do a lot of automation like downloading PDFs, bookmarking PDFs, processing them, etc. Thanks to LLMs I don't write the Python code myself; I just ask an LLM to write it and provide the requirements. I copy the code generated by the AI model and run it. If there are any errors, I just ask the AI to fix them.
NetOpWibby 114 days ago [-]
> The fact everyone that say they've become more productive with LLMs won't say how exactly.
Anecdotally, I no longer use StackOverflow. I don’t have to deal with random downvotes and feeling stupid because some expert with a 10k+ score on 15 SE sites each votes my question to be closed. I’m pretty tech savvy, been doing development for 15 years, but I’m always learning new things.
I can describe a rough idea of what I want to an LLM and get just enough code for me to hit the ground running…or, I can ask a question in a forum and twiddle my thumbs and look through 50 tabs to hopefully stumble upon a solution in the meantime.
I’m productive af now. I was paying for ChatGPT but Claude has been my goto for the past few months.
Kiro 115 days ago [-]
You clearly have made up your mind that it can't be right but to me it's like arguing against breathing. There are no uncertainties or misunderstandings here. The productivity gains are real and the code produced is more robust. Not in theory, but in practice. This is a fact for me and you trying to convince me otherwise is just silly when I have the result right in front of me. It's also not just boilerplate. It's all code.
mhuffman 115 days ago [-]
>There are no uncertainties or misunderstandings here. The productivity gains are real and the code produced is more robust. Not in theory, but in practice.
So, that may be a fact for you but there are mixed results when you go out wide. For example [1] has this little nugget:
>The study identifies a disconnect between the high expectations of managers and the actual experiences of employees using AI.
>Despite 96% of C-suite executives expecting AI to boost productivity, the study reveals that, 77% of employees using AI say it has added to their workload and created challenges in achieving the expected productivity gains. Not only is AI increasing the workloads of full-time employees, it’s hampering productivity and contributing to employee burnout.
So not everyone is feeling the jump in productivity the same way. On this very site, there are people claiming they are blasting out highly-complex applications faster than they ever could, some of them also claiming they don't even have any experience programming. Then others claiming that LLMs and AI copilots just slow them down and cause much more trouble than they are worth.
It seems that, just as with programming itself, different people are getting different results.
Just be mindful that it is in one's interest to push the "LLMs suck, don't waste your time with them" narrative once they figure out how to harness LLMs.
"Jason is a strong coder, and he despises AI tools!"
cjbgkagh 115 days ago [-]
In my view these models produce above average code which is good enough for most jobs. But the hacker news sampling could be biased towards the top tier of coders - so their personal account of it not being good enough can also be true. For me the quality isn't anywhere close to good enough for my purposes, all of my easy code is already done so I'm only left working on gnarly niche stuff which the LLMs are not yet helpful with.
For the effect on the industry, I generally make the point that even if AI only replaces the below average coder, it will put downward pressure on above average coders' compensation expectations.
Personally, humans appear to be getting dumber at the same time that AI is getting smarter and while, for now, the crossover point is at a low threshold that threshold will of course increase over time. I used to try to teach ontologies, stats, SMT solvers to humans before giving up and switching to AI technologies where success is not predicated on human understanding. I used to think that the inability for most humans to understand these topics was a matter of motivation, but have rather recently come to understand that these limitations are generally innate.
rvnx 115 days ago [-]
It is also a problem of ego.
It is difficult if you have been told all your life that you are the best, to accept the fact that a computer or even other people might be better than you.
It requires a lot of self-reflection.
Real top-tier programmers actually don’t feel threatened by LLMs. For them it is just one more tool in the toolbox like syntax highlighting or code completion.
They choose to use these tools based on productivity gains or losses, depending on the situation.
hatefulmoron 115 days ago [-]
Not to diminish your point at all: I think it's also just a fear that the fun or interesting part of the task is being diminished. To say that the point of programming is to solve real world problems ('productivity') is true, but in my experience it's not necessarily true for the person doing the solving. Many people who work as programmers like to program (as in, the process of working with code, typing it, debugging it, building up solutions from scratch), and their job is an avenue to exercise that part of their brain.
Telling that sort of person that they're going to be more productive by skipping all the "time consuming programming stuff" is bound to hurt.
elbear 114 days ago [-]
The solution to this is to code your own things for fun.
p-e-w 115 days ago [-]
> Real top-tiers programmers actually don’t feel threatened by LLMs.
They should, because LLMs are coming for them also, just maybe 2-3 years later than for programmers that aren't "real top-tier".
The idea that human intellect is something especially difficult to replicate is just delusional. There is no reason to assume so, considering that we have gone from punched-card programming to LLMs competing with humans in a single human lifetime.
I still remember when elite chess players were boasting "sure, chess computers may beat amateurs, but they will never beat a human grandmaster". That was just a few short years before the Deep Blue match.
The difference is that nobody will pay programmers to keep programming once LLMs outperform them. Programmers will simply become as obsolete as horse-drawn carriages, essentially overnight.
kaoD 114 days ago [-]
> They should, because LLMs are coming for them also, just maybe 2-3 years later than for programmers that aren't "real top-tier".
Would you be willing to set a deadline (not fuzzy dates) when my job is going to be taken by an LLM and bet $5k on that?
Because the more I use LLMs and I see their improvement rate, the less worried I am about my job.
The only thing that worries me is salaries going down because management cannot tell how badly they're burying themselves in technical debt and maintenance hell, so they'll underpay a bunch of LLM-powered interns... whose output I will have to clean up, and honestly I don't want to (I've already been cleaning enough shit non-LLM code, LLMs will just generate more and more of that).
smallnamespace 114 days ago [-]
> Would you be willing to set a deadline (not fuzzy dates) when my job is going to be taken by an LLM and bet $5k on that?
This is just a political question and of course so long as humans are involved in politics they can just decide to ban or delay new technologies, or limit their deployment.
Also in practice it's not like people stopped traditional pre-industrial production after industrialization occurred. It's just that pre-industrial societies fell further and further behind and ended up very poor compared to societies that chose to adopt the newest means of production.
I mean, even today, you can make a living growing and eating your own crops in large swathes of the world. However you'll be objectively poor, making only the equivalent of a few dollars a day.
In short I'm willing to bet money that you'll always be able to have your current job, somewhere in the world. Whether your job maintains its relative income and whether you'd still find it attractive is a whole different question.
airspresso 114 days ago [-]
> The difference is that nobody will pay programmers to keep programming once LLMs outperform them. Programmers will simply become as obsolete as horse-drawn carriages, essentially overnight.
I don't buy this. A big part of the programmer's job is to convert vague and poorly described business requirements into something that is actually possible to implement in code and that roughly solves the business need. LLMs don't solve that part at all since it requires back and forth with business stakeholders to clarify what they want and educate them on how software can help. Sure, when the requirements are finally clear enough, LLMs can make a solution. But then the tasks of testing, building, deploying and maintaining it remain too, which also typically fall to the programmer. LLMs are useful tools in each stage of the process and speed up tasks, but they do not replace the human who designs and architects the solution (the programmer).
concordDance 114 days ago [-]
> > Real top-tiers programmers actually don’t feel threatened by LLMs.
> They should, because LLMs are coming for them also, just maybe 2-3 years later than for programmers that aren't "real top-tier".
Not worrying about that because if they've gotten to that point (note: top tier programmers also need domain knowledge) then we're all dead a few years later.
bcoates 115 days ago [-]
Re: Compensation expectations, I figured out a long time ago that bad programmers create bad code, and bad code creates work for good programmers.
If the amount of bad code is no longer limited by the availability of workers who can be trained up to "just below average" and instead anyone who knows how to work a touchscreen can make AI slop, this opens up a big economic opportunity.
cjbgkagh 115 days ago [-]
One could hope, but in my view perception precedes reality and even if that is the reality the perception is that AI will lower compensation demands and those doing the layoffs/hiring will act accordingly.
You could also make the same claims about outsourcing, and while it appears that in most cases the outsourcing doesn't pay off, the perception that it would has really damaged CS as a career.
tzs 115 days ago [-]
And like with outsourcing it starts with the jobs at the lower end of the skill range in an industry, and so people at the higher end don't worry about it, and later it expands and they learn that they too are not safe.
What happened a couple of decades ago in poetry [1] could happen now with programming:
> No longer is it just advertising jingles and limericks made in Haiti and Indonesia. It's quatrains, sonnets, and free-form verse being "outsourced" to India, the Philippines, Russia, and China.
...
> "Limericks are a small slice of the economy, and when people saw globalization creating instability there, a lot said, 'It's not my problem,'" says Karl Givens, an economist at Washington's Economic Policy Institute. "Now even those who work in iambic pentameter are feeling it."
Anything that makes fewer people get into programming is good for the field of CS. Only those who truly care go into it
delusional 115 days ago [-]
What sort of problems do you solve? I tried to use it. I really did. I've been working on a tree edit distance implementation based on a paper from '95. Not novel stuff. I just can't get it to output anything coherent. The code rarely runs, it's written in absolutely terrible style, and it doesn't follow any good practices for performant code. I've struggled with getting it to even implement the algorithm correctly, even though it's in the literature I'm sure it was trained on.
Even test cases have brought me no luck. The code was poorly written, being too complicated and dynamic for test code in the best case and just wrong on average. It constantly generated test cases that would be fine for other definitions of "tree edit distance" but were nonsense for my version of a "tree edit distance".
What are you doing where any of this actually works? I'm not some jaded angry internet person, but I'm honestly so flabbergasted about why I just can't get anything good out of this machine.
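For anyone unfamiliar with the problem, here is a minimal sketch of one common definition: ordered trees, unit cost for insert, delete and relabel, where deleting a node promotes its children into its place. This is the textbook forest recursion with memoization, not the algorithm from the paper I mentioned, and it is not tuned for performance. The `tree` helper and the nested-tuple representation are just assumptions for the example.

```python
from functools import lru_cache

# A tree is (label, children), where children is a tuple of trees.
# A forest is a tuple of trees. Tuples keep everything hashable for the cache.


def tree(label, *children):
    return (label, tuple(children))


@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between two ordered forests."""
    if not f and not g:
        return 0
    if not g:
        # Delete the rightmost root of f; its children take its place.
        _, children = f[-1]
        return forest_distance(f[:-1] + children, g) + 1
    if not f:
        # Insert the rightmost root of g (mirror image of the case above).
        _, children = g[-1]
        return forest_distance(f, g[:-1] + children) + 1

    (v_label, v_children), (w_label, w_children) = f[-1], g[-1]
    delete = forest_distance(f[:-1] + v_children, g) + 1
    insert = forest_distance(f, g[:-1] + w_children) + 1
    # Match the two rightmost roots: their child forests are compared with
    # each other, and so are the remaining forests to their left.
    match = (
        forest_distance(v_children, w_children)
        + forest_distance(f[:-1], g[:-1])
        + (0 if v_label == w_label else 1)
    )
    return min(delete, insert, match)


def tree_edit_distance(a, b):
    return forest_distance((a,), (b,))
```

Even this small version shows why generated test cases go wrong so easily: change what "delete" does to the children, or drop the ordering of siblings, and the expected distances change with it.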
macrolime 115 days ago [-]
This kind of problem is really not where LLMs shine.
Where you save loads of time is when you need to write lots of code using unfamiliar APIs. Especially when it's APIs you won't work with a lot, and spending loads of time learning them would just be a waste of time. In these cases LLMs can tell you the correct API calls, and it's easy to verify. The LLM isn't really solving some difficult technical problem, but it saves lots of work.
throwaway765123 115 days ago [-]
This exactly. LLMs can't reason, so we shouldn't expect them to try. They can do translation extremely well, so things like converting descriptions to 90-95% correct code in 10-100x less time, or converting from one language to another, are the killer use cases IMO.
But expecting them to solve difficult unsolved problems is a fundamental misunderstanding of what they are under the hood.
delusional 115 days ago [-]
I picked this problem specifically because it's about "converting from one language to another". The problem is already solved in the literature. I understand that doing cutting edge research is a different problem, and that is explicitly not what I'm doing here, nor what I am expecting of the tool. I have coauthored an actual published computer science paper, and this exercise is VERY far from the complexity of that.
Could you share some concrete experience of a problem where aider, or a tool like it, helped you? What was your workflow, and how was the experience?
kaoD 114 days ago [-]
I'm a senior engineer (as in, really senior, not only years of experience). I can get familiar with unfamiliar APIs in a few hours and then I can be sure I'm doing the right thing, instead of silently failing to meet edge cases and introducing bugs because I couldn't identify what was wrong in the LLM output (because, well, I'm unfamiliar with the API in the first place).
In other words: LLMs don't solve any noteworthy problems, at least yet.
delusional 113 days ago [-]
I feel sort of the same way but I'm desperate to understand what I'm missing. So many people sing such high praises. Billions are being invested. People are proclaiming the end of software developers. What I'm looking at can't be the product they are talking about.
I'm perfectly happy reading man pages personally. Half the fun of programming to me is mastering the API to get something out of it nobody expected was in there. To study the documentation (or implementation) to identify every little side effect. The details are most of the fun to me.
I don't really intend to use the AI for myself, but I do really wish to see what they see.
Tainnor 114 days ago [-]
Maybe for happy path cases. I've tried to ask ChatGPT how you can do a certain non-obvious thing with Kafka, and it just started inventing things. Turns out, that thing isn't actually possible to do with Kafka (by design).
thesz 115 days ago [-]
I think that contemporary models are trained for engagement, not for actual help.
My experience is the same as yours, but I noticed that while LLMs circa two years ago tried to come up with the answer, the current generation of LLMs tries to make me come up with the answer. And that is not helping at all.
bongodongobob 115 days ago [-]
Did you tell it that? Are you trying to converse and discuss or are you trying to one shot stuff? If it gets something wrong, tell it. Don't just stop and try another prompt. You have to think of it as another person. You can talk to it, question it, guide it.
Try starting from ground zero and guiding it to the solution rather than trying to one shot your entire solution in one go.
I want you to implement this kind of tree in language x.
Ok good, now I want you to modify it to do Y.
Etc.
delusional 115 days ago [-]
I've tried both. One time I actually tried so hard that I ran out of context, and aider just dumped me back to the main prompt. I don't think it's possible to guide it any more than that.
My problem is that the solution is right there in the paper. I just have to understand it. Without first understanding that paper, I can't possibly guide the AI towards a reasonable implementation. The process of finding the implementation is exactly the understanding of the paper, and the AI just doesn't help me with that. In fact, all too often I would ask it to make some minor change, and it would start making random changes all over the file, completely destroying my mental model of how the program worked. Making it change that back completely pulls me out of the problem.
When it's a junior at my job, at least I can feel like I'm developing a person. They retain the conversation and culture I impart as part of the problem solving process. When I struggle against the computer, it's just a waste of my time. It's not learning anything.
I'm still really curious what you're doing with it.
minkles 115 days ago [-]
That’s fine until your code makes its way to production, an unconsidered side effect occurs and then you have to face me.
You are still responsible for what you do regardless of the means you used to do it. And a lot of people use this not because it’s more productive but because it requires less effort and less thought because those are the hard bits.
I’m collecting stats at the moment, but the general trend is that quality, measured by functional defects produced, declines when an LLM is involved in the process.
So far it’s not a magic bullet but a push for mediocrity in an industry with a rather bad reputation. Never a good story.
blargey 115 days ago [-]
Wasn't there a recent post about many research papers getting published with conclusions derived from buggy/incorrect code?
I'd put more hope in improving LLMs/derivatives than improving the level of effort and thought in code across the entire population of "people who code", especially the subset who would rather be doing something else with their time and effort / see it as a distraction from the "real" work that leverages their actual area of expertise.
a_wild_dandan 115 days ago [-]
> You are still responsible for what you do regardless of the means you used to do it. And a lot of people use this not because it’s more productive but because it requires less effort and less thought because those are the hard bits.
Yeah, that's...the whole point of tools. They reduce effort. And they don't shift your responsibility. For many of us, LLMs are overwhelmingly worth the tradeoffs. If your experience differs, then it's unfortunate, and I hate that for you. Don't use 'em!
bongodongobob 115 days ago [-]
Ugh, dude, I used to push bad code into production without ChatGPT. It is such a stupid argument. Do you really think people are just blindly pushing code they can't make heads or tails of? That they haven't tested? Do you seriously think people are just one shotting code and blasting it into prod? I am completely baffled by people in this industry that just don't get it. Learn to prompt. Write tests. Wtf.
hughesjj 115 days ago [-]
My problem is that, for a surprising number of applications, it's taken me longer to have the conversation with chatgpt to get the code I want than just doing it myself.
Copilot and the like are legit for boilerplate, some test code, and POSIX/PowerShell scripting. For anything that's very common, it's great.
Anything novel though and it suffers. Did AWS just release some new functionality and only like 4 people have touched it so far on GitHub? Are the source docs incomplete or spread out amongst multiple pages with some implicit/between-the-lines spec? Eh, good luck, you're probably better off just reading the docs yourself or guessing and checking.
Same goes for versioning: sometimes it'll fall back to an older version of the system (e.g. Kafka with KRaft vs. ZooKeeper).
Personally, the best general use case of LLMs for me is focus. I know how to break down a task, but sometimes I have an issue staying focused on doing it and having a reasonably competent partner to rubber duck with is super useful. It helps that the chat log then becomes an easy artifact to more or less copy paste, and chatgpt doesn't do a terrible job reformatting either. Like for 90% of the stuff it's easier than using vim commands.
lanstin 115 days ago [-]
It seems great for like straightforward linear code, elisp functions, python data massage scripts, that sort of thing. I had it take a shot at some new module for a high volume Go server with concurrency/locking concerns and nil pointer receivers. I got more panics from the little bit of code GPT wrote than all my own code, not because it was bad code but because when I use dangerous constructs like locking and pointers that can be nil, I have certain rigid rules for how to use them and the generated code did not follow those rules.
sensanaty 114 days ago [-]
> Do you really think people are just blindly pushing code they can't make heads or tails of? That they haven't tested?
Yes, most definitely. I've recently been introduced to our CTO's little pet project that he's been building with copious help from ChatGPT, and it's genuinely some of the most horrid code I've ever seen in my professional career. He genuinely doesn't know what half of it even does when I quizzed him about some of the more egregious crap that was in there. The real fun part is that now that it's a "proven" PoC some poor soul is going to have to maintain that shit.
We also have a mandate from the same CTO to use more AI in our workflows, so I have 0 doubts in my mind that people are blindly pushing code without thinking about it, and people like myself are left dealing with this garbage. My time & energy is being wasted sifting through AI-generated garbage that doesn't pass the smell test if you spend a singular minute of effort reading through the trash it generates.
minkles 115 days ago [-]
Yes that's exactly what they are doing.
I literally had someone with the balls to tell me that it was ChatGPT's fault.
Due diligence and intelligence has shit the fucking bed quite frankly.
hobs 115 days ago [-]
Do you think ChatGPT has changed any of those answers from Yes to No? Because it hasn't.
People blindly copied Stack Overflow code, they blindly copied every example off of MSDN, they blindly copy from ChatGPT. Your holier-than-thou statements are funny, and frankly most LLMs cannot leave a local maximum, so I think anyone who says they don't write any code anymore is not capable of spotting the mistakes, both architectural and specific, that they are making.
More and different prompting will not dig you out of the hole.
kaoD 114 days ago [-]
This. Most people I know that use LLMs to be super productive are like "make me a button, it's red" (hyperbolic statement but you know what I mean). I can do that faster and better myself.
When I'm deeply stuck on something and I think "let's see if an LLM could help here", I try (and actually tried many times) to recruit those prompting gurus around me that swear LLMs solve all their problems... and they consistently fail to help me at all. They cannot solve the problem at all and I'm just sitting there, watching the gurus spend hours prompting in circles until they give up and leave (still thinking LLMs are amazing, of course).
This experience is what makes me extremely suspicious of anyone on the internet claiming they don't write code anymore but refusing to show (don't tell!) -- when actually testing it in real life it has been nothing but disappointment.
scubbo 115 days ago [-]
> Do you really think people are just blindly pushing code they can't make heads or tails of? That they haven't tested? Do you seriously think people are just one shotting code and blasting it into prod?
Yes, and I see proof of it _literally every day_ in Code Reviews where I ask juniors to describe or justify their choices and they shrug and say "That's what Copilot told me to put".
mewpmewp2 115 days ago [-]
That sounds more like poor hiring decisions.
scubbo 113 days ago [-]
That sounds more like moving the goalposts. The claim (via sarcastic comment) was that people do not simply push code that they do not understand - and I provided a counter-example. No-one in that conversation disagrees that that's a bad practice - but until and unless I have full mandate to hire and fire whoever I want to work with, or to change jobs at will, I'm going to have to work with people whose development practices I disagree with.
benterix 115 days ago [-]
> I've found that I haven't written a line of code in weeks
Which is great until your next job interview. Really, it's tempting in the short run but I made a conscious decision to do certain tasks manually only so that I don't lose my basic skills.
vasco 115 days ago [-]
ChatGPT voice interface plugged into the audio stream, with the prompt:
- I need you to assist me during a programming interview, you will be listening to two people, the interviewer and me. When the interviewer asks a question, I'd like you to feed me lines that seem realistic for an interview where I'm nervous, don't give me a full blown answer right away. Be very succinct. If I think you misunderstood something, I will mention the key phrase "I'm nervous today and had too much coffee". In this situation, remember I'm the one that will say the phrase, and it might be because you've mistaken me for the interviewer and I want you to "reset". If I want you to dig deeper than what you've provided me with, I'll say the key phrase "Let's dig deeper now". If I think you've hallucinated and want you to try again, I'll say "This might be wrong, let me think for just a minute please". Remember, other than these key phrases, I'll only be talking to the interviewer, not you.
On a second screen of some sort. Other than that, interviewers will just have to accept that nobody will be doing the job without this sort of assistant from now on anyway. As an interviewer I let candidates consult online docs for specific things already because they'll have access to Google during the job, this is just an extension of that.
gcanyon 115 days ago [-]
I recently interviewed a number of people about their SQL skills. The format I used was to share two queries with them a couple days ahead of time in a google doc, and tell them I will ask them questions about those queries during the interview.
Out of maybe twenty people I interviewed this way, only three of them pointed out that one of the queries had a failing error in it. It was something any LLM would immediately point out.
Beyond that: the first question I asked was: "What does this query do, what does it return?" I got responses ranging from people who literally read the query back to me word by word, giving the most shallow and direct explanation of what each bit did step-by-step, to people who clearly summarized what the query did in high-level, abstract terms, as you might describe what you want to accomplish before you write the query.
I don't think anyone did something with ChatGPT live, but maybe?
apsurd 115 days ago [-]
This made me laugh. I can't deny it's already happening. But wow people work so hard to avoid working hard.
throwaway765123 115 days ago [-]
It's not about avoiding hard work - the audience on HN skews wealthy due to heavy representation of skilled devs in their 30s+, but the average person does not earn anything close to FAANG salaries. Even most devs in general don't earn like that. The interview process being fairly well understood in general, any advantage that can possibly get a person from $60k/year to generationally-life-changing $300k/year will be used eventually.
vasco 115 days ago [-]
And I wrote this as a knee-jerk reaction after reading the parent, I imagine people will be putting way more effort if it can get them a great job. And to be honest, if they can fool you, they can most likely do the job as well. Most of the industry tests at a higher skill level than what they actually require on the day to day anyway.
bessbd 115 days ago [-]
It's almost inspiring, isn't it?
elbear 114 days ago [-]
I think the point is to avoid pointless hard work.
jamesmotherway 115 days ago [-]
Not everyone is doing coding interviews. Some might struggle with a particular language due to lack of muscle memory, but can dictate the logic in pseudocode and can avoid pitfalls inferred from past experience. This sort of workflow is compatible with LLMs, assuming a sufficient background (otherwise one can't recognize when the output diverges from your intent).
I personally treat the LLM as a rubber duck. Often I reject its output. In other cases, I can accept it and refactor it into something even better. The name of the game is augmentation.
dmd 115 days ago [-]
I sometimes get the idea from statements like this - and HN's focus on interviewing in general - that people are switching jobs a dozen times a year or something. How often are most people switching jobs? I've had 5 jobs in the last 20 years.
macintux 115 days ago [-]
I'm old, and well-paid for my geographic region (but for various mostly stupid reasons utterly broke). No amount of value created (at least, for my skill level) will protect me from ageism and/or budget cuts.
ed 115 days ago [-]
This. I’ve been using elixir for ~6 months (guided by Claude) and probably couldn’t solve fizz buzz at a whiteboard without making a syntax error. Eek.
stavros 115 days ago [-]
Who cares? If I'm hiring you to make a product, I care that the higher order logic is correct, that the requirements are all catered for, and that the code does reasonable things in all cases. Things I don't care about are FizzBuzz, programming on whiteboards, and not making syntax errors.
kaoD 114 days ago [-]
This is how companies fail. 5 years down the line no one is able to change anything in the system because it's so poorly architected (by being a bunch of Claude copypastes cobbled together) that it takes one month to do a one-day task (if it's even possible).
stavros 114 days ago [-]
I guess we should change our hiring practices to optimize for FizzBuzz and getting all the syntax right first try.
kaoD 114 days ago [-]
I can see how you got that impression from my comment (if you ignore how I mentioned architecture), so let me elaborate:
It's the opposite. FizzBuzz and getting the syntax right is what LLMs are good at... but there's so much more nuance at being experienced with a language/framework/library/domain which senior engineers understand and LLMs don't.
Being able to write Elixir assisted by an LLM does not mean you can produce proper architecture and abstractions even if the high level ideas are right. It's the tacit knowledge and second-order thinking that you should hire for.
But the thing is, if someone cannot write Elixir without syntax errors unless using an LLM, well, that's an extremely good proxy that they don't know the ins and outs of the language, ecosystem, best practices... Years of tacit knowledge that LLMs fail to use because they're trained on a huge amount of tutorial and entry-level code riddled with the wrong abstractions.
The only code worse than code that doesn't work is code that kinda works unless your requirements change ever so slightly. That's a liability and you will pay for it with interest.
To give a concrete example: I am very experienced with React. Very. A lot. The code that LLMs write for it is horrid, bug-ridden, inflexible and often misuses its footgun-y APIs like `useEffect` like a junior fresh out of a boot camp would, directly contradicting the known best practices for maintainable (and often even just "correct") code. But yeah it superficially solves the problem. Kinda. But good luck when the system needs to evolve. If it cannot do proper code that's <500 lines how do you expect it to deal with massive systems that need to scale to 10s of KLOC across an ever-growing twine?
But management will be happy because the feature shipped and time to market was low... until you can no longer ship anything new and you go out of business.
stavros 114 days ago [-]
Ah, sorry, I read your comment as disagreeing with me, now I see it's the opposite. Exactly, LLMs (for now) are good at writing low-level code, but we need someone to work on architecture.
I had an idea the other day of an LLM system that would start from a basic architecture of an app, and would zoom down and down on components until it wrote the entire codebase, module by module. I'll try that, it sounds promising.
LouisSayers 115 days ago [-]
You need to prep for job interviews anyway. I'd rather spend the majority of my time being productive.
calmworm 115 days ago [-]
Job interview? You might be surprised at the number of us who don’t code for a job.
__loam 115 days ago [-]
I'd bet most people on this forum program professionally.
calmworm 115 days ago [-]
I would take that bet.
idiotsecant 115 days ago [-]
Me too.
atomic128 115 days ago [-]
Somebody tested people on Hacker News to evaluate programming competency.
This was part of a larger evaluation comparing the Hacker News population to people on Reddit programming subreddits.
It appears that Hacker News is perhaps NOT populated by the programming elite. In contrast, there are real wizards on Reddit.
Surprising, I know.
__loam 115 days ago [-]
Not surprising given how bad the takes here are and how many of the users here are dumb kids right out of college who are aspiring founders.
calmworm 114 days ago [-]
Unnecessarily negative. Maybe rethink it.
calmworm 114 days ago [-]
Not surprised there would be a “heated” discussion as a result of this one link, that measured only those who engaged it, and how? I opened the link, hit Submit just to see what would happen… now the percentage of HN users who are competent programmers is even fewer than before, by that metric.
whamlastxmas 115 days ago [-]
I’ve made the decision to embrace being bad at coding but getting a ton of work done using an LLM and if my future employer doesn’t want massive productivity and would prefer being able to leetcode really well then I unironically respect that and that’s ok.
I’m not doing ground breaking software stuff, it’s just web dev at non massive scales.
__loam 115 days ago [-]
Your future employer might expect you to bring some value through your expertise that doesn't come from their LLM. If you want to insist on degrading your own employability like this, I guess it's your choice.
fragmede 115 days ago [-]
For the most part, businesses don't care how you deliver value, just that you do. If programmer A does a ticket in 3 days with an LLM, and programmer B takes a week to do the same ticket, but doesn't use an LLM, with programmer B choosing not to out of some notion of purity, who's more employable?
__loam 115 days ago [-]
Productivity is not the only aspect of our profession that matters, and in fact it's probably not even the most important part. I'm not suggesting we get stuck or handcraft every aspect of our code, and there are multitudes of abstractions and tools that enhance productivity, including everything from frameworks to compilers.
What I'm saying is that what the original commenter is doing, having the LLM write all their code, will make them a less valuable employee in the long term. Participating in the act of programming makes you a better programmer. I'd rather have programmer B if they take the time to understand their code, so that when that code breaks at 4am and they get the call, they can actually fix it rather than be in a hole they dug with LLMs that they can't dig out of.
roenxi 115 days ago [-]
You don't need to call them at 4am, you can keep a git log of the prompts that were used to generate the code and some professional 4am debugger can sit there and use an LLM to fix it.
Probably not a practical option yet, but if we're looking at the long term that is where we are heading. Or, realistically, the even longer term where the LLM self-heals broken systems.
dvfjsdhgfv 114 days ago [-]
While a git log of prompts seems like a novel idea to me, I don't believe it would work - not because of temperature and LLMs being non-deterministic and the context window overflowing, but because at a certain level of complexity LLMs simply fail, even though they are excellent at fixing simple bugs.
__loam 115 days ago [-]
Lol, yeah the prompt is definitely going to help clarify what the code actually does.
Der_Einzige 115 days ago [-]
See, if you work in AI, say, as an AI researcher, asking them not to be allowed to use AI models in the interview is basically not an option.
But "lines of code written" is a hollow metric to prove utility. Code literacy is more effective than code illiteracy.
Lines of natural language vs discrete code is a kind of preference. Code is exact which makes it harder to recall and master. But it provides information density.
> by just knuckling down and learning how to do the work?
This is the key for me. What work? If it's the years of learning and practice toward proficiency to "know it when you see it" then I agree.
smileson2 115 days ago [-]
we're a post illiteracy society now
acedTrex 115 days ago [-]
> I've found that I haven't written a line of code in weeks
How are people doing this? I never use any of the code that gpt4o/copilot/sonnet spit out because it never meets my standards. How are other people accepting the shit it spits out?
viraptor 115 days ago [-]
You're listing plain models, so I'm assuming you're using them directly. Aider and similar agents use those models but they don't stop at the first answer. You can add test running and a linter to the request and it will essentially enter a loop like: what are the steps to solve (prompt)?; here's a map of the repository, which files do you need?; what's your proposed change?; here's the final change and the test run, do you think the problem has been solved?; (go back to the beginning if not)
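A rough sketch of that kind of loop, for illustration only. This is not Aider's actual implementation: `agent_loop`, `propose_change` and `run_checks` are hypothetical names, standing in for the model call and the test/linter run, and the "model" below is a stub so the example runs on its own.

```python
from typing import Callable, Optional, Tuple


def agent_loop(
    task: str,
    propose_change: Callable[[str, str], str],
    run_checks: Callable[[str], Tuple[bool, str]],
    max_rounds: int = 5,
) -> Tuple[Optional[str], int]:
    """Ask for a change, run the checks, feed failures back, repeat."""
    feedback = ""
    for round_no in range(1, max_rounds + 1):
        change = propose_change(task, feedback)   # hypothetical model call
        passed, report = run_checks(change)       # hypothetical tests + linter
        if passed:
            return change, round_no
        feedback = report                         # next round sees the failure
    return None, max_rounds                       # give up; a human takes over


# Stub "model": gets it wrong first, then corrects itself once it sees the report.
def fake_model(task: str, feedback: str) -> str:
    return "return a + b" if "expected 3" in feedback else "return a - b"


# Stub "test run": builds add() from the proposed body and checks one case.
def fake_checks(change: str) -> Tuple[bool, str]:
    namespace: dict = {}
    exec("def add(a, b):\n    " + change, namespace)
    result = namespace["add"](1, 2)
    return result == 3, f"add(1, 2) gave {result}, expected 3"


change, rounds = agent_loop("write add(a, b)", fake_model, fake_checks)
print(change, rounds)
```

The point of the structure is only that the test report becomes part of the next prompt, and that there is a round limit so a failing loop hands control back to you.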
That just sounds/looks like more work than just doing it normally? What am I missing?
viraptor 115 days ago [-]
Depends on the task but if you're going high level enough, it's not more work. Think about it this way: if you're doing proper development you're going to write code, tests and commit messages. Since you know what you want to achieve, write a really good commit message as the prompt, start writing tests and let the agent run in the meantime. Worst case, it doesn't work and you do the code yourself. Best case, it worked and you saved time.
(Not sure if that was clear but the steps/loop described before happens automatically, you're not babysitting it)
freeone3000 115 days ago [-]
You put it behind an API call and run the loop automatically for every coding query
namanyayg 115 days ago [-]
I'm using Cursor and till now the "test run" part is manual, like Cursor doesn't care about testing or actually checking the code it wrote works
Any tips how I could integrate that? Do I need to switch to aider/plandex?
anujsjpatel 115 days ago [-]
For someone who didn't study a STEM subject or CS in school, I've gone from 0 to publishing a production modern looking app in a matter of a few weeks (link to it on my profile).
Sure, it's not the best (most maintainable, non-redundant styling) code that's powering the app but it's more than enough to put an MVP out to the world and see if there's value/interest in the product.
threeseed 115 days ago [-]
> HN, and the internet in general, have become just an ocean of reactionary sandbagging and blather about how "useless" LLMs are.
This is cult like behaviour that reminds me so much of the crypto space.
I don't understand why people are not allowed to be critical of a technology or not find it useful.
And if they are they are somehow ignorant, over-reacting or deficient in some way.
wenc 115 days ago [-]
I think it's perfectly ok to be critical of technology as long as one is thoughtful rather than dismissive. There is a lot of hype right now and pushing back against it is the right thing to do.
I'm more reacting against simplistic and categorical pronouncements of straight up "uselessness," which to me seems un-curious and deeply cynical, especially since it is evidentially untrue in many domains (though it is true for some domains). I just find this kind of emotional cynicism (not a healthy skepticism, but cynicism) to be contrary to the spirit of innovation and openness, and indeed contrary to evidence. It's also an overgeneralization -- "I don't find it useful, so it's useless" -- rather than "Why don't I find it useful, and why do others? Let me learn more."
As future-looking HNers, I'd expect we would understand the world through a lens of "trajectories" rather than "current state". Just because LLMs hallucinate and make mistakes with a tone of confidence today -- a deep weakness -- doesn't mean they are altogether useless. We've witnessed that despite their weaknesses, we are getting a lot of value from them in many domains today and they are getting better over time.
Take neural networks themselves for instance. For most of the 90s-2000s, people thought they were a dead end. My own professor had great vitriol against Neural Networks. Most of the initial promises in the 80s truly didn't pan out. Turns out what was missing was (lots of) data, which the Internet provided. And look where we are today.
Another area of cynicism is self-driving cars (Level 5). Lots of hype and overpromise, and lots of people saying it will never happen because it requires a cognitive model of the world, which is too complicated, and there are too many exceptional cases for there to ever be Level 5 autonomy. Possibly true, but I think "never" is a very strong sentiment that is unworthy of a curious person.
rainsford 115 days ago [-]
I generally agree, although an important aspect of thinking in terms of "trajectories" is recognizing when a particular trajectory might end up at a dead end. One perspective on the weaknesses of current LLMs is that it's just where the things are today and they can still provide value even while the technology improves. But another perspective is that the persistence of these weaknesses indicates something more fundamentally broken with the whole approach that means it's not really the path towards "real" AI, even if you can finesse it into doing useful things in certain applications.
There's also an important nuance in differentiating rejection of a general technological endpoint (e.g. AGI or Level 5 self-driving cars) from rejection of a particular technological approach to achieving those goals (e.g. current LLM design or Tesla's autopilot). As you said, "never" is a long time and it takes a lot of unwarranted confidence to say we will never be able to achieve goals like AGI or Level 5 self-driving. But it seems a lot more reasonable to argue Tesla or OpenAI (and everyone else doing essentially the same thing as OpenAI) are fundamentally on the wrong track to achieving those goals without significantly changing their approach.
I agree that none of that really warrants dismissive cynicism of new technology, but being curious and future-looking also requires being willing to say when you think something is a bad approach even if it's not totally useless. Among other reasons, our ability to explore new technology is not limitless, and hype for a flawed technology isn't just annoying but may be sucking all the oxygen out of the room not leaving any for a potentially better alternative. Part of me wants to be optimistic about LLMs, but another part of me thinks about how much energy (human and compute) has gone into this thing that does not seem to be providing a corresponding amount of value.
wenc 115 days ago [-]
I appreciate this thoughtful comment.
You are absolutely right that the trajectories, if taken linearly, might hit a dead end. I should clarify that when I mentioned "trajectories" I don't mean unpunctuated ones.
I am myself not convinced that LLMs -- despite their value to me today -- will eventually lead to AGI as a matter of course, nor that the type of techniques used in autopilot will lead to L5 autonomy. And you're right that they are consuming a lot of our resources, which could well be better invested in a possibly better alternative.
I subscribe to Thomas Kuhn's [1] idea of scientific progress happening in "paradigms" rather than through a linear accumulation of knowledge. For instance, the path to LLMs itself was not linear, but through a series of new paradigms disrupting older ones. Early natural language processing was more rule-based (paradigm), then it became more statistical (paradigm), and then LLMs supplanted the old paradigms through transformers (paradigm) which made it scale to large swaths of data. I believe there is still significant runway left for LLMs, but I expect another paradigm must supplant it to get closer to AGI. (Yann LeCun said that he doesn't believe LLMs will lead to AGI.)
Does that mean the current exuberant high investments in LLMs are misplaced? Possibly, but in Kuhn's philosophy, typically what happens is a paradigm will be milked for as much as it can be, until it reaches a crisis/anomaly when it doesn't work anymore, at which point another paradigm will supplant it.
At present, we are seeing how far we can push LLMs, and LLMs as they are have value even today, so it's not a bad approach per se even though it will hit its limits at some point. Perhaps what is more important are the second-order effects: the investments we are seeing in GPUs (essentially we are betting on linear algebra) might unlock the kind of commodity computational power the next paradigm needs to disrupt the current one. I see parallels between this and investments in NASA resulting in many technologies that we take for granted today, and military spend in California producing the technology base that enabled Silicon Valley today. Of course, these are just speculations and I have no more evidence that this is happening with LLMs than anyone else.
I appreciate your point however and it is always good to step back and ask, non-cynically, whether we are headed down a good path.
This entire comment can be summarised as: everyone who doesn't think like me is wrong.
Not everyone is interested in seeing the world through the hopes and dreams of e/acc types and would prefer to see it as it is today.
LLMs are a technology. Nothing more. It can be as amazing or useless as anyone likes.
fragmede 115 days ago [-]
And this comment can be summarized as "Nuh uh, I'm right". When summarizing longer bits of text down to a single sentence, nuance and meaning get lost, making the summarization ultimately useless, contributing nothing to the discussion.
ben_w 115 days ago [-]
Crypto and AI have similarities and differences.
The similarities include intense "true believer" pitches and governments taking them seriously.
The differences include that the most famous cryptocurrency can't function as a direct payment mechanism for just lunch purchases in just Berlin (IIRC nor is it enough for all interbank transactions so it can't even be a behind-the-scenes system by itself), while GenAI output keeps ending up in places people would rather not find it like homework and that person on Twitter who's telling you Russia Did Nothing Wrong (and also giving you a nice cheesecake recipe because they don't do any input sanitation).
wenc 115 days ago [-]
Also, I'm deeply skeptical of crypto too due to its present scamminess, but I am keeping an open mind that there is a future in which crypto -- once it gets over this phase of get-rich-quick schemers -- will be seen as just another asset class.
I read somewhere that historically bonds in their early days were also associated with scamminess but today they're just a vanilla asset.
rainsford 115 days ago [-]
I'm honestly more optimistic about cryptocurrency as a mechanism of exchange rather than an asset. As a mechanism of exchange, cryptocurrency has some actually novel properties like distributed consensus that could be useful in certain cases. But an asset class which has zero backing value seems unworkable except for wild speculation and scams. Unfortunately the incentives around most cryptocurrencies (and maybe fundamental to cryptocurrency as an idea) greatly emphasize the asset aspects, and it's getting to be long enough since it became a thing that I'm starting to become skeptical cryptocurrency will be a real medium of exchange outside of illegal activities and maybe a few other niche cases.
evilfred 115 days ago [-]
bonds have utility, crypto does not
evilfred 115 days ago [-]
just like with crypto and NFTs and the metaverse, they are always focused on what is supposedly coming down the pipe in the future and not what is actually possible today
rafaelmn 115 days ago [-]
I use Sonnet 3.5 and while it's actually usable for codegen (compared to gpt/copilot) it's still really not that great. It does well at tasks like "here's a stinky collection of tests that accrued over time - clean this up in style of x" but actually writing code still shows a fundamental lack of understanding of the underlying API and problem (the most banal example being constantly generating an `x || Array.isArray(x)` test)
wokwokwok 115 days ago [-]
> I've found that I haven't written a line of code in weeks
Please post a video of your workflow.
It’s incredibly valuable for people to see this in action, otherwise they, quite legitimately, will simply think this is not true.
Kiro 115 days ago [-]
Who cares what they think? In fact, the fewer who use this the better for the ones that do. It's not in my self-interest to convert anyone and I obviously don't need to convince myself when I have the result right in front of me. Whether you believe it or not does not make me less productive.
wokwokwok 115 days ago [-]
The obvious answer is you’ll get called a liar and a shill.
I’m not saying you are; I think there are a lot of legitimate AI workflows people use.
…but, there are a lot of people trying to sell AI, and that makes them say things about it which are just flat out false.
/shrug
But you know; freedom of speech; you can say whatever you want if you don’t care what people think of you.
My take on it is showing people things (videos, blogs, repos, workbooks like Terence posted) moves the conversation from “I don’t believe you” to “let’s talk about the actual content”. Wow, what an interesting workflow, maybe I’ll try that…
If you don’t want to talk to people or have a discussion that extends beyond meaningless trivia like “does AI actually have any value” (obviously flame bait opinions only comment threads)… why are you even here?
If you don’t care, then fine. Maybe someone else will and they’ll post an interesting video.
Isn’t that the point of reading HN threads? What do you win by telling people not to post examples of their workflow?
It’s incredibly selfish.
perching_aix 115 days ago [-]
> HN, and the internet in general, have become just an ocean of reactionary sandbagging and blather about how "useless" LLMs are.
Now imagine how profoundly depressing it is to visit a HN post like this one, and be immediately met with blatant tribalism like this at the very top.
Do you genuinely think that going on a performative tirade like this is what's going to spark a more nuanced conversation? Or would you rather just the common sentiment be the same as yours? How many rounds of intellectual dishonesty do we need to figure this out?
riku_iki 115 days ago [-]
> Meanwhile, in the real world, I've found that I haven't written a line of code in weeks. Just paragraphs of text that specify what I want and then guidance through and around pitfalls in a simple iterative loop of useful working code.
could it be that you are mostly engaged in "boilerplate coding", where LLMs are indeed good?
holoduke 115 days ago [-]
People in general don't like change and are naturally defending against it. And the older people get the greater the percentage of people fighting against it. A very useful and powerful skill is to be flexible and adaptable. You positioned yourself in the happy few.
ijustlovemath 115 days ago [-]
How much do you typically pay in a month of tokens?
_wire_ 114 days ago [-]
> Meanwhile, in the real world, I've found that I haven't written a line of code in weeks. Just paragraphs of text that specify what I want and then guidance through and around pitfalls in a simple iterative loop of useful working code.
Comment on first principles:
Following the dictum that you can't prove the absence of bugs, only their presence, the idea of what constitutes "working code" deserves much more respect.
From an engineering perspective, either you understand the implementation or you don't. There's no meaning to an iterative loop of producing working code.
Stepwise refinement is a design process under the assumption that each step is understood in a process of exploration of the matching of a solution to a problem. The steps are the refinement of definition of a problem, to which is applied an understanding of how to compute a solution. The meaning of working code is in the appropriateness of the solution to the definition of the problem. Adjust either or both to unify and make sense of the matter.
The discipline of programming is rotting when the definition of working is copying code from an oracle and running it to see if it goes wrong.
The measure of works must be an engineering claim of understanding the chosen problem domain and solution. Understanding belongs to the engineer.
LLMs do not understand and cannot be relied upon to produce correct code.
If use of an LLM puts the engineer in contact with proven principles, materials and methods which he adapts to the job at hand, while the engineer maintains understanding of correctness, maybe that's a gain.
But if the engineer relies on the LLM transformer as an oracle, how does the engineer locate the needed understanding? He can't get it from the transformer: he's responsible for checking the output of the transformer!
OTOH if the engineer draws on understanding from elsewhere, what is the value of the transformer but as a catalog? As such, who has accountability for the contents of the catalog? It can't be the transformer because it can't understand. It can't be the developer of the transformer because he can't explain why the LLM produces any particular result! It has to be the user of the transformer.
So a system of production is being created whereby the engineer's going-in position is that he lacks the understanding needed to code a solution and he sees his work as integrating the output of an oracle that can't be relied upon.
The oracle is a peculiar kind of calculator with an unknown probability of generating relevant output that works at superhuman speeds, while the engineer is reduced to an operator in the position of verifying that output at human speeds.
This looks like a feedback system for risky results and a slippery slope towards heretofore unknown degrees of incorrectness and margins for error.
At the same time, the only common vernacular for tracking oracle veracity is in arcane version numbers, which are believed, based on rough experimentation, to broadly categorize the hallucinatory tendencies of the oracle.
The broad trend of adoption of this sketchy tech is in the context of industry which brags about seeking disruption and distortion, regards its engineers as cost centers to be exploited as "human resources", and is managed by a specialized class of idiot savants called MBAs.
Get this incredible technology into infrastructure and in control of life sustaining systems immediately!
skybrian 115 days ago [-]
What sort of code do you write this way?
sph 115 days ago [-]
Probably nothing a junior programmer wouldn't be able to do relatively easily.
amrrs 115 days ago [-]
Curious why Aider? Why not Cursor ?
evilfred 115 days ago [-]
writing code is the easy part, designing is hard and not LLMable
fragmede 115 days ago [-]
Given how hard we thought programming was a year or two ago, I wouldn't bank my future on design being too hard for an LLM. They're already quite good at helping write design docs.
bongodongobob 115 days ago [-]
Lol nope. When I'm trying to get it to make something big/complicated I start by telling it it's a software project manager and have me build a spec sheet on the design. Then I hand that off to an architect to flesh out the languages, libraries, files needed etc. Then from that list you can have it work on individual files and functions.
sterlind 115 days ago [-]
I also do OR-adjacent work, but I've had much less luck using 4o for formulating MIPs. It tends to deliver correct-looking answers with handwavy explanations of the math, but the equations don't work and the reasoning doesn't add up.
It's a strange experience, like taking a math class where the proofs are weird and none of the lessons click for you, and you start feeling stupid, only to learn your professor is an escaped dementia patient and it was gobbledygook to begin with.
I had a similar experience yesterday using o1 to see if a simple path exists from s to t through v using max flow. It gave me a very convincing-looking algorithm that was fundamentally broken. My working solution used some techniques from its failed attempt, but even after repeated hints it failed to figure out a working answer (it stubbornly kept finding s->t flows, rather than realizing v->{s,t} was the key.)
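For readers who want to see the v->{s,t} idea concretely, here is one way it could be set up. This is a sketch, not the solution referred to above, and it makes several assumptions: the graph is undirected, s, t and v are distinct, and the names (`simple_path_through`, `max_flow`) are made up for the example. Each vertex is split into an "in" and an "out" copy joined by a capacity-1 arc, so flow paths cannot share a vertex; a simple s-v-t path exists exactly when two units of flow can be sent from v to a super sink attached to s and t. Directed graphs are not covered by this sketch.

```python
from collections import defaultdict, deque


def max_flow(cap, source, sink):
    """Augmenting-path max flow (BFS) on residual capacities; mutates cap."""
    total = 0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for w, c in cap[u].items():
                if c > 0 and w not in parent:
                    parent[w] = u
                    queue.append(w)
        if sink not in parent:
            return total
        # Capacities are integers, so pushing one unit per path is always valid.
        node = sink
        while parent[node] is not None:
            prev = parent[node]
            cap[prev][node] -= 1
            cap[node][prev] += 1
            node = prev
        total += 1


def simple_path_through(edges, s, t, v):
    """True if an undirected graph has a simple path from s to t via v."""
    cap = defaultdict(lambda: defaultdict(int))

    def add_arc(a, b):
        cap[a][b] += 1
        cap[b][a] += 0  # make sure the residual arc exists

    vertices = {x for edge in edges for x in edge}
    for u in vertices:
        if u != v:  # v is the source; nothing needs to pass through it
            add_arc((u, "in"), (u, "out"))  # each vertex usable at most once
    for a, b in edges:
        add_arc((a, "out"), (b, "in"))
        add_arc((b, "out"), (a, "in"))
    sink = ("super", "sink")
    add_arc((s, "out"), sink)
    add_arc((t, "out"), sink)
    return max_flow(cap, (v, "out"), sink) == 2


# s - a - v - b - t: v sits on the path, so the answer is yes.
print(simple_path_through([("s", "a"), ("a", "v"), ("v", "b"), ("b", "t")], "s", "t", "v"))
# s - a - t with v hanging off a: any route via v would visit a twice.
print(simple_path_through([("s", "a"), ("a", "t"), ("a", "v")], "s", "t", "v"))
```

The second example is the case a plain s->t flow gets wrong: s and t are connected and v is reachable, but no simple path can include v.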
It's also extremely mentally fatiguing to check its reasoning. I almost suspect that RLHF has selected for obfuscating its reasoning, since obviously-wrong answers are easier to detect and penalize than subtly-wrong answers.
mjburgess 114 days ago [-]
Yip. We need research into how long it takes experts to repair faulty answers, vs. generate them on their own.
Benchmarking 10,000 attempts on an IQ test is irrelevant if on most of those attempts the time taken to repair an answer is longer than the time to complete the test yourself.
I find it's useful to generate exemplars in areas you're roughly familiar with, but want to see some elaboration or a refresher. You can stitch it all together to get further, but when it comes time to actually build/etc. something -- you need to start from scratch.
The time taken to reproduce what it's provided, now that you understand it, is trivial compared to the time needed to repair its flaws.
CJefferson 115 days ago [-]
I'm currently teaching a course on MIP, and out of interest I tried asking 4o about some questions I ask students. It could give the 'basic building blocks' (How to do x!=y, how to do a knapsack), but as soon as I asked it a vaguely interesting question that wasn't "bookwork", I don't think any of its models were right.
I'm interested in how you seem to be getting better answers than me (or, maybe I just discard the answer and write it myself once I can see it's wrong?)
In fact, I just asked it to do (and explain) x!=y for x,y integer variables in the range {1..9}, and while the constraints are right, the explanation isn't.
wenc 115 days ago [-]
I had to prompt it correctly (tell it to exclude x=y case in the x≠y formulation), but ChatGPT seems to have arrived at the correct answer:
OK, but at that point you've told it basically everything, and this is a really basic book problem!
As another example I just gave it a network flow problem, and asked it to convert to maximum flow (I'm using the API, not chatGPT).
Despite numerous promptings, it never got it right -- it would not stop putting a limit on the source and sink (usually 1), which meant the flow was always exactly 1. Here's the bit of wrong code (it's the last part; it shouldn't be putting any restrictions on nmap['s'] and nmap['t'], as they represent the source and sink), and I couldn't persuade it this was wrong after several prods:
# Constraints: Ensure flow conservation at each vertex
A_eq = np.zeros((len(namelist), num_edges))
b_eq = np.zeros(len(namelist))
for i, (u, v, capacity) in enumerate(edges):
    A_eq[nmap[u], i] = 1 # Outflow from u
    A_eq[nmap[v], i] = -1 # Inflow to v
# Source 's' has a net outflow, and sink 't' has a net inflow
b_eq[nmap['s']] = 1
b_eq[nmap['t']] = -1
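For contrast, a minimal sketch of conservation constraints that leave the source and sink unconstrained. It is plain Python rather than the numpy setup above, the function names and the example graph are made up for illustration, and the solver call is omitted: in a max-flow LP the objective would maximize the net outflow of 's' subject to these rows being zero and 0 <= f <= capacity on each edge.

```python
# Hypothetical example graph: (tail, head, capacity)
edges = [("s", "a", 2), ("s", "b", 1), ("a", "t", 1), ("a", "b", 1), ("b", "t", 2)]


def conservation_rows(edges, source="s", sink="t"):
    """One row per internal vertex: outflow minus inflow must equal zero.
    The source and sink get no row at all, so their net flow is free."""
    vertices = sorted({x for u, v, _ in edges for x in (u, v)})
    internal = [x for x in vertices if x not in (source, sink)]
    rows = []
    for x in internal:
        row = [0] * len(edges)
        for i, (u, v, _) in enumerate(edges):
            if u == x:
                row[i] += 1   # edge i leaves x
            if v == x:
                row[i] -= 1   # edge i enters x
        rows.append(row)
    return internal, rows


def flow_value(edges, flow, source="s"):
    """Net flow leaving the source."""
    out = sum(f for (u, _, _), f in zip(edges, flow) if u == source)
    into = sum(f for (_, v, _), f in zip(edges, flow) if v == source)
    return out - into


# A hand-picked flow that respects every capacity.
flow = [2, 1, 1, 1, 2]
internal, rows = conservation_rows(edges)
residuals = [sum(a * f for a, f in zip(row, flow)) for row in rows]
print(internal, residuals, flow_value(edges, flow))
```

This flow is conserved at both internal vertices and has value 3, which the quoted constraints would rule out because they pin the net outflow of 's' to exactly 1.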
wenc 115 days ago [-]
Sure, but that is the nature of LLM prompting. It does take some doing to set up the right guardrails. It's still a good starting point.
Also a trick when the LLM fights you: start from scratch, and put guardrails in your initial prompt.
LLM prompting is a bit like gradient descent in a bumpy nonconvex landscape with lots of spurious optima and saddle points -- if you constrain it to the right locality, it does a better job at finding an acceptable local optimum.
CJefferson 115 days ago [-]
I think this is just a case of different people wanting to work differently (and that's fine).
I can only tell this is wrong because I fully understand it -- and if I fully understand it, why not just write it myself rather than fight against an LLM. If I was trying to solve something I didn't know how to do, then I wouldn't know it was wrong, and where the bug was.
wenc 115 days ago [-]
That's true, except an LLM can sometimes propose a formulation that one has never thought of. In nuanced cases, there is more than one formulation that works.
For MIPs, correctness can often (not always but usually) be checked by simply flipping the binaries and checking the inequalities. Coming up with the inequalities from scratch is not always straightforward so LLMs often provide good starting points. Sometimes the formulation is something specific from a paper that one has never read. LLMs are a way to "mine" those answers (some sifting required).
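As a small illustration of that kind of check, take the x != y case from earlier in the thread with x, y integers in {1..9}. The formulation below is an assumption (a standard big-M disjunction with one binary b, not necessarily what ChatGPT produced): x - y >= 1 - M*b and y - x >= 1 - M*(1 - b). The function name `mismatches` is made up. The check enumerates every (x, y), flips b, and compares feasibility against the intended logic.

```python
from itertools import product


def mismatches(big_m, lo=1, hi=9):
    """Return the (x, y) pairs where the big-M constraints disagree with x != y."""
    bad = []
    for x, y in product(range(lo, hi + 1), repeat=2):
        # Feasible if at least one setting of the binary satisfies both inequalities.
        feasible = any(
            x - y >= 1 - big_m * b and y - x >= 1 - big_m * (1 - b)
            for b in (0, 1)
        )
        if feasible != (x != y):
            bad.append((x, y))
    return bad


print(mismatches(9))  # [] : M = 9 is large enough for this range
print(mismatches(8))  # [(1, 9), (9, 1)] : M = 8 wrongly cuts off the extreme pairs
```

Enumeration like this only works for tiny domains, but it catches exactly the kind of subtly wrong constant that is easy to miss when reading a formulation.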
I think this is the mindset that is needed to get value out of LLMs -- it's not about getting perfect answers on textbook problems, but working with an assistant to explore the space quickly at a fraction of the effort.
l33t7332273 115 days ago [-]
I am also working in OR and I have had the complete opposite experience with respect to MILP optimization (and the research actually agrees; there was a big survey paper published earlier this year showing LLMs were mostly correct on textbook problems but got more and more useless as complexity and novelty increased.)
The results are boilerplate at best, but misleading and insidious at worst, especially when you get into detailed tasks. Ever try to ask an LLM what a specific constraint does or worse ask it to explain the mathematical model of some proprietary CPLEX syntactic sugar? It hallucinates the math, the syntax, the explanation, everything.
wenc 115 days ago [-]
Can you point me to that paper? What version of the model were they using?
Have you tried again with the latest LLMs? ChatGPT4 actually (correctly) explains what each constraint does in English -- it doesn't just provide the constraint when you ask it for the formulation. Also, not sure if CPLEX should be involved at all -- I usually just ask it for mathematical formulations, not CPLEX calling code (I don't use CPLEX). The OR literature primarily contains math formulations and that's where LLMs can best do pattern matching to problem shape.
All the LLM is doing is fitting the problem description to a combination of these formulations (and others).
l33t7332273 115 days ago [-]
I was referring to section 4 of A Survey for Solving Mixed Integer Programming via Machine Learning (2024): https://arxiv.org/pdf/2401.03244.
I’ve heard (but not so much observed) that there is substantial difference between recent models, so it’s possible that they are better than when this was written.
Anyways, CPLEX has an associated modeling language that features syntactic sugar which has the effect of providing opaqueness to the underlying MILP that it solves. I find LLMs essentially unable to even make an attempt at determining the MILP from that language.
PS: How is Xpress? Is there some reason to prefer it to Gurobi or Mosek?
wenc 115 days ago [-]
Thanks for sharing that, I appreciate it. It looks like they used open-source Llama models which are not great. I tested these models offline using Ollama and outside of being character chat bots, they weren't very good at much (the only models that give good answers are Sonnet 3.5 or ChatGPT 4). However the paper's conclusion is essentially correct even for state-of-the-art models:
"Overall, while LLM made several errors, the provided formulations can serve as a starting point for OR experts to create mathematical models. However, OR experts should not rely on LLM to accurately create mathematical models, especially for less common or complex problems. Each output needs to be thoroughly verified and adjusted by the experts to ensure correctness and relevance."
I wouldn't recommend that anyone inexperienced use LLMs to create entire models from scratch, but rather use LLMs as a search tool for specific formulations which are then verified and plugged into a larger model. For this, it works really well. As a MIP modeler, I have an intuition for the shape of the answer, so even if ChatGPT makes mistakes, I know how to extract the correct bits and it still saves me a ton of time.
The CPLEX API doesn't have a lot of good examples out in the wild, so I don't expect the training to be good. I've always used CPLEX through a modeling language like AMPL, and even AMPL code is rare so I can't expect an LLM to decipher any of it. On the other hand, MIP formulations abound in PDFs of journal publications.
In the vibes department, I feel Xpress is second to Gurobi and CPLEX and it does the job just fine. But it's been a while since I used CPLEX and Gurobi so I have no recent points of comparison (corporate licensing is prohibitively expensive).
marmakoide 115 days ago [-]
I had the same experience with computational geometry.
Very good at giving a textbook answer ("give a Python/NumPy function that returns the Voronoi diagram of a set of 2D points").
Now, I ask for the Laguerre diagram, a variation that is not mentioned in textbooks, but very useful in practice. I can spend a lot of time spoon-feeding the answer, and I still just get bullshitting-student answers.
I tried other problems like numerical approximation, physics simulation, same experience.
I don't get the hype. Maybe it's good at giving variations of glue code, i.e. Stack Overflow meets autocomplete? As a search tool it's bad because it's so confidently incorrect that you may be fooled by bad answers.
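For readers who haven't met the Laguerre (power) diagram mentioned above: it is a Voronoi diagram where each site carries a weight, and "nearest" is measured by power distance instead of plain Euclidean distance. A minimal brute-force sketch in pure Python follows. The names `power_distance` and `laguerre_cell`, and the convention that the weight is subtracted from the squared distance, are assumptions made for illustration; this only answers "which cell is this point in", it does not construct the cell polygons.

```python
# Brute-force Laguerre (power) cell lookup; an illustrative sketch only.
# Assumed convention: power distance = squared Euclidean distance - weight.

def power_distance(point, site, weight):
    """Power distance from a 2D point to a weighted site."""
    dx = point[0] - site[0]
    dy = point[1] - site[1]
    return dx * dx + dy * dy - weight


def laguerre_cell(point, sites, weights):
    """Index of the site whose Laguerre cell contains the point."""
    return min(
        range(len(sites)),
        key=lambda i: power_distance(point, sites[i], weights[i]),
    )


sites = [(0.0, 0.0), (4.0, 0.0)]
query = (1.5, 0.0)

# With equal weights this reduces to the ordinary Voronoi diagram.
voronoi_owner = laguerre_cell(query, sites, [0.0, 0.0])

# Giving the second site a larger weight pushes the boundary toward the first.
laguerre_owner = laguerre_cell(query, sites, [0.0, 12.0])
```

With equal weights the query point belongs to the first site; with the heavier second site the cell boundary moves from x = 2 to x = 0.5, so the same point changes owner.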
CamperBob2 115 days ago [-]
> But many of the low-effort comments seem to mostly fall into (1) and (2) -- cynicism rather than cautious optimism.
One good riposte to reflexive LLM-bashing is, "Isn't that just what a stochastic parrot would say?" Some HN'ers would dismiss a talking dog because the C code it wrote has a buffer overflow error.
Workaccount2 115 days ago [-]
It's understandable that people whose careers and lifelong skill sets are seemingly on the precipice of obsolescence are going to be extremely hostile to that threat.
How many more years is senior SWE work going to be a $175k/yr gig instead of a $75k check-what-the-robot-does gig?
CamperBob2 114 days ago [-]
It depends. If you got into computing because it seemed like the most lucrative career choice to which you might be suited, then yes, I can imagine feeling threatened. Bummer, sucks to be you. But if you got into computing because it seemed like the most interesting thing available to work on, then no, I can't imagine not being fascinated by, and supportive of, the progress being made in the ML field today. Any hostility you feel should be directed at the people who want to lock it all up behind legislative, judicial, or proprietary doors.
In my case, it's all I can do not to walk away from everything else I'm doing to follow this particular muse. I don't have a lot of sympathy for my colleagues who see it as a threat. If you're afraid of new ideas, technologies, and methodologies, you picked the wrong line of work.
jazzyjackson 115 days ago [-]
I'd rather live in the world without talking dogs if their main utility is authoring buggy code.
airstrike 115 days ago [-]
It also doesn't help that Lean has had so many breaking changes in so little time. When I tried using GPT-4 for it, it mostly rendered old code that would fail to run unless you already knew the answer and how to fix it, which basically made it entirely unhelpful.
benterix 115 days ago [-]
> people who complain on HN that (paid/good - only Sonnet 3.5 and GPT4o are in this category)
Correction: I complain that the only decent model in "Open"AI's arsenal, that is GPT-4, has been replaced by a cheaper GPT-4o, which gives subpar answers to most of my questions (I don't care that it does it faster). As they moved it to "old, legacy" models, I expect they will phase it out, at which point I'll cancel my OpenAI subscriptions and Sonnet 3.5 will become the clear leader for my daily tasks.
Kudos to Anthropic for their great work, you guys are going in the right direction.
bongodongobob 115 days ago [-]
Nah, o1 is fucking impressive. It's really fucking good. I'm guessing you haven't used it yet.
benterix 114 days ago [-]
I used it and was really disappointed, maybe because of the hype. It generates a long page of entries that I used to generate in the past, often with better results. Note I use it for code generation, not for problem solving.
So I cancelled two of my 3 subscriptions as I realized OpenAI goes in a direction that is not useful for me at all. Claude, on the other hand, is incredibly useful.
EvgeniyZh 115 days ago [-]
There is ~3 orders of magnitude more Python code on the internet than Lean code (200GB vs 200MB in the Stack v2). You can't tune it "the same way".
agumonkey 115 days ago [-]
Fair point but a lot of python code is redundant and low quality.
Davidzheng 115 days ago [-]
I'm not sure the Lean coverage of pure math research is that much (maybe like 1% is represented in mathlib). But I think a system like AlphaProof could even today be useful for mathematicians. I mostly dislike systems like o1, which confidently say nonsense with such high frequency. But I think the value is already there.
lanstin 115 days ago [-]
The point about using Lean is that you don't have to trust; you can verify.
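To make the "verify, don't trust" point concrete, here is a tiny sketch in Lean 4 syntax (an assumption, given the breaking changes mentioned upthread). The theorem name `add_comm_example` is made up for illustration; `Nat.add_comm` is the core library lemma. Whoever or whatever wrote the proof term, it is only accepted if the kernel checks it.

```lean
-- The kernel checks this proof term; it does not matter who or what wrote it.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A confidently wrong claim does not get through: uncommenting the next
-- line makes the file fail to compile instead of silently passing.
-- theorem bogus : (2 : Nat) + 2 = 5 := rfl
```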
Davidzheng 115 days ago [-]
No, I agree. I just don't think the existing Lean codebase is approaching useful coverage. That should change soon.
lanstin 115 days ago [-]
I keep asking people in my department about using Lean, but zero interest so far.
RayVR 114 days ago [-]
I’m amazed you have had any luck with 4o. I found 4 was much better than 4o but still quite bad.
I tried to use 4/4o for a MIP several months ago. Frequently, it would iterate through three or four bad implementations over and over.
Claude 3.5 has been a significant improvement. I don’t really use chatgpt for anything at this point.
po76 115 days ago [-]
Give it a few months. ChatGPT will be recommending GPTs to use or do it automatically.
Nothing is static in the way things are moving.
andrepd 114 days ago [-]
I take cynicism over unbridled optimism. People speak as if we were on the cusp of technological singularity, but I've seen nothing to indicate we're not already past the inflection point of the logistic curve, and well into diminishing returns territory.
riffraff 115 days ago [-]
_can_ GPT be tuned more heavily on Lean?
It looks like the amount of python code in the corpus would outnumber Lean something like 1000:1. Although I guess OpenAI could generate more and train on that.
agumonkey 115 days ago [-]
side question, are there good OR websites / platforms (reddit, mastodon) to get involved in the field ?
dtquad 114 days ago [-]
Most OR researchers and practitioners are on Mastodon.
rabf 115 days ago [-]
Most people are on X.
agumonkey 114 days ago [-]
You mean Twitter ?
thelastparadise 115 days ago [-]
> but for someone who can and does, the $20/month I pay for ChatGPT more than pays for itself.
Would you be willing to pay even more, if it meant you were getting proportionally more valuable answers?
E.g. $200/month or $2,000/month (assuming the $2,000/month gets into employee/intern/contractor level of results.)
This might drive a positive feedback loop.
eab- 115 days ago [-]
Why do you expect GPT being tuned on Lean will help it for research-level math?
threeseed 115 days ago [-]
> side
Or (4) LLMs simply do not work properly for many use cases, in particular where large volumes of training data don't exist in the corpus.
And in these scenarios rather than say "I don't know" it will over and over again gaslight you with incoherent answers.
But sure, condescendingly blame the user for their ignorance and inability to understand or use the tool properly. Or call their criticism low-effort.
wenc 115 days ago [-]
That's category (3).
zamadatix 115 days ago [-]
What's the difference between (3) and (4), shouldn't the former contain the latter?
lanstin 115 days ago [-]
Yeah I have been using them to help with learning graduate maths as a grad student. Claude Sonnet 3.5 was unparalleled and the first quite useful one. GPT4o preview seems about equal (based on cutting and pasting the past six months of prompts into it).
fnordpiglet 115 days ago [-]
Rewind your mind to 2019 and imagine reading a post that said
“The experience seemed roughly on par with trying to advise a mediocre, but not completely incompetent, graduate student.”
With regard to interacting with the equivalent of Alexa. That’s a remarkable difference in 5 years.
JumpCrisscross 115 days ago [-]
The first profession AI seems on track to decimate is programming. In particular, the brilliant but remote and individual contributor. There is an obvious conflict of interest in this forum.
vessenes 115 days ago [-]
I see this theory a lot but mostly from people who haven’t tried pair coding with a quality llm. In fact these llms give experienced developers super powers; you can be crazy productive with them.
If you think we are close to the maximum useful software in the world already, then maybe. I do not believe that. Seeing software production and time costs drop one to two orders of magnitude means we will have very different viable software production processes. I don’t believe for a second that it disenfranchises quality thinkers; it empowers them.
Maxion 115 days ago [-]
I totally agree. There are, for example, so many companies out there that rely on fully manual processes internally simply because they cannot currently afford to hire programmers to solve the problems they have. The ROI just isn't there.
Reduce costs by an order of magnitude or two, and suddenly there's a whole heap more projects that become profitable.
jero1000 114 days ago [-]
I abandoned 3D art after witnessing DALL-E 2's capabilities, and I've observed the ripple effects across creative fields. Initially, photographers and fellow artists dismissed AI as a non-threat. That turned out to be misguided optimism. Now, with Midjourney producing such impressive work, the majority of us have become largely obsolete. These days, I'm noticing developers exhibiting the same denial. From my perspective, they're on a similar trajectory. This AI revolution is impacting creative and technical industries far more rapidly and dramatically than most anticipated.
vessenes 113 days ago [-]
I was a paid wedding photographer in the 1990s, and I used a Rolleiflex TLR with 120 Roll film. I recently attended a friend's wedding, and took with me a Fuji GFX100 series camera, effortlessly shooting pictures I could never have taken with a Rollei from terrible angles at like 5x the resolution with far, far more dynamic range than 120 film ever had.
30 years after I gave up the Rollei, I'm not obsolete as a photographer, and when there's a quality diffusion model that could take a few of my photos from the event at 100 megapixels, and get prompted by me as to what I want to see out of them creatively, I will still not be obsolete, even as a photographer, but most certainly not obsolete as an artist. In fact, I'll have more tools available for my art, with new skills needed, and different workflows.
As to abandoning 3D art -- your call. If you love it, why not see how these new tools open up your art? If you don't love some of the new tools, no problem, don't use them. I still shoot medium format film sometimes. If you were planning on a long term creative career without staying on top of technical advances in your field, that has not been possible for at least a few centuries.
simonw 114 days ago [-]
Sure, Midjourney's work looks visually impressive, but have you seen evidence that it really is displacing professional 3D artists?
Are legitimate companies genuinely switching to Midjourney over hiring artists now, or is Midjourney usage still mostly happening in places that previously wouldn't have commissioned custom illustrations at all (instead using things like stock photography)?
bugglebeetle 114 days ago [-]
I used to be an illustrator and I know from speaking to my former colleagues that they have, in fact, lost work to AI image generation services. Illustration is seen as a cost center by anyone higher than front line art directors and taste normally stops at that level as well. I think this will eventually end up with a bimodal distribution of undifferentiated AI slop and those who use high-quality human illustration to signal a commitment to taste, design, or maybe even luxury, but the economic consequences of that shift are already in motion.
htfy96 114 days ago [-]
It raises the bar of 'professional 3D artist'.
There are hundreds of thousands of '3D workers' working behind the scenes to create the 3D models for makeshift ads, and as far as I know many of them (including my high school mate) have already been displaced by Midjourney and lost their jobs. This used to be a big industry but now almost entirely wiped out by AI.
talldayo 114 days ago [-]
> This used to be a big industry but now almost entirely wiped out by AI.
To my knowledge, 3D artists weren't that huge of an industry to begin with. One of my friends went to college researching 3D physics models, and never landed a job in the field long before the AI wave hit. Unless you're a freelancer or salaried Pixar employee, being a 3D artist is extremely difficult with extraordinarily low job security, AI or no AI.
I think "almost entirely wiped out by AI" is hyperbole, because the primary employer of these artists will still be hiring and products like Sora are a good decade away from being Toy Story quality. AI will be a substitute product for people that didn't even want 3D art in the first place.
IncreasePosts 115 days ago [-]
Before it can replace the brilliant programmer, it needs to be able to replace the mediocre programmer. There is so much programming and other tech/IT-related work that businesses or people want, but can't justify paying even low tech salaries in America for.
So far, there is little chance of a non-technical person developing a technical solution to their problems using AI.
JumpCrisscross 115 days ago [-]
> Before it can replace the brilliant programmer, it needs to be able to replace the mediocre programmer
Nope. Compensation is exponential. Being able to replace a top performer with a few mediocre devs pair coding with an LLM is more than fine for 90% of use cases.
IncreasePosts 115 days ago [-]
A mediocre programmer won't be able to judge the allegedly expert level output any better than a non-programmer, so I don't see how that would work.
I think it is more likely that great programmers might just increase their productivity even more with it, which will make their value even greater.
JumpCrisscross 115 days ago [-]
> mediocre programmer won't be able to judge the allegedly expert level output any better than a non-programmer, so I don't see how that would work
Sure. Plenty of businesses are. Particularly in the commercial automation sector that numerically hires the most people.
> more likely that great programmers might just increase their productivity
For those in high-productivity, high-margin businesses, yes. For most of the world, no—the surplus productivity doesn’t outweigh the compensation and concentration risk.
I broadly expect a spate of age discrimination lawsuits in the near future because most businesses don’t need a few stars. In the meantime, I’ve watched a lot of people find two people in Brazil + an LLM equals one WFH very good (but not brilliant) coder.
zeroonetwothree 115 days ago [-]
This makes no sense. There are problems that 'brilliant' programmers can solve and no number of mediocre ones can. Just like you can't substitute Mozart with 100 mediocre composers.
JumpCrisscross 115 days ago [-]
> there are problems that 'brilliant' programmers can solve and no number of mediocre ones can
These people will continue to have value. But most businesses don’t have problems that can be profitably solved only by brilliant coders.
9dev 114 days ago [-]
So, how many people listen to Mozart and how many to Taylor Swift?
jessekv 114 days ago [-]
Now, or in 300 years?
closewith 114 days ago [-]
> Just like you can't substitute Mozart with 100 mediocre composers.
Commercially, you can. After all, that's the current music business.
simonw 114 days ago [-]
I would expect a top performer with LLM access to be able to produce even more of a multiple of the work of a mediocre developer with LLM access.
If a top performer can produce 5x or more of the value, I would expect companies to continue to value top performers.
HarHarVeryFunny 114 days ago [-]
The programmers who will find LLMs most useful are going to be those who prior to LLMs were copying and pasting from Stack Overflow, and asking questions online about everything they were doing - tasks that LLMs have precisely replaced (it has now memorized all that boilerplate code, consensus answers, and API usage examples).
The developers who will find LLMs the least useful are the "brilliant" ones who never found any utility in any of that stuff, partly because they are not reinventing the wheel for the 1000th time, but instead addressing more challenging and novel problems.
zarzavat 114 days ago [-]
It's very much the opposite.
LLMs free me from the nuts and bolts of the "how", for example I don't have to manually type out a loop. I just write a comment and the loop magically appears. Sometimes I don't have to prompt it at all.
With my brain freed from the drudgery of everyday programming, I have more mental cycles to dedicate to higher concerns such as overall architecture, and I'm just way more productive.
For experienced programmers this is a godsend.
Less experienced developers lack the ability to mentally "see" how software should be architected in a way that balances the concerns, so writing a loop a bit faster is not as much of an advantage. Also, they lack the reflexes to instantly decide if generated code is correct or incorrect.
LLMs are limited by the user's decision speed: the LLM generates code for you, but you have to decide whether to accept or reject it. If it takes me 1 second to decide to accept code that would have taken me 10 seconds to physically type, then I'm saving 9 seconds, which really adds up. For a junior developer, LLMs may give negative productivity if it takes them longer to decide if the LLM's version is correct than it would have taken them to type whatever they were going to write in the first place.
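The accept/reject arithmetic above can be written down as a back-of-the-envelope model. This is only a sketch: the function name `seconds_saved` and the `accept_rate` parameter are assumptions added for illustration (the comment itself only gives the 1-second and 10-second figures), and it assumes a rejected suggestion costs the review time plus typing the code yourself anyway.

```python
def seconds_saved(typing_s, review_s, accept_rate=1.0):
    """Expected seconds saved per suggestion.

    Accepted suggestion: you spend review_s instead of typing_s.
    Rejected suggestion: you spend review_s and then type it anyway.
    Expected saving works out to accept_rate * typing_s - review_s.
    """
    accepted = accept_rate * (typing_s - review_s)
    rejected = (1.0 - accept_rate) * (-review_s)
    return accepted + rejected


# The experienced-developer case from the comment: 1 s to decide, 10 s to type.
experienced = seconds_saved(typing_s=10, review_s=1)

# A hypothetical junior who needs 15 s to judge the same suggestion.
junior = seconds_saved(typing_s=10, review_s=15)

# Hypothetical: the same fast reviewer, but only half the suggestions are usable.
half_accepted = seconds_saved(typing_s=10, review_s=1, accept_rate=0.5)
```

The sign of the result is the whole argument: it goes negative as soon as review time exceeds the acceptance rate times typing time.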
HarHarVeryFunny 114 days ago [-]
> LLMs are limited by the user's decision speed
This is obviously the critical point. It's not whether the LLM can do something, i.e. give it a go, but whether that actually saves you time. If it takes longer to verify the LLM code for correctness than to write it yourself, then there is no productivity gain.
I guess this partly also hinges on how much you care about correctness beyond "does it seem to work". For a prototype maybe that's enough, but for work use you probably should check for API "contractual correctness", corner cases, vulnerabilities, etc, or anything that you didn't explicitly specify (or even if you did!) to the LLM. If you are writing the code itself then these multifaceted requirements are all in your head, but with the LLM you'll need to spell them all out (or iterate and refine), and it may well have been faster just to code it yourself (cf working with an intern with -ve productivity).
If you fail to review the LLM's code thoroughly enough, and leave bugs in it to be discovered later, maybe in production, then the cost of doing that, both in time and money, will far outweigh any cost saving in just having written it correctly yourself in the first place. Again, this is more of a concern for production code than for hobbyist or prototype stuff, but having to fix bugs is always slower than getting it right in the first place.
For myself, it seems that for anything complex it's always the design that takes time, not the coding, and the coding in the end (once the detailed design has been worked out) just comes down to straightforward methods and functions that are mostly simple to get right first time. What would be useful, but of course does not yet exist, would be an AGI peer programmer that operated more like a human than a language model, who I could discuss the requirements and design with, and then maybe delegate the coding to as well.
simonw 114 days ago [-]
I like to think I'm more of a "challenging and novel problems" developer than a "copy and paste from Stack Overflow" developer, and I've been finding LLMs extremely useful for over two years at this point.
Yeah, I was gonna say this is not how I see this going. The copy/paste dev is replaced by the novel dev using LLM for the stuff they used to hire interns and juniors for.
In law, this sort of thing already happened with the rise of better research tools. The work L1s used to do a generation ago just does not exist now. An attorney with experience gets the results faster on their own now. With all the pipeline and QoL issues that go with that.
HarHarVeryFunny 114 days ago [-]
That makes some sense, but seems to be answering a different question of whose jobs may be in jeopardy from LLMs, as opposed to who might currently find them useful.
Note though that not all companies see it this way - the telecom I work at is hoping to replace senior onshore developers with junior offshore ones leveraging "GenAI"! I agree that the opposite makes more sense - the seniors are needed, and it's the juniors whose work may be more within reach of LLMs.
I really can't see junior developer positions wholesale disappearing though - more likely them just leveraging LLM/AI-enhanced dev tools to be more productive. Maybe in some companies where there are lots of junior developers they may (due to increased productivity) need fewer in the future, but the productivity gains to be had at this point seem questionable ... as another poster commented, the output of an LLM is only as useful as the skill of the person reviewing it for correctness.
DragonStrength 114 days ago [-]
I think we all assume each individual company will need fewer developers to do the same work they're doing now. The question is do they have fewer devs or do more work. And if it is have fewer devs, will that open up the door for more small companies to be competitive as well, since they need fewer devs and have less competition for talent from people with deep pockets.
I find a lot of the AI discussion seems to land in the "lump of labor" fallacy camp though.
xorcist 114 days ago [-]
I am a skeptic. What would you say would be the easiest way for me to change my mind?
objektif 114 days ago [-]
How many of them are there of the latter type? In my 15 yrs of experience I would say 95%+ of all developers belong to your first category.
HarHarVeryFunny 114 days ago [-]
95% sounds way high, but maybe I'm wrong. I think it's part generational - old school programmers are used to having to develop algorithms/etc from scratch, and the younger generation seem to have been taught in school to be more system integrators assembling solutions out of cut and paste code and relying on APIs to get stuff done (with limited capability to DIY if such an API does not exist).
But not all younger programmers can be Stack Overflow cut-n-pasters, because not all (and surely not 95%!) programming jobs are amenable to that approach. There are lots of jobs where people are developing novel solutions, interacting with proprietary or uncommon hardware and software, etc, where the solution does not exist on Stack Overflow (and by extension not in an LLM trained on Stack Overflow).
dartos 115 days ago [-]
No, the first profession AI was on track to decimate was artists, but that didn’t really happen.
AI just destroyed Shutterstock.
energy123 115 days ago [-]
The large majority of professional writers and artists produce thankless commodity output for things like TV advertisements, games, SEO content. These jobs should be threatened.
zeroonetwothree 115 days ago [-]
They get paid pretty low wages so it's not even clear that AIs will be cheaper. Consider also that you still need a human to evaluate their output, make adjustments, etc.
> "It pretty much has killed most small jobs in writing."
> "entry-level writing jobs have ceased to exist."
... There isn't an infinite amount of demand for commodity writing/art/music/vfx, and AI inference is pretty cheap and rapidly getting cheaper.
mkarrmann 115 days ago [-]
Is most code being written the equivalent of high-art or Shutterstock?
dartos 115 days ago [-]
I think most code being written is like a custom car made out of the most cost effective parts available.
Not pretty, but it gets the job done for the specific use cases of a given business.
Real production code doesn’t have a Shutterstock equivalent.
If you think most code is stock, then you just haven’t had enough experience in industry yet.
mkarrmann 114 days ago [-]
I actually like that analogy. It's somewhere in between. Enough that LLMs can help in many ways, but the current models are still far away from doing everything.
dartos 114 days ago [-]
Yeah, they’re not useless, but I don’t really see them replacing the profession of programming.
Just another tool in the kit.
langcss 114 days ago [-]
I believe LLMs decimating the role of a software engineer requires AGI, which the second that happens decimates all jobs.
What it may do is change the job requirements. Web/JS has decimated (reduced by 90% or more) MFC C++ jobs after all.
The programmer doesn't just write Python. That is the how... not the what.
__loam 115 days ago [-]
It's going to be incredible watching you people write way more code than you can feasibly maintain.
farresito 115 days ago [-]
Once we have AI-based language servers, which will, at some point in the future, be able to track entire repositories, I think maintaining projects will actually be far easier than right now.
kubb 114 days ago [-]
The second it doesn’t work, you’ll be like my time is too valuable to invest in debugging this, I need a nerd to delegate this to.
ken47 115 days ago [-]
The conflict of interest might have something to do with the fact that OpenAI's CEO/founder was once a major figure in Y Combinator. But I think you wanted to insinuate that the conflict of interest ran in the other direction.
Once ChatGPT can even come close to replacing a junior engineer, you can retry your claim. The progression of the tech underlying ChatGPT will be sub-linear.
charlieyu1 115 days ago [-]
The current driving force of AI is the desire to cut costs. Jobs will be cut even if ChatGPT is nowhere near a junior engineer and that's the problem.
ken47 114 days ago [-]
Care to elaborate on the second sentence with any proof?
talldayo 115 days ago [-]
I would better believe that if any superior software was being primarily designed by AI.
IshKebab 114 days ago [-]
I doubt it. It can do some impressive stuff for sure, but I very rarely get a perfectly working answer out of ChatGPT. Don't get me wrong, it's often extremely useful as a starting point and time saver, but it clearly isn't close to replacing anyone vaguely competent.
noch 115 days ago [-]
The important point is, I feel, that most people are not even at the level of intelligence of "a mediocre, but not completely incompetent, graduate student." A mediocre graduate science student, especially of the sort who graduates and doesn't quit, is a very impressive individual compared to the rest of us.
For "us", having such a level of intelligence available as an assistant throughout the day is a massive life upgrade, if we can just afford more tokens.
a_wild_dandan 115 days ago [-]
My sheer productivity boost from these models is miraculous. It's like upgrading from a text editor to a powerful IDE. I've saved a mountain of hours just by removing tedious time sinks -- one-off language syntax, remembering patterns for some framework, migrating code, etc. And this boost applies to nearly all of my knowledge work.
Then I see contrarians claiming that LLMs are literally never useful for anyone, and I get "don't believe your lying eyes" vibes. At this point, such sentiments feel either willfully ignorant, or said in bad faith. It's wild.
dartos 115 days ago [-]
> At this point, such sentiments feel either willfully ignorant, or said in bad faith.
I feel exactly the same, but in the opposite direction.
As someone who’s been programming for 17 years and working professionally for 10, I’m unable to get any huge productivity boosts from AI tools.
They’re better than Google + Stack Overflow for asking random questions in a specific context, and they’re good for repetitive, but not identical, syntax. That’s about where the gains end for me.
Maybe at this point I’m just so fast about looking up documentation. Maybe the languages/problems I’m facing aren’t well represented in the training data, but I just don’t see this amazing advancement.
I’d really love to see, live, someone programming who really gets these big productivity gains.
zeroonetwothree 115 days ago [-]
Right, in my experience the time it takes to verify that the code it wrote for you is correct is more than just to write it in the first place. A big exception is if you're working in a new domain (e.g., new language or framework). Then it's obviously much faster, and I do derive value from it. But I don't spend a very large % of my time doing that.
I would speculate it's a productivity boost for programmers specifically working in areas that they are new to (or haven't really mastered yet). One question I have is whether overly relying on LLMs will reduce the ability to master a domain, and thus hurt your long-term skill. It might seem silly, like complaining that no one knows assembly anymore because of compilers, but I think it's different than just another layer of abstraction.
mewpmewp2 115 days ago [-]
Most gains are from using Copilot, do you use that?
acedTrex 115 days ago [-]
I have it, tried it for a while. I have it turned mostly off now except for rare boilerplate-heavy cases.
It kept generating annoyingly wrong code. Things with subtly wrong, misleading names, missing edge cases, ignoring immediate same-file context, etc. I found that it slowed me down so I turned it off.
dartos 115 days ago [-]
This is my exact experience as well.
mewpmewp2 115 days ago [-]
Which language?
homebrewer 114 days ago [-]
Same experience, but with TypeScript and Go. They gave me a 60-day trial (IIRC), I used it for two days, disabled it for the next 58 days, and after that removed it from the editor.
mewpmewp2 114 days ago [-]
I get really good results with TypeScript and Python. Like it knows exactly what I want to do, I feel like I think exactly as Copilot does. Maybe I am the statistical average...
Makes me wonder if people who don't like Copilot output will not like my natural output as well.
dartos 114 days ago [-]
Feel free not to share, I don’t want you to get dogpiled, but if you would humor me,
Could you share any code on GitHub (or pastebin or whatever) that you wrote with the help of AI?
Or could you share what kind of experience you have with programming (how many years, what domain you work in, etc)
mewpmewp2 114 days ago [-]
The projects I do are mostly frontend in React and backend with TypeScript/Node.js.
I have around 10+ years of professional experience although I did on/off hobby coding before that since 15 years ago.
It's mostly API endpoints, calling a database, third party APIs, data transformation, aggregation type of things.
Then either UI according to what designers provide or whatever I want to do for my side projects.
I think it's of course a wildly bigger productivity multiplier for side projects, since then it's mostly about typing things out: you know exactly what you want to do, and being a little off doesn't matter.
I don't want to share any of my actual code right now, but one example is a React component that needs to fetch some sort of data, e.g. using @tanstack/react-query. It does the loading handling and error handling boilerplate for me, some of which I change to what I specifically need for that situation, but I need very few keystrokes myself to get the initial boilerplate out that I then edit, and during edits it of course also gives me decent suggestions. And it will create the component prop types based on the args I pass to the component, etc.
Then with backend, it's really good at data transformations. E.g. combining different datasets, reducing etc.
How well it picks the correct libraries and patterns depends on the project and, I think, on how much I've navigated around. I'm not fully sure how exactly the context is passed, so usually I will feel it out and adapt code where necessary.
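A minimal sketch of the kind of backend transformation described above (combining two datasets, then reducing). The `users` and `orders` records and the `total_spend_by_user` helper are hypothetical names chosen for illustration, not the commenter's actual code:

```python
from collections import defaultdict

# Hypothetical records, standing in for rows from a database and a third-party API.
users = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Grace"},
]
orders = [
    {"user_id": 1, "total": 30.0},
    {"user_id": 1, "total": 12.5},
    {"user_id": 2, "total": 99.0},
    {"user_id": 3, "total": 5.0},  # no matching user; dropped by the join
]


def total_spend_by_user(users, orders):
    """Join orders to users on the user id and sum each user's order totals."""
    names = {u["id"]: u["name"] for u in users}
    totals = defaultdict(float)
    for order in orders:
        name = names.get(order["user_id"])
        if name is not None:  # inner join: skip orders without a known user
            totals[name] += order["total"]
    return dict(totals)


summary = total_spend_by_user(users, orders)
print(summary)
```

This is the sort of routine join-and-aggregate code where an autocomplete-style suggestion only has to be approximately right, since the result is easy to check by eye.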
dartos 114 days ago [-]
Yes, I find Copilot is nice for things like TanStack Query. It’s like better snippets.
At my job we have this pretty clean SOA type architecture backed by MongoDB.
Copilot has trouble building the more complicated, domain specific queries on its own, I’ve found.
I do occasionally ask chatgpt how to write a certain query in a general case and apply that to what I’m writing. I also don’t really like mongosh’s docs.
garyal 113 days ago [-]
Hi there - I'm a PM at MongoDB that works on the MongoDB Shell. I'm curious to hear your thoughts on the issues you're currently facing with mongosh docs and how we could make them better for you. Thanks for taking the time to leave feedback!
acedTrex 115 days ago [-]
golang and python mainly.
For Rust it failed spectacularly. So bad that it's not worth discussing lol
iudqnolq 114 days ago [-]
I tried it for a while and thought it was helping a lot. Then I happened to use an IDE without it and realized it was increasing my rate of syntax tokens per hour but reducing the rate of features implemented per hour. In particular I was constantly rewriting boilerplate instead of ever writing helper functions.
dartos 115 days ago [-]
I use it for those refactors I mentioned in my comment.
It’s autocomplete++, except without knowledge of the rest of my codebase.
crooked-v 115 days ago [-]
I tried it. It ended up just being slightly better, significantly slower autocomplete.
bachmeier 115 days ago [-]
> I see contrarians claiming that LLMs are literally never useful for anyone
While I don't doubt that there's at least one person that has said this, what you're saying doesn't conflict with the things I and many others in the "skeptic" camp have said. LLMs are useful for a very specific set of tasks. The tasks you've listed are a tiny sliver of all the tasks that AI could potentially be doing. Would it be a good idea to consult an LLM if your mother is passed out on the floor? Probably not. The problem I have is with extrapolating from the current successes to conclude that many more tasks will be done by AI in five years.
perching_aix 115 days ago [-]
Thing is, I'm used to hearing a very similar sentiment on how e.g. using vim keybindings is so literally going to make me a 10x 100x whatever rockstar developer - and it's like what, enabling me to edit text a bit faster? And it's always anecdotes that yeah, from-qualia you feel so fast. But from-qualia I run like a marathon runner and sound like a radio host.
I personally did find some use cases for it and it does a decent job of cutting out minor gruntwork for me. But the experience itself screams to me that whatever gains I'm feeling I'm getting are all in my head.
raincole 115 days ago [-]
> using vim keybindings is so literally going to make me a 10x 100x whatever rockstar developer - and it's like what, enabling me to edit text a bit faster?
Yes, to me LLM is exactly like this: from nano to vim.
pxc 115 days ago [-]
Nano is borderline unusable, so that's like... a lot?
perching_aix 114 days ago [-]
holy hyperboly, clearly i picked the right example...
pxc 114 days ago [-]
I don't think basic vim usage (which is all I know, really) makes anyone super efficient. I don't think typing/editing speed is generally an important factor in programmer productivity or 'coding speed'.
It's just that every time I use nano it's (a) unintentional, as it's opened via EDITOR; (b) sort-of coerced, because most distros installing it by default also think it's somehow too much to install Vim or Emacs alongside it; and (c) extremely painfully awkward, because all other editors I use, I've invested at least a couple years of practice into.
If I spent a year using nano every day, and if I evolved a config file and read the manual during that time, I might eventually reach a place where using nano didn't feel cumbersome and irritating, but why would I do that if I already use Emacs and Vim every day? If I learn a 'new' editor it's going to be something extensible that I could see myself programming in every day: Emacs without evil; or one of the newer modal editors with a reversed sentence order, like kakoune and Helix; or, hell, VSCode.
So nano is likely doomed to remain forever cumbersome and irritating for me, somewhere on the level of typing on a touchscreen instead of a real keyboard.
wizzwizz4 115 days ago [-]
I'm a contrarian who believes your anecdote, and could even imagine that 5% of LLM users feel the same way, but thinks (a) these systems are about half as good as they're ever going to get, (b) we're past the point of diminishing returns, and (c) what we do have isn't worth the energy costs of running it, let alone creating it in the first place.
christofosho 115 days ago [-]
I think there may be a set of people that have figured out, 1) how to interact with LLMs; and 2) what in their lives is improved when interacting with LLMs. I am in the group that has not found the best use case for my own life, and have never needed it for improving anything I need to get done. Always looking for suggestions, though!
ants_everywhere 115 days ago [-]
> I think there may be a set of people that have figured out, 1) how to interact with LLMs....
1) is all about experimenting, which is what Tao is doing.
Having a playful and open minded attitude is like 80% of the game
llm_bro 115 days ago [-]
[dead]
__loam 115 days ago [-]
It's a system that is designed to convince you.
sn9 115 days ago [-]
Anyone intelligent enough to make a living programming likely has more than enough IQ to become a mediocre somewhat competent graduate student in math.
They just don't have the background, and probably lack the interest to dedicate studying for a few years to get to that level.
azan_ 114 days ago [-]
That's an interesting take. Personally I'd say that graduate-level math is orders of magnitude harder than the significant majority of programming. And I mean that it's inherently harder, i.e. not due to lack of background.
kiba 115 days ago [-]
We are more limited by our emotions, and then our skills in learning and acquiring knowledge.
Intelligence is probably a distant third.
atleastoptimal 115 days ago [-]
Nah. Dogs are far emotionally better than most humans. Their intelligence is their limitation. Also “skills in learning and acquiring knowledge” is basically intelligence
thewanderer1983 115 days ago [-]
>A mediocre graduate science student, especially of the sort who graduates and doesn't quit, is a very impressive individual compared to the rest of us.
Incorrect. Graduating from university shows a good work ethic, a certain character and an ability to manage time. It's not a measure of being better than the rest of humanity. Also, it's not a good measure of intelligence. If you only want to view the world through credentials. Academics don't consider your intelligence until you have a Ph.D and X years of work in your field. Industry only uses graduates as an entry requirement for junior roles and then favors and cares only about your years of experience after that.
Given that statement I can only assume you haven't been to university. You are mistaken to think, especially in the time we are in now, that the elite class are any more knowledgeable than you are.
ahnick 115 days ago [-]
Here are the key points outlining why thewanderer1983's response misinterprets noch's comment and contains inaccuracies:
Misinterpretation of the Original Point:
Intelligence vs. Moral Superiority: Noch discusses the intelligence level of a mediocre graduate science student compared to the general population. Thewanderer1983 misreads this as a claim of moral or inherent superiority over "the rest of humanity," which was not implied.
Conflation of Educational Levels:
University Graduates vs. Graduate Students: The response conflates undergraduate university graduates with graduate science students. Noch specifically refers to graduate students who have pursued advanced degrees, which typically require higher levels of specialization and intellectual rigor.
Incorrect Assessment of Intelligence Measures:
Graduate Studies as a Measure of Intelligence: Successfully completing graduate studies, especially in science, often requires significant intellectual capability. Dismissing this as "not a good measure of intelligence" overlooks the challenges inherent in advanced academic work.
Irrelevant Focus on Credentials and Industry Practices:
Credentials vs. Intelligence Discussion: Noch's comment centers on intelligence levels, not merely on holding credentials. Bringing up how industry values experience over degrees shifts the focus away from the original discussion about intelligence.
Unfounded Assumptions About Noch's Background:
Ad Hominem Attack: Suggesting that Noch hasn't been to university is an unfounded personal assumption that does not contribute to the argument and detracts from a respectful discourse.
Introduction of the 'Elite Class' Notion:
Straw Man Argument: Thewanderer1983 introduces the concept of an "elite class," which Noch did not mention. This misrepresents the original comment and argues against a point that wasn't made.
Overgeneralizations About Academia and Industry:
Academia's Recognition of Intelligence: Claiming that academics don't consider intelligence until one has a Ph.D. and years of work is an overgeneralization. Intelligence is recognized and valued at various academic levels.
Industry's View on Graduates: Stating that industry only uses graduates as an entry requirement ignores the significant roles that advanced degree holders often play in innovation and leadership within industries.
Ignoring the Core Benefit Highlighted:
AI as a Life Upgrade: Noch emphasizes how access to AI with the intelligence level of a graduate student is a substantial benefit for most people. Thewanderer1983 fails to address this key point, instead focusing on unrelated issues.
Misunderstanding of the Value of Graduate Education:
Work Ethic vs. Intellectual Achievement: While a good work ethic is important, graduate education in science also demands high intellectual capability, critical thinking, and problem-solving skills.
Logical Fallacies:
Red Herring: The discussion about industry preferences and academic credentials diverts from the main argument about the intelligence level of graduate students.
Ad Hominem: Attacking Noch's presumed lack of university experience instead of addressing the argument presented.
dpig_ 115 days ago [-]
If you couldn't be bothered to write this comment, I can't be bothered to read it.
ahnick 115 days ago [-]
but you did bother to comment on it. :)
mfgk 115 days ago [-]
An excellent example of an LLM (or an imitated LLM output) that fiercely defends the status quo, is overly verbose, does not come to the point, makes incorrect assumptions and lectures from a high horse.
LLMs are good for mediocre poems and presidential speeches that have no shame.
ahnick 115 days ago [-]
Yeah, well, you know, that's just like, uh, your opinion, man.
thewanderer1983 112 days ago [-]
I can play this silly game also.
Let’s evaluate the correctness of Thewanderer’s argument in detail:
Distinction Between Credentials and Intelligence:
Correctness: Thewanderer is correct in stating that a university degree is not a definitive measure of intelligence. Intelligence is a complex trait that encompasses various cognitive abilities, problem-solving skills, creativity, and emotional intelligence. Academic credentials primarily reflect one’s ability to succeed in a structured educational environment, which is just one aspect of intelligence.
Value of Real-World Experience:
Correctness: The argument that real-world experience is crucial is accurate. Many industries value practical experience and skills over formal education. For example, in technology and business sectors, hands-on experience, problem-solving abilities, and adaptability are often more important than academic qualifications alone. This is supported by numerous studies and industry practices that prioritize experience and performance over degrees.
Critique of Credentialism:
Correctness: Thewanderer’s critique of credentialism is valid. Over-reliance on academic credentials can overlook the diverse talents and skills that individuals without formal degrees may possess. This perspective is supported by the growing recognition of alternative education paths, such as vocational training, apprenticeships, and self-directed learning, which can also lead to successful careers.
Inclusivity and Egalitarianism:
Correctness: Promoting inclusivity and valuing diverse forms of knowledge is a correct and progressive stance. Intelligence and capability are not confined to those with advanced degrees. Many successful individuals in various fields do not have formal academic credentials but have achieved significant accomplishments through experience, self-learning, and practical skills.
Encouragement of Self-Worth:
Correctness: Encouraging individuals to value their own experiences and knowledge is a positive and correct approach. It fosters confidence and self-worth, which are important for personal and professional growth. Recognizing the value of diverse experiences and perspectives contributes to a more inclusive and equitable society.
In summary, Thewanderer’s argument is correct in several key aspects:
It accurately distinguishes between academic credentials and broader measures of intelligence.
It correctly emphasizes the importance of real-world experience.
It validly critiques the overemphasis on academic credentials.
It promotes an inclusive and egalitarian view of intelligence.
It encourages self-worth and confidence in one’s abilities.
These points collectively support a well-rounded and accurate perspective on intelligence and capability.
noch 112 days ago [-]
> I can play this silly game also.
Please could you share your prompt or a link to the conversation?
I'm genuinely puzzled that you're more interested in doubling down and justifying yourself and making new points (different from what I initially presented) than understanding the other person's point of view.
If you share your prompt, I'll have a better understanding of your motivations and whether you are arguing in good faith.
As far as silly games go: if you honestly believe a game is silly, you shouldn't play it, unless you want to win silly prizes.
fumeux_fume 115 days ago [-]
Rewind your mind to 1950 and reading that the future is chatting with bots about solving math homework.
nathanasmith 115 days ago [-]
They would be wondering why it took so long.
ksec 115 days ago [-]
Which is why I think the AI era isn't hype but very much real. Jensen said AI has reached the era of iPhone.
We won't have AGI or ASI, whatever definition people have for those terms, in the next 5-10 years. But I would often like to refer to AI as Assisted or Augmented Intelligence. And it will provide enough value to drive current computer and smartphone sales for at least another 5-10 years, or 3-4 cycles.
j_timberlake 114 days ago [-]
Terry is a genius that can get that value out of an LLM.
Average Joe can't do anything like that yet, both because he won't be as good at prompting the model, and because his problems in life aren't text-based anyway.
fnordpiglet 114 days ago [-]
I think this is where multi modal LLMs are so powerful. The ability to directly speak to the LLM with your voice is huge.
bamboozled 115 days ago [-]
Rewind your mind to 1850, imagine seeing a lightbulb.
talldayo 115 days ago [-]
To be honest, I have gotten 100x more useful answers out of Siri's WolframAlpha integration than I ever have out of ChatGPT. People don't want a "not completely incompetent graduate student" responding to their prompts, they want NLP that reliably processes information. Last-generation voice assistants could at least do their job consistently, ChatGPT couldn't be trusted to flick a light switch on a regular basis.
meowface 115 days ago [-]
I use both for different things. WolframAlpha is great for well-defined questions with well-defined answers. LLMs are often great for anything that doesn't fall into that.
fnordpiglet 114 days ago [-]
I use Home Assistant with the extended OpenAI integration from HACS. Let me tell you, it’s orders of magnitude better than generic voice assistants. It can understand my requests fairly flexibly without me having a literal memory of every device in the house. I can ask for complex tasks like turning every light in the basement on without there being a basement zone, and it infers the right devices from their names. I have air quality sensors throughout, and I can ask it to turn on the fan in areas with low air quality and it literally does it without programming an automation.
Usually Alexa will order 10,000 rolls of toilet paper and ship them to my boss when I ask it to turn on the bathroom fan.
Personally tho, the utility of this level of skill (beginner grad in many areas) for me is in areas I have undergraduate questions in. While I literally never ask it questions in my field, I do for many other fields I don’t know well, to help me learn. Over the summer my family traveled and I was home alone, so I fixed and renovated tons of stuff I didn’t know how to do. I wore a headset and had the voice mode of ChatGPT on. I just asked it questions as I went and it answered. This enabled me to complete dozens of projects I didn’t know how to even start otherwise. If I had had to stop and search the web and sift through forums and SEO hell scapes, and read loosely related instructions and try to synthesize my answers, I would have gotten two rather than thirty projects done.
Karrot_Kream 115 days ago [-]
How does this square up with literally what Terence Tao (TFA) writes about O1? Is this meant to say there's a class of problems that O1 is still really bad at (or worse than intuition says it should be, at least)? Or is this "he says, she says" time for hot topics again on HN?
sebzim4500 115 days ago [-]
o1-preview is still quite a specialized model, and you can come up with very easy questions that it fails embarrassingly, despite its success in seemingly much more difficult tests like olympiad programming/maths questions.
You certainly shouldn't think of it like having access to a graduate student whenever you want, although hopefully that's coming.
thelastparadise 115 days ago [-]
Wait til you generate WolframAlpha queries from natural language using Claude 3.5 and use it to interpret results as well.
talldayo 115 days ago [-]
I've tried the ChatGPT integration and it was kinda just useless. On smaller datasets it told me nothing that wasn't obviously apparent from the charts and tables; on larger datasets it couldn't do much besides basic key/value retrieval. Asking it to analyze a large time-series table was an exercise in futility, I remain pretty unimpressed with current offerings.
segmondy 115 days ago [-]
Then you have a skill issue. 10 million are paying for GPT monthly because a large share of them are getting useful value out of it. WolframAlpha has been out for a while and didn't take off for a reason. "GPT couldn't be trusted to flick a light switch on a regular basis" pretty much implies you are not serious, or your knowledge of the capabilities of LLMs is pretty much dated or derived from things you have read.
jazzyjackson 115 days ago [-]
Wolframalpha is a free service, really kind of an ad for all the (curated, accurate) datasets built into Wolfram Language
Wolfram Research is a profitable company btw
codr7 115 days ago [-]
FACT: The technology is inherently unreliable in its current form. And the weakness is built in, it's not going to go away anytime soon.
jonahx 115 days ago [-]
The same is true of search engines, yet they are still incredibly useful.
codr7 115 days ago [-]
Not the same technology at all, until recently at least.
EDIT: Looks like I hurt someone's feelings by killing their unicorn. It was going to happen sooner or later, and pretending isn't very constructive. In fact, pretending this technology is reliable is a very risky thing to do.
TrackerFF 115 days ago [-]
Even more amazing, there are plenty - PLENTY - of posters here that routinely either completely shit on LLMs, or casually dismiss them as "hype", "useless", and what have you.
I've been saying this for quite some time now, but some people are in for a very rude awakening when the SOTA models 5-10 years from now are able to completely replace senior devs and engineers.
Better buckle up, and start diversifying your skills.
ramraj07 115 days ago [-]
The way I see it, these models, especially O1, are an intelligence booster. If you start with zero it gives you back zero. Especially if you’re just genuinely trying to use it and not just trying to do some gotcha stuff.
zeroonetwothree 115 days ago [-]
Not sure how this post is evidence of AIs replacing senior devs.
achierius 114 days ago [-]
Diversifying to what? When AI can fully replace senior developers the world as we know it is over. Best case capitalism enters terminal decline: buy rifles. Worst case, hope that whatever comes out the other side is either benevolent or implodes quickly.
skeks6272 114 days ago [-]
[flagged]
meroes 115 days ago [-]
I mean paying several hundred to thousands of grad students to RLHF for several years and you get a corpus of grad-student text. I'm not surprised at all. AI companies hire grad students to RLHF in every subject matter (chemistry, physics, math, etc).
The grad students write the prompts, correct the model, and all of that is fed into a "more advanced" model. It's corpora of text. Repeat this for every grade level and subject.
Ask the model that's being trained on chemistry grad level work a simple math question and it will probably get it wrong. They aren't "smart". It's aggregations of text and ways to sample and then predict.
fnordpiglet 114 days ago [-]
Except you’re talking about a general purpose foundation model that’s doing all these subjects at once. It’s not like you choose the subject specific model with Claude or gpt-01.
The key isn’t whether these things are smart or not. The key is that they put something that can answer basic grad level questions on almost any subject. For people that don’t have a graduate level education in any subject this is a remarkable tool.
I don’t know why the statement that “wow this is useful and a remarkable step forward” is always met with “yeah but it’s not actually smart.” So? Half of all humans have an IQ less than 100. They’re not smart either. Is this their value? For a machine, being able to produce accurate answers to most basic graduate level questions is -science fiction- regardless of whether it’s “smart.”
The NLP feat alone is stunning, and going from basically one step above gibberish to “basic grad school” in two years is a mouth dropping rate of change. I suspect folks who quibble over whether it’s “real intelligence” or simply a stochastic parrot have lost the ability to dream.
meroes 114 days ago [-]
Well ya, once each project, e.g. “grad level math”, “k-12 math”, “undergrad math”, “k-12 chemistry”, etc., is sufficient, they are all fed into a larger, more powerful model.
Maybe my RLHF work does make it harder for me to dream, but I teach models math which means a lot of prompt writing, and yet I have not found a way to have the model teach me math I don’t know yet (and there’s a lot I don’t know). It’s fun to play around with, but I still gravitate toward the isolated texts, not the aggregation as too much is lost or averaged in my opinion/experience. But hey maybe I’m overtrained on the traditional learning methods.
roboboffin 115 days ago [-]
[dead]
eigenvalue 115 days ago [-]
The o1 model is really remarkable. I was able to get very significant speedups to my already highly optimized Rust code in my fast vector similarity project, all verified with careful benchmarking and validation of correctness.
Not only that, it also helped me reimagine and conceptualize a new measure of statistical dependency based on Jensen-Shannon divergence that works very well. And it came up with a super fast implementation of normalized mutual information, something I tried to include in the library originally but struggled to find something fast enough when dealing with large vectors (say, 15,000 dimensions and up).
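For readers unfamiliar with the term, here is a minimal sketch of the textbook Jensen-Shannon divergence between two discrete probability distributions. It is not the commenter's new dependency measure or their Rust implementation, neither of which is shown in the thread; the function names are illustrative, and the inputs are assumed to be equal-length lists of non-negative numbers that each sum to 1:

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: symmetric, and bounded in [0, 1]."""
    # m is the pointwise average; wherever p_i > 0 or q_i > 0, m_i > 0 too,
    # so the KL terms below never divide by zero.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


same = js_divergence([0.5, 0.5], [0.5, 0.5])      # identical distributions
disjoint = js_divergence([1.0, 0.0], [0.0, 1.0])  # no overlap at all
print(same, disjoint)
```

Unlike plain KL divergence, this quantity is symmetric and stays finite when one distribution assigns zero probability where the other does not, which is one reason it is a common building block for dependency measures.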
While it wasn’t able to give perfect Rust code that compiled on the very first try, it was able to fix all the bugs in one more try after pasting in all the compiler warning problems from VS Code. In contrast, gpt-4o usually would take dozens of tries to fix all the many Rust type errors, lifetime/borrowing errors, and so on that it would inevitably introduce. And Claude3.5 sonnet is just plain stupid when it comes to Rust for some reason.
I really have to say, this feels like a true game changer, especially when you have really challenging tasks that you would be hard pressed to find many humans capable of helping with (at least without shelling out $500k+/year in compensation for).
And it’s not just the performance optimization and relatively bug free code— it’s the creative problem solving and synthesis of huge amounts of core mathematical and algorithmic knowledge plus contemporary research results, combined with a strong ability to understand what you’re trying to accomplish and making it happen.
Here is the diff to the code file showing the changes:
But a lot of what you pay humans $500k a year for is to work with enormous existing systems that an LLM cannot understand just yet. Optimizing small libraries and implementing fast functions though is a huge improvement in any programmer's toolbox.
eigenvalue 115 days ago [-]
Yes, that’s certainly true, and that’s why I selected that library in particular to try with it. The fact that it’s mathematical— so not many lines of code, but each line packs a lot of punch and requires careful thought to optimize— makes it a perfect test bed for this model in particular. For larger projects that are simpler, you’re probably better off with Claude3.5 sonnet, since it has double the context window.
dyauspitr 115 days ago [-]
Can’t Gemini work with a million+ input tokens?
eigenvalue 115 days ago [-]
Yes, but its reasoning ability is extremely poor in my experience with real world programming tasks. I’m talking about stuff that Claude3.5 Sonnet handles easily, and GPT4o can also handle if it can fit in its smaller context window, where Gemini 1.5 pro just completely fails.
Bigger context is definitely helpful, but not if it comes at the expense of reasoning/analytical ability. I’m always a bit puzzled why people stress the importance of these “needle in a haystack” tests where the model has to find one specific thing in a huge document. That seems far less relevant to me in terms of usefulness in the real world.
derefr 115 days ago [-]
> I’m always a bit puzzled why people stress the importance of these “needle in a haystack” tests where the model has to find one specific thing in a huge document. That seems far less relevant to me in terms of usefulness in the real world.
How do you mean?
Half of writing code within a codebase, is knowing what functions already exist in the codebase for you to call in your own code; and/or, what code you'll have to change upstream and downstream of the code you're modifying within the same codebase — or even by forking your dependencies and changing them — to get what you want to happen, to happen.
And half of, say, writing a longform novel, is knowing all the promises you've made to the reader, the active Chekhov's guns, and all the other constraints you've placed on yourself by hundreds of pages or even several books ago, that just became relevant again as of this very sentence. Or, moreover, which of those details it's the proper time to make relevant again for maximum impact and proper first-in-last-out narrative bridging structure.
In both cases, these aren't really literal "needle in a haystack" stress-tests; they should properly be tests of the model's ability to perform some kind of "associational priority indexing" on the context, allowing it to build concepts into associational sub-networks and then make long-distance associations where the nodes are entire subnetworks. (Which isn't something we really see yet, in any model.)
eigenvalue 115 days ago [-]
Yes agreed, I wasn’t trying to say it’s totally useless, but it’s not as helpful as synthesizing all that context intelligently. It’s more of a parlor trick. But that trick can be handy if you need something like that. Really, the main issue with Gemini is that it’s simply not very smart compared to the competition, and the big context doesn’t make up for that in the slightest.
aprilthird2021 115 days ago [-]
It doesn't work well though. You can't just stuff your entire codebase into it and get good results. I work somewhere that tries to do this internally
jdiez17 115 days ago [-]
> 1,337 additions
cough
ejpir 115 days ago [-]
[dead]
Ylpertnodi 115 days ago [-]
>you would be hard pressed to find many humans capable of helping with (at least without shelling out $500k+/year in compensation for).
And now we have a $number we can relate, and refer, to.
abstractbill 115 days ago [-]
My experience with O1 has been very different. I wouldn't even say it's performing at a "good undergrad" level for me.
For example, I asked a pretty simple question here and it got completely confused:
Anecdata, but I've been finding O1 to be worse than 4o & Claude 3.5 Sonnet. To add insult to injury, it's slower & chattier.
anujsjpatel 115 days ago [-]
And sometimes it just bugs out and doesn't give any response? Faced that twice now, it "thought" for like 10-30s then no answer and I had to click regenerate and wait for it again.
jghn 115 days ago [-]
I've seen it take over a couple of minutes, at which point I switched to Claude. And have seen reports of it taking even longer. So it may be that you didn't wait long enough.
abdullahkhalids 115 days ago [-]
Thinking about training LLMs on geometry. A lot of information in the sources would be contained in the diagrams accompanying the text. This model is not multi-modal, so maybe it wasn't trained on the accompanying diagrams at all.
I would really like if people check on a set of geometry and a set of analysis questions and compare the difference.
jazzyjackson 115 days ago [-]
It will be trash. I'll have to dig up a chat I had the weekend GPT4 was released, I was musing about dodecahedron packing problems and GPT4 started with an assertion that a line through a sphere intersects the surface 3 times.
Maybe if you fine tuned it on Euclid's Elements and then allowed it to run experiments with Mathematica snippets it could check its assumptions before spouting nonsense
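For reference, a short derivation of why "3 times" cannot be right, using standard notation that is not from the thread (a line through point o with nonzero direction d, and a sphere with center c and radius r):

```latex
% Substitute the parametrized line into the sphere equation and expand.
\begin{align*}
\mathbf{x}(t) &= \mathbf{o} + t\,\mathbf{d}, \qquad \lVert \mathbf{x} - \mathbf{c} \rVert^2 = r^2 \\
\lVert \mathbf{d} \rVert^2\, t^2 + 2\,\mathbf{d}\cdot(\mathbf{o}-\mathbf{c})\, t + \lVert \mathbf{o}-\mathbf{c} \rVert^2 - r^2 &= 0
\end{align*}
```

This is a quadratic in t with a nonzero leading coefficient, so it has at most two real roots: a line misses the sphere, touches it at one tangent point, or crosses the surface exactly twice.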
almostgotcaught 115 days ago [-]
Why would they do this - make it speak like a customer service agent. The ideal experience here is short and succinct, not verbose and obsequious.
ljlolel 115 days ago [-]
Performs better on chat bot arena head to head
svdr 115 days ago [-]
Did you find out what the error was in computing the volume of the truncated icosidodecahedron?
bitexploder 115 days ago [-]
The novelty to me is that “The experience seemed roughly on par with trying to advise a mediocre, but not completely incompetent, graduate student” applies in so many subject areas! I have found great value in using LLMs to sort things out. In areas where I am very experienced it can be really helpful at tons of small chores. Like Terence was pointing out in his third experiment: if you break the problem down, it does solid work filling in smaller blanks. You need the conceptual understanding. Part of this is prompting skill. If you go into an area you don’t know you have to try and build the prompts up. Dive into something small and specific and work outward if the answer is known. Start specific and focused if starting from the outside in. I have used this to cut through conceptual layers of very complex topics I have zero knowledge in and then verify my concepts via experts on YT/research papers/trusted sources. It is an amazing tool.
wenc 115 days ago [-]
This has been my experience as well. I treat LLMs like an intern or junior who can do the legwork that I have no bandwidth to do myself. I have to supervise it and help it along, checking for mistakes, but I do get useful results in the end.
Attitudinally, I suspect people who have had experience supervising interns or mentoring juniors are probably those who are able to get value out of LLMs (paid ones - free ones are no good) rather than grizzled lone individual contributors -- I myself have been in this camp for most of my early career -- who don't know how to coax value out of people.
wslh 115 days ago [-]
> ... that I have no bandwidth to do myself.
One of the most interesting aspects of this thread is how it brings us back to the fundamentals of attention in machine learning [1]. This is a key point: while humans have intelligence, our attention is inherently limited. This is why the concept behind Attention Is All You Need [2] is so relevant to what we're discussing.
My 2 cents: our human intelligence is the glue that binds everything together.
‘Able to make the same creative mathematical leaps as Terence Tao’ seems like a pretty high bar to be setting for AI.
This is like when you’re being interviewed for a programming job and the interviewer explains some problem to you that it took their team months to figure out, and then they’re disappointed you can’t whiteboard out the solution they came up with in 40 minutes without access to google.
ColinWright 114 days ago [-]
My experience of working with people like Terence Tao, and being nowhere near their standard, is that they are looking for any kind of creativity. Everything is accepted, and it doesn't have to be "at their level".
Having read what he's saying there, and with my experience, I think your characterisation is inaccurate.
And having been at the talk he gave for the IMO earlier this year he is impressed with some of the interactions, it's just that he feels that any kind of "creative spark" is still missing.
kmacdough 114 days ago [-]
Right, Terence was hoping it would offer something new to think about, some new perspective, right or wrong. GPTs have the ability to process insane amounts of information across all branches of math, science and art. This ability eclipses that of the most motivated intellectuals such as Terence. It is thus a little disappointing that it was unable to find anything in its vast knowledge base to apply a new lens to the problem.
mlyle 114 days ago [-]
I wonder how much of this is an intrinsic limitation of LLMs, and how much that interdisciplinary thinking and mashing together of problem domains is missing in the training data. It's a pretty rare thing, and the only times these analogies and linkages become noted is when they happen to work out (and they don't seem so far out of left field anymore once this happens).
dr_dshiv 114 days ago [-]
This is what synthetic data might support— based on mashing together the implications of massive numbers of concepts…
baq 114 days ago [-]
I wonder what the creative spark even is in the context of an autoregressive transformer.
Perhaps it’s an ability to confabulate facts into the context window which are not present in the training data but which are, in the context of maths, viable hypotheses? Every LLM can generate bullshit, but maybe we just need the right bullshit?
drzzhan 114 days ago [-]
That's interesting. I am young so I don't know what actually is creativity. Could you explain that part for me?
vlovich123 114 days ago [-]
> Creativity is defined as the tendency to generate or recognize ideas, alternatives, or possibilities that may be useful in solving problems, communicating with others, and entertaining ourselves and others
Basically can you provide a new perspective on solving a problem that hasn't been considered or a new way of looking at an existing idea in a new way to unlock a path.
drzzhan 114 days ago [-]
That is such a good quote to remember. Thanks for your answer! It actually opened my understanding a lot. I searched your quote and found some other good ones.
bbor 114 days ago [-]
You’re asking the right questions, IMO! Chomsky has spent his life trying to answer this question in different forms, and has ultimately arrived at the conclusion that we live in a “Pre-Galilean” era of cognitive science, where the only answers available to us are developed through the use of intuitive interpretation (like they used to do with ‘the heavens’/space) instead of empirical contradiction (aka science).
He does have some answers, such as “human creativity is the ability to create an infinite range of outputs from a finite range of inputs that nonetheless pertain to our motivations/context in some useful way”, but that’s obviously not a very satisfying answer. It tells us a little — I think Tao is gesturing to exactly this when he complains that GPTo1 can only apply and combine mathematical approaches within a sort of closed domain rather than propose radically new ones - but it’s not helpful for an everyday understanding of creativity. IMO :)
In his words, from Language and Mind:
"Roughly, where we deal with cognitive structures, either in a mature state of knowledge and belief or in the initial state, we face problems, but not mysteries. When we ask how humans make use of these cognitive structures, how and why they make choices and behave as they do, although there is much that we can say as human beings with intuition and insight, there is little, I believe, that we can say as scientists…
What I have called elsewhere 'the creative aspect of language use' remains as much a mystery to us as it was to the Cartesians who discussed it, in part, in the context of the problem of 'other minds.'"
If this sounds intriguing to you/anyone, I highly recommend his (in)famous debate with Foucault, which is available for free on YouTube. It’s a bit wandering, but about halfway through they discuss creativity in depth, contrasting Foucault’s vaguely postmodern view-that human creativity is mostly constrained by societal circumstances-with Chomsky’s view, that human creativity is mostly constrained by the natural structures of our cognitive system(s).
Wow. Thank you for your thoughtful answer!
> “human creativity is the ability to create an infinite range of outputs from a finite range of inputs that nonetheless pertain to our motivations/context in some useful way”
Do you think this can be used as a metric? Like the more useful answer we can come up with, the more "creative" we are. The constraint of outputs and "some useful way" is such a good insight.
perching_aix 114 days ago [-]
A couple days ago I saw a tweet that described how to remove an element from an array in O(1) time instead of O(n). The key to it was identifying that for the purpose the given array was being used for, it could be unordered / not fully ordered, and it would not be an issue.
This way, it was possible to simply replace the element with the last array element, then decrease the size of the array by one. I'd say that's pretty creative: whoever came up with this was able to identify what can be traded off to make the previously impossible, possible, unlocking new scales and possibilities.
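The trick described above can be sketched in a few lines of Rust. This is only an illustration under the stated assumption that element order does not matter; the function name `remove_unordered` is made up for this example and does not come from the tweet.

```rust
/// Remove `items[index]` in O(1) by moving the last element into its slot.
/// The relative order of the remaining elements is NOT preserved.
/// Returns `None` if `index` is out of bounds.
fn remove_unordered<T>(items: &mut Vec<T>, index: usize) -> Option<T> {
    if index >= items.len() {
        return None;
    }
    let last = items.len() - 1;
    // Swap the victim to the end, then shrink the vector by one.
    items.swap(index, last);
    items.pop()
}
```

Rust's standard library ships the same idea as `Vec::swap_remove`, which panics on an out-of-bounds index instead of returning `None`. An order-preserving `Vec::remove` has to shift every later element, which is where the O(n) comes from.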
In practice, I'd say creativity is often being able to manifest people's qualia in some unprecedented way. For example, say you're experimenting in your DAW, and discover a pretty cool sound. You identify the ways it can be used to emote and then utilize it in a work. If you really stumbled upon a sound that a lot of people find as emotive as you did, you just did something creative: it's as if you translated the qualia of an emotion into sound.
This qualia to manifestation is what's behind creativity in all senses of the word, I believe. In my previous example, discovering that orderedness is not actually a strict requirement, and (ab)using that to significantly alter the scaling of such an action is creative, because it undoes the notion that orderedness is a requirement. It goes against what's natural, but in a way that becomes extremely natural and indispensable once realized.
I think, in that way, current AIs are trained to be uncreative, since being creative inherently requires experimentation that is unaligned with the normal.
IshKebab 114 days ago [-]
> whoever came up with this was able to identify what can be traded off to make the previously impossible, possible, unlocking new scales and possibilities.
In fairness, that is an extremely standard trick so it's reasonably unlikely that the author came up with it themselves.
almostuseful 114 days ago [-]
But the first time someone came up with that idea it was an act of creativity.
IshKebab 113 days ago [-]
Yeah I agree. It is hard to tell if LLMs are capable of "easy" creativity like that, though, because anything that easy has already been done many times in its training set.
You'd have to invent some new domain I guess and see if it could be creative within that domain. Difficult to think of a good test though.
drzzhan 114 days ago [-]
Thank you for your answer! TIL a new word: qualia
I like your two examples. They have different perspectives. Though I must say the "qualia to manifestation" is still a bit abstract for me now. I'll keep them in mind.
nybsjytm 114 days ago [-]
> ‘Able to make the same creative mathematical leaps as Terence Tao’ seems like a pretty high bar to be setting for AI.
There's no need to try to infer this kind of high bar, because what he says is actually very specific and concrete: "Here the result was mildly disappointing ... Essentially the model proposed the same strategy that was already identified in the most recent work on the problem (and which I restated in the blog post), but did not offer any creative variants of that strategy." Crucially the blog post in question was part of his input to ChatGPT.
Otherwise, he's been clear that while he anticipates a future where it is more useful, at present he only uses AI/ChatGPT for bibliography formatting and for writing out simple "Hello World" style code. (He is, after all, a mathematician and not a coder.) I've seen various claims online that he's using ChatGPT all the time to help with his research and, beyond the coding usage, that just seems to not be true.
(However, it's fair to say that "able to help Terence Tao with research" is actually a high bar.)
faizshah 114 days ago [-]
This has been observed by more people than just Terence Tao. Try using chatgpt to program something of higher complexity than tutorial code or write a basic blog post, it lacks creativity and the code is poorly designed.
ilrwbwrkhv 114 days ago [-]
Even for like basic Rust programs it ties itself into countless borrow checker issues and cannot get out of it, both the OpenAI as well as Sonnet (Anthropic).
It doesn't really get logic still, but it does small edits well when the code is very clear.
I think this will always remain a problem. Because it can never shut up, it keeps making stuff up and "hallucinating" (works normally, just incorrectly) to dig itself further and further into a hole.
Autocomplete on steroids is what peak AI will look like till the time we can crack consciousness and AGI (which the modern versions are nothing even close to).
loxs 114 days ago [-]
To me it's quite good for Rust, as it usually suggests some borrow checker API I didn't know about that either solves the issue at hand or at least points me in the right direction to figure out what is actually wrong with my code and why it doesn't compile.
ilrwbwrkhv 114 days ago [-]
Funny it just wrote this for me:
  fn parse_image_url(html: &str) -> Option<String> {
      let re = regex::Regex::new(r#"src="([^"]+)""#).ok()?;
      re.captures(html)
          .and_then(|caps| caps.get(1))
          .map(|m| m.as_str().to_string())
  }
Prompt was "write a function which extracts image urls from a given block of html"
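For contrast, the function quoted above returns only the first `src="..."` it finds, on any tag. Below is a rough std-only sketch of what the prompt literally asks for (every image URL). The name `extract_image_urls` is hypothetical, and the code assumes double-quoted attributes and does no real HTML parsing, so it is an illustration rather than something to use on arbitrary pages.

```rust
/// Collect the `src` value of every `<img ...>` tag in `html`.
/// Naive by design: double-quoted attributes only, no real HTML parsing,
/// and an attribute such as `data-src="..."` would also be picked up.
fn extract_image_urls(html: &str) -> Vec<String> {
    let mut urls = Vec::new();
    let mut rest = html;
    while let Some(start) = rest.find("<img") {
        let after_tag = &rest[start + 4..];
        // The tag ends at the next '>' (or at end of input if unterminated).
        let end = after_tag.find('>').unwrap_or(after_tag.len());
        let tag = &after_tag[..end];
        if let Some(pos) = tag.find("src=\"") {
            let value = &tag[pos + 5..];
            if let Some(close) = value.find('"') {
                urls.push(value[..close].to_string());
            }
        }
        // Continue scanning after this tag.
        rest = &after_tag[end..];
    }
    urls
}
```

A proper HTML parser would be the safer choice for real input; the point here is only the difference between "first match" and "all matches on image tags".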
ben_w 114 days ago [-]
Of the 40 or so definitions of consciousness, I have no reason to think any is either necessary or sufficient for any of the things we want AI to do that Transformer models can't (or can but badly and we want them to do better).
AndrewKemendo 114 days ago [-]
This is precisely the first thing I thought
If arguably the person with the highest IQ currently living, is impressed but still not fully satisfied that a computer doesn’t give Nobel prize winning mathematical reasoning I think that’s a massive metric itself
So what then should the first year maths PhD think? I believe Tao obliquely addresses this with his previous post with effectively “o1 is almost as good as a grad student”
nybsjytm 114 days ago [-]
> If arguably the person with the highest IQ currently living, is impressed but still not fully satisfied that a computer doesn’t give Nobel prize winning mathematical reasoning
No offense, but every part of this characterization is really unserious! He says "the model proposed the same strategy that was already identified in the most recent work on the problem (and which I restated in the blog post), but did not offer any creative variants of that strategy." That's very different from what you're suggesting.
The way you're talking, it sounds like it'd be actually impossible for him to meaningfully say anything negative about AI. Presumably, if he was directly critical of it, it would only be because his standards as the world's smartest genius must simply be too high!
In reality, he's very optimistic about it in the future but doesn't find it useful now except for basic coding and bibliography formatting. It's fascinating to see how this very concrete and easily understood sentiment is routinely warped by the Tao-as-IQ-genius mythos.
AndrewKemendo 114 days ago [-]
I’m not sure i understand your complaint
Are you arguing that I’m making an appeal to authority fallacy?
kzz102 115 days ago [-]
It's interesting that humans would also benefit from the "chain of thought" type of reasoning. In fact, I would argue all students studying math will greatly increase their competence if they are required to recall all relevant definitions and information before using them. We don't do this in practice (including teachers and mathematicians!) because recall is effortful, and we don't like to spend more effort than necessary to solve a problem. If recall fails, then we have to look up information, which takes even more effort. This is why in practice, there is a tremendous incentive to just "wing it".
AI has no emotional barrier to wasted effort, which makes it a better reasoner than its innate ability would suggest.
schappim 115 days ago [-]
Showing your work in tests is kind of like “chain of thought” reasoning, but there’s a slight difference. Both force you to break down your process step by step, making sure the logic holds and you aren’t skipping crucial steps. But while showing your work is more about demonstrating the correct procedure, “chain of thought” reasoning pushes you to recall relevant definitions and concepts as you go, ensuring a deeper understanding. In both cases, the goal is to avoid just “winging it,” but “chain of thought” really digs into the recall aspect, which humans tend to avoid because it’s effortful.
Satam 115 days ago [-]
Wow! I love this take. Somehow with all this evidence of COT helping out LLMs, I never thought about using it more myself. Sure, we kind of do it already but definitely not to the degree of LLMs, at least not usually. Maybe that's why writing is so often admired as a way to do great thinking - it enables longer chains of thoughts with less effort.
jvvw 114 days ago [-]
I assumed that everybody did this when trying to solve a maths problem they are stuck on (thinking university type level maths rather than school maths) and when I was teaching I would always get people to go back to the definitions.
I wasn't amazing at maths research (did a PhD and post-doc and then gave up) but my experience was that it was partly thinking hard about things and grappling with what was going on and trying to break it down somehow, but also scanning everything you know related to the problem, trying to find other problems that resemble it in some way that you can steal ideas from etc.
perihelions 115 days ago [-]
I'm so excited in anticipation of my near-term return to studying math, as an independent curiosity hobby. It's going to be epically fun this time around with LLM's to lean on. Coincidentally like Terence Tao, I've also been asking complex analysis queries* of LLM's, things I was trying to understand better in my working through textbooks. Their ability to interpret open-ended math questions, and quickly find distant conceptual links that are helpful and relevant, astonishes me. Fields laureate Professor Tao (naturally) looks down on the current crop of mathematics LLM—"not completely incompetent graduate student..."—but at my current ability level that just means looking up.
*(I remember a specific impressive example from 6 months ago: I asked if certain definitions could be relaxed to allow complex analysis on a non-orientable manifold, like a Klein bottle, something I spent a lot of time puzzling over, and an LLM instantly figured out it would make the Cauchy-Riemann equations globally inconsistent. (In a sense the arbitrary sign convention in CR defines an orientation on a manifold: reversing manifold orientation is the same as swapping i with -i. I understand this now, solely because an LLM suggested looking at it). Of course, I'm sure this isn't original LLM thinking—the math's certainly written down somewhere in its training material, in some highly specific postgraduate textbook I have no knowledge of. That's not relevant to me. For me, it's absolutely impossible to answer this type of question, where I have very little idea where to start, without either an LLM or a PhD-level domain specialist. There is no other tool that can make this kind of semantic-level search accessible to me. I'm very carefully thinking how best to make use of such an, incredibly powerful but alien, tool...)
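The sign-convention remark above can be written out explicitly. The following is a standard textbook computation, not the LLM's actual answer (which is not quoted in the thread); the notation f = u + iv in a local coordinate z = x + iy is assumed here.

```latex
\begin{aligned}
&\text{holomorphic:} && u_x = v_y, \quad u_y = -v_x \\
&\text{anti-holomorphic } (i \mapsto -i): && u_x = -v_y, \quad u_y = v_x \\
&\text{Jacobian (holomorphic case):} && u_x v_y - u_y v_x = u_x^2 + v_x^2 = |f'(z)|^2 \ge 0
\end{aligned}
```

Holomorphic transition maps therefore preserve orientation, so any atlas on which the Cauchy-Riemann equations are globally consistent also orients the surface. On a Klein bottle some transition map must reverse orientation, which flips the sign convention somewhere.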
rossant 114 days ago [-]
I agree. Having access to a kind of semantic full search engine on basically all textbooks on Earth feels like a superpower. Even better would be if it could pinpoint the exact textbook references it found the answer in.
nybsjytm 115 days ago [-]
How will you know if its answers are correct or not?
perihelions 115 days ago [-]
Because I'm verifying everything by hand, as is the whole point of studying pure mathematics.
SOTGO 115 days ago [-]
How can you verify a proof though? Pure math isn't really about computations, and it can be very hard to spot subtle errors in a proof that an LLM might introduce, especially since they seem better at sounding convincing rather than being right.
perihelions 115 days ago [-]
The same way I verify my own proofs of textbook exercises: very cautiously. Subtle errors are a feature of the problem domain, not a novelty.
fragmede 115 days ago [-]
By using Lean, a proof assistant and a functional programming language.
Here's @tao on mathstodon saying he's learning it.
To code proofs in lean, you have to understand the proof very well. It doesn't seem to be very reasonable for someone learning material for the first time.
The examples in this book are extraordinarily simple, and cover material that many proof assistants were designed to be extremely good at expressing. I wouldn't be surprised if an LLM could automate the exercises in this book completely.
Writing nontrivial proofs in a theorem prover is a different beast. In my experience (as someone who writes mechanized mathematical proofs for a living) you need to not only know the proof very well beforehand, but you also need to know the design considerations for all of the steps you are going to use beforehand, and you also need to think about all of the ways your proof is going to be used beforehand. Getting these wrong frequently means redoing a ton of work, because design errors in proof systems are subtle and can remain latent for a long time.
red_trumpet 114 days ago [-]
> think about all of the ways your proof is going to be used beforehand
What do you mean by that? I don't know much about theorem provers, but my POV would be that a proof is used to verify a statement. What other uses are there one should consider?
markusde 113 days ago [-]
The issue is -- there are lots of ways to write down a statement.
One common example is whether you're going to internalize or externalize a property of a data structure: e.g. represent it with a dependent type, or as a property about a non-dependent type. This comes with design tradeoffs: some lemmas might expect internalized representations only, some rewrites might only be usable (e.g. no horrifying dependent type errors) with externalized representations. For math in particular, which involves rich hierarchies of data structures, your choice about internalization can impact which structures from your mathematical library you can use, or the level of fragile type coercion magic that needs to happen behind the scenes.
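A toy Lean 4 sketch of that choice, using list length as the property. The names `Vec`, `Vec.append`, and `append_length_ext` are made up for this illustration and are not taken from any library; only core `List` lemmas are assumed.

```lean
-- Internalized: the length lives in the type (a dependent structure).
structure Vec (α : Type) (n : Nat) where
  data : List α
  size_eq : data.length = n

-- Appending must produce the length proof as part of the result.
def Vec.append {α : Type} {m n : Nat} (v : Vec α m) (w : Vec α n) :
    Vec α (m + n) :=
  ⟨v.data ++ w.data, by rw [List.length_append, v.size_eq, w.size_eq]⟩

-- Externalized: plain lists, with the lengths carried as separate hypotheses.
theorem append_length_ext {α : Type} (xs ys : List α) (m n : Nat)
    (hx : xs.length = m) (hy : ys.length = n) :
    (xs ++ ys).length = m + n := by
  rw [List.length_append, hx, hy]
```

Both versions state the same fact. The difference is in which lemmas apply directly and how much the type checker has to be convinced that indices such as `m + n` and `n + m` line up, which is the kind of design decision described above.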
JonChesterfield 115 days ago [-]
The premise is to have the LLM put up something that might be true, then have lean tell you whether it is true. If you trust lean, you don't need to understand the proof yourself to trust it.
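As a minimal illustration of that workflow in Lean 4 (the theorem names are invented for this example): a claim that comes with a valid proof is accepted, and one without a valid proof is rejected, regardless of who or what proposed it.

```lean
-- Accepted: the kernel checks that `Nat.add_comm a b` proves the statement.
theorem sum_commutes (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- Rejected if uncommented: `rfl` cannot prove a false statement,
-- so Lean reports an error instead of accepting the claim.
-- theorem bogus (a : Nat) : a + 1 = a := rfl
```

What Lean certifies is the formal statement as written, so someone still has to check that the statement says what was intended.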
nybsjytm 115 days ago [-]
The issue is that a hypothetical answer from an LLM is not even remotely easy to put directly into Lean. You might ask the LLM to give you an answer together with a Lean formalization, but this kind of 'autoformalization' is at present not at all reliable.
cma 115 days ago [-]
Tao says that isn't the case for all of it, and that on massive collaborative projects he's done, many non-mathematicians did sections of them. He says someone who understands it well needs to do the initial proof sketch and key parts, but that lots of parts of the proof can be worked on by non-mathematicians.
nybsjytm 115 days ago [-]
If Tao says he's interested in something being coded in lean, there are literal teams of people who will throw themselves at him. Those projects are very well organized from the top down by people who know what they're doing, it's no surprise that they are able to create some space for people who don't understand the whole scope.
This is also the case for other top-profile mathematicians like Peter Scholze. Good luck to someone who wants to put chatgpt answers to random hypotheticals into lean to see if they're right, I don't think they'll have so easy a time of it.
Davidzheng 115 days ago [-]
are you questioning the entire premise of pure mathematics?
nybsjytm 115 days ago [-]
Good luck! That can be pretty hard to do when you're at the learning stage, and I would think doubly so given the LLM style where everything 'looks' very convincing.
WanderPanda 115 days ago [-]
How will we even measure this? Benchmarks are gamed/trained on and there is no way that there is much signal in the chatbot arena for these types of queries?
I think in just a few months the average user will not be able to tell the difference in performance between the major models.
fsndz 115 days ago [-]
Completely agree with Terence Tao. This is a real advancement. I've always believed that with the right data allowing the LLM to be trained to imitate reasoning, it's possible to improve its performance. However, this is still pattern matching, and I suspect that this approach may not be very effective for creating true generalization. As a result, once o1 becomes generally available, we will likely notice the persistent hallucinations and faulty reasoning, especially when the problem is sufficiently new or complex, beyond the "reasoning programs" or "reasoning patterns" the model learned during the reinforcement learning phase.
https://www.lycee.ai/blog/openai-o1-release-agi-reasoning
afro88 115 days ago [-]
The o1 model is hit and miss for me. On one hand it has solved the NYT Connections game [0] each day I've tried it [1]. Other models, including Claude Sonnet 3.5 cannot.
But on the other hand it misses important detail and hallucinates, just like GPT-4o. And can need a lot of hand holding and correction to get to the right answer, so much so that sometimes you wonder if it would have been easier to just do it yourself. Only this time it's worse because you're waiting 20-60 seconds for an answer.
I wonder if what it excels at is just the stuff that I don't need it for. I'm not in classic STEM, I'm in software engineering, and o1 isn't so much better that it justifies the wait time (yet).
One area I haven't explored is using it to plan implementation or architectural changes. I feel like it might be better for this, but need the right problems to throw at it.
I'm not a mathematician much beyond AP Calc in high school (almost 40 years ago). I am deeply fascinated by Bézier curves and geometric continuity. I've spent a lot of time digging up research papers and references about this and related Computer Aided Geometric Design mathematics. Mostly I skim them for the illustrations and more geometric relations. For several years I've been trying to understand how to make sure that a Bézier curve is G3 to an adjoining curve, given the tangent direction, and first and second curvature derivatives.
I've tried a variety of ways to ask various LLMs to help solve this. Finally with access to ChatGPT o1-preview I was able to get a good answer. The first answer was wrong, but with a little more prompting and clarification I was able to get the answer I wanted to relate the positions of P0, P1, P2 and P3 so that a Bézier curve could be G3. This isn't something that is unknown because there are many CAD programs which can do this already, but I had not been able to find the answer I was looking for in a form that was useful to me.
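For readers unfamiliar with the setup, these are the standard endpoint quantities of a cubic Bézier curve that any such continuity condition is built from. This is textbook material and not the answer o1 produced, which is not reproduced in this thread; the planar case is assumed, with × denoting the 2D cross product.

```latex
\begin{aligned}
B(t) &= (1-t)^3 P_0 + 3(1-t)^2 t\, P_1 + 3(1-t) t^2 P_2 + t^3 P_3 \\
B'(0) &= 3(P_1 - P_0) \\
B''(0) &= 6(P_2 - 2P_1 + P_0) \\
B'''(0) &= 6(P_3 - 3P_2 + 3P_1 - P_0) \\
\kappa(0) &= \frac{|B'(0) \times B''(0)|}{|B'(0)|^3}
          = \frac{2}{3}\,\frac{|(P_1 - P_0) \times (P_2 - P_1)|}{|P_1 - P_0|^3}
\end{aligned}
```

Roughly, G1 matches the tangent direction (fixing the line through P_0 and P_1), G2 also matches the curvature κ(0) (constraining P_2), and G3 additionally matches the derivative of curvature with respect to arc length, which brings in B'''(0) and therefore P_3. That last condition is not derived here.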
I don't really know where that puts o1-preview relative to a math grad student, but after spending tons of time over a couple years on this pet project, getting an answer from a chat bot was one of the more magical moments I've had with technology in a long time.
ak_111 115 days ago [-]
He mentions that he posed to o1 the same challenge he posed to a previous GPT (which he also previously blogged about), so I am wondering how much o1 benefited from potentially "seeing" this discussion in its training set (which probably contains a very recent snapshot of the world wide web).
lysecret 115 days ago [-]
In some of the responses o1 actually was telling me it had a cutoff of 2023. not sure if they officially stated it somewhere.
lewhoo 114 days ago [-]
I wonder if those responses could already be influenced by the fact that the cutoff for some of the models out there was indeed 2023 and people wrote about it all over the internet.
gcanyon 115 days ago [-]
> The experience seemed roughly on par with trying to advise a mediocre, but not completely incompetent, graduate student.
Coming from Terence Tao that seems pretty remarkable to me?
nybsjytm 115 days ago [-]
Daniel Litt, an algebraic geometer on twitter, said "Pretty impressed by o1-preview! Still not having much luck asking it to do any interesting math but it seems much more reliable with simple things; I can actually imagine it being a net time-saver at this point with some non-mathematical tasks."
Any other takes by mathematicians out there?
knotthebest 110 days ago [-]
Do note that Terry has access to the full o1. o1-preview is, well, a preview.
nmca 115 days ago [-]
Note the selection effect in “a mediocre graduate student” (that got to work with Terry Tao)
ocular-rockular 114 days ago [-]
I don't understand why this is news? This could have been said by any one particular contributor from HackerNews but just because it's from Terence Tao it hits the front page? I understand that the guy is a great mathematician, but why is his input on this any more valuable than the myriad of discussions about o1 from other professionals on here?
j_maffe 114 days ago [-]
If anyone comes across comments from professionals on the same level as Terence Tao, I'd love for them to share them.
ocular-rockular 114 days ago [-]
Most people would likely not engage with that commentary simply because they don't enjoy the same celebrity status as Tao. Yet, I think those voices are equally important to be heard. Terence isn't the only PhD/professional using or discussing these tools.
j_maffe 114 days ago [-]
Of course not. And he is relatively more popular than some others on his level of expertise. But I think Tao's opinion is a bit more interesting than a general PhD/professional.
cwillu 114 days ago [-]
Someone at the top of their field discussing the capability of models in their field is much more interesting than someone mediocre in their field making trite observations about capabilities outside their field.
ocular-rockular 114 days ago [-]
There is any number of people discussing o1 from the context of their field. So again, why are we valuing one set of discussions above another? Terence Tao may be great, but he's not the end all, be all of commentary. There's plenty of other PhDs talking about this very same thing.
jackleeb 110 days ago [-]
there are levels to this shit.
the best competitive programmer in the world (gennady korotkevich, aka tourist) recently crossed the 4000 ELO barrier in Codeforces. o1 is about 1807 ELO.
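To get a feel for what a gap like 4000 vs. 1807 means, here is a small sketch of the classic Elo expected-score formula. Treating Codeforces ratings as plain Elo is an assumption made for illustration only (Codeforces uses its own Elo-like system), and the function name `elo_expected_score` is invented for this example.

```rust
/// Expected score (roughly, win probability) of a player rated `ra`
/// against a player rated `rb` under the standard Elo logistic model.
/// Assumption: ratings behave like classic Elo with a 400-point scale.
fn elo_expected_score(ra: f64, rb: f64) -> f64 {
    1.0 / (1.0 + 10f64.powf((rb - ra) / 400.0))
}
```

Under that assumption, `elo_expected_score(1807.0, 4000.0)` comes out on the order of 10^-6, so the two ratings are not remotely in the same tier.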
the best ai model is compared against the best human in the context of competition programming, to set a clear standard of comparison.
similarly, terence tao represents the highest levels of math in analysis.
his input is valuable in regards to math. his summary of the current capabilities of o1 is important because we can then understand the level of competence the best ai models have right now, and set a standard of comparison just like with coding.
side note:
any number of phds = not the same expertise.
there are thousands of phds who graduate every year, let alone thousands of unemployable phds who fail to get a professorship.
there are only 2-4 fields medalists chosen every 4 years.
nybsjytm 114 days ago [-]
Tao is famous for being the world's most singular and inspiring genius. It's kind of a meme position* that most people accept because they think everyone else accepts it, but for people of a certain inclination, it makes anything he might say into a Pronouncement to read depths into. If you think someone is possibly the greatest mathematician of all time, you'd be interested in everything they think!
(* I think one could very legitimately view him as the top researcher in harmonic analysis in the world - he is a great mathematician - but it's not clear to me how people go from that to Epochal Genius and his extreme celebrity status across STEM)
sva_ 114 days ago [-]
Because it is an update on his previous post that got discussed here.
j_timberlake 114 days ago [-]
[flagged]
busyant 115 days ago [-]
Well, one thing is clear.
Math grad students everywhere now have a benchmark to determine if Terry Tao considers them to be mediocre or incompetent.
bbor 114 days ago [-]
Does anyone here think this will change without a full cognitive apparatus? Aka “agents”, to use the modern term? I have my doubts, but I’m relatively uninformed about the cutting edge of pure ML itself.
Just off the top of my head, maybe a RLHF run performed by academic experts and geared towards “creative applications” could get us farther than we are? Given how much the original RLHF run cost with underpaid workers in developing countries that might be exorbitantly expensive, but it’s worth a dream. Perhaps as a governmental or NGO-driven open source initiative…
Of course, a core problem here is defining “creativity” in stringent — or in Chomsky’s words, “scientific” — terms. RLHF dodged that a bit by leaning on the intuitive capabilities of your human critics. I’m constantly opining about how LLMs solved the frame problem, but perhaps it’s better characterized as a partial solution for a relatively easy/basic environment: stories about the real world. The Abstract/Academic/Scientific Frame Problem might be another breakthrough away, yet…
If you know the contours of the answer and can describe what you are looking for it can quickly find it for you.
vavooom 114 days ago [-]
Most surprising thing about this article is discovering that 'mathstodon' exists and Terence is active on it!
ColinWright 114 days ago [-]
We started Mathstodon, an instance of Mastodon-the-Platform, on April 12, 2017.
So we've been up for over 7 years, and have just over 19K accounts.
j_maffe 114 days ago [-]
Awesome work! I love what the server has grown into.
ColinWright 114 days ago [-]
Thank you.
j_maffe 114 days ago [-]
Mathstodon is one of the most active Mastodon instances. I have no idea how it reached its current level of activity.
kldnav 115 days ago [-]
Tao and Aaronson are optimistic about LLMs. What are they telling their students? That math and science degrees will soon have the same value as a degree in medieval dance theory?
If they are overly optimistic, perhaps it would be good to hear the opinions of Wiles and Perelman.
raincole 115 days ago [-]
Tao isn't that optimistic. His opinion on LLMs is rather conservative.
> If you want to prove an unsolved conjecture, one of the first things you need to do is to break it up into smaller pieces, each of which has a better chance of being proven. But you will often break up a problem into harder problems. It’s very easy to transform a problem into one that’s harder than into one that’s simpler. And AI has not demonstrated any ability to be any better than humans in this regard.
Not sure if O1 changed his mind tho.
ljlolel 115 days ago [-]
If you look at a lot of people’s PhDs, we now teach these things to first-years. PhDs today do incredibly deep work, and the edge of science will just go further.
Davidzheng 115 days ago [-]
What does this mean? Of course math AI will take over top research in the next ten years, but usefulness to society has never been a goal of pure mathematics. I don't know if you understand the motivation for studying pure math. Personally I think it will be mostly good for research math.
asdasjhG 115 days ago [-]
The "value of a degree" means the employment prospects for the degree holder.
Which is going to zero if the optimistic predictions are correct, so the optimistic professors should warn their students.
I understand the motivation for pure math quite well. It is about beauty, understanding things and discovering things for oneself. If computers do the work, the discovery part is gone and pure math is ruined.
For the non-research part, the AI zealots will want to replace all human labor with software.
olalonde 115 days ago [-]
Why are you saying this as if it were a bad thing? Just because software becomes better than us at something doesn't mean we can't do it for fun (see the chess community, for example).
Davidzheng 115 days ago [-]
do you also value your personal relationships based on employment prospects?
buddhistdude 114 days ago [-]
not fully related to what the parent is saying, but I need to get this off my chest:
isn't this development obviously going to result in the deprecation of the value of the human intellect to near-zero? which is the thing that virtually all people on this platform base their livelihood on?
there's such a deafening silence around this topic on the internet where there should be - i don't know what but not this silence. we don't know what to do right? and we're avoiding this topic.
with this version they broke through the assumed wall in LLM development, which was the last copium that we could believe in. the wall is broken and now it's just a matter of time until your capacity to think will be completely unneeded. the machine will do it more accurately and more quickly by orders of magnitude.
am I a doomer? I was in the home country of my parents recently, that is completely dysfunctional and war is on the verge of breaking out. what I learned there is that humans stay ignorant of great dangers until the very moment in which it affects them. this must've been the case with all the great wars that we've had. the water is rising but until I start to suffocate I don't agree to see it. i make up copes, or I think some are going to drown but I'm safe, or I distract myself.
what are all the software engineers here thinking? what's your cope for this? or are we all freezing in shock right now? this o1 is solving problems that i know many of my colleagues can never solve. what are we hoping for I think? I don't have a future because my future was the image that I had of it. and no image of the future that would be nice to keep around seems plausible at this point.
zamadatix 114 days ago [-]
I wouldn't say there is a silence (as in avoidance) there are just folks convinced it's going to completely replace people (with 2 main subgroups: utopia and dystopia), folks convinced it's a parlor trick and never going to result in more, and folks convinced it's just the next efficiency increasing tool where some busy-ness we have will go away but only to make room for increasing total output not for replacing everyone wholesale.
Generally these folks have all said their piece and are tired of talking about it every time LLMs come up -> silence (as in nothing more to say) as each group is self convinced and most don't necessarily feel the need to get 100% of folks on board with their view. The dystopia or "doomer" group are the main ones left feeling like they need more of an answer, the rest move on quietly in either excitement or disinterest.
kiehaBe 114 days ago [-]
[flagged]
gary_0 115 days ago [-]
Tao mentions grad students; I wonder how they feel reading this?
As LLMs continue to improve I feel like anyone making a living doing the "99% perspiration" part of intellectual labor is about to enter a world of hurt.
fragmede 115 days ago [-]
> The experience seemed roughly on par with trying to advise a mediocre, but not completely incompetent, graduate student.
And you thought you had imposter syndrome before!
fishcrackers 115 days ago [-]
[dead]
lupire 114 days ago [-]
He clarified that he is referring to a very specific task of assisting an expert, not PhD research in general.
teaearlgraycold 115 days ago [-]
Or can everyone now lead research projects and build businesses?
asdasjhG 115 days ago [-]
No, almost everyone who gets funding for a business already belongs to the monied royalty and gets it either directly from his family, via friends of the family or laundered through a VC.
There are exceptions of course, but that's how the bulk of businesses, especially those with stupid ideas are funded. In the latter category success does not even matter, the trust fund baby just has to have the appearance of a leader position.
teaearlgraycold 115 days ago [-]
There is truth to this, but you’re overstating it. If AI is cheap and can replace grunt workers then we’ll have a massive wave of new businesses solving problems that previously required a massive capital investment.
teaearlgraycold 114 days ago [-]
My cope is that by the time my job is stolen nearly every other job that involves sitting at a computer has also been lost. Once they come for my paycheck there will have already been an upheaval. Maybe it will be a disaster, maybe not. But I’m still better positioned than your average person.
sandspar 115 days ago [-]
Are you kidding or being serious? Most business owners are just regular people. Have you ever worked in a small business before?
tambourine_man 114 days ago [-]
Glad to see Tao using Mastodon instead of Twitter.
afian 115 days ago [-]
As a previously "mediocre, but not completely incompetent, graduate student" at a top research university (whose famous advisor was understandably frustrated with him), I consider this a huge win!
sgt101 115 days ago [-]
Is there a list of discoveries or significant works/constructions made by people collaborating with LLMs? I mean as opposed to specific deep networks like Alphafold or Graphcast?
rvnx 115 days ago [-]
It may cause a reputation or legal issue, so it is not in their interest to admit it. In the real world, are there PhD students or researchers using ChatGPT to move forward and help them think through their ideas?
Obviously yes, but admitting it may not be the right move.
Super spreadsheet - very useful. Thank you for doing the work and sharing.
lalaithion 115 days ago [-]
It needs a bigger context, but the moment someone can feed an entire GitHub repo into this thing and ask it to fix bugs... I think O2 may be the beginning of the end.
benreesman 115 days ago [-]
Reading anything Terence Tao writes is thought provoking and I doubt I’m seeing anything others haven’t.
There’s at least a “complexity” if not a “problem” in terms of judging models that to a first approximation have been trained on “everything”.
Have people tried putting these things up against serious mathematical problems that are well studied? With or without Lean hinting, has anyone gotten, like, the Shimura-Taniyama conjecture/proof out?
No FLT yet, but as someone who was initially quite skeptical, I’m starting to be convinced!
lupire 114 days ago [-]
Those are not serious mathematical problems. Those are toy math problems, crafted backwards from known facts, designed to be solved in under 1hr, that are hard for most humans because they lack the memorization and recall and search speed that the computer has.
kevinventullo 112 days ago [-]
Sure. But even many high-caliber research mathematicians can’t do Putnam problems in a heartbeat. If we get to the point where an LLM can solve any homework problem that appears in a textbook, including graduate textbooks, that would already be something like a “lemma prover” if not a full-blown “theorem prover”.
Anyway, I think five years ago I was skeptical that ML would even get to the point of being able to solve competition problems, and I was proven wrong, so my priors have been updated.
maxglute 114 days ago [-]
>The experience seemed roughly on par with trying to advise a mediocre, but not completely incompetent, graduate student. However, this was an improvement over previous models, whose capability was closer to an actually incompetent graduate student.
Appreciate the no fucks given categorization of grad students.
artninja1988 115 days ago [-]
>could not generate conceptual ideas of their own
Is the most important part imo. A big goal should be some ai system coming up with its own discovery and ideas. Really unclear how we can get from the current paradigm to it coming up with something like general relativity, like Einstein. Does it require embodiment?
sfink 115 days ago [-]
Why should that be a big goal? It's difficult, it's not what they are good at, and they can get a lot better at assisting in other ways through incremental improvements. I'm happy to leave this part to the humans, at least for now, especially when there's so much more improvement still possible in other directions.
It also seems like one of those things where we ought to ask whether we should, before asking whether we could. Why not focus on areas that are easier, more beneficial, and less problematic from a "should" perspective?
roywiggins 115 days ago [-]
we don't know how to reliably produce humans who produce GR-level ideas, this might be biting off a lot more than we can chew
reverseblade2 115 days ago [-]
Here's a little test I try on LLMs. So far only O1 and Microsoft Copilot (bing chat) were able to solve it:
Find a, b, c distinct positive integers satisfying a^3 + b^3 = c^4. Hint: try dividing all sides by c^3, then giving values to (a/c) and (b/c).
zeroonetwothree 115 days ago [-]
Any integer that is a sum of 2 cubes produces a solution. Since if x^3 + y^3 = z then we have (xz)^3 + (yz)^3 = z^4. So this doesn't seem super interesting?
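The construction above can be checked mechanically. This is a minimal sketch, not a definitive treatment; the function name `fourth_power_solution` is made up for illustration. Note that the original puzzle asks for distinct integers, which this construction gives only when x and y differ and neither equals 1 (x = 1 would make a = c).

```python
def fourth_power_solution(x: int, y: int) -> tuple[int, int, int]:
    """Build (a, b, c) with a^3 + b^3 = c^4 from any positive integers x, y.

    Hypothetical helper for illustration: with z = x^3 + y^3,
    (x*z)^3 + (y*z)^3 = z^3 * (x^3 + y^3) = z^4.
    """
    z = x**3 + y**3
    return x * z, y * z, z


# x = 2, y = 3 gives z = 35, so (a, b, c) = (70, 105, 35).
a, b, c = fourth_power_solution(2, 3)
print(a, b, c, a**3 + b**3 == c**4)
```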
d0mine 115 days ago [-]
"with even the latest tools the effort put in to get the model to produce useful output is still some multiple (but not an enormous multiple now, say 2x to 5x) of the effort needed to properly prompt and verify the output. However, I see no reason to prevent this ratio from falling below 1x in a few years, which I think could be a tipping point for broader adoption of these tools in my field"
Given the log scale on compute needed to improve performance, it is not guaranteed that the ratio can be improved that much in a few years.
aoeusnth1 115 days ago [-]
The y axis is also log scale (log likelihood). It’s a power law, not an exponential law.
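To spell out the distinction as a rough sketch: the symbols below are assumptions for illustration only (y for the plotted metric, x for compute, and constants α and β > 0), not values read off any actual plot. A straight line on log-log axes is a power law, while a straight line with only the y axis logged would be an exponential.

```latex
\begin{align*}
\text{log-log straight line:}\quad
  & \log y = \alpha - \beta \log x
  \;\Longrightarrow\; y = e^{\alpha}\, x^{-\beta} \\
\text{log-linear straight line:}\quad
  & \log y = \alpha - \beta x
  \;\Longrightarrow\; y = e^{\alpha}\, e^{-\beta x}
\end{align*}
```

Under the power-law form, every 10x increase in x multiplies y by the same constant factor, 10^(-β).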
I think AI answer is not correct, it may be some textbook interpretation but I was expecting Euclid's exact wording.
Edit: Google's Gemini gives the exact wording of the postulate and then comments that this means that you can draw one line between two points. I think this is better.
pama 115 days ago [-]
The original text is: Ἠιτήσθω ἀπὸ παντὸς σημείου ἐπὶ πᾶν σημεῖον εὐθεῖαν γραμμὴν ἀγαγεῖν. Roughly: let it be required that from any point to any point it is possible to draw a straight line.
Both gpt4o and o1 roughly know the correct original text, so prompting, the model’s background memory, or random chance may influence your outcomes, though hopefully (in an improved model) you should never get incorrect info.
Edit: in case it isn't clear, I could not reproduce this error on my end with o1-mini
roywiggins 115 days ago [-]
it's definitely wrong though, "exactly one" straight line between two points is a different postulate and a stronger one.
Euclid has been translated, restated, and re-presented in enough books and textbooks that I'd expect a big-enough LLM to have actually memorized this correctly tbh
pama 115 days ago [-]
Agreed. That is what the original poster said. I didn't manage to reproduce the error on my end, but I don't know the full context, or maybe the memory on my end changes the output.
supermatt 115 days ago [-]
> I think AI answer is not correct, it may be some textbook interpretation but I was expecting Euclid's exact wording
It was written before English even existed. That said, the original never implied "exactly one", so I agree it's a bad translation.
layer8 115 days ago [-]
Regardless of the wording being exact or not, ChatGPT’s answer is incorrect in its contents. The statement “exactly one” requires the parallel postulate, since otherwise it’s not necessarily true. Specifically in spherical geometry, which is considered to be consistent with Euclid’s first four postulates (i.e. without the parallel postulate).
The bottom line is, you can’t take any single LLM statement at face value, even in seemingly easy to answer cases like this.
roywiggins 115 days ago [-]
it's the point-line postulate, you can use it as part of a set of axioms equivalent to Euclid's but it is definitely not one of Euclid's
Euclid wrote in ancient Greek, so the "exact wording" in English does not exist.
nyrikki 115 days ago [-]
Euclid's Elements is less pervasive on the Internet than content produced for Liberal Arts math courses. As those courses tend to emphasize critical thinking and problem-solving math over pure theory and advanced concepts, they tend to be far more common and tend to win compared to more domain specific meanings.
Problems with polysemy across divergent, more advanced theories have been one of my biggest challenges in probing some of my areas of interest.
Funny enough, one of my pet areas of obscure interest, riddled basins, is constantly muddied not by math, but LSAT questions, specifically non-math content directed at a reading comprehension test: "September 2006 LSAT Section 1 Question 26"
IMHO a lot of the prompt engineering you have to do with these highly domain specific problems is avoiding the most common responses in the corpus.
LLM responses will tend to reflect common usage, not academic terminology unless someone cares enough to change that for a specific case.
wging 115 days ago [-]
“Exact wording” would be Ancient Greek. Euclid did not even write in English. You’re checking whether the model matches a specific translation, which is not valuable. If you search around you’ll find many sources that choose a more intelligible phrasing.
ghransa 114 days ago [-]
I suspect, but am not certain, that if it had all of formalized mathematics in its context window it could likely extend the edges slightly further. Would be an interesting experiment regardless.
lupire 114 days ago [-]
> GPT-o1, which performs an initial reasoning step before running the LLM.
Is that an accurate description?
I thought it just runs the LLM for longer, and multiple times, and truncates the beginning of the output.
fizx 114 days ago [-]
I'm curious how well a o1-like model thinks, given minutes instead of seconds and the temperature set relatively high.
2muchcoffeeman 115 days ago [-]
What a burn
“The experience seemed roughly on par with trying to advise a mediocre, but not completely incompetent, graduate student.”
alexnewman 114 days ago [-]
I tried giving it questions like 9.11 > 9.9 and it got basic stuff like that wrong more often than right
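One commonly suggested reading of this failure (an assumption here, not something established in this thread) is that "9.11 > 9.9" is ambiguous between decimal numbers and version-style numbering. A minimal Python sketch of the two readings, where `version_key` is a hypothetical helper introduced only for illustration:

```python
from decimal import Decimal


def version_key(text: str) -> tuple[int, ...]:
    """Split a dotted string into integer parts, the way version numbers are compared."""
    return tuple(int(part) for part in text.split("."))


# Read as decimal numbers: 9.11 is smaller than 9.90.
as_decimals = Decimal("9.11") > Decimal("9.9")

# Read as version numbers: (9, 11) sorts after (9, 9).
as_versions = version_key("9.11") > version_key("9.9")

print(as_decimals, as_versions)
```

The two readings give opposite answers, so a prompt that states which one is meant removes at least this source of ambiguity.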
giardini 114 days ago [-]
Anyone tried using GPT in conjunction with Doug Lenat's tools (AM or Eurisko)?
ninetyninenine 115 days ago [-]
A specialized LLM could possibly meet his criteria already.
317070 115 days ago [-]
Probably. The missing factor is the dataset and the fact that so far OAI seems to be the only one who has figured out how to train this thing for reasoning.
But yeah, given o1 exists, it looks very doable. It's hard to imagine a reason for why something matching his criteria would be more than a decade out.
ein0p 115 days ago [-]
Idk I think the fact that it needs “hints” and “prodding” is a good thing, myself. Otherwise we wouldn’t need humans to get those answers, would we. I want it to augment humans, not replace them.
lupire 114 days ago [-]
The GPT Share links are 404 for me
oglop 114 days ago [-]
Ok lol. I mean just feed it serge lang problems and see what it does is all he’s saying.
It performs way better than undergrads. Funny he didn’t point that out but only made some slight to it about being a bad graduate student. Don’t believe me, open the book and ask away. It’s amazing, even if it is a “mediocre graduate student” which is far better than a good graduate student or professor that gives you no help or time for all that money you forked over.
It’s already worth the money, ignore this shitty write up by someone who doesn’t need its help.
jarbus 115 days ago [-]
I wonder how long it took for each of the responses it gave
diggan 115 days ago [-]
It varies a lot. If it's a simple question, it just does 3-4 sections of "thinking & reflection" but for more complicated ones I think I've seen something like 10 or more. Maybe 3-4 seconds per section on average I'd guess.
zamadatix 115 days ago [-]
It's unclear if Terence is referring to "GPT-o1... a prototype version of the model that I was granted access to" as in "he was given access to GPT-o1 by the research team" or as in "he is using o1-preview". The differences in scale and quality between his shared output and the answer I get trying the same prompt from o1-preview suggest perhaps the former (otherwise luck). I haven't actually seen any examples of how long o1 "full" will think about this kind of question, though I expect it's somewhere in the same ballpark given the thought expansion still only has one real concept in it.
darby_nine 115 days ago [-]
JUST when you thought the chatbot was dead
__loam 115 days ago [-]
This seems like it's just feeding the output back into the model and using more compute to try and get better answers. If that's all, I don't see how it fundamentally solves any of the issues currently present in LLMs. Maybe a marginal improvement in accuracy at the cost of making the computation more expensive. And you don't even get to see the so called reasoning tokens.
nektro 114 days ago [-]
lol
MrFots 115 days ago [-]
Incompetent Graduate Students is the name of my new sketch group.
idunnoman1222 114 days ago [-]
How did this guy not know how large language models work? Fancy compression algorithm for all written knowledge, how could it invent that which was not an input?
HN, and the internet in general, have become just an ocean of reactionary sandbagging and blather about how "useless" LLMs are.
Meanwhile, in the real world, I've found that I haven't written a line of code in weeks. Just paragraphs of text that specify what I want and then guidance through and around pitfalls in a simple iterative loop of useful working code.
It's entirely a learned skill, the models (and very importantly the tooling around them) have arrived at the base line they needed.
It's a much, much more productive world once you just knuckle down and learn how to do the work.
edit: https://aider.chat/ + paid 3.5 sonnet
The fact is that everyone who says they've become more productive with LLMs won't say how exactly. I can talk about how Vim has made it more enjoyable to edit code (keybindings and motions), how Emacs is a good environment for text tooling (a Lisp machine), how I use technical books to further my learning (so many great books out there). But no one really shows how they're actually solving problems with LLMs and how the alternatives were worse for them. It's all claims that it's great, with no further elaboration on the workflows.
> I haven't written a line of code in weeks. Just paragraphs of text that specify what I want and then guidance through and around pitfalls in a simple iterative loop of useful working code.
Code is intent described in terms of machine actions. Those actions can be masked by abstracting them into more understandable units, so we don't have to write opcodes but can use Python instead. Programming is basically making the intent clear enough that we know which units we can use. Software engineering is mostly selecting the units so as to minimize the work needed once the intent or the foundational actions change.
Chatting with an LLM looks to me like your intent is either vague or you don't know the units to use. If it's the former, then I guess you're assuming it is the expert and will guide you to the solution you seek, which means you believe it understands the problem better than you do. The latter is stranger, as it looks like playing around with car parts while ignoring the manuals they come with.
What about boilerplate and common scenarios? I agree that LLMs help a great deal with that, but the fact is that there were already perfectly good tools for that, like snippets, templates, and code generators.
Same thing except now it's also many tech-savvy people joining in with the tech-unsavvy in saying that prompting isn't a real skill...but people who know better know that it is.
On average, people are awfully bad at describing exactly what it is they want. Ever speak with a client? And you have to go back and forward for a few hours to finally figure out what it is they wanted? In that scenario you're the LLM. Except the LLM won't keep asking probing questions and clarifications - it will simply give them what they originally asked for (which isn't what they want). Then they think the LLM is stupid and stop trying to ask it for things.
Utilizing an LLM to its full potential is a lot of iterative work and, at least for the time being, requires having some understanding of how it works underneath the hood (eg. would you get better results by starting a new session or asking it to forget previous, poorly worded instructions?).
An LLM is a word (token?) generator which can be amazingly consistent according to its model. But rarely is my end goal to generate text. It's either to do something, to understand something, or to communicate. For the first, there are guides (books, manuals, ...), for the second, there are explanations (again books, manuals,...), and the third is just using language to communicate what's on my mind.
That's the same thing with search engines. I use them to look for something. What I need first is a description of that something, not how to do the "looking for". Then once you know what you want to find, it's easier to use the tool to find it.
If your end goal can be achieved with LLMs, be my guest to use them. But I'm wary of people taking them at face value and then pushing the workload onto everyone else (like developers using electron).
ChatGPT has helped me write some scripts for things that otherwise probably would have taken me at least 30+ minutes and it wrote them in <10 seconds and they worked flawlessly. I've also had times where I worked with it to develop something that ended up taking me 45 minutes to only ever get error-ridden code that I had to fix the obvious errors and rewrite parts of it to get it working. Sometimes during this process it actually has taught me a new approach to doing something. If I had started from scratch coding it by myself it probably would have taken me only ~10 minutes. But if I was better at prompting what if that 45 minutes was <10 minutes? It would go from a time loss to a time save and be worth using. So improving my ability to prompt is worthwhile as long as doing so trends towards me spending less time prompting.
Which is thankfully pretty easy to track and test. On average, as I get better at prompting, do I need to spend more or less time prompting to get the results I am looking for? The answer to that is largely that I spend less time and get better results. The models constantly changing and improving over time can make this messy - is it the model getting better or is it my prompting? But I don't think models change significantly enough to rule out that I spend less time prompting than I have in the past.
>>> you do need to break down the problem into smaller chunks so GPT can reason in steps
To search well, you need good intuition for how to select the right search terms.
To LLM well, you can ask the LLM to break the problem into smaller chunks, and then have the LLM solve each chunk, and then have the LLM check its work for errors and inconsistencies.
And then you can have the LLM write you a program to orchestrate all of those steps.
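A rough sketch of such an orchestration program, under stated assumptions: `llm` is a hypothetical callable that takes a prompt string and returns the model's reply as a string (no particular vendor API is implied), and the prompt wording is illustrative only.

```python
from typing import Callable


def solve_in_steps(problem: str, llm: Callable[[str], str]) -> str:
    """Decompose a problem, solve each chunk in order, then ask for a review.

    `llm` is an assumed prompt-in, text-out function supplied by the caller.
    """
    # 1. Ask the model to break the problem into smaller chunks.
    plan = llm(f"Break this problem into smaller steps, one per line:\n{problem}")
    steps = [line.strip() for line in plan.splitlines() if line.strip()]

    # 2. Solve each chunk, feeding earlier answers back in as context.
    answers: list[str] = []
    for step in steps:
        so_far = "\n".join(answers)
        answers.append(
            llm(f"Problem: {problem}\nWork so far:\n{so_far}\nSolve this step: {step}")
        )

    # 3. Have the model check the combined work for errors and inconsistencies.
    draft = "\n".join(answers)
    return llm(
        "Check this work for errors and inconsistencies, "
        f"then give a corrected final answer:\n{draft}"
    )
```

This makes one call for the plan, one per step, and one for the review, so cost grows with the number of steps the model proposes. The final review is still the model checking itself, so a human check of the result remains necessary.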
LLMs are in the evolutionary phase, IMHO. I doubt we're going to see revolutionary improvements from GPTs. So I say time and time again: the technology is here, show it doing all the marvelous things today. (btw, this is not directed at your comment in particular and I digressed a bit, sorry).
If prompting ability varies then this is not some objective question, it depends on each person.
For me I've found more or less every interaction with an LLM to be useful. The only reason I'm not using it continually for 8 hours a day is because my brain is not able to usefully manage that torrent of new information and I need downtime.
Enter technical specifications in English as input language, get code as destination language.
When I encounter somebody who says they do not write code anymore, I assume that they either:
1. Just don't do anything beyond the simplest tutorial-level stuff
2. or don't consider their post-generation edits as writing code
3. or are just bullshitting
I don't know which it is for each person in question, but I don't trust that their story would work for me. I don't believe they have some secret sauce prompting that works for scenarios where I've tried to make it work but couldn't. Sure I may have missed some ways, but my map of what works and what doesn't may be very blurry at the border, but the surprises tend to be on the "doesn't work" side. And no Claude doesn't change this.
It can write unit tests, but makes similar mistakes, so I have to rewrite them… but it nevertheless still makes it easier to write those tests.
It writes good first-drafts for documentation, too. I have to change it, delete some stuff that's excess verbiage, but it's better than the default of "nobody has time for documentation".
My experience is just as you describe it: I ask a question whose answer is in stackoverflow or fucking geeks4geeks? Then it produces a good answer. Anything more is an exercise in frustration as it tries to sneak nonsense code past me with the same confident spiel with which it produces correct code.
Consider this round-trip in Google Translate:
"དེ་ནི་སྐད་སྒྱུར་པ་ཞིག་ཡིན། འོན་ཀྱང་ཁོང་ཚོ་རང་བྱུང་སྐད་སྒྱུར་གྱི་སྐད་སྒྱུར་ནང་ལ་ཡག་པོ/ངན་པ/ཁྱད་མཚར་པོ/མགོ་སྐོར་གཏོང་བ་འདྲ་པོ་ཡོད།"
"It's a translator. But they seem to be good/bad/weird/delusional in natural translations. I have a"
(Google translate stopped suddenly, there).
I've tried using ChatGPT to translate two Wikipedia pages from German to English, as it can keep citations and formatting correct when it does so; it was fine for the first 2/3rds, then it made up mostly-plausible statements that were not translated from the original for the rest. (Which I spotted and fixed before saving, because I was expecting some failure).
Don't get me wrong, I find them impressive, but I think the problem here is the Peter Principle: the models are often being promoted beyond their competence. People listen to that promotion and expect them to do far more than they actually can, and are therefore naturally disappointed by the reality.
People like me who remember being thrilled to receive a text adventure cassette tape for the Commodore 64 as a birthday or Christmas gift when we were kids…
…compared to that, even the Davinci model (that really was autocomplete) was borderline miraculous, and ChatGPT-3.5 was basically the TNG-era Star Trek computer.
But anyone who reads me saying that last part without considering my context, will likely imagine I mean more capabilities than I actually mean.
For one of them, it was the entire duration of my working for them.
They didn't understand why it was taking so long despite constantly changing what they asked for.
The other 90% is mostly mushy human stuff, fleshing out the problem, setting expectations etc. Helping a group of people reach a solution everyone is happy with has little to do with technology.
> Helping a group of people reach a solution everyone is happy with has little to do with technology.
This one specific thing, is actually something that ChatGPT can help with.
It's not as good as the best human, or even a middling human with 5 years' business experience, but rather it's useful because it's good enough at so many different domains that it can be used to clarify thoughts and explain the boundaries of the possible: Google Translate for business jargon (though like Google Translate it is also still often wrong), the ultimate "jack of all trades, master of none".
There's no substance to be found, no added information; it's just repeating what came before, badly, which is exactly the kind of software that would be better off not written if you ask me.
The plan to rebuild society on top of this crap is right up there with basing our economy on manipulating people into buying shit they don't need and won't last so they have to keep buying more. Because money.
I'm afraid you might be right.
We've accepted a lot of crap lately just to get what we think we want, convenience is a killer.
But I would otherwise say that most (though not all*) AI researchers seem to be deeply concerned about the set of all potential negative consequences, including mutually incompatible outcomes where we don't know which one we're even heading towards yet.
* And not just Yann LeCun — though, given his position, it would still be pretty bad even if it was just him dismissing the possibility of anything going wrong
Exactly. I made a game testing prompting skills a few days earlier, to share with some close friends, and it was your comment that inspired me to translate the game into English and submit it to HN. ( https://news.ycombinator.com/item?id=41545541 )
I am really curious about how other people write prompts, so while my submission only got 7 points, I'm happy that I can see hundreds of people's own ways to write prompts thanks to HN.
However, after reading most prompts (I may have missed some), I found exactly 0 prompts containing any kind of common prompting technique, such as "think step by step", explaining specific steps to solve the problem instead of only asking for final results, or few-shot examples (showing example inputs and outputs). Half of the prompts simply ask the AI to do the thing (at least asking correctly). The other half do not make sense; even if we showed the prompt to a real human, they wouldn't know what to reply with.
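To make the techniques named above concrete, here is a minimal sketch of how a few-shot prompt with a step-by-step instruction might be assembled. The function name `build_prompt` and the `Input:`/`Output:` layout are assumptions for illustration, not a format any particular model requires.

```python
from typing import List, Tuple


def build_prompt(task: str, examples: List[Tuple[str, str]], query: str) -> str:
    """Assemble a few-shot prompt with a step-by-step instruction."""
    parts = [task, "Think step by step before giving the final answer.", ""]
    # Few-shot part: show worked input/output pairs before the real question.
    for example_input, example_output in examples:
        parts.append(f"Input: {example_input}")
        parts.append(f"Output: {example_output}")
        parts.append("")
    # The actual query, left open for the model to complete.
    parts.append(f"Input: {query}")
    parts.append("Output:")
    return "\n".join(parts)
```

For example, `build_prompt("Classify sentiment.", [("great", "positive")], "awful")` yields a prompt with one worked example followed by the open question.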
Well... I expected that SOME complaints about AI online are from people not familiar with prompting / not good at prompting. But now I realize that far more people than I thought don't know some basic prompting techniques.
Anyway, a fun experience for me! Since it was your comment that made me want to do this, I just want to share it with you.
While I can compare good journalists to extremely great and intuitive journalists, I don't really have any references for this in the prompting realm (except for when the Dall-e Cookbook was circulating around).
/r/stablediffusion used to (less so now) have a lot of workflow posts where people would share how they prompt and adjust the knobs/dials of certain settings and models to make what they make. It's not so different from knowing which knobs/dials to adjust in Apophysis to create interesting fractals and renders. They know what the knobs/dials adjust for their AI tools and so are quite proficient at creating amazing things using them.
People who write "jailbreak" prompts are a kind of example. There is some effort put into preventing people from prompting the models and removing the safeguards - and yet there are always people capable of prompting the model into removing its safeguards. It can be surprisingly difficult to do yourself for recent models and the jailbreak prompts themselves are becoming more complex each time.
For art in particular - knowing a wide range of artist names, names of various styles, how certain mediums will look, as well as mix & matching with various weights for the tokens can get you very interesting results. A site like https://generrated.com/ can be good for that as it gives you a quick baseline of how including certain names will change the style of what you generate. If you're trying to hit a certain aesthetic style it can really help. But even that is a tiny drop in a tiny bucket of what is possible. Sometimes it is less about writing an overly detailed prompt but rather knowing the exact keywords to get the style you're aiming for. Being knowledgeable about art history and famous artists throughout the years will help tremendously over someone with little knowledge. If you can't tell a Picasso from a Monet painting you're going to find generating paintings in a specific style much harder than an art buff.
To give an example, one person (a researcher at DeepMind) recently wrote about specific instances of his uses of LLMs, with anecdotes about alternatives to each example. [1] People on HN had different responses with similar claims with elaborations on how it has changed some of their workflows. [2]
While it would be interesting to see randomized controlled trials on LLM usage, hearing people's anecdotes brings to mind the (often misquoted) phrase: "The plural of anecdote is data". [3] [4]
[1] https://nicholas.carlini.com/writing/2024/how-i-use-ai.html
[2] https://news.ycombinator.com/item?id=41150317
[3] http://blog.danwin.com/don-t-forget-the-plural-of-anecdote-i...
[4] originally misquoted as "Anecdote is the plural of data."
You misquoted it there! It should be: The plural of anecdote is data.
https://en.wikipedia.org/wiki/Muphry's_law
[1] https://chatgpt.com/share/1ead532d-3bd5-47c2-897c-2d77a38964...
I recently built a Brainfuck compiler and TUI debugger and I tested out a few LLMs just to see if I could get some useful output regarding a few niche and complicated issues, and they just gave me garbage that looked mildly correct. Then I'm told it's because I'm not prompting hard enough... I'd rather just learn how to do it at that point. Once I solve that problem, I can solve it again in the future in .25x the time.
Look, I get the hype - but I think you need to step outside a bit before saying that 99% of the software out there is glorified CRUDs...
Think about the aerospace/defense industries, autonomous vehicles, cloud computing, robotics, sophisticated mobile applications, productivity suites, UX, gaming and entertainment, banking and payment solutions, etc. Those are not small industries - and the software being built there is often highly domain-specific, has various scaling challenges, and takes years to build and qualify for "production".
Even a simple "glorified CRUD", at a certain point, will require optimizations, monitoring, logging, debugging, refactoring, security upgrades, maintenance, etc...
There's much more to tech than your weekend project "Facebook but for dogs" success story, which you built with ChatGPT in 5 minutes...
https://github.com/williamcotton/guish
I was the driver. I told it to parse and operate on the AST, to use a plugin pattern to reduce coupling, etc. The machine did the tippy-taps for me and at a much faster rate than I could ever dream of typing!
It’s all in a Claude Project and can easily and reliably create new modules for bash commands because it has the full scope of the system in context and a ginormous amount of bash commands and TypeScript in the training corpus.
Highly representative of what devs make all day indeed
I never need to type paragraphs to get the output I want. I don't even bother with correct grammar or spelling. If I need code for x crud web app who is going to type it faster, me or the LLM? This is really not hard to understand.
- write some moderately complex powershell to perform a one-off process
- add typescript annotations to a random file in my org's codebase
- land a minor feature quickly in another codebase
- suggest libraries and write sample(ish) code to see what their rough use would look like to help choose between them for a future feature design
- provide text to fill out an extensive sales RFT spreadsheet based on notes and some RAG
- generate some very domain-specific realistic sounding test data (just naming)
- scaffold out some PowerPoint slides for a training session
There are likely others (LLMs have helped with research and in my personal life too)
All of these are things that I could do (and probably do better) but I have a young baby at the moment and the situation means that my focus windows are small and I'm time poor. With this workflow I'm achieving more than I was when I had fully uninterrupted time.
I'm an iOS dev, my knowledge of JS and CSS is circa 2004. I've used ChatGPT to convert some of my circa 2009 Java games into browser games.
> Chatting with a LLM look to me like your intent is either vague or you don't know the units to use
Or that you're moving up the management track.
Managers don't write code either. Some prefer it that way.
So I can translate things like
"Create an array, then query this instrument for xyz measurements, then store those measurements in the array. Then store that array in the .csv file we created before"
It works fantastically and saved us from outsourcing.
Of course it has limitations and you can't be asleep at the wheel, but that's true of any tool or task.
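As a rough idea of what the quoted English spec above might translate to, here is a sketch in Python. Everything about the instrument is an assumption: `make_fake_instrument` is a hypothetical stand-in for a real driver, and the CSV column names are made up.

```python
import csv
import io
from typing import Callable, List, TextIO


def record_measurements(read_value: Callable[[], float], count: int, out: TextIO) -> List[float]:
    """Query the instrument `count` times, keep the values, and write them as CSV rows."""
    # "Create an array, then query this instrument ... store those measurements in the array."
    measurements = [read_value() for _ in range(count)]
    # "Then store that array in the .csv file."
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["index", "value"])
    for index, value in enumerate(measurements):
        writer.writerow([index, value])
    return measurements


def make_fake_instrument() -> Callable[[], float]:
    """Hypothetical stand-in for a real instrument driver; returns 0.5, 1.0, 1.5, ..."""
    state = {"n": 0}

    def read_value() -> float:
        state["n"] += 1
        return state["n"] * 0.5

    return read_value
```

In real use `out` would be an open file and `read_value` would wrap whatever query call the instrument's own library provides.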
All software calls APIs, but some rely on literally "just chaining" these calls together more than writing custom behavior from scratch. After all, someone needs to write the APIs to begin with. That's not to say that these projects aren't useful or valuable, but there's a clear difference in the skill required for either.
You could argue that it's all APIs down to the hardware level, but that's not a helpful perspective in this discussion.
Yes, that's what I'm arguing. Why isn't it useful? I think it's useful, because it demystifies things. You know that in order to do something, you need to know how to use the particular API.
https://gist.github.com/simonw/97e29b86540fcc627da4984daf5b7...
There are more to be found on his blog on the ai-assisted-programming tag. https://simonwillison.net/tags/ai-assisted-programming/
A pretty literal response: https://www.youtube.com/@TheRevAlokSingh/streams
Plenty of Lean 4 and Cursor.
I have Python scripts which do a lot of automation, like downloading PDFs, bookmarking PDFs, processing them, etc. Thanks to LLMs I don't write Python code myself; I just provide the requirements and ask an LLM to write it. I copy the code generated by the AI model and run it. If there are any errors, I just ask the AI to fix them.
Anecdotally, I no longer use StackOverflow. I don’t have to deal with random downvotes and feeling stupid because some expert with a 10k+ score on 15 SE sites each votes my question to be closed. I’m pretty tech savvy, been doing development for 15 years, but I’m always learning new things.
I can describe a rough idea of what I want to an LLM and get just enough code for me to hit the ground running…or, I can ask a question in forum and twiddle my thumbs and look through 50 tabs to hopefully stumble upon a solution in the meantime.
I’m productive af now. I was paying for ChatGPT but Claude has been my goto for the past few months.
So, that may be a fact for you but there are mixed results when you go out wide. For example [1] has this little nugget:
>The study identifies a disconnect between the high expectations of managers and the actual experiences of employees using AI.
>Despite 96% of C-suite executives expecting AI to boost productivity, the study reveals that, 77% of employees using AI say it has added to their workload and created challenges in achieving the expected productivity gains. Not only is AI increasing the workloads of full-time employees, it’s hampering productivity and contributing to employee burnout.
So not everyone is feeling the jump in productivity the same way. On this very site, there are people claiming they are blasting out highly-complex applications faster than they ever could, some of them also claiming they don't even have any experience programming. Then others claiming that LLMs and AI copilots just slow them down and cause much more trouble than they are worth.
It seems that, just as with programming itself, different people are getting different results.
[1]https://www.forbes.com/sites/bryanrobinson/2024/07/23/employ...
"Jason is a strong coder, and he despises AI tools!"
For the effect on the industry, I generally make the point that even if AI only replaces the below average coder it will cause a downward pressure on above average coders compensation expectation.
Personally, humans appear to be getting dumber at the same time that AI is getting smarter and while, for now, the crossover point is at a low threshold that threshold will of course increase over time. I used to try to teach ontologies, stats, SMT solvers to humans before giving up and switching to AI technologies where success is not predicated on human understanding. I used to think that the inability for most humans to understand these topics was a matter of motivation, but have rather recently come to understand that these limitations are generally innate.
It is difficult if you have been told all your life that you are the best, to accept the fact that a computer or even other people might be better than you.
It requires a lot of self-reflection.
Real top-tier programmers actually don’t feel threatened by LLMs. For them it is just one more tool in the toolbox like syntax highlighting or code completion.
They choose to use these tools based on productivity gains or losses, depending on the situation.
Telling that sort of person that they're going to be more productive by skipping all the "time consuming programming stuff" is bound to hurt.
They should, because LLMs are coming for them also, just maybe 2-3 years later than for programmers that aren't "real top-tier".
The idea that human intellect is something especially difficult to replicate is just delusional. There is no reason to assume so, considering that we have gone from punch card programming to LLMs competing with humans in a single human lifetime.
I still remember when elite chessplayers were boasting "sure, chess computers may beat amateurs, but they will never beat a human grandmaster". That was just a few short years before the Deep Blue match.
The difference is that nobody will pay programmers to keep programming once LLMs outperform them. Programmers will simply become as obsolete as horse-drawn carriages, essentially overnight.
Would you be willing to set a deadline (not fuzzy dates) when my job is going to be taken by an LLM and bet $5k on that?
Because the more I use LLMs and I see their improvement rate, the less worried I am about my job.
The only thing that worries me is salaries going down because management cannot tell how bad they're burying themselves into technical debt and maintenance hell, so they'll underpay a bunch of LLM-powered interns... which I will have to clean up and honestly I don't want to (I've already been cleaning enough shit non-LLM code, LLMs will just generate more and more of that).
This is just a political question and of course so long as humans are involved in politics they can just decide to ban or delay new technologies, or limit their deployment.
Also in practice it's not like people stopped traditional pre-industrial production after industrialization occurred. It's just that pre-industrial societies fell further and further behind and ended up very poor compared to societies that chose to adopt the newest means of production.
I mean, even today, you can make a living growing and eating your own crops in large swathes of the world. However you'll be objectively poor, making only the equivalent of a few dollars a day.
In short I'm willing to bet money that you'll always be able to have your current job, somewhere in the world. Whether your job maintains its relative income and whether you'd still find it attractive is a whole different question.
I don't buy this. A big part of the programmer's job is to convert vague and poorly described business requirements into something that is actually possible to implement in code and that roughly solves the business need. LLMs don't solve that part at all since it requires back and forth with business stakeholders to clarify what they want and educate them on how software can help. Sure, when the requirements are finally clear enough, LLMs can make a solution. But then the tasks of testing it, building, deploying and maintaining it remain too, which also typically fall to the programmer. LLMs are useful tools in each stage of the process and speed up tasks, but they are not replacing the human who designs and architects the solution (the programmer).
> They should, because LLMs are coming for them also, just maybe 2-3 years later than for programmers that aren't "real top-tier".
Not worrying about that because if they've gotten to that point (note: top tier programmers also need domain knowledge) then we're all dead a few years later.
If the amount of bad code is no longer limited by the availability of workers who can be trained up to "just below average" and instead anyone who knows how to work a touchscreen can make AI slop, this opens up a big economic opportunity.
You could also make the same claims about outsourcing, and while it appears that in most cases the outsourcing doesn't pay off, the perception that it would has really damaged CS as a career.
What happened a couple of decades ago in poetry [1] could happen now with programming:
> No longer is it just advertising jingles and limericks made in Haiti and Indonesia. It's quatrains, sonnets, and free-form verse being "outsourced" to India, the Philippines, Russia, and China.
...
> "Limericks are a small slice of the economy, and when people saw globalization creating instability there, a lot said, 'It's not my problem,'" says Karl Givens, an economist at Washington's Economic Policy Institute. "Now even those who work in iambic pentameter are feeling it."
[1] http://www.watleyreview.com/2003/111103-2.html
Even test cases have brought me no luck. The code was poorly written, being too complicated and dynamic for test code in the best case and just wrong on average. It constantly generated test cases that would be fine for other definitions of "tree edit distance" but were nonsense for my version of a "tree edit distance".
What are you doing where any of this actually works? I'm not some jaded angry internet person, but I'm honestly so flabbergasted about why I just can't get anything good out of this machine.
Where you save loads of time is when you need to write lots of code using unfamiliar APIs. Especially when it's APIs you won't work with a lot and spending loads of time learning them would just be a waste of time. In these cases LLMs can tell you the correct API calls and it's easy to verify. The LLM isn't really solving some difficult technical problem, but saves lots of work.
But expecting them to solve difficult unsolved problems is a fundamental misunderstanding of what they are under the hood.
Could you share some concrete experience of a problem where aider, or a tool like it, helped you? What was your workflow, and how was the experience?
In other words: LLMs don't solve any noteworthy problems, at least yet.
I'm perfectly happy reading man pages personally. Half the fun of programming to me is mastering the API to get something out of it nobody expected was in there. To study the documentation (or implementation) to identify every little side effect. The details are most of the fun to me.
I don't really intend to use the AI for myself, but I do really wish to see what they see.
My experience is the same as yours, but I noticed that while LLMs circa two years ago tried to come up with the answer, the current generation of LLMs tries to make me come up with the answer. And that's not helping at all.
Try starting from ground zero and guiding it to the solution rather than trying to one shot your entire solution in one go.
I want you to implement this kind of tree in language x.
Ok good, now I want you to modify it to do Y.
Etc.
My problem is that the solution is right there in the paper. I just have to understand it. Without first understanding that paper, I can't possibly guide the AI towards a reasonable implementation. The process of finding the implementation is exactly the understanding of the paper, and the AI just doesn't help me with that. In fact, all too often I would ask it to make some minor change, and it would start making random changes all over the file, completely destroying my mental model of how the program worked. Making it change that back completely pulls me out of the problem.
When it's a junior at my job, at least I can feel like I'm developing a person. They retain the conversation and culture I impart as part of the problem solving process. When I struggle against the computer, it's just a waste of my time. It's not learning anything.
I'm still really curious what you're doing with it.
You are still responsible for what you do regardless of the means you used to do it. And a lot of people use this not because it’s more productive but because it requires less effort and less thought because those are the hard bits.
I’m collecting stats at the moment, but the general trend is that quality, measured by functional defects produced, declines when an LLM is involved in the process.
So far it’s not a magic bullet but a push for mediocrity in an industry with a rather bad reputation. Never a good story.
I'd put more hope in improving LLMs/derivatives than improving the level of effort and thought in code across the entire population of "people who code", especially the subset who would rather be doing something else with their time and effort / see it as a distraction from the "real" work that leverages their actual area of expertise.
Yeah, that's...the whole point of tools. They reduce effort. And they don't shift your responsibility. For many of us, LLMs are overwhelmingly worth the tradeoffs. If your experience differs, then it's unfortunate, and I hate that for you. Don't use 'em!
Copilot and the like are legit for boilerplate, some test code, and POSIX/PowerShell scripting. For anything that's very common, it's great.
Anything novel though and it suffers. Did AWS just release some new functionality and only like 4 people have touched it so far on GitHub? Are you getting source docs that are incomplete or spread out amongst multiple pages with some implicit/between-the-lines spec? Eh, good luck, you're probably better off just reading the docs yourself or guessing and checking.
Same goes for versioning; sometimes it'll fall back to an older version of the system (e.g. Kafka with KRaft vs ZooKeeper).
Personally, the best general use case of LLMs for me is focus. I know how to break down a task, but sometimes I have an issue staying focused on doing it and having a reasonably competent partner to rubber duck with is super useful. It helps that the chat log then becomes an easy artifact to more or less copy paste, and chatgpt doesn't do a terrible job reformatting either. Like for 90% of the stuff it's easier than using vim commands.
Yes, most definitely. I've recently been introduced to our CTO's little pet project that he's been building with copious help from ChatGPT, and it's genuinely some of the most horrid code I've ever seen in my professional career. He genuinely doesn't know what half of it even does when I quizzed him about some of the more egregious crap that was in there. The real fun part is that now that it's a "proven" PoC some poor soul is going to have to maintain that shit.
We also have a mandate from the same CTO to use more AI in our workflows, so I have 0 doubts in my mind that people are blindly pushing code without thinking about it, and people like myself are left dealing with this garbage. My time & energy is being wasted sifting through AI-generated garbage that doesn't pass the smell test if you spend a singular minute of effort reading through the trash it generates.
I literally had someone with the balls to tell me that it was ChatGPT's fault.
Due diligence and intelligence has shit the fucking bed quite frankly.
People blindly copied Stack Overflow code, they blindly copied every example off of MSDN, they blindly copy from ChatGPT - your holier-than-thou statements are funny. And frankly, most LLMs cannot leave a local maximum, so anyone who says they don't write any code anymore is, I think, not capable of spotting the mistakes they are making, both architectural and specific.
More and different prompting will not dig you out of the hole.
When I'm deeply stuck on something and I think "let's see if an LLM could help here", I try (and actually tried many times) to recruit those prompting gurus around me that swear LLMs solve all their problems... and they consistently fail to help me at all. They cannot solve the problem at all and I'm just sitting there, watching the gurus spend hours prompting in circles until they give up and leave (still thinking LLMs are amazing, of course).
This experience is what makes me extremely suspicious of anyone on the internet claiming they don't write code anymore but refusing to show (don't tell!) -- when actually testing it in real life it has been nothing but disappointment.
Yes, and I see proof of it _literally every day_ in Code Reviews where I ask juniors to describe or justify their choices and they shrug and say "That's what Copilot told me to put".
Which is great until your next job interview. Really, it's tempting in the short run but I made a conscious decision to do certain tasks manually only so that I don't lose my basic skills.
- I need you to assist me during a programming interview, you will be listening to two people, the interviewer and me. When the interviewer asks a question, I'd like you to feed me lines that seem realistic for an interview where I'm nervous, don't give me a full blown answer right away. Be very succinct. If I think you misunderstood something, I will mention the key phrase "I'm nervous today and had too much coffee". In this situation, remember I'm the one that will say the phrase, and it might be because you've mistaken me for the interviewer and I want you to "reset". If I want you to dig deeper than what you've provided me with, I'll say the key phrase "Let's dig deeper now". If I think you've hallucinated and want you to try again, I'll say "This might be wrong, let me think for just a minute please". Remember, other than these key phrases, I'll only be talking to the interviewer, not you.
On a second screen of some sort. Other than that, interviewers will just have to accept that nobody will be doing the job without these sort of assistants from now on anyway. As an interviewer I let candidates consult online docs for specific things already because they'll have access to Google during the job, this is just an extension of that.
Out of maybe twenty people I interviewed this way, only three of them pointed out that one of the queries had a failing error in it. It was something any LLM would immediately point out.
Beyond that: the first question I asked was: "What does this query do, what does it return?" I got responses ranging from people who literally read the query back to me word by word, giving the most shallow and direct explanation of what each bit did step-by-step, to people who clearly summarized what the query did in high-level, abstract terms, as you might describe what you want to accomplish before you write the query.
I don't think anyone did something with ChatGPT live, but maybe?
I personally treat the LLM as a rubber duck. Often I reject its output. In other cases, I can accept it and refactor it into something even better. The name of the game is augmentation.
It's the opposite. FizzBuzz and getting the syntax right is what LLMs are good at... but there's so much more nuance at being experienced with a language/framework/library/domain which senior engineers understand and LLMs don't.
Being able to write Elixir assisted by an LLM does not mean you can produce proper architecture and abstractions even if the high level ideas are right. It's the tacit knowledge and second-order thinking that you should hire for.
But the thing is, if someone cannot write Elixir without syntax errors unless using an LLM, well, that's an extremely good proxy that they don't know the ins and outs of the language, ecosystem, best practices... Years of tacit knowledge that LLMs fail to use because they're trained on a huge amount of tutorial and entry-level code riddled with the wrong abstractions.
The only code worse than one that doesn't work is one that kinda works unless your requirements change ever so slightly. That's a liability and you will pay for it with interest.
To give a concrete example: I am very experienced with React. Very. A lot. The code that LLMs write for it is horrid, bug-ridden, inflexible and often misuses its footgun-y APIs like `useEffect` like a junior fresh out of a boot camp would, directly contradicting the known best practices for maintainable (and often even just "correct") code. But yeah it superficially solves the problem. Kinda. But good luck when the system needs to evolve. If it cannot do proper code that's <500 lines how do you expect it to deal with massive systems that need to scale to 10s of KLOC across an ever-growing twine?
But management will be happy because the feature shipped and time to market was low... until you can no longer ship anything new and you go out of business.
I had an idea the other day of an LLM system that would start from a basic architecture of an app, and would zoom down and down on components until it wrote the entire codebase, module by module. I'll try that, it sounds promising.
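A rough sketch of that top-down idea, assuming the two LLM-backed operations can be reduced to plain callables (both `split` and `write` here are hypothetical stand-ins, and the depth cap is an arbitrary safety limit, not part of the original idea):

```python
from typing import Callable, Dict, List

# Hypothetical callables standing in for LLM requests.
Split = Callable[[str], List[str]]  # component -> sub-components (empty when small enough)
Write = Callable[[str], str]        # leaf component -> source code


def generate(component: str, split: Split, write: Write,
             depth: int = 0, max_depth: int = 5) -> Dict[str, str]:
    """Zoom into a component until it is small enough, then write code for each leaf."""
    # Stop splitting once the depth cap is hit, so a chatty model cannot recurse forever.
    children = split(component) if depth < max_depth else []
    if not children:
        return {component: write(component)}
    modules: Dict[str, str] = {}
    for child in children:
        modules.update(generate(child, split, write, depth + 1, max_depth))
    return modules
```

With a toy `split` backed by a dictionary such as `{"app": ["ui", "api"], "api": ["routes", "db"]}`, this returns one entry each for `ui`, `routes` and `db`. The hard part in practice is presumably keeping the leaves consistent with each other, which this sketch does not attempt.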
This was part of a larger evaluation comparing the Hacker News population to people on Reddit programming subreddits.
Here is a very heated discussion of the result:
https://news.ycombinator.com/item?id=33293522
It appears that Hacker News is perhaps NOT populated by the programming elite. In contrast, there are real wizards on Reddit.
Surprising, I know.
I’m not doing ground breaking software stuff, it’s just web dev at non massive scales.
What I'm saying is that what the original comment is doing, having the LLM write all their code, will make them a less valuable employee in the long term. Participating in the act of programming makes you a better programmer. I'd rather have programmer B if they take the time to understand their code, so that when that code breaks at 4am and they get the call, they can actually fix it rather than be in a hole they dug with LLMs that they can't dig out of.
Probably not a practical option yet, but if we're looking at the long term that is where we are heading. Or, realistically, the even longer term where the LLM self-heals broken systems.
Also, often folks in this space are better at cheating than you will be at detecting them. Don't believe me? https://bigvu.tv/captions-video-maker/ai-eye-contact-fix
But "lines of code written" is a hollow metric to prove utility. Code literacy is more effective than code illiteracy.
Lines of natural language vs discrete code is a kind of preference. Code is exact which makes it harder to recall and master. But it provides information density.
> by just knuckling down and learning how to do the work?
This is the key for me. What work? If it's the years of learning and practice toward proficiency to "know it when you see it" then I agree.
How are people doing this? I never use any of the code that gpt4o/copilot/sonnet spit out because it never meets my standards. How are other people accepting the shit it spits out?
See the video at https://plandex.ai/ to get an idea how it works.
(Not sure if that was clear but the steps/loop described before happens automatically, you're not babysitting it)
Any tips how I could integrate that? Do I need to switch to aider/plandex?
Sure, it's not the best (most maintainable, non-redundant styling) code that's powering the app but it's more than enough to put an MVP out to the world and see if there's value/interest in the product.
This is cult like behaviour that reminds me so much of the crypto space.
I don't understand why people are not allowed to be critical of a technology or not find it useful.
And if they are they are somehow ignorant, over-reacting or deficient in some way.
I'm more reacting against simplistic and categorical pronouncements of straight up "uselessness," which to me seems un-curious and deeply cynical, especially since it is evidentially untrue in many domains (though it is true for some domains). I just find this kind of emotional cynicism (not a healthy skepticism, but cynicism) to be contrary to the spirit of innovation and openness, and indeed contrary to evidence. It's also an overgeneralization -- "I don't find it useful, so it's useless" -- rather than "Why don't I find it useful, and why do others do? Let me learn more."
As future-looking HNers, I'd expect we would understand the world through a lens of "trajectories" rather than "current state". Just because LLMs hallucinate and make mistakes with a tone of confidence today -- a deep weakness -- doesn't mean they are altogether useless. We've witnessed that despite their weaknesses, we are getting a lot of value from them in many domains today and they are getting better over time.
Take neural networks themselves for instance. For most of the 90s-2000s, people thought they were a dead end. My own professor had great vitriol against Neural Networks. Most of the initial promises in the 80s truly didn't pan out. Turns out what was missing was (lots of) data, which the Internet provided. And look where we are today.
Another area of cynicism is self-driving cars (Level 5). Lots of hype and overpromise, and lots of people saying it will never happen because it requires a cognitive model of the world, which is too complicated, and there are too many exceptional cases for there to ever be Level 5 autonomy. Possibly true, but I think "never" is a very strong sentiment that is unworthy of a curious person.
There's also an important nuance differentiating rejection of a general technological endpoint (e.g. AGI or Level 5 self-driving cars) from rejection of a particular technological approach to achieving those goals (e.g. current LLM design or Tesla's autopilot). As you said, "never" is a long time and it takes a lot of unwarranted confidence to say we will never be able to achieve goals like AGI or Level 5 self-driving. But it seems a lot more reasonable to argue Tesla or OpenAI (and everyone else doing essentially the same thing as OpenAI) are fundamentally on the wrong track to achieving those goals without significantly changing their approach.
I agree that none of that really warrants dismissive cynicism of new technology, but being curious and future-looking also requires being willing to say when you think something is a bad approach even if it's not totally useless. Among other reasons, our ability to explore new technology is not limitless, and hype for a flawed technology isn't just annoying but may be sucking all the oxygen out of the room not leaving any for a potentially better alternative. Part of me wants to be optimistic about LLMs, but another part of me thinks about how much energy (human and compute) has gone into this thing that does not seem to be providing a corresponding amount of value.
You are absolutely right that the trajectories, if taken linearly, might hit a dead end. I should clarify that when I mentioned "trajectories" I don't mean unpunctuated ones.
I am myself not convinced that LLMs -- despite their value to me today -- will eventually lead to AGI as a matter of course, nor the type of techniques used in autopilot will lead to L5 autonomy. And you're right that they are consuming a lot of our resources, which could well be better invested in a possibly better alternative.
I subscribe to Thomas Kuhn's [1] idea of scientific progress happening in "paradigms" rather than through a linear accumulation of knowledge. For instance, the path to LLMs itself was not linear, but through a series of new paradigms disrupting older ones. Early natural language processing was more rule-based (paradigm), then it became more statistical (paradigm), and then LLMs supplanted the old paradigms through transformers (paradigm) which made it scale to large swaths of data. I believe there is still significant runway left for LLMs, but I expect another paradigm must supplant it to get closer to AGI. (Yann LeCun said that he doesn't believe LLMs will lead to AGI).
Does that mean the current exuberant high investments in LLMs are misplaced? Possibly, but in Kuhn's philosophy, typically what happens is a paradigm will be milked for as much as it can be, until it reaches a crisis/anomaly when it doesn't work anymore, at which point another paradigm will supplant it.
At present, we are seeing how far we can push LLMs, and LLMs as they are have value even today, so it's not a bad approach per se even though it will hit its limits at some point. Perhaps what is more important are the second-order effects: the investments we are seeing in GPUs (essentially we are betting on linear algebra) might unlock the kind of commodity computational power the next paradigm needs to disrupt the current one. I see parallels between this and investments in NASA resulting in many technologies that we take for granted today, and military spend in California producing the technology base that enabled Silicon Valley today. Of course, these are just speculations and I have no more evidence that this is happening with LLMs than anyone else.
I appreciate your point however and it is always good to step back and ask, non-cynically, whether we are headed down a good path.
[1] https://en.wikipedia.org/wiki/The_Structure_of_Scientific_Re...
Not everyone is interested in seeing the world through the hopes and dreams of e/acc types and would prefer to see it as it is today.
LLMs are a technology. Nothing more. It can be as amazing or useless as anyone likes.
The similarities include intense "true believer" pitches and governments taking them seriously.
The differences include that the most famous cryptocurrency can't function as a direct payment mechanism for just lunch purchases in just Berlin (IIRC nor is it enough for all interbank transactions so it can't even be a behind-the-scenes system by itself), while GenAI output keeps ending up in places people would rather not find it like homework and that person on Twitter who's telling you Russia Did Nothing Wrong (and also giving you a nice cheesecake recipe because they don't do any input sanitation).
I read somewhere that historically bonds in their early days were also associated with scamminess but today they're just a vanilla asset.
Please post a video of your workflow.
It’s incredibly valuable for people to see this in action, otherwise they, quite legitimately, will simply think this is not true.
I’m not saying you are; I think there are a lot of legitimate AI workflows people use.
…but, there are a lot of people trying to sell AI, and that makes them say things about it which are just flat out false.
/shrug
But you know; freedom of speech; you can say whatever you want if you don’t care what people think of you.
My take on it is showing people things (videos, blogs, repos, workbooks like Terence posted) moves the conversation from “I don’t believe you” to “let’s talk about the actual content”. Wow, what an interesting workflow, maybe I’ll try that…
If you don’t want to talk to people or have a discussion that extends beyond meaningless trivia like “does AI actually have any value” (obviously flame bait opinions only comment threads)… why are you even here?
If you don’t care, then fine. Maybe someone else will and they’ll post an interesting video.
Isn’t that the point of reading HN threads? What do you win by telling people not to post examples of their workflow?
It’s incredibly selfish.
Now imagine how profoundly depressing it is to visit a HN post like this one, and be immediately met with blatant tribalism like this at the very top.
Do you genuinely think that going on a performative tirade like this is what's going to spark a more nuanced conversation? Or would you rather just the common sentiment be the same as yours? How many rounds of intellectual dishonesty do we need to figure this out?
could it be that you are mostly engaged in "boilerplate coding", where LLMs are indeed good?
Comment on first principles:
Following the dictum that you can't prove the absence of bugs, only their presence, the idea of what constitutes "working code" deserves much more respect.
From an engineering perspective, either you understand the implementation or you don't. There's no meaning to an iterative loop of producing working code.
Stepwise refinement is a design process under the assumption that each step is understood in a process of exploration of the matching of a solution to a problem. The steps are the refinement of definition of a problem, to which is applied an understanding of how to compute a solution. The meaning of working code is in the appropriateness of the solution to the definition of the problem. Adjust either or both to unify and make sense of the matter.
The discipline of programming is rotting when the definition of working is copying code from an oracle and running it to see if it goes wrong.
The measure of works must be an engineering claim of understanding the chosen problem domain and solution. Understanding belongs to the engineer.
LLMs do not understand and cannot be relied upon to produce correct code.
If use of an LLM puts the engineer in contact with proven principles, materials and methods which he adapts to the job at hand, while the engineer maintains understanding of correctness, maybe that's a gain.
But if the engineer relies on the LLM transformer as an oracle, how does the engineer locate the needed understanding? He can't get it from the transformer: he's responsible for checking the output of the transformer!
OTOH if the engineer draws on understanding from elsewhere, what is the value of the transformer but as a catalog? As such, who has accountability for the contents of the catalog? It can't be the transformer because it can't understand. It can't be the developer of the transformer because he can't explain why the LLM produces any particular result! It has to be the user of the transformer.
So a system of production is being created whereby the engineer's going-in position is that he lacks the understanding needed to code a solution and he sees his work as integrating the output of an oracle that can't be relied upon.
The oracle is a peculiar kind of calculator with an unknown probability of generating relevant output that works at superhuman speeds, while the engineer is reduced to an operator in the position of verifying that output at human speeds.
This looks like a feedback system for risky results and a slippery slope towards heretofore unknown degrees of incorrectness and margins for error.
At the same time, the only common vernacular for tracking oracle veracity is in arcane version numbers, which are believed, based on rough experimentation, to broadly categorize the hallucinatory tendencies of the oracle.
The broad trend of adoption of this sketchy tech is in the context of industry which brags about seeking disruption and distortion, regards its engineers as cost centers to be exploited as "human resources", and is managed by a specialized class of idiot savants called MBAs.
Get this incredible technology into infrastructure and in control of life sustaining systems immediately!
It's a strange experience, like taking a math class where the proofs are weird and none of the lessons click for you, and you start feeling stupid, only to learn your professor is an escaped dementia patient and it was gobbledygook to begin with.
I had a similar experience yesterday using o1 to see if a simple path exists from s to t through v using max flow. It gave me a very convincing-looking algorithm that was fundamentally broken. My working solution used some techniques from its failed attempt, but even after repeated hints it failed to figure out a working answer (it stubbornly kept finding s->t flows, rather than realizing v->{s,t} was the key.)
It's also extremely mentally fatiguing to check its reasoning. I almost suspect that RLHF has selected for obfuscating its reasoning, since obviously-wrong answers are easier to detect and penalize than subtly-wrong answers.
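For anyone curious what the v->{s,t} idea looks like, here is a minimal sketch. It is not the commenter's actual solution; it assumes an undirected graph with s, t and v distinct, and the function name is made up. Every vertex is split into an "in" and "out" copy joined by a capacity-1 arc (so no vertex is reused), flow starts at v, and s and t both feed a super sink. A simple s-to-t path through v exists exactly when two units of flow get through.

```python
from collections import defaultdict, deque

def has_simple_path_through(edges, s, t, v):
    """True if an undirected graph has a simple s-t path passing through v.

    Hypothetical helper: assumes s, t and v are distinct vertices.
    """
    cap = defaultdict(lambda: defaultdict(int))

    def add_arc(a, b, c):
        cap[a][b] += c
        cap[b][a] += 0  # make sure the residual (reverse) arc exists

    nodes = {s, t, v}
    for a, b in edges:
        nodes.update((a, b))
    for u in nodes:
        add_arc((u, "in"), (u, "out"), 1)   # each vertex usable once
    for a, b in edges:
        add_arc((a, "out"), (b, "in"), 1)   # undirected edge, both directions
        add_arc((b, "out"), (a, "in"), 1)
    sink = "SINK"
    add_arc((s, "out"), sink, 1)
    add_arc((t, "out"), sink, 1)
    source = (v, "out")                     # flow starts at v, not at s

    flow = 0
    while True:
        # Breadth-first search for an augmenting path (Edmonds-Karp).
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for w, c in cap[u].items():
                if c > 0 and w not in parent:
                    parent[w] = u
                    queue.append(w)
        if sink not in parent:
            break
        # Push one unit of flow along the path found.
        w = sink
        while parent[w] is not None:
            u = parent[w]
            cap[u][w] -= 1
            cap[w][u] += 1
            w = u
        flow += 1
    return flow == 2
```

On the chain s-a-v-b-t this returns True; on a graph where v hangs off a single vertex between s and t it returns False, because both paths out of v would need that same vertex. For directed graphs this reduction does not apply.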
Benchmarking 10,000 attempts on an IQ test is irrelevant if on most of those attempts the time taken to repair an answer is longer than the time to complete the test yourself.
I find it's useful to generate exemplars in areas you're roughly familiar with, but want to see some elaboration or a refresher. You can stitch it all together to get further, but when it comes time to actually build/etc. something -- you need to start from scratch.
The time taken to reproduce what it's provided, now that you understand it, is trivial compared to the time needed to repair its flaws.
I'm interested in how you seem to be getting better answers than me (or maybe I just discard the answer and write it myself once I can see it's wrong?)
In fact, I just asked it to do (and explain) x!=y for x,y integer variables in the range {1..9}, and while the constraints are right, the explanation isn't.
https://chatgpt.com/share/66e652e1-8e2c-800c-abaa-92e29e0550...
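For reference, one standard big-M way to write x != y, which may or may not be what the linked chat produced: add a binary b and require x - y >= 1 - M*b and y - x >= 1 - M*(1 - b). The value M = 9 is my own choice for the range {1..9}. A quick brute-force sketch confirms that the constraints admit exactly the pairs with x != y:

```python
from itertools import product

M = 9  # assumed big-M; any value >= 9 works when x and y lie in 1..9

def feasible(x, y, b):
    # b = 0 forces x > y, b = 1 forces y > x; the other inequality goes slack.
    return (x - y >= 1 - M * b) and (y - x >= 1 - M * (1 - b))

def formulation_matches_not_equal():
    """Check every (x, y) in 1..9: some b is feasible iff x != y."""
    for x, y in product(range(1, 10), repeat=2):
        allowed = any(feasible(x, y, b) for b in (0, 1))
        if allowed != (x != y):
            return False
    return True
```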
As another example I just gave it a network flow problem, and asked it to convert to maximum flow (I'm using the API, not ChatGPT).
Despite numerous promptings, it never got it right -- it would not stop putting a limit on the source and sink (usually 1), which meant the flow was always exactly 1. Here's the bit of wrong code (it's the last part; it shouldn't be putting any restrictions on nmap['s'] and nmap['t'], as they represent the source and sink), and I couldn't persuade it this was wrong after several prods:
Also a trick when the LLM fights you: start from scratch, and put guardrails in your initial prompt.
LLM prompting is a bit like gradient descent in a bumpy nonconvex landscape with lots of spurious optima and saddle points -- if you constrain it to the right locality, it does a better job at finding an acceptable local optimum.
I can only tell this is wrong because I fully understand it -- and if I fully understand it, why not just write it myself rather than fight against an LLM. If I was trying to solve something I didn't know how to do, then I wouldn't know it was wrong, and where the bug was.
For MIPs, correctness can often (not always but usually) be checked by simply flipping the binaries and checking the inequalities. Coming up with the inequalities from scratch is not always straightforward so LLMs often provide good starting points. Sometimes the formulation is something specific from a paper that one has never read. LLMs are a way to "mine" those answers (some sifting required).
I think this is the mindset that is needed to get value out of LLMs -- it's not about getting perfect answers on textbook problems, but working with an assistant to explore the space quickly at a fraction of the effort.
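To make "flipping the binaries" concrete, here is a small sketch using the textbook linearization of z = x AND y as a stand-in (my example, not one from the thread). It enumerates every binary assignment and reports where the constraints disagree with the intended logic. The function names are made up for illustration.

```python
from itertools import product

def and_constraints(x, y, z):
    # Textbook linearization of z = x AND y for binary x, y, z.
    return z <= x and z <= y and z >= x + y - 1

def weak_and_constraints(x, y, z):
    # Deliberately incomplete: drops z >= x + y - 1.
    return z <= x and z <= y

def counterexamples(constraints):
    """List binary assignments where feasibility disagrees with z == (x AND y)."""
    bad = []
    for x, y, z in product((0, 1), repeat=3):
        if constraints(x, y, z) != (z == (x & y)):
            bad.append((x, y, z))
    return bad
```

The full formulation yields no counterexamples, while the weakened one lets (x, y, z) = (1, 1, 0) through. That is the kind of hole a suggested formulation can hide and that this check catches cheaply. It only scales to a handful of binaries, of course.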
The results are boilerplate at best, but misleading and insidious at worst, especially when you get into detailed tasks. Ever try to ask an LLM what a specific constraint does or worse ask it to explain the mathematical model of some proprietary CPLEX syntactic sugar? It hallucinates the math, the syntax, the explanation, everything.
Have you tried again with the latest LLMs? ChatGPT4 actually (correctly) explains what each constraint does in English -- it doesn't just provide the constraint when you ask it for the formulation. Also, not sure if CPLEX should be involved at all -- I usually just ask it for mathematical formulations, not CPLEX calling code (I don't use CPLEX). The OR literature primarily contains math formulations and that's where LLMs can best do pattern matching to problem shape.
Many of the standard formulations are in here:
https://msi-jp.com/xpress/learning/square/10-mipformref.pdf
All the LLM is doing is fitting the problem description to a combination of these formulations (and others).
I’ve heard (but not so much observed) that there is substantial difference between recent models, so it’s possible that they are better than when this was written.
Anyways, CPLEX has an associated modeling language that features syntactic sugar which has the effect of providing opaqueness to the underlying MILP that it solves. I find LLMs essentially unable to even make an attempt at determining the MILP from that language.
PS: How is Xpress? Is there some reason to prefer it to Gurobi or Mosek?
"Overall, while LLM made several errors, the provided formulations can serve as a starting point for OR experts to create mathematical models. However, OR experts should not rely on LLM to accurately create mathematical models, especially for less common or complex problems. Each output needs to be thoroughly verified and adjusted by the experts to ensure correctness and relevance."
I wouldn't recommend anyone inexperienced to use LLMs to create entire models from scratch, but rather use LLMs as a search tool for specific formulations which are then verified and plugged into a larger model. For this, it works really well and saves me a ton of time. As a MIP modeler, I have an intuition on the shape of the answer, so even if ChatGPT makes mistakes, I know how to extract the correct bits and it still saves me a ton of time.
The CPLEX API doesn't have a lot of good examples out in the wild, so I don't expect the training to be good. I've always used CPLEX through a modeling language like AMPL, and even AMPL code is rare so I can't expect an LLM to decipher any of it. On the other hand, MIP formulations abound in PDFs of journal publications.
In the vibes department, I feel Xpress is second to Gurobi and CPLEX and it does the job just fine. But it's been a while since I used CPLEX and Gurobi so I have no recent points of comparison (corporate licensing is prohibitively expensive).
Very good at giving a textbook answer ("give a Python/NumPy function that returns the Voronoi diagram of a set of 2D points").
Now, I ask for the Laguerre diagram, a variation that is not mentioned in textbooks, but very useful in practice. I can spend a lot of time spoon-feeding the answer, and I still just get the bullshitting-student answers.
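For readers who haven't met it: a Laguerre (power) diagram assigns each point to the site with the smallest power distance, i.e. squared Euclidean distance minus a per-site weight. The sketch below only labels query points; it does not construct the cell polygons, which is the hard part being asked for above. The function name and the convention that the weight is a squared radius are my assumptions.

```python
def laguerre_cell(point, sites, weights):
    """Index of the site whose Laguerre (power) cell contains `point`.

    Hypothetical helper. Power distance to site i is
    |point - sites[i]|^2 - weights[i], with weights taken as squared radii.
    With all weights equal this reduces to the ordinary Voronoi assignment.
    """
    px, py = point
    best_index, best_power = None, float("inf")
    for i, ((sx, sy), w) in enumerate(zip(sites, weights)):
        power = (px - sx) ** 2 + (py - sy) ** 2 - w
        if power < best_power:
            best_index, best_power = i, power
    return best_index
```

With sites (0, 0) and (2, 0), the point (0.9, 0) belongs to the first site when the weights are equal, but moves to the second site once that site's weight is raised to 1.0, since the cell boundary shifts toward the lighter site.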
I tried other problems like numerical approximation, physics simulation, same experience.
I don't get the hype. Maybe it's good at giving variations of glue code, i.e. Stack Overflow meets autocomplete? As a search tool it's bad because it's so confidently incorrect, you may be fooled by bad answers.
One good riposte to reflexive LLM-bashing is, "Isn't that just what a stochastic parrot would say?" Some HN'ers would dismiss a talking dog because the C code it wrote has a buffer overflow error.
How many more years is senior SWE work going to be a $175k/yr gig instead of a $75k check-what-the-robot-does gig?
In my case, it's all I can do not to walk away from everything else I'm doing to follow this particular muse. I don't have a lot of sympathy for my colleagues who see it as a threat. If you're afraid of new ideas, technologies, and methodologies, you picked the wrong line of work.
Correction: I complain that the only decent model in "Open"AI's arsenal, that is GPT-4, has been replaced by a cheaper GPT-4o, which gives subpar answers to most of my questions (I don't care that it does it faster). As they moved it to "old, legacy" models, I expect they will phase it out, at which point I'll cancel my OpenAI subscriptions and Sonnet 3.5 will become the clear leader for my daily tasks.
Kudos to Anthropic for their great work, you guys are going in the right direction.
So I cancelled two of my 3 subscriptions as I realized OpenAI goes in a direction that is not useful for me at all. Claude, on the other hand, is incredibly useful.
I tried to use 4/4o for a MIP several months ago. Frequently, it would iterate through three or four bad implementations over and over.
Claude 3.5 has been a significant improvement. I don’t really use chatgpt for anything at this point.
Nothing is static in the way things are moving.
Would you be willing to pay even more, if it meant you were getting proportionally more valuable answers?
E.g. $200/month or $2,000/month (assuming the $2,000/month gets into employee/intern/contractor level of results.)
This might drive a positive feedback loop.
Or (4) LLMs simply do not work properly for many use cases, particularly where large volumes of training data don't exist in their corpus.
And in these scenarios rather than say "I don't know" it will over and over again gaslight you with incoherent answers.
But sure, condescendingly blame the user for their ignorance and inability to understand or use the tool properly. Or call their criticism low-effort.
“The experience seemed roughly on par with trying to advise a mediocre, but not completely incompetent, graduate student.”
With regard to interacting with the equivalent of Alexa. That’s a remarkable difference in 5 years.
If you think we are close to the maximum useful software in the world already, then maybe. I do not believe that. Seeing software production and time costs drop one to two orders of magnitude means we will have very different viable software production processes. I don’t believe for a second that it disenfranchises quality thinkers; it empowers them.
Reduce costs by an order of magnitude or two, and suddenly there's a whole heap more projects that become profitable.
30 years after I gave up the Rollei, I'm not obsolete as a photographer, and when there's a quality diffusion model that could take a few of my photos from the event at 100 megapixels, and get prompted by me as to what I want to see out of them creatively, I will still not be obsolete, even as a photographer, but most certainly not obsolete as an artist. In fact, I'll have more tools available for my art, with new skills needed, and different workflows.
As to abandoning 3D art -- your call. If you love it, why not see how these new tools open up your art? If you don't love some of the new tools, no problem, don't use them. I still shoot medium format film sometimes. If you were planning on a long term creative career without staying on top of technical advances in your field, that has not been possible for at least a few centuries.
Are legitimate companies genuinely switching to Midjourney over hiring artists now, or is Midjourney usage still mostly happening in places that previously wouldn't have commissioned custom illustrations at all (instead using things like stock photography)?
There are hundreds of thousands of "3D workers" working behind the scenes to create the 3D models for makeshift ads, and as far as I know many of them (including my high school mate) have already been displaced by Midjourney and lost their jobs. This used to be a big industry but is now almost entirely wiped out by AI.
To my knowledge, 3D artists weren't that huge of an industry to begin with. One of my friends went to college researching 3D physics models, and never landed a job in the field long before the AI wave hit. Unless you're a freelancer or salaried Pixar employee, being a 3D artist is extremely difficult with extraordinarily low job security, AI or no AI.
I think "almost entirely wiped out by AI" is hyperbole, because the primary employer of these artists will still be hiring and products like Sora are a good decade away from being Toy Story quality. AI will be a substitute product for people that didn't even want 3D art in the first place.
So far, there is little chance of a non-technical person developing a technical solution to their problems using AI.
Nope. Compensation is exponential. Being able to replace a top performer with a few mediocre devs pair coding with an LLM is more than fine for 90% of use cases.
I think it is more likely that great programmers might just increase their productivity even more with LLMs, which will make their value even greater.
Sure. Plenty of businesses are. Particularly in the commercial automation sector that numerically hires the most people.
> more likely that great programmers might just increase their productivity
For those in high-productivity, high-margin businesses, yes. For most of the world, no—the surplus productivity doesn’t outweigh the compensation and concentration risk.
I broadly expect a spate of age discrimination lawsuits in the near future because most businesses don’t need a few stars. In the meantime, I’ve watched a lot of people find two people in Brazil + an LLM equals one WFH very good (but not brilliant) coder.
These people will continue to have value. But most businesses don’t have problems that can be profitably solved only by brilliant coders.
Commercially, you can. After all, that's the current music business.
If a top performer can produce 5x or more of the value, I would expect companies to continue to value top performers.
The developers who will find LLMs the least useful are the "brilliant" ones who never found any utility in any of that stuff, partly because they are not reinventing the wheel for the 1000th time, but instead addressing more challenging and novel problems.
LLMs free me from the nuts and bolts of the "how", for example I don't have to manually type out a loop. I just write a comment and the loop magically appears. Sometimes I don't have to prompt it at all.
With my brain freed from the drudgery of everyday programming, I have more mental cycles to dedicate to higher concerns such as overall architecture, and I'm just way more productive.
For experienced programmers this is a godsend.
Less experienced developers lack the ability to mentally "see" how software should be architected in a way that balances the concerns, so writing a loop a bit faster is not as much of an advantage. Also, they lack the reflexes to instantly decide if generated code is correct or incorrect.
LLMs are limited by the user's decision speed: the LLM generates code for you but you have to decide whether to accept or reject. If it takes me 1 second to decide to accept code that would have taken me 10 seconds to physically type, then I'm saving 9 seconds, which really adds up. For a junior developer, LLMs may give negative productivity if it takes them longer to decide if the LLM's version is correct than it would have taken them to type whatever they were going to write in the first place.
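The arithmetic above can be written down as a back-of-the-envelope sketch. The `accept_rate` parameter is my own addition, not something the comment states: it assumes you pay the review cost on every suggestion but only skip the typing when you accept.

```python
def net_seconds_saved(typing_seconds, review_seconds, accept_rate=1.0):
    """Expected seconds saved per suggestion versus typing it yourself.

    Hypothetical model: reviewing always costs `review_seconds`; typing is
    avoided only for the fraction of suggestions that are accepted.
    """
    return accept_rate * typing_seconds - review_seconds
```

With the numbers from the comment, `net_seconds_saved(10, 1)` gives 9 seconds saved. If a reviewer needs 15 seconds to judge code they could have typed in 10, the result is negative, which is the junior-developer case described.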
This is obviously the critical point. It's not whether the LLM can do something, i.e. give it a go, but whether that actually saves you time. If it takes longer to verify the LLM code for correctness than to write it yourself, then there is no productivity gain.
I guess this partly also hinges on how much you care about correctness beyond "does it seem to work". For a prototype maybe that's enough, but for work use you probably should check for API "contractual correctness", corner cases, vulnerabilities, etc, or anything that you didn't explicitly specify (or even if you did!) to the LLM. If you are writing the code itself then these multifaceted requirements are all in your head, but with the LLM you'll need to spell them all out (or iterate and refine), and it may well have been faster just to code it yourself (cf working with an intern with -ve productivity).
If you fail to review the LLM's code thoroughly enough, and leave bugs in it to be discovered later, maybe in production, then the cost of doing that, both in time and money, will far outweigh any cost saving in just having written it correctly yourself in the first place. Again, this is more of a concern for production code than for hobbyist or prototype stuff, but having to fix bugs is always slower than getting it right in the first place.
For myself, it seems that for anything complex it's always the design that takes time, not the coding, and the coding in the end (once the detailed design has been worked out) just comes down to straightforward methods and functions that are mostly simple to get right first time. What would be useful, but of course does not yet exist, would be an AGI peer programmer that operated more like a human than a language model, who I could discuss the requirements and design with, and then maybe delegate the coding to as well.
Notes: https://simonwillison.net/tags/ai-assisted-programming/
In law, this sort of thing already happened with the rise of better research tools. The work L1s used to do a generation ago just does not exist now. An attorney with experience gets the results faster on their own now. With all the pipeline and QoL issues that go with that.
Note though that not all companies see it this way - the telecom I work at is hoping to replace senior onshore developers with junior offshore ones leveraging "GenAI"! I agree that the opposite makes more sense - the seniors are needed, and it's the juniors whose work may be more within reach of LLMs.
I really can't see junior developer positions wholesale disappearing though - more likely them just leveraging LLM/AI-enhanced dev tools to be more productive. Maybe in some companies where there are lots of junior developers they may (due to increased productivity) need fewer in the future, but the productivity gains to be had at this point seem questionable ... as another poster commented, the output of an LLM is only as useful as the skill of the person reviewing it for correctness.
I find a lot of the AI discussion seems to land in the "lump of labor" fallacy camp though.
But not all younger programmers can be Stack Overflow cut-n-pasters, because not all (and surely not 95%!) programming jobs are amenable to that approach. There are lots of jobs where people are developing novel solutions, interacting with proprietary or uncommon hardware and software, etc, where the solution does not exist on Stack Overflow (and by extension not in an LLM trained on Stack Overflow).
AI just destroyed Shutterstock.
https://www.reddit.com/r/freelanceWriters/comments/12ff5mw/i...
https://www.reddit.com/r/freelanceWriters/comments/17zms9f/w...
> "It pretty much has killed most small jobs in writing."
> "entry-level writing jobs have ceased to exist."
... There isn't an infinite amount of demand for commodity writing/art/music/vfx, and AI inference is pretty cheap and rapidly getting cheaper.
Not pretty, but it gets the job done for the specific use cases of a given business.
Real production code doesn’t have a Shutterstock equivalent.
If you think most code is stock, then you just haven’t had enough experience in industry yet.
Just another tool in the kit.
What it may do is change the job requirements. Web/JS has decimated (reduced by 90% or more) MFC C++ jobs after all.
The programmer doesn't just write Python. That is the how... not the what.
Once ChatGPT can even come close to replacing a junior engineer, you can retry your claim. The progression of the tech underlying ChatGPT will be sub-linear.
For "us", having such a level of intelligence available as an assistant throughout the day is a massive life upgrade, if we can just afford more tokens.
Then I see contrarians claiming that LLMs are literally never useful for anyone, and I get "don't believe your lying eyes" vibes. At this point, such sentiments feel either willfully ignorant, or said in bad faith. It's wild.
I feel exactly the same, but in the opposite direction.
As someone who’s been programming for 17 years and working professionally for 10, I’m unable to get any huge productivity boosts from AI tools. They’re better than Google+Stack Overflow for asking random questions in a specific context, and they’re good for repetitive, but not identical, syntax. That’s about where the gains end for me.
Maybe at this point I’m just so fast about looking up documentation. Maybe the languages/problems I’m facing aren’t well represented in the training data, but I just don’t see this amazing advancement.
I’d really love to see, live, someone programming who really gets these big productivity gains.
I would speculate it's a productivity boost for programmers specifically working in areas that they are new to (or haven't really mastered yet). One question I have is whether overly relying on LLMs will reduce the ability to master a domain, and thus hurt your long-term skill. It might seem silly, like complaining that no one knows assembly anymore because of compilers, but I think it's different than just another layer of abstraction.
It kept generating annoyingly wrong code. Things with subtly wrong or misleading names, missing edge cases, ignoring immediate same-file context etc. I found that it slowed me down so I turned it off.
Makes me wonder if people who don't like Copilot output will not like my natural output as well.
Could you share any code on GitHub (or pastebin or whatever) that you wrote with the help of AI?
Or could you share what kind of experience you have with programming (how many years, what domain you work in, etc)
I have around 10+ years of professional experience, although I did on/off hobby coding before that, starting about 15 years ago.
It's mostly API endpoints, calling a database, third party APIs, data transformation, aggregation type of things.
Then either UI according to what designers provide or whatever I want to do for my side projects.
I think it's of course a wildly bigger productivity multiplier for side projects, since then it's mostly about typing things out: you know exactly what you want to do, and being a little off doesn't matter.
I don't want to share any of my actual code right now, but one example is a React component that needs to fetch some sort of data, e.g. using @tanstack/react-query. It does the loading handling and error handling boilerplate for me, some of which I change to what I specifically need for that situation, but I need very few keystrokes myself to get the initial boilerplate out that I then edit, and during edits it of course also gives me decent suggestions. And it will create the component prop types based on the args I pass to the component etc.
Then with backend, it's really good at data transformations. E.g. combining different datasets, reducing etc.
How well it picks the correct libraries and patterns depends on the project and I think how much I've navigated around, I'm not fully sure how the context is exactly passed, so usually I will feel it out and adapt code where necessary.
At my job we have this pretty clean SOA type architecture backed by a mongo db. Copilot has trouble building the more complicated, domain specific queries on its own, I’ve found.
I do occasionally ask chatgpt how to write a certain query in a general case and apply that to what I’m writing. I also don’t really like mongosh’s docs.
For Rust it failed spectacularly. So bad that it's not worth discussing lol
It’s autocomplete++, except without knowledge of the rest of my codebase.
While I don't doubt that there's at least one person that has said this, what you're saying doesn't conflict with the things I and many others in the "skeptic" camp have said. LLMs are useful for a very specific set of tasks. The tasks you've listed are a tiny sliver of all the tasks that AI could potentially be doing. Would it be a good idea to consult an LLM if your mother is passed out on the floor? Probably not. The problem I have is with extrapolating from the current successes to conclude that many more tasks will be done by AI in five years.
I personally did find some use cases for it and it does a decent job of cutting out minor gruntwork for me. But the experience itself screams to me that whatever gains I'm feeling I'm getting are all in my head.
Yes, to me LLM is exactly like this: from nano to vim.
It's just that every time I use nano it's (a) unintentional, as it's opened via EDITOR; (b) sort-of coerced, because most distros installing it by default also think it's somehow too much to install Vim or Emacs alongside it; and (c) extremely painfully awkward, because all other editors I use, I've invested at least a couple of years of practice into.
If I spent a year using nano every day, and if I evolved a config file and read the manual during that time, I might eventually reach a place where using nano didn't feel cumbersome and irritating, but why would I do that if I already use Emacs and Vim every day? If I learn a 'new' editor it's going to be something extensible that I could see myself programming in every day: Emacs without evil; or one of the newer modal editors with a reversed sentence order, like kakoune and Helix; or, hell, VSCode.
So nano is likely doomed to remain forever cumbersome and irritating for me, somewhere on the level of typing on a touchscreen instead of a real keyboard.
1) is all about experimenting, which is what Tao is doing.
Having a playful and open minded attitude is like 80% of the game
They just don't have the background, and probably lack the interest to dedicate a few years of study to get to that level.
Intelligence is probably a distant third.
Incorrect. A university degree shows a good work ethic, a certain character and an ability to manage time. It's not a measure of being better than the rest of humanity. Also, it's not a good measure of intelligence, even if you only want to view the world through credentials: academics don't consider your intelligence until you have a Ph.D and X years of work in your field. Industry only uses a degree as an entry requirement for junior roles and then cares only about your years of experience after that. Given that statement I can only assume you haven't been to university. You are mistaken to think, especially in the time we are in now, that the elite class are any more knowledgeable than you are.
LLMs are good for mediocre poems and presidential speeches that have no shame.
Let’s evaluate the correctness of Thewanderer’s argument in detail:
In summary, Thewanderer’s argument is correct in several key aspects: These points collectively support a well-rounded and accurate perspective on intelligence and capability.
Please could you share your prompt or a link to the conversation?
I'm genuinely puzzled that you're more interested in doubling down and justifying yourself and making new points (different from what I initially presented) than understanding the other person's point of view.
If you share your prompt, I'll have a better understanding of your motivations and whether you are arguing in good faith.
As far as silly games go: if you honestly believe a game is silly, you shouldn't play it, unless you want to win silly prizes.
We won't have AGI or ASI, whatever definition people have for those terms, in the next 5 - 10 years. But I would often like to refer to AI as Assisted or Augmented Intelligence. And it will provide enough value to drive current computer and smartphone sales for at least another 5 - 10 years, or 3-4 cycles.
Average Joe can't do anything like that yet, both because he won't be as good at prompting the model, and because his problems in life aren't text-based anyway.
Usually Alexa will order 10,000 rolls of toilet paper and ship them to my boss when I ask it to turn on the bathroom fan.
Personally, though, the utility of this level of skill (beginner grad in many areas) for me is in areas I have undergraduate questions in. While I literally never ask it questions in my field, I do for many other fields I don’t know well, to help me learn. Over the summer my family traveled and I was home alone, so I fixed and renovated tons of stuff I didn’t know how to do. I wore a headset and had the voice mode of ChatGPT on. I just asked it questions as I went and it answered. This enabled me to complete dozens of projects I didn’t know how to even start otherwise. If I had had to stop and search the web and sift through forums and SEO hellscapes, and read loosely related instructions and try to synthesize my answers, I would have gotten two rather than thirty projects done.
You certainly shouldn't think of it like having access to a graduate student whenever you want, although hopefully that's coming.
Wolfram Research is a profitable company btw
EDIT: Looks like I hurt someone's feelings by killing their unicorn. It was going to happen sooner or later, and pretending isn't very constructive. In fact, pretending this technology is reliable is a very risky thing to do.
I've been saying this for quite some time now, but some people are in for a very rude awakening when the SOTA models 5-10 years from now are able to completely replace senior devs and engineers.
Better buckle up, and start diversifying your skills.
The grad students write the prompts, correct the model, and all of that is fed into a "more advanced" model. It's corpora of text. Repeat this for every grade level and subject.
Ask the model that's being trained on chemistry grad level work a simple math question and it will probably get it wrong. They aren't "smart". It's aggregations of text and ways to sample and then predict.
The key isn’t whether these things are smart or not. The key is that they put something that can answer basic grad level questions on almost any subject. For people that don’t have a graduate level education in any subject this is a remarkable tool.
I don’t know why the statement that “wow this is useful and a remarkable step forward” is always met with “yeah but it’s not actually smart.” So? Half of all humans have an IQ less than 100. They’re not smart either. Is this their value? For a machine, being able to produce accurate answers to most basic graduate level questions is -science fiction- regardless of whether it’s “smart.”
The NLP feat alone is stunning, and going from basically one step above gibberish to “basic grad school” in two years is a mouth dropping rate of change. I suspect folks who quibble over whether it’s “real intelligence” or simply a stochastic parrot have lost the ability to dream.
Maybe my RLHF work does make it harder for me to dream, but I teach models math which means a lot of prompt writing, and yet I have not found a way to have the model teach me math I don’t know yet (and there’s a lot I don’t know). It’s fun to play around with, but I still gravitate toward the isolated texts, not the aggregation as too much is lost or averaged in my opinion/experience. But hey maybe I’m overtrained on the traditional learning methods.
Not only that, it also helped me reimagine and conceptualize a new measure of statistical dependency based on Jensen-Shannon divergence that works very well. And it came up with a super fast implementation of normalized mutual information, something I tried to include in the library originally but struggled to find something fast enough when dealing with large vectors (say, 15,000 dimensions and up).
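For readers unfamiliar with the building block mentioned above, here is a minimal sketch of the standard Jensen-Shannon divergence between two discrete distributions. This is not the commenter's dependency measure (which isn't described here), just the textbook quantity it is said to build on; the function names are illustrative.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in bits. Terms where p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: symmetric and bounded in [0, 1]."""
    # Mixture distribution; wherever p_i > 0 or q_i > 0, m_i > 0,
    # so the KL terms below never divide by zero.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

if __name__ == "__main__":
    print(js_divergence([0.5, 0.5], [0.5, 0.5]))  # identical distributions
    print(js_divergence([1.0, 0.0], [0.0, 1.0]))  # disjoint support
```

Unlike plain KL divergence, this stays finite when the two distributions have different support, which is one reason it is a convenient base for similarity measures.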
While it wasn’t able to give perfect Rust code that compiled on the very first try, it was able to fix all the bugs in one more try after pasting in all the compiler warning problems from VS Code. In contrast, gpt-4o usually would take dozens of tries to fix all the many Rust type errors, lifetime/borrowing errors, and so on that it would inevitably introduce. And Claude 3.5 Sonnet is just plain stupid when it comes to Rust for some reason.
I really have to say, this feels like a true game changer, especially when you have really challenging tasks that you would be hard pressed to find many humans capable of helping with (at least without shelling out $500k+/year in compensation for).
And it’s not just the performance optimization and relatively bug free code— it’s the creative problem solving and synthesis of huge amounts of core mathematical and algorithmic knowledge plus contemporary research results, combined with a strong ability to understand what you’re trying to accomplish and making it happen.
Here is the diff to the code file showing the changes:
https://github.com/Dicklesworthstone/fast_vector_similarity/...
Bigger context is definitely helpful, but not if it comes at the expense of reasoning/analytical ability. I’m always a bit puzzled why people stress the importance of these “needle in a haystack” tests where the model has to find one specific thing in a huge document. That seems far less relevant to me in terms of usefulness in the real world.
How do you mean?
Half of writing code within a codebase, is knowing what functions already exist in the codebase for you to call in your own code; and/or, what code you'll have to change upstream and downstream of the code you're modifying within the same codebase — or even by forking your dependencies and changing them — to get what you want to happen, to happen.
And half of, say, writing a longform novel, is knowing all the promises you've made to the reader, the active Chekhov's guns, and all the other constraints you've placed on yourself by hundreds of pages or even several books ago, that just became relevant again as of this very sentence. Or, moreover, which of those details it's the proper time to make relevant again for maximum impact and proper first-in-last-out narrative bridging structure.
In both cases, these aren't really literal "needle in a haystack" stress-tests; they should properly be tests of the model's ability to perform some kind of "associational priority indexing" on the context, allowing it to build concepts into associational sub-networks and then make long-distance associations where the nodes are entire subnetworks. (Which isn't something we really see yet, in any model.)
cough
And now we have a $number we can relate, and refer, to.
For example, I asked a pretty simple question here and it got completely confused:
https://moorier.com/math-chat-1.png https://moorier.com/math-chat-2.png https://moorier.com/math-chat-3.png
(Full chat should be here: https://chatgpt.com/share/66e5d2dd-0b08-8011-89c8-f6895f3217...)
I would really like it if people checked it on a set of geometry questions and a set of analysis questions and compared the difference.
Maybe if you fine tuned it on Euclid's Elements and then allowed it to run experiments with Mathematica snippets it could check its assumptions before spouting nonsense
Attitudinally, I suspect people who have had experience supervising interns or mentoring juniors are probably those who are able to get value out of LLMs (paid ones - free ones are no good) rather than grizzled lone individual contributors -- I myself have been in this camp for most of my early career -- who don't know how to coax value out of people.
One of the most interesting aspects of this thread is how it brings us back to the fundamentals of attention in machine learning [1]. This is a key point: while humans have intelligence, our attention is inherently limited. This is why the concept behind Attention Is All You Need [2] is so relevant to what we're discussing.
My 2 cents: our human intelligence is the glue that binds everything together.
[1] https://en.wikipedia.org/wiki/Attention_(machine_learning)
[2] https://en.wikipedia.org/wiki/Attention_Is_All_You_Need
This is like when you’re being interviewed for a programming job and the interviewer explains some problem to you that it took their team months to figure out, and then they’re disappointed you can’t whiteboard out the solution they came up with in 40 minutes without access to google.
Having read what he's saying there, and with my experience, I think your characterisation is inaccurate.
And having been at the talk he gave for the IMO earlier this year he is impressed with some of the interactions, it's just that he feels that any kind of "creative spark" is still missing.
Perhaps it’s an ability to confabulate facts into the context window which are not present in the training data but which are, in the context of maths, viable hypotheses? Every LLM can generate bullshit, but maybe we just need the right bullshit?
Basically, can you provide a new perspective on solving a problem that hasn't been considered, or a new way of looking at an existing idea that unlocks a path?
He does have some answers, such as “human creativity is the ability to create an infinite range of outputs from a finite range of inputs that nonetheless pertain to our motivations/context in some useful way”, but that’s obviously not a very satisfying answer. It tells us a little — I think Tao is gesturing to exactly this when he complains that GPTo1 can only apply and combine mathematical approaches within a sort of closed domain rather than propose radically new ones - but it’s not helpful for an everyday understanding of creativity. IMO :)
In his words, from Language and Mind:
If this sounds intriguing to you/anyone, I highly recommend his (in)famous debate with Foucault, which is available for free on YouTube. It’s a bit wandering, but about halfway through they discuss creativity in depth, contrasting Foucault’s vaguely postmodern view (that human creativity is mostly constrained by societal circumstances) with Chomsky’s view, that human creativity is mostly constrained by the natural structures of our cognitive system(s). https://youtu.be/3wfNl2L0Gf8?si=WA3DpnaFEvd3QqCt
This way, it was possible to simply replace the element with the last array element, then decrease the size of the array by one. I'd say that's pretty creative: whoever came up with this was able to identify what can be traded off to make the previously impossible, possible, unlocking new scales and possibilities.
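A minimal sketch of the trick described above, assuming a plain Python list stands in for the array; `swap_remove` is an illustrative name, not taken from the original code being discussed. Giving up element order is what turns removal from O(n) into O(1).

```python
def swap_remove(items, index):
    """Remove and return items[index] in O(1); element order is NOT preserved."""
    if not 0 <= index < len(items):
        raise IndexError("swap_remove index out of range")
    last = items.pop()          # shrink the array by one
    if index == len(items):     # the removed element was the last one
        return last
    removed = items[index]
    items[index] = last         # fill the hole with the former last element
    return removed

if __name__ == "__main__":
    data = [10, 20, 30, 40]
    print(swap_remove(data, 1), data)
```

An ordered removal has to shift every later element down by one slot; here only a single slot is rewritten regardless of the array's size.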
In practice, I'd say creativity is often being able to manifest people's qualia in some unprecedented way. For example, say you're experimenting in your DAW, and discover a pretty cool sound. You identify the ways it can be used to emote and then utilize it in a work. If you really stumbled upon a sound that a lot of people find as emotive as you did, you just did something creative: it's as if you translated the qualia of an emotion into sound.
This qualia to manifestation is what's behind creativity in all of senses of the word I believe. In my previous example, discovering that orderedness is not actually a strict requirement, and (ab)using that to significantly alter the scaling of such an action is creative, because it undoes the notion that orderedness is a requirement. It goes against what's natural, but in a way that becomes extremely natural and indispensable once realized.
I think, in that way, current AIs are trained to be uncreative, since being creative inherently requires experimentation that is unaligned with the normal.
In fairness, that is an extremely standard trick so it's reasonably unlikely that the author came up with it themselves.
You'd have to invent some new domain I guess and see if it could be creative within that domain. Difficult to think of a good test though.
There's no need to try to infer this kind of high bar, because what he says is actually very specific and concrete: "Here the result was mildly disappointing ... Essentially the model proposed the same strategy that was already identified in the most recent work on the problem (and which I restated in the blog post), but did not offer any creative variants of that strategy." Crucially the blog post in question was part of his input to ChatGPT.
Otherwise, he's been clear that while he anticipates a future where it is more useful, at present he only uses AI/ChatGPT for bibliography formatting and for writing out simple "Hello World" style code. (He is, after all, a mathematician and not a coder.) I've seen various claims online that he's using ChatGPT all the time to help with his research and, beyond the coding usage, that just seems to not be true.
(However, it's fair to say that "able to help Terence Tao with research" is actually a high bar.)
It doesn't really get logic still, but it does small edits well when the code is very clear.
I think this will always remain a problem. Because it can never shut up, it keeps making stuff up and "hallucinating" (works normally, just incorrectly), digging itself further and further into a hole.
Autocomplete on steroids is what peak AI will look like till the time we can crack consciousness and AGI (which the modern versions are nothing even close to).
If arguably the person with the highest IQ currently living is impressed, but still not fully satisfied because a computer doesn’t give Nobel prize winning mathematical reasoning, I think that’s a massive metric in itself.
So what then should the first year maths PhD think? I believe Tao obliquely addresses this with his previous post with effectively “o1 is almost as good as a grad student”
No offense, but every part of this characterization is really unserious! He says "the model proposed the same strategy that was already identified in the most recent work on the problem (and which I restated in the blog post), but did not offer any creative variants of that strategy." That's very different from what you're suggesting.
The way you're talking, it sounds like it'd be actually impossible for him to meaningfully say anything negative about AI. Presumably, if he was directly critical of it, it would only be because his standards as the world's smartest genius must simply be too high!
In reality, he's very optimistic about it in the future but doesn't find it useful now except for basic coding and bibliography formatting. It's fascinating to see how this very concrete and easily understood sentiment is routinely warped by the Tao-as-IQ-genius mythos.
Are you arguing that I’m making an appeal to authority fallacy?
AIs have no emotional barrier to wasted effort, which makes them better reasoners than their innate ability would suggest.
I wasn't amazing at maths research (did a PhD and post-doc and then gave up) but my experience was that it was partly thinking hard about things and grappling with what was going on and trying to break it down somehow, but also scanning everything you know related to the problem, trying to find other problems that resemble it in some way that you can steal ideas from etc.
*(I remember a specific impressive example from 6 months ago: I asked if certain definitions could be relaxed to allow complex analysis on a non-orientable manifold, like a Klein bottle, something I spent a lot of time puzzling over, and an LLM instantly figured out it would make the Cauchy-Riemann equations globally inconsistent. (In a sense the arbitrary sign convention in CR defines an orientation on a manifold: reversing manifold orientation is the same as swapping i with -i. I understand this now, solely because an LLM suggested looking at it). Of course, I'm sure this isn't original LLM thinking—the math's certainly written down somewhere in its training material, in some highly specific postgraduate textbook I have no knowledge of. That's not relevant to me. For me, it's absolutely impossible to answer this type of question, where I have very little idea where to start, without either an LLM or a PhD-level domain specialist. There is no other tool that can make this kind of semantic-level search accessible to me. I'm very carefully thinking how best to make use of such an, incredibly powerful but alien, tool...)
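The orientation remark can be made concrete with a short worked computation. This is a sketch in one pair of charts only, not a full treatment of the Klein bottle; the primed coordinate is notation introduced here for illustration.

```latex
% Cauchy-Riemann equations for f = u + i v in a chart with coordinates (x, y):
\[
  \frac{\partial u}{\partial x} = \frac{\partial v}{\partial y},
  \qquad
  \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}.
\]
% Orientation-reversing change of chart y' = -y, so \partial_y = -\partial_{y'}:
\[
  \frac{\partial u}{\partial x} = -\frac{\partial v}{\partial y'},
  \qquad
  \frac{\partial u}{\partial y'} = \frac{\partial v}{\partial x}.
\]
% These are the Cauchy-Riemann equations with i replaced by -i,
% i.e. f is antiholomorphic in z' = x + i y'.
```

On a non-orientable surface some transition map must reverse orientation, so the same function would have to be holomorphic in one chart and antiholomorphic in an overlapping one. That is the global inconsistency the comment refers to.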
Here's @tao on mathstodon saying he's learning it.
https://mathstodon.xyz/@tao/111206761117553482
You can literally learn how to write proofs using Lean: https://djvelleman.github.io/HTPIwL/
Writing nontrivial proofs in a theorem prover is a different beast. In my experience (as someone who writes mechanized mathematical proofs for a living) you need to not only know the proof very well beforehand, but you also need to know the design considerations for all of the steps you are going to use beforehand, and you also need to think about all of the ways your proof is going to be used beforehand. Getting these wrong frequently means redoing a ton of work, because design errors in proof systems are subtle and can remain latent for a long time.
What do you mean by that? I don't know much about theorem provers, but my POV would be that a proof is used to verify a statement. What other uses are there one should consider?
One common example is if you're going to internalize or externalize a property of a data structure: eg. represent it with a dependent type, or a property about a non-dependent type. This comes with design tradeoffs: some lemmas might expect internalized representations only, some rewrites might only be usable (eg. no horrifying dependent type errors) with externalized representations. For math in particular, which involves rich hierarchies of data structures, your choice about internalization can affect which structures from your mathematical library you can use, or the level of fragile type coercion magic that needs to happen behind the scenes.
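As a toy illustration of the internalized/externalized distinction, here is a sketch in Lean 4. The names `headExt`, `headInt` and `headInt_eq_headExt` are hypothetical, and real developments involve far richer structures than non-empty lists.

```lean
-- Externalized: a plain list, with the property passed as a separate hypothesis.
def headExt : (l : List Nat) → l ≠ [] → Nat
  | x :: _, _ => x
  | [], h => absurd rfl h

-- Internalized: the property is baked into the type via a subtype.
def headInt (xs : { l : List Nat // l ≠ [] }) : Nat :=
  headExt xs.val xs.property

-- A lemma stated for one representation needs a bridge to apply to the other.
theorem headInt_eq_headExt (l : List Nat) (h : l ≠ []) :
    headInt ⟨l, h⟩ = headExt l h := rfl
```

Here the bridge is trivial, but every lemma in a library is written against one of the two shapes, so picking the wrong one early means writing many such bridges later.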
This is also the case for other top-profile mathematicians like Peter Scholze. Good luck to someone who wants to put chatgpt answers to random hypotheticals into lean to see if they're right, I don't think they'll have so easy a time of it.
I think in just a few months the average user will not be able to tell the difference in performance between the major models.
But on the other hand it misses important detail and hallucinates, just like GPT-4o. And can need a lot of hand holding and correction to get to the right answer, so much so that sometimes you wonder if it would have been easier to just do it yourself. Only this time it's worse because you're waiting 20-60 seconds for an answer.
I wonder if what it excels at is just the stuff that I don't need it for. I'm not in classic STEM, I'm in software engineering, and o1 isn't so much better that it justifies the wait time (yet).
One area I haven't explored is using it to plan implementation or architectural changes. I feel like it might be better for this, but need the right problems to throw at it.
[0] https://www.nytimes.com/games/connections
[1] https://chatgpt.com/share/66e40d64-6f70-8004-9fe5-83dd3653a5...
I've tried a variety of ways to ask various LLMs to help solve this. Finally with access to ChatGPT o1-preview I was able to get a good answer. The first answer was wrong, but with a little more prompting and clarification I was able to get the answer I wanted to relate the positions of P0, P1, P2 and P3 so that a Bézier curve could be G3. This isn't something that is unknown because there are many CAD programs which can do this already, but I had not been able to find the answer I was looking for in a form that was useful to me.
I don't really know where that puts o1-preview relative to a math grad student, but after spending tons of time over a couple years on this pet project, getting an answer from a chat bot was one of the more magical moments I've had with technology in a long time.
Coming from Terence Tao that seems pretty remarkable to me?
Any other takes by mathematicians out there?
the best competitive programmer in the world (gennady korotkevich, aka tourist) recently crossed the 4000 ELO barrier in Codeforces. o1 is about 1807 ELO.
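To put the rating gap above in perspective, here is a rough sketch using the standard Elo expected-score formula. Treating Codeforces ratings as plain Elo is an assumption made for illustration (its rating system is Elo-like but not identical), and `elo_expected_score` is an illustrative name.

```python
def elo_expected_score(rating_a, rating_b):
    """Expected score of player A against player B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

if __name__ == "__main__":
    # Ratings quoted in the comment: o1 at about 1807, tourist above 4000.
    print(elo_expected_score(1807, 4000))
    print(elo_expected_score(4000, 1807))
```

Under this model a gap of roughly 2200 points corresponds to an expected score for the lower-rated side on the order of a few in a million, so the two ratings describe very different levels of play.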
the best ai model is compared against the best human in the context of competition programming, to set a clear standard of comparison.
similarly, terence tao represents the highest levels of math in analysis. his input is valuable in regards to math. his summary of the current capabilities of o1 is important because we can then understand the level of competence the best ai models have right now, and set a standard of comparison just like with coding.
side note: any number of phds = not the same expertise. there are thousands of phds who graduate every year, let alone thousands of unemployable phds who fail to get a professorship.
there are only 2-4 fields medalists chosen every 4 years.
(* I think one could very legitimately view him as the top researcher in harmonic analysis in the world - he is a great mathematician - but it's not clear to me how people go from that to Epochal Genius and his extreme celebrity status across STEM)
Math grad students everywhere now have a benchmark to determine if Terry Tao considers them to be mediocre or incompetent.
Just off the top of my head, maybe a RLHF run performed by academic experts and geared towards “creative applications” could get us farther than we are? Given how much the original RLHF run cost with underpaid workers in developing countries that might be exorbitantly expensive, but it’s worth a dream. Perhaps as a governmental or NGO-driven open source initiative…
Of course, a core problem here is defining “creativity” in stringent — or in Chomsky’s words, “scientific” — terms. RLHF dodged that a bit by leaning on the intuitive capabilities of your human critics. I’m constantly opining about how LLMs solved the frame problem, but perhaps it’s better characterized as a partial solution for a relatively easy/basic environment: stories about the real world. The Abstract/Academic/Scientific Frame Problem might be another breakthrough away, yet…
If you know the contours of the answer and can describe what you are looking for it can quickly find it for you.
So we've been up for over 7 years, and have just over 19K accounts.
If they are overly optimistic, perhaps it would be good to hear the opinions of Wiles and Perelman.
https://www.scientificamerican.com/article/ai-will-become-ma...
> If you want to prove an unsolved conjecture, one of the first things you need to do is to break it up into smaller pieces, each of which has a better chance of being proven. But you will often break up a problem into harder problems. It’s very easy to transform a problem into one that’s harder than into one that’s simpler. And AI has not demonstrated any ability to be any better than humans in this regard.
Not sure if o1 changed his mind tho.
Which is going to zero if the optimistic predictions are correct, so the optimistic professors should warn their students.
I understand the motivation for pure math quite well. It is about beauty, understanding things and discovering things for oneself. If computers do the work, the discovery part is gone and pure math is ruined.
For the non-research part, the AI zealots will want to replace all human labor with software.
isn't this development obviously going to result in the deprecation of the value of the human intellect to near-zero? which is the thing that virtually all people on this platform base their livelihood on?
there's such a deafening silence around this topic on the internet where there should be - i don't know what but not this silence. we don't know what to do right? and we're avoiding this topic.
with this version they broke the assumed wall of llm development, which was the last copium that we could believe in. the wall is broken and now it's just a matter of time until your capacity to think will be completely unneeded. the machine will do it more accurately and more quickly by orders of magnitude.
am I a doomer? I was in the home country of my parents recently, that is completely dysfunctional and war is on the verge of breaking out. what I learned there is that humans stay ignorant of great dangers until the very moment in which it affects them. this must've been the case with all the great wars that we've had. the water is rising but until I start to suffocate I don't agree to see it. i make up copes, or I think some are going to drown but I'm safe, or I distract myself.
what are all the software engineers here thinking? what's your cope for this? or are we all freezing in shock right now? this o1 is solving problems that i know many of my colleagues can never solve. what are we hoping for I think? I don't have a future because my future was the image that I had of it. and no image of the future that would be nice to keep around seems plausible at this point.
Generally these folks have all said their piece and are tired of talking about it every time LLMs come up -> silence (as in nothing more to say) as each group is self convinced and most don't necessarily feel the need to get 100% of folks on board with their view. The dystopia or "doomer" group are the main ones left feeling like they need more of an answer, the rest move on quietly in either excitement or disinterest.
As LLMs continue to improve I feel like anyone making a living doing the "99% perspiration" part of intellectual labor is about to enter a world of hurt.
And you thought you had imposter syndrome before!
There are exceptions of course, but that's how the bulk of businesses, especially those with stupid ideas, are funded. In the latter category success does not even matter, the trust fund baby just has to have the appearance of a leadership position.
Obviously yes, but admitting it may not be the right move.
I have a related list of GPT accomplishments here: https://docs.google.com/spreadsheets/d/1kc262HZSMAWI6FVsh0zJ...
There’s at least a “complexity” if not a “problem” in terms of judging models that to a first approximation have been trained on “everything”.
Have people tried putting these things up against serious mathematical problems that are well studied? With or without Lean hinting, has anyone gotten, like, the Shimura-Taniyama conjecture/proof out?
No FLT yet, but as someone who was initially quite skeptical, I’m starting to be convinced!
Anyway, I think five years ago I was skeptical that ML would even get to the point of being able to solve competition problems, and I was proven wrong, so my priors have been updated.
Appreciate the no fucks given categorization of grad students.
Is the most important part imo. A big goal should be some ai system coming up with its own discovery and ideas. Really unclear how we can get from the current paradigm to it coming up with something like general relativity, like Einstein. Does it require embodiment?
It also seems like one of those things where we ought to ask whether we should, before asking whether we could. Why not focus on areas that are easier, more beneficial, and less problematic from a "should" perspective?
Find a, b, c distinct positive integers satisfying a^3 + b^3 = c^4. Hint: try dividing all sides by c^3, then giving values to (a/c) and (b/c).
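For anyone who wants to check the hint rather than take it on faith, here is a minimal sketch. Dividing by c^3 gives (a/c)^3 + (b/c)^3 = c, so picking integer ratios x = a/c and y = b/c fixes c = x^3 + y^3. The function name `from_ratios` and the choice x = 2, y = 3 are just assumptions for illustration, not part of the original puzzle.

```python
def from_ratios(x: int, y: int) -> tuple[int, int, int]:
    """Build (a, b, c) with a^3 + b^3 = c^4 from integer ratios x = a/c, y = b/c."""
    # Dividing a^3 + b^3 = c^4 by c^3 gives x^3 + y^3 = c.
    c = x**3 + y**3
    return x * c, y * c, c


a, b, c = from_ratios(2, 3)
# Verify the identity and that the three values are distinct.
assert a**3 + b**3 == c**4
assert len({a, b, c}) == 3
print(a, b, c)  # 70 105 35
```

Note that x and y have to differ from each other and from 1, otherwise two of the three values coincide (x = 1, y = 2 gives a = c = 9).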
Given the log scale on compute needed to improve performance, it is not guaranteed that the ratio can be improved that much in a few years.
But I was curious and I asked something very simple, Euclid's first postulate and I got this answer:
Euclid's Postulate 1: "Through any two points, there is exactly one straight line."
In fact Euclid's Postulate 1 is "To draw a straight line from any point to any point." http://aleph0.clarku.edu/~djoyce/java/elements/bookI/bookI.h...
I think the AI's answer is not correct. It may be some textbook interpretation, but I was expecting Euclid's exact wording.
Edit: Google's Gemini gives the exact wording of the postulate and then comments that this means that you can draw one line between two points. I think this is better.
Both gpt4o and o1 roughly know the correct original text, so prompting, the model’s background memory, or random chance may influence your outcomes, though hopefully (in an improved model) it should never give you incorrect info.
https://farside.ph.utexas.edu/Books/Euclid/Elements.pdf
Edit: in case it isn't clear, I could not reproduce this error on my end with o1-mini
Euclid has been translated, restated, and re-presented in enough books and textbooks that I'd expect a big-enough LLM to have actually memorized this correctly tbh
It was written before English even existed. That said, the original never implied "exactly one", so I agree it's a bad translation.
The bottom line is, you can’t take any single LLM statement at face value, even in seemingly easy to answer cases like this.
https://en.wikipedia.org/wiki/Point%E2%80%93line%E2%80%93pla...
Examples:
https://en.wiktionary.org/wiki/Euclidean_geometry
https://www.cerritos.edu/dford/SitePages/Math_70_F13/Postula...
Problems with polysemy across divergent, more advanced theories have been one of my biggest challenges in probing some of my areas of interest.
Funny enough, one of my pet areas of obscure interest, riddled basins, is constantly muddied not by math, but LSAT questions, specifically non-math content directed at a reading comprehension test: "September 2006 LSAT Section 1 Question 26"
IMHO a lot of the prompt engineering you have to do with these highly domain specific problems is avoiding the most common responses in the corpus.
LLM responses will tend to reflect common usage, not academic terminology unless someone cares enough to change that for a specific case.
Is that an accurate description? I thought it just runs the LLM for longer, and multiple times, and truncates the beginning of the output.
“The experience seemed roughly on par with trying to advise a mediocre, but not completely incompetent, graduate student.”
But yeah, given o1 exists, it looks very doable. It's hard to imagine a reason for why something matching his criteria would be more than a decade out.
It performs way better than undergrads. Funny that he didn’t point that out and only slighted it as a bad graduate student. Don’t believe me? Open the book and ask away. It’s amazing, even if it is a “mediocre graduate student”, which is far better than a good graduate student or professor who gives you no help or time for all that money you forked over.
It’s already worth the money; ignore this shitty write-up by someone who doesn’t need its help.
https://en.wikipedia.org/wiki/Hutter_Prize
It's not exactly a new conjecture that intelligence fundamentally is an act of compression.
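A toy way to see the intuition, as a rough sketch only: a compressor that has "understood" the regularities in its input needs fewer bytes to reproduce it. The snippet below uses Python's standard `zlib` as a stand-in compressor; the helper name `compression_ratio` and the sample inputs are my own assumptions, and this is not how the Hutter Prize itself is scored.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means more structure was found)."""
    return len(zlib.compress(data, 9)) / len(data)


# Highly regular input: the same sentence repeated many times.
structured = b"the quick brown fox jumps over the lazy dog. " * 200

# Pseudo-random bytes of the same length (seeded so the run is repeatable).
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(len(structured)))

print(f"structured: {compression_ratio(structured):.3f}")
print(f"noise:      {compression_ratio(noise):.3f}")
```

The repeated sentence shrinks to a small fraction of its size, while the random bytes barely compress at all, since there is no pattern to exploit. A better model of the data is, in this narrow sense, a better compressor.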