The beginning of scarcity in AI (tomtunguz.com)
dmazin 10 hours ago [-]
Constraints can lead to innovation. Just two things that I think will get dramatically better now that companies have incentive to focus on them:

* harness design

* small models (both local and not)

I think there is tremendous low hanging fruit in both areas still.

drra 10 minutes ago [-]
Absolutely. Anyone working at the inference-token level knows how wasteful it all is, especially with multimodal tokens.
com2kid 10 hours ago [-]
China already operates like this. Low cost specialized models are the name of the game. Cheaper to train, easy to deploy.

The US has a problem of too much money leading to wasteful spending.

If we go back to the 80s/90s, remember OS/2 vs Windows. OS/2 had more resources, more money behind it, more developers, and they built a bigger system that took more resources to run.

Mac vs Lisa. Mac team had constraints, Lisa team didn't.

Unlimited budgets are dangerous.

phist_mcgee 34 minutes ago [-]
Perhaps it's because American hyperscalers want unlimited upside for their capital?
cesarvarela 9 hours ago [-]
Harness is a big one, Claude Code still has trouble editing files with tabs. I wonder how many tokens per day are wasted on Claude attempting multiple times to edit a file.
lpcvoid 56 minutes ago [-]
The future is now, I guess
christkv 1 hours ago [-]
Could not agree more. My hunch is this will spur innovation in all aspects of local models.
dataviz1000 9 hours ago [-]
What do you mean by harness here?
Ifkaluva 9 hours ago [-]
When you go to the command line and type “Claude”, there is an LLM, and everything else is the harness
dataviz1000 8 hours ago [-]
I'm having a hard time getting my mind around this.

> Users should re-tune their prompts and harnesses accordingly.

I read this in the press release and my mind thought it meant a test harness. Then there was a blog post about long-running harnesses with a section about testing, which led me to a little more confusion.

Yes, the word 'harness' is consistently used in this context as a wrapper around the LLM, not as 'test harness'.

dboreham 23 minutes ago [-]
This field is chock-full of people using terms incorrectly, defining new words for things that already had well-known names, and overloading terms already in use. E.g. shard vs. partition; TUI, which already meant "telephony user interface"; "client" to mean "server" in blockchain.
codybontecou 9 hours ago [-]
pi vs. claude code vs. codex: these are all agent harnesses which run a model (in pi's case, any model) with a system prompt and their own default set of tools.
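A minimal sketch of what that wrapper looks like, with everything (function names, message format, tool set) purely illustrative: the harness owns the loop, the system prompt, and the tool dispatch, while the model only picks the next action.

```python
def run_tool(name, arg):
    """Dispatch a tool call requested by the model (stub tools for illustration)."""
    tools = {
        "read_file": lambda path: f"<contents of {path}>",  # stub
        "done": lambda result: result,
    }
    return tools[name](arg)

def harness(user_request, call_model, max_steps=10):
    """Everything in this function except call_model() is 'the harness'."""
    messages = [
        {"role": "system", "content": "You are a coding agent."},
        {"role": "user", "content": user_request},
    ]
    for _ in range(max_steps):
        action = call_model(messages)      # the LLM decides the next step
        if action["tool"] == "done":
            return action["arg"]           # final answer back to the user
        result = run_tool(action["tool"], action["arg"])
        messages.append({"role": "tool", "content": result})
    return None                            # step budget exhausted
```

Swapping `call_model` between providers changes the LLM; everything else is the harness the parent comment is talking about.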
wg0 9 hours ago [-]
There's another side to it too.

Whoever is running and selling their own models with inference is invested to the last dime available in the market.

Those valuations are already ridiculously high, be it Anthropic or OpenAI: easily a couple of trillion dollars combined.

All that investment is seeking return. Correct me if I'm wrong.

Developers and software companies are the only serious users because they (mostly) review output of these models out of both culture and necessity.

Anywhere else? Other fields? There, these models aren't useful, or aren't as useful, while revenue from software companies is by no means going to bring returns on trillion-dollar valuations. Correct me if I'm wrong.

To make the matter worse, there's a hole in the bucket in the form of open-weight models. When squeezed further, software companies would either deploy open-weight models or resort to writing code by hand, because that's a very skilled and hardworking tribe: they've been doing this all their lives, and whole careers are built on it. Correct me if I'm wrong.

Eventually, ROI might not be what VCs expect, and constant losses might lead to bankruptcies. All that data center build-out would suddenly be looking for someone to rent the compute capacity. The result would be dime-a-dozen open-weight model providers with generous usage tiers, capitalizing on compute whose bankrupt owners can no longer use it and want to liquidate it to recoup as much of the investment as possible.

EDIT: Typos

solenoid0937 2 hours ago [-]
OpenAI has an absurdly high valuation given their cash burn vs RRR.

Anthropic's is far more reasonable.

It makes no sense to lump these two companies together when talking about valuation. They have completely different financial dynamics

wg0 1 hours ago [-]
No matter how low and reasonable Anthropic's valuation is, I don't think $200 Max plans are going to recoup the investment plus some return on top, because the software industry is not that huge and profit margins for AI inference aren't very high either.
solenoid0937 1 hours ago [-]
Pro and Max plans are probably a drop in the bucket for them.
drra 2 minutes ago [-]
Seems like everybody and their mother is using Max plans these days. I wouldn't be surprised if the LTV of each customer was big enough to justify the spending.
christkv 1 hours ago [-]
It feels like a repeat of the dot-com infrastructure buildup that spurred the whole 2005 explosion in affordable hosting and new companies. This will probably leave us with massive access to affordable compute in a couple of years.
piokoch 13 minutes ago [-]
Well, it's in the books. O(n^2) algorithms are bad in the long run, and transformer attention has exactly that complexity in the sequence length, so it's not a big surprise we hit the limits.
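A toy illustration of where the quadratic term lives: the attention score matrix has one entry per pair of tokens, so doubling the context quadruples the work. The NumPy sketch below is a bare single-head attention for illustration, not any particular model's implementation.

```python
import numpy as np

def attention(Q, K, V):
    """Single-head scaled dot-product attention over n tokens of width d."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])   # shape (n, n): the O(n^2) term
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)        # row-wise softmax
    return w @ V                              # another n^2 * d of work

def attention_flops(n, d):
    """QK^T and weights @ V are each roughly n^2 * d multiply-adds."""
    return 2 * n * n * d
```

Going from a 2k to a 4k context quadruples `attention_flops`, which is the scaling the comment is pointing at.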
henry2023 10 hours ago [-]
The US is bound by energy and China is bound by compute power. The one who solves its limitation first will end this “Scarcity Era”.
jakeinspace 9 hours ago [-]
China is installing something like 500 GW of wind and solar per year now. Even if they're only able to build and otherwise access chips that have half the SoTA performance per watt, they will win.
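The back-of-envelope behind that claim, with purely illustrative numbers (not real deployment figures): throughput is power times efficiency, so a 2x deficit in performance per watt is exactly offset by a 2x advantage in deployed power.

```python
def tokens_per_sec(power_watts, tokens_per_joule):
    """Sustained inference throughput is just power x energy efficiency."""
    return power_watts * tokens_per_joule

# Illustrative only: half the chip efficiency, twice the grid power -> a wash.
side_a = tokens_per_sec(power_watts=1e9, tokens_per_joule=10)  # efficient chips
side_b = tokens_per_sec(power_watts=2e9, tokens_per_joule=5)   # 2x the power
```

Under this simple model, whoever can grow the power term fastest dominates once chip efficiency is within a small constant factor.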
odo1242 9 hours ago [-]
Performance per dollar may be more important than performance per watt here, though
thelastgallon 2 hours ago [-]
A dollar is an entirely fictional unit and trillions of it can be manufactured at no cost, while watts are constrained by the laws of physics, photons/electrons, supply chain of electricity and all that fun stuff in the real world.
thelastgallon 2 hours ago [-]
US energy is constrained by the utility monopolies/oligopolies which have to extract more rents, specifically by increasing costs. Their profit is a percentage of cost, these perverse incentives + oligopolies will make it increasingly expensive to make anything (including AI) in US.
hvb2 27 minutes ago [-]
Or simply by the fact that increasing production takes time? Any power plant takes years to build.

Years is like a lifetime for AI at this point...

CuriouslyC 9 hours ago [-]
The dynamics vastly favor China. Part of the reason the US sprinting toward "ASI" isn't totally boneheaded is that the US and its industry need a hail-mary play to "win" the game; if they play it safe, they lose for sure.
leptons 9 hours ago [-]
I'd be fine with a world without AI, honestly. Nobody really wins this race except the very wealthy. And I don't think it's really going to play out the way the wealthy think it will. It's more like a dog catching a car than it is a race.
odo1242 9 hours ago [-]
> It's more like a dog catching a car than it is a race.

What does this mean? I didn't understand the analogy.

digitalsushi 7 hours ago [-]
A car caught by a dog has no purpose. The activity concludes with no output.
leptons 2 hours ago [-]
"The dog that caught the car" refers to how dogs sometimes chase cars. Suppose the car stops and the dog catches up - what is it going to do? It has no plan, it has no purpose, it isn't going to bite the car, it isn't going to get anything out of catching the car. The car may even run it over. I intended it basically as "play stupid games, win stupid prizes", or "be careful what you wish for".
thelastgallon 2 hours ago [-]
My observation is that the dog sniffs all the tires, picks one tire, lifts one leg and does the deed. I don't know if its a way of marking territory or domination. We need a dogatologist to explain what it means.
Miraste 9 hours ago [-]
China's domestic chips are increasingly close to state-of-the-art. The US electrical grid is... not.
ttul 2 hours ago [-]
Energy scarcity will drive more innovation in local silicon and local inference. Apple will be the unexpected beneficiary of this reality.
2001zhaozhao 8 hours ago [-]
AKA, the beginning of big companies being able to roll over small companies with moar money

(note: I don't expect this to actually happen until the AI gets good enough to either nearly entirely replace humans or solve cooperation, but the long term trend of scarce AI will go towards that direction)

vessenes 10 hours ago [-]
It seems very possible that we have at least five years of real limitations on compute coming up. Maybe ten, depending on ASML. I wonder what an overshoot looks like. I also wonder if there might be room for new entrants in a compute-scarce environment.

For instance, at some point, could Coreweave field a frontier team as it holds back 10% of its allocations over time? Pretty unusual situation.

dist-epoch 9 hours ago [-]
Jensen just said that if the signal/commitments are there, ASML can scale in 2-3 years.
vessenes 5 hours ago [-]
With Anthropic buying compute in dark alleys I’d assume that day is coming..
itmitica 10 hours ago [-]
The current inference system is on a down slope.

It remains to be seen what new wave of AI system or systems will replace it, making the whole current architecture obsolete.

Meanwhile, they are milking it, in the name of scarcity.

stupefy 10 hours ago [-]
What limits LLM inference accelerators? I heard about Groq (https://groq.com/) not sure how much it pushes away the problem.
vessenes 10 hours ago [-]
ASML only makes a certain number of machines a year that can do extreme ultra-violet lithography.

Also - turbine blades limit power, according to Elon.

Between them, we cannot build chip fabs past a certain rate, and we cannot stand up the datacenters to run the desired chips past a certain rate. Different people believe one or the other is the 'true' current bottleneck. Scaling the turbine supply chain looks much more tractable; EUV is essentially the most complicated production process humans have ever devised.

andai 9 hours ago [-]
Is global compute bottlenecked by one company?
Tanjreeve 56 minutes ago [-]
Yes. At least, the manufacturing of compute is. And a lot of the chain has been bitten hard by prematurely increasing capacity in the past, so they're reluctant to add capacity at vast cost.
ls612 10 hours ago [-]
Presumably ASML can increase production if demand is high enough; the question is over what time frame. 5 years seems plausible to me, but I honestly don't know what that number is.
vessenes 10 hours ago [-]
It's ... really long, according to Dylan Patel on the Dwarkesh Podcast. The supply chain is extremely deep and complex.
juliansimioni 10 hours ago [-]
Yes. And the fab companies and their suppliers are deliberately and wisely slow to scale up production to meet short-term changes in demand. They've seen the history of the semiconductor industry and its constant boom-and-bust cycles. But they have the highest op-ex costs of anyone, so when the party's over, they are the ones who pay for it the most.
Miraste 9 hours ago [-]
If only there were some form of cheap, widely manufactured power generation technology that didn't use turbines... Are they really going to wait until 2030 to get more turbines rather than invest in solar?
czk 10 hours ago [-]
"adaptive" thinking
yalogin 9 hours ago [-]
Does this also mean ram prices are not coming down anytime soon?
i_think_so 2 hours ago [-]
> Does this also mean ram prices are not coming down anytime soon?

One person replies "yes". Another replies "no".

This concludes our press conference.

<3 HN

stronglikedan 9 hours ago [-]
they already are
dist-epoch 9 hours ago [-]
yes, and it will keep increasing
com2kid 10 hours ago [-]
To bang on the same damn drum:

Open Weight models are 6 months to a year behind SOTA. If you were building a company a year ago based on what AI could do then, you can build a company today with models that run locally on a user's computer. Yes that may mean requiring your customers to buy Macbooks or desktops with Nvidia GPUs, but if your product actually improves productivity by any reasonable amount, that purchase cost is quickly made up for.

I'll argue that for anything short of full computer control or writing code, the latest Qwen model will do fine. Heck you can get a customer service voice chat bot running in 8GB of VRAM + a couple gigs more for the ASR and TTS engine, and it'll be more powerful than the hundreds of millions spent on chat bots that were powered by GPT 4.x.
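Rough arithmetic behind the 8 GB figure (the assumptions here are mine, not the commenter's): weights-only memory for a quantized model, ignoring KV cache and activations, which add a few more gigabytes on top.

```python
def model_vram_gb(params_billions, bits_per_weight):
    """Weights-only VRAM estimate; KV cache and activations come on top."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

# Assumed example: a 7B-parameter model at 4-bit quantization needs ~3.5 GB
# for weights, leaving room in 8 GB for KV cache plus small ASR/TTS models.
chat_model = model_vram_gb(7, 4)
```

The same formula shows why fp16 (16 bits per weight) pushes the same 7B model to ~14 GB, out of reach of the 8 GB budget.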

This is like arguing the age of personal computing was over because there weren't enough mainframes for people to telnet into.

It misses the point. Yes deployment and management of personal PCs was a lot harder than dumb terminal + mainframe, but the future was obvious.

space_fountain 9 hours ago [-]
I've seen this claimed, but I'm not sure it's been true for my use cases? I should try a more involved analysis but so far open models seem much less even in their skills. I think this makes sense if a lot of them are built based on distillations of larger models. It seems likely that with task specific fine tuning this is true?
rstuart4133 5 hours ago [-]
> I've seen this claimed, but I'm not sure it's been true for my use cases?

I'd be surprised if it isn't true for your use cases. If you give GLM-5.1 and Opus 4.6 the same coding task, they will both produce code that passes all the tests. In both cases the code will be crap, as no model I've seen produces good code. GLM-5.1 is actually slightly better at following instructions exactly than Opus 4.6 (but maybe not 4.7, as that's an area they addressed).

I've asked GLM-5.1 and Opus 4.6 to find a bug caused by a subtle race condition (the race condition leads to a number being 15172580 instead of 15172579 after about 3 months of CPU time). Both found it, in a similar amount of time. Several senior engineers had stared at the code for literally days and didn't find it.

There is no doubt the models vary in performance at various tasks, but we are talking about the difference between Ferrari and Mercedes in F1. While the differences are undeniable, this isn't F1; things there take a year to change. The performance of the models from Anthropic and OpenAI literally changes day by day, often not because of the model itself but because of the horsepower those companies choose to give them on the day, or their tweaks to their own system prompts. You can find no end of posts here from people screaming in frustration that the thing that worked yesterday doesn't work today, or that they suddenly find themselves running out of tokens, or that their favoured tool is blocked. It's not at all obvious the differences between the open-source models and the proprietary ones are worse than the day-to-day ones the proprietary companies inflict on us.

frodowtf2 2 hours ago [-]
> In both cases the code will be crap, as no model I've seen produces good code.

I'm wondering if you have actually used Claude Code, because the results are not as catastrophic as you describe.

rstuart4133 1 hours ago [-]
I've used LLMs to write what seems like far too many lines of code now. This is an example that Opus 4.6, running at maximum, wrote in C:

    if (foo == NULL) {
       log_the_error(...);
       goto END;
    }
    END:
    free(foo);
If you don't know C: when the branch is taken this code calls `free(NULL)`, which in older C libraries was undefined behavior and could be a catastrophic failure. (The issue was so common that modern C defines `free(NULL)` as a no-op.) If it's difficult to trigger `foo == NULL` without extensive mocking (this is often the case), most programmers won't bother, so it won't be caught by unit tests. The LLMs almost never get unit test coverage high enough to catch issues like this without heavy prompting.

But that's the least of it. The models (all of them) are absolutely hopeless at DRYing out the code, and when they do, they turn it into spaghetti, because they seem almost oblivious to isolation boundaries, even when those are spelled out for them.

None of this is a problem if you are vibe coding, but you can only do that when you're targeting a pretty low quality level. That's entirely appropriate in some cases, of course, but when it isn't, you need heavy reviews from skilled programmers. No senior engineer is going to stomach the repeated stretches of "same but not quite" code they churn out.

You don't have to take my word for it. Try asking Google "do llm's produce verbose code".

random_human_ 11 minutes ago [-]
Is foo a pointer in your example? Is free(NULL) not a valid operation?
com2kid 9 hours ago [-]
What are you trying to do?

Write code? No. Use frontier models. They are subsidized and amazing, and they get noticeably better every few months.

Literally anything else? Smaller models are fine. Classifiers, sentiment analysis, editing blog posts, tool calling, whatever. They can go through documents and extract information, summarize, etc. When making a voice chat system a while back, I used a cheap open-weight model and just asked it "is the user done speaking yet?" by passing in transcripts of what had been spoken so far. That was two years ago, with a crappy, cheap small model. Be creative.
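A hedged sketch of that "is the user done speaking" trick: classify the rolling transcript with a small local model behind an OpenAI-compatible chat endpoint (for example a llama.cpp server). The URL, model name, and prompt wording here are placeholders, not what the commenter actually used.

```python
import json
import urllib.request

def build_turn_prompt(transcript: str) -> list:
    """Frame the transcript as a binary classification question."""
    return [
        {"role": "system",
         "content": "Answer only YES or NO: has the speaker finished "
                    "their thought, or are they mid-sentence?"},
        {"role": "user", "content": transcript},
    ]

def parse_turn_reply(text: str) -> bool:
    """Treat anything starting with YES as 'user is done'."""
    return text.strip().upper().startswith("YES")

def user_is_done(transcript,
                 url="http://localhost:8080/v1/chat/completions"):
    """Ask a local OpenAI-compatible endpoint (placeholder URL/model)."""
    body = json.dumps({"model": "local", "max_tokens": 2,
                       "messages": build_turn_prompt(transcript)}).encode()
    req = urllib.request.Request(url, body,
                                 {"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)["choices"][0]["message"]["content"]
    return parse_turn_reply(reply)
```

Because the reply is constrained to a couple of tokens, even a small quantized model answers fast enough to gate the voice pipeline.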

I wouldn't trust them to do math, but you can tool call out to a calculator for that.

They are perfectly fine at holding conversations. Their weights aren't large enough to have every book ever written contained in them, or the details of every movie ever made, but unless you need that depth and breadth of knowledge, you'll be fine.

space_fountain 8 hours ago [-]
I just mean: is the claim true that the open-source models are where the closed models were six to twelve months ago? They do seem to be for some specific tasks, which is cool, but they seem even more uneven in skills than the frontier models. They're definitely useful tools, but I'm not sure they're a match for frontier models from a year ago.
com2kid 7 hours ago [-]
Frontier models from a year ago had issues with consistent tool calling; instruction following was pretty good but could still go off the rails from time to time.

Open weight models have those same issues. They are otherwise fine.

You can hook them up to a vector DB and build a RAG system. They can answer simple questions and converse back and forth. They have thinking modes that solve more complex problems.
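A minimal sketch of that vector-DB + RAG pattern. The bag-of-words `embed()` is a stand-in so the example is self-contained; a real system would use a sentence-embedding model and an actual vector database.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding' standing in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=1):
    """Rank stored chunks by similarity to the query; a vector DB does this at scale."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def rag_prompt(query, docs):
    """Prepend the retrieved chunks to the question for the local model."""
    context = "\n".join(retrieve(query, docs, k=2))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

The point of the pattern is that the small model never needs the knowledge in its weights; it only has to read the retrieved context, which is exactly the kind of task the parent says open-weight models handle fine.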

They aren't going to discover new math theorems but they'll control a smart home and manage your calendar.

dist-epoch 9 hours ago [-]
Buy new Macs from where? There is a shortage of RAM, SSD, GPUs, and the CPU shortage just started.
byyoung3 9 hours ago [-]
distillation is an equalizing force
isawczuk 10 hours ago [-]
It's artificial scarcity. LLM inference will soon be a commodity, like cloud compute.

There are still 2-3 years before ASIC LLM inference catches up.

observationist 10 hours ago [-]
The problem with this idea is that someone can, and likely will, come up with a new architecture that leapfrogs the current frontier models at least once a year, likely faster, for the foreseeable future. This means that by the time you've manufactured your LLM on an ASIC, it's 4-5 generations behind, and probably much less efficient than the current SOTA model at scale.

It won't make sense for ASIC LLMs to manifest until things start to plateau, otherwise it'll be cheaper to get smarter tokens on the cloud for almost all use cases.

That said, a 10 trillion parameter model on a bespoke compute platform overcomes a lot of efficiency and FOOM aspects of the market fit, so the angle is "when will models that can be run on an asic be good enough that people will still want them for various things even if the frontier models are 10x smarter and more efficient"

I think we're probably a decade of iteration on LLMs out, at least, and the entire market could pivot if the right breakthrough happens - some GPT-2 moment demonstrating some novel architecture that convinces the industry to make the move could happen any time now.

vessenes 10 hours ago [-]
I don't think so. GB200 prices are GOING UP. A100s are still expensive. This implies massive utilization and demand, no? These machines are not sitting idle, or prices would drop in the very competitive hyperscaler environment.
Morromist 9 hours ago [-]
Hard to say at this point. I'm sure you can run your LLM chips 24/7 for training and for the public to make weird thirst-trap videos about Judy Hopps but how real is the utilization and demand, really? Maybe very real, maybe not, I don't think we can know yet.

It's like being back in 1850 and you build the world's first amusement park, where the rides are free or very cheap. People say "amusement parks are the next big thing since steamboats!" and tons of other rich people start to build huge amusement parks everywhere. The people who are skilled at making amusement park rides will raise their prices, and since the first amusement parks are free so they can get the public going to them, demand will be huge.

But how sustainable is that? Well, obviously we know from history that amusement parks did, in fact, take over the world, and most people spent virtually all their time and money at amusement parks (I think the Crimean War was even fought over some religious theme park in Israel) until moving pictures came out. So it worked out for them. But for AI?

mattas 10 hours ago [-]
This notion that "we don't have enough compute" does not cleanly reconcile with the fact that labs are burning cash faster than any cohort of companies in history.

If I am a grocery store that pays $1 for oranges and sells them for $0.50, I can't say, "I don't have enough oranges."

FloorEgg 10 hours ago [-]
There is a major logic flaw in what you're saying.

'If I am a grocery store that pays $1 for oranges and sells them for $0.50, I can't say, "I don't have enough oranges."'

How about: "if I'm a grocery store and I see no limit on demand for oranges at $0.50 but they currently cost me $1, I can say that if oranges were cheaper I could sell orders of magnitude more of them."

Buying oranges for $1 and selling them for $0.50 is an investment in acquiring market share and customer relationships, and a gamble on the price of oranges falling in the future.

0x3f 10 hours ago [-]
> acquiring market share and customer relationships

The whole setup rests on this, and it seems mythical to me. These guys have basically equivalent products at this point.

earthnail 10 hours ago [-]
If there were more oranges you’d pay less to buy them and your economics would work out.
0x3f 10 hours ago [-]
Not sure if this is a joke or not, but competitive pressure still exists. This only really holds if you're the only orange seller.
TeMPOraL 9 hours ago [-]
You can if you're exhausting the global production of oranges.
vessenes 10 hours ago [-]
You misunderstand.

"I built a ship to go to the Indies and bring back tea."

"Bro, the ship cost 100,000 pounds sterling and only brought back 50,000 pounds of tea. I don't care if you paid 12,500 pounds for the tea itself, you're losing money."

There is a very rational reason labs are spending everything they can get on more compute right now. The tea (inference) pays 60%+ margins, and that is rising. And that number is AFTER hyperscalers make their margins. There is an immense amount of profit floating around this system, and strategic players at the edge believe they can build and control the demand through combined spend on training and inference in the proper ratios.

SpicyLemonZest 10 hours ago [-]
60%+ margins according to numbers which are not published publicly and have not AFAICT been audited.

Could they be accurate? Sure, I think people who claim this is impossible are overconfident. But I would encourage anyone who assumes they must be right to read a history of the Worldcom scandal. It's really quite easy for a person who wants to be making money (or an LLM who's been instructed to "run the accounts make no mistakes"!) to incorrectly categorize costs as capital investments when nobody's watching carefully.

paulddraper 10 hours ago [-]
This is wrong along multiple axes.

1. Supply can scale. You can point to COVID/supply-chain shocks, but the problem there is temporary changes. No one spins up a whole fab to address a 3 month spike. Whereas AI is not a temporary demand change.

2. Models are getting more efficient. DeepSeek V3 was 1/10th the cost of contemporary ChatGPT. Open weight models get more runnable or smarter every month. Cutting edge is always cutting edge, but if scarcity is real, model selection will adjust to fit it.

SadErn 10 hours ago [-]
[dead]
Lapalux 10 hours ago [-]
"The first hit is free....."