These sorts of core-density increases are how I win cloud debates in an org.
* Identify the workloads that haven't scaled in a year. Your ERPs, your HRIS, your dev/stage/test environments, DBs, Microsoft estate, core infrastructure, etc. (EDIT, from zbentley: also identify any cross-system processing where data will transfer from the cloud back to your private estate, so it can be excluded and you don't get murdered with egress charges)
* Run a cost analysis of reserved instances in AWS/Azure/GCP for those workloads over three years
* Do the same for one of these high-core "pizza boxes", but amortized over seven years (a rough sketch of this comparison follows the list)
* Realize the savings to be had moving "fixed infra" back on-premises or into a colo versus sticking with a public cloud provider
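To make that concrete, here's a minimal back-of-the-envelope sketch of the comparison. Every number in it is a placeholder, not a real quote; swap in your actual reserved-instance pricing and your hardware/colo line items.

```python
# Rough cloud-vs-owned comparison over a 3-year window. All figures are
# placeholders; plug in your actual RI quotes and hardware/colo costs.

def cloud_cost(monthly_ri_cost, years=3):
    """Total reserved-instance spend over the comparison window."""
    return monthly_ri_cost * 12 * years

def onprem_cost(hardware_capex, monthly_opex, amortization_years=7, years=3):
    """Hardware amortized over its service life, plus colo/power/support
    opex, counted only for the comparison window."""
    return (hardware_capex / amortization_years) * years + monthly_opex * 12 * years

if __name__ == "__main__":
    # Hypothetical: $9k/mo of RIs vs. a $120k pizza-box cluster with
    # $2k/mo of colo, power, and support contracts.
    cloud = cloud_cost(monthly_ri_cost=9_000)
    owned = onprem_cost(hardware_capex=120_000, monthly_opex=2_000)
    print(f"3yr cloud: ${cloud:,.0f}")
    print(f"3yr owned: ${owned:,.0f}")
    print(f"delta:     ${cloud - owned:,.0f}")
```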
Seriously, what took a full rack or two of 2U dual-socket servers just a decade ago can be replaced with three 2U boxes with full HA/clustering. It's insane.
Back in the late '10s, I made the case to my org at the time that a global hypervisor hardware refresh and the accompanying VMware licenses would pay for themselves in about 2.5 years versus comparable AWS infrastructure, even assuming a 50% YoY rate of license inflation, and give us an additional 20% headroom to boot (this was pre-Broadcom; nowadays I'd be eyeballing Nutanix, Virtuozzo, Apache CloudStack, or yes, even Proxmox, assuming we weren't already a Microsoft shop w/ Hyper-V). The only thing giving me pause on that argument today is the current RAM/NAND shortage, but even that's (hopefully) temporary - and it doesn't hurt the orgs that built around a longer timeline with the option of an additional support runway (like the three-year extended support contracts available through VARs).
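For flavor, here's the shape of that payback argument. The numbers below are purely illustrative (nothing like the actual figures from that business case): cumulative owned cost starts at the refresh capex and grows with inflating licenses, and the payback point is the month where cumulative cloud spend overtakes it.

```python
# Illustrative payback-period calculation: owned capex + licenses inflating
# 50% YoY, vs. a flat monthly cloud bill. Every input is made up.

def payback_month(capex, annual_license, license_inflation,
                  monthly_cloud, horizon_months=84):
    owned, cloud = float(capex), 0.0
    for m in range(1, horizon_months + 1):
        year = (m - 1) // 12
        owned += annual_license * (1 + license_inflation) ** year / 12
        cloud += monthly_cloud
        if owned <= cloud:
            return m          # first month where owning has paid for itself
    return None               # no payback within the horizon

# Hypothetical: $500k refresh, $100k/yr licenses inflating 50% YoY,
# versus $40k/mo of comparable AWS infrastructure.
print(payback_month(500_000, 100_000, 0.50, 40_000))  # -> 17 (about 1.5 years)
```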
If we can't bill a customer for it, and it's not scaling regularly, then it shouldn't be in the public cloud. That's my take, anyway. It sucks the wind from the sails of folks gung-ho on the "fringe benefits" of public cloud spend (box seats, junkets, conference tickets, etc...), but the finance teams tend to love such clear numbers.
zbentley 18 minutes ago [-]
That's definitely the right call in some cases. But as soon as there's any high-interconnect-rate system that has to be in the cloud (appliances with locked-in cloud billing contracts, compute that does need to elastically scale and talks to your DB's pizza box, edge/CDN/cache services with lots of fallthrough to sources of truth on-prem), the cloud bandwidth costs start to kill you.
I’ve had success with this approach by keeping it to only the business process management stacks (CRMs, AD, and so on—examples just like the ones you listed). But as soon as there’s any need for bridging cloud/onprem for any data rate beyond “cronned sync” or “metadata only”, it starts to hurt a lot sooner than you’d expect, I’ve found.
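To put rough numbers on why it hurts, assuming something on the order of $0.09/GB for internet egress (in the ballpark of typical public-cloud rates, but it varies by provider, region, and volume), the difference between a nightly sync and a chatty hybrid app is dramatic:

```python
# Back-of-the-envelope egress cost. The $/GB rate is an assumption in the
# ballpark of public-cloud internet egress; check your provider's rate card.

EGRESS_RATE_PER_GB = 0.09  # assumed; varies by provider, region, and tier

def monthly_egress_cost(gb_per_day, rate=EGRESS_RATE_PER_GB):
    return gb_per_day * 30 * rate

# A cronned nightly sync of a few GB barely registers...
print(f"${monthly_egress_cost(5):,.0f}/mo")      # ~$14/mo
# ...a cloud app chattily joining against an on-prem source of truth does not.
print(f"${monthly_egress_cost(2_000):,.0f}/mo")  # ~$5,400/mo
```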
stego-tech 15 minutes ago [-]
Yep, 100%, but that's why identifying compatible workloads first is key. A lot of orgs skip right to the savings pitch, ignorant of how their applications communicate with one another - and you hit the nail on the head: split even some of an application's processing across a cloud provider and your own estate, and the egress fees will murder you.
Folks who insist on one or the other miss the savings to be had by effectively leveraging both.
mschuster91 9 minutes ago [-]
> If we can't bill a customer for it, and it's not scaling regularly, then it shouldn't be in the public cloud. That's my take, anyway. It sucks the wind from the sails of folks gung-ho on the "fringe benefits" of public cloud spend (box seats, junkets, conference tickets, etc...), but the finance teams tend to love such clear numbers.
I agree, but.
For one, it's not just the machines themselves. You also need to budget in power, cooling, space, the cost of providing redundant connectivity and side gear (e.g. routers, firewalls, UPS).
Then, you need a second site, no matter what. At least for backups, ideally as a full failover. Either your second site is some sort of cloud, which can be a PITA to set up without introducing security risks, or a second physical site, which means double the expenses.
If you're a publicly listed company, or operate in jurisdictions like Europe, or want cybersecurity insurance, you have data retention, GDPR, SOX, and a whole bunch of other compliance to worry about as well. Sure, you can do that on-prem, but you'll have a much harder time explaining to auditors how your system works when it's a bunch of on-prem stuff vs. "here are our AWS Backup plans covering all servers and other data sources, here's the immutability setup, here's how we prevent backups from expiring, a.k.a. legal hold".
Then all of that needs to be maintained, which means additional staff on payroll; if you own the stuff outright, your finance team will whine about depreciation and capex; and you need vendors on support contracts just to get firmware updates and timely replacements for hardware under warranty.
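A crude sketch of how those line items stack up per month. Every figure below is hypothetical; the point is just that the server itself is often one of the smaller entries once staff, space, and support are counted.

```python
# Fully-loaded on-prem monthly cost sketch. Every figure is a placeholder;
# the takeaway is the shape, not the totals.

monthly_costs = {
    "hardware capex, amortized over 5 years": 120_000 / 60,
    "colo space, power, cooling":             1_500,
    "redundant connectivity":                 800,
    "second site / offsite backups":          1_200,
    "support contracts and warranties":       600,
    "fractional ops headcount":               6_000,  # often the dominant term
}

total = sum(monthly_costs.values())
for item, cost in monthly_costs.items():
    print(f"{item:<42} ${cost:>8,.0f}")
print(f"{'total':<42} ${total:>8,.0f}")
```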
Long story short: as much as I prefer on-prem hardware over the cloud, particularly given current political tensions, unless you're a 200+ employee shop the overhead associated with on-prem infrastructure isn't worth it.
urthor 2 minutes ago [-]
So, TL;DR: is it competitive?
What are the dimensions and dynamics here vs EPYC?
9cb14c1ec0 44 minutes ago [-]
One day I hope to be rich enough to put a CPU like this (with proportional RAM and storage) in my Proxmox cluster.
Aurornis 2 minutes ago [-]
Wait long enough and these will be cheap on eBay.
By that point we'll be desiring the new 1000-core CPUs, though.
epistasis 40 minutes ago [-]
Some of the AMD offerings like this on eBay are pretty close to affordable! It's the RAM that's killer these days...
I still regret not buying 1TB of RAM back in ~October...
mort96 34 minutes ago [-]
I bought a bundle with 512GB of RAM, an older 24-core EPYC (7F72), and a Supermicro motherboard on eBay a bit over a year ago; it was really an amazing deal and has made for a truly nice NAS. If you're okay with gear that's old enough to have been decommissioned, you can get really high-quality server hardware at surprisingly low prices.
Companies decommission hardware on a schedule after all, not when it stops working.
EDIT: Though looking for similar deals now, I can only find ones with up to 128GB of RAM, and they're nearly twice the price I paid. I got the 7F72 + motherboard + 512GB DDR4 for $1488 (uh, I swear that's what I paid, $1488.03. Didn't notice until now); the closest I can find now is a 7F72 + motherboard + 128GB DDR4 for over $2500. That's awful.
epistasis 20 minutes ago [-]
RAM! (And NAND SSDs too now, probably...)
When I was looking in October, I hadn't bought hardware for the better part of a decade, and I saw all these older posts on forums for DDR4 at $1/GB, but the lowest I could find was at least $2/GB used. These days? HAH!
If I had a decent sales channel I might be speculating on DDR4/DDR5 RAM and holding it because I expect prices to climb even higher in the coming months.
jauntywundrkind 22 minutes ago [-]
AMD also has some weird CPUs like the 7C13 and 7R13 that go for way, way below their normal price bands. You don't even have to buy used to get a ridiculous system... or at least you didn't until 4 months ago (RIP RAM prices).
https://www.servethehome.com/amd-epyc-7c13-is-a-surprisingly...
fred_is_fred 7 minutes ago [-]
This is the 2026 version of "I need a Beowulf cluster of these".
SecretDreams 42 minutes ago [-]
> with proportional RAM and storage
Let's not get carried away here
benj111 3 minutes ago [-]
Am I the only one disappointed they didn't settle for 286 cores?
rubyn00bie 32 minutes ago [-]
I've not kept up with Intel in a while, but one thing that stood out to me is that these are all E-cores, meaning no hyperthreading. Is something like this competitive, or preferred, in certain applications? Also, does anyone know if there have been any benchmarks against AMD's 192-core EPYC CPU?
topspin 7 minutes ago [-]
"Is something like this competitive, or preferred, in certain applications?"
They cite a very specific use case in the linked story: Virtualized RAN. This is using COTS hardware and software for the control plane of a 5G+ cell network operation. A large number of fast, low-power cores would indeed suit such an application, where large numbers of network nodes are coordinated in near real time.
It's entirely possible that this is the key use case for this device: 5G networks are huge money makers and integrators will pay full retail for bulk quantities of such devices fresh out of the foundry.
georgeburdell 21 minutes ago [-]
E-core vs P-core is an internal power struggle between two design teams that looks, on the surface, like ARM's big.LITTLE approach.
Aardwolf 16 minutes ago [-]
E-cores ruined P-cores by forcing the removal of AVX-512 from consumer P-cores.
Which is why I used AMD in my last desktop computer build.
Analemma_ 27 minutes ago [-]
It all depends on your exact workload, and I'll wait to see benchmarks before making any confident claims, but in general, if you have two threads of execution that are fine on an E-core, it's better to put them on two E-cores than on one hyperthreaded P-core.
DetroitThrow 12 minutes ago [-]
I think part of the reason is die area: 288 E-cores vs 72 P-cores.
Also, there have been so many hyperthreading vulnerabilities lately that it often gets disabled on data center machines anyway, so I'd imagine going all E-cores de-risks that entirely.
MengerSponge 27 minutes ago [-]
I don't know the nitty-gritty of why, but some compute-intensive tasks don't benefit from hyperthreading. If the processor is destined for those tasks, you may as well use that silicon for something actually useful.
It's a trade-off: hyperthreading takes up die space and power budget.
As to the E-core concept itself, it's ARM's playbook.
renewiltord 17 minutes ago [-]
Core density plus power makes so many things worthwhile. Generally, the human cost of managing hardware scales with the number of components under management. CPUs are very reliable, so once you get lots of CPU and RAM in a single machine you can run with very few of them.
But pricing hardware right is hard if you're a small shop. My mind is hard-locked onto EPYC processors without thought. The 9755 on eBay is cheap as balls. Infinity cores!
The problem with hardware is lead time, etc.; cloud can spin up immediately. Great for experimentation. Organizationally useful. If your teams have to go through IT to provision a machine, and IT has to go through finance so that spend is reliable, everybody slows down too much. You can't just spin up the next product.
But if you're a small shop, having some Kubernetes on a rack is maybe $15k one time and $1.2k ongoing per month. Very cheap, and you get lots and lots of compute!
Previously a skillset was required. These days you plug in the Ethernet port, fire up Claude Code with --dangerously-skip-permissions: "write an idempotent bash script that configures my MikroTik CCR, it's on IP $x on interface $y". Hotspot on. Cold air blowing on your face from the overhead coolers. 5 minutes later, run the script without looking. Everything comes up.
Still, it's perhaps foolish to do on-prem by default (now that I think about it): if you have cloud egress you're dead, and the compliance story requires the interconnect to be well designed. More complicated than just the basics. You need to know a little before it makes sense.
Feels like being a reasoning LLM: I now hold the opposite position.