Nice one. K-means is one of those neat little power tools that, once you get the hang of it, you find more and more applications for, but it can be a bit slow for larger data sets. So this is very nice to have; thank you matt_d for posting.
leecarraher 45 minutes ago [-]
Do they mean deterministic k-means, k-means++, ...? Globally optimal k-means is NP-hard, so linear speedups aren't terribly helpful there; it's nice until you add more input. Standard k-means would be nice, or the k-means++ seeding algorithm.
jmalicki 40 minutes ago [-]
K-means++ is just the seeding step; this speeds up the inner loop.
Also, analogous to flash attention, a speedup that is linear in the big-O sense under the usual algorithmic complexity model can turn into a polynomial speedup in measured wall-clock time because of memory-hierarchy differences.
Still small compared to exponential differences, but for an NP-hard problem a 100x speedup is the difference between practically computable and not. There are a ton of things I'd wait 2 hours for that I wouldn't wait a week for.
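To make "the inner loop" concrete, here's a rough NumPy sketch of a plain Lloyd iteration (illustrative only, not the linked kernel): the assignment step is the N x K distance computation that a fused, flash-attention-style kernel tiles through fast memory instead of materializing.

    import numpy as np

    def lloyd_step(X, centroids):
        """One Lloyd iteration: assign points, then recompute centroids."""
        # Assignment step: N x K squared distances -- the memory-heavy part
        # that a tiled/fused kernel avoids writing out to slow memory.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        # Update step: each centroid becomes the mean of its assigned points.
        new_centroids = centroids.copy()
        for k in range(centroids.shape[0]):
            members = X[labels == k]
            if len(members):
                new_centroids[k] = members.mean(axis=0)
        return labels, new_centroids

    # k-means++ (or any other seeding) only picks the initial centroids;
    # essentially all of the runtime is spent in this loop.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(10_000, 64))
    centroids = X[rng.choice(len(X), size=16, replace=False)]
    for _ in range(20):
        labels, centroids = lloyd_step(X, centroids)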
frakt0x90 2 hours ago [-]
They created this in service of their video generation model, which "clusters and reorders tokens based on semantic similarity using k-means.": http://arxiv.org/pdf/2505.18875
Does this have corresponding speed-ups or memory gains on normal CPUs too? Just thinking about all the cups of coffee that have been made and drunk while scikit-learn's k-means chugs through a notebook :)
snovv_crash 4 hours ago [-]
On CPU, with bigger K you would put the centroids in a search tree to take advantage of the sparsity, while a GPU would calculate the full NxK distance matrix. So, from my understanding, the bottleneck they are fixing doesn't show up on CPU.
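Roughly the contrast I mean, as a toy sketch (illustrative only, with scipy's KD-tree standing in for "a search tree"):

    import numpy as np
    from scipy.spatial import cKDTree

    rng = np.random.default_rng(0)
    X = rng.normal(size=(20_000, 3))   # points (trees work best in low dimensions)
    C = rng.normal(size=(512, 3))      # centroids, K = 512

    # GPU-style assignment: compute the full N x K distance matrix, argmin per row.
    d2 = (X ** 2).sum(1)[:, None] + (C ** 2).sum(1)[None, :] - 2.0 * X @ C.T
    labels_dense = d2.argmin(axis=1)

    # CPU-style assignment: query a KD-tree over the centroids, touching far
    # fewer than K candidates per point when K is large.
    _, labels_tree = cKDTree(C).query(X, k=1)

    print((labels_dense == labels_tree).mean())  # ~1.0, up to floating-point ties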
xavxav 4 hours ago [-]
Search trees tend not to scale well to higher dimensions though, right?
From what I've seen, I had the impression that Yinyang k-means was the best way to take advantage of the sparsity.
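As I understand it, the trick is triangle-inequality bounds that let most points skip the exact distance recomputation after the centroids move; Yinyang keeps one lower bound per group of centroids, which prunes more aggressively than a single global bound. Here's a much-simplified, illustrative sketch (closer to Hamerly's single-bound filter than real Yinyang; all names are made up):

    import numpy as np

    def filtered_assign(X, C_old, C_new, labels, upper, lower):
        """Refresh assignments after centroids moved from C_old to C_new,
        skipping points whose bounds prove their label cannot change."""
        drift = np.linalg.norm(C_new - C_old, axis=1)  # how far each centroid moved
        upper = upper + drift[labels]    # dist to own centroid grew by at most its drift
        lower = lower - drift.max()      # dist to any other shrank by at most the max drift
        stale = upper > lower            # only these points need exact distances
        if stale.any():
            d = np.linalg.norm(X[stale, None, :] - C_new[None, :, :], axis=2)
            order = np.argsort(d, axis=1)
            rows = np.arange(d.shape[0])
            labels[stale] = order[:, 0]
            upper[stale] = d[rows, order[:, 0]]  # exact dist to nearest centroid
            lower[stale] = d[rows, order[:, 1]]  # exact dist to second-nearest
        return labels, upper, lower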
matrix2596 4 hours ago [-]
Looks like flash attention concepts applied to k-means; nice speedup results.