Better performance than TQ and better quality than FP16?
Am I reading this right??
qeternity 11 days ago [-]
It's not better quality: 59.3% vs. 59.4% for FP16 on AIME 25.
sheepscreek 10 days ago [-]
A difference of 0.1 percentage points is within the margin of error. Depending on the performance boost, it might be worth taking a minuscule quality hit.
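A rough sketch of why a gap that small is hard to tell apart from noise, assuming the score is a mean of independent pass/fail trials. The value `n_samples = 30` is an assumption (one attempt per problem on a 30-problem set); the thread does not say how many samples were averaged, so treat the number as illustrative only.

```python
import math


def binomial_standard_error(accuracy: float, n_samples: int) -> float:
    """Standard error of an accuracy estimated from n independent pass/fail trials."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n_samples)


# Scores quoted in the thread.
fp16_accuracy = 0.594
quantized_accuracy = 0.593

# Assumption: 30 independent trials. The real evaluation may average many
# samples per problem, which would shrink the standard error.
n_samples = 30

se = binomial_standard_error(fp16_accuracy, n_samples)
gap = fp16_accuracy - quantized_accuracy

print(f"standard error: {se:.3f}, gap: {gap:.3f}")
print("gap is smaller than one standard error:", gap < se)
```

Under that assumption the gap is much smaller than one standard error, so the two scores cannot be ranked either way from this benchmark alone.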
qeternity 9 days ago [-]
I think it very much is worth it!
But the point was that quality didn't magically increase.
electroglyph 10 days ago [-]
Any divergence from full precision is error, even if the benchmark score is better.
7e 10 days ago [-]
Just pretend that it is the next step update when training. You didn’t train your model to step=inf, I hope?
thefox96 11 days ago [-]
Faster than FP16, not better quality, I guess.
v3ss0n 11 days ago [-]
Why is this not a PR for vLLM?
esafak 11 days ago [-]
It's the output of a research paper; the authors are not trying to build up vLLM, and they probably have no incentive to do so. You can submit a PR, though! It's easier now while the divergence is low, so don't wait. Since there are six authors, I bet you could get help with the inevitable review chores if you just take the step of creating the PR.
And with the help of AI, pointing an AI at this paper and saying "make a vLLM PR from this paper" tends to work surprisingly well, even if you need to nudge it a little along the way.
woadwarrior01 10 days ago [-]
Last I heard, vLLM was backed by a company that has raised $150m in seed funding. I'm sure they've got the resources to port it.
edit: It might not be clear that it is based on vLLM 0.22, which is the current version: https://github.com/huawei-csl/KVarN/commit/d6290e99098d7426d.... All you have to do is create a diff off it; it's fairly straightforward.