Nvidia releases 8B model with learned 8x KV cache compression (huggingface.co)
alecco 14 days ago [-]
The paper behind it was presented at NeurIPS in December 2025.

Release thread: https://xcancel.com/p_nawrot/status/2014770473289019709

Slides and audio presentation: https://neurips.cc/virtual/2025/loc/san-diego/poster/119605

vercaemert 14 days ago [-]
I'd be interested to hear some use cases people have for large contexts on an 8B model, other than sentiment analysis or summarization (this release implies agentic use). My experience with the general intelligence of agentic interactions is that models below 32B are unusable with any context greater than 4k tokens.
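For scale, here is a back-of-envelope sketch of why 8x KV cache compression matters at the large contexts being discussed. The model config numbers below (32 layers, 8 GQA KV heads, head dim 128, fp16) are assumptions typical of an 8B-class model, not taken from this release:

```python
def kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=8, head_dim=128, bytes_per_elem=2):
    # Assumed 8B-class config: 32 layers, 8 grouped-query KV heads,
    # head dimension 128, fp16 (2 bytes per element).
    # Factor of 2 covers the separate K and V tensors per layer.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

full = kv_cache_bytes(131072)        # 128k-token context
compressed = full // 8               # with 8x learned compression

print(f"full cache:       {full / 2**30:.1f} GiB")
print(f"8x compressed:    {compressed / 2**30:.1f} GiB")
```

Under these assumptions a 128k-token cache drops from roughly 16 GiB to about 2 GiB, which is the difference between a long context fitting on a consumer GPU or not.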
WaalkTheEaarth 11 days ago [-]
I personally use an 8B model for general use on my laptop lol, it works like a charm and makes sense (most of the time at least)