Also note: use temperature = 1.0, top_p = 0.95, top_k = 64; the EOS token is "<turn|>". "<|channel>thought\n" is also used for the thinking trace!
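For anyone wiring these settings into their own runner, here is a minimal sketch in plain Python. It only restates the values quoted above and shows one way to cut a raw completion at the EOS string. The helper names (`truncate_at_eos`, `starts_with_thinking`) are hypothetical and not part of any library, and how the thinking trace ends is not stated here, so the sketch only detects where it starts.

```python
# Sampling settings quoted in the comment above.
SAMPLING = {"temperature": 1.0, "top_p": 0.95, "top_k": 64}

EOS = "<turn|>"                       # end-of-sequence string, as quoted
THINK_PREFIX = "<|channel>thought\n"  # start of the thinking trace, as quoted


def truncate_at_eos(text: str, eos: str = EOS) -> str:
    """Return the text before the first EOS marker (hypothetical helper).

    If the marker is absent, the text is returned unchanged.
    """
    head, _, _ = text.partition(eos)
    return head


def starts_with_thinking(text: str, prefix: str = THINK_PREFIX) -> bool:
    """Report whether a raw completion opens with the thinking-trace prefix."""
    return text.startswith(prefix)


if __name__ == "__main__":
    raw = "Hello there!<turn|>ignored tail"
    print(truncate_at_eos(raw))
    print(starts_with_thinking("<|channel>thought\nplanning..."))
```

Most inference stacks accept `temperature`, `top_p`, and `top_k` under those names, but check your runner's docs before passing the dict through as-is.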
NitpickLawyer 4 minutes ago [-]
Best thing is that this is Apache 2.0
The sizes are E2B and E4B (following the gemma3n arch, with a focus on mobile), plus a 26BA4 MoE and a 31B dense model. The mobile ones take audio input (so I can see some local, privacy-focused translation apps), and the 31B seems strong at agentic stuff. The 26BA4 sits somewhere in between: similar VRAM footprint, but much faster inference.
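A rough back-of-the-envelope sketch of the "similar VRAM, faster inference" point. It assumes "A4" means roughly 4B parameters active per token, that all MoE experts must stay resident in memory, and that weights are stored at 2 bytes per parameter. It counts weights only (no KV cache or activations), so treat the numbers as illustrative rather than measured.

```python
def weight_gb(params_billions: float, bytes_per_param: float = 2.0) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes), weights only."""
    return params_billions * bytes_per_param


# (total params, params active per token), in billions.
# The 4B active figure for the MoE is an assumption read off the "A4" suffix.
MODELS = {
    "26BA4 MoE": (26.0, 4.0),
    "31B dense": (31.0, 31.0),
}

if __name__ == "__main__":
    for name, (total, active) in MODELS.items():
        print(
            f"{name}: ~{weight_gb(total):.0f} GB of weights at 16-bit, "
            f"~{active:.0f}B params touched per token"
        )
```

Under those assumptions the two models need a comparable amount of memory for weights, while the MoE touches far fewer parameters per generated token, which is where the speedup would come from.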
a7om_com 4 minutes ago [-]
Gemma models are already in our AIPI inference pricing index. Open source models like Gemma run 70.7% cheaper than proprietary equivalents at the median across the 2,614 SKUs we track. With Gemma 4 hitting third-party platforms the pricing will be worth watching closely. Full data at a7om.com.
minimaxir 4 minutes ago [-]
The benchmark comparisons to Gemma 3 27B on Hugging Face are interesting: the Gemma 4 E4B variant (https://huggingface.co/google/gemma-4-E4B-it) beats the old 27B in every benchmark at a fraction of the parameters.
The E2B/E4B models also support voice input, which is rare.
jwr 12 minutes ago [-]
Really looking forward to testing and benchmarking this on my spam filtering benchmark. gemma-3-27b was a really strong model, surpassed later by gpt-oss:20b (which was also much faster). qwen models always had more variance.
babelfish 3 minutes ago [-]
Wow, 30B parameters as capable as a 1T parameter model?
flakiness 4 minutes ago [-]
It's good they still have non-instruction-tuned models.
Thinking / reasoning + multimodal + tool calling.
Guide for those interested: https://unsloth.ai/docs/models/gemma-4