This is a really exciting development. They’re matching Qwen 2.5 32B on 1/3 the compute budget.
> Refined post-training and RLVR: Our models integrate our latest breakthrough in reinforcement learning with verifiable rewards (RLVR) as part of the Tülu 3.1 recipe by using Group Relative Policy Optimization (GRPO) and improved training infrastructure, further enhancing their capabilities.
I only recently discovered all the work AI2 put out with Tülu 3, which really lays out all of the components that make up a state-of-the-art post-training data mix. Very interesting stuff!

https://allenai.org/blog/tulu-3-technical
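For anyone curious what the GRPO part looks like concretely, here's a rough sketch of the group-relative advantage plus a PPO-style clipped loss, with a verifiable pass/fail reward. This is my own simplification for illustration, not AI2's actual Tülu 3.1 training code; all names and shapes are made up, and the usual KL penalty against a reference model is left out.

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # rewards: [num_prompts, group_size] verifiable rewards, e.g. 1.0 if a
    # checker accepts the completion (math answer matches, tests pass), else 0.0.
    # Each completion is scored against its own group's mean/std, so no
    # learned value function (critic) is needed.
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

def grpo_policy_loss(logprobs: torch.Tensor,
                     old_logprobs: torch.Tensor,
                     advantages: torch.Tensor,
                     clip_eps: float = 0.2) -> torch.Tensor:
    # logprobs / old_logprobs: per-completion (summed token) log-probs under the
    # current and the sampling policy, shape [num_prompts, group_size].
    ratio = torch.exp(logprobs - old_logprobs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO-style clipped surrogate, averaged over all sampled completions.
    return -torch.min(unclipped, clipped).mean()

# Toy usage: 4 prompts, 8 sampled completions each, binary verifiable rewards.
rewards = torch.randint(0, 2, (4, 8)).float()
adv = grpo_advantages(rewards)
logp = torch.randn(4, 8)
loss = grpo_policy_loss(logp, logp.detach(), adv)
```

The "group relative" bit is what replaces the learned critic: each sampled completion is judged only against the other samples for the same prompt.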
Awesome to see great work from AI2 continuing. They are the only competitive fully open source model as far as I know - they share the training data and code as well. They also recently released an open source app that does on device AI on your phone!
laughingcurve 1 day ago
There are some other models out there, but OLMo is the best LLM for researchers to study.