Is this a fine-tuned LLM, for example a drop-in replacement for Llama etc.?
Or is it some algorithm on top of an LLM, doing some chain of reasoning?
peakji 8 minutes ago [-]
It is an LLM fine-tuned using a new type of dataset and RL reward. It's good at reasoning, but I would not recommend replacing Llama with it for general tasks.
Approaches like best-of-n sampling and majority voting are definitely feasible. But I don't recommend trying things related to CoT prompting, as it might interfere with the internalized reasoning patterns.
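For context, majority voting over sampled completions is simple to wire up on top of any model. A minimal sketch, where `generate` is a hypothetical callable wrapping whatever LLM API you use (assumption: it returns a final-answer string that can be compared after light normalization):

```python
from collections import Counter

def majority_vote(generate, prompt, n=5):
    """Sample n completions and return the most common final answer.

    `generate(prompt)` is assumed to return the model's final answer
    as a string; answers are normalized by stripping whitespace and
    lowercasing before counting.
    """
    answers = [generate(prompt).strip().lower() for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]
```

Best-of-n works the same way, except you keep the single sample that scores highest under a reward or verifier model instead of counting duplicates.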
Mr_Bees69 2 hours ago [-]
Really hope this goes somewhere, o1 without openai's costs and restrictions would be sweet.
ActorNightly 20 minutes ago [-]
OpenAI's o1 isn't really going that far though. It's definitely better in some areas, but not overall better.
I'm wondering if we can abstract chain of thought further down into the computation levels to replace a lot of the matrix multiplies. Like smaller transformers with fewer parameters, and more selection of which transformer to use through search.
peakji 1 hour ago [-]
The model can already answer some tricky questions that other models (including GPT-4o) have failed to address, achieving a +5.56 improvement on the GPQA-Diamond dataset. Unfortunately, it has not yet managed to reproduce inference-time scaling. I will continue to explore different approaches!