S1: Simple Test-Time Scaling (github.com)
mncharity 12 hours ago [-]
> This is similar to the "Superficial Alignment Hypothesis" presented in LIMA (Zhou et al., 2023), where the authors find that 1,000 examples can be sufficient to align a model to adhere to user preferences.

Link: LIMA: Less Is More for Alignment https://proceedings.neurips.cc/paper_files/paper/2023/file/a... , 1k cites: https://scholar.google.com/scholar?cites=1642843440474691780...

artifishy_intel 10 hours ago [-]
The difference between budget forcing on and off falls entirely within the (surprisingly large) confidence intervals of the evaluation datasets. Why spend more compute for no significant gain? Seems to distract from the high-value minimal reasoning fine-tuning set.
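(For context, budget forcing controls how many "thinking" tokens the model emits: decoding is cut off at a maximum budget, and if the model tries to stop thinking too early, an extension string like "Wait" is appended so it keeps reasoning. A minimal sketch of that decode loop, where `generate`, the token names, and the extension string are all illustrative assumptions, not the paper's exact implementation:)

```python
def budget_force(generate, prompt, min_tokens=0, max_tokens=512):
    """Decode reasoning tokens, forcing their count into [min_tokens, max_tokens].

    generate(text) is a hypothetical single-step function returning the
    next token as a string; "<end_think>" marks the model's attempt to
    stop its reasoning phase.
    """
    text, n = prompt, 0
    while n < max_tokens:
        token = generate(text)            # propose the next reasoning token
        if token == "<end_think>":
            if n >= min_tokens:
                break                     # budget satisfied: allow the stop
            token = "Wait"                # too short: suppress stop, keep thinking
        text += token
        n += 1
    return text + "<end_think>"           # hard cutoff if max budget is reached
```

Seen this way, the commenter's point is that both knobs (forcing "Wait" and truncating at the cap) change scores by less than the eval noise.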

Also: in the main/first figure, why are r1 and o1 (the best-performing models in Table 1) omitted?

If you collect 59K examples and then pick the best 1K, is it really fair to call your approach simple? Sifting through 59K examples doesn't seem simple.
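(The 59K→1K sifting the comment refers to is a staged filter: drop low-quality examples, keep hard ones, then sample for topic diversity. A rough sketch under those assumptions; the field names, filter criteria, and sampling scheme here are illustrative, not the paper's exact pipeline:)

```python
import random

def select_1k(examples, k=1000, seed=0):
    """Illustrative three-stage selection: quality, difficulty, diversity.

    Each example is assumed to be a dict with hypothetical fields
    "well_formatted", "easy_model_solved", and "domain".
    """
    # 1) quality: drop malformed or badly formatted examples
    pool = [e for e in examples if e["well_formatted"]]
    # 2) difficulty: keep only problems a weaker model failed on
    pool = [e for e in pool if not e["easy_model_solved"]]
    # 3) diversity: sample round-robin-ish across topic domains
    by_domain = {}
    for e in pool:
        by_domain.setdefault(e["domain"], []).append(e)
    rng = random.Random(seed)
    picked, domains = [], list(by_domain)
    while len(picked) < k and domains:
        d = rng.choice(domains)
        bucket = by_domain[d]
        picked.append(bucket.pop(rng.randrange(len(bucket))))
        if not bucket:
            domains.remove(d)
    return picked
```

Whether a pipeline like this counts as "simple" is exactly the question being raised: the resulting dataset is small, but producing it still requires scoring all 59K candidates.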

Good stuff though, cool to see how minimal we can get when distilling good models (esp. at the manageable 32B size).

randomcatuser 14 hours ago [-]
really cool! i wonder what happens when you teach it to use tools inside the reasoning as well. could be even better!