Hacker Next
Batched reward model inference and Best-of-N sampling (raw.sh)
34 points by rawsh 445 days ago | 0 comments