Batched reward model inference and Best-of-N sampling (raw.sh)
33 points by rawsh 4 days ago | 0 comments