Value-Based Deep RL Scales Predictably (arxiv.org)
gsf_emergency_2 22 days ago
My attempt at a summary: the authors characterize the data-compute Pareto front (aka how bitter is the lesson, exactly?); a rough sketch of what fitting such a frontier might look like is at the end of this comment.

For a different perspective, error vs compute, see

https://youtu.be/5eqRuVp65eY

and comments

(I particularly liked the one about string theorists rediscovering a fundamental theorem in GR decades too late -- rediscovering how to integrate happens in every field; it's nothing to be ashamed of :)
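
Rough sketch of what fitting such a data-compute frontier could look like, with made-up numbers and a simple power-law form (the paper's actual functional forms and fitting procedure may differ):

    # A minimal sketch (not the paper's actual fit): given hypothetical
    # (compute, data) pairs at which runs first reach a fixed return
    # threshold, fit a power law D ~ a * C^b in log-log space and
    # extrapolate the data requirement at a larger compute budget.
    import numpy as np

    # Hypothetical measurements: FLOPs spent vs. environment steps needed
    # to hit the target return (made-up numbers for illustration only).
    compute = np.array([1e15, 3e15, 1e16, 3e16, 1e17])
    data    = np.array([2e6, 1.2e6, 8e5, 5.5e5, 4e5])

    # Linear fit in log space: log D = log a + b * log C
    b, log_a = np.polyfit(np.log(compute), np.log(data), deg=1)
    a = np.exp(log_a)

    # Extrapolate the frontier to a 10x larger compute budget.
    c_big = 1e18
    print(f"fitted exponent b = {b:.3f}")
    print(f"predicted data requirement at C={c_big:.0e}: {a * c_big**b:.3e} steps")

The point of a fit like this is extrapolation: predicting roughly how much data (or compute) a larger run will need before paying for it.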

yobbo 22 days ago
Skimmed little bits: "on-policy" RL means the model has generated its own output and received feedback from some sort of dynamic environment, which might not be scalable. Value-based off-policy RL means the model is trained on data that wasn't generated by the model itself exploring a dynamic environment; instead it can be recordings. They then ask the question: how does that scale?
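
A toy illustration of that distinction, assuming a made-up tabular problem (not the paper's setup): the value-based update below consumes only recorded transitions from a fixed buffer, so training never has to interact with a live environment.

    # Off-policy, value-based learning from recordings: tabular Q-learning
    # needs only stored (s, a, r, s_next, done) transitions, however and
    # whenever they were collected.
    import random
    import numpy as np

    n_states, n_actions = 5, 2
    Q = np.zeros((n_states, n_actions))
    gamma, lr = 0.99, 0.1

    # Hypothetical recorded transitions, e.g. logged by some other policy.
    replay_buffer = [
        (0, 1, 0.0, 1, False),
        (1, 0, 0.0, 2, False),
        (2, 1, 1.0, 3, False),
        (3, 0, 0.0, 4, True),
    ] * 250  # pretend we have many such recordings

    for _ in range(10_000):
        s, a, r, s_next, done = random.choice(replay_buffer)
        target = r if done else r + gamma * Q[s_next].max()
        Q[s, a] += lr * (target - Q[s, a])  # standard Q-learning update

    print(Q)  # greedy policy w.r.t. Q, learned without touching the environment

An on-policy method, by contrast, would need to regenerate its rollouts with the current policy after every update, which is where the scalability concern comes from.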
currymj 21 days ago
RL is unbelievably finicky, sensitive to hyperparameters, and hard to make work. If you are pursuing a research project and decide to use RL, you are making your life a lot more difficult and stressful.

It's exciting to see any progress in making accurate predictions about what settings will work for RL training. I hope that this research direction can be expanded in scope and that ultimately, people who want to do research in RL can become confident in their training recipes.

I am more excited about that than about the dream of scaling compute per se.
