This is the magical thing that happens when AI research is done in the open. DeepSeek published their model and their methodology, and then the nice people at the University of Illinois were able to build on it.
When OpenAI was launched, this is what I thought it was going to be like. Something, something for the betterment of mankind.
c16 6 hours ago [-]
I'm always surprised at how many LLM research papers are published on here, so despite OpenAI, I think open research is absolutely happening.
NitpickLawyer 23 minutes ago [-]
Unfortunately the "open"AI effect is starting to show in other labs as well. DeepMind recently announced a minimum six-month delay in publishing their SotA research, to give themselves a market advantage. I get it, but it's sad that it's happening.
The good thing is that there are a lot of companies out there that want to make a name for themselves. Mistral started like that with Apache 2.0 models, now DeepSeek with MIT-licensed models, and so on. And if the past year is a good indicator, the gap from closed SotA to open close-to-SotA seems to be about 3-6 months. So that's good.
I also find LeCun's take interesting: "there is no closed source moat, or not for long". In a podcast he went into detail on this, saying that "people move companies, and people talk". If someone finds some secret sauce, the ideas will move around and other labs will catch up quickly. So there's some hope.
deepsquirrelnet 3 hours ago [-]
This is pretty cool. I have a similar model that’s 8 days into training on msmarco.
So far I only have the “cold start” data posted, but I’m planning on posting a full distillation dataset.
The paper mentions they used Wikipedia as the search corpus. The repo states they plan to expand to the Google and Bing APIs. I wonder how they will handle evolving search corpora, i.e., whether continual RL updates will be needed.
DeathArrow 3 hours ago [-]
Can someone ELI5 how reinforcement learning works with transformer based architecture?
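Roughly: the transformer generates tokens, a scalar reward scores the result, and a policy-gradient loss nudges the model's logits so rewarded outputs become more likely. Here is a toy REINFORCE sketch of that loop — a single logit vector stands in for the transformer's output, and the reward table is made up for illustration; this is the generic idea, not any specific paper's method:

```python
import numpy as np

# Toy REINFORCE on a softmax "policy" over 3 tokens.
# In RL fine-tuning of a transformer, these logits come from the model's
# forward pass and the gradient flows through all its weights; here we
# update the logit vector directly to keep the example tiny.

rng = np.random.default_rng(0)
logits = np.zeros(3)          # stand-in for transformer output logits
rewards = [0.0, 1.0, 0.0]     # hypothetical: token 1 is the "good" answer
lr = 0.5

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

for _ in range(200):
    p = softmax(logits)
    a = int(rng.choice(3, p=p))      # sample a token from the policy
    r = rewards[a]                   # scalar reward (e.g. from a verifier)
    grad = -r * (np.eye(3)[a] - p)   # gradient of the loss -r * log p(a)
    logits -= lr * grad              # gradient step on that loss

# probability mass shifts toward the rewarded token
print(softmax(logits))
```

The same `-reward * log p(action)` loss is what RLHF/GRPO-style training applies to a transformer's per-token log-probabilities, just at the scale of full generated sequences.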
sachinaag 7 hours ago [-]
I wonder if Perplexity uses similar methods under the hood or if it is a completely different approach.
mrklol 2 hours ago [-]
I feel like most of these services simply take your prompt and ask a model for search queries regarding that prompt. Then add the resulting pages into the context.
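That pipeline can be sketched in a few lines — the query generator and search backend below are stand-ins (a canned rewrite and a toy keyword match), not any real product's API:

```python
# Sketch of the "query rewriting + context stuffing" pattern described above.

def generate_search_queries(prompt: str) -> list[str]:
    # Stand-in for an LLM call that rewrites the prompt into search queries.
    return [prompt, f"{prompt} explained"]

def fetch_pages(queries: list[str], corpus: dict[str, str]) -> list[tuple[str, str]]:
    # Stand-in for a search API: naive keyword match over a toy corpus.
    hits = []
    for q in queries:
        for title, text in corpus.items():
            if title not in [t for t, _ in hits] and any(
                w in text.lower() for w in q.lower().split()
            ):
                hits.append((title, text))
    return hits

def build_context(prompt: str, corpus: dict[str, str]) -> str:
    # Stuff the retrieved pages into the model's context ahead of the question.
    pages = fetch_pages(generate_search_queries(prompt), corpus)
    retrieved = "\n\n".join(f"[{title}]\n{text}" for title, text in pages)
    return f"Answer using the sources below.\n\n{retrieved}\n\nQuestion: {prompt}"

corpus = {
    "RL": "reinforcement learning trains a policy with rewards",
    "Search": "search engines index pages for keyword queries",
}
print(build_context("reinforcement learning", corpus))
```

The paper's angle is different in that the model itself is trained with RL to decide when and what to search, rather than a fixed rewrite-then-retrieve wrapper.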
https://huggingface.co/datasets/dleemiller/lm25