Auto-Differentiating Any LLM Workflow: A Farewell to Manual Prompting (arxiv.org)
snake_doc 3 days ago [-]
Holy unnecessary use of terminology to explain a reverse graph traversal. “Loss”, “gradients”, “differentiating”— no! stop!

This must be what AI hype actually is. Completely incoherent language to explain a very straightforward concept.

This is just: LLMs judging intermediate node outputs, and reverse traversing the graph while doing so until it modifies the original prompt.
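Spelled out, it's roughly this. A hand-rolled sketch, not the paper's actual implementation; "llm" stands in for whatever chat-completion call you use:

    # Sketch of "LLMs judging intermediate outputs while walking the graph backwards".
    # llm() is a placeholder for any chat-completion call (OpenAI, a local model, etc.).
    def llm(prompt: str) -> str:
        raise NotImplementedError("plug in your model call here")

    def backward(nodes, final_feedback):
        """nodes: list of {'prompt': ..., 'output': ...} dicts, ordered first step to last."""
        feedback = final_feedback
        for node in reversed(nodes):  # reverse traversal of the chain
            critique = llm(
                f"This step used the prompt:\n{node['prompt']}\n"
                f"and produced:\n{node['output']}\n"
                f"Downstream feedback: {feedback}\n"
                "Briefly, how should this step's prompt change?"
            )
            node["suggested_prompt"] = llm(
                f"Rewrite the prompt according to the critique.\n"
                f"Prompt: {node['prompt']}\nCritique: {critique}"
            )
            feedback = critique  # pass the judgment further upstream
        return nodes[0]["suggested_prompt"]  # the walk ends by rewriting the original prompt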

currymj 3 days ago [-]
I am old enough to remember the opposite: people would try to sell deep learning to the mainstream ML community by pointing out that backprop is just message-passing on a Bayesian network with modified sum/product operations.
snake_doc 3 days ago [-]
Valid point, but at least that was mathematics. This paper isn’t even math, it’s a data control flow masquerading as math.
porridgeraisin 2 days ago [-]
Interestingly, backpropagation is the natural way I'd describe this process, not reverse graph traversal.

Background difference I suppose.

> This must be what AI hype actually is. Completely incoherent language to explain a very straightforward concept.

True, a lot of papers overdo the jargon just for hype purposes. My favorite (and funniest) example is this one from Google Research (and some universities); I've linked the paper review video below:

https://youtu.be/Pl8BET_K1mc

See the YouTube chapter about "Multidiffusion" (around 38 minutes in).

They spent multiple paragraphs formulating an "optimisation problem" which, when peeled down, amounts to taking the mean, just to be able to superficially cite their own previous paper.

Quite the sorry state of things.

upghost 4 days ago [-]
> - the task of crafting textual inputs to effectively direct LLMs -- remains difficult and labor-intensive

Damn, I knew we were lazy but describing prompting as labor-intensive is impressively lazy even to me.

Obviously reading the rest of the abstract was too labor intensive for me but I'm hoping I can just hook a probe up to my drool and it can infer my desires from that.

IanCal 4 days ago [-]
It is labour intensive to optimise a prompt though. Particularly for smaller models.

Setting parameters for any ML model is easy, but we'd still call it labour intensive if we expected people to do it manually despite having evals. Instead we have ways of searching for and optimising those settings. The methods for that are obvious for small-cardinality discrete values or continuous variables; less so for arbitrary text.
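For the discrete/continuous case the search really is a few lines; a generic sketch (evaluate stands in for whatever eval harness you have), and there's no equally mechanical move when the "parameter" is free-form prompt text:

    import random

    def evaluate(params: dict) -> float:
        # Stand-in for an eval harness: run the pipeline with these settings, return a score.
        raise NotImplementedError

    search_space = {
        "temperature": [0.0, 0.3, 0.7, 1.0],  # small-cardinality / continuous settings are easy
        "top_p": [0.8, 0.9, 1.0],
        "max_tokens": [256, 512, 1024],
    }

    best_score, best_params = float("-inf"), None
    for _ in range(50):  # plain random search
        params = {k: random.choice(v) for k, v in search_space.items()}
        score = evaluate(params)
        if score > best_score:
            best_score, best_params = score, params

    # There is no analogous random.choice over "all useful prompt wordings",
    # which is the gap prompt optimisation methods try to fill.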

jyhong 3 days ago [-]
It is not labor-intensive if you want a prompt to work in 80-90% of cases; humans are good at that. But it is labor-intensive if you want it to work in 99% of cases. Then you need to go through many cases and "optimize" the prompt, which is where an optimizer has the advantage.
ionflow 4 days ago [-]
this made my day
xg15 4 days ago [-]
Just read the abstract so far. Sounds amazing, but just for the sake of understanding, what would be the inputs and outputs of such a system? If the prompt is generated, how do you tell the system what you'd like to have? And what is the ground truth that is trained against? Examples of the desired text?
amelius 4 days ago [-]
I think what they mean is intermediate prompts, i.e. the prompts that the system gives to itself when solving a problem that requires multiple stages.
xg15 4 days ago [-]
Ah, that makes sense. So (very) basically, they're putting a number of regular LLMs into a sort of compute chain/graph, where one LLM feeds into the other, then doing gradient descent on the whole chain at once, essentially treating the boundaries between LLM n and LLM n+1 as "hidden layers"?
meame2010 4 days ago [-]
Author here. Yeah, in this fashion. And it can create the feedback using an LLM as a backward engine.
hnuser123456 7 days ago [-]
Congrats on the paper! I read through some of the GitHub docs and the paper, and this sounds very impressive, but I'm trying to think of how best to use it in practice... Is the idea that I could give some kind of high-level task/project description (like a Python project), and this framework would intelligently update its own prompting to avoid getting stuck and to keep "gaining skill" throughout the process of working on the task? Could this be used to build such a system? Very curious to learn more.
meame2010 7 days ago [-]
You need a training dataset and a task pipeline that works. You can refer to this doc: https://adalflow.sylph.ai/use_cases/question_answering.html
hnuser123456 7 days ago [-]
Thank you, I missed the use cases section, that explains a lot. Nice documentation. Might play with this when I get home.
stevage 4 days ago [-]
Can anyone ELI5? Or at least a kind of layman's explanation?
rsfern 4 days ago [-]
Consider a complex LLM pipeline with multiple steps. Each LLM evaluation has an associated prompt to shape the style/context of the response. Conventionally these prompts are treated like hyperparameters that have to be manually adjusted to get the desired behavior from the LLM.

This work introduces a way to treat these prompts like trainable parameters, updating them through automatic differentiation of some kind of supervised training loss.

For me it kind of feels like deep dream or style transfer, which use autograd to optimize the model inputs (instead of the parameters) to achieve some goal (like mixing the style and content of two input images)
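Roughly, the loop looks like ordinary supervised optimization with the prompt in the parameter slot. An illustrative sketch, not the paper's code; run_pipeline, judge and propose_edit are assumed stand-ins here, and the latter two would typically be LLM calls themselves:

    def run_pipeline(prompt: str, x: str) -> str:
        # Run the LLM pipeline on input x with the current prompt.
        raise NotImplementedError

    def judge(x: str, y_pred: str, y_true: str) -> str:
        # Textual "loss": an LLM critique of the prediction against the target.
        raise NotImplementedError

    def propose_edit(prompt: str, critiques: list) -> str:
        # Backward engine: an LLM call that rewrites the prompt given the collected critiques.
        raise NotImplementedError

    def train_prompt(prompt: str, dataset, steps: int = 10) -> str:
        for _ in range(steps):
            critiques = []
            for x, y_true in dataset:  # forward pass over the training examples
                y_pred = run_pipeline(prompt, x)
                critiques.append(judge(x, y_pred, y_true))  # "gradients" as textual feedback
            prompt = propose_edit(prompt, critiques)  # "gradient step" = prompt rewrite
        return prompt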

alankarmisra 4 days ago [-]
This paper suggests that LLMs can be trained to handle multi-stage questioning by automatically optimizing prompts using feedback-based methods, improving their ability to process complex, multi-step interactions.
deadbabe 4 days ago [-]
less words makes more prompt
iceman_w 3 days ago [-]
I always thought that the point of instruction tuning and the ability to use prompts to get the model to do zero-shot tasks was that you don't have to collect tons of example data. The method proposed here requires you to have tons of data. If you have that, why not just fine-tune the underlying model?
rfw300 4 days ago [-]
I always find myself baffled by “prompt optimization” frameworks. Do people really find themselves needing random perturbations of a fixed prompt to improve accuracy? It’s my experience that the challenging part of writing a prompt is figuring out what the task you want done is, and understanding which data you need to pass to the model to make the task achievable. None of that can be achieved by “optimizing” the prompt—the hard part is a layer of abstraction upward.
popalchemist 3 days ago [-]
It's useful in enterprise scenarios where you need a reliable outcome for some kind of programmatic task, and you are dealing with throughput of jobs in the thousands to hundreds of thousands.
joehewett 4 days ago [-]
depends what you're doing. If you're using ChatGPT via the UI for a one off question, sure. If you're prompting an LLM that is doing a critical task in production millions of times, minor improvements can have significant benefit
rfw300 3 days ago [-]
I have done the latter much more than the former. My experience has been the issues come from inputs that you don’t foresee, not reliability on in-distribution uses (which would be your “training” data for prompt optimization). And the worry is that this kind of optimization would lead to substantive revisions of the guidelines set out in the prompt, which could further compromise performance out of distribution.

To the extent that you need to eke out reliability on the margins, one is vastly better served by actual fine-tuning, which is available both for open-source models and most major proprietary models.

thom 4 days ago [-]
Wow, just when I’d accepted MIPRO in DSPy was magic, here we are. Things continue apace.
meame2010 4 days ago [-]
Yup. LLM-AutoDiff is just getting started. But it has shown that generation-only optimization, without explicit few-shot samples, can be even more effective and produce shorter final prompts.
schappim 2 days ago [-]
Here is a basic implementation of this in Ruby: https://gist.github.com/schappim/ad8b4953486617c7f813751c8ee...
nullc 4 days ago [-]
Requires a backwards-trained LLM, no?

I don't think anyone has pretrained a remotely-close-to-SOTA-sized backwards model.

seanhunter 4 days ago [-]
Haven't read the paper yet, just the abstract, but it sounds like it uses a backwards-trained LLM itself to generate prompts and examples, but can do the autodiff on any LLM.
meame2010 4 days ago [-]
We use GPT-4o as the backward model. But I'm excited to try DeepSeek R1 as it has explicit reasoning available.

We are continuously adding more benchmarks to the paper with UT Austin.
