This reminds me of the Agent Workflow Memory (AWM) paper [1], which also tries to find optimal decision paths for LLM-based agents but relies on in-context learning, whereas DeepRAG fine-tunes models to decide when to retrieve external knowledge.
I’ve been thinking about how modifying AWM to use fine-tuning or an external knowledge system (RAG) might work—capturing the ‘good’ workflows it discovers rather than relying purely on prompting.
Does anyone have a good recommendation for a local dev setup that does something similar with available tools? I.e. one that incorporates a bunch of PDFs (~10,000 pages of datasheets) and other docs, as well as a curl-style importer?
Trying to wean myself off the next tech molochs, ideally with local functionality similar to OpenAI's Search + Reason. I gave up on LangChain during my first attempt 6 months ago.
[1] https://arxiv.org/abs/2409.07429 - Agent Workflow Memory (Wang et al., 2024)
throwup238 12 hours ago [-]
Honestly you're better off rolling your own (but avoid LangChain like the plague). The actual implementation is simple but the devil is in the details - specifically how you chunk your documents to generate vector embeddings. Every time I've tried to apply general-purpose RAG tools to specific types of documents like medical records, internal knowledge bases, case law, datasheets, and legislation, it's been a mess.
Best case scenario you can come up with a chunking strategy specific to your use case that will make it work: stuff like grouping all the paragraphs/tables about a register together or grouping tables of physical properties in a datasheet with the table title or grouping the paragraphs in a PCB layout guideline together into a single unit. You also have to figure out how much overlap to allow between the different types of chunks and how many dimensions you need in the output vectors. You then have to link chunks together so that when your RAG matches the register description, it knows to include the chunk with the actual documentation so that the LLM can actually use the documentation chunk instead of just the description chunk. I've had to train many a classifier to get this part even remotely usable in nontrivial use cases like caselaw.
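A minimal sketch of that chunk-linking idea, with made-up chunk IDs and a plain in-memory store rather than any particular vector database's API: each chunk carries the IDs of related chunks, so a hit on a brief description pulls in the full documentation alongside it.

```python
# Sketch of linked chunks: a retrieval hit on a register *description* chunk
# co-retrieves the linked *documentation* chunk. All names are illustrative,
# not a specific library's API.
from dataclasses import dataclass, field

@dataclass
class Chunk:
    chunk_id: str
    text: str
    linked_ids: list = field(default_factory=list)  # chunks to co-retrieve

def retrieve_with_links(query_hits, store, max_chunks=10):
    """Expand raw vector-search hits with their linked chunks, deduped."""
    seen, expanded = set(), []
    for hit in query_hits:
        for cid in [hit.chunk_id, *hit.linked_ids]:
            if cid not in seen and len(expanded) < max_chunks:
                seen.add(cid)
                expanded.append(store[cid])
    return expanded

# Example: the short description links to the full register documentation.
store = {
    "reg-desc-GPIO_MODER": Chunk("reg-desc-GPIO_MODER",
                                 "GPIO port mode register (brief).",
                                 linked_ids=["reg-doc-GPIO_MODER"]),
    "reg-doc-GPIO_MODER": Chunk("reg-doc-GPIO_MODER",
                                "Full bitfield table and usage notes..."),
}
hits = [store["reg-desc-GPIO_MODER"]]  # pretend this came from vector search
print([c.chunk_id for c in retrieve_with_links(hits, store)])
```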
Worst case scenario you have to fine-tune your own embedding model, because the colloquialisms the general-purpose ones are trained on have little overlap with the terms of art and jargon used in the documents (this is especially bad for legal and highly technical texts IME). This generally requires thousands of examples created by an expert in the field.
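One way to do that fine-tuning is with the sentence-transformers library; a rough sketch below, where the two expert-written (query, passage) pairs are obviously placeholders for the thousands you'd actually need:

```python
# Sketch of domain fine-tuning an embedder with sentence-transformers.
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

model = SentenceTransformer("all-MiniLM-L6-v2")

# In practice: thousands of (query, relevant passage) pairs written or
# vetted by a domain expert, so jargon maps to the right passages.
train_examples = [
    InputExample(texts=["GPIO alternate function mapping",
                        "Table 12. Alternate functions AF0..AF15 per pin..."]),
    InputExample(texts=["estoppel by deed",
                        "A grantor who conveys land by deed is precluded..."]),
]

loader = DataLoader(train_examples, shuffle=True, batch_size=16)
loss = losses.MultipleNegativesRankingLoss(model)  # in-batch negatives
model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=100)
model.save("domain-embedder")
```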
deoxykev 12 hours ago [-]
Don't forget to fine-tune the reranker too if you end up doing the embedding model. That tends to have outsized effects on performance for out-of-distribution content.
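For context, a reranker is typically a cross-encoder that rescores (query, passage) pairs after the first-stage vector search; a minimal sketch using a stock sentence-transformers model (fine-tuning it on in-domain pairs is the same expert-labeled-data exercise as the embedder):

```python
# Sketch: cross-encoder reranking of first-stage retrieval candidates.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
query = "reset value of the GPIO mode register"
candidates = ["GPIOx_MODER reset value: 0xA800 0000 for port A...",
              "The I2C timing register must be configured before..."]

scores = reranker.predict([(query, c) for c in candidates])
reranked = [c for _, c in sorted(zip(scores, candidates), reverse=True)]
```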
cpursley 9 hours ago [-]
I've had great luck just base64'ing images and asking Qwen 2.5 VL to both parse them to markdown and generate a title, description, and list of keywords (seems to work well on tables and charts). My plan is to split PDFs into PNGs first, then run those against Qwen asynchronously, then put the results into a vector database (haven't gotten around to that quite yet).
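A sketch of that page-image step, assuming Qwen 2.5 VL is served behind an OpenAI-compatible endpoint (e.g. vLLM or Ollama); the URL and model name here are placeholders. The base64 is only the transport encoding for the image; what you store and embed is the markdown the model returns.

```python
# Sketch: send a base64-encoded page image to a VLM, get markdown + metadata.
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

with open("datasheet_page_001.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="Qwen/Qwen2.5-VL-7B-Instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Parse this page to markdown, then add "
             "a title, a one-line description, and a keyword list."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }],
)
print(resp.choices[0].message.content)  # markdown + metadata, ready to embed
```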
metadat 4 hours ago [-]
How does the base64 output become useful / usable information to an LLM?
byefruit 12 hours ago [-]
> This generally requires thousands of examples created by an expert in the field.
Or an AI model pretending to be an expert in the field... (works well in a few niche domains I have used this in)
3abiton 11 hours ago [-]
I am looking up chunking techniques, but the resources are so scarce on this. What's your recommendation?
If there's a list of techniques and their optimal use cases I haven't found it.
I started writing one for the day job, but then graphRAG happened, and Gartner is saying all RAG will be graphRAG.
You can't fight Gartner, no matter how wrong they are, so the work stopped; now everything is a badly implemented graph.
That's a long way of saying: if there is a comparison, a link would be most appreciated.
petesergeant 5 hours ago [-]
It’s the big unsolved problem and nobody’s talking about it. I’ve had some decent success asking an expensive model to generate the chunks and combining that with document location, and my next plan for an upcoming project is to do that hierarchically, but there’s no generally accepted solution yet.
RAG’s big problem is turning PDFs into chunks: it's both a parsing problem and a chunking problem. I paid someone to do the parsing part into markdown for a project recently (including table data summaries) and it worked well. MathPix has a good API for this, but it only works sensibly for PDFs that don’t have insane layouts, and many do.
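One version of the "expensive model generates the chunks" approach, sketched below; the prompt wording and JSON shape are assumptions, not an established recipe. A hierarchical variant would run the same step over sections first, then over the pages inside each section.

```python
# Sketch of LLM-driven chunking: ask a strong model to propose semantically
# coherent chunks, and keep each chunk's location in the source document.
import json
from openai import OpenAI

client = OpenAI()  # or a local OpenAI-compatible endpoint

def llm_chunk(markdown_page: str, page_no: int) -> list[dict]:
    resp = client.chat.completions.create(
        model="gpt-4o",
        response_format={"type": "json_object"},
        messages=[{"role": "user", "content":
            "Split this page into self-contained chunks. Return JSON: "
            '{"chunks": [{"title": ..., "text": ...}]}\n\n' + markdown_page}],
    )
    chunks = json.loads(resp.choices[0].message.content)["chunks"]
    for c in chunks:
        c["location"] = {"page": page_no}  # keep document location per chunk
    return chunks
```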
cyanydeez 4 hours ago [-]
The data source I have is a filesystem with docs, PDFs, graphs, etc.
I'll need to expand folder names and file abbreviations, do repetitive analysis to find footers and headers, locate titles on first pages, and dedupe a lot. It seems like some kind of content+hierarchy+keywords+subtitle will need to be vectorized, like a card catalog.
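For the repeated header/footer pass, one simple heuristic is frequency counting: lines that recur near the top or bottom of most pages are almost certainly boilerplate. A sketch, with thresholds that are guesses to tune per corpus:

```python
# Sketch: strip lines that repeat at page edges across most pages.
from collections import Counter

def find_boilerplate(pages: list[list[str]], edge=3, min_frac=0.6) -> set[str]:
    """pages: list of pages, each a list of text lines."""
    counts = Counter()
    for lines in pages:
        for line in set(lines[:edge] + lines[-edge:]):  # page top/bottom
            counts[line.strip()] += 1
    return {l for l, n in counts.items() if n >= min_frac * len(pages) and l}

def strip_boilerplate(pages, junk):
    return [[l for l in lines if l.strip() not in junk] for lines in pages]
```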
crishoj 11 hours ago [-]
> but avoid LangChain like the plague
Can you elaborate on this?
I have a proof-of-concept RAG system implemented with LangChain, but would like input before committing to this framework.
t1amat 11 hours ago [-]
LangChain is considered complicated to get started with, despite probably offering the widest range of functionality. If you are already comfortable with LangChain, feel free to ignore that advice.
weitendorf 10 hours ago [-]
My company (actually our two amazing interns) was working on this over the summer. We abandoned it, but it’s 85% of the way to doing what you want: https://github.com/accretional/semantifly
We stopped working on it mostly because we had higher priorities and because I became pretty disillusioned with top-K RAG. We had to build out a better workflow system anyway, and with that we could instead just have models write and run specific queries (e.g. list all .ts files containing the word “DatabaseClient”), and otherwise have their context set by users explicitly.
The problem with RAG is that simplistic implementations distract and slow down models. You probably need an implementation that makes multiple passes to prune the context down to what you need to get good results, but that’s complicated enough that you might want to build something else that gives you more bang for your buck.
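A sketch of that "let the model run specific queries" alternative: expose a grep-like function as a tool the model can call, instead of stuffing top-K chunks into context. The function name is made up, and the tool-calling wiring is assumed rather than shown.

```python
# Sketch: a grep-style tool a model could call instead of top-K retrieval.
from pathlib import Path

def find_files(root: str, suffix: str, needle: str) -> list[str]:
    """e.g. find_files("src", ".ts", "DatabaseClient")"""
    return [str(p) for p in Path(root).rglob(f"*{suffix}")
            if needle in p.read_text(errors="ignore")]
```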
numba888 5 hours ago [-]
> gave up on Langchain during my first attempt 6 months ago
Why? If it's not a secret. I'm just looking for something, not sure actually what... :-\
jondwillis 13 hours ago [-]
Continue and Cline work with local models (e.g. via Ollama) and have good UX for including different kinds of context. Cursor uses remote models, but provides similar functionality.
amrrs 13 hours ago [-]
Sorry, just trying to clarify: why would you use Cline (which is a coding assistant) for RAG?
brunohaid 13 hours ago [-]
Appreciated! Didn’t know Cline already does RAG handling, thought I’d have to wire that up beforehand.
kordlessagain 12 hours ago [-]
I’ve been working on something that provides document search for agents to call if they need the documents. Let me know if you are interested. It’s open source, and it’s set up so that you can perform whatever type of chunking you might prefer. For this many documents it will need some bucketing with semantic relationships, which I’ve been noodling on this last year. Still needs some tweaking for what you are doing, probably. Might get you further along if you are considering rolling your own…
heywoods 12 hours ago [-]
Could I take a look at the repo? Thanks!
jondwillis 13 hours ago [-]
The title reads awkwardly to a native English speaker. A search of the PDF for "latency" returns one result, discussing how naive RAG can result in latency. What are the latency impacts and other trade-offs to achieve the claimed "[improved] answer accuracy by 21.99%"? Is there any way that I could replicate these results without having to write my own implementation?