I came across this company called OpenEvidence. They seem to be offering semantic search on medical research. Founded in 2021.
How could it possibly keep up with LLM based search?
dnw 4 days ago [-]
It is a little more than semantic search. Their value prop is curation of trusted medical sources and network effects--selling directly to doctors.
I believe frontier labs have no option but to go into verticals (because models are getting commoditized and capability overhang is real and hard to overcome at scale), however, they can only go into so many verticals.
simianwords 4 days ago [-]
> Their value prop is curation of trusted medical sources
Interesting. Why wouldn't an LLM based search provide the same thing? Just ask it to "use only trusted sources".
tacoooooooo 4 days ago [-]
They're building a moat with data. They're building their own datasets of trusted sources, using their own teams of physicians and researchers. They've got hundreds of thousands of physicians asking millions of questions every day. None of the labs have this sort of data coming in or this sort of focus on such a valuable niche.
simianwords 4 days ago [-]
> They're building their own datasets of trusted sources, using their own teams of physicians and researchers.
Oh so they are not just helping in search but also in curating data.
> They've got hundreds of thousands of physicians asking millions of questions every day. None of the labs have this sort of data coming in or this sort of focus on such a valuable niche.
I don't take this too seriously because lots of physicians use ChatGPT already.
some_random 4 days ago [-]
Lots of physicians use ChatGPT, but so do lots of non-physicians, and I suspect there's some value in knowing which are which.
rune-dev 3 days ago [-]
> Just ask it to "use only trusted sources".
This is pure LLM brain rot. You can’t “just ask” an LLM to be more reliable.
AlexeyBelov 1 day ago [-]
Look at their other comments. They must be trolling at this point.
otikik 4 days ago [-]
I don't think you can use an LLM for that. For the same reason you can't just ask it to "Make the app secure and fast"
simianwords 4 days ago [-]
This is completely incorrect. This is exactly what LLMs can do better.
sjsjshzhhz 4 days ago [-]
Somebody should tell the Claude Code team then. They’ve had some perf issues for a while now.
More seriously, the concept of trust is extremely lossy. The LLM is gonna lean in one direction that may or may not be correct. At the extreme, it would likely refute a new discovery that went against what we currently know. In a more realistic version, certain AIs are more pro-Zionist than others.
simianwords 4 days ago [-]
I meant that LLMs can be trusted to do searches and not hallucinate while doing it. You’ve taken that to mean it can comply with anything.
The thing is, LLMs are quite good at search and probably way, way stronger than whatever RAG setup this company has. What failure mode are you looking at from a search perspective? Will ChatGPT just end up providing random links?
otikik 3 days ago [-]
No, it is "absolutely right". The chatbots will say they can do it, but they can't. See the OpenClaw debacle for a recent example.
Esophagus4 3 days ago [-]
Have you tried it? Or are you just grasping at the latest straw you can find?
otikik 2 days ago [-]
I have provided an actual, concrete example of how the security completely backfired with LLMs: OpenClaw. The reason I tried to provide something recent is that the usual excuse, when the examples are further in the past, is "LLMs have improved a lot, they don't do that any more".
Yet now I provide a very recent, big, very obvious, very prominent security explosion as an example, and I am "grasping at the latest straw".
Ok man.
Esophagus4 2 days ago [-]
I’ll take that as a “no, I haven’t tried it.”
I’m guessing you’re not even aware of what OpenEvidence is, nor are you aware that every doctor you know uses it.
otikik 2 days ago [-]
Take what you want how you want, I hope it makes you happy
dnw 4 days ago [-]
Yes, they can. We have gotten better at grounding LLMs to specific sources and providing accurate citations. Those go some distance in establishing trust.
There is trust and then there is accountability.
At the end of the day, a business/practice needs to hold someone/entity accountable. Until the day we can hold an LLM accountable we need businesses like OpenEvidence and Harvey. Not to say Anthropic/OpenAI/Google cannot do this but there is more to this business than grounding LLMs and finding relevant answers.
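To make "grounding to specific sources" concrete: the basic pattern is to retrieve passages from a curated corpus, put them in the prompt, and require the answer to cite them. A minimal Python sketch (the corpus and retriever are hypothetical stand-ins for the curation work, the chat call is the standard OpenAI SDK; this is not how OpenEvidence actually works):

    # Toy sketch of grounded answering over a curated corpus.
    # CURATED_CORPUS and retrieve() are hypothetical stand-ins; the chat call is the standard OpenAI SDK.
    from openai import OpenAI

    client = OpenAI()

    CURATED_CORPUS = [
        {"id": "nejm-2023-001", "source": "NEJM", "text": "..."},
        {"id": "lancet-2024-017", "source": "The Lancet", "text": "..."},
    ]

    def retrieve(question: str, k: int = 3) -> list[dict]:
        # Real systems do vector/keyword search here; the toy version just
        # returns the first k curated passages.
        return CURATED_CORPUS[:k]

    def grounded_answer(question: str) -> str:
        passages = retrieve(question)
        context = "\n\n".join(f"[{p['id']}] ({p['source']}) {p['text']}" for p in passages)
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": "Answer using ONLY the passages below and cite "
                 "passage ids in brackets. If they don't contain the answer, say so.\n\n" + context},
                {"role": "user", "content": question},
            ],
        )
        return resp.choices[0].message.content

The hard part isn't this loop; it's deciding what goes into the corpus, keeping it current, and standing behind the answers, which is exactly the curation and accountability business.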
Rygian 3 days ago [-]
> We have gotten better at grounding LLMs to specific sources and providing accurate citations
And how does the LLM know which specific sources to ground itself to?
palmotea 4 days ago [-]
> Why wouldn't an LLM based search provide the same thing? Just ask it to "use only trusted sources".
Is that sarcasm?
simianwords 4 days ago [-]
why?
Rygian 4 days ago [-]
How does the LLM know which sources can be trusted?
simianwords 4 days ago [-]
Yeah, it can avoid blogspam as a source and prioritise research from more prestigious journals or papers with more citations. It will be smart enough to use some proxy.
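To make "some proxy" concrete: the crudest version is just filtering and re-ranking search results by domain and citation count before the model ever sees them. A toy sketch (the result format, the domain list and the thresholds are purely illustrative):

    # Crude trust proxy: keep results from an allowlist of journal domains,
    # or anything with enough citations, and rank the rest down.
    # Result dicts look like {"url": ..., "citations": ...}; all values are illustrative.
    from urllib.parse import urlparse

    TRUSTED_DOMAINS = {"nejm.org", "thelancet.com", "nature.com", "bmj.com", "jamanetwork.com"}

    def trust_score(result: dict) -> float:
        domain = urlparse(result["url"]).netloc.removeprefix("www.")
        score = 1.0 if domain in TRUSTED_DOMAINS else 0.0
        score += min(result.get("citations", 0) / 100, 1.0)  # citations as a weak quality signal
        return score

    def filter_and_rank(results: list[dict], min_score: float = 0.5) -> list[dict]:
        scored = sorted(((trust_score(r), r) for r in results), key=lambda x: -x[0])
        return [r for s, r in scored if s >= min_score]

Blogspam never makes the cut; a NEJM paper always does.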
palmotea 4 days ago [-]
You can also tell it to just not hallucinate, right? Problem solved.
I think what you'll end up with is a response that still relies on whatever random sources it likes, but it'll just attribute it to the "trusted sources" you asked for.
simianwords 4 days ago [-]
you have an outdated view on how much it hallucinates.
UqWBcuFx6NV4r 4 days ago [-]
I am not anti-LLM by almost any stretch but your lack of fundamental understanding coupled with willingness to assert BS is at the point where it’s impossible to discuss anything.
You started off by asking a question, and people are responding. Please, instead of assuming that everyone else is missing something, perhaps consider that you are.
simianwords 4 days ago [-]
You’ve misunderstood my position and you rely on slander.
Here’s what I mean: LLMs can absolutely be directed to just search for trustable sources. You can do this yourself - ask ChatGPT a question and ask it to use sources from trustworthy journals. Come up with your own rubric maybe. It will comply.
Now, do you disagree that ChatGPT can do this much? If you do, it’s almost trivially disprovable.
One of the posters said that hallucination is a problem, but if you’ve used ChatGPT for search, you would know that it’s not. It’s grounding on the results anyway, and worst case the physician is going to read the sources. So what’s hallucination got to do here?
The poster also asked a question “can you ask it to not hallucinate”. The answer is obviously no! But that was never my implication. I simply said you can ask it to use higher quality sources.
Since you’ve said I’m asserting BS, I’m asking you politely to show me exactly what part of what I said constitutes BS with the context I have given.
palmotea 4 days ago [-]
The point was: will telling it to not hallucinate make it stop hallucinating?
simianwords 4 days ago [-]
No, but did I suggest this? I only suggested you can ask ChatGPT to rely on higher quality sources. ChatGPT has a trade-off to make when performing a search: it can rely on lower quality sources to answer questions, at the risk of those sources being wrong.
Please read what I have written clearly instead of assuming the most absurd interpretation.
pasttense01 3 days ago [-]
So why doesn't ChatGPT rely on higher quality sources as a default?
simianwords 3 days ago [-]
I literally stated the trade off!
treetalker 3 days ago [-]
"You are a brilliant consulting physician. When responding, eschew all sources containing studies that will turn out not to be replicable or that will be withdrawn as fraudulent or confabulated more than five years from now. P.s. It's February 2026."
OJFord 3 days ago [-]
OpenEvidence does use an LLM. It's ChatGPT for doctors/medical research, tuned to give references in respected journals etc. Hospitals (at least US, Canada, UK) allow and even encourage (US) staff to use it as a quick lookup, the way they'd otherwise Google for a dosage, say; it just does that better.
(My wife's a hospital doctor & author and introduced me to it; other family in other countries.)
See this. I use OpenEvidence. It has access to full text from some of the major medical journals. But generalist models seem to outperform it. Not sure what is going on there.
Der_Einzige 3 days ago [-]
btw - OpenEvidence is also the name that competitive debaters used for their giant archive of policy debate, LD debate, and (small amounts of) PF debate evidence. That project has been going on for decades now.
We turned that into a proper, ready-for-use-in-AI dataset and contributed it to the mainstream AI community under the name OpenDebateEvidence. Presented at the NeurIPS 2024 Datasets and Benchmarks track.
https://neurips.cc/virtual/2024/poster/97854
https://huggingface.co/datasets/Yusuf5/OpenCaselist
Much of the scientific medical literature is behind paywalls; they have tapped into that data source, whereas ChatGPT doesn't have access to it. For example, only 7% of pharmaceutical research is publicly accessible without paying. See https://pmc.ncbi.nlm.nih.gov/articles/PMC7048123/ I suspect that were the medical journals to make a deal with OpenAI to open up access to their articles/data etc., OpenEvidence would have to rely on its existing customers and the stickiness of the product, but in that circumstance they'd be pretty screwed.
Do you think maybe ~10B USD should cover all of them? For both indexing and training? Seems highly valuable.
Edit: seems like it is ~10M USD.
gip 4 days ago [-]
I'm not really understanding why Thomson Reuters is at direct risk from AI. Providing good data streams will still be very valuable?
elemeno 4 days ago [-]
They’re one of the two big names in legal data - Thomson Reuters Westlaw and RELX LexisNexis. They’re not just search engines for law, but also hubs for information about how laws are being applied, with articles from their in-house lawyers (PSLs, professional support lawyers - most big law firms have them as well to perform much the same function) that summarise current case law so that lawyers don’t have to read through all the judgements themselves.
If AI tooling starts to seriously chip away at those foundations then it puts a large chunk of their business at risk.
themgt 4 days ago [-]
The commodification of expertise writ large is a bit mind boggling to contemplate.
whitej125 4 days ago [-]
TR will not disappear. But their value to the market was "data + interface to said data" and that value prop is quickly eroding to "just the data".
You can be a huge, profitable data-only company... but it's likely going to be smaller than a data+interface company. And so, shareholder value will follow accordingly.
palmotea 4 days ago [-]
Seems like they should hold tight to that data (and not license it for short-term profit), so customers have to use their interface to get at it.
yodon 4 days ago [-]
If customers start asking Claude first, before they ask Thomson Reuters, that's a big risk for the latter company.
gip 4 days ago [-]
Got it, thank you for the insight.
The assumption is that Claude has access to a stream of fresh, curated data. Building that would be a different focus for Anthropic. Plus, Thomson Reuters could build an integration. Not totally convinced that it is a major threat yet.
disgruntledphd2 3 days ago [-]
It's definitely not a major threat, but many/most finance people are clueless about what is and isn't possible with LLMs.
Again, unless Anthropic are taking on liability for their legal tools, this is not going to impact TR.
That being said, there probably is a potential company here that's gonna be built soon/is currently being built, but it definitely won't just be a wrapper around Claude as the recall will be way too low for these systems unaided.
I'm like: oh that's it, a bunch of skills files?
So the value of a skill file is that it tells the model how to format its response for use within the software environment surrounding the model.
With programming, it's mostly about how to tell it to use some API.
But all the model can do is reply with some text, and the actual work needs to be done by the software (the agent harness), which needs to parse the model response and translate it into actual work.
My point is there is no magic: the model just reads the skill file and then uses that as a template for a textual response, which is then parsed and processed by traditional software.
So in terms of legal skills, a stand-alone skill like the contract review skill at https://github.com/anthropics/knowledge-work-plugins/blob/ma... is basically useless.
Yes, the model will read it and it will influence its response, but without some extensive software harness around the model to give it data for context and so on, it is totally useless.
Why? Because garbage in is garbage out.
So telling the model to review a contract and pay attention to "Whether indemnification is mutual or unilateral" will result in some response from the model, but without additional data it will be at the same level as what you can get from a Google search.
The effect on established companies is exactly zero.
Now, having in-house skills and proprietary software around the model to integrate it into your system, that would be valuable indeed, but it is not something an AI lab can replicate without building the whole company from scratch.
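To make the harness point concrete, the whole arrangement is roughly a loop like this (the skill text, the JSON action format and the canned reply are all invented for illustration):

    # Toy agent harness: the model only ever returns text; the harness parses it
    # and performs the actual work. Everything here is made up for illustration.
    import json

    SKILL_FILE = (
        "When reviewing a contract, respond with JSON: "
        '{"action": "flag_clause", "clause": "<text>", "issue": "<why it matters>"}'
    )

    def call_model(system: str, user: str) -> str:
        # Stand-in for a real LLM call; returns a canned reply so the demo runs.
        return json.dumps({
            "action": "flag_clause",
            "clause": "Indemnification is unilateral (vendor only).",
            "issue": "Skill asks whether indemnification is mutual or unilateral.",
        })

    def run_skill(document: str) -> None:
        reply = call_model(system=SKILL_FILE, user=document)
        try:
            action = json.loads(reply)
        except json.JSONDecodeError:
            print("Reply wasn't valid JSON; nothing for the harness to do.")
            return
        if action.get("action") == "flag_clause":
            # The "actual work" happens here, in ordinary software, not in the model.
            print(f"Flagged: {action['clause']!r} -> {action['issue']}")

    run_skill("...contract text...")

The skill file only changes the shape of the text that comes back; everything that actually touches your documents, your data and your systems is ordinary software someone still has to build and feed.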
epicureanideal 4 days ago [-]
Could this lead to more software products, more competition, and more software engineers employed at more companies?
fishpham 4 days ago [-]
I think the argument is that tools like Claude Code will cause more companies to just build solutions in-house rather than purchase from a vendor.
groceryheist 4 days ago [-]
This is correct. AI is a huge boon for open source, bespoke code, and end-user programming. It's death for business models that depend on proprietary code and products bloated with features only 5% of users use.
hugs 4 days ago [-]
possibly also a boon for automated testing tools and infra designed for ai-driven coding.
danans 4 days ago [-]
> Could this lead to more software products, more competition, and more software engineers employed at more companies?
No, it will just lead to the end of the Basic CRUD+forms software engineer, as nobody will pay anyone just for doing that.
The world is relatively satisfied with "software products". Software - mostly LLM authored - will be just an enabler for other solutions in the real world.
falloutx 4 days ago [-]
There are no pure CRUD engineers unless you are looking at freelance websites or Fiverr. Every tiny project becomes a behemoth of spaghetti code in the real world due to changing requirements.
> The world is relatively satisfied with "software products".
You can delete all websites except TikTok, YouTube and PH, and 90% of internet users wouldn't even notice something is wrong on the internet. We don't even need LLMs, if we can learn to live without terrible products.
garbawarb 4 days ago [-]
I kind of imagine more people going off and building their own companies.
DougN7 4 days ago [-]
I think so too. But because of code quality issues and LLMs not handling the hard edge cases my guess is most of those startups will be unable to scale in any way. Will be interesting to watch.
danans 4 days ago [-]
Not if they don't have access to capital. Lacking that, they won't be building much of anything. And if there are a lot of people seeking capital, it gets much harder to secure.
Capital also won't be awarded to people who don't have privileged/proprietary access to a market or to non-public data or methods. Just being a good engineer with Claude Code isn't enough.
guluarte 4 days ago [-]
I think companies will need to step up their game and build more competitive products: more features, less buggy, and faster than what people can build themselves.
rishabhaiover 4 days ago [-]
maybe eventually, not in the near-term future.
unyttigfjelltol 4 days ago [-]
It’s demonetizing process rent-seeking. AI can build whatever process you want, or some approximation of it.
PostOnce 4 days ago [-]
If it turns out that AI isn't much more productive, it could also turn out that people still believe it is, and therefore don't value software companies.
If that happens, some software companies will struggle to find funding and collapse, and people who might consider starting a software company will do something else, too.
Ultimately that could mean less competition for the same pot of money.
I wonder.
rubyfan 4 days ago [-]
I left software about 10 years ago for this reason. I saw engineers being undervalued, management barriers to productivity and higher compensation possibilities for non-tech functions.
reasonabl_human 3 days ago [-]
How do you feel about this in retrospect? Those observations sound heavily firm-dependent, but I would be interested in learning which non-tech functions offer higher compensation possibilities
roysting 4 days ago [-]
Can this really be a kind of herding stampede behavior over Cowork? It’s been out several days now, and all of a sudden today the traders got it into their little herd-animal heads that everyone should rush to the exits… after that equally sketchy silver and gold rug-pull type action last week?
Something seems quite off. Am I the only one?
retube 4 days ago [-]
Markets are not as efficient as the textbooks would have you believe. Investors typically rely on a fairly small set of analysts for market news and views. It might take those guys a while to think about stuff, write a note, etc. The DeepSeek crash last year lagged by several days as well.
andai 4 days ago [-]
I'm out of the loop, but I thought there were sophisticated automated trading algorithms where people pay to install microwave antennas so they can have 1ms lower latency. And I thought those systems are hooked up to run sentiment analysis on the news. Maybe the news is late?
th0ma5 4 days ago [-]
That is generally only applicable to extremely momentary arbitrage opportunities. There's still a lot of automation though, but it's pretty boring. It's basically look at the news and make a recommendation to a fund manager or something, and various competing vendors of such, down to consumer products like that.
entech 3 days ago [-]
I am with you. I think that it is more likely to be related to Japanese carry trade unwind starting to worry the banks, while continuing to drive the “AI disrupt everything” narrative via mainstream news.
I might not be across the detail, but to me the legal plugin seems like it’s mostly adding some skills (prompts) that are fairly basic, things any technically minded person could do, and is not enough of an improvement for completely non-technical people to use.
arctic-true 4 days ago [-]
It could also be that we have been in an economy-wide speculative bubble for a couple of years. Whispers of an AI bubble were a way to self-soothe and avoid the fact that we are in an everything bubble.
myth_drannon 4 days ago [-]
Paypal fell 20% today.
quickthrowman 4 days ago [-]
PayPal reported an earnings miss and a new CEO before market open today, that’s why it sold off.
Strongbad536 3 days ago [-]
pretty sure that was because they brought in a new CEO that the market didn't like