Honest question but how do you see your business being affected as foundational models improve? While I have massive complaints about them, Gemini + structured outputs is working remarkably well for this internally and it's only getting better. It's also an order of magnitude cheaper than anything I've seen commercially.
maxmaio 51 seconds ago [-]
We're excited for foundational models to improve because we hope it will unlock a lot more use cases. Things like analysis after extraction, accurately extracting extremely complex documents, etc!
zh2408 2 hours ago [-]
Saw Reducto released a benchmark related to your product: https://reducto.ai/blog/rd-tablebench
Curious about your take on the benchmark and how well Midship performs.
maxmaio 2 hours ago [-]
The reducto guys are great! Their benchmark is not exactly how we would index our product because we extract into a user specified template vs. extracting into markdown (wysiwyg). That being said their eval aligns with our internal findings of commercial OCR offerings.
ivanvanderbyl 2 hours ago [-]
Congrats on the launch!
I’m curious to hear more about your pivot from AI workflow builder to document parsing. I can see correlations there, but that original idea seems like a much larger opportunity than parsing PDFs to tables in what is an already very crowded space. What verticals did you find have this problem specifically that gave you enough conviction to pivot?
maxmaio 44 minutes ago [-]
We saw initial traction with real estate firms extracting property data like rent rolls. But we've also seen traction in other verticals like accounting and intake forms. The original idea was very ambitious and when talking to potential customers they all seemed to be happy with the existing players.
nostrebored 3 hours ago [-]
How does your accuracy compare with VLMs like ColFlor and ColPali?
kietay 3 hours ago [-]
We think about accuracy in 2 ways:
Firstly as a function of the independent components in our pipeline. For example, we rely on commercial models for document layout and character recognition. We evaluate each of these and select the highest accuracy, then fine-tune where required.
Secondly we evaluate accuracy per customer. This is because however good the individual components are, if the model "misinterprets" a single column, every row of data will be wrong in some way. This is more difficult to put a top level number on and something we're still working on scaling on a per-customer basis, but much easier to do when the customer has historic extractions they have done by hand.
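To make the per-customer point concrete, here is a minimal sketch (not Midship's actual evaluation) of scoring an extraction against rows a customer already produced by hand. The function name `column_accuracy`, the `unit` join key, and the rent-roll rows are all made up for illustration.

```python
from collections import defaultdict

def column_accuracy(expected_rows, extracted_rows, key):
    """Per-column share of cells that match the hand-made reference, joined on `key`."""
    extracted_by_key = {row[key]: row for row in extracted_rows}
    hits = defaultdict(int)
    totals = defaultdict(int)
    for expected in expected_rows:
        got = extracted_by_key.get(expected[key], {})
        for column, value in expected.items():
            if column == key:
                continue
            totals[column] += 1
            # Compare as trimmed strings so 1200 and "1200 " count as equal
            if str(got.get(column, "")).strip() == str(value).strip():
                hits[column] += 1
    return {column: hits[column] / totals[column] for column in totals}

# Hypothetical rows: what the customer keyed in by hand vs. what the model returned
reference = [
    {"unit": "101", "tenant": "A. Lee", "rent": "1200"},
    {"unit": "102", "tenant": "B. Cruz", "rent": "1350"},
]
extracted = [
    {"unit": "101", "tenant": "A. Lee", "rent": "14400"},   # annual read as monthly
    {"unit": "102", "tenant": "B. Cruz", "rent": "16200"},
]

scores = column_accuracy(reference, extracted, key="unit")
print(scores)  # {'tenant': 1.0, 'rent': 0.0}
```

A single misread column shows up as a per-column score of 0 even though every other cell is right, which is why one top-level number can hide the problem.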
tlofreso 3 hours ago [-]
Congrats on the launch... You're in a crowded space. What differentiates Midship? What are you doing that's novel?
kietay 3 hours ago [-]
Cofounder here.
Great Q - there is definitely a lot of competition in dev tool offerings but less so in end to end experiences for non technical users.
Some of the things we offer above and beyond dev tools:
1. Schema building to define “what data to extract”
2. A hosted web app to review, audit and export extracted data
3. Integrations into downstream applications like spreadsheets
Outside of those user facing pieces, the biggest engineering effort for us has been in dealing with very complex inputs, like 100+ page PDFs. Just dumping into ChatGPT and asking nicely for the structured data falls over in both obvious (# input/output tokens exceeded) and subtle ways (e.g. missing a row in the middle of the extraction).
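For the token-limit failure mentioned above, one common workaround (an assumption here, not a description of what Midship does) is to split the document into overlapping page chunks, extract rows from each chunk, and merge the results. In this rough sketch, `chunk_pages`, `merge_rows`, and the character budget standing in for a token budget are all illustrative choices.

```python
def chunk_pages(pages, max_chars, overlap=1):
    """Group page indexes into chunks under max_chars, repeating `overlap` pages between chunks."""
    chunks, current, size = [], [], 0
    for index, text in enumerate(pages):
        if current and size + len(text) > max_chars:
            chunks.append(current)
            # Carry the last page(s) forward so a table split across pages is seen whole
            current = current[-overlap:] if overlap else []
            size = sum(len(pages[i]) for i in current)
        current.append(index)
        size += len(text)
    if current:
        chunks.append(current)
    return chunks

def merge_rows(chunk_results, key):
    """Concatenate per-chunk rows, keeping the first row seen for each key."""
    seen, merged = set(), []
    for rows in chunk_results:
        for row in rows:
            if row[key] not in seen:
                seen.add(row[key])
                merged.append(row)
    return merged

pages = ["x" * 40] * 5                      # five dummy pages of 40 characters each
chunks = chunk_pages(pages, max_chars=100)
print(chunks)  # [[0, 1], [1, 2], [2, 3], [3, 4]]

rows = merge_rows([[{"id": 1}, {"id": 2}], [{"id": 2}, {"id": 3}]], key="id")
print(rows)    # [{'id': 1}, {'id': 2}, {'id': 3}]
```

The overlap means rows near a chunk boundary get extracted twice, so the merge step needs a stable key per row. That does not by itself catch a row the model silently drops in the middle of a chunk.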
seany62 3 hours ago [-]
Are users able to export their organized data?
maxmaio 3 hours ago [-]
Yes today we support exports to csv or excel from our web app!
hk1337 3 hours ago [-]
This is interesting.
Can you do this with emails?
tlofreso 3 hours ago [-]
And if yes, be specific in answering. Emails are a bear! Emails can have several file types as attachments, including other emails, zip files, and in-line images where position matters for context.
maxmaio 3 hours ago [-]
We currently support pdf, docx, most image types (jpeg/jpg, png, heic), and excel.
Saving the email as a pdf would work!
hubraumhugo 3 hours ago [-]
Congrats on the launch! A quick search in the YC startup directory brought up 5-10 companies doing pretty much the same thing:
At the same time, accurate document extraction is becoming a commodity with powerful VLMs. Are you planning to focus on a specific industry, or how do you plan to differentiate?
mitchpatin 2 hours ago [-]
TableFlow co-founder here - I don't want to distract from the Midship launch (congrats!) but did want to add my 2 cents.
We see a ton of industries/use-cases still bogged down by manual workflows that start with data extraction. These are often large companies throwing many people at the issue ($$). The vast majority of these companies lack technical teams required to leverage VLMs directly (or at least the desire to manage their own software). There’s a ton of room for tailored solutions here, and I don't think it's a winner-take-all space.
maxmaio 2 hours ago [-]
+1 to what mitch said. We believe there is a large market for non-technical users who can now automate extraction tasks but do not know how to interact with apis. Midship is another option for them that requires 0 programming!
erulabs 1 hours ago [-]
Execution is everything. Not to drop a link in someone else’s HN launch but I’m building https://therapy-forms.com and these guys are way ahead of me on UI, polish, and probably overall quality. I do think there’s plenty of slightly different niches here, but even if there were not, execution is everything. Heck it’s likely I’ll wind up as a midship customer, my spare time to fiddle with OCR models is desperately limited and all I want to do is sell to clinics.
tlofreso 2 hours ago [-]
"accurate document extraction is becoming a commodity with powerful VLMs"
Agree.
The capability is fairly trivial for orgs with decent technical talent.
The tech / processes all look similar:
User uploads file -->
Azure prebuilt-layout returns .MD -->
prompt + .MD + schema sent to LLM -->
JSON returned. Do whatever you want with it.
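The steps above can be sketched roughly like this. It is only a skeleton: `layout_to_markdown` and `call_llm` are hypothetical callables standing in for the layout service and the LLM, and the fake versions below exist just so the example runs without any external API.

```python
import json

def build_prompt(markdown, schema):
    """Combine instructions, the target schema, and the document markdown into one prompt."""
    return (
        "Extract the fields described by this JSON schema from the document.\n"
        "Reply with JSON only.\n\n"
        f"Schema:\n{json.dumps(schema, indent=2)}\n\nDocument:\n{markdown}"
    )

def extract(file_bytes, schema, layout_to_markdown, call_llm):
    """file -> markdown -> prompt + schema -> LLM -> parsed JSON, with a basic required-field check."""
    markdown = layout_to_markdown(file_bytes)
    reply = call_llm(build_prompt(markdown, schema))
    data = json.loads(reply)  # raises ValueError if the model did not return valid JSON
    missing = [field for field in schema.get("required", []) if field not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return data

# Stand-ins for the real services (hypothetical, for illustration only)
def fake_layout(file_bytes):
    return "| Tenant | Rent |\n|---|---|\n| A. Lee | 1200 |"

def fake_llm(prompt):
    return '{"tenant": "A. Lee", "rent": 1200}'

schema = {"type": "object", "required": ["tenant", "rent"]}
result = extract(b"%PDF-...", schema, fake_layout, fake_llm)
print(result)  # {'tenant': 'A. Lee', 'rent': 1200}
```

Passing the two services in as arguments keeps the skeleton independent of any particular vendor and makes it easy to test with canned responses.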
Kiro 24 minutes ago [-]
Why all those steps? Why not just file + prompt to JSON directly?
kietay 2 hours ago [-]
Totally agree that this is becoming the standard "reference architecture" for this kind of pipeline. The only thing that complicates this a lot today is complex inputs. For simple 1-2 page PDFs what you describe works quite well out of the box, but for 100+ page docs it starts to fall over in the ways I described in another comment.
tlofreso 1 hours ago [-]
Are really large inputs solved at midship? If so, I'd consider that a differentiator (at least today). The demo's limited to 15pgs, and I don't see any marketing around long-context or complex inputs on the site.
I suspect this problem gets solved in the next iteration or two of commodity models. In the meantime, being smart about how the context gets divvied works ok.
I do like the UI you appear to have for citing information. Drawing the polygons around the data, and then where they appear in the PDF. Nice.
maxmaio 3 hours ago [-]
Yes there is definitely a boom in document related startups. We see our niche as focusing on non technical users. We have focused on making it easy to build schemas, an audit and review experience, and integrating into downstream applications.
- https://www.ycombinator.com/companies/tableflow
- https://www.ycombinator.com/companies/reducto
- https://www.ycombinator.com/companies/mindee
- https://www.ycombinator.com/companies/omniai
- https://www.ycombinator.com/companies/trellis