Launch HN: Midship (YC S24) – Turn PDFs, docs, and images into usable data

121 points by maxmaio 52 days ago | 61 comments

monkeydust 52 days ago [-]

Here's a real-world use case: our company has changed pension providers. This provider, like the old one, sucks at providing me with a good way to navigate through the 120 funds I can invest in.

I want to create something that can paginate through 12 pages of HTML, perform clicks, download each PDF fund factsheet, and extract data from it into Excel or CSV. Can this help? What's the best way to deal with the initial task of automating webpage interactions systematically?

maxmaio 52 days ago [-]

This is an interesting use case! We've heard similar stories from people dealing with pensions. Today we can handle the step of extracting data from a factsheet into Excel or CSV out of the box. Shoot me an email at max@midship.dev!

_hfqa 52 days ago [-]

Have you looked into tools like https://www.multion.ai or https://www.browserbase.com?

navaed01 52 days ago [-]

You should check out Twin

ctippett 52 days ago [-]

Congrats on the launch. I just sent y'all an email – I'm curious about what you can do with airline crew rosters.

crossroadsguy 52 days ago [-]

I would like a tool that very easily converts x months of credit card bills into a CSV (the txn table from across PDFs and pages in each PDF), or something similar.

kitcar 52 days ago [-]

Tabula can do that (open source) https://tabula.technology/

robertlagrant 52 days ago [-]

Wow, great tip.

anon99879 52 days ago [-]

worth a shot - has a free tier: https://www.oscar-idp.com/

maxmaio 51 days ago [-]

Yes we can! Shoot me an email at max@midship.dev

rco8786 52 days ago [-]

Can you speak to the accuracy, particularly of numerical value extraction, that you’re achieving? I have a use case for pulling tabular financial data out of PDFs and accuracy is our main concern with using AI for that type of task.

maxmaio 51 days ago [-]

This is something we are hyper focused on. Shoot me an email at max@midship.dev - you can also try our financial analysis template in our playground for free: https://app.midship.ai/demo.

abhgh 52 days ago [-]

Congratulations on the launch! It's a crowded space but I think there is a place for a good and accurate tool!

Tried the examples - they seem tailored for specific document types. I have two questions around that: (a) is there a "best-effort" extraction you can perform or plan to support if you don't know the document type? (b) do you plan to support extraction from academic papers, i.e., potentially multi-column, with images, tables that are either single column or span two columns, equations, etc.?

maxmaio 51 days ago [-]

Hi! In our playground we only included a few templates, but in our app you can create any template you'd like! So yes, we can extract from academic papers! Shoot me an email: max@midship.dev

fluxode 49 days ago [-]

Congrats on the launch! Just some friendly advice: financial documents such as quarterly earnings are actually highly structured via XBRL. If you are positioning the company as an unstructured -> structured process, then using these types of financial documents is probably not a great example even though everybody seems to do it.

nostrebored 52 days ago [-]

How does your accuracy compare with VLMs like ColFlor and ColPali?

kietay 52 days ago [-]

We think about accuracy in 2 ways

Firstly as a function of the independent components in our pipeline. For example, we rely on commercial models for document layout and character recognition. We evaluate each of these and select the highest accuracy, then fine-tune where required.

Secondly we evaluate accuracy per customer. This is because however good the individual components are, if the model "misinterprets" a single column, every row of data will be wrong in some way. This is more difficult to put a top level number on and something we're still working on scaling on a per-customer basis, but much easier to do when the customer has historic extractions they have done by hand.
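
A minimal sketch of the per-customer check described above, assuming the customer's hand-made extractions and the model output are row-aligned lists of dicts. The function name `column_accuracy` and the sample rent-roll fields are hypothetical, not Midship's implementation. It scores each column separately, which makes a single misinterpreted column stand out:

```python
from collections import defaultdict

def column_accuracy(extracted, reference):
    """Return {column: fraction of rows where the extracted value equals the reference}.

    Assumes both lists are non-empty, the same length, and in the same row order.
    """
    hits = defaultdict(int)
    for got, want in zip(extracted, reference):
        for column, expected in want.items():
            if got.get(column) == expected:
                hits[column] += 1
    total = len(reference)
    return {column: hits[column] / total for column in reference[0]}

# Hand-extracted rows supplied by the customer (illustrative values).
reference = [
    {"unit": "101", "rent": 1200, "sqft": 800},
    {"unit": "102", "rent": 1350, "sqft": 950},
]
# Model output where the "rent" column was misread as square footage.
extracted = [
    {"unit": "101", "rent": 800, "sqft": 800},
    {"unit": "102", "rent": 950, "sqft": 950},
]
scores = column_accuracy(extracted, reference)
print(scores)
```

Here the overall cell accuracy would look respectable, but the per-column view shows `rent` wrong on every row.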

misstercool 52 days ago [-]

Saw your demo video. Are you focusing on the finance sector primarily? It is a challenging industry IMO, requiring high accuracy and has a strict privacy/security bar. How do you address these concerns?

Curious what the biggest complaints from your users are. Are they willing to manually audit the numbers in the table to make sure the output is 1. accurate and 2. formatted in the table they expected?

maxmaio 51 days ago [-]

We have seen a lot of use cases in finance and are currently working with a few firms. Accuracy is generally their primary requirement. We've been focusing on not just accuracy but also the audit experience, which includes confidence scoring. You're right about security, and we are currently undergoing a SOC 2 audit.

Generally firms are outsourcing the data entry and are already manually auditing with shortcuts like summing values.

Regarding output being formatted in the table they expected: we extract the data directly into their template format, so it is always in the table they expect.
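
The summing shortcut mentioned above can be sketched as a simple checksum: add up the extracted line items and compare them with the total printed on the document. This is an illustration only; `total_matches`, the `amount` field, and the one-cent tolerance are assumptions, not part of any product described in this thread.

```python
from decimal import Decimal

def total_matches(rows, stated_total, field="amount", tolerance=Decimal("0.01")):
    """Sum `field` across rows and compare it with the total printed on the document.

    Returns (matches, computed_sum). Values are parsed as Decimal to avoid
    floating-point rounding on currency amounts.
    """
    computed = sum((Decimal(str(row[field])) for row in rows), Decimal("0"))
    return abs(computed - Decimal(str(stated_total))) <= tolerance, computed

rows = [{"amount": "1200.00"}, {"amount": "1350.50"}, {"amount": "99.50"}]
ok, computed = total_matches(rows, "2650.00")
print(ok, computed)
```

A matching total does not prove every row is right (two errors can cancel out), but a mismatch is a cheap signal that something in the extraction needs a closer look.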

XzAeRosho 52 days ago [-]

I've been working on some ML solutions similar to this, and high accuracy is the challenging part, especially with "creative" layout documents.

I'm curious as well to see how they are handling accuracy, since I had to build an external agent to validate data.

ivanvanderbyl 52 days ago [-]

Congrats on the launch!

I’m curious to hear more about your pivot from AI workflow builder to document parsing. I can see correlations there, but that original idea seems like a much larger opportunity than parsing PDFs to tables in what is an already very crowded space. What verticals did you find have this problem specifically that gave you enough conviction to pivot?

maxmaio 52 days ago [-]

We saw initial traction with real estate firms extracting property data like rent rolls. But we've also seen traction in other verticals like accounting and intake forms. The original idea was very ambitious and when talking to potential customers they all seemed to be happy with the existing players.

smt88 52 days ago [-]

How do you guarantee that nothing in an extracted rent roll is hallucinated?

themanmaran 52 days ago [-]

The same way you guarantee that a person manually typing the data never makes a mistake.

smt88 52 days ago [-]

Humans in this space tend to make mistakes like, "Added rent as per-square-foot instead of absolute value," or, "Missed a rent escalation for year 3."

These tend to be easy to catch, even for the same person who's reviewing the data. They would see that rent steps looked strange (Y2 and Y4, but not Y3) or there was an order-of-magnitude difference in rent from one month to another.

AI can do something like invent reasonable-looking rent steps. They're designed to create output that seems reasonable, even if it's completely made up.

When humans are wrong, they tend to misread what's there, which is much less insidious than inventing something.

And if you have a human reviewer for all the work this AI does, what's the point of the AI in the first place? The human has become the source of truth either way.
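
The two review heuristics described above (a gap in the rent steps, or an order-of-magnitude jump) can be sketched as follows. The function name `rent_step_flags` and the 10x threshold are assumptions for illustration. Note that checks like these only catch values that look strange; they would not catch rent steps that are plausible but invented, which is the concern raised in this comment.

```python
def rent_step_flags(steps, jump_ratio=10.0):
    """steps maps lease year number -> rent. Returns a list of human-readable flags."""
    flags = []
    years = sorted(steps)
    for prev, cur in zip(years, years[1:]):
        # A gap such as Y2 followed by Y4 suggests a missed escalation.
        if cur - prev > 1:
            flags.append(f"missing year(s) between Y{prev} and Y{cur}")
        # A 10x swing often means per-square-foot and absolute rent were mixed up.
        low, high = sorted((steps[prev], steps[cur]))
        if low > 0 and high / low >= jump_ratio:
            flags.append(f"order-of-magnitude change from Y{prev} to Y{cur}")
    return flags

print(rent_step_flags({1: 24000, 2: 24720, 4: 26225}))
print(rent_step_flags({1: 24000, 2: 30}))
```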

serjester 52 days ago [-]

Honest question but how do you see your business being affected as foundational models improve? While I have massive complaints about them, Gemini + structured outputs is working remarkably well for this internally and it's only getting better. It's also an order of magnitude cheaper than anything I've seen commercially.

maxmaio 52 days ago [-]

We're excited for foundational models to improve because we hope it will unlock a lot more use cases: things like analysis after extraction, accurately extracting extremely complex documents, etc!

arjvik 52 days ago [-]

Curious - have you compared Gemini against Anthropic and OpenAI’s offerings here? Am needing to do something similar for a one-off task and simply need to choose a model to use.

serjester 52 days ago [-]

Gemini is an awful developer experience but accuracy for OCR tasks is close to perfect. The pricing is also basically unbeatable - works out to 1k to 10k pages per dollar depending on the model. OpenAI has subtle hallucinations and I haven’t profiled Anthropic.

inapis 52 days ago [-]

If I may ask which model are you using? I have tried OCR'ing my bank statements in AI studio and the results have been less than optimal. Specifically it has a tendency to ignore certain instructions combined with screwing up the order.

Some pointers on what worked for you would be greatly appreciated.

arjvik 52 days ago [-]

Thanks!

zh2408 52 days ago [-]

Saw Reducto released a benchmark related to your product: https://reducto.ai/blog/rd-tablebench Curious about your take on the benchmark and how well Midship performs.

maxmaio 52 days ago [-]

The reducto guys are great! Their benchmark is not exactly how we would index our product because we extract into a user specified template vs. extracting into markdown (wysiwyg). That being said their eval aligns with our internal findings of commercial OCR offerings.

drcongo 52 days ago [-]

I may or may not be the target audience, but it may help you to know a "book demo" link instead of a pricing page in the primary nav is a good heuristic shortcut for me to decide I'm not the target audience.

maxmaio 51 days ago [-]

We are currently focused on business customers :/ but that may change in the coming months!

drcongo 51 days ago [-]

I should maybe have mentioned, I'm the founder, CTO and debit card holder of my business.

prithvi24 52 days ago [-]

What does pricing look like with HIPAA compliance?

maxmaio 52 days ago [-]

Send me an email here: max@midship.dev so we can learn more. We're in the process of a SOC 2 audit and should be HIPAA compliant by the end of the month!

seany62 52 days ago [-]

Are users able to export their organized data?

maxmaio 52 days ago [-]

Yes today we support exports to csv or excel from our web app!

hk1337 52 days ago [-]

This is interesting.

Can you do this with emails?

maxmaio 52 days ago [-]

We currently support pdf, docx, most image types (jpeg/jpg, png, heic), and excel.

saving the email as a pdf would work!

tlofreso 52 days ago [-]

And if yes, be specific in answering. Emails are a bear! Emails can have several file types as attachments, including other emails, zip files, and in-line images where position matters for context.

Brajeshwar 52 days ago [-]

A friend seems to be doing it for email - https://dwata.com They are early, but it is promising.

tlofreso 52 days ago [-]

Congrats on the launch... You're in a crowded space. What differentiates Midship? What are you doing that's novel?

kietay 52 days ago [-]

Cofounder here.

Great Q - there is definitely a lot of competition in dev tool offerings but less so in end to end experiences for non technical users.

Some of the things we offer above and beyond dev tools: 1. Schema building to define “what data to extract” 2. A hosted web app to review, audit and export extracted data 3. Integrations into downstream applications like spreadsheets

Outside of those user facing pieces, the biggest engineering effort for us has been in dealing with very complex inputs, like 100+ page PDFs. Just dumping into ChatGPT and asking nicely for the structured data falls over in both obvious (# input/output tokens exceeded) and subtle ways (e.g. missing a row in the middle of the extraction).

hubraumhugo 52 days ago [-]

Congrats on the launch! A quick search in the YC startup directory brought up 5-10 companies doing pretty much the same thing:

- https://www.ycombinator.com/companies/tableflow

- https://www.ycombinator.com/companies/reducto

- https://www.ycombinator.com/companies/mindee

- https://www.ycombinator.com/companies/omniai

- https://www.ycombinator.com/companies/trellis

At the same time, accurate document extraction is becoming a commodity with powerful VLMs. Are you planning to focus on a specific industry, or how do you plan to differentiate?

maxmaio 52 days ago [-]

Yes there is definitely a boom in document related startups. We see our niche as focusing on non technical users. We have focused on making it easy to build schemas, an audit and review experience, and integrating into downstream applications.

themanmaran 52 days ago [-]

Hey we're on that list! Congrats on the launch Max & team!

I could definitely point to minor differences between all the platforms, but you're right that everyone is tackling the same unstructured data problem.

In general, I think it will be a couple years before anyone really butts heads in the market. The problem space is just that big. I'm constantly blown away by how big the document problem is at these mid-sized businesses. And most of these companies don't have any engineers on staff. So no attempt has ever been made to fix it.

tlofreso 52 days ago [-]

"accurate document extraction is becoming a commodity with powerful VLMs"

Agree.

The capability is fairly trivial for orgs with decent technical talent. The tech / processes all look similar:

User uploads file --> Azure prebuilt-layout returns .MD --> prompt + .MD + schema sent to LLM --> JSON returned. Do whatever you want with it.
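
The flow above can be sketched in a few lines. This is a structural illustration only: the layout step and the LLM call are passed in as plain callables, and `fake_layout` and `fake_llm` are hypothetical stand-ins so the example runs without any external service. No real Azure or LLM client API is shown here, and the schema format (field name to type label) is an assumption.

```python
import json
from typing import Callable

def build_prompt(markdown: str, schema: dict) -> str:
    """Combine the instructions, the target schema, and the document text."""
    return (
        "Extract the fields described by this schema and reply with JSON only.\n"
        f"Schema: {json.dumps(schema, sort_keys=True)}\n"
        f"Document:\n{markdown}"
    )

def extract(file_bytes: bytes, schema: dict,
            to_markdown: Callable[[bytes], str],
            ask_llm: Callable[[str], str]) -> dict:
    """file -> Markdown -> prompt + schema -> LLM -> parsed JSON."""
    markdown = to_markdown(file_bytes)
    reply = ask_llm(build_prompt(markdown, schema))
    data = json.loads(reply)  # raises ValueError if the reply is not valid JSON
    missing = [name for name in schema if name not in data]
    if missing:
        raise ValueError(f"LLM reply is missing fields: {missing}")
    return data

# Hypothetical stand-ins for the layout service and the LLM.
def fake_layout(file_bytes: bytes) -> str:
    return "| Tenant | Rent |\n|---|---|\n| Acme | 1200 |"

def fake_llm(prompt: str) -> str:
    return '{"tenant": "Acme", "rent": 1200}'

schema = {"tenant": "string", "rent": "number"}
result = extract(b"%PDF-", schema, fake_layout, fake_llm)
print(result)
```

Keeping the two services behind callables makes it easy to swap providers, and the field check at the end catches replies that parse as JSON but silently drop part of the schema.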

kietay 52 days ago [-]

Totally agree that this is becoming the standard "reference architecture" for this kind of pipeline. The only thing that complicates this a lot today is complex inputs. For simple 1-2 page PDFs what you describe works quite well out of the box, but for 100+ page docs it starts to fall over in the ways I described in another comment.

tlofreso 52 days ago [-]

Are really large inputs solved at midship? If so, I'd consider that a differentiator (at least today). The demo's limited to 15pgs, and I don't see any marketing around long-context or complex inputs on the site.

I suspect this problem gets solved in the next iteration or two of commodity models. In the meantime, being smart about how the context gets divvied works ok.

I do like the UI you appear to have for citing information. Drawing the polygons around the data, and then where they appear in the PDF. Nice.
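
One way to "divvy" a long document, sketched under assumptions: split the pages into overlapping windows so a table row that straddles a page boundary appears whole in at least one window, extract each window separately, then merge the results and drop rows repeated by the overlap. The names `page_windows` and `merge_rows`, the window size, and the dedup key are all hypothetical choices, not how Midship or any commodity model handles long inputs.

```python
def page_windows(num_pages, size=10, overlap=1):
    """Yield (start, end) page ranges, end exclusive; consecutive windows share `overlap` pages."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    start = 0
    while start < num_pages:
        end = min(start + size, num_pages)
        yield start, end
        if end == num_pages:
            break
        start = end - overlap

def merge_rows(chunks, key):
    """Concatenate rows from each chunk in order, skipping rows whose key was already seen."""
    seen, merged = set(), []
    for rows in chunks:
        for row in rows:
            k = key(row)
            if k not in seen:
                seen.add(k)
                merged.append(row)
    return merged

windows = list(page_windows(25, size=10, overlap=1))
print(windows)

# Rows extracted from two overlapping windows; unit 102 sits on the shared page.
chunks = [
    [{"unit": "101"}, {"unit": "102"}],
    [{"unit": "102"}, {"unit": "103"}],
]
merged = merge_rows(chunks, key=lambda row: row["unit"])
print(merged)
```

Deduplicating by key only works when rows have a natural identifier. For data such as bank transactions, where two identical rows can be legitimate, the merge step would need page and position information instead.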

Kiro 52 days ago [-]

Why all those steps? Why not just file + prompt to JSON directly?

tlofreso 52 days ago [-]

Having the text (for now) is still pretty important for quality output. The vision models are quite good, but not a replacement for a quality OCR step. A combination of Text + Vision is compelling too.

erulabs 52 days ago [-]

Execution is everything. Not to drop a link in someone else’s HN launch but I’m building https://therapy-forms.com and these guys are way ahead of me on UI, polish, and probably overall quality. I do think there’s plenty of slightly different niches here, but even if there were not, execution is everything. Heck it’s likely I’ll wind up as a midship customer, my spare time to fiddle with OCR models is desperately limited and all I want to do is sell to clinics.

_hfqa 52 days ago [-]

Just a heads up: I tried to sign up, but the button doesn't seem to work.

erulabs 51 days ago [-]

See what I mean about execution?

hermitcrab 52 days ago [-]

Do you know if there are any good (pref C++) libraries for extracting data tables from PDFs?

mitchpatin 52 days ago [-]

TableFlow co-founder here - I don't want to distract from the Midship launch (congrats!) but did want to add my 2 cents.

We see a ton of industries/use-cases still bogged down by manual workflows that start with data extraction. These are often large companies throwing many people at the issue ($$). The vast majority of these companies lack technical teams required to leverage VLMs directly (or at least the desire to manage their own software). There’s a ton of room for tailored solutions here, and I don't think it's a winner-take-all space.

maxmaio 52 days ago [-]

+1 to what mitch said. We believe there is a large market for non-technical users who can now automate extraction tasks but do not know how to interact with apis. Midship is another option for them that requires 0 programming!
