Show HN: Autotab – Programmable AI browser for turning web tasks into APIs

159 points by jonasnelle 444 days ago | 82 comments

slfnflctd 443 days ago [-]

If I understand this correctly, it looks like the promise I saw in that 'Record Macro' button in my Excel toolbar in the 1990s might finally be coming to fruition in a wider and more capable sense! A pleasant surprise effect of the new AI situation if true.

I noticed in another comment that you said some steps can be made 'optional' (e.g. clicking through a modal). In my ancient Excel macro adventure, what I learned was that I had to tweak the heck out of the VBA code that Record button generated, which led to me just straight writing VBA for everything and eventually abandoning the Record feature entirely. I had a similar experience later on with AutoHotKey. What are the analogous aspects of Autotab to this? Also, to what extent is hand-manipulating the underlying automation possible and/or necessary to get optimal results?

jonasnelle 443 days ago [-]

Indeed! A little secret: Internally we call the skills/workflows in Autotab macros :)

Currently there is a bit of a learning curve for training Autotab to be really reliable in hard cases. We expect we’ll be able to decrease it significantly in the next few months, as we get models to do more of the thinking about how to best codify a given task solution/workflow. As an intuition pump for why we expect such rapid progress: in the scenario you described you’d just have a model write the VBA code for you.

pugio 444 days ago [-]

I love the idea - owning the browser definitely seems like the right approach.

I tried it out on a workflow I've been manually piecing together and it gave me a bunch of "Error encountered, contact support" messages when doing things like clicking on a form input field, or even a button.

The more complex "Instruction" block worked correctly instead (literally things like "click the 'Sign In' button"), but then I ran out of the 5 minutes of free run time when trying to go through the full flow. I expect this kind of thing will be fixed soon, as it grows.

In terms of ultimate utility, what I really want is something which can export scripts that run entirely locally, but falling back to the more dynamic AI enhanced version when an error is encountered. I would want AutoTab to generate the workflow which I could then run on my own hardware in bulk.

Anyway, great work! This is definitely the best implementation I've seen of that glimpsed future of capable AI web browsing agents.

alexirobbins 444 days ago [-]

sorry you encountered that issue! what website was the form on? we'll see if we can catch the error!

curious what you mean by generating the workflow that you run on your own hardware? Is this different than running Autotab locally?

pugio 444 days ago [-]

Hah, looks like you guys found my account error via my profile email, nice! Thanks for fixing that bug. I'll try again tomorrow when the fix is pushed.

My other request is probably not in line with your business model. I get the sense that Autotab is always communicating with some server on your end, probably for the various bits of AI functionality. What I was asking for is the ability to export the actions/workflow as, say, a python script (like a Selenium script, or even better, a script which drives your browser) which performs the actions in the Autotab workflow.

I need AI understanding when creating the workflow, or healing in case of an error, but I don't always need it when just executing a prepared script. In those (non AI needed) cases, I don't really want to use up my runtime minutes just because I'm executing a previously generated workflow.
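The request above (run a prepared script locally, and only reach for AI when a step breaks) could be sketched roughly as follows. This is not Autotab's export format or API; `run_workflow`, `run_local`, `heal` and the toy `PAGE` are hypothetical names made up for illustration, and the "healer" is a stub standing in for a metered model call.

```python
from typing import Callable, Dict, List

Step = Dict[str, str]  # hypothetical step shape, e.g. {"action": "click", "selector": "#sign-in"}


def run_workflow(
    steps: List[Step],
    run_local: Callable[[Step], None],
    heal: Callable[[Step], Step],
) -> int:
    """Run every step locally; call the (metered) healer only when a step fails.

    Returns how many steps needed healing.
    """
    healed = 0
    for i, step in enumerate(steps):
        try:
            run_local(step)
        except LookupError:
            steps[i] = heal(step)   # assumption: the healer returns a repaired step
            run_local(steps[i])     # retry once with the repaired step
            healed += 1
    return healed


# Toy stand-ins so the sketch runs without a browser or a model.
PAGE = {"#email", "#sign-in"}       # selectors that exist on the fake page
clicked: List[str] = []


def run_local(step: Step) -> None:
    if step["selector"] not in PAGE:
        raise LookupError(step["selector"])
    clicked.append(step["selector"])


def heal(step: Step) -> Step:
    # A real healer would ask a model; here the fix is hard-coded.
    return {**step, "selector": "#sign-in"}


steps = [
    {"action": "click", "selector": "#email"},
    {"action": "click", "selector": "#signin"},  # stale selector
]
healed_count = run_workflow(steps, run_local, heal)
```

In this shape only the one broken step would consume AI runtime; the repaired step is written back into `steps`, so later runs stay local.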

rava-dosa 443 days ago [-]

Really exciting to see this approach to automation and intent specification! We’ve been working with similar challenges at Origins AI, where we focus on deep tech solutions.

I can’t overstate how much having a robust system for breaking down tasks and iterating on them has helped us.

For one of our recent projects, we had to integrate complex workflows with third-party systems, and it was clear that reliability came down to how well we could define and refine intent over time.

I’m especially curious about your self-healing automations. That’s an area where we’ve found a lot of value using models that can adapt to subtle UI changes, but it’s always a tradeoff with latency. Would love to hear more about how you balance that in production!

Looking forward to trying Autotab and seeing how it compares with some of the internal tools we’ve built!

jonasnelle 443 days ago [-]

Agree on the tradeoff between ability to handle novel situations and speed/cost. Autotab uses a “ladder of compute” system that escalates to the minimal level of compute required to solve a given subtask. I wrote a longer comment about this on another thread

MattDaEskimo 444 days ago [-]

Very neat in theory but I'm failing to find any technical details.

Which layer is the automation happening? Inside using Dev tools? Multiple?

What is the self-healing mechanic? I'm guessing invoking an LLM to find what happened and fix it?

I guess what I'm wondering is: is this some sort of hybrid between computer use and Dev tools usage?

jonasnelle 444 days ago [-]

Autotab is definitely a hybrid approach, because when it comes to deciding where on the page to take an action, Autotab has to be fast & cheap (humans are both of those) while also being robust to changes. The solution we use is a "ladder of compute" where Autotab uses everything from really fast heuristics and local models up to the biggest frontier models, depending on how difficult the task is.

For instance, if Autotab is trying to click the "submit" button on a sparse page that looks like previous versions of that page, that click might take a few hundred milliseconds. But if the page is very noisy, and Autotab has to scroll, and the button says "next" on it because the flow has an additional step added to it, Autotab will probably escalate to a bigger model to help it find the right answer with enough certainty to proceed.

There is a certain cutoff in that hierarchy of compute that we decided to call "self-healing" because latency is high enough that we wanted to let users know it might take a bit longer for Autotab to proceed to the next step.
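A minimal sketch of the "ladder of compute" idea as described in this comment, assuming each tier returns a guess with a confidence score. The tier names, the `Result` shape, the 0.9 threshold and the toy locators are all assumptions for illustration, not Autotab's actual internals.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass
class Result:
    target: str        # what the tier thinks should be clicked
    confidence: float  # 0.0 to 1.0


@dataclass
class Tier:
    name: str
    locate: Callable[[str], Optional[Result]]  # hypothetical locator
    slow: bool = False  # tiers past the "self-healing" cutoff


def run_ladder(
    page: str, tiers: Sequence[Tier], threshold: float = 0.9
) -> Tuple[str, Result]:
    """Try tiers from cheapest to most expensive; stop at the first confident answer."""
    for tier in tiers:
        if tier.slow:
            # Past the cutoff, tell the user this step may take longer.
            print(f"self-healing with {tier.name}, this may take longer")
        result = tier.locate(page)
        if result is not None and result.confidence >= threshold:
            return tier.name, result
    raise LookupError("no tier was confident enough to proceed")


# Toy locators: real ones would be heuristics, local models and frontier models.
def heuristic(page: str) -> Optional[Result]:
    return Result("submit", 0.99) if "submit" in page else None


def local_model(page: str) -> Optional[Result]:
    return Result("next", 0.6) if "next" in page else None


def frontier_model(page: str) -> Optional[Result]:
    return Result("next", 0.95) if "next" in page else None


TIERS = [
    Tier("heuristic", heuristic),
    Tier("local-model", local_model),
    Tier("frontier-model", frontier_model, slow=True),
]

easy = run_ladder("sparse page with a submit button", TIERS)
hard = run_ladder("noisy page where the button now says next", TIERS)
```

With these toy tiers the sparse page resolves at the heuristic tier, while the changed page falls through to the slow tier, which is where the "self-healing" notice would appear.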

thelastparadise 443 days ago [-]

So no computer use (pixel-level understanding).

That's disappointing as the devtools approach always has limitations.

Kura agents, Runner H, and scrapybara will all end up more reliable than you.

jonasnelle 443 days ago [-]

If by pixel level you mean vision-first understanding and control of the UI then you’ve misunderstood my comment - Autotab primarily uses vision to reason about screens and take action.

You can also use Anthropic’s Computer Use model directly in Autotab via the instruct feature - our users find it most helpful for handling specific subtasks that are complex to spell out, like picking a date in a calendar.

Carrok 444 days ago [-]

You say "try it for free" but your website has no pricing information at all. Is this free for just a while? Free forever? What is your monetization strategy?

Can I point it at my own LLM or am I locked into using OpenAI?

alexirobbins 444 days ago [-]

We have unlimited free editing, so you can fully try everything out and know your skill will work before we ask you to subscribe. You also get 5m of free runtime. Subscriptions start at $39/month with 300 minutes of runtime included.

Right now we do not let you BYO llm, but it's something we would love to provide an option for where possible!

Carrok 444 days ago [-]

5 minutes seems like barely enough time to complete any given task, let alone actually try it out. $40/mo for a capped plan seems steep, but maybe I'm not your target customer. Best of luck!

alexirobbins 444 days ago [-]

The free edit mode has all of the features of run mode, and lets you fully test the skill. The only difference is that inside of a loop it will ask you to click to continue.

A lot of AI tools promise the world and don't deliver. We explicitly don't want anyone to pay us until they're sure Autotab can do their task, even though the model costs during editing are actually much higher than during runtime.

jonasnelle 444 days ago [-]

Good point, will add pricing information to our website ASAP, had skipped that one in the push to launch (it is only available in the app at the moment)

adamkhakhar 443 days ago [-]

This is awesome! What is your most common use case? Have you thought of competing with https://scribehow.com/ in the documentation space?

jonasnelle 443 days ago [-]

Thanks! Our most common use cases are repetitive tasks people have at work, think updating Hubspot with analytics data from an internal tool or reconciling payments between an invoicing system, a payment system and a CRM.

Haven’t done a lot with Scribe-like documentation cases. Given the pace at which this technology is developing we’re focused on making Autotab really good at the most economically valuable tasks.

_1tem 443 days ago [-]

How on earth does this help with reconciling payments? Can Autotab also recognize "this transaction belongs to this invoice" or does it just copy and paste all transaction and invoice data into a spreadsheet for manual reconciliation?

jonasnelle 443 days ago [-]

Yes, Autotab can reason over the state of applications and the data it is seeing. You can also teach it to do certain steps only in specific cases.

If you wanted Autotab to reconcile payments you would teach it to go to wherever the payments are listed eg a banking app. There you would have it iterate through the unreconciled payments. For each payment you’d have Autotab go to the invoicing tool and look up any details from the payment (eg IBAN, information from the reference number, amount, etc) to find the matching customer and invoice. This is where most of the reasoning happens - you can teach Autotab what counts as sufficiently close to be a match with prompts and examples. Then you can have Autotab mark the invoice as paid and go back to the payment app and mark the payment with the invoice number it grabbed from the matched payment.
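The loop described above can be sketched in plain Python. In Autotab the "is this close enough to be a match" decision is taught with prompts and examples; here it is replaced by a deterministic stand-in (invoice number appears in the payment reference and the amounts agree within a tolerance). The `Payment` and `Invoice` shapes and the tolerance are assumptions for illustration.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, List, Optional


@dataclass
class Payment:
    reference: str
    amount: Decimal


@dataclass
class Invoice:
    number: str
    amount: Decimal
    paid: bool = False


def find_match(
    payment: Payment, invoices: List[Invoice], tolerance: Decimal = Decimal("0.01")
) -> Optional[Invoice]:
    """Return the first unpaid invoice whose number is in the reference and whose amount agrees."""
    for invoice in invoices:
        if invoice.paid:
            continue
        same_amount = abs(invoice.amount - payment.amount) <= tolerance
        mentioned = invoice.number.lower() in payment.reference.lower()
        if same_amount and mentioned:
            return invoice
    return None


def reconcile(payments: List[Payment], invoices: List[Invoice]) -> Dict[str, str]:
    """Mark matched invoices as paid and return payment reference -> invoice number."""
    matched: Dict[str, str] = {}
    for payment in payments:
        invoice = find_match(payment, invoices)
        if invoice is not None:
            invoice.paid = True
            matched[payment.reference] = invoice.number
    return matched


payments = [
    Payment("Payment for INV-1001", Decimal("250.00")),
    Payment("misc transfer", Decimal("99.00")),
]
invoices = [
    Invoice("INV-1001", Decimal("250.00")),
    Invoice("INV-1002", Decimal("99.00")),
]
matches = reconcile(payments, invoices)
```

The second payment is left unmatched because nothing in its reference ties it to an invoice, which is the kind of case that would be left for review or for a fuzzier matching rule.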

alex_c 443 days ago [-]

The functionality looks very very cool. But the privacy policy raises an eyebrow - am I overreacting?

Usage Information. To help us understand how you use our Services and to help us improve them, we automatically receive information about your interactions with our Services, like the pages or other content you view, the searches you conduct, and the dates and times of your visits.

Desktop Activity on our Services. In order to provide the Services, we need to collect recordings of your desktop activity while using our Services, which may include audio and video screen recordings, your cookies, photos, local storage, search history, advertising interactions, and keystrokes.

Information from Cookies and Other Tracking Technologies. We and our third-party partners collect information using cookies, pixel tags, SDKs, or other tracking technologies. Our third-party partners, such as analytics partners, may use these technologies to collect information about your online activities over time and across different services.

[...]

How We Disclose the Information We Collect

Affiliates. We may disclose any information we receive to any current or future affiliates for any of the purposes described in this Privacy Policy.

Vendors and Service Providers. We may disclose any information we receive to vendors and service providers retained in connection with the provision of our Services.

alexirobbins 443 days ago [-]

We work with fortune 500 companies and have HIPAA compliant offerings, so we are very sensitive to privacy and security concerns. Fundamentally the models need to operate on whatever browser tasks users ask Autotab to perform, and we need to use frontier vision models like 4o and Claude to reliably perform them (model providers are the affiliates in question). If you have specific concerns happy to answer them.

alienallys 442 days ago [-]

Your response doesn't seem to address the Privacy concerns raised. Why is the policy so broad and invasive? There's no mention of how you handle PII data collected as telemetry.

handfuloflight 444 days ago [-]

I see it's able to perform data extraction, but what if you wanted to enter in data from another system, or generated by an LLM during the workflow?

jonasnelle 444 days ago [-]

Data from external systems can be provided to Autotab in the form of CSV files or string inputs, which can be passed to the API to parametrize skills. However, in most cases, ingesting data into Autotab is easiest by just having Autotab navigate to the website where the data is present.

Autotab has a structured type system underlying the workflows, so any data processed in the course of an automation can be referenced in later steps. It's a bit like a fuzzy programming language for automation, and the model generates schemas to ensure data flows reliably through the series of steps.

For example, users often start by collecting information in one system (using an extract step as you mentioned), then cross reference it in another and then submit some data by having Autotab type it into a third system. In Autotab, you can just type @ to reference a variable, each step has access to data from previous steps.

At the end, you can get a dump of all of Autotab's data from a run as a JSON file, or turn specific arrays of data into CSV files using a table step.
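To make the data flow above concrete, here is a rough sketch of steps sharing a context, `@name` references being resolved from earlier steps, and the final JSON and CSV dumps. The `resolve` and `rows_to_csv` helpers and the context layout are assumptions for illustration; they are not Autotab's type system or file formats.

```python
import csv
import io
import json
import re
from typing import Dict, List


def resolve(template: str, context: Dict[str, object]) -> str:
    """Replace each @name with the value stored under that name by an earlier step.

    Raises KeyError if a referenced variable does not exist.
    """
    return re.sub(r"@(\w+)", lambda m: str(context[m.group(1)]), template)


def rows_to_csv(rows: List[Dict[str, object]]) -> str:
    """Turn a non-empty list of uniform dicts into CSV text, like a table step might."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# An "extract" step stores values; a later "type" step references them with @.
context: Dict[str, object] = {"customer": "Acme", "total": 120}
context["note"] = resolve("Invoice for @customer: @total EUR", context)

run_dump = json.dumps(context)  # everything from the run as JSON
table = rows_to_csv([{"customer": "Acme", "total": 120}])
```

A reference to a variable that no earlier step produced fails loudly with `KeyError`, which is the simplest way to catch a broken data flow in a sketch like this.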

grugagag 444 days ago [-]

I don’t know what your intention is but I imagine that’s how more and more are going to push LLM slop on all corners of the internet. It’ll be easy to do in massive quantities.

thedays 443 days ago [-]

Is Autotab able to scrape data from multiple websites with different structures and combine this data into structured data in one CSV or JSON file? Example: scrape interest rates offered on savings accounts from multiple bank websites and extract the name of the bank, bank logo, product name and interest rate for each account and run this saved query on a regular schedule (daily, weekly etc)?

jonasnelle 443 days ago [-]

Assuming the banks’ websites look totally different from one another, you’d need open-ended exploration for data extraction. We’ve focused more on reliability for repetitive tasks over flexibility for open-ended tasks historically, but models are getting good enough that this tradeoff is diminishing. Expect updates from us on this front soon.

You can schedule skills in Autotab to run at arbitrary frequency.

smashah 444 days ago [-]

If this was an OSS project automating a specific service many HN-ers would come and bleat about TOS violations & being scared/wary of C&Ds.

How does this not violate TOS? Do you have legal protection set up from megacorps trying to bully you with legal threats?

Automation despite TOS via Adversarial Interop should be a Digital Human Right. Godspeed.

jonasnelle 444 days ago [-]

This has been much less of an issue than I would have expected - Autotab is optimized for reasoning heavy tasks in core systems that require high reliability over being really fast at doing giant scrapes. More automating leads in Salesforce, tickets in Jira and data in Airtable than hawking tickets.

smashah 444 days ago [-]

Just want to reiterate I fully support what you're doing and I despise the megacorps that send out legal threats to small companies/OSS devs but according to their overbroad TOS they do not make distinctions between the types of automations and reasoning behind them - technically, they would argue, both you and your users are violating TOS. I'm sure you have already, but make sure the legal help at YC give you the ammo you need to protect yourself and your customers when some of them randomly start getting banned.

As more and more AI Agent enabled tooling comes out, this will become a bigger issue (the fact that people are automating these services against the TOS) so it's good if everyone who can get legal help has and shares the tactics to fight back against any civil TOS-based legal threats so we are all protected.

diegolazcano 444 days ago [-]

This is awesome. I was just trying to get a rudimentary version of this for some "user" interaction heavy data extraction. Definitely giving it a try.

For a case with lots of requests how does Autotab handle ip-blocking? Does each run use a different portal instance?

jonasnelle 444 days ago [-]

When you run Autotab in the app it runs locally, so no IP blocking issues there. If you want to run it in the cloud eg via API, by default your IP will be from the data center but we have residential proxies that we can enable on a case by case basis.

diegolazcano 444 days ago [-]

Just tried it - very cool indeed. I did a page loop extraction but it seems to be the same speed when I run it. The elements I am doing the loop on look pretty much the same, just different images. I think it would be great if it was able to generalize how to find an element, with CSS selectors for example, to speed up once it's sure that is the data you are looking to extract for a given loop.

jonasnelle 443 days ago [-]

Totally agree, making page loop faster is on the top of our list of things to do! There are cases where you need page loop to do quite a bit of reasoning so it will be this slow until models get faster, but we can make it a lot faster today on happy paths - stay tuned :)

throwup238 444 days ago [-]

> we have residential proxies that we can enable on a case by case basis.

Who is your vendor for residential proxies? That’s quite a sketchy industry.

jonasnelle 443 days ago [-]

We use a range of different providers, it really depends on the customer and use case. We only enable the proxy in rare cases that need it for a specific reason.

nagisa12321 443 days ago [-]

Have you considered how to handle mobile verification codes, graphic verification codes, and "proving you are not a robot" verification methods?

jonasnelle 443 days ago [-]

Quoting my cofounder from another thread:

For 2FA, different users take different approaches. Everything from teaching Autotab to pull auth codes from their email, to setting intervention requests at the top of their skills, to enterprise integrations that we support with SSO and dedicated machine accounts.

Autotab also has the ability to securely sync session data from your local app to cloud instances. This usually removes the need for doing 2FA again for sites with “remember this device” functionality.

We can enable captcha solving for select customers, but don’t allow that in the public app to prevent abuse.

pacifi30 444 days ago [-]

Pretty slick. I recorded a session for ordering from a restaurant website, and it did repeat the entire workflow. It had some issues with a modal that popped up but all in all well done! We have been trying to robotify the task of ordering from restaurants for our clients and it seems like your solution can work well for us. I am guessing that you want your users to use the Autotab browser, so what is the use for the API?

jonasnelle 444 days ago [-]

Thanks! We think of the browser as an authoring tool where you create, test and refine skills.

After you've done that, the API is great for cases where you want to incorporate Autotab into a larger data flow or product.

For instance, say Company A has taught Autotab to migrate their customers' data - so their customers just see a sync button in the Company A product, which kicks off an Autotab run via API. Same for restaurant booking, if you'd want that to happen programmatically.

pacifi30 444 days ago [-]

Understood! How does it work if we have several different restaurants to order from? Do I need to record each ordering session and create skills for each restaurant, or can it infer on its own given the task to order from a restaurant? Secondly, any docs or samples to see how to integrate this with your API?

jonasnelle 444 days ago [-]

Depends on how different the flows are for different restaurants. If they're just different names but use the same booking system you'd typically use an input and have Autotab find the correct restaurant first. If they're totally different booking systems you can try the instruct (open ended agentic) step but my guess is that will be too slow and unreliable for now, so you'd probably want to record different skills for each.

Docs are here with sample code: https://docs.autotab.com/api-reference

handfuloflight 444 days ago [-]

Is the API also charged based on runtime? And I'm assuming that workflow happens in the cloud? What if it's behind a login? What if that login requires 2FA?

alexirobbins 444 days ago [-]

Yep exactly. Authentication is primarily handled with session data, so passwords never leave your device, but we also support setting secrets.

Here is more info on auth and security: https://docs.autotab.com/manual/security

jonasnelle 444 days ago [-]

Also for the modal popup - this is the kind of issue that goes away in run mode because Autotab will escalate to bigger models to self-heal.

If the modal pops up frequently you can also record a click to dismiss it and make that click optional so Autotab knows to move on if the modal does not pop up sometimes.
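The "optional click" behaviour described here can be illustrated with a small runner: a step marked optional is skipped when its target is missing, while a required step still fails. The `Step` class and the `run` and `perform` functions are hypothetical, written only to show the control flow, and are not Autotab's recorder or runtime.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    optional: bool = False


def run(steps: List[Step], perform: Callable[[Step], None]) -> List[str]:
    """Perform steps in order; skip optional steps whose target is missing."""
    completed: List[str] = []
    for step in steps:
        try:
            perform(step)
            completed.append(step.name)
        except LookupError:
            if not step.optional:
                raise  # a required step that cannot run is a real failure
    return completed


# Toy page state: on this run the modal did not appear.
modal_visible = False


def perform(step: Step) -> None:
    if step.name == "dismiss modal" and not modal_visible:
        raise LookupError("modal not found")


steps = [
    Step("open menu"),
    Step("dismiss modal", optional=True),
    Step("place order"),
]
completed = run(steps, perform)
```

Because only the dismiss step is optional, a missing menu or order button would still stop the run instead of being silently skipped.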

treetalker 443 days ago [-]

> As it runs, Autotab asks for clarifications and feedback. These learnings are accumulated into action memory—improving Autotab's world model, and allowing it to work reliably for hours on end.

Is "learning", used as a noun, a term of art in this field?

If not, my reactioning to that using is that it is a being bad English that causes producings of gratings on the ears.

globalise83 443 days ago [-]

It's widely used.

Source: https://scholar.google.com/scholar?start=0&q=%22learnings+fr...

beacon294 443 days ago [-]

It's honestly common industry slang and may be British English.

earthlingdavey 443 days ago [-]

It's not really that common in British English. I've heard it from colleagues who learnt English in India.

amarsharma 443 days ago [-]

Been working in this space for almost 9 years and have written a lot of scrapers and web automations for various clients. I am really excited to build something like this too. Are you guys hiring? Would love to chat.

angoragoats 442 days ago [-]

Warning: they want you to be in the office 6 days a week.

https://news.ycombinator.com/item?id=40225546

jonasnelle 443 days ago [-]

We are hiring. Feel free to reach out at contact@autotab.com

amarsharma 443 days ago [-]

Sweet, I have emailed you. Subject: "Amar from Hacker News"

hmontazeri 438 days ago [-]

I don't read docs. Didn't get it to work the way I wanted... It needs simplification.

wruza 443 days ago [-]

Honestly, the video feels like just any low/nocode tutorial video in a sense “that we’re going to automate something” and a minute later we are copying urls into some complex forms and following the voiceover of something you cannot grasp the meaning of. A little intro of what exactly we are doing would help.

I cold-watched only half of it, without reading any info on the project, but that’s how everyone does it, I guess.

But I get the idea. Automate by example with automatic scenario builder and fuzzy matching ui via ai.

As someone who works in automation, I (again, blindly) suggest looking into anti-detection and human behavior like mouse movements, typing errors and pauses, because that’s what your (and all ours) main enemy will be in the next decade.

All in all, this is in high demand, afaiu. I tend to use a classic ML approach for that (avoiding browser automation cause it obviously only works in a browser and limits/divides the area of application), but would love to try something that self-heals on site changes. Although I think I’d better use something that can detect changes and reconfigure my ML params rather than using it directly, cause I don’t really trust modern AI to free-float in runtime, and also costs.

surrTurr 443 days ago [-]

MacBook Pro m3 max; latest macos version:

Autotab has exited due to multiple fatal errors. Please contact support for assistance: contact@autotab.com.

jonasnelle 443 days ago [-]

Sorry about that! I don't see any matching errors from 2 hours ago in our logs - if you reach out to the contact@ email address with the email you used in Autotab, I'd be happy to take a closer look

linuxrebe1 443 days ago [-]

One thing I would recommend. Install instructions for Linux/Windows/Mac. Not finding them in the documentation.

alexirobbins 443 days ago [-]

Thanks for the note, we will try to make the install instructions clearer. The desktop app is available via a download button on the homepage: https://autotab.com

replwoacause 444 days ago [-]

Looks nice. Anybody else in this space? This one is on the pricier end but I’m just a single user so maybe not the target customer

Onavo 444 days ago [-]

If we are being honest, most of these browser screen scraping startups will be commoditized the moment OpenAI/Anthropic releases their next model. From my experience, having an in-house smaller model working in tandem with the bigger LLMs doesn't always produce a better result because in-context learning is just too powerful. The moment OpenAI releases a new model with a better prior, you will see a lot of these companies quietly swapping out their in-house "edge"/specialized fine-tuned models. It's like those PDF data extraction companies that have been launching like crazy: 90% will be pivoting if they don't get enough B2B customers locked in. LLMs are unfortunately winner-take-all, with the actual model providers cutting out all the middlemen.

hailpixel 444 days ago [-]

AskUI could be a solution. It's also not just in browser, but the whole desktop: https://github.com/askui/vision-agent

replwoacause 444 days ago [-]

Thanks! Looks promising!

abrichr 443 days ago [-]

https://openadapt.ai is open source (MIT license).

replwoacause 443 days ago [-]

Will give it a try thanks!

alexirobbins 444 days ago [-]

Curious, what would you be interested in using Autotab for?

replwoacause 444 days ago [-]

Automating the creation of test orders in our Ecom and ERP tools is one possible use case I can think of, though I’m sure I’d find others in my day to day (possibly around some of the rote tasks I have in Confluence or DevOps)

alexirobbins 444 days ago [-]

That sounds like a really good use case! we're constrained by model costs but are interested in offering a lower cost plan – if you email me I'll see what we can do alexi@autotab.com

N4der 443 days ago [-]

Super cool. Congrats & well done. Can I install a Chrome extension within this browser and automate some actions on it?

jonasnelle 443 days ago [-]

Thanks! We currently have to manually add Chrome extensions on our side, but plan on supporting users installing arbitrary extensions in the future. So far we’ve found that most apps offer web UIs with the same functionality as the extension and Autotab can just use those.

What extension would you like to automate?

rno321 438 days ago [-]

Can you use this to auto apply online forms?

eddjlsh 443 days ago [-]

I tried it out on a website I am testing at work but sadly it failed to complete a form :(

alexirobbins 443 days ago [-]

what was the website? happy to help figure out your issue, you can also start a chat with us in the app (top left)

artificialLimbs 443 days ago [-]

'Google SSO'

Urgh. I was excited about this. Anxiously awaiting email/other SSO (we use MS).

jonasnelle 443 days ago [-]

Coming soon! Thanks for commenting, helps inform prioritization :)

kQsWEeE 443 days ago [-]

Hi, do you offer proxies?

jonasnelle 443 days ago [-]

Yes, proxies are something we can enable for select customers. If your use case requires them, feel free to reach out at contact@autotab.com

sciencesama 442 days ago [-]

Is it possible to get a personal license for testing ??

alexirobbins 442 days ago [-]

Yep! If you go to settings in the app you can pick your personal plan.

_1tem 443 days ago [-]

Where are the API docs / client libraries?

jonasnelle 443 days ago [-]

https://docs.autotab.com/api-reference/quickstart
