AI agents fail tasks 70% of the time (arxiv.org)
rogerkirkness 179 days ago [-]
Agents went from 10% to 30% reliable this year, which is still a big deal.
bogzz 178 days ago [-]
lol
drannex 178 days ago [-]
Yes! But when they work, they only kinda work, sort of.
thebigspacefuck 178 days ago [-]
This is from Dec 2024, which feels like a while ago.
creatonez 174 days ago [-]
Hardly anything changed
bsallthewaydown 178 days ago [-]
AI is going to be the next bubble. It can't even figure out who the real author of a sculpture is. It's really all BS made up to play with markets and geopolitics. Enjoy it while it lasts.
JTbane 179 days ago [-]
"We test baseline agents powered by both closed API-based and open-weights language models (LMs), and find that the most competitive agent can complete 30% of tasks autonomously."
gavinray 179 days ago [-]
So you ask it to try every task 3.33 times for guaranteed success?
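For what it's worth, the arithmetic behind the 3.33 figure can be sketched roughly like this. It assumes (this is not something the quoted abstract claims) that every retry is independent and succeeds with probability 0.30; the function names are made up for illustration. Under that assumption 1/0.30 ≈ 3.33 is only the average number of tries until the first success, and no finite number of retries gets to a guarantee.

```python
# Sketch only: assumes each attempt is an independent trial with the same
# success probability, which real agent retries may well not satisfy.

def success_within(n_attempts: int, p_success: float = 0.30) -> float:
    """Probability that at least one of n independent attempts succeeds."""
    return 1.0 - (1.0 - p_success) ** n_attempts


def expected_attempts(p_success: float = 0.30) -> float:
    """Mean number of attempts until the first success (geometric distribution)."""
    return 1.0 / p_success


if __name__ == "__main__":
    print(f"expected attempts: {expected_attempts():.2f}")
    for n in (1, 3, 4, 10):
        print(f"{n:>2} attempts -> {success_within(n):.3f}")
```

With these assumptions, three tries succeed about 66% of the time, four about 76%, and ten about 97%, so the chance of failure shrinks but never reaches zero.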