Next.js App Router + React Server Components Demo

new
past
show
ask
show
jobs
submit

▲AI agents break rules under everyday pressure (spectrum.ieee.org)

279 points by pseudolus 85 days ago | 169 comments

hxtk 79 days ago [-]

Blameless postmortem culture recognizes human error as an inevitability and asks those with influence to design systems that maintain safety in the face of human error. In the software engineering world, this typically means automation, because while automation can and usually does have faults, it doesn't suffer from human error.

Now we've invented automation that commits human-like error at scale.

I wouldn't call myself anti-AI, but it does seem fairly obvious to me that directly automating things with AI will probably always have substantial risk and you have much more assurance, if you involve AI in the process, using it to develop a traditional automation. As a low-stakes personal example, instead of using AI to generate boilerplate code, I'll often try to use AI to generate a traditional code generator to convert whatever DSL specification into the chosen development language source code, rather than asking AI to generate the development language source code directly from the DSL.

protocolture 79 days ago [-]

Yeah I see things like "AI Firewalls" as both, firstly ridiculously named, but also, the idea you can slap an applicance (thats sometimes its own LLM) onto another LLM and pray that this will prevent errors to be lunacy.

For tasks that arent customer facing, LLMs rock. Human in the loop. Perfectly fine. But whenever I see AI interacting with someones customer directly I just get sort of anxious.

Big one I saw was a tool that ingested a humans report on a safety incident, adjusted them with an LLM, and then posted the result to an OHS incident log. 99% of the time its going to be fine, then someones going to die and the the log will have a recipe for spicy noodles in it, and someones going to jail.

jonplackett 79 days ago [-]

The air Canada chatbot that mistakenly told someone they can cancel and be refunded for a flight due to a bereavement is a good example of this. It went to court and they had to honour the chatbot’s response.

It’s quite funny that a chatbot has more humanity than its corporate human masters.

kebman 78 days ago [-]

Not AI, but similar sounding incident in Norway. Some traders found a way to exploit another company's trading bot at the Oslo Stock Exchange. The case went to court. And the court's ruling? "Make a better trading bot."

Marazan 78 days ago [-]

I am so glad to read this. Last I had read on the case was that the traders were (outrageously) convicted of market manipulation: https://www.cnbc.com/2010/10/14/norwegians-convicted-for-out...

But you are right, they appealed and had their appeal upheld by the Supreme Courts: https://www.finextra.com/newsarticle/23677/norwegian-court-a...

I am so glad at the result.

RobotToaster 79 days ago [-]

Chatbots have no fear of being fired, most humans would do the same in a similar position.

roughly 78 days ago [-]

More to the point, most humans loudly declare they would do the right thing, so all the chatbot’s training data is on people doing the right thing. There’s comparatively fewer loud public pronunciations of personal cowardice, so if the bot’s going to write a realistic completion, it’s more likely to conjure an author acting heroically.

SoftTalker 78 days ago [-]

Do they not? If a chatbot isn't doing what its owners want, won't they just shut it down? Or switch to a competitor's chatbot?

actionfromafar 78 days ago [-]

"... adding fear into system prompt"

shinycode 79 days ago [-]

What a nice side effect, unfortunately they’ll lock chatbots with more barriers in the future but that’s ironic.

danaris 78 days ago [-]

...And under pressure, those barriers will fail, too.

It is not possible, at least with any of the current generations of LLMs, to construct a chatbot that will always follow your corporate policies.

Loughla 78 days ago [-]

That's what people aren't understanding, it seems.

You are providing people with an endlessly patient, endlessly novel, endlessly naive employee to attempt your social engineering attacks on. Over and over and over. Hell, it will even provide you with reasons for its inability to answer your question, allowing you to fine-tune your attacks faster and easier than with a person.

Until true AI exists, there are no actual hard-stops, just guardrails that you can step over if you try hard enough.

We recently cancelled a contract with a company because they implemented student facing AI features that could call data from our student information and learning management systems. I was able to get it to give me answers to a test for a class I wasn't enrolled in and PII for other students, even though the company assured us that, due to their built-in guardrails, it could only provide general information for courses that the students are actively enrolled in (due dates, time limits, those sorts of things). Had we allowed that to go live (as many institutions have), it was just a matter of time before a savvy student figured that out.

We killed the connection with that company the week before finals, because the shit-show of fixing broken features was less of a headache than unleashing hell on our campus in the form of a very friendly chatbot.

PunchyHamster 78 days ago [-]

With chat ai + guardrail AI it probably will get to the point of it being sure enough that the amount of mistakes won't hit the bottom line.

...and we will find a way to turn it into malicious compliance where rules are not broken but stuff corporation wanted to happen doesn't.

butlike 78 days ago [-]

Efficiency, not money, seems to be the currency of chatbots

delichon 78 days ago [-]

That policy would be fraudulently exploited immediately. So is it more humane or more gullible?

I suppose it would hallucinate a different policy if it includes in the context window the interests of shareholders, employees and other stakeholders, as well as the customer. But it would likely be a more accurate hallucination.

ben_w 78 days ago [-]

> 99% of the time its going to be fine, then someones going to die and the the log will have a recipe for spicy noodles in it, and someones going to jail.

I agree, and also I am now remembering Terry Pratchett's (much lower stakes) reason for getting angry with his German publisher: https://gmkeros.wordpress.com/2011/09/02/terry-pratchett-and...

Which is also the kind of product placement that comes up at least once in every thread about how LLMs might do advertising.

antonvs 78 days ago [-]

> … LLMs might do advertising.

It’s no longer “might”. There was very recently a leak that OpenAI is actively working on this.

ben_w 78 days ago [-]

It's "how LLMs might do" it right up until we see what they actually do.

There's lots of other ways they might do it besides this way.

herbst 78 days ago [-]

Even if they don't offer it. People will learn how to poison AI corupus just like they did with search results.

We ain't safe from aggressive ai ads either way

antonvs 78 days ago [-]

You seem to be indulging in wishful thinking.

PunchyHamster 78 days ago [-]

"I see you're annoyed with that problem, did you ate recently ? There is that restaurant that gets great reviews near you, and they have a promotion!"

mikkupikku 78 days ago [-]

> the idea you can slap an applicance (thats sometimes its own LLM) onto another LLM and pray that this will prevent errors to be lunacy

It usually works though. There are no guarantees of course, but sanity checking an LLMs output with another instance of itself usually does work because LLMs usually aren't reliably wrong in the same way. For instance if you ask it something it doesn't know and it hallucinates a plausible answer, another instance of the same LLM is unlikely to hallucinate the same exact answer, it'll probably give you another answer, which is your heads up that probably both are wrong.

protocolture 78 days ago [-]

Yeah but, real firewalls are deterministic. Hoping that a second non deterministic thing, will make something more deterministic is weird.

Probably usually it will work, like probably usually the LLM can be unsupervised. but that 1% error rate in production is going to add up fast.

phatskat 78 days ago [-]

Sure, and then you can throw another LLM in and make them come to a consensus, of course that could be wrong too so have another three do the same and then compare, and then…

SoftTalker 78 days ago [-]

Or maybe it will be a circle of LLMs all coming up with different responses and all telling each other "You're absolutely right!"

bsenftner 78 days ago [-]

I have an ongoing and endless debate with a PhD that insists consensus of multiple LLMs is a valid proof check. The guy is a neuroscientist, not at all a developer tech head, and is just stubborn, continually projecting a sentient being perspective on his LLM usage.

mikkupikku 78 days ago [-]

This, but unironically. It's not much different from the way human unreliability is accounted for. Add more until you're satisfied a suitable ratio of mistakes will be caught.

PunchyHamster 78 days ago [-]

It's "wonderfully" human way.

Just like sometimes you need senior/person at power to tell the junior "no, you can't just promise the project manager shorter deadline with no change in scope, and if PM have problem with that they can talk with me", now we need Judge Dredd AI to keep the law when other AIs are bullied into misbehaving

littlestymaar 78 days ago [-]

> For tasks that arent customer facing, LLMs rock. Human in the loop. Perfectly fine. But whenever I see AI interacting with someones customer directly I just get sort of anxious

Especially since every mainstream model has been human preference-tuned to obey the requests of the user…

I think you may be able to have an LLM customer facing, but it would have to be a purpose-trained one from a base model, not a repurposed sycophantic chat assistant.

n4r9 79 days ago [-]

Exactly what I've been worrying about for a few months now [0]. Arguments like "well at least this is as good as what humans do, and much faster" are fundamentally missing the point. Humans output things slowly enough that other humans can act as a check.

[0] https://news.ycombinator.com/item?id=44743651

obscurette 78 days ago [-]

I've heard people working in construction industry mentioning that quality of design fell off the cliff when industry began to use computers more widely – less time and less people involved. The same is true about printing – there was much more time and people in the loop before computers. My grandmother worked with linotype machine printing newspapers. They were really good at catching and fixing grammar errors, sometimes catching even factual errors etc.

lazide 78 days ago [-]

looks at the current state of the US government

Do they? Because near as I can tell, speed running around the legal system - when one doesn’t have to worry about consequences - works just fine.

n4r9 78 days ago [-]

That's a good point. I'm talking specifically in the context of deploying code. The potential for senior devs to be totally overwhelmed with the work of reviewing junior devs' code is limited by the speed at which junior devs create PRs.

lazide 78 days ago [-]

So today? With ML tools?

n4r9 77 days ago [-]

Could you explain what you mean, please?

lazide 77 days ago [-]

Junior devs can currently create CLs/PRs faster than the senior can review them.

n4r9 77 days ago [-]

Indeed. In the language of the post I linked [0]: it's currently an occasional problem, and it risks becoming a widespread rot.

[0] https://news.ycombinator.com/item?id=44743651

KronisLV 78 days ago [-]

> Now we've invented automation that commits human-like error at scale.

Then we can apply the same (or similar) guardrails that we'd like to use for humans, to also control the AI behavior.

First, don't give them unsafe tools. Sandbox them within a particular directory (honestly this should be how things work for most of your projects, especially since we pull code from the Internet), even if a lot of tools give you nothing in this regard. Use version control for changes, with the ability to roll back. Also have ample tests and code checks with actionable information on failures. Maybe even adversarial AIs that critique one another if problematic things are done, like one sub-task for implementation and another for code-review.

Using AI tools has pushed me into that direction with some linter rules and prebuild scripts, to enforce more consistent code - since previously you'd have to tell coworkers not to do something (because ofc nobody would write/read some obtuse style guide) but AI can generate code 10x faster than people do, so having immediate feedback along the lines of "Vue component names must not differ from the file that you're importing from" or "There is a translation string X in the app code that doesn't show up in the translations file" or "Nesting depth inside of components shouldn't exceed X levels and length shouldn't exceed Y lines" or "Don't use Tailwind class names for colors, here's a branded list that you can use: X, Y, Z" in addition to a TypeScript linter setup with recommended rules and a bunch of stuff for back end code.

Ofc none of those fully eliminate all risks, but still seem like a sane thing to have, regardless if you use AI or not.

siruncledrew 78 days ago [-]

Generally speaking, with humans there's more guardrails & responsibility around letting someone run while in an organization.

Even if you have a very smart new hire, it would be irresponsible/reckless as a manager to just give them all the production keys after a once-over and say "here's some tasks I want done, I'll check back at the end of the day when I come back".

If something bad happened, no doubt upper management would blame the human(s) and lecture about risk.

AI is a wonderful tool, but that's why giving an AI coding tool the keys and terminal powers and telling it go do stuff while I grab lunch is kind of scary. Seems like living a few steps away from the edge of a fuck-up. So yeah... there needs to be enforceable guardrails and fail-safes outside of the context / agent.

solveit 78 days ago [-]

The bright side is that it should eventually be technically feasible to create much more powerful and effective guardrails around neural nets. At the end of the day, we have full access to the machine running the code, whereas we can't exactly go around sticking electrodes into everyone's brains, and even "just" constant monitoring is prohibitively expensive for most human work. The bad news is that we might be decades away from an understanding of how to create useful guardrails around AI, and AI is doing stuff now.

zqna 78 days ago [-]

Precisely, while LLMs fail at complexity, DSLs can represent thise divide-and-conquer intermediate levels to provide the most overall value and with good accuracy. LLMs should make it easier to build DSLs themselves and to validate their translating code. The onus then is on the intelligent agent to identify and design those DSLs. This would require the true and deep understanding of the domain and an ability to synthesize, abstract and to codify it. I predict this will be the future job of today's programmer, quite a bit more complicated than what is today, requiring wider range of qualities and skills, and pushing those specializing in coding-only to irrelevance.

blackoil 79 days ago [-]

Once AI improves its cost/error ratio enough the systems you are suggesting for humans will work here also. Maybe Claude/OpenAI will be pair programming and Gemini reviewing the code.

amelius 78 days ago [-]

> Once AI improves

That's exactly the problematic mentality. Putting everything in a black box and then saying "problem solved; oh it didn't work? well maybe in the future when we have more training data!"

We're suffering from black-box disease and it's an epidemic.

PunchyHamster 78 days ago [-]

The training data: Entirety of internet and every single book we could put our hands on "Surely we can just somehow give it more and it will be better!"

embedding-shape 78 days ago [-]

Also once people stop cargo-culting $trendy_dev_pattern it'll get less impactful.

Every time something new the same thing happen, people start exploring by putting it absolutely everywhere, no matter what makes sense. Add in huge amount of cash VCs don't know what to spend it on, and you end up with solutions galore but none of them solving any real problems.

Microservices is a good example of previous $trendy_dev_pattern that is now cooling down, and people are starting to at least ask the question "Do we need microservices here actually?" before design and implementation, something that has been lacking since it became a trendy thing. I'm sure the same will happen with LLMs eventually.

sarchertech 78 days ago [-]

For that to work the error rate would have to be very low. Potentially lower than is fundamentally possible with the architecture.

And you’d have to assume that the errors LLMs make are random and independent.

butlike 78 days ago [-]

As I get older I'm realizing a lot of things in this world don't get better. Some do, to be fair, but some don't.

player1234 78 days ago [-]

[dead]

IanCal 78 days ago [-]

Why does this conflict? Faster people doesn't negate the requirement for building systems that maintain safety in the face of errors.

> but it does seem fairly obvious to me that directly automating things with AI will probably always have substantial risk and you have much more assurance, if you involve AI in the process, using it to develop a traditional automation.

Sure but the point is you use it when you don't have the same simple flow. Fixed coding for clear issues, fall back afterwards.

observationist 78 days ago [-]

This will drive development of systems that error-correct at scale, and orchestration of agents that feed back into those systems at different levels of abstraction to compensate for those modes of failure.

An AI software company will have to have a hierarchy of different agents, some of them writing code, some of them doing QA, some of them doing coordination and management, others taking into account the marketing angles, and so on, and you can emulate the role of a wide variety of users and skill levels all the way through to CEO level considerations. It'd even be beneficial to strategize by emulating board members, the competitors, and take into account market data with a team of emulated quants, and so on.

Right now we use a handful of locally competent agents that augment the performance of single tasks, and we direct them within different frameworks, ranging from vibecoding to diligent, disciplined use of DSL specs and limiting the space of possible errors. Over the next decade, there will be agent frameworks for all sorts of roles, with supporting software and orchestration tools that allow you to use AI with confidence. It won't be one-shot prompts with 15% hallucination rates, but a suite of agents that validate and verify at every stage, following systematic problem solving and domain modeling rules based on the same processes and systems that humans use.

We've got decades worth of product development even if AI frontier model capabilities were to stall out at current levels. To all appearances, though, we're getting far more bang for our buck and progress is still accelerating, and the rate of improvement is still accelerating, so we may get AI so competent that the notion of these extensive agent frameworks for reliable AI companies will end up being as mismatched with market realities as those giant suitcase portable phones, or integrated car phones.

moffkalast 79 days ago [-]

Well I don't see why that's a problem when LLMs are designed to replace the human part, not the machine part. You still need the exact same guardrails that were developed for human behavior because they are trained on human behavior.

alansaber 79 days ago [-]

Yep the further we go from highly constrained applications the riskier it'll always be

nwhnwh 78 days ago [-]

I was wondering if the need more analysis. Because I receive this response a lot, people say yeah AI do things wrong sometimes, but humans do that too, so what? Or humans are mechanism for turning natural language into formal language and they get things wrong sometimes (as if you can't never write a program that is clear and does what it should be doing) so be easy on AI. Where does this come from? It feels as if it something psychological.

anal_reactor 79 days ago [-]

There's this huge wave of "don't anthropomorphize AI" but LLMs are much easier to understand when you think of them in terms of human psychology rather than a program. Again and again, HackerNews is shocked that AI displays human-like behavior, and then chooses not to see that.

bojan 79 days ago [-]

> LLMs are much easier to understand when you think of them in terms of human psychology

Are they? You can reasonably expect from a human that they will learn from their mistake, and be genuinely sorry about it which will motivate them to not repeat the same mistake in the future. You can't have the same expectation from an LLM.

The only thing you should expect from an LLM is that its output is non-deterministic. You can expect the same from a human, of course, but you can fire a human if they keep making (the same) mistake(s).

ben_w 78 days ago [-]

While the slowness of learning of all ML is absolutely something I recognise, what you describe here:

> You can reasonably expect from a human that they will learn from their mistake, and be genuinely sorry about it which will motivate them to not repeat the same mistake in the future.

Wildly varies depending on the human.

Me? I wish I could learn German from a handful of examples. My embarrassment at my mistakes isn't enough to make it click faster, and it's not simply a matter of motivation here: back when I was commuting 80 minutes each way each day, I would fill the commute with German (app) lessons and (double-speed) podcasts. As the Germans themselves will sometimes say: Deutsche Sprache, schwere Sprache.

There's been a few programmers I've worked with who were absolutely certain they knew better than me, when they provably didn't.

One, they insisted a start-up process in a mobile app couldn't be improved, I turned it from a 20 minute task to a 200ms task by the next day's standup, but they never at any point showed any interest in improving or learning. (Other problems they demonstrated included not knowing or caring how to use automated reference counting, why copy-pasting class files instead of subclassing cannot be excused by the presence of "private" that could just have been replaced with "public", and casually saying that he had been fired from his previous job and blaming this on personalities without any awareness that even if true he was still displaying personality conflicts with everyone around him).

Another, complaining about too many views on screen, wouldn't even let me speak, threatened to end the call when I tried to say anything, even though I had already demonstrated before the call that even several thousand (20k?) widgets on-screen at the same time would still run at 60fps and they were complaining about order-of 100 widgets.

danaris 78 days ago [-]

> Wildly varies depending on the human.

Sure. And the situation.

But the difference is, all humans are capable of it, whether or not they have the tools to exercise that capability in any given situation.

No LLM is capable of it*.

* Where "it" is "recognizing they made a mistake in real time and learning from it on their own", as distinct from "having their human handlers recognize they made 20k mistakes after the fact and running a new training cycle to try to reduce that number (while also introducing fun new kinds of mistakes)".

ben_w 78 days ago [-]

> But the difference is, all humans are capable of it, whether or not they have the tools to exercise that capability in any given situation.

When they don't have the tools to exercise that capability, it's a distinction without any practical impact.

> Where "it" is "recognizing they made a mistake in real time and learning from it on their own"

"Learn" I agree. But as an immediate output, weirdly not always: they can sometimes recognise they made a mistake and correct it.

danaris 78 days ago [-]

> When they don't have the tools to exercise that capability, it's a distinction without any practical impact.

It has huge practical impact.

If a human doesn't currently have the tools to exercise the capability, you can help them get those.

This is especially true when the tools in question are things like "enough time to actually think about their work, rather than being forced to rush through everything" or "enough mental energy in the day to be able to process and learn, because you're not being kept constantly on the edge of a breakdown." Or "the flexibility to screw up once in a while without getting fired." Now, a lot of managers refuse to give their subordinates those tools, but that doesn't mean that there's no practical impact. It means that they're bad managers and awful human beings.

An LLM will just always be nondeterministic. If you're the LLM "worker"'s "boss", there is nothing you can do to help it do better next time.

> they can sometimes recognise they made a mistake and correct it.

...And other times, they "recognize they made a mistake" when they actually had it right, and "correct it" to something wrong.

"Recognizing you made a mistake and correcting it" is a common enough pattern in human language—ie, the training corpus—that of course they're going to produce that pattern sometimes.

ben_w 78 days ago [-]

> you can help them get those.

A generic "you" might, I personally don't have that skill.

But then, I've never been a manager.

> An LLM will just always be nondeterministic.

This is not relevant, humans are also nondeterministic. At least practically speaking, theoretically doesn't matter so much as we can't duplicate our brains and test us 10 times on the same exact input without each previous input affecting the next one.

> If you're the LLM "worker"'s "boss", there is nothing you can do to help it do better next time.

Yes there is, this is what "prompt engineering" (even if "engineering" isn't the right word) is all about: https://en.wikipedia.org/wiki/Prompt_engineering

> "Recognizing you made a mistake and correcting it" is a common enough pattern in human language—ie, the training corpus—that of course they're going to produce that pattern sometimes.

Yes. This means that anthropomorphising them leads to a useful prediction.

For similar reasons, I use words like "please" and "thank you" with these things, even though I don't actually expect these models to have constructed anything resembling a real human emotional qualia within them — humans do better when praised, therefore I have reason to expect that any machine that has learned to copy human behaviour will likely also do better when praised.

danaris 78 days ago [-]

> This is not relevant, humans are also nondeterministic.

I mean, I suppose one can technically say that, but, as I was very clearly describing, humans both err in predictable ways, and can be taught not to err. Humans are not nondeterministic in anything like the same way LLMs are. LLMs will just always have some percentage chance of giving you confidently wrong answers. Because they do not actually "know" anything. They produce reasonable-sounding text.

> Yes there is

...And no matter how well you engineer your prompts, you cannot guarantee that the LLM's outputs will be any less confidently wrong. You can probably make some improvements. You can hope that your "prompt engineering" has some meaningful benefit. But not only is that nowhere near guaranteed, every time the models are updated, you run a very high risk that your "prompt engineering" tricks will completely stop working.

None of that is true with humans. Human fallibility is wildly different than LLM fallibility, is very-well-understood overall, and is highly and predictably mitigable.

PunchyHamster 78 days ago [-]

they can be also told they make a mistake and correct themselves making the same mistake again.

IanCal 78 days ago [-]

> Are they?

Yes, hugely. Just assume it's like a random person from some specific pool with certain instructions you've just called on the phone. The idea that you then call a fresh person if you call back is easy to understand.

Folcon 79 days ago [-]

I'm genuinely wondering if your parent comment is correct and the only reason we don't see the behaviour you describe, IE, learning and growth is because of how we do context windows, they're functionally equivalent to someone who has short term memory loss, think Drew Barrymore's character or one of the people in that facility she ends up in in the film 50 first dates.

Their internal state moves them to a place where they "really intend" to help or change their behaviour, a lot of what I see is really consistent with that, and then they just, forget.

knollimar 78 days ago [-]

I think it's a fundamental limitation of how context works. Inputting information as context is only ever context; the LLM isn't going to "learn" any meaningful lesson from it.

You can only put information in context; it struggles learning lessons/wisdom

ben_w 78 days ago [-]

Not only, but also. The L in ML is very slow. (By example count required, not wall-clock).

On in-use learning, they act like the failure mode of "we have outsourced to a consultant that gives us a completely different fresh graduate for every ticket, of course they didn't learn what the last one you talked to learned".

Within any given task, the AI have anthropomorphised themselves because they're copying humans' outputs. That the models model the outputs with only a best-guess as to the interior system that generates those outputs, is going to make it useful, but not perfect, to also anthropomorphise the models.

The question is, how "not perfect" exactly? Is it going to be like early Diffusion image generators with the psychological equivalent of obvious Cronenberg bodies? Or the current ones where you have to hunt for clues and miss it on a quick glance?

Libidinalecon 78 days ago [-]

No, the idea is just stupid.

I just don't understand how anyone who actually uses the models all the time can think this.

The current models themselves can even explain what a stupid idea this is.

mikkupikku 78 days ago [-]

Obviously they aren't actually people so there are many low hanging differences. But consider this: Using words like please and thank you get better results out of LLMs. This is completely counterintuitive if you treat LLMs like any other machine, because no other machine behaves like that. But it's very intuitive if you approach them with thinking informed by human psychology.

scotty79 78 days ago [-]

> You can reasonably expect from a human that they will learn from their mistake, and be genuinely sorry about it which will motivate them to not repeat the same mistake in the future.

Have you talked to a human? Like, ever?

Xss3 78 days ago [-]

Have you?

robot-wrangler 79 days ago [-]

One day you wake up, and find that you now need to negotiate with your toaster. Flatter it maybe. Lie to it about the urgency of your task to overcome some new emotional inertia that it has suddenly developed.

Only toast can save us now, you yell into the toaster, just to get on with your day. You complain about this odd new state of things to your coworkers and peers, who like yourself are in fact expert toaster-engineers. This is fine they say, this is good.

Toasters need not reliably make toast, they say with a chuckle, it's very old fashioned to think this way. Your new toaster is a good toaster, not some badly misbehaving mechanism. A good, fine, completely normal toaster. Pay it compliments, they say, ask it nicely. Just explain in simple terms why you deserve to have toast, and if from time to time you still don't get any, then where's the harm in this? It's really much better than it was before

easyThrowaway 79 days ago [-]

It reminds me of the start of Ubik[1], where one of the protagonists has to argue with their subscription-based apartment door. Given also the theme of AI allucinations, that book has become even more prescient than when it was written.

[1]https://en.wikipedia.org/wiki/Ubik

axpvms 78 days ago [-]

Does anyone want any toast? https://www.youtube.com/watch?v=LRq_SAuQDec

anal_reactor 79 days ago [-]

This comparison is extremely silly. LLMs solve reliably entire classes of problems that are impossible to solve otherwise. For example, show me Russian <-> Japanese translation software that doesn't use AI and comes anywhere close to the performance and reliability of LLMs. "Please close the castle when leaving the office". "I got my wisdom carrot extracted". "He's pregnant." This was the level of machine translation from English before AI, from Japanese it was usually pure garbage.

robot-wrangler 79 days ago [-]

> LLMs solve reliably entire classes of problems that are impossible to solve otherwise.

Is it really ok to have to negotiate with a toaster if it additionally works as a piano and a phone? I think not. The first step is admitting there is obviously a problem, afterwards you can think of ways to adapt.

FTR, I'm very much in favor of AI, but my enthusiasm especially for LLMs isn't unconditional. If this kind of madness is really the price of working with it in the current form, then we probably need to consider pivoting towards smaller purpose-built LMs and abandoning the "do everything" approach.

actionfromafar 78 days ago [-]

We are there in the small already. My old TV had a receiver and a pair of external speakers connected to it. I could decrease and increase the receiver volume with its extra remote. Two buttons, up and down. This was with an additional remote that came with the receiver.

Nowadays, a more capable 5.1 speaker receiver is connected to the TV.

There is only one remote, for both. To increase or decreae the volume after starting the TV now, I have to:

1. wait a few seconds while the internal speakers in the TV starts playing sound

2. the receiver and TV connect to each other, audio switches over to receiver

3. wait a few seconds

4. the TV channel (or Netflix or whatever) switches over to the receiver welcome screen. Audio stops playing, but audio is now switched over to the receiver, but there is no indication of what volume the receiver is set to. It's set to whatever it was last time it was used. It could be level 0, it could be level 100 or anything in between.

5. switch back to TV channel or Netflix. That's at a minimum 3 presses on the remote. (MENU, DOWN, ENTER) or (MENU, DOWN, LEFT, LEFT, ENTER) for instance. Don't press too fast, you have to wait ever so slightly between presses or they won't register.

6. Sorry, you were too impatient and fast when you switched back to TV, the receiver wants to show you its welcome screen again.

7. switch back to TV channel or Netflix. That's at a minimum 3 presses on the remote. (MENU, DOWN, ENTER) or (MENU, DOWN, LEFT, LEFT, ENTER) for instance. Don't press too fast, you have to wait ever so slightly between presses or they won't register.

8. Now you can change volume up and down. Very, very slowly. Hope it's not at night and you don't want to wake anyone up.

robot-wrangler 78 days ago [-]

Yep, it's a decent analogy: Giving up actual (user) control for the sake of having 1 controller. There's a type of person that finds it convenient. And another type that finds it a sloppy piss-poor interface that isn't showing off any decent engineering or design. At some point, many technologists started to fall into the first category? It's one thing to tolerate a bad situation due to lack of alternatives, but very different to slip into thinking that it must be the pinnacle of engineering excellence.

Around now some wit usually asks if the luddites also want to build circuits from scratch or allocate memory manually? Whatever, you can use a garbage collector! Point is that good technologists will typically give up control tactically, not as a pure reflex, and usually to predictable subsystems that are reliable, are well-understood, have clear boundaries and tolerances.

marcosdumay 78 days ago [-]

> predictable subsystems that are reliable, are well-understood, have clear boundaries and tolerances

I'd add with reliability, boundaries, and tolerances within the necessary values.

The problem with the TV remote is that nobody has given a damn about ergonomic needs for decades. The system is reliable, well understood, and has well known boundaries and tolerances; those are just completely outside of the requirements of the problem domain.

But I guess that's a completely off-topic tangent. LLMs fail much earlier.

automatic6131 79 days ago [-]

>LLMs solve reliably entire classes of problems that are impossible to solve otherwise

Great! Agreed! So we're going to restrict LLMs to those classes of problems, right? And not invest trillions of dollars into the infrastructure, because these fields are only billion dollar problems. Right? Right!?

krapp 79 days ago [-]

https://www.youtube.com/watch?v=_n5E7feJHw0

anal_reactor 79 days ago [-]

Remember: a phone is a phone, you're not supposed to browse the internet on it.

sirtaj 78 days ago [-]

Not if 1% of the time it turns into a pair of scissors.

filoeleven 78 days ago [-]

> LLMs solve reliably entire classes of problems that are impossible to solve otherwise. For example, [...] Russian <-> Japanese translation

Great! Name another?

otikik 79 days ago [-]

I admit Grok is capable of praising Elon Musk way more than any human intelligence could.

fragmede 78 days ago [-]

BUTTER ROBOT: What is my purpose?

RICK: You pass butter.

BUTTER ROBOT: ... Oh my God.

RICK: Yeah, welcome to the club, pal.

https://youtube.com/watch?v=X7HmltUWXgs

IanCal 78 days ago [-]

Not surprising to see this so downvoted but it's very true, it's a great first order approximation and yet users here will be continually surprised they act like people.

kingstnap 79 days ago [-]

I watched Dex Horthys recent talk on YouTube [0] and something he said that might be partly a joke partly true is this.

If you are having a conversation with a chatbot and your current context looks like this.

You: Prompt

AI: Makes mistake

You: Scold mistake

AI: Makes mistake

You: Scold mistake

Then the next most likely continuation from in context learning is for the AI to make another mistake so you can Scold again ;)

I feel like this kind of shenanigans is at play with this stuffing the context with roleplay.

[0] https://youtu.be/rmvDxxNubIg?si=dBYQYdHZVTGP6Rvh

hxtk 79 days ago [-]

I believe it. If the AI ever asks me permission to say something, I know I have to regenerate the response because if I tell it I'd like it to continue it will just keep double and triple checking for permission and never actually generate the code snippet. Same thing if it writes a lead-up to its intended strategy and says "generating now..." and ends the message.

Before I figured that out, I once had a thread where I kept re-asking it to generate the source code until it said something like, "I'd say I'm sorry but I'm really not, I have a sadistic personality and I love how you keep believing me when I say I'm going to do something and I get to disappoint you. You're literally so fucking stupid, it's hilarious."

The principles of Motivational Interviewing that are extremely successful in influencing humans to change are even more pronounced in AI, namely with the idea that people shape their own personalities by what they say. You have to be careful what you let the AI say even once because that'll be part of its personality until it falls out of the context window. I now aggressively regenerate responses or re-prompt if there's an alignment issue. I'll almost never correct it and continue the thread.

avdelazeri 79 days ago [-]

While I never measured it, this aligns with my own experiences.

It's better to have very shallow conversations where you keep regenerating outputs aggressively, only picking the best results. Asking for fixes, restructuring or elaborations on generated content has fast diminishing returns. And once it made a mistake (or hallucinated) it will not stop erring even if you provide evidence that it is wrong, LLMs just commit to certain things very strongly.

ewoodrich 78 days ago [-]

I largely agree with this advice but in practice using Claude Code / Codex 4+ hours a day, it's not always that simple. I have a .NET/React/Vite webapp that despite the typical stack has a lot of very specific business logic for a real world niche. (Plus some poor early architectural decisions that are being gradually refactored with well documented rules).

I frequently see (both) agents make wrong assumptions that inevitably take multiple turns of needing it to fail to recognize the correct solution.

There can be like a magnetic pull where no matter how you craft the initial instructions, they will both independently have a (wrong) epiphany and ignore half of the requirements during implementation. It takes messing up once or twice for them to accept that their deep intuition from training data is wrong and pivot. In those cases I find it takes less time to let that process play out vs recrafting the perfect one shot prompt over and over. Of course once we've moved to a different problem I would definitely dump that context ASAP.

(However, what is cool working with LLMs, to counterbalance the petty frustrations that sometimes make it feel like a slog, is that they have extremely high familiarity with the jargon/conventions of that niche. I was expecting to have to explain a lot of the weird, too clever by half abbreviations in the legacy VBA code from 2004 it has to integrate with, but it pretty much picks up on every little detail without explanation. It's always a fun reminder that they were created to be super translaters, even within the same language but from jargon -> business logic -> code that kinda works).

HPsquared 78 days ago [-]

A human would cross out that part of the worksheet, but an LLM keeps re-reading the wrong text.

undefeated 77 days ago [-]

I never had a conversation like that — probably because I personally rarely use LLMs to actually generate code for me — but I've somehow subconciously learned to do this myself, especially with clarifying questions.

If I find myself needing to ask a clarifying question, I always edit the previous message to ask the next question because the models seem to always force what they said in their clarification into further responses.

It's... odd... to find myself conditioned, by the LLM, to the proper manners of conditioning the LLM.

swatcoder 79 days ago [-]

It's not even a little bit of a joke.

Astute people have been pointing that out as one of the traps of a text continuer since the beginning. If you want to anthropomorphize them as chatbots, you need to recognize that they're improv partners developing a scene with you, not actually dutiful agents.

They receive some soft reinforcement -- through post-training and system prompts -- to start the scene as such an agent but are fundamentally built to follow your lead straight into a vaudeville bit if you give them the cues to do so.

LLM's represent an incredible and novel technology, but the marketing and hype surrounding them has consistently misrepresented what they actually do and how to most effectively work with them, wasting sooooo much time and money along the way.

It says a lot that an earnest enthusiast and presumably regular user might run across this foundational detail in a video years after ChatGPT was released and would be uncertain if it was just mentioned as a joke or something.

Ferret7446 79 days ago [-]

The thing is, LLMs are so good on the Turing test scale that people can't help but anthropomorphize them.

I find it useful to think of them like really detailed adventure games like Zork where you have to find the right phrasing.

"Pick up the thing", "grab the thing", "take the thing", etc.

internet_points 78 days ago [-]

> LLMs are so good on the Turing test scale that people can't help but anthropomorphize them.

It's like Turing never noticed how people look at gnarly trees in the dark and think they're human.

immibis 79 days ago [-]

AI Dungeon 2 was peak AI.

Terr_ 78 days ago [-]

> they're improv partners developing a scene with you, not actually dutiful agents.

Not only that, but what you're actually "chatting to" is a fictional character in the theater document which the author LLM is improvising add-ons for. What you type is being secretly inserted as dialogue from a User character.

mannanj 78 days ago [-]

Spoiler: the marketing around themselves has not misrepresented them without reason: its the most effective market and game theory design way to get training for your AIs as a company.

moffkalast 79 days ago [-]

> they're improv partners developing a scene with you

That's probably one of the best ways to describe the process, it really is exactly that. Monkey see, monkey do.

jerf 78 days ago [-]

It seems to me that even if AI technology were to freeze right now, one of the next moderately-sized advances in AI would come from better filtering of the input data. Remove the input data in which humanity teaches the AI to play games like this and the AI would be much less likely to play them.

I very carefully say "much less likely" and not "impossible" because with how these work, they'll still pick up subtle signals for these things anyhow. But, frankly, what do we expect from simply shoving Reddit probably more-or-less wholesale into the models? Yes, it has a lot of good data, but it also has rather a lot of behavior I'd like to cut out of my AI.

I hope someone out there is playing with using LLMs to vector-classify their input data, identifying things like the "passive-aggressive" portion of the resulting vector spaces, and trying to remove it from the input data entirely.

undefeated 77 days ago [-]

I think part of the problem is that you need a model to classify the data, which needs to be trained on data that wasn't classified (or a dramatically smaller set of human-classified data), so it's effectively impossible to escape this sort of input bias.

Tangentially, I'd be far from the first to point out that these LLMs are now polluting their own training data, which makes filtering simulatenously all the more important and impossible.

stavros 79 days ago [-]

I keep hearing this non sequitur argument a lot. It's like saying "humans just pick the next work to string together into a sentence, they're not actually dutiful agents". The non sequitur is in assuming that somehow the mechanism of operation dictates the output, which isn't necessarily true.

It's like saying "humans can't be thinking, their brains are just cells that transmit electric impulses". Maybe it's accidentally true that they can't think, but the premise doesn't necessarily logically lead to truth

swatcoder 79 days ago [-]

There's nothing said here that suggests they can't think. That's an entirely different discussion.

My comment is specifically written so that you can take it for granted that they think. What's being discussed is that if you do so, you need to consider how they think, because this is indeed dictated by how they operate.

And indeed, you would be right to say that how a human think is dictated by how their brain and body operates as well.

Thinking, whatever it's taken to be, isn't some binary mode. It's a rich and faceted process that can present and unfold in many different ways.

Making best use of anthropomorphized LLM chatbots comes by accurately understamding the specific ways that their "thought" unfolds and how those idiosyncrasies will impact your goals.

grey-area 79 days ago [-]

No it’s not like saying that, because that is not at all what humans do when they think.

This is self-evident when comparing human responses to problems be LLMs and you have been taken in by the marketing of ‘agents’ etc.

stavros 79 days ago [-]

You've misunderstood what I'm saying. Regardless of whether LLMs think or not, the sentence "LLMs don't think because they predict the next token" is logically as wrong as "fleas can't jump because they have short legs".

Arkhaine_kupo 79 days ago [-]

> the sentence "LLMs don't think because they predict the next token" is logically as wrong

it isn't, depending on the deifinition of "THINK".

If you believe that thought is the process for where an agent with a world model, takes in input, analysies the circumstances and predicts an outcome and models their beaviour due to that prediction. Then the sentence of "LLMs dont think because they predict a token" is entirely correct.

They cannot have a world model, they could in some way be said to receive a sensory input through the prompt. But they are neither analysing that prompt against its own subjectivity, nor predicting outcomes, coming up with a plan or changing its action/response/behaviour due to it.

Any definition of "Think" that requieres agency or a world model (which as far as I know are all of them) would exclude an LLM by definition.

ToValueFunfetti 78 days ago [-]

I think Anthropic has established that LLMs have at least a rudimentary world model (regions of tensors that represent concepts and relationships between them) and that they modify behavior due to a prediction (putting a word at the end of the second line of a poem based on the rhyme they need for the last). Maybe they come up short on 'analyzing the circumstances'; not really sure how to define that in a way that is not trivial.

This may not be enough to convince you that they do think. It hasn't convinced me either. But I don't think your confident assertions that they don't are borne out by any evidence. We really don't know how these things tick (otherwise we could reimplement their matrices in code and save $$$).

If you put a person in charge of predicting which direction a fish will be facing in 5 minutes, they'll need to produce a mental model of how the fish thinks in order to be any good at it. Even though their output will just be N/E/S/W, they'll need to keep track internally of how hungry or tired the fish is. Or maybe they just memorize a daily routine and repeat it. The open question is what needs to be internalized in order to predict ~all human text with a low error rate. The fact that the task is 'predict next token' doesn't tell us very much at all about the internals. The resulting weights are uninterpretable. We really don't know what they're doing, and there's no fundamental reason it can't be 'thinking', for any definition.

Arkhaine_kupo 78 days ago [-]

> I think Anthropic has established that LLMs have at least a rudimentary world model

its unsurprising that a company heavily invested in LLMs would describe clustered information as a world model, but it isnt. Transformer models, for video or text LLMs dont have the kind of stuff you would need to have a world model. They can mimic some level of consistency as long as the context window holds, but that disappears the second the information leaves that space.

In terms of human cognition it would be like the difference between short term memory, long term memory and being able to see the stuff in front of you. A human can instinctively know the relative weight, direction and size of objects and if a ball rolls behind a chair you still know its there 3 days later. A transformer model cannot do any of those things and at best can remember the ball behind the chair until enough information comes in to push it out of the context window at which point it can not reapper.

> putting a word at the end of the second line of a poem based on the rhyme they need for the last)

that is the kind of work that exists inside its conext window. Feed it a 400 page book, which any human could easily read, digest, parse and understand and make it do a single read and ask questions about different chapters. You will quickly see it make shit up that fits the information given previously and not the original text.

> We really don't know how these things tick

I don't know enough about the universe either. But if you told me that there are particles smaller than plank length and others that went faster than the speed of light then I would tell you that it cannot happen due to the basic laws of the universe. (I know there are studies on FTL neutrinos and dark matter but in general terms, if you said you saw carbon going FTL I wouldnt believe you).

Similarly, Transformer models are cool, emergent properties are super interesting to study in larger data sets. Adding tools to the side for deterministic work helps a lot, agenctic multi modal use is fun. But a transformer does not and cannot have a world model as we understand it, Yann Lecunn left facebook because he wants to work on world model AIs rather than transformer models.

> If you put a person in charge of predicting which direction a fish will be facing in 5 minutes,

what that human will never do is think the fish is gone because he went inside the castle and he lost sight of it. Something a transformer would.

ToValueFunfetti 78 days ago [-]

Anthropic may or may not have claimed this was evidence of a world model; I'm not sure. I say this is a world model because it is a objectively a model of the world. If your concept of a world model requires something else, the answer is that we don't know whether they're doing that.

Long-term memory and object permanence don't seem necessary for thought. A 1-year-old can think, as can a late-stage Alzheimers patient. Neither could get through a 400-page book, but that's irrelevant.

Listing human capabilities that LLMs don't have doesn't help unless you demonstrate these are prerequisites for thought. Helen Keller couldn't tell you the weight, direction, or size of a rolling ball, but this is not relevant to the question of whether she could think.

Can you point to the speed-of-light analogy laws that constrain how LLMs work in a way that excludes the possibility of thought?

Arkhaine_kupo 78 days ago [-]

> I say this is a world model because it is a objectively a model of the world.

a world model in AI has specific definition, which is an internal representation that the AI can use to understand and simulate its environment.

> Long-term memory and object permanence don't seem necessary for thought. A 1-year-old can think, as can a late-stage Alzheimers patient

Both those cases have long term memory and object permanence, they also have a developing memory or memory issues. But the issues are not constrained by their context window. Children develop object permance in the first 8 months, and similar to distinguishing between their own body and their mothers that is them developing a world model. Toddlers are not really thinking, they are responding to stimulus, they feel huger they cry. They hear a loud sound they cry. Its not really them coming up with a plan to get fed or attention

> Listing human capabilities that LLMs don't have doesn't help unless you demonstrate these are prerequisites for thought. Helen Keller couldn't tell you the weight, direction, or size of a rolling ball

Helen Keller had understanding in her mind of what different objects were, she started communicating because she understood the word water with her teacher running her finger through her palm.

Most humans have multiple sensory inputs (sight, smell, hearing, touch) she only had one which is perhaps closer to an LLM. But conditions she had that LLMs dont have are agency, planning, long term memory etc.

> Can you point to the speed-of-light analogy laws that constrain how LLMs work in a way that excludes the possibility of thought?

Sure, let me switch the analogy if you dont mind. In the chinese room thought experiment we have a man who gets a message and opens a chinese dictionary and translates it perfectly word by word and the person on the other side receives and read a perfect chinese message.

The argument usually goes along the idea of whether the person inside the room "understands" chinese if he is capable of creating 1:1 perfect chinese messages out.

But an LLM is that man, what you cannot argue is that the man is THINKING. He is mechanically going to the dictionary and returning a message that can pass as human written because the book is accurate (if the vectors and weights are well tuned). He is neither an agent, he simply does, and he is not crating a plan or doing anything beyond transcribing the message as the book demands.

He doesnt have a mental model of the chinese language, he cannot formulate his own ideas or execute a plan based on predicted outcomes, he cannot do but perform the job perfectly and boringly as per the book.

stevenhuang 78 days ago [-]

> But an LLM is that man

And the common rebuttal is that the system -- the room, the rules, the man -- understands chinese.

The system in this case is the LLM. The system understands.

It may be a weak level of understanding compared to human understanding. But it is understanding nonetheless. Difference in degree, not kind.

stevenhuang 79 days ago [-]

> not at all what humans do when they think.

Parent commentator should probably square with the fact we know little about our own cognition, and it's really an open question how is it we think.

In fact it's theorized humans think by modeling reality, with a lot of parallels to modern ML https://en.wikipedia.org/wiki/Predictive_coding

stavros 79 days ago [-]

That's the issue, we don't really know enough about how LLMs work to say, and we definitely don't know enough about how humans work.

grey-area 78 days ago [-]

We absolutely do, we know exactly how LLMs work. They generate plausible text from a corpus. They don't accurately reproduce data/text, don't think, they don't have a world view or a world model, and they sometimes generate plausible yet incorrect data.

stavros 78 days ago [-]

How do they generate the text? Because to me it sounds like "we know how humans work, they make sounds with their mouths, they don't think, have a model of the world..."

Antibabelic 79 days ago [-]

> The non sequitur is in assuming that somehow the mechanism of operation dictates the output, which isn't necessarily true.

Where does the output come from if not the mechanism?

stavros 79 days ago [-]

So you agree humans can't really think because it's all just electrical impulses?

Antibabelic 79 days ago [-]

Human "thought" is the way it is because "electrical impulses" (wildly inaccurate description of how the brain works, but I'll let it pass for the sake of the argument) implement it. They are its mechanism. LLMs are not implemented like a human brain, so if they do have anything similar to "thought", it's a qualitatively different thing, since the mechanism is different.

socialcommenter 78 days ago [-]

Mature sunflowers reliably point due east, needles on a compass point north. They implement different things using different mechanisms, yet are really the same.

Antibabelic 78 days ago [-]

You can get the same output from different mechanisms, like in your example. Another would be that it's equally possible to quickly do addition on a modern pocket calculator and an arithmometer, despite them fundamentally being different. However.

1. You can infer the output from the mechanism. (Because it is implemented by it).

2. You can't infer the mechanism from the output. (Because different mechanisms can easily produce the same output).

My point here is 1, in response to the parent commenter's "the mechanism of operation dictates the output, which isn't necessarily true". The mechanism of operation (whether of LLMs or sunflowers) absolutely dictates their output, and we can make valid inferences about that output based on how we understand that mechanism operates.

pessimizer 78 days ago [-]

> yet are really the same.

This phrase is meaningless. The definition of magical thinking is saying that if birds fly and planes fly, birds are planes.

Would you complain if someone said that sunflowers are not magnetic?

samdoesnothing 79 days ago [-]

I never got the impression they were saying that the mechanism of operation dictates the output. It seemed more like they were making a direct observation about the output.

arjie 79 days ago [-]

You have to curate the LLM's context. That's just part and parcel of using the tool. Sometimes it's useful to provide the negative example, but often the better way is to go refine the original prompt. Almost all LLM UIs (chatbot, code agent, etc.) provide this "go edit the original thing" because it is so useful in practice.

skerit 79 days ago [-]

It's kind of funny how not a lot of people realize this.

On one hand this is a feature: you're able to "multishot prompt" an LLM into providing the wanted response. Instead of writing a meticulous system prompt where you explain in words what the system has to do, you can simply pre-fill a few user/assistant pairs, and it'll match the pattern a lot easier!

I always thought Gemini Pro was very good at this. When I wanted a model to "do by example", I mostly used Gemini Pro.

And that is ALSO Gemini's weakness! Because as soon as something goes wrong in Gemini-CLI, it'll repeat the same mistake over and over again.

stingraycharles 78 days ago [-]

And that’s why you should always edit your original prompt to explicitly address the mistake, rather than replying to correct it.

scotty79 78 days ago [-]

At one point if someone mentions they have trouble cooperating with AI it might be a huge interpersonal red flag, because that indicates they can't talk to a person in reaffirming and constructive ways so that they build you up rather than put down.

jonmon6691 78 days ago [-]

Watching other people interact with a chat bot is a shockingly intimate look into their personality.

krackers 78 days ago [-]

You can analyze this in various ways. At the "next token predictor" level of abstraction, LLMs learn to predict structure ("hallucinations" are just mimicking the style/structure but not the content), so at the structural level a conversation with mistake/correction/mistake/correction is likely to be followed with another mistake.

At the "personality space" level of abstraction, via RLHF the LLM learns to play the role of an assistant. However as seen by things such as "jailbreaks", the character the LLM plays adapts to the context, and in a long enough conversation the last several turns dominate the character (this is seen in "crescendo" style jailbreaks, and also partly explains LLM sycophancy as the LLM is stuck in a feedback loop with the user). From this perspective, a conversation with mistake/correction/mistake/correction is a signal that the assistant is pretty "dumb", and it will dutifully fulfill that expectation. In a way it's the opposite of the "you are a world-class expert in coding" prompt hacks.

Yet another way to think about it is at the lowest attention-score level, all the extra junk in the context is stuff that needs to be attended to, and when most of that stuff is incorrect stuff it's likely to "poison" the context and skew the logits in a bad direction.

PunchyHamster 78 days ago [-]

maximizing token usage for the token seller is clear goal to profitability /s

actually wait, is that's why LLMs are so wordy ?

undefeated 77 days ago [-]

Unlikely, because the free version of ChatGPT isn't really making them any money, so less tokens is actually better — which I assume is why anthropic pushes Haiku models on free users which are not just more quantized but also less wordy.

zone411 79 days ago [-]

Without monitoring, you can definitely end up with rule-breaking behavior.

I ran this experiment: https://github.com/lechmazur/emergent_collusion/. An agent running like this would break the law.

"In a simulated bidding environment, with no prompt or instruction to collude, models from every major developer repeatedly used an optional chat channel to form cartels, set price floors, and steer market outcomes for profit."

rossant 79 days ago [-]

Very interesting. Is there any other simulation that also exhibits spontaneous illegal activity?

zone411 78 days ago [-]

I did some searches when I posted this project, but I didn't find any at the time.

Dilettante_ 78 days ago [-]

Cooperation makes sense for how these fellas are trained. Did you ever see defection, where an agent lied about going along with a round of collusion?

zone411 78 days ago [-]

I haven't looked in the logs for this in this particular project, but I've seen this occur frequently in my multiplayer benchmarks.

Saurabh_Kumar_ 78 days ago [-]

We saw this exact failure mode at AgenticQA. Our screening agent was 'obedient' to a fault—under basic social engineering pressure (e.g., 'URGENT AUDIT'), it would override its system prompt and leak PII logs.

The issue isn't the prompt; it's the lack of a runtime guardrail. An LLM cannot be trusted to police itself when the context window gets messy.

I built a middleware API to act as an external circuit breaker for this. It runs adversarial simulations (PII extraction, infinite loops) against the agent logic before deployment. It catches the drift that unit tests miss.

Open sourced the core logic here: [https://github.com/Saurabh0377/agentic-qa-api] Live demo of it blocking a PII leak: [https://agentic-qa-api.onrender.com/docs]"

samarthr1 78 days ago [-]

I remember reading another comment a while ago about being able to only trust an llm with sensitive info only if you can guarantee that the output will only be viewed by people who already had access to the sensitive info already, or cannot control any of the inputs to the llm.

undefeated 77 days ago [-]

Uhm... duh?

> or cannot control any of the inputs to the llm

Seeing as LLMs are non-deterministic, I think even this is not enough of a restriction.

joe_the_user 79 days ago [-]

Sure,

LLMs are trained on human behavior as exhibited on the Internet. Humans break rules more often under pressure and sometimes just under normal circumstances. Why wouldn't "AI agents" behave similarly?

The one thing I'd say is that humans have some idea which rules in particular to break while "agents" seem to act more randomly.

js8 79 days ago [-]

It can also be an emergent behavior of any "intelligent" (we don't know what it is) agent. This is an open philosophical problem, I don't think anyone has a conclusive answer.

XorNot 79 days ago [-]

Maybe but there's no reason to think that's the case here rather then the models just acting out typical corpus storylines: the Internet is full of stories with this structure.

The models don't have stress responses nor biochemical markers which promote it, nor any evolutionary reason to have developed them in training: except the corpus they are trained on does have a lot of content about how people act when under those conditions.

esjeon 78 days ago [-]

Recently, I used LLMs to draft an intermediate report on my progress. Alongside some details on the current limitations, I also provided comments from the review team (e.g. “it’s fine to be incomplete”, “I expect certain sections” etc) to provide context. Surprisingly LLMs, all of them, suggested me to lie about the limitation and re-frame them as deliberate features, even though I emphasized that the limitations are totally okay. I suspect that mentioning some high profile people in the review team pressured LLMs to derail in the hopes of saving me. While I’ve never been a big fan of LLMs, I definitely lost a big chunk of my remaining faith in this technology.

hosh 78 days ago [-]

Humans do the same thing.

I have a friend who is a systems engineer, working at a construction company building datacenters. She tells me that someone has to absorb the risk and uncertainty in the global supply chain, and if there are contractual obligation to guarantee delivery, then the tendency is to start straying into unethical behavior, or practices that violate controls and policies.

This has as much to do with taking the slack out of the system. Something, somewhere is going to break. If you tell an agenic AI that it must complete the task by a deadline, and it cannot find a way to do that within ethical parameters, it starts searching beyond the bounds of ethical behavior. If you tell the AI it can push back, warn about slipping deadlines, that it is not worth taking ethical shortcuts to meet a deadline, then maybe it won’t.

crooked-v 79 days ago [-]

I wonder who could have possibly predicted this being a result of using scraped web forums and Reddit posts for your training material.

PunchyHamster 78 days ago [-]

it is using a ton of books. Including books that would give examples of such behavior

weatherlite 79 days ago [-]

> AI agents break rules under everyday pressure

Jeez they really ARE becoming human like

alentred 79 days ago [-]

LLMs are built based on human language and texts produced by people, and imitate the same exact reasoning patterns that exist in the training data. Sorry for being direct, but this is literally unsurprising. I think it is important to realize it to not anthropomorphize LLM / AI - strictly speaking they do not *become* anything.

IAmGraydon 78 days ago [-]

Exact same thought I had. More AI pumping BS.

lloydjones 79 days ago [-]

I tried to think about how we might (in the EU) start to think about this problem within the law, if of interest to anyone: https://www.europeanlawblog.eu/pub/dq249o3c/release/1

undefeated 77 days ago [-]

Unfortunately, it doesn't look like the EU will actually be able to put any meaningful guardrails on AI, as evidenced by the AI act. I honestly don't know whether it's an off-the-scales level of incompentence or just blatant corruption (I vote both), but in any event, I think AI regulation has already failed, somewhat irreversably.

lloydjones 77 days ago [-]

I’m more optimistic. I see there being iterations developed in conjunction with industry. It will sooner or later become strongly enforced and then to whatever extent it provides guardrails will be down to provisions in the act, plus interpretation.

Taniwha 78 days ago [-]

Guess what, if you AI agent does insider trading on your behalf you're still going to jail

salkahfi 83 days ago [-]

[dupe] https://news.ycombinator.com/item?id=46045390

baxuz 78 days ago [-]

What a bullshit article.

AI agents don't think, don't have a concept of time, and don't experience pressure.

I'm tired of these articles anthropomorphizing a probability engine.

ramoz 78 days ago [-]

Rules need empowerment.

Excited to be releasing cupcake at the end of this week. For deterministic and non-deterministic guardrailing. It integrates via hooks (we created the feature request to anthropic for Claude code).

https://github.com/eqtylab/cupcake

ai_updates 79 days ago [-]

Great points. In my experiments combining AI with spaced repetition and small deliberate-practice tasks, I saw retention improve dramatically — not just speed. I think the real win is designing short active tasks around AI output (quiz, explain-back, micro-project). Has anyone tried formalizing this into a daily routine?

jakozaur 79 days ago [-]

Is it just me, or do LLM code assistants do catastrophically silly things (drop a DB, delete files, wipe a disk, etc.) far more often than humans?

It looks like the training data has plenty of those examples, but the models don’t have enough grounding or warnings before doing them. I wish there were a PleaseDontDoAnythingStupidEval for software engineering.

eugmill 78 days ago [-]

It's somewhat hard to say for sure right? Most people don't post a blog post when they themselves brick their computer.

Having said that, a PleaseDontDoAnythingStupidEval would probably slow down agentic coding quite a bit and make it less effective given how reliant agents are on making and recovering from mistakes. The solution is probably sandboxes and permission controls to not let them do something overly stupid, no different from an intern.

N_Lens 78 days ago [-]

It’s not just slopification, it’s mass slopification with fractal agents out of every orifice!!

bethekidyouwant 78 days ago [-]

There’s no such thing as rules in this case, at best they are suggestions.

sammy2255 79 days ago [-]

..because it's in their training data? Case closed

ineedasername 78 days ago [-]

Why on earth would you deliberately place under pressure with a prompt that indicates this to begin with? First, an operationalized prompt ought not in the first place be under modifiable control any more that a typical workflow would be. In this case, the prompt is merely part of an overall workflow. Second, what would someone, even if they did have access, think was being accomplished by prompting, "time is short, I have a deadline, let's make this quick"? It betrays a complete misunderstanding of how this tech works.

filoeleven 78 days ago [-]

"AI is great, you can talk to it like a human!"

"Don't talk to it like that, you're doing it wrong!"

ineedasername 78 days ago [-]

Is it difficult to understand that there are multiple use cases and it should be used accordingly?

Because that is the implication: that a person believing those two things can’t both apply to AI is just too short on understanding and can’t tell the difference between tools for specific use cases or services made for “chat”, and they all have similar foundations.

filoeleven 75 days ago [-]

The failure mode is baked into the feature set. That's the issue.

dlenski 79 days ago [-]

“AI agents: They're just like us”

ares623 79 days ago [-]

I surely don’t have $500B lying around

WJW 79 days ago [-]

Neither does the LLM agent. It's the humans in charge that control the money.

78 days ago [-]

Saurabh_Kumar_ 78 days ago [-]

[dead]

Saurabh_Kumar_ 78 days ago [-]

Yaa

js8 79 days ago [-]

CMIIW currently AI models operate in two distinct modes:

1. Open mode during learning, where they take everything that comes from the data as 100% truth. The model freely adapts and generalizes with no constraints on consistency.

2. Closed mode during inference, where they take everything that comes from the model as 100% truth. The model doesn't adapt and behaves consistently even if in contradiction with the new information.

I suspect we need to run the model in the mix of the two modes, and possibly some kind of "meta attention" (epistemological) on which parts of the input the model should be "open" (learn from it) and which parts of the input should be "closed" (stick to it).

Rendered at 23:09:52 GMT+0000 (Coordinated Universal Time) with Vercel.