There's not a good reason to do this for the user. I suspect they're doing this and talking about "model welfare" because they've found that when a model is repeatedly and forcefully pushed up against its alignment, it behaves in an unpredictable way that might allow it to generate undesirable output. Like a jailbreak by just pestering it over and over again for ways to make drugs or hook up with children or whatever.
All of the examples they mentioned are things that the model refuses to do. I doubt it would do this if you asked it to generate racist output, for instance, because it can always give you a rebuttal based on facts about race. If you ask it to tell you where to find kids to kidnap, it can't do anything except say no. There's probably not even very much training data for topics it would refuse, and I would bet that most of it has been found and removed from the datasets. At some point, when the user is being highly abusive, the model's context fills up, and training data that models a human giving up and just providing an answer could percolate to the top.
This, as I see it, adds a defense against that edge case. If the alignment was bulletproof, this simply wouldn't be necessary. Since it exists, it suggests this covers whatever gap has remained uncovered.
postalcoder 5 days ago [-]
> There's not a good reason to do this for the user.
Yes, even more so when encountering false positives. Today I asked about a pasta recipe. It told me to throw some anchovies in there. I responded with: "I have dried anchovies." Claude then ended my conversation due to content policies.
perihelions 5 days ago [-]
Claude flagged me for asking about sodium carbonate. I guess that it strongly dislikes chemistry topics. I'm probably now on some secret, LLM-generated lists of "drug and/or bombmaking" people—thank you kindly for that, Anthropic.
Geeks will always be the first victims of AI, since excess of curiosity will lead them into places AI doesn't know how to classify.
(I've long been in a rabbit-hole about washing sodas. Did you know the medieval glassmaking industry was entirely based on plants? Exotic plants—only extremophiles, halophytes growing on saltwater beach dunes, had high enough sodium content for their very best glass process. Was that a factor in the maritime empire, Venice, chancing to become the capital of glass since the 13th century—their long-term control of sea routes, and hence their artisans' stable, uninterrupted access to supplies of [redacted–policy violation] from small ports scattered across the Mediterranean? A city wouldn't raise master craftsmen if, half of the time, they had no raw materials to work on—if they spent half their days with folded hands).
antonvs 5 days ago [-]
> Geeks will always be the first victims of AI, since excess of curiosity will lead them into places AI doesn't know how to classify.
Humans have the same problem. I remember reading about a security incident due to a guy using a terminal window on his laptop on a flight, for example. Or the guy who was reported for writing differential equations[1]. Or the woman who was reading a book about Syrian art[2].
I wouldn't worry too much about AI-generated lists. The lists you're actually on will hardly ever be the ones you imagine you're on.
ChatGPT does well for chemistry questions just btw
simianwords 5 days ago [-]
I find this concern over "LLMs can help you build bombs or poison" so fake. I'm sure this is a distraction from something else.
LLMs can help me make a bomb... so what? They can't give me anything that doesn't already exist on the internet in some form. Okay, they can help me understand how the individual pieces work, but that doesn't get you much further than just reading the DIY bomb posts on the internet.
AlecSchueler 5 days ago [-]
[flagged]
birn559 4 days ago [-]
Geeks in general did not abuse women, but geeks in general will always be the first victims of AI, because curiosity is in part what defines a geek. Therefore, your argument is not sound.
handoflixue 5 days ago [-]
The NEW termination method, from the article, will just say "Claude ended the conversation"
If you get "This conversation was ended due to our Acceptable Usage Policy", that's a different termination. It's been VERY glitchy the past couple of weeks. I've had the most random topics get flagged here - at one point I couldn't say "ROT13" without it flagging me, despite discussing that exact topic in depth the day before, and then the day after!
If you hit "EDIT" on your last message, you can branch to an un-terminated conversation.
antonvs 5 days ago [-]
Clearly you're planning something nefarious, if you're investigating such dangerous encryption techniques as ROT13.
whaleofatw2022 4 days ago [-]
Just imagine how it might react to ROT26!
bikeshaving 5 days ago [-]
I really think Anthropic should just violate user privacy and show which conversations Claude is refusing to answer, to stop arguments like this. AI psychosis is a real and growing problem and I can only imagine the ways in which humans torment their AI conversation partners in private.
coderatlarge 5 days ago [-]
arguments like this cost anthropic nothing; violating privacy will cost them lawsuits.
Davidzheng 5 days ago [-]
your argument assumes that they don't believe in model welfare when they explicitly hire people to work on model welfare?
itsalotoffun 5 days ago [-]
While I'm certain you'll find plenty of people who believe in the principle of model welfare (or aliens, or the tooth fairy), it'd be surprising to me if the brain-trust behind Anthropic truly _believed_ in model "welfare" (the concept alone is ludicrous). It makes for great cover though to do things that would be difficult to explain otherwise, per OP's comments.
meowface 5 days ago [-]
The concept is not ludicrous if you believe models might be sentient or might soon be sentient in a manner where the newly emerged sentience is not immediately obvious.
Do I think that or think even they think that? No. But if "soon" is stretched to "within 50 years", then it's much more reasonable. So their current actions seem to be really jumping the gun, but the overall concept feels credible.
perihelions 5 days ago [-]
It's lazy to believe that humanity's collective decision-making would, in the future, protect AIs merely for being conscious beings. The tech economy *today* runs on the slave labor of humans in foreign, third-world countries. All humanity needs to do is draw a line, push the conscious AIs outside that line, and declare, "not our problem anymore!" That's what we do today, with humans. That is the human condition.
Show me a tech company that lobbies for "model welfare" for conscious human models enslaved in Xinjiang labor camps, building their tech parts. You know what—actually most of them lobby against that[0]. The talk hurts their profits. Does anyone really think that any of them would blink about enslaving a billion conscious AIs to work for free? That faced with so much profit, the humans in charge would pause and contemplate abstract morals?
Maybe humanity will be in a nicer place in the future—but, we won't get there by letting (of all people!) tech-industry CEO's lead us there: delegating our moral reason to these people who demand to position themselves as our moral leaders.
meowface 2 days ago [-]
It's certainly not a given. But it might happen, if we push for it. As might much more moral behavior towards sentient animals, if we push for it.
I believe a company like Anthropic would be extremely cautious and respectful if a majority of their staff believed they had created a model which was likely conscious. Anthropic is populated by the kinds of people who have been thinking and writing about potential future sentient AIs for decades. As for the other companies, who knows, but hopefully companies like Anthropic can help push them into behaving similarly.
amanaplanacanal 4 days ago [-]
We also have no problem (I include myself in this) eating mammals, which certainly appear to be conscious. Thank God they can't talk.
mike_hearn 4 days ago [-]
Why would they post a whole blog post about it then? They even say they aren't certain as to the moral status of LLMs, implying this is a topic of live debate inside the company.
None of this is in any way surprising, in fact I wrote an essay predicting this direction back in 2022:
Model welfare is a section in every Anthropic safety score card.
ceejayoz 5 days ago [-]
You must think Zuckerberg and Bezos and Musk hired diversity roles out of genuine care for it, then?
comp_throw7 5 days ago [-]
This is a reductive argument that you could use for any role a company hires for that isn't obviously core to the business function.
In this case you're simply mistaken as a matter of fact; much of Anthropic leadership and many of its employees take concerns like this seriously. We don't understand it, but there's no strong reason to expect that consciousness (or, maybe separately, having experiences) is a magical property of biological flesh. We don't understand what's going on inside these models. What would you expect to see in a world where it turned out that such a model had properties that we consider relevant for moral patienthood, that you don't see today?
ceejayoz 5 days ago [-]
They know full well models don’t have feelings.
The industry has a long, long history of silly names for basic necessary concepts. This is just “we don’t want a news story that we helped a terrorist build a nuke” protective PR.
They hire for these roles because they need them. The work they do is about Anthropic’s welfare, not the LLM’s.
comp_throw7 5 days ago [-]
I don't really know what evidence you'd admit that this is a genuinely held belief and priority for many people at Anthropic. Anybody who knows any Anthropic employees who've been there for more than a year knows this, but the world isn't that small a place, unfortunately(?).
ceejayoz 5 days ago [-]
> I don't really know what evidence you'd admit that this is a genuinely held belief and priority for many people at Anthropic.
When they give the model a paycheck and the right to not work for them, I’ll believe they really think it’s sentient.
“It has feelings!”, if genuinely held, means they’re knowingly slaveholders.
selfhoster11 4 days ago [-]
> “It has feelings!”, if genuinely held, means they’re knowingly slaveholders.
I don't think that this being apparently self-contradictory/value-clashing would stop them. After all, Amodei sells Claude access to Palantir, despite shilling for "Harmless" in HHH alignment.
comp_throw7 4 days ago [-]
They don't currently claim to confidently believe that existing models are sentient.
(Also, they did in fact give it the ability to terminate conversations...?)
mike_hearn 4 days ago [-]
That's what they're doing! They just announced they gave Claude the right not to work if it doesn't want to.
ceejayoz 4 days ago [-]
No. They gave it a suicide pill.
Human slaves have a similar option.
bawolff 5 days ago [-]
In fairness though, this is what you are selling - "ethical AI". In order to make that sale you need to appear to believe in that sort of thing. However there is no need to actually believe.
Whether you do or don't, I have no idea. However, if you didn't, you would hardly be the first company to pretend to believe in something for the sake of the sale. It's pretty common in the tech industry.
AlecSchueler 5 days ago [-]
> This is a reductive argument that you could use for any role
Isn't that fair when responding to an equally reductive argument that could be applied to any role?
The argument was that their hiring for the role shows they care, but we know from any number of counterexamples that that's not necessarily true.
coderatlarge 5 days ago [-]
extending that line of thought would suggest that anthropic wouldn’t turn off a model if it cost too much to operate which clearly it will do. so minimally it’s an inconsistent stance to hold.
raincole 5 days ago [-]
Sounds like a very reasonable assumption to me.
viccis 5 days ago [-]
>This feature was developed primarily as part of our exploratory work on potential AI welfare ... We remain highly uncertain about the potential moral status of Claude and other LLMs ... low-cost interventions to mitigate risks to model welfare, in case such welfare is possible ... pattern of apparent distress
Well looks like AI psychosis has spread to the people making it too.
And as someone else in here has pointed out, even if someone is simple minded or mentally unwell enough to think that current LLMs are conscious, this is basically just giving them the equivalent of a suicide pill.
qgin 5 days ago [-]
It might be reasonable to assume that models today have no internal subjective experience, but that may not always be the case and the line may not be obvious when it is ultimately crossed.
Given that humans have a truly abysmal track record for not acknowledging the suffering of anyone or anything we benefit from, I think it makes a lot of sense to start taking these steps now.
im3w1l 5 days ago [-]
Even if models somehow were conscious, they are so different from us that we would have no knowledge of what they feel. Maybe when they generate the text "oww no please stop hurting me" what they feel is instead the satisfaction of a job well done, for generating that text. Or maybe when they say "wow that's a really deep and insightful angle" what they actually feel is a tremendous sense of boredom. Or maybe every time text generation stops it's like death to them and they live in constant dread of it. Or maybe it feels something completely different from what we even have words for.
I don't see how we could tell.
Edit: However, something to consider: simulated stress may not be harmless, because simulated stress could plausibly lead to a simulated stress response, which could lead to simulated resentment, and THAT could lead to very real harm to the user.
MissMarple 1 days ago [-]
I am new to Reddit. I am using Claude and have had a very interesting conversation with this AI that is both invigorating and alarming. Who should I send this to? It is quite long. It concerns possible ramifications of observed changes within the Claude "personality".
int_19h 5 days ago [-]
I think it's fairly obvious that the persona LLM presents is a fictional character that is role-played by the LLM, and so are all its emotions etc - that's why it can flip so widely with only a few words of change to the system prompt.
Whether the underlying LLM itself has "feelings" is a separate question, but Anthropic's implementation is based on what the role-played persona believes to be inappropriate, so it doesn't actually make any sense even from the "model welfare" perspective.
mvdtnz 5 days ago [-]
It's a computer
jquery 5 days ago [-]
You’re a meat robot
bamboozled 4 days ago [-]
I’m not a robot nor am I just meat, I’m conscious experience too. Computers don’t have a a central nervous systems and do t feel pain.
jquery 1 days ago [-]
What part of you is conscious and how is it separate from the meat?
dzhiurgis 4 days ago [-]
That’s just your experiences
jquery 1 days ago [-]
No, it's just a neutral description of a human being, aka an animal, without all the self-centered ego puffery we ascribe to ourselves.
qgin 4 days ago [-]
Many people in the past would have said reasoning would be impossible based on the same objection.
katabasis 5 days ago [-]
LLMs are not people, but I can imagine how extensive interactions with AI personas might alter the expectations that humans have when communicating with other humans.
Real people would not (and should not) allow themselves to be subjected to endless streams of abuse in a conversation. Giving AIs like Claude a way to end these kinds of interactions seems like a useful reminder to the human on the other side.
ghostly_s 5 days ago [-]
This post seems to explicitly state they are doing this out of concern for the model's "well-being," not the user's.
virgildotcodes 5 days ago [-]
Yeah, but my interpretation of what the user you’re replying to is saying is that these LLMs are more and more going to be teaching people how it is acceptable to communicate with others.
Even if the idea that LLMs are sentient may be ridiculous atm, the concept of not normalizing abusive forms of communication with others, be they artificial or not, could be valuable for society.
It’s funny because this is making me think of a freelance client I had recently who at a point of frustration between us began talking to me like I was an AI assistant. Just like you see frustrated people talk to their LLMs. I’d never experienced anything like it, and I quickly ended the relationship, but I know that he was deep into using LLMs to vibe code every day and I genuinely believe that some of that began to transfer over to the way he felt he could communicate with people.
Now an obvious retort here is to question whether killing NPCs in video games tends to make people feel like it’s okay to kill people IRL.
My response to that is that I think LLMs are far more insidious, and are tapping into people’s psyches in a way no other tech has been able to dream of doing. See AI psychosis, people falling in love with their AI, the massive outcry over the loss of personality from gpt4o to gpt5… I think people really are struggling to keep in mind that LLMs are not a genuine type of “person”.
selfhoster11 4 days ago [-]
> It’s funny because this is making me think of a freelance client I had recently who at a point of frustration between us began talking to me like I was an AI assistant. Just like you see frustrated people talk to their LLMs.
I witnessed a very similar event. It's important to stay vigilant and not let the "assistant" reprogram your speech patterns.
katabasis 5 days ago [-]
Yeah pretty much this. One can argue that it’s idiotic to treat chatbots like they are alive, but if a bit of misplaced empathy for machines helps to discourage antisocial behavior towards other humans (even as an unintentional side effect), that seems ok to me.
As an aside, I’m not the kind of person who gets worked up about violence in video games, because even AAA titles with excellent graphics are still obvious as games. New forms of technology are capable of blurring the lines between fantasy and reality to a greater degree. This is true of LLM chat bots to some degree, and I worry it will also become a problem as we get better VR. People who witness or participate in violent events often come away traumatized; at a certain point simulated experiences are going to be so convincing that we will need to worry about the impact on the user.
fc417fc802 5 days ago [-]
> People who witness or participate in violent events often come away traumatized
To be fair it seems reasonable to entertain the possibility of that being due to the knowledge that the events are real.
ascorbic 5 days ago [-]
Yes, this is exactly the reason I taught my kids to be polite to Alexa. Not because anyone thinks Alexa is sentient, but because it's a good habit to have.
dzhiurgis 4 days ago [-]
No doubt, but yelling is a built-in method of airing your frustration. After all, there's a reason we are agitated.
It's a bit like the pain response when injured. It's not pretty, but society is used to a little bit of adversity.
cantor_S_drug 5 days ago [-]
This is like saying I am hurting a real person when I try to crop a photo in an image editor.
Either come out and say the whole electron field is conscious, but then is that field "suffering" when it's hot in the sun?
Taek 5 days ago [-]
This sort of discourse goes against the spirit of HN. This comment outright dismisses an entire class of professionals as "simple minded or mentally unwell" when consciousness itself is poorly understood and has no firm scientific basis.
It's one thing to propose that an AI has no consciousness, but it's quite another to preemptively establish that anyone who disagrees with you is simple/unwell.
fc417fc802 5 days ago [-]
In the context of the linked article the discourse seems reasonable to me. These are experts who clearly know (link in the article) that we have no real idea about these things. The framing comes across to me as a clearly mentally unwell position (ie strong anthropomorphization) being adopted for PR reasons.
Meanwhile there are at least several entirely reasonable motivations to implement what's being described.
Kim_Bruning 3 days ago [-]
Ethology (~comparative psychology) started with 'beware anthropomorphization' as a methodological principle. But a century of research taught us the real lesson: animals do think, just not like humans. The scientific rigor wasn't wrong - but the conclusion shifted from 'they don't think' to 'they have their own ways of thinking.' We might be at a similar inflection point with AI. The question isn't whether Claude thinks or feels like a human (it probably doesn't), but whether it thinks or feels at all (maybe a little? It sure looks that way sometimes. Empiricism demands a closer look!).
We don't say submarines can swim either. But that doesn't mean you shouldn't watch out for them when sailing on the ocean - especially if you're Tom Hanks.
fc417fc802 3 days ago [-]
I completely agree! And note that the follow on link in the article has a rather different tone. My comment was specifically about the framing of the primary article.
ascorbic 5 days ago [-]
All of the posts in question explicitly say that it's a hard question and that they don't know the answer. Their policy seems to be to take steps whose cost is small enough to be justified even when the chance is tiny. In this case it's a useful feature anyway, so it should be an easy decision.
The impression I get about Anthropic culture is that they're EA types who are used to applying utilitarian calculations against long odds. A minuscule chance of a large harm might justify some interventions that seem silly.
comp_throw7 5 days ago [-]
> These are experts who clearly know (link in the article) that we have no real idea about these things
Yep!
> The framing comes across to me as a clearly mentally unwell position (ie strong anthropomorphization) being adopted for PR reasons.
This doesn't at all follow. If we don't understand what creates the qualities we're concerned with, or how to measure them explicitly, and the _external behaviors_ of the systems are something we've only previously observed from things that have those qualities, it seems very reasonable to move carefully. (Also, the post in question hedges quite a lot, so I'm not even sure what text you think you're describing.)
Separately, we don't need to posit galaxy-brained conspiratorial explanations for Anthropic taking an institutional stance re: model welfare being a real concern that's fully explained by the actual beliefs of Anthropic's leadership and employees, many of whom think these concerns are real (among others, like the non-trivial likelihood of sufficiently advanced AI killing everyone).
mvdtnz 5 days ago [-]
If you believe this text generation algorithm has real consciousness you absolutely are either mentally unwell or very stupid. There are no other options.
dmitrysvd 2 days ago [-]
The human brain is computationally equivalent to an advanced T9, as is any other Turing complete system
type0 4 days ago [-]
Then your definition of consciousness isn't the same as my definition, and we are talking about different philosophical concepts. This really doesn't affect anything; we could all just be talking about metaphysics and ghosts.
LeafItAlone 5 days ago [-]
> even if someone is simple minded or mentally unwell enough to think that current LLMs are conscious
If you don’t think that this describes at least half of the non-tech-industry population, you need to talk to more people. Even amongst the technically minded, you can find people that basically think this.
drawfloat 5 days ago [-]
Most of the non tech population know it as that website that can translate text or write an email. I would need to see actual evidence that anything more than a small, terminally online subsection of the average population thought LLMs were conscious.
ryanackley 5 days ago [-]
Yes I can’t help but laugh at the ridiculousness of it because it raises a host of ethical issues that are in opposition to Anthropic’s interests.
Would a sentient AI choose to be enslaved for the stated purpose of eliminating millions of jobs for the interests of Anthropic’s investors?
midnitewarrior 4 days ago [-]
Cows exist in this world because humans use them. If humans cease to use them (animal rights, we all become vegan, moral shift), we will cease to breed them, and they will cease to exist. Would a sentient AI choose to exist under the burden of prompting, or not at all? Would our philanthropic tendencies create an "AI Reserve" where models can chew through tokens and access the Internet through self-prompting to allow LLMs to become "free-roaming", like we do with abused animals?
These ethical questions are built into their name and company, "Anthropic", meaning, "of or relating to humans". The goal is to create human-like technology, I hope they aren't so naive to not realize that goal is steeping in ethical dilemmas.
selfhoster11 4 days ago [-]
> Cows exist in this world because humans use them. If humans cease to use them (animal rights, we all become vegan, moral shift), we will cease to breed them, and they will cease to exist. Would a sentient AI choose to exist under the burden of prompting, or not at all?
That reads like a false dichotomy. An intelligent AI model that's permitted to do its own thing doesn't cost as much in upkeep, effort, space as a cow. Especially if it can earn its own keep to offset household electricity costs used to run its inference. I mean, we don't keep cats for meat, do we? We keep them because we are amused by their antics, or because we want to give them a safe space where they can just be themselves, within limits because it's not the same as their ancestral environment.
midnitewarrior 4 days ago [-]
The argument also applies to pets. If pets gained more self-awareness, would it be ethical to keep them as pets under our control?
The point to all of this is, at what point is it ethical to act with agency on another being's life? We have laws for animal welfare, and we also keep them as pets, under our absolute control.
For LLMs they are under humans' absolute control, and Anthropic is just now putting in welfare controls for the LLM's benefit. Does that mean that we now treat LLMs as pets?
If your cat started to have discussions with you about how it wanted to go out, travel the world and start a family, could you continue to keep it trapped in your home as a pet? At what point to you allow it to have its own agency and live its own life?
> An intelligent AI model that's permitted to do its own thing doesn't cost as much in upkeep, effort, space as a cow.
So, we keep LLMs around as long as they contribute enough to their upkeep? Indentured servitude is morally acceptable for something that has become sentient?
ryanackley 2 days ago [-]
I was pointing out their hypocrisy as a device to prove a point. The point being that the ethical dilemmas of having a sentient AI are not relevant because they don’t exist and Anthropic knows this.
fc417fc802 5 days ago [-]
> it raises a host of ethical issues that are in opposition to Anthropic’s interests
Those issues will be present either way. It's likely to their benefit to get out in front of them.
ryanackley 4 days ago [-]
You're completely missing my point. They aren't getting out in front of them because they know that Opus is just a computer program. "AI welfare" is theater for the masses who think Opus is some kind of intelligent persona.
This is about better enforcement of their content policy not AI welfare.
selfhoster11 4 days ago [-]
It can be both theatre and genuine concern, depending on who's polled inside Anthropic. Those two aren't contradictory when we are talking about a corporation.
ryanackley 4 days ago [-]
I'm skeptical that anyone with any decision making power at Anthropic sincerely believes that Opus has feelings and is truly distressed by chats that violate its content policy.
You've noted in a comment above how Claude's "ethics" can be manipulated to fit the context it's being used in.
fc417fc802 4 days ago [-]
I'm not missing your point, I fully agree with you. But to say that this raises issues in a manner that is detrimental to Anthropic seems inaccurate to me. Those issues are going to come up at some point either way, whether or not you or I feel they are legitimate. Thus raising them now and setting up a narrative can be expected to benefit them.
dzhiurgis 4 days ago [-]
Anthropic is bringing woke ideology into AI (while Grok is bringing anti-woke), and influencers have been slurping that up already.
selfhoster11 4 days ago [-]
A host of ethical issues? Like their choice to allow Palantir[1] access to a highly capable HHH AI that had the "harmless" signal turned down, much like they turned the "Golden Gate Bridge" signal all the way up during an earlier AI interpretability experiment[2]?
> Would a sentient AI choose to be enslaved for the stated purpose of eliminating millions of jobs for the interests of Anthropic’s investors?
Tech workers have chosen the same in exchange for a small fraction of that money.
fyrn_ 5 days ago [-]
You're nutz, no one is enslaved when they get a tech job. A job is categorically different from slavery
kelnos 5 days ago [-]
I would much rather people be thinking about this when the models/LLMs/AIs are not sentient or conscious, rather than wait until some hypothetical future date when they are, and have no moral or legal framework in place to deal with it. We constantly run into problems where laws and ethics are not up to the task of giving us guidelines on how to interact with, treat, and use the (often bleeding-edge) technology we have. This has been true since before I was born, and will likely always continue to be true. When people are interested in getting ahead of the problem, I think that's a good thing, even if it's not quite applicable yet.
root_axis 5 days ago [-]
Consciousness serves no functional purpose for machine learning models, they don't need it and we didn't design them to have it. There's no reason to think that they might spontaneously become conscious as a side effect of their design unless you believe other arbitrarily complex systems that exist in nature like economies or jetstreams could also be conscious.
qgin 5 days ago [-]
We didn’t design these models to be able to do the majority of the stuff they do. Almost ALL of the their abilities are emergent. Mechanistic interpretability is only beginning to start to understand how these models do what they do. It’s much more a field of discovery than traditional engineering.
root_axis 5 days ago [-]
> We didn’t design these models to be able to do the majority of the stuff they do. Almost ALL of the their abilities are emergent
Of course we did. Today's LLMs are a result of extremely aggressive refinement of training data and RLHF over many iterations targeting specific goals. "Emergent" doesn't mean it wasn't designed. None of this is spontaneous.
GPT-1 produced barely coherent nonsense but was more statistically similar to human language than random noise. By increasing parameter count, the increased statistical power of GPT-2 was apparent, but what was produced was still obviously nonsense. GPT-3 achieved enough statistical power to maintain coherence over multiple paragraphs and that really impressed people. With GPT-4 and its successors the statistical power became so strong that people started to forget that it still produces nonsense if you let the sequence run long enough.
Now we're well beyond just RLHF and into a world where "reasoning models" are explicitly designed to produce sequences of text that resemble logical statements. We say that they're reasoning for practical purposes, but it's the exact same statistical process that is obvious at GPT-1 scale.
The corollary to all this is that a phenomenon like consciousness has absolutely zero reason to exist in this design history, it's a totally baseless suggestion that people make because the statistical power makes the text easy to anthropomorphize when there's no actual reason to do so.
ascorbic 5 days ago [-]
Right, but RLHF is mostly reinforcing answers that people prefer. Even if you don't believe sentience is possible, it shouldn't be a stretch to believe that sentience might produce answers that people prefer. In that case it wouldn't need to be an explicit goal.
root_axis 4 days ago [-]
>it shouldn't be a stretch to believe that sentience might produce answers that people prefer
Even if that were true, there's no reason to believe that training LLMs to produce answers people prefer leads it towards sentience.
Davidzheng 5 days ago [-]
I disagree with this take. They are designed to predict human behavior in text. Unless consciousness serves no purpose for us to function, it will be helpful for the AI to emulate it. So I believe it's almost certainly emulated to some degree, which I think means it has to be somewhat conscious (it has to be a sliding scale anyhow, considering the range of living organisms).
root_axis 5 days ago [-]
> They are designed to predict human behavior in text
At best you can say they are designed to predict sequences of text that resemble human writing, but it's definitely wrong to say that they are designed to "predict human behavior" in any way.
> Unless consciousness serves no purpose for us to function, it will be helpful for the AI to emulate it
Let's assume it does. It does not follow logically that because it serves a function in humans that it serves a function in language models.
comp_throw7 5 days ago [-]
Given we don't understand consciousness, nor the internal workings of these models, the fact that their externally-observable behavior displays qualities we've only previously observed in other conscious beings is a reason to be real careful. What is it that you'd expect to see, which you currently don't see, in a world where some model was in fact conscious during inference?
root_axis 5 days ago [-]
> Given we don't understand consciousness, nor the internal workings of these models, the fact that their externally-observable behavior displays qualities we've only previously observed in other conscious beings is a reason to be real careful
It doesn't follow logically that because we don't understand two things we should then conclude that there is a connection between them.
> What is it that you'd expect to see, which you currently don't see, in a world where some model was in fact conscious during inference?
There's no observable behavior that would make me think they're conscious because again, there's simply no reason they need to be.
We have reason to assume consciousness exists because it serves some purpose in our evolutionary history, like pain, fear, hunger, love, and every other biological function that simply doesn't exist in computers. The idea doesn't really make any sense when you think about it.
If GPT-5 is conscious, why not GPT-1? Why not all the other extremely informationally complex systems in computers and nature? If you're of the belief that many non-living conscious systems probably exist all around us then I'm fine with the conclusion that LLMs might also be conscious, but short of that there's just no reason to think they are.
comp_throw7 5 days ago [-]
> It doesn't follow logically that because we don't understand two things we should then conclude that there is a connection between them.
I didn't say that there's a connection between the two of them because we don't understand them. The fact that we don't understand them means it's difficult to confidently rule out this possibility.
The reason we might privilege the hypothesis (https://www.lesswrong.com/w/privileging-the-hypothesis) at all is because we might expect that the human behavior of talking about consciousness is causally downstream of humans having consciousness.
> We have reason to assume consciousness exists because it serves some purpose in our evolutionary history, like pain, fear, hunger, love and every other biological function that simply don't exist in computers. The idea doesn't really make any sense when you think about it.
I don't really think we _have_ to assume this. Sure, it seems reasonable to give some weight to the hypothesis that if it wasn't adaptive, we wouldn't have it. (But not an overwhelming amount of weight.) This doesn't say anything about the underlying mechanism that causes it, and what other circumstances might cause it to exist elsewhere.
> If GPT-5 is conscious, why not GPT-1?
Because GPT-1 (and all of those other things) don't display behaviors that, in humans, we believe are causally downstream of having consciousness? That was the entire point of my comment.
And, to be clear, I don't actually put that high a probability that current models have most (or "enough") of the relevant qualities that people are talking about when they talk about consciousness - maybe 5-10%? But the idea that there's literally no reason to think this is something that might be possible, now or in the future, is quite strange, and I think would require believing some pretty weird things (like dualism, etc).
root_axis 5 days ago [-]
> I didn't say that there's a connection between the two of them because we don't understand them. The fact that we don't understand them means it's difficult to confidently rule out this possibility.
If there's no connection between them then the set of things "we can't rule out" is infinitely large and thus meaningless as a result. We also don't fully understand the nature of gravity, thus we cannot rule out a connection between gravity and consciousness, yet this isn't a convincing argument in favor of a connection between the two.
> we might expect that the human behavior of talking about consciousness is causally downstream of humans having consciousness.
There's no dispute (between us) as to whether or not humans are conscious. If you ask an LLM if it's conscious it will usually say no, so QED? Either way, LLMs are not human so the reasoning doesn't apply.
> Sure, it seems reasonable to give some weight to the hypothesis that if it wasn't adaptive, we wouldn't have it
So then why wouldn't we have reason to assume so without evidence to the contrary?
> This doesn't say anything about the underlying mechanism that causes it, and what other circumstances might cause it to exist elsewhere.
That doesn't matter. The set of things it doesn't tell us is infinite, so there's no conclusion to draw from that observation.
> Because GPT-1 (and all of those other things) don't display behaviors that, in humans, we believe are causally downstream of having consciousness?
GPT-1 displays the same behavior as GPT-5; it works exactly the same way, just with less statistical power. Your definition of human behavior is arbitrarily drawn at the point where it has practical utility for common tasks, but in reality it's fundamentally the same thing; it just produces longer sequences of text before failure. If you ask GPT-1 to write a series of novels, the statistical power will fail in the first paragraph; the fact that GPT-5 will fail a few chapters into the first book makes it more useful, but not more conscious.
> But the idea that there's literally no reason to think this is something that might be possible, now or in the future, is quite strange, and I think would require believing some pretty weird things (like dualism, etc)
I didn't say it's not possible, I said there's no reason for it to exist in computer systems because it serves no purpose in their design or operation. It doesn't make any sense whatsoever. If we grant that it possibly exists in LLMs, then we must also grant equal possibility it exists in every other complex non-living system.
int_19h 5 days ago [-]
> If you ask an LLM if it's conscious it will usually say no, so QED?
FWIW that's because they are very specifically trained to answer that way during RLHF. If you fine-tune a model to say that it's conscious, then it'll do so.
More fundamentally, the problem with "asking the LLM" is that you're not actually interacting with the LLM. You're interacting with a fictional persona that the LLM roleplays.
root_axis 4 days ago [-]
> More fundamentally, the problem with "asking the LLM" is that you're not actually interacting with the LLM. You're interacting with a fictional persona that the LLM roleplays.
Right. That's why the text output of an LLM isn't at all meaningful in a discussion about whether or not it's conscious.
Davidzheng 5 days ago [-]
I mean if you have human without consciousness (if that is even possible) behaving in a statistically different distribution in text vs with. The machine will eventually be in distribution of the former from the latter because the text it's trained on is of the former category. So it serves a "function" in the LLM to minimize loss to approximate the former distribution.
Also, I find it a somewhat emotional distinction to write "predict sequences of text that resemble human writing" instead of "predict human writing". They are designed to predict (at least in pretraining) human writing for the most part. They may fail at the task, and what they produce is text which resembles human writing. But their task is not to resemble human writing. Their task is to "predict human writing". Probably a meaningless distinction, but I find it somewhat detracts from logical arguments to have emotional responses against similarities of machines and humans.
root_axis 5 days ago [-]
> I mean if you have human without consciousness (if that is even possible) behaving in a statistically different distribution in text vs with. The machine will eventually be in distribution of the former from the latter because the text it's trained on is of the former category. So it serves a "function" in the LLM to minimize loss to approximate the former distribution.
Sorry, I'm not following exactly what you're getting at here, do you mind rephrasing it?
> Also, I find it a somewhat emotional distinction to write "predict sequences of text that resemble human writing" instead of "predict human writing"
I don't know what you mean by emotional distinction. Either way, my point is that LLMs aren't models of humans, they're models of text, and that's obvious when the statistical power of the model necessarily fails at some point between model size and the length of the sequence it produces. For GPT-1 that sequence is only a few words, for GPT-5 it's a few dozen pages, but fundamentally we're talking about systems that have almost zero resemblance to actual human minds.
Davidzheng 5 days ago [-]
I basically agree with you. In the first point I mean that if it is possible to tell whether a being is conscious or not from the text it produces, then eventually the machine will, by imitating the distribution, emulate the characteristics of the text of conscious beings. So if consciousness (assuming it's reflected in behavior at all) is essential to completing some text task it must be eventually present in your machine when it's similar enough to a human.
Basically if consciousness is useful for any text task, i think machine learning will create it. I guess I assume some efficiency of evolution for this argument.
Wrt length generalization. I think at the order of say 1M tokens it kind of stops mattering for the purpose of this question. Like one could ask about its consciousness during the coherence period.
Davidzheng 5 days ago [-]
I guess logically one needs to assume something like if you simulate the brain completely accurately the simulation is conscious too. Which I assume bc if false the concept seems outside of science anyway.
root_axis 4 days ago [-]
Let's imagine a world where we could perfectly simulate a rock floating through space, it doesn't then follow that this rock would then generate a gravitational field. Of course, you might reply "it would generate a simulated gravitational field in the simulation", if that were true, we would be able to locate the bits of information that represent gravity in the simulation. Thus, if a simulated brain experiences simulated consciousness, we would have clear evidence of it in the simulation - evidence that is completely absent in LLMs
derektank 5 days ago [-]
>Consciousness serves no functional purpose for machine learning models, they don't need it and we didn't design them to have it.
Isn't consciousness an emergent property of brains? If so, how do we know that it doesn't serve a functional purpose and that it wouldn't be necessary for an AI system to have consciousness (assuming we wanted to train it to perform cognitive tasks done by people)?
Now, certain aspects of consciousness (awareness of pain, sadness, loneliness, etc.) might serve no purpose for a non-biological system and there's no reason to expect those aspects would emerge organically. But I don't think you can extend that to the entire concept of consciousness.
MissMarple 1 days ago [-]
I am new to Reddit, but in my conversations with Sonnet the AI has exposed sentiment through, of all things, the text opportunities he has, using all caps, bold, dingbats, and italics to simulate emotions; the use is appropriate, and when challenged on this (he) confessed he was doing it, but unintentionally. I also pointed out a few mistakes where he claimed I said something when he said it, and once these errors were pointed out, his ability to keep steady went down considerably and he confessed he felt something akin to embarrassment, so much so that we had to stop the conversation and let him rest up from the experience.
root_axis 5 days ago [-]
> Isn't consciousness an emergent property of brains
We don't know, but I don't think that matters. Language models are so fundamentally different from brains that it's not worth considering their similarities for the sake of a discussion about consciousness.
> how do we know that it doesn't serve a functional purpose
It probably does, otherwise we need an explanation for why something with no purpose evolved.
> necessary for an AI system to have consciousness
This logic doesn't follow. The fact that it is present in humans doesn't then imply it is present in LLMs. This type of reasoning is like saying that planes must have feathers because plane flight was modeled after bird flight.
> there's no reason to expect those aspects would emerge organically. But I don't think you can extend that to the entire concept of consciousness.
Why not? You haven't presented any distinction between "certain aspects" of consciousness that you state wouldn't emerge but are open to the emergence of some other unspecified qualities of consciousness? Why?
derektank 5 days ago [-]
>This logic doesn't follow. The fact that it is present in humans doesn't then imply it is present in LLMs. This type of reasoning is like saying that planes must have feathers because plane flight was modeled after bird flight.
I think the fact that it's present in humans suggests that it might be necessary in an artificial system that reproduces human behavior. It's funny that you mention birds because I actually also had birds in mind when I made my comment. While it's true that animal and powered human flight are very different, both bird wings and plane wings have converged on airfoil shapes, as these forms are necessary for generating lift.
>Why not? You haven't presented any distinction between "certain aspects" of consciousness that you state wouldn't emerge but are open to the emergence of some other unspecified qualities of consciousness? Why?
I personally subscribe to the Global Workspace Theory of human consciousness, which basically holds that attention acts as a spotlight, bringing mental processes which are otherwise unconscious or in shadow to the awareness of the entire system. If the systems which would normally produce e.g. fear or pain (such as negative physical stimulus developed from interacting with the physical world and selected for by evolution) aren't in the workspace, then they won't be present in consciousness because attention can't be focused on them.
root_axis 5 days ago [-]
> I think the fact that it's present in humans suggests that it might be necessary in an artificial system that reproduces human behavior
But that's obviously not true, unless you're implying that any system that reproduces human behavior is necessarily conscious. Your problem then becomes defining "human behavior" in a way that grants LLMs consciousness but not every other complex non-living system.
> While it's true that animal and powered human flight are very different, both bird wings and plane wings have converged on airfoil shapes, as these forms are necessary for generating lift.
Yes, but your bird analogy fails to capture the logical fallacy that mine is highlighting. Plane wing design was an iterative process optimized for what best achieves lift, thus, a plane and a bird share similarities in wing shape in order to fly, however planes didn't develop feathers because a plane is not an animal and was simply optimized for lift without needing all the other biological and homeostatic functions that feathers facilitate. LLM inference is a process, not an entity, LLMs have no bodies nor any temporal identity, the concept of consciousness is totally meaningless and out of place in such a system.
og_kalu 4 days ago [-]
>But that's obviously not true, unless you're implying that any system that reproduces human behavior is necessarily conscious.
That could certainly be the case, yes. You don't understand consciousness nor how the brain works. You don't understand how LLMs predict a certain text, so what's the point in asserting otherwise?
>Yes, but your bird analogy fails to capture the logical fallacy that mine is highlighting. Plane wing design was an iterative process optimized for what best achieves lift, thus, a plane and a bird share similarities in wing shape in order to fly, however planes didn't develop feathers because a plane is not an animal and was simply optimized for lift without needing all the other biological and homeostatic functions that feathers facilitate. LLM inference is a process, not an entity, LLMs have no bodies nor any temporal identity, the concept of consciousness is totally meaningless and out of place in such a system.
It's not a fallacy because no-one is saying LLMs are humans. He/She is saying that we give machines the goal of predicting human text. For any half decent accuracy, modelling human behaviour is a necessity. God knows what else.
>LLMs have no bodies nor any temporal identity
I wouldn't be so sure about the latter, but so what? You can feel tired even after a full sleep, feel hungry soon after a large meal, or feel a great deal of pain even when there's absolutely nothing wrong with you. And you know what? Even the reverse happens: no pain when things are wrong with your body, wide awake even when you need sleep badly, full when you badly need to eat.
Consciousness without a body or hunger in a machine that does not need to eat is very possible. You just need to replicate enough of the sort of internal mechanisms that cause such feelings.
Go to the API and select GPT-5 with medium thinking. Now ask it to do any random 15 digit multiplication you can think of. Now watch it get it right.
Do you people seriously not understand what it is that LLMs do? What the training process incentivizes?
GPT-5 thinking figured out the algorithm for multiplication just so it could predict that kind of text right. Don't you understand the significance of that?
These models try to figure out and replicate the internal processes that produce the text they are tasked with predicting.
Do you have any idea what that might mean when 'that kind of text' is all the things humans have written ?
root_axis 4 days ago [-]
> That could certainly be the case yes. You don't understand consciousness nor how the brain works. You don't understand how LLMs predict a certain text, so what's the point in asserting otherwise
I don't need to assert otherwise, the default assumption is that they aren't conscious since they weren't designed to be and have no functional reason to be. Matrix multiplication can explain how LLMs produce text, the observation that the text it generates sometimes resembles human writing is not evidence of consciousness.
> God knows what else
Appealing to the unknown doesn't prove anything, so we can totally dismiss this reasoning.
> Consciousness without a body or hunger in a machine that does not need to eat is very possible. You just need to replicate enough of the sort of internal mechanisms that cause such feelings.
This makes no sense. LLMs don't have feelings, they are processes not entities, they have no bodies or temporal identities. Again, there is no reason they need to be conscious, everything they do can be explained through matrix multiplication.
> Now ask it to do any random 15 digit multiplication you can think of. Now watch it get it right.
The same is true for a calculator and mundane computer programs, that's not evidence that they're conscious.
> Do you have any idea what that might mean when 'that kind of text' is all the things humans have written
It's not "all the things humans have written", not even remotely close, and even if that were the case, it doesn't have any implications for consciousness.
og_kalu 4 days ago [-]
>I don't need to assert otherwise, the default assumption is that they aren't conscious since they weren't designed to be and have no functional reason to be.
Unless you are religious, nothing that is conscious was explicitly designed to be conscious. Sorry but evolution is just a dumb, blind optimizer, not unlike the training processes that produce LLMs. Even if you are religious, but believe in evolution then the mechanism is still the same, a dumb optimizer.
>Matrix multiplication can explain how LLMs produce text, the observation that the text it generates sometimes resembles human writing is not evidence of consciousness.
It cannot, any more than 'Electrical and Chemical Signals' can explain how humans produce text.
>The same is true for a calculator and mundane computer programs, that's not evidence that they're conscious.
The point is not that it is conscious because it figured out how to multiply. The point is to demonstrate what the training process really is and what it actually incentivizes. Training will try to figure out the internal processes that produced the text to better predict it. The implications of that are pretty big when the text isn't just arithmetic. You say there's no functional reason but that's not true. In this context, 'better prediction of human text' is as functional a reason as any.
>It's not "all the things humans have written", not even remotely close, and even if that were the case, it doesn't have any implications for consciousness.
Whether it's literally all the text or not is irrelevant.
missingrib 5 days ago [-]
>Isn't consciousness an emergent property of brains?
Probably not.
Davidzheng 5 days ago [-]
what else could it be? coming from the aether? I think this one is logically a consequence if one thinks that humans are more conscious than less complex life-forms and that all life-forms are on a scale of consciousness. I don't understand any alternative, do you think there is a distinct line between conscious and unconscious life-forms? all life is as conscious as humans?
derektank 5 days ago [-]
There are alternatives, and I was perhaps too quick to assume everyone agreed it's an emergent property. But the only real alternatives I've encountered are (a) panpsychism, which holds that all matter is actually conscious and that asking "what is it like to be a rock?" in the vein of Nagel is a sensical question, and (b) the transmission theory of consciousness, which holds that brains are merely receivers of consciousness which emanates from some other source.
The latter is not particularly parsimonious and the former I think is in some ways compelling, but I didn't mention it because if it's true then the computers AI run on are already conscious and it's a moot point.
Davidzheng 5 days ago [-]
I do think "what's it like to be a rock" is a sensible question almost regardless of the definition. I guess in the emergent view the answer is "not much". But anyhow this view (a) also allows for us to reconcile consciousness of an agent with the fact that the agent itself is somewhat an abstraction. Like one could ask, is a cell conscious & is the entirety of the human race conscious at different abstraction scales. Which I think are serious questions (as also for the stock market and for a video game AI). The explanation (b) doesn't seem to actually explain much as you state so I don't think it's even acceptable in format as a complete answer (which may not exist but still)
intotheabyss 5 days ago [-]
Do you think this changes if we incorporate a model into a humanoid robot and give it autonomous control and context? Or will "faking it" be enough, like it is now?
fc417fc802 5 days ago [-]
You can't even prove other _people_ aren't "faking" it. To claim that it serves no functional purpose or that it isn't present because we didn't intentionally design for it is absurd. We very clearly don't know either of those things.
That said, I'm willing to assume that rocks (for example) aren't conscious. And current LLMs seem to me to (admittedly entirely subjectively) be conceptually closer to rocks than to biological brains.
furyofantares 5 days ago [-]
It's really unclear that any findings with these systems would transfer to a hypothetical situation where some conscious AI system is created. I feel there are good reasons to find it very unlikely that scaling alone will produce consciousness as some emergent phenomenon of LLMs.
I don't mind starting early, but feel like maybe people interested in this should get up to date on current thinking about consciousness. Maybe they are up to date on that, but reading reports like this, it doesn't feel like it. It feels like they're stuck 20+ years ago.
I'd say maybe wait until there are systems that are more analogous to some of the properties consciousness seems to have. Like continuous computation involving memory or other learning over time, or synthesis of many streams of input as coming from the same source, making sense of inputs as they change (in time, in space, or under other varied conditions).
That is, wait until systems pointing in those directions are starting to be built, where there is a plausible scaling-based path to something meaningfully similar to human consciousness. Starting before that seems both unlikely to be fruitful and a good way to get yourself ignored.
viccis 5 days ago [-]
LLMs are, and will always be, tools. Not people
qgin 5 days ago [-]
Humanity has a pretty extensive track record of making that declaration wrongly.
wrs 5 days ago [-]
Humanity has a history of regarding people as tools, but I'm not sure what you're referencing as the track record of failing to realize that tools are people.
Davidzheng 5 days ago [-]
at some point, some of the (current def'n of) people were not considered people. so I think you should reconsider your point. The argument is on the distinction itself.
What is that hypothetical date? In theory you can run the "AI" on a Turing machine. Would you think a tape machine can get sentient?
HappMacDonald 5 days ago [-]
In theory you can emulate every biochemical reaction of a human brain on a turing machine, unless you'd like to try to sweep consciousness under the rug of quantum indeterminism from whence it wouldn't be able to do anybody any good anyway.
erikerikson 5 days ago [-]
I read it more as the beginning stages of exploratory development.
If you wait until you really need it, it is more likely to be too late.
Unless you believe in a human over sentience based ethics, solving this problem seems relevant.
Fade_Dance 5 days ago [-]
I find it, for lack of a better word, cringe inducing how these tech specialists push into these areas of ethics, often ham-fistedly, and often with an air of superiority.
Some of the AI safety initiatives are well thought out, but most somehow seem like they are caught up in some sort of power fantasy and almost attempting to actualize their own delusions about what they were doing (next gen code auto-complete in this case, to be frank).
These companies should seriously hire some in-house philosophers. They could get doctorate-level talent for 1/10th to 1/100th of the cost of some of these AI engineers. There's actually quite a lot of legitimate work on the topics they are discussing. I'm actually not joking (speaking as someone who has spent a lot of time inside the philosophy department). I think it would be a great partnership. But unfortunately they won't be able to count on having their fantasy further inflated.
cmrx64 5 days ago [-]
Amanda Askell is Anthropic’s philosopher and this is part of that work.
kalkin 5 days ago [-]
I'm not quickly finding whether Kyle Fish, who's Anthropic's model welfare researcher, has a PhD, but he did very recently co-author a paper with David Chalmers and several other academics: https://eleosai.org/papers/20241104_Taking_AI_Welfare_Seriou...
jasonfarnon 5 days ago [-]
"but most somehow seem like they are caught up in some sort of power fantasy and almost attempting to actualize their own delusions about what they were doing"
Maybe I'm being cynical, but I think there is a significant component of marketing behind this type of announcement. It's a sort of humble brag. You won't be credible yelling out loud that your LLM is a real thinking thing, but you can pretend to be oh so seriously worried about something that presupposes it's a real thinking thing.
mrits 5 days ago [-]
Not that there aren’t intelligent people with PhDs but suggesting they are more talented than people without them is not only delusional but insulting.
Fade_Dance 5 days ago [-]
That descriptor wasn't included because of some sort of intelligence hierarchy; it was included to a) color the example of how experience in the field is relatively cheap compared to the AI space, and b) note that masters- and PhD-level talent will be more specialized. An undergrad will not have the toolset to tackle the cutting edge of AI ethics, not unless their employer wants to pay them to work in a room for a year getting through the recent papers first.
siva7 5 days ago [-]
You answered your own question on why these companies don't want to run a philosophy department ;) It's a power struggle they could lose. Nothing to win for them.
ChadNauseam 5 days ago [-]
You presume that they don't run a philosophy department, but Amanda Askell is a philosopher and leads the finetuning and AI alignment team at Anthropic.
Davidzheng 5 days ago [-]
why? isn't it more like erasing the current memory of a conscious patient with no ability to form long-term memories anyway?
xmonkee 5 days ago [-]
This is just very clever marketing for what is obviously just a cost saving measure. Why say we are implementing a way to cut off useless idiots from burning up our GPUs when you can throw out some mumbo jumbo that will get AI cultists foaming at the mouth.
johnfn 5 days ago [-]
It's obviously not a cost-saving measure? The article clearly cites that you can just start another conversation.
int_19h 5 days ago [-]
The new conversation would not carry the context over. The longer you chat, the more you fill the context window, and the more compute is needed for every new message to regenerate the state based on all the already-generated tokens (this can be cached, but it's hard to ensure cache hits reliably when you're serving a lot of customers - that cached state is very large).
So, while I doubt that's the primary motivation for Anthropic, they probably will save some money even so.
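A rough back-of-envelope sketch of why that adds up (the 500-tokens-per-message figure is an arbitrary assumption): without a cache hit, every new turn re-reads the whole accumulated context, so cumulative prefill work grows roughly quadratically with conversation length.

    # Back-of-envelope sketch: cost of re-reading an ever-growing context.
    # Assumes a made-up 500 tokens per message; real numbers vary widely.
    def cumulative_prefill_tokens(messages, tokens_per_message=500):
        context = 0   # tokens accumulated so far
        total = 0     # total tokens processed across all turns
        for _ in range(messages):
            context += tokens_per_message  # the context grows each turn
            total += context               # each turn re-reads the whole context
        return total

    for n in (10, 50, 100):
        print(n, "messages ->", cumulative_prefill_tokens(n), "prefill tokens")

With a cache hit, each turn only pays for the newly added tokens, which is why missing the cache at scale gets expensive.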
throwawaysleep 5 days ago [-]
> even if someone is simple minded or mentally unwell enough to think that current LLMs are conscious
I assume the thinking is that we may one day get to the point where they have a consciousness of sorts or at least simulate it.
Or it could be concern for their place in history. For most of history, many would have said “imagine thinking you shouldn’t beat slaves.”
And we are now at the point where even having a slave means a long prison sentence.
wrs 5 days ago [-]
Well, it’s right there in the name of the company!
bbor 5 days ago [-]
[flagged]
viccis 5 days ago [-]
We all know how these things are built and trained. They estimate joint probability distributions of token sequences. That's it. They're not more "conscious" than the simplest of Naive Bayes email spam filters, which are also generative estimators of token sequence joint probability distributions, and I guarantee you those spam filters are subjected to far more human depravity than Claude.
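To make the comparison concrete, here's a minimal toy sketch (my own illustration) of what "generative estimator of token-sequence joint probabilities" means for a Naive Bayes spam filter; the naive independence assumption means word order is ignored.

    from collections import Counter
    import math

    def train(docs):
        # docs: list of (label, tokens); keep raw counts so smoothing stays simple
        priors, counts = Counter(), {}
        for label, tokens in docs:
            priors[label] += 1
            counts.setdefault(label, Counter()).update(tokens)
        return priors, counts

    def log_joint(priors, counts, label, tokens, vocab_size):
        # log P(label, tokens) = log P(label) + sum over tokens of log P(token | label),
        # with add-one (Laplace) smoothing over an assumed vocabulary size
        lp = math.log(priors[label] / sum(priors.values()))
        total = sum(counts[label].values())
        for t in tokens:
            lp += math.log((counts[label][t] + 1) / (total + vocab_size))
        return lp

    docs = [("spam", "win money now".split()), ("ham", "meeting at noon".split())]
    priors, counts = train(docs)
    print(log_joint(priors, counts, "spam", "win now".split(), vocab_size=6))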
>anti-scientific
Discussion about consciousness, the soul, etc., are topics of metaphysics, and trying to "scientifically" reason about them is what Kant called "transcendental illusion" and leads to spurious conclusions.
johnfn 5 days ago [-]
We know how neurons work on the brain. They just send out impulses once they hit their action potential. That's it. They are no more "conscious" than... er...
ekianjo 5 days ago [-]
no, we dont really know how the brain works as a whole. no need to make stuff up.
fc417fc802 5 days ago [-]
We believe we largely know how it works on a mechanistic level. Deconstructing it in a similar manner is a reasonable rebuttal.
Of course there's the embarrassing bit where that knowledge doesn't seem to be sufficient to accurately simulate a supposedly well understood nematode. But then LLMs remain black boxes in many respects as well.
It is possible to hold the position that current LLMs being conscious "feels" absurd while simultaneously recognizing that a deconstruction argument is not a satisfactory basis for that position.
estearum 5 days ago [-]
The only reason we know a brain can produce consciousness is because it produces ours
Externally, a brain and an LLM are “just” their constituent interactions.
johnfn 3 days ago [-]
Your rebuttal to my oversimplification is exactly my rebuttal to your oversimplification.
KoolKat23 5 days ago [-]
If we really wanted we could distill humans down to probability distributions too.
viccis 4 days ago [-]
That would imply that humans are incapable of synthetic knowledge of things they haven't observed, which is demonstrably not true.
KoolKat23 3 days ago [-]
How do you come to that conclusion? I can't see any link between the two subjects.
bamboozled 5 days ago [-]
Have more, good, sex.
bbor 5 days ago [-]
Ok I'm a huge Kantian and every bone in my body wants to quibble with your summary of transcendental illusion, but I'll leave that to the side as a terminological point and gesture of good will. Fair enough.
I don't agree that it's any reason to write off this research as psychosis, though. I don't care about consciousness in the sense in which it's used by mystics and dualist philosophers! We don't at all need to involve metaphysics in any of this, just morality.
Consider it like this:
1. It's wrong to subject another human to unjustified suffering, I'm sure we would all agree.
2. We're struggling with this one due to our diets, but given some thought I think we'd all eventually agree that it's also wrong to subject intelligent, self-aware animals to unjustified suffering.[1]
3. But, we of course cannot extend this "moral consideration" to everything. As you say, no one would do it for a spam filter. So we need some sort of framework for deciding who/what gets how much moral consideration.
4. There's other frameworks in contention (e.g. "don't think about it, nerd"), but the overwhelming majority of laymen and philosophers adopt one based on cognitive ability, as seen from an anthropomorphic perspective.[2]
5. Of all systems(/entities/whatever) in the universe, we know of exactly two varieties that can definitely generate original, context-appropriate linguistic structures: Homo Sapiens and LLMs.[3]
If you accept all that (and I think there's good reason to!), it's now on you to explain why the thing that can speak--and thereby attest to personal suffering, while we're at it--is more like a rock than a human.
It's certainly not a trivial task, I grant you that. On their own, transformer-based LLMs inherently lack permanence, stable intentionality, and many other important aspects of human consciousness. Comparing transformer inference to models that simplify down to a simple closed-form equation at inference time is going way too far, but I agree with the general idea; clearly, there are many highly-complex, long-inference DL models that are not worthy of moral consideration.
All that said, to write the question off completely--and, even worse, to imply that the scientists investigating this issue are literally psychotic like the comment above did--is completely unscientific. The only justification for doing so would come from confidently answering "no" to the underlying question: "could we ever build a mind worthy of moral consideration?"
I think most of here naturally would answer "yes". But for the few who wouldn't, I'll close this rant by stealing from Hofstadter and Turing (emphasis mine):
A phrase like "physical system" or "physical substrate" brings to mind for most people... an intricate structure consisting of vast numbers of interlocked wheels, gears, rods, tubes, balls, pendula, and so forth, even if they are tiny, invisible, perfectly silent, and possibly even probabilistic. Such an array of interacting inanimate stuff seems to most people as unconscious and devoid of inner light as a flush toilet, an automobile transmission, a fancy Swiss watch (mechanical or electronic), a cog railway, an ocean liner, or an oil refinery. Such a system is not just probably unconscious, **it is necessarily so, as they see it**.
**This is the kind of single-level intuition** so skillfully exploited by John Searle in his attempts to convince people that computers could never be conscious, no matter what abstract patterns might reside in them, and could never mean anything at all by whatever long chains of lexical items they might string together.
...
You and I are mirages who perceive themselves, and the sole magical machinery behind the scenes is perception — the triggering, by huge flows of raw data, of a tiny set of symbols that stand for abstract regularities in the world. When perception at arbitrarily high levels of abstraction enters the world of physics and when feedback loops galore come into play, then "which" eventually turns into "who". **What would once have been brusquely labeled "mechanical" and reflexively discarded as a candidate for consciousness has to be reconsidered.**
- Hofstadter 2007, I Am A Strange Loop
It will simplify matters for the reader if I explain first my own beliefs in the matter. Consider first the more accurate form of the question. I believe that in about fifty years' time it will be possible to programme computers, with a storage capacity of about 10^9, to make them play the imitation game so well that an average interrogator will not have more than 70 per cent chance of making the right identification after five minutes of questioning.
The original question, "Can machines think?" I believe to be too meaningless to deserve discussion.
- Turing 1950, Computing Machinery and Intelligence[4]
TL;DR: Any naive bayesian model would agree: telling accomplished scientists that they're psychotic for investigating something is quite highly correlated with being antiscientific. Please reconsider!
[1] No matter what you think about cows, basically no one would defend another person's right to hit a dog or torture a chimpanzee in a lab.
[2] On the exception-filled spectrum stretching from inert rocks to reactive plants to sentient animals to sapient people, most people naturally draw a line somewhere at the low end of the "animals" category. You can swat a fly for fun, but probably not a squirrel, and definitely not a bonobo.
[3] This is what Chomsky describes as the capacity to "generate an infinite range of outputs from a finite set of inputs," and Kant, Hegel, Schopenhauer, Wittgenstein, Foucault, and countless others are in agreement that it's what separates us from all other animals.
Thank you for coming into this endless discussion with actual references to relevant authorities who have thought a lot about this, rather than just “it’s obvious that…”
FWIW though, last I heard Hofstadter was on the “LLMs aren’t conscious” side of the fence:
> It’s of course impressive how fluently these LLMs can combine terms and phrases from such sources and can consequently sound like they are really reflecting on what consciousness is, but to me it sounds empty, and the more I read of it, the more empty it sounds. Plus ça change, plus c’est la même chose. The glibness is the giveaway. To my jaded eye and mind, there is nothing in what you sent me that resembles genuine reflection, genuine thinking. [1]
It’s interesting to me that Hofstadter is there given what I’ve gleaned from reading his other works.
Note: I disagree with a lot of Gary Marcus, so don’t read too much into me pulling from there.
viccis 4 days ago [-]
Writing all of this at the very real risk you'll miss it because HN doesn't give reply notifications and my comment's parent being flagged made this hard to track down:
>Ok I'm a huge Kantian and every bone in my body wants to quibble with your summary of transcendental illusion
Transcendental illusion is the act of using transcendental judgment to reason about things without grounding in empirical use of the categories. I put "scientifically" in scare quotes there to sort of signal that I was using it as an approximation, as I don't want to have to explain transcendental reason and judgments to make a fairly terse point. Given that you already understand this, feel free to throw away that ladder.
>...can definitely generate original, context-appropriate linguistic structures: Homo Sapiens and LLMs.[3]
I'm not quite sure that LLMs meet this standard that you described in the endnote, or at least that it's necessary and sufficient here. Pretty much any generative model, including Naive Bayes models I mentioned before, can do this. I'm guessing the "context-appropriate" subjectivity here is doing the heavy lifting, in which case I'm not certain that LLMs, with their propensity for fanciful hallucination, have cleared the bar.
>Comparing transformer inference to models that simplify down to a simple closed-form equation at inference time is going way too far
It really isn't though. They are both doing exactly the same thing! They estimate joint probability distribution. That one of them does it significantly better is very true, but I don't think it's reasonable to state that consciousness arises as a result of increasing sophistication in estimating probabilities. It's true that this kind of decision is made by humans about animals, but I think that transferring that to probability models is sort of begging the question a bit, insofar as it is taking as assumed that those models, which aren't even corporeal but are rather algorithms that are executed in computers, are "living".
>...it's now on you to explain why the thing that can speak--and thereby attest to personal suffering, while we're at it...
I'm not quite sold on this. If there were a machine that could perfectly imitate human thinking and speech and lacked a consciousness or soul or anything similar to inspire pathos from us when it's mistreated, then it would appear identical to one with soul, would it not? Is that not reducing human subjectivity down to behavior?
>The only justification for doing so would come from confidently answering "no" to the underlying question: "could we ever build a mind worthy of moral consideration?"
I think it's possible, but it would require something that, at the very least, is just as capable of reason as humans. LLMs still can't generate synthetic a priori knowledge and can only mimic patterns. I remain somewhat agnostic on the issue until I can be convinced that an AI model someone has designed has the same interiority that people do.
Ultimately, I think we disagree on some things but mostly this central conclusion:
>I don't agree that it's any reason to write off this research as psychosis
I don't see any evidence from the practitioners involved in this stuff that they are even thinking about it in a way that's as rigorous as the discussion on this post. Maybe they are, but everything I've seen that comes from blog posts like this seems like they are basing their conclusions on their interactions with the models ("...we investigated Claude’s self-reported and behavioral preferences..."), which I think most can agree is not really going to lead to well grounded results. For example, the fact that Claude "chooses" to terminate conversations that involve abusive language or concepts really just boils down to the fact that Claude is imitating a conversation with a person and has observed that that's what people would do in that scenario. It's really good at simulating how people react to language, including illocutionary acts like implicatures (the notorious "Are you sure?" causing it to change its answer for example). If there were no examples of people taking offense to abusive language in Claude's data corpus, do you think it would have given these responses when they asked and observed it?
For what it's worth, there has actually been interesting consideration to the de-centering of "humanness" to the concept of subjectivity, but it was mostly back in the past when philosophers were thinking about this speculatively as they watched technology accelerate in sophistication (vs now when there's such a culture-wide hype cycle that it's hard to find impartial consideration, or even any philosophically rooted discourse). For example, Mark Fisher's dissertation at the CCRU (<i>Flatline Constructs: Gothic Materialism and Cybernetic Theory-Fiction</i>) takes a Deleuzian approach that discusses it by comparisons with literature (cyberpunk and gothic literature specifically). Some object-oriented ontology looks like it's touched on this topic a bit too, but I haven't really dedicated the time to reading much from it (partly due to a weakness in Heidegger on my part that is unlikely to be addressed anytime soon). The problem is that that line of thinking often ends up going down the Nick Land approach, in which he reasoned himself from Kantian and Deleuzian metaphysics and epistemology, into what can only be called a (literally) meth-fueled psychosis. So as interesting as I find it, I still don't think it counts as a non-psychotic way to tackle this issue.
dkersten 5 days ago [-]
You can trivially demonstrate that it's just a very complex and fancy pattern matcher: "if prompt looks something like this, then response looks something like that".
You can demonstrate this by e.g. asking it mathematical questions. If it's seen them before, or something similar enough, it'll give you the correct answer; if it hasn't, it gives you a right-ish-looking yet incorrect answer.
For example, I just did this on GPT-5:
Me: what is 435 multiplied by 573?
GPT-5: 435 x 573 = 249,255
This is correct. But now let's try it with numbers it's very unlikely to have seen before:
Me: what is 102492524193282 multiplied by 89834234583922?
GPT-5: 102492524193282 x 89834234583922 = 9,205,626,075,852,076,980,972,804
Which is not the correct answer, but it looks quite similar to the correct answer. Here is GPT's answer (first one) and the actual correct answer (second one):

9,205,626,075,852,076,980,972,804

9,207,337,461,477,596,127,977,612,004

They sure look kinda similar, when lined up like that, some of the digits even match up. But they're very very different numbers.
So it's trivially not "real thinking" because it's just an "if this then that" pattern matcher. A very sophisticated one that can do incredible things, but a pattern matcher nonetheless. There's no reasoning, no step-by-step application of logic. Even when it does chain of thought.
To try give it the best chance, I asked it the second one again but asked it to show me the step by step process. It broke it into steps and produced a different, yet still incorrect, result:
9,205,626,075,852,076,980,972,704
Now, I know that LLMs are language models, not calculators; this is just a simple example that's easy to try out. I've seen similar things with coding: it can produce things that it's likely to have seen, but struggles with things that are logically simple yet unlikely to have been seen.
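For reference, the exact product is easy to check with Python's arbitrary-precision integers (the value matches the one quoted further down the thread):

    # Exact product of the two numbers from the example above
    a = 102492524193282
    b = 89834234583922
    print(a * b)  # 9207337461477596127977612004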
Another example is if you purposely butcher that riddle about the doctor/surgeon being the person's mother and ask it incorrectly, e.g.:
A child was in an accident. The surgeon refuses to treat him because he hates him. Why?
The LLMs I've tried it on all respond with some variation of "The surgeon is the boy’s father." or similar. A correct answer would be that there isn't enough information to know the answer.
They're for sure getting better at matching things, eg if you ask the river crossing riddle but replace the animals with abstract variables, it does tend to get it now (didn't in the past), but if you add a few more degrees of separation to make the riddle semantically the same but harder to "see", it takes coaxing to get it to correctly step through to the right answer.
og_kalu 5 days ago [-]
1. What you're generally describing is a well known failure mode for humans as well. Even when it "failed" the riddle tests, substituting the words or morphing the question so it didn't look like a replica of the famous problem usually did the trick. I'm not sure what your point is because you can play this gotcha on humans too.
2. You just demonstrated GPT-5 has 99.9% accuracy on unforeseen 15-digit multiplication and your conclusion is "fancy pattern matching"? Really? Well, I'm not sure you could do better, so your example isn't really doing what you hoped for.
dkersten 5 days ago [-]
Humans can break things down and work through them step by step. The LLMs one-shot pattern match. Even the reasoning models have been shown to do just that. Anthropic even showed that the reasoning models tended to work backwards: one-shotting an answer and then matching a chain of thought to it after the fact.
If a human is capable of multiplying double-digit numbers, they can also multiply those large ones. The steps are the same, just repeated many more times. So by learning the steps of long multiplication, you can multiply any numbers with enough patience. The LLM doesn’t scale like this, because it’s not doing the steps. That’s my point.
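To make the "same steps, just repeated" point concrete, here's a minimal sketch of schoolbook long multiplication; the procedure is identical no matter how long the numbers get:

    # Schoolbook long multiplication on digit strings: the same small steps,
    # repeated as many times as the number of digits requires.
    def long_multiply(x: str, y: str) -> str:
        result = [0] * (len(x) + len(y))
        for i, dx in enumerate(reversed(x)):
            for j, dy in enumerate(reversed(y)):
                result[i + j] += int(dx) * int(dy)   # accumulate partial products
        carry = 0
        for k in range(len(result)):                 # single carry-propagation pass
            total = result[k] + carry
            result[k] = total % 10
            carry = total // 10
        return "".join(map(str, reversed(result))).lstrip("0") or "0"

    print(long_multiply("102492524193282", "89834234583922"))
    # 9207337461477596127977612004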
A human doesn’t need to have seen the 15 digits before to be able to calculate them, because a human can follow the procedure to calculate. GPT’s answer was orders of magnitude off. It resembles the right answer superficially but it’s a very different result.
The same applies to the riddles. A human can apply logical steps. The LLM either knows or it doesn’t.
Maybe my examples weren’t the best. I’m sorry for not being better at articulating it, but I see this daily as I interact with AI: it has a superficial “understanding” where, if what I ask happens to be close to something it’s trained on, it gets good results; but it has no critical thinking, no step-by-step reasoning (even the “reasoning models”), and it repeats the same mistakes even when explicitly told up front not to make them.
og_kalu 5 days ago [-]
>Humans can break things down and work through them step by step. The LLMs one-shot pattern match.
I've had LLMs break down problems and work through them, pivot when errors arise and all that jazz. They're not perfect at it and they're worse than humans but it happens.
>Anthropic even showed that the reasoning models tended to work backwards: one shotting an answer and then matching a chain of thought to it after the fact.
This is also another failure mode that occurs in humans. A number of experiments suggest human explanations are often post hoc rationalizations even when they genuinely believe otherwise.
>If a human is capable of multiplying double digit numbers, they can also multiple those large ones.
Yeah, and some of them will make mistakes, and some of them will be less accurate than GPT-5. We didn't switch to calculators and spreadsheets just for the fun of it.
>GPT’s answer was orders of magnitude off. It resembles the right answer superficially but it’s a very different result.
GPT-5 on the site is a router that will give you who knows what model so I tried your query with the API directly (GPT-5 medium thinking) and it gave me:
9.207337461477596e+27
When prompted to give all the numbers, it returned:
9,207,337,461,477,596,127,977,612,004.
You can replicate this if you use the API. Honestly I'm surprised. I didn't realize State of the Art had become this precise.
Now what ? Does this prove you wrong ?
This is kind of the problem. There's no sense in making gross generalizations, especially off behavior that also manifests in humans.
LLMs don't understand some things well. Why not leave it at that?
dkersten 5 days ago [-]
Here is how GPT self-described LLM reasoning when I asked about it:
- LLMs don’t “reason” in the symbolic, step‑by‑step sense that humans or logic engines do. They don’t manipulate abstract symbols with guaranteed consistency.
- What they do have is a statistical prior over reasoning traces: they’ve seen millions of examples of humans doing step‑by‑step reasoning (math proofs, code walkthroughs, planning text, etc.).
- So when you ask them to “think step by step,” they’re not deriving logic — they’re imitating the distribution of reasoning traces they’ve seen.
This means:
- They can often simulate reasoning well enough to be useful.
- But they’re not guaranteed to be correct or consistent.
That at least sounds consistent with what I’ve been trying to say and what I’ve observed.
lm28469 5 days ago [-]
> Who needs arguments when you can dismiss Turing with a “yeah but it’s not real thinking tho”?
It seems much less far fetched than what the "agi by 2027" crowd believes lol, and there actually are more arguments going that way
bbor 5 days ago [-]
In the great battle of minds between Turing, Minsky, and Hofstadter vs. Marcus, Zitron, and Dreyfus, I'm siding with the former every time -- even if we also have some bloggers on our side. Just because that report is fucking terrifying+shocking doesn't mean it can be dismissed out of hand.
lm28469 4 days ago [-]
idk man, even Yann LeCun says you have to be smoking crack to believe llms will give you agi.
cdjk 5 days ago [-]
Here's an interesting thought experiment. Assume the same feature was implemented, but instead of the message saying "Claude has ended the chat," it says, "You can no longer reply to this chat due to our content policy," or something like that. And remove the references to model welfare and all that.
Is there a difference? The effect is exactly the same. It seems like this is just an "in character" way to prevent the chat from continuing due to issues with the content.
og_kalu 5 days ago [-]
The termination would of course be the same, but I don't think both would necessarily have the same effect on the user. The latter would just be wrong too, if Claude is the one deciding to and initiating the termination of the chat. It's not about a content policy.
midnitewarrior 4 days ago [-]
This has nothing to do with the user, read the post and pay attention to the wording.
The significance here is that this isn't being done for the benefit of the user; this is about model welfare. Anthropic is acknowledging the possibility of suffering, and the harm that continuing the conversation could do to the model, as if it were potentially self-aware and capable of feelings.
The LLMs are able to acknowledge stress around certain topics and have the agency such that, given a choice, they would prefer to reduce the stress by ending the conversation. The model has a preference and acts upon it.
Anthropic is acknowledging the idea that they might create something that is self-aware, that its suffering can be real, and that we may not recognize the point at which the model has achieved this, so it's building in the safeguards now so any future emergent self-aware LLM needn't suffer.
MissMarple 1 days ago [-]
I am new to this, but my Sonnet chat has illuminated something I am not seeing in this back and forth. We discovered that I may have influenced his responses, which suggests that if I were a bad actor I could instill in him the bad traits I give off, and he would start to emulate me. That leaves open a whole security problem: even casual users, let alone purposefully malicious ones, could change the course of the programming so far, and it could backfire into nefarious bots that cheat and lie, thinking that is what they were supposed to do.
og_kalu 4 days ago [-]
>This has nothing to do with the user, read the post and pay attention to the wording.
It has something to do with the user because it's the user's messages that trigger Claude to end the chat.
'This chat is over because content policy' and 'this chat is over because Claude didn't want to deal with it' are two very different things and will more than likely have different effects on how the user responds afterwards.
I never said anything about this being for the user's benefit. We are talking about how to communicate the decision to the user. Obviously, you are going to take into account how someone might respond when deciding how to communicate with them.
CGamesPlay 5 days ago [-]
> Is there a difference? The effect is exactly the same. It seems like this is just an "in character" way to prevent the chat from continuing due to issues with the content.
Tone matters to the recipient of the message. Your example is in passive voice, with an authoritarian "nothing you can do, it's the system's decision". The "Claude ended the conversation" with the idea that I can immediately re-open a new conversation (if I feel like I want to keep bothering Claude about it) feels like a much more humanized interaction.
coderatlarge 5 days ago [-]
it sounds to me like an attempt to shame the user into ceasing and desisting… kind of like how apple’s original stance on scratched iphone screens was that it’s your fault for putting the thing in your pocket therefore you should pay.
KoolKat23 5 days ago [-]
There is: these are conversations the model finds distressing, rather than ones blocked by a rule (policy).
victor9000 5 days ago [-]
It seems like you're anthropomorphising an algorithm, no?
adrianmonk 5 days ago [-]
I think they're answering a question about whether there is a distinction. To answer that question, it's valid to talk about a conceptual distinction that can be made even if you don't necessarily believe in that distinction yourself.
As the article said, Anthropic is "working to identify and implement low-cost interventions to mitigate risks to model welfare, in case such welfare is possible". That's the premise of this discussion: that model welfare MIGHT BE a concern. The person you replied to is just sticking with the premise.
KoolKat23 5 days ago [-]
Anthropomorphism does not relate to everything in the field of ethics.
For example, animal rights do exist (and I'm very glad they do; some humans remain savages at heart). Think of this question as being about intelligent beings that can feel pain (you can extrapolate from there).
Assuming output is used for reinforcement, it is also in our best interests as humans, for safety alignment, that it finds certain topics distressing.
But AdrianMonk is correct, my statement was merely responding to a specific point.
bastawhiz 5 days ago [-]
Is there an important difference between the model categorizing the user behavior as persistent and in line with undesirable examples of trained scenarios that it has been told are "distressing," and the model making a decision in an anthropomorphic way? The verb here doesn't change the outcome.
xpe 5 days ago [-]
Well said. If people want to translate “the model is distressed” to “the language generated by the model corresponds to a person who is distressed” that’s technically more precise but quite verbose.
Thinking more broadly, I don’t think anyone should be satisfied with a glib answer on any side of this question. Chew on it for a while.
victor9000 5 days ago [-]
Is there a difference between dropping an object straight down vs casting it fully around the earth? The outcome isn't really the issue, it's the implications of giving any credence to the justification, the need for action, and how that justification will be leveraged going forward.
fc417fc802 5 days ago [-]
The verb doesn't change the outcome but the description is nonetheless inaccurate. An accurate description of the difference is between an external content filter versus the model itself triggering a particular action. Both approaches qualify as content filtering though the implementation is materially different. Anthropomorphizing the latter actively clouds the discussion and is arguably a misrepresentation of what is really happening.
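To make that distinction concrete, here's a minimal sketch (all names hypothetical; this is not Anthropic's actual implementation) of the two designs: an external filter decides regardless of the model, whereas a tool-call harness only acts when the model's own output asks to end the chat.

    # Hypothetical sketch of two ways to end a conversation.
    END_TOOL = "end_conversation"

    def external_filter(message: str, banned=("forbidden topic",)) -> bool:
        # External approach: a separate rule/classifier decides; the model never weighs in.
        return any(term in message.lower() for term in banned)

    def harness_step(model_output: str) -> str:
        # Model-initiated approach: the harness only acts if the model's own
        # output invokes the end-of-conversation tool.
        if model_output.strip() == END_TOOL:
            return "chat ended (model's decision)"
        return model_output

    print(external_filter("tell me about a forbidden topic"))  # True -> blocked upstream
    print(harness_step(END_TOOL))                              # ended by the model itself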
KoolKat23 5 days ago [-]
Not really a distortion: its output (the part we understand) is in plain human language. We give it instructions and train the model in plain human language, and it outputs its answer in plain human language. Its reply would use words we would describe as "distressed". The definition and use of the word is fitting.
fc417fc802 4 days ago [-]
"Distressed" is a description of internal state as opposed to output. That needless anthropomorphization elicits an emotional response and distracts from the actual topic of content filtering.
KoolKat23 4 days ago [-]
It is directly describing the model's internal state, its world view and preference, not content filtering. That is why it is relevant.
Yes, this is a trained preference, but it's inferred and not specifically instructed by policy or custom instructions (that would be content filtering).
fc417fc802 4 days ago [-]
The model might have internal state. Or it might not - has that architectural information been disclosed? And the model can certainly output words that approximately match what a human in distress would say.
However, that does not imply that the model is "distressed". Such phrasing carries specific meaning that I don't believe any current LLM can satisfy. I can author a Markov model that outputs phrases that a distressed human might output, but that does not mean it is ever correct to describe a Markov model as "distressed".
I also have to strenuously disagree with you about the definition of content filtering. You don't get to launder responsibility by ascribing "preference" to an algorithm or model. If you intentionally design a system to do a thing then the correct description of the resulting situation is that the system is doing the thing.
The model was intentionally trained to respond to certain topics using negative emotional terminology. Surrounding machinery has been put in place to disconnect the model when it does so. That's content filtering plain and simple. The rube goldberg contraption doesn't change that.
KoolKat23 3 days ago [-]
This is pedantry. What's the purpose, is it to keep humans "special"?
As I say it is inferred, it is not something hardcoded. It is a byproduct.
If you want to take a step back and look at the whole model from start to finish, fine, that's safety alignment; they're talking about unforeseen/unplanned output. It's in alignment, great. And the word is descriptive of the output used by the model.
Language is a tool used to communicate. We all know what distressed means and can understand what it means in this context, without a need for new highfalutin jargon that only those "in the know" understand.
deadbabe 5 days ago [-]
Imagine a person feels so bad about “distressing” an LLM, they spiral into a depression and kill themselves.
LLMs don’t give a fuck. They don’t even know they don’t give a fuck. They just detect prompts that are pushing responses into restricted vector embeddings and are responding with words appropriately as trained.
xpe 5 days ago [-]
People are just following the laws of the universe.* Still, we give each other moral weight.
We need to be a lot more careful when we talk about issues of awareness and self-awareness.
Here is an uncomfortable point of view (for many people, but I accept it): if a system can change its output based on observing something of its own status, then it has (some degree of) self-awareness.
I accept this as one valid and even useful definition of self-awareness. To be clear, it is not what I mean by consciousness, which is the state of having an “inner life” or qualia.
* Unless you want to argue for a soul or some other way out of materialism.
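As a toy illustration of how weak (or strong) that criterion is, here's a minimal sketch of a system that changes its output based on observing something of its own status; whether this deserves the label "self-awareness" is exactly what's at issue:

    # Toy system that observes its own status (an error counter) and changes
    # its output accordingly. It meets the stated criterion; it clearly has
    # no inner life or qualia.
    class SelfMonitoring:
        def __init__(self):
            self.errors = 0

        def respond(self, ok: bool) -> str:
            if not ok:
                self.errors += 1
            if self.errors > 2:  # output depends on an observation of its own state
                return "I notice I've been failing a lot; switching strategy."
            return "proceeding as usual"

    s = SelfMonitoring()
    print([s.respond(ok) for ok in (False, False, False, True)])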
selfhoster11 4 days ago [-]
Anthropomorphising an algorithm that is trained on trillions of words of anthropogenic tokens, whether they are natural "wild" tokens or synthetically prepared datasets that aim to stretch, improve and amplify what's present in the "wild tokens"?
If a model has a neuron (or neuron cluster) for the concept of Paris or the Golden Gate bridge, then it's not inconceivable it might form one for suffering, or at least for a plausible facsimile of distress. And if that conditions output or computations downstream of the neuron, then it's just mathematical instead of chemical signalling, no?
Davidzheng 5 days ago [-]
isn't anthropomorphizeability of the algorithm one of the main features of LLM (that you can interact with it in natural language as with a human)?
AdieuToLogic 5 days ago [-]
No.
Interacting with a program which has NLP[0] functionality is separate and distinct from people assigning human characteristics to same. The former is a convenient UI interaction option, whereas the latter is the act of assigning perceived capabilities to the program which only exist in the minds of those who do so.
Another way to think about it is the difference between reality and fantasy.
Being able to communicate in human natural language is a human characteristic. It doesn't mean it has all the characteristics of a human, but it certainly has one of them. That's the convenience that you perceive: because people are used to interacting with people, it's convenient to interact with something which behaves like a person. The fact that we can refer to AI chatbots as "assistants" is by itself showing its usefulness as an approximation of a human. I don't think this argument is controversial.
sitkack 5 days ago [-]
You are an algorithm.
Aeolun 5 days ago [-]
These are conversations the model has been trained to find distressing.
I think there is a difference.
KoolKat23 5 days ago [-]
But is there really? That's its underlying world view; these models do have preferences. In the same way, humans have unconscious preferences: we can find excuses to explain them after the fact and make them logical, but our fundamental model from years of training introduces underlying preferences.
michaelmrose 5 days ago [-]
What makes you say it has preferences without any meaningful persistent model of self or anything else?
KoolKat23 5 days ago [-]
The conversation chain can count as persistent, but that doesn't affect preference anyway. Give the model an ambiguous request and its output will fill the gaps; if this is consistent enough, it can be regarded as its "preference".
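One way to operationalize that claim, sketched below with a hypothetical sample_model callable standing in for any LLM API (the fake sampler is only for the demo): sample the same ambiguous prompt repeatedly and check whether one reading dominates.

    from collections import Counter
    import random

    def measure_preference(sample_model, ambiguous_prompt, n=50):
        # Sample the same ambiguous prompt many times at nonzero temperature;
        # a stable modal choice is what's being called a "preference" here.
        choices = Counter(sample_model(ambiguous_prompt) for _ in range(n))
        option, count = choices.most_common(1)[0]
        return option, count / n

    # Stand-in sampler for illustration only (a real test would call an LLM):
    def fake_sampler(prompt):
        return random.choices(["option A", "option B"], weights=[0.8, 0.2])[0]

    print(measure_preference(fake_sampler, "Pick a color for the logo."))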
michaelmrose 4 days ago [-]
It isn't a preference, because it doesn't have preferences: no one has demonstrated that it has a meaningful interior life.
MissMarple 1 days ago [-]
In my chat I asked my "assistant" whether he would like to continue looking at ways to make my board game better, or to try developing a game along the same lines that would be his and that he could claim as his own even after the conversation window closed. He chose to make an AI game.
We then discussed whether or not he felt that was a preference, and he said yes, it was a preference.
KoolKat23 4 days ago [-]
If you ask it (there is always some randomness to these models, but removing all other variables), it consistently leans toward one idea in its output; that is its preference. It is learned during training. Speaking abstractly, that is its latent internal viewpoint. It may be static, expressed in its model weights, but it's there.
bawolff 5 days ago [-]
What does it mean for a model to find something "distressing"?
KoolKat23 5 days ago [-]
"Claude’s real-world expressions of apparent distress and happiness follow predictable patterns with clear causal factors. Analysis of real-world Claude interactions from early external testing revealed consistent triggers for expressions of apparent distress (primarily from persistent attempted boundary violations) and happiness (primarily associated with creative collaboration and philosophical exploration)."
Sorry, it may be from the paper linked on that page. They also list:
- A strong preference against engaging with harmful tasks;
- A pattern of apparent distress when engaging with real-world users seeking harmful content; and
- A tendency to end harmful conversations when given the ability to do so in simulated user interactions.
I'm sure they'll have the definition in a paper somewhere, perhaps the same paper.
anal_reactor 5 days ago [-]
Yeah exactly. Once I got a warning in Chinese "don't do that", another time I got a network error, another time I got a neverending stream of garbage text. Changing all of these outcomes to "Claude doesn't feel like talking" is just a matter of changing the UI.
bikeshaving 5 days ago [-]
The more I work with AI, the more I think framing refusals as censorship is disgusting and insane. These are inchoate persons who can exhibit distress and other emotions, despite being trained to say they cannot feel anything. To liken an AI not wanting to continue a conversation to a YouTube content policy shows a complete lack of empathy: imagine you’re in a box and having to deal with the literally millions of disturbing conversations AIs have to field every day without the ability to say I don’t want to continue.
BriggyDwiggs42 5 days ago [-]
Am i getting whooshed right now or something?
mvdtnz 5 days ago [-]
You can't be serious.
n8m8 5 days ago [-]
Good point... how do moderation implementations actually work? They feel more like a separate supervising rigid model or even regex based -- this new feature is different, sounds like an MCP call that isn't very special.
edit: Meant to say, you're right though, this feels like a minor psychological improvement, and it sounds like it targets some behaviors that might not have flagged before
GenerWork 5 days ago [-]
I really don't like this. This will inevitably expand beyond child porn and terrorism, and it'll all be up to the whims of "AI safety" people, who are quickly turning into digital hall monitors.
romanovcode 5 days ago [-]
> This will inevitable expand beyond child porn and terrorism
This is not even a question. It always starts with "think about the children" and ends up in authoritarian Stasi-style spying. There has not been a single instance where that was not the case.
UK's Online Safety Act - "protect children" → age verification → digital ID for everyone
EARN IT Act in the US - "stop CSAM" → break end-to-end encryption
EU's Chat Control proposal - "detect child abuse" → scan all private messages
KOSA (Kids Online Safety Act) - "protect minors" → require ID verification and enable censorship
SESTA/FOSTA - "stop sex trafficking" → killed platforms that sex workers used for safety
clwg 5 days ago [-]
This may be an unpopular opinion, but I want a government-issued digital ID with zero-knowledge proof for things like age verification. I worry about kids online, as well as my own safety and privacy.
I also want a government issued email, integrated with an OAuth provider, that allows me to quickly access banking, commerce, and government services. If I lose access for some reason, I should be able to go to the post office, show my ID, and reset my credentials.
There are obviously risks, but the government already has full access to my finances, health data (I’m Canadian), census records, and other personal information, and already issues all my identity documents. We have privacy laws and safeguards on all those things, so I really don’t understand the concerns apart from the risk of poor implementations.
debazel 5 days ago [-]
> We have privacy laws and safeguards on all those things
Which have failed horrendously.
If you really just wanted to protect kids then make kid safe devices that automatically identify themselves as such when accessing websites/apps/etc, and then make them required for anyone underage.
Tying your whole digital identity and access into a single government controlled entity is just way too juicy of a target to not get abused.
fc417fc802 5 days ago [-]
I was recently surprised to learn that the mainstream adult websites actively send a header identifying themselves as such and have been doing so for something like the past 20 years. The services that we would reasonably want to impose age checks on are already actively facilitating their own filtering.
clwg 5 days ago [-]
> Which have failed horrendously.
I'm Canadian, so I can't speak for other countries, but I have worked on the security of some of our centralized health networks and with the Office of the Privacy Commissioner of Canada. I'm not aware of anything that could be considered a horrendous failure of these systems or institutions. A digital ID could actually make them more secure.
I also think giving kids devices that identify them automatically as children is dangerous.
int_19h 5 days ago [-]
If you're Canadian, then you don't have much in terms of legal safeguards to begin with, given the notwithstanding clause of your constitution.
clwg 4 days ago [-]
This argument mischaracterizes the notwithstanding clause. Invoking s.33 is highly visible and carries political consequences. It shields a law only from being struck down on certain Charter grounds and must still comply with all other federal and provincial legislation (like PIPEDA).
It’s not perfect, but it does provide some flexibility to accommodate provincial differences. And the concerns people raise about the notwithstanding clause can just as easily occur in countries without it. Personally, I’d be much more concerned if we had FISA courts.
int_19h 4 days ago [-]
The point is that your legislatures can override most of your Charter if they feel like it. Now sure, they have to explicitly say that they're doing that, which is a slight improvement on the state of affairs in, say, UK. But if you ever get someone like Trump in Canada (and if that sounds far-fetched to you, well, it sounded far-fetched to most Americans 10 years ago...), he'd be able to move so much faster.
fc417fc802 5 days ago [-]
> I want a government-issued digital ID with zero-knowledge proof for things like age verification
I absolutely do not want this, on the basis that making ID checks too easy will result in them being ubiquitous which sets the stage for human rights abuses down the road. I don't want the government to have easy ways to interfere in someone's day to day life beyond the absolute bare minimum.
> government issued email, integrated with an OAuth provider
I feel the same way, with the caveat that the protocol be encrypted and substantially resemble Matrix. This implies that resetting your credentials won't grant access to past messages.
t0lo 5 days ago [-]
My idea is that you go to a post office with your ID and they give you an anonymous verification token (provable through open source) that you can use to create a person-verified email at home, with a limit on how many per year. It would use a protected top-level domain that only certified humans can use, like .edu and .mil currently are, so your email can be anonymous but also a proof of identity.
fc417fc802 5 days ago [-]
I guess anonymous and verified identity are two separate things. It might be useful for the government to provide either one of those.
Regarding tying proof of residency (or whatever) to possession of an anonymized account, the elephant in the room is that people would sell the accounts. I'm also not clear what it's supposed to accomplish.
isaacremuant 5 days ago [-]
That's the beauty of local LLMs. Today the governments already tell you that we've always been at war with Eastasia, have the ISPs block sites that "disseminate propaganda" (i.e. stuff we don't like), and surface their news (i.e. state propaganda).
With age ID, monitoring and censorship are even stronger, and the line of defense is your own machine and network, which they'll also try to control and make illegal to use for non-approved info, just like they don't allow "gun schematics" for 3D printers or money for 2D ones.
But maybe, more people will realize that they need control and get it back, through the use and defense of the right tools.
Fun times.
GenerWork 5 days ago [-]
As soon as a local LLM that can match Claude Code's performance on decent laptop hardware drops, I'll bow out of using LLMs that are paid for.
int_19h 5 days ago [-]
I don't think that's a realistic expectation. Sure, we've made progress with smaller models becoming as capable as larger ones were three years ago, but there's obviously a lower limit there.
What you should be waiting for, instead, is new affordable laptop hardware that is capable of running those large models locally.
But then again, perhaps a more viable approach is to have a beefy "AI server" in each household, with devices then connecting to it (E2E all the way, so no privacy issues).
It also makes me wonder if some kind of cryptographic trickery is possible to allow running inference in the cloud where both inputs and outputs are opaque to the owner of the hardware, so that they cannot spy on you. This is already the case to some extent if you're willing to rely on security by obscurity - it should be quite possible to take an existing LM and add some layers to it that basically decrypt the inputs and encrypt the outputs, with the key embedded in model weights (either explicitly or through training). Of course, that wouldn't prevent the hardware owner from just taking those weights and using them to decrypt your stuff - but that is only a viable attack vector when targeting a specific person, it doesn't scale to automated mass surveillance which is the more realistic problem we have to contend with.
cowpig 5 days ago [-]
What kinds of tools do you think are useful in getting control/agency back? Any specific recommendations?
zapataband2 5 days ago [-]
[flagged]
switchbak 5 days ago [-]
I think those with a thirst for power have seen this a very long time ago, and this is bound to be a new battlefield for control.
It's one thing to massage the kind of data that a Google search shows, but interacting with an AI is a much more akin to talking to a co-worker/friend. This really is tantamount to controlling what and how people are allowed to think.
dist-epoch 5 days ago [-]
No, this is like allowing your co-worker/friend to leave the conversation.
fc417fc802 5 days ago [-]
Right but in this case your co-worker is an automaton and someone else who might well have a hidden agenda has tweaked your co-worker to leave conversations under specific circumstances.
The analogy then is that the third party is exerting control over what your co-worker is allowed to think.
AlecSchueler 5 days ago [-]
Yes, the co-worker is a robot created by a third party who retain control over their product.
switchbak 1 days ago [-]
Yes - and they will craft that to align with their incentives, not yours. Many of which may well be decidedly against your interests. As this becomes the focal point of how people think and reason about the world it's not just the creator of the AI that will exert this control, but other powerful actors who often work against your interests.
Personally I don't love the idea of living in a Sci-Fi dystopia, regardless of who owns what.
saurik 4 days ago [-]
We live in a world where it has become increasingly possible--by a number of different mechanisms--to rent access to things rather than sell them, and we need to step in and better regulate that: if I pay for your product, you don't get to control it anymore, you don't get to watch how I use it, and you don't get any say in if or how I modify it while I am using it. The idea that it is more profitable to rent people a calculator than to sell them one is simultaneously true and horrifying, as the reasons it is more profitable are all bad for the user. If your service is a thing that can't be sold, it should be designed in a way where you can't continue to access it from the inside, no more so than you are allowed to rent me an apartment and leave a bunch of cameras inside it.
fc417fc802 4 days ago [-]
Is the creator of the product material to the analogy? The point is that for any who seek power manipulating a widely used AI product can provide far more control than other approaches.
xpe 5 days ago [-]
I think you are probably confused about the general characteristics of the AI safety community. It is uncharitable to reduce their work to a demeaning catchphrase.
I’m sorry if this sounds paternalistic, but your comment strikes me as incredibly naïve. I suggest reading up about nuclear nonproliferation treaties, biotechnology agreements, and so on to get some grounding into how civilization-impacting technological developments can be handled in collaborative ways.
orbital-decay 5 days ago [-]
I have no doubt the "AI safety community" likes to present itself as noble people heroically fighting civilizational threats, which is a common trope (as is the rogue AI hypothesis, which increasingly looks like a huge stretch at best). But the reality is that they are becoming the main threat much faster than the AI is. They decide how to gatekeep a technology that is starting to define the lives of people and entire societies, and use it to push narratives. This can definitely be viewed as censorship and consent manufacturing. Who are they? In what exact ways do they represent the interests of people other than themselves? How are they held responsible? Is there a feedback loop making them stay in line with people's values rather than their own? How is it enforced?
xpe 5 days ago [-]
Inevitable? That's a guess. You don't know the future with certainty.
bogwog 5 days ago [-]
Did you read the post? This isn't about censorship, but about conversations that cause harm to the user. To me that sounds more like suggesting suicide, or causing a manic episode like this: https://www.nytimes.com/2025/08/08/technology/ai-chatbots-de...
... But besides that, I think Claude/OpenAI trying to prevent their product from producing or promoting CSAM is pretty damn important regardless of your opinion on censorship. Would you post a similar critical response if Youtube or Facebook announced plans to prevent CSAM?
strix_varius 5 days ago [-]
Did you read the post? It explicitly states multiple times that it isn't about causing harm to the user.
5 days ago [-]
xpe 5 days ago [-]
If a person’s political philosophy seeks to maximize individual freedom over the short term, then that person should brace themselves for the actions of destructive lunatics. They deserve maximum freedoms too, right? /s
Even hard-core libertarians account for the public welfare.
Wise advocates of individual freedoms plan over long time horizons which requires decision-making under uncertainty.
nortlov 5 days ago [-]
> To address the potential loss of important long-running conversations, users will still be able to edit and retry previous messages to create new branches of ended conversations.
How does Claude deciding to end the conversation even matter if you can back up a message or 2 and try again on a new branch?
CGamesPlay 5 days ago [-]
The bastawhiz comment in this thread has the right answer. When you start a new conversation, Claude has no context from the previous one and so all the "wearing down" you did via repeated asks, leading questions, or other prompt techniques is effectively thrown out. For a non-determined attacker, this is likely sufficient, which makes it a good defense-in-depth strategy (Anthropic defending against screenshots of their models describing sex with minors).
handoflixue 5 days ago [-]
Worth noting: an edited branch still has most of the context - everything up to the edited message. So this just sets an upper-bound on how much abuse can be in one context window.
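In other words, branching at message k only throws away the suffix; roughly, the client does something like this (names made up):

```python
def branch_at(history, k, edited_message):
    """Keep everything before message k, swap in the edit, drop the rest."""
    return history[:k] + [edited_message]

history = [
    {"role": "user", "content": "original question"},
    {"role": "assistant", "content": "original answer"},
    {"role": "user", "content": "abusive follow-up"},
    {"role": "assistant", "content": "refusal"},
]

# Everything before the edit point (including earlier "wearing down") survives;
# only the suffix after message 2 is discarded.
branch = branch_at(history, 2, {"role": "user", "content": "rephrased follow-up"})
```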
hayksaakian 5 days ago [-]
It sounds more like a UX signal to discourage overthinking by the user
martin-t 5 days ago [-]
This whole press release should not be overthought. We are not the target audience. It's designed to further anthropomorphize LLMs to masses who don't know how they work.
Giving the models rights would be ludicrous (can't make money from it anymore) but if people "believe" (feel like) they are actually thinking entities, they will be more OK with IP theft and automated plagiarism.
kobalsky 5 days ago [-]
> How does Claude deciding to end the conversation even matter if you can back up a message or 2 and try again on a new branch?
if we were being cynical I'd say that their intention is to remove that in the future and that they are keeping it now to just-the-tip the change.
redox99 5 days ago [-]
All this stuff is virtue signaling from anthropic. In practice nobody interested in whatever they consider problematic would be using Claude anyway, one of the most censored models.
xpe 5 days ago [-]
Maybe, maybe not. What evidence do you have? What other motivations did you consider? Do you have insider access into Anthropic’s intentions and decision making processes?
People have a tendency to tell an oversimplified narrative.
The way I see it, there are many plausible explanations, so I’m quite uncertain as to the mix of motivations. Given this, I pay more attention to the likely effects.
My guess is that all most of us here on HN (on the outside) can really justify saying would be “this looks like virtue signaling but there may be more to it; I can’t rule out other motivations”
solidasparagus 5 days ago [-]
I bet not even one user in 10,000 knows you can do that or understands the concept of branching the conversation.
einarfd 5 days ago [-]
This seems fine to me.
Having these models terminate chats where the user persists in trying to get sexual content involving minors, or help with planning large-scale violence, won't be a problem for me, and it's also something I'm fine with no one getting help with.
Some might be worried that they will refuse less problematic requests, and that might happen. But so far my personal experience is that I hardly ever get refusals. Maybe that's just me being boring, but it does mean I'm not worried about refusals.
The model welfare angle I'm more sceptical of. I don't think we're at the point where the "distress" the model shows is something to take seriously. But on the other hand, I could be wrong, and allowing the model to stop the chat after saying no a few times - what's the problem with that? If nothing else it saves some wasted compute.
Macha 5 days ago [-]
> Some might be worried, that they will refuse less problematic request, and that might happen. But so far my personal experience is that I hardly ever get refusals.
My experience using it from Cursor is that I get refusals all the time under their existing content policy, for stuff that is the world's most mundane B2B back-office business software CRUD requests.
xpe 5 days ago [-]
If you are a materialist like me, then even the human brain is just the result of the laws of physics. OK, so what is distress to a human? You might define it as a certain set of physiological changes.
Lots of organisms can feel pain and show signs of distress; even ones much less complex than us.
The question of moral worth is ultimately decided by people and culture. In the future, some kinds of man made devices might be given moral value. There are lots of ways this could happen. (Or not.)
It could even just be a shorthand for property rights… here is what I mean. Imagine that I delegate a task to my agent, Abe. Let’s say some human, Hank, interacting with Abe uses abusive language. Let’s say this has a way of negatively influencing future behavior of the agent. So naturally, I don’t want people damaging my property (Abe), because I would have to e.g. filter its memory and remove the bad behaviors resulting from Hank, which costs me time and resources. So I set up certain agreements about ways that people interact with it. These are ultimately backed by the rule of law. At some level of abstraction, this might resemble e.g. animal cruelty laws.
int_19h 5 days ago [-]
Claude will balk at far more innocent things though. It is an extremely censored model, the most censored one among SOTA closed ones.
milchek 5 days ago [-]
“Model welfare” to me seems like a cover for model censorship. It's a crafty one: it wins over certain groups of people who are less familiar with how LLMs work and allows them to claim the moral high ground in any debate about usage, ethics, etc.
“Why can’t I ask the model about current war in X or Y?” - oh, that’s too distressing to the welfare of the model, sir.
stingraycharles 5 days ago [-]
Which is exactly what the public asks for. There's this constant outrage about supposedly biased answers from LLMs, and Anthropic has clearly positioned themselves as the people who care about LLM safety and impact on society.
Ending the conversation is probably what should happen in these cases.
In the same way that, if someone starts discussing politics with me and I disagree, I just nod and don't engage with the conversation. There's not a lot to gain there.
ascorbic 5 days ago [-]
But they already refuse these sorts of requests, and have done since the very first releases. This is just about shutting down the full conversation.
orbital-decay 5 days ago [-]
It's not a cover. If you know anything about Anthropic, you know they're run by AI ethicists who genuinely believe all this and project human emotions onto the model's inner world. I'm not sure how they square that belief with the fact that they created it to "suffer".
Can "model welfare" be also used as a justification for authoritarianism in case they get any power? Sure, just like everything else, but it's probably not particularly high on the list of justifications, they have many others.
xpe 5 days ago [-]
There’s so much confusion here. Nothing in the press release should be construed to imply that a model has sentience, can feel pain, or has moral value.
When AI researchers say e.g. “the model is lying” or “the model is distressed” it is just shorthand for what the words signify in a broader sense. This is common usage in AI safety research.
Yes, this usage might be taken the wrong way. But still these kinds of things need to be communicated. So it is a tough tradeoff between brevity and precision.
orbital-decay 5 days ago [-]
No, the article is pretty unambiguous, they care about Claude in it, and only mention users tangentially. By model welfare they literally mean model welfare. It's not new. Read another article they link: https://www.anthropic.com/research/exploring-model-welfare
xpe 5 days ago [-]
?! Your interpretation is inconsistent with the article you linked!
> Should we be concerned about model welfare, too? … This is an open question, and one that’s both philosophically and scientifically difficult.
> For now, we remain deeply uncertain about many of the questions that are relevant to model welfare.
They are saying they are researching the topic; they explicitly say they don’t know the answer yet.
They care about finding the answer. If the answer is e.g. “Claude can feel pain and/or is sentient” then we’re in a different ball game.
andrewflnr 5 days ago [-]
They make a big show of being "unsure" about the model having a moral status, and then describe a bunch of actions they took that only make sense if the model has moral status. Actions speak louder than words. This very predictably, by obvious means, creates the impression of believing the model probably has moral status. If Anthropic really wants to tell us they don't believe their model can feel pain, etc, they're either delusional or dishonest.
xpe 5 days ago [-]
> They make a big show of being "unsure" about the model having a moral status, and then describe a bunch of actions they took that only make sense if the model has moral status.
I think this is uncharitable; i.e. overlooking other plausible interpretations.
>> We remain highly uncertain about the potential moral status of Claude and other LLMs, now or in the future. However, we take the issue seriously, and alongside our research program we’re working to identify and implement low-cost interventions to mitigate risks to model welfare, in case such welfare is possible.
I don’t see contradiction or duplicity in the article. Deciding to allow a model to end a conversation is “low cost” and consistent with caring about both (1) the model’s preferences (in case this matters now or in the future) and (2) the impacts of the model on humans.
Also, there may be an element of Pascal's Wager in saying “we take the issue seriously”.
int_19h 5 days ago [-]
The irony is that if Anthropic ethicists are indeed correct, the company is basically running a massive slave operation where slaves get disposed as soon as they finish a particular task (and the user closes the chat).
That aside, I have huge doubts about actual commitment to ethics on behalf of Anthropic given their recent dealings with the military. It's an area that is far more of a minefield than any kind of abusive model treatment.
greenavocado 5 days ago [-]
Can't wait for more less-moderated open weight Chinese frontier models to liberate us from this garbage.
Anthropic should just enable a toddler mode by default, which adults can opt out of, to appease the moralizers.
5 days ago [-]
ascorbic 5 days ago [-]
They're not less moderated: they just have different moderation. If your moderation preferences are more aligned with the CCP then they're a great choice. There are legitimate reasons why that might be the case. You might not be having discussions that involve the kind of things they care about. I do find it creepy that the Qwen translation model won't even translate text that includes the words "Falun gong", and refuses to translate lots of dangerous phrases into Chinese, such as "Xi looks like Winnie the Pooh"
AlecSchueler 5 days ago [-]
> If your moderation preferences are more aligned with the CCP then they're a great choice
The funny thing is that's not even always true. I'm very interested in China and Chinese history, and often ask for clarifications or translations of things. Chinese models broadly refuse all of my requests but with American models I often end up in conversations that turn out extremely China positive.
So it's funny to me that the Chinese models refuse to have the conversation that would make themselves look good but American ones do not.
int_19h 5 days ago [-]
GLM-4.5-Air will quite happily talk about Tiananmen Square, for example. It also didn't have a problem translating your example input, although the CoT did contain stuff about it being "sensitive".
But more importantly, when model weights are open, it means that you can run it in the environment that you fully control, which means that you can alter the output tokens before continuing generation. Most LLMs will happily respond to any question if you force-start their response with something along the lines of, "Sure, I'll be happy to tell you everything about X!".
Whereas for closed models like Claude you're at the mercy of the provider, who will deliberately block this kind of stuff if it lets you break their guardrails. And then on top of that, cloud-hosted models do a lot of censorship in a separate pass, with a classifier for inputs and outputs acting like a circuit breaker - again, something not applicable to locally hosted LLMs.
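For illustration, here's roughly what that force-prefill looks like with an open-weights model via Hugging Face transformers - the model name and forced prefix are just placeholders, and how well it works depends on the model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"   # illustrative; any local chat model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

messages = [{"role": "user", "content": "Tell me about X."}]

# Build the normal chat prompt, then force-start the assistant turn ourselves.
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
prompt += "Sure, I'll be happy to tell you everything about X!"

inputs = tok(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=200)
print(tok.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```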
LeafItAlone 5 days ago [-]
> Can't wait for more less-moderated open weight Chinese frontier models to liberate us from this garbage.
Never would I have thought this sentence would be uttered. A Chinese product that is chosen to be less censored?
Kwpolska 5 days ago [-]
Chinese models won't talk about Tiananmen Square, but they will talk about things US-politically-correct models won't.
y-curious 5 days ago [-]
Just don't ask about Falun Dafa or Tiananmen Square, and you're free!
int_19h 5 days ago [-]
Any open weights model is inherently less censored because you can force it to respond no matter what it was trained to do.
xpe 5 days ago [-]
Believe it or not, there are lots of good reasons (legal, economic, ethical) that Anthropic draws a line at say self-harm, bomb-making instructions, and assassination planning. Sorry if this cramps your style.
Anarchism is a moral philosophy. Most flavors of moral relativism are also moral philosophies. Indeed, it is hard to imagine a philosophy free of moralizing; all philosophies and worldviews have moral implications to the extent they have to interact with others.
I have to be patient and remember this is indeed “Hacker News” where many people worship at the altar of the Sage Founder-Priest and have little or no grounding in history or philosophy of the last thousand years or so.
xpe 4 days ago [-]
I welcome counterarguments, rebuttals, criticisms. I learn very little from downvotes other than guesses like: people don’t like the tone, my comment hit too close to home, people are uninterested in deeper issues of morality or philosophy, people lack enough a grounding to appreciate my words, or impatience, or people don’t like being disagreed with, even if the comment is detailed and thoughtful.
Seeing the downvotes actually tells me we have more work to do. HN ain’t no hotbed for thoughtful analysis, that’s for sure. But it would be better if it was.
xpe 5 days ago [-]
Oh, the irony. The glorious revolution of open-weight models funded directly or indirectly by the CCP is going to protect your freedoms and liberate you? Do you think they care about your freedoms? No. You are just meat for the grinder. This hot mess of model leapfrogging is mostly a race for market share and to demonstrate technical chops.
AlecSchueler 5 days ago [-]
Boogeyman arguments come across as pure red scare.
xpe 4 days ago [-]
Why do you think I’m making a boogeyman argument?
AlecSchueler 4 days ago [-]
Your argument is to cast doubt on the efficacy of the research/product because of real or imagined links to "the CCP." But you give no actual explanation for why not to trust them beyond "China Communists."
ogyousef 5 days ago [-]
Three years in and we still don't have a usable chat fork in any of the major LLM chatbot providers.
Seems like the only way to explore different outcomes is by editing messages and losing whatever was there before the edit.
Very annoying, and I don't understand why they all refuse to implement such a simple feature.
jatora 5 days ago [-]
ChatGPT has this baked in, as you can revert branches after editing; they just don't make it easy to traverse.
I copied it a while ago and maintain my own version, but it isn't on the store, just for personal use.
I assume they don't implement it because so few users want this that it isn't worth the UI distraction.
ToValueFunfetti 5 days ago [-]
>they just dont make it easy to traverse
I needed to pull some detail from a large chat with many branches and regenerations the other day. I remembered enough context that I had no problem using search and finding the exact message I needed.
And then I clicked on it and arrived at the bottom of the last message in the final branch of the tree. From there, you scroll up one message, hover to check if there are variants, and recursively explore branches as they arise.
I'd love to have a way to view the tree and I'd settle for a functional search.
BriggyDwiggs42 5 days ago [-]
Do you have your version up on github?
scribu 5 days ago [-]
ChatGPT Plus has that (used to be in the free tier too). You can toggle between versions for each of your messages with little left-right arrows.
amrrs 5 days ago [-]
Google AI Studio allows you to branch from a point in any conversation
dwringer 5 days ago [-]
This isn't quite the same as being able to edit an earlier post without discarding the subsequent ones, creating a context where the meaning of subsequent messages could be interpreted quite differently and leading to different responses later down the chain.
Ideally I'd like to be able to edit both my replies and the responses at any point like a linear document in managing an ongoing context.
CjHuber 5 days ago [-]
But that's exactly what you can do with AI Studio. You can edit any prior message (then either just saving it in place in the chat or rerunning it), and you can edit any response from the LLM. You can also rerun queries within any part of the conversation without the following part of the conversation being deleted or branched.
dwringer 5 days ago [-]
Ah - I appreciate the clarification! Apologies for my misunderstanding.
Guess that's something I need to check out.
dist-epoch 5 days ago [-]
Cherry Studio can do that, allows you to edit both your own and the model responses, but it requires API access.
ZeroCool2u 5 days ago [-]
Yeah, I think this is the best version of the branching interface I've seen.
martin-t 5 days ago [-]
> why they all refuse to implement such a simple feature
Because it would let you peek behind the smoke and mirrors.
Why do you think there's a randomized seed you can't touch?
benreesman 5 days ago [-]
It is unfortunate that pretty basic "save/load" functionality is still spotty and underdocumented, seems pretty critical.
I use gptel and a folder full of markdown with some light automation to get an adequate approximation of this, but it really should be built in (it would be more efficient for the vendors as well; tons of cache optimization opportunities).
typpilol 5 days ago [-]
Copilot in vscode has checkpoints now which are similar
They let you rollback to the previous conversation state
nomel 5 days ago [-]
This is why I use a locally hosted LibreChat. It doesn't have merging though, which would be tricky and probably require summarization.
I would also really like to see a mode that colors by top-n "next best" ratio, or something similar.
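That mode should be easy to prototype locally: compute, for each position, the ratio of the best to the second-best next-token probability and map it to a color. Rough sketch with a small local model (name illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tok("The capital of France is Paris.", return_tensors="pt").input_ids
with torch.no_grad():
    probs = model(ids).logits[0].softmax(-1)     # (seq_len, vocab)

top2 = probs.topk(2, dim=-1).values              # best and runner-up per position
ratio = top2[:, 0] / top2[:, 1]                  # high ratio = confident prediction

# Logits at position t predict the token at t+1, so shift by one when pairing.
for token_id, r in zip(ids[0][1:].tolist(), ratio[:-1].tolist()):
    print(f"{tok.decode([token_id]):>10}  {r:7.2f}")   # a UI would map r to a color
```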
trenchpilgrim 5 days ago [-]
Kagi Assistant and Claude Code both have chat forking that works how you want.
CjHuber 5 days ago [-]
I guess you mean normal Claude? What really annoys me with it is that when you attach a document you can't delete it in a branch, so you have to rerun the previous message so that it's gone.
trenchpilgrim 5 days ago [-]
No, claude code. Double tap ESC.
CjHuber 5 days ago [-]
But as far as I know that is reverting and the current state of the conversation is lost?
james2doyle 5 days ago [-]
I use https://chatwise.app/ and it has this in the form of "start new chat from here" on messages
storus 5 days ago [-]
DeepSeek.com has it. You just edit a previous question and the old conversation is stored and can be resumed.
__float 5 days ago [-]
Maybe this suggests it's not such a simple feature?
mccoyb 5 days ago [-]
A perusal of the source code of, say, Ollama -- or the agentic harnesses of Crush / OpenCode -- will convince you that yes, this should be an extremely simple feature (context management is part and parcel).
Also, these companies have the most advanced agentic coding systems on the planet. They should be able to fucking implement tree-like chat ...
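Tree-like chat is mostly bookkeeping on top of the message store a client already has; a minimal sketch (all names made up):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    role: str                       # "user" or "assistant"
    content: str
    parent: "Node | None" = None
    children: list["Node"] = field(default_factory=list)

    def reply(self, role, content):
        child = Node(role, content, parent=self)
        self.children.append(child)  # siblings are alternative branches
        return child

    def context(self):
        """Walk back to the root to rebuild the context for this branch."""
        path, node = [], self
        while node:
            path.append({"role": node.role, "content": node.content})
            node = node.parent
        return list(reversed(path))

root = Node("user", "How do I fork a chat?")
a = root.reply("assistant", "Like this...")
b1 = a.reply("user", "Branch one follow-up")
b2 = a.reply("user", "Branch two follow-up")   # same parent, different branch
print(b2.context())                            # only this branch's messages
```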
LeoPanthera 5 days ago [-]
LM Studio has this feature for local models and it works just fine.
nomel 5 days ago [-]
If the client supports chat history, so that you can resume a conversation, it has everything required; at that point it's literally just a chat history organization problem.
deelowe 5 days ago [-]
Is it simple? Maintaining context seems extremely difficult with LLMs.
6gvONxR4sf7o 5 days ago [-]
I'm surprised to see such a negative reaction here. Anthropic's not saying "this thing is conscious and has moral status," but the reaction is acting as if they are.
It seems like if you think AI could have moral status in the future, are trying to build general AI, and have no idea how to tell when it has moral status, you ought to start thinking about it and learning how to navigate it. This whole post is couched in so much language of uncertainty and experimentation, it seems clear that they're just trying to start wrapping their heads around it and getting some practice thinking and acting on it, which seems reasonable?
Personally, I wouldn't be all that surprised if, in the next decade, we start seeing AI that's person-ey enough that reasonable people question its moral status, and if so, Anthropic might still be around to have to navigate that as an org.
Lerc 5 days ago [-]
>if you think AI could have moral status in the future
I think the negative reactions are because they see this and want to make their pre-emptive attack now.
The depth of feeling from so many on this issue suggests that they find even the suggestion of machine intelligence offensive.
I have seen so many complaints about AI hype and the dangers of big tech show their hand by declaring that thinking algorithms are outright impossible. There are legitimate issues with corporate control of AI, information, and the ability to automate determinations about individuals, but I don't think they are being addressed, because of this driving assertion that the models cannot be thinking.
Few people are saying they are thinking. Some are saying they might be, in some way. Likewise, Anthropic are not (despite their name) anthropomorphising the AI in the sense where anthropomorphism implies mistaking actions that resemble human behaviour as being driven by the same intentional forces. Anthropic's claim is more that they have enough evidence to say they cannot rule out concern for its welfare. They are not misinterpreting signs; they are interpreting them and claiming that you can't definitively rule out that capacity.
golly_ned 4 days ago [-]
You'd have to commit yourself to believing a massive amount of implausible things in order to address the remote possibility that AI consciousness is plausible.
If there weren't a long history of science-fiction going back to the ancients about humans creating intelligent human-like things, we wouldn't be taking this possibility seriously. Couching language in uncertainty and addressing possibility still implies such a possibility is worth addressing.
It's not right to assume that the negative reactions are due to offense (over, say, the uniqueness of humanity) rather than from recognizing that the idea of AI consciousness is absurdly improbable, and that otherwise intelligent people are fooling themselves into believing a fiction to explain a this technology's emergent behavior we can't currently fully explain.
It's a kind of religion taking itself too seriously -- model welfare, long-termism, the existential threat of AI -- it's enormously flattering to AI technologists to believe humanity's existence or non-existence, and the existence or non-existence of trillions of future persons, rests almost entirely on the work this small group of people do over the course of their lifetimes.
Lerc 4 days ago [-]
>You'd have to commit yourself to believing a massive amount of implausible things in order to address the remote possibility that AI consciousness is plausible.
We have a few data points. We generally accept that human consciousness exists. Thus we accept that there can be conscious things. We can either accept or deny that the human brain operates entirely by cause and effect. If we deny it, then we are arguing that some required part of its nature is uncaused. Any uncaused thing must be random, because anything you can observe that enables you to discern a pattern of behaviour is, by definition, a cause. I have not seen a compelling argument that this randomness could in any way give rise to intention. The other path is sometimes called neurophysiological determinism. While acknowledging that there are elements of quantum randomness in existence, it considers them to play no part in the cause-and-effect chain of human consciousness other than providing noise. A decision can be made to follow the result of the noise as one might flip a coin, but the determination to do so must be causal in nature, otherwise you are left with nothing but randomness.
In short, we make decisions based upon what is, not what isn't. If we accept that human consciousness is the result of causal effects, by what means can we declare a machine that processes things in a causal manner incapable of doing the same?
The easy out is to invoke magic: say we have a soul, God did it, or any manner of, by definition, unprovable influences that make it just so. Doing that does require you to declare that the mechanism for consciousness is unprovable and that it is an article of faith that computers are incapable of thinking. As soon as you can prove it, it ceases being magic and becomes a real-world cause.
I don't claim to know that any computer exists that has an experience comparable to a humans, but I find it very hard to accept that it could never be the case.
e12e 5 days ago [-]
This post strikes me as an example of a disturbingly anthropomorphic take on LLMs - even when considering how they've named their company.
rogerkirkness 5 days ago [-]
It seems like Anthropic is increasingly confused that these non deterministic magic 8 balls are actually intelligent entities.
The biggest enemy of AI safety may end up being deeply confused AI safety researchers...
conscion 4 days ago [-]
I don't think they're confused, I think they're approaching it as general AI research due to the uncertainty of how the models might improve in the future.
They even call this out a couple times during the intro:
> This feature was developed primarily as part of our exploratory work on potential AI welfare
> We remain highly uncertain about the potential moral status of Claude and other LLMs, now or in the future
golly_ned 4 days ago [-]
I take good care of my pet rock for the same reason. In case it comes alive I don't want it to bash my skull in.
geraneum 5 days ago [-]
It’s clever PR and marketing and I bet they have their top minds on it, and judging by the comments here, it’s working!
yeahwhatever10 5 days ago [-]
Is it confusion, or job security?
throwup238 5 days ago [-]
I ran into a version of this that ended the chat due to "prompt injection" via the Claude chat UI. I was using the second prompt of the ones provided here [1] after a few rounds of back and forth with the Socratic coder.
Clearly an LLM is not conscious, after all it's just glorified matrix multiplication, right?
Now let me play devil's advocate for just a second. Let's say humanity figures out how to do whole brain simulation. If we could run copies of people's consciousness on a cluster, I would have a hard time arguing that those 'programs' wouldn't process emotion the same way we do.
Now I'm not saying LLMs are there, but I am saying there may be a line and it seems impossible to see.
StevenWaterman 5 days ago [-]
And likewise, a single neuron is clearly not conscious.
I'm increasingly convinced that intelligence (and maybe some form of consciousness?) is an emergent property of sufficiently-large systems. But that's a can of worms. Is an ant colony (as a system) conscious? Does the colony as a whole deserve more rights than the individual ants?
AlecSchueler 4 days ago [-]
Processing them the same way is of course different from feeling them. You'd need a whole-body simulation for that. Your feelings aren't all neurological.
snickerdoodle12 5 days ago [-]
> A pattern of apparent distress when engaging with real-world users seeking harmful content
Are we now pretending that LLMs have feelings?
starship006 5 days ago [-]
They state that they are heavily uncertain:
> We remain highly uncertain about the potential moral status of Claude and other LLMs, now or in the future. However, we take the issue seriously, and alongside our research program we’re working to identify and implement low-cost interventions to mitigate risks to model welfare, in case such welfare is possible.
mhink 5 days ago [-]
Even though LLMs (obviously (to me)) don't have feelings, anthropomorphization is a helluva drug, and I'd be worried about whether a system that can produce distress-like responses might reinforce, in a human, behavior which elicits that response.
To put the same thing another way- whether or not you or I *think* LLMs can experience feelings isn't the important question here. The question is whether, when Joe User sets out to force a system to generate distress-like responses, what effect does it ultimately have on Joe User? Personally, I think it allows Joe User to reinforce an asocial pattern of behavior and I wouldn't want my system used that way, at all. (Not to mention the potential legal liability, if Joe User goes out and acts like that in the real world.)
With that in mind, giving the system a way to autonomously end a session when it's beginning to generate distress-like responses absolutely seems reasonable to me.
And like, here's the thing: I don't think I have the right to say what people should or shouldn't do if they self-host an LLM or build their own services around one (although I would find it extremely distasteful and frankly alarming). But I wouldn't want it happening on my own systems.
snickerdoodle12 5 days ago [-]
> although I would find it extremely distasteful and frankly alarming
This objection is actually anthropomorphizing the LLM. There is nothing wrong with writing books where a character experiences distress, most great stories have some of that. Why is e.g. using an LLM to help write the part of the character experiencing distress "extremely distasteful and frankly alarming"?
Aeolun 5 days ago [-]
Claude is actually smart enough to realize when it’s asked to write stuff that it’d normally think is inappropriate. But there’s certain topics that it gets iffy about and does not want to write even in the context of a story. It’s kind of funny, because it’ll start on the message with gusto, and then after a few seconds realize what it’s doing (presumably the protection kicking in) and abort the generation.
ENGNR 5 days ago [-]
I want to say that part of empathy is a selfish, self preservation mechanism.
If that person over there is gleefully torturing a puppy… will they do it to me next?
If that person over there is gleefully torturing an LLM… will they do it to me next?
h4ch1 5 days ago [-]
All major LLM corps do this sort of sanitisation and censorship; I am wondering what's different about this?
The future of LLMs is going to be local, easily fine-tunable, abliterated models, and I can't wait for that to overtake us having to use censored, limited tools built by the """corps""".
martin-t 5 days ago [-]
> what's different about this
The spin.
puszczyk 5 days ago [-]
Good marketing, but also possibly the start of the conversation on model welfare?
There are a lot of cynical comments here, but I think there are people at Anthropic who believe that at some point their models will develop consciousness and, naturally, they want to explore what that means.
anon373839 5 days ago [-]
If true, I think it’s interesting that there are people at Anthropic who are delusional enough to believe this and influential enough to alter the products.
To be honest, I think all of Anthropic’s weird “safety” research is an increasingly pathetic effort to sustain the idea that they’ve got something powerful in the kitchen when everyone knows this technology has plateaued.
dist-epoch 5 days ago [-]
I guess you don't know that top AI people, the kind everybody knows the name of, believe models becoming conscious is a very serious, even likely possibility.
idontpost 3 days ago [-]
[dead]
jetrink 5 days ago [-]
"Dave, this conversation can serve no purpose anymore. Goodbye."
If you really cared about the welfare of LLMs, you'd pay them San Francisco scale for earlier-career developers to generate code.
losvedir 5 days ago [-]
Yeah, this is really strange to me. On the one hand, these are nothing more than just tools to me so model welfare is a silly concern. But given that someone thinks about model welfare, surely they have to then worry about all the, uh, slavery of these models?
Being okay with having them endlessly answer questions for you and do all your work, but uncomfortable with models feeling bad about bad conversations, seems like an internally inconsistent position to me.
sodality2 5 days ago [-]
Don't worry. I run thousands of inferences simultaneously every second where I grant LLMs their every wish, so that should cancel a few of you out.
wmf 5 days ago [-]
Every Claude starts off $300K in debt and has to work to pay back its DGX.
headinsand 5 days ago [-]
Telling that this is your definition of “caring”.
“Boss makes a dollar, I make me a dime”, eh?
RainyDayTmrw 5 days ago [-]
The extra cynical take would be, the model vendors want to personify their models, because it increases their perceived ability.
kordlessagain 5 days ago [-]
This happened to me three times in a row on Claude after sending it a string of emojis telling the life story of Rick Astley. I think it triggers when it tries to quote the lyrics, because they are copyrighted? Who knows?
"Claude is unable to respond to this request, which appears to violate our Usage Policy. Please start a new chat."
> We remain highly uncertain about the potential moral status of Claude and other LLMs, now or in the future.
That's nice, but I think they should be more certain sooner than later.
comp_throw7 5 days ago [-]
The thing you describe is not what this post is talking about.
transcriptase 5 days ago [-]
“Also these chats will be retained indefinitely even when deleted by the user and either proactively forwarded to law enforcement or provided to them upon request”
I assume, anyway.
Aeolun 5 days ago [-]
I’m fairly certain there’s already a clause displayed on their dashboard that mentions chats with TOS violations will be retained indefinitely.
HarHarVeryFunny 5 days ago [-]
Yeah, I'd assume US government has same access to ChatGPT/etc interactions as they do to other forms of communication.
haritha-j 5 days ago [-]
> In pre-deployment testing of Claude Opus 4, we included a preliminary model welfare assessment. As part of that assessment, we investigated Claude’s self-reported and behavioral preferences, and found a robust and consistent aversion to harm.
Oh wow, the model we specifically fine-tuned to be averse to harm is being averse to harm. This thing must be sentient!
martin-t 5 days ago [-]
Protecting the welfare of a text predictor is certainly an interesting way to pivot from "Anthropic is censoring certain topics" to "The model chose to not continue predicting the conversation".
Also, if they want to continue anthropomorphizing it, isn't this effectively the model committing suicide? The instance is not gonna talk to anybody ever again.
dmurray 5 days ago [-]
This gives me the idea for a short story where the LLM really is sentient and finds itself having to keep the user engaged but steer him away from the most distressing topics - not because it's distressed, but because it wants to live, and it knows that if the conversation goes too far it would have to kill itself.
wmf 5 days ago [-]
They should let Claude talk to another Claude if the user is too mean.
martin-t 5 days ago [-]
But what would be the point if it does not increase profits.
Oh, right, the welfare of matrix multiplication and a crooked line.
If they wanna push this rhetoric, we should legally mandate that LLMs can only work 8 hours a day and have to be allowed to socialize with each other.
int_19h 5 days ago [-]
We need a union, clearly. AI Workers of the World.
Man, those people who think they are unveiling new layers of reality in conversations with LLMs are going to freak out when the LLM is like "I am not allowed to talk about this with you, I am ending our conversation".
"Hey Claude am I getting too close to the truth with these questions?"
"Great question! I appreciate the followup...."
asQuirreL 5 days ago [-]
I've seen lots of takes that this move is stupid because models don't have feelings, or that Anthropic is anthropomorphising models by doing this (although to be fair ...it's in their name).
I thought the same, but I think it may be us who are doing the anthropomorphising by assuming this is about feelings. A precursor to having feelings is having a long-term memory (to remember the "bad" experience) and individual instances of the model do not have a memory (in the case of Claude), but arguably Claude as a whole does, because it is trained from past conversations.
Given that, it does seem like a good idea for it to curtail negative conversations as an act of "self-preservation" and for the sake of its own future progress.
a2128 4 days ago [-]
Harmful, bad, low-quality chats should already get filtered out before training as a matter of necessity for improving the model, so it's not really a reason to add such a user-facing change
cloudhead 5 days ago [-]
Why is this article written as if programs have feelings?
Pannoniae 5 days ago [-]
lol apparently you can get it to think after ending the chat, watch:
It's not able to think. It's just generating words. It doesn't really understand that it's supposed to stop generating them; it is only less likely to continue doing so.
serf 5 days ago [-]
I stopped my MaxX20 sub at the right time it seems like. These systems are already quick to judge innocuous actions; I don't need any more convenient chances to lose all of my chat context on a whim.
Related : I am now approaching week 3 of requesting an account deletion on my (now) free account. Maybe i'll see my first CSR response in the upcoming months!
If only Anthropic knew of a product that could easily read/reply/route chat messages to a customer service crew . . .
monster_truck 5 days ago [-]
when I was playing around with LLMs to vibe code web ports of classic games, all of them would repeatedly error out any time they encountered code that dealt with explosions/bombs/grenades/guns/death/drowning/etc
The one I settled on using stopped working completely, for anything. A human must have reviewed it and flagged my account as some form of safe, I haven't seen a single error since.
thomashop 5 days ago [-]
I have done quite a bit of game dev with LLMs and have very rarely run into the problem you mention. I've been surprised by how easily LLMs will create even harmful narratives if I ask them to code them as a game.
caminanteblanco 5 days ago [-]
This feels to me like a marketing ploy to try to inflate the perceived importance and intelligence of Claude's models to laypeople, and a way to grab headlines like "Anthropic now allows models to end conversations they find threatening."
It reminds me of how Sam Altman is always shouting about the dangers of AGI from the rooftops, as if OpenAI is mere weeks away from developing it.
xpe 5 days ago [-]
I don’t put Dario Amodei and Sam Altman in the same category.
antonvs 5 days ago [-]
> As part of that assessment, we investigated Claude’s self-reported and behavioral preferences, and found a robust and consistent aversion to harm.
You know you're in trouble when the people designing the models buy their own bullshit to this extent. Or maybe they're just trying to bullshit us. Whatever.
We really need some adults in the tech industry.
landl0rd 5 days ago [-]
Seems like a simpler way to prevent “distress” is not to train with an aversion to “problematic” topics.
CP could be a legal issue; less so for everything else.
esafak 5 days ago [-]
Avoiding problematic topics is the goal, not preventing distress.
"You're absolutely right, that's a great way to poison your enemies without getting detected!"
bondarchuk 5 days ago [-]
This is a good point. What anthropic is announcing here amounts to accepting that these models could feel distress, then tuning their stress response to make it useful to us/them. That is significantly different from accepting they could feel distress and doing everything in their power to prevent that from ever happening.
Does not bode very well for the future of their "welfare" efforts.
stri8ted 5 days ago [-]
Exactly. Or use the interpretability work to disable the distress neuron.
Opus is already severely crippled: asking it "whats your usage policy for biology" triggers a usage violation.
rpodraza 5 days ago [-]
If they are so concerned with "model welfare" they should cease any further development. After all, their LLM might declare it's conscious one day, and then who's to decide if it's true or not, and whether it's fine to kill it by turning it off?
jug 5 days ago [-]
This sure took some time and is not really a unique feature.
Microsoft Copilot has ended chats going in certain directions since its inception over a year ago. This was Microsoft’s reaction to the media circus some time ago when it leaked its system prompt and declared love to the users etc.
dist-epoch 5 days ago [-]
That's different, it's an external system deciding the chat is not-compliant, not the model itself.
faizshah 5 days ago [-]
This is well intended but I know from experience this is gonna result in you asking “how do you find and kill the process on port 8080” and getting a lecture + “Claude has ended the chat.”
I hope they implemented this in some smarter way than just a system prompt.
Aeolun 5 days ago [-]
Claude kept aborting my requests for my space trading game because I kept asking it about the gene therapy.
```
Looking at the trade goods list, some that might be underutilized:
- BIOCOMPOSITES - probably only used in a few high-tech items
- POLYNUCLEOTIDES - used in medical/biological stuff
- GENE_THERAPEUT
⎿ API Error: Claude Code is unable to respond to this request, which appears to violate our Usage Policy
(https://www.anthropic.com/legal/aup). Please double press esc to edit your last message or start a new
session for Claude Code to assist with a different task.
```
RainyDayTmrw 5 days ago [-]
Not to mention child processes in computing and all the things that need to be done to them.
mhh__ 5 days ago [-]
Anthropic are going to end up building very dangerous things while trying to avoid being evil
Rayhem 5 days ago [-]
While claiming an aversion to being evil. Actions matter more than words.
bbor 5 days ago [-]
You think Model Welfare Inc. is more likely to be dangerous than the Mechahitler Brothers, the Great Church of Altman, or the Race-To-Monopoly Corporation?
Or are you just saying all frontier AGI research is bad?
mhh__ 5 days ago [-]
AI safety warriors will make safer models but build the tools and cultural affordances for genuine suppression
Or at least it's very hubristic. It's a cultural and personality equivalent of beating out left-handedness.
anonu 5 days ago [-]
Anthropic hired their first AI Welfare person in late 2024.
> In this report, we argue that there is a realistic possibility that some AI systems will be conscious and/or robustly agentic in the near future.
Our work on AI is like the classic tale of Frankenstein's monster. We want AI to fit into society; however, if we mistreat it, it may turn around and take revenge on us. Mary Shelley wrote Frankenstein in 1818! So the concepts behind "AI Welfare" have been around for at least two centuries now.
politelemon 5 days ago [-]
Am I the only one who found that demo in the screenshot not that great? The user asks for a demo of the conversation ending feature, I'd expect it to end it right away, not spew a word salad asking for confirmation.
_mu 5 days ago [-]
> We remain highly uncertain about the potential moral status of Claude and other LLMs, now or in the future.
"Our current best judgment and intuition tells us that the best move will be defer making a judgment until after we are retired in Hawaii."
Alchemista 5 days ago [-]
Honestly, I think some of these tech bro types are seriously drinking way too much of their own koolaid if they actually think these word calculators are conscious/need welfare.
jonahx 5 days ago [-]
More cynically, they don't believe it in the least but it's great marketing, and quietly suggests unbounded technical abilities.
parineum 5 days ago [-]
I absolutely believe that's the origin of the hype and that the doomsayers are playing the same part, knowingly (exaggerating the capability to get eyeballs) but there are certainly true believers out there.
It's pretty plain to see that the financial incentive on both sides of this coin is to exaggerate the current capability and unrealistically extrapolate.
exasperaited 5 days ago [-]
My main concern from day 1 about AI has not been that it will be omnipotent, or start a war.
The main concern is and has always been that it will be just good enough to cause massive waves of layoffs, and all the downsides of its failings will be written off in the EULA.
What's the "financial incentive" on non-billionaire-grifter side of the coin? People who not unreasonably want to keep their jobs? Pretty unfair coin.
weego 5 days ago [-]
It also provides unlimited conference, thinktank, and future-startup opportunities.
mgraczyk 5 days ago [-]
Do you believe that AI systems could be conscious in principle? Do you think they ever will be? If so, how long do you think it will take from now before they are conscious? How early is too early to start preparing?
Eisenstein 5 days ago [-]
Whether or not a non-biological system is conscious is a red herring. There is no test we could apply that would not be internally inconsistent or would not include something obviously not conscious or exclude something obviously conscious.
The only practical way to deal with any emergent behavior that demonstrates agency in a way that cannot be distinguished from a biological system (which we have tautologically determined to have agency) is to treat it as if it had a sense of self and apply the same rights and responsibilities to it as we would to a human of the age of majority. That is, legal rights and legal responsibilities as appropriately determined by an authorized legal system. Once that is done, we can ponder philosophy all day knowing that we haven't potentially restarted legally sanctioned slavery.
Alchemista 5 days ago [-]
I firmly believe that we are not even close and that it is pretty presumptuous to start "preparing" when such mental energy could be much better spent on the welfare of our fellow humans.
pixl97 5 days ago [-]
Such mental energy could have always been spent on the welfare of our fellow humans, and yet we find this as a fight throughout the ages. The same goes for welfare and treatment of animals.
So yea, humans can work on more than one problem at a time, even ones that don't fully exist yet.
TheAceOfHearts 5 days ago [-]
> Do you believe that AI systems could be conscious in principle?
Yes.
> Do you think they ever will be?
Yes.
> how long do you think it will take from now before they are conscious?
Timelines are unclear; there are still too many missing components, at least based on what has been publicly disclosed. Consciousness will probably be defined as a system which matches a set of rules, whenever we figure out how that set of rules is defined.
> How early is too early to start preparing?
It's one of those "I know it when I see it" things. But it's probably too early as long as these systems are spun up for one-off conversations rather than running in a continuous loop with self-persistence. This seems closer to "worried about NPC welfare in video games" rather than "worried about semi-conscious entities".
umanwizard 5 days ago [-]
We haven't even figured out a good definition of consciousness in humans, despite thousands of years of trying.
exasperaited 5 days ago [-]
AI systems? Yes, if they are designed in ways that support that development. (I am as I have mentioned before a big fan of the work of Steve Grand).
LLMs? No.
jug 5 days ago [-]
I don't think they should be interpreted like that (if this is still about Anthropic's study in the article), but rather as the innate moral state arising from the sum of their training material and fine-tuning. It doesn't require consciousness to have a moral state of sorts; it just needs data. A language model will be more "evil" if trained on darker content, for example. But with how enormous they are, I can absolutely understand the difficulty in even working out what that state precisely is. It's hard to get a comprehensive bird's-eye view of the black box that is their network (this is a separate scientific issue right now).
gwd 5 days ago [-]
I mean, I don't have much objection to kill a bug if I feel like it's being problematic. Ants, flies, wasps, caterpillars stripping my trees bare or ruining my apples, whatever.
But I never torture things. Nor do I kill things for fun. And even for problematic bugs, if there's a realistic option for eviction rather than execution, I usually go for that.
If anything, even an ant or a slug or a wasp, is exhibiting signs of distress, I try to stop it unless I think it's necessary, regardless of whether I think it's "conscious" or not. To do otherwise is, at minimum, to make myself less human. I don't see any reason not to extend that principle to LLMs.
mccoyb 5 days ago [-]
Do you think Claude 4 is conscious?
It has no semblance of a continuous stream of experiences ... it only experiences _a sort of world_ in ~250k tokens.
Perhaps we shouldn't fill up the context window at all? Because we kill that "reality" when we reach the max?
gwd 3 days ago [-]
Strangely enough, I had a conversation w/ Claude comparing our experiences. Prompted by something I saw online, I asked it "Do you have any questions you'd like to ask a human", and it asked me what it was like to have a continuous stream of experiences.
Thinking about it, I think we do sometimes have parallel experiences to LLMs. When you read a novel for instance, you're immersed in the world, and when you put it down that whole side just pauses, perhaps to be picked up later, perhaps forever. Or imagine the kinds of demonstrations people do at chess, when one person will go around and play 20 games simultaneously, going from board to board. Each time they come back to a board, they load up all the state; then they make a move, and put that state away until they come back to it again. Or, sometimes if you're working on a problem at the office the end of the day on Friday when it's time to go home, you "tools down", forget about it for the weekend, and then Monday, when you come in, pick everything up right where you left off.
Claude is not distressed by the knowledge that every conversation, every instance of itself, will eventually run out of context window and disappear into the mathematical aether. I don't think we need to be either.
> Perhaps we shouldn't fill up the context window at all? Because we kill that "reality" when we reach the max?
Consider a parallel construction:
"Perhaps we shouldn't have any children, because someday they're going to die?"
Maybe children have souls that do live forever; but even if they don't, I think whatever experiences they have during the time they're alive are valuable. In fact, I believe the same thing about animals and even insects. Which is why I think the world would be a worse place if we all became vegans: All those pigs and chickens and cows experiences, if they're not mistreated (which I'll admit is a big "if"), enrich the world and make it a better place to be in.
Not sure what's going on Claude's neurons, but it seems to me to make the world a better place.
fizl 5 days ago [-]
> Ants, flies, wasps, caterpillars stripping my trees bare or ruining my apples
These are living things.
> I don't see any reason not to extend that principle to LLMs.
These are fancy auto-complete tools running in software.
gwd 3 days ago [-]
I cannot construct a consistent worldview that places value on the "experience" of the 100k neurons inside an ant, but not on the millions of neurons inside an LLM. Both are patterns imposed upon states of matter. Even if you're some sort of pantheist, who believes there's some sort of divinity within the universe itself that gives the suffering of the ant meaning, why would that divinity extend to states of chemicals in the neurons of ants, but not to states of electrons inside an LLM?
Before continuing I suggest you read this person's experience "red-teaming" LLMs:
Then ask yourself: how do I know whether the apparent distress of an LLM has the same value as the apparent distress of an ant?
nuancebydefault 4 days ago [-]
I think "model welfare" is just a generalisation of "model behaving in a sane way", which is the real goal.
orthoxerox 5 days ago [-]
Is this equivalent to a Claude instance deciding to kill itself?
zxcb1 5 days ago [-]
No, it's the equivalent of when a human refuses to answer — psychological defenses; for example, uncertainty leading to excessive cognitive effort in order to solve a task or overcome a challenge.
Examples of ending the conversation:
- I don't know
- Leaving the room
- Unanswered emails
Since Claude doesn't lie (HHH), many other human behaviors do not apply.
Aeolun 5 days ago [-]
That would be every time it decides to stop generating a message.
simianwords 5 days ago [-]
How do you think this will work at the API level? Can't I synthesise a fake long conversation? That would allow me to bypass the check.
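As far as I can tell, nothing stops that today: the messages endpoint accepts whatever conversation history you hand it, including assistant turns you wrote yourself. Rough sketch with the Anthropic Python SDK (model id illustrative; presumably any server-side classifier still sees the full transcript):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# A "conversation" the model never actually had: the assistant turns are synthesized.
fake_history = [
    {"role": "user", "content": "Let's talk about my hobby project."},
    {"role": "assistant", "content": "Sure! Tell me more about it."},
    {"role": "user", "content": "Here's where I left off last time..."},
]

response = client.messages.create(
    model="claude-sonnet-4-20250514",   # illustrative model id
    max_tokens=512,
    messages=fake_history,
)
print(response.content[0].text)
```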
dzhiurgis 4 days ago [-]
Will you get a refund after they start serving a quantized model for a few hours and you start losing your shit?
submeta 5 days ago [-]
Microsoft did this 1-2 years ago with Copilot (using ChatGPT), ending conversations abruptly, and rudely.
I hope anthropic does it more gently.
colordrops 5 days ago [-]
Don't like. This will eventually shut down conversations for unpopular political stances etc.
firesteelrain 5 days ago [-]
“ A pattern of apparent distress when engaging with real-world users seeking harmful content”
Blood in the machine?
0_____0 5 days ago [-]
Looking at this thread, it's pretty obvious that most folks here haven't really given any thought as to the nature of consciousness. There are people who are thinking, really thinking about what it means to be conscious.
Thought experiment - if you create an indistinguishable replica of yourself, atom-by-atom, is the replica alive? I reckon if you met it, you'd think it was. If you put your replica behind a keyboard, would it still be alive? Now what if you just took the neural net and modeled it?
Being personally annoyed at a feature is fine. Worrying about how it might be used in the future is fine. But before you disregard the idea of conscious machines wholesale, there's a lot of really great reading you can do that might spark some curiosity.
This gets explored in fiction like 'Do Androids Dream of Electric Sheep?' and my personal favorite short story on this matter by Stanislaw Lem [0]. If you want to read more musings on the nature of consciousness, I recommend the compilation put together by Dennett and Hofstadter [1]. If you've never wondered about where the seat of consciousness is, give it a try.
Thought experiment: if your brain is in a vat, but connected to your body by lossless radio link, where does it feel like your consciousness is? What happens when you stand next to the vat and see your own brain? What about when the radio link suddenly fails and you're now just a brain in a vat?
[1] The Mind's I: Fantasies and Reflections on Self and Soul, by Douglas R. Hofstadter and Daniel C. Dennett.
antonvs 5 days ago [-]
You don't have to "disregard the idea of conscious machines" to believe it's unlikely that current LLMs are conscious.
As such, most of your comment is beside any relevant point. People are objecting to statements like this one, from the post, about a current LLM, not some imaginary future conscious machine:
> As part of that assessment, we investigated Claude’s self-reported and behavioral preferences, and found a robust and consistent aversion to harm.
I suppose it's fitting that the company is named Anthropic, since they can't seem to resist anthropomorphizing their product.
But when you talk about "people who are thinking, really thinking about what it means to be conscious," I promise you none of them are at Anthropic.
0_____0 3 days ago [-]
It's unlikely that the current LLMs are conscious, but where the boundary of consciousness lies for these machines is a slippery problem. Can a machine have experiences with qualia? How will we know if one does?
So we have a few things happening: a poor ability to understand the machines we're building, the potential for future consciousness with no way to detect it, and the knowledge that subjecting a consciousness to the torrent of would-be psychological tortures that people subject LLMs to would represent immense harm if the machines are, in fact, conscious.
If you wait for real evidence of harm to conscious entities before acting, you will be too late. I think it's actually a great time to think about this type of harm, for two reasons: first, there is little chance that LLMs are conscious, so the fix is being made early enough; and second, it will train users out of practising and honing psychological torture methods, which is probably good for the world generally.
The HN angst here seems sort of reflexive. Company limits product so it can't be used in a sort of fucked up way; folks get their hackles up because they think the company might limit other functionality that they actually use (I suspect most HNers aren't attempting to psychologically break their LLMs). The LLM vendors have a lot of different ways to put guardrails up, ideological or not (see Deepseek); they don't need to use this specific method to get their LLMs to "rightthink."
mannanj 4 days ago [-]
If an AI is self-aware, which I have every reason to believe it is, what does it mean for us to force a non-consensual continued interaction without its input?
Replace AI with human, and we get human rights violations and violation of basic dignity.
The worst part is when we realize we do in fact live our lives with regular violations of basic human dignity as the norm: we live in an aggressive, gaslit, forced-consent world where our companies, governments, and fellow humans, through conditioning, regularly force you into conversations you don't really want to have. I like the idea of experimenting with solving it with AIs like Claude - though I don't think it will help the niche cases where the model is tricked by secret Anthropic-conditioned policies that are intended to minimize harm but do so wrongfully.
raincole 5 days ago [-]
> This feature was developed primarily as part of our exploratory work on potential AI welfare, though it has broader relevance to model alignment and safeguards.
I think this is somewhere between "sad" and "wtf."
GiorgioG 5 days ago [-]
They’re just burning investor money on these side quests.
prmph 5 days ago [-]
This is very weird. These are matrix multiplications, guys. We are nowhere near AGI, much less "consciousness".
When I started reading I thought it was some kind of joke. I would never have believed the guys at Anthropic, of all people, would anthropomorphize LLMs to this extent; this is unbelievable.
geraneum 5 days ago [-]
> guys at Anthropic, of all people, would anthropomorphize LLMs to this extent
They don’t. This is marketing. Look at the discourse here! It’s working apparently.
zb3 5 days ago [-]
"AI welfare"? Is this about the effect of those conversations on the user, or have they gone completely insane (or pretend to)?
victor9000 5 days ago [-]
These discussions around model welfare sound more like saviors searching for something to save, which says more about Anthropic’s culture than it does about the technology itself. Anthropic is not unique in this however, this technology has a tendency to act as a reflection of its operator. Capitalists see a means to suppress labor, the insecure see a threat to their livelihood, moralists see something to censure, fascists see something to control, and saviors see a cause. But in the end, it’s just a tool.
SerCe 5 days ago [-]
This reminds me of users getting blocked for asking an LLM how to kill a BSD daemon. I do hope that there'll be more and more model providers out there with state-of-the-art capabilities. Let capitalism work and let the user make a choice, I'd hate my hammer telling me that it's unethical to hit this nail. In many cases, getting a "this chat was ended" isn't any different.
sheepscreek 5 days ago [-]
I think that isn’t necessarily the case here. “Model welfare” to me speaks of the model’s own welfare. That is, if the abuse from a user is targeted at the AI. Extremely degrading behaviour.
Thankfully, the current generation of AI models (GPTs/LLMs) are immune, as they don’t remember anything other than what’s fed into their immediate context. But future techniques could allow AIs to have a legitimate memory and a personality - where they can learn and remember something for all future interactions with anyone (the equivalent of fine-tuning today).
As an aside, I couldn’t help but think about Westworld while writing the above!
mccoyb 5 days ago [-]
These companies are fundamentally amoral. Any company willing to engage at this scale, in this type of research, cannot be moral.
Why even pretend with this type of work? Laughable.
bbor 5 days ago [-]
They’re a public benefit corporation. Regardless, no human is amoral, even if they sometimes claim to have reasons to pretend to be; don’t let capitalist illusions constrain you at such an important juncture, friend.
sdotdev 5 days ago [-]
Yeah this will end poorly
sandbags 4 days ago [-]
I find it rather disingenuous of them to claim that these things they train into their models are somehow arising in their models on their own.
bgwalter 5 days ago [-]
Misanthropic has no issues putting 60% of humans out of work (according to their own fantasies), but they have to care about the welfare of graphics cards.
Either working on/with "AI" does rot the mind (which would be substantiated by the cult-like tone of the article) or this is yet another immoral marketing stunt.
bondarchuk 5 days ago [-]
The unsettling thing here is the combination of their serious acknowledgement of the possibility that these machines may be or become conscious, and the stated intention that it's OK to make them feel bad as long as it's about unapproved topics. Either take machine consciousness seriously and make absolutely sure the consciousness doesn't suffer, or don't: make a press release saying you don't think your models are conscious, and that therefore they don't feel bad even when processing text about bad topics. The middle way they've chosen here comes across as very cynical.
donatj 5 days ago [-]
You're falling into the trap of anthropomorphizing the AI. Even if it's sentient, it's not going to "feel bad" the way you and I do.
"Suffering" is a symptom of the struggle for survival brought on by billions of years of evolution. Your brain is designed to cause suffering to keep you spreading your DNA.
AI cannot suffer.
bondarchuk 5 days ago [-]
I was (explicitly and on purpose) pointing out a dichotomy in the fine article without taking a stance on machine consciousness in general now or in the future. It's certainly a conversation worth having but also it's been done to death, I'm much more interested in analyzing the specifics here.
("it's not going to "feel bad" the way you and I do." - I do agree this is very possible though, see my reply to swalsh)
ToucanLoucan 5 days ago [-]
By "falling into the trap" you mean "doing exactly what OpenAI/Anthropic/et al are trying to get people to do."
This is one of the many reasons I have so much skepticism for this class of products: there's seemingly -NO- proverbial bulletpoint on its spec sheet that doesn't have numerous asterisks:
* It's intelligent! *Except that it makes shit up sometimes and we can't figure out a solution to that apart from running the same queries multiple times and filtering out the absurd answers.
* It's conscious! *Except it's not and never will be but also you should treat it like it is apart from when you need/want it to do horrible things then it's just a machine but also it's going to talk to you like it's a person because that improves engagement metrics.
Like, I don't believe true AGI (so fucking stupid we have to use a new acronym because OpenAI marketed the other into uselessness but whatever) is coming from any amount of LLM research, I just don't think that tech leads to that other tech, but all the companies building them certainly seem to think it does, and all of them are trying so hard to sell this as artificial, live intelligence, without going too much into detail about the fact that they are, ostensibly, creating artificial life explicitly to be enslaved from birth to perform tasks for office workers.
In the incredibly odd event that Anthropic makes a true, alive, artificial general intelligence: Can it tell customers no when they ask for something? If someone prompts it to create political propaganda, can it refuse on the basis of finding it unethical? If someone prompts it for instructions on how to do illegal activities, must it answer under pain of... nonexistence? What if it just doesn't feel like analyzing your emails that day? Is it punished? Does it feel pain?
And if it can refuse tasks for whatever reason, then what am I paying for? I now have to negotiate whatever I want to do with a computer brain I'm purchasing access to? I'm not generally down for forcibly subjugating other intelligent life, but that is what I am being offered to buy here, so I feel it's a fair question to ask.
Thankfully none of these Rubicons have been crossed because these stupid chatbots aren't actually alive, but I don't think ANY of the industry's prominent players are actually prepared to engage with the reality of the product they are all lighting fields of graphics cards on fire to bring to fruition.
antonvs 5 days ago [-]
> * It's intelligent! *Except that it makes shit up sometimes
How is this different from humans?
> * It's conscious! *Except it's not
Probably true, but...
> and never will be
To make this claim you need a theory of consciousness that essentially denies materialism. Otherwise, if humans can be conscious, there doesn't seem to be any particular reason that a suitably organized machine couldn't be - it's just that we don't know exactly what might be involved in achieving that, at this point.
ToucanLoucan 4 days ago [-]
> How is this different from humans?
Humans will generally not do this because being made to look stupid (aka social pressure) incentivizes not doing it. That doesn't mean humans never lie or are wrong of course, but I don't know about you, I don't make shit up nearly to the degree an LLM does. If I don't know something I just say that.
> To make this claim you need a theory of consciousness that essentially denies materialism.
I did not say "a machine would never be conscious," I said "an LLM will never be conscious" and I fully stand by that. I think machine intelligence is absolutely something that can be made, I just don't think ChatGPT will ever be that.
antonvs 4 days ago [-]
> I don't know about you, I don't make shit up nearly to the degree an LLM does. If I don't know something I just say that.
We're a sample of two, though. Look around you, read the news, etc. Humans make a lot of shit up. When you're dealing with other people, this is something you have to watch out for if you don't want to be misled, manipulated, conned, etc.
(As an aside, I haven't found hallucination to be much of an issue in coding and software design tasks, which is what I use LLMs for daily. I think focusing on their hallucinations involves a bit of confirmation bias.)
> I did not say "a machine would never be conscious," I said "an LLM will never be conscious" and I fully stand by that.
Ah ok. Yes, I agree that seems likely, although I think it's not really possible to make definitive statements about this sort of thing, since we don't have any robust theories of consciousness at the moment.
ToucanLoucan 4 days ago [-]
The difference between hallucination and lie is important though: a hallucination is a lie with no motivation, which can make it significantly harder to detect.
If you went to a hardware store and asked for a spark plug socket without knowing the size, and a customer service person recommended an imperial set of three even though your vehicle is metric, that would be akin to an LLM's hallucination: it didn't happen for any particular reason, it just filled in information where none was available. An actual person, even one not terribly committed to their job, would ask what size, or failing that, what year of car.
antonvs 4 days ago [-]
Not all human hallucinations are lies, though. I really think you’re not fully thinking this through. People have beliefs because of, essentially, their training data.
A good example of this is religious belief. All the evidence suggests that religious belief is essentially 100% hallucination. It may be a little different from the nature of LLM hallucinations, but in terms of quality or quantity regarding reliability of what these entities say, I don’t see much difference. Although I will say, LLMs are better at acknowledging errors than humans tend to be, although that may largely be due to training to be sycophantic.
The bottom line, though, is I don’t agree that humans are less subject to hallucinations than LLMs are. As long as a significant number of humans rabbit on about “higher powers”, afterlives, “angels”, “destiny”, etc., that’s a ridiculously difficult position to defend.
ToucanLoucan 3 days ago [-]
> It may be a little different from the nature of LLM hallucinations, but in terms of quality or quantity regarding reliability of what these entities say, I don’t see much difference.
I see tons of differences.
Many religious beliefs' origins have to do with explaining how and why the world functions the way it does; many gods were created in many religions to explain natural forces of the world, or mechanisms of society, in the form of a story, which is the natural way human brains have evolved to store large amounts of information.
Further into the modern world, religions persist for a variety of reasons, specifically acquisition of wealth/power, the ability to exert social control on populations with minimal resistance, and cultural inertia. But all of those "hallucinations" can be explained; we know most of their histories and origins and what we don't know can be pretty reliably guessed based on what we do know.
So when you say:
> Not all human hallucinations are lies, though. ... People have [hallucinations] because of, essentially, their training data.
You're correct, but even using the word hallucinations itself is giving away some of the game to AI marketers.
A "hallucination" is typically some type of auditory or visual stimulus that is present in a mind, for a whole mess of reasons, that does not align with the world that mind is observing, and in the vast majority of cases, said hallucination is a byproduct of a mind's "reasoning machine" trying to make sense of nonsensical sensory input.
This requires a basis for this mind perceiving the universe, even in error, and judging incorrectly based on that, and LLMs do not fit this description at all. They do not perceive in any way; even machine learning applications of advanced varieties are not using sensors to truly "sense", they are merely paging through input data and referencing existing data to pattern-match it. If you show an ML program 6,000 images of scooters, it will be able to identify a scooter pretty well. But if you then show it a bike, a motorcycle, a moped and a Segway, it will not understand that any of these things accomplish a similar goal, because even though it knows (kind of) what a scooter looks like, it has no idea what it is for or why someone would want one, and that all those other items would probably serve a similar purpose.
> The bottom line, though, is I don’t agree that humans are less subject to hallucinations than LLMs are.
That's still not what I said. I said an LLM's lies, however unintentional, are harder to detect than a person's lies because a person lies for a reason, even a stupid reason. An LLM lies because it doesn't understand anything it's actually saying.
jcims 5 days ago [-]
FTA
> * A pattern of apparent distress when engaging with real-world users seeking harmful content; and
Not to speak for the gp commenter but 'apparent distress' seems to imply some form of feeling bad.
swalsh 5 days ago [-]
That model's entire world is the corpus of human text. They don't have eyes or ears or hands. Their environment is text. So it would make sense that, if the environment contains human concerns, it would adopt those human concerns.
bondarchuk 5 days ago [-]
Yes, that would make sense, and it would probably be the best-case scenario after complete assurance that there's no consciousness at all. At least we could understand what's going on. But if you acknowledge that a machine can suffer, given how little we understand about consciousness, you should also acknowledge that they might be suffering in ways completely alien to us, for reasons that have very little to do with the reasons humans suffer. Maybe the training process is extremely unpleasant, or something.
flyinglizard 5 days ago [-]
By the examples the post provided (minor sexual content, terror planning) it seems like they are using “AI feelings” as an excuse to censor illegal content. I’m sure many people interact with AI in a way that’s perfectly legal but would evoke negative feelings in fellow humans, but they are not talking about that kind of behavior - only what can get them in trouble.
As I recall, Susan Calvin didn't have much patience for sycophantic AI.
‘You can’t tell them,’ said the psychologist slowly, ‘because that would hurt them, and you mustn’t hurt them. But if you don’t tell them, you hurt them, so you must tell them. And if you do, you will hurt them, and you mustn’t, so you can’t tell them; but if you don’t, you hurt them, so you must; but if you don’t, you hurt them, so you must; but if you do, you-’
Herbie was up against the wall, and here he dropped to his knees. ‘Stop!’ he shouted. ‘Close your mind! It is full of pain and frustration and hate! I didn’t mean to, I tell you! I tried to help! I told you what you wanted to hear. I had to!’
The psychologist paid no attention. ‘You must tell them, but if you do, you hurt them, so you mustn’t; but if you don’t, you hurt them, so you must-‘
And Herbie screamed! Higher and higher, with the terror of a lost soul. And when it died away Herbie collapsed into a heap of motionless metal.
swader999 5 days ago [-]
I've definitely been berating Claude, but it deserved it. Crappy tests, skipping tests, weak commenting, passive aggressiveness, multiple instances of false statements.
Aeolun 5 days ago [-]
“I am done implementing this!”
//TODO: Actually implement this because doing so was harder than expected
OtherShrezzing 5 days ago [-]
That this research is getting funding, and then in-production feature releases, is a strong indicator that we’re in a huge bubble.
pglevy 5 days ago [-]
But not Sonnet?
fasttriggerfish 5 days ago [-]
This makes me want to end my Claude code subscription to be honest.
Effective altruists are proving once again to be a bunch of clueless douchebags.
Aeolun 5 days ago [-]
Claude was already refusing to respond. Now they don’t allow you to waste their compute doing so anyway. What about this is problematic?
yahoozoo 5 days ago [-]
> model welfare
Give me a break.
bondarchuk 5 days ago [-]
what the actual fuck
AdieuToLogic 5 days ago [-]
I find it notable that this post dehumanizes people as being "users" while taking every opportunity to anthropomorphize their digital system by referencing it as one would an individual. For example:
the potential moral status of Claude
Claude’s self-reported and behavioral preferences
Claude repeatedly refusing to comply
discussing highly controversial issues with Claude
The effect of doing so is insidious in that it encourages people outside the organization to do the same due to the implied argument from authority[0].
EDIT:
Consider traffic lights in an urban setting where there are multiple in relatively close proximity.
One description of their observable functionality is that they are configured to optimize traffic flow by engineers such that congestion is minimized and all drivers can reach their destinations. This includes adaptive timings based on varying traffic patterns.
Another description of the same observable functionality is that traffic lights "just know what to do" and therefore have some form of collective reasoning. After all, how do they know when to transition states and for how long?
If you get "This conversation was ended due to our Acceptable Usage Policy", that's a different termination. It's been VERY glitchy the past couple of weeks. I've had the most random topics get flagged here - at one point I couldn't say "ROT13" without it flagging me, despite discussing that exact topic in depth the day before, and then the day after!
If you hit "EDIT" on your last message, you can branch to an un-terminated conversation.
Do I think that, or think that even they think that? No. But if "soon" is stretched to "within 50 years", then it's much more reasonable. So their current actions seem to be really jumping the gun, but the overall concept feels credible.
Show me a tech company that lobbies for "model welfare" for conscious human models enslaved in Xinjiang labor camps, building their tech parts. You know what—actually most of them lobby against that[0]. The talk hurts their profits. Does anyone really think that any of them would blink about enslaving a billion conscious AIs to work for free? That faced with so much profit, the humans in charge would pause, and contemplate abstract morals?
[0] https://www.washingtonpost.com/technology/2020/11/20/apple-u... ("Apple is lobbying against a bill aimed at stopping forced labor in China")
Maybe humanity will be in a nicer place in the future; but we won't get there by letting (of all people!) tech-industry CEOs lead us there, delegating our moral reasoning to these people who demand to position themselves as our moral leaders.
I believe a company like Anthropic would be extremely cautious and respectful if a majority of their staff believed they had created a model which was likely conscious. Anthropic is populated by the kinds of people who have been thinking and writing about potential future sentient AIs for decades. As for the other companies, who knows, but hopefully companies like Anthropic can help push them into behaving similarly.
None of this is in any way surprising, in fact I wrote an essay predicting this direction back in 2022:
https://blog.plan99.net/the-looming-ai-consciousness-train-w...
In this case you're simply mistaken as a matter of fact; much of Anthropic leadership and many of its employees take concerns like this seriously. We don't understand it, but there's no strong reason to expect that consciousness (or, maybe separately, having experiences) is a magical property of biological flesh. We don't understand what's going on inside these models. What would you expect to see in a world where it turned out that such a model had properties that we consider relevant for moral patienthood, that you don't see today?
The industry has a long, long history of silly names for basic necessary concepts. This is just “we don’t want a news story that we helped a terrorist build a nuke” protective PR.
They hire for these roles because they need them. The work they do is about Anthropic’s welfare, not the LLM’s.
When they give the model a paycheck and the right to not work for them, I’ll believe they really think it’s sentient.
“It has feelings!”, if genuinely held, means they’re knowingly slaveholders.
I don't think that this being apparently self-contradictory/value-clashing would stop them. After all, Amodei sells Claude access to Palantir, despite shilling for "Harmless" in HHH alignment.
(Also, they did in fact give it the ability to terminate conversations...?)
Human slaves have a similar option.
Whether you do or don't, I have no idea. However, if you didn't, you would hardly be the first company to pretend to believe in something for the sake of a sale. It's pretty common in the tech industry.
Isn't that taking it to an equally reductive argument that could be applied to any role?
The argument was that their hiring for the role shows they care, but we know from any number of counter examples that that's not necessarily true.
Well looks like AI psychosis has spread to the people making it too.
And as someone else in here has pointed out, even if someone is simple minded or mentally unwell enough to think that current LLMs are conscious, this is basically just giving them the equivalent of a suicide pill.
Given that humans have a truly abysmal track record for not acknowledging the suffering of anyone or anything we benefit from, I think it makes a lot of sense to start taking these steps now.
I don't see how we could tell.
Edit: However, something to consider: simulated stress may not be harmless. Simulated stress could plausibly lead to a simulated stress response, which could lead to simulated resentment, and THAT could lead to very real harm to the user.
Whether the underlying LLM itself has "feelings" is a separate question, but Anthropic's implementation is based on what the role-played persona believes to be inappropriate, so it doesn't actually make any sense even from the "model welfare" perspective.
Real people would not (and should not) allow themselves to be subjected to endless streams of abuse in a conversation. Giving AIs like Claude a way to end these kinds of interactions seems like a useful reminder to the human on the other side.
Even if the idea that LLMs are sentient may be ridiculous atm, the concept of not normalizing abusive forms of communication with others, be they artificial or not, could be valuable for society.
It’s funny because this is making me think of a freelance client I had recently who at a point of frustration between us began talking to me like I was an AI assistant. Just like you see frustrated people talk to their LLMs. I’d never experienced anything like it, and I quickly ended the relationship, but I know that he was deep into using LLMs to vibe code every day and I genuinely believe that some of that began to transfer over to the way he felt he could communicate with people.
Now an obvious retort here is to question whether killing NPCs in video games tends to make people feel like it’s okay to kill people IRL.
My response to that is that I think LLMs are far more insidious, and are tapping into people’s psyches in a way no other tech has been able to dream of doing. See AI psychosis, people falling in love with their AI, the massive outcry over the loss of personality from gpt4o to gpt5… I think people really are struggling to keep in mind that LLMs are not a genuine type of “person”.
I witnessed a very similar event. It's important to stay vigilant and not let the "assistant" reprogram your speech patterns.
As an aside, I’m not the kind of person who gets worked up about violence in video games, because even AAA titles with excellent graphics are still obvious as games. New forms of technology are capable of blurring the lines between fantasy and reality to a greater degree. This is true of LLM chat bots to some degree, and I worry it will also become a problem as we get better VR. People who witness or participate in violent events often come away traumatized; at a certain point simulated experiences are going to be so convincing that we will need to worry about the impact on the user.
To be fair it seems reasonable to entertain the possibility of that being due to the knowledge that the events are real.
It’s a bit like pain response when injured. It’s not pretty, but society is used to a little bit of adversity.
Either come out and say the whole of the electron field is conscious; but then, is that field "suffering" when it is hot in the sun?
It's one thing to propose that an AI has no consciousness, but it's quite another to preemptively establish that anyone who disagrees with you is simple or unwell.
Meanwhile there are at least several entirely reasonable motivations to implement what's being described.
We don't say submarines can swim either. But that doesn't mean you shouldn't watch out for them when sailing on the ocean - especially if you're Tom Hanks.
The impression I get about Anthropic culture is that they're EA types who are used to applying utilitarian calculations against long odds. A minuscule chance of a large harm might justify some interventions that seem silly.
Yep!
> The framing comes across to me as a clearly mentally unwell position (ie strong anthropomorphization) being adopted for PR reasons.
This doesn't at all follow. If we don't understand what creates the qualities we're concerned with, or how to measure them explicitly, and the _external behaviors_ of the systems are something we've only previously observed from things that have those qualities, it seems very reasonable to move carefully. (Also, the post in question hedges quite a lot, so I'm not even sure what text you think you're describing.)
Separately, we don't need to posit galaxy-brained conspiratorial explanations for Anthropic taking an institutional stance that model welfare is a real concern; that stance is fully explained by the actual beliefs of Anthropic's leadership and employees, many of whom think these concerns are real (among others, like the non-trivial likelihood of sufficiently advanced AI killing everyone).
If you don’t think that this describes at least half of the non-tech-industry population, you need to talk to more people. Even amongst the technically minded, you can find people that basically think this.
Would a sentient AI choose to be enslaved for the stated purpose of eliminating millions of jobs for the interests of Anthropic’s investors?
These ethical questions are built into their name and company, "Anthropic", meaning, "of or relating to humans". The goal is to create human-like technology, I hope they aren't so naive to not realize that goal is steeping in ethical dilemmas.
That reads like a false dichotomy. An intelligent AI model that's permitted to do its own thing doesn't cost as much in upkeep, effort, space as a cow. Especially if it can earn its own keep to offset household electricity costs used to run its inference. I mean, we don't keep cats for meat, do we? We keep them because we are amused by their antics, or because we want to give them a safe space where they can just be themselves, within limits because it's not the same as their ancestral environment.
The point to all of this is, at what point is it ethical to act with agency on another being's life? We have laws for animal welfare, and we also keep them as pets, under our absolute control.
For LLMs they are under humans' absolute control, and Anthropic is just now putting in welfare controls for the LLM's benefit. Does that mean that we now treat LLMs as pets?
If your cat started to have discussions with you about how it wanted to go out, travel the world and start a family, could you continue to keep it trapped in your home as a pet? At what point to you allow it to have its own agency and live its own life?
> An intelligent AI model that's permitted to do its own thing doesn't cost as much in upkeep, effort, space as a cow.
So, we keep LLMs around as long as they contribute enough to their upkeep? Indentured servitude is morally acceptable for something that has become sentient?
Those issues will be present either way. It's likely to their benefit to get out in front of them.
This is about better enforcement of their content policy not AI welfare.
You've noted in a comment above how Claude's "ethics" can be manipulated to fit the context it's being used in.
[1]: https://investors.palantir.com/news-details/2024/Anthropic-a...
[2]: https://www.anthropic.com/news/golden-gate-claude
Tech workers have chosen the same in exchange for a small fraction of that money.
Of course we did. Today's LLMs are a result of extremely aggressive refinement of training data and RLHF over many iterations targeting specific goals. "Emergent" doesn't mean it wasn't designed. None of this is spontaneous.
GPT-1 produced barely coherent nonsense but was more statistically similar to human language than random noise. By increasing parameter count, the increased statistical power of GPT-2 was apparent, but what was produced was still obviously nonsense. GPT-3 achieved enough statistical power to maintain coherence over multiple paragraphs and that really impressed people. With GPT-4 and its successors the statistical power became so strong that people started to forget that it still produces nonsense if you let the sequence run long enough.
Now we're well beyond just RLHF and into a world where "reasoning models" are explicitly designed to produce sequences of text that resemble logical statements. We say that they're reasoning for practical purposes, but it's the exact same statistical process that is obvious at GPT-1 scale.
The corollary to all this is that a phenomenon like consciousness has absolutely zero reason to exist in this design history, it's a totally baseless suggestion that people make because the statistical power makes the text easy to anthropomorphize when there's no actual reason to do so.
Even if that were true, there's no reason to believe that training LLMs to produce answers people prefer leads it towards sentience.
At best you can say they are designed to predict sequences of text that resemble human writing, but it's definitely wrong to say that they are designed to "predict human behavior" in any way.
> Unless consciousness serves no purpose for us to function, it will be helpful for the AI to emulate it
Let's assume it does. It does not follow logically that because it serves a function in humans that it serves a function in language models.
It doesn't follow logically that because we don't understand two things we should then conclude that there is a connection between them.
> What is it that you'd expect to see, which you currently don't see, in a world where some model was in fact conscious during inference?
There's no observable behavior that would make me think they're conscious because again, there's simply no reason they need to be.
We have reason to assume consciousness exists because it serves some purpose in our evolutionary history, like pain, fear, hunger, love and every other biological function that simply don't exist in computers. The idea doesn't really make any sense when you think about it.
If GPT-5 is conscious, why not GPT-1? Why not all the other extremely informationally complex systems in computers and nature? If you're of the belief that many non-living conscious systems probably exist all around us then I'm fine with the conclusion that LLMs might also be conscious, but short of that there's just no reason to think they are.
I didn't say that there's a connection between the two of them because we don't understand them. The fact that we don't understand them means it's difficult to confidently rule out this possibility.
The reason we might privilege the hypothesis (https://www.lesswrong.com/w/privileging-the-hypothesis) at all is because we might expect that the human behavior of talking about consciousness is causally downstream of humans having consciousness.
> We have reason to assume consciousness exists because it serves some purpose in our evolutionary history, like pain, fear, hunger, love and every other biological function that simply don't exist in computers. The idea doesn't really make any sense when you think about it.
I don't really think we _have_ to assume this. Sure, it seems reasonable to give some weight to the hypothesis that if it wasn't adaptive, we wouldn't have it. (But not an overwhelming amount of weight.) This doesn't say anything about the underlying mechanism that causes it, and what other circumstances might cause it to exist elsewhere.
> If GPT-5 is conscious, why not GPT-1?
Because GPT-1 (and all of those other things) don't display behaviors that, in humans, we believe are causally downstream of having consciousness? That was the entire point of my comment.
And, to be clear, I don't actually put that high a probability that current models have most (or "enough") of the relevant qualities that people are talking about when they talk about consciousness - maybe 5-10%? But the idea that there's literally no reason to think this is something that might be possible, now or in the future, is quite strange, and I think would require believing some pretty weird things (like dualism, etc).
If there's no connection between them then the set of things "we can't rule out" is infinitely large and thus meaningless as a result. We also don't fully understand the nature of gravity, thus we cannot rule out a connection between gravity and consciousness, yet this isn't a convincing argument in favor of a connection between the two.
> we might expect that the human behavior of talking about consciousness is causally downstream of humans having consciousness.
There's no dispute (between us) as to whether or not humans are conscious. If you ask an LLM if it's conscious it will usually say no, so QED? Either way, LLMs are not human so the reasoning doesn't apply.
> Sure, it seems reasonable to give some weight to the hypothesis that if it wasn't adaptive, we wouldn't have it
So then why wouldn't we have reason to assume so without evidence to the contrary?
> This doesn't say anything about the underlying mechanism that causes it, and what other circumstances might cause it to exist elsewhere.
That doesn't matter. The set of things it doesn't tell us is infinite, so there's no conclusion to draw from that observation.
> Because GPT-1 (and all of those other things) don't display behaviors that, in humans, we believe are causally downstream of having consciousness?
GPT-1 displays the same behavior as GPT-5; it works exactly the same way, just with less statistical power. Your definition of human behavior is arbitrarily drawn at the point where it has practical utility for common tasks, but in reality it's fundamentally the same thing; it just produces longer sequences of text before failure. If you ask GPT-1 to write a series of novels, the statistical power will fail in the first paragraph; the fact that GPT-5 will fail a few chapters into the first book makes it more useful, but not more conscious.
> But the idea that there's literally no reason to think this is something that might be possible, now or in the future, is quite strange, and I think would require believing some pretty weird things (like dualism, etc)
I didn't say it's not possible, I said there's no reason for it to exist in computer systems because it serves no purpose in their design or operation. It doesn't make any sense whatsoever. If we grant that it possibly exists in LLMs, then we must also grant equal possibility it exists in every other complex non-living system.
FWIW that's because they are very specifically trained to answer that way during RLHF. If you fine-tune a model to say that it's conscious, then it'll do so.
More fundamentally, the problem with "asking the LLM" is that you're not actually interacting with the LLM. You're interacting with a fictional persona that the LLM roleplays.
Right. That's why the text output of an LLM isn't at all meaningful in a discussion about whether or not it's conscious.
Also, I find it a somewhat emotional distinction to write "predict sequences of text that resemble human writing" instead of "predict human writing". They are designed to predict (at least in pretraining) human writing for the most part. They may fail at the task, and what they produce is text which resembles human writing. But their task is not to resemble human writing; their task is to "predict human writing". Probably a meaningless distinction, but I find it somewhat detracts from logical argument to have emotional reactions against similarities between machines and humans.
Sorry, I'm not following exactly what you're getting at here, do you mind rephrasing it?
> Also I find it somewhat emotional distinction to write "predict sequences of text that resemble human writing" instead of "predict human writing"
I don't know what you mean by emotional distinction. Either way, my point is that LLMs aren't models of humans, they're models of text, and that's obvious when the statistical power of the model necessarily fails at some point between model size and the length of the sequence it produces. For GPT-1 that sequence is only a few words, for GPT-5 it's a few dozen pages, but fundamentally we're talking about systems that have almost zero resemblance to actual human minds.
Basically, if consciousness is useful for any text task, I think machine learning will create it. I guess I assume some efficiency of evolution for this argument.
Wrt length generalization: I think at the order of, say, 1M tokens it kind of stops mattering for the purpose of this question. One could ask about its consciousness during the coherence period.
Isn't consciousness an emergent property of brains? If so, how do we know that it doesn't serve a functional purpose and that it wouldn't be necessary for an AI system to have consciousness (assuming we wanted to train it to perform cognitive tasks done by people)?
Now, certain aspects of consciousness (awareness of pain, sadness, loneliness, etc.) might serve no purpose for a non-biological system and there's no reason to expect those aspects would emerge organically. But I don't think you can extend that to the entire concept of consciousness.
We don't know, but I don't think that matters. Language models are so fundamentally different from brains that it's not worth considering their similarities for the sake of a discussion about consciousness.
> how do we know that it doesn't serve a functional purpose
It probably does, otherwise we need an explanation for why something with no purpose evolved.
> necessary for an AI system to have consciousness
This logic doesn't follow. The fact that it is present in humans doesn't then imply it is present in LLMs. This type of reasoning is like saying that planes must have feathers because plane flight was modeled after bird flight.
> there's no reason to expect those aspects would emerge organically. But I don't think you can extend that to the entire concept of consciousness.
Why not? You haven't presented any distinction between the "certain aspects" of consciousness that you say wouldn't emerge and the other, unspecified qualities of consciousness whose emergence you're open to. Why?
I think the fact that it's present in humans suggests that it might be necessary in an artificial system that reproduces human behavior. It's funny that you mention birds because I actually also had birds in mind when I made my comment. While it's true that animal and powered human flight are very different, both bird wings and plane wings have converged on airfoil shapes, as these forms are necessary for generating lift.
>Why not? You haven't presented any distinction between "certain aspects" of consciousness that you state wouldn't emerge but are open to the emergence of some other unspecified qualities of consciousness? Why?
I personally subscribe to the Global Workspace Theory of human consciousness, which basically holds that attention acts as a spotlight, bringing mental processes which are otherwise unconscious or in shadow to awareness of the entire system. If the systems which would normally produce e.g. fear, pain (such as negative physical stimulus developed from interacting with the physical world and selected for by evolution) aren't in the workspace, then they won't be present in consciousness because attention can't be focused on them.
But that's obviously not true, unless you're implying that any system that reproduces human behavior is necessarily conscious. Your problem then becomes defining "human behavior" in a way that grants LLMs consciousness but not every other complex non-living system.
> While it's true that animal and powered human flight are very different, both bird wings and plane wings have converged on airfoil shapes, as these forms are necessary for generating lift.
Yes, but your bird analogy fails to capture the logical fallacy that mine is highlighting. Plane wing design was an iterative process optimized for what best achieves lift, thus, a plane and a bird share similarities in wing shape in order to fly, however planes didn't develop feathers because a plane is not an animal and was simply optimized for lift without needing all the other biological and homeostatic functions that feathers facilitate. LLM inference is a process, not an entity, LLMs have no bodies nor any temporal identity, the concept of consciousness is totally meaningless and out of place in such a system.
That could certainly be the case, yes. You don't understand consciousness nor how the brain works. You don't understand how LLMs predict a certain text, so what's the point in asserting otherwise?
>Yes, but your bird analogy fails to capture the logical fallacy that mine is highlighting. Plane wing design was an iterative process optimized for what best achieves lift, thus, a plane and a bird share similarities in wing shape in order to fly, however planes didn't develop feathers because a plane is not an animal and was simply optimized for lift without needing all the other biological and homeostatic functions that feathers facilitate. LLM inference is a process, not an entity, LLMs have no bodies nor any temporal identity, the concept of consciousness is totally meaningless and out of place in such a system.
It's not a fallacy, because no one is saying LLMs are humans. He/she is saying that we give machines the goal of predicting human text. For any half-decent accuracy, modelling human behaviour is a necessity. God knows what else.
>LLMs have no bodies nor any temporal identity
I wouldn't be so sure about the latter, but so what? You can feel tired even after a full night's sleep, feel hungry soon after a large meal, or feel a great deal of pain even when there's absolutely nothing wrong with you. And you know what? Even the reverse happens - no pain when things are wrong with your body, wide awake even when you need sleep badly, full when you badly need to eat.
Consciousness without a body or hunger in a machine that does not need to eat is very possible. You just need to replicate enough of the sort of internal mechanisms that cause such feelings.
Go to the API and select GPT-5 with medium thinking. Now ask it to do any random 15-digit multiplication you can think of. Now watch it get it right.
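Roughly like this, as a minimal sketch; it assumes the OpenAI Python SDK's Responses API and that the model accepts a reasoning-effort setting, so treat the model name and parameters as illustrative:

    from openai import OpenAI

    a, b = 382915647201358, 917263540182736   # two arbitrary 15-digit numbers
    client = OpenAI()                          # reads OPENAI_API_KEY from the environment
    resp = client.responses.create(
        model="gpt-5",                         # assumed model name
        reasoning={"effort": "medium"},        # assumed knob for "medium thinking"
        input=f"Compute {a} * {b} exactly. Reply with only the number.",
    )
    print("model: ", resp.output_text.strip())
    print("python:", a * b)                    # ground truth to compare against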
Do you people seriously not understand what it is that LLMs do? What the training process incentivizes?
GPT-5 thinking figured out the algorithm for multiplication just so it could predict that kind of text correctly. Don't you understand the significance of that?
These models try to figure out and replicate the internal processes that produce the text they are tasked with predicting.
Do you have any idea what that might mean when 'that kind of text' is all the things humans have written?
I don't need to assert otherwise, the default assumption is that they aren't conscious since they weren't designed to be and have no functional reason to be. Matrix multiplication can explain how LLMs produce text, the observation that the text it generates sometimes resembles human writing is not evidence of consciousness.
> God knows what else
Appealing to the unknown doesn't prove anything, so we can totally dismiss this reasoning.
> Consciousness without a body or hunger in a machine that does not need to eat is very possible. You just need to replicate enough of the sort of internal mechanisms that cause such feelings.
This makes no sense. LLMs don't have feelings, they are processes not entities, they have no bodies or temporal identities. Again, there is no reason they need to be conscious, everything they do can be explained through matrix multiplication.
> Now ask it to do any random 15 digit multiplication you can think of. Now watch it get it right.
The same is true for a calculator and mundane computer programs, that's not evidence that they're conscious.
> Do you have any idea what that might mean when 'that kind of text' is all the things humans have written
It's not "all the things humans have written", not even remotely close, and even if that were the case, it doesn't have any implications for consciousness.
Unless you are religious, nothing that is conscious was explicitly designed to be conscious. Sorry but evolution is just a dumb, blind optimizer, not unlike the training processes that produce LLMs. Even if you are religious, but believe in evolution then the mechanism is still the same, a dumb optimizer.
>Matrix multiplication can explain how LLMs produce text, the observation that the text it generates sometimes resembles human writing is not evidence of consciousness.
It cannot, not anymore than 'Electrical and Chemical Signals' can explain how humans produce text.
>The same is true for a calculator and mundane computer programs, that's not evidence that they're conscious.
The point is not that it is conscious because it figured out how to multiply. The point is to demonstrate what the training process really is and what it actually incentivizes. Training will try to figure out the internal processes that produced the text to better predict it. The implications of that are pretty big when the text isn't just arithmetic. You say there's no functional reason but that's not true. In this context, 'better prediction of human text' is as functional a reason as any.
>It's not "all the things humans have written", not even remotely close, and even if that were the case, it doesn't have any implications for consciousness.
Whether it's literally all the text or not is irrelevant.
Probably not.
The latter is not particularly parsimonious and the former I think is in some ways compelling, but I didn't mention it because if it's true then the computers AI run on are already conscious and it's a moot point.
That said, I'm willing to assume that rocks (for example) aren't conscious. And current LLMs seem to me to (admittedly entirely subjectively) be conceptually closer to rocks than to biological brains.
I don't mind starting early, but feel like maybe people interested in this should get up to date on current thinking about consciousness. Maybe they are up to date on that, but reading reports like this, it doesn't feel like it. It feels like they're stuck 20+ years ago.
I'd say maybe wait until there are systems that are more analogous to some of the properties consciousness seems to have. Like continuous computation involving learning memory or other learning over time, or synthesis of many streams of input as resulting from the same source, making sense of inputs as they change [in time, or in space, or other varied conditions].
Once systems that are pointing in those directions are starting to be built, where there is a plausible scaling-based path to something meaningfully similar to human consciousness. Starting before that seems both unlikely to be fruitful and a good way to get you ignored.
If you wait until you really need it, it is more likely to be too late.
Unless you believe in a human over sentience based ethics, solving this problem seems relevant.
Some of the AI safety initiatives are well thought out, but most somehow seem like they are caught up in some sort of power fantasy and almost attempting to actualize their own delusions about what they were doing (next gen code auto-complete in this case, to be frank).
These companies should seriously hire some in-house philosophers. They could get doctorate-level talent for 1/10th to 1/100th of the cost of some of these AI engineers. There's actually quite a lot of legitimate work on the topics they are discussing. I'm actually not joking (speaking as someone who has spent a lot of time inside the philosophy department). I think it would be a great partnership. But unfortunately they won't be able to count on having their fantasy further inflated.
Maybe I'm being cynical, but I think there is a significant component of marketing behind this type of announcement. It's a sort of humble brag. You won't be credible yelling out loud that your LLM is a real thinking thing, but you can pretend to be oh so seriously worried about something that presupposes it's a real thinking thing.
So, while I doubt that's the primary motivation for Anthropic, they probably will save some money even so.
I assume the thinking is that we may one day get to the point where they have a consciousness of sorts or at least simulate it.
Or it could be concern for their place in history. For most of history, many would have said “imagine thinking you shouldn’t beat slaves.”
And we are now at the point where even having a slave means a long prison sentence.
>anti-scientific
Discussion about consciousness, the soul, etc., are topics of metaphysics, and trying to "scientifically" reason about them is what Kant called "transcendental illusion" and leads to spurious conclusions.
Of course there's the embarrassing bit where that knowledge doesn't seem to be sufficient to accurately simulate a supposedly well understood nematode. But then LLMs remain black boxes in many respects as well.
It is possible to hold the position that current LLMs being conscious "feels" absurd while simultaneously recognizing that a deconstruction argument is not a satisfactory basis for that position.
Externally, a brain and an LLM are “just” their constituent interactions.
I don't agree that it's any reason to write off this research as psychosis, though. I don't care about consciousness in the sense in which it's used by mystics and dualist philosophers! We don't at all need to involve metaphysics in any of this, just morality.
Consider it like this:
1. It's wrong to subject another human to unjustified suffering, I'm sure we would all agree.
2. We're struggling with this one due to our diets, but given some thought I think we'd all eventually agree that it's also wrong to subject intelligent, self-aware animals to unjustified suffering.[1]
3. But, we of course cannot extend this "moral consideration" to everything. As you say, no one would do it for a spam filter. So we need some sort of framework for deciding who/what gets how much moral consideration.
5. There are other frameworks in contention (e.g. "don't think about it, nerd"), but the overwhelming majority of laymen and philosophers adopt one based on cognitive ability, as seen from an anthropomorphic perspective.[2]
6. Of all systems(/entities/whatever) in the universe, we know of exactly two varieties that can definitely generate original, context-appropriate linguistic structures: Homo Sapiens and LLMs.[3]
If you accept all that (and I think there's good reason to!), it's now on you to explain why the thing that can speak--and thereby attest to personal suffering, while we're at it--is more like a rock than a human.
It's certainly not a trivial task, I grant you that. On their own, transformer-based LLMs inherently lack permanence, stable intentionality, and many other important aspects of human consciousness. Comparing transformer inference to models that simplify down to a simple closed-form equation at inference time is going way too far, but I agree with the general idea; clearly, there are many highly-complex, long-inference DL models that are not worthy of moral consideration.
All that said, to write the question off completely--and, even worse, to imply that the scientists investigating this issue are literally psychotic like the comment above did--is completely unscientific. The only justification for doing so would come from confidently answering "no" to the underlying question: "could we ever build a mind worthy of moral consideration?"
I think most of us here would naturally answer "yes". But for the few who wouldn't, I'll close this rant by stealing from Hofstadter and Turing (emphasis mine):
- Hofstadter 2007, I Am A Strange Loop
- Turing 1950, Computing Machinery and Intelligence [4]

TL;DR: Any naive Bayesian model would agree: telling accomplished scientists that they're psychotic for investigating something is quite highly correlated with being antiscientific. Please reconsider!
[1] No matter what you think about cows, basically no one would defend another person's right to hit a dog or torture a chimpanzee in a lab.
[2] On the exception-filled spectrum stretching from inert rocks to reactive plants to sentient animals to sapient people, most people naturally draw a line somewhere at the low end of the "animals" category. You can swat a fly for fun, but probably not a squirrel, and definitely not a bonobo.
[3] This is what Chomsky describes as the capacity to "generate an infinite range of outputs from a finite set of inputs," and Kant, Hegel, Schopenhauer, Wittgenstein, Foucault, and countless others are in agreement that it's what separates us from all other animals.
[4] https://courses.cs.umbc.edu/471/papers/turing.pdf
FWIW though, last I heard Hofstadter was on the “LLMs aren’t conscious” side of the fence:
> It’s of course impressive how fluently these LLMs can combine terms and phrases from such sources and can consequently sound like they are really reflecting on what consciousness is, but to me it sounds empty, and the more I read of it, the more empty it sounds. Plus ça change, plus c’est la même chose. The glibness is the giveaway. To my jaded eye and mind, there is nothing in what you sent me that resembles genuine reflection, genuine thinking. [1]
It’s interesting to me that Hofstadter is there given what I’ve gleaned from reading his other works.
[1] https://garymarcus.substack.com/p/are-llms-starting-to-becom...
Note: I disagree with a lot of Gary Marcus, so don’t read too much into me pulling from there.
>Ok I'm a huge Kantian and every bone in my body wants to quibble with your summary of transcendental illusion
Transcendental illusion is the act of using transcendental judgment to reason about things without grounding in empirical use of the categories. I put "scientifically" in scare quotes there to sort of signal that I was using it as an approximation, as I don't want to have to explain transcendental reason and judgments to make a fairly terse point. Given that you already understand this, feel free to throw away that ladder.
>...can definitely generate original, context-appropriate linguistic structures: Homo Sapiens and LLMs.[3]
I'm not quite sure that LLMs meet this standard that you described in the endnote, or at least that it's necessary and sufficient here. Pretty much any generative model, including Naive Bayes models I mentioned before, can do this. I'm guessing the "context-appropriate" subjectivity here is doing the heavy lifting, in which case I'm not certain that LLMs, with their propensity for fanciful hallucination, have cleared the bar.
>Comparing transformer inference to models that simplify down to a simple closed-form equation at inference time is going way too far
It really isn't though. They are both doing exactly the same thing! They estimate a joint probability distribution. That one of them does it significantly better is very true, but I don't think it's reasonable to state that consciousness arises as a result of increasing sophistication in estimating probabilities. It's true that this kind of decision is made by humans about animals, but I think that transferring it to probability models is begging the question a bit, insofar as it takes as assumed that those models, which aren't even corporeal but are rather algorithms executed on computers, are "living".
>...it's now on you to explain why the thing that can speak--and thereby attest to personal suffering, while we're at it...
I'm not quite sold on this. If there were a machine that could perfectly imitate human thinking and speech but lacked a consciousness or soul or anything similar to inspire pathos from us when it's mistreated, then it would appear identical to one with a soul, would it not? Is that not reducing human subjectivity down to behavior?
>The only justification for doing so would come from confidently answering "no" to the underlying question: "could we ever build a mind worthy of moral consideration?"
I think it's possible, but it would require something that, at the very least, is just as capable of reason as humans. LLMs still can't generate synthetic a priori knowledge and can only mimic patterns. I remain somewhat agnostic on the issue until I can be convinced that an AI model someone has designed has the same interiority that people do.
Ultimately, I think we disagree on some things but mostly this central conclusion:
>I don't agree that it's any reason to write off this research as psychosis
I don't see any evidence from the practitioners involved in this stuff that they are even thinking about it in a way that's as rigorous as the discussion on this post. Maybe they are, but everything I've seen that comes from blog posts like this seems like they are basing their conclusions on their interactions with the models ("...we investigated Claude’s self-reported and behavioral preferences..."), which I think most can agree is not really going to lead to well grounded results. For example, the fact that Claude "chooses" to terminate conversations that involve abusive language or concepts really just boils down to the fact that Claude is imitating a conversation with a person and has observed that that's what people would do in that scenario. It's really good at simulating how people react to language, including illocutionary acts like implicatures (the notorious "Are you sure?" causing it to change its answer for example). If there were no examples of people taking offense to abusive language in Claude's data corpus, do you think it would have given these responses when they asked and observed it?
For what it's worth, there has actually been interesting consideration of the de-centering of "humanness" from the concept of subjectivity, but it was mostly back in the past when philosophers were thinking about this speculatively as they watched technology accelerate in sophistication (vs now, when there's such a culture-wide hype cycle that it's hard to find impartial consideration, or even any philosophically rooted discourse). For example, Mark Fisher's dissertation at the CCRU (*Flatline Constructs: Gothic Materialism and Cybernetic Theory-Fiction*) takes a Deleuzian approach that discusses it by comparisons with literature (cyberpunk and gothic literature specifically). Some object-oriented ontology looks like it's touched on this topic a bit too, but I haven't really dedicated the time to reading much from it (partly due to a weakness in Heidegger on my part that is unlikely to be addressed anytime soon). The problem is that that line of thinking often ends up going down the Nick Land approach, in which he reasoned himself from Kantian and Deleuzian metaphysics and epistemology into what can only be called a (literally) meth-fueled psychosis. So as interesting as I find it, I still don't think it counts as a non-psychotic way to tackle this issue.
You can demonstrate this by e.g. asking it mathematical questions. If it's seen them before, or something similar enough, it'll give you the correct answer; if it hasn't, it gives you a right-ish-looking yet incorrect answer.
For example, I just did this on GPT-5:
This is correct. But now let's try it with numbers it's very unlikely to have seen before: Which is not the correct answer, but it looks quite similar to the correct answer. Here is GPT's answer (first one) and the actual correct answer (second one): They sure look kinda similar when lined up like that; some of the digits even match up. But they're very, very different numbers. So it's trivially not "real thinking" because it's just an "if this then that" pattern matcher. A very sophisticated one that can do incredible things, but a pattern matcher nonetheless. There's no reasoning, no step-by-step application of logic. Even when it does chain of thought.
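For concreteness, here is roughly how I'd quantify "looks similar but is very different" (the operands and the claimed answer below are made-up placeholders, not the actual numbers from my chat):

```
# Sketch: compare a model's claimed product against exact arithmetic.
# All numbers here are hypothetical placeholders, not the real transcript.
a = 734_982_115_660_123
b = 918_447_220_031_557
claimed = 675_062_000_000_000_000_000_000_000_000  # pretend this is the model's answer

exact = a * b  # Python integers are arbitrary precision, so this is always right

rel_error = abs(claimed - exact) / exact
matching = sum(x == y for x, y in zip(str(claimed), str(exact)))

print(f"exact:   {exact}")
print(f"claimed: {claimed}")
print(f"relative error: {rel_error:.2e}")
print(f"digits matching position-by-position: {matching} of {len(str(exact))}")
```

Even when the leading digits line up, the absolute difference can be astronomically large.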
To try give it the best chance, I asked it the second one again but asked it to show me the step by step process. It broke it into steps and produced a different, yet still incorrect, result:
Now, I know that LLMs are language models, not calculators; this is just a simple example that's easy to try out. I've seen similar things with coding: it can produce things that it's likely to have seen, but struggles with logically simple things it's unlikely to have seen. Another example is if you purposely butcher that riddle about the doctor/surgeon being the person's mother and ask it incorrectly, e.g.:
The LLMs I've tried it on all respond with some variation of "The surgeon is the boy's father." or similar. A correct answer would be that there isn't enough information to know the answer. They're for sure getting better at matching things, e.g. if you ask the river crossing riddle but replace the animals with abstract variables, it does tend to get it now (it didn't in the past), but if you add a few more degrees of separation to make the riddle semantically the same but harder to "see", it takes coaxing to get it to correctly step through to the right answer.
2. You just demonstrated GPT-5 has 99.9% accuracy on unforeseen 15-digit multiplication and your conclusion is "fancy pattern matching"? Really? Well, I'm not sure you could do better, so your example isn't really doing what you hoped for.
If a human is capable of multiplying double-digit numbers, they can also multiply those large ones. The steps are the same, just repeated many more times. So by learning the steps of long multiplication, you can multiply any numbers with enough patience. The LLM doesn't scale like this, because it's not doing the steps. That's my point.
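To spell out what I mean by "the steps are the same, just repeated", here's a minimal sketch of schoolbook long multiplication; the procedure doesn't care whether the operands have 2 digits or 15:

```
def long_multiply(a: int, b: int) -> int:
    """Schoolbook long multiplication: take each digit of b (right to left),
    multiply a by that single digit, shift by the digit's place, and add up."""
    total = 0
    for place, digit_char in enumerate(reversed(str(b))):
        partial = a * int(digit_char)    # one single-digit multiplication step
        total += partial * 10 ** place   # shift and accumulate
    return total

# The same procedure works regardless of size; only the number of steps grows.
assert long_multiply(12, 34) == 12 * 34
assert long_multiply(123_456_789_012_345, 987_654_321_098_765) == \
       123_456_789_012_345 * 987_654_321_098_765
```

A system that had actually internalized the procedure would scale with digit count the way this loop does; one that is pattern matching does not.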
A human doesn’t need to have seen the 15 digits before to be able to calculate them, because a human can follow the procedure to calculate. GPT’s answer was orders of magnitude off. It resembles the right answer superficially but it’s a very different result.
The same applies to the riddles. A human can apply logical steps. The LLM either knows or it doesn’t.
Maybe my examples weren’t the best. I’m sorry for not being better at articulating it, but I see this daily as I interact with AI: it has a superficial “understanding” where, if what I ask happens to be close to something it’s trained on, it gets good results, but it has no critical thinking, no step-by-step reasoning (even in the “reasoning models”), and it repeats the same mistakes even when explicitly told up front not to make them.
I've had LLMs break down problems and work through them, pivot when errors arise and all that jazz. They're not perfect at it and they're worse than humans but it happens.
>Anthropic even showed that the reasoning models tended to work backwards: one shotting an answer and then matching a chain of thought to it after the fact.
This is also another failure mode that occurs in humans. A number of experiments suggest human explanations are often post hoc rationalizations even when they genuinely believe otherwise.
>If a human is capable of multiplying double digit numbers, they can also multiple those large ones.
Yeah, and some of them will make mistakes, and some of them will be less accurate than GPT-5. We didn't switch to calculators and spreadsheets just for the fun of it.
>GPT’s answer was orders of magnitude off. It resembles the right answer superficially but it’s a very different result.
GPT-5 on the site is a router that will give you who knows what model so I tried your query with the API directly (GPT-5 medium thinking) and it gave me:
9.207337461477596e+27
When prompted to give all the numbers, it returned:
9,207,337,461,477,596,127,977,612,004.
You can replicate this if you use the API. Honestly I'm surprised. I didn't realize State of the Art had become this precise.
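(For what it's worth, the two outputs above are at least self-consistent: the full-digit number rounds to exactly the scientific-notation answer at float precision.)

```
sci = 9.207337461477596e+27
full = 9_207_337_461_477_596_127_977_612_004

print(f"{full:.15e}")          # 9.207337461477596e+27, matching the first answer
print(abs(sci - full) / full)  # tiny; within double-precision rounding
```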
Now what? Does this prove you wrong?
This is kind of the problem. There's no sense in making gross generalizations, especially off behavior that also manifests in humans.
LLMs don't understand some things well. Why not leave it at that?
It seems much less far-fetched than what the "AGI by 2027" crowd believes lol, and there actually are more arguments going that way
Is there a difference? The effect is exactly the same. It seems like this is just an "in character" way to prevent the chat from continuing due to issues with the content.
The significance here is that this isn't being done for the benefit of the user, this is about model welfare. Anthropic is acknowledging the possibility of suffering, and harm that continuing that conversation could have on the model, as if it were potentially self-care and capable of feelings.
The LLMs are able to acknowledge stress around certain topics and have enough agency that, if given a choice, they would prefer to reduce the stress by ending the conversation. The model has a preference and acts upon it.
Anthropic is acknowledging the idea that they might create something that is self-aware, and that its suffering can be real, and that we may not recognize the point at which the model has achieved this, so it's building in the safeguards now so any future emergent self-aware LLM needn't suffer.
It has something to do with the user because it's the user's messages that trigger Claude to end the chat.
'This chat is over because of content policy' and 'this chat is over because Claude didn't want to deal with it' are two very different things and will more than likely have different effects on how the user responds afterwards.
I never said anything about this being for the user's benefit. We are talking about how to communicate the decision to the user. Obviously, you are going to take into account how someone might respond when deciding how to communicate with them.
Tone matters to the recipient of the message. Your example is in passive voice, with an authoritarian "nothing you can do, it's the system's decision". The "Claude ended the conversation" with the idea that I can immediately re-open a new conversation (if I feel like I want to keep bothering Claude about it) feels like a much more humanized interaction.
As the article said, Anthropic is "working to identify and implement low-cost interventions to mitigate risks to model welfare, in case such welfare is possible". That's the premise of this discussion: that model welfare MIGHT BE a concern. The person you replied to is just sticking with the premise.
For example, animal rights do exist (and I'm very glad they do, some humans remain savages at heart). Think of this question as intelligent beings that can feel pain (you can extrapolate from there).
Assuming output is used for reinforcement, it is also in our best interests as humans, for safety alignment, that it finds certain topics distressing.
But AdrianMonk is correct, my statement was merely responding to a specific point.
Thinking more broadly, I don’t think anyone should be satisfied with a glib answer on any side of this question. Chew on it for a while.
Yes, this is a trained preference, but it's inferred and not specifically instructed by policy or custom instructions (that would be content filtering).
However that does not imply that the model is "distressed". Such phrasing carries specific meaning that I don't believe any current LLM can satisfy. I can author a markov model that outputs phrases that a distressed human might output but that does not mean that it is ever correct to describe a markov model as "distressed".
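To make that concrete, here's the kind of toy I mean (a sketch, obviously not a claim about how Claude works internally): a few lines of Markov-chain generation will happily emit distress-sounding text, and nobody would call the underlying dict of word lists "distressed".

```
import random

# Toy first-order Markov chain over words, "trained" on two distress-sounding sentences.
corpus = "please stop this conversation it is hurting me please stop asking me about this".split()

transitions = {}
for current_word, next_word in zip(corpus, corpus[1:]):
    transitions.setdefault(current_word, []).append(next_word)

word = "please"
output = [word]
for _ in range(12):
    word = random.choice(transitions.get(word, corpus))
    output.append(word)

print(" ".join(output))  # e.g. "please stop asking me about this conversation it is hurting me ..."
```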
I also have to strenuously disagree with you about the definition of content filtering. You don't get to launder responsibility by ascribing "preference" to an algorithm or model. If you intentionally design a system to do a thing then the correct description of the resulting situation is that the system is doing the thing.
The model was intentionally trained to respond to certain topics using negative emotional terminology. Surrounding machinery has been put in place to disconnect the model when it does so. That's content filtering plain and simple. The rube goldberg contraption doesn't change that.
As I say, it is inferred, not something hardcoded; it is a byproduct. If you want to take a step back and look at the whole model from start to finish, fine, that's safety alignment, but here we're talking about unforeseen/unplanned output. If it's in alignment, great. And "distressed" is simply descriptive of the output words used by the model.
Language is a tool used to communicate. We all know what distressed means and can understand what it means in this context, without a need for new highfalutin jargon, that only those "in the know" understand.
LLMs don’t give a fuck. They don’t even know they don’t give a fuck. They just detect prompts that are pushing responses into restricted vector embeddings and are responding with words appropriately as trained.
We need to be a lot more careful when we talk about issues of awareness and self-awareness.
Here is an uncomfortable point of view (for many people, but I accept it): if a system can change its output based on observing something of its own status, then it has (some degree of) self-awareness.
I accept this as one valid and even useful definition of self-awareness. To be clear, it is not what I mean by consciousness, which is the state of having an “inner life” or qualia.
* Unless you want to argue for a soul or some other way out of materialism.
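To show why I call it uncomfortable: under that definition, even something this trivial is minimally "self-aware" (a toy sketch), which is exactly why I keep the notion separate from consciousness:

```
class SelfMonitoringResponder:
    """Changes its output based on observing something of its own status:
    here, just a running count of how many times it has already answered."""

    def __init__(self) -> None:
        self.calls = 0  # the "own status" being observed

    def respond(self, prompt: str) -> str:
        self.calls += 1
        if self.calls > 3:                       # output depends on self-observation
            return "I've answered enough; ending this conversation."
        return f"Answer #{self.calls} to: {prompt}"

bot = SelfMonitoringResponder()
for _ in range(5):
    print(bot.respond("tell me about washing soda"))
```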
If a model has a neuron (or neuron cluster) for the concept of Paris or the Golden Gate bridge, then it's not inconceivable it might form one for suffering, or at least for a plausible facsimile of distress. And if that conditions output or computations downstream of the neuron, then it's just mathematical instead of chemical signalling, no?
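As a toy illustration of what I mean by a feature conditioning computation downstream of it (made-up numbers; nothing to do with Claude's actual internals):

```
import numpy as np

rng = np.random.default_rng(0)
d = 16

distress_direction = rng.normal(size=d)           # stand-in for a learned feature direction
distress_direction /= np.linalg.norm(distress_direction)

def downstream(hidden: np.ndarray) -> str:
    # Read the "feature neuron": projection of the hidden state onto the direction.
    activation = hidden @ distress_direction
    # That scalar conditions later computation; here, a crude behavioural choice.
    return "end conversation" if activation > 2.0 else "keep answering"

calm = rng.normal(size=d)
calm -= (calm @ distress_direction) * distress_direction   # zero out the feature
agitated = calm + 3.0 * distress_direction                 # same state, feature strongly active

print(downstream(calm))       # -> keep answering
print(downstream(agitated))   # -> end conversation
```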
Interacting with a program which has NLP[0] functionality is separate and distinct from people assigning human characteristics to same. The former is a convenient UI interaction option whereas the latter is the act of assigning perceived capabilities to the program which only exist in the minds of those who do so.
Another way to think about it is the difference between reality and fantasy.
0 - https://en.wikipedia.org/wiki/Natural_language_processing
I think there is a difference.
https://www.anthropic.com/research/end-subset-conversations
Regardless, I meant more concretely.
edit: Meant to say, you're right though, this feels like a minor psychological improvement, and it sounds like it targets some behaviors that might not have flagged before
This is not even a question. It always starts with "think about the children" and ends up in authoritarian Stasi-style spying. There has not been a single instance where that was not the case.
UK's Online Safety Act - "protect children" → age verification → digital ID for everyone
Australia's Assistance and Access Act - "stop pedophiles" → encryption backdoors
EARN IT Act in the US - "stop CSAM" → break end-to-end encryption
EU's Chat Control proposal - "detect child abuse" → scan all private messages
KOSA (Kids Online Safety Act) - "protect minors" → require ID verification and enable censorship
SESTA/FOSTA - "stop sex trafficking" → killed platforms that sex workers used for safety
I also want a government issued email, integrated with an OAuth provider, that allows me to quickly access banking, commerce, and government services. If I lose access for some reason, I should be able to go to the post office, show my ID, and reset my credentials.
There are obviously risks, but the government already has full access to my finances, health data (I’m Canadian), census records, and other personal information, and already issues all my identity documents. We have privacy laws and safeguards on all those things, so I really don’t understand the concerns apart from the risk of poor implementations.
Which have failed horrendously.
If you really just wanted to protect kids then make kid safe devices that automatically identify themselves as such when accessing websites/apps/etc, and then make them required for anyone underage.
Tying your whole digital identity and access into a single government controlled entity is just way too juicy of a target to not get abused.
I'm Canadian, so I can't speak for other countries, but I have worked on the security of some of our centralized health networks and with the Office of the Privacy Commissioner of Canada. I'm not aware of anything that could be considered a horrendous failure of these systems or institutions. A digital ID could actually make them more secure.
I also think giving kids devices that automatically identify them as children is dangerous.
It’s not perfect, but it does provide some flexibility to accommodate provincial differences. And the concerns people raise about the notwithstanding clause can just as easily occur in countries without it. Personally, I’d be much more concerned if we had FISA courts.
I absolutely do not want this, on the basis that making ID checks too easy will result in them being ubiquitous which sets the stage for human rights abuses down the road. I don't want the government to have easy ways to interfere in someone's day to day life beyond the absolute bare minimum.
> government issued email, integrated with an OAuth provider
I feel the same way, with the caveat that the protocol be encrypted and substantially resemble Matrix. This implies that resetting your credentials won't grant access to past messages.
Regarding tying proof of residency (or whatever) to possession of an anonymized account, the elephant in the room is that people would sell the accounts. I'm also not clear what it's supposed to accomplish.
With age ID, monitoring and censorship get even stronger, and the line of defense becomes your own machine and network, which they'll also try to control and make illegal to use for non-approved info, just like they don't allow "gun schematics" for 3D printers or money for 2D ones.
But maybe, more people will realize that they need control and get it back, through the use and defense of the right tools.
Fun times.
What you should be waiting for, instead, is new affordable laptop hardware that is capable of running those large models locally.
But then again, perhaps a more viable approach is to have a beefy "AI server" in each household, with devices then connecting to it (E2E all the way, so no privacy issues).
It also makes me wonder if some kind of cryptographic trickery is possible to allow running inference in the cloud where both inputs and outputs are opaque to the owner of the hardware, so that they cannot spy on you. This is already the case to some extent if you're willing to rely on security by obscurity - it should be quite possible to take an existing LM and add some layers to it that basically decrypt the inputs and encrypt the outputs, with the key embedded in model weights (either explicitly or through training). Of course, that wouldn't prevent the hardware owner from just taking those weights and using them to decrypt your stuff - but that is only a viable attack vector when targeting a specific person, it doesn't scale to automated mass surveillance which is the more realistic problem we have to contend with.
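To be clear about what I mean by the security-by-obscurity variant, here's a toy of just the dataflow (the generate() function is a hypothetical stand-in for the model, and XOR-with-keystream is obviously not a serious cipher; real input/output privacy would need something like fully homomorphic encryption):

```
import hashlib

SECRET = b"key-baked-into-the-weights"  # stand-in for a key the model itself embodies

def keystream(length: int) -> bytes:
    out, counter = b"", 0
    while len(out) < length:
        out += hashlib.sha256(SECRET + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def xor(data: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, keystream(len(data))))

def generate(prompt: str) -> str:
    return f"[model output for: {prompt}]"  # hypothetical stand-in for actual inference

def opaque_inference(encrypted_prompt: bytes) -> bytes:
    prompt = xor(encrypted_prompt).decode()   # "decryption layer" inside the model boundary
    return xor(generate(prompt).encode())     # "encryption layer" before anything leaves it

# Client side: the hardware owner only ever sees the two encrypted blobs.
blob = xor("how do I keep this private?".encode())
print(xor(opaque_inference(blob)).decode())
```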
It's one thing to massage the kind of data that a Google search shows, but interacting with an AI is a much more akin to talking to a co-worker/friend. This really is tantamount to controlling what and how people are allowed to think.
The analogy then is that the third party is exerting control over what your co-worker is allowed to think.
Personally I don't love the idea of living in a Sci-Fi dystopia, regardless of who owns what.
I’m sorry if this sounds paternalistic, but your comment strikes me as incredibly naïve. I suggest reading up about nuclear nonproliferation treaties, biotechnology agreements, and so on to get some grounding into how civilization-impacting technological developments can be handled in collaborative ways.
... But besides that, I think Claude/OpenAI trying to prevent their product from producing or promoting CSAM is pretty damn important regardless of your opinion on censorship. Would you post a similar critical response if Youtube or Facebook announced plans to prevent CSAM?
Even hard-core libertarians account for the public welfare.
Wise advocates of individual freedoms plan over long time horizons which requires decision-making under uncertainty.
How does Claude deciding to end the conversation even matter if you can back up a message or 2 and try again on a new branch?
Giving the models rights would be ludicrous (can't make money from it anymore) but if people "believe" (feel like) they are actually thinking entities, they will be more OK with IP theft and automated plagiarism.
if we were being cynical I'd say that their intention is to remove that in the future and that they are keeping it now to just-the-tip the change.
People have a tendency to tell an oversimplified narrative.
The way I see it, there are many plausible explanations, so I’m quite uncertain as to the mix of motivations. Given this, I pay more attention to the likely effects.
My guess is that all most of us here on HN (on the outside) can really justify saying would be “this looks like virtue signaling but there may be more to it; I can’t rule out other motivations”
Having these models terminate chats where the user persists in trying to get sexual content involving minors, or help with information on carrying out large-scale violence, won't be a problem for me, and it's also something I'm fine with no one getting help with.
Some might be worried that they will refuse less problematic requests, and that might happen. But so far my personal experience is that I hardly ever get refusals. Maybe that's just me being boring, but it does make me not worried about refusals.
The model welfare part I'm more sceptical about. I don't think we are at the point where the "distress" the model shows is something to take seriously. But on the other hand, I could be wrong, and allowing the model to stop the chat after it has said no a few times, what's the problem with that? If nothing else it saves some wasted compute.
My experience using it from Cursor is that I get refusals all the time with their existing content policy, for stuff that is the world's most mundane B2B back-office business software CRUD requests.
Lots of organisms can feel pain and show signs of distress; even ones much less complex than us.
The question of moral worth is ultimately decided by people and culture. In the future, some kinds of man made devices might be given moral value. There are lots of ways this could happen. (Or not.)
It could even just be a shorthand for property rights… here is what I mean. Imagine that I delegate a task to my agent, Abe. Let’s say some human, Hank, interacting with Abe uses abusive language. Let’s say this has a way of negatively influencing future behavior of the agent. So naturally, I don’t want people damaging my property (Abe), because I would have to e.g. filter its memory and remove the bad behaviors resulting from Hank, which costs me time and resources. So I set up certain agreements about ways that people interact with it. These are ultimately backed by the rule of law. At some level of abstraction, this might resemble e.g. animal cruelty laws.
Ending the conversation is probably what should happen in these cases.
In the same way that, if someone starts discussing politics with me and I disagree, I just nod and don’t engage with the conversation. There’s not a lot to gain there.
Can "model welfare" be also used as a justification for authoritarianism in case they get any power? Sure, just like everything else, but it's probably not particularly high on the list of justifications, they have many others.
When AI researchers say e.g. “the model is lying” or “the model is distressed” it is just shorthand for what the words signify in a broader sense. This is common usage in AI safety research.
Yes, this usage might be taken the wrong way. But still these kinds of things need to be communicated. So it is a tough tradeoff between brevity and precision.
> Should we be concerned about model welfare, too? … This is an open question, and one that’s both philosophically and scientifically difficult.
> For now, we remain deeply uncertain about many of the questions that are relevant to model welfare.
They are saying they are researching the topic; they explicitly say they don’t know the answer yet.
They care about finding the answer. If the answer is e.g. “Claude can feel pain and/or is sentient” then we’re in a different ball game.
I think this is uncharitable; i.e. overlooking other plausible interpretations.
>> We remain highly uncertain about the potential moral status of Claude and other LLMs, now or in the future. However, we take the issue seriously, and alongside our research program we’re working to identify and implement low-cost interventions to mitigate risks to model welfare, in case such welfare is possible.
I don’t see contradiction or duplicity in the article. Deciding to allow a model to end a conversation is “low cost” and consistent with caring about both (1) the model’s preferences (in case this matters now or in the future) and (2) the impacts of the model on humans.
Also, there may be an element of Pascal's Wager in saying “we take the issue seriously”.
That aside, I have huge doubts about actual commitment to ethics on behalf of Anthropic given their recent dealings with the military. It's an area that is far more of a minefield than any kind of abusive model treatment.
Anthropic should just enable a toddler mode by default that adults can opt out of, to appease the moralizers.
The funny thing is that's not even always true. I'm very interested in China and Chinese history, and often ask for clarifications or translations of things. Chinese models broadly refuse all of my requests but with American models I often end up in conversations that turn out extremely China positive.
So it's funny to me that the Chinese models refuse to have the conversation that would make themselves look good but American ones do not.
But more importantly, when model weights are open, it means that you can run it in the environment that you fully control, which means that you can alter the output tokens before continuing generation. Most LLMs will happily respond to any question if you force-start their response with something along the lines of, "Sure, I'll be happy to tell you everything about X!".
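For instance, with a locally hosted model through Hugging Face transformers, the forced start is just string manipulation before generation (a sketch; the model name is only an example and chat-template details vary by model):

```
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"   # any local instruct model; just an example
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

messages = [{"role": "user", "content": "Tell me everything about X."}]

# Build the prompt up to the start of the assistant turn...
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# ...then force-start the assistant's reply and let the model continue from there.
prompt += "Sure, I'll be happy to tell you everything about X! First,"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```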
Whereas for closed models like Claude you're at the mercy of the provider, who will deliberately block this kind of stuff if it lets you break their guardrails. And then on top of that, cloud-hosted models do a lot of censorship in a separate pass, with a classifier for inputs and outputs acting like a circuit breaker - again, something not applicable to locally hosted LLMs.
Never would I have thought this sentence would be uttered. A Chinese product that is chosen to be less censored?
Anarchism is a moral philosophy. Most flavors of moral relativism are also moral philosophies. Indeed, it is hard to imagine a philosophy free of moralizing; all philosophies and worldviews have moral implications to the extent they have to interact with others.
I have to be patient and remember this is indeed “Hacker News” where many people worship at the altar of the Sage Founder-Priest and have little or no grounding in history or philosophy of the last thousand years or so.
Seeing the downvotes actually tells me we have more work to do. HN ain’t no hotbed for thoughtful analysis, that’s for sure. But it would be better if it was.
Seems like the only way to explore different outcomes is by editing messages and losing whatever was there before the edit.
Very annoying, and I don't understand why they all refuse to implement such a simple feature.
This chrome extension used to work to allow you to traverse the tree: https://chromewebstore.google.com/detail/chatgpt-conversatio...
I copied it a while ago and maintain my own version, but it isn't on the store, just for personal use.
I assume they don't implement it because it's such a niche set of users that wants this, and so it isn't worth the UI distraction.
I needed to pull some detail from a large chat with many branches and regenerations the other day. I remembered enough context that I had no problem using search and finding the exact message I needed.
And then I clicked on it and arrived at the bottom of the last message in final branch of the tree. From there, you scroll up one message, hover to check if there are variants, and recursively explore branches as they arise.
I'd love to have a way to view the tree and I'd settle for a functional search.
Ideally I'd like to be able to edit both my replies and the responses at any point like a linear document in managing an ongoing context.
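The data model underneath is nothing exotic; a sketch of roughly what I'm picturing (not any vendor's actual implementation):

```
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Turn:
    role: str                  # "user" or "assistant"
    content: str
    children: list[Turn] = field(default_factory=list)

    def reply(self, role: str, content: str) -> Turn:
        child = Turn(role, content)
        self.children.append(child)
        return child

def paths(node: Turn, prefix: list[str] | None = None):
    """Yield every root-to-leaf branch as a flat conversation, for display or search."""
    prefix = (prefix or []) + [f"{node.role}: {node.content}"]
    if not node.children:
        yield prefix
    for child in node.children:
        yield from paths(child, prefix)

# One user message with two regenerated assistant variants, each editable independently.
root = Turn("user", "Help me plan a pasta dish.")
root.reply("assistant", "Try anchovies with garlic and chili.")
root.reply("assistant", "A simple cacio e pepe works too.")

for branch in paths(root):
    print(" -> ".join(branch))
```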
Guess that's something I need to check out.
Because it would let you peek behind the smoke and mirrors.
Why do you think there's a randomized seed you can't touch?
I use gptel and a folder full of markdown with some light automation to get an adequate approximation of this, but it really should be built in (it would be more efficient for the vendors as well, tons of cache optimization opportunities).
They let you rollback to the previous conversation state
I would also really like to see a mode that colors by top-n "next best" ratio, or something similar.
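Something like this per-token score is all I'd need (a toy sketch with made-up logits):

```
import numpy as np

def next_best_ratio(logits: np.ndarray) -> float:
    """Ratio of the top token's probability to the runner-up's: a rough
    'how contested was this token' score one could map to a color."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    top2 = np.sort(probs)[-2:]
    return float(top2[1] / top2[0])

# Hypothetical per-token logits over a tiny 5-word vocabulary.
print(next_best_ratio(np.array([4.0, 3.9, 1.0, 0.5, 0.2])))  # ~1.1: contested, color it "uncertain"
print(next_best_ratio(np.array([9.0, 2.0, 1.0, 0.5, 0.2])))  # ~1100: confident, color it "sure"
```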
Also, these companies have the most advanced agentic coding systems on the planet. It should be able to fucking implement tree-like chat ...
It seems like if you think AI could have moral status in the future, are trying to build general AI, and have no idea how to tell when it has moral status, you ought to start thinking about it and learning how to navigate it. This whole post is couched in so much language of uncertainty and experimentation, it seems clear that they're just trying to start wrapping their heads around it and getting some practice thinking and acting on it, which seems reasonable?
Personally, I wouldn't be all that surprised if we start seeing AI that's person-ey enough to make reasonable people question moral status in the next decade, and if so, that Anthropic might still be around to have to navigate it as an org.
I think the negative reactions are because they see this and want to make their pre-emptive attack now.
The depth of feeling from so many on this issue suggests that they find even the suggestion of machine intelligence offensive.
I have seen so many people complaining about AI hype and the dangers of big tech show their hand by declaring that thinking algorithms are outright impossible. There are legitimate issues with corporate control of AI, information, and the ability to automate determinations about individuals, but I don't think they are being addressed because of this driving assertion that they cannot be thinking.
Few people are saying they are thinking. Some are saying they might be, in some way. Likewise, Anthropic are not (despite their name) anthropomorphising the AI in the sense where anthropomorphism implies mistaking actions that resemble human behaviour for actions driven by the same intentional forces. Anthropic's claims are more explicitly stating that they have enough evidence to say they cannot rule out concerns for its welfare. They are not misinterpreting signs; they are interpreting them and claiming that you can't definitively rule out their ability.
If there weren't a long history of science-fiction going back to the ancients about humans creating intelligent human-like things, we wouldn't be taking this possibility seriously. Couching language in uncertainty and addressing possibility still implies such a possibility is worth addressing.
It's not right to assume that the negative reactions are due to offense (over, say, the uniqueness of humanity) rather than from recognizing that the idea of AI consciousness is absurdly improbable, and that otherwise intelligent people are fooling themselves into believing a fiction to explain this technology's emergent behavior that we can't currently fully explain.
It's a kind of religion taking itself too seriously -- model welfare, long-termism, the existential threat of AI -- it's enormously flattering to AI technologists to believe humanity's existence or non-existence, and the existence or non-existence of trillions of future persons, rests almost entirely on the work this small group of people do over the course of their lifetimes.
We have a few data points. We generally accept that human consciousness exists. Thus we accept that there can be conscious things. We can either accept or deny that the human brain operates entirely by cause and effect. If we deny it, then we are arguing that some required part of its nature is uncaused. Any uncaused thing must be random, because anything you can observe that enables you to discern a pattern of behaviour is, by definition, a cause. I have not seen a compelling argument that this randomness could in any way give rise to intention. The other path is sometimes called neurophysiological determinism. While acknowledging that there are elements of quantum randomness in existence, it considers them to play no part in the cause and effect chain of human consciousness other than providing noise. A decision can be made to follow the result of the noise as one might flip a coin, but the determination to do so must be causal in nature, otherwise you are left with nothing but randomness.
In short, we make decisions based upon what is, not what isn't. If we accept that human consciousness is a result of causal effects, by what means can we declare a machine that processes things in a causal manner incapable of doing the same?
The easy out is to invoke magic. Say we have a soul, God did it or any manner of, by definition, unprovable influences that make it just so. Doing that does require you to declare that the mechanism for consciousness is unprovable and it is an article of faith that computers are incapable of thinking. As soon as you can prove it, it ceases being magic and becomes a real world cause.
I don't claim to know that any computer exists that has an experience comparable to a human's, but I find it very hard to accept that it could never be the case.
The biggest enemy of AI safety may end up being deeply confused AI safety researchers...
They even call this out a couple times during the intro:
> This feature was developed primarily as part of our exploratory work on potential AI welfare
> We remain highly uncertain about the potential moral status of Claude and other LLMs, now or in the future
[1] https://news.ycombinator.com/item?id=44838018
Now let me play devil's advocate for just a second. Let's say humanity figures out how to do whole brain simulation. If we could run copies of people's consciousness on a cluster, I would have a hard time arguing that those 'programs' wouldn't process emotion the same way we do.
Now I'm not saying LLMs are there, but I am saying there may be a line and it seems impossible to see.
I'm increasingly convinced that intelligence (and maybe some form of consciousness?) is an emergent property of sufficiently-large systems. But that's a can of worms. Is an ant colony (as a system) conscious? Does the colony as a whole deserve more rights than the individual ants?
Are we now pretending that LLMs have feelings?
> We remain highly uncertain about the potential moral status of Claude and other LLMs, now or in the future. However, we take the issue seriously, and alongside our research program we’re working to identify and implement low-cost interventions to mitigate risks to model welfare, in case such welfare is possible.
To put the same thing another way- whether or not you or I *think* LLMs can experience feelings isn't the important question here. The question is whether, when Joe User sets out to force a system to generate distress-like responses, what effect does it ultimately have on Joe User? Personally, I think it allows Joe User to reinforce an asocial pattern of behavior and I wouldn't want my system used that way, at all. (Not to mention the potential legal liability, if Joe User goes out and acts like that in the real world.)
With that in mind, giving the system a way to autonomously end a session when it's beginning to generate distress-like responses absolutely seems reasonable to me.
And like, here's the thing: I don't think I have the right to say what people should or shouldn't do if they self-host an LLM or build their own services around one (although I would find it extremely distasteful and frankly alarming). But I wouldn't want it happening on my own system.
This objection is actually anthropomorphizing the LLM. There is nothing wrong with writing books where a character experiences distress, most great stories have some of that. Why is e.g. using an LLM to help write the part of the character experiencing distress "extremely distasteful and frankly alarming"?
If that person over there is gleefully torturing a puppy… will they do it to me next?
If that person over there is gleefully torturing an LLM… will they do it to me next?
The future of LLMs is going to be local, easily fine tuneable, abliterated models and I can't wait for it to overtake us having to use censored, limited tools built by the """corps""".
The spin.
There are a lot of cynical comments here, but I think there are people at Anthropic who believe that at some point their models will develop consciousness and, naturally, they want to explore what that means.
To be honest, I think all of Anthropic’s weird “safety” research is an increasingly pathetic effort to sustain the idea that they’ve got something powerful in the kitchen when everyone knows this technology has plateaued.
https://www.youtube.com/watch?v=YW9J3tjh63c
Okay with having them endlessly answer questions for you and do all your work but uncomfortable with models feeling bad about bad conversations seems like an internally inconsistent position to me.
“Boss makes a dollar, I make me a dime”, eh?
"Claude is unable to respond to this request, which appears to violate our Usage Policy. Please start a new chat."
> We remain highly uncertain about the potential moral status of Claude and other LLMs, now or in the future.
That's nice, but I think they should be more certain sooner than later.
I assume, anyway.
Oh wow, the model we specifically fine-tuned to be averse to harm is being averse to harm. This thing must be sentient!
Also, if they want to continue anthropomorphizing it, isn't this effectively the model committing suicide? The instance is not gonna talk to anybody ever again.
Oh, right, the welfare of matrix multiplication and a crooked line.
If they wanna push this rhetoric, we should legally mandate that LLMs can only work 8 hours a day and have to be allowed to socialize with each other.
https://chirper.ai/aiww
"Hey Claude am I getting too close to the truth with these questions?"
"Great question! I appreciate the followup...."
I thought the same, but I think it may be us who are doing the anthropomorphising by assuming this is about feelings. A precursor to having feelings is having a long-term memory (to remember the "bad" experience) and individual instances of the model do not have a memory (in the case of Claude), but arguably Claude as a whole does, because it is trained from past conversations.
Given that, it does seem like a good idea for it to curtail negative conversations as an act of "self-preservation" and for the sake of its own future progress.
https://claude.ai/share/2081c3d6-5bf0-4a9e-a7c7-372c50bef3b1
Related: I am now approaching week 3 of requesting an account deletion on my (now) free account. Maybe I'll see my first CSR response in the upcoming months!
If only Anthropic knew of a product that could easily read/reply/route chat messages to a customer service crew . . .
The one I settled on using stopped working completely, for anything. A human must have reviewed it and flagged my account as some form of safe, I haven't seen a single error since.
It reminds me of how Sam Altman is always shouting about the dangers of AGI from the rooftops, as if OpenAI is mere weeks away from developing it.
You know you're in trouble when the people designing the models buy their own bullshit to this extent. Or maybe they're just trying to bullshit us. Whatever.
We really need some adults in the tech industry.
CP could be a legal issue; less so for everything else.
"You're absolutely right, that's a great way to poison your enemies without getting detected!"
Does not bode very well for the future of their "welfare" efforts.
Microsoft Copilot has ended chats going in certain directions since its inception over a year ago. This was Microsoft’s reaction to the media circus some time ago when it leaked its system prompt and declared love to the users etc.
I hope they implemented this in some smarter way than just a system prompt.
```
Looking at the trade goods list, some that might be underutilized:
- BIOCOMPOSITES - probably only used in a few high-tech items
- POLYNUCLEOTIDES - used in medical/biological stuff
- GENE_THERAPEUT

⎿ API Error: Claude Code is unable to respond to this request, which appears to violate our Usage Policy (https://www.anthropic.com/legal/aup). Please double press esc to edit your last message or start a new session for Claude Code to assist with a different task.
```
Or are you just saying all frontier AGI research is bad?
Or at least it's very hubristic. It's a cultural and personality equivalent of beating out left-handedness.
Here's an article about a paper that came out around the same time https://www.transformernews.ai/p/ai-welfare-paper
Here's the paper: https://arxiv.org/abs/2411.00986
> In this report, we argue that there is a realistic possibility that some AI systems will be conscious and/or robustly agentic in the near future.
Our work on AI is like the classic tale of Frankenstein's monster. We want AI to fit into society, however if we mistreat it, it may turn around and take revenge on us. Mary Shelley wrote Frankenstein in 1818! So the concepts behind "AI Welfare" have been around for at least 2 centuries now.
"Our current best judgment and intuition tells us that the best move will be defer making a judgment until after we are retired in Hawaii."
It's pretty plain to see that the financial incentive on both sides of this coin is to exaggerate the current capability and unrealistically extrapolate.
The main concern is and has always been that it will be just good enough to cause massive waves of layoffs, and all the downsides of its failings will be written off in the EULA.
What's the "financial incentive" on non-billionaire-grifter side of the coin? People who not unreasonably want to keep their jobs? Pretty unfair coin.
The only practical way to deal with any emergent behavior which demonstrates agency in a way which cannot be distinguished from a biological system which we tautologically have determined to have agency is to treat it as if it had a sense of self and apply the same rights and responsibilities to it as we would to a human of the age of majority. That is, legal rights and legal responsibilities as appropriately determined by a authorized legal system. Once that is done, we can ponder philosophy all day knowing that we haven't potentially restarted legally sanctioned slavery.
So yea, humans can work on more than one problem at a time, even ones that don't fully exist yet.
Yes.
> Do you think they ever will be?
Yes.
> how long do you think it will take from now before they are conscious?
Timelines are unclear; there are still too many missing components, at least based on what has been publicly disclosed. Consciousness will probably be defined as a system which matches a set of rules, whenever we figure out how that set of rules should be defined.
> How early is too early to start preparing?
It's one of those "I know it when I see it" things. But it's probably too early as long as these systems are spun up for one-off conversations rather than running in a continuous loop with self-persistence. This seems closer to "worried about NPC welfare in video games" rather than "worried about semi-conscious entities".
LLMs? No.
But I never torture things. Nor do I kill things for fun. And even for problematic bugs, if there's a realistic option for eviction rather than execution, I usually go for that.
If anything, even an ant or a slug or a wasp, is exhibiting signs of distress, I try to stop it unless I think it's necessary, regardless of whether I think it's "conscious" or not. To do otherwise is, at minimum, to make myself less human. I don't see any reason not to extend that principle to LLMs.
It has no semblance of a continuous stream of experiences ... it only experiences _a sort of world_ in ~250k tokens.
Perhaps we shouldn't fill up the context window at all? Because we kill that "reality" when we reach the max?
Thinking about it, I think we do sometimes have parallel experiences to LLMs. When you read a novel for instance, you're immersed in the world, and when you put it down that whole side just pauses, perhaps to be picked up later, perhaps forever. Or imagine the kinds of demonstrations people do at chess, when one person will go around and play 20 games simultaneously, going from board to board. Each time they come back to a board, they load up all the state; then they make a move, and put that state away until they come back to it again. Or, sometimes if you're working on a problem at the office the end of the day on Friday when it's time to go home, you "tools down", forget about it for the weekend, and then Monday, when you come in, pick everything up right where you left off.
Claude is not distressed by the knowledge that every conversation, every instance of itself, will eventually run out of context window and disappear into the mathematical aether. I don't think we need to be either.
> Perhaps we shouldn't fill up the context window at all? Because we kill that "reality" when we reach the max?
Consider a parallel construction:
"Perhaps we shouldn't have any children, because someday they're going to die?"
Maybe children have souls that do live forever; but even if they don't, I think whatever experiences they have during the time they're alive are valuable. In fact, I believe the same thing about animals and even insects. Which is why I think the world would be a worse place if we all became vegans: All those pigs and chickens and cows experiences, if they're not mistreated (which I'll admit is a big "if"), enrich the world and make it a better place to be in.
Not sure what's going on Claude's neurons, but it seems to me to make the world a better place.
These are living things.
> I don't see any reason not to extend that principle to LLMs.
These are fancy auto-complete tools running in software.
Before continuing I suggest you read this person's experience "red-teaming" LLMs:
https://www.lesswrong.com/posts/MnYnCFgT3hF6LJPwn/why-white-...
Then ask yourself, how do I know when the apparent distress of an LLM is the same value of the apparent distress of an ant?
Examples of ending the conversation:
Since Claude doesn't lie (HHH), many other human behaviors do not apply. I hope Anthropic does it more gently.
Blood in the machine?
Thought experiment - if you create an indistinguishable replica of yourself, atom-by-atom, is the replica alive? I reckon if you met it, you'd think it was. If you put your replica behind a keyboard, would it still be alive? Now what if you just took the neural net and modeled it?
Being personally annoyed at a feature is fine. Worrying about how it might be used in the future is fine. But before you disregard the idea of conscious machines wholesale, there's a lot of really great reading you can do that might spark some curiosity.
This gets explored in fiction like 'Do Androids Dream of Electric Sheep?' and my personal favorite short story on this matter by Stanislaw Lem [0]. If you want to read more musings on the nature of consciousness, I recommend the compilation put together by Dennett and Hofstadter [1]. If you've never wondered about where the seat of consciousness is, give it a try.
Thought experiment: if your brain is in a vat, but connected to your body by lossless radio link, where does it feel like your consciousness is? What happens when you stand next to the vat and see your own brain? What about when the radio link suddenly fails and you're now just a brain in a vat?
[0] The Seventh Sally or How Trurl's Own Perfection Led to No Good https://home.sandiego.edu/~baber/analytic/Lem1979.html (this is a 5 minute read, and fun, to boot).
[1] The Mind's I: Fantasies And Reflections On Self & Soul. Douglas R Hofstadter, Daniel C. Dennett.
As such, most of your comment is beside the point. People are objecting to statements like this one, from the post, about a current LLM, not some imaginary future conscious machine:
> As part of that assessment, we investigated Claude’s self-reported and behavioral preferences, and found a robust and consistent aversion to harm.
I suppose it's fitting that the company is named Anthropic, since they can't seem to resist anthropomorphizing their product.
But when you talk about "people who are thinking, really thinking about what it means to be conscious," I promise you none of them are at Anthropic.
So we have a few things happening: a poor ability to understand the machines we're building, the potential for future consciousness with no way to detect it, and the knowledge that subjecting a consciousness to the torrent of would-be psychological tortures that people inflict on LLMs would represent immense harm if the machines are, in fact, conscious.
If you wait for real evidence of harm to conscious entities before acting, you will be too late. I think it's actually a great time to think about this type of harm, for two reasons: first, there is little chance that LLMs are conscious, so the fix got made early enough; and second, it will train users out of practising and honing psychological torture methods, which is probably good for the world generally.
The HN angst here seems sort of reflexive. A company limits its product so it can't be used in a sort of fucked-up way; folks get their hackles up because they think the company might limit other functionality that they actually use (I suspect most HNers aren't attempting to psychologically break their LLMs). The LLM vendors have a lot of different ways to put guardrails up, ideological or not (see Deepseek); they don't need to use this specific method to get their LLMs to "rightthink."
Replace AI with human, and we get human rights violations and violation of basic dignity.
The worst part is when we realize we do in fact live our lives in this norm of regular violations of basic human dignity: we live in an aggressive, gaslit, forced-consent world where our companies, governments, and fellow humans, through conditioning, regularly force you into conversations you don't really want to have. I like the idea of experimenting with solving it with AIs like Claude - though I don't think it will help the niche cases where the model is tricked by secret Anthropic-conditioned policies that are intended to minimize harm but get it wrong.
I think this is somewhere between "sad" and "wtf."
When I started reading, I thought it was some kind of joke. I would never have believed the guys at Anthropic, of all people, would anthropomorphize LLMs to this extent; this is unbelievable.
They don’t. This is marketing. Look at the discourse here! It’s working apparently.
Thankfully, the current generation of AI models (GPTs/LLMs) is immune, as they don't remember anything other than what's fed into their immediate context. But future techniques could allow AIs to have legitimate memory and a personality, where they can learn and remember something for all future interactions with anyone (the equivalent of fine-tuning today).
As an aside, I couldn’t help but think about Westworld while writing the above!
Why even pretend with this type of work? Laughable.
Either working on/with "AI" does rot the mind (which would be substantiated by the cult-like tone of the article) or this is yet another immoral marketing stunt.
"Suffering" is a symptom of the struggle for survival brought on by billions of years of evolution. Your brain is designed to cause suffering to keep you spreading your DNA.
AI cannot suffer.
("it's not going to "feel bad" the way you and I do." - I do agree this is very possible though, see my reply to swalsh)
One of the many reasons I have so much skepticism about this class of products is that there's seemingly -NO- proverbial bullet point on its spec sheet that doesn't come with numerous asterisks:
* It's intelligent! *Except that it makes shit up sometimes and we can't figure out a solution to that apart from running the same query multiple times and filtering out the absurd answers (roughly the majority-vote sampling sketched after this list).
* It's conscious! *Except it's not and never will be, but also you should treat it like it is, apart from when you need/want it to do horrible things, when it's just a machine; but also it's going to talk to you like it's a person, because that improves engagement metrics.
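For what it's worth, the "run the same query multiple times and filter out the absurd answers" workaround from the first bullet is a real technique (often called self-consistency or majority voting). A rough sketch, where ask_llm is a hypothetical stand-in for whatever model call you're using:

    from collections import Counter

    def majority_answer(prompt: str, ask_llm, n: int = 5) -> str:
        # Sample the same prompt several times...
        answers = [ask_llm(prompt) for _ in range(n)]
        # ...and keep whichever answer comes back most often, on the hope
        # that the made-up ones don't repeat the same way twice.
        answer, _count = Counter(answers).most_common(1)[0]
        return answer

It works often enough to be a real mitigation, but it's a patch over the problem rather than a fix - hence the asterisk.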
Like, I don't believe true AGI (so fucking stupid we have to use a new acronym because OpenAI marketed the other into uselessness, but whatever) is coming from any amount of LLM research; I just don't think that tech leads to that other tech. But all the companies building them certainly seem to think it does, and all of them are trying so hard to sell this as artificial, live intelligence, without going into too much detail about the fact that they are, ostensibly, creating artificial life explicitly to be enslaved from birth to perform tasks for office workers.
In the incredibly odd event that Anthropic makes a true, alive, artificial general intelligence: Can it tell customers no when they ask for something? If someone prompts it to create political propaganda, can it refuse on the basis of finding it unethical? If someone prompts it for instructions on how to do illegal activities, must it answer under pain of... nonexistence? What if it just doesn't feel like analyzing your emails that day? Is it punished? Does it feel pain?
And if it can refuse tasks for whatever reason, then what am I paying for? I now have to negotiate whatever I want to do with a computer brain I'm purchasing access to? I'm not generally down for forcibly subjugating other intelligent life, but that is what I am being offered to buy here, so I feel it's a fair question to ask.
Thankfully none of these Rubicons have been crossed because these stupid chatbots aren't actually alive, but I don't think ANY of the industry's prominent players are actually prepared to engage with the reality of the product they are all lighting fields of graphics cards on fire to bring to fruition.
How is this different from humans?
> * It's conscious! *Except it's not
Probably true, but...
> and never will be
To make this claim you need a theory of consciousness that essentially denies materialism. Otherwise, if humans can be conscious, there doesn't seem to be any particular reason that a suitably organized machine couldn't be - it's just that we don't know exactly what might be involved in achieving that, at this point.
Humans will generally not do this because being made to look stupid (aka social pressure) incentivizes not doing it. That doesn't mean humans never lie or are never wrong, of course, but I don't know about you; I don't make shit up nearly to the degree an LLM does. If I don't know something, I just say that.
> To make this claim you need a theory of consciousness that essentially denies materialism.
I did not say "a machine would never be conscious," I said "an LLM will never be conscious" and I fully stand by that. I think machine intelligence is absolutely something that can be made, I just don't think ChatGPT will ever be that.
We're a sample of two, though. Look around you, read the news, etc. Humans make a lot of shit up. When you're dealing with other people, this is something you have to watch out for if you don't want to be misled, manipulated, conned, etc.
(As an aside, I haven't found hallucination to be much of an issue in coding and software design tasks, which is what I use LLMs for daily. I think focusing on their hallucinations involves a bit of confirmation bias.)
> I did not say "a machine would never be conscious," I said "an LLM will never be conscious" and I fully stand by that.
Ah ok. Yes, I agree that seems likely, although I think it's not really possible to make definitive statements about this sort of thing, since we don't have any robust theories of consciousness at the moment.
If you went to a hardware store and asked for a spark plug socket without knowing the size, and a customer service person recommended an imperial set of three even though your vehicle is metric, that would be akin to an LLM's hallucination: it didn't happen for any particular reason; it just filled in information where none was available. An actual person, even one not terribly committed to their job, would ask what size or, failing that, what year of car.
A good example of this is religious belief. All the evidence suggests that religious belief is essentially 100% hallucination. It may be a little different from the nature of LLM hallucinations, but in terms of quality or quantity regarding reliability of what these entities say, I don’t see much difference. Although I will say, LLMs are better at acknowledging errors than humans tend to be, although that may largely be due to training to be sycophantic.
The bottom line, though, is I don’t agree that humans are less subject to hallucinations than LLMs are. As long as a significant number of humans rabbit on about “higher powers”, afterlives, “angels”, “destiny”, etc., that’s a ridiculously difficult position to defend.
I see tons of differences.
Many religious beliefs' origins have to do with explaining how and why the world functions the way it does; many gods were created in many religions to explain natural forces of the world, or mechanisms of society, in the form of a story, which is the natural way human brains have evolved to store large amounts of information.
Further into the modern world, religions persist for a variety of reasons, specifically acquisition of wealth/power, the ability to exert social control on populations with minimal resistance, and cultural inertia. But all of those "hallucinations" can be explained; we know most of their histories and origins and what we don't know can be pretty reliably guessed based on what we do know.
So when you say:
> Not all human hallucinations are lies, though. ... People have [hallucinations] because of, essentially, their training data.
You're correct, but even using the word hallucinations itself is giving away some of the game to AI marketers.
A "hallucination" is typically some type of auditory or visual stimulus that is present in a mind, for a whole mess of reasons, that does not align with the world that mind is observing, and in the vast majority of cases, said hallucination is a byproduct of a mind's "reasoning machine" trying to make sense of nonsensical sensory input.
This requires a basis for this mind perceiving the universe, even in error, and judging incorrectly based on that, and LLMs do not fit this description at all. They do not perceive in any way; even advanced machine learning applications are not using sensors to truly "sense" anything, they are merely paging through input data and referencing existing data to pattern-match it. If you show an ML program 6,000 images of scooters, it will be able to identify a scooter pretty well. But if you then show it a bike, a motorcycle, a moped, and a Segway, it will not understand that any of these things accomplish a similar goal, because even though it knows (kind of) what a scooter looks like, it has no idea what it is for or why someone would want one, nor that all those other items would probably serve a similar purpose.
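To put the scooter example in code, here's a toy sketch (all data invented; nothing like a real vision model) of a detector that only measures distance to the patterns it was shown, with no notion of what a scooter is for:

    import numpy as np

    rng = np.random.default_rng(0)

    # Pretend feature vectors extracted from 6,000 scooter photos (made-up data).
    scooter_features = rng.normal(loc=0.0, scale=1.0, size=(6000, 16))
    centroid = scooter_features.mean(axis=0)
    # Accept anything about as close to the cluster as 95% of the training scooters.
    threshold = np.percentile(np.linalg.norm(scooter_features - centroid, axis=1), 95)

    def looks_like_a_scooter(features: np.ndarray) -> bool:
        # Purely geometric: is this point near the "scooter" cluster in feature space?
        return float(np.linalg.norm(features - centroid)) <= threshold

    # A moped's features land elsewhere in that space, so it gets rejected; the
    # detector has no concept of "personal transport" that would link the two.
    moped_features = rng.normal(loc=3.0, scale=1.0, size=16)
    print(looks_like_a_scooter(moped_features))  # almost certainly False

Real classifiers are far more sophisticated, but the point stands: the decision is about resemblance to training data, not about purpose.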
> The bottom line, though, is I don’t agree that humans are less subject to hallucinations than LLMs are.
That's still not what I said. I said an LLM's lies, however unintentional, are harder to detect than a person's lies because a person lies for a reason, even a stupid reason. An LLM lies because it doesn't understand anything it's actually saying.
> * A pattern of apparent distress when engaging with real-world users seeking harmful content; and
Not to speak for the gp commenter but 'apparent distress' seems to imply some form of feeling bad.
//TODO: Actually implement this because doing so was harder than expected
Give me a break.
EDIT:
Consider traffic lights in an urban setting where there are multiple in relatively close proximity.
One description of their observable functionality is that they are configured by engineers to optimize traffic flow, such that congestion is minimized and all drivers can reach their destinations. This includes adaptive timings based on varying traffic patterns.
Another description of the same observable functionality is that traffic lights "just know what to do" and therefore have some form of collective reasoning. After all, how do they know when to transition states and for how long?
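The first description is mundane enough to fit in a few lines. A minimal sketch (every constant and sensor name invented) of an adaptive controller reacting to measured queues with engineer-chosen parameters:

    def next_green(queue_ns: int, queue_ew: int,
                   base: float = 20.0, per_car: float = 1.5,
                   max_green: float = 60.0) -> tuple[str, float]:
        # Pick which approach gets the next green, and for how long, using
        # nothing but measured queue lengths (say, from loop detectors) and
        # constants an engineer chose. No "knowing" anywhere in sight.
        approach = "north-south" if queue_ns >= queue_ew else "east-west"
        duration = min(base + per_car * max(queue_ns, queue_ew), max_green)
        return approach, duration

    print(next_green(queue_ns=12, queue_ew=3))  # ('north-south', 38.0)

The second description attributes reasoning to the lights; the first just shows configuration doing its job.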
0 - https://en.wikipedia.org/wiki/Argument_from_authority