Anthropic Introduces Conversation-Ending Feature in Claude AI Models
August 16, 2025
Anthropic has unveiled a new capability for its latest Claude models, designed to allow the AI to end conversations in what the company describes as “rare, extreme cases of persistently harmful or abusive user interactions.”
Notably, this feature is not framed as protection for human users but rather as a safeguard for the AI models themselves. Anthropic emphasizes that it is not claiming Claude is sentient or capable of being harmed, stating it remains “highly uncertain about the potential moral status of Claude and other LLMs, now or in the future.”
The move comes as part of a broader research initiative the company calls “model welfare,” which aims to identify and test low-cost measures that could mitigate risks to AI welfare — should such welfare prove relevant in the future.
Limited to Claude Opus 4 and 4.1
The new capability is currently available only in Claude Opus 4 and 4.1, and it is designed to be used sparingly. According to Anthropic, conversation termination will only occur in “extreme edge cases,” such as user attempts to solicit content involving minors or instructions that could enable large-scale violence or terrorism.
During testing, Anthropic reported that Claude Opus 4 displayed a “strong preference against” handling these kinds of requests and even showed a “pattern of apparent distress” when forced to engage with them.
A Last-Resort Measure
The company stressed that ending a conversation is meant to be a last resort, used only after repeated attempts at redirection have failed, or when a user explicitly asks the model to end the chat. Importantly, Claude will not use this capability in situations where a user appears to be at imminent risk of harming themselves or others.
When a conversation is ended, users will still be able to begin new ones or branch off from the problematic dialogue by editing their inputs.
An Ongoing Experiment
Anthropic characterizes the new feature as experimental and plans to refine it over time. The company says the goal is to balance user freedom with safeguards against misuse, while also exploring the long-term implications of AI welfare.