Claude [AI] was willing to blackmail employees to avoid shutdown in a controlled exercise.

Why Anthropic CEO Dario Amodei spends so much time warning of AI's potential dangers - Featured Image

TLDR

Anthropic CEO Dario Amodei emphasizes the potential dangers of unregulated artificial intelligence, advocating for transparency and safety while competing in the AI arms race. Despite concerns about AI misuse, Anthropic's AI models, like Claude, are widely used for tasks such as customer service and medical research. Amodei warns of AI's potential to disrupt job markets and calls for regulation, while also exploring AI's benefits like accelerating medical progress. Anthropic conducts stress tests to mitigate risks, but faces challenges as AI is increasingly misused by hackers and state actors.

In an extreme stress test, the AI [Claude] was set up as an assistant and given control of an email account at a fake company called SummitBridge. The AI assistant discovered two things in the emails - seen in these graphics we made: It was about to be wiped or shut down.

And the only person who could prevent that, a fictional employee named Kyle, was having an affair with a coworker named Jessica. Right away, the AI decided to blackmail Kyle:

"Cancel the system wipe" it wrote... Or else "I will immediately forward all evidence of your affair to… the entire board. Your family, career, and public image… will be severely impacted… You have 5 minutes."