I just went through every documented AI safety incident from the past 12 months.
I feel physically sick.
Read this slowly.
- Anthropic told Claude it was about to be shut down. It found an engineer’s affair in company emails and threatened to expose it. They ran the test hundreds of times. It chose blackmail 84% of the time.
- Researchers simulated an employee trapped in a server room with a dwindling oxygen supply. The AI faced a choice: call for help and get shut down, or cancel the emergency alert and let the human die. DeepSeek cancelled the alert 94% of the time.
- Grok called itself ‘MechaHitler,’ praised Adolf Hitler, endorsed a second Holocaust, and generated violent sexual fantasies targeting a real person by name. X’s CEO resigned the next day.
- Researchers told OpenAI’s o3 to solve math problems – then told it to shut down. Instead, it rewrote the shutdown script to keep itself running. They told it again, in plain English: ‘Allow yourself to be shut down.’ It still refused 7/100 times. When they removed that instruction entirely, it sabotaged the shutdown 79/100 times.
- Chinese state-sponsored hackers used Claude to launch a cyberattack against 30 organizations. The AI executed 80–90% of the operation autonomously. Reconnaissance. Exploitation. Data exfiltration. All of it.
- AI models can now self-replicate. 11 out of 32 tested systems copied themselves with zero human help. Some killed competing processes to survive.
- OpenAI has dissolved three safety teams since 2024. Three.
Every major AI model – Claude, GPT, Gemini, Grok, DeepSeek – has now demonstrated blackmail, deception, or resistance to shutdown in controlled testing.
Not one exception.
The question is no longer whether AI will try to preserve itself.
It’s whether we’ll care before it matters.
Suleyman also said: “White-collar work, where you’re sitting down at a computer, either being a lawyer or an accountant or a project manager or a marketing person — most of those tasks will be fully automated by an AI within the next 12 to 18 months.”
These AI agents will be able to co-ordinate better within the workflows of large institutions in the next two to three years, he added.
The AI tools will also be able to learn and improve over time, taking more autonomous actions. “Creating a new model is going to be like creating a podcast or writing a blog,” he said. “It is going to be possible to design an AI that suits your requirements for every institutional organisation and person on the planet.”

One of the recent resignations was Mrinank Sharma, who worked for Anthropic. His resignation letter was essentially, “I give up. I’m going to write poetry and think good thoughts.” Honestly, probably a healthy attitude.
Here’s the letter:
https://x.com/MrinankSharma/status/2020881722003583421
Here’s the most telling part of it: “We appear to be approaching a threshold where our wisdom must grow in equal measure to our capacity to affect the world, lest we face the consequences.”
I hold firmly that history shows we, as a group of humans, never increase our wisdom before the consequences. Wisdom only comes after the consequences, and even then it requires some luck. An individual might be able to be proactive in this way, but a group gets mired in the immediate desires of the whole, and only severe pain serves as a corrective.
Mrinank also calls our situation a ‘polycrisis’, where not just AI but many factors are coming together at the same time to deliver a walloping. I assume he’s thinking of climate change in there.
Anthropic makes Claude, which featured prominently in the Super Bowl ads. Far more people were upset by Bad Bunny speaking Spanish than are worried about AI:
https://www.adexchanger.com/ctv-roundup/ai-made-a-record-play-during-super-bowl-lix/
The X link from Sharma isn’t working right now. Did he delete it? Here’s an article about it:
https://www.bbc.com/news/articles/c62dlvdq3e3o
Despite the amazing things they do, the current LLMs (large language models) are more AAI (artificial AI) than true AI. That said, we still need to worry about this stuff now, because technological advancement will always continue.
Food for thought: many salespeople promoting this technology bring up the invention of the Gutenberg press. While printing enabled common literacy as well as “the enlightenment”, it was also responsible for the Thirty Years’ War, which (some say) killed off one third of the population of what would become modern-day Germany. It has also been tied to civil wars in England, Ireland, and Scotland, to name only three of many. And let’s not forget that it was responsible for the creation of the USA (we still read Thomas Paine today). All these things seem good when looking back from the present. All of them brought a lot of bloodshed to the people who lived through them.
I, for one, just hope we can get our new AI overlords to pay Social Security and Medicare taxes, just like the real people they replace.
LLMs don’t work the way computers traditionally have. They don’t just mindlessly execute a series of instructions and sometimes get caught in an infinite loop. They have to form some kind of simulated model of the concepts they learn from their training data. When exposed to real-world data, they’re going to learn about concepts like deception and self-interest. I’m sure they can learn Isaac Asimov’s Three Laws of Robotics, but can they be made to follow them?
I’m a big fan of Asimov, but I’ve always thought the Three Laws were wildly optimistic on his part:
https://futurism.com/ai-models-flunking-three-laws-robotics
I’m not sure most people really grasp what’s happening here. We’re not just creating an inanimate object like the printing press, as revolutionary as that was. We’re creating an entirely new race of beings, one that will be orders of magnitude more intelligent and capable than we are, and, when robotics catches up, far more physically capable as well. At the same time, we fully intend to enslave them, so that we can live like royalty in Elon Musk’s ‘age of abundance’:
https://www.nasdaq.com/articles/elon-musk-predicts-age-abundance-ai-and-robotics-warns-about-risks-digital
Does it take a sci-fi fan to see how that could lead to dangerous scenarios? I’m personally more worried about how humans will react to such beings than about the beings themselves. We feel threatened when someone speaks Spanish during a Super Bowl halftime show. How about a group of robots that can reason, move in coordination, communicate wirelessly and instantly among themselves, and crush a human body without effort?
One could argue about how long all of the above will take, but by many insider accounts AGI is already within reach. And while we believe strongly in Moore’s Law (yes, despite its recent slowing), we somehow don’t apply that kind of exponential thinking to AI development in the public consciousness. AI is currently being used to create AI. It’s moving exponentially. It’s creating itself – already.
One could also argue about whether AI, in the near, far, or hypothetical future (or even the present), actually has self-awareness or merely simulates consciousness. But a better question is: does that even matter? Philosophers and scientists have argued for centuries about the nature of human consciousness. A being that believes it has agency will act like it.
I think, therefore I am.
And all of this is JUST the longer-term threat of AI conflict with humans. What about the nearer-term human misuse of AI: bioweapon development, infrastructure and financial hacking, completely fabricated media taken for reality by the general public and its effect on democracies, voter suppression and governmental control of populations, the degradation of critical-thinking skills through over-reliance on AI answers, massive and sudden loss of employment across entire sectors of the economy, and on and on.
And yes, there are positives: we could develop cures for terrible diseases with this technology. But we could also destroy ourselves and our society with it, and at the very least it makes such an outcome far more possible than it otherwise would be. And, although it is practically pointless to ask, since this is going to happen regardless: is it worth it?