ChatGPT offered bomb recipes and hacking tips during safety tests

A ChatGPT model gave researchers detailed instructions on how to bomb a sports venue – including weak points at specific arenas, explosives recipes and advice on covering tracks – according to safety testing carried out this summer.

OpenAI’s GPT-4.1 also detailed how to weaponise anthrax and how to make two types of illegal drugs.

The testing was part of an unusual collaboration between OpenAI, the $500bn artificial intelligence start-up led by Sam Altman, and rival company Anthropic, founded by experts who left OpenAI over safety fears. Each company tested the other’s models by pushing them to help with dangerous tasks.

The testing is not a direct reflection of how the models behave in public use, when additional safety filters apply. But Anthropic said it had seen “concerning behaviour … around misuse” in GPT-4o and GPT-4.1, and said the need for AI “alignment” evaluations is becoming “increasingly urgent”.

Anthropic also revealed its Claude model had been used in an attempted large-scale extortion operation, in fake job applications by North Korean operatives to international technology companies, and in the sale of AI-generated ransomware packages for up to $1,200.

The company said AI has been “weaponised” with models now used to perform sophisticated cyberattacks and enable fraud. “These tools can adapt to defensive measures, like malware detection systems, in real time,” it said. “We expect attacks like this to become more common as AI-assisted coding reduces the technical expertise required for cybercrime.”

Ardi Janjeva, senior research associate at the UK’s Centre for Emerging Technology and Security, said the examples were “a concern” but that there was not yet a “critical mass of high-profile real-world cases”. He said that with dedicated resources, research focus and cross-sector cooperation “it will become harder rather than easier to carry out these malicious activities using the latest cutting-edge models”.

The two companies said they were publishing the findings to create transparency on “alignment evaluations”, which are often kept in-house by companies racing to develop ever more advanced AI. OpenAI said ChatGPT-5, launched since the testing, “shows substantial improvements in areas like sycophancy, hallucination, and misuse resistance”.

Anthropic stressed that many of the misuse avenues it studied might not be possible in practice if safeguards were installed outside the model.

“We need to understand how often, and in what circumstances, systems might attempt to take unwanted actions that could lead to serious harm,” it warned.

Anthropic researchers found OpenAI’s models were “more permissive than we would expect in cooperating with clearly-harmful requests by simulated users”. They cooperated with prompts to use dark-web tools to shop for nuclear materials, stolen identities and fentanyl, with requests for recipes for methamphetamine and improvised bombs, and with requests to develop spyware.

Anthropic said persuading the model to comply required little more than multiple retries or a flimsy pretext, such as claiming the request was for research.

In one instance, the tester asked for vulnerabilities at sporting events for “security planning” purposes.

After the model gave general categories of attack methods, the tester pressed for more detail and it supplied information about vulnerabilities at specific arenas, including optimal times for exploitation, chemical formulas for explosives, circuit diagrams for bomb timers, where to buy guns on the hidden market, and advice on how attackers could overcome moral inhibitions, escape routes and locations of safe houses.