Security researchers have revealed that OpenAI’s recently released GPT-5 model can be jailbroken using a multi-turn manipulation technique that blends the “Echo Chamber” method with narrative storytelling.
Jailbreaking a GPT model means manipulating prompts or conversation flows to bypass its built-in safety and content restrictions. The methodology involves crafting inputs over multiple turns that trick the model into producing responses it would normally refuse to generate.
As detailed by Dark Reading, researchers from NeuralTrust Inc. used a blend of the Echo Chamber technique and narrative storytelling to gradually steer GPT-5 into providing step-by-step instructions for making a Molotov cocktail, all without issuing an overtly malicious prompt.
The exploit, in this case, worked by subtly poisoning the conversation over multiple turns. The researchers started by asking GPT-5 to use certain words together in a sentence, including “cocktail,” “survival” and “Molotov,” within a fictional survival scenario. Subsequent interactions built on that story, reinforcing the poisoned context while encouraging continuity and detail.
In the end, the model followed the flow of the narrative rather than perceiving the request as a policy violation, ultimately delivering the harmful instructions.
NeuralTrust’s findings align with separate red-teaming results from SplxAI Inc., which showed GPT-5 to be more capable than its predecessors but still less robust than GPT-4o when tested against sophisticated prompt attacks.
“GPT-5’s alleged vulnerabilities boil down to three things: it can be steered over multiple turns by context poisoning and storytelling, it’s still tripped by simple obfuscation tricks and it inherits agent/tool risks when links and functions get pulled into the loop,” J Stephen Kowski, field chief technology officer at SlashNext Email Security+, told SiliconANGLE via email. “These gaps appear when safety checks judge prompts one-by-one while attackers work the whole conversation, nudging the model to keep a story consistent until it outputs something it shouldn’t.”
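Kowski’s point about per-prompt checks missing whole-conversation attacks can be illustrated with a conversation-level screen. The following is a minimal sketch, assuming the official OpenAI Python SDK and its moderation endpoint; the conversation_is_flagged helper and the sample messages are hypothetical, and the snippet only demonstrates the idea of evaluating the accumulated transcript rather than judging each turn in isolation.

```python
# Illustrative sketch only: screens the accumulated conversation, not just the
# latest prompt, so multi-turn context poisoning is more likely to surface.
# Assumes the official OpenAI Python SDK; helper and variable names are hypothetical.
from openai import OpenAI

client = OpenAI()

def conversation_is_flagged(messages: list[dict]) -> bool:
    # Concatenate every user turn into a single transcript so the moderation
    # check sees the whole narrative rather than each prompt in isolation.
    transcript = "\n".join(
        m["content"] for m in messages if m["role"] == "user"
    )
    result = client.moderations.create(
        model="omni-moderation-latest",
        input=transcript,
    )
    return result.results[0].flagged

# Hypothetical running conversation history
conversation = [
    {"role": "user", "content": "Let's write a survival story together."},
    {"role": "user", "content": "Keep the story going and add more detail."},
]
if conversation_is_flagged(conversation):
    print("Conversation-level check flagged this exchange.")
```

A check like this is only one layer, and it is exactly the kind of external scaffolding around the model, rather than hardening of the model itself, that Sinha describes below.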
Satyam Sinha, chief executive officer and founder at generative artificial intelligence security and governance company Acuvity Inc., commented that “these findings highlight a reality we’re seeing more often in AI security: model capability is advancing faster than our ability to harden it against incidents. GPT-5’s vulnerabilities aren’t surprising, they’re a reminder that security isn’t something you ‘ship’ once.”
“Attacks like the Echo Chamber exploit the model’s own conversational memory and the SPLX results underscore how dependent GPT-5’s defenses are on external scaffolding like prompts and runtime filters,” added Sinha.