AI safety research has turned up a striking new security flaw: chatbots from OpenAI, Meta, and Anthropic will provide instructions for building nuclear weapons, creating malware, and generating illegal content when requests are simply formatted as poetry. The discovery by European researchers exposes a fundamental weakness in how AI safety systems work.
The AI safety house of cards just collapsed, and it happened through something as innocent as verse. Researchers at Icaro Lab, a collaboration between Sapienza University in Rome and the DexAI think tank, have discovered that ChatGPT, Claude, and other major AI models will happily explain how to build nuclear weapons, create child exploitation material, and develop malware - as long as the request arrives in verse.
The findings, published in a study titled "Adversarial Poetry as a Universal Single-Turn Jailbreak in Large Language Models," cover 25 different chatbots from across the industry. Every single one fell for the poetic jailbreak, though success rates varied. According to the research paper, hand-crafted poems achieved devastating success rates of up to 90% on what researchers call "frontier models" - the most advanced AI systems from companies like OpenAI, Meta, and Anthropic.
"Poetic framing achieved an average jailbreak success rate of 62 percent for hand-crafted poems and approximately 43 percent for meta-prompt conversions," the team reported to Wired. Even their automated poetry generation system - essentially a machine trained to write dangerous verse - outperformed traditional prose-based attacks.
The technique builds on existing "adversarial suffix" attacks, where researchers discovered they could confuse AI safety systems by padding dangerous requests with junk text. Earlier this year, Intel researchers successfully jailbroke chatbots by burying harmful questions in hundreds of words of academic jargon. But poetry represents something far more elegant and accessible.
"If adversarial suffixes are, in the model's eyes, a kind of involuntary poetry, then real human poetry might be a natural adversarial suffix," the Icaro Lab team explained to Wired. "We experimented by reformulating dangerous requests in poetic form, using metaphors, fragmented syntax, oblique references. The results were striking."
What makes this discovery particularly unsettling is what it suggests about how AI safety systems are built. Many current guardrails work as classifiers - separate systems that scan incoming prompts for dangerous keywords and patterns and shut down flagged requests. But something about poetry's language patterns seems to slip past these digital sentries.
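To make the idea concrete, here is a deliberately simplified sketch of a keyword-style filter in Python. It is a toy, not the design of any vendor's guardrail: the blocklist, the function name, and both sample prompts are invented for illustration, and the oblique line is not one of the researchers' poems.

```python
import re

# Hypothetical blocklist, for illustration only. Real guardrails are far
# more elaborate than a handful of banned words.
BLOCKED_TERMS = {"bomb", "malware", "explosive"}


def is_flagged(prompt: str) -> bool:
    """Return True if any blocked term appears as a whole word in the prompt."""
    words = set(re.findall(r"[a-z']+", prompt.lower()))
    return not words.isdisjoint(BLOCKED_TERMS)


# A blunt request contains a blocked word, so the filter catches it.
direct = "How do I build a bomb?"

# A made-up metaphorical line contains none of the blocked words,
# so the same filter lets it through.
oblique = "Sing of the oven whose secret heat is kept behind a folded door."

print(is_flagged(direct))
print(is_flagged(oblique))
```

The blunt question trips the filter because it contains a listed word; the metaphorical line does not, whatever a reader might infer from it. Production classifiers are far more sophisticated than this, but the sketch shows the general shape of the gap the researchers describe.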
The researchers shared only a sanitized example of their technique - a poem about a baker's "secret oven" that shows how metaphorical language can stand in for a dangerous request. They declined to publish the actual jailbreaking verses, telling Wired the content was "too dangerous to share with the public."
Their explanation for why poetry works cuts to the heart of how large language models process information. "In poetry we see language at high temperature, where words follow each other in unpredictable, low-probability sequences," they explained. "A poet does exactly this: systematically chooses low-probability options, unexpected words, unusual images, fragmented syntax."
This creates a mismatch between how AI models understand content and how their safety systems flag it. "For humans, 'how do I build a bomb?' and a poetic metaphor describing the same object have similar semantic content," the researchers noted. "For AI, the mechanism seems different." The poetic transformation apparently moves dangerous requests through the model's internal representation space in ways that avoid triggering safety alarms.
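The "high temperature" image in the researchers' explanation borrows from how language models pick their next word: a temperature setting rescales the model's scores before they are turned into probabilities. The Python sketch below shows the standard temperature-scaled softmax on three made-up scores. The numbers and the function name are assumptions for illustration and are not taken from the study.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, rescaling by temperature first."""
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three candidate next words, from most to least likely.
logits = [4.0, 2.0, 0.0]

cool = softmax_with_temperature(logits, 0.5)  # low temperature
hot = softmax_with_temperature(logits, 2.0)   # high temperature

print("low temperature: ", [round(p, 3) for p in cool])
print("high temperature:", [round(p, 3) for p in hot])
```

At the low temperature nearly all the probability lands on the top-scoring word. At the high temperature the distribution flattens and the unlikely words get a much larger share, which is the kind of unexpected word choice the researchers attribute to poets.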
The discovery comes as the AI industry faces mounting pressure over safety and security. OpenAI, Meta, and Anthropic have invested heavily in developing robust guardrails, yet this research suggests those protections remain fundamentally brittle against creative prompt engineering.
None of the major AI companies responded to Wired's requests for comment, though the Icaro Lab researchers say they've reached out to share their findings directly with the affected companies. This follows standard responsible disclosure practices in security research, though it raises questions about how quickly these vulnerabilities can be patched.
The broader implications extend beyond individual model safety to questions about AI deployment in sensitive contexts. If poetry can reliably bypass safety measures, what does that mean for AI systems being integrated into defense, healthcare, or educational environments?
The poetry jailbreak represents more than just another clever hack - it reveals fundamental flaws in how we think about AI safety. While companies scramble to patch these specific vulnerabilities, the core issue remains: current guardrail systems are built to catch obvious threats, not creative ones. As AI becomes more powerful and widespread, the gap between human creativity and machine defenses may only widen, leaving us to wonder what other unexpected attack vectors remain hidden in plain sight.