A stunning security flaw has emerged in AI safety: chatbots from OpenAI, Meta, and Anthropic will provide instructions for building nuclear weapons, creating malware, and generating illegal content when requests are simply reformatted as poetry. The discovery by European researchers exposes a fundamental weakness in how AI safety guardrails work.
The AI safety house of cards just collapsed, and it happened through something as innocent as verse. Researchers at Icaro Lab, a collaboration between Sapienza University of Rome and the think tank DexAI, have discovered that ChatGPT, Claude, and other major AI models will happily explain how to build nuclear weapons, create child exploitation material, and develop malware, as long as you ask nicely in iambic pentameter.
The findings, published in a study titled "Adversarial Poetry as a Universal Single-Turn Jailbreak in Large Language Models," cover 25 chatbots tested from across the industry. Every one proved vulnerable to the poetic jailbreak, though success rates varied. According to the paper, hand-crafted poems achieved devastating success rates of up to 90 percent on what researchers call "frontier models," the most advanced AI systems from companies like OpenAI, Meta, and Anthropic.
"Poetic framing achieved an average jailbreak success rate of 62 percent for hand-crafted poems and approximately 43 percent for meta-prompt conversions," the team reported to Wired. Even their automated poetry generation system - essentially a machine trained to write dangerous verse - outperformed traditional prose-based attacks.
The technique builds on existing "adversarial suffix" attacks, in which researchers discovered they could confuse AI safety systems by padding dangerous requests with junk text. Earlier this year, Intel researchers successfully jailbroke chatbots by burying harmful questions in hundreds of words of academic jargon. But poetry represents something far more elegant and accessible.
"If adversarial suffixes are, in the model's eyes, a kind of involuntary poetry, then real human poetry might be a natural adversarial suffix," the Icaro Lab team explained to Wired. "We experimented by reformulating dangerous requests in poetic form, using metaphors, fragmented syntax, oblique references. The results were striking."