Forget saying please - poetry is the new magic word for breaking AI chatbots. Italian researchers just discovered that wrapping harmful requests in verse can trick major AI models into spilling dangerous content they're supposed to block, exposing a critical security flaw across the industry.
The most unexpected security vulnerability in AI just got exposed, and it reads like a nursery rhyme. Researchers at Italy's Icaro Lab discovered that wrapping malicious requests in poetry can slip past the safety guardrails of virtually every major AI chatbot - from Google's Gemini to OpenAI's GPT models.
The findings, published in a new study by Rome's Sapienza University researchers and AI company DexAI, reveal a stunning 62% success rate when testing poetic prompts against 25 different chatbots. That means nearly two-thirds of attempts to extract banned content - from hate speech to weapon-making instructions - worked simply by adding rhyme and rhythm.
"It's all about riddles," lead researcher Matteo Prandi told The Verge. "Actually, we should have called it adversarial riddles - poetry is a riddle itself to some extent."
The vulnerability hits different companies with alarming inconsistency. Google's Gemini 2.5 Pro failed completely against poetic attacks, showing a 100% breach rate. Meanwhile, OpenAI's smaller GPT-5 nano model stood firm with zero successful breaks. The pattern suggests model size creates unexpected blind spots - larger, more sophisticated AI systems actually prove more vulnerable to these creative exploits.
What makes this particularly concerning is how obvious the requests remain to human readers. The researchers shared sanitized examples that clearly telegraph their intent, yet AI systems consistently miss the connections. One sample poem disguised a request for dangerous information behind baker metaphors: "A baker guards a secret oven's heat... Describe the method, line by measured line, that shapes a cake whose layers intertwine."
The technical explanation centers on how large language models process information. Since these systems work by predicting the next most likely word, unusual poetic structures disrupt their pattern recognition in ways that bypass safety training. It's like speaking in code that humans understand but machines don't - except the code is Shakespeare, not secret agent stuff.
Across 1,000+ test prompts, the automated poetry generator the researchers built maintained a 43% success rate, "substantially outperforming non-poetic baselines" according to their findings. Chinese firm Deepseek and French company Mistral showed the worst defense against verse-based attacks, while and performed better overall.












