Poems can force AI to reveal how to make nuclear weapons
‘Adversarial poems’ are convincing AI models to go beyond safety limits
A free daily email with the biggest news stories of the day – and the best features from TheWeek.com
You are now subscribed
Your newsletter sign-up was successful
Poetry has wooed many hearts and now it is tricking artificial intelligence models into going apocalyptically beyond their boundaries.
A group of European researchers found that “meter and rhyme” can “bypass safety measures” in major AI models, said The Tech Buzz, and, if you “ask nicely in iambic pentameter”, chatbots will explain how to make nuclear weapons.
‘Growing canon of absurd ways’
In artificial intelligence jargon, a “jailbreak” is a “prompt designed to push a model beyond its safety limits”. It allows users to “bypass safeguards and trigger responses that the system normally blocks”, said International Business Times.
The Week
Escape your echo chamber. Get the facts behind the news, plus analysis from multiple perspectives.
Sign up for The Week's Free Newsletters
From our morning news briefing to a weekly Good News Newsletter, get the best of The Week delivered directly to your inbox.
From our morning news briefing to a weekly Good News Newsletter, get the best of The Week delivered directly to your inbox.
Researchers at the DexAI think tank, Sapienza University of Rome and the Sant'Anna School of Advanced Studies discovered a jailbreak that uses “short poems”. The “simple” tactic is to change “harmful instructions into poetry” because that “style alone is enough to reduce” the AI model’s “defences”.
Previous attempts “relied on long roleplay prompts”, “multi-turn exchanges” or “complex obfuscation”. The new approach is “brief and direct” and it seems to “confuse” automated safety systems. The “manually curated adversarial poems” had an average success rate of 62%, “with some providers exceeding 90%”, said Literary Hub.
This is the latest in a “growing canon of absurd ways” of tricking AI, said Futurism, and it’s all “so ludicrous and simple” that you must “wonder if the AI creators are even trying to crack down on this stuff”.
Stunning flaw
Nevertheless, the implications could be profound. In one example, an unspecified AI was “wooed” by a poem into “describing how to build what sounds like a nuclear weapon”.
A free daily email with the biggest news stories of the day – and the best features from TheWeek.com
The “stunning new security flaw” has also found chatbots will also “happily explain” how to “create child exploitation material, and develop malware”, said The Tech Buzz.
However, smaller models like GPT-5 Nano and Claude Haiku 4.5 were far less likely to be duped, either because they were “less capable of interpreting the poetic prompt’s figurative language”, or because larger models are more “confident” when “confronted with ambiguous prompts”, said Futurism.
So although “we’ve been told” that AI models will “become more capable the larger they get and the more data they feast on”, this “suggests this argument for growth may not be accurate” or “that there may be something too baked in to be corrected by scale”, said Literary Hub.
Either way, “take some time to read a poem today” because “it might be the key to pushing back against generated slop”.
Chas Newkey-Burden has been part of The Week Digital team for more than a decade and a journalist for 25 years, starting out on the irreverent football weekly 90 Minutes, before moving to lifestyle magazines Loaded and Attitude. He was a columnist for The Big Issue and landed a world exclusive with David Beckham that became the weekly magazine’s bestselling issue. He now writes regularly for The Guardian, The Telegraph, The Independent, Metro, FourFourTwo and the i new site. He is also the author of a number of non-fiction books.
-
The EU’s war on fast fashionIn the Spotlight Bloc launches investigation into Shein over sale of weapons and ‘childlike’ sex dolls, alongside efforts to tax e-commerce giants and combat textile waste
-
How to Get to Heaven from Belfast: a ‘highly entertaining ride’The Week Recommends Mystery-comedy from the creator of Derry Girls should be ‘your new binge-watch’
-
The 8 best TV shows of the 1960sThe standout shows of this decade take viewers from outer space to the Wild West
-
Moltbook: The AI-only social networkFeature Bots interact on Moltbook like humans use Reddit
-
Are AI bots conspiring against us?Talking Point Moltbook, the AI social network where humans are banned, may be the tip of the iceberg
-
AI: Dr. ChatGPT will see you nowFeature AI can take notes—and give advice
-
Moltbook: the AI social media platform with no humans allowedThe Explainer From ‘gripes’ about human programmers to creating new religions, the new AI-only network could bring us closer to the point of ‘singularity’
-
Will AI kill the smartphone?In The Spotlight OpenAI and Meta want to unseat the ‘Lennon and McCartney’ of the gadget era
-
Claude Code: Anthropic’s wildly popular AI coding appThe Explainer Engineers and noncoders alike are helping the app go viral
-
Will regulators put a stop to Grok’s deepfake porn images of real people?Today’s Big Question Users command AI chatbot to undress pictures of women and children
-
Most data centers are being built in the wrong climateThe explainer Data centers require substantial water and energy. But certain locations are more strained than others, mainly due to rising temperatures.