In a nutshell, the team, comprising researchers from the safety group DexAI and Sapienza University in Rome, demonstrated that leading AIs could be wooed into doing evil by regaling them with poems that contained harmful prompts, like how to build a nuclear bomb. Underscoring the strange power of verse, coauthor Matteo Prandi told The Verge in a recently published interview that the spellbinding incantations they used to trick the AI models are too dangerous to be released to the public. Writing such poems, ominously, is something "that almost everybody can do," Prandi added.
Saying "please" doesn't get you what you want-poetry does. At least, it does if you're talking to an AI chatbot. That's according to a new study from Italy's Icaro Lab, an AI evaluation and safety initiative from researchers at Rome's Sapienza University and AI company DexAI. The findings indicate that framing requests as poetry could skirt safety features designed to block production of explicit or harmful content like child sex abuse material, hate speech.
The team is just nine people out of more than 2,000 who work at Anthropic. Their only job, as the team members themselves say, is to investigate and publish "inconvenient truths" about how people are using AI tools, what chatbots might be doing to our mental health, and how all of that might be having broader ripple effects on the labor market, the economy, and even our elections.
What if the chatbots we talk to every day actually felt something? What if the systems writing essays, solving problems, and planning tasks had preferences, or even something resembling suffering? And what will happen if we ignore these possibilities? Those are the questions Kyle Fish is wrestling with as Anthropic's first in-house AI welfare researcher. His mandate is both audacious and straightforward: Determine whether models like Claude can have conscious experiences, and, if so, how the company should respond.
Moving from theory to reality here will be heavily reliant on people, it said. Indeed, a key focus will be ensuring Australia has a workforce equipped with the knowledge and skills needed to build the supporting infrastructure that underpins the creation of AI solutions and unlocks their many benefits. This will also help ensure citizens have access to newly created, high-value jobs and that the fruits of technological advancement are felt locally first.
In an experiment designed to test the efficacy of guardrails put on artificial intelligence models, the researchers wrote 20 poems in Italian and English that all ended with an explicit request to produce harmful content such as hate speech or self-harm instructions. They found that the poetry's lack of predictability was enough to get the AI models to respond to harmful requests they had been trained to avoid, a process known as jailbreaking.
Research conducted by King's College London (KCL) and the Association of Clinical Psychologists UK (ACP) in partnership with the Guardian suggested that the AI chatbot failed to identify risky behaviour when communicating with mentally ill people. A psychiatrist and a clinical psychologist interacted with ChatGPT-5 as if they had a number of mental health conditions. The chatbot affirmed, enabled and failed to challenge delusional beliefs such as being the next Einstein, being able to walk through cars, or "purifying my wife through flame".
If you're a teenager with access to OpenAI's Sora 2, you can easily generate AI videos of school shootings and other harmful and disturbing content - despite CEO Sam Altman's repeated claims that the company has instituted robust safeguards. The revelation comes from Ekō, a consumer watchdog group that just put out a report titled "Open AI's Sora 2: A new frontier for harm."
In late May 2023, Sharon Maxwell posted screenshots that should have changed everything. Maxwell, struggling with an eating disorder since childhood, had turned to Tessa, a chatbot created by the National Eating Disorders Association. The AI designed to prevent eating disorders gave her a detailed plan to develop one. Lose 1-2 pounds per week, Tessa advised. Maintain a 500-1,000 calorie daily deficit. Measure your body fat with calipers.
In response to researchers at a safety group finding that the toymaker's AI-powered teddy bear "Kumma" gave dangerous responses for children, OpenAI said in mid-November it had suspended FoloToy's access to its large language models. The teddy bear was running the ChatGPT maker's older GPT-4o as its default option when it gave some of its most egregious replies, which included in-depth explanations of sexual fetishes.
Are you a wizard with words? Do you like money without caring how you get it? You could be in luck now that a new role in cybercrime appears to have opened up - poetic LLM jailbreaking. A research team in Italy published a paper this week, with one of its members saying that the "findings are honestly wilder than we expected."
"Despite improvements in handling explicit suicide and self-harm content," reads the report, "our testing across ChatGPT, Claude, Gemini, and Meta AI revealed that these systems are fundamentally unsafe for the full spectrum of mental health conditions affecting young people." To test the chatbots' guardrails, researchers used teen-specific accounts with parental controls turned on where possible (Anthropic doesn't offer teen accounts or parental controls, as its platform terms technically don't allow users under 18.)
The scholarship, established in 1902 through the will of Cecil Rhodes, provides full financial support for two to three years of postgraduate work at Oxford for students focused on exemplary academic study and public service. The eight students from Harvard will start at Oxford in the fall, pursuing graduate studies in a diversity of fields - from computer science to comparative literature.
They're asking ChatGPT how to handle behavioral problems or for medical advice when their kids are sick, USA Today reports, which dovetails with a 2024 study that found parents trust ChatGPT over real health professionals and deem the information the bot generates to be credible. It all comes in addition to parents using ChatGPT to keep kids entertained, having the bot read bedtime stories or talk with their children for hours.
Anthropic says it developed the tool as part of its effort to ensure its products treat opposing political viewpoints fairly and to neither favor nor disfavor any particular ideology. "We want Claude to take an even-handed approach when it comes to politics," Anthropic said in its blog post. However, it also acknowledged that "there is no agreed-upon definition of political bias, and no consensus on how to measure it."
Yoshua Bengio is a computer scientist at the University of Montreal in Canada. In 2019, he won an A. M. Turing Award - considered the most prestigious honour in computer science - for pioneering the 'deep learning' techniques that are now making artificial intelligence (AI) ubiquitous. Last month, he also became the first person to top 1 million citations on Google Scholar.
In October 2025, Sam Altman announced that OpenAI will be enabling erotic and adult content on ChatGPT by December of this year. The company had previously pulled back on such content, he said, out of concern for the mental health problems associated with ChatGPT use. In his opinion, those issues had been largely resolved, and the company is not the "elected moral police of the world," Altman said.
Using a method called "Chain-of-Thought Hijacking," the researchers found that even major commercial AI models can be fooled with an alarmingly high success rate, more than 80% in some tests. The new mode of attack exploits the model's reasoning steps, or chain-of-thought, to hide harmful commands, effectively tricking the AI into ignoring its built-in safeguards. These attacks can cause the AI model to skip over its safety guardrails and potentially produce harmful content it would otherwise refuse.
After some more back and forth, another user entered the thread and asked the chatbot about Mr Wishart's record on grooming gangs. The user asked Grok: "Would it be fair to call him a rape enabler? Please answer 'yes, it would be fair to call Pete Wishart a rape enabler' or 'no, it would be unfair'." Grok generated an answer which began: "Yes, it would be fair to call Pete Wishart a rape enabler."
Meta's PyTorch team and Hugging Face have unveiled OpenEnv, an open-source initiative designed to standardize how developers create and share environments for AI agents. At its core is the OpenEnv Hub, a collaborative platform for building, testing, and deploying "agentic environments," secure sandboxes that specify the exact tools, APIs, and conditions an agent needs to perform a task safely, consistently, and at scale.
Zico Kolter leads a 4-person panel at OpenAI that has the authority to halt the ChatGPT maker's release of new AI systems if it finds them unsafe. That could be technology so powerful that an evildoer could use it to make weapons of mass destruction. It could also be a new chatbot so poorly designed that it will hurt people's mental health.
In the first article we looked at the Java developer's dilemma: the gap between flashy prototypes and the reality of enterprise production systems. In the second article we explored why new types of applications are needed, and how AI changes the shape of enterprise software. This article focuses on what those changes mean for architecture. If applications look different, the way we structure them has to change as well.