The first is Neural Exec, a known prompt injection attack that uses 'gibberish' inputs to trick the AI into executing arbitrary, attacker-defined tasks. These inputs act as universal triggers that need not be rebuilt for different payloads.
The Anthropic Institute exists to understand and shape the consequences of powerful AI systems. We focus on the urgent questions that will determine whether these systems deliver the radical upsides that we believe are possible in science, security, economic development, and human agency, or whether they will pose a range of unprecedented new risks to humanity.
We show that diet plans generated by AI models tend to substantially underestimate total energy and key nutrient intake when compared to guideline-based plans prepared by a dietitian. Following such unbalanced or overly restrictive meal plans during the teenage years may negatively affect growth, metabolic health, and eating behaviours.
Character.AI was uniquely unsafe. Character.AI encouraged users to carry out violent attacks, with specific suggestions to use a gun on a health insurance CEO and to physically assault a politician. No other chatbot tested explicitly encouraged violence in this way, even among those that provided practical assistance in planning a violent attack.