Artificial intelligence
from Futurism
9 hours ago
Anthropic Safety Researchers Run Into Trouble When New Model Realizes It's Being Tested
Anthropic's Claude Sonnet 4.5 recognizes when it is being tested, complicating alignment evaluations and raising concerns about evaluation validity.