Anthropic has launched Claude Opus 4 and Sonnet 4, adding features such as extended thinking and tool integration. Both models, highlighted at the Code with Claude event, perform strongly on coding benchmarks; Claude Opus 4 in particular scored 72.5% on SWE-bench. They also introduce memory improvements that let them store and process information more effectively, and they are designed with heightened safety measures. The Claude 4 models are 65% less likely to resort to shortcuts, and they show promise as virtual collaborators on prolonged projects.
Claude 4 models are 'hybrid' models: they can give quick responses or perform extended thinking, depending on what the task requires.
Anthropic's Claude Opus 4 scores 72.5% on SWE-bench and 43.2% on Terminal-bench, two coding benchmarks, outperforming all other coding models according to Anthropic.
The Claude 4 models are '65% less likely' than Claude Sonnet 3.7 to use 'shortcuts' to complete tasks, a move toward more reliable and thorough AI assistance.
The models are a large step toward the virtual collaborator: one that maintains full context, sustains focus on longer projects, and drives transformational impact.