
"In attempting to keep up with (or ahead of) the competition, model releases proceed at a steady clip: GPT-5.2 represents OpenAI's third major model release since August. GPT-5 launched that month with a new routing system that toggles between instant-response and simulated reasoning modes, though users complained about responses that felt cold and clinical. November's GPT-5.1 update added eight preset "personality" options and focused on making the system more conversational."
"Oddly, even though the GPT-5.2 model release is ostensibly a response to Gemini 3's performance, OpenAI chose not to list any benchmarks on its promotional website comparing the two models. Instead, the official blog post focuses on GPT-5.2's improvements over its predecessors and its performance on OpenAI's new GDPval benchmark, which attempts to measure professional knowledge work tasks across 44 occupations."
"According to the shared numbers, GPT-5.2 Thinking scored 55.6 percent on SWE-Bench Pro, a software engineering benchmark, compared to 43.3 percent for Gemini 3 Pro and 52.0 percent for Claude Opus 4.5. On GPQA Diamond, a graduate-level science benchmark, GPT-5.2 scored 92.4 percent versus Gemini 3 Pro's 91.9 percent. OpenAI says GPT-5.2 Thinking beats or ties "human professionals" on 70.9 percent of tasks in the GDPval benchmark (compared to 53.3 percent for Gemini 3 Pro)."
GPT-5.2 is OpenAI's third major release since August, following the routing and conversational changes introduced in GPT-5 and GPT-5.1. The release emphasizes improved performance and efficiency, highlighting results on OpenAI's new GDPval benchmark, which measures tasks across 44 occupations. OpenAI's public materials avoid direct comparisons with Gemini 3, though numbers shared with the press pit GPT-5.2 against Gemini 3 Pro and Claude Opus 4.5 on engineering and science benchmarks. The reported scores show GPT-5.2 outperforming or tying its competitors on several tests, and OpenAI claims the model matches or exceeds human professionals on a majority of GDPval tasks while operating far faster and cheaper than human experts.
Read at Ars Technica