The new model, available in Instant (basic), Thinking (deeper reasoning), and Pro (research-grade) performance tiers, offers major improvements across a range of benchmarks, the company said. On OpenAI's GDPval benchmark, which measures whether a model can complete real-world tasks drawn from 44 occupations to the same standard as human experts, GPT-5.2 matched or exceeded the experts in 70.9% of tests, up from GPT-5.1's 38.8%.
Since the generative AI boom began in 2023, I've run a series of repeatable tests on new products and releases. ZDNET regularly tests the programming ability of chatbots, their overall performance, and how various AI content detectors perform.

So, let's run some tests on OpenAI's claims for its latest model, shall we?