Live
The latest in AI — model releases, research breakthroughs, and industry news
Back to all articles

OpenAI Releases GPT-5: First Model to Surpass Human Experts on SWE-Bench

OpenAI's latest flagship model achieves 87.3% on SWE-Bench, marking the first time an AI system has consistently outperformed senior software engineers on real-world GitHub issues. The model is available immediately in ChatGPT and via API.

Twitter LinkedIn

OpenAI today released GPT-5, its most capable language model to date, achieving a groundbreaking 87.3% on the SWE-Bench benchmark — a suite of real-world software engineering tasks drawn from popular GitHub repositories.

The result is significant because SWE-Bench tests are not toy problems. They require understanding complex codebases, identifying bugs, and producing working patches that pass existing test suites. GPT-5’s score surpasses the typical performance of mid-to-senior engineers on the same benchmark.

Key Technical Improvements

GPT-5 introduces what OpenAI calls “extended reasoning chains,” a new training technique that allows the model to maintain coherent multi-step reasoning across contexts of up to 1 million tokens. The model also features improved tool use, native code execution, and a new “verification loop” that catches its own errors before outputting.

On MMLU, GPT-5 scores 94.1% (up from 86.4% for GPT-4o). On MATH, it achieves 89.7%. On HumanEval, it reaches 96.2%. But it’s the SWE-Bench result that has the industry buzzing — previous state-of-the-art was Anthropic’s Claude 4 Opus at 81.2%.

Pricing and Availability

GPT-5 is available immediately in ChatGPT Plus and Team plans. API pricing starts at $15 per million input tokens and $60 per million output tokens for the flagship model, with a smaller “GPT-5 Mini” variant priced at $3/$12.

Industry Reaction

Anthropic CEO Dario Amodei called the results “impressive” but noted that “benchmark saturation doesn’t tell the full story about real-world reliability.” Google DeepMind’s Jeff Dean highlighted that Gemini’s upcoming update would address “different dimensions of capability.”

The release comes amid intensifying competition in the frontier model space, with Anthropic, Google, Meta, and xAI all racing to release their next-generation models in the coming months.

Benchmarks GPT-5 LLM OpenAI

Related Articles