Benchmarks
2 articles
OpenAI Releases GPT-5: First Model to Surpass Human Experts on SWE-Bench
OpenAI's latest flagship model achieves 87.3% on SWE-Bench, marking the first time an AI system has consistently outperformed senior software engineers on real-world GitHub issues. The model is available immediately in ChatGPT and via API.
Google DeepMind’s Gemini 2.5 Pro Tops MMLU at 95.2% — But Can It Ship?
Google DeepMind announces Gemini 2.5 Pro with record-setting benchmark scores, but the model remains in limited preview. Developers question whether Google can match OpenAI's deployment velocity.