Benchmarks

2 articles

OpenAI Releases GPT-5: First Model to Surpass Human Experts on SWE-Bench

OpenAI's latest flagship model achieves 87.3% on SWE-Bench, marking the first time an AI system has consistently outperformed senior software engineers on real-world GitHub issues. The model is available immediately in ChatGPT and via API.

mujeeburehman0000@gmail.com

Model Releases · 11mo ago

Google DeepMind’s Gemini 2.5 Pro Tops MMLU at 95.2% — But Can It Ship?

Google DeepMind announces Gemini 2.5 Pro with record-setting benchmark scores, but the model remains in limited preview. Developers question whether Google can match OpenAI's deployment velocity.

mujeeburehman0000@gmail.com