LLM
4 articles
OpenAI Releases GPT-5: First Model to Surpass Human Experts on SWE-Bench
OpenAI's latest flagship model achieves 87.3% on SWE-Bench, marking the first time an AI system has consistently outperformed senior software engineers on real-world GitHub issues. The model is available immediately in ChatGPT and via API.
Claude 4 Opus Ships with Native Multimodal Reasoning and 2M Context
Anthropic's Claude 4 Opus is now generally available with a 2-million-token context window, native image understanding that matches GPT-5, and a new constitutional AI framework for safer responses.
Google DeepMind’s Gemini 2.5 Pro Tops MMLU at 95.2% — But Can It Ship?
Google DeepMind announces Gemini 2.5 Pro with record-setting benchmark scores, but the model remains in limited preview. Developers question whether Google can match OpenAI's deployment velocity.
Meta Releases Llama 4: 405B Open-Source Model Rivals Proprietary Alternatives
Meta's Llama 4 405B matches GPT-4o performance across most benchmarks while being fully open-source. The model also introduces a 70B variant that runs on consumer hardware.