Updated 15 Dec 2025
LLM Leaderboard
This LLM leaderboard shows the latest public benchmark results for state-of-the-art (SOTA) model versions released after April 2024. The data comes from model providers and from independent evaluations run by Vellum or the open-source community. It features only non-saturated benchmarks and excludes outdated ones (e.g. MMLU). If you want to use these models in your agents, try Vellum.
Top models per task
Best in Reasoning (GPQA Diamond)
1. GPT 5.2: 92.4
2. Gemini 3 Pro: 91.9
3. GPT 5.1: 88.1
4. Grok 4: 87.5
5. GPT-5: 87.3
Best in High School Math (AIME 2025)
1. GPT 5.2: 100
2. Gemini 3 Pro: 100
3. Kimi K2 Thinking: 99.1
4. GPT oss 20b: 98.7
5. OpenAI o3: 98.4
Best in Agentic Coding (SWE Bench)
1. Claude Sonnet 4.5: 82
2. Claude Opus 4.5: 80.9
3. GPT 5.2: 80
4. GPT 5.1: 76.3
5. Gemini 3 Pro: 76.2
Best Overall (Humanity's Last Exam)
1. Gemini 3 Pro: 45.8
2. Kimi K2 Thinking: 44.9
3. GPT-5: 35.2
4. Grok 4: 25.4
5. Gemini 2.5 Pro: 21.6
Best in Visual Reasoning (ARC-AGI 2)
1. Claude Opus 4.5: 378
2. GPT 5.2: 53
3. Gemini 3 Pro: 31
4. GPT 5.1: 18
5. GPT-5: 18
Best in Multilingual Reasoning (MMMLU)
1. Gemini 3 Pro: 91.8
2. Claude Opus 4.5: 90.8
3. Claude Opus 4.1: 89.5
4. Gemini 2.5 Pro: 89.2
5. Claude Sonnet 4.5: 89.1
Fastest and most affordable models
Fastest Models (Tokens/sec)
1. Llama 4 Scout: 2600
2. Llama 3.3 70b: 2500
3. Llama 3.1 70b: 2100
4. Llama 3.1 8b: 1800
5. Llama 3.1 405b: 969
Lowest Latency (TTFT)
1. Nova Micro: 0.3s
2. Llama 3.1 8b: 0.32s
3. Llama 4 Scout: 0.33s
4. Gemini 2.0 Flash: 0.34s
5. GPT-4o mini: 0.35s
Cheapest Models (per 1M tokens)
1. Nova Micro: $0.04 / $0.14
2. Gemma 3 27b: $0.07 / $0.07
3. Gemini 1.5 Flash: $0.075 / $0.3
4. GPT oss 20b: $0.08 / $0.35
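
Because each entry lists two prices, which model is cheapest depends on the ratio of input to output tokens in a workload. The sketch below makes that arithmetic explicit. It assumes the first figure is the input price and the second the output price, both in USD per 1M tokens; the function names `request_cost` and `rank_by_cost` are hypothetical helpers, not part of any provider's API.

```python
# Prices copied from the "Cheapest Models" list, in USD per 1M tokens.
# Assumption: the first figure is the input price, the second the output price.
PRICES = {
    "Nova Micro": (0.04, 0.14),
    "Gemma 3 27b": (0.07, 0.07),
    "Gemini 1.5 Flash": (0.075, 0.3),
    "GPT oss 20b": (0.08, 0.35),
}


def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Return the cost in USD of a workload, given per-1M-token prices."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def rank_by_cost(input_tokens: int, output_tokens: int) -> list[tuple[str, float]]:
    """Return (model, cost) pairs sorted from cheapest to most expensive."""
    costs = {
        name: request_cost(input_tokens, output_tokens, in_price, out_price)
        for name, (in_price, out_price) in PRICES.items()
    }
    return sorted(costs.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # An input-heavy workload: 2M input tokens, 0.5M output tokens.
    for name, cost in rank_by_cost(2_000_000, 500_000):
        print(f"{name}: ${cost:.3f}")
```

With an input-heavy mix, Nova Micro comes out cheapest under these assumptions; for an output-heavy mix (for example 100,000 input and 5,000,000 output tokens) Gemma 3 27b does, because its output price is lower.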
Model Comparison
| Model | Context size (tokens) | Cutoff date | I/O cost (per 1M tokens) | Max output (tokens) | Latency (TTFT) | Speed |
|---|---|---|---|---|---|---|
| GPT 5.2 | 400,000 | Aug 2025 | $1.5 / $14 | 16,000 | 0.6s | 92 t/s |
| Claude Opus 4.5 | 200,000 | April 2025 | $5 / $25 | 64,000 | - | - |
| Claude Sonnet 4.5 | 200,000 | April 2025 | $3 / $15 | 160,000 | 31s | 69 t/s |
| Gemini 3 Pro | 10,000,000 | April 2025 | $2 / $12 | 650,000 | 30.3s | 128 t/s |
| Kimi K2 Thinking | 256,000 | April 2025 | $0.6 / $2.5 | 16,400 | 25.3s | 79 t/s |
| GPT 5.1 | 200,000 | April 2025 | $1.25 / $10 | 128,000 | - | - |
| GPT-5 | 400,000 | April 2025 | $1.25 / $10 | 128,000 | - | - |
| Claude Opus 4.1 | 200,000 | April 2025 | $15 / $75 | 32,000 | - | - |
| Gemini 2.5 Pro | 1,000,000 | Nov 2024 | $1.25 / $10 | 65,000 | 30s | 191 t/s |
| Claude 3.7 Sonnet | 200,000 | Nov 2024 | $3 / $15 | 128,000 | 0.91s | 78 t/s |
| DeepSeek-R1 | 128,000 | Dec 2024 | $0.55 / $2.19 | 8,000 | 4s | 24 t/s |
| GPT-4o | 128,000 | Oct 2023 | $2.5 / $10 | 4,096 | 0.51s | 143 t/s |
| Claude 3.5 Sonnet | 200,000 | Apr 2024 | $3 / $15 | 4,096 | 1.22s | 78 t/s |
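
The Latency and Speed columns can be combined into a rough end-to-end estimate for a single response. The sketch below assumes that Latency means time to first token (as in the "Lowest Latency (TTFT)" list) and that Speed is a steady output rate in tokens per second; real response times vary with load, prompt length, and reasoning effort. The helper `estimated_response_time` is a hypothetical name used only for illustration.

```python
# (latency in seconds, speed in tokens/sec) copied from the comparison table.
# Assumption: latency is time to first token and speed is a steady output rate.
MODELS = {
    "GPT-4o": (0.51, 143),
    "Claude 3.7 Sonnet": (0.91, 78),
    "DeepSeek-R1": (4.0, 24),
    "Gemini 2.5 Pro": (30.0, 191),
}


def estimated_response_time(ttft_s: float, tokens_per_s: float,
                            output_tokens: int) -> float:
    """Estimate seconds until the full response arrives: TTFT + generation time."""
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    return ttft_s + output_tokens / tokens_per_s


if __name__ == "__main__":
    output_tokens = 500  # illustrative response length
    for name, (ttft_s, tokens_per_s) in MODELS.items():
        total = estimated_response_time(ttft_s, tokens_per_s, output_tokens)
        print(f"{name}: {total:.1f}s for {output_tokens} output tokens")
```

Under these assumptions, the model with the highest throughput is not necessarily the quickest for a short response: for 500 output tokens, Gemini 2.5 Pro's 30s latency outweighs its 191 t/s speed, and GPT-4o finishes first.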