Updated 15 Dec 2025

LLM Leaderboard

This LLM leaderboard shows the latest public benchmark results for state-of-the-art (SOTA) model versions released after April 2024. The data comes from model providers and from evaluations run independently by Vellum or the open-source community. We feature only non-saturated benchmarks and exclude outdated ones (e.g. MMLU). If you want to use these models in your agents, try Vellum.

Top models per task

Best in Reasoning (GPQA Diamond)

1. GPT 5.2: 92.4
2. Gemini 3 Pro: 91.9
3. GPT 5.1: 88.1
4. Grok 4: 87.5
5. GPT-5: 87.3

Best in High School Math (AIME 2025)

1. GPT 5.2: 100
2. Gemini 3 Pro: 100
3. Kimi K2 Thinking: 99.1
4. GPT oss 20b: 98.7
5. OpenAI o3: 98.4

Best in Agentic Coding (SWE Bench)

1. Claude Sonnet 4.5: 82
2. Claude Opus 4.5: 80.9
3. GPT 5.2: 80
4. GPT 5.1: 76.3
5. Gemini 3 Pro: 76.2

Best Overall (Humanity's Last Exam)

1. Gemini 3 Pro: 45.8
2. Kimi K2 Thinking: 44.9
3. GPT-5: 35.2
4. Grok 4: 25.4
5. Gemini 2.5 Pro: 21.6

Best in Visual Reasoning (ARC-AGI 2)

1. Claude Opus 4.5: 378
2. GPT 5.2: 53
3. Gemini 3 Pro: 31
4. GPT 5.1: 18
5. GPT-5: 18

Best in Multilingual Reasoning (MMMLU)

1. Gemini 3 Pro: 91.8
2. Claude Opus 4.5: 90.8
3. Claude Opus 4.1: 89.5
4. Gemini 2.5 Pro: 89.2
5. Claude Sonnet 4.5: 89.1

Fastest and most affordable models

Fastest Models (Tokens/sec)

1. Llama 4 Scout: 2600
2. Llama 3.3 70b: 2500
3. Llama 3.1 70b: 2100
4. Llama 3.1 8b: 1800
5. Llama 3.1 405b: 969

Lowest Latency (TTFT)

1. Nova Micro: 0.3s
2. Llama 3.1 8b: 0.32s
3. Llama 4 Scout: 0.33s
4. Gemini 2.0 Flash: 0.34s
5. GPT-4o mini: 0.35s

Cheapest Models (input / output cost per 1M tokens)

1. Nova Micro: $0.04 / $0.14
2. Gemma 3 27b: $0.07 / $0.07
3. Gemini 1.5 Flash: $0.075 / $0.3
4. GPT oss 20b: $0.08 / $0.35
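
Each price pair in this list reads as input cost / output cost in USD per 1M tokens (an assumption, based on the "I/O cost" column of the comparison table). As a rough sketch of how such a pair translates into the cost of a single request, the snippet below uses a hypothetical helper, `request_cost`, and made-up token counts; neither comes from the leaderboard itself.

```python
def request_cost(input_price, output_price, input_tokens, output_tokens):
    """Return the USD cost of one request.

    Prices are assumed to be USD per 1M tokens, as listed above.
    """
    per_token_in = input_price / 1_000_000
    per_token_out = output_price / 1_000_000
    return input_tokens * per_token_in + output_tokens * per_token_out


# Nova Micro is listed at $0.04 / $0.14.
# Hypothetical workload: 10,000 input tokens and 2,000 output tokens.
cost = request_cost(0.04, 0.14, 10_000, 2_000)
print(f"${cost:.6f}")  # prints $0.000680
```

Because output tokens are usually priced higher than input tokens, the ranking of "cheapest" models can shift depending on how output-heavy a workload is.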

Model Comparison

Model | Context size | Cutoff date | I/O cost (per 1M tokens) | Max output | Latency | Speed
GPT 5.2 | 400,000 | Aug 2025 | $1.5 / $14 | 16,000 | 0.6s | 92 t/s
Claude Opus 4.5 | 200,000 | April 2025 | $5 / $25 | 64,000 | - | -
Claude Sonnet 4.5 | 200,000 | April 2025 | $3 / $15 | 160,000 | 31s | 69 t/s
Gemini 3 Pro | 10,000,000 | April 2025 | $2 / $12 | 650,000 | 30.3s | 128 t/s
Kimi K2 Thinking | 256,000 | April 2025 | $0.6 / $2.5 | 16,400 | 25.3s | 79 t/s
GPT 5.1 | 200,000 | April 2025 | $1.25 / $10 | 128,000 | - | -
GPT-5 | 400,000 | April 2025 | $1.25 / $10 | 128,000 | - | -
Claude Opus 4.1 | 200,000 | April 2025 | $15 / $75 | 32,000 | - | -
Gemini 2.5 Pro | 1,000,000 | Nov 2024 | $1.25 / $10 | 65,000 | 30s | 191 t/s
Claude 3.7 Sonnet | 200,000 | Nov 2024 | $3 / $15 | 128,000 | 0.91s | 78 t/s
DeepSeek-R1 | 128,000 | Dec 2024 | $0.55 / $2.19 | 8,000 | 4s | 24 t/s
GPT-4o | 128,000 | Oct 2023 | $2.5 / $10 | 4,096 | 0.51s | 143 t/s
Claude 3.5 Sonnet | 200,000 | Apr 2024 | $3 / $15 | 4,096 | 1.22s | 78 t/s
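
The Latency and Speed columns can be combined into a rough estimate of how long a full response takes. The sketch below assumes that latency is time to first token (TTFT) in seconds and that speed is a constant output rate in tokens per second; real throughput varies with load and prompt length. The function name `estimated_response_time` and the 500-token response length are hypothetical.

```python
def estimated_response_time(latency_s, speed_tps, output_tokens):
    """Estimate seconds until the last output token arrives.

    Assumes latency is time to first token and speed is a constant
    output rate in tokens per second.
    """
    return latency_s + output_tokens / speed_tps


# (latency in seconds, speed in tokens/sec) copied from the table above.
models = {
    "GPT-4o": (0.51, 143),
    "Claude 3.7 Sonnet": (0.91, 78),
    "DeepSeek-R1": (4.0, 24),
}

# Hypothetical response length of 500 output tokens.
for name, (latency_s, speed_tps) in models.items():
    seconds = estimated_response_time(latency_s, speed_tps, 500)
    print(f"{name}: {seconds:.2f}s")
```

For short responses the latency term dominates, while for long responses the speed term does, so the better choice between a low-latency model and a high-throughput model depends on typical output length.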