| GPT-5 Turbo | OpenAI | Closed frontier | Live web data, premium coding and reasoning | Highest cost tier; latency for web calls | API access (new) |
| GPT-5 | OpenAI | Closed | General reasoning and coding | Premium cost, though cheaper than Turbo | API access |
| GPT-5 mini | OpenAI | Closed small | Balanced quality/latency for production APIs | Not ideal for the hardest reasoning chains | API access |
| GPT-4o | OpenAI | Closed multimodal | Fast assistant UX and multimodal tasks | Cost at high scale | API access |
| GPT-4o mini | OpenAI | Closed small | Cost-sensitive high-volume automation (40% price cut as of March 31) | Lower ceiling on hard reasoning; now very economical | API access |
| o3 | OpenAI | Reasoning-first multimodal | Complex multi-step logic, now with native image/document understanding | Latency and cost per hard query; best for genuinely difficult problems | API access (multimodal) |
| o4-mini | OpenAI | Reasoning-efficient | Technical Q&A and coding workflows | Can require prompt tuning | API access |
| Claude 4.5 Sonnet | Anthropic | Closed | Long-context writing and analysis (now 2M tokens) | Conservative tone in some flows; still slower than newer alternatives | API access |
| Claude 4 Haiku | Anthropic | Closed small | Fast responses and triage (35% price cut as of March 31) | Less robust on deepest tasks; now very economical | API access |
| Claude 4 Opus | Anthropic | Closed flagship | High-stakes synthesis | Expensive at high throughput | API access |
| Gemini 2.5 Ultra | Google | Closed enterprise | Enterprise multimodal, advanced reasoning and analysis | Higher latency; requires enterprise contract | Enterprise access (new) |
| Gemini 2.5 Pro | Google | Closed | Reasoning and multimodal enterprise apps | Output quality varies with prompt style | API access |
| Gemini 2.5 Flash | Google | Closed fast | Low-latency assistant endpoints | Lower quality than premium tier | API access |
| Gemma 4 4B Instruct | Google | Open weight compact | Low-VRAM local assistants, lightweight RAG, and fast edge inference | Lower ceiling on difficult coding/reasoning than larger variants | Model docs · Weights |
| Gemma 4 12B Instruct | Google | Open weight balanced | Balanced private inference for coding, support, and multi-document workflows | Needs careful quantization/runtime setup for 12-16 GB VRAM systems | Model docs · Weights |
| Gemma 4 27B Instruct | Google | Open weight high quality | High-quality private reasoning/coding tiers without API lock-in | Hardware intensive; best with larger VRAM or multi-GPU setups | Model docs · Weights |
| Llama 4 Maverick | Meta | Open MoE multimodal | Flagship open-weight reasoning, vision + text pipelines | Full MoE serving requires strong infrastructure | Download · Scout |
| Llama 4.1 Scout | Meta | Open multimodal efficient | Edge deployments, with 25% faster inference than prior Scout versions | Lower ceiling than Maverick; best for volume-optimized deployments | Download (new) |
| Llama 4 Scout | Meta | Open multimodal efficient | Edge inference, vision-text tasks, low-cost deployments | Lower ceiling than Maverick on complex reasoning | Download |
| Llama 3.1 405B Instruct | Meta | Open weight | Top-end open deployment quality | Heavy infrastructure requirements | Download · 70B · 8B |
| Llama 3.1 70B Instruct | Meta | Open weight | Strong self-hosted quality/cost balance | Needs good inference stack | Download · 405B · 8B |
| Llama 3.1 8B Instruct | Meta | Open weight small | Edge and low-cost deployments | Lower performance on complex tasks | Download · 70B · 405B |
| Llama 3.2 11B Vision | Meta | Open multimodal | Private vision-text pipelines | Requires evals for OCR-heavy cases | Download · 90B |
| Llama 3.2 90B Vision | Meta | Open multimodal | High-capacity multimodal inference | Infrastructure complexity | Download · 11B |
| Llama 3.3 70B Instruct | Meta | Open weight | Efficient self-hosted quality; matches 3.1 405B at much lower cost | Needs good inference stack for throughput | Download |
| Mistral Large 2 | Mistral AI | Closed | High-quality enterprise assistants | Smaller ecosystem vs hyperscalers | API access |
| Mistral Medium | Mistral AI | Closed | Balanced production usage | Benchmark carefully vs peers | API access |
| Mistral Small | Mistral AI | Closed small | Fast cost-efficient chat | Limited depth on advanced reasoning | API access |
| Mixtral 8x22B | Mistral AI | Open MoE | Strong open-weight generation quality | Operational complexity | Download · 8x7B |
| Mixtral 8x7B | Mistral AI | Open MoE | Efficient self-hosting | Can trail latest closed models | Download · 8x22B |
| Codestral | Mistral AI | Code-specialized | Code generation and completion | Narrower general language strength | Download |
| Qwen3 32B Instruct | Alibaba | Open weight | Strong open-weight multilingual assistant quality | Regional compliance and policy review required | Model hub |
| Qwen3.6-35B-A3B | Alibaba | Open weight MoE multimodal | MoE (35B total, 3B active); hybrid thinking mode; 262K native context (up to ~1M with YaRN); multimodal (text, image, video); agentic coding with repo-level reasoning; AIME 2026: 92.7, GPQA Diamond: 86.0, SWE-bench: 73.4 | Regional compliance review required; thinking mode adds latency for simple tasks | Download · FP8 |
| QwQ-32B | Alibaba | Reasoning open | Reasoning-focused private usage | Evals needed for stability | Download |
| DeepSeek V3 | DeepSeek | Open/available | General reasoning and coding value | Governance review in enterprise | Download |
| DeepSeek R1.5 | DeepSeek | Reasoning-focused | Improved analytical reasoning and problem-solving (March 2026 release) | Latency on complex outputs; governance review required | Download (new) |
| DeepSeek R1 | DeepSeek | Reasoning-focused | Difficult multi-step reasoning tasks | Latency on complex outputs | Download |
| DeepSeek Coder V3 | DeepSeek | Code-specialized | Developer assistants and code review | General writing less strong | Model hub |
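
The closed rows above are all consumed through a vendor HTTP API. As a minimal sketch, here is what a call looks like with the official `openai` Python SDK; the model identifier is copied from the table and may not match the exact string the API expects, so verify it against the provider's published model list.

```python
# Minimal sketch: querying a closed model through the OpenAI API.
# Assumes OPENAI_API_KEY is set in the environment. The model id
# "gpt-5-mini" is copied from the table above and may differ from
# the identifier the API actually expects.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-5-mini",  # assumed id; check the provider's model list
    messages=[
        {"role": "system", "content": "You are a concise technical assistant."},
        {"role": "user", "content": "Summarize the trade-offs of MoE serving."},
    ],
)
print(response.choices[0].message.content)
```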
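
Several of the non-OpenAI vendors (DeepSeek among them) expose OpenAI-compatible endpoints, so the same SDK can often be repointed by overriding `base_url`. A hedged sketch follows; the endpoint URL and model id are assumptions to confirm against the vendor's documentation.

```python
# Sketch: reusing the OpenAI SDK against an OpenAI-compatible endpoint.
# The base_url and model id are assumptions; confirm both in the vendor docs.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_KEY",          # vendor-issued key, not an OpenAI key
    base_url="https://api.deepseek.com",  # assumed OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="deepseek-reasoner",  # assumed id for the reasoning-focused tier
    messages=[{"role": "user", "content": "Outline a multi-step scheduling proof."}],
)
print(response.choices[0].message.content)
```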
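
For the open-weight rows (Gemma, Llama, Mixtral, Qwen, DeepSeek), one common self-hosting path is vLLM, which pulls checkpoints from the Hugging Face hub. Below is a minimal offline-inference sketch, assuming a GPU with enough VRAM for the chosen checkpoint; gated repos such as Llama also require accepting the license on the hub and an access token.

```python
# Sketch: self-hosting an open-weight model with vLLM (offline batch inference).
# Gated repos (e.g. meta-llama/*) need hub access approved and HF_TOKEN set;
# pick a checkpoint that fits your GPU's VRAM.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.2, max_tokens=256)

outputs = llm.generate(
    ["Explain the difference between dense and MoE transformer serving."],
    params,
)
print(outputs[0].outputs[0].text)
```

For production traffic, vLLM can also run as an OpenAI-compatible server (`vllm serve <model>`), which lets API sketches like the ones above run against self-hosted weights with only a `base_url` change.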