MiniMax M2.5: 80.2% SWE-Bench Verified at 1/10 the Cost of Opus
MiniMax M2.5 was released on February 12, 2026 as a 229B-total / 10B-active-parameter MoE model with 256 experts (8 active per token) and a 204,800-token context window. It scores 80.2% on SWE-bench Verified using Claude Code scaffolding, placing it within a point of Claude Opus 4.5 at roughly one-tenth the output cost. MiniMax's new Forge RL framework delivered a 40x training speedup, and the model is 37% faster than M2.1 on SWE-bench tasks.
- M2.5 scores 80.2% on SWE-bench Verified, close to Claude Opus 4.5 at 80.9%.
- Output costs are 1/10 to 1/20 of Opus, Gemini 3 Pro, and GPT-5.
- 37% faster than M2.1 on SWE-bench: 22.8 min vs 31.3 min average per task.
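The 37% speed figure is worth a quick sanity check, since the same two numbers support two different readings. A minimal sketch using only the per-task times quoted above:

```python
# Average minutes per SWE-bench task, from the article.
m21_min = 31.3  # MiniMax M2.1
m25_min = 22.8  # MiniMax M2.5

# "37% faster" reads as a throughput-style ratio: M2.1 takes ~37% longer.
speedup = m21_min / m25_min - 1               # ~0.373

# Read instead as a reduction in wall-clock time, the saving is ~27%.
time_saved = (m21_min - m25_min) / m21_min    # ~0.272

print(f"throughput speedup: {speedup:.1%}")
print(f"wall-clock reduction: {time_saved:.1%}")
```

Both readings are consistent with the quoted times; the headline claim matches the throughput-style ratio.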
Architecture and training
MiniMax M2.5 uses a Mixture-of-Experts architecture with 229B total parameters and only 10B active per token. The model has 256 experts, activating 8 per token, which keeps inference cost low while maintaining a large knowledge capacity. The context window is 204,800 tokens.
Training leveraged over 200,000 real-world reinforcement learning environments across 10+ programming languages. The new Forge RL framework provides a 40x training speedup over the previous generation, enabling rapid iteration on coding-agent capabilities.
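The MoE arithmetic above can be made concrete. A back-of-envelope sketch from the published figures (229B total, 10B active, 8 of 256 experts per token); the fractions below are derived, not official numbers:

```python
# Published M2.5 architecture figures.
TOTAL_PARAMS = 229e9
ACTIVE_PARAMS = 10e9
NUM_EXPERTS = 256
ACTIVE_EXPERTS = 8

# Fraction of the expert pool a single token is routed through.
expert_fraction = ACTIVE_EXPERTS / NUM_EXPERTS    # 8/256 = 3.125%

# Fraction of all weights read per token. This is higher than the expert
# fraction because attention and shared layers are always active.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS    # ~4.4%

print(f"experts used per token: {expert_fraction:.2%}")
print(f"weights active per token: {active_fraction:.2%}")
```

The gap between 3.1% and 4.4% is the dense (non-expert) share of the network, which is why per-token inference cost tracks the 10B active figure rather than the 229B total.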

Official image
The official MiniMax M2.5 release page is unusually complete: benchmark rows, speed claims, cost framing, and links to the API and Coding Plan routes all appear together, making it the best single source-backed visual for M2.5's performance, positioning, and access routes.
Source: Official MiniMax M2.5 release.
| Specification | Value |
|---|---|
| Total parameters | 229B |
| Active parameters | 10B |
| Experts | 256 (8 active per token) |
| Context window | 204,800 tokens |
| RL training environments | 200,000+ real-world environments |
| Languages | 10+ programming languages |
| Release date | February 12, 2026 |
Benchmark results
MiniMax M2.5 delivers strong results across the major coding and agent benchmarks. Its 80.2% on SWE-bench Verified places it just behind Claude Opus 4.5 (80.9%) and ahead of Qwen 3.6-Plus (78.8%), GLM-5 (77.8%), and Kimi K2.5 (76.8%).
| Benchmark | Score | Notes |
|---|---|---|
| SWE-bench Verified | 80.2% | Using Claude Code scaffolding |
| Multi-SWE-Bench | 51.3% | Multi-language software engineering |
| BrowseComp | 76.3% | Web browsing and information retrieval |
| GDPval-MM | 59.0% | General-domain performance validation |
Figure: public SWE-bench Verified scores for the current generation of coding models. Sources: the official Anthropic evaluation (Claude Opus 4.5), the official MiniMax M2.5 release, the official Qwen 3.6 release, the official Z.AI evaluation (GLM-5), and the official Kimi K2.5 technical blog.
Cost efficiency
MiniMax M2.5 is positioned as a cost-efficient alternative to frontier models. At current API pricing, M2.5 output costs are roughly 1/10 to 1/20 those of Claude Opus, Gemini 3 Pro, and GPT-5. The Lightning tier doubles throughput to 100 TPS at double the per-token price.
Figure: output cost per million tokens relative to the M2.5 Standard baseline of $1.20 / M output tokens. M2.5 Lightning is $2.40 / M at 100 TPS; Claude Opus, Gemini 3 Pro, and GPT-5 are shown at approximate relative output cost. Source: Official MiniMax pricing.
Pricing tiers
MiniMax exposes M2.5 through both pay-as-you-go (PAYG) and Token Plan routes. The PAYG pricing page is the safest place to cite direct token rates, while Token Plan remains the public package route for bundled access.
- Token Plan subscriptions bundle access with tool integrations for a fixed monthly fee.
- Standard tier is the best value for batch and background coding tasks.
- Lightning tier is recommended for interactive coding sessions where latency matters.
- M2.5 is 37% faster than M2.1 on SWE-bench tasks (22.8 min vs 31.3 min average per task).
| Tier | Throughput | Input | Output |
|---|---|---|---|
| M2.5 Standard | 50 TPS | $0.30 / M | $1.20 / M |
| M2.5 Lightning | 100 TPS | $0.60 / M | $2.40 / M |
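The tier table makes job-cost estimates straightforward. A minimal sketch using the published per-token rates; the example token counts are made up for illustration:

```python
# USD per million tokens, (input, output), from the published tier table.
RATES = {
    "standard":  (0.30, 1.20),   # 50 TPS
    "lightning": (0.60, 2.40),   # 100 TPS
}

def job_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one job on the given tier."""
    in_rate, out_rate = RATES[tier]
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# Hypothetical coding-agent run: 2M input tokens, 0.5M output tokens.
for tier in RATES:
    print(f"{tier}: ${job_cost(tier, 2_000_000, 500_000):.2f}")
    # standard: $1.20, lightning: $2.40
```

Because Lightning doubles both rates, any job costs exactly twice as much on Lightning as on Standard; the choice reduces to whether 100 TPS latency is worth 2x.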
Try MiniMax M2.5 through Token Plan for the best value
Token Plan bundles API access with tool integrations at a predictable monthly cost. Start with the Standard tier and upgrade to Lightning if latency becomes a bottleneck.
Frequently asked questions
How does MiniMax M2.5 compare to Claude Opus 4.5 on SWE-bench?
M2.5 scores 80.2% on SWE-bench Verified, close to Claude Opus 4.5 at 80.9%. The difference is less than one percentage point, but M2.5 output costs are roughly 1/10 to 1/12 of Opus.
What is the Forge RL framework?
Forge RL is MiniMax's custom reinforcement learning framework that provides a 40x training speedup over the previous generation. It enabled M2.5 to be trained on over 200,000 real-world RL environments across 10+ programming languages.
Should I use M2.5 Standard or M2.5-Lightning?
Use Standard for batch tasks, background coding, and cost-sensitive workflows. Use Lightning for interactive sessions and latency-sensitive tasks where the 100 TPS throughput matters.