Model Guide · 9 min read · Reviewed Apr 20, 2026

MiniMax M2.5: 80.2% SWE-Bench Verified at 1/10 the Cost of Opus

MiniMax M2.5 was released on February 12, 2026 as a 229B total / 10B active parameter MoE model with 256 experts (8 active per token) and a 204,800-token context window. It achieves 80.2% on SWE-bench Verified using Claude Code scaffolding, placing it competitively with Claude Opus 4.5 at roughly 1/10 to 1/20 the output cost. A new Forge RL framework delivers 40x training speedup, and the model is 37% faster than M2.1 on SWE-bench tasks.

Published Apr 19, 2026 · Updated Apr 20, 2026
  • M2.5 scores 80.2% on SWE-bench Verified, close to Claude Opus 4.5 at 80.9%.
  • Output costs are 1/10 to 1/20 of Opus, Gemini 3 Pro, and GPT-5.
  • 37% faster than M2.1 on SWE-bench: 22.8 min vs 31.3 min average per task.
Quick note: This guide is based on public docs and release pages, but you should still verify current pricing, limits, supported tools, and region-specific billing on the official source before you pay, subscribe, or integrate.

Architecture and training

MiniMax M2.5 uses a Mixture-of-Experts architecture with 229B total parameters and only 10B active per token. The model has 256 experts, activating 8 per token, which keeps inference cost low while maintaining a large knowledge capacity. The context window is 204,800 tokens.
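The routing pattern described above (256 experts, top-8 active per token) can be sketched in a few lines. This is a toy illustration of top-k expert routing in general, not MiniMax's implementation; the router weights and dimensions here are made up, and only the 256-expert / top-8 figures come from the release notes.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, n_experts, top_k = 64, 256, 8  # 256 experts, 8 active, per the published config

# Hypothetical learned router: a linear map from token activations to expert logits.
router_w = rng.standard_normal((d_model, n_experts)) / np.sqrt(d_model)

def route(token_act: np.ndarray):
    """Select the top-k experts for one token and softmax-normalize their weights."""
    logits = token_act @ router_w                # shape (n_experts,)
    top = np.argsort(logits)[-top_k:]            # indices of the 8 highest-scoring experts
    w = np.exp(logits[top] - logits[top].max())  # numerically stable softmax over the top-k
    return top, w / w.sum()

experts, weights = route(rng.standard_normal(d_model))
print(len(experts))  # 8 experts fire per token; the other 248 stay idle
```

Because only the selected experts run per token, compute scales with the 10B active parameters rather than the 229B total, which is the mechanism behind the "low inference cost, large capacity" framing.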

Training leveraged over 200,000 real-world reinforcement learning environments across 10+ programming languages. The new Forge RL framework provides a 40x training speedup over the previous generation, enabling rapid iteration on coding-agent capabilities.

[Image: official MiniMax M2.5 release banner]

MiniMax M2.5 is easiest to explain from the official release page, the text-generation pricing page, and the Token Plan overview. Source: Official MiniMax M2.5 release.

MiniMax M2.5 publishes its strongest benchmark and cost narrative on the release page

The official release page is unusually complete: benchmark rows, speed claims, cost framing, and links to API and Coding Plan routes all appear together.

  • Best single visual for M2.5 performance and positioning.
  • Useful when you need one source-backed image that already shows both research and access routes.

Source: Official MiniMax M2.5 release.

MiniMax M2.5 architecture specifications

| Specification | Value |
| --- | --- |
| Total parameters | 229B |
| Active parameters | 10B |
| Experts | 256 (8 active per token) |
| Context window | 204,800 tokens |
| Training environments | 200,000+ real-world RL environments |
| Languages | 10+ programming languages |
| Release date | February 12, 2026 |

Benchmark results

MiniMax M2.5 delivers strong results across the major coding and agent benchmarks. Its 80.2% on SWE-bench Verified places it just behind Claude Opus 4.5 (80.9%) and ahead of Qwen 3.6-Plus (78.8%), GLM-5 (77.8%), and Kimi K2.5 (76.8%).

Full MiniMax M2.5 benchmark results

| Benchmark | Score | Notes |
| --- | --- | --- |
| SWE-bench Verified | 80.2% | Using Claude Code scaffolding |
| Multi-SWE-Bench | 51.3% | Multi-language software engineering |
| BrowseComp | 76.3% | Web browsing and information retrieval |
| GDPval-MM | 59.0% | General-domain performance validation |
SWE-bench Verified

Public SWE-bench Verified scores for the current generation of coding models.

| Model | Score | Source |
| --- | --- | --- |
| Claude Opus 4.5 | 80.9% | Official Anthropic evaluation |
| MiniMax M2.5 | 80.2% | Official MiniMax M2.5 release |
| Qwen 3.6-Plus | 78.8% | Official Qwen 3.6 release |
| GLM-5 | 77.8% | Official Z.AI evaluation |
| Kimi K2.5 | 76.8% | Official Kimi K2.5 technical blog |

Source: Official MiniMax M2.5 release.

Cost efficiency

MiniMax M2.5 is positioned as a cost-efficient alternative to frontier models. At current API pricing, M2.5 output costs are roughly 1/10 to 1/20 those of Claude Opus, Gemini 3 Pro, and GPT-5. The Lightning tier doubles throughput to 100 TPS at a modest price premium.

Relative output cost comparison (lower is better)

Output cost per million tokens relative to the MiniMax M2.5 standard tier.

| Model | Relative cost | Notes |
| --- | --- | --- |
| MiniMax M2.5 | 1x | Baseline: $1.20 / M output tokens |
| MiniMax M2.5-Lightning | 2x | $2.40 / M output tokens at 100 TPS |
| Claude Opus 4.5 | ~12x | Approximate relative output cost |
| GPT-5 | ~10x | Approximate relative output cost |
| Gemini 3 Pro | ~10x | Approximate relative output cost |

Source: Official MiniMax pricing.

BuyGLM shows package prices in USD. When a source page is published in CNY, the displayed value uses a fixed 1 USD = 8 CNY conversion and should still be checked against the live vendor page before payment.

Pricing tiers

MiniMax exposes M2.5 through both PAYG and Token Plan routes. The PAYG pricing page is the safest place to cite direct token rates, while Token Plan remains the public package route for bundled access.

  • Token Plan subscriptions bundle access with tool integrations for a fixed monthly fee.
  • Standard tier is the best value for batch and background coding tasks.
  • Lightning tier is recommended for interactive coding sessions where latency matters.
  • M2.5 is 37% faster than M2.1 on SWE-bench tasks (22.8 min vs 31.3 min average per task).
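The "37% faster" bullet can be checked directly from the published per-task times:

```python
# Average minutes per SWE-bench task, per the release notes.
m21_minutes, m25_minutes = 31.3, 22.8

speedup = m21_minutes / m25_minutes - 1     # how much faster M2.5 is
time_saved = 1 - m25_minutes / m21_minutes  # fraction of wall-clock time saved
print(f"{speedup:.0%} faster, {time_saved:.0%} less time per task")  # 37% faster, 27% less time per task
```

Note the two framings differ: a 37% speedup corresponds to about 27% less wall-clock time per task, not 37% less.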
MiniMax M2.5 API pricing (USD per million tokens)

| Tier | Throughput | Input | Output |
| --- | --- | --- | --- |
| M2.5 Standard | 50 TPS | $0.30 / M | $1.20 / M |
| M2.5-Lightning | 100 TPS | $0.60 / M | $2.40 / M |
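A quick way to compare the tiers is to price out a single agent task. Only the $/M rates and TPS figures come from the table above; the per-task token counts are hypothetical, and the time estimate covers decode throughput only (it ignores prefill and tool-call latency).

```python
# Rough per-task cost/time model using the pricing table above.
TIERS = {
    "M2.5 Standard":  {"tps": 50,  "in_usd_per_m": 0.30, "out_usd_per_m": 1.20},
    "M2.5-Lightning": {"tps": 100, "in_usd_per_m": 0.60, "out_usd_per_m": 2.40},
}

def task_cost(tier: str, input_tokens: int, output_tokens: int):
    """Return (USD cost, decode seconds) for one task on the given tier."""
    t = TIERS[tier]
    usd = (input_tokens / 1e6) * t["in_usd_per_m"] + (output_tokens / 1e6) * t["out_usd_per_m"]
    decode_seconds = output_tokens / t["tps"]  # throughput-limited generation time
    return usd, decode_seconds

# Hypothetical budget for one long agent task: 200K input, 40K output tokens.
for name in TIERS:
    usd, secs = task_cost(name, 200_000, 40_000)
    print(f"{name}: ${usd:.2f}, ~{secs / 60:.0f} min decoding")
```

Under these assumptions Lightning halves decode time for roughly double the token cost, which is the trade the tier guidance above describes: pay the premium when a human is waiting on the output.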

Try MiniMax M2.5 through Token Plan for the best value

Token Plan bundles API access with tool integrations at a predictable monthly cost. Start with the Standard tier and upgrade to Lightning if latency becomes a bottleneck.

Sources and official links

Frequently asked questions

How does MiniMax M2.5 compare to Claude Opus 4.5 on SWE-bench?

M2.5 scores 80.2% on SWE-bench Verified, close to Claude Opus 4.5 at 80.9%. The difference is less than one percentage point, but M2.5 output costs are roughly 1/10 to 1/12 of Opus.

What is the Forge RL framework?

Forge RL is MiniMax's custom reinforcement learning framework that provides a 40x training speedup over the previous generation. It enabled M2.5 to be trained on over 200,000 real-world RL environments across 10+ programming languages.

Should I use M2.5 Standard or M2.5-Lightning?

Use Standard for batch tasks, background coding, and cost-sensitive workflows. Use Lightning for interactive sessions and latency-sensitive tasks where the 100 TPS throughput matters.