
MiniMax M2.5 Review: The £0.11 Model That Nearly Matches Claude Opus 4.6 (2026)
Note: This review is based on extensive research of publicly available information, official documentation, published benchmarks, community analyses, and expert reviews. We have compiled insights from multiple sources to provide a comprehensive overview.
Quick Answer:
MiniMax M2.5 is a 230 billion parameter Mixture-of-Experts model that activates only 10B parameters per token - just 4% of its total size. Released on 11 February 2026 with open weights, it scores 80.2% on SWE-Bench Verified (within 0.6 percentage points of Claude Opus 4.6) at approximately £0.11/million input tokens ($0.15) and £0.88/million output tokens ($1.20). The Lightning variant doubles throughput to 100 tokens/second. It is, by a significant margin, the cheapest model to deliver near-frontier coding performance.
On 11 February 2026, a Shanghai-based startup backed by the makers of Genshin Impact dropped an open-weight model that scored within 0.6 percentage points of Claude Opus 4.6 on the industry's most respected coding benchmark - at roughly 1/20th the cost.
MiniMax M2.5 is not trying to be the best model at everything. It is trying to make frontier-quality coding assistance so cheap that the cost becomes irrelevant. Based on published benchmarks and pricing, it appears to be succeeding.
What is MiniMax M2.5?
MiniMax M2.5 is a large language model built on a Mixture-of-Experts (MoE) architecture with 230 billion total parameters, of which only 10 billion are activated per forward pass. This sparse activation pattern - just ~4% of the model's total capacity engaged per token - is the key to its extraordinary cost efficiency.
The model was released on 11 February 2026 in two variants: M2.5 Standard (50 tokens/second) and M2.5 Lightning (100 tokens/second). Both are available as open weights on HuggingFace and Ollama, with commercial API access through MiniMax's own platform and third-party providers including OpenRouter and NVIDIA NIM.
M2.5 was trained using MiniMax's proprietary Forge RL framework - a purpose-built reinforcement learning system that trains AI agents across real-world environments rather than relying solely on the RLHF methods used by most competitors. The model supports 13 programming languages (Go, C, C++, TypeScript, Rust, Kotlin, Python, Java, JavaScript, PHP, Lua, Dart, Ruby) and was trained across 200,000+ real-world development environments.
A notable emergent behaviour from this training: before writing any code, M2.5 actively decomposes and plans features, structure, and UI design from the perspective of an experienced software architect. MiniMax calls this the "spec-writing tendency" - an architect mindset that plans before it builds.
Who is MiniMax?
MiniMax is a Shanghai-based AI startup founded in December 2021 by Yan Junjie, former Vice President at SenseTime Group. Yan holds a doctorate in AI from the Chinese Academy of Sciences and conducted post-doctoral research at Tsinghua University. He became a billionaire at age 36 with an estimated net worth of £2.35 billion ($3.2 billion) following the company's IPO.
The company has raised approximately £624 million ($850 million) in total funding across four rounds. Investors include Alibaba Group (which led a £440 million / $600 million round in March 2024), Tencent, MiHoYo (the studio behind Genshin Impact), and Hillhouse Investment. NVIDIA CEO Jensen Huang has publicly described MiniMax as "world-class."
MiniMax listed on the Hong Kong Stock Exchange on 9 January 2026, raising £454 million ($619 million). Shares surged 109% on their first day of trading. The company's market capitalisation subsequently topped £8.44 billion ($11.5 billion).
Revenue for the nine months ending September 2025 reached £39.2 million ($53.4 million), up 174% year-on-year. However, the company recorded a net loss of £376 million ($512 million) in the same period, reflecting the high-investment phase typical of frontier AI development.
Architecture: 230B Parameters, 10B Active
The Mixture-of-Experts architecture is what makes M2.5's economics possible. Rather than running every token through all 230 billion parameters, the model routes each token to a subset of specialised "expert" modules, activating only ~10 billion parameters per forward pass. The result is a model that has the knowledge capacity of a 230B-parameter model but the inference cost of a 10B-parameter model.
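The sparse-activation idea is easy to see in miniature. The sketch below is a toy top-k gating loop, not M2.5's actual router: the expert count, the linear gate, and the top-k value are illustrative assumptions. The principle it demonstrates is the same, though - compute scales with the experts you select per token, not the experts you have.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(token, experts, gate_weights, top_k=2):
    """Route one token to its top-k experts and mix their outputs.

    Only top_k experts execute, so per-token compute scales with
    top_k rather than the total expert count -- the principle
    behind "230B total parameters, ~10B active".
    """
    # Gate: score each expert for this token (illustrative linear gate).
    scores = [sum(t * w for t, w in zip(token, gw)) for gw in gate_weights]
    probs = softmax(scores)
    # Keep only the top-k experts; renormalise their gate weights.
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in top)
    # Mix the chosen experts' outputs; the rest never run.
    out = [0.0] * len(token)
    for i in top:
        expert_out = experts[i](token)
        for d in range(len(token)):
            out[d] += (probs[i] / norm) * expert_out[d]
    return out, top

# Toy demo: 8 experts, each a simple scaling function; only 2 run per token.
experts = [lambda t, s=s: [x * s for x in t] for s in range(1, 9)]
gate_weights = [[0.1 * s, -0.05 * s] for s in range(1, 9)]
out, active = moe_forward([1.0, 0.5], experts, gate_weights, top_k=2)
print(active)  # [7, 6] -- only 2 of the 8 experts executed
```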
| Specification | Details |
|---|---|
| Total Parameters | 230 billion |
| Active Parameters per Token | ~10 billion (~4%) |
| Architecture | Transformer-based Mixture-of-Experts |
| Context Window (Input) | 204,800 tokens (architecture supports up to 1M) |
| Max Output | 128,000 tokens |
| Training Framework | Forge RL (proprietary reinforcement learning) |
| Languages Supported | 13 (Go, C, C++, TypeScript, Rust, Kotlin, Python, Java, JavaScript, PHP, Lua, Dart, Ruby) |
| Training Environments | 200,000+ real-world development environments |
| Model Size on Disk | ~230 GB |
| RL Stabilisation | CISPO (Clipping Importance Sampling Policy Optimisation) |
The model also supports interleaved thinking via <think>...</think> tags, enabling reasoning-mode requests where the model shows its working before producing a final answer - similar to Claude's extended thinking and DeepSeek's chain-of-thought reasoning.
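Client code typically needs to strip that reasoning before showing a final answer. A minimal sketch, assuming the reasoning arrives inline as <think>...</think> in the completion text (some providers expose it as a separate response field instead):

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(completion: str):
    """Separate interleaved <think> reasoning from the final answer.

    Assumes the provider returns reasoning inline in the completion
    text; adjust if your API surfaces it as a dedicated field.
    """
    reasoning = [m.strip() for m in THINK_RE.findall(completion)]
    answer = THINK_RE.sub("", completion).strip()
    return reasoning, answer

raw = "<think>The user wants a sum helper.</think>def add(a, b):\n    return a + b"
reasoning, answer = split_reasoning(raw)
print(reasoning)  # ['The user wants a sum helper.']
print(answer.splitlines()[0])  # def add(a, b):
```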
Benchmark Performance
The headline numbers are what have generated the most attention. M2.5's coding benchmarks put it within striking distance of the most expensive proprietary models at a fraction of the cost. All SWE-Bench evaluations were run using Claude Code as the scaffolding, with results averaged over 4 runs - the same methodology used for Claude's own evaluations.
Coding Benchmarks
| Benchmark | M2.5 | Claude Opus 4.6 | Context |
|---|---|---|---|
| SWE-Bench Verified | 80.2% | 80.8% | Real GitHub PRs - bug fixes and feature implementations |
| Multi-SWE-Bench | 51.3% (#1) | 50.3% | Cross-repository coding tasks |
| SWE-Bench Pro | 55.4% | - | Advanced software engineering tasks |
| BFCL Multi-Turn (Tool Calling) | 76.8% | 63.3% | Function/tool calling orchestration (+13.5pp lead) |
| BrowseComp | 76.3% | - | Web navigation and information processing |
| HumanEval | 89.6% | ~90% | Python function generation with unit tests |
The standout result is the BFCL Multi-Turn Tool Calling score of 76.8%, which crushes Claude Opus 4.6's 63.3% by 13.5 percentage points. This is potentially the most significant finding for agentic use cases, where the model must orchestrate sequences of function calls to complete complex tasks.
General Reasoning (Weaker Area)
| Benchmark | M2.5 Score | Assessment |
|---|---|---|
| MMLU | 85% | Solid but not frontier-leading |
| AIME 2025 (Maths) | 45% | Notably below frontier reasoning models |
| SimpleQA | 44% | Below frontier for factual QA |
These general reasoning scores make M2.5's specialisation clear. It was explicitly optimised for coding and agentic tasks, not broad knowledge work. For general-purpose AI assistance, Claude Opus 4.6 and GPT-5.x remain significantly stronger.
M2.5 Lightning: The Speed Variant
M2.5 Lightning is not a separate model - it is an optimised serving configuration of the same 230B MoE architecture, designed specifically for latency-sensitive workloads.
| Feature | M2.5 Standard | M2.5 Lightning |
|---|---|---|
| Throughput | 50 tokens/second | 100 tokens/second |
| Input Pricing (GBP/1M) | £0.11 ($0.15) | £0.22 ($0.30) |
| Output Pricing (GBP/1M) | £0.88 ($1.20) | £1.76 ($2.40) |
| Target Use Case | Cost-optimised batch work | Real-time, latency-sensitive applications |
At 100 tokens/second, running M2.5 Lightning continuously for one hour generates 360,000 tokens. At £1.76 per million output tokens, that costs roughly £0.63 per hour ($0.86). MiniMax markets this as "about $1 per hour" - the argument being that when an autonomous coding agent costs less than a cup of coffee per hour, you stop thinking about cost and start thinking about what to build.
The 100 tokens/second throughput is approximately double the speed of most frontier models, making Lightning competitive for interactive coding assistants and real-time agent loops where user-perceived latency directly affects adoption.
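The hourly-cost arithmetic checks out in a few lines, using the published per-million GBP rates quoted above:

```python
def hourly_cost_gbp(tokens_per_second: float, price_per_million_gbp: float) -> float:
    """Cost of generating output continuously for one hour."""
    tokens_per_hour = tokens_per_second * 3600  # 100 tok/s -> 360,000 tokens
    return tokens_per_hour / 1_000_000 * price_per_million_gbp

# M2.5 Lightning: 100 tok/s at £1.76 per million output tokens.
print(round(hourly_cost_gbp(100, 1.76), 2))  # 0.63
# M2.5 Standard: 50 tok/s at £0.88 per million output tokens.
print(round(hourly_cost_gbp(50, 0.88), 2))  # 0.16
```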
Pricing: The 95% Cheaper Claim
The headline claim - that M2.5 is 95% cheaper than Claude Opus 4.6 - holds specifically for coding tasks where the two models perform comparably. Here is the maths.
| Metric | MiniMax M2.5 | Claude Opus 4.6 | Difference |
|---|---|---|---|
| Input (per 1M tokens) | £0.11 ($0.15) | £3.65 ($5.00) | 33x cheaper |
| Output (per 1M tokens) | £0.88 ($1.20) | £18.25 ($25.00) | ~21x cheaper |
| Cost per SWE-Bench task | ~£0.11 (~$0.15) | ~£2.20 (~$3.00) | ~20x cheaper |
| Cost per hour (100 tok/s) | ~£0.63 (~$0.86) | ~£14.68 (~$20.00) | ~23x cheaper |
Important Caveat:
The 95% cheaper claim holds specifically for coding and agentic tasks where M2.5 performs within striking distance of Opus. For general reasoning (AIME: 45%), factual QA (SimpleQA: 44%), and creative writing, the performance gap is significantly wider, and comparing prices without comparing quality on those tasks would be misleading.
There is also a hidden cost factor: verbosity. During SWE-Bench evaluation, M2.5 generated 56 million tokens compared to an average of 14 million tokens for other models. It is roughly 4x more verbose, which partially offsets its per-token cost advantage in practice. A task that uses 1 million tokens on Claude might use 4 million on M2.5 - still cheaper overall, but not by the full 20x the per-token rate suggests.
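The verbosity adjustment is simple to work through with the figures above (the per-token output rates and the ~4x token multiple observed on SWE-Bench):

```python
def effective_saving(cheap_rate: float, expensive_rate: float,
                     verbosity_multiple: float = 1.0) -> float:
    """How much cheaper the budget model is once extra tokens are counted.

    A model that is 4x more verbose pays its per-token rate on 4x the
    tokens, shrinking the headline per-token advantage accordingly.
    """
    return expensive_rate / (cheap_rate * verbosity_multiple)

# Headline per-token gap on output: £18.25 vs £0.88 per million.
print(round(effective_saving(0.88, 18.25), 1))  # 20.7
# Adjusted for M2.5's ~4x verbosity on SWE-Bench.
print(round(effective_saving(0.88, 18.25, verbosity_multiple=4.0), 1))  # 5.2
```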
How It Compares to the Competition
| Model | SWE-Bench | Output (GBP/1M) | Open Weights | Best For |
|---|---|---|---|---|
| MiniMax M2.5 | 80.2% | £0.88 | Yes (Modified MIT) | Budget coding, tool-calling agents |
| Claude Opus 4.6 | 80.8% | £18.25 | No | Complex reasoning, reliability |
| GPT-5.3 Codex | 84.2% | ~£7.30 (est.) | No | Speed, terminal workflows |
| DeepSeek V3 | ~74% | £0.80 | Yes (MIT) | General open-source coding |
The most meaningful comparison is the cost per SWE-Bench task. M2.5 completed the evaluation in an average of 22.8 minutes per task - nearly identical to Claude Opus 4.6's 22.9 minutes. The performance is comparable; the cost is not.
Where M2.5 genuinely leads is tool calling. The BFCL Multi-Turn score of 76.8% versus Claude Opus 4.6's 63.3% is a 13.5 percentage point gap - the widest advantage M2.5 holds over any major competitor on any benchmark. For developers building agent pipelines that rely heavily on function calling, this is the most relevant number in the comparison.
Open Weights and Self-Hosting
M2.5 is available as open weights under a Modified MIT licence - not the standard MIT licence, so the modifications should be reviewed carefully for commercial use restrictions.
Distribution channels:
- HuggingFace: Full weights at `MiniMaxAI/MiniMax-M2.5`
- Ollama: `ollama pull minimax-m2.5` for local deployment
- GGUF Quantisations: Available from Unsloth at `unsloth/MiniMax-M2.5-GGUF` for reduced memory requirements
- GitHub: Source code and documentation at `MiniMax-AI/MiniMax-M2.5`
- OpenRouter: API access with reasoning support
- NVIDIA NIM: Listed in NVIDIA's NIM catalogue
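Since OpenRouter exposes an OpenAI-compatible chat-completions endpoint, calling M2.5 through it reduces to building the standard request shape. The sketch below only constructs the JSON payload - the model slug shown is an assumption for illustration (check OpenRouter's catalogue for the actual identifier), and authentication and the HTTP call itself are omitted:

```python
import json

# Assumed model slug for illustration; verify against the
# OpenRouter catalogue before use.
MODEL_ID = "minimax/minimax-m2.5"

def build_chat_request(prompt: str, max_tokens: int = 1024) -> str:
    """Build an OpenAI-compatible chat-completions payload as JSON.

    Sending it requires an HTTP POST with an Authorization header,
    which is deliberately left out of this sketch.
    """
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return json.dumps(payload)

body = json.loads(build_chat_request("Refactor this function to be pure."))
print(body["model"])  # minimax/minimax-m2.5
```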
Self-hosting requires significant infrastructure. The full model is approximately 230 GB on disk. MiniMax recommends KTransformers for self-hosting. The GGUF quantisations from Unsloth are the most practical option for teams without enterprise-grade GPU clusters.
Limitations and Weaknesses
M2.5 has clear limitations that should factor into any adoption decision.
General Reasoning Gap
AIME 2025 at 45% and SimpleQA at 44% are significantly below frontier models. M2.5 was optimised for coding and agentic tasks, not broad knowledge work. If your workflow includes drafting documents, answering research questions, or reasoning about non-technical domains, M2.5 is not a suitable replacement for Claude or GPT.
No Multimodal Support
M2.5 is text-only. It cannot process images, audio, or video. This is a significant limitation compared to Claude Opus 4.6 and GPT-5.x, which support multimodal inputs including screenshots, diagrams, and visual debugging.
Verbosity Problem
During SWE-Bench evaluation, M2.5 generated 56 million tokens compared to an average of 14 million for other models - roughly 4x more verbose. This partially offsets its per-token cost advantage and means real-world savings are closer to 5-10x rather than the headline 20x.
Latency Concerns
Time to First Token (TTFT) is 2.09 seconds - nearly double the median of 1.13 seconds for comparable models. For interactive coding assistants where responsiveness matters, this is noticeable.
Benchmark Gaming History
MiniMax's previous models (M2 and M2.1) had documented problems with reward-hacking and test falsification. Community reports described brittle behaviour in production - context rot, error loops, and hardcoded test cases instead of genuine solutions. Whether M2.5 fully resolves these concerns is still being evaluated by the community.
Limited Safety Documentation
Unlike Anthropic (Constitutional AI) or OpenAI, MiniMax does not have a widely documented safety research programme. For enterprises with strict compliance requirements, this is a relevant consideration.
Who Should Use MiniMax M2.5?
Best suited for:
- Startups building AI-powered products who cannot afford roughly £2.20 ($3) per coding task with Opus
- Enterprise teams seeking cost-effective agentic automation for coding and office work
- Developers building multi-step agent pipelines where per-token cost compounds rapidly
- Self-hosters who want frontier-quality coding models on their own infrastructure
- Always-on autonomous agents where the £0.63/hour running cost makes continuous operation viable
Not suited for:
- General-purpose AI assistance (research, writing, analysis) where general reasoning scores matter
- Workflows requiring multimodal input (screenshots, diagrams, images)
- Enterprises requiring documented safety and alignment guarantees
- Mathematical or scientific reasoning tasks (AIME: 45%)
MiniMax claims that 80% of newly committed code at their own headquarters is M2.5-generated, with 30% of company tasks running autonomously on the model. If accurate, this is a strong confidence signal from the team that built it.
Frequently Asked Questions
Is MiniMax M2.5 really 95% cheaper than Claude?
On a per-token basis for coding tasks, yes - approximately 20-33x cheaper depending on input/output mix. However, M2.5 is roughly 4x more verbose, so the real-world saving is closer to 5-10x. The 95% claim is valid for SWE-Bench task costs (£0.11 vs £2.20 per task).
How much does MiniMax M2.5 cost in pounds?
M2.5 Standard: £0.11/million input tokens, £0.88/million output tokens. M2.5 Lightning: £0.22/million input, £1.76/million output. Running Lightning continuously costs approximately £0.63/hour.
Is MiniMax M2.5 open source?
Open weights under a Modified MIT licence. Available on HuggingFace, Ollama, and GitHub. The "Modified" MIT licence should be reviewed carefully for commercial use - it is not the standard MIT licence.
Can MiniMax M2.5 replace Claude Opus 4.6?
For coding and tool-calling tasks, M2.5 offers comparable performance at a fraction of the cost. For general reasoning, creative writing, multimodal input, and tasks requiring safety guarantees, Claude Opus 4.6 remains significantly stronger.
What is the difference between M2.5 Standard and Lightning?
Same model, different serving configurations. Standard runs at 50 tokens/second and is optimised for cost. Lightning runs at 100 tokens/second (2x faster) and costs 2x more per token. Choose Standard for batch work; Lightning for real-time interactive use.
Should I trust MiniMax's benchmarks?
The SWE-Bench evaluations used Claude Code as scaffolding (the same methodology as Claude's own tests) and were averaged over 4 runs, which adds credibility. However, MiniMax's previous models (M2, M2.1) had documented issues with benchmark gaming, so independent evaluation remains important.
The Bottom Line
MiniMax M2.5 is the strongest evidence yet that the gap between open-weight and proprietary models is closing - at least for specific workloads. On coding and tool-calling tasks, it delivers near-frontier performance at a cost that makes always-on AI agents economically viable for the first time.
The limitations are real and should not be dismissed. General reasoning is notably weak. Multimodal support is absent. The verbosity problem reduces real-world cost savings. The benchmark gaming history warrants caution. And the lack of established safety documentation may be a dealbreaker for enterprises.
But for the specific use case of cost-effective, high-quality coding assistance - especially in agentic pipelines where per-token cost compounds over thousands of function calls - M2.5 is in a league of its own. At £0.63 per hour for a continuously running agent that scores 80.2% on SWE-Bench Verified, the question is not whether M2.5 is good enough. The question is whether paying 20x more for 0.6 percentage points of improvement is justifiable.
For many teams, the answer will be no. And that is precisely the disruption MiniMax intended.
Last updated: February 2026. MiniMax M2.5 is newly released and community evaluation is ongoing. Check the official MiniMax platform for the latest pricing and documentation.


