MiniMax shipped M3 on June 1, 2026, and the pitch is unusually ambitious: the first open-weights model that combines frontier-level coding, a 1-million-token context window, and native multimodal input in a single system. It scores 59.0% on SWE-Bench Pro and 83.5 on BrowseComp. According to MiniMax, that puts it ahead of both GPT-5.5 and Gemini 3.1 Pro on coding and slightly ahead of Claude Opus 4.7 on autonomous browsing.
The number that matters for builders is the price. On OpenRouter, M3 launched with a temporary 50% promotion at roughly $0.30 per million input tokens and $1.20 per million output tokens. That is a small fraction of what the closed frontier charges for comparable coding work, and it makes a 1M context window affordable enough to actually build on.
This guide breaks down what M3 actually is: the MSA sparse-attention architecture that makes the long context cheap, the full benchmark picture, the pricing math, how to access it today, and where it fits (and does not fit) in a production stack.
1. What MiniMax M3 Is
MiniMax M3 is a multimodal foundation model from MiniMax (Xiyu Technology). It accepts text, image, and video inputs and produces text output, with a context window of up to 1 million tokens and a guaranteed minimum of 512K. MiniMax positions it for long-horizon agent tasks, long-range coding, and long-video understanding, the three workloads that punish models with short or expensive context.
| Attribute | MiniMax M3 |
|---|---|
| Release date | June 1, 2026 (API live May 31) |
| Architecture | Sparse Mixture-of-Experts with MSA (MiniMax Sparse Attention) |
| Context window | Up to 1M tokens (512K guaranteed minimum) |
| Modalities | Text, image, video in; text out |
| Weights | Open-weights (released shortly after API launch) |
| Best at | Coding, agentic tool use, long-context reasoning, multimodal |
M3 sits in the same family as the M2 line that ran through 2025 and early 2026, but it is a generational change rather than a point release. The M2 series had removed sparse attention in favor of full attention. M3 brings it back in a new form, and that single decision is what unlocks the 1M context at a usable price. If you are coming from M2.7, see our M3 vs M2.7 upgrade guide.
2. The MSA Sparse-Attention Architecture
The headline technical feature of M3 is MiniMax Sparse Attention (MSA). Standard transformer attention is quadratic: every token attends to every other token, so doubling the context roughly quadruples the attention compute. That is why long context windows have historically been slow and expensive.
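To make the quadratic claim concrete, here is a minimal sketch that counts the query-key pairs full attention has to score. It ignores causal masking and constant factors, and the context sizes are arbitrary illustrative values, not figures from MiniMax.

```python
def full_attention_pairs(n_tokens: int) -> int:
    """Query-key pairs scored by dense self-attention over n_tokens."""
    return n_tokens * n_tokens


base = full_attention_pairs(128_000)
doubled = full_attention_pairs(256_000)

# Doubling the context quadruples the number of scored pairs.
ratio = doubled / base
print(f"{ratio:.1f}x")  # 4.0x
```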
MSA replaces full attention with a KV-block selection mechanism. Instead of attending to every token, the model selects the most relevant blocks of the key-value cache for each query, cutting per-token compute at long context while retaining quality across most tasks. MiniMax reports the gains are dramatic at the extreme end of the context window:
- ~9x faster prefill at 1M tokens vs. the prior generation
- ~15x faster decoding at 1M tokens vs. the prior generation
- ~1/10 the per-token compute at 1M tokens vs. the prior generation
Why this matters
A 1M context window is only useful if you can afford to fill it. MSA is what turns the long context from a marketing number into a practical tool: it belongs to the same broad family of efficient-attention techniques that long-context models generally rely on to keep inference costs from blowing up.
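The KV-block selection idea can be illustrated with a toy, single-query sketch in plain Python. This is not MiniMax's MSA implementation: the block-scoring rule (query dotted with each block's mean key), the function name, and the `block_size` and `top_k` parameters are all assumptions chosen to show the general shape of block-sparse attention.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def block_sparse_attention(query, keys, values, block_size, top_k):
    """Attend over only the top_k highest-scoring KV blocks for one query."""
    # Split cache positions into contiguous blocks of block_size.
    starts = range(0, len(keys), block_size)
    blocks = [list(range(s, min(s + block_size, len(keys)))) for s in starts]

    # Assumed scoring rule: query dotted with the block's mean key.
    def block_score(idx):
        dim = len(query)
        mean_key = [sum(keys[i][d] for i in idx) / len(idx) for d in range(dim)]
        return dot(query, mean_key)

    chosen = sorted(blocks, key=block_score, reverse=True)[:top_k]
    selected = sorted(i for blk in chosen for i in blk)

    # Ordinary softmax attention, restricted to the selected positions.
    scale = math.sqrt(len(query))
    logits = [dot(query, keys[i]) / scale for i in selected]
    peak = max(logits)
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    output = [
        sum(w * values[i][d] for w, i in zip(weights, selected)) / total
        for d in range(len(values[0]))
    ]
    return output, selected


# Eight cached positions; only positions 4 and 5 align with the query.
keys = [[0.0, 1.0]] * 4 + [[5.0, 0.0]] * 2 + [[0.0, 1.0]] * 2
values = [[float(i)] for i in range(8)]
output, selected = block_sparse_attention([1.0, 0.0], keys, values, block_size=2, top_k=1)
print(selected)  # [4, 5]
```

In this toy run the query scores only 2 of the 8 cached positions in the final softmax, which is the source of the savings: the attention step scales with `top_k * block_size` rather than with the full context length.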
3. Benchmarks: Coding, Agentic & Multimodal
M3 leads with coding and agentic numbers. Here are the headline scores MiniMax published at launch, alongside reference points for the closed frontier. Treat vendor-published comparisons with healthy skepticism and validate on your own workloads.
| Benchmark | MiniMax M3 | What it measures |
|---|---|---|
| SWE-Bench Pro | 59.0% | Long-horizon real-world software engineering tasks |
| Terminal-Bench 2.1 | 66.0% | Agentic terminal and CLI task completion |
| SWE-fficiency | 34.8% | Efficient resolution of engineering tasks |
| BrowseComp | 83.5 | Autonomous browsing and information retrieval (Opus 4.7: 79.3) |
| SVG-Bench | Surpasses Opus 4.7 | Programmatic SVG generation quality |
The framing MiniMax uses is consistent across sources: on SWE-Bench Pro, M3 surpasses GPT-5.5 and Gemini 3.1 Pro and approaches Claude Opus 4.7. For an open-weights model you can self-host, closing the gap to the closed frontier on coding is the entire story.
On the agentic side, MiniMax highlighted extreme long-horizon tests: M3 reportedly reproduced the experiments of an ICLR paper in 12 hours, and ran continuously for 24 hours without reference code while making nearly 2,000 tool calls. Those are vendor anecdotes, not benchmarks, but they signal where M3 is aimed: agents that work for hours, not single-turn chat.
Verify on your own tasks
Benchmark leadership rarely transfers cleanly to your specific domain. Before you commit M3 to production, run it against a held-out eval set built from your real tickets, prompts, and tool traces. See our eval-driven development guide for how to do that.
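A held-out eval can start very small. The sketch below computes a pass rate over prompt and checker pairs; `pass_rate`, `stub_model`, and the two sample cases are hypothetical stand-ins. In practice the model callable would wrap a real M3 client, and the checkers would come from your own tickets and tool traces.

```python
from typing import Callable

Case = tuple[str, Callable[[str], bool]]


def pass_rate(model: Callable[[str], str], cases: list[Case]) -> float:
    """Fraction of cases whose model output satisfies its checker."""
    if not cases:
        raise ValueError("eval set is empty")
    passed = sum(1 for prompt, check in cases if check(model(prompt)))
    return passed / len(cases)


# Stand-in for a real model client (assumption for illustration only).
def stub_model(prompt: str) -> str:
    return "4" if "2 + 2" in prompt else "unknown"


cases: list[Case] = [
    ("What is 2 + 2?", lambda out: out.strip() == "4"),
    ("Name the capital of France.", lambda out: "Paris" in out),
]

rate = pass_rate(stub_model, cases)
print(f"pass rate: {rate:.0%}")  # pass rate: 50%
```

Running the same cases against each candidate model gives a like-for-like number that a vendor benchmark cannot.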
4. Pricing & Cost Math
M3 launched on OpenRouter at a standard rate of $0.60 per million input tokens and $2.40 per million output tokens, with a temporary 50% promotional discount that brings it to roughly $0.30 input / $1.20 output per million tokens. The promo rate matches MiniMax's own M2-series pay-as-you-go pricing.
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| MiniMax M3 (promo) | $0.30 | $1.20 |
| MiniMax M3 (standard) | $0.60 | $2.40 |
| Claude Opus 4.x | $5.00 | $25.00 |
| GPT-5.5 | ~$10.00 | ~$30.00 |
A worked example makes the gap concrete. Say an agentic coding task consumes 500K input tokens and 100K output tokens (a realistic figure once you fill a long context with repo files and tool output):
- M3 (promo): (0.5 x $0.30) + (0.1 x $1.20) = $0.27 per task
- M3 (standard): (0.5 x $0.60) + (0.1 x $2.40) = $0.54 per task
- Claude Opus: (0.5 x $5.00) + (0.1 x $25.00) = $5.00 per task
At promo pricing, M3 runs the same task at roughly 5% of the Opus cost, and even at standard pricing it is about a tenth. For high-volume agentic workloads, that difference decides whether a product is viable.
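The same arithmetic as a small calculator, so you can plug in your own token counts. The prices are the ones from the table above; the dictionary keys and the `task_cost` helper are illustrative names, not part of any vendor SDK.

```python
# (input $, output $) per 1M tokens, taken from the pricing table above.
PRICES = {
    "m3_promo": (0.30, 1.20),
    "m3_standard": (0.60, 2.40),
    "claude_opus": (5.00, 25.00),
}


def task_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one task at per-million-token prices."""
    input_price, output_price = PRICES[model]
    return input_tokens / 1e6 * input_price + output_tokens / 1e6 * output_price


costs = {m: round(task_cost(m, 500_000, 100_000), 2) for m in PRICES}
print(costs)  # {'m3_promo': 0.27, 'm3_standard': 0.54, 'claude_opus': 5.0}

# Share of the Opus cost: about 5% at promo pricing, about 11% at standard.
promo_share = costs["m3_promo"] / costs["claude_opus"]
standard_share = costs["m3_standard"] / costs["claude_opus"]
```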
5. How to Access M3 Today
You have three practical paths to M3, depending on whether you want managed convenience or full control:
Option A: MiniMax Platform API
The first-party route. Create a key at platform.minimax.io and call the OpenAI-compatible endpoint:

    curl https://api.minimax.io/v1/chat/completions \
      -H "Authorization: Bearer $MINIMAX_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "MiniMax-M3",
        "messages": [
          {"role": "user", "content": "Summarize this repo and propose a refactor plan."}
        ]
      }'

Option B: OpenRouter
OpenRouter listed M3 at launch (with the promotional discount), which is the fastest way to test it without a MiniMax account. Point your existing OpenAI-compatible client at OpenRouter and set the model to minimax/minimax-m3. If you route across providers, see our LLM gateway and model routing guide.
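As a rough sketch of the OpenRouter route using only the Python standard library: the model ID is the one given above and the URL is OpenRouter's OpenAI-compatible chat completions endpoint. The `OPENROUTER_API_KEY` environment variable name and the helper names are assumptions, and the response parsing assumes the usual OpenAI-style `choices` shape.

```python
import json
import os
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"


def build_request(prompt: str, api_key: str,
                  model: str = "minimax/minimax-m3") -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the request and return the first choice's text."""
    # Assumed env var name; use whatever your deployment stores the key under.
    request = build_request(prompt, os.environ["OPENROUTER_API_KEY"])
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Calling `ask("Summarize this repo and propose a refactor plan.")` with a valid key set should return the model's reply as a string. Keeping request construction separate from sending makes the payload easy to inspect and unit-test without network access.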
Option C: Self-Host the Open Weights
MiniMax released M3 as open-weights shortly after the API went live, so you can run it on your own infrastructure with an inference engine like vLLM or SGLang once they add MSA support. Self-hosting only makes sense at sustained high volume; at low volume the API is far cheaper than idle GPUs. Note the license has commercial-use conditions, so review the terms before shipping it in a product.
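One way to sanity-check the volume threshold is a simple break-even comparison. Everything hardware-related here is a hypothetical input: the $20,000 monthly GPU figure is a placeholder, not a quote, and the sketch assumes the cluster can actually serve the stated volume. Only the default API prices come from the standard M3 rate above.

```python
def api_monthly_cost(input_tokens: float, output_tokens: float,
                     input_price: float = 0.60, output_price: float = 2.40) -> float:
    """Monthly API spend in dollars at per-million-token prices."""
    return input_tokens / 1e6 * input_price + output_tokens / 1e6 * output_price


def cheaper_to_self_host(input_tokens: float, output_tokens: float,
                         gpu_monthly_cost: float) -> bool:
    """True if a fixed monthly GPU bill undercuts the API bill for this volume."""
    return gpu_monthly_cost < api_monthly_cost(input_tokens, output_tokens)


GPU_MONTHLY = 20_000  # hypothetical all-in cluster cost per month

# Low volume: 1B input + 100M output tokens per month -> API bill of about $840.
low_volume = cheaper_to_self_host(1e9, 1e8, GPU_MONTHLY)
# High volume: 50B input + 5B output tokens per month -> API bill of about $42,000.
high_volume = cheaper_to_self_host(5e10, 5e9, GPU_MONTHLY)
print(low_volume, high_volume)  # False True
```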
6. Where M3 Fits in Your Stack
M3 is strongest where its three differentiators line up: cheap long context, agentic tool use, and multimodal input. Concretely:
Great fit
- Long-horizon coding agents over large repos
- High-volume agentic workflows where cost dominates
- Whole-codebase or whole-document analysis
- Autonomous browsing and research agents
- Multimodal pipelines (image, video understanding)
Consider alternatives
- Hardest multi-file refactors where Opus still leads marginally
- Workloads that need a permissive license with no commercial-use conditions
- Latency-critical chat where smaller models suffice
- Regulated data requiring a specific provider region
The pragmatic pattern most teams land on is a hybrid: route the bulk of agentic and long-context work to M3 for cost, and reserve a frontier closed model for the small slice of tasks where the last few points of quality matter. A gateway makes that routing trivial.
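The hybrid pattern can be sketched as a small routing function. The escalation and small-model IDs are hypothetical placeholders, and the two task flags are assumed to be set upstream, for example by a classifier or by explicit labels on the job.

```python
from dataclasses import dataclass

DEFAULT_MODEL = "minimax/minimax-m3"
ESCALATION_MODEL = "frontier-closed-model"  # hypothetical placeholder ID
SMALL_MODEL = "small-fast-model"            # hypothetical placeholder ID


@dataclass
class Task:
    prompt: str
    needs_top_quality: bool = False  # e.g. the hardest multi-file refactors
    latency_critical: bool = False   # e.g. interactive chat turns


def choose_model(task: Task) -> str:
    """Route the bulk of work to M3; escalate or downshift only when flagged."""
    if task.needs_top_quality:
        return ESCALATION_MODEL
    if task.latency_critical:
        return SMALL_MODEL
    return DEFAULT_MODEL


print(choose_model(Task("Index this repo and draft a migration plan.")))
# minimax/minimax-m3
```

Keeping the decision in one function makes it easy to log which route each task took and to revisit the thresholds once your own eval data comes in.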
7. Caveats & What to Watch
- Promo pricing is temporary. Budget against the standard $0.60/$2.40 rate so a promo expiry does not break your unit economics.
- Licensing has conditions. The open weights ship under terms with commercial restrictions that drew criticism at launch. Read the license before building a commercial product on self-hosted M3.
- Vendor benchmarks are vendor benchmarks. The "surpasses GPT-5.5 and Gemini 3.1 Pro" claims are MiniMax's own. Independent leaderboards and your own evals are the real test.
- Long context is not memory. A 1M window helps, but stuffing everything into context is not a substitute for a real memory system on long-running agents. See our agent memory guide.
8. Why Lushbinary for AI Model Integration
At Lushbinary, we help teams adopt new models like MiniMax M3 without betting the product on a single provider. We specialize in:
- Model evaluation - building eval suites from your real workloads so model choices are data-driven, not hype-driven
- Cost-aware routing - gateways that send routine work to M3 and escalate hard tasks to a frontier model
- Agent architecture - long-horizon agents with memory, guardrails, and observability built in
- Self-hosting - sizing, deploying, and operating open-weights models on your own infrastructure when volume justifies it
🚀 Free Consultation
Curious whether MiniMax M3 fits your product? Lushbinary will benchmark it against your real workloads, design a cost-aware routing strategy, and ship a production integration - no obligation.
❓ Frequently Asked Questions
What is MiniMax M3?
MiniMax M3 is an open-weights multimodal foundation model launched June 1, 2026. It combines frontier coding and agentic performance, a 1M-token context window, and native text, image, and video inputs. It is built on MiniMax Sparse Attention (MSA), the company's in-house sparse-attention architecture.
How much does MiniMax M3 cost?
At launch MiniMax M3 listed on OpenRouter at $0.60 per million input tokens and $2.40 per million output tokens, with a temporary 50% promotional discount bringing it to roughly $0.30 input and $1.20 output per million tokens. That is a fraction of frontier closed models like Claude Opus and GPT-5.5.
What benchmarks does MiniMax M3 achieve?
MiniMax M3 scores 59.0% on SWE-Bench Pro, 66.0% on Terminal-Bench 2.1, 34.8% on SWE-fficiency, and 83.5 on BrowseComp. MiniMax reports it surpasses GPT-5.5 and Gemini 3.1 Pro on SWE-Bench Pro, approaches Claude Opus 4.7, and beats Opus 4.7 on SVG-Bench and BrowseComp.
What is MiniMax Sparse Attention (MSA)?
MSA replaces full attention with a KV-block selection mechanism that cuts per-token compute at long context. MiniMax reports roughly 9x faster prefill and 15x faster decoding at 1M tokens versus the prior generation, with per-token compute around one-tenth of the previous model at that length.
Is MiniMax M3 open source?
MiniMax M3 is released as an open-weights model, which is not the same as fully open source. The API went live first, and the weights followed shortly after launch. Note that the license has commercial-use conditions, so review the terms before deploying it in a commercial product.
📚 Sources
- MiniMax Research - MiniMax M3 Announcement
- MiniMax M3 Model Page (specs & benchmarks)
- OpenRouter - MiniMax M3 Pricing & Providers
- MiniMax API Docs - Pay-As-You-Go Pricing
Content was rephrased for compliance with licensing restrictions. Benchmark and pricing data sourced from official MiniMax and OpenRouter publications as of June 2026. Pricing and promotional discounts may change - always verify on the vendor's website.