AI & LLMs · June 1, 2026 · 14 min read

MiniMax M3 Developer Guide: Benchmarks, Pricing & MSA Architecture

MiniMax M3 launched June 1, 2026 as the first open-weights model to combine frontier coding, a 1M-token context window, and native multimodality. Full developer breakdown: the MSA sparse-attention architecture, 59% SWE-Bench Pro, 66% Terminal-Bench 2.1, 83.5 BrowseComp, $0.30/$1.20 promo pricing, how to access it, and where it fits in your stack.

Lushbinary Team

AI & Cloud Solutions

MiniMax shipped M3 on June 1, 2026, and the pitch is unusually ambitious: the first open-weights model that combines frontier-level coding, a 1-million-token context window, and native multimodal input in a single system. It scores 59.0% on SWE-Bench Pro and 83.5 on BrowseComp, which MiniMax says surpasses both GPT-5.5 and Gemini 3.1 Pro on coding and edges past Claude Opus 4.7 on autonomous browsing.

The number that matters for builders is the price. On OpenRouter, M3 launched with a temporary 50% promotion at roughly $0.30 per million input tokens and $1.20 per million output tokens. That is a small fraction of what the closed frontier charges for comparable coding work, and it makes a 1M context window affordable enough to actually build on.

This guide breaks down what M3 actually is: the MSA sparse-attention architecture that makes the long context cheap, the full benchmark picture, the pricing math, how to access it today, and where it fits (and does not fit) in a production stack.

1. What MiniMax M3 Is

MiniMax M3 is a multimodal foundation model from MiniMax (Xiyu Technology). It accepts text, image, and video inputs and produces text output, with a context window of up to 1 million tokens and a guaranteed minimum of 512K. MiniMax positions it for long-horizon agent tasks, long-range coding, and long-video understanding, the three workloads that punish models with short or expensive context.

| Attribute | MiniMax M3 |
| --- | --- |
| Release date | June 1, 2026 (API live May 31) |
| Architecture | Sparse Mixture-of-Experts with MSA (MiniMax Sparse Attention) |
| Context window | Up to 1M tokens (512K guaranteed minimum) |
| Modalities | Text, image, video in; text out |
| Weights | Open-weights (released shortly after API launch) |
| Best at | Coding, agentic tool use, long-context reasoning, multimodal |

M3 sits in the same family as the M2 line that ran through 2025 and early 2026, but it is a generational change rather than a point release. The M2 series had removed sparse attention in favor of full attention. M3 brings it back in a new form, and that single decision is what unlocks the 1M context at a usable price. If you are coming from M2.7, see our M3 vs M2.7 upgrade guide.

2. The MSA Sparse-Attention Architecture

The headline technical feature of M3 is MiniMax Sparse Attention (MSA). Standard transformer attention is quadratic: every token attends to every other token, so doubling the context roughly quadruples the attention compute. That is why long context windows have historically been slow and expensive.
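As a rough illustration of the quadratic claim, consider a simplified cost model that ignores constant factors and the non-attention layers. The symbols $n$ (context length in tokens) and $d$ (head dimension) are introduced here for illustration only:

```latex
\[
C_{\text{attn}}(n) \propto n^{2} d
\quad\Longrightarrow\quad
\frac{C_{\text{attn}}(2n)}{C_{\text{attn}}(n)}
= \frac{(2n)^{2} d}{n^{2} d} = 4
\]
```

Under this model, going from 512K to 1M tokens costs about four times as much attention compute, not twice as much.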

MSA replaces full attention with a KV-block selection mechanism. Instead of attending to every token, the model selects the most relevant blocks of the key-value cache for each query, cutting per-token compute at long context while retaining quality across most tasks. MiniMax reports the gains are dramatic at the extreme end of the context window:

  • ~9x faster prefill at 1M tokens vs. the prior generation
  • ~15x faster decoding at 1M tokens vs. the prior generation
  • ~1/10 the per-token compute at 1M tokens vs. the prior generation

[Figure: Full attention (every token attends to all tokens, quadratic) vs. MSA sparse attention (selects relevant KV blocks, near-linear). MSA cuts per-token compute at long context while retaining quality.]
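The KV-block selection idea can be sketched in a few lines of pure Python. This is a toy illustration of generic block-sparse attention, not MiniMax's actual MSA implementation: the mean-key block scoring rule, the function names, and the `block_size` and `top_k` parameters are all assumptions made for the example.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(q, keys, values):
    """Scaled dot-product attention for one query over the given keys."""
    scale = 1.0 / math.sqrt(len(q))
    weights = softmax([dot(q, k) * scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

def select_blocks(q, keys, block_size, top_k):
    """Score each KV block by its mean key (an assumed heuristic) and
    return the indices of the top_k highest-scoring blocks."""
    scored = []
    for start in range(0, len(keys), block_size):
        block = keys[start:start + block_size]
        mean_key = [sum(col) / len(block) for col in zip(*block)]
        scored.append((dot(q, mean_key), start // block_size))
    scored.sort(reverse=True)
    return sorted(idx for _, idx in scored[:top_k])

def block_sparse_attend(q, keys, values, block_size, top_k):
    """Attend only over tokens in the selected blocks, so per-query cost
    scales with top_k * block_size instead of the full context length."""
    chosen = select_blocks(q, keys, block_size, top_k)
    sel_k, sel_v = [], []
    for b in chosen:
        lo, hi = b * block_size, (b + 1) * block_size
        sel_k.extend(keys[lo:hi])
        sel_v.extend(values[lo:hi])
    return attend(q, sel_k, sel_v), chosen
```

When `top_k` covers every block, the result matches full attention. The savings come from keeping `top_k * block_size` fixed while the context grows.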

Why this matters

A 1M context window is only useful if you can afford to fill it. MSA is what turns the long context from a marketing number into a practical tool. It belongs to the same broad family of attention-efficiency techniques that other frontier labs are generally understood to rely on, in different forms, to keep long-context inference from blowing up costs.

3. Benchmarks: Coding, Agentic & Multimodal

M3 leads with coding and agentic numbers. Here are the headline scores MiniMax published at launch, alongside reference points for the closed frontier. Treat vendor-published comparisons with healthy skepticism and validate on your own workloads.

| Benchmark | MiniMax M3 | What it measures |
| --- | --- | --- |
| SWE-Bench Pro | 59.0% | Long-horizon real-world software engineering tasks |
| Terminal-Bench 2.1 | 66.0% | Agentic terminal and CLI task completion |
| SWE-fficiency | 34.8% | Efficient resolution of engineering tasks |
| BrowseComp | 83.5 | Autonomous browsing and information retrieval (Opus 4.7: 79.3) |
| SVG-Bench | Surpasses Opus 4.7 | Programmatic SVG generation quality |

The framing MiniMax uses is consistent across sources: on SWE-Bench Pro, M3 surpasses GPT-5.5 and Gemini 3.1 Pro and approaches Claude Opus 4.7. For an open-weights model you can self-host, closing the gap to the closed frontier on coding is the entire story.

On the agentic side, MiniMax highlighted extreme long-horizon tests: M3 reportedly reproduced the experiments of an ICLR paper in 12 hours, and ran continuously for 24 hours without reference code while making nearly 2,000 tool calls. Those are vendor anecdotes, not benchmarks, but they signal where M3 is aimed: agents that work for hours, not single-turn chat.

Verify on your own tasks

Benchmark leadership rarely transfers cleanly to your specific domain. Before you commit M3 to production, run it against a held-out eval set built from your real tickets, prompts, and tool traces. See our eval-driven development guide for how to do that.
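A held-out eval can start very small. The sketch below is one minimal way to structure it, assuming each case is a dict with `prompt` and `expected` keys and that `model_fn` is any callable that takes a prompt and returns text. All of these names are hypothetical, and the default substring check stands in for whatever grading your tasks actually need.

```python
def run_eval(cases, model_fn, check_fn=None):
    """Run every case through model_fn and report the pass rate.

    cases:    list of {"prompt": str, "expected": str} dicts (assumed shape)
    model_fn: callable taking a prompt string and returning the model output
    check_fn: optional grader (output, expected) -> bool; defaults to a
              simple substring match, which is only a placeholder
    """
    if check_fn is None:
        check_fn = lambda output, expected: expected in output
    failures = []
    for case in cases:
        output = model_fn(case["prompt"])
        if not check_fn(output, case["expected"]):
            failures.append({"prompt": case["prompt"], "output": output})
    passed = len(cases) - len(failures)
    pass_rate = passed / len(cases) if cases else 0.0
    return {"pass_rate": pass_rate, "failures": failures}
```

Swapping `model_fn` between an M3 client and your current model gives a like-for-like comparison on the same cases.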

4. Pricing & Cost Math

M3 launched on OpenRouter at a standard rate of $0.60 per million input tokens and $2.40 per million output tokens, with a temporary 50% promotional discount that brings it to roughly $0.30 input / $1.20 output per million tokens. The promo rate matches MiniMax's own M2-series pay-as-you-go pricing.

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
| --- | --- | --- |
| MiniMax M3 (promo) | $0.30 | $1.20 |
| MiniMax M3 (standard) | $0.60 | $2.40 |
| Claude Opus 4.x | $5.00 | $25.00 |
| GPT-5.5 | ~$10.00 | ~$30.00 |

A worked example makes the gap concrete. Say an agentic coding task consumes 500K input tokens and 100K output tokens (a realistic figure once you fill a long context with repo files and tool output):

  • M3 (promo): (0.5 x $0.30) + (0.1 x $1.20) = $0.27 per task
  • M3 (standard): (0.5 x $0.60) + (0.1 x $2.40) = $0.54 per task
  • Claude Opus: (0.5 x $5.00) + (0.1 x $25.00) = $5.00 per task

At promo pricing, M3 runs the same task at roughly 5% of the Opus cost, and even at standard pricing it is about a tenth. For high-volume agentic workloads, that difference decides whether a product is viable.
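The same arithmetic is easy to script so you can plug in your own token counts. This is a small sketch using the per-million prices quoted above. The `PRICES_PER_MILLION` dictionary and `task_cost` function are names chosen for this example, and the prices should be re-checked against the vendor pages before you rely on them.

```python
# (input price, output price) in USD per 1M tokens, as quoted in this article.
PRICES_PER_MILLION = {
    "m3_promo": (0.30, 1.20),
    "m3_standard": (0.60, 2.40),
    "claude_opus": (5.00, 25.00),
}

def task_cost(input_tokens, output_tokens, input_price, output_price):
    """Cost in USD of one task given token counts and per-million prices."""
    return (input_tokens / 1_000_000) * input_price + \
           (output_tokens / 1_000_000) * output_price

if __name__ == "__main__":
    for name, (in_price, out_price) in PRICES_PER_MILLION.items():
        cost = task_cost(500_000, 100_000, in_price, out_price)
        print(f"{name}: ${cost:.2f} per task")
    # prints:
    # m3_promo: $0.27 per task
    # m3_standard: $0.54 per task
    # claude_opus: $5.00 per task
```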

5. How to Access M3 Today

You have three practical paths to M3, depending on whether you want managed convenience or full control:

Option A: MiniMax Platform API

The first-party route. Create a key at platform.minimax.io and call the OpenAI-compatible endpoint:

curl https://api.minimax.io/v1/chat/completions \
  -H "Authorization: Bearer $MINIMAX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "MiniMax-M3",
    "messages": [
      {"role": "user", "content": "Summarize this repo and propose a refactor plan."}
    ]
  }'

Option B: OpenRouter

OpenRouter listed M3 at launch (with the promotional discount), which is the fastest way to test it without a MiniMax account. Point your existing OpenAI-compatible client at OpenRouter and set the model to minimax/minimax-m3. If you route across providers, see our LLM gateway and model routing guide.
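As a rough sketch of the OpenRouter route using only the Python standard library: the model slug comes from the paragraph above and the URL is OpenRouter's OpenAI-compatible chat completions endpoint. The helper names and the `OPENROUTER_API_KEY` environment variable are assumptions for this example, and the response parsing assumes the usual OpenAI-style `choices[0].message.content` shape.

```python
import json
import os
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(prompt, api_key, model="minimax/minimax-m3"):
    """Build (but do not send) a chat completion request for OpenRouter."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        OPENROUTER_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def send(request, timeout=60):
    """Send the request and return the assistant's text (assumed response shape)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]

if __name__ == "__main__":
    # OPENROUTER_API_KEY is an assumed variable name; use whatever your setup defines.
    key = os.environ["OPENROUTER_API_KEY"]
    req = build_request("Summarize this repo and propose a refactor plan.", key)
    print(send(req))
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test without spending tokens.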

Option C: Self-Host the Open Weights

MiniMax released M3 as open-weights shortly after the API went live, so you can run it on your own infrastructure with an inference engine like vLLM or SGLang once they add MSA support. Self-hosting only makes sense at sustained high volume; at low volume the API is far cheaper than idle GPUs. Note the license has commercial-use conditions, so review the terms before shipping it in a product.
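To make the volume argument concrete, here is a back-of-the-envelope comparison. It is a sketch under loud assumptions: `cluster_cost_per_hour` is a hypothetical all-in hourly cost for hardware that can serve the stated traffic, the default prices are the standard OpenRouter rates quoted earlier, and it ignores engineering time, utilization gaps, and throughput limits.

```python
def api_cost_per_hour(input_tokens_per_hour, output_tokens_per_hour,
                      input_price=0.60, output_price=2.40):
    """Hourly API spend in USD at the given per-million-token prices."""
    return (input_tokens_per_hour / 1_000_000) * input_price + \
           (output_tokens_per_hour / 1_000_000) * output_price

def cheaper_option(input_tokens_per_hour, output_tokens_per_hour,
                   cluster_cost_per_hour):
    """Return "self-host" or "api" based purely on hourly cost.

    cluster_cost_per_hour is a hypothetical figure you must supply; the
    function assumes the cluster can actually sustain the stated traffic.
    """
    api = api_cost_per_hour(input_tokens_per_hour, output_tokens_per_hour)
    return "self-host" if cluster_cost_per_hour < api else "api"
```

With an illustrative $40/hour cluster, 1M input and 200K output tokens per hour costs about $1.08 through the API, so the API wins easily. At 100M input and 20M output tokens per hour the API bill is about $108, and self-hosting starts to look attractive.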

6. Where M3 Fits in Your Stack

M3 is strongest where its three differentiators line up: cheap long context, agentic tool use, and multimodal input. Concretely:

Great fit

  • Long-horizon coding agents over large repos
  • High-volume agentic workflows where cost dominates
  • Whole-codebase or whole-document analysis
  • Autonomous browsing and research agents
  • Multimodal pipelines (image, video understanding)

Consider alternatives

  • Hardest multi-file refactors where Opus still leads marginally
  • Workloads needing a strict commercial-use license without conditions
  • Latency-critical chat where smaller models suffice
  • Regulated data requiring a specific provider region

The pragmatic pattern most teams land on is a hybrid: route the bulk of agentic and long-context work to M3 for cost, and reserve a frontier closed model for the small slice of tasks where the last few points of quality matter. A gateway makes that routing trivial.
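The hybrid pattern can be expressed as a small routing function. The sketch below is illustrative only: the `Task` fields, the 4,000-token threshold, and the `FRONTIER` and `SMALL` model identifiers are hypothetical placeholders, and only the `minimax/minimax-m3` slug comes from this guide.

```python
from dataclasses import dataclass

M3 = "minimax/minimax-m3"
FRONTIER = "frontier-closed-model"   # hypothetical placeholder identifier
SMALL = "small-fast-model"           # hypothetical placeholder identifier

@dataclass
class Task:
    prompt_tokens: int
    needs_top_quality: bool = False   # e.g. the hardest multi-file refactors
    latency_sensitive: bool = False   # e.g. interactive chat

def choose_model(task, short_prompt_limit=4_000):
    """Pick a model for a task using simple, assumed routing rules."""
    if task.needs_top_quality:
        return FRONTIER               # pay more only where quality decides
    if task.latency_sensitive and task.prompt_tokens < short_prompt_limit:
        return SMALL                  # short, latency-critical chat
    return M3                         # default: bulk agentic and long-context work
```

In practice the quality flag would come from your own evals or task metadata rather than being set by hand.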

7. Caveats & What to Watch

  • Promo pricing is temporary. Budget against the standard $0.60/$2.40 rate so a promo expiry does not break your unit economics.
  • Licensing has conditions. The open weights ship under terms with commercial restrictions that drew criticism at launch. Read the license before building a commercial product on self-hosted M3.
  • Vendor benchmarks are vendor benchmarks. The "surpasses GPT-5.5 and Gemini 3.1 Pro" claims are MiniMax's own. Independent leaderboards and your own evals are the real test.
  • Long context is not memory. A 1M window helps, but stuffing everything into context is not a substitute for a real memory system on long-running agents. See our agent memory guide.

8. Why Lushbinary for AI Model Integration

At Lushbinary, we help teams adopt new models like MiniMax M3 without betting the product on a single provider. We specialize in:

  • Model evaluation - building eval suites from your real workloads so model choices are data-driven, not hype-driven
  • Cost-aware routing - gateways that send routine work to M3 and escalate hard tasks to a frontier model
  • Agent architecture - long-horizon agents with memory, guardrails, and observability built in
  • Self-hosting - sizing, deploying, and operating open-weights models on your own infrastructure when volume justifies it

🚀 Free Consultation

Curious whether MiniMax M3 fits your product? Lushbinary will benchmark it against your real workloads, design a cost-aware routing strategy, and ship a production integration - no obligation.

❓ Frequently Asked Questions

What is MiniMax M3?

MiniMax M3 is an open-weights multimodal foundation model launched June 1, 2026. It combines frontier coding and agentic performance, a 1M-token context window, and native text, image, and video inputs. It is built on the proprietary MiniMax Sparse Attention (MSA) architecture.

How much does MiniMax M3 cost?

At launch MiniMax M3 listed on OpenRouter at $0.60 per million input tokens and $2.40 per million output tokens, with a temporary 50% promotional discount bringing it to roughly $0.30 input and $1.20 output per million tokens. That is a fraction of frontier closed models like Claude Opus and GPT-5.5.

What benchmarks does MiniMax M3 achieve?

MiniMax M3 scores 59.0% on SWE-Bench Pro, 66.0% on Terminal-Bench 2.1, 34.8% on SWE-fficiency, and 83.5 on BrowseComp. MiniMax reports it surpasses GPT-5.5 and Gemini 3.1 Pro on SWE-Bench Pro, approaches Claude Opus 4.7, and beats Opus 4.7 on SVG-Bench and BrowseComp.

What is MiniMax Sparse Attention (MSA)?

MSA replaces full attention with a KV-block selection mechanism that cuts per-token compute at long context. MiniMax reports roughly 9x faster prefill and 15x faster decoding at 1M tokens versus the prior generation, with per-token compute around one-tenth of the previous model at that length.

Is MiniMax M3 open source?

MiniMax M3 is released as an open-weights model. The API went live first, and the weights followed shortly after. Note that the license has commercial-use conditions, so review the terms before deploying it in a commercial product.

📚 Sources

Content was rephrased for compliance with licensing restrictions. Benchmark and pricing data sourced from official MiniMax and OpenRouter publications as of June 2026. Pricing and promotional discounts may change - always verify on the vendor's website.
