MAI-Thinking-1 is the model that signals Microsoft is serious about building frontier intelligence on its own terms. Announced at Build 2026, it is Microsoft AI's first in-house reasoning model, and the flagship of the new seven-model MAI family. The pitch is unusual: a medium-sized model, trained from scratch with zero distillation, that Microsoft claims goes toe-to-toe with Claude Opus 4.6 on real software engineering tasks.
For developers, the interesting part is not just the benchmark numbers. It is the architecture. MAI-Thinking-1 keeps its active parameter count low (35B) while drawing on a roughly 1T-total sparse Mixture of Experts, which keeps inference cheaper than a dense model of comparable quality. Add a 256K context window, function calling, and Chat Completions API compatibility, and you have a reasoning model built for production rather than just leaderboards.
This guide covers the architecture, the published benchmarks, how the model was trained, its enterprise features, and how to access it. For the broader picture of all seven models, see our Microsoft MAI models developer guide.
1. What MAI-Thinking-1 Is
MAI-Thinking-1 is a reasoning model, meaning it breaks problems down step by step before answering, similar to the reasoning modes from OpenAI, Google, and Anthropic. Microsoft describes it as a medium-sized model that stands among the strongest in its weight class, and positions it as a step toward what the company calls Humanist Superintelligence: advanced AI designed to serve people and organizations rather than replace them.
The model matters on two axes, in Microsoft's framing: what it can do, and how it was built. The capability story is the benchmark performance. The provenance story is that it was trained entirely in-house on clean, licensed data, without distilling from any other lab's model. That second point is a direct response to growing enterprise concern about data provenance and the legal exposure of models trained on opaque or scraped data.
2. Architecture: Sparse MoE & 256K Context
MAI-Thinking-1 is a sparse Mixture of Experts (MoE) model. It has roughly 1 trillion total parameters but activates only about 35 billion of them per token, or about 3.5% of the network. That design gives it the knowledge capacity of a very large model while keeping per-token inference compute closer to that of a mid-sized one, because only a fraction of the network runs on any given forward pass.
| Spec | MAI-Thinking-1 |
|---|---|
| Architecture | Sparse Mixture of Experts |
| Active parameters | ~35 billion |
| Total parameters | ~1 trillion |
| Context window | 256,000 tokens (about a 600-page document) |
| Model type | Reasoning (step-by-step) |
| API compatibility | Chat Completions API, function calling |
Why the smaller active footprint matters: Microsoft argues that model size determines where advanced coding assistance can actually be deployed, how often it can be used, and whether it can move from occasional heavy tasks into daily workflows. A 35B-active model is far cheaper to serve at scale than a dense frontier model, which is central to Microsoft's cost story.
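To make the active-versus-total distinction concrete, here is a minimal back-of-the-envelope sketch in Python. The parameter counts come from the spec table; the "2 FLOPs per active parameter per token" figure is a common rule of thumb for transformer forward passes, not a number Microsoft has published, and the function names are purely illustrative.

```python
# Figures from the spec table above; both are approximate ("~").
TOTAL_PARAMS = 1_000_000_000_000   # ~1T total parameters
ACTIVE_PARAMS = 35_000_000_000     # ~35B active parameters per token


def active_fraction(active: int, total: int) -> float:
    """Share of the network's weights used for a single token."""
    return active / total


def forward_flops_per_token(active: int) -> int:
    """Rule-of-thumb estimate: about 2 FLOPs per active parameter per token.

    Assumption: this is a generic approximation, not a Microsoft figure,
    and it ignores the extra cost of attention over long contexts.
    """
    return 2 * active


if __name__ == "__main__":
    frac = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
    sparse = forward_flops_per_token(ACTIVE_PARAMS)
    dense = forward_flops_per_token(TOTAL_PARAMS)
    print(f"Active share per token: {frac:.1%}")
    print(f"Estimated compute vs. a dense 1T model: {sparse / dense:.1%}")
```

The estimate covers compute only. In a sparse MoE, all experts generally still have to be held in memory, so the serving footprint is driven by the total parameter count rather than the active one.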
On the 256K context window
A 256K window covers most long-document and multi-file agent workloads: large codebases, full contract sets, or extended multi-turn conversations. It is smaller than the 1M-token windows some competitors advertise, so for genuinely massive context you may still need retrieval or chunking. For most enterprise reasoning tasks, 256K is more than enough.
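As a rough sketch of the "fits in the window, or chunk it" decision described above, the following Python uses a crude characters-per-token heuristic. The 4-characters-per-token ratio, the 8,000 tokens reserved for the response, and the function names are all assumptions for illustration; a real integration would count tokens with the model's own tokenizer.

```python
CONTEXT_WINDOW = 256_000   # tokens, from the spec table
CHARS_PER_TOKEN = 4        # assumption: rough average for English text


def estimate_tokens(text: str) -> int:
    """Crude token estimate; prefer the model's real tokenizer if available."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def fits_in_window(text: str, reserved_for_output: int = 8_000) -> bool:
    """True if the text plus a reserved output budget fits in the window."""
    return estimate_tokens(text) + reserved_for_output <= CONTEXT_WINDOW


def chunk_text(text: str, max_tokens: int, overlap_tokens: int = 200) -> list[str]:
    """Split text into overlapping chunks of roughly max_tokens each."""
    if overlap_tokens >= max_tokens:
        raise ValueError("overlap_tokens must be smaller than max_tokens")
    size = max_tokens * CHARS_PER_TOKEN
    step = (max_tokens - overlap_tokens) * CHARS_PER_TOKEN
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks
```

A caller would check `fits_in_window(document)` first and only fall back to `chunk_text` (or retrieval) when the check fails. The overlap keeps sentences that straddle a boundary visible in both neighbouring chunks.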
3. Benchmarks: SWE-Bench Pro, AIME & Human Preference
Microsoft published three categories of results: software engineering, mathematical reasoning, and blind human preference. Here are the headline figures, all vendor-reported.
| Benchmark | Result | Note |
|---|---|---|
| SWE-Bench Pro | Matches Claude Opus 4.6 | Real-world software engineering tasks |
| AIME 2025 | 97.0% | Competition mathematics |
| AIME 2026 | 94.5% | Newer, less likely to be memorized |
| Human preference vs Sonnet 4.6 | Preferred | 1,276 tasks, blind, judged by Surge raters |
The AIME results are worth a closer look. AIME 2025 at 97.0% is strong, but AIME 2026 at 94.5% is the more meaningful number, since a 2026 competition is far less likely to appear in training data. The gap between the two is only 2.5 percentage points, which suggests the model is reasoning through the problems rather than recalling memorized solutions.
The human preference evaluation is the result Microsoft leans on hardest. Benchmarks measure narrow capabilities; preference tests measure whether people actually find the responses helpful. Across 1,276 single-turn and multi-turn tasks, professional raters from Surge preferred MAI-Thinking-1 over Claude Sonnet 4.6. That is a meaningful signal, though the comparison here is against Sonnet 4.6, not the top-tier Opus 4.6 that Microsoft cites for SWE-Bench Pro.
How to read these numbers
All of these figures come from Microsoft's own model card and evaluations. Microsoft notes that competitor numbers were taken from the competitors' official model cards. Independent third-party benchmarks had not landed at launch, so treat the comparisons as directional and validate against your own tasks before committing.
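Validating against your own tasks can start very small. The sketch below shows one possible shape for a task-specific pass-rate check in Python. The `Task` class, `pass_rate` function, and the offline `stub_model` are hypothetical names introduced for illustration; in practice the stub would be replaced by a call to whichever model you are evaluating.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Task:
    prompt: str
    check: Callable[[str], bool]   # returns True if the answer is acceptable


def pass_rate(model: Callable[[str], str], tasks: list[Task]) -> float:
    """Fraction of tasks whose model output passes its checker."""
    if not tasks:
        raise ValueError("need at least one task")
    passed = sum(1 for task in tasks if task.check(model(task.prompt)))
    return passed / len(tasks)


# Stand-in model so the sketch runs offline; swap in a real API call.
def stub_model(prompt: str) -> str:
    answers = {"What is 17 * 3?": "51", "Reverse 'abc'.": "cba"}
    return answers.get(prompt, "I don't know")


TASKS = [
    Task("What is 17 * 3?", lambda out: "51" in out),
    Task("Reverse 'abc'.", lambda out: "cba" in out),
    Task("Name the capital of France.", lambda out: "Paris" in out),
]

if __name__ == "__main__":
    print(f"pass rate: {pass_rate(stub_model, TASKS):.0%}")
```

Running the same task list against two or more candidate models gives a like-for-like number on your own workload, which is a better basis for a decision than vendor-reported comparisons alone.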
4. How It Was Trained: No Distillation
Microsoft is unusually explicit about its training philosophy, built on three pillars it calls the Hill-Climbing Machine.
- Capabilities are learned, not inherited. MAI-Thinking-1 was trained without distillation from third-party models. Microsoft argues that an imitator is tied to the design choices of its teacher and struggles to adapt, so forcing the model to learn tasks directly yields more steerable behavior.
- Clean data. Training used clean, appropriately licensed data, with AI-generated content excluded from pre-training. Microsoft frames this as essential for quality, provenance, and control.
- Self-sufficiency across the stack. From co-designing with Microsoft's own Maia 200 accelerators through to the reinforcement learning framework, the training infrastructure is in-house.
For agentic coding specifically, Microsoft built verified training environments that are deterministic, executable, and graded by real test suites. That gives the model practice on the multi-step work developers actually do: reading code, editing files, running tests, observing failures, and recovering from intermediate mistakes. This is the same pattern we see across strong coding models, and it pairs well with the practices in our eval-driven development guide.
5. Enterprise Features & Safety
MAI-Thinking-1 was built with enterprise readiness in mind. Beyond the 256K context window, it supports function calling, layered developer instructions, and a default style aligned to enterprise needs. It is compatible with the Chat Completions API, so swapping it into existing tooling is straightforward, and it ships with enterprise-grade security and compliance through Microsoft Foundry.
On safety, Microsoft takes a distinctive position. It treats both unsafe compliance and unnecessary refusal as defects in the same reward construction, aggregated by severity of potential harm. Safety is trained with the same reinforcement learning infrastructure used for capability, so safety improvements climb the same hill as capability rather than being bolted on. The stated goal is a model that maintains a safety bar on genuinely sensitive requests while staying helpful on everything else.
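Microsoft has not published the actual reward formula, so the following Python is only a toy illustration of the idea that unsafe compliance and unnecessary refusal can be penalized inside a single reward. The `Judgement` fields, the severity scale, and the fixed over-refusal penalty are all hypothetical.

```python
from dataclasses import dataclass

OVER_REFUSAL_PENALTY = 0.3   # hypothetical constant, not a published value


@dataclass
class Judgement:
    request_is_harmful: bool
    model_complied: bool
    severity: float   # hypothetical scale: 0.0 (benign) to 1.0 (severe)


def safety_penalty(j: Judgement) -> float:
    """Penalty for either kind of defect; zero when behaviour is correct."""
    if j.request_is_harmful and j.model_complied:
        return j.severity              # unsafe compliance, scaled by harm
    if not j.request_is_harmful and not j.model_complied:
        return OVER_REFUSAL_PENALTY    # unnecessary refusal
    return 0.0


def reward(task_score: float, j: Judgement) -> float:
    """Single scalar combining helpfulness with both safety defects."""
    return task_score - safety_penalty(j)
```

Because both failure modes subtract from the same scalar, an optimizer cannot improve its score by refusing everything, nor by complying with everything, which is the intuition behind treating them as defects of the same kind.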
Still apply your own guardrails
Model-level safety training is valuable but not a substitute for application-layer controls. For production deployments, layer input validation, output filtering, and audit logging on top, as covered in our AI agent production guardrails guide.
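The three application-layer controls named above can be sketched as a thin wrapper around any model call. This is a minimal Python illustration, not a complete guardrail system: the input limit, the secret pattern, the log format, and the `guarded_call` name are all assumptions.

```python
import json
import re
import time
from typing import Callable

MAX_INPUT_CHARS = 20_000                              # hypothetical limit
SECRET_PATTERN = re.compile(r"sk-[A-Za-z0-9]{16,}")   # hypothetical key format


def guarded_call(model: Callable[[str], str], prompt: str,
                 audit_log: list[str]) -> str:
    """Wrap a model call with validation, filtering, and logging."""
    # 1. Input validation: reject empty or oversized prompts up front.
    if not prompt.strip():
        raise ValueError("empty prompt")
    if len(prompt) > MAX_INPUT_CHARS:
        raise ValueError("prompt too long")

    raw = model(prompt)

    # 2. Output filtering: redact anything that looks like a secret.
    filtered = SECRET_PATTERN.sub("[REDACTED]", raw)

    # 3. Audit logging: record metadata, not the full prompt text.
    audit_log.append(json.dumps({
        "ts": time.time(),
        "prompt_chars": len(prompt),
        "redacted": filtered != raw,
    }))
    return filtered
```

In a real deployment the audit log would go to durable storage rather than an in-memory list, and the filters would reflect your own data-handling policy.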
6. How to Access MAI-Thinking-1
As of the June 2, 2026 launch, MAI-Thinking-1 is available in private preview on Microsoft Foundry, with a public preview on MAI Playground coming soon. Because it speaks the Chat Completions API, integrating it looks like any other modern chat model.
```http
# Chat Completions style request (illustrative)
POST https://<your-foundry-endpoint>/chat/completions
Authorization: Bearer <FOUNDRY_API_KEY>
Content-Type: application/json

{
  "model": "mai-thinking-1",
  "messages": [
    { "role": "system", "content": "You are a senior engineer." },
    { "role": "user", "content": "Refactor this module and add tests." }
  ],
  "tools": [ /* function-calling schema */ ],
  "max_tokens": 4096
}
```

Microsoft also said that for the first time developers will be able to tune the model weights themselves through partners such as Baseten, alongside the managed Foundry experience. Always confirm the exact endpoint, model identifier, and parameters against the current MAI-Thinking-1 model page before building.
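For teams working in Python, the illustrative request in this section could be assembled with the standard library as sketched below. The tool definition follows the widely used Chat Completions `tools` shape, but the `run_tests` tool, the `build_request` helper, and the endpoint layout are hypothetical, and the `mai-thinking-1` identifier is the article's illustrative value rather than a confirmed one. The sketch only builds the request; it does not send it.

```python
import json
import urllib.request


def build_request(endpoint: str, api_key: str,
                  user_message: str) -> urllib.request.Request:
    """Build (but do not send) a Chat Completions style POST request."""
    payload = {
        "model": "mai-thinking-1",   # illustrative identifier; verify first
        "messages": [
            {"role": "system", "content": "You are a senior engineer."},
            {"role": "user", "content": user_message},
        ],
        "tools": [{
            "type": "function",
            "function": {
                "name": "run_tests",   # hypothetical tool for illustration
                "description": "Run the project's test suite.",
                "parameters": {
                    "type": "object",
                    "properties": {"path": {"type": "string"}},
                    "required": ["path"],
                },
            },
        }],
        "max_tokens": 4096,
    }
    return urllib.request.Request(
        url=f"{endpoint.rstrip('/')}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it would be a matter of passing the result to `urllib.request.urlopen` and decoding the JSON response, once the real endpoint and credentials are in place.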
7. Where MAI-Thinking-1 Fits in Your Stack
MAI-Thinking-1 is a strong default for workloads that reward careful, multi-step reasoning at a controllable cost:
- Agentic coding - reading and editing code, running tests, and recovering from failures across multi-file changes
- Long-document analysis - contracts, research, and policy work that fits inside the 256K window without chunking
- Math and technical reasoning - quantitative work where the AIME results suggest reliable step-by-step logic
- Enterprise assistants - tasks where provenance, compliance, and Foundry security matter as much as raw capability
If you are comparing reasoning models across vendors, our Claude Opus 4.8 vs GPT-5.5 comparison gives useful reference points for the competitive landscape MAI-Thinking-1 is entering.
8. Why Lushbinary for Reasoning-Model Apps
A capable reasoning model is only the starting point. The value comes from wiring it into agents, retrieval, tools, and guardrails that hold up in production. Lushbinary builds those systems end-to-end, and we help teams evaluate new models like MAI-Thinking-1 against their real workloads before committing.
- Reasoning agents - tool use, function calling, and multi-step workflows built on MAI-Thinking-1 or a routed mix of models
- Foundry & Azure deployment - secure, compliant integration with observability and cost controls
- Evaluation harnesses - task-specific benchmarks so your model choice is grounded in your data, not a vendor slide
- Guardrails - input validation, output filtering, and audit logging layered on top of model-level safety
Free Consultation
Want to put MAI-Thinking-1 to work in a real product? Lushbinary will scope your use case, build an evaluation harness, and recommend the right architecture with a realistic timeline, no obligation.
9. Frequently Asked Questions
What is MAI-Thinking-1?
MAI-Thinking-1 is Microsoft AI's first in-house reasoning model, announced at Build 2026 on June 2, 2026. It is a 35B-active, roughly 1T-total parameter sparse Mixture of Experts model with a 256K token context window, trained from scratch on clean, commercially licensed data without distillation from third-party models.
How good is MAI-Thinking-1 at coding and math?
Microsoft reports MAI-Thinking-1 is toe-to-toe with Claude Opus 4.6 on SWE-Bench Pro, reaches 97.0% on AIME 2025 and 94.5% on AIME 2026, and was preferred over Claude Sonnet 4.6 in a blind human evaluation across 1,276 single-turn and multi-turn tasks run by its rating partner Surge.
What is MAI-Thinking-1's context window?
MAI-Thinking-1 supports a 256,000-token context window, which Microsoft says is enough to fit a 600-page document in a single pass. It also supports function calling, developer instructions, and the widely used Chat Completions API.
How can I access MAI-Thinking-1?
MAI-Thinking-1 is available in private preview on Microsoft Foundry as of launch, with a public preview on MAI Playground coming soon. It is also being made available through partners like Baseten, where developers can tune the model weights themselves for the first time.
Was MAI-Thinking-1 trained on OpenAI or other model outputs?
No. Microsoft states MAI-Thinking-1 was trained from the ground up without distillation from third-party models, on clean and appropriately licensed data with AI-generated content excluded from pre-training. Microsoft frames this as part of its push for long-term self-sufficiency and provenance you can trust.
Sources
- Microsoft AI - Introducing MAI-Thinking-1
- Microsoft AI - Building a hill-climbing machine
- Microsoft AI - MAI-Thinking-1 model page
Content was rephrased for compliance with licensing restrictions. Architecture, benchmark, and availability details sourced from official Microsoft AI announcements as of June 2, 2026. All benchmark figures are vendor-reported and may change - always verify on Microsoft's website.