Image generation has become a crowded, fast-moving field, and Microsoft just made a strong claim to the top tier. MAI-Image-2.5, part of the seven-model MAI family announced at Build 2026, launched at No. 2 for image editing on Arena, ahead of Nano Banana 2.1, and at No. 3 for text-to-image. It is Microsoft's strongest image model yet, and it ships in two flavors: a maximum-fidelity model and a faster, cheaper Flash variant.
What makes MAI-Image-2.5 interesting for developers is not just generation quality but editing precision. It supports localized edits that change one object without disturbing the rest of the image, and it preserves facial identity across pose and expression changes. Combined with competitive Foundry pricing, that makes it a credible engine for production image workflows, from e-commerce product shots to presentation visuals.
This guide covers capabilities, Arena benchmarks, pricing for both variants, and how to access the models. For the full MAI lineup, see our Microsoft MAI models developer guide.
1. What MAI-Image-2.5 Is
MAI-Image-2.5 is a unified image model that handles both high-quality text-to-image generation and precise, controllable editing. Microsoft launched two versions at once: MAI-Image-2.5 for maximum fidelity, and MAI-Image-2.5-Flash for fast, scalable production workloads where speed and cost matter more than the last few points of quality.
It is a meaningful step up from MAI-Image-2, with an overall +75 point Arena improvement and the largest gains in Text Rendering (+107) and Cartoon, Anime & Fantasy (+90). Text rendering in particular has historically been a weak spot for image models, so a large jump there is directly useful for anything involving product labels, signage, or UI mockups.
2. Capabilities: Generation & Editing
Microsoft groups the model's strengths into four areas:
- Step-change text-to-image quality - more detailed, coherent images with stronger text rendering, product imagery, and prompt adherence
- Complex visual reasoning - the model understands scene structure, lighting, scale, and spatial relationships, so it can add an object with the correct perspective and shadows
- Fine-grained edit control - precise, localized edits like replacing an object, updating text, or removing motion blur without altering the rest of the image
- Face and identity consistency - it preserves facial identity across edits, holding a recognizable likeness through changes in pose, expression, or viewpoint
Why localized editing matters
Many image models regenerate the whole frame when you ask for a small change, which means faces shift, backgrounds drift, and products lose their exact look. Localized editing that leaves the rest of the image untouched is what makes a model usable for real production work like e-commerce catalogs and brand assets, where consistency is the whole game.
3. Arena Benchmarks
MAI-Image-2.5 is evaluated on Arena, the blind human-preference leaderboard for image models. Microsoft reports it surpasses GPT-Image-1.5 and Nano Banana Pro 2K on Arena scores. The rankings below are as of the June 2, 2026 launch and reflect Arena's live, community-judged leaderboard.
| Arena metric | MAI-Image-2.5 result |
|---|---|
| Image editing | No. 2 (ahead of Nano Banana 2.1) |
| Text-to-image | No. 3 |
| Overall vs MAI-Image-2 | +75 points |
| Text Rendering gain | +107 points vs MAI-Image-2 |
| Cartoon, Anime & Fantasy gain | +90 points vs MAI-Image-2 |
On editing specifically, Microsoft reports that MAI-Image-2.5 wins the majority of 12 editing categories in blind human-preference judging, including image cleanup, backgrounds, shadows, and text. Because Arena rankings are live and community-judged, exact positions shift over time, so check the current leaderboard before quoting a rank.
4. Pricing: Standard vs Flash
Both variants are billed on Microsoft Foundry per million tokens, separated into text input, image input, and image output. The split lets you reason about cost based on whether your workload is generation-heavy (more image output) or edit-heavy (more image input).
| Token type (price per 1M tokens) | MAI-Image-2.5 | MAI-Image-2.5-Flash |
|---|---|---|
| Text input | $5.00 | $1.75 |
| Image input | $8.00 | $1.75 |
| Image output | $47.00 | $19.50 |
The Flash variant is materially cheaper across the board: text input costs 35% of the standard price and image input about 22%. Image output, usually the dominant cost for generation workloads, drops from $47 to $19.50 per 1M tokens, roughly 41% of the standard price. For high-volume production where the absolute top fidelity is not required, Flash is the obvious default, with the standard model reserved for hero assets and detailed editing.
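To make the per-token split concrete, here is a minimal cost sketch in Python. The prices are the ones in the table above; the `Workload` class, the dictionary keys used as model labels, and the monthly token volumes are hypothetical, since the number of tokens a single image consumes is not stated here and should be taken from your own usage data.

```python
from dataclasses import dataclass

# USD per 1M tokens, copied from the pricing table above.
# The dictionary keys are labels for this sketch, not confirmed model IDs.
PRICES = {
    "mai-image-2.5": {"text_in": 5.00, "image_in": 8.00, "image_out": 47.00},
    "mai-image-2.5-flash": {"text_in": 1.75, "image_in": 1.75, "image_out": 19.50},
}


@dataclass
class Workload:
    """Monthly token volumes (your own estimates, not published figures)."""
    text_in: int
    image_in: int
    image_out: int


def monthly_cost(model: str, w: Workload) -> float:
    """Return the estimated monthly cost in USD for one model."""
    p = PRICES[model]
    total = (
        w.text_in * p["text_in"]
        + w.image_in * p["image_in"]
        + w.image_out * p["image_out"]
    )
    return total / 1_000_000


# Hypothetical generation-heavy workload: no image input, lots of image output.
gen_heavy = Workload(text_in=2_000_000, image_in=0, image_out=100_000_000)

for model in PRICES:
    print(f"{model}: ${monthly_cost(model, gen_heavy):,.2f}")
```

With these assumed volumes the script prints $4,710.00 for the standard model and $1,953.50 for Flash. Swapping in an edit-heavy workload (large `image_in`) shows how the balance shifts when input tokens dominate.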
5. How to Access MAI-Image-2.5
MAI-Image-2.5 and MAI-Image-2.5-Flash are available to developers today in Microsoft Foundry and the MAI Playground. They are also on OpenRouter, where developers can reach them through the same API they already use. The models are wired into Microsoft products as well: they are live in PowerPoint for generating presentation visuals and rolling out to OneDrive for precise photo editing.
```http
# Image generation request (illustrative)
POST https://<your-foundry-endpoint>/images/generations
Authorization: Bearer <FOUNDRY_API_KEY>
Content-Type: application/json

{
  "model": "mai-image-2.5-flash",
  "prompt": "A matte ceramic coffee mug on oak, soft morning light, product photography",
  "size": "1024x1024"
}
```

Confirm the exact endpoint, model identifiers, and request format against the current MAI-Image-2.5 announcement and the Foundry model card before building.
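For readers who prefer code to raw HTTP, the illustrative request above could be assembled in Python roughly as follows. This is a sketch under the same assumptions as that request: the `/images/generations` path, the `model`, `prompt`, and `size` fields, and bearer-token auth are illustrative and unconfirmed, and `build_generation_request` is a hypothetical helper. It uses only the standard library and builds the request without sending it.

```python
import json
import urllib.request


def build_generation_request(
    endpoint: str,
    api_key: str,
    prompt: str,
    model: str = "mai-image-2.5-flash",  # identifier from the illustrative request
    size: str = "1024x1024",
) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the illustrative call."""
    body = json.dumps({"model": model, "prompt": prompt, "size": size}).encode("utf-8")
    return urllib.request.Request(
        url=f"{endpoint.rstrip('/')}/images/generations",  # assumed path
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_generation_request(
    endpoint="https://example.invalid",  # placeholder, replace with your endpoint
    api_key="FOUNDRY_API_KEY",           # placeholder, load from a secret store
    prompt="A matte ceramic coffee mug on oak, soft morning light, product photography",
)
print(req.get_method(), req.full_url)

# To actually send it once the endpoint and schema are confirmed:
# with urllib.request.urlopen(req, timeout=60) as resp:
#     result = json.load(resp)
```

Keeping request construction separate from sending makes it easy to unit-test the payload and to swap in the confirmed endpoint and schema later.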
6. Safety & Limitations
MAI-Image-2.5 includes layered safety guardrails, including prompt and output filtering, to detect and block harmful or policy-violating content. Microsoft is also clear about the limits: like all image models, it can reflect biases in its training data and may produce plausible but inaccurate or misleading visual details.
Review before sensitive use
Microsoft advises that generated images should be reviewed before use in sensitive contexts, including identity, legal, medical, financial, or news-related workflows. Build a human review step into any pipeline where an inaccurate image could cause real harm.
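One way to wire in that review step is a simple routing gate in front of publishing. The sketch below is a hypothetical design, not anything Microsoft prescribes: the `ImageJob` class, the context tags, and the queue names are all invented for illustration, with the sensitive categories taken from the guidance above.

```python
from dataclasses import dataclass, field

# Sensitive contexts named in the guidance above; the tag strings are assumptions.
SENSITIVE_CONTEXTS = frozenset({"identity", "legal", "medical", "financial", "news"})


@dataclass
class ImageJob:
    """A generated image awaiting publication (hypothetical structure)."""
    job_id: str
    contexts: set[str] = field(default_factory=set)


def needs_human_review(job: ImageJob) -> bool:
    """True if the job touches any sensitive context."""
    return bool(job.contexts & SENSITIVE_CONTEXTS)


def route(job: ImageJob) -> str:
    """Send sensitive jobs to a reviewer; let the rest publish automatically."""
    return "review_queue" if needs_human_review(job) else "auto_publish"


print(route(ImageJob("job-1", {"medical", "marketing"})))
print(route(ImageJob("job-2", {"ecommerce"})))
```

The first job is routed to `review_queue` and the second to `auto_publish`. In practice the context tags would come from your own product metadata or a classifier, and untagged jobs might reasonably default to review.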
7. Where MAI-Image-2.5 Fits
The combination of strong editing, identity consistency, and tiered pricing makes MAI-Image-2.5 a good fit for:
- E-commerce - product imagery, background cleanup, and consistent catalog edits at scale (use Flash for volume)
- Marketing & presentations - on-brand visuals with reliable text rendering for decks and campaigns
- Photo editing products - localized edits that preserve the original scene, similar to the OneDrive integration
- Creative tooling - generation and editing behind a single API, with the standard model for hero assets
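The tiering advice in the list above (Flash for volume, the standard model for hero assets and detailed edits) can be captured in a small routing helper. This is a minimal sketch: the `choose_model` function and its flags are hypothetical, and the `mai-image-2.5` identifier is an assumption by analogy with the Flash identifier.

```python
STANDARD = "mai-image-2.5"       # assumed identifier, confirm on the model card
FLASH = "mai-image-2.5-flash"    # identifier used in the illustrative request


def choose_model(*, hero_asset: bool = False, detailed_edit: bool = False) -> str:
    """Pick the standard model only where top fidelity is worth the price."""
    if hero_asset or detailed_edit:
        return STANDARD
    return FLASH


print(choose_model())                    # bulk catalog work
print(choose_model(hero_asset=True))     # campaign hero image
print(choose_model(detailed_edit=True))  # precise localized edit
```

Defaulting to Flash keeps the expensive tier opt-in, so a caller has to state why a job needs the standard model.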
If you are comparing image models across vendors, our AI image generation comparison puts MAI-Image-2.5 in context with the rest of the field.
8. Why Lushbinary for Image AI
Plugging an image model into a real product means handling prompts, editing workflows, cost control, review steps, and storage at scale. Lushbinary builds image and multimodal features end-to-end, from e-commerce catalog automation to creative tooling.
- Image pipeline development - generation and editing workflows built on MAI-Image-2.5 or a routed mix of models
- Cost optimization - tiering between standard and Flash variants so you only pay for fidelity where it counts
- Review & safety - human-in-the-loop review steps for sensitive use cases and brand control
- Foundry & cloud integration - secure, scalable deployment with storage, CDN, and monitoring
Free Consultation
Building an image generation or editing feature? Lushbinary will scope your use case, recommend the right model tier, and map a realistic build plan with cost projections, no obligation.
9. Frequently Asked Questions
What is MAI-Image-2.5?
MAI-Image-2.5 is Microsoft's strongest image model to date, announced at Build 2026. It handles both text-to-image generation and precise image editing, and ranks No. 2 for image editing on Arena (ahead of Nano Banana 2.1) and No. 3 for text-to-image. A faster, lower-cost MAI-Image-2.5-Flash variant launched alongside it.
How much does MAI-Image-2.5 cost?
On Microsoft Foundry, MAI-Image-2.5 is $5 per 1M text input tokens, $8 per 1M image input tokens, and $47 per 1M image output tokens. MAI-Image-2.5-Flash is cheaper at $1.75 per 1M text input tokens, $1.75 per 1M image input tokens, and $19.50 per 1M image output tokens.
How does MAI-Image-2.5 compare to Nano Banana and GPT-Image?
Microsoft reports that MAI-Image-2.5 surpasses GPT-Image-1.5 and Nano Banana Pro 2K on Arena scores. It ranks No. 3 for text-to-image and No. 2 for image editing, where it sits ahead of Nano Banana 2.1, and it delivers an overall +75 point Arena improvement over MAI-Image-2.
Can MAI-Image-2.5 edit existing images without changing the rest?
Yes. MAI-Image-2.5 supports fine-grained, localized edits such as replacing an object, updating text, or removing motion blur without changing the rest of the image. It also preserves facial identity across edits, maintaining a recognizable likeness through changes in pose, expression, or viewpoint.
Where can I access MAI-Image-2.5?
MAI-Image-2.5 and MAI-Image-2.5-Flash are available to developers in Microsoft Foundry and the MAI Playground, and on OpenRouter. The models are also live in PowerPoint for image generation and rolling out to OneDrive for precise photo editing.
Sources
- Microsoft AI - MAI-Image-2.5 launches at No. 2 for image editing on Arena
- Microsoft AI - Building a hill-climbing machine
- Arena - Image Edit leaderboard
Content was rephrased for compliance with licensing restrictions. Pricing, Arena rankings, and capability details sourced from official Microsoft AI announcements and the Arena leaderboard as of June 2, 2026. Pricing and live leaderboard rankings may change - always verify on Microsoft's website and arena.ai.