For the past few years, progress in AI has mostly meant one thing: bigger models trained on more data. Sakana AI, the Tokyo research lab, just shipped a product built on a different bet. On June 22, 2026, it launched Sakana Fugu, a multi-agent orchestration system that you talk to as if it were a single foundation model. Send a request to one OpenAI-compatible endpoint, and Fugu decides whether to answer directly or to assemble a team of specialist models behind the scenes.
The twist that makes Fugu interesting is that the orchestrator is itself a language model. Fugu was trained to call other LLMs in an agent pool, including instances of itself recursively, and to handle model selection, delegation, verification, and synthesis on its own. Sakana frames this as the next frontier: not owning the single smartest model, but learning how to route across many strong ones.
This guide breaks down what actually shipped: the orchestration idea, how the architecture works, the two tiers, the pricing, the vendor-reported benchmarks and their caveats, and where an orchestration model is the right tool versus where a single model still wins.
What This Guide Covers
- What Is Sakana Fugu?
- Orchestration Models: The Idea Behind Fugu
- How Fugu Works: Route, Delegate, Verify, Synthesize
- Fugu vs Fugu Ultra: The Two Tiers
- Pricing and Access
- Benchmarks and the Caveats That Come With Them
- Where Fugu Fits (and Where It Does Not)
- Why Lushbinary for Orchestration-Model Builds
1What Is Sakana Fugu?
Sakana Fugu is a multi-agent orchestration system delivered as a single, OpenAI-compatible model API. From the caller's side it looks like any other chat model: one base URL, one API key, one chat/completions request. What happens inside is the difference. Fugu treats a set of high-performing models as an agent pool and combines them according to your input, managing the whole exchange so you do not have to.
Sakana's own framing on the release page is that Fugu is "a language model trained to call various LLMs in an agent pool, including instances of itself recursively." In plain terms: the thing routing your request is not a rules engine or a classifier bolted onto a gateway, it is a model that learned coordination as a skill.
The launch also drew attention for its timing. It arrived days after a US export-control directive cut global access to Anthropic's frontier Fable 5 and Mythos 5 models, and Sakana pitched Fugu as a way to reach frontier-level capability without depending on any single vendor. We cover that angle in depth in our Fugu Ultra vs Fable 5 and Mythos comparison.
2Orchestration Models: The Idea Behind Fugu
The argument Sakana makes is straightforward. Hard, real-world tasks need a mix of skills: planning, coding, math, retrieval, careful checking. No single benchmark, and arguably no single model, is best at all of them at once. So the highest performance comes from collective intelligence: knowing which model to use for each part, delegating the work, and combining domain-specific strengths while routing around individual weaknesses.
Teams have done this manually for a while with multi-agent frameworks and LLM gateways. The catch is that those systems are complex to build and tune: you write the routing rules, the verification steps, the fallback logic, and you maintain all of it. Fugu's pitch is that the orchestration itself is learned and packaged behind one API, so you get the benefit of a coordinated team without standing up the coordination yourself.
The core shift
Monolithic models compete on size and training. Orchestration models compete on coordination: which specialist to call, when to verify, and how to combine answers. The unit of competition moves from one model to the system that conducts many.
3How Fugu Works: Route, Delegate, Verify, Synthesize
When a request arrives, Fugu runs it through a learned coordination process rather than a fixed pipeline. For a simple prompt it can answer directly to keep latency and cost low. For a complex, multi-step task it assembles a coordinated group of models and assigns roles, commonly described as a Thinker that plans, a Worker that executes, and a Verifier that checks the work, before synthesizing a single answer.
Two design choices stand out. First, the orchestrator can call instances of itself recursively, so a hard subtask can be decomposed again rather than forced onto one model. Second, the coordination is learned, which means Fugu can pick up collaboration patterns that a human writing routing rules might not think to encode.
Reporting around the launch describes a compact orchestrator, on the order of a 7B-parameter conductor trained with reinforcement learning, steering much larger frontier models in the pool. Sakana has not published a full technical paper at the time of writing, so treat the internal sizing details as reported rather than confirmed, and weight the behavior you can observe through the API over architecture claims.
4Fugu vs Fugu Ultra: The Two Tiers
Sakana ships two tiers behind the same API, so you choose per request which behavior you want.
| Tier | Built for | Tradeoff |
|---|---|---|
| Fugu | Everyday, latency-sensitive work: chat, coding, code review | Prioritizes speed and lower cost |
| Fugu Ultra | Hard, multi-step problems where quality matters most | Coordinates a larger pool, so more latency and tokens |
The flagship variant carries the model id fugu-ultra-20260615. A practical pattern is to default to Fugu for interactive and high-volume traffic, and reserve Fugu Ultra for the requests where a wrong answer is expensive: a complex refactor, a multi-file change, a hard reasoning or research task.
5Pricing and Access
As of June 2026, Sakana lists pay-as-you-go pricing for Fugu Ultra at roughly $5 per million input tokens and $30 per million output tokens, alongside subscription plans in the range of $20, $100, and $200 per month. Access is through one OpenAI-compatible API, so existing clients and coding harnesses can point at Fugu by changing the base URL and key.
Watch the internal token use
Because Fugu can fan a request out across several models and verify intermediate work, a single Ultra call can consume more tokens than one call to a single model. Budget against real traffic and confirm current rates and any regional availability limits on Sakana's pricing page before committing.
6Benchmarks and the Caveats That Come With Them
At launch, Sakana reported that Fugu Ultra stands with the frontier on several headline tests. The numbers below are vendor-reported as of June 2026 and not yet independently reproduced.
| Benchmark | Fugu Ultra (reported) | What it measures |
|---|---|---|
| SWE-Bench Pro | 73.7 | Real-world software engineering tasks |
| TerminalBench 2.1 | 82.1 | Agentic command-line and terminal tasks |
| GPQA-Diamond | 95.5 | Graduate-level science reasoning |
There is a subtlety that makes these easy to misread: Fugu Ultra is a system score, not a single-model score. A high number can reflect excellent routing to the right specialist as much as raw model capability, and that is the point of the product. For a benchmark-by- benchmark walkthrough and how to validate the claims yourself, see our Fugu Ultra benchmarks guide.
7Where Fugu Fits (and Where It Does Not)
Fugu is a strong fit when your workload is varied and you would otherwise be building your own router:
- Mixed task types in one product (chat, coding, research) where no single model is best across the board.
- Teams that want frontier-level quality without committing to one vendor or maintaining bespoke multi-agent plumbing.
- Hard, multi-step jobs where the extra latency and tokens of Ultra are worth a more reliable answer.
A single model can still be the better call when:
- Latency and cost predictability dominate, and you want one model with a known token profile per request.
- You need data residency or deployment control that a hosted orchestration API over a third-party pool does not give you, in which case an open-weight model you run yourself may fit better. See our open-weight model comparison.
- The task is narrow and well understood, where a tuned single model plus a small amount of your own routing is cheaper to reason about.
8Why Lushbinary for Orchestration-Model Builds
Adopting an orchestration model is less about the API call and more about the system around it: how you route between Fugu and Fugu Ultra, how you control token spend when a request fans out, how you evaluate quality against your real workload, and how you keep a fallback if a provider or policy shifts under you. That is the engineering we do.
Lushbinary builds AI products and the infrastructure that makes them reliable: model routing, evaluation harnesses, cost controls, and provider-agnostic architectures that survive a vendor change. Whether you want to pilot Fugu against your current stack or design a routing layer you control, we can help you scope it and ship it.
🚀 Free Consultation
Curious whether an orchestration model like Fugu belongs in your stack? Lushbinary will review your workload, recommend a routing and evaluation approach, and give you a realistic plan with no obligation.
❓ Frequently Asked Questions
What is Sakana Fugu?
Sakana Fugu is a multi-agent orchestration system from Sakana AI, launched June 22, 2026, delivered as one OpenAI-compatible model API. Fugu is itself a language model trained to call a pool of other LLMs, and instances of itself recursively, then route, delegate, verify, and synthesize behind one endpoint.
How is Fugu different from a normal LLM?
A normal LLM answers from its own weights. Fugu decides whether to answer directly or to assemble a team of specialist models, assign roles, verify intermediate work, and combine outputs. From the outside it still behaves like a single model.
What are the Fugu and Fugu Ultra tiers?
Fugu is the standard tier, tuned for low latency and everyday work. Fugu Ultra (model id fugu-ultra-20260615) coordinates a larger pool of agents for hard, multi-step tasks where quality matters more than speed.
How much does Sakana Fugu cost?
Per Sakana's June 2026 pricing, Fugu Ultra pay-as-you-go is about $5 per million input tokens and $30 per million output tokens, with subscription plans around $20, $100, and $200 per month. Orchestration can use more tokens internally, so budget against real traffic.
Are Fugu's benchmark results independently verified?
No. The headline scores (Fugu Ultra at SWE-Bench Pro 73.7, TerminalBench 2.1 82.1, GPQA-Diamond 95.5) are vendor-reported as of launch. Validate them with your own evaluation.
Sources
- Sakana AI, "Sakana Fugu: One Model to Command Them All" (release page)
- Sakana Fugu: pricing, API, benchmarks, and how it works
Content was rephrased for compliance with licensing restrictions. Specifications, pricing, and benchmark data sourced from official Sakana AI materials as of June 2026. Benchmark figures are vendor-reported and may change. Always verify on Sakana's website.
Thinking About Orchestration Models?
Tell us about your workload and we will help you decide whether Sakana Fugu, a single model, or your own routing layer is the right fit, then build it with you.
Ready to Build Something Great?
Get a free 30-minute strategy call. We'll map out your project, timeline, and tech stack - no strings attached.
Prefer email? Reach us directly:

