Sakana AI Launches Sakana Fugu: A Model That Routes Tasks Across a Swappable Pool of LLMs

Japan’s Sakana AI has launched a new kind of AI system called Sakana Fugu. Instead of being one model that answers everything, Sakana Fugu acts like a smart manager. It routes each task to the best AI model in a pool, then combines the results into one answer. An LLM, or large language model, is the type of AI that powers chatbots and coding tools. Fugu can call many frontier LLMs, and even call copies of itself.

This is a fresh idea. Most AI products pick one model and stick with it. Sakana Fugu treats models like a team. It decides who should handle each job, checks their work, and merges the best parts. Here is how it works and how it scores against top rivals.

What Sakana Fugu Actually Is

Sakana Fugu is an “orchestration model.” Orchestration means coordinating many parts to work together, like a conductor leading a band. Fugu sits behind a single API endpoint. An API is a doorway that lets apps talk to the AI. Its endpoint works just like OpenAI’s, so developers can plug it in easily.

Here is the clever part. Fugu is itself a language model, but it is trained to call other models. It manages four things on its own: choosing which model to use, handing off the task, checking the answer, and combining results. It does this without fixed rules or hard-coded roles. It learns the best way to delegate.

The “swappable pool” matters too. The set of models behind Fugu can be changed. So as better models appear, Fugu can use them without a full rebuild.

Two Versions: Fugu and Fugu Ultra

Sakana offers two versions for different needs.

Fugu — Balances good performance with low latency. Latency is the delay before you get an answer, so low latency means faster replies. It lets you opt out of specific models, which helps with privacy or compliance rules. It suits coding, code review, and chatbots.
Fugu Ultra — Tuned for the highest quality on hard, multi-step problems. It uses a fixed pool of models with no opt-out. Its current model ID is fugu-ultra-20260615.

Benchmarks & Specs

A benchmark is a standard test used to compare AI models. Higher scores are generally better. Sakana reports the scores below against top rivals: Anthropic’s Opus 4.8, Google’s Gemini 3.1 Pro, and OpenAI’s GPT 5.5. These figures are as reported by Sakana AI.

Benchmark (what it tests)	Fugu	Fugu Ultra	Opus 4.8	Gemini 3.1 Pro	GPT 5.5
SWE Bench Pro (real coding fixes)	59.0	73.7	69.2	54.2	58.6
TerminalBench 2.1 (terminal tasks)	80.2	82.1	74.6	70.3	78.2
LiveCodeBench (live coding)	92.9	93.2	87.8	88.5	85.3
LiveCodeBench Pro	87.8	90.8	84.8	82.9	88.4
Humanity’s Last Exam (hard reasoning)	47.2	50.0	49.8	44.4	41.4
CharXiv Reasoning	85.1	86.6	84.2	83.3	84.1
GPQA-D (graduate science Q&A)	95.5	95.5	92.0	94.3	93.6
SciCode (science coding)	60.1	58.7	53.5	58.9	56.1
τ³ Banking (finance tasks)	21.7	20.6	20.6	8.4	20.6
Long Context Reasoning	74.7	73.3	67.7	72.7	74.3
MRCRv2 (long-context recall)	86.6	93.6	87.9	84.9	94.8

What it means: Fugu Ultra leads on most coding and reasoning tests, while GPT 5.5 still tops one long-context recall test (MRCRv2). In short, routing tasks across many models can beat any single model on a wide range of jobs.

Real-World Test Results

Sakana also shared results from hands-on tasks. These show Fugu working on real problems, not just exam-style tests.

Task	Result (as reported)
AutoResearch experiments	Best mean validation BPB of 0.9774 across 123 experiments in ~14 hours on one H100 GPU
Rubik’s Cube solver	Solved all 300 held-out cubes, averaging 19.72 moves
Classical Japanese kana reading	Normalized edit distance (NED) of 0.80
Online trading test	+19.43% average across five 50-week runs

The system is built on research from two ICLR 2026 papers, called “Trinity” and “Conductor,” which study how to learn good orchestration strategies. Managing how many models talk to each other is partly a memory problem; for a deeper look, see this technical guide to the types of agent memory.

How to Use It

Fugu is available through an OpenAI-compatible API at console.sakana.ai. It works with the standard Python OpenAI client. That means many teams can try it with only small changes to their existing code.

The launch drew mixed early reactions. In a manual review of 12 public posts on June 22, 2026, Sakana noted 3 were supportive, 6 were skeptical, and 3 were critical. So the idea is exciting, but some experts want to see more proof in daily use.

FAQ

What does “orchestration model” mean?

It means a model that coordinates other models. Like a conductor leading a band, Fugu decides which AI should handle each task, checks the work, and combines the results into one answer.

How is Fugu different from a normal chatbot?

A normal chatbot uses one model for everything. Fugu uses a pool of models and routes each task to the best one. It can even call copies of itself for complex jobs.

What is the difference between Fugu and Fugu Ultra?

Fugu is faster and lets you turn off certain models for privacy. Fugu Ultra aims for top quality on hard problems but uses a fixed set of models with no opt-out.

Why it matters (especially for India / founders)

For founders, Fugu points to a smarter way to build with AI. Instead of betting on one model, you can use a system that always picks the best tool for the job. As models improve, a swappable pool keeps you up to date without a costly rebuild.

The opt-out feature is useful for Indian teams that handle sensitive data. You can block certain models to meet privacy or compliance needs. And because Fugu uses an OpenAI-style API, Indian startups can test it quickly with their current code.

The takeaway

Sakana Fugu turns many AI models into one coordinated team. It routes tasks, checks answers, and combines results, and its reported benchmark scores beat top single models on many tests. The early reaction is mixed, but the idea is powerful. If it holds up in real use, “orchestration” could become a common way to build AI products. To see how the wider sector is spending, read about how the FAA is betting $875 million to cut flight delays.

Sources

MarkTechPost — Sakana AI Launches Sakana Fugu: An Orchestration Model That Routes Tasks Across a Swappable Pool of Frontier LLMs

Sakana AI Launches Sakana Fugu: A Model That Routes Tasks Across a Swappable Pool of LLMs

Sakana AI Launches Sakana Fugu: A Model That Routes Tasks Across a Swappable Pool of LLMs

What Sakana Fugu Actually Is

Two Versions: Fugu and Fugu Ultra

Benchmarks & Specs

Real-World Test Results

How to Use It

FAQ

What does “orchestration model” mean?

How is Fugu different from a normal chatbot?

What is the difference between Fugu and Fugu Ultra?

Why it matters (especially for India / founders)

The takeaway

Sources

Related coverage

Leave a Comment Cancel reply

Sakana AI Launches Sakana Fugu: A Model That Routes Tasks Across a Swappable Pool of LLMs

What Sakana Fugu Actually Is

Two Versions: Fugu and Fugu Ultra

Benchmarks & Specs

Real-World Test Results

How to Use It

FAQ

What does “orchestration model” mean?

How is Fugu different from a normal chatbot?

What is the difference between Fugu and Fugu Ultra?

Why it matters (especially for India / founders)

The takeaway

Sources

Related coverage

Related Stories

Disney Store AI Shopping Assistant: How It Could Change the Way You Shop

Oracle Admits It Cut 21,000 Jobs, Blaming AI Deployment in Part

Healthcare AI Platform Xsolis Confirms Data Breach Affecting 1.4 Million People

Leave a Comment Cancel reply