Anthropic has officially launched Code Review, a sophisticated multi-agent system built directly into the Claude Code platform.
The tool is designed to solve the “pull request (PR) bottleneck” created by the recent explosion of AI-generated code. By using a “team” of specialized AI agents, Anthropic aims to catch subtle logic errors and security flaws before they ever reach a human reviewer.
The Multi-Agent “Team” Approach
Unlike standard linters or basic AI chat windows, Code Review operates as a coordinated workforce:
- Parallel Agents: When a PR is opened, several agents analyze the code simultaneously from different perspectives (e.g., security, performance, logic flow).
- Aggregation & Ranking: A “Lead Agent” then consolidates these findings, removes duplicates, and ranks them by severity using color-coded labels (Red for critical bugs, Yellow for concerns, and Purple for legacy issues).
- Step-by-Step Reasoning: For every bug found, Claude provides a detailed explanation of why it’s problematic and offers a recommended fix that can be applied instantly via Claude Code.
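The fan-out/consolidate flow described above can be sketched roughly as follows. Everything here is an illustrative assumption, not Anthropic's actual implementation: the agent functions, the `Finding` schema, and the severity-to-color mapping are hypothetical stand-ins for the real system.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

# Hypothetical finding record; field names are illustrative, not Anthropic's schema.
@dataclass(frozen=True)
class Finding:
    file: str
    line: int
    severity: int   # 3 = critical, 2 = concern, 1 = legacy
    message: str

# Color-coded severity labels, per the article's Red/Yellow/Purple scheme.
LABELS = {3: "Red", 2: "Yellow", 1: "Purple"}

def security_agent(diff):
    return [Finding("auth.py", 42, 3, "token never expires")]

def performance_agent(diff):
    return [Finding("db.py", 7, 2, "N+1 query in loop")]

def logic_agent(diff):
    # Deliberately overlaps with the security agent to demonstrate deduplication.
    return [Finding("auth.py", 42, 3, "token never expires")]

def lead_agent(diff):
    """Run specialist agents in parallel, then consolidate, dedupe, and rank."""
    agents = [security_agent, performance_agent, logic_agent]
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda agent: agent(diff), agents))
    merged = {f for batch in results for f in batch}    # drop duplicate findings
    ranked = sorted(merged, key=lambda f: -f.severity)  # most severe first
    return [(LABELS[f.severity], f.file, f.line, f.message) for f in ranked]

report = lead_agent("...PR diff...")
```

In this sketch, the duplicate finding from the security and logic agents collapses to one entry, so `report` contains two labeled findings with the critical (Red) one ranked first.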
Performance & “Vampire Bug” Detection
Anthropic has been using this tool internally for several months, reporting that it has tripled the amount of “substantive” feedback developers receive.
- Catching the “Unseen”: In one internal case, the tool flagged an “innocuous-looking” change that would have silently broken an authentication service—a bug that human reviewers had missed.
- Scale of Findings: In large PRs (>1,000 lines), Code Review flags significant issues 84% of the time, averaging 7.5 findings per review. Even in small PRs (<50 lines), it finds notable issues 31% of the time.
Pricing & Availability
Because of the heavy compute required to run multiple agents in parallel, the service is positioned as a premium enterprise tool.
| Detail | Current Status (March 10, 2026) |
| --- | --- |
| Availability | Research Preview for Claude for Teams and Enterprise subscribers. |
| Integration | Direct "Day 0" integration with GitHub as a GitHub Action. |
| Cost | Token-based; averages $15 to $25 per review, scaling with PR complexity. |
| Model | Powered by the latest Claude Opus 4.6, which features a 128K output limit. |
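For teams budgeting around the reported $15-to-$25 range, a back-of-envelope estimate might look like the sketch below. The linear scaling by changed lines is purely an assumption for illustration; Anthropic has not published its pricing formula.

```python
def estimate_review_cost(changed_lines: int,
                         base_cost: float = 15.0,   # reported lower bound per review
                         max_cost: float = 25.0,    # reported upper bound per review
                         full_complexity_lines: int = 1000) -> float:
    """Interpolate between the reported floor and ceiling by PR size.

    The 1,000-line saturation point is a hypothetical assumption, chosen
    only because the article uses >1,000 lines as its "large PR" threshold.
    """
    scale = min(changed_lines / full_complexity_lines, 1.0)
    return round(base_cost + (max_cost - base_cost) * scale, 2)

def monthly_budget(reviews_per_month: int, avg_changed_lines: int) -> float:
    """Rough monthly spend for a team at a given review volume and PR size."""
    return reviews_per_month * estimate_review_cost(avg_changed_lines)
```

Under these assumptions, a team merging ten 500-line PRs a month would budget roughly $200.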
The “Firefox” Benchmark
The launch comes on the heels of a massive security showcase in which Anthropic used this technology (specifically Claude Opus 4.6) to scan the Mozilla Firefox codebase. In just two weeks, the AI discovered 22 security vulnerabilities, 14 of which were classified as "High Severity": nearly 20% of all high-severity bugs Firefox patched in all of 2025.