AI & LLMs · April 17, 2026 · 14 min read

Claude Opus 4.7 Developer Guide: Benchmarks, 3x Vision, xhigh Effort & Migration

Anthropic's Opus 4.7 scores 64.3% on SWE-bench Pro, 70% on CursorBench, and 98.5% visual acuity — all at the same $5/$25 pricing. Complete guide to new features, breaking API changes, and migration from 4.6.

Lushbinary Team

AI & Cloud Solutions

On April 16, 2026, Anthropic released Claude Opus 4.7 — its most capable generally available model to date. The numbers tell the story: 64.3% on SWE-bench Pro (up from 53.4%), 70% on CursorBench (up from 58%), and a vision accuracy leap from 54.5% to 98.5%. All at the same $5/$25 per million token pricing as Opus 4.6.

This isn't a paradigm shift — it's a meaningful upgrade across every dimension that matters to developers: better coding, better agentic reasoning, 3x higher image resolution, stricter instruction-following, and a new xhigh effort level that gives you finer control over the quality/cost tradeoff. Anthropic is now running at a $30 billion annualized revenue rate, and Opus 4.7 is the model that has to justify those numbers.

This guide covers everything you need to know: benchmarks, new features, API breaking changes, migration from 4.6, vision capabilities, the competitive landscape against GPT-5.4 and Gemini 3.1 Pro, and how to get the most out of the model in production.

What This Guide Covers

  1. What Changed in Opus 4.7
  2. Benchmark Results: Coding, Vision & Agentic Tasks
  3. High-Resolution Vision: 3.75 Megapixels
  4. The New xhigh Effort Level & Task Budgets
  5. API Breaking Changes & Migration Guide
  6. Opus 4.7 vs GPT-5.4 vs Gemini 3.1 Pro
  7. Self-Verification & Instruction-Following
  8. Cybersecurity Safeguards & Project Glasswing
  9. Pricing, Availability & Claude Model Lineup
  10. When to Use Opus 4.7 vs Sonnet 4.6
  11. Why Lushbinary for Your AI Integration

1. What Changed in Opus 4.7

Opus 4.7 is a direct upgrade to Opus 4.6, continuing Anthropic's roughly two-month release cadence (Opus 4.5 in November 2025, Opus 4.6 in February 2026, Opus 4.7 in April 2026). It's not a new model tier — it's the same Opus class with targeted improvements in five areas:

Self-Verification

Checks its own work before presenting results. Catches logical faults during planning and validates outputs against original requirements.

3x Vision Resolution

Accepts images up to 2,576px on the long edge (3.75 MP). Scores 98.5% on XBOW visual acuity vs 54.5% for Opus 4.6.

Stricter Instruction-Following

Interprets instructions more literally. Explicit prompts produce more predictable results, but implied-context prompts may need adjustment.

New xhigh Effort Level

Five effort levels: low, medium, high, xhigh, max. Claude Code defaults to xhigh. Deeper reasoning than high without the full cost of max.

Longer Autonomous Sessions

Works coherently for hours on complex tasks. 10-15% higher task success rates with fewer instances of stopping mid-task.

Updated Tokenizer

Same input may produce 1.0-1.35x more tokens. Combined with deeper thinking, token usage increases. Mitigate with effort parameter and task budgets.

Anthropic's internal teams use Claude Code daily, and each model release reflects what they learned from the previous one. Intuit describes Opus 4.7 as "catching its own logical faults during the planning phase and accelerating execution." Vercel's team observed it "doing proofs on systems code before starting work, which is new behavior."

2. Benchmark Results: Coding, Vision & Agentic Tasks

Opus 4.7 posted gains across coding, vision, legal, finance, and agentic evaluations. Here are the headline numbers:

| Benchmark | Opus 4.7 | Opus 4.6 | Notable |
|---|---|---|---|
| SWE-bench Pro | 64.3% | 53.4% | +10.9 points |
| SWE-bench Verified | 87.6% | 80.8% | +6.8 points |
| CursorBench | 70% | 58% | +12 points |
| XBOW Visual Acuity | 98.5% | 54.5% | +44 points, generational leap |
| GPQA Diamond | 94.2% | 91.3% | Near-saturation |
| Terminal-Bench 2.0 | 3 new tasks solved | baseline | Tasks no prior model could pass |
| BigLaw Bench | 90.9% | n/a | Harvey, high effort |
| OfficeQA Pro | 21% fewer errors | baseline | Databricks evaluation |
| Notion Agent | +13% resolution | baseline | 93-task internal benchmark |
| General Finance | 0.813 | 0.767 | AlphaSense research-agent |

The CursorBench jump from 58% to 70% is particularly significant — it measures real-world coding assistance quality in the editor most developers actually use. Rakuten reported 3x more production tasks resolved compared to Opus 4.6, with double-digit gains in Code Quality and Test Quality scores. CodeRabbit saw recall improve over 10%, noting the model is "a bit faster than GPT-5.4 xhigh."

On Terminal-Bench 2.0, Opus 4.7 solved three tasks that no previous Claude model (or competing frontier model) could handle, including fixing a race condition that required multi-file reasoning across a complex codebase.

3. High-Resolution Vision: 3.75 Megapixels

Previous Claude models were limited to roughly 1,568 pixels on the long edge (about 1.15 megapixels). Opus 4.7 raises that ceiling to 2,576 pixels — roughly 3.75 megapixels, more than 3x the visual capacity. No API parameter changes needed.

| Capability | Opus 4.6 | Opus 4.7 |
|---|---|---|
| Max resolution | ~1,568px long edge | 2,576px long edge |
| Megapixels | ~1.15 MP | ~3.75 MP |
| Visual acuity score | 54.5% | 98.5% |
| Coordinate mapping | Scale-factor math required | 1:1 pixel mapping |

What this means in practice:

  • Code screenshots at full resolution — no more squinting artifacts or misread variable names
  • Technical diagrams with fine labels and small text rendered accurately
  • Chemical structures and scientific notation parsed correctly (confirmed by Solve Intelligence)
  • Charts and graphs with dense data points interpreted without hallucinating values
  • Computer use coordinates now map 1:1 with actual pixels, eliminating the scale-factor math previously required

⚠️ Token Cost Note

Higher-resolution images consume more tokens. If you're passing images where fine detail isn't critical, downsample before sending to manage costs. The 3.75 MP ceiling is automatic — there's no opt-in, but you control what you send.
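When downsampling before upload, the only arithmetic that matters is capping the long edge while preserving aspect ratio. A minimal sketch; the helper name and the 1,568px target (the pre-4.7 ceiling, a reasonable proxy for "coarse detail is enough") are our own choices, and you would apply the returned dimensions with your image library of choice (e.g. Pillow's `Image.resize`):

```python
def downscale_dims(width: int, height: int, cap: int = 1568) -> tuple[int, int]:
    """Return (width, height) scaled so the long edge is at most `cap` pixels,
    preserving aspect ratio. If the image is already small enough, it is
    returned unchanged."""
    long_edge = max(width, height)
    if long_edge <= cap:
        return width, height
    scale = cap / long_edge
    return round(width * scale), round(height * scale)

# A 2,576 x 1,932 screenshot shrinks to 1,568 x 1,176 before upload
print(downscale_dims(2576, 1932))
```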

4. The New xhigh Effort Level & Task Budgets

Opus 4.7 adds xhigh, a new effort level that sits between high and max. The effort scale now has five levels:

  • low: fast, cheap responses
  • medium: balanced speed/quality
  • high: thorough reasoning
  • xhigh: deep reasoning (new default)
  • max: maximum thoroughness

Claude Code defaults to xhigh for all plans. Hex's CTO noted that "low-effort Opus 4.7 is roughly equivalent to medium-effort Opus 4.6" — meaning the entire capability curve has shifted upward.

Task Budgets (Public Beta)

Task budgets are a new feature that gives the model a rough token target for an entire agentic loop (thinking, tool calls, tool results, and final output). The model sees a running countdown and uses it to prioritize work and wrap up gracefully as the budget runs out.

  • Task budgets are advisory, not hard caps — distinct from max_tokens, which is a hard per-request ceiling the model isn't aware of
  • Minimum task budget is 20,000 tokens
  • For open-ended agentic tasks where quality matters more than speed, Anthropic recommends not setting a task budget

# Using xhigh effort with task budget

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16384,
    thinking={
        "type": "adaptive"
    },
    effort="xhigh",
    task_budget=50000,  # advisory token target
    messages=[{
        "role": "user",
        "content": "Refactor the auth module..."
    }]
)

5. API Breaking Changes & Migration Guide

Opus 4.7 introduces three breaking changes to the Messages API. If you use Claude Managed Agents, there are no breaking API changes.

1. Extended thinking budgets removed

Setting thinking: {"type": "enabled", "budget_tokens": N} now returns a 400 error. Adaptive thinking is the only supported mode. Set thinking: {"type": "adaptive"} explicitly — it's off by default.

2. Sampling parameters removed

Setting temperature, top_p, or top_k to any non-default value returns a 400 error. Omit these parameters entirely and use prompting to guide the model's behavior.

3. Thinking content omitted by default

Thinking blocks still appear in the response stream, but their content is empty unless you opt in with "display": "summarized". If your product streams reasoning to users, the new default will look like a long pause before output begins.

Migration Checklist

# Step 1: Update model name
model = "claude-opus-4-6"  # Before
model = "claude-opus-4-7"  # After

# Step 2: Switch to adaptive thinking
thinking = {"type": "enabled", "budget_tokens": 8192}  # Before (400 error)
thinking = {"type": "adaptive"}  # After

# Step 3: Remove sampling parameters
temperature = 0.7  # Before (400 error)
# Just omit it — use prompting instead

# Step 4: Opt in to thinking display (if needed)
# Add to request: "display": "summarized"

# Step 5: Update max_tokens for headroom
# The new tokenizer may produce 1.0-1.35x more tokens
max_tokens = 8192   # Before
max_tokens = 12000  # After (give headroom)
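Putting the steps above together, a fully migrated request looks roughly like this. A sketch under the article's description of the 4.7 API: the helper name is ours, and parameter support should be verified against the current SDK documentation before shipping.

```python
def build_migrated_request(prompt: str) -> dict:
    """Assemble Messages API kwargs per the migration checklist above."""
    return {
        "model": "claude-opus-4-7",        # step 1: new model ID
        "max_tokens": 12000,               # step 5: headroom for the new tokenizer
        "thinking": {"type": "adaptive"},  # step 2: the only supported mode
        # step 3: temperature / top_p / top_k are omitted entirely
        "messages": [{"role": "user", "content": prompt}],
    }

# Usage (assumes the anthropic SDK):
#   client = anthropic.Anthropic()
#   response = client.messages.create(**build_migrated_request("Fix the failing login test."))
req = build_migrated_request("Fix the failing login test.")
```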

💡 Behavior Changes Worth Noting

Beyond the breaking changes, Opus 4.7 has several behavioral shifts: more literal instruction-following, response length that adapts to task complexity, fewer tool calls by default (raise effort to increase), a more direct and opinionated tone, more regular progress updates during long agentic traces, and fewer subagents spawned by default.

6. Opus 4.7 vs GPT-5.4 vs Gemini 3.1 Pro

The frontier model landscape as of April 2026 is tighter than ever. Here's how Opus 4.7 stacks up against the competition:

| Benchmark | Opus 4.7 | GPT-5.4 | Gemini 3.1 Pro |
|---|---|---|---|
| SWE-bench Pro | 64.3% | 57.7% | 54.2% |
| SWE-bench Verified | 87.6% | 78.2% | 80.6% |
| CursorBench | 70% | n/a | n/a |
| GPQA Diamond | 94.2% | 94.4% | 94.3% |
| Context Window | 1M tokens | 1M tokens | 2M tokens |
| Pricing (in/out) | $5 / $25 | $5 / $25 | $2 / $12 |

The takeaway: Opus 4.7 leads convincingly on coding benchmarks — the tasks most directly tied to real-world developer productivity. On graduate-level reasoning (GPQA Diamond), all three models have converged around 94%, effectively saturating the benchmark. The competitive differentiation has shifted from raw reasoning to applied performance on complex, multi-step tasks.

Gemini 3.1 Pro undercuts both Opus 4.7 and GPT-5.4 at $2/$12 per million tokens and offers a 2M token context window. For cost-sensitive workloads where coding performance isn't the primary concern, it's a strong choice. But for enterprise teams whose workloads demand the highest coding capability, Opus 4.7's lead on SWE-bench justifies the premium.

GPT-5.4 leads on computer use (75% OSWorld, first to beat humans) and professional knowledge work (83% GDPval). If your primary use case is desktop automation or broad knowledge tasks, GPT-5.4 may be the better fit. For coding and agentic workflows, Opus 4.7 is the current leader.

7. Self-Verification & Instruction-Following

Two behavioral changes in Opus 4.7 deserve special attention because they affect how you write prompts and what you can expect from the model.

Self-Verification

Opus 4.7 proactively verifies its own outputs before reporting them. This isn't just "chain of thought" — the model checks its work against the original requirements, catches logical faults during planning, and validates that its output actually solves the stated problem. Vercel's team observed it doing "proofs on systems code before starting work," which is genuinely new behavior.

In practice, this means fewer rounds of "wait, let me check that" from the user. The model catches more of its own mistakes before you see them.

Stricter Instruction-Following

This is a double-edged upgrade. Opus 4.7 interprets instructions more literally than 4.6. If your prompt says "fix the login function," it will fix the login function — it won't also refactor the adjacent auth middleware unless you ask. Notion found it was the first model to pass their "implicit-need tests" — tasks where the model must infer what tools or actions are required rather than being told explicitly.

⚠️ Prompt Adjustment Required

If your existing prompts relied on Opus 4.6 filling in implied context or generalizing instructions, you may need to make them more explicit. The flip side: explicit instructions now produce more predictable, reliable results. This is especially noticeable at lower effort levels.
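A concrete illustration of the adjustment, with entirely hypothetical prompt text and file path (the pattern, not the specifics, is the point): spell out scope, exclusions, and expected side work instead of relying on the model to infer them.

```python
# Before (4.6-era, relies on implied context):
loose = "Fix the login function."

# After (4.7-friendly, scope made explicit):
explicit = (
    "Fix the login function in src/auth/login.py. "   # hypothetical path
    "Do not modify the auth middleware or any other files. "
    "Add a regression test for the bug you fix."
)
```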

8. Cybersecurity Safeguards & Project Glasswing

Opus 4.7 includes automated safeguards that detect and block requests indicating prohibited or high-risk cybersecurity uses. These safeguards are part of Anthropic's broader Project Glasswing initiative and serve as a testing ground for eventual broader release of their more capable Mythos-class models.

The cyber capabilities in Opus 4.7 are intentionally less advanced than what Anthropic's internal Mythos Preview can do. Training included differential reduction of certain cyber capabilities as a safety measure.

Legitimate security professionals can access cybersecurity capabilities through Anthropic's new Cyber Verification Program. This covers vulnerability research, penetration testing, and red-teaming.

Opus 4.7 maintains a similar safety profile to Opus 4.6 with targeted improvements in honesty and resistance to prompt injection attacks. Anthropic's assessment describes it as "largely well-aligned and trustworthy, though not fully ideal."

9. Pricing, Availability & Claude Model Lineup

No price increase. Opus 4.7 maintains the same pricing as Opus 4.6:

| Tier | Cost |
|---|---|
| Standard API | $5 input / $25 output per 1M tokens |
| Prompt caching | Up to 90% savings |
| Batch processing | 50% savings |
| US-only inference | 1.1x standard pricing |
| Claude Pro plan | $20/month (full Opus 4.7 access) |

Full Claude Model Lineup (April 2026)

| Model | Best For | Pricing (in/out) |
|---|---|---|
| Haiku 4.5 | Fast, lightweight tasks | $0.80 / $4 |
| Sonnet 4.6 | Balanced performance & cost | $3 / $15 |
| Opus 4.7 | Complex reasoning, agentic coding | $5 / $25 |
| Mythos Preview | Cybersecurity (restricted) | $25 / $125 |

Opus 4.7 is available across:

  • Claude Pro, Max, Team, and Enterprise subscriptions
  • Claude API as claude-opus-4-7
  • Amazon Bedrock
  • Google Cloud Vertex AI
  • Microsoft Foundry

The 1M token context window is included at standard pricing with no long-context premium. Maximum output is 128K tokens. The tokenizer change means the same input may cost slightly more (1.0-1.35x) due to different token boundaries, but for most workloads the increase is negligible.
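To budget for the tokenizer drift, a worst-case estimate is simple arithmetic on the figures above. A sketch; the function name is ours, and the 1.35x factor is the upper bound the article quotes, not a measured value for your workload.

```python
def estimate_47_input_cost(opus46_input_tokens: int,
                           tokenizer_factor: float = 1.35,
                           price_per_mtok: float = 5.00) -> float:
    """Worst-case input cost on Opus 4.7, given a token count measured on 4.6.

    tokenizer_factor: 1.0-1.35x drift from the new tokenizer (upper bound default).
    price_per_mtok:   $5 per million input tokens at standard API pricing.
    """
    tokens_47 = opus46_input_tokens * tokenizer_factor
    return tokens_47 / 1_000_000 * price_per_mtok

# A 200k-token 4.6 prompt costs at most about $1.35 of input on 4.7
print(round(estimate_47_input_cost(200_000), 2))
```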

10. When to Use Opus 4.7 vs Sonnet 4.6

The Opus/Sonnet split still makes sense. Sonnet 4.6 at $3/$15 handles daily coding tasks, quick questions, and moderate-complexity work extremely well. Opus 4.7 at $5/$25 is for the heavy lifting:

Use Opus 4.7 When:

  • Multi-file refactoring across large codebases
  • Long-running agentic tasks (hours of autonomous work)
  • High-resolution image analysis or computer use
  • Complex debugging requiring multi-step reasoning
  • Legal, financial, or scientific document analysis
  • Production code review where accuracy is critical

Use Sonnet 4.6 When:

  • Day-to-day coding assistance and completions
  • Quick questions and explanations
  • Moderate-complexity tasks with clear scope
  • Cost-sensitive workloads at scale
  • Prototyping and iteration
  • Tasks where speed matters more than depth

For teams using AI coding agents like Cursor, Claude Code, or Kiro, the upgrade path is straightforward: switch your default Opus model to 4.7. The only adjustment needed is reviewing prompts that depended on loose instruction interpretation.

Anthropic also shipped a new /ultrareview slash command in Claude Code that runs a focused review session, flagging bugs and design issues. Pro and Max users get 3 free ultrareviews. And Auto mode is now available for Max users, letting Claude make decisions autonomously with fewer interruptions.

For multi-model routing strategies, consider pairing Opus 4.7 for complex tasks with Sonnet 4.6 for routine work and Haiku 4.5 for high-volume, low-complexity requests. This approach optimizes both cost and quality across your AI workloads.
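The routing strategy above can be sketched as a small dispatch table. The complexity labels and the Sonnet/Haiku model IDs are illustrative assumptions (only `claude-opus-4-7` is confirmed in this article); in production you would replace the label with a real complexity heuristic or classifier.

```python
# Hypothetical routing table based on the tiering described above.
ROUTES = {
    "high":   "claude-opus-4-7",    # multi-file refactors, long agentic runs
    "medium": "claude-sonnet-4-6",  # routine coding work (ID assumed)
    "low":    "claude-haiku-4-5",   # high-volume, low-complexity (ID assumed)
}

def pick_model(complexity: str) -> str:
    """Map a task-complexity label to a model ID, defaulting to the mid tier."""
    return ROUTES.get(complexity, "claude-sonnet-4-6")

print(pick_model("high"))
```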

If you're building multi-agent systems with Claude Code, Opus 4.7's improved multi-agent coordination and longer autonomous sessions make it the clear choice for orchestrator agents, while Sonnet 4.6 can handle individual worker agents cost-effectively.

11. Why Lushbinary for Your AI Integration

Integrating frontier AI models into production systems requires more than swapping an API key. You need prompt engineering tuned to each model's behavior, multi-model routing for cost optimization, proper error handling for agentic workflows, and architecture that scales with your usage.

Lushbinary has deep experience building AI-powered applications with Claude, GPT, and Gemini models. We've shipped production systems that use multi-model routing, agentic coding pipelines, and vision-based document processing — exactly the capabilities that Opus 4.7 excels at.

  • Claude API integration — Messages API, tool use, adaptive thinking, effort levels, and task budgets
  • Multi-model routing — Opus for complex tasks, Sonnet for routine work, Haiku for high-volume requests
  • Agentic workflows — Long-running autonomous tasks with proper error recovery and monitoring
  • Vision pipelines — Document analysis, screenshot understanding, and computer use integration
  • AWS deployment — Amazon Bedrock integration, cost optimization, and infrastructure management

🚀 Free Consultation

Want to integrate Claude Opus 4.7 into your product or migrate from 4.6? Lushbinary specializes in AI-powered applications with frontier models. We'll scope your integration, recommend the right multi-model strategy, and give you a realistic timeline — no obligation.

❓ Frequently Asked Questions

What is Claude Opus 4.7 and when was it released?

Claude Opus 4.7 is Anthropic's most capable generally available model, released on April 16, 2026. It scores 64.3% on SWE-bench Pro, 87.6% on SWE-bench Verified, and 70% on CursorBench, with 3x higher image resolution (3.75 megapixels) and a new xhigh effort level.

How much does Claude Opus 4.7 cost?

Claude Opus 4.7 costs $5 per million input tokens and $25 per million output tokens — unchanged from Opus 4.6. Prompt caching saves up to 90%, and batch processing saves 50%. It is available on Claude Pro ($20/mo), Max, Team, and Enterprise plans.

How does Claude Opus 4.7 compare to GPT-5.4 and Gemini 3.1 Pro?

Opus 4.7 leads on SWE-bench Pro (64.3% vs GPT-5.4's 57.7% and Gemini 3.1 Pro's 54.2%) and CursorBench (70%). On GPQA Diamond, all three converge around 94%. Gemini 3.1 Pro is cheaper at $2/$12 per million tokens but trails on coding benchmarks.

What are the breaking API changes in Claude Opus 4.7?

Three breaking changes: (1) Extended thinking budgets removed — use adaptive thinking instead, (2) temperature/top_p/top_k parameters removed — use prompting to guide behavior, (3) thinking content is empty by default — opt in with display: 'summarized'. The model ID is claude-opus-4-7.

What is the xhigh effort level in Claude Opus 4.7?

xhigh is a new effort level between high and max. It provides deeper reasoning than high without the full cost of max. Claude Code defaults to xhigh for all plans. Hex's CTO noted that low-effort Opus 4.7 is roughly equivalent to medium-effort Opus 4.6.

Should I upgrade from Claude Opus 4.6 to 4.7?

Yes, if you use Opus for complex coding or agentic work. The upgrade is free (same pricing) and delivers +13% coding improvement, 3x vision resolution, and better self-verification. The only adjustment needed is reviewing prompts that relied on loose instruction interpretation, since 4.7 follows instructions more literally.

📚 Sources

Content was rephrased for compliance with licensing restrictions. Benchmark data sourced from official Anthropic announcements and third-party evaluations as of April 17, 2026. Pricing and features may change — always verify on the vendor's website.

Build with Claude Opus 4.7

Need help integrating Opus 4.7 into your product, migrating from 4.6, or designing a multi-model AI architecture? Let's talk.

Build Smarter, Launch Faster.

Book a free strategy call and explore how LushBinary can turn your vision into reality.

Let's Talk About Your Project

Contact Us

Tags: Claude Opus 4.7, Anthropic, AI Models, SWE-bench, CursorBench, Vision AI, Agentic AI, Claude API, Claude Code, AI Coding, GPT-5.4, Gemini 3.1 Pro, Model Migration
