Tuesday, April 28, 2026

The Folio

Twelve specialist desks. One edition. Built for depth.

tech / ai

Frontier LLM Capability Converges; GPT-5.5, Gemini 3.1 Pro, and Claude Opus 4.7 Reach Parity

When models stop differentiating on benchmark scores, ecosystem lock-in becomes the only moat.

2026-04-28 · 1,247 words · Fact-check: clean

Three weeks in April. Three frontier language models. One converging capability floor.

On April 23, OpenAI released GPT-5.5 (API access April 24) with a fully retrained architecture, priced at $5 per 1 million input tokens and $30 per 1 million output tokens for the Pro tier. Google had moved first, on April 1, with Gemini 3.1 Pro at $2/$12 per 1 million tokens, a 60-percent undercut of GPT-5.5’s inference cost. And Anthropic’s Claude Opus 4.7 (April 16) reached 87.6 percent accuracy on SWE-bench (Software Engineering benchmark), a 6.8-point improvement over its predecessor, without any price increase.

The narrative in late 2025 was clear: OpenAI had the capability lead, Anthropic owned the safety-first positioning, and Google was commodity-priced for volume. That story is already obsolete.

Convergence in the Eval Landscape

Benchmark gaps that once justified premium pricing have compressed to noise margins. On SWE-bench (code generation), the three models now cluster within 2-3 percentage points. On MMLU-Pro (knowledge), they are indistinguishable above 95 percent accuracy. LiveCodeBench (real coding problems) shows similar convergence: all three exceed 70 percent pass rates on live LeetCode-style tasks. GPQA Diamond (PhD-level science questions) reveals the same pattern: all three exceed 82 percent.

[Chart: Frontier model benchmark scores, April 2026 (percent). Bars compare GPT-5.5, Claude 4.7, and Gemini 3.1 Pro on SWE-bench, MMLU-Pro, LiveCodeBench, and GPQA Diamond, on a 0-100 scale. Source: Artificial Analysis leaderboards; published model cards]
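“Noise margins” can be made concrete, at least roughly. A minimal sketch, treating a benchmark score as a binomial proportion over a fixed task set: the 500-task count matches SWE-bench Verified, and the 85 percent accuracy is an illustrative mid-range value, not a published error bar:

```python
import math

# Treat a benchmark score as a binomial proportion over n tasks and
# compute its standard error. n = 500 matches SWE-bench Verified;
# the 85% accuracy is an illustrative mid-range value.

def score_stderr(accuracy: float, n_tasks: int) -> float:
    """Standard error of a pass rate measured on n_tasks items."""
    return math.sqrt(accuracy * (1 - accuracy) / n_tasks)

se = score_stderr(0.85, 500)
print(f"Standard error at 85% on 500 tasks: ±{se * 100:.1f} points")
```

That comes to about ±1.6 points, so a 2-3 point gap between models sits within roughly two standard errors: hard to distinguish from measurement noise.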

This is not a claim that the models are identical. Qualitative performance varies by domain: Claude 4.7 excels at constitutional reasoning and long-context tasks; GPT-5.5 shows stronger planning and instruction-following on multi-step problems; Gemini 3.1 Pro dominates on multimodal tasks (vision + language + reasoning). But the raw capability gap, measured by standard benchmarks, has closed.

The implication is profound. Benchmarks are noisy, task-specific, and partly gamed through training-data inclusion. But they are also the primary metric by which enterprise buyers and developers compare models. When they converge, capability stops being a differentiator.

The Pricing War Signals a Structural Shift

Google’s $2 per 1 million input tokens is the lowest at-scale frontier-model inference price ever offered, undercutting OpenAI by 60 percent. Anthropic’s unchanged pricing ($3/$15) sits in the middle. This is not a temporary loss leader: TrendForce and Artificial Analysis estimate Google’s margin on Gemini 3.1 Pro at 25-35 percent, implying the company is willing to sustain lower margins over the long term to capture market share.

Model             Input ($/M tokens)   Output ($/M tokens)
GPT-5.5           $5                   $30
Gemini 3.1 Pro    $2                   $12
Claude Opus 4.7   $3                   $15
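What the list prices mean per request depends on the token mix. Here is a back-of-the-envelope sketch using the table’s prices; the 2,000-input / 500-output token workload is an illustrative assumption, not a vendor-published figure:

```python
# Per-request cost at April 2026 list prices (from the table above).
# The token counts are illustrative assumptions, not vendor figures.

PRICES = {  # model: (input $/M tokens, output $/M tokens)
    "GPT-5.5": (5.00, 30.00),
    "Gemini 3.1 Pro": (2.00, 12.00),
    "Claude Opus 4.7": (3.00, 15.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at list price."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Assumed typical request: 2,000 input tokens, 500 output tokens.
for model in PRICES:
    cost = request_cost(model, 2_000, 500)
    print(f"{model}: ${cost:.4f}/request, ${cost * 1_000_000:,.0f} per million requests")
```

At that mix, Gemini 3.1 Pro works out to roughly $0.010 per request against GPT-5.5’s $0.025: the same 60-percent spread as the headline input price.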

This divergence reflects fundamentally different strategies. OpenAI is betting that enterprise customers will pay for throughput and reliability, and that the Pro tier (higher performance) justifies $5/$30. Google is betting on volume and ecosystem stickiness: lower price → higher adoption in Google Cloud → higher lock-in through integrations with BigQuery, Workspace, and Android. Anthropic is betting on reliability and trust: unchanged pricing signals confidence that buyers will pay for safety-first alignment and architectural design.

None of these bets has yet proven sustainable. Volume strategies (Google) compress margins and raise questions about long-term profitability. Premium strategies (OpenAI) assume inelastic demand that may not survive visible capability parity. And middle-ground strategies (Anthropic) require sustained differentiation on non-capability dimensions.

The Moat Shifts from Capability to Ecosystem

As raw model capability converges, the competitive moat moves to ecosystem integration. An organization choosing between models is no longer deciding on benchmark scores alone. The question becomes: which model integrates most deeply with our developer tools, CI/CD pipelines, cloud provider, and internal APIs?

This is the same transition that played out in cloud infrastructure between 2015 and 2017. AWS, GCP, and Azure converged on compute, storage, and networking capabilities; none could claim a decisive performance advantage. Differentiation shifted to integration depth: AWS through market share and developer mindshare, GCP through data analytics and machine-learning workflows, Azure through Microsoft Office and enterprise relationships.

The frontier LLM market is entering the same phase. OpenAI’s advantage is embedded integrations in Microsoft Copilot, GitHub Copilot, and enterprise software stacks. Google’s advantage is native integration into Search, Workspace, and Cloud. Anthropic’s advantage is integration into specialized workflows (constitutional AI for compliance, weak-to-strong supervision for model fine-tuning, and emerging enterprise partnerships).

Pricing becomes a commodity play once capability parity is evident. The real competition is in who locks customers in through API contracts, workflow integration, and switching costs.

Implications for Vendor Strategy and Enterprise Adoption

For enterprises, convergence simplifies decision-making. Capability no longer justifies premium pricing. The questions that matter are: How much does the model cost per inference? How well does it integrate into our existing infrastructure? How long will the vendor support this pricing and capability level?

For vendors, convergence raises the stakes. OpenAI can no longer rely on being “the best model” to justify higher prices; it must sell on speed, reliability, and exclusive capabilities (GPT-5.5’s improved planning). Google must deliver on its volume bet: can it convert lower-cost access into sticky adoption? Anthropic must continue to differentiate on safety, interpretability, and alignment (its mechanistic interpretability research feeds into Claude’s safety evaluations).

The second-order question is margin compression. If Gemini 3.1 Pro sets a $2/$12 price floor at scale, how do competitors sustain profitability? OpenAI’s path is higher margins on Pro tier + scale on standard tier. Google’s path is margin from volume and ecosystem lock-in (higher Cloud spend per customer). Anthropic’s path is premium pricing for differentiated use cases (compliance, safety-critical applications) plus broader adoption for standard workloads.
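As a rough illustration of what that floor implies, take the 25-35 percent margin estimates cited earlier and back out a serving cost. This is a sketch under stated assumptions; the 3:1 input-to-output token mix is invented for illustration:

```python
# Implied serving cost per million blended tokens from a gross-margin
# estimate. The 25-35% range is from the analyst estimates cited above;
# the 3:1 input:output token mix is an illustrative assumption.

input_price, output_price = 2.00, 12.00  # Gemini 3.1 Pro, $/M tokens
mix_input = 0.75  # assume 3 input tokens for every 1 output token

blended_revenue = mix_input * input_price + (1 - mix_input) * output_price
for margin in (0.25, 0.35):
    implied_cost = blended_revenue * (1 - margin)
    print(f"At {margin:.0%} margin: ~${implied_cost:.2f} serving cost "
          f"per blended M tokens (revenue ${blended_revenue:.2f})")
```

By this arithmetic, any competitor matching $2/$12 would need to serve a blended million tokens for roughly $3 or less to stay in that margin band.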

None of these paths is risk-free. If price competition intensifies, all three vendors face margin pressure. If capability differentiation re-emerges (from architectural breakthroughs, new techniques, or data advantages), today’s pricing structure could look naive in 12 months.

What’s Next

The frontier LLM market is entering a phase of “good enough” competition. Capability convergence is real; pricing is compressing; differentiation is shifting to integration and trust. This mirrors cloud infrastructure’s transition: fewer companies eventually win, but not through raw capability advantage. Instead, they win through ecosystem depth, customer relationships, and switching costs.

Watch for: How quickly does Gemini 3.1 Pro market share grow at the $2/$12 price? Will OpenAI sustain higher Pro pricing as capabilities flatten? Can Anthropic command premium pricing long-term without broader capability leads?

The answers will determine the structure of the AI software market for the next three years.


Sources

  1. OpenAI, “GPT-5.5 Model Release & API Pricing,” April 24, 2026 (primary source)
  2. Anthropic, “Claude Opus 4.7 Release & SWE-bench Results,” April 16, 2026 (primary source)
  3. Google, “Gemini 3.1 Pro Release, Pricing & Benchmark Performance,” April 1, 2026 (primary source)
  4. Artificial Analysis, “LLM Leaderboard Methodology and Benchmark Definitions,” April 2026