Google has implemented strict caps on Meta’s usage of its Gemini artificial intelligence models, according to a report by the Financial Times.
The restriction stems from a massive compute deficit. Meta approached Google Cloud to purchase a massive amount of Gemini capacity to backstop its internal systems, but Google was forced to reject the full scale of the order because it simply does not have enough computing power to satisfy Meta’s demand without shortchanging other enterprise clients.
1. The Timeline and Project Disruptions
The compute bottleneck has been building behind the scenes for months, according to FT sources:
- The March Deadline: Google privately informed Meta around March 2026 that it could not fulfill the requested API and cloud processing volume for the Gemini models.
- Internal Project Delays: Because Meta was relying on Google’s model infrastructure to power a fraction of its own systems, the unexpected capacity ceiling has disrupted and delayed several of Meta’s internal AI projects.
- The Ad and Moderation Angle: While Meta builds its own open-source models (like Llama), the company has simultaneously been in talks to leverage Google’s proprietary Gemini models. The goal was to fine-tune them with Meta’s ad data to screen for fraudulent promotions and scale automated content moderation across Instagram and Facebook.
[ Meta Seeks Massive Gemini Compute ] ──► Approaches Google Cloud for unprecedented volume
│
▼ (March 2026 Capacity Rejection)
[ Google Enforces Usage Ceilings ] ──► Cites physical infrastructure & hardware constraints
│
▼ (Downstream Impact)
[ Meta Enforces Internal Rationing ] ──► Instructs staff to severely ration AI "token" usage
2. The Token Rationing Mandate
Because Meta felt the infrastructure squeeze more acutely than any other third-party customer, its engineering leadership has had to step in with strict internal usage protocols.
Meta management has explicitly instructed its developer teams to be far more disciplined and efficient with AI tokens—the basic semantic units used to measure how much text and data an AI model processes. By forcing staff to write tighter, less compute-heavy prompts and optimization scripts, Meta is trying to stretch its restricted Gemini allocation as far as possible while it waits for alternative compute pipelines to open.
3. The Wider Industry Silicon Squeeze
While Meta is the most high-profile casualty of the capacity ceiling, the Financial Times noted that several other major Google Cloud enterprise clients have seen their Gemini allocations trimmed, albeit to a lesser degree.
The roadblock highlights a massive paradox currently facing the tech sector: despite hyperscalers spending hundreds of billions of dollars globally to construct data centers and buy Nvidia hardware, the physical supply of energy and server capacity is still failing to keep pace with enterprise AI workloads.
During Alphabet’s Q1 2026 earnings call, Google Cloud reported cross-sectional revenue surging to $20 billion. However, CEO Sundar Pichai explicitly warned investors that persistent computing power constraints were actively capping the cloud unit’s growth ceiling, resulting in an enormous backlog of enterprise orders that nearly doubled quarter-on-quarter.