Nobody publishes server counts. So we built a model to estimate them from what they do publish. Utility permits, EPA generator filings, sustainability reports. This is how it works, what we found, and why we think it matters.
Ask Google how many CPUs are running in their Council Bluffs campus and they will give you their annual environmental report, which tells you how much electricity they used. Ask how many servers that translates to and they will smile politely and change the subject.
It is not that they are hiding something sinister. Server count is competitive intelligence. If you know how many CPUs AWS has, you can infer their cost structure, their capacity headroom, and their expansion plans. That is worth billions of dollars of advantage.
Honestly, we do not know for certain. But we can get close. And where public cross-checks exist, specifically companies that disclosed their GPU count (which implies a host CPU count at a known ratio), we can validate. Two of our estimates came within the expected range of the derived figures. That is not proof. It is evidence of plausibility.
The point of this project is not to publish a number that is definitively correct. It is to demonstrate that a reasonable, defensible estimate is possible from public information alone. That is a different and more interesting claim.
Every data center has a contracted power figure. It appears in utility permits, EPA generator filings, and sometimes press releases when the facility opens. Run that number through three filters and you get a CPU count.
Three variables: PUE, utilization, and average CPU TDP. Each one sourced from a primary document: sustainability reports, SEC filings, analyst data. Not estimated from thin air. The model is only as good as its inputs, and we spent most of our time on the inputs.
The confidence band comes from running the formula at the extremes. Minimum CPUs if every socket is a 500W Intel Xeon Granite Rapids flagship. Maximum CPUs if every socket is a 75W ARM chip. The real answer lives somewhere in between, and the width of that band tells you how uncertain we are.
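Stripped to its essentials, the chain looks something like the sketch below. The function names are ours, and the PUE and utilization values in the example call are placeholder assumptions, not the sourced per-facility inputs the model actually uses.

```python
# Minimal sketch of the estimation chain described above. The PUE and
# utilization values passed in are placeholders; the real model sources
# them per facility from primary documents.

def estimate_cpus(contracted_mw, pue, utilization, avg_cpu_tdp_w):
    """Contracted power -> IT load -> active draw -> socket count."""
    it_load_w = contracted_mw * 1_000_000 / pue   # strip cooling and facility overhead
    active_w = it_load_w * utilization            # not every rack runs flat out
    return int(active_w / avg_cpu_tdp_w)          # watts per socket -> socket count

def confidence_band(contracted_mw, pue, utilization):
    """Run the same formula at the TDP extremes quoted above."""
    low = estimate_cpus(contracted_mw, pue, utilization, 500)   # every socket a 500W Xeon flagship
    high = estimate_cpus(contracted_mw, pue, utilization, 75)   # every socket a 75W ARM chip
    return low, high

# Illustrative only: a 602 MW facility with an assumed PUE of 1.1 and 80% utilization.
print(confidence_band(602, pue=1.1, utilization=0.8))
```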
Halfway through building this, we realized we had been making a fundamental error for GPU-centric facilities like CoreWeave and Nebius.
When you divide a facility's IT load by CPU TDP, you are implicitly assuming all the power goes to CPUs. But in a GPU data center, roughly 85% of IT power goes to the GPUs. An NVIDIA H100 SXM draws 700 watts. The host CPUs that manage the GPU nodes draw maybe 300 watts combined, and there are only 2 CPUs per 8-GPU node.
We added a GPU power correction. For GPU-centric providers, the model strips 85% of the effective IT draw before dividing by CPU TDP. This gives you the host CPU count, not total compute. Which is the right metric because the question being answered is how many CPUs are running, not how much AI horsepower the facility has.
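A minimal sketch of that correction, using the 85% GPU power share quoted above; the function and parameter names are illustrative, not the script's actual interface.

```python
GPU_POWER_SHARE = 0.85  # share of IT power drawn by GPUs in a GPU-centric facility (from the text)

def estimate_host_cpus(contracted_mw, pue, utilization, host_cpu_tdp_w,
                       gpu_centric=False):
    """Same chain as before, minus the power the GPUs themselves consume."""
    it_load_w = contracted_mw * 1_000_000 / pue
    active_w = it_load_w * utilization
    if gpu_centric:
        active_w *= (1 - GPU_POWER_SHARE)   # only the remaining 15% feeds the host CPUs
    return int(active_w / host_cpu_tdp_w)
```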
Why does host CPU count matter? If you are a CPU company pitching to a data center, knowing they have 30,000 host CPUs tells you the replacement cycle, the upgrade path, and the competitive landscape. It is a different number from GPU count but equally valuable for the right buyer.
Google's Council Bluffs campus is a different universe from everything else in the dataset. At 602 MW across 8 buildings, it dwarfs the next largest facility (Microsoft Boydton VA at 412.5 MW). And the reason is obvious once you look at the CPU mix: Google runs 60% ARM chips (their own Axion silicon, based on Neoverse V2). Those draw 75W each versus 300W for Intel Xeon. Same power budget, roughly four times more sockets.
That is not a minor architectural decision. That is a competitive moat built out of custom silicon.
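Back-of-the-envelope math on one MW of IT power (already net of PUE and utilization) shows where that leverage comes from. The 60% ARM share and the 75W and 300W figures come from the paragraph above; treating the remaining 40% as 300W x86 sockets is our simplifying assumption.

```python
it_watts = 1_000_000                 # one MW of IT power, post-PUE, post-utilization

xeon_only = it_watts / 300           # ~3,333 sockets if every socket is a 300W Xeon
arm_only = it_watts / 75             # ~13,333 sockets if every socket is 75W ARM (~4x)

blended_tdp = 0.6 * 75 + 0.4 * 300   # assumed 60/40 ARM/Xeon mix -> 165W average socket
blended = it_watts / blended_tdp     # ~6,060 sockets, roughly 1.8x an all-Xeon build
```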
| Facility | MW | CPU Est. | CPU/MW | Type |
|---|---|---|---|---|
| Google Council Bluffs IA | 602 | 2,794,342 | 4,641 | CPU-Mix |
| Microsoft Boydton VA | 412.5 | 1,330,089 | 3,224 | CPU-Mix |
| AWS Ashburn VA | 202.7 | 835,825 | 4,122 | CPU-Mix |
| Meta Altoona IA | 142 | 445,699 | 3,138 | CPU-Mix |
| Nebius Mantsala FI | 75 | 27,583 | 368 | GPU-Centric (host CPUs) |
| CoreWeave Plano TX | 120 | 35,960 | 300 | GPU-Centric (host CPUs) |
Notice how GPU-centric facilities have radically lower CPU-per-MW ratios. That is the model working correctly, not a bug. A Nebius facility running 60,000 H100 GPUs has far fewer CPUs per MW than a Google facility running 2.8 million CPUs, most of them Axion ARM silicon. Different infrastructure, different ratio, different answer.
The second page of the model forecasts CPU fleet size from 2025 to 2030, using each provider's own CapEx as the anchor. Not the Big 4 aggregate. Their individual company figures from SEC filings and earnings calls.
The critical variable is GPU share of server spend. In 2024, Dell'Oro Group reported accelerated servers at 36 to 40% of OEM server revenue. Goldman Sachs projects that figure reaching the mid-50s by 2026 as Blackwell ramps. By 2030, the model assumes 65% for hyperscalers and 90% for GPU-centric providers.
This has a direct consequence for CPU budgets: as GPU spend grows, the slice available for CPU procurement shrinks as a fraction of total spend, even as absolute CapEx grows. CPU fleet sizes will still expand, but more slowly than the data center capacity numbers might suggest.
Not really. General-purpose cloud workloads (databases, web servers, internal tooling, app servers) are not going anywhere. Google, AWS, and Azure all have massive CPU-bound workloads that require continuous fleet refresh. The CPU market is not shrinking. It is just being joined by something much louder.
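A minimal sketch of that dynamic, under stated assumptions: the 65% hyperscaler share for 2030 comes from the figures above, while the 2025 starting share, the CapEx base, and the growth rate are placeholder numbers, not the per-provider figures the model pulls from filings.

```python
# Illustrative only. GPU share of server spend ramps toward the 65%
# hyperscaler assumption for 2030; the 2025 share, CapEx base, and growth
# rate below are placeholders, not per-provider figures from filings.

years = list(range(2025, 2031))
capex_2025 = 100.0        # server CapEx in 2025, arbitrary units
capex_growth = 0.15       # assumed annual growth
gpu_share_2025, gpu_share_2030 = 0.45, 0.65

for i, year in enumerate(years):
    gpu_share = gpu_share_2025 + (gpu_share_2030 - gpu_share_2025) * i / (len(years) - 1)
    total = capex_2025 * (1 + capex_growth) ** i
    cpu_slice = total * (1 - gpu_share)
    print(f"{year}: total capex {total:6.1f}, CPU slice {cpu_slice:5.1f} ({1 - gpu_share:.0%} of spend)")
```

Even with these placeholder numbers, the CPU slice keeps growing in absolute terms while falling from over half of spend to about a third, which is the shape the forecast page is built around.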
The interactive model: a single-page HTML/JS tool where you select a real facility and the CPU estimate auto-computes. Power figures are pre-researched from public records, never entered manually. The assumptions panel shows fixed values with hover tooltips linking to the primary source documents. No backend, no dependencies, lives as a static file.
The Python estimation script: runs the full formula chain across all 25 facilities, applies the GPU power correction for GPU-centric providers, generates the CSV dataset with all intermediate values, and validates against public disclosures where available.
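The validation step in that script is roughly the following, with the 2-host-CPUs-per-8-GPU-node ratio taken from earlier in the piece; the function names and the tolerance band are ours, for illustration.

```python
CPUS_PER_NODE = 2    # host CPUs per GPU node, per the ratio quoted above
GPUS_PER_NODE = 8

def implied_host_cpus(disclosed_gpu_count):
    """A publicly disclosed GPU count implies a host CPU count at a known ratio."""
    return disclosed_gpu_count * CPUS_PER_NODE / GPUS_PER_NODE

def passes_cross_check(model_estimate, disclosed_gpu_count, tolerance=0.25):
    """Tolerance band is an illustrative assumption, not the model's actual threshold."""
    implied = implied_host_cpus(disclosed_gpu_count)
    return abs(model_estimate - implied) / implied <= tolerance
```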
The Tableau dashboard: built on the CSV output. Dark map with facility dots sized by CPU estimate, ranking bars, scatter plot of power versus compute, and a sensitivity table showing how estimates shift with PUE changes.
The SQL schema: three tables, six analytical queries. Facility ranking by CPU estimate, CPU density per provider type, PUE sensitivity analysis, utilization sensitivity, cross-check validation, and a flat view for Tableau. Everything lives in the GitHub repo.
The model is defensible but not complete. Here is what would make it significantly better:
More facilities. 25 is a reasonable starting dataset. 100 would be a credible industry tool. The data collection process is manual right now: permit databases, press releases, sustainability reports. Automating that pipeline with a scraper and structured ingestion would be the real engineering work.
Server-level validation. If you can find a facility where both the MW figure and the server count have been disclosed (rare but it happens in some regulatory filings), you can back-calculate the implied TDP and validate your CPU mix assumptions directly.
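A sketch of that back-calculation; the default PUE, utilization, and sockets-per-server values are assumptions you would replace with the facility's own disclosed or sourced figures.

```python
def implied_cpu_tdp(contracted_mw, disclosed_servers, pue=1.2,
                    utilization=0.8, sockets_per_server=2):
    """Solve the estimation chain in reverse: disclosed server count -> implied watts per socket."""
    it_load_w = contracted_mw * 1_000_000 / pue
    active_w = it_load_w * utilization
    sockets = disclosed_servers * sockets_per_server
    return active_w / sockets
```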
Time series. Right now this is a point-in-time snapshot. Facilities expand over years. Tracking capacity growth over time, from permit filing to completion to expansion, would make the forecast page much more grounded in observed data rather than projected CapEx.
Two people, one shared obsession with data infrastructure, and a lot of time spent reading utility permit databases.