Read at your own risk: we go deep

What's actually inside a data center?

Nobody publishes server counts. So we built a model to estimate them from what they do publish. Utility permits, EPA generator filings, sustainability reports. This is how it works, what we found, and why we think it matters.

25 real facilities · 7 provider types · 3,421 MW sourced · April 2026

Pick your starting point

We know not everyone reads the same way. Pick what describes you and we'll point you to the right section.

Genuinely curious
You saw "data center CPU estimation" and wanted to know what that actually means.
Great starting point. The short version: data centers use a lot of power. That power has to go somewhere. We worked backwards from the power number to estimate how many CPUs are running inside. Start with Section 01 and follow the thread.

Data person
You want to see the methodology, the numbers, and whether the assumptions hold up.
You want Section 02. The formula, the assumptions, where each variable comes from, and the sensitivity analysis. All of it is sourced from primary documents. The GitHub repo has the Python script, SQL schema, and the full dataset. Scroll past the intro if storytelling is not your thing.

The skeptic
You think estimating CPU counts from power data is too hand-wavy to be useful.
Fair. The honest answer is: the estimates are uncertain. We never claimed otherwise. What we built is a defensible model with explicit assumptions and a confidence band that tells you how uncertain we are. Check Section 03, where we validate against actual public disclosures. Two of our estimates landed within the expected range of the derived figures. That is evidence of plausibility, not proof of accuracy. Read it and then tell us where it breaks.

Why don't they just publish it?

Ask Google how many CPUs are running in their Council Bluffs campus and they will give you their annual environmental report, which tells you how much electricity they used. Ask how many servers that translates to and they will smile politely and change the subject.

It is not that they are hiding something sinister. Server count is competitive intelligence. If you know how many CPUs AWS has, you can infer their cost structure, their capacity headroom, and their expansion plans. That is worth billions of dollars of advantage.

"So if nobody publishes it, how do you know the estimates are right?"

Honestly, we do not know for certain. But we can get close. And where public cross-checks exist, specifically companies that disclosed their GPU count (which implies a host CPU count at a known ratio), we can validate. Two of our estimates came within the expected range of the derived figures. That is not proof. It is evidence of plausibility.

The point of this project is not to publish a number that is definitively correct. It is to demonstrate that a reasonable, defensible estimate is possible from public information alone. That is a different and more interesting claim.

The formula: simpler than you think

Every data center has a contracted power figure. It appears in utility permits, EPA generator filings, and sometimes press releases when the facility opens. That number, divided through three filters, gives you a CPU count.

Total DC Power (MW) / PUE = IT Load (MW)
IT Load (MW) x Utilization x 10^6 = Effective Draw (W)
Effective Draw (W) / Weighted CPU TDP (W) = CPU Count
PUE = Power Usage Effectiveness. A PUE of 1.09 means cooling and power overhead add 9% on top of what the servers themselves draw.
TDP = Thermal Design Power. Watts per CPU socket under load.
Weighted TDP = blend of Intel Xeon (300W), AMD EPYC (320W), ARM custom silicon (75W) per provider mix.

Three variables, each sourced from a primary document: sustainability reports, SEC filings, analyst data. Not pulled from thin air. The model is only as good as its inputs, and we spent most of our time on the inputs.

The confidence band comes from running the formula at the extremes. Minimum CPUs if every socket is a 500W Intel Xeon Granite Rapids flagship. Maximum CPUs if every socket is a 75W ARM chip. The real answer lives somewhere in between, and the width of that band tells you how uncertain we are.
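The whole chain fits in a few lines of Python. This is a sketch, not the project's actual script: the 100 MW facility, the PUE, the utilization, and the socket mix below are invented for illustration, but the structure mirrors the formula above, including the 500 W / 75 W extremes for the band.

```python
# Illustrative sketch of the estimation chain. The facility figures and
# socket mix below are made-up examples, not the model's actual inputs.

def estimate_cpus(total_mw, pue, utilization, weighted_tdp_w):
    """Contracted power -> IT load -> effective draw -> CPU count."""
    it_load_mw = total_mw / pue                    # strip cooling/overhead
    effective_w = it_load_mw * utilization * 1e6   # MW -> W at actual load
    return effective_w / weighted_tdp_w            # divide by watts/socket

# Point estimate with a weighted TDP (hypothetical provider mix).
mix = {300: 0.4, 320: 0.3, 75: 0.3}   # TDP (W) -> share of sockets
weighted_tdp = sum(tdp * share for tdp, share in mix.items())
est = estimate_cpus(total_mw=100, pue=1.09, utilization=0.8,
                    weighted_tdp_w=weighted_tdp)

# Confidence band from the extremes described above.
low = estimate_cpus(100, 1.09, 0.8, 500)   # every socket a 500 W flagship
high = estimate_cpus(100, 1.09, 0.8, 75)   # every socket a 75 W ARM chip
print(f"{est:,.0f} CPUs (band: {low:,.0f} to {high:,.0f})")
```

Note how wide the band is: the all-ARM extreme yields more than six times the all-flagship extreme. That width is the honest part of the model.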

The GPU problem nobody warned us about

Halfway through building this, we realized we had been making a fundamental error for GPU-centric facilities like CoreWeave and Nebius.

When you divide a facility's IT load by CPU TDP, you are implicitly assuming all the power goes to CPUs. But in a GPU data center, roughly 85% of IT power goes to the GPUs. An NVIDIA H100 SXM draws 700 watts. The host CPUs that manage the GPU nodes draw maybe 300 watts combined, and there are only 2 CPUs per 8-GPU node.

"So what did you do about it?"

We added a GPU power correction. For GPU-centric providers, the model strips 85% of the effective IT draw before dividing by CPU TDP. This gives you the host CPU count, not total compute. Which is the right metric because the question being answered is how many CPUs are running, not how much AI horsepower the facility has.

Why does host CPU count matter? If you are a CPU company pitching to a data center, knowing they have 30,000 host CPUs tells you the replacement cycle, the upgrade path, and the competitive landscape. It is a different number from GPU count but equally valuable for the right buyer.
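The correction itself is one extra step before the division. In this sketch, the 85% GPU power share and the roughly 300 W-per-node host CPU draw come from the text above; the PUE, utilization, and 120 MW figure are illustrative placeholders, so the printed number will not reproduce the published table, whose per-provider inputs differ.

```python
# Sketch of the GPU power correction. The 85% GPU share and ~300 W
# combined host CPU draw per node are from the text; PUE, utilization,
# and the 120 MW figure are illustrative, so this will not match the
# model's published per-facility estimates.

GPU_POWER_SHARE = 0.85   # fraction of IT power consumed by GPUs
HOST_CPU_TDP_W = 150     # per-socket draw (2 CPUs ~ 300 W per node)

def estimate_host_cpus(total_mw, pue=1.09, utilization=0.8):
    it_load_w = (total_mw / pue) * utilization * 1e6
    host_cpu_w = it_load_w * (1 - GPU_POWER_SHARE)   # strip GPU draw first
    return host_cpu_w / HOST_CPU_TDP_W

print(f"{estimate_host_cpus(120):,.0f} host CPUs")
```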

Cross-Check Validation
Validated Nebius Mantsala, Finland 60,000 GPUs disclosed, which implies roughly 30,000 host CPUs. Our model estimate: 27,583. Within expected range.
Validated Nebius Kansas City, MO 35,000 GPUs planned, implying roughly 17,500 host CPUs. Our model estimate: 12,872. Within expected range.
Note CoreWeave Plano, TX The 3,500 GPU figure from the press release covers one rack cluster, not the full 120 MW facility. Cross-check not directly comparable to whole-facility estimate.
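The cross-check arithmetic is a one-liner: a disclosed GPU count implies a host CPU count at an assumed CPU-to-GPU ratio. The validations above use roughly one host CPU per two GPUs; real node designs vary (a 2-CPUs-per-8-GPUs layout gives a lower ratio), so the ratio is an explicit parameter, not a constant.

```python
# Disclosed GPU count -> implied host CPU count. The default 0.5
# CPUs-per-GPU ratio is the rough figure used in the cross-checks
# above; actual node designs vary, so treat it as an assumption.

def implied_host_cpus(gpu_count, cpus_per_gpu=0.5):
    return int(gpu_count * cpus_per_gpu)

print(implied_host_cpus(60_000))  # Nebius Mantsala -> 30000
print(implied_host_cpus(35_000))  # Nebius Kansas City -> 17500
```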

What the numbers actually say

602 MW: largest campus (Google Council Bluffs, IA)
2.8M: CPU sockets estimated at that campus alone
8 MW: smallest facility (AT&T Kings Mountain, NC)

Google's Council Bluffs campus is a different universe from everything else in the dataset. At 602 MW across 8 buildings, it dwarfs the next largest facility (Microsoft Boydton VA at 412.5 MW). And the reason is obvious once you look at the CPU mix: Google runs 60% ARM chips (their own Axion silicon, based on Neoverse V2). Those draw 75W each versus 300W for Intel Xeon. Same power budget, roughly four times more sockets.

That is not a minor architectural decision. That is a competitive moat built out of custom silicon.

Facility                 | MW    | CPU Est.  | CPU/MW | Type
Google Council Bluffs IA | 602   | 2,794,342 | 4,641  | CPU-Mix
Microsoft Boydton VA     | 412.5 | 1,330,089 | 3,224  | CPU-Mix
AWS Ashburn VA           | 202.7 | 835,825   | 4,122  | CPU-Mix
Meta Altoona IA          | 142   | 445,699   | 3,138  | CPU-Mix
Nebius Mantsala FI       | 75    | 27,583    | 368    | GPU-Centric (host CPUs)
CoreWeave Plano TX       | 120   | 35,960    | 300    | GPU-Centric (host CPUs)

Notice how GPU-centric facilities have radically lower CPU-per-MW ratios. That is the model working correctly, not a bug. A Nebius facility running 60,000 H100 GPUs has far fewer CPUs per MW than a Google facility whose 2.8 million sockets are mostly low-power Axion ARM chips. Different infrastructure, different ratio, different answer.

The CapEx forecast: where is this going?

The second page of the model forecasts CPU fleet size from 2025 to 2030, using each provider's own CapEx as the anchor: not the Big 4 aggregate, but each company's individual figures from SEC filings and earnings calls.

The critical variable is GPU share of server spend. In 2024, Dell'Oro Group reported accelerated servers at 36 to 40% of OEM server revenue. Goldman Sachs projects that figure reaching the mid-50s by 2026 as Blackwell ramps. By 2030, the model assumes 65% for hyperscalers and 90% for GPU-centric providers.

This has a direct consequence for CPU budgets: as GPU spend grows, the slice available for CPU procurement shrinks as a fraction of total spend, even as absolute CapEx grows. CPU fleet sizes will still expand, but more slowly than the data center capacity numbers might suggest.
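The squeeze is easiest to see with toy numbers: total server spend grows, the GPU share grows faster, and the absolute CPU budget stays nearly flat. The dollar figures below are invented for illustration; only the shares echo the sources cited above.

```python
# Toy illustration of the squeeze: CapEx grows, the GPU share grows
# faster, and the absolute CPU budget barely moves. Dollar figures
# are invented; the shares echo the text's cited sources.

years = {
    2024: (100, 0.38),  # Dell'Oro: accelerated servers at 36-40%
    2026: (130, 0.55),  # Goldman Sachs: mid-50s as Blackwell ramps
    2030: (180, 0.65),  # model assumption for hyperscalers
}
for year, (capex_bn, gpu_share) in years.items():
    cpu_budget = capex_bn * (1 - gpu_share)
    print(f"{year}: ${capex_bn}B total server spend -> "
          f"${cpu_budget:.0f}B left for CPUs")
```

In this toy run, total spend nearly doubles while the CPU slice ends roughly where it started. That is the shape the forecast page is built around.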

"Is this pessimistic about CPUs?"

Not really. General-purpose cloud workloads (databases, web servers, internal tooling, app servers) are not going anywhere. Google, AWS, and Azure all have massive CPU-bound workloads that require continuous fleet refresh. The CPU market is not shrinking. It is just being joined by something much louder.

What we actually built, and how

The interactive model: a single-page HTML/JS tool where you select a real facility and the CPU estimate auto-computes. Power figures are pre-researched from public records, never entered manually. The assumptions panel shows fixed values with hover tooltips linking to the primary source documents. No backend, no dependencies, lives as a static file.

The Python estimation script: runs the full formula chain across all 25 facilities, applies the GPU power correction for GPU-centric providers, generates the CSV dataset with all intermediate values, and validates against public disclosures where available.

The Tableau dashboard: built on the CSV output. Dark map with facility dots sized by CPU estimate, ranking bars, scatter plot of power versus compute, and a sensitivity table showing how estimates shift with PUE changes.

The SQL schema: three tables, six analytical queries. Facility ranking by CPU estimate, CPU density per provider type, PUE sensitivity analysis, utilization sensitivity, cross-check validation, and a flat view for Tableau. Everything lives in the GitHub repo.

Python · SQL (PostgreSQL) · Tableau Public · HTML / CSS / JS · Utility permits · EPA filings · SEC 6-K filings · Baxtel · Datacenter.fyi · Interconnection.fyi

What we would do next

The model is defensible but not complete. Here is what would make it significantly better:

More facilities. 25 is a reasonable starting dataset; 100 would be a credible industry tool. The data collection process is manual right now: permit databases, press releases, sustainability reports. Automating that pipeline with a scraper and structured ingestion would be the real engineering work.

Server-level validation. If you can find a facility where both the MW figure and the server count have been disclosed (rare but it happens in some regulatory filings), you can back-calculate the implied TDP and validate your CPU mix assumptions directly.

Time series. Right now this is a point-in-time snapshot. Facilities expand over years. Tracking capacity growth over time, from permit filing to completion to expansion, would make the forecast page much more grounded in observed data rather than projected CapEx.

"The best models are not the ones that are right. They are the ones that are wrong in ways you understand."

Our responses to your questions

We read everything submitted through the debate panel. Responses we have written are pinned here. Check back; we update this as questions come in.


Who built this

Two people, one shared obsession with data infrastructure, and a lot of time spent reading utility permit databases.

Gargee Nimdeo
Data and BI Engineering

KG Sriram
Product Marketing Manager

Sandeep Vangara
Global Supply Manager