CoreWeave: Demand for AI Inference is Growing Fast
What the Rise of Inference Means for Data Center Site Selection
Business is booming for AI infrastructure specialist CoreWeave. But the story goes beyond the numbers - which are big, with 470 megawatts of data center capacity deployed and 2.2 gigawatts of power under contract.
I was struck by CoreWeave’s recent commentary about its customer demand, and especially the rise of AI inference, which signals a transition in the AI market.
“Inference is the monetization of artificial intelligence, and we are extremely excited to see that use case expanding within our infrastructure,” said CoreWeave CEO Michael Intrator on the company’s earnings call last month. “AI applications are beginning to permeate all areas of the economy, both through startups and enterprise, and demand for our cloud AI services is aggressively growing.”
Training, Inference and Infrastructure
A quick recap:
Training is the process of building the AI model, crunching large volumes of data into an algorithm.
Inference is the process of using the trained model to respond to user queries, like a chatbot prompt.
Training is compute-intensive but not latency-sensitive, while some - but not all - inference is latency-sensitive.
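For readers who like to see the distinction in code, here is a minimal, illustrative sketch in PyTorch - the tiny model and random data are stand-ins, not anything CoreWeave actually runs. Training updates the model’s weights from batches of data; inference freezes those weights and runs a forward pass per query.

```python
# Minimal, illustrative contrast between training and inference (PyTorch).
# The tiny model and random data stand in for real workloads.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# Training: crunch batches of labeled data and update the model's weights.
features = torch.randn(32, 16)           # a batch of training examples
labels = torch.randint(0, 2, (32,))      # their labels
optimizer.zero_grad()
loss = loss_fn(model(features), labels)  # forward pass
loss.backward()                          # backward pass (gradients)
optimizer.step()                         # weight update

# Inference: the trained weights are frozen; each query is a forward pass only.
model.eval()
with torch.no_grad():
    query = torch.randn(1, 16)           # a single user query
    prediction = model(query).argmax(dim=1)
```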
The ongoing shift from training to inference has implications for data center infrastructure. Users generally want results fast, suggesting that the rise of inference will shift more capacity closer to users (dare I say “closer to the edge”?).
But the recent emergence of AI “chain of reasoning,” in which the model takes time to break a complex query into steps, works differently, as Intrator explains.
“When you are in a chain of reasoning query, latency is not particularly important. The compute is going to be more impactful than the latency or the relative distance to the query. If you’re in a different type of workload, latency becomes more important.
“Our approach has been, since the early days, to try and place our infrastructure as close to the population centers as we can in order to have the optionality associated with a low latency solution.
“Having said that, as we see chain of reasoning gain more traction, there’s definitely going to be significant demand for latency insensitive workloads that will be able to live in more remote regions.”
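A rough back-of-envelope illustrates Intrator’s point. The figures below are assumptions for illustration, not CoreWeave data: when a chatbot answers in roughly half a second, an extra 60 milliseconds of network round trip is a noticeable slice of the wait; when a reasoning query spends two minutes working through intermediate steps, the same 60 milliseconds all but disappears.

```python
# Back-of-envelope: how much of the user-perceived response time is network latency?
# All figures are illustrative assumptions, not measurements.

def latency_share(network_rtt_s: float, compute_s: float) -> float:
    """Fraction of total response time attributable to the network round trip."""
    return network_rtt_s / (network_rtt_s + compute_s)

rtt_remote = 0.060   # 60 ms extra round trip to a remote data center (assumed)

# Interactive chatbot: ~0.5 s of compute before a response appears (assumed).
print(f"chatbot:   {latency_share(rtt_remote, 0.5):.0%} of the wait is network")    # ~11%

# Chain-of-reasoning query: ~120 s spent generating intermediate steps (assumed).
print(f"reasoning: {latency_share(rtt_remote, 120.0):.0%} of the wait is network")  # ~0%
```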
Following the Power
That’s an important development for large data center campuses that are being built outside traditional business markets, often following power availability. If the balance of AI computing shifts from training to inference, will training-focused campuses in secondary markets continue to be relevant?
This has been a key question for data center investors. The largest data center markets have been business markets with strong network connectivity - think Ashburn, Dallas, Chicago and Silicon Valley. The network intersections become customer magnets, creating cloud clusters and rich ecosystems of data center operators and customers.
That’s changed recently due to two trends:
Many leading markets have become power-constrained, with no new power connections for data centers available for months or years.
Customers are seeking larger campuses, featuring from 300 megawatts to 1 gigawatt or more of power capacity.
That’s why new construction has shifted to markets with ready access to ample power, like Louisiana, Indiana, Mississippi, Wyoming and North Dakota. But these locations often lack the local business markets and network density that have been staples of data center economics.
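To put those campus sizes in perspective, here is a hedged back-of-envelope on what that much power can support - the per-server draw and PUE figures are assumptions for illustration, not vendor or site data.

```python
# Rough scale estimate: how many GPU servers can a campus of a given size power?
# Per-server draw and PUE are assumptions for illustration, not vendor or site data.

def servers_supported(campus_mw: float, server_kw: float = 10.0, pue: float = 1.3) -> int:
    """Servers a campus can power once cooling/facility overhead (PUE) is included."""
    facility_kw_per_server = server_kw * pue
    return int(campus_mw * 1000 / facility_kw_per_server)

for campus_mw in (300, 1000):
    n = servers_supported(campus_mw)
    print(f"{campus_mw} MW ~ {n:,} eight-GPU servers ~ {8 * n:,} GPUs")
# Under these assumptions: 300 MW supports roughly 23,000 servers,
# and 1,000 MW roughly 77,000 servers.
```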
What happens if AI training tails off, and inference requires proximity to end users?
Reasoning and The Case for Fungibility
CoreWeave’s comments on reasoning, and its projection of “significant demand for latency insensitive workloads that will be able to live in more remote regions,” suggest an ongoing business case for these data centers in secondary and tertiary markets.
But design will matter. Last year I wrote about the importance of hybrid data center designs that can support both air-cooled and liquid-cooled infrastructure. Balancing training and inference use cases is part of the same conversation, though the industry often uses a different term.
CoreWeave’s Michael Intrator:
“So when we build our infrastructure, we really build our infrastructure to be fungible, to be able to be moved back and forth seamlessly between training and inference.”
Fungible? I became familiar with “fungibility” from insights provided by two veteran industry watchers. The first was in May, when I spoke with David Liggitt from datacenterHawk for the Data Center Richness podcast.
In July, Daniel Golding expanded on the concept in a LinkedIn essay.
Dan digs into the different types of inference and their latency requirements.
“Training datacenters can be repurposed to batch inference applications even if they are arbitrarily distant from users. And this is highly likely to happen - we don’t need an infinite amount of ML training, which will get much more efficient over time. Inference tasks which require a greater number of tokens - and thus, are more expensive - are likely to be supported at a distance.”
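One way to read the fungibility argument is as a scheduling rule: latency-sensitive interactive inference stays near users, while batch and reasoning-heavy jobs can be routed to remote, power-rich campuses. The sketch below is hypothetical - the site names, threshold, and placement logic are invented for illustration, not how CoreWeave or anyone else actually schedules workloads.

```python
# Hypothetical placement rule illustrating "fungible" capacity: latency-sensitive
# inference goes to metro sites, batch/reasoning jobs to remote, power-rich campuses.
# Site names and the latency threshold are invented for illustration.
from dataclasses import dataclass

@dataclass
class InferenceJob:
    name: str
    interactive: bool     # does a user wait on each response?
    max_rtt_ms: float     # network latency budget the workload can tolerate

METRO_SITES = ["ashburn", "dallas", "santa-clara"]       # near population centers
REMOTE_SITES = ["north-dakota", "wyoming", "louisiana"]  # power-rich, far from users

def place(job: InferenceJob, rtt_threshold_ms: float = 50.0) -> str:
    """Pick a site class based on the job's latency tolerance."""
    if job.interactive and job.max_rtt_ms < rtt_threshold_ms:
        return METRO_SITES[0]     # keep close to users
    return REMOTE_SITES[0]        # batch / reasoning work can live remotely

print(place(InferenceJob("chatbot", interactive=True, max_rtt_ms=30)))             # ashburn
print(place(InferenceJob("overnight-batch", interactive=False, max_rtt_ms=5000)))  # north-dakota
```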
For more on the future of inference, check out Golding’s full essay - “Inference Three Ways - Fungibility is the Way.”