Google Cloud’s Infrastructure GM Sachin Gupta on Building for the Future of AI
While Google still lags behind Amazon and Microsoft in cloud revenue, it is pushing hard to establish itself as the premier destination for AI infrastructure investment.
Speaking at the Google Cloud Summit in London in October 2024, Google’s EMEA President Tara Brady asserted that it was a “fact” Google created generative AI, likely a nod to its 2017 research paper “Attention Is All You Need,” which introduced the transformer architecture now foundational to generative AI models.
Brady cited strong momentum for Google in the AI space, stating that 90 percent of AI unicorns currently use Google Cloud Platform (GCP). This year alone, Google has signed AI-centric deals with Vodafone, Warner Bros. Discovery, Mercedes-Benz, Bayer, Best Buy, Orange, PwC, and others. Pfizer, Hiscox, Toyota, Lloyds Bank, Bupa, and Monzo were also mentioned as AI customers.
AI Demands New Infrastructure Choices
Google Cloud’s General Manager of Infrastructure, Sachin Gupta, said he is thrilled by the pace of AI development across sectors. “It’s exciting to see industries moving from experimentation into real-world scaling and production,” he told DCD.
“AI is forcing a decision,” he said at the summit. “Unlike traditional legacy applications, AI usually requires new infrastructure investment.”
Like other hyperscalers, Google is rapidly expanding its global data centre footprint. Gupta explained that infrastructure design now varies depending on whether the use case is training or inference. Large-scale model training demands high-density clusters with vast network bandwidth, either colocated or in close physical proximity to one another. Inference, on the other hand, requires low-latency, high-availability environments.
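To make that split concrete, the two profiles Gupta describes can be sketched as data, as below. This is a minimal illustration in Python; the field names and descriptions are assumptions paraphrased from his remarks, not Google specifications.

```python
from dataclasses import dataclass

@dataclass
class InfraProfile:
    """Illustrative infrastructure profile; values paraphrase the article, not Google specs."""
    workload: str   # what runs on the infrastructure
    priority: str   # what the design optimises for
    placement: str  # where the hardware needs to sit
    network: str    # the dominant network requirement

# Training: dense clusters stitched together by very high-bandwidth fabric.
TRAINING = InfraProfile(
    workload="large-scale model training",
    priority="throughput across large accelerator clusters",
    placement="colocated, or in close proximity to one another",
    network="vast bandwidth between nodes",
)

# Inference: serving live requests, so latency and uptime dominate the design.
INFERENCE = InfraProfile(
    workload="model inference and serving",
    priority="low latency and high availability",
    placement="in-region, close to the end-user",
    network="fast, reliable paths to clients",
)
```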
The Hypercomputer Vision
A core part of Google’s AI strategy is the “hypercomputer,” a concept unveiled in late 2023. It refers to a unified, end-to-end architecture optimised for AI—spanning physical infrastructure, networking, hardware, and software layers.
“We view it as a full-stack design that brings together all the infrastructure components needed for AI,” said Gupta. This integrated approach, he claims, enables Google to achieve up to four times more performance per GPU or TPU compared to traditional siloed systems—an essential gain in a market where companies are buying GPUs in the tens of thousands.
Three Approaches to AI Infrastructure
Gupta categorised Google’s AI infrastructure approach into three main deployment models:
- Latency-insensitive, non-sovereign – Workloads can be centralised in a few global hubs.
- Sovereignty-sensitive – Training may be global, but fine-tuning and serving must be in-country. Google places GPUs and TPUs locally to comply.
- Latency-critical – Where ultra-low latency is required, infrastructure must be close to the end-user. These cases are still developing.
The first two models are well supported by Google’s existing regional footprint. For the third, Google is offering on-premises cloud solutions.
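As a rough illustration of how those three models partition workloads, the routing can be sketched in a few lines of Python. The flag names and decision order below are illustrative assumptions, not Google’s actual placement policy.

```python
from enum import Enum

class Deployment(Enum):
    GLOBAL_HUB = "centralised in a few global hubs"
    IN_COUNTRY = "GPUs/TPUs placed in-country for fine-tuning and serving"
    ON_PREM_EDGE = "on-premises or edge infrastructure near the end-user"

def place_workload(latency_critical: bool, sovereignty_bound: bool) -> Deployment:
    """Toy router for the three deployment models described above.

    The predicates and their ordering are assumptions for illustration only.
    """
    if latency_critical:
        # Ultra-low-latency cases must sit close to the end-user.
        return Deployment.ON_PREM_EDGE
    if sovereignty_bound:
        # Training may stay global, but fine-tuning and serving stay in-country.
        return Deployment.IN_COUNTRY
    # Latency-insensitive, non-sovereign workloads can be centralised.
    return Deployment.GLOBAL_HUB

# Example: a sovereignty-bound but latency-tolerant workload lands in-country.
print(place_workload(latency_critical=False, sovereignty_bound=True))
```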
Bringing AI to On-Premises and the Edge
It’s no longer realistic to expect enterprises to move every workload to the public cloud. At the London event, Google suggested the industry has passed its last major round of enterprise data centre upgrades, and that many organisations are shifting from “cloud-first” to “cloud-only.” News of cloud repatriation, however, suggests the story isn’t so simple.
Gupta offered a more nuanced view: “For AI, something new is needed. But which path to take? If you want maximum scalability, flexibility, cutting-edge innovation and economies of scale, the public cloud is ideal.”
Still, some customers—particularly in defence, government, banking, and energy—can’t or won’t move to public cloud. For them, Google offers Distributed Cloud: a solution that brings Google Cloud’s capabilities to customers’ own data centres or edge sites, using third-party hardware such as Dell or HPE.
These systems range from compact 1U servers to deployments spanning hundreds of racks. Air-gapped versions are also available for sensitive use cases. One prominent user is McDonald’s, which is deploying the hardware across thousands of outlets for AI-driven equipment analysis. Others are using the solution to deliver sovereign cloud services.
For organisations that want cloud-like AI capabilities on-premises, without the up-front capital investment in GPUs, Distributed Cloud can offer both lower latency and reduced costs. Common use cases include translation, speech-to-text, enterprise search, and localised inference.
“Some customers are happy to train in the public cloud, then bring the model on-premises to fine-tune or develop applications,” said Gupta.
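That hand-off might look something like the pipeline below. Every function here is hypothetical, named only to show where the public cloud ends and the on-premises environment begins; none of them are real Google Cloud APIs.

```python
# Hypothetical sketch of the train-in-cloud, fine-tune-on-premises pattern.
# None of these functions exist in any real SDK; they only mark the hand-off.

def train_in_public_cloud(dataset_uri: str) -> str:
    """Stand-in for training a base model on public-cloud accelerators."""
    print(f"training on {dataset_uri} in the public cloud...")
    return "gs://example-bucket/base-model"  # placeholder artifact location

def copy_to_on_prem(model_uri: str) -> str:
    """Stand-in for pulling the trained model into the local environment."""
    print(f"copying {model_uri} to the on-premises environment...")
    return "/models/base-model"

def fine_tune_locally(model_path: str, private_data: str) -> str:
    """Stand-in for fine-tuning on data that never leaves the premises."""
    print(f"fine-tuning {model_path} on {private_data} on-premises...")
    return "/models/tuned-model"

base_model = train_in_public_cloud("gs://example-bucket/training-data")
local_copy = copy_to_on_prem(base_model)
tuned_model = fine_tune_locally(local_copy, "/data/private-corpus")
```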
Modernising Enterprise Applications
Application modernisation is another driver of on-prem AI deployments. Many enterprise applications and data still reside in virtual machines across legacy infrastructure.
“As companies invest in AI, they must consider how much of their existing IT estate should transition to a cloud model,” Gupta noted. “Distributed Cloud allows you to migrate on-prem systems into a modernised, cloud-like environment: still on-premises, but with the benefits of cloud APIs.”
Currently, Distributed Cloud supports only Nvidia GPUs (the A100 and H100, with the H200 coming soon). Google’s proprietary TPUs aren’t yet available outside its own data centres, although a scaled-down Edge TPU is sold through Google’s Coral platform. When asked whether TPUs could eventually be deployed on customer premises, Gupta said the company is open to it.
“We could extend to AMD, Intel, or even our own TPUs,” he said. “Right now, Nvidia’s offering meets the current demand, but we’ll continue to adapt as needed.”
Sovereignty as a Key AI Driver
Google’s recent £1 billion data centre investment in the UK underscores how sovereignty is shaping infrastructure planning, particularly in Europe, where capacity is stretched.
“Sovereignty is absolutely part of the decision,” Gupta said. “What infrastructure we deploy, and where, depends entirely on local use cases and regulatory needs. Some customers must keep data and inference within their own borders, so we’re building to support that.”
Calling it a “continuum of sovereignty,” he said Google is building data centres of various sizes and configurations—equipped with GPUs, TPUs, or CPUs as required by each market.
Chips and Flexibility
Google currently offers access to Nvidia GPUs and its own Tensor Processing Units (TPUs), now in their sixth generation—known as Trillium, launched in May 2024. TPUs were originally developed for internal use in 2015 and have been available to customers since 2018.
Competitors such as Oracle, Microsoft, and Amazon offer Nvidia and AMD chips, with Microsoft and Amazon also deploying custom silicon. Among the major cloud providers, IBM is the only one planning to support Intel’s new Gaudi 3 accelerator.
Unlike its rivals, Google hasn’t announced plans to support AMD’s latest MI300X GPU. An April 2024 report suggested Google is content with its current hardware mix. Still, Gupta said the company remains open to customer preferences.
“We’ve always been flexible on CPUs,” he noted. “We’re just as open when it comes to GPUs and AI accelerators. If our customers want AMD or Intel, we’ll absolutely consider them.”