Google's New TPUs: Two Chips for the Agent Era, Not Just a Faster Number

Everyone else is fighting over Nvidia’s latest silicon like it’s the last lifeboat, but Google has always played a different game. Its cloud runs on custom Tensor Processing Units, and the company just announced the eighth generation. But here’s the thing: it’s not just a faster chip. Google split the new TPU into two distinct flavors, and I think that tells us more about where they think AI is headed than any spec sheet ever could.

The new chips are called the TPU8t and the TPU8i. The ‘t’ stands for training, the ‘i’ for inference. That’s a deliberate split. In the past, you’d get one TPU design that was expected to do everything reasonably well. Now, Google is saying the ‘agentic era’—where models don’t just answer questions but take actions, use tools, and run autonomously—requires fundamentally different hardware for the two phases of a model’s life.

Let’s talk about the TPU8t first. Training is the ugly, expensive part of the AI lifecycle. You’re throwing petabytes of data at a model for weeks or months. Google claims the 8t can cut training time for frontier models from months to weeks. That’s not a marginal improvement. If true, it changes the economics of who can afford to train a state-of-the-art model. I’ve seen a lot of ‘training accelerators’ come and go, but this is the first one that feels like it was designed from the ground up for the scale of modern foundation models, not retrofitted from a general-purpose GPU architecture.

Then there’s the TPU8i for inference. This is where the agentic bet gets interesting. Inference isn’t just running a query anymore. An agent might call an API, parse the response, decide on the next action, loop back, and do it all again. That’s a different workload profile than ‘user types prompt, model generates text.’ Google is optimizing the 8i for low latency per token and high throughput for these multi-step, tool-using scenarios. I’m skeptical of marketing buzzwords like ‘agent era,’ but the hardware design actually makes sense if you believe that’s where the industry is going.
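To make that workload difference concrete, here is a minimal sketch of the loop described above. It is not Google's software or any real agent framework: `fake_model`, `fake_tool`, `Step`, and `run_agent` are hypothetical stand-ins I made up for illustration. The point is only that one user request turns into several model calls in sequence, so per-call latency adds up.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    action: str   # either "call_tool" or "finish"
    payload: str  # tool query, or the final answer


def fake_model(history: list[str]) -> Step:
    """Hypothetical stand-in for one inference call to a model."""
    tool_results = [h for h in history if h.startswith("tool:")]
    # Pretend the model wants two tool results before it answers.
    if len(tool_results) < 2:
        return Step("call_tool", f"lookup #{len(tool_results) + 1}")
    return Step("finish", f"answer based on {len(tool_results)} tool results")


def fake_tool(query: str) -> str:
    """Hypothetical stand-in for an external API call."""
    return f"result for {query}"


def run_agent(
    prompt: str,
    model: Callable[[list[str]], Step],
    tool: Callable[[str], str],
    max_steps: int = 8,
) -> tuple[str, int]:
    """Run the decide/act loop and count how many inference calls it took."""
    history = [f"user: {prompt}"]
    inference_calls = 0
    for _ in range(max_steps):
        step = model(history)          # one inference call per loop iteration
        inference_calls += 1
        if step.action == "finish":
            return step.payload, inference_calls
        # Feed the tool output back in so the next call can use it.
        history.append(f"tool: {tool(step.payload)}")
    return "step limit reached", inference_calls


answer, calls = run_agent("compare two chips", fake_model, fake_tool)
print(answer, calls)
```

In this toy run a single prompt costs three inference calls, and each one has to finish before the next can start. That serial dependency is why latency per token matters more for agents than for a one-shot chatbot reply.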

What I don’t see yet is pricing or availability for non-Google customers. The TPU line has historically been a Google Cloud exclusive, and if you’re not on GCP, you’re still stuck in the Nvidia queue. That’s fine for Google’s internal models—Gemini, search, whatever they’re cooking up next—but it doesn’t change the broader market dynamics for everyone else. It’s a competitive advantage for Google Cloud, not a democratization of hardware.

I also wonder about the ecosystem. Nvidia has CUDA, libraries, and a decade of developer mindshare. Google has PyTorch/XLA and JAX, which are good, but they’re not the default. If you’re a startup building an agentic system, are you going to optimize for TPUs or just rent a cluster of H100s or B200s? The answer is probably the latter, unless Google makes it financially stupid not to switch.

Still, I respect the bet. Most hardware roadmaps are just ‘make the number bigger.’ Google is saying the architecture of AI workloads is changing, and the hardware should change with it. Whether the market agrees remains to be seen, but it’s a more interesting approach than just cranking up the clock speed and calling it a day.
