Microsoft just fired a major shot across the bow of Amazon and Google in the hyperscaler chip wars. The company unveiled Maia 200, its second-generation AI accelerator built on TSMC's 3nm process, claiming three times the FP4 performance of Amazon's Trainium 3 and superior FP8 performance to Google's TPU v7. Even more striking: Maia 200 delivers 30% better performance per dollar than Microsoft's current fleet, according to Scott Guthrie, EVP of Cloud + AI.
Microsoft isn't playing catch-up anymore in custom AI silicon. The Maia 200 launch represents the company's most aggressive move yet to reduce its dependence on Nvidia while undercutting cloud rivals on both raw performance and economics. The chip went from first silicon to datacenter deployment in half the time of comparable programs, a velocity that suggests Microsoft is betting big on custom hardware as a competitive moat.
The specs tell the story of a chip purpose-built for one thing: inference at scale. Each Maia 200 packs over 140 billion transistors fabricated on TSMC's bleeding-edge 3-nanometer process. It delivers over 10 petaFLOPS in 4-bit precision and over 5 petaFLOPS in 8-bit precision, all within a 750-watt thermal envelope. But Microsoft's real innovation shows up in the memory subsystem, a redesigned architecture centered on 216GB of HBM3e memory running at 7 TB/s bandwidth, plus 272MB of on-chip SRAM and specialized data movement engines that keep models fed without bottlenecks.
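The compute-to-bandwidth ratio buried in those specs is worth making explicit. Here's a minimal roofline sketch, using only the 10 petaFLOPS and 7 TB/s figures from the announcement; the example kernel intensity is an illustrative assumption, not a Microsoft number:

```python
# Back-of-envelope roofline using the figures quoted in the announcement.
PEAK_FLOPS = 10e15   # 10 petaFLOPS at FP4 precision
HBM_BW = 7e12        # 7 TB/s HBM3e bandwidth, in bytes/s

# A kernel becomes compute-bound only once its arithmetic intensity
# (FLOPs performed per byte moved from HBM) exceeds this ridge point.
ridge = PEAK_FLOPS / HBM_BW
print(f"Ridge point: {ridge:.0f} FLOPs/byte")  # ~1429 FLOPs/byte

def attainable_flops(intensity_flops_per_byte: float) -> float:
    """Roofline model: throughput attainable at a given intensity."""
    return min(PEAK_FLOPS, intensity_flops_per_byte * HBM_BW)

# Illustrative assumption: decode-phase matrix-vector work at FP4 sits
# around 4 FLOPs/byte (2 FLOPs per parameter, 2 parameters per byte),
# far below the ridge, so it runs at bandwidth speed, not FLOPS speed.
print(f"{attainable_flops(4.0) / 1e12:.0f} TFLOPS attainable at ~4 FLOPs/byte")
```

At roughly 1,400 FLOPs per byte of ridge point, almost nothing in the inference decode path can keep the math units saturated from HBM alone, which is exactly why the 272MB of on-chip SRAM and the data movement engines matter as much as the headline petaFLOPS.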
"FLOPS aren't the only ingredient for faster AI," Guthrie wrote in today's announcement. "Feeding data is equally important." That's a direct shot at competitors who've prioritized raw compute while letting memory bandwidth become the limiting factor. Microsoft's claiming Maia 200's memory architecture increases token throughput in ways that matter more than peak FLOPS numbers suggest.