Microsoft just redrew the battle lines in cloud AI infrastructure. The company announced Maia 200 today, a custom-built accelerator designed exclusively for AI inference that delivers triple the performance of Amazon Web Services' third-generation Trainium chip and surpasses Google Cloud's seventh-generation TPU in key benchmarks. Built on TSMC's 3-nanometer process with more than 140 billion transistors, the chip is already running production workloads in Microsoft's Iowa datacenter, powering OpenAI's latest GPT-5.2 models across Azure services. For the hyperscale cloud wars, this is Microsoft staking its claim as the infrastructure leader in the AI era.
Scott Guthrie, Microsoft's Executive Vice President for Cloud and AI, revealed in a company blog post that Maia 200 represents "the most performant, first-party silicon from any hyperscaler" and delivers 30% better performance per dollar than the latest-generation hardware currently powering Azure. That's not a marginal improvement; it's the kind of leap that reshapes cloud economics and could shift billions in AI infrastructure spending.
The chip's specs tell the story of Microsoft's multi-year bet on custom silicon. Fabricated on TSMC's cutting-edge 3-nanometer process, each Maia 200 chip packs over 140 billion transistors into a design optimized specifically for large language model inference. It delivers over 10 petaFLOPS at 4-bit precision (FP4) and more than 5 petaFLOPS at 8-bit precision (FP8), all while staying within a 750-watt power envelope. The memory subsystem features 216GB of HBM3e delivering 7TB/s of bandwidth, paired with 272MB of on-chip SRAM and specialized data movement engines designed to keep massive AI models fed with data at breakneck speed.
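To put those numbers in context, here is a rough back-of-envelope sketch of what the published figures imply for inference. It uses a simple roofline-style calculation and an assumed 200-billion-parameter dense model at FP8; the model size, batch size, and the roofline framing are illustrative assumptions, not figures from Microsoft's announcement.

```python
# Back-of-envelope math using the Maia 200 figures quoted in the article.
# Everything beyond those published constants (the roofline framing, the
# assumed 200B-parameter FP8 model, batch size 1) is an assumption for
# illustration, not a Microsoft benchmark.

PEAK_FP4_FLOPS = 10e15   # >10 petaFLOPS at FP4 (article figure)
PEAK_FP8_FLOPS = 5e15    # >5 petaFLOPS at FP8 (article figure)
HBM_BANDWIDTH  = 7e12    # 7 TB/s HBM3e bandwidth (article figure)
HBM_CAPACITY   = 216e9   # 216 GB HBM3e capacity (article figure)

# Arithmetic intensity (FLOPs per byte moved from HBM) needed before the
# chip becomes compute-bound rather than bandwidth-bound.
ridge_fp4 = PEAK_FP4_FLOPS / HBM_BANDWIDTH
ridge_fp8 = PEAK_FP8_FLOPS / HBM_BANDWIDTH
print(f"Compute-bound above ~{ridge_fp4:.0f} FLOPs/byte at FP4, "
      f"~{ridge_fp8:.0f} FLOPs/byte at FP8")

# Hypothetical decode-phase ceiling: a dense model stored at ~1 byte per
# weight (FP8) must stream all weights once per generated token, so
# single-sequence decode is limited by memory bandwidth, not FLOPS.
N_PARAMS = 200e9                     # assumed 200B-parameter model
bytes_per_token = N_PARAMS * 1.0     # ~1 byte per FP8 weight
max_tokens_per_s = HBM_BANDWIDTH / bytes_per_token
print(f"Upper bound: ~{max_tokens_per_s:.0f} tokens/s per sequence for a "
      f"{N_PARAMS/1e9:.0f}B-parameter FP8 model (decode, batch size 1)")

# Capacity check: roughly how many FP8 weights fit in on-package memory,
# ignoring KV cache and activations.
print(f"216GB of HBM holds roughly a {HBM_CAPACITY/1e9:.0f}B-parameter "
      f"model at FP8 before accounting for KV cache")
```

The sketch shows why the memory subsystem gets as much attention as the FLOPS figures: at small batch sizes, inference throughput is governed by how fast weights can be streamed out of HBM, which is exactly the bottleneck the 7TB/s bandwidth, the 272MB of SRAM, and the dedicated data movement engines are there to attack.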












