Amazon Web Services just launched an AI agent that could revolutionize how companies handle system outages. The DevOps Agent automatically investigates technical failures and suggests remediations before human engineers even join the call. According to AWS, early testing showed it diagnosing in under 15 minutes a problem that would typically take a veteran engineer hours.
Amazon Web Services is betting that AI can solve one of tech's most stressful problems: figuring out why systems crash and how to fix them fast. The company's new DevOps Agent doesn't just monitor for problems; it actively investigates them like a digital detective, working through multiple hypotheses while human engineers are still getting their morning coffee.
The timing couldn't be better. As companies become increasingly dependent on cloud infrastructure, even minor outages can cost millions. Commonwealth Bank of Australia has already put the tool through its paces, and the results are striking. According to AWS, the AI took under 15 minutes to diagnose a problem that would typically take a seasoned engineer hours.
"By the time the on-call ops team member dials in, they have an incident report with preliminary investigation of what could be the likely outcome, and then suggest what could be the remediation as well," Swami Sivasubramanian, AWS's vice president of agentic AI, told CNBC.
The agent works by pulling data from third-party monitoring tools like Datadog and Dynatrace, then automatically spinning up multiple investigation threads. Instead of waiting for a human to manually check logs, network connections, and database performance, the AI assigns different agents to explore various failure scenarios simultaneously.
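To make the idea of parallel investigation threads concrete, here is a minimal Python sketch. It is not AWS's implementation, which has not been published in this detail. The telemetry values, thresholds, check functions, and the `Finding` type are all hypothetical, and the hard-coded dictionary stands in for data that a real system would pull from a monitoring tool.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Finding:
    hypothesis: str
    score: float  # 0.0 (ruled out) to 1.0 (strongly supported)
    evidence: str


# Hypothetical telemetry snapshot; a real agent would fetch this
# from a monitoring tool rather than hard-code it.
TELEMETRY = {
    "error_log_rate": 0.02,     # fraction of requests that log an error
    "packet_loss": 0.001,       # fraction of packets dropped
    "db_p99_latency_ms": 4200,  # 99th percentile query latency
}


async def check_logs(t: dict) -> Finding:
    await asyncio.sleep(0)  # placeholder for a real log query
    score = min(t["error_log_rate"] / 0.10, 1.0)  # assumed 10% alarm level
    return Finding("application errors", score,
                   f"error rate {t['error_log_rate']:.1%}")


async def check_network(t: dict) -> Finding:
    await asyncio.sleep(0)  # placeholder for a real network probe
    score = min(t["packet_loss"] / 0.05, 1.0)  # assumed 5% alarm level
    return Finding("network degradation", score,
                   f"packet loss {t['packet_loss']:.1%}")


async def check_database(t: dict) -> Finding:
    await asyncio.sleep(0)  # placeholder for a real database query
    score = min(t["db_p99_latency_ms"] / 1000, 1.0)  # assumed 1 s alarm level
    return Finding("database slowdown", score,
                   f"p99 latency {t['db_p99_latency_ms']} ms")


async def investigate(telemetry: dict) -> list[Finding]:
    """Run every hypothesis check concurrently, then rank by support."""
    checks = [check_logs, check_network, check_database]
    findings = await asyncio.gather(*(check(telemetry) for check in checks))
    return sorted(findings, key=lambda f: f.score, reverse=True)


if __name__ == "__main__":
    for finding in asyncio.run(investigate(TELEMETRY)):
        print(f"{finding.score:.2f}  {finding.hypothesis}: {finding.evidence}")
```

With these made-up numbers, the database hypothesis ranks first because its latency is far above the assumed threshold. The point of the sketch is only the shape of the workflow: each failure scenario is checked independently and at the same time, and the ranked findings become the preliminary report waiting for the on-call engineer.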
This isn't Amazon's first rodeo with AI-powered developer tools. Over the summer, the company launched Kiro, a coding assistant that generates and modifies source code from text prompts. But DevOps Agent tackles a different pain point: the high-pressure moments when everything's on fire and customers are complaining.
The competitive landscape is heating up fast. Microsoft's Azure team introduced its own SRE Agent back in May, while startups like Resolve and Traversal are also targeting site reliability engineers with AI assistants. The race reflects how crucial these tools have become as companies struggle with increasingly complex infrastructure.