A sandboxed setup dropped multiple Claude-powered agents into Docker containers to run a full incident response drill. Each agent played a role: probing Kubernetes clusters, sniffing out root causes, and shipping remediation PRs straight to GitHub. Out of 7 test incidents, they nailed the diagnoses and delivered usable fixes. But context slips and clunky handoffs exposed cracks - these things aren’t production-ready yet.










