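A team pointed Claude Code at autoresearch and gave it 16 GPUs on a Kubernetes cluster. In 8 hours the agent ran about 910 experiments and lowered val_bpb (validation bits per byte, lower is better) from 1.003 to 0.974, a 2.87% improvement. Running experiments in parallel rather than one at a time raised throughput roughly 9×. Instead of tuning one setting after another, the agent launched waves of factorial experiments, which showed AR=96 to be the best width setting. The pipeline screened candidates cheaply on H100s and validated the promising ones on H200s. SkyPilot provisioned the clusters, which let the agent handle provisioning itself.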

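The wave-based search described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the team's actual code: the factor levels in GRID, the helper names, and the stand-in scoring function fake_val_bpb are all hypothetical, and no real training, SkyPilot call, or GPU scheduling happens here. It only shows how a full factorial grid splits into waves of 16 and how the lowest-scoring candidates get promoted to a second validation tier.

```python
from itertools import product

NUM_GPUS = 16  # from the text: 16 GPUs were available at once

# Hypothetical factor levels. The text only confirms that AR=96 won;
# the other values and factors are made up for illustration.
GRID = {
    "AR": [64, 96, 128],
    "learning_rate": [0.01, 0.02, 0.04],
    "batch_size": [256, 512],
}


def factorial_configs(grid):
    """Return every combination of factor levels (a full factorial design)."""
    names = list(grid)
    return [dict(zip(names, values)) for values in product(*grid.values())]


def waves(configs, width):
    """Split configs into consecutive batches of at most `width` items."""
    return [configs[i:i + width] for i in range(0, len(configs), width)]


def fake_val_bpb(cfg):
    """Stand-in objective (NOT real training): lowest at AR=96, lr=0.02, bs=512."""
    return (
        1.0
        + abs(cfg["AR"] - 96) / 1000
        + abs(cfg["learning_rate"] - 0.02)
        + abs(cfg["batch_size"] - 512) / 100000
    )


def screen_then_validate(configs, evaluate, top_k=3):
    """Score all configs wave by wave, then return the top_k lowest scorers.

    In a real setup each config in a wave would run concurrently on its own
    GPU (the cheap screening tier), and the returned finalists would be
    re-run on the validation tier.
    """
    scored = []
    for wave in waves(configs, NUM_GPUS):
        scored.extend((evaluate(cfg), cfg) for cfg in wave)
    scored.sort(key=lambda pair: pair[0])  # lower val_bpb is better
    return [cfg for _, cfg in scored[:top_k]]


if __name__ == "__main__":
    configs = factorial_configs(GRID)
    finalists = screen_then_validate(configs, fake_val_bpb)
    print(len(configs), "configs in", len(waves(configs, NUM_GPUS)), "waves")
    print("finalists:", finalists)
```

The point of the factorial layout is that every wave varies several factors at once, so interactions between settings show up in a single pass instead of being missed by one-factor-at-a-time sweeps.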