Ah, WebArena, where getting math wrong gets a pass. Eight of ten benchmarks stumbled in spectacular style, misjudging AI performance by as much as 100%. Enter the Agentic Benchmark Checklist (ABC), a 43-point lifeline designed to yank these tests out of the abyss and show what AI can actually do.