METR tapped 16 devs to squash 136 live bugs with Cursor (Sonnet 3.5/3.7). They clocked 146 h. AI users zipped through code, but stalls, reviews, and IDE lag devoured their lead.
One dev who logged 50+ hours with Cursor unlocked a 38% speedup. That steep learning curve and costly context pivots wipe out gains for everyone else.
Implication: AI tools feel fast, but context switching kills momentum. Measure real gains by looking at full workflow speed, not just bursts.