Frontier Large Reasoning Models (LRMs) crash into an accuracy wall when tackling overly intricate puzzles, even when their token budget seems bottomless. LRMs exhibit this weird scaling pattern: they fizzle out as puzzles get tougher, while, curiously, simpler models often nail the easy stuff with flair.










