A sneaky bug in PyTorch’s MPS backend let in-place ops like addcmul_ silently do nothing on non-contiguous tensors. That’s optimizer-breaking stuff: Adam’s moment updates go through addcmul_, so affected parameters simply stopped training. The culprit? The Placeholder abstraction - which quietly copies non-contiguous tensors into contiguous temp buffers before handing them to the kernel - never wrote the results back to the original tensor.
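
To make the failure mode concrete, here’s a minimal repro sketch, assuming a macOS machine with an MPS device and an affected PyTorch build (the device check and tensor shapes are illustrative, not from the original report). The idea: take a non-contiguous view of an MPS tensor, run an in-place addcmul_, and observe that the underlying storage is unchanged, while the same code on CPU updates as expected.

```python
import torch

# Hedged repro sketch: requires Apple silicon with the MPS backend available.
assert torch.backends.mps.is_available(), "needs an MPS device"

base = torch.zeros(4, 4, device="mps")
view = base.t()                        # transpose -> non-contiguous view of base
t1 = torch.ones(4, 4, device="mps")
t2 = torch.ones(4, 4, device="mps")

view.addcmul_(t1, t2, value=1.0)       # should add 1.0 to every element of view

print(view.is_contiguous())            # False
print(base)                            # on an affected build: still all zeros
                                       # on a fixed build: all ones
```

On a buggy build the op runs against the Placeholder’s contiguous temp buffer and the result is dropped, so base stays at zero with no error raised - exactly the kind of silent no-op that quietly stalls an optimizer.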










