PyTorch MPS backend silently fails on non-contiguous tensor operations, causing phantom training bugs
9/10 · Critical

On Apple Silicon (MPS backend, PyTorch < 2.4), the `addcmul_` and `addcdiv_` GPU kernels silently fail when writing to non-contiguous output tensors. In this case the optimizer state never updated the encoder weights, producing a loss plateau that was indistinguishable from a hyperparameter issue and took days to diagnose.
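A minimal sketch of a defensive workaround, assuming PyTorch is installed (the helper name `safe_addcmul_` is ours, not a PyTorch API): route the in-place update through a contiguous buffer, which sidesteps the bad kernel path on affected MPS builds. On CPU and fixed builds it behaves identically to plain `addcmul_`.

```python
import torch

def safe_addcmul_(out, t1, t2, value=1.0):
    # Workaround sketch: on MPS with PyTorch < 2.4, in-place addcmul_
    # could silently no-op when `out` is non-contiguous. Doing the
    # update in a contiguous temporary and copying back avoids that.
    if not out.is_contiguous():
        tmp = out.contiguous()
        tmp.addcmul_(t1, t2, value=value)
        out.copy_(tmp)
    else:
        out.addcmul_(t1, t2, value=value)
    return out

# A transposed view is a typical source of non-contiguous optimizer state:
state = torch.zeros(2, 3).t()          # shape (3, 2), non-contiguous
safe_addcmul_(state, torch.ones(3, 2), torch.ones(3, 2))
```

The same guard applies to `addcdiv_`. Alternatively, calling `.contiguous()` once when the optimizer state is created avoids the per-step check.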
Collection History
Query: “What are the most common pain points with PyTorch for developers in 2025?” — 4/4/2026
PyTorch's MPS (Apple Silicon GPU) backend had a kernel bug where `addcmul_` and `addcdiv_` operations silently failed when writing to non-contiguous output tensors. The model appeared to be learning (the decoder was training normally), but progress stalled because the encoder stayed frozen: a subtle plateau that looked exactly like a hyperparameter issue.
Created: 4/4/2026 · Updated: 4/4/2026