
Weekly GitHub Report for Pytorch: December 01, 2025

12/8/2025 · Updated 1/7/2026

Excerpt

- **Device and Hardware Support Limitations:** Issues highlight hardware-specific limitations such as LayerNorm and BatchNorm failing to compile on Apple M4 GPUs with the MPS backend, Conv2d running slower on macOS CPUs due to the missing MKLDNN backend, and FP8 lowering tests failing on certain NVIDIA devices due to hardware constraints.
- **Feature Proposals and API Enhancements:** Several proposals aim to improve PyTorch functionality, including adding a debug mode context manager, a new DTensor placement …
- **Profiler and Debugging Issues:** The PyTorch C++ Profiler fails to collect information for the "privateuseone" backend due to incorrect flag handling, and Inductor pattern matcher improvements are proposed to add debug output and fix error message formatting.
- **DTensor and Distributed Tensor Bugs:** DTensor has bugs such as torch.equal() failing when comparing a scalar to a sharded tensor due to incorrect redistribution logic, and a violation in the redistribute_cost function where the direct redistribution cost is unexpectedly higher than the cost via intermediate states.
- **Platform and API Inconsistencies:** The C++ Generator API requires platform-specific compilation, unlike tensor creation, and importing .pt2 models shows inconsistent input validation between Windows and Linux, causing platform-dependent quantization failures.
- **Test Failures and CI Issues:** Multiple tests fail or are disabled due to regressions, flaky behavior, or environment changes, including parametrization tests failing after the Python 3.14 migration, disabled HFPretrained tests on Linux, and CI failures after the Python 3.14.1 release.
- **CUDA and GPU Backend Problems:** Issues include SIGIOT stack-smashing errors in CUDA CI tests indicating memory corruption, hipblaslt MI300 TF32 accuracy problems, and a regression in training with the Inductor backend causing illegal memory access during the backward pass.
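The redistribute_cost violation above can be pictured with a toy cost table. Everything below is hypothetical (the placement names and numbers are illustrative, not DTensor's actual cost model): a well-behaved cost function should never price the direct route higher than a route through an intermediate placement.

```python
# Hypothetical redistribution costs between toy placements
# ("S" = sharded, "P" = partial, "R" = replicated); the numbers
# are made up for illustration, not taken from DTensor.
cost = {
    ("S", "R"): 10.0,  # direct path, priced too high
    ("S", "P"): 3.0,
    ("P", "R"): 4.0,
}

direct = cost[("S", "R")]
via_intermediate = cost[("S", "P")] + cost[("P", "R")]

# The reported bug is exactly this property being violated:
# the direct route costs more than the two-hop route.
violates_triangle_inequality = direct > via_intermediate
print(violates_triangle_inequality)  # True for this toy table
```

A cost model with this property can steer the sharding planner toward needless multi-hop redistributions, which is why the report flags it as a correctness concern rather than a mere tuning detail.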
- **Export and Serialization Bugs:** Exporting models with sparse tensor inputs fails due to lack of dynamic shape support, and sequential torch.load() calls on the same stream fail with the new zipfile serialization format, causing errors on the second read.
- **Torch.compile and Dynamic Behavior Issues:** Using torch.compile with certain features causes errors, such as failures with in-place int64 addmm_ operations due to dimension expansion errors, and mixing dynamic Python scalars with 0-d tensors causing excessive recompilations.
- **Performance and Optimization Requests:** Proposals include adding gating fusion to the flex attention epilogue for improved performance, and logging side effects in Dynamo to better track user code behavior.
- **Backend Compiler and Runtime Errors:** The Inductor backend fails to compile models with while loops and scatter ops due to an IndexError, and torch._higher_order_ops.while_loop causes CPU-GPU synchronization issues leading to performance degradation.
- **Documentation and Development Tooling:** Suggestions include adding new Spin CLI commands to simplify development workflows and standardizing OpenReg testing with examples and documentation for new device backends.
- **Thread Safety and Concurrency Issues:** The non-thread-safe …
- **Dynamic Shape and Compilation Crashes:** Compiling models with dynamic shapes involving transpose and floating-point scale interpolation crashes due to an OverflowError from converting float infinity to int.
- **ONNX Export Failures:** Exporting models to ONNX fails with assertion errors when using cdist with dynamic inputs in PyTorch 2.8.0, whereas it worked in earlier versions.
- **Padding and Autotune Errors:** Padding a dimension marked as dynamic during compilation produces incorrect code and autotune errors despite successful compilation.
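The OverflowError behind the dynamic-shape crash above is plain Python semantics: converting an infinite float to int raises rather than saturating. A minimal sketch (the scale-factor scenario is a hypothetical stand-in for the compiled interpolation code, which is not shown in the report):

```python
# A degenerate interpolation scale factor can become infinite,
# e.g. when a dynamic dimension collapses during tracing.
scale = float("inf")

# int(inf) raises OverflowError in Python -- this is the failure
# mode reported for dynamic-shape compilation, where an infinite
# scale reaches an int() size computation.
try:
    out_size = int(scale * 16)
    raised = False
except OverflowError:
    raised = True
print(raised)  # True
```

Guarding such conversions with a finiteness check (`math.isfinite(scale)`) before computing integer sizes is the usual way to turn this crash into a diagnosable error.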
- **Documentation Build and Disk Space Issues:** The CI runner runs out of disk space during Sphinx site builds because already-built documentation is rebuilt unnecessarily; skipping the rebuild is suggested to prevent failures.
- **Numerical Accuracy and Backend Discrepancies:** FFT functions produce incorrect energy normalization on Intel CPUs for specific input sizes, and adaptive max pooling on Apple MPS devices with torch.compile yields significantly incorrect results compared to eager mode.
- **Segmentation Faults and Integer Overflow:** ReplicationPad2d causes segmentation faults with extremely large padding values near INT64_MAX due to integer overflow instead of raising proper exceptions. … - issues/169313, issues/169643, issues/169756
- **Runtime errors with dynamic shapes and data types:** Runtime errors such as a TypeError from applying the unary ~ operator to a 'SymBool' type and a NotImplementedError for the "index_cpu" operation on Float8_e4m3fn tensors highlight challenges with dynamic shapes and new data types in PyTorch. These issues affect model execution and require fixes in later versions or backend support. … - issues/169331
- **Graph and compiler correctness bugs:** A bug in the PyTorch compiler causes incorrect ordering of symbolic integer node inputs in the backward graph compared to the forward graph, potentially leading to assertion failures in regional inductor. This discrepancy affects graph correctness even though some cases run without error. - issues/169712
- **Side effect detection improvements:** The current method for detecting side effects when using fullgraph=True in vLLM is unreliable, prompting a request for a better approach. Improving this detection is necessary to ensure correct graph compilation and execution. - issues/169598
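The ReplicationPad2d report is a classic signed 64-bit wraparound: a size computation like `width + 2 * pad` silently wraps in C++ instead of raising, and the garbage size then triggers a segfault downstream. A pure-Python sketch of the wraparound (the kernel's actual size arithmetic is an assumption here, not quoted from PyTorch source):

```python
def wrap_int64(x: int) -> int:
    """Wrap an integer into signed 64-bit range, as C++ arithmetic would."""
    return (x + 2**63) % 2**64 - 2**63

width, pad = 10, 2**62  # a padding value large enough to overflow int64

# Instead of a huge positive size or a clean exception, the computed
# output width wraps to a large negative number -- the kind of value
# that downstream kernels index with and segfault on.
out_width = wrap_int64(width + 2 * pad)
print(out_width < 0)  # True
```

Raising a proper exception requires checking the arithmetic before it wraps, e.g. validating that `width + 2 * pad` stays below INT64_MAX, which is what the issue asks for in place of the crash.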

Source URL

https://buttondown.com/weekly-project-news/archive/weekly-github-report-for-pytorch-december-01-2025-1604/
