An Empirical Study on Bugs Inside PyTorch: A Replication Study
### IV-C Results: Root Causes of PyTorch Bugs

Following the replicated study on TensorFlow [10], we classified the root cause of each analyzed bug into one of 11 categories. Our results show that more than 25% of the analyzed bugs were caused by inconsistencies in the APIs, which suggests that PyTorch requires more time and development effort to become a truly reliable framework. In the following, we discuss the 11 root-cause categories identified in the 194 bugs analyzed.

|Root Cause|Description|Freq.|
|--|--|--|
|Logic Error|Wrong programming logic|25.77%|
|Inconsistency|Inconsistent changes in the API|25.26%|
|Algorithm|Wrong implementation of algorithms|12.37%|
|Corner case|Wrong handling of corner cases|9.79%|
|Configuration error|Wrong configurations|8.76%|
|Type confusion|Type mismatches|8.25%|
|Memory|Incorrect usage of memory|3.09%|
|Referenced type error|Incorrect import of libraries|2.58%|
|Processing|Incorrect variable initialization or assignment|2.06%|
|Concurrency|Synchronization problems|1.55%|
|Dimension mismatch|Dimension mismatch between tensors|0.52%|

1. Logic error (25.77%). The bugs in this category were caused by wrong programming logic. For example, in issue #50663 [32], maintainers report a bug in the implementation of a deep copy operation. A deep copy is expected to create an exact replica of the copied object; however, faulty logic in the implementation left part of the object (the gradient buffer) uncopied, causing users to experience undefined-behavior errors.
2. Inconsistency (25.26%). The bugs in this category were caused by API changes or framework version updates that introduced inconsistencies or incompatibilities between framework interfaces, modules, or functions. For example, pull request (PR) #53424 [33] reports a bug in calling a tensor object. The bug was caused by name shadowing introduced when a new module was added in an update, which raised an error when new tensor objects were created.

…

4. Corner case (9.79%). The bugs in this category were caused by wrong handling of corner cases. Corner cases are particular use cases or execution flows that library users rarely trigger but that the library must nevertheless handle. For example, issue #16532 [35] reports that gradients are missing when autograd is called inside a function on multiple GPUs. We classify such issues as corner cases since most developers will not use PyTorch functions in this way.
5. Configuration error (8.76%). The bugs in this category were caused by wrong configurations. For example, issue #22389 [36] reports a bug that left developers unable to use TensorBoard. The bug occurred because a dependency required for TensorBoard’s functionality was not installed during PyTorch installation.
6. Type confusion (8.25%). The bugs in this category were caused by type mismatches. Such issues raise errors that stop the program from functioning. For example, issue #42218 [37] reports that the program failed to function because of such an error.
7. Memory (3.09%). The bugs in this category were caused by incorrect usage of memory resources. These issues can stem from excessive RAM usage or from memory leaks. For example, issue #35901 [38] reports that the program failed at run time because of an out-of-memory error.

…

|Computation Graph|Computing tensor graph operations|6.93%|
|CUDA|Interface with NVIDIA’s CUDA|6.93%|
|Documentation|Functionalities for describing other components|4.95%|
|Framework|Functionalities that don’t belong to other categories|4.95%|
|API|Expand functionalities but not integrated into framework|1.98%|

…

Type confusion is a common issue in both the PyTorch and TensorFlow libraries. This challenge can be largely attributed to Python’s dynamic typing. While dynamic typing allows for more concise code, it also means that type-related bugs are often discovered only at runtime. The majority of bug symptoms we observed in these libraries were program crashes and functional errors, which can be disruptive and time-consuming to resolve. This highlights the need for more robust type-checking mechanisms and better developer education on how to avoid type-related pitfalls in deep learning libraries [57].
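As a minimal illustration of how dynamic typing defers type errors to runtime, the sketch below uses plain Python rather than PyTorch so that it runs without any third-party package. The helpers `scale` and `checked_scale` are hypothetical names introduced for this example and do not come from the study or from PyTorch. The unchecked version produces a silently wrong result (a functional error), while the checked version fails early with an explicit message.

```python
from numbers import Real


def scale(values, factor):
    """Multiply every element by factor, with no type checks (hypothetical helper)."""
    return [v * factor for v in values]


def checked_scale(values, factor):
    """Same operation, but reject non-numeric input up front with a clear message."""
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(factor, bool) or not isinstance(factor, Real):
        raise TypeError(f"factor must be a real number, got {type(factor).__name__}")
    for i, v in enumerate(values):
        if isinstance(v, bool) or not isinstance(v, Real):
            raise TypeError(f"values[{i}] must be a real number, got {type(v).__name__}")
    return [v * factor for v in values]


if __name__ == "__main__":
    # int * str is string repetition, so the mistake goes unnoticed here.
    print(scale([1, 2], "3"))  # ['3', '33']
    try:
        checked_scale([1, 2], "3")
    except TypeError as exc:
        print("rejected early:", exc)
```

Static checkers and type annotations can catch some of these mistakes before execution; the runtime guard shown here is only one possible design.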
Related Pain Points
PyTorch API inconsistency causes breaking changes across versions
API changes and framework version updates in PyTorch frequently introduce inconsistencies or breaking behavior, accounting for about 25% of the 194 bugs analyzed in the study. This forces developers to spend significant time tracking down compatibility issues rather than building features.
PyTorch dependency mismanagement causes missing integrations at install time
Required dependencies for optional PyTorch integrations (e.g., TensorBoard) are not automatically installed, causing silent failures discovered only at runtime. Developers must manually track and install auxiliary dependencies that should be bundled or clearly flagged during setup.
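One way to surface a missing optional dependency before it fails mid-run is to probe for it at startup. The sketch below uses the standard-library call `importlib.util.find_spec`. The function name `missing_optional_dependencies` is a hypothetical helper, and the assumption that the integration is importable as `tensorboard` is ours, not the study's.

```python
import importlib.util


def missing_optional_dependencies(module_names):
    """Return the subset of module_names that cannot be located for import."""
    missing = []
    for name in module_names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Raised for dotted names whose parent package is absent,
            # or for modules without a usable spec.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # "tensorboard" is assumed to be the import name of the optional integration.
    missing = missing_optional_dependencies(["tensorboard"])
    if missing:
        print("Optional integrations unavailable; install:", ", ".join(missing))
```

Probing with `find_spec` avoids importing the package just to test for its presence, which keeps the check cheap and free of side effects for top-level module names.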
Tensor dimension and type mismatches in PyTorch produce unclear runtime errors
Mismatched tensor shapes or data types are a frequent source of cryptic runtime errors in PyTorch, requiring developers to manually inspect shapes and dtypes before each operation. Gradient propagation issues with custom layers compound the debugging difficulty.
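The manual inspection described above can be wrapped in a small pre-check that reports exactly which property disagrees. The sketch below is a plain-Python stand-in so that it runs without PyTorch installed: `TensorSpec` and `check_matmul` are hypothetical names, not PyTorch APIs. With real tensors, the same metadata is available through the `shape` and `dtype` attributes.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TensorSpec:
    """Hypothetical stand-in for a tensor's metadata (not a PyTorch class)."""
    shape: Tuple[int, ...]
    dtype: str


def check_matmul(a: TensorSpec, b: TensorSpec) -> Tuple[int, ...]:
    """Validate a 2-D matrix product a @ b and return the result shape."""
    if len(a.shape) != 2 or len(b.shape) != 2:
        raise ValueError(f"expected 2-D operands, got shapes {a.shape} and {b.shape}")
    if a.dtype != b.dtype:
        raise TypeError(f"dtype mismatch: {a.dtype} vs {b.dtype}")
    if a.shape[1] != b.shape[0]:
        raise ValueError(
            f"inner dimensions differ: {a.shape} @ {b.shape} "
            f"({a.shape[1]} != {b.shape[0]})"
        )
    return (a.shape[0], b.shape[1])


if __name__ == "__main__":
    x = TensorSpec(shape=(2, 3), dtype="float32")
    w = TensorSpec(shape=(3, 4), dtype="float32")
    print(check_matmul(x, w))  # (2, 4)
```

Separating the dtype check from the shape check means the error message names the actual cause, which is the information the cryptic runtime errors tend to omit.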