arxiv.org
[PDF] An Empirical Study on Bugs Inside PyTorch: A Replication ... - arXiv
Excerpt
TABLE I: ROOT CAUSES OF BUGS IDENTIFIED IN PYTORCH

Root Cause             Description                                         Freq.
Logic Error            Wrong programming logic                             25.77%
Inconsistency          Inconsistent changes in the API                     25.26%
Algorithm              Wrong implementation of algorithms                  12.37%
Corner case            Wrong handling of corner cases                       9.79%
Configuration error    Wrong configurations                                 8.76%
Type confusion         Type mismatches                                      8.25%
Memory                 Incorrect usage of memory                            3.09%
Referenced type error  Incorrect import of libraries                        2.58%
Processing             Incorrect variable initialization or assignment      2.06%

… were caused by inconsistencies in the APIs which demonstrate that PyTorch requires more time and development effort in order to be a truly reliable framework. In the following, we discuss the 11 categories for root causes of bugs in PyTorch from the 194 bugs analyzed.

1) Logic error (25.77%). The bugs in this category were … tation caused it to not copy part of the object (gradient buffer), causing users to experience undefined behavior errors.

2) Inconsistency (25.26%). The bugs in this category were caused by changing the APIs or updating the framework’s version which resulted in inconsistencies or incompati- …

… corner cases since most developers will not use PyTorch functions in such a way.

5) Configuration error (8.76%). The bugs in this category were caused by wrong configurations. For example, issue #22389 [36] reports a bug which caused the developers to be unable to use TensorBoard. This bug happened …

… variables being initialized or assigned incorrectly, using incorrect formats for variables, or other incorrect data processing related usages. Concurrency (1.55%) and dimension mismatch (0.52%) type errors were caused by synchronization problems (such as issue #67626 [41]) and dimension mismatch during tensor computation and transformation operations (such as PR … libraries.
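The percentages above can be cross-checked against the stated total of 194 bugs. The sketch below converts each reported frequency into the bug count it implies; the counts themselves are inferred by rounding and are not stated in the excerpt. The first nine categories come from Table I, and the last two (concurrency, dimension mismatch) from the running text.

```python
# Frequencies as reported in the excerpt (percent of the 194 analyzed bugs).
TOTAL_BUGS = 194

reported_pct = {
    "Logic error": 25.77,
    "Inconsistency": 25.26,
    "Algorithm": 12.37,
    "Corner case": 9.79,
    "Configuration error": 8.76,
    "Type confusion": 8.25,
    "Memory": 3.09,
    "Referenced type error": 2.58,
    "Processing": 2.06,
    "Concurrency": 1.55,
    "Dimension mismatch": 0.52,
}


def implied_count(pct, total=TOTAL_BUGS):
    """Return the whole number of bugs implied by a rounded percentage."""
    return round(pct / 100 * total)


# Inferred counts (an assumption: each percentage is count / 194, rounded).
counts = {name: implied_count(pct) for name, pct in reported_pct.items()}

for name, n in counts.items():
    print(f"{name:<22} {n:>3}")
print("sum of implied counts:", sum(counts.values()))
```

Under this reading, the 11 categories account for all 194 bugs, which is consistent with the frequencies being shares of a single total.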
Inconsistencies are the second most important bug root cause in both libraries, where changes in the APIs caused breaking changes or incompatible behaviour in the library. Another common theme is the prevalence of type confusion bugs across both libraries, a common issue in dynamically typed languages such as Python and configuration errors, due to the …

… that we find a much higher occurrence of bugs caused by wrong implementation of algorithms (12% in PyTorch) than the figures reported in TensorFlow (3%).

Root Causes: PyTorch bugs are caused mainly by logic errors (25%), API inconsistency (25%), and wrong algorithm implementation (12%). Both PyTorch …

Build Failure          Program fails to compile       11.34%
Warning-style error    Display of warning message      8.25%
Hang                   Program gets stuck mid-run      0.53%

and SyncBatchNorm operations behaving incorrectly and causing the program to generate incorrect results.

3) Performance degradation (12.89%). This symptom in- …

… Torch reports more frequent performance degradation (13%). Warning-style errors are comparably similar, and bugs that cause the library to become unresponsive are rare.

Symptoms: Both PyTorch and TensorFlow report functional errors and program crashes as the most frequent bug symptoms. While PyTorch reports more frequent performance degradation, build failures
Source URL
https://arxiv.org/pdf/2307.13777.pdf
Related Pain Points
PyTorch has high rate of wrong algorithm implementations causing incorrect results
Approximately 12% of PyTorch bugs stem from incorrect algorithm implementations, four times TensorFlow's 3%. Developers may therefore get silently wrong results from core framework operations.
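One common guard against silently wrong results is differential testing: run the implementation under test and a trusted reference on the same inputs and compare. The sketch below is a minimal, standard-library illustration of that idea and does not use PyTorch; `naive_variance` and `agrees_with_reference` are hypothetical names, and the naive variance formula is only a stand-in for an algorithm that is wrong on some inputs. In a real project the function under test might be a framework operation and the reference a higher-precision computation, which is an assumption about usage rather than something the source describes.

```python
import math
import statistics


def naive_variance(xs):
    """Population variance via E[x^2] - E[x]^2 (hypothetical stand-in).

    This textbook formula loses precision through cancellation when the
    values are large relative to their spread.
    """
    n = len(xs)
    mean = sum(xs) / n
    mean_sq = sum(x * x for x in xs) / n
    return mean_sq - mean * mean


def agrees_with_reference(fn, ref_fn, inputs, rel_tol=1e-9, abs_tol=1e-12):
    """Return True if fn and ref_fn agree on inputs within the tolerances."""
    return math.isclose(fn(inputs), ref_fn(inputs),
                        rel_tol=rel_tol, abs_tol=abs_tol)


small = [4.0, 7.0, 13.0, 16.0]
shifted = [x + 1e9 for x in small]  # same spread, much larger magnitude

# statistics.pvariance serves as the trusted reference here.
ok_small = agrees_with_reference(naive_variance, statistics.pvariance, small)
ok_shifted = agrees_with_reference(naive_variance, statistics.pvariance, shifted)

print("small inputs agree:  ", ok_small)
print("shifted inputs agree:", ok_shifted)
```

The naive formula matches the reference on the small inputs but not on the shifted ones, and it raises no error in either case. That is the failure mode the pain point describes: the result is wrong without any visible symptom unless it is compared against something independent.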
PyTorch API inconsistency causes breaking changes across versions
API changes and framework version updates in PyTorch frequently introduce inconsistencies or breaking behavior, accounting for about 25% of all identified bugs. This forces developers to spend significant time tracking down compatibility issues rather than building features.
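One way a project might limit exposure to version-related breakage is to fail fast when the installed framework version falls outside the range it was tested against. The following is a minimal sketch under stated assumptions: `parse_release` and `is_tested_version` are hypothetical helpers, and the bounds `"2.0"` and `"2.2"` are purely illustrative, not taken from the source. In practice the version string could come from `torch.__version__` or `importlib.metadata.version("torch")`, and the `packaging.version` module offers more complete parsing than the regular expression used here.

```python
import re


def parse_release(version):
    """Return the leading numeric release of a version string as an int tuple.

    Local or pre-release suffixes such as "+cu118" are ignored.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part) for part in match.groups(default="0"))


def is_tested_version(version, minimum, below):
    """Return True if minimum <= version < below (release components only)."""
    return parse_release(minimum) <= parse_release(version) < parse_release(below)


# Illustrative bounds only; a real project would record its own tested range.
for candidate in ["1.13.1", "2.1.0+cu118", "2.2.0"]:
    status = "ok" if is_tested_version(candidate, "2.0", "2.2") else "untested"
    print(f"{candidate:<14} {status}")
```

A check like this does not prevent API inconsistencies, but it turns a late, confusing failure into an early and explicit one.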