lujie.ac.cn

[PDF] Detecting TensorFlow Program Bugs in Real-World Industrial ...

Updated 3/24/2026

Excerpt

platform. Our findings and actions are: • Finding: Most bugs (63.69%) are common Python bugs, with argument mismatches, undefined variables, and missing attributes as the top three types of bugs. Action: We deployed four existing representative static tools, Mypy [16], Pylint [17], ... their applications before submitting a job to the platform. In summary, this paper makes the following contributions: • We report an extensive empirical study on 12,289 industrial TensorFlow job failures. Our findings show that most failure-triggering bugs (63.69%) are Python bugs, and four existing representative static bug-detection tools … There are also other common bug types, such as Argument Mismatch (invoking a function with an inconsistent number of actual arguments) and Undefined Variables (referencing a variable before its definition), accounting for 12.67% and 12.51% of all bugs, respectively. Several Python bug types are directly related to the dynamic … Found (accessing maps with non-existent keys) and Divide by Zero (dividing a value by 0). 2) TensorFlow-Specific Bugs: Checkpoint Error, the most common bug type, accounts for 17.49% of all bugs. Platform users frequently use the checkpointing mechanism to store a … Shape Error (8.82%) arises when invoking TensorFlow operators with arguments of incompatible shapes (incompatible ranks or dimensions). It is difficult for developers to understand the tricky semantics of thousands of TensorFlow APIs, leading to frequent Shape Error bugs in practice. For example, many TensorFlow operators (e.g., soft- … Finding 2: Shape Error is one of the most common TensorFlow-specific bugs (8.82% of total bugs) and such bugs can be detected effectively as demonstrated in [14].
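The two causes of Shape Error named in the excerpt (incompatible ranks and incompatible dimensions) can be illustrated with a small check in plain Python. This is only a sketch under stated assumptions: `matmul_shape` is a hypothetical helper written for this illustration, it models a 2-D matrix multiply only, and it is neither TensorFlow's own shape inference nor the detection technique of [14].

```python
# Hypothetical helper for illustration; not a TensorFlow API and not the
# paper's tool. Shapes are plain tuples of ints.
def matmul_shape(a_shape, b_shape):
    """Return the result shape of a 2-D matrix multiply, or raise ValueError."""
    # Rank check: both operands must be matrices (rank 2) in this sketch.
    if len(a_shape) != 2 or len(b_shape) != 2:
        raise ValueError(
            f"incompatible ranks: {len(a_shape)} and {len(b_shape)}"
        )
    # Dimension check: the inner dimensions must agree.
    if a_shape[1] != b_shape[0]:
        raise ValueError(
            f"incompatible dimensions: {a_shape[1]} vs {b_shape[0]}"
        )
    return (a_shape[0], b_shape[1])


def describe(a_shape, b_shape):
    """Return the result shape, or the error message if the shapes clash."""
    try:
        return matmul_shape(a_shape, b_shape)
    except ValueError as exc:
        return str(exc)
```

Calling `describe((32, 784), (784, 10))` yields a result shape, while `describe((32, 784), (10, 784))` reports a dimension mismatch and `describe((32, 784), (784,))` reports a rank mismatch. Running such a check before a job is submitted is the general idea of catching shape bugs early; the actual analysis in the paper is not reproduced here.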
The other types of TensorFlow-specific bugs include Out of Memory (GPU out of memory, commonly fixed by reducing the sizes of tensors), Loss NaN (invalid loss values), GPU Sync Failed (memory issues in GPU [23]), and Graph Not Complete (invalid dataflow graphs).
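For the common Python bug types listed in the excerpt, the following sketch shows how each one surfaces at run time as a standard Python exception. The function `trigger`, the helper `scale`, and the snippet labels are hypothetical names chosen for this illustration; the mapping from the paper's bug categories to built-in exception classes is an assumption, not something the excerpt states.

```python
# Illustrative only: each snippet is a deliberately buggy one-liner.
# The labels are hypothetical and loosely follow the excerpt's bug types.
def trigger(kind):
    """Run one buggy snippet and return the name of the exception raised."""

    def scale(x, factor):
        return x * factor

    snippets = {
        # Fewer actual arguments than the function declares.
        "argument_mismatch": lambda: scale(1),
        # A name that is never defined anywhere.
        "undefined_variable": lambda: undefined_name,
        # A Python list has no `shape` attribute.
        "missing_attribute": lambda: [].shape,
        # Accessing a map with a non-existent key.
        "key_not_found": lambda: {}["lr"],
        # Dividing a value by 0.
        "divide_by_zero": lambda: 1 / 0,
    }
    try:
        snippets[kind]()
    except Exception as exc:
        return type(exc).__name__
    return None
```

All five failures only appear when the faulty line executes, which is why static tools such as the Mypy and Pylint checkers mentioned in the excerpt are useful before a job is submitted.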

Source URL

https://lujie.ac.cn/files/papers/ShapeTracer.pdf

Related Pain Points