nateharada.com

Tensorflow I Love You, But You're Bringing Me Down

6/4/2017, updated 10/1/2024

Excerpt

But while Tensorflow is a wonderful framework, the decisions (or lack thereof) being made by the Tensorflow product team are making it increasingly difficult for external developers to adopt. In my eyes, Tensorflow’s public face has grown without proper direction, and is threatening to alienate developers and allow competing frameworks to take over.

### Fragmented high-level library support

My main gripe strikes me as a weird and totally avoidable issue: there are too damn many Google-supported libraries for Tensorflow. Good software engineers know that reinventing the wheel is a bad thing, and so when the prospect of writing yet another training and batching loop rears its ugly head, we look to high-level libraries to ease the pain. Apparently, Google employees were aware this would happen, and in a mad scramble to curry organizational favor managed to release no fewer than …

When “new” == “risky” for most companies, developers want a toolkit they can commit to deploying internally that will still be considered “best practice” in a few months. By offering a whole slew of somewhat supported options, Google is hindering adoption of the Tensorflow framework in general. Avoiding boilerplate code for each new experiment is a must-have for most devs, but having to learn a new “hot” framework because previous ones are no longer feature-competitive severely limits research output, and is an unreasonable problem to have when all are controlled by the same company.

…

Every week it seems a new Tensorflow product is announced: XLA, TFDBG, a graph operation to turn on your toaster, etc. No doubt these features are beneficial, but it also means that any resource about Tensorflow is immediately out of date. Documentation tends to be the most up to date, but often provides no context or example usage. Example code is often stale, sometimes presenting old functions or workflows that aren’t used anymore. Stack Overflow questions tend to be only half-useful, since at least some of the answer is probably outdated. This problem *should* fade as time stabilizes the APIs and features, but to me it seems that this should have been planned for ahead of time. Tensorflow has been out for almost 2 years now (an eternity in deep learning time), but the Python API didn’t stabilize until March 2017. The other language bindings are **still** not stable. For a framework touting its production-ready capabilities, you’d expect the C++ API to not be shifting under your feet.

…

### A cry for help

Tensorflow is trying to be everything to everyone, but does not present a developer-friendly product to the greater deep learning community. Google is known for creating complex but effective internal tools, and taking these tools public is great for developers at large. However, when you’re on a team at a company with minimal deep learning experience trying to build out production-level systems, it’s almost impossible to learn how to do things correctly.

Source URL

https://nateharada.com/tensorflow-i-love-you-but/

Related Pain Points

PyTorch API inconsistency causes breaking changes across versions

API changes and framework version updates in PyTorch frequently introduce inconsistencies or breaking behavior, accounting for ~25% of all identified bugs. This forces developers to spend significant time tracking down compatibility issues rather than building features.

compatibility, PyTorch

Difficulty learning correct production patterns and best practices

For teams with minimal deep learning experience, it is nearly impossible to learn how to build production-level systems with TensorFlow. Documentation and community resources lack sufficient context for real-world deployment.

docs, TensorFlow

Rapid Tool and Framework Proliferation Causes Fatigue

Developers struggle to keep up with an overwhelming number of new and existing tools and frameworks (26% reported this as a challenge in 2021). This creates decision paralysis, version fragmentation in which teams become stuck on older versions, and costly migration efforts when they attempt to upgrade.

ecosystem

Inconsistent Documentation and Tutorial Gaps

TensorFlow documentation is inconsistent, and documentation and tutorials lag behind new functionality. There are conceptual gaps between simple examples and state-of-the-art examples, particularly for RNNs, creating barriers for developers who are learning the concepts and the framework at the same time.

docs, TensorFlow, RNN

Lack of direction and fragmented product vision

TensorFlow's public face has grown without clear strategic direction. Multiple competing initiatives (XLA, TFDBG, etc.) are announced constantly without cohesion, making it difficult for external developers to understand the intended evolution.

ecosystem, TensorFlow