Pains
2403 pains collected
Assumption-Heavy Architecture Generation
8Claude Code fills specification gaps with reasonable but contextually wrong assumptions (e.g., OAuth2 instead of required SAML SSO, individual auth instead of organization-based). The generated code looks correct in isolation but creates unmaintainable architectures that don't match actual business requirements.
File Descriptor Exhaustion Limits Scalability
8NGINX's scalability is constrained by the operating system's per-process limit on open file descriptors (FDs), which commonly defaults to 1024. As a reverse proxy, NGINX consumes at least 2 FDs per request (one for the client, one for the upstream server), so FDs run out quickly and connections fail hard at high concurrency unless the limit is raised manually via `worker_rlimit_nofile`.
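The arithmetic behind this limit can be sketched roughly as follows. This is a back-of-the-envelope estimate, not NGINX's internal accounting: the function name, the `reserved` allowance for logs and listening sockets, and the assumption of exactly two descriptors per proxied request are all illustrative.

```python
def max_proxied_connections(fd_limit: int, reserved: int = 16,
                            fds_per_connection: int = 2) -> int:
    """Estimate how many proxied requests fit under a file descriptor limit.

    `reserved` is a hypothetical allowance for log files and listening
    sockets; `fds_per_connection` assumes one client FD plus one upstream FD.
    """
    usable = max(fd_limit - reserved, 0)
    return usable // fds_per_connection


# With the common default of 1024, only about 500 requests fit per worker.
default_capacity = max_proxied_connections(1024)
# Raising the limit (for example to 65536) lifts the ceiling proportionally.
raised_capacity = max_proxied_connections(65536)
print(default_capacity, raised_capacity)
```

Under these assumptions the default limit is exhausted at roughly half its nominal value, which is why the failure appears earlier than the raw FD count suggests.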
Resource refactoring is destructive and risky
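8Renaming or reorganizing resources in Terraform code causes them to be destroyed and recreated rather than updated, risking catastrophic downtime and data loss for stateful resources like databases. Terraform tracks each resource by its address and does not detect a refactor on its own; a rename is only preserved if it is declared explicitly with a `moved` block (Terraform 1.1 and later) or applied by hand with `terraform state mv`.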
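A toy model may help show why a rename turns into a destroy and a create. The sketch below is not Terraform's planner: it only assumes that state and configuration are both keyed by resource address, and the `plan` function, the `moved` mapping (standing in for the role a `moved` block plays in Terraform 1.1 and later), and the example addresses are hypothetical.

```python
from typing import Dict, List, Optional, Set


def plan(state: Set[str], config: Set[str],
         moved: Optional[Dict[str, str]] = None) -> Dict[str, List[str]]:
    """Diff resource addresses in state against addresses in configuration."""
    moved = moved or {}
    # Apply any declared renames to the state before comparing addresses.
    renamed_state = {moved.get(address, address) for address in state}
    return {
        "destroy": sorted(renamed_state - config),
        "create": sorted(config - renamed_state),
        "keep": sorted(renamed_state & config),
    }


state = {"aws_db_instance.main"}
config = {"aws_db_instance.primary"}  # same database, renamed in code

# Without a declared rename, the old address is destroyed and the new one created.
naive = plan(state, config)
# With the rename declared, the existing resource is kept under its new address.
declared = plan(state, config, moved={"aws_db_instance.main": "aws_db_instance.primary"})
print(naive)
print(declared)
```

In this model the database survives only when the rename is declared, which mirrors why undeclared refactors are dangerous for stateful resources.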
Proxy Buffering Misconfiguration Destroys Performance
8Disabling proxy buffering with `proxy_buffering off` forces NGINX worker processes to handle upstream responses in blocking, synchronous fashion, completely subverting the non-blocking architecture. This typically results in slower transfers, prolonged blocking times, and also disables caching, rate limiting, and request queuing.
Converting PyTorch distributed checkpoints to Hugging Face format is extremely painful
8There is no official or well-supported path for converting PyTorch distributed training checkpoints to Hugging Face Transformers-compatible checkpoints. NVIDIA has deprioritized this in favor of their NeMo framework, leaving the community without reliable tooling for this common workflow.
S3 event notifications are unreliable and not guaranteed to trigger
8S3 event triggers (e.g., for Lambda invocation) may fail silently, requiring developers to implement separate recovery mechanisms. This creates unpredictable behavior in event-driven architectures.
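One possible recovery mechanism is a periodic reconciliation pass that compares what is in the bucket with what the handler has recorded as processed. The sketch below assumes the key listing and the processed-key ledger are obtained elsewhere; `find_unprocessed_keys` and both inputs are hypothetical names, and no AWS API is called here.

```python
from typing import Iterable, List, Set


def find_unprocessed_keys(bucket_keys: Iterable[str],
                          processed_keys: Set[str]) -> List[str]:
    """Return object keys present in the bucket but never recorded as handled.

    `bucket_keys` is assumed to come from a periodic bucket listing and
    `processed_keys` from a ledger the event handler writes on success.
    """
    return sorted(key for key in set(bucket_keys) if key not in processed_keys)


listed = ["uploads/a.csv", "uploads/b.csv", "uploads/c.csv"]
ledger = {"uploads/a.csv", "uploads/c.csv"}

# Keys whose notification was missed or whose handler failed can be replayed.
missed = find_unprocessed_keys(listed, ledger)
print(missed)
```

This only works if the handler is idempotent, since a key may be replayed after its original event was in fact delivered late.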
PyTorch's Python-centric design limits production deployment performance and interoperability
8PyTorch's tight coupling with the Python runtime introduces GIL-related parallelism constraints, lower execution speed compared to C++ or Java, and poor interoperability with non-Python production stacks. This makes it difficult to meet low-latency, high-throughput, and multi-language requirements in real production systems.
Production Database Concurrency Issues
8The DB integration pattern recommended in the official FastAPI documentation, which uses dependencies, leads to deadlocks as the number of concurrent users grows in production environments.
Slow Maintainer Response and PR Review Bottleneck
8The FastAPI maintainer (@tiangolo) is a bottleneck for development; most PRs go months without response, require extensive rework, or remain unmerged despite being high-quality. No delegation of merge permissions limits community contribution.
Local state files without remote backends cause team collaboration and disaster recovery issues
8State files stored locally (default) instead of on remote backends (S3, GCS) prevent team collaboration, create single points of failure, and make disaster recovery impossible. Developers must manually manage state file access.
Poor Performance with Large Data Volumes and Analytics
8PostgreSQL is not optimal for applications requiring real-time or near-real-time analytics. For massive single datasets (billions of rows, hundreds of gigabytes) with frequent joins, queries can take hours. PostgreSQL lacks native columnar storage support, necessitating non-core extensions and increasing architectural complexity.
Sharding fails under high load during chunk migration
8Adding a shard to a MongoDB cluster under heavy load is problematic. MongoDB either migrates chunks so aggressively that it causes DoS conditions on production traffic, or refuses to move chunks at all, making it unsuitable for high-traffic sites with heavy write volumes.
Merge conflicts cause irreversible commit history gaps
8When a developer merges a branch that is significantly behind the milestone branch, some programs can be overwritten, leaving gaps in the commit history that cannot be reversed. Teams of 3-6 or more programmers working on multiple feature branches are especially vulnerable.
Prisma environment variable handling breaks in monorepos and ESM contexts
8Prisma struggles to correctly load `.env` files in monorepo setups, doesn't support NODE_ENV-based `.env` switching, and silently pollutes `process.env` without explicit dotenv usage. Recent versions (6.7.0+) have introduced critical ESM-related module resolution failures across Turborepo, Next.js, Remix, and other frameworks.
PyTorch hardware-specific backend bugs cause failures across MPS, CUDA, and ONNX
8Multiple hardware-specific issues affect PyTorch across different backends: LayerNorm/BatchNorm fail to compile on Apple M4 MPS, Conv2d is slower on macOS without MKLDNN, CUDA CI tests exhibit memory corruption (SIGIOT), and ONNX exports with dynamic inputs regressed between versions. These issues require constant per-platform debugging.
TypeORM migration system unreliability and data loss risk
8TypeORM's migration system produces unpredictable results with documented cases of migrations dropping production tables. The system's inconsistency makes it risky for production use.
CI/CD pipeline failures and environment discrepancies after upgrade
8Existing CI/CD pipelines tuned for previous Next.js versions unexpectedly fail after upgrading to Next.js 16. Local development environments diverge from production servers, creating 'works on my machine' scenarios that are difficult to debug.
torch.compile with dynamic shapes causes crashes, recompilations, and incorrect results
8Using `torch.compile` with dynamic shapes leads to crashes (OverflowError from float-to-int conversion), excessive recompilations when mixing Python scalars with 0-d tensors, and incorrect outputs such as wrong adaptive max pooling results on Apple MPS. These issues significantly hinder adoption of compiled execution paths.
Slow incremental compile times after small code changes
8Developers report that incremental rebuilds after making minor source code changes take significantly longer than expected. Workspace rebuilds trigger full dependent crate recompilation (not incremental across boundaries), and the linking phase always runs from scratch without caching, creating major productivity bottlenecks.
Shared Kernel Isolation False Security in Containers
8Docker containers rely on Linux kernel namespaces and cgroups for isolation rather than hardware virtualization. This creates a false sense of isolation—if a kernel vulnerability exists, all running containers inherit it. Container security is critically dependent on timely kernel updates to mitigate container escape vulnerabilities.
No In-Place Major Version Upgrades
8PostgreSQL cannot simply be restarted on a new major version, because the on-disk data format can change between major releases. Upgrades require `pg_upgrade`, a dump and restore of the entire dataset, or logical replication, along with rigorous application compatibility testing. Delaying upgrades increases complexity and risk: outdated versions miss critical security patches, and routine maintenance turns into a complex, high-risk migration project.
Session management issues and random logouts in authentication
8Third-party authentication solutions (NextAuth.js, Auth.js) integrated with Next.js experience session management problems and unexpected logouts, particularly due to Edge Runtime limitations lacking necessary Node.js APIs.
Sensitive data exposure and authorization complexity
8GraphQL's unified endpoint and flexible query structure can inadvertently expose sensitive data. Without strict authentication and authorization checks at the field level, unauthorized users can query restricted information. Field-level security is complex, error-prone, and can cause entire requests to fail.
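A minimal sketch of a field-level check, written in plain Python rather than against any particular GraphQL library. The `FIELD_ROLES` policy, the `resolve_fields` helper, and the roles are hypothetical; the sketch assumes a deny-by-default policy and returns `None` for a denied field instead of failing the whole request.

```python
from typing import Any, Dict, List, Optional, Set

# Hypothetical policy: which roles may read each field.
FIELD_ROLES: Dict[str, Set[str]] = {
    "id": {"user", "admin"},
    "name": {"user", "admin"},
    "email": {"admin"},
    "salary": {"admin"},
}


def resolve_fields(record: Dict[str, Any], requested: List[str],
                   role: str) -> Dict[str, Optional[Any]]:
    """Resolve only the requested fields the role is allowed to read."""
    result: Dict[str, Optional[Any]] = {}
    for field in requested:
        # Fields missing from the policy are denied by default.
        allowed_roles = FIELD_ROLES.get(field, set())
        result[field] = record.get(field) if role in allowed_roles else None
    return result


employee = {"id": 1, "name": "Ada", "email": "ada@example.com", "salary": 100}

as_user = resolve_fields(employee, ["id", "email", "salary"], role="user")
as_admin = resolve_fields(employee, ["id", "email", "salary"], role="admin")
print(as_user)
print(as_admin)
```

Nulling a denied field keeps the rest of the response usable, at the cost of making a denied field indistinguishable from one that is genuinely empty; a real schema would need to decide which behavior it wants.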
PyTorch has high rate of wrong algorithm implementations causing incorrect results
8Approximately 12% of PyTorch bugs stem from incorrect algorithm implementations, a rate four times higher than TensorFlow's 3%. This means developers may unknowingly get silently wrong results from core framework operations.