Pains

2403 pains collected

Category:

Tech:

Severity:

Assumption-Heavy Architecture Generation

Claude Code fills specification gaps with reasonable but contextually wrong assumptions (e.g., OAuth2 instead of required SAML SSO, individual auth instead of organization-based). The generated code looks correct in isolation but creates unmaintainable architectures that don't match actual business requirements.

architectureClaude Code

File Descriptor Exhaustion Limits Scalability

NGINX's scalability is constrained by the operating system's maximum file descriptors (FDs), which commonly defaults to 1024. As a reverse proxy, NGINX consumes at least 2 FDs per request (client + upstream server), causing rapid FD depletion and hard connection failures at high concurrency if not manually increased via `worker_rlimit_nofile`.

configNGINX

Resource refactoring is destructive and risky

Renaming or reorganizing resources in Terraform code causes them to be destroyed and recreated rather than updated, risking catastrophic downtime and data loss for stateful resources like databases. There is no native refactoring capability.

dxTerraform

Proxy Buffering Misconfiguration Destroys Performance

Disabling proxy buffering with `proxy_buffering off` forces NGINX worker processes to handle upstream responses in blocking, synchronous fashion, completely subverting the non-blocking architecture. This typically results in slower transfers, prolonged blocking times, and also disables caching, rate limiting, and request queuing.

configNGINX

Converting PyTorch distributed checkpoints to Hugging Face format is extremely painful

There is no official or well-supported path for converting PyTorch distributed training checkpoints to Hugging Face Transformers-compatible checkpoints. NVIDIA has deprioritized this in favor of their NeMo framework, leaving the community without reliable tooling for this common workflow.

migrationPyTorchHugging Face Transformers

S3 event notifications are unreliable and not guaranteed to trigger

S3 event triggers (e.g., for Lambda invocation) may fail silently, requiring developers to implement separate recovery mechanisms. This creates unpredictable behavior in event-driven architectures.

compatibilityAmazon S3AWS Lambda

PyTorch's Python-centric design limits production deployment performance and interoperability

PyTorch's tight coupling with the Python runtime introduces GIL-related parallelism constraints, lower execution speed compared to C++ or Java, and poor interoperability with non-Python production stacks. This makes it difficult to meet low-latency, high-throughput, and multi-language requirements in real production systems.

deployPyTorchPythonTorchScript

Production Database Concurrency Issues

The official FastAPI documentation's recommended DB integration pattern using dependencies leads to deadlocks when handling more concurrent users in production environments.

compatibilityFastAPI

Slow Maintainer Response and PR Review Bottleneck

The FastAPI maintainer (@tiangolo) is a bottleneck for development; most PRs go months without response, require extensive rework, or remain unmerged despite being high-quality. No delegation of merge permissions limits community contribution.

ecosystemFastAPI

Local state files without remote backends cause team collaboration and disaster recovery issues

State files stored locally (default) instead of on remote backends (S3, GCS) prevent team collaboration, create single points of failure, and make disaster recovery impossible. Developers must manually manage state file access.

storageTerraformstate backendsS3

Poor Performance with Large Data Volumes and Analytics

PostgreSQL is not optimal for applications requiring real-time or near-real-time analytics. For massive single datasets (billions of rows, hundreds of gigabytes) with frequent joins, queries can take hours. PostgreSQL lacks native columnar storage support, necessitating non-core extensions and increasing architectural complexity.

performancePostgreSQL

Sharding fails under high load during chunk migration

Adding a shard to a MongoDB cluster under heavy load is problematic. MongoDB either migrates chunks so aggressively that it causes DoS conditions on production traffic, or refuses to move chunks at all, making it unsuitable for high-traffic sites with heavy write volumes.

deployMongoDB

Merge conflicts cause irreversible commit history gaps

When a developer merges a branch significantly behind the milestone branch, a selection of programs can be overwritten with commit history gaps that are not reversible. Large teams (3-6+ programmers) with multiple feature branches are especially vulnerable.

dataGit

Prisma environment variable handling breaks in monorepos and ESM contexts

Prisma struggles to correctly load `.env` files in monorepo setups, doesn't support NODE_ENV-based `.env` switching, and silently pollutes `process.env` without explicit dotenv usage. Recent versions (6.7.0+) have introduced critical ESM-related module resolution failures across Turborepo, Next.js, Remix, and other frameworks.

configPrismaTurborepoNext.js+2

PyTorch hardware-specific backend bugs cause failures across MPS, CUDA, and ONNX

Multiple hardware-specific issues affect PyTorch across different backends: LayerNorm/BatchNorm fail to compile on Apple M4 MPS, Conv2d is slower on macOS without MKLDNN, CUDA CI tests exhibit memory corruption (SIGIOT), and ONNX exports with dynamic inputs regressed between versions. These issues require constant per-platform debugging.

compatibilityPyTorchCUDAONNX+1

TypeORM migration system unreliability and data loss risk

TypeORM's migration system produces unpredictable results with documented cases of migrations dropping production tables. The system's inconsistency makes it risky for production use.

migrationTypeORM

CI/CD pipeline failures and environment discrepancies after upgrade

Existing CI/CD pipelines tuned for previous Next.js versions unexpectedly fail after upgrading to Next.js 16. Local development environments diverge from production servers, creating 'works on my machine' scenarios that are difficult to debug.

deployNext.js

torch.compile with dynamic shapes causes crashes, recompilations, and incorrect results

Using `torch.compile` with dynamic shapes leads to crashes (OverflowError from float-to-int conversion), excessive recompilations when mixing Python scalars with 0-d tensors, and incorrect outputs such as wrong adaptive max pooling results on Apple MPS. These issues significantly hinder adoption of compiled execution paths.

buildPyTorch

Slow incremental compile times after small code changes

Developers report that incremental rebuilds after making minor source code changes take significantly longer than expected. Workspace rebuilds trigger full dependent crate recompilation (not incremental across boundaries), and the linking phase always runs from scratch without caching, creating major productivity bottlenecks.

buildRustCargo

Shared Kernel Isolation False Security in Containers

Docker containers rely on Linux kernel namespaces and cgroups for isolation rather than hardware virtualization. This creates a false sense of isolation—if a kernel vulnerability exists, all running containers inherit it. Container security is critically dependent on timely kernel updates to mitigate container escape vulnerabilities.

securityDocker

No In-Place Major Version Upgrades

PostgreSQL does not support in-place major version upgrades. Upgrades require either dumping and restoring the entire dataset or setting up logical replication, with rigorous application compatibility testing required. Delaying upgrades increases complexity and risk, as outdated versions miss critical security patches, transforming routine maintenance into a complex, high-risk migration project.

migrationPostgreSQL

Session management issues and random logouts in authentication

Third-party authentication solutions (NextAuth.js, Auth.js) integrated with Next.js experience session management problems and unexpected logouts, particularly due to Edge Runtime limitations lacking necessary Node.js APIs.

authNext.jsNextAuth.jsAuth.js

Sensitive data exposure and authorization complexity

GraphQL's unified endpoint and flexible query structure can inadvertently expose sensitive data. Without strict authentication and authorization checks at the field level, unauthorized users can query restricted information. Field-level security is complex, error-prone, and can cause entire requests to fail.

securityGraphQL

PyTorch has high rate of wrong algorithm implementations causing incorrect results

Approximately 12% of PyTorch bugs stem from incorrect algorithm implementations, a rate four times higher than TensorFlow's 3%. This means developers may unknowingly get silently wrong results from core framework operations.

otherPyTorch

1…10 11 12 13 14…101