Pains

2403 pains collected

Slow Maintainer Response and PR Review Bottleneck

The FastAPI maintainer (@tiangolo) is a development bottleneck: most PRs go months without a response, require extensive rework, or remain unmerged despite being high quality. Because merge permissions are not delegated, community contribution is limited.

Category: ecosystem | Tech: FastAPI

Table corruption issues in PostgreSQL

PostgreSQL has had table corruption problems that compromise data integrity. They were significant enough to motivate organizations such as Uber to evaluate alternative databases.

Category: storage | Tech: PostgreSQL

Compilation failures without error reporting

The build toolchain reports a successful compilation while silently omitting code, without raising any error. Developers see "successful" builds that are actually missing critical pieces, which makes debugging extremely difficult and leads to runtime failures.

Category: build | Tech: Next.js

S3 metadata replication consistency issues with dependent objects

When S3 objects are replicated with Replication Time Control (RTC) guarantees, metadata objects that other objects reference may be missing from the destination. Queries from engines such as Spark or PySpark then fail because they cannot find the referenced files or objects.

Category: compatibility | Tech: Amazon S3, Spark, PySpark
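Because replication works object by object, a replica can contain an object whose references point at keys that have not arrived. A minimal sketch of a check a pipeline might run before letting Spark read the replica, assuming you can list the destination keys and know which keys each metadata object references; the function name and the key layout in the example are hypothetical.

```python
def dangling_references(replica_keys, references):
    """Return sorted (referrer, missing_target) pairs found in a replica.

    replica_keys: set of object keys present at the destination bucket.
    references:   mapping of object key -> list of keys it points to
                  (for example, a table metadata file listing data files).
    """
    missing = []
    for referrer, targets in references.items():
        if referrer not in replica_keys:
            # The referrer itself has not been replicated, so nothing can read it yet.
            continue
        for target in targets:
            if target not in replica_keys:
                missing.append((referrer, target))
    return sorted(missing)


replica = {"table/metadata/v2.json", "table/data/part-0.parquet"}
refs = {"table/metadata/v2.json": ["table/data/part-0.parquet", "table/data/part-1.parquet"]}
print(dangling_references(replica, refs))
```

Here `v2.json` has arrived but `part-1.parquet` has not, so a query planned from that metadata would fail; an empty result means every replicated referrer is complete.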

Shared Kernel Isolation False Security in Containers

Docker containers rely on Linux kernel namespaces and cgroups for isolation rather than on hardware virtualization. This creates a false sense of isolation: if the shared kernel has a vulnerability, every running container inherits it. Container security therefore depends critically on timely kernel updates that close container-escape vulnerabilities.

Category: security | Tech: Docker

GitHub Actions poor support for specialized workloads (AI/ML, testing, data pipelines)

GitHub Actions is a general-purpose platform without optimizations for domain-specific tasks. AI workflows need GPUs and long-running, checkpointed jobs; testing needs centralized reporting and test-specific diagnostics; data pipelines need specialized optimization. The generalist platform provides none of these.

Category: architecture | Tech: GitHub Actions, AI agents, machine learning

Production Database Concurrency Issues

The database integration pattern recommended in the official FastAPI documentation, which provides sessions through dependencies, leads to deadlocks in production as the number of concurrent users grows.

Category: compatibility | Tech: FastAPI

Premature Microservices Adoption Creates Operational Complexity

Teams adopt microservices before they understand the business domain, which results in distributed transactions, data consistency issues, painful debugging, and unnecessary operational complexity that blocks scalability instead of enabling it.

Category: architecture | Tech: Microservices, Distributed Systems

Proxy Buffering Misconfiguration Destroys Performance

Disabling proxy buffering with `proxy_buffering off` makes NGINX pass upstream responses to the client synchronously, as they are received, instead of buffering them. A slow client then keeps the upstream connection busy for the entire transfer, which undercuts the benefit of NGINX's non-blocking architecture. This typically results in slower transfers and longer-held upstream connections, and it also disables caching, rate limiting, and request queuing.

Category: config | Tech: NGINX

Docker build reproducibility issues with dependency version changes

Docker builds that pull dependencies from the public internet at build time cannot guarantee reproducibility over time. Subsequent builds may pull different dependency versions, and if an exact version is no longer available, the build fails and blocks deployments.

Category: build | Tech: Docker
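Version drift happens when a dependency specifier allows more than one version, so a later build resolves to something newer. A minimal sketch that flags such lines, assuming a Python project whose Dockerfile installs from a pip requirements file; the function name is hypothetical and the parsing covers only common requirement-line forms, not the full specification.

```python
import re

# An exact pin: name, optional [extras], "==", and a version with no wildcard.
_EXACT_PIN = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?\s*==\s*[^\s,*]+")


def unpinned_requirements(lines):
    """Return names of requirements not pinned to a single exact version."""
    unpinned = []
    for raw in lines:
        line = raw.split("#", 1)[0]             # drop comments
        line = line.split(";", 1)[0]            # drop environment markers
        line = line.split(" --", 1)[0].strip()  # drop per-line options such as --hash
        if not line or line.startswith("-"):
            continue  # blank line, or a global option such as "-r base.txt"
        if not _EXACT_PIN.fullmatch(line):
            # Report just the project name, cut at the first specifier character.
            unpinned.append(re.split(r"[<>=!~\[\s]", line, maxsplit=1)[0])
    return unpinned


reqs = ["requests==2.31.0", "flask>=2.0", "numpy", "django==4.*", "-r base.txt"]
print(unpinned_requirements(reqs))
```

Pinning makes every build resolve the same versions, but it does not help when a pinned version disappears upstream; that failure mode still needs a mirror or vendored copy.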

Scaling custom admin solutions causes cascading failures

Custom admin panels that work for small teams degrade rapidly as the user base or data grows, leading to performance issues, broken queries, and unexpected feature failures. Significant rebuilding is often required if scalability wasn't planned from day one.

Category: performance | Tech: PostgreSQL

Platform outages during critical deployments

Vercel experiences regional outages that cause 500 errors on production sites. For a premium service marketed to businesses, these reliability issues are concerning, particularly when they coincide with client campaigns or product launches.

Category: deploy | Tech: Vercel

PyTorch's Python-centric design limits production deployment performance and interoperability

PyTorch's tight coupling with the Python runtime introduces GIL-related parallelism constraints, lower execution speed compared to C++ or Java, and poor interoperability with non-Python production stacks. This makes it difficult to meet low-latency, high-throughput, and multi-language requirements in real production systems.

Category: deploy | Tech: PyTorch, Python, TorchScript

Converting PyTorch distributed checkpoints to Hugging Face format is extremely painful

There is no official or well-supported path for converting PyTorch distributed training checkpoints to Hugging Face Transformers-compatible checkpoints. NVIDIA has deprioritized this in favor of their NeMo framework, leaving the community without reliable tooling for this common workflow.

Category: migration | Tech: PyTorch, Hugging Face Transformers

File Descriptor Exhaustion Limits Scalability

NGINX's scalability is constrained by the operating system's per-process limit on open file descriptors (FDs), which commonly defaults to 1024. As a reverse proxy, NGINX uses at least 2 FDs per proxied request (one for the client connection, one for the upstream server), so FDs run out quickly and connections fail outright at high concurrency unless the limit is raised manually, for example with `worker_rlimit_nofile`.

Category: config | Tech: NGINX
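The arithmetic behind this limit is simple: with a 1024-FD limit and 2 FDs per proxied request, one worker tops out at about 512 concurrent proxied requests, before counting FDs it needs for anything else. A minimal sketch of that calculation; `reserved_fds` is a hypothetical allowance for listening sockets, log files, and similar, and the `resource` module used to read the current limit is Unix-only.

```python
FDS_PER_PROXIED_REQUEST = 2  # one client socket + one upstream socket


def max_concurrent_proxied_requests(fd_limit, reserved_fds=0):
    """Rough upper bound on simultaneous proxied requests for one worker."""
    usable = max(fd_limit - reserved_fds, 0)
    return usable // FDS_PER_PROXIED_REQUEST


if __name__ == "__main__":
    import resource  # Unix-only standard-library module

    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        print("soft FD limit is unlimited")
    else:
        bound = max_concurrent_proxied_requests(soft)
        print(f"soft FD limit {soft}: at most ~{bound} proxied requests per worker")
```

For example, `max_concurrent_proxied_requests(1024)` gives 512, and reserving 24 FDs for other uses lowers it to 500.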

PyTorch hardware-specific backend bugs cause failures across MPS, CUDA, and ONNX

Multiple hardware-specific issues affect PyTorch across different backends: LayerNorm/BatchNorm fail to compile on Apple M4 MPS, Conv2d is slower on macOS without MKLDNN, CUDA CI tests exhibit memory corruption (SIGIOT), and ONNX exports with dynamic inputs regressed between versions. These issues require constant per-platform debugging.

Category: compatibility | Tech: PyTorch, CUDA, ONNX (+1)

DynamoDB cost explosion for fast-growing datasets

As datasets grow, DynamoDB automatically adds partitions (each holds at most 10 GB) but does not increase the table's total provisioned throughput, which is divided among the partitions. Each partition's share therefore shrinks, forcing continuous throughput increases to maintain query performance and causing costs to multiply.

Category: config | Tech: DynamoDB, AWS
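A worked version of this dilution, as a minimal sketch under simplifying assumptions: provisioned read capacity (RCU) is split evenly across partitions, and partition count is driven only by the 10 GB size cap. Real DynamoDB also splits partitions for throughput, and adaptive capacity can shift unused capacity to busy partitions, so treat the numbers as illustrative. The function names are hypothetical.

```python
import math

MAX_PARTITION_GB = 10  # per-partition size cap given in the text


def partitions_for(table_size_gb):
    """Partitions needed to hold the data, counting only the size cap."""
    return max(1, math.ceil(table_size_gb / MAX_PARTITION_GB))


def per_partition_rcu(total_rcu, table_size_gb):
    """Each partition's share of read capacity under an even split."""
    return total_rcu / partitions_for(table_size_gb)


def total_rcu_needed(per_partition_target, table_size_gb):
    """Table-level RCU required to keep a given per-partition share."""
    return per_partition_target * partitions_for(table_size_gb)


print(per_partition_rcu(1000, 5))    # 1 partition
print(per_partition_rcu(1000, 100))  # 10 partitions
print(total_rcu_needed(1000, 100))
```

Under this model, growing a table from 5 GB to 100 GB at a fixed 1000 RCU cuts each partition's share from 1000 to 100 RCU; restoring the original per-partition rate means provisioning 10000 RCU, a tenfold increase.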

Vapor Mode runtime compatibility challenges

Vapor Mode is an entirely new Vue runtime designed for performance, but ensuring consistent behavior between Vapor Mode and other modes is difficult. Implementing performance optimizations while maintaining compatibility is time-consuming and complex.

Category: compatibility | Tech: Vue

Gemini API key approval stuck in black box for weeks

Developers face indefinite approval delays (weeks or longer) for API key requests with opaque rejection messages providing no actionable feedback. The approval process lacks status updates, timelines, or clear requirements, causing developers to abandon Gemini for OpenAI or Anthropic.

Category: dx | Tech: Gemini API

torch.compile with dynamic shapes causes crashes, recompilations, and incorrect results

Using `torch.compile` with dynamic shapes leads to crashes (OverflowError from float-to-int conversion), excessive recompilations when mixing Python scalars with 0-d tensors, and incorrect outputs such as wrong adaptive max pooling results on Apple MPS. These issues significantly hinder adoption of compiled execution paths.

Category: build | Tech: PyTorch

Resource refactoring is destructive and risky

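By default, renaming or reorganizing resources in Terraform code makes Terraform destroy and recreate them rather than update them, risking catastrophic downtime and data loss for stateful resources such as databases. Native refactoring support is limited: `terraform state mv` and, since Terraform 1.1, `moved` blocks can preserve a resource across a rename, but each rename must be recorded by hand.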

Category: dx | Tech: Terraform
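Terraform 1.1 and later accept `moved` blocks, which tell Terraform that a resource at an old address now lives at a new one, so the plan updates state instead of destroying and recreating the resource. When a refactor renames many resources, writing those blocks can be scripted. A minimal sketch, assuming you already have a mapping from old to new resource addresses; the helper and the example addresses are hypothetical.

```python
def render_moved_blocks(renames):
    """Render one Terraform `moved` block per old -> new address pair."""
    blocks = []
    for old, new in renames.items():
        if old == new:
            continue  # nothing moved; emitting a block would be pointless
        blocks.append(f"moved {{\n  from = {old}\n  to   = {new}\n}}")
    return "\n\n".join(blocks)


print(render_moved_blocks({"aws_db_instance.main": "aws_db_instance.primary"}))
```

The output goes into a `.tf` file alongside the renamed resources; running `terraform plan` afterwards should then show a move rather than a destroy-and-create for each listed resource.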

Payment Data Security and Compliance Implementation

Integrating Stripe requires implementing robust security measures to protect sensitive payment information and comply with strict regulations. Developers must navigate PCI compliance, fraud prevention, and secure data handling without clear best practices, creating risk of data breaches.

Category: security | Tech: Stripe

Redis persistence mechanisms are not foolproof for data protection

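Redis persistence through RDB snapshots and the AOF (append-only file) cannot fully prevent data loss during crashes or unexpected failures: an RDB snapshot loses every write made since it was taken, and AOF with the default `appendfsync everysec` policy can lose roughly the last second of writes. Neither mechanism is reliable for mission-critical workloads where data loss is unacceptable, and the risk is greater still when persistence is disabled for performance.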

Category: storage | Tech: Redis

Global write lock kills performance under heavy write loads

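Early MongoDB releases required a global write lock for every write operation. Under write-heavy loads this severely degraded performance, making those versions unsuitable for applications with balanced or write-heavy read/write ratios. MongoDB 2.2 narrowed locking to the database level, and the WiredTiger storage engine (the default since MongoDB 3.2) provides document-level concurrency.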

Category: performance | Tech: MongoDB

1…11 12 13 14 15…101