www.ilert.com

Who Was Affected By The Neon...

3/10/2026 (Updated 3/30/2026)

Excerpt

# Neon: Kubernetes IP exhaustion disrupted services

Neon experienced outages caused by Kubernetes IP exhaustion, impacting service availability. Explore what went wrong, Neon's response, key actions taken, and lessons learned to improve reliability.

...

On May 16 and May 19, 2025, Neon experienced two outages totalling 5.5 hours in the AWS us-east-1 region. Customers were unable to start or create inactive databases, though active databases remained unaffected. The incidents resulted from exhausted IP addresses in Kubernetes subnets, triggered by control plane overload and AWS CNI misconfigurations. Immediate mitigations included reconfiguring IP allocation parameters and scaling prewarmed compute pools.

...

The first incident began at 14:13 UTC on May 16, 2025, when customers started experiencing failures to activate databases. The second incident occurred on May 19, 2025, at 13:17 UTC, triggered by reverting the previous fixes.

…

## Who was affected by the Neon outage, and how bad was it?

Customers using Neon databases with scale-to-zero configurations in AWS us-east-1 were directly impacted. Users couldn't activate or create new inactive databases, disrupting development workflows and CI/CD processes.

…

## What patterns did the Neon outage reveal?

The outage revealed recurring risks in scaled infrastructure systems:

- IP exhaustion acts as a hidden infrastructure bottleneck.
- Configuration regressions were introduced during incident remediation.
- Kubernetes clusters exceeded their designed pod limits under dynamic load conditions.

## Quick summary

On May 16 and May 19, 2025, Neon faced two outages totalling 5.5 hours due to IP exhaustion in Kubernetes subnets in AWS us-east-1. Users were unable to activate databases with autoscaling configurations. Neon responded with rapid mitigations and transparent, though brief, communication.
The incidents underscored the importance of robust infrastructure safeguards, effective configuration management, and clear, timely updates during critical incidents.
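
To make the "hidden bottleneck" concrete, here is a minimal sketch of how a subnet's IP headroom can be estimated. It is an illustration under stated assumptions, not Neon's tooling: the function names, the subnet size, and the node and pod counts are all hypothetical, and the source does not say which IP allocation parameters Neon changed. The model is deliberately simplified to one primary IP per node, one IP per pod, and a fixed number of idle "warm" IPs held per node. The real AWS VPC CNI allocates addresses in ENI-sized or prefix-sized chunks, so actual consumption can be higher.

```python
import ipaddress

# AWS reserves 5 addresses in every VPC subnet
# (network, VPC router, DNS, future use, broadcast).
AWS_RESERVED_PER_SUBNET = 5


def usable_ips(cidr: str) -> int:
    """Number of addresses in the subnet that can be assigned to workloads."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


def ips_held(nodes: int, pods_per_node: int, warm_ips_per_node: int) -> int:
    """IPs consumed under the simplified model (an assumption of this sketch):
    one primary IP per node, one IP per pod, plus idle warm IPs per node."""
    return nodes * (1 + pods_per_node + warm_ips_per_node)


def headroom(cidr: str, nodes: int, pods_per_node: int, warm_ips_per_node: int) -> int:
    """Remaining assignable IPs; a negative value means the subnet is exhausted."""
    return usable_ips(cidr) - ips_held(nodes, pods_per_node, warm_ips_per_node)


if __name__ == "__main__":
    # Hypothetical numbers: a /20 subnet, 100 nodes, 30 pods per node.
    for warm in (10, 2):
        print(f"warm IPs per node = {warm}: headroom = "
              f"{headroom('10.0.0.0/20', 100, 30, warm)}")
```

With these made-up numbers, holding 10 warm IPs per node leaves the /20 subnet short by 9 addresses, while holding 2 leaves 791 free. This shows why a warm-pool setting can exhaust a subnet even when the number of running pods has not changed, and why a headroom check of this kind is worth alerting on before new pods start failing.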

Source URL

https://www.ilert.com/postmortems/neon-outage-may-2025

Related Pain Points