blog.nginx.org
The Complex Dance of Lua and NGINX: Power, Pitfalls ...
Excerpt
NGINX, a high-performance web server and reverse proxy, has evolved significantly with the integration of Lua via OpenResty. This combination enables dynamic request handling, flexible routing, and advanced features that static NGINX configurations alone cannot achieve. However, embedding Lua scripts into NGINX’s event-driven architecture introduces subtle complexities and risks that operators and developers must understand to avoid performance degradation, instability, and operational headaches.

… header filter, body filter, and logging). This incomplete termination can lead to inconsistent logging or leaking internal headers.

· **Variable scope and timing issues:** Variables set in one phase may be unavailable or stale in later phases if timing and scope are misunderstood. Logic that relies on a variable set in a different phase (e.g., `set_by_lua*` vs. `access_by_lua*`) can leave NGINX variables (`$var`) empty or holding stale values, causing incorrect routing, logging, or access decisions.

…

· Lua code must be strictly **non-blocking** to maintain NGINX’s event-driven performance. Standard Lua or C library calls that perform blocking I/O or OS calls (e.g., `os.execute()`, `io.popen()`, or slow file I/O) stall the entire NGINX worker process, causing severe performance degradation, high latency, and request timeouts for every concurrent request that worker handles. `os.time()` itself does not block, although OpenResty's `ngx.time()` is the usual replacement because it reads NGINX's cached time.

…

## Kubernetes ingress-nginx and Lua: Dynamic Configuration Risks

The popular Kubernetes ingress controller ingress-nginx leverages Lua extensively for dynamic backend updates and routing logic.
This dynamic approach introduces additional challenges:

· Bugs in Lua scripts or shared dictionary (**ngx.shared.DICT**) management can break traffic routing, causing requests to be sent to unavailable or stale pods. Failure to implement a **TTL (Time-To-Live) or proper eviction policy** for keys in the dictionary causes it to fill up, resulting in **Out-of-Memory (OOM) errors** or cache thrashing.
· Although Lua enables dynamic configuration without full NGINX reloads, some changes still require reloads, which can cause brief connection draining or latency spikes.
· Frequent dynamic updates driven by Lua can cause the NGINX master process to fail to reap worker child processes properly, resulting in **zombie processes** accumulating on the host OS. These zombies consume system resources and complicate process management.

…

· **Blocking the event loop** with non-optimized Lua or external calls leads to massive latency spikes and request timeouts.
· **Lua-based load balancing logic**, particularly under high pod counts, can result in a severe traffic imbalance where a small subset of backend pods receives an overwhelming majority of the traffic, creating “hot pods” and “cold pods.”
· **Zombie processes** from improper worker reaping add operational complexity and resource waste. The accumulation of zombie processes occurs when the NGINX master process fails to properly reap worker child processes, often triggered by frequent dynamic endpoint updates driven by Lua.

## Operational Complexity and Security Concerns

· Advanced features implemented via Lua snippets in annotations lead to configuration sprawl, drift, and audit difficulties.
· The injection of Lua or NGINX configuration via user-supplied annotations has historically introduced critical remote code execution (RCE) vulnerabilities.
· Configuration synchronization issues sometimes require manual intervention to delete and recreate Kubernetes Services and Ingresses.
## Ecosystem Management Risks

### Third-Party Module Instability and Version Control

The rapid pace of change in the Lua module ecosystem makes stability harder to maintain. Errors rooted in third-party Lua modules are a known cause of gradual, unbounded memory growth that ends in OOM crashes. Without strict control over module versions and dependencies, operators face an increased risk of subtle instability that is hard to debug.

…

A series of vulnerabilities discovered in 2025 demonstrated that Lua-based annotation parsers remained vulnerable to injection attacks even after the snippet restrictions. The **auth-url**, **auth-tls-match-cn**, and mirror UID parsers failed to properly sanitize user inputs before incorporating them into NGINX/Lua configurations. Attackers could craft malicious Ingress annotations that, when processed by the admission controller’s Lua-based validation logic, would inject arbitrary directives into the NGINX configuration template.

…

## Conclusion

Lua integration within NGINX, especially in Kubernetes ingress controllers like ingress-nginx, unlocks powerful dynamic capabilities but also introduces a complex set of challenges. Maintaining a stable deployment depends on understanding the nuances of NGINX phases, Lua’s concurrency model, and the operational risks related to synchronization and state management; on avoiding blocking the event loop (the “Cardinal Sin”); and on preventing resource exhaustion from memory leaks or zombie processes. Furthermore, the operational overhead from annotation sprawl and the inherent security risks of configuration injection (such as Remote Code Execution vulnerabilities) require careful mitigation to ensure system integrity.
Related Pain Points
Lua-Based Annotation Parsers Vulnerable to Injection Attacks
(9) Lua-based annotation parsers in ingress-nginx (e.g., `auth-url`, `auth-tls-match-cn`, mirror UID parsers) fail to properly sanitize user inputs before incorporating them into NGINX/Lua configurations. Attackers can craft malicious Ingress annotations that inject arbitrary directives into the NGINX configuration template via the admission controller's validation logic.
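One way to picture the missing sanitization step is an allowlist-style check that runs before an annotation value is ever substituted into a configuration template. The sketch below is written in Python so it can be run standalone; it is not the ingress-nginx implementation, and the function name `is_safe_auth_url`, the length limit, and the exact set of rejected characters are assumptions chosen for illustration.

```python
from urllib.parse import urlsplit

# Characters that can close or open an NGINX directive or block, plus quoting
# and escape characters. Hypothetical list, chosen for illustration only.
FORBIDDEN = set(';{}"\'\\`')


def is_safe_auth_url(value: str) -> bool:
    """Return True only if `value` looks like a plain http(s) URL."""
    if not value or len(value) > 2048:
        return False
    # Reject whitespace, control characters, and directive metacharacters
    # before any parsing, so a newline cannot smuggle in a second directive.
    for ch in value:
        if ch.isspace() or ord(ch) < 0x20 or ord(ch) == 0x7F or ch in FORBIDDEN:
            return False
    try:
        parts = urlsplit(value)
    except ValueError:
        return False
    return parts.scheme in ("http", "https") and bool(parts.hostname)
```

A value that fails the check would be rejected at admission time instead of being escaped and passed through, which keeps the template rendering code free of quoting logic.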
Lua Blocking Operations Halt Entire Worker Process
(9) Lua code embedded in NGINX must be strictly non-blocking to maintain event-driven performance. Standard Lua or C library calls that perform blocking I/O or OS calls (e.g., `os.execute()`, `io.popen()`, or slow file I/O) halt the entire NGINX worker, causing severe performance degradation, high latency, and request timeouts for all concurrent requests on that worker. `os.time()` itself does not block, although OpenResty's `ngx.time()` is the usual replacement because it reads NGINX's cached time.
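The effect of a single blocking call on everything else in the same worker is easy to reproduce in a small model. The sketch below uses Python's `asyncio` as a stand-in for an NGINX worker's event loop; it is an analogy, not OpenResty code, and all function names are hypothetical. In OpenResty the cooperative counterparts would be calls such as `ngx.sleep` and cosockets (`ngx.socket.tcp`).

```python
import asyncio
import time


async def heartbeat(counter, stop):
    """Stands in for the other requests multiplexed on the same worker."""
    while not stop.is_set():
        counter[0] += 1
        await asyncio.sleep(0.01)


async def blocking_handler():
    time.sleep(0.2)  # blocking call: the loop cannot run anything else


async def cooperative_handler():
    await asyncio.sleep(0.2)  # yields to the loop while waiting


async def ticks_during(handler):
    """Count how often the heartbeat ran while `handler` was in progress."""
    counter, stop = [0], asyncio.Event()
    task = asyncio.create_task(heartbeat(counter, stop))
    await asyncio.sleep(0)  # give the heartbeat a chance to start
    before = counter[0]
    await handler()
    ticks = counter[0] - before
    stop.set()
    await task
    return ticks


def compare():
    """Return (ticks while blocked, ticks while cooperating)."""
    blocked = asyncio.run(ticks_during(blocking_handler))
    cooperative = asyncio.run(ticks_during(cooperative_handler))
    return blocked, cooperative
```

Calling `compare()` shows the heartbeat making no progress at all during the blocking handler, while it keeps ticking during the cooperative one. The same starvation applies to every request sharing a worker with a blocking Lua call.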
Lua-Based Load Balancing Creates Hot Pod/Cold Pod Imbalance
(7) Lua-based load balancing logic in Kubernetes ingress-nginx, particularly under high pod counts, results in severe traffic imbalance where a small subset of backend pods receives an overwhelming majority of traffic, creating 'hot pods' and 'cold pods' and degrading overall cluster performance.
Caching Keys Without TTL Causes Unbounded Memory Growth
(7) Storing cache keys without expiration causes indefinite accumulation over time, leading to unbounded memory growth, increased eviction pressure, and out-of-memory errors. Keys added without TTLs because "data never changes" persist even after assumptions change, causing unpredictable eviction behavior.
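A minimal sketch of the alternative is a cache that refuses entries without an expiry and also enforces a size bound. It is written in Python for illustration; the class name `TTLCache` and its parameters are hypothetical, and the injectable `clock` exists only to make the behaviour easy to test. In OpenResty itself, `ngx.shared.DICT:set` accepts an optional expiry time in seconds as its third argument.

```python
import time
from collections import OrderedDict


class TTLCache:
    """Size-bounded cache in which every entry must carry an expiry."""

    def __init__(self, max_entries, clock=time.monotonic):
        self.max_entries = max_entries
        self.clock = clock
        self._items = OrderedDict()  # key -> (value, expires_at)

    def set(self, key, value, ttl):
        if ttl <= 0:
            raise ValueError("ttl must be positive; entries without expiry are rejected")
        self._items.pop(key, None)  # re-inserting moves the key to the newest slot
        self._items[key] = (value, self.clock() + ttl)
        # Drop the least recently written entries once the bound is exceeded.
        while len(self._items) > self.max_entries:
            self._items.popitem(last=False)

    def get(self, key, default=None):
        entry = self._items.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._items[key]  # lazily remove expired entries on read
            return default
        return value

    def __len__(self):
        return len(self._items)
```

Making `ttl` a required argument is the design point: the "data never changes" shortcut then has to be written down as an explicit, very long TTL rather than happening by omission.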
Third-Party Lua Module Memory Leaks Cause Gradual OOM Crashes
(7) Errors in third-party Lua modules cause gradual, indefinite memory consumption increases leading to out-of-memory crashes. Without strict control over module versions and dependencies, operators face increased risk of subtle instability that is hard to debug.
Frequent Dynamic Updates Cause Zombie Process Accumulation
(7) Frequent dynamic endpoint updates driven by Lua in ingress-nginx cause the NGINX master process to fail to properly reap worker child processes, resulting in zombie processes accumulating on the host OS. These zombies consume system resources and complicate process management.
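For readers unfamiliar with the term, a zombie is a child process that has exited but whose exit status the parent has not yet collected with `wait` or `waitpid`. The sketch below shows the generic POSIX reaping pattern in Python. It does not reproduce how the NGINX master process manages its workers, and `reap_exited_children` is a hypothetical helper name.

```python
import os


def reap_exited_children():
    """Collect the status of every child that has already exited.

    Returns a list of (pid, raw_wait_status) pairs. POSIX only.
    """
    reaped = []
    while True:
        try:
            # -1 means "any child"; WNOHANG returns immediately instead of
            # blocking when no child has exited yet.
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children left
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append((pid, status))
    return reaped
```

A parent that runs a loop like this periodically, or in response to `SIGCHLD`, leaves no zombies behind. A parent that skips it accumulates one process-table entry per exited child.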
Lua Phase Timing and Variable Scope Cause Incorrect Routing
(7) Variables set in one Lua phase may not be available or may be stale in later phases if timing and scope are misunderstood. Logic relying on variables set in different phases (e.g., `set_by_lua*` vs. `access_by_lua*`) results in NGINX variables being empty or holding stale values, causing incorrect routing, logging, or access decisions.
Diverse Deployment Environments Create Configuration and Management Sprawl
(6) Managing applications across diverse deployment environments (AWS, Azure, on-premise, Kubernetes, serverless) requires different NGINX configurations, tools, and operational knowledge. This diversity leads to complexity sprawl, configuration drift, and increased operational toil.
Manual Intervention Required for Configuration Synchronization Issues
(5) Configuration synchronization issues in Kubernetes ingress-nginx sometimes require manual intervention to delete and recreate Services and Ingresses, creating operational toil and potential downtime.
Incomplete Lua Phase Termination Leaks Internal Headers
(5) Incomplete termination of Lua execution phases can lead to inconsistent logging or leaking of internal NGINX headers, causing information disclosure or incorrect log entries.