news.ycombinator.com

Docker Considered Harmful (2025) - Hacker News

9/9/2025, updated 9/17/2025
https://news.ycombinator.com/item?id=45177566

If anything, it's a problem with the design of UNIX's process management, inherited thoughtlessly, which Docker decided not to deal with on its own. Why does there have to be a whole special, unkillable process whose only job is to call wait(2) in an infinite loop? … Essentially, the work is pushed to the scheduler, but the logic itself lives in user space at the cost of PID space pollution.

cyphar 7 days ago

The funny thing is that there is a way to opt out of zombie reaping as pid1 or a subreaper -- use sigaction to set the disposition of SIGCHLD to SIG_IGN (and so it really isn't that hard on the kernel side). Unfortunately this opts you out of all child death events, which means process managers can't use it. …

IMHO the bigger issue with Docker and pid1 is that pid1 signal semantics (for instance, most signals are effectively SIG_IGN by default) are different from those of other processes, and lots of programs didn't deal with that properly back then. Nowadays it might be a bit better, and Docker has also had a built-in minimal init for many years (just use --init), so the problem is basically solved these days. …

Users will have to set it on their own, consider the security implications, and take the necessary measures to block forwarding between non-Docker interfaces. Our rules will be isolated in their own nft table, so hopefully it'll feel less like "Docker owns the system".

> Docker’s lack of UID isolation by default

This is not my area of expertise, but this omits that user namespaces tend to drastically increase the attack surface (despite what some vendors say). For instance: https://blog.qualys.com/vulnerabilities-threat-research/2025....

> Docker makes it quite difficult to deploy IPv6 properly in containers, [...] since Docker relies on NAT [...] The only way around this is to… write your own firewall rules

This is not true anymore. We added a network-level parameter to use IPv6 without NAT while keeping the semantics of `-p` (the port-publishing flag).
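The SIGCHLD opt-out that cyphar describes earlier in the thread can be tried out with a small sketch. It is only an illustration, assuming Linux semantics as documented in wait(2); the function name `wait_for_child` and the exit code 7 are made up for the example. With the default disposition the parent collects the child's exit status; with SIG_IGN the kernel reaps the child itself and the parent learns nothing about how it died.

```python
import os
import signal


def wait_for_child(ignore_sigchld: bool, exit_code: int = 7):
    """Fork a child that exits with exit_code, then try to wait for it.

    Returns the child's exit code, or None if the kernel reaped the child
    on its own and no status was left for the parent to collect.
    (Hypothetical helper, written for illustration only.)
    """
    wanted = signal.SIG_IGN if ignore_sigchld else signal.SIG_DFL
    previous = signal.signal(signal.SIGCHLD, wanted)
    try:
        pid = os.fork()
        if pid == 0:
            os._exit(exit_code)  # child: exit at once, skipping cleanup handlers
        try:
            _, status = os.waitpid(pid, 0)
        except ChildProcessError:
            # ECHILD: with SIGCHLD ignored there is no zombie to wait for,
            # so the exit status is gone. This is the "opt out of all child
            # death events" cost mentioned in the comment.
            return None
        return os.WEXITSTATUS(status)
    finally:
        # Restore whatever disposition was in place before the experiment.
        signal.signal(
            signal.SIGCHLD, previous if previous is not None else signal.SIG_DFL
        )


if __name__ == "__main__":
    print("default disposition:", wait_for_child(ignore_sigchld=False))
    print("SIG_IGN disposition:", wait_for_child(ignore_sigchld=True))
```

A process manager needs the first behaviour, since it has to know which child exited and with what status, which is why the SIG_IGN shortcut does not help an init-like process.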
… The downside of that approach is that some or all of the routers in your local network need to learn about this subnet to correctly route it to the Docker host.

Configuring user namespaces for the container to improve containment = very good idea. Enabling CLONE_NEWUSER inside a container = (usually) a very bad idea. …

This is not even an unusual opinion. LXC doesn't even consider containers with user namespaces disabled part of their threat model, precisely because it's so insecure to not use them[1]. Also, in my experience, most kernel developers generally assume (incorrectly) that most users use user namespaces when isolating containers, and so they make some security design decisions around that assumption.

In every talk I've given on container security in the past few years I have urged people to use user namespaces. It is even better for each container to have its own uid/gid block. Podman, LXC and runc all support this, but Docker doesn't really (though I think there was some work on this recently?). The main impediment to proper user namespace support for most users was the lack of support for transparent uid/gid remapping of mount points, but that is a solved problem now and has been for a few years (MOUNT_ATTR_IDMAP).
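To make "each container has its own uid/gid block" concrete, here is a rough sketch of the arithmetic behind a user namespace ID map. It uses the `inside outside length` line layout of /proc/<pid>/uid_map; the base of 100000, the block size of 65536, and all function names are assumptions chosen for the example, not anything Docker, Podman, LXC or runc is confirmed to use.

```python
from typing import List, Optional, Tuple

IdMap = List[Tuple[int, int, int]]  # (inside start, outside start, length)


def parse_id_map(text: str) -> IdMap:
    """Parse lines of 'inside outside length', as found in /proc/<pid>/uid_map."""
    entries = []
    for line in text.splitlines():
        if line.strip():
            inside, outside, length = (int(field) for field in line.split())
            entries.append((inside, outside, length))
    return entries


def to_host_id(entries: IdMap, container_id: int) -> Optional[int]:
    """Translate an ID seen inside the container to the ID on the host."""
    for inside, outside, length in entries:
        if inside <= container_id < inside + length:
            return outside + (container_id - inside)
    return None  # the ID has no mapping in this namespace


def block_for(index: int, base: int = 100_000, size: int = 65_536) -> str:
    """Hypothetical allocation: container number `index` gets a disjoint block."""
    return f"0 {base + index * size} {size}"


if __name__ == "__main__":
    first = parse_id_map(block_for(0))
    second = parse_id_map(block_for(1))
    # Root in each container lands on a different, unprivileged host uid.
    print(to_host_id(first, 0), to_host_id(second, 0))
```

Because the blocks do not overlap, root in one container maps to a host uid that owns nothing belonging to another container, which is the containment benefit the comment is arguing for.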
