Cloudflare outage on December 5, 2025 - Hacker News
Excerpt
As to architecture: Cloudflare has had some outages recently. However, what's their uptime over the longer term? If an individual site took on the infra challenges themselves, would they achieve better? I don't think so.

…

Centralizing most of the dependency on Cloudflare results in a major outage when something happens at Cloudflare; it is fragile because Cloudflare becomes the single point of failure. Like: oh, Cloudflare is down... oh, none of my SaaS services work anymore.

…

dfex 42 days ago
Putting Cloudflare in front of a site doesn't mean that site's backend suddenly never goes down. Availability will now be worse - you'll have Cloudflare outages* affecting all the sites they proxy for, along with individual site back-end failures, which will of course still happen.

* which are still pretty rare

…

The problem with pursuing efficiency as the primary value prop is that you will necessarily end up with a brittle result. ... It tracks what I've seen elsewhere: quality engineering can't keep up with the production engineering. It's just that I think of CloudFlare as an infrastructure place, where that shouldn't be true.

…

I had a manager who came from defense electronics in the 1980s. ... You can argue all you want that folks "should" do this or that, but all I've seen in my entire career is that documentation is almost universally out of date, and not worth relying on because it actively steers you in the wrong direction. And I actually disagree (as someone with some gray in my beard) with your premise that this is part of "rigorous engineering" as practiced today. I wish it were, but the reality is you have to read the code, read it again, see what it does on your desk, see what it does in the wild, and still not trust it.

…

What's more concerning to me is that we've now had AWS, Azure, and CloudFlare (CloudFlare twice) go down recently. My gut says:
1. Developers and IT are using LLMs in some part of the process, which will not be 100% reliable.
2. Current culture of "I have (some personal activity or problem)" or "we don't have staff", "AI will replace me", f-this.
3. Pandemic after-effects.
4. Political climate / war / drugs; all are intermingled.

…

Two minutes for their automated alerts to fire is terrible. For a system that is expected to have no downtime, they should have been alerted to the spike in 500 errors within seconds, before the changes had even fully propagated. Ideally the rollback would have been automated, but even if it is manual, the dude pressing the deploy button should have had realtime metrics on a second display with his finger hovering over the rollback button.

…

Not only that, but their API/pricing is specifically designed to cover edge cases that will force you to buy a license. For example, they don't expose an API to assign a co-host. You can do that via the UI, manually, but not via the API. Can you share which solution you are moving to?
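dfex's availability point above can be made concrete with arithmetic: a proxy in front of a backend composes failures in series, so combined availability is the product of the two and can never beat the weaker component. The uptime figures below are illustrative assumptions, not measured Cloudflare or backend numbers.

```python
# Two independent components in series: the site is up only when BOTH are up.
proxy_availability = 0.9999    # assumed proxy uptime (~53 min/year down)
backend_availability = 0.999   # assumed backend uptime (~8.8 h/year down)

combined = proxy_availability * backend_availability

minutes_per_year = 365 * 24 * 60
print(f"combined availability: {combined:.7f}")
print(f"expected downtime: {(1 - combined) * minutes_per_year:.0f} min/year")

# The product is strictly below the weaker component's availability,
# which is the commenter's claim in one line:
assert combined < min(proxy_availability, backend_availability)
```

With these assumed figures the combined availability is 0.9989001 — worse than either component alone, which is the structural cost of adding any mandatory hop in front of the origin.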
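The "alerted within seconds" comment above amounts to a sliding-window error-rate monitor wired to a rollback hook. A minimal sketch follows; the window size, threshold, and `on_breach` callback are illustrative assumptions, not Cloudflare's actual telemetry design.

```python
from collections import deque
import time

class ErrorRateMonitor:
    """Track the 5xx rate over a short sliding window and fire a callback
    (e.g. an automated rollback) the moment the rate crosses a threshold."""

    def __init__(self, window_seconds=10.0, threshold=0.05, on_breach=None):
        self.window = window_seconds
        self.threshold = threshold
        self.on_breach = on_breach or (lambda rate: None)
        self.events = deque()  # (timestamp, is_error) pairs

    def record(self, status_code, now=None):
        """Record one response; return the current in-window error rate."""
        now = time.monotonic() if now is None else now
        self.events.append((now, status_code >= 500))
        # Evict samples older than the window.
        while self.events and now - self.events[0][0] > self.window:
            self.events.popleft()
        rate = sum(is_err for _, is_err in self.events) / len(self.events)
        if rate > self.threshold:
            self.on_breach(rate)
        return rate
```

Feeding every response through `record()` during a deploy gives detection latency bounded by the window (seconds, not minutes), and `on_breach` is where an automated rollback would hang.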
Related Pain Points
Cloudflare as single point of failure for dependent services
Centralizing infrastructure dependencies on Cloudflare creates brittleness and risk. When Cloudflare experiences outages, all dependent SaaS services and proxied sites fail simultaneously, making the platform a critical single point of failure.
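One standard mitigation for this pain point is client- or DNS-level failover across independent providers. A minimal health-check sketch, assuming two hypothetical endpoints (`primary.example.com` fronted by the CDN, `fallback.example.com` bypassing it); the probe and ordering logic are illustrative, not a production design.

```python
import urllib.request

# Hypothetical placeholder endpoints, ordered by preference.
ENDPOINTS = [
    "https://primary.example.com/health",   # path fronted by the CDN/proxy
    "https://fallback.example.com/health",  # direct-to-origin bypass
]

def http_probe(url, timeout=2.0):
    """Report healthy when the endpoint answers HTTP 200 within the timeout."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status == 200

def first_healthy(endpoints, probe=http_probe):
    """Return the first endpoint the probe reports healthy, else None."""
    for url in endpoints:
        try:
            if probe(url):
                return url
        except OSError:
            continue  # provider down or unreachable: try the next one
    return None
```

`urllib`'s `URLError` subclasses `OSError`, so a single `except OSError` covers both DNS failures and connection timeouts; injecting `probe` keeps the selection logic testable without network access.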
Quality engineering cannot keep pace with production changes
Cloudflare's recent outages stem from quality engineering lagging behind production-engineering velocity. This is concerning for an infrastructure provider, where stability should be paramount.
API gaps force manual UI operations for common tasks
Cloudflare's API is incomplete for routine administrative tasks. Common operations like assigning a co-host can only be done via the UI, not programmatically, defeating infrastructure-as-code practices.