Sources

1577 sources collected

marcelsud.me

Cloudflare Outage June/2025: Lessons for Software Engineers - Marcelo Santos (@marcelsud)

This obscure storage provider turned out to be the backbone of Cloudflare's Workers Key-Value (KV) service, a critical piece of infrastructure that thousands of applications depended on for everything from user sessions to configuration data. When it stumbled, the dominoes began to fall:

- **91%** of Workers KV requests started failing
- **100%** failure rate on Access logins
- **90%+** error rate on Stream
- Workers AI, Images, Turnstile, and parts of the Dashboard were also affected
- Thousands of dependent applications around the globe started throwing errors
- Customer support channels lit up like Christmas trees

…

### The Failure Timeline

**Ground Zero (T+0 minutes)**: A third-party storage provider experiences internal issues. Most of the world doesn't notice yet.

**Primary Impact (T+5 minutes)**: Cloudflare's Workers KV service starts timing out. Alert dashboards begin showing yellow warnings that soon turn red.

**Secondary Impact (T+15 minutes)**: Services that depend on Workers KV (Access for corporate authentication, Stream for video delivery, Workers AI for machine learning inference) start failing completely. These aren't graceful degradations; they're hard failures.

**Tertiary Impact (T+30 minutes)**: Customer applications that relied on these services start experiencing outages. E-commerce sites can't authenticate users. Streaming platforms can't deliver content. AI-powered features simply disappear.

**Ecosystem Impact (T+60 minutes)**: The blast radius has now extended to millions of end users who have no idea what "Workers KV" means. They just know their favorite apps aren't working.

This progression reveals something crucial about modern distributed systems: ...

Because of what I call the "invisible dependencies problem." When you're building at scale, you tend to think about your immediate dependencies: the databases you talk to, the APIs you call, the services you integrate with. But you rarely map the dependency tree three or four levels deep.
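Mapping the dependency tree "three or four levels deep" amounts to a reverse breadth-first traversal of a dependency graph. The minimal sketch below reports every service transitively affected when one node fails, along with how many hops away it sits. The graph is hypothetical and only loosely follows the chain described above (storage provider → Workers KV → Access/Stream/Workers AI → customer apps). The service names are illustrative and do not describe Cloudflare's real topology.

```python
from collections import deque

# Hypothetical dependency map: service -> services it depends on.
# Loosely modeled on the cascade above; not a real topology.
DEPENDS_ON = {
    "workers_kv": ["third_party_storage"],
    "access": ["workers_kv"],
    "stream": ["workers_kv"],
    "workers_ai": ["workers_kv"],
    "shop_login": ["access"],
    "video_app": ["stream"],
    "ai_feature": ["workers_ai"],
}


def invert(depends_on: dict[str, list[str]]) -> dict[str, list[str]]:
    """Build reverse edges: dependency -> services that depend on it."""
    dependents: dict[str, list[str]] = {}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(service)
    return dependents


def blast_radius(failed: str, depends_on: dict[str, list[str]]) -> dict[str, int]:
    """Return every service transitively affected by `failed`, mapped to its hop distance."""
    dependents = invert(depends_on)
    hops = {failed: 0}
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for service in dependents.get(current, []):
            if service not in hops:  # BFS: first visit is the shortest path
                hops[service] = hops[current] + 1
                queue.append(service)
    del hops[failed]  # report only the downstream services
    return hops


if __name__ == "__main__":
    radius = blast_radius("third_party_storage", DEPENDS_ON)
    for service, depth in sorted(radius.items(), key=lambda kv: (kv[1], kv[0])):
        print(f"level {depth}: {service}")
```

On this toy graph, customer-facing apps appear at level 3. That is the depth the excerpt says teams rarely map.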

6/16/2025 · Updated 7/18/2025

www.excedo.se

Cloudflare's Abuse Blind Spot: When Scale Outweighs Safety

## Executive summary

- **Explosive abuse growth.** Cloudflare developer domains set new records in 2024: **pages.dev** incidents rose by 198% (460 → 1,370) and **workers.dev** incidents by 104% (2,447 → 4,999). Total campaigns are on pace to exceed 1,600 in 2025.
- **Systemic misuse.** Multiple security vendors (Fortra, Trustwave, CloudSEK) and independent researchers show brand impersonation and credential harvesting on Cloudflare infrastructure at scale.
- **Process dead ends.** Despite thousands of submissions, **including from trusted flaggers**, Cloudflare’s abuse desk replies with boilerplate denials and places the burden of proof on reporters.
- **Legal collision course.** NIS2, its national transpositions, and the Digital Services Act (DSA) impose strict duties on “online platforms,” CDNs, DNS and reverse-proxy providers. Cloudflare’s current practice is **non-compliant** and creates **material liability** for EU customers.
- **Action items.** Regulators must clarify CDN liability; enterprises should block **pages.dev / workers.dev by default**; incident responders should lobby for trusted-flagger status; and procurement teams must reassess Cloudflare against **NIS2 supply-chain obligations**.

… ... …

- Trustwave SpiderLabs highlighted “a huge number of phishing and scam pages abusing **pages.dev** Cloudflare services.”
- CloudSEK described a generic phishing kit hosted on **workers.dev** that can impersonate any brand on demand.
- A Reddit thread with more than 600 upvotes chronicles a researcher’s frustration after reporting 200+ malicious **pages.dev** sites, fewer than **30%** of which were ever taken down.

## Why Cloudflare’s process fails trusted flaggers

1. **Form-only reporting**: Email complaints receive an automated bounce directing reporters to the web form. Bulk incidents cannot be submitted efficiently.
2. **High evidentiary bar**: Reporters must prove phishing is active at the time of review, ignoring that campaigns often operate in short bursts.
3. **Opaque outcomes**: Cloudflare rarely discloses whether any action was taken, citing privacy and customer confidentiality.

…

### For enterprises & SOCs

- Re-evaluate CDN providers during 2025 vendor risk reviews; require written evidence of NIS2 compliance and breach-handling metrics.
- Block or sandbox links ending in pages.dev and workers.dev until verified safe.
- Sinkhole newly created Cloudflare subdomains that spoof your brand via DNS filtering.
- Update incident-response runbooks to include NIS2 supply-chain obligations: document due diligence, preserve abuse evidence, and, if necessary, switch CDN rapidly.

…

## Conclusion

Cloudflare’s vision of “building a better Internet” rings hollow while its infrastructure operates as a turnkey phishing platform. Under NIS2, every **ignored report** is no longer just a user-experience issue; it is a **potential regulatory offence** that can cascade fines down the **supply chain**. Enterprises that continue to **delegate critical traffic** to Cloudflare infrastructure without demanding transparent, audited abuse processes now face a double jeopardy: compromised credentials and compliance penalties. **The time to act is now**, before the first NIS2 enforcement actions make headlines.
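The recommendation to block or sandbox links ending in pages.dev and workers.dev comes down to a hostname-suffix check. This is a minimal sketch. It assumes a mail gateway or proxy that can call a Python function for each URL, and the `ALLOWLIST` of verified hosts is hypothetical. The code matches the suffix itself or a dot-prefixed subdomain, so an unrelated host such as `notpages.dev` is not caught.

```python
from urllib.parse import urlsplit

# Developer-platform suffixes the report suggests blocking or sandboxing by default.
BLOCKED_SUFFIXES = ("pages.dev", "workers.dev")

# Hypothetical allowlist of hosts your team has verified as safe.
ALLOWLIST = {"docs.example-team.pages.dev"}


def classify(url: str) -> str:
    """Return 'allow' or 'sandbox' for a URL based on its hostname."""
    host = urlsplit(url).hostname  # lowercased; port and credentials stripped
    if host is None:
        return "sandbox"  # no parseable host: treat as untrusted
    host = host.rstrip(".")  # ignore a trailing root dot, e.g. "x.pages.dev."
    if host in ALLOWLIST:
        return "allow"
    for suffix in BLOCKED_SUFFIXES:
        # The suffix itself or any subdomain of it, but not e.g. "notpages.dev".
        if host == suffix or host.endswith("." + suffix):
            return "sandbox"
    return "allow"


if __name__ == "__main__":
    for link in ("https://login-brand.PAGES.dev/path", "https://example.com"):
        print(link, classify(link))
```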

7/31/2025 · Updated 2/22/2026

blog.qatestlab.com

Cloudflare Outage 2025: What SaaS Teams Must Learn

On November 18, 2025, at 11:20 UTC, the largest Cloudflare outage in years brought thousands of SaaS platforms to an abrupt halt. What began as a minor database permissions change cascaded into widespread disruption, revealing how a single infrastructure failure can shake the entire digital ecosystem. For B2B providers, the incident highlighted how deeply modern products depend on infrastructure layers they do not control. It underscored the need to build real resilience and to plan proactively for such failures.

...

The November 2025 outage began with a small internal change to database access settings. That change accidentally broke how Cloudflare generated part of its bot protection configuration, causing a critical file to grow larger than the system was designed to handle. When this faulty configuration spread through the network, some core components crashed, while others stayed online but misclassified traffic.

... …

- Platforms like ChatGPT and X became unavailable or responded with errors.
- Popular consumer services such as Spotify and Canva experienced disruptions, preventing users from listening to music, editing files, or accessing stored content.
- Several gaming platforms and online multiplayer services reported login failures and connectivity issues.
- News and media websites across multiple regions, including Europe and the US, were temporarily unreachable, affecting access to information during peak hours.

These real-world interruptions demonstrated that the outage extended far beyond infrastructure layers.

...

**How the Outage Impacted SaaS Platforms and Their Customers**

When Cloudflare’s infrastructure failed, B2B software platforms that relied on its DNS, CDN, or edge services became unavailable, even though their internal servers remained operational. Dashboards were unable to load, authentication stopped, and automated operations stalled, resulting in approximately 3 hours of severe service disruption across multiple SaaS platforms.

…

**Key impacts for SaaS providers:**

- **Uptime guarantees disrupted**: Platforms displayed “500 Internal Server Error” despite normal internal performance.
- **Customer trust eroded**: Clients questioned the long-term stability of subscription-based services, influencing satisfaction, retention, and renewal decisions.
- **Operational load increased**: Support teams saw a spike in incident reports while engineering teams were analyzing an external issue.
- **Vendor dependency risk exposed**: Reliance on a single infrastructure vendor increased the risk of unplanned downtime.
- **Interruptions across connected systems**: Partner integrations, billing services, and dependent APIs experienced degraded responsiveness and temporary functional limitations within connected environments.
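One common guard against this failure mode is to validate each new version of a generated file before swapping it in. If validation fails, the consumer keeps the last-known-good version instead of crashing. This is a minimal sketch of that pattern. The JSON list format, the `MAX_FEATURES` value, and the `FeatureConfig` class are assumptions for illustration. None of them is Cloudflare's implementation.

```python
import json
import logging

MAX_FEATURES = 200  # illustrative hard limit the consumer was sized for (assumption)


class FeatureConfig:
    """Holds the active feature list; rejects bad updates instead of crashing."""

    def __init__(self, initial: list[str]):
        self.active = list(initial)  # last-known-good configuration
        self.rejected_updates = 0

    def _reject(self, reason: str) -> bool:
        self.rejected_updates += 1
        logging.warning("feature config rejected: %s; keeping last-known-good", reason)
        return False

    def try_update(self, raw: str) -> bool:
        """Validate a new config file and swap it in only if every check passes."""
        try:
            features = json.loads(raw)
        except json.JSONDecodeError as exc:
            return self._reject(f"invalid JSON ({exc.msg})")
        if not isinstance(features, list) or not all(isinstance(f, str) for f in features):
            return self._reject("expected a list of feature names")
        if len(features) > MAX_FEATURES:
            return self._reject(f"{len(features)} features exceeds limit {MAX_FEATURES}")
        self.active = features
        return True


if __name__ == "__main__":
    cfg = FeatureConfig(["feature_a", "feature_b"])
    oversized = json.dumps([f"f{i}" for i in range(MAX_FEATURES * 2)])
    print(cfg.try_update(oversized), len(cfg.active))  # update refused, old config still active
```

With this design, an oversized file degrades into a logged rejection rather than a crash. The trade-off is that the fleet keeps running on slightly stale configuration until a valid file arrives.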

12/16/2025 · Updated 12/16/2025

www.youtube.com

What do developers really want? (Panel) - Cloudflare Connect London 2025

But I think there are ways to make that more efficient. It still makes me uncomfortable, to be honest, especially as a developer product where I'm making a certain guarantee to downstream users about the quality of our code and that we understand it, and stuff like that for application code. … Your products need to support folks who want to use your products with AI for their own projects. But there's a challenge in getting these platforms up to date with changes you're making and APIs you release. How are you thinking about that problem space and the developer experience that is changing with these … So it doesn't make sense, because suddenly it's trying to use things that your framework does not support, or that it doesn't coherently explain not to use, or you don't expose the correct types, things that developers usually manage to get around, and then the AI does something like how things should have worked, like how it expects

4/21/2025 · Updated 9/30/2025

cf-assets.www.cloudflare.com

Connectivity cloud position paper 2025

Digital failure is a widespread issue. The Boston Consulting Group found that 70% of technology projects are late, over budget, and/or do not deliver on their original scope. More specifically, McKinsey found that 75% of cloud migrations run over budget. And on the AI front, Gartner predicts that at least 30% of generative AI … application onboarding. On the latter point, 48% of IT and security leaders say they are struggling to support evolving user types and a growing number of users, according to joint Forrester and Cloudflare research. But even more importantly, complexity in the network or IT and security stack also makes it harder to add new … supporting infrastructure as a key reason for failed AI projects. These examples do not even touch on security risks of complexity: incident response and analysis can become dangerously slow. They also don’t include other cost considerations: using too many IT and security services usually means paying for features you never use. Small wonder, then, that 60% of security leaders

Updated 3/16/2026

blog.ashleypeacock.co.uk

Developer Week 2025 Recap: Everything Cloudflare Just Shipped

… and Cloudflare will build and push your container image to their registry, ready to be used by your application. Now, you might be thinking: what about cold starts? Well, depending on how you configure your containers, you may experience cold starts while the container boots. However, you can configure a minimum number of instances, which means Cloudflare pre-warms your containers so they are ready to serve requests immediately. Alongside prewarming, you can also set a CPU threshold that defines when your containers scale up to meet demand, meaning autoscaling is built into the platform.

… ...

On the face of it, the announcement of VPCs might not *seem* the most exciting thing ever. However, I’ve had countless conversations with companies, small and large, that want to use Cloudflare’s Developer Platform and simply can’t, because their AWS resources, for example, are within a VPC that Cloudflare Workers cannot securely connect to.

…

In short, it becomes quite slow, especially if you’re executing multiple queries within a single request. That’s because each query needs to travel from, say, Australia to Europe every single time. Additionally, you’ll need to connect to the database each time a Worker is spun up, which is often each time a new request comes in, as Workers don’t typically hang around for long.

…

#### Static Asset Workers: Frameworks Go Generally Available

During Birthday Week last year, Cloudflare started the migration of Pages to Workers with the release of Static Asset Workers in beta. For as long as I can remember, Pages has been the go-to option for hosting static and full-stack websites on Cloudflare. However, it always had some drawbacks, such as not being able to use Durable Objects without creating a separate Worker, and more recently, not having access to Workers Logs.
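The slowdown described above is simple arithmetic. Assume queries run one after another and each costs one network round trip. Assume also that a fresh connection adds a few more round trips for handshakes. Database time per request then grows linearly with the query count. The sketch below makes that explicit. The 250 ms round trip and the three connection round trips are illustrative assumptions, not measurements.

```python
def request_db_latency_ms(rtt_ms: float, queries: int, connect_round_trips: int,
                          reuse_connection: bool) -> float:
    """Estimate database time for one request, assuming sequential queries.

    Each query costs one network round trip; opening a new connection costs
    `connect_round_trips` extra round trips (e.g. TCP, TLS, auth handshakes).
    """
    setup = 0 if reuse_connection else connect_round_trips * rtt_ms
    return setup + queries * rtt_ms


if __name__ == "__main__":
    rtt = 250.0  # assumed Australia <-> Europe round trip in ms (illustrative)
    cold = request_db_latency_ms(rtt, queries=5, connect_round_trips=3, reuse_connection=False)
    warm = request_db_latency_ms(rtt, queries=5, connect_round_trips=3, reuse_connection=True)
    print(f"fresh connection: {cold:.0f} ms, reused connection: {warm:.0f} ms")
```

In this model, reusing the connection removes the setup cost. The per-query round trips still dominate as the number of queries per request grows.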

4/14/2025 · Updated 4/1/2026

tei.forrester.com

The Total Economic Impact™ Of Cloudflare's Connectivity Cloud

### Key Challenges

Prior to investing in Cloudflare, the interviewees’ organizations relied on legacy CDNs, VPNs, and security point solutions. The interviewees’ organizations struggled with complexity, costly manual effort, and poor security results. The interviewees noted how their organizations struggled with common challenges, including:

…

- Too many point solutions. Interviewees highlighted that their organizations had pieced together a large number of point solutions to fill their security and connectivity needs. While each solution provided a specific needed capability, this ecosystem became unmanageable at scale. Furthermore, many of the older solutions lacked automation or were poorly integrated with each other, resulting in unnecessary manual management effort.

…

- Downtime from attacks. Interviewees highlighted that their previous solutions had insufficiently protected them from DDoS attacks, resulting in downtime or degraded performance. The senior principal security engineer at an e-commerce firm stated, “We were previously on another platform and had a pretty serious incident that they couldn’t solve, and so we migrated.”
- Poor bot management. Similarly, prior solutions did a poor job of protecting web applications from sophisticated bot schemes. Interviewees’ organizations struggled with slow performance and were vulnerable to schemes like bots scraping their pricing information. The director of global governance, risk, and compliance for a manufacturing firm explained: “We discovered we were getting scraped heavily by competitors. They were going out to our catalog sites and scraping our catalog. So, we had a lot of bot traffic.”

…

- Legacy solutions lacking automation. Interviewees noted that incumbent solutions lacked the automation or ease-of-use features they desired to cut down on manual work. The senior principal security engineer for the e-commerce firm explained: “There were some organizational pain points, in that we didn’t like dealing with [our old vendor] and their product. There was no automation, so it was all manual working in a web console … click, click, click. You know, no infrastructure as code.”

Updated 3/16/2026

research.etr.ai

Cloudflare Leads in Security but Lags for Developers

## Cloudflare Leads in Security but Lags for Developers

ETR Insights presents a panel discussion between senior IT executives, all broadly positive on Cloudflare’s offerings, including security, DNS management, and content delivery, but not uniformly so. Cloudflare helps administrators coordinate the decentralized and edge computing environments that were accelerated by the pandemic, including for multi-cloud strategies and load balancing. Panelists criticize Cloudflare’s limited DevOps integration, developer platform capabilities, and lack of documentation, and find its advanced analytics and API security lacking. To that end, panelists see opportunity for Cloudflare within SSO and expanding capabilities in API security, potentially through strategic acquisitions, along with improving analytics and observability features for generative AI deployments.

...

The group is less enthusiastic about Cloudflare’s developer platforms, where executives seek better documentation, development tools, and DevSecOps integration. One panelist, managing data center operations globally, notes that while enterprises use Cloudflare for critical functions like single sign-on, the company's developer platform is simply less prominent compared to Oracle and Amazon.

*“Cloudflare is just not known for being super developer-friendly. I don't know if Cloudflare hasn't focused on going after that, if there are security risks there that haven't been identified in order to mitigate to customers like us in the idea of using them as a developer platform, or if it's just not something that they lead with.”*

…

*ETR Research: Product category usage and evaluation plans.*

...

Panelists also point to limitations in Cloudflare's built-in capabilities for critical data metrics.

*“We pipe a lot of our stuff into Datadog, and then turn around and use things like blocking IPs or obvious sort of security concerns for people being naughty, that Cloudflare is happy to pass. There are enhancements that we've had to build from an application logic standpoint using effectively a third party, because it's not specifically baked into the platform.”*

…

*ETR Research: Product spending share breakout.*

...

Operational challenges and complex licensing structures prompt IT leaders to consider alternative solutions in security. Some executives are rethinking their reliance on Zscaler and Palo Alto Networks and see Cloudflare as promising for quicker and more seamless policy updates across international locations.

*“Some of the difficulties that we’ve found in using Zscaler is the turnaround time for adding exceptions or various sites, ports, IPs, for the percolation to happen across our entire enterprise. Some of those users, especially some of the larger locations that are providing customer service to our clients, if they can't get to a site, or if we have to run something, God forbid, a custom macro or something bizarre to get around something, it feels like it's defeating the purpose of what the security was designed to do to begin with, for us to be able to add and remove easily.”*

…

More than a year after Cloudflare introduced its Workers AI platform, our panelists complain of limited documentation and dubious practical benefit. At best, they are in evaluation mode.

…

Another executive is similarly skeptical: *“Cloudflare has a little bit of a problem with documentation or getting stuff to actually launch. The prototyping, we've run into problems in the past, and our developers sort of remember those pain points.”*

…

Within cybersecurity and compliance, practitioners should keep an eye on market trends such as edge AI, API security, and privacy regulations. In particular, APIs have become prime targets for cyberattacks.

…

*“But they’re also creating either security loopholes, or just directly allowing data to go to places that it shouldn't be going to.”*

7/1/2025 · Updated 7/2/2025

news.ycombinator.com

Cloudflare outage on November 18, 2025 post mortem

How can you write the proxy without handling a config that contains more than the maximum features limit you set yourself? How can the database export query not have a limit set if there is a hard limit on the number of features? Why do they make non-critical changes in production before testing them in a staging environment?

…

Having a critical application issue ad-hoc commands to the `system.*` tablespace instead of using a well-tested library is just amateurism and, again, bad engineering. IMO it is good practice to consider all `system.*` privileged applications and ensure their querying is completely separate from your application logic. Sometimes system tables change, and fields are added and/or removed; not planning for this will make future compatibility a nightmare. Not only the problematic query itself, but the whole context of this screams "lack of proper application design" and devs not knowing how to use the product and/or read the documentation.

...

The database issue screamed at me: lack of expertise. I don't use CH (ClickHouse), but seeing someone mess with a production system and then be surprised, "Oh, it does that?", is really bad. And this is obviously not knowledge that is hard to acquire, buried deep in a manual or an edge case only discoverable in source code; it's bread-and-butter knowledge you should have.

...

But at the same time, what value do they add if they:

* Took down the customers' sites due to their bug.
* Never protected against an attack that our infra could not have handled by itself.
* Don't think that they will be able to handle the "next big DDoS" attack.

It's just an extra layer of complexity for us.

...

Be it management focusing on the wrong things, be it developers not being in the right position or not annoyed enough to care, or something else entirely. However, not doing these things is (likely) a sign that they are currently not in a state of creating reliable systems, at least none reliable enough for what they are doing.

...

[1] And should make you adapt the process of analyzing issues, e.g. making sure config changes are "very loud" in monitoring. They are one of the most easily tracked things that can go wrong, and they can be mapped to a point in time relatively easily compared to many other things.

…

That said, I am totally fine with your use case in your application.

...

My worry is that this runtime panic behavior has unwittingly seeped into library code that is beyond our ability and scope to observe. Or that an organization sets a policy, but the tools don't allow for rigid enforcement.
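The commenters argue that the feature limit should also be enforced where the file is produced, not only where it is consumed. Below is a minimal sketch of a producer-side gate. It assumes the export yields `(schema, feature_name)` rows that may contain duplicates, for instance if a metadata query starts seeing the same table through more than one schema. The gate deduplicates the rows, checks the hard limit, and refuses to publish rather than shipping an oversized file. `build_feature_file`, `ExportRejected`, and the limit value are hypothetical names chosen for illustration.

```python
import logging

MAX_FEATURES = 200  # hypothetical limit, shared with the consumer


class ExportRejected(Exception):
    """Raised when a generated feature file must not be published."""


def build_feature_file(rows: list[tuple[str, str]]) -> list[str]:
    """Turn (schema, feature_name) rows into a deduplicated, size-checked feature list.

    Duplicate names (e.g. the same table seen via two schemas) are collapsed;
    an empty or oversized result raises instead of being published.
    """
    features = sorted({name for _schema, name in rows})
    if not features:
        raise ExportRejected("empty feature set")
    if len(features) > MAX_FEATURES:
        raise ExportRejected(f"{len(features)} features exceeds limit {MAX_FEATURES}")
    if len(rows) != len(features):
        logging.warning("collapsed %d duplicate rows", len(rows) - len(features))
    return features
```

Enforcing the limit on both sides means a bad query is caught before the file propagates. The consumer-side check then acts only as a second line of defense.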

11/18/2025 · Updated 2/6/2026

research.etr.ai

Cloudflare’s Strengths Shine but Perceived Gaps Persist

ETR Insights presents an interview with a panel of senior technology executives: Cloudflare remains their default perimeter for web performance and security, though most are holding spending steady, with only modest increases for expanded DDoS, bot management, and WAF coverage. Panelists like Cloudflare’s lightweight CDN, though they find that its emerging SASE stack, while cheaper, is less mature than competitors like Zscaler. Persistent service quirks, technical gaps, and limited enterprise-grade management capabilities can also frustrate. Cloudflare Workers serves latency-sensitive edge functions and complex web application firewall logic; panelists see upside if Cloudflare can convert its vast traffic telemetry into truly automated, AI-driven defenses without adding cost or oversight. Read on to learn more about Cloudflare’s “pay-as-you-grow” economics, skepticism that raw CDN speed will displace hyperscalers, and why some panelists find Cloudflare’s AI-driven API Shield too expensive to sustain.

…

- Several users express concern that Cloudflare launches products before they are fully enterprise-ready. Issues with incomplete DNS firewall features and UI limitations hinder deeper adoption of newer or advanced capabilities.
- **Operational complexity at scale.** Managing Cloudflare across many domains or clients can be cumbersome. Lack of bulk update functionality in the UI and a heavy reliance on APIs or manual scripts are cited as pain points for multi-tenant or agency use.
- **Cautious optimism for SASE adoption.** …

A significant complaint: small functional oddities and quirks in core services persist for years. *“We’ve kind of had to trip on it ourselves or find it out through a support ticket,”* says one SVP, *“and then just kind of adjust our workflow or configuration to kind of work around it.”* Another CISO attempted to move DNS resolution to Cloudflare, only to pull back when promised capabilities failed to materialize.

…

*“It's all very manual, which I don't love. It's a point solution we don't use very often, and it's not across all of our clients. It’s for very particular problems that we find that don't have a kind of prepackaged solution within Cloudflare already.”* Past Terraform gaps also appear to have slowed broader rollout.

…

One executive imagines a potential arbitrage play.

...

One SVP, whose firm advises clients on Cloudflare contracts, notes that upselling has intensified, in particular attempts to dislodge Zscaler. Some of their clients struggle to understand which Cloudflare services are truly necessary, while others find that hurried purchases end up discarded at renewal. Again, follow-on is an issue. Within enterprise, *“They get [the product] out, and then it takes a while to get all the enterprise-grade features added after it's in the marketplace.”* Although the company’s support teams are improving, our panelists want to see deeper product maturity and steadier account guidance before declaring Cloudflare a fully enterprise-ready platform. *“They’ve been more responsive to that in probably the last four months. While aggressive and while cross-selling, I think they tend to be more willing to maybe put in a little bit of work now.”*

...

*“The primary thing that stopped us is existing contracts with existing players. When we're coming up on a renewal and have the opportunity, then we'll do an actual POC and determine whether or not those gaps have been reduced, and whether we can live with them if the associated cost delta is big enough.”* Panelists agreed that Cloudflare must shore up administrative features and step up executive-level sales outreach. *“If the organization team who has to make this decision doesn’t get much communication with Cloudflare, or time to spend on the comparison, then they may get more inclined towards another product like Zscaler.”*

Panelists appreciate Cloudflare’s rapid, credit-card-driven onboarding and low-maintenance operation once initial settings are dialed in. However, those juggling dozens of customer domains complain that the platform lacks true multi-site management, which forces them to rely on APIs and infrastructure-as-code scripts for routine bulk updates. *“Being forced to use an API, that's kind of their default answer to anything, that's also been a point of frustration.”* An effort to secure white-label enterprise agreements fell apart on overly restrictive terms. Cloudflare’s appeal is clear to power users, but cracks are showing at scale. *“What I like is that Cloudflare is innovative and cost-effective, and what I dislike is that they're slow to deliver enterprise-grade functionality.”*
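Panelists describe falling back on APIs and scripts for routine bulk updates across many domains. Here is a minimal sketch of that approach. It assumes Cloudflare's v4 REST API zone-settings endpoint (`PATCH /client/v4/zones/{zone_id}/settings/{setting_id}` with a `{"value": ...}` JSON body) and an API token in a `CF_API_TOKEN` environment variable. Check the endpoint and setting names against the current API reference before use. The zone IDs and the `APPLY=1` opt-in switch are hypothetical. Planning is kept separate from sending so the dry run can be reviewed first.

```python
import json
import os
import urllib.request

API_BASE = "https://api.cloudflare.com/client/v4"


def plan_updates(zone_ids: list[str], setting: str, value: str) -> list[tuple[str, str, bytes]]:
    """Build (method, url, body) tuples for one setting across many zones (dry run)."""
    body = json.dumps({"value": value}).encode()
    return [("PATCH", f"{API_BASE}/zones/{zone_id}/settings/{setting}", body)
            for zone_id in zone_ids]


def apply_updates(plan: list[tuple[str, str, bytes]], token: str) -> list[int]:
    """Send each planned request and collect status codes.

    urllib raises HTTPError on a non-2xx response, which stops the batch.
    """
    statuses = []
    for method, url, body in plan:
        req = urllib.request.Request(
            url,
            data=body,
            method=method,
            headers={"Authorization": f"Bearer {token}",
                     "Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            statuses.append(resp.status)
    return statuses


if __name__ == "__main__":
    # Hypothetical zone IDs; replace with IDs from your own account.
    plan = plan_updates(["zone-id-1", "zone-id-2"], "always_use_https", "on")
    for method, url, _ in plan:
        print(method, url)  # review the dry run first
    token = os.environ.get("CF_API_TOKEN")
    if token and os.environ.get("APPLY") == "1":  # hypothetical opt-in switch
        print(apply_updates(plan, token))
```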

12/4/2025 · Updated 1/20/2026

news.ycombinator.com

Cloudflare outage on December 5, 2025 - Hacker News

As to architecture: Cloudflare has had some outages recently. However, what’s their uptime over the longer term? If an individual site took on the infra challenges themselves, would they achieve better? I don’t think so.

…

- Centralizing most of the dependency on Cloudflare results in a major outage when something happens at Cloudflare. It is fragile because Cloudflare becomes the single point of failure. Like: oh, Cloudflare is down... oh, none of my SaaS services work anymore.

…

Putting Cloudflare in front of a site doesn't mean that site's backend suddenly never goes down. Availability will now be worse: you'll have Cloudflare outages (which are still pretty rare) affecting all the sites they proxy for, along with individual site backend failures, which will of course still happen.

…

The problem with pursuing efficiency as the primary value prop is that you will necessarily end up with a brittle result.

...

It tracks what I've seen elsewhere: quality engineering can't keep up with production engineering. It's just that I think of CloudFlare as an infrastructure place, where that shouldn't be true. I had a manager who came from defense electronics in the 1980s.

...

You can argue all you want that folks "should" do this or that, but all I've seen in my entire career is that documentation is almost universally out of date and not worth relying on, because it's actively steering you in the wrong direction. And I actually disagree (as someone with some gray in my beard) with your premise that this is part of "rigorous engineering" as it is practiced today. I wish it was, but the reality is you have to read the code, read it again, see what it does on your desk, see what it does in the wild, and still not trust it.

…

What’s more concerning to me is that now we’ve had AWS, Azure, and CloudFlare (and CloudFlare twice) go down recently. My gut says:

1. Developers and IT are using LLMs in some part of the process, which will not be 100% reliable.
2. Current culture of "I have (some personal activity or problem)" or "we don’t have staff, AI will replace me, f-this."
3. Pandemic after-effects.
4. Political climate / war / drugs; all are intermingled.

…

2 minutes for their automated alerts to fire is terrible. For a system that is expected to have no downtime, they should have been alerted to the spike in 500 errors within seconds, before the changes even fully propagated. Ideally the rollback would have been automated, but even if it is manual, the dude pressing the deploy button should have had real-time metrics on a second display with his finger hovering over the rollback button.

…

Not only that, but their API/pricing is specifically designed to cover edge cases that will force you to buy a license. For example, they don't expose an API to assign a co-host. You can do that via the UI, manually, but not via the API. Can you share which solution you are moving to?
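The commenter's point about alert latency can be made concrete with a sliding-window check on the 5xx error ratio. The check is evaluated every second during a rollout and calls a rollback hook as soon as the ratio crosses a threshold. This is a minimal sketch. The 10-second window, 5% threshold, 100-request minimum sample, and `rollback` callback are all assumptions. None of it is any vendor's actual tooling.

```python
from collections import deque
from typing import Callable


class ErrorRateGuard:
    """Track per-second request/5xx counts and roll back once when the windowed error rate spikes."""

    def __init__(self, rollback: Callable[[], None], window_s: int = 10,
                 threshold: float = 0.05, min_requests: int = 100):
        self.rollback = rollback
        self.threshold = threshold
        self.min_requests = min_requests
        # One (requests, errors) bucket per second; old seconds fall off automatically.
        self.buckets: deque[tuple[int, int]] = deque(maxlen=window_s)
        self.rolled_back = False

    def record_second(self, requests: int, errors_5xx: int) -> None:
        """Add one second of traffic and check the sliding window."""
        self.buckets.append((requests, errors_5xx))
        total = sum(r for r, _ in self.buckets)
        errors = sum(e for _, e in self.buckets)
        # A minimum sample size keeps a handful of errors at low traffic from triggering.
        if not self.rolled_back and total >= self.min_requests and errors / total > self.threshold:
            self.rolled_back = True
            self.rollback()


if __name__ == "__main__":
    events: list[str] = []
    guard = ErrorRateGuard(rollback=lambda: events.append("rollback"))
    for second, (reqs, errs) in enumerate([(1000, 2), (1000, 3), (1000, 400)]):
        guard.record_second(reqs, errs)
        print(second, events)
```

A short window reacts within seconds. The trade-off is more sensitivity to brief blips, which the minimum-sample and threshold settings are meant to dampen.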

12/5/2025 · Updated 1/17/2026

cloudflare.tv

Cloudflare Connect 2025 Highlights: Common & Company

And to prove it, they showed this benchmark in which they were doing a lot of Math.sin, you know, trigonometry operations. And for some reason, in this test, it ran three times faster on Cloudflare than on Vercel. And they said, ha ha. And then another, better-known YouTuber named Theo, the independent developer and YouTube personality, took issue with that, rightly so, and came up with his own set of benchmarks. Those were designed to simulate CPU-intensive workloads, which is actually not what most people are doing on either of these platforms. Usually, you're spending most of your time waiting for network communications, talking to your database, and so on, and not spending a lot of CPU time rendering a result. But for the purpose of this test, the point was to measure raw JavaScript execution time. And he came up with a bunch of benchmarks, and in those benchmarks, it appeared that Cloudflare was slower by as much as 3 to 3.5x.

…

And to do this, it takes a little while to spin up a container, and then you have to try to reuse it for things. But once you're reusing a sandbox, you have to worry about whether any data is leaking between the different uses. And so it's a lot more expensive that way. And that's a wrap.

10/24/2025 · Updated 3/4/2026
