@devopslinks ・ Nov 23, 2025

A database permissions change led to a Cloudflare outage by creating an oversized feature file, causing network failures initially mistaken for a DDoS attack.
The Cloudflare outage was caused by a permissions change in a database system.
The network failures were initially misinterpreted as a DDoS attack.
An oversized feature file exceeded the Bot Management module's feature limit, causing network failures.
The root cause was identified and resolved by reverting to an earlier version of the feature file.
Cloudflare restored services by deploying the correct configuration file globally.
Cloudflare suffered a major outage on November 18, 2025, when a permissions update in its ClickHouse database changed how metadata queries behaved. The queries began returning duplicate rows, which flowed into the Bot Management feature file and caused it to suddenly double in size. The oversized file exceeded the Bot Management module's 200-feature limit, triggering system panics inside Cloudflare's core proxy. At first, the symptoms resembled a massive DDoS attack, but the real cause was this malformed configuration file.
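The chain from broader metadata visibility to a proxy crash can be illustrated with a small sketch. Everything below (table and column names, the illustrative feature counts, the exact shape of the limit check) is an assumption for illustration only; the 200-feature cap is the one figure taken from the report.

```python
# Minimal sketch (not Cloudflare's actual code): how duplicate metadata rows
# can inflate a generated feature file past a hard consumer-side limit.

FEATURE_LIMIT = 200   # hard cap in the consuming module, per the incident report
N_FEATURES = 120      # illustrative count of real features

# Simulated system.columns metadata. After the permissions change, the same
# table is visible through an additional underlying database ("r0" is an
# illustrative name), so an unscoped query sees every column twice.
system_columns = [
    {"database": db, "table": "http_features", "column": f"feature_{i:03d}"}
    for db in ("default", "r0")
    for i in range(N_FEATURES)
]

def build_feature_file(rows, database=None):
    """Generator side: collect feature columns for the target table.
    Without a database filter, duplicate rows are silently included."""
    return [
        r["column"]
        for r in rows
        if r["table"] == "http_features"
        and (database is None or r["database"] == database)
    ]

def load_feature_file(features):
    """Consumer side: enforce the preallocated limit. An unhandled error here
    is roughly how an oversized file turns into a crash rather than a
    graceful rejection."""
    if len(features) > FEATURE_LIMIT:
        raise RuntimeError(
            f"{len(features)} features exceeds the limit of {FEATURE_LIMIT}"
        )
    return features

print(len(load_feature_file(build_feature_file(system_columns, database="default"))))  # 120
load_feature_file(build_feature_file(system_columns))  # 240 entries -> RuntimeError
```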
Once the issue was correctly identified, Cloudflare stopped the generation and rollout of new feature files, manually inserted a known-good version, and restarted affected proxy components. By 14:30 UTC the network was stabilizing, and by 17:06 UTC all services had recovered.
This was no isolated glitch. The outage affected Cloudflare’s core CDN and security layers, Workers KV, Access, Turnstile, and even blocked many users from logging into the Cloudflare Dashboard. The root problem was a change in ClickHouse’s query behavior, which surfaced more metadata than expected, pushed the feature file past its size limit, and caused widespread HTTP 5xx errors.
Cloudflare has already begun work to harden these systems and avoid similar failures in the future. As one of the most significant outages since 2019, this incident highlights how even small changes in internal systems can ripple across a global platform.
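One common hardening for this failure mode is to treat internally generated configuration with the same suspicion as user input: validate each newly generated feature file and keep serving from the previous known-good copy when validation fails, rather than aborting. The sketch below is a minimal illustration of that pattern; the names, file format, and validation rules are assumptions, not Cloudflare's implementation.

```python
# Minimal sketch (assumed names and format, not Cloudflare's implementation):
# validate a newly generated feature file like untrusted input and keep the
# last-known-good copy when validation fails, instead of crashing.

import json
import logging

FEATURE_LIMIT = 200
log = logging.getLogger("feature-loader")

def validate(feature_file: dict) -> None:
    """Reject files that are structurally wrong or exceed hard limits."""
    features = feature_file.get("features")
    if not isinstance(features, list) or not features:
        raise ValueError("feature list missing or empty")
    if len(features) > FEATURE_LIMIT:
        raise ValueError(f"{len(features)} features exceeds the limit of {FEATURE_LIMIT}")
    if len(set(features)) != len(features):
        raise ValueError("duplicate feature entries detected")

class FeatureConfig:
    """Holds the active feature set; a bad update never replaces a good one."""

    def __init__(self, initial: dict):
        validate(initial)
        self.active = initial

    def try_update(self, candidate_json: str) -> bool:
        try:
            candidate = json.loads(candidate_json)
            validate(candidate)
        except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
            # Keep serving with the last-known-good config instead of panicking.
            log.error("rejected new feature file: %s", exc)
            return False
        self.active = candidate
        return True
```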
Key figures referenced in the report include the hard limit on the feature file that was exceeded (200 features), the number of features in use before the incident, how frequently the feature file was regenerated, and the number of minutes from the start of impact until core traffic was largely flowing again.
The timeline unfolded as follows. Normal operations were ongoing when a database access-control change was deployed.
The deployment reached customer environments, and the first errors were observed on customer HTTP traffic.
The team saw rising traffic and errors in Workers KV, which at first looked like degraded KV performance affecting other Cloudflare services. They tried traffic adjustments and account limits to stabilize it. Automated alerts fired at 11:31, manual investigation began at 11:32, and the incident call opened at 11:35.
During the investigation, internal bypasses for Workers KV and Cloudflare Access were enabled so that those systems fell back to a prior version of the core proxy. The issue was present in earlier proxy versions as well, but its impact there was smaller.
The Bot Management configuration file was identified as the trigger for the incident, and work focused on rolling it back to a last-known-good version.
The Bot Management module was confirmed as the source of the 500 errors, traced to a bad configuration file, and automatic deployment of new Bot Management configuration files was stopped (a sketch of this rollback pattern follows the timeline).
Successful recovery was observed using the old version of the configuration file, and focus shifted to accelerating the fix globally.
A correct Bot Management configuration file was deployed globally, and most services started operating correctly.
All downstream services were restarted, and operations were fully restored.
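The recovery steps above, stopping automatic propagation, pinning a last-known-good file, then restarting consumers against it, follow a common rollout-control pattern. The sketch below illustrates that pattern with hypothetical names; it is not Cloudflare's tooling.

```python
# Minimal sketch (hypothetical names, not Cloudflare's tooling): a rollout
# controller with a kill switch for automatic config propagation and a way
# to pin a last-known-good version for global deployment.

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class RolloutController:
    versions: dict = field(default_factory=dict)   # version id -> file contents
    active_version: Optional[str] = None
    auto_deploy_enabled: bool = True               # the "kill switch"

    def publish(self, version_id: str, contents: str) -> None:
        """Called by the generation pipeline each time a new file is built."""
        self.versions[version_id] = contents
        if self.auto_deploy_enabled:
            self.active_version = version_id       # normal automatic rollout

    def halt_auto_deploy(self) -> None:
        """Incident response step 1: stop propagating newly generated files."""
        self.auto_deploy_enabled = False

    def pin_known_good(self, version_id: str) -> str:
        """Incident response step 2: force a specific known-good version."""
        self.active_version = version_id
        return self.versions[version_id]

# Illustrative flow mirroring the timeline:
ctl = RolloutController()
ctl.publish("v1-good", "feature file v1")
ctl.publish("v2-oversized", "feature file v2")  # the bad file goes out automatically
ctl.halt_auto_deploy()                          # stop deployment of new files
good_file = ctl.pin_known_good("v1-good")       # roll back to a known-good version
# Downstream proxies would then be restarted against good_file.
```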
Subscribe to our weekly newsletter DevOpsLinks to receive similar updates for free!
Join other developers and claim your FAUN.dev() account now!
FAUN.dev() is a developer-first platform built with a simple goal: help engineers stay sharp without wasting their time.
