
Cloudflare Outage Not A Cyberattack, Says CEO Matthew Prince — This Is What Really Happened

Cloudflare CEO Matthew Prince said a critical “feature file” suddenly doubled in size.

Cloudflare CEO Matthew Prince has confirmed that the major global outage was not due to a cyberattack, but was instead caused by an internal configuration error in a database permissions change. (Source: ChatGPT)

Cloudflare CEO Matthew Prince has confirmed that the major global outage on Tuesday was not due to a cyberattack, despite initial suspicions of a massive DDoS attack, but was instead caused by an internal configuration error in a database permissions change.

In a detailed postmortem published late Tuesday, Prince explained that a modification to permissions on a ClickHouse database cluster — intended to grant better access to underlying data and metadata — contained a flawed query. This mistake caused the system to pull in far more data than intended, bloating a critical “feature file” used by Cloudflare’s Bot Management system to identify malicious traffic.

The feature file, which is regenerated and distributed across Cloudflare’s global network every five minutes, suddenly doubled in size and exceeded the maximum limit allowed by the company’s edge software. When the routing software encountered the oversized file, it crashed.
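As a rough illustration of the failure described above, the sketch below shows one way a loader could reject an oversized feature file and keep serving the last valid one instead of crashing. It is not Cloudflare's code: the names `FeatureFile`, `load_feature_file` and the `MAX_FEATURES` value are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical limit, for illustration only; the real limit is not given here.
MAX_FEATURES = 200


@dataclass
class FeatureFile:
    version: int
    features: List[str]


def load_feature_file(candidate: FeatureFile, last_good: FeatureFile) -> FeatureFile:
    """Return the candidate if it fits the limit, otherwise keep the last good file."""
    if len(candidate.features) > MAX_FEATURES:
        # Reject the oversized file rather than failing the whole process.
        return last_good
    return candidate


# Usage: a file that doubles in size is rejected and the previous one is kept.
good = FeatureFile(version=1, features=[f"feature_{i}" for i in range(120)])
doubled = FeatureFile(version=2, features=good.features * 2)
active = load_feature_file(doubled, last_good=good)
print(active.version)
```

The design choice in this sketch is to treat an internally generated file with the same suspicion as external input, so that a bad file degrades to stale data instead of an outage.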

“Bad data was only generated if the query ran on a part of the cluster which had been updated. As a result, every five minutes there was a chance of either a good or a bad set of configuration files being generated and rapidly propagated across the network,” Prince wrote.

Every five minutes, the network would either receive a valid file (and briefly recover) or an oversized one (and fail again). This on-again, off-again behaviour lasted for roughly three hours starting around 11:20 UTC, triggering widespread outages and disrupting major services, including X (Twitter), ChatGPT, Canva, Discord, Cloudflare’s own dashboard, Turnstile, and countless other sites and applications.
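The on-again, off-again pattern can be pictured with a small simulation. This is a simplified model, not a reconstruction of Cloudflare's systems: it assumes each five-minute run lands independently on a random part of the cluster, and the function name `simulate_cycles` and the parameter `updated_fraction` are hypothetical.

```python
import random


def simulate_cycles(updated_fraction, cycles, seed=0):
    """Model each five-minute run as producing a 'good' or 'bad' file.

    updated_fraction: assumed share of the cluster that already has the
    permissions change (0.0 to 1.0). A run on an updated part yields 'bad'.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same sequence
    states = []
    for _ in range(cycles):
        hit_updated_part = rng.random() < updated_fraction
        states.append("bad" if hit_updated_part else "good")
    return states


# Usage: with half the cluster updated, good and bad files alternate unpredictably.
print(simulate_cycles(updated_fraction=0.5, cycles=12))
```

In this model the network flips between states while `updated_fraction` is between 0 and 1, and every cycle produces a bad file once it reaches 1.0.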

Prince emphasised that “the issue was not caused, directly or indirectly, by a cyberattack or malicious activity of any kind,” and described it as a self-inflicted incident stemming from the permissions change.

“After we initially wrongly suspected the symptoms we were seeing were caused by a hyper-scale DDoS attack, we correctly identified the core issue and were able to stop the propagation of the larger-than-expected feature file and replace it with an earlier version of the file,” he added.

The outage, which Cloudflare called its most severe since 2019, was fully resolved by 17:06 UTC after engineers rolled back to a previous valid version of the file and restarted affected systems.

Prince issued a public apology to affected customers and users worldwide. Cloudflare has also committed to implementing stricter file-size safeguards, adding global kill switches for critical configurations, and conducting a broader review of failure modes in its core systems. 
