Many of the world’s best-known websites were inaccessible for more than an hour yesterday after the Fastly CDN failed.
The internet outage caused by the content delivery network (CDN) Fastly affected sites including Amazon, CNN, Twitch, Hulu, Reddit, The Financial Times, Stripe, Spotify, The Guardian and the gov.uk domain. It also trashed Twitter’s emojis.
It affected countries from the UK and US to New Zealand, while others, such as parts of Germany, appear not to have been affected.
The outage was caused by a failure within Fastly, which runs an edge cloud designed to speed up loading times for websites, minimise traffic on internet backbone routes and absorb traffic bursts. Fastly, like other CDNs, is therefore an intermediary between those big websites and the people who use them.
Robust and fragile
CDN platforms are designed to be very robust, with huge amounts of redundancy built in, and yet they fail. Configuration errors seem to be a recurring theme.
In 2017, for instance, a problem at Amazon’s AWS hosting business took out some of the world’s biggest websites for several hours across the entire US east coast.
Another CDN company, Cloudflare, caused outages in 2020, one of them lasting half an hour in cities across the Americas and Europe. That was traced to an error in a single link that triggered a cascade of failures, which took out about 20 data centres globally.
Although there was initial speculation that the Fastly failure was the work of hackers, a configuration error is said to have been the cause.
Power in few hands
The nub of the problem is summed up well by The Guardian: “The increasing centralisation of internet infrastructure in the hands of a few large companies means that single points of failure can result in sweeping outages.”
Akamai and Limelight are also major CDNs whose role in the running of the internet is huge, yet they are virtually unknown outside the industry, including to most of those who rely on their unseen services.