老九品茶


AWS down: How a single network outage rippled through businesses, institutions and the economy


Server racks are pictured in a data center. (Credit: Adobe Stock)

Amazon Web Services (AWS) made worldwide headlines on Monday when the service went offline for hours, disrupting popular apps, services and tools from Zoom and Venmo to Snapchat and Reddit.

According to , it was the biggest internet disruption since a faulty software update from the cybersecurity firm CrowdStrike caused widespread outages last year. It was also not the first time the AWS data center in northern Virginia has been implicated in a major outage.

What causes this type of network outage? What impacts can it have on businesses, institutions and the economy? And is there anything we can do to make such outages less likely in the future?

To answer these questions, we spoke with Levi Perigo, professor of computer science and co-director of 老九品茶 Professional Master's Program in Network Engineering.

What happened and who did it impact?

The AWS outage was widespread and caused major disruptions for many businesses, resulting in what some experts estimate to be hundreds of billions of dollars in economic impact. The issue stemmed from a failure in the Domain Name System (DNS), which acts as the Internet's "phone book." DNS translates the easy-to-remember web addresses we use, like amazon.com, into the numeric IP addresses computers use to communicate.
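To make the "phone book" idea concrete, here is a minimal Python sketch of a lookup table that maps names to addresses. It is a deliberate simplification: real DNS is a distributed, hierarchical system of servers, and a Python program would normally ask the operating system's resolver through `socket.getaddrinfo`. The hostnames use the reserved `.example` domain and the addresses come from documentation-only ranges, so both are hypothetical.

```python
# A toy "phone book": hostnames mapped to IP addresses.
# Hypothetical entries: .example is a reserved domain, and 192.0.2.0/24 and
# 198.51.100.0/24 are documentation-only address ranges (RFC 5737).
PHONE_BOOK = {
    "shop.example": "192.0.2.10",
    "payments.example": "198.51.100.25",
}


def resolve(hostname: str) -> str:
    """Return the IP address recorded for hostname, or raise LookupError."""
    try:
        # Hostnames are case-insensitive, so normalize before looking up.
        return PHONE_BOOK[hostname.lower()]
    except KeyError:
        raise LookupError(f"no address record for {hostname!r}") from None


print(resolve("shop.example"))  # prints 192.0.2.10
```

A browser or app does the equivalent of `resolve` before it can open a connection, which is why a failure at this step stops everything that comes after it.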

When part of AWS's internal DNS infrastructure went down, many of the services and websites hosted by AWS lost the ability to "find" each other. To users, it appeared that websites and applications were broken or offline. Even though DNS is a relatively simple technology, it is a fundamental part of how the Internet works, so when it fails, the effects can be enormous.
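The sketch below, which assumes a hypothetical internal registry and made-up service names, shows why a DNS failure can look like a full outage: the backend server is still running, but callers can no longer turn its name into an address, so every request fails anyway.

```python
# Hypothetical internal name registry standing in for internal DNS.
internal_dns = {"orders.internal": "10.0.0.5", "inventory.internal": "10.0.0.7"}

# Addresses of servers that are actually up. Both backends are healthy.
healthy_hosts = {"10.0.0.5", "10.0.0.7"}


def call_service(name: str) -> str:
    """Look a service up by name, then 'connect' to its address."""
    address = internal_dns.get(name)
    if address is None:
        # Name resolution failed: the caller never learns where to connect.
        return f"FAILED: cannot resolve {name}"
    if address not in healthy_hosts:
        return f"FAILED: {name} at {address} is down"
    return f"OK: reached {name} at {address}"


print(call_service("inventory.internal"))  # OK: reached inventory.internal at 10.0.0.7

# Simulate the DNS failure: the record disappears, but the server keeps running.
del internal_dns["inventory.internal"]
print(call_service("inventory.internal"))  # FAILED: cannot resolve inventory.internal
```

In a real cloud, many services call many others this way, so one missing or broken record can cascade into failures far from where the problem started.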

What causes this type of outage to happen?


Levi Perigo

Outages like this can happen for several reasons, but most often it comes down to human or configuration errors that are amplified by the massive scale of operations at companies like AWS. To manage millions of systems efficiently, large cloud providers rely on network automation, essentially using software to configure and control their infrastructure.

In this case, it is likely that a small misconfiguration or script error was deployed across thousands of systems, resulting in a large-scale failure. These incidents highlight the importance of careful testing, validation, and documentation, especially when automation is involved.
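As an illustration only, the sketch below shows two common safeguards for automated changes. It is not how AWS deploys configuration; the `Host` class, the `validate` checks and the `staged_rollout` function are hypothetical. The first safeguard rejects an obviously broken configuration before it ships; the second rolls a change out in small batches and halts at the first batch that fails a health check, so an error reaches a handful of systems instead of thousands.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Host:
    name: str
    config: dict = field(default_factory=dict)


def validate(config: dict) -> list[str]:
    """Hypothetical pre-deployment checks for a DNS-style config."""
    problems = []
    records = config.get("records", {})
    if not records:
        problems.append("config contains no DNS records")
    for name, ip in records.items():
        if not ip:
            problems.append(f"record {name!r} has an empty address")
    return problems


def staged_rollout(hosts: list[Host], config: dict,
                   health_check: Callable[[Host], bool],
                   batch_size: int = 2) -> list[str]:
    """Apply config batch by batch; stop after the first unhealthy batch.

    Returns the names of the hosts that received the new config.
    """
    problems = validate(config)
    if problems:
        raise ValueError("refusing to deploy: " + "; ".join(problems))
    updated = []
    for start in range(0, len(hosts), batch_size):
        batch = hosts[start:start + batch_size]
        for host in batch:
            host.config = dict(config)
            updated.append(host.name)
        if not all(health_check(host) for host in batch):
            # Halt here: later batches never receive the change.
            break
    return updated


fleet = [Host(f"dns-{i}") for i in range(1000)]
bad = {"records": {"db.example": ""}}  # one empty address, easy to miss by eye
try:
    staged_rollout(fleet, bad, health_check=lambda host: True)
except ValueError as err:
    print(err)  # refusing to deploy: record 'db.example' has an empty address
```

Validation only catches the mistakes someone thought to check for, which is why the batched rollout with health checks is a second, independent line of defense.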

Could it happen again? How vulnerable are we?

Unfortunately, outages like this are always possible. The more we rely on centralized cloud platforms such as AWS, the more we share in their risk. The scale of this week's disruption shows just how much of the internet depends on a few key providers. While AWS has been strong and reliable overall, no system, no matter how advanced, is immune to failure.

What can we do to prevent this or reduce the risk of it happening again?

There are ways to reduce risk, though it is difficult to eliminate it completely. One key strategy is called multi-cloud architecture: using multiple cloud providers (such as AWS, Google Cloud, and Microsoft Azure) to host services rather than relying on just one. This approach helps ensure that if one provider experiences an outage, the others can keep systems running.
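A minimal sketch of the failover idea behind multi-cloud architecture, assuming each provider is represented by a hypothetical function that either returns a response or raises `ConnectionError`. The second function gives a rough sense of the payoff: if a service stays up as long as any one of n providers is up, and if providers fail independently, combined availability is 1 - (1 - a)^n, where a is each provider's availability. Independence is an optimistic assumption, since providers can share failure modes, and real failover also has to keep data and DNS records consistent across clouds.

```python
from typing import Callable


def fetch_with_failover(providers: list[tuple[str, Callable[[], str]]]) -> str:
    """Try each provider in order and return the first successful response."""
    errors = []
    for name, fetch in providers:
        try:
            return fetch()
        except ConnectionError as exc:
            errors.append(f"{name}: {exc}")
    raise ConnectionError("all providers failed: " + "; ".join(errors))


def combined_availability(per_provider: float, count: int) -> float:
    """Chance at least one of `count` providers is up, assuming each is
    available `per_provider` of the time and failures are independent."""
    return 1 - (1 - per_provider) ** count


# Hypothetical providers: the primary is suffering a DNS outage.
def primary() -> str:
    raise ConnectionError("DNS resolution failed")


def secondary() -> str:
    return "served by secondary cloud"


print(fetch_with_failover([("primary", primary), ("secondary", secondary)]))
# prints: served by secondary cloud
```

For example, two providers that are each available 99.9% of the time (roughly 8.8 hours of downtime a year) would, under the independence assumption, give `combined_availability(0.999, 2)` of about 0.999999, or around half a minute of downtime a year.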

Ultimately, incidents like this remind us that the internet is now critical infrastructure, and its reliability depends not only on technology, but also on careful design, operational discipline, and shared responsibility across providers and customers alike.