Amazon Outage Today, October 20, 2025
Millions of users across the United States and around the world faced widespread disruptions early on October 20. The incident began at 3:11 a.m. ET in AWS’s US-EAST-1 region and quickly affected core web services and consumer apps.
The company first described an operational issue that escalated into significant API errors and connectivity problems. Downdetector recorded about 6.5 million reports touching more than 1,000 sites, showing how dependent many services are on a single cloud region.
Social platforms, finance apps, airlines, and media outlets all saw interrupted workflows and stalled transactions. Early mitigation was signaled by 6:35 a.m. ET, but lingering problems and reduced request limits lasted through the day.
This introduction sets the scene for a deeper timeline and root-cause analysis. Readers will see how EC2 networking and DynamoDB interactions, DNS impacts, and cascading failures combined to create a major web service crisis.
What happened in the Amazon Outage and why it mattered across the United States
At 3:11 a.m. ET a fault in US‑EAST‑1, AWS's oldest and largest region, set off a chain reaction that hit many core online systems. The event made a wide range of services slow or unreachable for people across the country.
AWS’s US‑EAST‑1 disruption rippled through core internet services
Amazon Web Services reported an operational alert tied to internal load balancer health checks in EC2. That internal problem cascaded into DynamoDB and related APIs, and a separate API update led to DNS resolution failures.
The result: a visible disruption across many sectors. Social media and mainstream media platforms, banks, airlines, telecoms, and public portals all saw degraded performance or outages.
- Companies in finance, travel, retail, and government were affected.
- Core components for authentication, storage, and messaging were impacted.
- Shared reliance on one major provider magnified the scope of the incident.
The incident shows how tightly linked modern services are to a single cloud region. That shared dependency is why millions of people lost access to news, communication, and transactions nearly at the same time.
Live timeline: From first reports to phased recovery
Timeline entries trace the disruption from initial reports through gradual system recovery.
Early hours ET
At 3:11 a.m. ET an operational issue in US‑EAST‑1 affected 14 services. Users saw early interruptions across apps and websites.
Morning ET
By 6:35 a.m. ET AWS said the database problem was fully mitigated. Many users still experienced delays as backlogs cleared.
Late morning ET
At 10:14 a.m. ET engineers confirmed significant API errors and connectivity trouble across multiple services. An 11:43 a.m. ET update tied the fault to a subsystem monitoring load balancers and to EC2 networking.
Afternoon to evening ET
By 4:03 p.m. ET systems showed steady improvement and throttling limits were eased. Some apps remained degraded into the evening, so user experience lagged behind core recovery.
- Downdetector recorded 6.5 million reports across more than 1,000 sites.
- Phased restoration shows how distributed systems recover at different rates.
| Time (ET) | Update | Impact |
|---|---|---|
| 3:11 a.m. | Operational issue flagged | 14 services affected |
| 6:35 a.m. | Database fully mitigated | Lingering delays |
| 10:14 a.m. | API errors confirmed | Wider connectivity problems |
| 4:03 p.m. | Service improvements | Throttles reduced; some apps still degraded |
This concise timeline shows key milestones in the AWS outage and clarifies that mitigation and full recovery did not happen at the same time.
Root cause analysis: EC2 network and Domain Name System knock-on effects
Investigators traced a linked chain of failures that began in an internal EC2 health checker and spread to multiple services. The initial fault degraded routing and made many endpoints unreachable.
Internal subsystem issue monitoring network load balancers in EC2
An internal subsystem that monitors the health of EC2 network load balancers malfunctioned. That fault impaired routing and reduced connectivity across several regions.
DynamoDB API update and DNS resolution failures disrupted connections
A subsequent DynamoDB API update introduced an error that affected domain name resolution. Applications could not resolve the correct domain name for the API, which blocked authentication and data access.
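To make this failure mode concrete, here is a minimal sketch of how an application might detect a DNS resolution failure for an API endpoint and fall back to a second region. The endpoint names, retry counts, and delays are illustrative assumptions, not details from AWS's incident reports.

```python
import socket
import time

# Illustrative endpoint names; real applications would use whatever regional
# endpoints their SDK or service configuration exposes.
PRIMARY = "dynamodb.us-east-1.amazonaws.com"
FALLBACK = "dynamodb.us-west-2.amazonaws.com"

def resolve_with_fallback(primary, fallback, retries=3, delay=2.0):
    """Try to resolve the primary endpoint; fall back to another region if DNS fails."""
    for host in (primary, fallback):
        for attempt in range(retries):
            try:
                addrs = socket.getaddrinfo(host, 443)
                return host, addrs[0][4][0]  # hostname and first resolved address
            except socket.gaierror:
                time.sleep(delay * (attempt + 1))  # simple linear backoff between attempts
    raise RuntimeError("DNS resolution failed for all candidate endpoints")
```

A fallback of this kind only keeps traffic flowing if the application and its data are actually available in the second region, which is an architectural decision rather than a code change.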
Cascading impact to aws services including SQS, Amazon Connect, and more
The interplay of EC2 networking degradation and failed DNS lookups produced a cascading impact across 113 AWS services. Dependent systems such as SQS and Amazon Connect experienced interrupted message flows and call routing.
“The root cause chain shows how control‑plane faults and API-level resolution issues can amplify into wide failures across shared infrastructure.”
- Cause chain: health-check error → impaired routing → API DNS failures.
- Why it mattered: domain name system failures prevent apps from finding endpoints, stopping key workflows.
- Aftermath: fixing the immediate cause did not clear backlogs; systems needed time to return to normal.
Amazon Outage: services and apps affected
Users and businesses saw a broad set of consumer services and business apps disrupted during the incident. The impact touched social platforms, publishing sites, finance tools, travel systems, smart home devices, and gaming ecosystems.
Social and media platforms
Social media and news media were visibly affected. Snapchat, Pinterest, Apple Music, Apple TV, Disney, The New York Times, The Wall Street Journal, and other outlets faced delays or degraded feeds.
Finance and crypto
Payment and trading apps saw interruptions. Venmo, several banks, Coinbase, and Robinhood reported issues that slowed payments and access to exchange data.
Airlines, telecom, smart home, gaming
Delta and United noted minor delays, while T‑Mobile acknowledged service problems. Users of Ring doorbells, Alexa speakers, Prime Video, and Kindle reported missed notifications or failed downloads.
Gaming and creator platforms such as Roblox, Fortnite, Xbox, Duolingo, and Canva experienced login and timeout errors. Trackers also listed WhatsApp, Zoom, Slack, Etsy, OpenAI, and ESPN as affected.
- Why it mattered: many companies use the same core web services, so a single regional fault hit diverse apps at once.
- Note: newsrooms like the Associated Press activated backup systems to keep critical feeds running.
How users and businesses experienced the disruption
Consumers quickly noticed missing data, failed sign‑ins, and stalled feeds on services they use daily.
Consumer frustrations: login failures, missing data, app timeouts
Many users saw login failures and timeouts that prevented routine actions. Snapchat users lost friends lists and streaks. Streaming and messaging apps showed stalled content.
Smart home customers reported unresponsive Ring doorbells and stalled Alexa commands. Some essential websites for enrollment and provider search were intermittently unreachable, creating urgent problems for people needing access.
Operational headaches for companies relying on cloud infrastructure
Businesses faced internal tool slowdowns and external service degradation at the same time.
- Customer-facing systems: checkout flows, booking pages, and account portals timed out.
- Internal tools: support dashboards and provider searches returned errors or stale data.
- Time‑sensitive impact: flight checks and enrollment windows saw friction that affected planning.
| Symptom | Example | Impact |
|---|---|---|
| Login failures | Snapchat, gaming platforms | Users could not access accounts |
| Missing data | Friend lists, streaks, provider results | Confusion and lost session context |
| Device timeouts | Ring, Alexa | Smart home features paused |
Issues persisted as systems processed backlogs, so customers saw intermittent problems even after core services began recovery. This variability mapped to upstream network and DNS faults that produced a broad outage experience for many users and businesses.
How AWS responded: mitigation steps, limits, and recovery
The provider mobilized engineering teams across its control planes to both repair systems and manage user-facing impact. Engineers split into multiple parallel recovery paths to speed diagnosis and reduce the risk of further failures.

Engineers engaged on parallel recovery paths and request throttling
Multiple teams worked in parallel to isolate the health-check and routing issues. To protect core capacity, the company said it limited new customer activity and throttled requests while balancing stability and throughput.
Key official updates tracked the cadence of recovery: the company said the database problem was mitigated at 6:35 a.m. ET, API errors were confirmed at 10:14 a.m. ET, and improvements with reduced limits were reported by 4:03 p.m. ET.
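As a client-side counterpart to that throttling, the sketch below shows one way an application could let the AWS SDK for Python absorb throttled responses with automatic backoff instead of retrying aggressively against a recovering service. The retry mode and attempt count are illustrative choices, not AWS recommendations tied to this incident.

```python
import boto3  # assumes the AWS SDK is installed and credentials are configured
from botocore.config import Config

# "adaptive" retry mode adds client-side rate limiting on top of the standard
# exponential backoff, so throttled calls slow down rather than pile up.
retry_config = Config(retries={"max_attempts": 10, "mode": "adaptive"})

dynamodb = boto3.client("dynamodb", region_name="us-east-1", config=retry_config)
# Calls made through this client now back off automatically on throttling and
# transient errors instead of failing immediately.
```

Configuring retries at the SDK level keeps the behavior consistent across all calls, which matters when a provider is deliberately limiting request rates during recovery.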
Support ticketing interruption and commitment to post-event summary
The automated support ticketing system also experienced downtime, complicating escalation paths for customers. That hiccup slowed direct communication even as engineers worked to restore service.
“The provider will publish a detailed post-event summary to explain causes and corrective actions.”
| Action | Purpose | Result |
|---|---|---|
| Parallel recovery tracks | Faster isolation of faults | Targeted fixes for affected systems |
| Throttling & limiting new activity | Prevent cascading failures | Reduced traffic; stabilized services |
| Support system downtime | N/A (impact) | Delayed customer escalations |
- Staged recovery required temporary limits so systems could sync and clear backlogs.
- Different AWS services recovered at varying speeds, reflecting complex dependencies.
- Clear recovery communications proved essential for customers coordinating their own incident responses.
Overall, the response combined technical fixes with traffic management and a promise of transparency via a post-event report. That approach aimed to restore web operations while minimizing repeat risk during the prolonged outage.
On-the-ground snapshots: Roblox developers and real-world delays
Real-world workflows stalled as key development and editorial tools became unreachable for hours.
Roblox Studio downtime halted development work in the US and UK
Developers reported that both the Roblox game and Roblox Studio were inaccessible in the US and the UK. Teams paused builds and tests until systems returned, a common response when external services fail.
Airline and healthcare slowdowns: minor delays and access issues
Delta logged only minor delays, but travel teams saw disrupted workflows and check‑in friction. Medicare users struggled to log into portals during open enrollment, creating time‑sensitive headaches.
Major media outlets, including The New York Times, faced service interruptions. The Associated Press activated backup systems to keep feeds running.
- Snapshot: companies and developers used manual workarounds while backend services recovered.
- Impact: production pipelines and schedules paused, showing how a region‑centered outage can ripple across the world.
- Aftermath: some problems lingered into the evening as queues and caches took extra time to clear.
“Contingency plans and clear communication proved essential for teams coping with the disruption.”
Context: AWS’s role in cloud computing and past outages
As the largest cloud provider, AWS shapes how millions of apps run and how failures ripple through the internet.
The company holds roughly 30% of the cloud infrastructure market and generated about $107 billion in revenue in FY2024. That scale makes its control planes and data planes foundational for many web services and business operations.
Why centralized cloud services amplify outages’ reach
Centralization concentrates identity, storage, and messaging layers in shared regions. When those layers fail, correlated outages affect unrelated systems at the same time.
Past incidents in 2021 and 2023 show this pattern: airline reservations and payment apps felt the impact even when their internal architectures differed. Those events underline recurring risk for companies that rely on a dominant provider.
- Trade-offs: multi-region or multi-provider designs reduce single-region exposure but add cost and complexity.
- Scope: consolidated computing workloads can stall background processing, analytics, and customer-facing features.
- Responsibility: resilience requires both provider fixes and customer architecture, testing, and response plans.
“Shared infrastructure increases blast radius; design choices determine how loudly a failure is heard.”
What this means going forward for cloud resilience and users
Lessons from the service failures point to concrete changes in architecture and operations. Companies should design for region‑level failure with multi‑AZ and multi‑region failover, and consider selective multi‑provider strategies where feasible.
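A minimal sketch of region-aware failover at the client, assuming the application exposes a health endpoint in each region; the URLs below are placeholders, and real deployments would more often rely on DNS-based failover or a global load balancer:

```python
import urllib.request
import urllib.error

# Placeholder health-check URLs, one per region the application is deployed in.
REGION_ENDPOINTS = [
    "https://api.us-east-1.example.com/health",
    "https://api.us-west-2.example.com/health",
]

def first_healthy_endpoint(endpoints, timeout=3):
    """Return the first endpoint whose health check answers with HTTP 200, or None."""
    for url in endpoints:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                if resp.status == 200:
                    return url
        except (urllib.error.URLError, TimeoutError):
            continue  # region unreachable or unhealthy; try the next candidate
    return None
```

The code is deliberately trivial: the hard part of multi-region design is replicating data and state so the secondary endpoint can actually serve traffic when the primary region fails.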
Teams must break services into smaller components, add circuit breakers, and use idempotent retries with exponential backoff. Test domain name system failure modes, add local DNS caches, and provide fallback endpoints for critical flows.
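One of those patterns, sketched below as a hedged illustration rather than a production library, is a basic circuit breaker that fails fast once a dependency has repeatedly errored, then probes it again after a cooldown; the thresholds are arbitrary examples:

```python
import time

class CircuitBreaker:
    """Fail fast after repeated errors so callers stop piling onto a struggling dependency."""

    def __init__(self, max_failures=5, reset_after=30.0):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # seconds to wait before a trial call
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: dependency treated as unavailable")
            self.opened_at = None  # half-open: allow one trial call through
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # success closes the circuit again
        return result
```

Combined with idempotent retries and jittered backoff, a breaker like this turns a slow, cascading failure into a fast, bounded one that downstream code can handle explicitly.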
For data durability, adopt asynchronous queues, dead‑letter handling, and strict isolation between critical and noncritical paths. Centralize telemetry and run regular game days to validate recovery and failover runbooks.
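For the queueing piece, here is a small boto3 sketch of a main queue backed by a dead-letter queue, assuming AWS credentials are configured; the queue names and the receive threshold are hypothetical:

```python
import json
import boto3  # assumes the AWS SDK is installed and credentials are configured

sqs = boto3.client("sqs", region_name="us-east-1")

# Dead-letter queue that will hold messages the application repeatedly fails to process.
dlq = sqs.create_queue(QueueName="orders-dlq")
dlq_arn = sqs.get_queue_attributes(
    QueueUrl=dlq["QueueUrl"], AttributeNames=["QueueArn"]
)["Attributes"]["QueueArn"]

# Main queue: after 5 failed receives, a message moves to the dead-letter queue
# instead of blocking the rest of the backlog during an incident.
main_queue = sqs.create_queue(
    QueueName="orders",
    Attributes={
        "RedrivePolicy": json.dumps(
            {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": "5"}
        )
    },
)
```

Isolating stalled or poison messages this way is part of why queue-backed systems tend to clear backlogs rather than lose work after an outage like this one.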
Customers should expect intermittent issues during a wide outage and check official status pages. The provider’s promised root cause report will help companies align SLAs and improve resilience across web services worldwide.