Coinbase Halts Trading After AWS Chiller Failure
On May 7 Coinbase paused trading after an AWS data center chiller failure caused quote errors that knocked its matching engine and Kafka offline, disrupting trades and balances.
Coinbase halted trading for several hours on May 7 after a cooling failure at an Amazon Web Services data center triggered widespread quote errors that took the exchange’s matching engine and Kafka messaging cluster offline. The outage disrupted spot and derivatives trading and delayed account balance updates. Internal monitors flagged the issue at about 23:50 UTC and engineers opened multiple Sev1 incidents as customers experienced interruptions across retail, advanced, institutional and international products, including Coinbase Prime and derivatives services. CEO Brian Armstrong posted on X calling the outage “never acceptable” and attributed it to “a room overheating in an AWS data center due to multiple chillers failing.” Coinbase identified the affected facility in the AWS us-east-1 region. Coinbase engineers described the root event as a thermal failure that impacted a small percentage of racks in a single facility. The company keeps its low-latency exchange infrastructure in one availability zone to meet speed requirements, while maintaining a distributed backup copy elsewhere. During the incident two critical components failed: hardware under the matching engine and the Kafka cluster that distributes data across systems. The matching engine runs as a distributed cluster that requires a quorum of healthy nodes to elect a leader and process trades. With some nodes compromised, quorum could not be achieved and trading halted across multiple markets. Engineering teams executed disaster recovery procedures to rebuild quorum and assess system health while operating with constrained infrastructure. Restoring Kafka required recovering partitions onto a new broker and re-synchronizing replication for terabytes of data, which caused delayed balance streams until replication caught up. Coinbase reported no customer data loss. When core systems were restored, markets were not re-enabled immediately. Products were first placed in cancel-only mode, then moved to auction mode to validate state before full trading resumed. The outage occurred as Coinbase pursues an AI-focused operational strategy that includes plans to reduce roughly 700 roles, about 14% of its workforce, to replace manual processes with automation. The company said it will publish a detailed postmortem in the coming weeks. In response to speculation on internal and public threads, employee Josh Ellithorpe wrote that “no one vibe coded something that failed. A ‘non-engineer’ didn’t push production code and take out the trading engine. It wasn’t intentional. It wasn’t because Coinbase failed to design a failover system. Things happen at scale, don’t let the armchair quarterbacks tell you tall tales.” Rob Witoff, head of platform, provided technical updates on the recovery steps and the causes, noting the work required to restore distributed systems under hardware stress and the manual effort to recover Kafka partitions and synchronization.
Content on BlockPort is provided for informational purposes only and does not constitute financial guidance.
We strive to ensure the accuracy and relevance of the information we share, but we do not guarantee that all content is complete, error-free, or up to date. BlockPort disclaims any liability for losses, mistakes, or actions taken based on the material found on this site.
Always conduct your own research before making financial decisions and consider consulting with a licensed advisor.
For further details, please review our Terms of Use, Privacy Policy, and Disclaimer.








