lol
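Looks like AWS won't get fixed tonight
Moderator: 牛河梁
#3 Re: Looks like AWS won't get fixed tonight
But a lot of websites still won't come back up.
#4 Re: Looks like AWS won't get fixed tonight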
Oct 20 3:53 PM PDT Between 11:49 PM PDT on October 19 and 2:24 AM PDT on October 20, we experienced increased error rates and latencies for AWS Services in the US-EAST-1 Region. Additionally, services or features that rely on US-EAST-1 endpoints such as IAM and DynamoDB Global Tables also experienced issues during this time. At 12:26 AM on October 20, we identified the trigger of the event as DNS resolution issues for the regional DynamoDB service endpoints. After resolving the DynamoDB DNS issue at 2:24 AM, services began recovering but we had a subsequent impairment in the internal subsystem of EC2 that is responsible for launching EC2 instances due to its dependency on DynamoDB. As we continued to work through EC2 instance launch impairments, Network Load Balancer health checks also became impaired, resulting in network connectivity issues in multiple services such as Lambda, DynamoDB, and CloudWatch. We recovered the Network Load Balancer health checks at 9:38 AM. As part of the recovery effort, we temporarily throttled some operations such as EC2 instance launches, processing of SQS queues via Lambda Event Source Mappings, and asynchronous Lambda invocations. Over time we reduced throttling of operations and worked in parallel to resolve network connectivity issues until the services fully recovered. By 3:01 PM, all AWS services returned to normal operations. Some services such as AWS Config, Redshift, and Connect continue to have a backlog of messages that they will finish processing over the next few hours. We will share a detailed AWS post-event summary.
There's still a backlog, so I'd guess a lot of websites still won't be back up tomorrow.
#6 Re: Looks like AWS won't get fixed tonight
赖美豪中 wrote: October 20, 2025, 21:01
Oct 20 3:53 PM PDT Between 11:49 PM PDT on October 19 and 2:24 AM PDT on October 20, we experienced increased error rates and latencies for AWS Services in the US-EAST-1 Region. [...]
There's still a backlog, so I'd guess a lot of websites still won't be back up tomorrow.
So how many Chinese engineers are going to take the blame this time?
#10 Re: Looks like AWS won't get fixed tonight
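I work on systems for a living, and I've seen firsthand how fragile a lot of websites are. Often a site still can't recover even after Amazon has fixed its side, because a pile of backend components went down with it. On top of that, the log and analytics pipelines build up a huge backlog to work through, and many of those pipelines were never designed for the system being down this long.
Things have also gotten more fragmented over the past few years. With hundreds or even thousands of microservices, nobody can tell exactly where the problem is. There will be plenty of follow-on issues.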
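#11 Re: Looks like AWS won't get fixed tonight
True.
闪光的二猫 wrote: October 21, 2025, 01:01
I work on systems for a living, and I've seen firsthand how fragile a lot of websites are. Often a site still can't recover even after Amazon has fixed its side, because a pile of backend components went down with it. On top of that, the log and analytics pipelines build up a huge backlog to work through, and many of those pipelines were never designed for the system being down this long.
Things have also gotten more fragmented over the past few years. With hundreds or even thousands of microservices, nobody can tell exactly where the problem is. There will be plenty of follow-on issues.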







