Inventory and Delivery Planning

Checkout - Spike in use of Fallback Speeds in shipping options due to EDO Timeout

Resolved

Incident Resolved: CTIM-993
Contain incident
Please contact CTIM or see the ticket for more details.

Updated

UPDATE:

The Dispatch squad has identified the root cause of the fallback shipping speeds used because of EDO failures between 3 AM and 4 AM UTC on December 2. Some pods were in an unhealthy state: they could no longer make network calls, yet they were neither removed nor able to self-heal. Dispatch was paged by its own monitoring at the time of the incident. Because this failure mode is rare, it took some time to find the affected pods and cycle them. For future alerts, the Dispatch team will check pod health first. As a preventative measure, they have also lowered the CPU scaling threshold.

Updated

UPDATE:

Dispatch reported back that they were alerted by their own monitoring and took action during the incident.

Investigating

New Incident: CTIM-993
Priority: Critical
Escalation sent to: PCD: Dispatch Squad for review.
Between 3 AM and 4 AM UTC on December 2, there was an unusual spike in the use of Fallback Speeds for shipping options due to an EDO Timeout. Fallback speeds kept shipping options available to affected users, but this workaround is suboptimal, particularly during the holiday season. More than 3,000 shoppers were affected and were shown fallback speeds in their shipping options.
For follow-up, please refer to the dedicated Vista Slack channel, #pbm-156316-post-incident-investigation.
Given the timing and potential impact, this should be treated as a priority during business hours.