Incident Resolved: CTIM-919
Timeouts and errors have decreased significantly since the changes implemented over the past two days. Teams in CT and Vista have both confirmed increased stability in requests to the affected service. At this point, the incident is contained.
Please contact CTIM or see the ticket for more details.
UPDATE:
The changes made to improve caching have resulted in a measurable decrease in timeout errors for merchandising quantity requests, although there are still more errors than in the previous month. CT and Vista teams continue to investigate the slowness to pinpoint the exact causes.
UPDATE:
Updates to the PI service are nearly complete; the code changes have been deployed and only the swap is pending. Some minor improvements were seen after Vista’s updates to Product Pages, though we expect the cache changes to Strandbeest’s services to have a greater positive impact.
UPDATE:
The updates to Product Page V2 have been deployed. This should help manage API request traffic to the service while the caches are being implemented.
UPDATE:
Vista teams are planning a change to reduce total requests per minute, which could help mitigate the problem; the change is currently pending merge approval and deployment. Concurrently, Strandbeest is adding capacity to handle the load more effectively.
UPDATE:
Strandbeest is aware of the latency spikes reported by Vista, which appear to have started on 10/07/2024 and have steadily increased since then. They have already been working to improve the caching process for BES, which is being implemented today for some services. Currently, support/dev teams believe this could be the cause of the issue.
UPDATE:
Escalation sent to: PCD: Strandbeest
Added Strandbeest to assist.
New Incident: CTIM-919
Priority: Critical
Escalation sent to: PCD: Product Services for review.
Vista PBM has reported issues with product selection options on product pages. The selections intermittently fail to display for customers, though a page refresh sometimes corrects the issue. They noted an overall increase in timeouts for merchandising quantity requests to https://catalog-transition.products.vpsvc.com/api/v2/products (200-800/hr over the past week). Requesting a health check from Product Services on this endpoint (owned by Strandbeest).