Product Configuration

Product Page experiencing errors with Selector API

Updated

UPDATE:

There are no additional details to share at this point. We will continue to monitor the issue and follow up with the dev team during business hours about any further response-time spikes in the Selector API. Next update within 24 hours.

Updated

UPDATE:

Teams continue to actively investigate possible containment measures for the periodic slowness, though no additional customer impact has been reported at this point. We will monitor progress and provide another update within 1 hour.

Updated

UPDATE:

The team confirmed that defining a reasonable maximum for the quantity attribute resolves the performance issue for the affected versions. We are waiting for the other teams to verify this. The next update will be shared once more information is available, or within 1 hour.

Updated

UPDATE:

Further analysis shows that API response times are significantly higher for the affected product versions, while the underlying PRDs continue to function normally. The team is now reviewing whether the serialized attribute model returned by the model composer, which may be contributing to the API performance issue, can be optimized. Next update within 2 hours.

Updated

UPDATE:

The team has identified a specific product that is contributing to the slowness observed in the API, and the behavior has been successfully replicated in the staging environment. Initial analysis shows that the engine initialization step takes significantly longer than expected and consumes high CPU resources. The team continues to investigate why an older version of the product is being processed and how this affects API performance. Next update within 2 hours.

Updated

UPDATE:

Teams have reviewed the situation and agreed that, while performance is not yet fully within expected SLA levels, the system is stable enough for the investigation to continue on Monday during business hours. Infrastructure capacity has been increased and monitoring has been enhanced to quickly detect any further issues. The situation will be monitored throughout the weekend, and a deeper investigation will resume as a priority on Monday, with the next CTIM update also on Monday.

Teams will re-engage CTIM if another spike occurs before then.

Updated

UPDATE:

Vista reports that metrics have improved since the incident started but have not yet returned to baseline levels. Additional context on ongoing mitigation efforts and next steps is needed from the dev teams. Next update as soon as more information is available from them.

Updated

UPDATE:

Vista teams have observed some improvement over the last hour, with fewer timeouts, though some spikes persist. Investigation into the root cause and additional mitigation efforts continues. Next update within 1 hour or sooner.

Updated

UPDATE:

Investigation indicates that the issue may be related to the complexity of certain products rather than to overall request load. As a mitigation step, the team has temporarily reduced processing to one-fifth of its normal level to contain the impact while the investigation continues. The team is now monitoring API CPU utilization and latency to confirm whether this change improves system stability.
Further clarification is also needed from the team on why the Service is calling the API. Further updates will be shared as findings become available, or within 1 hour.

Updated

UPDATE:

The team has identified a high volume of requests coming from a client ID that belongs to NP; earlier spikes were also observed for the NP account between 12:15 PM and 1:30 PM IST. These spikes appear to be contributing to the increased load on the API.
The NP team has been engaged to review the client activity and confirm why these requests are being generated. Once they validate the source and behavior of these calls, we will have better clarity on whether this is expected traffic or the result of a change. Next update once NP confirms, or within 1 hour.

Updated

UPDATE:

Teams are actively investigating the issue with priority. Next update within 1 hour.

Updated

UPDATE:

Approximately 1% of total API requests are experiencing high latency, while the remaining 99% are still being served within SLA. Other product pages continue to function normally.
Relevant teams are continuing their investigation to identify the root cause and implement measures to prevent a recurrence. The next update will be provided once more information is available, or within 1 hour.

Updated

UPDATE:
Escalation sent to: Commerce Support

Investigation continues to identify the root cause. Another support team is being added to assist. Next update within 1 hour.

Updated

UPDATE:

The teams are debugging the issue internally to determine the root cause. The next update will be provided once more information is available, or within 30 minutes.

Updated

UPDATE:

The team continues to investigate the issue with high priority. Next update within 30 minutes.

Updated

UPDATE:

The issue has reappeared, and we are again seeing an increase in API-related errors on the Product Page. The team is actively investigating the spike in error rates and elevated response times.
The relevant team has been paged to assist with the investigation. Next update within 30 minutes.

Updated

UPDATE:
Escalation sent to: PCD: Falcon Squad

Adding another support team to assist with the investigation.

Resolved

Incident Resolved: CTIM-2404
The issue has auto-resolved and error levels have returned to normal, so the case status will be moved to Contained.
Please contact CTIM or see ticket for more details.

Updated

UPDATE:

The team is continuing to investigate a Critical-priority incident affecting Vista, where the Selector API is impacting PPv2 and Wrangler services. The next update will be provided in 30 minutes or less.

Updated

UPDATE:
Escalation sent to: PCD: FireSharks Squad

Paging the FireSharks squad, which owns the Selector API. Next update in 30 minutes or less.

Investigating

New Incident: CTIM-2404
Priority: Critical
Escalation sent to: PCD: Product Operations for review.
CTIM is investigating a critical-priority incident affecting Vista. The Selector API is impacting PPv2 and Wrangler services. Approximately 500 users have been confirmed to have experienced fallback or incorrect configurator behavior since the issue began, with most of the impact and error spikes occurring in the past 1–3 hours. Most issues are currently observed on the Product Page BFF, but the Selector API is used across multiple site areas, and other dependencies may also be affected. The next update will be provided in 30 minutes or less.