Vista

All agents are seeing mass blue screens on Windows

Resolved

Incident Resolved: CTIM-740
No additional problems have been reported in production. Services have stabilized and been running as expected for an hour (or longer in most cases). At this point the incident is fully contained.

Please contact CTIM or see the ticket for more details.

Updated

Incident Updated: No additional major reports of fulfiller problems have been received by CTIM in the past hour. The Vista-specific workflows that were still having issues are being remedied. We will keep this incident open for now in case new symptoms are noticed, but we are moving closer to a point of stability in all previously affected services.

Updated

Incident Updated: All support teams have been working to restore production functionality to various fulfillers. Most should be able to resume normal activities at this point. There remain some issues with specific workflows, primarily in Vista plants, that continue to be investigated. If you still have unresolved or undiagnosed problems, please notify CTIM or use your normal support escalation process so that we may help.

Updated

Incident Updated: Ganger started for BHI, WND & RNO. Targets are generating. IBK status: During these restarts and domain controller problems, an internal service might have received a new IP address. The Network Team and Product Services continue to check the IP addresses.

Updated

Incident Updated: VEN IBK status: Found no network adapter configured on prdvenlscibk001/002. Rebooting 001 now. Investigation on IBK continues. Ganger started for BHI & WND.

Updated

Incident Updated: IBK for WND came back up, and Harshil from Product Services is verifying it. The Network team confirmed the IBK address is resolving and pointing toward wnd-reverseproxy.net.vpsvc.com. IBK is online and should be working as expected.

Updated

Incident Updated: Fulfillment and Plant IT teams continue to work on restoring services after a recent global CrowdStrike outage (well beyond Cimpress/Vista in scope) brought down all of our Windows-based systems, both in the plants and in AWS. At present we are, optimistically, predicting that we should have full plant functionality restored at most sites within the next 2-4 hours, with Deer Park lagging slightly behind (we may need someone to physically drive in to help with restoration of service, which will add some time).
All Viper-related servers seem to be back up in Venlo. Some services are still having issues, and the team is working to get them running now.
VENVPDC01 / primary DNS for Venlo has been recovered. That should fix a ton of DNS issues. The Venlo team will do the same for DPK now.

Updated

Incident Updated: Update from Muninn:
All plant DB servers are online
AWS DBs are online and accessible via SQL Credentials

Updated

Incident Updated: Product Services, Production Models & Composition Services teams are online and verifying their services.

Updated

Incident Updated: AWS DB Servers seem healthy and can be reached via the dbprd.msw.cimpress.io alias using SQL Credentials (Windows auth most likely won't work at the moment).
Akeyless token rotations are healthy. Venlo PRD failures are machines that need to be started.
VISTAPRINT.SVC domain servers started
VENVPDC01 was down; trying to start it
VENVPDC02 is up

Updated

Incident Updated: Updates from Muninn:
WND/VEN unchanged (up and healthy)
DPK 813 & 814 nodes are up and healthy; 815 is a VM that has booted up
BHI 713 is up and able to take connections; 714 & 715 are down, and I’m unable to get into vCenter there to check their status
AWS: Waiting on Venlo’s VMs to come back online, as the Domain Controller they use is there

Updated

Incident Updated: BHI: All of Viper’s services have started, but it appears that Ganger is failing.
DPK: Both the VM farm and VPUDC need to be examined.

Updated

Incident Updated: UFI - Akeyless client tokens are not refreshing in any plant. Investigation continues.

Updated

Incident Updated: Teams are still working on restoring the data center systems.

Updated

Incident Updated: Updates from Muninn:
Windsor plant DB cluster is up and healthy
Venlo plant DB cluster is up and healthy; DBINFRA313 is down
Deer Park plant DB cluster is down
Bhiwandi plant DB cluster is down
AWS cluster is up, but unreachable because the .svc domain controllers in Venlo are down

Updated

Incident Updated: All Plant IT teams are working with the Muninn, Systems Engineering & MIS UFI squads to restore the data center systems by following the manual steps. Some systems have already been restored, but many still need to be restored.

Updated

Incident Updated: WND Plant confirmed that dbplant213 is back up and running. Awaiting health check.

Updated

Incident Updated: 2:28am Eastern - CrowdStrike has reverted a change on their end to remediate this issue. They have provided instructions if you are stuck in a reboot loop. In addition, they have provided some steps that need to be performed manually on all machines to restore them.

Updated

Incident Updated: DPK IT confirmed VPDC01 is online now and in safe mode.

Updated

Incident Updated: UPDATE: The Cimpress Security team has reported that there is a global outage with the CrowdStrike platform, and they are working towards a fix. We will continue to post updates as we hear back.

Investigating

New Incident: CTIM-740
Priority: Critical
Escalation sent to: CT Incident Management for review.
Currently, all users on Windows are seeing a blue screen error and are unable to log in.