The Value of Digital Experience Management: A Lesson from the CrowdStrike Global Outage

Colleen Marinelli

Last week, we experienced one of the largest global IT outages, impacting millions of devices. Businesses worldwide reported failures, including the infamous Windows “Blue Screen of Death” (BSOD) on their computers, caused by a defective update from cybersecurity firm CrowdStrike. No industry was immune: the outage affected airlines, banks, businesses, schools, governments, and even some health services facilities across the globe.

IT organizations around the world are still recovering, and full recovery could take weeks. This incident underscores the critical importance of observability and Digital Experience Management (DEM) solutions in today’s interconnected world, and shows how much value they can provide during a global IT outage.

Key benefits of DEM solutions during global IT outages

During an outage, organizations need to detect and respond to issues quickly to limit downtime and disruption, and clear communication with users is crucial. DEM solutions capture user interactions and performance metrics, which lets organizations keep users informed about service status and expected resolution times.

By offering insights into system performance and user behavior, DEM solutions also help build more resilient IT infrastructures. Comprehensive reporting lets organizations understand the impact of an outage and provides valuable data for post-incident analysis, continuous improvement, and better future response strategies.

Riverbed Aternity: A vital tool for managing global outages

Riverbed Aternity is a prime example of a DEM solution that can be invaluable during global IT outages. Over the past few days, many customers have used Aternity to gain visibility into the impact of the CrowdStrike incident, enabling them to take targeted action, fix problems faster, and mitigate the situation.

Aternity helped customers swiftly identify which applications and servers across the enterprise were affected and determine whether the issues were escalating or subsiding. This visibility let IT teams quickly confirm which systems were back to normal, supporting a smooth and efficient recovery process. Here are a few ways Aternity can help in these types of incidents:

  1. Real-Time Monitoring: Aternity provides real-time monitoring of user experiences and application performance. This can help organizations quickly identify and diagnose issues affecting their systems and devices.
  2. Incident Management: With its detailed analytics and insights, Aternity can assist IT teams in pinpointing the root causes of outages and performance degradation, enabling faster resolution.
  3. User Experience Insights: By understanding how the outage impacts end-users, organizations can prioritize critical issues and ensure that essential services are restored first.
  4. Proactive Alerts: Aternity’s proactive alerting system can notify IT teams of potential issues before they escalate, helping to mitigate the impact of the outage.
  5. Comprehensive Reporting: Detailed reports and dashboards provide visibility into the performance and availability of applications and services, aiding in post-incident analysis and future prevention strategies.

Aternity helps organizations maintain consistent performance, availability, and continuous operation, even during large-scale disruptions. These capabilities make Riverbed Aternity a powerful ally in managing and mitigating the effects of a widespread IT outage.
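
To make the question of whether issues are escalating or subsiding more concrete, here is a minimal sketch in Python. It is not Aternity's implementation or API. The function name classify_trend, the per-interval counts of affected devices, and the 20% tolerance are all assumptions chosen for illustration.

```python
from statistics import mean


def classify_trend(counts, window=3, tolerance=0.2):
    """Label an incident as escalating, subsiding, or stable.

    counts: affected-device counts per time interval, oldest first
    (hypothetical input; a DEM tool would supply its own data).
    The mean of the latest `window` intervals is compared with the
    mean of the `window` intervals immediately before them.
    """
    if len(counts) < 2 * window:
        return "insufficient data"
    previous = mean(counts[-2 * window:-window])
    recent = mean(counts[-window:])
    if previous == 0:
        # No earlier incidents: any new ones count as escalation.
        return "escalating" if recent > 0 else "stable"
    change = (recent - previous) / previous
    if change > tolerance:
        return "escalating"
    if change < -tolerance:
        return "subsiding"
    return "stable"
```

For example, hourly counts that climb from single digits into the hundreds would be labeled escalating, while the same series in reverse would be labeled subsiding. A real alerting rule would likely need smoothing and per-site baselines; the fixed tolerance here only keeps the sketch short.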

Aternity’s ability to track and monitor critical errors

By tracking and monitoring instances of the Blue Screen of Death (BSOD) on Windows devices, Aternity helps IT teams identify and troubleshoot the root causes of these critical system errors, ensuring better stability and performance for end-users.

Aternity tracks BSOD events by monitoring the health and performance of Windows devices in real time through the following process:

  • Agent Installation: A small agent is installed on each monitored device, collecting data on system performance, application usage, and errors, including BSOD events.
  • Event Logging: When a BSOD occurs, the agent logs the event details, such as the error code, timestamp, and relevant system information.
  • Data Transmission: The collected data is sent to Aternity’s central server, where it is aggregated and analyzed.
  • Dashboard and Alerts: IT teams can view BSOD events on Aternity’s dashboard, which provides visualizations and detailed reports. Alerts can also be configured to notify IT staff immediately when a BSOD occurs.
  • Root Cause Analysis: Aternity helps identify patterns and potential root causes of BSOD events by correlating them with other system and application performance data.

This comprehensive approach allows IT teams to quickly identify and address the underlying issues causing BSODs, improving overall system stability and user experience.
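
The correlation step can be illustrated with a short Python sketch. This is an assumption-laden example, not Aternity's data model: the BsodEvent record, the inventory mapping of device name to installed software version, and the function crash_rate_by_version are hypothetical names introduced here. The idea is simply to compare the share of devices that crashed on each software version.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class BsodEvent:
    """One logged crash (hypothetical record layout)."""
    device: str
    stop_code: str
    timestamp: str


def crash_rate_by_version(events, inventory):
    """Return {version: share of devices on that version with >= 1 BSOD}.

    inventory maps device name -> installed software version.
    Events from devices missing in the inventory are ignored.
    """
    devices_per_version = Counter(inventory.values())
    crashed = defaultdict(set)
    for event in events:
        version = inventory.get(event.device)
        if version is not None:
            # A set counts each device once, however often it crashed.
            crashed[version].add(event.device)
    return {
        version: len(crashed[version]) / total
        for version, total in devices_per_version.items()
    }
```

If nearly every device on one version has crashed while devices on another version have not, that version is a strong candidate for the root cause. Such a pattern is evidence worth investigating, not proof on its own.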

Assisting with remediation during outages

For organizations already using Aternity, the impact of software upgrades can be closely monitored, such as the update of the CrowdStrike Sensor Platform and CrowdStrike Windows Sensor from version 7.14.18408.0 to 7.14.18410.0. IT teams can then apply remediation steps to resolve issues, such as:

  1. Booting Windows into Safe Mode or the Windows Recovery Environment.
  2. Navigating to the C:\Windows\System32\drivers\CrowdStrike directory.
  3. Locating and deleting the file matching “C-00000291*.sys”.
  4. Booting the host normally.
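
Steps 2 and 3 above could be sketched in Python as follows. This is an illustrative sketch only, not an official CrowdStrike or Riverbed script. It assumes a Python interpreter is available in the recovery session, which is often not the case, so many teams would run the equivalent commands from a command prompt instead. The function name remove_matching_files and the dry_run safeguard are assumptions added here; the directory and file pattern come from the steps above.

```python
from pathlib import Path

# Directory and pattern taken from the remediation steps above.
DRIVER_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")
PATTERN = "C-00000291*.sys"


def remove_matching_files(directory=DRIVER_DIR, pattern=PATTERN, dry_run=True):
    """Find files matching the pattern; delete them only if dry_run is False.

    Returns the list of matching paths so the caller can log or review
    them. With dry_run=True (the default) nothing is deleted.
    """
    directory = Path(directory)
    if not directory.is_dir():
        return []
    matches = sorted(p for p in directory.glob(pattern) if p.is_file())
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches
```

Calling remove_matching_files() first lists what would be removed, and remove_matching_files(dry_run=False) performs the deletion. Defaulting to a dry run is a deliberate choice, since deleting files from a driver directory is not something to do by accident.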

In conclusion, the recent CrowdStrike global outage has highlighted the critical importance of Digital Experience Management solutions. Solutions like Riverbed Aternity provide the real-time insights, proactive alerts, and comprehensive reporting needed to manage and mitigate the effects of widespread IT disruptions effectively. As organizations continue to recover, investing in robust DEM solutions will be key to building more resilient IT infrastructures and maintaining service continuity in the face of future challenges.
