Lessons from the CrowdStrike Outage

Sebastian Griffin
Aug 8, 2024

In today's interconnected world, technology is the keystone of nearly every aspect of our daily lives. The recent CrowdStrike outage has highlighted the vulnerabilities inherent in relying on a single entity for crucial technological services, reinforcing the importance of competition to provide different service options.

CrowdStrike, a leading cybersecurity provider, experienced a significant outage caused by an engineering and configuration problem, not malicious activity. The incident shut down critical IT systems across various sectors, including transportation. The ripple effects were profound and widespread: transportation networks ground to a halt, businesses lost access to vital data protection services, and the general public experienced disruptions in essential services, all due to the failure of a single company. This vividly illustrates how excessive technological consolidation can give a single provider outsized influence and pose severe risks to our economic and social stability.

Recovery from such an incident has been a significant undertaking. Cities like Philadelphia and New York faced monumental tasks to restore thousands of computer systems. Philadelphia restored more than 6,000 systems, while New York had to tackle issues with roughly 300,000 machines. Recovery initially required staff to work on each of the thousands of downed machines individually. This labor-intensive process highlights the critical need for better preparation and more resilient systems.

CrowdStrike's response included developing a method to help organizations in government or commercial cloud environments. This approach involved manually rebooting downed devices, which would then automatically identify and quarantine the flawed update before it could crash the system. Such measures, while effective, emphasize the importance of careful planning and rigorous testing before deploying updates. This is an issue we find ourselves facing even in our regional and state systems.

The question isn't just about short-term recovery; it's also about how organizations can plan to better withstand future incidents. Carefully deciding when to accept software updates is crucial. Real-time cybersecurity updates are intended to keep up with rapidly evolving threats, but quickly accepting a vendor’s update without first testing it risks system failures. Organizations need to discuss with their vendors how quickly to update different kinds of systems, based on their operations' criticality and likelihood of being targeted.

CrowdStrike itself is now making changes to how it approaches updates. The company plans to use more types of testing, add more validation checks, and give customers more control over when and where content updates are deployed. Introducing updates to a small group of users before rolling them out to all customers is expected to make problems easier to catch before they affect everyone.
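For readers who want to see what a staged rollout can look like in practice, here is a minimal Python sketch. It is not CrowdStrike's process: the `apply_update` callback, the stage sizes (1%, 10%, then everyone), and the failure threshold are all assumptions chosen for illustration. The idea is simply that each stage must stay healthy before the next, larger group receives the update.

```python
from typing import Callable, Sequence


def staged_rollout(
    hosts: Sequence[str],
    apply_update: Callable[[str], bool],
    stage_fractions: Sequence[float] = (0.01, 0.10, 1.0),
    max_failure_rate: float = 0.0,
) -> dict:
    """Apply an update in growing stages, halting if a stage fails too often.

    apply_update(host) is a hypothetical callback that returns True when the
    host is healthy after the update. Stage sizes and the failure threshold
    are illustrative defaults, not recommendations.
    """
    updated = []
    failed = []
    done = 0  # number of hosts attempted so far
    for fraction in stage_fractions:
        # Cumulative number of hosts that should be attempted by the end of this stage.
        target = min(len(hosts), max(1, round(len(hosts) * fraction))) if hosts else 0
        batch = hosts[done:target]
        batch_failures = 0
        for host in batch:
            if apply_update(host):
                updated.append(host)
            else:
                failed.append(host)
                batch_failures += 1
        done = max(done, target)
        # Stop before touching the remaining hosts if this stage was unhealthy.
        if batch and batch_failures / len(batch) > max_failure_rate:
            return {"halted": True, "updated": updated, "failed": failed,
                    "untouched": list(hosts[done:])}
    return {"halted": False, "updated": updated, "failed": failed, "untouched": []}
```

With 100 machines and a faulty update, this sketch would stop after the first machine fails and leave the other 99 untouched, which is the whole point of starting small.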

In the future, organizations should consider whether outside factors make a potential software acquisition riskier. A product widely used by Fortune 100 companies, for example, has the added risk of being an attractive target to attackers hoping to hit many such victims in a single attack. Identifying any single points of failure in their environments, where the disruption of one IT solution could disrupt the entire organization, is also crucial.
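One simple way to start looking for single points of failure is to list each business function alongside the products or vendors that can deliver it, then flag every function with only one option. The Python sketch below does this under stated assumptions: the inventory format, the function name, and the vendor names are all hypothetical, and a real assessment would also need to account for shared infrastructure behind seemingly separate vendors.

```python
from collections import defaultdict


def single_points_of_failure(options_by_function):
    """Return {provider: [functions that depend on that provider alone]}.

    options_by_function maps a business function to the set of
    interchangeable products or vendors able to deliver it.
    """
    sole = defaultdict(list)
    for function, options in options_by_function.items():
        if len(options) == 1:
            (only,) = tuple(options)  # the one provider with no alternative
            sole[only].append(function)
    return {provider: sorted(functions) for provider, functions in sole.items()}


# Hypothetical inventory; every name here is made up for illustration.
inventory = {
    "endpoint protection": {"VendorA"},
    "passenger check-in": {"VendorA"},
    "email": {"VendorB", "VendorC"},
}
result = single_points_of_failure(inventory)
```

In this made-up inventory, two functions rely on "VendorA" alone, so an outage there would take both down, while email has a fallback.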

While some resiliency measures may be too expensive for most organizations to adopt, awareness of these issues can help them adjust to and react to events like this in the future.
