Saturday’s 911-system outage in the District of Columbia highlights the need for fault-tolerant systems to run mission-critical applications. Because of a PEPCO power outage at the call site on Martin Luther King Jr. Avenue, citizens could not reach EMS personnel from 1:53 to 2:16 p.m. Although traditional and social media channels did their best to get the word out about alternate numbers, all 617,996 citizens of the District were put at risk. Perhaps nothing is more critical to a city than public safety systems such as EMS, fire, and police response, and keeping those systems free of outages.

@AriAnkhNeferet said it best on Twitter: “Someone please explain to me how it’s possible that 911 is experiencing a power outage?! Come on DC. we have to do better.”

She is right: the most mission-critical systems and applications shouldn’t be subject to outages, power or otherwise. Backup systems, fault-tolerant servers, and disaster recovery solutions can all make your EMS system safer for the community. Servers wired for two distinct power sources fed from separate power grids, like our ftServers, are an easy way to guard against power outages. Live data replication and split-site capabilities, both features of our Avance high availability software, are further ways to keep your systems protected.

Power failures are not the only threat: server crashes, memory failures, disk drive failures, and countless other technical problems bring servers down far more often. Saturday’s power outage demonstrates what can happen when a public safety system goes down for any reason, and it reinforces that steps need to be taken to protect systems from these more common failures.

When lives are at stake, you cannot be too careful. However, @AriAnkhNeferet’s tweet shows that something else is at stake: reputation. What happens if the public loses trust in the EMS system’s ability to respond? A large metro area can receive 30,000 9-1-1 calls per day, or roughly 21 calls per minute. At that rate, the 23-minute outage could have affected more than 400 9-1-1 calls, leaving citizens stranded and the city’s first line of defense helpless to respond.
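
For readers who want to check the estimate above against their own call volumes, here is a rough sketch of the arithmetic in Python. It assumes calls arrive at a uniform rate throughout the day, which real call centers will not match exactly, and the function name calls_during_outage is purely illustrative.

```python
from datetime import datetime

CALLS_PER_DAY = 30_000       # figure quoted above for a large metro area
MINUTES_PER_DAY = 24 * 60


def calls_during_outage(start: str, end: str,
                        calls_per_day: int = CALLS_PER_DAY) -> float:
    """Estimate the calls arriving between two same-day times ("HH:MM").

    Assumption: calls are spread evenly across the day. Real traffic
    has peaks and lulls, so treat the result as a ballpark figure.
    """
    fmt = "%H:%M"
    outage = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    outage_minutes = outage.total_seconds() / 60
    return calls_per_day / MINUTES_PER_DAY * outage_minutes


# Saturday's outage ran from 1:53 to 2:16 p.m.
print(f"{calls_during_outage('13:53', '14:16'):.0f} calls")  # prints "479 calls"
```

Swapping in your own daily call volume and a plausible outage window gives a quick sense of how many callers a similar failure could strand.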

If you run life-saving systems, it is worth running through some worst-case scenarios against your existing architecture. What happens when the power fails in your call center? What happens when a server has a hardware failure? What is your disaster recovery plan in the case of an earthquake, fire, or flood? Are dedicated resources available 24 hours a day in the case of a failure?