Introduction: Recently, I sat down with Terry Rourke in programs marketing to discuss how North America’s emergency call centers are doing in terms of keeping their critical public safety applications up and running. The answer, sadly, is not good. Terry has been conducting Stratus’ Public Safety Answering Point (PSAP) survey for two years, and as he prepares to unveil the results from the latest survey, he predicts not much will have changed in a year. In fact, he doesn’t think enough attention is placed on this important issue because 911 generally works when you dial it. But, what’s not often reported, Terry states, are all the system outages and lengthy downtime 911 centers experience. He believes even though incremental improvements are made year after year, including implementing virtualization and cloud technologies, these steps are never enough to detect and prevent a system failure before it happens and that’s essential when lives are on the line. We wanted to capture his thoughts and share them here on Voice of Uptime. Please let us know what you think, and what should be done to solve the availability challenge for our nation’s aging 911 infrastructure in the cloud era.
Question: In your opinion, why will many PSAPs experience application outages in 2015? Aren’t virtualization and cloud technologies at a point where downtime should never even enter the equation for a 24/7/365 operation like emergency response?
Terry: Sadly, many PSAPs will continue to experience downtime and outages this year because they are focused on recovering applications that are down rather than preventing the downtime in the first place. Although technologies like virtualization and cloud are more mature than ever before, both still fall into the recovery category. The good news, however, is that the length of time these critical applications are down should decrease, since these systems are focused on restoring availability.
Question: Last year, some of the most alarming statistics were: 911 can miss about 60 calls per hour; 16% of call centers experience downtime five or more times; and, almost 30% experience downtime lasting more than one hour. What do you think will be some of the most alarming stats this year and why?
Terry: I think two alarming themes will emerge from this year’s survey. The first is we’ll see smaller, regional emergency call centers experience more downtime this year due to their stretched resources. These centers generally have smaller budgets, and their first priority is finding enough quality staff to quickly answer incoming calls to provide support, guidance and care to those in need. They know adopting next-gen 911 technologies is important, but it may not be feasible for certain PSAPs.
The second alarming theme is that 30% of call centers are still likely to experience more than an hour of downtime, and that’s hard to digest given the advanced technology we have available today. Larger PSAPs are the most at risk of missing calls due to server outages because of the sheer volume of calls they receive. Additionally, these larger centers must act as the failover sites for their smaller counterparts within the regions they serve.
Given these factors, these PSAPs should never rely on “good enough” availability technology such as clustering, which involves using sets of servers working together as a single system to restart failed applications. The issue with clusters is that they don’t avoid failures; they just recover quickly from them. By comparison, Stratus’ fault-tolerant systems combine duplex hardware with software that constantly monitors more than 500 system components and sensors to identify, handle and report faults, ensuring uninterrupted performance of essential systems like CAD within the PSAP.
Question: What are most PSAPs doing to make improvements while “doing more with less?”
Terry: Most PSAPs are focused on implementing next-generation 911 technologies along with cloud measures to serve their population needs. Next-gen 911 systems are Internet Protocol (IP)-based and allow digital information used by the public (e.g., voice, photos, videos, and text messages) to flow seamlessly through 911 networks. This digital information comes through the internet to the PSAP and is shared or transferred to first responders via services implemented in a cloud environment, hence the parallel. But, as PSAPs contemplate these upgrades, they should consider more than the technology infrastructure itself. They should carefully evaluate vendor claims, service level agreements (SLAs) and the total cost of ownership for the full implementation and maintenance of the new system. Often, solutions that may have looked appealing at first blush, such as clustering options, are not as appealing when you factor in these hidden costs in addition to increased downtime during failovers.
Question: How can virtualization and cloud decrease the risk of downtime events in PSAPs?
Terry: These technologies do reduce the risk of planned downtime for things like service and maintenance, and they help PSAPs recover from downtime quickly, including feeding into their disaster recovery planning. But, they won’t reduce the risk of unplanned downtime until they are able to monitor the viability of the system with algorithms for predicting and avoiding downtime. And just as PSAPs struggle to use these technologies to minimize downtime, so do the largest cloud providers themselves. For example, according to cloud provider benchmarking company Cloud Harmony, Microsoft Azure was only 99.9388% available last year. This was due to 103 incidents that totaled just shy of 43 hours of downtime in a year!
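For readers who want to relate an availability percentage to hours of downtime, here is a minimal Python sketch of the conversion. It assumes a single service measured over one 365-day year; the function names are ours for illustration and do not come from Cloud Harmony. Under that assumption, 99.9388% works out to roughly 5.4 hours a year, so the 43-hour total quoted above is presumably summed across several separately monitored regions or services. That reading is our assumption; the interview does not say how the total was compiled.

```python
# Illustrative conversion between availability percentage and downtime.
# Assumption: one service observed over a non-leap year (8,760 hours).
HOURS_PER_YEAR = 365 * 24


def downtime_hours(availability_percent: float,
                   period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours of downtime implied by an availability percentage."""
    return (1 - availability_percent / 100) * period_hours


def availability_percent(downtime: float,
                         period_hours: float = HOURS_PER_YEAR) -> float:
    """Availability percentage implied by a number of downtime hours."""
    return 100 * (1 - downtime / period_hours)


# Figure quoted in the interview: 99.9388% availability.
print(f"{downtime_hours(99.9388):.2f} hours of downtime per year")

# For comparison: availability of one service that was down 43 hours.
print(f"{availability_percent(43):.4f}% availability")
```

The same arithmetic can be used to sanity-check the uptime figures in a vendor’s SLA before comparing them against a PSAP’s own tolerance for missed calls.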
Question: Are more PSAPs putting disaster recovery plans in place or firing up secondary locations in case of catastrophic outages this year?
Terry: Though all PSAPs have some type of contingency plan in place for catastrophic outage events, it’s not clear many have made progress in this area for their systems. I believe those who will establish secondary locations for their CAD will use cloud technologies to do so, and these centers will realize significant benefits. The reason: cloud offers a form of outsourced IT using virtualization. In a sense, applications and data can be hosted by a cloud service provider with multiple datacenter locations and integrated plans for disaster recovery.
Question: Given upgrades don’t happen to the extent needed, do you think the U.S. would ever consider a centralized 911 initiative, creating a broad pool of resources with a fail-safe architecture as is done in other English-speaking countries around the world such as New Zealand?
Terry: Actually, there are already some initiatives underway for centralized 911 infrastructures, but I think it will require a lot more support and mainstream awareness before the movement makes any real progress. So it is possible – it’s just not happening for a while.
Consider the struggle we’ve all witnessed around universal healthcare and electronic health records (EHR). These subjects are still widely debated and misunderstood, yet we have made progress on both, so better approaches will certainly evolve with time.
Question: Are there any additional takeaways you can share now before you release the new data? Thanks for your time.
Terry: Although public safety still has a lot of room for improvement, mid-survey results indicate next-gen 911 technology implementations have increased, with only 25% of PSAPs reporting they have no next-gen 911 plans in place today. Additionally, 13.5% reported they have already implemented next-gen 911, which is an improvement from last year and great for the communities these centers serve.
I’m looking forward to seeing what other improvements and plans large and small call centers have made, and I am excited to share with you the final data from 2014 in a few weeks as our public safety depends on it.