A few weeks ago, Stratus hosted a webinar with Light Reading titled “Achieving Instantaneous Fault Tolerance for Any Application on Commodity Hardware,” aimed at telcos and communications application providers. I was pleasantly surprised at the turnout: hundreds of people were interested in this topic. Here is a brief overview of what we discussed.

Communications networks have always needed high availability and resiliency. As more networking applications, such as SDN controllers and virtualized functions, are deployed on commodity servers rather than proprietary, purpose-built hardware, the need for software-based resiliency and fault tolerance has never been greater. A reliable network depends on its ability to quickly and reliably connect endpoints, transfer data and maintain quality of service (QoS). If the network goes down, even for just a few seconds, many people can be affected. A system failure can not only cost the provider revenue, it can also seriously damage the provider's reputation and trigger penalty payments.

Unplanned server and data center outages are expensive, and the cost of downtime is rising. The average cost of unplanned downtime is $7,900 to $11,000 per minute, depending on which study you believe. With average data center downtime of 90 minutes per year, this translates to a cost of roughly $700,000 to $1M per year, per data center.
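The arithmetic behind that annual figure is simple enough to check. The short Python sketch below multiplies the per-minute cost range quoted above by the 90 minutes of average annual downtime; the function name is just an illustrative choice, and the inputs are the figures cited in this post rather than new data.

```python
def annual_downtime_cost(cost_per_minute, downtime_minutes_per_year):
    """Return the yearly cost of unplanned downtime for one data center."""
    return cost_per_minute * downtime_minutes_per_year


# Figures quoted in the post: $7,900 to $11,000 per minute, 90 minutes per year.
DOWNTIME_MINUTES_PER_YEAR = 90
low = annual_downtime_cost(7_900, DOWNTIME_MINUTES_PER_YEAR)
high = annual_downtime_cost(11_000, DOWNTIME_MINUTES_PER_YEAR)

print(f"Annual cost per data center: ${low:,} to ${high:,}")
# prints: Annual cost per data center: $711,000 to $990,000
```

Depending on which per-minute estimate you use, the total lands between about $0.7M and $1M per data center per year.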

A highly available (HA) network is one that ensures the network and its services are always on and always accessible (service accessibility), and that active sessions are always maintained without disruption (service continuity). Five nines (99.999%) availability is the minimum benchmark, meaning that the service is down for no more than about five minutes in a one-year period. While a typical HA level of five nines (99.999%) or even six nines (99.9999%) sounds impressive, it may not be good enough to maintain QoS!

Let’s look at an example. Consider an application that has six nines (99.9999%) of availability. At this level of HA, the application will not be down for more than 31.5 seconds a year, which may seem impressive. However, if the application were to fail every other week for just a second (about 26 seconds a year, still within the six-nines budget) and was not capable of returning to its original state after each failure, active sessions would likely be disrupted or degraded. So technically, a service may still be up (maintaining its HA metrics), but if active customer sessions are experiencing disruption or degradation in the form of reconnections, lower throughput, higher latency or reduced functionality, the service will likely violate the Service Level Agreement (SLA) and result in significant customer dissatisfaction and penalties for the service provider.
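To put numbers on the "nines," here is a minimal Python sketch that converts an availability level into a yearly downtime budget and checks whether a pattern of short outages stays inside it. The helper names are hypothetical, and the calculation assumes a 365-day year and counts only total downtime, which is exactly why it says nothing about whether sessions survive each outage.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a non-leap year


def downtime_budget_seconds(nines):
    """Maximum downtime per year allowed by an availability of N nines."""
    unavailability = 10 ** (-nines)  # e.g. 5 nines -> 0.00001
    return SECONDS_PER_YEAR * unavailability


def fits_budget(outages_per_year, seconds_per_outage, nines):
    """True if the total outage time stays within the N-nines budget."""
    total_downtime = outages_per_year * seconds_per_outage
    return total_downtime <= downtime_budget_seconds(nines)


for nines in (5, 6):
    budget = downtime_budget_seconds(nines)
    print(f"{nines} nines: {budget:.1f} s/year ({budget / 60:.2f} min)")

# One-second outages at different frequencies, checked against six nines.
print(fits_budget(26, 1, 6))  # every other week
print(fits_budget(52, 1, 6))  # every week
```

Under these assumptions, 26 one-second outages a year fit within the six-nines budget of roughly 31.5 seconds, while 52 one-second outages exceed it and fit only within five nines. In either case the availability metric alone does not show how many sessions were dropped along the way.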

So what telcos and communications providers need is more than just five nines or even six nines of availability. They need resilient platforms that can intelligently manage faults, continue service without disruption or degradation in performance, functionality and latency, and maintain the minimum acceptable levels of service defined in the SLA. And since not all applications require the same level of resiliency, it is important to manage resiliency SLAs based on the different types of applications and their requirements. This is the difference between traditional HA solutions and resilient fault-tolerant solutions like everRun from Stratus Technologies.

everRun is a Software Defined Availability (SDA) infrastructure that moves fault management and automatic failover from the applications to the software infrastructure. This provides fully automated and complete fault tolerance for all applications, including fault detection, localization, isolation, service restoration, redundancy restoration and, if desired, state replication, all without requiring application code changes and with dynamic levels of resiliency. This means any application can be deployed immediately with high resiliency, multiple levels of state protection and ultra-fast service restoration on commercial off-the-shelf (COTS) hardware in any network, without the complexity, time-consuming effort and risk associated with modifying and testing every application. This is why everRun is ideal for communications applications such as video monitoring, network management, signaling gateways, firewalls, network controllers and more.

In the webinar, we discussed the differences between standard HA systems and resilient platforms like everRun, the options for deploying resiliency (in the applications versus in the software infrastructure), a brief overview of everRun, customer use cases, and examples of how everRun is used in the communications space for telco networks and converging industries. To watch the webinar and learn more, please click here.