The number of outages has steadily climbed in recent years, prompting the Uptime Institute to develop a standardised method of articulating the severity of such incidents, with the aim of better understanding key trends…
The Uptime Institute has announced its new Outage Severity Rating (OSR) to help the digital infrastructure and data centre community better understand and articulate service outages in the context of how each incident affects the business.
For the past three years, Uptime Institute’s Intelligence group has been studying publicly reported outages to understand the causes and impacts of unplanned downtime. Over that period, the number of public outages has steadily climbed, with 27 outages in 2016, 57 in 2017 and 78 in 2018.
This rise in outages is proportional to the complexity of typical infrastructures, where computing capacity and its associated data are delivered by a combination of in-house data centre sites, colocation facilities and the cloud, all connected by high-capacity networks.
Consequently, IT system and network problems have now surpassed mission critical and facilities issues as the leading causes of publicly recorded outages. In previous years, power was the biggest cause.
Uptime Institute executive director of research Andy Lawrence says: “Public awareness of outages is becoming more pronounced as the number and impact of outages increases. In most cases, we find it difficult to understand the true nature and magnitude of the outage since most practitioners still characterise the severity of an outage based on the amount of affected physical infrastructure equipment.
“The OSR was developed to allow the data centre industry’s infrastructure practitioners to view outages from the top down, at the IT service delivery level, and then communicate with one another in an informed and normalised business impact fashion. The OSR eliminates the equipment-centric view of outages, and instead focuses on the ability for the hybrid digital infrastructure to support the required IT business services being delivered by the infrastructure.”
Historically, an ‘outage’ was treated as a binary state of service delivery: entire data centres were described as either online or offline. Consequently, Uptime Institute has been advising companies to pay more attention to business service resiliency by understanding how the hybrid system is designed and what the interdependencies are, and then planning accordingly.
The use of OSR will allow IT business managers to better understand their own outage trends and where to focus their investments to reduce business continuity vulnerabilities and other risks over time.
The Outage Severity Rating (OSR) is categorised as follows:
Negligible – This is a minor outage, recorded and reported but with little or no obvious impact on business services, and no service disruptions.
Minimal – This is an outage where some number of IT business services are disrupted or degraded but with minimal effect on users/customers/reputation.
Significant – This is an outage with observable customer/user service disruptions, mainly of limited scope, duration or effect. Minimal or no financial effect. Some reputational or compliance impact(s) possible.
Serious – This is a major outage, with disruption of service and/or operations. Ramifications include some financial losses, compliance breaches, damage to reputation and possible safety concerns.
Severe – This is a mission critical outage, with major, damaging disruption of services and/or operations, with ramifications including large financial losses, possible safety issues, compliance breaches, customer losses and reputational damage.
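For teams that want to record incidents against these five categories in their own tooling, the scale could be encoded as an ordered type. The Python sketch below is illustrative only and is not part of the Uptime Institute's announcement: the integer values, the class name OutageSeverity and the helper functions worst and count_at_or_above are all assumptions introduced here, chosen simply so that ratings can be compared and sorted.

```python
from enum import IntEnum


class OutageSeverity(IntEnum):
    """The five OSR categories, ordered from least to most severe.

    The numeric values are an assumption made for ordering only.
    """
    NEGLIGIBLE = 1
    MINIMAL = 2
    SIGNIFICANT = 3
    SERIOUS = 4
    SEVERE = 5


def worst(ratings):
    """Return the most severe rating in a non-empty collection."""
    return max(ratings)


def count_at_or_above(ratings, threshold):
    """Count incidents rated at the threshold category or higher."""
    return sum(1 for rating in ratings if rating >= threshold)


# Hypothetical incident log, for illustration only.
incidents = [
    OutageSeverity.MINIMAL,
    OutageSeverity.SERIOUS,
    OutageSeverity.NEGLIGIBLE,
    OutageSeverity.SIGNIFICANT,
]

print(worst(incidents).name)
print(count_at_or_above(incidents, OutageSeverity.SIGNIFICANT))
```

Using an ordered enumeration rather than free-text labels means that a year's incidents can be filtered or trended by business impact, which is the kind of analysis the rating is intended to support.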