Using analytics to tackle outage risks


Nlyte Software’s Enzo Greco speaks to Louise Frampton about the role of artificial intelligence in the wake of an Uptime Institute survey which found that many data centres are struggling to get to grips with increasing complexity 

Data centres are becoming “too big and too complex” to manage without data centre infrastructure management (DCIM) software, and the risks are far too great, according to Uptime Institute Research vice-president Rhonda Ascierto. Visibility and control in the form of data and analytics is critical, in her view. 

As the latest survey from Uptime Institute shows, many data centres are struggling to get to grips with the resilience of their operations, and the rate and severity of outages are disturbingly high. Analytics can help reduce some of these risks. 

The Uptime Institute’s survey highlights that a ‘small majority’ of data centres now have some type of DCIM and that, ‘contrary to widespread industry reports’, implementations have typically been successful. As more workloads move to the Cloud, the need for analytics and visibility is becoming even greater. Many workloads span multiple locations – onsite, off-prem or in a colo. 

“This creates challenges on a number of levels – workloads are harder to predict, they have unpredictable swings in power use, and failures can have greater knock-on effects. In order to support this, you need real-time operational management and intelligence reporting,” comments Ascierto.

Redefining ‘failure’ 

Commenting on the Uptime Institute’s latest survey findings, Nlyte Software’s chief strategy officer, Enzo Greco, says that the Institute has redefined ‘failure’ – rather than simply being an enterprise-wide, catastrophic event, ‘failure’ now refers to any incident that results in a “degradation of service”. 

“Many things can contribute to a degradation of service. It can be a power event, a thermal event, a security anomaly – or any number of issues… there are many variables – failure of the IT systems, failure of the facility and there is user failure, which is still the most prevalent,” he comments.

Greco agrees with Ascierto that analytics have a key role to play in addressing risk, and this area is now receiving a great deal of attention in the market. He points out that data centres generate a tremendous amount of data, including data generated by servers, operational data created by applications, and a huge amount of data generated by the facility – from temperature and humidity to power quality. 

“In the past, we would look at a very small subset of data inputs, such as the temperature. Today, we have multivariate analytics… ultimately, we can include these many different inputs to optimise outcomes,” says Greco. 

He adds that it is important to integrate the different systems within the data centre – including the building management system (BMS), data centre infrastructure management (DCIM) system and IT service management (ITSM) system. 

“It is not uncommon for one of these to find a fault but not share it with the other systems. What constitutes state-of-the art is to integrate these different systems together so that we can have end-to-end visibility, but also end-to-end management of problems that may occur,” says Greco.
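To make the idea of cross-system fault sharing concrete, the sketch below shows one possible shape of such an integration: a fault raised by any one system (here, the BMS) is fanned out to the DCIM and ITSM layers so that no system holds the fault in isolation. The class and handler names are purely illustrative assumptions, not a real product API.

```python
# Hypothetical sketch of BMS/DCIM/ITSM fault sharing; names are illustrative.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Fault:
    source: str       # e.g. "BMS"
    asset: str        # e.g. "CRAC-03"
    description: str  # e.g. "supply air temperature out of range"

class FaultBus:
    """Fan a fault raised by any one system out to every subscribed system."""
    def __init__(self):
        self.subscribers: List[Callable[[Fault], None]] = []

    def subscribe(self, handler: Callable[[Fault], None]):
        self.subscribers.append(handler)

    def publish(self, fault: Fault):
        for handler in self.subscribers:
            handler(fault)

bus = FaultBus()
bus.subscribe(lambda f: print(f"DCIM: update capacity model for {f.asset}"))
bus.subscribe(lambda f: print(f"ITSM: open incident - {f.description}"))
bus.publish(Fault("BMS", "CRAC-03", "supply air temperature out of range"))
```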

Nlyte recently added AI-driven data centre management as a service (DMaaS) to its real-time DCIM software, in the form of Nlyte Machine Learning, built with IBM Watson Internet of Things (IoT). 

This cognitive DCIM solution simplifies DCIM adoption and workload optimisation, and helps prevent data centre power and performance issues. 

DMaaS aggregates and analyses large sets of anonymised DCIM data, which is enhanced with machine learning to spot anomalies and patterns, optimise operations, and predict and forecast behaviour.
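As a rough illustration of the kind of anomaly spotting described here, the minimal sketch below flags sensor readings that deviate sharply from their recent history using a rolling z-score. The field names, window size and threshold are assumptions for illustration only, not the method used by Nlyte or IBM Watson IoT.

```python
# Illustrative anomaly detection on aggregated telemetry (rolling z-score).
from statistics import mean, stdev

def flag_anomalies(readings, window=48, z_threshold=3.0):
    """Flag readings that deviate sharply from the trailing window."""
    anomalies = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(readings[i] - mu) / sigma > z_threshold:
            anomalies.append((i, readings[i]))
    return anomalies

# Example: rack inlet temperatures (°C) sampled every 30 minutes,
# with a sudden spike at the end.
temps = [22.1, 22.3, 22.0] * 20 + [27.5]
print(flag_anomalies(temps, window=48))
```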

Greco points out that data centres, until now, have tended to be reactive – taking corrective action to address a problem after it arises. The role of analytics is to change this reactive model into a proactive one. “The ability to predict events has always been the holy grail of the data centre and this software allows them to do just that,” he comments.

Efficient cooling management

Data centres are also facing increasing scrutiny over the power they consume, but the growing focus on energy management is adding to the complexity of operations. 

“Traditionally, you would have room-level cooling in the data centre. Today, there is a move to precision cooling, where it is possible to cool down to an individual rack. This goes a long way towards better efficiency. The technology allows us to do this, but there are so many control points, it cannot be performed manually… you need to have AI to manage this complexity,” continues Greco. 

He explains that if a data centre operator can predict that a hotspot will develop, they can target cooling pre-emptively at that area before problems occur.

At the same time, if the BMS, DCIM and IT systems are integrated, operators can establish that an entire part of the data centre will see very little application activity in the future. They may then not need to cool that area in the same way, and can instead raise the temperature in that location.
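One way to picture this combined decision is sketched below: a short-horizon forecast of per-rack load drives a setpoint plan that pre-cools predicted hotspots and relaxes cooling where activity is forecast to be low. The thresholds, setpoints and forecast figures are hypothetical placeholders, not values from any vendor.

```python
# Illustrative cooling plan driven by forecast per-rack IT load.
DEFAULT_SETPOINT_C = 24.0

def plan_cooling(forecast_load_kw, hot_threshold_kw=8.0, idle_threshold_kw=1.0):
    """Return a per-rack cooling setpoint plan from forecast IT load."""
    plan = {}
    for rack, load in forecast_load_kw.items():
        if load >= hot_threshold_kw:
            plan[rack] = DEFAULT_SETPOINT_C - 2.0   # pre-cool predicted hotspot
        elif load <= idle_threshold_kw:
            plan[rack] = DEFAULT_SETPOINT_C + 3.0   # relax cooling where activity is low
        else:
            plan[rack] = DEFAULT_SETPOINT_C
    return plan

# Example: forecast load (kW) per rack for the next hour.
print(plan_cooling({"rack-A1": 9.2, "rack-B4": 0.4, "rack-C7": 5.0}))
```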

Greco says that the next efficiency gains are not going to come from the cooling units themselves, which are already very efficient, but in how the units are operated. 

However, he also believes that there is “no better way to increase efficiency than to shut down idle servers”. 

Visibility of server usage can be integrated with information from the DCIM system on how the data centre is operating, and this can then coordinate with the BMS, which controls the temperature within the data centre. This integration of technologies is critical to efficient operation.
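A minimal sketch of that integration is shown below: server utilisation data is joined with DCIM asset records to identify idle machines that are candidates for shutdown. The data structures and the 5% utilisation threshold are assumptions for illustration, not a specific product interface.

```python
# Illustrative join of server utilisation with DCIM asset records.
def idle_server_candidates(utilisation, dcim_assets, max_cpu_pct=5.0):
    """Return servers below the utilisation threshold, with rack location and power draw."""
    candidates = []
    for server, cpu_pct in utilisation.items():
        asset = dcim_assets.get(server)
        if asset and cpu_pct < max_cpu_pct:
            candidates.append({"server": server,
                               "rack": asset["rack"],
                               "power_w": asset["power_w"]})
    return candidates

utilisation = {"srv-101": 2.1, "srv-102": 64.0, "srv-103": 0.8}
dcim_assets = {"srv-101": {"rack": "A1", "power_w": 180},
               "srv-102": {"rack": "A1", "power_w": 310},
               "srv-103": {"rack": "B4", "power_w": 150}}
print(idle_server_candidates(utilisation, dcim_assets))
```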

‘Dark’ facilities

With the increasing proliferation of unmanned edge data centres, smarter, more integrated control systems will become ever more important. 

“You will need localised intelligence; if something fails, you won’t have the luxury of being able to walk through the facility. These ‘dark’ facilities are part of a much larger fabric. 

“When there was just one data centre, it was simple – people knew what to control. Increasingly, there may be a set of centralised data centres, surrounded by any number of Edge centres, and these need to be managed as a cohesive enterprise. 

“These Edge data centres are an extension of one logical computing capacity, but geographically dispersed, and this topology is far more difficult to manage. It is calling out for artificial intelligence.”

DMaaS

While Greco points out that the ability to interpret data across the data centre – and the Edge – is vital, there is an opportunity to take this knowledge even further.

“Data centre management as a service enables you to access not just the data from your own facility, from your own environment and equipment, but also to gain insight from similar environments, to predict failure. For example, UPS systems are the last defence against failure, so there is a lot of monitoring and control of these assets. While the data centre benefits from receiving data inputs from their own UPS systems, in the marketplace there are thousands of these installed. The promise of DMaaS is to share information from all of these installations, which provides far better insight,” he comments.
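To illustrate the fleet-level comparison this makes possible, the sketch below benchmarks one UPS battery metric against anonymised readings from many installations and raises an alert when it drifts well above the fleet norm. The metric name, values and alert threshold are hypothetical, used only to show the idea.

```python
# Illustrative fleet-wide benchmarking of a UPS battery health metric.
from statistics import mean, stdev

def ups_fleet_alert(local_internal_resistance, fleet_readings, z_alert=2.0):
    """Alert when a UPS battery string drifts well above the fleet norm."""
    mu, sigma = mean(fleet_readings), stdev(fleet_readings)
    z = (local_internal_resistance - mu) / sigma if sigma else 0.0
    return {"z_score": round(z, 2), "alert": z > z_alert}

# Example: internal resistance (milliohms) reported across many UPS units.
fleet = [4.8, 5.0, 5.1, 4.9, 5.2, 5.0, 4.7, 5.3, 5.1, 4.9]
print(ups_fleet_alert(6.4, fleet))
```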

Ultimately, analytics will enable data centre personnel to optimise their operations and identify potential vulnerabilities, taking corrective action before harmful issues occur. This could help reduce the outages reported in recent years, while also reducing the environmental impact of data centres.

DCIM now ‘mainstream’

Historically, DCIM has been a controversial and under-deployed technology. However, according to the Uptime Institute’s latest survey, DCIM has reached the point of being a ‘mainstream data centre technology’. More than half of survey respondents (54%) said they had purchased commercial DCIM software, with an additional 11% having deployed homegrown DCIM. Highlighting the maturity of the technology, 75% of the users said their deployment was successful, and nearly half (47%) are supplementing their implementation with more DCIM tools. The most common motivations for deploying DCIM were capacity planning (76%) and power monitoring (74%). Other reasons ranged from giving executives and customers (of multi-tenant data centres) visibility (52%) to compliance (35%).
