The Uptime Institute’s recent global survey of more than 1,000 data centre executives shows that minimising the risk of power outages is still a high priority for data centre managers. But what are the main factors that need to be tackled to avoid such disasters? According to the Uptime Institute, the causes are much more complex than simply ‘human error’, and all too often the issues go much deeper within an organisation.
“There have been some high-profile outages in older facilities and I would question the original commissioning of these sites,” said Phil Collerton, managing director of the Uptime Institute in Europe, Middle East and Africa.
“Some of the data centres that have had problems in the UK were commissioned late 90s or early 2000s, when there was starting to be a squeeze on spending, so I would question how rigorous the commissioning was. Usually, when building a data centre, you are running to a tight time-scale and you have a date for going live. As the finish date gets closer, there may be delays, so you end up squeezing the three-week commissioning and integrated systems testing.
“This is one area that people should never cut back on. There are so many things that happen five or 10 years later that can be traced back to incomplete, poor and badly recorded commissioning.”
Another problem area for some older facilities is maintenance. Although the manufacturers of data centre infrastructure stipulate maintenance schedules, in times of hardship it can be tempting to cut back and defer the work.
“This can come back to bite you,” warned Collerton. “If we find a lot of planned maintenance has been cancelled, when we are certifying a data centre, this always raises a red flag… These patterns lead to bad practice and failures down the road.”
Data centre failures are often attributed to human error, meaning that an operator ‘pressed the wrong button’ or perhaps didn’t react properly. But Collerton points out that the operator’s action comes at the end of a whole chain of events, starting at the very top with management decisions on budgets, staffing, training, maintenance and a whole host of other factors, that culminates in something going wrong.
“It is often the operator that is blamed, but it is not necessarily their fault,” said Collerton.
He pointed out that, at newer sites, outages are often due to cutting corners during the building of the data centre: “The design may be fine when it is handed over to the construction company but when it is being implemented they may see ways of saving money or they may add in ‘value engineering’.
“Suddenly, it is not the same as the original design. You may find you are trying to fix something when the documentation is wrong. When I used to build data centres, I would have someone walk around the facility every day to check what has been done.
“Often you would find things that have been ‘botched’ or changed. Data centres need to be vigilant during construction, commission adequately, and, above all else, maintain the site properly. It is a false economy to cut corners,” Collerton concluded.