Shared learning from data centre incidents could help reduce risk across the sector, but can the industry overcome a culture of secrecy? Louise Frampton reports
In recent years, there has been a drive to adopt a culture of learning from mistakes across a variety of mission critical industries. In healthcare, for example, staff are required to report safety incidents and alerts are subsequently shared across the sector, to warn of identified risks. The Mid Staffordshire scandal has prompted a major review of the way mistakes are dealt with and there is now a concerted effort to promote greater candour within the sector.
The airline, nuclear and oil and gas industries also have a long history of championing this approach, and have successfully developed a culture of openness and reporting of errors, with a view to reducing risk. The key to this has been a change of mindset; the culture has shifted from one of blame to one of shared learning – so could the data centre sector learn from their example?
Simon Allen, from the UK Data Centre Interest Group, believes that sharing information on data centre incidents could help to drive improvement across the sector and prevent costly outages: “The airline industry has an enviable record of continuously improving flight safety by industry-wide sharing of accident and potential accident information. However, the same is not the case in the data centre industry where it is common practice to cover up failures or potential disasters in a misguided attempt to protect reputations.”
He points out that root cause investigation findings are normally secret and bound by non-disclosure agreements, putting the data centre industry at a disadvantage.
“Learning from mistakes is an inherent and essential human ability – and denying the data centre industry from this single, most important development channel, is simply absurd,” argues Allen.
Incident reporting
The Data Centre Incident Reporting Network (DCIRN) has been set up to tackle this issue and aims to:
• Increase awareness of data centre failure modes
• Share lessons learned from data centre failures
• Increase data centre uptime
• Reduce data centre failures
The brainchild of Ed Ansett, an industry leader in data centre reliability and risk analysis (and chairman of i3 Solutions), the Data Centre Incident Reporting Network is being championed by the UK Data Centre Interest Group, a not-for-profit organisation.
Following a presentation by Ansett in 2015, the idea for data centre incident reporting began to gain momentum. “Don Carless, a senior technical facilities engineer from Transport for London, was in the audience and bought into the idea. He got in touch with me, recognising the positive impact sharing information on data centre incidents would have on the industry and suggested that the UK Data Centre Interest Group should get behind this. Ed Ansett was invited to speak at a UK Data Centre Interest Group event and convinced me that being more open about our problems is an industry imperative,” Allen explains.
“Although I hear a few people in the data centre industry complain about the challenges of changing a long-standing culture of secrecy and sweeping things under the carpet, the overwhelming majority of colleagues tell me they would happily volunteer information – particularly the multi-tenant data centres (MTDCs) – as long as it cannot be attributed to them,” he comments.
The fundamental principle of DCIRN is that incident reports will be ‘dis-identified’, so that the person, company or data centre associated with the incident remains anonymous. In fact, DCIRN is modelled on an initiative called ‘CHIRP’ set up many years ago in the aviation industry to share information on incidents (and near misses). This has delivered, and continues to deliver, significant improvements in safety.
“We are getting some good wind behind us,” comments Allen. “John Lane, one of the most respected people in the industry, recognised for unimpeachable ethics, has agreed to be the chief executive on the DCIRN secretariat. The DCIRN secretariat will ensure that any reports cannot be associated to a person, company, data centre, or even manufacturer.
“We are not about pointing the finger; we are about making data centres safer and more reliable. We are also very lucky to have Mike Lonkhurst on the Secretariat – a former airline pilot who has seen ‘first hand’ the essential contribution CHIRP has made to his industry. His experience will be invaluable. Maria Morse also sits in the DCIRN secretariat as membership secretary and is well known and respected in the data centre world.”
Once reports are ‘dis-identified’ they will be passed to a member of the advisory council who will validate the report. (The incident report template was devised by respected industry expert, Professor Ian Bitterlin.) Only then will the anonymous incident report be made public – free of charge.
Could NDAs prevent reporting?
The elephant in the room is NDAs, acknowledges Allen; people think they cannot share information because it will infringe the NDA they signed.
However, Eversheds Sutherland and other prominent and respected legal authorities working in the data centre industry have given their opinion: “If the information is anonymous and the underlying parties / data centre in question truly cannot be identified (and without a risk that a third party could ‘put two and two together’ and work out which data centre or parties it concerns), this would not have the necessary quality of confidence to be ‘Confidential Information’ under the terms of a typical NDA.”
“It is not the intention of DCIRN to ‘name and shame’… The guiding principle is that, if you are not sure or nervous, then don’t report – it is as simple as that,” says Allen.
The DCIRN website (dcirn.org) is now live and will be providing more advice and information on the NDA issue in the coming months. Membership is currently free of charge and all members will receive failure bulletins on a quarterly basis. In the future, it may be necessary to charge a small membership fee to cover costs, although Allen stresses that it will be kept low enough for all to afford.
“Data centres now support every aspect of the digital economy and, as we become more reliant on them, it is only a matter of time before a data centre failure will be associated with human fatalities. We need to act now – there is no reason why this archaic secrecy should prevail,” Allen argues.
He warns that if DCIRN does not succeed, governments will have to step in when fatalities begin to be associated with failures: “Should this happen, a whole new industry of data centre reporting auditors is likely to be created with the sole purpose of enforcing new stringent Government-issued reporting guidelines,” says Allen, adding that this would cost the data centre industry dearly. “If the data centre industry can get behind DCIRN now we could avoid this – at no cost to the industry.”
Allen, Ansett and mission critical facilities expert Peter Gross are putting their passion and commitment behind the project in a bid to ensure the sector catches up with other mission critical industries: “We’re finding time outside our day jobs and stealing time from our families to do this and funding everything ourselves. We have taken a leap of faith that we are doing the right thing, for the right reasons. Once DCIRN is established, we will leave it to the Secretariat to manage and operate. But we hope the data centre industry will get behind us. When critical mass is achieved, MTDCs that do not volunteer information could be asked a simple question by potential customers: ‘Why not?’” he concludes.