PUE: a flawed and inadequate metric or practically perfect?


PUE has its critics but it is neither flawed nor inadequate. In fact, for what it is intended to represent, it is nearly perfect, and it would be perfect were it not for a combination of a lack of understanding, deliberate abuse by marketing folks and the ease with which it can be manipulated if the data centre operator so wishes, says Ian Bitterlin.

We should all know by now (after 10 years) that PUE stands for power usage effectiveness, even though many people still slip into using the word efficiency in place of effectiveness, since all data centres are zero ‘efficient’ unless all you want to do is create heat.

As with all good stories, let us start at the beginning and, if you are sceptical by nature, PUE had a slightly ‘agenda’-based start in life. It was devised by members of The Green Grid (TGG) for global consumption and the concept could not have been simpler: the ratio of total data centre annual energy to the net ICT annual energy. Clearly the closer to 1.0 the better, although, at the time, the average was nearer 2.5.

In other words, if you have an ICT load that consumes 10MWh over a full 12 months and, to support that load, the facility consumes 25MWh over the same period, then the PUE is 25/10 = 2.5. Simple! The ‘annualisation’ took care of seasonal changes in cooling energy and averaged out the load, while the only real stipulation was that PUE was a metric that should be used to chart the improvement over time in an individual facility and never be used to compare facilities.
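The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name and the monthly meter readings are hypothetical, chosen so that they sum to the 10MWh and 25MWh of the example, with the facility readings given an assumed summer cooling peak.

```python
def annualised_pue(facility_kwh, ict_kwh):
    """Return PUE as total facility energy over ICT energy for one full year.

    Both arguments are sequences of twelve monthly energy readings in kWh.
    """
    if len(facility_kwh) != 12 or len(ict_kwh) != 12:
        raise ValueError("expected twelve monthly readings for a full year")
    total_ict = sum(ict_kwh)
    if total_ict <= 0:
        raise ValueError("ICT energy must be positive")
    return sum(facility_kwh) / total_ict


# Hypothetical monthly readings (kWh), January to December.
# ICT sums to 10,000 kWh (10MWh); facility sums to 25,000 kWh (25MWh).
ICT_KWH = [800, 800, 800, 800, 850, 850, 850, 850, 850, 850, 850, 850]
FACILITY_KWH = [1900, 1900, 1950, 2000, 2100, 2250,
                2350, 2350, 2200, 2050, 2000, 1950]

annual = annualised_pue(FACILITY_KWH, ICT_KWH)

# Month-by-month ratios, shown only to illustrate the seasonal swing
# that the annual figure averages out.
monthly = [f / i for f, i in zip(FACILITY_KWH, ICT_KWH)]

print(f"annualised PUE: {annual:.2f}")  # 2.50
print(f"best month: {min(monthly):.2f}, worst month: {max(monthly):.2f}")
```

With these assumed readings the monthly ratio swings from roughly 2.4 in winter to nearly 2.8 in summer, which is why only the full-year energy ratio counts as the PUE.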

Key issues

In that very simple definition four things are immediately obvious:

1. It should have been called EUE as it should be an ‘E’ (for kWh energy) metric, not a ‘P’ (for kW power) metric, but it is too late to change it now. So, despite being a glaring engineering error, we won’t mention it again.

2. It says precisely zero about the ICT load – the ‘one’ in the ‘one point something’ – that is considered as sacrosanct.

3. The user should be honest (with him/herself) and include all the overhead energy consumption, including the offices, plantroom small power, security, external lighting and embedded energy in other resources such as diesel fuel-oil and even potable water that is evaporated or discharged into the drains.

4. How many times have you read an article that says that a new data centre will have a PUE of 1.30, for example? How can that be if it hasn’t yet been run for a year and will almost certainly fill up with load slowly?

PUE agenda?

So, why did I say that PUE had an ‘agenda’ in the beginning? Well, this is a very personal view, although having aired it over the years, I haven’t had anyone disagree with me: PUE describes the infrastructure effectiveness and took the world’s attention away from the very thing that the ICT industry didn’t want exposed – the very poor server power supply efficiency at low load and the very low average server utilisation that meant that most servers idled most of the time at a relatively high power. That explained the long-reported condition of data centre power being steady, despite how the business usage should have affected it.

Now, apart from the utilisation, things have improved dramatically and PUE remains a valid metric that is valuable, easy to use and describes only the annualised facility energy overhead.

But let’s have a think about Point 2 above… Modern servers are idle (doing no useful IT work) at anywhere between 20% and 80% power draw (with a 2017 average of around 35%) and the average utilisation (if you exclude homogeneous loads like search engines and HPC clusters) around the globe is near to 10%. In other words, most servers are idle most of the time and consume an average of 35% of their ‘pedal to the metal’ power, although the ‘worst’ performers idle at nearer 80%. A facility with 60% utilisation and PUE of 2 is a lot more ‘energy effective’ than a facility with 10% utilisation and PUE of 1.1.

However, I do not regard that as a failure in PUE as it was never intended to be a measure of goodness of the data centre – only a measure of the ‘overhead’ power and cooling losses, lighting and controls etc.
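To put rough numbers on the comparison between the 60% utilised facility at PUE 2 and the 10% utilised facility at PUE 1.1, here is a minimal sketch. It rests on an assumption that is not part of the PUE definition: that server power rises linearly from the idle draw to full power as utilisation rises, and that useful work is proportional to utilisation. The function name is hypothetical and the 35% idle figure is the 2017 average quoted above.

```python
def facility_energy_per_unit_work(pue, utilisation, idle_fraction=0.35):
    """Relative facility energy spent per unit of useful IT work.

    Assumes a linear server power model (an illustrative simplification):
    power = idle_fraction + (1 - idle_fraction) * utilisation,
    expressed as a fraction of full-load server power.
    """
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in the range (0, 1]")
    if not 0 <= idle_fraction <= 1:
        raise ValueError("idle_fraction must be in the range [0, 1]")
    server_power = idle_fraction + (1 - idle_fraction) * utilisation
    # PUE scales server energy up to facility energy; dividing by
    # utilisation expresses it per unit of useful work done.
    return pue * server_power / utilisation


busy_but_poor_pue = facility_energy_per_unit_work(pue=2.0, utilisation=0.60)
idle_but_good_pue = facility_energy_per_unit_work(pue=1.1, utilisation=0.10)

print(f"PUE 2.0 at 60% utilisation: {busy_but_poor_pue:.2f}")  # about 2.47
print(f"PUE 1.1 at 10% utilisation: {idle_but_good_pue:.2f}")  # about 4.57
```

Under these assumptions the facility with the ‘worse’ PUE uses a little over half the energy per unit of useful work. A different power model would change the figures, but the point stands that PUE says nothing about what happens inside the ICT load.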

On point 3, some users have made up their own rules about what to include (or not) when doing the PUE calculation. In fact, a lot of people still say that ‘PUE isn’t well defined’. That may have been true in 2007, but once version two was published by TGG, all the holes had been plugged. Since then, PUE has been standardised in ISO/IEC 30134-2 and no one should be in any doubt. To be a little critical of the ISO process for a moment, the resulting document is probably not as ‘perfect’ as V2 of TGG’s original document as it doesn’t include a clear definition of ‘partial PUE’ (pPUE, useful for sub-system contribution to the overall PUE) and ‘instantaneous PUE’ (PUE0, useful for describing the peak kW facility power). Having said that, no one ‘has’ to follow a standard (unless it is health and safety related or embodied in local legislation) and, on the condition that you consistently apply the same rules every year, your PUE improvement plan will be well founded.

Of course, if you are reporting your PUE as some part of an energy saving scheme, such as an EU CoC Participant, then a set of common rules is a good thing. Having said all that, there are many examples of ‘PUE abuse’, which brings us onto the fourth point. Some time ago I coined the phrase ‘PUE abuse’ and here are several common examples:

• PUE is an annualised energy ratio so how can any marketing department claim that their new data centre is, for example, 1.3? Only after running for a full seasonal year can you report your PUE

• The press release that stated (on a cold January day in Amsterdam) that their data centre had achieved a PUE of 1.09 for ‘a whole 24-hour period’

• The (albeit tiny) cheat that a social-networking site uses when applying LED lighting with power-over-ethernet. It certainly saves energy and copper but conveniently allocates the lighting load to the ICT and not the overhead

• Try not to behave badly just so that you can make great leaps in PUE reduction when called upon to do so. I did visit a very large (>15MW) facility in Germany without any blanking plates in the cabinets and lots of bypass air – saving up an improvement if and when EU regulations came along

• The claim that someone had achieved a PUE of less than ‘one’. This still happens from time to time but less frequently. The ‘trick’ is always the same – some form of onsite generation or consumption of gas instead of electricity that has been netted off the facility total. One of the funniest was the Californian desert facility that said it had a PUE of ‘0’. In reality it was a 5kW ICT load fed entirely by a solar panel array and battery combination with no grid connection
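A small sketch shows how three of the tricks above move the number without saving a single kWh. All the energy figures are hypothetical annual totals picked for illustration, and the variable names are not taken from any standard.

```python
def pue(total_kwh, ict_kwh):
    """Ratio of the energy counted as 'total facility' to that counted as 'ICT'."""
    return total_kwh / ict_kwh


# Hypothetical annual figures in kWh, for illustration only.
ICT_KWH = 1000.0
OVERHEAD_KWH = 300.0   # cooling, power losses, lighting, offices and so on
LIGHTING_KWH = 20.0    # the part of the overhead that is lighting
ONSITE_KWH = 400.0     # energy supplied by onsite generation

# Honest accounting: everything the facility consumes over the ICT energy.
honest = pue(ICT_KWH + OVERHEAD_KWH, ICT_KWH)

# Power-over-ethernet lighting: the total is unchanged, but the lighting
# is now metered as ICT load, so the denominator grows.
poe_lighting = pue(ICT_KWH + OVERHEAD_KWH, ICT_KWH + LIGHTING_KWH)

# Onsite generation netted off the facility total: the ratio drops below one.
netted_off = pue(ICT_KWH + OVERHEAD_KWH - ONSITE_KWH, ICT_KWH)

# No grid connection at all, counting only grid energy: a 'PUE' of zero.
off_grid = pue(0.0, ICT_KWH)

print(f"honest:       {honest:.2f}")        # 1.30
print(f"PoE lighting: {poe_lighting:.2f}")  # about 1.27
print(f"netted off:   {netted_off:.2f}")    # 0.90
print(f"off grid:     {off_grid:.2f}")      # 0.00
```

In every case the facility consumes exactly the same energy. Only the bookkeeping changes.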

All things considered, PUE is simple and useful. You should pick a set of rules, be that TGG V2 (free to download) or ISO/IEC 30134-2 (must be paid for), and then stick to it. Do not tell anyone else what your PUE is unless it is to get you energy tax relief. Use your PUE to track your improvements and remember that, regardless of the PUE number, the aim is to reduce power consumption – so get rid of those comatose servers!
