Pick up almost any publication about Green IT: any magazine article, web article, white paper, or blog (including this one), and you will learn that one of the easiest ways to make your data center a bit more green is to isolate your hot air from your cold air. Even a magazine like "Processor" (www.processor.com), which you would think would have more to do with... well... processors... has an entire section devoted to Green IT in its January 14th issue. Green IT is hot (we're coming for you, Cloud!). And sure enough, in an article titled "10 Things You Can Do Right Now To Be More Green", the number one and two items have to do with not mixing your hot and cold air.
If you are the Green IT person wherever you are, then you might think this advice, and the fact that it is *everywhere*, is obvious and maybe even oversubscribed. As the Green IT person here at BMC, though, I have found that there is actually nothing obvious about it. In some ways, it goes against natural tendency. If you keep a room cool, that is good, and putting in big fans and moving air around is also good, at least at some lizard-brain level. That's how we stay cool as people, after all. Never mind that computers don't sweat... hopefully.
I recently had the opportunity to put the "don't mix your hot/cold air" concept to a test. People thought there was a mad scientist loose in the building for a bit, and perhaps there was. It was informative and educational for some who thought I was being a bit pedantic for insisting that hot and cold air be kept in their own spaces.
The key efficiency one is after with isolation is this: HVAC prefers hot air on the intake side. When you can drive a temperature differential of about 30 degrees Fahrenheit between the hot and cold aisles, the HVAC can run as much as 50% more efficiently. That maps to less power for cooling, more cooling capacity held in reserve against the failure of some HVAC component, or more equipment served by the same amount of cooling. Any way you slice it, you are saving power, CO2, and money.
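To make the arithmetic concrete, here is a back-of-the-envelope sketch. The numbers and the linear scaling are illustrative assumptions on my part (real HVAC performance curves are messier), but they show why a bigger differential translates into effective capacity:

```python
# Back-of-the-envelope sketch of the efficiency claim above.
# Assumption (illustrative only): the efficiency gain scales linearly
# with the hot/cold aisle differential, topping out at a 50% gain
# once the differential reaches 30 degrees F.

def efficiency_gain(delta_t_f, max_gain=0.50, max_delta_t_f=30.0):
    """Fractional cooling-efficiency gain at a given hot/cold differential."""
    return max_gain * min(delta_t_f, max_delta_t_f) / max_delta_t_f

def effective_capacity(nominal_tons, delta_t_f):
    """Nominal tonnage adjusted for the isolation-driven efficiency gain."""
    return nominal_tons * (1.0 + efficiency_gain(delta_t_f))

# A 60-ton plant with well-mixed air (say a 5 F differential)
# versus the same plant with full 30 F isolation:
print(effective_capacity(60, 5))    # well-mixed: roughly 65 tons
print(effective_capacity(60, 30))   # isolated: 90 tons
```

Under those assumed numbers, the same 60 tons of iron behaves like 90 tons once the air is isolated, which is exactly the headroom argument made above.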
The learning opportunity came, as they so often do, in the middle of a failure.
Setting the Scene
One of my R&D data centers is fairly small: 60 tons of HVAC, split between two 20-ton units and two 10-ton units. Over time this DC has been refit twice, with HVAC added and airflow re-arranged each time. Originally it was three smaller data centers physically right next to each other, but over time they were made into one larger data center.
All that history means the HVAC units were installed at different times and are therefore of different ages. The failure was twofold: the same 10-ton unit failed twice, about three weeks apart, for two different reasons.
The history of this DC also means that it was not really arranged for optimal hot/cold isolation. Racks were in rows, but in some areas the hot backs of racks sprayed directly into cold aisles. There were no rack blanking plates. There were holes in the rows where racks used to be before being pulled out of service.
It stayed cold because there was enough HVAC in the room, but everyone kept telling everyone else that no more gear could go in, or the room would overheat. Even professional HVAC people kept telling us that the room was running at capacity, though since they had no data about the actual installed gear, that judgment was based solely on walking around and feeling the heat of the room.
Even a professional's intuition can be wrong. My measurements of the power in the room, and my study of the gear in the CMDB, told me that we had *theoretical* capacity left in there, even though when I walked around the room my body told me it was getting too warm.
The First Failure
The first failure of the 10-ton unit was a fan coil: it would not hold refrigerant any more. There was nothing to do but replace it, and that would take over a day, as the part had to be ordered. In the meantime the room overheated, with the cold aisle running well past 96 degrees Fahrenheit, and that was with 4 or 5 tons of supplemental HVAC (portable cooling units) brought in and arrayed in a sort of surreal, Robbie-the-Robot sculpture garden, their fat white arms all pointed at the servers.
This event provided the inspiration for some changes to the room, and we ordered blanking plates for all the racks. That alone would be a huge improvement, but I was curious how far it could be taken. How far could I drive the temperature difference? Could I get to a 30-degree difference in the room? To test that, we used tall cardboard to block and redirect hot air into the returns rather than into the cold aisle. A sheet of plastic went over the empty rack slots. We even put cardboard slats on the tops of the racks to continue the hot-air chimney up closer to the ceiling. Hot air does indeed rise, but it also likes to spread.
It was, in a word, ugly. Anyone who cared more about form than function would have been driven screaming from the room. Some interpreted the presence of all that new "air management" as a sign of a new, unknown-to-them problem. One person used the cardboard as a canvas to express latent artistic talents. Some looked, shook their heads, and sighed.
"The Experiment" was a success. It did in fact drive a 27 degree F difference from the hottest place in the hot aisle to the coldest place in a cold aisle. 76F to 103F (measured with a digital thermometer, and left to stabilize in each location for at least 5 minutes) It was not uniform: The average difference was maybe 20 degrees. That is still pretty good for an impromptu science project, done with parts found laying around the data center.
Failure, Part Two
I was getting ready to take the beast apart. The point had been made, and the data collected.
Then the 10-ton unit failed again. It was unexpected. It was hard to believe: it had just been fixed, hadn't it? How many parts are there to fail in that thing?
This time it was the fan. This time, because all the blanking plates and other isolation were still in place, the cold aisle spiked to only 87 degrees F, and that with only 1 ton of supplemental air. All the other "Robbies" had been moved to other exhibits.
The increased efficiency of the remaining 50 tons of HVAC was not enough to completely offset the failed unit, but it was enough to keep the room from getting nearly as hot. The air isolation was far from perfect, as is easily seen in the picture: it was made with cardboard! It leaked like a sieve. It was just far better than it had been before.
Side note: the first failure led to 12 failed disks in various servers across the data center; that is to say, 12 *more* disks than would have failed in that same time frame anyway. Disks fail all the time, but not at the observed "hot" rate. The heat increase had claimed its victims. The second HVAC failure caused no additional, above-normal disk failures.
Clearly, part of the lower failure rate the second time around was that the marginal disks had just been replaced, so a spike in failures was less likely. Be that as it may, I am fairly sure part of it was that the servers and their disks stayed cooler.
Back to Normal... Mostly
The cardboard is down: it would not do to block a sprinkler head. The lessons are learned and the results are tabulated. Two conversations afterwards in particular showed the value of the air management experiment.
One was with an Enterprise Architect, the other with a former data center manager. This sums it up; they said, in essence: "I thought the room was out of HVAC, and that we could not put anything else in there. Now I see. Now I feel it when I walk around the room. The cold aisle is colder... and boy, is that hot aisle hot!" People walk around now, looking at the ceiling and saying, "Hey! I bet if we moved that return we'd get more isolation!"
This experiment proved that the Green IT math works. The greener data center designs we do in the future will take these lessons into account. As a side effect, it underlined for me that isolating airflow has consequences for how the fire suppression system needs to be designed.
And because the experiment was still in flight when the second failure happened, it probably saved us some server repairs, and therefore outages, along the way. The law of unintended consequences is not always against you.
The experience validated another assumption of mine, one that was the subject of my last post: the higher temperatures in the ASHRAE standards only work if your gear was built *after* those standards came into being. Our gear spans a whole range of models and types, required because we support such an array of platforms. We'll have to keep that cold aisle a bit colder for a while longer yet. That makes it even more important, from a Green IT point of view, not to mix the hot and the cold air. It is important both to support our customers and to run our DCs as good citizens of the planet.