July 2, 2010
Vol. 32 Issue 14
Page(s) 9 in print issue
In Case Of Emergency
Take Steps Now To Prevent A Power & Cooling Disaster Later
Many enterprises make significant investments to prepare for data center power and cooling emergencies. However, even after implementing backup infrastructures and creating emergency procedures for staffers to follow, many data center managers learn the hard way that they are not as well prepared as they had thought. Instead of servers going offline for a few minutes, entire data center operations can shut down for days at a time when the unexpected happens and proper planning is lacking. Here are some ways to help make sure that your data center’s operations can handle just about any power or cooling emergency that Murphy’s Law can throw at them.
• Make sure that the backup system can maintain power to the chillers as well as to the servers and that there is enough backup cooling capacity to handle the load.
• Certain maintenance procedures should be scheduled on a regular basis throughout the year. Summer is the best and most crucial season for testing.
• Ensuring that the layout of the backup power and cooling system is identical to the main system will cut down on time wasted looking for controls.
Get The Chillers Right
Your server backup power systems may be as reliable as possible, but not having the right chiller backup power in place in the event of an emergency means temporary server shutdowns at best or, at worst, the meltdown of servers due to overheating.
“It makes no sense to put a generator in to handle data center power if the cooling plant isn’t also backed by a generator, because you’ll have to shut down systems to prevent overheating,” says Nik Simpson, a senior analyst at Burton Group.
Ensuring that there is an adequate water supply for cooling is also crucial, Simpson says. “Many cooling systems need a clean supply of chilled water, which can be provided by refrigeration units at the data center or piped in from the local water utility,” he explains. “Either way, you need a backup plan for a second chilled water source if you want to cover the contingency of a refrigeration failure or [other disaster].” Simpson cites examples of companies that have gone so far as to drill wells to tap underground water sources or to build their data centers close to lakes, rivers, or canals.
The broader goal is to plan ahead so that enough backup cooling capacity is in place for the data center to run on the backup source for as long as it has to, says Eddie Stevenson, CIS marketing supervisor at MovinCool (www.movincool.com). “Some of our distributors have reported that their clients have lost $500,000 in just [one emergency] because they didn’t have the proper cooling in place,” Stevenson says.
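As a rough illustration of what checking backup cooling capacity against the load can look like, the sketch below converts an IT load in kilowatts into the tons of cooling needed to remove that heat. It assumes that essentially all electrical power drawn by the equipment ends up as heat. The function name, the parameters, and the example figures are hypothetical and do not come from any of the vendors quoted here.

```python
# Back-of-the-envelope check, not a sizing tool: a real assessment also
# accounts for lighting, people, UPS losses, and climate.
BTU_PER_HR_PER_KW = 3412.14    # 1 kW of heat is about 3,412 BTU/hr
BTU_PER_HR_PER_TON = 12000.0   # 1 ton of refrigeration = 12,000 BTU/hr


def cooling_check(it_load_kw, backup_cooling_tons):
    """Compare the heat produced by the IT load with backup cooling capacity.

    Assumes all electrical power drawn by the IT equipment becomes heat.
    Both arguments are hypothetical inputs supplied by the reader.
    """
    heat_btu_hr = it_load_kw * BTU_PER_HR_PER_KW
    required_tons = heat_btu_hr / BTU_PER_HR_PER_TON
    shortfall_tons = max(0.0, required_tons - backup_cooling_tons)
    return {"required_tons": required_tons, "shortfall_tons": shortfall_tons}


if __name__ == "__main__":
    # Hypothetical site: 100 kW of IT load, 20 tons of backup cooling.
    result = cooling_check(it_load_kw=100, backup_cooling_tons=20)
    print(f"Required: {result['required_tons']:.1f} tons, "
          f"shortfall: {result['shortfall_tons']:.1f} tons")
```

In this hypothetical case, a 100 kW load needs roughly 28.4 tons of cooling, so 20 tons of backup capacity would leave a gap that someone has to close before an emergency, not during one.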
Regular Maintenance Is Crucial
Cooling and power emergency preparedness requires certain maintenance procedures that should be performed regularly throughout the year.
“If you just have battery-based UPS, then it’s critical to ensure that battery health is maintained,” Simpson says. He adds that an undetected battery failure, or a battery that no longer holds a charge well, can substantially reduce UPS runtime. That makes for a nasty surprise when you need 30 minutes to shut things down gracefully and get only 15. “When you throw a generator into the mix, there’s additional maintenance on such things as transfer switches, starter motors, and the like, while a failure in any of these can make the generator useless,” Simpson continues.
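The runtime gap Simpson describes can be put into a simple check. The sketch below is a minimal illustration that assumes runtime scales linearly with remaining battery capacity, which is a simplification because real batteries discharge non-linearly and depend on load and temperature. The function name and parameters are hypothetical.

```python
def runtime_margin_min(rated_runtime_min, capacity_fraction, shutdown_min):
    """Estimate minutes of UPS runtime left over after a graceful shutdown.

    rated_runtime_min: runtime at the current load with healthy batteries.
    capacity_fraction: measured battery capacity as a fraction of rated (0 to 1).
    shutdown_min:      time needed to shut systems down gracefully.

    Assumes runtime is proportional to remaining capacity (a rough,
    optimistic approximation). A negative result means the UPS is
    expected to run out before the shutdown completes.
    """
    if not 0.0 <= capacity_fraction <= 1.0:
        raise ValueError("capacity_fraction must be between 0 and 1")
    estimated_runtime = rated_runtime_min * capacity_fraction
    return estimated_runtime - shutdown_min


if __name__ == "__main__":
    # Simpson's scenario: 30 minutes needed, batteries deliver only half.
    margin = runtime_margin_min(rated_runtime_min=30,
                                capacity_fraction=0.5,
                                shutdown_min=30)
    print(f"Runtime margin: {margin:.0f} minutes")
```

Running a check like this after each battery test turns a degraded battery into a number on a report instead of a discovery made mid-outage.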
Regular monitoring and maintenance also go a long way toward ensuring that the backup systems can handle the load during a power or cooling emergency, which helps optimize availability and efficiency, says Ben Kissell, service solutions manager with Emerson Network Power’s Liebert Services (www.liebert.com). “For example, if a UPS is operating over capacity, the critical load could be switched to bypass, exposing IT equipment to utility problems,” Kissell says. “To proactively prevent those failures, data center managers can upgrade existing systems or implement new units to manage the appropriate capacity within the data center.”
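The over-capacity condition Kissell mentions is easy to monitor for. The following sketch classifies a UPS by its utilization. The 80% warning threshold is an illustrative assumption, not a figure from Kissell or from any vendor, and the function name is hypothetical.

```python
def ups_load_status(load_kw, capacity_kw, warn_fraction=0.8):
    """Classify UPS utilization as 'ok', 'near capacity', or 'over capacity'.

    warn_fraction is an assumed planning threshold (80% by default);
    choose a value that matches the site's own policy.
    """
    if capacity_kw <= 0:
        raise ValueError("capacity_kw must be positive")
    utilization = load_kw / capacity_kw
    if utilization > 1.0:
        return "over capacity"
    if utilization >= warn_fraction:
        return "near capacity"
    return "ok"


if __name__ == "__main__":
    # Hypothetical readings for three 40 kW units.
    for load in (20, 34, 45):
        print(f"{load} kW on a 40 kW UPS: {ups_load_status(load, 40)}")
```

A unit that reports "near capacity" is a candidate for the upgrade or additional unit Kissell describes, well before the load is forced onto bypass.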
Steven Harris, director of data center planning at Forsythe Solutions Group, advises that summer is the ideal time to perform extra maintenance services. “When testing UPS batteries, generator startup batteries, fuel levels, HVAC filters, and other crucial backup components, summer is the best season because of the rising humidity and increasing temperatures,” Harris says. “Planning and preparation for energy-efficient operations during the summer is especially important in climates not used to warm temperatures year-round.”
Properly Configure The Backup Layout
Backup cooling and power infrastructure is, simply put, an alternative on which to rely in case of emergency. Because it is an alternative, IT staffers will likely not use the associated equipment as much as the main equipment. However, operating the backup systems will require direct human input, so knowing where the controls are physically located is crucial, especially in an emergency, when minor delays can spell big trouble for a data center that is without power and cooling.
To help make the layout of the backup equipment more familiar, use the same cooling and power layout for both the main and backup systems. “When somebody walks into the A room, everything should be in the same orientation as it is in the B room when you walk in. You immediately know where the bypass switches are and where everything else is that [you need in an emergency],” says Bill Kosik, energy and sustainability director for critical facilities services at HP. “Go to great lengths [to ensure] that the layout of the rooms in which the main and backup units are located is identical.”
by Bruce Gain
Top Tip: Do Dress Rehearsals
The backup cooling and power infrastructure may be in place in the event of an emergency, but does everyone know what to do when disaster strikes? Making sure that staffers know how to react requires clearly outlining and documenting the procedures. It also necessitates practice, such as staging mock emergencies to prepare. “The adrenaline starts to flow, people’s judgment becomes impaired, and unless things are clearly marked and everything has been rehearsed and understood, there is a high probability that someone could hit the wrong switch or close the wrong valve, which could create further catastrophe,” says Bill Kosik, energy and sustainability director for critical facilities services at HP. “You can have a pull-the-plug test where you simulate a power and cooling failure and everyone has to man their stations in terms of what has to be done, where, and when.”