Data centers are fascinating. High-tech strongholds designed to survive pretty much any natural or man-made disaster.
Dull and unremarkable at first sight, yet so complex and intricate once you’ve walked through their gates. Secret, cold and lonely places, but throbbing with invisible energy. Are server racks a modern-day Stonehenge? Will the Gods be pleased and grant us eternal uptime?
Mystical considerations aside, data centers are the alpha and the omega of modern civilization: our everyday life (and sometimes the difference between life and death) depends on their availability. Face it: in the 21st century, the Norns are weaving the web of our fate with fiber optic cables. Woe betide you if they can’t tell the difference between DWDM and CWDM 😉
For any self-respecting engineer, the hunger to understand how data centers work should be irresistible. There’s so much to learn: construction, security, power, cooling, cabling, network, servers, etc. Where do you start? Actually, the question really is: where CAN you start? The data center industry is quite opaque… The veil is rarely lifted and some questions are hardly ever answered.
Here are a few precious engineering resources that will get you started. The more you read, the more you learn… and the more you realize how much there is still to learn. A data center is a complex beast! Initially, you should focus on the basic concepts of power management, HVAC management, redundancy, etc.
- Books: most of them are a complete waste of time and money (especially if they have “Green IT” in the title). However, we can vouch for the quality of these two:
  - “Administering Data Centers: Servers, Storage, and Voice over IP” by K. Jayaswal (John Wiley & Sons, 2005, ISBN-13: 978-0471771838): a little bit old, but a very good place to start. All the introductory concepts are there, and newcomers will learn a lot about data center infrastructure and about IT infrastructure in general (servers, network, redundancy, etc.).
  - “Maintaining Mission Critical Systems in a 24/7 Environment” by P. Curtis (Wiley-Blackwell, 2nd edition, 2011, ISBN-13: 978-0470650424): a fantastic book, but definitely not recommended as an introduction. It focuses on the daily maintenance of data center infrastructure (not IT infrastructure) and goes into A LOT of detail on how things work and what to do to keep them going. Nine pages on how to check the quality of fuel deliveries for your generators: how cool is that? A mine of technical information, and definitely the bedtime book for data center staff.
- Online resources: sadly, most of the material out there is either watered down to a brain-dead level or infected with product marketing bullsh#t (quite an oxymoron). If you want technical information, here’s where you’ll find it. We’re not affiliated with any vendor, in case you were wondering 🙂
  - Energy University @ Schneider Electric: lots of free courses on data center technology (power, cooling, etc.). Tested and approved.
  - Online courses @ Siemens: power only (breakers, transformers, etc.). Very good stuff too.
  - Data Center Knowledge: industry news, many pictures and videos. Make sure you add it to your daily feeds.
This will only take you so far, however. The best education is to visit as many data centers as you can. Never turn down any opportunity, even if you’re not actively looking for hosting space. Keep your eyes open, use your common sense (water pipes above the racks? Hmm) and ask a million questions.
Here are a few real-life pointers for more inquisitive and productive visits:
- Trust nothing, especially redundancy claims (e.g. “n+1”, “n+n”). Check EVERYTHING in person: is feed A really feed A and not feed B? Hmm?
- Beware of fake redundancy. If two power lines come into the building at the exact same spot, can they be called redundant? Nope. The same goes for fiber connections. Once again, check everything.
- Ask for power diagrams. Look for them in all rooms. Check that they tell the truth.
- Check if all equipment is really online. An additional transformer / generator won’t be very useful if it’s been “on maintenance” for months, will it?
- Ask for maintenance logs. Look for maintenance tags / stickers on all equipment: they usually have handwritten information about the last / next maintenance date. Very informative 🙂
- Ask about the fire alarms, because pneumatic sirens may damage hard drives [Ed: It happened to us in 2011, and this paper explains why].
- Do you see cold-aisle isolation racks mixed with standard racks? Argh, someone is going to pay for hotter air than expected…
- If a water pipe reads “cold water” and feels warm to the touch, what does it tell you? [Ed: Man, the desperate look on the Sales guy’s face was *priceless*]
- Are technical rooms clean? Dusty? Greasy?
- Can you see what’s on the NOC screens? Any useful/confidential information? [Ed: During a recent visit, the Sales guy boasted about their sub-1.10 PUE. We walked by the NOC, where a screen displayed in big bold letters “PUE: 1.50”. Oops.]
- Does your hosting contract mention support / quality procedures? Did you ever receive them? And if so, did you read them?
- Talk to the security guard. Ask some basic questions about access control. Ask for the access log to your suite for the last week.
- As a matter of fact, talk to every staff member you come across. With a bit of social engineering, they may give you the real lowdown on the site, not the sales pitch. Chances are you’ll meet VERY colorful characters, so don’t miss out.
- Check the Meet Me Room / Operator Room. If it’s a mess of fiber cables, what does it tell you?
- Try to wander around the data center. Try to open all the doors that you shouldn’t be able to open.
- While you’re at it, try to get in without any ID, preferably through the delivery docks, and why not carrying drinks and food. Does anyone stop you?
The list goes on. This may sound harsh or paranoid… and yeah, it should! Can you afford NOT to consider the worst scenarios and how well a data center will survive them? Especially since the SLA penalties will *never* make up for your lost revenue!
In addition, what will you tell your CEO and your customers when power fails at the busiest time of the year? Or when planned maintenance goes wrong? Or when public construction cuts your “not-so-redundant” fiber?
“Sorry Boss, they said it couldn’t happen” ??? That just won’t work, and on top of everything else, it might well be a resume-generating event.
Remember: the more you sweat in training, the less you bleed in combat. Make sure you and your team sweat a lot.
That’s it for today. We’d love to read your advice and anecdotes in the comments section. Till next time, get rackin’, people!
Our lovely Community Manager / Event Manager is updating you about what's happening at Criteo Labs.