In "Not All Electrons are Created Equal" I talked about different ways to pump electrons up the wire so that they can charge your smartphone and light up your life. Yeah, OK, I was talking about power usage in the DC, and how the way electrons are pumped matters, but it comes to the same thing. No matter what you use the power for, the way it is moved along the wires differs depending on where you are, and that directly impacts your power bill and your CO2 footprint.
Data Centers are well known for being one of the largest single areas of power consumption, using over 1% of the worldwide total for that one functional area. 1% may not seem like much, but again: one thing! Just data centers. And it adds up to hundreds of terawatt-hours of energy every year. The good news is that it is not growing as fast as it was expected to, and the reasons for that are simple. For us, it comes down to one thing: Virtualization.
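To put that "1%" in perspective, here is a back-of-the-envelope calculation. The world electricity figure below is my assumed round number, not something from the linked articles:

```python
# Back-of-the-envelope check on the "1%" figure.
# Assumption (not from the article): world electricity generation
# of roughly 20,000 TWh per year.
WORLD_TWH_PER_YEAR = 20_000

dc_share = 0.01  # "over 1% of the worldwide total"
dc_twh = WORLD_TWH_PER_YEAR * dc_share

# Average continuous draw in gigawatts: energy divided by hours in a year
dc_gw = dc_twh * 1000 / 8760

print(f"Data centers: ~{dc_twh:.0f} TWh/year, ~{dc_gw:.0f} GW average draw")
```

Even with the assumed inputs, the shape of the answer is the point: one functional area drawing tens of gigawatts around the clock.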
It's easy to read the above linked articles, see some very large scale trend data, and find out that Google is (surprise... not) the most power-efficient builder of data centers. But what is real, and what can we, as smaller data center operators, do?
BMC is the 8th largest publicly traded ISV, and of course that means we have data centers. Not American Express sized data centers, but a fair number of them, spread like peanut butter across the world. Partly that is history: as we acquired companies, we did not necessarily move the data center / lab to a central location. That is disruptive, hard work, and requires planning to make sure that the products are still being developed and growing. Partly it's the fact that the speed of light still has not been upgraded, despite years of trying. My geek moment here: Virtual Particles affecting the speed of light. But I digress.
Regardless of virtual particles and purchase models, many small data centers offer no opportunity for scale, and so for the last ten years we have reduced our North American data center physical footprint by about 40%. All targets of opportunity.
No big central project.
All that changed when Scott Crowder, a new (at the time) VP in IT, walked into the place, looked around, shook his head, and said in so many words, "This needs to change". His concepts are tried and true; if you are a mainframe person they do not even sound revolutionary: "Buy Big". Don't get a hundred little disk bays and hook them up all over the place: get three or four really big, enterprise-class arrays and put them in large, central places. Ditto servers: we don't want amber waving fields of PC and rack-mount servers: we want density. Blades are ready now. You can jam all sorts of capacity into 10 or 11U now. Get rid of the old stuff. Move your reliability up. Reduce the average age of the gear. Have everything in one place so that support is all in one place.
The "Internal Cloud" or the "Glass House": Whatever you want to call it. It's back. The swings between centralized and decentralized never stopped. By any name, we are headed back to centralized. The variation this time is all the stuff at the edge: The tablets and smart phones and 24/7 high speed connectivity.
Nothing new about centralizing at all... except that it is genuinely hard to do. It takes being willing to stick to your guns, and also, in an R&D environment, making sure you are not throwing out the baby with the bath water. At the end of the day you want to be sure that R&D can still do their job. In a perfect world, they would never even know you were making changes, but unfortunately it does not work that way: you have to partner, and study, and discuss, and adapt.
"Buy Big" became a project: "Go Big to Get Small". The goal was more than reliability, or centralization of support. It was power reduction. Big power reduction. 67% reduction, over a three year project. At the end of the project, the first servers bought would just be coming off support, feeding back into the next opportunity cycle.
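To see what a 67% cut is worth, here is a quick sketch. The baseline draw and power price are made-up illustration numbers, not BMC's actual figures:

```python
# Sketch of the "Go Big to Get Small" target: a 67% power reduction.
# The baseline draw and power price are hypothetical illustration
# values, not real BMC numbers.
baseline_kw = 1000          # hypothetical average data center draw
target_reduction = 0.67     # the project's stated goal
price_per_kwh = 0.10        # hypothetical $/kWh

saved_kw = baseline_kw * target_reduction
saved_kwh_per_year = saved_kw * 8760    # hours in a year
saved_dollars = saved_kwh_per_year * price_per_kwh

print(f"Saving {saved_kw:.0f} kW -> "
      f"{saved_kwh_per_year:,.0f} kWh/yr -> ${saved_dollars:,.0f}/yr")
```

Swap in your own draw and tariff; the point is that a percentage goal turns into real money once you multiply through the hours in a year.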
Goals like these don't work without vision and without support at the highest levels. If anyone and their cat can buy computers and bring them into the DC, you have lost before you even started. Budget and buying authority have to be centralized, and the people wielding them have to understand the goals, get the design, and understand when there have to be exceptions.
Exceptions are Exceptions
There are times when a product (actually, for R&D, a product's server) cannot be virtualized. If we develop and sell a product that does bare metal provisioning (we do), then you have to develop and test on bare metal. Your build machine can be in a VM, though. All that matters for a build machine is that code is compiled into binaries as quickly as possible. These days the overhead of the virtual world is so low that this is not a problem.
Even in that example, the exceptions are narrow: some servers are not virtualized only because of what they do. If we had no customers doing bare metal provisioning anymore, even those could be virtualized... but that is not the case.
What about high I/O use cases? Again: many virtual technologies have gotten far better at I/O than they used to be, with the ability to dedicate I/O devices to virtual machines when required. Or you can move virtual I/O outboard of the server: co-process it with dedicated virtualization infrastructure, cache I/O in specialized RAM. Variations abound. Sometimes there is just so much I/O, hitting the disks or SSDs so hard, that a VM is just not the right thing to do. The costs of making the VM work exceed those of just giving the process a real server to run that load on.
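The triage logic above can be sketched as a toy decision helper. The overhead figure and the function itself are my illustrative assumptions, not anything BMC uses; real decisions come from benchmarking the actual workload:

```python
# Toy triage helper for the high-I/O exception case.
# The default overhead figure is an illustrative assumption,
# not a measured value.
def recommend_platform(required_iops: int,
                       device_max_iops: int,
                       virt_io_overhead: float = 0.10) -> str:
    """Return 'vm' if the device can still meet the IOPS target
    after paying the assumed virtualization I/O tax, else 'bare metal'."""
    effective_iops = device_max_iops * (1 - virt_io_overhead)
    return "vm" if effective_iops >= required_iops else "bare metal"

print(recommend_platform(8_000, 10_000))   # plenty of headroom
print(recommend_platform(9_500, 10_000))   # overhead eats the margin
```

The interesting part is the boundary: a workload that fits comfortably on real hardware can fail to fit once the virtualization tax is applied, and that is exactly the corner case that earns an exception.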
These are exceptions. Corner cases that require analysis to be sure that we are doing the right thing. This can't be a rule with more exceptions than adherence. This is not like the English language's so-called rule "I before E, except after C". Maybe. Maybe not.
Scalability is another case: if you want repeatable numbers, virtualization is not an easy place to get them. When you tell someone that the numbers were created in a virtual world, and then performance data was used to massage them back into real-world numbers, most people will have quit listening and perhaps started to suspect your honesty. It's not that it can't be done. It can. It's just not easy, and you have to be utterly trusted before you even start that conversation.
Maybe 5-10% of the servers we have need to stay "Real" for whatever reason. 90% or so do not: it's a target-rich environment.
Regardless of Platform
You may want to virtualize more than you actually can. One exception would be the case where something simply does not virtualize. Maybe nothing was ever developed to virtualize it. Maybe it's just too old.
We have one of most everything ever made in computerdom, and versions of things going back a decade or more. We still have a VAX-based server, for example, even though VMS was ported to Alpha, and then to Itanium (and hopefully to AMD64 [X86-64] soon...).
Some things virtualize in a limited way: AIX 5.2 and 5.3, Solaris 8 and 9. In these four cases they can be virtualized, but:
- Patches have to be applied to get them to a virtualizable level
- They run in "Zones" (WPARs in IBM-speak) rather than a more isolated LDOM / LPAR.
On the plus side, they use fewer system resources and are faster this way.
When doing "Virtual First", in an R&D environment, the words / terms / actual underlying code may change, but the concepts are the same. For the five major environments, there are viable solutions:
- IBM: LPAR and WPAR
- Mainframe: z/VM and LPAR
- Amdahl UTS: We miss you...
- Sun / Oracle: LDOM and Zone
- HP: IVMs and Containers (of very special interest to me here: HP 9000 Containers, which mean I can reduce my use of PA-RISC based systems)
- AMD64(X86-64): VMware, Virtual Box, KVM, Xen, Parallels...
- I cannot footnote a blog, so it is worth noting here that AMD and Intel both have microcode assists for virtualization. Borrowing from the mainframe concept of the SIE instruction, and building upon the Intel-compatible 64-bit architecture pioneered by AMD, most AMD64-compatible virtualization solutions are low overhead and near native in execution speed.
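On Linux you can check whether a given box has those hardware assists by looking for the "vmx" (Intel VT-x) or "svm" (AMD-V) CPU flags. A minimal sketch, Linux-only since `/proc/cpuinfo` does not exist elsewhere:

```python
# Check for the hardware virtualization assists mentioned above:
# Intel VT-x shows up as the "vmx" CPU flag, AMD-V as "svm".
# Linux-only sketch: /proc/cpuinfo does not exist on other platforms.
def has_virt_extensions(cpuinfo_path: str = "/proc/cpuinfo") -> bool:
    try:
        with open(cpuinfo_path) as f:
            text = f.read()
    except OSError:
        return False  # not Linux, or file unreadable
    flags = set()
    for line in text.splitlines():
        if line.startswith("flags"):
            flags.update(line.split(":", 1)[1].split())
    return bool(flags & {"vmx", "svm"})

print("hardware virtualization assists:",
      "present" if has_virt_extensions() else "absent/unknown")
```

If the flag is missing (or disabled in firmware), the hypervisor falls back to slower software techniques, which is exactly why the near-native overhead claim depends on these assists.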
Storage is every bit as virtualizable as servers these days, though what that means is slightly different. You can:
- Thin Provision
- Data De-dupe
- Mix and match RAID levels, including across some RAID numbers/types that the standards committee never heard of.
- Tier (as in response time tiering: hot data on fast devices)
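Data de-duplication, to pick one item from that list, can be sketched in a few lines: chunk the data, hash the chunks, and store each unique chunk once. This is an illustrative toy, not any vendor's implementation:

```python
# Minimal sketch of block-level deduplication, one of the
# storage-virtualization features listed above: identical chunks
# are stored once and referenced thereafter. Illustrative only.
import hashlib

def dedupe_ratio(data: bytes, chunk_size: int = 4096) -> float:
    """Logical size divided by unique-chunk size (higher is better)."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    unique = {hashlib.sha256(c).digest() for c in chunks}
    return len(chunks) / len(unique)

# Ten identical 4 KiB blocks dedupe down to a single stored chunk.
sample = b"\x00" * 4096 * 10
print(f"dedupe ratio: {dedupe_ratio(sample):.1f}x")
```

Real arrays do this inline or post-process, with far more sophisticated chunking, but the ratio they report means the same thing: logical capacity over physical capacity actually consumed.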
Even small storage solutions have some of these features now. But we don't want small arrays. We want big, central enterprise class arrays. Arrays that grow. Arrays that perform. Arrays that do not go down.
As things move back together and start to scale up, we start to recall what the classic problems of this style of environment are. A blade server chassis going down can mean thousands of system images offline, and half of R&D sitting around looking at each other, wondering when the server will be back.
That is, if you were silly enough to roll out something that had no ability to recover to another chassis.
Same thing when it comes to storage, only more so. Centralize all the little disks onto the big enterprise-class array, and if it goes down, everything hooked to it is gone too. That could be petabytes going offline at once.
Oh, yes. It's all coming back now.
Decentralized may be more expensive and not very power efficient, and failures may be frequent, but at least the risk was spread, and it was limited. The data out there probably was not backed up someplace either, but the damage was contained. Lots of small problems over time, rather than one big one if you did not do it right.
Speeds and Feeds
Next time out, more numbers. Yay numbers! A look at the vendors and their products and the power and CO2 we are looking to save.