I have mentioned this several times in the series, but will quickly reiterate it here. While in many ways I could be talking about a production shop at AnyCompanyAnyWhere Inc., I am not. This is much more complicated, because this is R&D. This is for our thousands of customers and our hundreds of products. This is where the products get designed, built, and supported. We have no particular favorites, but we standardize where we can, where it makes sense to. For all that, we will have one of almost everything, and that goes for storage.
Stretched out across all my R&D data centers are things like IBM DS/Shark, XIV, and SVC. Apple XRAID. Compaq/HP StorageWorks. Xsigo. Hitachi. EMC. Sun/Oracle/StorageTek. Various white box players. JBOD. On and on. Stuff from vendors long dead. Stuff from vendors I cannot tell you about.
Not to mention terabytes of local storage.
Where it makes sense for R&D to have access to this device or that one, we have it. Where the storage is just a LUN presented to a VM, we cut back on the variety a bit.
I am going to talk here about the Hitachi VSP. Its lessons are generalizable to any SAN storage that might be used for the same mission. Here is one of ours:
Note the three large rectangular divisions: I am calling them "bricks" here, but that is just a name I made up because the front design kind of looks like a brick or tile wall, and calling them tiles ... just did not seem right. These "bricks" are the DKC's and DKU's that the storage array is built from. More on that further on.
Enterprise Storage and Virtual Density
There are certain things one must do when trying to get 10-to-1 decreases in server footprint. When the wall of blade servers goes up, there is more than just the heat coming off that wall to consider:
- Boot From SAN: All of it. Every blade from every vendor. Every VM. Every host OS. Keep the servers as utilitarian as possible, so that they are just compute nodes. Don't give in to any temptation to install local storage because it is easier. You will be sad, sooner or later, if you do.
- Fiber Channel it. As fast as you can afford. We went with 8 Gb, and are ready for 16 Gb on most of the blades / chassis. We looked at InfiniBand, and it is not off the table for future iterations, though some early work in storage virtualization left a bad taste in our mouths about it. Ditto iSCSI.
- Enterprise Class: When you have this many assets running in this small a space, going down becomes massively more painful. A single blade might take out fifty to seventy VM's. A single chassis ten times that. But the central SAN failing is all of it. Thousands of VM's. The entire internal cloud.
- Tier it. Virtualize it. Thin provision it in the hardware. You need to go fast, and you do not want unused bytes of expensive enterprise class storage just sitting around hoping that someday someone will use them.
Gigabytes and Kilowatts
Other than bytes, one fairly common way to measure your storage is how much power it takes to run how much storage: gigabytes or terabytes per watt or kilowatt kinds of numbers. This being 2013, I'll mostly use watts per terabyte in this post. Before the "Go Big" consolidation efforts started, some of the devices we had/have in the DC came out when watts per gigabyte, or even megabyte, made more sense. I'm looking at you, 1993.
Another thing to consider is that with Enterprise devices like these, the money is up front. By that I mean that a base level device is the most expensive way to build it. A big empty frame is expensive, even though it positions one for less expensive upgrades down the road. It is also the most expensive in terms of watts per Terabyte. Numbers on that farther down the post.
Buying as big as possible up front will save monetary units down the road.
I don't have an amp clamp on any of our VSP's, so all I can measure in our DC is at the PDU or UPS. I can use lots of data center math, and DCIM tools like Nlyte to get pretty close. So that one can follow along at home, I am going to use a free tool that anyone can use. Hitachi's Weight and Power calculator.
There are other advantages to using the same tool. How we put one of these arrays together is not going to be the way anyone else does it. How much tiering. How much total storage. How many controllers. How much cache. All of it changes the numbers for storage-to-power ratios. The calculator is a spreadsheet; I used version 13.14.
Side Note: I tried to run this under LibreOffice 4.1 and Excel 2011 on the Mac, but this spreadsheet appears, sadly, to be MS Windows specific. That's why I have a Windows 7 VM, though.
For the purposes of this article, I'll put together two different configs in the tool: a starter system and a midrange setup. We'll see how that works out as far as power goes. How much you spend on this is between you and your finance department, of course.
A lot of the Hitachi specific terminology here is in this reference guide.
This SAN is the center of the virtual world, and some things should not be skimped on. I'll put the cache at 512 GB for both configs. For fat pipes, I am maxing out the 16 port Fiber Channel boards at 4 per controller (DKC). The tool appears to reserve some ports for high speed internal usage.
The Hitachi VSP can scale all the way into the petabytes. In the maximum config it has 6 standard rack size cabinets, each about 24" by 40" by 42U.
This first config will be just a single maxed out cabinet (Frame 00 in the config). That is 1 DKC (controller “brick”) and 2 DKU's (disk “bricks”). Each disk brick can hold either 80 Large Form Factor (LFF) disks, or 128 Small Form Factor (SFF) disks. I will make one SFF (DKU-01) and one LFF (DKU-00).
Config 1: Small. Medium Performance. Big cache.
Using the tool, I will stuff in 80 7,200 RPM SAS LFF drives at 3 TB each, and 128 10,000 RPM SAS SFF drives at 900 GB each. I will configure 8 drive spares of each type, and that gives me 324,000 gigabytes of storage, with two tiers, lots of redundancy, and about 260 TB of usable capacity to thin provision into.
According to the calculator that is 4.3 kilowatts of power under standard load (whatever that means). Still: an amazingly small amount of power. 13 watts per terabyte raw, or 16 watts per TB after formatting. There once was a day when 25 watts per TB was the holy grail of SAN storage. Because of the density a 3 TB SAS disk brings to the party, we are well under that.
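The arithmetic behind those ratios is easy to check at home. A minimal sketch, using only the drive counts above and the calculator's 4.3 kW standard-load figure (the 260 TB usable number also comes from the config above):

```python
# Config 1: back-of-envelope capacity and power math for the single-cabinet VSP.
# Drive counts, spares, and the 4.3 kW standard-load figure come from the
# Hitachi Weight and Power calculator config described above.

lff_count, lff_spares, lff_gb = 80, 8, 3000   # 7,200 RPM SAS LFF, 3 TB each
sff_count, sff_spares, sff_gb = 128, 8, 900   # 10,000 RPM SAS SFF, 900 GB each

raw_gb = (lff_count - lff_spares) * lff_gb + (sff_count - sff_spares) * sff_gb
print(raw_gb)                              # 324000 GB raw, spares excluded

watts = 4300                               # calculator's standard-load number
print(round(watts / (raw_gb / 1000), 1))   # ~13.3 watts per raw TB
print(round(watts / 260, 1))               # ~16.5 watts per formatted TB (260 TB usable)
```

Both ratios land right where the calculator says they should, comfortably under the old 25 watts per TB mark.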
We actually started with a config very, very similar to this one, and connected literally thousands of VM's to it, and they perform far better than they did before. “Before” was on old, slow hardware with internal disks. If this works, one can only imagine what a better tiered, more scaled up version can do. So let's have a look.
Config 2: Medium Size, Good Tiering.
Enabling Frame 01 and Frame 02, this is the biggest the Hitachi can go without adding a second controller frame (DKC 1).
All these bricks! What to Do?
I put in four SFF bricks and four LFF bricks. For the purposes of this discussion, five tiers (though only four performance tiers: there are two 7,200 RPM disk types, which I did just to show it could be done):
- SAS SFF SSD, 400 GB
  - 8 spares
  - 22 TB of smoking fast storage
- SAS SFF 15K RPM, 300 GB
  - 10 spares
  - 64 TB of high speed storage
- SAS SFF 10K RPM, 900 GB
  - 10 spares
  - 190 TB of medium speed storage
- SAS LFF 7,200 RPM, 3 TB
  - 12 spares
  - 684 TB of low speed storage
- SATA LFF 7,200 RPM, 2 TB
  - 12 spares
  - 136 TB of low speed / lower cost storage
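A quick sanity check on the totals: summing the per-tier capacities listed above (spares excluded) shows how this config lands just over a petabyte raw. A minimal sketch, using only the numbers already given:

```python
# Per-tier capacities (TB) as listed above, spares excluded.
tiers = {
    "SSD 400 GB":          22,
    "15K SAS 300 GB":      64,
    "10K SAS 900 GB":     190,
    "7,200 SAS 3 TB":     684,
    "7,200 SATA 2 TB":    136,
}

total_tb = sum(tiers.values())
print(total_tb)    # 1096 TB -- just over a petabyte
```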
That is a lot of disks and a lot of storage. It's easy to play around and mess with the ratios of the tiers, and performance data would give you some clues about the best tiering. I went with this since it demonstrated the point and made sure there was always plenty of storage in each tier, plus plenty of spare drives of each type. It's not meant to be optimal for cost or even power, but rather something I feel comfortable saying tens of thousands of VM's could run on. As the back-end for a CLM install? No problem.
The power calculator reports this as 11.4 kW for just over a petabyte raw. 4 watts a TB raw, or 5 watts a terabyte formatted. There is a pretty good chunk of 10,000 and 15,000 RPM disks spinning here, and fast disks use more power than slow ones. Only makes sense.
It would be easy to make this lower power / higher capacity. This config should keep the Fiber Channel pipes full.
Adding a fourth cabinet adds more disk controllers in another DKC, and everything should more or less scale linearly from there. As you can see from the picture, we don't have anything this big and fast yet: all kinds of growth are possible in speed and capacity, with very little increase in footprint or power consumption. We shrank the 38,000 square foot data center to 11,000 square feet, dropped all sorts of power, and fit everything storage-wise into these two cabinets.
Full Disclosure / Broken Record: We did not virtualize everything. The heterogeneity previously alluded to means not every single server in the DC has its disk space out here. But everything we could put here, or on something like it, we did.
Compared to What?
Looking back at the starting place for all of this desire for consolidation and hardware updating: racks full of thousands of servers. Real, physical servers. Often desk-side engineering stations on shelves in racks, making airflow management a pain.
There was/is NAS installed of course, but each server booted from internal disks, and often had at least two internal SCSI disks. Disks such as the Seagate Cheetah, Fujitsu, or IBM Ultrastar. I have a couple of them on my desk here: this 18 GB IBM Ultrastar I am looking at right here (see it?) is rated at 700 mA at 5 volts (3.5 watts) and 800 mA at 12 volts (9.6 watts), for a total of about 13 watts. This other disk is a 33.8 GB Fujitsu; its label says 5 V at 1 amp plus 12 V at 1.2 amps, so 5 watts + 14.4 watts = 19.4 watts peak. The specs for this Seagate Cheetah say idle power runs between 8.7 and 11.68 watts, depending on interface.
We had piles of servers with 18, 33, and 72 GB drives. 146 GB was considered huge back in the day. Over five hundred servers had 20 GB disks or smaller.
Extremely conservatively: if the average across the shop was 30 GB at 10 watts each, it is easy to figure out some interesting things like watts per GB and power reductions. This is extremely conservative, if for no other reason than that there were a fair number of 5.25 inch disk drives still in use, and they use *a lot* more power than this. The vast majority were either SCSI or IDE, not Fiber Channel or SAS.
The next question is: what point in the history of the data center are we comparing this to? Go back 10 years, and we had 17,000 real systems across the globe, and well over 5,000 in the biggest DC. We had two 750 kVA UPS's loaded up.
Today we are using less than the full capacity of one UPS. The last year or so has been the real Go Big to Get Small effort, so there are only about 2,000 devices in this one large DC, with 4,000 or so disk drives. That's easily 30 kW. A third of a watt per gigabyte, or over 300 watts per terabyte!
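Working that out with the conservative per-drive figures from the drives on my desk (~30 GB and ~10 watts average are my assumptions here, not measurements), the back-of-envelope math goes like this:

```python
# Back-of-envelope power math for the old physical fleet.
# ~4,000 internal drives, averaging ~30 GB and ~10 W each -- conservative
# assumptions based on the drive labels quoted above.

drives = 4000
gb_per_drive = 30
watts_per_drive = 10

total_gb = drives * gb_per_drive             # 120,000 GB (~120 TB)
total_watts = drives * watts_per_drive       # 40,000 W -- easily 30 kW

print(round(total_watts / total_gb, 2))            # ~0.33 watts per gigabyte
print(round(total_watts / (total_gb / 1000)))      # ~333 watts per terabyte
```

Set that against the 13 to 16 watts per terabyte of the starter VSP config, and the scale of the power reduction is obvious.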
Even worse: That capacity was sprayed all over the place: 38,000 square feet of data center, full to the rafters with old systems, each one an island of storage. Shared capacity was only on the NAS (of which there were many terabytes). Better than a bunch of cross mounted systems, but still not optimal.
No thin provisioning. No tiering. The system went as fast as the fastest disks you bought for it. If the system attached to the storage failed, that storage, and whatever it did for the business, was unavailable. If it was something like a Continuous Integration server, things could grind to a halt till the server was fixed, or the disks were swapped to a working server, or some other similar thing.
Not everything could be HA when running amber waving fields of physical servers. Virtualize it. Consolidate it. Now everything has to be HA, but everything benefits from that. Not only is power being saved, and space reduced, and less CO2 emitted, the availability is *higher*.
It's also easier to find all your "Stuff". It's right there.
Pete and Repeat
I want to state again: not everything in the DC is Hitachi. I picked the VSP for this post because we have a lot of it, and it is a good example of what can be achieved with any Enterprise class storage. Had I picked IBM XIV, the details of how the storage is installed, how the controllers work, and how the disks go into the frame and are laid out would have been different. The point would be the same. Ditto any other Enterprise vendor's gear. Go big and dense to achieve the goal. Make sure it is HA.
In another kind of repeat, if this all seems like we have been here before, it is because we have been. We just called it the mainframe back then. Therein lies the tale of the data center, and the redesign, and that's next.