-by Steve Carl, Senior Technologist, R&D Support
What goes round comes round. What is old is new. Here we go again. Haven't we been here before?
I have been doing several things lately that all intersect. One of them is designing a new data center for R&D labs. Another is looking at one of our VMware server farms that was an early proof of concept for the whole server virtualization thing, and trying to figure out where it needs to go *next*. Another is looking at Cloud Computing: where it is now, and where it is headed.
At the center of almost all of those things is virtualization technology, and that is where I started my career on the mainframe back in 1980. VM/SP it was called back then. At the center of the current X86 version of virtualization is another old friend of mine, Linux.
Quick terminology note: The mainframe OS called VM created VMs. Yep. They named the OS after what it did. Kind of like people being named "Smith". VM over the years has had many flavors: VM/370, VM/SP, VM/XA, VM/ESA, and the current z/VM. VM system programmers mostly just called it VM, and knew by context whether the discussion was about the OS or the Guest Operating Systems. A Guest OS is a virtual machine, running as a guest of the host OS... named VM. I'll try to be contextually clear here. VM created virtual mainframe hardware that the Guest OS used without knowing it was not real. Mostly.
VMware, Xen, and others are slightly less confusing here because the "virtual machine" tag is reserved for the virtualized (guest) OS only. When you talk about the host OS, it has a different name. Don't even get me started on bare-metal virtualization, or hypervisors. Not going there.
VM on the mainframe predates me by a good bit. Depending on how you interpret a few things, computer virtualization started in either the late 1950s or the mid 1960s. The exact start date is not really that important to this post. What is important is that the early experiments turned up the fact that virtualization worked better when the hardware assisted in its own virtualization. To keep memory from being bashed, trashed, and generally abused, features were added for address indexing and translation assistance.
While the implementations are technically different, they are not that conceptually different from what AMD did with Pacifica (AKA AMD-V) or Intel did with Vanderpool (AKA VT-x). It was really a very old lesson.
As the mainframe evolved, the technology it used changed and evolved until we reached the point where almost all the virtualization formerly done by the VM operating system started being done in the hardware via the SIE (Start Interpretive Execution) microcode.
We have not quite reached that place in X86 land yet, but AMD and Intel keep adding more and more hardware features to their processors to support OS virtualization. The Nehalem generation of processors from Intel adds Extended Page Tables (EPT), for example. Wow... where have I seen that before? Oh yeah... I remember. AMD is adding I/O memory management to increase virtual machine isolation. Think I have seen that before too....
Don't take my tone wrong: these are great ideas, and my only point here is that we have been here before. If you want to know where we are going, just have a look at today's Z10 mainframe, because the hardware features it has now to assist in virtualization will appear sooner or later, in some form, in X86 space.
Back in the day, when we were running a large number of virtual machines under VM on the mainframe (oh... wait. We still do that...), the thing we needed first, before anything else, was RAM. That is as true today, if not more so.
One of the things discovered early on about virtualization was that if you do shared memory wrong, all you get is a computer that beats itself to death trying to manage memory. Thrashing. Doing nothing but paging. I was not there, but I think that is more than likely why some of the very first hardware assists for virtualization back in the early 1960s came in the form of memory management. I was around when Intel and AMD added hardware assists for memory, so I think my theory has legs.
In our use of VMware internally, we find that more often than not the number of virtual machines we can deploy on any given ESX server is a function of RAM, not CPU, being the bottleneck. Even with features like memory over-commit, it is common to see in our BMC Performance Assurance data a two-to-one ratio of memory usage to CPU usage. It would be easy to assume from that data point that the speed of the CPU has outstripped the other components of the general computer.
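The arithmetic behind that is simple enough to sketch. Here is a back-of-the-envelope model in Python; the host and guest numbers are made up for illustration, not our actual Performance Assurance data:

```python
# Illustrative sketch: with typical guests needing far more RAM than CPU,
# memory runs out long before the processors do. All numbers are invented.

def max_guests(host_ram_gb, host_cpu_cores, guest_ram_gb, guest_cpu_load):
    """Return how many guests fit, and which resource is the bottleneck."""
    by_ram = int(host_ram_gb // guest_ram_gb)
    by_cpu = int(host_cpu_cores // guest_cpu_load)
    limit = min(by_ram, by_cpu)
    bottleneck = "RAM" if by_ram <= by_cpu else "CPU"
    return limit, bottleneck

# A hypothetical ESX host: 128 GB RAM, 16 cores; guests average 4 GB RAM
# apiece but only a fifth of a core each.
guests, constraint = max_guests(128, 16, 4, 0.2)
print(guests, constraint)   # prints: 32 RAM
```

Doubling the RAM in this toy model doubles the guest count without touching the CPU side, which is exactly the behavior we see on real ESX hosts.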
When CPUs were not the fastest thing in the box, IBM spent a great deal of time and engineering trying to make sure that they only did high-value work. I/O was spun off to dedicated I/O processors. Memory was used by the I/O processors to stage the actual data the CPU needed to work on next. In general, the mission was 'keep the CPUs busy'. All this custom engineering made for expensive computers, and so no mainframe data center worth its salt would buy new capacity before it was needed, and an underused mainframe was considered a failure on the part of the people who designed and recommended that configuration.
Aside: When I was at NASA as a subcontractor, I heard a quite believable story about a mainframe system programmer who wrote a program that was a CPU soaker. After a new upgrade, he would make it use more CPU, and keep end user response time more or less the same. When load increased, they would dial back the soak task, and things would return to normal. When there was no soak left in the soak task, it was time for a new mainframe.
Whether that story is true or not, the fact that it was told says something about the way the resources of the mainframe were viewed.
The constraint for us with VMware is RAM. With 256 GB of RAM I can virtualize twice as many virtual machines as with 128 GB of RAM, without adding any CPU resources. But the cost of the 8GB SIMMs to do that often more than doubles the cost of the system... It was often literally less expensive to buy two ESX servers with 128GB of RAM than one ESX server with 256GB of RAM. Boy, will that set of numbers look stupid in a few years... but the point will still be the same.
Factor in the three-year ROI, and of course that is no longer true. Power, air conditioning, rack space, network and KVM connections all more or less double with two ESX servers instead of one. The challenge in tough economic times is getting the three-year ROI taken into account at all. The good news is that being green (using fewer of this planet's resources) is also a good thing, and it makes the discussion about buying fewer, more expensive computers feel a lot like the ones we used to have back in the heyday of the mainframe.
For a project I am working on, I did a three-year ROI on two 256GB VMware servers (Dell R900s, but the same held true for Dell R905s) versus four 128GB ESX servers (also Dell R900s). Not even counting the VMware licenses for the CPU sockets, the costs came back with the 128GB config costing 25% *more* over the three years of the server's life. It is even better than that, because I rounded down all the power costs in my model, did not include taxes on the purchase or the power, and did not count the VMware per-socket CPU license charges. The real number is probably closer to 50% in the real world, but I wanted this estimate to be utterly, fiscally low-ball. Under-promise, over-deliver, just like Mr. Scott on Star Trek.
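The shape of that math can be shown in miniature. The prices below are placeholders, not the actual quotes from my model; the point is that per-server fixed costs (power, cooling, rack space, network and KVM ports) accrue per box, so four half-sized servers can cost more over three years than two double-sized ones:

```python
# Hedged sketch of a three-year cost comparison. Every number here is an
# invented placeholder; only the structure of the calculation matters.

def three_year_cost(servers, price_each, yearly_overhead_each):
    """Purchase price plus three years of per-box overhead, per fleet."""
    return servers * (price_each + 3 * yearly_overhead_each)

# Two big-RAM boxes: pricier each, but half the overhead streams.
big = three_year_cost(servers=2, price_each=40000, yearly_overhead_each=3000)

# Four small-RAM boxes: cheaper each, but double the overhead streams.
small = three_year_cost(servers=4, price_each=22000, yearly_overhead_each=3000)

print(big, small)                                    # prints: 98000 124000
print(f"small-server config costs {small / big - 1:.0%} more")
```

With these made-up inputs the small-server fleet comes out about a quarter more expensive over three years, even before per-socket hypervisor licensing is counted.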
[Geek Points for the Star Trek reference!!!!]
I/O and Channels
The mainframe is still the king of the I/O mountain, and at the core of that is the way the I/O subsystem on the hardware is designed. There are lots of I/O channels, they can load balance, and more than one can connect to any given I/O device, for redundancy as well as load balancing. The design also allows virtual machines to start I/O on any given channel to any given device in a totally shared and transparent but still isolated way. Now add that the Z10 can have 1024 I/O channels on a single mainframe. X86 is not anywhere near this yet. Not even close.
Not being anywhere near and not being close is not the same thing as not knowing where to head over time, though. Have a look at the Virtensys web site, for example. Clearly they have in mind the same thing the mainframe did: decouple the I/O from the processor, and share it. The picture doesn't have 1024 I/O channels in it: there is a reason the mainframe costs more, for starters. Of course you could argue that makes the MF the perfect convergence platform at the core of server consolidation... and you would be right.
VM on the mainframe has virtual network switches interconnecting virtual machines. VMware has the same thing in ESX. TCP/IP allows all sorts of possibilities for tunneling other protocols inside it. Stepping back a second, it seems like whoever has the least expensive, fattest pipe can become the transport du jour for all the other I/O in the shop. iSCSI, FCoE... you name it.
In a reverse of the way things normally happen, distributed systems brought networking to the mainframe. Well... not exactly. The mainframe had its own way of networking before (SNA), but it did not survive "contact with the enemy". Mainframe people just had trouble getting their heads around the idea that the protocol did not guarantee the packet would get where it was sent. But I digress.
The mainframe had virtual network I/O long, long before it was cool. We used to lash virtual machines together into virtual networks with Virtual Channel-to-Channel (VCTC) adapters. There was real hardware that let one mainframe talk to another over its high-speed channels (high speed then...). It was a complicated sort of crossover, flipping transmit and receive, except that the mainframe channels were parallel, not serial. VM could do that same trick virtually, and then two guest OSs could converse not at hardware speed (4.5 megabytes a second on a fully spiffed S/370 set of cables), but at *memory* speeds.
If you buy a large VMware server, it can have inside it a virtual network where a large number of its virtual machines converse with each other without ever touching a real wire. We have been here before, and it was good then too.
UNIX and other platforms have long had the concept of a virtual disk. The hardware design made this vary, but at the very base of this was the disk slice. Take a disk, and instead of using the whole thing as one disk address, write a partition table on it, and have it contain more than one disk image. PATA had, for example, four primary partitions available by design. In Linux terms, HDA became HDA1, HDA2, HDA3, and HDA4. When that was not enough, PATA layered on the "extended" partition, so that one of the four disk slices could be sub-divided into slices again.
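That naming convention is mechanical enough to sketch. This little Python function follows the Linux rule for old PATA disks: partition numbers 1 through 4 are primary (one of which may be the extended container), and logical partitions carved out of the extended one start numbering at 5:

```python
# Sketch of the classic PATA/Linux partition numbering described above:
# hda1-hda4 are primary (or extended), hda5 and up are logical partitions
# living inside the extended one.

def partition_kind(n):
    """Classify a partition number on an old PATA disk like hda."""
    if n < 1:
        raise ValueError("partition numbers start at 1")
    return "primary/extended" if n <= 4 else "logical"

for name in ["hda1", "hda4", "hda5", "hda9"]:
    num = int(name[3:])
    print(name, "->", partition_kind(num))
```

That is why, on such a disk, the first logical partition is hda5 even when fewer than four primaries actually exist.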
VM (the MF OS) has always had the idea of a "minidisk". Unlike a partition table, the way the disk was divided up was not written to the disk being sliced. The disk was blissfully ignorant and just stored the data written to it where VM told it to. The disk slicing was defined in the VM directory. MF disks mostly came in the Count Key Data flavor (with a few exceptions), which meant they were subdivided into "cylinders". Different disk models had different numbers of cylinders. The smallest minidisk is one cylinder, meaning that a MF disk could contain literally thousands of minidisks. The limit was how many cylinders the hardware presented to the OS.
VM would then take these minidisks and assign them to VMs, or guests. The guest OS would have no idea that the minidisk was anything but a real, if smaller, disk.
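A toy model makes the idea concrete. This sketch is my own simplification, not real z/VM directory syntax: each minidisk is just a named extent of cylinders on a labeled volume, the directory lives in VM rather than on the disk, and the smallest minidisk is one cylinder:

```python
# Simplified model of VM-style minidisks. The "directory" here stands in
# for the VM directory; the real disk stores nothing about the slicing.

class Volume:
    def __init__(self, label, cylinders):
        self.label = label
        self.cylinders = cylinders
        self.directory = {}      # owner -> (start cylinder, size)
        self.next_free = 0

    def define_minidisk(self, owner, size):
        """Carve a minidisk out of the volume's free cylinders."""
        if size < 1:
            raise ValueError("smallest minidisk is one cylinder")
        if self.next_free + size > self.cylinders:
            raise ValueError("not enough free cylinders on " + self.label)
        self.directory[owner] = (self.next_free, size)
        self.next_free += size
        return self.directory[owner]

vol = Volume("VMPK01", 3339)                  # roughly a 3390-3's cylinders
print(vol.define_minidisk("LINUX01", 500))    # prints: (0, 500)
print(vol.define_minidisk("LINUX02", 500))    # prints: (500, 500)
```

Each guest sees only its own extent; to the guest OS, a minidisk behaves like a small but otherwise ordinary disk.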
In another echo of the past, I have seen various Linux recommendations to use disk labels rather than device addresses. In Linux, this is mostly in /etc/fstab. The mainframe has used disk labels forever; in fact, the directory that defines the minidisks is keyed off the disk labels. The reason is the same too: with labels, the underlying disk hardware address can change without affecting the operation of the system. Handy for system recovery and such.
Disk slice limitations were a real problem for UNIX and other OSs. Logical volume managers sprang up to deal with that by abstracting the real underlying hardware from the applications on the OS. LUNs became VLUNs. A VLUN could be part of a disk, a whole disk, or many disks. In this the mainframe was exceeded for a while: the ability to aggregate minidisks would be a long time coming. The first time I thought it was *easy* was in fact at the convergence of Linux and VM, where Linux creates VLUNs over the top of VM-supplied minidisks.
Driving Utilization Up and System Count Down
As noted before, when I started in the mainframe biz, it was just generally accepted that you did not run your MF at less than 80%. If you did, you had overbought your capacity. At the same time, there had to be headroom for peaks: things like quarter close and billing runs, which made both your averages and your peaks things your capacity planner / system programmer took into account when figuring out what kind of mainframe to buy next time. Not counting the CPU soaker person. That story ends with them being fired.
One of the reasons that people loved distributed systems is that they could buy all these little computers with tons of spare capacity and just not worry about that anymore. Of course we now understand that the freedom came with a cost: OS license counts, applications license counts, and amber waving fields of systems that needed to be replaced every three years or so. This contrasted with the more structured world of the glass house where those things were planned for and dealt with by small staffs of people that understood all the issues around the troubles that come from things like hardware and OS upgrades.
There may not have been capacity planners in the distributed world, but now everyone was a desktop admin.
All those computers sitting about doing nothing when someone is not sitting in front of them. Even when someone is: I am watching the CPU meter as I type this, and the computer is not even noticing the keystrokes. I have two browsers open, email, and a couple of other applications, and the CPU is looking out the window, frankly rather bored. Memory is at 60%, though. When I need the CPU, I will spike it to 100% for a few seconds, but then it will return to its lackadaisical state. The average computer runs at about 2% CPU usage most of the time. With enough memory, this CPU should be able to handle 40 or 50 people doing the same things I am doing right now.
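The back-of-the-envelope math behind that 40-or-50 figure looks like this; the 2% average and the headroom reserved for spikes are illustrative assumptions:

```python
# Rough consolidation arithmetic: at ~2% average CPU per user, one CPU
# could time-share dozens of users while still reserving headroom for
# the occasional 100% spike. Numbers are illustrative, not measured.

def users_per_cpu(avg_cpu_fraction, headroom_fraction):
    """How many average users fit while reserving headroom for spikes."""
    usable = 1.0 - headroom_fraction
    return round(usable / avg_cpu_fraction)

print(users_per_cpu(0.02, 0.10))   # prints: 45
```

Which is, of course, exactly the bet that virtualization (and the mainframe before it) makes: averages are low, peaks are short, and the peaks of different users rarely line up.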
The mainframe stayed as busy as possible by buying fewer CPUs, offloading its I/O, and leveraging RAM and disk storage (VM's paging / swapping subsystem is a thing of beauty).
For the end user, the so-called "green screen" sat connected to a terminal controller, and the terminal controller buffered together the I/O of about 32 users, and batched it up to the mainframe when it needed to. These days you can do more or less the same thing with a web browser, AJAX, and a Linux server. That Linux server can be running with a bunch of other Linux servers at the same time on an ESX, Xen, or other virtualized server inside the data center. Work is offloaded onto my computer, and batched back to the server as needed. As I am writing this in Google Docs, the server is off someplace in a server cloud: I know not where.
Feet on the Ground
Cloud Computing. Seems like another name for "Central Data Center" in so many ways. What are the concepts that drive the cloud?
First off, people sometimes act as if the idea that they are in one place while their data is in another were new. It is not. They get jazzed about the idea that a modern computer screen looks so much better, is so much higher resolution, and has all the nifty colors and nice icons. All that is true, of course: we use tons of computer cycles and RAM to make all the pretty screens. We'll be using a lot more whenever we get to the place where we can talk to the computers.
I was talking to a banker the other day, and he was telling me how they are supposed to open new accounts. There is this pretty GUI-based thing, and using it requires patience, especially when it crashes. All he wants to do is open an account, and the person is waiting right there! When the GUI widget crashes, he goes to a green screen hidden away somewhere in the back... I am guessing an IBM terminal, but he did not know. May have been ASCII. There he flies through a series of screens, typing in codes and data, and in a minute or so, all is done.
That terminal accesses a computer someplace else. He does not know where. It is "out there". Sure, it is an internal-to-the-bank cloud, but it is still mysterious and magical. Of course, it helps that he knows how to use the green screen. A newcomer to the bank would more than likely not know the old system, and would have to be patient waiting for the new system to either catch up or, maybe, come back up.
Is the magic of the cloud the protocols? Does one really care where the computer is? Of course not. What they care about is whether or not someone can steal their data, their identity, or embarrass them.
I am not saying there is nothing new under the sun... or is that Sun here? There are over a billion wireless phones on the planet, and having a full browser in the wireless palm (or is that iPhone) of your hand is clearly new. It is nothing that science fiction did not think about for years, but now it is real. All those itty bitty computers in our hands would be useless without the cloud of computers behind them. On my iPhone, I have no idea where the computer is that makes Google Maps work, but I am fairly sure it is not the same one that makes my Twitter app work. The protocols and the interfaces and the speeds and feeds have all changed, but conceptually, how different is that than the green screen on the desk that somehow accesses the customer data and sets up the account? If the bank has web banking, that customer can now go and access the exact same information from their iPhone.
I'll have more to say about the clouds in the future, but as it relates to this post I think I have beaten it enough. Probably not into submission. The cloud crowd is pretty loud and proud.
[More geek points for alliteration!]
One of the areas I am looking into right now is storage virtualization: making disk farms into blocks of storage, and abstracting the volumes at a layer above that. Spread I/O as far and wide as the application at hand requires.
Even at that, I have been here before. The mainframe had a bit of disk storage back in the 90's called the Iceberg, from then-STK. The 'Berg was a truly disruptive technology. It took fixed-block SCSI disks and virtualized them into Count Key Data volumes. The mainframe no longer knew where the actual blocks were. IBM later came up with the RVA, then the Shark, and now the DS line, and all of this virtualization has continued and increased. There is, as far as I am aware, no such thing as a *real* Count Key Data disk anymore.
The IBM SVC, Xsigo Systems VP*, or HP SVSP do the same thing for disk arrays from a wide range of disk vendors; or, if you prefer, there are devices like the ones from Compellent (to name but one) where the disk back end is provided but utterly virtualized.
Déjà vu. I feel like I have been here before. Not exactly; there are differences. Still, I wonder what is going to get mined from the mainframe next. It is not like the mainframe is sitting still waiting to be overtaken, either. Just as it absorbed TCP/IP, it has absorbed other ideas, concepts, and even operating systems like Linux. UTS would be proud.
The postings in this blog are my own and don't necessarily represent BMC's opinion or position.