

Fixed Block Architecture, Count Key Data, Lions, Tigers, and Bears

 

http://vmlmat.wiki.sourceforge.net/

 

My days ... years really... as a mainframer covered a period when IBM did something very technically interesting (for a V/geek anyway) where mainframe disks are concerned. Back in the 1960's IBM designed a disk data architecture called Count Key Data (CKD), and it is a very different beastie from the Fixed Block Architecture (FBA) we are all used to from UNIX, Linux, and MS Windows disk formats. Follow the links on the terms for a deeper dive.

 

As the second link points out, FBA is a term that is hardly even used anymore, precisely because it is so common.

 

Modern mainframe disks from IBM, STK, and others are, as far as I have seen, all based on concepts introduced in STK's Iceberg project. That was the first place I came across them, in any case. Among other things, what the 'berg did was stuff a cabinet full of FBA SCSI disks and then put a controller in front of them that emulated CKD to the mainframe. An early version of storage virtualization. Why go to all the trouble?

 

IBM had introduced real FBA devices for the mainframe much earlier: devices like the 3310 and the 3370. They were often hated and abused by the system programming community, which was used to CKD. The reason for the dislike was mostly the granularity of addressing all the blocks on the disk. Despite this dislike (more on that in a bit), a hardware person at the IBM internal data center where I was working at the time told me that FBA was the future whether the mainframe folks liked it or not. They were partly right.

 

VM on the mainframe was into storage virtualization long before storage virtualization was cool. VM has what are called 'minidisks'. A real CKD device such as a 3350, 3380, or 3390 (in my time: there were other models) could be subdivided into many smaller virtual disks. This is not the same thing as disk slices either: nothing is written to a partition table on the disk, and the limit to the number of minidisks was equal to the number of cylinders the device had. A 3390 model 1 had 1113 cylinders, and each cylinder held 849,960 bytes. So VM could create 1112 minidisks. The 3390 model 54 (which is actually not real hardware, but something that later disk products emulated on top of FBA SCSI disks) has 65,520 cylinders.
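To put rough numbers on that, here is a quick bit of arithmetic using the cylinder counts above (849,960 bytes per cylinder being 15 tracks of 56,664 bytes each); the GB figures are approximate:

    # Rough usable capacity of a 3390, from the cylinder counts above.
    BYTES_PER_CYL = 849_960   # 15 tracks x 56,664 bytes per track

    def model_size_gb(cylinders: int) -> float:
        """Approximate capacity in decimal gigabytes."""
        return cylinders * BYTES_PER_CYL / 1_000_000_000

    print(model_size_gb(1_113))    # 3390 model 1:  ~0.95 GB
    print(model_size_gb(65_520))   # 3390 model 54: ~55.7 GB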

 

To create a minidisk meant going into the VM directory (the place that defines all the attributes for all the virtual machines that will be running on that computer) and entering the start and end cylinder of each minidisk. You had to be sure that your new minidisk did not overlap any other minidisk. When one is working with numbers like 1113, the math is pretty easy. Minidisk 1 starts at cylinder 2 and ends at cylinder 10. Minidisk 2 starts at cylinder 11 and ends... wherever. Point is, do not overlap them!
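To make the bookkeeping concrete, here is a small sketch (plain Python, not VM directory syntax; the guest and device names are made up) of the overlap check you were effectively doing in your head:

    # Each minidisk is an inclusive (start, end) cylinder extent on a volume,
    # and no two extents on the same volume may overlap.
    def overlaps(a, b):
        """True if two inclusive (start, end) extents share any cylinder."""
        return a[0] <= b[1] and b[0] <= a[1]

    minidisks = {
        "LINUX01 0191": (2, 10),
        "LINUX02 0191": (11, 500),
        "LINUX03 0191": (495, 800),   # oops -- collides with LINUX02
    }

    entries = list(minidisks.items())
    for i, (name_a, ext_a) in enumerate(entries):
        for name_b, ext_b in entries[i + 1:]:
            if overlaps(ext_a, ext_b):
                print(f"overlap: {name_a} {ext_a} and {name_b} {ext_b}")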

 

Enter FBA. The 3370-A2/B2 did not have cylinders, it had blocks: 712,752 of them per disk. The data entry in the directory became very persnickety.
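To see the granularity problem, just compare the counts of addressable units you had to keep straight (figures taken from above; the comparison is only for illustration):

    CYLS_3390_1 = 1_113       # cylinders on a 3390 model 1
    BLOCKS_3370 = 712_752     # blocks on a 3370-A2/B2

    print(BLOCKS_3370 // CYLS_3390_1)   # roughly 640 times as many numbers to juggle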

 

Here is a nifty table with all these kinds of data, to continue that deeper dive.

 

The point of all that is this: IBM responded to the customer and did not put out any more FBA disk-stuffs for the mainframe. FBA was the future, but the customers (primarily, I was told, the MVS community) wanted CKD. This means in part that IBM was on the wrong side of history when it comes to commoditization of disk storage, even if they were doing the right thing by being responsive to the customer. There are a lot fewer CKD disks sold in the world than FBA, and in fact right now I am not aware of any real CKD devices at all. CKD-looking disks are 'created' by smart storage controllers, essentially virtualized by microcode out of FBA disk arrays. The STK Iceberg led the way on that, and changed the way MF storage has been designed ever since. That is one of the reasons there was never anything past the 3390 in the IBM line of real CKD devices. Everything since then (the RAMACs, Sharks, and DS lines) uses smart storage controllers that create "virtual" CKD devices out of real FBA ones. VM and MVS don't know they are not talking to 3390's, for the most part.

 

The fact that the MF disk vendors have to emulate the old CKD architecture does have an effect on price. MF storage is far less expensive per Terabyte now than it was back then, but it is still more expensive than most of the straight FBA / SAN / NAS type solutions available today.


FCP

 

These days you can hook the mainframe to SAN disks, which are of course FBA. Last time I checked though, z/OS (the current name for MVS) could not use them at all, and while VM can see them, it cannot fully utilize them. Linux guests on the MF can, even to the point of being able to boot off them, and that takes one to a funny place in mainframe-land. If one uses SAN disks on the MF, probably to save money, then one is also not getting full advantage of the mainframe's monster I/O subsystem. In one of the oddities that is the MF, even though it still uses arcane formats like ECKD (and I imagine I'll get some hate mail for saying ECKD is arcane), there is no getting around the fact that with 256 I/O channels, multipathed devices, and so on, the MF is still the world champ for doing I/O. To the MF, doing I/O down one strand of fiber feels like drinking a real ice cream malt through a plastic coffee stirrer.

So: Sure, one can do FCP / SAN disks, but we did not, and so there was no money to be saved there for us. We needed a different way to leverage the economies of FBA. This was something VMLMAT was created to do.


NAS and Archive

 

I have spent many, many posts here talking about our various tiers of storage, and how we have built them out of commodity hardware. Most recently I have been talking about how we replaced the most mission-critical NAS server in the shop, a TruCluster, with a CentOS cluster. This has had a huge effect on our cost per Terabyte. Depending on how HA we need it to be, we are looking at anywhere from 1000 to 3000 USD per Terabyte now.

None of that is mainframe attached of course. No ESCON or FICON or even an external Fibre Channel connection (there is an internal-to-the-NAS-cluster FC SAN). To access this low cost storage requires a TCP/IP based protocol: NFS, iSCSI, or CIFS. No problem for UNIX, Linux, or MS Windows based gear, and these days, no problem for the mainframe either... as long as you are talking about an MF with Ethernet connections, and most MFs these days have the "OSA" installed. The Open Systems Adapter sits where a channel adapter used to, and provides a number of industry standard Ethernet connections so that the MF is no longer isolated from the "Network is the Computer" world. It has not been for years, actually, but the OSA has been a far better solution than what came before it, like the 3745 communications controller with an Ethernet adapter in it and a few other external, channel-attached Ethernet (or FDDI or Token Ring...) communications adapters.

 

Now put Linux on the Mainframe. Native TCP/IP needs are met, and both Linux and VM understand that protocol stack. NFS is there, it can be pretty fast, and it is inexpensive these days to use.

 

Virtualization can lead pretty easily to server sprawl, and in some cases, like ours, there is a very strong business need to keep around all sorts of versions of Linux. It is not just the multiplicity of the Linux versions either, although there is that to consider. Here are some scenarios where there need to be a number of versions of Linux around on the Mainframe:

 

  • Test copies: After Linux is installed, someone comes along and configures it to meet a special need. They install more applications, configure it all to work, and then want to save that off as a template so that others can do destructive testing against the same environment.

  • Multi-Tier application roll out: One group writes the code, another Beta tests it, another QA's it, another stages it for production and another *is* production.

  • Disaster Recovery and Business continuance: A running production environment needs to be replicated to another location.

 

It *easily* can get more complicated than that. It is no effort at all to end up with 100 running Linux Virtual Machines and have three or four times that many non-running copies staged out on the disks in case they are needed. The Mainframe disks. The very expensive Mainframe disks.

This was where Ron's vision of VMLMAT started, in part. It was not originally just a cloning tool, but an archiving tool. Once he had the ability to clone a VM to a new VM, he had what he needed to create archives. The only bad thing was that, even tarred and gzipped (tgz), an archive would still be on the expensive mainframe disks.

 

Ron had been involved from the very early days with our Linux based NAS servers, building several of them. He knew they were inexpensive and very fast, able to run NFS at near Ethernet wire speed. He wanted to be able to leverage them to store his tgz's from the mainframe, so he added that feature to VMLMAT.
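Here is a minimal sketch of the "tgz out to NAS" idea. This is not VMLMAT's actual code (VMLMAT lives on the VM side of the house); it just assumes a guest's files have been staged somewhere and that the NAS export is already mounted at /mnt/archive. All of the paths and names are hypothetical.

    import tarfile
    import time
    from pathlib import Path

    def archive_guest(guest_name: str, src_root: str, archive_root="/mnt/archive"):
        """Pack a staged guest filesystem into a gzipped tar on the NFS mount."""
        dest_dir = Path(archive_root) / guest_name
        dest_dir.mkdir(parents=True, exist_ok=True)
        stamp = time.strftime("%Y%m%d-%H%M%S")
        dest = dest_dir / f"{guest_name}-{stamp}.tgz"
        with tarfile.open(dest, "w:gz") as tar:
            tar.add(src_root, arcname=guest_name)
        return dest

    # archive_guest("LINUX01", "/stage/LINUX01")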

 

With the mainframe on one end, and the speedy NAS server on the other, it was a matter of less than 10 minutes to recall an archive stored out on NAS, unpack it onto a target VM, customize its identity, and boot it. It became cheap to keep multiple copies of various archives. It became easy to take the storage tree and mirror or otherwise rsync it to other, different NAS anywhere in the network. It also was fairly inexpensive to keep off-site tape archives of the MF archive tgz's. Nothing more special is required than being sure the archive directory is picked up by your backup system of choice: to the backup software the tgz's are just large files, and it does not care that they came from the MF at all.
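Mirroring the archive tree to another NAS really is just moving ordinary files around; here is a sketch of that step, with hypothetical host names and paths:

    import subprocess

    def mirror_archives(src="/mnt/archive/", dest="backup-nas:/export/archive/"):
        """One-way mirror of the archive tree to another NAS over ssh."""
        subprocess.run(["rsync", "-a", "--delete", src, dest], check=True)

    # mirror_archives()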

 

It is such a simple idea that it almost seems obvious in retrospect, but oddly it was not. When one works on the MF, there is sometimes a reluctance to leverage non-MF gear. It just doesn't feel as safe.

 

There is also nothing that says that this has to be NFS. That is what we used, but VMLMAT is Open Source. Adding a different protocol, say FTP, to the archiving function would be trivial.
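For example, since the archive step boils down to "get this tgz onto the storage server," an FTP variant is mostly a matter of swapping one transfer function for another. This is a hedged sketch, not VMLMAT's real hook points, and the host and credentials are made up:

    import shutil
    from pathlib import Path
    from ftplib import FTP

    def store_via_nfs(tgz_path: str, archive_root="/mnt/archive"):
        """Copy the archive onto an already-mounted NFS export."""
        shutil.copy(tgz_path, archive_root)

    def store_via_ftp(tgz_path: str, host="nas.example.com",
                      user="vmlmat", passwd="secret"):
        """Push the same archive to an FTP server instead."""
        with FTP(host) as ftp:
            ftp.login(user, passwd)
            with open(tgz_path, "rb") as f:
                ftp.storbinary(f"STOR {Path(tgz_path).name}", f)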

 

The ROI was immediate: all the MF storage we had been using to keep all the various versions, flavors, and configurations was returned for other projects, and we avoided an MF disk upgrade because of it. Only the running Linux VM's are stored on the MF disks. If there is a downside, it is that by reducing the cost of storing the archive, there is a tendency to now keep *everything*. Kind of like server sprawl: because the storage is so cheap, maintaining and cleaning the archives becomes the expensive part.

 

This is not a huge issue though: if you are getting that big then maybe you are also looking at HSM and data de-duplication as part of your data center plan, and VMLMAT's way of storing things plays right into that. Archives not recently referenced can be moved out to tape and deleted, with only one or three or however many copies your policy dictates kept.
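As a sketch of that kind of retention policy (the directory layout and the "keep three" number are assumptions for illustration, not anything VMLMAT enforces):

    from pathlib import Path

    def prune_candidates(archive_root="/mnt/archive", keep=3):
        """Yield (image, old_copy) pairs beyond the newest `keep` per image."""
        for image_dir in Path(archive_root).iterdir():
            if not image_dir.is_dir():
                continue
            copies = sorted(image_dir.glob("*.tgz"),
                            key=lambda p: p.stat().st_mtime, reverse=True)
            for old in copies[keep:]:
                yield image_dir.name, old

    # for image, path in prune_candidates():
    #     print(f"migrate to tape, then delete: {image} -> {path}")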


Best of Both Worlds

 

With VMLMAT you now have the best of both worlds: Your running, important MF Linux images are getting all the benefits of being on the mainframe, such as the bodacious I/O subsystem, and your inactive archives are stored at extremely low cost and are quickly accessible.