

So, what's a good value to use for disk I/O thresholds?


First, the concept of an I/O operation: each OS-level I/O is issued as a SCSI command, and that command, whether it's a read, a write, or a control command, is the unit of work in the I/O stack. Throughput is expressed as the number of I/O operations per second, or IOPS, so I/O thresholds are easiest to express in terms of IOPS. For our purposes, we can ignore variable-size I/Os and the tricks used to coalesce multiple I/Os into fewer, larger ones, and assume that you can get hold of an operating-system metric that gives you average IOPS over a period.
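
For example, on a Linux host or guest you can derive average IOPS yourself from the cumulative counters in /proc/diskstats. This is just a minimal sketch, assuming the standard Linux field layout; the device name (sda) and the sampling interval are placeholders, not anything specific to the setup described here:

```python
#!/usr/bin/env python
"""Rough IOPS sampler: read /proc/diskstats twice and report the average
number of I/O operations per second over the interval (Linux only)."""
import time

DEVICE = "sda"        # example device name; substitute your own
INTERVAL = 10.0       # seconds between samples

def completed_ops(device):
    """Return cumulative (reads completed + writes completed) for a device."""
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            if fields[2] == device:
                reads_completed = int(fields[3])
                writes_completed = int(fields[7])
                return reads_completed + writes_completed
    raise ValueError("device %s not found in /proc/diskstats" % device)

before = completed_ops(DEVICE)
time.sleep(INTERVAL)
after = completed_ops(DEVICE)

print("average IOPS on %s over %.0f s: %.1f"
      % (DEVICE, INTERVAL, (after - before) / INTERVAL))
```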


Now, the operating system can also measure the total time an operation takes to travel through the I/O stack. Different operating systems, of course, call it by different names, but many will report an average response time over a period if asked. For a given setup, the average response time of an I/O operation is a good indication of whether there is congestion in the I/O stack.
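
The same Linux counters also accumulate the milliseconds spent on reads and writes, so dividing the delta of that time by the delta of completed operations gives an average response time over the interval, roughly what iostat reports as await. A hedged sketch along those lines, again with a placeholder device name:

```python
#!/usr/bin/env python
"""Rough average-response-time sampler for one device (Linux only).
Average latency ~= delta(ms spent on I/O) / delta(I/Os completed)."""
import time

DEVICE = "sda"    # example device name
INTERVAL = 10.0   # seconds between samples

def sample(device):
    """Return (I/Os completed, milliseconds spent on reads + writes)."""
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            if fields[2] == device:
                ops = int(fields[3]) + int(fields[7])     # reads + writes completed
                ms = int(fields[6]) + int(fields[10])     # ms reading + ms writing
                return ops, ms
    raise ValueError("device %s not found in /proc/diskstats" % device)

ops1, ms1 = sample(DEVICE)
time.sleep(INTERVAL)
ops2, ms2 = sample(DEVICE)

completed = ops2 - ops1
if completed:
    print("average response time on %s: %.2f ms"
          % (DEVICE, (ms2 - ms1) / float(completed)))
else:
    print("no I/O completed during the interval")
```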


Similarly, there is usually some measure of the queue length for I/O operations, i.e., the average number of operations waiting to be executed; anything over 1 indicates some congestion. Note that there may be other reasons for congestion: something may be wrong with the I/O stack, or other I/O may be running against it, e.g., an array shared with some other large workload. But barring these, it's possible to put together a model of what the maximum I/O threshold should be.
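
On Linux, an average queue length can be derived from the "weighted milliseconds spent doing I/O" counter: its delta divided by the elapsed time in milliseconds is the average number of operations in flight, which is roughly how iostat arrives at avgqu-sz. A small sketch of that calculation, with the same placeholder device name:

```python
#!/usr/bin/env python
"""Rough average-queue-length sampler (Linux only).
avg queue length ~= delta(weighted ms doing I/O) / elapsed ms."""
import time

DEVICE = "sda"    # example device name
INTERVAL = 10.0   # seconds between samples

def weighted_ms(device):
    """Return the cumulative weighted milliseconds spent doing I/O."""
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            if fields[2] == device:
                return int(fields[13])
    raise ValueError("device %s not found in /proc/diskstats" % device)

w1 = weighted_ms(DEVICE)
time.sleep(INTERVAL)
w2 = weighted_ms(DEVICE)

avg_queue = (w2 - w1) / (INTERVAL * 1000.0)
print("average queue length on %s: %.2f" % (DEVICE, avg_queue))
if avg_queue > 1.0:
    print("queue length above 1 -- some congestion in the I/O stack")
```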


Nowadays, with the virtualization of servers, there is renewed focus on disk I/O as a shared, scarce resource. The hypervisor vendors have responded with pretty good instrumentation in this area. IBM has always had excellent instrumentation, and VMware has improved its coverage with vSphere 4.1. VMware calls the response time "disk latency". In terms of what latency values to use, we can give some rough guidelines here (your mileage may vary).

 

In practice, in our experimental setup, if the hosts are not pegged, command latencies are governed mostly by the disk speed and the I/O transport. One lab setup uses a shared Fibre Channel SAN with LUNs shared between only a few (< 10) ESX hosts at a time. We have a lightly loaded CLARiiON CX family array (a CX4-120) situated close to the servers in terms of hops, with large striped RAID groups and a 15-way DAE with fast disks.


Under these circumstances, total read or write command latencies should be well below 10 ms, showing up as zero or close to zero in the VMware infrastructure. They will increase if you add a significant amount of sustained, random I/O load to the same RAID group, either from one of the same servers or from a different server, but ordinarily this setup is too powerful to break a sweat.
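
To make that concrete, here is a small, purely illustrative sketch that classifies an observed average latency against the ballpark figures discussed in this post; the cutoffs are rough guidelines for this kind of setup, not universal constants:

```python
def classify_disk_latency(latency_ms):
    """Classify an average disk latency (ms) against the rough guidelines
    discussed here; the cutoffs are illustrative, not universal."""
    if latency_ms < 10:
        return "healthy: typical of a lightly loaded SAN-backed datastore"
    elif latency_ms < 50:
        return "elevated: e.g., a local disk or a busier RAID group"
    else:
        return "heavy: the kind of number seen while pounding a local disk"

for value in (2, 25, 60):
    print("%3d ms -> %s" % (value, classify_disk_latency(value)))
```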


A cheaper way to see higher command latencies is to use a directly attached disk. The ESX host is probably booting off a local SCSI disk. If you can find that disk in vCenter and create a VM hosted on a datastore on it, you can see numbers in the 20-30 millisecond range under even a light load, depending on the type of disk. You can push it above 50-60 ms by pounding away at the disk with IOmeter.
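
If IOmeter isn't handy, even a crude loop of small random writes with fsync will drive latencies up on a local disk. The sketch below is a rough stand-in, not a substitute for a real load generator, and the file path, file size, and write count are all just examples:

```python
#!/usr/bin/env python
"""Crude random-I/O load generator: random 4 KB writes to a scratch file,
fsync'd after each write so they actually reach the disk."""
import os
import random

PATH = "scratch.dat"            # example path on the datastore under test
FILE_SIZE = 256 * 1024 * 1024   # 256 MB scratch file
BLOCK = 4096                    # 4 KB writes
WRITES = 10000

# Pre-allocate the scratch file so the random seeks land on real offsets.
with open(PATH, "wb") as f:
    f.truncate(FILE_SIZE)

with open(PATH, "r+b") as f:
    for _ in range(WRITES):
        offset = random.randrange(0, FILE_SIZE // BLOCK) * BLOCK
        f.seek(offset)
        f.write(os.urandom(BLOCK))
        f.flush()
        os.fsync(f.fileno())    # force the write through to the disk

os.remove(PATH)
print("done: %d random %d-byte writes" % (WRITES, BLOCK))
```

Run it inside the VM on the local-disk datastore while watching the disk latency charts in vCenter, and you should see the latencies climb.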