So, what's a good value to use for disk I/O thresholds?


First, the concept of I/O operations: each OS-level I/O is issued as a SCSI command, and that command, whether it's a read, a write, or a control command, is the unit of work in the I/O stack. Throughput is expressed as the number of I/O operations per second, or IOPS, so I/O thresholds are easiest to express in terms of IOPS. For our purposes, we can ignore variable-size I/Os and the tricks that can combine multiple I/Os to speed up I/O processing, and assume that you can get hold of an operating system metric that gives you average IOPS over a period.
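
For example, on Linux you can derive an average IOPS number yourself by sampling the cumulative completion counters in /proc/diskstats twice and dividing the difference by the interval. Here is a minimal sketch; it assumes the standard /proc/diskstats layout and a device name like sda:

    # Minimal sketch: average IOPS for one block device, from /proc/diskstats deltas.
    # Assumes the standard Linux layout: after major, minor, and device name, the
    # first field is reads completed and the fifth is writes completed.
    import time

    def completed_ops(device):
        with open("/proc/diskstats") as f:
            for line in f:
                fields = line.split()
                if fields[2] == device:
                    return int(fields[3]) + int(fields[7])   # reads + writes
        raise ValueError("device %r not found" % device)

    def average_iops(device, interval_seconds=60):
        before = completed_ops(device)
        time.sleep(interval_seconds)
        after = completed_ops(device)
        return (after - before) / interval_seconds

    print(average_iops("sda", interval_seconds=10))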


Now, the operating system can also monitor the total time an operation takes to travel through the I/O stack. Different operating systems call this metric by different names, but many will report an average response time over a period if asked. For a given setup, the average response time of an I/O operation can indicate whether there is congestion in the I/O stack.


Similarly, there is usually some measure of the queue length for I/O operations, i.e., the average number of operations waiting to be executed; anything over 1 indicates some congestion. Note that there may be other reasons for congestion: something may be wrong with the I/O stack, or there may be other I/O on the stack, e.g., an array shared with some other large workload. But barring these, it's possible to put together a model of what the maximum I/O threshold should be.
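
To make that concrete, here is a small sketch of a threshold check that combines the two signals. The numeric thresholds are placeholders to be tuned for your own I/O stack, not recommendations:

    # Illustrative congestion check combining average response time and queue length.
    # The threshold values are placeholders; tune them for your own setup.
    def io_congestion_reasons(avg_response_ms, avg_queue_length,
                              response_threshold_ms=10.0, queue_threshold=1.0):
        """Return a list of reasons the I/O stack looks congested; empty means healthy."""
        reasons = []
        if avg_response_ms > response_threshold_ms:
            reasons.append("average response time %.1f ms exceeds %.1f ms"
                           % (avg_response_ms, response_threshold_ms))
        if avg_queue_length > queue_threshold:
            reasons.append("average queue length %.1f exceeds %.1f"
                           % (avg_queue_length, queue_threshold))
        return reasons

    print(io_congestion_reasons(avg_response_ms=23.5, avg_queue_length=2.4))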


Nowadays, with the virtualization of servers, there is renewed focus on disk I/O as a shared, scarce resource.  The hypervisor vendors have responded with pretty good instrumentation in this area.  IBM has always had excellent instrumentation, and VMWare has improved its coverage with vSphere 4.1. VMWare calls the response time "disk latency".  In terms of what values of latency to use, we can give some rough guidelines here. (Your mileage may vary).

 

In practice, if the hosts are not pegged, command latencies in our experimental setup are governed mostly by disk speed and the I/O transport. One lab setup uses a shared Fibre Channel SAN with LUNs shared between only a few (< 10) ESX hosts at a time. We have a lightly loaded CLARiiON CX family array (a CX4-120) situated close to the servers in terms of hops, with large striped RAID groups and a 15-way DAE with fast disks.


Under these circumstances, total read or write command latencies should be well below 10 ms, showing up as zero or close to zero in the VMWare infrastructure.  They will increase if you add a significant amount of sustained and random I/O load to the same RAID group, either from one of the same servers or from a different server, but ordinarily this setup is too powerful to break a sweat.


A cheaper way to see higher command latencies is to use a directly attached disk. The ESX host is probably booting off a local SCSI disk. If you can find that disk in vCenter and create a VM hosted on a datastore on it, you will see numbers in the 20-30 millisecond range even under a light load, depending on the type of disk. You can push latencies above 50-60 ms by pounding away at the disk with IOmeter.
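
Pulling those rough figures together, a simple classification of an observed latency might look like the sketch below. The bands come straight from the numbers above and are rules of thumb for this kind of setup, nothing more:

    # Rough classification of a "disk latency" reading, using the ballpark
    # figures discussed above. These bands are rules of thumb, not VMWare limits.
    def classify_disk_latency(latency_ms):
        if latency_ms < 10:
            return "healthy: typical of a lightly loaded SAN-backed datastore"
        elif latency_ms < 30:
            return "elevated: in the range of a local disk under light load"
        elif latency_ms < 50:
            return "high: sustained random I/O is likely competing for the disks"
        else:
            return "saturated: the kind of number IOmeter can drive a local disk to"

    for sample_ms in (2, 25, 65):
        print(sample_ms, "ms ->", classify_disk_latency(sample_ms))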



Capacity reports are often trying to answer this simple question: how much of my capacity is being used, and how much is available?


It's intuitive to report that memory or CPU is 50% utilized, because that figure implies we know both the used and the total capacity. The same level of information is not available for disk I/O or network I/O. What does it mean that a server is doing 30 kB/second of disk I/O? How much more I/O capacity is available before it runs out?


There are ways to model the maximum I/O capacity, but they require much more knowledge about the disk subsystem than is usually available. You need to know what kind of transport (SCSI, iSCSI, Fibre Channel, NFS, etc.) is being used, over how many elements, and how the disk subsystem is organized at the other end. In addition, if it's a shared disk subsystem like an array, you also have to know what other I/O traffic is incident upon it.


If you used the above information to create a model, you could theoretically come up with a maximum capacity, and then use that as the threshold against which to compare the actual I/O for reporting. But since that avenue is closed for all practical purposes, capacity planners (or the software they use) struggle to find a maximum value for reporting.
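
To make the idea concrete, here is a deliberately simplified sketch of such a model. It assumes you know the spindle count, a per-disk IOPS rating, the RAID write penalty, and the read/write mix; caches, shared arrays, and transport limits are all ignored, so treat the output as a rough bound rather than a real capacity figure:

    # Deliberately simplified back-of-the-envelope model of maximum IOPS capacity.
    # Inputs are assumptions: spindle count, per-disk IOPS rating, RAID write
    # penalty, and write fraction. Caches and shared-array traffic are ignored.
    def max_iops_estimate(num_disks, per_disk_iops, raid_write_penalty, write_fraction):
        raw_iops = num_disks * per_disk_iops
        # Each logical write costs raid_write_penalty back-end operations.
        cost_per_io = (1 - write_fraction) + write_fraction * raid_write_penalty
        return raw_iops / cost_per_io

    def io_utilization_pct(observed_iops, **model):
        return 100.0 * observed_iops / max_iops_estimate(**model)

    # Example: 15 spindles rated ~180 IOPS each, RAID-5 (write penalty 4), 30% writes.
    print(round(io_utilization_pct(450, num_disks=15, per_disk_iops=180,
                                   raid_write_penalty=4, write_fraction=0.3), 1), "%")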


There is a "fake" method of reporting, which looks at a large collection of servers and simply uses the maximum observed I/O number as the threshold. This is mere fudging of the numbers: it may make the report look nice, and many products do indeed do this, but it's of limited use.
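
For what it's worth, the peer-maximum approach is trivial to implement, which is part of why products do it; here is a sketch, purely to illustrate what the reported percentages actually mean (the busiest server defines 100%, nothing else):

    # The "fake" peer-maximum method: whatever the busiest server did becomes the
    # 100% mark for everyone. Easy to compute, but the denominator is arbitrary.
    def peer_max_report(observed_iops_by_server):
        peer_max = max(observed_iops_by_server.values())
        return {server: round(100.0 * iops / peer_max, 1)
                for server, iops in observed_iops_by_server.items()}

    print(peer_max_report({"web01": 120, "db01": 900, "app01": 340}))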


There's the obvious, thorough way of finding the maximum capacity: use I/O generation programs like Iometer to generate large amounts of I/O, and find out at what point the response time for a transaction exceeds a usable threshold like 10 seconds.  But this method, while it would work, is impractical in production environments.  Besides, it still doesn't account for all the possible external factors (like other servers doing I/O to a shared disk subsystem) that could affect the capacity of your server.
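
If you could run it, the procedure would look something like the sketch below. The run_io_load callback is a hypothetical hook for whatever load generator you drive (Iometer is just one example); the lambda at the end is a toy stand-in so the sketch is self-contained:

    # Sketch of the brute-force approach: step the offered load up until measured
    # response time crosses the chosen threshold. run_io_load is a hypothetical
    # hook that drives a load generator at a given IOPS rate and returns the
    # measured average response time in seconds; it is not a real API.
    def find_max_iops(run_io_load, start_iops=100, step_iops=100,
                      max_steps=50, response_limit_s=10.0):
        target = start_iops
        for _ in range(max_steps):
            if run_io_load(target) > response_limit_s:
                return target - step_iops   # last load level with acceptable latency
            target += step_iops
        return target

    # Toy stand-in for a real load generator, purely to make the sketch runnable.
    print(find_max_iops(lambda iops: 0.5 if iops < 1200 else 30.0))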


There are other techniques that are commonly used, which I will explain in a subsequent blog post.




With VMWare resource pools, you can delegate capacity management to other admins.  Resource pools let you think about your ESX cluster as a single unit of resources, without worrying about host boundaries too much.  But they do have their peculiarities, which are a rich source of FAQs.

 

A frequent set of questions concerns the four parameters you can set on a resource pool.

 

The principle is actually straightforward:

  • Limit and Shares: control the growth of VMs, i.e., how much capacity VMs can use.
  • Reservation and Expandable: control admission, i.e., whether new VMs can be powered on or not.

 

Let's see how this principle works, with some example questions from customers:

 

Question 1.  I set Expandable = false, Limit = 150GB, Reservation = 150GB.  I already have 2 guests in the pool with a combined allocated size of 120GB.  So now, can I power on a new VM of allocated size 50GB?

 

Answer: No, because the pool has only 30GB of reserved capacity left, and it cannot borrow from its parent.

 

Question 2. Same as Question 1, but I increase the Limit to 250GB.  Now can I power on a new VM of allocated size 50GB?

 

Answer: No, because it still cannot borrow from its parent. The Limit setting has nothing to do with admission; it controls how much the existing two VMs can grow, provided their host has enough memory.

 

Question 3. Same as Question 1, but I set Expandable = true. Now can I turn on a new 50GB VM?

 

Answer: If the parent pool has at least 20GB of reserved capacity available, then the pool can borrow that on top of its own remaining 30GB reserved capacity, and let the new VM power on.
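
The rule behind these three answers fits in a few lines of code. The sketch below is only a paraphrase of the principle as described above, not VMWare's actual admission-control algorithm:

    # Paraphrase of the admission rule illustrated by Questions 1-3; this is an
    # illustration of the principle, not VMWare's actual admission-control code.
    def can_power_on(new_vm_gb, pool_reservation_gb, already_reserved_gb,
                     expandable, parent_available_gb=0):
        remaining = pool_reservation_gb - already_reserved_gb
        if new_vm_gb <= remaining:
            return True                    # fits within the pool's own reservation
        if expandable:
            # Borrow the shortfall from the parent, if the parent has it to spare.
            return (new_vm_gb - remaining) <= parent_available_gb
        return False

    # Question 1: Expandable = false, Reservation = 150GB, 120GB already used.
    print(can_power_on(50, 150, 120, expandable=False))                        # False
    # Question 2: raising the Limit changes nothing about admission.
    print(can_power_on(50, 150, 120, expandable=False))                        # False
    # Question 3: Expandable = true and the parent has at least 20GB available.
    print(can_power_on(50, 150, 120, expandable=True, parent_available_gb=20)) # True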

 

If you're not doing anything fancy, then here's some advice: just leave Expandable and Limit alone. Expandable defaults to true and Limit defaults to unlimited. This lets the pool borrow from its parent if there is capacity, and lets VMs grow if the resources are available. If you want to make sure a certain business application gets preferential treatment, set Reservation on its pool to a high number.
