

Capacity reports are often trying to answer this simple question: how much of my capacity is being used, and how much is available?


It's intuitive to report that memory or CPU is 50% utilized, because we know both the amount in use and the total capacity.  But the same information is not available for disk I/O or network I/O.  What does it mean that a server is using 30 kB/second of disk I/O?  How much more I/O capacity is available before it runs out?
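To make the contrast concrete, here is a minimal Python sketch. The function name and the sample figures (8 GB of 16 GB, 30 kB/second) are illustrative assumptions, not measurements from any real server. It shows that a percentage can only be computed when the ceiling is known.

```python
from typing import Optional


def utilization_pct(used: float, capacity: Optional[float]) -> Optional[float]:
    """Return percent utilization, or None when the capacity is unknown."""
    if capacity is None or capacity <= 0:
        return None
    return 100.0 * used / capacity


# Memory: both numbers are known, so the percentage is meaningful.
memory_pct = utilization_pct(used=8.0, capacity=16.0)    # GB used of GB installed

# Disk I/O: we know the rate, but not the ceiling, so no percentage exists.
disk_io_pct = utilization_pct(used=30.0, capacity=None)  # kB/second, ceiling unknown

print(memory_pct, disk_io_pct)
```

The second call returns nothing useful, which is exactly the reporting gap: the 30 kB/second figure is real, but there is no denominator to divide it by.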


There are ways to model the maximum I/O capacity, but they require a lot more knowledge about the disk subsystem than is usually available.  You need to know what kind of transport (SCSI, iSCSI, Fibre Channel, NFS, etc.) is being used, over how many elements, and how the disk subsystem is organized at the other end.  In addition, if it's a shared disk subsystem like an array, then you also have to know what other I/O traffic is hitting it.


If you used the above information to create a model, you could theoretically come up with a maximum capacity.  Then you could use that as the threshold against which to compare the actual I/O for reporting.  But that avenue is usually closed for all practical purposes, so capacity planners (or the software they use) struggle to find a maximum value to report against.
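To give a feel for what such a model needs, here is a deliberately crude Python sketch. It assumes the ceiling is simply the narrower of the transport and the back-end disks, less whatever other servers are already consuming. That bottleneck rule, the function name, every parameter, and all the numbers are my own assumptions for illustration; a real model would have to account for much more (caching, RAID layout, I/O size, and so on).

```python
def modeled_headroom_mb_s(
    path_count: int,
    per_path_mb_s: float,
    disk_count: int,
    per_disk_mb_s: float,
    other_traffic_mb_s: float,
    own_traffic_mb_s: float,
) -> float:
    """Estimate remaining I/O headroom under a simple bottleneck assumption."""
    # Assumption: throughput scales linearly with paths and with disks.
    transport_limit = path_count * per_path_mb_s
    backend_limit = disk_count * per_disk_mb_s
    # Assumption: the narrower of the two stages caps the whole subsystem.
    ceiling = min(transport_limit, backend_limit)
    # Traffic from other servers on a shared array eats into that ceiling.
    available = max(ceiling - other_traffic_mb_s, 0.0)
    return max(available - own_traffic_mb_s, 0.0)


# Hypothetical inputs: 2 paths at 100 MB/s, 8 disks at 50 MB/s,
# 60 MB/s from other servers, 30 MB/s from this server.
headroom = modeled_headroom_mb_s(2, 100.0, 8, 50.0, 60.0, 30.0)
print(headroom)
```

Every argument here is something you would have to find out from the storage team, which is why this kind of model is rarely practical.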


There is a "fake" method of reporting, which looks at a large collection of servers and simply uses the highest I/O figure observed among them as the threshold.  This is mere fudging of the numbers.  It may make the report look nice, and many products do indeed do this, but it's of limited use, because the highest observed figure says nothing about what any given server could actually sustain.
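For illustration, the "fake" method amounts to something like the following Python sketch. The server names and kB/second figures are invented, and the function is hypothetical rather than taken from any particular product.

```python
from typing import Dict


def fleet_max_utilization(io_by_server: Dict[str, float]) -> Dict[str, float]:
    """Express each server's I/O as a percentage of the busiest server's I/O."""
    if not io_by_server:
        return {}
    # The "threshold" is just the largest number seen, not a real capacity.
    ceiling = max(io_by_server.values())
    if ceiling <= 0:
        return {name: 0.0 for name in io_by_server}
    return {name: 100.0 * io / ceiling for name, io in io_by_server.items()}


# Hypothetical observed disk I/O in kB/second.
observed = {"web01": 30.0, "db01": 1200.0, "batch01": 600.0}
report = fleet_max_utilization(observed)
print(report)
```

Note that the busiest server always shows as 100% utilized, whether or not it is anywhere near its real limit, and every other server is measured against hardware it may not even share.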


There's the obvious, thorough way of finding the maximum capacity: use I/O generation programs like Iometer to generate large amounts of I/O, and find out at what point the response time for a transaction exceeds a usable threshold like 10 seconds.  But this method, while it would work, is impractical in production environments.  Besides, it still doesn't account for all the possible external factors (like other servers doing I/O to a shared disk subsystem) that could affect the capacity of your server.
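As a sketch of how the results of such a test might be analyzed, the Python below takes a list of (offered load, response time) measurements and returns the highest load that stayed within the threshold. It does not drive Iometer or any other tool. The sample measurements are invented, and the function name and its 10-second default are assumptions for the example.

```python
from typing import List, Optional, Tuple


def saturation_point(
    samples: List[Tuple[float, float]],
    max_response_s: float = 10.0,
) -> Optional[float]:
    """Return the highest load whose response time stayed within the threshold.

    Samples are walked in order of increasing load, stopping at the first
    one that exceeds the threshold. Returns None if even the lightest
    load was too slow.
    """
    best = None
    for load, response_s in sorted(samples):
        if response_s > max_response_s:
            break
        best = load
    return best


# Hypothetical test results: (generated I/O in kB/second, response time in seconds).
samples = [(100.0, 0.2), (200.0, 0.5), (400.0, 2.0), (800.0, 14.0), (1600.0, 45.0)]
capacity = saturation_point(samples)
print(capacity)
```

The number that comes out is only valid for the conditions under which the test ran; if another server starts hammering the same array tomorrow, the real ceiling moves.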


There are other techniques that are commonly used, which I will explain in a subsequent blog post.