How should CPU utilization data collected for AIX SPLPAR machines by Perform be interpreted?

Version 6

    This document contains official content from the BMC Software Knowledge Base. It is automatically updated when the knowledge article is modified.


    PRODUCT:

    TrueSight Capacity Optimization


    COMPONENT:

    Capacity Optimization


    APPLIES TO:

    TrueSight Capacity Optimization 11.x, 10.x, BMC Performance Assurance for Virtual Servers 9.x



    QUESTION:

     
       CPU Utilization reported for AIX SPLPAR machines can depend on many factors including the number of physical processors in the CPU pool, the number of logical processors assigned to the SPLPAR, the CPU entitlement, whether the SPLPAR is running in 'capped' or 'uncapped' mode, and whether SMT is enabled or disabled.   
         
         
    • How should CPU numbers reported via the System Statistics group in Investigate be interpreted?
    • How should CPU numbers reported via TSCO Capacity Agent be interpreted?
       


    ANSWER:

     

    Legacy ID:KA312207

      
      The CPU data reported by the Capacity Agent is the same as the CPU numbers the agent reports in Investigate for the 'System Statistics' group.  
     
    Here is information about how the Capacity Agent reports System Statistics level CPU utilization in Investigate:  
     
    There are three settings that define how much CPU can be used on an SPLPAR. The first is the 'entitlement', which is the guaranteed amount of CPU that will be available to the partition. The second is the number of virtual processors allocated to the SPLPAR; the machine in the sample lparstat output in this article has an entitlement of 0.25 and has been allocated 2 virtual processors. The third is whether the machine is 'capped' or 'uncapped'. A capped SPLPAR can only consume as much CPU as its entitlement. So, if this machine were capped, it would only be able to use up to 25% of a single physical processor, and 100% utilization for this machine would mean 25% of a single processor. An uncapped SPLPAR can consume as much CPU as its virtual processors can get scheduled time on physical processors. So, a machine with 2 virtual processors (like this one) could use up to 200% of a single processor when uncapped. But if other SPLPARs in the same pool are competing for CPU resources, it might not reach 200%, because the other SPLPARs might be using the available physical processors.  
     
      Modes Capped/Uncapped: 
    Capped mode on the SPLPAR doesn't allow the partition to exceed the entitled capacity assigned to it, even if there are free resources in the processor pool.  
    Uncapped mode lets the logical partition get more processing power from the host/pool when it needs it, provided enough resources are available and other partitions are not using them.  
    Uncapped partitions have access to spare processor cycles in the shared processor pool.  
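
    The capped and uncapped limits described above can be sketched in a few lines of Python. This is an illustration only, not code from the product: the function name cpu_ceiling and its parameters are hypothetical, and the uncapped figure is the theoretical limit, which contention in the shared pool can reduce.

```python
def cpu_ceiling(entitlement: float, virtual_processors: int, capped: bool) -> float:
    """Return the theoretical CPU ceiling of an SPLPAR, in physical processors.

    Hypothetical helper for illustration; not part of any BMC or AIX API.
    """
    if capped:
        # A capped partition can never exceed its entitlement.
        return entitlement
    # An uncapped partition is limited by its virtual processors,
    # each of which can use at most one physical processor.
    return float(virtual_processors)


# Sample machine from this article: ent=0.25, 2 virtual processors.
print(cpu_ceiling(0.25, 2, capped=True))   # 0.25 (25% of one processor)
print(cpu_ceiling(0.25, 2, capped=False))  # 2.0  (200% of one processor)
```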
     
     
    So, Development decided to always represent CPU Utilization in System Statistics as being 'out of' the 'entitlement' for the machine, regardless of whether it is capped or uncapped. The reason is that the entitlement is the total amount of CPU that we know the machine can get, whereas the limit set by the number of virtual processors allocated to the SPLPAR is just a theoretical limit (from the perspective of the collector, which only knows about the local machine).  
     
    So, for '% CPU Utilization' the collector reports the '%entc' value from lparstat, capped at 100%. If %entc goes above 100% in lparstat (as it does in the sample output below), the Perform collector reports the machine as 100% utilized.  
        
        NOTE: The 'System Statistics:CPU Utilization' value reported by Perform version 7.3.00 Investigate may be reported incorrectly. See Resolution 10009171 for additional information.   
     
    Sample lparstat output:   
       System configuration: type=Shared mode=Uncapped smt=On lcpu=4 mem=8192 psize=50 ent=0.25   
    %user  %sys  %wait  %idle physc %entc  lbusy  vcsw phint   
    -----  ----  -----  ----- ----- ----- ------  ---- -----   
    65.2  29.8    0.4    4.6  1.20 479.4   32.0  1646    93   
    61.7  33.6    0.3    4.4  1.14 456.4   29.3  1496    86   
    63.7  31.5    0.3    4.6  1.19 477.6   31.5  1619    82   
    54.1  40.9    0.2    4.7  1.13 453.6   29.7  1386    69   
    64.4  30.8    0.4    4.4  1.16 465.5   30.6  1804    78   
    70.2  24.5    0.4    4.9  1.28 512.0   35.9  1964    77   
    68.4  26.2    0.5    4.8  1.19 476.2   32.0  1949    73   
    73.5  21.2    0.8    4.5  1.26 504.9   34.3  2176    81   
    57.7  37.0    0.3    5.0  1.19 476.1   31.3  1643    74   
    77.9  16.8    0.3    5.0  1.54 616.9   49.1  5199   151   
    84.0  12.9    0.6    2.4  1.81 725.7   72.7  3896   265   
    84.8  11.8    0.7    2.7  1.79 714.7   66.2  3407   187   
    80.8  14.7    0.3    4.2  1.51 605.3   47.1  2633   106   
    81.7  13.4    0.3    4.6  1.42 568.4   42.2  2604   109   
    81.8  13.0    0.2    5.0  1.47 589.8   44.8  2361   128  
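
    As a rough illustration of the capping rule, the sketch below parses two rows of the sample output and applies min(%entc, 100). The helper names are hypothetical and this is not the collector's actual implementation; the collector's value can also sit slightly below %entc when %entc is under 100%.

```python
SAMPLE = """\
%user  %sys  %wait  %idle physc %entc  lbusy  vcsw phint
-----  ----  -----  ----- ----- ----- ------  ---- -----
65.2  29.8    0.4    4.6  1.20 479.4   32.0  1646    93
61.7  33.6    0.3    4.4  1.14 456.4   29.3  1496    86
"""


def parse_lparstat(text: str) -> list[dict[str, float]]:
    """Turn lparstat interval rows into dicts keyed by column name."""
    lines = [line.split() for line in text.strip().splitlines()]
    header = lines[0]
    rows = []
    for fields in lines[1:]:
        # Skip the separator row made up only of dashes.
        if set(fields[0]) == {"-"}:
            continue
        rows.append(dict(zip(header, (float(f) for f in fields))))
    return rows


def reported_cpu_utilization(entc: float) -> float:
    """Approximate the reported value: %entc capped at 100%."""
    return min(entc, 100.0)


for row in parse_lparstat(SAMPLE):
    print(row["%entc"], "->", reported_cpu_utilization(row["%entc"]))
```

    Every row in the sample has %entc well above 100, so each one maps to 100% under this rule.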
        

    Q: How does the sum of CPU Statistics reported CPU relate to System Statistics reported CPU?

    Reporting the sum of CPU Statistics utilization should give you a value from 0% - X hundred % where 'X' is the number of Logical Processors (LPROCS). The main issue with this number would be that it would not consider the entitlement of a capped or uncapped LPAR. That would mean you'd need to understand what your real maximum was on the machine.  
     
    For example, on a capped SPLPAR with an entitlement of 0.5, while System Statistics reports from 0% to 100%, the sum of CPU Statistics utilization reports from 0% to 50%. On an uncapped SPLPAR with 2 Logical Processors and an entitlement of 0.5, while System Statistics reports from 0% to 100% (with 100% meaning 'anything above my entitlement of 50% of a single physical processor'), the sum of CPU Statistics utilization could, in theory, report anywhere between 0% and 200%. But if other partitions in the shared pool were using up the remaining processing power available over the entitlement, you'd never really know how much CPU above 50% was actually available to this partition.   
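
    The two scales in this example can be compared with a small sketch. It assumes, as the example implies, that the summed CPU Statistics figure tracks physical processor consumption (shown here as a hypothetical physc input) while System Statistics is expressed relative to the entitlement and capped at 100%. Both function names are made up for illustration.

```python
def system_stats_pct(physc: float, entitlement: float) -> float:
    """Utilization relative to entitlement, capped at 100%."""
    return min(physc / entitlement * 100.0, 100.0)


def summed_cpu_stats_pct(physc: float) -> float:
    """Utilization as a percentage of one physical processor (assumed scale)."""
    return physc * 100.0


ENTITLEMENT = 0.5
for physc in (0.25, 0.5, 1.5):
    print(physc, system_stats_pct(physc, ENTITLEMENT), summed_cpu_stats_pct(physc))
# 0.25 -> 50.0 and 25.0
# 0.5  -> 100.0 and 50.0   (the ceiling of a capped partition)
# 1.5  -> 100.0 and 150.0  (only reachable when uncapped)
```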

    Q: How does the sum of Process Statistics reported CPU relate to System Statistics reported CPU?

    Reporting the sum of Process Statistics CPU utilization should give you a value from 0% to X hundred % where 'X' is the number of Logical Processors (LPROCS). This number would not consider the entitlement and its maximum possible range would be controlled by the number of Logical Processors. For a capped SPLPAR or uncapped SPLPAR where other partitions were using the available resources beyond the entitlement in the shared pool the actual maximum might be less than 100% * the number of LPROCS.  
     
    This is how CPU Statistics reporting generally works on even normal LPARs (where there are no shared resources). Each processor can contribute from 0% to 100% so the sum of the CPU utilization will be from 0% to X hundred % and isn't normalized into a range from 0% to 100%.   

    Q: How does the System Statistics CPU data reported by Perform match with the 'ec' value reported via 'vmstat' on AIX?

    Below 100%, the CPU utilization reported by Perform should approximately match the 'ec' field reported in 'vmstat'. The match is only approximate, with the Perform value on the lower side, because %entc includes PURR ticks in wait and idle modes, and the Perform reported System Statistics "CPU Utilization" value doesn't.  
        

    Q: How are the '%user', '%sys', '%wait', and '%idle' values related to 'physc' in lparstat?

    Is there a relationship between the user/sys/wait/idle values reported and any of the other fields listed?  
     
    Sample 'lparstat' output:  
        
       System configuration: type=Shared mode=Uncapped smt=On lcpu=4 mem=8192 psize=50 ent=0.25   
    %user  %sys  %wait  %idle physc %entc  lbusy  vcsw phint   
    -----  ----  -----  ----- ----- ----- ------  ---- -----   
    65.2  29.8    0.4    4.6  1.20 479.4   32.0  1646    93   
    61.7  33.6    0.3    4.4  1.14 456.4   29.3  1496    86   
    63.7  31.5    0.3    4.6  1.19 477.6   31.5  1619    82  
       
    The relationship between physc and %entc is very simple: %entc = (physc / ent) * 100%.  
    Below 100%, %user + %sys ~= %entc. (The match is approximate because %entc includes PURR ticks in wait and idle modes.) Above 100%, there is no meaningful relationship.  
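
    The physc to %entc relationship can be checked against the three sample rows with a short sketch. The function names are hypothetical. Because lparstat prints physc to two decimal places, the recomputed value can differ from the printed %entc by a small rounding error, so the comparison uses a tolerance rather than exact equality.

```python
ENT = 0.25  # entitlement from the 'System configuration' line

# (physc, %entc) pairs copied from the sample lparstat rows
ROWS = [(1.20, 479.4), (1.14, 456.4), (1.19, 477.6)]


def entc_from_physc(physc: float, ent: float) -> float:
    """%entc = (physc / ent) * 100."""
    return physc / ent * 100.0


def rounding_tolerance(ent: float) -> float:
    """Worst-case gap caused by physc rounded to 2 decimals and %entc to 1."""
    return 0.005 / ent * 100.0 + 0.05


for physc, entc in ROWS:
    computed = entc_from_physc(physc, ENT)
    within = abs(computed - entc) <= rounding_tolerance(ENT)
    print(f"physc={physc:.2f} computed={computed:.1f} printed={entc} ok={within}")
```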
     
    Of the other fields, lbusy is positively correlated with %user+%sys. That's about as precise as you can get, because it's based on logical (non-PURR) counters and as such has no formal relationship with the PURR-based ratios. vcsw and phint are not time-related at all. 

     


    Article Number:

    000031236


    Article Type:

    FAQ/Procedural


