TrueSight Capacity Optimization (TSCO) - Why is the CPU Run Queue Length value from the collected UDR data different from the CPU Run Queue Length value reported in Visualizer or Perceiver in Performance Assurance?

Version 2

    This document contains official content from the BMC Software Knowledge Base. It is automatically updated when the knowledge article is modified.


    PRODUCT:

    BMC Performance Assurance for Virtual Servers


    APPLIES TO:

    BMC Performance Assurance for Virtual Servers



    QUESTION:

     
        
    • BMC Performance Assurance for Virtual Servers 9.0.00, 7.5.10, 7.5.00, 7.4.10, 7.4.00, 7.3.00, 7.2.10, 7.2.00
    • BMC Performance Assurance for Servers 9.0.00, 7.5.10, 7.5.00, 7.4.10, 7.4.00, 7.3.00, 7.2.10, 7.2.00
       
        
    • Unix
    • Windows


    ANSWER:

     

    Legacy ID:KA305644

      

    The Run Queue Length values from Investigate, the PATROL console, and a Perceiver UDR data source are different from the values reported by Analyze, Predict, and Visualizer on Solaris, Tru64, HP-UX, and Windows.

      
      

    NOTE: This document was originally published as Solution SLN000000163795.

      
      


    Some operating systems do not count threads currently being serviced by a CPU as being on the Run Queue, while others do include those running threads when reporting the Run Queue.

    Platforms that do not include running processes in the reported Run Queue length are:

      
       
    • Solaris
    • Tru64
    • HP-UX
    • Windows
      


    Platforms that do include running processes in the reported Run Queue length are:

      
       
    • AIX
    • Linux
      


    When viewing the Run Queue in Investigate, the PATROL console, and Perceiver UDR data sources, the raw, OS-provided run queue value is reported. This means that data from a Solaris, Tru64, HP-UX, or Windows machine will not include the running processes in the reported value, whereas the run queue reported for AIX and Linux will include them.

    The Run Queue length reported by Visualizer/Perceiver is based on averages produced by Analyze. To make comparisons between different platforms easier, Analyze also applies a consistent definition of the Run Queue:  running processes/threads plus waiting processes/threads. The values produced by Analyze, and thus reported by Predict and Visualizer, always include any running processes/threads. Thus, if a machine is running one process that is CPU bound, the Run Queue length reported by Visualizer will be 1.0, whereas Investigate or an OS command like 'sar' or 'vmstat' may report it as zero or one (depending on the platform).

    For systems where the running process is not included in the reported CPU queue length, the formula to calculate the Analyze reported queue (running + waiting processes) is:

      
        'Measured Queue Length' + ('CPU Utilization' / 100)
      


    For example, on a Solaris machine where the Investigate reported queue length is 0.5 and CPU utilization is 250% (summed across processors, so the maximum is the number of processors * 100%), the Analyze reported queue length would be 3.0 [ 0.5 + (250 / 100) = 3.0 ].
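
    The adjustment above can be sketched in a few lines of Python. This is only an illustration of the formula as described in this article, not code from any BMC product; the function name, the platform sets, and the lowercase platform strings are assumptions made for the example.

```python
# Hypothetical helper illustrating the formula in this article.
# Platform groupings follow the lists given above.
EXCLUDES_RUNNING = {"solaris", "tru64", "hp-ux", "windows"}
INCLUDES_RUNNING = {"aix", "linux"}


def analyze_run_queue(measured_queue, cpu_util_pct, platform):
    """Return the run queue as running + waiting threads.

    measured_queue: raw, OS-reported run queue length.
    cpu_util_pct:   CPU utilization summed across processors
                    (so 250 means 2.5 processors' worth of work).
    platform:       operating system name (case-insensitive).
    """
    name = platform.lower()
    if name in EXCLUDES_RUNNING:
        # The OS value omits running threads, so add them back.
        return measured_queue + cpu_util_pct / 100.0
    if name in INCLUDES_RUNNING:
        # The OS value already counts running threads.
        return measured_queue
    raise ValueError(f"platform not covered by this article: {platform}")


if __name__ == "__main__":
    # The Solaris example from the text: 0.5 + 250 / 100
    print(analyze_run_queue(0.5, 250, "Solaris"))  # 3.0
```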

    If a processor is running at 50% utilization then that means that half the time there is a process 'running' on it and half the time it is idle. So, if you include the running process in your queue value (which we do in Analyze, Predict, and Visualizer/Perceiver), then that processor's run queue component would be 0.5. A processor at 100% will always have a process running on it, so its contribution to the run queue is 1.0.

      


    A system with only 1 thread running that required 100% CPU would report a run queue of 1. On a machine with two threads running where each wanted to spend 100% of their time on a CPU, the reported run queue would be 2, regardless of the number of CPUs on the system.
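
    The per-processor reasoning above can be expressed as a small Python sketch: each processor contributes its busy fraction to the running part of the queue. The function name and the list-of-percentages input are assumptions chosen for illustration, and waiting threads are not modeled here.

```python
def running_component(per_cpu_util_pct):
    """Sum each processor's busy fraction (utilization / 100).

    per_cpu_util_pct: iterable of per-processor utilization values, 0 to 100.
    Returns the running-thread contribution to the run queue only;
    any waiting threads would be added on top of this value.
    """
    return sum(u / 100.0 for u in per_cpu_util_pct)


if __name__ == "__main__":
    print(running_component([50]))              # one CPU at 50%  -> 0.5
    print(running_component([100]))             # one CPU at 100% -> 1.0
    # Two CPU-bound threads on a four-processor machine: still 2.0,
    # regardless of how many processors are idle.
    print(running_component([100, 100, 0, 0]))  # 2.0
```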

      

    Additional Information

      

    The Run Queue reported in Visualizer is based upon the total number of runnable threads across all processor run queues and includes the currently running threads (not just threads waiting for time on a CPU). The key point is that the "unit" of the Run Queue metric is whatever entity the operating system schedules.

    The Run Queue number reported in Visualizer should exactly match the definition of the Run Queue ('r' field) in vmstat on AIX.

    An IBM Redbook (http://www.redbooks.ibm.com/redbooks/SG244989/css/SG244989_283.html) provides the following rule of thumb: more than 5 processes in the run queue may indicate a CPU problem. How does the collected metric apply to this rule of thumb?

    The IBM document is likely referring to more than 5 threads in the run queue per processor, not the raw number of threads in the run queue. For example, on a 12-processor system where 12 threads are running and each has unfettered access to a CPU, the Run Queue reported by the 'vmstat' command and by Analyze/Investigate/Visualizer for AIX would be 12, yet there would be no CPU problem in that scenario, just runnable threads getting CPU time.
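
    Under the per-processor reading suggested above, the rule of thumb can be checked by dividing the reported Run Queue by the processor count. The Python sketch below is illustrative only; the function names are hypothetical, and the default threshold of 5 comes from the rule of thumb quoted above, interpreted per processor as this article assumes.

```python
def run_queue_per_cpu(run_queue, n_cpus):
    """Normalize a run queue (running + waiting threads) by processor count."""
    if n_cpus <= 0:
        raise ValueError("n_cpus must be positive")
    return run_queue / n_cpus


def exceeds_rule_of_thumb(run_queue, n_cpus, threshold=5.0):
    """True if the per-processor run queue is above the threshold."""
    return run_queue_per_cpu(run_queue, n_cpus) > threshold


if __name__ == "__main__":
    # 12 runnable threads on 12 processors: 1.0 per processor, no concern.
    print(run_queue_per_cpu(12, 12))      # 1.0
    print(exceeds_rule_of_thumb(12, 12))  # False
```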

      
    Related Products:  
       
    1. BMC Performance Predictor for Servers
    2. BMC Performance Analyzer for Servers

     


    Article Number:

    000367751


    Article Type:

    FAQ/Procedural


