Initial debugging of TSCO Gateway Server (or BPA console) data collection and data transfer issues

Version 7

    This document contains official content from the BMC Software Knowledge Base. It is automatically updated when the knowledge article is modified.


    PRODUCT:

    BMC Performance Assurance for Servers


    APPLIES TO:

    TrueSight Capacity Optimization (Gateway Server/Capacity Agent) 11.x, 10.x; BMC Performance Assurance for Servers



    PROBLEM:

     

      Steps to begin debugging data collection and data transfer problems within the TrueSight Capacity Optimization (Gateway Server and Agent) product.

      Applicable products:

    • BMC TrueSight Capacity Optimization 11.x, 10.x
    • BMC Performance Assurance for Virtual Servers 9.5, 9.0, 7.5.10, 7.5.00, 7.4.10, 7.4.00, 7.3.00, 7.2.10, 7.2.00
    • BMC Performance Assurance for Servers 9.5, 9.0, 7.5.10, 7.5.00, 7.4.10, 7.4.00, 7.3.00, 7.2.10, 7.2.00

      Affected systems:

    • Computers where Capacity Agent UDR data collection and data transfer is failing

     


    SOLUTION:

     

    Generally, if using the BMC TrueSight Capacity Optimization (TSCO) Gateway Manager component, a good place to start debugging is Gateway Manager -> Gateway Reports -> Node History.  That report lists the errors reported for each computer for data collection, transfer, and processing, which makes it possible to see whether the computers whose data is not being imported into the TSCO database are failing at the data collection, transfer, or processing stage on the TSCO Gateway Server side.

      
     
    Outside of the Gateway Manager, another good place to look when debugging data collection and data transfer issues within the BMC Performance Assurance for Servers product is either (a) the UDR Collection Manager (UCM) Status Reports, which are available on the console, or (b) for TSCO 10.7.01 and earlier, the General Manager Reports, which are available via the Perceiver interface (a component dropped as of TSCO 11.0). On the Windows console the UCM Status Reports are accessible under Manager -> Status Reports. On the Unix console, open the /usr/adm/best1_default/local/manager/status/UCMStatus.html file in a local web browser, share the status directory out via a locally installed web server, or copy the directory to a Windows PC where the reports can be accessed via a web browser (see the example below). The information available within the UCM Status Reports is generally very useful for debugging data collection and data transfer problems.
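    For instance, on a Unix console the status directory can be packaged for copying to a Windows PC or, if Python 3 happens to be installed, served temporarily over HTTP. This is only a sketch and assumes the default /usr/adm/best1_default installation directory; adjust the paths for your environment.

    # Package the UCM status reports so they can be copied to a Windows PC
    cd /usr/adm/best1_default/local/manager
    tar -cf /tmp/ucm_status.tar status

    # Or, if Python 3 is available, serve the directory temporarily over HTTP
    # and browse to http://<console hostname>:8000/UCMStatus.html
    cd /usr/adm/best1_default/local/manager/status
    python3 -m http.server 8000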
     
    In general, TSCO Technical Support can comment on whether the source of the data collection problem is one of the binaries on the remote node if you send the entire contents of the $BEST1_HOME/bgs/log (for Unix) or %BEST1_COLLECT_HOME%\bgs\log (for Windows) directory from the problem remote node. The Service Daemon, Agent, and Collector logs are all in that directory, so any Perform binary problem on the remote node should be visible there.  
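    A minimal sketch of packaging that directory on a Unix agent for a support case (the archive name and the /tmp location are just examples):

    # Create a compressed archive of the agent log directory to attach to the case
    cd $BEST1_HOME/bgs
    tar -cf /tmp/`hostname`_bgs_log.tar log
    gzip /tmp/`hostname`_bgs_log.tar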
     
    When debugging a data collection or data transfer problem the main goal is to determine which binary within the process is failing.  
     
    For Manager-based data collection/transfer the likely suspects are:  
     
      On the console:   
        
    • udrCollectMgr
        
      On the agent:   
        
    • bgssd.exe
    • bgsagent
    • bgscollect
       

    Section I: Console

       
       Debugging udrCollectMgr  
      
    Information on UDR Collection Manager problems will generally be available in the UCM Status Reports or UCM log files ($BEST1_HOME/local/manager/log) on the console.   
      
    The UCM Status Reports are a great place to start debugging. If udrCollectMgr is failing there will either be no status reports generated for the day or they will look very different from days when data collection is working. For example, if all nodes are still in 'Sending data collection' status hours after data collection should have started, that is a UCM problem. In general UCM is very reliable and isn't the typical source of data collection or data transfer problems.   
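    A quick way to confirm whether UCM ran and logged anything for the day is to check the console's UCM log directory; this is only a sketch, and log file names vary by version.

    # List the most recently written UCM log files on the console
    ls -lrt $BEST1_HOME/local/manager/log | tail -10

    # Scan the newest UCM log for obvious errors (case-insensitive)
    grep -i error `ls -t $BEST1_HOME/local/manager/log/* | head -1`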
       
       

    Section II: Agent

       
       Debugging the Service Daemon  
      
    A good first place to look (beyond the UCM Status Reports) is the Service Daemon (bgssd.exe) on the remote node. This is a common point of failure because it relies on inetd (or xinetd) to accept the request and pass it on to the Service Daemon process.   
      
    The best test to identify a Service Daemon failure is to run a query against the remote node:   
         
          $BEST1_HOME/bgs/scripts/best1collect -B $BEST1_HOME -n [Remote Hostname] -q   
        
      
    A connection refused or connection timed out message would indicate a Service Daemon problem.   
      
    Information on debugging Service Daemon issues is available in KA 000032141.   
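    If the query fails, a few quick checks on the agent itself can help confirm whether the Service Daemon is registered with (x)inetd and listening. The service entries and the 10128 port shown here are typical defaults, but treat them as assumptions and verify the values against your own /etc/services.

    # On the agent: confirm the Perform Service Daemon entries exist
    grep -i bgs /etc/services
    ls /etc/xinetd.d | grep -i bgs

    # Confirm something is listening on the Service Daemon port (typically
    # 10128/tcp by default; substitute the port from /etc/services if different)
    netstat -an | grep 10128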
      
       Debugging the Perform Agent  
      
    If the Service Daemon has accepted the collection request, the next place to look is the Perform Agent on the machine. The Perform Agent log can be difficult to interpret for people new to it because many normal, everyday messages look like errors. It is generally best to compare the Perform Agent log on a problem machine against the Agent log on a good machine and look for differences.   
      
    There really isn't a good command to test the Perform Agent. The [hostname]-bgsagent_6767.log is the best source of debugging information.   
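    One simple way to make that comparison is to capture the recent tail of the agent log on a good machine and a bad machine and diff them. The output file names below are placeholders; substitute real hostnames and paths.

    # On each machine, capture the last few hundred lines of the agent log
    tail -500 $BEST1_HOME/bgs/log/`hostname`-bgsagent_6767.log > /tmp/`hostname`_agentlog.txt

    # Copy both captures to one machine and compare them
    diff /tmp/goodhost_agentlog.txt /tmp/badhost_agentlog.txt | more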
      
       Debugging the Perform Collector  
      
    Frequently messages in the Perform Agent log will indicate that the Perform Collector (bgscollect) is the source of the problem. Messages indicating that the collector is crashing (Maximum collector restarts exceeded) or has hung (spilltime not within requested range) are common indications in the Agent log of a data collection problem.   
      
    In general, the best information about what the collector is doing comes from:    
         
    • The [hostname]-bgscollect-noInstance.log file
    • The output of $BEST1_HOME/bgs/bin/Look -b $BEST1_HOME -f
        
    The Look command checks what data is being written into shared memory by the collector and will indicate when shared memory was last updated with data. Each section of data that represents a specific metric group will begin with a header like this:    
          ------------------------------------------------------------------------------    
      'Collector Information' [noInstance]  Fri Dec  1 10:30:46 2006    
      ------------------------------------------------------------------------------   
        
    That date is the last time the group was updated in shared memory by the collector. If that date is old, the collector isn't updating shared memory with new information.  
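    A short sketch for scanning the Look output and the agent log for the symptoms described above; the grep strings come from the messages quoted in this article, and the log file pattern is an example.

    # Show just the group header lines (group name, instance, last update time)
    # from the Look output; header lines contain a bracketed instance name
    $BEST1_HOME/bgs/bin/Look -b $BEST1_HOME -f | grep "\["

    # Check the agent log for the collector crash/hang messages mentioned above
    grep -i "Maximum collector restarts" $BEST1_HOME/bgs/log/*bgsagent*.log
    grep -i "spilltime" $BEST1_HOME/bgs/log/*bgsagent*.log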
      
      

    General Manager

      

    The General Manager functionality provided via the Perceiver or TSCO web UI can be very useful for identifying and debugging Agent data collection and data transfer problems.

      

    The Console Reports -> Node History is a good way to see what transfer error has been reported:

      

    Perceiver Console - Node History

      


     

      

    TSCO Console - Node History

      

    Node History in TSCO UI

      

    Clicking on the Transfer error code will provide some additional detail about the error.  A BMC Knowledgebase search on the error message will often yield a document with additional debugging suggestions related to that error code.

      

    TSCO Gateway Server Linux console command line

    Two command-line commands that can be really useful for debugging data collection and data transfer issues are: 

    # This command will report the current status of data collection for all systems in Manager runs managed by the TSCO Gateway Server 
    $BEST1_HOME/bgs/bin/udrCollectStat -D -d `date +%m-%d-%Y` -f "%v %r %d %n: %s, %ch, %ce, %ces %gc" 

    # This command will report yesterday's status for data collection and data transfer for all systems in Manager runs managed by the TSCO Gateway Server 
    $BEST1_HOME/bgs/bin/udrCollectStat -D -d `date --date=yesterday +%m-%d-%Y` -f "%v %r %d %n: %no, %s, %ch, %ce, %ces, %th, %te, %tes, %tg, %tt"  
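    Both commands print one line per system, so piping the output through grep is a quick way to isolate a single node or scan for a particular status; the hostname below is just an example.

    # Check today's collection status for a single node (hostname is an example)
    $BEST1_HOME/bgs/bin/udrCollectStat -D -d `date +%m-%d-%Y` -f "%v %r %d %n: %s, %ch, %ce, %ces %gc" | grep -i hou-remprd-08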

    Tools

      

    General Manager Lite

    The General Manager Lite reporting tool is a very good way to see the success of collection, transfer, processing, and population into TSCO.

    Information on implementing General Manager Lite reporting is here:
      000108584: Tool which summarizes the results for all Gateway Servers (using General Manager), and gathers logs for nodes with collection errors

    Custom Support Script (Deprecated)

    The Processing Status tool from Technical Support works on the Unix console and can be used to check the status of collect, transfer, Visualizer file creation, and population for active Manager runs for common Manager configurations.

    Step 1

    The tool can be downloaded here:  

      Server: ftp.bmc.com
    Username: anonymous
    Password: [Your e-mail addr]
    Location: /pub/perform/gfc/tools
    Filename: processing_status.tar.Z
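    A sketch of downloading the archive non-interactively, assuming curl is available on the console (an interactive ftp client works just as well); replace the e-mail address with your own.

    # Download the processing_status tool via anonymous FTP
    curl -u anonymous:you@example.com -o processing_status.tar.Z \
        ftp://ftp.bmc.com/pub/perform/gfc/tools/processing_status.tar.Z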

       Step 2  

    Uncompress and extract the processing_status.tar.Z into a directory on your machine.
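    For example, on most Unix systems the .tar.Z archive can be uncompressed and extracted in one step:

    # Uncompress and extract the tool into the current directory
    zcat processing_status.tar.Z | tar -xvf -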

       Step 3  

    As the user under which your Manager runs are scheduled and with the BEST1_HOME environment variable set, run ./processing_status.pl
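    A sketch of running the script, assuming the default /usr/adm/best1_default installation directory and an example extraction directory; adjust both for your environment.

    # Run as the user that owns the Manager runs, with BEST1_HOME set
    BEST1_HOME=/usr/adm/best1_default
    export BEST1_HOME
    cd /path/to/processing_status    # directory where the tool was extracted (example)
    ./processing_status.pl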

      

    # Sample output:
    #
    #     Manager Run :          Domain :               Node :  Collect : Transfer :  Process : Populate
    #
    #  -------------- :   ------------- :      ------------- : -------- : -------- : -------- : --------
    #
    #    remedy_daily :          remedy :  hou-remprd-01-int :  WARNING :       OK :      YES :      YES
    #    remedy_daily :          remedy :      hou-remprd-02 :  WARNING :       OK :      YES :      YES
    #    remedy_daily :          remedy :  hou-remprd-03-int :  WARNING :       OK :      YES :      YES
    #    remedy_daily :          remedy :      hou-remprd-04 :  WARNING :       OK :      YES :      YES
    #    remedy_daily :          remedy :  hou-remprd-05-int :  WARNING :       OK :      YES :      YES
    #    remedy_daily :          remedy :      hou-remprd-06 :  WARNING :       OK :      YES :      YES
    #    remedy_daily :          remedy :      hou-remprd-07 :  WARNING :       OK :      YES :      YES
    #    remedy_daily :          remedy :      hou-remprd-08 :    ERROR :      N/A :       NO :      N/A
    #
    #
    # The values in the 'Collect' and 'Transfer' fields come from UDR Collection
    # Manager (UCM).
    #
    # The 'Process' field reports 'YES' if the node is in the SYSTEMS table of the
    # Visualizer file created for that date.  'NO' if the node isn't in the
    # Visualizer file, or if no Visualizer file was created.
    #
    # The 'Populate' field reports 'YES' if Unix Populate is enabled for the Manager
    # run and the action after a successful population is to MOVE the Visualizer
    # file to another directory.  'NO' means Unix Populate is enabled and the MOVE
    # action is selected but the Visualizer file is still in the Manager Output
    # Directory.  'N/A' means the script doesn't know if the Visualizer file was
    # successfully populated or not.

      
    Related Products:  

    1. TrueSight Capacity Optimization
    2. BMC Performance Assurance for Servers
    3. BMC Performance Analysis for Servers

    Legacy ID: KA339449

     


    Article Number:

    000097196


    Article Type:

    Solutions to a Product Problem


