Tool which summarizes the results for all Gateway Servers (using General Manager), and gathers logs for nodes with collection errors

Version 32

    This document contains official content from the BMC Software Knowledge Base. It is automatically updated when the knowledge article is modified.


    PRODUCT:

    TrueSight Capacity Optimization


    COMPONENT:

    Capacity Optimization


    APPLIES TO:

    TrueSight Capacity Optimization (Gateway Server) 11.5, 11.3.01, 11.0, 10.7, 10.5, 10.3, 10.0



    PROBLEM:

     

    TrueSight Capacity Optimization (TSCO) Gateway Server Automation tools and techniques
     

     


    SOLUTION:

     

    For the TrueSight Capacity Optimization (TSCO) Gateway Server, the General Manager Lite utility is designed to monitor the collection, transfer, processing, and population status of nightly Manager runs and provide a time series view of the stability of the environment over the last 30 days (by default).  It can also track the progression of that data into the TSCO data warehouse if TSCO has been implemented.

      


    The benefits of the BCO_BPAStatusAndRecoveryManager.pl script are:

      
       
    • The script prompts you for the information necessary to implement General Manager Lite (rather than requiring a manual setup, which is described below)
    • The script will automate the execution of the General Manager scripts via the BPA pcron facility
      

     

      

    IMPLEMENTATION VIA THE BCO_BPAStatusAndRecoveryManager.pl SCRIPT

      


    Execute the $BEST1_HOME/bgs/scripts/BCO_BPAStatusAndRecoveryManager.pl script and enter the requested information:

      

    >> Enter BPA Console Name[s] (Multiple Consoles comma separated( Example console1,console2))  (Current Value=localhost)

      

    General Manager Lite (GMLite) must run on a Linux system where the BPA console is installed, but it can communicate with all BPA Unix, Linux, and Windows consoles in your environment in order to build a centralized view of your BPA data processing.  For this prompt, specify a list of BPA consoles for GMLite to contact to obtain BPA data processing information on a nightly basis.

      

    >> Enter Daily Script Execution Time (HH:MM)   (Current Value=20:00)

      

    This prompt sets the time at which the GMLite scripts are executed each day.  By default the scripts execute at 8 PM.  Choose a time that is (a) after your last Manager run has finished processing for the day, (b) after the data import into TSCO is complete (if applicable), and (c) suitable for attempting recovery populates of data into TSCO (if applicable).

      

    >> Enter BPA GeneralManager Port  (Current Value=10129)

      

    The default is port 10129, which is not commonly customized.

      

    >> Enter BPA Output Directory Where the Data will be put  (Current Value=$BEST1_HOME/local/manager/status/GeneralManagerLite)

      

    Specify where the GMLite output should be written on this console if you don't want to use the default location.

      

    >> Enter gnuplot install directory (Do not specify anything if you wish use one in your path)  (Current Value=undefined)

      

    General Manager Lite can create a web page that includes charts reporting the number of computers configured, collected, transferred, processed, populated into the BPA database, and imported into TSCO.  This functionality requires that the 'gnuplot' utility be installed on your TSCO Gateway Server console.  If gnuplot is installed and you would like to enable this functionality, specify the gnuplot location here.  On Linux the default installation path for gnuplot is /usr/bin.

    NOTE that this is legacy functionality that is not typically used.

      

    >> Enter are you configuring BCO BPA ETL Status reporting [Y|N] (You will need ORACLE_HOME, DSN, user name and password)  (Current Value=N)

      

    If you are importing BPA data into BCO, General Manager Lite can be configured to monitor the success rate of the import of the VIS files into TSCO.  Answer 'Y' if GMLite should be configured to monitor TSCO population success.

      

    >> Enter BCO Oracle DSN (must be configured via tnsnames.ora (see http://www.orafaq.com/wiki/Tnsnames.ora))  (Current Value=undefined)

      

    When integrated with TSCO, supply the TNS Name of your TSCO Database (as defined in the $ORACLE_HOME/network/admin/tnsnames.ora file).

      

    >> Enter BCO ORACLE_HOME  (Current Value=undefined)

      

    When integrated with TSCO, supply the path to your Oracle Client installation on the TSCO Gateway Server console.  If you are using Unix Populate this can be the same path specified in the $BEST1_HOME/local/setup/MpopulateOracleHome.loc file.

      

    >> Enter BCO Oracle Password (Displayed encrypted)   (Current Value=undefined)

      

    When integrated with TSCO, supply the password for the BCO_OWN database user (schema owner).

      

    >> Enter BCO Oracle User Name  (Current Value=undefined)

      

    When integrated with TSCO, supply the TSCO database account that owns the TSCO installation (by default 'BCO_OWN').  In older TSCO installations this may be CPIT_OWN.  You can validate by checking in the TSCO web interface under Administration -> System -> Configuration -> General -> Database Username (Schema Owner).

      

    >> Enter Number of Days to recover starting from today  (Current Value=2)

      

    NOTE: This parameter is associated with a deprecated configuration of the TSCO Gateway Server VIS parser ETL that is generally not used so the default value can be selected.

    When integrated with TSCO, this is the number of days that General Manager Lite should look back for recovery of failed TSCO Gateway Server data imports into TSCO.  The metric "CPU_UTIL" is checked for each TSCO entity and if it's entirely missing for the day, the entity is identified as one which needs recovery.  Note that this method will not flag partial data imports, incorrect system entity types, or partial data collection failures.  Its design is intended to identify systems for which there was absolutely no data imported for that day.
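The all-or-nothing nature of this check can be pictured with a short sketch. It is not the GMLite implementation: the real check runs against the TSCO database, whereas this example assumes the day's CPU_UTIL samples have already been loaded into a dictionary keyed by entity name. The function name and the host names are hypothetical.

```python
def entities_needing_recovery(cpu_util_samples):
    """Return entities that have no CPU_UTIL samples at all for the day.

    cpu_util_samples maps an entity name to the list of CPU_UTIL values
    imported for that day (an assumed, simplified data layout).
    """
    return sorted(
        name for name, samples in cpu_util_samples.items() if not samples
    )


# Hypothetical data: hostB has no data, hostC has only a partial day.
day = {
    "hostA": [12.5, 14.0, 11.2],
    "hostB": [],
    "hostC": [9.1],
}

print(entities_needing_recovery(day))  # ['hostB']
```

Note that hostC, with only a partial day of data, is not flagged, which matches the limitation described above.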

      

    >> Enter Number per day of top BPA visualizer file errors to recover  (Current Value=10)

      

    NOTE: This parameter is associated with a deprecated configuration of the TSCO Gateway Server VIS parser ETL that is generally not used so the default value can be selected.

    When integrated with TSCO, this is the number of Visualizer files that General Manager Lite should attempt to recover each day.  The reason to specify a limit is to throttle the amount of recovery activity to prevent recovery populates from interfering with the nightly load of BPA data into BCO.

      

    >> Enter BPA vis file directory  (Current Value=undefined)

    NOTE: This parameter is associated with a deprecated configuration of the TSCO Gateway Server VIS parser ETL that is generally not used so the default value can be selected.

    When integrated with TSCO, this is the archive directory where the TSCO Gateway Server Visualizer files are to be copied by General Manager Lite when they need to be recovered by the TSCO Gateway Server VIS parser Recovery ETLs configured in TSCO.

      

     

      

    Sample output of configuring the script

      

     

      

    > $BEST1_HOME/bgs/scripts/BCO_BPAStatusAndRecoveryManager.pl
    INFO: Using path /usr/adm/best1_9.0.00/bgs/scripts/BCO_BPAStatusAndRecoveryManager.pl
    Info: Using BEST1_HOME=/usr/adm/best1_9.0.00
    Info : reading input file /usr/adm/best1_9.0.00/local/setup/BCO_BPAStatusAndRecoveryManager.opt
    Please answer the following questions regarding the operation of GeneralManagerLite
    Enter BPA Console Name[s] (Multiple Consoles comma seperated( Example console1,console2))  (Current Value=localhost)
    [ Hit Return to Accept Current Value ]) ?vl-hou-cus-sp55.bmc.com
    Enter Daily Script Execution Time (HH:MM)   (Current Value=20:00)
    [ Hit Return to Accept Current Value ]) ?08:00
    Enter BPA GeneralManager Port  (Current Value=10129)
    [ Hit Return to Accept Current Value ]) ?
    Current Value=10129 kept
    Enter BPA Output Directory Where the Data will be put  (Current Value=$BEST1_HOME/local/manager/status/GeneralManagerLite)
    [ Hit Return to Accept Current Value ]) ?
    Current Value=$BEST1_HOME/local/manager/status/GeneralManagerLite kept
    Please answer the following questions regarding the operation of BCO_BPAtimeAnalysisWebPageCreate
    Enter gnuplot install directory (Do not specify anything if you wish use one in your path)  (Current Value=undefined)
    [ Hit Return to Accept Current Value ]) ?/usr/bin
    Enter are you configuring BCO BPA ETL Status reporting [Y|N] (You will need ORACLE_HOME, DSN, user name and password)  (Current Value=N)
    [ Hit Return to Accept Current Value ]) ?Y
    Please answer the following questions regarding the operation of BCOStatus
    Enter BCO Oracle DSN (must be configured via tnsnames.ora (see http://www.orafaq.com/wiki/Tnsnames.ora))  (Current Value=undefined)
    [ Hit Return to Accept Current Value ]) ?ORA112DB_SP71
    Enter BCO ORACLE_HOME  (Current Value=undefined)
    [ Hit Return to Accept Current Value ]) ?/data1/oracle/product/11.2.0/client_1
    Enter BCO Oracle Password (Displayed encrypted)   (Current Value=undefined)
    [ Hit Return to Accept Current Value ]) ?BmcCapac1ty_OWN
    Enter BCO Oracle User Name  (Current Value=undefined)
    [ Hit Return to Accept Current Value ]) ?BCO_OWN
    Please answer the following questions regarding the operation of BCORecover
    Enter Number of Days to recover starting from today  (Current Value=2)
    [ Hit Return to Accept Current Value ]) ?
    Current Value=2 kept
    Enter  Number per day of top BPA visualizer file errors to recover  (Current Value=10)
    [ Hit Return to Accept Current Value ]) ?
    Current Value=10 kept
    Enter BPA vis file directory  (Current Value=undefined)
    [ Hit Return to Accept Current Value ]) ?/data1/best1data/bcovisrecover
    Warning : The directory does not exist, attempting to create /usr/adm/best1_9.0.00/local/manager/status/GeneralManagerLite
    Info : testing BPA console=vl-hou-cus-sp55.bmc.com
    OS=Linux
    Info : Running /data1/oracle/product/11.2.0/client_1/bin/tnsping ORA112DB_SP71
    Info :
    TNS Ping Utility for Linux: Version 11.2.0.1.0 - Production on 17-SEP-2013 09:52:16

    Copyright (c) 1997, 2009, Oracle.  All rights reserved.

    Used parameter files:


    Used TNSNAMES adapter to resolve the alias
    Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = vl-sjc-cus-sp71.labs.bmc.com)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = ORA112DB)))
    OK (650 msec)

    Info : BCO ETL will be queried
    pcrontab: can't find task ID in your pcrontab file.
    Info : unable run pcrontab to unschedule : /usr/adm/best1_9.0.00/bgs/scripts/pcrontab.sh -unschedule 03 : return 1536
    Info : no runs to unschedule
    Info : Scheduling : /usr/adm/best1_9.0.00/bgs/scripts/pcrontab.sh -schedule 03 "00 08 * * * /usr/adm/best1_9.0.00/bgs/scripts/BCO_BPAStatusAndRecoveryManager.pl -r > /usr/adm/best1_9.0.00/local/manager/log/BCO_BPAStatusAndRecoveryManager.log 2>&1"
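The scheduling line at the end of this sample output shows the 08:00 execution time entered at the prompt being passed to pcrontab as the cron-style fields "00 08 * * *". The conversion can be sketched as below. This is only an illustration of the field ordering seen in the sample output, not the script's own code, and the function name is hypothetical.

```python
def daily_cron_fields(hhmm):
    """Convert an HH:MM time into the five cron fields for a daily job."""
    hour_text, minute_text = hhmm.strip().split(":")
    hour, minute = int(hour_text), int(minute_text)
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError(f"not a valid 24-hour time: {hhmm!r}")
    # Field order is minute, hour, day of month, month, day of week.
    return f"{minute:02d} {hour:02d} * * *"


print(daily_cron_fields("08:00"))  # 00 08 * * *
print(daily_cron_fields("20:00"))  # 00 20 * * *
```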

      

     

      

    GENERAL MANAGER LITE WEB PAGE OUTPUT

      

    NOTE: This section describes deprecated functionality of the General Manager Lite reporting that is not typically used.  The typical use case of the GMLite reporting is the daily status e-mail that it sends.

      

    Below the 'BPA console' refers to the TSCO Gateway Server console.

    If gnuplot is available on the BPA Linux console where General Manager Lite is scheduled, then it will automatically create some web reports that summarize the data collection, transfer, processing, populate, and BCO import success of your BPA environment.

      

    The reports are created by default in the $BEST1_HOME/local/manager/status/GeneralManagerLite/BCO_BPAWebReport directory, and can be viewed via a local web browser running on your Linux console or shared out via a web server running on the BPA console.

      

    Below is a sample of the charts from three different BPA consoles.  The breakdown of the available data is:

      
       
    • Red line -- The number of configured BPA computers.  This is the number of computers included in domain/policy files in an active BPA Manager run
    • Green line -- The number of computers that successfully collected data in the environment
    • Blue line -- The number of computers that successfully transferred data back to the BPA console
    • Purple line -- The number of computers that were successfully processed and included in a Visualizer file by the BPA console
    • Cyan line -- The number of computers that were successfully imported into BCO by the BPA ETLs
      

    User-added image

      

    DETAILED INFORMATION ABOUT THE UNDERLYING GENERAL MANAGER LITE SCRIPTS

      

     

      

    In order to obtain this functionality, the following are required:

      

    (1) A Unix console, 7.5.10 or later; for 9.5 and later, this must be a Linux console

      

    (2) The Perl script GeneralManagerLite.pl and an updated GeneralManagerClient (this enhancement is recorded as QM001745812).  These are available as part of the 7.5.10 UNIX console patches, beginning with June 2012.

      

    Additional updates made after June 2012 (QM001764244) are included in the 7.5.10 SP2 console patch from December 2012.  Further enhancements made after December 2012, including support for Windows consoles (QM001781969 and QM001779687), were included in 7.5.12 Cumulative Patch 2 (May 2013).

      

    An enhancement to support environments where there are multiple manager runs per day was introduced by fix QM001850974, first available in 9.5 SP1 from August 2014.

      

    For 9.0.00, install 9.0 SP4 or later.

      


    The script only needs to be run from one of your consoles (UNIX only). It will gather processing statistics from all of your BPA (UNIX and/or Windows) consoles.

    What the tool does:

      


    1. Identifies all the nodes in your environment (this is output as allNodes.csv file)
    2. Identifies all your manager run/domain mappings (this is output as domain2ManagerMap.csv file)
    3. Identifies all duplicate nodes in your environment (this is output as duplicateNodes.csv file)
    4. Identifies all failed nodes and categorizes them into collect, transfer and processing errors (this is output as failedNodes.csv and failedNodesNoAgent.csv files)
    5. Gets all the remote agent logs for collect, transfer and processing errors (including proxy collection).

    How to run the tool manually: 

      

    $BEST1_HOME/bgs/scripts/GeneralManagerLite.pl -c <Console Name> [-o <Output Directory > -p <General Manager Port>  -l  -d -i <manager run pattern>] 

      

     Where: 

      

       Required:

      

        Console Name                  BPA console with GeneralManagerServer running or a comma-separated list of BPA consoles

      

       Optional: 

      

        Output Directory                Output Directory where results will be deposited : default is the current directory 

      

        GeneralManagerPort       General Manager Port : default 10129 

      

        -l                                           Get the Remote Agent and Proxy Logs for detailed analysis (warning this can take a lot of disk space) : Default 

      

        -d                                          Save results in date-stamped directories for the last 30 days (recommended configuration)

      

        -i                                           Ignore/remove results for manager runs which match the pattern specified; multiple patterns may be specified by using a comma
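The options above can be assembled programmatically, for example when wrapping the script in your own scheduler. The following is a minimal sketch that only builds the argument list from the flags documented in this article (-c, -o, -p, -l, -d, -i); it does not run the script. The function name and the example output directory are hypothetical.

```python
import os


def build_gmlite_command(consoles, output_dir=None, port=None,
                         get_logs=False, date_stamped=False,
                         ignore_patterns=None, best1_home=None):
    """Build the GeneralManagerLite.pl argument list from documented flags."""
    if not consoles:
        raise ValueError("at least one console name is required (-c)")
    # Fall back to the BEST1_HOME environment variable when not given.
    best1_home = best1_home or os.environ.get("BEST1_HOME", "$BEST1_HOME")
    argv = [f"{best1_home}/bgs/scripts/GeneralManagerLite.pl",
            "-c", ",".join(consoles)]
    if output_dir:
        argv += ["-o", output_dir]
    if port is not None:
        argv += ["-p", str(port)]
    if get_logs:
        argv.append("-l")
    if date_stamped:
        argv.append("-d")
    if ignore_patterns:
        argv += ["-i", ",".join(ignore_patterns)]
    return argv


cmd = build_gmlite_command(
    ["console1", "console2"],
    output_dir="/tmp/gmlite",          # hypothetical output directory
    port=10129,
    date_stamped=True,
    best1_home="/usr/adm/best1_9.0.00",
)
print(" ".join(cmd))
```

The resulting list can be passed to a process launcher of your choice. Keeping the arguments as a list rather than a single string avoids shell quoting problems with console lists or patterns.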

      

    Note that the BCO_BPAStatusAndRecoveryManager implementation method described above is just a semi-automated method for running this script and supplying the necessary input parameters.

      


    Instructions for using the manually run script:

    1. Find a location with a considerable amount of disk space if you are planning to acquire the optional log files (using -l).

      


    If you have specified that you want the collect logs, allow about 7 MB per node.  For example, 1000 nodes with collect failures produce roughly 7 GB of logs, so you will need at least 10 GB of free space.

      


    2.  Run $BEST1_HOME/bgs/scripts/GeneralManagerLite.pl -c <Console Name> [-o <Output Directory > -p <General Manager Port>  -l -d ] 

      


    The script will generate a subdirectory for each BPA console which contains all relevant csv and log files.  Then the files are zipped.  These files can be sent to Customer Support as a summary of the entire BPA console environment.

    It can take a while for the script to run (at 10 seconds per node, 1000 failing nodes will take about 3 hours). The more failures there are, the longer the processing takes.
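The sizing figures quoted in these instructions (about 7 MB of logs per failing node and about 10 seconds of run time per failing node) can be turned into a quick planning estimate. This is a rough sketch based only on those two approximate figures; actual values will vary by environment, and the function names are hypothetical.

```python
from datetime import timedelta

LOG_MB_PER_NODE = 7     # approximate log size per node quoted in this article
SECONDS_PER_NODE = 10   # approximate processing time per failing node


def estimate_log_space_gb(failing_nodes):
    """Approximate disk space for the logs, in decimal gigabytes."""
    return failing_nodes * LOG_MB_PER_NODE / 1000


def estimate_runtime(failing_nodes):
    """Approximate run time for the given number of failing nodes."""
    return timedelta(seconds=failing_nodes * SECONDS_PER_NODE)


print(estimate_log_space_gb(1000))  # 7.0
print(estimate_runtime(1000))       # 2:46:40
```

For 1000 failing nodes this gives about 7 GB of logs, which is why at least 10 GB of free space is advised, and a run time of a little under 3 hours.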

    Only "today" is captured when the script is run.  If you want to keep track of "history", you can set this up by specifying the -d flag (automatically keeps the last 30 days of results in date-stamped directories and removes data older than 30 days).  You should schedule the script to run every day, but note that it will overwrite the results from the previous day unless the -d option is used (or you manually archive the files).

      

    If you have "special" manager runs, such as ones with no data collection where data is simply being reprocessed, you should remove these from the output by using the -i option.  Otherwise, they will produce incorrect results since they don't have the full complement of activities occurring.  Note that this is implemented by using a pattern match so that you don't have to specify the full names of manager runs.
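The effect of a pattern-based ignore list can be sketched as follows. The article does not state the exact matching rules used by the -i option, so this example assumes that each comma-separated pattern is treated as a regular expression matched anywhere in the manager run name. The function name and the run names are hypothetical.

```python
import re


def split_ignored(run_names, ignore_option):
    """Split run names into (kept, ignored) using comma-separated patterns.

    Assumption: a run is ignored if any pattern matches anywhere in its name.
    """
    patterns = [re.compile(p) for p in ignore_option.split(",") if p]
    kept, ignored = [], []
    for name in run_names:
        if any(p.search(name) for p in patterns):
            ignored.append(name)
        else:
            kept.append(name)
    return kept, ignored


# Hypothetical manager run names.
runs = ["nightly_unix", "nightly_windows", "reprocess_q1", "test_rerun"]
kept, ignored = split_ignored(runs, "reprocess,test")
print(kept)     # ['nightly_unix', 'nightly_windows']
print(ignored)  # ['reprocess_q1', 'test_rerun']
```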

      

     NOTE:  If you have a Windows-console-only installation, you can use a Linux VM to do a BPA console install in order to run the script.  The console does not need to be running any Manager runs.

      

     

      

     

      

     

      

    INTERPRETING GENERAL MANAGER LITE OUTPUT FILES

      

     

      

     

      

    "Nodes" which didn't get successfully put into the database for a particular day are divided between failedNodes.csv and failedNodesNoAgent.csv because the type of followup required is likely to be different between the two groups of nodes.

      

    The error code associated with each node's status is provided in the .csv file: C means collect failure, T means data transfer failure, P means processing (no data created for input to the CDB) failure

      

    The error code numbers are available through this document (see attached spreadsheet or 000097173), and are detailed in the associated logs for that node (if requested using -l).

      

    This enables a summary level understanding of how many failures there are for the date, and how many nodes have the same kind of failure.  The purpose is to provide a convenient way to troubleshoot groups of nodes rather than doing them one at a time.  The details for each node are available in the associated log (if requested via -l), so low level reporting is fully supported as well.

      

    failedNodesNoAgent.csv lists all nodes with Collect errors 91, 92, or 94:

      

    91       Error SD_COMM_BAD_HOST          Service daemon invalid host name provided (can not find server or DNS error)  The agent name is not known by the OS.

      

    92       Error SD_COMM_BAD_PORT          Service daemon not installed on the remote node (connection refused)  The product is not installed on the agent computer or the service daemon is not running.

      

    94       Error SD_COMM_CONNECT_TIMEOUT   Service daemon connection timed out (node offline)  The agent node is off the network.
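The split between the two output files can be expressed as a small lookup. This sketch encodes only what is stated in this article: the stage letters C, T, and P, and the three collect error codes that place a node in failedNodesNoAgent.csv. It assumes every other failure is reported in failedNodes.csv, and the function name is hypothetical.

```python
FAILURE_STAGES = {"C": "collect", "T": "transfer", "P": "processing"}

# Collect error codes that indicate no reachable agent on the node.
NO_AGENT_COLLECT_ERRORS = {
    91: "SD_COMM_BAD_HOST",
    92: "SD_COMM_BAD_PORT",
    94: "SD_COMM_CONNECT_TIMEOUT",
}


def output_file_for(stage, error_code):
    """Return the .csv file expected to list a failure of this kind."""
    if stage not in FAILURE_STAGES:
        raise ValueError(f"unknown failure stage: {stage!r}")
    if stage == "C" and error_code in NO_AGENT_COLLECT_ERRORS:
        return "failedNodesNoAgent.csv"
    return "failedNodes.csv"


print(output_file_for("C", 92))  # failedNodesNoAgent.csv
print(output_file_for("C", 50))  # failedNodes.csv (50 is a made-up code)
print(output_file_for("T", 92))  # failedNodes.csv (not a collect failure)
```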

      

     

      

    A field-by-field description of all the output .csv files is provided as part of 000097397

      

     

      

     

      

    "BEST PRACTICES" FOR DOING A DAILY HEALTH CHECK OF YOUR BPA CONSOLES 

      

     

      

    (1) Review the GeneralManagerLite output as described above.  This gives an overall summary of how many nodes are under management, and the status of each node.  Comparing results from day to day also immediately highlights any change in the overall health and pinpoints the source of the change.

      

    Prior to 9.5 SP2, an unsupported script was available to summarize this daily review and to email you the results.  The script (coded for a 9.0 console) is attached to this article and described in the attached Word document.  The email option is now part of 9.5 SP2; for information about it, see 000085246.

      

    (2) Use the General Manager GUI (displayed in Perceiver or BCO 9.0) and open the Console Operations -> "Recover Runs" view.

      

    Alternatively, you can use failedNodes.csv (output from GeneralManagerLite) or export the "Recover Runs" to csv if you prefer. 

      

    The methodology here is to initiate any Recovery actions first, then work on the data collection problems which typically require more analysis to resolve. 

      

    (3) Sort by "Populate Status".  For any Manager run which is not "OK", select the run, and then select "Recover".

      

    (4) Sort by "Transfer Fail".  For any Manager run which doesn't have a value of 0, select the run, and then select "Recover".

      

    (5) Sort by "Collect Fail".  Use the corresponding Console Reports -> "Node History" view to establish the precise problem (using the error code), how many nodes have the same problem, and if the problem is persistent (using 3 or 5 day history setting).  Perform remediation as indicated by the error code and cause.  Note that the results of successful remediation may not appear for up to 2 days depending on the problem fixed and how often the Manager run is scheduled for execution.

      

    If you've specified the optional log gathering feature, the corresponding logs have already been retrieved from the remote nodes and zipped so that they can be sent to Customer Support.

      

    (6) Rerun the GeneralManagerLite script after the recovery actions have been completed in order to assess the "recovered" overall health of the data flow for today.

      

    When additional troubleshooting time is available, determining the root cause for repeating Population, Processing, or Transfer errors can avoid the need to "Recover" the run(s) each day.

      

    The failedNodesNoAgent.csv lists all nodes which are listed as under management by BPA, but no collection agent is present.  Typically this requires an internal ticket to get the agent software installed (either on a proxy or local agent).  Note that this condition can occur when a node has its OS upgraded, but the corresponding BPA agent wasn't upgraded at the same time.

      
    Related Products:  
       
    1. TrueSight Capacity Optimization
    2. BMC Performance Assurance for Servers
       Legacy ID:KA366377

     


    Article Number:

    000108584


    Article Type:

    Solutions to a Product Problem


