This document contains official content from the BMC Software Knowledge Base. It is automatically updated when the knowledge article is modified.
BMC Performance Assurance for Virtual Servers
TrueSight Capacity Optimization 20.02, 11.5, 11.3.01, 11.0, 10.7, 10.5, 10.3, 10.0 ; BMC Performance Assurance 9.5
What are the recovery and debugging options available when the BEST1_HOME file system has become 100% full and caused TrueSight Capacity Optimization (TSCO) Gateway Server Manager run failures?
BMC TrueSight Capacity Optimization (Gateway Server) 20.02, 11.5, 11.3.01, 11.0, 10.7, 10.5, 10.3, 10.0
BMC Performance Assurance 9.5, 9.0, 7.5.10
One of the worst things that can happen on the BMC TrueSight Capacity Optimization Gateway Server (formerly BMC Performance Assurance, BPA, or Perform) console is for the $BEST1_HOME file system to fill up. If the udrCollectMgr processes can't update the $BEST1_HOME/local/manager/run files, the resulting problems are hard to recover from: the Manager runs for the day become totally corrupted, so recovery has to move from automated steps to very manual ones.
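Before starting recovery it helps to confirm how full the file system currently is. The following is a minimal sketch, not a BMC-supplied tool: the function names and the 90% threshold are hypothetical and chosen only for illustration, and it assumes BEST1_HOME is set in the environment of the user running it.

```shell
#!/bin/sh
# Print the percentage used (digits only) of the file system holding directory $1.
fs_use_percent() {
    # -P forces the POSIX one-line-per-file-system format; field 5 is the capacity, e.g. "97%".
    df -P "$1" 2>/dev/null | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical warning threshold in percent (an assumption, not a BMC recommendation).
THRESHOLD=90

# Report whether the file system holding $1 (default: $BEST1_HOME) is at or above THRESHOLD.
# Returns 0 when below the threshold, 1 when at or above it, 2 when usage cannot be read.
check_best1_home() {
    dir=${1:-$BEST1_HOME}
    used=$(fs_use_percent "$dir")
    case "$used" in
        ''|*[!0-9]*) echo "ERROR: could not read usage for '$dir'"; return 2 ;;
    esac
    if [ "$used" -ge "$THRESHOLD" ]; then
        echo "WARNING: file system for $dir is ${used}% full"
        return 1
    fi
    echo "OK: file system for $dir is ${used}% full"
}
```

After sourcing these functions, running check_best1_home with no arguments checks $BEST1_HOME. Passing a path such as the Manager output directory checks that location instead.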
The first thing to determine is which part failed: data transfer, data processing, or both. The easiest way to identify problems with data transfer is to view the UCM Status Reports in the $BEST1_HOME/local/manager/status directory in a web browser (either use a local web browser on your console, or copy the files to a Windows machine and open the UCMStatus.html file in a web browser there).
The goal is to confirm that all of your Manager runs are listed in the UCM Status Reports and then to check whether any of them show failed transfer requests.
The same can be determined with the udrCollectStat command, since all the information in the UCM Status Reports is also available through udrCollectStat. For example:
$BEST1_HOME/bgs/bin/udrCollectStat -D -d `date --date=yesterday +%m-%d-%Y` -f "%v %r %d %n: %s, %ch, %ce, %ces %th %te %tes %tg %tt"
Information about why the udrCollectMgr (data collection/data transfer) part of the Manager run failed can be obtained by looking at the UCM logs in the $BEST1_HOME/local/manager/log directory. Unfortunately, they are somewhat difficult to interpret unless you have previous experience with them.
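As a starting point for working through that log directory, the sketch below narrows it to recently modified files and to files that mention a full disk. The function names are hypothetical. The search string is the standard Linux ENOSPC message, and it is only an assumption that the UCM logs record it verbatim.

```shell
#!/bin/sh
# List log files under $1 modified within the last $2 days (default: 1).
recent_ucm_logs() {
    log_dir=$1
    days=${2:-1}
    find "$log_dir" -type f -mtime -"$days" -print
}

# List files under $1 that contain the standard Linux "disk full" error text.
# Assumption: the UCM logs include this operating-system message as-is.
logs_mentioning_enospc() {
    grep -rl 'No space left on device' "$1"
}
```

For example, recent_ucm_logs "$BEST1_HOME/local/manager/log" 2 lists the logs touched in the last two days, which is usually a much shorter list to read through than the whole directory.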
If the only problem on the console was that the UDR data repository filled up, that is a less serious problem. In that case the Manager runs would all be listed in the UCM Status Reports, but there would be a number of failed transfer attempts associated with the runs.
If the problem is just failed transfers, then this document describes the best way to recover your data transfers:
000031803: How can Perform UDR capacity planning data be manually transferred from the remote agent node to the Perform console server?
Once you've recovered the failed transfers via the '*.XferData -r' command you'd then need to execute the *.ProcessDay scripts for the Manager runs that failed to process.
The following command can be run to list the active Manager runs and the associated Manager Output Directory:
$BEST1_HOME/bgs/scripts/listManagerRuns.pl -p MANAGER_COMMANDS_FILE OUTPUT_DIRECTORY
Note that if the problem has been caught quickly (the day that the file system filled or the day after), then the following KA describes a command that can be used to recover the failed processing and re-initialize the run:
See KA 000210037: For the TrueSight Capacity Optimization (TSCO) Gateway Server on Linux, what is the best way to re-initialize Manager runs if the *.Manager script hasn't been getting executed? (https://bmcsites.force.com/casemgmt/sc_KnowledgeArticle?sfdcid=000210037)
Common Gateway Manager symptoms of a file system full condition on the Linux console:
The most common problem symptoms associated with a file system full condition on the Linux Gateway Server console are:
- Manager runs are not listed in Gateway Manager -> Gateway Reports: Exception Reports
- Last night's data collection and transfer status for individual computers are not listed under Gateway Manager -> Gateway Reports: Node History
Advanced Recovery
When using AutoNodeDiscovery (Agent List based Manager runs), if the output directory where the Manager input files are located fills up, there is the potential for key configuration file corruption. This includes:
- The Manager *.vcmds files may be truncated
- The *.dmn files may be truncated
Look for Manager *.vcmds files that are unusually small or domain (*.dmn) files that are 0 length.
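That check can be sketched with find. The function name is hypothetical, the directory to scan must be supplied by the caller, and the 1024-byte cutoff for "unusually small" is an arbitrary illustrative value that should be adjusted to the normal size of *.vcmds files in your environment.

```shell
#!/bin/sh
# List candidate truncated Manager input files under directory $1.
# $2 is the size in bytes below which a *.vcmds file is reported (default: 1024, an assumption).
find_truncated_inputs() {
    dir=$1
    min_bytes=${2:-1024}
    # Domain files that are exactly 0 bytes long.
    find "$dir" -type f -name '*.dmn' -size 0c -print
    # Manager *.vcmds files smaller than min_bytes bytes.
    find "$dir" -type f -name '*.vcmds' -size -"${min_bytes}"c -print
}
```

Any file it reports is only a candidate: compare it against a known-good backup before concluding that it was truncated.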
There is no automated mechanism to rebuild a Manager *.vcmds file if it has been truncated. If a backup copy of the parent Manager run *.vcmds file is available, that *.vcmds file can be restored before re-running the [date]-[date].Manager script. Once the *.Manager script is run for the master VCMDS, it will repair the child VCMDS files (the additional runs created by AutoNodeDiscovery to handle the agent list) and all of the underlying *.dmn files. Once the files have been repaired, the [date].[date].Manager runs can be re-executed to re-initialize the run.
- TrueSight Capacity Optimization
- BMC Capacity Management
- BMC Performance Assurance