This document contains official content from the BMC Software Knowledge Base. It is automatically updated when the knowledge article is modified.
BMC Performance Assurance for Servers
TrueSight Capacity Optimization (Gateway Server/Capacity Agent) 11.x, 10.x ; BMC Performance Assurance for Servers
- BMC TrueSight Capacity Optimization 20.02, 11.x, 10.x
- BMC Performance Assurance for Servers 9.5, 9.0, 7.5.10, 7.5.00, 7.4.10, 7.4.00, 7.3.00, 7.2.10, 7.2.00
- Computers where Capacity Agent UDR data collection and data transfer is failing
If you use the BMC TrueSight Capacity Optimization (TSCO) Gateway Manager component, a good place to start debugging is Gateway Manager -> Gateway Reports -> Node History. It lists the data collection, transfer, and processing errors reported for each computer, so you can see whether a computer whose data is not being imported into the TSCO database is failing at the collection, transfer, or processing stage on the TSCO Gateway Server side.
Outside of the Gateway Manager, another good place to look when debugging data collection and data transfer issues within the TrueSight Capacity Optimization (TSCO) Gateway Server or the BMC Performance Assurance (BPA) product is either (a) the UDR Collection Manager (UCM) Status Reports, which are available on the console, or (b) for TSCO 10.7.01 and earlier, the General Manager Reports, which are available via the Perceiver interface (a component dropped as of TSCO 11.0).
On the Windows console, through TSCO version 11.3.01, the UCM Status Reports are accessible under Manager -> Status Reports. On the Unix console you can run a local web browser and open the /usr/adm/best1_default/local/manager/status/UCMStatus.html file, share the reports out via a locally installed web server, or copy the directory to a Windows PC where they can be viewed in a web browser. The information in the UCM Status Reports is generally very useful for debugging data collection and data transfer problems.
In general, TSCO Technical Support can comment on whether the source of the data collection problem is one of the binaries on the remote node if you send the entire contents of the $BEST1_HOME/bgs/log (for Unix) or %BEST1_COLLECT_HOME%\bgs\log (for Windows) directory from the problem remote node. The Service Daemon, Agent, and Collector logs are all in that directory, so any Perform binary problem on the remote node should be visible there.
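On a Unix remote node, gathering that directory into a single archive can be sketched as follows. This is a minimal sketch, not a BMC-supplied tool: the function name bundle_agent_logs and the output file name are hypothetical, and it assumes tar and gzip are available on the node.

```shell
# Hypothetical helper (not part of the BMC product).
# Archives <install root>/bgs/log so the Service Daemon, Agent, and
# Collector logs are sent to Support together.
# Usage: bundle_agent_logs INSTALL_ROOT OUTPUT_FILE
bundle_agent_logs() {
    best1_home=$1    # install root, for example "$BEST1_HOME"
    out_file=$2      # archive to create, for example /tmp/bgs_logs.tar.gz
    if [ ! -d "$best1_home/bgs/log" ]; then
        echo "log directory not found: $best1_home/bgs/log" >&2
        return 1
    fi
    # Run tar from the install root so archive paths are relative (bgs/log/...).
    # Piping to gzip avoids relying on a tar that supports the -z option.
    (cd "$best1_home" && tar -cf - bgs/log) | gzip > "$out_file"
}
```

For example, bundle_agent_logs "$BEST1_HOME" /tmp/bgs_logs.tar.gz would produce one file containing every log in the directory.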
When debugging a data collection or data transfer problem the main goal is to determine which binary within the process is failing.
For Manager-based data collection/transfer the likely suspects are:
On the console
- udrCollectMgr (on the console)
Information on UDR Collection Manager problems will generally be available in the UCM Status Reports or UCM log files ($BEST1_HOME/local/manager/log) on the console.
The UCM Status Reports are a great place to start debugging. If udrCollectMgr is failing, either no status reports will be generated for the day or they will look very different from those on days when data collection is working. For example, if all nodes are still in 'Sending data collection' status hours after data collection should have started, that is a UCM problem. In general UCM is very reliable and isn't the typical source of data collection or data transfer problems.
A good first place to look (beyond the UCM Status Reports) is the Service Daemon (bgssd.exe) on the remote node. This is a common point of failure because it relies on inetd (or xinetd) to accept the request and pass it on to the Service Daemon process.
The best test to identify a Service Daemon failure is to run a query against the remote node:
A connection refused or connection timed out message would indicate a Service Daemon problem.
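As a supplementary check, a plain TCP connection test from the console can distinguish a refused connection from a timeout. This is a generic network probe, not the product query referred to above, and it is only a sketch: the function name check_service_port is hypothetical, the port must be the one the Service Daemon listens on at your site (this article does not state it), and the code assumes bash and the coreutils timeout command are installed.

```shell
# Hypothetical helper (not a BMC command). Requires bash and coreutils timeout.
# Usage: check_service_port HOST PORT [TIMEOUT_SECONDS]
check_service_port() {
    host=$1
    port=$2              # assumption: supply your Service Daemon port
    wait_secs=${3:-5}
    # bash opens a TCP connection when a redirection targets /dev/tcp/HOST/PORT.
    if timeout "$wait_secs" bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
        echo "connected"
    else
        rc=$?
        # timeout exits with 124 when it had to kill the command.
        if [ "$rc" -eq 124 ]; then
            echo "timed out"
        else
            echo "refused or unreachable"
        fi
        return 1
    fi
}
```

A "connected" result only shows that something accepted the TCP connection; it does not prove that the Service Daemon handled the request correctly.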
Information on debugging Service Daemon issues is available in KA 000032141.
Debugging the TSCO Agent
If the Service Daemon has accepted the collection request the next place to look is the Perform Agent on the machine. The Perform Agent log is somewhat difficult for people new to it to interpret because many normal everyday messages will look like errors. It is generally best to compare the Perform Agent log on a problem machine to the Agent log on a good machine to look for differences.
There really isn't a good command to test the Perform Agent. The [hostname]-bgsagent_6767.log is the best source of debugging information.
Debugging the TSCO Agent Data Collector
Frequently, messages in the TSCO Agent log will indicate that the TSCO Collector (bgscollect) is the source of the problem. Messages indicating that the collector is crashing (Maximum collector restarts exceeded) or has hung (spilltime not within requested range) are common signs of a data collection problem in the Agent log.
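A quick way to look for those two messages is to count their occurrences in the Agent log. The sketch below assumes the log contains the message text exactly as quoted in this article (matched case-insensitively); the actual wording in your log may differ, and the function name scan_agent_log is hypothetical.

```shell
# Hypothetical helper (not a BMC command).
# Counts the two collector failure messages named in this article.
# Usage: scan_agent_log AGENT_LOG_FILE
scan_agent_log() {
    log_file=$1
    if [ ! -r "$log_file" ]; then
        echo "cannot read: $log_file" >&2
        return 1
    fi
    # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
    crashes=$(grep -ci "Maximum collector restarts exceeded" "$log_file" || true)
    hangs=$(grep -ci "spilltime not within requested range" "$log_file" || true)
    echo "collector_restart_limit=$crashes collector_hang=$hangs"
}
```

Run it against the [hostname]-bgsagent_6767.log file in the bgs/log directory. Non-zero counts suggest moving on to the collector log.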
In general, the best information about what the collector is doing comes from:
- The [hostname]-bgscollect-noInstance.log file
- The output of $BEST1_HOME/bgs/bin/Look -b $BEST1_HOME -f
The Look command checks what data is being written into shared memory by the collector and will indicate when shared memory was last updated with data. Each section of data that represents a specific metric group will begin with a header like this:
'Collector Information' [noInstance] Fri Dec 1 10:30:46 2006
That date is the last time the group was updated in shared memory by the collector. If that date is old, the collector is not updating shared memory with new data.
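To spot stale groups quickly, the header lines can be filtered out of the Look output and converted into ages. The following is a sketch only: it assumes every header has the form shown in the example above, it relies on GNU date (the -d option) to parse the timestamp, and the function name look_group_ages is hypothetical.

```shell
# Hypothetical helper (not a BMC command). Requires GNU date.
# Reads Look output on stdin and prints the age of each metric group.
# Usage: <Look command> | look_group_ages [NOW_EPOCH_SECONDS]
look_group_ages() {
    now=${1:-$(date +%s)}
    # Header form assumed: 'Group Name' [instance] Dow Mon D HH:MM:SS YYYY
    sed -n "s/^'\([^']*\)' \[[^]]*\] \(.*\)$/\1|\2/p" |
    while IFS='|' read -r group stamp; do
        # Skip any line whose timestamp GNU date cannot parse.
        ts=$(date -d "$stamp" +%s 2>/dev/null) || continue
        echo "$group: $(( ($now - $ts) / 60 )) minutes old"
    done
}
```

A group whose age is far larger than the collection interval is a sign that the collector has stopped updating shared memory for that group.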
The General Manager functionality provided via the Perceiver or TSCO web UI can be very useful for identifying and debugging Agent data collection and data transfer problems.
The Console Reports -> Node History is a good way to see what transfer error has been reported:
Perceiver Console - Node History
TSCO Console - Node History
Clicking on the Transfer error code will provide some additional detail about the error. A BMC Knowledgebase search on the error message will often yield a document with additional debugging suggestions related to that error code.
TSCO Gateway Server Linux console command line
Two command line commands that can be really useful for debugging data collection and data transfer issues are:
# This command will report the current status of data collection for all systems in Manager runs managed by the TSCO Gateway Server
$BEST1_HOME/bgs/bin/udrCollectStat -D -d `date +%m-%d-%Y` -f "%v %r %d %n: %s, %ch, %ce, %ces %gc"
# This command will report yesterday's status for data collection and data transfer for all systems in Manager runs managed by the TSCO Gateway Server
$BEST1_HOME/bgs/bin/udrCollectStat -D -d `date --date=yesterday +%m-%d-%Y` -f "%v %r %d %n: %no, %s, %ch, %ce, %ces, %th, %te, %tes, %tg, %tt"
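The same status report can be repeated for several past days by generating the date argument in a loop. This is a sketch that reuses only the udrCollectStat options and format string shown above; the helper names manager_date and collect_status_history are hypothetical, and the date arithmetic assumes GNU date, as the yesterday example above already does.

```shell
# Hypothetical helpers (not BMC commands). Requires GNU date.

# Print the date N days ago in the MM-DD-YYYY form passed to -d above.
manager_date() {
    days_ago=${1:-0}
    date --date="$days_ago days ago" +%m-%d-%Y
}

# Report collection status for each of the last DAYS days, today first.
# Usage: collect_status_history DAYS
collect_status_history() {
    days=$1
    i=0
    while [ "$i" -lt "$days" ]; do
        "$BEST1_HOME/bgs/bin/udrCollectStat" -D -d "$(manager_date "$i")" \
            -f "%v %r %d %n: %s, %ch, %ce, %ces %gc"
        i=$((i + 1))
    done
}
```

For example, collect_status_history 3 would print the status for today, yesterday, and the day before, which makes it easier to see when a node first started failing.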
General Manager Lite
The General Manager Lite reporting tool is a very good way to see the success of collection, transfer, processing, and population into TSCO.
Information on implementing General Manager Lite reporting is here:
000108584: Tool which summarizes the results for all Gateway Servers (using General Manager), and gathers logs for nodes with collection errors
Custom Support Script (Deprecated)
The Processing Status tool from Technical Support works on the Unix console and can be used to check the status of collect, transfer, Visualizer file creation, and population for active Manager runs for common Manager configurations.
Password: [Your e-mail addr]
Uncompress and extract the processing_status.tar.Z into a directory on your machine.
Step 3
As the user under which your Manager runs are scheduled and with the BEST1_HOME environment variable set, run ./processing_status.pl
- TrueSight Capacity Optimization
- BMC Performance Assurance for Servers
- BMC Performance Analysis for Servers
Other related KAs: KA 00104520, KA 000101624, KA 000158682