This document contains official content from the BMC Software Knowledge Base. It is automatically updated when the knowledge article is modified.
TrueSight Capacity Optimization
TrueSight Capacity Optimization 20.02, 11.5, 11.3, 11.0, 10.7, 10.5, 10.3, 10.0; TrueSight Capacity Optimization (Gateway Server)
For the TrueSight Capacity Optimization (TSCO) Gateway Server VIS parser ETL, it can be very useful to implement the 'General Manager Lite' reporting scripts in the Gateway Manager console to report on the success of data collection, transfer, processing, and import into TSCO.
But if a problem is seen with the data import into TSCO, how can one determine whether it is related to a particular ETL, an ETL chain, an ETL Engine, or the Datahub?
When data isn't being imported by the TSCO Gateway Server VIS parser ETL, the first question is: "Is the problem limited to a particular ETL, a particular subset of ETLs, or does it occur globally across all ETLs?" This is important because the scope of the problem indicates the most likely cause.
So, depending on the scope of the problem, I'd be thinking:
- A single TSCO Gateway Server VIS parser ETL failing: there is a problem with the Manager run creating the Vis files, or with that particular ETL.
- A subset of ETLs failing: what do those ETLs have in common? Are they all running on the same Console? Are they all being processed on the same ETL Engine? For a set of ETLs failing, I'd be thinking the problem was either on the ETL Engine side or on the Gateway Server side.
- All ETLs failing: do the ETL logs show the ETLs are running? One possibility is a global problem with the TSCO instance (such as a database failure), something that is impacting the whole environment. This is where you want to check the Diagnostic Alerts for errors, or the Component Status for a red ERROR state or stale component status timestamps.
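The scoping logic above can be sketched as a simple decision helper. This is a hypothetical illustration only: the function name, the ETL dictionary fields ('name', 'console', 'engine'), and the messages are assumptions for the sketch, not part of any TSCO API.

```python
def diagnose_scope(failing_etls, all_etls):
    """Suggest where to look first, given which VIS parser ETLs are failing.

    ETLs are represented as dicts with 'name', 'console', and 'engine'
    keys (illustrative fields, not a TSCO data structure).
    """
    if not failing_etls:
        return "No failures: nothing to diagnose."
    if len(failing_etls) == len(all_etls):
        # Global failure: suspect the TSCO instance itself (e.g. the database).
        return ("All ETLs failing: check Diagnostic Alerts and Component "
                "Status for a global problem such as a database failure.")
    if len(failing_etls) == 1:
        # Single ETL: suspect the Manager run or that ETL's own configuration.
        etl = failing_etls[0]
        return (f"Only '{etl['name']}' failing: check the Manager run that "
                "creates its Vis files, or that ETL's configuration.")
    # Subset: look for what the failing ETLs have in common.
    consoles = {e['console'] for e in failing_etls}
    engines = {e['engine'] for e in failing_etls}
    hints = []
    if len(consoles) == 1:
        hints.append(f"all run on Console '{consoles.pop()}'")
    if len(engines) == 1:
        hints.append(f"all processed by ETL Engine '{engines.pop()}'")
    common = "; ".join(hints) if hints else "no single Console/Engine in common"
    return (f"Subset failing ({common}): suspect the ETL Engine side "
            "or the Gateway Server side.")
```

The helper just encodes the three cases above: one ETL, a subset sharing a Console or ETL Engine, or everything at once.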
For problems with the Datahub, check the status of the store-sys queue: the "Queue Length", the "Max Queue Age", and whether the queue is showing activity. Symptoms to look for include:
(1) A long Queue Length (> 1 million rows)
(2) A long Max Queue Age (> 1 day)
(3) Active Threads reporting '10' while the 'Processing Throughput' is 0
For example, the high-active-threads-but-zero-throughput situation is usually an indication that the Near-real-time Warehouse (NRTWH) thinks it is processing data but something has gone wrong with its connections to the database: they are 'active' only because they have hung.
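The three symptoms above can be combined into a small health check. This is a hypothetical sketch: the thresholds follow the rules of thumb in this article, but the function and parameter names are illustrative assumptions, not a TSCO interface.

```python
def datahub_symptoms(queue_length, max_queue_age_hours, active_threads, throughput):
    """Return warning strings for the Datahub store-sys queue metrics.

    Thresholds mirror the rules of thumb above; the parameters are
    illustrative, not values read from any real TSCO API.
    """
    warnings = []
    if queue_length > 1_000_000:
        warnings.append("Long Queue Length (> 1M rows): the Datahub is falling behind.")
    if max_queue_age_hours > 24:
        warnings.append("Long Max Queue Age (> 1 day): the oldest queued data is stale.")
    if active_threads > 0 and throughput == 0:
        # Threads look busy but nothing is being processed: the classic
        # sign of hung NRTWH database connections.
        warnings.append("Active threads with zero throughput: NRTWH database "
                        "connections may have hung; consider restarting the Datahub.")
    return warnings
```

A healthy queue (short, young, and actually processing) returns an empty list; any returned warning points at one of the three symptoms above.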
For most Datahub Near-real-time Warehouse (NRTWH) problems, the best initial resolution step is to stop and restart the Datahub. This allows the Datahub to re-initialize, and if the data processing problem was triggered at a point in time (but isn't currently happening), the restart will likely fix it. This is particularly true in a scenario where there were database connectivity errors at a point in time, the Datahub stopped working, the database connectivity problems were corrected (or at least went away), but the Datahub didn't start working again afterwards.