In TrueSight Capacity Optimization what triggers a 'BCO_ETL_WARN301: Service 101 has detected a problem in data saving into BCO.  Please verify service status.' error?

Version 5

    This document contains official content from the BMC Software Knowledge Base. It is automatically updated when the knowledge article is modified.


    PRODUCT:

    TrueSight Capacity Optimization


    COMPONENT:

    Capacity Optimization


    APPLIES TO:

    TrueSight Capacity Optimization 11.3.01, 11.0, 10.7, 10.5, 10.3, 10.0



    QUESTION:

    The TrueSight Capacity Optimization (TSCO) version 9.5 & 10.X vCenter Service Extractor has data gaps and is reporting the following error:
    BCO_ETL_WARN301: Service 101 has detected a problem in data saving into BCO. Please verify service status


     


    ANSWER:

     

    Q: What is causing the 'BCO_ETL_WARN301: Service 101 has detected a problem in data saving into BCO. Please verify service status.' error?

      

    The Data Saver is expected to run every hour and load data into the BCO database.  If the Data Saver thread is running very slowly or has hung, or if the SQL*Loader it executes is running slowly or has hung, that will trigger the WARNING message from CO.  This is a significant error because, if the Saver task remains hung for 2 hours, the vCenter Service Extractor will begin purging older data from its local storage database, resulting in missing data points.
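
    A quick way to check whether the Saver is still making progress is to look for recent Saver-related entries in the log of the affected ETL on the ETL Engine. A minimal sketch (the log file name below is a placeholder; use the actual log of the affected service):

      # Show the most recent Saver-related entries (log file name is a placeholder)
      grep "Saver" etl_service.log | tail -20
      # If the newest entry is well over an hour old, the Saver thread is likely hung or running long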

      

    One possible cause is Oracle database performance (the current load exceeding what the hardware configuration can sustain).
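
    One rough way to gauge that load is to count active sessions on the TSCO Oracle instance. A sketch, assuming DBA access (TSCODB is a placeholder connect identifier):

      # Count currently active sessions on the TSCO database (TSCODB is a placeholder)
      echo "SELECT COUNT(*) AS active_sessions FROM v\$session WHERE status = 'ACTIVE';" | sqlplus -s system@TSCODB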

      

    If the Saver is detected not to have run within the last 3 hours (3 * the default saver interval), the above warning is triggered. While the saver thread is executing, the SQLite database (used for temporary storage by the vCenter Service Extractor) is locked, which prevents the aggregator from populating data into the local SQLite database. The aggregator caches data for 2 hours, but after that it starts flushing the cached data, which is logged with the following message:
    2014-11-03 04:33:27,770 INFO [service-102-1]- [clearStaleCachedIntervals] Dropping sample older than 2 hours... [1414999976], sample timestamp: 03/11/2014 07:10:00 +0000
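
    To gauge how much data is being flushed, the same ETL log can be searched for that message (the log file name is a placeholder):

      # Count how many cached samples have been dropped (log file name is a placeholder)
      grep -c "Dropping sample older than 2 hours" etl_service.log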

    Another potential cause, even when the Data Saver and SQL*Loader are functioning normally, is that the LookupHelper task of the datahub takes a long time due to a change in the way the ETL runs the LookupHelper (this affects VMware ETLs in TSCO 10.7.x and later).  Because the LookupHelper provides data to the Data Saver, a long-running LookupHelper prevents the Data Saver from completing in a timely manner, resulting in the same error.

      


    Some Examples

      
    Slow SQL*Loader
      

    This error can occur if there is a significant performance problem on the CO Database side, resulting in unreasonably slow SQL*Loader execution. The log messages below are from an environment where the SQL*Loader was intermittently taking a long time (in this example, around 90 minutes) to load data into the SYS_DATA_STAGE table. This was visible several different times in the logs.

    The log lines below are an example of the problem:
    2014-11-03 02:33:27,687 INFO [service-102-2]- BCO loader: loading dataset SYSDAT
    2014-11-03 02:37:14,671 INFO [service-102-2]- OracleCTLL: Dataset size is 454936 -- destination table: SYS_DATA_STAGE
    2014-11-03 02:37:14,682 INFO [service-102-2]- Executing SQLLOADER. ControlFile=/opt/bmc/BCO/etl/output/SYSDATOB32007102.ctl LogFile=/opt/bmc/BCO/etl/output/SYSDATOB32007102.ctl.log
    2014-11-03 04:09:28,780 INFO [service-102-2]- Load completed successfully - exit code: 0


    So, the Oracle SQL*Loader run that loads the data into the BCO database took much longer than expected (that process should take a few minutes at most).
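
    If the SQL*Loader control and log files from those runs are still present under the ETL output directory shown in the log above, their contents can confirm the slow loads. A sketch (assumes a standard Oracle SQL*Loader log format):

      # List the most recent SQL*Loader logs for the SYSDAT dataset
      ls -lt /opt/bmc/BCO/etl/output/SYSDAT*.ctl.log | head -5
      # Each SQL*Loader log normally reports how long the load took
      grep -H "Elapsed time" /opt/bmc/BCO/etl/output/SYSDAT*.ctl.log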

      
    Hung Saver Thread
      

    In this scenario it looks like the vCenter Service Extractor "Saver" thread has hung.

    The recommended workaround for this is to kill and then restart the problem BCO vCenter Service Extractor ETL or restart the BCO Scheduler on the ETL Engine where the error is being generated.

    To do that:
      (1) Under Administration -> ETL & SYSTEM TASKS -> ETL Tasks, find the problem vCenter Service Extractor (ID 91 in this example) and click the 'Immediately stop this service execution' button (red background with an x).  Then, after it stops, restart it.

    If that doesn't address the problem, stop and restart the BCO Scheduler on the ETL Engine (EE) server:
      ./cpit restart scheduler

    That should fix it because stopping and restarting the scheduler forces all ETLs to fully re-initialize.
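
    After the restart, one way to confirm the Saver is cycling again is to watch for new Saver entries appearing in the ETL's log (the log file name is a placeholder):

      # Watch for the hourly Saver activity to resume (log file name is a placeholder)
      tail -f etl_service.log | grep --line-buffered "Saver\|Saving data"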

    Usually one would expect to see something like this in the vCenter Service Extractor logs:

    2015-02-12 10:25:00,002 INFO  [service-91-2]- [VMWareCollector] Saving data...
    2015-02-12 10:25:00,002 INFO  [service-91-2]- Saver: opening cache ../localdb/vmwservice_91.db
    2015-02-12 10:25:12,159 INFO  [service-91-2]- Setting busy timeout to 30 s
    2015-02-12 10:25:12,193 INFO  [service-91-2]- Using shared lookup identifier: 9
    2015-02-12 10:25:48,693 INFO  [service-91-2]- [populateConfData] Changed vm conf - toCheck:1423758350379
    2015-02-12 10:25:50,926 INFO  [service-91-2]- [populateConfData] Changed poolconf conf - recomputing optimistic/pessimistic limits
    2015-02-12 10:25:52,929 INFO  [service-91-2]- Processing table SYSDAT
    2015-02-12 10:25:52,932 INFO  [service-91-2]- [ServiceDataSaver] Creating LOADER of type com.neptuny.cpit.etl.loader.CSVL - loading dataset: SYSDAT
    <-- cut -->


    But in logs where the Saver has hung you'll see messages like this instead:

    2015-02-12 11:25:00,002 INFO  [service-91-2]- [VMWareCollector] Saving data...
    2015-02-12 11:25:00,002 INFO  [service-91-2]- Saver: opening cache ../localdb/vmwservice_91.db
    2015-02-12 11:25:11,532 INFO  [service-91-2]- Setting busy timeout to 30 s


    That is the last message associated with the Saver thread.  Log messages related to the data collection thread continue to appear, but there are no further messages from the Saver thread in the logs.
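
    The same conclusion can be reached by listing the last few entries written by the saver thread itself; in the example above that thread logs as [service-91-2] (the log file name is a placeholder):

      # Show the last entries from the saver thread; a stale timestamp here points to a hang
      grep "service-91-2" etl_service.log | tail -10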

    Collector Thread Hung example:

    2016-06-22 17:04:00,783 ALERT [Thread-4]- BCO_ETL_WARN301: Service 57 has detected a problem in data extraction.  Please verify service status.
    2016-06-22 17:25:00,001 INFO  [service-57-1]- Starting saver [57]...
    ...

    ... The ETL behaves normally according to the log messages and inserts the SQL data quickly.
    ...
    2016-06-22 17:26:56,802 INFO  [service-57-1]- ... end saver [57]
    2016-06-22 17:34:00,996 ALERT [Thread-4]- BCO_ETL_WARN301: Service 57 has detected a problem in data extraction.  Please verify service status.


    But as you can see, the first warning occurs about 20 minutes before the Saver starts and the second about 8 minutes after it has ended.
    This is most likely because the collector thread has hung.

    - Restart the ETL.  If this doesn't help, run cpit clean scheduler and cpit restart scheduler.
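
    A sketch of that sequence on the ETL Engine (run from the directory containing the cpit script):

      ./cpit clean scheduler
      ./cpit restart scheduler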

    LookupHelper example (TSCO 11.0 and later):

    In reviewing the service logs of the VMware ETL, messages such as those below may indicate that the LookupHelper is taking too long to complete (in the example below, roughly 20 hours).  Because the LookupHelper provides data to the Data Saver, this delays the Data Saver and produces the same error.

      

    2018-02-19 01:47:23,783 INFO  [service-107-2]- [LookupHelper] # calls:4095889 - total time:72418s - avg call time:17ms - peak call time:22031ms - min call time:6ms

      

    2018-02-19 02:03:00,534 INFO  [service-107-2]- [LookupHelperFactory] instantiating lookup service based helper

      

    2018-02-19 02:33:37,553 INFO  [service-107-2]- [LookupHelperFactory] instantiating lookup service based helper

      

    2018-02-20 00:02:10,314 INFO  [service-107-2]- [LookupHelper] # calls:4172669 - total time:74961s - avg call time:17ms - peak call time:22655ms - min call time:6ms

      

    2018-02-20 00:29:19,915 INFO  [service-107-2]- [LookupHelperFactory] instantiating lookup service based helper

    When this happens, first try restarting the datahub.  If the problem persists, look for the "skip.entity.creation=true" parameter in the ETL's configuration. If it is present, adding the line "lookup.helper.mode=DB_BASED" (no quotes) to the ETL will allow the LookupHelper task to function as expected.  Restart the datahub afterward.
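
    For illustration, the relevant ETL properties would then look like the sketch below (exactly where the properties are edited, for example via the ETL's advanced configuration, may vary by version):

      # Existing property associated with the issue
      skip.entity.creation=true
      # Added property so the LookupHelper runs in DB-based mode
      lookup.helper.mode=DB_BASED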
     

      

    Object Cache (TSCO 10.7.01 and earlier)

      

    In TSCO 10.7.01 and earlier, the vCenter Service Extractor may appear to hang for a very long time when trying to load the object cache:

      

    2019-02-24 01:46:32,181 INFO  [service-94-2]- TSCO loader: caching ent object table...

      

    2019-02-24 05:13:02,269 INFO  [service-94-2]- TSCO loader: ent object cache loaded

      

    In that example it took 3 1/2 hours for the Object Cache to be loaded into TSCO.

    If this is the problem, check the ping time from the TSCO ETL Engine server to both the TSCO Database instance and the TSCO Application Server.  A ping time of 80 ms between the TSCO Database instance and the TSCO ETL Engine server has been seen to cause a very long delay loading the object cache (which must be done each time the LOADER phase executes).
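
    A simple latency check, run from the TSCO ETL Engine server (hostnames are placeholders):

      # Round-trip latency from the ETL Engine to the TSCO Database and Application Server hosts
      ping -c 10 tsco-database-host
      ping -c 10 tsco-application-server-host
      # Averages approaching 80 ms have been enough to slow object cache loading significantly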

    A good test would be to move the vCenter Service Extractor to run on an ETL Engine that has a much lower ping time to the TSCO Database.  If a large delay in loading the object cache coincides with a slow ping, then correcting the ping response time should allow the ETL LOADER phase to execute quickly and consistently each hour, as expected.
     

     


    Article Number:

    000030552


    Article Type:

    FAQ/Procedural


