New TSCO Kubernetes 2.x ETL version with enhancements and fixes to address memory errors, crashes and filesystem space

Version 1

    This document contains official content from the BMC Software Knowledge Base. It is automatically updated when the knowledge article is modified.


    PRODUCT:

    TrueSight Capacity Optimization


    COMPONENT:

    Capacity Optimization


    APPLIES TO:

    TSCO Kubernetes



    PROBLEM:

    In some environments, data collection with version 1.3.xx of the Kubernetes ETL was failing with memory errors and crashes. The ETL has been observed to use a large amount of memory, and it sometimes runs out of memory and crashes. In the Kubernetes version 3.11 release, the default "--metric_resolution" option was changed from 60 seconds to 30 seconds, which doubles the amount of traffic that Heapster generates for the ETL to consume. This change may cause the ETL to queue data because it cannot keep up with the incoming Heapster data volume, and it eventually runs out of memory.
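
    The doubling follows directly from the collection interval. The sketch below is only an illustration of that arithmetic; the function name is hypothetical, and it assumes each collection cycle carries roughly the same amount of data regardless of the interval.

```python
def cycles_per_hour(metric_resolution_seconds: int) -> float:
    """Heapster collection cycles in one hour for a given --metric_resolution."""
    return 3600 / metric_resolution_seconds


old_default = cycles_per_hour(60)  # previous default: 60 seconds
new_default = cycles_per_hour(30)  # new default: 30 seconds

# Ratio of new to old traffic, assuming equal data volume per cycle.
print(f"60 s: {old_default:.0f}/hour, 30 s: {new_default:.0f}/hour, "
      f"ratio: {new_default / old_default:.1f}x")
```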

    In the 'Moviri Integrator Version 1.3.05 for TrueSight Capacity Optimization - Kubernetes' patch the following additional messages were added to monitor the ETL's Heapster data queue:
     

    2019-01-23 08:03:06,878 INFO  [pool-6-thread-1]- QUEUE_MONITOR - Queue length: 0  
    2019-01-23 08:08:06,877 INFO  [pool-6-thread-1]- QUEUE_MONITOR - Queue length: 5513  
    2019-01-23 08:13:06,877 INFO  [pool-6-thread-1]- QUEUE_MONITOR - Queue length: 59632  
    2019-01-23 08:18:06,877 INFO  [pool-6-thread-1]- QUEUE_MONITOR - Queue length: 94444  
    2019-01-23 08:23:06,877 INFO  [pool-6-thread-1]- QUEUE_MONITOR - Queue length: 129629  
    2019-01-23 08:28:06,878 INFO  [pool-6-thread-1]- QUEUE_MONITOR - Queue length: 165406  
    2019-01-23 08:33:06,877 INFO  [pool-6-thread-1]- QUEUE_MONITOR - Queue length: 201579  
    2019-01-23 08:38:08,983 INFO  [pool-6-thread-1]- QUEUE_MONITOR - Queue length: 227187  
    2019-01-23 08:43:12,290 INFO  [pool-6-thread-1]- QUEUE_MONITOR - Queue length: 243636  
    2019-01-23 08:48:06,877 INFO  [pool-6-thread-1]- QUEUE_MONITOR - Queue length: 256235  
    2019-01-23 08:53:15,457 INFO  [pool-6-thread-1]- QUEUE_MONITOR - Queue length: 263251  
    2019-01-23 08:58:15,145 INFO  [pool-6-thread-1]- QUEUE_MONITOR - Queue length: 268614 
      
     
    This steady increase in queue length will eventually cause the process to run out of memory.
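
    A growing queue can be spotted by extracting the QUEUE_MONITOR values from the ETL log. The following is a minimal sketch, not part of the product: the function names and the three-sample threshold are assumptions chosen for illustration, and the regular expression assumes the log line layout shown above.

```python
import re
from datetime import datetime

# Matches QUEUE_MONITOR lines in the layout shown in this article.
QUEUE_LINE = re.compile(
    r"^\s*(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\s+INFO\s+"
    r".*QUEUE_MONITOR - Queue length: (\d+)"
)


def parse_queue_lengths(lines):
    """Return (timestamp, queue_length) pairs for every QUEUE_MONITOR line."""
    samples = []
    for line in lines:
        match = QUEUE_LINE.match(line)
        if match:
            ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            samples.append((ts, int(match.group(2))))
    return samples


def is_backing_up(samples, min_samples=3):
    """True when the last `min_samples` readings are strictly increasing.

    The threshold of 3 is an arbitrary illustrative choice.
    """
    recent = [length for _, length in samples[-min_samples:]]
    return len(recent) == min_samples and all(
        a < b for a, b in zip(recent, recent[1:])
    )


def growth_per_minute(samples):
    """Average queue growth (entries per minute) from first to last sample."""
    (t0, q0), (t1, q1) = samples[0], samples[-1]
    minutes = (t1 - t0).total_seconds() / 60
    return (q1 - q0) / minutes if minutes else 0.0


log_excerpt = [
    "2019-01-23 08:03:06,878 INFO  [pool-6-thread-1]- QUEUE_MONITOR - Queue length: 0",
    "2019-01-23 08:08:06,877 INFO  [pool-6-thread-1]- QUEUE_MONITOR - Queue length: 5513",
    "2019-01-23 08:13:06,877 INFO  [pool-6-thread-1]- QUEUE_MONITOR - Queue length: 59632",
    "2019-01-23 08:18:06,877 INFO  [pool-6-thread-1]- QUEUE_MONITOR - Queue length: 94444",
]
samples = parse_queue_lengths(log_excerpt)
print("backing up:", is_backing_up(samples))
print("growth per minute:", round(growth_per_minute(samples)))
```

    A queue length that rises on every sample, as in the excerpt, indicates that the ETL is not keeping up with Heapster; a queue that returns to 0 between samples does not.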

     


    SOLUTION:

    Moviri has released a new version of the ETL (Moviri Integrator for TrueSight Capacity Optimization - Kubernetes.addon) which has been changed to improve the processing of the data for the ETL:

       
    • Performance improvement, with focus on memory management
    • Entity identification
    • Back-compatibility with the previous version
    • Forward-compatibility with the upcoming integration based on Prometheus (instead of Heapster, which is deprecated)
    • Solved the error raised when running POD queries: "org.h2.jdbc.JdbcSQLException: Column "Q1.POD_ID" must be in the GROUP BY list; SQL statement:"
    • Implemented Entity Tag to import Kubernetes Label
    • Support for multiple Kubernetes ETLs on the same TSCO ETL Engine
    • Fixed the ack that the ETL sends back to Heapster (trailing '}')
    • Reviewed H2 queries to avoid SQL error messages
    • Improved error handling on forbidden/not found when accessing Kubernetes API
    • Improved "Batch Processing" message, showing the number of total rows inserted on last save
    • Improved clean-up logic on Heapster table rollover
    • [TSCO 10.7 / TSCO 11.3] Metric Resolution fixed for all the Kubernetes entities (before, the ETL was importing data at hourly resolution)
    • [TSCO 10.7] Reviewed support for TSCO 10.7
        • Removed Kubernetes Label support (not available in TSCO 10.7)
        • Fixed library dependencies
        • Solved issue on Namespace Metric resolution
    • Enhanced Moviri Integrator deployment, which now takes care of cleaning the environment automatically
    • Better file system utilization from the H2 Database File, deleting the file once a day
    • Support for NET_BIT* metrics
    • File System Utilization improvement (by actually freeing up space) ***patch version 2.1.67 or 2.0.67 and later***
      The additional hidden property “extract.k8s.db.refresh” allows the TSCO administrator to define how frequently the H2 database is refreshed. The property accepts a number from 4 to 24 that defines the validity period of the H2 database in hours (for example, with a value of “4” the H2 database is refreshed every 4 hours). If the property is not defined, the H2 database is deleted once a day at midnight. The property is available only from the Advanced Run Configuration Editor (see below).
    • The new version implements a new property "extract.k8s.db.timeout" that increases the grace period used by the thread while waiting for the H2 database to be deleted and restored. The property accepts a value in milliseconds. Install the patch and then set the ETL configuration property "extract.k8s.db.timeout" to 30 seconds (30000) to help with the following error:
      
      BCO_ETL_FAIL112: Detected error during service 423 data saving to BCO database: org.h2.jdbc.JdbcSQLException: Timeout trying to lock table "HEAPSTER_2_LAST";  
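
    Before entering the two H2-related properties in the ETL configuration, their values can be sanity-checked. The sketch below is illustrative only: the function name is hypothetical, the 4 to 24 range is treated as inclusive, and the requirement that the timeout be a positive whole number is an assumption, since the article only states that the value is in milliseconds.

```python
def validate_k8s_db_properties(props: dict) -> list:
    """Return a list of problems found in the two H2-related ETL properties."""
    problems = []

    # Unset means the H2 database is deleted once a day at midnight.
    refresh = props.get("extract.k8s.db.refresh")
    if refresh is not None:
        text = str(refresh)
        if not text.isdecimal() or not 4 <= int(text) <= 24:
            problems.append(
                "extract.k8s.db.refresh must be a whole number of hours from 4 to 24"
            )

    # Assumption: the timeout must be a positive whole number of milliseconds.
    timeout = props.get("extract.k8s.db.timeout")
    if timeout is not None:
        text = str(timeout)
        if not text.isdecimal() or int(text) <= 0:
            problems.append(
                "extract.k8s.db.timeout must be a positive number of milliseconds"
            )

    return problems


recommended = {"extract.k8s.db.refresh": "4", "extract.k8s.db.timeout": "30000"}
print(validate_k8s_db_properties(recommended))
print(validate_k8s_db_properties({"extract.k8s.db.refresh": "2"}))
```

    The first call uses the values mentioned in this article and reports no problems; the second reports the out-of-range refresh value.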
      
     
      
    The patch has been split into two versions: one for TSCO 10.7.01 and one for TSCO 11.0 and later. The top-level URL for both patches is https://drive.google.com/drive/folders/11b36RWOWIiWpQrjxmf9KoYMGrI2DsciB and the direct links for each version are:
        
         
     
    To deploy the new ETL version use these steps:  
      
      
    - Stop the running Kubernetes ETL  
    - Install the new version of the Moviri Integrator for Kubernetes  
    - Delete all the Kubernetes H2 files under $CPIT_BASE/scheduler/bin/localdb/k8s*.h2.db   
    - Restart the scheduler with ./cpit clean scheduler  
    - Start the Kubernetes ETL 
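
    The H2 file deletion step can be scripted. The following is a minimal sketch under stated assumptions: the function names are hypothetical, CPIT_BASE is read from the environment, and the script defaults to a dry run that only lists the files. Stopping and starting the ETL and restarting the scheduler are still done as described in the steps above.

```python
import glob
import os


def find_k8s_h2_files(cpit_base: str) -> list:
    """List Kubernetes H2 files under <cpit_base>/scheduler/bin/localdb."""
    pattern = os.path.join(cpit_base, "scheduler", "bin", "localdb", "k8s*.h2.db")
    return sorted(glob.glob(pattern))


def delete_k8s_h2_files(cpit_base: str, dry_run: bool = True) -> list:
    """Delete the matching files, or only report them when dry_run is True."""
    files = find_k8s_h2_files(cpit_base)
    if not dry_run:
        for path in files:
            os.remove(path)
    return files


if __name__ == "__main__":
    base = os.environ.get("CPIT_BASE")
    if base:
        # Dry run: print what would be removed without deleting anything.
        for path in delete_k8s_h2_files(base, dry_run=True):
            print("would delete:", path)
    else:
        print("CPIT_BASE is not set")
```

    Run the deletion only while the Kubernetes ETL is stopped, and pass dry_run=False once the listed files have been reviewed.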

     


    Article Number:

    000163506


    Article Type:

    Solutions to a Product Problem


