TrueSight Capacity Optimization (TSCO) components are reporting a "java.lang.OutOfMemoryError: unable to create new native thread" error

Version 1

    This document contains official content from the BMC Software Knowledge Base. It is automatically updated when the knowledge article is modified.


    PRODUCT:

    TrueSight Capacity Optimization


    APPLIES TO:

    TrueSight Capacity Optimization 11.3.01



    PROBLEM:

    The TrueSight Capacity Optimization (TSCO) Scheduler and Data Hub are intermittently reporting an "Out Of Memory" issue on the Application Server.

    For example:

     
      BCO_DH_FAIL002: wh-sys-launcher error detected - going to sleep - sleep timeout 600s 
    StackTrace: java.lang.OutOfMemoryError: unable to create new native thread 
    at java.lang.Thread.start0(Native Method) 
    at java.lang.Thread.start(Thread.java:717) 
    at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957) 
    at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1378) 
    at com.neptuny.cpit.warehouse.AbstractThreadLauncher.run(AbstractThreadLauncher.java:267)
      
    But the DATAHUB_HEAP_SIZE and SCHEDULER_HEAP_SIZE for the environment seem appropriate.  Also, the java.lang.OutOfMemoryError is associated with an "unable to create new native thread" message rather than the usual message regarding garbage collection.

    There are also similar errors intermittently being reported in the Datahub's catalina.out file:

       
      Exception in thread "Timer-2" java.lang.OutOfMemoryError: unable to create new native thread
            at java.lang.Thread.start0(Native Method)
            at java.lang.Thread.start(Thread.java:717)
            at com.bmc.bco.scheduler.commander.DatabaseSchedulerCommander$TaskExecReqReader.run(DatabaseSchedulerCommander.java:1341)
            at java.util.TimerThread.mainLoop(Timer.java:555)
            at java.util.TimerThread.run(Timer.java:505)

     


    SOLUTION:

    In this case, the "java.lang.OutOfMemoryError: unable to create new native thread" is not related to an actual out-of-memory condition. It occurs because the TSCO Installation Owner user is running up against the 'ulimit -a' "max user processes" setting, which on Linux actually limits the maximum number of user _threads_, not processes.
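That relationship can be checked directly: on Linux every thread counts against the "max user processes" limit, so comparing the per-user thread total to the soft limit shows how close the user is to the ceiling. A minimal sketch (the user name is taken from the current shell; on a TSCO system this would be run as the Installation Owner):

```shell
# Soft "max user processes" limit for the current shell
ulimit -Su

# Total number of threads (lightweight processes) owned by the current user.
# On Linux this thread total is what counts against the limit above.
ps -eLf | awk -v u="$(id -un)" '$1 == u' | wc -l
```

If the second number is approaching the first, new thread creation by any process owned by that user is about to start failing with this error.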

    This web page discusses a number of possible reasons for that error:
      https://dzone.com/articles/troubleshoot-outofmemoryerror-unable-to-create-new

    TSCO 11.3.01 Defect

      

    In TSCO 11.3.01 CHF 11.3.01.001.C0007 and earlier the TSCO "Event Manager" task may cause the TSCO Scheduler thread usage to increase each day and eventually the Primary Scheduler will use all available threads for the TSCO Installation Owner user resulting in new thread allocation by any TSCO component to fail.

    This problem has been tracked by defect DRCOZ-19932: 'Event manager - end processing is not properly cleaning allocated threads causing JVM errors' and a fix is available in the TSCO 11.3.01 Cumulative Hot Fix (CHF) 11.3.01.001.C00008 and later package.

    Information on how to obtain the latest CHF is available here:
      000097159: Cumulative Hot Fixes for TrueSight Capacity Optimization (CO), CO Gateway Server, and CO Agent, and CO Perceiver (https://bmcsites.force.com/casemgmt/sc_KnowledgeArticle?sfdcid=000097159)

      


    Workaround

       A good workaround (although it requires another Datahub restart, and would probably need to be done by the system administrator to be permanent) is to increase the "max user processes" value on the machine.

     

      

    Option A: Change the limit at the system level

      
      To change it permanently:
     # vi /etc/security/limits.conf

    Add this line to the file (assuming the TSCO Installation Owner is the 'cpit' user):
      cpit soft nproc 4096
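To confirm the new limit is in effect, a fresh login shell for the Installation Owner can be checked before restarting any TSCO components (a sketch, again assuming the 'cpit' user; limits.conf is applied at login, so an existing shell will still show the old value):

```shell
# Open a fresh login shell as the install owner and print the soft
# "max user processes" limit; it should now report 4096
su - cpit -s /bin/bash -c 'ulimit -Su'
```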
      


    Option B: Set the 'user processes' soft limit via the customenvpre.sh file

      
      The customenvpre.sh file is a good way to increase the thread limit in an environment where the existing hard limit is already set sufficiently high. All TSCO processes inherit their environment from env.sh, which sources customenvpre.sh during execution, so a ulimit setting placed there is picked up by all TSCO processes on startup.

    So, you could just put the following inside the customenvpre.sh:
      
       
            ulimit -Su 4096
      
     
      That line can go at the beginning or end of that file, and it will raise the thread limit for all of the TSCO processes when they are started.
      


    Option C: Set a temporary limit in the shell where the TSCO processes are restarted

      
      To change it temporarily (just in the session where you restart the Datahub or other components):
      
       
            ulimit -Su 4096 # Set the thread limit to 4096 threads
            ulimit -a | grep "max user processes"
            ./cpit restart datahub
      
     
      That assumes that the following command returns a value of 4096 or higher (that is the hard limit set at the OS level):
      
       
            ulimit -aH | grep "max user processes"
      

     

      

    Additional Debugging

       It may be useful to try to determine which process is causing the environment to reach the thread limit for the TSCO Installation Owner user.

    A command like this would be a good starting point for the analysis:
      ps -efL | grep cpit | wc -l

    But that would just give a total count of the number of in-use threads.  A more detailed analysis breaking down the thread count by process (using a variation of that command) would be necessary to understand why the thread limit was being exceeded.
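One way to get that per-process breakdown is the standard Linux procps `nlwp` output column, sketched below (the 'cpit' user is the assumed Installation Owner name from the earlier examples):

```shell
# Thread count per process for the 'cpit' user, largest first.
# nlwp = "number of lightweight processes", i.e. threads in that process.
ps -o nlwp,pid,etime,comm -u cpit --sort=-nlwp | head -15
```

A process whose `nlwp` value grows steadily between restarts (as the Scheduler does under defect DRCOZ-19932) is the likely source of the thread exhaustion.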
      


    Additional Information

       It looks like the OS default thread limit on Red Hat Enterprise Linux (RHEL) has been increased from 1024 in RHEL 6 to 4096 in RHEL 7.
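Where that default actually comes from can be checked on the machine itself. On RHEL-family systems the per-user nproc default is typically set in a drop-in file under /etc/security/limits.d/ (the exact file name varies by release), so a sketch like this shows both the configured and the effective values:

```shell
# Configured nproc defaults; -s suppresses errors if no drop-in file exists
grep -hs nproc /etc/security/limits.conf /etc/security/limits.d/*.conf

# Effective soft and hard "max user processes" limits for the current shell
ulimit -Su
ulimit -Hu
```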
      

     


    Article Number:

    000162005


    Article Type:

    Solutions to a Product Problem


