TSSA/BSA: Connection reset error while running database cleanup jobs

Version 1

    This document contains official content from the BMC Software Knowledge Base. It is automatically updated when the knowledge article is modified.


    PRODUCT:

    TrueSight Server Automation


    COMPONENT:

    TrueSight Server Automation



    PROBLEM:

    While running any of the blcli-based cleanup commands (e.g. blcli Delete cleanupDatabase) via an NSH Script Job, you see the following error in the job run logs:
     

    Command execution failed. com.bladelogic.om.infra.app.remote.BlRemoteException: com.bladelogic.om.infra.mfw.util.BlAppServerException: Connection reset
    The command 'Delete cleanupDatabase' failed. Please run 'Cleanup Diagnostic Test' for further details.
      
    Across multiple runs, the error occurs after the same amount of time on each run, e.g. after 5 minutes:

    "BSA Recommended Database Cleanup Job Run at Feb 18, 2019 11:30:31 AM",Info,"Feb 18, 2019 11:32:48 AM",[Mon Feb 18 11:32:48 CET 2019] Executing stored procedure: OldVersionJob
    "BSA Recommended Database Cleanup Job Run at Feb 18, 2019 11:30:31 AM",Info,"Feb 18, 2019 11:37:59 AM",Command execution failed.  com.bladelogic.om.infra.mfw.util.BlAppServerException: Connection reset[Mon Feb 18 11:37:56 CET 2019] Started execution retention policy
      
    Here the job started at 11:30, a stored procedure began executing at 11:32, and the connection was reset at 11:37, after about 5 minutes with no further log activity.  The distinguishing factor is that the interval is the same on every run in a given environment: always 5 minutes, or always 10 minutes, and so on.
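    To compare the interval across several runs, the gap between the last normal log entry and the "Connection reset" entry can be computed from an exported job run log. The following is a minimal sketch, assuming the log uses the comma-separated layout shown in the excerpt above (job run name, level, timestamp, message); the function name seconds_before_reset is hypothetical and not part of any BMC tooling.

```python
import csv
from datetime import datetime

# Timestamp layout assumed from the sample log, e.g. "Feb 18, 2019 11:32:48 AM"
TS_FORMAT = "%b %d, %Y %I:%M:%S %p"


def seconds_before_reset(log_lines):
    """Return the seconds between the last entry before a 'Connection reset'
    entry and that entry itself, or None if no such pair is found."""
    previous = None
    for row in csv.reader(log_lines):
        if len(row) < 4:
            continue  # not a job log record in the assumed layout
        stamp = datetime.strptime(row[2], TS_FORMAT)
        message = ",".join(row[3:])  # the message may itself contain commas
        if "Connection reset" in message:
            if previous is None:
                return None
            return (stamp - previous).total_seconds()
        previous = stamp
    return None


# Two records taken from the log excerpt in this article
sample = [
    '"BSA Recommended Database Cleanup Job Run at Feb 18, 2019 11:30:31 AM",Info,'
    '"Feb 18, 2019 11:32:48 AM",[Mon Feb 18 11:32:48 CET 2019] Executing stored procedure: OldVersionJob',
    '"BSA Recommended Database Cleanup Job Run at Feb 18, 2019 11:30:31 AM",Info,'
    '"Feb 18, 2019 11:37:59 AM",Command execution failed.  '
    'com.bladelogic.om.infra.mfw.util.BlAppServerException: Connection reset',
]

gap = seconds_before_reset(sample)
print(gap)  # 311.0 seconds, i.e. roughly 5 minutes
```

    If the value is close to the same figure for every failed run, the symptom matches the pattern described in this article.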

      

     


    CAUSE:

    Load Balancer configuration


    SOLUTION:

    This error can occur in environments that use a load balancer to distribute incoming GUI connections across appservers.

    If you have separate JOB and CONFIG servers, check whether the AppServiceURLs setting in each JOB instance points to the VIP.  If it does, change it to point to the server itself on that instance's AppSvcPort, or leave the setting blank.  To check the values, run:

    blasadmin -s <instance name> show auth all
    blasadmin -s <instance name> show app all
      
    If you have only ALL instances, add an alias in the hosts file on each appserver so that the VIP resolves to that appserver itself.
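    As an illustration, a hosts file entry of this kind might look like the fragment below. The IP address and the VIP host name are hypothetical placeholders, not values from any real environment; substitute the appserver's own address and the name your clients use for the VIP.

```
# hosts file on one appserver (for example /etc/hosts on Linux)
# 192.0.2.11 is assumed to be THIS appserver's own IP address.
# bsa-vip.example.com is assumed to be the load balancer VIP host name.
192.0.2.11    bsa-vip.example.com
```

    Each appserver gets its own entry that maps the VIP name to its own address, so the entry differs from node to node.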

    Then restart the appserver service on the appserver(s). 

    The connection reset can happen in the above configuration when a node behind a VIP goes out through the load balancer, connects to the VIP, and is then directed to a node behind that same VIP.  This is not a typical network connection path, and it results in the connection being broken after some time.  There is no reason for the blcli on a JOB instance to connect to anything other than that instance, so the AppServiceURLs setting on a JOB instance should always point to itself.

    If you are using ALL instances and are unable to add the alias, then you should consult your Load Balancer documentation to see if it can be configured for the scenario where you want nodes behind a VIP to be able to access and use the VIP. 

    Some load balancers can be configured to handle the network connection path noted above.  F5 calls this SNAT.  However, this is not recommended for two reasons:

    • Sending requests to another appserver increases troubleshooting complexity: you won't know which node your request went to.
    • Sending requests to another appserver introduces a dependency on that node.  If that node goes down, any blcli requests routed to it from other nodes will fail, whereas if each node uses itself, one node can fail without affecting blcli requests from other nodes.

     


    Article Number:

    000167563


    Article Type:

    Solutions to a Product Problem


