Standby App Visibility Portal and/or App Visibility Collector in High Availability (HA) mode will not start due to corrupted PostgreSQL database

Version 1

    This document contains official content from the BMC Software Knowledge Base. It is automatically updated when the knowledge article is modified.


    PRODUCT:

    TrueSight App Visibility Manager Server


    COMPONENT:

    App Visibility Portal


    APPLIES TO:

    App Visibility Portal (Standby HA) - All versions
    App Visibility Collector (Standby HA) - All versions



    PROBLEM:

     

    When the Standby App Visibility Portal and/or App Visibility Collector is started, it connects to the Active App Visibility Portal and/or App Visibility Collector and starts to copy the PostgreSQL database from it.  During this copy process, the PostgreSQL database files and directories are deleted from the /AppVisibility/ADOP_DB/pgsql/data directory on the Standby node and replaced with the database files and directories copied from the Active node.

    If the App Visibility Portal and/or App Visibility Collector process or the PostgreSQL database process is stopped on the Standby node at any point during this copy process, the copy will not complete and the PostgreSQL database will become corrupted.  The Standby App Visibility Portal and/or App Visibility Collector will then fail to start properly.

    To confirm that the PostgreSQL database on the Standby node has this issue, check the following items:

    Item 1) In the Application Log within the Windows Event Viewer, there should be Critical events like the following:

    ----------- Start of Critical event -----------
    The description for Event ID 0 from source PostgreSQL cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.

    If the event originated on another computer, the display information had to be saved with the event.

    The following information was included with the event:

    postgres: could not access the server configuration file "C:/Program Files/BMC Software/App Visibility/ADOP_DB/pgsql/data/postgresql.conf": No such file or directory
    ----------- End of Critical event -----------

    and/or

    ----------- Start of Critical event -----------
    The description for Event ID 0 from source PostgreSQL cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.

    If the event originated on another computer, the display information had to be saved with the event.

    The following information was included with the event:

    FATAL:  configuration file "C:/Program Files/BMC Software/App Visibility/ADOP_DB/pgsql/data/postgresql.conf" contains errors
    ----------- End of Critical event -----------

    Item 2) In the C:/Program Files/BMC Software/App Visibility/adop_db/pgsql/data/pg_log directory, there are no PostgreSQL database logs that start with "postgres".

    Item 3) In the C:/Program Files/BMC Software/App Visibility/adop_db/pgsql/data directory, there are no PostgreSQL database logs, and the postgresql.conf file mentioned in the Application Log events is missing.

    Item 4) If the issue occurs on the Standby App Visibility Portal, the portal.log file will contain messages about failing to configure the PostgreSQL database for HA, and the Portal service will be stopped.

    Item 5) If the issue occurs on the Standby App Visibility Collector, the collector.log file will contain messages such as “[HA] [Thread-2] [ERROR] - Connection refused: connect”, and the Collector service will be stopped.
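
    The file checks in Item 2 and Item 3 can be scripted.  The following is a minimal Python sketch, not a BMC-supplied tool: the function name check_standby_data_dir is hypothetical, and the path in the example is the installation path quoted in the events above, which may differ on your system.  It only reports whether postgresql.conf is missing and whether pg_log contains any file whose name starts with "postgres".

```python
from pathlib import Path


def check_standby_data_dir(data_dir):
    """Return a list of findings that correspond to Item 2 and Item 3."""
    data_dir = Path(data_dir)
    findings = []

    # Item 3: postgresql.conf should exist in the data directory.
    if not (data_dir / "postgresql.conf").is_file():
        findings.append("postgresql.conf is missing")

    # Item 2: pg_log should contain files whose names start with "postgres".
    pg_log = data_dir / "pg_log"
    has_logs = pg_log.is_dir() and any(
        p.is_file() and p.name.startswith("postgres") for p in pg_log.iterdir()
    )
    if not has_logs:
        findings.append('no log files starting with "postgres" in pg_log')

    return findings


if __name__ == "__main__":
    # Assumed path, taken from the event text above; adjust as needed.
    data_dir = r"C:\Program Files\BMC Software\App Visibility\adop_db\pgsql\data"
    for finding in check_standby_data_dir(data_dir):
        print("Possible problem:", finding)
```

    An empty result does not prove the database is healthy; it only means these two symptoms are absent.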

     


    CAUSE:

    The PostgreSQL database files and directories on the Standby node were not copied successfully from the Active node.


    SOLUTION:

     

    To resolve this issue, uninstall and reinstall the App Visibility Portal and/or App Visibility Collector on the Standby node.  This forces the App Visibility component to start with a blank copy of the database; it then connects back to the Active App Visibility component and starts copying the database from the Active node.

    Before proceeding to the solution, ensure the following prerequisites are met:

    Item 1) Make a backup of the App Visibility Portal and/or App Visibility Collector on the Active node.

    Item 2) While performing the steps, do not stop the App Visibility Portal and/or App Visibility Collector or the App Visibility PostgreSQL database on the Active node.

    Once the backup in Item 1 is complete, and with Item 2 in mind, perform the following steps:

    Step 1) Uninstall the existing App Visibility Portal and/or App Visibility Collector on the Standby node

    Step 2) Install a new App Visibility Portal and/or App Visibility Collector on the Standby node

    Note 1: Ensure this App Visibility component is exactly the same version as the App Visibility component on the Active node.

    Note 2: Ensure the FQDNs for the Active and Standby nodes are entered with the same case on both App Visibility components, because they are case sensitive.  For example, if the FQDN is entered as "ABCnode1" (without the quotes) on the Active App Visibility node, enter it as "ABCnode1" (without the quotes) on the Standby App Visibility node as well.

    Step 3) After the App Visibility component is installed successfully on the Standby node, ensure the following components are started:

    - App Visibility Portal and/or App Visibility Collector
    - App Visibility PostgreSQL database

    At this point, the Standby node should be copying the PostgreSQL database from the Active node.  The copy may take from several minutes to a few hours, depending on the size of the PostgreSQL database.  While the copy is in progress, do not stop any of the App Visibility components on the Active node or the Standby node.  The App Visibility components start and stop on their own during the copy, so it is important not to interfere with this process.

    Step 4) To confirm that the database copy was successful and that the Standby App Visibility Portal and/or App Visibility Collector started successfully, open the portal.log (App Visibility Portal) and/or the collector.log (App Visibility Collector) on the Standby node and look for the following lines:

      
       
    • Starting to copy primary node data dir
    • Primary node data dir copied successfully
    • Configured database server as [HA_STANDBY]

      

    If the "Configured database server as [HA_STANDBY]" message appears in the log file, the database was copied successfully from the Active node, and the Standby App Visibility Portal and/or App Visibility Collector is running.

    However, if the following line appears in the log file:

      
       
    • Configured database server as [HA_ACTIVE]

      

    then the Standby App Visibility Portal and/or App Visibility Collector did not connect successfully to the Active node and now considers itself the Active node.  In that case, stop the App Visibility Portal/App Visibility Collector service or process and determine the cause of this behavior.

    Note: The App Visibility PostgreSQL database does not need to be stopped.
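
    The log check in Step 4 can be sketched in Python as follows.  This is an illustrative helper, not part of the product: the function name classify_ha_log is hypothetical, and it assumes only that each message quoted above appears somewhere within a log line.  The last matching message determines the reported state.

```python
import sys


def classify_ha_log(lines):
    """Return 'standby', 'active', 'copying', 'copied', or 'unknown',
    based on the last HA-related message found in the given lines."""
    state = "unknown"
    for line in lines:
        if "Configured database server as [HA_STANDBY]" in line:
            state = "standby"   # copy finished, node is running as Standby
        elif "Configured database server as [HA_ACTIVE]" in line:
            state = "active"    # node considers itself Active; investigate
        elif "Primary node data dir copied successfully" in line:
            state = "copied"    # copy done, role not yet logged
        elif "Starting to copy primary node data dir" in line:
            state = "copying"   # copy in progress; do not stop any component
    return state


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python classify_ha_log.py <path to portal.log or collector.log>
    with open(sys.argv[1], errors="replace") as log_file:
        print(classify_ha_log(log_file))
```

    A result of "copying" or "copied" means the process has not finished yet, so check the log again later rather than restarting anything.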


     

     


    Article Number:

    000350724


    Article Type:

    Solutions to a Product Problem


