8 Replies Latest reply on Dec 21, 2016 1:20 PM by Anuparn Padalia

    Issue with TSPS Secondary Node

    Anuparn Padalia

      Hi All,

       

      I am facing an issue with my TSPS HA environment: I am unable to open the URL of the secondary node, which is the standby node. When I checked, the TrueSight Presentation Server service was down. I then checked the truesight.log file, which shows the following error:

       

      ##################### ERROR  ##########################

      ERROR 12/14 15:17:32.388 [Timer-2] c.b.t.a.u.DatabaseUtil BMC_TS-PL000007F    Failed to create database connection.

      org.postgresql.util.PSQLException: FATAL: the database system is shutting down

          at org.postgresql.core.v3.ConnectionFactoryImpl.doAuthentication(ConnectionFactoryImpl.java:420) ~[postgresql-9.4-1201.jdbc41.jar:9.4]

          at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:195) ~[postgresql-9.4-1201.jdbc41.jar:9.4]

          at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:66) ~[postgresql-9.4-1201.jdbc41.jar:9.4]

          at org.postgresql.jdbc2.AbstractJdbc2Connection.<init>(AbstractJdbc2Connection.java:127) ~[postgresql-9.4-1201.jdbc41.jar:9.4]

          at org.postgresql.jdbc3.AbstractJdbc3Connection.<init>(AbstractJdbc3Connection.java:29) ~[postgresql-9.4-1201.jdbc41.jar:9.4]

          at org.postgresql.jdbc3g.AbstractJdbc3gConnection.<init>(AbstractJdbc3gConnection.java:21) ~[postgresql-9.4-1201.jdbc41.jar:9.4]

          at org.postgresql.jdbc4.AbstractJdbc4Connection.<init>(AbstractJdbc4Connection.java:41) ~[postgresql-9.4-1201.jdbc41.jar:9.4]

          at org.postgresql.jdbc4.Jdbc4Connection.<init>(Jdbc4Connection.java:24) ~[postgresql-9.4-1201.jdbc41.jar:9.4]

          at org.postgresql.Driver.makeConnection(Driver.java:414) ~[postgresql-9.4-1201.jdbc41.jar:9.4]

          at org.postgresql.Driver.connect(Driver.java:282) ~[postgresql-9.4-1201.jdbc41.jar:9.4]

          at java.sql.DriverManager.getConnection(Unknown Source) ~[na:1.8.0_51]

          at java.sql.DriverManager.getConnection(Unknown Source) ~[na:1.8.0_51]

          at com.bmc.truesight.api.util.DatabaseUtil.getDatabaseConnection(DatabaseUtil.java:71) ~[core-api.jar:na]

          at com.bmc.truesight.api.util.DatabaseUtil.isDatabaseReachable(DatabaseUtil.java:51) ~[core-api.jar:na]

          at com.bmc.truesight.platform.monitoring.DBMonitoring.isDBReachable(DBMonitoring.java:61) [platform.jar:na]

          at com.bmc.truesight.platform.monitoring.DBMonitoring.isOK(DBMonitoring.java:51) [platform.jar:na]

          at com.bmc.truesight.api.util.monitoring.task.AbstractSHMTracker$SHMTimerTask.run(AbstractSHMTracker.java:62) [core-api.jar:na]

          at java.util.TimerThread.mainLoop(Unknown Source) [na:1.8.0_51]

          at java.util.TimerThread.run(Unknown Source) [na:1.8.0_51]

      #############################################################
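For anyone triaging similar logs, here is a minimal Python sketch that pulls the level, logger, and BMC_TS message code out of truesight.log entries. The line layout is inferred only from the excerpts in this thread, and `parse_log_line` and `error_codes` are hypothetical helper names, not part of any BMC tooling.

```python
import re

# Layout inferred from the excerpts in this thread (an assumption, not a documented format):
#   LEVEL MM/DD HH:MM:SS.mmm [thread] logger [BMC_TS-code] message
LOG_LINE = re.compile(
    r"^(?P<level>[A-Z]+)\s+"
    r"(?P<timestamp>\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[(?P<thread>[^\]]+)\] "
    r"(?P<logger>\S+)\s+"
    r"(?:(?P<code>BMC_TS-\w+)\s+)?"   # the message code is optional (INFO lines have none)
    r"(?P<message>.*)$"
)

def parse_log_line(line):
    """Return a dict of fields for a log entry, or None for stack-trace or other lines."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None

def error_codes(lines):
    """Collect the BMC_TS codes of all ERROR entries, in order of appearance."""
    codes = []
    for line in lines:
        entry = parse_log_line(line)
        if entry and entry["level"] == "ERROR" and entry["code"]:
            codes.append(entry["code"])
    return codes
```

Stack-trace lines (those starting with `at ...` or an exception class name) do not match the pattern and are skipped, so only the entry headers are reported.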

       

      When I check the logs, startup appears to get stuck after these lines:

       

      INFO  12/15 08:08:56.825 [main] c.b.t.p.c.CellServcieStartup Successfully started cell service

      INFO  12/15 08:08:56.825 [main] c.b.s.n.s.m.TSMessageServiceStartup Starting Messaging Service ....

      INFO  12/15 08:08:56.840 [main] c.b.s.n.s.m.TSMessageServiceStartup Successfully initialized Messaging Service ....

       

      Please let me know if anyone has faced a similar issue in an HA environment.

       

      ./Anuparn Padalia

        • 1. Re: Issue with TSPS Secondary Node

          Hi Anuparn,

           

          I have found that the error below is very generic:

          Failed to create database connection.

          org.postgresql.util.PSQLException: FATAL: the database system is shutting down

           

          Have you tried bringing up the services? Is the secondary node still down?

          • 2. Re: Issue with TSPS Secondary Node
            Anuparn Padalia

            Well Sachin, I tried bringing up the service, and it is up right now.

             

            But it seems the TSPS secondary is not coming up completely; as per the logs above, it gets stuck before starting the Elasticsearch DB.

             

            Now BMC Support is asking me to make changes in cache.conf and then do a failover exercise from primary to secondary. I am 100% sure that if I do that, it will bring down my primary node as well, and my PostgreSQL DB on the primary will crash.

             

            I am currently analyzing all the risks involved in making these changes.

             

            ./Anuparn Padalia

            • 3. Re: Issue with TSPS Secondary Node
              Anuparn Padalia

              Hi All,

               

              BMC has accepted this as a product defect.

               

              A solution has been shared. I have not tested it yet, but I will verify it and publish the outcome here.

               

              KnowledgeArticle - BMC

               

              ./Anuparn Padalia

              • 4. Re: Issue with TSPS Secondary Node
                Sistemas Securitas Direct

                Good morning Anuparn

                 

                I'm facing the same issue, but on checking the workaround provided, I noticed that BMC has already included the change in the latest software I downloaded (the cache.conf file already includes the await-initial-transfer="false" property). So the workaround does not work for me.

                 

                I'll investigate another error seen in the tssh log:

                 

                BMC_TS-CA000016E      Unable to get property [port] from DB conf file at [/opt/bmc/TrueSightPServer/truesightpserver/bin/../data/pgsql/postgresql.conf] Exception while reading PostgreSQL configuration file.
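One thing that could be checked for this error (a hypothesis, not a confirmed cause) is whether postgresql.conf actually contains an uncommented `port` line. The Python sketch below reads a setting from the text of a postgresql.conf file; `read_setting` is a hypothetical helper written for this thread and says nothing about how tssh itself parses the file.

```python
import re

# postgresql.conf entries look like "name = value"; the "=" is optional and
# anything after "#" is a comment. String values may be single-quoted.
_SETTING = re.compile(
    r"^\s*(?P<name>[A-Za-z_][\w.]*)\s*(?:=\s*|\s+)"
    r"(?P<value>'(?:[^']|'')*'|[^\s#]+)"
)

def read_setting(conf_text, name):
    """Return the effective value of `name`, or None if it is absent or commented out."""
    value = None
    for line in conf_text.splitlines():
        match = _SETTING.match(line)
        if match and match.group("name").lower() == name.lower():
            raw = match.group("value")
            if raw.startswith("'"):
                raw = raw[1:-1].replace("''", "'")  # unquote, undo doubled quotes
            value = raw  # when a setting is repeated, the last occurrence is used
    return value
```

If this returns None for `port`, the file only has the commented-out default (or no `port` line at all), which would be worth comparing against a working node.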

                 

                kind regards

                Sergio

                • 5. Re: Issue with TSPS Secondary Node
                  Anuparn Padalia

                  Thanks for your input here.

                   

                  ./Anuparn Padalia

                  • 6. Re: Issue with TSPS Secondary Node
                    Anuparn Padalia

                    The same issue recurred in my QA environment, where I performed the steps below as per the KB article:

                    1. On the secondary TSPS server, edit the TrueSightPServer\truesightpserver\conf\ha\cache.conf file

                    2. Locate the following (state-transfer)

                       ===========
                       <state-transfer
                       timeout="240000"
                       enabled="true"
                       chunk-size="10000"
                       />
                       ===========

                    3. Replace it with the following (state-transfer await-initial-transfer="false")

                       ===========
                       <state-transfer await-initial-transfer="false"
                       timeout="240000"
                       enabled="true"
                       chunk-size="10000"
                       />
                       ===========

                    4. Stop the TSPS server.

                    5. Run tssh ha copysnapshot.

                    6. Start the TSPS Server.

                    The problem was with the asynchronous transfer/communication used by the "tssh ha copysnapshot" command; adding the await-initial-transfer="false" attribute to state-transfer changes it to synchronous. (This is addressed in a future release, where it is synchronous by default.)
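The edit in steps 2 and 3 can be sketched in Python as a text substitution that leaves the rest of cache.conf untouched. This is an illustration only: `add_await_initial_transfer` is a hypothetical helper, it assumes the file contains a `<state-transfer ...>` element as quoted above, and it should be tried on a backup copy first.

```python
import re

# Match "<state-transfer" only when the element does not already carry the
# await-initial-transfer attribute, so the edit is safe to apply twice.
_STATE_TRANSFER = re.compile(
    r"<state-transfer(?=[\s/>])(?![^>]*\bawait-initial-transfer\s*=)"
)

def add_await_initial_transfer(conf_text):
    """Return conf_text with await-initial-transfer="false" added where missing."""
    return _STATE_TRANSFER.sub(
        '<state-transfer await-initial-transfer="false"', conf_text
    )
```

Because of the lookahead, a cache.conf that already ships with the attribute (as reported earlier in this thread) is returned unchanged.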

                     

                     

                    After this, the QA TSPS secondary node works fine.

                     

                    ./Anuparn Padalia

                    • 7. Re: Issue with TSPS Secondary Node
                      Sistemas Securitas Direct

                      Good afternoon

                       

                      I have just solved our problem. When installing TSPS in HA mode, the HA snapshot copy is not performed because the secondary node can't connect to the PostgreSQL DB on the primary:

                       

                      ERROR 12/21 12:59:05.748 [main] c.b.t.a.u.DatabaseUtil BMC_TS-PL000007F Failed to create database connection.

                      org.postgresql.util.PSQLException: Connection to localhost:5432 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.

                      Caused by: java.net.ConnectException: Connection refused

                       

                      The copysnapshot command:

                       

                      ./tssh ha copysnapshot

                      +-------------------------------------------------------------------------------------------+

                      |   BMC TrueSight Presentation Server - Command Line Interface 2016 version 10.5            |

                      |   Copyright 1997-2016. BMC Software, Inc. as an unpublished work.  All rights reserved.   |

                      +-------------------------------------------------------------------------------------------+

                      Copying snapshot ... Failed.

                      Copy the database from the other node resulted in an error.

                       

                      generates the following entries in db-remote-copy.log:

                       

                      Wed Dec 21 15:17:23 CET 2016 Copying DB from remote host :[MYHOSTNAME.MYDOMAIN] User:[patrol]

                      pg_basebackup: could not connect to server: FATAL:  no pg_hba.conf entry for replication connection from host "[MYIP]", user "patrol", SSL off

                       

                      So, on the primary node, I checked the file TrueSightPServer/truesightpserver/data/pgsql/pg_hba.conf:

                       

                      hostnossl replication patrol MYPRIMARYHOSTNAME.MYDOMAIN trust

                      hostnossl replication patrol MYSECONDARYHOSTNAME.MYDOMAIN trust

                       

                      The entries generated during installation use the FQDN, but the database is accessed with the short hostname (without the domain), so I added new entries to the file:

                       

                      hostnossl replication patrol MYPRIMARYHOSTNAME trust

                      hostnossl replication patrol MYSECONDARYHOSTNAME trust

                      and restarted TSPS on the primary.
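As a rough illustration of the change above, the Python sketch below reads the text of a pg_hba.conf file and proposes a short-hostname copy of every host entry whose address is an FQDN. `short_hostname_entries` and `_is_fqdn` are hypothetical helpers for this thread; the output is meant to be reviewed and added by hand, not written back automatically.

```python
from typing import List

def _is_fqdn(address: str) -> bool:
    """Heuristic: a dotted host name, not an IP/CIDR, keyword, or ".suffix" match."""
    return (
        "." in address
        and not address.startswith(".")
        and "/" not in address
        and ":" not in address
        and any(ch.isalpha() for ch in address)
    )

def short_hostname_entries(pg_hba_text: str) -> List[str]:
    """Return extra pg_hba.conf lines that use the short host name instead of the FQDN."""
    parsed = []
    existing = set()
    for raw in pg_hba_text.splitlines():
        fields = raw.split("#", 1)[0].split()   # drop comments, split on whitespace
        # host* records: TYPE DATABASE USER ADDRESS METHOD [options]
        if len(fields) >= 5 and fields[0].startswith("host"):
            parsed.append(fields)
            existing.add(tuple(fields))
    extra = []
    for fields in parsed:
        address = fields[3]
        if _is_fqdn(address):
            short = fields[:3] + [address.split(".", 1)[0]] + fields[4:]
            if tuple(short) not in existing:     # skip entries that are already present
                existing.add(tuple(short))
                extra.append(" ".join(short))
    return extra
```

Run against the two FQDN lines shown above, this yields the two short-hostname lines that were added manually.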

                       

                      After that, I was able to perform the copysnapshot from secondary node:

                       

                      tssh ha copysnapshot

                       

                      +-------------------------------------------------------------------------------------------+

                      |   BMC TrueSight Presentation Server - Command Line Interface 2016 version 10.5            |

                      |   Copyright 1997-2016. BMC Software, Inc. as an unpublished work.  All rights reserved.   |

                      +-------------------------------------------------------------------------------------------+

                       

                      Copying snapshot ... Done.

                      Successfully copied a snapshot of the database from the other node and initialized for execution.

                       

                      After restarting the secondary node, accessing it shows:

                       

                      Error

                      You accessed a system that is in standby mode. Contact your system administrator for assistance.

                       

                      I hope this will be helpful!

                      kind regards

                      Sergio Martín

                      • 8. Re: Issue with TSPS Secondary Node
                        Anuparn Padalia

                        Thanks, Sergio, for sharing the outcome!

                         

                        But the issue was different in my case: I provided the FQDN during installation, but after running for more than a week, the secondary node stopped working.

                         

                        In the case you highlighted, the issue is present from day one. This happens on almost all versions of TSPS.

                         

                        ./Anuparn Padalia