11 Replies Latest reply on Mar 11, 2016 1:22 PM by Bill Robinson

    When both Appserver are up jobs hang

    Neal Meagher

      I have an issue where, when both app servers are up, any job I run hangs. When I shut one down, jobs run. Appserver1 is also the NFS file server, mounted on appserver2. The keystore files are in sync. Not sure what the issue is. Version 8.3.03.

      Thanks

        • 1. Re: When both Appserver are up jobs hang
          Bill Robinson

          What does ‘hang’ mean?

          What’s in the appserver log when this happens?

          Which appserver picks up the job, and which picks up the WIT (work item thread) for the job, when there is a ‘hang’?

          What kind of jobs? Any?

          Can the appservers talk to each other on registryport and on minport through maxport?
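The connectivity question above can be checked from each appserver toward the other with a short script. This is only a sketch: the function names are made up for illustration, and the host and port values you pass in must come from your own registryport and minport/maxport settings.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_ports(host, ports, timeout=2.0):
    """Map each port to True (accepting connections) or False (unreachable)."""
    return {port: port_is_open(host, port, timeout) for port in ports}
```

Run it on appserver1 against appserver2 and then the other way round, for example `check_ports("<peer appserver>", [<registryport>, *range(<minport>, <maxport> + 1)])`. A port that reports False may simply have nothing listening on it yet, so compare the result against the instances that are actually started.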

          • 2. Re: When both Appserver are up jobs hang
            Neal Meagher

            When the job is just waiting and waiting, there are no log entries.

             

            The port tests were all successful for 9836 and 9850-9854; 9855-9899 were refused (nothing listening).

             

            All jobs. The job just sits there; it neither runs nor fails.

            • 3. Re: When both Appserver are up jobs hang
              Sean Berry

              What happens when only one is up?  Are the keystores synced?

              • 4. Re: When both Appserver are up jobs hang
                Bill Robinson

                It sits where? Do you see it in Tasks in Progress? What state is it in? What kind of job?

                Can you attach the logs from both appservers when this happens and indicate the job name and time you started it?

                • 5. Re: When both Appserver are up jobs hang
                  Neel Shaha

                  It seems all jobs go into the "Waiting to run" state when you have a multi-appserver configuration in BL. I faced the same issue once, and restarting the process spawner and application server services on all app servers resolved it.

                   

                  Have a look at the links below; they may help you resolve the issue you are facing:

                   

                  Waiting to run....

                  Multiple Application Servers - Job stuck in "Waiting to Run.." 

                   

                  Regards,

                  Neel

                  • 6. Re: When both Appserver are up jobs hang
                    Neal Meagher

                    First I brought the app down on server 02.

                    Then I brought the app up on both servers.

                    I logged in, and as bladmin I attempted a deployment several times, which failed every time.

                     

                    Then I brought the app on server01 down leaving the app up on server02.

                    Then when I logged back in and tried the deployment again it just stayed in the waiting state.

                    So I logged out and I was about to restart the app on server02 but I noticed in the log that it disconnected two IDs for me so I tried to login once more.

                    Tried to login but after I would select my desired role during the sign-in process, the client app would hang and not come up completely.  This occurred perhaps 6 or so times until finally I was able to get in.

                     

                    When I did finally get in I saw the deployment that was in the waiting state was completed successfully.

                    And when I ran other deployments with any role they all worked once more.

                     

                     

                    The job I was running was "Nagios Win64" and I began running it at 6:51 this morning.

                     

                    There was a successful run at 7:04 that actually was originally stuck in a wait state until I logged out and stopped the application on server 01.

                     

                    When I was finally able to log back in, I saw that the job completed.

                     

                    The appserver 1 log shows it failing:

                     

                    [ERROR] ] [Deploy] Failed to copy file //blfs/depot/blpackages/0fc8adca-c43f-488a-9652-59c504d558b7/bldeploy.xml to //10.133.246.20/tmp/stage/8bdf9f63f5103fd6a6692cff3f9b69bb/bldeploy.xml

                     

                    and the appserver 2 log shows it succeeding once server 01 was brought down:

                     

                    [INFO] [ [Deploy] The job 'Nagios Win64' has succeeded

                     

                    I'm getting a lot of errors in the app server launcher. I have verified the keystore files.

                    • 7. Re: When both Appserver are up jobs hang
                      Bill Robinson

                      [01 Mar 2016 21:13:15,385] [Event-Transfer-Thread-4] [WARN] [::] [] Unknown host: nhpbsaapp02; nested exception is:

                          java.net.UnknownHostException: nhpbsaapp02

                      java.rmi.UnknownHostException: Unknown host: nhpbsaapp02; nested exception is:

                       

                      Can your appservers resolve each other? It looks like they can't.
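A quick way to confirm what the operating system resolver returns for the peer's name, run on each appserver in turn. This is a sketch with a hypothetical helper name; it uses the OS resolver (hosts file included), which is normally what the JVM consults as well, though the JVM may cache an earlier failed lookup until it is restarted.

```python
import socket


def resolve(hostname: str):
    """Return the sorted addresses hostname resolves to, or [] if it does not resolve."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        # The resolver could not map the name to any address.
        return []
    # Each entry's sockaddr starts with the address string; de-duplicate them.
    return sorted({info[4][0] for info in infos})
```

On appserver1, `resolve("nhpbsaapp02")` returning an empty list would match the `UnknownHostException` in the log; a non-empty list suggests the name resolves now and the exception predates the fix or a restart.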

                      • 8. Re: When both Appserver are up jobs hang
                        Neal Meagher

                        They do resolve each other, and the file server, using entries in the hosts files.

                        • 9. Re: When both Appserver are up jobs hang
                          Bill Robinson

                          Did you fix that after the 1st?

                           

                          for this error:

                          [10 Mar 2016 06:51:49,442] [WorkItem-Thread-1] [ERROR] [Alan.Moss@dt.inc:CC_DTClassic_QA:] [Deploy] Failed to copy file //blfs/depot/blpackages/0fc8adca-c43f-488a-9652-59c504d558b7/bldeploy.xml to //10.133.246.20/tmp/stage/8bdf9f63f5103fd6a6692cff3f9b69bb/bldeploy.xml

                           

                          what is in the rscd log on 10.133.246.20 at the same time ?

                           

                          "I logged in, and as bladmin I attempted a deployment several times, which failed every time."

                          -> w/ what error?  the above, something else ?

                           

                          "So I logged out and I was about to restart the app on server02 but I noticed in the log that it disconnected two IDs for me so I tried to login once more."

                          -> what time ?

                           

                          "Tried to login but after I would select my desired role during the sign-in process, the client app would hang and not come up completely. "

                          -> when?  how are you connecting to these appservers?  directly?  via a load balancer?  network path from your gui to the appserver?

                           

                          [10 Mar 2016 06:51:45,632] [Job-Execution-0] [INFO] [Alan.Moss@dt.inc:CC_DTClassic_QA:] [Deploy] Started running the job 'Nagios Win64' with priority 'NORMAL' on application server 'nhpbsaapp01'(1)

                          -> is that when you started running the job or was it minutes after ?

                           

                          Are the appservers and db all set to use the same time zone, and is their time in sync?

                           

                          Your mail server is not set up correctly or it's blocking connections from the appserver:

                          [10 Mar 2016 07:04:51,120] [Job-Execution-1] [ERROR] [edwin.lindeman@dt.inc:BLAdmins:] [NSHScript] Invalid Addresses

                          javax.mail.SendFailedException: Invalid Addresses;

                            nested exception is:

                              com.sun.mail.smtp.SMTPAddressFailedException: 550 5.7.1 Unable to relay for DT_BSA_ADMINS@dt.inc

                           

                           

                          [10 Mar 2016 07:05:19,405] [WorkItem-Thread-1] [INFO] [Alan.Moss@dt.inc:BLAdmins:] [Deploy] Executing work item Deploy Apply Job (Pre-Execute):Nagios Win64; Server:10.133.246.20;  on application server: nhpbsaapp02

                          -> Is this the one that worked? If so, what's in the rscd log on the target when that goes through? Note this ran as BLAdmins, not CC_DTClassic_QA. That points to a permission issue on the target for the CC_DTClassic_QA role: all attempts to run the job as that role have failed.

                           

                          At what time in the logs you attached are all appserver instances running?

                           

                           

                          So it seems like you possibly have two issues:

                          - You say there is an issue when you have all appserver instances running. You have 1 ALL, 1 config and 2 JOB per appserver. It looks like the 'ALL' instances are set to only 1 GB of heap - that's too low. And how much memory do you have in these boxes?

                          • 10. Re: When both Appserver are up jobs hang
                            Neal Meagher

                            It only runs jobs when one server is up at a time. I do see DB connection issues. All the right ports are open. I see heartbeat issues, which would lead to communication issues between the two app servers. Port 1433 is good.

                             

                            lient-Connections-Thread-2] [WARN] [Anonymous:Anonymous:192.168.14.146] [Client] Error authorizing the connection

                              Line 9488: [10 Mar 2016 20:14:43,524] [Client-Connections-Thread-5] [WARN] [Anonymous:Anonymous:192.168.11.123] [Client] An error occurred while attempting to access the database:

                              Line 9489: Message : Connection reset SQLState: 08S01 ErrorCode: 0

                              Line 9491: com.bladelogic.om.infra.app.db.DBException: An error occurred while attempting to access the database:

                              Line 9492: Message : Connection reset SQLState: 08S01 ErrorCode: 0

                              Line 9537: [10 Mar 2016 20:14:43,526] [Client-Connections-Thread-5] [WARN] [Anonymous:Anonymous:192.168.11.123] [Client] Error authorizing the connection

                              Line 9539: [10 Mar 2016 20:14:47,724] [Client-Connections-Thread-0] [WARN] [Anonymous:Anonymous:192.168.11.123] [Client] An error occurred while attempting to access the database:

                              Line 9540: Message : Connection reset SQLState: 08S01 ErrorCode: 0

                              Line 9542: com.bladelogic.om.infra.app.db.DBException: An error occurred while attempting to access the database:

                              Line 9543: Message : Connection reset SQLState: 08S01 ErrorCode: 0

                              Line 9588: [10 Mar 2016 20:14:47,725] [Client-Connections-Thread-0] [WARN] [Anonymous:Anonymous:192.168.11.123] [Client] Error authorizing the connection

                              Line 9592: [10 Mar 2016 20:14:48,011] [Scheduled-System-Tasks-Thread-17] [ERROR] [System:System:] [App Server Heartbeat] Connection reset

                              Line 9628: [10 Mar 2016 20:14:48,012] [Scheduled-System-Tasks-Thread-17] [ERROR] [System:System:] [App Server Heartbeat] The connection is closed.

                              Line 9630: at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDriverError(SQLServerException.java:171)

                              Line 9655: [10 Mar 2016 20:14:48,013] [Scheduled-System-Tasks-Thread-17] [ERROR] [System:System:] [App Server Heartbeat] The connection is closed.

                              Line 9657: at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDriverError(SQLServerException.java:171)

                              Line 9679: [10 Mar 2016 20:14:48,013] [Scheduled-System-Tasks-Thread-17] [ERROR] [System:System:] [App Server Heartbeat] Already closed.

                              Line 9703: at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDriverError(SQLServerException.java:171)

                              Line 9726: [10 Mar 2016 20:14:48,014] [Scheduled-System-Tasks-Thread-17] [ERROR] [System:System:] [App Server Heartbeat] com.microsoft.sqlserver.jdbc.SQLServerException: The connection is closed.

                              Line 9743: at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDriverError(SQLServerException.java:171)
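To line up entries like these across both appserver logs, a small parser can pull out the timestamp, level and category of each line. This is a sketch, not a BladeLogic tool: the field layout is inferred only from the lines quoted in this thread, the names are hypothetical, and month parsing with `%b` assumes an English locale.

```python
import re
from collections import Counter
from datetime import datetime

# Optional "Line NNNN:" prefix (from a search tool), then five bracketed fields.
LINE_RE = re.compile(
    r"^(?:\s*Line \d+:\s*)?"
    r"\[(?P<ts>\d{2} \w{3} \d{4} \d{2}:\d{2}:\d{2},\d{3})\] "
    r"\[(?P<thread>[^\]]*)\] "
    r"\[(?P<level>[A-Z]+)\] "
    r"\[(?P<context>[^\]]*)\] "
    r"\[(?P<category>[^\]]*)\] "
    r"(?P<message>.*)$"
)


def parse_line(line: str):
    """Return a dict of fields for a log entry, or None for stack-trace or other lines."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["ts"] = datetime.strptime(entry["ts"], "%d %b %Y %H:%M:%S,%f")
    return entry


def count_problems(lines):
    """Count WARN and ERROR entries per (level, category) pair."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry and entry["level"] in ("WARN", "ERROR"):
            counts[(entry["level"], entry["category"])] += 1
    return counts
```

Feeding both appserver logs through `count_problems` and comparing the timestamps of the first `App Server Heartbeat` errors on each side is one way to see whether the database resets hit both servers at the same moment.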

                             

                             

                             

                            Here is my blasadmin output:

                             

                             

                            ./blasadmin -a show database all
                            blasadmin now running against the following deployments: nhpapp2_job1, default, nhpapp2_config, _template, nhpapp2_job2
                            [Database]
                            AutoBatchEnabled:true
                            ConnectionString:jdbc:sqlserver://nhpbsadbsql1:1433;DatabaseName=bladelogic;SelectMethod=cursor
                            DatabaseInstrumentationFilePath:
                            DatabaseInstrumentationRolloverCount:10
                            DatabaseInstrumentationRolloverSize:10000
                            DatabaseVersion:8.3.03
                            DriverClass:com.microsoft.sqlserver.jdbc.SQLServerDriver
                            FetchSize:100
                            IdleConnectionTestPeriod:600
                            MaxClientConnections:100
                            MaxGeneralConnections:100
                            MaxIdleTime:600
                            MaxJobExecutionConnections:100
                            MaxWaitTime:
                            MinClientConnections:0
                            MinGeneralConnections:0
                            MinJobExecutionConnections:0
                            MinTimeToLog:0
                            Password:
                              sql/sqlmap.properties
                              sql/streamable_sqlmap.properties
                            TransactionAttempts:10
                            UserId:bsauser

                            • 11. Re: When both Appserver are up jobs hang
                              Bill Robinson

                              What’s the network path from the appservers to the db server? And can the db server handle the maximum of 1600 or so connections the 8 instances are going to make?

                              How much memory is in the appservers?
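For the connection question, the arithmetic can be sketched from the pool maxima in the blasadmin output above. The function name is hypothetical, and the calculation is a naive ceiling that assumes every instance fills all three pools at once; the estimate of 1600 or so given in the thread is lower, presumably because not every instance type uses every pool.

```python
# Pool maxima copied from the blasadmin "show database all" output in the thread.
POOL_MAX = {
    "MaxClientConnections": 100,
    "MaxGeneralConnections": 100,
    "MaxJobExecutionConnections": 100,
}

# 2 appservers x (1 ALL + 1 config + 2 JOB) instances, as described in the thread.
INSTANCES = 2 * (1 + 1 + 2)


def connection_ceiling(pool_max, instances):
    """Naive upper bound: every instance opens every pool to its maximum."""
    return instances * sum(pool_max.values())


print(connection_ceiling(POOL_MAX, INSTANCES))  # 2400
```

Whichever figure applies, the number to compare it against is the connection limit configured on the SQL Server side, which the DBA would need to supply.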