6 Replies Latest reply on Nov 16, 2011 8:09 AM by Mohammed Asif Huque

    Waiting to run....

      Has anyone seen this before? When we deploy any job, it just sits in a "Waiting to run..." status.

      I am on 7.3. There is nothing much in the AppServer logs; I will have to switch it to debug mode.

        • 1. Re: Waiting to run....

          I have already seen this, and there were three cases where it happened:

          - All of your Work Item Threads are in the "STUCK" state (see the AppServer details). This means that at some point they were used to communicate with agents that ended up in a state where a socket was opened and the thread waits forever for an answer.

          - If you have more than one AppServer configured against your database, jobs can by default be split across several AppServers, which means that some tasks will be performed by remote Work Item Threads. This mechanism uses special threads called "RMI threads" (these allow Remote Method Invocations from one JVM to another). In some cases these threads can get stuck (although this is less obvious in the AppServer details than in the first case), in which case the job stays in the "Waiting to run" state until another job that is already running terminates.

          - The third case is a variant of the second one: the same RMI threads are involved in communication with the Process Spawner, and the same kind of problem can happen there.

          There are several tickets and defects open for this problem (not solved yet), but I think you should probably open a ticket as well.
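
          The first two cases come down to the same mechanism: a fixed pool of worker threads in which every thread is blocked waiting for a reply, so new work has nowhere to run. The Python sketch below only illustrates that general mechanism; it is not BladeLogic code. `talk_to_agent` and `demo` are hypothetical names, a local socket pair stands in for an agent that never answers, and a `ThreadPoolExecutor` stands in for the AppServer's Work Item Threads.

```python
import socket
import time
from concurrent.futures import ThreadPoolExecutor


def talk_to_agent(reply_timeout):
    """Hypothetical work item: send a request to an "agent" and wait for its reply.

    A local socketpair stands in for the agent connection, and the agent end
    never answers. With reply_timeout=None, recv() blocks forever, which is
    the "STUCK" situation; with a number, recv() gives up after that many seconds.
    """
    ours, agent = socket.socketpair()
    try:
        ours.settimeout(reply_timeout)
        ours.sendall(b"request")
        try:
            return ours.recv(1024)
        except socket.timeout:
            return "timed out"
    finally:
        ours.close()
        agent.close()


def demo(pool_size=2, reply_timeout=0.5):
    """Tie up every worker with a silent agent, then submit one more job."""
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        blocked = [pool.submit(talk_to_agent, reply_timeout) for _ in range(pool_size)]
        new_job = pool.submit(lambda: "job ran")
        time.sleep(0.1)
        # All workers are blocked in recv(), so the new job is still queued.
        waiting_to_run = not new_job.running() and not new_job.done()
        # When the timeouts fire, the workers are freed and the queued job runs.
        result = new_job.result(timeout=5)
        return waiting_to_run, [f.result() for f in blocked], result


if __name__ == "__main__":
    print(demo())
```

          With `reply_timeout=None` the two blocked calls would never return and the extra job would stay queued indefinitely, which is the "Waiting to run" symptom; a finite timeout is what lets the workers recover.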

           

           

          For now, the workarounds are:

          1. Restart the Process Spawner (if used).

          2. Restart the AppServers.

          3. Reconfigure the AppServers to disable job splitting and external spawning:

          <AppServer>
          ...
          <SendRemoteWorkItemsEnabled>false</SendRemoteWorkItemsEnabled>
          <ReceiveRemoteWorkItemsEnabled>false</ReceiveRemoteWorkItemsEnabled>
          ...
          </AppServer>
          ...
          <ProcessSpawner>
          <SpawnExternally>false</SpawnExternally>
          ...
          </ProcessSpawner>

          Note: you have to make these changes by editing the "Config.xml" file directly, as some of these keywords are not available in blasadmin.
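
          If you have several AppServers to change, a small script can apply the same Config.xml edit consistently. This is a minimal Python sketch, not a BladeLogic tool: the element names come from the snippet above, but because the real nesting is elided with "..." there, it assumes each element can be found anywhere in the file. `patch_config`, `disable_splitting_and_spawning` and the `.bak` backup are hypothetical choices of mine.

```python
import shutil
import xml.etree.ElementTree as ET

# Element names taken from the snippet in this post; where they sit inside
# your Config.xml is an assumption, so each tag is searched for anywhere.
SETTINGS = {
    "SendRemoteWorkItemsEnabled": "false",
    "ReceiveRemoteWorkItemsEnabled": "false",
    "SpawnExternally": "false",
}


def disable_splitting_and_spawning(root):
    """Set every matching element to 'false'; return the tags that were not found."""
    missing = []
    for tag, value in SETTINGS.items():
        found = False
        for elem in root.iter(tag):
            elem.text = value
            found = True
        if not found:
            missing.append(tag)
    return missing


def patch_config(path):
    """Back up the file to <path>.bak, then rewrite it with the settings applied."""
    shutil.copy2(path, path + ".bak")
    # Keep XML comments, which ElementTree drops by default (needs Python 3.8+).
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    tree = ET.parse(path, parser)
    missing = disable_splitting_and_spawning(tree.getroot())
    tree.write(path, encoding="utf-8", xml_declaration=True)
    return missing
```

          The function returns any tag it could not find; since the snippet shows which section each element belongs to, add missing ones by hand rather than letting a script guess their position. Rewriting the file with ElementTree also normalises some formatting (quoting, the XML declaration), which is why the original is kept as a `.bak` copy.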

           

          Cheers, Olivier.

          • 2. Re: Waiting to run....
            Bill Robinson

            Does the Process Spawner work? I've had problems in versions earlier than 7.3 getting it to function properly: I get RMI timeouts or errors.

            • 3. Re: Waiting to run....

              ...well, that's a question I'm asking myself, actually. What I know is that I used it for a couple of days without any problem before running into this RMI issue again.

              At first I thought that disabling inter-AppServer communication would be sufficient, but now I'm in a situation where I basically use my AppServers the way I did back in version 6. So I believe the answer is no, although the problem seems to be a generic RMI problem.

               

              -Olivier

              • 4. Re: Waiting to run....

                Olivier, thanks for the information. I updated some of the items you suggested in blasadmin, and now I am no longer seeing the issue. These are my current settings:

                 

                MaxWorkerThreads:10

                MaxContexts:100

                MaxJobs:20

                MaxWorkItemThreads:50

                MaxConcurrentRemoteWorkItemRequests:5 (default = 5)

                MaxTimeForCancelToFinish:5 (Changed, default = 10)

                PropagateWorkItemTimeout:false (Changed, default = true)

                MaxNshProxyThreads:

                IdleConnectionPruneTime:30 (Changed, default = empty)

                IdleNshProxyPruneTime:

                HTTPProxyName:

                HTTPProxyPort:

                HTTPProxyUser:

                HTTPProxyPassword:

                CLRPort:9828

                SRPPort:9829

                Krb5Port:

                SSLPort:9831

                CLRProxyPort:

                KRB5ProxyPort:

                SRPProxyPort:

                RegistryPort:9836 (default = 9836)

                SocketsBindAddress:all

                CertStore:/usr/nsh/br/bladelogic.keystore

                CertPasswd:OLBKXLVQQUBUTWTTUMAWLPAXUUPWQBXMEMOXUBQENOLMOBXBPUBTPQENMNLNUMUE

                AtEdge:no

                PWDStore:

                Krb5LoginConfig:

                Krb5Config:

                NSHProxyOnly:

                ComponentCacheMaxSize:100

                TemplateCacheMaxSize:100

                SnapshotCacheMaxSize:100

                AssetPathCacheMaxSize:10000

                EnableSessionBasedCaching:

                FileSystemObjectCacheMaxSize:50000

                ComplianceResultMaxNumberOfAssets: (default = empty)

                RemoteServerTimeout:60 (default = 60)

                ServerMonitorInterval:10 (default = 10)

                SocketConnectTimeout:30 (default = 30)

                SocketTimeout:600 (default = 600)

                UseSSLSockets:no (default = no)

                RequireClientAuthentication:yes (default = yes)

                 

                MaxJobTimeInSchedulerQ:60 (default = 60)

                 

                ConnectionString:jdbc:oracle:thin:@oracle-db1:1521:bladelogic

                DriverClass:oracle.jdbc.driver.OracleDriver

                UserId:bladelogic

                Password:KXTUOXKKOBWBBXEOPXQMEZLBELVQMMWKVWOUTKPZMMKEAVKLKWLPEWPENUTVZWAU

                CommitSize:50

                FetchSize:100

                MinJobExecutionConnections:0 (default = 0)

                MaxJobExecutionConnections:100 (default = 100)

                MinGeneralConnections:0 (default = 0)

                MaxGeneralConnections:20 (default = 20)

                MinClientConnections:0

                MaxClientConnections:20

                TransactionAttempts:

                • 5. Re: Waiting to run....

                  ...Did you experience the problem as soon as the AppServer was up?

                   

                  -O

                  • 6. Re: Waiting to run....
                    Mohammed Asif Huque

                    I faced this issue of jobs staying in the "Waiting to run" state forever. To fix it, I had to go back to basics and synchronize the clocks of the participating AppServers: it turned out they were set to different time zones. I was able to run the jobs after this.
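
                    To check for this condition up front, here is a small Python sketch that compares clock readings collected from each AppServer. It is a hypothetical helper (`check_clocks` is not part of BladeLogic), and it flags two things separately: hosts configured with a different UTC offset, which is what I ran into, and hosts whose clocks disagree once converted to UTC.

```python
from datetime import datetime, timedelta


def check_clocks(readings, max_skew=timedelta(seconds=5)):
    """Compare clock readings taken from several AppServers at roughly the same moment.

    `readings` maps a host name to an ISO-8601 timestamp with an explicit UTC
    offset, e.g. "2011-11-16T08:09:00-05:00". The first host is the reference.
    Returns a list of human-readable problems (empty if everything agrees).
    """
    parsed = {}
    for host, stamp in readings.items():
        ts = datetime.fromisoformat(stamp)
        if ts.tzinfo is None:
            raise ValueError(f"{host}: timestamp {stamp!r} has no UTC offset")
        parsed[host] = ts

    ref_host, ref = next(iter(parsed.items()))
    problems = []
    for host, ts in parsed.items():
        # Different offsets mean the hosts are configured for different time zones.
        if ts.utcoffset() != ref.utcoffset():
            problems.append(
                f"{host}: UTC offset {ts.utcoffset()} differs from {ref_host} ({ref.utcoffset()})"
            )
        # Subtracting aware datetimes compares the actual instants (in UTC).
        skew = abs(ts - ref)
        if skew > max_skew:
            problems.append(f"{host}: clock differs from {ref_host} by {skew}")
    return problems
```

                    Take the readings as close together as possible (on Linux hosts with GNU coreutils, `date -Iseconds` prints an offset such as `-05:00`), and set `max_skew` above the time it takes you to collect them.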

                     

                    Thanks.