7 Replies Latest reply on May 22, 2019 6:19 AM by Bill Robinson

    Push active Jobs to another appserver?

    Markus Kruse

      Hello Guys,

       

      I'm wondering whether it's possible to push active (running) jobs to another appserver.

       

      Sometimes we have jobs that get stuck in their running state, and even a job cancel doesn't do anything. The only thing that helps is to restart the appserver the job is running on.

       

      Unfortunately sometimes the following happens:

       

      The appserver on which the stuck job is running becomes much slower, as if it were stuck too. This causes the other jobs running on that appserver to run very slowly.

       

      In the middle of a patch routine this is annoying, because we have to choose between letting the patch routine run (which takes hours longer than usual) and restarting the appserver. Restarting cancels the patch jobs, because jobs are automatically cancelled when the appserver running them is killed.

       

      Is it possible to push the other running jobs to another appserver to complete there, so that we can restart the stuck appserver without problems?

       

      Greetings

      Markus

        • 1. Re: Push active Jobs to another appserver?
          Bill Robinson

          no - you can't push running jobs to other appservers.

           

          are you cancelling or aborting the job you think is 'stuck'?

           

          why do you think these jobs are 'stuck'?

          • 2. Re: Push active Jobs to another appserver?
            Markus Kruse

            We cancel them in the GUI. They seem to be stuck because the results show "green" or "red" (completed without errors or with errors, respectively). The log says "exit 0" or "exit 1", but the job just doesn't finish.

            • 3. Re: Push active Jobs to another appserver?
              Scott Rabinow

              Bill,

               

              We see this "stuck" job problem occasionally as well. We determine that a job is "stuck" from several observations. First, it's generally long past the JOB_TIMEOUT value for that specific job. Second, when we check the Infrastructure Management window and look at the Job Manager service on all of the job servers, the job that shows up in the TIP window isn't listed as a running job on any appserver. Third, the job logs show repeated attempts by several people to cancel the job.
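The three observations above can be combined into a simple check. This is a minimal sketch, not a BladeLogic API: `JobSnapshot` and `looks_stuck` are hypothetical names, the fields are assumed to be filled in by hand from the TIP window, the Job Manager lists and the job log, and treating two or more cancel attempts as "repeated" is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class JobSnapshot:
    """Hypothetical, hand-collected view of one job (not a BladeLogic object)."""
    job_name: str
    started_at: datetime
    job_timeout: timedelta  # the job's JOB_TIMEOUT value
    # appservers whose Job Manager service lists this job as running
    listed_on_appservers: list[str] = field(default_factory=list)
    # number of cancel attempts found in the job log
    cancel_attempts: int = 0


def looks_stuck(job: JobSnapshot, now: datetime) -> bool:
    """True only when all three observations hold at the same time."""
    past_timeout = now - job.started_at > job.job_timeout
    not_on_any_appserver = len(job.listed_on_appservers) == 0
    # Assumption: "repeated" means at least two cancel attempts.
    repeated_cancels = job.cancel_attempts >= 2
    return past_timeout and not_on_any_appserver and repeated_cancels
```

Requiring all three conditions keeps a job that is merely slow (still listed on an appserver, or not yet past its timeout) from being flagged.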

               

              It doesn't occur very often, and we have not been able to establish a pattern by job type, although deploy jobs seem to show this behavior more often than other types. There also doesn't seem to be a pattern related to the appserver, and it happens in all of our separate environments.

               

              We've had jobs remain in this state for weeks, until the appserver it is "running" on is restarted.

              • 4. Re: Push active Jobs to another appserver?
                Bill Robinson

                ok, so they are showing as completed. and why do you think they are still running or 'stuck'?

                • 5. Re: Push active Jobs to another appserver?
                  Bill Robinson

                  when that happens you should look in the appserver status report and see if there are any active WITs. if not, you can query the job_run table to see if any jobs are in a running state. there's also a running state in job_result_device and job_result_component, but i don't think that will cause the entry in TIP.

                   

                  in that case we should look back in the appserver logs that were involved in running the job and see if something went wrong during the job execution.

                   

                  if the job isn't in a running state and there are no active wits then we need to look at that so i'd leave it in that state until support can look at it. 
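The order of checks in the reply above can be sketched as a small decision function. This is a hypothetical illustration, not a BladeLogic tool: `triage` and `NextStep` are invented names, and the inputs (the active WIT count from the appserver status report and whether the job_run row is still in a running state) are assumed to be gathered by hand.

```python
from enum import Enum


class NextStep(Enum):
    """Hypothetical labels for the follow-up actions described in the reply."""
    INSPECT_ACTIVE_WITS = "job still has active WITs; inspect them in the status report"
    REVIEW_APPSERVER_LOGS = "job_run still says running; review the appserver logs"
    LEAVE_FOR_SUPPORT = "not running and no WITs; leave as-is and open a case"


def triage(active_wit_count: int, job_run_is_running: bool) -> NextStep:
    """Status report first, then the job_run table, as in the reply."""
    if active_wit_count < 0:
        raise ValueError("active_wit_count cannot be negative")
    # Step 1: the appserver status report still shows active WITs.
    if active_wit_count > 0:
        return NextStep.INSPECT_ACTIVE_WITS
    # Step 2: no WITs, but the job_run row is still in a running state.
    if job_run_is_running:
        return NextStep.REVIEW_APPSERVER_LOGS
    # Step 3: neither; keep the job in its current state for support.
    return NextStep.LEAVE_FOR_SUPPORT


if __name__ == "__main__":
    print(triage(0, True).value)
```

The last branch deliberately does nothing to the job, matching the advice to leave it in that state until support can look at it.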

                   

                  now - i think there was a defect recently w/ some job types where something like this could happen - i'd have to go dig around and see if the behaviour was like this or not.

                   

                  also - not sure what other word we can use but 'STUCK' is a state for WITs that denotes the thread is ... stuck ... somewhere.  not to be confused w/ what is being described in this thread.

                  • 6. Re: Push active Jobs to another appserver?
                    Markus Kruse

                     

                    This is a sample case of what I described. There seems to be no WIT; the job shows as running in the GUI and was aborted many times, but it still appears in the GUI (and does nothing).

                    • 7. Re: Push active Jobs to another appserver?
                      Bill Robinson

                      ok, so in that case it would be helpful to have the appserver logs from the env covering from the start of the job till now - any appserver involved in running the job or its WITs, the job run log and the appserver status report.  get that info, along w/ the screenshot and open a case.