5 Replies Latest reply on Aug 27, 2020 4:22 PM by Steve Jankowski

    Auto-Rerunning a Cyclic AFT Job That Runs Multiple Times/Day on Its First Failure

    Steve Jankowski

      Good afternoon,

       

      I did some searching around on Google and here, but seem to have come up empty. I did find some helpful information, but nothing that seems to address my specific scenario. I'm fairly new to Control-M and the jobs were set up by an associate who is no longer with the company, so please bear with me.

       

      I have an AFT flow of 4 jobs that connect to a third-party vendor. These jobs run sequentially so that the vendor does not deny multiple simultaneous connections.

       

      This flow runs every 15 minutes, 24x7x365, so the jobs are already cyclic.

       

      We were occasionally seeing connection refused errors with these jobs. Our current solution is to have the operator retest the connection from inside Control-M and rerun the job if the test succeeds, which has been the case 99.99% of the time. This was acceptable for a while, but we want to reduce operator intervention further, so that operators only create incident tickets and engage support when there is an actual issue.

       

      So, that's when I started looking into auto rerunning cyclic jobs after the first failure.

       

      One potential solution that I found is here: How can I setup my Control-M jobs to rerun on failure and send notification only on the failure of the rerun?

       

      The document says to make the job cyclic and set the number of failures to 2 and set your failure conditions/notifications. However, all the jobs are already cyclic to remove the conditions from the previous runs, so I'm not sure that would work.

       

      For reference, here are the cyclic settings for the jobs:

       

      The maximum reruns are set to 99. Outside of that, nothing else is set. The rerun settings show "Rerun every 0 Minutes, from Job's Start".

       

      What I'm looking for is pretty simple:

       

      If the AFT job fails on the first run, wait 5 minutes and rerun. If that fails, post the failure to Control-M.
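
      To make the intended behaviour concrete, here is a minimal sketch of that logic in plain Python. It only illustrates the decision flow; it is not Control-M configuration and does not use any Control-M API. The names `run_with_one_retry` and `job`, and the status strings, are hypothetical and exist only for this illustration.

```python
import time
from typing import Callable


def run_with_one_retry(
    job: Callable[[], bool],
    wait_seconds: float = 300,  # 5 minutes, per the requirement above
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Run `job` once; on failure wait and rerun once.

    `job` is a hypothetical callable that returns True on success.
    A failure is reported only if the rerun also fails.
    """
    if job():
        return "OK"
    # First failure: stay silent, wait, then try exactly one more time.
    sleep(wait_seconds)
    if job():
        return "OK_AFTER_RERUN"
    # Second consecutive failure: this is the one that should be posted.
    return "FAILED"
```

      In this sketch a transient connection refusal on the first attempt never surfaces; only a second consecutive failure yields "FAILED", which corresponds to the point where the operator would create a ticket.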

       

      With regard to the linked document, can I set the On Do actions mentioned in the doc without removing the cyclic setting?

      Are there other settings I would need to change within the job to make this work?

       

      We would like to be able to do this without creating new jobs/flows, if possible.

       

      Thanks for reading and thank you in advance.

       

      -Steve