16 Replies. Latest reply on Apr 7, 2014 5:39 PM by Barry McQuillan

    How to copy host list to 'copy and execute' jobs?



      I have a perfectly good type 3 job ('copy and execute') that's been thoroughly debugged.   It's about 720 lines of bash script that I'd prefer not to re-write in NSH.


      However, last month it was executed on only 2 of the 3 nodes of a prod cluster, which led to instability.  What I want is for it to execute on all nodes, or none.


      On a specific node, it's very easy to determine all nodes of that cluster.  So if I can pass this type 3 job the list of all targets for this job execution, it can compare against list of all nodes.  If all nodes in target list, proceed.  Else, fail.


      The type 3 jobs run on each node will do this same check.  So all invocations will independently reach the same conclusion.  All or none.


      So probably 5 lines of additional code.  Total.
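
      A minimal sketch of that check, in bash.  The function name and both arguments are hypothetical: it assumes the target list has somehow reached the script as a whitespace-separated string (exactly the piece a type 3 job doesn't provide today), and that the cluster node list comes from whatever site-specific query the node already supports.  Host names are compared exactly, so both lists need to use the same form (short name or FQDN).

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeed only if every cluster node is in the target list.
#   $1  whitespace-separated target hosts for this job run (assumed input)
#   $2  whitespace-separated nodes of this cluster (assumed input)
all_nodes_targeted() {
    local node target found

    # An empty cluster list means the lookup failed; refuse to proceed.
    if [ -z "$2" ]; then
        echo "cluster node list is empty" >&2
        return 1
    fi

    # Unquoted expansions are intentional: split the lists on whitespace.
    for node in $2; do
        found=0
        for target in $1; do
            if [ "$node" = "$target" ]; then
                found=1
                break
            fi
        done
        if [ "$found" -eq 0 ]; then
            echo "cluster node $node is not in the target list" >&2
            return 1
        fi
    done
    return 0
}
```

      The existing script would call it once at the top, something like `all_nodes_targeted "$TARGETS" "$CLUSTER_NODES" || exit 1`, where both variable names are placeholders.  Since every node evaluates the same two lists, every invocation reaches the same verdict.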


      Obviously, I could re-write this as a type 2 job ('Execute once, pass host list as a parameter').  I've written numerous type 2 jobs.  You have both %f and %h available, to pass in the target list of servers.


      I see three downsides with re-writing this job as a type 2 job:

         - the re-coding (as discussed above)

         - handling the degree of parallelism myself (type 3 jobs I can set degree of parallelism)

         - hanging BBSA agents can hang a type 2 job's for loop


      The other possibility is to write this as a batch job consisting of two parts:

           A type 2 job that merely writes the target list to a central pub share.

           Followed by the std type 3 job that first thing retrieves this target list from pub share and applies above logic.


      That also seems fraught with peril -- now I need a central pub share.  What if some nodes can't hit it?


      It'd be so easy, if a type 3 job accepted %f or %h.  Or something equiv.


