4 Replies Latest reply on Nov 5, 2012 8:46 PM by Bill Robinson

    NSH script job hang

      Hi all,


      I've got a bit of a weird issue where my NSH job seems to hang on a few servers. Out of almost 500 servers, the job just sits and does nothing on 15 of them, with nothing in the job log. It doesn't time out or fail, and most of the time I end up having to cancel it.


      The job itself doesn't do anything exciting; it just queries the servers about running processes and installed RPMs.


      I've tried restarting and reinstalling the RSCD agents on the servers with the issues with no luck.


      Anyone able to help at all?



        • 1. Re: NSH script job hang
          Bill Robinson

          what specifically is the script doing ?  can you attach the script?


          is there anything common about the boxes it hangs on, eg same vlan, same os?  does it hang on the same 15 servers all the time ?


          more details about your env - how many appservers, max heap settings, etc ?


          have you tried running the hanging agents w/ debug logging enabled ?


          can you easily reproduce this ?

          • 2. Re: NSH script job hang



            I've attached the script:


            Unfortunately there appears to be nothing linking the servers; it happens on different OS/kernel versions and across multiple VLANs.


            Yes, it does hang on the same servers each time, and it is easy to reproduce as it happens every time I run the script.


            Can you tell me how to enable debugging?



            • 3. Re: NSH script job hang



              I assume you have already tried running the script directly against the problematic target via NSH instead of a job.

              This way we could rule out anything from the script to be responsible for the hang.


              What version are the targets where the hang is being seen? Is it different from the ones where it works?


              By debug logging I think Bill might be referring to enabling -x in your script, which you already have, and enabling debug logging for rscd and a couple of other things on the target.

              On Unix, you would do this by modifying /usr/lib/rsc/log4crc.txt and setting the logging priority level to "debug" in the following lines:

              <category name="rscd" priority="info1" appender="/opt/bmc/bladelogic/NSH/log/rscd.log" debugappender="/opt/bmc/bladelogic/NSH/log/rscd.log"/>

              <category name="bldeploy" priority="info"/>



              • 4. Re: NSH script job hang
                Bill Robinson

                yes - in the top of the script put a 'set -x', after the shebang.


                another idea is to put in some 'date' statements or hellos or something to see where the hang might be happening as it could be a problem w/ one of the commands you are nexec'ing.
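
The two suggestions above (a 'set -x' after the shebang, plus 'date' markers around each command) could be combined roughly as in the sketch below. This is only an illustration, not the attached script: the `step` helper is a hypothetical name, and `uname -a` and `true` are stand-ins for whatever commands the real script nexec's for the process and rpm queries.

```shell
#!/bin/sh
# Sketch only: trace every command and timestamp each step so the
# job log shows which command was running when the job hung.
set -x   # print each command before it is executed

# step is a hypothetical helper (not part of NSH): it prints a
# timestamped START line, runs the given command, then prints a
# timestamped END line with the command's exit status.
step() {
    label=$1
    shift
    echo "$(date '+%Y-%m-%d %H:%M:%S') START $label"
    "$@"
    rc=$?
    echo "$(date '+%Y-%m-%d %H:%M:%S') END $label (exit $rc)"
    return $rc
}

# Stand-in commands; replace them with the script's own nexec calls.
step "process query" uname -a
step "package query" true
```

If a command hangs, the log ends with a START line that has no matching END line, which points at the step to investigate on the affected servers.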