14 Replies Latest reply on Mar 11, 2013 5:29 PM by Bill Robinson

    NSH Script Jobs against 5000+ targets

    richard mcleod

      The environment I administer is quite large: 15,000 clients and growing. I've written some NSH scripts to grab data from servers and/or to set server extended properties, and these all work fine in my UAT environment (~20 targets). When I promote the scripts to production and run them against a large group of targets (more than 5000), I receive an error like this:

       

      Error Feb 7, 2013 6:00:05 PM nsh:13: arg list too long: /opt/bmc/BladeLogic/8.1/NSH/tmp/longesblogicp1_j03/scripts/job__57f9480a-8dec-4a06-b6d0-6fca78bf61aa/script_DBKey-SJobKeyImpl-2002484-2__3db7f11f-c54e-4fa5-a6ab-b5f739653c07.Set Build Date

       

      So I raised a ticket with BMC Support. They told me it's a known issue (they didn't say what the max target count was) and that I should write the servers to a file and then read them in through a while loop.

       

      So the journey begins

       

      Since targets can't be >5000 I am sending the smart server group path in as a parameter $1

      fpsp=/tmp/rscddirwinpath_tmplst_$cdate

      blcli_execute Server listServersInGroup "$group"

       

       

      blcli_storeenv serverlst

      echo $serverlst > $fpsp

      while read serverList

      do

        for serverName in $=serverList

        ...

        do things

         ...

        done

      done < $fpsp

      Looks like it should work, right? When I run it in a small environment, it works without a problem. When it is run in the larger environment and the smart server group contains more than 5000 servers, it outputs all the data from 'blcli_execute server listserversingroup' and then I just receive an exit 0. It never enters the loop.

       

      What I think is happening is that the script isn't necessarily running sequentially or maybe it is but it is not waiting for the command to finish before jumping to the next command. So it enters the loop with no data in $fpsp because it hasn't been written yet. I've tried using 'sleep 120' but it is seemingly ignored.
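One way to test that theory is to check whether the list file actually contains data before entering the loop. Below is a minimal sketch in plain shell; the helper name `list_ready` and the throwaway path are hypothetical and not part of the original script.

```shell
# Hypothetical helper: succeeds only if the file exists and is non-empty.
list_ready() {
    [ -s "$1" ]
}

fpsp="${TMPDIR:-/tmp}/serverlist_check_$$"   # hypothetical path for this sketch
: > "$fpsp"                                   # simulate a list file that was never filled

if list_ready "$fpsp"; then
    echo "list has data: $(wc -l < "$fpsp") line(s)"
else
    echo "list is empty, so the loop body would never run" >&2
fi
rm -f "$fpsp"
```

If the check fails in the real job, the problem is upstream of the loop (the variable or file was never populated) rather than in the loop itself.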

      Any ideas?

        • 1. Re: NSH Script Jobs against 5000+ targets

          Whilst I agree that it *should* work, I've found myself moving more and more towards Jython for my scripts simply because it seems to just "work" more reliably.  Is this something you could consider?

           

          -John.

          • 2. Re: NSH Script Jobs against 5000+ targets
            richard mcleod

            Interesting, never used Jython. Know of any good places to start? Possibly with BladeLogic specific examples?

            • 3. Re: NSH Script Jobs against 5000+ targets

              You could consider creating an execution task to run the same job against different sets of targets, each within the limit.

              This will be quicker than looping.

              • 4. Re: NSH Script Jobs against 5000+ targets

                I started with a combination of http://www.jython.org/jythonbook/en/1.0/ and the "Jython Essentials" book from O'Reilly.  The hardest part is getting BLJython to work - the rest is pretty easy if you've done any scripting before!

                 

                (...and it opens up the doors to write some AWESOME scripts!)

                 

                -John.

                • 5. Re: NSH Script Jobs against 5000+ targets
                  richard mcleod

                  Rohit - That's accepting defeat! Never!

                   

                  I solved the problem. It was a memory limitation with the blcli_storeenv var

                   

                  Instead of:

                   

                  blcli_execute server listserversingroup "$group"

                  blcli_storeenv serverlst

                   

                  I used

                   

                  serverlst=`blcli_execute server listserversingroup "$group"`

                   

                  works like a charm now!

                   

                  performance commands huh!

                  • 6. Re: NSH Script Jobs against 5000+ targets
                    richard mcleod

                    Thanks, picking up the Jython O'Reilly book at lunch!

                    • 7. Re: NSH Script Jobs against 5000+ targets

                      What version of BL are you on?

                      Jython enables you to automate things more efficiently; it's similar to Python in terms of syntax and semantics.

                      The previous patch solution (up to 8.1) was supported through Jython, and its installation would set up Jython automatically for you.

                      I am not sure if you can get your hands around that now.

                       

                      Thanks,

                      Rohit

                      • 8. Re: NSH Script Jobs against 5000+ targets
                        richard mcleod

                        We're running 8.1.03. I am def interested in using jython. Sometimes I can't stand the quirkiness of NSH/ZSH!

                        • 9. Re: NSH Script Jobs against 5000+ targets
                          Bill Robinson

                          you realize that's spawning a new jvm when you run the blcli_execute in ` ` right?

                           

                          for a single command execution that's probably ok, but if you are going to be looping or running multiple commands it defeats the purpose of using the performance commands.

                           

                          also - your loop was not really setup correctly - this is your original loop:

                            

                          fpsp=/tmp/rscddirwinpath_tmplst_$cdate 

                          blcli_execute Server listServersInGroup "$group"

                          blcli_storeenv serverlst

                          echo $serverlst > $fpsp

                          while read serverList

                          do

                          for serverName in $=serverList

                          ...

                          do things

                          ...

                          done

                          done < $fpsp

                           

                          so you get the server list from the blcli and echo it into a file, then you read a variable called 'serverList' from the file containing the list of servers, so at this point '$serverList' will evaluate to one server, then you do another for loop on a single server name ?
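How many times the loop runs depends on how the names ended up in the file. Here is a small sketch in plain shell, using made-up host names and a hypothetical helper `count_reads`, that shows `read` consuming one line per iteration:

```shell
# Hypothetical helper: count how many times a `while read` loop body runs.
count_reads() {
    n=0
    while read -r line; do
        n=$((n + 1))
    done < "$1"
    echo "$n"
}

tmp="${TMPDIR:-/tmp}/read_demo_$$"        # throwaway file for this sketch

printf 'hostA\nhostB\nhostC\n' > "$tmp"   # one name per line
count_reads "$tmp"                        # prints 3: one iteration per server

printf 'hostA hostB hostC\n' > "$tmp"     # all names on a single line
count_reads "$tmp"                        # prints 1: the whole list arrives in one read

rm -f "$tmp"
```

With one name per line, an inner for loop has nothing left to split. With a single-line file, the inner loop is what does the per-server work.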

                           

                           

                          what should work is this:

                           

                          fpsp=/tmp/rscddirwinpath_tmplst_$cdate 

                          blcli_execute Server listServersInGroup "$group"

                          blcli_storeenv serverlst

                          echo $serverlst > $fpsp

                          unset serverlst

                           

                          while read serverName

                          do

                          do things

                          ...

                          done < $fpsp

                          • 10. Re: NSH Script Jobs against 5000+ targets
                            Bill Robinson

                            also - i'd ask why you are doing it this way - a much easier way would be to create a type 2 script, target it to a server group, use the '%f' as input to the job and then do like:

                             

                            while read server

                            do

                            stuff

                            done < $1
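Fleshed out a little, that loop might look like the sketch below. It assumes the file passed via %f holds one server name per line; `process_server` and `run_against_list` are hypothetical names standing in for the real work.

```shell
# Hypothetical per-server action; replace with the real work.
process_server() {
    echo "processing $1"
}

# Read one server name per line from the file given as the first argument.
run_against_list() {
    while read -r server; do
        [ -n "$server" ] || continue    # skip blank lines
        process_server "$server"
    done < "$1"
}

# In a type 2 script job the server list file would arrive as $1 (via %f).
if [ -n "${1:-}" ]; then
    run_against_list "$1"
fi
```

Because the list is read from a file rather than passed as arguments, the number of targets does not affect the length of the script's argument list.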

                            • 11. Re: NSH Script Jobs against 5000+ targets
                              richard mcleod

                              The problem is that blcli_storeenv hits a memory limit and is unable to store all of the data from $group.

                               

                              What I've gathered is that the following commands work in unison like this

                               

                              blcli_execute server listallserversingroup $group #writes files out to stdout

                              blcli_storeenv serverlst #stores data from stdout into serverlst variable

                               

                              The script simply exits with 0 before it reads the entire output of 'server listallserversingroup $group'; to me that suggests it's a memory issue. A PS guy that works with us tried using the blcli command to increase the JVM size, but that didn't help at all.

                               

                              The script I wrote helps with two things

                               

                              1. I can dynamically create the input file based on these commands (my environment is large and constantly changing, enrolling/decomming servers weekly)

                               

                              serverlst=`blcli_execute server listallserversingroup $group`

                              echo $serverlst > $fpsp

                               

                              This cuts down on any manual work I would have to do in creating the list and setting it as a %f param for a type 2 script

                               

                              2. It breaks up the serverList (one huge variable comprised of all the data from the blcli_execute command)

                               

                              Once I've pushed the data from memory to a file with echo $serverlst > $fpsp, I now have a one-line file with all my server names.

                               

                              The while loop reads that data into the serverList variable, so now I have a really huge variable

                               

                              The for loop reads out one piece of data during each loop by using the $=serverList operator (field splitting)
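The `$=` operator is zsh-style syntax. As a rough portable equivalent, the one-line file can be turned into one name per line before it is read, which avoids holding the whole list in a single variable. This is a sketch only; `split_names` is a hypothetical helper and the host names are made up.

```shell
# Hypothetical helper: turn a whitespace-separated list into one name per line.
split_names() {
    tr -s ' \t' '\n\n' < "$1"
}

tmp="${TMPDIR:-/tmp}/split_demo_$$"   # throwaway file for this sketch
echo "hostA hostB  hostC" > "$tmp"    # single line, like the file described above

split_names "$tmp" | while read -r serverName; do
    [ -n "$serverName" ] || continue  # ignore any empty line
    echo "would process $serverName"
done

rm -f "$tmp"
```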

                               

                              The appserver boxes I have are beefy (16 procs, 128 GB memory), so I can afford to use the extra JVM and read data back from the file to a variable without really noticing any service degradation.

                               

                              I know the script could use some optimizing, but at this point I am able to accomplish what I set out to do...

                               

                              "do things" against more than 5000 targets

                              • 12. Re: NSH Script Jobs against 5000+ targets
                                Bill Robinson

                                so why not use the %f and a type 2 job?  you are already using a blcli command to list servers in a group and dump it into a file, which is exactly what targeting the job against a smart group and using %f does.

                                 

                                i've used various blcli commands to return large datasets and used blcli_storeenv w/o any issue - the problem was keeping the variable in memory in the nsh shell.  so you do an immediate dump to a file and then unset the shell variable.
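The dump-then-unset pattern can be sketched in plain shell as follows. The `printf` value stands in for whatever blcli_storeenv placed in the variable, and `dump_list` and the file path are hypothetical.

```shell
# Hypothetical helper: write a (possibly multi-line) value to a file as-is.
dump_list() {
    printf '%s\n' "$2" > "$1"
}

fpsp="${TMPDIR:-/tmp}/serverlist_dump_$$"    # hypothetical path for this sketch

serverlst=$(printf 'hostA\nhostB\nhostC')    # stand-in for the value stored by blcli_storeenv
dump_list "$fpsp" "$serverlst"               # quoted, so newlines survive in any POSIX-style shell
unset serverlst                              # release the large variable straight away

while read -r serverName; do
    echo "would process $serverName"
done < "$fpsp"
rm -f "$fpsp"
```

After the unset, only one server name at a time is held in the shell, no matter how long the list is.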

                                • 13. Re: NSH Script Jobs against 5000+ targets
                                  richard mcleod

                                  Because a type 2 job would mean that I would either need to manually provide the list of servers or run another job that dumps the servers to a file, and then ensure the filename/path is synced between the two jobs.

                                   

                                  Ran many jobs this past weekend with the described method, no issues. moving onto my other issues. thanks for the input

                                  • 14. Re: NSH Script Jobs against 5000+ targets
                                    Bill Robinson

                                    For a type 2 job you can target a server group (smart or static), and %f (the file) will contain that list of servers.  is that not possible ?