18 Replies Latest reply on Jun 8, 2018 8:00 AM by Sean Berry

    BSA sizing recommendations for patching 4000 servers at the same time

    Patrik Stanz

      We have BSA 8.9.01 installed on a Windows Server 2012 R2 64-bit machine.
      We have 4000 Windows servers enrolled in our BSA environment.

      We want to patch all 4000 Windows servers with BSA at the same time.
      What are BMC's recommendations for BSA to fulfill that? Please let me know:
      - How many BSA job servers should I have configured?
      - How many Work Item Threads should be configured per BSA job server?
      - How is the Work Item Thread count calculated (e.g. number of CPUs * 100)?

      As far as I know, if I run the patch analysis job against 100 servers, it will consume 100 Work Item Threads, right? So I would need 4000 Work Item Threads to run it against 4000 servers at the same time?

       

      Thanks,

      Patrik

        • 2. Re: BSA sizing recommendations for patching 4000 servers at the same time
          Patrik Stanz

          Hi!

          The doc does not help at all. My goal is to serve the patching process on all 4000 systems at the same time, i.e. parallel execution.

           

          With 100 Work Item Threads, it seems that BSA handles the patching process in chunks of 100 servers. Once all 4000 servers are analysed, BSA is configured to start the remediation job(s). BSA then starts rolling out patches to the first chunk of 100 servers, and once 1 or more servers have finished, the next chunk starts.

           

          If we assume that the longest patch run for a single server takes e.g. 30 minutes, then 100 servers will also be patched in 30 minutes. But for 4000 systems it will take 40 * 30 minutes = 20 hours, which of course is way too long.
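
The estimate above can be reproduced in a few lines of Python. This is only a sketch of the "chunk" arithmetic, assuming every target holds one Work Item Thread for its whole 30-minute run and that chunks run strictly one after another. The function name and parameters are illustrative, not BSA settings.

```python
import math

def wave_runtime_minutes(targets: int, threads: int, minutes_per_target: float) -> float:
    """Total runtime when each target occupies one thread for its entire run."""
    waves = math.ceil(targets / threads)  # number of back-to-back chunks
    return waves * minutes_per_target

total = wave_runtime_minutes(targets=4000, threads=100, minutes_per_target=30)
print(f"{total / 60} hours")  # 40 chunks * 30 minutes = 20.0 hours
```

Under the same assumptions, the runtime only drops to a single 30-minute run when the thread count equals the target count.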

           

          So I need to fulfill the following scenario: serve all 4000 systems with patch analysis and afterwards the remediation job at the same time, so that the overall runtime for the 4000 systems matches the longest single run (e.g. 30 minutes).

           

          PS: In this scenario I do not care about network speed / bandwidth, firewalls, etc. I am only asking about the recommended configuration of BSA itself.

          • 3. Re: BSA sizing recommendations for patching 4000 servers at the same time
            Barry Reilly

            Hi Patrik,

             

            I have a similar setup, with 3 All Role App servers.

            12GB RAM, 4 vCPU per app server.

             

            Java max heap set to 8GB

             

            [Image: app server java settings.png]

             

            The way I do it is to group servers by: Non Prod One At A Time, Non Prod Manual, Non Prod All Other, and a similar 3 for Prod.

            If you don't need to distinguish like that, then simply have a minimum of 3 app servers, create 3 smart groups and divide the estate between them.

            Then make 3 corresponding Windows patch jobs.

             

            Note 1: Using a Batch Job with all 3 patch jobs in it to run in parallel does not auto-balance them between the 3 app servers.

            Note 2: The MAX_DISK_PERCENTAGE per-target property is used by the Remediation Deploy Options (tick box to check space), so in some cases, if the C drive is not a consistent size and its free space is very low, you may find that servers don't get patched. Something I recently discovered.

             

            For the 3 patching jobs, simply schedule them individually to start one after the other, say with a 2-minute delay. What should happen is that the jobs will get assigned to each app server.

             

            Note 3: Utilise Max Execution Time (in minutes) on each patch job. So say your window is 18:00 to 08:00 the next morning, set it for 10 hours (600 minutes) and allow the last 2 hours for reboots.

             

            Note 4: Sometimes it's worth not allowing the deploy options to auto reboot at the end of the job; instead, Ignore all Reboot requests and have an nshscript job reboot them all. This clears any pending reboots.

            Note 5: Pre-patching reboots. Something worth considering is clearing the pending reboots at the start of your patching window. Using smart groups and an nsh script you can set a property for Needs_Reboot, and a smart group set to filter on that being True. Then run a reboot script against that group.

             

            My recommendation is a minimum of 3 BSA App servers, or 1 for every 1000 servers. The parallel server count per patch job can be 50-100, but I suggest 50 at a time per patch job.

            • 4. Re: BSA sizing recommendations for patching 4000 servers at the same time
              Patrik Stanz

              Thanks for your detailed answer :-)

              You mentioned good points on how it can be done in a production environment.

               

              However, for my customer it's all about duration (currently). They calculate like this:

              - BSA has 100 work item threads, 30 minutes for 1 server for 4000 Windows servers = 40*30 minutes = 20 hours for all servers

              - HP Orchestrator: can serve all servers at the same time = 30 minutes for all servers

              - Windows SCCM: can serve all servers at the same time = 30 minutes for all servers

              - Windows WSUS: can serve all servers at the same time = 30 minutes for all servers

              So they see no benefit (at least for patching purposes) in using BSA because of this duration issue. My goal here is to configure BSA somehow to serve all systems at the same time (e.g. by having 4000 Work Item Threads) to reach (almost) the same duration as the other tools (SCCM, WSUS, HPO) would have.

              • 5. Re: BSA sizing recommendations for patching 4000 servers at the same time
                Bill Robinson

                when you say 'patch' what do you mean?  install patches?  run analysis, both ?

                 

                - BSA has 100 work item threads, 30 minutes for 1 server for 4000 Windows servers = 40*30 minutes = 20 hours for all servers

                what takes 30 min ?

                 

                 

                - HP Orchestrator: can serve all servers at the same time = 30 minutes for all servers

                - Windows SCCM: can serve all servers at the same time = 30 minutes for all servers

                - Windows WSUS: can serve all servers at the same time = 30 minutes for all servers

                w/ what infrastructure ?  and doing what ?  analysis ?  installing patches ?  you are comparing some amount of bsa infra - 100 wits - to some unknown sizing of the other products.  do you have 500 sccm servers ?  500 wsus servers ?  1 wsus server ?

                • 6. Re: BSA sizing recommendations for patching 4000 servers at the same time
                  Daniel Goetzman

                  I am interested in the comment about the MAX_DISK_PERCENTAGE Server Property and how it is linked to Deploy jobs.

                  I don't believe they are linked. Is this valid?

                   

                  My understanding is the Server Property MAX_DISK_PERCENTAGE is simply an integer value used by some content and not by Deploy Jobs?

                  And the "Test for sufficient staging directory on target for phase:" simply checks that the target staging area has enough space to hold the entire payload at STAGE phase?

                  • 7. Re: BSA sizing recommendations for patching 4000 servers at the same time
                    Patrik Stanz

                    Hi Bill!

                     

                    I will try to clarify things:

                    • what takes 30 min ?
                      • Let's imagine analysis + patch rollout (copy) + installation take exactly 30 minutes on every server. Then my calculation "BSA has 100 work item threads, 30 minutes for 1 server for 4000 Windows servers = 40*30 minutes = 20 hours for all servers" is right, isn't it?

                    • w/ what infrastructure ?  and doing what ?  analysis ?  installing patches ?  you are comparing some amount of bsa infra - 100 wits - to some unknown sizing of the other products.  do you have 500 sccm servers ?  500 wsus servers ?  1 wsus server ?
                      • I can ask the customer what their environment looks like exactly, but I am pretty sure they have fewer than 10 WSUS/SCCM servers.
                      • Doing what -> Installing a list of required patches on the systems

                     

                    As I asked before, in the case of

                    • having 4000 Windows systems
                    • Analysis + patch Rollout (copying) + Installation would take on every server exactly 30 minutes (just for calculation)

                    how should I configure BSA so that it serves all 4000 servers at the same time, with a duration of 30 minutes for all?

                    • 8. Re: BSA sizing recommendations for patching 4000 servers at the same time
                      Bill Robinson
                      • Let's imagine analysis + patch rollout (copy) + installation take exactly 30 minutes on every server. Then my calculation "BSA has 100 work item threads, 30 minutes for 1 server for 4000 Windows servers = 40*30 minutes = 20 hours for all servers" is right, isn't it?

                      so first lets break down the 30 min per server.  what happens in that 30 min ?  how long does analysis take ?  how long does staging take ?  how long does the commit take ?

                      if analysis is taking up most of the time, are you using an include ?  or why is it consuming so much of the time ?

                      analysis and staging will benefit from having additional appserver capacity (eg, more wits) so they can run more concurrently.  staging is also hampered by the network, so if you are patching remote systems you should look into repeaters so you aren't copying the same patches over and over across a wan.  do the other product infrastructures have something similar to a bsa repeater ?

                      if the commit is taking up time, why ?  because of how many patches are being installed ?  because you are patching a bunch of VMs on the same vm datastore and you are killing your storage ?  generally commit should run async, meaning you can have far more commit phases running concurrently than you have wits.  do you have the blasadmin settings 'enableasyncexecution = true' and 'maxlightweightworkitemthreads=200' on all the jobs instances ?  if not, you aren't using async for commit.

                      when you run the patching from the other tools are they all doing the analysis, payload copy and install in the same run ?  if not, what are they doing in the 30 min ?  how long is their per-server runtime ?

                       

                      • I can ask the customer what their environment looks like exactly, but I am pretty sure they have fewer than 10 WSUS/SCCM servers.

                      and you have 1 bsa appserver? 

                       

                      • Doing What  -> Installing a list of required patches on the systems

                      which means doing what specifically ?  analysis ?  generating packages to deploy for each system ?  blindly blasting out patches ?  we can't even discuss comparing the runtimes until we know exactly what each of these products is doing during a 'patch' operation.

                       

                      • Analysis + patch Rollout (copying) + Installation would take on every server exactly 30 minutes (just for calculation)

                      why are you using 30 min 'just for calculation' ?  is that how long all of this actually takes in your env for a single server ? 

                       

                      do you need to run analysis, stage and commit at the same time ?  why not analysis and stage ahead of time ?

                      • 9. Re: BSA sizing recommendations for patching 4000 servers at the same time
                        Patrik Stanz

                        Hi Bill!

                         

                        so first lets break down the 30 min per server.  what happens in that 30 min ? -> Just please imagine the following to simplify all the things here. Forget about the patching procedure, lets think about this:

                        I have 1 BLPackage with 10 patches included. Then I have 1 deploy Job to install them on 4000 servers with 1 application server with 100 Work Item Threads.

                         

                        All 4000 systems are exactly the same, and let's assume the deploy job takes exactly 30 minutes to finish on 1 server. I would say 100 servers will then be finished in 30 minutes, right (as I have 100 threads)? So for 4000 systems it will then take 40 * 30 minutes, correct?

                         

                        which means doing what specifically ?  analysis ? -> Installing 10 patches for this example here to be comparable.

                         

                        why are you using 30 min 'just for calculation' ? -> Just a number. You can put in 10 minutes here as well.

                         

                        My main question: How can I configure BSA to run e.g. a deploy job against 4000 systems at the same time (parallel) and avoid chunk execution? Do I need 4000 Threads?

                        • 10. Re: BSA sizing recommendations for patching 4000 servers at the same time
                          Bill Robinson

                          -> Just please imagine the following to simplify all the things here. Forget about the patching procedure

                          we can't forget about the procedure here.  you say 'patching' takes 30 min per server.  you need to define what 'patching' means here and if it's multiple jobs like patching, remediation, deploy, which jobs are taking up how much time.  those jobs/phases will scale differently and allow for different concurrency. 

                           

                          , lets think about this:

                          I have 1 BLPackage with 10 patches included. Then I have 1 deploy Job to install them on 4000 servers with 1 application server with 100 Work Item Threads.

                          once simulate and staging are complete, commit will not use any wits while it runs which means you can run more targets in parallel than you have WITs for the commit phase.

                           

                          All 4000 systems are exactly the same, and let's assume the deploy job takes exactly 30 minutes to finish on 1 server. I would say 100 servers will then be finished in 30 minutes, right (as I have 100 threads)? So for 4000 systems it will then take 40 * 30 minutes, correct?

                          which part of the deploy ?  simulate and stage take 30 min ?  or simulate, stage and commit take 30 min total.  let's say simulate and stage take 10 min and commit takes 20.  as soon as stage is done on a system, that wit goes back to the pool for use by another target. 
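
To see what handing the WIT back after staging does to the total, here is a small simulation sketch. It assumes commit runs fully async with no concurrency cap of its own (in a real environment other limits may apply), uses the 10/20 minute split from the example above, and the function name is made up for illustration.

```python
import heapq

def deploy_makespan(targets, wits, stage_min, commit_min):
    """Minutes until the last commit ends, if only simulate/stage holds a WIT."""
    # Each heap entry is the time at which one WIT becomes free again.
    free_at = [0.0] * min(wits, targets)
    heapq.heapify(free_at)
    last_commit_end = 0.0
    for _ in range(targets):
        start = heapq.heappop(free_at)       # earliest available WIT
        stage_done = start + stage_min       # WIT is held only until staging ends
        heapq.heappush(free_at, stage_done)  # WIT goes back to the pool
        last_commit_end = max(last_commit_end, stage_done + commit_min)
    return last_commit_end

# 4000 targets, 100 WITs, 10 min simulate+stage, 20 min commit
print(deploy_makespan(4000, 100, 10, 20) / 60, "hours")
```

With these numbers the staging waves take 40 * 10 minutes and only the final 20-minute commit is added on top, so the result is 7 hours rather than the 20 hours you get when a WIT is held for the full 30 minutes.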

                           

                          My main question: How can I configure BSA to run e.g. a deploy job against 4000 systems at the same time (parallel) and avoid chunk execution? Do I need 4000 Threads?

                          depends what part of the deploy job you are asking about here.  simulate and stage - yes - you would need 40 appserver instances, each w/ 100 WITs if you want to have 4000 targets being hit at exactly the same time, concurrently.  what is 'chunk execution' ? 

                           

                          why do you want to hit 4000 systems concurrently ?  what is the actual goal here - to patch some number of patches to 4000 systems in 30 min ?  and by patch do you mean just install patches?  stage patches and deploy ?  run analysis, then stage then deploy ? 

                           

                          we have several customers that will run analysis, remediation and staging well ahead of their patch window and then when the window opens only the commit (install) is left to run.  because it seems silly to burn time doing analysis and file copy during the window where you need to leave time to do the patch install and reboot.

                          • 11. Re: BSA sizing recommendations for patching 4000 servers at the same time
                            Patrik Stanz

                            Hi Bill!

                             

                            Thanks for your help!

                             

                            why do you want to hit 4000 systems concurrently ?

                            It's a requirement from the customer. They want to see that we can install (e.g. 1 patch) on 4000 systems at the same time, as their current system can.

                             

                             

                            what is the actual goal here - to patch some number of patches to 4000 systems in 30 min ?

                            The goal is to serve 4000 systems at the same time (e.g. deploy 1 patch to 4000 systems at the same time) 

                             

                            and by patch do you mean just install patches?

                             

                            Analyse+Copy to target (==staging)+Installation (==Commit/Apply)

                             

                            what is 'chunk execution' ?

                            With 100 threads the system/BSA can only handle 100 servers at the same time (at least it seems like this). So by chunk execution I mean execution/handling of targets in groups (of 100).

                             

                            Is there a rule on how to configure the Work Item Threads (like number of CPUs * 100)? Can we have e.g. 500 Work Item Threads configured? If so, what would be the hardware requirement?

                             

                            Thanks,

                            Patrik

                            • 12. Re: BSA sizing recommendations for patching 4000 servers at the same time
                              Bill Robinson
                              It's a requirement from the customer. They want to see that we can install (e.g. 1 patch) on 4000 systems at the same time, as their current system can.

                              and what infrastructure does their current system have ?  eg, how many 'appservers' to do that ?  and how did you confirm that their current system can analyze, stage and deploy all at the same time ?  i'd push on this a little because most of our customers will do the analysis and staging ahead of time so that the time window they have to install patches is used to install patches, not copy files.  if you really only need to worry about the commit phase, the requirements change a lot.  why is the requirement to do 4000 all at the same time ? why not 4000 in a certain time window ?  i think it's important to get what the end goal is because different systems work differently.  if the real goal is to patch n servers in a certain time window focus on that, not how the execution happens.

                               

                              Is there a rule on how to configure the Work Item Threads (like number of CPUs * 100)? Can we have e.g. 500 Work Item Threads configured? If so, what would be the hardware requirement?

                              max is 100 wit per instance.  8g heap per instance.  so you can have a few systems w/ lots of cpu and memory or many smaller systems.  so that's 40 instances of the appserver.  which i don't think any of our customers are running, even those managing 10k servers and patching all of those.
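
The sizing arithmetic in this reply, as a short sketch. The two constants are the figures quoted above; the function name is illustrative, and the heap total covers Java heap only (no OS or other overhead), which is an assumption of this sketch.

```python
import math
from typing import Tuple

MAX_WITS_PER_INSTANCE = 100  # "max is 100 wit per instance"
HEAP_GB_PER_INSTANCE = 8     # "8g heap per instance"

def sizing_for_concurrency(concurrent_targets: int) -> Tuple[int, int]:
    """Return (appserver instances, total Java heap in GB) for full concurrency."""
    instances = math.ceil(concurrent_targets / MAX_WITS_PER_INSTANCE)
    return instances, instances * HEAP_GB_PER_INSTANCE

instances, heap_gb = sizing_for_concurrency(4000)
print(f"{instances} instances, {heap_gb} GB heap in total")
```

For 4000 concurrent targets this gives 40 instances and 320 GB of heap, which can be spread over a few large hosts or many smaller ones.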

                              • 13. Re: BSA sizing recommendations for patching 4000 servers at the same time
                                Patrik Stanz

                                "and what infrastructure does their current system have ?  eg, how many 'appservers' to do that ?  and how did you confirm that their current system can analyze, stage and deploy all at the same time ?  i'd push on this a little because most of our customers will do the analysis and staging ahead of time so that the time window they have to install patches is used to install patches, not copy files.  if you really only need to worry about the commit phase, the requirements change a lot.  why is the requirement to do 4000 all at the same time ? why not 4000 in a certain time window ?  i think it's important to get what the end goal is because different systems work differently.  if the real goal is to patch n servers in a certain time window focus on that, not how the execution happens."

                                 

                                Before, they had an HPO solution. They did the normal/standard Windows OS patching with the native Windows solution called "WSUS". For emergency patches, for example, they used the HPO solution to roll them out to their systems.

                                Their security council told the Windows team to, e.g., install emergency patches A and B on all their systems immediately, outside the patch window. So they built an HPO package to roll them out to all systems. As I understood it, they can do the rollout with HPO in 2 ways:

                                - HPO copies the patch to the system (without analysis) and tries to install it. The Windows OS patch installer reports the state "Already Installed", "Success" or "Failed" via its return code.

                                - You can create groups there that live-group enrolled servers according to their patch state (read from the Windows OS "Installed Patchlist"), group the servers where emergency patch A or B is missing, and roll it out there.

                                Anyway, they mentioned that they were able to roll the patches (e.g. emergency patch A and B) out on all of their 4000 systems at the very same time, therefore my question regarding parallel execution. And for emergency patches they need their environment configured to roll the patches out as fast as possible.

                                 

                                Currently they have 3 BSA appservers (appserver instances) in place.

                                 

                                 

                                max is 100 wit per instance.  8g heap per instance.  so you can have a few systems w/ lots of cpu and memory or many smaller systems.  so that's 40 instances of the appserver.  which i don't think any of our customers are running, even those managing 10k servers and patching all of those.

                                 

                                Thanks for the info. Is there any info available on how other customers do the patching of e.g. 10k servers? How long does a patching cycle take there? Is there anything available that I can pass to the customer to calm their concerns?

                                 

                                Thanks for your detailed explanations Bill :-)

                                I will pass all the information gathered so far to the customer, and they should decide how they want to use the system in their environment.

                                • 14. Re: BSA sizing recommendations for patching 4000 servers at the same time
                                  Bill Robinson
                                  - HPO copies the patch to the system (without analysis) and tries to install it. The Windows OS patch installer reports the state "Already Installed", "Success" or "Failed" via its return code.

                                  -> bsa can do the same thing.  deploy job, no analysis.  payload copy will use a WIT, commit will happen async.  so then i think the calculation is like:

                                  (time to copy file to one server) x (4000 servers / 100 wit) + (time to install patch ) = total deploy time.
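
The formula above as a small function, extended to more than one appserver instance on the assumption that targets spread evenly over all WITs. The 2-minute copy and 10-minute install times are hypothetical placeholders, not measurements, and the wave count is rounded up because a partial wave still costs a full copy.

```python
import math

def blind_deploy_minutes(targets, copy_min, install_min, instances=1, wits_per_instance=100):
    """(time to copy to one server) x (targets / WITs) + (time to install)."""
    total_wits = instances * wits_per_instance
    copy_waves = math.ceil(targets / total_wits)  # rounded up to whole waves
    return copy_min * copy_waves + install_min

# Hypothetical timings: 2 minutes to copy the payload, 10 minutes to install.
for n in (1, 3, 40):
    print(n, "instance(s):", blind_deploy_minutes(4000, copy_min=2, install_min=10, instances=n), "min")
```

Because the install runs async, only the copy term scales with the number of waves; adding instances shortens the copy part but never the install time itself.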

                                   

                                   

                                  - You can create groups there that live-group enrolled servers according to their patch state (read from the Windows OS "Installed Patchlist"), group the servers where emergency patch A or B is missing, and roll it out there.

                                  you could do the same thing w/ the 'components' in bsa and some discovery rules.

                                   

                                  Anyway, they mentioned that they were able to roll the patches (e.g. emergency patch A and B) out on all of their 4000 systems at the very same time,

                                  that's a pretty vague statement.  that could mean they pushed the patch to 4000 systems at the same time but under the covers you had the 'chunk execution' happening.  did you see any evidence that it was 4000 concurrent, all doing the same steps concurrently on each target ?

                                   

                                  therefore my question regarding parallel execution. And for emergency patches they need their environment configured to roll the patches out as fast as possible.

                                  for emergencies the blind deploy (no analysis) would be the fastest.  probably the group based 'blind deploy' would be better.

                                   

                                   

                                  Thanks for the info. Is there any info available on how other customers do the patching of e.g. 10k servers? How long does a patching cycle take there? Is there anything available that I can pass to the customer to calm their concerns?

                                  there's a section on patching in here.  it applies to 8.9 as well.

                                  BMC Server Automation Sizing Guide 1.0

                                   

                                  i think you need to change the conversation from this '4000 at a time' and focus on the time window and number of targets and then how to meet it.  if we don't need to do analysis then this isn't even really a 'patching' discussion.  it's a deploy discussion.
