This depends. You should read through the documentation on sizing and scalability to understand how to size your environment correctly.
Generally, though, it depends less on the number of targets than on how many jobs you are running, what kind of jobs they are, etc.
The real question is how many parallel jobs you plan to execute and how many targets each job has.
With BladeLogic, it is always the job server or the appserver that executes tasks on the target. The target never executes anything on its own.
The appserver or job server establishes a connection to each target to execute a task or job. There is a physical limit to the total number of connections that can go out from your appserver to the target machines, and that limit defines how far you can scale.
The number of jobs executing in parallel also matters, since every job occupies memory and CPU on the appserver.
The third bottleneck is your fileserver. If your jobs push a lot of files from the fileserver to the targets, disk I/O will also come into play.
So your use cases will actually define how far you can scale with a single appserver.
Parallel jobs: about 20.
Targets in a job: up to 7000. We have jobs that run for 15 hours.
So let's say your appserver has 8 GB of RAM.
All your jobs are NSH script jobs.
Each job is configured to limit its parallel targets to 15, i.e. every job processes only 15 targets at a time.
In this scenario, the maximum number of targets being processed across all jobs will be 300 (20 jobs × 15 targets).
300 parallel outgoing SSL connections and 300 parallel database connections should work, I think.
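The peak-concurrency figure in this example is simply the number of parallel jobs times the per-job parallel-target limit. A minimal Python sketch, assuming every job is saturated (always has its full 15 targets in flight) and that each in-flight target holds one outgoing connection and one database connection, as the example assumes; the function name is hypothetical:

```python
def concurrent_targets(parallel_jobs: int, targets_per_job_limit: int) -> int:
    """Upper bound on targets processed at once across all running jobs.

    Assumes every job is saturated, i.e. always has `targets_per_job_limit`
    targets in flight. Per the example, each in-flight target is taken to
    hold one outgoing SSL connection and one database connection.
    """
    return parallel_jobs * targets_per_job_limit


if __name__ == "__main__":
    peak = concurrent_targets(parallel_jobs=20, targets_per_job_limit=15)
    print(f"peak targets in flight:   {peak}")  # 300
    print(f"outgoing SSL connections: {peak}")
    print(f"database connections:     {peak}")
```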
That said, jobs will take a very long time to complete. For example, if a job takes 10 minutes per target on average, a single job will take at least 7000 / 15 × 10 ≈ 4667 minutes to complete, i.e. more than 77 hours.
This is not acceptable, since maintenance windows are fixed, so you need to add more job servers to complete the work in a fixed amount of time.
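The runtime estimate can be reproduced by treating the job as sequential batches of 15 targets. This is a sketch assuming every target takes exactly the 10-minute average and ignoring job setup and logging overhead; rounding up to whole batches gives 467 batches × 10 min = 4670 minutes, about 77.8 hours. The function name is hypothetical:

```python
import math


def job_duration_minutes(total_targets: int, parallel_targets: int,
                         minutes_per_target: float) -> float:
    """Estimate a job's runtime when targets are processed in fixed-size batches.

    Assumes every target takes exactly `minutes_per_target` and that the job
    keeps `parallel_targets` targets in flight until the last (partial) batch.
    Job setup/teardown and database logging overhead are ignored.
    """
    batches = math.ceil(total_targets / parallel_targets)
    return batches * minutes_per_target


if __name__ == "__main__":
    minutes = job_duration_minutes(7000, 15, 10)
    print(f"{minutes} min, about {minutes / 60:.1f} h")  # 4670 min, about 77.8 h
```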
Hope this helps!
If you need a job with 7000 targets to finish within a given amount of time, you need to do the math: figure out how long it runs per target, and then how many parallel targets you'd need to finish in the time you have. The maximum a single JOB server instance can handle is 100 targets in parallel. On a single physical server you can have multiple BSA appserver instances (if you have the memory and CPU).
So do the math: figure out what your acceptable runtime is, then how many app/job server instances you'd need for that, and either scale up (more CPU/memory in the same box plus more instances) and/or scale out (more boxes with more instances).
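The "do the math" step can be sketched as follows: work out how many sequential rounds of targets fit into the window, the parallelism that implies, and then how many JOB server instances that needs at the 100-parallel-targets-per-instance figure quoted above. The 4-hour window in the usage example is a hypothetical input, the function names are made up for illustration, and the model ignores job overhead and other work competing for the same servers:

```python
import math

# Figure quoted in the thread: one JOB server instance handles at most
# 100 targets in parallel.
MAX_PARALLEL_TARGETS_PER_INSTANCE = 100


def required_parallel_targets(total_targets: int, minutes_per_target: int,
                              window_minutes: int) -> int:
    """Smallest parallel-target count that fits the job into the window."""
    rounds = window_minutes // minutes_per_target  # sequential rounds that fit
    if rounds < 1:
        raise ValueError("window is shorter than one target's runtime")
    return math.ceil(total_targets / rounds)


def job_server_instances(parallel_targets: int) -> int:
    """JOB server instances needed to provide that many parallel targets."""
    return math.ceil(parallel_targets / MAX_PARALLEL_TARGETS_PER_INSTANCE)


if __name__ == "__main__":
    # Hypothetical inputs: 7000 targets, 10 min per target, 4-hour window.
    parallel = required_parallel_targets(7000, 10, 240)
    print(parallel, job_server_instances(parallel))  # 292 3
```

Whether those instances go on one bigger box (scale up) or several boxes (scale out) is then a question of available CPU and memory per host.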
There’s some math around this in the Deployment Architecture BP webinar. Basically, a given task (work item) runs per target, using a workitemthread, for a certain number of minutes (workitemthread-minutes). Multiply by the number of targets to get the total number of workitemthread-minutes. For example, take an inventory task that takes, say, 2 minutes to collect some data on a given machine. Collecting data from 500 hosts will take 2 min × 500 hosts = 1000 workitemthread-minutes. If you have 50 workitemthreads available (for example, on an older single-appserver environment), it’ll take roughly 20 minutes to run the core parts of that task (1000 / 50 = 20). If you had 500 workitemthreads available (5 JOB instances at 100 WIT each), the core parts of that task could run in as little as 2 minutes (1000 / 500 = 2). One other caveat: how much other work is running on the system.
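The workitemthread-minute arithmetic above translates directly into two small functions. This is a sketch of the back-of-the-envelope model only (function names are hypothetical); it gives a lower bound that assumes all workitemthreads stay busy on this one task and ignores setup, teardown and database writes:

```python
def workitemthread_minutes(minutes_per_target: float, targets: int) -> float:
    """Total work for a task: per-target runtime multiplied by target count."""
    return minutes_per_target * targets


def core_runtime_minutes(total_wit_minutes: float, available_wits: int) -> float:
    """Lower-bound runtime if every workitemthread stays busy on this task.

    Ignores job setup/teardown, result/log writes and competing work.
    """
    return total_wit_minutes / available_wits


if __name__ == "__main__":
    work = workitemthread_minutes(2, 500)       # 1000
    print(core_runtime_minutes(work, 50))       # 20.0
    print(core_runtime_minutes(work, 5 * 100))  # 2.0
```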
There’s some overhead to setting up and shutting down a job, and writing the results and logs to the database, but generally having more capacity lets jobs run faster.
As a rule, I don’t run single jobs on more than about 4000 targets, I use JOB_PART_TIMEOUT aggressively, and I always make sure to use a server smart group that excludes unavailable agents, since trying to talk to them is what causes very large jobs to take much longer than expected to complete.
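To see why unavailable agents hurt so much, the workitemthread-minute model can be extended with time lost on targets that never answer. This sketch assumes each unreachable target ties up a workitemthread for a fixed number of minutes before the job gives up. The 5% unreachable rate, the 30-minute loss and the 5-minute cap are hypothetical numbers, and the capped case only approximates the effect a per-target timeout such as JOB_PART_TIMEOUT is meant to have, not its exact behaviour:

```python
def runtime_minutes(reachable: int, unreachable: int, minutes_per_target: float,
                    minutes_lost_per_unreachable: float, wits: int) -> float:
    """Workitemthread-minute estimate including time wasted on dead agents.

    Hypothetical model: each unreachable target ties up a workitemthread for
    `minutes_lost_per_unreachable` minutes before the job gives up on it.
    """
    wit_minutes = (reachable * minutes_per_target
                   + unreachable * minutes_lost_per_unreachable)
    return wit_minutes / wits


if __name__ == "__main__":
    # Hypothetical: 4000 targets, 5% unreachable, 2 min per live target,
    # 100 workitemthreads.
    print(runtime_minutes(3800, 200, 2, 30, 100))  # 136.0 (30 min lost per dead agent)
    print(runtime_minutes(3800, 200, 2, 5, 100))   # 86.0 (per-target time capped at 5 min)
    print(runtime_minutes(3800, 0, 2, 0, 100))     # 76.0 (dead agents excluded up front)
```

In this toy model, 200 dead agents nearly double the runtime, capping their cost recovers most of it, and excluding them up front recovers all of it.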
Please follow up with questions or concerns.
Thank you Saif, Sean, Bill and others... JOB_PART_TIMEOUT is a good idea. I will test it.