This may be related to the following known issue:
BMC Server Automation (BSA) Windows Patch Analysis or Compliance jobs are hanging when run against a large number of servers.
The job has a JOB_PART_TIMEOUT defined, so it is not hanging due to issues with particular servers.
Running the job against a small number of servers lets it run to completion.
The same job against the same group of servers may have worked correctly until a recent upgrade.
When the job is in this state, other BSA jobs may be affected and hang.
BMC Server Automation 8.5.01
BMC Server Automation 8.5.01 Patch 1
BMC Server Automation 8.5.01 Patch 2
BMC Server Automation 8.5.01 Patch 3
BMC Server Automation 8.5.01 Patch 4
BMC Server Automation 8.3.03.155
The root cause of this issue may be a defect in certain builds of the Oracle Java Runtime Environment (JRE).
The affected builds which shipped with certain versions of BSA are:
JRE Version = 1.7.0_55 - Ships with BSA 8.5.01 and 8.5.01 Patches 1 through 4.
JRE Version = 1.6.0_75 - Ships with BSA 8.3.03.155
To confirm this might be the issue, check the version of the JRE which is being used in this BSA environment. The easiest way to check is from the following area in the BSA Console:
Configuration - Infrastructure Management - Application Servers
Select any appserver and click on Appserver Service
Scroll down to where the JRE version is listed, e.g.:
JRE Version = 1.6.0_75 from Sun Microsystems Inc. or
JRE Version = 1.7.0_55 from Oracle Corporation
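If you have many appservers to check, the comparison can be scripted once the "JRE Version" line has been copied out of the console. The following is only a minimal sketch: it assumes the line has the format shown above, and the function names are hypothetical helpers, not part of BSA.

```python
import re

# Builds the article lists as affected.
AFFECTED_BUILDS = {"1.7.0_55", "1.6.0_75"}

# Matches lines such as "JRE Version = 1.7.0_55 from Oracle Corporation".
_JRE_LINE = re.compile(r"JRE Version\s*=\s*(\d+\.\d+\.\d+_\d+)")


def parse_jre_version(line):
    """Return the build string (e.g. '1.7.0_55') from a console line, or None."""
    match = _JRE_LINE.search(line)
    return match.group(1) if match else None


def is_affected(line):
    """True if the line reports one of the builds listed as affected."""
    return parse_jre_version(line) in AFFECTED_BUILDS
```

For example, `is_affected("JRE Version = 1.7.0_55 from Oracle Corporation")` returns `True`, while a line reporting 1.7.0_71 returns `False`.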
Upgrade to BSA 8.5.1 Patch 5 when it releases in March 2015. BSA 8.5.1 Patch 5 will ship with JRE Version 1.7.0_71, which resolves this issue.
If this issue is encountered before 8.5.1 Patch 5 is released and must be resolved immediately, contact BMC Support, who may be able to assist with a workaround of either downgrading or upgrading the JRE to a version known not to have this issue, e.g. 1.6.0_37 (for BSA 8.3) or 1.7.0_04/1.7.0_71 (for BSA 8.5.1.X). Modifying the JRE version should not be attempted without the guidance of BMC Support.
You can set the JOB_PART_TIMEOUT property on the job.
You can also raise the limit on parallel execution.
I took out all timeouts from the jobs and ran it against 30 servers in parallel max. It's taking much longer to run now, but at least I'm not getting a massive amount of cancellations. Running CIS-type compliance against 950 Red Hat endpoints is taking around 3 hrs, but to be fair we have a lot of custom scripts (ext objects) that have to open new JVMs for every rule.
Why do you need to open JVMs in your EOs?
Sorry, didn't mean JVMs. It opens a scriptutil session for every sh script, so it's using up a worker thread for each session.
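For a rough sense of scale, the numbers quoted above (950 endpoints, 30 in parallel, around 3 hours) can be turned into an average time per server. This is only a back-of-envelope sketch: it assumes all 30 parallel slots stay busy for the whole run, which a real job will not quite manage, and the function name is made up for illustration.

```python
def average_minutes_per_server(servers, parallel, total_hours):
    """Rough per-server time, assuming every parallel slot is busy throughout."""
    slot_minutes = parallel * total_hours * 60  # total slot time available
    return slot_minutes / servers


avg = average_minutes_per_server(servers=950, parallel=30, total_hours=3)
print(f"{avg:.1f} minutes per server on average")  # prints 5.7
```

At roughly 5 to 6 minutes per endpoint, most of the runtime would be per-server work (such as the scriptutil sessions) rather than queueing, so raising the parallel limit only helps if the appserver has worker threads to spare.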