What does the util look like on the targets when they are being scanned? Are you using Oracle or MSSQL DB?
Hey Adam. Will need to look at the targets tomorrow for their utilization.
Using MSSQL DB. Looking at the MSSQL DB server, network and CPU utilization match the high throughput.
Can also look at the network flow to see if it's appserver->targets, appserver->SQL, or both.
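A rough sketch of how that split could be tallied, assuming the flow data can be exported as (source, destination, bytes) records. The hostnames and byte counts are made up for illustration, and `bytes_by_peer_group` is a hypothetical helper, not part of any BMC tool.

```python
from collections import defaultdict


def bytes_by_peer_group(flows, appserver, db_hosts):
    """Sum bytes the appserver exchanged with DB hosts vs. all other peers."""
    totals = defaultdict(int)
    for src, dst, nbytes in flows:
        if appserver not in (src, dst):
            continue  # flow does not involve the appserver at all
        # The peer is whichever end of the flow is not the appserver.
        peer = dst if src == appserver else src
        group = "database" if peer in db_hosts else "targets"
        totals[group] += nbytes
    return dict(totals)


# Made-up sample records: (source, destination, bytes)
sample_flows = [
    ("appserver", "mssql01", 900_000),
    ("mssql01", "appserver", 2_400_000),
    ("appserver", "target-a", 40_000),
    ("target-b", "appserver", 15_000),
    ("target-a", "target-b", 999),  # ignored: appserver not involved
]

print(bytes_by_peer_group(sample_flows, "appserver", {"mssql01"}))
```

If the database bucket dwarfs the targets bucket, the bottleneck is more likely between the appserver and SQL than out at the scanned hosts.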
Another question, are your MSSQL table spaces on local disk, SAN, or NAS? Also, on your cpu util on the appserver, is the proc being hijacked by JAVA, NSH, or something else?
Wish I could post some of the statistics here but would be a pain to transfer it from production.
Basically we were seeing a high CPU run queue (20-30) with high system CPU time (versus user CPU time). Today I looked in more detail and found out (with our VMware admins) that the MSSQL VM was being capped on how much CPU resource it could use from the ESX host. Additionally, there was very little traffic going from the appserver to the targets, but a lot of traffic between the appserver and the MSSQL server. These two VMs were on two different ESX hosts.
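For anyone wanting to put numbers on the system versus user split, here is a minimal sketch, assuming the appserver runs Linux and that two samples of the first `cpu` line of `/proc/stat` are taken a few seconds apart. The sample lines below are invented, and `cpu_split` is a hypothetical helper.

```python
def cpu_split(before, after):
    """Return (user_pct, system_pct) between two 'cpu' lines from /proc/stat."""

    def counters(line):
        # First token is the "cpu" label; the rest are cumulative tick counters:
        # user nice system idle iowait irq softirq steal [guest guest_nice]
        values = [int(x) for x in line.split()[1:]]
        values += [0] * (8 - len(values))  # pad if the kernel reports fewer fields
        return values[:8]  # guest ticks are already included in user/nice

    delta = [a - b for b, a in zip(counters(before), counters(after))]
    total = sum(delta)
    if total == 0:
        return (0.0, 0.0)
    user = delta[0] + delta[1]               # user + nice
    system = delta[2] + delta[5] + delta[6]  # system + irq + softirq
    return (100 * user / total, 100 * system / total)


# Invented samples, standing in for two reads of /proc/stat
sample_before = "cpu 1000 0 500 8000 100 0 0 0 0 0"
sample_after = "cpu 1100 0 900 8400 200 0 0 0 0 0"

user_pct, system_pct = cpu_split(sample_before, sample_after)
print(f"user {user_pct:.1f}%  system {system_pct:.1f}%")
```

On a non-Linux appserver the same comparison would have to come from that platform's own tooling; the arithmetic on the deltas is the only part that carries over.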
So, the VM admin vMotioned the SQL VM so that it is co-hosted on the same ESX server (to keep the communication internal to the vswitch), and we saw drastic improvements in the CPU run queue (down to < 7 while processing 800 hosts for a compliance job w/ 10 parallel targets).
The ESX hosts have 6x 1Gb NICs but the throughput is still 1Gb (not 6Gb). What is even stranger is that when NetBackup kicks in in the middle of the night, it way surpasses the NIC throughput we saw when a compliance job was being run.
So, the current resolution seems to be A) removing the CPU cap on the MSSQL VM and/or B) placing both VMs on the same ESX host so that the traffic stays internal to the vswitch.
To your question above, the MSSQL table space is on a local VM, which is in a datastore, which is housed on a fibre channel-based SAN.
From a process perspective, the top process is the blappserv java process itself.
Still have two questions:
1) Is it normal for bl appserver to consume a lot of sys cpu time versus user cpu time?
2) Has anyone seen > 0 CPU run queue time when running a job? Don't know if this is normal or abnormal due to something else we haven't pinpointed yet.
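On question 2, a common rule of thumb (general sysadmin practice, not BMC guidance) is to judge the run queue relative to the number of CPUs rather than against zero. A small sketch of that arithmetic follows; the core count of 4 is an assumption made purely for illustration, since the vCPU count was never stated here.

```python
def run_queue_per_core(run_queue, cores):
    """Normalize a run-queue reading by the number of CPUs available."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return run_queue / cores


ASSUMED_CORES = 4  # hypothetical; substitute the appserver's real vCPU count

# Run-queue readings taken from the numbers quoted in this thread
for label, reading in [("before the move", 25), ("after the move", 7)]:
    ratio = run_queue_per_core(reading, ASSUMED_CORES)
    print(f"{label}: {reading} runnable -> {ratio:.2f} per core")
```

A ratio that sits well above 1 for long stretches suggests work is waiting on CPU; brief spikes above 0 during a job are not unusual on their own.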
I have never looked into either of those two questions.
I can say though that BMC's best practice is to not have a database on a virtual machine (for any of our products). Too many unexplainable issues have arisen when databases are virtualized. Not sure if that is the issue for you here, but it does complicate things.