We are running 8.0 SP4:
Guys, I have two Jython jobs that run against 2000+ agents. The first only updates the server status against a smart group: Agent not responding. This job works great when run through either of our appservers, whether I schedule it via the GUI or run it inside an nsh shell. In short, it kicks off bljython.bat jythonscript.py.
The update cmd is: jli.run(['Utility','updateServersStatus',host,'20','120000','false'])
My other job attempts to update all server properties, running the same bljython.bat and passing it a Jython script. The update cmd is:
jli.run(['Utility','updateServersStatus',host,'20','120000','true'])
The only difference between the two is true vs. false. When I run this job, or simply call bljython.bat jythonscript.py in a cmd window on either appserver host, the script executes and reports a successful update for each server, but the list of unresponsive agents grows from 50 to well over 1000. I can run the same script on any non-appserver host through an nsh command line and it does what it should: the list is updated correctly and the unresponsive list doesn't grow. We currently use the built-in options in the GUI to update all server properties nightly, but that job also spikes the unresponsive agents. Keep in mind that the agents that get marked down are all responding fine, and it's not a timeout issue. Has anyone experienced this?
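For anyone trying to reproduce this, here is a rough sketch of how the two jobs differ, written so it can be dry-run without BladeLogic. The names build_update_cmd, run_updates and dry_run are made up for illustration and are not part of BladeLogic; the argument list is copied from the two commands above, and I am assuming the real script would pass jli.run as the runner.

```python
# Sketch only: the runner is passed in so the loop can be tested without
# BladeLogic. On an appserver you would pass jli.run instead of dry_run
# (assumption: jli is already set up as in the original scripts).

def build_update_cmd(host, update_props):
    """Return the argument list from the post for one host.

    '20' and '120000' are copied verbatim from the post; their meaning is
    not stated there. Only the last argument differs between the two jobs.
    """
    if update_props:
        flag = 'true'
    else:
        flag = 'false'
    return ['Utility', 'updateServersStatus', host, '20', '120000', flag]


def run_updates(hosts, runner, update_props):
    """Call runner once per host and collect (host, result) pairs."""
    results = []
    for host in hosts:
        cmd = build_update_cmd(host, update_props)
        results.append((host, runner(cmd)))
    return results


def dry_run(cmd):
    # Stand-in for jli.run: returns the command it would have sent.
    return ' '.join(cmd)


if __name__ == '__main__':
    # Hypothetical host names, status-only job (last argument 'false').
    for host, result in run_updates(['hostA', 'hostB'], dry_run, False):
        print('%s -> %s' % (host, result))
```

Running the same host list once with False and once with True, from an appserver and from a non-appserver host, would at least show whether the commands being sent are identical in both places.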
I saw something similar at a client site, where the status of some agents was marked as 'unavailable' while some batch jobs were running.
It had something to do with network availability; it wasn't a BladeLogic system error. The agents technically were not down but appeared unavailable in the console. Not sure if it's the same thing you're experiencing.