Working on an NSH script to update the server status (Utility updateServersStatus mimics the "verify" functionality found in the console). The commands used in the script are below. To summarize the problem:
1) Verify the servers using Verify in the console. The AGENT_STATUS is set to "agent is alive".
2) Ensure the servers in question are licensed via agentinfo.
3) Execute the NSH commands to run updateServersStatus. The script returns in just a few seconds (well under the 60-second timeout set on the updateServersStatus command) with the message "Could not update all properties for 'statp51' because the agent is not responding. Only the AGENT_STATUS property was updated." for every machine in the group.
4) The AGENT_STATUS on the target machines is set to "agent is not responding". However, the agent is running, I am able to execute jobs against the machines, and doing a verify again puts the status back to "agent is alive".
BladeLogic 8.2 GA (184.108.40.206) and the target machines in question are RHEL 64-bit. The 8.2 agent is installed on the target machines and is running w/o issue.
SERVERGROUP="/By Role/Stats Servers/Production"
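blcli_execute Server listServersInGroup "$SERVERGROUP"
blcli_storeenv SERVERLIST # Store the result of the previous command in SERVERLIST (assumed; this step is not shown in the excerpt).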
FMT_SERVERLIST=`echo $SERVERLIST | sed -E 's/[ ]/,/g'` # Make the list comma-delimited for input to Utility updateServersStatus.
blcli_execute Utility updateServersStatus $FMT_SERVERLIST 10 60000 true
voidCould not update all properties for 'statp51' because the agent is not responding. Only the AGENT_STATUS property was updated.
Could not update all properties for 'statp52' because the agent is not responding. Only the AGENT_STATUS property was updated.
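Since updateServersStatus expects a comma-delimited list, it can help to rule out the list formatting as a cause. The following is a minimal sketch in plain POSIX shell (not tested under NSH) that joins a list with commas whether the names are separated by spaces, tabs, or newlines. The helper name to_csv and the sample host list are assumptions for illustration only.

```shell
# Hypothetical helper: join a whitespace- or newline-separated list with commas.
to_csv() {
  # Turn every run of spaces, tabs, and newlines into a single comma,
  # then strip any leading or trailing comma.
  printf '%s\n' "$1" | tr -s ' \t\n' ',' | sed -e 's/^,//' -e 's/,$//'
}

# Sample input using the two host names from the output above.
SERVERLIST='statp51
statp52'
FMT_SERVERLIST=$(to_csv "$SERVERLIST")
echo "$FMT_SERVERLIST"   # prints: statp51,statp52
```

Echoing the formatted list before passing it to blcli_execute makes it easy to confirm that the NSH client on Windows and the one on the application server build the same argument.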
I removed the line that formats the server list with commas and just hard-coded 1 server (with the comma) for testing. This time I am using a Windows server as the target machine.
I did the same as above: verify server, ensure it's licensed and the current AGENT_STATUS is alive. I execute the script and I get the same message about the agent not responding.
RYAN-W7# blcli_execute Utility updateServersStatus $SERVERLIST 10 60000 true
voidCould not update all properties for 'odcsgacpxoe02' because the agent is not
responding. Only the AGENT_STATUS property was updated.
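For a larger group, the affected hosts can be pulled out of that output rather than read by eye. This is a rough sketch in plain POSIX shell (not tested under NSH) that relies only on the message wording shown above; the function name failed_servers is hypothetical.

```shell
# Hypothetical helper: read updateServersStatus output on stdin and print
# the server name from each "Could not update all properties" message.
failed_servers() {
  sed -n "s/.*Could not update all properties for '\([^']*\)'.*/\1/p"
}

# Example using the messages from this post (the leading "void" is ignored).
printf '%s\n' \
  "voidCould not update all properties for 'statp51' because the agent is not responding. Only the AGENT_STATUS property was updated." \
  "Could not update all properties for 'statp52' because the agent is not responding. Only the AGENT_STATUS property was updated." \
  | failed_servers
```

The resulting names could then be compared against the full group to see whether every server is affected or only some.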
The rscd.log on the target machine contains the following. The log shows nothing about the agent not responding.
03/15/12 14:08:34.577 INFO rscd - ODCSGACPXOE02 4272 SYSTEM (Not_available): (Not_available): Adding account right "SeDenyInteractiveLogonRight" to user BladeLogicRSCD@ODCSGACPXOE02 for user privilege mapping
03/15/12 14:08:34.878 INFO rscd - 10.28.64.119 4272 BladeLogicRSCD@ODCSGACPXOE02->Anonymous:PrivilegeMapped (root): agentinfo: agentinfo odcsgacpxoe02
03/15/12 14:08:34.883 INFO rscd - ODCSGACPXOE02 4272 BladeLogicRSCD (Not_available): (Not_available): The operation completed successfully.
03/15/12 14:08:46.188 INFO1 rscd - 10.28.64.119 3840 svc_bladmin@DEVCS:PasswordLogon (BLAdmins:RYAN@WORKDOMAIN.COM): CM: > [Client] Retrieving property values
03/15/12 14:08:46.192 INFO rscd - ODCSGACPXOE02 3840 svc_BLAdmin (Not_available): (Not_available): The operation completed successfully.
03/15/12 14:08:47.239 INFO rscd - ODCSGACPXOE02 3840 svc_BLAdmin (Not_available): (Not_available): The operation completed successfully.
03/15/12 14:08:47.242 INFO rscd - ODCSGACPXOE02 3840 svc_BLAdmin (Not_available): (Not_available): The operation completed successfully.
The target machine's RSCD agent is still running. The agent status is now "agent is not responding", but when I do a verify the agent status goes back to "agent is alive".
There is a gap of roughly 11 seconds between the last entry running as the local user BladeLogicRSCD (14:08:34.883) and what may be the switch to the automation principal svc_bladmin@DEVCS (14:08:46.188). Since I am passing 'true' to the updateServersStatus command, the request goes through the application server, and an automation principal is set on the role being used. Could that be contributing to the issue?
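The size of that gap can be checked directly from the two log timestamps. A small sketch in plain POSIX shell with awk, assuming both entries fall on the same day; the helper name ts_gap is hypothetical.

```shell
# Hypothetical helper: seconds between two HH:MM:SS.mmm timestamps
# taken from the same day of the log.
ts_gap() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, x, ":"); split(b, y, ":")
    # Convert each timestamp to seconds since midnight and subtract.
    printf "%.3f\n", (y[1]*3600 + y[2]*60 + y[3]) - (x[1]*3600 + x[2]*60 + x[3])
  }'
}

# Last BladeLogicRSCD entry vs. first svc_bladmin@DEVCS entry.
ts_gap 14:08:34.883 14:08:46.188   # prints: 11.305
```

The two timestamps are about 11.3 seconds apart, which is still far shorter than the timeout passed to updateServersStatus.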
Anyone have any ideas? Am I hitting some sort of bug here?
If I use NSH directly on the application server using the exact same commands and targets (app server is RHEL 5.5), the command runs w/o issue. I am only experiencing the problem when I run NSH locally (Win 7 x64).
Message was edited by: Ryan Add more info.