Is there anything in the agent log when you run Verify or agentinfo?
Is there anything in the appserver log?
It would also be worth searching the KB. There is a lot of relevant information there.
What are the agent logs telling you that’s different between the connections that “work” and the ones that don’t? An I/O error usually has a useful log entry.
No, there are no related entries in the target server agent logs, the configuration app server console logs, or the app server agent logs. It appears that either those commands don't involve communication with the target server agent, or they communicate differently from the other commands in a way that makes them take longer to respond.
Are there firewalls between the appservers and these targets? If there are no entries on the agents, that means the communication isn't even getting to the agents, so the question is: who is the appserver actually talking to?
From the overall description it sounds as though you have a communication issue between the appservers and the agents on the targets. I would recommend going to each appserver, running agentinfo from each one against a single target, and posting the results. I've seen cases where one or more appservers had trouble communicating with targets, and it showed up as a random issue.
Agentinfo returns an I/O error on these servers; however, we were able to telnet to these nodes over port 4750 from the app servers.
We found the cause of the issue. The servers in question have two IP addresses in DNS. Example:
Addresses: 10.176.136.27, 10.168.128.27
When we do a telnet from the app servers to Linux540 over port 4750, it works fine. However, if we try to SSH to Linux540 (using the name) from a known working bastion host, it won't connect. If we use the correct IP address, it connects fine.
Similar behavior is happening within BL as well: we can connect to the File System through Live View just fine in the BL client, and we can push ACLs fine. But when we try to do a 'Verify', it times out. I'm guessing the 'Verify' command and 'agentinfo' are making calls to Linux540 in a different way than Live View does.
So the solution is obviously to remove the incorrect IP address from the DNS entry, but it leaves open the question of how those commands are internally trying to communicate with the target server. Not sure we'll get an answer to that here, but one of the BMC BladeLogic software developers could probably tell us.
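For anyone who wants to check for the same condition, here is a minimal Python sketch (not a BladeLogic tool) that resolves a hostname to every IPv4 address DNS returns and tries a TCP connection to port 4750 on each address individually. The function names `resolve_all`, `check_port`, and `probe` are made up for illustration. A name-based telnet can succeed as long as any one address answers, since many clients fall back to the next address, so testing each address separately is what exposes the stale entry.

```python
import socket


def resolve_all(hostname, port):
    """Return every distinct IPv4 address the resolver reports for hostname."""
    infos = socket.getaddrinfo(hostname, port, socket.AF_INET, socket.SOCK_STREAM)
    addresses = []
    for _family, _socktype, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]
        if ip not in addresses:
            addresses.append(ip)
    return addresses


def check_port(ip, port, timeout=3.0):
    """Return True if a TCP connection to ip:port succeeds within timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


def probe(hostname, port=4750, timeout=3.0):
    """Map each resolved address to whether the given port is reachable on it."""
    return {ip: check_port(ip, port, timeout) for ip in resolve_all(hostname, port)}
```

Running `probe("Linux540")` from each app server should return one entry per address in DNS. Any address that maps to `False` is a candidate for the invalid record.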
Is the IP_ADDRESS property value populated for any of these systems?
Yes, it's populated and actually has the correct IP address in it.
It could be that the appserver has the wrong IP cached… fixing DNS and bouncing the appserver should fix this.
I'm putting in a request to our network team to remove the invalid IPs from the DNS entries. Will let you know if that fixes the issue.