BladeLogic RSCD agent stability is better than it has ever been. That said, with 24,000 server objects, a 5% failure rate still means 1,200 down agents. We needed a way to quickly identify all the issues with a server that is not communicating with the BladeLogic infrastructure (for example: RSCD agent installed, RSCD version, service running, service start type, FQDN lookup, open firewall ports, and so on). I'm able to do this level of checking outside of BladeLogic with custom PowerShell scripts.
With the custom Get-RSCDHealth command you can quickly see what is wrong with the RSCD service on a client server. After running additional custom commands, you can track down the exact issue that is keeping the server from functioning correctly.
If I can write a PowerShell script to handle this level of troubleshooting, I would like to see BMC add this level of support.
The example below is a proof of concept taken from a working PowerShell module.
# Full RSCD Health Check
PS C:\> Get-RSCDHealth ClientServer01.domain.com
Target Server : ClientServer01.domain.com
Agent Version : 8.3.02.332
Service Status : Running
Service Start Type : Auto
Desktop Interact : True
Resolved FQDN : False
Firewall Port Open : False
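As a rough illustration of how a summary like the one above might be assembled, here is a minimal sketch. It is not the actual Get-RSCDHealth code: the function name New-RscdHealthRecord, its parameters, and the extra Healthy field are hypothetical, and the service name RSCDsvc in the usage comment is an assumption about how the agent is registered on Windows.

```powershell
# Hypothetical sketch, not the module's real implementation.
# Combines the individual check results into one record per server.
function New-RscdHealthRecord {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)][string]$ComputerName,
        # Win32_Service instance for the agent, or $null when the service is missing.
        [object]$Service,
        [string]$AgentVersion,
        [bool]$FqdnResolved,
        [bool]$PortOpen
    )

    $installed = $null -ne $Service

    [pscustomobject]@{
        TargetServer     = $ComputerName
        AgentInstalled   = $installed
        AgentVersion     = $AgentVersion
        ServiceStatus    = if ($installed) { $Service.State } else { 'NotInstalled' }
        ServiceStartType = if ($installed) { $Service.StartMode } else { $null }
        DesktopInteract  = if ($installed) { [bool]$Service.DesktopInteract } else { $false }
        ResolvedFqdn     = $FqdnResolved
        FirewallPortOpen = $PortOpen
        # Healthy only when every individual check passes.
        Healthy          = $installed -and $Service.State -eq 'Running' -and $FqdnResolved -and $PortOpen
    }
}

# Possible usage (assumes the agent's service is named 'RSCDsvc' and WinRM/CIM access works):
# $svc = Get-CimInstance -ClassName Win32_Service -ComputerName $server -Filter "Name='RSCDsvc'"
# New-RscdHealthRecord -ComputerName $server -Service $svc -FqdnResolved $false -PortOpen $false
```

Keeping the record builder separate from the data collection means the same function can be fed from WMI, a remote session, or a cached inventory without changing the output shape.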
# Verify Server FQDN
PS C:\> Get-FQDN ClientServer01 -Verbose
VERBOSE: Deleting previous log: C:\temp\BSAAgentCheck\Get-FQDN.log
VERBOSE: Creating C:\temp\BSAAgentCheck\Get-FQDN.log
VERBOSE: Starting Clear-JobQueue Process at [06/03/2015 13:31:15]
VERBOSE: Beginning input loop
VERBOSE: Performing the operation "Get-FQDN" on target "ClientServer01".
VERBOSE: Getting FQDN for ClientServer01
VERBOSE: DNSshell-FQDN Lookup: for ClientServer01
VERBOSE: DotNET-FQDN Lookup: ClientServer01.domain.com for Server Name: ClientServer01
VERBOSE: AD-FQDN Lookup: ClientServer01.domain.com for Server Name: ClientServer01
VERBOSE: DNSshell-FQDN Lookup: NO DNS RECORD for Server Name: ClientServer01
VERBOSE: One of Three FQDN Values Returned did not match FQDN Results: False
VERBOSE: ClientServer01: Job Queue has been processed.
VERBOSE: Ending Clear-JobQueue Process at [06/03/2015 13:31:16]
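The verbose output above suggests that the FQDN check passes only when all three lookup sources return the same name. A minimal sketch of that comparison step follows. The function name Test-FqdnAgreement and the hashtable keys are hypothetical; only the .NET call in the usage comment is a real API, and the other lookups are left for the caller to supply.

```powershell
# Hypothetical sketch of the "all lookups must agree" rule.
function Test-FqdnAgreement {
    [CmdletBinding()]
    param(
        # Map of lookup source name to the FQDN it returned ($null or '' when nothing was found).
        [Parameter(Mandatory)]
        [hashtable]$Lookups
    )

    $values = @($Lookups.Values)
    if ($values.Count -eq 0) { return $false }

    # A source that returned nothing counts as a mismatch.
    foreach ($v in $values) {
        if ([string]::IsNullOrWhiteSpace($v)) { return $false }
    }

    # DNS names are case-insensitive, so compare lowercased values.
    $distinct = @($values | ForEach-Object { $_.ToString().ToLowerInvariant() } | Select-Object -Unique)
    return ($distinct.Count -eq 1)
}

# Possible usage; the AD and DNS values would come from whatever lookups are available:
# Test-FqdnAgreement -Lookups @{
#     DotNet = [System.Net.Dns]::GetHostEntry('ClientServer01').HostName
#     AD     = 'ClientServer01.domain.com'
#     Dns    = $null    # no DNS record found
# }
```

With the values from the transcript (two sources returning ClientServer01.domain.com and one returning no record), this comparison yields False, which matches the reported result.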
# Verify that the firewall ports are open from the BSA servers to the client server running the RSCD agent
PS C:\> Get-RSCDportstatus ClientServer01.domain.com
BSA Server List         Target Server              Exit Status
---------------         -------------              -----------
BSAserver01.domain.com  ClientServer01.domain.com  0
BSAserver02.domain.com  ClientServer01.domain.com  0
BSAserver03.domain.com  ClientServer01.domain.com  7
BSAserver04.domain.com  ClientServer01.domain.com  0
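A port check of this kind can be sketched with a plain TCP connect. The example below is an assumption-laden illustration, not the Get-RSCDportstatus implementation: the function name Test-TcpPort is hypothetical, it returns a boolean rather than the exit status shown above, and it defaults to TCP 4750, the standard RSCD agent port. Running it from each BSA server would additionally require something like PowerShell remoting, which is only hinted at in the closing comment.

```powershell
# Hypothetical sketch: returns $true when a TCP connection can be opened within the timeout.
function Test-TcpPort {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)][string]$ComputerName,
        [int]$Port = 4750,        # default RSCD agent port
        [int]$TimeoutMs = 3000
    )

    $client = New-Object System.Net.Sockets.TcpClient
    try {
        $task = $client.ConnectAsync($ComputerName, $Port)
        # Wait() returns $false on timeout and throws if the connection was refused.
        if (-not $task.Wait($TimeoutMs)) { return $false }
        return $client.Connected
    }
    catch {
        return $false
    }
    finally {
        $client.Close()
    }
}

# Possible usage from the local machine:
# Test-TcpPort -ComputerName 'ClientServer01.domain.com'
#
# To test from each BSA server instead (assumes PowerShell remoting is enabled there):
# Invoke-Command -ComputerName $bsaServers -ScriptBlock ${function:Test-TcpPort} `
#     -ArgumentList 'ClientServer01.domain.com', 4750
```

The explicit timeout matters at this scale: a firewall that silently drops packets would otherwise hold each check for the operating system's full connect timeout.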
The above is only the first step. Afterward, the same scripts can be used to audit servers and provide self-healing. The scripts run outside BSA but are triggered from within BladeLogic through compliance (a component template), with a mid server handling the Get- commands and remediation commands.