Is this lockout reproducible? If so, please put the RSCD agent into debug mode, increase the number of rolled logs it keeps, and then reproduce the error. Send the logs to support for review and diagnosis.
You need to rename the domain-level BladeLogicRSCD account, as described on this page: http://docs.bmc.com/docs/display/public/bsa83/Installing+RSCD+agents+in+a+replicated+domain+controller+environment
This should prevent the lockout. In some cases, running certain binaries as a local account on a member server causes them to try to authenticate to the domain as that account. 'certutil' is one example of such a command, and apparently the .NET installer is another. There is nothing that can be done in BladeLogic to prevent this.
As a test, try this:
On a member server, create a new local account in the Administrators group and call it testadmin.
Log in as this account and run the installer.
Search the domain controller's security logs for failed authentication attempts as 'testadmin' from this server.
That should prove that the problem is outside of our control.
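If the security log is exported for offline review, the search in the last step could be sketched as a simple filter. This is a minimal sketch, not a BMC or Microsoft tool: it assumes each event has been turned into a dict with hypothetical keys "EventID", "TargetUserName", "Source" and "Success", and the host name MEMBER01 and the sample rows are made up. The event IDs themselves are standard Windows Security events (4625 logon failure, 4771 Kerberos pre-authentication failure, 4776 NTLM credential validation).

```python
from typing import Iterable

# Standard Windows Security event IDs that can record a failed authentication.
# 4776 is logged for both success and failure, so the Success flag is checked too.
FAILED_AUTH_EVENT_IDS = {4625, 4771, 4776}


def failed_auth_attempts(events: Iterable[dict], account: str, source: str) -> list:
    """Return events showing failed authentication for `account` from host `source`.

    Assumes hypothetical keys "EventID", "TargetUserName", "Source", "Success".
    """
    matches = []
    for event in events:
        if event["EventID"] not in FAILED_AUTH_EVENT_IDS:
            continue
        if event.get("Success", False):
            continue  # skip successful 4776 validations
        # Windows account and host names are case-insensitive.
        if event["TargetUserName"].lower() != account.lower():
            continue
        if event["Source"].lower() != source.lower():
            continue
        matches.append(event)
    return matches


# Made-up sample rows for illustration only.
sample = [
    {"EventID": 4771, "TargetUserName": "testadmin", "Source": "MEMBER01", "Success": False},
    {"EventID": 4624, "TargetUserName": "testadmin", "Source": "MEMBER01", "Success": True},
    {"EventID": 4776, "TargetUserName": "TESTADMIN", "Source": "member01", "Success": False},
    {"EventID": 4625, "TargetUserName": "otheruser", "Source": "MEMBER01", "Success": False},
]
hits = failed_auth_attempts(sample, "testadmin", "MEMBER01")
print(len(hits))  # 2
```

Any hit for a local-only account like testadmin would show the member server reaching out to the domain on its own, independent of BladeLogic.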
We are going to give this a try and I will post the results.
Looks like that worked.
We are now seeing lockouts of the local BladelogicRSCD account on our primary appserver.
When does the lockout occur? Is a particular job running, or does it happen after a reboot?
It is not after a reboot.
I have isolated this to the job that performs our domain join. If there are any errors during the deploy (such as "account already exists"), the local BladelogicRSCD account on the fileserver becomes locked out. The only way to get it working again is to stop the RSCD service, delete the local BladelogicRSCD account, and start RSCD again on the fileserver.
I'm testing to see whether this occurs when there are no errors during the domain join.
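The manual workaround described above (stop the agent, delete the local account, start the agent) could be scripted roughly as follows. This is a sketch under assumptions: the service name RSCDsvc is a guess and should be confirmed on the server (for example with "sc query"), and it only uses the standard Windows "net stop", "net user ... /delete" and "net start" commands. By default it just prints the commands.

```python
import subprocess

# Hypothetical names: confirm the real RSCD service name before running for real.
RSCD_SERVICE = "RSCDsvc"
LOCAL_ACCOUNT = "BladelogicRSCD"


def recovery_commands(service: str = RSCD_SERVICE,
                      account: str = LOCAL_ACCOUNT) -> list:
    """Return the workaround from the thread as a list of Windows commands."""
    return [
        ["net", "stop", service],             # stop the RSCD agent
        ["net", "user", account, "/delete"],  # delete the locked-out local account
        ["net", "start", service],            # start the agent again
    ]


def run(commands: list, dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False (Windows only)."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


run(recovery_commands())  # dry run: only prints the three commands
```

With check=True, a failing step stops the sequence instead of starting the agent in an unknown state.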
How does this domain join job work?
We use the netdom command, which worked fine until we renamed the domain account.
I was able to verify that the account on the appserver/fileserver locks out only when the domain join fails with an error.
Any other errors in the package don't cause a lockout.
Is the file server also the appserver? And is this a BLPackage or an NSH script?
Yes, the file server is also the appserver. The package is a BLPackage.
Is the job that deploys this blpackage targeting the appserver/fileserver or the server you are adding to the domain ?
It’s targeting the server we are building, not the appserver or fileserver.
We can reproduce the problem by running the domain join job against a server that still has a computer account in the domain. When the job errors out, the BladelogicRSCD account on our appserver/fileserver (same box) becomes locked out. If we unlock it, the account locks again when BladeLogic attempts to run another job. We have to stop the agent on the appserver/fileserver, delete the account, and restart the agent; otherwise the appserver stops talking to the file store. Right now we are fine because we follow our process and delete the domain accounts ahead of rebuilds.
Can you check whether the RSCD service on the file server depends on the Server and NetLogon services?
If not, can you add the dependencies and restart the service? It's possible that the file server agent thinks it's running in the domain context, so when the domain join locks the domain BladeLogicRSCD account, the file server agent fails as well.
Otherwise, I'm not sure how a job running against one target would lock out the agent account on another box.
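Checking and adding these dependencies could be sketched with the standard "sc qc" and "sc config ... depend=" commands. This is a hedged sketch: the RSCD service name RSCDsvc and the sample "sc qc" output are assumptions for illustration. LanmanServer and Netlogon are the service names behind the "Server" and "NetLogon" display names. Note that "sc config ... depend=" replaces the whole dependency list, so the sketch keeps the existing entries and appends only what is missing.

```python
from typing import List, Optional

# Service names (not display names) for "Server" and "NetLogon".
REQUIRED_DEPENDENCIES = ["LanmanServer", "Netlogon"]


def parse_dependencies(sc_qc_output: str) -> List[str]:
    """Extract the DEPENDENCIES entries from `sc qc <service>` output."""
    deps: List[str] = []
    in_deps = False
    for line in sc_qc_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("DEPENDENCIES"):
            in_deps = True
            value = stripped.split(":", 1)[1].strip()
        elif in_deps and stripped.startswith(":"):
            # Further dependencies appear on continuation lines beginning with ":".
            value = stripped[1:].strip()
        else:
            in_deps = False
            continue
        if value:
            deps.append(value)
    return deps


def config_command(service: str, current: List[str],
                   required: List[str] = REQUIRED_DEPENDENCIES) -> Optional[List[str]]:
    """Build an `sc config` command adding missing dependencies, or None if none are missing."""
    existing = {d.lower() for d in current}
    missing = [d for d in required if d.lower() not in existing]
    if not missing:
        return None  # already depends on both services
    # depend= replaces the full list, so keep the current entries too.
    return ["sc", "config", service, "depend=", "/".join(current + missing)]


# Made-up `sc qc RSCDsvc` output for illustration only.
sample = """[SC] QueryServiceConfig SUCCESS

SERVICE_NAME: RSCDsvc
        TYPE               : 10  WIN32_OWN_PROCESS
        DEPENDENCIES       : Tcpip
                           : Netlogon
        SERVICE_START_NAME : LocalSystem
"""
deps = parse_dependencies(sample)
print(deps)  # ['Tcpip', 'Netlogon']
print(config_command("RSCDsvc", deps))
# ['sc', 'config', 'RSCDsvc', 'depend=', 'Tcpip/Netlogon/LanmanServer']
```

After applying the change, the agent service still needs a restart (for example "net stop" followed by "net start") for the new start order to matter.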