We have a "reboot servers" job made up of a package that runs a single echo command ("@echo Now rebooting..." or somesuch) and then reboots. The reboot on this job is controlled by flagging the package item as requiring a reboot, then setting the Reboot Options on the job properties to "Use item defined reboot setting".
Just wondering exactly how the console issues the reboot command to a given server. What we are finding is that in some cases the job log reports that the server failed to reboot, even though the server does in fact reboot. Here is a sample from the job log:
| serverName | Commit | 1 | 13-Dec 2009 9:55:31 AM | Info | REBOOTING SERVER serverName |
| serverName | Commit | 1 | 13-Dec 2009 9:55:32 AM | Warning | Reboot in progress |
| serverName | Commit | 1 | 13-Dec 2009 9:55:32 AM | Warning | Reboot found copy on reboot operations pending, checking if QChain tool initialization is required |
| serverName | Commit | 1 | 13-Dec 2009 9:55:33 AM | Info | Attempting shutdown REBOOT=true TIMEOUT=0 MSG=System is rebooting |
| serverName | Commit | 1 | 13-Dec 2009 9:55:34 AM | Info | Package Reboot-Servers@2009.03.28-220.127.116.114-0400-132387.2 about to reboot system |
| serverName | Commit | 1 | 13-Dec 2009 9:55:34 AM | Info | Package Reboot-Servers@2009.03.28-18.104.22.1684-0400-132387.2 attempting reboot |
| serverName | Commit | 1 | 13-Dec 2009 9:56:36 AM | Error | Failed to reboot for item EXTERNALCMD |
As you can see, it took about 65 seconds (9:55:31 to 9:56:36 AM) for the console to decide that the reboot had failed. Here is the event log entry for the same server from the time the job was running:
The process winlogon.exe has initiated the restart of computer serverName on behalf of user serverName\BladeLogicRSCD for the following reason: Legacy API shutdown
Reason Code: 0x80070000
Shutdown Type: restart
Comment: System is rebooting
... and the machine did reboot as expected.
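To double-check the timing, you can compute it straight from the job-log timestamps. This Python snippet is only a sketch. It assumes the rows are exported in the pipe-delimited form shown above (it accepts either `|` or `||` between cells), with the level in the second-to-last cell and the message in the last one. The helper names `parse_row` and `seconds_between` are made up for this example.

```python
import re
from datetime import datetime

# Matches the timestamp as pasted in the job log, e.g. "13-Dec||2009 9:55:31 AM".
# \W+ accepts either the "||" cell separator or a single space between date and year.
TIMESTAMP = re.compile(r"(\d{1,2})-([A-Za-z]{3})\W+(\d{4}) (\d{1,2}:\d{2}:\d{2} [AP]M)")


def parse_row(row):
    """Split one pipe-delimited job-log row into (timestamp, level, message)."""
    cells = [c.strip() for c in row.split("|") if c.strip()]
    day, month, year, clock = TIMESTAMP.search(row).groups()
    when = datetime.strptime(f"{day} {month} {year} {clock}", "%d %b %Y %I:%M:%S %p")
    return when, cells[-2], cells[-1]


def seconds_between(rows, start_text, end_text):
    """Seconds from the first message containing start_text to the first containing end_text."""
    parsed = [parse_row(r) for r in rows]
    start = next(ts for ts, _, msg in parsed if start_text in msg)
    end = next(ts for ts, _, msg in parsed if end_text in msg)
    return int((end - start).total_seconds())


# Three rows copied from the job log in this post.
rows = [
    "|serverName||Commit||1||13-Dec||2009 9:55:31 AM||Info||REBOOTING SERVER serverName|",
    "|serverName||Commit||1||13-Dec||2009 9:55:34 AM||Info||Package Reboot-Servers@2009.03.28-18.104.22.1684-0400-132387.2 attempting reboot|",
    "|serverName||Commit||1||13-Dec||2009 9:56:36 AM||Error||Failed to reboot for item EXTERNALCMD|",
]

print(seconds_between(rows, "REBOOTING SERVER", "Failed to reboot"))   # 65
print(seconds_between(rows, "attempting reboot", "Failed to reboot"))  # 62
```

The gap between the actual "attempting reboot" entry and the error is about 62 seconds. That looks more like a fixed wait expiring than an immediate error code. This is an inference from the timestamps, not something the log states.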
Just wondering if maybe whatever command the console is issuing has returned an error code that is being misinterpreted as a failure? In this case, the job shows a red "X" next to that server name, which is normally not something we want to see on a reboot.
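If the failure status really is spurious, one way to separate those cases from genuine problems is to check each failed server against its own restart event, like the one quoted above. This is a hypothetical triage sketch, not a BladeLogic feature. It assumes you have already collected the newest restart event's message text and timestamp from each server, for example with Windows' `wevtutil` tool (the collection step is not shown). The function names and the "ok" / "false-failure" / "investigate" labels are made up for this example.

```python
import re
from datetime import datetime


def parse_restart_event(text):
    """Extract the account, reason code, shutdown type and comment from a restart event message."""
    user = re.search(r"on behalf of user (\S+)", text)
    fields = {"user": user.group(1) if user else None}
    for key in ("Reason Code", "Shutdown Type", "Comment"):
        m = re.search(rf"^{key}: (.*)$", text, re.MULTILINE)
        fields[key] = m.group(1).strip() if m else None
    return fields


def classify(job_failed, event_text, event_time, reboot_attempt,
             agent_account="BladeLogicRSCD"):
    """Label one server's result as 'ok', 'false-failure' or 'investigate'.

    event_text / event_time describe the newest restart event found on the
    server (None if there was none). A restart older than the job's reboot
    attempt does not count, so an earlier unrelated reboot is not mistaken
    for this one.
    """
    if not job_failed:
        return "ok"
    if event_text is None or event_time is None or event_time < reboot_attempt:
        return "investigate"
    fields = parse_restart_event(event_text)
    user = fields["user"] or ""
    if user.endswith("\\" + agent_account) and fields["Shutdown Type"] == "restart":
        return "false-failure"
    return "investigate"


# Event text copied from this post; raw string keeps the backslash in the account name.
event = r"""The process winlogon.exe has initiated the restart of computer serverName on behalf of user serverName\BladeLogicRSCD for the following reason: Legacy API shutdown
Reason Code: 0x80070000
Shutdown Type: restart
Comment: System is rebooting"""

attempt = datetime(2009, 12, 13, 9, 55, 34)     # "attempting reboot" row in the job log
event_time = datetime(2009, 12, 13, 9, 55, 40)  # hypothetical: the post does not show the event's timestamp

print(classify(True, event, event_time, attempt))  # false-failure
```

Servers that come back as "investigate" are the ones worth looking at by hand. The red "X" on the others can then be treated as noise.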
Several of these make it difficult to track down the servers that legitimately have a problem with the reboot or do not come back online. Is there some other way we can do a "reboot only" job that gives more reliable results?
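To catch servers that really do not come back online, you can run a check that ignores the job status entirely and watches the agent port directly. It waits for the port to stop answering (the server went down), then waits for it to answer again (the server came back). This is a minimal sketch under stated assumptions, not a BladeLogic feature. Port 4750 is assumed to be the RSCD agent's default and may differ in your environment. The timeouts are arbitrary, and the helper names are hypothetical. Start it right after the job's reboot step, because a very fast reboot could slip between two polls.

```python
import socket
import time

# Assumption: the RSCD agent's usual default port; change it if your agents listen elsewhere.
RSCD_PORT = 4750


def is_reachable(host, port=RSCD_PORT, connect_timeout=5.0):
    """True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=connect_timeout):
            return True
    except OSError:  # refused, timed out, or name not resolvable
        return False


def wait_until(condition, timeout, interval):
    """Poll condition() every `interval` seconds; True as soon as it holds, False after `timeout`."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def rebooted_and_back(host, port=RSCD_PORT, down_timeout=300, up_timeout=900, interval=10):
    """Hypothetical check: the agent must first stop answering, then answer again."""
    went_down = wait_until(lambda: not is_reachable(host, port), down_timeout, interval)
    if not went_down:
        return "never went down"
    came_back = wait_until(lambda: is_reachable(host, port), up_timeout, interval)
    return "back online" if came_back else "did not come back"


# Example (hypothetical host name, not run here):
#     print(rebooted_and_back("serverName"))
```

Only "did not come back" (and possibly "never went down") would then need attention, whatever status the job shows.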
(In this case, several other servers did reboot and the job reported success, so this is not an "all the time" thing. It happens just here and there, which is frustrating to track down.)