14 Replies Latest reply on Nov 6, 2017 12:46 PM by David Poole

    How does the CM console reboot Windows servers?

      We have a "reboot servers" job that is made up of a package that just echoes a single command ("@echo Now rebooting..." or somesuch), then reboots. The reboot on this job is controlled by flagging the package item as requiring a reboot, then setting the Reboot Options on the job properties to "Use item defined reboot setting".

       

      Just wondering exactly how the console issues a reboot command to a given server; what we are finding is that in some cases, the job log will indicate that the server has failed to reboot, even though the server does in fact reboot. Here is a sample from the job log:

       

       

      serverName  Commit  1  13-Dec 2009 9:55:31 AM  Info     REBOOTING SERVER serverName
      serverName  Commit  1  13-Dec 2009 9:55:32 AM  Warning  Reboot in progress
      serverName  Commit  1  13-Dec 2009 9:55:32 AM  Warning  Reboot found copy on reboot operations pending, checking if QChain tool initialization is required
      serverName  Commit  1  13-Dec 2009 9:55:33 AM  Info     Attempting shutdown REBOOT=true TIMEOUT=0 MSG=System is rebooting
      serverName  Commit  1  13-Dec 2009 9:55:34 AM  Info     Package Reboot-Servers@2009.03.28-11.52.23.924-0400-132387.2 about to reboot system
      serverName  Commit  1  13-Dec 2009 9:55:34 AM  Info     Package Reboot-Servers@2009.03.28-11.52.23.924-0400-132387.2 attempting reboot
      serverName  Commit  1  13-Dec 2009 9:56:36 AM  Error    Failed to reboot for item EXTERNALCMD

       

      As you can see, it took about 65 seconds for the console to decide that the reboot had failed. Here is the eventlog entry for the same server at the time the job was running:

       

      The process winlogon.exe has initiated the restart of computer serverName on behalf of user serverName\BladeLogicRSCD for the following reason: Legacy API shutdown
      Reason Code: 0x80070000
      Shutdown Type: restart
      Comment: System is rebooting

       

      ... and the machine did reboot as expected.

       

      Just wondering if maybe whatever command the console is issuing has returned an error code that is being misinterpreted as a failure? In this case, the job throws a red "X" next to that server name, which is normally not something we want to see on a reboot.

       

      Several of these make it difficult to track down the servers that legitimately have a problem with the reboot or do not come back online. Is there some other way we can do a "reboot only" job that gives more reliable results?

       

      (In this case, several other servers did reboot and the job indicated a success -- so this is not an "all the time" thing. Just here and there, which is frustrating to track down.)

        • 1. Re: How does the CM console reboot Windows servers?

          Hi there,

           

          Technically, the console doesn't issue the reboot command. In a nutshell, the AppServer starts a local process on the target via the RSCD agent during the commit phase of a deploy job. That local process goes through the BLPackage item by item and carries out the appropriate action. So in this case it is that local process that requests a reboot via the Windows APIs. The local process waits for 60 seconds after requesting the reboot. If it is still alive after 60 seconds, it reports an error (which looks like the case you encountered). My guess is that the server takes longer than 60 seconds to shut down. To prove or disprove that theory, you could check the event log for the timestamp of the event log service's shutdown event and compare it with the timestamp of the "Attempting shutdown..." message in the deploy log. The difference should be greater than 60 seconds.
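For anyone wanting to do that timestamp comparison without counting by hand, here is a minimal Python sketch. It assumes the timestamp layout seen in the deploy log excerpt in the original post; `seconds_between` and `LOG_FORMAT` are names made up for this illustration, not part of any BladeLogic tooling. The example values are the "Attempting shutdown" and "Failed to reboot" times from that same excerpt; the event-log shutdown time would be passed in the same way once you have it.

```python
from datetime import datetime

# Timestamp layout used in the deploy log excerpt, e.g. "13-Dec 2009 9:55:33 AM".
# Assumes an English locale for the month abbreviation and AM/PM marker.
LOG_FORMAT = "%d-%b %Y %I:%M:%S %p"


def seconds_between(start: str, end: str, fmt: str = LOG_FORMAT) -> float:
    """Return the number of seconds from `start` to `end` (both strings in `fmt`)."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


# Timestamps copied from the deploy log in the original post.
attempt = "13-Dec 2009 9:55:33 AM"  # "Attempting shutdown ..."
failure = "13-Dec 2009 9:56:36 AM"  # "Failed to reboot for item EXTERNALCMD"

elapsed = seconds_between(attempt, failure)
print(f"{elapsed:.0f} seconds between the shutdown request and the reported failure")
```

A gap just over 60 seconds between those two deploy-log lines fits the 60-second wait described above; comparing against the event-log shutdown time is what would confirm or rule out the slow-shutdown theory.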

           

          Hope this helps.

          • 2. Re: How does the CM console reboot Windows servers?

            Excellent. Thanks much for the insight!

            • 3. Re: How does the CM console reboot Windows servers?
              Patrick O'Callaghan

              Hi,

               

              I might be facing the same problem and am investigating it now. I am going to run further tests, but so far it does not look like this timeout is the cause. I would still like to extend that timeout regardless.

              Is there a way to configure that shutdown timeout?  Physical servers often take longer than 60 seconds to complete the shutdown.

               

              This reboot error is killing our provisioning process. We need a number of reboots to install the post-configuration application stack and configurations, and once a reboot fails, the entire provisioning process fails because the jobs continue to execute and fail while the server is still rebooting.

               

              The Error I am seeing is:

              Error    Mar 5, 2015 12:30:46 PM    The job 'Restart' has failed  
              Info    Mar 5, 2015 12:29:42 PM    Executing work item Deploy Apply Job (Post-Execute):Restart; Server:TestServer;  on application server: App2  
              Info    Mar 5, 2015 12:27:27 PM    Executing work item Deploy Apply Job (Pre-Execute):Restart; Server:TestServer;  on application server: App2  
              Info    Mar 5, 2015 12:27:21 PM    Executing work item Deploy Staging Job:Restart; Server:TestServer;  on application server: App2  
              Info    Mar 5, 2015 12:27:18 PM    Started running the job 'Restart' with priority 'NORMAL' on application server 'App1'(1)  

               

              The App Servers are Windows 2008 R2; the TestServer is Windows 2012 R2 Datacenter.

              I was not seeing these issues with a Windows 2008 R2 TestServer; however, it didn't require as many reboots.

              BSA Version 8.3.02.332 and the agent 8.3.02.332

               

              Any help would be greatly appreciated. There does not seem to be much, if any, information in the logs; the reboot occurs successfully, but the job fails for no apparent reason almost immediately as the server is shutting down.

              • 4. Re: How does the CM console reboot Windows servers?
                Santosh Kothuru

                It's not possible to configure that shutdown timeout, but you can get the same effect by adding "sleep" commands to your job so that it waits.

                • 5. Re: How does the CM console reboot Windows servers?
                  richard mcleod

                  I recommend using this script: NSH Script: Reboot Server

                  It provides much better logging, and it's cross-platform.

                  • 6. Re: How does the CM console reboot Windows servers?
                    Bill Robinson

                    iirc the issue w/ the blpackage-initiated reboot is that the box reboots too quickly, and the appserver doesn't see that the box has restarted before it starts checking to see if it's down.  i thought there was a defect for this, but i did a quick search and didn't find it.  i'd open a ticket on it.

                    • 7. Re: How does the CM console reboot Windows servers?
                      Patrick O'Callaghan

                      Thanks Bill,

                       

                      Would you have any idea if this is still an issue in later versions, specifically 8.6?  We are still waiting on that defect fix for the 8.6 Windows patch catalog, but once that is released we are going to upgrade.

                      If this reboot issue is no longer a problem in 8.6 then we have no pressing need to get it fixed in 8.3.

                      • 8. Re: How does the CM console reboot Windows servers?
                        Patrick O'Callaghan

                        I was testing in our 8.6 environment, running a BLPackage with reboot after item deployment. It ran successfully 4 times, and on the 5th it failed with:

                         

                        Info    03/06/2015 12:02:33    The job 'Reboot->Reboot' has failed on server TestServer  
                        Error    03/06/2015 12:02:33    APPLY failed for server TestServer. Exit code = -5001  
                        Info    03/06/2015 12:00:25    Failure in event service code ( exitCode = -5001 )  
                        Info    03/06/2015 12:00:19    Package "Reboot" UUID(32ef1c305d633f10a81193e0cc4379b2) started  
                        Info    03/06/2015 12:00:18    Started running the deploy step job 'Reboot->Reboot' on application server 'AppServer1'(1) against target server 'TestServer'  
                        Info    03/06/2015 12:00:18    Deploy Apply Job (Pre-Execute):Reboot; Server:TestServer;  PkgID:"bc373407-a5b5-4b95-a467-b1ab95a9d463-7376.2"; UUID:32ef1c305d633f10a81193e0cc4379b2  

                         

                        This is the same error I was seeing in 8.3 so I am assuming the issue still remains.

                        Will open a support ticket.

                        • 9. Re: How does the CM console reboot Windows servers?
                          Steve Cupp

                          Patrick, if possible would you please update this thread with whatever comes from your ticket. We have been battling various reboot issues for some time now in our 8.5 environment and because of the seeming randomness of the failures, we have not opened a ticket on it.

                          • 10. Re: How does the CM console reboot Windows servers?
                            Sean Berry

                            Steve Cupp, you might try either the NSH Script above if you need/want a dedicated reboot process, or add a sleep command to the original reboot solution.  If you can get something reproducible, the resolution on it tends to go pretty quickly.

                            • 11. Re: How does the CM console reboot Windows servers?
                              Patrick O'Callaghan

                              Unfortunately, BMC was not able to reproduce the issue, and as it is due to the server restarting too fast for BladeLogic to even communicate with the RSCD agent before it shuts down, nothing is written to the logs.  The logs just cut off suddenly.

                               

                              BMC essentially said there is nothing that can be done to fix this, and to use workarounds such as a BLPackage out-of-band reboot with a reboot cmd in the package.

                               

                              They also said that 8.7 handles deployments differently, so this shouldn't be an issue in 8.7.

                              • 12. Re: How does the CM console reboot Windows servers?
                                Steve Cupp

                                For the "out of band" reboot option, how should the deploy job be set for the reboot?

                                Thank you Patrick, all good info to know!

                                • 13. Re: How does the CM console reboot Windows servers?
                                  Patrick O'Callaghan

                                  I am not sure what the best way would be but I was trying "cmd /c shutdown /r /t 5"
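For reference, in that command `/r` asks Windows for a restart rather than a power-off and `/t 5` delays it by 5 seconds, which presumably leaves the package process time to finish and report back before the OS goes down. The Python sketch below only builds the argument list so the delay and an optional comment (`/c`) can be varied; `restart_command` is a hypothetical helper for illustration, and how the deploy job's own reboot options should be set alongside it is not something this thread settles.

```python
from typing import List, Optional


def restart_command(delay_seconds: int = 5, comment: Optional[str] = None) -> List[str]:
    """Build the argument list for a delayed restart via Windows shutdown.exe.

    /r restarts instead of powering off, /t sets the delay in seconds,
    /c attaches a comment that is recorded with the shutdown event.
    """
    if delay_seconds < 0:
        raise ValueError("delay_seconds must not be negative")
    cmd = ["shutdown", "/r", "/t", str(delay_seconds)]
    if comment:
        cmd += ["/c", comment]
    return cmd


# Reproduces the command from the post above (without the "cmd /c" wrapper).
print(" ".join(restart_command()))
```

A longer delay trades a slower reboot for more headroom between the package finishing and the agent going away; the right value would need testing in your own environment.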

                                  • 14. Re: How does the CM console reboot Windows servers?
                                    David Poole

                                    Unfortunately, this behavior still exists in BladeLogic 8.9; you can refer to my new post here: https://communities.bmc.com/message/733974#733974

                                     

                                    This may be an idea for the BMC developers: could the deploy job check the server's uptime instead of using a 1-minute sleep check? You can check the uptime of the server, or read event IDs 6006 and 6005 (Event Log service stopped and started) to determine shutdown and boot status, then compare the uptime before with the uptime after. Those two values are very good inputs for the logic and let you be 100% sure whether the server was rebooted or not.
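A minimal Python sketch of that before/after comparison, to show the logic only. It is not how BladeLogic works internally; `boot_time`, `has_rebooted`, and the 30-second tolerance are assumptions made for this illustration, and collecting the uptime from the target (by whatever means you have) is left out. The sample times and uptimes are hypothetical.

```python
from datetime import datetime, timedelta


def boot_time(sampled_at: datetime, uptime_seconds: float) -> datetime:
    """Derive the last boot time from a sample time and the uptime reported at that moment."""
    return sampled_at - timedelta(seconds=uptime_seconds)


def has_rebooted(boot_before: datetime, boot_after: datetime,
                 tolerance: timedelta = timedelta(seconds=30)) -> bool:
    """Return True if the boot time moved forward by more than `tolerance`.

    The tolerance (an assumed value) absorbs small clock and sampling jitter,
    so an unchanged boot time is not mistaken for a reboot.
    """
    return boot_after - boot_before > tolerance


# Hypothetical samples: one day of uptime before the job, two minutes after it.
before = boot_time(datetime(2017, 11, 6, 12, 0, 0), uptime_seconds=86_400)
after = boot_time(datetime(2017, 11, 6, 12, 5, 0), uptime_seconds=120)

print(has_rebooted(before, after))
```

Because the check compares boot times rather than waiting to see the server go down, it does not matter how quickly the box restarts, which is the case that trips up the timed check discussed earlier in this thread.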