29 Replies Latest reply on Mar 2, 2017 12:18 PM by Bill Robinson

    RSCD Agent on Windows Patch Repository becomes unresponsive

    Steffen Kreis

      Every now and then we run into a situation where the RSCD Agent on our Windows Patch Repository becomes unresponsive and patching then fails.



      Scenario:
      We maintain an Agent-Copy based Windows Patch Catalog, where the Patch Payload is stored on a Windows Server running an RSCD 8.6.1 agent.
      When we deploy patches using that catalog, the payload is transferred from this Patch-Repository Server to the Target machines.
      During our Patching sessions on the weekends AND sometimes during Patch-catalog Updates, this Agent becomes unresponsive and the Patching, or the CUJ, fails.

      The error we then see in the Deploy job is similar to this:
      Error 01-Feb-2016 11:53:14 There was a problem resolving the soft links in the package: java.io.IOException: JNI file copy from '//PATCH_REPO_SERVER/D/Apps/BladeLogic/PatchStore/Patch-Catalogs/Microsoft-Office-Catalog/onenoteloc2010-kb3054978-fullfile-x64-glb.exe' to '//server123456/d/temp/stage/bb58c64a5bf835809e03679035b2d031/2685700.1/onenoteloc2010-kb3054978-fullfile-x64-glb.exe' failed: : Connection timed out


      The error when a CUJ fails is similar to this:
      Warning Wed, 09 Dec 2015 17:11:22 Payload does not exist for hotfix : Windows6.1-2008-R2-KB2699988-x64.msu-MS12-037-en-INTERNET EXPLORER 8 (X64)-GOLD, , Error: com.bmc.sa.patchfeed.FeedException: com.bmc.sa.patchfeed.windows.UpdatorFeedException: Error occur: Level: FAILURE
      Type: STORAGE_FILE_ERROR
      Message: The payload Windows6.1-KB2699988-x64.msu not found at file://PATCH_REPO_SERVER/D/Apps/BladeLogic/Patch-Management/Payload-Source/Windows-Server-global-download (Caused By: com.bmc.sa.patchfeed.windows.UpdatorFeedException: Error occur: Level: FAILURE
      Type: STORAGE_FILE_ERROR
      Message: The payload Windows6.1-KB2699988-x64.msu not found at file://PATCH_REPO_SERVER/D/Apps/BladeLogic/Patch-Management/Payload-Source/Windows-Server-global-download (Caused By: Error occur: Level: FAILURE
      Type: STORAGE_FILE_ERROR
      Message: The payload Windows6.1-KB2699988-x64.msu not found at file://PATCH_REPO_SERVER/D/Apps/BladeLogic/Patch-Management/Payload-Source/Windows-Server-global-download))


      This happens although the Payload is definitely in place.


      What we noticed is that when this happens, the RSCD.exe process on the Patch-Repo server has fairly large memory usage (sometimes more than 1 GB) and has hundreds of threads running, according to Process Explorer.
      Most of these threads are in the state Wait:WrResource.


      At the moment we restart the Agent whenever this happens, but as the issue keeps coming back, we need to solve it permanently.



      Has anybody seen something similar?


      Steffen


        • 1. Re: RSCD Agent on Windows Patch Repository becomes unresponsive
          Steffen Kreis

          I am wondering if it makes sense for us to not use another Windows box as the Patch-Repo, but instead place the agent-copy Patch-Payload on the FileServer.

           

          I know that the Patch-Payload is not handled by the FileServer, but it might be a good place to put the stuff on to.

           

          Where do other people store their agent-copy Patch-Payload ?

          • 2. Re: RSCD Agent on Windows Patch Repository becomes unresponsive
            Bill Robinson

            That could just result in the file server agent being overwhelmed… what OS is the file server?

            • 4. Re: RSCD Agent on Windows Patch Repository becomes unresponsive
              Steffen Kreis

              Yep, as Rohit said, Solaris 10 x86.

               

              Would you "generally" say a Solaris/Linux Agent is more robust under heavy load than a Windows Agent ?

              So in order to leave the FileServer alone to do what it's supposed to do, would you think a Patch-Repo hosted on another Linux box, for example, would make sense?

               

              I'm just asking, cause the constant trouble with a hanging agent during Patch-Peak-Times is a pain.

               

              Steffen

              • 5. Re: RSCD Agent on Windows Patch Repository becomes unresponsive
                Steffen Kreis

                This is getting more and more serious.

                 

                We just had the situation where the RSCD on the Patch-Repo simply died.

                 

                 

                It seems this has happened more than once in the past.

                Here are the relevant entries from the rscdsvc.log

                 

                02/01/16 14:12:01.826 INFO     rscdsvc - HandleAgentStart: Agent process started ("C:\Program Files\BMC Software\BladeLogic\RSCD\/RSCD.exe")

                02/01/16 14:13:53.462 ERROR    rscdsvc - ManageAgent: Agent down hard thread(18938528)

                02/01/16 14:15:30.248 INFO     rscdsvc - HandleAgentStart: Agent process started ("C:\Program Files\BMC Software\BladeLogic\RSCD\/RSCD.exe")

                02/03/16 06:05:41.468 ERROR    rscdsvc - ManageAgent: Agent down hard thread(19266208)

                02/03/16 06:05:41.468 ERROR    rscdsvc - ManageAgent: Agent being restarted - count=1

                02/03/16 06:05:43.480 INFO     rscdsvc - HandleAgentStart: Agent process started ("C:\Program Files\BMC Software\BladeLogic\RSCD\/RSCD.exe" -r -R)

                02/03/16 22:16:50.516 ERROR    rscdsvc - ManageAgent: Agent down hard thread(19266208)

                02/03/16 22:21:01.640 INFO     rscdsvc - HandleAgentStart: Agent process started ("C:\Program Files\BMC Software\BladeLogic\RSCD\/RSCD.exe")

                02/15/16 10:16:55.031 ERROR    rscdsvc - ManageAgent: Agent down hard thread(19724960)

                02/15/16 10:16:55.046 ERROR    rscdsvc - ManageAgent: Agent being restarted - count=1

                02/15/16 10:16:57.074 INFO     rscdsvc - HandleAgentStart: Agent process started ("C:\Program Files\BMC Software\BladeLogic\RSCD\/RSCD.exe" -r -R)

                02/15/16 10:17:03.096 ERROR    rscdsvc - ManageAgent: Agent down hard thread(19724960)

                02/15/16 10:17:03.096 ERROR    rscdsvc - ManageAgent: Agent being restarted - count=2

                02/15/16 10:17:05.109 INFO     rscdsvc - HandleAgentStart: Agent process started ("C:\Program Files\BMC Software\BladeLogic\RSCD\/RSCD.exe" -r -R)
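
The crash and restart pattern in entries like the ones above can be tallied per day. The following is a minimal sketch, assuming the rscdsvc.log line layout is exactly as shown in this post; the function name `summarize_rscdsvc_log` and the regular expression are illustrative and not part of any BMC tooling.

```python
import re
from collections import Counter

# Assumed line layout, taken from the excerpt in this thread:
# 02/15/16 10:16:55.031 ERROR    rscdsvc - ManageAgent: Agent down hard thread(19724960)
LINE_RE = re.compile(
    r"^\s*(?P<date>\d{2}/\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>\w+)\s+rscdsvc - (?P<func>\w+): (?P<msg>.*)$"
)


def summarize_rscdsvc_log(lines):
    """Return (crashes, starts): per-date counts of 'Agent down hard'
    events and 'Agent process started' events."""
    crashes = Counter()
    starts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip blank lines and anything in another layout
        msg = match.group("msg")
        if msg.startswith("Agent down hard"):
            crashes[match.group("date")] += 1
        elif msg.startswith("Agent process started"):
            starts[match.group("date")] += 1
    return crashes, starts


# Sample lines copied from the log excerpt above.
SAMPLE = r"""
02/15/16 10:16:55.031 ERROR    rscdsvc - ManageAgent: Agent down hard thread(19724960)
02/15/16 10:16:55.046 ERROR    rscdsvc - ManageAgent: Agent being restarted - count=1
02/15/16 10:16:57.074 INFO     rscdsvc - HandleAgentStart: Agent process started ("C:\Program Files\BMC Software\BladeLogic\RSCD\/RSCD.exe" -r -R)
02/15/16 10:17:03.096 ERROR    rscdsvc - ManageAgent: Agent down hard thread(19724960)
02/15/16 10:17:03.096 ERROR    rscdsvc - ManageAgent: Agent being restarted - count=2
02/15/16 10:17:05.109 INFO     rscdsvc - HandleAgentStart: Agent process started ("C:\Program Files\BMC Software\BladeLogic\RSCD\/RSCD.exe" -r -R)
"""

if __name__ == "__main__":
    crashes, starts = summarize_rscdsvc_log(SAMPLE.splitlines())
    for date in sorted(crashes):
        print(date, "crashes:", crashes[date], "starts:", starts[date])
```

Running this against the full log before and after a change (for example disabling FIPS) would give a rough per-day comparison of how often the agent went down.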

                 

                 

                The rscd.log contains no errors or hints at the time when the agent dies.

                 

                This happened during a Patching-Job on Monday against ~ 600 Windows Servers, where the parallelism on the Patch-Analysis-Job is set to 200.

                200 servers were actually patched fine, and all the rest aborted due to "connection refused" from the Patch-Repo at the moment the agent died.
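
To reason about how many targets pull payload from the repository at once, a target list can be split into smaller waves. This is only a rough sketch: `split_into_waves` is a hypothetical helper, the server names are made up, and the wave size of 50 is an arbitrary assumption, not a recommended or tested value for the RSCD agent.

```python
def split_into_waves(targets, wave_size):
    """Split a list of target names into consecutive waves of at most
    wave_size entries each."""
    if wave_size < 1:
        raise ValueError("wave_size must be at least 1")
    return [targets[i:i + wave_size] for i in range(0, len(targets), wave_size)]


if __name__ == "__main__":
    # Hypothetical target names standing in for ~600 Windows servers.
    targets = [f"server{n:06d}" for n in range(600)]

    # With a parallelism of 200, as in the job described above:
    print(len(split_into_waves(targets, 200)), "waves of up to 200 targets")

    # With a smaller, arbitrarily chosen wave size:
    print(len(split_into_waves(targets, 50)), "waves of up to 50 targets")
```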

                • 6. Re: RSCD Agent on Windows Patch Repository becomes unresponsive
                  Bill Robinson

                  can we look at a few things?

                   

                  -> dir listing of NSH/br/stdlib on all the appservers - should be the same, and attach one

                  -> dir listing of NSH/br/java/lib/ext on all appservers - should be the same and attach one

                  -> we could try disabling fips on the rscd - i think you can just rename the RSCD/share/openssl.cnf file and restart the agent.

                  • 7. Re: RSCD Agent on Windows Patch Repository becomes unresponsive
                    Steffen Kreis

                    We have disabled FIPS for now and will monitor how it goes,

                     

                    Pfa the requested file listings

                    • 8. Re: RSCD Agent on Windows Patch Repository becomes unresponsive
                      Jim Campbell

                      We have always put the patch payload on the fileserver.  We used to have the same problem you are seeing periodically but in our case the rscd service log on the fileserver/patch content repository would always indicate that the service had crashed and restarted itself when this occurred.  I never did figure out what was occurring but I began restarting the application server services and the fileserver agent prior to large patching windows and that seemed to reduce the likelihood of an issue.

                       

                      We have since moved to different hardware and the problem no longer occurs, although I think that may be because some kind of bandwidth throttling is artificially preventing us from crashing the agent.  In response we have started doing analysis and patch staging well in advance, so that only the actual patch application (the commit) starts during the maintenance window.

                      • 9. Re: RSCD Agent on Windows Patch Repository becomes unresponsive
                        Bill Robinson

                        i think the appserver libs look ok.  if the patch repo is still on the windows box, after disabling fips have you seen any issues during high request periods ?

                        • 10. Re: RSCD Agent on Windows Patch Repository becomes unresponsive
                          Steffen Kreis

                          Hi Bill,

                           

                          it looks indeed much better now with FIPS disabled.

                          The "memory-leaking" was down to a minimum on the RSCD process after the recent weekend, which was a patch-heavy weekend.

                           

                          I will monitor the performance closely during the upcoming one as well.

                           

                          One question though. What exactly does disabling FIPS do?

                          Does it just disable a certain encryption mode, or does it disable encryption altogether?

                           

                          We did disable it in the past on all machines, when the RSCD agent wasn't able to cope with activated ASLR on the Windows Servers, but it was never really clear what it means exactly.

                           

                          Steffen

                          • 11. Re: RSCD Agent on Windows Patch Repository becomes unresponsive
                            Bill Robinson

                            FIPS 140-2 - Wikipedia, the free encyclopedia

                             

                            when this is disabled we still use tls encryption but algorithms that are not compliant to the fips 140-2 standard.

                             

                            one more thing to check - you don't have the OPENSSL_CONF variable set anywhere on the windows system w/ the rscd ?
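
A quick way to check for the variable is sketched below. It is only an illustration: `find_openssl_conf` is a hypothetical helper, and it inspects only the environment of the process it runs in, which is not necessarily the environment the RSCD service itself sees.

```python
import os


def find_openssl_conf(env=None):
    """Return the value of OPENSSL_CONF if it is set and non-empty,
    otherwise None. Uses os.environ unless a mapping is passed in."""
    env = os.environ if env is None else env
    value = env.get("OPENSSL_CONF")
    return value or None


if __name__ == "__main__":
    value = find_openssl_conf()
    if value:
        print(f"OPENSSL_CONF is set to: {value}")
    else:
        print("OPENSSL_CONF is not set in this process environment")
```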

                            • 12. Re: RSCD Agent on Windows Patch Repository becomes unresponsive
                              Steffen Kreis

                              Hi,

                               

                              unfortunately the problem is coming back!

                              Since we deploy more and more Windows Server 2012 Builds and we only patch 2012 through the agent-copy method we are getting a higher load on the Patch-Repo server again.

                               

                              Whenever there is a higher number of parallel targets to stage to, the appservers see "Connection timed out" when trying to copy the Patch Payload from the Windows based Patch Repo to the targets.

                               

                              This happens when we process more than 300 targets, which from my point of view is still quite a small number.
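
To notice early when the repository agent stops accepting connections, a simple TCP probe can be run from an appserver during a patch window. This is a minimal sketch under assumptions: `agent_port_open` is a hypothetical helper, 4750 is assumed to be the RSCD port in use, and a successful TCP connect does not prove the agent is healthy, since a hung agent may still accept connections.

```python
import socket


def agent_port_open(host, port=4750, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within
    timeout seconds, False on refusal, timeout, or other socket errors."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # PATCH_REPO_SERVER is the placeholder host name used in this thread.
    host = "PATCH_REPO_SERVER"
    state = "reachable" if agent_port_open(host) else "NOT reachable"
    print(f"RSCD port on {host} is {state}")
```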

                               

                              FIPS is still disabled btw.

                               

                              We really need a more robust solution.

                               

                              We are currently looking at:

                               

                              - Placing the Patch Payload on the Solaris based FileServer

                              - Deployment of a new Linux based Patch-Repo Server

                              - NFS Share mapped to all of our 4 Linux based AppServers and setting the Patch-Payload URL to //localhost so that we share the load across 4 agents

                               

                               

                              Any thoughts on these options ?

                               

                              Steffen

                              • 13. Re: RSCD Agent on Windows Patch Repository becomes unresponsive
                                Jim Campbell

                                We still also have this problem and have just been working around it.

                                 

                                I think this may be exacerbated by the new Microsoft patching model as there are now bigger patches and in general more targets per deploy job.  It has been occurring more frequently ( for us at least ) the past 2 months.

                                • 14. Re: RSCD Agent on Windows Patch Repository becomes unresponsive
                                  Steffen Kreis

                                  Oh okay !

                                   

                                  What architecture are you running at the moment with regards to the Patch-Repo?

                                   

                                  Steffen
