14 Replies Latest reply on Jun 3, 2014 6:41 PM by Bill Robinson

    agent enrollment failed

    Raja Mohan

      During RHEL provisioning I notice that the provisioning job fails with

      "Agent enrollment failed. Any post provisioning jobs scheduled to run with this provisioning job might fail."

       

      I am able to select "force post-install batch job" and proceed with batch processing.

       

      I added a wait-for-agent step of a few minutes as the first job in the post-install batch, before executing autoinst phase 1, 2 and 3. Though the job says it failed, I can see the server in the BSA console with the penguin icon indicating success. I also notice "noUserWasDefined" in the console log on the appserver. The Java stack trace is below. I have added the user to users and users.local in the kickstart. Any insight?

       

      [02 Jun 2014 16:49:11,071] [WorkItem-Thread-36] [INFO] [bxxx@xxx.com:ENGINEER:] [Provisioning] OS Provisioning of device '00-50-56-8C-70-85' completed.

      [02 Jun 2014 16:52:20,409] [WorkItem-Thread-36] [ERROR] [bxxx@xxx.com:ENGINEER:] [Provisioning] Error in Socket.connect(target=bsatest2_rhel6 ; portNum=4750 ; localResolve=true ; addr=bsatest2_rhel6/10.80.105.55:4750 ; noUserWasDefined)

      [02 Jun 2014 16:52:20,409] [WorkItem-Thread-36] [ERROR] [bxxx@xxx.com:ENGINEER:] [Provisioning] java.net.ConnectException: Connection timed out

      [02 Jun 2014 16:52:20,409] [WorkItem-Thread-36] [ERROR] [bxxx@xxx.com:ENGINEER:] [Provisioning] Connection timed out

      java.net.ConnectException: Connection timed out

              at java.net.PlainSocketImpl.socketConnect(Native Method)

              at java.net.PlainSocketImpl.doConnect(Unknown Source)

              at java.net.PlainSocketImpl.connectToAddress(Unknown Source)

              at java.net.PlainSocketImpl.connect(Unknown Source)

              at java.net.Socket.connect(Unknown Source)

              at java.net.Socket.connect(Unknown Source)

              at com.bladelogic.om.infra.app.service.agentservice.XmlRpcBLHttpTransport.newSocket(XmlRpcBLHttpTransport.java:236)

              at com.bladelogic.om.infra.app.service.agentservice.XmlRpcBLHttpTransport.connect(XmlRpcBLHttpTransport.java:92)

              at com.bladelogic.om.infra.app.service.agentservice.AgentConnectionImpl.<init>(AgentConnectionImpl.java:129)

              at com.bladelogic.om.infra.app.service.agentservice.AgentConnectionPool.createConnection(AgentConnectionPool.java:109)

              at com.bladelogic.om.infra.app.service.agentservice.AgentConnectionPool.getConnection(AgentConnectionPool.java:162)

              at com.bladelogic.om.infra.app.service.agentservice.AgentConnectionServiceImpl.getConnection(AgentConnectionServiceImpl.java:40)

              at com.bladelogic.om.infra.app.service.agentservice.AgentMethodInvocationProvider.executeRequest(AgentMethodInvocationProvider.java:72)

              at com.bladelogic.om.infra.app.service.agentservice.AgentMethodInvocationProvider.invoke(AgentMethodInvocationProvider.java:50)

              at com.bladelogic.om.infra.app.service.routing.RoutingServiceImpl.invokeMethodLocally(RoutingServiceImpl.java:317)

              at com.bladelogic.om.infra.app.service.routing.RoutingServiceImpl.invoke(RoutingServiceImpl.java:244)

              at com.bladelogic.om.infra.app.service.agentservice.AgentRequestManager.invoke(AgentRequestManager.java:156)

              at $Proxy34.getAsset(Unknown Source)

              at com.bladelogic.om.infra.daal.DAALService.getAsset(DAALService.java:164)

              at com.bladelogic.om.infra.model.server.ServerImpl.updatePropertyValuesFromAgent(ServerImpl.java:1428)

              at com.bladelogic.om.infra.model.server.ServerImpl.updatePropertyValuesFromAgent(ServerImpl.java:1399)

              at com.bladelogic.om.infra.model.server.ServerImpl.updatePropertyValuesFromAgent(ServerImpl.java:1389)

              at com.bladelogic.om.provisioning.model.job.provision.ProvisionJobExecutor$PollForProvisionedWorkItem.pollForAgentUpdate(ProvisionJobExecutor.java:1405)

              at com.bladelogic.om.provisioning.model.job.provision.ProvisionJobExecutor$PollForProvisionedWorkItem.execute(ProvisionJobExecutor.java:1263)

              at com.bladelogic.om.infra.app.service.workitem.WorkItem.doExecute(WorkItem.java:114)

              at com.bladelogic.om.infra.app.service.workitem.thread.WorkItemThread.execute(WorkItemThread.java:176)

              at com.bladelogic.om.infra.app.service.workitem.thread.WorkItemThread.execute(WorkItemThread.java:51)

              at com.bladelogic.om.infra.app.service.thread.BlBlockingThread.run(BlBlockingThread.java:95)

      [02 Jun 2014 16:52:20,410] [WorkItem-Thread-36] [ERROR] [bxxx@xxx.com:ENGINEER:] [Provisioning] Failed to connect to bsatest2_rhel6:4750

      org.apache.xmlrpc.XmlRpcException: Failed to connect to bsatest2_rhel6:4750

              at com.bladelogic.om.infra.app.service.agentservice.XmlRpcBLHttpTransport.connect(XmlRpcBLHttpTransport.java:96)

              at com.bladelogic.om.infra.app.service.agentservice.AgentConnectionImpl.<init>(AgentConnectionImpl.java:129)

              at com.bladelogic.om.infra.app.service.agentservice.AgentConnectionPool.createConnection(AgentConnectionPool.java:109)

              at com.bladelogic.om.infra.app.service.agentservice.AgentConnectionPool.getConnection(AgentConnectionPool.java:162)

              at com.bladelogic.om.infra.app.service.agentservice.AgentConnectionServiceImpl.getConnection(AgentConnectionServiceImpl.java:40)

              at com.bladelogic.om.infra.app.service.agentservice.AgentMethodInvocationProvider.executeRequest(AgentMethodInvocationProvider.java:72)

              at com.bladelogic.om.infra.app.service.agentservice.AgentMethodInvocationProvider.invoke(AgentMethodInvocationProvider.java:50)

              at com.bladelogic.om.infra.app.service.routing.RoutingServiceImpl.invokeMethodLocally(RoutingServiceImpl.java:317)

              at com.bladelogic.om.infra.app.service.routing.RoutingServiceImpl.invoke(RoutingServiceImpl.java:244)

              at com.bladelogic.om.infra.app.service.agentservice.AgentRequestManager.invoke(AgentRequestManager.java:156)

              at $Proxy34.getAsset(Unknown Source)

              at com.bladelogic.om.infra.daal.DAALService.getAsset(DAALService.java:164)

              at com.bladelogic.om.infra.model.server.ServerImpl.updatePropertyValuesFromAgent(ServerImpl.java:1428)

              at com.bladelogic.om.infra.model.server.ServerImpl.updatePropertyValuesFromAgent(ServerImpl.java:1399)

              at com.bladelogic.om.infra.model.server.ServerImpl.updatePropertyValuesFromAgent(ServerImpl.java:1389)

              at com.bladelogic.om.provisioning.model.job.provision.ProvisionJobExecutor$PollForProvisionedWorkItem.pollForAgentUpdate(ProvisionJobExecutor.java:1405)

              at com.bladelogic.om.provisioning.model.job.provision.ProvisionJobExecutor$PollForProvisionedWorkItem.execute(ProvisionJobExecutor.java:1263)

              at com.bladelogic.om.infra.app.service.workitem.WorkItem.doExecute(WorkItem.java:114)

              at com.bladelogic.om.infra.app.service.workitem.thread.WorkItemThread.execute(WorkItemThread.java:176)

              at com.bladelogic.om.infra.app.service.workitem.thread.WorkItemThread.execute(WorkItemThread.java:51)

              at com.bladelogic.om.infra.app.service.thread.BlBlockingThread.run(BlBlockingThread.java:95)

      [02 Jun 2014 16:52:20,416] [WorkItem-Thread-36] [INFO] [bxxx@xxx.com:ENGINEER:] [Provisioning] Update Property Values from Agent on enrollment of server 'bsatest2_rhel6' failed, Failed to connect to bsatest2_rhel6:4750

      [02 Jun 2014 16:52:20,416] [WorkItem-Thread-36] [INFO] [bxxx@xxx.com:ENGINEER:] [Provisioning] Retry Update Property Values from Agent on enrollment of server 'bsatest2_rhel6'

       

      My wait-for-agent script is below:

       

      HOST=$1

      echo "checking for server"
      blcli_execute Server serverExists "$HOST"
      blcli_storeenv var_serverExists
      wait_count=1

      #
      # Wait up to 5 minutes (5 retries, 60 seconds apart) for the server to be registered
      #
      echo "$HOST"
      while [ "${var_serverExists}" != "true" ] && [ "${wait_count}" -lt 6 ] ; do
        echo "Agent is not enrolled yet. retry in 60 seconds"
        sleep 60
        # Increment wait count
        ((wait_count++))
        echo "$wait_count"
        blcli_execute Server serverExists "$HOST"
        blcli_storeenv var_serverExists
      done
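For anyone reusing the wait loop above: a minimal sketch of the same retry pattern as a generic helper that returns a non-zero status on timeout, so the calling job can tell "enrolled" from "gave up". The helper name `wait_until` is made up for illustration and is not part of BSA or NSH. It is written for a plain POSIX-style shell and has not been tried inside an NSH script job.

```shell
# wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS tries have been made.
# Returns 0 as soon as COMMAND succeeds, 1 if every attempt failed.
wait_until() {
  local attempts delay i
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

Usage would look like `wait_until 5 60 some_check "$HOST" || exit 1`, where `some_check` is a hypothetical function that succeeds once the server is registered.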

        • 1. Re: agent enrollment failed
          Bill Robinson

          What’s in the rscd.log(s) on the new target ?

          • 2. Re: agent enrollment failed
            Raja Mohan

            The agent is alive and listening on the newly provisioned server

            tcp        0      0 0.0.0.0:4750                0.0.0.0:*                   LISTEN      1626/bin/rscd

            • 3. Re: agent enrollment failed
              Raja Mohan

              Bill Robinson, thanks for your quick response. I do appreciate your help. I don't notice anything unusual in the rscd log:

               

              06/02/14 16:48:56.715 INFO     rscd -  bsatest2_rhel6 6635 -1/-1 (Not_available): (Not_available): FIPS already enabled

              06/02/14 16:48:56.715 INFO     rscd -  bsatest2_rhel6 6635 -1/-1 (Not_available): (Not_available): Agent version is 8.3.02.332

              06/02/14 16:48:56.715 INFO     rscd -  bsatest2_rhel6 6635 -1/-1 (Not_available): (Not_available): Platform Details: x86_64;bsatest2_rhel6;2.6.32-358.el6.x86_64;Linux;#1 SMP Tue Jan 29 11:47:41 EST 2013;x86_64

              06/02/14 16:49:11.138 INFO1    rscd -  10.65.95.85 6702 0/0 (ENGINEER:bxxx@xxx.com): CM: > [Provisioning] Retrieving property values

              06/02/14 16:52:50.454 INFO1    rscd -  10.65.95.85 6702 0/0 (ENGINEER:bxxx@xxx.com): CM: > [Provisioning] Retrieving property values

              06/02/14 16:53:04.359 INFO1    rscd -  10.65.95.85 6749 0/0 (ENGINEER:bxxx@xxx.com): CM: > [Deploy] Job 'Deploy_RHEL6_autoinst_phase1_install' is executing a dry run

              06/02/14 16:53:24.415 INFO1    rscd -  10.65.95.85 6753 0/0 (ENGINEER:bxxx@xxx.com): CM: > [Deploy] Deleting //bsatest2_rhel6/opt/bsa/bladelogic/NSH/Transactions/log/tmp/bldeploy-8d2290ed5b4e34ff8d3e942e2b69c1fd.log

               

              This is the log from the provisioning job:

              Error  Jun 2, 2014 4:52:51 PM  Agent enrollment failed. Any post provisioning jobs scheduled to run with this provisioning job might fail.
              Info   Jun 2, 2014 4:49:11 PM  OS Provisioning of device '00-50-56-8C-70-85' completed.
              Info   Jun 2, 2014 4:43:40 PM  ++ cat /proc/meminfo

              ++ awk '{print $2}'

              ++ grep MemTotal:

              + RAM=8187728

              + '[' 8187728 -lt 2100000 ']'

              + '[' 8187728 -ge 2100000 -a 8187728 -lt 4300000 ']'

              + '[' 8187728 -ge 4300000 -a 8187728 -lt 6300000 ']'

              + '[' 8187728 -ge 6300000 ']'

              + SWAP=8192

              + echo 'RAM size is: 8187728, SWAP size is: 8192'

              + echo 'part swap --size=8192'

               

               

              Info   Jun 2, 2014 4:43:36 PM  + echo 'Switch Boot Image Done'

              Switch Boot Image Done

               

               

              Info   Jun 2, 2014 4:41:59 PM  pxe image file: gentoo64/gentoord.gz
              Info   Jun 2, 2014 4:41:59 PM  Running provisioning job with data store: Linux_Datastore
              • 4. Re: agent enrollment failed
                Bill Robinson

                in the agent log we see:

                06/02/14 16:49:11.138 INFO1    rscd -  10.65.95.85 6702 0/0 (ENGINEER:bxxx@xxx.com): CM: > [Provisioning] Retrieving property values

                 

                06/02/14 16:52:50.454 INFO1    rscd -  10.65.95.85 6702 0/0 (ENGINEER:bxxx@xxx.com): CM: > [Provisioning] Retrieving property values

                 

                so it looks like someone in the ENGINEER role from 10.65.95.85 (the appserver?) connected to the target, mapped to root and tried to pull the agent properties (which would be part of the post-install job) - i'm assuming the one at 16:49 is from the provision job ? 

                 

                in the provision job you see a timeout.  so it looks like there is some connection issue during the 1st connection attempt.  is there anything that might prevent the appserver from making the initial connection ?

                 

                is 10.80.105.55 the ip of the target ?
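To answer that kind of question quickly, the address the appserver actually tried can be pulled straight out of the `Error in Socket.connect(...)` lines shown earlier in the thread. A minimal sketch, assuming the log line format is exactly as pasted above. The function name `failed_connect_addrs` and the file name `appserver.log` are placeholders, not BSA names.

```shell
# Reads appserver log text on stdin and prints "hostname ip" for every
# failed agent connection, taken from the addr=host/ip:port field.
failed_connect_addrs() {
  sed -n 's|.*Error in Socket\.connect(.*addr=\([^/ ]*\)/\([0-9.]*\):[0-9]*.*|\1 \2|p'
}

# Example (appserver.log is a placeholder path):
#   failed_connect_addrs < appserver.log | sort -u
```

Comparing the printed IP with what the target reports for itself shows whether the appserver is connecting to a stale or wrong address.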

                • 5. Re: agent enrollment failed
                  Barry McQuillan

                  Hi,

                   

                  What version of RSCD agent are you installing?

                  Also is iptables enabled on the newly provisioned server?

                  • 6. Re: agent enrollment failed
                    Raja Mohan

                    Bill Robinson, I don't see any issues with the appserver being able to reach the guest. The IP address of the guest is 10.65.105.55. I think you have answered my question: I need to find out where it is getting the wrong IP. 10.80.105.55 is wrong.

                     

                    Barry McQuillan the agent is 8.3.2.332 and there is no iptables on the server.

                    • 7. Re: agent enrollment failed
                      Raja Mohan

                      Do you know if it is possible that BSA remembers the IP from previous failures? I had the wrong IP once. I fixed that and the build is functional. The DNS entries also have the correct information.

                       

                      # nslookup

                      > set type=PTR

                      > 10.65.105.55

                      Server:         10.65.94.16

                      Address:        10.65.94.16#53

                       

                       

                      55.105.65.10.in-addr.arpa       name = bsatest2_rhel6.xxx.com.

                      > 10.80.105.55

                      Server:         10.65.94.16

                      Address:        10.65.94.16#53

                       

                       

                      ** server can't find 55.105.80.10.in-addr.arpa.: NXDOMAIN

                      • 8. Re: agent enrollment failed
                        Raja Mohan

                        [02 Jun 2014 16:49:03,273] [SSL-Connections-Thread-3] [INFO] [Anonymous:Anonymous:10.65.105.55] [Client] The Device [00-50-56-8C-70-85] Architecture [x64] Current State [Provisioned] IP Address [10.65.105.55]

                        [02 Jun 2014 16:49:03,342] [SSL-Connections-Thread-5] [INFO] [Anonymous:Anonymous:10.65.105.55] [Client] Connection disconnecting: id = 2009

                        [02 Jun 2014 16:49:05,192] [Scheduled-System-Tasks-Thread-6] [INFO] [System:System:] [Memory Monitor] Total JVM (B): 1398996992,Free JVM (B): 1030913520,Used JVM (B): 368083472,VSize (B): 5811531776,RSS (B): 1867915264,Used File Descriptors: 282

                        [02 Jun 2014 16:49:11,071] [WorkItem-Thread-36] [INFO] [bxxx@xxx.com:ENGINEER:] [Provisioning] OS Provisioning of device '00-50-56-8C-70-85' completed.

                         

                        When the client registers, it does have the right IP address.

                        • 9. Re: agent enrollment failed
                          Barry McQuillan

                          Yes this can occur.

                           

                          What OS is your appserver?

                           

                          On *nix:

                          You could run "/etc/rc.d/init.d/nscd restart" on the app server at some point to flush the DNS cache

                           

                          Also the appserver JVM caches DNS entries.

                          To clear the cache you will need to restart the application server.

                           

                          The DNS cache refresh can be configured if required; refer to Knowledge Article https://kb.bmc.com/infocenter/index?page=content&id=KA304875&actp=search&viewlocale=en_US&searchid=1379513669358

                           

                          The cache TTL can normally be set to 0 (no caching).

                          I could imagine if you had thousands of servers and constant activity against them, that you could potentially run into a situation where you create a DNS storm, but in most of my environments, (with WITS from 50-100), the chances of that are next to 0 and the performance impact is negligible.
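A rough sketch of setting the JVM cache TTL by editing the `networkaddress.cache.ttl` property in a `java.security` file. This is an assumption about the mechanism. The KB article may prescribe a different one, and the location of the `java.security` file used by the appserver JVM is not given in this thread, so the path is passed in as an argument. The function name `set_cache_ttl` is made up. Back up the file first. The appserver still has to be restarted for the change to take effect.

```shell
# set_cache_ttl FILE SECONDS
# Removes any existing networkaddress.cache.ttl line (commented out or
# active) from FILE and appends an active one with the given value.
set_cache_ttl() {
  local file ttl tmp
  file=$1
  ttl=$2
  [ -f "$file" ] || return 1
  tmp=$(mktemp) || return 1
  # grep -v exits 1 when it prints nothing; that is not an error here.
  grep -v '^[[:space:]]*#\{0,1\}[[:space:]]*networkaddress\.cache\.ttl[[:space:]]*=' "$file" > "$tmp" || true
  printf 'networkaddress.cache.ttl=%s\n' "$ttl" >> "$tmp"
  # Overwrite the contents in place so ownership and permissions are kept.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

Related settings such as `networkaddress.cache.negative.ttl` are left untouched because the pattern only matches the exact property name.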

                          • 10. Re: agent enrollment failed
                            Raja Mohan

                            Barry McQuillan The app server is running on a RHEL based Linux server. I will flush the dns cache as you suggested and will retry.

                            • 11. Re: agent enrollment failed
                              Raja Mohan

                              Barry McQuillan, I hope you meant rscd (I don't have an nscd process). I restarted the application server and the rscd agent, followed by a rebuild of the server. The agent enrollment succeeded this time. Thank you both for your help. I noticed the security default is set not to expire the cache; I guess I will have to go fix that now.

                              • 12. Re: agent enrollment failed
                                Spike White
                                Barry,

                                 

                                Does the app server DNS cache not respect the TTL of the DNS record?

                                 

                                We're caching DNS records in our Redhat Linux servers as well.  We set the cache timeout to exactly coincide with the DNS TTL values of our domain. Which is 5 minutes.  (One DNS query every 5 mins is not egregious, for the convenience of updates appearing in short order.)

                                 

                                So while I didn't remember whether nscd respects DNS TTL records or not, I googled.  Except for odd-ball corner cases (TTL = 0), nscd has been respecting DNS TTL values since 2007.

                                 

                                I know Windows used to never respect DNS TTLs.  But recent versions do.

                                 

                                Do the app server JVMs not respect DNS TTLs?

                                 

                                Spike

                                • 13. Re: agent enrollment failed
                                  Raja Mohan

                                  Hi Spike White, by default it looks like the JVM is set for infinite caching, at least in my environment (which would mean the cache is cleared only when you recycle the app servers).

                                   

                                  #networkaddress.cache.ttl=-1

                                  Though it is commented out, the Java default is the same (cache forever) when a security manager is installed.

                                   

                                  It is documented in the KB article that Barry provided, and support also pointed me to it earlier this morning.
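To check what a given `java.security` file actually sets, here is a small sketch that ignores commented-out lines like the one above. The function name `cache_ttl_setting` is hypothetical, and the path to the file used by the appserver JVM has to be supplied by the reader since it is not stated in this thread.

```shell
# cache_ttl_setting FILE
# Prints the active networkaddress.cache.ttl value from FILE, or "unset"
# when the property is missing or only present as a comment.
cache_ttl_setting() {
  local value
  value=$(sed -n 's/^[[:space:]]*networkaddress\.cache\.ttl[[:space:]]*=[[:space:]]*\([^[:space:]]*\).*/\1/p' "$1" | tail -n 1)
  if [ -n "$value" ]; then
    printf '%s\n' "$value"
  else
    printf 'unset\n'
  fi
}
```

A result of `unset` means the JVM falls back to its built-in default, which per the comments in `java.security` is to cache forever when a security manager is installed. A result of `-1` also means cache forever, and `0` means no caching.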

                                  • 14. Re: agent enrollment failed
                                    Bill Robinson

                                    nscd is the name server caching daemon.  it's a dns cache.