2 Replies Latest reply on Oct 6, 2017 9:22 AM by Alexander Alexander

    Dev Studio stuck at "Logging In" when both CDP & HACDP services are up

    Alexander Alexander

      Hi experts,

       

      Sometimes our Dev Studio gets stuck at "Logging In" when both the HACDP and CDP services are up.

      If either one is down, we can connect to the remaining peer (CDP or HACDP) without any problem.

       

      We set the Dev Studio grid.log level to DEBUG and keep getting entries like these:

       

      06 Oct 2017 17:09:15,024 DEBUG HANode          : Initiating shared resource maintenance for topology "JMS_DISCOVERY"...

      06 Oct 2017 17:09:15,024 DEBUG HANode          : Shared resource maintenance for topology "JMS_DISCOVERY" completed successfully.

      06 Oct 2017 17:09:15,258 DEBUG HANode          : Initiating shared resource maintenance for topology "GRID_MANAGER"...

      06 Oct 2017 17:09:15,258 DEBUG HANode          : Shared resource maintenance for topology "GRID_MANAGER" completed successfully.

      06 Oct 2017 17:09:15,274 DEBUG TopologyParticipantProxy : Pipe health test failed for management.urn:jxta:uuid-569AF6FDCFC745278583C77F825D458602.JMS_DISCOVERY.CDP.urn:jxta:uuid-59616261646162614E50472050325033B7BFE43543F24D539D47CA133211BB0103.SERVER 1 time.

      06 Oct 2017 17:09:16,508 DEBUG ManagedThreadPoolExecutor : [ "Grid Framework - Inbound Message Processor" ] : queue size = 0

      06 Oct 2017 17:09:16,508 DEBUG TopologyParticipantProxiesManager : getTopologyParticipantProxies() [key=HACDP][participants=[[TopologyParticipantProxy remotePeerName=HACDP]]][key=CDP][participants=[[TopologyParticipantProxy remotePeerName=CDP]]]

      06 Oct 2017 17:09:16,508 DEBUG TopologyParticipantProxiesManager : getTopologyParticipantProxies() TPList=[[TopologyParticipantProxy remotePeerName=HACDP], [TopologyParticipantProxy remotePeerName=CDP]]

      06 Oct 2017 17:09:16,508 DEBUG TopologyParticipantProxiesManager : getTopologyParticipantProxies() [key=HACDP][participants=[[TopologyParticipantProxy remotePeerName=HACDP]]][key=CDP][participants=[[TopologyParticipantProxy remotePeerName=CDP]]]

      06 Oct 2017 17:09:16,508 DEBUG TopologyParticipantProxiesManager : getTopologyParticipantProxies() TPList=[[TopologyParticipantProxy remotePeerName=HACDP], [TopologyParticipantProxy remotePeerName=CDP]]

      06 Oct 2017 17:09:41,586 DEBUG TopologyParticipant : Calling: fail(com.realops.foundation.gridframework.TimeoutException: Message[summary=Timeout occurred., detail=A request was sent to a remote peer and a response was not received prior to the specified timeout.])

       

      06 Oct 2017 17:09:45,024 DEBUG HANode          : Initiating shared resource maintenance for topology "JMS_DISCOVERY"...

      06 Oct 2017 17:09:45,024 DEBUG HANode          : Shared resource maintenance for topology "JMS_DISCOVERY" completed successfully.

      06 Oct 2017 17:09:45,258 DEBUG HANode          : Initiating shared resource maintenance for topology "GRID_MANAGER"...

      06 Oct 2017 17:09:45,258 DEBUG HANode          : Shared resource maintenance for topology "GRID_MANAGER" completed successfully.

      06 Oct 2017 17:09:45,274 DEBUG TopologyParticipantProxiesManager : getTopologyParticipantProxies() [key=HACDP][participants=[[TopologyParticipantProxy remotePeerName=HACDP]]][key=CDP][participants=[[TopologyParticipantProxy remotePeerName=CDP]]]

      06 Oct 2017 17:09:45,274 DEBUG TopologyParticipantProxiesManager : getTopologyParticipantProxies() TPList=[[TopologyParticipantProxy remotePeerName=HACDP], [TopologyParticipantProxy remotePeerName=CDP]]

      06 Oct 2017 17:09:45,274 DEBUG ManagedThreadPoolExecutor : [ "Grid Framework - Inbound Message Processor" ] : queue size = 0

      06 Oct 2017 17:09:45,524 DEBUG TopologyParticipantProxiesManager : getTopologyParticipantProxies()

      06 Oct 2017 17:09:45,524 DEBUG TopologyParticipantProxiesManager : getTopologyParticipantProxies() TPList=[]

      06 Oct 2017 17:09:46,649 DEBUG HANode          : [Thread 43] [HANode on [Peer name=aolappapp1 id=[JmsPeerId urn:jxta:uuid-59616261646162614E50472050325033D31095A4B3624C398F5DA1498C64198C03]], isMasterCandidate=false] sent master message (action=register payload=<jms-pipe-ad><pipe-name>management.urn:jxta:uuid-569AF6FDCFC745278583C77F825D458602.JMS_DISCOVERY.aolappapp1.urn:jxta:uuid-59616261646162614E50472050325033D31095A4B3624C398F5DA1498C64198C03.CLIENT</pipe-name><broker-uri>failover:(ssl://10.53.71.104:61721?socket.enabledProtocols=TLSv1.2&amp;connectionTimeout=1000&amp;socket.enabledCipherSuites=TLS_RSA_WITH_AES_256_CBC_SHA,ssl://10.53.71.103:61719?socket.enabledProtocols=TLSv1.2&amp;connectionTimeout=1000&amp;socket.enabledCipherSuites=TLS_RSA_WITH_AES_256_CBC_SHA)?randomize=false&amp;maxReconnectAttempts=1</broker-uri></jms-pipe-ad>) when master not known. Queueing.

      06 Oct 2017 17:09:46,821 DEBUG HANode          : [Thread 43] [HANode on [Peer name=aolappapp1 id=[JmsPeerId urn:jxta:uuid-59616261646162614E50472050325033D31095A4B3624C398F5DA1498C64198C03]], isMasterCandidate=false] sent master message (action=register payload=<jms-pipe-ad><pipe-name>management.urn:jxta:uuid-569AF6FDCFC745278583C77F825D458602.GRID_MANAGER.aolappapp1.urn:jxta:uuid-59616261646162614E50472050325033D31095A4B3624C398F5DA1498C64198C03.ADP</pipe-name><broker-uri>failover:(ssl://10.53.71.104:61721?socket.enabledProtocols=TLSv1.2&amp;connectionTimeout=1000&amp;socket.enabledCipherSuites=TLS_RSA_WITH_AES_256_CBC_SHA,ssl://10.53.71.103:61719?socket.enabledProtocols=TLSv1.2&amp;connectionTimeout=1000&amp;socket.enabledCipherSuites=TLS_RSA_WITH_AES_256_CBC_SHA)?randomize=false&amp;maxReconnectAttempts=1</broker-uri></jms-pipe-ad>) when master not known. Queueing.
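
      To compare how often these symptoms show up across occurrences, a small script along these lines might help. It is only a minimal sketch: the line pattern is inferred from the excerpt above, the label names and the functions parse_line and summarize are made up for illustration, and nothing here is part of the product.

```python
import re
from collections import Counter
from datetime import datetime

# Line layout inferred from the excerpt above, for example:
# "06 Oct 2017 17:09:15,024 DEBUG HANode          : message"
LINE_RE = re.compile(
    r"^\s*(?P<ts>\d{2} \w{3} \d{4} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+(?P<component>\S+)\s*: (?P<msg>.*)$"
)

# Substrings copied from the excerpt; the labels are hypothetical.
SYMPTOMS = {
    "pipe_health_failed": "Pipe health test failed",
    "timeout": "TimeoutException",
    "master_unknown": "when master not known",
    "empty_proxy_list": "TPList=[]",
}


def parse_line(line):
    """Return (timestamp, level, component, message), or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    # %b expects English month abbreviations (the default C locale).
    ts = datetime.strptime(m.group("ts"), "%d %b %Y %H:%M:%S,%f")
    return ts, m.group("level"), m.group("component"), m.group("msg")


def summarize(lines):
    """Count how many parsed log lines contain each symptom substring."""
    counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        msg = parsed[3]
        for label, needle in SYMPTOMS.items():
            if needle in msg:
                counts[label] += 1
    return counts
```

      Passing an open file works, for example summarize(open("grid.log")), where the path is an assumption about where the log is stored.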

       

      I suspect the root cause is that both CDP and HACDP claim to be the master.

      But when I look at Grid Manager on CDP and on HACDP, both of them show CDP as the master.

       

      CDP point of view:

       

      HACDP point of view:

       

      Sometimes bouncing the CDP and HACDP services solves the problem.

      But sometimes the issue persists after a restart, and then 2-3 hours later it goes away on its own.

       

      This issue happens 1-3 times a week.

      Any idea how to fix this?

       

      Thanks in advance.

      Alexander