23 Replies Latest reply on Aug 11, 2017 1:09 PM by Andrew Waters

    Consolidation stuck v10.2.0.1

    Ravi Sankar Pasumarthy

      Hi,

       

      Consolidation is currently stuck on our production consolidator. The discovery status is 'running' on the consolidator, and the scans have completed on the scanners. I have enabled DEBUG logging on the consolidator; below is the tw_reasoningstatus command output. Can anyone please help with this as soon as possible?

       

      140473111426816: 2016-03-18 11:51:35,085: reasoning.ecacontroller: INFO:

       

       

                                       |                                                           ECA Engine

                                       |       0 |       1 |       2 |       3 |       4 |       5 |       6 |       7 |       8 |       9 |      10 |      11 |      12

      ------------------------------------------------------------------------------------------------------------------------------------------------------------------

                          Event engine |         |         |         |         |         |         |         |         |         |         |         |         |       

                               Status: | running | running | running | running | running | running | running | running | running | running | running | running | running

                        Queued events: |      60 |       7 |       0 |      21 |      33 |       0 |     303 |       0 |      14 |      21 |     228 |      59 |      49

          Events processing (maximum): |   1 (1) |   1 (1) |   1 (1) |   1 (1) |   1 (1) |   1 (1) |   1 (1) |   1 (1) |   1 (1) |   1 (1) |   1 (1) |   1 (1) |   1 (1)

                       Actions loaded: |     187 |     187 |     187 |     187 |     187 |     187 |     187 |     187 |     187 |     187 |     187 |     187 |     187

                         Rules loaded: |   21567 |   21567 |   21567 |   21567 |   21567 |   21567 |   21567 |   21567 |   21567 |   21567 |   21567 |   21567 |   21567

                                       |         |         |         |         |         |         |         |         |         |         |         |         |       

                      Discovery engine |         |         |         |         |         |         |         |         |         |         |         |         |       

                               Status: | running | running | running | running | running | running | running | running | running | running | running | running | running

                      Queued requests: |       0 |       0 |       0 |       0 |       0 |       0 |       0 |       0 |       0 |       0 |       0 |       0 |       0

                       Endpoint count: |       0 |       0 |       0 |       0 |       0 |       0 |       0 |       0 |       0 |       0 |       0 |       0 |       0

                    Endpoints waiting: |       0 |       0 |       0 |       0 |       0 |       0 |       0 |       0 |       0 |       0 |       0 |       0 |       0

      Endpoints discovering (maximum): |  0 (30) |  0 (30) |  0 (30) |  0 (30) |  0 (30) |  0 (30) |  0 (30) |  0 (30) |  0 (30) |  0 (30) |  0 (30) |  0 (30) |  0 (30)

                Asynchronous requests: |       0 |       0 |       0 |       0 |       0 |       0 |       0 |       0 |       0 |       0 |       0 |       0 |       0

                  Endpoint throttling: |   False |   False |   False |   False |   False |   False |   False |   False |   False |   False |   False |   False |   False

                 Providers throttling: |   False |   False |   False |   False |   False |   False |   False |   False |   False |   False |   False |   False |   False

       

      Most of the lines in the tw_svc_reasoning log look like the ones below. What exactly do they mean?

       

      140473111426816: 2016-03-18 11:45:48,144: common.delayedPerform: DEBUG: 5975:Calling (<bound method ECAControllerLocal._statusUpdate of <reasoning.ecacontroller.ECAControllerLocal object at 0x7fc274a93ed0>>, (), {})

      140473111426816: 2016-03-18 11:45:48,145: common.delayedPerform: DEBUG: 5976: register delay 10.000000, (<bound method ReasoningControllerLocal._deleteConsolidatedData of <reasoning.reasoningcontroller.ReasoningControllerLocal object at 0x7fc278c4c650>>, (), {})

      140473111426816: 2016-03-18 11:45:48,147: common.delayedPerform: DEBUG: waiting for 9.997819

       

      Regards,

      Ravi

        • 1. Re: Consolidation stuck v10.2.0.1
          Ravi Sankar Pasumarthy

          Below is the reasoning log output from the scanner:

           

          140225322727168: 2016-03-18 12:19:49,012: reasoning.ecacontroller: INFO:

           

           

                                           |                                                           ECA Engine

                                           |       0 |       1 |       2 |       3 |       4 |       5 |       6 |       7 |       8 |       9 |      10 |      11 |      12

          ------------------------------------------------------------------------------------------------------------------------------------------------------------------

                              Event engine |         |         |         |         |         |         |         |         |         |         |         |         |       

                                   Status: | running | running | running | running | running | running | running | running | running | running | running | running | running

                            Queued events: |       0 |       0 |       0 |       0 |       0 |       0 |       0 |       0 |       0 |       0 |       0 |       0 |       0

              Events processing (maximum): |   0 (1) |   0 (1) |   0 (1) |   0 (1) |   0 (1) |   0 (1) |   0 (1) |   0 (1) |   0 (1) |   0 (1) |   0 (1) |   0 (1) |   0 (1)

                           Actions loaded: |     187 |     187 |     187 |     187 |     187 |     187 |     187 |     187 |     187 |     187 |     187 |     187 |     187

                             Rules loaded: |   21371 |   21371 |   21371 |   21371 |   21371 |   21371 |   21371 |   21371 |   21371 |   21371 |   21371 |   21371 |   21371

                                           |         |         |         |         |         |         |         |         |         |         |         |         |       

                          Discovery engine |         |         |         |         |         |         |         |         |         |         |         |         |       

                                   Status: | running | running | running | running | running | running | running | running | running | running | running | running | running

                          Queued requests: |       0 |       0 |       0 |       0 |       0 |       0 |       0 |       0 |       0 |       0 |       0 |       0 |       0

                           Endpoint count: |       0 |       0 |       0 |       0 |       0 |       0 |       0 |       0 |       0 |       0 |       0 |       0 |       0

                        Endpoints waiting: |       0 |       0 |       0 |       0 |       0 |       0 |       0 |       0 |       0 |       0 |       0 |       0 |       0

          Endpoints discovering (maximum): |  0 (30) |  0 (30) |  0 (30) |  0 (30) |  0 (30) |  0 (30) |  0 (30) |  0 (30) |  0 (30) |  0 (30) |  0 (30) |  0 (30) |  0 (30)

                    Asynchronous requests: |       0 |       0 |       0 |       0 |       0 |       0 |       0 |       0 |       0 |       0 |       0 |       0 |       0

                      Endpoint throttling: |   False |   False |   False |   False |   False |   False |   False |   False |   False |   False |   False |   False |   False

                     Providers throttling: |   False |   False |   False |   False |   False |   False |   False |   False |   False |   False |   False |   False |   False

          • 2. Re: Consolidation stuck v10.2.0.1
            Brice-Emmanuel Loiseaux

            It looks like the consolidation queue is full of events. Do you see the "Queued events" figures decreasing or not? Do you see any activity when you look at the eca_engine logs?
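
            One way to answer the "decreasing or not" question is to save the status output twice, a few minutes apart, and compare the "Queued events" row. The following is only a minimal sketch, assuming the row keeps the pipe-separated layout shown in the output above; the function names `queued_events` and `queue_change` are hypothetical helpers, not part of the product.

```python
def queued_events(status_text):
    """Return the per-engine 'Queued events' counts from one status dump."""
    for line in status_text.splitlines():
        label, sep, rest = line.partition(":")
        if sep and label.strip() == "Queued events":
            # Cells are pipe-separated; the first cell after the label is empty.
            return [int(cell) for cell in rest.split("|") if cell.strip()]
    raise ValueError("no 'Queued events:' row found")


def queue_change(earlier_text, later_text):
    """Per-engine change in queued events (negative means the queue shrank)."""
    before = queued_events(earlier_text)
    after = queued_events(later_text)
    if len(before) != len(after):
        raise ValueError("snapshots have a different number of engines")
    return [n_after - n_before for n_before, n_after in zip(before, after)]
```

            If most entries returned by `queue_change` are negative or fluctuating between snapshots, the engines are working through their queues; values that never move over a long period would be more consistent with a real stall.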

            • 3. Re: Consolidation stuck v10.2.0.1
              Andrew Waters

              As Brice says, this shows that the scanner has finished processing everything but the consolidator is processing events.

               

              What makes you think it is stuck?

              • 4. Re: Consolidation stuck v10.2.0.1
                Andrew Waters

                Those lines are internal information about what the system is doing. They are there to help support understand what is happening. If you are seeing them, then the consolidator is definitely not stuck.

                • 5. Re: Consolidation stuck v10.2.0.1
                  Ravi Sankar Pasumarthy

                  Hi Andrew,

                   

                  Sorry for the late response.

                   

                  Some scans had finished on the scanner 1-2 days earlier but still showed up on the consolidator, which is why I guessed they were stuck. I cancelled those runs, and the other runs were then immediately marked as completed too, which is something I am seeing for the first time.

                   

                  But I am seeing the same thing again. Every now and then the consolidator shows some runs in progress even though they completed on the scanners 3-4 days earlier. Every time, I have to cancel them so that they don't show up. If I don't cancel them, they stay there forever.

                   

                  Any input here, please?

                   

                  Regards

                  Ravi

                  • 6. Re: Consolidation stuck v10.2.0.1
                    Timothy Crawford

                    Ravi,

                     

                    I have noticed this issue before. I have neither an answer nor a solution. I am currently trying to solve it with an upgrade to 10.2.0.3 (with the latest Mar 2016 OS updates, TKU, EDP etc.).

                    However, there are some checks worth doing before pursuing the same road.

                    If you have already done these checks, then I apologise, but I thought it might be worth sharing some experiences and some notable milestones.

                    1) Please check that port 25032 is open on the consolidation appliance. (I know it may have been open before, but you never know when some network chap decides to install a firewall between location X and Y and happens to block that essential port. Note: this is not to give network admins a bad reputation; we rely on them so much and appreciate all their efforts.)

                    2) Please check the hard disk of the consolidation appliance. Is it too full?

                    3) Please check (if you know) whether there are any patterns or application mappings on the consolidation appliance that are not on the scanner. This can cause a performance difference between the two. It might be that the consolidation appliance is triggering more than the scanner, generating more nodes, etc.

                    4) Please check the performance of the appliance, e.g. in the performance tab in the Administration section of the appliance UI. Is the consolidation appliance under-resourced for the work it is doing?

                    5) Is there any way for you to test the network traffic quality between the two appliances (scanning and consolidation)? If there's an issue there, it could be corrupting the data.

                    6) It's something else deeper and unknown, and I can only suggest taking the same road to version 10.2.0.3 or even version 11 in the hope it gets resolved.
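
                    Checks 1) and 2) above can be scripted. The following is a minimal sketch rather than an official tool: the hostname is a placeholder, the function names are hypothetical, and the only value taken from this thread is port 25032. The port check would need to be run from the scanner side, and the disk check on the consolidation appliance itself.

```python
import shutil
import socket


def port_is_open(host, port=25032, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def disk_fraction_used(path="/"):
    """Return the used fraction (0.0 to 1.0) of the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


if __name__ == "__main__":
    # "consolidator.example.com" is a placeholder, not a real appliance name.
    print("port 25032 reachable:", port_is_open("consolidator.example.com"))
    print("disk used: {:.0%}".format(disk_fraction_used("/")))
```

                    A successful connection only shows that the port is reachable from where the script runs; it says nothing about the quality of the link mentioned in check 5).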

                     

                    All the best.

                    • 7. Re: Consolidation stuck v10.2.0.1
                      Brice-Emmanuel Loiseaux

                      Again, how do you know these runs are stuck rather than taking ages to process some endpoints because of pattern problems? And if they really are stuck, what were the last endpoints processed? Are there any commonalities between the scans that you saw "stuck" (same runs, runs with the same endpoints, endpoints with the same software footprint, ...)?

                       

                      You may be better served opening a customer support case here.

                      • 8. Re: Consolidation stuck v10.2.0.1
                        Ravi Sankar Pasumarthy

                        Hi Tim and Brice,

                         

                        I have checked some of your points but not all. The next time the issue happens, I will not cancel the runs; instead I will go through the points you have raised and, as suggested, open a case if needed.

                         

                        Thanks for your input.

                         

                        Regards,

                        Ravi

                        • 9. Re: Consolidation stuck v10.2.0.1
                          Yan De Wulf

                          One of our consolidators is running into this very same issue. After it was down for 2 days over the weekend for compacting (30 hrs), we brought it back online to catch up with 26 discovery runs that our scanners had completed long ago. Now only 1 consolidation run is processing at all, and at slower than a snail's pace. Here's a sample of our reasoning log when set to debug.

                           

                          We opened a case with BMC, and David Miller has started working on it.

                           

                          140344911300352: 2016-06-13 15:34:56,568: reasoning.ecacontroller: USEFUL: Setting loglevel to DEBUG

                          140344875108096: 2016-06-13 15:35:00,924: common.delayedPerform: DEBUG: 499: register delay 1.000000, (<bound method ECAControllerLocal._statusUpdate of <reasoning.ecacontroller.ECAControllerLocal object at 0x7fa4aa1a4990>>, (), {})

                          140345169078016: 2016-06-13 15:35:00,964: common.delayedPerform: DEBUG: waiting for 0.959611

                          140345169078016: 2016-06-13 15:35:01,924: common.delayedPerform: DEBUG: 499:Calling (<bound method ECAControllerLocal._statusUpdate of <reasoning.ecacontroller.ECAControllerLocal object at 0x7fa4aa1a4990>>, (), {})

                          140345169078016: 2016-06-13 15:35:01,925: common.delayedPerform: DEBUG: 500: register delay 10.000000, (<bound method ReasoningControllerLocal._deleteConsolidatedData of <reasoning.reasoningcontroller.ReasoningControllerLocal object at 0x7fa4ae375e90>>, (), {})

                          140345169078016: 2016-06-13 15:35:01,926: common.delayedPerform: DEBUG: waiting for 9.998805

                          140345169078016: 2016-06-13 15:35:11,925: common.delayedPerform: DEBUG: 500:Calling (<bound method ReasoningControllerLocal._deleteConsolidatedData of <reasoning.reasoningcontroller.ReasoningControllerLocal object at 0x7fa4ae375e90>>, (), {})

                          140345169078016: 2016-06-13 15:35:11,926: common.delayedPerform: DEBUG: waiting for 41.174248

                          140344854128384: 2016-06-13 15:35:33,844: common.delayedPerform: DEBUG: 501: register delay 1.000000, (<bound method ECAControllerLocal._statusUpdate of <reasoning.ecacontroller.ECAControllerLocal object at 0x7fa4aa1a4990>>, (), {})

                          140345169078016: 2016-06-13 15:35:33,886: common.delayedPerform: DEBUG: waiting for 0.958324

                          140345169078016: 2016-06-13 15:35:34,844: common.delayedPerform: DEBUG: 501:Calling (<bound method ECAControllerLocal._statusUpdate of <reasoning.ecacontroller.ECAControllerLocal object at 0x7fa4aa1a4990>>, (), {})

                          140345169078016: 2016-06-13 15:35:34,845: common.delayedPerform: DEBUG: 502: register delay 10.000000, (<bound method ReasoningControllerLocal._deleteConsolidatedData of <reasoning.reasoningcontroller.ReasoningControllerLocal object at 0x7fa4ae375e90>>, (), {})

                          • 10. Re: Consolidation stuck v10.2.0.1
                            Andrew Waters

                            Reasoning processes consolidation from the oldest DiscoveryRun to the newest. This means that if the oldest run has enough data, only that run will be in progress.

                             

                            Consolidation "scanning" behaviour is significantly different to discovery. The reason you see lots of things in progress in discovery is that the system starts new work while it waits for a request to a machine being discovered to complete. Consolidation does not have to wait, so there is no idle time in which to start another request.

                             

                            This log indicates at least some work is being done.
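
                            To see at a glance which internal callbacks are firing in a DEBUG excerpt like the one above, the "Calling" lines can be tallied by method name. This is a rough sketch that assumes the `common.delayedPerform` line format seen in this thread; `count_calls` is a hypothetical helper, and the tally only shows that the scheduler is alive, not how quickly consolidation events are being processed.

```python
import re
from collections import Counter

# Matches e.g. "... common.delayedPerform: DEBUG: 499:Calling (<bound method X.y of ..."
CALL_RE = re.compile(
    r"common\.delayedPerform: DEBUG: \d+:\s*Calling \(<bound method (\S+) of"
)


def count_calls(log_lines):
    """Count 'Calling' entries per bound-method name in reasoning DEBUG lines."""
    counts = Counter()
    for line in log_lines:
        match = CALL_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

                            The "register delay" and "waiting for" lines are deliberately ignored, since they only record that a callback was scheduled or that the scheduler is sleeping until the next one is due.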

                            • 11. Re: Consolidation stuck v10.2.0.1
                              Ravi Sankar Pasumarthy

                              Hi Yan,

                               

                              How did your case go? Any findings?

                               

                              Regards,

                              Ravi

                              • 12. Re: Consolidation stuck v10.2.0.1
                                Yan De Wulf

                                No. Support didn't find anything meaningful in the logs they requested, which I uploaded to the case. We ended up "cancelling" all 26 stuck consolidation runs on the problem consolidator. Each of the cancelled runs disappeared right away, and we did not see any errors come through; I was monitoring the reasoning log in real time. All the while, all new discovery runs consolidated right away without problems.

                                • 13. Re: Consolidation stuck v10.2.0.1
                                  Yan De Wulf

                                  Thanks for the feedback Andrew

                                  • 14. Re: Consolidation stuck v10.2.0.1
                                    Ravi Sankar Pasumarthy

                                    Thanks for the update!

                                     

                                    That is what we usually end up doing. If we decided to find the root cause, I think it would take too long, and we usually have other issues or tasks to worry about, so we end up going for this solution.

                                     

                                    Ravi
