1 2 3 Previous Next 41 Replies Latest reply on Mar 27, 2018 8:05 AM by Greg Sacrey

    Consolidation runs stuck at 100 % in UI

    Ondrej Kieler

      Hello,

       

      we are using BMC Discovery version 11.2.0.1. We are consolidating data into the Consolidation cluster consisting of 3 appliances and we are facing the issue, that we now for few days see in UI of application not finished runs where discovery process achieved 100% and consolidation process also. Please see picture bellow:

       

       

      tw_reasoning status shows this:

       

      Run xxxx [08cfd335a517f49ec8aa0a11321b7541]

          Discovery start time:   2017-11-06 12:00:07 GMT

          Discovery complete:     Yes

          Last received data:     2017-11-07 09:18:46 GMT

          Endpoints consolidated: 16400 done, 0 waiting

       

       

          Run xxxx [08cfd335a52053bcfcc80a11321b7541]

          Discovery start time:   2017-11-06 13:00:02 GMT

          Discovery complete:     Yes

          Last received data:     2017-11-07 12:35:57 GMT

          Endpoints consolidated: 16400 done, 0 waiting

       

       

          Run xxxx [153d8035a528b531b146ac13a0194020]

          Discovery start time:   2017-11-06 14:00:02 GMT

          Discovery complete:     Yes

          Last received data:     2017-11-06 23:20:43 GMT

          Endpoints consolidated: 132911 done, 0 waiting

       

       

          Run xxxx [08cfd335a528b57fad600a11321b7541]

          Discovery start time:   2017-11-06 14:00:02 GMT

          Discovery complete:     Yes

          Last received data:     2017-11-07 13:45:19 GMT

          Endpoints consolidated: 16432 done, 0 waiting

       

      So last received data was yesterday at 13:45. We have 0 waiting endpoints so it seems that run is completely consolidated. But it is still visible. After restart of services these runs disappeared.

       

      Do you have any idea what can be wrong?

       

      Thanks,

       

      Ondrej

        • 1. Re: Consolidation runs stuck at 100 % in UI
          Andrew Waters

          From this - no.

           

          Is anything logged in tw_svc_reasoning.log or tw_svc_reasoning.out on the consolidation coordinator?

          1 of 1 people found this helpful
          • 2. Re: Consolidation runs stuck at 100 % in UI
            Ondrej Kieler

            tw_svc_reasoning.out shows that service was started since 6th of November and then failed to stop during the restart of services:

            Mon Nov  6 09:31:03 2017 : tw_svc_reasoning started.

            Wed Nov  8 09:14:06 2017 : Server run finished - shutdown beginning ...

            Wed Nov  8 09:14:09 2017 : Server run finished - shutdown beginning ...

            Wed Nov  8 09:14:09 2017 : Server run finished - shutdown beginning ...

            Wed Nov  8 09:14:09 2017 : Server run finished - shutdown beginning ...

            Wed Nov  8 09:14:09 2017 : Server run finished - shutdown beginning ...

            Wed Nov  8 09:14:09 2017 : Server run finished - shutdown beginning ...

            Wed Nov  8 09:14:09 2017 : Server run finished - shutdown beginning ...

            Wed Nov  8 09:14:09 2017 : Server run finished - shutdown beginning ...

            Wed Nov  8 09:14:36 2017 : Watchdog : Service failed during shutdown. Exiting.

             

             

            Wed Nov  8 09:19:49 2017 : tw_svc_reasoning started.

            And now it is started again after the restart.

             

            Before resatrt shows tw_svc_reasoning.log these four runs as done and 0 waiting. No CIs from these runs were processed. During the restart it showed this:

            140603529475840: 2017-11-08 09:14:06,331: common.server: INFO: Server run finished - shutdown beginning ...

            140603529475840: 2017-11-08 09:14:06,332: common.server: INFO: Suppress SIGTERM and SIGINT during shutdown.

            140603529475840: 2017-11-08 09:14:06,399: reasoning.patternupdate: INFO: Stopping pattern update thread

            140603529475840: 2017-11-08 09:14:06,400: reasoning.external_events.events_manager: INFO: Writing final event source details

            140603529475840: 2017-11-08 09:14:06,400: reasoning.polling: INFO: Stopping event polling threads

            140603529475840: 2017-11-08 09:14:06,439: reasoning.consolidation.sender: INFO: Stop consolidation sender thread

            140603073824512: 2017-11-08 09:14:09,562: common.delayedPerform: ERROR: Exception while calling delayed function <bound method ReasoningControllerLocal._deleteConsolidatedData of <reasoning.reasoningcontroller.ReasoningControllerLocal object at 0x7fe0bdbb3d10>>

            Traceback (most recent call last):

              File "./delay.py", line 341, in run

              File "./reasoningcontroller.py", line 5387, in _deleteConsolidatedData

              File "./ecacontroller.py", line 4640, in decorated

              File "./health.py", line 139, in __exit__

            ECAControllerUnavailable: ReasoningCORBA.ECAControllerUnavailable(reason='Consolidatiton_Cluster_GIO-02 [second cluster node]: ecDeleteConsolidatedData failure - TRANSIENT_ConnectFailed')

            140603529475840: 2017-11-08 09:14:09,577: reasoning.maintenance_schedules: INFO: Stop model maintenance informing thread

            140603529475840: 2017-11-08 09:14:10,349: reasoning.reasoninginterface: INFO: Reasoning shutdown complete

            140552053348096: 2017-11-08 09:19:49,134: common.configuration: USEFUL: PRODUCT: BMC Discovery

            140552053348096: 2017-11-08 09:19:49,134: common.configuration: USEFUL: VERSION: 11.2.0.1

            140552053348096: 2017-11-08 09:19:49,134: common.configuration: USEFUL: RELEASE: 713859

            140552053348096: 2017-11-08 09:19:49,134: common.configuration: USEFUL: COPYRIGHT: Copyright (C) 2003 - 2017, BMC Software

            140552053348096: 2017-11-08 09:19:49,134: common.configuration: USEFUL: PROCESS ID: 7822

            140552053348096: 2017-11-08 09:19:49,135: common.configuration: USEFUL: APPLIANCE: xxxx

            140552053348096: 2017-11-08 09:19:49,135: common.configuration: USEFUL: APPLIANCE NAME: Consolidatiton_Cluster_GIO-01

            140552053348096: 2017-11-08 09:19:49,135: common.configuration: INFO: Adding security settings & options

            140552053348096: 2017-11-08 09:19:49,136: common.configuration: INFO: Adding daemon settings & options

            140552053348096: 2017-11-08 09:19:49,136: common.configuration: INFO: Adding CORBA server settings & options

            140552053348096: 2017-11-08 09:19:49,136: common.configuration: INFO: Adding datastore settings & options

            140552053348096: 2017-11-08 09:19:49,136: common.configuration: INFO: Processing command line options

            140552053348096: 2017-11-08 09:19:49,137: common.configuration: INFO: Using sys.argv

            140552053348096: 2017-11-08 09:19:49,137: common.configuration: INFO: Consuming --daemon

            140552053348096: 2017-11-08 09:19:49,137: common.configuration: INFO: Consuming --daemon argument: start

            140552053348096: 2017-11-08 09:19:49,148: common.watchdog: INFO: Watchdog managed process started for the first time, pid 7842

            140552053348096: 2017-11-08 09:19:49,153: common.configuration: INFO: SSL: Using CA path /usr/tideway/etc/ca/hash, (path exists)

            140552053348096: 2017-11-08 09:19:49,190: businesslogic: INFO: Initialise business logic.

            140552053348096: 2017-11-08 09:19:49,195: businesslogic.event: INFO: Not sending data store events.

            140552053348096: 2017-11-08 09:19:49,195: businesslogic.partition: INFO: Initialise business logic partition for local use...

            140552053348096: 2017-11-08 09:19:49,196: businesslogic.registry: INFO: Initialise business logic registry for local use...

            140552053348096: 2017-11-08 09:19:49,198: businesslogic.registry: INFO: Finished initialising business logic registry. 9 node kinds; 1 relationship kind.

            140552053348096: 2017-11-08 09:19:49,227: reasoning.reasoninginterface: INFO: Creating local ReasoningController

            140552053348096: 2017-11-08 09:19:49,228: reasoning.reasoningcontroller: INFO: Building table of ECA Controllers

            140552053348096: 2017-11-08 09:19:49,228: reasoning.reasoningcontroller: INFO:   Local ECA Controller will register later

            140552053348096: 2017-11-08 09:19:49,229: reasoning.reasoningcontroller: INFO:   Add First member ECA Controller

            140552053348096: 2017-11-08 09:19:49,229: reasoning.reasoningcontroller: INFO:   Add Second member ECA Controller

            140552053348096: 2017-11-08 09:19:49,230: reasoning.pattern_execution: INFO: Finding pattern executions...

            140552053348096: 2017-11-08 09:19:49,231: reasoning.pattern_execution: INFO: Found 0 in progress pattern executions

            140552053348096: 2017-11-08 09:19:49,394: reasoning.patternupdate: INFO: Ensuring rules up to date

            140552053348096: 2017-11-08 09:19:54,793: reasoning.patternupdate: INFO: Finding all rule modules

            140552053348096: 2017-11-08 09:19:55,429: reasoning.patternupdate: INFO: Finding which rule modules to compile

            140552053348096: 2017-11-08 09:19:58,860: reasoning.patternupdate: INFO: Storing rule module details

            140552053348096: 2017-11-08 09:19:58,868: reasoning.patternupdate: INFO: Committing changes to datastore

            140552053348096: 2017-11-08 09:19:58,875: reasoning.reasoningcontroller: INFO: Initialising Reasoning

            140552053348096: 2017-11-08 09:19:58,879: reasoning.external_events.events_manager: INFO: Loaded 1 event source

            140552053348096: 2017-11-08 09:19:58,892: reasoning.consolidation.manager: INFO: Consolidating from 24 systems

            140552053348096: 2017-11-08 09:19:58,999: reasoning.maintenance_manager: INFO: Loaded 0 maintenance schedules

            140552053348096: 2017-11-08 09:19:59,023: reasoning.polling: INFO: vCenter Polling Manager: found 5 active vCenter event patterns

            140552053348096: 2017-11-08 09:19:59,027: reasoning.reasoninginterface: INFO: Creating local ECAController

            140552053348096: 2017-11-08 09:19:59,058: reasoning.providers: INFO: (Re)registered 'Mainframe': type 'Mainframe' with priority 100

            140552053348096: 2017-11-08 09:19:59,083: reasoning.providers: INFO: (Re)registered 'SQL Provider': type 'SQL' with priority 100

            140552053348096: 2017-11-08 09:19:59,086: reasoning.ecacontroller: INFO: 8 CPUs. Creating 7 ECA Engines.

            140552053348096: 2017-11-08 09:19:59,086: reasoning.ecacontroller: INFO: Creating 7 ECA Engines

            140552053348096: 2017-11-08 09:20:19,255: reasoning.ecacontroller: INFO: Ready to accept heartbeat

            140552053348096: 2017-11-08 09:20:19,257: reasoning.reasoningcontroller: INFO: Waiting for 2 ECAControllers

            140552053348096: 2017-11-08 09:20:22,260: reasoning.reasoningcontroller: INFO: Waiting for 1 ECAController (3s)

            140551419369216: 2017-11-08 09:20:24,301: reasoning.ecacontroller: INFO: First member is available

            140551419369216: 2017-11-08 09:20:24,343: reasoning.ecacontroller: INFO: Second member is available

            140552053348096: 2017-11-08 09:20:25,264: reasoning.reasoningcontroller: INFO: Recovering system...

            140552053348096: 2017-11-08 09:20:25,265: reasoning.reasoningcontroller: INFO: Initiating recovery 0de4b835a693fa22c8440a2719781e8e

            140552053348096: 2017-11-08 09:20:30,275: reasoning.reasoningcontroller: INFO: Find discovery runs...

            140552053348096: 2017-11-08 09:20:30,290: reasoning.reasoningcontroller: INFO: Found 12 in progress discovery runs

            140552053348096: 2017-11-08 09:20:30,314: reasoning.external_events.events_manager: INFO: Find external events...

            140552053348096: 2017-11-08 09:20:30,316: reasoning.external_events.events_manager: INFO: Found 0 in progress external events

            140552053348096: 2017-11-08 09:20:30,316: reasoning.reasoningcontroller: INFO: Recover trackers...

            140552053348096: 2017-11-08 09:20:30,322: reasoning.reasoningcontroller: INFO: Retrieving 42 queue details...

            140552053348096: 2017-11-08 09:20:30,326: reasoning.reasoningcontroller: INFO: Retrieved queue details (31/42)

            140552053348096: 2017-11-08 09:20:35,332: reasoning.reasoningcontroller: INFO: Retrieved queue details (42/42)

            140552053348096: 2017-11-08 09:20:35,333: reasoning.reasoningcontroller: INFO: Selecting recovery machines...

            140552053348096: 2017-11-08 09:20:35,336: reasoning.reasoningcontroller: INFO: Recovering 24 trackers...

            140552053348096: 2017-11-08 09:20:35,337: reasoning.reasoningcontroller: INFO: Recover 7 trackers from ECA Controller Consolidatiton_Cluster_GIO-01

            140552053348096: 2017-11-08 09:20:35,356: reasoning.reasoningcontroller: INFO: Recover 9 trackers from ECA Controller Consolidatiton_Cluster_GIO-02

            140552053348096: 2017-11-08 09:20:35,371: reasoning.reasoningcontroller: INFO: Recover 8 trackers from ECA Controller Consolidatiton_Cluster_GIO-03

            140552053348096: 2017-11-08 09:20:35,387: reasoning.reasoningcontroller: INFO: Recovered trackers (0/24)

            140552053348096: 2017-11-08 09:20:40,400: reasoning.reasoningcontroller: INFO: ECA controller Consolidatiton_Cluster_GIO-01 [10.39.25.120] recovery of 7 trackers complete

            140552053348096: 2017-11-08 09:20:40,401: reasoning.reasoningcontroller: INFO: Recovered trackers (10/24)

            140552053348096: 2017-11-08 09:20:45,413: reasoning.reasoningcontroller: INFO: ECA controller Consolidatiton_Cluster_GIO-03 [degio9010ua122p.wniadmp.intranet] recovery of 8 trackers complete

            140552053348096: 2017-11-08 09:20:45,414: reasoning.reasoningcontroller: INFO: ECA controller Consolidatiton_Cluster_GIO-02 [degio9010ua121p.wniadmp.intranet] recovery of 9 trackers complete

            140552053348096: 2017-11-08 09:20:45,416: reasoning.reasoningcontroller: INFO: Recovered trackers (24/24)

            140552053348096: 2017-11-08 09:20:45,417: reasoning.consolidation.receiver: INFO: Recovering consolidation...

             

            After restart I cannot see these four runs them in tw_svc_reasoning.log. I can see only 4 consolidation runs in logs and also in UI.

             

            Do you have any hint?

             

            Thanks

            • 3. Re: Consolidation runs stuck at 100 % in UI
              Andrew Waters

              tw_svc_reasoning.log would be from before the service restart rather than afterwards.

              1 of 1 people found this helpful
              • 4. Re: Consolidation runs stuck at 100 % in UI
                Ondrej Kieler

                There is unfortunately nothing significant:

                 

                Appliances:

                 

                 

                  Consolidatiton_Cluster_GIO-01 [coordinator]

                 

                 

                  Consolidatiton_Cluster_GIO-02 [member 1]:

                    last success: Wed Nov  8 09:11:11 2017

                    last failure: never

                 

                 

                  Consolidatiton_Cluster_GIO-03 [member 2]:

                    last success: Wed Nov  8 09:11:11 2017

                    last failure: never

                 

                 

                Consolidation status:

                     [08cfd335a517f49ec8aa0a11321b7541]

                    Discovery start time:  2017-11-06 12:00:07 GMT

                    Discovery complete:    Yes

                    Last received data:    2017-11-07 09:18:46 GMT

                    Endpoints consolidated: 16400 done, 0 waiting

                 

                 

                    [08cfd335a52053bcfcc80a11321b7541]

                    Discovery start time:  2017-11-06 13:00:02 GMT

                    Discovery complete:    Yes

                    Last received data:    2017-11-07 12:35:57 GMT

                    Endpoints consolidated: 16400 done, 0 waiting

                 

                 

                    [153d8035a528b531b146ac13a0194020]

                    Discovery start time:  2017-11-06 14:00:02 GMT

                    Discovery complete:    Yes

                    Last received data:    2017-11-06 23:20:43 GMT

                    Endpoints consolidated: 132911 done, 0 waiting

                 

                 

                    [08cfd335a528b57fad600a11321b7541]

                    Discovery start time:  2017-11-06 14:00:02 GMT

                    Discovery complete:    Yes

                    Last received data:    2017-11-07 13:45:19 GMT

                    Endpoints consolidated: 16432 done, 0 waiting

                 

                 

                    [b75cf135a5b7321d4fec194620872c54]

                    Discovery start time:  2017-11-07 07:00:00 GMT

                    Discovery complete:    No

                    Last received data:    2017-11-08 09:11:11 GMT

                    Endpoints consolidated: 26025 done, 3 waiting

                 

                 

                    [0e177a35a5c7fb0912960a15323914ab]

                    Discovery start time:  2017-11-07 09:00:09 GMT

                    Discovery complete:    Yes

                    Last received data:    2017-11-07 10:53:00 GMT

                    Endpoints consolidated: 16385 done, 0 waiting

                 

                 

                     [4187be35a5d059273f120a12326810c8]

                    Discovery start time:  2017-11-07 10:00:03 GMT

                    Discovery complete:    Yes

                    Last received data:    2017-11-07 10:35:15 GMT

                    Endpoints consolidated: 16386 done, 0 waiting

                 

                 

                    [bba77e35a5e9815605860a1732536747]

                    Discovery start time:  2017-11-07 13:00:08 GMT

                    Discovery complete:    Yes

                    Last received data:    2017-11-08 05:55:00 GMT

                    Endpoints consolidated: 49195 done, 0 waiting

                 

                 

                    [bba77e35a5f1e030b8780a1732536747]

                    Discovery start time:  2017-11-07 14:00:03 GMT

                    Discovery complete:    No

                    Last received data:    2017-11-08 09:10:40 GMT

                    Endpoints consolidated: 50329 done, 0 waiting

                 

                 

                    [bba77e35a5fa4074b05e0a1732536747]

                    Discovery start time:  2017-11-07 15:00:00 GMT

                    Discovery complete:    No

                    Last received data:    2017-11-08 09:10:51 GMT

                    Endpoints consolidated: 47187 done, 0 waiting

                 

                 

                     [65284a35a613694706320a176b1045a2]

                    Discovery start time:  2017-11-07 18:00:06 GMT

                    Discovery complete:    Yes

                    Last received data:    2017-11-08 08:15:27 GMT

                    Endpoints consolidated: 16400 done, 0 waiting

                 

                 

                    Run WN_intranet_2 [153d8035a66f9cda4e30ac13a0194020]

                    Discovery start time:  2017-11-08 05:00:06 GMT

                    Discovery complete:    No

                    Last received data:    2017-11-08 09:11:14 GMT

                    Endpoints consolidated: 65803 done, 43 waiting

                140603073824512: 2017-11-08 09:11:33,490: reasoning.ecacontroller: INFO:

                 

                 

                                                |                            ECA Engine

                                                |      0 |      1 |      2 |      3 |      4 |      5 |      6

                ------------------------------------------------------------------------------------------------------

                                    Event engine |        |        |        |        |        |        |      

                                        Status: | running | running | running | running | running | running | running

                                  Queued events: |      1 |      78 |      0 |      10 |      0 |    119 |      0

                    Events processing (maximum): |  1 (1) |  1 (1) |  1 (1) |  1 (1) |  1 (1) |  1 (1) |  1 (1)

                                Actions loaded: |    225 |    225 |    225 |    225 |    225 |    225 |    225

                                  Rules loaded: |  26250 |  26250 |  26250 |  26250 |  26250 |  26250 |  26250

                                                |        |        |        |        |        |        |      

                                Discovery engine |        |        |        |        |        |        |      

                                        Status: | running | running | running | running | running | running | running

                                Queued requests: |      0 |      0 |      0 |      0 |      0 |      0 |      0

                                Endpoint count: |      0 |      0 |      0 |      0 |      0 |      0 |      0

                              Endpoints waiting: |      0 |      0 |      0 |      0 |      0 |      0 |      0

                Endpoints discovering (maximum): |  0 (30) |  0 (30) |  0 (30) |  0 (30) |  0 (30) |  0 (30) |  0 (30)

                          Asynchronous requests: |      0 |      0 |      0 |      0 |      0 |      0 |      0

                            Endpoint throttling: |  False |  False |  False |  False |  False |  False |  False

                          Providers throttling: |  False |  False |  False |  False |  False |  False |  False

                 

                 

                 

                 

                Scanning:

                  Run 153d8035a66f9cda4e30ac13a0194020:

                    Engine 0: 172.18.124.10

                    Engine 1: 172.18.124.98

                    Engine 3: 172.18.123.250

                    Engine 5: 172.18.124.33

                 

                 

                  Run bba77e35a5fa4074b05e0a1732536747:

                    Engine 2: 10.194.92.9

                    Engine 4: 10.195.92.100

                    Engine 6: 10.194.92.8

                 

                From this it seems that nothing is processed from runs which I posted in picture.

                • 5. Re: Consolidation runs stuck at 100 % in UI
                  Andrew Waters

                  There is something odd going on there because in the first case you have a run which in the UI is showing 132902 endpoints but then in the status is reporting 132911 done. So the question is where did the other 9 IPs come from.

                   

                  Presumably the scanner says there were 132902 IPs in the original scan.

                  3 of 3 people found this helpful
                  • 6. Re: Consolidation runs stuck at 100 % in UI
                    Ondrej Kieler

                    yes, Andrew, you are right scanner is showing different number of IPs than tw_reasoningstatus and tw_svc_reasoning.log is showing. But I have no idea from where they can come from. Where would you search for the reason?

                    • 7. Re: Consolidation runs stuck at 100 % in UI
                      Andrew Waters

                      That is going to be rather painful. Realistically you would need to get all the DAs and DroopedEndpoints on the scanner and consolidator and check for differences.

                       

                      My guess is that there is going to be a few IPs duplicated on the consolidator where there is only one occurrence on the scanner.

                      • 8. Re: Consolidation runs stuck at 100 % in UI
                        Ondrej Kieler

                        I very briefly checked the numbers (I will investigate it furthermore):

                         

                        Scanner:

                        Endpoints Dropped 120644

                        DAs: 12258

                        Together: 132902

                         

                        Consolidator:

                        Endpoints Dropped: 120995

                        DAs: 11914

                        Together: 132909

                         

                        So your idea with duplicated IPs on Consolidator is probably right. But how it is possible that we have even so much different numbers in Dropped endpoints and DAs? I thought that numbers should be same on scanner and consolidator. How can be some endpoints which were scanned on scanner marked like dropped on Consolidator?

                         

                        Thanks.

                        • 9. Re: Consolidation runs stuck at 100 % in UI
                          Ondrej Kieler

                          I went through the run. I found out that some IPs are duplicated. They are included in dropped endpoints and also in noresponse devices. But there is probably something more which is wrong because I found only 132909 IPs on Consolidator. Not 132911 which is showing reasoning log. Also I can see for example less inferred hosts on consolidator then I see in same run on the scanner.

                          • 10. Re: Consolidation runs stuck at 100 % in UI
                            Greg Sacrey

                            I have seen the exact same thing. Some consolidation runs stuck at 100% complete for days.  I let them run but it is very strange.

                             

                            One thing to check is to make sure the same patterns are active on the scanners as the consolidator.

                            1 of 1 people found this helpful
                            • 11. Re: Consolidation runs stuck at 100 % in UI
                              Andrew Waters

                              Did you restart the consolidator services during this time, i.e. before restarting them to try and clear the runs.

                              • 12. Re: Consolidation runs stuck at 100 % in UI
                                Ondrej Kieler

                                Yes, in case of this particular run I restarted them.

                                • 13. Re: Consolidation runs stuck at 100 % in UI
                                  Ondrej Kieler

                                  Hello Greg,

                                   

                                  the uploaded TKUs are the same. But there is one other strange thing on the scanner. It is showing that 2091 pattern modules are active but when I count together Latest TKU, Latest EDP and Latest storage patterns. Sum is 2088.

                                   

                                  Scanner:

                                   

                                  Consolidator:

                                   

                                  • 14. Re: Consolidation runs stuck at 100 % in UI
                                    Andrew Waters

                                    That particular run or all the ones that stayed on 100%?

                                    1 2 3 Previous Next