8 replies. Latest reply on Dec 7, 2017 3:12 AM by Alvaro Paronuzzi

    Something is killing my cell

    Alvaro Paronuzzi

      Hi everyone,

       

      For the past couple of days I've been seeing issues in my BPPM environment. The cell crashes (even though the service remains up), and I have to kill the process manually from Task Manager before I can restart it successfully. After roughly 5 minutes it crashes again, and so on.

      I enabled logging, and the only thing I found is tons of lines like these:

       

      20171204 085533.784000 mcell: RULES: BMC-IMC110806I: sklSDVPropagateIncInfo.mrl, 223: execute PropIncUrlSklEvToSklEv: DATAMINER_EV #6825674: when block execution starting

      20171204 085533.784000 mcell: RULES: BMC-IMC110203I: sklSDVPropagateIncInfo.mrl, 339: execute PropIncUrlSklEvToSklEv: DATAMINER_EV #6825674: $UPDATEDLIST =

      20171204 085533.784000 mcell: RULES: BMC-IMC110203I: sklSDVPropagateIncInfo.mrl, 340: execute PropIncUrlSklEvToSklEv: DATAMINER_EV #6825674: $FATHER.skl_child_events =

      20171204 085533.784000 mcell: RULES: BMC-IMC110202I: sklSDVPropagateIncInfo.mrl, 221: execute PropIncUrlSklEvToSklEv: DATAMINER_EV #6825674: solution 45513 to event query:

      $FATHER = 0xb243bc0 (class: DATAMINER_EV,   event_handle: 6745079, mc_ueid: mc.bem2-prepro.1a103d9f.0)

       

      Tons! I have 5 log files full of similar messages, but I don't know if this is what's causing the problem.
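To check whether this one rule really dominates the trace, the RULES lines can be counted per MRL file and rule name. A minimal Python sketch, assuming every trace line follows the "mcell: RULES: BMC-IMC...: <file>.mrl, <line>: execute <rule>:" layout shown in the excerpt above; the script name "count_rule_hits.py" and the function count_rule_hits are hypothetical:

```python
import re
import sys
from collections import Counter

# Matches rule-trace lines such as:
# 20171204 085533.784000 mcell: RULES: BMC-IMC110806I: sklSDVPropagateIncInfo.mrl, 223: execute PropIncUrlSklEvToSklEv: ...
# Group 1 is the MRL file name, group 2 is the rule name.
TRACE_RE = re.compile(r"mcell: RULES: BMC-IMC\d+\w: (\S+\.mrl), \d+: execute (\w+):")


def count_rule_hits(lines):
    """Count RULES trace lines per (MRL file, rule name); other lines are ignored."""
    counts = Counter()
    for line in lines:
        m = TRACE_RE.search(line)
        if m:
            counts[(m.group(1), m.group(2))] += 1
    return counts


if __name__ == "__main__":
    total = Counter()
    # Each command-line argument is a log file to scan.
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as f:
            total.update(count_rule_hits(f))
    # Show the ten busiest rules, most frequent first.
    for (mrl, rule), n in total.most_common(10):
        print(f"{n:>10}  {mrl}  {rule}")
```

Run it as "python count_rule_hits.py <log file> ..." over all five log files. If one rule accounts for nearly all of the lines, it is a strong candidate for the slowdown, although a high count on its own does not prove it is the cause.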

      I'm attaching the cell log and the MRL rule file mentioned above in the hope that someone can help. Unfortunately my BPPM version (9.0) is no longer supported by BMC Customer Support, so I'm having trouble resolving this issue on my own.

       

      Thank you in advance for any help you can provide.

       

      Al

        • 1. Re: Something is killing my cell
          Alvaro Paronuzzi

          I removed the MRL rule mentioned above from the KB load list, recompiled the KB and restarted the cell.

          I am still seeing issues after the restart: the cell no longer goes down, but I don't see any events in the collectors, for example.

          I attach the latest log file.

           

          Thank you,

          Al

          • 2. Re: Something is killing my cell
            Kaushik KM

            Hi Alvaro,

            Sorry, I can see the attached files now; my bad. Let me check whether I can be of any help.

            • 3. Re: Something is killing my cell
              Alvaro Paronuzzi

              Hi Kaushik,

              After restarting jserver, BPPM is working fine without that rule.

              So the root cause seems to be the MRL rule, but I don't actually understand what's wrong with it.

              Thank you in advance for your help.

               

              Al

              • 4. Re: Something is killing my cell
                Kaushik KM

                Hey Alvaro,

                 

                You are correct.

                I was wondering whether you are receiving events at all.

                 

                I had a similar problem very recently in a BPPM 9.0 production environment.

                mcell was taking forever to start, and while solving that I also ran into your problem: none of the events showed up in any of the collectors.

                I followed these steps to get it working:

                 

                1) Started mcell in the foreground with "mcell -d -n Production_BPPM".

                2) Started all the services with "pw system start".

                3) Stopped the foreground mcell.

                4) Started mcell in the background with "pw p s mcell".

                5) The Ops Console was up, but no events were visible in it.

                6) Stopped the jserver process with "pw p e jserver".

                7) Started the jserver process with "pw p s jserver".

                 

                9.0 is out of support, and we have started to see more problems with mcell going down.

                • 5. Re: Something is killing my cell
                  Alvaro Paronuzzi

                  Hi Kaushik,

                   

                  I am receiving a lot of events in this BPPM environment. I think running that rule against such a large volume of events is practically killing the cell.

                  I am applying the workaround of restarting both the cell and jserver, but after the restart BPPM stays up for no longer than 5 to 10 minutes, so it's practically unusable.

                  In the other BPPM 9.0 environment, with the same set of rules and a much lower volume of events, I'm not seeing any issues. In other words, it seems that running this rule over a huge number of events is causing performance problems, but I don't know how to investigate further. :-s

                  • 6. Re: Something is killing my cell
                    Kaushik KM

                    Hi Alvaro,

                    Yes, if you are executing a rule that in turn triggers a script (for automation, for example), performance will definitely be affected; we see this in our environment too.

                     

                    Let's wait for suggestions from the experts.

                     

                    Could you take a look at this problem too, please:

                     

                    https://communities.bmc.com/thread/172566

                    I can provide any particular logs if required.

                    • 7. Re: Something is killing my cell
                      Charles Kelley

                      I haven't tested this here, but from a review of the rule, I think the where clause you have on the hashed index might be negating the benefit of hashing. It looks like that where clause isn't necessary if the values always match between the child and father events. Would they definitely match? For example, neither msg value should contain 'Flapping', but would the two values also be identical? If so, you might try testing this:

                       

                       

                      index SKLEV_idx1: SKL_EV hashed [skl_sdv, msg, mc_parameter] END

                       

                       

                      execute PropIncUrlSklEvToSklEv : SKL_EV($CHILD) #effect

                         where [$CHILD.skl_sdv == 'Yes' AND NOT $CHILD.msg contains 'Flapping' AND NOT $CHILD.mc_parameter == 'Manual']

                         using ALL { index SKLEV_idx1 [$CHILD.skl_sdv, $CHILD.msg, $CHILD.mc_parameter] ($FATHER) #cause
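Why a fully keyed hashed index helps can be illustrated outside MRL. The Python sketch below is an analogy only, not a description of how the BMC cell implements indexes, and the names Event, build_index, composite_key, find_fathers_composite and find_fathers_filtered are hypothetical. It also assumes, for illustration, that the original lookup hashed on fewer slots and left the rest to a where clause; the original rule is not shown in this thread. A lookup keyed on all three slots of SKLEV_idx1 goes straight to the matching bucket, while hashing on one low-selectivity slot (skl_sdv, which is 'Yes' for every candidate) and filtering afterwards has to visit every event in that bucket.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # Hypothetical stand-in for an SKL_EV event, keeping only the indexed slots.
    handle: int
    skl_sdv: str
    msg: str
    mc_parameter: str


def build_index(events, key):
    """Hash events into buckets keyed by key(event)."""
    index = defaultdict(list)
    for ev in events:
        index[key(ev)].append(ev)
    return index


def composite_key(ev):
    # Analogous to: index SKLEV_idx1: SKL_EV hashed [skl_sdv, msg, mc_parameter]
    return (ev.skl_sdv, ev.msg, ev.mc_parameter)


def find_fathers_composite(index, child):
    """One hash lookup on all three slots; every visited event is a match."""
    bucket = index.get(composite_key(child), [])
    return list(bucket), len(bucket)


def find_fathers_filtered(index_by_sdv, child):
    """Hash on skl_sdv only, then check msg and mc_parameter event by event."""
    matches, visited = [], 0
    for ev in index_by_sdv.get(child.skl_sdv, []):
        visited += 1
        if ev.msg == child.msg and ev.mc_parameter == child.mc_parameter:
            matches.append(ev)
    return matches, visited


# 10,000 candidate fathers, all with skl_sdv == 'Yes', spread over 1,000 messages.
events = [Event(i, "Yes", f"msg {i % 1000}", "CPU") for i in range(10_000)]
by_composite = build_index(events, composite_key)
by_sdv = build_index(events, lambda ev: ev.skl_sdv)

child = Event(-1, "Yes", "msg 42", "CPU")
fathers_a, visited_a = find_fathers_composite(by_composite, child)
fathers_b, visited_b = find_fathers_filtered(by_sdv, child)
print(f"composite: {len(fathers_a)} matches, {visited_a} events visited")
print(f"filtered:  {len(fathers_b)} matches, {visited_b} events visited")
```

Both lookups return the same 10 fathers, but the composite lookup visits only those 10 events while the filtered one walks all 10,000, and in this model that cost is repeated for every incoming child event. The composite key is only correct if msg and mc_parameter really are identical on the child and father events, which is exactly the question raised above.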

                      • 8. Re: Something is killing my cell
                        Alvaro Paronuzzi

                        Hi Charles,

                         

                        Thank you for the input. I had the chance to review these rules with a colleague, and we now understand how indexes are used much better.

                        We were only able to apply your suggestion to a few of the rules in that MRL file, but the change has greatly improved cell performance and the cell is no longer crashing.

                        Thanks for your help!

                         

                        Al