6 Replies Latest reply on Sep 13, 2018 10:46 AM by Steve Unegbu

    Bladelogic Cleanup Task Running in Database

    Steve Unegbu

      I ran a cleanup job which took too long to complete so I cancelled it and restarted the BL PROC service.

      I am assuming this will stop the nsh cleanup tasks running from the application server however what about the cleanup tasks running IN the database?

       

      When I run the BMC-provided procedure below to "find out whether any long running cleanup task is in processing stage":

      set lines 132
      col current_action format a80
      col duration format a15
      select
          task_id,
          cast(
              (cast(updated_at as timestamp) - cast(started_at as timestamp))
              as interval day(0) to second(0)
          ) as duration,
          current_action,
          to_char(deleted_rows, '9G999G999G999') as deleted_rows
      from
          delete_tasks
      where
          ended_at is null
      ;

      I still get the below which I was getting while the job was running:

       

         TASK_ID  DURATION         CURRENT_ACTION                                                           DELETED_ROWS
      ----------  ---------------  ---------------------------------------------------------------------  --------------
          103000  +24 00:33:14.00  deleting DEPLOY_JOB_RUN_EVENT                                             187,454,816
          105911                   Hard deleting objects of class 'SJobRunEvent'                                       0
        18590559                   Marking child of class 'SJobRunOfBatchJob' for deletion: SJobRunEvent               0
        21066292  +00 00:19:58.00  soft deleting bl_value children: 67.63% complete                                    0
        30113642  +04 02:38:52.00  deleting SNAPSHOT_ASSET_CNT                                               121,245,246
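For anyone reading the DURATION column: it is the interval between started_at and updated_at, printed in Oracle's day-to-second display format. A minimal Python sketch, assuming that display format, turns the values above into timedeltas so they can be compared. `parse_interval` is a hypothetical helper name, not part of any BMC tooling.

```python
import re
from datetime import timedelta

# Matches the DAY TO SECOND display format seen in the output, e.g. "+24 00:33:14.00".
_INTERVAL = re.compile(r"^([+-])(\d+) (\d{2}):(\d{2}):(\d{2})(?:\.(\d+))?$")

def parse_interval(text):
    """Convert an interval string to a timedelta; return None for a blank value."""
    text = text.strip()
    if not text:
        # A NULL started_at or updated_at yields a NULL duration, printed as blank.
        return None
    m = _INTERVAL.match(text)
    if m is None:
        raise ValueError(f"unrecognised interval: {text!r}")
    sign, days, hours, minutes, seconds, frac = m.groups()
    # Normalise the fractional part to microseconds (pad or truncate to 6 digits).
    micros = int((frac or "0").ljust(6, "0")[:6])
    delta = timedelta(days=int(days), hours=int(hours), minutes=int(minutes),
                      seconds=int(seconds), microseconds=micros)
    return -delta if sign == "-" else delta

# DURATION values copied from the query output above.
durations = {
    103000: parse_interval("+24 00:33:14.00"),
    105911: parse_interval(""),
    21066292: parse_interval("+00 00:19:58.00"),
    30113642: parse_interval("+04 02:38:52.00"),
}
# Pick the task with the longest gap between start and last logged update.
longest = max((d, task) for task, d in durations.items() if d is not None)
print(longest[1], longest[0].days)
```

Note that a long duration here only says how far apart the start and the last logged update were; it does not by itself show that the task is still doing work.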

       

       

      How can this task be killed off within the database so we rule out any potential performance issue brought about by this persistent task?
                                                                                                                           

        • 1. Re: Bladelogic Cleanup Task Running in Database
          Bill Robinson

          I ran a cleanup job which took too long to complete so I cancelled it

          what does 'too long' mean ?  please don't just randomly cancel things before you investigate w/ support.

           

          and restarted the BL PROC service.

          you mean the process spawner service ?

           

          I am assuming this will stop the nsh cleanup tasks running from the application server

          depends - if you have the appserver configured to use the process spawner then that probably would have killed the nsh processes spawned to run the cleanup script.  if the appserver isn't configured to use the spawner then no.  and that wouldn't cancel the job either, but it might cause the job to fail.  normally if you want to stop a job from running you cancel the job.

           

          however what about the cleanup tasks running IN the database?

          no, it won't.  so now you are going to have to wait for the in process transactions to complete, depending where cleanup was and when it realizes it got cancelled, transaction rollback might get triggered and you'll have to wait for that too.

           

          How can this task be killed off within the database so we rule out any potential performance issue brought about by this persistent task?

          open a ticket w/ support ?  if you think there's a performance issue in your db you should be getting awr/addm/etc reports instead of just randomly killing things off.

          • 2. Re: Bladelogic Cleanup Task Running in Database
            Steve Unegbu

            Hi Bill

             

            By too long, I mean it was taking over 2 weeks, and running the sql procedure above showed it was still at the same stage after most of those 2 weeks.

            Yes I restarted the process spawner service.

            The job was cancelled via the console but was still showing as running, so we then issued the cancel command and restarted the process service.

            This was done several weeks ago so any in process transaction should have completed. How can this be checked?

            We'll open a case and, in the meantime, enquire with the DBAs about running AWR reports.

            What do you recommend when clean-ups take excessively long and there is a high chance of hung tasks?

            • 3. Re: Bladelogic Cleanup Task Running in Database
              Bill Robinson
              By too long, I mean it was taking over 2 weeks, and running the sql procedure above showed it was still at the same stage after most of those 2 weeks.

              after a day i would have opened a ticket w/ support. 

               

              Yes I restarted the process spawner service.

              and is the appserver configured to use it ?  do you know how to check that ?

               

              The job was cancelled via the console but was still showing as running, so we then issued the cancel command and restarted the process service.

              cancelled or aborted ?  cancel will wait for the job to get to a break point to stop executing.  afaik that won't happen w/ a nsh job.  aborted should kill it.

               

              This was done several weeks ago so any in process transaction should have completed. How can this be checked?

              awr/addm/etc ? the delete_tasks just shows the last state of cleanup that got logged to the db. 
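Since delete_tasks only holds the last state that was logged, one rough way to tell a stale row from a task that is still progressing is to run the query twice, some time apart, and compare the results. A minimal Python sketch, assuming each run has been exported as rows of (task_id, updated_at, deleted_rows); the original query would need updated_at added to its select list for this. The function name, row layout, and sample values are all illustrative, and an unchanged row only suggests, rather than proves, that nothing is running in the database.

```python
def unchanged_tasks(earlier, later):
    """Return task_ids whose updated_at and deleted_rows are identical in both snapshots."""
    before = {task_id: (updated_at, deleted) for task_id, updated_at, deleted in earlier}
    stale = []
    for task_id, updated_at, deleted in later:
        # A task missing from the earlier snapshot never compares equal.
        if before.get(task_id) == (updated_at, deleted):
            stale.append(task_id)
    return sorted(stale)

# Hypothetical snapshots taken some time apart (ids, timestamps and counts are made up).
snap_1 = [(1001, "2018-08-20 10:00:00", 5000),
          (1002, "2018-09-12 09:00:00", 7000)]
snap_2 = [(1001, "2018-08-20 10:00:00", 5000),
          (1002, "2018-09-12 09:55:00", 7600)]
print(unchanged_tasks(snap_1, snap_2))
```

Whether anything is actually still executing or rolling back has to come from the database side (session and transaction views, AWR/ADDM), as noted above.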

               

              What do you recommend when clean-ups take excessively long and there is a high chance of hung tasks?

              well those are two problems - 'excessively long cleanups' - you need to look at db reports, etc.  maybe you aren't running table stats updates as we recommend. maybe you had a highly fragmented db.  maybe there's an index that should be added.  maybe you just have a lot of data to delete.  maybe there's some product issue.  or all of those. 

              'high chance of hung tasks' - that can also be caused by the above reasons, totally outside of any cleanup running.

              • 4. Re: Bladelogic Cleanup Task Running in Database
                Steve Unegbu

                Checked the process spawn process via the infrastructure management, application server launchers.

                 

                It was cancelled. What is the best way to cancel an nsh cleanup job?

                 

                We ran it in typical mode twice. The first run completed in 2 days, but some of the procedures timed out before completing, so although the overall job was successful it did not fully complete. For the second run we doubled the MaxDuration to 1440 (default 720) so the procedures wouldn't time out; this is where the excessive time began. Perhaps this setting is incorrect?

                • 5. Re: Bladelogic Cleanup Task Running in Database
                  Bill Robinson
                  Checked the process spawn process via the infrastructure management, application server launchers.

                  you checked it and you found what ?  in the blasadmin config for your appserver instances you saw:

                  [ProcessSpawner]
                  SpawnExternally:false

                  or

                  [ProcessSpawner]
                  SpawnExternally:true

                  the default is false - not to use the spawner.  so even if the service is running, it's not being used.
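To make that check explicit, here is a small Python sketch that reads the setting from text in the form shown above. It assumes the blasadmin output has been pasted or saved as plain text in exactly that `[Section]` / `Key:value` layout; `uses_spawner` is a hypothetical helper, and a missing entry is treated as the default of false mentioned above.

```python
import configparser

def uses_spawner(config_text):
    """Return True only if [ProcessSpawner] SpawnExternally is set to true."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # Fall back to False when the section or key is absent, matching the stated default.
    return parser.getboolean("ProcessSpawner", "SpawnExternally", fallback=False)

print(uses_spawner("[ProcessSpawner]\nSpawnExternally:false"))
print(uses_spawner("[ProcessSpawner]\nSpawnExternally:true"))
```

If this returns False for an appserver instance, restarting the process spawner service would not be expected to affect the nsh processes that instance launched.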

                   

                   

                  It was cancelled. What is the best way to cancel an nsh cleanup job?

                  don't do it.  instead, open a ticket so we can investigate what is wrong w/ your db or why cleanup is taking 'too long'.

                   

                  We ran it in typical mode twice. The first run completed in 2 days, but some of the procedures timed out before completing, so although the overall job was successful it did not fully complete. For the second run we doubled the MaxDuration to 1440 (default 720) so the procedures wouldn't time out; this is where the excessive time began. Perhaps this setting is incorrect?

                  do you have some logs of these job runs ?  i would include those in your support ticket.  it really depends - if you have never run cleanup before and your env has been up for some amount of time or very active then you are probably going to need to run cleanup differently than you would normally run it so you can 'catch up' to a point where the normal method of running cleanup can keep pace. 

                  • 6. Re: Bladelogic Cleanup Task Running in Database
                    Steve Unegbu

                    You're correct, Bill. SpawnExternally was set to the default of false.

                     

                    As far as doing "catch-up" cleanups go (offline may not be possible) would you say it's best pre or post 8.9.3 upgrade?