Hi, Just dropping a line to stay up to date. We've noticed a lock 2 weeks back as well. Altough is was occasional and we don't seem to be able to reproduce it we're not feeling completely confident.
We've scheduled the cleanup out of critical timeframes now to be sure.
We had the exact same issue and stopped our cleanup script for around 6 months.
We have re-kicked it off, however we are after something we could monitor as a means of creating a service management incident.
Did you check out the oracle logs for errors?
If there is anything of interest, could you let us know as we could set up monitors for it in our own system.
Out of interest what version of BL are you on?
We're on 8.2.01
There are a few general changes we can make that reduce the likelihood of cleanupDatabase causing blocking in the database:
1 - In the script that calls cleanupDatabase use a reduced logging level (8.3.00 and later only)
blcli Delete cleanupDatabase 1 true <duration>
The first argument is the Log level. Valid values are 1 (ERROR), 2 (WARNING), 3 (INFO), 4 (DETAILED), 5 (VERY DETAILED), 6 (DEBUG)).
cleanupDatabase is writing log information to the database while it's running and this can cause blocking as running jobs are trying to write to the same tables. Reducing the logging level reduces this probability.
2 - Update the job_run_event cleanup stored procedure to remove additional rows (job_run_event entries associated with soft deleted jobs runs)
Information is on this page under "Additional JOB_RUN_EVENT Cleanup":
3 - Before running cleanupDatabse, make sure the cleanupHistoricalData commands have completed.
2 &3 are recommended because this will reduce the amount of 'historical data' that cleanupDatabase must deal with. When cleanupDatabase runs it must delete the soft-deleted objects and all of the child objects. So for example if a job is being hard deleted by cleanupDatabase, the child job_run and job_run_event entries must be deleted. This could be millions of rows, and this can induce the blocking behaviour as running jobs are trying to write to these tables that cleanupDatabase is acting on.
cleanupHistoricalData does not have this same issue so its safer to have cleanupHistoricalData remove the bulk of the data before cleanupDatabase does the hard deletes for the soft-deleted objects.
4 - Ensure you are running the bmc-provided gather stats (Oracle only)
Information is on this page under "Gather statistics"
5 - Create additional indexes on foreign keys (Oracle Only)
Information is on this page under "Additional Indexes"
4&5 will improve overall database and cleanup performance
6 - Reduce the blasadmin cleanup MaxObjectsPurgedPerTransaction from 1000 down to smaller number in the range 100 to 500. This reduces the number of objects deleted in a batch during cleanup database which results in smaller and shorter commits and a smaller likelihood of blocking.
What you may want to do is run the historical cleanups on a daily basis or a few times during the week and then run the cleanupDatabase/fileserver/sharedobject cleanups on the weekends or some other low activity period .
The referenced page is recommended reading https://docs.bmc.com/docs/display/public/bsa83/Cleaning+up+the+BMC+Server+Automation+database
(This post has been created from original information provided by Bill Robinson - credit where it's due )
Many thanks for the the response.
We have already carried out the recommendations set by Bill. (excluding the first as it is for 8.3)
We were wanting to know whether an error message would appear in the oracle / bladelogic log for a locked database.
Our worry is towards scheduled jobs that occur out of hours, we wish to create a service management incident to invoke out call out procedures, to enable us to resolve the issue before an impact is felt throughout the business.
oracle or sqlserver ?
Bill, it's Oracle based on mentions in previous posts in this thread.
Yes it's Oracle, sorry for the delay.
There won't necessarily be a 'locked' message in the database that is that clear. you should get your dba to generate ADDM and AWR reports that cover about an hour or so around when this occurs. from there we can tell what queries are causing a problem.
how many of the steps in the wiki page have you been able to perform so far?
From my knowledge we have completed all the steps from the 8.2 wiki.
Our cleanup has been running for around a week now without any issues; if we get a lock I will request that both of the reports are generated and will upload them.
Many thanks all.
It seems we are not licensed for ADDM and AWR reports.
In the event the issue reoccurs would a statspack report suffice?
I’m not familiar w/ what that provides. If the problem happens again I’d say get the stats pack covering the time of the problem (or a couple hours of the problem) and we’ll see if it gives us something useful.