There are some commands in either JobLogItem or LogItem (unreleased commands) that will dump out the text of all entries that show up in the job run log. Would that be helpful here?
Maybe, if I can find a way to use that in BSARA Reporting... but short of parsing the job run logs for the line saying "Exit Code XX", I don't think this value is transferred anywhere else in the ETL, from what you said... I'll try that command and see what it gives me though...
If you run the jobrunevent ETL, that info would be in there. That has the potential to eat up a lot of DB space.
The nsh script will return the exit code of whatever runs last. If it's non-zero, it's a failure.
What about creating and/or setting a custom property value based on the exit code?
The problem would be that there is no place to put the property. I did some work for another customer that wanted to be able to have a ticket number in the property of every job run and then be able to pull that data into the reports.
The problem is that instances do not have their own customizable class. So, when they added the ticket_number property to the job, the value in reports would always be the most recent ticket number that the property was updated to.
For example: I have a job. I update the ticket_number property on the job to be 001 and run the job. I then repeat the same action for the next 6 job runs, changing the ticket number each time (002 through 007). I then run the reports ETL and run a job run report against that job, including the ticket_number field. All of the runs will report ticket_number=007.
Just change ticket_number in the above to “exit code.” There is no way to create properties in the jobRun class today.
There just doesn’t seem to be a good way to do what you are doing until they expose the jobRun class to custom properties.
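To make the limitation concrete, here is a minimal Python sketch of the behavior described above. The `Job` and `JobRun` classes and the `report` function are hypothetical stand-ins, not BladeLogic objects; the only thing carried over from the discussion is that the property lives on the job and a run has nowhere to store its own copy.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    """Hypothetical stand-in for a job that carries custom properties."""
    name: str
    properties: dict = field(default_factory=dict)


@dataclass
class JobRun:
    """Hypothetical stand-in for a job run: it only points back at its job."""
    job: Job
    run_number: int


def report(runs):
    """Mimic a report that joins each run to the job's current property value."""
    return [(r.run_number, r.job.properties.get("ticket_number")) for r in runs]


job = Job("incident-resolution")
runs = []
for n in range(1, 8):
    job.properties["ticket_number"] = f"{n:03d}"  # update the property, then run
    runs.append(JobRun(job, n))

rows = report(runs)
print(rows)  # every run shows the latest value, "007"
```

The same thing would happen with an exit code stored as a job property: each run would report whatever value was written last.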
Hi Adam, I think I'm actually working for the company you're talking about.
As you mentioned, there's no way to create a custom property for the job run record... But, we may have found a way to work around the issue I described above.
We can adapt our reports to parse the job_run_event records of each job_run, extract the line matching 'Exit Code %' in the message field, and then take a substring of that result to get just the exit code. I tested this concept manually with a DB query directly in the OM database and it's returning what we need.
What I find weird though (or don't understand) is that there is a field in the job_run table called "overall_exit", which is ALWAYS set to 0 for all of our job_runs... What is this field supposed to contain, and why is it showing 0 even though the job failed and returned errors? I would have thought the exit value of the NSH script would be the value of that field... but no.
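The extraction step could look roughly like the Python sketch below. It assumes the message text has the shape "Exit Code" followed by an integer; the function name and the sample messages are made up for illustration and are not taken from the product.

```python
import re

# Assumed message shape: the text "Exit Code" followed by an integer,
# e.g. "Exit Code 3". Adjust the pattern if the real log line differs.
EXIT_CODE_RE = re.compile(r"Exit Code\s+(-?\d+)")


def extract_exit_code(messages):
    """Return the exit code from the first matching message, or None."""
    for message in messages:
        match = EXIT_CODE_RE.search(message)
        if match:
            return int(match.group(1))
    return None


# Made-up sample messages for a single job run.
sample = ["Started running the script", "Exit Code 3", "Job finished"]
code = extract_exit_code(sample)
print(code)
```

Returning None when no line matches keeps runs without an "Exit Code" message distinguishable from runs that exited with 0.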
You probably are. No doubt the data you are looking for is in the DB. But since BMC does not support directly querying the database, I can’t really suggest that as an option. ☹ But, from my experience with the product, I believe overall_exit refers to the actual termination and close-out of the job run from a BladeLogic standpoint. For instance, if BladeLogic experienced some error while trying to gather the exit code of the NSH script and couldn’t report on that piece, then there would be an exit code other than zero, indicating that the BL system was unable to successfully complete the job run. The key would be to open a ticket with support to pose the question to engineering.
Indeed, my friend.
Ok I see. But my point was that, provided the ETL transports the job_run_event data to the Data Warehouse in BSARA (which I believe it does), we could adapt the report in Cognos to add a custom filter that fetches the job_run_event record matching 'Exit Code %' for a given job_run and returns that value in a column we can use for our trending.
I'm actually talking about the Incident Resolution jobs that we devised if it rings a bell... What Peter and you were working on was to be able to see the ticket number in our report, but the exit code is another issue entirely.
The problem is that we now have reports of all the job runs, but not the actual exit code... So if a job fails, we can't determine how it failed other than by looking at the logs of that specific job run. For upper management, this means they can't have accurate trending and cost-saving reports done... They see the job is failing 20% of the time but don't know it's not always a programming error (e.g. sometimes the target is just not available).
The other solution we looked at was to simply have all of our NSH jobs write a one-line entry to a file (one file per job run, with a special naming convention) in a centralized depot, and then we could parse those files using another script and build a more meaningful report... However, this data wouldn't be transported to BSARA... so we're assessing whether it would be a good solution or not...
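A rough Python sketch of the parsing side of that idea follows. The `*.exit` file naming, the `job__run` stem format, and the file content (a single integer) are all assumptions made up for the example, and a temporary directory stands in for the centralized depot.

```python
import tempfile
from collections import Counter
from pathlib import Path


def collect_exit_codes(depot):
    """Read every *.exit file in the depot and map its file stem to an exit code."""
    results = {}
    for path in sorted(Path(depot).glob("*.exit")):
        text = path.read_text().strip()
        try:
            results[path.stem] = int(text)
        except ValueError:
            # Skip files that do not hold a single integer.
            continue
    return results


# Demo with a throwaway directory standing in for the centralized depot.
with tempfile.TemporaryDirectory() as depot:
    for name, code in [("jobA__run1", 0), ("jobA__run2", 2), ("jobB__run1", 2)]:
        Path(depot, f"{name}.exit").write_text(f"{code}\n")
    codes = collect_exit_codes(depot)

summary = Counter(codes.values())  # how many runs ended with each exit code
print(codes)
print(summary)
```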
Yeah. I am completely with you there, and understand why you would want it. I am not a savvy Cognos guy, so I can’t really assist in that regard. But, I think that even if you could report on the error code, all you might get is ‘1’ for failure in most cases. This could be because an agent isn’t available or because of a programming error. But, again, I am not sure.
I did a simple SQL query (MSSQL) on our OM, and got the results I was expecting.
select distinct message from dbo.job_run_event where message like 'Exit Code %'
It showed me several non-zero exit codes that seem to match the NSH exit codes programmed in our scripts...
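For anyone who wants to try the same kind of filter without touching the OM database, here is a small Python sketch against an in-memory SQLite table. SQLite is only a stand-in for MSSQL here, and the table is a simplified assumption: the `job_run_id` column and the sample rows are invented, and only the `message` column and the LIKE pattern come from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified table: only the message column is known from the thread.
conn.execute("CREATE TABLE job_run_event (job_run_id INTEGER, message TEXT)")
conn.executemany(
    "INSERT INTO job_run_event VALUES (?, ?)",
    [
        (1, "Exit Code 0"),
        (2, "Exit Code 2"),
        (3, "Exit Code 2"),
        (3, "Some unrelated message"),
    ],
)

# Same LIKE filter as the query above, grouped to count runs per message.
rows = conn.execute(
    "SELECT message, COUNT(*) FROM job_run_event "
    "WHERE message LIKE 'Exit Code %' "
    "GROUP BY message ORDER BY message"
).fetchall()

# Strip the fixed prefix to keep just the numeric code.
counts = {int(message[len("Exit Code "):]): n for message, n in rows}
print(counts)
conn.close()
```

Counting per exit code like this is the sort of breakdown that would separate, say, unreachable targets from script errors, provided the scripts use distinct codes for each.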