Have not done research on this, but I’d think it would depend on what you are doing with BladeLogic.
If you simply want to scan for error messages, the trace line will contain [ERROR] (followed by the stack trace):
[24 Jul 2012 17:58:24,513] [Scheduled-Job-Tasks-Thread-2] [ERROR] [System:System:] [Schedule Monitor] An error occurred while attempting to access the database:
[24 Jul 2012 17:58:29,264] [Scheduled-System-Tasks-Thread-10] [ERROR] [System:System:] [Registry Monitor] Exception creating connection to: BMC-4SRCZN1; nested exception is:
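A rough sketch of that kind of scan, in Python. The bracketed field layout (timestamp, thread, level, user:role, component, message) is inferred only from the two sample lines above, and the names LINE_RE and find_errors are made up for illustration, so adjust it if your log format differs:

```python
import re

# Layout assumed from the sample lines above:
# [timestamp] [thread] [LEVEL] [user:role:] [component] message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[(?P<thread>[^\]]+)\] \[(?P<level>[A-Z]+)\] "
    r"\[(?P<who>[^\]]*)\] \[(?P<component>[^\]]+)\] (?P<msg>.*)$"
)


def find_errors(lines):
    """Yield (timestamp, component, message) for every [ERROR] trace line.

    Lines that do not match the assumed layout (for example the stack
    trace lines that follow an error) are skipped.
    """
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m and m.group("level") == "ERROR":
            yield m.group("ts"), m.group("component"), m.group("msg")
```

You would feed it an open file handle for whichever appserver log you are watching, e.g. `for ts, component, msg in find_errors(open(path)): ...`, where `path` is your own log location.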
If you want to monitor the Java heap size to spot patterns where free memory fades away and could eventually cause the appserver to run out of it, then perhaps you would monitor these messages:
[24 Jul 2012 16:57:42,843] [Scheduled-System-Tasks-Thread-20] [INFO] [System:System:] [Memory Monitor] Total JVM (B): 167968768,Free JVM (B): 39785664,Used JVM (B): 128183104,VSize (B): 350150656,RSS (B): 289345536,Used File Descriptors: 2235
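A minimal sketch of pulling the numbers out of such a line, in Python. It assumes the "name: value" pairs are always comma-separated the way they are in the sample above; the names MEM_RE and parse_memory_line are hypothetical:

```python
import re

# Assumes the layout of the sample above: a leading [timestamp], the
# [Memory Monitor] tag, then comma-separated "name: value" pairs.
MEM_RE = re.compile(r"^\[(?P<ts>[^\]]+)\].*\[Memory Monitor\] (?P<body>.*)$")


def parse_memory_line(line):
    """Return (timestamp, {field name: int value}) or None if not a match."""
    m = MEM_RE.match(line.strip())
    if not m:
        return None
    stats = {}
    for field in m.group("body").split(","):
        name, sep, value = field.partition(":")
        value = value.strip()
        # Keep only well-formed numeric fields; ignore anything unexpected.
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return m.group("ts"), stats
```

From the returned dictionary you can track "Free JVM (B)" against "Total JVM (B)", or watch "Used File Descriptors" creep up, and ship those numbers to whatever monitoring tool you already use.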
If you have a Job scheduled to run and you want to avoid as many issues as possible, then the following could be done upfront against the targets (via respective jobs):
- Check that the agent is alive and licensed (Update Server Properties Job), and review the ones that are no longer active.
- Check that the correct authorizations are granted for the role that’s about to run the job (right-click on objects / Update Permissions)
To keep the fileserver/database from growing out of control (this applies more to large organizations), a regular cleanup routine needs to be implemented.
Review the documentation/best practices/recommendations on scaling/sizing/appserver configuration for the size of your infrastructure. This should be available in the docs section.
This is only the tip of the iceberg; I think this thread will get more responses...
you may also want to look at the jconsole and jmxcli to see if there is anything useful in there that you want to see. monitoring the heap may not be worth it - you can run w/ a high percentage of the heap allocated w/o any issues, or something could push you just over the threshold and then you get an OOM. scraping the appserver logs won't really get you anything predictive, other than maybe the open files count from the memory monitor, i think. once you start seeing the error messages, you are reactive, not predictive.
Agreed, but if you see the heap climbing progressively over time, you can predict that at some point it may run OOM (theoretically). In reality, a good configuration (best practices/white papers), scheduled cleanup and maintenance jobs (ACLs, update properties, etc.), and a test lab environment should minimize how often you end up in a reactive state in prod, though some of that is inevitable.
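To make the "progressively climbing" idea concrete, here is a crude Python sketch that fits a straight line through (time, used bytes) samples and estimates how long until a ceiling is reached. The function names and the ceiling value are assumptions for illustration; heap usage normally sawtooths with garbage collection, so treat the result as a hint at best, not a prediction:

```python
def slope_per_second(samples):
    """Least-squares slope of used bytes over time.

    samples is a list of (epoch_seconds, used_bytes) pairs, e.g. built
    from the "Used JVM (B)" values in the Memory Monitor lines.
    """
    n = len(samples)
    if n < 2:
        return 0.0
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    num = sum((t - mean_t) * (u - mean_u) for t, u in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    return num / den if den else 0.0


def seconds_until(samples, ceiling_bytes):
    """Estimate seconds until usage reaches ceiling_bytes, or None if not growing.

    ceiling_bytes is whatever limit you choose (for instance your
    configured max heap); it is not read from the log.
    """
    rate = slope_per_second(samples)
    if rate <= 0:
        return None
    last_used = samples[-1][1]
    return max(0.0, (ceiling_bytes - last_used) / rate)
```

A longer sampling window (hours or days rather than minutes) smooths out the garbage collection noise, at the cost of reacting more slowly.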
When I say "predictive" I mean a problem is occurring but has not crippled the system to the point users notice, but I appreciate the point and context of your humor.
so, am I to assume that the jconsole mentioned above doesn't ship with BSA, and I would need to download a java developer kit to get access to it? Is there anything about how to do monitoring with those tools written down somewhere before I trudge off to try and decode it myself? I prefer to not reinvent the wheel if I can help it.
I apologize for not intuiting this stuff; it just seems the documentation and tools for these products are of the enigma-wrapped-in-riddle-shrouded-in-mystery class of solutions.
there should be a 'bljconsole' binary in the NSH/br directory on the appserver/gui. there is a not well documented jmxcli that is a command line interface to the same. so once you figure out what you might want to monitor, you can do something w/ the jmxcli.
there's stuff on the OS to watch that is important - open files creeping up, clocks staying in sync, overall memory utilization. within bsa though i would say it's more complicated to watch. if you start seeing db errors you are most likely screwed, and if you start seeing the appservers unable to talk to each other, you're also screwed (meaning the env is already broken) - you might be able to do some monitoring of the java heap, but as i mentioned that is not so easy to monitor.