5 Replies Latest reply on Aug 9, 2012 8:58 AM by Bill Robinson

    Errors indicating predictive failure conditions in appserver.log

      My boss has asked me to come up with a way to identify predictive and clear failure conditions found in appserver.log on our BSA servers.


      Our current BSA instance is pretty small -- it was scaled for a larger deployment before there were some strategic changes that minimized the scope of the product in our environment -- so I am not in a position to get a bunch of leading indicator type data from the appserver.log.


      I'm curious if there is some data out there on errors that are written to appserver.log that I can leverage to engage the other automation tools to cut tickets to get assistance for the server... or to initiate some sort of self-healing routine.  For now, I have to wait and try to anticipate failures or prefailure conditions and then capture data from the appserver.log... has anyone already done research on this, and if so, can they share what they know for the general public?




        • 1. Re: Errors indicating predictive failure conditions in appserver.log

          Have not done research on this, but I’d think it would depend on what you are doing with BladeLogic.
          If you simply want to scan for error messages, then the trace line will contain [ERROR] in it (followed by the stack trace):


          [24 Jul 2012 17:58:24,513] [Scheduled-Job-Tasks-Thread-2] [ERROR] [System:System:] [Schedule Monitor] An error occurred while attempting to access the database:
          [24 Jul 2012 17:58:29,264] [Scheduled-System-Tasks-Thread-10] [ERROR] [System:System:] [Registry Monitor] Exception creating connection to: BMC-4SRCZN1; nested exception is:
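
A minimal sketch of that scan in Python, assuming every entry follows the bracketed layout of the two sample lines above (timestamp, thread, level, user/role, component, message). The field names and the helpers `parse_line` and `error_counts` are made up for illustration and are not part of BSA. Stack trace lines do not match the pattern and are skipped.

```python
import re
from collections import Counter
from datetime import datetime

# Layout assumed from the sample lines in this thread:
# [timestamp] [thread] [LEVEL] [user:role:] [component] message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[(?P<thread>[^\]]+)\] \[(?P<level>[A-Z]+)\] "
    r"\[(?P<user>[^\]]*)\] \[(?P<component>[^\]]*)\] (?P<message>.*)$"
)


def parse_line(line):
    """Return a dict of fields for a log entry, or None for any other line."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None  # e.g. a stack trace continuation line
    record = match.groupdict()
    # "24 Jul 2012 17:58:24,513": the part after the comma is milliseconds
    record["ts"] = datetime.strptime(record["ts"], "%d %b %Y %H:%M:%S,%f")
    return record


def error_counts(lines):
    """Count [ERROR] entries per component (e.g. 'Schedule Monitor')."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record is not None and record["level"] == "ERROR":
            counts[record["component"]] += 1
    return counts
```

Feeding it the lines of appserver.log (for example `error_counts(open(path))`, where `path` is wherever your log lives) gives a per-component tally that another tool could turn into a ticket once a count crosses a limit you choose.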


          If you want to monitor the java heap size to see patterns where the free memory fades away and may potentially cause the appserver to run out of it, then perhaps you would monitor these messages:


          [24 Jul 2012 16:57:42,843] [Scheduled-System-Tasks-Thread-20] [INFO] [System:System:] [Memory Monitor] Total JVM (B): 167968768,Free JVM (B): 39785664,Used JVM (B): 128183104,VSize (B): 350150656,RSS (B): 289345536,Used File Descriptors: 2235
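
As a rough illustration, the numbers in a Memory Monitor message can be pulled into a dictionary like this. It assumes the comma-separated `name: value` layout of the sample line above; `parse_memory_line` and `used_heap_fraction` are hypothetical helper names, not anything shipped with BSA.

```python
import re

# Everything after the "[Memory Monitor]" tag is treated as "name: value" pairs
MEM_RE = re.compile(r"\[Memory Monitor\]\s+(?P<fields>.+)$")


def parse_memory_line(line):
    """Return {field name: integer value}, or None if not a Memory Monitor line."""
    match = MEM_RE.search(line.strip())
    if match is None:
        return None
    stats = {}
    for pair in match.group("fields").split(","):
        name, _, value = pair.partition(":")
        value = value.strip()
        if value.isdigit():  # ignore anything that is not a plain integer
            stats[name.strip()] = int(value)
    return stats


def used_heap_fraction(stats):
    """Used JVM bytes as a fraction of the Total JVM bytes in the same message."""
    return stats["Used JVM (B)"] / stats["Total JVM (B)"]
```

For the sample line above this gives a used fraction of roughly 0.76 and a file descriptor count of 2235, both of which can be logged over time.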


          If you have a Job scheduled to run and you want to avoid as many issues as possible, then the following could be done upfront against the targets (via respective jobs):
          - Check that the agent is alive and licensed (Update Server Properties Job), and review the ones that are no longer active.
          - Check that the correct authorizations are granted for the role that’s about to run the job (right-click on objects / Update Permissions)


          To make sure that the fileserver/database does not overgrow in size (this applies more to large organizations), a regular cleanup routine needs to be implemented.


          Review documentation/best practices/recommendations with regards to scaling/sizing/appserver configuration depending on how large the infrastructure is. This should be available in the docs section.


          This is only the tip of the iceberg; I think this thread will get more responses...

          • 2. Re: Errors indicating predictive failure conditions in appserver.log
            Bill Robinson

            you may also want to look at the jconsole and jmxcli to see if there is anything useful in there that you want to see.  monitoring the heap may not be worth it - you can run w/ a high percentage of the heap allocated w/o any issues, or something could set you just over the threshold and then you get an OOM.  scraping the appserver logs won't really get you anything predictive, other than looking at the open files i think from the memory monitor.  once you start seeing the error messages, you are reactive, not predictive

            • 3. Re: Errors indicating predictive failure conditions in appserver.log

              Agreed, but with the heap, if you see it progressively climbing over time, then you may predict that at some point it may run OOM (theoretically). In reality a good configuration setup (best practices/white papers), scheduled cleanup and maintenance jobs (acls, update properties, etc), and a test lab environment should minimize how often you end up in a reactive state in prod, though some of that is inevitable
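
One way to put a number on "progressively climbing" is a plain least-squares line through (time, used heap) samples. This is only a sketch: the helper names are invented, the samples are assumed to come from whatever you collect (for instance the Used JVM figures in the Memory Monitor messages), and `ceiling_bytes` is a limit you pick yourself. As noted earlier in the thread, heap usage can sit high without trouble or jump suddenly, so treat the result as a hint, not a forecast.

```python
def linear_trend(samples):
    """Least-squares fit of used bytes against time.

    samples: list of (seconds, used_bytes) pairs.
    Returns (slope in bytes per second, intercept in bytes).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("all samples have the same timestamp")
    cov = sum((t - mean_t) * (u - mean_u) for t, u in samples)
    slope = cov / var_t
    return slope, mean_u - slope * mean_t


def seconds_until(samples, ceiling_bytes):
    """Seconds after the last sample until the fitted line reaches ceiling_bytes.

    Returns None when usage is flat or falling, 0.0 if the line is already there.
    """
    slope, intercept = linear_trend(samples)
    if slope <= 0:
        return None
    time_at_ceiling = (ceiling_bytes - intercept) / slope
    return max(0.0, time_at_ceiling - samples[-1][0])
```

A monitoring job could raise a low-priority ticket when `seconds_until` drops below some lead time, and stay quiet when it returns None.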

              • 4. Re: Errors indicating predictive failure conditions in appserver.log

                When I say "predictive" I mean a problem is occurring but has not crippled the system to the point users notice, but I appreciate the point and context of your humor.


                so, am I to assume that the jconsole mentioned above doesn't ship with BSA, and I would need to download a java developer kit to get access to it?  Is there anything about how to do monitoring with those tools written down somewhere before I trudge off to try and decode it myself?  I prefer to not reinvent the wheel if I can help it.


                I apologize for not intuiting this stuff, just seems documentation and tools for these products is of the enigma-wrapped-in-riddle-shrouded-in-mystery class of solutions.



                • 5. Re: Errors indicating predictive failure conditions in appserver.log
                  Bill Robinson

                  there should be a 'bljconsole' binary in the NSH/br directory on the appserver/gui.  there is a not well documented jmxcli that is a command line interface to the same.  so once you figure out what you might want to monitor, you can do something w/ the jmxcli.


                  there's stuff on the OS to watch that is important - open files creeping up, times being in sync, overall memory utilization.  w/ in bsa though i would say it's more complicated to watch.  if you start seeing db errors you are most likely screwed, if you start seeing the appservers being unable to talk to each other, you're also screwed (meaning the env is already broken) - you might be able to do some monitoring of the java heap, but as i mentioned that is not so easy to monitor.