
You can use the SLA to change the status of your applications in the TSPS console, and you can use the AM KM with TrueSight Infrastructure Management (TSIM) to raise an alert when an event threshold is breached n times in a row.

 

The SLA is used in TSPS to update the application status.  However, the SLA sees only the most recent 5-minute results window; it cannot see anything earlier in the history of the application.  So if a failed result in that 5-minute window is enough to breach the SLA, the application status is set to the appropriate value.
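To make the window-only behavior concrete, here is a minimal sketch in Python. It is not the actual TSPS logic: the function name, the parameter names, and the "strictly higher than the threshold" comparison are assumptions for illustration, and it covers only the error percentages, not latency.

```python
def sla_status(window_results, threshold_pct):
    """Return 'red' if the share of failed runs in one window exceeds the threshold.

    window_results: list of booleans for a single 5-minute window, True = failed run.
    threshold_pct: SLA error threshold as a percentage (hypothetical parameter name).
    """
    if not window_results:
        # Assumption: a window with no results is treated as green.
        return "green"
    failure_pct = 100.0 * sum(window_results) / len(window_results)
    return "red" if failure_pct > threshold_pct else "green"


# Only the current window is passed in; earlier windows are never consulted.
print(sla_status([True], 35))                             # red
print(sla_status([True, True, False, False, False], 75))  # green
```

Because the function receives a single window, a failure in an earlier window has no effect on the current status.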

 

You may also have your AM KM or SMR set to trigger an event if an event threshold is breached n times in a series.  In this scenario, the event is triggered only after the nth consecutive breach.
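The "n times in a series" rule can be sketched as a simple streak counter. This is an illustration only, not the AM KM or SMR implementation: the class name and method are hypothetical, and it assumes the event is raised once, at the moment the streak reaches n.

```python
class ConsecutiveBreachAlert:
    """Hypothetical helper: raises an event on the nth breach in a row."""

    def __init__(self, n):
        self.n = n
        self.streak = 0  # number of breaches seen in a row so far

    def observe(self, breached):
        """Record one run; return True if this run raises an event."""
        if breached:
            self.streak += 1
        else:
            self.streak = 0  # any successful run resets the series
        # Assumption: the event fires once, when the streak first reaches n.
        return self.streak == self.n


alert = ConsecutiveBreachAlert(2)
events = [alert.observe(b) for b in [True, True, False, False, False]]
print(events)  # [False, True, False, False, False]
```

Unlike the SLA, this check carries state from one run to the next, which is why the two can disagree.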

 

Please remember that the SLA and the AM KM are two separate items and evaluate the data in different ways.

 

Below are a few use cases to demonstrate how the SLA and the AM KM or SMR work.  Please note that I am only discussing critical notifications in these use cases, not minor ones:

 

AM KM and SMRs are configured to alert if there is a failure 2 times in a row

The Execution Plan runs once every 5 minutes and takes 1 minute to complete

The SLA has the following configuration:

     1.  Latency - 60 seconds

     2.  Accuracy Errors - 35%

     3.  Execution Errors - 35%

     4.  Availability Errors - 35%

 

8:00 AM - Execution Plan starts

8:01 AM - Execution Plan stops with 1 Accuracy error

8:01 AM - AM KM and SMR see 1 error.  No events are raised

8:05 AM - TSPS Console receives the results.  The SLA evaluates the results and sees 1 Accuracy error in the 1 run in the window.  This means there is 100% failure.  This is higher than the 35% threshold.  The Application status is set to red.

 

8:05 AM - Execution Plan starts

8:06 AM - Execution Plan stops with 1 Accuracy error

8:06 AM - AM KM and SMR see 1 error.  This is the second error in a row, so an event is raised and a notification is sent out as defined in the AM KM policy.

8:10 AM - TSPS Console receives the results.  The SLA evaluates the results and sees 1 Accuracy error in the 1 run in the window.  This means there is 100% failure.  This is higher than the 35% threshold.  The Application status is set to red.

 

As you can see in the above scenario, the Application status was changed to critical (red) after the first run of the Execution Plan, but the user was not alerted by the AM KM or SMRs until the 2nd run of the Execution Plan.

 

Let’s look at a second use case where a user receives a notification from the AM KM or SMR, but the application status remains green:

 

AM KM and SMRs are configured to alert if there is a failure 2 times in a row

The Execution Plan runs every minute and takes 30 seconds to complete

The SLA has the following configuration:

     1.  Latency - 45 seconds

     2.  Accuracy Errors - 75%

     3.  Execution Errors - 75%

     4.  Availability Errors - 75%

 

8:00:00 AM - Execution Plan starts

8:00:30 AM - Execution Plan stops with 1 Accuracy error

8:00:30 AM - AM KM and SMR see 1 error.  No events are raised

 

8:01:00 AM - Execution Plan starts

8:01:30 AM - Execution Plan stops with 1 Accuracy error

8:01:30 AM - AM KM and SMR see 1 error.  This is the second error in a row, so an event is raised and a notification is sent out as defined in the AM KM policy and SMR Rule.

 

8:02:00 AM - Execution Plan starts

8:02:30 AM - Execution Plan succeeds

8:02:30 AM - AM KM and SMR see 0 errors.  No events are raised

 

8:03:00 AM - Execution Plan starts

8:03:30 AM - Execution Plan succeeds

8:03:30 AM - AM KM and SMR see 0 errors.  No events are raised

 

8:04:00 AM - Execution Plan starts

8:04:30 AM - Execution Plan succeeds

8:04:30 AM - AM KM and SMR see 0 errors.  No events are raised

 

8:05 AM - TSPS Console receives the results.  The SLA evaluates the results and sees 2 Accuracy errors in 5 runs.  This means there is 40% failure.  This is lower than the defined 75%.  The Application status remains green.

 

As you can see in the above scenario, the Application status was never changed because the combined results from the 5-minute results bucket did not meet the criteria for the SLA.  However, we did see 2 event threshold failures in a row, which means that the user was sent a notification from TSIM.
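Both use cases can be replayed with a small Python sketch that applies the two checks to the same list of run results. This is a simplified model under stated assumptions, not product code: `replay` and its parameters are hypothetical names, the alert is assumed to fire once when the streak reaches n, and the SLA is assumed to compare the window's failure percentage against the threshold with a strict "higher than".

```python
def replay(run_failures, consecutive_n, sla_threshold_pct):
    """Replay the runs of one 5-minute window (True = failed run).

    Returns (alert_runs, failure_pct, status), where alert_runs lists the
    1-based run numbers at which the consecutive-failure alert would fire.
    """
    streak = 0
    alert_runs = []
    for i, failed in enumerate(run_failures, start=1):
        streak = streak + 1 if failed else 0  # a success resets the series
        if streak == consecutive_n:
            alert_runs.append(i)
    # The SLA side looks only at the aggregate of this window.
    failure_pct = 100.0 * sum(run_failures) / len(run_failures)
    status = "red" if failure_pct > sla_threshold_pct else "green"
    return alert_runs, failure_pct, status


# Second use case: five one-minute runs, the first two fail, SLA threshold 75%.
print(replay([True, True, False, False, False], 2, 75))  # ([2], 40.0, 'green')

# First window of the first use case: one failed run, SLA threshold 35%.
print(replay([True], 2, 35))  # ([], 100.0, 'red')
```

The first call shows a notification with a green status; the second shows a red status with no notification yet.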

 

I hope this post helps you understand how this situation happens and how to investigate it.  See more blogs at TrueSight Support Blogs

 

Lisa Jahrsdoerfer