My TrueSight experiences so far

Version 2

    Introduction

    We have been working with TrueSight Operations Management, which includes the TrueSight Presentation Server, TrueSight Infrastructure Management and Integration Services as components, for quite a while, so I would like to share my experience with the product. Before I start, some general points.

    First, I want to apologize for my English. It is not my mother tongue, so there will certainly be some mistakes. I would appreciate a comment if something is unclear, and I will try to reword it.

    Second, I might mention issues and unexpected behaviours of the product. Keep in mind that this is my perspective and it need not apply to everyone. Furthermore, when I mention defects in the product, I am not saying that BMC is not trying to fix them. We have experienced good collaboration with BMC during the last few months and we are confident that BMC is working hard to fix defects.

     

    Environment Setup

    OK, let us start with the installation process. We have successfully installed three environments: test, pre-production and production. The test environment is a simple installation consisting of a TSPS, one TSIM and an IS server (actually there are two IS instances on the same server: a staging IS and a "normal" IS). The TSIM uses the embedded Sybase database. We use this environment to test new releases of the TrueSight infrastructure components, to create deployable packages and for KM development (mainly the CMA part for configuration).

    Before anything is installed in production, we first test the components in our pre-production environment. This environment is slightly more complex than the test environment. It consists of a TSPS, two TSIM servers and several servers with multiple instances of staging IS and "normal" IS.

    Oracle is used as the database backend for the TSIM servers.

    The production environment consists of a TSPS, four TSIM servers and several servers with multiple instances of staging IS and normal IS. From a scalability point of view, our setup should be able to handle the number of servers we currently monitor in the PATROL 7 environment. As the database we use Oracle configured with Data Guard.

    Overall we can say that the installation process was easy and did not cause any problems.

     

    TrueSight in operation

    TrueSight was running fine. Of course, we did not have a high number of systems connected to it, and most of the systems integrated into TrueSight belonged to the infrastructure. Shortly after the installation of pre-production (this environment was installed first) we encountered a problem with Atrium SSO. Every time Atrium SSO was down for maintenance, all policies in CMA were disabled. This was caused by a "policy validation task" running on the TSPS. The same happened when a user was deleted (moved out of an authorized group) or when a group was removed. We demanded that BMC change this behaviour immediately! And they did. First they provided us with a workaround: setting the scheduling time of the policy validation task to a high number. This can be done with the command line utility "tssh properties set {propertyName} {propertyValue}". In addition, BMC promised to deliver a hotfix/pointfix to resolve the issue completely. The fix made its way into the 10.1 release of the TrueSight Presentation Server. Not disabling the policies in case of an Atrium SSO disconnect or a user or group "deletion" is now the default behaviour. Users who want the original behaviour, where policies are disabled if a user no longer exists or a group is removed, can still enable it.

     

    Right at the beginning of our project we thought about automation. Although the Integration Service provides staging functionality with staging policies, it lacks the ability to control the distribution based on load and amount. So we decided to write our own automation. As our infrastructure has been installed on Windows, we decided to do it in PowerShell. The PowerShell script uses database queries against the TSPS PostgreSQL database as well as web service calls. The automation creates a staging policy for each new PATROL Agent that connects to the staging IS. The staging policy distributes the PATROL Agents among the IS instances based on the number of agents each one already serves. In almost all cases we implemented IS clusters to provide high availability. Regarding our automation, there are three points I would like to outline:

     

    1. This ends up in a huge number of staging policies.
    2. There is no exchange of information between the TSIM servers. This results in the following: as soon as the staging policy is deployed to the PATROL Agent, the agent disconnects from the staging IS and connects to the IS assigned in the staging policy. If the assigned IS is connected to a different TSIM than the staging IS, the PATROL Agent remains listed as disconnected on the staging IS. This means you either wait until it is deleted by the automatic deletion process (by default after 30 days) or you delete it manually in the CMA console.
    3. Currently there are some display issues in CMA: agents are shown as connected on the staging IS, but in fact they are connected to another IS on another TSIM.

     

    These facts prevent us from speaking of a 'fully automated workflow'. Still, it is much better than it was with the PATROL 7 environment. If you have questions regarding the automation, don't hesitate to contact me.
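
    To illustrate the distribution idea described above, here is a minimal sketch in Python. It is not our PowerShell automation and it does not talk to TSPS at all. The class, the function names and the fields of the policy dictionary are hypothetical and only show the principle: each new agent gets its own staging policy that points to the IS with the fewest agents.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class IntegrationService:
    """Hypothetical stand-in for one IS (or IS cluster); not a TrueSight API object."""
    name: str
    agents: list[str] = field(default_factory=list)


def assign_agent(agent: str, targets: list[IntegrationService]) -> IntegrationService:
    """Assign the agent to the target that currently holds the fewest agents."""
    if not targets:
        raise ValueError("no Integration Service available")
    # min() returns the first target with the smallest count,
    # so ties are resolved in list order.
    target = min(targets, key=lambda t: len(t.agents))
    target.agents.append(agent)
    return target


def staging_policy_for(agent: str, target: IntegrationService) -> dict[str, str]:
    """Describe a per-agent staging policy as plain data (illustrative fields only)."""
    return {
        "policy": f"staging_{agent}",
        "agent": agent,
        "integration_service": target.name,
    }


if __name__ == "__main__":
    pool = [IntegrationService("is-a"), IntegrationService("is-b")]
    for new_agent in ["host1", "host2", "host3"]:
        print(staging_policy_for(new_agent, assign_agent(new_agent, pool)))
```

    The sketch also makes the first point above visible: one policy per agent means the number of staging policies grows with the number of agents.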

     

    Let us move away from the automation to other topics. Compared to BPPM, the TrueSight Presentation Server (TSPS) is a big step in the right direction: a unified console. Today we cannot say that this target has been achieved, but we think BMC is on the right track. For certain tasks, or to fetch some data or information, you are still required to hop onto the TSIM servers, for example to get debug information. Also, the graphing on the TSPS is not quite where it should be. Important information such as hourly maximum/minimum values cannot be displayed today, even though the necessary checkboxes are available (a case with support is open).

     

    Another point of discussion is the visibility of the available attributes. To find out which attributes are available on a monitor, you have to click your way through until you reach the graph settings. All this is hidden behind "three dots". In my opinion this is not user friendly at all. If the TSPS console is meant to be a full replacement for the PATROL Central Console, it must not only provide views optimized for events, it must also provide easy access to the attributes and the related data. One crucial kind of information that cannot easily be accessed, neither through TSPS nor through TSIM, is text parameters. The only way to get the data of a text parameter is with a PSL command through the "Query Patrol Agent" dialog, which is, to be honest, not very handy. Nevertheless, text parameters often hold very important information and BMC should take this seriously.

     

    Let us take a look at groups. Of course there is a nice search function built into TSPS. But often a group is quite handy. In TSPS you can choose between two kinds of groups: manual groups and role based groups. If you want to have a few servers in a group, a manual group is your choice. If you want to create a group of all Windows servers in an environment with several thousand servers, you would want to create a role based group. But today a role based group can only be created based on the device name or the monitor name. This lack of criteria makes it impossible to create a role based group which holds all Windows servers unless they can be separated from other servers by their name. It would be practical if you could use the same criteria to create a role based group as you can for a policy.
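
    The limitation can be shown with a small sketch in Python. The device list and the "os" field are made up for illustration and nothing here is a TSPS function. It only contrasts selecting by a name pattern with selecting by an attribute such as the operating system.

```python
from fnmatch import fnmatchcase

# Hypothetical inventory; in this example only some Windows servers
# follow a naming convention that reveals their operating system.
DEVICES = [
    {"name": "winsrv01", "os": "Windows"},
    {"name": "app-db-07", "os": "Windows"},
    {"name": "lnxsrv01", "os": "Linux"},
]


def group_by_name(devices: list[dict], pattern: str) -> list[str]:
    """Select devices whose name matches a shell-style pattern (case-sensitive)."""
    return [d["name"] for d in devices if fnmatchcase(d["name"], pattern)]


def group_by_os(devices: list[dict], os_name: str) -> list[str]:
    """Select devices by an attribute, the way a policy-style criterion could."""
    return [d["name"] for d in devices if d["os"] == os_name]


if __name__ == "__main__":
    print(group_by_name(DEVICES, "win*"))   # misses app-db-07
    print(group_by_os(DEVICES, "Windows"))  # finds both Windows servers
```

    As soon as one Windows server does not follow the naming convention, the name based selection silently misses it.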

    When we talk about dashboards or dashlets, groups become even more important. Some dashlets use groups as a selection, for example the "Monitors - Top/Bottom Performers" dashlet. Today this dashlet can only be based on TSIM groups. What makes it even worse is that you can only select one TSIM. For example, if you have 4 TSIM servers, you will end up with 4 dashlets to get the top/bottom performers for CPU. But if you could use the TSPS groups, and if you could build role based groups based on the same criteria as policies, then this would be one of the most demanded dashlets, I suppose.
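
    What a single dashlet across all TSIM servers would have to do can be sketched in a few lines of Python. The data layout (a dictionary of readings per TSIM) and the function name are assumptions for illustration; no TSIM or TSPS interface is used here.

```python
import heapq


def top_performers(
    per_tsim: dict[str, dict[str, float]], n: int
) -> list[tuple[float, str, str]]:
    """Merge the readings of all TSIM servers and return the n highest values.

    per_tsim maps a TSIM name to {device name: CPU value}. The result is a
    list of (value, device, tsim) tuples, highest value first.
    """
    merged = [
        (value, device, tsim)
        for tsim, readings in per_tsim.items()
        for device, value in readings.items()
    ]
    return heapq.nlargest(n, merged)


if __name__ == "__main__":
    # Made-up sample values for four TSIM servers.
    sample = {
        "tsim1": {"srv-a": 91.0, "srv-b": 40.0},
        "tsim2": {"srv-c": 97.5},
        "tsim3": {"srv-d": 12.0},
        "tsim4": {"srv-e": 88.0},
    }
    for value, device, tsim in top_performers(sample, 3):
        print(f"{device} ({tsim}): {value}")
```

    With one dashlet per TSIM you only ever see four separate lists and have to do this merge in your head.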

     

    When we come to CMA, we also miss some features. For example, we miss the System Output Window (SOW). The SOW provided us with important information for debugging the PATROL Agent or the KMs. Of course there is the already mentioned "Query Patrol Agent" dialog, but again it is not very handy. And yes, I am aware that you can provide pre-built commands.

    CMA should also provide easy access to functions which are used to collect debug information for BMC support, for example the PATROL Agent Diagnostic report. Today we have to involve the system administrator to get this information.

     

    Conclusion

    TrueSight is definitely going in the right direction. If BMC listens to the users and implements their ideas, it is going to be a good product. Today it is usable. It even simplifies some crucial workflows. But it is still lacking in some places.

    What I wish is that BMC focuses not only on events but also on the analysis of attributes and their data. In fact TrueSight, mainly TSPS, must become a worthy successor to the PATROL Central Console and Impact Explorer in one product. It is not quite there yet.