Redesign TrueSight

Version 2

     

    1 Introduction

     

    First of all, this document should be understood as a proposal.

     

    In my years of experience I have learned a lot about TrueSight. As I wrote in a recent post, TrueSight is definitely heading in the right direction, and there is a lot I like about it. But it cannot be denied that some areas of TrueSight are not well designed. It is not my intent to criticize the work of the developers; I am sure they had the best intentions to create a good product. Perhaps they did not know how the product would be used in a productive, dynamic environment.

     

    In my opinion, some steps are needed to take TrueSight to the next level and keep it competitive. Some areas of TrueSight need to be redesigned: mainly, the UIs are not as usable as they should be, and some APIs should be either developed or improved.

     

    As I have no insight into BMC's workflows, I can only guess what the difficulties are. I can imagine that time-to-market pressure, new technologies, and late or shallow insight into other vendors' products (for competitive reasons) are the main stumbling blocks that keep BMC from developing a perfect product.

    Microsoft System Center Operations Manager is designed specifically for Microsoft products, and it cannot be denied that Microsoft knows its own products best. Likewise, VMware vRealize Operations Manager comes from the vendor that knows virtualization best.

     

    So the product must set itself apart from the others and generate more value for the customer. Today TrueSight works well as a monitoring tool when it is the one and only tool in use. Therefore BMC should choose another way: TrueSight should concentrate on being the central point for monitoring metrics and event data, so the aim should be to create integrations for all the specialized monitoring products. Of course TrueSight should retain its capability to perform monitoring tasks itself. From a technical point of view, the PATROL Agent is still a good instrument for monitoring. It offers broad platform support and the flexibility to monitor a variety of components: OS-specific ones, middleware, databases, and applications.

     

     

    2 TrueSight for Monitoring (PATROL)

     

    As mentioned in the introduction, the PATROL Agent is a great tool for monitoring. It offers broad platform support and the flexibility to monitor a variety of components: OS-specific ones, middleware, databases, applications, etc. BMC should keep it as the main monitoring tool. There is not much to redesign, but the PATROL Agent should be enhanced in some areas. As published on BMC Communities, Sentry Software has announced that it is implementing a RESTful API for the PATROL Agent. It is a pity that this API is not being implemented by BMC itself. Other areas concern the integration of the PATROL Agent into the TrueSight environment. These areas are described in more detail in the following sections.

     

    2.1 PATROL Agent RESTful API

     

    The RESTful API should provide all the functionality needed to

     

    • gather information about the PATROL Agent Status
    • gather configuration information
    • get data
    • get logfiles and other debug data
    • perform configuration
    • restart/reinitialize the agent

     

    The API must be secure: because it is designed to control all aspects of the PATROL Agent, a lot can be broken through it.
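
    To make the security requirement more concrete, here is a minimal sketch of how the operations listed above might be mapped to endpoints, each with a minimum required role. All paths, role names and the `is_allowed` helper are hypothetical; they are not taken from the Sentry Software or any BMC implementation.

```python
from enum import IntEnum


class Role(IntEnum):
    """Hypothetical access levels, ordered from least to most privileged."""
    VIEWER = 1      # read status, configuration and data
    OPERATOR = 2    # additionally fetch logfiles and debug data
    ADMIN = 3       # additionally change configuration and restart


# Hypothetical endpoint table: (HTTP method, path) -> minimum role required.
ENDPOINTS = {
    ("GET", "/agent/status"): Role.VIEWER,
    ("GET", "/agent/config"): Role.VIEWER,
    ("GET", "/agent/data"): Role.VIEWER,
    ("GET", "/agent/logs"): Role.OPERATOR,
    ("PUT", "/agent/config"): Role.ADMIN,
    ("POST", "/agent/restart"): Role.ADMIN,
}


def is_allowed(role: Role, method: str, path: str) -> bool:
    """Return True only if the endpoint exists and the role is sufficient."""
    required = ENDPOINTS.get((method.upper(), path))
    if required is None:
        return False  # unknown endpoints are denied by default
    return role >= required
```

    The point of the sketch is the deny-by-default rule: read-only operations are separated from the ones that can break the agent, and anything not explicitly listed is refused.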

     

     

    2.2 Monitoring Configuration

     

    In my experience, in most cases there is a default configuration for each server type. For example, there is a default monitoring configuration for all Solaris servers. Different components run on these Solaris servers: databases, middleware and applications. This inevitably leads to individual monitoring configurations. The configuration system should therefore be based on inheritance: the server gets the default monitoring configuration and then the individual deviations from and/or extensions to it. Today this is possible to a limited extent by using the "Precedence" when configuring Monitoring Policies in TrueSight. However, the way of configuring should be improved, because often you have to copy a whole policy and cannot just configure the delta.
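
    The inheritance idea can be sketched as a recursive merge of a default configuration with a delta. This is only an illustration of the concept, not how TrueSight stores policies; the function `apply_delta`, the monitor names and the threshold values are all made up for the example.

```python
from copy import deepcopy


def apply_delta(base: dict, delta: dict) -> dict:
    """Return a new configuration: base overlaid with the values in delta.

    Nested dictionaries are merged recursively; any other value in delta
    replaces the inherited value. Neither input is modified.
    """
    merged = deepcopy(base)
    for key, value in delta.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_delta(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical default for all Solaris servers.
solaris_default = {
    "cpu": {"warning": 80, "alarm": 95},
    "filesystem": {"warning": 85, "alarm": 95},
}

# Hypothetical delta for a database server:
# one changed threshold and one additional monitor.
db_delta = {
    "filesystem": {"warning": 90},
    "oracle": {"tablespace_alarm": 90},
}

db_config = apply_delta(solaris_default, db_delta)
```

    With this approach only `db_delta` has to be maintained for the database servers; a later change to `solaris_default` is inherited automatically the next time the delta is applied.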

     

    Another source of difficulty is the versions of the Monitoring Solutions configured in a policy. In the current design it is not possible to change the version of a monitor in a policy. The policy must either be exported and the version changed in the JSON structure, which risks breaking the monitoring configuration, or be re-created from scratch. This part should be redesigned to allow version changes in policies. When the version of a monitoring solution is changed in a policy, a kind of wizard should tell the user which critical configuration settings have to be changed or added. This might conflict with BMC's plans to automate the distribution of Monitoring Solutions to the PATROL Agents based on the Monitoring Policies. But even then, being able to change the version is vital to keep the monitoring solutions up to date.
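
    What such a wizard would have to compute can be sketched as a comparison between the settings each version requires and the settings present in the policy. This assumes that each solution version can list its required settings, which is an assumption of this sketch and not a documented TrueSight feature; the function and setting names are hypothetical.

```python
def version_change_report(old_required: set, new_required: set,
                          configured: dict) -> dict:
    """Compare the required settings of two solution versions.

    old_required / new_required: names of the settings each version needs.
    configured: settings currently present in the policy (name -> value).
    """
    have = set(configured)
    return {
        # Needed by the new version but missing from the policy.
        "to_add": sorted(new_required - have),
        # Configured today but only used by the old version.
        "obsolete": sorted(have & (old_required - new_required)),
        # Configured today and still used by the new version.
        "kept": sorted(have & new_required),
    }


# Hypothetical example: a database monitor changes its connection settings.
report = version_change_report(
    old_required={"host", "port", "sid"},
    new_required={"host", "port", "service_name", "wallet_path"},
    configured={"host": "db01", "port": 1521, "sid": "ORCL"},
)
```

    Here `report["to_add"]` lists what the user must still configure before the version change can be applied safely.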

     

    2.3 PATROL Agent Debugging

     

    Debugging a PATROL Agent in a TrueSight environment is not an easy task, especially if there is no other access to the hosts where the PATROL Agents are installed. Therefore it should be possible from the TrueSight infrastructure to

     

    • get (download) the PATROL Agent logfiles
    • create a PATROL Agent Diagnostic Report (patroldiag)
    • set PATROL Agent debug
    • access the system output window (issow.log) easily

     

    All of this should be possible from today's "Managed Devices" view.

     

    3 Third-Party Integrations

     

    There are many tools out there. Some are specialized monitoring tools for a specific product range or technology, e.g. VMware vRealize Operations Manager. Others are general-purpose monitoring tools like Icinga2, CheckMK, Nagios, etc. There are also other enterprise monitoring tools like HP OpenView, Microsoft System Center Operations Manager, etc., and of course the new fancy stuff like Prometheus. TrueSight should be able to collect metrics and event data from such monitoring tools. This would strengthen TrueSight as a single point of view. To be precise, the user would need only one console for all monitoring tools: all of them would be under one roof.
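
    One way to picture "all monitoring tools under one roof" is a thin adapter layer that translates each tool's events into one common record. The sketch below is purely illustrative: the `Event` fields, the input dictionary layout and the severity mappings are assumptions made for the example and do not reflect the real payloads of any of the tools mentioned.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical common event record inside the central console."""
    source: str      # originating tool, e.g. "icinga2"
    host: str
    severity: str    # one of: "ok", "warning", "critical", "unknown"
    message: str


# Hypothetical severity mappings; real tools use their own vocabularies.
SEVERITY_MAPS = {
    "icinga2": {0: "ok", 1: "warning", 2: "critical", 3: "unknown"},
    "scom": {"Information": "ok", "Warning": "warning", "Error": "critical"},
}


def normalize(source: str, raw: dict) -> Event:
    """Translate one tool-specific payload into the common Event record.

    Severities that the mapping does not know are reported as "unknown"
    instead of being dropped.
    """
    severity = SEVERITY_MAPS[source].get(raw["severity"], "unknown")
    return Event(source, raw["host"], severity, raw["message"])


events = [
    normalize("icinga2", {"host": "web01", "severity": 2, "message": "disk full"}),
    normalize("scom", {"host": "ad01", "severity": "Warning", "message": "replication lag"}),
]
```

    Once every integration produces the same record, the console can list, filter and correlate events without caring which tool raised them.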

     

     

    4 TrueSight UI

     

    The TrueSight UI has to be redesigned, first of all with regard to usability but also to meet all the new requirements.

     

    4.1 Usability

     

    The TrueSight UI should be improved, redesigned or rewritten in the following areas:

     

    • Event View
      • Compared to the "old" Impact Explorer (IX), the Event View still lacks performance, especially when handling a large number of events.
      • The view should be more flexible in displaying slots. The Event Table Views improve this a bit, but it is still not as user-friendly as it could be. Each user should be able to define their own default Event Table View, which is not possible today.
    • Groups
      • It should be possible to create groups of devices based on all the selection criteria that are available for Agent Selection Criteria.
    • Device View
      • A "Health at a glance" view for the selected device would be helpful. It should show the device's most important KPIs in a simple view, and it should be possible to hide it with a single click.
      • The accordion should not collapse after attribute details (performance graph) are displayed.
      • It should be possible to compare metrics of different devices and to "save" such comparisons.
      • It should be possible to export graphs and the underlying data (e.g. as CSV).
    • Managed Devices View
      • See points mentioned under 2.3
    • Infrastructure Policies
      • There should be an option to group policies for a better overview.
      • In addition to Monitoring, Blackout and Staging policies, Tag policies should also be possible.
      • As mentioned under 2.2, it should be possible to inherit from a policy and change only the delta.
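
    As an illustration of the data export requested for the Device View, the following sketch renders metric samples as CSV using only the Python standard library. The column names and the sample tuple layout are assumptions for the example, not an existing TrueSight export format.

```python
import csv
import io


def metrics_to_csv(samples) -> str:
    """Render (timestamp, device, metric, value) tuples as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["timestamp", "device", "metric", "value"])
    writer.writerows(samples)
    return buffer.getvalue()


# Hypothetical samples from two devices, e.g. for a side-by-side comparison.
csv_text = metrics_to_csv([
    ("2024-01-01T00:00:00Z", "srv1", "cpu_util", 42.5),
    ("2024-01-01T00:00:00Z", "srv2", "cpu_util", 17.0),
])
```

    Keeping the device name as a column means the same export also covers the comparison of metrics across several devices.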

     

    4.2 Integrations

     

    There should be an individual section for each integrated system, e.g. a section for SCOM, one for Icinga2, etc. In those sections it should be possible to access the data the different tools provide: for example, the Alerts in SCOM (plus metric data if possible), the Problems (Events) in Icinga2 (plus performance data if possible), etc.