My blog post 12 Steps to a systems approach to diagnostics covers a complete list of diagnostics that can be used to check the overall health of enterprise applications. In this post, I will share thoughts on how features of the BMC Atrium CMDB Suite map to them, some of the lesser known diagnostic features, and ways to validate the overall health of the application.
Verify the Product Installation and Environment
It is important to verify the product is successfully installed or upgraded before you start using it. For this reason, there is a post-install routine included in the installer to verify the installation was successful by unit testing a few features. Did you know this tool can also be run months later to verify the features are still working? Launch AtriumCoreMaintenanceTool from the server and access the Health Check page. See Performing a Health Check in the product documentation for more details on it. This is an example of a component test diagnostic. The Health Check can take some time to complete, so be sure to perform it on a development server first to understand the impact and duration before running it on a production system.
Another useful tool to bring back from install time is the BMC Remedy Pre-Checker for Remedy ITSM 8.1 (unsupported) which checks environmental dependencies before running an ITSM install or upgrade. The ITSM dependencies are generally a superset of the Atrium CMDB install dependencies. Some of these values may have been modified since installation time, so this utility can quickly point to those discrepancies. The Pre-checker is what I referred to as a configuration analyzer. It is available in communities instead of bundled in the product. That is one way to address the need for automatic updates to analyzers - the latest version is posted in the community. This allows additional verification options to be added easily based on experience after the product version is released.
Update: March 2014: See Trending in Support: The Pre-checker becomes Remedy Configuration Check Utility for a change in BMC Atrium CMDB Suite version 8.1.01 - including an expansion of the functionality to cover some post-installation configuration checks.
Verifying Job Run Status
Most of the work performed in BMC Atrium CMDB is performed by jobs. To quickly assess the health of the system, you can perform these steps to view the Job History:
- Access Atrium Integrator, set ‘Show all Job Runs With Status’ to “Failed”, then select each job and check whether it encountered a failure. The default behavior is to show the past week of job runs.
- Access Normalization, set ‘Show all Job Runs With Status’ to “Failed”, then select each job and check whether it encountered a failure.
- Access Reconciliation, set ‘Show all Job Runs With Status’ to “Failed”, then select each job and check whether it encountered a failure.
An understanding of the implementation strategy helps in performing this analysis. For example, on several occasions when checking the job run status, the issue turned out to be a job that was running unexpectedly. In these cases, it was more helpful NOT to filter on Failed job runs and to view all the job runs instead. Another reason to look at the latest job runs rather than only failures is that a job may have completed but encountered failures on some CIs. Checking the last run of the job to see whether it was successful, how many errors were encountered, and when the next run is scheduled is a better way to evaluate the health of particularly relevant business processes.
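As a rough illustration of that reasoning, the sketch below evaluates the latest run of each job instead of filtering on failures alone. It does not call any BMC API. The `JobRun` record, its field names, and the status strings are assumptions made for this example, and the rows would have to be transcribed or exported from the Job History views by whatever means is available.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class JobRun:
    """One row of job history; every field here is hypothetical."""
    job_name: str
    started: datetime
    status: str                      # assumed values: "Success" or "Failed"
    ci_errors: int                   # CIs that failed even though the job completed
    next_scheduled: Optional[datetime] = None


def latest_runs(runs):
    """Return the most recent run for each job name."""
    latest = {}
    for run in runs:
        current = latest.get(run.job_name)
        if current is None or run.started > current.started:
            latest[run.job_name] = run
    return latest


def assess(run, expected_jobs):
    """List findings for a job's latest run, looking beyond a plain Failed filter."""
    findings = []
    if run.job_name not in expected_jobs:
        # A job that runs but is not part of the implementation strategy.
        findings.append("unexpected job")
    if run.status != "Success":
        findings.append("last run " + run.status.lower())
    elif run.ci_errors > 0:
        # The job completed, but some CIs were not processed cleanly.
        findings.append(f"completed with {run.ci_errors} CI errors")
    if run.job_name in expected_jobs and run.next_scheduled is None:
        findings.append("no next run scheduled")
    return findings or ["healthy"]
```

Calling `assess` on each value returned by `latest_runs`, with the set of job names the implementation is expected to run, surfaces unexpected jobs and partially successful runs that a Failed-only filter would hide.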
Below is a screenshot of the Reconciliation Job History for a particular job, where it shows the job runs daily and has succeeded each time it was run in the last week:
Integration / Federation / Connectivity unit tests
Some integrations for BMC Atrium CMDB Suite involve connecting to Atrium, and others involve connecting from Atrium. Connectivity tests are more meaningful when performed from the consuming or acting application. For example, Atrium Integrator may retrieve data from many different data sources to import into CMDB, and since Atrium Integrator is the actor, it is best to check the status there. In the case of Atrium Web Services, you can verify it is working, but unit testing the integration means testing from the consuming application.
Atrium CMDB Suite integrates with other applications as a consumer via Federation. One useful way to unit test these integrations is to maintain a single CI – such as a particular computer system – which integrates to each of the federated data sources. This allows a quick unit test by viewing the CI in Atrium Explorer and launching to each federated source.
Data model and data quality
Two of the least known diagnostics to test the overall health of the product are command-line utilities accessed from the server – cdmchecker and cmdbdiag.
Cdmchecker is what I referred to as a “meta data consistency checker” – it checks for issues in the CMDB class definition and underlying ARSystem form definitions. Making changes to class definitions in Atrium Core Class Manager makes corresponding changes to the ARSystem forms and workflow, which is the designed behavior and rarely encounters issues. Discrepancies can occur when migrating workflow changes via BMC Remedy Developer Studio, or when failed changes are left in place and encountered later. Cdmchecker provides a quick way to validate the Common Data Model definitions and either confirm or deny issues are present in the class definitions. Learn more about cdmchecker in the documentation.
Cmdbdiag is another command-line utility, used for identifying or correcting issues in the instance data. This utility is included in the product and can be run from the system console. It is an administrator-only diagnostic which can check for and address data quality issues by running queries against the database to identify data which is incorrect, or to perform bulk data updates. These updates run directly against the database, so they run quicker and go around validation workflow, but it is important to make a database backup and perform the steps carefully because user error can impact a large amount of data. Learn more about cmdbdiag in the documentation.
Diagnostic collection and error analysis
AtriumCoreMaintenanceTool is accessed from the server. Clicking Zip Logs or Zip All Logs collects system, configuration, and log files into a zip file. See Collecting diagnostics using Log Zipper for more information about how to run it. The Maintenance Tool is used by many BMC Software products, and the procedure described in this video gives more details on how to analyze the files captured.
There is no error log analyzer used with Atrium CMDB Suite at this time. Log analysis does not appear to be a productive way to evaluate the overall health of BMC Atrium CMDB Suite, for a few reasons:
- The other methods described above provide a better overview of what is working or not working in the application.
- Errors are recorded to different files, with file names based on the job and the maximum size of the logs.
- Processes are multi-threaded and process CIs in parallel.
- Actions which fail and are subsequently fixed still report the original error.
- Log files outside the time of interest must be archived or removed, which is a manual process at present.
It is possible that future improvements in these areas may make investment in a log analyzer more productive. Today, log analysis is better suited to investigating an encountered issue than to testing overall product health.
End-to-end system validation
As described in 12 Steps to a systems approach to diagnostics, an end-to-end system validation is most successful when implemented as part of a solution which drives or limits post-installation deployment options. The features of BMC Atrium CMDB allow for easy automation, so such a validation is possible to build, but nothing is productized. The effectiveness of such a diagnostic is largely influenced by picking use cases that are meaningful but not disruptive. For example, to test functionality of:
- Importing computer systems from an external data source
- Normalization of the computer system
- Reconciling it into BMC.ASSET
An Atrium Integrator job could be scheduled to import a single computer system and update it in a source dataset. This could trigger normalization via In-line Normalization, and the Reconciliation job could be either scheduled or continuous to update the computer system with a value passed in from the import. The single computer system could be the same one described above which is used to test federation. Would this be a meaningful test to automate? Not necessarily. Most data updates will involve a more substantial amount of data to be loaded, will be scheduled to occur in appropriate windows, and errors encountered would reflect data and workflow constraints. Automating a single transaction to occur frequently may point to simple issues such as a job not running, but may also produce false positives when an operation is slow because it occurs at the same time as the business processes it is meant to test.
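To make the false-positive concern concrete, here is a minimal sketch of how the result of such a single-CI test might be classified. It assumes the start and finish time of each stage has already been captured somehow. The `StageResult` record, the duration limit, and the idea of configured bulk load windows are all hypothetical and not product features.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta
from typing import Optional


@dataclass
class StageResult:
    """Outcome of one stage (import, normalize, reconcile) for the test CI; hypothetical."""
    stage: str
    started: datetime
    finished: Optional[datetime]     # None means the stage never completed


def in_window(moment, window):
    """True if the time of day falls in a (start, end) window that does not cross midnight."""
    start, end = window
    return start <= moment.time() < end


def classify(result, max_duration, load_windows):
    """Return 'ok', 'failed', 'slow', or 'slow (bulk load window)' for one stage."""
    if result.finished is None:
        return "failed"
    if result.finished - result.started <= max_duration:
        return "ok"
    # A slow run that overlaps scheduled bulk processing is a likely false positive.
    if any(in_window(result.started, w) for w in load_windows):
        return "slow (bulk load window)"
    return "slow"
```

Separating "slow during a known load window" from plain "slow" is one way to keep the test from raising alarms about the very business processes it is meant to watch. It does not make the single transaction any more representative of a real data load.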
A BMC Atrium CMDB Systems Approach to Diagnostics
In this post, I covered the main features used to take a systems approach to diagnosing issues with BMC Atrium CMDB Suite, including:
- AtriumCoreMaintenanceTool Health Check and Pre-checker to validate the system configuration and installation
- Job History for Atrium Integrator, Normalization, and Reconciliation
- Atrium Explorer to view a test Computer System for validating federation
- Cdmchecker to validate the data model and class definitions
- Cmdbdiag to validate and correct instance or relationship data
- Log Zipper to collect relevant diagnostics and view them
I also discussed a few diagnostics which are not used to take a systems approach to diagnosing issues - at least not yet - with BMC Atrium CMDB, including:
- Error analyzer
- End-to-end system validation
and some of the challenges to making those kinds of diagnostics relevant and meaningful.
This post represents my own opinions and experiences and does not necessarily represent BMC Software's position, strategies or opinions. I am interested in feedback on your own experiences. What diagnostic steps or utilities have you found useful to isolate an issue in the product by evaluating the overall health of the product? What features would you like to see in the product? Please add your comments below.