Share This:

Welcome to the first blog on CMDB in the year 2020. First off, I'd like to acknowledge the efforts of my colleagues Devendra Borse & Varun Patwardhan for reviewing the blog contents and Manish Dwivedi for sharing the Performance Stress Test results and valuable inputs on PerfMonitor.  In this blog, we cover information on fine tuning BMC CMDB Reconciliation performance, which we hope will assist you in handling such situations  Below are the sub-topics covered in this blog:

 

[A] Introduction – CMDB Reconciliation Performance

[B] Establishing a CMDB Reconciliation Performance Benchmark

[C] How to handle CMDB Reconciliation performance degradation using standard recommendations from BMC?

[C.1] Identify the uniqueness of the Reconciliation Job performance issues, and Remediate them

[C.2]  Are there unwanted CI data participating in CMDB Reconciliation job?

[D] How to report performance issues to BMC Support?

 

As Application and Database performance issues are broad topics, let’s look at the ones which are not covered in this blog:

 

  • Network related issues
  • Crashes of the ARRECOND process

 

We will now proceed with discussing the planned topics.

 

[A] Introduction – CMDB Reconciliation Performance

 

Reconciliation is one of the most important and resource intensive activities within the BMC CMDB application.  I’ve detailed some vital topics around this subject, which I hope will assist in fine tuning its performance.

 

[B] Establishing a CMDB Reconciliation Performance Benchmark

 

Like any other database application, the best way to ascertain Reconciliation job performance is by measuring.  It involves executing the Reconciliation jobs, measuring the test results by its activities in terms of the number of Configuration Items (CIs) identified and merged per second.  This exercise shall assist you in arriving at a performance benchmark for CMDB Reconciliation.  Some of the test results conducted in our lab are as following:

 

IMPORTANT: Please note, the test results are expected to vary somewhat in every environment.  The below figures are only a result of the sample tests performed in BMC labs, and should not be considered as official confirmation of a definite count of Configuration Items (CI) to reconcile, even given a similar load and architecture.

 

Test Results Spreadsheet.PNG

 

You can record similar stats from your recent runs of TEST and PROD environments' Reconciliation jobs, and arrive at a benchmark. The benchmark can alter over a period of time depending on CI data count, fine tuning queries, indexing and other AR and database server parameters affecting performance.  

 

Please find below instructions on establishing a benchmark, data volumes to use, methodologies etc

 

https://docs.bmc.com/docs/brid1908/bmc-remedy-itsm-suite-19-02-solution-performance-benchmarks-879731271.html

 

[C] How to handle CMDB Reconciliation performance degradation using standard recommendations from BMC?

 

While performing the test of running CMDB Reconciliation jobs in order to establish a benchmark, if you notice a performance degradation, please visit our standard configuration for AR server and Database server as following

 

https://bmcsites.force.com/casemgmt/sc_KnowledgeArticle?sfdcid=kA014000000h9kqCAA&type=FAQ

 

 

If the performance of CMDB Reconciliation remains slow even after applying the standard recommendations, it must be investigated from an AR Server, Database server and Reconciliation Job perspective.  Based on our past experience, we’ve put down some steps in identifying the slowness issue, for remediation.

 

[C.1] Identify the uniqueness of the Reconciliation Job performance issue

 

There are three main components involved during the running of the Reconciliation Job:  AR System, Database & the Recon job configuration, including CI data that it reconciles.  I'd suggest the following approach to understand which component is causing the performance issue and resolving it.

 

[C.1.a] Is the slow performance of the Recon job faced as a result of huge numbers of Configuration Items (CI) to process?

 

You don't have control on the number of legitimate CI to reconcile, but you can use certain measures to eliminate those with erroneous conditions, and by using qualifications. Firstly, find out the total number of CIs to process, using the following SQL queries, and then compare the count with the benchmark you established – in terms of time taken by Recon job for a similar load.  That is to estimate the time to finish the job.

 

For Identification activity

 

SELECT COUNT(*) FROM <schema>.<class name> WHERE DatasetID != 'BMC.ASSET' AND ReconciliationIdentity = ‘0’

 

For Merge activity

 

SELECT COUNT(*) FROM <schema>.<class name> WHERE DatasetID != 'BMC.ASSET' AND ReconciliationMergeStatus = '40'

OR

SELECT COUNT(*) FROM <schema>.<class name> WHERE DatasetID = '<Dataset ID>' AND ReconciliationMergeStatus = '40'

Example:

SELECT Count(*) from dbo.BMC_CORE_BMC_ComputerSystem where DatasetID != 'BMC.ASSET' and ReconciliationMergeStatus = '40'

SELECT COUNT(*) FROM <schema>.<class name> WHERE DatasetID = 'BMC.ADDM’' AND ReconciliationMergeStatus = '40'

 

You can further optimize the query by adding more conditions, but without removing the core ones like 'ReconciliationMergeStatus', 'ReconciliationIdentity'                                    

                                       

Remediate:  After you finish running the Reconciliation job, If you find the total time taken to reconcile a given number of CIs is inconsistent with the Benchmark spreadsheet, then AR and Database server configuration must be revisited and fine-tuned.  If the Recon job performance is as good as the benchmark, there isn't much one can do, however there are a few steps to eliminate the unwanted CIs for processing:

 

  • You can relieve the stress on the Recon job by eliminating those CIs off its radar which are the victim of data errors like duplicate CI, orphan relationships etc. We have a different sub-topic in this blog by the name 'Who is participating in Recon Job?' that you can refer to, for effectively handling those data errors.

 

  • To limit the count of CIs and improve the data quality, you may choose to reconcile only normalized CIs.

 

  • Lastly, qualifications can be used in the Recon job Identification activity to limit the number of CIs to process.

 

[C.1.b] Is the Recon Job performance issue faced because there are not enough threads configured on the AR server Recon private queue to cater to the huge number of CIs?

                                                      

Symptoms: If you notice from Reconciliation engine log (arrecond.log) that only a few threads, like 1-2 of them, are used for a job that has a big count of CIs to reconcile, then you likely have insufficient threads.  In other words, if you notice good performance in the database server when running SQL queries, and that your Database administrator confirms available bandwidth of database engine to accept additional connections from the AR server, the slowness could be narrowed down to insufficient AR server thread configuration.  In some cases, adding a higher number of CPU cores may be an option to address the situation.                                                   

 

Remediate: BMC recommends using a Private RPC queue, '390698', for the Atrium CMDB Reconciliation engine process. In order to maintain the balance of threads among other processes on the AR Server, we recommend a number of (1.5 * Number of available CPU) for thread configuration purposes.

 

  • How to find total number of CPU core in Windows and Linux OS? 

 

In Windows server OS, the information on the total number of available CPU can be obtained by visiting the 'Performance' tab of Task Manager

 

CPU Pic.png

 

 

  • In Linux, the same information can be gained using the command ‘lscpu’ at the linux prompt                                                                                                 

 

Linux Number of CPU.png

 

 

 

Below is a table of recommended thread settings per CPU core   

 

Number of CPU

Max Threads

Recommended

4 CPU Core

6

4

8 CPU Core

12

8

         

         

                                       

                        

  • A screenshot of displaying the Thread settings for CMDB Reconciliation private queue

 

Private Queue 390698.png

 

  • General recommendations for threads per CPU core, and other performance indicators for Remedy applications

 

                                             https://bmcsites.force.com/casemgmt/sc_KnowledgeArticle?sfdcid=kA014000000h9kqCAA

 

                        

                                                                                                                                                                                             

 

  • When can you possibly increase the number of CPU cores on the AR server?     

 

Increase the number of CPU cores on AR server if it can benefit reconciliation via increasing the number of threads, especially when the database (on the database server) can accommodate the additional connections, and can also effectively handle the load of SQL queries. More information on handling of database performance is documented under the sub-topic 'Is the Recon Job performance issue faced because of performance issue on Database server?'

 

  • How can you monitor  the CPUs performing as against the thread settings?

 

Please engage your System Administration team who can monitor CPU performance using advanced tools like PerfMonitor on Windows OS, or some CPU management related commands in Linux, to monitor the CPU performance while Recon job runs.

 

There are various videos on using PerfMonitor other than the below one from Microsoft that can search on the web, and use it to gather information on various performance counters.  Once this data is gathered, you can consult your System Administrator in case there's a need for additional CPU cores, or configure Thread settings effectively in AR Server for CMDB Reconciliation private queue.

 

https://techcommunity.microsoft.com/t5/windows-admin-center-blog/introducing-the-new-performance-monitor-for-windows/ba-p/957991

 

Below are some of the counters used by our QA team to monitor CPU performance using PerfMonitor

 

Important counters from AR server perspective are as follows:

 

#Processor Counters

"\\ServerName\Processor(*)\% Processor Time"

"\\ServerName\Processor(*)\% User Time"

"\\ServerName\Processor(*)\% Idle Time"

"\\ServerName\Processor(*)\% Interrupt Time"

"\\ServerName\System\Processor Queue Length"

#Memory Counters

"\\ServerName\Memory\Available MBytes"

"\\ServerName\Memory\Page Faults/sec"

"\\ServerName\Memory\Page Reads/sec"

"\\ServerName\Memory\Page Writes/sec"

"\\ServerName\Memory\Page Writes/sec"

"\\ServerName\Memory\Pages Input/sec"

"\\ServerName\Memory\Pages Output/sec"

"\\ServerName\Memory\Pages/sec"

#Process Counters

"\\ServerName\Process(*)\ID Process"

"\\ServerName\Process(*)\% Processor Time"

"\\ServerName\Process(*)\% User Time"

"\\ServerName\Process(*)\Private Bytes"

"\\ServerName\Process(*)\Working Set"

"\\ServerName\Process(*)\Thread Count"

#PhysicalDisk Counters

"\\ServerName\PhysicalDisk(*)\% Idle time"

"\\ServerName\PhysicalDisk(*)\% Disk time"

"\\ServerName\PhysicalDisk(*)\% Disk Read Time"

"\\ServerName\PhysicalDisk(*)\% Disk Write Time"

"\\ServerName\PhysicalDisk(*)\Avg. Disk sec/Read"

"\\ServerName\PhysicalDisk(*)\Avg. Disk sec/Write"

"\\ServerName\PhysicalDisk(*)\Avg. Disk Queue Length"

"\\ServerName\PhysicalDisk(*)\Avg. Disk Read Queue Length"

"\\ServerName\PhysicalDisk(*)\Avg. Disk Write Queue Length"

#LogicalDisk Counters

"\\ServerName\LogicalDisk(*)\% Idle time"

"\\ServerName\LogicalDisk(*)\% Disk time"

"\\ServerName\LogicalDisk(*)\% Disk Read Time"

"\\ServerName\LogicalDisk(*)\% Disk Write Time"

"\\ServerName\LogicalDisk(*)\Avg. Disk sec/Read"

"\\ServerName\LogicalDisk(*)\Avg. Disk sec/Write"

"\\ServerName\LogicalDisk(*)\Avg. Disk Queue Length"

"\\ServerName\LogicalDisk(*)\Avg. Disk Read Queue Length"

"\\ServerName\LogicalDisk(*)\Avg. Disk Write Queue Length"

#Network Interface Counters

"\\ServerName\Network Interface(*)\Bytes Total/sec"

"\\ServerName\Network Interface(*)\Bytes Sent/sec"

"\\ServerName\Network Interface(*)\Bytes Received/sec"

#"\\ServerName\Network Interface(*)\Bytes/sec"

"\\ServerName\Network Interface(*)\Output Queue Length"

 

Important counters from Database server perspective are as follows:

 

#Processor Counters

"\\ServerName\Processor(*)\% Processor Time"

"\\ServerName\Processor(*)\% User Time"

"\\ServerName\Processor(*)\% Idle Time"

"\\ServerName\Processor(*)\% Interrupt Time"

"\\ServerName\System\Processor Queue Length"

#Memory Counters

"\\ServerName\Memory\Available MBytes"

"\\ServerName\Memory\Page Faults/sec"

"\\ServerName\Memory\Page Reads/sec"

"\\ServerName\Memory\Page Writes/sec"

"\\ServerName\Memory\Page Writes/sec"

"\\ServerName\Memory\Pages Input/sec"

"\\ServerName\Memory\Pages Output/sec"

"\\ServerName\Memory\Pages/sec"

#Process Counters

"\\ServerName\Process(*)\ID Process"

"\\ServerName\Process(*)\% Processor Time"

"\\ServerName\Process(*)\% User Time"

"\\ServerName\Process(*)\Private Bytes"

"\\ServerName\Process(*)\Working Set"

"\\ServerName\Process(*)\Thread Count"

#PhysicalDisk Counters

"\\ServerName\PhysicalDisk(*)\% Idle time"

"\\ServerName\PhysicalDisk(*)\% Disk time"

"\\ServerName\PhysicalDisk(*)\% Disk Read Time"

"\\ServerName\PhysicalDisk(*)\% Disk Write Time"

"\\ServerName\PhysicalDisk(*)\Avg. Disk sec/Read"

"\\ServerName\PhysicalDisk(*)\Avg. Disk sec/Write"

"\\ServerName\PhysicalDisk(*)\Avg. Disk Queue Length"

"\\ServerName\PhysicalDisk(*)\Avg. Disk Read Queue Length"

"\\ServerName\PhysicalDisk(*)\Avg. Disk Write Queue Length"

#LogicalDisk Counters

"\\ServerName\LogicalDisk(*)\% Idle time"

"\\ServerName\LogicalDisk(*)\% Disk time"

"\\ServerName\LogicalDisk(*)\% Disk Read Time"

"\\ServerName\LogicalDisk(*)\% Disk Write Time"

"\\ServerName\LogicalDisk(*)\Avg. Disk sec/Read"

"\\ServerName\LogicalDisk(*)\Avg. Disk sec/Write"

"\\ServerName\LogicalDisk(*)\Avg. Disk Queue Length"

"\\ServerName\LogicalDisk(*)\Avg. Disk Read Queue Length"

"\\ServerName\LogicalDisk(*)\Avg. Disk Write Queue Length"

#Network Interface Counters

"\\ServerName\Network Interface(*)\Bytes Total/sec"

"\\ServerName\Network Interface(*)\Bytes Sent/sec"

"\\ServerName\Network Interface(*)\Bytes Received/sec"

#"\\ServerName\Network Interface(*)\Bytes/sec"

"\\ServerName\Network Interface(*)\Output Queue Length"

#SQL Server Counters

"\\ServerName\MSSQL$CMDBPERF:General Statistics\User Connections"

"\\ServerName\MSSQL$CMDBPERF:SQL Statistics\Batch Requests/Sec"

"\\ServerName\MSSQL$CMDBPERF:SQL Statistics\SQL Compilations/sec"

"\\ServerName\MSSQL$CMDBPERF:SQL Statistics\SQL Re-Compilations/Sec"

"\\ServerName\MSSQL$CMDBPERF:Buffer Manager\Buffer cache hit ratio"

"\\ServerName\MSSQL$CMDBPERF:Locks(_Total)\Average Wait Time (ms)"

"\\ServerName\MSSQL$CMDBPERF:Locks(_Total)\Lock Timeouts/sec"

"\\ServerName\MSSQL$CMDBPERF:Locks(_Total)\Lock Wait Time (ms)"

"\\ServerName\MSSQL$CMDBPERF:Locks(_Total)\Lock Waits/sec"

"\\ServerName\MSSQL$CMDBPERF:Locks(_Total)\Number of Deadlocks/sec"

 

 

 

[C.1.c] Is the Recon Job performance issue faced because of performance issues on the Database server?         

 

Symptoms:       If the database server responds to SQL queries slowly while running the Reconciliation job, and as a result keeps the AR server threads in waiting mode, then the performance issue can be narrowed down to the database side.  Slow database performance can be due to multiple reasons, some of which are mentioned below

 

  • Long running queries due to the usage of incorrect index used
  • Worker threads not available to take the requests
  • I/O delays

                                       

Due to its complex nature, to understand the real cause of the database performance, please engage your Database Administrator (DBA) team.  From your side, you can help identify long running queries.

 

[i] Identifying Long Running Queries

 

  • Identify long running queries on the Remedy Database -                            

Usage of Remedy's Log Analyzer tool

 

Remedy log Analyzer tool is a very useful tool to find long running API calls, and there by SQL statements when analyzing the AR server side (SQL+API+Filter) logs captured at the time of AR Server performance issue.  For those who are still new to this tool, please go through the below video.

                                                                                                         https://www.youtube.com/watch?v=bYK1PFyNwr0

 

 

  • Identify long running queries on the Remedy Database -                                           

Usage database tools like Oracle Automatic Workload Repository (AWR) or Microsoft SQL Server Database Tuning Advisor

 

Please engage your Database Administrator to generate these reports. The benefit of generating these reports over AR Server logs is that it gives an overall picture of how the database server is performing not only from the Remedy application standpoint, but also to other applications it may be serving.

                                                                                           

               To understand more about Oracle AWR report, please go through the below link:

               https://www.oracle.com/technetwork/database/manageability/diag-pack-ow09-133950.pdf

 

               To understand more about MS SQL Server Tuning Advisor, please go through the below link:

               https://docs.microsoft.com/en-us/sql/relational-databases/performance/start-and-use-the-database-engine-tuning-advisor?view=sql-server-ver15

 

[ii] Understanding the reason behind time taken

 

  • Is it a missing index?
  • Are queries waiting on a lock to be released?
  • Is it the physical disk I/O or Memory to temporary store the query results or CPU taking time to process the query?
  • Any other unknown reason

 

The validation of the above topics requires ownership by and engagement of your Database Administrators. However, from our past experience, we’d suggest the following approach.

  • Using Database tools, identify the set of SQL queries returning the results slowly.

 

  • Are those SQL queries related to Reconciliation job activities, the Remedy application in general or an external application?  It is easy to identify Reconciliation Job queries as they are run against CMDB class forms, for example BMC_CORE_BMC_BaseElement, BMC_CORE_BMC_BaseRelationship, BMC_CORE_BMC_ComputerSystem etc

 

  • If your DBA identifies long running queries on CMDB class forms, then have them advise on the creation of an index(s) that can help speed up query results.   Index fields and their sequence is decided by a DBA.

 

  • Indexes can be created on the CMDB Forms directly using CMDB Class Manager.  But we recommend to first create the indexes directly on the database tables.  Please take appropriate backup of the database before making changes to database schema.  After its successful testing (that also includes satisfactory query response time), you may later choose to drop that index from the database, and create it on the form instead.  The benefit of creating of the indexes on Remedy forms is that they're retained during upgrade of the application.

 

  • Once the index is created, the DBA should run the queries a few times with the new index outside of Recon job first to gather the stats.  If the results are satisfying, you can proceed to run the Recon job.

 

  • At this point, before attempting to run Recon job to test the new indexes, the DBA must evaluate the rest of the available indexes on those tables to ensure they don’t overlap and hence may take priority over the newly created ones, when running the Recon job.  Basically, your DBA must ensure that with new indexes, the SQL queries will use them even during the Recon job.  This is needed to confirm because neither the BMC Remedy AR server or CMDB application have API features that can force a particular index with a SQL query it generates.

 

  • When running the Recon job, if you notice proper usage of the index, yet the slowness persist, the DBA needs to investigate other parameters like Disk I/O, Memory usage, CPU consumption or database locks, unless the DB tool report still suggests some long running queries which are newer or the previous ones.

 

  • The indexing improvisation process continues until all the long running queries are fine tuned.

 

 

  [iii] Index design Consideration & other general tips

 

  • Understand the characteristics of the most frequently used queries. For example, knowing that a frequently used query joins two or more tables (like Child CMDB classes - a join of BMC_BaseElement and BMC_<child class> will help you determine the best type of indexes to use.

 

  • Microsoft suggests to determine the optimal storage location for the index. A non-clustered index can be stored in the same filegroup as the underlying table, or on a different filegroup. The storage location of indexes can improve query performance by increasing disk I/O performance. For example, storing a non-clustered index on a filegroup that is on a different disk than the table filegroup can improve performance because multiple disks can be read at the same time.

 

  • Use tools like Microsoft Database Engine Tuning Advisor and Oracle's SQL Tuning Advisor to analyze the database and get better index recommendations.

 

  • Avoid large numbers of indexes on the tables with frequent changes in data, like BMC_BaseElement, BMC_BaseRelationship, BMC_ComputerSystem etc

 

  • Avoid using too many columns in indexes.  Use the appropriate ones and its sequence.  Consider the order of the columns if the index will contain multiple columns. The column that is used in the WHERE clause in an equal to (=), greater than (>), less than (<), or BETWEEN search condition, or participates in a join, should be placed first. Additional columns should be ordered based on their level of distinctness, that is, from the most distinct to the least distinct.

 

  • The default cursor_sharing parameter in Oracle 10g is set to exact.

 

  • The Oracle database instance is allocated only a small amount of memory

 

  • SQL Server is allocated insufficient amount of space in the tempdb database

 

  • Avoid using the LIKE operator in queries (Identification rules and Qualification of Reconciliation Job)

 

  • For better performance and results, it is recommend that you use the Reconciliation Merge Order  By class in separate transactions option, and deselect  'Include Unchanged CIs option within Merge activity

 

 

[C.2] Are there any errors for CIs participating in the CMDB Reconciliation activity?

 

You might see performance issues if there are too many CIs failing to identify or merge. The same amount of CIs will go through reconciliation activity during every job run because of failures, which will unnecessarily increase load on Recon Engine, which is also unproductive calls to AR server and the database.

Hence, it’s better in the long-run to resolve those errors instead of ignoring them. Below are the most common errors that you may experience in Reconciliation activity:

 

 

ARERR[120092] The Dataset ID and Reconciliation Identity combination is not unique

          To resolve this error, please follow this KA # https://communities.bmc.com/docs/DOC-72932

 

Investigating issues related to Reconciliation Job

          Issues related to reconciliation jobs - Documentation for BMC CMDB 19.08 - BMC Documentation

 

Found multiple matches (instances) for class

          Follow this KA # https://communities.bmc.com/docs/DOC-108436

 

 

 

[D] How to report performance issues to BMC Support?

 

  • On the AR server running the Recon job, generate AR server side SQL+API+FILTER logs for least 10 minutes to capture slowness.  Also capture the Recon job log during the same time.

                         Setting log files options - Documentation for Remedy Action Request System 9.1 - BMC Documentation

 

  • Run the Atrium Core Maintenance utility tool on the AR server running Reconciliation Engine - grab the logs and config files on the AR server running Recon engine

                          https://docs.bmc.com/docs/ac91/bmc-atrium-core-maintenance-tool-609847389.html

 

  • If possible, run the log analysis using LogAnalyzer tool, and share the output with us

https://www.youtube.com/watch?v=bYK1PFyNwr0

 

Thank you for reading.  Please share your feedback and queries.