
BCO Data Warehouse - an Introduction

 

The Data Warehouse (DWH) is an essential component of BMC Capacity Optimization (BCO) and is managed by the Data Hub component.

 

The Data Hub uses a backend service, the Near-Real-Time Warehouse (NRTWH), to store, organize, and calculate statistics for the collected data. This service is always running inside the Data Hub.

 

The Data Warehouse administration menu lets you inspect and tune the mechanisms that control the warehouse.

To access this menu, go to Administration > Data Warehouse in the Navigation frame.

 

Data Warehouse Activity

The Data Warehouse activity consists of:

 

Data aggregation

  • ETL Tasks collect data into a stage table
  • The warehousing engine calculates aggregations and splits data into summary tables at different time resolution qualities (detail, hour, day, and month)
  • Data is also aggregated based on internal rules (derived rows); this is referred to as hierarchical data aggregation (a short sketch of this roll-up follows the lists below)

Data classification

  • The day and hour class definitions are used to categorize data as specified by the Calendar

Data aging

  • Data classified as old by customizable aging parameters is deleted from the warehouse tables

Custom statistics

  • Users can define additional statistics to perform calculations on data series (for example, percentages or baselines)
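
To make the detail-to-summary roll-up more concrete, here is a minimal sketch in Python, assuming detail samples arrive as (timestamp, object, value) tuples; the object name and the layout are purely illustrative and are not BCO's internal schema. Applying the same grouping step to hourly rows with a day key (and to daily rows with a month key) produces the coarser resolutions.

from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical detail rows (timestamp, object, value); not BCO's actual table layout.
detail_rows = [
    (datetime(2015, 6, 1, 10, 5),  "CPU_UTIL", 0.42),
    (datetime(2015, 6, 1, 10, 35), "CPU_UTIL", 0.58),
    (datetime(2015, 6, 1, 11, 15), "CPU_UTIL", 0.71),
]

def aggregate_hourly(rows):
    """Group detail samples by (object, hour) and reduce each group to summary statistics."""
    buckets = defaultdict(list)
    for ts, obj, value in rows:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[(obj, hour)].append(value)
    # One summary entry per bucket, much like a row in an hourly summary table.
    return {key: {"avg": mean(vals), "min": min(vals), "max": max(vals), "count": len(vals)}
            for key, vals in buckets.items()}

print(aggregate_hourly(detail_rows))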

 

A Data flow report is visible in the Data Warehouse section in the console, or through one of BCO’s Diagnostic reports. This provides a way to check BCO import activity and keep it under control, as the number of rows processed per day is a good indicator of the system's health and ability to process records.

 

When performing historical imports (for example, if you are planning to bulk load more than two million rows), it is strongly recommended to split the ETL data in smaller chunks, limiting the amount of information processed at one time. This prevents congestion in the Near-Real-Time Warehouse engine.
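
As a rough illustration of that chunking approach, the Python sketch below slices a large historical extract into fixed-size batches; the chunk size and the load_chunk() call are hypothetical placeholders for whatever extract and loader your ETL actually uses.

def iter_chunks(rows, chunk_size=500_000):
    """Yield the rows in fixed-size slices so the NRTWH engine never receives the whole history at once."""
    for start in range(0, len(rows), chunk_size):
        yield rows[start:start + chunk_size]

# Hypothetical usage; historical_rows and load_chunk() stand in for your own data and loader.
# for chunk in iter_chunks(historical_rows):
#     load_chunk(chunk)  # let each batch finish before submitting the next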

 

 

 

Data Warehouse (DWH) Reference Model

The Data Warehouse can host different types of data:

 

  • Time series (TS): Performance or business driver data representing metrics over time, with two subtypes:
    • Sampled records, that is, metric samples taken at regular time intervals
    • Delta records
  • Custom structures (CS): Data that represent generic records with custom attributes, with two subtypes:
    • Buffer tables, containing data that is copied into BCO for further processing, but is generally not important for direct analysis
    • Item-level detail tables, containing data that represents details of an item that are important for analysis purposes (for example, errors for a specific page)
  • Object relationships (OBJREL): Data that represent relationships between entities
  • Events (EVDAT): Data that represents events

 

The primary purpose of the DWH is the collection of historical time series data. A time series is a sequence of samples or statistics for a certain measurement, each corresponding to a point in time. The DWH contains both time series samples and statistics, aggregated at different time resolutions (hour, day, and month).

 

All time series are associated with a measured object, described according to a reference model. In its most general form, the reference model for measured objects is displayed in the figure below:

 

[Figure: RefModel.jpg - the reference model for measured objects]

The Reference Model consists of the following components (a small illustrative sketch follows the list):

 

  • An entity represents a single system (for example, a database instance or web server) or a business driver. Entities fall into three categories: Domains, Business Drivers, and Systems (https://docs.bmc.com/docs/display/public/bcmco95/Entity+types); each category has its own set of sub-types that can also be assigned to the entity.
  • An object is a metric for a system or business driver for which data is collected (for example, the CPU utilization of a server or the FTP transfer bit rate).
  • The location tracks the physical location from which a metric was observed (for example, the FTP transfer bit rate when a file is downloaded from New York or from San Diego).
  • A subobject represents finer details of a metric (for example, a metric measuring the free space of a disk could have details about the free space of each disk partition as its subobjects).
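
As a rough mental model only, the relationships above could be sketched in Python as follows; the class and field names are illustrative and do not reflect BCO's internal schema.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Entity:
    name: str
    category: str       # "Domain", "Business Driver", or "System"
    entity_type: str    # sub-type within the category

@dataclass
class Subobject:
    name: str           # finer detail of a metric, e.g. a disk partition

@dataclass
class MeasuredObject:
    entity: Entity
    metric: str                      # standard metric name for the system or business driver
    location: Optional[str] = None   # where the metric was observed, if tracked
    subobjects: List[Subobject] = field(default_factory=list)

# Example: a free-space metric on a server, detailed per partition.
server = Entity("web01", "System", "Web server")
disk_free = MeasuredObject(server, "FREE_SPACE",
                           subobjects=[Subobject("/var"), Subobject("/opt")])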

 

Each available object/metric has a standard name in BCO that adheres to the naming convention defined for BCO datasets. A complete metric listing can be seen in the console under Administration > Data Warehouse > Datasets & metrics, or in the product documentation (https://docs.bmc.com/docs/display/public/bcmco95/Datasets+and+metrics).

 

 

Each metric has a type that defines the unit of measure and how statistics about that data should be collected. The metric types available in BCO version 9.5 are listed in the table below:

 

Type                  Description
GENERIC               Generic counter, absolute value
PERCENTAGE            Percentage counter
COUNT                 A count of events, absolute number
RATE                  A frequency, in events/sec
POSACCUM              Positive accumulation counter
CONF                  Configuration data (string)
NEGACCUM              Negative accumulation counter
ELAPSED               Elapsed time, in seconds
WEIGHTED_GENERIC      Generic counter, absolute value, weighted
PEAK_PERCENTAGE       Peak percentage counter
PEAK_RATE             Peak frequency, in events/sec
DELTA                 Difference between subsequent samples
WEIGHTED_PERCENTAGE   Percentage counter, weighted

 

Measurement units and formats use common standards:

 

Type                  Format
Timestamps            YYYY-MM-DD HH24:MI:SS
Elapsed times         seconds (duration)
Percentage metrics    from 0 to 1
Rate metrics          events/sec
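
If you are preparing data for import, the conventions in the table above come down to a couple of small conversions. The sketch below assumes incoming samples carry native datetime values and 0-100 percentages, which may not match your own source.

from datetime import datetime

def to_dwh_timestamp(ts: datetime) -> str:
    """Format a timestamp using the YYYY-MM-DD HH24:MI:SS convention."""
    return ts.strftime("%Y-%m-%d %H:%M:%S")

def to_dwh_percentage(raw_percent: float) -> float:
    """Convert a 0-100 percentage into the 0-to-1 range used for percentage metrics."""
    return raw_percent / 100.0

print(to_dwh_timestamp(datetime(2015, 6, 1, 14, 30, 0)))  # 2015-06-01 14:30:00
print(to_dwh_percentage(87.5))                            # 0.875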

 

 

In summary, the reference model specifies:

  • The standard for object structure
  • The standard for metric names
  • The standard for measurement units
  • The time granularity, which is automatically adjusted by the data warehouse

 

 

Data Flow Report

The Data flow report page in the BCO console summarizes warehousing operations in terms of imported, derived, reduced, and aged rows, and in terms of duration. This is a daily report, so you can monitor the growth of BCO and check how many rows are stored each day at detail level. Also see https://docs.bmc.com/docs/display/public/bcmco95/Data+flow+report

 

The page and the report provide the following information:

  • Data Load Capacity Used summary:
    • Loaded daily (last 30-day average): number of rows loaded daily according to the data flow report, averaged over the last 30 days.
    • Estimated daily capacity (8-hour period): estimated maximum daily throughput sustainable by the deployment, assuming a processing time of 8 hours.
  • TS: Referring date (timestamp): the day the statistics refer to; current-day statistics are updated at regular intervals
  • Derived Rows: number of derived rows (sum)
  • Processing Count: number of rows processed by all threads (sum)
  • Processing Rows: number of rows processed from stage tables (sum)
  • Processing Throughput: total processing count divided by total processing time (aggregation on all DWH threads)
  • Processing Time [S]: number of seconds spent in processing, with at least one active thread (sum)
  • Reduced Rows: number of rows stored in the D and MDCH tables
  • Split Rows: number of rows generated due to a split, having a "TS+duration" that exceeds the hour limit (sum)
  • Stored Rows Conf: number of rows stored in the CONF_VARIATION table (sum)
  • Stored Rows Day: number of rows stored in the DH table (sum)
  • Stored Rows Detail: number of rows stored in the DETAIL table (sum)
  • Thread Processing Throughput: total number of rows processed, divided by the thread processing time

This report can be very useful to support teams in helping determine where a problem lies!
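
To show how the fields above fit together, here is a back-of-the-envelope calculation in Python; the numbers are invented, and the 8-hour window simply mirrors the "Estimated daily capacity (8-hour period)" definition.

# Invented figures standing in for values read off a Data flow report.
processing_count = 1_200_000   # rows processed by all threads (sum)
processing_time_s = 3_600      # seconds spent in processing (sum)

throughput = processing_count / processing_time_s   # rows per second
estimated_daily_capacity = throughput * 8 * 3600    # rows sustainable in an 8-hour window

print(f"Processing throughput: {throughput:,.0f} rows/s")
print(f"Estimated daily capacity: {estimated_daily_capacity:,.0f} rows")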

 

 

The image below is an example of a Data flow report taken from one of our BCO lab servers:

[Figure: DataFlowReport.jpg - an example Data flow report]

If you completed a sizing exercise with BMC before you installed and started using BCO, you will have a basis for comparing your warehouse to your initial sizing estimates, which relate to the number of rows that the warehouse is able to process. This can be useful in cases where your initial estimates turned out to be too low because gathering capacity information from various ETLs proved popular, and you are now processing more data than the warehouse can comfortably handle.

 

Do you think BCO queries are taking too long?

Generally, we recommend that the DWH processing time should be less than 4 hours, which is set as the default in BCO for your system tasks and ETLs. When processing time exceeds this, you may notice a warning message similar to the one below:

BCO_ETL_WARN301: ETL "task name" [nn] on scheduler [n] is running since yyyy-mm-dd hh-mm-ss. The expected execution time is less than 4 hours.

 

Unless you have a dramatic increase in the number of new entities imported, or are playing catch-up with older historical data, the Processing Count and Processing Time values in the Data flow report should be fairly consistent from day to day, and you should not see this message. If they are not, something else could be going on and should be investigated; a quick check along these lines is sketched below.
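
A very simple way to watch for this is sketched below with invented daily figures; the 1.5x jump threshold is an arbitrary example, not a BCO setting.

FOUR_HOURS_S = 4 * 3600

# Invented daily values; replace with figures read from your own Data flow report.
daily_stats = [
    {"date": "2015-06-01", "processing_count": 1_100_000, "processing_time_s": 9_800},
    {"date": "2015-06-02", "processing_count": 1_150_000, "processing_time_s": 10_200},
    {"date": "2015-06-03", "processing_count": 2_900_000, "processing_time_s": 16_500},
]

previous = None
for day in daily_stats:
    if day["processing_time_s"] > FOUR_HOURS_S:
        print(f"{day['date']}: processing exceeded 4 hours -- worth investigating")
    if previous and day["processing_count"] > 1.5 * previous["processing_count"]:
        print(f"{day['date']}: processed rows jumped sharply versus the previous day")
    previous = day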

 

 

Additional Reference

If you run into problems with the warehouse or Data Hub, these articles may be helpful in resolving the issue, or at least in determining where it lies:

 

  • Datahub page is not responsive and takes several minutes to refresh in GUI / Data warehouse performance issues (KA400067): https://kb.bmc.com/infocenter/index?page=content&id=KA400067
  • The BCO console is taking a long time to complete tasks that at other times were faster. What can be done to determine why the BCO console is running slowly? (KA376940): https://kb.bmc.com/infocenter/index?page=content&id=KA376940
  • How to quickly assess the health of a BCO installation - Daily checks over a BCO installation (KA413301): https://kb.bmc.com/infocenter/index?page=content&id=KA413301

 

 

I hope you have found this article to be useful. Feel free to comment on it, or to suggest other topics of interest for future articles.

Miss a Pulse? BMC ProactiveNet Pulse Blogs

 

thx, timo
