Right Sizing for Success
In my brief essay on What is Middleware? I touched on some of the advantages and consequences of shifting the complexity of software function out of applications into middleware. In Are You Situationally (Middleware) Aware? I also reflected on how the wrong sorts of automation can make life more difficult for us.
This leads us to think about a right sizing your approach to monitoring. Specifically, using a top-down, bottom-up approach in following the life of a business transaction helps in developing a monitoring strategy which is closely aligned with the goals of your business customers. In a few small steps you are able to quickly find a route to value which your customers will love.
About Transaction Tracing
IT and development organizations need to understand how important business processes are being affected by the services that they provide. Application Performance Management is one area of development operations which provides another tool for service improvement. The APM paradigm identifies transaction tracing as one important part of this. And the type of tracing that is needed goes beyond the simple silo tracing approach to a multi-disciplinary approach.
Tracing transactions is nothing new. Since the dawn of computing there has always been a need to trace transaction steps from either a business or technical point of view. Businesses need trace records to reliably audit business steps that have been performed in order to prove completion or compliance. Technical operations need trace records to reliably trace steps that were taken by an algorithm or service to prove correct and timely processing.
Frequently, tracing paradigms suffer from numerous shortcomings. They may be too wordy and performance-draining, resulting in too much irrelevant data being collected for the ultimate goal of supporting the business. Or they may be too little too late, since human intervention and expertise is required to manually interpret the results. Or, the tracing may be too isolated, since an identified event recorded in one technologies’ trace may be difficult to relate to an upstream or downstream event in another technologies’ trace.
What is new is the need for an effective, real time, correlated view of both the business and technical service steps completed throughout all of the IT services that a business transaction touches. This multi-disciplinary tracing paradigm should be as technology-independent as possible in order to provide for maximum effectiveness. And it should be agile enough to quickly adapt to the changing business and IT landscape.
Today's number of IT components, their configurations, relationships between them, and the constant changes introduced by business and technology needs makes manual tracking difficult. It is impractical and costly to conduct manual audits to maintain IT service relationship and application dependency mapping information. What is needed here is automation that gathers the required information concerning hard and soft IT assets as well as the relationships among them.
On the other hand, the sheer size and complexity of asset/software relationships can also pose a problem. The mere action of automatically mapping a set of assets is in itself meaningless unless thoughtful and meaningful context is placed around the relationships from a business point of view. Again, everything worth knowing about in one’s IT infrastructure is not necessarily important enough to rise to the level the business service view. From the point of view of the business, a “boiling the ocean” service tracing approach is not necessarily efficient or desirable.
This is where a reduction step is necessary to map one or more technical service relationships to a business service for a well-defined service impact view. Correlated, real time transaction tracing which feeds both technical service performance as well as business performance metrics into a business service state view is yet one more aspect of technical service management which is needed to provide the most complete business service impact picture. Clearly, an understanding of the business, how it depends upon the underlying technical services, and how a business transaction is expected to flow through the underlying application services is important to know.
Tools and Methodology
What is needed is a set of tools and methodology for effectively building relevant business context around a set of technical IT services with a minimal amount of effort.
Discovery automates the process of gathering required information and populating and maintaining a Configuration Management Database (CMDB). This is done by discovering IT hardware and software and creating instances of configuration items (CIs) and relationships from the discovered data. A CI can be physical (such as a computer system), logical (such as an installed instance of a software program), or conceptual (such as a business service).
The mapping step automatically relates CIs (hardware/software assets) together to form service impact relationships. This is where a reduction methodology helps to crystallize services relationships into meaningful context which have meaning to a running business. This is where knowledge of “what is important to the business” is important for IT to know and apply to the mapping of service relationships. Emerging Big Data analytic approaches can assist in a knowledge-based goal-oriented approach to determining what is important for effective tracing.
The mapping of a business transaction flow among application services using a correlated real time transaction tracing view completes the picture by providing expected inter- and intra-application service elapsed times and business-related data. This provides the basis for enriched alerting within service impact models in the event of any excursion from the expected technical or business metric norms.
Here we outline some best practices for deciding how to select the appropriate technical and application assets and metrics to provide the appropriate context within an APM transaction tracing solution.
Top-Down: Four Easy Steps to Right Size Your Transaction Tracing
STEP 1: DETERMINE WHAT IS IMPORTANT TO THE BUSINESS
Just because a process exists does not mean that it is important enough to be traced.
Frequently, not enough thought is given to the relative importance of a process to a business. Tracing for the sake of monitoring brings no value to a business. Tracing is only valuable if it actively used to help the business stay on top of its game.
A simple measure of what is important can be summarized as follows. Determine the business processes which:
(1) Bring the company the most revenue.
(2) Bring the company the most profits.
(3) Cost the company the most money in terms of
a. Direct costs.
b. Hidden costs due to under- or non-performance risks (monetary penalties or fines).
(4) Have the highest risk in terms of damage to the company reputation (human visibility) if it fails or is faulty.
(5) Uses up the most expert people time (war-room scenario) when trying to narrow down the problem focus.
(6) Must be audited for legal or company policy reasons.
Once the relevance of an IT-dependent business process is determined, it is much easier to focus on who will directly benefit from transaction tracing.
STEP 2: DETERMINE WHO WILL BENEFIT THE MOST IN THE BUSINESS
Business users can directly benefit from transaction tracing processes.
Determine who will benefit the most from any of:
(1) Transaction history (elapsed times, payload details) for analysis.
(2) Receiving real time alerts for underperforming transactions. This could include end-to-end as well as inter-activity elapsed times or numeric payload values exceeding numeric thresholds.
(3) Narrowing the scope of a problem transaction run to a specific technology or runtime. This is so that the appropriate people can be alerted to the issue and provided known information for use in remediation or setting business customer expectations.
(4) Proof of correct business transaction completion or data compliance.
At this point we have identified the appropriate business processes to trace. Once the direct transaction tracing users are determined, a mapping of the subset of runtime assets can be done which is relevant to the most important processes and people of the business.
STEP 3: DETERMINE WHERE THE BUSINESS TRANSACTION WILL BE EXPOSED
Once the appropriate business transaction has been identified, the specific runtimes it traverses (OS type, version, hostnames, IP addresses, and middleware types, versions, and names if any) from logical start to finish of the business transaction need to be identified. Further, the logical order of the runtimes called needs to be determined.
It may not be necessary or even appropriate to map every single runtime traversed for that particular business transaction. This would depend on the need (see “Who” and “What”). If eighty percent of the “What” happens in only twenty percent of the runtimes (if this can be reasonably determined ahead of time), then it would be appropriate to start with those runtimes. Another starting point is to merely trace the end-to-end (front to back and end to end) runtimes, and then successively build out tracing to more granular levels in between the beginning and end runtimes when the classes of underperforming transactions and further suspect runtimes have been identified.
The most inappropriate way to trace transactions is to monitor every single transaction traversing every single runtime, no matter what business transaction it is, no matter who the audience for the tracing is, and without some guidance as to the expected order of runtime execution. This is also known as “boiling the ocean” and is not appropriate, efficient, or useful.
For each runtime call, the resource names being traversed need to be determined. For example, WebSphere MQ resource names are calling application name, queue manager name, queue names or topic strings, and channel names. For WebSphere Application Server the HTTP URI or Servlet Name are the resource names.
Above all what is most important in deciding where to expose the transaction is the selection of the transaction tracing toolset. What is needed in any transaction tracing solution is broad technology and OS support and not just individual silo solutions. For example, integrated transaction tracing does not end at a JEE or .Net container wall. The non-JEE and non-.Net runtimes both upstream (before) and downstream (after) the JEE or .Net container must be accessible to the transaction tracing solution, and it must span both distributed and mainframe systems. Anything less will result in a “black-box” myopic view of up- or downstream results based only on elapsed times and with no more granular or contextual contributions to analyzing the transaction flow.
STEP 4: DETERMINE TRANSACTION IDENTITY AND ATTRIBUTES
There are basically two types of data that are important to know from an exposed transaction tracing data stream. These are transaction identity and transaction payload.
Identifying the Transaction
Transaction identity is important to know since it is the way to uniquely distinguish one distinct transaction from another transaction. The identity of a transaction can be any field, technical- or business-related, that is exposed in the data stream. The identity of a transaction should be unique enough so that for a class of transactions the identity of the transaction does not repeat for a distinctly separate transaction within the same class.
If a transaction traverses multiple runtimes (either in parallel, serial, or looped fashion), it is important to have a method to “stitch” the transaction events (i.e. the appearance of the transaction at each runtime) together. The easiest, though not the usual, way is to count on the unique transaction identifier to appear at each transaction event with the same value for that distinct transaction. Depending on the scope of the transaction and how the business application was architected this could be difficult if not impossible since modern transactions typically span multiple runtimes running with different technologies.
More typically a distinct transaction identity morphs itself into a new identity, due to either one application service calling another application service, or because one runtime passes the call on to another runtime downstream. In this case a correlating transaction event must be captured to relate the two aliasing transaction identities together.
If a distinct transaction is processed multiple times in a runtime in a looped fashion, and it is required to trace the transaction at each loop iteration, then there are two additional sub-types of transaction identity data that are important to know. These are instance data, which uniquely identifies each loop iteration, and the limiting data, which distinctly signals when the loop for that distinct transaction has finished.
Categorizing the Transaction
Identifying a transaction without context is not useful for most analytic purposes. Although time stamps can be collected and associated with a distinct transaction event, using this data in isolation without business or technical context does not necessarily help solve a performance or business problem.
What is more useful is to collect additional data associated with a transaction event. Also known as “payload data”, again, a selective approach to collecting the data associated with a transaction event is the most effective way to trace the transaction. What payload data to collect would again depend upon the need (see “Who” and “What” above).
Collecting payload field values can be useful for classifying the business transaction in some way i.e. by claim type, for example, or by agent name, for reporting and alerting. It may also be useful to collect environmental information related to the transaction event for someone involved in solving a technical problem. Another way to select payload is to monitor and alert on it if the value in the payload field exceeds a threshold, or to filter classes of transactions. The entire transaction can even be flagged as failed based on payload field value(s) or threshold(s). Payload fields can also be collected and stored in history to prove compliance with laws or company policies.
Be selective when collecting payload values. For example, if the same payload value appears in two different transaction events for one distinct transaction, it is only necessary to collect it at one transaction event, not at both events. The correlation through the transaction identity associates that piece of data with all of the other transaction events for that distinct transaction and effort of collecting it again is unnecessary.
The most inappropriate way to collect payload data is to store every piece of data that is accessible at the runtime call at every runtime for “whenever it is needed” and for “whoever may use” it in the future. This is another form of “boiling the ocean” and is not appropriate, efficient or useful. This can potentially require large amounts of IT resources to back the collected data for the benefit of no one.
Bottom-Up: Enrich the Business Service Impact Model With Your Transaction Tracing Metrics
Once the appropriate process, people, and supporting transactions have been identified, the resulting exposed transactions, transaction categories, and metrics can then be tied back into the impact models which support the business service.
A service impact model should represent, at a minimum, the IT software/hardware assets which support the business services and their relationships to the business services. For example, if a claims system depends upon a hardware server running and OS, then any performance degradation of the hardware or OS should map to the performance degradation of the business service.
This two-dimensional relationship mapping is enriched by a third dimension, which is that of the transaction flow. Not only are end-to-end, inter-, and intra-elapsed times available to the impact model, but any other transaction metrics which have been exposed in the payload of the transaction at any transaction event. This encompasses not only technical but also business metrics. As a result, transaction tracing brings a richer context into a service impact model without overwhelming the result.
Transaction tracing is an important dimension of the Application Performance Management arena. Not only are technically-oriented tracing alternatives for a technology silo important, but a cross-platform, cross-technology and cross-domain correlated transaction tracing alternative is also needed to add business context to an application performance problem.
For this, a business knowledge-based goal-oriented approach to determining what is important to measure, followed by exposing only the relevant data which applies to these business-oriented performance measuring goals goes a long way to reduce the shortcomings of most of today’s siloed APM tracing offerings.
The result is a holistic, correlated, cross-platform - from distributed to mainframe and beyond - view of critical business transactions, answering the all-important question: “What happened to my transaction?”
The opinions expressed in this writing are solely my own and do not necessarily reflect the opinions, positions, or strategies or of BMC Software, it's management, or any other organization or entity. If you have any comment or question, please feel free to respond and I will be happy to respond to you. Please comment, your feedback is important. Thank you for reading this post.