As an IBM Business Partner, BMC is pleased to once again support the Early Support Program (ESP) for DB2. For BMC DB2 customers participating in IBM’s ESP for DB2 11 for z/OS, we will provide new versions of their licensed DB2 tools to install alongside the DB2 11 ESP code base. For more information on the ESP, please see the announcement from IBM at http://www-01.ibm.com/common/ssi/cgi-bin/ssialias?subtype=ca&infotype=an&appname=iSource&supplier=897&letternum=ENUS212-364
With the FORCE keyword, Reorg Plus for DB2 will CANCEL any DB2 threads that are holding locks on participating objects and preventing a DRAIN from succeeding. This function is available via PTF.
The FORCE function is in effect any time a DRAIN is taken during the course of the utility. Here is the syntax for the FORCE keyword:
FORCE NONE | READERS | ALL
FORCE_AT START | RETRY | LASTRETRY
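As a hypothetical illustration of how these keywords combine, a REORG might specify FORCE ALL with FORCE_AT RETRY so that transactions get one full DRAIN_WAIT interval to finish before being cancelled. The object names and the other options shown are illustrative only; check the Reorg Plus documentation for the exact syntax in your release:

```
REORG TABLESPACE MYDB.MYTS
  SHRLEVEL CHANGE
  DRAIN_WAIT 10 RETRY 3
  FORCE ALL
  FORCE_AT RETRY
```

With this sketch, the first DRAIN attempt waits up to 10 seconds for in-flight work; only when the first RETRY begins are blocking threads cancelled.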
FORCE NONE (default) disables the FORCE function, enabling the utility to work as before. When FORCE NONE is in effect, the utility will wait for any in-flight transactions to complete. If the in-flight transactions do not complete within the DRAIN_WAIT time limit, the DRAIN will fail, and may be retried if RETRY is enabled via user options.
FORCE READERS enables the FORCE function to eliminate any threads which are not performing any kind of update and are holding only IS (Intent Share) or S (Share) locks on participating objects. FORCE READERS will functionally eliminate READ queries, while allowing WRITE queries to complete normally. When FORCE READERS is in effect, DRAINS may still fail if the WRITE queries do not complete within the DRAIN_WAIT time limitation.
FORCE ALL enables the utility to eliminate all threads holding any kind of lock on participating objects. When FORCE ALL is in effect, the FORCE processor will CANCEL any thread which is in-flight when the DRAIN begins, subject to the FORCE_AT specification.
BMC’s implementation is unique in that it offers control over when to begin the FORCE process. This is accomplished with the FORCE_AT parameter.
FORCE_AT START (default) instructs the utility to begin the FORCE process as soon as the first DRAIN attempt begins. This approach clears the way for the DRAIN function and ensures the utility will complete in the least time. In the case of LOGFINAL, when FORCE_AT START is in effect, the LOGFINAL outage is reduced to the minimum possible time, since there should be no wait time for thread completion and no DRAIN RETRIES.
FORCE_AT RETRY instructs the utility to wait for the first DRAIN RETRY before beginning the FORCE process. This allows transactions to complete normally if they can do so within the DRAIN_WAIT time limitation. But if any threads exceed DRAIN_WAIT in duration, they will be eliminated when the second DRAIN attempt begins.
FORCE_AT LASTRETRY is offered for compatibility with the IBM FORCE functionality. Just as it sounds, FORCE_AT LASTRETRY waits until the FINAL DRAIN RETRY (based on the user DRAIN_WAIT RETRY option). So, if syntax DRAIN_WAIT 5 RETRY 3 is used with FORCE_AT LASTRETRY then the FORCE process will be invoked on the last (fourth) DRAIN attempt.
Can a thread be “too big to fail”? Well, something like that. It may be more like being too big to get out of the way in time. Even with FORCE in effect, a badly behaving transaction that has not committed in a very long time may not be FORCE-able due to the ROLLBACK time required. In other words, if DRAIN_WAIT 10 is in effect, but it takes 12 seconds for a thread to ROLLBACK, then the DRAIN will still fail and require a RETRY. The RETRY should succeed in this case, and FORCE remains in effect as well. Nonetheless, this scenario should be very rare in a production environment, where long-running update transactions that do not perform COMMITs are rarely tolerated. READER transactions do not have this consideration, since they require no ROLLBACK interval.
Enabling the FORCE function requires application of the two following PTFs:
(ARU) REORG PLUS FOR DB2 10.1 BPU3948
(SCC) SOLUTION COMMON CODE 10.1 BPJ0472
Applying either of the above PTFs without the other is not harmful, but the FORCE function will not be enabled until both are applied.
Thanks to Ken Kornblum from BMC Software for his contributions to this article.
In this article, we’ll take a look at zIIP processors from an angle that we have rarely seen covered. Our focus will be on the characteristics of code which can run on a zIIP, and code which will suffer minimal performance loss if rewritten to run on a zIIP rather than a general purpose processor (GP).
A Short History of zIIPs
In January of 2006, IBM announced the zIIP (System z Integrated Information Processor). Hardware and software to allow its use became available in June of that year. In addition to exploiting the zIIP in selected IBM software products, IBM licensed code to allow third-party vendors to run code on zIIPs under the control of a Non-Disclosure and License Agreement. IBM licenses the zIIP hardware to their customers for a fraction of the corresponding charges for general purpose processors. More importantly to you, zIIP processors, unlike GPs, are not counted in the basis for software charges, so major savings can be gained by shifting workload from GPs to zIIPs.
In addition, zIIP processors run “at full speed,” so you can actually get more processing done in less time on a zIIP if your GPs are not full-capacity models (for example, if you are running on a z196 and your model number is not between 2817-701 and 2817-780).
So why don’t we simply replace every GP with a zIIP, and save money on both hardware and software? There are two reasons, each associated with IBM’s market strategy for the zIIP, which is to encourage the deployment on z/OS of non-traditional (read “distributed”) workloads for which there are competitive platforms. IBM’s goal seems to be to allow customers to reduce the hardware and software costs of supporting the back ends for these workloads.
So, first, IBM has placed a restriction on zIIP usage. There can be no more than one zIIP per GP in a CEC. Second, and this is the primary reason for this article, IBM’s License Agreement places restrictions on the kind of code that can be eligible to run on a zIIP. It must run in a z/OS “enclave” under the control of an SRB (Service Request Block). The use of an enclave means that the work is dispatched by z/OS Workload Manager (WLM), and that the code run from the enclave can be handled by a common set of WLM controls and policies, because the code has a common set of characteristics. Scheduling an SRB instead of attaching a Task Control Block (TCB) implies further restrictions; the most pertinent of these is that no SVC (except ABEND) can be issued, which prevents the use of many z/OS services commonly invoked by application programs. This has often been misstated as “you cannot do I/O on a zIIP,” which is not true. However, it is true that QSAM and BSAM, which many application programs use to perform sequential file I/O, cannot be invoked from SRB code because they issue SVCs.
Finally, for completeness, we should mention “zAAP on zIIP”, by which IBM allows processes that would be zAAP eligible to run on a zIIP in environments where there are no zAAPs. This applies primarily to Java code running on z/OS, and therefore applies only to vendor products written in Java. Very little of the legacy code which is the subject of this article fits this profile.
Why Isn’t More Vendor Code on the zIIP?
Why aren’t your mainframe vendors “zIIP enabling” 100% of the code in all their products? The answer gets back to the second restriction mentioned above – that to be zIIP eligible, code must run under an SRB and therefore cannot invoke many z/OS services. If we look at the issue of changing existing product code to make it zIIP eligible, we see that most products were written to run in TCB mode. Historically, SRB mode code was used only where it was considered a cost effective way to do cross memory processing.
If we look at a hypothetical piece of code (part of a software product) which runs under a TCB today and which makes no SVC calls, to make it zIIP eligible, we must do something like the following each time we want to execute the code:
We have to set up the SRB in memory to point to the code to be executed, and then issue the IEAMSCHED macro to schedule the SRB for execution. The TCB code must wait for the SRB code to execute. When the SRB code completes, it must signal the TCB code to resume before terminating. We could also choose to start the SRB once, and loop and wait in the SRB code instead of scheduling a new SRB each time the function is to be performed.
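The handoff described above can be sketched roughly as follows. This is pseudocode only; the IEAMSCHED parameters and the wait/post signaling are simplified, and real implementations must handle enclave setup, recovery, and cross-memory addressing:

```
TCB side:
  build the SRB control block; set its routine address
    to the zIIP-eligible code
  IEAMSCHED          schedule the SRB into the enclave
  WAIT               suspend until the SRB signals completion
  ...resume normal TCB-mode processing...

SRB side (zIIP eligible):
  run the CPU-intensive function    (no SVCs except ABEND)
  signal (POST) the waiting TCB     completion notification
  exit                              SRB terminates
```

The alternative mentioned above, starting the SRB once and looping inside it, trades repeated IEAMSCHED overhead for a long-lived SRB that waits for work requests.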
The point of this discussion is that there is overhead incurred in simply moving from TCB to SRB mode and back, so converting an existing code function to be zIIP eligible will always incur a CPU cost. Of course, for such a change to make sense, this should be outweighed by the cost advantages of offloading the program’s execution to the zIIP.
What kinds of programs are best suited to undergo this transformation? Ideally, they should be CPU intensive, isolated pieces of code without many of the z/OS services calls that cannot be made from an SRB, and without much I/O. Also, such programs must be executed often enough during typical processing that the CPU savings on your GPs will be substantial enough to make the recoding effort worthwhile. Of course, the first examples vendors tend to look for are code fragments which are already in SRB mode. As I pointed out above, there don’t tend to be many of those.
Next, we look for CPU intensive functions within existing TCB code. There are more of those that fit the bill.
What Does This Mean?
While it would be nice if we could enable all of the code in every one of your z/OS software products to run on the zIIP instead of GPs, that is unrealistic. I hope the information in this article will provide you with the information you need to discuss zIIP exploitation with your vendors.
Thanks to Jim Dee from BMC Software for his contributions to this article.
To measure any computer event requires some overhead to capture the event, record the related performance statistics, and write the event to disk for future reporting and analysis. If that event is very efficient (using only milliseconds of CPU to process), then the overhead to capture the performance metrics could significantly impact the event’s overall resource utilization. That said, if this efficient event consistently uses the same amount of CPU resource for every execution, then once you have established the performance statistics, you can significantly reduce the overhead by counting the occurrences of the event and still have accurate performance information.
That’s the concept behind Efficiency Filtering in the BMC SQL Performance for DB2 solution. Efficiency Filtering is an innovative filter option you can use to reduce the overhead for collecting SQL performance statistics. It’s especially useful for very efficient SQL statements.
Here’s how it works: When the Apptune Data Collector is collecting SQL performance metrics, it uses a very small amount of overhead for every SQL statement. For most SQL, the collector overhead is negligible when compared to the CPU resources required to process the SQL statement itself. However, for very efficient SQL statements that use milliseconds of CPU, the collection overhead to capture the associated performance metrics is a significant percentage of the CPU overhead for the statement. To reduce the overhead for these SQL statements, the DBA can turn on Efficiency Filtering. After the Apptune Data Collector has captured the performance metrics for a user-specified number of occurrences, it will cease collecting the performance metrics and simply count the occurrences of the specific SQL statement until the end of the collector interval (24 hours by default). If a performance exception such as a deadlock or timeout occurs, the Apptune Data Collector will revert to collecting the performance metrics. If no further exception is encountered after the specified number of executions, Apptune will return to counting occurrences for this SQL statement. This ensures that detailed performance metrics are captured when there is a performance problem, but when everything is performing well, collection overhead is kept to a minimum.
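The collection logic described above amounts to a small state machine per SQL statement. Here is a hypothetical Python sketch of that behavior; the class, its names, and the threshold handling are illustrative assumptions, not Apptune internals:

```python
class StatementFilter:
    """Models Efficiency Filtering for a single SQL statement."""

    def __init__(self, capture_threshold):
        self.capture_threshold = capture_threshold  # user-specified count
        self.captured = 0   # detailed metric captures so far this interval
        self.counted = 0    # executions counted without detailed metrics

    def on_execution(self, exception=False):
        """Return True if detailed metrics should be captured for this execution."""
        if exception:
            # A deadlock or timeout reverts to full metric collection
            self.captured = 0
        if self.captured < self.capture_threshold:
            self.captured += 1
            return True          # capture full performance metrics
        self.counted += 1        # otherwise just count the occurrence
        return False


f = StatementFilter(capture_threshold=3)
results = [f.on_execution() for _ in range(5)]
print(results)                    # [True, True, True, False, False]
f.on_execution(exception=True)    # exception occurs: resume capturing
print(f.on_execution())           # True
```

After the threshold is reached, only the counter is incremented; an exception resets the capture count so detailed metrics flow again until the threshold is hit once more.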
Benchmarks from our development lab show that the savings can be significant.
To turn on Efficiency Filtering, create a new filter option set or modify an existing one in APPTUNE Administration. Within the filter option set, expand the Exception Thresholds and Options dialog and set the Efficiency Filtering option to a number from 0–9999. This value is the number of times APPTUNE will capture performance statistics for a statement during an interval; after that, if no exceptions have occurred, APPTUNE stops monitoring the statement and simply counts its executions.
You can view the associated documentation at the BMC Documentation Center:
http://www.bmc.com/support/product-documentation (support login required)
Just search for “Efficiency Filtering” and select the topic.
Thanks to Ann Darks, Mike Behne and Mike Watkins from BMC Software for their contributions to this article.
The SQL Performance Solution for DB2 includes a set of features collectively called the Performance Advisor. These features include Reorg Advisor, Workload Access Path Compare, Recommindex, and Exception Advisor.
Exception Advisor assists in diagnosing the root cause of performance-related problems. In this article, we’ll explore its wide range of capabilities and provide a general overview of how to set it up.
Exception Advisor examines the data accompanying the triggered exception and compares that data to past execution statistics for the same statement in baseline or aggregated tables. To use Exception Advisor, the user must set up exception definitions in the APPTUNE administration panels. The data collector will capture performance data including exception records. Next, you will need to load the performance data into the Performance Advisor Database (execute the supplied load jobs) and run the Exception Advisor job. For more information on setting up and managing the Performance Advisor Database, see Chapter 3, Managing performance with Performance Advisor (BMC support login required).
Exception Advisor compares exception records for elapsed or CPU time to historical performance records for the same SQL statement to produce analysis and advice based on a rules dataset. Since the data collector has already captured this information, there is no additional overhead required to capture the performance data. The rules dataset can be customized to adapt the recommendations to your organization’s preferences.
Here is an example of an entry from the rules dataset:
RULE:Timeouts per Escalation;
RATIO=QTXADEA+QTXATIM/QTXALEX+QTXALES; > 0
This statement experienced a timeout or deadlock, most likely due to lock escalation.
RULE:SyncIO Wait percentage;PERCENTAGE=SYNCWAIT/ELAPTIME; > 30
This execution spent a large percentage of time performing
synchronous I/Os. If the getpage count has risen, it can indicate
a need to rebind the program, update catalog statistics for accessed
objects, or reorganize affected objects. Also check for RID list
failures for the statement and buffer pool performance for accessed objects.
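A rule like the SyncIO example above can be read as: sum the numerator fields, divide by the denominator fields, and fire if the ratio exceeds the threshold. The following Python sketch shows that evaluation; the function, the record layout, and the sample values are illustrative assumptions, not the product’s actual rules engine:

```python
def evaluate_rule(metrics, numerator, denominator, threshold, as_percentage=False):
    """Evaluate a ratio rule such as PERCENTAGE=SYNCWAIT/ELAPTIME; > 30."""
    num = sum(metrics[f] for f in numerator)
    den = sum(metrics[f] for f in denominator)
    if den == 0:
        return False              # no basis for a ratio; rule cannot fire
    ratio = num / den
    if as_percentage:
        ratio *= 100
    return ratio > threshold


# Hypothetical exception record: 4.2s of synchronous I/O wait in 10s elapsed
record = {"SYNCWAIT": 4.2, "ELAPTIME": 10.0}
fired = evaluate_rule(record, ["SYNCWAIT"], ["ELAPTIME"], 30, as_percentage=True)
print(fired)   # True: 42% of elapsed time in sync I/O exceeds the 30% threshold
```

The timeout rule works the same way, summing QTXADEA and QTXATIM in the numerator and the escalation counters QTXALEX and QTXALES in the denominator, firing on any value greater than 0.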
In diagnosing performance exceptions, it can be helpful to know what issues are NOT a cause of the problem so that you avoid wasting time in your diagnosis. That point is illustrated by this report:
We can see that there is no correlation between CPU time and LOCK WAIT PERCENTAGE. The user does not need to spend time looking into locking problems for this statement. Let’s look at another report to see if a correlation exists between the exception and other statistics:
In this case, we can see that historically, there has been a high correlation between the CPU exception and both SYNCIO wait percentage and GETPAGE count. Consequently, this would be a more likely cause of your high CPU exception.
Another advantage of seeing the history of the exception is to determine if the problem arose gradually or was a sudden spike. This would help you correlate a recent system or database change to the performance problem.
Thanks to Ann Darks, Mike Behne and Mike Watkins from BMC Software for their contributions to this article.
Set up an ongoing process with Log Master for DB2 to move the latest updates from one DB2 subsystem to another DB2, or even to Oracle or DB2 for Linux, UNIX, and Windows (LUW). (BMC Support login required.) Watch the video
Leverage the Workload Access Path Compare feature of SQL Performance for DB2 to identify problems before SQL is promoted across DB2 environments. (BMC Support login required.) Watch the video
- Al deMoya
- Andy Laredo
- Chad Reiber
- Doug Wilson
- Darel Stewart
- Jim Kurtz
- Michael Cotignola
- Phil Grainger
- Peter Plevka
- Raymond Bell
- Roberto Cason
- Rick Weaver
- Ramon Menendez
- Sam Antoun
- Ryan Smith
- Todd Mollenhauer
- Tom Barry
- Bill Moran