

In mid-August I conducted a well-attended webinar on Coordinated Recovery for DB2, IMS, and VSAM. I have given the presentation at several conferences, briefings, and customer sites, and I am scheduled to present it in New York City on September 26th. Why the resurgent interest in this topic, first addressed in 1999? I think it is because IT organizations recognize the need for a more granular local application recovery capability that is not supported by disk mirroring solutions or disaster recovery procedures.

 

As with any solution, there are pros and cons to disk mirroring. Typically these solutions are aimed at disaster recovery – a catastrophic outage impacting the entire data center. Disk mirroring solutions can reduce or eliminate data loss and downtime. They are expensive to build and support, but they can serve a useful purpose. However, for most recovery situations, they are not an effective solution.

 

Consider the following likely events:

• An application program change results in incorrectly updated data

• A user inadvertently updates the wrong data

• A database administrator incorrectly modifies a database structure

• A disgruntled employee updates data maliciously

• A storage controller fails, impacting hundreds of volumes of disk data

• System software maintenance is applied, containing changes that impact database data 

• And many more…

 

In these cases, one would not declare a disaster and move to the remote site mirror. Local recoveries at the application database level are required. These recoveries have become more complex over the years, as applications have evolved relationships that span DB2, IMS, and CICS/VSAM components. Recovery of any one of these components may require recovery of the related components, especially if the recovery action is a recovery to a prior point in time. It is the coordinated recovery to a prior point in time that is the topic of this article.

 

Suppose an event has occurred that corrupted your DB2 application data. The data corruption is not severe enough to declare disaster. You have decided to recover the local application to a prior point in time. The application is complex and has IMS and CICS/VSAM components that are related to the DB2 application, so those objects must be recovered to the same point in time.

 

You can exploit the BMC Recovery solutions to support this recovery event.

 

• For DB2, IMS, and VSAM, all BMC Recovery solutions support a recovery technique called BACKOUT to TIMESTAMP. BACKOUT is very fast and more efficient than normal forward recovery from an image copy.

o This assumes the underlying database datasets are physically accessible – the storage is fine, it is a logical error we are correcting.

o This also assumes that since these applications are related, they are sharing the same system clock – so TIMESTAMP is the same point in time for all applications.

o For BMC DB2 recovery, some events can make BACKOUT unusable for an object. For instance, if a LOAD LOG NO was executed after your recovery point, we cannot jump around that hole in the log, so we automatically fall back to a normal forward recovery for those objects.

o BMC DB2 recovery also supports a process called Recovery Avoidance. Suppose you are doing this point in time recovery for an application with 100 objects. Assume 80 of those objects have not been updated since the designated TIMESTAMP. There is no reason to recover them; they are both physically and logically sound.

o If the DB2 recovery is a subsystem-wide event (perhaps the application in play is SAP), then a DB2 conditional restart may be required. The BMC DB2 recovery solution automates the analysis for this requirement and generates the conditional restart process if needed.

At the conclusion of the generated BMC Recovery action, you will have consistent data as of the designated TIMESTAMP. Units of work that were in flight as of the TIMESTAMP will not be recovered. The BACKOUT technique is very fast, reducing downtime for the recovery. For the DB2 application, Recovery Avoidance further reduces downtime.
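The per-object decision described above (skip, back out, or fall back to forward recovery) can be illustrated with a small Python sketch. This is only a model of the logic as described in this article, not the BMC implementation: the `DbObject` record, its fields, and `plan_recovery` are hypothetical names, and a real product would derive this information from the DB2 catalog and log rather than from hand-built records.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum


class Action(Enum):
    SKIP = "recovery avoidance"   # object untouched since the recovery point
    BACKOUT = "backout"           # undo logged changes back to the recovery point
    FORWARD = "forward recovery"  # restore an image copy and apply the log


@dataclass
class DbObject:
    # Hypothetical record; field names are assumptions for illustration only.
    name: str
    last_update: datetime
    # Times of non-logged events (for example LOAD LOG NO) on this object.
    unlogged_events: list = field(default_factory=list)


def plan_recovery(objects, target):
    """Choose an action per object for a recovery to the `target` timestamp."""
    plan = {}
    for obj in objects:
        if obj.last_update <= target:
            # Not updated since the recovery point: nothing to recover.
            plan[obj.name] = Action.SKIP
        elif any(t > target for t in obj.unlogged_events):
            # A hole in the log after the recovery point rules out BACKOUT.
            plan[obj.name] = Action.FORWARD
        else:
            plan[obj.name] = Action.BACKOUT
    return plan


# Example: three objects and one shared recovery point.
target = datetime(2012, 8, 15, 9, 0)
objects = [
    DbObject("TS_ORDERS", last_update=datetime(2012, 8, 15, 10, 30)),
    DbObject("TS_HISTORY", last_update=datetime(2012, 8, 1, 0, 0)),
    DbObject("TS_LOADED", last_update=datetime(2012, 8, 15, 11, 0),
             unlogged_events=[datetime(2012, 8, 15, 11, 0)]),
]
plan = plan_recovery(objects, target)
```

In the 100-object example above, the 80 untouched objects would all land in the skip category, which is where the downtime saving comes from.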

 

Some companies are interested in generating a Coordinated Recovery in a Disaster Recovery scenario. There are some differences in the procedure to implement this requirement:

• The recovery point in time TIMESTAMP can be specified, or it can be based on an event such as an IMS Log Switch.

• For IMS, a RECON Clean up utility will need to run at the remote site, as the RECONs were backed up while open.

• The IMS and CICS/VSAM recoveries will be to the specified TIMESTAMP.

• For DB2, the ‘recovery’ will not be a TIMESTAMP recovery; it will be a DB2 Conditional Restart to the RBA/LRSN that correlates to the specified TIMESTAMP. The BMC DB2 Recovery solution provides utilities that will translate the TIMESTAMP to RBA/LRSN and generate the appropriate Conditional Restart process to that point. 

• Once the DB2 Conditional Restart is complete, the application recoveries will all be to the new end of log.
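To give a feel for what translating a TIMESTAMP to a log point involves, here is a rough Python sketch for the data sharing case only. It assumes the classic 6-byte LRSN, which is the high-order six bytes of the 8-byte system TOD clock value (epoch 1900-01-01, one tick of those six bytes per 16 microseconds). It ignores leap seconds and any LRSN delta, and the function name `timestamp_to_lrsn` is made up for this illustration. It is not the BMC utility, and an RBA in a non-data-sharing subsystem cannot be computed from the clock at all; it has to be looked up in the log.

```python
from datetime import datetime, timedelta, timezone

# TOD clock epoch (leap seconds are ignored in this sketch).
TOD_EPOCH = datetime(1900, 1, 1, tzinfo=timezone.utc)
# The high-order 6 bytes of the 8-byte TOD value advance once per 16 microseconds.
LRSN_TICK = timedelta(microseconds=16)


def timestamp_to_lrsn(ts: datetime) -> str:
    """Approximate 6-byte LRSN, as 12 hex digits, for a timezone-aware timestamp."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    # Integer number of whole 16-microsecond ticks since the TOD epoch.
    ticks = (ts - TOD_EPOCH) // LRSN_TICK
    if not 0 <= ticks < 2**48:
        raise ValueError("timestamp outside the 6-byte LRSN range")
    return f"{ticks:012X}"


# Example: an arbitrary recovery point expressed in UTC.
lrsn = timestamp_to_lrsn(datetime(2012, 9, 26, 12, 0, tzinfo=timezone.utc))
```

The value produced this way is only an approximation of the clock-derived portion; the actual restart point still has to be validated against the log, which is the part the recovery utilities automate.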

 

Some questions (and answers) that have come up during presentation of this topic:

• Does the DB2 Catalog and Directory copy have to be done by BMC COPY PLUS, or can normal IBM copy work? (Normal IBM copy will work.)

• Can RECOVERY PLUS for IMS recover to an alternate dataset? (Yes. RECOVER PLUS for DB2 can do this too.)

• Does the IMS Disaster Recovery part require the use of DBRC? (Yes, in our example. That is where we obtain the TIMESTAMP that will be used to drive the entire DB2 Conditional Restart process.)

• Does the Coordinated Disaster Recovery process have to be driven by an IMS Log Switch? (No, any event can be used to kick off the process. It could be a DB2 log switch, the end of the batch cycle, or an arbitrary timestamp.)

 

The Coordinated Recovery Webinar was recorded and can be seen at this URL: http://go.bmc.com/forms/WBNR_MSM_DPM_CoordRecovAug15_BMCcom_EN_Aug2012

 

Please post any questions in the comments!