Consider job recovery

Job recovery and restart should be a basic part of application design. Applications should be designed to handle:

Job recovery procedures should ensure the integrity of the user's data and make it easy to restart the interrupted application. Journaling and commitment control can be used in application design to help with job recovery. Recovery procedures should be transparent to end users.

Interactive job recovery

If you are running a data entry job or one that updates a single file, it is unlikely that you need to plan an extensive recovery strategy. The operators can inquire against the file to determine which record was last updated and then continue from that point.
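The operator's inquiry can be sketched in Python. This is a minimal illustration, not IBM i code. It assumes that each record carries a hypothetical `last_changed` timestamp and that the source documents are keyed in a known order, so the job resumes with the document that follows the most recently changed record.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Record:
    key: str
    last_changed: datetime  # hypothetical timestamp updated on every change


def resume_index(batch: list[str], records: list[Record]) -> int:
    """Return the index in `batch` of the next source document to key.

    Finds the most recently changed record (the operator's inquiry) and
    resumes with the document after it; 0 means nothing was entered yet.
    Assumes every record in `records` came from this batch.
    """
    if not records:
        return 0
    latest = max(records, key=lambda r: r.last_changed)
    return batch.index(latest.key) + 1


batch = ["INV001", "INV002", "INV003", "INV004"]
entered = [
    Record("INV001", datetime(2024, 1, 8, 9, 0)),
    Record("INV002", datetime(2024, 1, 8, 9, 5)),
]
print(batch[resume_index(batch, entered)])  # prints INV003
```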

To recover from inquiry-only jobs, the workstation operators simply start where they left off. When transactions update many files, consider using a journal or commitment control. The system automatically recovers journaled files during the initial program load (IPL) that follows an abnormal end of the system, or during make available (vary on) processing of an independent auxiliary storage pool (ASP) after an abnormal vary off. In addition, you can use the journal for user-controlled forward or backward file recovery. Journaling can also protect object types other than database physical files.
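The difference between forward and backward recovery can be illustrated with a small Python model. It assumes that each hypothetical journal entry holds a before-image and an after-image of one record, and a dictionary stands in for the file. Forward recovery starts from a saved copy and reapplies after-images in order; backward recovery applies before-images newest first. This is a conceptual sketch only and does not show the system commands that perform these operations.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class JournalEntry:
    key: str
    before: Optional[str]  # None: the record did not exist (an add)
    after: Optional[str]   # None: the record was deleted


def _put(file: dict[str, str], key: str, image: Optional[str]) -> None:
    # Write a record image, or delete the record if the image is None.
    if image is None:
        file.pop(key, None)
    else:
        file[key] = image


def apply_forward(saved: dict[str, str], journal: list[JournalEntry]) -> dict[str, str]:
    """Forward recovery: start from a saved copy and reapply after-images in order."""
    file = dict(saved)
    for entry in journal:
        _put(file, entry.key, entry.after)
    return file


def remove_backward(current: dict[str, str], journal: list[JournalEntry]) -> dict[str, str]:
    """Backward recovery: undo changes by applying before-images, newest first."""
    file = dict(current)
    for entry in reversed(journal):
        _put(file, entry.key, entry.before)
    return file


saved = {"C1": "balance=100"}
journal = [
    JournalEntry("C1", "balance=100", "balance=80"),
    JournalEntry("C2", None, "balance=50"),
]
recovered = apply_forward(saved, journal)
print(recovered)                            # {'C1': 'balance=80', 'C2': 'balance=50'}
print(remove_backward(recovered, journal))  # {'C1': 'balance=100'}
```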

Commitment control, using the file changes recorded in the journal, provides automatic transaction and file synchronization. When a job ends, the system automatically rolls back any uncommitted file updates to the beginning of the transaction. In addition, the commitment control notify object can help you restart the transaction.
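A toy Python model of this rollback behavior, under stated assumptions: `CommitmentScope` is a hypothetical class (not an IBM i API) that writes changes through to the file while keeping an undo list of before-images, and a plain dictionary stands in for the notify object. If the scope exits without a commit, it restores the before-images and records which transaction to restart.

```python
_MISSING = object()  # marks a record that did not exist before the update


class CommitmentScope:
    """Toy model of one commit cycle (hypothetical; not an IBM i API).

    Changes are written to `file` immediately, but their before-images are
    kept. Leaving the `with` block without commit() rolls the file back to
    the start of the transaction and records the transaction in `notify`.
    """

    def __init__(self, file: dict, notify: dict, txn_id: str):
        self.file = file
        self.notify = notify
        self.txn_id = txn_id
        self.undo: list = []
        self.committed = False

    def __enter__(self) -> "CommitmentScope":
        return self

    def update(self, key, value) -> None:
        self.undo.append((key, self.file.get(key, _MISSING)))
        self.file[key] = value

    def commit(self) -> None:
        self.undo.clear()
        self.committed = True

    def __exit__(self, exc_type, exc, tb) -> bool:
        if not self.committed:
            # Roll back: restore before-images, newest first.
            for key, before in reversed(self.undo):
                if before is _MISSING:
                    self.file.pop(key, None)
                else:
                    self.file[key] = before
            self.undo.clear()
            self.notify["restart_from"] = self.txn_id
        return False  # do not suppress the exception that ended the job


accounts = {"A": 100, "B": 0}
notify: dict = {}
try:
    with CommitmentScope(accounts, notify, "TXN-42") as txn:
        txn.update("A", 70)
        txn.update("B", 30)
        raise RuntimeError("job ended before commit")
except RuntimeError:
    pass
print(accounts, notify)  # {'A': 100, 'B': 0} {'restart_from': 'TXN-42'}
```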

When designing an interactive application, consider that your workstations and communications lines can have equipment problems. For example, suppose your computer system loses power. If you have an uninterruptible power supply installed to maintain power to the processing unit and disk units, your system remains active. In this example, however, your workstations lose power. When your programs attempt to read from or write to the workstations, an error indication is returned to the program. If the application is not designed to handle these errors, the system can spend all its time in workstation error recovery.

Design your interactive applications to check the error feedback areas and handle any errors they indicate. If the application handles the errors and stops, system resources are not spent on nonproductive error recovery. The reference manuals for the programming languages contain examples of using error feedback areas and error recovery routines.
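The following Python sketch shows the pattern of checking feedback after every I/O operation and ending cleanly instead of retrying indefinitely. The `write_screen` callable and its `"OK"` and `"DEVICE_ERROR"` status strings are hypothetical stand-ins for a language's actual error feedback area.

```python
from typing import Callable


def run_interactive(screens: list[str], write_screen: Callable[[str], str],
                    max_retries: int = 1) -> str:
    """Write each screen, checking the feedback status after every I/O.

    A failed write is retried at most `max_retries` times; after that the
    program ends instead of looping in nonproductive error recovery.
    """
    for screen in screens:
        attempts = 0
        while True:
            status = write_screen(screen)  # hypothetical feedback value
            if status == "OK":
                break
            attempts += 1
            if attempts > max_retries:
                return f"ended: {status} on {screen}"
    return "completed"


def powered_off_after_menu(screen: str) -> str:
    # Simulated workstation that loses power after the first screen.
    return "OK" if screen == "MENU" else "DEVICE_ERROR"


print(run_interactive(["MENU", "ORDER", "CONFIRM"], powered_off_after_menu))
# prints: ended: DEVICE_ERROR on ORDER
```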

Batch job recovery

Print-only batch jobs normally do not need special recovery to start again. Running the program again may be adequate.

Batch jobs that perform file updates (add, change, or delete actions) present additional considerations for restart and recovery. One approach to restart is to keep an update code within each record. As a record is updated, the code for that record is also updated to show that processing for that record is complete. If the job is started again, the batch program uses the update code to position itself at the first record that it has not processed. The program then continues processing from that point in the file.
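A minimal Python sketch of the update-code approach, assuming a hypothetical repricing job whose item records each carry a `repriced` flag as the update code. The flag is set in the same record update as the new price, so a restart skips exactly the records already processed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    item_id: str
    price: int              # price in cents
    repriced: bool = False  # the update code for this record


def reprice(items: list[Item], pct: int, crash_after: Optional[int] = None) -> int:
    """Raise prices by `pct` percent; safe to run again after an abnormal end.

    Positions to the first record whose update code is not set and
    continues from there. Returns that starting index.
    `crash_after` simulates an abnormal end after that many records.
    """
    start = next((i for i, it in enumerate(items) if not it.repriced), len(items))
    for n, item in enumerate(items[start:]):
        if crash_after is not None and n == crash_after:
            raise RuntimeError("simulated abnormal end")
        # The new price and the update code change in the same record update.
        item.price = item.price * (100 + pct) // 100
        item.repriced = True
    return start


items = [Item("A", 1000), Item("B", 2000), Item("C", 3000)]
try:
    reprice(items, 10, crash_after=2)
except RuntimeError:
    pass
restart_at = reprice(items, 10)
print(restart_at, [it.price for it in items])  # 2 [1100, 2200, 3300]
```

Without the update code, running the job again would raise the prices of items A and B a second time.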

Another way to make batch processing restartable is to save or copy the file before starting the job. You can use one of the following commands to save or copy the file:

Then, if you have to start again, restore or copy the file back to its original condition and run the job again. With this approach, you must ensure that no other job changes the files while your job runs. One way to do this is to get an exclusive lock on the file for the duration of the job. A variation of this approach uses the journal. For example, if a restart is required, you can issue the Remove Journaled Changes (RMVJRNCHG) command to remove the changes the job made to the files, and then run the job again against the files.
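The save-and-restore approach with an exclusive lock can be sketched in Python as follows. A `threading.Lock` stands in for the exclusive file lock and an in-memory deep copy stands in for saving or copying the file; `SharedFile` and `run_with_restore_point` are hypothetical names. Unlike a system-enforced file lock, this in-process lock protects the data only if every caller acquires it.

```python
import copy
import threading
from typing import Callable


class SharedFile:
    """Hypothetical file: a list of rows plus a lock standing in for an exclusive file lock."""

    def __init__(self, rows: list):
        self.rows = rows
        self.lock = threading.Lock()


def run_with_restore_point(f: SharedFile, job: Callable[[list], None]) -> None:
    """Run `job` under an exclusive lock; restore the saved copy if it fails."""
    if not f.lock.acquire(blocking=False):
        raise RuntimeError("file is in use by another job")
    try:
        saved = copy.deepcopy(f.rows)  # like saving or copying the file first
        try:
            job(f.rows)
        except Exception:
            f.rows = saved  # restore the file to its original condition
            raise
    finally:
        f.lock.release()


def failing_job(rows: list) -> None:
    rows.append("order 2")
    raise ValueError("job failed partway through")


orders = SharedFile(["order 1"])
try:
    run_with_restore_point(orders, failing_job)
except ValueError:
    pass
print(orders.rows)  # ['order 1']
```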

If your batch job processes a complex input stream, you probably want to design a strategy for restarting at a point within that stream. Then, if the batch job needs to be started again, it can determine the point in the stream from which to continue.
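One common way to implement such a restart point is a checkpoint, sketched here in Python under stated assumptions: every `every` records, the job writes the position of the next unprocessed input line to a hypothetical JSON checkpoint file, replacing it with `os.replace`, and on restart it resumes from that position.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Callable


def process_stream(lines: list[str], handle: Callable[[str], None],
                   checkpoint: Path, every: int = 100) -> int:
    """Process `lines`, saving a restart point every `every` records.

    Returns the position the run started from (0 for a fresh run).
    """
    start = json.loads(checkpoint.read_text())["next"] if checkpoint.exists() else 0
    for pos in range(start, len(lines)):
        handle(lines[pos])
        if (pos + 1) % every == 0 or pos + 1 == len(lines):
            tmp = checkpoint.with_suffix(".tmp")
            tmp.write_text(json.dumps({"next": pos + 1}))
            # Swap in the new checkpoint in one step so it is never half-written.
            os.replace(tmp, checkpoint)
    return start


stream = [f"card {i}" for i in range(10)]
handled: list[str] = []


def crash_on_card_7(line: str) -> None:
    if line == "card 7":
        raise RuntimeError("simulated abnormal end")
    handled.append(line)


with tempfile.TemporaryDirectory() as work:
    ckpt = Path(work) / "restart.json"
    try:
        process_stream(stream, crash_on_card_7, ckpt, every=5)
    except RuntimeError:
        pass
    resumed_from = process_stream(stream, handled.append, ckpt, every=5)
print(resumed_from, len(handled))  # 5 12
```

Records processed after the last checkpoint are processed again on restart (in the example, cards 5 and 6 are handled twice), so either make each record's processing safe to repeat or write the checkpoint in the same commit as the updates.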

Commitment control can also be used for batch job recovery. However, if you plan to use commitment control for batch jobs, consider that the maximum number of record locks allowed in a commit cycle is 4 000 000. Therefore, you may need to divide the batch job into logical transactions. For example, if your batch program updates a master file record followed by several detail records in another file, each such set of updates can represent a logical transaction and can be committed separately. Because locks are held on all records changed within a commit cycle until that cycle ends, dividing your batch job into small, logical transactions also makes changed data available to other jobs more quickly.
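The master-and-detail grouping can be sketched in Python. The `commit` callable is a hypothetical stand-in for ending a commit cycle, and the lock count is modeled simply as the number of records changed in the group; the 4 000 000 limit is the one stated in the text.

```python
from typing import Callable

MAX_LOCKS_PER_COMMIT_CYCLE = 4_000_000  # limit stated in the text


def post_orders(orders: list[tuple[str, list[str]]],
                commit: Callable[[list[str]], None]) -> int:
    """Treat each master record plus its detail records as one logical transaction.

    Returns the largest number of record locks held in any single commit cycle.
    """
    peak = 0
    for master, details in orders:
        changed = [master, *details]  # each changed record stays locked until commit
        if len(changed) > MAX_LOCKS_PER_COMMIT_CYCLE:
            raise RuntimeError("logical transaction exceeds the commit-cycle lock limit")
        peak = max(peak, len(changed))
        commit(changed)  # ends the commit cycle, releasing these locks
    return peak


orders = [("ORD1", ["ORD1-L1", "ORD1-L2"]), ("ORD2", ["ORD2-L1", "ORD2-L2", "ORD2-L3"])]
committed: list[list[str]] = []
print(post_orders(orders, committed.append), len(committed))  # 4 2
```

Committing each order separately means at most 4 locks are held at a time in the example; a single commit cycle for the whole run would hold all 7 until the end.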

Journaling can also be used to assist in batch job recovery just as it can be for interactive jobs.