Prepare for disk failures

Because your data is spread across your disks, it is important that you consider how to protect your data in the event that one of those disks fails. Disk protection provides a means to ensure availability of data stored on disks.

Disk storage is the storage that is either internal to your iSeries™ server or is attached to it. This disk space, together with your server's main memory, is regarded by your system as one large storage area. When you save a file, you do not assign it to a storage location; instead, the system places the file in the location that ensures the best performance. It may spread the data in the file across multiple disk units, if that is the best option. When you add more records to the file, the system assigns additional space on one or more disk units. This way of addressing storage is known as single-level storage.

In addition to internal disk storage, you can also use IBM® TotalStorage® Enterprise Storage Server® (ESS) to attach a large volume of external disk units. ESS provides enhanced disk protection, the ability to copy data quickly and efficiently to other ESS servers, and the capability of assigning multiple paths to the same data so that a single connection failure does not make the data unavailable. For additional information on ESS and its features, and to determine if this solution is right for you, see Enterprise disk storage.

Device parity protection

Device parity protection allows your system to continue to operate when a disk fails or is damaged. When you use device parity protection, the disk input/output adapter (IOA) calculates and saves a parity value for each bit of data. The IOA computes the parity value from the data at the same location on each of the other disk units in the device parity set. When a disk failure occurs, the data can be reconstructed by using the parity value and the values of the bits in the same locations on the other disks. Your system continues to run while the data is being reconstructed.
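The reconstruction step described above can be illustrated with a short sketch. It assumes a single exclusive-OR (XOR) parity value per bit position, as in conventional RAID 5; how the disk IOA actually lays out and computes parity is not described here, and the block contents and the name xor_blocks are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data blocks, one per disk unit in the parity set.
data = [b"\x0f\xa0\x33", b"\xf0\x0a\x55", b"\x3c\xc3\x99"]

# The parity value is computed from the same location on every data block.
parity = xor_blocks(data)

# Simulate the loss of the second disk unit, then rebuild its block from
# the parity value and the blocks at the same location on the other disks.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])
```

Because XOR is its own inverse, combining the parity value with every surviving block leaves exactly the missing block. The same property explains why single parity cannot recover from two simultaneous failures: one equation cannot yield two unknowns.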

For an overview of device parity protection, see Device parity protection.

i5/OS supports two types of device parity protection:

RAID 5

With RAID 5, the system can continue to operate if one disk fails in a parity set. If more than one disk fails, data will be lost and you must restore the data for the entire system (or only the affected disk pool) from the backup media. Logically, the capacity of one disk unit is dedicated to storing parity data in a parity set consisting of 3 to 18 disk units.

RAID 6

With RAID 6, the system can continue to operate if one or two disks fail in a parity set. If more than two disk units fail, you must restore the data for the entire system (or only the affected disk pool) from the backup media. Logically, the capacity of two disk units is dedicated to storing parity data in a parity set consisting of 4 to 18 disk units.

See Elements of device parity protection for a detailed comparison of RAID 5 and RAID 6.
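As a rough planning aid, the capacity cost of each RAID level can be worked out from the figures above. The following sketch assumes that every disk unit in the parity set has the same capacity; the function name, the dictionary layout, and the 70 GB disk size are illustrative only and do not come from i5/OS.

```python
# Limits taken from the RAID 5 and RAID 6 descriptions above:
# (minimum disk units, maximum disk units, disk units' worth of parity).
PARITY_RULES = {5: (3, 18, 1), 6: (4, 18, 2)}


def parity_set_summary(raid_level, disk_count, disk_capacity_gb):
    """Return usable capacity and tolerated failures for one parity set.

    Assumes all disk units in the set have the same capacity.
    """
    if raid_level not in PARITY_RULES:
        raise ValueError("only RAID 5 and RAID 6 are covered here")
    minimum, maximum, parity_units = PARITY_RULES[raid_level]
    if not minimum <= disk_count <= maximum:
        raise ValueError(
            f"RAID {raid_level} parity sets need {minimum} to {maximum} disk units"
        )
    return {
        "usable_gb": (disk_count - parity_units) * disk_capacity_gb,
        "tolerated_failures": parity_units,
    }


print(parity_set_summary(5, 8, 70))  # usable_gb 490, tolerated_failures 1
print(parity_set_summary(6, 8, 70))  # usable_gb 420, tolerated_failures 2
```

With eight disk units, moving from RAID 5 to RAID 6 gives up one more disk unit of capacity in exchange for surviving a second disk failure.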

Write cache and auxiliary write cache IOA

When the system sends a write operation, the data is first written to the write cache on the disk IOA and then later written to the disk. If the IOA fails, the data in the cache may be lost, which can cause an extended outage while the system is recovered.

The auxiliary write cache is an additional IOA that has a one-to-one relationship with a disk IOA. The auxiliary write cache protects against extended outages due to the failure of a disk IOA or its cache by providing a copy of the write cache that can be recovered following the repair of the disk IOA. This avoids a potential system reload and gets the system back online as soon as the disk IOA is replaced and the recovery procedure completes. However, the auxiliary write cache is not a failover device and cannot keep the system operational if the disk IOA, or its cache, fails.
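The relationship between the write cache and its auxiliary copy can be pictured with a small model. This is only a conceptual sketch: the class CachedDiskIOA, the function replace_failed_ioa, and the use of dictionaries for the cache and the disk are all hypothetical, and the real recovery procedure is carried out by the hardware and service tools rather than by application code.

```python
class CachedDiskIOA:
    """Toy model of a disk IOA whose write cache may have an auxiliary copy."""

    def __init__(self, auxiliary_cache=None):
        self.cache = {}                         # writes not yet on disk
        self.auxiliary_cache = auxiliary_cache  # dict held by a separate IOA, or None

    def write(self, address, value):
        # Data lands in the write cache first; the auxiliary copy is kept in step.
        self.cache[address] = value
        if self.auxiliary_cache is not None:
            self.auxiliary_cache[address] = value

    def flush(self, disk):
        # Later, the cached data is written to the disk and the caches are emptied.
        disk.update(self.cache)
        self.cache.clear()
        if self.auxiliary_cache is not None:
            self.auxiliary_cache.clear()


def replace_failed_ioa(auxiliary_cache):
    """Model the repair: a new disk IOA recovers the cache from the auxiliary copy."""
    new_ioa = CachedDiskIOA(auxiliary_cache)
    if auxiliary_cache is not None:
        new_ioa.cache = dict(auxiliary_cache)
    return new_ioa


disk = {}
aux = {}
ioa = CachedDiskIOA(aux)
ioa.write(1, "first record")
ioa.flush(disk)
ioa.write(2, "second record")   # still only in cache when the IOA fails

recovered = replace_failed_ioa(aux)
recovered.flush(disk)
print(sorted(disk))
```

Note that nothing in the model serves reads or writes between the failure and the replacement, which mirrors the point above that the auxiliary write cache is a recovery aid and not a failover device.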

For detailed information, see Write cache and auxiliary write cache IOA in Disk management.

Mirrored protection

Disk mirroring is recommended to provide the best system availability and the maximum protection against disk-related component failures. Data is protected because the system keeps two copies of the data on two separate disk units. When a disk-related component fails, the system may continue to operate without interruption by using the mirrored copy of the data until the failed component is repaired.
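The behavior of a mirrored pair can be sketched in a few lines. The class MirroredPair and its methods are hypothetical, and the resynchronization performed in repair is an assumption about how a repaired unit is brought back in step; the actual mechanism is internal to the system.

```python
class MirroredPair:
    """Toy model of two disk units that hold identical copies of the data."""

    def __init__(self):
        self.units = [{}, {}]
        self.active = [True, True]

    def write(self, address, value):
        if not any(self.active):
            raise IOError("both mirrored units have failed")
        # Every write goes to each unit that is still operational.
        for unit, is_active in zip(self.units, self.active):
            if is_active:
                unit[address] = value

    def read(self, address):
        # Any operational unit can satisfy the read.
        for unit, is_active in zip(self.units, self.active):
            if is_active:
                return unit[address]
        raise IOError("both mirrored units have failed")

    def fail(self, index):
        self.active[index] = False

    def repair(self, index):
        # Assumption: the repaired unit is refreshed from the surviving copy.
        survivor = 1 - index
        if not self.active[survivor]:
            raise IOError("no surviving copy to resynchronize from")
        self.units[index] = dict(self.units[survivor])
        self.active[index] = True


pair = MirroredPair()
pair.write(100, "payroll")
pair.fail(0)                    # one disk unit fails
pair.write(101, "orders")       # the system keeps running on the mirror
print(pair.read(100), pair.read(101))
pair.repair(0)
```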

Different levels of mirrored protection are possible, depending on what hardware is duplicated. The level of mirrored protection determines whether the system keeps running when different levels of hardware fail. To understand these different levels of protection, see Determine the level of protection.

You can duplicate the following disk-related hardware:

Disk units
Disk controllers
I/O bus units
I/O adapters
I/O processors
Buses
Expansion towers
HSL rings

For details on mirrored protection, including how it works and how to plan for it, see Mirrored protection.

Independent disk pools

Independent disk pools (also called independent auxiliary storage pools) enable you to prevent certain unplanned outages because the data on them is isolated from the rest of your server. If an independent disk pool fails, your system can continue to operate on data in other disk pools. Combined with different levels of disk protection, independent disk pools provide more control in isolating the effect of a disk-related failure as well as better prevention and recovery techniques. For detailed information on how to use independent disk pools, see Independent disk pools.

Geographic mirroring

Geographic mirroring is a function that keeps two identical copies of an independent disk pool at two sites to provide high availability and disaster recovery. The copy owned by the primary node is the production copy and the copy owned by a backup node at the other site is the mirror copy. User operations and applications access the independent disk pool on the primary node, the node that owns the production copy. Geographic mirroring is a sub-function of cross-site mirroring (XSM), which is part of i5/OS Option 41, High Available Switchable Resources.

For details on geographic mirroring, including how it works and how to plan for it, see Geographic mirroring.

Multipath disk units

You can define up to eight connections from each logical unit number (LUN) created on the IBM TotalStorage Enterprise Storage Server (ESS) to the IOPs on an iSeries server. If you are using an ESS solution, assigning multiple paths to the same data allows the data to be accessed even if some of the other connections to the data fail. Each connection for a multipath disk unit functions independently. Several connections provide availability by allowing disk storage to be used even if a single path fails.
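The effect of independent paths can be modeled as follows. The sketch assumes a simple "first working path" selection; the class MultipathDiskUnit, the path names, and the selection policy are illustrative, and only the limit of eight connections comes from the text above.

```python
MAX_PATHS = 8  # up to eight connections per LUN, as stated above


class MultipathDiskUnit:
    """Toy model of a disk unit reachable over several independent paths."""

    def __init__(self, path_names):
        if not 1 <= len(path_names) <= MAX_PATHS:
            raise ValueError(f"a disk unit needs 1 to {MAX_PATHS} paths")
        # Each path has its own state; one failing does not affect the others.
        self.path_up = {name: True for name in path_names}

    def fail_path(self, name):
        self.path_up[name] = False

    def pick_path(self):
        # Hypothetical policy: use the first path that is still working.
        for name, is_up in self.path_up.items():
            if is_up:
                return name
        raise ConnectionError("no working path to the disk unit")


unit = MultipathDiskUnit(["iop1", "iop2", "iop3"])
unit.fail_path("iop1")
print(unit.pick_path())
```

The disk unit stays usable as long as at least one path remains; it becomes unreachable only when every defined path has failed.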

For details on multipath disk units, including their requirements, see Considerations for multipath disk units.