Change partitioned nodes to failed

Sometimes, a partitioned condition is reported when there really was a node outage. This can occur when cluster resource services loses communications with one or more nodes, but cannot detect if the nodes are still operational. When this condition occurs, a simple mechanism exists for you to indicate that the node has failed.

Attention: Telling cluster resource services that a node has failed makes recovery from the partition state simpler. However, do not change the node status to failed when the node is in fact still active and a true partition has occurred. Doing so can cause a node in more than one partition to assume the primary role for a cluster resource group. When two nodes each think they are the primary node, data such as files or databases can become disjoint or corrupted as the nodes independently make changes to their copies of the files. In addition, the two partitions cannot be merged back together when a node in each partition has been assigned the primary role.

When the status of a node is changed to Failed, the roles of the nodes in the recovery domain for each cluster resource group (CRG) in the partition may be reordered. The node being set to Failed is assigned as the last backup. If multiple nodes have failed and their status needs to be changed, the order in which the nodes are changed affects the final order of the recovery domain's backup nodes. If the failed node was the primary node for a CRG, the first active backup is reassigned as the new primary node.
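The effect of the change order can be illustrated with a small model. The following Python sketch is not how cluster resource services is implemented; it is a toy model that assumes a recovery domain can be represented as an ordered list whose first entry is the primary node, and that every node not yet marked Failed is active. The function name mark_failed and the node names are hypothetical.

```python
def mark_failed(recovery_domain, failed_node):
    """Return a new recovery domain order with failed_node as the last backup.

    recovery_domain is an ordered list of node names; index 0 is treated
    as the primary node (an assumption of this toy model).
    """
    if failed_node not in recovery_domain:
        raise ValueError(f"{failed_node} is not in the recovery domain")
    # Keep the relative order of the other nodes and move the failed node
    # to the end, so it becomes the last backup.
    remaining = [node for node in recovery_domain if node != failed_node]
    return remaining + [failed_node]


# Hypothetical four-node recovery domain; NODE_A is the primary.
domain = ["NODE_A", "NODE_B", "NODE_C", "NODE_D"]

# NODE_A and NODE_B have both failed. Change NODE_A first, then NODE_B.
order_a_then_b = mark_failed(mark_failed(domain, "NODE_A"), "NODE_B")

# Same failures, but change NODE_B first, then NODE_A.
order_b_then_a = mark_failed(mark_failed(domain, "NODE_B"), "NODE_A")

print(order_a_then_b)  # ['NODE_C', 'NODE_D', 'NODE_A', 'NODE_B']
print(order_b_then_a)  # ['NODE_C', 'NODE_D', 'NODE_B', 'NODE_A']
```

In both cases the first remaining active node (NODE_C) ends up in the primary position, but the two failed nodes land in a different backup order depending on which one was changed last.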

Related concepts
Merge
Rejoin
Related tasks
Tips: Cluster partitions
Related reference
CHGCLUNODE command
Change Cluster Node Entry API (QcstChangeClusterNodeEntry)
STRCLUNOD command
Start Cluster Node (QcstStartClusterNode) API

Using iSeries Navigator

Using iSeries Navigator for this task requires Option 41 (i5/OS™ - HA Switchable Resources) to be installed and licensed.

When cluster resource services has lost communications with a node but cannot detect if the node is still operational, a cluster node will have a status of Not communicating in the Nodes container in iSeries™ Navigator. You may need to change the status of the node from Not communicating to Failed. You will then be able to restart the node.

To change the status of a node from Not communicating to Failed, follow these steps:

  1. In iSeries Navigator, expand Management Central.
  2. Expand Clusters.
  3. Expand the cluster that contains the node for which you want to change the status.
  4. Click Nodes.
  5. Right-click the node for which you want to change the status, and select Cluster > Change Status.

To restart the node, follow these steps:

  1. Right-click the node, and select Cluster > Start.

Using CL commands and APIs

To change the status of a node from partitioned to failed, follow these steps:
  1. Use the CHGCLUNODE command or the Change Cluster Node Entry (QcstChangeClusterNodeEntry) API to change the status of a node from partitioned to failed. This should be done for all nodes that have actually failed.
  2. Use the STRCLUNOD command or the Start Cluster Node (QcstStartClusterNode) API to start the cluster node, allowing the node to rejoin the cluster.