Use these tips for cluster partitions.
- The rules for restricting operations within a partition are designed
to make merging the partitions feasible. Without these restrictions, reconstructing
the cluster requires extensive work.
- If the nodes in the primary partition have been destroyed, special
processing may be necessary in a secondary partition. The most common scenario
that causes this condition is the loss of the site that made up the
primary partition. Use the example in recovering from partition errors and
assume that Partition 1 was destroyed. In this case, the primary node for
Cluster Resource Groups B, C, and D must be located in Partition 2. The simplest
recovery is to use Change Cluster Node Entry to set both Node A and Node B
to failed. See changing partitioned nodes to failed for more information about
how to do this. To recover manually instead, perform these operations:
- Remove Nodes A and B from the cluster in Partition 2. Partition
2 is now the cluster.
- Establish any logical replication environments needed in the new cluster,
for example by using the Start Cluster Resource Group API or CL command.
Because nodes have been removed from the cluster definition in Partition
2, an attempt to merge Partition 1 and Partition 2 will fail. To
correct the mismatch in cluster definitions, run the Delete Cluster (QcstDeleteCluster) API on
each node in Partition 1. Then add the nodes from Partition 1 to the cluster,
and reestablish all the cluster resource group definitions, recovery domains,
and logical replication. This requires a great deal of work and is prone to errors.
Perform this procedure only in a site loss situation.
- Processing a start node operation is dependent on the status of
the node that is being started:
The node either failed or an End Node operation ended the node:
- Cluster resource services is started on the node that is being
started.
- Cluster definition is copied from an active node in the cluster
to the node that is being started.
- Any cluster resource group that has the node being started in
the recovery domain is copied from an active node in the cluster to the node
being started. No cluster resource groups are copied from the node that is
being started to an active node in the cluster.
The node is a partitioned node:
- The cluster definition of an active node is compared to the
cluster definition of the node that is being started. If the definitions are
the same, the start will continue as a merge operation. If the definitions
do not match, the merge will stop, and the user will need to intervene.
- If the merge continues, the node that is being started is set
to an active status.
- Any cluster resource group that has the node being started in
the recovery domain is copied from the primary partition of the cluster resource
group to the secondary partition of the cluster resource group. Cluster resource
groups may be copied from the node that is being started to nodes that are
already active in the cluster.
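
The start node rules above can be summarized as a small decision model. The following Python sketch is illustrative only and does not call any cluster API: the names Node, NodeStatus, and start_node are hypothetical, a cluster definition is reduced to a set of node names, and a cluster resource group is reduced to its recovery domain. It also assumes that a failed or ended node becomes active once cluster resource services has started, and it does not model the copying of cluster resource groups between partitions during a merge.

```python
from dataclasses import dataclass, field
from enum import Enum


class NodeStatus(Enum):
    ACTIVE = "active"
    FAILED = "failed"
    ENDED = "ended"            # ended by an End Node operation
    PARTITIONED = "partitioned"


@dataclass
class Node:
    """Hypothetical, simplified view of one cluster node."""
    name: str
    status: NodeStatus
    # This node's copy of the cluster definition, reduced to member names.
    definition: frozenset
    # Cluster resource group name -> recovery domain (tuple of node names).
    crgs: dict = field(default_factory=dict)


def start_node(active: Node, starting: Node) -> str:
    """Apply the start node rules and return a short outcome label."""
    if starting.status in (NodeStatus.FAILED, NodeStatus.ENDED):
        # The cluster definition is copied from an active node.
        starting.definition = active.definition
        # Only groups that list the starting node in their recovery domain
        # are copied, and only in the direction active -> starting.
        for crg_name, domain in active.crgs.items():
            if starting.name in domain:
                starting.crgs[crg_name] = domain
        # Assumption: the node is active once services have started.
        starting.status = NodeStatus.ACTIVE
        return "rejoined"

    if starting.status is NodeStatus.PARTITIONED:
        # A merge continues only if both sides hold the same definition.
        if starting.definition != active.definition:
            return "merge stopped"  # user intervention is required
        starting.status = NodeStatus.ACTIVE
        # Copying groups from primary to secondary partition is not modeled.
        return "merged"

    raise ValueError(f"node {starting.name} is already active")


node_a = Node("A", NodeStatus.ACTIVE, frozenset({"A", "B", "C"}),
              {"CRG1": ("A", "B"), "CRG2": ("A", "C")})
node_b = Node("B", NodeStatus.FAILED, frozenset())
node_c = Node("C", NodeStatus.PARTITIONED, frozenset({"C"}))

outcome_b = start_node(node_a, node_b)
outcome_c = start_node(node_a, node_c)
```

In this model, node C stays partitioned because its cluster definition differs from the definition on node A. This mirrors the manual recovery case described earlier: once nodes are removed from the definition in one partition, the definitions no longer match and a merge cannot proceed.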