A recovery domain is a subset of
cluster nodes that are grouped together in a cluster resource group (CRG)
for a common purpose such as performing a recovery action or synchronizing
events.
A recovery domain represents the nodes of the cluster from which a cluster resource can be accessed. Each node in the subset of cluster nodes assigned to a particular cluster resource group acts as a primary point of access, a secondary (backup) point of access, a replicate, or a peer point of access.
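The four types of roles a node can have in a recovery domain are:
- Primary: the cluster node that is the primary point of access for the cluster resource. If the primary node for a CRG fails, or a manual switchover is initiated, then the primary point of access for that CRG is moved to the first backup node.
- Backup: a cluster node that takes over the role of primary access if the current primary node fails or a manual switchover is initiated. Backup nodes are ordered (first backup, second backup, and so on), and the first backup takes over first.
- Replicate: a cluster node that holds a copy of the cluster resource but cannot become a point of access unless its role is changed manually.
- Peer: a cluster node that is not ordered and can be an active access point for the cluster resources.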
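The takeover rule for the primary-backup model can be illustrated with a small Python sketch. This is not an IBM i API. `Role` and `takeover_node` are hypothetical names, and the sketch assumes the recovery domain is given as an ordered list so that the first `BACKUP` entry is the first backup.

```python
from enum import Enum


class Role(Enum):
    PRIMARY = "primary"
    BACKUP = "backup"
    REPLICATE = "replicate"
    PEER = "peer"


def takeover_node(domain):
    """Return the node that takes over the primary point of access.

    `domain` is a list of (node, role) pairs in recovery-domain order, so the
    first BACKUP entry is the first backup. REPLICATE nodes are never chosen
    because they cannot assume the point of access without a manual role change.
    """
    for node, role in domain:
        if role is Role.BACKUP:
            return node
    return None  # no backup node is available to take over


domain = [
    ("A", Role.PRIMARY),
    ("B", Role.BACKUP),    # first backup
    ("C", Role.BACKUP),    # second backup
    ("D", Role.REPLICATE),
]
print(takeover_node(domain))  # B
```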
In a primary-backup model, each node in the recovery domain has a role with respect to the current operational environment of the cluster. This is called its current role in the recovery domain. As the cluster goes through operational changes, such as nodes ending, starting, or failing, the node's current role changes accordingly. Each node in the recovery domain also has a role with respect to the preferred, or ideal, cluster environment. This is called its preferred role in the recovery domain. The preferred role is a static definition that is set when the cluster resource group is created, and it does not change as the cluster environment changes. It changes only when nodes are added to or removed from the recovery domain, when a node is removed from the cluster, or when you change it manually.
Conceptually, you can view the recovery domain within a primary-backup model as follows:
Node | Current role | Preferred role |
---|---|---|
A | Backup 1 | Primary |
B | Backup 2 | Backup 1 |
C | Primary | Backup 2 |
D | Replicate | Replicate |
In this example, Nodes A, B, C, and D make up a CRG that uses the
primary-backup model. Node C is serving as the current primary node. Because
its preferred role is second backup, Node C's current role as primary results
from two failover or switchover actions. On the first failover or switchover,
the primary role moved from Node A to Node B, because Node B is defined as the
first backup. The second failover or switchover made Node C the primary node,
because it is defined as the second backup. Node D's current and preferred role
is replicate. A replicate node cannot assume the point of access during a
failover or switchover unless its role is changed manually to either primary or
backup.
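The two failovers in this example can be replayed with a short Python sketch that tracks current roles separately from the static preferred roles. The helper names are hypothetical. The sketch also assumes, because it reproduces the table above and is not taken from any IBM i API, that the node losing the primary role moves to the end of the backup order and that both failed nodes have since rejoined as active backups.

```python
def failover(primary, backups):
    """Move the primary point of access to the first backup.

    Assumption: the node that loses the primary role is appended to the end
    of the backup order (this reproduces the example table).
    """
    if not backups:
        raise RuntimeError("no backup node can take over")
    new_primary, *remaining = backups
    return new_primary, remaining + [primary]


def roles_of(primary, backups, replicates):
    """Map each node to its role name, numbering backups in order."""
    roles = {primary: "Primary"}
    for position, node in enumerate(backups, start=1):
        roles[node] = f"Backup {position}"
    for node in replicates:
        roles[node] = "Replicate"
    return roles


# Preferred roles are fixed when the CRG is created and never updated below.
preferred = roles_of("A", ["B", "C"], ["D"])

primary, backups = "A", ["B", "C"]
primary, backups = failover(primary, backups)  # first failover: B becomes primary
primary, backups = failover(primary, backups)  # second failover: C becomes primary
current = roles_of(primary, backups, ["D"])

for node in sorted(current):
    print(node, "|", current[node], "|", preferred[node])
```

Only `current` is recomputed after each failover; `preferred` is left untouched, mirroring the distinction between current and preferred roles.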
In the peer model, a node
within a cluster resource group can have one of two roles: peer or replicate.
Node | Current role | Preferred role |
---|---|---|
A | Peer | Peer |
B | Peer | Peer |
C | Peer | Peer |
D | Replicate | Replicate |
Nodes A, B, and C are defined in the recovery domain
as peer nodes. When a failure occurs on Node A, it is communicated to all
nodes in the recovery domain, regardless of their current role. The remaining
peer nodes, B and C, resume the operation at the point where Node A failed.
Node D contains the data but cannot resume the operation, because it is
defined as a replicate.
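The peer-model failure handling described above can be sketched in Python. `handle_peer_failure` is a hypothetical helper that only models who is notified and who can resume; it does not call any IBM i API, and it assumes roles are given as a simple node-to-role mapping.

```python
def handle_peer_failure(roles, failed):
    """Model a node failure in a peer CRG.

    `roles` maps node name -> "Peer" or "Replicate". Every surviving node is
    notified regardless of role, but only surviving peers can resume.
    """
    notified = sorted(node for node in roles if node != failed)
    resuming = sorted(
        node for node, role in roles.items() if role == "Peer" and node != failed
    )
    return notified, resuming


roles = {"A": "Peer", "B": "Peer", "C": "Peer", "D": "Replicate"}
notified, resuming = handle_peer_failure(roles, "A")
print(notified)  # ['B', 'C', 'D']
print(resuming)  # ['B', 'C']
```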
Any number of nodes can be designated as peers or replicates. Peer nodes are not ordered and can become an active access point for the cluster resources. Replicate nodes are also not ordered, but they cannot become an active access point for the cluster resources unless the Change Cluster Resource Group (QcstChangeClusterResourceGroup) API is used to change their role from replicate to peer.
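The effect of promoting a replicate to a peer can be modeled with a small Python class. `PeerRecoveryDomain` and its `change_role` method are hypothetical stand-ins for the role change that the QcstChangeClusterResourceGroup API performs; they do not reflect that API's actual parameters or calling convention. Access points are returned as a set because peers are not ordered.

```python
class PeerRecoveryDomain:
    """Hypothetical model of a peer CRG's recovery domain."""

    def __init__(self, roles):
        self.roles = dict(roles)  # node -> "Peer" or "Replicate"

    def access_points(self):
        # Peers are unordered, so any peer may serve as an access point.
        return {node for node, role in self.roles.items() if role == "Peer"}

    def change_role(self, node, new_role):
        # Stand-in for the role change made through the
        # QcstChangeClusterResourceGroup API (not its real signature).
        if new_role not in ("Peer", "Replicate"):
            raise ValueError("a peer CRG supports only the Peer and Replicate roles")
        if node not in self.roles:
            raise KeyError(node)
        self.roles[node] = new_role


domain = PeerRecoveryDomain({"A": "Peer", "B": "Peer", "C": "Peer", "D": "Replicate"})
print(sorted(domain.access_points()))  # ['A', 'B', 'C']
domain.change_role("D", "Peer")
print(sorted(domain.access_points()))  # ['A', 'B', 'C', 'D']
```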