A recovery domain is a subset of cluster nodes that are grouped together in a cluster resource group (CRG) for a common purpose such as performing a recovery action or synchronizing events.
A domain represents those nodes of the cluster from which a cluster resource can be accessed. The subset of cluster nodes that is assigned to a particular cluster resource group supports the primary point of access, a secondary (backup) point of access, a replicate point of access, or a peer point of access.
The four types of roles a node can have in a recovery domain are primary, backup, replicate, and peer.
If the primary node for a CRG fails, or a manual switchover is initiated, then the primary point of access for that CRG is moved to the first backup node.
For nodes that participate in a primary-backup model, each node in the recovery domain has a role with respect to the current operational environment of the cluster. This is called its current role in the recovery domain. As the cluster goes through operational changes such as nodes ending, nodes starting, and nodes failing, the node's current role changes accordingly. Each node in the recovery domain also has a role with respect to the preferred or ideal cluster environment. This is called its preferred role in the recovery domain. The preferred role is a static definition that is initially set when the cluster resource group is created. As the cluster environment changes, this role does not change. The preferred role changes only when nodes are added to or removed from the recovery domain, when a node is removed from the cluster, or when you change the preferred roles manually.
Conceptually, you can view the recovery domain within a primary-backup model as follows:
Node | Current role | Preferred role |
---|---|---|
A | Backup 1 | Primary |
B | Backup 2 | Backup 1 |
C | Primary | Backup 2 |
D | Replicate | Replicate |
In this example, Nodes A, B, C, and D make up the recovery domain of a CRG that uses the primary-backup model. Node C is serving as the current primary node. Because its preferred role is second backup, Node C's current role as primary is the result of two failover or switchover actions. On the first failover or switchover, the primary role moved from Node A to Node B because Node B is defined as the first backup. The second failover or switchover caused Node C to become the primary node because it is defined as the second backup. Node D's current and preferred role is replicate. A replicate node cannot assume the point of access during a failover or switchover unless its role is manually changed to either primary or backup.
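The two failovers described above can be traced with a small model. This is only an illustrative sketch, not an IBM i interface: the `RecoveryDomain` class and its methods are hypothetical names, and it assumes that the former primary is placed at the end of the backup order after each failover, which is consistent with the current roles shown in the table.

```python
from dataclasses import dataclass, field


@dataclass
class RecoveryDomain:
    """Toy model of a primary-backup recovery domain (hypothetical, not an IBM i API)."""

    ordered: list                      # index 0 = primary, 1 = backup 1, 2 = backup 2, ...
    replicates: list = field(default_factory=list)

    def __post_init__(self):
        # Preferred roles are captured once, when the domain is created,
        # and are not touched by later failovers.
        self.preferred = self.current_roles()

    def current_roles(self):
        """Derive each node's current role from its position in the order."""
        roles = {}
        for position, node in enumerate(self.ordered):
            roles[node] = "Primary" if position == 0 else f"Backup {position}"
        for node in self.replicates:
            roles[node] = "Replicate"
        return roles

    def failover(self):
        """Move the primary point of access to the first backup.

        Assumption: the former primary becomes the last backup.
        Replicate nodes are never considered.
        """
        if len(self.ordered) < 2:
            raise RuntimeError("no backup node is available to take over")
        self.ordered.append(self.ordered.pop(0))


domain = RecoveryDomain(ordered=["A", "B", "C"], replicates=["D"])
domain.failover()  # A -> B
domain.failover()  # B -> C

current = domain.current_roles()
for node in "ABCD":
    print(f"{node} | {current[node]} | {domain.preferred[node]}")
```

Under these assumptions, the printed rows match the table: the current roles shift with each failover while the preferred roles stay as they were when the domain was created.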
For the peer model, a node within a cluster resource group can have one of two roles: peer or replicate.
Node | Current role | Preferred role |
---|---|---|
A | Peer | Peer |
B | Peer | Peer |
C | Peer | Peer |
D | Replicate | Replicate |
Nodes A, B, and C are defined in the recovery domain as peer nodes. When a failure occurs on Node A, it is communicated to all nodes in the recovery domain regardless of their current role. The remaining peer nodes resume the operation from the point at which Node A failed. Node D contains the data, but cannot resume the operation because it is defined as a replicate.
Any number of nodes can be designated as peer or replicate. Peer nodes are not ordered, and any of them can become an active access point for the cluster resources. Replicate nodes are also not ordered, and a replicate node cannot become an active access point for the cluster resource unless the Change Cluster Resource Group (QcstChangeClusterResourceGroup) API is used to change its role from replicate to peer.
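The peer rules above can be expressed as a simple eligibility check. This is a minimal sketch under stated assumptions: `eligible_access_points` is a hypothetical helper, and the role change is modeled as a plain dictionary update that stands in for a manual change from replicate to peer. It does not call or imitate the QcstChangeClusterResourceGroup API.

```python
def eligible_access_points(roles, failed=()):
    """Return the nodes that may act as an access point.

    Only nodes whose role is "Peer" and that have not failed qualify.
    The result is sorted for readable output; peers themselves are unordered.
    """
    return sorted(
        node for node, role in roles.items()
        if role == "Peer" and node not in failed
    )


# Recovery domain from the peer-model table.
roles = {"A": "Peer", "B": "Peer", "C": "Peer", "D": "Replicate"}

# Node A fails: the remaining peers can continue, the replicate cannot.
after_failure = eligible_access_points(roles, failed={"A"})

# Hypothetical manual role change of Node D from replicate to peer.
promoted = {**roles, "D": "Peer"}
after_promotion = eligible_access_points(promoted, failed={"A"})

print(after_failure)    # ['B', 'C']
print(after_promotion)  # ['B', 'C', 'D']
```

Node D becomes eligible only after its role is changed, which mirrors the restriction on replicate nodes described above.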