Cluster Resource Group Exit Program

Required Parameter Group:

1	Success indicator	Output	Binary(4)
2	Action code	Input	Binary(4)
3	Exit program data	Input	Char(256)
4	Information given to user	Input	Char(*)
5	Format name	Input	Char(8)

Include: QSYSINC/QCSTCRG3

For each cluster resource group that has an exit program specified, the exit program is called when various Cluster Resource Services APIs are used or when various cluster events occur. The exit program is called on each active node in the cluster resource group's recovery domain and is passed an Action Code that tells the exit program what function to perform.

An active node in the cluster resource group's recovery domain means that cluster resource services and the job for the particular cluster resource group are running on the node.

The exit program is required for data, application, and peer cluster resource groups and is responsible for providing and managing the environment necessary for the resource's resilience.

The exit program is optional for device cluster resource groups because the system manages resilient devices. An exit program may be specified for a device cluster resource group if a user has additional functions to perform during various APIs or cluster events.

The exit program is called from a separate job which is started with the user profile specified on the Create Cluster Resource Group (QcstCreateClusterResourceGroup) API. For most action codes, Cluster Resource Services waits for the exit program to finish before continuing. A time out is not used. If the exit program goes into a long wait such as waiting for a response to a message sent to an operator, no other work will be started for the affected cluster resource group. In the case of a long wait during failover processing for a node failure, all Cluster Resource Services jobs are affected and no other cluster work will be started. Care should be exercised in the exit program when the possibility of a long wait exists.

In general if the exit program is unsuccessful or ends abnormally, the exit program will be called a second time with an action code of Undo. This allows any unfinished activity to be backed out and the original state of the cluster resource group and the resilient resource to be restored. There are some exceptions to this general statement about Undo. Some APIs continue even if the exit program is not successful and do not make a second call with an Undo action code. Also, an application cluster resource group exit program is not called with Undo if it fails while processing the Start action code for a Switchover or Failover.

More information on action codes, functions an exit program should perform, and what causes an exit program to be called is presented after the exit program parameters are described.

The exit program is restricted to the Cluster Resource Services APIs or commands it can use. Only the following are allowed:

Distribute Information (QcstDistributeInformation) API
List Cluster Information (QcstListClusterInfo) API
List Cluster Resource Group Information (QcstListClusterResourceGroupIn) API
List Cluster Resource Groups (QcstListClusterResourceGroups) API
List Device Domain Information (QcstListDeviceDomainInfo) API
Retrieve Cluster Information (QcstRetrieveClusterInfo) API
Retrieve Cluster Resource Services Information (QcstRetrieveCRSInfo) API
DSPCLUINF Command
DSPCRGINF Command

Also, the exit program must follow these guidelines to run properly in the job Cluster Resource Services starts for it and to handle error conditions correctly.

The exit program cannot be in an independent auxiliary storage pool.
It must run in a named activation group or the caller's activation group (*CALLER), when the cluster resource group exit program is an ILE program.
It must have a cancel handler to deal with situations where the job is cancelled. In particular, an application cluster resource group exit program will have its job cancelled as part of the Initiate Switchover (QcstInitiateSwitchOver) API.
It should have an exception handler to deal with unexpected exceptions and perform as much cleanup as possible at that point while it can still interrogate program state information.
It may use threads but the job description in the user profile for the job must specify that threads are allowed.
The user profile that the cluster resource group exit program runs under must not have an IASP associated with the user profile job description.

Note: See Cluster Resource Services Job Structure for additional information about jobs used to call exit programs.

High Availability Business Partners (HABPs) provide software products that replicate data to other nodes in a cluster by using data cluster resource groups. Application cluster resource groups may have dependencies on these data cluster resource groups. An application cluster resource group exit program can be used to coordinate activities with data cluster resource group exit programs that are provided by an HABP.

Sample source code that can be used as the basis for writing an exit program is shipped in the QUSRTOOL library. See the TCSTAPPEXT and TCSTDTAARA members in the QATTSYSC file for an example written in ILE C.

Authorities and Locks

None.

Required Parameter Group

Success indicator

OUTPUT; BINARY(4)

Indicates to Cluster Resource Services the results of the cluster resource group exit program. The exit program must set this parameter before it ends. If the job running the exit program is cancelled before the exit program ends, the exit program cancel handler should set this parameter. Possible values of this parameter are:

0	Successful.
1	Unsuccessful, do not attempt restart.
2	Unsuccessful, attempt restart (applies only to an application cluster resource group).

Some APIs ignore this field. In other words, regardless of the value set by the exit program, these functions continue to completion and do not back out partial results or call the exit program a second time with an Undo action code. The exit program should therefore make every attempt to complete successfully for these APIs. This field is ignored by the following:

Change Cluster Node Entry (QcstChangeClusterNodeEntry) API
Change Cluster Node Entry (CHGCLUNODE) Command
Delete Cluster (QcstDeleteCluster) API
Delete Cluster (DLTCLU) Command
Delete Cluster Resource Group (QcstDeleteClusterResourceGroup) API for the Delete action code
Delete Cluster Resource Group From Cluster command for the Delete action code
Delete Cluster Resource Group (DLTCRG) command
End Cluster Node (QcstEndClusterNode) API
Remove Cluster Node Entry (QcstRemoveClusterNodeEntry) API
End Cluster Node (ENDCLUNOD) command
Remove Cluster Node Entry (RMVCLUNODE) command
Failover Cancelled action code

An informational alert message, CPIBB10, will be sent if the exit program returns anything other than Successful, has an unhandled exception, or the job running the exit program is cancelled.

See When the Exit Program Ends for additional information on the Success indicator.
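The success indicator values can be captured as constants in an exit program. The following is a minimal sketch, not the shipped sample: the constant names and the helper choose_success_indicator are hypothetical, and the policy of requesting a restart only for application cluster resource groups simply mirrors the restriction stated for value 2.

```c
/* Success indicator values from the list above (names are illustrative). */
enum {
    SUCCESS_OK = 0,              /* Successful */
    SUCCESS_FAIL_NO_RESTART = 1, /* Unsuccessful, do not attempt restart */
    SUCCESS_FAIL_RESTART = 2     /* Unsuccessful, attempt restart (application CRG only) */
};

/* Cluster resource group type 2 is application resilience. */
enum { CRG_TYPE_APPLICATION = 2 };

/*
 * Hypothetical helper: choose the value to store in the Success indicator
 * parameter. work_ok is nonzero when the requested action completed;
 * want_restart is nonzero when a restart of the application is desired.
 */
int choose_success_indicator(int crg_type, int work_ok, int want_restart)
{
    if (work_ok)
        return SUCCESS_OK;
    if (want_restart && crg_type == CRG_TYPE_APPLICATION)
        return SUCCESS_FAIL_RESTART;
    return SUCCESS_FAIL_NO_RESTART;
}
```

An exit program, and its cancel handler, could store the result of such a helper in the first parameter before ending.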

Action code

INPUT; BINARY(4)

Identifies the cluster API or event that is being processed and, therefore, the action the exit program should perform. The action codes listed below apply to all cluster resource group types unless otherwise specified. See also the Action code dependent data field in Field Descriptions, which further qualifies the action code.

Possible action codes:

1 - Initialize	A cluster resource group object is being created.
2 - Start	Establish resilience for a cluster resource group. In addition, start the application for an application cluster resource group. The exit program job for an application cluster resource group remains active on the primary node. Thus when a cluster resource group API is called while an application is running, a second job will be submitted on the primary node to run the exit program to handle the API action code. For a peer cluster resource group all peer nodes will be started and the exit program will be called.
3 - Restart	Restarts an application. Applies only to an application cluster resource group which has been configured to allow restart.
4 - End	Disable resilience of a cluster resource group on all nodes in the recovery domain.
5 - Verification phase	Provides a chance for the cluster resource group exit program to verify that it really wants to perform the requested function before actually doing the specified function. This is similar to a pre-exit program.
7 - Delete	The cluster resource group object is being deleted.
8 - Rejoin	An inactive node is being activated or communication with a partitioned node is re-established. If cluster resource services is not running on any node in the cluster, the first node which is started will cause the exit program to be called with an action code of Rejoin. In this case, there will be no other active nodes in the cluster.
9 - Failover	A node failure has occurred.
10 - Switchover	Either the Initiate Switchover (QcstInitiateSwitchOver) API or the Change Cluster Resource Group Primary (CHGCRGPRI) command is being processed. The role of primary is being assigned to the node that has the current role of first backup. Does not apply to peer cluster resource groups.
11 - Add Node	A node is being added to the recovery domain. If the cluster resource group is active and a peer node is being added to a peer cluster resource group, the node will be an active access point.
12 - Remove Node	A node is being removed from the recovery domain. When an active node is being removed with either the Remove Cluster Node Entry (QcstRemoveClusterNodeEntry) API or the Remove Cluster Node Entry (RMVCLUNODE) command, only the node being removed sees this action code. Other nodes in the recovery domain will see the Failover action code. When an inactive node is being removed with either the Remove Cluster Node Entry (QcstRemoveClusterNodeEntry) API or the Remove Cluster Node Entry (RMVCLUNODE) command, all nodes see this action code.
13 - Change	The nodes listed in the recovery domain have changed their role. For a peer cluster resource group all the recovery domain nodes are provided even if only some of them have their role changed.
14 - Delete Command	The cluster resource group is being deleted on the local node only.
15 - Undo	Back out the prior request. The prior request is in the Prior action code field.
16 - End Node	A cluster node is ending. Only the node being ended will see this action code. Other nodes in the recovery domain will see the Failover action code.
17 - Add Device Entry	A device is being added to a device cluster resource group. Applies only to a device cluster resource group.
18 - Remove Device Entry	A device is being removed from a device cluster resource group. Applies only to a device cluster resource group.
19 - Change Device Entry	Information about a device in a device cluster resource group is being changed. Applies only to a device cluster resource group.
20 - Change Node Status	The status of a node is being changed from partitioned to failed.
21 - Failover Cancelled	A node failure occurred, but the failover was cancelled through the use of the failover message queue.
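As an illustration of how an exit program might decode the action code, the sketch below maps the numeric values listed above to their names. The table and the function names crg_action_name and crg_action_is_device_only are hypothetical and do not come from the QSYSINC include; value 6 is not listed above and is treated as unknown.

```c
#include <stddef.h>

/* Names indexed by action code; NULL marks values not listed in this document. */
static const char *const action_names[] = {
    NULL, "Initialize", "Start", "Restart", "End", "Verification phase",
    NULL, "Delete", "Rejoin", "Failover", "Switchover", "Add Node",
    "Remove Node", "Change", "Delete Command", "Undo", "End Node",
    "Add Device Entry", "Remove Device Entry", "Change Device Entry",
    "Change Node Status", "Failover Cancelled"
};

/* Return the documented name for an action code, or "Unknown". */
const char *crg_action_name(int code)
{
    size_t n = sizeof action_names / sizeof action_names[0];

    if (code < 0 || (size_t)code >= n || action_names[code] == NULL)
        return "Unknown";
    return action_names[code];
}

/* Action codes 17 through 19 apply only to device cluster resource groups. */
int crg_action_is_device_only(int code)
{
    return code >= 17 && code <= 19;
}
```

A lookup like this is mainly useful for logging; the actual work would still be dispatched per action code.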

Exit program data

INPUT; CHAR(256)

Because this parameter is passed between nodes in the cluster, it can contain anything except pointers. For example, it can be used to provide state information. The owner of the cluster resource group knows the layout of the information contained in this parameter.

This data comes into existence when the cluster resource group object is created with the Create Cluster Resource Group (QcstCreateClusterResourceGroup) API or the Create Cluster Resource Group (CRTCRG) command. For the ways in which this data can be changed, see the description of each API in Cluster Resource Group APIs.
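Because the exit program data must not contain pointers, one approach is to keep state in a fixed-size structure and copy it into the 256-byte area. The following is only a sketch: the struct app_state layout and the pack and unpack helpers are hypothetical, chosen by the owner of the cluster resource group, and are not defined by Cluster Resource Services.

```c
#include <stdint.h>
#include <string.h>

#define EXIT_PGM_DATA_LEN 256  /* Exit program data is CHAR(256) */

/* Hypothetical layout: fixed-size fields only, no pointers. */
struct app_state {
    int32_t layout_version;
    int32_t last_checkpoint;
    char    data_library[10];  /* blank padded, not null terminated */
};

/* Compile-time check that the layout fits in the 256-byte area. */
typedef char app_state_fits[sizeof(struct app_state) <= EXIT_PGM_DATA_LEN ? 1 : -1];

/* Copy the state into a zero-filled 256-byte exit program data area. */
void app_state_pack(char out[EXIT_PGM_DATA_LEN], const struct app_state *s)
{
    memset(out, 0, EXIT_PGM_DATA_LEN);
    memcpy(out, s, sizeof *s);
}

/* Recover the state from the exit program data passed to the exit program. */
void app_state_unpack(struct app_state *s, const char in[EXIT_PGM_DATA_LEN])
{
    memcpy(s, in, sizeof *s);
}
```

The packing side would run where the cluster resource group is created or changed; the exit program itself only receives the data as input.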

Information given to user

INPUT; CHAR(*)

Detailed information for this exit program call. See the EXTP0100 Format or EXTP0200 Format for more information.

Format name

INPUT; CHAR(8)

The format of the information provided in the Information Given To User parameter. If the exit program is called with a second action code such as Undo, the format contains the same data as was passed with the original action code. The supported format names are:

EXTP0100	This format is used for actions. This value is allowed for primary-backup model and peer model cluster resource groups.
EXTP0200	Same as EXTP0100, with additional information on site name and data port IP addresses on each node in the recovery domain. This value is allowed for primary-backup model cluster resource groups.

EXTP0100 Format

This format contains information for the cluster event.

Offset (Dec)	Offset (Hex)	Type	Field
0	0	BINARY(4)	Length of information given to user
4	4	CHAR(10)	Cluster name
14	E	CHAR(10)	Cluster resource group name
24	18	BINARY(4)	Cluster resource group type
28	1C	BINARY(4)	Cluster resource group status
32	20	CHAR(16)	Request handle
48	30	BINARY(4)	Node role type
52	34	CHAR(8)	Current node id
60	3C	CHAR(8)	Changing node ID
68	44	BINARY(4)	Changing node role
72	48	CHAR(16)	Takeover IP address
88	58	CHAR(10)	Job name
98	62	CHAR(2)	Reserved
100	64	BINARY(4)	Prior action code
104	68	BINARY(8)	Cluster resource group changes
112	70	BINARY(4)	Offset to recovery domain array
116	74	BINARY(4)	Number of nodes in the recovery domain
120	78	BINARY(4)	Original cluster resource group status
124	7C	BINARY(4)	Action code dependent data
128	80	BINARY(4)	Offset to prior recovery domain array
132	84	BINARY(4)	Number of nodes in the prior recovery domain array
136	88	BINARY(4)	Offset to configuration object array
140	8C	BINARY(4)	Number of entries in configuration object array
144	90	BINARY(4)	Length of configuration object array entry
148	94	BINARY(8)	Cluster resource group attributes
156	9C	CHAR(10)	Distribute information user queue name
166	A6	CHAR(10)	Distribute information user queue library name
176	B0	BINARY(4)	Failover wait time
180	B4	BINARY(4)	Failover default action
184	B8	CHAR(10)	Failover message queue name
194	C2	CHAR(10)	Failover message queue library name
204	CC	BINARY(4)	Cluster version
208	D0	BINARY(4)	Cluster version modification level
212	D4	CHAR(10)	Requesting user profile
222	DE	CHAR(1)	Reserved
223	DF	CHAR(1)	Allow active takeover IP address
224	E0	CHAR(20)	Application id
244	F4	BINARY(4)	Length of recovery domain array entry
248	F8	BINARY(4)	Length of prior recovery domain array entry
252	FC	CHAR(8)	Leader node id
*	*	Array(*) of CHAR(*)	Recovery domain array
These fields repeat, in the order listed, for each node in the recovery domain.		CHAR(8)	Node ID
		BINARY(4)	Node role
		BINARY(4)	Membership status
*	*	Array(*) of CHAR(*)	Prior recovery domain array
These fields repeat, in the order listed, for each node in the prior recovery domain.		CHAR(8)	Node ID
		BINARY(4)	Node role
		BINARY(4)	Membership status
*	*	Array(*) of CHAR(44)	Configuration object array
These fields repeat, in the order listed, for each entry in the configuration object array.		CHAR(10)	Configuration object name
		CHAR(2)	Reserved
		BINARY(4)	Configuration object type
		BINARY(4)	Device type
		BINARY(4)	Configuration object online
		BINARY(4)	Device subtype
		CHAR(16)	Server takeover IP address
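The recovery domain array in EXTP0100 is located through the offset, count, and entry-length fields of the fixed portion. The sketch below shows one way to step to a given entry. It assumes that the array offset is measured from the start of the format and that BINARY(4) fields are in the machine's native byte order; the helper names are hypothetical, and the declarations in the shipped include should be preferred where available.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decimal offsets taken from the EXTP0100 table above. */
enum {
    EXTP0100_OFF_RCYDMN_OFFSET = 112, /* Offset to recovery domain array */
    EXTP0100_OFF_RCYDMN_COUNT  = 116, /* Number of nodes in the recovery domain */
    EXTP0100_OFF_RCYDMN_ENTLEN = 244  /* Length of recovery domain array entry */
};

/* Read a BINARY(4) field in native byte order without alignment assumptions. */
static int32_t read_bin4(const char *buf, size_t off)
{
    int32_t v;

    memcpy(&v, buf + off, sizeof v);
    return v;
}

/*
 * Return a pointer to recovery domain entry `index` (0-based), or NULL when
 * the index is out of range. Each entry begins with the CHAR(8) Node ID.
 */
const char *extp0100_rcydmn_entry(const char *info, int32_t index)
{
    int32_t base  = read_bin4(info, EXTP0100_OFF_RCYDMN_OFFSET);
    int32_t count = read_bin4(info, EXTP0100_OFF_RCYDMN_COUNT);
    int32_t len   = read_bin4(info, EXTP0100_OFF_RCYDMN_ENTLEN);

    if (index < 0 || index >= count || len <= 0)
        return NULL;
    return info + base + (size_t)index * (size_t)len;
}
```

Stepping by the entry-length field rather than by a hard-coded 16 bytes keeps the code working if entries grow in a later release.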

EXTP0200 Format

This format contains information for the cluster event with additional information on site name and data port IP addresses on each node in the recovery domain.

Offset (Dec)	Offset (Hex)	Type	Field
0	0	BINARY(4)	Length of information given to user
4	4	CHAR(10)	Cluster name
14	E	CHAR(10)	Cluster resource group name
24	18	BINARY(4)	Cluster resource group type
28	1C	BINARY(4)	Cluster resource group status
32	20	CHAR(16)	Request handle
48	30	BINARY(4)	Node role type
52	34	CHAR(8)	Current node id
60	3C	CHAR(8)	Changing node ID
68	44	BINARY(4)	Changing node role
72	48	CHAR(16)	Takeover IP address
88	58	CHAR(10)	Job name
98	62	CHAR(2)	Reserved
100	64	BINARY(4)	Prior action code
104	68	BINARY(8)	Cluster resource group changes
112	70	BINARY(4)	Offset to recovery domain array
116	74	BINARY(4)	Number of nodes in the recovery domain
120	78	BINARY(4)	Length of recovery domain array entry
124	7C	BINARY(4)	Original cluster resource group status
128	80	BINARY(4)	Action code dependent data
132	84	BINARY(4)	Offset to prior recovery domain array
136	88	BINARY(4)	Number of nodes in the prior recovery domain array
140	8C	BINARY(4)	Length of prior recovery domain array entry
144	90	BINARY(4)	Offset to configuration object array
148	94	BINARY(4)	Number of entries in configuration object array
152	98	BINARY(4)	Length of configuration object array entry
156	9C	BINARY(8)	Cluster resource group attributes
164	A4	CHAR(10)	Distribute information user queue name
174	AE	CHAR(10)	Distribute information user queue library name
184	B8	BINARY(4)	Failover wait time
188	BC	BINARY(4)	Failover default action
192	C0	CHAR(10)	Failover message queue name
202	CA	CHAR(10)	Failover message queue library name
212	D4	BINARY(4)	Cluster version
216	D8	BINARY(4)	Cluster version modification level
220	DC	CHAR(10)	Requesting user profile
230	E6	CHAR(1)	Reserved
231	E7	CHAR(1)	Reserved
*	*	Array(*) of CHAR(*)	Recovery domain array
These fields repeat, in the order listed, for each node in the recovery domain.		BINARY(4)	Length of entry in the recovery domain
		CHAR(8)	Node ID
		BINARY(4)	Node role
		BINARY(4)	Membership status
		CHAR(8)	Site name
		BINARY(4)	Offset to data port IP address array
		BINARY(4)	Number of data port IP addresses
		Array(*) of CHAR(16)	Data port IP address
*	*	Array(*) of CHAR(*)	Prior recovery domain array
These fields repeat, in the order listed, for each node in the prior recovery domain.		BINARY(4)	Length of entry in the recovery domain
		CHAR(8)	Node ID
		BINARY(4)	Node role
		BINARY(4)	Membership status
		CHAR(8)	Site name
		BINARY(4)	Offset to data port IP address array
		BINARY(4)	Number of data port IP addresses
		Array(*) of CHAR(16)	Data port IP address
*	*	Array(*) of CHAR(44)	Configuration object array
These fields repeat, in the order listed, for each entry in the configuration object array.		CHAR(10)	Configuration object name
		CHAR(2)	Reserved
		BINARY(4)	Configuration object type
		BINARY(4)	Device type
		BINARY(4)	Configuration object online
		BINARY(4)	Device subtype
		CHAR(16)	Server takeover IP address
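In EXTP0200 each recovery domain entry starts with its own length, so entries can differ in size and must be walked one at a time. The following sketch looks up a node's role that way. It assumes the array offset is measured from the start of the format and that binary fields are in native byte order; the function name extp0200_find_node_role is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Read a BINARY(4) value in native byte order from an unaligned address. */
static int32_t bin4_at(const char *p)
{
    int32_t v;

    memcpy(&v, p, sizeof v);
    return v;
}

/*
 * Search the EXTP0200 recovery domain array for node_id (8 bytes, blank
 * padded). Returns 1 and stores the node role when found, 0 otherwise.
 * Entry layout used here: length at +0, Node ID at +4, Node role at +12.
 */
int extp0200_find_node_role(const char *info, const char *node_id,
                            int32_t *role_out)
{
    const char *entry = info + bin4_at(info + 112); /* Offset to recovery domain array */
    int32_t count = bin4_at(info + 116);            /* Number of nodes in the recovery domain */
    int32_t i;

    for (i = 0; i < count; i++) {
        int32_t entry_len = bin4_at(entry);

        if (entry_len <= 0)
            return 0; /* malformed entry; stop rather than loop forever */
        if (memcmp(entry + 4, node_id, 8) == 0) {
            *role_out = bin4_at(entry + 12);
            return 1;
        }
        entry += entry_len; /* advance by this entry's own length */
    }
    return 0;
}
```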

Field Descriptions

Action code dependent data. For some action codes, additional information is provided to describe the action code. This field is used during:

Delete action code
End action code
Failover action code
Failover Cancelled action code
Rejoin action code
Remove Node action code
Start action code
Undo action code
Verification Phase action code

The possible values are:

0 - No information	No additional information.
1 - Merge	Provided with the Rejoin action code to indicate cluster partitions are merging.
2 - Join	Provided with the Rejoin action code to indicate a node which was Inactive is joining the cluster.
3 - Partition failure	Provided with the Failover or End action code to indicate the cluster has become partitioned. When provided with Failover, this is a primary partition. When provided with End, this is a secondary partition.
4 - Node failure	Provided with the Failover or Failover Cancelled action code to indicate Cluster Resource Services has failed for the entire node. This may mean the node failed such as with a power failure or that all of Cluster Resource Services has failed such as when the QSYSWRK subsystem is cancelled but the node is still running.
5 - Member failure	Provided with the Failover, Failover Cancelled, and End Node action codes to indicate a failure affecting only this cluster resource group has been detected. While other cluster resource groups may be affected also, Cluster Resource Services is unable to determine that. An example of a member failure is when a single cluster resource group job is cancelled.
6 - End node	Provided with the Failover or Failover Cancelled action code to indicate an active node in the cluster has been ended by either the End Cluster Node (QcstEndClusterNode) API or the End Cluster Node (ENDCLUNOD) command.
7 - Remove node	Provided with the Failover action code to indicate an active node in the cluster has been removed by either the Remove Cluster Node Entry (QcstRemoveClusterNodeEntry) API or the Remove Cluster Node Entry (RMVCLUNODE) command.
8 - Application failure	Provided with the Failover or Failover Cancelled action code to indicate the application failed on the primary node and could not be restarted there. The failure may have been due to an exception in the application program, a success indicator of Unsuccessful, attempt restart when the restart count has already been reached, or a success indicator of Unsuccessful, do not attempt restart.
9 - Resource end	Provided with the End action code to indicate a resource has ended. This occurs when an application ends normally for an active application cluster resource group or when an active device cluster resource group has a hardware failure of an auxiliary storage pool that prevents failover. Normal ending of an application occurs when the success indicator is set to Successful and the job has not been cancelled or had an unhandled exception.
10 - Delete cluster	Provided with the Delete action code to indicate the cluster is being deleted with either the Delete Cluster (QcstDeleteCluster) API or the Delete Cluster (DLTCLU) command.
11 - Remove recovery domain node	Provided with the Remove Node action code to indicate the node is being removed with either the Remove Node From Recovery Domain (QcstRemoveNodeFromRcvyDomain) API or the Remove Cluster Resource Group Node Entry (RMVCRGNODE) command.
12 - Delete cluster resource group	Provided with the Verification phase action code to indicate the cluster resource group is being deleted with the Delete Cluster Resource Group API or commands. The Delete Cluster Resource Group API and the Delete Cluster Resource Group from Cluster command can be rejected if any recovery domain node fails the Verification phase (5) action code, and the exit program will not be called with an Undo action code.
13 - Failover	Provided with the Verification phase action code. This indicates the cluster resource group is being failed over to a backup node.
14 - Switchover	Provided with the Verification phase action code. This indicates the cluster resource group is being switched over to a backup node.
15 - Remove passive node	Provided with the Remove Node and the Delete action codes. This indicates a node that is not active is being removed from the cluster and needs to be removed from the recovery domain. The Delete action code is used if the node being removed is the only node in the recovery domain, or if no other node in the recovery domain is available to assume the primary role (that is, only replicates are left); in that case the cluster resource group is deleted from the cluster.

Allow active takeover IP address. Allows a takeover IP address to already be active when it is assigned to an application cluster resource group. This field is only valid when configure takeover IP address field is 0x01. Possible values are:

0	The takeover IP address must not already be active when starting the cluster resource group.
1	The takeover IP address is allowed to be active prior to starting the cluster resource group but only on the primary node.

Application id. This is an application identifier for the peer cluster resource group type. It identifies the application supplying the peer cluster resource group. The recommended format is 'vendor-id.name', where vendor-id is an identifier for the vendor creating the cluster resource group (for example, QIBM.ExamplePeer, which indicates the cluster resource group is supplied by IBM for the ExamplePeer application). It is not recommended to use QIBM as the vendor id unless the cluster resource group is supplied by IBM. This field only applies to peer cluster resource groups.

Changing node ID. The node in the recovery domain being assigned a new role or status. This field is hexadecimal zeroes if it doesn't apply.

A special value of *LIST is specified for this parameter when more than one node is changed. The special value is left-justified. When *LIST is specified, entries in the recovery domain array and the prior recovery domain array can be compared to determine which nodes have had changes to the node role or membership status.

This field is used during:

Add Node action code
Change Node Status action code
Change action code
End Node action code
Failover Cancelled action code
Switchover action code
Remove Node action code
Failover action code
Rejoin action code
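When the Changing node ID is *LIST, the changed nodes can be found by comparing the recovery domain array with the prior recovery domain array. The sketch below does that for EXTP0100-style entries. It assumes entries have already been copied into an array of a hypothetical 16-byte struct rd_entry; code that reads the format directly should step by the entry-length fields instead.

```c
#include <stdint.h>
#include <string.h>

/* One EXTP0100 recovery domain array entry: CHAR(8), BINARY(4), BINARY(4). */
struct rd_entry {
    char    node_id[8];
    int32_t node_role;
    int32_t membership_status;
};

/*
 * Count nodes whose role or membership status differs between the prior and
 * current arrays. Nodes present in only one array are not counted.
 */
int count_changed_nodes(const struct rd_entry *cur, int ncur,
                        const struct rd_entry *prior, int nprior)
{
    int changed = 0;
    int i, j;

    for (i = 0; i < ncur; i++) {
        for (j = 0; j < nprior; j++) {
            if (memcmp(cur[i].node_id, prior[j].node_id, 8) != 0)
                continue;
            if (cur[i].node_role != prior[j].node_role ||
                cur[i].membership_status != prior[j].membership_status)
                changed++;
            break; /* node IDs are unique, so stop at the first match */
        }
    }
    return changed;
}
```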

Changing node role. The role the node is being assigned. This field is used in the same situations as the Changing node ID field. The values are:

0	Primary node. Only one node can have this value.
>=1	Backup node. The backup order is designated by increasing value. The values need not be consecutive. No two backup nodes can have the same value. At the completion of the API, Cluster Resource Services will sequence the backups using consecutive numbers starting with 1.
-1	Replicate node. All replicates have this value.
-2	Changing node role not used by the action code being processed.
-3	*LIST. When *LIST is specified, entries in the recovery domain array and the prior recovery domain array can be compared to determine which nodes have had changes to the node role or membership status.
-4	Peer node. All peer nodes have this value.
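The role values above mix one open-ended range (backups) with several special negative values, which is easy to get wrong in a comparison chain. A minimal sketch of a classifier follows; the enum and function names are hypothetical.

```c
/* Categories for the Changing node role and Node role values listed above. */
enum node_role_kind {
    ROLE_PRIMARY,   /*  0   */
    ROLE_BACKUP,    /* >= 1 */
    ROLE_REPLICATE, /* -1   */
    ROLE_NOT_USED,  /* -2   */
    ROLE_LIST,      /* -3   */
    ROLE_PEER,      /* -4   */
    ROLE_INVALID    /* any other negative value */
};

/* Map a raw role value to its category. */
enum node_role_kind classify_node_role(int role)
{
    if (role == 0)
        return ROLE_PRIMARY;
    if (role >= 1)
        return ROLE_BACKUP;
    switch (role) {
    case -1: return ROLE_REPLICATE;
    case -2: return ROLE_NOT_USED;
    case -3: return ROLE_LIST;
    case -4: return ROLE_PEER;
    default: return ROLE_INVALID;
    }
}
```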

Cluster name. The name of the cluster containing the cluster resource group.

Cluster resource group attributes. A bit mask that identifies various cluster resource group attributes. The 64 bits in this field are numbered 0 thru 63 starting with the rightmost bit. If a bit is set to '1', it indicates the cluster resource group has that attribute. This field applies only to application cluster resource groups. The meaning of each of the bits is:

0	The takeover IP address is configured by the user
1-63	Reserved. These will be set to '0'.

Cluster resource group changes. A bit mask that identifies the fields in the cluster resource group that are being changed by the Change Cluster Resource Group API. Set to hexadecimal zeroes for all other exit program calls. The 64 bits in this field are numbered 0 thru 63 starting with the rightmost bit. If a bit is set to '1', it indicates that the action represented by the bit is occurring. Even though multiple bits may be set to indicate several things are being changed, the exit program is called only when the recovery domain is changed. For more information, see the Change Cluster Resource Group (QcstChangeClusterResourceGroup) API or Change Cluster Resource Group (CHGCRG) command. This field is used by the Change and Undo action codes. The meaning of each of the bits is:

0	Recovery domain is changing
1	Takeover IP address is changing
2-63	Reserved. These will be set to '0'.
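Both bit-mask fields number their bits from the rightmost (least significant) bit, so a bit can be tested with a shift and a mask. The sketch below assumes the BINARY(8) value has already been read into a 64-bit unsigned integer; the constant and function names are hypothetical.

```c
#include <stdint.h>

/* Bit numbers for the Cluster resource group changes field. */
enum {
    CRG_CHG_RECOVERY_DOMAIN_BIT = 0, /* Recovery domain is changing */
    CRG_CHG_TAKEOVER_IP_BIT     = 1  /* Takeover IP address is changing */
};

/* Return 1 if bit `bit` (0 = rightmost) is set in mask, otherwise 0. */
int crg_bit_is_set(uint64_t mask, unsigned bit)
{
    return bit < 64 && ((mask >> bit) & 1u) != 0;
}
```

For example, crg_bit_is_set(changes, CRG_CHG_RECOVERY_DOMAIN_BIT) would tell a Change or Undo handler whether the recovery domain is affected.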

Cluster resource group name. The cluster resource group that is being processed by Cluster Resource Services.

Cluster resource group status. Status of the cluster resource group at the time the exit program is called. Possible values include:

10 - Active	For Rejoin action code.
20 - Inactive	For action codes of Rejoin or Delete Command.
30 - Indoubt	For Rejoin action code.
40 - Restored	For Rejoin action code.
500 and greater - Pending	Pending values set by various APIs.

Additional information for cluster resource group status can be found in Cluster Resource Group APIs.

Cluster resource group type. The type of cluster resource group:

1	Data resilience
2	Application resilience
3	Device resilience
4	Peer resilience

Cluster version. The exit program is being called to process the action code at this cluster version. This value determines the cluster's ability to use new functions supported by the cluster. It is set when the cluster is created and can be changed by the Adjust Cluster Version (QcstAdjustClusterVersion) API or Change Cluster Version (CHGCLUVER) command. Note: When the Adjust Cluster Version API is executed, there is a small window of time where the cluster and cluster resource group job may be operating at different cluster versions.

Cluster version modification level. The exit program is being called to process the action code at this modification level. The modification level further identifies the version at which the nodes in the cluster can communicate. It is updated when code changes that impact the version are applied to the system. Note: When the Adjust Cluster Version API is executed, there is a small window of time where the cluster and cluster resource group job may be operating at different cluster version modification levels.

Configuration object array. This array identifies the resilient devices that can be switched from one node to another. This array is present only for a device cluster resource group.

Configuration object name. The name of the auxiliary storage pool device description object which can be switched between the nodes in the recovery domain. An auxiliary storage pool device description can be specified in only one cluster resource group.

Configuration object online. Vary the configuration object on or leave the configuration object varied off when a device is switched from one node to another or when it is failed over to a backup node. Possible values are:

0	Do not vary the configuration object on and do not start the server takeover IP address.
1	Vary the configuration object on and start the server takeover IP address.
2	Perform the same action for a secondary auxiliary storage pool as is specified in the primary.

Configuration object type. This specifies the type of configuration object specified with configuration object name. Possible values are:

1	Device description

Current node ID. Identifies the node running the exit program.

Data port IP address. The IP address associated with the recovery domain node. This is a dotted decimal format field and is a null-terminated string.

Device subtype. A device's subtype. This information is only as current as the last time the cluster resource group object could be updated. If configuration changes have been made on the node which owns the hardware and those changes have not yet been distributed to all nodes in the cluster, this information may be inaccurate. The data cannot be distributed if the configuration was changed on a node which does not have cluster resource services running. Possible values are:

-1	The subtype cannot be determined because hardware configuration is not complete.
0	This device type does not have a subtype.
1	UDFS independent auxiliary storage pool.
2	Secondary independent auxiliary storage pool.
3	Primary independent auxiliary storage pool.

Device type. This specifies the type of device. Possible values are:

1	Auxiliary storage pool

Distribute information user queue library name. The name of the library that contains the user queue to receive the distributed information. This field will be set to hexadecimal zeros if no distribute information user queue name was specified when the cluster resource group was created.

Distribute information user queue name. The name of the user queue to receive distributed information from the Distribute Information API. This field will be set to hexadecimal zeros if no distribute information user queue name was specified when the cluster resource group was created.

Failover default action. If a response to the failover message is not received within the failover wait time, this field tells clustering how to handle the failover request. This field applies to all primary-backup model cluster resource groups.

0	Proceed with failover.
1	Do NOT attempt failover.

Failover message queue library name. The name of the library that contains the user queue to receive failover messages. This field will be set to hexadecimal zeros if no failover response user queue name was specified. This field applies to all primary-backup model cluster resource groups.

Failover message queue name. The name of the message queue to receive messages dealing with failover. This field will be set to hexadecimal zeros if no failover response user queue name was specified. This field applies to all Start of change primary-backup model End of change cluster resource groups.

Failover wait time. Number of minutes to wait for a reply to the failover message that was enqueued on the failover message queue. This field applies to all Start of change primary-backup model End of change cluster resource groups.

-1	Wait forever until a response is given to the failover inquiry message.
0	Failover proceeds without user intervention. Acts the same as V5R1M0 and prior.
>=1	Number of minutes to wait for a response to the failover inquiry message. If no response is received in the specified number of minutes, the failover default action field will be looked at to decide how to proceed.
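
The way the failover wait time and the failover default action combine when no reply has arrived can be summarized in a short C sketch. This only illustrates the rules stated above; the decision is made by Cluster Resource Services, not by the exit program. The function name, the enum failover_step, and the elapsed_min parameter are hypothetical and are not part of QSYSINC/QCSTCRG3.

```c
#include <assert.h>

/* Failover default action values, as listed in the field description. */
enum { DEFAULT_PROCEED = 0, DEFAULT_CANCEL = 1 };

/* Hypothetical result codes used only by this sketch. */
enum failover_step { STEP_PROCEED, STEP_CANCEL, STEP_KEEP_WAITING };

/*
 * wait_time:      failover wait time field (-1, 0, or minutes >= 1)
 * default_action: failover default action field (0 or 1)
 * elapsed_min:    minutes since the inquiry message was enqueued (hypothetical)
 * Describes what happens while no reply has been received.
 */
enum failover_step failover_step_without_reply(int wait_time,
                                               int default_action,
                                               int elapsed_min)
{
    assert(wait_time >= -1);
    assert(default_action == DEFAULT_PROCEED || default_action == DEFAULT_CANCEL);

    if (wait_time == 0)
        return STEP_PROCEED;          /* failover proceeds without user intervention */
    if (wait_time == -1 || elapsed_min < wait_time)
        return STEP_KEEP_WAITING;     /* wait forever, or still inside the wait window */

    /* Wait limit reached with no reply: the default action decides. */
    return default_action == DEFAULT_PROCEED ? STEP_PROCEED : STEP_CANCEL;
}
```

For example, a wait time of 5 with a default action of 1 yields STEP_CANCEL once five minutes have elapsed without a reply.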

Job name. Name of the job associated with an application cluster resource group exit program. This field is used only by application cluster resource groups.

Leader node id. This field identifies the name of a recovery domain node that is actively participating in the current protocol for the given cluster resource group. A value of hexadecimal zero means the exit program cannot use this field. This field only applies to a peer cluster resource group.

The leader node id is available for these action codes:

Start
End
Remove Node (only if removing a node from the recovery domain)
Change
Delete (only if deleting the cluster resource group)

Length of configuration object array entry. This specifies the length of an entry in the configuration object array. This field applies only to device cluster resource groups.

Length of entry in the recovery domain. The length of an entry in the recovery domain array. This field is used if each entry may have a different length.

Length of prior recovery domain array entry. The length of an entry in the prior recovery domain array. For EXTP0100 format this length should be used to navigate to the next prior recovery domain array entry.

Length of recovery domain array entry. The length of an entry in the recovery domain array. For EXTP0100 format this length should be used to navigate to the next recovery domain array entry.

Length of information given to user. The length of the data passed in the format.

Membership status. The cluster resource group membership status for the current role of a node:

0	Active. Cluster Resource Services for this cluster resource group is active on the node.
1	Inactive. Cluster Resource Services for this cluster resource group is not active on the node. The node may have failed, the node may have been ended, the QSYSWRK subsystem on that node which runs the Cluster Resource Services jobs may have been ended, or the cluster resource services job on that node may not be running.
2	Partition. The node has become partitioned and Cluster Resource Services cannot determine whether the node is active or inactive.
3	Ineligible. Cluster Resource Services for this cluster resource group is active on the node but not eligible to become the cluster resource group primary node.

Node ID. A unique string of characters that identifies a node in the recovery domain.

Node role. The role a node is to be assigned at the successful completion of the action code being processed. For primary-backup model cluster resource groups, a node can have one of three roles: primary, backup, or replicate. For peer model cluster resource groups, a node can have one of two roles: peer or replicate. Any number of nodes can be designated as the peer or replicate.

0	Primary node. Only one node can have this value. The primary node can become the active access point for the cluster resource.
>=1	Backup node. The backup order is designated by increasing value. The values need not be consecutive. No two backup nodes can have the same value. At the completion of the API, Cluster Resource Services will sequence the backups using consecutive numbers starting with 1. Backup nodes are available to become active access points for the cluster resource after the primary node.
-1	Replicate node. All replicates have this value. Replicate nodes are not ordered and cannot become an access point unless the role is changed to a value appropriate for the cluster resource group type.
-4	Peer node. All peer nodes have this value. Peer nodes are not ordered and can all become active access points at the same time when the cluster resource group is started.
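
The resequencing of backup nodes described above (consecutive numbers starting with 1, preserving the relative order) can be sketched in C as follows. This is an illustration of the rule only, not the system's implementation; the function name resequence_backups and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * roles: node role values (0 primary, >=1 backup, -1 replicate, -4 peer)
 * out:   receives the roles with backups renumbered 1, 2, 3, ...
 * n:     number of nodes
 */
void resequence_backups(const int *roles, int *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (roles[i] < 1) {                  /* primary, replicate, peer: unchanged */
            out[i] = roles[i];
            continue;
        }
        int rank = 1;
        for (size_t j = 0; j < n; j++) {
            if (j == i || roles[j] < 1)
                continue;
            assert(roles[j] != roles[i]);    /* no two backups share a value */
            if (roles[j] < roles[i])
                rank++;                      /* one more backup precedes this node */
        }
        out[i] = rank;
    }
}
```

Given the roles 0, 7, -1, 3, 10, the sketch produces 0, 2, -1, 1, 3: the backup order is kept, but the gaps are removed.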

Node role type. Indicates which of the two node roles is being processed:

1	Current
2	Preferred

Number of entries in configuration object array. The number of resilient device entries in the Configuration Object Entry array. This field has a value of 0 for a data or application cluster resource group. This field applies only to device cluster resource groups.

Number of data port IP addresses. The number of data port IP addresses associated with the recovery domain node. This field has a value of 0 for a data or application cluster resource group. This field applies only to device cluster resource groups.

Number of nodes in the prior recovery domain. The number of nodes in the prior recovery domain. This is the number of elements in the Prior Recovery Domain Array. This will be 0 if the Prior Recovery Domain Array is not included. This field is used during:

Add Node action code
Change action code
Change Node Status action code
Failover action code
Failover Cancelled action code
Rejoin action code
Remove Node action code
Restart action code
Start action code
Switchover action code
Undo action code

Number of nodes in the recovery domain array. The number of nodes in the recovery domain. This is the number of elements in the recovery domain array.

Offset to configuration object array. The byte offset from the beginning of the format to the list of resilient devices. This field has a value of 0 for a non-device cluster resource group. This field applies only to device cluster resource groups.

Offset to data port IP address array. The byte offset from the beginning of the format to the list of data port IP addresses for a recovery domain node. This field has a value of 0 for a non-device cluster resource group. This field applies only to device cluster resource groups.

Offset to prior recovery domain array. The byte offset from the beginning of the format to the array of nodes in the prior recovery domain. This will be 0 if the prior recovery domain array is not included. This field is used during:

Add Node action code
Change action code
Change Node Status action code
Failover action code
Failover Cancelled action code
Rejoin action code
Remove Node action code
Restart action code
Start action code
Switchover action code
Undo action code

Offset to recovery domain array. The byte offset from the beginning of the format to the array of nodes in the recovery domain.
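
Taken together, the offset, entry length, and entry count fields describe how to step through the recovery domain array. The C sketch below shows that navigation using byte arithmetic only. The function names are hypothetical, and the assumption that each entry begins with an 8-byte node ID is made for illustration; the real entry layout should be taken from QSYSINC/QCSTCRG3.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Returns a pointer to entry `index` of an array embedded in the format data,
 * or NULL when the index is out of range.
 * base:      start of the format (the "Information given to user" parameter)
 * offset:    value of "Offset to recovery domain array"
 * entry_len: value of "Length of recovery domain array entry"
 * count:     value of "Number of nodes in the recovery domain array"
 */
const unsigned char *rd_entry(const void *base, int offset, int entry_len,
                              int count, int index)
{
    assert(offset >= 0 && entry_len > 0);
    if (index < 0 || index >= count)
        return NULL;
    return (const unsigned char *)base + offset
           + (size_t)index * (size_t)entry_len;
}

/*
 * ASSUMPTION for this sketch: each entry starts with an 8-byte node ID.
 * Returns the index of the entry whose node ID matches, or -1.
 */
int rd_find_node(const void *base, int offset, int entry_len, int count,
                 const char *node_id)
{
    for (int i = 0; i < count; i++) {
        if (memcmp(rd_entry(base, offset, entry_len, count, i), node_id, 8) == 0)
            return i;
    }
    return -1;
}
```

Stepping by the entry length passed in the format, rather than by a compiled-in structure size, keeps the navigation valid if the entry size differs from what the program was built against.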

Original cluster resource group status. The original status of the cluster resource group before it was changed to some pending status while an API is running. For example when the exit program is called for the Start Cluster Resource Group (QcstStartClusterResourceGroup) API, the Cluster resource group status field will contain 550 (Start CRG Pending) while this field will contain 20 (Inactive) or 30 (Indoubt). Possible values include:

10	Active
20	Inactive
30	Indoubt
40	Restored

Additional information for cluster resource group status can be found in Cluster Resource Group APIs.

Preferred node role. The preferred role a node is assigned. See Node role for a more detailed description of the node role.

0	Primary node. Only one node can have this value.
>=1	Backup node. The backup order is designated by increasing value. The values need not be consecutive. No two backup nodes can have the same value. At the completion of the API, Cluster Resource Services will sequence the backups using consecutive numbers starting with 1.
-1	Replicate node. All replicates have this value.
-4	Peer node. All peers have this value. Only valid in a peer cluster resource group.

Prior action code. When a cluster resource group exit program is called with an action code of Undo (15), the action code for the unsuccessful operation is placed in this field. Otherwise, this will be hex zeroes.

Prior recovery domain array. The prior recovery domain array contains the view of the recovery domain before changes were made as a result of the API being used or a cluster event occurring.

For example if a switchover is done, the prior recovery domain array will have the view with the old primary and backup order. The recovery domain array will have the view with the new primary and backup order.

If an event such as a node failure occurs, the prior recovery domain array will have the old membership status for the failing node such as Active while the recovery domain array will have the new status such as Inactive.

In most cases, the prior recovery domain is a view of the current recovery domain. If the Change Cluster Resource Group (QcstChangeClusterResourceGroup) API is being used to change the preferred recovery domain, the prior recovery domain will have a view of the preferred recovery domain.

The prior recovery domain array is available for these action codes:

Add Node
Change
Change Cluster Node Status
End Node
Failover
Failover Cancelled
Rejoin
Remove Node
Start (Only if inactive backup nodes were reordered in the recovery domain. See Start Cluster Resource Group for more information.)
Switchover
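
An exit program can compare the prior recovery domain array with the recovery domain array to find out what changed. The C sketch below looks for a node that was Active in the prior view and is Inactive in the new view, as in the node failure example above. The struct rd_view is a simplified, hypothetical stand-in for an array entry, and find_newly_inactive is a hypothetical name; neither reflects the actual layout in QSYSINC/QCSTCRG3.

```c
#include <assert.h>
#include <string.h>

/* HYPOTHETICAL, simplified view of one recovery domain entry. */
struct rd_view {
    char node_id[8];
    int  node_role;
    int  membership_status;  /* 0 Active, 1 Inactive, 2 Partition, 3 Ineligible */
};

/*
 * Finds the first node that is Inactive (1) in the current view and was
 * Active (0) in the prior view. Returns its index in `cur`, or -1.
 */
int find_newly_inactive(const struct rd_view *prior, int n_prior,
                        const struct rd_view *cur, int n_cur)
{
    assert(n_prior >= 0 && n_cur >= 0);
    for (int i = 0; i < n_cur; i++) {
        if (cur[i].membership_status != 1)
            continue;
        for (int j = 0; j < n_prior; j++) {
            if (memcmp(prior[j].node_id, cur[i].node_id, 8) == 0 &&
                prior[j].membership_status == 0)
                return i;
        }
    }
    return -1;
}
```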

Recovery domain array. The nodes that are the recovery domain for the cluster resource group. This view of the recovery domain will contain any changes made to the node's membership status or the node's role by the API or cluster event which caused the exit program to be called.

Request handle. Uniquely identifies the API request. It is used to associate responses on the user queue specified in the Results Information parameter. This field will have a null value when the exit program is called with an action code of Failover (9).

Requesting user profile. This is the user profile that initiated the API request.

Reserved. This field is reserved and is set to hexadecimal zeroes.

Server takeover IP address. This is a takeover IP address for servers associated with the relational database. This is a dotted decimal field and is a null-terminated string. This field only applies to device cluster resource groups.

Site name. The name of the site associated with the recovery domain node. This field only applies to device cluster resource groups.

Takeover IP address. This is the floating IP address that is associated with an application. This is a dotted decimal field and is a null-terminated string. This field is used only by application cluster resource groups.

Application Takeover IP Address Management

The takeover IP address is the IP address used to control how clients access the application as the point of access for the application moves from one node to another during switchover or failover. The takeover IP address is started only on one node at a time. That node is the primary node in the cluster resource group's recovery domain. The takeover IP address can be configured by Cluster Resource Services or it can be configured by the user. This attribute is specified on the Create Cluster Resource Group API and is passed to the exit program in the cluster resource group attributes field.

The following table shows which cluster APIs and events configure and manage the takeover IP address. This occurs only for application cluster resource groups. Additional information on the takeover IP address can be found in Cluster Resource Group APIs.

Table 1. Takeover IP Address Management

When the Exit Program Ends

When an exit program is called with an action code, control can return to its caller because it set the success indicator and returned, had an unhandled exception, or the exit program job was cancelled.

Setting the Success Indicator and Returning

The returned value of the success indicator is used by the operating system in different ways depending upon the action code. For most action codes, anything other than Successful will result in the exit program being called again with an action code of Undo to back out the actions previously performed. There are two exceptions to this.

One, if an application exit program was called with an action code of Start, setting the success indicator to 2 (Unsuccessful, attempt restart) results in the exit program being called with Restart. The exit program continues to be called with Restart as long as the restart count has not been reached. When the restart count is reached, failover occurs and the application is started on the first active backup node.

The exit program is not called with Restart if it returns an Unsuccessful, do not attempt restart indicator, if it sets the success indicator to Successful and returns, or if the cluster resource group is ended with the End Cluster Resource Group (QcstEndClusterResourceGroup) API.

Two, some action codes always proceed regardless of the exit program success indicator and the exit program is not called again with an action code of Undo. These are:

Change Cluster Node Entry
Delete
Delete Command
End Cluster Node
Remove Node (only when removing a node from the cluster)
Undo

If the exit program returns an unsuccessful indicator from Undo, the cluster resource group's status is set to Indoubt.
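
The rules above for what follows an exit program's return can be condensed into one decision function. The C sketch below is a reading of this section, not system code. The enum values are symbolic names local to the sketch and are deliberately not the numeric action codes; after_return and its parameters are hypothetical.

```c
#include <assert.h>

/* Symbolic names local to this sketch; these are NOT the numeric action codes. */
enum action {
    ACT_START, ACT_RESTART, ACT_CHANGE_NODE_ENTRY, ACT_DELETE,
    ACT_DELETE_COMMAND, ACT_END_NODE, ACT_REMOVE_NODE, ACT_UNDO, ACT_OTHER
};

/* Hypothetical outcomes used only by this sketch. */
enum outcome {
    OUT_DONE,                 /* exit program is not called again */
    OUT_CALL_UNDO,            /* exit program is called again with Undo */
    OUT_PROCEED_ANYWAY,       /* operation proceeds; no Undo call */
    OUT_RESTART_OR_FAILOVER,  /* application Start or Restart path */
    OUT_INDOUBT               /* cluster resource group status set to Indoubt */
};

/*
 * successful:            nonzero if the success indicator was Successful
 * is_application:        nonzero for an application cluster resource group
 * removing_from_cluster: for Remove Node, nonzero if the node leaves the cluster
 */
enum outcome after_return(enum action act, int successful,
                          int is_application, int removing_from_cluster)
{
    assert((int)act >= 0 && (int)act <= (int)ACT_OTHER);
    if (successful)
        return OUT_DONE;

    switch (act) {
    case ACT_UNDO:
        return OUT_INDOUBT;
    case ACT_CHANGE_NODE_ENTRY:
    case ACT_DELETE:
    case ACT_DELETE_COMMAND:
    case ACT_END_NODE:
        return OUT_PROCEED_ANYWAY;
    case ACT_REMOVE_NODE:
        return removing_from_cluster ? OUT_PROCEED_ANYWAY : OUT_CALL_UNDO;
    case ACT_START:
    case ACT_RESTART:
        return is_application ? OUT_RESTART_OR_FAILOVER : OUT_CALL_UNDO;
    default:
        return OUT_CALL_UNDO;
    }
}
```

Whether the application path leads to Restart or to failover depends on the success indicator value and the restart count.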

An Exception Occurs

An unhandled exception is treated the same way as an unsuccessful indicator. The exit program will be called again with either Restart or Undo except for the same action codes listed above where it is not called again with Undo.

If the exit program does not handle an exception while processing Undo, the cluster resource group's status is set to Indoubt.

Job is Cancelled

If the exit program job is cancelled and the exit program was performing the function of any action code other than Undo, Start, or Restart, it is treated as an unsuccessful indicator. The exit program is called with an Undo action code except for those action codes listed above where it is not called again with Undo.

If the exit program was cancelled while performing the function of Undo, the cluster resource group's status is set to Indoubt.

If the exit program was cancelled while performing the function of Start or Restart, the cluster resource group is ended; failover does not occur. It is the responsibility of the exit program cancel handler to also end any other jobs or subsystems it may have started.

An exit program job always has an associated cluster resource group job. It is the associated cluster resource group job that submits the exit program job. If the cluster resource group job is cancelled while an exit program is running, the exit program job is also cancelled. If the cluster resource group job is cancelled, the exit program is called with the End Node action code on the node where the job was cancelled.

Restarting an Application Cluster Resource Group Exit Program

Cluster Resource Services uses a restart count to control how often an active application will be restarted on the primary node before a failover occurs. The restart count is specified on the Create Cluster Resource Group (QcstCreateClusterResourceGroup) or Change Cluster Resource Group (QcstChangeClusterResourceGroup) APIs for application cluster resource groups. If the specified value is 0, the failed application will not be restarted on the primary node but failed over to the first backup. If the specified value is greater than 0, Cluster Resource Services will call the exit program with an action code of Restart after having initially called the exit program with an action code of Start. It will continue to do this for each failure, until the restart count has been reached. The exit program will be called with an action code of Restart if it returns from handling the Start action code in one of these ways:

The exit program returns with the success indicator set to 2 (Unsuccessful, attempt restart).
The exit program does not handle an exception.

Once the restart count has been reached, Failover will be attempted in order to start the application on the first active backup node. The restart count is reset only when the exit program is called with a Start action code. This occurs with the Start Cluster Resource Group (QcstStartClusterResourceGroup) or Initiate Switchover (QcstInitiateSwitchOver) API or the failover event.
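
The restart count bookkeeping can be modeled with a small counter, as in the C sketch below. It illustrates the behavior described in this section and is not how Cluster Resource Services is implemented. The success indicator value 2 is stated above; the values 0 and 1 are assumptions made for the sketch, and all type and function names are hypothetical.

```c
#include <assert.h>

/* Success indicator values: 2 is stated above; 0 and 1 are ASSUMED here. */
enum { SI_SUCCESS = 0, SI_FAIL_NO_RESTART = 1, SI_FAIL_RESTART = 2 };

/* Hypothetical next steps used only by this sketch. */
enum next_step { NEXT_NONE, NEXT_RESTART, NEXT_FAILOVER };

struct restart_state {
    int restart_count;  /* configured for the application cluster resource group */
    int restarts_used;  /* restarts consumed since the last Start */
};

/* The restart count is reset only when the exit program is called with Start. */
void on_start_called(struct restart_state *s)
{
    s->restarts_used = 0;
}

/* An unhandled exception is treated like "Unsuccessful, attempt restart". */
enum next_step after_start_or_restart(struct restart_state *s,
                                      int success_indicator,
                                      int unhandled_exception)
{
    assert(s->restart_count >= 0);
    if (!unhandled_exception && success_indicator == SI_SUCCESS)
        return NEXT_NONE;
    if (!unhandled_exception && success_indicator == SI_FAIL_NO_RESTART)
        return NEXT_FAILOVER;
    if (s->restarts_used < s->restart_count) {
        s->restarts_used++;
        return NEXT_RESTART;
    }
    return NEXT_FAILOVER;   /* restart count reached */
}
```

With a restart count of 2, two consecutive failures lead to Restart and the third leads to failover; with a restart count of 0, the first failure leads directly to failover.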

Multiple Action Codes

In most situations, cluster APIs or events result in the exit program being called with a single action code. When the exit program completes successfully, the exit program is not called again for that API or cluster event. There are several situations where successful completion results in the exit program being called twice. This occurs for active application cluster resource groups for the Initiate Switchover API and the failover cluster event. In both cases, the exit program is called on the new primary first with either the Switchover or Failover action code. During this time, the exit program should do any preparation work necessary to start the application but should not yet start the application. When the exit program returns with a successful indicator, it will be called a second time with the Start action code to start the application.

Another situation occurs when a cluster resource group is deleted using either the Delete Cluster Resource Group API or Delete Cluster Resource Group From Cluster command. The exit program will be called first with the Verification Phase action code and then with the Delete action code. If the verification phase returns with an unsuccessful indicator, the exit program will not be called a second time and the cluster resource group will not be deleted.
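
A dispatcher that respects the two-call sequences described above might look like the following C sketch. The action code values are those listed in Table 2. The struct app_state, the function handle_action, and the return values 0 (Successful) and 1 (unsuccessful) are assumptions made for illustration; a real exit program receives its parameters as described in the required parameter group.

```c
#include <assert.h>
#include <stddef.h>

/* Action code values as listed in Table 2. */
enum { AC_START = 2, AC_VERIFY = 5, AC_DELETE = 7, AC_FAILOVER = 9, AC_SWITCHOVER = 10 };

/* HYPOTHETICAL bookkeeping kept by the exit program in this sketch. */
struct app_state {
    int prepared;        /* preparation for starting the application is done */
    int running;         /* application has been started */
    int delete_allowed;  /* result of whatever verification the program performs */
    int deleted;         /* delete processing has run */
};

/* Returns 0 for Successful and 1 for unsuccessful (ASSUMED values). */
int handle_action(struct app_state *st, int action_code)
{
    assert(st != NULL);
    switch (action_code) {
    case AC_SWITCHOVER:
    case AC_FAILOVER:
        st->prepared = 1;   /* first call: prepare only, do not start yet */
        return 0;
    case AC_START:
        st->running = 1;    /* second call on the new primary: start the application */
        return 0;
    case AC_VERIFY:
        return st->delete_allowed ? 0 : 1;  /* unsuccessful here prevents the Delete call */
    case AC_DELETE:
        st->deleted = 1;
        return 0;
    default:
        return 0;           /* nothing to do for other action codes in this sketch */
    }
}
```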

Causes of the Failover Event

It is natural to think of the failover event being caused by the most obvious problem: a node fails. The node failure could be due to a hardware problem such as the loss of a processor or an environmental problem such as the loss of electrical power.

There are a wide variety of other things that can cause a failover event when it occurs on a node that is in a cluster resource group's recovery domain. For more information about failover events, see Clusters in the System Management topic.

Cluster Resource Services APIs or Commands
- End Cluster Node API
- End Cluster Node Command
- Remove Cluster Node Entry API when Cluster Resource Services is active on the node
- Remove Cluster Node Entry Command
iSeries operator actions when Cluster Resource Services is active
- Pushing the IPL button on the front panel
- Powering down the system (PWRDWNSYS)
- Ending all subsystems (ENDSBS)
- Ending the QSYSWRK subsystem (ENDSBS)
- Change Cluster Recovery command
- Ending the system (ENDSYS)
- Ending TCP (ENDTCP)
- Cancelling the QCSTCTL, QCSTCRGM, or a specific CRG job
- Ending a TCP/IP interface used by clustering for communication with other nodes
Failures
- System hardware failure causing a machine check
- System software failure causing a machine check
- Communication line, router, and IOP failures for a communication line used by clustering for communication with other nodes
- Environmental problems causing the system to fail (electrical power loss, hardware destruction by storms, earthquake, and so on)
- An application exit program returns from handling the Start or Restart action code with the Success indicator set to Unsuccessful, attempt restart and the restart limit has been reached
- An application exit program returns from handling the Start or Restart action code with the Success indicator set to Unsuccessful, do not attempt restart
- An application exit program does not handle an exception while processing the Start or Restart action code and the restart limit has been reached

The failover event always calls the exit program so that the exit program is aware a member left the cluster. The exit program is called regardless of the state of the cluster resource group: active, inactive, or indoubt. Also, the exit program is called regardless of which member left the cluster: primary, backup, replicate or peer. The exit program must look at both the state of the cluster resource group and the role of the node that left in order to perform the correct action.

Cluster resource groups should failover in a particular order when a node failure occurs. That order is device cluster resource groups first, application resource groups, and then data cluster resource groups. Peer cluster resource groups failover in parallel with the other cluster resource group types.

Partition Processing

A cluster enters a partition state when a failure occurs that cannot conclusively be identified as a node failure. Cluster Resource Services detects that communication with another node or nodes has been lost but cannot determine why. A classic example is the failure of a communication line between the systems.

The exit program is called when a cluster partitions. The membership status for each partitioned node in the recovery domain will be set to Partition. However, this is different for each cluster partition. For example, suppose we have a 2 node cluster with nodes A and B, both nodes are in a cluster resource group's recovery domain, and the cluster partitions. When the exit program on A is called, the recovery domain will indicate that A is active and B is partitioned. When the exit program on B is called, the recovery domain will indicate that B is active and A is partitioned.

For primary-backup model cluster resource groups:

The action code seen by the exit program on each node depends upon whether the node is in the primary or secondary partition. The primary partition contains the cluster resource group's primary node. The secondary partition does not.
All nodes in the primary partition of the cluster resource group's recovery domain will be passed an action code of Failover. All nodes in the secondary partition are passed an action code of End. Action code dependent data of Partition failure is passed in each case. These action codes are used whether the cluster resource group is active or inactive.

For peer model cluster resource groups:

All recovery domain nodes will be passed an action code of Failover. The access points that are active will remain active in all partitions.
No configuration changes are allowed when the recovery domain spans a network partition.
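
The action code a node sees when the cluster partitions follows directly from the rules above, as the C sketch below shows. The action code values 4 (End) and 9 (Failover) are those listed in Table 2; the enum crg_model and the function partition_action_code are hypothetical names used for illustration.

```c
#include <assert.h>

/* Action code values as listed in Table 2. */
enum { AC_END = 4, AC_FAILOVER = 9 };

/* Hypothetical model selector for this sketch. */
enum crg_model { MODEL_PRIMARY_BACKUP, MODEL_PEER };

/*
 * partition_has_primary: nonzero if this node's partition contains the
 * cluster resource group's primary node (ignored for the peer model).
 */
int partition_action_code(enum crg_model model, int partition_has_primary)
{
    assert(model == MODEL_PRIMARY_BACKUP || model == MODEL_PEER);
    if (model == MODEL_PEER)
        return AC_FAILOVER;                      /* all recovery domain nodes */
    return partition_has_primary ? AC_FAILOVER   /* primary partition */
                                 : AC_END;       /* secondary partition */
}
```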

Handling the Undo Action Code

When Cluster Resource Services processes an API or cluster event and an exit program is called, a failure either in the exit program or in Cluster Resource Services after the exit program ends results in an attempt to recover the prior state of the cluster resource group and its resilient resources.

Actions performed by Cluster Resource Services which changed the cluster resource group are backed out. The exit program is called with an action code of Undo so that actions it took can also be backed out.

If the exit program had nothing to do for an action code, its work to handle the Undo is trivial. Merely set the success indicator to Successful and return.

If the exit program has a failure and can back out its actions as part of handling the original action code, it may also have little or nothing to do when called with the Undo action code. Doing this back out as part of the original action code processing may be driven from the procedure which detected the problem, or from an exception handler, or from a cancel handler.

When the exit program handles the original action code successfully but Cluster Resource Services subsequently detects an error that requires the API or cluster event to be backed out, the Undo processing by the exit program becomes more involved. While the exit program is passed the action code it worked on before being called with Undo, there may be other information the exit program will have to obtain in order to successfully perform the back out. Any required back out information will have to be kept where a new job can access it.

The format data passed to the exit program for Undo is exactly the same as was passed for the original action code except for the prior action code field.

A cluster resource group's status is returned to its original value if both the exit program and Cluster Resource Services handle the Undo action code successfully. If Cluster Resource Services is unable to back out changes or the exit program sets the success indicator to anything other than Successful, the status of the cluster resource group is set to Indoubt. When this occurs, someone such as an operator or programmer may have to be involved to determine what errors caused the problem.
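
Because Undo may be handled by a new job, back out information has to be kept somewhere outside the job that did the original work. The C sketch below stores a small record in a file and reads it back. It is a minimal illustration that assumes a stream file is an acceptable place for this data; the record layout, file handling, and function names are all hypothetical, and a real exit program might use a different kind of object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* HYPOTHETICAL record of what the exit program did for an action code. */
struct backout_record {
    int  action_code;      /* action code that was being processed */
    int  steps_completed;  /* how far the work got */
    char detail[64];       /* free-form note, for example the name of a started job */
};

/* Saves the record; returns 0 on success, -1 on failure. */
int save_backout(const char *path, const struct backout_record *rec)
{
    assert(path != NULL && rec != NULL);
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    size_t n = fwrite(rec, sizeof *rec, 1, f);
    if (fclose(f) != 0 || n != 1)
        return -1;
    return 0;
}

/* Loads the record in the job that handles Undo; returns 0 on success, -1 on failure. */
int load_backout(const char *path, struct backout_record *rec)
{
    assert(path != NULL && rec != NULL);
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t n = fread(rec, sizeof *rec, 1, f);
    fclose(f);
    return n == 1 ? 0 : -1;
}
```

When called with Undo, the exit program could compare the saved action_code with the prior action code field before backing anything out.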

Reasons an Exit Program Is Called

The table below shows the reasons an exit program is called and maps the reason to the Action Code parameter on the cluster resource group exit program. The third and fourth columns of the table give suggestions for the types of things a data or application cluster resource group exit program might do for an action code.

The following cluster resource group manager APIs or commands do not cause the exit program to be called:

Change Cluster Resource Group API if the recovery domain is not changed
Change Cluster Resource Group Command
Distribute Information API
List Cluster Resource Group Information API
List Cluster Resource Groups API
Display Cluster Resource Group Command

For a device cluster resource group, neither the replication provider nor the application provider needs to supply an exit program. An exit program is optional. An exit program is required only if customer specific activities are required for resilient devices. Some examples of why a customer may wish to provide an exit program might include:

When a cluster resource group is created or a node is added to the recovery domain, the exit program could perform configuration functions for devices not supported by the device cluster resource group.
When a cluster resource group is started, the exit program could vary on devices not supported by the device cluster resource group.
When a switchover or failover is done, the exit program could vary off devices on the current primary node for devices not supported by the device cluster resource group and vary them on for the new primary node.
When a cluster resource group is deleted or a node is removed from the recovery domain, the exit program could delete configuration information previously created.
Besides managing device configuration or varying devices on or off, the exit program could perform other functions that might be useful in synchronizing events between actions on a device and operator notification or application dependencies.

Table 2. Reasons an Exit Program Is Called

Reason an Exit Program Is Called	Action Code Parameter Passed to Exit Program	Supplied by Replication Provider Exit Program Actions - Data/Peer Resilience	Supplied by Application Provider Exit Program Actions - Application Resilience
Create Cluster Resource Group API or Create Cluster Resource Group Command This interface creates a cluster resource group object, which identifies a recovery domain.	1 (Initialize)	Put data on all nodes in the recovery domain. Prime all nodes in the recovery domain.	Put applications on all nodes in the recovery domain. Prime all nodes in the recovery domain.
Start Cluster Resource Group API or Start Cluster Resource Group Command This interface establishes resilience for a cluster resource group.	2 (Start)	Start journaling. Start replication. If a Peer CRG is active, all peer nodes are actively replicating.	Start server jobs. Keep track of server jobs started. This will be needed when server jobs are restarted or the End Cluster Resource Group API is called.
Application cluster resource group exit program ends abnormally or unexpectedly.	3 (Restart)	Not called.	Restart server jobs if necessary.
End Cluster Resource Group API or End Cluster Resource Group Command This interface will disable resilience for a cluster resource group object. Application ends The Success indicator is set to Successful and the application ends.	4 (End)	Stop replication. End journaling.	End server jobs. End application resilience.
Delete Cluster Resource Group API or Delete Cluster Resource Group from Cluster Command	5 (Verification Phase)	Verify that the operation is ok to do.	Verify that the operation is ok to do.
	6 (Reserved)
Delete Cluster Resource Group API or Delete Cluster Resource Group From Cluster Command This interface deletes a cluster resource group object from all nodes in the recovery domain. Delete Cluster API or Delete Cluster Command (if Cluster Resource Services is active) This interface deletes all cluster resource groups from all nodes. Remove Cluster Node Entry API or Remove Cluster Node Entry Command (if Cluster Resource Services is active and the node being removed is not active in the cluster) This interface deletes a cluster resource group object from all nodes in the recovery domain.	7 (Delete)	7 (Delete)	7 (Delete)
Start Cluster Node API or Start Cluster Node Group Command This interface is used to start Cluster Resource Services on one or more nodes in the cluster. Partition merge event When a communication problem which caused a cluster to partition has been corrected, cluster partitions will merge back together.	8 (Rejoin)	Resynchronize data. Start replication if the cluster resource group status is Active (10). For peer this should be done for all peer nodes.	Start application if the cluster resource group status is Active (10).
Node failure or resource failure event End Cluster Node API or End Cluster Node Command The recovery domain node(s) which are not being ended see this action code. The node being ended sees the End Node action code. Remove Cluster Node Entry API or Remove Cluster Node Entry Command An active recovery domain node is being removed from the cluster.	9 (Failover)	For Data: Get data objects to highest level of currency. Redirect remote journal receivers. For peer, this is just notification only.	Make sure exit program data contains all key information for application restart. This can be accomplished by the Change Cluster Resource Group API or Change Cluster Resource Group Command. Use exit program data to restart application at last known point. Exit program data must contain enough key information for most effective restart. Restart server jobs after data is current. Actually occurs when the Cluster Resource Services calls the cluster resource group exit program with an action code of 2 (Start) on the new primary only.
Initiate Switchover API or Change Cluster Resource Group Primary Command This API changes the current role of a node in the recovery domain of a cluster resource group by switching the access point from the primary node to the first backup.	10 (Switchover)	For Data only: Stop replication on primary and journaling. Continue replication on other active nodes in the recovery domain. This is a combination of 4 (End) and 9 (Failover). Not valid for peer.	Make sure exit program data contains all key information for application restart before the Initiate Switchover API is called. 10 (Switchover - Pre-exit program) Stop server jobs. 2 (Start - Post-exit program) Use exit program data to restart application at last known point. Exit program data must contain enough key information for most effective restart. Restart server jobs after data is current.
Add Node to Recovery Domain API or Add Cluster Resource Group Node Entry Command This interface will add a node ID to the recovery domain of a cluster resource group.	11 (Add Node)	Exit program actions performed on the node being added: If cluster resource group is Active, do 1 (Initialize) and 2 (Start) actions. A peer node will be an active access point. If cluster resource group is Inactive, do 1 (Initialize) action.	Perform 1 (Initialize) on the node being added.
Remove Node from Recovery Domain API or Remove Cluster Resource Group Node Entry Command This interface will remove a node from the recovery domain of a cluster resource group. Remove Cluster Node Entry API or Remove Cluster Node Entry Command (will be seen on the node being removed if Cluster Resource Services is active on the node being removed) Remove Cluster Node Entry API or Remove Cluster Node Entry Command (will be seen on active cluster nodes if Cluster Resource Services is inactive on the node being removed and the API is run on an active node)	12 (Remove Node)	Exit program actions on the node being removed: If the cluster resource group is Active and the node being removed is active in the cluster, do 4 (End) and 7 (Delete). For peer this should be done on all active peer nodes. If the node being removed is not active in the cluster, this is just notification that the node is being removed from the recovery domain. If cluster resource group is Inactive and the node being removed is active in the cluster, do 7 (Delete) action. If the node being removed is not active in the cluster, this is just notification that the node is being removed from the recovery domain.	Exit program actions on the node being removed: If the cluster resource group is Active and the node being removed is active in the cluster, do 4 (End) and 7 (Delete). If the node being removed is not active in the cluster, this is just notification that the node is being removed from the recovery domain. If cluster resource group is Inactive and the node being removed is active in the cluster, do 7 (Delete) action. If the node being removed is not active in the cluster, this is just notification that the node is being removed from the recovery domain.
Change Cluster Resource Group API or Change Cluster Resource Group Command This interface changes some of the attributes of a cluster resource group. Only if the recovery domain is changed will the cluster resource group exit program be called.	13 (Change)	Redirect replication if necessary. Redirect journaling if necessary. For peer, if the cluster resource group is Active (10) and the node is being changed from replicate to peer, perform a 2 (Start) on the new peer node. If the node is being changed from peer to replicate, perform a 4 (End) on the new replicate node.
Delete Cluster Resource Group CL command This command deletes a cluster resource group object from the node running the command. This is not a distributed request. Delete Cluster API or Delete Cluster Command (if Cluster Resource Services is inactive) Remove Cluster Node Entry API or Remove Cluster Node Entry Command (if Cluster Resource Services is inactive on the node being removed and the API is run on that node)	14 (Delete Command)
	15 (Undo)	Rollback operations from previous request.	Rollback operations from previous request.
End Cluster Node API or End Cluster Node Command is used to end Cluster Resource Services on a node in the recovery domain. Job cancelled A Cluster Resource Services job is cancelled.	16 (End Node)	On the node being ended: Do End (4) and Change (13) if the cluster resource group status is Active (10). Do Change (13) if the cluster resource group status is Inactive (20) or Indoubt (30). For a data cluster resource group, if the node assigned the primary role is ended, exit program actions on the first backup: If the cluster resource group status is Active (10) do Failover (9). If the cluster resource group status is Inactive (20) or Indoubt (30) do Change (13).	On the node being ended: Do End (4) and Change (13) if the cluster resource group status is Active (10). Do Change (13) if the cluster resource group status is Inactive (20) or Indoubt (30). If the node assigned the primary role is ended, exit program actions on the first backup: If the cluster resource group status is Active (10) do Failover (9). If the cluster resource group status is Inactive (20) or Indoubt (30) do Change (13).
Add Cluster Resource Group Device Entry API or Add Cluster Resource Group Device Entry Command A resilient device entry is added to a cluster resource group.	17 (Add Device Entry)	Does not apply to a data or peer cluster resource group.	Does not apply to an application cluster resource group.
Remove Cluster Resource Group Device Entry API or Remove Cluster Resource Group Device Entry Command A resilient device entry is removed from a cluster resource group.	18 (Remove Device Entry)	Does not apply to a data or peer cluster resource group.	Does not apply to an application cluster resource group.
Change Cluster Resource Group Device Entry API or Change Cluster Resource Group Device Entry Command Information for a resilient device entry is being changed.	19 (Change Device Entry)	Does not apply to a data or peer cluster resource group.	Does not apply to an application cluster resource group.
Change Cluster Node Entry API or Change Cluster Node Entry Command The status of a node is being changed.	20 (Change Node Status)	For a data cluster resource group, if the primary had failed and its status is being changed, start the cluster resource group.	If the primary had failed and its status is being changed, start the cluster resource group.
Primary node failure or resource failure event Failover is cancelled by failover message queue End Cluster Node API or End Cluster Node Command Primary node is ended and failover cancelled by failover message queue. The recovery domain node(s) which are not being ended see this action code. The node being ended sees the End Node action code.	21 (Failover Cancelled)	Not applicable to peer cluster resource group. Stop replication. End journaling.	End server jobs. End application resilience.
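The End Node (16) row above describes a composite action: what the exit program does depends on the cluster resource group status and on whether the node is the one being ended or the first backup of an ended primary. The following is a minimal sketch of that rule only. The function name, the parameter names, and the enumerator names are hypothetical; the status values 10, 20, and 30 and the action code values come from the table.

```c
#include <assert.h>
#include <stddef.h>

/* Status and action code values as given in the table above.
   The enumerator names are hypothetical, not taken from QCSTCRG3. */
enum { CRG_ACTIVE = 10, CRG_INACTIVE = 20, CRG_INDOUBT = 30 };
enum { ACT_END = 4, ACT_FAILOVER = 9, ACT_CHANGE = 13 };

/* Fills out[] with the sub-actions to perform for End Node (16) and
   returns how many were written (0 for a status the table does not cover).
   is_node_being_ended != 0: this node is the one being ended.
   is_node_being_ended == 0: this node is the first backup and the node
   assigned the primary role is the one being ended. */
size_t end_node_actions(int crg_status, int is_node_being_ended, int out[2])
{
    size_t n = 0;

    assert(out != NULL);
    if (crg_status != CRG_ACTIVE && crg_status != CRG_INACTIVE &&
        crg_status != CRG_INDOUBT)
        return 0;

    if (is_node_being_ended) {
        if (crg_status == CRG_ACTIVE)
            out[n++] = ACT_END;      /* End (4) only when Active */
        out[n++] = ACT_CHANGE;       /* Change (13) in every covered status */
    } else {
        /* First backup: Failover (9) when Active, otherwise Change (13). */
        out[n++] = (crg_status == CRG_ACTIVE) ? ACT_FAILOVER : ACT_CHANGE;
    }
    return n;
}
```

An exit program could call such a helper when it receives action code 16 and then run its own End, Change, or Failover logic for each returned value.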

Action Code Cross Reference

Some action codes are used by more than one API or cluster event. The following table is a cross reference between an action code and which API or cluster event uses it. The action code dependent data value is listed in parentheses after each API and cluster event. Those with no specified dependent data value have a value of No Information (0).
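As a reading aid for Table 3, the dependent data values that appear in it can be collected into a single lookup, for example when an exit program logs why it was called. This is a sketch only: the function name is hypothetical, and the list contains just the values that Table 3 mentions, so it may not be complete.

```c
#include <stddef.h>

/* Returns a descriptive name for an action code dependent data value,
   or NULL for a value that Table 3 does not mention.
   The function name is hypothetical. */
const char *dependent_data_name(int value)
{
    switch (value) {
    case 0:  return "No information";
    case 1:  return "Merge";
    case 2:  return "Join";
    case 3:  return "Partition failure";
    case 4:  return "Node failure";
    case 5:  return "Member failure";
    case 6:  return "End node";
    case 7:  return "Remove node";
    case 8:  return "Application failover";
    case 9:  return "Resource end";
    case 10: return "Delete cluster";
    case 11: return "Remove recovery domain node";
    case 15: return "Remove passive node";
    default: return NULL;
    }
}
```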

Table 3. API and Cluster Event to Action Code Cross Reference

Action Code	API, Command, or Cluster Event that Uses the Action Code	Cluster Resource Group Type the Action Code Applies to
1 - Initialize	Create Cluster Resource Group API Create Cluster Resource Group Command	All cluster resource group types
2 - Start	Start Cluster Resource Group API Start Cluster Resource Group Command The second action code on the new primary for Initiate Switchover API for an active application cluster resource group The second action code on the new primary for Failover event for an active application cluster resource group	All cluster resource group types
3 - Restart	Exit program failure when processing the Start action code	Application cluster resource group
4 - End	End Cluster Resource Group API (0 - No information) End Cluster Resource Group Command For primary-backup model cluster resource groups only. Cluster partition event for the nodes in the secondary partition for both active and inactive cluster resource groups (3 - Partition failure) When an application cluster resource group exit program ends with a Success return code while processing the Start or Restart action codes (9 - Resource end)	All cluster resource group types
5 - Verification Phase	Delete Cluster Resource Group from Cluster Command Delete Cluster Resource Group API	All cluster resource group types
7 - Delete	Delete Cluster Resource Group API (0 - No information) Delete Cluster API (10 - Delete cluster) Delete Cluster Resource Group From Cluster Command Delete Cluster Command For primary-backup model cluster resource groups only. Remove Cluster Node Entry API or Remove Cluster Node Entry Command (15 - Remove passive node) When a node being removed from the cluster is not active and the node is the only recovery domain node or it is the primary and there are no backups defined.	All cluster resource group types
8 - Rejoin	Cluster partitions are merging (1 - Merge) A node that was ended or failed is started (2 - Join)	All cluster resource group types
9 - Failover	See Causes of the Failover Event for a list of things that cause the failover event (4 - Node failure, 5 - Member failure, 6 - End node, 7 - Remove node, 8 - Application failover)	All cluster resource group types
10 - Switchover	Initiate Switchover API Change Cluster Resource Group Primary Command	All cluster resource group types except peer
11 - Add Node	Add Node to Recovery Domain API Add Cluster Resource Group Node Entry Command	All cluster resource group types
12 - Remove Node	Remove Cluster Node Entry API or Remove Cluster Node Entry Command All nodes see this action code if the node being removed is inactive. The action code dependent data will be 15 - Remove passive node. If the removed node is active, the node being removed sees this action code while the other nodes see the Failover action code (7 - QcstRemoveNode). Remove Node From Recovery Domain API Remove Cluster Resource Group Node Entry Command (11 - Remove recovery domain node)	All cluster resource group types
13 - Change	Change Cluster Resource Group API Change Cluster Resource Group Command	All cluster resource group types
14 - Delete Command	Delete Cluster Resource Group command Delete Cluster API (when Cluster Resource Services is inactive) Remove Cluster Node Entry API used on a node where Cluster Resource Services is not running	All cluster resource group types
15 - Undo	The Undo action code is used whenever the exit program ends because of an unhandled exception or returns an unsuccessful return code. The exceptions are the following action codes, for which the exit program is never called a second time with Undo: 7 - Delete, 12 - Remove Node (if the node is being removed from the cluster), 14 - Delete Command, 15 - Undo, 16 - End Node, 20 - Change Cluster Node Entry, 22 - Cancel	All cluster resource group types
16 - End Node	End Cluster Node API (0 - No information) End Cluster Node Command A cluster resource group job is cancelled on a node (5 - Member failure)	All cluster resource group types
17 - Add Device Entry	Add Cluster Resource Group Device Entry API Add Cluster Resource Group Device Entry Command	Device cluster resource group
18 - Remove Device Entry	Remove Cluster Resource Group Device Entry API Remove Cluster Resource Group Device Entry Command	Device cluster resource group
19 - Change Device Entry	Change Cluster Resource Group Device Entry API Change Cluster Resource Group Device Entry Command	Device cluster resource group
20 - Change Node Status	Change Cluster Node Entry API Change Cluster Node Entry Command when used to change the status of a cluster node to Failed.	All cluster resource group types
21 - Failover Cancelled	The primary node failed and the failover was cancelled through the use of the failover message queue. See Causes of the Failover Event for a list of things that cause the failover event (4 - Node failure, 5 - Member failure, 6 - End node, 7 - Remove node, 8 - Application failover)	All cluster resource group types except peer
22 - Cancel	N/A	N/A
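Two rules from Table 3 lend themselves to small predicates: which action codes are followed by an Undo call after a failure, and which cluster resource group types an action code applies to. The sketch below encodes only what the table states. The function names and the `crg_type` enumeration are hypothetical (this section does not give the real type values), and treating 22 (Cancel), which the table lists as N/A, as not applicable is an assumption.

```c
#include <stdbool.h>

/* Hypothetical type tags; the real values are not given in this section. */
enum crg_type { CRG_DATA, CRG_APPLICATION, CRG_DEVICE, CRG_PEER };

/* True if a failed exit program call with this action code is followed by
   a second call with Undo (15). Assumes action_code is one of the codes
   listed in Table 3. node_removed_from_cluster matters only for 12. */
bool undo_follows_failure(int action_code, bool node_removed_from_cluster)
{
    switch (action_code) {
    case 7: case 14: case 15: case 16: case 20: case 22:
        return false;                      /* never retried with Undo */
    case 12:
        return !node_removed_from_cluster; /* excluded only for cluster removal */
    default:
        return true;
    }
}

/* True if Table 3 lists the action code as applying to the given type. */
bool action_applies(int action_code, enum crg_type type)
{
    switch (action_code) {
    case 3:                                /* Restart */
        return type == CRG_APPLICATION;
    case 10: case 21:                      /* Switchover, Failover Cancelled */
        return type != CRG_PEER;
    case 17: case 18: case 19:             /* device entry action codes */
        return type == CRG_DEVICE;
    case 22:                               /* Cancel: N/A (assumption) */
        return false;
    default:
        return true;                       /* all cluster resource group types */
    }
}
```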

Exit program introduced: V4R4


API or cluster event	Cluster Resource Services does Configuration	User does Configuration
Add Node to Recovery Domain API Add Cluster Resource Group Node Entry Command	Cluster Resource Services ensures the IP address does not exist on the node being added. Cluster Resource Services adds the IP address to the node being added.	If the cluster resource group is active and the node being added is a backup node, Cluster Resource Services ensures the IP address is not active on the node being added.
Cancel job The exit program job running as a result of handling the Start action code is cancelled by some operator action.	Cluster Resource Services ends the IP address after the exit program's cancel handler ends.	Cluster Resource Services ends the IP address after the exit program's cancel handler ends.
Change Cluster Node Entry API Change Cluster Node Entry Command	Cluster Resource Services does not do anything with the IP address.	Cluster Resource Services does not do anything with the IP address.
Change Cluster Recovery API Change Cluster Recovery Command	Cluster Resource Services does not do anything with the IP address.	Cluster Resource Services does not do anything with the IP address.
Change Cluster Resource Group API Change Cluster Resource Group Command	When the takeover IP address is being changed, Cluster Resource Services removes the old IP address on all nodes in a cluster resource group's recovery domain and adds the new IP address. If the cluster resource group is active and the role of a replicate node is being changed to a backup node, Cluster Resource Services ensures the takeover IP address exists and is not active.	If the cluster resource group is active and the role of a replicate node is being changed to a backup node, Cluster Resource Services ensures the takeover IP address exists and is not active.
Create Cluster Resource Group API Create Cluster Resource Group Command	Cluster Resource Services ensures the IP address does not exist on any node in the recovery domain. Cluster Resource Services adds the IP address to every node in the recovery domain.	Cluster Resource Services does not do anything with the IP address.
Delete Cluster API Delete Cluster Command	The IP address is ended on the primary node if the cluster resource group is active. Cluster Resource Services removes the IP address on all nodes in a cluster resource group's recovery domain.	The IP address is ended on the primary node if the cluster resource group is active. Cluster Resource Services does not do anything else with the IP address.
Delete Cluster Resource Group API Delete Cluster Resource Group from Cluster Command	Cluster Resource Services removes the IP address on all nodes in a cluster resource group's recovery domain.	Cluster Resource Services does not do anything with the IP address.
Delete Cluster Resource Group CL command	Cluster Resource Services removes the IP address on all nodes in a cluster resource group's recovery domain.	Cluster Resource Services does not do anything with the IP address.
End Cluster Node API End Cluster Node Command	If the cluster resource group is active and the node being ended is the primary node, Cluster Resource Services ends the IP address on the primary node after calling the exit program with the End Node action code. See the failover event for how other nodes in the recovery domain are handled.	If the cluster resource group is active and the node being ended is the primary node, Cluster Resource Services ends the IP address on the primary node after calling the exit program with the End Node action code. See the failover event for how other nodes in the recovery domain are handled.
End Cluster Resource Group API End Cluster Resource Group Command	Cluster Resource Services ends the IP address on the primary node after calling the exit program with the End action code.	Cluster Resource Services ends the IP address on the primary node after calling the exit program with the End action code.
Failover event	Cluster Resource Services starts the IP address on the new primary node before calling the exit program with the Start action code.	Cluster Resource Services starts the IP address on the new primary node before calling the exit program with the Start action code.
Initiate Switchover API Change Cluster Resource Group Primary Command	Cluster Resource Services ends the IP address on the current primary node before calling the exit program with the Switchover action code. Cluster Resource Services starts the IP address on the new primary node before calling the exit program with the Start action code.	Cluster Resource Services ends the IP address on the current primary node before calling the exit program with the Switchover action code. Cluster Resource Services starts the IP address on the new primary node before calling the exit program with the Start action code.
Node joining event	If the cluster resource group is active and the node joining is a backup node, Cluster Resource Services ensures the IP address is not active on the joining node.	If the cluster resource group is active and the node joining is a backup node, Cluster Resource Services ensures the IP address is not active on the joining node.
Partition merge event	If the cluster resource group is active and the node(s) merging with the primary partition is a backup node, Cluster Resource Services ensures the IP address is not active on the merging node(s).	If the cluster resource group is active and the node(s) merging with the primary partition is a backup node, Cluster Resource Services ensures the IP address is not active on the merging node(s).
Remove Cluster Node Entry API Remove Cluster Node Entry Command	If the cluster resource group is active and the node being removed is the primary node, Cluster Resource Services ends the IP address on the primary node after calling the exit program with the Remove Node action code. See the failover event for how other nodes in the recovery domain are handled. Cluster Resource Services removes the IP address on the node being removed.	If the cluster resource group is active and the node being removed is the primary node, Cluster Resource Services ends the IP address on the primary node after calling the exit program with the Remove Node action code. See the failover event for how other nodes in the recovery domain are handled.
Remove Node from Recovery Domain API Remove Cluster Resource Group Node Entry Command	Cluster Resource Services removes the IP address on the node being removed.	Cluster Resource Services does not do anything with the IP address.
Start Cluster Resource Group API Start Cluster Resource Group Command	Cluster Resource Services ensures the IP address exists on the primary node and all backup nodes. Cluster Resource Services ensures the IP address is not active on any node unless the allow active takeover IP address field is set to 1. Cluster Resource Services starts the IP address on the primary node before calling the exit program with the Start action code.	Cluster Resource Services ensures the IP address exists on the primary node and all backup nodes. Cluster Resource Services ensures the IP address is not active on any node in the recovery domain. Cluster Resource Services starts the IP address on the primary node before calling the exit program with the Start action code.
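Several rows of the table above fix the order in which the takeover IP address is started or ended relative to the exit program call. The sketch below models that ordering for four of the events so that the sequence can be read at a glance. All names are hypothetical, and the sketch leaves out the existence and not-active checks that the table also describes, as well as any exit program calls the table does not mention.

```c
#include <stddef.h>

/* Hypothetical names for the events and steps described in the table. */
enum crg_event { EV_START_CRG, EV_END_CRG, EV_FAILOVER, EV_SWITCHOVER };
enum step {
    STEP_END_IP,           /* end the IP address on the (current) primary */
    STEP_START_IP,         /* start the IP address on the (new) primary   */
    STEP_EXIT_START,       /* exit program called with Start              */
    STEP_EXIT_END,         /* exit program called with End                */
    STEP_EXIT_SWITCHOVER   /* exit program called with Switchover         */
};

/* Writes the ordered steps for an event into out[] and returns the count. */
size_t takeover_ip_steps(enum crg_event ev, enum step out[4])
{
    size_t n = 0;

    switch (ev) {
    case EV_START_CRG:
    case EV_FAILOVER:
        out[n++] = STEP_START_IP;        /* address is started first */
        out[n++] = STEP_EXIT_START;
        break;
    case EV_END_CRG:
        out[n++] = STEP_EXIT_END;        /* exit program runs first */
        out[n++] = STEP_END_IP;
        break;
    case EV_SWITCHOVER:
        out[n++] = STEP_END_IP;          /* on the current primary */
        out[n++] = STEP_EXIT_SWITCHOVER;
        out[n++] = STEP_START_IP;        /* on the new primary */
        out[n++] = STEP_EXIT_START;
        break;
    }
    return n;
}
```

One consequence visible in the sketch is that an exit program handling Start can expect the takeover IP address to be active already, while one handling End or Switchover on the primary cannot rely on it afterwards.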