This topic lists some of the most common problems that can occur in a cluster,
as well as ways to avoid and recover from them.
The following common problems are easily avoidable or easily correctable.
You cannot start or restart a cluster node
This situation is typically due to some problem with your communications
environment. To avoid this situation, ensure that your network attributes are
set correctly, including the loopback address, INETD settings, ALWADDCLU
attribute, and the IP addresses for cluster communications.
- The ALWADDCLU network attribute must be appropriately set on the target
node if you are trying to start a remote node. Set it to either *ANY or
*RQSAUT, depending on your environment.
- The IP addresses chosen to be used for clustering locally and on the target
node must show an Active status.
- The LOOPBACK address (127.0.0.1) locally and on the target node must also
be active.
- The local node and any remote nodes must be able to PING each other using
the IP addresses to be used for clustering, to ensure that network routing is
active.
- INETD must be active on the target node. When INETD is active, port 5550
on the target node should be in a Listen state. See INETD server for
information about starting the INETD server.
- Before you attempt to start a node, port 5551 on that node must not be
open. An open port 5551 prevents clustering from starting successfully on
that node.
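
The two port conditions in the list above can be probed from another system
with a short script. The following is a minimal Python sketch, not an
IBM-supplied tool. The names tcp_port_open and check_cluster_ports and the host
name in the usage note are hypothetical, and the sketch assumes that a
successful TCP connection is a reasonable stand-in for "Listen state" on port
5550 and for "open" on port 5551.

```python
import socket

INETD_CLUSTER_PORT = 5550  # should be in a Listen state on the target node
CLUSTER_PORT = 5551        # must not be open before the node is started


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds.

    Assumption: a TCP connect is used as the probe; this is not the
    NETSTAT-based check an operator would run on the node itself.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_cluster_ports(host, probe=tcp_port_open):
    """Return a list of problems found for the two cluster-related ports."""
    problems = []
    if not probe(host, INETD_CLUSTER_PORT):
        problems.append(
            f"port {INETD_CLUSTER_PORT} is not accepting connections; "
            "is INETD active?"
        )
    if probe(host, CLUSTER_PORT):
        problems.append(
            f"port {CLUSTER_PORT} is already open; the node start may fail"
        )
    return problems
```

For example, check_cluster_ports("node2.example.com") returns an empty list
when both conditions hold. The probe parameter lets the logic be exercised
without a network.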
You end up with several disjointed one-node clusters
This can occur when the node being started cannot communicate with the rest of
the cluster nodes. Check the communications paths.
The response from exit programs is slow.
A common cause for this situation is an incorrect setting for the job
description used by the exit program. The MAXACT parameter may be set too low
so that, for example, only one instance of the exit program can be active at
any point in time. It is recommended that this be set to *NOMAX.
Performance in general seems to be slow.
There are several common causes for this symptom.
- The most likely cause is heavy communications traffic over a shared communications
line.
- Another likely cause is an inconsistency between the communications environment
and the cluster message tuning parameters. You can use the Retrieve Cluster Resource
Services Information (QcstRetrieveCRSInfo) API to view the
current settings of the tuning parameters and the Change Cluster Resource Services (QcstChgClusterResourceServices) API to
change the settings. Cluster performance may be degraded under the default
cluster tuning parameter settings if you are using old adapter hardware.
Adapter hardware types 2617, 2618, 2619, 2626, and 2665 are considered old.
In this case, set the Performance class tuning parameter to Normal.
- Another common cause of this condition is problems with the IP multicast
groups. If the primary cluster addresses (first address entered for a given
node when creating a cluster or adding a node) for several nodes reside on
a common LAN, the cluster uses IP multicast. Using the NETSTAT command,
ensure that the primary cluster addresses show a multicast host group of
226.5.5.5. You can see this by using option 14 (Display multicast group) for
the subject address. If the multicast group does not exist, verify that the
Enable multicast cluster tuning parameter is still set to its default of TRUE
by using the Retrieve Cluster Resource Services Information
(QcstRetrieveCRSInfo) API.
- If all the nodes of a cluster are on a local LAN or have routing capabilities
that can handle Maximum Transmission Unit (MTU) packet sizes of greater than
1,464 bytes throughout the network routes, large cluster message transfers
(greater than 1,536K bytes) can be sped up considerably by increasing the
cluster tuning parameter value for Message fragment size to better match the
route MTUs.
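
To see why a larger Message fragment size helps, it is useful to count how many
fragments one large message needs. The following Python sketch is illustrative
arithmetic only. It assumes that "1,536K bytes" means 1536 * 1024 bytes and
that every fragment carries exactly the fragment size in payload, ignoring
protocol headers. The 8192-byte value is a hypothetical larger setting, not a
recommended or documented one, and the function name is made up.

```python
def fragment_count(message_bytes, fragment_bytes):
    """Fragments needed to carry message_bytes, rounding up.

    Assumption: each fragment carries fragment_bytes of payload and
    header overhead is ignored.
    """
    if message_bytes < 0 or fragment_bytes <= 0:
        raise ValueError("sizes must be positive")
    return (message_bytes + fragment_bytes - 1) // fragment_bytes


message = 1536 * 1024  # assumed meaning of the 1,536K-byte message above

print(fragment_count(message, 1464))  # 1075
print(fragment_count(message, 8192))  # 192 (hypothetical larger setting)
```

Under these assumptions, the same message needs roughly one fifth as many
fragments with the larger setting, provided that every route can carry
fragments of that size.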
You cannot use any of the function of the new release.
If you attempt to use new release function and you see error message CPFBB70,
then your current cluster version is still set at the prior version level.
You must upgrade all cluster nodes to the new release level and then use the
adjust cluster version interface to set the current cluster version to the
new level. See Adjust the cluster version of a cluster for more information.
You cannot add a node to a device domain or access the iSeries™ Navigator
cluster management interface.
To access the iSeries Navigator cluster management interface, or to use
switchable devices, you must have i5/OS™ Option 41 (HA Switchable Resources)
installed on your system. You must also have a valid license key for this
option.
You applied a cluster PTF and it does not seem to be working.
Ensure that you have completed the following tasks after applying the PTF:
- End the cluster.
- Sign off, then sign on. The old program is still active in the activation
group until the activation group is destroyed. All of the cluster code (even
the cluster APIs) runs in the default activation group.
- Start the cluster. Most cluster PTFs require clustering to be ended and
restarted on the node to activate the PTF.
CEE0200 appears in the exit program joblog.
On this error message, the from module is QLEPM and the from procedure is
Q_LE_leBdyPeilog. Any program that the exit program invokes must run in either
*CALLER or a named activation group. You must change your exit program or the
program in error to correct this condition.
CPD000D followed by CPF0001 appears in the cluster resource
services joblog.
When you receive this error message, make sure the QMLTTHDACN system value is
set to either 1 or 2.
Cluster appears hung.
Check whether any cluster resource group exit programs are still outstanding.
To check, use the WRKACTJOB (Work with Active Jobs) command, then look in the
Function column for the presence of PGM-QCSTCRGEXT.
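
If the active job list has been captured as plain text, the search for
PGM-QCSTCRGEXT can be scripted. The following Python sketch assumes only that
the function text appears somewhere on each job's line. The function name
outstanding_exit_program_jobs and the sample lines are made up for
illustration and do not reproduce real WRKACTJOB output.

```python
EXIT_PROGRAM_FUNCTION = "PGM-QCSTCRGEXT"


def outstanding_exit_program_jobs(job_lines):
    """Return the lines that mention the exit program function marker."""
    return [line for line in job_lines if EXIT_PROGRAM_FUNCTION in line]


# Made-up sample lines, for illustration only; not real WRKACTJOB output.
sample = [
    "CRG1        QSYS      BCH   PGM-QCSTCRGEXT   RUN",
    "QTCPIP      QTCP      BCH                    DEQW",
]

for line in outstanding_exit_program_jobs(sample):
    print(line)
```

Any line that is printed identifies a job in which an exit program call is
still outstanding.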