<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html lang="en-us" xml:lang="en-us">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="security" content="public" />
<meta name="Robots" content="index,follow" />
<meta http-equiv="PICS-Label" content='(PICS-1.1 "http://www.icra.org/ratingsv02.html" l gen true r (cz 1 lz 1 nz 1 oz 1 vz 1) "http://www.rsac.org/ratingsv01.html" l gen true r (n 0 s 0 v 0 l 0) "http://www.classify.org/safesurf/" l gen true r (SS~~000 1))' />
<meta name="DC.Type" content="concept" />
<meta name="DC.Title" content="Common cluster problems" />
<meta name="abstract" content="Lists some of the most common problems that can occur in a cluster, as well as ways to avoid and recover from them." />
<meta name="description" content="Lists some of the most common problems that can occur in a cluster, as well as ways to avoid and recover from them." />
<meta name="DC.Relation" scheme="URI" content="rzaigtroubleshoot.htm" />
<meta name="DC.Relation" scheme="URI" content="rzaigconfigenablenode.htm" />
<meta name="DC.Relation" scheme="URI" content="rzaigmanageclusterperformance.htm" />
<meta name="DC.Relation" scheme="URI" content="rzaigplanclusterversions.htm" />
<meta name="DC.Relation" scheme="URI" content="rzaigmanageadjustclusterversion.htm" />
<meta name="DC.Relation" scheme="URI" content="rzaigconfigsimpleclustermanagement.htm" />
<meta name="copyright" content="(C) Copyright IBM Corporation 1998, 2006" />
<meta name="DC.Rights.Owner" content="(C) Copyright IBM Corporation 1998, 2006" />
<meta name="DC.Format" content="XHTML" />
<meta name="DC.Identifier" content="rzaigtroubleshootcommonproblems" />
<meta name="DC.Language" content="en-us" />
<!-- All rights reserved. Licensed Materials Property of IBM -->
<!-- US Government Users Restricted Rights -->
<!-- Use, duplication or disclosure restricted by -->
<!-- GSA ADP Schedule Contract with IBM Corp. -->
<link rel="stylesheet" type="text/css" href="./ibmdita.css" />
<link rel="stylesheet" type="text/css" href="./ic.css" />
<title>Common cluster problems</title>
</head>
<body id="rzaigtroubleshootcommonproblems"><a name="rzaigtroubleshootcommonproblems"><!-- --></a>
<!-- Java sync-link --><script language="Javascript" src="../rzahg/synch.js" type="text/javascript"></script>
<h1 class="topictitle1">Common cluster problems</h1>
<div><p>This topic lists some of the most common problems that can occur in a cluster,
as well as ways to avoid and recover from them.</p>
<p>The following common problems are easily avoidable or easily correctable.</p>
<div class="section"><h4 class="sectiontitle">You cannot start or restart a cluster node</h4><p>This
situation is typically caused by a problem in your communications environment.
To avoid it, ensure that your network attributes are set correctly,
including the loopback address, INETD settings, ALWADDCLU attribute, and the
IP addresses for cluster communications.</p>
<ul><li>If you are trying to start a remote node, the ALWADDCLU network attribute
must be set appropriately on the target node: either *ANY or
*RQSAUT, depending on your environment.</li>
<li>The IP addresses chosen for clustering, locally and on the target
node, must show an <var class="varname">Active</var> status.</li>
<li>The LOOPBACK address (127.0.0.1) must also be active locally and on the
target node.</li>
<li>The local node and any remote nodes must be able to PING each other using the IP addresses
to be used for clustering, to ensure that network routing is active.</li>
<li>INETD must be active on the target node. When INETD is active, port 5550
on the target node should be in a <var class="varname">Listen</var> state. See INETD
server for information about starting the INETD server.</li>
<li>Before you attempt to start a node, port 5551 on the node to be started
must not be open; if it is, clustering cannot start successfully
on that node.</li>
</ul>
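<p>The two port conditions in the list above can be probed from another machine
with a short script. The following is a minimal Python sketch, not an IBM-supplied
tool: the function names <var class="varname">port_accepts_connections</var> and
<var class="varname">preflight</var> are hypothetical, and a failed connection can also
mean that a firewall or routing problem is in the way, so treat the result as a hint
rather than proof.</p>

```python
import socket


def port_accepts_connections(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unreachable: all count as "not accepting".
        return False


def preflight(host, inetd_port=5550, cluster_port=5551):
    """Summarize the two port conditions from the list above.

    inetd_listening:   port 5550 should accept connections when INETD is active.
    cluster_port_free: port 5551 should NOT be open before the node is started.
    """
    return {
        "inetd_listening": port_accepts_connections(host, inetd_port),
        "cluster_port_free": not port_accepts_connections(host, cluster_port),
    }
```

<p>For example, calling <var class="varname">preflight</var> with the host name or
cluster IP address of the target node returns a dictionary in which both values
should be true before you attempt to start the node.</p>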
</div>
<div class="section"><h4 class="sectiontitle">You end up with several disjointed one-node clusters</h4><p>This
can occur when the node being started cannot communicate with the rest of
the cluster nodes. Check the communications paths.</p>
</div>
<div class="section"><h4 class="sectiontitle">The response from exit programs is slow.</h4><p>A common
cause for this situation is an incorrect setting for the job description used
by the exit program. The MAXACT parameter may be set too low so that, for
example, only one instance of the exit program can be active at any point
in time. The recommended setting is *NOMAX.</p>
</div>
<div class="section"><h4 class="sectiontitle">Performance in general seems to be slow.</h4><p>There are
several common causes for this symptom.</p>
<ul><li>The most likely cause is heavy communications traffic over a shared communications
line.</li>
<li>Another likely cause is an inconsistency between the communications environment
and the cluster message tuning parameters. You can use the <a href="../apis/clcntrtcrs.htm"><span class="apiname">Retrieve Cluster Resource
Services Information (QcstRetrieveCRSInfo)</span> API</a> to view the
current settings of the tuning parameters and the <a href="../apis/clcntchgcrs.htm"><span class="apiname">Change Cluster Resource Services (QcstChgClusterResourceServices)</span> API</a> to
change the settings. With old adapter hardware, cluster performance may be degraded
under the default cluster tuning parameter settings. The adapter hardware
types included in the definition of <var class="varname">old</var> are 2617, 2618,
2619, 2626, and 2665. In this case, set the <var class="varname">Performance class</var> tuning
parameter to <var class="varname">Normal</var>.</li>
<li>Another common cause is a problem with the IP multicast
groups. If the primary cluster addresses (the first address entered for a given
node when creating a cluster or adding a node) for several nodes reside on
a common LAN, the cluster uses IP multicast capability. Using the <span class="cmdname">NETSTAT</span> command,
ensure that the primary cluster addresses show a multicast host group of 226.5.5.5.
You can see this by using option 14 <var class="varname">Display multicast group</var> for
the subject address. If the multicast group does not exist, verify that the default
setting of TRUE is still in effect for the <var class="varname">Enable multicast</var> cluster
tuning parameter by using the <span class="apiname">Retrieve Cluster Resource Services
Information (QcstRetrieveCRSInfo)</span> API.</li>
<li>If all the nodes of a cluster are on a local LAN or have routing capabilities
that can handle Maximum Transmission Unit (MTU) packet sizes of greater than
1,464 bytes throughout the network routes, large cluster message transfers
(greater than 1,536K bytes) can be sped up considerably by increasing the cluster
tuning parameter value for <var class="varname">Message fragment size</var> to better
match the route MTUs.</li>
</ul>
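<p>To get a rough feel for why a larger <var class="varname">Message fragment size</var> helps,
the following Python sketch counts how many fragments a large message needs at two
fragment sizes. It is an illustration under stated assumptions only: it treats the
1,464-byte figure above as the baseline fragment size, reads 1,536K as 1,536 x 1,024 bytes,
uses a hypothetical larger size of 8,192 bytes, and ignores any per-fragment protocol overhead.</p>

```python
import math

# Figures taken from the text above.
LARGE_MESSAGE_BYTES = 1536 * 1024   # "1,536K bytes", reading K as 1024
BASELINE_FRAGMENT = 1464            # assumption: 1,464 bytes as the baseline size

# Hypothetical larger fragment size, chosen only for illustration.
EXAMPLE_FRAGMENT = 8192


def fragment_count(message_bytes, fragment_size):
    """Fragments needed for a payload, ignoring per-fragment header overhead."""
    return math.ceil(message_bytes / fragment_size)


baseline = fragment_count(LARGE_MESSAGE_BYTES, BASELINE_FRAGMENT)
larger = fragment_count(LARGE_MESSAGE_BYTES, EXAMPLE_FRAGMENT)
print(baseline, larger)  # prints "1075 192"
```

<p>Fewer fragments mean fewer sends and acknowledgements for the same message, which is
the effect the tuning parameter is meant to achieve when the route MTUs allow it.</p>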
</div>
<div class="section"><h4 class="sectiontitle">You cannot use any of the function of the new release.</h4><p>If
you attempt to use new release function and you see error message CPFBB70,
your current cluster version is still set at the prior version level.
You must upgrade all cluster nodes to the new release level and then use the
adjust cluster version interface to set the current cluster version to the
new level. See Adjust the cluster version of a cluster for more information.</p>
</div>
<div class="section"><h4 class="sectiontitle">You cannot add a node to a device domain or access the iSeries™ Navigator
cluster management interface.</h4><p>To access the iSeries Navigator cluster management
interface, or to use switchable devices, you must have <span class="keyword">i5/OS™</span> Option
41, HA Switchable Resources, installed on your system. You must also have a
valid license key for this option.</p>
</div>
<div class="section"><h4 class="sectiontitle">You applied a cluster PTF and it does not seem to be working.</h4><p><img src="./delta.gif" alt="Start of change" />Ensure that you have completed the following tasks
after applying the PTF:<img src="./deltaend.gif" alt="End of change" /></p>
<div class="p"><ol><li><a href="rzaigmanageendnode.htm">End the
cluster</a></li>
<li>Sign off, then sign on.<p>The old program is still active in the activation
group until the activation group is destroyed. All of the cluster code (even
the cluster APIs) runs in the default activation group.</p>
</li>
<li><a href="rzaigmanagestartnode.htm">Start
the cluster</a><p>Most cluster PTFs require clustering to be ended and
restarted on the node to activate the PTF.</p>
</li>
</ol>
</div>
</div>
<div class="section"><h4 class="sectiontitle">CEE0200 appears in the exit program joblog.</h4><p>In this
error message, the from module is QLEPM and the from procedure is Q_LE_leBdyPeilog.
Any program that the exit program invokes must run in either *CALLER or a
named activation group. You must change your exit program or the program in
error to correct this condition.</p>
</div>
<div class="section"><h4 class="sectiontitle">CPD000D followed by CPF0001 appears in the cluster resource
services joblog.</h4><p>When you receive this error message, make sure
the <a href="../rzakz/rzakzqmltthdacn.htm">QMLTTHDACN</a> system
value is set to either 1 or 2.</p>
</div>
<div class="section"><h4 class="sectiontitle">Cluster appears hung.</h4><p>Check whether cluster resource
group exit programs are outstanding. To check the exit program, use the <a href="../cl/wrkactjob.htm"><span class="cmdname">WRKACTJOB (Work
with Active Jobs)</span> command</a>, then look in the Function column
for the presence of PGM-QCSTCRGEXT.</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="rzaigtroubleshoot.htm" title="Find error recovery solutions for problems that are specific to clusters.">Troubleshoot clusters</a></div>
</div>
<div class="relconcepts"><strong>Related concepts</strong><br />
<div><a href="rzaigconfigenablenode.htm" title="Before you can add a node to a cluster, you need to set a value for the Allow add to cluster (ALWADDCLU) network attribute.">Enable a node to be added to a cluster</a></div>
<div><a href="rzaigmanageclusterperformance.htm" title="When changes are made to a cluster, the overhead necessary to manage the cluster can be affected.">Cluster performance</a></div>
<div><a href="rzaigplanclusterversions.htm" title="A cluster version represents the level of function available on the cluster.">Cluster version</a></div>
<div><a href="rzaigconfigsimpleclustermanagement.htm" title="IBM offers a cluster management interface that is available through iSeries Navigator and accessible through Option 41 (i5/OS - HA Switchable Resources).">iSeries Navigator cluster management</a></div>
</div>
<div class="reltasks"><strong>Related tasks</strong><br />
<div><a href="rzaigmanageadjustclusterversion.htm" title="The cluster version defines the level at which all the nodes in the cluster are actively communicating with each other.">Adjust the cluster version of a cluster</a></div>
</div>
</div>
</body>
</html>