<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html lang="en-us" xml:lang="en-us">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="security" content="public" />
<meta name="Robots" content="index,follow" />
<meta http-equiv="PICS-Label" content='(PICS-1.1 "http://www.icra.org/ratingsv02.html" l gen true r (cz 1 lz 1 nz 1 oz 1 vz 1) "http://www.rsac.org/ratingsv01.html" l gen true r (n 0 s 0 v 0 l 0) "http://www.classify.org/safesurf/" l gen true r (SS~~000 1))' />
<meta name="DC.Type" content="concept" />
<meta name="DC.Title" content="Common cluster problems" />
<meta name="abstract" content="Lists some of the most common problems that can occur in a cluster, as well as ways to avoid and recover from them." />
<meta name="description" content="Lists some of the most common problems that can occur in a cluster, as well as ways to avoid and recover from them." />
<meta name="DC.Relation" scheme="URI" content="rzaigtroubleshoot.htm" />
<meta name="DC.Relation" scheme="URI" content="rzaigconfigenablenode.htm" />
<meta name="DC.Relation" scheme="URI" content="rzaigmanageclusterperformance.htm" />
<meta name="DC.Relation" scheme="URI" content="rzaigplanclusterversions.htm" />
<meta name="DC.Relation" scheme="URI" content="rzaigmanageadjustclusterversion.htm" />
<meta name="DC.Relation" scheme="URI" content="rzaigconfigsimpleclustermanagement.htm" />
<meta name="copyright" content="(C) Copyright IBM Corporation 1998, 2006" />
<meta name="DC.Rights.Owner" content="(C) Copyright IBM Corporation 1998, 2006" />
<meta name="DC.Format" content="XHTML" />
<meta name="DC.Identifier" content="rzaigtroubleshootcommonproblems" />
<meta name="DC.Language" content="en-us" />
<!-- All rights reserved. Licensed Materials Property of IBM -->
<!-- US Government Users Restricted Rights -->
<!-- Use, duplication or disclosure restricted by -->
<!-- GSA ADP Schedule Contract with IBM Corp. -->
<link rel="stylesheet" type="text/css" href="./ibmdita.css" />
<link rel="stylesheet" type="text/css" href="./ic.css" />
<title>Common cluster problems</title>
</head>
<body id="rzaigtroubleshootcommonproblems"><a name="rzaigtroubleshootcommonproblems"><!-- --></a>
<!-- Java sync-link --><script language="Javascript" src="../rzahg/synch.js" type="text/javascript"></script>
<h1 class="topictitle1">Common cluster problems</h1>
<div><p>This topic lists some of the most common problems that can occur in a cluster,
as well as ways to avoid and recover from them.</p>
<p>The following common problems are easy to avoid or easy to correct.</p>
<div class="section"><h4 class="sectiontitle">You cannot start or restart a cluster node</h4><p>This
situation is typically due to some problem with your communications environment.
To avoid this situation, ensure that your network attributes are set correctly,
including the loopback address, INETD settings, ALWADDCLU attribute, and the
IP addresses for cluster communications.</p>
<ul><li>The ALWADDCLU network attribute must be appropriately set on the target
node if trying to start a remote node. This should be set to either *ANY or
*RQSAUT depending on your environment.</li>
<li>The IP addresses chosen to be used for clustering locally and on the target
node must show an <var class="varname">Active</var> status.</li>
<li>The LOOPBACK address (127.0.0.1) locally and on the target node must also
be active.</li>
<li>The local node and any remote nodes must be able to PING each other using the IP addresses
to be used for clustering, to ensure that network routing is active.</li>
<li>INETD must be active on the target node. When INETD is active, port 5550
on the target node should be in a <var class="varname">Listen</var> state. See INETD
server for information about starting the INETD server.</li>
<li>Before you attempt to start a node, port 5551 on the node to be started
must not be open. If it is open, it prevents clustering from starting successfully
on that node.</li>
</ul>
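<p>The INETD item in the list above can be spot-checked from another machine. The following is a minimal sketch, not an IBM-supplied tool: it assumes that a port in a <var class="varname">Listen</var> state accepts a plain TCP connection, and the function names <var class="varname">port_is_listening</var> and <var class="varname">describe_inetd_port</var> are hypothetical. It does not cover port 5551, because the text does not state which protocol that port uses.</p>
<pre>
```python
import socket

# Port that the list above says should be in a Listen state when INETD is active.
INETD_CLUSTER_PORT = 5550


def port_is_listening(host, port=INETD_CLUSTER_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Assumption: a successful TCP connect is treated as evidence that the
    port is in a Listen state. The connection is closed immediately.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def describe_inetd_port(host, port=INETD_CLUSTER_PORT):
    """Return a one-line finding for the INETD port on the given node."""
    if port_is_listening(host, port):
        return f"{host}:{port} accepts TCP connections"
    return f"{host}:{port} does not accept TCP connections; check that INETD is active"
```
</pre>
<p>For example, <var class="varname">describe_inetd_port("192.0.2.10")</var> reports on port 5550 of a node at that placeholder address. A negative result can also mean that a firewall or routing problem lies between the two machines, so treat it as a prompt to check the node directly rather than as proof that INETD is down.</p>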
</div>
<div class="section"><h4 class="sectiontitle">You end up with several disjointed one-node clusters</h4><p>This
can occur when the node being started cannot communicate with the rest of
the cluster nodes. Check the communications paths.</p>
</div>
<div class="section"><h4 class="sectiontitle">The response from exit programs is slow.</h4><p>A common
cause of this situation is an incorrect setting in the job description used
by the exit program. The MAXACT parameter may be set too low so that, for
example, only one instance of the exit program can be active at any point
in time. Set this parameter to *NOMAX.</p>
</div>
<div class="section"><h4 class="sectiontitle">Performance in general seems to be slow.</h4><p>There are
several common causes for this symptom.</p>
<ul><li>The most likely cause is heavy communications traffic over a shared communications
line.</li>
<li>Another likely cause is an inconsistency between the communications environment
and the cluster message tuning parameters. You can use the <a href="../apis/clcntrtcrs.htm"><span class="apiname">Retrieve Cluster Resource
Services Information (QcstRetrieveCRSInfo)</span> API </a>to view the
current settings of the tuning parameters and the <a href="../apis/clcntchgcrs.htm"><span class="apiname">Change Cluster Resource Services (QcstChgClusterResourceServices)</span> API</a> to
change the settings. Cluster performance may be degraded under the default cluster
tuning parameter settings if you are using old adapter hardware. The adapter hardware
types considered <var class="varname">old</var> are 2617, 2618,
2619, 2626, and 2665. In this case, set the <var class="varname">Performance class</var> tuning
parameter to <var class="varname">Normal</var>.</li>
<li>Another common cause of this condition is a problem with the IP multicast
groups. If the primary cluster addresses (the first address entered for a given
node when creating a cluster or adding a node) for several nodes reside on
a common LAN, the cluster uses IP multicast capability. Using the <span class="cmdname">NETSTAT</span> command,
ensure that the primary cluster addresses show a multicast host group of 226.5.5.5.
You can see this by using option 14 <var class="varname">Display multicast group</var> for
the subject address. If the multicast group does not exist, verify that the default
setting of TRUE is still set for the <var class="varname">Enable multicast</var> cluster
tuning parameter by using the <span class="apiname">Retrieve Cluster Resource Services
Information (QcstRetrieveCRSInfo)</span> API.</li>
<li>If all the nodes of a cluster are on a local LAN or have routing capabilities
that can handle Maximum Transmission Unit (MTU) packet sizes of greater than
1,464 bytes throughout the network routes, large cluster message transfers
(greater than 1,536K bytes) can be sped up considerably by increasing the cluster
tuning parameter value for <var class="varname">Message fragment size</var> to better
match the route MTUs.</li>
</ul>
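<p>The MTU condition in the last item above can be expressed as a small check. This is only an illustrative sketch: the route labels and MTU figures are made-up inputs that you would gather yourself, the function names are hypothetical, and the sketch does not compute a value for <var class="varname">Message fragment size</var>. It only reports whether every route exceeds the 1,464-byte figure quoted above.</p>
<pre>
```python
# Threshold quoted in the text above: routes must handle MTUs greater than this.
FRAGMENT_MTU_THRESHOLD = 1464  # bytes


def smallest_route_mtu(route_mtus):
    """Return the smallest MTU among the routes between cluster nodes.

    route_mtus maps a route label (hypothetical, for example "NODE1-NODE2")
    to the MTU in bytes observed on that route.
    """
    if not route_mtus:
        raise ValueError("at least one route MTU is required")
    return min(route_mtus.values())


def fragment_size_increase_may_help(route_mtus):
    """Return True only if every route handles MTUs above the threshold."""
    return smallest_route_mtu(route_mtus) > FRAGMENT_MTU_THRESHOLD
```
</pre>
<p>With the illustrative input <var class="varname">{"NODE1-NODE2": 1500, "NODE1-NODE3": 1492}</var> the smallest route MTU is 1492, so the check passes. If any single route were limited to 1464 bytes or less, the check would fail, because the smallest MTU on the path governs.</p>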
</div>
<div class="section"><h4 class="sectiontitle">You cannot use any of the function of the new release.</h4><p>If
you attempt to use new release function and you see error message CPFBB70,
then your current cluster version is still set at the prior version level.
You must upgrade all cluster nodes to the new release level and then use the
adjust cluster version interface to set the current cluster version to the
new level. See Adjust the cluster version of a cluster for more information.</p>
</div>
<div class="section"><h4 class="sectiontitle">You cannot add a node to a device domain or access the iSeries™ Navigator
cluster management interface.</h4><p>To access the iSeries Navigator cluster management
interface, or to use switchable devices, you must have <span class="keyword">i5/OS™</span> Option
41, HA Switchable Resources installed on your system. You must also have a
valid license key for this option.</p>
</div>
<div class="section"><h4 class="sectiontitle">You applied a cluster PTF and it does not seem to be working.</h4><p><img src="./delta.gif" alt="Start of change" />You should ensure that you have completed the following tasks
after applying the PTF:<img src="./deltaend.gif" alt="End of change" /></p>
<div class="p"><ol><li><a href="rzaigmanageendnode.htm">End the
cluster </a></li>
<li>Sign off, then sign on<p>The old program is still active in the activation
group until the activation group is destroyed. All of the cluster code (even
the cluster APIs) runs in the default activation group.</p>
</li>
<li><a href="rzaigmanagestartnode.htm">Start
the cluster</a><p>Most cluster PTFs require clustering to be ended and
restarted on the node to activate the PTF.</p>
</li>
</ol>
</div>
</div>
<div class="section"><h4 class="sectiontitle">CEE0200 appears in the exit program joblog.</h4><p>In this
error message, the from module is QLEPM and the from procedure is Q_LE_leBdyPeilog.
Any program that the exit program invokes must run in either *CALLER or a
named activation group. You must change your exit program or the program in
error to correct this condition.</p>
</div>
<div class="section"><h4 class="sectiontitle">CPD000D followed by CPF0001 appears in the cluster resource
services joblog.</h4><p>When you receive this error message, make sure
the <a href="../rzakz/rzakzqmltthdacn.htm">QMLTTHDACN </a> system
value is set to either 1 or 2.</p>
</div>
<div class="section"><h4 class="sectiontitle">Cluster appears hung.</h4><p>Check whether any cluster resource
group exit programs are still outstanding. To check for an exit program, use the <a href="../cl/wrkactjob.htm"><span class="cmdname">WRKACTJOB (Work
with Active Jobs)</span> command</a>, then look in the Function column
for the presence of PGM-QCSTCRGEXT.</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="rzaigtroubleshoot.htm" title="Find error recovery solutions for problems that are specific to clusters.">Troubleshoot clusters</a></div>
</div>
<div class="relconcepts"><strong>Related concepts</strong><br />
<div><a href="rzaigconfigenablenode.htm" title="Before you can add a node to a cluster, you need to set a value for the Allow add to cluster (ALWADDCLU) network attribute.">Enable a node to be added to a cluster</a></div>
<div><a href="rzaigmanageclusterperformance.htm" title="When changes are made to a cluster, the overhead necessary to manage the cluster can be affected.">Cluster performance</a></div>
<div><a href="rzaigplanclusterversions.htm" title="A cluster version represents the level of function available on the cluster.">Cluster version</a></div>
<div><a href="rzaigconfigsimpleclustermanagement.htm" title="IBM offers a cluster management interface that is available through iSeries Navigator and accessible through Option 41 (i5/OS - HA Switchable Resources).">iSeries Navigator cluster management</a></div>
</div>
<div class="reltasks"><strong>Related tasks</strong><br />
<div><a href="rzaigmanageadjustclusterversion.htm" title="The cluster version defines the level at which all the nodes in the cluster are actively communicating with each other.">Adjust the cluster version of a cluster</a></div>
</div>
</div>
</body>
</html>