<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html lang="en-us" xml:lang="en-us">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="security" content="public" />
<meta name="Robots" content="index,follow" />
<meta http-equiv="PICS-Label" content='(PICS-1.1 "http://www.icra.org/ratingsv02.html" l gen true r (cz 1 lz 1 nz 1 oz 1 vz 1) "http://www.rsac.org/ratingsv01.html" l gen true r (n 0 s 0 v 0 l 0) "http://www.classify.org/safesurf/" l gen true r (SS~~000 1))' />
<meta name="DC.Type" content="concept" />
<meta name="DC.Title" content="Determine if a cluster problem exists" />
<meta name="abstract" content="Start here to diagnose your cluster problems." />
<meta name="description" content="Start here to diagnose your cluster problems." />
<meta name="DC.Relation" scheme="URI" content="rzaigtroubleshoot.htm" />
<meta name="DC.Relation" scheme="URI" content="../rzaks/rzaksvwjobonsbs.htm" />
<meta name="DC.Relation" scheme="URI" content="../cl/wrkactjob.htm" />
<meta name="DC.Relation" scheme="URI" content="../cl/dspcluinf.htm" />
<meta name="DC.Relation" scheme="URI" content="rzaigmanagejobstructure.htm" />
<meta name="copyright" content="(C) Copyright IBM Corporation 1998, 2006" />
<meta name="DC.Rights.Owner" content="(C) Copyright IBM Corporation 1998, 2006" />
<meta name="DC.Format" content="XHTML" />
<meta name="DC.Identifier" content="rzaigtroubleshootdetermineproblem" />
<meta name="DC.Language" content="en-us" />
<!-- All rights reserved. Licensed Materials Property of IBM -->
<!-- US Government Users Restricted Rights -->
<!-- Use, duplication or disclosure restricted by -->
<!-- GSA ADP Schedule Contract with IBM Corp. -->
<link rel="stylesheet" type="text/css" href="./ibmdita.css" />
<link rel="stylesheet" type="text/css" href="./ic.css" />
<title>Determine if a cluster problem exists</title>
</head>
<body id="rzaigtroubleshootdetermineproblem"><a name="rzaigtroubleshootdetermineproblem"><!-- --></a>
<!-- Java sync-link --><script language="Javascript" src="../rzahg/synch.js" type="text/javascript"></script>
<h1 class="topictitle1">Determine if a cluster problem exists</h1>
<div><p>Start here to diagnose your cluster problems.</p>
<p>At times, your cluster might not appear to be operating correctly. When
you suspect a problem, use the following checks to determine whether
a problem exists and what its nature is.</p>
<ul><li><strong>Determine if clustering is active on your system.</strong> <p>To determine
if cluster resource services is active, look for two jobs, QCSTCTL and
QCSTCRGM, in the QSYSWRK subsystem. If these jobs are active, then cluster
resource services is active. You can use the Work Management function in iSeries™ Navigator
to view jobs in a subsystem, or use the <a href="../cl/wrkactjob.htm"><span class="cmdname">WRKACTJOB (Work with Active Jobs)</span> command</a> to
do this. You can also use the <a href="../cl/dspcluinf.htm"><span class="cmdname">DSPCLUINF (Display Cluster Information)</span> command</a> to
view status information for the cluster.</p>
<ul><li>Additional jobs for cluster resource services may also be active. <a href="rzaigmanagejobstructure.htm">Cluster resource
services job structure</a> provides information about how cluster resource
services jobs are structured.</li>
</ul>
</li>
<li><strong>Look for messages indicating a problem.</strong> <ul><li>Look for inquiry messages in QSYSOPR that are waiting for a response.</li>
<li>Look for error messages in QSYSOPR that indicate a cluster problem. Generally,
these will be in the CPFBB00 to CPFBBFF range.</li>
<li>Display the history log (<span class="cmdname">DSPLOG</span> CL command) for messages
that indicate a cluster problem. Generally, these will be in the CPFBB00 to
CPFBBFF range.</li>
</ul>
</li>
<li><strong>Look at job logs for the cluster jobs for severe errors.</strong> <p>These
jobs are initially set to a logging level of (4 0 *SECLVL) so that you can
see the necessary error messages. Ensure that these jobs and the
exit program jobs have the logging level set appropriately. If clustering
is not active, you can still look for spooled files for the cluster jobs and
exit program jobs.</p>
</li>
<li><strong>If you suspect a hang condition, look at the call stacks of
cluster jobs.</strong> <p>Determine whether any program is in a DEQW
(dequeue wait) state. If so, check the call stack of each thread and see whether any
of them has getSpecialMsg in the call stack.</p>
</li>
<li><strong>Check for cluster vertical licensed internal code (VLIC) log entries.</strong> <p>These
log entries have a 4800 major code.</p>
</li>
<li><strong>Use the NETSTAT command to determine if there are any abnormalities in
your communications environment.</strong> <p>NETSTAT returns information about
the status of TCP/IP network routes, interfaces, TCP connections, and UDP ports
on your system.</p>
<ul><li>Use Netstat Option 1 (Work with TCP/IP interface status) to ensure that
the IP addresses chosen for clustering show an 'Active' status.
Also ensure that the LOOPBACK address (127.0.0.1) is active.</li>
<li>Use Netstat Option 3 (Work with TCP/IP Connection Status) to display the
port numbers (F14). Local port 5550 should be in a 'Listen' state. This port
must be opened with the STRTCPSVR *INETD command, as evidenced by
a QTOGINTD (User QTCP) job in the Active Jobs list. If clustering is started
on a node, local port 5551 must be open and in a '*UDP' state. If clustering
is not started, port 5551 must not be open; otherwise, it prevents
clustering from starting successfully on that node.</li>
</ul>
</li>
<li><strong>Use ping.</strong> <p>If you try to start a cluster node and it cannot be pinged,
you will receive an internal clustering error (CPFBB46).</p>
</li>
<li><img src="./delta.gif" alt="Start of change" /><strong>Use the CLUSTERINFO macro to show cluster
resource services' view of nodes in the cluster, nodes in the various cluster
resource groups, and cluster IP addresses currently being used.</strong> <p>Discrepancies
found here may help pinpoint trouble areas if the cluster is not performing
as expected. See <a href="rzaiginvestigateclusterinfo.htm">Investigate a problem with CLUSTERINFO macro</a> for details on using and interpreting the CLUSTERINFO
macro results.</p>
<img src="./deltaend.gif" alt="End of change" /></li>
</ul>
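<p>The checks above can be sketched as a few small helper functions. This is a minimal
illustration only, not an IBM-supplied tool: the function names are hypothetical, and the sketch
assumes that you have already copied the job names, message IDs, and port states by hand from
WRKACTJOB, DSPLOG, and NETSTAT output. It does not call any system API.</p>
<pre>
```python
import re

# Job names come from this topic; all function names below are hypothetical.
REQUIRED_CLUSTER_JOBS = {"QCSTCTL", "QCSTCRGM"}

# CPFBB followed by two hexadecimal digits, that is, CPFBB00 to CPFBBFF.
_CLUSTER_MSG = re.compile(r"CPFBB[0-9A-F]{2}")


def clustering_active(qsyswrk_jobs):
    """Return True when both required jobs appear among the QSYSWRK job names."""
    names = {name.strip().upper() for name in qsyswrk_jobs}
    return REQUIRED_CLUSTER_JOBS.issubset(names)


def is_cluster_message(message_id):
    """Return True when message_id falls in the CPFBB00 to CPFBBFF range."""
    return _CLUSTER_MSG.fullmatch(message_id.strip().upper()) is not None


def port_findings(clustering_started, port_states):
    """Compare observed local port states with the expectations in this topic.

    port_states maps a local port number to the state read from NETSTAT
    option 3, for example {5550: "Listen", 5551: "*UDP"}. A port that is
    not open is simply absent from the mapping (an assumption of this sketch).
    """
    findings = []
    if port_states.get(5550, "").lower() != "listen":
        findings.append("Port 5550 is not in a Listen state; check that *INETD is started.")
    if clustering_started and port_states.get(5551, "").upper() != "*UDP":
        findings.append("Clustering is started but port 5551 is not in a *UDP state.")
    if not clustering_started and 5551 in port_states:
        findings.append("Port 5551 is open while clustering is not started.")
    return findings
```
</pre>
<p>For example, <code>is_cluster_message("CPFBB46")</code> is true for the ping failure
described above, and <code>port_findings(False, {5551: "*UDP"})</code> reports both the missing
listener on port 5550 and the unexpected open port 5551.</p>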
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="rzaigtroubleshoot.htm" title="Find error recovery solutions for problems that are specific to clusters.">Troubleshoot clusters</a></div>
</div>
<div class="relconcepts"><strong>Related concepts</strong><br />
<div><a href="rzaigmanagejobstructure.htm" title="When managing a cluster, you need to know about job structures and user queues.">Job structure and user queues</a></div>
</div>
<div class="reltasks"><strong>Related tasks</strong><br />
<div><a href="../rzaks/rzaksvwjobonsbs.htm">View jobs in a subsystem</a></div>
</div>
<div class="relref"><strong>Related reference</strong><br />
<div><a href="../cl/wrkactjob.htm">WRKACTJOB (Work with Active Jobs)</a></div>
<div><a href="../cl/dspcluinf.htm">DSPCLUINF (Display Cluster Information) command</a></div>
</div>
</div>
</body>
</html>