<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html lang="en-us" xml:lang="en-us">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="security" content="public" />
<meta name="Robots" content="index,follow" />
<meta http-equiv="PICS-Label" content='(PICS-1.1 "http://www.icra.org/ratingsv02.html" l gen true r (cz 1 lz 1 nz 1 oz 1 vz 1) "http://www.rsac.org/ratingsv01.html" l gen true r (n 0 s 0 v 0 l 0) "http://www.classify.org/safesurf/" l gen true r (SS~~000 1))' />
<meta name="DC.Type" content="concept" />
<meta name="DC.Title" content="Determine if a cluster problem exists" />
<meta name="abstract" content="Start here to diagnose your cluster problems." />
<meta name="description" content="Start here to diagnose your cluster problems." />
<meta name="DC.Relation" scheme="URI" content="rzaigtroubleshoot.htm" />
<meta name="DC.Relation" scheme="URI" content="../rzaks/rzaksvwjobonsbs.htm" />
<meta name="DC.Relation" scheme="URI" content="../cl/wrkactjob.htm" />
<meta name="DC.Relation" scheme="URI" content="../cl/dspcluinf.htm" />
<meta name="DC.Relation" scheme="URI" content="rzaigmanagejobstructure.htm" />
<meta name="copyright" content="(C) Copyright IBM Corporation 1998, 2006" />
<meta name="DC.Rights.Owner" content="(C) Copyright IBM Corporation 1998, 2006" />
<meta name="DC.Format" content="XHTML" />
<meta name="DC.Identifier" content="rzaigtroubleshootdetermineproblem" />
<meta name="DC.Language" content="en-us" />
<!-- All rights reserved. Licensed Materials Property of IBM -->
<!-- US Government Users Restricted Rights -->
<!-- Use, duplication or disclosure restricted by -->
<!-- GSA ADP Schedule Contract with IBM Corp. -->
<link rel="stylesheet" type="text/css" href="./ibmdita.css" />
<link rel="stylesheet" type="text/css" href="./ic.css" />
<title>Determine if a cluster problem exists</title>
</head>
<body id="rzaigtroubleshootdetermineproblem"><a name="rzaigtroubleshootdetermineproblem"><!-- --></a>
<!-- Java sync-link --><script language="Javascript" src="../rzahg/synch.js" type="text/javascript"></script>
<h1 class="topictitle1">Determine if a cluster problem exists</h1>
<div><p>Start here to diagnose your cluster problems.</p>
<p>If your cluster does not appear to be operating correctly, use the following
checks to determine whether a problem exists and, if so, the nature of the problem.</p>
<ul><li><strong>Determine if clustering is active on your system.</strong> <p>To determine
whether cluster resource services is active, look for two jobs, QCSTCTL and
QCSTCRGM, in the QSYSWRK subsystem. If these jobs are active, then cluster
resource services is active. To check, use the Work Management function in iSeries™ Navigator
to view jobs in a subsystem, or use the <a href="../cl/wrkactjob.htm"><span class="cmdname">WRKACTJOB (Work with Active Jobs)</span> command</a>.
You can also use the <a href="../cl/dspcluinf.htm"><span class="cmdname">DSPCLUINF (Display Cluster Information)</span> command</a> to
view status information for the cluster.</p>
<ul><li>Additional jobs for cluster resource services may also be active. <a href="rzaigmanagejobstructure.htm">Cluster resource
services job structure</a> describes how cluster resource
services jobs are structured.</li>
</ul>
</li>
<li><strong>Look for messages indicating a problem.</strong> <ul><li>Look for inquiry messages in QSYSOPR that are waiting for a response.</li>
<li>Look for error messages in QSYSOPR that indicate a cluster problem. Generally,
these will be in the CPFBB00 to CPFBBFF range.</li>
<li>Display the history log (<span class="cmdname">DSPLOG</span> CL command) for messages
that indicate a cluster problem. Generally, these will be in the CPFBB00 to
CPFBBFF range.</li>
</ul>
</li>
<li><strong>Look at job logs for the cluster jobs for severe errors.</strong> <p>These
jobs are initially set with a logging level of (4 0 *SECLVL) so that you can
see the necessary error messages. Ensure that these jobs and the
exit program jobs have the logging level set appropriately. If clustering
is not active, you can still look for spooled files for the cluster jobs and
exit program jobs.</p>
</li>
<li><strong>If you suspect a hang condition, look at the call stacks of the
cluster jobs.</strong> <p>Determine whether any program is in a DEQW
(dequeue wait). If so, check the call stack of each thread to see whether any
of them has getSpecialMsg in the call stack.</p>
</li>
<li><strong>Check for cluster vertical licensed internal code (VLIC) log entries.</strong> <p>These
log entries have a 4800 major code.</p>
</li>
<li><strong>Use the NETSTAT command to determine if there are any abnormalities in
your communications environment.</strong> <p>NETSTAT returns information about
the status of TCP/IP network routes, interfaces, TCP connections, and UDP ports
on your system.</p>
<ul><li>Use NETSTAT option 1 (Work with TCP/IP interface status) to ensure that
the IP addresses chosen for clustering show an 'Active' status.
Also ensure that the LOOPBACK address (127.0.0.1) is active.</li>
<li>Use NETSTAT option 3 (Work with TCP/IP Connection Status) to display the
port numbers (F14). Local port 5550 should be in a 'Listen' state. This port
must be opened with the STRTCPSVR *INETD command, as evidenced by the existence
of a QTOGINTD (User QTCP) job in the Active Jobs list. If clustering is started
on a node, local port 5551 must be open and in a '*UDP' state. If clustering
is not started, port 5551 must not be open; otherwise it prevents
clustering from starting successfully on that node.</li>
</ul>
</li>
<li><strong>Use ping.</strong> <p>If you try to start a cluster node and it cannot be pinged,
you will receive an internal clustering error (CPFBB46).</p></li>
<li><img src="./delta.gif" alt="Start of change" /><strong>Use the CLUSTERINFO macro to show cluster
resource services' view of nodes in the cluster, nodes in the various cluster
resource groups, and cluster IP addresses currently in use.</strong> <p>Discrepancies
found here may help pinpoint trouble areas if the cluster is not performing
as expected. See <a href="rzaiginvestigateclusterinfo.htm">Investigate a problem with CLUSTERINFO macro</a> for details on using and interpreting the CLUSTERINFO
macro results.</p>
<img src="./deltaend.gif" alt="End of change" /></li>
</ul>
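<p>The message-range check in the list above can be sketched in a few lines of Python.
This is only an illustration, not part of cluster resource services: it assumes you have
already exported messages from QSYSOPR or the history log to plain text, with the message
ID as the first token on each line. That input format and both function names are
hypothetical. Because a message ID is a three-character prefix followed by four hexadecimal
digits, the CPFBB00 to CPFBBFF range is every ID that begins with CPFBB.</p>
<pre>
```python
import re

# CPFBB00 to CPFBBFF: the prefix CPFBB followed by two hexadecimal digits.
CLUSTER_MSG_ID = re.compile(r"CPFBB[0-9A-F]{2}")


def is_cluster_message_id(msg_id):
    """Return True if msg_id is in the CPFBB00 to CPFBBFF range."""
    return CLUSTER_MSG_ID.fullmatch(msg_id.strip().upper()) is not None


def filter_cluster_messages(lines):
    """Keep lines whose first token is a cluster message ID.

    Assumption: each line of exported log text starts with the message ID.
    """
    kept = []
    for line in lines:
        tokens = line.split()
        if tokens and is_cluster_message_id(tokens[0]):
            kept.append(line)
    return kept
```
</pre>
<p>For example, <tt>is_cluster_message_id("CPFBB46")</tt> is true for the ping failure
mentioned above, while an ID such as CPFBC00 falls outside the range.</p>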
</div>
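<p>As a rough complement to the NETSTAT option 3 check, the following Python sketch
probes TCP port 5550 from another machine. It is an approximation under stated assumptions:
the node must be reachable, no firewall may be interfering, and the function name and the
host name in the usage note are hypothetical. A successful connection only suggests that
something is listening on the port. It does not confirm that the QTOGINTD job is running,
and it says nothing about the state of UDP port 5551.</p>
<pre>
```python
import socket


def tcp_port_is_listening(host, port=5550, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError if the connection is refused,
        # times out, or the host cannot be resolved.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```
</pre>
<p>A call such as <tt>tcp_port_is_listening("node1.example.com")</tt> returning false is a
prompt to run NETSTAT on the node itself, not proof of a cluster problem.</p>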
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="rzaigtroubleshoot.htm" title="Find error recovery solutions for problems that are specific to clusters.">Troubleshoot clusters</a></div>
</div>
<div class="relconcepts"><strong>Related concepts</strong><br />
<div><a href="rzaigmanagejobstructure.htm" title="When managing a cluster, you need to know about job structures and user queues.">Job structure and user queues</a></div>
</div>
<div class="reltasks"><strong>Related tasks</strong><br />
<div><a href="../rzaks/rzaksvwjobonsbs.htm">View jobs in a subsystem</a></div>
</div>
<div class="relref"><strong>Related reference</strong><br />
<div><a href="../cl/wrkactjob.htm">WRKACTJOB (Work with Active Jobs)</a></div>
<div><a href="../cl/dspcluinf.htm">DSPCLUINF (Display Cluster Information) command</a></div>
</div>
</div>
</body>
</html>