Use the Dump Cluster Trace (DMPCLUTRC) command to help determine and resolve problems with a cluster.
The Dump Cluster Trace (DMPCLUTRC) command can help you determine whether a cluster job has completed or what the job is currently processing. The command dumps cluster-related trace and debug information to a file. The information is dumped locally on one or more cluster nodes. The command can be used to dump one cluster resource services (CRS) job or all of them. Each CRS job that is dumped has a member in the file; the name of the member is the name of the CRS job. Clustering must be active for the command to produce output, and only nodes that have an active CRS job produce output. The dumped information originates from the user trace, and additional information is taken from cluster objects. The dump level determines how much information is sent to the file. The dump levels are basic, error, informational, and verbose. In most cases, an IBM® service representative will tell you which level to specify based on your needs; however, LEVEL(*ERROR) is sufficient for most troubleshooting scenarios. If you have a question about which level is appropriate for your situation, contact an IBM service representative.
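For example, to dump error-level information for a single CRG job and then display the resulting file member, you might use commands similar to the following (the cluster, CRG, and file names are illustrative only):

DMPCLUTRC CLUSTER(MYCLUSTER) CRG(MYCRG) LEVEL(*ERROR) FILE(MYFILE)
DSPPFM FILE(MYFILE) MBR(MYCRG)

The Display Physical File Member (DSPPFM) command shows the dump for the MYCRG job because the member name matches the name of the dumped job.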
You can analyze the trace results to understand what clustering is doing, such as which cluster job is causing a protocol to wait. The output that comes from the user trace contains a separator line, which is a series of equal signs (=). The number of separator lines in the file depends on how many times DMPCLUTRC was issued; the results of multiple calls of DMPCLUTRC can appear in the same file. The last set of stack dumps has the most current information. In some cases, a CRG job can have two groups; each group has a separate dump section in the file.
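If a command appears to be hung, one useful approach is to issue DMPCLUTRC to the same file again after waiting a short time and then compare the last two sets of stack dumps; if the stacks have not changed, the job is most likely waiting rather than making progress. For example, using the same illustrative names as the scenario that follows:

DMPCLUTRC CLUSTER(MYCLUSTER) CRG(MYCRG) LEVEL(*ERROR) FILE(MYFILE)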
In the following example of Dump Cluster Trace results, there are two nodes (SYSTEM1 and SYSTEM2) in a cluster named MYCLUSTER. The cluster has one CRG, named MYCRG, and both nodes are in its recovery domain. The user issued the Start Cluster Resource Group (STRCRG) CL command and the command is taking a long time to return results. From a different workstation, the user entered DMPCLUTRC CLUSTER(MYCLUSTER) CRG(*ALL) LEVEL(*ERROR) FILE(MYFILE) on a command line.
User Trace Dump for job 073586/QSYS/MYCRG. Size: 300K, Wrapped 0 times.
--- 08/22/2005 16:43:32 ---
(1a) 00000006:658536 Main thread handle 2
(1b) 00000008:748016 Work thread 1 handle 13
(1b) 00000007:754576 Work thread 2 handle 11
--- 08/22/2005 16:46:04 ---
00000008:269608 CSTDAMBR 1115: WaitForMsg 4 1005 CPFBB3C
--- 08/22/2005 16:48:17 ---
00000006:925112 (1c) DMPCLUTRC Node SYSTEM1 Group MYCRG
===================================================
The first section contains the thread numbers and handles for the cluster job. Cluster jobs can have two or more threads. In this example, there is one main thread (1a), which is where all the work arrives, and two work threads (1b). This section also identifies the system the trace came from and the cluster job to which it pertains (1c).
00000006:925168 Stack Dump For Target Thread: Handle 2 (0x00000002)
00000006:925192 Stack:  (2aa) Main Thread Stack MYCRG
00000006:925256 Stack:  Library / Program     Module      Stmt   Procedure
00000006:933432 Stack:  QSYS    / QCSTCRGJOB  CSTCRGJOB     0  : _CXX_PEP__Fv
00000006:933488 Stack:  QSYS    / QCSTCRGJOB  CSTCRGJOB    46  : main
00000006:933536 Stack:  QSYS    / QCSTCRGJOB  CSTCRGJOB    65  : completeStartup__FP8CstDAMbr
00000006:933584 Stack:  QSYS    / QCSTCRGJOB  CSTCRGJOB    26  : mainQueueProcessLoop__FP8CstDAMbr
00000006:933616 Stack:  QSYS    / QCSTCMN     CSTDAMBR     57  : processQueueMsgs__8CstDAMbrFQ2_8CstDAMbr13CstQueueIndex
00000006:933664 Stack:  QSYS    / QCSTCMN     CSTDAMBR     53  : processMsg__8CstDAMbrFP6CstMsg
00000006:933712 Stack:  QSYS    / QCSTCMN     CSTDAMBR     17  : callFnPtr__8CstDAMbrFPQ2_8CstDAMbr19MsgFunctionPtrEntryP6
00000006:933744 Stack:  QSYS    / QCSTCRGJOB  CSTCRGJOB    94  : crgDump__FP6CstMsgP8CstDAMbr
00000006:933792 Stack:  QSYS    / QCSTCMN     CSTACK       95  : CstAckQueryMsg
00000006:933832 Stack:  QSYS    / QP0ZCPA     QP0ZUDBG      3  : Qp0zDumpTargetStack
00000006:933864 Stack:  QSYS    / QP0ZSCPA    QP0ZSDBG     12  : Qp0zSUDumpTargetStack
00000006:934016 Stack:  Exception In Stack Dump Code
00000006:934040 Stack:  thread is likely terminated or no longer running the same code as the captured stack
00000006:934080 (2a) Work Thread Index 1 Group MYCRG Last or current values
(2e) 00000006:934112 Request handle 8E3E1002 EE3218A1 824F0004 AC000456
(2c) 00000006:934136 SPI name QcstStartClusterResourceGroup
00000006:934160 (2g) POF 10, Completed ack round 1 (2i)
00000006:934176 (2o) In waitForJobEnd QDFTJOBD MYCLUSTER 073590
00000006:934216 Node      Ack Status   POF (2bb) Nack Msg Id
00000006:934240 (2n) SYSTEM1 (2cc) Ready
00000006:934272 SYSTEM2   Ack          10 (2k)
00000006:934296 Messages
00000006:934320 Stack Dump For Target Thread: Handle 13 (0x0000000d)
00000006:934344 Stack:  Work Thread 1 Stack MYCRG
00000006:934792 Stack:  Library / Program     Module      Stmt   Procedure
00000006:934840 Stack:  QSYS    / QCSTCRGJOB  CSTCRGJOB     9  : workThreadRoutine__FPv
00000006:934888 Stack:  QSYS    / QCSTCRGJOB  CSTCRGJOB    28  : workQueueProcessLoop__FP8CstDAMbr
00000006:941688 Stack:  QSYS    / QCSTCMN     CSTDAMBR     57  : processQueueMsgs__8CstDAMbrFQ2_8CstDAMbr13CstQueueIndex
00000006:941696 Stack:  QSYS    / QCSTCMN     CSTDAMBR     33  : processMsg__8CstDAMbrFP6CstMsg
00000006:941712 Stack:  QSYS    / QCSTCMN     CSTDAMBR     17  : callFnPtr__8CstDAMbrFPQ2_8CstDAMbr19MsgFunctionPtrEntryP6
00000006:941728 Stack:  QSYS    / QCSTCMN     CSTACK        3  : CstStripOffHeaderMsgPart
00000006:941736 Stack:  QSYS    / QCSTCMN     CSTDAMBR     53  : processMsg__8CstDAMbrFP6CstMsg
00000006:941752 Stack:  QSYS    / QCSTCMN     CSTDAMBR     17  : callFnPtr__8CstDAMbrFPQ2_8CstDAMbr19MsgFunctionPtrEntryP6
00000006:970888 Stack:  QSYS    / QCSTCRGS2   CSTCRGSS     39  : startCrg
00000006:970912 Stack:  QSYS    / QCSTCRGS2   CSTCRGSS    344  : doMessageProcessing__FP6CstMsgP8CstDAMbr
00000006:970928 Stack:  QSYS    / QCSTCRGS2   CSTCRGSS     57  : doExitPgmPhase__FP6CstMsgP8CstDAMbr
00000006:981984 Stack:  QSYS    / QCSTCMN     CSTDAMBR     52  : waitForJobEnd__8CstDAMbrFPA26_ci
00000006:982000 Stack:  QSYS    / QCSTCMN     CSTDAMBR     73  : waitForSpecialMsg__8CstDAMbrFP17CstSpecialMsgListPA8_ciT3
00000006:982016 Stack:  QSYS    / QC2UTIL1    QC2MI3        1  : (2dd) deq
00000006:982136 Stack:  Exception In Stack Dump Code
00000006:982136 Stack:  thread is likely terminated or no longer running the same code as the captured stack
00000006:982160 (2b) Work Thread Index 2 Group MYCRG Last or current values
(2f) 00000006:982176 Request handle D9C3C8C3 E2E3F5F2 0003 0000
(2cc) 00000006:982176 SPI name
00000006:982184 (2h) POF 0, (2d) Completed ack (2j) round 0
00000006:982184 In getNextWorkMsg
00000006:982208 Node      Ack Status   POF   Nack Msg Id
(2l) 00000006:982208 SYSTEM1   Ready
(2l) 00000006:982232 SYSTEM2   Ready
00000006:982248 Messages
00000006:982256 Stack Dump For Target Thread: Handle 11 (0x0000000b)
00000006:982256 Stack:  Work Thread 2 Stack MYCRG
00000006:982344 Stack:  Library / Program     Module      Stmt   Procedure
00000006:982360 Stack:  QSYS    / QCSTCRGJOB  CSTCRGJOB     9  : workThreadRoutine__FPv
00000006:982376 Stack:  QSYS    / QCSTCRGJOB  CSTCRGJOB    28  : workQueueProcessLoop__FP8CstDAMbr
00000006:982392 Stack:  QSYS    / QCSTCMN     CSTDAMBR     51  : processQueueMsgs__8CstDAMbrFQ2_8CstDAMbr13CstQueueIndex
(2m) 00000006:982400 Stack:  QSYS    / QCSTCMN     CSTDAMBR    105  : getNextWorkMsg__8CstDAMbrFv
00000006:982416 Stack:  QSYS    / QC2UTIL1    QC2MI3        1  : deq
00000006:982480 Stack:  Exception In Stack Dump Code
00000006:982480 Stack:  thread is likely terminated or no longer running the same code as the captured stack
The second section contains the call stacks for each thread that is part of the cluster job. For the most part, the main thread shows the DMPCLUTRC request that just completed (2aa). The work threads (2a and 2b) contain the trace information that helps determine what is happening with a cluster job. This section contains details for each work thread, such as the SPI name (2c), the completed acknowledgement (ack) (2d), the request handle for the associated API (2e) or the last completed request handle (2f), the current point of failure (POF) (2g and 2h), the current acknowledgement (ack) round (2i and 2j), and the nodes that have acknowledged (2k and 2l).
The current point of failure (POF) is an internal value that represents where in the code the current protocol is; it does not necessarily indicate that a failure has occurred. An Ack status means the node has successfully completed this part of the protocol and is waiting for all other nodes to acknowledge (Ack) or negatively acknowledge (Nack). A Nack status means the node cannot successfully complete this part of the protocol and is waiting for all other nodes to respond. The message ID for the Nack is given in the next column (2bb); this is the same message that is sent to the originator's results information queue (RIQ). If a node fails during a protocol, its status shows as Fail and, depending on the protocol and the node, may or may not be treated as a Nack. A status of Inactive means the node did not participate in the protocol. A status of Ready means that the node has not responded yet. When a thread is in getNextWorkMsg (2m), the thread is waiting for work to do.
Read the procedure names by starting at the bottom of the call stack and working up. In this example, the stack for work thread 1 contains a deq (2dd) together with waitForSpecialMsg, waitForJobEnd, and doExitPgmPhase. This indicates that the protocol is waiting for an exit program to complete before it can continue processing. From the Ack Status column (2k), you can determine which node the protocol is waiting for; in this example, it is node SYSTEM1 (2n). The qualified job name on the waitForJobEnd line (2o) identifies the job that the system is waiting for. Once you determine the job name, you can work with the job to troubleshoot the cause of the delay. Some possible causes are that the job is still waiting on a job queue, the job is running but takes time to process, or the job is waiting for an object that is locked.
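For example, you could use the Work with Job (WRKJOB) command against the qualified job name taken from the waitForJobEnd line to check whether the job is still on a job queue, display its call stack, or see whether it is waiting for a lock; the job shown here is the one from this example:

WRKJOB JOB(073590/MYCLUSTER/QDFTJOBD)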
In this example, the protocol is waiting for an exit program to complete. An easier way to determine whether a protocol is waiting for an exit program or a vary job to complete is to look at the status lines at the beginning of the work thread information and check whether the thread is in waitForJobEnd (2o). The name of the job being waited for is on the same line, which eliminates the need to read through the stacks.
5722SS1 V5R4M0 060210          AS/400 DUMP          073586/QSYS/MYCRG     08/22/05 16:48:18     PAGE 1
DMPSYSOBJ PARAMETERS
(3a) OBJ- MYCRGAIX   CONTEXT- QTEMP   TYPE- *ALL   SUBTYPE- *ALL
OBJECT TYPE- INDEX   *CRGM
  NAME- MYCRGAIX   TYPE- 0E   SUBTYPE- A5
  LIBRARY- QTEMP   006B8A19B00C9E807000   TYPE- 04   SUBTYPE- C1
  CREATION- 08/22/05 16:43:32   SIZE- 0000007000
  ATTRIBUTES- 0000   ADDRESS- C7FE286F04 000000
  . . .
  POINTERS- NONE
  OIR DATA- NONE
END OF DUMP
* * * * *   E N D   O F   L I S T I N G   * * * * *
The third section is an internal object that contains information about the cluster job. In this example, it is an internal index called MYCRGAIX (3a). The information here is much easier to read than the stack information in the second section.
5722SS1 V5R4M0 060210          AS/400 DUMP          073586/QSYS/USER      08/22/05 16:48:18     PAGE 1
DMPSYSOBJ PARAMETERS
(4a) OBJ- MYCRGTQ   CONTEXT- QTEMP   TYPE- 0A   SUBTYPE- EF
OBJECT TYPE- QUEUE   *QTQ
  NAME- MYCRGTQ   TYPE- 0A   SUBTYPE- EF
  LIBRARY- QTEMP   006B8A19B00C9E807000   TYPE- 04   SUBTYPE- C1
  CREATION- 08/22/05 16:43:32   SIZE- 000002C000
  ATTRIBUTES- 0000   ADDRESS- CC6765CAA2 000000
  QUEUE ATTRIBUTES-
  . . .
  POINTERS-
    000010  SPP  1A EF  MYCRG     QSYS      073586   00002160   0000
    000020  SPP  1A EF  MYCRG     QSYS      073586   00001540   8000
    000030  SPP  1A EF  MYCRG     QSYS      073586   000016E0   0000
    000040  SPP  1A EF  MYCRG     QSYS      073586   00001690   0000
    000050  SPP  1A EF  MYCRG     QSYS      073586   000016A0   0000
    000070  SPP  1A EF  MYCRG     QSYS      073586   00002160   0000
  OIR DATA- NONE
END OF DUMP
* * * * *   E N D   O F   L I S T I N G   * * * * *
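The fourth section is another internal object that contains information about the cluster job. In this example, it is an internal queue called MYCRGTQ (4a).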
5722SS1 V5R4M0 060210          AS/400 DUMP          073586/QSYS/MYCRG     08/22/05 16:48:18     PAGE 1
DMPSYSOBJ PARAMETERS
(5a) OBJ- MYCRG   CONTEXT- QUSRSYS   TYPE- 19   SUBTYPE- 2C
OBJECT TYPE- SPACE   *CRG
  NAME- MYCRG   TYPE- 19   SUBTYPE- 2C
  LIBRARY- QUSRSYS   TYPE- 04   SUBTYPE- 01
  CREATION- 08/17/05 07:16:40   SIZE- 0000002000
  OWNER- MYCLUSTER   TYPE- 08   SUBTYPE- 01
  ATTRIBUTES- 0800   ADDRESS- 1EC687A1F3 000000
  SPACE ATTRIBUTES-
  . . .
END OF DUMP
* * * * *   E N D   O F   L I S T I N G   * * * * *
The fifth section contains information about the CRG object (5a).