304 lines
20 KiB
HTML
304 lines
20 KiB
HTML
<?xml version="1.0" encoding="UTF-8"?>
|
||
<!DOCTYPE html
|
||
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
|
||
<html lang="en-us" xml:lang="en-us">
|
||
<head>
|
||
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
|
||
<meta name="security" content="public" />
|
||
<meta name="Robots" content="index,follow" />
|
||
<meta http-equiv="PICS-Label" content='(PICS-1.1 "http://www.icra.org/ratingsv02.html" l gen true r (cz 1 lz 1 nz 1 oz 1 vz 1) "http://www.rsac.org/ratingsv01.html" l gen true r (n 0 s 0 v 0 l 0) "http://www.classify.org/safesurf/" l gen true r (SS~~000 1))' />
|
||
<meta name="DC.Type" content="concept" />
|
||
<meta name="DC.Title" content="Investigate a problem with Dump Cluster Trace (DMPCLUTRC) command" />
|
||
<meta name="abstract" content="Use the Dump Cluster Trace (DMPCLUTRC) command to help determine and resolve problems with a cluster." />
|
||
<meta name="description" content="Use the Dump Cluster Trace (DMPCLUTRC) command to help determine and resolve problems with a cluster." />
|
||
<meta name="DC.Relation" scheme="URI" content="rzaigtroubleshoot.htm" />
|
||
<meta name="copyright" content="(C) Copyright IBM Corporation 1998, 2006" />
|
||
<meta name="DC.Rights.Owner" content="(C) Copyright IBM Corporation 1998, 2006" />
|
||
<meta name="DC.Format" content="XHTML" />
|
||
<meta name="DC.Identifier" content="rzaiginvestigatetrace" />
|
||
<meta name="DC.Language" content="en-us" />
|
||
<!-- All rights reserved. Licensed Materials Property of IBM -->
|
||
<!-- US Government Users Restricted Rights -->
|
||
<!-- Use, duplication or disclosure restricted by -->
|
||
<!-- GSA ADP Schedule Contract with IBM Corp. -->
|
||
<link rel="stylesheet" type="text/css" href="./ibmdita.css" />
|
||
<link rel="stylesheet" type="text/css" href="./ic.css" />
|
||
<title>Investigate a problem with Dump Cluster Trace (DMPCLUTRC) command</title>
|
||
</head>
|
||
<body id="rzaiginvestigatetrace"><a name="rzaiginvestigatetrace"><!-- --></a>
|
||
<img src="./delta.gif" alt="Start of change" /><!-- Java sync-link --><script language="Javascript" src="../rzahg/synch.js" type="text/javascript"></script>
|
||
<h1 class="topictitle1">Investigate a problem with Dump Cluster Trace (DMPCLUTRC) command</h1>
|
||
<div><p>Use the Dump Cluster Trace (DMPCLUTRC) command to help determine
|
||
and resolve problems with a cluster.</p>
|
||
<p>The <a href="../cl/dmpclutrc.htm"><span class="cmdname">Dump
|
||
Cluster Trace (DMPCLUTRC)</span> command</a> can help you to determine
|
||
whether a cluster job has completed or what the job is currently processing.
|
||
The command dumps cluster-related trace and debug information to a file. The
|
||
information is dumped locally on one or more cluster nodes. The command can
|
||
be used to dump one or all cluster resource service (CRS) jobs. Each CRS job
|
||
that is dumped has a file member in the file. The name of the file member
|
||
is the name of the CRS job. Clustering must be active for the command to produce
|
||
output. Only nodes that have an active CRS job will have output. The information
|
||
that is dumped originates from the user trace and other information is taken
|
||
from cluster objects. The amount of information dumped is determined by the
|
||
dump level. The different dump levels are basic information, error information,
|
||
informational information, and verbose information. The dump level determines
|
||
how much information is sent to the file. In most cases an IBM<sup>®</sup> service representative
|
||
will inform you of which level to specify based on your needs, however; LEVEL(*ERROR)
|
||
is sufficient for most troubleshooting scenarios. If you have a question on
|
||
which level is appropriate for your situation, contact an IBM Service representative. </p>
|
||
<div class="section"><h4 class="sectiontitle">Interpreting trace results</h4><p>You can analyze the trace
|
||
results to get an understanding of what the clustering is doing, such as which
|
||
cluster job is causing the protocol to wait. The output which comes from
|
||
the user trace will contain a separator line which is a series of equal signs
|
||
(=). Depending on how many times the DMPCLUTRC was issued will have an impact
|
||
on how many separator lines are seen in the file. There may be multiple calls
|
||
of DMPCLUTRC in the same file. The last set of stack dumps has the most current
|
||
information. In some cases, a CRG job can have two groups. Each group has
|
||
a separate dump section in the file. </p>
|
||
<p>In the following example of
|
||
Dump Cluster Trace results, there are two nodes (SYSTEM1 and SYSTEM2) in a
|
||
cluster named MYCLUSTER. It has one CRG named MYCRG. Both nodes are in the
|
||
recovery domain of the MYCRG. The user issued the STRCRG CL command and the
|
||
process is taking a long time to return results. From a different workstation,
|
||
the user entered <span class="cmdname">DMPCLUTRC CLUSTER(MYCLUSTER) CRG(*ALL) LEVEL(*ERROR)
|
||
FILE(MYFILE)</span> on a command line interface. </p>
|
||
<div class="p">For this example,
|
||
the output from the DMPCLUTRC command resides in a file called MYFILE in member
|
||
MYCRG. To help explain what the contents of member MYCRG, it has been broken
|
||
down into sections. Throughout the sections, numbers are highlighted in parentheses
|
||
to identify the information being described. These details can aid you in
|
||
troubleshooting cluster problems. <div class="note"><span class="notetitle">Note:</span> <img src="./delta.gif" alt="Start of change" />Vertical ellipses indicate
|
||
that a portion of the trace was removed and is not displayed in the output.<img src="./deltaend.gif" alt="End of change" /></div>
|
||
</div>
|
||
</div>
|
||
<div class="section"><h4 class="sectiontitle">Section 1 of DMPCLUTRC results</h4><div class="p"><pre>User Trace Dump for job 073586/QSYS/MYCRG. Size: 300K, Wrapped 0 times.
|
||
|
||
--- 08/22/2005 16:43:32 ---
|
||
<strong>(1a)</strong> 00000006:658536 Main thread handle 2
|
||
<strong>(1b)</strong> 00000008:748016 Work thread 1 handle 13
|
||
<strong>(1b)</strong> 00000007:754576 Work thread 2 handle 11
|
||
--- 08/22/2005 16:46:04 ---
|
||
00000008:269608 CSTDAMBR 1115: WaitForMsg 4 1005 CPFBB3C
|
||
--- 08/22/2005 16:48:17 ---
|
||
00000006:925112
|
||
<strong>(1c) </strong> DMPCLUTRC Node SYSTEM1 Group MYCRG
|
||
===================================================</pre>
|
||
</div>
|
||
<p>The first section contains the thread numbers and handles
|
||
for the cluster job. Cluster jobs may have two or more threads. In this example
|
||
there is one for the main thread (1a), which is where all the work comes into,
|
||
and two work threads (1b). This section also contains information about what
|
||
system this trace came from and to which cluster job it pertains too (1c).
|
||
</p>
|
||
</div>
|
||
<div class="section"><h4 class="sectiontitle">Sections 2 of DMPCLUTRC results</h4><div class="p"><pre>00000006:925168 Stack Dump For Target Thread: Handle 2 (0x00000002)
|
||
00000006:925192 Stack:
|
||
<strong>(2aa)</strong> Main Thread Stack MYCRG
|
||
00000006:925256 Stack: Library / Program Module Stmt Procedure
|
||
00000006:933432 Stack: QSYS / QCSTCRGJOB CSTCRGJOB 0 : _CXX_PEP__Fv
|
||
00000006:933488 Stack: QSYS / QCSTCRGJOB CSTCRGJOB 46 : main
|
||
00000006:933536 Stack: QSYS / QCSTCRGJOB CSTCRGJOB 65 : completeStartup__FP8CstDAMbr
|
||
00000006:933584 Stack: QSYS / QCSTCRGJOB CSTCRGJOB 26 : mainQueueProcessLoop__FP8Cs
|
||
DAMbr
|
||
00000006:933616 Stack: QSYS / QCSTCMN CSTDAMBR 57 : processQueueMsgs__8CstDAMbrF
|
||
Q2_8CstDAMbr13CstQueueIndex
|
||
00000006:933664 Stack: QSYS / QCSTCMN CSTDAMBR 53 : processMsg__8CstDAMbrFP6CstM
|
||
sg
|
||
00000006:933712 Stack: QSYS / QCSTCMN CSTDAMBR 17 : callFnPtr__8CstDAMbrFPQ2_8Cs
|
||
tDAMbr19MsgFunctionPtrEntryP6
|
||
00000006:933744 Stack: QSYS / QCSTCRGJOB CSTCRGJOB 94 : crgDump__FP6CstMsgP8CstDAMbr
|
||
00000006:933792 Stack: QSYS / QCSTCMN CSTACK 95 : CstAckQueryMsg
|
||
00000006:933832 Stack: QSYS / QP0ZCPA QP0ZUDBG 3 : Qp0zDumpTargetStack
|
||
00000006:933864 Stack: QSYS / QP0ZSCPA QP0ZSDBG 12 : Qp0zSUDumpTargetStack
|
||
00000006:934016 Stack: Exception In Stack Dump Code
|
||
00000006:934040 Stack: thread is likely terminated or no longer running the same code as the
|
||
captured stack
|
||
00000006:934080
|
||
<strong>(2a)</strong> Work Thread Index 1 Group MYCRG Last or current values
|
||
<strong>(2e)</strong> 00000006:934112 Request handle 8E3E1002 EE3218A1 824F0004 AC000456
|
||
<strong>(2c)</strong> 00000006:934136 SPI name QcstStartClusterResourceGroup
|
||
00000006:934160 <strong>(2g)</strong> POF 10, Completed ack round 1 <strong>(2i)</strong>
|
||
00000006:934176 <strong>(2o)</strong> In waitForJobEnd QDFTJOBD MYCLUSTER 073590
|
||
00000006:934216 Node Ack Status POF <strong>(2bb)</strong> Nack Msg Id
|
||
00000006:934240 <strong>(2n)</strong> SYSTEM1 <strong>(2cc)</strong> Ready
|
||
00000006:934272 SYSTEM2 Ack 10 <strong>(2k)</strong>
|
||
00000006:934296 Messages
|
||
00000006:934320 Stack Dump For Target Thread: Handle 13 (0x0000000d)
|
||
00000006:934344 Stack: Work Thread 1 Stack MYCRG
|
||
00000006:934792 Stack: Library / Program Module Stmt Procedure
|
||
00000006:934840 Stack: QSYS / QCSTCRGJOB CSTCRGJOB 9 : workThreadRoutine__FPv
|
||
00000006:934888 Stack: QSYS / QCSTCRGJOB CSTCRGJOB 28 : workQueueProcessLoop__FP8Cst
|
||
DAMbr
|
||
00000006:941688 Stack: QSYS / QCSTCMN CSTDAMBR 57 : processQueueMsgs__8CstDAMbrF
|
||
Q2_8CstDAMbr13CstQueueIndex
|
||
00000006:941696 Stack: QSYS / QCSTCMN CSTDAMBR 33 : processMsg__8CstDAMbrFP6CstM
|
||
sg
|
||
00000006:941712 Stack: QSYS / QCSTCMN CSTDAMBR 17 : callFnPtr__8CstDAMbrFPQ2_8Cs
|
||
tDAMbr19MsgFunctionPtrEntryP6
|
||
00000006:941728 Stack: QSYS / QCSTCMN CSTACK 3 : CstStripOffHeaderMsgPart
|
||
00000006:941736 Stack: QSYS / QCSTCMN CSTDAMBR 53 : processMsg__8CstDAMbrFP6CstM
|
||
sg
|
||
00000006:941752 Stack: QSYS / QCSTCMN CSTDAMBR 17 : callFnPtr__8CstDAMbrFPQ2_8Cs
|
||
tDAMbr19MsgFunctionPtrEntryP6
|
||
00000006:970888 Stack: QSYS / QCSTCRGS2 CSTCRGSS 39 : startCrg
|
||
00000006:970912 Stack: QSYS / QCSTCRGS2 CSTCRGSS 344 : doMessageProcessing__FP6CstM
|
||
sgP8CstDAMbr
|
||
00000006:970928 Stack: QSYS / QCSTCRGS2 CSTCRGSS 57 : doExitPgmPhase__FP6CstMsgP8C
|
||
stDAMbr
|
||
00000006:981984 Stack: QSYS / QCSTCMN CSTDAMBR 52 : waitForJobEnd__8CstDAMbrFPA2
|
||
6_ci
|
||
00000006:982000 Stack: QSYS / QCSTCMN CSTDAMBR 73 : waitForSpecialMsg__8CstDAMbr
|
||
FP17CstSpecialMsgListPA8_ciT3
|
||
00000006:982016 Stack: QSYS / QC2UTIL1 QC2MI3 1 : <strong>(2dd)</strong> deq
|
||
00000006:982136 Stack: Exception In Stack Dump Code
|
||
00000006:982136 Stack: thread is likely terminated or no longer running the same code as the
|
||
captured stack
|
||
00000006:982160
|
||
<strong>(2b)</strong>Work Thread Index 2 Group MYCRG Last or current values
|
||
<strong>(2f)</strong>00000006:982176 Request handle D9C3C8C3 E2E3F5F2 0003 0000
|
||
<strong>(2cc)</strong>00000006:982176 SPI name
|
||
00000006:982184 <strong>(2h)</strong> POF 0, <strong>(2d)</strong>Completed ack <strong>(2j)</strong>round 0
|
||
00000006:982184 In getNextWorkMsg
|
||
00000006:982208 Node Ack Status POF Nack Msg Id
|
||
<strong>(2l)</strong> 00000006:982208 SYSTEM1 Ready
|
||
<strong>(2l)</strong> 00000006:982232 SYSTEM2 Ready
|
||
00000006:982248 Messages
|
||
00000006:982256 Stack Dump For Target Thread: Handle 11 (0x0000000b)
|
||
00000006:982256 Stack: Work Thread 2 Stack MYCRG
|
||
00000006:982344 Stack: Library / Program Module Stmt Procedure
|
||
00000006:982360 Stack: QSYS / QCSTCRGJOB CSTCRGJOB 9 : workThreadRoutine__FPv
|
||
00000006:982376 Stack: QSYS / QCSTCRGJOB CSTCRGJOB 28 : workQueueProcessLoop__FP8Cst
|
||
DAMbr
|
||
00000006:982392 Stack: QSYS / QCSTCMN CSTDAMBR 51 : processQueueMsgs__8CstDAMbrF
|
||
Q2_8CstDAMbr13CstQueueIndex
|
||
<strong>(2m)</strong> 00000006:982400 Stack: QSYS / QCSTCMN CSTDAMBR 105 : getNextWorkMsg__8CstDAMbrFv
|
||
00000006:982416 Stack: QSYS / QC2UTIL1 QC2MI3 1 : deq
|
||
00000006:982480 Stack: Exception In Stack Dump Code
|
||
00000006:982480 Stack: thread is likely terminated or no longer running the same code as the
|
||
captured stack</pre>
|
||
</div>
|
||
<p>The second section contains the call stacks
|
||
for each thread that is part of the cluster job. For the most part, the main
|
||
thread will show the DMPCLUTRC which just completed (2aa). The work threads
|
||
(2a and 2b) contain the trace information that will help determine what is
|
||
happening with a cluster job. This section contains details on the call stack,
|
||
such as the SPI name (2c), the completed acknowledgement (ACK)(2d), request
|
||
handle for the associated APIs (2e) or the last completed request handle (2f),
|
||
current point of failure (POF) (2g and 2h), current round of acknowledgement
|
||
(ACK) (2i and 2j) and the nodes that have acknowledged (ACK) (2k and 2l).</p>
|
||
<p> Current<dfn class="term"> point
|
||
of failure (POF)</dfn> is an internal value that represents where in the
|
||
code the current protocol is, and may not necessarily indicate a failure has
|
||
occurred. An <dfn class="term">Ack</dfn> means the node has successfully completed this
|
||
part of the protocol and is waiting for all other nodes to acknowledge (ACK)
|
||
or Nack. A <dfn class="term">Nack</dfn> means the node cannot successfully complete this
|
||
part of the protocol and is waiting for all other nodes to respond. The message
|
||
ID for the Nack is given in the next column (2bb). This is the same message
|
||
that is sent to the originator's RIQ. If a node fails during a protocol, it's
|
||
status shows as Fail and, depending on the protocol and the node, may or may
|
||
not be considered a Nack. An Ack status of Inactive means the node did not
|
||
participate in the protocol. A value of Ready means that node has not responded
|
||
back yet. When a thread is in <samp class="codeph">getNextWorkMsg</samp> (2m) it means
|
||
the thread is waiting for work to do. </p>
|
||
<p>Read the procedure names by
|
||
starting at the bottom and working up the call stack. This example file contains
|
||
a <samp class="codeph">deq</samp> (2dd) with a <samp class="codeph">waitForSpecialMsg</samp>, <samp class="codeph">waitForJobEnd</samp> and <samp class="codeph">doExitPgmPhase</samp>.
|
||
This indicates that the protocol is waiting for an exit program to complete
|
||
before it can continue processing. From the <samp class="codeph">Ack Status</samp> (2k),
|
||
we can determine which node the protocol is waiting for. In this example,
|
||
we are waiting for node SYSTEM1 (2n). The qualified job name (2e) indicates
|
||
the job for which the system is waiting. Once you determine the job name,
|
||
you can work with the job to troubleshoot the cause of the delay. Some possible
|
||
causes may be that the job is still waiting in job queue, the job is running
|
||
but it takes time to process, or the job is waiting for an object that is
|
||
locked. </p>
|
||
<p>In this example, the protocol is waiting for an exit program
|
||
to complete. An easier way to determine if a protocol is waiting for an exit
|
||
program or vary job to complete is to look at the first section, and see if
|
||
in <samp class="codeph">waitForJobEnd</samp> (2o). The job name being waited for is on
|
||
the same line. This eliminates the need to look through the stacks. </p>
|
||
</div>
|
||
<div class="section"><img src="./delta.gif" alt="Start of change" /><h4 class="sectiontitle">Sections 3 of DMPCLUTRC results</h4><div class="p"><pre>5722SS1 V5R4M0 060210 AS/400 DUMP 073586/QSYS/MYCRG 08/22/05 16:48:18 PAGE 1
|
||
DMPSYSOBJ PARAMETERS
|
||
<strong>(3a)</strong>OBJ- MYCRGAIX CONTEXT- QTEMP
|
||
TYPE- *ALL SUBTYPE-*ALL
|
||
OBJECT TYPE- INDEX *CRGM
|
||
NAME- MYCRGAIX TYPE- 0E SUBTYPE- A5
|
||
LIBRARY- QTEMP 006B8A19B00C9E807000 TYPE- 04 SUBTYPE- C1
|
||
CREATION- 08/22/05 16:43:32 SIZE- 0000007000
|
||
ATTRIBUTES- 0000 ADDRESS- C7FE286F04 000000
|
||
.
|
||
.
|
||
.
|
||
.POINTERS-
|
||
NONE
|
||
OIR DATA-
|
||
NONE
|
||
END OF DUMP
|
||
* * * * * E N D O F L I S T I N G * * * * *</pre>
|
||
</div>
|
||
<p>The third section shown is an internal object which contains
|
||
information about the cluster job. In this example, it’s an internal index
|
||
called MYCRGAIX (3a). The information in here is much easier to read than
|
||
in section 2 above. </p>
|
||
<img src="./deltaend.gif" alt="End of change" /></div>
|
||
<div class="section"><h4 class="sectiontitle">Sections 4 of DMPCLUTRC results</h4><div class="p"><pre>5722SS1 V5R4M0 060210 AS/400 DUMP 073586/QSYS/USER 08/22/05 16:48:18 PAGE 1
|
||
DMPSYSOBJ PARAMETERS
|
||
<strong>(4a)</strong>OBJ- MYCRGTQ CONTEXT- QTEMP
|
||
TYPE- 0A SUBTYPE-EF
|
||
OBJECT TYPE- QUEUE *QTQ
|
||
NAME- MYCRGTQ TYPE- 0A SUBTYPE- EF
|
||
LIBRARY- QTEMP 006B8A19B00C9E807000 TYPE- 04 SUBTYPE- C1
|
||
CREATION- 08/22/05 16:43:32 SIZE- 000002C000
|
||
ATTRIBUTES- 0000 ADDRESS- CC6765CAA2 000000
|
||
QUEUE ATTRIBUTES-
|
||
.
|
||
.
|
||
.
|
||
.POINTERS-
|
||
000010 SPP 1A EF MYCRG QSYS 073586 00002160 0000
|
||
000020 SPP 1A EF MYCRG QSYS 073586 00001540 8000
|
||
000030 SPP 1A EF MYCRG QSYS 073586 000016E0 0000
|
||
000040 SPP 1A EF MYCRG QSYS 073586 00001690 0000
|
||
000050 SPP 1A EF MYCRG QSYS 073586 000016A0 0000
|
||
000070 SPP 1A EF MYCRG QSYS 073586 00002160 0000
|
||
OIR DATA-
|
||
NONE
|
||
END OF DUMP
|
||
* * * * * E N D O F L I S T I N G * * * * *</pre>
|
||
</div>
|
||
<div class="p">The fourth section shown is called the trace queue (4a).
|
||
In this situation it is called MYCRGTQ. This contains information about
|
||
what the cluster has been having this job execute and how each job responded
|
||
to the request. <div class="note"><span class="notetitle">Note:</span> Each message is not described fully here.</div>
|
||
</div>
|
||
</div>
|
||
<div class="section"><h4 class="sectiontitle">Sections 5 of DMPCLUTRC results</h4><div class="p"><pre>5722SS1 V5R4M0 060210 AS/400 DUMP 073586/QSYS/MYCRG 08/22/05 16:48:18 PAGE 1
|
||
DMPSYSOBJ PARAMETERS
|
||
<strong>(5a)</strong> OBJ- MYCRG CONTEXT- QUSRSYS
|
||
TYPE- 19 SUBTYPE-2C
|
||
OBJECT TYPE- SPACE *CRG
|
||
NAME- MYCRG TYPE- 19 SUBTYPE- 2C
|
||
LIBRARY- QUSRSYS TYPE- 04 SUBTYPE- 01
|
||
CREATION- 08/17/05 07:16:40 SIZE- 0000002000
|
||
OWNER- MYCLUSTER TYPE- 08 SUBTYPE- 01
|
||
ATTRIBUTES- 0800 ADDRESS- 1EC687A1F3 000000
|
||
SPACE ATTRIBUTES-
|
||
.
|
||
.
|
||
.
|
||
END OF DUMP
|
||
* * * * * E N D O F L I S T I N G * * * * *</pre>
|
||
</div>
|
||
<p>The fifth section contains information about the CRG object
|
||
(5a).</p>
|
||
</div>
|
||
</div>
|
||
<div>
|
||
<div class="familylinks">
|
||
<div class="parentlink"><strong>Parent topic:</strong> <a href="rzaigtroubleshoot.htm" title="Find error recovery solutions for problems that are specific to clusters.">Troubleshoot clusters</a></div>
|
||
</div>
|
||
</div>
|
||
<img src="./deltaend.gif" alt="End of change" /></body>
|
||
</html> |