ibm-information-center/dist/eclipse/plugins/i5OS.ic.rzaig_5.4.0.1/rzaiginvestigatetrace.htm

304 lines
20 KiB
HTML
Raw Blame History

This file contains invisible Unicode characters

This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html lang="en-us" xml:lang="en-us">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="security" content="public" />
<meta name="Robots" content="index,follow" />
<meta http-equiv="PICS-Label" content='(PICS-1.1 "http://www.icra.org/ratingsv02.html" l gen true r (cz 1 lz 1 nz 1 oz 1 vz 1) "http://www.rsac.org/ratingsv01.html" l gen true r (n 0 s 0 v 0 l 0) "http://www.classify.org/safesurf/" l gen true r (SS~~000 1))' />
<meta name="DC.Type" content="concept" />
<meta name="DC.Title" content="Investigate a problem with Dump Cluster Trace (DMPCLUTRC) command" />
<meta name="abstract" content="Use the Dump Cluster Trace (DMPCLUTRC) command to help determine and resolve problems with a cluster." />
<meta name="description" content="Use the Dump Cluster Trace (DMPCLUTRC) command to help determine and resolve problems with a cluster." />
<meta name="DC.Relation" scheme="URI" content="rzaigtroubleshoot.htm" />
<meta name="copyright" content="(C) Copyright IBM Corporation 1998, 2006" />
<meta name="DC.Rights.Owner" content="(C) Copyright IBM Corporation 1998, 2006" />
<meta name="DC.Format" content="XHTML" />
<meta name="DC.Identifier" content="rzaiginvestigatetrace" />
<meta name="DC.Language" content="en-us" />
<!-- All rights reserved. Licensed Materials Property of IBM -->
<!-- US Government Users Restricted Rights -->
<!-- Use, duplication or disclosure restricted by -->
<!-- GSA ADP Schedule Contract with IBM Corp. -->
<link rel="stylesheet" type="text/css" href="./ibmdita.css" />
<link rel="stylesheet" type="text/css" href="./ic.css" />
<title>Investigate a problem with Dump Cluster Trace (DMPCLUTRC) command</title>
</head>
<body id="rzaiginvestigatetrace"><a name="rzaiginvestigatetrace"><!-- --></a>
<img src="./delta.gif" alt="Start of change" /><!-- Java sync-link --><script language="Javascript" src="../rzahg/synch.js" type="text/javascript"></script>
<h1 class="topictitle1">Investigate a problem with Dump Cluster Trace (DMPCLUTRC) command</h1>
<div><p>Use the Dump Cluster Trace (DMPCLUTRC) command to help determine
and resolve problems with a cluster.</p>
<p>The <a href="../cl/dmpclutrc.htm"><span class="cmdname">Dump
Cluster Trace (DMPCLUTRC)</span> command</a> can help you to determine
whether a cluster job has completed or what the job is currently processing.
The command dumps cluster-related trace and debug information to a file. The
information is dumped locally on one or more cluster nodes. The command can
be used to dump one or all cluster resource service (CRS) jobs. Each CRS job
that is dumped has a file member in the file. The name of the file member
is the name of the CRS job. Clustering must be active for the command to produce
output. Only nodes that have an active CRS job will have output. The information
that is dumped originates from the user trace and other information is taken
from cluster objects. The amount of information dumped is determined by the
dump level. The different dump levels are basic information, error information,
informational information, and verbose information. The dump level determines
how much information is sent to the file. In most cases an IBM<sup>®</sup> service representative
will inform you of which level to specify based on your needs, however; LEVEL(*ERROR)
is sufficient for most troubleshooting scenarios. If you have a question on
which level is appropriate for your situation, contact an IBM Service representative. </p>
<div class="section"><h4 class="sectiontitle">Interpreting trace results</h4><p>You can analyze the trace
results to get an understanding of what the clustering is doing, such as which
cluster job is causing the protocol to wait. The output which comes from
the user trace will contain a separator line which is a series of equal signs
(=). Depending on how many times the DMPCLUTRC was issued will have an impact
on how many separator lines are seen in the file. There may be multiple calls
of DMPCLUTRC in the same file. The last set of stack dumps has the most current
information. In some cases, a CRG job can have two groups. Each group has
a separate dump section in the file. </p>
<p>In the following example of
Dump Cluster Trace results, there are two nodes (SYSTEM1 and SYSTEM2) in a
cluster named MYCLUSTER. It has one CRG named MYCRG. Both nodes are in the
recovery domain of the MYCRG. The user issued the STRCRG CL command and the
process is taking a long time to return results. From a different workstation,
the user entered <span class="cmdname">DMPCLUTRC CLUSTER(MYCLUSTER) CRG(*ALL) LEVEL(*ERROR)
FILE(MYFILE)</span> on a command line interface. </p>
<div class="p">For this example,
the output from the DMPCLUTRC command resides in a file called MYFILE in member
MYCRG. To help explain what the contents of member MYCRG, it has been broken
down into sections. Throughout the sections, numbers are highlighted in parentheses
to identify the information being described. These details can aid you in
troubleshooting cluster problems. <div class="note"><span class="notetitle">Note:</span> <img src="./delta.gif" alt="Start of change" />Vertical ellipses indicate
that a portion of the trace was removed and is not displayed in the output.<img src="./deltaend.gif" alt="End of change" /></div>
</div>
</div>
<div class="section"><h4 class="sectiontitle">Section 1 of DMPCLUTRC results</h4><div class="p"><pre>User Trace Dump for job 073586/QSYS/MYCRG. Size: 300K, Wrapped 0 times.
--- 08/22/2005 16:43:32 ---
<strong>(1a)</strong> 00000006:658536 Main thread handle 2
<strong>(1b)</strong> 00000008:748016 Work thread 1 handle 13
<strong>(1b)</strong> 00000007:754576 Work thread 2 handle 11
--- 08/22/2005 16:46:04 ---
00000008:269608 CSTDAMBR 1115: WaitForMsg 4 1005 CPFBB3C
--- 08/22/2005 16:48:17 ---
00000006:925112
<strong>(1c) </strong> DMPCLUTRC Node SYSTEM1 Group MYCRG
===================================================</pre>
</div>
<p>The first section contains the thread numbers and handles
for the cluster job. Cluster jobs may have two or more threads. In this example
there is one for the main thread (1a), which is where all the work comes into,
and two work threads (1b). This section also contains information about what
system this trace came from and to which cluster job it pertains too (1c).
</p>
</div>
<div class="section"><h4 class="sectiontitle">Sections 2 of DMPCLUTRC results</h4><div class="p"><pre>00000006:925168 Stack Dump For Target Thread: Handle 2 (0x00000002)
00000006:925192 Stack:
<strong>(2aa)</strong> Main Thread Stack MYCRG
00000006:925256 Stack: Library / Program Module Stmt Procedure
00000006:933432 Stack: QSYS / QCSTCRGJOB CSTCRGJOB 0 : _CXX_PEP__Fv
00000006:933488 Stack: QSYS / QCSTCRGJOB CSTCRGJOB 46 : main
00000006:933536 Stack: QSYS / QCSTCRGJOB CSTCRGJOB 65 : completeStartup__FP8CstDAMbr
00000006:933584 Stack: QSYS / QCSTCRGJOB CSTCRGJOB 26 : mainQueueProcessLoop__FP8Cs
DAMbr
00000006:933616 Stack: QSYS / QCSTCMN CSTDAMBR 57 : processQueueMsgs__8CstDAMbrF
Q2_8CstDAMbr13CstQueueIndex
00000006:933664 Stack: QSYS / QCSTCMN CSTDAMBR 53 : processMsg__8CstDAMbrFP6CstM
sg
00000006:933712 Stack: QSYS / QCSTCMN CSTDAMBR 17 : callFnPtr__8CstDAMbrFPQ2_8Cs
tDAMbr19MsgFunctionPtrEntryP6
00000006:933744 Stack: QSYS / QCSTCRGJOB CSTCRGJOB 94 : crgDump__FP6CstMsgP8CstDAMbr
00000006:933792 Stack: QSYS / QCSTCMN CSTACK 95 : CstAckQueryMsg
00000006:933832 Stack: QSYS / QP0ZCPA QP0ZUDBG 3 : Qp0zDumpTargetStack
00000006:933864 Stack: QSYS / QP0ZSCPA QP0ZSDBG 12 : Qp0zSUDumpTargetStack
00000006:934016 Stack: Exception In Stack Dump Code
00000006:934040 Stack: thread is likely terminated or no longer running the same code as the
captured stack
00000006:934080
<strong>(2a)</strong> Work Thread Index 1 Group MYCRG Last or current values
<strong>(2e)</strong> 00000006:934112 Request handle 8E3E1002 EE3218A1 824F0004 AC000456
<strong>(2c)</strong> 00000006:934136 SPI name QcstStartClusterResourceGroup
00000006:934160 <strong>(2g)</strong> POF 10, Completed ack round 1 <strong>(2i)</strong>
00000006:934176 <strong>(2o)</strong> In waitForJobEnd QDFTJOBD MYCLUSTER 073590
00000006:934216 Node Ack Status POF <strong>(2bb)</strong> Nack Msg Id
00000006:934240 <strong>(2n)</strong> SYSTEM1 <strong>(2cc)</strong> Ready
00000006:934272 SYSTEM2 Ack 10 <strong>(2k)</strong>
00000006:934296 Messages
00000006:934320 Stack Dump For Target Thread: Handle 13 (0x0000000d)
00000006:934344 Stack: Work Thread 1 Stack MYCRG
00000006:934792 Stack: Library / Program Module Stmt Procedure
00000006:934840 Stack: QSYS / QCSTCRGJOB CSTCRGJOB 9 : workThreadRoutine__FPv
00000006:934888 Stack: QSYS / QCSTCRGJOB CSTCRGJOB 28 : workQueueProcessLoop__FP8Cst
DAMbr
00000006:941688 Stack: QSYS / QCSTCMN CSTDAMBR 57 : processQueueMsgs__8CstDAMbrF
Q2_8CstDAMbr13CstQueueIndex
00000006:941696 Stack: QSYS / QCSTCMN CSTDAMBR 33 : processMsg__8CstDAMbrFP6CstM
sg
00000006:941712 Stack: QSYS / QCSTCMN CSTDAMBR 17 : callFnPtr__8CstDAMbrFPQ2_8Cs
tDAMbr19MsgFunctionPtrEntryP6
00000006:941728 Stack: QSYS / QCSTCMN CSTACK 3 : CstStripOffHeaderMsgPart
00000006:941736 Stack: QSYS / QCSTCMN CSTDAMBR 53 : processMsg__8CstDAMbrFP6CstM
sg
00000006:941752 Stack: QSYS / QCSTCMN CSTDAMBR 17 : callFnPtr__8CstDAMbrFPQ2_8Cs
tDAMbr19MsgFunctionPtrEntryP6
00000006:970888 Stack: QSYS / QCSTCRGS2 CSTCRGSS 39 : startCrg
00000006:970912 Stack: QSYS / QCSTCRGS2 CSTCRGSS 344 : doMessageProcessing__FP6CstM
sgP8CstDAMbr
00000006:970928 Stack: QSYS / QCSTCRGS2 CSTCRGSS 57 : doExitPgmPhase__FP6CstMsgP8C
stDAMbr
00000006:981984 Stack: QSYS / QCSTCMN CSTDAMBR 52 : waitForJobEnd__8CstDAMbrFPA2
6_ci
00000006:982000 Stack: QSYS / QCSTCMN CSTDAMBR 73 : waitForSpecialMsg__8CstDAMbr
FP17CstSpecialMsgListPA8_ciT3
00000006:982016 Stack: QSYS / QC2UTIL1 QC2MI3 1 : <strong>(2dd)</strong> deq
00000006:982136 Stack: Exception In Stack Dump Code
00000006:982136 Stack: thread is likely terminated or no longer running the same code as the
captured stack
00000006:982160
<strong>(2b)</strong>Work Thread Index 2 Group MYCRG Last or current values
<strong>(2f)</strong>00000006:982176 Request handle D9C3C8C3 E2E3F5F2 0003 0000
<strong>(2cc)</strong>00000006:982176 SPI name
00000006:982184 <strong>(2h)</strong> POF 0, <strong>(2d)</strong>Completed ack <strong>(2j)</strong>round 0
00000006:982184 In getNextWorkMsg
00000006:982208 Node Ack Status POF Nack Msg Id
<strong>(2l)</strong> 00000006:982208 SYSTEM1 Ready
<strong>(2l)</strong> 00000006:982232 SYSTEM2 Ready
00000006:982248 Messages
00000006:982256 Stack Dump For Target Thread: Handle 11 (0x0000000b)
00000006:982256 Stack: Work Thread 2 Stack MYCRG
00000006:982344 Stack: Library / Program Module Stmt Procedure
00000006:982360 Stack: QSYS / QCSTCRGJOB CSTCRGJOB 9 : workThreadRoutine__FPv
00000006:982376 Stack: QSYS / QCSTCRGJOB CSTCRGJOB 28 : workQueueProcessLoop__FP8Cst
DAMbr
00000006:982392 Stack: QSYS / QCSTCMN CSTDAMBR 51 : processQueueMsgs__8CstDAMbrF
Q2_8CstDAMbr13CstQueueIndex
<strong>(2m)</strong> 00000006:982400 Stack: QSYS / QCSTCMN CSTDAMBR 105 : getNextWorkMsg__8CstDAMbrFv
00000006:982416 Stack: QSYS / QC2UTIL1 QC2MI3 1 : deq
00000006:982480 Stack: Exception In Stack Dump Code
00000006:982480 Stack: thread is likely terminated or no longer running the same code as the
captured stack</pre>
</div>
<p>The second section contains the call stacks
for each thread that is part of the cluster job. For the most part, the main
thread will show the DMPCLUTRC which just completed (2aa). The work threads
(2a and 2b) contain the trace information that will help determine what is
happening with a cluster job. This section contains details on the call stack,
such as the SPI name (2c), the completed acknowledgement (ACK)(2d), request
handle for the associated APIs (2e) or the last completed request handle (2f),
current point of failure (POF) (2g and 2h), current round of acknowledgement
(ACK) (2i and 2j) and the nodes that have acknowledged (ACK) (2k and 2l).</p>
<p> Current<dfn class="term"> point
of failure (POF)</dfn> is an internal value that represents where in the
code the current protocol is, and may not necessarily indicate a failure has
occurred. An <dfn class="term">Ack</dfn> means the node has successfully completed this
part of the protocol and is waiting for all other nodes to acknowledge (ACK)
or Nack. A <dfn class="term">Nack</dfn> means the node cannot successfully complete this
part of the protocol and is waiting for all other nodes to respond. The message
ID for the Nack is given in the next column (2bb). This is the same message
that is sent to the originator's RIQ. If a node fails during a protocol, it's
status shows as Fail and, depending on the protocol and the node, may or may
not be considered a Nack. An Ack status of Inactive means the node did not
participate in the protocol. A value of Ready means that node has not responded
back yet. When a thread is in <samp class="codeph">getNextWorkMsg</samp> (2m) it means
the thread is waiting for work to do. </p>
<p>Read the procedure names by
starting at the bottom and working up the call stack. This example file contains
a <samp class="codeph">deq</samp> (2dd) with a <samp class="codeph">waitForSpecialMsg</samp>, <samp class="codeph">waitForJobEnd</samp> and <samp class="codeph">doExitPgmPhase</samp>.
This indicates that the protocol is waiting for an exit program to complete
before it can continue processing. From the <samp class="codeph">Ack Status</samp> (2k),
we can determine which node the protocol is waiting for. In this example,
we are waiting for node SYSTEM1 (2n). The qualified job name (2e) indicates
the job for which the system is waiting. Once you determine the job name,
you can work with the job to troubleshoot the cause of the delay. Some possible
causes may be that the job is still waiting in job queue, the job is running
but it takes time to process, or the job is waiting for an object that is
locked. </p>
<p>In this example, the protocol is waiting for an exit program
to complete. An easier way to determine if a protocol is waiting for an exit
program or vary job to complete is to look at the first section, and see if
in <samp class="codeph">waitForJobEnd</samp> (2o). The job name being waited for is on
the same line. This eliminates the need to look through the stacks. </p>
</div>
<div class="section"><img src="./delta.gif" alt="Start of change" /><h4 class="sectiontitle">Sections 3 of DMPCLUTRC results</h4><div class="p"><pre>5722SS1 V5R4M0 060210 AS/400 DUMP 073586/QSYS/MYCRG 08/22/05 16:48:18 PAGE 1
DMPSYSOBJ PARAMETERS
<strong>(3a)</strong>OBJ- MYCRGAIX CONTEXT- QTEMP
TYPE- *ALL SUBTYPE-*ALL
OBJECT TYPE- INDEX *CRGM
NAME- MYCRGAIX TYPE- 0E SUBTYPE- A5
LIBRARY- QTEMP 006B8A19B00C9E807000 TYPE- 04 SUBTYPE- C1
CREATION- 08/22/05 16:43:32 SIZE- 0000007000
ATTRIBUTES- 0000 ADDRESS- C7FE286F04 000000
.
.
.
.POINTERS-
NONE
OIR DATA-
NONE
END OF DUMP
* * * * * E N D O F L I S T I N G * * * * *</pre>
</div>
<p>The third section shown is an internal object which contains
information about the cluster job. In this example, its an internal index
called MYCRGAIX (3a). The information in here is much easier to read than
in section 2 above. </p>
<img src="./deltaend.gif" alt="End of change" /></div>
<div class="section"><h4 class="sectiontitle">Sections 4 of DMPCLUTRC results</h4><div class="p"><pre>5722SS1 V5R4M0 060210 AS/400 DUMP 073586/QSYS/USER 08/22/05 16:48:18 PAGE 1
DMPSYSOBJ PARAMETERS
<strong>(4a)</strong>OBJ- MYCRGTQ CONTEXT- QTEMP
TYPE- 0A SUBTYPE-EF
OBJECT TYPE- QUEUE *QTQ
NAME- MYCRGTQ TYPE- 0A SUBTYPE- EF
LIBRARY- QTEMP 006B8A19B00C9E807000 TYPE- 04 SUBTYPE- C1
CREATION- 08/22/05 16:43:32 SIZE- 000002C000
ATTRIBUTES- 0000 ADDRESS- CC6765CAA2 000000
QUEUE ATTRIBUTES-
.
.
.
.POINTERS-
000010 SPP 1A EF MYCRG QSYS 073586 00002160 0000
000020 SPP 1A EF MYCRG QSYS 073586 00001540 8000
000030 SPP 1A EF MYCRG QSYS 073586 000016E0 0000
000040 SPP 1A EF MYCRG QSYS 073586 00001690 0000
000050 SPP 1A EF MYCRG QSYS 073586 000016A0 0000
000070 SPP 1A EF MYCRG QSYS 073586 00002160 0000
OIR DATA-
NONE
END OF DUMP
* * * * * E N D O F L I S T I N G * * * * *</pre>
</div>
<div class="p">The fourth section shown is called the trace queue (4a).
In this situation it is called MYCRGTQ. This contains information about
what the cluster has been having this job execute and how each job responded
to the request. <div class="note"><span class="notetitle">Note:</span> Each message is not described fully here.</div>
</div>
</div>
<div class="section"><h4 class="sectiontitle">Sections 5 of DMPCLUTRC results</h4><div class="p"><pre>5722SS1 V5R4M0 060210 AS/400 DUMP 073586/QSYS/MYCRG 08/22/05 16:48:18 PAGE 1
DMPSYSOBJ PARAMETERS
<strong>(5a)</strong> OBJ- MYCRG CONTEXT- QUSRSYS
TYPE- 19 SUBTYPE-2C
OBJECT TYPE- SPACE *CRG
NAME- MYCRG TYPE- 19 SUBTYPE- 2C
LIBRARY- QUSRSYS TYPE- 04 SUBTYPE- 01
CREATION- 08/17/05 07:16:40 SIZE- 0000002000
OWNER- MYCLUSTER TYPE- 08 SUBTYPE- 01
ATTRIBUTES- 0800 ADDRESS- 1EC687A1F3 000000
SPACE ATTRIBUTES-
.
.
.
END OF DUMP
* * * * * E N D O F L I S T I N G * * * * *</pre>
</div>
<p>The fifth section contains information about the CRG object
(5a).</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="rzaigtroubleshoot.htm" title="Find error recovery solutions for problems that are specific to clusters.">Troubleshoot clusters</a></div>
</div>
</div>
<img src="./deltaend.gif" alt="End of change" /></body>
</html>