
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<META http-equiv="Content-Type" content="text/html; charset=utf-8">
<LINK rel="stylesheet" type="text/css" href="../../../rzahg/ic.css">

<title>Troubleshoot: Detect hung threads in J2EE applications</title>
</head>
<BODY>
<!-- Java sync-link -->
<SCRIPT LANGUAGE="Javascript" SRC="../../../rzahg/synch.js" TYPE="text/javascript"></SCRIPT>
<h3><a name="trbhangdet"></a>Troubleshoot: Detect hung threads in J2EE applications</h3>

<p>A common error in J2EE applications is a hung thread. A hung thread can
result from a simple software defect (such as an infinite loop) or a more
complex cause (for example, a resource deadlock). A hung thread can consume
system resources, such as CPU time, when it runs an unbounded code path,
such as an infinite loop. Alternatively, a system can become unresponsive
even though all resources are idle, as in a deadlock scenario. Unless an end
user or a monitoring tool reports the problem, the system might remain in
this degraded state indefinitely.</p>

<p>The hang detection option for WebSphere Application Server - Express is turned on
by default. You can configure a hang detection policy to accommodate your
applications and environment so that potential hangs can be reported, providing
earlier detection of failing servers. When a hung thread is detected, WebSphere
Application Server - Express notifies you so that you can troubleshoot the problem.</p>

<p>Using the hang detection policy, you can specify the length of time after
which a unit of work is considered to have run too long. The thread monitor
checks all managed threads in the system (for example, Web container threads
and object request broker (ORB) threads). Unmanaged threads, which are threads
created by applications, are not monitored.</p>
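<p>The threshold check described above can be pictured with a minimal Java sketch. It is an illustration only, not the WebSphere Application Server - Express implementation: the class <code>HangMonitorSketch</code>, its method names, and the millisecond time unit are all assumptions made for this example.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of a hang-detection check; not the product's code. */
class HangMonitorSketch {
    // Time (in milliseconds) at which each managed thread started its current unit of work.
    private final Map<String, Long> activeSince = new ConcurrentHashMap<>();
    private final long thresholdMillis;

    HangMonitorSketch(long thresholdMillis) {
        this.thresholdMillis = thresholdMillis;
    }

    /** Called when a managed thread picks up a unit of work. */
    void workStarted(String threadName, long nowMillis) {
        activeSince.put(threadName, nowMillis);
    }

    /** Called when the thread finishes its unit of work. */
    void workCompleted(String threadName) {
        activeSince.remove(threadName);
    }

    /** Returns the names of threads that have been active longer than the threshold. */
    List<String> findSuspects(long nowMillis) {
        List<String> suspects = new ArrayList<>();
        for (Map.Entry<String, Long> entry : activeSince.entrySet()) {
            if (nowMillis - entry.getValue() > thresholdMillis) {
                suspects.add(entry.getKey());
            }
        }
        return suspects;
    }
}
```

<p>In this sketch only threads that register their work with the monitor are checked, which mirrors the distinction between managed and unmanaged threads: a thread that an application creates on its own never appears in the map and so is never reported.</p>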

<p>When WebSphere Application Server - Express detects that a thread has been active
longer than the time defined by the thread monitor threshold, the application
server takes the following actions:</p>

<ul>
<li><p>Logs a warning in the WebSphere Application Server - Express SystemOut.log
file that indicates the name of the thread that is hung and how long it has
already been active. The following message is written to the log:</p>
<pre>
WSVR0605W: Thread <em>threadname</em> has been active for <em>hangtime</em> and may be hung.
There are <em>totalthreads</em> threads in total in the server that may be hung.
</pre>
<p>where <em>threadname</em> is the name that appears in a JVM thread dump,
<em>hangtime</em> is the approximate length of time that the thread has been active,
and <em>totalthreads</em> is the total number of threads in the server that
might be hung.</p>
</li>
<li><p>Issues a Java Management Extensions (JMX) notification. This notification
enables third-party tools to catch the event and take appropriate action,
such as triggering a JVM thread dump of the server, or issuing an electronic
page or e-mail. The following JMX notification events are defined in the
com.ibm.websphere.management.NotificationConstants class:</p>

	 	<ul>
		<li>TYPE_THREAD_MONITOR_THREAD_HUNG: This event is triggered by the detection
of a (potentially) hung thread.</li>
		<li>TYPE_THREAD_MONITOR_THREAD_CLEAR: This event is triggered if a thread
that was previously reported as hung completes its work. See <a href="#falsealarms">False Alarms</a> for more information.</li>
		</ul>

</li>

<li><p>Triggers changes in the performance monitoring
infrastructure (PMI) data counters. These PMI data counters are used by various
tools to provide a performance analysis.</p></li>
</ul>
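<p>As a rough illustration of how a monitoring tool might react to these notifications, the following Java sketch filters JMX notifications by type and records hung and clear events. The class name <code>HungThreadListenerSketch</code> and the two type strings are placeholders invented for this example so that it runs with only the standard <code>javax.management</code> classes. A real client would use the <code>TYPE_THREAD_MONITOR_THREAD_HUNG</code> and <code>TYPE_THREAD_MONITOR_THREAD_CLEAR</code> constants from <code>com.ibm.websphere.management.NotificationConstants</code> instead.</p>

```java
import java.util.ArrayList;
import java.util.List;
import javax.management.Notification;
import javax.management.NotificationFilterSupport;
import javax.management.NotificationListener;

/** Hypothetical listener that records hung-thread and clear notifications. */
class HungThreadListenerSketch implements NotificationListener {
    // Placeholder type strings (assumptions). Replace them with the
    // NotificationConstants values when running against a real server.
    static final String HUNG = "example.thread.monitor.hung";
    static final String CLEAR = "example.thread.monitor.clear";

    final List<String> events = new ArrayList<>();

    @Override
    public void handleNotification(Notification notification, Object handback) {
        if (HUNG.equals(notification.getType())) {
            // A tool could trigger a thread dump or send a page here.
            events.add("HUNG: " + notification.getMessage());
        } else if (CLEAR.equals(notification.getType())) {
            // The earlier report was a false alarm.
            events.add("CLEAR: " + notification.getMessage());
        }
    }

    /** Builds a filter that passes only the two thread monitor event types. */
    static NotificationFilterSupport filter() {
        NotificationFilterSupport filter = new NotificationFilterSupport();
        filter.enableType(HUNG);
        filter.enableType(CLEAR);
        return filter;
    }
}
```

<p>The listener and filter would be registered through whatever JMX connection the monitoring tool already uses. Keeping the filter separate from the listener means that unrelated notifications are discarded before the listener is called.</p>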


<p><a name="falsealarms"></a><b>False Alarms</b></p>

<p>If a thread that was reported as hung actually completes its work, a second set of messages, notifications, and PMI events is produced to identify the false alarm. The following message is written to the SystemOut.log file:</p>
<pre>
WSVR0606W: Thread <em>threadname</em> was previously reported to be hung but has completed.
It was active for approximately <em>hangtime</em>.
There are <em>totalthreads</em> threads in total in the server that still may be hung.
</pre>
<p>where <em>threadname</em> is the name that appears in a JVM thread dump,
<em>hangtime</em> is the approximate length of time that the thread was active,
and <em>totalthreads</em> is the total number of threads in the server that
might still be hung.</p>
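<p>Because each hang report (WSVR0605W) can later be matched by a clear report (WSVR0606W), a simple way to gauge how many reports were false alarms is to count both message IDs in the log. The following Java sketch does only that. The class name <code>HangLogScanSketch</code> is hypothetical, and the code matches on the message IDs alone rather than assuming the exact layout of a log line.</p>

```java
import java.util.List;

/** Hypothetical helper that counts hang and clear reports in log lines. */
class HangLogScanSketch {
    static final String HUNG_ID = "WSVR0605W";
    static final String CLEAR_ID = "WSVR0606W";

    /** Returns a two-element array: {hang reports, clear reports}. */
    static int[] count(List<String> lines) {
        int hung = 0;
        int cleared = 0;
        for (String line : lines) {
            if (line.contains(HUNG_ID)) {
                hung++;
            } else if (line.contains(CLEAR_ID)) {
                cleared++;
            }
        }
        return new int[] { hung, cleared };
    }
}
```

<p>The difference between the two counts gives the number of hang reports that have not been cleared, which are the ones most worth investigating.</p>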

<p><b>Automatic adjustment of the hang time threshold</b></p>

<p>If the thread monitor determines that too many false alarms are issued (determined
by the number of pairs of hang and clear messages), it can automatically adjust
the threshold. When this adjustment occurs, the following message is written
to the SystemOut.log file:</p>
<pre>
WSVR0607W: Too many thread hangs have been falsely reported.
The hang threshold is now being set to <em>thresholdtime</em>.
</pre>
<p>where <em>thresholdtime</em> is the time (in seconds) in which a thread can
be active before it is considered hung.</p>

<p>You can prevent WebSphere Application Server - Express from automatically
adjusting the hang time threshold.</p>
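<p>The adjustment logic can be sketched as a counter of hang and clear pairs that raises the threshold once a limit is reached. This is a sketch under stated assumptions, not the product's algorithm: the class <code>ThresholdAdjusterSketch</code>, the growth factor of 1.5, and the convention that a limit of zero disables adjustment are all chosen for illustration.</p>

```java
/** Hypothetical sketch of automatic hang threshold adjustment. */
class ThresholdAdjusterSketch {
    private long thresholdSeconds;
    // Number of false alarms tolerated before adjusting; 0 disables adjustment (assumption).
    private final int falseAlarmLimit;
    private int falseAlarms;

    ThresholdAdjusterSketch(long thresholdSeconds, int falseAlarmLimit) {
        this.thresholdSeconds = thresholdSeconds;
        this.falseAlarmLimit = falseAlarmLimit;
    }

    /** Records one hang/clear pair; returns true if the threshold was raised. */
    boolean recordFalseAlarm() {
        if (falseAlarmLimit <= 0) {
            return false; // automatic adjustment is turned off
        }
        falseAlarms++;
        if (falseAlarms < falseAlarmLimit) {
            return false;
        }
        falseAlarms = 0;
        thresholdSeconds = thresholdSeconds * 3 / 2; // assumed growth factor of 1.5
        return true;
    }

    long thresholdSeconds() {
        return thresholdSeconds;
    }
}
```

<p>Resetting the counter after each adjustment means that the threshold keeps growing only if false alarms continue at the new setting.</p>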

<p>For more information, see the following topics:</p>

<ul>
<li><a href="trbadjhangdet.htm">Adjust the hang detection policy</a></li>
<li><a href="trbconfighangdet.htm">Configure the hang detection policy</a></li>
</ul>


</body>
</html>