<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html lang="en-us" xml:lang="en-us">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="security" content="public" />
<meta name="Robots" content="index,follow" />
<meta http-equiv="PICS-Label" content='(PICS-1.1 "http://www.icra.org/ratingsv02.html" l gen true r (cz 1 lz 1 nz 1 oz 1 vz 1) "http://www.rsac.org/ratingsv01.html" l gen true r (n 0 s 0 v 0 l 0) "http://www.classify.org/safesurf/" l gen true r (SS~~000 1))' />
<meta name="DC.Type" content="concept" />
<meta name="DC.Title" content="Partition errors" />
<meta name="abstract" content="Certain cluster conditions are easily corrected. If a cluster partition has occurred, you can learn how to recover. This topic also tells you how to avoid a cluster partition and gives you an example of how to merge partitions back together." />
<meta name="description" content="Certain cluster conditions are easily corrected. If a cluster partition has occurred, you can learn how to recover. This topic also tells you how to avoid a cluster partition and gives you an example of how to merge partitions back together." />
<meta name="DC.Relation" scheme="URI" content="rzaigtroubleshoot.htm" />
<meta name="DC.Relation" scheme="URI" content="rzaigtroubleshootdeterminepartitions.htm" />
<meta name="DC.Relation" scheme="URI" content="rzaigtroubleshootchangepartitionednodes.htm" />
<meta name="DC.Relation" scheme="URI" content="rzaigtroubleshoottipclusterpartitions.htm" />
<meta name="DC.Relation" scheme="URI" content="rzaigconceptpartition.htm" />
<meta name="DC.Relation" scheme="URI" content="rzaigplanavoidclusterpartition.htm" />
<meta name="DC.Relation" scheme="URI" content="rzaigconceptsmerge.htm" />
<meta name="DC.Relation" scheme="URI" content="rzaigtroubleshootexamplefailover.htm" />
<meta name="copyright" content="(C) Copyright IBM Corporation 1998, 2006" />
<meta name="DC.Rights.Owner" content="(C) Copyright IBM Corporation 1998, 2006" />
<meta name="DC.Format" content="XHTML" />
<meta name="DC.Identifier" content="rzaigtroubleshootpartitionerrors" />
<meta name="DC.Language" content="en-us" />
<!-- All rights reserved. Licensed Materials Property of IBM -->
<!-- US Government Users Restricted Rights -->
<!-- Use, duplication or disclosure restricted by -->
<!-- GSA ADP Schedule Contract with IBM Corp. -->
<link rel="stylesheet" type="text/css" href="./ibmdita.css" />
<link rel="stylesheet" type="text/css" href="./ic.css" />
<title>Partition errors</title>
</head>
<body id="rzaigtroubleshootpartitionerrors"><a name="rzaigtroubleshootpartitionerrors"><!-- --></a>
<!-- Java sync-link --><script language="Javascript" src="../rzahg/synch.js" type="text/javascript"></script>
<h1 class="topictitle1">Partition errors</h1>
<div><p>Certain cluster conditions are easily corrected. This topic explains
how to recover from a cluster partition and how to avoid one, and gives an
example of how to merge partitions back together.</p>
<p>A cluster partition occurs in a cluster whenever contact is lost between
one or more nodes in the cluster and a failure of the lost nodes cannot be
confirmed. This is not to be confused with a partition in a logical partition
(LPAR) environment.</p>
<p>If you receive error message CPFBB20 in either the history log (QHST) or
the QCSTCTL job log, a cluster partition has occurred and you need to know
how to recover. The following example shows a cluster partition in a cluster
made up of four nodes: A, B, C, and D. In the example, communication is lost
between cluster nodes B and C, which divides the cluster into two cluster
partitions. Before the cluster partition occurred, there were four cluster
resource groups, which can be of any type, called CRG A, CRG B, CRG C, and
CRG D. The table shows the recovery domain of each cluster resource group.</p>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" frame="border" border="1" rules="all"><caption>Table 1. Example of a recovery domain during a cluster partition</caption><thead align="left"><tr valign="bottom"><th align="center" valign="bottom" width="24%" id="d0e36">Node A</th>
<th align="center" valign="bottom" width="24%" id="d0e38">Node B</th>
<th align="center" valign="bottom" width="4%" id="d0e40">x</th>
<th align="center" valign="bottom" width="24%" id="d0e42">Node C</th>
<th align="center" valign="bottom" width="24%" id="d0e44">Node D</th>
</tr>
</thead>
<tbody><tr><td align="center" valign="top" width="24%" headers="d0e36 ">CRG A (backup1)</td>
<td align="center" valign="top" width="24%" headers="d0e38 ">CRG A (primary)</td>
<td rowspan="4" valign="top" width="4%" headers="d0e40 ">&nbsp;</td>
<td align="center" valign="top" width="24%" headers="d0e42 ">&nbsp;</td>
<td align="center" valign="top" width="24%" headers="d0e44 ">&nbsp;</td>
</tr>
<tr><td align="center" valign="top" width="24%" headers="d0e36 ">&nbsp;</td>
<td align="center" valign="top" width="24%" headers="d0e38 ">CRG B (primary)</td>
<td align="center" valign="top" width="24%" headers="d0e42 ">CRG B (backup1)</td>
<td align="center" valign="top" width="24%" headers="d0e44 ">&nbsp;</td>
</tr>
<tr><td align="center" valign="top" width="24%" headers="d0e36 ">&nbsp;</td>
<td align="center" valign="top" width="24%" headers="d0e38 ">CRG C (primary)</td>
<td align="center" valign="top" width="24%" headers="d0e42 ">CRG C (backup1)</td>
<td align="center" valign="top" width="24%" headers="d0e44 ">CRG C (backup2)</td>
</tr>
<tr><td align="center" valign="top" width="24%" headers="d0e36 ">CRG D (backup2)</td>
<td align="center" valign="top" width="24%" headers="d0e38 ">CRG D (primary)</td>
<td align="center" valign="top" width="24%" headers="d0e42 ">CRG D (backup1)</td>
<td align="center" valign="top" width="24%" headers="d0e44 ">&nbsp;</td>
</tr>
<tr><td colspan="2" align="center" valign="top" headers="d0e36 d0e38 "><strong>Partition 1</strong></td>
<td align="center" valign="top" width="4%" headers="d0e40 ">&nbsp;</td>
<td colspan="2" align="center" valign="top" headers="d0e42 d0e44 "><strong>Partition 2</strong></td>
</tr>
</tbody>
</table>
</div>
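<p>The layout in Table 1 can also be written down as data. The following is a
minimal Python sketch, for illustration only and not part of cluster resource
services, that records each recovery domain from the table and reports which
recovery domain nodes of a cluster resource group ended up in each partition.
The names RECOVERY_DOMAINS, PARTITIONS, and split_by_partition are hypothetical.</p>

```python
# Hypothetical model of Table 1; these names are illustrative, not an IBM API.
RECOVERY_DOMAINS = {
    "CRG A": {"B": "primary", "A": "backup1"},
    "CRG B": {"B": "primary", "C": "backup1"},
    "CRG C": {"B": "primary", "C": "backup1", "D": "backup2"},
    "CRG D": {"B": "primary", "C": "backup1", "A": "backup2"},
}

# Communication between nodes B and C is lost, so the cluster splits in two.
PARTITIONS = {
    "Partition 1": {"A", "B"},
    "Partition 2": {"C", "D"},
}


def split_by_partition(recovery_domain, partitions):
    """Return {partition name: {node: role}} for one recovery domain."""
    return {
        name: {
            node: role
            for node, role in recovery_domain.items()
            if node in members
        }
        for name, members in partitions.items()
    }


if __name__ == "__main__":
    for crg, domain in RECOVERY_DOMAINS.items():
        print(crg, split_by_partition(domain, PARTITIONS))
```

<p>In this sketch, the recovery domain of CRG A lies entirely within Partition 1,
while the recovery domains of CRG B, CRG C, and CRG D span both partitions.</p>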
<p>A cluster may partition if the maximum transmission unit (MTU) at any point
in the communication path is less than the message fragment size, which is a
cluster communications tunable parameter. You can verify the MTU for a cluster
IP address by using the <span class="cmdname">Work with TCP/IP Network Status (WRKTCPSTS)</span> command
on the subject node. The MTU must also be verified at each step along the
entire communication path. If the MTU is less than the message fragment size,
either raise the MTU of the path or lower the message fragment size. You can
use the <span class="apiname">Retrieve Cluster Resource Services Information (QcstRetrieveCRSInfo)</span> API
to view the current settings of the tuning parameters and the <span class="apiname">Change
Cluster Resource Services (QcstChgClusterResourceServices)</span> API to
change the settings.</p>
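<p>The comparison described above can be sketched in a few lines of Python. This
is only an illustration: it assumes that you have already collected the MTU of
each step in the communication path by hand and that you know the current
message fragment size. The function name check_fragment_size and the sample
values are hypothetical and are not defaults of any system.</p>

```python
def check_fragment_size(path_mtus, message_fragment_size):
    """Compare the smallest MTU on the path with the message fragment size.

    path_mtus: MTU values, in bytes, collected manually for every step
               of the communication path (hypothetical input).
    Returns a tuple (ok, smallest_mtu). ok is False when some step has an
    MTU smaller than the message fragment size.
    """
    if not path_mtus:
        raise ValueError("at least one MTU value is required")
    smallest = min(path_mtus)
    return smallest >= message_fragment_size, smallest


if __name__ == "__main__":
    # Made-up sample values, used only to show the comparison.
    ok, smallest = check_fragment_size([1500, 1400, 1500], 1464)
    if ok:
        print("Every step can carry a full message fragment.")
    else:
        print("Smallest MTU", smallest, "is below the message fragment size;")
        print("raise the path MTU or lower the message fragment size.")
```

<p>The check uses the smallest MTU because a single undersized step anywhere in
the path is enough to cause the condition.</p>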
<p>Once the cause of the cluster partition condition has been corrected, the
cluster detects the re-established communication link and issues message
CPFBB21 in either the history log (QHST) or the QCSTCTL job log. This message
informs the operator that the cluster has recovered from the cluster partition.
Be aware that after the cluster partition condition has been corrected, it may
be a few minutes before the cluster merges back together.</p>
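<p>If message IDs are collected from the logs in the order in which they were
issued, the two messages named in this topic can be used to track the partition
state. The following Python sketch assumes such an ordered list already exists.
How the IDs are extracted from QHST or the job log is outside its scope, and
the function name partition_state is hypothetical.</p>

```python
PARTITION_DETECTED = "CPFBB20"   # a cluster partition has occurred
PARTITION_RECOVERED = "CPFBB21"  # the cluster has recovered from the partition


def partition_state(message_ids):
    """Return the state implied by the most recent relevant message ID.

    message_ids: message IDs in the order they were issued (assumed input).
    Returns "partitioned", "recovered", or "unknown" when neither message
    appears. Unrelated message IDs are ignored.
    """
    state = "unknown"
    for msg_id in message_ids:
        if msg_id == PARTITION_DETECTED:
            state = "partitioned"
        elif msg_id == PARTITION_RECOVERED:
            state = "recovered"
    return state


if __name__ == "__main__":
    print(partition_state(["CPFBB20", "CPFBB21"]))
```

<p>A result of "recovered" only reflects that CPFBB21 was issued. As noted above,
the merge itself may take a few more minutes to complete.</p>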
</div>
<div>
<ul class="ullinks">
<li class="ulchildlink"><strong><a href="rzaigtroubleshootdeterminepartitions.htm">Determine primary and secondary cluster partitions</a></strong><br />
<span><img src="./delta.gif" alt="Start of change" />In order to determine the types of cluster resource
group actions that you can take within a cluster partition, you need to know
whether the partition is a primary or a secondary cluster partition. When
a partition is detected, each partition is designated as a primary or secondary
partition for each cluster resource group defined in the cluster.<img src="./deltaend.gif" alt="End of change" /></span></li>
<li class="ulchildlink"><strong><a href="rzaigtroubleshootchangepartitionednodes.htm">Change partitioned nodes to failed</a></strong><br />
Sometimes, a partitioned condition is reported when there really was a node outage. This can occur when cluster resource services loses communications with one or more nodes, but cannot detect if the nodes are still operational. When this condition occurs, a simple mechanism exists for you to indicate that the node has failed.</li>
<li class="ulchildlink"><strong><a href="rzaigtroubleshoottipclusterpartitions.htm">Tips: Cluster partitions</a></strong><br />
Use these tips for cluster partitions.</li>
</ul>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="rzaigtroubleshoot.htm" title="Find error recovery solutions for problems that are specific to clusters.">Troubleshoot clusters</a></div>
</div>
<div class="relconcepts"><strong>Related concepts</strong><br />
<div><a href="rzaigconceptpartition.htm" title="A cluster partition is a subset of the active cluster nodes that results from a communications failure. Members of a partition maintain connectivity with each other.">Cluster partition</a></div>
<div><a href="rzaigplanavoidclusterpartition.htm" title="The typical network-related cluster partition can best be avoided by configuring redundant communications paths between all nodes in the cluster.">Avoid a cluster partition</a></div>
<div><a href="rzaigconceptsmerge.htm" title="A merge operation is similar to a rejoin operation except that it occurs when nodes that are partitioned begin communicating again.">Merge</a></div>
<div><a href="rzaigtroubleshootexamplefailover.htm" title="Usually, a failover results from a node failure, but there are other reasons that can also generate a failover.">Example: Failover</a></div>
</div>
</div>
</body>
</html>