<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html lang="en-us" xml:lang="en-us">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="security" content="public" />
<meta name="Robots" content="index,follow" />
<meta http-equiv="PICS-Label" content='(PICS-1.1 "http://www.icra.org/ratingsv02.html" l gen true r (cz 1 lz 1 nz 1 oz 1 vz 1) "http://www.rsac.org/ratingsv01.html" l gen true r (n 0 s 0 v 0 l 0) "http://www.classify.org/safesurf/" l gen true r (SS~~000 1))' />
<meta name="DC.Type" content="concept" />
<meta name="DC.Title" content="Partition errors" />
<meta name="abstract" content="Certain cluster conditions are easily corrected. If a cluster partition has occurred, you can learn how to recover. This topic also tells you how to avoid a cluster partition and gives you an example of how to merge partitions back together." />
<meta name="description" content="Certain cluster conditions are easily corrected. If a cluster partition has occurred, you can learn how to recover. This topic also tells you how to avoid a cluster partition and gives you an example of how to merge partitions back together." />
<meta name="DC.Relation" scheme="URI" content="rzaigtroubleshoot.htm" />
<meta name="DC.Relation" scheme="URI" content="rzaigtroubleshootdeterminepartitions.htm" />
<meta name="DC.Relation" scheme="URI" content="rzaigtroubleshootchangepartitionednodes.htm" />
<meta name="DC.Relation" scheme="URI" content="rzaigtroubleshoottipclusterpartitions.htm" />
<meta name="DC.Relation" scheme="URI" content="rzaigconceptpartition.htm" />
<meta name="DC.Relation" scheme="URI" content="rzaigplanavoidclusterpartition.htm" />
<meta name="DC.Relation" scheme="URI" content="rzaigconceptsmerge.htm" />
<meta name="DC.Relation" scheme="URI" content="rzaigtroubleshootexamplefailover.htm" />
<meta name="copyright" content="(C) Copyright IBM Corporation 1998, 2006" />
<meta name="DC.Rights.Owner" content="(C) Copyright IBM Corporation 1998, 2006" />
<meta name="DC.Format" content="XHTML" />
<meta name="DC.Identifier" content="rzaigtroubleshootpartitionerrors" />
<meta name="DC.Language" content="en-us" />
<!-- All rights reserved. Licensed Materials Property of IBM -->
<!-- US Government Users Restricted Rights -->
<!-- Use, duplication or disclosure restricted by -->
<!-- GSA ADP Schedule Contract with IBM Corp. -->
<link rel="stylesheet" type="text/css" href="./ibmdita.css" />
<link rel="stylesheet" type="text/css" href="./ic.css" />
<title>Partition errors</title>
</head>
<body id="rzaigtroubleshootpartitionerrors"><a name="rzaigtroubleshootpartitionerrors"><!-- --></a>
<!-- Java sync-link --><script language="Javascript" src="../rzahg/synch.js" type="text/javascript"></script>
<h1 class="topictitle1">Partition errors</h1>
<div><p>Certain cluster conditions are easily corrected. If a cluster partition
has occurred, this topic explains how to recover. It also describes how
to avoid a cluster partition and gives an example of how to merge partitions
back together.</p>
<p>A cluster partition occurs in a cluster whenever contact is lost between
one or more nodes in the cluster and a failure of the lost nodes cannot be
confirmed. This is not to be confused with a partition in a logical partition
(LPAR) environment.</p>
<p>If you receive error message CPFBB20 in either the history log (QHST) or
the QCSTCTL joblog, a cluster partition has occurred and you need to know
how to recover. The following example shows a cluster partition in
a cluster made up of four nodes: A, B, C, and D. In the example, communication
between cluster nodes B and C is lost, which divides the cluster
into two cluster partitions. Before the cluster partition
occurred, there were four cluster resource groups, which can be of any type,
called CRG A, CRG B, CRG C, and CRG D. The table shows the recovery domain
of each cluster resource group.</p>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" frame="border" border="1" rules="all"><caption>Table 1. Example of a recovery domain during a cluster partition</caption><thead align="left"><tr valign="bottom"><th align="center" valign="bottom" width="24%" id="d0e36">Node A</th>
<th align="center" valign="bottom" width="24%" id="d0e38">Node B</th>
<th align="center" valign="bottom" width="4%" id="d0e40">x</th>
<th align="center" valign="bottom" width="24%" id="d0e42">Node C</th>
<th align="center" valign="bottom" width="24%" id="d0e44">Node D</th>
</tr>
</thead>
<tbody><tr><td align="center" valign="top" width="24%" headers="d0e36 ">CRG A (backup1)</td>
<td align="center" valign="top" width="24%" headers="d0e38 ">CRG A (primary)</td>
<td rowspan="4" valign="top" width="4%" headers="d0e40 ">&nbsp;</td>
<td align="center" valign="top" width="24%" headers="d0e42 ">&nbsp;</td>
<td align="center" valign="top" width="24%" headers="d0e44 ">&nbsp;</td>
</tr>
<tr><td align="center" valign="top" width="24%" headers="d0e36 ">&nbsp;</td>
<td align="center" valign="top" width="24%" headers="d0e38 ">CRG B (primary)</td>
<td align="center" valign="top" width="24%" headers="d0e42 ">CRG B (backup1)</td>
<td align="center" valign="top" width="24%" headers="d0e44 ">&nbsp;</td>
</tr>
<tr><td align="center" valign="top" width="24%" headers="d0e36 ">&nbsp;</td>
<td align="center" valign="top" width="24%" headers="d0e38 ">CRG C (primary)</td>
<td align="center" valign="top" width="24%" headers="d0e42 ">CRG C (backup1)</td>
<td align="center" valign="top" width="24%" headers="d0e44 ">CRG C (backup2)</td>
</tr>
<tr><td align="center" valign="top" width="24%" headers="d0e36 ">CRG D (backup2)</td>
<td align="center" valign="top" width="24%" headers="d0e38 ">CRG D (primary)</td>
<td align="center" valign="top" width="24%" headers="d0e42 ">CRG D (backup1)</td>
<td align="center" valign="top" width="24%" headers="d0e44 ">&nbsp;</td>
</tr>
<tr><td colspan="2" align="center" valign="top" headers="d0e36 d0e38 "><strong>Partition 1</strong></td>
<td align="center" valign="top" width="4%" headers="d0e40 ">&nbsp;</td>
<td colspan="2" align="center" valign="top" headers="d0e42 d0e44 "><strong>Partition 2</strong></td>
</tr>
</tbody>
</table>
</div>
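<p>As a rough illustration, the recovery domains in Table 1 can be written as
data and split along the partition boundary. This is only a sketch: the names
<tt>RECOVERY_DOMAINS</tt>, <tt>PARTITIONS</tt>, <tt>split_recovery_domain</tt>,
and <tt>primary_partition</tt> are hypothetical and are not part of cluster
resource services. The rule used in <tt>primary_partition</tt>, that the
partition containing the node with the primary role is the primary partition
for that cluster resource group, is an assumption that this topic does not
state.</p>
<pre class="codeblock">
```python
# Recovery domains from Table 1: CRG name mapped to {node: role}.
RECOVERY_DOMAINS = {
    "CRG A": {"B": "primary", "A": "backup1"},
    "CRG B": {"B": "primary", "C": "backup1"},
    "CRG C": {"B": "primary", "C": "backup1", "D": "backup2"},
    "CRG D": {"B": "primary", "C": "backup1", "A": "backup2"},
}

# Partitions that result when nodes B and C lose contact.
PARTITIONS = {
    "Partition 1": {"A", "B"},
    "Partition 2": {"C", "D"},
}


def split_recovery_domain(domain, partitions):
    """Return, per partition, the part of one recovery domain located there."""
    return {
        name: {node: role for node, role in domain.items() if node in members}
        for name, members in partitions.items()
    }


def primary_partition(domain, partitions):
    """Name the partition that holds the primary node (assumed rule)."""
    for name, members in partitions.items():
        if any(role == "primary" and node in members
               for node, role in domain.items()):
            return name
    return None


for crg, domain in RECOVERY_DOMAINS.items():
    print(crg, split_recovery_domain(domain, PARTITIONS),
          primary_partition(domain, PARTITIONS))
```
</pre>
<p>In this example every primary node is node B, so under the assumed rule
Partition 1 would be the primary partition for all four cluster resource
groups.</p>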
<p>A cluster can partition if the maximum transmission unit (MTU) at any point
in the communication path is less than the message fragment size, a tunable
cluster communications parameter. You can verify the MTU for a cluster IP address
with the <span class="cmdname">Work with TCP/IP Network Status (WRKTCPSTS)</span> command
on the subject node. You must also verify the MTU at each step along the
entire communication path. If the MTU is less than the message fragment size,
either raise the MTU of the path or lower the message fragment size. You can
use the <span class="apiname">Retrieve Cluster Resource Services Information (QcstRetrieveCRSInfo)</span> API
to view the current settings of the tuning parameters and the <span class="apiname">Change
Cluster Resource Services (QcstChgClusterResourceServices)</span> API to
change the settings.</p>
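<p>The comparison described above can be sketched in a few lines of Python.
This is a minimal illustration, not an IBM tool: the function
<tt>check_fragment_size</tt> is hypothetical, it assumes that you have already
collected the MTU of every hop by hand (for example, with WRKTCPSTS on each
system), and the byte values are illustrative only. It applies only the rule
stated in this topic, that a partition is possible when any MTU on the path is
less than the message fragment size.</p>
<pre class="codeblock">
```python
def check_fragment_size(path_mtus, fragment_size):
    """Compare the message fragment size with the smallest MTU on the path.

    path_mtus: MTU in bytes of every hop between two cluster nodes
        (hypothetical input, gathered manually).
    fragment_size: current message fragment size in bytes.
    """
    if not path_mtus:
        raise ValueError("at least one MTU value is required")
    smallest_mtu = min(path_mtus)
    if fragment_size > smallest_mtu:
        return (
            f"Risk of partition: message fragment size {fragment_size} exceeds "
            f"the smallest path MTU {smallest_mtu}. Raise the path MTU or "
            f"lower the fragment size to {smallest_mtu} or less."
        )
    return (
        f"OK: message fragment size {fragment_size} fits the smallest "
        f"path MTU {smallest_mtu}."
    )


# Illustrative values only, not taken from a real system.
print(check_fragment_size([1500, 1492, 1500], 1464))
print(check_fragment_size([1500, 576, 1500], 1464))
```
</pre>
<p>Checking the minimum over the whole path, and not only the MTU of the local
interface, reflects the requirement that the MTU be verified at each step
along the communication path.</p>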
<p>After the cause of the cluster partition condition has been corrected, the
cluster detects the re-established communication link and issues message
CPFBB21 in either the history log (QHST) or the QCSTCTL joblog. This message informs
the operator that the cluster has recovered from the cluster partition. Be
aware that after the cluster partition condition has been corrected, it may
take a few minutes for the cluster to merge back together.</p>
</div>
<div>
<ul class="ullinks">
<li class="ulchildlink"><strong><a href="rzaigtroubleshootdeterminepartitions.htm">Determine primary and secondary cluster partitions</a></strong><br />
<span><img src="./delta.gif" alt="Start of change" />To determine the types of cluster resource
group actions that you can take within a cluster partition, you need to know
whether the partition is a primary or a secondary cluster partition. When
a partition is detected, each partition is designated as a primary or secondary
partition for each cluster resource group defined in the cluster.<img src="./deltaend.gif" alt="End of change" /></span></li>
<li class="ulchildlink"><strong><a href="rzaigtroubleshootchangepartitionednodes.htm">Change partitioned nodes to failed</a></strong><br />
Sometimes a partition condition is reported when a node outage actually occurred. This can happen when cluster resource services loses communication with one or more nodes but cannot detect whether the nodes are still operational. When this condition occurs, a simple mechanism exists for you to indicate that the node has failed.</li>
<li class="ulchildlink"><strong><a href="rzaigtroubleshoottipclusterpartitions.htm">Tips: Cluster partitions</a></strong><br />
Use these tips for cluster partitions.</li>
</ul>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="rzaigtroubleshoot.htm" title="Find error recovery solutions for problems that are specific to clusters.">Troubleshoot clusters</a></div>
</div>
<div class="relconcepts"><strong>Related concepts</strong><br />
<div><a href="rzaigconceptpartition.htm" title="A cluster partition is a subset of the active cluster nodes that results from a communications failure. Members of a partition maintain connectivity with each other.">Cluster partition</a></div>
<div><a href="rzaigplanavoidclusterpartition.htm" title="The typical network-related cluster partition can best be avoided by configuring redundant communications paths between all nodes in the cluster.">Avoid a cluster partition</a></div>
<div><a href="rzaigconceptsmerge.htm" title="A merge operation is similar to a rejoin operation except that it occurs when nodes that are partitioned begin communicating again.">Merge</a></div>
<div><a href="rzaigtroubleshootexamplefailover.htm" title="Usually, a failover results from a node failure, but there are other reasons that can also generate a failover.">Example: Failover</a></div>
</div>
</div>
</body>
</html>