Experiencing RAC Best Practices with the Cluster Verification Utility

The Cluster Verification Utility (CVU) is Oracle's recommended cluster verification tool. It helps users validate the key components of a cluster at every phase of cluster deployment, including hardware setup, Clusterware installation, RDBMS installation, storage, and so on. We can run CVU before the cluster installation to confirm that the configured environment is correct and usable, and we can also run it after the software installation as an acceptance check of the cluster.

CVU provides an extensible framework: the routine checks it performs are platform-independent, and it exposes vendor interfaces (Vendor Interface) for storage and network verification.
CVU does not depend on any other Oracle software; it is driven entirely by the single cluvfy command, e.g. cluvfy stage -pre crsinst -n vrh1,vrh2.

Deploying cluvfy is very simple: install it on the local node, and at run time the tool automatically deploys itself to the remote hosts. The automatic deployment flow is as follows (a hedged usage sketch follows the list):

  1. The user installs CVU on the local node
  2. The user runs a verification command against multiple nodes
  3. CVU copies the files it needs to the remote nodes
  4. CVU runs the checks on all nodes and generates a report
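
A minimal sketch of such a first run, assuming the standalone CVU pack has been unzipped to /home/grid/cvu (a hypothetical staging directory) on the local node, and that passwordless SSH user equivalence is already configured for the grid user:

cd /home/grid/cvu
# bin/cluvfy is the entry point shipped inside the standalone CVU pack;
# vrh1,vrh2 are the cluster nodes used throughout this article.
./bin/cluvfy stage -post hwos -n vrh1,vrh2 -verbose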

CVU offers us the following:

  1. Verifies that the cluster is properly configured so that the subsequent RAC installation, configuration, and operation proceed smoothly
  2. Verification of all component types
  3. Non-destructive verification
  4. An easy-to-use interface
  5. Support for RAC on a wide range of platforms and configurations, with well-defined, uniform behavior

Be careful not to misunderstand cluvfy's role: it is only a checker and does not perform any actual configuration or repair work:

  1. cluvfy does not perform cluster or RAC operations of any kind
  2. When a check finds a problem or fails, cluvfy takes no corrective action
  3. cluvfy is not a performance tuning or monitoring tool
  4. cluvfy does not attempt to verify the internal structures of a RAC database

The actual deployment of RAC can be logically divided into several operational phases known as "stages"; during an actual deployment each stage consists of a series of operations. Every stage has its own pre-check and post-check, as shown in the figure:

[Figure: cluvfy stage list]

We will use the various "stages" of the Cluster Verification Utility during the CRS and RAC database installation; the available stages can be listed with the cluvfy stage -list command:

cluvfy stage -list

USAGE:
cluvfy stage {-pre|-post} <stage-name> <stage-specific options>  [-verbose]

Valid Stages are:
      -pre cfs        : pre-check for CFS setup
      -pre crsinst    : pre-check for CRS installation
      -pre acfscfg    : pre-check for ACFS Configuration.
      -pre dbinst     : pre-check for database installation
      -pre dbcfg      : pre-check for database configuration
      -pre hacfg      : pre-check for HA configuration
      -pre nodeadd    : pre-check for node addition.
      -post hwos      : post-check for hardware and operating system
      -post cfs       : post-check for CFS setup
      -post crsinst   : post-check for CRS installation
      -post acfscfg   : post-check for ACFS Configuration.
      -post hacfg     : post-check for HA configuration
      -post nodeadd   : post-check for node addition.
      -post nodedel   : post-check for node deletion.
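
For example, the crsinst pre-check can be run verbosely before installing Clusterware; on 11.2, adding the -fixup option additionally generates a runfixup.sh script for root that corrects certain fixable OS-level failures (a hedged sketch, output omitted):

cluvfy stage -pre crsinst -n vrh1,vrh2 -fixup -verbose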

In a RAC cluster, an independent subsystem or module is called a component. The availability, integrity, stability, and other behavior of cluster components can all be verified with CVU. Something as simple as a single storage device, or as complex as the CRS stack with its multiple subcomponents such as CRSD, EVMD, CSSD, and OCR, can each be treated as a component.

To check a specific component while CRS is running, or to diagnose a cluster subsystem in isolation, the appropriate component check command is needed; the available component checks can be listed with the cluvfy comp -list command:

 cluvfy comp -list

USAGE:
cluvfy comp <component-name> <component-specific options>  [-verbose]

Valid Components are:
      nodereach       : checks reachability between nodes
      nodecon         : checks node connectivity
      cfs             : checks CFS integrity
      ssa             : checks shared storage accessibility
      space           : checks space availability
      sys             : checks minimum system requirements
      clu             : checks cluster integrity
      clumgr          : checks cluster manager integrity
      ocr             : checks OCR integrity
      olr             : checks OLR integrity
      ha              : checks HA integrity
      crs             : checks CRS integrity
      nodeapp         : checks node applications existence
      admprv          : checks administrative privileges
      peer            : compares properties with peers
      software        : checks software distribution
      acfs            : checks ACFS integrity
      asm             : checks ASM integrity
      gpnp            : checks GPnP integrity
      gns             : checks GNS integrity
      scan            : checks SCAN configuration
      ohasd           : checks OHASD integrity
      clocksync       : checks Clock Synchronization
      vdisk           : checks Voting Disk configuration and UDEV settings
      dhcp            : Checks DHCP configuration
      dns             : Checks DNS configuration
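
For instance, node connectivity and clock synchronization can each be checked in isolation (hedged examples using the nodes of this environment, output omitted):

cluvfy comp nodecon -n vrh1,vrh2 -verbose
cluvfy comp clocksync -n all -verbose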

The latest version of cluvfy can be downloaded from the CVU page on OTN; unless there is a special requirement, we always recommend using the latest version. After a RAC installation completes, cluvfy can also be found in the following two locations:

Clusterware Home
<crs_home>/bin/cluvfy

Oracle Home
$ORACLE_HOME/bin/cluvfy
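
Because several copies can coexist, it is worth confirming which binary is first on the PATH and which version it is (a hedged check; the -version flag is assumed to be available in your release):

which cluvfy
cluvfy -version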

The most frequently used cluvfy commands are as follows:

Verify the hardware and operating system: verifies the hardware and operating system configuration

cluvfy stage -post hwos -n vrh1,vrh2

Performing post-checks for hardware and operating system setup 

Checking node reachability...
Node reachability check passed from node "vrh1"

Checking user equivalence...
User equivalence check passed for user "grid"

Checking node connectivity...

Checking hosts config file...

Verification of the hosts config file successful

Node connectivity passed for subnet "192.168.1.0" with node(s) vrh2,vrh1
TCP connectivity check passed for subnet "192.168.1.0"

Node connectivity passed for subnet "192.168.2.0" with node(s) vrh2,vrh1
TCP connectivity check passed for subnet "192.168.2.0"

Node connectivity passed for subnet "169.254.0.0" with node(s) vrh2,vrh1
TCP connectivity check passed for subnet "169.254.0.0"

Interfaces found on subnet "192.168.1.0" that are likely candidates for VIP are:
vrh2 eth0:192.168.1.163 eth0:192.168.1.164 eth0:192.168.1.166
vrh1 eth0:192.168.1.161 eth0:192.168.1.190 eth0:192.168.1.162

Interfaces found on subnet "169.254.0.0" that are likely candidates for VIP are:
vrh2 eth1:169.254.8.92
vrh1 eth1:169.254.175.195

Interfaces found on subnet "192.168.2.0" that are likely candidates for a private interconnect are:
vrh2 eth1:192.168.2.19
vrh1 eth1:192.168.2.18

Node connectivity check passed

Check for multiple users with UID value 0 passed
Time zone consistency check passed

Checking shared storage accessibility...

  Disk                                  Sharing Nodes (2 in count)
  ------------------------------------  ------------------------
  /dev/sdb                              vrh2 vrh1               

  Disk                                  Sharing Nodes (2 in count)
  ------------------------------------  ------------------------
  /dev/sdc                              vrh2 vrh1               

  Disk                                  Sharing Nodes (2 in count)
  ------------------------------------  ------------------------
  /dev/sdd                              vrh2 vrh1               

  Disk                                  Sharing Nodes (2 in count)
  ------------------------------------  ------------------------
  /dev/sde                              vrh2 vrh1               

  Disk                                  Sharing Nodes (2 in count)
  ------------------------------------  ------------------------
  /dev/sdf                              vrh2 vrh1               

  Disk                                  Sharing Nodes (2 in count)
  ------------------------------------  ------------------------
  /dev/sdg                              vrh2 vrh1               

  Disk                                  Sharing Nodes (2 in count)
  ------------------------------------  ------------------------
  /dev/sdh                              vrh2 vrh1               

  Disk                                  Sharing Nodes (2 in count)
  ------------------------------------  ------------------------
  /dev/sdi                              vrh2 vrh1               

  Disk                                  Sharing Nodes (2 in count)
  ------------------------------------  ------------------------
  /dev/sdj                              vrh2 vrh1               

  Disk                                  Sharing Nodes (2 in count)
  ------------------------------------  ------------------------
  /dev/sdk                              vrh2 vrh1               

  Disk                                  Sharing Nodes (2 in count)
  ------------------------------------  ------------------------
  /dev/sdl                              vrh2 vrh1               

  Disk                                  Sharing Nodes (2 in count)
  ------------------------------------  ------------------------
  /dev/sdm                              vrh2 vrh1               

  Disk                                  Sharing Nodes (2 in count)
  ------------------------------------  ------------------------
  /dev/sdn                              vrh2 vrh1               

  Disk                                  Sharing Nodes (2 in count)
  ------------------------------------  ------------------------
  /dev/sdo                              vrh2 vrh1               

  Disk                                  Sharing Nodes (2 in count)
  ------------------------------------  ------------------------
  /dev/sdp                              vrh2 vrh1               

  Disk                                  Sharing Nodes (2 in count)
  ------------------------------------  ------------------------
  /dev/sdq                              vrh2 vrh1               

  Disk                                  Sharing Nodes (2 in count)
  ------------------------------------  ------------------------
  /dev/sdr                              vrh2 vrh1               

  Disk                                  Sharing Nodes (2 in count)
  ------------------------------------  ------------------------
  /dev/sds                              vrh2 vrh1               

  Disk                                  Sharing Nodes (2 in count)
  ------------------------------------  ------------------------
  /dev/sdt                              vrh2 vrh1               
Shared storage check was successful on nodes "vrh2,vrh1"
Post-check for hardware and operating system setup was successful.

Cluster Installation Ready check on all nodes: run the following command before installing Clusterware

 cluvfy stage -pre crsinst -n vrh1,vrh2

Performing pre-checks for cluster services setup
Checking node reachability...
Node reachability check passed from node "vrh1"
Checking user equivalence...
User equivalence check passed for user "grid"
Checking node connectivity...
Checking hosts config file...
Verification of the hosts config file successful
Node connectivity passed for subnet "192.168.1.0" with node(s) vrh2,vrh1
TCP connectivity check passed for subnet "192.168.1.0"

Node connectivity passed for subnet "192.168.2.0" with node(s) vrh2,vrh1
TCP connectivity check passed for subnet "192.168.2.0"

Node connectivity passed for subnet "169.254.0.0" with node(s) vrh2,vrh1
TCP connectivity check passed for subnet "169.254.0.0"

Interfaces found on subnet "192.168.1.0" that are likely candidates for VIP are:
vrh2 eth0:192.168.1.163 eth0:192.168.1.164 eth0:192.168.1.166
vrh1 eth0:192.168.1.161 eth0:192.168.1.190 eth0:192.168.1.162

Interfaces found on subnet "169.254.0.0" that are likely candidates for VIP are:
vrh2 eth1:169.254.8.92
vrh1 eth1:169.254.175.195

Interfaces found on subnet "192.168.2.0" that are likely candidates for a private interconnect are:
vrh2 eth1:192.168.2.19
vrh1 eth1:192.168.2.18

Node connectivity check passed

Checking ASMLib configuration.
Check for ASMLib configuration passed.
Total memory check passed
Available memory check passed
Swap space check passed
Free disk space check passed for "vrh2:/tmp"
Free disk space check passed for "vrh1:/tmp"
Check for multiple users with UID value 54322 passed
User existence check passed for "grid"
Group existence check passed for "oinstall"
Group existence check passed for "dba"
Membership check for user "grid" in group "oinstall" [as Primary] failed
Check failed on nodes:
        vrh2,vrh1
Membership check for user "grid" in group "dba" failed
Check failed on nodes:
        vrh2,vrh1
Run level check passed
Hard limits check passed for "maximum open file descriptors"
Soft limits check passed for "maximum open file descriptors"
Hard limits check passed for "maximum user processes"
Soft limits check passed for "maximum user processes"
System architecture check passed
Kernel version check passed
Kernel parameter check passed for "semmsl"
Kernel parameter check passed for "semmns"
Kernel parameter check passed for "semopm"
Kernel parameter check passed for "semmni"
Kernel parameter check passed for "shmmax"
Kernel parameter check passed for "shmmni"
Kernel parameter check passed for "shmall"
Kernel parameter check passed for "file-max"
Kernel parameter check passed for "ip_local_port_range"
Kernel parameter check passed for "rmem_default"
Kernel parameter check passed for "rmem_max"
Kernel parameter check passed for "wmem_default"
Kernel parameter check passed for "wmem_max"
Kernel parameter check passed for "aio-max-nr"
Package existence check passed for "make-3.81( x86_64)"
Package existence check passed for "binutils-2.17.50.0.6( x86_64)"
Package existence check passed for "gcc-4.1.2 (x86_64)( x86_64)"
Package existence check passed for "libaio-0.3.106 (x86_64)( x86_64)"
Package existence check passed for "glibc-2.5-24 (x86_64)( x86_64)"
Package existence check passed for "compat-libstdc++-33-3.2.3 (x86_64)( x86_64)"
Package existence check passed for "elfutils-libelf-0.125 (x86_64)( x86_64)"
Package existence check passed for "elfutils-libelf-devel-0.125( x86_64)"
Package existence check passed for "glibc-common-2.5( x86_64)"
Package existence check passed for "glibc-devel-2.5 (x86_64)( x86_64)"
Package existence check passed for "glibc-headers-2.5( x86_64)"
Package existence check passed for "gcc-c++-4.1.2 (x86_64)( x86_64)"
Package existence check passed for "libaio-devel-0.3.106 (x86_64)( x86_64)"
Package existence check passed for "libgcc-4.1.2 (x86_64)( x86_64)"
Package existence check passed for "libstdc++-4.1.2 (x86_64)( x86_64)"
Package existence check passed for "libstdc++-devel-4.1.2 (x86_64)( x86_64)"
Package existence check passed for "sysstat-7.0.2( x86_64)"
Package existence check passed for "ksh-20060214( x86_64)"
Check for multiple users with UID value 0 passed
Current group ID check passed

Starting Clock synchronization checks using Network Time Protocol(NTP)...

NTP Configuration file check started...
No NTP Daemons or Services were found to be running

Clock synchronization check using Network Time Protocol(NTP) passed

Core file name pattern consistency check passed.

User "grid" is not part of "root" group. Check passed
Default user file creation mask check passed
Checking consistency of file "/etc/resolv.conf" across nodes

File "/etc/resolv.conf" does not have both domain and search entries defined
domain entry in file "/etc/resolv.conf" is consistent across nodes
search entry in file "/etc/resolv.conf" is consistent across nodes
All nodes have one search entry defined in file "/etc/resolv.conf"
The DNS response time for an unreachable node is within acceptable limit on all nodes

File "/etc/resolv.conf" is consistent across nodes

Time zone consistency check passed

Starting check for Huge Pages Existence ...

Check for Huge Pages Existence passed

Starting check for Hardware Clock synchronization at shutdown ...

Check for Hardware Clock synchronization at shutdown passed

Pre-check for cluster services setup was unsuccessful on all the nodes.
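
The pre-check is reported as unsuccessful because of the two group membership failures above (grid does not have oinstall as its primary group and is not a member of dba). A minimal sketch of a fix, assuming the conventional role groups already exist; run it as root on every node and then re-run the pre-check:

# Group names follow common OFA conventions and are assumptions for this sketch;
# usermod -g sets the primary group, -G replaces the supplementary group list.
usermod -g oinstall -G dba,asmadmin,asmdba grid
id grid    # verify the new membership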

Database Installation Ready check on all nodes: run the following command before installing the RDBMS
cluvfy stage -pre dbinst -n vrh1,vrh2

Performing pre-checks for database installation 

Checking node reachability...
Node reachability check passed from node "vrh1"

Checking user equivalence...
User equivalence check passed for user "grid"

Checking node connectivity...

Checking hosts config file...

Verification of the hosts config file successful

Check: Node connectivity for interface "eth0"
Node connectivity passed for interface "eth0"

Check: Node connectivity for interface "eth1"
Node connectivity passed for interface "eth1"

Node connectivity check passed

Total memory check passed
Available memory check passed
Swap space check passed
Free disk space check passed for "vrh2:/tmp"
Free disk space check passed for "vrh1:/tmp"
Check for multiple users with UID value 54322 passed
User existence check passed for "grid"
Group existence check passed for "asmadmin"
Group existence check passed for "dba"
Membership check for user "grid" in group "asmadmin" [as Primary] passed
Membership check for user "grid" in group "dba" failed
Check failed on nodes:
        vrh2,vrh1
Run level check passed
Hard limits check passed for "maximum open file descriptors"
Soft limits check passed for "maximum open file descriptors"
Hard limits check passed for "maximum user processes"
Soft limits check passed for "maximum user processes"
System architecture check passed
Kernel version check passed
Kernel parameter check passed for "semmsl"
Kernel parameter check passed for "semmns"
Kernel parameter check passed for "semopm"
Kernel parameter check passed for "semmni"
Kernel parameter check passed for "shmmax"
Kernel parameter check passed for "shmmni"
Kernel parameter check passed for "shmall"
Kernel parameter check passed for "file-max"
Kernel parameter check passed for "ip_local_port_range"
Kernel parameter check passed for "rmem_default"
Kernel parameter check passed for "rmem_max"
Kernel parameter check passed for "wmem_default"
Kernel parameter check passed for "wmem_max"
Kernel parameter check passed for "aio-max-nr"
Package existence check passed for "make-3.81( x86_64)"
Package existence check passed for "binutils-2.17.50.0.6( x86_64)"
Package existence check passed for "gcc-4.1.2 (x86_64)( x86_64)"
Package existence check passed for "libaio-0.3.106 (x86_64)( x86_64)"
Package existence check passed for "glibc-2.5-24 (x86_64)( x86_64)"
Package existence check passed for "compat-libstdc++-33-3.2.3 (x86_64)( x86_64)"
Package existence check passed for "elfutils-libelf-0.125 (x86_64)( x86_64)"
Package existence check passed for "elfutils-libelf-devel-0.125( x86_64)"
Package existence check passed for "glibc-common-2.5( x86_64)"
Package existence check passed for "glibc-devel-2.5 (x86_64)( x86_64)"
Package existence check passed for "glibc-headers-2.5( x86_64)"
Package existence check passed for "gcc-c++-4.1.2 (x86_64)( x86_64)"
Package existence check passed for "libaio-devel-0.3.106 (x86_64)( x86_64)"
Package existence check passed for "libgcc-4.1.2 (x86_64)( x86_64)"
Package existence check passed for "libstdc++-4.1.2 (x86_64)( x86_64)"
Package existence check passed for "libstdc++-devel-4.1.2 (x86_64)( x86_64)"
Package existence check passed for "sysstat-7.0.2( x86_64)"
Package existence check passed for "ksh-20060214( x86_64)"
Check for multiple users with UID value 0 passed
Current group ID check passed
Default user file creation mask check passed

Checking CRS integrity...

CRS integrity check passed

Checking Cluster manager integrity... 

Checking CSS daemon...
Oracle Cluster Synchronization Services appear to be online.

Cluster manager integrity check passed

Checking node application existence...

Checking existence of VIP node application (required)
VIP node application check passed

Checking existence of NETWORK node application (required)
NETWORK node application check passed

Checking existence of GSD node application (optional)
GSD node application is offline on nodes "vrh2,vrh1"

Checking existence of ONS node application (optional)
ONS node application check passed

Checking if Clusterware is installed on all nodes...
Check of Clusterware install passed

Checking if CTSS Resource is running on all nodes...
CTSS resource check passed

Querying CTSS for time offset on all nodes...
Query of CTSS for time offset passed

Check CTSS state started...
CTSS is in Active state. Proceeding with check of clock time offsets on all nodes...
Check of clock time offsets passed

Oracle Cluster Time Synchronization Services check passed
Checking consistency of file "/etc/resolv.conf" across nodes

File "/etc/resolv.conf" does not have both domain and search entries defined
domain entry in file "/etc/resolv.conf" is consistent across nodes
search entry in file "/etc/resolv.conf" is consistent across nodes
All nodes have one search entry defined in file "/etc/resolv.conf"
The DNS response time for an unreachable node is within acceptable limit on all nodes

File "/etc/resolv.conf" is consistent across nodes

Time zone consistency check passed
Checking VIP configuration.
Checking VIP Subnet configuration.
Check for VIP Subnet configuration passed.
Checking VIP reachability
Check for VIP reachability passed.

Pre-check for database installation was unsuccessful on all the nodes.

Use cluvfy to verify the OCR component:
cluvfy comp ocr

Verifying OCR integrity

Checking OCR integrity...

Checking the absence of a non-clustered configuration...
All nodes free of non-clustered, local-only configurations

ASM Running check passed. ASM is running on all specified nodes

Checking OCR config file "/etc/oracle/ocr.loc"...

OCR config file "/etc/oracle/ocr.loc" check successful

Disk group for ocr location "+SYSTEMDG" available on all the nodes

Disk group for ocr location "+FRA" available on all the nodes

Disk group for ocr location "+DATA" available on all the nodes

NOTE:
This check does not verify the integrity of the OCR contents.
Execute 'ocrcheck' as a privileged user to verify the contents of OCR.

OCR integrity check passed

Verification of OCR integrity was successful.
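
As the NOTE in the output says, cluvfy only verifies that OCR is accessible and consistently configured, not the logical integrity of its contents; for that, run ocrcheck as a privileged user (a brief sketch, output omitted):

# Run as root (or another privileged user) on one node.
ocrcheck                  # verifies the logical integrity of the OCR contents
ocrconfig -showbackup     # optionally, list the automatic OCR backups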

A case of INS-35354 during an 11gR2 RAC installation

Today, while installing an 11.2.0.2 RAC database, I ran into the INS-35354 error:
[Figure: 11gR2 INS-35354 error dialog]

Since the 11.2.0.2 GI had already been installed successfully and the cluster was in a perfectly normal state, this error came as somewhat of a surprise:

[grid@vrh1 ~]$ crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online

A search on MOS suggested that the problem may be caused by an incorrectly updated inventory.xml in oraInventory:

Applies to:
Oracle Server - Enterprise Edition - Version: 11.2.0.1 to 11.2.0.2 - Release: 11.2 to 11.2
Information in this document applies to any platform.
Symptoms

Installing 11gR2 database software in a Grid Infrastructure environment fails with the error INS-35354:

The system on which you are attempting to install Oracle RAC is not part of a valid cluster.

Grid Infrastructure (Oracle Clusterware) is running on all nodes in the cluster which can be verified with:

crsctl check crs

Changes
This is a new install.
Cause
As per 11gR2 documentation the error description is:

INS-35354: The system on which you are attempting to install Oracle RAC is not part of a valid cluster.

Cause: Prior to installing Oracle RAC, you must create a valid cluster. 
This is done by deploying Grid Infrastructure software, 
which will allow configuration of Oracle Clusterware and Automatic Storage Management.

However, the problem at hand may be that the central inventory is missing the "CRS=true" flag 
(for the Grid Infrastructure Home).
<inventory.xml>
-------------

<HOME_LIST>
<HOME NAME="Ora11g_gridinfrahome1" LOC="/u01/grid" TYPE="O" IDX="1">
<NODE_LIST>
<NODE NAME="node1"/>
<NODE NAME="node2"/>
</NODE_LIST>

 -------------

From the inventory.xml, we see that the HOME NAME line is missing the CRS="true" flag.

The error INS-35354 will occur when the central inventory entry for the Grid Infrastructure 
home is missing the flag that identifies it as CRS-type home.
Solution
Use the -updateNodeList option for the installer command to fix the inventory.

The full syntax is:

./runInstaller -updateNodeList "CLUSTER_NODES={node1,node2}"
ORACLE_HOME="" ORACLE_HOME_NAME="" LOCAL_NODE="Node_Name" CRS=[true|false]

Execute the command on any node in the cluster.

Examples:

For a two-node RAC cluster on UNIX:

Node1:
cd /u01/grid/oui/bin
./runInstaller -updateNodeList "CLUSTER_NODES={node1,node2}" ORACLE_HOME="/u01/crs" 
ORACLE_HOME_NAME="GI_11201" LOCAL_NODE="node1" CRS=true

For a 2-node RAC cluster on Windows:

Node 1:
cd e:\app\11.2.0\grid\oui\bin
e:\app\11.2.0\grid\oui\bin\setup -updateNodeList "CLUSTER_NODES={RACNODE1,RACNODE2}" 
ORACLE_HOME="e:\app\11.2.0\grid" ORACLE_HOME_NAME="OraCrs11g_home1" LOCAL_NODE="RACNODE1" CRS=true

The inventory.xml in my environment contained the following:

[grid@vrh1 ContentsXML]$ cat inventory.xml 
<?xml version="1.0" standalone="yes" ?>
<!-- Copyright (c) 1999, 2010, Oracle. All rights reserved. -->
<!-- Do not modify the contents of this file by hand. -->
<INVENTORY>
<VERSION_INFO>
   <SAVED_WITH>11.2.0.2.0</SAVED_WITH>
   <MINIMUM_VER>2.1.0.6.0</MINIMUM_VER>
</VERSION_INFO>
<HOME_LIST>
<HOME NAME="Ora11g_gridinfrahome1" LOC="/g01/11.2.0/grid" TYPE="O" IDX="1" >
   <NODE_LIST>
      <NODE NAME="vrh1"/>
      <NODE NAME="vrh2"/>
   </NODE_LIST>
</HOME>
</HOME_LIST>
</INVENTORY>

Clearly the <HOME NAME= line here is missing the CRS="true" flag, which leads the OUI installer's check to conclude that GI is not installed on this node.

The fix is actually very simple: just add CRS="true" and then restart runInstaller; there is no need for the complex runInstaller -updateNodeList command combination described in the note.
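
A minimal sketch of the manual edit, assuming the central inventory lives at /g01/oraInventory as in this environment (back the file up first; the documented runInstaller -updateNodeList route remains the supported one):

cd /g01/oraInventory/ContentsXML
cp inventory.xml inventory.xml.bak
# Append CRS="true" to the GI home entry; the pattern matches the
# IDX="1" > spacing shown in the listing above.
sed -i 's|IDX="1" >|IDX="1" CRS="true">|' inventory.xml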

[grid@vrh1 ContentsXML]$ cat /g01/oraInventory/ContentsXML/inventory.xml 
<?xml version="1.0" standalone="yes" ?>
<!-- Copyright (c) 1999, 2010, Oracle. All rights reserved. -->
<!-- Do not modify the contents of this file by hand. -->
<INVENTORY>
<VERSION_INFO>
   <SAVED_WITH>11.2.0.2.0</SAVED_WITH>
   <MINIMUM_VER>2.1.0.6.0</MINIMUM_VER>
</VERSION_INFO>
<HOME_LIST>
<HOME NAME="Ora11g_gridinfrahome1" LOC="/g01/11.2.0/grid" TYPE="O" IDX="1" CRS="true">
   <NODE_LIST>
      <NODE NAME="vrh1"/>
      <NODE NAME="vrh2"/>
   </NODE_LIST>
</HOME>
</HOME_LIST>
</INVENTORY>

After the change above, the problem was resolved and the installation screens proceeded normally:
[Figure: 11gR2 RAC database installation, step 4 of 10]

For RAC, VMware or VirtualBox? I choose VBox!

VMware Server has gone two years without an update, which makes installing it on many Linux distributions an uphill struggle. For instance, it is very hard to get VMware working on the Fedora 15 I am using now, so I thought of replacing VMware with VirtualBox, originally from Sun.

Searching the web turned up quite a few posts on virtualizing RAC with VBox. I spent a day putting a 4-node 11gR2 RAC system on VBox into practice, and here I list my personal take on the differences between VMware and VirtualBox.

1. VMware Server can be accessed through a browser plus a plugin, which counts as a thin-client style of use; in terms of ease of access VMware has a clear advantage. VirtualBox runs in the foreground by default; it can be started in the background in headless mode from the command line, but access is still too cumbersome.

2. VMware can be administered entirely from its management page, so users do not have to memorize extra commands, making administration more convenient. The management page, however, is driven by Tomcat, and if Tomcat or the page itself misbehaves you are left with no recourse. VirtualBox's graphical management is reasonably smooth, but not every operation is supported; to a large extent users must remember a subset of VBoxManage commands to get things done. There is some learning cost, but it deepens one's understanding of VBox, and VBoxManage is simple and reliable, which is a major reason I migrated to VBox.

3. VirtualBox has a clear performance edge, mainly lower memory consumption and fewer processes (chiefly the three processes VBoxXPCOMIPCD, VBoxSVC, and VBoxHeadless); for a discussion see the OTN thread "Oracle RAC on VMWare vs. Virtualbox".

4. On the volume of documentation, VMware holds an advantage, largely thanks to the heavy promotion by VMware as a virtualization giant, whereas VirtualBox is only one piece of Sun's x86 virtualization portfolio. Of course this does not stop VBox from being an excellent platform for testing RAC and other MAA solutions. For an 11gR2 RAC best practice on VBox, see <Oracle Database 11g Release 2 RAC On Linux Using VirtualBox>.

Some more information about VirtualBox can be found in John Heaton's <Oracle Enterprise Linux and Oracle VM (SUN) Virtual Box>:

Know about Oracle RAC Heartbeat

Oracle CRS heartbeats consist mainly of the disk heartbeat and the network heartbeat; timeouts of the different heartbeat types affect RAC in different ways.

Heartbeat Mechanisms

Disk Heartbeat ( Voting device ) – IOT
Network Heartbeat ( across the interconnect ) – misscount
Misscount – max time a heartbeat can be missed before entering into a cluster reconfig to evict a node
IOT – internal I/O timeout, in seconds where an I/O to the disk must complete
IOT = misscount [ during initial cluster formation time]
IOT = disktimeout [ all other times ] 

Hardware Storage timeout
Software Storage timeout
OCFS Heartbeat timeouts

disktimeout > max(O2CB_HEARTBEAT_THRESHOLD * 2, HW_STORAGE_TIMEOUT, SW_STORAGE_TIMEOUT)
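
The effective CSS values on a running cluster can be inspected through crsctl (a hedged example for 11g; changing them should only be done under Oracle Support's guidance):

crsctl get css misscount      # network heartbeat timeout, in seconds
crsctl get css disktimeout    # disk (voting file) I/O timeout, in seconds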

[Figure: Oracle CRS heartbeats]

A case of the Oracle RAC internal error ORA-00600 [kjbmprlst:shadow]

On a 4-node 11.2.0.1 RAC system on Linux x86-64, an LMS GCS service process ran into the internal error ORA-00600 [kjbmprlst:shadow], which terminated the node's instance unexpectedly. The log is as follows:

Fri Jul 08 02:04:43 2011
Errors in file /u01/app/oracle/diag/rdbms/PROD/PROD1/trace/PROD1_lms1_536.trc  (incident=1011732):
ORA-00600: internal error code, arguments: [kjbmprlst:shadow], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/PROD/PROD1/incident/incdir_1011732/PROD1_lms1_536_i1011732.trc
Fri Jul 08 02:04:44 2011
Trace dumping is performing id=[cdmp_20110708020444]
Errors in file /u01/app/oracle/diag/rdbms/PROD/PROD1/trace/PROD1_lms1_536.trc:
ORA-00600: internal error code, arguments: [kjbmprlst:shadow], [], [], [], [], [], [], [], [], [], [], []
Errors in file /u01/app/oracle/diag/rdbms/PROD/PROD1/trace/PROD1_lms1_536.trc:
ORA-00600: internal error code, arguments: [kjbmprlst:shadow], [], [], [], [], [], [], [], [], [], [], []
LMS1 (ospid: 536): terminating the instance due to error 484
Fri Jul 08 02:04:45 2011
opiodr aborting process unknown ospid (27387) as a result of ORA-1092
System state dump is made for local instance
System State dumped to trace file /u01/app/oracle/diag/rdbms/PROD/PROD1/trace/PROD1_diag_513.trc
Fri Jul 08 02:04:54 2011
Termination issued to instance processes. Waiting for the processes to exit
Fri Jul 08 02:04:58 2011
ORA-1092 : opitsk aborting process

This ORA-00600 [kjbmprlst:shadow] error maps to Bug 10121589 or Bug 9458781 on 11.2.0.1:

Bug 10121589  ORA-600 [kjbmprlst:shadow] can occur in RAC
Affects:

    Product (Component)	Oracle Server (Rdbms)
    Range of versions believed to be affected 	Versions BELOW 12.1
    Versions confirmed as being affected 	

        11.2.0.1 

    Platforms affected	Generic (all / most platforms affected)

Fixed:

    This issue is fixed in	

        12.1 (Future Release)
        11.2.0.2 Bundle Patch 2 for Exadata Database
        11.2.0.1 Bundle Patch 7 for Exadata Database 

Symptoms:

Related To:

    Internal Error May Occur (ORA-600)
    ORA-600 [kjbmprlst:shadow] 

    RAC (Real Application Clusters) / OPS 

Description

    An ORA-600 [kjbmprlst:shadow] can occur if the fix for bug 9979039
    is present.

    Note:
     One off patches for 10200390 should also include this fix.

Bug 9458781  Missing close message to master leaves closed lock dangling crashing the instance with assorted Internal error

Affects:

    Product (Component)	Oracle Server (Rdbms)
    Range of versions believed to be affected 	Versions >= 11.2.0.1 but BELOW 11.2.0.2
    Versions confirmed as being affected 	

        11.2.0.1 

    Platforms affected	Generic (all / most platforms affected)

Fixed:

    This issue is fixed in	

        11.2.0.2 (Server Patch Set)
        11.2.0.1 Bundle Patch 4 for Exadata Database 

Symptoms:

Related To:

    Instance May Crash
    Internal Error May Occur (ORA-600)
    ORA-600 [KJBMPRLST:SHADOW]
    ORA-600 [KJBMOCVT:RID]
    ORA-600 [KJBRREF:PKEY]
    ORA-600 [KJBRASR:PKEY] 

    RAC (Real Application Clusters) / OPS 

Description

    A lock is closed without sending a message to the master.
    This causes closed lock dangling at the master crashing the instance with different internal errors.

    Reported internal errors so far are :
    - KJBMPRLST:SHADOW
    - KJBMOCVT:RID
    - KJBRREF:PKEY
    - KJBRASR:PKEY

The internal function kjbmprlst manages kjbm shadow lock information (/libserver10.a/kjbm.o); the defect is that a closed lock may fail to be messaged to the master node in time. Apart from installing the patch there is currently no verified workaround (disabling DRM appears to be ineffective):

oradebug lkdebug (track resources, take dumps)
KCL history
KJBL history
KJL history

PCM (GCS) and non-PCM (GES) resources are kept separate and use separate code paths.
GES:
Resource table: kjr and kjrt
Lock table: kjlt
Processes: kjpt
GCS:
Resource table: kjbr
Lock table: kjbl

DLM Structures (continued)
/* PCM resource structure */
typedef struct kjbr {                                /* 68 bytes on sun4u */
  kjsolk       hash_q_kjbr;                             /* hash list : hp */
  ub4          resname_kjbr[2];	                     /* the resource name */
  kjsolk       scan_q_kjbr; /* chain to lmd scan q of grantable resources */
  kjsolk       grant_q_kjbr;                 /* list of granted resources */
  kjsolk       convert_q_kjbr;       /* list of resources being converted */
  ub4          diskscn_bas_kjbr;         /* scn(base) known to be on disk */
  ub2          diskscn_wrap_kjbr;        /* scn(wrap) known to be on disk */
  ub2          writereqscn_wrap_kjbr;    /* scn(wrap) requested for write */
  ub4          writereqscn_bas_kjbr;     /* scn(base) requested for write */
  struct kjbl *sender_kjbr;                 /* lock elected to send block */
  ub2          senderver_kjbr;                  /* version# of above lock */
  ub2          writerver_kjbr;                  /* version# of lock below */
  struct kjbl *writer_kjbr;                /* lock elected to write block */
  ub1          mode_role_kjbr; /* one of 'n', 's', 'x' && one of 'l' or 'g' */
  ub1          flags_kjbr;                        /* ignorewip, free etc. */
  ub1          rfpcount_kjbr;                      /* refuse ping counter */
  ub1          history_kjbr;                /* resource operation history */
  kxid         xid_kjbr;                          /* split transaction ID */
} kjbr ;

/* kjbl - PCM lock structure
** Clients and most of the DLM will use the KJUSER* or KJ_* modes and kscns  */

typedef struct kjbl {                                /* 52 bytes on sun4u */
  union {                     /* discriminate lock@master and lock@client */
    struct {                                           /* for lock@master */
      kgglk        state_q_kjbl;             /* link to chain to resource */
      kjbopqi     *rqinfo_kjbl;                             /* target bid */
      struct kjbr *resp_kjbl;                   /* pointer to my resource */
    } kjbllam;                                 /* KJB Lock Lock At Master */
    struct {                                           /* for lock@client */
      ub4         disk_base_kjbl;        /* disk version(base) for replay */
      ub2         disk_wrap_kjbl;        /* disk version(wrap) for replay */
      ub1         master_node_kjbl;                   /* master instance# */
      ub1         client_flag_kjbl;     /* flags specific to client locks */
      ub2         update_seq_kjbl;               /* last update to master */
    } kjbllac;                                 /* KJB Lock Lock At Client */
  } kjblmcd;                        /* KJB Lock Master Client Discrimnant */
  void  *remote_lockp_kjbl;           /* pointer to client lock or shadow */
  ub2    remote_ver_kjbl;                         /* remote lock version# */
  ub2        ver_kjbl;                                     /* my version# */
  ub2        msg_seq_kjbl;                         /* client->master seq# */
  ub2        reqid_kjbl;                         /* requestid for convert */
  ub2        creqid_kjbl; /* requestid for convert that has been cancelled */
  ub2        pi_wrap_kjbl;                     /* scn(wrap) of highest pi */
  ub4        pi_base_kjbl;                     /* scn(base) of highest pi */
  ub1        mode_role_kjbl; /* one of 'n', 's', 'x' && one of 'l' or 'g' */
  ub1        state_kjbl;       /* _L|_R|_W|_S, notify, which q, lock type */
  ub1        node_kjbl;                       /* instance lock belongs to */
  ub1        flags_kjbl;                                /* lock flag bits */
  ub2        rreqid_kjbl;                               /* save the reqid */
  ub2         write_wrap_kjbl;        /* last write request version(wrap) */
  ub4         write_base_kjbl;        /* last write request version(base) */
  ub4         history_kjbl;                     /* lock operation history */
} kjbl;

PCM DLM locks that are owned by the local instance are allocated and embedded in an LE structure.
PCM DLM locks that are owned by remote instances and mastered by the local instance are allocated in SHARED_POOL.

PCM Locks and Resources
Fields of interest in the kclle structure: kcllerls or releasing; kcllelnm or name(id1,id2);
kcllemode or held-mode; kclleacq or acquiring; kcllelck or DLM lock.

Fields of interest in the kjbr structure: resname_kjbr[2] or resource name; grant_q_kjbr or grant queue;
convert_q_kjbr or convert queue; mode_role_kjbr, which is a bitwise merge of grant mode and
role-interpreted NULL(0x00), S(0x01), X(0x02), L0 Local (0x00), G0 Global without PI (0x08), G1 Global with PI (0x018).

The field mode_role_kjbl in kjbl is a bitwise merge of grant, request, and lock mode: 0x00 if grant NULL;
0x01 if grant S; 0x02 if grant X; 0x04 lock has been opened at master; 0x08 if global role (otherwise local);
0x10 has one or more PI; 0x20 if request CR; 0x40 if request S; 0x80 if request X.

Someone has to keep a list of all buffers and where they are mastered
This is called Global Resource Directory (GRD)
GRD is present on all the instances of the cluster
To find out the master:
select  b.dbablk, r.kjblmaster master_node
from x$le l, x$kjbl r, x$bh b
where b.obj = :data_object_id   -- bind: data_object_id of the segment of interest
and b.le_addr = l.le_addr
and l.le_kjbl = r.kjbllockp
Oracle Support states that the bug can be fixed by the 11.2.0.2 Server Patch Set or by 11.2.0.1 Bundle Patch 4 for Exadata Database, but there are signs that this ORA-00600 [kjbmprlst:shadow] internal error may still occur on 11.2.0.2; the bug also tends to show up more often in RAC systems with more than two nodes.

The Oracle wait event KJC: Wait for msg sends to complete

KJC: Wait for msg sends to complete is a wait event that occurs only in RAC environments, and not much is currently known about it. KJ stands for the Kernel Lock Management Layer, and KJC for its Messaging Enhancements. My guess is that this event represents a wait on the network transmission of global enqueue messages; both foreground processes and background processes such as LMS and LMD can experience it.

select name,parameter1,parameter2,parameter3,wait_class from v$event_name where name like '%KJC%';

NAME                                                             PARAMETER1
---------------------------------------------------------------- ----------------------------------------------------------------
PARAMETER2                                                       PARAMETER3
---------------------------------------------------------------- ----------------------------------------------------------------
WAIT_CLASS
----------------------------------------------------------------
KJC: Wait for msg sends to complete                              msg
dest|rcvr                                                        mtype
Other

SQL> select * from gV$system_Event where event like '%KJC%';

   INST_ID EVENT                                                            TOTAL_WAITS
---------- ---------------------------------------------------------------- -----------
TOTAL_TIMEOUTS TIME_WAITED AVERAGE_WAIT TIME_WAITED_MICRO TOTAL_WAITS_FG TOTAL_TIMEOUTS_FG
-------------- ----------- ------------ ----------------- -------------- -----------------
TIME_WAITED_FG AVERAGE_WAIT_FG TIME_WAITED_MICRO_FG   EVENT_ID WAIT_CLASS_ID WAIT_CLASS#
-------------- --------------- -------------------- ---------- ------------- -----------
WAIT_CLASS
----------------------------------------------------------------
         2 KJC: Wait for msg sends to complete                                      415
             0           1            0              7638            163                 0
             0               0                 2494 4079369678    1893977003           0
Other

         1 KJC: Wait for msg sends to complete                                       93
             0           0            0              2094             37                 0
             0               0                  615 4079369678    1893977003           0
Other

Note that this wait event's class is Other rather than Concurrency; if it shows up, what we should be concerned about is network performance (netstat -s or ifconfig?).
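
A hedged sketch of those OS-level checks, run on each node (eth1 is the private interconnect interface in this environment):

# Look for retransmissions, reassembly failures and drops in the protocol statistics.
netstat -s | egrep -i 'retrans|reassembl|drop'
# Check the interconnect NIC for errors, drops and overruns.
ifconfig eth1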

Below is a concrete scenario in which the wait occurs:

PARSING IN CURSOR #140438030304224 len=169 dep=2 uid=0 oct=3 lid=0
tim=1310310630713104 hv=2411583655 ad='b5a49898' sqlid='637txtu7vvq57'
select service_name, service_name_hash,wait_class_id, wait_class#,  wait_class,total_waits, time_waited  from
gv$service_wait_class where inst_id = USERENV('Instance')

END OF STMT
PARSE #140438030304224:c=6999,e=7685,p=0,cr=5,cu=0,mis=1,r=0,dep=2,og=4,plh=809657567,tim=1310310630713101
CLOSE #140438030304224:c=0,e=20,dep=2,type=0,tim=1310310630713388
WAIT #140438029567384: nam='row cache lock' ela= 530 cache id=8 mode=0 request=3 obj#=6623 tim=1310310630716555
EXEC #140438032451672:c=0,e=61,p=0,cr=0,cu=0,mis=0,r=0,dep=2,og=4,plh=2853959010,tim=1310310630716708
FETCH #140438032451672:c=0,e=49,p=0,cr=4,cu=0,mis=0,r=1,dep=2,og=4,plh=2853959010,tim=1310310630716809
CLOSE #140438032451672:c=0,e=10,dep=2,type=3,tim=1310310630716877

WAIT #140438029567384: nam='KJC: Wait for msg sends to complete' ela= 18
msg=3235372280 dest|rcvr=65536 mtype=12 obj#=6623 tim=1310310630719146

EXEC #140438032420328:c=0,e=190,p=0,cr=1,cu=5,mis=0,r=1,dep=1,og=4,plh=0,tim=1310310630781989

WAIT #140438032420328: nam='KJC: Wait for msg sends to complete' ela= 55
msg=3235374760 dest|rcvr=65537 mtype=32 obj#=74418 tim=1310310630782830

The msg here is presumably an incrementing message id, mtype is the type of the message, and obj# is the object number.

Possible stack calls related to KJC: Wait for msg sends to complete:

LMS's stack:
ksedms ksdxfdmp ksdxcb sspuser sigacthandler libc_poll
kslwait ksxprcv kjctr_rksxp kjctsrw kjccwcrm kjccqmg
kjbmsclose kjbclose kclpdc kclpdcl kclpbi kjmsm
ksbrdp opirip opidrv sou2o main start 

 LMS4 was waiting for  "KJC: Wait for msg sends to complete"

  'KJC: Wait for msg sends to complete', at one of the stack trace
  below

  o kjccqmg => kjctcwcrm
  o kjccsmg => kjctcwcrm

o  an Oracle process is spinning in kjctcwcrm() on 9.0, 9.2 or 10.1

o  an Oracle process got ORA-600[kjccgmb:1] on 9.0, 9.2 or 10.1

o Got ORA-7445[] at an unknown function called from kjbmmsg2res()
o A LMS process hung in kjctcwcrm() on shutdown

LMS may hang waiting for 'kjc: wait for msg sends to complete'
during shutdown of an instance with another session
reporting ORA-600 [kjccgmb:1]

DIAGNOSTIC ANALYSIS:
--------------------
Had the customer trace lmon and lmd while doing the shutdown immediate,
and found:
    In sur11_lmon_5948.trc
    WAIT #0: nam='ges LMD to shutdown' ela= 496057 p1=0 p2=0 p3=0
    WAIT #0: nam='KJC: Wait for msg sends to complete' ela= 49213 p1=0 p2=0
p3=0
    WAIT #0: nam='KJC: Wait for msg sends to complete' ela= 49811 p1=0 p2=0
p3=0
    WAIT #0: nam='KJC: Wait for msg sends to complete' ela= 49772 p1=0 p2=0
p3=0
    WAIT #0: nam='KJC: Wait for msg sends to complete' ela= 49813 p1=0 p2=0
p3=0
    WAIT #0: nam='KJC: Wait for msg sends to complete' ela= 49779 p1=0 p2=0
p3=0
    WAIT #0: nam='KJC: Wait for msg sends to complete' ela= 49787 p1=0 p2=0
p3=0
    WAIT #0: nam='KJC: Wait for msg sends to complete' ela= 49779 p1=0 p2=0 

	LMS4's stack:
ksedms ksdxfdmp ksdxcb sspuser sigacthandler libc_poll
kslwait ksxprcv kjctr_rksxp kjctsrw kjccwcrm kjccqmg
kjbmsclose kjbclose kclpdc kclpdcl kclpbi kjmsm
ksbrdp opirip opidrv sou2o main start 

LMS4 encountered ORA-4031 while shutting down,

#0  0x0000000004f5bdbe in kjccqmg ()
#1  0x0000000005172378 in kjbmflushsc ()
#2  0x0000000005175351 in kjbmfto ()
#3  0x0000000004ff425f in kjmsm ()
#4  0x00000000022677a7 in ksbrdp ()
#5  0x000000000248c8a3 in opirip ()
#6  0x000000000171eefd in opidrv ()
#7  0x0000000001d95f5f in sou2o ()
#8  0x0000000000a079ba in opimai_real ()
#9  0x0000000001d9b408 in ssthrdmain ()
#10 0x0000000000a078a1 in main ()

(gdb) bt
#0  0x00000000053907aa in kclpbi ()
#1  0x0000000004ff46dc in kjmsm ()
#2  0x00000000022677a7 in ksbrdp ()
#3  0x000000000248c8a3 in opirip ()
#4  0x000000000171eefd in opidrv ()
#5  0x0000000001d95f5f in sou2o ()
#6  0x0000000000a079ba in opimai_real ()
#7  0x0000000001d9b408 in ssthrdmain ()
#8  0x0000000000a078a1 in main ()

(gdb) bt
#0  0x0000000004f5bdbe in kjccqmg ()
#1  0x00000000051a232a in kjbmswrite ()
#2  0x00000000051014f1 in kjbrswrite ()
#3  0x00000000050fec8d in kjbrwreq ()
#4  0x0000000005189fcc in kjbmpwrite ()
#5  0x0000000004fe98fe in kjmxmpm ()
#6  0x0000000004feceee in kjmpmsgi ()
#7  0x0000000004ff2d8e in kjmsm ()
#8  0x00000000022677a7 in ksbrdp ()
#9  0x000000000248c8a3 in opirip ()
#10 0x000000000171eefd in opidrv ()
#11 0x0000000001d95f5f in sou2o ()
#12 0x0000000000a079ba in opimai_real ()
#13 0x0000000001d9b408 in ssthrdmain ()
#14 0x0000000000a078a1 in main ()

The following lists bug information related to 'KJC: Wait for msg sends to complete':

Bug (affected versions): description
  - Bug 5075434 (10.2.0.2): Small performance overhead in RAC; a RAC session can suffer a small overhead waiting for a timeout on "KJC: Wait for msg sends to complete".
  - Bug 11687519 (10.2.0.3): PZ99 process spin with "KJC: WAIT FOR MSG SENDS TO COMPLETE".
  - Bug 4672266 (9.2.0.4): LMS waiting for 'KJC: Wait for msg sends to complete' while SHUTDOWN IMMEDIATE; LMS2 had been waiting on the event during SHUTDOWN IMMEDIATE, and the customer decided to issue SHUTDOWN ABORT to stop the instance after one hour.
  - Bug 4672266 (9.2.0.4, 9.2.0.7, 10.1.0.5, 10.2.0.2): LMS may hang waiting for 'kjc: wait for msg sends to complete' during shutdown of an instance, with another session reporting ORA-600 [kjccgmb:1].
  - Bug 2447621 (versions below 9.2): Unnecessary waits for "KJC: Wait for message completion" can occur in a RAC environment.
  - Bug 5347566 (9.2.0.7): SHUTDOWN IMMEDIATE very slow with 'KJC: WAIT FOR MSG SENDS TO COMPLETE'; essentially the same issue as Bug 5289823 and Bug 4672266.
  - Bug 5084699 (10.2.0.1): racgimon on one node of a 4-node cluster exhibited a process spin; it appears to wait for 15 minutes out of every 20 on "KJC: Wait for msg sends to complete" in the PL/SQL execution 'begin dbms_lock.allocate_unique(:a1, :a2); end;'.
  - Bug 3561040 (9.2.0.4): Shutdown hang because the LMS process waits for "KJC: Wait for msg sends to complete" during shutdown, and shutdown does not complete.

What you don't know about SMON (Part 6): Instance Recovery

SMON's duties also include Instance Recovery in a RAC environment. Note that although Instance Recovery can be translated as 实例恢复, it is not the same thing as the instance recovery we speak of colloquially, which largely means Crash Recovery; the two are distinct. Recovery after a single instance crashes, or after all nodes of a RAC have crashed, is what we call Crash Recovery. When one node of a RAC fails and a surviving instance applies the redo of the failed node's thread, we call that Instance Recovery. More on Crash Recovery can be found in the article <还原真实的cache recovery>.

Observations

Instance Recovery comprises two recovery stages, cache recovery and GES/GCS remastering; note that the two stages proceed concurrently. The protagonist of cache recovery is the SMON process on a surviving node, which is responsible for handing redo out to slave processes, while GES/GCS remastering is carried out by the RAC-specific LMON process.
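
A hedged way to watch these milestones live is to follow the surviving instance's alert log during the reconfiguration (the ADR path below matches the PROD1 instance used in this article; adjust it to your own diagnostic destination):

tail -f /u01/app/oracle/diag/rdbms/PROD/PROD1/trace/alert_PROD1.log \
  | grep -iE 'reconfig|instance recovery|redo'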

The overall Reconfiguration process is illustrated in the figure below:
[Figure: RAC instance recovery reconfiguration flow]
Note that at Crash Detected the database enters a partially available (Partial Availability) state; from Freeze Lockdb onward there is no availability (None Availability); when IR applies redo, i.e. during the roll forward, it switches back to partial availability; after the roll forward completes, rollback is carried out, but by then the database has already reached full availability (Full Availability), as shown below:
[Figure: database availability during the steps of Oracle instance recovery]

The graphic illustrates the degree of database availability during each step of Oracle instance recovery: 

A.         Real Application Clusters is running on multiple nodes.

B.         Node failure is detected.

C.         The enqueue part of the GRD is reconfigured; resource management is redistributed to the surviving nodes. This operation occurs relatively quickly.

D.        The cache part of the GRD is reconfigured and SMON reads the redo log of the failed instance to identify the database blocks that it needs to recover.

E.         SMON issues the GRD requests to obtain all the database blocks it needs for recovery. After the requests are complete, all other blocks are accessible.

F.         The Oracle server performs roll forward recovery. Redo logs of the failed threads are applied to the database, and blocks are available right after their recovery is completed.

G.        The Oracle server performs rollback recovery. Undo blocks are applied to the database for all uncommitted transactions.

H.        Instance recovery is complete and all data is accessible.

Note: The dashed line represents the blocks identified in step 2 in the previous slide. Also, the dotted steps represent the ones identified in the previous slide.

Let us now observe the Instance Recovery process in practice:

INST 1:

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
PL/SQL Release 11.2.0.2.0 - Production
CORE    11.2.0.2.0      Production
TNS for Linux: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production

SQL> select * from global_name;

GLOBAL_NAME
--------------------------------------------------------------------------------
www.askmaclean.com

SQL> alter system set event='10426 trace name context forever,level 12' scope=spfile;  -- 10426 event Reconfiguration trace event
System altered.

SQL> startup force;
ORACLE instance started.

INST 2:
SQL> shutdown abort
ORACLE instance shut down.

=============================================================
========================alert.log============================

Reconfiguration started (old inc 4, new inc 6)
List of instances:
 1 (myinst: 1)
 Global Resource Directory frozen
 * dead instance detected - domain 0 invalid = TRUE
 Communication channels reestablished
 Master broadcasted resource hash value bitmaps
 Non-local Process blocks cleaned out
 LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
 Set master node info
 Submitted all remote-enqueue requests
 Dwn-cvts replayed, VALBLKs dubious
 All grantable enqueues granted
 Post SMON to start 1st pass IR
Instance recovery: looking for dead threads
Beginning instance recovery of 1 threads
 parallel recovery started with 2 processes                 -- 2 recovery slaves
 Submitted all GCS remote-cache requests
 Post SMON to start 1st pass IR
 Fix write in gcs resources
Reconfiguration complete
Started redo scan
Completed redo scan
 read 88 KB redo, 82 data blocks need recovery
Started redo application at
 Thread 2: logseq 374, block 2, scn 54624376
Recovery of Online Redo Log: Thread 2 Group 4 Seq 374 Reading mem 0
  Mem# 0: +DATA/prod/onlinelog/group_4.271.747100549
  Mem# 1: +DATA/prod/onlinelog/group_4.272.747100553
Completed redo application of 0.07MB
Completed instance recovery at
 Thread 2: logseq 374, block 178, scn 54646382
 73 data blocks read, 83 data blocks written, 88 redo k-bytes read
Thread 2 advanced to log sequence 375 (thread recovery)
Redo thread 2 internally disabled at seq 375 (SMON)
ARC3: Creating local archive destination LOG_ARCHIVE_DEST_1: '/s01/arch/2_374_747100216.dbf' (thread 2 sequence 374) (PROD1)
Setting Resource Manager plan SCHEDULER[0x310B]:DEFAULT_MAINTENANCE_PLAN via scheduler window
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
ARC3: Closing local archive destination LOG_ARCHIVE_DEST_1: '/s01/arch/2_374_747100216.dbf' (PROD1)
2011-06-27 22:19:29.280000 +08:00
Archived Log entry 792 added for thread 2 sequence 374 ID 0x9790ab2 dest 1:
ARC0: Creating local archive destination LOG_ARCHIVE_DEST_1: '/s01/arch/2_375_747100216.dbf' (thread 2 sequence 375) (PROD1)
2011-06-27 22:19:30.336000 +08:00
ARC0: Archiving disabled thread 2 sequence 375
ARC0: Closing local archive destination LOG_ARCHIVE_DEST_1: '/s01/arch/2_375_747100216.dbf' (PROD1)
Archived Log entry 793 added for thread 2 sequence 375 ID 0x9790ab2 dest 1:
minact-scn: Master considers inst:2 dead

==================================================================
===========================smon trace begin=======================

*** 2011-06-27 22:19:28.279
2011-06-27 22:19:28.279849 : Start recovery for domain=0, valid=0, flags=0x0
Successfully allocated 2 recovery slaves
Using 67 overflow buffers per recovery slave
Thread 2 checkpoint: logseq 374, block 2, scn 54624376
  cache-low rba: logseq 374, block 2
    on-disk rba: logseq 374, block 178, scn 54626382
  start recovery at logseq 374, block 2, scn 54624376
Instance recovery not required for thread 1

*** 2011-06-27 22:19:28.487
Started writing zeroblks thread 2 seq 374 blocks 178-185

*** 2011-06-27 22:19:28.487
Completed writing zeroblks thread 2 seq 374
==== Redo read statistics for thread 2 ====
Total physical reads (from disk and memory): 4096Kb
-- Redo read_disk statistics --
Read rate (ASYNC): 88Kb in 0.18s => 0.48 Mb/sec
Longest record: 8Kb, moves: 0/186 (0%)
Longest LWN: 33Kb, moves: 0/47 (0%), moved: 0Mb
Last redo scn: 0x0000.0341884d (54626381)
----------------------------------------------
----- Recovery Hash Table Statistics ---------
Hash table buckets = 262144
Longest hash chain = 1
Average hash chain = 82/82 = 1.0
Max compares per lookup = 1
Avg compares per lookup = 248/330 = 0.8
----------------------------------------------

*** 2011-06-27 22:19:28.489
KCRA: start recovery claims for 82 data blocks

*** 2011-06-27 22:19:28.526
KCRA: blocks processed = 82/82, claimed = 81, eliminated = 1
2011-06-27 22:19:28.526088 : Validate domain 0
 **************** BEGIN RECOVERY HA STATS  ****************

 I'm the recovery instance

 smon posted (1278500359646), recovery started 0.027 secs,(1278500359673)
 domain validated 0.242 secs (1278500359888)
 claims opened 70, claims converted 11, claims preread 0

 ****************  END RECOVERY HA STATS  *****************
2011-06-27 22:19:28.526668 : Validated domain 0, flags = 0x0

*** 2011-06-27 22:19:28.556
Recovery of Online Redo Log: Thread 2 Group 4 Seq 374 Reading mem 0

*** 2011-06-27 22:19:28.560
Completed redo application of 0.07MB

*** 2011-06-27 22:19:28.569
Completed recovery checkpoint
----- Recovery Hash Table Statistics ---------
Hash table buckets = 262144
Longest hash chain = 1
Average hash chain = 82/82 = 1.0
Max compares per lookup = 1
Avg compares per lookup = 330/330 = 1.0
----------------------------------------------
*** 2011-06-27 22:19:28.572 5401 krsg.c
Acquiring RECOVERY INFO PING latch from [krsg.c:5401] IX0
*** 2011-06-27 22:19:28.572 5401 krsg.c
Successfully acquired RECOVERY INFO PING latch IX+
*** 2011-06-27 22:19:28.572 5406 krsg.c
Freeing RECOVERY INFO PING latch from [krsg.c:5406] IX0
*** 2011-06-27 22:19:28.572 5406 krsg.c
Successfully freed RECOVERY INFO PING latch IX-
krss_sched_work: Prod archiver request from process SMON (function:0x2000)
krss_find_arc: Evaluating ARC3 to receive message (flags 0x0)
krss_find_arc: Evaluating ARC0 to receive message (flags 0x0)
krss_find_arc: Evaluating ARC1 to receive message (flags 0xc)
krss_find_arc: Evaluating ARC2 to receive message (flags 0x2)
krss_find_arc: Selecting ARC2 to receive REC PING message
*** 2011-06-27 22:19:28.572 3093 krsv.c
krsv_send_msg: Sending message to process ARC2
*** 2011-06-27 22:19:28.572 1819 krss.c
krss_send_arc: Sent message to ARC2 (message:0x2000)
Recovery sets nab of thread 2 seq 374 to 178 with 8 zeroblks
Retrieving log 4
pre-aal: xlno:4 flno:0 arf:0 arb:2 arh:2 art:4
Updating log 3 thread 2 sequence 375
Previous log 3 thread 2 sequence 0
Updating log 4 thread 2 sequence 374
Previous log 4 thread 2 sequence 374
post-aal: xlno:4 flno:0 arf:3 arb:2 arh:2 art:3
krss_sched_work: Prod archiver request from process SMON (function:0x1)
krss_find_arc: Evaluating ARC3 to receive message (flags 0x0)
krss_find_arc: Selecting ARC3 to receive message
*** 2011-06-27 22:19:28.589 3093 krsv.c
krsv_send_msg: Sending message to process ARC3
*** 2011-06-27 22:19:28.589 1819 krss.c
krss_send_arc: Sent message to ARC3 (message:0x1)
Retrieving log 2
Kicking thread 1 to switch logfile
Retrieving log 4
Retrieving log 3
krss_sched_work: Prod archiver request from process SMON (function:0x1)
krss_find_arc: Evaluating ARC0 to receive message (flags 0x0)
krss_find_arc: Selecting ARC0 to receive message
*** 2011-06-27 22:19:28.599 3093 krsv.c
krsv_send_msg: Sending message to process ARC0
*** 2011-06-27 22:19:28.599 1819 krss.c
krss_send_arc: Sent message to ARC0 (message:0x1)
*** 2011-06-27 22:19:28.599 838 krsv.c
krsv_dpga: Waiting for pending I/O to complete

*** 2011-06-27 22:19:29.304
krss_sched_work: Prod archiver request from process SMON (function:0x1)
krss_find_arc: Evaluating ARC1 to receive message (flags 0xc)
krss_find_arc: Selecting ARC1 to receive message
*** 2011-06-27 22:19:29.304 3093 krsv.c
krsv_send_msg: Sending message to process ARC1
*** 2011-06-27 22:19:29.304 1819 krss.c
krss_send_arc: Sent message to ARC1 (message:0x1)
SMON[INST-TXN-RECO]:about to recover undo segment 11 status:3 inst:2
SMON[INST-TXN-RECO]: mark undo segment 11 as available status:2 ret:0
SMON[INST-TXN-RECO]:about to recover undo segment 12 status:3 inst:2
SMON[INST-TXN-RECO]: mark undo segment 12 as available status:2 ret:0
SMON[INST-TXN-RECO]:about to recover undo segment 13 status:3 inst:2
SMON[INST-TXN-RECO]: mark undo segment 13 as available status:2 ret:0
SMON[INST-TXN-RECO]:about to recover undo segment 14 status:3 inst:2
SMON[INST-TXN-RECO]: mark undo segment 14 as available status:2 ret:0
SMON[INST-TXN-RECO]:about to recover undo segment 15 status:3 inst:2
SMON[INST-TXN-RECO]: mark undo segment 15 as available status:2 ret:0
SMON[INST-TXN-RECO]:about to recover undo segment 16 status:3 inst:2
SMON[INST-TXN-RECO]: mark undo segment 16 as available status:2 ret:0
SMON[INST-TXN-RECO]:about to recover undo segment 17 status:3 inst:2
SMON[INST-TXN-RECO]: mark undo segment 17 as available status:2 ret:0
SMON[INST-TXN-RECO]:about to recover undo segment 18 status:3 inst:2
SMON[INST-TXN-RECO]: mark undo segment 18 as available status:2 ret:0
SMON[INST-TXN-RECO]:about to recover undo segment 19 status:3 inst:2
SMON[INST-TXN-RECO]: mark undo segment 19 as available status:2 ret:0
SMON[INST-TXN-RECO]:about to recover undo segment 20 status:3 inst:2
SMON[INST-TXN-RECO]: mark undo segment 20 as available status:2 ret:0

*** 2011-06-27 22:19:43.299
* kju_tsn_aff_drm_pending TRACEUD: called with tsn x2, dissolve 0
* kju_tsn_aff_drm_pending TRACEUD: tsn_pkey = x2.1
* >> RM REQ QS ---:
 single window RM request queue is empty
 multi-window RM request queue is empty
* Global DRM state ---:
   There is no dynamic remastering
   RM lock state = 0
 pkey 2.1 undo 1 stat 0 masters[32768, 1->1] reminc 4 RM# 1
 flg x0 type x0 afftime x36e6e3a8
 nreplays by lms 0 = 0
* kju_tsn_aff_drm_pending TRACEUD: matching request not found on swin queue
* kju_tsn_aff_drm_pending TRACEUD: pp found, stat x0
* kju_tsn_aff_drm_pending TRACEUD: 2 return true

*** 2011-06-27 22:22:18.333
* kju_tsn_aff_drm_pending TRACEUD: called with tsn x2, dissolve 0
* kju_tsn_aff_drm_pending TRACEUD: tsn_pkey = x2.1

* >> RM REQ QS ---:
 single window RM request queue is empty
 multi-window RM request queue is empty
* Global DRM state ---:
   There is no dynamic remastering
   RM lock state = 0
 pkey 2.1 undo 1 stat 0 masters[32768, 1->1] reminc 4 RM# 1
 flg x0 type x0 afftime x36e6e3a8
 nreplays by lms 0 = 0
* kju_tsn_aff_drm_pending TRACEUD: matching request not found on swin queue
* kju_tsn_aff_drm_pending TRACEUD: pp found, stat x0
* kju_tsn_aff_drm_pending TRACEUD: 2 return true

*** 2011-06-27 22:24:53.365
* kju_tsn_aff_drm_pending TRACEUD: called with tsn x2, dissolve 0
* kju_tsn_aff_drm_pending TRACEUD: tsn_pkey = x2.1

* >> RM REQ QS ---:
 single window RM request queue is empty
 multi-window RM request queue is empty
* Global DRM state ---:
   There is no dynamic remastering
   RM lock state = 0
 pkey 2.1 undo 1 stat 0 masters[32768, 1->1] reminc 4 RM# 1
 flg x0 type x0 afftime x36e6e3a8
 nreplays by lms 0 = 0
* kju_tsn_aff_drm_pending TRACEUD: matching request not found on swin queue
* kju_tsn_aff_drm_pending TRACEUD: pp found, stat x0
* kju_tsn_aff_drm_pending TRACEUD: 2 return true

========================================================================
==============================lmon trace begin==========================

 *** 2011-06-27 22:19:27.748
kjxgmpoll reconfig instance map: 1

*** 2011-06-27 22:19:27.748
kjxgmrcfg: Reconfiguration started, type 1
CGS/IMR TIMEOUTS:
  CSS recovery timeout = 31 sec (Total CSS waittime = 65)
  IMR Reconfig timeout = 75 sec
  CGS rcfg timeout = 85 sec
kjxgmcs: Setting state to 4 0.

*** 2011-06-27 22:19:27.759
     Name Service frozen
kjxgmcs: Setting state to 4 1.
kjxgrdecidever: No old version members in the cluster
kjxgrssvote: reconfig bitmap chksum 0x2137452d cnt 1 master 1 ret 0
kjxgrpropmsg: SSMEMI: inst 1 - no disk vote
kjxgrpropmsg: SSVOTE: Master indicates no Disk Voting
2011-06-27 22:19:27.760783 : kjxgrDiskVote: nonblocking method is chosen
kjxggpoll: change poll time to 50 ms
2011-06-27 22:19:27.918847 : kjxgrDiskVote: Obtained RR update lock for sequence 5, RR seq 4
2011-06-27 22:19:28.023160 : kjxgrDiskVote: derive membership from CSS (no disk votes)
2011-06-27 22:19:28.023240 : proposed membership: 1

*** 2011-06-27 22:19:28.081
2011-06-27 22:19:28.081952 : kjxgrDiskVote: new membership is updated by inst 1, seq 6
2011-06-27 22:19:28.082073 : kjxgrDiskVote: bitmap: 1
CGS/IMR TIMEOUTS:
  CSS recovery timeout = 31 sec (Total CSS waittime = 65)
  IMR Reconfig timeout = 75 sec
  CGS rcfg timeout = 85 sec
kjxgmmeminfo: can not invalidate inst 2
kjxgmps: proposing substate 2
kjxgmcs: Setting state to 6 2.
kjfmSendAbortInstMsg: send an abort message to instance 2
 kjfmuin: inst bitmap 1
 kjfmmhi: received msg from inst 1 (inc 2)
     Performed the unique instance identification check
kjxgmps: proposing substate 3
kjxgmcs: Setting state to 6 3.
     Name Service recovery started
     Deleted all dead-instance name entries
kjxgmps: proposing substate 4
kjxgmcs: Setting state to 6 4.
     Multicasted all local name entries for publish
     Replayed all pending requests
kjxgmps: proposing substate 5
kjxgmcs: Setting state to 6 5.
     Name Service normal
     Name Service recovery done

*** 2011-06-27 22:19:28.191
kjxgmps: proposing substate 6
kjxgmcs: Setting state to 6 6.
kjxgmcs: total reconfig time 0.432 seconds (from 2895072218 to 2895072650)
kjxggpoll: change poll time to 600 ms
kjfmact: call ksimdic on instance (2)
2011-06-27 22:19:28.211846 :

********* kjfcrfg() called, BEGIN LMON RCFG *********
2011-06-27 22:19:28.211906 : * Begin lmon rcfg step KJGA_RCFG_BEGIN
* kjfcrfg: Resource broadcasting disabled
* kjfcrfg: kjfcqiora returned success
kjfcrfg: DRM window size = 4096->4096 (min lognb = 15)
2011-06-27 22:19:28.211954 :
Reconfiguration started (old inc 4, new inc 6)
TIMEOUTS:
  Local health check timeout: 70 sec
  Rcfg process freeze timeout: 70 sec
  Remote health check timeout: 140 sec
  Defer Queue timeout: 163 secs
  CGS rcfg timeout: 85 sec
  Synchronization timeout: 248 sec
  DLM rcfg timeout: 744 sec
List of instances:
 1 (myinst: 1)
Undo tsn affinity 1
OMF 0
2011-06-27 22:19:28.212394 : * Begin lmon rcfg step KJGA_RCFG_FREEZE

*** 2011-06-27 22:19:28.233
* published: inc 6, isnested 0, rora req 0,
  rora start 0, rora invalid 0, (roram 32767), isrcvinst 1,
  (rcvinst 1), isdbopen 1, drh 0, (myinst 1)
  thread 1, isdbmounted 1, sid hash x0
* kjfcrfg: published bigns successfully
* Force-published at step 3
2011-06-27 22:19:28.233575 :  Global Resource Directory frozen
* roram 32767, rcvinst 1
* kjfc_thread_qry: instance 1 flag 3 thread 1 sid 0
* kjfcrfg: queried bigns successfully
inst 1
* kjfcrfg: single_instance_kjga = TRUE

 asby init, 0/1/x2

 asby returns, 0/1/x2/false

* Domain maps before reconfiguration:
*   DOMAIN 0 (valid 1): 1 2
* End of domain mappings

 * dead instance detected - domain 0 invalid = TRUE

* Domain maps after recomputation:
*   DOMAIN 0 (valid 0): 1
* End of domain mappings
2011-06-27 22:19:28.235110 : * Begin lmon rcfg step KJGA_RCFG_COMM
2011-06-27 22:19:28.235242 : GSIPC:KSXPCB: msg 0xd8b84550 status 34, type 2, dest 2, rcvr 0
2011-06-27 22:19:28.235339 : GSIPC:KSXPCB: msg 0xd8b80180 status 34, type 2, dest 2, rcvr 1
 Active Sendback Threshold = 50 %
 Communication channels reestablished
2011-06-27 22:19:28.240076 : * Begin lmon rcfg step KJGA_RCFG_EXCHANGE
2011-06-27 22:19:28.240192 : * Begin lmon rcfg step KJGA_RCFG_ENQCLEANUP
 Master broadcasted resource hash value bitmaps
2011-06-27 22:19:28.251474 :
 Non-local Process blocks cleaned out
2011-06-27 22:19:28.251822 : * Begin lmon rcfg step KJGA_RCFG_CLEANUP
2011-06-27 22:19:28.265220 : * Begin lmon rcfg step KJGA_RCFG_TIMERQ
2011-06-27 22:19:28.265308 : * Begin lmon rcfg step KJGA_RCFG_DDQ
2011-06-27 22:19:28.265393 : * Begin lmon rcfg step KJGA_RCFG_SETMASTER
2011-06-27 22:19:28.271551 :
 Set master node info
2011-06-27 22:19:28.271931 : * Begin lmon rcfg step KJGA_RCFG_ENQREPLAY
2011-06-27 22:19:28.275490 :  Submitted all remote-enqueue requests
2011-06-27 22:19:28.275596 : * Begin lmon rcfg step KJGA_RCFG_ENQDUBIOUS
 Dwn-cvts replayed, VALBLKs dubious
2011-06-27 22:19:28.277223 : * Begin lmon rcfg step KJGA_RCFG_ENQGRANT
 All grantable enqueues granted
2011-06-27 22:19:28.277992 : * Begin lmon rcfg step KJGA_RCFG_PCMREPLAY
2011-06-27 22:19:28.279234 :
2011-06-27 22:19:28.279255 :  Post SMON to start 1st pass IR               --SMON posted by LMON
2011-06-27 22:19:28.307890 :  Submitted all GCS cache requests             --IR acquires all gcs resource needed for recovery
2011-06-27 22:19:28.308038 : * Begin lmon rcfg step KJGA_RCFG_FIXWRITES
 Post SMON to start 1st pass IR
 Fix write in gcs resources
2011-06-27 22:19:28.313508 : * Begin lmon rcfg step KJGA_RCFG_END
2011-06-27 22:19:28.313720 :
2011-06-27 22:19:28.313733 :
Reconfiguration complete

*   domain 0 valid?: 0
* kjfcrfg: ask RMS0 to do pnp work
 **************** BEGIN DLM RCFG HA STATS  ****************

 Total dlm rcfg time (inc 6): 0.100 secs (1278500359581, 1278500359681)
   Begin step .........: 0.001 secs (1278500359581, 1278500359582)
   Freeze step ........: 0.020 secs (1278500359582, 1278500359602)
   Remap step .........: 0.002 secs (1278500359602, 1278500359604)
   Comm step ..........: 0.005 secs (1278500359604, 1278500359609)
   Sync 1 step ........: 0.000 secs (0, 0)
   Exchange step ......: 0.000 secs (1278500359609, 1278500359609)
   Sync 2 step ........: 0.000 secs (0, 0)
   Enqueue cleanup step: 0.011 secs (1278500359609, 1278500359620)
   Sync pcm1 step .....: 0.000 secs (0, 0)
   Cleanup step .......: 0.013 secs (1278500359620, 1278500359633)
   Timerq step ........: 0.000 secs (1278500359633, 1278500359633)
   Ddq step ...........: 0.000 secs (1278500359633, 1278500359633)
   Set master step ....: 0.006 secs (1278500359633, 1278500359639)
   Sync 3 step ........: 0.000 secs (0, 0)
   Enqueue replay step : 0.004 secs (1278500359639, 1278500359643)
   Sync 4 step ........: 0.000 secs (0, 0)
   Enqueue dubious step: 0.001 secs (1278500359643, 1278500359644)
   Sync 5 step ........: 0.000 secs (0, 0)
   Enqueue grant step .: 0.001 secs (1278500359644, 1278500359645)
   Sync 6 step ........: 0.000 secs (0, 0)
   PCM replay step ....: 0.030 secs (1278500359645, 1278500359675)
   Sync 7 step ........: 0.000 secs (0, 0)
   Fixwrt replay step .: 0.003 secs (1278500359675, 1278500359678)
   Sync 8 step ........: 0.000 secs (0, 0)
   End step ...........: 0.001 secs (1278500359680, 1278500359681)
 Number of replayed enqueues sent / received .......: 0 / 0
 Number of replayed fusion locks sent / received ...: 0 / 0
 Number of enqueues mastered before / after rcfg ...: 2217 / 2941
 Number of fusion locks mastered before / after rcfg: 3120 / 5747

 ****************  END DLM RCFG HA STATS  *****************

*** 2011-06-27 22:19:36.589
kjxgfipccb: msg 0x0x7ff526139320, mbo 0x0x7ff526139310, type 19, ack 0, ref 0, stat 34

=====================================================================
============================lms trace begin==========================


*** 2011-06-27 22:38:54.663
2011-06-27 22:38:54.663764 :  0 GCS shadows cancelled, 0 closed, 0 Xw survived
2011-06-27 22:38:54.673539 :  5230 GCS resources traversed, 0 cancelled
2011-06-27 22:38:54.707671 :  9322 GCS shadows traversed, 0 replayed, 0 duplicates,
5183 not replayed, dissolve 0 timeout 0 RCFG(10) lms 0 finished replaying gcs resources
2011-06-27 22:38:54.709132 :  0 write requests issued in 384 GCS resources        --check past image
 0 PIs marked suspect, 0 flush PI msgs
2011-06-27 22:38:54.709520 :  0 write requests issued in 273 GCS resources
 1 PIs marked suspect, 0 flush PI msgs
2011-06-27 22:38:54.709842 :  0 write requests issued in 281 GCS resources
 0 PIs marked suspect, 0 flush PI msgs
2011-06-27 22:38:54.710159 :  0 write requests issued in 233 GCS resources
 0 PIs marked suspect, 0 flush PI msgs
2011-06-27 22:38:54.710531 :  0 write requests issued in 350 GCS resources
 lms 0 finished fixing gcs write protocol

The biggest difference between Instance Recovery and an ordinary Crash Recovery is that instance recovery involves freezing the GRD and remastering GES/GCS resources, work carried out mainly by the LMON process. A number of Reconfiguration steps of the form KJGA_RCFG_* can be seen in the traces above; their meanings are listed below (a sketch for locating the trace files that record these steps follows the step list):

Reconfiguration Steps:

1.    KJGA_RCFG_BEGIN
LMON continuously polls for reconfiguration events; once CGS reports a change in cluster membership,
LMON starts the reconfiguration.

2.    KJGA_RCFG_FREEZE
All processes acknowledge the reconfiguration freeze before LMON continues.

3.    KJGA_RCFG_REMAP
Updates the new instance map (kjfchsu) and redistributes resource mastership. Invalidates recovery domains
if the reconfiguration was caused by an instance death.

4.    KJGA_RCFG_COMM
Reinitializes the communication channels.

5.    KJGA_RCFG_EXCHANGE
Exchanges master information for the GCS, GES, and file-affinity masters.

6.    KJGA_RCFG_ENQCLEANUP
Deletes dead remote GCS/GES locks and cancels converting GCS requests.

7.    KJGA_RCFG_CLEANUP
Cleans up and removes GES resources.

8.    KJGA_RCFG_TIMERQ
Restores the relative timeouts for enqueue locks on the timeout queue.

9.    KJGA_RCFG_DDQ
Cleans out enqueue locks on the deadlock queue.

10.  KJGA_RCFG_SETMASTER
Updates the master info for each enqueue resource that needs to be remastered.

11.  KJGA_RCFG_REPLAY
Replays enqueue locks.

12.  KJGA_RCFG_ENQDUBIOUS
Invalidates GES resources without an established value.

13.  KJGA_RCFG_ENQGRANT
Grants all grantable GES lock requests.

14.  KJGA_RCFG_REPLAY2
Enqueue reconfiguration is complete; posts SMON to start instance recovery and begins replaying GCS resources.

15.  KJGA_RCFG_FIXWRITES2
Fixes the write state of GCS resources.

16.  KJGA_RCFG_END
Unfreezes the lock database.
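
These steps are recorded in the background-process trace files quoted above. A minimal sketch for locating those files, assuming 11g (where v$process exposes a tracefile column) and SYSDBA access:

SELECT pname, tracefile
  FROM v$process
 WHERE pname IN ('LMON', 'LMD0', 'SMON');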

Diagnostic events related to Instance Recovery
We cannot prevent Instance Recovery from happening; in fact, once an instance crashes, Instance Recovery becomes mandatory.
The main diagnostic events related to Instance Recovery are 10426, 29717, and a few others:

    10426 – Reconfiguration trace event
    10425 – Enqueue operations
    10432 – Fusion activity
    10429 – IPC tracing

oerr ora 10425
10425, 00000, "enable global enqueue operations event trace"
// *Document: NO
// *Cause:
// *Action: Dump trace for global enqueue operations.

oerr ora 10426
10426, 00000, "enable ges/gcs reconfiguration event trace"
// *Document: NO
// *Cause:
// *Action: Dump trace for ges/gcs reconfiguration.

oerr ora 10430
10430, 00000, "enable ges/gcs dynamic remastering event trace"
// *Document: NO
// *Cause:
// *Action: Dump trace for ges/gcs dynamic remastering.

oerr ora 10401
10401, 00000, "turn on IPC (ksxp) debugging"
// *Cause:
// *Action: Enables debugging code for IPC service layer (ksxp)

oerr ora 10708
10708, 00000, "print out trace information from the RAC buffer cache"
// *Cause: N/A
// *Action: THIS IS NOT A USER ERROR NUMBER/MESSAGE.  THIS DOES NOT NEED TO BE
//          TRANSLATED OR DOCUMENTED. IT IS USED ONLY FOR DEBUGGING.

oerr ora 29717
29717, 00000, "enable global resource directory freeze/unfreeze event trace"
// *Document: NO
// *Cause:
// *Action: Dump trace for global resource directory freeze/unfreeze.

Diagnosing a RAC instance shutdown via LMON:
with the events below set, LMON will dump more information to trace during reconfiguration and freeze.

event="10426 trace name context forever, level 8"
event="29717 trace name context forever, level 5"
or
event="10426 trace name context forever, level 12"
event="10430 trace name context forever, level 12"
event="10401 trace name context forever, level 8"
event="10046 trace name context forever, level 8"
event="10708 trace name context forever, level 15"
event="29717 trace name context forever, level 5"

To capture just the GRD freeze trace of event 29717, set:

alter system set event='29717 trace name context forever, level 5' scope=spfile;
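
After the trace has been collected, the event can be backed out again; a minimal sketch, assuming the instance uses an spfile:

alter system reset event scope=spfile sid='*';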

=========================================================================
============================lmon trace begin=============================

********* kjfcrfg() called, BEGIN LMON RCFG *********
2011-06-27 23:13:16.693089 : * Begin lmon rcfg step KJGA_RCFG_BEGIN
* kjfcrfg: Resource broadcasting disabled
* kjfcrfg: kjfcqiora returned success
kjfcrfg: DRM window size = 4096->4096 (min lognb = 15)
2011-06-27 23:13:16.693219 :
Reconfiguration started (old inc 4, new inc 6)
TIMEOUTS:
  Local health check timeout: 70 sec
  Rcfg process freeze timeout: 70 sec
  Remote health check timeout: 140 sec
  Defer Queue timeout: 163 secs
  CGS rcfg timeout: 85 sec
  Synchronization timeout: 248 sec
  DLM rcfg timeout: 744 sec
List of instances:
 1 (myinst: 1)
Undo tsn affinity 1
OMF 0
[FDB][start]
2011-06-27 23:13:16.701320 : * Begin lmon rcfg step KJGA_RCFG_FREEZE
[FACK][18711 not frozen]          --fack means acknowledge in advance
[FACK][18713 not frozen]
[FACK][18719 not frozen]
[FACK][18721 not frozen]
[FACK][18723 not frozen]
[FACK][18729 not frozen]
[FACK][18739 not frozen]
[FACK][18743 not frozen]
[FACK][18745 not frozen]
[FACK][18747 not frozen]
[FACK][18749 not frozen]
[FACK][18751 not frozen]
[FACK][18753 not frozen]
[FACK][18755 not frozen]
[FACK][18757 not frozen]
[FACK][18759 not frozen]
[FACK][18763 not frozen]
[FACK][18765 not frozen]
[FACK][18767 not frozen]
[FACK][18769 not frozen]
[FACK][18771 not frozen]
[FACK][18775 not frozen]
[FACK][18777 not frozen]
[FACK][18816 not frozen]
[FACK][18812 not frozen]
[FACK][18818 not frozen]
[FACK][18820 not frozen]
[FACK][18824 not frozen]
[FACK][18826 not frozen]
[FACK][18830 not frozen]
[FACK][18835 not frozen]
[FACK][18842 not frozen]
[FACK][18860 not frozen]
[FACK][18865 not frozen]
[FACK][18881 not frozen]
[FACK][18883 not frozen]
[FACK][18909 not frozen]

*** 2011-06-27 23:13:16.724
* published: inc 6, isnested 0, rora req 0,
  rora start 0, rora invalid 0, (roram 32767), isrcvinst 0,
  (rcvinst 32767), isdbopen 1, drh 0, (myinst 1)
  thread 1, isdbmounted 1, sid hash x0
* kjfcrfg: published bigns successfully
* Force-published at step 3
2011-06-27 23:13:16.724764 :  Global Resource Directory frozen
* kjfc_qry_bigns: noone has the rcvinst established yet, set it to the highest open instance = 1
* roram 32767, rcvinst 1
* kjfc_thread_qry: instance 1 flag 3 thread 1 sid 0
* kjfcrfg: queried bigns successfully

=====================================================================
==========================lmd0 trace begin===========================

*** 2011-06-27 23:13:16.700
[FFCLI][frozen]
[FUFCLI][normal]

Advice on OS patch upgrade with RAC

The customer is running 10.2.0.2 RAC on Solaris 10. The final objective is to upgrade to RAC 10.2.0.4. The requirements are as below:

1) Upgrade Solaris 10 patches from Generic_118833-03 to 142909-17
2) Upgrade RAC to 10.2.0.4

Question 1:
The customer is considering performing the OS patch upgrade via the strategy below:

Night1:
1. Shutdown node2.
2. Shutdown crs/db on node1.
3. Perform OS upgrade on node1.
4. Once node1 has successfully upgraded its OS, start crs/db on node1 and continue database operation until night2. Note that node2 is down from night1 until night2.

Night2:
1. Shutdown node1.
2. Start up node2, but do not start crs/db on node2.
3. Perform the OS upgrade on node2.
4. Once node2 has successfully upgraded its OS, start crs/db on node2.
5. Now start up node1 with its crs/db and rejoin the cluster.
Now crs/db on both nodes continue operation.

Are there any concerns with the OS patch upgrade strategy above?

A: No issue with the OS patch upgrade. Stay within a 24-hour upgrade window.
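
For reference, the per-node stop/start in the plan above maps onto commands along these lines (a sketch using 10.2 crsctl syntax; run as root on the node being patched):

crsctl stop crs     # stop the CRS stack and everything it manages on this node
                    # ... apply the OS patches and reboot ...
crsctl start crs    # restart the CRS stack; db/asm/listener come back via CRS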

Question 2: After the OS patch upgrade via the strategy above, is there any need to do anything on RAC (db/crs)? Perhaps a relink? Or do we just bring up CRS, and both crs/db will be up and running?

A: Doc ID 444595.1 – Is It Necessary To Relink Oracle Following OS Upgrade?
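
Should that note call for a relink on your platform, the conventional procedure is roughly as follows (a sketch; run as the Oracle software owner with all instances and listeners on the node stopped):

$ORACLE_HOME/bin/relink all    # relink all executables against the patched OS libraries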

Question 3: We notice that the SUNWsprox OS package is not available on either server, yet RAC is already up and running. Does this mean that the SUNWsprox package is not needed on Solaris 10?

A: The OS package SUNWsprox is not required on Solaris 10 (see Note 169706.1), so this error can be ignored on your OS version.
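
To confirm this on each node, the standard Solaris package query can be used (a sketch; pkginfo prints an ERROR line when the package is absent):

pkginfo -l SUNWsprox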

Question 4: The customer would like to disable GSD. Is this recommended?

A: Yes, you can disable GSD.

Question 5: During the OS patch upgrade, the Java version will be upgraded from “1.5.0_06” to “1.5.0_24”. Will there be any issue with RAC?

A: No issue with RAC.

Question 6: If the OS patch is upgraded on node1 but not yet on node2, will RAC have any issues?

A: Please refer to the doc below:

Doc ID 220970.1 – RAC: Frequently Asked Questions
