CRS-4258: Addition and deletion of voting files are not allowed because the voting files are on ASM

客户的一套11.2.0.1 RAC系统采用ASM diskgroup 存放ocr和votedisk,该REG diskgroup中的某个LUN disk由于硬件的原因损坏了,导致冗余的votedisk表决磁盘有一个处于OFFLINE状态,客户希望能删除该OFFLINE的votedisk并新增一个可用的。

在删除该votedisk文件时出现了CRS-4258的错误,错误如下:

crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. OFFLINE  5b3380d6367e4f94bf19e9db5f2f684e ()  []
 2. ONLINE   6802e6d139354fb3bf95725dd01a02fd (/dev/ocr2) [REG]
 3. ONLINE   a433d51ebd2d4facbfc8e95b017f5393 (/dev/asm-disk1) [REG]
 4. ONLINE   3784d344bffa4f6ebff21c4dd3c873bd (/dev/asm-disk2) [REG]
Located 4 voting disk(s).

crsctl delete css votedisk 5b3380d6367e4f94bf19e9db5f2f684e
CRS-4258: Addition and deletion of voting files are not allowed because the voting files are on ASM

居然无法移除ASM存储上的voting files,太搞笑了。

客户在MOS上找到了CRS-4258相关问题的Note:

CRS-4258: Addition and Deletion of Voting Files are not Allowed Because the Voting Files are on ASM in 11gR2 [ID 1060146.1]
Applies to:
Oracle Server - Enterprise Edition - Version: 11.2.0.1 to 11.2.0.1 - Release: 11.2 to 11.2
Information in this document applies to any platform.
Symptoms
CRS-4258: Addition and deletion of voting files are not allowed because the voting files are on ASM in 11gR2.
Changes
Stale voting files are seen after accidently dropping one of ASM disks belonging to the ASM diskgroup where voting files are stored.
And CRS-4258 occurs when trying to delete the stale voting files using crsctl delete css votedisk FUID.
[root@grid]# crsctl query css votedisk
## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. ONLINE 5b91aad0a2184f3dbfa8f970e8ae4d49 (/dev/oracleasm/disks/ASM10) [PLAY]
2. ONLINE 53b1b40b73164f9ebf3f498f6d460187 (/dev/oracleasm/disks/ASM9) [PLAY]
3. OFFLINE 82dfd04b96f14f6dbf36f5a62b118f61 () []
[root@grid]# crsctl delete css votedisk 82dfd04b96f14f6dbf36f5a62b118f61
CRS-4258: Addition and deletion of voting files are not allowed because the voting files are on ASM
Cause
1. Seeing stale voting files is due to bug 9024611.
2. "delete" command is not available , only "replace" command is available when voting files are stored on  ASM diskgroup.    
Please see Oracle Clusterware Administration and Deployment Guide11g Release 2 (11.2)
Solution
1. This issue is permanently fixed in 11.2.0.2.0.
2. Apply patch 9024611. Please contact Oracle support if this patch is not available on your platform.
3. If CSS has stale voting files even after applying patch 9024611, do the following workaround -
WORKAROUND:
Do something to trigger ASM to try to relocate the voting file.
e.g)  $ crsctl replace votedisk  +asm_disk_group   --- Put available ASM diskgroup
$ crsctl query css votedisk         --- Check if voting files are all online on the new ASM diskgroup
$ crsctl replace votedisk +PLAY    -- Put the original ASM diskgroup where voting files were 
4. If the workaround above cannot be followed for any reason then you can request the patch for unpublished bug 9409327 for your platform.
References
BUG:9294664 - NOT ABLE TO REMOVE THE VOTEDISK WHICH IS OFFILNE
Hdr: 9294664 11.2.0.1 PCW 11.2.0.1 ADMUTL PRODID-5 PORTID-226 9024611
Abstract: NOT ABLE TO REMOVE THE VOTEDISK WHICH IS OFFILNE
PROBLEM:
--------
crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
1. ONLINE   9f7f4f7f798d4f69bfe31653894421a2 (ORCL:GRID1) [GRID]
2. OFFLINE  a9b785a59c3c4f67bf15babc67ffb79a () []
3. OFFLINE  29988f37fa794f12bfea3f3672c99609 () []
4. ONLINE   a8b3a040195c4f54bfce8ef21bd4fa07 (ORCL:GRID3) [GRID]
5. ONLINE   a1e4fbd9df6f4f67bf8fc12fe9780721 (ORCL:GRID2) [GRID]
Located 5 voting disk(s).
[root@sdc-drrac01 grid]# crsctl delete css votedisk 
a9b785a59c3c4f67bf15babc67ffb79a
CRS-4258: Addition and deletion of voting files are not allowed because the 
voting files are on ASM
DIAGNOSTIC ANALYSIS:
--------------------
Ct is performing some voting disk failover scenarios in which he has removed 
the 2 votedisk which were on ASM buy drop disk using asmlib and after that 
recreating the disk again and start the cluster in exclusive mode and start 
the ASM and mount the diskgourp So that rebalancing has been done but after 
that
crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
1. ONLINE   9f7f4f7f798d4f69bfe31653894421a2 (ORCL:GRID1) [GRID]
2. OFFLINE  a9b785a59c3c4f67bf15babc67ffb79a () []
3. OFFLINE  29988f37fa794f12bfea3f3672c99609 () []
4. ONLINE   a8b3a040195c4f54bfce8ef21bd4fa07 (ORCL:GRID3) [GRID]
5. ONLINE   a1e4fbd9df6f4f67bf8fc12fe9780721 (ORCL:GRID2) [GRID]
Located 5 voting disk(s).
and not able to drop the vote disk which is offiline 
WORKAROUND:
-----------
n/a
RELATED BUGS:
-------------
as per bug 9024611 tried the workaround:
but while running 
crsctl css votedisk delete 
we got syntax error and found that there is no command with crsctl css ...

这个Note说明不能移除ASM存储内voting files的问题在11.2.0.2.0上已经解决了,也可以通过安装one-off patch 9024611来修复。

但是实际在11.2.0.2上测试可以发现仍旧无法删除ASM上的voting files:

root@rh2 ~]# crsctl query crs  releaseversion
Oracle High Availability Services release version on the local node is [11.2.0.2.0]
[root@rh2 ~]# crsctl query crs  activeversion
Oracle Clusterware active version on the cluster is [11.2.0.2.0]
[grid@rh2 ~]$ /s01/grid/OPatch/opatch lsinventory
Invoking OPatch 11.2.0.1.1
Oracle Interim Patch Installer version 11.2.0.1.1
Copyright (c) 2009, Oracle Corporation.  All rights reserved.
Oracle Home       : /s01/grid
Central Inventory : /s01/app/oraInventory
from           : /etc/oraInst.loc
OPatch version    : 11.2.0.1.1
OUI version       : 11.2.0.2.0
OUI location      : /s01/grid/oui
Log file location : /s01/grid/cfgtoollogs/opatch/opatch2011-08-04_18-50-34PM.log
Patch history file: /s01/grid/cfgtoollogs/opatch/opatch_history.txt
Lsinventory Output file location : /s01/grid/cfgtoollogs/opatch/lsinv/lsinventory2011-08-04_18-50-34PM.txt
--------------------------------------------------------------------------------
Installed Top-level Products (1): 
Oracle Grid Infrastructure                                           11.2.0.2.0
There are 1 products installed in this Oracle Home.
There are no Interim patches installed in this Oracle Home.
Rac system comprising of multiple nodes
Local node = rh2
Remote node = rh3
--------------------------------------------------------------------------------
OPatch succeeded.
[root@rh2 ~]# crsctl delete css votedisk a433d51ebd2d4facbfc8e95b017f5393
CRS-4258: Addition and deletion of voting files are not allowed because the voting files are on ASM

又是一个伪修复的Bug….!!

无法,寄希望与replace能解决问题,结果发现:

crsctl replace votedisk +DATA
Failed to create voting files on disk group DATA.
Change to configuration failed, but was successfully rolled back.
CRS-4000: Command Replace failed, or completed with errors.

方法四中指出的unpublished bug 9409327(Patch 9409327: OFFLINE VF ENTRY REMAINS AFTER PATCH FOR BUG 9024611),目前仅在IBM AIX on POWER Systems (64-bit)的11.2.0.1上有对应的补丁。

Brain Split?

真正出现脑裂的几率并不高,但确实让我们碰上了。2个月前为一套AIX6.1上的10.2.0.4双节点RAC系统做故障测试,主要内容是拔除RAC interconnect网线,测试CRS能否正确处理私有网络挂掉的情况。

 

正式测试时发现2台主机都没有重启,而两端的CSS都认为对方节点已经down了。这就造成2个节点都以为自身是幸存者,也就是我们说的脑裂(brain split),此时实例一般会因为LMON进程的缘故而hang住。

 

我们来比对当时2个节点上的日志进一步分析:

 

STEP 1 :20:41:19物理拔出网线后,节点间无法正常通信,进入misscount倒计时600s
节点1:
[    CSSD]2010-06-22 20:41:21.465 [3342] >TRACE:   clssnmPollingThread: node gis2 (2) missed(2) checkin(s)
[    CSSD]2010-06-22 20:41:22.465 [3342] >TRACE:   clssnmPollingThread: node gis2 (2) missed(3) checkin(s)
.........
[    CSSD]2010-06-22 20:51:17.956 [3342] >TRACE:   clssnmPollingThread: node gis2 (2) missed(598) checkin(s)
[    CSSD]2010-06-22 20:51:18.963 [3342] >TRACE:   clssnmPollingThread: node gis2 (2) missed(599) checkin(s)
[    CSSD]2010-06-22 20:51:19.963 [3342] >TRACE:   clssnmPollingThread: Eviction started for node gis2 (2), flags 0x0001, state 3, wt4c 0
/* 节点1上完成倒计时后开始驱逐节点2*/
节点2:
[    CSSD]2010-06-22 20:41:19.598 [3342] >TRACE:   clssnmPollingThread: node gis1 (1) missed(2) checkin(s)
[    CSSD]2010-06-22 20:41:20.599 [3342] >TRACE:   clssnmPollingThread: node gis1 (1) missed(3) checkin(s)
......................
[    CSSD]2010-06-22 20:51:15.871 [3342] >TRACE:   clssnmPollingThread: node gis1 (1) missed(598) checkin(s)
[    CSSD]2010-06-22 20:51:16.871 [3342] >TRACE:   clssnmPollingThread: node gis1 (1) missed(599) checkin(s)
[    CSSD]2010-06-22 20:51:17.878 [3342] >TRACE:   clssnmPollingThread: Eviction started for node gis1 (1), flags 0x0001, state 3, wt4c 0
/*同样的节点2也是在10分钟后的51分开始驱逐节点1*/
STEP 2: 2个节点读取磁盘心跳信息(clssnmReadDskHeartbeat),且认为对方节点已经down了
节点1:
[    CSSD]2010-06-22 20:51:20.964 [3856] >TRACE:   clssnmSetupAckWait: node(1) is ACTIVE
[    CSSD]2010-06-22 20:51:20.964 [3856] >TRACE:   clssnmSendVote: syncSeqNo(3)
[    CSSD]2010-06-22 20:51:20.964 [3856] >TRACE:   clssnmWaitForAcks: Ack message type(13), ackCount(1)
[    CSSD]2010-06-22 20:51:20.965 [2057] >TRACE:   clssnmSendVoteInfo: node(1) syncSeqNo(3)
[    CSSD]2010-06-22 20:51:21.714 [1543] >TRACE:   clssnmReadDskHeartbeat: node(2) is down. rcfg(3) wrtcnt(4185) LATS(1628594178) Disk lastSeqNo(4185)
[    CSSD]2010-06-22 20:51:21.965 [3856] >TRACE:   clssnmWaitForAcks: done, msg type(13)
[    CSSD]2010-06-22 20:51:21.965 [3856] >TRACE:   clssnmCheckDskInfo: Checking disk info...
[    CSSD]2010-06-22 20:51:22.718 [1543] >TRACE:   clssnmReadDskHeartbeat: node(2) is down. rcfg(3) wrtcnt(4186) LATS(1628595183) Disk lastSeqNo(4186)
[    CSSD]2010-06-22 20:51:22.964 [3342] >TRACE:   clssnmPollingThread: node gis1 (1) missed(2) checkin(s)
[    CSSD]2010-06-22 20:51:23.722 [1543] >TRACE:   clssnmReadDskHeartbeat: node(2) is down. rcfg(3) wrtcnt(4187) LATS(1628596186) Disk lastSeqNo(4187)
[ CSSD]2010-06-22 20:51:24.724 [1543] >TRACE: clssnmReadDskHeartbeat: node(2) is down.
rcfg(3) wrtcnt(4188) LATS(1628597189) Disk lastSeqNo(4188)
.............................
[    CSSD]2010-06-22 20:59:49.953 [1543] >TRACE:   clssnmReadDskHeartbeat: node(2) is down. rcfg(3) wrtcnt(4692) LATS(1629102418) Disk lastSeqNo(4692)
[    CSSD]2010-06-22 20:59:50.057 [3085] >TRACE:   clssgmPeerDeactivate: node 2 (gis2), death 0, state 0x80000001 connstate 0xf
[    CSSD]2010-06-22 20:59:50.104 [1029] >TRACE:   clssnm_skgxncheck: CSS daemon failed on node 2
[    CSSD]2010-06-22 20:59:50.382 [2314] >TRACE:   clssgmClientConnectMsg: Connect from con(112a6c5b0) proc(112a5a190) pid() proto(10:2:1:1)
[    CSSD]2010-06-22 20:59:51.231 [3856] >TRACE:   clssnmEvict: Start
[    CSSD]2010-06-22 20:59:51.231 [3856] >TRACE:   clssnmEvict: Evicting node 2, birth 1, death 3, killme 1
[    CSSD]2010-06-22 20:59:51.232 [3856] >TRACE:   clssnmWaitOnEvictions: Start
[    CSSD]2010-06-22 20:59:51.232 [3856] >TRACE:   clssnmWaitOnEvictions: Node(0) down, LATS(0),timeout(1629103696)
[    CSSD]2010-06-22 20:59:51.232 [3856] >TRACE:   clssnmWaitOnEvictions: Node(2) down, LATS(1629102418),timeout(1278)
[    CSSD]2010-06-22 20:59:51.232 [3856] >TRACE:   clssnmSetupAckWait: Ack message type (15)
[    CSSD]2010-06-22 20:59:51.232 [3856] >TRACE:   clssnmSetupAckWait: node(1) is ACTIVE
[    CSSD]2010-06-22 20:59:51.232 [3856] >TRACE:   clssnmSendUpdate: syncSeqNo(3)
[    CSSD]2010-06-22 20:59:51.232 [3856] >TRACE:   clssnmWaitForAcks: Ack message type(15), ackCount(1)
[    CSSD]2010-06-22 20:59:51.232 [2057] >TRACE:   clssnmUpdateNodeState: node 0, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)
[    CSSD]2010-06-2F1.232 [2057] >TRACE:   clssnmDeactivateNode: node 0 () left cluster
[    CSSD]2010-06-22 20:59:51.232 [2057] >TRACE:   clssnmUpdateNodeState: node 1, state (3/3) unique (1277207505/1277207505) prevConuni(0) birth (2/2) (old/new)
[    CSSD]2010-06-22 20:59:51.232 [2057] >TRACE:   clssnmUpdateNodeState: node 2, state (0/0) unique (1277206874/1277206874) prevConuni(1277206874) birth (1/0) (old/new)
[    CSSD]2010-06-22 20:59:51.232 [2057] >TRACE:   clssnmDeactivateNode: node 2 (gis2) left cluster
[    CSSD]2010-06-22 20:59:51.233 [2057] >USER:    clssnmHandleUpdate: SYNC(3) from node(1) completed
[    CSSD]2010-06-22 20:59:51.233 [2057] >USER:    clssnmHandleUpdate: NODE 1 (gis1) IS ACTIVE MEMBER OF CLUSTER
[    CSSD]2010-06-22 20:59:51.310 [4114] >TRACE:   clssgmReconfigThread:  started for reconfig (3)
[    CSSD]2010-06-22 20:59:51.310 [4114] >USER:    NMEVENT_RECONFIG [00][00][00][02]
[    CSSD]2010-06-22 20:59:51.310 [4114] >TRACE:   clssgmCleanupGrocks: cleaning up grock crs_version type 2
[    CSSD]2010-06-22 20:59:51.310 [4114] >TRACE:   clssgmCleanupGrocks: cleaning up grock ORA_CLSRD_1_gisdb type 2
[    CSSD]2010-06-22 20:59:51.310 [4114] >TRACE:   clssgmCleanupGrocks: cleaning up grock ORA_CLSRD_1_gisdb type 3
[    CSSD]2010-06-22 20:59:51.310 [4114] >TRACE:   clssgmCleanupGrocks: cleaning up grock ORA_CLSRD_2_gisdb type 3
[    CSSD]2010-06-22 20:59:51.310 [4114] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(0) grock(ORA_CLSRD_2_gisdb) birth(1/0)
[    CSSD]2010-06-22 20:59:51.310 [4114] >TRACE:   clssgmCleanupGrocks: cleaning up grock DBGISDB type 2
[    CSSD]2010-06-22 20:59:51.310 [4114] >TRACE:   clssgmCleanupGrocks: cleaning up grock DGGISDB type 2
[    CSSD]2010-06-22 20:59:51.310 [4114] >TRACE:   clssgmCleanupGrocks: cleaning up grock DAALL_DB type 2
[    CSSD]2010-06-22 20:59:51.310 [4114] >TRACE:   clssgmCleanupGrocks: cleaning up grock EVMDMAIN type 2
[    CSSD]2010-06-22 20:59:51.310 [4114] >TRACE:   clssgmCleanupGrocks: cleaning up grock CRSDMAIN type 2
[    CSSD]2010-06-22 20:59:51.310 [4114] >TRACE:   clssgmCleanupGrocks: cleaning up grock IGGISDBALL type 2
[    CSSD]2010-06-22 20:59:51.310 [4114] >TRACE:   clssgmCleanupGrocks: cleaning up grock ocr_crs type 2
[    CSSD]2010-06-22 20:59:51.310 [4114] >TRACE:   clssgmCleanupGrocks: cleaning up grock ORA_CLSRCSN_SRV_gisdb1 type 3
[    CSSD]2010-06-22 20:59:51.311 [4114] >TRACE:   clssgmEstablishConnections: 1 nodes in cluster incarn 3
[    CSSD]2010-06-22 20:59:51.311 [3085] >TRACE:   clssgmPeerListener: connects done (1/1)
[    CSSD]2010-06-22 20:59:51.311 [4114] >TRACE:   clssgmEstablishMasterNode: MASTER for 3 is node(1) birth(2)
[    CSSD]2010-06-22 20:59:51.311 [4114] >TRACE:   clssgmChangeMasterNode: requeued 1 RPCs
[    CSSD]2010-06-22 20:59:51.311 [4114] >TRACE:   clssgmMasterCMSync: Synchronizing group/lock status
[    CSSD]2010-06-22 20:59:51.312 [4114] >TRACE:   clssgmMasterSendDBDone: group/lock status synchronization complete
[    CSSD]CLSS-3000: reconfiguration successful, incarnation 3 with 1 nodes
[    CSSD]CLSS-3001: local node number 1, master node number 1
/* 节点1在hearbeat 8分钟左右后认为CSS daemon failed on node 2,正式认为Node 2离开了集群,并完成了reconfiguration*/
节点2:
[    CSSD]2010-06-22 20:51:18.892 [3856] >TRACE:   clssnmSendVote: syncSeqNo(3)
[    CSSD]2010-06-22 20:51:18.892 [3856] >TRACE:   clssnmWaitForAcks: Ack message type(13), ackCount(1)
[    CSSD]2010-06-22 20:51:18.892 [2057] >TRACE:   clssnmSendVoteInfo: node(2) syncSeqNo(3)
[    CSSD]2010-06-22 20:51:19.287 [1543] >TRACE:   clssnmReadDskHeartbeat: node(1) is down. rcfg(3) wrtcnt(3548) LATS(351788040) Disk lastSeqNo(3548)
[    CSSD]2010-06-22 20:51:19.892 [3856] >TRACE:   clssnmWaitForAcks: done, msg type(13)
[    CSSD]2010-06-22 20:51:19.892 [3856] >TRACE:   clssnmCheckDskInfo: Checking disk info...
[ CSSD]2010-06-22 20:51:20.288 [1543] >TRACE: clssnmReadDskHeartbeat: node(1) is down. rcfg(3) wrtcnt(3549) LATS(351789041) Disk lastSeqNo(3549)
[    CSSD]2010-06-22 20:51:21.308 [1543] >TRACE:   clssnmReadDskHeartbeat: node(1) is down. rcfg(3) wrtcnt(3550) LATS(351790062) Disk lastSeqNo(3550)
...........................
[    CSSD]2010-06-22 20:59:46.122 [1543] >TRACE:   clssnmReadDskHeartbeat: node(1) is down. rcfg(3) wrtcnt(4051) LATS(352294875) Disk lastSeqNo(4051)
[    CSSD]2010-06-22 20:59:46.341 [2314] >TRACE:   clssgmClientConnectMsg: Connect from con(112947c70) proc(112946f90) pid() proto(10:2:1:1)
[    CSSD]2010-06-22 20:59:46.355 [2314] >WARNING: clssgmShutDown: Received explicit shutdown request from client.
[    CSSD]2010-06-22 20:59:46.355 [2314] >TRACE:   clssgmClientShutdown: Aborting client (112a50210) proc (112a4e3d0)
[    CSSD]2010-06-22 20:59:46.355 [2314] >TRACE:   clssgmClientShutdown: Aborting client (112a50cd0) proc (112a4e3d0)
[    CSSD]2010-06-22 20:59:46.355 [2314] >TRACE:   clssgmClientShutdown: Aborting client (112a536f0) proc (112a4e3d0)
[    CSSD]2010-06-22 20:59:46.355 [2314] >TRACE:   clssgmClientShutdown: Aborting client (112a4eb90) proc (112a4eef0)
[    CSSD]2010-06-22 20:59:46.355 [2314] >TRACE:   clssgmClientShutdown: Aborting client (112a69250) proc (112a67e10)
[    CSSD]2010-06-22 20:59:46.355 [2314] >TRACE:   clssgmClientShutdown: Aborting client (112946050) proc (112945e50)
[    CSSD]2010-06-22 20:59:46.355 [2314] >TRACE:   clssgmClientShutdown: waited 0 seconds on 6 IO capable clients
[    CSSD]2010-06-22 20:59:46.494 [2314] >WARNING: clssgmClientShutdown: graceful shutdown completed.
[    CSSD]2010-06-22 20:59:47.130 [1543] >TRACE:   clssnmReadDskHeartbeat: node(1) is down. rcfg(3) wrtcnt(4052) LATS(352295883) Disk lastSeqNo(4052)
[    CSSD]2010-06-22 21:34:40.167 >USER:    Oracle Database 10g CSS Release 10.2.0.1.0 Production Copyright 1996, 2004 Oracle.  All rights reserved.
/* node2 也正确进行了heartbeat,并认为node(1) is down,最后被手动关闭;之后还原了网络故障,在21:34时CSS重新启动*/

 

如果你仔细看以上日志的话,你大概会找出”Oracle Database 10g CSS Release 10.2.0.1.0″的记录;这套RAC不是10.2.0.4的吗,为什么CSS还是10.2.0.1版本的呢,事后调查才发觉是安装该套系统的施工方国内某X码工程师在给CRS打补丁的时候忘记运行最后的root102.sh脚本了,该脚本将更新OCR/Voting disk及实际的CRS binary文件等,如果补丁安装结束后没有运行该脚本则升级不会有任何效果,而只会更新oraInventory中的信息。

 

刚开始时哪位X码的工程师抵死不肯承认忘记了运行脚本,而实际上在AIX 6.1上打10.2.0.4 CRS的patch是需要为oracle用户赋特有的权限的,这一点不同于AIX 5.3上,即:

 

chuser capabilities=CAP_BYPASS_RAC_VMM,CAP_PROPAGATE,CAP_NUMA_ATTACH oracle
/*进一步检查*/
lsuser -f oracle | grep capabilities
capabilities=CAP_BYPASS_RAC_VMM,CAP_PROPAGATE,CAP_NUMA_ATTACH

 

如果未对oracle用户赋以上权限则运行root102.sh脚本时将报错。另一个判断的标志是pre10204/pre10205目录,如果运行过root102.sh脚本的话$ORA_CRS_HOME/install目录下会多出一个形如pre$VERSION的目录,没有的话一般就是没有运行过脚本,当然也有可能是时候删除了(不建议删除)。

 

了解到以上信息后对此次脑裂的追根溯源就要简单的多了,版本10201上的CRS可以说Bug众多的,从10201-10204期间CRS加入了不少新的参数和机制,在MOS上搜索关键词”brain split CSS”可以找到以下案例:

Hdr: 8293652 10.2.0.3 PCW 10.2.0.3 OSD PRODID-5 PORTID-46
Abstract: CSS CANNOT HANDLE SPLIT-BRAIN AND DB INSTANCE RECEIVES ORA-29740
PROBLEM:
——–
config:
2-node RAC: Node1 (pdb01) and Node2 (pdb02)
There’s no time difference between two nodes.

pdb02 got ORA-29740 and terminated at “Tue Feb 17 12:13:06 2009”
ORA-29740 occured with reason 1.
After ORA-29740 happened, the instance won’t be able to start
until rebooting OS.
After rebooting OS, everything was fine and instances were up.

DIAGNOSTIC ANALYSIS:
——————–
clssnmReadDskHeartbeat: node(2) is down. rcfg(8) wrtcnt(2494425)
LATS(1205488794) Disk lastSeqNo(2494425)
nodes
clssgmMasterSendDBDone: group/lock status synchronization complete
nodes

WORKAROUND:
———–
none

RELATED BUGS:
————-

REPRODUCIBILITY:
—————-
once at ct’s env.

TEST CASE:
———-

STACK TRACE:
————

SUPPORTING INFORMATION:
———————–

24 HOUR CONTACT INFORMATION FOR P1 BUGS:
—————————————-

DIAL-IN INFORMATION:
——————–

IMPACT DATE:
————

Does ct apply any CRS(bundle) patch ?

When problem happen, cssd can’t connect each other via interconnect,
but both cssd can do heartbeat to voting disk.
However, both cssd consider that “I’m survivor”.
Looking into node 1 cssd.
* 12:02:30.856 – Initiated sync
[ CSSD]2009-02-17 12:02:30.856 [1262557536] >TRACE: clssnmDoSyncUpdate:
Initiating sync 7
[ CSSD]2009-02-17 12:02:30.856 [1262557536] >TRACE: clssnmDoSyncUpdate:
diskTimeout set to (57000)ms

* Checking voting disk, and find node2 is still voting and living.
[ CSSD]2009-02-17 12:02:30.874 [1262557536] >TRACE: clssnmCheckDskInfo:
Checking disk info…
[ CSSD]2009-02-17 12:02:30.874 [1262557536] >TRACE: clssnmCheckDskInfo:
node(2) timeout(20) state_network(5) state_disk(3) misstime(0)
[ CSSD]2009-02-17 12:02:31.878 [1262557536] >TRACE: clssnmCheckDskInfo:
node(2) disk HB found, network state 5, disk state(3) misstime(1010)

* Compared cluster size and confirmed it can survive.
[ CSSD]2009-02-17 12:02:34.885 [1262557536] >TRACE: clssnmCheckDskInfo:
node 2, iz-pdb02, state 5 with leader 2 has smaller cluster size 1;
my cluster size 1 with leader 1

* Then finished
[ CSSD]2009-02-17 12:02:34.886 [1262557536] >TRACE: clssnmDoSyncUpdate:
Sync Complete!
*** 03/08/09 11:23 pm ***
Looking into node 2 cssd log.

* 12:02:20.647 – initiated sync protocol
[ CSSD]2009-02-17 12:02:20.647 [1262557536] >TRACE: clssnmDoSyncUpdate:
Initiating sync 7
[ CSSD]2009-02-17 12:02:20.647 [1262557536] >TRACE: clssnmDoSyncUpdate:
diskTimeout set to (57000)ms

* Checking disk and find node1 does not do disk heartbeart for 59690 ms.
it would have waited for misscount and considered node 1 is dead
[ CSSD]2009-02-17 12:02:22.285 [1262557536] >TRACE: clssnmCheckDskInfo:
Checking disk info…
[ CSSD]2009-02-17 12:02:22.285 [1262557536] >TRACE: clssnmCheckDskInfo:
node(1) timeout(59690) state_network(5) state_disk(3) misstime(61000)

* node2 is the only active member of cluster, finished.
[ CSSD]2009-02-17 12:02:22.723 [1262557536] >TRACE: clssnmDoSyncUpdate:
Sync Complete!
*** 03/08/09 11:45 pm ***
So, strange point is, node 2 cssd says node 1 cssd didn’t do
disk heartbeat for 60 seconds.

Looking into node1 cssd log just before initiating sync. We see 87sec gap.
———————————–
[ CSSD]2009-02-17 12:00:19.354 [1199618400] >TRACE: clssgmClientConnectMsg:
Connect from con(0x784d80) proc(0x7749b0) pid(14746) proto(10:2:1:1)
[ CSSD]2009-02-17 12:00:33.338 [1199618400] >TRACE: clssgmClientConnectMsg:
Connect from con(0x7d8620) proc(0x75cfb0) pid() proto(10:2:1:1)
[ CSSD]2009-02-17 12:01:03.688 [1199618400] >TRACE: clssgmClientConnectMsg:
Connect from con(0x76a390) proc(0x75cfb0) pid(13634) proto(10:2:1:1)
[ CSSD]2009-02-17 12:02:30.855 [1168148832] >WARNING: clssnmDiskPMT:
sltscvtimewait timeout (69200)
[ CSSD]2009-02-17 12:02:30.855 [1189128544] >WARNING: clssnmeventhndlr:
Receive failure with node 2 (iz-pdb02), state 3, con(0x72b980),
probe((nil)), rc=10
[ CSSD]2009-02-17 12:02:30.855 [1189128544] >TRACE: clssnmDiscHelper:
iz-pdb02, node(2) connection failed, con (0x72b980), probe((nil))
[ CSSD]2009-02-17 12:02:30.856 [1262557536] >TRACE: clssnmDoSyncUpdate:
Initiating sync 7

As an interesting point, clssgmClientConnectMsg does not show message,
but nm polling thread/disk ping thread does not warn timeout.
(Usually it should write message first at 50% of timeout = 30 sec)

And “sltscvtimewait timeout (69200)” message means, DiskPingMonitor thread
does not run for a 69200 ms whereas it just wants to sleep 1 second.

These suggest, cssd does not scheduled about 70 seconds on node 1.
I don’t see any log from DiskPingThread, but I assume it is suspended
at some point also, and back to work after 70 seconds.
Please check OS message file to see any interesting error is recorded.
To prevent this issue, keep watching OS performance to see if any
extreme high load does not happen.

Recommended solution it to go to 10.2.0.4 and use oprocd so that
we can expect oprocd kill node1 in such case.

 

 

上述案例同样是在”cssd can’t connect each other via interconnect”的状况下出现了”I’m survivor”的脑裂问题,MOS的建议是升级到10204后oprocd进程可以阻止这样的惨剧发生。

 

该问题最后通过升级到10.2.0.5解决了,这个case告诉我们在中国的it大环境内,有时候我们不得不亲力亲为地关心每一个细节,就拿这次来说我一开始也没发现升级没完成的情况,后来还是同事提醒了我;因为这是一个非常低级错误,如果施工方的X码工程师仔仔细细地按照他们下发的文档按部就班亦或者能留意一下升级时的图形窗口中的说明的话,这个问题都不会发生!而实际上不仅仅是此套系统,连带着其他2套系统也是这位X码工程师安装升级的,这几套系统在之后的故障测试时都发现了同样的问题。

事实告诉我们,细节决定成败!

沪ICP备14014813号

沪公网安备 31010802001379号