CTSSD服务无法正常启动:Failure 4 in trying to open SV key PROCL-4/PROCL-5 clsctss_r_av2

如果CTSSD服务无法正常启动,则在11.2中将无法正常启动HAS=>CRS导致RAC无法正常使用。

具体的诊断需要参考$GRID_HOME/log/$HOSTNAME/ctssd/octssd.log 的日志,例如:

 

 

2010-12-23 16:45:31.287: [ GIPC][1257901344] gipcCheckInitialization: possible incompatible non-threaded init from [prom.c : 687], original from [clsss.c : 5325]
[ default][1257901344]Failure 4 in trying to open SV key SYSTEM.version.localhost

[ default][1257901344]procr_open_key error 4 errorbuf : PROCL-4: The local registry key to be operated on does not exist.
or PROCL-5
2010-12-23 16:45:31.289: [ CTSS][1257901344]clsctss_r_av2: Error [3] retrieving Active Version from OLR. Returns [19].
2010-12-23 16:45:31.292: [ CTSS][1257901344](:ctss_init16:): Error [19] retrieving active version. Returns [19].
2010-12-23 16:45:31.292: [ CTSS][1257901344]ctss_main: CTSS init failed [19]
2010-12-23 16:45:31.292: [ CTSS][1257901344]ctss_main: CTSS daemon aborting [19].

 

 

触发该错误,主要有2种可能:

  1. ocr所在ASM DISK的用户权限存在问题
  2. ocr所在ASM DISK在节点之间设备名不一致
  3. HAS的状态不正常
  4.  bug 10420872

 

对于问题2  建议通过kfed read /dev/oracleasm/disks/OCRVOTE01 | grep kfbh.type  来确认节点之间的ASM DISK一致。

对于问题3 可以尝试 重启 HAS 服务:

su - root

crsctl stop has -f

crsctl start has

【教学视频】Maclean教你一步一步使用Vbox在Linux 5上安装Oracle 11gR2 RAC

 

https://zcdn.askmaclean.com/dbDao_install%2011gr2%20RAC%20on%20Linux%205.7%20in%20vbox.pdf

[视频教学]Maclean教你用Vbox在Enterprise Linux 5上安装Oracle 10gR2 RAC

【视频教学】Maclean手把手教你用Vbox在Enterprise Linux 5上安装Oracle 10gR2 RAC数据库

WebEx视频播放软件 , 以下wrf格式文件 需要使用webex播放,

下载WebEX: 

 

 
 WebEx.zip (2.55 MB, 下载次数: 6539)

Agenda:

相关脚本 和命令附件:    脚本命令.txt (1.71 KB, 下载次数: 1444)

第一讲视频-在Windows上安装Vbox

 第一讲视频-在Windows上安装Vbox.zip (4.41 MB, 下载次数: 4930)

第二讲视频-在Vbox上安装Oracle Enterprise Linux 5.7虚拟主机

 第二讲视频-在Vbox上安装Oracle Enterprise Linux 5.7虚拟主机.zip (12.05 MB, 下载次数: 11741)

第三讲视频-配置并克隆Oracle Enterprise 5.7虚拟主机

 第三讲视频-配置并克隆Oracle Enterprise 5.7虚拟主机.zip (14.82 MB, 下载次数: 30549)

第四讲视频-配置Vbox共享存储和UDEV存储设备绑定

 第四讲视频-配置Vbox共享存储和UDEV存储设备绑定.zip (13.16 MB, 下载次数: 29091)

第五讲视频-安装10.2.0.1 x64 Linux Oracle Clusterware CRS

 第五讲视频-安装10.2.0.1 x64 Linux Oracle Clusterware CRS.zip (10.48 MB, 下载次数: 8603)

第六讲视频-升级Clusterware CRS 10.2.0.1 到 Patchset 10.2.0.5

 第六讲视频-升级Clusterware CRS 10.2.0.1 到 Patchset 10.2.0.5 .zip (3.2 MB, 下载次数: 5716)

第七讲视频-安装10.2.0.1 DB数据库并升级到10.2.0.5 Patchset 并DBCA创建数据库

 第七讲视频-安装10.2.0.1DB数据库升级到10.2.0.5创建数据库.zip (10.37 MB, 下载次数: 8511)

More About ORACLE_AFFINITY ORACLE_PRIORITY Windows Process Scheduling

Using ORACLE_AFFINITY

Under certain conditions, it might be desirable to reserve a CPU for a particular thread or set of threads. Good candidates are usually threads that either are single threads and whose performance degrades with higher load or who affect performance adversely on all CPU’s on the system.

 

When running RAC, the CPU utilization of LMS processes increases at a faster pace than the number of lock and CR and current blocks requests. The throughput and scalability of LMS are crucial to the performance of RAC, since current and CR server, as well as lock request latencies are directly correlated with how fast and efficient LMS can process the requests. Tests on Solaris in Oracle8i have shown, that binding LMD0 to a separate CPU, will give nearly optimal performance results. On NT, the LMS threads can also be bound to a dedicated processor.

 

Example:

 

HKEY_LOCAL_MACHINE->SOFTWARE->ORACLE->

 

  • ORACLE_AFFINITY=”LMS0:8;LMS1:8;DEF:7;USER:7”

On a 4-CPU machine, this would indicate that the LMS threads run on CPU 3, and all other Oracle threads on CPU’s 0, 1 and 2.

 

  • ORACLE_AFFINITY=”LMS0:1;DEF:14”

LMS0 runs on CPU 0, all other threads on CPU’s 1, 2 and 3.

 

Using ORACLE_PRIORITY

 

Similarly to ORACLE_AFFINITY, the priority of an Oracle thread can be modified. There exist six different thread priorities:

 

  • lowest
  • below_normal
  • normal
  • above_normal
  • highest
  • time_critical

 

Since threads can have their priorities dynamically adjusted by the system within a process in the idle, normal or high priority class, using ORACLE_PRIORITY may not have a big impact. Also, when CPU resources are already limited, having a single thread running with increased priority will not increase throughput any further. This was observed when running the LMD thread with priority set to HIGHEST in an Oracle8 8i environment.

 

More information can be found in the “Oracle Database 10g Architecture on Windows” white paper on OTN, in the Windows specific 10g documentation.

Global Cache CR Requested But Current Block Received

这篇文章和之前的《MINSCN与Cache Fusion Read Consistent》 是姊妹篇,他们源于同一个问题帖子

我们来重现提问者所看到的这样一个场景:

 

SQL> select * from V$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE    11.2.0.3.0      Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production

SQL> select count(*) from gv$instance;

  COUNT(*)
----------
         2

SQL> select * from global_name;

GLOBAL_NAME
--------------------------------------------------------------------------------
www.askmaclean.com

 

 

 

在11gR2 2节点RAC环境中将一个数据块的status修改为XG,假设这个Xcurrent block当前在INSTANCE 2被hold住,这时我们在INSTANCE 1反复查询这个数据块,并观察结果:

 

SQL> select  * from test;

        ID
----------
         1
         2

SQL> select dbms_rowid.rowid_block_number(rowid),dbms_rowid.rowid_relative_fno(rowid) from test;

DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID) DBMS_ROWID.ROWID_RELATIVE_FNO(ROWID)
------------------------------------ ------------------------------------
                               89233                                    1
                               89233                                    1

SQL> alter system flush buffer_cache;

System altered.

INSTANCE 1 Session A:

SQL>  update test set id=id+1 where id=1;

1 row updated.

INSTANCE 1 Session B:

SQL> select state,cr_scn_bas from x$bh where file#=1 and dbablk=89233 and state!=0;

     STATE CR_SCN_BAS
---------- ----------
         1          0
         3    1755287

SQL> oradebug setmypid;
Statement processed.
SQL> oradebug dump gc_elements 255;
Statement processed.
SQL> oradebug tracefile_name;
/s01/orabase/diag/rdbms/vprod/VPROD1/trace/VPROD1_ora_19111.trc

GLOBAL CACHE ELEMENT DUMP (address: 0xa4ff3080):
  id1: 0x15c91 id2: 0x1 pkey: OBJ#76896 block: (1/89233)
  lock: X rls: 0x0 acq: 0x0 latch: 3
  flags: 0x20 fair: 0 recovery: 0 fpin: 'kdswh11: kdst_fetch'
  bscn: 0x0.146e20 bctx: (nil) write: 0 scan: 0x0
  lcp: (nil) lnk: [NULL] lch: [0xa9f6a6f8,0xa9f6a6f8]
  seq: 32 hist: 58 145:0 118 66 144:0 192 352 197 48 121 113 424 180 58
  LIST OF BUFFERS LINKED TO THIS GLOBAL CACHE ELEMENT:
    flg: 0x02000001 lflg: 0x1 state: XCURRENT tsn: 0 tsh: 2
      addr: 0xa9f6a5c8 obj: 76896 cls: DATA bscn: 0x0.1ac898
BH (0xa9f6a5c8) file#: 1 rdba: 0x00415c91 (1/89233) class: 1 ba: 0xa9e56000
  set: 5 pool: 3 bsz: 8192 bsi: 0 sflg: 3 pwc: 0,15
  dbwrid: 0 obj: 76896 objn: 76896 tsn: 0 afn: 1 hint: f
  hash: [0x91f4e970,0xbae9d5b8] lru: [0x91f58848,0xa9f6a828]
  lru-flags: debug_dump
  obj-flags: object_ckpt_list
  ckptq: [0x9df6d1d8,0xa9f6a740] fileq: [0xa2ece670,0xbdf4ed68] objq: [0xb4964e00,0xb4964e00] objaq: [0xb4964de0,0xb4964de0]
  st: XCURRENT md: NULL fpin: 'kdswh11: kdst_fetch' tch: 2 le: 0xa4ff3080
  flags: buffer_dirty redo_since_read
  LRBA: [0x19.5671.0] LSCN: [0x0.1ac898] HSCN: [0x0.1ac898] HSUB: [1]
  buffer tsn: 0 rdba: 0x00415c91 (1/89233)
  scn: 0x0000.001ac898 seq: 0x01 flg: 0x00 tail: 0xc8980601
  frmt: 0x02 chkval: 0x0000 type: 0x06=trans data

 

 

可以看到此时block: (1/89233)的GLOBAL CACHE ELEMENT DUMP中LOCK状态仍是X 而非XG , 这是因为这个Current Block仅在一个Instance中被modify修改过,没有在全局范围内被更新过。

 

 

紧接着在Instance 2 修改该块:

 

 

Instance 2 Session C:

SQL> update test set id=id+1 where id=2;

1 row updated.

Instance 2 Session D:

SQL>  select state,cr_scn_bas from x$bh where file#=1 and dbablk=89233 and state!=0;

     STATE CR_SCN_BAS
---------- ----------
         1          0
         3    1756658

SQL> oradebug setmypid;
Statement processed.
SQL> oradebug dump gc_elements 255;
Statement processed.
SQL> oradebug tracefile_name;
/s01/orabase/diag/rdbms/vprod/VPROD2/trace/VPROD2_ora_13038.trc

GLOBAL CACHE ELEMENT DUMP (address: 0x89fb25a0):
  id1: 0x15c91 id2: 0x1 pkey: OBJ#76896 block: (1/89233)
  lock: XG rls: 0x0 acq: 0x0 latch: 3
  flags: 0x20 fair: 0 recovery: 0 fpin: 'kduwh01: kdusru'
  bscn: 0x0.1acdf3 bctx: (nil) write: 0 scan: 0x0
  lcp: (nil) lnk: [NULL] lch: [0x96f4cf80,0x96f4cf80]
  seq: 61 hist: 324 21 143:0 19 16 352 329 144:6 14 7 352 197
  LIST OF BUFFERS LINKED TO THIS GLOBAL CACHE ELEMENT:
    flg: 0x0a000001 state: XCURRENT tsn: 0 tsh: 1
      addr: 0x96f4ce50 obj: 76896 cls: DATA bscn: 0x0.1acdf6
BH (0x96f4ce50) file#: 1 rdba: 0x00415c91 (1/89233) class: 1 ba: 0x96bd4000
  set: 5 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 0,15
  dbwrid: 0 obj: 76896 objn: 76896 tsn: 0 afn: 1 hint: f
  hash: [0x96ee1fe8,0xbae9d5b8] lru: [0x96f4d0b0,0x96f4cdc0]
  obj-flags: object_ckpt_list
  ckptq: [0xbdf519b8,0x96f4d5a8] fileq: [0xbdf519d8,0xbdf519d8] objq: [0xb4a47b90,0xb4a47b90] objaq: [0x96f4d0e8,0xb4a47b70]
  st: XCURRENT md: NULL fpin: 'kduwh01: kdusru' tch: 1 le: 0x89fb25a0
  flags: buffer_dirty redo_since_read remote_transfered
  LRBA: [0x11.9e18.0] LSCN: [0x0.1acdf6] HSCN: [0x0.1acdf6] HSUB: [1]
  buffer tsn: 0 rdba: 0x00415c91 (1/89233)
  scn: 0x0000.001acdf6 seq: 0x01 flg: 0x00 tail: 0xcdf60601
  frmt: 0x02 chkval: 0x0000 type: 0x06=trans data

 GCS CLIENT 0x89fb2618,6 resp[(nil),0x15c91.1] pkey 76896.0
   grant 2 cvt 0 mdrole 0x42 st 0x100 lst 0x20 GRANTQ rl G0
   master 1 owner 2 sid 0 remote[(nil),0] hist 0x94121c601163423c
   history 0x3c.0x4.0xd.0xb.0x1.0xc.0x7.0x9.0x14.0x1.
   cflag 0x0 sender 1 flags 0x0 replay# 0 abast (nil).x0.1 dbmap (nil)
   disk: 0x0000.00000000 write request: 0x0000.00000000
   pi scn: 0x0000.00000000 sq[(nil),(nil)]
   msgseq 0x1 updseq 0x0 reqids[6,0,0] infop (nil) lockseq x2b8
   pkey 76896.0
   hv 93 [stat 0x0, 1->1, wm 32768, RMno 0, reminc 18, dom 0]
   kjga st 0x4, step 0.0.0, cinc 20, rmno 6, flags 0x0
   lb 0, hb 0, myb 15250, drmb 15250, apifrz 0

 

 

在Instance 2中被修改过后block: (1/89233)的 GLOBAL CACHE ELEMENT Lock Convert成lock: XG

除了通过GC_ELEMENTS DUMP来分析XCUR Cache Fusion外,也可以直接查询X$ VIEW,主要是 X$LE X$KJBR X$KJBL, 这三个X$ VIEW的更多信息可以很方便地从我的博客中找到:

 

 

INSTANCE 2 Session D:

SELECT *
  FROM x$le
 WHERE le_addr IN (SELECT le_addr
                     FROM x$bh
                    WHERE obj IN (SELECT data_object_id
                                    FROM dba_objects
                                   WHERE owner = 'SYS'
                                     AND object_name = 'TEST')
                      AND class = 1
                      AND state != 3);

ADDR                   INDX    INST_ID LE_ADDR              LE_ID1     LE_ID2
---------------- ---------- ---------- ---------------- ---------- ----------
    LE_RLS     LE_ACQ   LE_FLAGS    LE_MODE   LE_WRITE   LE_LOCAL LE_RECOVERY
---------- ---------- ---------- ---------- ---------- ---------- -----------
   LE_BLKS    LE_TIME LE_KJBL
---------- ---------- ----------------
00007F94CA14CF60       7003          2 0000000089FB25A0      89233          1
         0          0         32          2          0          1           0
         1          0 0000000089FB2618

 

 

PCM Resource NAME由[ID1][ID2],[BL]等组成, ID1和ID2 通过blockno和 fileno计算获得, 这里我们直接参考以上GC_elements dump中的 id1: 0x15c91 id2: 0x1 pkey: OBJ#76896 block: (1/89233)信息 ,则  kjblname 和 kjbrname 应以”[0x15c91][0x1],[BL]” 开头:

 

 

INSTANCE 2 Session D:

SQL> set linesize 80 pagesize 1400
SQL> SELECT *
  2    FROM x$kjbl l
  3   WHERE l.kjblname LIKE '%[0x15c91][0x1],[BL]%';

ADDR                   INDX    INST_ID KJBLLOCKP        KJBLGRANT KJBLREQUE
---------------- ---------- ---------- ---------------- --------- ---------
  KJBLROLE KJBLRESP         KJBLNAME
---------- ---------------- ------------------------------
KJBLNAME2                       KJBLQUEUE
------------------------------ ----------
KJBLLOCKST                                                       KJBLWRITING
---------------------------------------------------------------- -----------
KJBLREQWRITE  KJBLOWNER KJBLMASTER KJBLBLOCKED KJBLBLOCKER    KJBLSID KJBLRDOMID
------------ ---------- ---------- ----------- ----------- ---------- ----------
  KJBLPKEY
----------
00007F94CA22A288        451          2 0000000089FB2618 KJUSEREX  KJUSERNL
         0 00               [0x15c91][0x1],[BL][ext 0x0,0x
89233,1,BL                              0
GRANTED                                                                    0
           0          1          0           0           0          0          0
     76896

SQL> SELECT r.* FROM x$kjbr r WHERE r.kjbrname LIKE '%[0x15c91][0x1],[BL]%';

no rows selected

Instance 1 session B:

SQL>  SELECT r.* FROM x$kjbr r WHERE r.kjbrname LIKE '%[0x15c91][0x1],[BL]%';

ADDR                   INDX    INST_ID KJBRRESP         KJBRGRANT KJBRNCVL
---------------- ---------- ---------- ---------------- --------- ---------
  KJBRROLE KJBRNAME                       KJBRMASTER KJBRGRANTQ
---------- ------------------------------ ---------- ----------------
KJBRCVTQ         KJBRWRITER          KJBRSID KJBRRDOMID   KJBRPKEY
---------------- ---------------- ---------- ---------- ----------
00007F801ACA68F8       1355          1 00000000B5A62AE0 KJUSEREX  KJUSERNL
         0 [0x15c91][0x1],[BL][ext 0x0,0x          0 00000000B48BB330
00               00                        0          0      76896

 

 

接着我们将在Instance 1上查询block: (1/89233),这应当会引发Instance 2 build cr block 并传输给Instance 1, 为了进一步了解其细节 我们分别对 Instance 1的 Foreground Process 和 Instance 2的LMS进程做详细的RAC  TRACE:

 

Instance 2:

[oracle@vrh2 ~]$ ps -ef|grep ora_lms|grep -v grep
oracle   23364     1  0 Apr29 ?        00:33:15 ora_lms0_VPROD2

SQL> oradebug setospid 23364
Oracle pid: 13, Unix process pid: 23364, image: oracle@vrh2.oracle.com (LMS0)

SQL> oradebug event 10046 trace name context forever,level 8:10708 trace name context forever,level 103: trace[rac.*] disk high;
Statement processed.

SQL> oradebug tracefile_name
/s01/orabase/diag/rdbms/vprod/VPROD2/trace/VPROD2_lms0_23364.trc

Instance 1 session B :

SQL> select state,cr_scn_bas from x$bh where file#=1 and dbablk=89233 and state!=0;

     STATE CR_SCN_BAS
---------- ----------
         3    1756658
         3    1756661
         3    1755287

Instance 1 session A :

SQL> alter session set events '10046 trace name context forever,level 8:10708 trace name context forever,level 103: trace[rac.*] disk high';

Session altered.

SQL> select * from test;

        ID
----------
         2
         2

SQL> select state,cr_scn_bas from x$bh where file#=1 and dbablk=89233 and state!=0;

     STATE CR_SCN_BAS
---------- ----------
         3    1761520

 

从x$BH的信息来看,如我们所预期的Instance 1收到了build好的CR block,进一步分析 TRACE 文件:

 

Instance 1 foreground Process:

PARSING IN CURSOR #140336527348792 len=18 dep=0 uid=0 oct=3 lid=0 tim=1335939136125254 hv=1689401402 ad='b1a4c828' sqlid='c99yw1xkb4f1u'
select * from test
END OF STMT
PARSE #140336527348792:c=2999,e=2860,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=1357081020,tim=1335939136125253
EXEC #140336527348792:c=0,e=40,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=1357081020,tim=1335939136125373
WAIT #140336527348792: nam='SQL*Net message to client' ela= 6 driver id=1650815232 #bytes=1 p3=0 obj#=0 tim=1335939136125420

*** 2012-05-02 02:12:16.125
kclscrs: req=0 block=1/89233
2012-05-02 02:12:16.125574 : kjbcro[0x15c91.1 76896.0][4]

*** 2012-05-02 02:12:16.125
kclscrs: req=0 typ=nowait-abort

*** 2012-05-02 02:12:16.125
kclscrs: bid=1:3:1:0:f:1e:0:0:10:0:0:0:1:2:4:1:20:0:0:0:c3:49:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:4:3:2:1:2:0:1c:0:4d:26:a3:52:0:0:0:0:c7:c:ca:62:c3:49:0:0:0:0:1:0:14:8e:47:76:1:2:dc:5:a9:fe:17:75:0:0:0:0:0:0:0:0:0:0:0:0:99:ed:0:0:0:0:0:0:10:0:0:0
2012-05-02 02:12:16.125718 : kjbcro[0x15c91.1 76896.0][4]
2012-05-02 02:12:16.125751 : GSIPC:GMBQ: buff 0xba0ee018, queue 0xbb79a7b8, pool 0x60013fa0, freeq 0, nxt 0xbb79a7b8, prv 0xbb79a7b8
2012-05-02 02:12:16.125780 : kjbsentscn[0x0.1ae0f0][to 2]
2012-05-02 02:12:16.125806 : GSIPC:SENDM: send msg 0xba0ee088 dest x20001 seq 177740 type 36 tkts xff0000 mlen x1680198
2012-05-02 02:12:16.125918 : kjbmscr(0x15c91.1)reqid=0x8(req 0xa4ff30f8)(rinst 1)hldr 2(infosz 200)(lseq x2b8)
2012-05-02 02:12:16.126959 : GSIPC:KSXPCB: msg 0xba0ee088 status 30, type 36, dest 2, rcvr 1

*** 2012-05-02 02:12:16.127
kclwcrs: wait=0 tm=1233

*** 2012-05-02 02:12:16.127
kclwcrs: got 1 blocks from ksxprcv
WAIT #140336527348792: nam='gc cr block 2-way' ela= 1233 p1=1 p2=89233 p3=1 obj#=76896 tim=1335939136127199
2012-05-02 02:12:16.127272 : kjbcrcomplete[0x15c91.1 76896.0][0]
2012-05-02 02:12:16.127309 : kjbrcvdscn[0x0.1ae0f0][from 2][idx 2012-05-02 02:12:16.127329 : kjbrcvdscn[no bscn <= rscn 0x0.1ae0f0][from 2]

 

 

前台进程 kjbcro[0x15c91.1 76896.0][4] kjbsentscn[0x0.1ae0f0][to 2] 向Instance 2请求SCN=1ae0f0=1761520的 block: (1/89233),并进入’gc cr block 2-way’ 等待,之后成功收到请求的CR block。

 

Instance 2 LMS TRACE

 

 

2号实例上LMS进程的情况是这样的 收到gcs remote message GSIPC 关于要求SCN=[0x0.1ae0f0] block=1/89233的请求,进入BAST kjbmpbast(0x15c91.1),因为 block=1/89233已被写入到磁盘 触发fairness算法(在11.2.0.3中默认_fairness_threshold=2),对current block做KCL: F156: fairness downconvert,从Xcurrent DownConvert为 Scurrent:

 

 

之后Instance 2 LMS 将cr block加入到 kjbmslset(0x15c91.1)) 传送队列SEND QUEUE GSIPC:SNDQ: enq msg 0x7f16e1c56490。

 

接着我们再次在Instance 1上运行对 block: (1/89233)的查询 看会发生什么:

 

 

前台进程与上一次类似申请的是SCN=[0x0.1ae28c]=1761932 Version的CR block, 但这一次receive的居然是Xcurrent Block且其SCN=1ae28d=1761933,Instance 1收到了Xcurrent Block后自己build出了查询所需要的SCN=1761932的CR BLOCK, 因为实际收到的是Current block,所以等待事件变成了’gc current block 2-way’。

这里可以看到前台进程并没有request current block,而仍是使用kjbcro;追根述源是Instance 2的LMS进程给它传送了Current Block:

 

 

 

为什么Instance 2上的LMS要偷懒,不构造build cr block,而直接传输给Instance 1自己所有的Current Block呢?通过观察Instance 2上的v$cr_block_server视图可以发现LIGHT_WORKS字段在发生current block transfer后并没有增长,说明并没有触发 CR server的 Light Work Rule(Light Work Rule是8i Cr Server中就存在的优化算法,当Remote LMS创建 build CR涉及的工作过多时,resource holder的LMS会选择传输现有block,而不完成CR build If creating the consistent read version block involves too much work (such as reading blocks from disk), then the holder sends the block to the requestor, and the requestor completes the CR fabrication. The holder maintains a fairness counter of CR requests. After the fairness threshold is reached, the holder downgrades it to lock mode.)。

 

到底何种条件下 CR Request 才会收到Current Block呢?

答案是:针对不同种类class的block,CR server有不同的做法。 对于undo block或者 undo header block的CR quest, LMS总是传输Current Block, 这是为了给 远程节点 可能正在进行的 block cleanout、 CR  Version构造提供便利。

 

对于普通的数据块 data blocks, 默认情况下总是 CR quest  & CR received的(排除之前所说的Light Work Rule,LMS”偷懒”), 除非Current Block已经DownConvert降级为S lock,那么LMS进程会宣称直接ship一个current version的block。

 

为了证明这一点 , 我们再次测试 ,这次我们将控制DownConvert的隐藏参数”_fairness_threshold“提高到200,这将造成Xcurrent Block无法降级为Scurrent, 也将导致LMS更少地传输Current Version的Data Block:

 

 

 

 

 

实际测试发现_fairness_threshold似乎存在某个上限,虽然设置为200 但是在不多的几次CR serve后就Downgrade了lock, 这造成对data block的 CR Request最终还是Receive了 Current Block。

MINSCN与Cache Fusion Read Consistent

问题来源于 网友在Ask Maclean Home 上关于  RAC 中 Past Image PI块的提问, 网友提出了一个演示,我们在11.2.0.3 2 Node RAC的环境中重现这个实验:

 

 

X$BH 视图的 STATE字段代表Buffer的状态, 以下是状态列表:

 

 

 

这个演示中我们需要用到的是 : state =1 Xcurrent 、 state=2 Scurrent 、 state=3 CR

 

接着在 Instance 2  更新 同一个数据块内的另一条记录 ,这回引发 gc current block 2 way  并将Current Block 传输到 Instance 2, 同时 Instance 1 的原”Current Block” Convert 成 Past Image:

 

 

 

STATE =1 的Xcurrent block已传输到 Instance 2 , 再来看 Instance 1 此时的 GC状态:

 

 

 

问题出现在这里, 当网友再次在Instance 1上的session A中执行对TEST表的SELECT查询后 ,发现原来的 3个 State=3的CR 块 数量减少到了1个:

 

 

 

网友在原帖中是通过v$BH 视图观察CR块的数量,发现在执行SELECT查询后 CR块数量反而减少了,故而产生了疑问。 我们在以上演示中直接 观察X$BH视图可以发现 , 原本的三个CR块的SCN Version分别为: 1227641、1227638、1227595, 在SELECT查询完成后被 2个不同SCN version的CR块 1227716和 1227713 所替换, Oracle为什么要这样做呢?

 

所幸我们在实际执行SELECT查询前设置了event 10708和 rac.*的诊断TRACE,我们先来看看TRACE内容:

 

 

 

通过TRACE不难发现 因为之前没有收集过TEST表的统计信息, 所以这里出发了Dynamic Sampling的动态采样,这本身会引发对TEST表的 CR读请求,实际产生了一次’gc cr block 2-way’ 等待:

2012-04-29 07:28:33.632654 : kjbsentscn[0x0.12bbc1][to 2]

12bbc1= 1227713  与上述X$BH中的一个CR块对应,kjbsentscn[0x0.12bbc1][to 2] 可以理解为 向 Instance 2 发送了SCN=12bbc1=1227713   DBA=0x15c91.1 76896.0 的  CR Request(obj#=76896)

之后kjbrcvdscn函数确认了 [no bscn <= rscn 0x0.12bbc1][from 2] ,即没有 比已receive的 SCN Version =12bbc1 更好的Best Version

 

CR Server Arch

CR Server in Oracle9i Release 2 There are three instances involved: the requestor instance, the lock master instance, and the current block owner instance. The lock is granted if one of the following is true: Resource held mode is NULL. Resource held mode is S and there is no holder of an S lock in the master node. Otherwise, the master forwards the CR request to the holder node. If the lock is global, then you choose a node to forward the CR request to as follows: If there is a past image (PI) at the lock master instance, and the PI SCN is greater than snap-scn, then the master node is this node. Otherwise, you choose a PI with the smallest SCN and PI-SCN greater than snap-SCN. The owner node of this PI is the node you forward the CR request to. The PI with smallest SCN is the most interesting one, because you have less UNDO to be applied. If there is no PI at all, you choose the node that the current buffer belongs to.

 

 

动态采样完成后才真正执行了用户发出的SELECT语句:

 

 

 

类似的, “SELECT * FROM TEST”也引发了一次’gc cr block 2-way’等待:
2012-04-29 07:28:33.637374 : kjbrcvdscn[0x0.12bbc4][from 2][idx 2012-04-29 07:28:33.637389 : kjbrcvdscn[no bscn

最后Foreground Process从 Remote LMS哪里got的是 SCN=1227716 Version的CR, 同样与 之前我们从X$BH 视图查到的scn对应。

 

这样就可以解释为什么Instance 1上出现了2个SCN更大的CR块, 但仍无法解释原来存在于Instance 1 Buffer Cache中的 三个SCN Version 较小的CR 块消失的原因。

 

我们来看下面的演示:

 

 

 

 

设置以上 “_enable_minscn_cr”=false 和 “_db_block_max_cr_dba”=20 并重启RAC所有实例, 重现上述演示:

 

 

 

 

如上述演示 在设置了 “_enable_minscn_cr”(enable/disable minscn optimization for CR)=false 和 “_db_block_max_cr_dba”=20 (Maximum Allowed Number of CR buffers per dba) 2个 参数后 最多的时候 Instance 1 中缓存了同一个数据块的 多达 19个版本的CR块。

 

“_enable_minscn_cr”是11g以后出现的新隐藏参数,它控制Oracle是否计算CR块的最小SCN,当Foreground Process Receive接收到同一个数据块的更新(SCN更大)的SCN Version CR Block时可能会清除CBC上的 SCN较小的、旧的CR块 , 这样做的目的是减少Buffer Cache中同一个数据块 不同版本SCN Version的CR块的数量,注意不管是 语句级别或者事务级别 其所要求的Snap_Scn 快照 SCN总是 语句或事务开始时的Current SCN, 保留一些旧的CR块虽然可能对一些持续时间长的查询或者游标有益, 但是实例Buffer Cache中 同一个数据块的多版本 CR块的总数量是有限的, 这个总数受到 “_db_block_max_cr_dba” 隐藏参数的控制, 如我们上述演示中设置为20 ,则最多可以在Buffer Cache中缓存多大19个版本的CR块; 注意该”_db_block_max_cr_dba” 参数的默认值为6 , 即一个实例Buffer cache中同一个数据块的CR 版本同时不多于6个。

 

引入”_enable_minscn_cr” 优化CR的最小MINSCN 是有其理由的, 即便那些版本较旧的CR块被新陈代谢了, 但只要需要 Foreground Process还是可以通过CR Request , 要求 Holder Instance LMS 去build一个 BEST CR 来用, 这不消我们去担心。

Script:收集RAC诊断信息

以下脚本可以用于汇总RAC诊断信息, 并非每一个输出或者每一个日志都要阅读,在问题诊断时可以有针对性地查看, 如果是远程协助的会,可以让用户收集完打包 ,这样省去多次交互的时间:

 

 

      - Cluster 启动状态
   - hostname
   - uptime 
   - vmstat iostat nmon OS stats 
   - /etc/sysctl.conf /etc/system  vmo -a no -a 
   - /etc/inittab
   - /etc/init.d/init.crs
   - /etc/init.d/init.crsd
   - /etc/init.d/init.cssd
   - /etc/init.d/init.evmd
   - olsnodes
   - crsctl check css
   - crsctl check crs status
   - crsctl check crsd status
   - crsctl check cssd status
   - crsctl check evmd status
   - crsctl query css votedisk
   - crsctl get css priority
   - crsctl get css diagwait
   - crsctl get css disktimeout
   - crsctl get css misscount
   - crsctl get css reboottime
   - crsctl query crs softwareversion
   - crsctl query crs activeversion
   - opatch lsinventory -detail 
   - /var/opt/oracle/ocr.loc 或 /etc/oracle/ocr.loc
   - cluvfy comp sys -n <hostname> -p crs -verbose
   - cluvfy comp sys -n <hostname> -p databaase -verbose
   - cluvfy stage -post crsinst -n <hostname> -verbose
   - cluvfy comp software -verbose -n all
   - Clusterware 版本
   - $ORA_CRS_HOME 已安装产品的信息
   - $ORA_CRS_HOME 已应用补丁的信息
   - crs_stat -u
   - crs_stat -p
   - $ORA_CRS_HOME/log/<hostname>/alert<hostname>.log
   - $ORA_CRS_HOME/crs/log/*.log
   - $ORA_CRS_HOME/log/<hostname>/crsd/*
   - $ORA_CRS_HOME/css/log/*.log
   - $ORA_CRS_HOME/css/init/*.log
   - $ORA_CRS_HOME/log/<hostname>/cssd/*
   - $ORA_CRS_HOME/racg/dump/*
   - $ORACLE_HOME/racg/dump/*
   - $ORA_CRS_HOME/log/<hostname>/racg/*
   - $ORACLE_HOME/log/<hostname>/racg/*
   - $ORA_CRS_HOME/evm/log/*.log
   - $ORA_CRS_HOME/evm/init/*.log
   - $ORA_CRS_HOME/log/<hostname>/evmd/*
   - $ORA_CRS_HOME/log/<hostname>/client/*
   - $ORACLE_HOME/log/<hostname>/client/*
   - $ORA_CRS_HOME/log/<hostname>/ohasd/ohasd.log
   - $ORA_CRS_HOME/log/<hostname>/gpnpd/gpnpd.log
   - /etc/oracle/oprocd/*oprocd.log 或 /var/opt/oracle/oprocd/*oprocd.log
   - /etc/oracle/lastgasp/*.lgl 或  /var/opt/oracle/lastgasp/*.lgl
   - debug resource
  # cd $ORA_CRS_HOME/bin
  # crsctl debug log res <resname:level>
  # crsctl debug log res <vip resourfce name>:5
   - ocrdump
   - ocrcheck
   - ocrconfig -showbackup
   - /var/opt/oracle/srvConfig.loc 或 /etc/srvConfig.loc
   - srvctl config nodeapps -n <hostname> -a -l -g -s
   - srvctl config
   - srvctl config database -d <dbname> -a
   - srvctl config scan_listener
   - srvctl status scan_listener
   - srvctl config scan
   - srvctl status scan
   - srvctl config srvpool
   - srvctl status srvpool -a
   - srvctl config listener
   - srvctl status listener
   - srvctl config listener -n <hostname>
   - oifcfg iflist
   - oifcfg getif
   - interconnect 信息 x$skgxpia or oradebug ipc
   - IPC 信息
   - messages 或 syslog OS日志 
   IBM: /bin/errpt -a > messages.out
      Linux: /var/log/messages
      Solaris: /var/adm/messages
   - netstat -rn
   - ifconfig -a
   - cat /etc/hosts
   - netstat -s

Script:Collect vip resource Diagnostic Information

以下脚本可以用于收集 Oracle RAC中vip 资源或其他CRS resource的诊断信息:

 

action plan:
./runcluvfy.sh stage -post crsinst -n all  -verbose
./runcluvfy.sh stage -pre crsinst -n all  -verbose
or
cluvfy stage -post crsinst -n all -verbose
cluvfy stage -pre  crsinst -n all -verbose
1. Please upload the following logs of all two nodes:
$CRS_HOME/log/<nodename>/*.log
$CRS_HOME/log/<nodename>/crsd/*.log
$CRS_HOME/log/<nodename>/cssd/*.log
$CRS_HOME/log/<nodename>/racg/*.log
$CRS_HOME/log/<nodename>/client/*.log
$CRS_HOME/log/<nodename>/evmd/*.log
/etc/oracle/oprocd/*.log.* or /var/opt/oracle/oprocd/*.log.* (If have)
$crs_stat –t
$crsctl check crs
$crsctl check boot
2. 
Please consult your sysadmin and make sure that the gateway is pingable all the time 
1- test the gw on every node
consult your sysadmin to create a crontab unix shell script to ping the
gateway of your public interface every 2 seconds for example and the result is to be
spooled in /tmp/test_gw_<nodename>.log
ping your gateway  and upload the ping log 
2- increase the tracing level of the vip resource  
  as root user
  # cd $ORA_CRS_HOME/bin
  # crsctl debug log res <resname:level>
  # crsctl debug log res <vip resourfce name>:5
3- restart the clusterware
3- execute this test on both nodes at the same time
   $ script /tmp/testvip_<nodename>.log
   $ cd $ORA_CRS_HOME/bin
   $ hostname
   $ date
   $ cat /etc/hosts
   $ ifconfig -a
   $ oifcfg getif  
   $ netstat -rn
   $ oifcfg iflist
   $ srvctl config nodeapps -n <nodename> -a -g -s -l               (repeate it for all nodes)
   $ crs_stat –t
   $ exit
4- reset the tracing level of the vip resource  
  as root user
  # cd $ORA_CRS_HOME/bin
  # crsctl debug log res <resname:level>
  # crsctl debug log res <vip resourfce name>:1
Up on the next occurence, please upload the following information from all nodes
  a-  /tmp/test_gw_<nodename>.log
  b- /tmp/testvip_<nodename>.log
  c- the crsd log
  d. The resource racg
     $ORA_CRS_HOME/log/<nodename>/racg/vip*
  e. the racgvip script from
     $ORA_CRS_HOME/bin/racgvip
  f- RDA from all the nodes
    Note 314422.1 Remote Diagnostic Agent (RDA) 4.0 – Overview
   g- the o/s message file
      IBM: /bin/errpt -a > messages.out
      Linux: /var/log/messages
      Solaris: /var/adm/messages
3. CRS Diagnostics
note 330358.1 -- CRS Diagnostic Collection Guide, please use (all .gz files especially crsData_$HOST.tar.gz
need to be uploaded)
diagcollection.pl --collect 
Please make sure to include *ALL* requested files (missing any will delay or prevendting from
identifying root cause) from *ALL* nodes in a single zip and upload.
Note 330358.1 - CRS 10gR2/ 11gR1/ 11gR2 Diagnostic Collection Guide       
Note 298895.1 - Modifying the default gateway address used by the Oracle 10g VIP
Note 399213.1 - VIP Going Offline Intermittantly - Slow Response from Default Gateway
Note 401783.1 - Changes in Oracle Clusterware after applying 10.2.0.3 Patchset

RAC CRS Resource资源的生命周期

CRS Resource的生命周期可以通过一系列的crs_* 资源管理命令来一步步诠释:

 

  • crs_profile : Create/Edit the resource’s attributes
  • crs_register : Add the resource
  • crs_start : start the resource
  • crs_stat : status of the resource
  • crs_relocate : move the resource
  • crs_stop : stop the resource
  • crs_unregister : Remove the resource

 

如何确认11.2 RAC Grid Infrastructure的时区

本文档描述如何在11.2 Grid Infrastructure 安装完成后修改其GI的时区time zone 设置。

 

一旦OS默认时区被修改,注意确保以下2点:

1. 对于11.2.0.1 ,确保root、grid、oracle用户的shell环境变量TZ设置正确!
2. 对于11.2.0.2及以上版本,确认 $GRID_HOME/crs/install/s_crsconfig_<nodename>_env.txt 中的参数设置为正确的时区

 

例如:

 

ech $TZ
TZ=US/Pacific
grep TZ s_crsconfig__env.txt 
TZ=US/Pacific

 

 

若timezone设置时区不正确或存在无关字符可能导致RAC Grid Infrastructure无法正常启动。

确保以上2点保证GI能正常启动,这种因时区不正确导致的启动异常问题常发生在OS、GI已经安装完毕后而时区最后被修改的情景中,若发现OS时间与ohasd.log、ocssd.log等日志中的最新记录的时间不匹配,则往往是由该时区问题引起的。

 

在11.2 CRS之前可以通过init.cssd diag来确认时区设置。

 

 

以下为各OS上Timezone默认配置信息

 

 

Linux
To change: /usr/sbin/timeconfig
To display current setting:
cat /etc/sysconfig/clock
ZONE="America/Los_Angeles"
UTC=true
ARC=false
To find out all valid setting: ls -l /usr/share/zoneinfo
Anything that appears in this directory is valid to use, for example, CST6CDT and America/Chicago.
Note: the "Zone" field in /etc/sysconfig/clock could be different than what's in /usr/share/zoneinfo in OL6.3/RHEL6.3, the one from /usr/share/zoneinfo should be used in $GRID_HOME/crs/install/s_crsconfig_<nodename>_env.txt
hp-ux
To display current setting:
cat /etc/default/tz
PST8PDT
To change: set_parms timezone
To find out all valid setting: ls -l /usr/lib/tztab
Solaris
To display current setting:
grep TZ /etc/TIMEZONE
TZ=US/Pacific
To change, modify TIMEZONE, also run "rtc -z US/pacific; rtc -c"
To find out all valid settings: ls -l /usr/share/lib/zoneinfo
AIX
To display current setting:
grep TZ /etc/environment
TZ=GMT

沪ICP备14014813号

沪公网安备 31010802001379号