Know more about Oracle User Space Quota

Tablespace Quota 表空间限额是Oracle数据库中限制User使用空间的重要手段,我们来深入浅出地了解一下Space Quota在内部的实现:

 

 

SQL> select  * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bi
PL/SQL Release 10.2.0.5.0 - Production
CORE    10.2.0.5.0      Production
TNS for Linux: Version 10.2.0.5.0 - Production
NLSRTL Version 10.2.0.5.0 - Production

SQL> select  * from global_name;

GLOBAL_NAME
--------------------------------------------------------------------------------
www.askmaclean.com

SQL> create user maclean_space identified by oracle;

User created.

SQL> oradebug setmypid;       
Statement processed.
SQL> oradebug event 10046 trace name context forever,level 12;
Statement processed.
SQL> 
SQL> alter user maclean_space quota 10M on users;

User altered.

SQL> oradebug tracefile_name
/s01/admin/G10R25/udump/g10r25_ora_25686.trc

[oracle@vrh8 ~]$ egrep -i "insert|update|delete" /s01/admin/G10R25/udump/g10r25_ora_25686.trc
insert into tsq$ (ts#,user#,blocks,maxblocks,grantor#,priv1,priv2,priv3) values (:1,:2,:3,:4,:5,:6,:7,:8)

update user$ set name=:2,password=:3,datats#=:4,tempts#=:5,type#=:6,defrole=:7,resource$=:8,ptime=DECODE(to_char(:9, 'YYYY-MM-DD'), 
'0000-00-00', to_date(NULL), :9),exptime=DECODE(to_char(:10, 'YYYY-MM-DD'), '0000-00-00', to_date(NULL), :10),
ltime=DECODE(to_char(:11, 'YYYY-MM-DD'), '0000-00-00', to_date(NULL), :11),astatus=:12, lcount=:13, defschclass=:14, spare1=:15 where user#=:1
STAT #3 id=1 cnt=0 pid=0 pos=1 obj=0 op='UPDATE  USER$ (cr=4 pr=0 pw=0 time=296 us)'

 

 

以上可以看到登录用户quota限额信息到数据字典的dictionary recursive SQL 数据字典递归SQL是”insert into tsq$” 向字典TSQ$中插入一条记录,  TSQ$是重要的数据字典基表,在创建数据字典时被create, 在11g以前可以从$ORACLE_HOME/rdbms/admin/sql.bsq中找到该表的定义:

 

 

create table tsq$                                  /* tablespace quota table */
( ts#           number not null,                        /* tablespace number */
  user#         number not null,                              /* user number */
  grantor#      number not null,                               /* grantor id */
  blocks        number not null,         /* number of blocks charged to user */
  maxblocks     number,     /* user's maximum number of blocks, NULL if none */
  priv1         number not null,            /* reserved for future privilege */
  priv2         number not null,            /* reserved for future privilege */
  priv3         number not null)            /* reserved for future privilege */
cluster c_user# (user#)
/

 

USER_TS_QUOTAS和DBA_TS_QUOTAS这些字典视图直接依赖于tsq$和seg$这2个字典基表, 它们的定义在11g之前可以在$ORACLE_HOME/rdbms/admin/catspace.sql中找到:

 

 

remark  FAMILY "TS_QUOTAS"
remark  Tablespace quotas for users.
remark  This family has no ALL member.
remark
Rem
Rem  Performance improvement:
Rem    Get segments number of blocks from seg$.blocks. This column was
Rem    introduced in 10g. For databases that were upgraded from older
Rem    releases, dbms_space_admin.segment_number_blocks() is called to
Rem    gather the information.
Rem    View USER_TS is now useless. It is still left here just to avoid
Rem    any potential upgrade issue.
Rem
create or replace view USER_TS(uname, tsname, tsn)
as select user$.name, ts$.name, ts$.ts# from user$, ts$
/
create or replace view TBS_SPACE_USAGE(tsn, user#, blocks, maxblocks)
as select tsq$.ts#, tsq$.user#,
          NVL(sum(decode(bitand(seg$.spare1, 131072), 131072, seg$.blocks,
                         (decode(bitand(seg$.spare1, 1), 1,
                            dbms_space_admin.segment_number_blocks(tsq$.ts#,
                                   seg$.file#, seg$.block#, seg$.type#,
                                   seg$.cachehint, seg$.spare1,
                                   seg$.hwmincr, seg$.blocks), seg$.blocks)))),
              0),
          tsq$.maxblocks
from seg$, tsq$
where tsq$.ts# = seg$.ts# (+)
and   tsq$.user# = seg$.user# (+)
group by tsq$.ts#, tsq$.user#, tsq$.maxblocks
/
create or replace view USER_TS_QUOTAS
    (TABLESPACE_NAME, BYTES, MAX_BYTES, BLOCKS, MAX_BLOCKS, DROPPED)
as
select ts.name, spc.blocks * ts.blocksize,
       decode(spc.maxblocks, -1, -1, spc.maxblocks * ts.blocksize),
       spc.blocks, spc.maxblocks, decode(ts.online$, 3, 'YES', 'NO')
from sys.ts$ ts, sys.tbs_space_usage spc
where spc.tsn = ts.ts#
  and spc.user# = userenv('SCHEMAID')
/

 

 

 

需要注意的是UNLIMITED TABLESPACE这个无限表空间限额的系统权限并不依赖于TSQ$的份额基表,所以也不会产生USER_TS_QUOTAS/DBA_TS_QUOTAS中的记录:

 

SQL> create user maclean_space1 identified by oracle;

User created.

SQL> oradebug setmypid;       
Statement processed.
SQL> oradebug event 10046 trace name context forever,level 12;
Statement processed.
SQL> 
SQL> grant UNLIMITED TABLESPACE to maclean_space1;

Grant succeeded.

SQL> oradebug tracefile_name
/s01/admin/G10R25/udump/g10r25_ora_25820.trc

[oracle@vrh8 ~]$ egrep -i "insert|update|delete" /s01/admin/G10R25/udump/g10r25_ora_25820.trc
insert into sysauth$ (grantee#,privilege#,option$,sequence#) values (:1,:2,decode(:3,0,null,:3),system_grant.nextval)

 

 

此外Oracle并不会通过dictionary recursive SQL字典递归SQL了解表空间份额的信息,而是直接将这部分信息缓存在row cache字典缓存的dc_tablespace_quotas中:

 

 

SQL> ALTER SESSION SET EVENTS 'immediate trace name row_cache level 10';

Session altered.

SQL> oradebug setmypid;
Statement processed.
SQL> 
SQL> oradebug tracefile_name
/s01/admin/G10R25/udump/g10r25_ora_25854.trc

[oracle@vrh8 ~]$ grep dc_tablespace_quotas /s01/admin/G10R25/udump/g10r25_ora_25854.trc
  row cache parent object: address=0x876732e0 cid=5(dc_tablespace_quotas)
  row cache parent object: address=0x8a4bb778 cid=5(dc_tablespace_quotas)
  row cache parent object: address=0x8aa09478 cid=5(dc_tablespace_quotas)
  row cache parent object: address=0x8a91c958 cid=5(dc_tablespace_quotas)
  row cache parent object: address=0x8749f648 cid=5(dc_tablespace_quotas)
  row cache parent object: address=0x8a8aafd8 cid=5(dc_tablespace_quotas)
  row cache parent object: address=0x8749f418 cid=5(dc_tablespace_quotas)
  row cache parent object: address=0x8d358fc0 cid=5(dc_tablespace_quotas)
  row cache parent object: address=0x8aa69e88 cid=5(dc_tablespace_quotas)

 

 

当我们添加一个定额用户(quota user)时, 相应的要多产生一个dc_tablespace_quotas row cache parent object:

 

 

SQL> create user maclean_space2 identified by oracle;

User created.

SQL> alter user maclean_space2 quota 100M on users;

User altered.

SQL> ALTER SESSION SET EVENTS 'immediate trace name row_cache level 10';

Session altered.

[oracle@vrh8 ~]$ grep dc_tablespace_quotas /s01/admin/G10R25/udump/g10r25_ora_25891.trc
 row cache parent object: address=0x872d3d08 cid=5(dc_tablespace_quotas)
  row cache parent object: address=0x876732e0 cid=5(dc_tablespace_quotas)
  row cache parent object: address=0x8a4bb778 cid=5(dc_tablespace_quotas)
  row cache parent object: address=0x8aa09478 cid=5(dc_tablespace_quotas)
  row cache parent object: address=0x8a91c958 cid=5(dc_tablespace_quotas)
  row cache parent object: address=0x8749f648 cid=5(dc_tablespace_quotas)
  row cache parent object: address=0x8a8aafd8 cid=5(dc_tablespace_quotas)
  row cache parent object: address=0x8749f418 cid=5(dc_tablespace_quotas)
  row cache parent object: address=0x8d358fc0 cid=5(dc_tablespace_quotas)
  row cache parent object: address=0x8aa69e88 cid=5(dc_tablespace_quotas)

BUCKET 23:
  row cache parent object: address=0x872d3d08 cid=5(dc_tablespace_quotas)
  hash=36109d16 typ=9 transaction=(nil) flags=00000002
  own=0x872d3dd8[0x872d3dd8,0x872d3dd8] wat=0x872d3de8[0x872d3de8,0x872d3de8] mode=N
  status=VALID/-/-/-/-/-/-/-/-
  data=
  00000004 0000004a 00000000 00003200 00000000 00000000
  BUCKET 23 total object count=1

SQL> select * from user_ts_quotas;

TABLESPACE_NAME                     BYTES  MAX_BYTES     BLOCKS MAX_BLOCKS DRO
------------------------------ ---------- ---------- ---------- ---------- ---
USERS                                   0  104857600          0      12800 NO

 

 

可以看到以上address=0x872d3d08对象的dc_tablespace_quotas记录是create user/alter user quota后产生的,该row cache的data stack中的0x3200对应为12800 个block。

 

修改该用户的quota信息,会引发stack data的变化:

 

 

  SQL> alter user maclean_space2 quota 101M on users;

User altered.

  BUCKET 23:
  row cache parent object: address=0x872d3d08 cid=5(dc_tablespace_quotas)
  hash=36109d16 typ=9 transaction=(nil) flags=00000002
  own=0x872d3dd8[0x872d3dd8,0x872d3dd8] wat=0x872d3de8[0x872d3de8,0x872d3de8] mode=N
  status=VALID/-/-/-/-/-/-/-/-
  data=
  00000004 0000004a 00000000 00003280 00000000 00000000
  BUCKET 23 total object count=1

SQL> select * from user_ts_quotas;

TABLESPACE_NAME                     BYTES  MAX_BYTES     BLOCKS MAX_BLOCKS DRO
------------------------------ ---------- ---------- ---------- ---------- ---
USERS                                   0  105906176          0      12928 NO  

 12928 block = 0x3280

 

 

Oracle内部使用KTS模块的函数实现Tablespace Quota的管理, 以下为ORA-01950错误的errostack stack call:

 

 

SQL> oradebug setmypid;
Statement processed.
SQL> oradebug event 1950 trace name errorstack level 4:10046 trace name context forever,level 12;
Statement processed.
SQL> create table maclean_space.space_test tablespace system as select * from dba_tables;
create table maclean_space.space_test tablespace system as select * from dba_tables
*
ERROR at line 1:
ORA-01950: no privileges on tablespace 'SYSTEM'
SQL> oradebug tracefile_name
/s01/admin/G10R25/udump/g10r25_ora_25758.trc
Current SQL statement for this session:
create table maclean_space.space_test tablespace system as select * from dba_tables
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
ssd_unwind_bp: unhandled instruction at 0x76a02d instr=f
ksedst()+31          call     ksedst1()            000000000 ? 000000001 ?
7FFF9451AF10 ? 7FFF9451AF70 ?
7FFF9451AEB0 ? 000000000 ?
ksedmp()+610         call     ksedst()             000000000 ? 000000001 ?
7FFF9451AF10 ? 7FFF9451AF70 ?
7FFF9451AEB0 ? 000000000 ?
ksddoa()+1766        call     ksedmp()             000000004 ? 000000001 ?
7FFF9451AF10 ? 7FFF9451AF70 ?
7FFF9451AEB0 ? 000000000 ?
ksdpcg()+646         call     ksddoa()             7FAACAD703F8 ? 7FAACAD56980 ?
7FFF9451AF10 ? 7FFF9451AF70 ?
7FFF9451AEB0 ? 000000000 ?
ksdpec()+247         call     ksdpcg()             00000079E ? 7FAACAD703F8 ?
7FFF9451AF10 ? 7FFF9451AF70 ?
7FFF9451AEB0 ? 000000000 ?
ksfpec()+171         call     ksdpec()             00000079E ? 7FAACAD703F8 ?
7FFF9451AF10 ? 7FFF9451AF70 ?
7FFF9451AEB0 ? 000000000 ?
kgesev()+686         call     ksfpec()             00000079E ? 7FFF9451AF10 ?
7FFF9451AF10 ? 7FFF9451AF70 ?
7FFF9451AEB0 ? 000000000 ?
ksesec1()+189        call     kgesev()             006AF5CE0 ? 00805C028 ?
00000079E ? 000000001 ?
7FFF9451C100 ? 000000000 ?
kttgsq()+425         call     ksesec1()            006AF5CE0 ? 000000001 ?
000000006 ? 7FFF9451C1F2 ?
0000000CA ? 08FF3BF28 ?
ktfbtgex1()+420      call     kttgsq()             000000000 ? 000000048 ?
000000006 ? 7FFF9451C1F2 ?
0000000CA ? 08FF3BF28 ?
ktsscrseg()+1072     call     ktfbtgex1()          7FFF9451CA88 ? 000000048 ?
7FFF9451D370 ? 000000008 ?
7FAA00000000 ? 7FFF00000001 ?
ktssctr_segment1()+  call     ktsscrseg()          7FFF9451D368 ? 7FFF9451D04C ?
939                                                7FFF9451CC40 ? 7FFF9451CFF8 ?
300000000 ? 7FFF00000001 ?
ktssctr_segment()+2  call     ktssctr_segment1()   7FFF9451DDC8 ? 7FFF9451D368 ?
26                                                 7FFF9451DC20 ? 7FFF9451CFF8 ?
7FAA00000000 ? 7FFF00000001 ?
ktrsexec()+437       call     ktssctr_segment()    7FFF9451DC08 ? 7FFF9451D368 ?
7FFF9451DC20 ? 006AF5E60 ?
7FAA00000000 ? 7FFF00000001 ?
ktsscf_segment()+67  call     ktrsexec()           7FFF9451DC08 ? 7FFF9451D368 ?
7                                                  7FFF9451DC20 ? 006AF5E60 ?
7FAA00000000 ? 7FFF00000001 ?
qerlt_lsa()+1695     call     ktsscf_segment()     7FFF9451DDC8 ? 000000005 ?
7FFF9451DC20 ? 006AF5E60 ?
7FAA00000000 ? 7FFF00000001 ?
klclil1r()+483       call     qerlt_lsa()          000002000 ? 08731BA70 ?
7FAACACCC2F8 ? 000000001 ?
7FFF9451DE94 ? 7FFF00000001 ?
qerltRop()+928       call     klclil1r()           7FAACACCC008 ? 08731BA70 ?
7FAACACCC2F8 ? 000000001 ?
7FFF9451DE94 ? 7FFF00000001 ?
qerstRowP()+388      call     qerltRop()           08731BA70 ? 000007FFF ?

为什么Flashback Log产生量要远少于Redo Log?

有同学提问关于闪回数据库日志为什么远少于redo log?   RVWR( Recovery Writer)每3s检查flashback generate buffer中的block before image的具体算法是如何的? 是否每一次block change都需要RVWR写出block before image 到flashback log?

 

为了实现闪回数据库,Oracle需要记录数据块的前景象before image到一种新的日志中,这种日志被命名为flashback database logs闪回日志。   闪回日志总是被循环复用,连续写出。

 

在一个实例中当一个数据块首次被修改时,前台进程会将该数据块的before image拷贝到位于shared pool中的flashback log buffer中,RVWR进程定期地将flashback log buffer中的记录刷新到磁盘上。 在DBWR进程可以写出相关脏块到磁盘之前,DBWR必须保证该buffer header相关FBA(Flashback Byte Address)的flashback log buffer已经写出到闪回日志。 这被称作 先写闪回日志 机制。

 

在常规的闪回日志维护操作中 , RVWR进程定期地插入闪回标记(flashback markers)到flashback database logs中。 闪回标记(flashback markers)的作用是在闪回数据库是告知Oracle如何flashback 到之前的某个时间点。  在闪回操作执行过程中, Oracle 会用闪回标记(flashback markers)中的信息来决定多大范围的flashback database log需要用来还原数据块景象block image; 之后Oracle 会利用前向恢复(forward recovery)的方式把数据库穿越到用户指定闪回的SCN或者时间点。

 

flashback markers for example:

 

 **** Record at fba: (lno 1 thr 1 seq 1 bno 4 bof 8184) ****
RECORD HEADER:
Type: 3 (Skip)  Size: 8132
RECORD DATA (Skip):
**** Record at fba: (lno 1 thr 1 seq 1 bno 4 bof 52) ****
RECORD HEADER:
Type: 7 (Begin Crash Recovery Record)  Size: 36
RECORD DATA (Begin Crash Recovery Record):
Previous logical record fba: (lno 1 thr 1 seq 1 bno 3 bof 316)
Record scn: 0x0000.00000000 [0.0]
**** Record at fba: (lno 1 thr 1 seq 1 bno 3 bof 8184) ****
RECORD HEADER:
Type: 3 (Skip)  Size: 7868
RECORD DATA (Skip):
**** Record at fba: (lno 1 thr 1 seq 1 bno 3 bof 316) ****
RECORD HEADER:
Type: 2 (Marker)  Size: 300
RECORD DATA (Marker):
Previous logical record fba: (lno 0 thr 0 seq 0 bno 0 bof 0)
Record scn: 0x0000.00000000 [0.0]
Marker scn: 0x0000.0060e024 [0.6348836] 06/13/2012 15:56:35
Flag 0x0
Flashback threads: 1, Enabled redo threads 1
Recovery Start Checkpoint:
scn: 0x0000.0060e024 [0.6348836]  06/13/2012 15:56:12
thread:1 rba:(0x80.180.10)
Flashback thread Markers:
Thread:1 status:0 fba: (lno 1 thr 1 seq 1 bno 2 bof 8184)
Redo Thread Checkpoint Info:
Thread:1 rba:(0x80.180.10)
**** Record at fba: (lno 1 thr 1 seq 1 bno 2 bof 8184) ****
RECORD HEADER:
Type: 3 (Skip)  Size: 8168
RECORD DATA (Skip):
End-Of-Thread reached

 

 

 

需要注意的是不是数据库中的每一次block change 都会触发before image被记录到闪回日志flashback log中。 如果每一次block change都记录flashback log record 那么闪回日志会要比 redo log大的多!因为毕竟flashback log 是记录整个块的before image 而 redo log只记录 change vector 。  Oracle对于闪回日志使用一种即能够保证可以讲数据库一致地穿越到某个历史时间点的状态,又不过分造成I/O损耗和生成大量闪回日志的方法:

对于hot block热块,Oracle仅在一段时间内记录一次block image到闪回日志; Oracle 内部通过闪回分界线(flashback barriers)实现这一点。在常规数据库状态下,flashback barriers被周期性的触发(一般为15分钟),对应每一个闪回分界线(flashback barriers)会有一个(flashback markers)被写出到闪回日志。

 

对于热块, 即那些频繁被change的数据库块, 闪回日志算法要求限制其写出到闪回日志的版本数。举例来说, 在15分钟内对于某个数据块仅限其写出一个版本到flashback log。虽然更多版本的before image有助于减少闪回数据库的时间,但是最小化有效闪回日志的容量是更重要的因素,比起实际闪回所用的时间来说。 为了实现这个目标,闪回日志算法引入了闪回分界线(flashback barriers), flashback barriers会被定期地触发,典型的例如15分钟一次。 对应每一个flashback barriers会有一个闪回标记(flashback markers)被插入到闪回日志中。常规情况下,对于每一个被修改的数据块在一个闪回区域(被分界线barriers分割的区域)内仅记录一个block image ,无论这个数据块在这段区域内被修改了多少次、被写出过多少次到磁盘上。

这就是为什么闪回日志flashback log要原少于redo log的产生量!

 

好了说了,这么多如果不亲身体验一下就太虚了, 我们来做以下的测试。

 

 

SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE    11.2.0.3.0      Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production
SQL> select * from global_name;
GLOBAL_NAME
--------------------------------------------------------------------------------
www.askmaclean.com
SQL> create table flash_maclean (t1 varchar2(200)) tablespace users;
Table created.
SQL> insert into flash_maclean values('MACLEAN LOVE HANNA');
1 row created.
SQL> commit;
Commit complete.
SQL> startup force;
ORACLE instance started.
Total System Global Area  939495424 bytes
Fixed Size                  2233960 bytes
Variable Size             713034136 bytes
Database Buffers          218103808 bytes
Redo Buffers                6123520 bytes
Database mounted.
Database opened.
SQL> update flash_maclean set t1='HANNA LOVE MACLEAN';
1 row updated.
commit;
Commit complete.
SQL> alter system checkpoint;
System altered.
SQL> select dbms_rowid.rowid_block_number(rowid),dbms_rowid.rowid_relative_fno(rowid) from flash_maclean;
DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID) DBMS_ROWID.ROWID_RELATIVE_FNO(ROWID)
------------------------------------ ------------------------------------
140431                                    4
datafile 4 block 140431 对应RDBA rdba: 0x0102248f (4/140431)
SQL> ! ps -ef|grep rvwr|grep -v grep
oracle   26695     1  0 15:56 ?        00:00:00 ora_rvwr_G11R23
SQL> oradebug setospid  26695
Oracle pid: 20, Unix process pid: 26695, image: oracle@nas.oracle.com (RVWR)
SQL> ORADEBUG DUMP FBTAIL 1;
Statement processed.
To dump the last 2000 flashback records , 以上ORADEBUG DUMP FBTAIL 1命令可以转出最近2000条的闪回记录
SQL> oradebug tracefile_name
/s01/orabase/diag/rdbms/g11r23/G11R23/trace/G11R23_rvwr_26695.trc
在 TRACE文件中可以找到对应block的 before image 
**** Record at fba: (lno 1 thr 1 seq 1 bno 55 bof 2564) ****
RECORD HEADER:
Type: 1 (Block Image)  Size: 28
RECORD DATA (Block Image):
file#: 4 rdba: 0x0102248f
Next scn: 0x0000.00000000 [0.0]
Flag: 0x0
Block Size: 8192
BLOCK IMAGE:
buffer rdba: 0x0102248f
scn: 0x0000.00609044 seq: 0x01 flg: 0x06 tail: 0x90440601
frmt: 0x02 chkval: 0xc626 type: 0x06=trans data
Hex dump of block: st=0, typ_found=1
Dump of memory from 0x00002B1D94183C00 to 0x00002B1D94185C00
2B1D94183C00 0000A206 0102248F 00609044 06010000  [.....$..D.`.....]
2B1D94183C10 0000C626 00000001 00014AD4 0060903A  [&........J..:.`.]
2B1D94183C20 00000000 00320002 01022488 00090006  [......2..$......]
2B1D94183C30 00000CC8 00C00340 000D0542 00008000  [....@...B.......]
2B1D94183C40 006040BC 000F000A 00000920 00C002E4  [.@`..... .......]
2B1D94183C50 0017048F 00002001 00609044 00000000  [..... ..D.`.....]
2B1D94183C60 00000000 00010100 0014FFFF 1F6E1F77  [............w.n.]
2B1D94183C70 00001F6E 1F770001 00000000 00000000  [n.....w.........]
2B1D94183C80 00000000 00000000 00000000 00000000  [................]
Repeat 500 times
2B1D94185BD0 00000000 00000000 2C000000 4D120102  [...........,...M]
2B1D94185BE0 454C4341 4C204E41 2045564F 4E4E4148  [ACLEAN LOVE HANN]
2B1D94185BF0 01002C41 43414D07 4E41454C 90440601  [A,...MACLEAN..D.]
Block header dump:  0x0102248f
Object id on Block? Y
seg/obj: 0x14ad4  csc: 0x00.60903a  itc: 2  flg: E  typ: 1 - DATA
brn: 0  bdba: 0x1022488 ver: 0x01 opc: 0
inc: 0  exflg: 0
Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0006.009.00000cc8  0x00c00340.0542.0d  C---    0  scn 0x0000.006040bc
0x02   0x000a.00f.00000920  0x00c002e4.048f.17  --U-    1  fsc 0x0000.00609044
bdba: 0x0102248f
data_block_dump,data header at 0x2b1d94183c64
===============
tsiz: 0x1f98
hsiz: 0x14
pbl: 0x2b1d94183c64
76543210
flag=--------
ntab=1
nrow=1
frre=-1
fsbo=0x14
fseo=0x1f77
avsp=0x1f6e
tosp=0x1f6e
0xe:pti[0]      nrow=1  offs=0
0x12:pri[0]     offs=0x1f77
block_row_dump:
tab 0, row 0, @0x1f77
tl: 22 fb: --H-FL-- lb: 0x2  cc: 1
col  0: [18]  4d 41 43 4c 45 41 4e 20 4c 4f 56 45 20 48 41 4e 4e 41
end_of_block_dump
SQL> select dump('MACLEAN LOVE HANNA',16) from dual;
DUMP('MACLEANLOVEHANNA',16)
--------------------------------------------------------------------
Typ=96 Len=18: 4d,41,43,4c,45,41,4e,20,4c,4f,56,45,20,48,41,4e,4e,41

 

 

若我们在短期内在多个事务内反复更新同样的数据块,其在flashback log中的before image版本也不会大幅增长

 

create table flash_maclean1 (t1 int) tablespace users;
SQL>      select vs.name, ms.value
2       from v$mystat ms, v$sysstat vs
3       where vs.statistic# = ms.statistic#
4         and vs.name in ('redo size','db block changes');
NAME                                                                  VALUE
---------------------------------------------------------------- ----------
db block changes                                                          0
redo size                                                                 0
SQL>      select name,value from v$sysstat where name like 'flashback log%';
NAME                                                                  VALUE
---------------------------------------------------------------- ----------
flashback log writes                                                     49
flashback log write bytes                                           9306112
SQL>      begin
2       for i in 1..5000 loop
3       update flash_maclean1 set t1=t1+1;
4       commit;
5       end loop;
6       end;
7       /
PL/SQL procedure successfully completed.
SQL>     select vs.name, ms.value
2       from v$mystat ms, v$sysstat vs
3       where vs.statistic# = ms.statistic#
4         and vs.name in ('redo size','db block changes');
NAME                                                                  VALUE
---------------------------------------------------------------- ----------
db block changes                                                      20006
redo size                                                           3071288
SQL>      select name,value from v$sysstat where name like 'flashback log%';
NAME                                                                  VALUE
---------------------------------------------------------------- ----------
flashback log writes                                                     52
flashback log write bytes                                          10338304

 

 

在以上的测试中可以看到 对于hot block,在产生20006 个block changes的情况下 产生了 3000k 的redo log 和 大约1000k的 flashback log 。

Know more about Enqueue Deadlock Detection

今天在 ORACLE ALLSTAR群里讨论了一个关于队列锁死锁检测的问题,原帖子在这里。 有同学指出对于enqueue lock的死锁检测应当是每3 秒钟检测一次,这样说的依据是通过一个简单可重复的实验可以证明在实际出现ora-00060 dead lock错前process等待了3s:

 

SQL> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bi
PL/SQL Release 10.2.0.5.0 - Production
CORE    10.2.0.5.0      Production
TNS for Linux: Version 10.2.0.5.0 - Production
NLSRTL Version 10.2.0.5.0 - Production
SQL> select * from global_name;
GLOBAL_NAME
--------------------------------------------------------------------------------
www.askmaclean.com
PROCESS A:
set timing on;
update maclean1 set t1=t1+1;
PROCESS B:
update maclean2 set t1=t1+1;
PROCESS A:
update maclean2 set t1=t1+1;
PROCESS B:
update maclean1 set t1=t1+1;
等待3s后 PROCESS A 会报
ERROR at line 1:
ORA-00060: deadlock detected while waiting for resource
Elapsed: 00:00:03.02

 

 

可以看到Process A在检测到死锁之前确实等待了 3s,而且这是一个可以重复的实验,很具有说服力。

事实真的是这样的吗?

 

来看下面的演示:

 

SQL> col name for a30
SQL> col value for a5
SQL> col DESCRIB for a50
SQL> set linesize 140 pagesize 1400
SQL> SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ
2   FROM SYS.x$ksppi x, SYS.x$ksppcv y
3   WHERE x.inst_id = USERENV ('Instance')
4   AND y.inst_id = USERENV ('Instance')
5   AND x.indx = y.indx
6  AND x.ksppinm='_enqueue_deadlock_scan_secs';
NAME                           VALUE DESCRIB
------------------------------ ----- --------------------------------------------------
_enqueue_deadlock_scan_secs    0    deadlock scan interval
SQL> alter system set "_enqueue_deadlock_scan_secs"=18 scope=spfile;
System altered.
Elapsed: 00:00:00.01
SQL> startup force;
ORACLE instance started.
Total System Global Area  851443712 bytes
Fixed Size                  2100040 bytes
Variable Size             738198712 bytes
Database Buffers          104857600 bytes
Redo Buffers                6287360 bytes
Database mounted.
Database opened.
PROCESS A:
SQL> set timing on;
SQL> update maclean1 set t1=t1+1;
1 row updated.
Elapsed: 00:00:00.06
Process B
SQL> update maclean2 set t1=t1+1;
1 row updated.
SQL>  update maclean1 set t1=t1+1;
Process A:
SQL>
SQL> alter session set events '10704 trace name context forever,level 10:10046 trace name context forever,level 8';
Session altered.
SQL> update maclean2 set t1=t1+1;
update maclean2 set t1=t1+1
*
ERROR at line 1:
ORA-00060: deadlock detected while waiting for resource 
Elapsed: 00:00:18.05
ksqcmi: TX,90011,4a9 mode=6 timeout=21474836
WAIT #12: nam='enq: TX - row lock contention' ela= 2930070 name|mode=1415053318 usn<<16 | slot=589841 sequence=1193 obj#=56810 tim=1308114759849120
WAIT #12: nam='enq: TX - row lock contention' ela= 2930636 name|mode=1415053318 usn<<16 | slot=589841 sequence=1193 obj#=56810 tim=1308114762779801
WAIT #12: nam='enq: TX - row lock contention' ela= 2930439 name|mode=1415053318 usn<<16 | slot=589841 sequence=1193 obj#=56810 tim=1308114765710430
*** 2012-06-12 09:58:43.089
WAIT #12: nam='enq: TX - row lock contention' ela= 2931698 name|mode=1415053318 usn<<16 | slot=589841 sequence=1193 obj#=56810 tim=1308114768642192
WAIT #12: nam='enq: TX - row lock contention' ela= 2930428 name|mode=1415053318 usn<<16 | slot=589841 sequence=1193 obj#=56810 tim=1308114771572755
WAIT #12: nam='enq: TX - row lock contention' ela= 2931408 name|mode=1415053318 usn<<16 | slot=589841 sequence=1193 obj#=56810 tim=1308114774504207
DEADLOCK DETECTED ( ORA-00060 )
[Transaction Deadlock]
The following deadlock is not an ORACLE error. It is a
deadlock due to user error in the design of an application
or from issuing incorrect ad-hoc SQL. The following
information may aid in determining the deadlock:

 

 

 

以上可以看到Process A从’enq: TX – row lock contention’ 等待到触发ORA-00060 deadlock detected的时间从3s 变成了 18s , 恰巧是hidden parameter “_enqueue_deadlock_scan_secs”所指定的值,该隐藏参数默认为0。

 

我们再来看另一个实验:

 

SQL> alter system set "_enqueue_deadlock_scan_secs"=4 scope=spfile;
System altered.
Elapsed: 00:00:00.01
SQL> alter system set "_enqueue_deadlock_time_sec"=9 scope=spfile;
System altered.
Elapsed: 00:00:00.00
SQL> startup force;
ORACLE instance started.
Total System Global Area  851443712 bytes
Fixed Size                  2100040 bytes
Variable Size             738198712 bytes
Database Buffers          104857600 bytes
Redo Buffers                6287360 bytes
Database mounted.
Database opened.
SQL> set linesize 140 pagesize 1400
SQL> show parameter dead
NAME                                 TYPE                             VALUE
------------------------------------ -------------------------------- ------------------------------
_enqueue_deadlock_scan_secs          integer                          4
_enqueue_deadlock_time_sec           integer                          9
SQL> set timing on
SQL> select * from maclean1 for update wait 8; 
T1
----------
11
Elapsed: 00:00:00.01
PROCESS B
SQL> select * from maclean2 for update wait 8;
T1
----------
3
SQL> select * from maclean1 for update wait 8;
select * from maclean1 for update wait 8
PROCESS A
SQL> select * from maclean2 for update wait 8;
select * from maclean2 for update wait 8
*
ERROR at line 1:
ORA-30006: resource busy; acquire with WAIT timeout expired
Elapsed: 00:00:08.00

 

 

以上我们可以看到 当指定 select for update wait的enqueue request timeout 等待超时为8s时 ,虽然我们指定了”_enqueue_deadlock_scan_secs”=4(deadlock scan interval
),即4s做一次deadlock detected,但是实际Process A并没有做deadlock 的检测, 为什么这样说?

 

因为Process A在这里等待8s后raised的是”ORA-30006: resource busy; acquire with WAIT timeout expired”错误,而非ORA-00060,这就说明了process A没有做死锁检测。

 

这是因为隐藏参数”_enqueue_deadlock_time_sec”(requests with timeout <= this will not have deadlock detection)的存在,当enqueue request time < “_enqueue_deadlock_time_sec”时Server process就会放弃做dead lock detection,这是因为即然指定了enqueue request 超时的时间且timeout值又不是很大(_enqueue_deadlock_time_sec默认值为5,即timeout<5s),那么不做死锁检测也是可以接受的;而对于指定的timeout>”_enqueue_deadlock_time_sec”的情况,Oracle仍需要定时做死锁检测避免长时间的死锁发生。

 

我们再来看下面的演示:

 

 

SQL> show parameter dead
NAME                                 TYPE                             VALUE
------------------------------------ -------------------------------- ------------------------------
_enqueue_deadlock_scan_secs          integer                          4
_enqueue_deadlock_time_sec           integer                          9
Process A:
SQL> set timing on;
SQL> select * from maclean1 for update wait 10;  
T1
----------
11
Process B:
SQL> select * from maclean2 for update wait 10;  
T1
----------
3
SQL> select * from maclean1 for update wait 10;  
PROCESS A:
SQL> select * from maclean2 for update wait 10;
select * from maclean2 for update wait 10
*
ERROR at line 1:
ORA-00060: deadlock detected while waiting for resource
Elapsed: 00:00:06.02

 

 

这一次我们指定 select for update wait 10即10s超时, 因为 10s大于当前的_enqueue_deadlock_time_sec设置值(9s),所以Process A就会做死锁检测。 但是可以看到这里死锁检测时间为6s 而不是我们指定_enqueue_deadlock_scan_secs的4s 。

 

通过反复的测试可以发现,实际的死锁检测时间总是_enqueue_deadlock_scan_secs指定的值向上取整为3的倍数。

 

总结:

enqueue lock队列锁的死锁检测遵循以下原则

1. 默认情况下死锁检测deadlock detection总是3s发生一次, 但是实际受到参数_enqueue_deadlock_scan_secs(deadlock scan interval)的控制,该参数默认为0,死锁检测时间总是_enqueue_deadlock_scan_secs指定的值向上取整为3的倍数, 当_enqueue_deadlock_scan_secs=0 则为3s一次, 当_enqueue_deadlock_scan_secs=4则为6s一次,依此类推。

2. 死锁检测还受到_enqueue_deadlock_time_sec(requests with timeout <= this will not have deadlock detection)的影响,若enqueue request timeout< _enqueue_deadlock_time_sec(默认值为5),则Server process不会做死锁检测。若enqueue request timeout>_enqueue_deadlock_time_sec 则会参考_enqueue_deadlock_scan_secs定期做死锁检测, 当然request timeout的指定不限于select for update wait [TIMEOUT]一种。

 

补充:

 

实际在10.2.0.1中尚未引入以上的这2个hidden parameter , 大约在patchset 10.2.0.3开始引入了 _enqueue_deadlock_time_sec, 在patchset 10.2.0.5中开始引入了_enqueue_deadlock_scan_secs。

 

有同学指出RAC中的死锁检测默认应当为10s, 这受到隐藏参数_lm_dd_interval(dd time interval in seconds) ,该参数最早在版本8.0.6中引入。 通过实验可以验证这位同学的说法,我也表示支持,  在10g中该参数默认为60s,从11g开始默认减少到10s。  需要注意的是在11g之前_lm_dd_interval这个参数指定值是不可信任的,这是因为在11g中对LMD进程的死锁检测进行了优化,原本在节点数量较多的RAC中LMD的Deadlock Detection可能消耗更多的CPU,以下是11g中Oracle内部研发Team测试的LMD在多节点情况下的CPU使用情况:

 

 

 

我们进一步来了解11g中LMD检测死锁的细节,这里我们会用到11g特有的 UTS TRACE 来获取更多 DD的信息:

 

 

SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE    11.2.0.3.0      Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production
SQL>
SQL> select * from global_name
2  ;
GLOBAL_NAME
--------------------------------------------------------------------------------
www.askmaclean.com 
SQL> alter system set "_lm_dd_interval"=20 scope=spfile;
System altered.
SQL> startup force;
ORACLE instance started.
Total System Global Area 1570009088 bytes
Fixed Size                  2228704 bytes
Variable Size            1325403680 bytes
Database Buffers          234881024 bytes
Redo Buffers                7495680 bytes
Database mounted.
Database opened.
SQL> set linesize 140 pagesize 1400
SQL> show parameter lm_dd
NAME                                 TYPE                             VALUE
------------------------------------ -------------------------------- ------------------------------
_lm_dd_interval                      integer                          20
SQL> select count(*) from gv$instance;
COUNT(*)
----------
2
instance 1:
SQL> oradebug setorapid 12
Oracle pid: 12, Unix process pid: 8608, image: oracle@vrh2.oracle.com (LMD0)
对 LMD0进程做 UTS TRACE所有RAC相关的内部操作将无所遁形
SQL> oradebug event 10046 trace name context forever,level 8:10708 trace name context forever,level 103: trace[rac.*] disk high;
Statement processed.
Elapsed: 00:00:00.00
SQL> update maclean1 set t1=t1+1;
1 row updated.
instance 2:
SQL> update maclean2 set t1=t1+1;
1 row updated.
SQL> update maclean1 set t1=t1+1;
Instance 1:
SQL> update maclean2 set t1=t1+1;
update maclean2 set t1=t1+1
*
ERROR at line 1:
ORA-00060: deadlock detected while waiting for resource
Elapsed: 00:00:20.51
LMD0进程的UTS TRACE
2012-06-12 22:27:00.929284 : [kjmpbmsg:process][type 22][msg 0x7fa620ac85a8][from 1][seq 8148.0][len 192]
2012-06-12 22:27:00.929346 : [kjmxmpm][type 22][seq 0.0][msg 0x7fa620ac85a8][from 1]
*** 2012-06-12 22:27:00.929
* kjddind: received DDIND msg with subtype x6
*          reqp->dd_master_inst_kjxmddi == 1
* kjddind: dump sgh:
2012-06-12 22:27:00.929346*: kjddind: req->timestamp [0.15], kjddt [0.13]
2012-06-12 22:27:00.929346*: >> DDmsg:KJX_DD_REMOTE,TS[0.15],Inst 1->2,ddxid[id1,id2,inst:2097153,31,1],ddlock[0x95023930,829],ddMasterInst 1
2012-06-12 22:27:00.929346*: lock [0x95023930,829], op = [mast]
2012-06-12 22:27:00.929346*: reqp->timestamp [0.15], kjddt [0.13]
2012-06-12 22:27:00.929346*: kjddind: updated local timestamp [0.15]
* kjddind: case KJX_DD_REMOTE
2012-06-12 22:27:00.929346*: ADD IO NODE WFG: 0 frame pointer
2012-06-12 22:27:00.929346*: PUSH: type=res, enqueue(0xffffffff.0xffffffff)=0xbbb9af40, block=KJUSEREX, snode=1
2012-06-12 22:27:00.929346*: PROCESS: type=res, enqueue(0xffffffff.0xffffffff)=0xbbb9af40, block=KJUSEREX, snode=1
2012-06-12 22:27:00.929346*: POP: type=res, enqueue(0xffffffff.0xffffffff)=0xbbb9af40, block=KJUSEREX, snode=1
2012-06-12 22:27:00.929346*: kjddopr[TX 0xe000c.0x32][ext 0x5,0x0]: blocking lock 0xbbb9a800, owner 2097154 of inst 2
2012-06-12 22:27:00.929346*: PUSH: type=txn, enqueue(0xffffffff.0xffffffff)=0xbbb9a800, block=KJUSEREX, snode=1
2012-06-12 22:27:00.929346*: PROCESS: type=txn, enqueue(0xffffffff.0xffffffff)=0xbbb9a800, block=KJUSEREX, snode=1
2012-06-12 22:27:00.929346*: ADD NODE TO WFG: type=txn, enqueue(0xffffffff.0xffffffff)=0xbbb9a800, block=KJUSEREX, snode=1
2012-06-12 22:27:00.929346*: POP: type=txn, enqueue(0xffffffff.0xffffffff)=0xbbb9a800, block=KJUSEREX, snode=1
2012-06-12 22:27:00.929346*: kjddopt: converting lock 0xbbce92f8 on 'TX' 0x80016.0x5d4,txid [2097154,34]of inst 2
2012-06-12 22:27:00.929346*: PUSH: type=res, enqueue(0xffffffff.0xffffffff)=0xbbce92f8, block=KJUSEREX, snode=1
2012-06-12 22:27:00.929346*: PROCESS: type=res, enqueue(0xffffffff.0xffffffff)=0xbbce92f8, block=KJUSEREX, snode=1
2012-06-12 22:27:00.929346*: ADD NODE TO WFG: type=res, enqueue(0xffffffff.0xffffffff)=0xbbce92f8, block=KJUSEREX, snode=1
2012-06-12 22:27:00.929855 : GSIPC:AMBUF: rcv buff 0x7fa620aa8cd8, pool rcvbuf, rqlen 1102
2012-06-12 22:27:00.929878 : GSIPC:GPBMSG: new bmsg 0x7fa620aa8d48 mb 0x7fa620aa8cd8 msg 0x7fa620aa8d68 mlen 192 dest x100 flushsz -1
2012-06-12 22:27:00.929878*: << DDmsg:KJX_DD_REMOTE,TS[0.15],Inst 2->1,ddxid[id1,id2,inst:2097153,31,1],ddlock[0x95023930,829],ddMasterInst 1
2012-06-12 22:27:00.929878*: lock [0xbbce92f8,287], op = [mast]
2012-06-12 22:27:00.929878*: ADD IO NODE WFG: 0 frame pointer
2012-06-12 22:27:00.929923 : [kjmpbmsg:compl][msg 0x7fa620ac8588][typ p][nmsgs 1][qtime 0][ptime 0]
2012-06-12 22:27:00.929947 : GSIPC:PBAT: flush start. flag 0x79 end 0 inc 4.4
2012-06-12 22:27:00.929963 : GSIPC:PBAT: send bmsg 0x7fa620aa8d48 blen 224 dest 1.0
2012-06-12 22:27:00.929979 : GSIPC:SNDQ: enq msg 0x7fa620aa8d48, type 65521 seq 8325, inst 1, receiver 0, queued 1
012-06-12 22:27:00.929979 : GSIPC:SNDQ: enq msg 0x7fa620aa8d48, type 65521 seq 8325, inst 1, receiver 0, queued 1
2012-06-12 22:27:00.929996 : GSIPC:BSEND: flushing sndq 0xb491dd28, id 0, dcx 0xbc517770, inst 1, rcvr 0  qlen 0 1
2012-06-12 22:27:00.930014 : GSIPC:BSEND: no batch1 msg 0x7fa620aa8d48 type 65521 len 224 dest (1:0)
2012-06-12 22:27:00.930088 : kjbsentscn[0x0.3f72dc][to 1]
2012-06-12 22:27:00.930144 : GSIPC:SENDM: send msg 0x7fa620aa8d48 dest x10000 seq 8325 type 65521 tkts x1 mlen xe00110
2012-06-12 22:27:00.930531 : GSIPC:KSXPCB: msg 0x7fa620aa8d48 status 30, type 65521, dest 1, rcvr 0
WAIT #0: nam='ges remote message' ela= 1372 waittime=80 loop=0 p3=74 obj#=-1 tim=1339554420931640
2012-06-12 22:27:00.931728 : GSIPC:RCVD: ksxp msg 0x7fa620af6490 sndr 1 seq 0.8149 type 65521 tkts 1
2012-06-12 22:27:00.931746 : GSIPC:RCVD: watq msg 0x7fa620af6490 sndr 1, seq 8149, type 65521, tkts 1
2012-06-12 22:27:00.931763 : GSIPC:RCVD: seq update (0.8148)->(0.8149) tp -15 fg 0x4 from 1 pbattr 0x0
2012-06-12 22:27:00.931779 : GSIPC:TKT: collect msg 0x7fa620af6490 from 1 for rcvr 0, tickets 1
2012-06-12 22:27:00.931794 : kjbrcvdscn[0x0.3f72dc][from 1][idx 2012-06-12 22:27:00.931810 : kjbrcvdscn[no bscn dd_master_inst_kjxmddi == 1
* kjddind: dump sgh:
NXTIN (nil) 0 wq 0 cvtops x0  0x0.0x0(ext 0x0,0x0)[0000-0000-00000000] inst 1
BLOCKER 0xbbb9a800 5 wq 1 cvtops x28 TX 0xe000c.0x32(ext 0x5,0x0)[20000-0002-00000022] inst 2
BLOCKED 0xbbce92f8 5 wq 2 cvtops x1 TX 0x80016.0x5d4(ext 0x2,0x0)[20000-0002-00000022] inst 2
NXTOUT (nil) 0 wq 0 cvtops x0  0x0.0x0(ext 0x0,0x0)[0000-0000-00000000] inst 1
2012-06-12 22:27:00.932058*: kjddind: req->timestamp [0.15], kjddt [0.15]
2012-06-12 22:27:00.932058*: >> DDmsg:KJX_DD_VALIDATE,TS[0.15],Inst 1->2,ddxid[id1,id2,inst:2097153,31,1],ddlock[0x95023930,829],ddMasterInst 1
2012-06-12 22:27:00.932058*: lock [(nil),0], op = [vald_dd]
2012-06-12 22:27:00.932058*: kjddind: updated local timestamp [0.15]
* kjddind: case KJX_DD_VALIDATE
*** 2012-06-12 22:27:00.932
* kjddvald called: kjxmddi stuff:
*                  cont_lockp (nil)
*                  dd_lockp 0x95023930
*                  dd_inst 1
*                  dd_master_inst 1
* sgh graph:
NXTIN (nil) 0 wq 0 cvtops x0  0x0.0x0(ext 0x0,0x0)[0000-0000-00000000] inst 1
BLOCKER 0xbbb9a800 5 wq 1 cvtops x28 TX 0xe000c.0x32(ext 0x5,0x0)[20000-0002-00000022] inst 2
BLOCKED 0xbbce92f8 5 wq 2 cvtops x1 TX 0x80016.0x5d4(ext 0x2,0x0)[20000-0002-00000022] inst 2
NXTOUT (nil) 0 wq 0 cvtops x0  0x0.0x0(ext 0x0,0x0)[0000-0000-00000000] inst 1
POP WFG NODE: lock=(nil)
* kjddvald: dump the PRQ:
BLOCKER 0xbbb9a800 5 wq 1 cvtops x28 TX 0xe000c.0x32(ext 0x5,0x0)[20000-0002-00000022] inst 2
BLOCKED 0xbbce92f8 5 wq 2 cvtops x1 TX 0x80016.0x5d4(ext 0x2,0x0)[20000-0002-00000022] inst 2
* kjddvald: KJDD_NXTONOD ->node_kjddsg.dinst_kjddnd =1
* kjddvald: ... which is not my node, my subgraph is validated but the cycle is not complete
Global blockers dump start:---------------------------------
DUMP LOCAL BLOCKER/HOLDER: block level 5 res [0x80016][0x5d4],[TX][ext 0x2,0x0]
发现dead lock!!!

 

 

可以看到在以上11.2.0.3的测试中 RAC LMD的死锁检测时间确实受到”_lm_dd_interval”参数的影响变成了指定值的20s。  需要注意的是在10g中_lm_dd_interval默认为60s,在节点较少且Processes数不多的环境中这个参数往往是可信的,但是如果节点很多且有大量Server Process则死锁未必能在这60s内检测到。在11g中对该参数(实际优化的是LMD的死锁检测算法)做了充分的优化,一般可以信任其在默认的10s内完成 Enqueue Deadlock Detection。

 

在11g中控制 RAC中LMD进程检测死锁的hidden parameter 还不仅于”_lm_dd_interval”这一个,RAC的内部机制永远是一个有意思的话题,但是今天我们就聊到这里:

 

SQL> col name for a50
SQL> col describ for a60
SQL> col value for a20
SQL> set linesize 140 pagesize 1400
SQL> SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ
2   FROM SYS.x$ksppi x, SYS.x$ksppcv y
3   WHERE x.inst_id = USERENV ('Instance')
4   AND y.inst_id = USERENV ('Instance')
5   AND x.indx = y.indx
6  AND x.ksppinm like '_lm_dd%';
NAME                                               VALUE                DESCRIB
-------------------------------------------------- -------------------- ------------------------------------------------------------
_lm_dd_interval                                    20                   dd time interval in seconds
_lm_dd_scan_interval                               5                    dd scan interval in seconds
_lm_dd_search_cnt                                  3                    number of dd search per token get
_lm_dd_max_search_time                             180                  max dd search time per token
_lm_dd_maxdump                                     50                   max number of locks to be dumped during dd validation
_lm_dd_ignore_nodd                                 FALSE                if TRUE nodeadlockwait/nodeadlockblock options are ignored
6 rows selected.

了解你所不知道的SMON功能(十一):OFFLINE UNDO SEGMENT

SMON这个老牌的后台关键进程的作用还包括对UNDO/ROLLBACK SEGMENT的维护, 这种维护主要体现在2个方面: OFFLINE和SHRINK  UNDO/ROLLBACK SEGMENT, 今天我们主要介绍OFFLINE ROLLBACK SEGMENT。

 

你肯定要问,Oracle为什么OFFLINE UNDO/ROLLBACK SEGMENT?

最主要的目的是减轻高并发事务环境中对UDNO SPACE撤销空间使用的压力。

 

触发场景

 

在10g之前的9i中每12个小时SMON会根据V$UNDOSTAT中记录来决定在现有基础上要OFFLINE多少个UNDO SEGMENT,又要保留多少个UNDO SEGMENT; 在9i中被OFFLINED UNDO SEGMENT 还会被SMON DROP掉,以进一步回收空间。

 

具体保留多少个UNDO SEGMENT,取决于过去12个小时内的V$UNDOSTAT动态视图记录的最大并发事务数量在加上1,具体公式可以参考下面的SQL:

 

 

SQL> select max(MAXCONCURRENCY)+1 from v$undostat where begin_time> (sysdate-1/2);
MAX(MAXCONCURRENCY)+1
---------------------
4

 

若你在alert.log中发现类似以下的信息则说明OFFLINE UNDO SEGS已经在你的系统中发生过了:

 

SMON offlining US=13
Freeing IMU pool for usn 13
SMON offlining US=14
SMON offlining US=15
SMON offlining US=16
SMON offlining US=17

 

 

9i中SMON通过ktusmofd函数实现对UDNO SEGMENT的OFFLINE,ktusmofd的含义为[K]ernel [T]ransaction [U]ndo [S]ystem [M]anaged OFFLINE & DROP
通过ktsmgfru函数返回必要保留的ONLINE UNDO SEGMENT, 其详细的算法如下:

 

 

SMON调用ktusmofd ,并发现instance启动未超过12个小时并且_smu_debug_mode未设置KTU_DEBUG_SMU_SMON_SHRINK标志位
(_smu_debug_mode是SYSTEM MANAGED UNDO内部参数,KTU_DEBUG_SMU_SMON_SHRINK标志位控制是否强制SMON做SHRINK)
YES  -  SMON不OFFLINE任何东西直接返回
NO   -  调用ktsmgfru 获得过去12小时的最大并发事务数
设置keep_online变量为ktsmgfru 返回值加上1
尝试hold TA ENQUEUE(该队列锁控制UNDO TABLESPACE的串行操作),该操作的超时限制为30s
若无法获得该ENQUEUE则说明正在切换UNDO TABLESPACE,ktusmofd将直接返回且不OFFLINE任何UNDO SEGMENTS
成功获得该ENQUEUE锁,进一步调用ktusmofxu并使用之前获得的keep_online作为参数,开始OFFLINE
调用kslgpl函数获得KTU LATCH 包括parent和所有的children
LOOP 在现有的ONLINE UNDO SEGMENT之间循环
若发现该UNDO SEGMENT是SMU-SYSTEM MANAGED UNDO且其所在表空间是当前undo_tablespace指向的表空间的话
若keep_online >0 , 则keep_online--
否则
释放KTU latches
调用kturof1函数实际OFFLINE 该UNDO SEGMENT
重新get KTU latches
END LOOP
释放 KTU latches

 

SMON 调用ktusmofd维护OFFLINE UNDO SEGMENT的常见STACK CALL如下:

 

ktmmon->ktusmofd->ktusmdxu->ktcrcm->ktccpcmt->ktcccdel->ktadrpc->ktssdro_segment->
ktssdrbm_segment->ktsxbmdelext->kqrcmt->ktsscu
xctrol ktcpoptx ktccpcmt ktcrcm ktusmdxu ktusmofd ktmmon
ksedmp ksfdmp kgeasnmierr ktusmgmct ktusmdxu ktusmofd ktmmon ksbrdp opirip 
opidrv sou2o main

 

10g以前的UNDO OFFLINE算法仍不完善,这导致在实例重启或切换UNDO TABLESPACE撤销表空间时,生成一定数量ONLINE UNDO SEGMENT的系统预热时间可能长达几分钟,对于高并发的环境来说这种延时是难以接受的。

 

从10g开始改进了SMON OFFLINE UNDO SEGMENT的算法,SMON会基于过去7天的(而非12个小时的)V$UNDOSTAT动态视图信息或者AWR自动负载仓库中的UNDO历史快照使用信息来决定OFFLINE UNDO SEGMENT的数量,  且在10g中SMON 不再DROP掉多余的UNDO SEGS,而仅仅OFFLINE掉;作为一种SMU的改良算法这种做法被叫做”Fast Ramp-Up”。”Fast Ramp-Up”避免了早期版本中由SMON维护UNDO SEGS引起的等待或性能问题; 此外,未公开的BUG 5079978可能在版本10.2.0.1中被触发,该BUG的信息如下:

 

Unpublished
Bug 5079978 – APPST GSI 10G : – PRODUCTION INSTANCE UNUSABLE DUE TO US ENQUEUE WAITS
is fixed in 11.1 and patch set 10.2.0.4 and interim patches are available for several earlier versions.
Please refer to Note 5079978.8

 

可以通过后面要介绍的 10511 event来规避以上bug,Oracle官方也推荐在10g以前的版本中使用 10511 event来避免SMON过度OFFLINE UNDO SEGS所引起的问题。

 

 

 

 

10g以后的具体算法如下:

 

 

判断实例启动是否超过7天?
YES -  直接使用v$undostat中过去7天的最大并发事务数max(maxconcurrency)
NO  -  判断是否是第一次调用OFFLINE UNDO SEGMENT的内核函数
YES - 检查是否存在select_workload_repository function (SWRF)快照数据
NO  - ONLINE 最小数目的UNDO SEGMENTS
YES - 尝试获取AWR记录表wrh$_undostat中过去7天的最大并发事务数max(maxconcurrency)
若无法获得以上值,则尝试读取wrh$_rollstat中最近7天的最大rollback segs数量max(rbs cnt)
将返回值保存到内部变量中
NO -  直接使用内部变量中的值

 

如何禁止SMON OFFLINE UNDO SEGMENT?

 

可以通过设置诊断事件event=’10511 trace name context forever, level 1′ 来禁用SMON OFFLINE UNDO SEGS;   但是10511事件不会跳过”Fast Ramp Up”,而仅会限制SMON对UNDO SEGS产生的工作负载。 一旦设置了10511 event, 则所有已生成的 UNDO SEGS会始终保持ONLINE状态。

具体的设置方法:

 

SQL> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bi
PL/SQL Release 10.2.0.5.0 - Production
CORE    10.2.0.5.0      Production
TNS for Linux: Version 10.2.0.5.0 - Production
NLSRTL Version 10.2.0.5.0 - Production
SQL> select * from global_name;
GLOBAL_NAME
--------------------------------------------------------------------------------
www.askmaclean.com
[oracle@vrh8 ~]$ oerr ora 10511
10511, 00000, "turn off SMON check to cleanup undo dictionary"
// *Cause:
// *Action:
SQL> alter system set events '10511 trace name context forever,level 1';
System altered.

 

OFFLINE UNDO SEGS的相关BUG

 

以下列出了SMON OFFLINE UNDO SEGS的一些公开的BUG,这些BUG一般都存在于10.2.0.3之前; 若你真的遇到了,可以在考虑升级之余 采用10511 event workaround规避该问题:

 

Hdr: 2726601 9.2.0.2 RDBMS 9.2.0.2 TXN MGMT LOCAL PRODID-5 PORTID-46 ORA-600 3439552
Abstract: ORA-600 [4406] IN ROUTINE KTCRAB(); 4 NODE RAC CLUSTER

Hdr: 6878461 9.2.0.4.0 RDBMS 9.2.0.4.0 TXN MGMT LOCAL PRODID-5 PORTID-23 ORA-601 5079978
Abstract: ESSC: ORA-601 ORA-474 AFTER OFFLINING UNDO SEGMENTS

Hdr: 4253991 9.2.0.4.0 RDBMS 9.2.0.4.0 TXN MGMT LOCAL PRODID-5 PORTID-23 ORA-600 2660394
Abstract: ORA-600 [KTSXR_ADD-4] FOLLOWED BY ORA-600 [KTSISEGINFO1]

Hdr: 2696314 9.2.0.2.0 RDBMS 9.2.0.2.0 TXN MGMT LOCAL PRODID-5 PORTID-46
Abstract: RECEIVING ORA-600: [KTUSMGMCT-01] AFTER APPLYING 92020 PATCH SET

Hdr: 3578807 9.2.0.4 RDBMS 9.2.0.4 TXN MGMT LOCAL PRODID-5 PORTID-23 ORA-600
Abstract: OERI 4042 RAISED INTERMITTENTLY
Hdr: 2727303 9.2.0.1.0 RDBMS 9.2.0.1.0 TXN MGMT LOCAL PRODID-5 PORTID-100 ORA-600
Abstract: [RAC] ORA-600: [KTUSMGMCT-01] ARE OCCURED IN HIGH LOAD

 

11g新特性:RDBMS Component TRACE

11g新特性UTS TRACE是比10g中引入的KST TRACE更强力内部诊断工具, 有了UTS和KST这2现代化个debugger工具, 我们差不多可以扔掉gdb和dtrace的拐杖了。

 

今后会继续介绍UTS和KST TRACE的使用方法,让任何人都有机会了解Oracle的内部运行原理。

 

以下是UTS trace events 的列表,可以利用oradebug doc component rdbms列出:

 

SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE    11.2.0.3.0      Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production
SQL> oradebug doc component rdbms
Components in library RDBMS:
--------------------------
  SQL_Compiler                 SQL Compiler
    SQL_Parser                 SQL Parser (qcs)
    SQL_Semantic               SQL Semantic Analysis (kkm)
    SQL_Optimizer              SQL Optimizer
      SQL_Transform            SQL Transformation (kkq, vop, nso)
        SQL_MVRW               SQL Materialized View Rewrite
        SQL_VMerge             SQL View Merging (kkqvm)
        SQL_Virtual            SQL Virtual Column (qksvc, kkfi)
      SQL_APA                  SQL Access Path Analysis (apa)
      SQL_Costing              SQL Cost-based Analysis (kko, kke)
        SQL_Parallel_Optimization SQL Parallel Optimization (kkopq)
    SQL_Code_Generator         SQL Code Generator (qka, qkn, qke, kkfd, qkx)
      SQL_Parallel_Compilation SQL Parallel Compilation (kkfd)
      SQL_Expression_Analysis  SQL Expression Analysis (qke)
      SQL_Plan_Management      SQL Plan Managment (kkopm)
    MPGE                       MPGE (qksctx)
  SQL_Execution                SQL Execution (qer, qes, kx, qee)
    Parallel_Execution         Parallel Execution (qerpx, qertq, kxfr, kxfx, kxfq, kxfp)
      PX_Messaging             Parallel Execution Messaging (kxfp)
      PX_Group                 Parallel Execution Slave Group (kxfp)
      PX_Affinity              Parallel Affinity (ksxa)
      PX_Buffer                Parallel Execution Buffers (kxfpb)
      PX_Granule               Parallel Execution Granules (kxfr)
      PX_Control               Parallel Execution Control (kxfx)
      PX_Table_Queue           Parallel Execution Table Queues (kxfq)
      PX_Scheduler             Parallel Execution Scheduler (qerpx)
      PX_Queuing               Parallel Execution Queuing (kxfxq)
    Bloom_Filter               Bloom Filter (qerbl, qesbl)
      PX_Blackbox              Parallel Execution Blackbox (kxf)
  PGA_Manage                   PGA Memory Management
    PGA_Compile                PGA Memory Compilation
    PGA_IMM                    PGA Memory Instance Manage
    PGA_CMM                    PGA Memory Cursor Manage
    PGA_ADV                    PGA Memory Advisor
  rdbms_dde                    RDBMS Diagnostic Data Extractor (dbke)
  VOS                          VOS (ks)
    hang_analysis              Hang Analysis (ksdhng)
    background_proc            Background Processes (ksb, ksbt)
    system_param               System Parameters (ksp, kspt)
    ksu                        Kernel Service User (ksu)
      ksutac                   KSU Timeout Actions
    ksv_trace                  Kernel Services Slave Management (ksv)
  sql_mon                      SQL Monitor (keswx)
    sql_mon_deamon             SQL Monitor Deamon
    sql_mon_query              SQL Monitor Query
  CACHE_RCV                    Cache Recovery (kcv, kct, kcra, kcrp, kcb)
  DIRPATH_LOAD                 Direct Path Load (kl, kdbl, kpodp)
    DIRPATH_LOAD_BIS           Direct Path Kpodpbis Routine (kpodp)
  RAC                          Real Application Clusters
    GES                        Global Enqueue Service
    GCS                        Global Cache Service (kjb)
    GSIPC                      Global Enqueue/Cache Service IPC
      KSI                      Kernel Service Instance locking (ksi)
      RAC_ENQ                  Enqueue Operations
    RAC_RCFG                   Reconfiguration
    RAC_DRM                    Dynamic Remastering
    RAC_MRDOM                  Multiple Recovery Domains
    CGS                        Cluster Group Services (kjxg)
    CGSIMR                     Instance Membership Recovery (kjxgr)
      DD                       GES Deadlock Detection
      GCS_BSCN                 Broadcast SCN (kjb, kcrfw)
    RAC_WLM                    Work Load Management (wlm)
    RAC_MLMDS                  RAC Multiple LMS (kjm)
      GCS_READMOSTLY           GCS Read-mostly (kjb)
      GCS_READER_BYPASS        GCS Reader Bypass (kjb)
      GCS_DELTAPUSH            GCS Delta Push (kjb)
      RAC_BCAST                Enqueue Broadcast Operations
  db_trace                     RDBMS server only tracing
  kst                          server trace layer tracing (kst)
  ddedmp                       RDBMS Diagnostic Data Extractor Dumper (dbked)
  cursor                       Shared Cursor (kxs, kks)
    Bind_Capture               Bind Capture Tracing
  KSM                          Kernel Service Memory (ksm)
  KSE                          Kernel Service Error Manager (kse)
  explain                      SQL Explain Plan (xpl)
  rdbms_event                  RDBMS Events (dbkd)
  LOB_INODE                    Lob Inode (kdli)
  rdbms_adr                    RDBMS ADR (dbkr)
  ASM                          Automatic Storage Management (kf)
    KFK                        KFK (kfk)
      KFKIO                    KFK IO (kfkio)
      KFKSB                    KFK subs (kfksubs)
    KFN                        ASM Networking subsystem (kfn)
      KFNU                     ASM Umbillicus (kfnm, kfns, kfnb)
      KFNS                     ASM Server networking (kfns)
      KFNC                     ASM Client networking (kfnc)
    KFIS                       ASM Intelligent Storage interfaces (kfis)
    KFM                        ASM Node Monitor Interface Implementation (kfm)
      KFMD                     ASM Node Monitor Layer for Diskgroup Registration (kfmd)
      KFMS                     ASM Node Monitor Layers Support Function Interface (kfms)
    KFFB                       ASM Metadata Block (kffb)
    KFFD                       ASM Metadata Directory (kffd)
    KFZ                        ASM Zecurity subsystem (kfz)
    KFC                        ASM Cache (kfc)
    KFR                        ASM Recovery (kfr)
    KFE                        ASM attributes (kfe)
    KFDP                       ASM PST (kfdp)
    KFG                        ASM diskgroups (kfg)
    KFDS                       ASM staleness registry and resync (kfds)
  DML                          DML Drivers (ins, del, upd)
  Health_Monitor               Health Monitor
  DRA                          Data Repair Advisor
  DIRACC                       Direct access to fixed tables (kqfd)
  ASH                          Active Session History (kewa)
  PART                         Partitioning (kkpo, qespc, qesma, kkpa, qergi)
    PART_IntPart               Interval Partitioning
    PART_Dictionary            Partitioning Dictionary (kkpod)
  LOB_KDLW                     Lob kdlw (kdlw)
  LOB_KDLX                     Lob xfm (kdlx)
  LOB_KDLXDUP                  Lob dedup (kdlxdup)
  LOB_KDLRCI                   Lob rci (kdlrci)
  LOB_KDLA                     SecureFile Archive (kdla)
  SQL_Manage                   SQL Manageability (kes)
    SQL_Manage_Infra           Other SQL Manageability Infrastructure (kesai, kesqs, kesatm, kesutl, kessi, keswat, keswts, keswsq)
    SQL_Tune                   SQL Tuning Advisor (kest)
      SQL_Tune_Auto            SQL Tuning Advisor (auto-tune) (kestsa)
      SQL_Tune_Index           SQL Tuning Advisor (index-tune) (kestsi)
      SQL_Tune_Plan            SQL Tuning Advisor (plan node analysis) (kestsp)
      SQL_Tune_Px              SQL Tuning Advisor (parallel execution) (kestsa)
      SQL_Tune_Fr              SQL Tuning Advisor (fix regression) (kestsa)
    SQL_Test_Exec              SQL Test-Execute Service (kestse)
    SQL_Perf                   SQL Performance Analyzer (kesp, keswpi)
    SQL_Repair                 SQL Repair Advisor (kesds)
        Auto_Tune_Opt          Auto Tuning Optimizer (kkoat)
    SQL_trace_parser           SQL trace parser (kesstp)
  SQL_Analyze                  SQL Analyze (qksan)
  SQL_DS                       SQL Dynamic Sampling Services (qksds)
  SQL_DDL                      SQL DDL (atb, ctc, dtb)
  RAT_WCR                      Real Application Test: Workload Capture and Replay (kec)
  Spatial                      Spatial (md)
    Spatial_IND                Spatial Indexing (mdr)
    Spatial_GR                 Spatial GeoRaster (mdgr)
  Text                         Text (dr)
  rdbms_gc                     RDBMS Diagnostic Generic Configuration (dbkgc)
  XS                           XS Fusion Security (kzx)
    XSSESSION                  XS Session (kzxs)
    XSPRINCIPAL                XS Principal (kzxu)
    XSSECCLASS                 XS Security Class (kzxc)
    XSXDS                      XS Data Security (kzxd)
    XSVPD                      XS VPD
    XSXDB_DEFAULT              XS XDB
    XS_MIDTIER                 XS Midtier (kpuzxs)
  AQ                           Streams Advanced Queuing (kwq, kkcn, kpon, kpoaq, kpce, kpcm, kpun, kpuaq)
    AQ_DEQ                     Streams Advanced Queuing Dequeue (kwqid, kwqdl)
    AQ_TM                      Streams Advanced Queuing Time Manager (kwqit, kwqmn)
  KSFM                         Kernel Service File Mapping (ksfm)
  KXD                          Exadata specific Kernel modules (kxd)
    KXDAM                      Exadata Disk Auto Manage (kxdam)
    KCFIS                      Exadata Predicate Push (kcfis)
    NSMTIO                     Trace Non Smart I/O (nsmtio)
  DV                           Database Vault (kzv)

SCN Headroom与时光倒流到1988年的Oracle数据库

最近一阵关于scn headroom的讨论很热,  这是由于在最新的2012 Apr的PSU中例如10.2.0.5上的PSU 13632743和 patch  13916709: SCN: HIGH CALLS TO KCMGAS AFTER APPLYING SCN PATCHES 中引入了对scn增长过快的FIX修复。

Oracle SCN(System Change Number)也叫做系统变更号,Oracle中的Commit操作与SCN紧密相关。

引入SCN的最根本目的在于:

  1. 为读一致性所用
  2. 为redolog中的记录排序,以及恢复

 

 

SCN由SCN Base和Scn Wrap组成,是一种6个字节的结构(structure)。其中SCN Base占用4个字节,而SCN wrap占用2个字节。但在实际存储时SCN-like的stucture常会占用8个字节。

 

ub4 kscnbas
ub2 kscnwrp
struct kcvfhcrs, 8 bytes                 @100                              Creation Checkpointed at scn
ub4 kscnbas                        @100      0x000a8849              www.askmaclean.com
ub2 kscnwrp                        @104      0x0000

 

 

当事务提交COMMIT时,需要完成第一个操作就是得到一个SCN值。

 

SCN是Oracle数据库内部的一种逻辑时间戳,通过SCN将数据库内的事件理清次序, 这是保证事务属性ACID的必要信息。

 

数据库使用SCN来帮助实现查询和跟踪变化。举例来说,当一个事务更新一行数据,那么数据库就需要将该update发生时的SCN记录下来,该事务(transaction)中的其他修改操作通常都会使用同样的SCN。当一个事务commit提交,数据库又会相应地记录提交时的SCN。 多个事务同事commit可能会共享同一个SCN。

 

SCN总是单调递增的序列, Oracle数据库最大可以使用到的SCN上限值是一个天文数字,目前该上限是281万亿,即281,474,976,710,656(2的48次方)。这是对SCN的硬限制,理论上一个数据库的SCN总是不能超过281万亿, 以每秒16K的增速计算,花费557年SCN上限才会被耗尽,作为一个hard limit ,我们很少有机会触及。

 

除了281万亿的hard limit外, Oracle数据库还存在一种 soft limit 即SCN headroom, 为了保证SCN的增长速度不要过于离谱,Oracle使用了一种基于时间的限量供应SCN的系统。

 

关于headroom ,字典翻译头上空间, 实际你可以理解为 在一个房间内 天花板和 头部之间的空间, 数据库的Current SCN就是你头部的高度,那么(房间高度-头部高度)=headroom,

在任何时间点上,Oracle数据库均会计算一个当前时间点DB 不能超过的SCN LIMIT上限, 注意这里提到了时间点 ,通俗点说这个SCN LIMIT是随着时间流逝在增加的。
Oracle计算的算法基于从1988年到当前时间点的秒数,再乘上16,384(16k),用SQL表达就如以下语句:

 

 

select
(((to_number(to_char(sysdate,'YYYY'))-1988)*12*31*24*60*60) +
((to_number(to_char(sysdate,'MM'))-1)*31*24*60*60) +
--www.askmaclean.com
(((to_number(to_char(sysdate,'DD'))-1))*24*60*60) +
(to_number(to_char(sysdate,'HH24'))*60*60) +
(to_number(to_char(sysdate,'MI'))*60) +
(to_number(to_char(sysdate,'SS'))))*16384 from dual;

 

 

 

变化下公式就是    (current_time-1988)       *    16384 – (current_scn) = headroom。
1988到当前时间的秒数   *    16384 –  当前SCN

即SCN headroom是 当前SCN 和(1988年到当前时间的秒数*    16384)之间的差距。

 

通过将SCN的增长率和时间流逝关联起来,确保SCN的限量增长,保证Oracle数据库理论上可以处理500年的数据。

 

 

通过以上公式我们可以发现SCN每秒的合理增长量为16,384,然而Oracle公司在近年发现某些软件层面的BUG会导致数据库试图超过或接近这个当前时间点的SCN 最大值。
常规情况下,若数据库尝试超过当前的SCN最大值,数据库将会cancel取消掉引发该超越事件的事务, 在应用程序层面将看到一个错误。在接下来的一秒钟,上限值会随着时间而增长16k,因此通常应用程序会在短暂停顿后继续工作。 但是在极少数情况下,数据库可能需要为了保护自身的完整性而shutdown关闭,理论上不会造成数据丢失或corrupted。

 

类似于计算机网络中的时钟同步,当2个数据库通过DBLINK相互交流访问时,他们会选用2者中当前Current SCN最大的一个来同步SCN, 譬如说数据库A 的SCN 是10000,而数据库B是20000,当2者发生DBLINK联系时,将会用最大的SCN (20000)来同步,数据库A的SCN将jump跳跃到20000。 在一些环境中,往往不是本地数据库触发了SCN快速增长的bug,而是众多数据库中的某一个存在活跃的SCN BUG,而其他数据库与该问题数据库发生DBLINK联系后,就会因为SCN同步而经历 SCN headroom的的极速减少; 换句话说就是一只老鼠坏了一锅汤,异常高的SCN会通过DBLINK传播给正常的数据库,这种传播往往呈爆炸式发展。
由于数据库总是会拒绝SCN超过当前的SCN上限,所以Oracle官方宣称Oracle数据库理论运行500年的SCN预备量不会受以上问题的影响。 但是受到传播的数据库仍可能由于自我保护的原因而宕机。

 

Oracle官方宣布所有与SCN headroom相关的bug均已在January 2012 CPU 和相关的PSU中修复了, 同样的修复补丁也被包含在DB Patchset Update (PSU) 以及最新的Exadata和Windows的Bundle Patch上。

 

 

有一些客户纠结于他们的SCN接近于当前SCN最大值,且SCN的增长量远大于他们处理数据库的合理值。在所有这些cases中Oracle 发现均是January 2012 CPU 中已经修复bug的现象,在客户实施这些修复后SCN headroom 开始有效增长了。

 

为了保证系统不出现潜在的问题,用户可以运行Metalink Note”Installing, Executing and Interpreting output from the “scnhealthcheck.sql” script [ID 1393363.1]”中包含的脚本scnhealthcheck.sql来检查特定数据库的当前SCN距离当前SCN 最大值有多少差距。该脚本会警告用户该数据库接近于当前SCN的最大上限,在这种情况下建议立即对受影响数据库实施CPU/PSU补丁。实施以上补丁后的预期结果是SCN headroom有效增长,官方宣称在所有case中都如预期一样。

 

Oracle推荐尽可能客户尽可能快地APPLY CPU补丁以处理最新发现的安全问题。

 

从长远来看Oracle会在今后的版本或补丁中将SCN的hard limit从281万亿 提高到一个更高数字。

 

你肯定好奇于为什么这里要使用 1988年到当前时间的秒数,1988这个年份有什么意义?

 

 

我们来看看Oracle数据库中的1988:

 

[oracle@vrh8 ~]$ strings $ORACLE_HOME/bin/oracle > oracle.log
[oracle@vrh8 ~]$ grep 1988  oracle.log
1988
xsaggr.c:1988
Version: %d {0 = 1988 }
SQL>
SQL> select  * from v$version;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bi
PL/SQL Release 10.2.0.5.0 - Production
CORE    10.2.0.5.0      Production
TNS for Linux: Version 10.2.0.5.0 - Production
NLSRTL Version 10.2.0.5.0 - Production
SQL> select * from global_name;
GLOBAL_NAME
--------------------------------------------------------------------------------
www.askmaclean.com
SQL> oradebug setmypid;
Statement processed.
SQL> oradebug dump controlf 4
Statement processed.
SQL> oradebug tracefile_name;
/s01/admin/G10R25/udump/g10r25_ora_9823.trc
THREAD #1 - status:0x2 flags:0x0 dirty:252
low cache rba:(0xe.7742.0) on disk rba:(0xe.7b44.0)
on disk scn: 0x0000.002d4ac2 06/03/2012 10:03:34
resetlogs scn: 0x0000.00294b33 05/22/2012 11:33:28
heartbeat: 784994177 mount id: 2670678794
THREAD #2 - status:0x0 flags:0x0 dirty:0
low cache rba:(0x0.0.0) on disk rba:(0x0.0.0)
on disk scn: 0x0000.00000000 01/01/1988 00:00:00
resetlogs scn: 0x0000.00000000 01/01/1988 00:00:00
heartbeat: 0 mount id: 0
THREAD #3 - status:0x0 flags:0x0 dirty:0
low cache rba:(0x0.0.0) on disk rba:(0x0.0.0)
on disk scn: 0x0000.00000000 01/01/1988 00:00:00
resetlogs scn: 0x0000.00000000 01/01/1988 00:00:00
heartbeat: 0 mount id: 0
THREAD #4 - status:0x0 flags:0x0 dirty:0
low cache rba:(0x0.0.0) on disk rba:(0x0.0.0)
on disk scn: 0x0000.00000000 01/01/1988 00:00:00
resetlogs scn: 0x0000.00000000 01/01/1988 00:00:00
heartbeat: 0 mount id: 0
THREAD #5 - status:0x0 flags:0x0 dirty:0
low cache rba:(0x0.0.0) on disk rba:(0x0.0.0)
on disk scn: 0x0000.00000000 01/01/1988 00:00:00
resetlogs scn: 0x0000.00000000 01/01/1988 00:00:00
heartbeat: 0 mount id: 0
THREAD #6 - status:0x0 flags:0x0 dirty:0
low cache rba:(0x0.0.0) on disk rba:(0x0.0.0)
on disk scn: 0x0000.00000000 01/01/1988 00:00:00
resetlogs scn: 0x0000.00000000 01/01/1988 00:00:00
***************************************************************************
DATA FILE RECORDS
***************************************************************************
(size = 428, compat size = 428, section max = 100, section in-use = 8,
last-recid= 421, old-recno = 0, last-recno = 0)
(extent = 1, blkno = 11, numrecs = 100)
DATA FILE #1:
(name #10) /s01/oradata/G10R25/datafile/o1_mf_system_7ch7d4mn_.dbf
creation size=0 block size=8192 status=0xe head=10 tail=10 dup=1
tablespace 0, index=1 krfil=1 prev_file=0
unrecoverable scn: 0x0000.00000000 01/01/1988 00:00:00
Checkpoint cnt:418 scn: 0x0000.002d2072 06/03/2012 03:41:16
Stop scn: 0xffff.ffffffff 05/24/2012 09:51:21
Creation Checkpointed at scn:  0x0000.00000007 04/20/2010 08:24:26

 

1988年为什么如此重要?

 

在1988年发布了Oracle V6,首次实现了行级锁定,首次实现了数据库热备份,Oracle公司从Belmont移到加利福尼亚的redwood  shores,并引入了PL/SQL。

 

我相信目前使用版本7-11g  仍沿用了大量的V6的代码,V6的代码中做了大量DEFINE的工作,这大概是一切的开始。

这就好像是”And God said, Let there be light”!

 

 

Technical Data and Computer Software (October 1988) 这个所有权著名,你可以在很多Oracle的早起文档上找到。

 

你肯定又要问 , 我们可以在1988年之前运行Oracle V6以后的程序吗? 假设我们获得了时光机,拿着Oracle 10.2.0.5回到1988年前

 

[root@vrh8 ~]# date -s "1985-07-25 00:00:00"
Thu Jul 25 00:00:00 EDT 1985
SQL> startup;
ORA-01513: invalid current time returned by operating system
[oracle@vrh8 ~]$ strace -o startup.log  -p 9935
Process 9935 attached - interrupt to quit
SQL> startup;
ORA-01513: invalid current time returned by operating system
[oracle@vrh8 ~]$ oerr ora 01513
01513, 00000, "invalid current time returned by operating system"
// *Cause:  The operating system returned a time that was not between
//          1988 and 2121.
// *Action: Correct the time kept by the operating system.

 

 

可以看到 Oracle数据库可运行的时间区间其实是 1988-2121年,500年的SCN headroom其实没什么用处, 没有哪个凡人或者DBA等得起5个世纪!

微观ORACLE(一):PMON Release Lock

我们在学习Oracle基础知识的时候肯定了解过PMON后台进程的功能,包括轮训其他ORACLE PROCESS,清理cleanup dead process并释放锁release enqueue lock ,及清理cleanup latch。

 

虽然这些理论在你耳边萦绕了千百回, 但你是否有亲眼见证过一次Pmon cleanup dead process并release lock呢?大多数人可能没有。

 

微观Oracle=> MicroOracle, Maclean带领你见证微观视角下的Oracle behavior:

 

 

SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE    11.2.0.3.0      Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production
SQL> select * from global_name;
GLOBAL_NAME
--------------------------------------------------------------------------------
www.askmaclean.com 
SQL> select pid,program  from v$process;
       PID PROGRAM
---------- ------------------------------------------------
         1 PSEUDO
         2 oracle@vrh1.oracle.com (PMON)
         3 oracle@vrh1.oracle.com (PSP0)
         4 oracle@vrh1.oracle.com (VKTM)
         5 oracle@vrh1.oracle.com (GEN0)
         6 oracle@vrh1.oracle.com (DIAG)
         7 oracle@vrh1.oracle.com (DBRM)
         8 oracle@vrh1.oracle.com (PING)
         9 oracle@vrh1.oracle.com (ACMS)
        10 oracle@vrh1.oracle.com (DIA0)
        11 oracle@vrh1.oracle.com (LMON)
        12 oracle@vrh1.oracle.com (LMD0)
        13 oracle@vrh1.oracle.com (LMS0)
        14 oracle@vrh1.oracle.com (RMS0)
        15 oracle@vrh1.oracle.com (LMHB)
        16 oracle@vrh1.oracle.com (MMAN)
        17 oracle@vrh1.oracle.com (DBW0)
        18 oracle@vrh1.oracle.com (LGWR)
        19 oracle@vrh1.oracle.com (CKPT)
        20 oracle@vrh1.oracle.com (SMON)
        21 oracle@vrh1.oracle.com (RECO)
        22 oracle@vrh1.oracle.com (RBAL)
        23 oracle@vrh1.oracle.com (ASMB)
        24 oracle@vrh1.oracle.com (MMON)
        25 oracle@vrh1.oracle.com (MMNL)
        26 oracle@vrh1.oracle.com (MARK)
        27 oracle@vrh1.oracle.com (D000)
        28 oracle@vrh1.oracle.com (SMCO)
        29 oracle@vrh1.oracle.com (S000)
        30 oracle@vrh1.oracle.com (LCK0)
        31 oracle@vrh1.oracle.com (RSMN)
        32 oracle@vrh1.oracle.com (TNS V1-V3)
        33 oracle@vrh1.oracle.com (W000)
        34 oracle@vrh1.oracle.com (TNS V1-V3)
        35 oracle@vrh1.oracle.com (TNS V1-V3)
        37 oracle@vrh1.oracle.com (ARC0)
        38 oracle@vrh1.oracle.com (ARC1)
        40 oracle@vrh1.oracle.com (ARC2)
        41 oracle@vrh1.oracle.com (ARC3)
        43 oracle@vrh1.oracle.com (GTX0)
        44 oracle@vrh1.oracle.com (RCBG)
        46 oracle@vrh1.oracle.com (QMNC)
        47 oracle@vrh1.oracle.com (TNS V1-V3)
        48 oracle@vrh1.oracle.com (TNS V1-V3)
        49 oracle@vrh1.oracle.com (Q000)
        50 oracle@vrh1.oracle.com (Q001)
        51 oracle@vrh1.oracle.com (GCR0)
SQL> drop table maclean;
Table dropped.
SQL> create table maclean(t1 int);
Table created.
SQL> insert into maclean values(1);
1 row created.
SQL> commit;
Commit complete.

 

以上构建了示例环境, 可以看到在该实例中:
PID=2  PMON
PID=11 LMON
PID=18 LGWR
PID=20 SMON
PID=12 LMD

 

我们尝试构造2个存在”enq: TX – row lock contention”冲突的事务,之后使KILL掉第一个事务,这会导致需要PMON来recover dead process并release TX lock:

 

PROCESS A:
QL> select addr,spid,pid from v$process where addr = ( select paddr from v$session where sid=(select distinct sid from v$mystat));
ADDR             SPID                            PID
---------------- ------------------------ ----------
00000000BD516B80 17880                            46
SQL> select distinct sid from v$mystat;
       SID
----------
        22
SQL> update maclean set t1=t1+1;
1 row updated.
PROCESS B
SQL> select addr,spid,pid from v$process where addr = ( select paddr from v$session where sid=(select distinct sid from v$mystat));
ADDR             SPID                            PID
---------------- ------------------------ ----------
00000000BD515AD0 17908                            45
SQL> update maclean set t1=t1+1;
HANG..............
PROCESS B 因为"enq: TX – row lock contention"而HANG住

 

 

我们使用PROCESS C启用 对SMON的10500 event trace 和对PMON的KST TRACE:

 

SQL> set linesize 200 pagesize 1400
SQL> select * from v$lock where sid=22;
ADDR             KADDR                   SID TY        ID1        ID2      LMODE    REQUEST      CTIME      BLOCK
---------------- ---------------- ---------- -- ---------- ---------- ---------- ---------- ---------- ----------
00000000BDCD7618 00000000BDCD7670         22 AE        100          0          4          0         48          2
00007F63268A9E28 00007F63268A9E88         22 TM      77902          0          3          0         32          2
00000000B9BB4950 00000000B9BB49C8         22 TX     458765        892          6          0         32          1
PROCESS A holde的ENQUEUE LOCK有三个 AE、TM、TX
SQL> alter system switch logfile;
System altered.
SQL> alter system checkpoint;
System altered.
SQL>  alter system flush buffer_cache;
System altered.
SQL> alter system set "_trace_events"='10000-10999:255:2,20,33';
System altered.
SQL> ! kill -9 17880
KILL PROCESS A 并等待PROCESS B完成update
对PMON 和 PROCESS B 做errorstack 将KST TRACE记录到磁盘
SQL> oradebug setorapid 2;
Oracle pid: 2, Unix process pid: 17533, image: oracle@vrh1.oracle.com (PMON)
SQL> oradebug dump errorstack 4;
Statement processed.
SQL> oradebug tracefile_name
/s01/orabase/diag/rdbms/vprod/VPROD1/trace/VPROD1_pmon_17533.trc
SQL> oradebug setorapid 45;
Oracle pid: 45, Unix process pid: 17908, image: oracle@vrh1.oracle.com (TNS V1-V3)
SQL> oradebug dump errorstack 4;
Statement processed.
SQL>oradebug tracefile_name
/s01/orabase/diag/rdbms/vprod/VPROD1/trace/VPROD1_ora_17908.trc

 

 

分析PMON的 KST TRACE:

 

2012-05-18 10:37:34.557225 :8001ECE8:db_trace:ktur.c@5692:ktugru(): [10444:2:1] next rollback uba: 0x00000000.0000.00
2012-05-18 10:37:34.557382 :8001ECE9:db_trace:ksl2.c@16009:ksl_update_post_stats(): [10005:2:1] KSL POST SENT postee=18 num=4 loc='ksa2.h LINE:285 ID:ksasnd' id1=0 id2=0 name=   type=0
2012-05-18 10:37:34.557514 :8001ECEA:db_trace:ksq.c@8540:ksqrcli(): [10704:2:1] ksqrcl: release TX-0007000d-0000037c mode=X
2012-05-18 10:37:34.558819 :8001ECF0:db_trace:ksl2.c@16009:ksl_update_post_stats(): [10005:2:1] KSL POST SENT postee=45 num=5 loc='kji.h LINE:3418 ID:kjata: wake up enqueue owner' id1=0 id2=0 name=   type=0
2012-05-18 10:37:34.559047 :8001ECF8:db_trace:ksl2.c@16009:ksl_update_post_stats(): [10005:2:1] KSL POST SENT postee=12 num=6 loc='kjm.h LINE:1224 ID:kjmpost: post lmd' id1=0 id2=0 name=   type=0
2012-05-18 10:37:34.559271 :8001ECFC:db_trace:ksq.c@8826:ksqrcli(): [10704:2:1] ksqrcl: SUCCESS
2012-05-18 10:37:34.559291 :8001ECFD:db_trace:ktu.c@8652:ktudnx(): [10813:2:1] ktudnx: dec cnt xid:7.13.892 nax:0 nbx:0
2012-05-18 10:37:34.559301 :8001ECFE:db_trace:ktur.c@3198:ktuabt(): [10444:2:1] ABORT TRANSACTION - xid: 0x0007.00d.0000037c
2012-05-18 10:37:34.559327 :8001ECFF:db_trace:ksq.c@8540:ksqrcli(): [10704:2:1] ksqrcl: release TM-0001304e-00000000 mode=SX
2012-05-18 10:37:34.559365 :8001ED00:db_trace:ksq.c@8826:ksqrcli(): [10704:2:1] ksqrcl: SUCCESS
2012-05-18 10:37:34.559908 :8001ED01:db_trace:ksq.c@8540:ksqrcli(): [10704:2:1] ksqrcl: release AE-00000064-00000000 mode=S
2012-05-18 10:37:34.559982 :8001ED02:db_trace:ksq.c@8826:ksqrcli(): [10704:2:1] ksqrcl: SUCCESS
2012-05-18 10:37:34.560217 :8001ED03:db_trace:ksfd.c@15379:ksfdfods(): [10298:2:1] ksfdfods:fob=0xbab87b48 aiopend=0
2012-05-18 10:37:34.560336 :GSIPC:kjcs.c@4876:kjcsombdi(): GSIPC:SOD: 0xbc79e0c8 action 3 state 0 chunk (nil) regq 0xbc79e108 batq 0xbc79e118
2012-05-18 10:37:34.560357 :GSIPC:kjcs.c@5293:kjcsombdi(): GSIPC:SOD: exit cleanup for 0xbc79e0c8 rc: 1, loc: 0x303
2012-05-18 10:37:34.560375 :8001ED04:db_trace:kss.c@1414:kssdch(): [10809:2:1] kssdch(0xbd516b80 = process, 3) 1 0 exit
2012-05-18 10:37:34.560939 :8001ED06:db_trace:kmm.c@10578:kmmlrl(): [10257:2:1] KMMLRL: Entering: flg(0x0) rflg(0x4)
2012-05-18 10:37:34.561091 :8001ED07:db_trace:kmm.c@10472:kmmlrl_process_events(): [10257:2:1] KMMLRL: Events: succ(3) wait(0) fail(0)
2012-05-18 10:37:34.561100 :8001ED08:db_trace:kmm.c@11279:kmmlrl(): [10257:2:1] KMMLRL: Reg/update: flg(0x0) rflg(0x4)
2012-05-18 10:37:34.563325 :8001ED0B:db_trace:kmm.c@12511:kmmlrl(): [10257:2:1] KMMLRL: Update: ret(0)
2012-05-18 10:37:34.563335 :8001ED0C:db_trace:kmm.c@12768:kmmlrl(): [10257:2:1] KMMLRL: Exiting: flg(0x0) rflg(0x4)
2012-05-18 10:37:34.563354 :8001ED0D:db_trace:ksl2.c@2598:kslwtbctx(): [10005:2:1] KSL WAIT BEG [pmon timer] 300/0x12c 0/0x0 0/0x0 wait_id=78 seq_num=79 snap_id=1

 

 

PMON发现dead process A后首先释放了其手上的TX Lock:
ksqrcl: release TX-0007000d-0000037c mode=X

 

接着它异步Post Process B,告诉Process B 你acquire的TX lock已经被我释放了:
KSL POST SENT postee=45 num=5 loc=’kji.h LINE:3418 ID:kjata: wake up enqueue owner’ id1=0 id2=0 name=   type=0

 

Process B在收到PMON通知后立即继续了工作
ksl2.c@14563:ksliwat(): [10005:45:151] KSL POST RCVD poster=2 num=5 loc=’kji.h LINE:3418 ID:kjata: wake up enqueue owner’ id1=0 id2=0 name=   type=0 fac#=3 posted=0x3 may_be_posted=1
kslwtbctx(): [10005:45:151] KSL WAIT BEG [latch: ges resource hash list] 3162668560/0xbc827e10 91/0x5b 0/0x0 wait_id=14 seq_num=15 snap_id=1
kslwtectx(): [10005:45:151] KSL WAIT END [latch: ges resource hash list] 3162668560/0xbc827e10 91/0x5b 0/0x0 wait_id=14 seq_num=15 snap_id=1

 

 

 

在RAC中它还会POST LMD(lock Manager)进程,通知其改变相关的GES信息:
2012-05-18 10:37:34.559047 :8001ECF8:db_trace:ksl2.c@16009:ksl_update_post_stats(): [10005:2:1] KSL POST SENT postee=12 num=6 loc=’kjm.h LINE:1224 ID:kjmpost: post lmd’ id1=0 id2=0 name=   type=0

 

 

 

之后ksqrcl: release TX的工作才真正成功:
ksq.c@8826:ksqrcli(): [10704:2:1] ksqrcl: SUCCESS

 

之后PMON abort Process A原有的Transaction
2012-05-18 10:37:34.559291 :8001ECFD:db_trace:ktu.c@8652:ktudnx(): [10813:2:1] ktudnx: dec cnt xid:7.13.892 nax:0 nbx:0
2012-05-18 10:37:34.559301 :8001ECFE:db_trace:ktur.c@3198:ktuabt(): [10444:2:1] ABORT TRANSACTION – xid: 0x0007.00d.0000037c

 

释放Process A原本持有的maclean表的TM lock:
ksq.c@8540:ksqrcli(): [10704:2:1] ksqrcl: release TM-0001304e-00000000 mode=SX
ksq.c@8826:ksqrcli(): [10704:2:1] ksqrcl: SUCCESS

 

释放Process A原本持有的AE ( Prevent Dropping an edition in use) lock:
ksq.c@8540:ksqrcli(): [10704:2:1] ksqrcl: release AE-00000064-00000000 mode=S
ksq.c@8826:ksqrcli(): [10704:2:1] ksqrcl: SUCCESS

 

清理cleanup process A
kjcs.c@4876:kjcsombdi(): GSIPC:SOD: 0xbc79e0c8 action 3 state 0 chunk (nil) regq 0xbc79e108 batq 0xbc79e118
GSIPC:kjcs.c@5293:kjcsombdi(): GSIPC:SOD: exit cleanup for 0xbc79e0c8 rc: 1, loc: 0x303
kss.c@1414:kssdch(): [10809:2:1] kssdch(0xbd516b80 = process, 3) 1 0 exit

 

0xbd516b80是原PROCESS A 的paddr 起始地址

kssdch函数的主要作用是 删除process相关的state object SO KSS: delete children of state obj.

 

PMON 调用kmmlrl()函数更新instance goodness统计update for session drop delta
kmmlrl(): [10257:2:1] KMMLRL: Entering: flg(0x0) rflg(0x4)
kmmlrl_process_events(): [10257:2:1] KMMLRL: Events: succ(3) wait(0) fail(0)
kmmlrl(): [10257:2:1] KMMLRL: Reg/update: flg(0x0) rflg(0x4)
kmmlrl(): [10257:2:1] KMMLRL: Update: ret(0)
kmmlrl(): [10257:2:1] KMMLRL: Exiting: flg(0x0) rflg(0x4)

 

完成手头的工作后PMON再次进入 3s一次的”pmon timer”等待
kslwtbctx(): [10005:2:1] KSL WAIT BEG [pmon timer] 300/0x12c 0/0x0 0/0x0 wait_id=78 seq_num=79 snap_id=1

Know More About Oracle Row Lock

我们都知道在Oracle中实现了细粒度的行锁row lock,且在ORACLE的内部实现中没有使用基于内存的行锁管理器,row lock是依赖于数据块本身实现的。换句话说判定一行数据究竟有没有没锁住,要求Server Process去pin住相应的block buffer并检查才能够发现。

但是试想一个场景,若process A 通过update语句锁定了数据表 Z数据块上的一行数据, 并长时间既没有rollback亦没有commit;此时Process B也运行了一条DML语句, 它通过索引找到rowid并找到了 Z数据块, 它发现这一行数据已经被process A发起的一个事务的ITL锁住了,它可能试图做一些cleanout操作,但是发现被锁住的row在相关的事务表中仍未被commit, 那么很抱歉的是Process B需要进入”enq: TX – row lock contention”的等待事件了。 问题在于Process B的无尽等待的终结在哪里呢?

 

有同学肯定会说只要Process A释放了其锁定的row,那么Process B就会立即结束其”enq: TX – row lock contention”等待了。

 

事实是这样的吗?  我们来看一个演示:

 

 

SESSION A:
SQL> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bi
PL/SQL Release 10.2.0.5.0 - Production
CORE    10.2.0.5.0      Production
TNS for Linux: Version 10.2.0.5.0 - Production
NLSRTL Version 10.2.0.5.0 - Production
SQL> select * from global_name;
GLOBAL_NAME
--------------------------------------------------------------------------------
www.askmaclean.com
SQL> create table maclean_lock(t1 int);
Table created.
SQL> insert into maclean_lock values (1);
1 row created.
SQL> commit;
Commit complete.
SQL> select dbms_rowid.rowid_block_number(rowid),dbms_rowid.rowid_relative_fno(rowid) from maclean_lock;
DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID) DBMS_ROWID.ROWID_RELATIVE_FNO(ROWID)
------------------------------------ ------------------------------------
                               67642                                    1
SQL>  select distinct sid from v$mystat;
       SID
----------
       142
SQL> select pid,spid from v$process where addr = ( select paddr from v$session where sid=(select distinct sid from v$mystat));
       PID SPID
---------- ------------
        17 15636
使用SESSION A 创建一个savepoint ,并update 表上的唯一一行数据        
SQL>  savepoint NONLOCK;
Savepoint created.
SQL> select * From v$Lock where sid=142;
no rows selected
SQL> set linesize 140 pagesize 1400
SQL>  update maclean_lock set t1=t1+2;
1 row updated.
SQL> select * From v$Lock where sid=142;
ADDR             KADDR                   SID TY        ID1        ID2      LMODE    REQUEST      CTIME      BLOCK
---------------- ---------------- ---------- -- ---------- ---------- ---------- ---------- ---------- ----------
0000000091FC69F0 0000000091FC6A18        142 TM      55829          0          3          0          6          0
00000000914B4008 00000000914B4040        142 TX     393232        609          6          0          6          0        
SQL> select dump(3,16) from dual;
DUMP(3,16)
--------------------------------------------------------------------------------
Typ=2 Len=2: c1,4
ALTER SYSTEM DUMP DATAFILE 1 BLOCK 67642;
 Object id on Block? Y
 seg/obj: 0xda16  csc: 0x00.234718  itc: 2  flg: O  typ: 1 - DATA
     fsl: 0  fnx: 0x0 ver: 0x01
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x000a.00f.000001e0  0x00800075.02a6.29  C---    0  scn 0x0000.00234711
0x02   0x0007.018.000001fe  0x0080065c.017a.02  ----    1  fsc 0x0000.00000000
data_block_dump,data header at 0x81d185c
===============
tsiz: 0x1fa0
hsiz: 0x14
pbl: 0x081d185c
bdba: 0x0041083a
     76543210
flag=--------
ntab=1
nrow=1
frre=-1
fsbo=0x14
fseo=0x1f9a
avsp=0x1f83
tosp=0x1f83
0xe:pti[0]      nrow=1  offs=0
0x12:pri[0]     offs=0x1f9a
block_row_dump:
tab 0, row 0, @0x1f9a
tl: 6 fb: --H-FL-- lb: 0x2  cc: 1
col  0: [ 2]  c1 04
end_of_block_dump
观察 BLOCK DUMP 可以发现 唯一的一行被XID=0x0007.018.000001fe 的transaction锁定 lb:0x1
启动SESSION B ,执行同样的UPDATE语句 会引发enq: TX - row lock contention 等待
SQL> select distinct sid from v$mystat;
       SID
----------
       140
SQL> select pid,spid from v$process where addr = ( select paddr from v$session where sid=(select distinct sid from v$mystat));
       PID SPID
---------- ------------
        24 15652
SQL> alter system set "_trace_events"='10000-10999:255:24';
System altered.        
SQL> update maclean_lock set t1=t1+2;
select * From v$Lock where sid=142 or sid=140 order by sid;
SESSION C:
SQL> select * From v$Lock where sid=142 or sid=140 order by sid;
ADDR             KADDR                   SID TY        ID1        ID2      LMODE    REQUEST      CTIME      BLOCK
---------------- ---------------- ---------- -- ---------- ---------- ---------- ---------- ---------- ----------
0000000091FC6B10 0000000091FC6B38        140 TM      55829          0          3          0         84          0
00000000924F4A58 00000000924F4A78        140 TX     458776        510          0          6         84          0
00000000914B51E8 00000000914B5220        142 TX     458776        510          6          0        312          1
0000000091FC69F0 0000000091FC6A18        142 TM      55829          0          3          0        312          0
可以看到 SESSION B SID=140 对SESSION A 的TX ENQUEUE 有X mode的REQUEST
SQL> oradebug dump systemstate 266;
Statement processed.
SESSION B waiter's enqueue lock
      SO: 0x924f4a58, type: 5, owner: 0x92bb8dc8, flag: INIT/-/-/0x00
      (enqueue) TX-00070018-000001FE    DID: 0001-0018-00000022
      lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  res_flag: 0x6
      req: X, lock_flag: 0x0, lock: 0x924f4a78, res: 0x925617c0
      own: 0x92b76be0, sess: 0x92b76be0, proc: 0x92a737a0, prv: 0x925617e0
TX-00070018-000001FE=> TX 458776 510
SESSION A owner's enqueue lock 
      SO: 0x914b51e8, type: 40, owner: 0x92b796d0, flag: INIT/-/-/0x00
      (trans) flg = 0x1e03, flg2 = 0xc0000, prx = 0x0, ros = 2147483647 bsn = 0xed5 bndsn = 0xee7 spn = 0xef7
      efd = 3
      file:xct.c lineno:1179
      DID: 0001-0011-000000C2
      parent xid: 0x0000.000.00000000
      env: (scn: 0x0000.00234718  xid: 0x0007.018.000001fe  uba: 0x0080065c.017a.02  statement num=0  parent xid: xid: 0x0000.000.00000000  scn: 0x00
00.00234718 0sch: scn: 0x0000.00000000)
      cev: (spc = 7818  arsp = 914e8310  ubk tsn: 1 rdba: 0x0080065c  useg tsn: 1 rdba: 0x00800069
            hwm uba: 0x0080065c.017a.02  col uba: 0x00000000.0000.00
            num bl: 1 bk list: 0x91435070)
            cr opc: 0x0 spc: 7818 uba: 0x0080065c.017a.02
      (enqueue) TX-00070018-000001FE    DID: 0001-0011-000000C2
      lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  res_flag: 0x6
      mode: X, lock_flag: 0x0, lock: 0x914b5220, res: 0x925617c0
      own: 0x92b796d0, sess: 0x92b796d0, proc: 0x92a6ffd8, prv: 0x925617d0
       xga: 0x8b7c6d40, heap: UGA
      Trans IMU st: 2 Pool index 65535, Redo pool 0x914b58d0, Undo pool 0x914b59b8
      Redo pool range [0x86de640 0x86de640 0x86e0e40]
      Undo pool range [0x86dbe40 0x86dbe40 0x86de640]
        ----------------------------------------
        SO: 0x91435070, type: 39, owner: 0x914b51e8, flag: -/-/-/0x00
        (List of Blocks) next index = 1
        index   itli   buffer hint   rdba       savepoint
        -----------------------------------------------------------
            0      2   0x647f1fc8    0x41083a     0xee7
在SESSION A中 ROLLBACK 到savepoint:
SQL> rollback to NONLOCK;
Rollback complete.
因为这个savepoint 是在update语句之前创建的 所以UPDATE相关的操作应当被 ROLLBACK:
SQL> select * From v$Lock where sid=142 or sid=140;
ADDR             KADDR                   SID TY        ID1        ID2      LMODE    REQUEST      CTIME      BLOCK
---------------- ---------------- ---------- -- ---------- ---------- ---------- ---------- ---------- ----------
00000000924F4A58 00000000924F4A78        140 TX     458776        510          0          6        822          0
0000000091FC6B10 0000000091FC6B38        140 TM      55829          0          3          0        822          0
00000000914B51E8 00000000914B5220        142 TX     458776        510          6          0       1050          1
可以看到 SESSION A 142 回滚到SAVEPOINT 也释放了表上的TM LOCK 
但是请注意 ROLLBACK TO SAVEPOINT并不会释放SESSION已获得TX LOCK!!!!
如上面的输出SESSION 142仍持有TX ID1=458776 ID2=510, 这是因为ROLLBACK TO SAVEPOINT并不会放弃整个事务ABORT TRANSACTION
但是 SESSION B  SID=140仍被  SESSION A 阻塞 , 实际观察也可以发现SESSION B的 update语句仍HANG着。
我们来看一下此时的CACHE中的数据块:
 Object id on Block? Y
 seg/obj: 0xda16  csc: 0x00.2347b7  itc: 2  flg: O  typ: 1 - DATA
     fsl: 0  fnx: 0x0 ver: 0x01
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x000a.00f.000001e0  0x00800075.02a6.29  C---    0  scn 0x0000.00234711
0x02   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
data_block_dump,data header at 0x745d85c
===============
tsiz: 0x1fa0
hsiz: 0x14
pbl: 0x0745d85c
bdba: 0x0041083a
     76543210
flag=--------
ntab=1
nrow=1
frre=-1
fsbo=0x14
fseo=0x1f9a
avsp=0x1f83
tosp=0x1f83
0xe:pti[0]      nrow=1  offs=0
0x12:pri[0]     offs=0x1f9a
block_row_dump:
tab 0, row 0, @0x1f9a
tl: 6 fb: --H-FL-- lb: 0x0  cc: 1
col  0: [ 2]  c1 02
end_of_block_dump
可以看到 ITL=0x02的 事务已经被回滚清理,col  0: [ 2]  c1 02 的数据已经不被锁定
此时若我们另开一个SESSION D ,那么因为没有row lock所以 其UPDATE是可以顺利完成的
SESSION D:
SQL> update maclean_lock set t1=t1+2;
1 row updated.
SQL> rollback;
Rollback complete.
那么SESSION B 为什么无谓地等待着呢?
这就涉及到ORACLE的内部实现机制了, 注意虽然很多时候我们把 TX lock叫做 row lock , 但是实际上它们是2回事。
row lock是基于数据块实现的, 而TX lock则是通过内存中的ENQUEUE LOCK实现的。
问题在于若一个进程PROCESS K在DML过程中发现其所需要的数据行已经被其他进程锁定了,如果不依赖于内存中的TX LOCK, 
这意味着PROCESS Z需要定期去读取检查该数据行锁在的数据块以发现相应的ROW LOCK是否已经被释放了, 
可以想象如果在OLTP环境中这样去设计所造成的性能损失将是巨大的。 
所以ROW LOCK的Release 就需要依赖于TX的ENQUEUE LOCK,大致的过程是这样的Process J 首先锁定了数据块中的一行,
Process K需要更新同样的一行数据 ,Process K读取该行锁在数据块,发现该row piece的lb不是0x0 ,而指向一个ITL,
Process Z分析该ITL就可以得到之前Process J的事务的XID,就可以找到Process J这个事务的TX lock,
PROCESS K 就会在TX resource的Enqueue Waiter Linked List上创建一个X mode(exclusive)的enqueue lock。
这样当Process J释放TX lock时,Process J就会查看该TX resource的Enqueue Waiter Linked List
并发现Process K还在那里等待,并会POST一个信息给Process K说 TX lock已经被我释放,
隐含的意思就是row lock也已经被我释放,你可以继续工作了。
我们来细致入微地观察 这整个过程:
SESSION A 对应的PID =17 我们在之前的演示中已经自解释了这一点
SESSION B 对应的PID =24 
通过之前所做的 "_trace_events"='10000-10999:255:24';  KST trace 可以详细地观察 Server Process的行为
SESSION A PID=17  首先 acqure了SX mode表上的TM Lock ,之后 启动事务Transaction绑定了一个UNDO SEGMENT 7,获取了XID 7.24.510,
并acquire 了X mode的 TX-00070018-000001fe 。
这里可以看到 00070018-000001fe 其实就是 7- 24 - 510即 XID 。
781F4B8A:007A569C    17   142 10704  83 ksqgtl: acquire TM-0000da15-00000000 mode=SX flags=GLOBAL|XACT why="contention"
781F4B92:007A569D    17   142 10704  19 ksqgtl: SUCCESS
781F4BB3:007A569E    17   142 10812   2 0x000000000041083A 0x0000000000000000 0x0000000000234717
781F4BBA:007A569F    17   142 10812   3 0x0000000000000000 0x0000000000000000 0x0000000000000000
781F4BC0:007A56A0    17   142 10812   4 0x0000000000000000 0x0000000000000000 0x0000000000000000
781F4BD3:007A56A1    17   142 10812   5 0x000000000041083A 0x0000000000000000 0x0000000000000000
781F4BFE:007A56A2    17   142 10811   1 0x000000000041083A 0x0000000000000000 0x0000000000234711 0x0000000000000002
781F4C06:007A56A3    17   142 10811   2 0x000000000041083A 0x0000000000000000 0x0000000000234718 0x00007FA074EDA560
781F4C26:007A56A4    17   142 10813   1 ktubnd: Bind usn 7 nax 1 nbx 0 lng 0 par 0
781F4C43:007A56A5    17   142 10813   2 ktubnd: Txn Bound xid: 7.24.510
781F4C4A:007A56A6    17   142 10704  83 ksqgtl: acquire TX-00070018-000001fe mode=X flags=GLOBAL|XACT why="contention"
781F4C51:007A56A7    17   142 10704  19 ksqgtl: SUCCESS
之后因为前台没有操作 所以进入空闲等待
781F4CBF:007A56A8    17   142 10005   1 KSL WAIT BEG [SQL*Net message to client] 1650815232/0x62657100 1/0x1 0/0x0
781F4CCC:007A56A9    17   142 10005   2 KSL WAIT END [SQL*Net message to client] 1650815232/0x62657100 1/0x1 0/0x0 time=13
781F4CDE:007A56AA    17   142 10005   1 KSL WAIT BEG [SQL*Net message from client] 1650815232/0x62657100 1/0x1 0/0x0
786BD85D:007A57E0    17   142 10005   2 KSL WAIT END [SQL*Net message from client] 1650815232/0x62657100 1/0x1 0/0x0 time=5016447
786BD966:007A57E1    17   142 10005   1 KSL WAIT BEG [SQL*Net message to client] 1650815232/0x62657100 1/0x1 0/0x0
786BD96E:007A57E2    17   142 10005   2 KSL WAIT END [SQL*Net message to client] 1650815232/0x62657100 1/0x1 0/0x0 time=8
SESSION B 对应的PID =24  ,它也首先获得了 SX mode的 TM lock,发现row lock后 acquire X mode的TX-00070018-000001fe
ksqgtl: acquire TM-0000da15-00000000 mode=SX flags=GLOBAL|XACT why="contention"
ksqgtl: SUCCESS
0x000000000041083A 0x0000000000000000 0x00000000002354F8
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x000000000041083A 0x0000000000000000 0x00000000002354F8 0x0000000000000001
0x000000000041083A 0x0000000000000000 0x00000000002354F8 0x0000000008A63780
0x0000000000000001 0x0000000000800861 0x0000000000000241 0x0000000000000001
0x000000000041083A 0x0000000000000001 0x0000000000000001
0x000000000041083A 0x0000000000000000 0x00000000002354F9 0x0000000000000002
ksqgtl: acquire TX-00070018-000001fe mode=X flags=GLOBAL|LONG why="row lock contention"
C4048EBD:007F52B6    24   140 10005   2 KSL WAIT END [enq: TX - row lock contention] 1415053318/0x54580006 458776/0x70018 510/0x1fe time=2929879
C4048ED4:007F52B7    24   140 10005   1 KSL WAIT BEG [enq: TX - row lock contention] 1415053318/0x54580006 458776/0x70018 510/0x1fe
C43146CA:007F535E    24   140 10005   2 KSL WAIT END [enq: TX - row lock contention] 1415053318/0x54580006 458776/0x70018 510/0x1fe time=2930676
因为等待的时间过久 ,PID=24 还会定期调用ksqcmi检查是否存在死锁 deadlock
C43146D9:007F535F    24   140 10704 134 ksqcmi: performing local deadlock detection on TX-00070018-000001fe
C43146F8:007F5360    24   140 10704 150 ksqcmi: deadlock not detected on TX-00070018-000001fe
接着 我们对 PID 17 执行ROLLBACK 彻底回滚 ,真正放弃这个事务:
PID 17 
ROLLBACK;
D7A495BB:007F9D3E    17   142 10005   4 KSL POST SENT postee=24 loc='ksqrcl' id1=0 id2=0 name=   type=0
D7A495D8:007F9D3F    17   142 10444  12 ABORT TRANSACTION - xid: 0x0007.018.000001fe
注意  PID 17 查看了 TX resource的Enqueue Waiter linked List 发现了PID 24在等待,于是使用KSL POST SENT 告知 PID 24,
我已经ksqrcl释放了ENQUEUE LOCK
而PID 24也立即收到了KSL POST (KSL POST RCVD poster=17), 并ksqgtl成功获得 TX-00070018-000001fe 随后 ksqrcl释放,
注意PID 24实际不会沿用这个 TX lock和USN ,而是自行绑定了 USN 3 XID 3.11.582 ,并成功acquire TX-0003000b-00000246
D7A49616:007F9D41    24   140 10005   3 KSL POST RCVD poster=17 loc='ksqrcl' id1=0 id2=0 name=   type=0 fac#=0 facpost=1
D7A4961C:007F9D42    24   140 10704  19 ksqgtl: SUCCESS
D7A4967D:007F9D43    24   140 10704 117 ksqrcl: release TX-00070018-000001fe mode=X
D7A496A5:007F9D44    24   140 10813   1 ktubnd: Bind usn 3 nax 1 nbx 0 lng 0 par 0
D7A496C2:007F9D45    24   140 10813   2 ktubnd: Txn Bound xid: 3.11.582
D7A496C7:007F9D46    24   140 10704  83 ksqgtl: acquire TX-0003000b-00000246 mode=X flags=GLOBAL|XACT why="contention"
D7A496E4:007F9D47    24   140 10704  19 ksqgtl: SUCCESS

 

 

 

ROW LOCK的Release 就需要依赖于TX的ENQUEUE LOCK,大致的过程是这样的Process J 首先锁定了数据块中的一行, Process K需要更新同样的一行数据 ,Process K读取该行锁在数据块,发现该row piece的lb不是0x0 ,而指向一个ITL,Process Z分析该ITL就可以得到之前Process J的事务的XID,就可以找到Process J这个事务的TX lock,PROCESS K 就会在TX resource的Enqueue Waiter Linked List上创建一个X mode(exclusive)的enqueue lock。 这样当Process J释放TX lock时,Process J就会查看该TX resource的Enqueue Waiter Linked List 并发现Process K还在那里等待,并会POST一个信息给Process K说 TX lock已经被我释放,隐含的意思就是row lock也已经被我释放,你可以继续工作了。

Know more about Cache Buffer Handle

在之前的文章《latch free:cache buffer handles造成的SQL性能问题》中我介绍了cache buffer handle latch的一些知识,在这里我们复习一下:

“当会话需要pin住buffer header时它首先要获去buffer handle,得到buffer handle的过程中首先要抢占cache buffer handles栓,为了避免对于cache buffer handles栓的过度争用,每个会话被允许cache一小撮buffer handles,也叫保留集(reserved set)。该保留集的上限由隐式参数_db_handles_cached(默认为5)所控制,在此基础上会话在执行不是十分复杂的SQL时不必反复申请栓。
同时存在一些十分复杂的语句,需要同时pin住大量的缓存,此时若存在空闲的handle,会话被允许索要手头5个cached buffer handles以外的handle。也为了限制单个会话的资源占用量,Oracle通过内部算法总是尽量让每个进程能够pin住”合理份额“的buffer,由此单个会话同一时间所能持有的总handle数等于db_block_buffers/processes,隐式参数_cursor_db_buffers_pinned指出了该值。另cache buffer handles并没有子栓,仅有父栓本身,所以如果系统中有大量复杂SQL,会话所能cache的buffer handles远不足以满足需求时,该栓就成为整个系统性能的瓶颈,甚至可能出现潜在的死锁/饿死问题。”


应网友在T.ASKMACLEAN.COM上的提问,我们介绍更多cache Buffer handle相关的知识:

 

cache buffer handle 结构:

 

------------------------------
|  Buffer state object       |
------------------------------
|  Place to hang the buffer  |
------------------------------
|  Consistent Get?           |
------------------------------
|  Proc Owning SO            |
------------------------------
|  Flags(RIR)                |
------------------------------

 

来看一个 cache buffer handle

SO: 70000046fdfe530, type: 24, owner: 70000041b018630, flag: INIT/-/-/0x00
(buffer) (CR) PR: 70000048e92d148 FLG: 0x500000
lock rls: 0, class bit: 0
kcbbfbp: [BH: 7000001c7f069b0, LINK: 70000046fdfe570]
where: kdswh02: kdsgrp, why: 0
BH (7000001c7f069b0) file#: 12 rdba: 0x03061612 (12/398866) class: 1 ba: 7000001c70ee000
set: 75 blksize: 8192 bsi: 0 set-flg: 0 pwbcnt: 0
dbwrid: 2 obj: 66209 objn: 48710 tsn: 6 afn: 12
hash: [700000485f12138,700000485f12138] lru: [70000025af67790,700000132f69ee0]
lru-flags: hot_buffer
ckptq: [NULL] fileq: [NULL] objq: [700000114f5dd10,70000028bf5d620]
use: [70000046fdfe570,70000046fdfe570] wait: [NULL]
st: SCURRENT md: SHR tch: 0
flags: affinity_lock
LRBA: [0x0.0.0] HSCN: [0xffff.ffffffff] HSUB: [65535]
where: kdswh02: kdsgrp, why: 0

# Example:
#   (buffer) (CR) PR: 37290 FLG:    0
#   kcbbfbp    : [BH: befd8, LINK: 7836c] (WAITING)

Buffer handle (X$KCBBF) kernel cache, buffer buffer_handles

Query x$kcbbf  – lists all the buffer handles

相关参数

_db_handles             System-wide simultaneous buffer operations ,no of buffer handles
_db_handles_cached      Buffer handles cached each process , no of processes  default 5
_cursor_db_buffers_pinned  additional number of buffers a cursor can pin at once
_session_kept_cursor_pins       Number of cursors pins to keep in a session

When a buffer is pinned it is attached to buffer state object.

 

 

接下来 我们实际体验一下 cache buffer handles latch 和 buffer pin的影响:

 

 

SESSION A :
SQL> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bi
PL/SQL Release 10.2.0.5.0 - Production
CORE    10.2.0.5.0      Production
TNS for Linux: Version 10.2.0.5.0 - Production
NLSRTL Version 10.2.0.5.0 - Production
SQL> create table test_cbc_handle(t1 int);
Table created.
SQL> insert into test_cbc_handle values(1);
1 row created.
SQL> commit;
Commit complete.
SQL> select rowid from test_cbc_handle; 
ROWID
------------------
AAANO6AABAAAQZSAAA
SQL> select * from test_cbc_handle where rowid='AAANO6AABAAAQZSAAA';
        T1
----------
         1 
SQL> select addr,name from v$latch_parent where name='cache buffer handles';
ADDR             NAME
---------------- --------------------------------------------------
00000000600140A8 cache buffer handles
SQL> select to_number('00000000600140A8','xxxxxxxxxxxxxxxxxxxx') from dual;
TO_NUMBER('00000000600140A8','XXXXXXXXXXXXXXXXXXXX')
----------------------------------------------------
                                          1610694824
注意cache buffer handles只有一个parent latch 而没有 child latch
我们让SESSION A hold 住唯一的一个cache buffer handles parent latch
这会用到 oradebug call kslgetl 命令, kslgetl是oracle内部get latch的函数
SQL> oradebug setmypid;
Statement processed.
SQL> oradebug call kslgetl 1610694824 1;
Function returned 1
另外开一个SESSION B 来观察:
SQL> select * from v$latchholder;
       PID        SID LADDR            NAME                                                                   GETS
---------- ---------- ---------------- ---------------------------------------------------------------- ----------
        15        141 00000000600140A8 cache buffer handles                                                    119
cache buffer handles latch 确实被session A hold住了,这将阻塞其他进程acquire cache buffer handle latch
SQL> select * from test_cbc_handle where rowid='AAANO6AABAAAQZSAAA';
        T1
----------
         1
但此时其他Server Process还是可以正常 read buffer, 这是因为隐藏参数"_db_handles_cached",
默认process会cache 5个 cache buffer handle 
修改"_db_handles_cached"=0,让process不再拥有5个缓存的cache buffer handle ,
这将导致 process 若需要pin buffer,则必须hold cache buffer handle latch以便分配一个cache buffer handle
SQL> alter system set "_db_handles_cached"=0 scope=spfile;
System altered.
并重启实例 
shutdown immediate;
startup;
session A:
SQL> oradebug setmypid;
Statement processed.
SQL> oradebug call kslgetl 1610694824 1;
Function returned 1
session B:
select * from test_cbc_handle where rowid='AAANO6AABAAAQZSAAA';
session B hang!!
WHY?
SQL> oradebug setmypid;
Statement processed.
SQL> oradebug dump systemstate 266;
Statement processed.
  SO: 0x11b30b7b0, type: 2, owner: (nil), flag: INIT/-/-/0x00
  (process) Oracle pid=22, calls cur/top: (nil)/0x11b453c38, flag: (0) -
            int error: 0, call error: 0, sess error: 0, txn error 0
  (post info) last post received: 0 0 0
              last post received-location: No post
              last process to post me: none
              last post sent: 0 0 0
              last post sent-location: No post
              last process posted by me: none
    (latch info) wait_event=0 bits=8
      holding    (efd=4) 600140a8 cache buffer handles level=3
  SO: 0x11b305810, type: 2, owner: (nil), flag: INIT/-/-/0x00
  (process) Oracle pid=10, calls cur/top: 0x11b455ac0/0x11b450a58, flag: (0) -
            int error: 0, call error: 0, sess error: 0, txn error 0
  (post info) last post received: 0 0 0
              last post received-location: No post
              last process to post me: none
              last post sent: 0 0 0
              last post sent-location: No post
              last process posted by me: none
    (latch info) wait_event=0 bits=2
        Location from where call was made: kcbzgs:
      waiting for 600140a8 cache buffer handles level=3
FBD93353:000019F0    10   162 10005   1 KSL WAIT BEG [latch: cache buffer handles] 1610694824/0x600140a8 125/0x7d 0/0x0
FF936584:00002761    10   144 10005   1 KSL WAIT BEG [latch: cache buffer handles] 1610694824/0x600140a8 125/0x7d 0/0x0
PID=22 holding 持有cache buffer handles latch
PID=10 等待 cache buffer handles latch, 这是因为"_db_handles_cached"=0 导致 process自身没有缓存cache buffer handles
这个systemstate中找不到 kcbbfbp cache buffer handle对象,
因为 "_db_handles_cached"=0 且 cache buffer handles latch被hold 住了
我们释放cache buffer handles latch , 来观察 buffer被pin住而不是释放时的情境
session A exit 
session B:
SQL> select * from v$latchholder;
no rows selected
SQL> insert into test_cbc_handle values(2);
1 row created.
SQL> commit;
Commit complete.
SQL>
SQL> select t1,rowid from test_cbc_handle;
        T1 ROWID
---------- ------------------
         1 AAANPAAABAAAQZSAAA
         2 AAANPAAABAAAQZSAAB
SQL> select spid,pid from v$process where addr = ( select paddr from v$session where sid=(select distinct sid from v$mystat));
SPID                PID
------------ ----------
19251                10
用 GDB 对 SPID=19215 做debug , 使用 kcbrls 函数作为breakpoint 这会让 进程无法release buffer
[oracle@vrh8 ~]$ gdb $ORACLE_HOME/bin/oracle 19251
GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-37.el5)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /s01/oracle/product/10.2.0.5/db_1/bin/oracle...(no debugging symbols found)...done.
Attaching to program: /s01/oracle/product/10.2.0.5/db_1/bin/oracle, process 19251
Reading symbols from /s01/oracle/product/10.2.0.5/db_1/lib/libskgxp10.so...(no debugging symbols found)...done.
Loaded symbols for /s01/oracle/product/10.2.0.5/db_1/lib/libskgxp10.so
Reading symbols from /s01/oracle/product/10.2.0.5/db_1/lib/libhasgen10.so...(no debugging symbols found)...done.
Loaded symbols for /s01/oracle/product/10.2.0.5/db_1/lib/libhasgen10.so
Reading symbols from /s01/oracle/product/10.2.0.5/db_1/lib/libskgxn2.so...(no debugging symbols found)...done.
Loaded symbols for /s01/oracle/product/10.2.0.5/db_1/lib/libskgxn2.so
Reading symbols from /s01/oracle/product/10.2.0.5/db_1/lib/libocr10.so...(no debugging symbols found)...done.
Loaded symbols for /s01/oracle/product/10.2.0.5/db_1/lib/libocr10.so
Reading symbols from /s01/oracle/product/10.2.0.5/db_1/lib/libocrb10.so...(no debugging symbols found)...done.
Loaded symbols for /s01/oracle/product/10.2.0.5/db_1/lib/libocrb10.so
Reading symbols from /s01/oracle/product/10.2.0.5/db_1/lib/libocrutl10.so...(no debugging symbols found)...done.
Loaded symbols for /s01/oracle/product/10.2.0.5/db_1/lib/libocrutl10.so
Reading symbols from /s01/oracle/product/10.2.0.5/db_1/lib/libjox10.so...(no debugging symbols found)...done.
Loaded symbols for /s01/oracle/product/10.2.0.5/db_1/lib/libjox10.so
Reading symbols from /s01/oracle/product/10.2.0.5/db_1/lib/libclsra10.so...(no debugging symbols found)...done.
Loaded symbols for /s01/oracle/product/10.2.0.5/db_1/lib/libclsra10.so
Reading symbols from /s01/oracle/product/10.2.0.5/db_1/lib/libdbcfg10.so...(no debugging symbols found)...done.
Loaded symbols for /s01/oracle/product/10.2.0.5/db_1/lib/libdbcfg10.so
Reading symbols from /s01/oracle/product/10.2.0.5/db_1/lib/libnnz10.so...(no debugging symbols found)...done.
Loaded symbols for /s01/oracle/product/10.2.0.5/db_1/lib/libnnz10.so
Reading symbols from /usr/lib64/libaio.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libaio.so.1
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libnsl.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libnsl.so.1
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libnss_files.so.2
0x00000035c000d940 in __read_nocancel () from /lib64/libpthread.so.0
 (gdb) break kcbrls
Breakpoint 1 at 0x10e5d24
session B:
select * from test_cbc_handle where rowid='AAANPAAABAAAQZSAAA'; select hang !!
GDB
(gdb) c
Continuing.
Breakpoint 1, 0x00000000010e5d24 in kcbrls ()
(gdb) bt
#0  0x00000000010e5d24 in kcbrls ()
#1  0x0000000002e87d25 in qertbFetchByUserRowID ()
#2  0x00000000030c62b8 in opifch2 ()
#3  0x00000000032327f0 in kpoal8 ()
#4  0x00000000013b7c10 in opiodr ()
#5  0x0000000003c3c9da in ttcpip ()
#6  0x00000000013b3144 in opitsk ()
#7  0x00000000013b60ec in opiino ()
#8  0x00000000013b7c10 in opiodr ()
#9  0x00000000013a92f8 in opidrv ()
#10 0x0000000001fa3936 in sou2o ()
#11 0x000000000072d40b in opimai_real ()
#12 0x000000000072d35c in main ()
SQL> oradebug setmypid;
Statement processed.
SQL> oradebug dump systemstate 266;
Statement processed.
此时可以看到 kcbbfbp buffer cache handle 将  SO state object 和 BH BUFFER HEADER  link在一起
    ----------------------------------------
    SO: 0x11b452348, type: 3, owner: 0x11b305810, flag: INIT/-/-/0x00
    (call) sess: cur 11b41bd18, rec 0, usr 11b41bd18; depth: 0
      ----------------------------------------
      SO: 0x1182dc750, type: 24, owner: 0x11b452348, flag: INIT/-/-/0x00
      (buffer) (CR) PR: 0x11b305810 FLG: 0x108000
      class bit: (nil)
      kcbbfbp: [BH: 0xf2fc69f8, LINK: 0x1182dc790]
      where: kdswh05: kdsgrp, why: 0
      BH (0xf2fc69f8) file#: 1 rdba: 0x00410652 (1/67154) class: 1 ba: 0xf297c000
        set: 3 blksize: 8192 bsi: 0 set-flg: 2 pwbcnt: 272
        dbwrid: 0 obj: 54208 objn: 54202 tsn: 0 afn: 1
        hash: [f2fc47f8,1181f3038] lru: [f2fc6b88,f2fc6968]
        obj-flags: object_ckpt_list
        ckptq: [1182ecf38,1182ecf38] fileq: [1182ecf58,1182ecf58] objq: [108712a28,108712a28]
        use: [1182dc790,1182dc790] wait: [NULL]
        st: XCURRENT md: SHR tch: 12
        flags: buffer_dirty gotten_in_current_mode block_written_once
                redo_since_read
        LRBA: [0xc7.73b.0] HSCN: [0x0.1cbe52] HSUB: [1]
        Using State Objects
          ----------------------------------------
          SO: 0x1182dc750, type: 24, owner: 0x11b452348, flag: INIT/-/-/0x00
          (buffer) (CR) PR: 0x11b305810 FLG: 0x108000
          class bit: (nil)
          kcbbfbp: [BH: 0xf2fc69f8, LINK: 0x1182dc790]
          where: kdswh05: kdsgrp, why: 0
        buffer tsn: 0 rdba: 0x00410652 (1/67154)
        scn: 0x0000.001cbe52 seq: 0x01 flg: 0x02 tail: 0xbe520601
        frmt: 0x02 chkval: 0x0000 type: 0x06=trans data
tab 0, row 0, @0x1f9a
tl: 6 fb: --H-FL-- lb: 0x0  cc: 1
col  0: [ 2]  c1 02
tab 0, row 1, @0x1f94
tl: 6 fb: --H-FL-- lb: 0x2  cc: 1
col  0: [ 2]  c1 15
end_of_block_dump        
(buffer) (CR) PR: 0x11b305810 FLG: 0x108000
st: XCURRENT md: SHR tch: 12
该 buffer header的status= XCURRENT mode=KCBMSHARE KCBMSHR     current share
也可以通过  x$kcbbf 找到相关的 cache buffer handle
SQL> select distinct KCBBPBH from  x$kcbbf ;
KCBBPBH
----------------
00
00000000F2FC69F8            ==>0xf2fc69f8
SQL> select * from x$kcbbf where kcbbpbh='00000000F2FC69F8';
ADDR                   INDX    INST_ID KCBBFSO_TYP KCBBFSO_FLG KCBBFSO_OWN
---------------- ---------- ---------- ----------- ----------- ----------------
  KCBBFFLG    KCBBFCR    KCBBFCM KCBBFMBR         KCBBPBH
---------- ---------- ---------- ---------------- ----------------
KCBBPBF          X0KCBBPBH        X0KCBBPBF        X1KCBBPBH
---------------- ---------------- ---------------- ----------------
X1KCBBPBF        KCBBFBH            KCBBFWHR   KCBBFWHY
---------------- ---------------- ---------- ----------
00000001182DC750        748          1          24           1 000000011B452348
   1081344          1          0 00               00000000F2FC69F8
00000001182DC750 00               00000001182DC750 00
00000001182DC7F8 00                      583          0
SQL> desc x$kcbbf;
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 ADDR                                               RAW(8)
 INDX                                               NUMBER
 INST_ID                                            NUMBER
 KCBBFSO_TYP                                        NUMBER
 KCBBFSO_FLG                                        NUMBER
 KCBBFSO_OWN                                        RAW(8)
 KCBBFFLG                                           NUMBER
 KCBBFCR                                            NUMBER
 KCBBFCM                                            NUMBER
 KCBBFMBR                                           RAW(8)
 KCBBPBH                                            RAW(8)
 KCBBPBF                                            RAW(8)
 X0KCBBPBH                                          RAW(8)
 X0KCBBPBF                                          RAW(8)
 X1KCBBPBH                                          RAW(8)
 X1KCBBPBF                                          RAW(8)
 KCBBFBH                                            RAW(8)
 KCBBFWHR                                           NUMBER
 KCBBFWHY                                           NUMBER
gdb 退出 让process能够顺利完成kcbrls release buffer后 相关的cache buffer handle被清理
SQL> select distinct KCBBPBH from  x$kcbbf ;
KCBBPBH
----------------
00

Global Cache CR Requested But Current Block Received

这篇文章和之前的《MINSCN与Cache Fusion Read Consistent》 是姊妹篇,他们源于同一个问题帖子

我们来重现提问者所看到的这样一个场景:

 

SQL> select * from V$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE    11.2.0.3.0      Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production
SQL> select count(*) from gv$instance;
COUNT(*)
----------
2
SQL> select * from global_name;
GLOBAL_NAME
--------------------------------------------------------------------------------
www.askmaclean.com

 

 

 

在11gR2 2节点RAC环境中将一个数据块的status修改为XG,假设这个Xcurrent block当前在INSTANCE 2被hold住,这时我们在INSTANCE 1反复查询这个数据块,并观察结果:

 

SQL> select  * from test;
ID
----------
1
2
SQL> select dbms_rowid.rowid_block_number(rowid),dbms_rowid.rowid_relative_fno(rowid) from test;
DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID) DBMS_ROWID.ROWID_RELATIVE_FNO(ROWID)
------------------------------------ ------------------------------------
89233                                    1
89233                                    1
SQL> alter system flush buffer_cache;
System altered.
INSTANCE 1 Session A:
SQL>  update test set id=id+1 where id=1;
1 row updated.
INSTANCE 1 Session B:
SQL> select state,cr_scn_bas from x$bh where file#=1 and dbablk=89233 and state!=0;
STATE CR_SCN_BAS
---------- ----------
1          0
3    1755287
SQL> oradebug setmypid;
Statement processed.
SQL> oradebug dump gc_elements 255;
Statement processed.
SQL> oradebug tracefile_name;
/s01/orabase/diag/rdbms/vprod/VPROD1/trace/VPROD1_ora_19111.trc
GLOBAL CACHE ELEMENT DUMP (address: 0xa4ff3080):
id1: 0x15c91 id2: 0x1 pkey: OBJ#76896 block: (1/89233)
lock: X rls: 0x0 acq: 0x0 latch: 3
flags: 0x20 fair: 0 recovery: 0 fpin: 'kdswh11: kdst_fetch'
bscn: 0x0.146e20 bctx: (nil) write: 0 scan: 0x0
lcp: (nil) lnk: [NULL] lch: [0xa9f6a6f8,0xa9f6a6f8]
seq: 32 hist: 58 145:0 118 66 144:0 192 352 197 48 121 113 424 180 58
LIST OF BUFFERS LINKED TO THIS GLOBAL CACHE ELEMENT:
flg: 0x02000001 lflg: 0x1 state: XCURRENT tsn: 0 tsh: 2
addr: 0xa9f6a5c8 obj: 76896 cls: DATA bscn: 0x0.1ac898
BH (0xa9f6a5c8) file#: 1 rdba: 0x00415c91 (1/89233) class: 1 ba: 0xa9e56000
set: 5 pool: 3 bsz: 8192 bsi: 0 sflg: 3 pwc: 0,15
dbwrid: 0 obj: 76896 objn: 76896 tsn: 0 afn: 1 hint: f
hash: [0x91f4e970,0xbae9d5b8] lru: [0x91f58848,0xa9f6a828]
lru-flags: debug_dump
obj-flags: object_ckpt_list
ckptq: [0x9df6d1d8,0xa9f6a740] fileq: [0xa2ece670,0xbdf4ed68] objq: [0xb4964e00,0xb4964e00] objaq: [0xb4964de0,0xb4964de0]
st: XCURRENT md: NULL fpin: 'kdswh11: kdst_fetch' tch: 2 le: 0xa4ff3080
flags: buffer_dirty redo_since_read
LRBA: [0x19.5671.0] LSCN: [0x0.1ac898] HSCN: [0x0.1ac898] HSUB: [1]
buffer tsn: 0 rdba: 0x00415c91 (1/89233)
scn: 0x0000.001ac898 seq: 0x01 flg: 0x00 tail: 0xc8980601
frmt: 0x02 chkval: 0x0000 type: 0x06=trans data

 

 

可以看到此时block: (1/89233)的GLOBAL CACHE ELEMENT DUMP中LOCK状态仍是X 而非XG , 这是因为这个Current Block仅在一个Instance中被modify修改过,没有在全局范围内被更新过。

 

 

紧接着在Instance 2 修改该块:

 

 

Instance 2 Session C:
SQL> update test set id=id+1 where id=2;
1 row updated.
Instance 2 Session D:
SQL>  select state,cr_scn_bas from x$bh where file#=1 and dbablk=89233 and state!=0;
STATE CR_SCN_BAS
---------- ----------
1          0
3    1756658
SQL> oradebug setmypid;
Statement processed.
SQL> oradebug dump gc_elements 255;
Statement processed.
SQL> oradebug tracefile_name;
/s01/orabase/diag/rdbms/vprod/VPROD2/trace/VPROD2_ora_13038.trc
GLOBAL CACHE ELEMENT DUMP (address: 0x89fb25a0):
id1: 0x15c91 id2: 0x1 pkey: OBJ#76896 block: (1/89233)
lock: XG rls: 0x0 acq: 0x0 latch: 3
flags: 0x20 fair: 0 recovery: 0 fpin: 'kduwh01: kdusru'
bscn: 0x0.1acdf3 bctx: (nil) write: 0 scan: 0x0
lcp: (nil) lnk: [NULL] lch: [0x96f4cf80,0x96f4cf80]
seq: 61 hist: 324 21 143:0 19 16 352 329 144:6 14 7 352 197
LIST OF BUFFERS LINKED TO THIS GLOBAL CACHE ELEMENT:
flg: 0x0a000001 state: XCURRENT tsn: 0 tsh: 1
addr: 0x96f4ce50 obj: 76896 cls: DATA bscn: 0x0.1acdf6
BH (0x96f4ce50) file#: 1 rdba: 0x00415c91 (1/89233) class: 1 ba: 0x96bd4000
set: 5 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 0,15
dbwrid: 0 obj: 76896 objn: 76896 tsn: 0 afn: 1 hint: f
hash: [0x96ee1fe8,0xbae9d5b8] lru: [0x96f4d0b0,0x96f4cdc0]
obj-flags: object_ckpt_list
ckptq: [0xbdf519b8,0x96f4d5a8] fileq: [0xbdf519d8,0xbdf519d8] objq: [0xb4a47b90,0xb4a47b90] objaq: [0x96f4d0e8,0xb4a47b70]
st: XCURRENT md: NULL fpin: 'kduwh01: kdusru' tch: 1 le: 0x89fb25a0
flags: buffer_dirty redo_since_read remote_transfered
LRBA: [0x11.9e18.0] LSCN: [0x0.1acdf6] HSCN: [0x0.1acdf6] HSUB: [1]
buffer tsn: 0 rdba: 0x00415c91 (1/89233)
scn: 0x0000.001acdf6 seq: 0x01 flg: 0x00 tail: 0xcdf60601
frmt: 0x02 chkval: 0x0000 type: 0x06=trans data
GCS CLIENT 0x89fb2618,6 resp[(nil),0x15c91.1] pkey 76896.0
grant 2 cvt 0 mdrole 0x42 st 0x100 lst 0x20 GRANTQ rl G0
master 1 owner 2 sid 0 remote[(nil),0] hist 0x94121c601163423c
history 0x3c.0x4.0xd.0xb.0x1.0xc.0x7.0x9.0x14.0x1.
cflag 0x0 sender 1 flags 0x0 replay# 0 abast (nil).x0.1 dbmap (nil)
disk: 0x0000.00000000 write request: 0x0000.00000000
pi scn: 0x0000.00000000 sq[(nil),(nil)]
msgseq 0x1 updseq 0x0 reqids[6,0,0] infop (nil) lockseq x2b8
pkey 76896.0
hv 93 [stat 0x0, 1->1, wm 32768, RMno 0, reminc 18, dom 0]
kjga st 0x4, step 0.0.0, cinc 20, rmno 6, flags 0x0
lb 0, hb 0, myb 15250, drmb 15250, apifrz 0

 

 

在Instance 2中被修改过后block: (1/89233)的 GLOBAL CACHE ELEMENT Lock Convert成lock: XG

除了通过GC_ELEMENTS DUMP来分析XCUR Cache Fusion外,也可以直接查询X$ VIEW,主要是 X$LE X$KJBR X$KJBL, 这三个X$ VIEW的更多信息可以很方便地从我的博客中找到:

 

 

INSTANCE 2 Session D:
SELECT *
FROM x$le
WHERE le_addr IN (SELECT le_addr
FROM x$bh
WHERE obj IN (SELECT data_object_id
FROM dba_objects
WHERE owner = 'SYS'
AND object_name = 'TEST')
AND class = 1
AND state != 3);
ADDR                   INDX    INST_ID LE_ADDR              LE_ID1     LE_ID2
---------------- ---------- ---------- ---------------- ---------- ----------
LE_RLS     LE_ACQ   LE_FLAGS    LE_MODE   LE_WRITE   LE_LOCAL LE_RECOVERY
---------- ---------- ---------- ---------- ---------- ---------- -----------
LE_BLKS    LE_TIME LE_KJBL
---------- ---------- ----------------
00007F94CA14CF60       7003          2 0000000089FB25A0      89233          1
0          0         32          2          0          1           0
1          0 0000000089FB2618

 

 

PCM Resource NAME由[ID1][ID2],[BL]等组成, ID1和ID2 通过blockno和 fileno计算获得, 这里我们直接参考以上GC_elements dump中的 id1: 0x15c91 id2: 0x1 pkey: OBJ#76896 block: (1/89233)信息 ,则  kjblname 和 kjbrname 应以”[0x15c91][0x1],[BL]” 开头:

 

 

INSTANCE 2 Session D:
SQL> set linesize 80 pagesize 1400
SQL> SELECT *
2    FROM x$kjbl l
3   WHERE l.kjblname LIKE '%[0x15c91][0x1],[BL]%';
ADDR                   INDX    INST_ID KJBLLOCKP        KJBLGRANT KJBLREQUE
---------------- ---------- ---------- ---------------- --------- ---------
KJBLROLE KJBLRESP         KJBLNAME
---------- ---------------- ------------------------------
KJBLNAME2                       KJBLQUEUE
------------------------------ ----------
KJBLLOCKST                                                       KJBLWRITING
---------------------------------------------------------------- -----------
KJBLREQWRITE  KJBLOWNER KJBLMASTER KJBLBLOCKED KJBLBLOCKER    KJBLSID KJBLRDOMID
------------ ---------- ---------- ----------- ----------- ---------- ----------
KJBLPKEY
----------
00007F94CA22A288        451          2 0000000089FB2618 KJUSEREX  KJUSERNL
0 00               [0x15c91][0x1],[BL][ext 0x0,0x
89233,1,BL                              0
GRANTED                                                                    0
0          1          0           0           0          0          0
76896
SQL> SELECT r.* FROM x$kjbr r WHERE r.kjbrname LIKE '%[0x15c91][0x1],[BL]%';
no rows selected
Instance 1 session B:
SQL>  SELECT r.* FROM x$kjbr r WHERE r.kjbrname LIKE '%[0x15c91][0x1],[BL]%';
ADDR                   INDX    INST_ID KJBRRESP         KJBRGRANT KJBRNCVL
---------------- ---------- ---------- ---------------- --------- ---------
KJBRROLE KJBRNAME                       KJBRMASTER KJBRGRANTQ
---------- ------------------------------ ---------- ----------------
KJBRCVTQ         KJBRWRITER          KJBRSID KJBRRDOMID   KJBRPKEY
---------------- ---------------- ---------- ---------- ----------
00007F801ACA68F8       1355          1 00000000B5A62AE0 KJUSEREX  KJUSERNL
0 [0x15c91][0x1],[BL][ext 0x0,0x          0 00000000B48BB330
00               00                        0          0      76896

 

 

接着我们将在Instance 1上查询block: (1/89233),这应当会引发Instance 2 build cr block 并传输给Instance 1, 为了进一步了解其细节 我们分别对 Instance 1的 Foreground Process 和 Instance 2的LMS进程做详细的RAC  TRACE:

 

Instance 2:
[oracle@vrh2 ~]$ ps -ef|grep ora_lms|grep -v grep
oracle   23364     1  0 Apr29 ?        00:33:15 ora_lms0_VPROD2
SQL> oradebug setospid 23364
Oracle pid: 13, Unix process pid: 23364, image: oracle@vrh2.oracle.com (LMS0)
SQL> oradebug event 10046 trace name context forever,level 8:10708 trace name context forever,level 103: trace[rac.*] disk high;
Statement processed.
SQL> oradebug tracefile_name
/s01/orabase/diag/rdbms/vprod/VPROD2/trace/VPROD2_lms0_23364.trc
Instance 1 session B :
SQL> select state,cr_scn_bas from x$bh where file#=1 and dbablk=89233 and state!=0;
STATE CR_SCN_BAS
---------- ----------
3    1756658
3    1756661
3    1755287
Instance 1 session A :
SQL> alter session set events '10046 trace name context forever,level 8:10708 trace name context forever,level 103: trace[rac.*] disk high';
Session altered.
SQL> select * from test;
ID
----------
2
2
SQL> select state,cr_scn_bas from x$bh where file#=1 and dbablk=89233 and state!=0;
STATE CR_SCN_BAS
---------- ----------
3    1761520

 

从x$BH的信息来看,如我们所预期的Instance 1收到了build好的CR block,进一步分析 TRACE 文件:

 

Instance 1 foreground Process:
PARSING IN CURSOR #140336527348792 len=18 dep=0 uid=0 oct=3 lid=0 tim=1335939136125254 hv=1689401402 ad='b1a4c828' sqlid='c99yw1xkb4f1u'
select * from test
END OF STMT
PARSE #140336527348792:c=2999,e=2860,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=1357081020,tim=1335939136125253
EXEC #140336527348792:c=0,e=40,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=1357081020,tim=1335939136125373
WAIT #140336527348792: nam='SQL*Net message to client' ela= 6 driver id=1650815232 #bytes=1 p3=0 obj#=0 tim=1335939136125420
*** 2012-05-02 02:12:16.125
kclscrs: req=0 block=1/89233
2012-05-02 02:12:16.125574 : kjbcro[0x15c91.1 76896.0][4]
*** 2012-05-02 02:12:16.125
kclscrs: req=0 typ=nowait-abort
*** 2012-05-02 02:12:16.125
kclscrs: bid=1:3:1:0:f:1e:0:0:10:0:0:0:1:2:4:1:20:0:0:0:c3:49:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:4:3:2:1:2:0:1c:0:4d:26:a3:52:0:0:0:0:c7:c:ca:62:c3:49:0:0:0:0:1:0:14:8e:47:76:1:2:dc:5:a9:fe:17:75:0:0:0:0:0:0:0:0:0:0:0:0:99:ed:0:0:0:0:0:0:10:0:0:0
2012-05-02 02:12:16.125718 : kjbcro[0x15c91.1 76896.0][4]
2012-05-02 02:12:16.125751 : GSIPC:GMBQ: buff 0xba0ee018, queue 0xbb79a7b8, pool 0x60013fa0, freeq 0, nxt 0xbb79a7b8, prv 0xbb79a7b8
2012-05-02 02:12:16.125780 : kjbsentscn[0x0.1ae0f0][to 2]
2012-05-02 02:12:16.125806 : GSIPC:SENDM: send msg 0xba0ee088 dest x20001 seq 177740 type 36 tkts xff0000 mlen x1680198
2012-05-02 02:12:16.125918 : kjbmscr(0x15c91.1)reqid=0x8(req 0xa4ff30f8)(rinst 1)hldr 2(infosz 200)(lseq x2b8)
2012-05-02 02:12:16.126959 : GSIPC:KSXPCB: msg 0xba0ee088 status 30, type 36, dest 2, rcvr 1
*** 2012-05-02 02:12:16.127
kclwcrs: wait=0 tm=1233
*** 2012-05-02 02:12:16.127
kclwcrs: got 1 blocks from ksxprcv
WAIT #140336527348792: nam='gc cr block 2-way' ela= 1233 p1=1 p2=89233 p3=1 obj#=76896 tim=1335939136127199
2012-05-02 02:12:16.127272 : kjbcrcomplete[0x15c91.1 76896.0][0]
2012-05-02 02:12:16.127309 : kjbrcvdscn[0x0.1ae0f0][from 2][idx 2012-05-02 02:12:16.127329 : kjbrcvdscn[no bscn <= rscn 0x0.1ae0f0][from 2]

 

 

前台进程 kjbcro[0x15c91.1 76896.0][4] kjbsentscn[0x0.1ae0f0][to 2] 向Instance 2请求SCN=1ae0f0=1761520的 block: (1/89233),并进入’gc cr block 2-way’ 等待,之后成功收到请求的CR block。

 

Instance 2 LMS TRACE

 

2012-05-02 02:12:15.634057 : GSIPC:RCVD: ksxp msg 0x7f16e1598588 sndr 1 seq 0.177740 type 36 tkts 0
2012-05-02 02:12:15.634094 : GSIPC:RCVD: watq msg 0x7f16e1598588 sndr 1, seq 177740, type 36, tkts 0
2012-05-02 02:12:15.634108 : GSIPC:TKT: collect msg 0x7f16e1598588 from 1 for rcvr -1, tickets 0
2012-05-02 02:12:15.634162 : kjbrcvdscn[0x0.1ae0f0][from 1][idx 2012-05-02 02:12:15.634186 : kjbrcvdscn[no bscn1, wm 32768, RMno 0, reminc 18, dom 0]
kjga st 0x4, step 0.0.0, cinc 20, rmno 6, flags 0x0
lb 0, hb 0, myb 15250, drmb 15250, apifrz 0
GCS CLIENT END
2012-05-02 02:12:15.635211 : kjbdowncvt[0x15c91.1 76896.0][1][options x0]
2012-05-02 02:12:15.635230 : GSIPC:AMBUF: rcv buff 0x7f16e1c56420, pool rcvbuf, rqlen 1103
2012-05-02 02:12:15.635308 : GSIPC:GPBMSG: new bmsg 0x7f16e1c56490 mb 0x7f16e1c56420 msg 0x7f16e1c564b0 mlen 152 dest x101 flushsz -1
2012-05-02 02:12:15.635334 : kjbmslset(0x15c91.1)) seq 0x4 reqid=0x6 (shadow 0xb48bb330.xb)(rsn 2)(mas@1)
2012-05-02 02:12:15.635355 : GSIPC:SPBMSG: send bmsg 0x7f16e1c56490 blen 184 msg 0x7f16e1c564b0 mtype 33 attr|dest x30101 bsz|fsz x1ffff
2012-05-02 02:12:15.635377 : GSIPC:SNDQ: enq msg 0x7f16e1c56490, type 65521 seq 118669, inst 1, receiver 1, queued 1
*** 2012-05-02 02:12:15.635
kclccctx: cleanup copy 0x7f16e1d94798
2012-05-02 02:12:15.635479 : [kjmpmsgi:compl][type 36][msg 0x7f16e1598588][seq 177740.0][qtime 0][ptime 1257]
2012-05-02 02:12:15.635511 : GSIPC:BSEND: flushing sndq 0xb491dd28, id 1, dcx 0xbc516778, inst 1, rcvr 1  qlen 0 1
2012-05-02 02:12:15.635536 : GSIPC:BSEND: no batch1 msg 0x7f16e1c56490 type 65521 len 184 dest (1:1)
2012-05-02 02:12:15.635557 : kjbsentscn[0x0.1ae0f1][to 1]
2012-05-02 02:12:15.635578 : GSIPC:SENDM: send msg 0x7f16e1c56490 dest x10001 seq 118669 type 65521 tkts x10002 mlen xb800e8
WAIT #0: nam='gcs remote message' ela= 180 waittime=1 poll=0 event=0 obj#=0 tim=1335939135635819
2012-05-02 02:12:15.635853 : GSIPC:RCVD: ksxp msg 0x7f16e167e0b0 sndr 1 seq 0.177741 type 32 tkts 0
2012-05-02 02:12:15.635875 : GSIPC:RCVD: watq msg 0x7f16e167e0b0 sndr 1, seq 177741, type 32, tkts 0
2012-05-02 02:12:15.636012 : GSIPC:TKT: collect msg 0x7f16e167e0b0 from 1 for rcvr -1, tickets 0
2012-05-02 02:12:15.636040 : kjbrcvdscn[0x0.1ae0f1][from 1][idx 2012-05-02 02:12:15.636060 : kjbrcvdscn[no bscn <= rscn 0x0.1ae0f1][from 1]
2012-05-02 02:12:15.636082 : GSIPC:TKT: dest (1:1) rtkt not acked 1  unassigned bufs 0  tkts 0  newbufs 0
2012-05-02 02:12:15.636102 : GSIPC:TKT: remove ctx dest (1:1)
2012-05-02 02:12:15.636125 : [kjmxmpm][type 32][seq 0.177741][msg 0x7f16e167e0b0][from 1]
2012-05-02 02:12:15.636146 : kjbmpocr(0xb0.6)seq 0x1,reqid=0x23a,(client 0x9fff7b58,0x1)(from 1)(lseq xdf0)

 

2号实例上LMS进程的情况是这样的 收到gcs remote message GSIPC 关于要求SCN=[0x0.1ae0f0] block=1/89233的请求,进入BAST kjbmpbast(0x15c91.1),因为 block=1/89233已被写入到磁盘 触发fairness算法(在11.2.0.3中默认_fairness_threshold=2),对current block做KCL: F156: fairness downconvert,从Xcurrent DownConvert为 Scurrent:

 

Instance 2:
SQL> select state,cr_scn_bas from x$bh where file#=1 and dbablk=89233 and state!=0;
STATE CR_SCN_BAS
---------- ----------
2 0
3 1756658

 

之后Instance 2 LMS 将cr block加入到 kjbmslset(0x15c91.1)) 传送队列SEND QUEUE GSIPC:SNDQ: enq msg 0x7f16e1c56490。

 

接着我们再次在Instance 1上运行对 block: (1/89233)的查询 看会发生什么:

 

Instance 2:
SQL> select CURRENT_RESULTS,LIGHT_WORKS from v$cr_block_server;
CURRENT_RESULTS LIGHT_WORKS
--------------- -----------
29273         437
Instance 1 session A:
SQL>
SQL> select * from test;
ID
----------
2
2
SQL> select state,cr_scn_bas from x$bh where file#=1 and dbablk=89233 and state!=0;
STATE CR_SCN_BAS
---------- ----------
3    1761942
3    1761932
1          0
3    1761520
Instance 2:
SQL> select CURRENT_RESULTS,LIGHT_WORKS from v$cr_block_server;
CURRENT_RESULTS LIGHT_WORKS
--------------- -----------
29274         437
select * from test
END OF STMT
PARSE #140336529675592:c=0,e=337,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=1357081020,tim=1335939668940051
EXEC #140336529675592:c=0,e=96,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=1357081020,tim=1335939668940204
WAIT #140336529675592: nam='SQL*Net message to client' ela= 5 driver id=1650815232 #bytes=1 p3=0 obj#=0 tim=1335939668940348
*** 2012-05-02 02:21:08.940
kclscrs: req=0 block=1/89233
2012-05-02 02:21:08.940676 : kjbcro[0x15c91.1 76896.0][5]
*** 2012-05-02 02:21:08.940
kclscrs: req=0 typ=nowait-abort
*** 2012-05-02 02:21:08.940
kclscrs: bid=1:3:1:0:f:21:0:0:10:0:0:0:1:2:4:1:20:0:0:0:c3:49:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:4:3:2:1:2:0:1f:0:4d:26:a3:52:0:0:0:0:c7:c:ca:62:c3:49:0:0:0:0:1:0:17:8e:47:76:1:2:dc:5:a9:fe:17:75:0:0:0:0:0:0:0:0:0:0:0:0:99:ed:0:0:0:0:0:0:10:0:0:0
2012-05-02 02:21:08.940799 : kjbcro[0x15c91.1 76896.0][5]
2012-05-02 02:21:08.940833 : GSIPC:GMBQ: buff 0xba0ee018, queue 0xbb79a7b8, pool 0x60013fa0, freeq 0, nxt 0xbb79a7b8, prv 0xbb79a7b8
2012-05-02 02:21:08.940859 : kjbsentscn[0x0.1ae28c][to 2]
2012-05-02 02:21:08.940870 : GSIPC:SENDM: send msg 0xba0ee088 dest x20001 seq 177810 type 36 tkts xff0000 mlen x1680198
2012-05-02 02:21:08.940976 : kjbmscr(0x15c91.1)reqid=0xa(req 0xa4ff30f8)(rinst 1)hldr 2(infosz 200)(lseq x2b8)
2012-05-02 02:21:08.941314 : GSIPC:KSXPCB: msg 0xba0ee088 status 30, type 36, dest 2, rcvr 1
*** 2012-05-02 02:21:08.941
kclwcrs: wait=0 tm=707
*** 2012-05-02 02:21:08.941
kclwcrs: got 1 blocks from ksxprcv
2012-05-02 02:21:08.941818 : kjbassume[0x15c91.1][sender 2][mymode x1][myrole x0][srole x0][flgs x0][spiscn 0x0.0][swscn 0x0.0]
2012-05-02 02:21:08.941852 : kjbrcvdscn[0x0.1ae28d][from 2][idx 2012-05-02 02:21:08.941871 : kjbrcvdscn[no bscn

 

前台进程与上一次类似申请的是SCN=[0x0.1ae28c]=1761932 Version的CR block, 但这一次receive的居然是Xcurrent Block且其SCN=1ae28d=1761933,Instance 1收到了Xcurrent Block后自己build出了查询所需要的SCN=1761932的CR BLOCK, 因为实际收到的是Current block,所以等待事件变成了’gc current block 2-way’。

这里可以看到前台进程并没有request current block,而仍是使用kjbcro;追根述源是Instance 2的LMS进程给它传送了Current Block:

 

Instance 2 LMS trace:
2012-05-02 02:21:08.448743 : GSIPC:RCVD: ksxp msg 0x7f16e14a4398 sndr 1 seq 0.177810 type 36 tkts 0
2012-05-02 02:21:08.448778 : GSIPC:RCVD: watq msg 0x7f16e14a4398 sndr 1, seq 177810, type 36, tkts 0
2012-05-02 02:21:08.448798 : GSIPC:TKT: collect msg 0x7f16e14a4398 from 1 for rcvr -1, tickets 0
2012-05-02 02:21:08.448816 : kjbrcvdscn[0x0.1ae28c][from 1][idx 2012-05-02 02:21:08.448834 : kjbrcvdscn[no bscn <= rscn 0x0.1ae28c][from 1]
2012-05-02 02:21:08.448857 : GSIPC:TKT: dest (1:1) rtkt not acked 2  unassigned bufs 0  tkts 0  newbufs 0
2012-05-02 02:21:08.448875 : GSIPC:TKT: remove ctx dest (1:1)
2012-05-02 02:21:08.448970 : [kjmxmpm][type 36][seq 0.177810][msg 0x7f16e14a4398][from 1]
2012-05-02 02:21:08.448993 : kjbmpbast(0x15c91.1) reqid=0x6 (req 0xa4ff30f8)(reqinst 1)(reqid 10)(flags x0)
*** 2012-05-02 02:21:08.449
kclcrrf: req=48054 block=1/89233
*** 2012-05-02 02:21:08.449
kcl_compress_block: compressed: 6 free space: 7680
2012-05-02 02:21:08.449085 : kjbsentscn[0x0.1ae28d][to 1]
2012-05-02 02:21:08.449142 : kjbdeliver[to 1][0xa4ff30f8][10][current 1]
2012-05-02 02:21:08.449164 : kjbmssch(reqlock 0xa4ff30f8,10)(to 1)(bsz 344)
2012-05-02 02:21:08.449183 : GSIPC:AMBUF: rcv buff 0x7f16e18bcec8, pool rcvbuf, rqlen 1102
*** 2012-05-02 02:21:08.449
kclccctx: cleanup copy 0x7f16e1d94838
*** 2012-05-02 02:21:08.449
kcltouched: touch seconds 3271
*** 2012-05-02 02:21:08.449
kclgrantlk: req=48054
2012-05-02 02:21:08.449347 : [kjmpmsgi:compl][type 36][msg 0x7f16e14a4398][seq 177810.0][qtime 0][ptime 1119]
WAIT #0: nam='gcs remote message' ela= 568 waittime=1 poll=0 event=0 obj#=0 tim=1335939668449962
2012-05-02 02:21:08.450001 : GSIPC:RCVD: ksxp msg 0x7f16e1bb22a0 sndr 1 seq 0.177811 type 32 tkts 0
2012-05-02 02:21:08.450024 : GSIPC:RCVD: watq msg 0x7f16e1bb22a0 sndr 1, seq 177811, type 32, tkts 0
2012-05-02 02:21:08.450043 : GSIPC:TKT: collect msg 0x7f16e1bb22a0 from 1 for rcvr -1, tickets 0
2012-05-02 02:21:08.450060 : kjbrcvdscn[0x0.1ae28e][from 1][idx 2012-05-02 02:21:08.450078 : kjbrcvdscn[no bscn <= rscn 0x0.1ae28e][from 1]
2012-05-02 02:21:08.450097 : GSIPC:TKT: dest (1:1) rtkt not acked 3  unassigned bufs 0  tkts 0  newbufs 0
2012-05-02 02:21:08.450116 : GSIPC:TKT: remove ctx dest (1:1)
2012-05-02 02:21:08.450136 : [kjmxmpm][type 32][seq 0.177811][msg 0x7f16e1bb22a0][from 1]
2012-05-02 02:21:08.450155 : kjbmpocr(0xb0.6)seq 0x1,reqid=0x23e,(client 0x9fff7b58,0x1)(from 1)(lseq xdf4)

 

 

为什么Instance 2上的LMS要偷懒,不构造build cr block,而直接传输给Instance 1自己所有的Current Block呢?通过观察Instance 2上的v$cr_block_server视图可以发现LIGHT_WORKS字段在发生current block transfer后并没有增长,说明并没有触发 CR server的 Light Work Rule(Light Work Rule是8i Cr Server中就存在的优化算法,当Remote LMS创建 build CR涉及的工作过多时,resource holder的LMS会选择传输现有block,而不完成CR build If creating the consistent read version block involves too much work (such as reading blocks from disk), then the holder sends the block to the requestor, and the requestor completes the CR fabrication. The holder maintains a fairness counter of CR requests. After the fairness threshold is reached, the holder downgrades it to lock mode.)。

 

到底何种条件下 CR Request 才会收到Current Block呢?

答案是:针对不同种类class的block,CR server有不同的做法。 对于undo block或者 undo header block的CR quest, LMS总是传输Current Block, 这是为了给 远程节点 可能正在进行的 block cleanout、 CR  Version构造提供便利。

 

对于普通的数据块 data blocks, 默认情况下总是 CR quest  & CR received的(排除之前所说的Light Work Rule,LMS”偷懒”), 除非Current Block已经DownConvert降级为S lock,那么LMS进程会宣称直接ship一个current version的block。

 

为了证明这一点 , 我们再次测试 ,这次我们将控制DownConvert的隐藏参数”_fairness_threshold“提高到200,这将造成Xcurrent Block无法降级为Scurrent, 也将导致LMS更少地传输Current Version的Data Block:

 

 

 

SQL> show parameter fair
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
_fairness_threshold                  integer     200
Instance 1:
SQL>  update test set id=id+1 where id=4;
1 row updated.
Instance 2:
SQL>  update test set id=id+1 where id=2;
1 row updated.
SQL> select state,cr_scn_bas from x$bh where file#=1 and dbablk=89233 and state!=0;
STATE CR_SCN_BAS
---------- ----------
1          0
3    1838166
在Instance 1上 不断查询,并 通过instance 2的 v$cr_block_server验证
instance 1
SQL> select * from test;
ID
----------
10
3
instance 2:
SQL>  select state,cr_scn_bas from x$bh where file#=1 and dbablk=89233 and state!=0;
STATE CR_SCN_BAS
---------- ----------
1          0
3    1883707
8          0
SQL> select * from test;
ID
----------
10
3
SQL>  select state,cr_scn_bas from x$bh where file#=1 and dbablk=89233 and state!=0;
STATE CR_SCN_BAS
---------- ----------
1          0
3    1883707
8          0
...................
SQL> /
STATE CR_SCN_BAS
---------- ----------
2          0
3    1883707
3    1883695
repeat cr request on Instance 1
SQL> /
STATE CR_SCN_BAS
---------- ----------
8          0
3    1883707
3    1883695

 

 

实际测试发现_fairness_threshold似乎存在某个上限,虽然设置为200 但是在不多的几次CR serve后就Downgrade了lock, 这造成对data block的 CR Request最终还是Receive了 Current Block。

沪ICP备14014813号

沪公网安备 31010802001379号