【Maclean Liu技术分 享】深入理解Oracle中 Mutex的内部原理
本文适合对Oracle Mutex/latch有兴趣了解其深入内部原理的同学
下载地址:
【Maclean Liu技术分享】深入理解Oracle中Mutex的内部原理
Oracle数据库数据恢复、性能优化、故障诊断来问问Maclean
刘相兵 邮箱: liu.maclean@gmail.com 手机: 13764045638, ORA-ALLSTARS Exadata用户组QQ群:23549328
【Maclean Liu技术分 享】深入理解Oracle中 Mutex的内部原理
本文适合对Oracle Mutex/latch有兴趣了解其深入内部原理的同学
下载地址:
【Maclean Liu技术分享】深入理解Oracle中Mutex的内部原理
虽然Mutex中文翻译为互斥锁,但为了和OS mutex充分的区别,所以我们在本文里称Oracle Mutex为Mutex。
Oracle中的mutex,类似于Latch,是一种低级的串行机制,用以控制对SGA中部分共享数据结构的访问控制。 Oracle中的串行机制有不少,引入它们的目的是避免一个对象出现下述现象:
不同于Latch,Mutex的使用更灵活,用途更多,例如:
理论上mutex即可以存放在其保护的结构本身中(其实是嵌入在结构里),也可以存放在其他地方。 一般情况下Mutex是在数据结构需要被保护时动态创建出来的。 如是嵌在需要保护结构体内的mutex,则当 所依附的数据结构被清理时 该mutex也将被摧毁。
虽然mutex和latch都是Oracle中的串行机制,但是mutex具有一些latch没有的好处
Mutex作为Latch的替代品,具有更快速获得,更小等优势。 获取一个mutex进需要大约30~35个指令, 而Latch则需要150~200个指令。一个mutex结构的大小大约为16 bytes,而在10.2版本中一个latch需要112个bytes,在更早的版本中是200个bytes。 从200个bytes 精简到112个是通过减少不必要的统计指标 SLEEP1~SLEEP11、WAITERS_WOKEN, WAITS_HOLDING_LATCH等从而实现的。今后我们将看到更多关于Latch的代码优化。
典型情况下一个Latch保护多个对象。 当一个Latch保护多个热对象时,并行地对这些对象的频繁访问让latch本身变成性能的串行点。 这也就是我们此处说的伪争用点, 因为争用是发生在这个串行保护的机制上,而不是进程去访问的对象本身。与latch不同, 使用mutex的情况下Oracle开发人员可以为每一个要保护的数据结构创建一个独立的mutex。 这意味着Latch的那种伪争用将大大减少,因为每一个对象均被自己独立拥有的mutex保护
一个Mutex可供多个Oracle进程并行地参考,反过来说进程们可以以S(Shared 共享) mode模式参考一个Mutex。以S mode一起共享参考这个mutex的进程的总数成为参考总数reference count。Mutex自身结构中存放了这个ref count的数据。另一方面,mutex也可以被以X (Exclusive)mode排他模式被仅有一个进程所持有Held。
Mutex有2种用途,一方面他们可以充当维护必要串行机制的结构,如同latch那样; 同时也可以充当pin,避免对象被age out。
举例来说,mutex结构中包含的ref count信息可以用作替代library cache pin。 在mutex充当cursor pin之前,当一个进程要执行=>pin一个cursor时需要做的是针对性地创建library cache pin和删除这个library cache pin(均为S mode)。mutex充当cursor pin之后,进程只需要增加和减少mutex上的ref count即可。
当某一个进程首次解析一个游标 Cursor,他将临时创建并移除一个library cache pin,但是该进程后续的解析或执行进要求增加或者减少ref count。注意在这个增加/减少ref count的过程中无需acquire latch,这是因为mutex自身能起到限制串行访问修改ref count的作用。 当一个进程要移除自己的mutex pin时,它减少ref count,同样的无需acquire 任何latch。
Latches和Mutex 是独立的串行机制, 举例来说一个进程可以同时持有latch和mutex。 在进程异常dead的情况下,一般latch要比Mutex更早被PMON清理。 一般情况下不存在mutex的死锁。 不像latch,在早期版本例如9i之前我们经常遇到latch死锁的问题。
在版本10.2中仅仅有 KKS 这个内核层是mutex的客户,KKS 意为 Kernel Kompile Shared, 它是Library Cache中的shared cursor游标共享部分层次的代码。 在之后的版本中,ORACLE开发部门更多地使用了Mutex,不局限于KKS。
kks 使用mutex以便保护对于下述基于parent cursor父游标和子游标child cursor的一系列操作。
在版本10.2中, 以下是几个SQL解析与执行从MUTEX哪里获得主要收益:
通过V$SQLSTAT视图(本质上是X$KKSSQLSTAT)访问SQL统计信息时,其所需要的CPu和获取的Latch数量要远远少于访问其他V$SQL视图。 在早期版本中, 并行地访问V$SQL或者V$SQLAREA视图会造成 library cache latch的争用。
下面是一个AWR中的Mutex Sleep Statistics, 这些数据主要来源于V$MUTEX_SLEEP视图。
| Mutex Type | Location | Sleeps | Wait Time (ms) |
|---|---|---|---|
| Library Cache | kglhdgh1 64 | 2,356 | 0 |
| Library Cache | kglpnal2 91 | 2,345 | 0 |
| Cursor Pin | kkslce [KKSCHLPIN2] | 2,084 | 0 |
| Library Cache | kglpin1 4 | 956 | 0 |
| Library Cache | kglhdgn2 106 | 784 | 0 |
| Library Cache | kglpndl1 95 | 691 | 0 |
| Library Cache | kglpnal1 90 | 605 | 0 |
| Library Cache | kgllkdl1 85 | 580 | 0 |
| Library Cache | kgllkal1 80 | 404 | 0 |
| Library Cache | kglllal3 111 | 282 | 0 |
| Library Cache | kglllal1 109 | 218 | 0 |
| Library Cache | kglhdgn1 62 | 163 | 0 |
| Library Cache | kgllldl2 112 | 156 | 0 |
| Library Cache | kgllkc1 57 | 105 | 0 |
| Library Cache | kglget2 2 | 100 | 0 |
| Library Cache | kglini1 32 | 53 | 0 |
| Library Cache | kglget1 1 | 31 | 0 |
| Cursor Pin | kksLockDelete [KKSCHLPIN6] | 22 | 0 |
| Library Cache | kgllkal3 82 | 18 | 0 |
| Library Cache | kglUnsetHandleReference 120 | 10 | 0 |
| Cursor Pin | kksxsccmp [KKSCHLPIN5] | 10 | 0 |
| Library Cache | kglobld1 75 | 8 | 0 |
| Cursor Pin | kksfbc [KKSCHLPIN1] | 8 | 0 |
| Library Cache | kglUpgradeLock 119 | 7 | 0 |
| Library Cache | kglhdgc1 102 | 2 | 0 |
| Cursor Pin | kksfbc [KKSCHLFSP2] | 2 | 0 |
| Library Cache | kgldtin1 42 | 1 | 0 |
| Library Cache | kglhbh1 63 | 1 | 0 |
| Library Cache | kgllkal5 84 | 1 | 0 |
| Library Cache | kglrdtin1 44 | 1 | 0 |
| Cursor Parent | kkscsPruneChild [KKSPRTLOC35] | 1 | 0 |
Mutex的类型Mutex Type
Mutex的类型其实就是 mutex对应的客户的名字, 在版本10.2中基本只有KKS使用Mutex,所以仅有3种:
在版本11g中扩展了对Mutex的使用,在Library Cache的HASH BUCKET中嵌入了mutex以保护hash bucket,所以多了一种mutex type : Library Cache
哪些代码函数会申请Mutex?
Oracle中哪些代码函数会申请Mutex? 例如KKSFBC等,其实很像 V$LATCH的location列
10.2中最常见的下面的几个函数
kkspsc0 -负责解析游标 – 检测我们正在解析的游标是否有对象的parent cursor heap 0存在
kksfbc – 负责找到合适的子游标 或者创建一个新的子游标
kksFindCursorstat
11g开始有大量函数需要用到Mutex了
SQL> select location from X$MUTEX_SLEEP_HISTORY;
LOCATION
—————————————-
kkslce [KKSCHLPIN2]
kksfbc [KKSCHLFSP2]
kglhdgn2 106
kglpin1 4
kglhdgn2 106
kglllal1 109
kglpin1 4
kglpndl1 95
kglpin1 4
kglpin1 4
kksfbc [KKSCHLFSP2]
kglhdgn1 62
kglpnal1 90
kglllal3 111
kglpnal1 90
kglpnal1 90
kglget2 2
kglllal3 111
kglget2 2
kglobld1 75
kkslce [KKSCHLPIN2]
kglpndl1 95
kglpndl1 95
kglpin1 4
kkslce [KKSCHLPIN2]
kglpin1 4
kglget2 2
kglllal1 109
kgllkc1 57
kglget2 2
kglpnal1 90
kglpin1 4
kglpin1 4
kglpin1 4
kgllkdl1 85
kglllal3 111
kgllldl2 112
kglpin1 4
kglpndl1 95
kkslce [KKSCHLPIN2]
kksLockDelete [KKSCHLPIN6]
kglpndl1 95
kkslce [KKSCHLPIN2]
kglpnal1 90
kglpin1 4
kglpin1 4
kgllldl2 112
kgllkdl1 85
kglpin1 4
kglhdgn2 106
kglhdgn2 106
kksLockDelete [KKSCHLPIN6]
kglhdgn1 62
当一个Mutex被申请时, 一般称为一个get request。 若初始的申请未能得到授权, 则该进程会因为此次申请而进入到255次SPIN中(255是在代码中用常量写死的),每次SPIN循环迭代过程中该进程都会去看看Mutex被释放了吗。
若该Mutex在SPIN之后仍未被释放,则该进程针对申请的mutex进入对应的mutex wait等待事件中。 实际进程的等待事件和等待方式由mutex的类型锁决定,例如 Cursor pin、Cursor Parent。 举例来说,这种等待可能是阻塞等待,也可以是sleep。
但是请注意在V$MUTEX_SLEEP_*视图上的sleep列意味着等待的次数。相关代码函数在开始进入等待时自加这个sleep字段。
等待计时从进程进入等待前开始计算等待时间, 当一个进程结束其等待,则等待的时间加入都总和total中。 该进程再次尝试申请之前的Mutex,若该Mutex仍不可用,则它再次进入spin/wait的循环。
V$MUTEX_SLEEP_HISTORY视图的GETS列仅在成功申请到一个Mutex时才增加。
短期持有一个mutex: spin 循环255次一般可以有效以S mode获得一个mutex, 前提是该Mutex 已经被以S mode持有。 简单来说若有2个进程同时以S mode去申请一个Mutex,则稍晚的一个申请者需要进入SPIN并等稍早一点的申请者完成它的例如创建针对该mutex的一个reference的操作,但这都是非常迅速的操作。
长期持有一个Mutex: 如若一个Mutex已经被某进程以X mode持有, 则往往有其进程以SHRD模式去申请该mutex时仍发现该mutex 以X mode被其他进程所持有,则往往这个EXCL 持有是 LONG_EXCL(可以通过SSD DUMP发现),则后续的申请者往往要进入spin循环,甚至需要等待
上面我讲了willing-to-wait的mutex, 实际上mutex 的申请也可以是 nowait的。进程以nowait申请mutex时不会进入spin-cycle也不sleep,它只继续常规处理。 当一个nowait get失败时,将增加一次miss,但是实际上V$MUTEX_SLEEP_*中记录的miss不是这样的miss, 视图中记录的miss是等待的次数, 对于真正的miss没有统计项。
Wait Time等待时间
类似于latch,spin time 不算做mutex的消耗时间,它只包含等待消耗的时间。
Mutex数据结构中存放了Holder id持有者ID , Ref Count,和其他Mutex相关的统计信息。 Holder id对应于持有该Mutex的session id (v$session.sid) 。 特别注意, Ref Count是进程并发以S mode参考该Mutex的进程数量(如下文的演示)。
当一个Mutex被以X mode 持有,则Holder id 为对应持有该mutex的session id,而Ref Count为0。
每一个共享S mode持有者仅仅增加mutex上的Ref Count。 可供大量session并发以S mode持有参考一个Mutex。 但是注意更新ref count的操作是串行的, 这是为了避免错漏并维护mutex中正确的ref count。
下面我们详细介绍一个执行游标过程中对mutex share pin的过程:
cursor: mutex * events等待事件
‘cursor: pin * events’等待事件
该类等待事件一般是为了pin相关的子游标
V$MUTEX_SLEEP
shows the wait time, and the number of sleeps for each combination of mutex type and location.
| Column | Datatype | Description |
|---|---|---|
MUTEX_TYPE |
VARCHAR2(32) |
Type of action/object the mutex protects |
LOCATION |
VARCHAR2(40) |
The code location where the waiter slept for the mutex |
SLEEPS |
NUMBER |
Number of sleeps for this MUTEX_TYPE and LOCATION |
WAIT_TIME |
NUMBER |
Wait time in microseconds |
V$MUTEX_SLEEP_HISTORY
displays time-series data. Each row in this view is for a specific time, mutex type, location, requesting session and blocking session combination. That is, it shows data related to a specific session (requesting session) that slept while requesting a specific mutex type and location, because it was being held by a specific blocking session. The data in this view is contained within a circular buffer, with the most recent sleeps shown.
| Column | Datatype | Description |
|---|---|---|
SLEEP_TIMESTAMP |
TIMESTAMP(6) |
The last date/time this MUTEX_TYPE and LOCATION was slept for by theREQUESTING_SESSION, while being held by the BLOCKING_SESSION. |
MUTEX_TYPE |
VARCHAR2(32) |
Type of action/object the mutex protects |
GETS |
NUMBER |
The number of times the mutex/location was requested by the requesting session while being held by the blocking session. GETS is only incremented once per request, irrespective of the number of sleeps required to obtain the mutex. |
SLEEPS |
NUMBER |
The number of times the requestor had to sleep before obtaining the mutex |
REQUESTING_SESSION |
NUMBER |
The SID of a session requesting the mutex |
BLOCKING_SESSION |
NUMBER |
The SID of a session holding the mutex |
LOCATION |
VARCHAR2(40) |
The code location where the waiter slept for the mutex |
MUTEX_VALUE |
RAW(4) |
If the mutex is held in exclusive (X) mode, this column shows the SID of the blocking session, else it shows the number of sessions referencing the mutex in S mode. |
P1 |
NUMBER |
Internal use only |
P1RAW |
RAW(4) |
Internal use only |
P2 |
NUMBER |
Internal use only |
P3 |
NUMBER |
Internal use only |
P4 |
NUMBER |
Internal use only |
P5 |
VARCHAR2(64) |
Internal use only |
接着我们会在环境中模拟cursor pin S wait on X的场景,并通过systemstate dump和v$mutex_sleep , v$mutex_sleep_history等视图观察这一现象
session A:
SQL> select * from v$version;
BANNER
—————————————————————-
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 – 64bi
PL/SQL Release 10.2.0.5.0 – Production
CORE 10.2.0.5.0 Production
TNS for Linux: Version 10.2.0.5.0 – Production
NLSRTL Version 10.2.0.5.0 – Production
www.askmaclean.com
SQL> show parameter kks
SQL>
SQL> create table mac_kks tablespace users nologging as select * from dba_objects;
Table created.
SQL> insert /*+ append */ into mac_kks select * from mac_kks;
77386 rows created.
SQL> commit;
Commit complete.
SQL> insert /*+ append */ into mac_kks select * from mac_kks;
154772 rows created.
SQL> commit;
Commit complete.
SQL> insert /*+ append */ into mac_kks select * from mac_kks;
309544 rows created.
SQL> commit;
Commit complete.
SQL> insert /*+ append */ into mac_kks select * from mac_kks;
619088 rows created.
SQL> commit;
Commit complete.
SQL> insert /*+ append */ into mac_kks select * from mac_kks;
1238176 rows created.
SQL> commit;
Commit complete.
SQL> oradebug setmypid
Statement processed.
SQL> oradebug tracefile_name
Statement processed.
SQL> alter table mac_kks add t2 char(2000) default ‘MACLEAN’;
session B:
SQL> oradebug setospid 32424
Oracle pid: 17, Unix process pid: 32424, image: oracle@vrh8.oracle.com (TNS V1-V3)
SQL> oradebug suspend;
Statement processed.
session C:
select * from mac_kks where rownum=1; ==> hang
session D:
select * from mac_kks where rownum=1; ==> hang
session E:
SQL> select sid,event from v$session where wait_class!=’Idle’;
SID EVENT
———- —————————————————————-
141 SQL*Net message to client
145 library cache lock
149 cursor: pin S wait on X
159 log buffer space
SQL> oradebug setmypid
Statement processed.
SQL> oradebug dump systemstate 266;
Statement processed.
SQL> oradebug tracefile_name
/s01/admin/G10R25/udump/g10r25_ora_32537.trc
Object Names
~~~~~~~~~~~~
LOCK: handle=a7115ef0
Mutex 7fff7abadecf
KGX Atomic Operation Log 0x8d88a8d8
Mutex 0x954eaff8(145, 0) idn 7fff7abadecf oper EXCL
Cursor Pin uid 145 efd 0 whr 1 slp 0
opr=3 pso=0x97951af0 flg=0
pcs=0x954eaff8 nxt=(nil) flg=35 cld=0 hd=0xa7864b08 par=0x9523a9e0
ct=0 hsh=0 unp=(nil) unn=0 hvl=9595c3d8 nhv=1 ses=0xa8416738
hep=0x954eb078 flg=80 ld=1 ob=0x95ac6128 ptr=0x8fd90128 fex=0x8fd8f438
0x954eaff8(145, 0) ==> sid和 ref count
pso ==> parent state object
hd=0xa7864b08 ==>cursor 对应的handle address
par ==> 父游标的heap 0 pointer
ses=0xa8416738 ==》 一般 EXCL是才有 session address v$session.saddr
SID=145 对Mutex 0x954eaff8 oper EXCL以X mode Hold 该Mutex, SID=145 在等 SYS.MAC_KKS表的library cache lock,该表被X mode pin和lock,而解析SQL要求以S mode lock该表
SID=149 对Mutex 0x954eaff8 申请 oper GET_SHRD, SID=149在等cursor: pin S wait on X
KGX Atomic Operation Log 0x8db79798
Mutex 0x954eaff8(145, 0) idn 7fff7abadecf oper GET_SHRD
Cursor Pin uid 149 efd 0 whr 5 slp 13893
opr=2 pso=0x8e6bd518 flg=0
pcs=0x954eaff8 nxt=(nil) flg=35 cld=0 hd=0xa7864b08 par=0x9523a9e0
ct=0 hsh=0 unp=(nil) unn=0 hvl=9595c3d8 nhv=1 ses=0xa8416738
hep=0x954eb078 flg=80 ld=1 ob=0x95ac6128 ptr=0x8fd90128 fex=0x8fd8f438
SO: 0xa841bd18, type: 4, owner: 0xa830cf98, flag: INIT/-/-/0×00
(session) sid: 149 trans: (nil), creator: 0xa830cf98, flag: (80000041) USR/- BSY/-/-/-/-/-
DID: 0001-0019-00000066, short-term DID: 0000-0000-00000000
txn branch: (nil)
oct: 0, prv: 0, sql: 0x8d88bf90, psql: (nil), user: 0/SYS
service name: SYS$USERS
O/S info: user: oracle, term: pts/5, ospid: 32510, machine: vrh8.oracle.com
program: sqlplus@vrh8.oracle.com (TNS V1-V3)
application name: sqlplus@vrh8.oracle.com (TNS V1-V3), hash value=543908804
waiting for ‘cursor: pin S wait on X’ wait_time=0, seconds since wait started=0
idn=7abadecf, value=9100000000, where|sleeps=500003645
blocking sess=0x(nil) seq=13901
Dumping Session Wait History
for ‘cursor: pin S wait on X’ count=1 wait_time=0.266596 sec
idn=7abadecf, value=9100000000, where|sleeps=500003644
for ‘cursor: pin S wait on X’ count=1 wait_time=0.010679 sec
idn=7abadecf, value=9100000000, where|sleeps=500003643
for ‘cursor: pin S wait on X’ count=1 wait_time=0.010633 sec
idn=7abadecf, value=9100000000, where|sleeps=500003642
for ‘cursor: pin S wait on X’ count=1 wait_time=0.010843 sec
idn=7abadecf, value=9100000000, where|sleeps=500003641
for ‘cursor: pin S wait on X’ count=1 wait_time=0.011008 sec
idn=7abadecf, value=9100000000, where|sleeps=500003640
for ‘cursor: pin S wait on X’ count=1 wait_time=0.010406 sec
SO: 0xa8416738, type: 4, owner: 0xa830bfa8, flag: INIT/-/-/0×00
(session) sid: 145 trans: (nil), creator: 0xa830bfa8, flag: (80000041) USR/- BSY/-/-/-/-/-
DID: 0001-0017-0000008E, short-term DID: 0000-0000-00000000
txn branch: (nil)
oct: 3, prv: 0, sql: 0x8d88bf90, psql: (nil), user: 0/SYS
service name: SYS$USERS
O/S info: user: oracle, term: pts/4, ospid: 32485, machine: vrh8.oracle.com
program: sqlplus@vrh8.oracle.com (TNS V1-V3)
application name: sqlplus@vrh8.oracle.com (TNS V1-V3), hash value=543908804
waiting for ‘library cache lock’ wait_time=0, seconds since wait started=165
handle address=a7115ef0, lock address=978def20, 100*mode+namespace=c9
blocking sess=0x(nil) seq=9
Dumping Session Wait History
for ‘library cache lock’ count=1 wait_time=3.297287 sec
handle address=a7115ef0, lock address=978def20, 100*mode+namespace=c9
for ‘library cache lock’ count=1 wait_time=2.930321 sec
handle address=a7115ef0, lock address=978def20, 100*mode+namespace=c9
for ‘library cache lock’ count=1 wait_time=2.930856 sec
handle address=a7115ef0, lock address=978def20, 100*mode+namespace=c9
for ‘library cache lock’ count=1 wait_time=2.930698 sec
handle address=a7115ef0, lock address=978def20, 100*mode+namespace=c9
for ‘library cache lock’ count=1 wait_time=2.931518 sec
handle address=a7115ef0, lock address=978def20, 100*mode+namespace=c9
SO: 0x978def20, type: 53, owner: 0xa8456058, flag: INIT/-/-/0×00
LIBRARY OBJECT LOCK: lock=978def20 handle=a7115ef0 request=S
call pin=(nil) session pin=(nil) hpc=0005 hlc=0000
htl=0x978defa0[0x8e4a8950,0x8e4a8950] htb=0x8e4a8950 ssga=0x8e4a7a88
user=a8416738 session=a8416738 count=0 flags=RES/[0010] savepoint=0x1f
LIBRARY OBJECT HANDLE: handle=a7115ef0 mtx=0xa7116020(0) lct=8 pct=10 cdp=0
name=SYS.MAC_KKS
hash=c066b9b9c6c80736a15f5ba325563fdb timestamp=04-14-2013 00:43:14
namespace=TABL flags=KGHP/TIM/SML/[02000000]
kkkk-dddd-llll=0000-0701-0201 lock=X pin=X latch#=3 hpc=0006 hlc=0004
lwt=0xa7115f98[0x978def50,0x978def50] ltm=0xa7115fa8[0xa7115fa8,0xa7115fa8]
pwt=0xa7115f60[0xa7115f60,0xa7115f60] ptm=0xa7115f70[0xa7115f70,0xa7115f70]
ref=0xa7115fc8[0xa7115fc8,0xa7115fc8] lnd=0xa7115fe0[0x9cc2c260,0xa79fd198]
LIBRARY OBJECT: object=95f0e5b0
type=TABL flags=EXS/LOC/UPD[0905] pflags=[0000] status=VALD load=0
DATA BLOCKS:
data# heap pointer status pins change whr
—– ——– ——– ——— —- —— —
0 a78d3840 95f0e708 I/P/A/-/- 0 NONE 00
8 957a6ad8 94461f88 I/P/A/-/- 1 UPDATE 00
Mutex 0x954eaff8 被 SID=145 oper EXCL以X mode Hold 该Mutex
体现为 子游标child cursor被以 X mode pin
SO: 0x8e6bd518, type: 53, owner: 0xa841bd18, flag: INIT/-/-/0×00
LIBRARY OBJECT LOCK: lock=8e6bd518 handle=a7864b08 mode=N
call pin=(nil) session pin=(nil) hpc=0000 hlc=0000
htl=0x8e6bd598[0x8e542f60,0x979da4d0] htb=0x979da4d0 ssga=0x979d96c8
user=a841bd18 session=a841bd18 count=1 flags=CBK[0020] savepoint=0×0
LIBRARY OBJECT HANDLE: handle=a7864b08 mtx=0xa7864c38(0) lct=2 pct=3 cdp=0
namespace=CRSR flags=RON/KGHP/PN0/EXP/[10010100]
kkkk-dddd-llll=0000-0001-0001 lock=N pin=X latch#=3 hpc=0002 hlc=0002
lwt=0xa7864bb0[0xa7864bb0,0xa7864bb0] ltm=0xa7864bc0[0xa7864bc0,0xa7864bc0]
pwt=0xa7864b78[0xa7864b78,0xa7864b78] ptm=0xa7864b88[0xa7864b88,0xa7864b88]
ref=0xa7864be0[0x954eb320,0x954eb320] lnd=0xa7864bf8[0xa7864bf8,0xa7864bf8]
LIBRARY OBJECT: object=95ac6128
type=CRSR flags=EXS[0001] pflags=[0000] status=VALD load=0
DATA BLOCKS:
data# heap pointer status pins change whr
—– ——– ——– ——— —- —— —
0 8d81aa70 95ac6240 I/P/A/-/- 0 NONE 00
6 954eb0f0 8fd90128 I/P/A/-/E 0 NONE 00
并行执行6个SQL语句 并做systemstate dump :
[oracle@vrh8 ~]$ grep SHRD /s01/admin/G10R25/udump/g10r25_ora_32716.trc |grep 0×95987388
Mutex 0×95987388(0, 6) idn 7fff8f32fff6 oper SHRD
Mutex 0×95987388(0, 6) idn 7fff8f32fff6 oper SHRD
Mutex 0×95987388(0, 6) idn 7fff8f32fff6 oper SHRD
Mutex 0×95987388(0, 6) idn 7fff8f32fff6 oper SHRD
Mutex 0×95987388(0, 6) idn 7fff8f32fff6 oper SHRD
Mutex 0×95987388(0, 6) idn 7fff8f32fff6 oper SHRD
to be continued ……………………………
如果在udump/bdump目录下trace中找到
kksfbc : Clear parse Err=44201 xsc=1105fddf0 chd=0 clk=0 400020 40000000 20a2000
kksfbc : Clear parse Err=44201 xsc=1113ad1f8 chd=0 clk=0 400020 40000000 20a2000
的相关信息,一般是可以忽略的。
kksfbc是解析parse 过程中的重要函数,其作用在于find bound cursor、并返回一个合适的子游标, 而相关的内核函数kkspfx则用于pin cursor以便执行。
kksfbc – Find Bound Cursor – creation of a new child cursor
Oracle 中硬解析(Hard Parse)构造子游标 build child cursor的典型stack call:
opitca()<-kksFullTypeCheck()<-rpiswu2()<-kksLoadChild()<-kkslod()<-kglobld()<-kglobpn()<-kglpim()<-kglpin()<-kxsRuntimeLock()<-kksfbc()<-kkspsc0()<-kksParseCursor()<-opiosq0()<-kpooprx()<kpoal8()<-opiodr()<-ttcpip()<-opitsk()<-opiino()<-opiodr()<-opidrv()<-sou2o()<-opimai_real()<-main()
软解析找寻一个合适子游标的典型stack call:
kksSearchChildList ()<-kksfbc()<-kkspsc0()<-kksParseCursor()<-opiosq0()<-kpooprx()<kpoal8()<-opiodr()<-ttcpip()<-opitsk()<-opiino()<-opiodr()<-opidrv()<-sou2o()<-opimai_real()<-main()
执行时找到一个合适子游标的典型stack call:
kksSearchChildList()<-kksfbc()<-opiexe ()<-kpoal8 ()<-opiodr ()<-ttcpip ()<-opitsk ()<-opiino ()<-opiodr ()<-opidrv ()<-sou2o ()<-opimai_real main ()
Continously Receiving "Kksfbc : Clear Parse Err=44201" In Bdump Tracefiles (Doc ID 748940.1) Modified01-MAR-2013TypePROBLEMStatusMODERATED(EXTERNAL)Priority3To Bottom In this Document Symptoms Cause Solution References Platforms: 1-914CU; This document is being delivered to you via Oracle Support's Rapid Visibility (RaV) process and therefore has not been subject to an independent technical review. Applies to: Oracle Server - Enterprise Edition - Version: 10.2.0.3 and later [Release: 10.2 and later ] Information in this document applies to any platform. Symptoms -- Problem Statement: On 10.2.0.3 in Production: we are getting many trace files generated in bdump with the following message : ERROR kksfbc : Clear parse Err=44201 xsc=800003ffbfc3d678 chd=0000000000000000 clk=0000000000000000 400020 0 22080 Cause This issue due to Bug 5397402 "KKSFBC : CLEAR PARSE ERR=44201 XSC=0XB5F48550 CHD=(NIL) CLK=(NIL) Diagnostic output may be written to the session's trace file under some conditions. The extraneous trace looks like this: kksfbc : Clear parse Err=44201 xsc=800003ffbfc3d678 chd=0000000000000000 clk=0000000000000000 400020 0 22080 Solution -- To implement the solution, please execute the following steps:: 1) Apply Patch 5397402 Or 2) Apply the 10.2.0.5 patchset. Also you can safely ignore the generated trace files. References BUG:5397402 - KKSFBC : CLEAR PARSE ERR=44201 XSC=0XB5F48550 CHD=(NIL) CLK=(NIL) IN TRACES
SMON对于Undo(Rollback)segment的日常管理还不止于OFFLINE UNDO SEGMENT ,在AUM(automatic undo management或称SMU)模式下SMON还定期地收缩Shrink Rollback/undo segment。
触发场景
这种AUM下rollback/undo segment的undo extents被shrink的现象可能被多种条件触发:
若smon shrink rollback/undo真的发生时,会这样处理:
计算平均的undo retention大小,按照下列公式:
retention size=(undo_retention * undo_rate)/(#online_transaction_table_segment 在线回滚段的个数)
对于每一个undo segment
注意SMON的定期Shrink,每12个小时才发生一次,具体发生时可以参考SMON进程的TRACE。
若系统中存在大事务,则rollback/undo segment可能扩展到很大的尺寸;视乎事务的大小,则undo tablespace上的undo/rollback segment会呈现出不规则的空间占用分布。
SMON的定期清理undo/rollback segment就是要像一个大锤敲击钢铁那样,把这些大小不规则的online segment清理成大小统一的回滚段,以便今后使用。
当然这种定期的shrink也可能造成一些阻碍,毕竟在shrink过程中会将undo segment header锁住,则事务极低概率可能遇到ORA-1551错误:
[oracle@vmac1 ~]$ oerr ora 1551 01551, 00000, "extended rollback segment, pinned blocks released" // *Cause: Doing recursive extent of rollback segment, trapped internally // by the system // *Action: None
如何禁止SMON SHRINK UNDO SEGMENT?
可以通过设置诊断事件event=’10512 trace name context forever, level 1′来禁用SMON OFFLINE UNDO SEGS;
SQL> select * from global_name; GLOBAL_NAME -------------------------------------------------------------------------------- www.askmaclean.com SQL> alter system set events '10512 trace name context forever,level 1'; System altered.
相关BUG
这些BUG主要集中在9.2.0.8之前,10.2.0.3以后几乎绝迹了:
Bug 1955307 – SMON may self-deadlock (ORA-60) shrinking a rollback segment in SMU mode [ID 1955307.8]
Bug 3476871 : SMON ORA-60 ORA-474 ORA-601 AND DATABASE CRASHED
Bug 5902053 : SMON WAITING ON ‘UNDO SEGMENT TX SLOT’ HANGS DATABASE
Bug 6084112 : INSTANCE SLOW SHOW SEVERAL LONGTIME RUNNING WAIT EVENTS
Database Replay是11g中很酷的特性,对于workload capture的内部工作原理大家理解的不多,这里就介绍一下。
对于Workload Capture需要考虑的因素:
这里需要注意的有2点, 对于RAC集群存在workload capture file的目录必须位于共享文件系统上,否则start_capture时会报错。
对于每个session会多消耗64KB的内存,这是由于本质上写出负载信息到workload capture file的同样是Server Process服务进程本身,但是这种写出并非在parse解析或execution执行阶段,Server Process将其LOGON、LOGOFF、SQL执行等信息记录存放在PGA中,主要是WCR Capture PG、WCR Capture PGA中,当PGA中的工作负载历史记录达到一定数目时,Server Process本身负责写出这些数据到WCR文件中,在写出这些WCR文件时Server Process进入’WCR: capture file IO write’等待事件。
与WCR相关的等待事件另有:
SQL> select * from v$version; BANNER -------------------------------------------------------------------------------- Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production PL/SQL Release 11.2.0.3.0 - Production CORE 11.2.0.3.0 Production TNS for Linux: Version 11.2.0.3.0 - Production NLSRTL Version 11.2.0.3.0 - Production SQL> select name from v$event_name where name like '%WCR%'; NAME ---------------------------------------------------------------- WCR: replay client notify WCR: replay clock WCR: replay lock order WCR: replay paused WCR: RAC message context busy WCR: capture file IO write WCR: Sync context busy latch: WCR: sync latch: WCR: processes HT 11g中还多出了部分为WCR而设的LATCH 1* select name,gets from v$latch where name like '%WCR%' SQL> / NAME GETS ------------------------------ ---------- WCR: kecu cas mem 3 WCR: kecr File Count 37 WCR: MMON Create dir 1 WCR: ticker cache 0 WCR: sync 495 WCR: processes HT 0 WCR: MTS VC queue 0 7 rows selected.
我们通过如下演示具体说明Database Replay Capture的内部原理
1. 首先打开capture dbms_workload_capture.start_capture
CREATE OR REPLACE DIRECTORY dbcapture AS '/home/oracle/dbcapture';
execute dbms_workload_capture.start_capture('CAPTURE','DBCAPTURE',default_action=>'INCLUDE');
SQL> select id,name,status,start_time,end_time,connects,user_calls,dir_path from
dba_workload_captures where id = (select max(id) from dba_workload_captures) ;
ID
----------
NAME
--------------------------------------------------------------------------------
STATUS START_TIM END_TIME CONNECTS
---------------------------------------- --------- --------- ----------
USER_CALLS
----------
DIR_PATH
--------------------------------------------------------------------------------
1
CAPTURE
IN PROGRESS 08-DEC-12 11
ID
----------
NAME
--------------------------------------------------------------------------------
STATUS START_TIM END_TIME CONNECTS
---------------------------------------- --------- --------- ----------
USER_CALLS
----------
DIR_PATH
--------------------------------------------------------------------------------
167
/home/oracle/dbcapture
2. 窥探 capture file目录
[oracle@mlab2 dbcapture]$ ls -lR
.:
total 8
drwxr-xr-x 2 oracle oinstall 4096 Dec 8 07:24 cap
drwxr-xr-x 3 oracle oinstall 4096 Dec 8 07:24 capfiles
-rw-r--r-- 1 oracle oinstall 0 Dec 8 07:24 wcr_cap_00001.start
./cap:
total 4
-rw-r--r-- 1 oracle oinstall 91 Dec 8 07:24 wcr_scapture.wmd
./capfiles:
total 4
drwxr-xr-x 12 oracle oinstall 4096 Dec 8 07:24 inst1
./capfiles/inst1:
total 40
drwxr-xr-x 2 oracle oinstall 4096 Dec 8 08:31 aa
drwxr-xr-x 2 oracle oinstall 4096 Dec 8 07:24 ab
drwxr-xr-x 2 oracle oinstall 4096 Dec 8 07:24 ac
drwxr-xr-x 2 oracle oinstall 4096 Dec 8 07:24 ad
drwxr-xr-x 2 oracle oinstall 4096 Dec 8 07:24 ae
drwxr-xr-x 2 oracle oinstall 4096 Dec 8 07:24 af
drwxr-xr-x 2 oracle oinstall 4096 Dec 8 07:24 ag
drwxr-xr-x 2 oracle oinstall 4096 Dec 8 07:24 ah
drwxr-xr-x 2 oracle oinstall 4096 Dec 8 07:24 ai
drwxr-xr-x 2 oracle oinstall 4096 Dec 8 07:24 aj
./capfiles/inst1/aa:
total 316
-rw-r--r-- 1 oracle oinstall 1762 Dec 8 07:25 wcr_c6cdah0000001.rec
-rw-r--r-- 1 oracle oinstall 16478 Dec 8 07:28 wcr_c6cf1h0000002.rec
-rw-r--r-- 1 oracle oinstall 1772 Dec 8 07:29 wcr_c6cjdh0000004.rec
-rw-r--r-- 1 oracle oinstall 1535 Dec 8 07:29 wcr_c6cnah0000005.rec
-rw-r--r-- 1 oracle oinstall 1821 Dec 8 07:41 wcr_c6cpfh0000007.rec
-rw-r--r-- 1 oracle oinstall 1815 Dec 8 07:33 wcr_c6cq6h000000a.rec
-rw-r--r-- 1 oracle oinstall 1535 Dec 8 07:34 wcr_c6cxmh000000h.rec
-rw-r--r-- 1 oracle oinstall 1427 Dec 8 07:41 wcr_c6cxvh000000j.rec
-rw-r--r-- 1 oracle oinstall 1425 Dec 8 07:41 wcr_c6czph000000k.rec
-rw-r--r-- 1 oracle oinstall 2398 Dec 8 07:49 wcr_c6dqfh000000q.rec
-rw-r--r-- 1 oracle oinstall 259321 Dec 8 08:35 wcr_c6du7h000000r.rec
-rw-r--r-- 1 oracle oinstall 0 Dec 8 07:55 wcr_c6f6yh000000t.rec
-rw-r--r-- 1 oracle oinstall 0 Dec 8 08:28 wcr_c6h3qh0000013.rec
./capfiles/inst1/ab:
total 0
./capfiles/inst1/ac:
total 0
./capfiles/inst1/ad:
total 0
./capfiles/inst1/ae:
total 0
./capfiles/inst1/af:
total 0
./capfiles/inst1/ag:
total 0
./capfiles/inst1/ah:
total 0
./capfiles/inst1/ai:
total 0
./capfiles/inst1/aj:
total 0
[oracle@mlab2 dbcapture]$ cd ./capfiles/inst1/aa
[oracle@mlab2 aa]$ ls -l
total 316
-rw-r--r-- 1 oracle oinstall 1762 Dec 8 07:25 wcr_c6cdah0000001.rec
-rw-r--r-- 1 oracle oinstall 16478 Dec 8 07:28 wcr_c6cf1h0000002.rec
-rw-r--r-- 1 oracle oinstall 1772 Dec 8 07:29 wcr_c6cjdh0000004.rec
-rw-r--r-- 1 oracle oinstall 1535 Dec 8 07:29 wcr_c6cnah0000005.rec
-rw-r--r-- 1 oracle oinstall 1821 Dec 8 07:41 wcr_c6cpfh0000007.rec
-rw-r--r-- 1 oracle oinstall 1815 Dec 8 07:33 wcr_c6cq6h000000a.rec
-rw-r--r-- 1 oracle oinstall 1535 Dec 8 07:34 wcr_c6cxmh000000h.rec
-rw-r--r-- 1 oracle oinstall 1427 Dec 8 07:41 wcr_c6cxvh000000j.rec
-rw-r--r-- 1 oracle oinstall 1425 Dec 8 07:41 wcr_c6czph000000k.rec
-rw-r--r-- 1 oracle oinstall 2398 Dec 8 07:49 wcr_c6dqfh000000q.rec
-rw-r--r-- 1 oracle oinstall 259321 Dec 8 08:35 wcr_c6du7h000000r.rec
-rw-r--r-- 1 oracle oinstall 0 Dec 8 07:55 wcr_c6f6yh000000t.rec
-rw-r--r-- 1 oracle oinstall 0 Dec 8 08:28 wcr_c6h3qh0000013.rec
[oracle@mlab2 aa]$ ls -l |wc -l
14
以上负载目录共14个文件
3. 我们LOGON一个新的Server Process
[oracle@mlab2 ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.3.0 Production on Sat Dec 8 08:37:40 2012
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Automatic Storage Management, OLAP, Data Mining
and Real Application Testing options
会多出一个wcr文件
[oracle@mlab2 aa]$ ls -ltr
total 316
-rw-r--r-- 1 oracle oinstall 1762 Dec 8 07:25 wcr_c6cdah0000001.rec
-rw-r--r-- 1 oracle oinstall 16478 Dec 8 07:28 wcr_c6cf1h0000002.rec
-rw-r--r-- 1 oracle oinstall 1772 Dec 8 07:29 wcr_c6cjdh0000004.rec
-rw-r--r-- 1 oracle oinstall 1535 Dec 8 07:29 wcr_c6cnah0000005.rec
-rw-r--r-- 1 oracle oinstall 1815 Dec 8 07:33 wcr_c6cq6h000000a.rec
-rw-r--r-- 1 oracle oinstall 1535 Dec 8 07:34 wcr_c6cxmh000000h.rec
-rw-r--r-- 1 oracle oinstall 1425 Dec 8 07:41 wcr_c6czph000000k.rec
-rw-r--r-- 1 oracle oinstall 1427 Dec 8 07:41 wcr_c6cxvh000000j.rec
-rw-r--r-- 1 oracle oinstall 1821 Dec 8 07:41 wcr_c6cpfh0000007.rec
-rw-r--r-- 1 oracle oinstall 2398 Dec 8 07:49 wcr_c6dqfh000000q.rec
-rw-r--r-- 1 oracle oinstall 0 Dec 8 07:55 wcr_c6f6yh000000t.rec
-rw-r--r-- 1 oracle oinstall 0 Dec 8 08:28 wcr_c6h3qh0000013.rec
-rw-r--r-- 1 oracle oinstall 259321 Dec 8 08:35 wcr_c6du7h000000r.rec
-rw-r--r-- 1 oracle oinstall 0 Dec 8 08:37 wcr_c6hp4h0000018.rec
但是新出现的wcr_c6hp4h0000018.rec 是空的
SQL> select spid from v$process where addr = ( select paddr from v$session where sid=(select distinct sid from v$mystat));
SPID
------------------------
14293
该新的服务进程的操作系统进程号为14293, 可以看到该进程的打开文件描述符,其中就包含了wcr_c6hp4h0000018.rec
[oracle@mlab2 ~]$ ls -l /proc/14293/fd
total 0
lr-x------ 1 oracle oinstall 64 Dec 8 08:39 0 -> /dev/null
l-wx------ 1 oracle oinstall 64 Dec 8 08:39 1 -> /dev/null
lrwx------ 1 oracle oinstall 64 Dec 8 08:39 10 -> /u01/app/oracle/product/11201/db_1/rdbms/audit/CRMV_ora_14293_1.aud
l-wx------ 1 oracle oinstall 64 Dec 8 08:39 11 -> /u01/app/oracle/diag/rdbms/crmv/CRMV/trace/CRMV_ora_14293.trc
l-wx------ 1 oracle oinstall 64 Dec 8 08:39 12 -> pipe:[34585895]
l-wx------ 1 oracle oinstall 64 Dec 8 08:39 13 -> /u01/app/oracle/diag/rdbms/crmv/CRMV/trace/CRMV_ora_14293.trm
l-wx------ 1 oracle oinstall 64 Dec 8 08:39 2 -> /dev/null
lr-x------ 1 oracle oinstall 64 Dec 8 08:39 3 -> /dev/null
lr-x------ 1 oracle oinstall 64 Dec 8 08:39 4 -> /dev/null
lr-x------ 1 oracle oinstall 64 Dec 8 08:39 5 -> /u01/app/oracle/product/11201/db_1/rdbms/mesg/oraus.msb
lr-x------ 1 oracle oinstall 64 Dec 8 08:39 6 -> /proc/14293/fd
lr-x------ 1 oracle oinstall 64 Dec 8 08:39 7 -> /dev/zero
lrwx------ 1 oracle oinstall 64 Dec 8 08:39 8 -> /home/oracle/dbcapture/capfiles/inst1/aa/wcr_c6hp4h0000018.rec
lr-x------ 1 oracle oinstall 64 Dec 8 08:39 9 -> pipe:[34585894]
也可以通过lsof查到
[root@mlab2 ~]# lsof|grep wcr_c6hp4h0000018.rec
oracle 14293 oracle 8u REG 8,1 0 17629644 /home/oracle/dbcapture/capfiles/inst1/aa/wcr_c6hp4h0000018.rec
可以得出一个结论,一个Server Process对应一个WCR REC文件,当Server Process LOGON时生成该文件
3.执行少量SQL语句:
SQL> select 1 from dual;
1
----------
1
SQL> /
1
----------
1
[oracle@mlab2 aa]$ strings wcr_c6hp4h0000018.rec
==》执行少量SQL无任何时, 不写出任何数据
执行某个超长SQL语句后,可以在下面看到其文件内容包含该WCR文件的版本号,LOGON的登录信息和所执行过的SQL语句历史,但是不包含执行计划
[oracle@mlab2 aa]$ strings wcr_c6hp4h0000018.rec
11.2.0.3.0
*File header info. (Shadow process='14293')
D0576B5D710A34F4E043B201A8C0ECFE
SYS;
NLS_LANGUAGE?
AMERICAN>
NLS_TERRITORY?
AMERICA>
NLS_CURRENCY?
NLS_ISO_CURRENCY?
AMERICA>
NLS_NUMERIC_CHARACTERS?
NLS_CALENDAR? GREGORIAN>
NLS_DATE_FORMAT? DD-MON-RR>
NLS_DATE_LANGUAGE?
AMERICAN>
NLS_CHARACTERSET?
AL32UTF8>
NLS_SORT?
BINARY>
NLS_TIME_FORMAT?
HH.MI.SSXFF AM>
NLS_TIMESTAMP_FORMAT?
DD-MON-RR HH.MI.SSXFF AM>
NLS_TIME_TZ_FORMAT?
HH.MI.SSXFF AM TZR>
NLS_TIMESTAMP_TZ_FORMAT?
DD-MON-RR HH.MI.SSXFF AM TZR>
NLS_DUAL_CURRENCY?
NLS_SPECIAL_CHARS?
NLS_NCHAR_CHARACTERSET?
UTF8>
NLS_COMP?
BINARY>
NLS_LENGTH_SEMANTICS?
BYTE>
NLS_NCHAR_CONV_EXCP?
FALSE
(DESCRIPTION=(ADDRESS=(PROTOCOL=beq)(PROGRAM=/u01/app/oracle/product/11201/db_1/bin/oracle)(ARGV0=oracleCRMV)(ARGS='(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))')(DETACH=NO))(CONNECT_DATA=(CID=(PROGRAM=sqlplus)(HOST=mlab2.oracle.com)(USER=oracle))))
,T$sqlplus@mlab2.oracle.com (TNS V1-V3)U
tselect spid from v$process where addr = ( select paddr from v$session where sid=(select distinct sid from v$mystat))
` _
select 1 from dual
select 1 from dual
执行某些错误语句时甚至会记录其错误信息
[oracle@mlab2 aa]$ strings wcr_c6hp4h0000018.rec
9`9_^B
create table vva(t1 int)
`:_i
:`:_iB
`;_^
;`;_^B
create table vva(t1 int)
`_i
>`>_iB
FusC
`?_^
?`?_^B
FvWC
_begin
for i in 1..50000 loop
execute immediate 'select 1 from dual where 2='||i;
end loop;
end;
若SERVER PROCESS LOGOFF 则会有如下信息
C`E_ B
k^2C
以上说明Server Process并不胡在parse或execution阶段写WCR文件,而是将这些数据存放在PGA中,达到一定大小阀值后才写出,所以其性能影响小,但是如果在WCR数据写出前就意外终止则,这些数据将丢失。
4. 窥视进程信息
SQL> oradebug setmypid
Statement processed.
SQL> oradebug dump processstate 10;
Statement processed.
SQL> oradebug tracefile_name
/u01/app/oracle/diag/rdbms/crmv/CRMV/trace/CRMV_ora_14293.trc
从processstate 文件中可以发现历史等待信息包括 WCR: capture file IO write,这是Server process在写WCR 文件
3: waited for 'SQL*Net message to client'
driver id=0x62657100, #bytes=0x1, =0x0
wait_id=139 seq_num=140 snap_id=1
wait times: snap=0.000007 sec, exc=0.000007 sec, total=0.000007 sec
wait times: max=infinite
wait counts: calls=0 os=0
occurred after 0.934091 sec of elapsed time
4: waited for 'latch: shared pool'
address=0x60106b20, number=0x133, tries=0x0
wait_id=138 seq_num=139 snap_id=1
wait times: snap=0.000066 sec, exc=0.000066 sec, total=0.000066 sec
wait times: max=infinite
wait counts: calls=0 os=0
occurred after 1.180690 sec of elapsed time
5: waited for 'WCR: capture file IO write'
=0x0, =0x0, =0x0
wait_id=137 seq_num=138 snap_id=1
wait times: snap=0.000189 sec, exc=0.000189 sec, total=0.000189 sec
wait times: max=infinite
wait counts: calls=0 os=0
occurred after 3.122783 sec of elapsed time
6: waited for 'WCR: capture file IO write'
=0x0, =0x0, =0x0
wait_id=136 seq_num=137 snap_id=1
wait times: snap=0.000191 sec, exc=0.000191 sec, total=0.000191 sec
wait times: max=infinite
wait counts: calls=0 os=0
occurred after 3.053132 sec of elapsed time
7: waited for 'WCR: capture file IO write'
5.窥视PGA内存信息
SQL> oradebug dump heapdump 536870917;
Statement processed.
grep WCR /u01/app/oracle/diag/rdbms/crmv/CRMV/trace/CRMV_ora_14293.trc
Chunk 7fb1b606bfc0 sz= 65600 freeable "WCR Capture PG " ds=0x7fb1b6115f90
Chunk 7fb1b6111e18 sz= 4224 freeable "WCR Capture PG " ds=0x7fb1b6115f90
Chunk 7fb1b6112e98 sz= 4184 freeable "WCR Capture PG " ds=0x7fb1b6115f90
Chunk 7fb1b6113ef0 sz= 4224 freeable "WCR Capture PG " ds=0x7fb1b6115f90
Chunk 7fb1b6114f70 sz= 4104 recreate "WCR Capture PG " latch=(nil)
Chunk 7fb1b6115f78 sz= 160 freeable "WCR Capture PGA"
Chunk 7fb1b6116018 sz= 3248 freeable "WCR Capture PGA"
Subheap ds=0x7fb1b6115f90 heap name= WCR Capture PG size= 82336
HEAP DUMP heap name="WCR Capture PG" desc=0x7fb1b6115f90
FIVE LARGEST SUB HEAPS for heap name="WCR Capture PG" desc=0x7fb1b6115f9
PGA中存在WCR Capture PG 和WCR Capture PGA的freeable or recreate内存chunk,这些内存可以从Server Process返回给OS
Chunk 7fb1b606bfc0 sz= 65600 freeable "WCR Capture PG " ds=0x7fb1b6115f90
sz= 65600=》 64k 这可能就是文档描述的64k内存,但是实际可以看到使用总量超过了64k
WCR是否可能引起内存泄露呢?:)!
6.隐藏参数
控制WCR CAPTURE的参数至少有以下2个
SQL> SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ
2 FROM SYS.x$ksppi x, SYS.x$ksppcv y
3 WHERE x.inst_id = USERENV ('Instance')
4 AND y.inst_id = USERENV ('Instance')
5 AND x.indx = y.indx
6 AND x.ksppinm in ('_capture_buffer_size','_wcr_control');
NAME VALUE DESCRIB
-------------------- -------------------- ------------------------------------------------------------
_wcr_control 0 Oracle internal test WCR parameter used ONLY for testing!
_capture_buffer_size 65536 To set the size of the PGA I/O recording buffers
其中_capture_buffer_size 控制PGA中WCR BUFFER的SIZE,默认为64k
_wcr_control 控制WCR的内部行为,具体尚不清晰
7.结束capture
SQL> execute dbms_workload_capture.finish_capture;
PL/SQL procedure successfully completed.
DBMS_WORKLOAD_CAPTURE.START_CAPTURE(): Starting database capture at 12/08/2012 07:24:54
DBMS_WORKLOAD_CAPTURE.FINISH_CAPTURE(): Stopped database capture (not all sessions could flush their capture buffers) at 12/08/2012 10:29:49
综上所述,我们得出结论:
1. 负责写WCR WORKLOAD CAPTURE文件的不是什么后台进程,而是Server Process服务进程(前台进程)
2. 每一个server process对应一个WCR文件
3. Server Proess会写出LOGON、LOGOFF、SQL执行以及错误信息到WCR文件中
4. Server Process的写不是立即模式Immediate mode,而是将数据存放在PGA的(WCR Capture) subheap中,当数据达到一定大小或一定时间(timeout才写出一次)
5. 再次强调, Server Process的写不是立即模式Immediate mode,即capture不发生在parse或者execution阶段(很多人认为capture发生在parse时这种说法可以证明为不正确,parse并不受到capture的影响),而仅仅是将LOGON、SQL历史(不包括执行计划)放在PGA的WCR Capture内存子堆中,条件触发才写出,所以其性能损耗小,按照tpcc的测试结论为4.5%
6. 隐藏参数_capture_buffer_size 控制PGA中WCR BUFFER的SIZE,默认为64k
7. WCR Capture文件虽然是binrary 2进制文件,但并未加密,所以第三方软件理论上也可以读取该WCR capture file
8. WCR: capture file IO write等待事件是Server Process在写WCR文件
9. 执行dbms_workload_capture.finish_capture;会让现有Server Process FLUSH WCR buffer,但是就ALERT.LOG的自解释信息看”Stopped database capture (not all sessions could flush their capture buffers)”,这种flush不是必然能回收内存的
从10.2.0.3开始 DB_BLOCK_CHECKSUM有三个选项:OFF TYPICAL FULL
在10.2.0.3之前 DB_BLOCK_CHECKSUM有2个选项: TRUE FALSE
在11g中DB_BLOCK_CHECKSUM和 DB_BLOCK_CHECKING参数 被 DB_ULTRA_SAFE参数整合到一起:
DB_ULTRA_SAFE sets the default values for other parameters that control protection levels.
Values:
OFF
When any of DB_BLOCK_CHECKING, DB_BLOCK_CHECKSUM, or DB_LOST_WRITE_PROTECT are explicitly set, no changes are made.
DATA_ONLY
DB_BLOCK_CHECKING will be set to MEDIUM.
DB_LOST_WRITE_PROTECT will be set to TYPICAL.
DB_BLOCK_CHECKSUM will be set to FULL.
DATA_AND_INDEX
DB_BLOCK_CHECKING will be set to FULL.
DB_LOST_WRITE_PROTECT will be set to TYPICAL.
DB_BLOCK_CHECKSUM will be set to FULL.
DB_BLOCK_CHECKSUM 参数决定了DBWn进程和直接路径读取进程是否为块计算checksum并将该checksum存放在每个数据块的cache header并写入到磁盘中。当该数据块被读取时,该checksum会受到验证, 前提是DB_BLOCK_CHECKSUM 被设置为TYPICAL 或 FULL,且最近一次该块的写出中存有checksum。
在FULL模式下,Oracle还会当块要发生变化应用前对该块验证checksum,并会在DML update/insert/delete语句引起变化被应用到块后再次计算该checksum。
此外,Oracle会对写入到当前redo日志文件的每一个redo block计算checksum。
若该参数设置为OFF,则 DBWn进程仅为system表空间上的对象计算checksum, 而对于普通表空间不计算。
checksum让Oracle具备检测由底层磁盘、存储子系统、IO子系统引起的坏块。
若设置为FULL, 则DB_BLOCK_CHECKSUM还会捕捉内存讹误并避免将存在逻辑讹误的块被写入到磁盘上。
官方文档介绍设置DB_BLOCK_CHECKSUM为TYPICAL模式可能引起1%-2%的性能损耗,设置为FULL mode可能引起4%-5%的性能损耗。
除非有极高的数据安全性要求,否则推荐用户设置DB_BLOCK_CHECKSUM为TYPICAL。
实际测试发现在 Linux +版本11.2.0.3上DB_BLOCK_CHECKSUM=FULL对普通的内存讹误(memory corruption)几乎无效,详见以下测试:
SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE 11.2.0.3.0 Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production
SQL> show parameter db_block
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
db_block_buffers integer 0
db_block_checking string FULL
db_block_checksum string FULL
db_block_size integer 8192
create table maclean_corruption(t1 int) tablespace system;
insert into maclean_corruption values(1);
insert into maclean_corruption values(1);
commit;
1* select t1,dump(t1,16) from maclean_corruption
SQL> /
T1 DUMP(T1,16)
---------- ------------------------------
1 Typ=2 Len=2: c1,2
1 Typ=2 Len=2: c1,2
SQL> select dbms_rowid.rowid_block_number(rowid),dbms_rowid.rowid_relative_fno(rowid) from maclean_corruption;
DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID) DBMS_ROWID.ROWID_RELATIVE_FNO(ROWID)
------------------------------------ ------------------------------------
89985 1
89985 1
ALTER SESSION SET EVENTS 'immediate trace name buffers level 3';
SQL> oradebug setmypid
Statement processed.
SQL> oradebug tracefile_name
/s01/diag/rdbms/prodb/PRODB/trace/PRODB_ora_28545.trc
BH (0xb0f748c8) file#: 1 rdba: 0x00415f81 (1/89985) class: 1 ba: 0xb0154000
set: 6 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 0,0
dbwrid: 0 obj: 76977 objn: 76977 tsn: 0 afn: 1 hint: f
hash: [0xce69cd58,0xce69cd58] lru: [0xb0f74ae0,0xb0f74880]
obj-flags: object_ckpt_list
ckptq: [0xb0f6c168,0xb0f76a08] fileq: [0xb0f6c178,0xb0f76a18] objq: [0xc70de768,0xc70de768] objaq: [0xc70de748,0xc70de748]
st: XCURRENT md: NULL fpin: 'kcbwh6: kcbnew' tch: 4
flags: buffer_dirty redo_since_read
LRBA: [0xc.92.0] LSCN: [0x0.2329de] HSCN: [0x0.2329e0] HSUB: [1]
buffer tsn: 0 rdba: 0x00415f81 (1/89985)
scn: 0x0000.002329e0 seq: 0x01 flg: 0x06 tail: 0x29e00601
frmt: 0x02 chkval: 0x0a60 type: 0x06=trans data
Hex dump of block: st=0, typ_found=1
Dump of memory from 0x00000000B0154000 to 0x00000000B0156000
0B0154000 0000A206 00415F81 002329E0 06010000 [....._A..)#.....]
0B0154010 00000A60 00000001 00012CB1 002329DD [`........,...)#.]
0B0154020 00000000 00030002 00000000 001E0002 [................]
0B0154030 00000348 00C0312E 0032008E 00002002 [H....1....2.. ..]
0B0154040 002329E0 00000000 00000000 00000000 [.)#.............]
0B0154050 00000000 00000000 00000000 00020100 [................]
0B0154060 0016FFFF 1F781F94 00001F78 1F9A0002 [......x.x.......]
0B0154070 00001F94 00000000 00000000 00000000 [................]
0B0154080 00000000 00000000 00000000 00000000 [................]
Repeat 497 times
0B0155FA0 5F840000 C1020041 41000645 41004C50 [..._A...E..APL.A]
0B0155FB0 00415F83 0432C102 275F4100 00000000 [._A...2..A_'....]
0B0155FC0 00000000 00000000 00000000 00000000 [................]
Repeat 2 times
0B0155FF0 0201012C 012C02C1 02C10201 29E00601 [,.....,........)]
Block header dump: 0x00415f81
Object id on Block? Y
seg/obj: 0x12cb1 csc: 0x00.2329dd itc: 2 flg: O typ: 1 - DATA
fsl: 0 fnx: 0x0 ver: 0x01
Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x0002.01e.00000348 0x00c0312e.008e.32 --U- 2 fsc 0x0000.002329e0
0x02 0x0000.000.00000000 0x00000000.0000.00 ---- 0 fsc 0x0000.00000000
bdba: 0x00415f81
data_block_dump,data header at 0xb015405c
===============
tsiz: 0x1fa0
hsiz: 0x16
pbl: 0xb015405c
76543210
flag=--------
ntab=1
nrow=2
frre=-1
fsbo=0x16
fseo=0x1f94
avsp=0x1f78
tosp=0x1f78
0xe:pti[0] nrow=2 offs=0
0x12:pri[0] offs=0x1f9a
0x14:pri[1] offs=0x1f94
block_row_dump:
tab 0, row 0, @0x1f9a
tl: 6 fb: --H-FL-- lb: 0x1 cc: 1
col 0: [ 2] c1 02
tab 0, row 1, @0x1f94
tl: 6 fb: --H-FL-- lb: 0x1 cc: 1
col 0: [ 2] c1 02
end_of_block_dump
0B0155FF0 0201012C 012C02C1 02C10201 29E00601 [,.....,........)]
0B0155FF4 012C02C1 这里的 02C1即 C102=> 十进制的 "1"
0B0155FF8 02C10201 同样的 02C1即 C102=> 十进制的 "1"
修改 02C1为 04C1 即将 1修改为 3
oradebug setmypid
oradebug poke 0x0B0155FF4 4 0x012C04C1
SQL> select t1,dump(t1,16) from maclean_corruption;
T1 DUMP(T1,16)
---------- ------------------------------
1 Typ=2 Len=2: c1,2
3 Typ=2 Len=2: c1,4
SQL> delete maclean_corruption where t1=3;
1 row deleted.
SQL> commit;
Commit complete.
SQL> alter system flush buffer_cache;
System altered.
SQL> select * from maclean_corruption;
T1
----------
1
SQL> ALTER SESSION SET EVENTS 'immediate trace name buffers level 3';
Session altered.
0B004BFF0 0201023C 002C04C1 02C10201 2B350601
oradebug poke 0x0B004BFF8 4 0x50C10201;
BEFORE: [0B004BFF8, 0B004BFFC) = FFC10201
AFTER: [0B004BFF8, 0B004BFFC) = 50C10201
SQL>
SQL>
SQL> select * from maclean_corruption;
T1
----------
79
SQL> update maclean_corruption set t1=t1+1 where t1=80;
0 rows updated.
SQL> update maclean_corruption set t1=t1+1 where t1=79;
1 row updated.
SQL> select * from maclean_corruption;
T1
----------
80
SQL>
SQL> oradebug peek 0x0B004BFF8 4
[0B004BFF8, 0B004BFFC) = 50C10201
BH (0xafff92d8) file#: 1 rdba: 0x00415f81 (1/89985) class: 1 ba: 0xaff4a000
set: 6 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 0,0
dbwrid: 0 obj: 76977 objn: 76977 tsn: 0 afn: 1 hint: f
hash: [0xb0f6ab88,0xce69cd58] lru: [0xafff94f0,0xcd1371e8]
obj-flags: object_ckpt_list
ckptq: [0xcd1554b8,0xcd1554b8] fileq: [0xcd1554d8,0xcd1554d8] objq: [0xc80e2bc0,0xc80e2bc0] objaq: [0xc80e2ba0,0xc80e2ba0]
st: XCURRENT md: NULL fpin: 'kdswh11: kdst_fetch' tch: 5
flags: buffer_dirty redo_since_read
LRBA: [0xc.1d7.0] LSCN: [0x0.232b9d] HSCN: [0x0.232bb7] HSUB: [1]
buffer tsn: 0 rdba: 0x00415f81 (1/89985)
scn: 0x0000.00232bb7 seq: 0x01 flg: 0x06 tail: 0x2bb70601
frmt: 0x02 chkval: 0x5e6a type: 0x06=trans data
Hex dump of block: st=0, typ_found=1
Dump of memory from 0x00000000AFF4A000 to 0x00000000AFF4C000
0AFF4A000 0000A206 00415F81 00232BB7 06010000 [....._A..+#.....]
0AFF4A010 00005E6A 00000001 00012CB1 00232B33 [j^.......,..3+#.]
0AFF4A020 00000000 00030002 00000000 000D0003 [................]
0AFF4A030 00000346 00C03094 0021008F 00002001 [F....0....!.. ..]
0AFF4A040 00232BB7 0004000A 00000290 00C00AF0 [.+#.............]
0AFF4A050 0028007B 00072001 00232B35 00020100 [{.(.. ..5+#.....]
0AFF4A060 0016FFFF 1F781F94 00001F81 1F9A0002 [......x.........]
0AFF4A070 00001F94 00000000 00000000 00000000 [................]
0AFF4A080 00000000 00000000 00000000 00000000 [................]
Repeat 497 times
0AFF4BFA0 5F840000 C1020041 41000645 41004C50 [..._A...E..APL.A]
0AFF4BFB0 00415F83 0432C102 275F4100 00000000 [._A...2..A_'....]
0AFF4BFC0 00000000 00000000 00000000 00000000 [................]
Repeat 2 times
0AFF4BFF0 0201023C 012C04C1 51C10201 2BB70601 [
Block header dump: 0x00415f81
Object id on Block? Y
seg/obj: 0x12cb1 csc: 0x00.232b33 itc: 2 flg: O typ: 1 - DATA
fsl: 0 fnx: 0x0 ver: 0x01
Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x0003.00d.00000346 0x00c03094.008f.21 --U- 1 fsc 0x0000.00232bb7
0x02 0x000a.004.00000290 0x00c00af0.007b.28 --U- 1 fsc 0x0007.00232b35
bdba: 0x00415f81
data_block_dump,data header at 0xaff4a05c
===============
tsiz: 0x1fa0
hsiz: 0x16
pbl: 0xaff4a05c
76543210
flag=--------
ntab=1
nrow=2
frre=-1
fsbo=0x16
fseo=0x1f94
avsp=0x1f78
tosp=0x1f81
0xe:pti[0] nrow=2 offs=0
0x12:pri[0] offs=0x1f9a
0x14:pri[1] offs=0x1f94
block_row_dump:
tab 0, row 0, @0x1f9a
tl: 6 fb: --H-FL-- lb: 0x1 cc: 1
col 0: [ 2] c1 51
tab 0, row 1, @0x1f94
tl: 2 fb: --HDFL-- lb: 0x2
end_of_block_dump
0AFF4BFF0 0201023C 012C04C1 51C10201 2BB70601
oradebug poke 0x0AFF4BFF8 4 0x59C10201; ==> 修改 51=》59 记修改T1=80 为88
select * from maclean_corruption;
T1
----------
88
SQL> delete maclean_corruption where t1=88;
1 row deleted.
SQL> commit;
Commit complete.
SQL> alter system flush buffer_cache;
System altered.
SQL> /
System altered.
SQL> alter system archive log current;
System altered.
SQL> /
System altered.
SQL> select * from maclean_corruption;
no rows selected
SQL>
SQL> analyze table maclean_corruption validate structure;
Table analyzed.
以上通过oradebug poke命令反复修改数据库块内的T1字段,模拟数据块在内存中出现内存讹误(block corruption in memory),发现DB_BLOCK_CHECKSUM=FULL对该种简单的内存讹误却无法实际检测到(这不代表DB_BLOCK_CHECKSUM=TYPICAL无法检查到检测由底层磁盘、存储子系统、IO子系统引起的坏块,DB_BLOCK_CHECKSUM=TYPICAL仍是检测物理坏块的最佳默认推荐配置),通过update/delete修改对应手工讹误的内存块,均可以正常工作并生成redo且flush脏块(dirty buffer)到磁盘上,则会将该内存讹误的情况持久化,且即便在db_block_checking=FULL的情况下也无法检测到一丁点逻辑讹误,这说明不管是db_block_checksum还是db_block_checking都对内存中数据块的细微讹误无法有效检测,虽然这不代表checksum+checking无法检测到更破坏块结构的内存讹误,但多少还是有些悲哀的。
这个问题源于OTN中文论坛的一个帖子<大事务中的更新丢失问题>:
环境为Oracle 10.2.0.4 on Linux x64 有一个大表,百万级,col1字段全为0 t1 事务A启动,把所有记录col2全更新为1 t2 事务B启动,根据主键,把一条记录更新为2,然后commit t3 事务A执行完成,并COMMIT t4 查询此表,发现col1全部为1,事务B的更新丢失了。 这是为什么呢,其中逻辑是怎样的 谢谢!
对于这个问题我想说明的是对于事务transaction 而言Oracle同样提供读一致性,称为transaction-level read consistency:
The database can also provide read consistency to all queries in a transaction, known as transaction-level read consistency. In this case, each statement in a transaction sees data from the same point in time, which is the time at which the transaction began.
Oracle Database Concepts
11g Release 2 (11.2)
Part Number E10713-04
为了证明和演示该事务级别的读一致性,我设计了以下演示:
有一张2个列的大表(T1,T2), 其中分别有 T1=600000,T2=’U2′ 和 T1=900000和 T2=’U1′的 2行数据,T1为主键。
在A)时刻,Session A使用SELECT .. FOR UPDATE锁住T1=600000的这一行
在之后的B)时刻,Session B尝试update TAB set t2=’U3′ where t2=’U2′ 即更新所有T2=’U2′的数据行,但是因为 T1=600000,T2=’U2′这一行正好被Session A锁住了,所以Session B会一直等待’enq: TX – row lock contention’;T1=900000和 T2=’U1′的数据行位于Session B处理 T1=600000,T2=’U2′行等待之后才能处理到的数据块中。
在之后的C)时刻,Session C更新update TAB set t2=’U2′ where t1=900000;并commit, 即将T1=900000和 T2=’U1′更新为 T1=900000和 T2=’U2′,这样就符合Session B 更新Update的条件t2=’U2′了。
在D)时刻, Session A执行commit释放锁,Session B得以继续工作,当他处理到T1=900000的记录时存在以下分歧:
1)若update DML满足transaction-level read consistency,则它应当看到的是session B事务开始环境SCN(env SCN)时的块的前镜像,即虽然session C更新了t2=’U2′满足其条件,但是为了一致性,session B仍需要对该行所在数据块做APPLY UNDO,直到满足该session B事务开始时间点的Best CR块,而CR一致镜像中t2=’U1′,不满足Session B的更新条件, 那么session B在整个事务中仅更新一行数据 T1=600000,T2=’U2′,session B only Update One Rows。
2) 若update DML不满足transaction-level read consistency,则session B看到的是当前read commited的镜像,即是Session C已更新并提交的块镜像,此时的记录为T1=900000和 T2=’U2′符合session B的更新条件,则session B要更新2行数据。
我们来看一下实际实验的结果:
构建实验环境:
SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE 11.2.0.3.0 Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production
create table update_cr tablespace users as select rownum t1, 'MACLEAN ' T2
from dual connect by level update update_cr set t2='U2' where t1=600000;
1 row updated.
SQL> update update_cr set t2='U1' where t1=900000;
1 row updated.
SQL> commit;
Commit complete.
SQL> select * from update_cr where t1=600000 or t1=900000;
T1 T2
---------- ------------
600000 U2
900000 U1
在A)时刻,Session A使用SELECT .. FOR UPDATE锁住T1=600000的这一行:
session A:
SQL> select * from update_cr where t1=600000 for update;
T1 T2
---------- ------------
600000 U2
在之后的B)时刻,Session B尝试update TAB set t2=’U3′ where t2=’U2′ 即更新所有T2=’U2′的数据行,但是因为 T1=600000,T2=’U2′这一行正好被Session A锁住了,所以Session B会一直等待’enq: TX – row lock contention’;T1=900000和 T2=’U1′的数据行位于Session B处理 T1=600000,T2=’U2′行等待之后才能处理到的数据块中。
session B: SQL> alter system flush buffer_cache; System altered. SQL> / System altered. SQL> alter session set events '10046 trace name context forever,level 8 : 10201 trace name context forever,level 10'; Session altered. SQL> update update_cr set t2='U3' where t2='U2';
在之后的C)时刻,Session C更新update TAB set t2=’U2′ where t1=900000;并commit, 即将T1=900000和 T2=’U1′更新为 T1=900000和 T2=’U2′,这样就符合Session B 更新Update的条件t2=’U2′了。
session C:
SQL> select * from update_cr where t1=900000;
T1 T2
---------- ------------
900000 U1
SQL> update update_cr set t2='U2' where t1=900000;
1 row updated.
SQL> commit;
Commit complete.
SQL> alter system flush buffer_cache;
System altered.
之后session A 执行 commit;session B得以继续update,得到实验的结果:
session A 执行 commit; session B; SQL> update update_cr set t2='U3' where t2='U2'; 1 row updated.
以看到以上实验的结果 update仅更新了一行数据,证明了观点1″若update DML满足transaction-level read consistency,则它应当看到的是session B事务开始环境SCN(env SCN)时的块的前镜像,即虽然session C更新了t2=’U2′满足其条件,但是为了一致性,session B仍需要对该行所在数据块做APPLY UNDO,直到满足该session B事务开始时间点的Best CR块,而CR一致镜像中t2=’U1′,不满足Session B的更新条件, 那么session B在整个事务中仅更新一行数据 T1=600000,T2=’U2′,session B only Update One Rows。”的正确性。
即update/delete之类的DML在Oracle中满足transaction-level read consistency,保证其所”看到的”是满足事务开始时间点读一致性的Consistent Read,这也是为什么DML会产生Consistent Read的原因之一。
我们回过头来看一下 上面Session B产生的10201 undo apply和10046 trace的内容,可以发现更多内容:
SQL> select dbms_rowid.rowid_block_number(rowid),dbms_rowid.rowid_relative_fno(rowid) from update_cr where t1=600000 or t1=900000;
DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID) DBMS_ROWID.ROWID_RELATIVE_FNO(ROWID)
------------------------------------ ------------------------------------
2589 4
3558 4
WAIT #139924171154784: nam='db file sequential read' ela= 19504 file#=3 block#=128 blocks=1 obj#=0 tim=1258427129863987
Applying CR undo to block 4 : 1000a1d itl entry 03:
xid: 0x0001.010.00000277 uba: 0x00c0300e.007c.1e
flg: ---- lkc: 1 fsc: 0x0000.00000000
WAIT #139924171154784: nam='db file sequential read' ela= 12957 file#=3 block#=12302 blocks=1 obj#=0 tim=1258427129877291
CRS upd rd env [0x7f42a284a704]: (scn: 0x0000.00223eb9 xid: 0x0000.000.00000000 uba: 0x00000000.0000.00 statement num=0 parent xid: 0x0000.000.00000000 st-scn: 0x0000.00000000 hi-scn: 0x0000.00000000 ma-scn: 0x0000.00223e78 flg: 0x00000060) undo env [0x7fffe3fe7f60]: (
scn: 0x0000.00223eb9 xid: 0x0001.010.00000277 uba: 0x00c0300e.007c.1e statement num=2 parent xid: 0x0000.000.00000000 st-scn: 0xe404.00000000 hi-scn: 0x2100.00000000 ma-scn: 0x0000.00000000 flg: 0x62515e33)
CRS upd (before): 0xa5fe9198 cr-scn: 0x0000.00223eb9 xid: 0x0000.000.00000000 uba: 0x00000000.0000.00 cl-scn: 0x0000.00223eba sfl: 0
CRS upd (after) : 0xa5fe9198 cr-scn: 0x0000.00223eb9 xid: 0x0001.010.00000277 uba: 0x00c0300e.007c.1e cl-scn: 0x0000.00223eba sfl: 0
*** 2009-11-16 22:06:01.253
WAIT #139924171154784: nam='enq: TX - row lock contention' ela= 31358812 name|mode=1415053318 usn<<16 | slot=65552 sequence=631 obj#=76969 tim=1258427161253791
WAIT #139924171154784: nam='db file sequential read' ela= 36051 file#=4 block#=2589 blocks=1 obj#=76969 tim=1258427161290180
WAIT #139924171154784: nam='db file scattered read' ela= 14718 file#=4 block#=2590 blocks=98 obj#=76969 tim=1258427161305684
WAIT #139924171154784: nam='db file scattered read' ela= 26432 file#=4 block#=2690 blocks=126 obj#=76969 tim=1258427161338364
WAIT #139924171154784: nam='db file scattered read' ela= 38289 file#=4 block#=2818 blocks=126 obj#=76969 tim=1258427161384445
WAIT #139924171154784: nam='db file scattered read' ela= 24265 file#=4 block#=2946 blocks=126 obj#=76969 tim=1258427161416302
WAIT #139924171154784: nam='db file scattered read' ela= 21288 file#=4 block#=3074 blocks=126 obj#=76969 tim=1258427161444684
WAIT #139924171154784: nam='db file scattered read' ela= 23840 file#=4 block#=3202 blocks=126 obj#=76969 tim=1258427161476063
WAIT #139924171154784: nam='db file scattered read' ela= 27439 file#=4 block#=3330 blocks=126 obj#=76969 tim=1258427161511429
WAIT #139924171154784: nam='db file scattered read' ela= 24424 file#=4 block#=3458 blocks=126 obj#=76969 tim=1258427161543429
Applying CR undo to block 4 : 1000de6 itl entry 03:
xid: 0x0003.01f.00000345 uba: 0x00c03092.008f.34
flg: --U- lkc: 1 fsc: 0x0000.00223ef9
WAIT #139924171154784: nam='db file sequential read' ela= 8033 file#=3 block#=12434 blocks=1 obj#=0 tim=1258427161557534
CRS upd rd env [0x7f42a284a704]: (scn: 0x0000.00223eb9 xid: 0x0000.000.00000000 uba: 0x00000000.0000.00 statement num=0 parent xid: 0x0000.000.00000000 st-scn: 0x0000.00000000 hi-scn: 0x0000.00000000 ma-scn: 0x0000.00223e78 flg: 0x00000060) undo env [0x7fffe3fe7f60]: (
WAIT #139924171154784: nam='db file sequential read' ela= 8033 file#=3 block#=12434 blocks=1 obj#=0 tim=1258427161557534
CRS upd rd env [0x7f42a284a704]: (scn: 0x0000.00223eb9 xid: 0x0000.000.00000000 uba: 0x00000000.0000.00 statement num=0 parent xid: 0x0000.000.00000000 st-scn: 0x0000.00000000 hi-scn: 0x0000.00000000 ma-scn: 0x0000.00223e78 flg: 0x00000060) undo env [0x7fffe3fe7f60]: (
scn: 0x0000.00223ef8 xid: 0x0003.01f.00000345 uba: 0x00c03092.008f.34 statement num=8192 parent xid: 0x0000.000.0baf3fa0 st-scn: 0x3fa0.00000000 hi-scn: 0x0000.00000000 ma-scn: 0x0000.00000000 flg: 0x00000000)
CRS upd (before): 0xa4fdaa08 cr-scn: 0x0000.00223eb9 xid: 0x0000.000.00000000 uba: 0x00000000.0000.00 cl-scn: 0x0000.00223eff sfl: 0
CRS upd (after) : 0xa4fdaa08 cr-scn: 0x0000.00223eb9 xid: 0x0003.01f.00000345 uba: 0x00c03092.008f.34 cl-scn: 0x0000.00223eff sfl: 0
WAIT #139924171154784: nam='db file scattered read' ela= 13029 file#=4 block#=3586 blocks=126 obj#=76969 tim=1258427161572511
WAIT #139924171154784: nam='db file scattered read' ela= 22282 file#=4 block#=3714 blocks=126 obj#=76969 tim=1258427161602324
WAIT #139924171154784: nam='db file sequential read' ela= 6127 file#=4 block#=522 blocks=1 obj#=76969 tim=1258427161615461
WAIT #139924171154784: nam='db file scattered read' ela= 13503 file#=4 block#=3842 blocks=42 obj#=76969 tim=1258427161629323
以上可以看到对T1=600000所在的数据块1000a1d=》datafile 4 259 apply了UNDO,其环境SCN 为scn: 0×0000.00223eb9:
Applying CR undo to block 4 : 1000a1d itl entry 03:
xid: 0×0001.010.00000277 uba: 0x00c0300e.007c.1e
flg: —- lkc: 1 fsc: 0×0000.00000000
WAIT #139924171154784: nam=’db file sequential read’ ela= 12957 file#=3 block#=12302 blocks=1 obj#=0 tim=1258427129877291
CRS upd rd env [0x7f42a284a704]: (scn: 0×0000.00223eb9 xid: 0×0000.000.00000000 uba: 0×00000000.0000.00 statement num=0 parent xid: 0×0000.000.00000000 st-scn: 0×0000.00000000 hi-scn: 0×0000.00000000 ma-scn: 0×0000.00223e78 flg: 0×00000060) undo env [0x7fffe3fe7f60]: (
scn: 0×0000.00223eb9 xid: 0×0001.010.00000277 uba: 0x00c0300e.007c.1e statement num=2 parent xid: 0×0000.000.00000000 st-scn: 0xe404.00000000 hi-scn: 0×2100.00000000 ma-scn: 0×0000.00000000 flg: 0x62515e33)
CRS upd (before): 0xa5fe9198 cr-scn: 0×0000.00223eb9 xid: 0×0000.000.00000000 uba: 0×00000000.0000.00 cl-scn: 0×0000.00223eba sfl: 0
CRS upd (after) : 0xa5fe9198 cr-scn: 0×0000.00223eb9 xid: 0×0001.010.00000277 uba: 0x00c0300e.007c.1e cl-scn: 0×0000.00223eba sfl: 0
对于T1=900000的数据行所在块1000de6=》 datafile 4 3558 block同样apply了UNDO,其环境SCN均为00223eb9
Applying CR undo to block 4 : 1000de6 itl entry 03:
xid: 0×0003.01f.00000345 uba: 0x00c03092.008f.34
flg: –U- lkc: 1 fsc: 0×0000.00223ef9
WAIT #139924171154784: nam=’db file sequential read’ ela= 8033 file#=3 block#=12434 blocks=1 obj#=0 tim=1258427161557534
CRS upd rd env [0x7f42a284a704]: (scn: 0×0000.00223eb9 xid: 0×0000.000.00000000 uba: 0×00000000.0000.00 statement num=0 parent xid: 0×0000.000.00000000 st-scn: 0×0000.00000000 hi-scn: 0×0000.00000000 ma-scn: 0×0000.00223e78 flg: 0×00000060) undo env [0x7fffe3fe7f60]: (
scn: 0×0000.00223ef8 xid: 0×0003.01f.00000345 uba: 0x00c03092.008f.34 statement num=8192 parent xid: 0×0000.000.0baf3fa0 st-scn: 0x3fa0.00000000 hi-scn: 0×0000.00000000 ma-scn: 0×0000.00000000 flg: 0×00000000)
CRS upd (before): 0xa4fdaa08 cr-scn: 0×0000.00223eb9 xid: 0×0000.000.00000000 uba: 0×00000000.0000.00 cl-scn: 0×0000.00223eff sfl: 0
CRS upd (after) : 0xa4fdaa08 cr-scn: 0×0000.00223eb9 xid: 0×0003.01f.00000345 uba: 0x00c03092.008f.34 cl-scn: 0×0000.00223eff sfl: 0
在启动数据库 (指database open)时若需要recovery的redo较多,涉及到的数据块多时可能会看到以下信息:
> Buffer cache spillover: only 32768 of 117227 buffers available$
这是由于crash recovery的redo apply本身需要用到buffer cache,此时buffer cache被称作recovery buffer。 一旦分配了recovery buffer,其将不会被释放或age out直到所有数据块的redo change都被应用并写入到磁盘。
由于只有一个实例可以参与recovery(不管是crash recovery还是instance recovery),恢复实例的buffer cache 大小将会是大规模恢复数据文件(recovery)的限制。 在锁声明阶段(lock claim phase),若实例耗尽了空余的buffer cache 则会spillover溢出到磁盘上(有点像swap)并必须予以处理。此时锁声明会暂时中止,oracle会先应用日志将redo change应用到哪些已经声明过的buffer cache中。直到所有这些recovery buffer都被恢复并被写出,此时它们才变得对下一次后续的锁声明可用。 由于以上说明的磁盘溢出恢复(spillover recovery)持续多次锁声明和日志应用阶段(理想的是一次锁声明,一次日志应用即完成redo apply),直到本次完整的恢复完成。
需要注意的是,以上crash recovery的算法在 buffer cache很小的情况下性能较差; 常见的例子是这样的, 在产品数据库中由于有着较大buffer cache而不会遇到该问题,而在某些基于存储或卷管理软件实现的复制、测试环境中,由于需要恢复的数据集合较大而且往往db_cache_size比产品库小很多,所以alter database open时往往看到该(Buffer cache spillover: only 32768 of 117227 buffers available$),且打开数据库耗费比产品库更多的时间。
我们一般不用担心buffer cache spillover,因为即便速度缓慢在10分钟内数据库往往还是能够打开的,但是如果你监控着alert.log 30分钟都没有任何新日志,那么可能是spillover导致的hang,如果遇到该问题请去OTN Ask Maclean
Copyright © 2013 · Minimum Theme on Genesis Framework · WordPress · Log in
最新回复