为什么说log file sync(其实是写redo慢)会造成buffer busy wait?

《gc buffer busy/gcs log flush sync与log file sync》一文中我介绍了 redo flush慢造成RAC中gc buffer busy争用的原理, 而在《【技术分享】开Oracle调优鹰眼,深入理解AWR性能报告》 中我又介绍了 log file sync(其实本质是lgwr 写redo慢)也会造成单实例single instance环境中的buffer busy wait等待, 这是为什么呢?

 

我们来做一个演示说明该问题:

 

示例用表:

conn maclean/oracle

create table  maclog (t1 int);

 

打开一个Session A 并获得它的OS进程号

[oracle@vrh8 ~]$ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.5.0 – Production on Sat Mar 9 09:32:25 2013

Copyright (c) 1982, 2010, Oracle. All Rights Reserved.
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 – 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

 

 

[oracle@vrh8 ~]$ ps -ef|grep LOCAL
oracle 18441 18438 0 09:41 ? 00:00:00 oracleG10R25 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle 18445 18348 0 09:42 pts/3 00:00:00 grep LOCAL

 

使用 gdb debug 该18375的前台进程,并指定breakpoint

[oracle@vrh8 ~]$ gdb $ORACLE_HOME/bin/oracle 18441
GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-37.el5)

(gdb) b kcrf_commit_force
Breakpoint 1 at 0x286519c

 

之后在session A执行:

 

SQL> insert into maclog values(1);

1 row created.

若insert hang,则在GDB 中输入多次C  - continue 这样能够继续

 

输入 commit; ==》 此时session A会hang住

 

SQL> commit;   ====》HANG

 

在GDB 中输入bt 可以看到stack call

(gdb) bt
#0 0x000000000286519c in kcrf_commit_force ()
#1 0x00000000028620ef in kcrfw_redo_gen ()
#2 0x00000000010e7dba in kcbchg1_main ()
#3 0x00000000010e6d99 in kcbchg ()
#4 0x000000000143f65a in ktucmt ()
#5 0x00000000013c7a06 in ktcCommitTxn ()
#6 0x00000000042a559e in ktdcmt ()
#7 0x00000000024fe09c in k2lcom ()
#8 0×0000000002418993 in k2send ()
#9 0x0000000001418b47 in xctctl ()
#10 0x00000000014174dd in xctcom_with_options ()
#11 0x000000000211fc26 in kksExecuteCommand ()
#12 0x00000000030ef87a in opiexe ()
#13 0x0000000003232d47 in kpoal8 ()
#14 0x00000000013b7c10 in opiodr ()
#15 0x0000000003c3c9da in ttcpip ()
#16 0x00000000013b3144 in opitsk ()
#17 0x00000000013b60ec in opiino ()
#18 0x00000000013b7c10 in opiodr ()
#19 0x00000000013a92f8 in opidrv ()
#20 0x0000000001fa3936 in sou2o ()
#21 0x000000000072d40b in opimai_real ()
#22 0x000000000072d35c in main ()

其stack call为ktcCommitTxn=> ktucmt => kcbchg => kcbchg1_main => kcrfw_redo_gen => kcrf_commit_force

 

kcbchg==> block change ,为什么要发生block change呢? 因为commit需要对在Buffer Cache里的block做immediate block cleanout

此时开一个session B  查询maclog表

SQL> select * from maclog; ==》阻塞在buffer busy wait上

 

这样就通过 无法完成的immediate block cleanout 去pin住buffer ,来形成了一个buffer busy wait

 

做一个system state dump :
SQL> oradebug setmypid
Statement processed.
SQL>
SQL> oradebug dump systemstate 266;
Statement processed.
SQL> oradebug tracefile_name
/s01/admin/G10R25/udump/g10r25_ora_18551.trc

 

分析该systemstate dump

 

Session B一直在等 buffer busy wait

SO: 0xaa42fff8, type: 4, owner: 0xaa3048f8, flag: INIT/-/-/0×00
(session) sid: 164 trans: (nil), creator: 0xaa3048f8, flag: (100051) USR/- BSY/-/-/-/-/-
DID: 0001-0008-00000002, short-term DID: 0000-0000-00000000
txn branch: (nil)
oct: 0, prv: 0, sql: (nil), psql: (nil), user: 0/SYS
service name: SYS$BACKGROUND
waiting for ‘buffer busy waits’ wait_time=0, seconds since wait started=245
file#=2, block#=89, class#=21
blocking sess=0x(nil) seq=12413
Dumping Session Wait History
for ‘buffer busy waits’ count=1 wait_time=0.215431 sec
file#=2, block#=89, class#=21
for ‘buffer busy waits’ count=1 wait_time=0.977438 sec
file#=2, block#=89, class#=21
for ‘buffer busy waits’ count=1 wait_time=0.977538 sec
file#=2, block#=89, class#=21
for ‘buffer busy waits’ count=1 wait_time=0.977512 sec
file#=2, block#=89, class#=21
for ‘buffer busy waits’ count=1 wait_time=0.977480 sec
file#=2, block#=89, class#=21
for ‘buffer busy waits’ count=1 wait_time=0.977488 sec
file#=2, block#=89, class#=21
for ‘buffer busy waits’ count=1 wait_time=0.977639 sec
file#=2, block#=89, class#=21

其stack call为:

Short stack dump:
ksdxfstk()+32<-ksdxcb()+1573<-sspuser()+111<-__restore_rt()+0<-__GI_semtimedop()+10<-sskgpwwait()+265<-skgpwwait()+162<-kslwaitns_timed()+1102<-kskthbwt()+246<-kslwait(
)+228<-kcbzwb()+1496<-kcbgtcr()+23190<-ktugct()+588<-ktbgcl1()+4711<-ktrgcm()+1979<-ktrget()+486<-kdst_fetch()+524<-kdstf0000001kmP()+3137<-kdsttgr()+2427<-qertbFetch()
+650<-qergsFetch()+444<-opifch2()+2944<-opiall0()+2206<-opikpr()+642<-opiodr()+1184<-rpidrus()+196<-skgmstack()+158<-rpidru()+116<-rpiswu2()+409<-kprball()+1270<-kkescF
etch()+83<-kkedsamp()+6304<-kkedsSel()+1495<-kkecdn()+3055<-kkotap()+859<-kkoiqb()+9830<-kkooqb()+904<-kkoqbc()+2093<-apakkoqb()+167<-apaqbdDescendents()+414<-apaqbdLis
tReverse()+68<-apadrv()+573<-opitca()+1545<-kksLoadChild()+9714<-kxsGetRuntimeLock()+1454<-kksfbc()+14910<-kkspsc0()+979<-kksParseCursor()+142<-opiosq0()+1641<-kpooprx(
)+318<-kpoal8()+964<-opiodr()+1184<-ttcpip()+1226<-opitsk()+1310<-opiino()+1024<-opiodr()+1184<-opidrv()+548<-sou2o()+114<-opimai_real()+163<-main()+116<-__libc_start_m
ain()+244<-_start()+41

 

 

file=2 block=0×89 即 137 md: EXCL  被  owner: 0xaa30b888  PID=22持有

SO: 0xaadd91d8, type: 24, owner: 0xaa44f240, flag: INIT/-/-/0xc0
(buffer) (CR) PR: 0xaa30c878 FLG: 0×0
class bit: (nil)
kcbbfbp: [BH: 0x69fe2948, LINK: 0xaadd9218] (WAITING)
where: ktuwh05: ktugct, why: 0
BH (0x69fe2948) file#: 2 rdba: 0×00800089 (2/137) class: 33 ba: 0x69cc6000
set: 3 blksize: 8192 bsi: 0 set-flg: 0 pwbcnt: 0
dbwrid: 0 obj: -1 objn: 0 tsn: 1 afn: 2
hash: [aacc0958,aacc0958] lru: [75fc3228,7eff15a8]
obj-flags: object_ckpt_list
ckptq: [6ef77368,aadf28f8] fileq: [6ef77378,aadf2938] objq: [a7bdd678,6afdf518]
use: [aaddb748,aaddb748] wait: [aaddef10,aadd9218]
st: XCURRENT md: EXCL tch: 465
flags: mod_started gotten_in_current_mode block_written_once
redo_since_read
change state: ACTIVE
change count: 1
LRBA: [0x2d5.10d5a.0] HSCN: [0x0.242e76a] HSUB: [1]
Using State Objects
—————————————-
SO: 0xaaddb708, type: 24, owner: 0xaa454568, flag: INIT/-/-/0×00
(buffer) PR: 0xaa30b888 FLG: 0×1000
class bit: (nil)
kcbbfbp: [BH: 0x69fe2948, LINK: 0xaaddb748]
where: ktuwh02: ktugus, why: 0
Waiting State Objects
—————————————-
SO: 0xaaddeed0, type: 24, owner: 0xaa450598, flag: INIT/-/-/0xc0
(buffer) PR: 0xaa3048f8 FLG: 0×0
class bit: (nil)
kcbbfbp: [BH: 0x69fe2948, LINK: 0xaaddef10] (WAITING)
where: ktuwh02: ktugus, why: 0

 

简单来说这个示例说明了几点:

  1. OLTP类型的小DML操作一般都会是immediate block cleanout的,这要求在commit之前对block做change kcbchg
  2. 在commit kcrf_commit_force完成前都不会释放对该block buffer的buffer pin
  3. 由上述2点造成的buffer pin最终会影响select和其他insert/update/delete 形成buffer busy wait
  4. 由于慢的lgwr写redo log会造成 kcrf_commit_force commit的缓慢,表现在等待事件上就是log file sync
  5. 由于block cleanout时pin block buffer且commit 慢,则会导致更长时间的buffer busy wait
  6. 若log file sync是由lgwr 写redo log慢(log file parallel write)引起的,则它的另一个效应就是buffer busy wait增多
  7. 若看到AWR中log file sync+buffer busy wait是主要等待事件,则优先解决log file sync ,因为buffer busy wait实际可能是受害者

 

AWR中与commit cleanout相关的 Instance activity 有好几个

 

commit cleanout failures: block lost
commit cleanout failures: buffer being written
commit cleanout failures: callback failure
commit cleanout failures: cannot pin
commit cleanouts
commit cleanouts successfully completed

 

  1. 有什么办法看懂
    其stack call为ktcCommitTxn=> ktucmt => kcbchg => kcbchg1_main => kcrfw_redo_gen => kcrf_commit_force
    类似Kcbchg,ktucmt,这样缩写关键字

  2. Pingback: enq: TX – row lock/index contention、allocate ITL等待事件 | Ask Maclean Oracle Blog

  3. (gdb) break kcbzarBreakpoint 1 at 0x1129b0c(gdb) break kcbrlsBreakpoint 2 at 0x10e5d24(gdb) cContinuing.Breakpoint 2, 0x00000000010e5d24 in kcbrls ()#0 0x00000000010e5d24 in kcbrls ()#1 0x0000000002ab8940 in kdddgb ()#2 0x0000000002a8f627 in kdusru ()#3 0x0000000002a7e94a in kauupd ()#4 0x0000000001ee601d in updrow ()#5 0x0000000002eb2df8 in qerupRowProcedure ()#6 0x0000000002eb33d3 in qerupFetch ()#7 0x0000000001eef47b in updaul ()#8 0x0000000001eedf0e in updThreePhaseExe ()#9 0x0000000001eec2fe in updexe ()#10 0x00000000030f32ed in opiexe ()#11 0x0000000003232d47 in kpoal8 ()#12 0x00000000013b7c10 in opiodr ()#13 0x0000000003c3c9da in ttcpip ()#14 0x00000000013b3144 in opitsk ()#15 0x00000000013b60ec in opiino ()#16 0x00000000013b7c10 in opiodr ()#17 0x00000000013a92f8 in opidrv ()#18 0x0000000001fa3936 in sou2o ()#19 0x000000000072d40b in opimai_real ()#20 0x000000000072d35c in main ()(gdb) session B:SQL> update maclog set t1=t1+2 where rowid=’AAAW4+AABAAATkqAAA’;1 row updated.session C:SQL> update maclog set t1=t1+2 where rowid=’AAAW4+AABAAATkqAAA’; SO: 0xa830bfa8, type: 2, owner: (nil), flag: INIT/-/-/0×00 (process) Oracle pid=23, calls cur/top: 0xa8452348/0xa8452348, flag: (0) – int error: 0, call error: 0, sess error: 0, txn error 0 (post info) last post received: 0 0 0 last post received-location: No post last process to post me: none last post sent: 0 0 0 last post sent-location: No post last process posted by me: none (latch info) wait_event=0 bits=0 Process Group: DEFAULT, pseudo proc: 0xa834c6e0 O/S info: user: oracle, term: pts/4, ospid: 6905 OSD pid info: Unix process pid: 6905, image: oracle@vrh8.oracle.com (TNS V1-V3) Short stack dump:ksdxfstk()+32<-ksdxcb()+1573<-sspuser()+111<-__restore_rt()+0<-__GI_semtimedop()+10<-sskgpwwait()+265<-skgpwwait()+162<-kslwaitns_timed()+1102<-kskthbwt()+246<-kslwait()+228<-kcbzwb()+1496<-kcbgcur()+5863<-ktbgcur()+152<-kdddgb()+590<-kdusru()+7999<-kauupd()+450<-updrow()+1643<-qerupRowProcedure()+80<-qerupFetch()+677<-updaul()+1075<-updThreePhaseExe()+378<-updexe()+624<-opiexe()+15853<-kpoal8()+7231<-opiodr()+1184<-ttcpip()+1226<-opitsk()+1310<-opiino()+1024<-opiodr()+1184<-opidrv()+548<-sou2o()+114<-opimai_real()+163<-main()+116<-__libc_start_main()+244<-_start()+41 —————————————- SO: 0xa841bd18, type: 4, owner: 0xa830bfa8, flag: INIT/-/-/0×00 (session) sid: 149 trans: 0x9c768988, creator: 0xa830bfa8, flag: (80000041) USR/- BSY/-/-/-/-/- DID: 0001-0017-00000001, short-term DID: 0000-0000-00000000 txn branch: (nil) oct: 6, prv: 0, sql: 0xa781c2a0, psql: (nil), user: 0/SYS service name: SYS$USERS O/S info: user: oracle, term: pts/4, ospid: 6904, machine: vrh8.oracle.com program: sqlplus@vrh8.oracle.com (TNS V1-V3) application name: sqlplus@vrh8.oracle.com (TNS V1-V3), hash value=543908804 waiting for ‘buffer busy waits’ wait_time=0, seconds since wait started=45 file#=1, block#=1392a, class#=1 blocking sess=0x(nil) seq=9 Dumping Session Wait History for ‘buffer busy waits’ count=1 wait_time=0.370403 sec file#=1, block#=1392a, class#=1 for ‘buffer busy waits’ count=1 wait_time=0.977508 sec file#=1, block#=1392a, class#=1 for ‘buffer busy waits’ count=1 wait_time=0.977548 sec file#=1, block#=1392a, class#=1 for ‘buffer busy waits’ count=1 wait_time=0.977520 sec file#=1, block#=1392a, class#=1 for ‘buffer busy waits’ count=1 wait_time=0.977519 sec file#=1, block#=1392a, class#=1 for ‘buffer busy waits’ count=1 wait_time=0.977530 sec file#=1, block#=1392a, class#=1 for ‘buffer busy waits’ count=1 wait_time=0.977505 sec file#=1, block#=1392a, class#=1 for ‘buffer busy waits’ count=1 wait_time=0.977515 sec file#=1, block#=1392a, class#=1 for ‘buffer busy waits’ count=1 wait_time=0.977550 sec file#=1, block#=1392a, class#=1 for ‘buffer busy waits’ count=1 wait_time=0.977530 sec file#=1, block#=1392a, class#=1 Sampled Session History of session 149 serial 18 SO: 0xa8452348, type: 3, owner: 0xa830bfa8, flag: INIT/-/-/0×00 (call) sess: cur a841bd18, rec 0, usr a841bd18; depth: 0 —————————————- SO: 0xa42316a0, type: 24, owner: 0xa8452348, flag: INIT/-/-/0xd0 (buffer) (exclusive wait) PR: 0xa830bfa8 FLG: 0×0 class bit: (nil) kcbbfbp: [BH: 0x8cfed238, LINK: 0xa42316e0] (WAITING) EXCLUSIVE where: kduwh01: kdusru, why: 0 BH (0x8cfed238) file#: 1 rdba: 0x0041392a (1/80170) class: 1 ba: 0x8ce04000 set: 3 blksize: 8192 bsi: 0 set-flg: 2 pwbcnt: 119 dbwrid: 0 obj: 93758 objn: 93758 tsn: 0 afn: 1 hash: [8cfefcb8,a5e48ab8] lru: [8cfed3c8,8cfed1a8] ckptq: [NULL] fileq: [NULL] objq: [88fc3a08,96ff6e90] use: [a4233c10,a4233c10] wait: [a4231b40,a42316e0] st: XCURRENT md: EXCL tch: 3 flags: gotten_in_current_mode block_written_once redo_since_read LRBA: [0x0.0.0] HSCN: [0xffff.ffffffff] HSUB: Using State Objects —————————————- SO: 0xa4233bd0, type: 24, owner: 0xa844f168, flag: INIT/-/-/0×00 (buffer) PR: 0xa8307fe8 FLG: 0×8000 class bit: 0×2 kcbbfbp: [BH: 0x8cfed238, LINK: 0xa4233c10] where: kduwh01: kdusru, why: 0 Waiting State Objects —————————————- SO: 0xa4231b00, type: 24, owner: 0xa84536b8, flag: INIT/-/-/0xd0 (buffer) (exclusive wait) PR: 0xa830afb8 FLG: 0×0 class bit: (nil) kcbbfbp: [BH: 0x8cfed238, LINK: 0xa4231b40] (WAITING) EXCLUSIVE where: kduwh01: kdusru, why: 0 —————————————- SO: 0xa42316a0, type: 24, owner: 0xa8452348, flag: INIT/-/-/0xd0 (buffer) (exclusive wait) PR: 0xa830bfa8 FLG: 0×0 class bit: (nil) kcbbfbp: [BH: 0x8cfed238, LINK: 0xa42316e0] (WAITING) EXCLUSIVE where: kduwh01: kdusru, why: 0 buffer tsn: 0 rdba: 0x0041392a (1/80170) scn: 0×0000.00c502d3 seq: 0×01 flg: 0×00 tail: 0x02d30601 1* select WA_NXT,WA_PRV from x$bh where file#=1 and DBABLK=80170SQL> /WA_NXT WA_PRV—————- —————-00000000A42316E0 00000000A4231B40000000008CFF8FC8 000000008CFF8FC8SQL> /WA_NXT WA_PRV—————- —————-00000000A42316E0 00000000A4231B40000000008CFF8FC8 000000008CFF8FC8SQL> /WA_NXT WA_PRV—————- —————-00000000A4231B40 00000000A42316E0000000008CFF8FC8 000000008CFF8FC8

  4. Pingback: log file sync | 魏兴华-DBA

  5. Pingback: Oracle SQL性能优化专题 – ORACLE数据库数据恢复、性能优化、故障诊断来问问MACLEAN

  6. oltp类型系统 并发量大 commit比较多 批量提交业务上难实现 除了尽量使用快速磁盘 还有其他办法减少 log file sync这个事件吗

    • commit_write = batch,nowait是一个选择,但有极低的概率丢失事务,对于重要的transaction系统需谨慎使用