Another Look at Reverse Offloading

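Original link: http://www.dbaleet.org/second_try_with_exadata_reverse_offloading/

It has been more than a month since my previous post on reverse offloading, but the feature still seems to have a few corners worth exploring, so I poked at it some more. I happen to have a few open questions of my own, so let's set off with those in hand.

The first question: is there a parameter on the database (db server) side that controls this feature? From what I know of Oracle, a feature like reverse offloading, just like storage index, is bound to have a corresponding parameter on the db side, and it is usually a hidden one. So which parameter is it? Time to bring out the "secret weapon" that everybody already knows: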

SQL> set lines 180 pages 999
SQL> col name for a50
SQL> col value for a20
SQL> col describ for a70
SQL> SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ
  2  FROM SYS.x$ksppi x, SYS.x$ksppcv y
  3  WHERE x.inst_id = USERENV ('Instance')
  4  AND y.inst_id = USERENV ('Instance')
  5  AND x.indx = y.indx
  6  AND x.ksppinm LIKE '%&par%'
  7  /
Enter value for par: passthru
old 6: AND x.ksppinm LIKE '%&par%'
new 6: AND x.ksppinm LIKE '%passthru%'

NAME                                               VALUE                DESCRIB
-------------------------------------------------- -------------------- ----------------------------------------------------------------------
_kcfis_cell_passthru_enabled                       FALSE                Do not perform smart IO filtering on the cell
_kcfis_cell_passthru_dataonly                      TRUE                 Allow dataonly passthru for smart scan
_kcfis_cell_passthru_fromcpu_enabled               TRUE                 Enable automatic passthru mode when cell CPU util is too high
_kcfis_celloflsrv_passthru_enabled                 FALSE                Enable offload server usage for passthru operations

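The parameters above were listed on a 12.1.0.2 test environment. In 11.2, parameters such as _kcfis_cell_passthru_dataonly and _kcfis_celloflsrv_passthru_enabled probably do not exist. I happen to have no 11.2 Exadata environment at hand; anyone who has the means can check on an ordinary 11gR2 database. These two parameters most likely belong to an undocumented new Exadata feature in 12c called offload server (oflsrv). I have not studied that feature thoroughly yet, so I will not jump to conclusions.

_kcfis_cell_passthru_enabled is a parameter we know very well; it was covered in the earlier troubleshooting exadata wrong result post. It controls Smart I/O filtering, that is, the offloading feature as a whole, so it is certainly not the parameter that controls reverse offloading. That leaves _kcfis_cell_passthru_fromcpu_enabled, and its description alone tells you this is the gold we were digging for: "Enable automatic passthru mode when cell CPU util is too high". I am not going to test it here, because I do not want to disturb other people's testing; Exadata test environments are scarce, after all. The feature is enabled by default, which means that when CPU utilization on a cell exceeds the relevant threshold, the cell will consider not using Smart I/O and falling back to conventional block I/O instead. Don't ask me how I guessed that this was the parameter; that shall remain a mystery.

The second question: is there a history view that shows past reverse offloading activity? The first thing that came to mind was the cellsrvstat tool on the cell, which should hold some clues: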

[root@dm01cel01 ~]# cellsrvstat | grep -i cpu
SQL ids consuming the most CPU
END SQL ids consuming the most CPU
Total cpu passthru output IO size (KB) 0 0
CPU passthru output size (KB) 0 0
CPU passthru output size (KB) 0 0

Hmm. Judging by the name, CPU passthru output size should be the total amount of reverse offloading, but why are there two identical lines?

== Offload server related stats ==
Offload group name: SYS_112331_131106
Total input IO size (KB) 0 0
Total output size (KB) 0 0
Passthru output size (KB) 0 0
CPU passthru output size (KB) 0 0
OS memory allocated to SGA (KB) 0 6144
SGA heap used - OAL statistics (KB) 0 4472
OS memory allocated to PGA (KB) 0 58705
PGA heap used - OAL statistics (KB) 0 58799
OS memory allocated to group pool (KB) 0 2040
Group pool used (KB) 0 52
Total OS memory (KB) 0 248773

Offload group name: SYS_121110_131107
Total input IO size (KB) 0 7040
Total output size (KB) 0 2332
Passthru output size (KB) 0 0
CPU passthru output size (KB) 0 0
OS memory allocated to SGA (KB) 0 6144
SGA heap used - OAL statistics (KB) 0 5275
OS memory allocated to PGA (KB) 0 18526
PGA heap used - OAL statistics (KB) 0 12628
OS memory allocated to group pool (KB) 0 2040
Group pool used (KB) 0 52
Total OS memory (KB)
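
Counters like these are easier to compare across cells once they are in a data structure. The following is a minimal Python sketch that groups cellsrvstat text by offload group. The function name parse_offload_groups is hypothetical, and the sketch assumes that every statistic line ends with two integers (treated here as an interval value and a cumulative value); that layout is inferred from the output above, not from documentation.

```python
import re

GROUP_RE = re.compile(r"Offload group name:\s*(\S+)")
# Assumption: a statistic line is "<name> <int> <int>".
STAT_RE = re.compile(r"(.+?)\s+(\d+)\s+(\d+)$")


def parse_offload_groups(text):
    """Return {group_name: {stat_name: (first_value, second_value)}}."""
    groups = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = GROUP_RE.match(line)
        if m:
            # Start collecting statistics for a new offload group.
            current = groups.setdefault(m.group(1), {})
            continue
        m = STAT_RE.match(line)
        # Lines outside a group, or without two trailing integers, are skipped.
        if m and current is not None:
            current[m.group(1)] = (int(m.group(2)), int(m.group(3)))
    return groups


if __name__ == "__main__":
    sample = """\
== Offload server related stats ==
Offload group name: SYS_121110_131107
Total input IO size (KB) 0 7040
Total output size (KB) 0 2332
CPU passthru output size (KB) 0 0
"""
    for name, stats in parse_offload_groups(sample).items():
        print(name, stats["CPU passthru output size (KB)"])
```

A non-zero "CPU passthru output size (KB)" for a group would then be the thing to look for when checking whether any output was produced in CPU passthru mode.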

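Looking more closely, these are not duplicate records: under the new Exadata cell image there are two offload groups, which is clearly part of the offload server feature mentioned earlier. But stop right there! Today's main goal lies elsewhere. Could the history be in a dump? I racked my brains going through material and reading the cell dump options one by one, and finally found this: cellsrv.cellsrv_dump('mpp_stats'). Well, this looks promising.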

[root@dm01cel01 ~]# cellcli
CellCLI: Release 12.1.1.1.0 - Production on Wed Nov 20 08:08:36 MST 2013

Copyright (c) 2007, 2013, Oracle. All rights reserved.
Cell Efficiency Ratio: 1,030

CellCLI> alter cell events = "immediate cellsrv.cellsrv_dump('mpp_stats','0')"
Dump sequence #3 has been written to /opt/oracle/cell/log/diag/asm/cell/dm01cel01/trace/svtrc_11153_28.trc
Cell slcc05cel01 successfully altered

Open the trace file and you will see:

*** 2013-11-14 23:36:33.559
2013-11-14 23:36:33.559674 :00020F2E: FenceMaster: OSS_IOCTL_FENCE_ENTITY is set, number of fencing in progress 0 reid cid=e35e5387a02befeebfc2f925878ed4ef,icin=279937547,nmn=1,lnid=279937547,gid=-2147483637,gin=2,gmn=0,umemid=4,opid=0,opsn=0,lvl=member hdr=0xfece0100
2013-11-20 08:12:59.364785 :000CB112: [MPP] Number of blocks executed in passthru mode because of high CPU utilization: 0 out of 35 total blocks. Percent = 0.000000%
2013-11-20 08:12:59.365066 :000CB113: Dump sequence #4:
[MPP] Current cell cpu utilization: 4
[MPP] Wed Nov 20 08:12:59 2013 [Cell CPU History] 4 [Pushback Rate] 0
[MPP] Wed Nov 20 08:12:59 2013 [Cell CPU History] 4 [Pushback Rate] 0
[MPP] Wed Nov 20 08:12:58 2013 [Cell CPU History] 4 [Pushback Rate] 0
[MPP] Wed Nov 20 08:12:58 2013 [Cell CPU History] 5 [Pushback Rate] 0
[MPP] Wed Nov 20 08:12:58 2013 [Cell CPU History] 5 [Pushback Rate] 0
[MPP] Wed Nov 20 08:12:58 2013 [Cell CPU History] 3 [Pushback Rate] 0
[MPP] Wed Nov 20 08:12:58 2013 [Cell CPU History] 4 [Pushback Rate] 0
[MPP] Wed Nov 20 08:12:57 2013 [Cell CPU History] 3 [Pushback Rate] 0
[MPP] Wed Nov 20 08:12:57 2013 [Cell CPU History] 5 [Pushback Rate] 0
[MPP] Wed Nov 20 08:12:57 2013 [Cell CPU History] 4 [Pushback Rate] 0
......
[MPP] Wed Nov 20 07:42:48 2013 [Cell CPU History] 0 [Pushback Rate] 0

Here we can see:

the number of blocks that reverse offloading pushed back to the db server;
the current CPU utilization of the cell;
the cell CPU utilization and the reverse offloading pushback rate over the past 30 minutes.
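
To pull those three pieces of information out of a trace file without reading it by eye, something like the following Python sketch could be used. It is only an illustration: the function name parse_mpp_dump is hypothetical, and the regular expressions are derived solely from the trace lines shown in this post, so other cell versions may format the dump differently.

```python
import re
from datetime import datetime

# Patterns modeled on the [MPP] lines quoted in this post (assumed format).
PASSTHRU_RE = re.compile(
    r"passthru mode because of high CPU utilization: "
    r"(\d+) out of (\d+) total blocks"
)
CURRENT_RE = re.compile(r"\[MPP\] Current cell cpu utilization: (\d+)")
HISTORY_RE = re.compile(
    r"\[MPP\] (\w{3} \w{3} +\d+ \d{2}:\d{2}:\d{2} \d{4}) "
    r"\[Cell CPU History\] (\d+) \[Pushback Rate\] (\d+)"
)


def parse_mpp_dump(text):
    """Return (passthru_blocks, total_blocks, current_cpu, history).

    history is a list of (timestamp, cpu_utilization, pushback_rate).
    Fields that do not appear in the text are returned as None.
    """
    passthru = total = current = None
    history = []
    for line in text.splitlines():
        m = HISTORY_RE.search(line)
        if m:
            ts = datetime.strptime(m.group(1), "%a %b %d %H:%M:%S %Y")
            history.append((ts, int(m.group(2)), int(m.group(3))))
            continue
        m = PASSTHRU_RE.search(line)
        if m:
            passthru, total = int(m.group(1)), int(m.group(2))
            continue
        m = CURRENT_RE.search(line)
        if m:
            current = int(m.group(1))
    return passthru, total, current, history


if __name__ == "__main__":
    import sys

    # Usage: python parse_mpp.py <trace file>
    with open(sys.argv[1]) as f:
        passthru, total, current, history = parse_mpp_dump(f.read())
    print("passthru blocks:", passthru, "of", total)
    print("current cell CPU:", current)
    if history:
        print("peak pushback rate:", max(h[2] for h in history))
```

Run against the trace file reported by the dump command, this prints the passthru block count, the current cell CPU utilization, and the highest pushback rate found in the history lines.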