ksfd: async disk IO等待事件

Often a write to disk.

Solutions

Consistent Slow Performance

Improve I/O

Investigate if the application can be changed to perform fewer writes.

Abnormal Variations in Performance

Investigate potential problems with the operating systems or I/O sub-system.

Investigate contention at the disk/controller level.

Does LGWR use synchronous IO even AIO enabled?

在Oracle中开启AIO异步IO后可以一定程度上提升数据库IO性能,但同时也引入了丢失commit数据的风险。具体可以参见小荷同学的文章,但是这里存在一个疑问就是redo重做记录是同步写出还是在启用AIO后同样使用异步IO的API,我们可以通过跟踪lgwr后台进程的system call系统调用来说明该问题:

SQL> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for Linux: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production
SQL> show parameter disk_asynch_io
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
disk_asynch_io                       boolean     TRUE
SQL> show parameter filesystem
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
filesystemio_options                 string      SETALL
[oracle@rh2 ~]$ ps -ef|grep dbw0_G10R2|grep -v grep
oracle   29168     1  0 19:02 ?        00:00:01 ora_dbw0_G10R2
[oracle@rh2 ~]$ strace -p 29168
.............................
io_submit(140140183375872, 34, {{0x7f74ee284e10, 0, 1, 0, 16},
{0x7f74ee290920, 0, 1, 0, 16}, {0x7f74ee286970, 0, 1, 0, 16}, {0x7f74ee290db0, 0, 1, 0, 16}  = 34
io_getevents(140140183375872, 1, 1024, {{0x7f74ee284e10, 0x7f74ee284e10, 8192, 0},
0x7f74ee289710, 8192, 0}}, {600, 0}) = 12
times(NULL)                             = 480509951
times(NULL)                             = 480509951
times(NULL)                             = 480509951
times(NULL)                             = 480509951
times(NULL)                             = 480509951
times(NULL)                             = 480509951
times(NULL)                             = 480509951
times(NULL)                             = 480509951
times(NULL)                             = 480509951
times(NULL)                             = 480509951
times(NULL)                             = 480509951
times(NULL)                             = 480509951
getrusage(RUSAGE_SELF, {ru_utime={0, 417936}, ru_stime={0, 823874}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 417936}, ru_stime={0, 823874}, ...}) = 0
io_getevents(140140183375872, 1, 1024, {{0x7f74ee28c4b0, 0x7f74ee28c4b0, 8192, 0},
{0x7f74ee287290, 0x7f74ee287290, 8192, 0}, {0x7f74ee283988, 0x7f74ee283988, 8192, 0},
{0x7f74ee27fbf0, 0x7f74ee27fbf0, 8192, 0}, {0x7f74ee28a030, 0x7f74ee28a030, 8192, 0}, {0x7f74ee28fdb8, = 22
/* 此处dbwr使用io_submit函数将I/O请求入列,
io_submit是典型的asynchronous I/O system calls,
可见Oracle针对数据文件写出已在使用异步IO */
[oracle@rh2 ~]$ ps -ef|grep lgwr_G10R2|grep -v grep
oracle   29170     1  0 19:02 ?        00:00:01 ora_lgwr_G10R2
[oracle@rh2 ~]$ strace -p 29170
.............................
io_submit(139932588023808, 2, {{0x7f4497f423c8, 0, 1, 0, 20}, {0x7f4497f42590, 0, 1, 0, 21}}) = 2
io_getevents(139932588023808, 1, 1024, {{0x7f4497f423c8, 0x7f4497f423c8, 3584, 0}}, {600, 0}) = 1
times(NULL)                             = 480533371
io_getevents(139932588023808, 1, 1023, {{0x7f4497f42590, 0x7f4497f42590, 3584, 0}}, {600, 0}) = 1

以上io_submit system call说明当数据库启用AIO后lgwr同样使用异步IO写出重做记录到online logfile,换而言之当存储crash时的确可能出现redo记录丢失而造成的记录丢失情况。

实际控制lgwr是否使用异步IO的是一个隐藏参数_lgwr_async_io,该参数一般默认为false:

SQL> SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ
2   FROM SYS.x$ksppi x, SYS.x$ksppcv y
3   WHERE x.inst_id = USERENV ('Instance')
4   AND y.inst_id = USERENV ('Instance')
5   AND x.indx = y.indx
6  AND x.ksppinm='_lgwr_async_io';
NAME
--------------------------------------------------------------------------------
VALUE
--------------------------------------------------------------------------------
DESCRIB
--------------------------------------------------------------------------------
_lgwr_async_io
FALSE
LGWR Asynchronous IO enabling boolean flag

你可能要这样问:”既然该_lgwr_async_io参数默认为false,那么lgwr就应当使用同步IO而非async,这岂不矛盾?”

是的,理论上来说默认_lgwr_async_io参数为false,lgwr应当使用sync。但是在版本10.1.0.2到11.1.0.6存在一个“Bug:8357698 LGWR USES ASYNC IO INSPITE OF SETTING _LGWR_ASYNC_IO=FALSE”:

Abstract: LGWR USES ASYNC IO INSPITE OF SETTING _LGWR_ASYNC_IO=FALSE

PROBLEM:
——–
+ Lgwr uses asynch IO irrespective of the setting _lgwr_async_io=false
and consumes high CPU performing IO poll operations.
+ short stack and truss output show that the lgwr is waiting for asynch IO
completion notification.

This particular problem was dormant for a long time and highly affects the
database performance.

DIAGNOSTIC ANALYSIS:
——————–
LGWR Shortstack
~~~~~~~~~~~~~~~

aiowait()+540<-skgfospo()+216<-skgfrwat()+80<-ksfdwtio()+476<-ksfdwat1()+84<-k
sfdrwat

0()+520<-kcrfw_post()+500<-kcrfw_redo_write()+2964<-ksbabs()+764<-ksbrdp()

aiowait()+540<-skgfospo()+216<-skgfrwat()+80<-ksfdwtio()+476<-ksfdwat1()+84<-k
sfdrwat

0()+520<-kcrfw_post()+500<-kcrfw_redo_write()+2964<-ksbabs()+764<-ksbrdp()

aiowait()+540<-skgfospo()+216<-skgfrwat()+80<-ksfdwtio()+476<-ksfdwat1()+84<-k
sfdrwat

0()+520<-kcrfw_post()+500<-kcrfw_redo_write()+2964<-ksbabs()+764<-ksbrdp()

Truss output
~~~~~~~~~~~~~
= 1048576
67072
27640/38: kaio(AIONOTIFY, 0) = 0
27640/40: kaio(AIONOTIFY, 0) = 0
27640/43: kaio(AIONOTIFY, 0) = 0
27640/45: kaio(AIONOTIFY, 0) = 0
27640/1: kaio(AIOWAIT, 0xFFFFFFFF7FFFD480) = 1
27640/1: kaio(AIOWAIT, 0xFFFFFFFF7FFFD480) = 1
27640/1: kaio(AIOWAIT, 0xFFFFFFFF7FFFD480) = 1
27640/1: kaio(AIOWAIT, 0xFFFFFFFF7FFFD480) = 1
= 1048576
27640/42: kaio(AIONOTIFY, 0) = 0
27640/1: kaio(AIOWAIT, 0xFFFFFFFF7FFFD480) = 1

Note : The TRUSS output was taken with the filesystemio_options=setall.

1. Performed the test on solaris Box with the below settings :

filesystemio_options=none
disk_asynch_io=false
_lgwr_async_io=TRUE

2. Performed another test on Linux with the following settings and it looks
like the behaviour matches with Solaris and hence it looks like it may not be
port specific .

filesystemio_options=SETALL
disk_asynch_io=TRUE
_lgwr_async_io=TRUE/FALSE

Linux_lgwr_true.out ) .

Request you to review the uploaded files and update with the results of your
analysis .

As per Bug:8357698 LGWR USES ASYNC IO INSPITE OF SETTING _LGWR_ASYNC_IO=FALSE , the filesystemio_options & disk_asynch_io are overwriting _lgwr_async_io settings .

Solution
Most often high CPU consumption by lgwr is related to Async IO.

You can try setting filesystemio_options=none & disk_asynch_io=false with obvious performance impact , but this may not be acceptable to you .

As per Bug:8357698 LGWR USES ASYNC IO INSPITE OF SETTING _LGWR_ASYNC_IO=FALSE , the filesystemio_options & disk_asynch_io are overwriting _lgwr_async_io settings .

1] Move the Redologs to the filesystem where the async IO is not permitted . This will cause the lgwr process not to use async calls.

OR

2] Use direct I/O, which is best done by mounting the corresponding filesystems unbuffered ( (using the”sync” option in Linux ext3, or the “forcedirectio” option in Solaris ufs, for example).

For example :
=========
On solaris:
—————
# mount_ufs -o forcedirectio /dev/rdsk/c2t3d0s0 /d17

If forcedirectio is specified and supported by the file system, then for the duration of the mount forced direct I/O will be used. If the filesystem is mounted using forcedirectio, then data is transferred directly between user address space and the disk. If the filesystem is mounted using noforcedirectio, then data is buffered in kernel address space when data is transferred between user address space and the disk. forcedirectio is a performance option that benefits only from large sequential data transfers. The default behavior is noforcedirectio.

On linux :
————–
# mount -o sync /dev/rdsk/c2t3d0s0 /d17

The sync option has effect only for ext2, ext3, fat, vfat and ufs . The default is async .

PS: We recommend you to involve your system administrator in chainging the moutn options of the OS file system .

由于Bug:8357698的存在,导致filesystemio_options和disk_asynch_io 2个参数可以覆盖_lgwr_async_io的设置,这个问题一直到11.1.0.6以后才得到修复,而在11.2中_lgwr_async_io参数干脆被取消了:

SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
PL/SQL Release 11.2.0.2.0 - Production
CORE    11.2.0.2.0      Production
TNS for Linux: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production
SQL> SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ
2   FROM SYS.x$ksppi x, SYS.x$ksppcv y
3   WHERE x.inst_id = USERENV ('Instance')
4   AND y.inst_id = USERENV ('Instance')
5   AND x.indx = y.indx
6  AND x.ksppinm='_lgwr_async_io';
no rows selected
SQL> show parameter filesystem
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
filesystemio_options                 string      setall
SQL> show parameter async
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
disk_asynch_io                       boolean     TRUE
tape_asynch_io                       boolean     TRUE
io_submit(139755827974144, 2, {{0x7f1b6e992450, 0, 1, 0, 256}, {0x7f1b6e9921f8, 0, 1, 0, 259}}) = 2
io_getevents(139755827974144, 2, 128, {{0x7f1b6e992450, 0x7f1b6e992450, 1024, 0}, {0x7f1b6e9921f8, 0x7f1b6e9921f8, 1024, 0}}, {600, 0}) = 2
times({tms_utime=3, tms_stime=4, tms_cutime=0, tms_cstime=0}) = 480863691
semtimedop(2654212, 0x7fffadc3db80, 1, {3, 0}) = 0
times({tms_utime=3, tms_stime=4, tms_cutime=0, tms_cstime=0}) = 480863808
times({tms_utime=3, tms_stime=4, tms_cutime=0, tms_cstime=0}) = 480863808
io_submit(139755827974144, 2, {{0x7f1b6e9921f8, 0, 1, 0, 256}, {0x7f1b6e992450, 0, 1, 0, 259}}) = 2
io_getevents(139755827974144, 2, 128, {{0x7f1b6e9921f8, 0x7f1b6e9921f8, 512, 0}, {0x7f1b6e992450, 0x7f1b6e992450, 512, 0}}, {600, 0}) = 2
/* 在11.2中若启用了AIO,那么lgwr仍会沿用async IO */

总结:
lgwr是否启用async IO取决于_lgwr_async_io隐藏参数,该参数默认为false,也就是说lgwr默认情况下应当使用sync同步IO。但是因为10.1.0.2到11.1.0.6间存在bug会导致_lgwr_async_io被disk_asynch_io和filesystemio_options参数覆盖而不起作用;在以上版本中若启用了AIO那么lgwr无疑会使用AIO,如果既想要启用AIO又不想因为lgwr使用async而造成隐患,那么可以将日志文件所在移动到不允许async IO的位置,或者使用direct/sync/forcedirectio选项来mount文件系统。在版本11.2以后_lgwr_async_io参数被废弃,lgwr是否启用async io完全取决于disk_asynch_io及filesystemio_options。

11g新特性之IO校准(IO Calibration)

IO校准(IO Calibration)特性可以帮助我们了解存储系统的真实性能,以进一步判断I/O性能问题是由数据库还是存储系统自身引起的。I/O校准特性(IO Calibration)通过对Oracle数据文件的随机I/O访问存储介质,其结论值更符合数据库IO性能的真实情况。
在使用该特性前,我们要确保满足以下条件:

  • 调用该存储过程需要用到SYSDBA权限
  • TIME_STATISTICS 参数为true:

SQL> show parameter timed_statistics;

NAME TYPE VALUE
———————————— ———– ——————————
timed_statistics boolean TRUE

  • 必须打开异步IO;注意在使用文件系统时,FILESYSTEMIO_OPTIONS需设为SETALL或ASYNC,否则Oracle不会启用异步IO。
  • 可以通过以下查询检验是否启用了异步IO:
SQL> SELECT NAME, ASYNCH_IO
2    FROM V$DATAFILE F, V$IOSTAT_FILE I
3   WHERE F.FILE# = I.FILE_NO
4     AND FILETYPE_NAME = 'Data File';
NAME                                                                             ASYNCH_IO
-------------------------------------------------------------------------------- ---------
D:\TOOLS\ADMINSTRATOR\11G\ORADATA\PROD\DATAFILE\O1_MF_SYSTEM_65DN8HXT_.DBF       ASYNC_ON
D:\TOOLS\ADMINSTRATOR\11G\ORADATA\PROD\DATAFILE\O1_MF_SYSAUX_65DN8J18_.DBF       ASYNC_ON
D:\TOOLS\ADMINSTRATOR\11G\ORADATA\PROD\DATAFILE\O1_MF_UNDOTBS1_65DN8J1S_.DBF     ASYNC_ON
D:\TOOLS\ADMINSTRATOR\11G\ORADATA\PROD\DATAFILE\O1_MF_USERS_65DN8J2X_.DBF        ASYNC_ON

IO Calibration特性可以通过DBMS_RESOURCE_MANAGER.CALIBRATE_IO存储过程调用;该过程会对Oracle数据文件引发一系列IO敏感的只读工作负载(由1MB大小的随机IO组成),从而判断存储系统所能持续的最大IOPS(每秒最大IO请求数)和MBPS(每秒IO传输速率)。为了使结果更具代表性,应当保持IO Calibration测试过程中整个数据库是空闲的,没有其他IO负载损耗。

让我们具体来看一下CALIBRATE_IO存储过程的调用方法:

 -- Initiate an I/O calibration
--
-- Input arguments:
--  num_physical_disks   - Approximate number of physical disks in
--                         the database storage
--  max_latency          - Maximum tolerable latency in milliseconds for
--                         database-block-sized IO requests
--
-- Output arguments:
--  max_iops       - Maximum number of I/O requests per second that can be
--                   sustained.  The I/O requests are randomly-distributed,
--                   database-block-sized reads.
--  max_mbps       - Maximum throughput of I/O that can be sustained,
--                   expressed in megabytes per second. The I/O requests
--                   are randomly-distributed, 1 megabyte reads.
--  actual_latency - Average latency of database-block-sized I/O
--                   requests at "max_iops" rate, expressed in milliseconds.
--
PROCEDURE calibrate_io (
num_physical_disks  IN  PLS_INTEGER DEFAULT 1,
max_latency         IN  PLS_INTEGER DEFAULT 20,
max_iops            OUT PLS_INTEGER,
max_mbps            OUT PLS_INTEGER,
actual_latency      OUT PLS_INTEGER
);

其中输入参数num_physical_disks用以指定数据库存储所用物理磁盘的近似数;max_latency指定了客户所能容忍的最大IO 延迟时间,单位为ms;一般我们认为IO平均等待在10ms左右是一种较好的表现,远大于这个数字往往说明IO负载过高了,这里我们假定15ms是我们的deadline。

SQL> set serveroutput on;
SQL> DECLARE
2    lat  INTEGER;
3    iops INTEGER;
4    mbps INTEGER;
5  BEGIN
6  -- DBMS_RESOURCE_MANAGER.CALIBRATE_IO (disk_count,max_latency , iops, mbps, lat);
7     DBMS_RESOURCE_MANAGER.CALIBRATE_IO (2, 15, iops, mbps, lat);
8
9    DBMS_OUTPUT.PUT_LINE ('max_iops = ' || iops);
10    DBMS_OUTPUT.PUT_LINE ('latency  = ' || lat);
11    dbms_output.put_line('max_mbps = ' || mbps);
12  end;
13  /
max_iops = 127
latency  = 15
max_mbps = 38
PL/SQL procedure successfully completed
/*平均延迟为15ms时,最大iops为127,mbps为38*/
/*执行过程中也可以通过V$IO_CALIBRATION_STATUS视图了解进度*/
SQL> select * from V$IO_CALIBRATION_STATUS;
STATUS        CALIBRATION_TIME
------------- --------------------------------------------------------------------------------
IN PROGRESS   11-8月 -10 12.02.35.121 下午
SQL> alter session set nls_timestamp_format='HH24:MI';
Session altered
SQL> col start_time for a10;
SQL> col end_time for a10;
SQL> select * from DBA_RSRC_IO_CALIBRATE;
START_TIME END_TIME     MAX_IOPS   MAX_MBPS  MAX_PMBPS    LATENCY NUM_PHYSICAL_DISKS
---------- ---------- ---------- ---------- ---------- ---------- ------------------
12:55      12:59             127         38         14         15                  2
/*DBA_RSRC_IO_CALIBRATE视图记录了IO CALIBRATION的历史信息*/

[gview file=”http://askmaclean.com/wp-content/uploads/resource/calibrate_io.pdf”]

沪ICP备14014813号

沪公网安备 31010802001379号