latch free:SQL memory manager latch

一套HP-UX上的10.2.0.4 RAC系统，其中一个节点出现大量的latch free:SQL memory manager latch等待事件，相关ADDM性能信息：

SELECT KSMCHCLS CLASS, COUNT(KSMCHCLS) NUM, SUM(KSMCHSIZ) SIZ,
To_char( ((SUM(KSMCHSIZ)/COUNT(KSMCHCLS)/1024)),'999,999.00')||'k' "AVG SIZE"
FROM X$KSMSP
GROUP BY KSMCHCLS;
CLASS NUM SIZ AVG SIZE
-------- ---------- ---------- ------------
R-freea 927 9481080 9.99k
freeabl 1037478 2001975328 1.88k
R-free 593 170107192 280.14k
recr 515080 595805056 1.13k
R-perm 498 35345056 69.31k
R-recr 2 2224 1.09k
perm 97848 1320357168 13.18k
free 251577 161871664 .63k

DETAILED ADDM REPORT FOR TASK 'TASK_21286' WITH ID 21286
--------------------------------------------------------

Analysis Period: 26-OCT-2009 from 10:00:52 to 11:00:12
Database ID/Instance: 2429423994/1
Database/Instance Names: VPROD/VPROD
Host Name: crmdb1
Database Version: 10.2.0.4.0
Snapshot Range: from 10253 to 10254
Database Time: 420176 seconds
Average Database Load: 118 active sessions

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

FINDING 1: 75% impact (313574 seconds)
--------------------------------------
Database latches in the "Other" wait class were consuming significant database
time.

RECOMMENDATION 1: DB Configuration, 75% benefit (313574 seconds)
RATIONALE: The latch "SQL memory manager latch" with latch id 289 was
one of the top latches contributing to high "latch free" waits.
RELEVANT OBJECT: database latch# 289

SYMPTOMS THAT LED TO THE FINDING:
SYMPTOM: Wait class "Other" was consuming significant database time.
(75% impact [313620 seconds])

FINDINGS
========
Load Profile

Per Second Per Transaction
Redo size: 534,863.07 8,586.10
Logical reads: 527,828.57 8,473.18
Block changes: 13,065.49 209.74
Physical reads: 3,999.96 64.21
Physical writes: 617.50 9.91
User calls: 9,869.10 158.43
Parses: 3,287.89 52.78
Hard parses: 27.90 0.45
Sorts: 1,564.66 25.12
Logons: 4.35 0.07
Executes: 5,423.19 87.06

COMMENTS
==========
Load in terms of GETS is tremendous and no particular SQL is responsible for a
large percentage of the load. It may simply be too much for the hardware/configuration.

HOWEVER
=========
I/O while moderate in volume is very slow.

Tablespace IO Stats

* ordered by IOs (Reads + Writes) desc

Tablespace Reads Av Reads/s Av Rd(ms) Av Blks/Rd Writes Av Writes/s Buffer Waits Av Buf Wt(ms)
------------------------------------------------------------------------------------------------------
TBS_CRM_TS_S1 8,340,219 772 4.48 2.95 356,810 33 17,284,247 2.15
TBS_CRM_IDX_S1 1,643,104 152 7.89 1.22 298,265 28 22,684 4.34
TEMP 762,343 71 1.09 6.67 411,652 38 0 0.00
TBS_BAKDATA 358,137 33 6.00 4.11 93,672 9 335,632 6.32
TBS_RM_TS_S1 400,528 37 9.57 1.09 39,000 4 1,546 10.49
TBS_RM_IDX_S1 399,888 37 8.71 1.01 33,342 3 1,577 9.74
TBS_PROD 422,348 39 3.68 1.00 941 0 5 12.00
TBS_SP_TS_S1 341,104 32 6.26 12.19 504 0 44 5.68
TBS_BFBAKDATA 187,013 17 2.34 7.49 909 0 104,797 2.75
TBS_PF 141,810 13 8.76 1.79 8,571 1 563 11.39
TBS_BSS_SYS 118,965 11 2.26 14.72 2,171 0 7 0.00

So about 1/3 typical performance in terms of Av Rd(ms).

SEGMENTS
==========
Segments by Logical Reads

* Total Logical Reads: 5,699,755,248
* Captured Segments account for 72.1% of Total

Owner Tablespace Name Object Name Subobject Name Obj. Type Logical Reads %Total
CRM TBS_CRM_TS_S1 PROD_2_TD TABLE 454,967,408 7.98
RM TBS_RM_TS_S1 PHONE_NUMBER TABLE 435,121,200 7.63
EVENT TBS_CRM_TS_S1 CUST_INTERACTION TABLE 386,838,512 6.79
CRM TBS_CRM_TS_S1 CO_2_TD TABLE 383,932,304 6.74
CRM TBS_CRM_TS_S1 PARTY_IDENTIFY_EXTEND TABLE 298,659,184 5.24

So 5 individual segments are the target of 72% of all gets.

.
ACTION PLAN
============
Please do:

(1) Disk I/O
-----------------
Please ask your SA why Disk I/O is so slow.

(2) Segements
---------------------
Please consider partitioning these 5 objects to spread out GETS and READS:

Owner Tablespace Name Object Name
---------------------------------------------
CRM TBS_CRM_TS_S1 PROD_2_TD
RM TBS_RM_TS_S1 PHONE_NUMBER
EVENT TBS_CRM_TS_S1 CUST_INTERACTION
CRM TBS_CRM_TS_S1 CO_2_TD
CRM TBS_CRM_TS_S1 PARTY_IDENTIFY_EXTEND

诊断发现shared pool共享池有较多的空闲内存，反倒是IO响应速度的Av Rd(ms)要慢于典型的1 ms to 5 ms。

需要SA系统管理员进一步确认存储IO是否存在性能问题。

文档《Note.457063.1 Slow I/O On HP Unix》介绍了HP-UX平台上IO性能瓶颈的一些解决路径，引用如下：

Slow I/O On HP Unix

Applies to:
Oracle Server – Enterprise Edition – Version: 8.1.7.0 to 10.2.0.3 – Release: 8.1.7 to 10.2
HP OpenVMS VAX
HP-UX PA-RISC (64-bit)
HP-UX PA-RISC (32-bit)
HP OpenVMS Alpha
HP-UX Itanium
HP Tru64 UNIX
Symptoms

Disk utility output shows 100% usage for disk continously:

ProcList CPU Rpt Mem Rpt Disk Rpt NextKeys SlctProc Help Exit
GlancePlus C.04.50.00 19:15:39 bplita3 ia64 Current Avg High
———————————————————————-
CPU Util S SN NU U | 90% 90% 90%
Disk Util F F |100% 100% 100% <=== Disk Too heavily loaded

In Statspack or AWR report, high ‘free buffer waits’ seen even after with considerably increasing db_cache_size and db writers.

Statspack Report

Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Buffer Nowait %:   99.99       Redo NoWait %:     99.99
            Buffer Hit   %:   58.92    In-memory Sort %:     99.82
            Library Hit   %:   89.19        Soft Parse %:     83.51
         Execute to Parse %:   50.76         Latch Hit %:    100.00
Parse CPU to Parse Elapsd %:   81.48     % Non-Parse CPU:     93.21

….

Top 5 Timed Events
~~~~~~~~~~~~~~~~~~                                                     % Total
Event                                               Waits    Time (s) Ela Time
——————————————– ———— ———– ——–
free buffer waits                                  76,390      74,807    55.82
enqueue                                             4,071      11,625     8.67
…..

Cause

The problem is related to Disk I/O.
Solution

1. Use the O/S striping software

Try to use the O/S striping software to distribute database files over as many disks as you can.

2. Use Direct IO
Mount the filesystem with direct IO option.
For example:
% Mount –F vxfs –o remount,nodatainlog,mincache=direct,convosync=direct /dev/vg00/lv_ora /soft/oracle

mincache and convosync
“mincache=direct” => bypass buffer cache on read
“convosync=direct” => force direct I/O for DB writers

Mincache=direct and convosync=direct allow data to be transferred directly from Oracle buffer cache to disk and disk to Oracle buffer cache. This avoids double buffering by bypassing the file system buffer cache and can improve physical read/write performance. However, cases where the disk read could have been avoided because a required block was in file system buffer cache may be negatively impacted.

If your filesystem is mounted with this option then FILESYSTEMIO_OPTIONS parameter needs to set to “SETALL” in order to use DIO.
Parameters in Oracle influencing the use of Direct IO

    FILESYSTEMIO_OPTIONS defines the IO operations on filesystem files .This parameter should not normally be set by the user.
    The value may be any of the following:
        asynch – Set by default on HP. This allows asynchronous IO to be used where supported by the OS.
        directIO – This allows directIO to be used where supported by the OS. Direct IO bypasses any Unix buffer cache.
        setall – Enables both ASYNC and DIRECT IO.
        none – This disables ASYNC IO and DIRECT IO so that Oracle uses normal synchronous writes, without any direct io options.

See
Document 120697.1 Init.ora Parameter “FILESYSTEMIO_OPTIONS” Reference Note

Oracle recommends that you use the value SETALL instead of the value DIRECTIO, because the DIRECTIO value disables asynchronous I/O.

DISK_ASYNCH_IO controls whether I/O to datafiles, control files, and logfiles is asynchronous (that is, whether parallel server processes can overlap I/O requests with CPU processing during table scans). If your platform supports asynchronous I/O to disk, Oracle recommends that you leave this parameter set to its default value. However, if the asynchronous I/O implementation is not stable, you can set this parameter to false to disable asynchronous I/O. If your platform does not support asynchronous I/O to disk, this parameter has no effect.

If you set DISK_ASYNCH_IO to false, then you should also set DBWR_IO_SLAVES or DB_WRITER_PROCESSES to a value other than its default of zero in order to simulate asynchronous I/O.

DB_WRITER_PROCESSES or DBWR_IO_SLAVES
see comments in DISK_ASYNCH_IO

3. Concurrent I/O
An alternative solution to Direct I/O is to use Concurrent I/O. Concurrent I/O is available in OnlineJFS 5.0.1.

To enable Concurrent I/O, the filesystem must be mounted with “-o cio”.
Eg:

mount -F vxfs -o nodatainlog,cio /soevxfs/redos /oracle/mnt/redos

Please note that remount should not be used to enable Concurrent I/O on mounted filesystems.

“-o cio”
Concurrent I/O allows multiple processes to read from or write to the same file without blocking other read(2) or write(2) calls.With Concurrent I/O, the read and write operations are not serialized. This advisory is generally used by applications that require high performance for accessing data and do not perform overlapping writes to the same file. It is the responsibility of the application or the running threads to coordinate the write activities to the same file. It also avoids double buffering by bypassing the filesystem buffer cache and thus improves physical read/write performance significantly. Concurrent I/O performs very close to that of raw logical volumes.

latch free:SQL memory manager latch

Comments

Leave a Reply Cancel reply