Row Cache lock Problem

一套AIX上的11.1.0.7系统,应用启动时出现大量row cache lock等待,具体的systemstate dump信息如下:

FILE VERSIONS
-----------------------
Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production
With the Partitioning, Real Application Clusters, Data Mining and Real Application Testing options
ORACLE_HOME = /oracle/product/11.1.0/database
System name: AIX
Node name: HA5_4ADB01
Release: 3
Version: 5
Machine: 000687C2D900
Instance name: HN4A1


TRACE FILE
---------------------------
Filename=HN4A1_ora_2061038.trc



Chains most likely to have caused the hang:
 [a] Chain 1 Signature: 'row cache lock'<='row cache lock' (cycle)
     Chain 1 Signature Hash: 0x75bdd0c
 [b] Chain 2 Signature: 'row cache lock'<='row cache lock' (cycle)
     Chain 2 Signature Hash: 0x75bdd0c
 [c] Chain 3 Signature: 'row cache lock'<='row cache lock' (cycle)
     Chain 3 Signature Hash: 0x75bdd0c
 
===============================================================================
Cycles:
-------------------------------------------------------------------------------
Chain 1:
-------------------------------------------------------------------------------
    Oracle session identified by:
    {
                instance: 1 (hn4a.hn4a1)
                   os id: 2114372
              process id: 211, oracle@HA5_4ADB01
              session id: 610
        session serial #: 5
    }
    is waiting for 'row cache lock' with wait info:
    {
                      p1: 'cache id'=0x7
                      p2: 'mode'=0x0
                      p3: 'request'=0x5
            time in wait: 0.491964 sec
      heur. time in wait: 26.859741 sec
           timeout after: 2.508036 sec
                 wait id: 311
                blocking: 1 session
            wait history:
              1.     event: 'row cache lock'
                   wait id: 310             p1: 'cache id'=0x7
               time waited: 2.929713 sec    p2: 'mode'=0x0
                                            p3: 'request'=0x5
              2.     event: 'row cache lock'
                   wait id: 309             p1: 'cache id'=0x7
               time waited: 2.929726 sec    p2: 'mode'=0x0
                                            p3: 'request'=0x5
              3.     event: 'row cache lock'
                   wait id: 308             p1: 'cache id'=0x7
               time waited: 2.929720 sec    p2: 'mode'=0x0
                                            p3: 'request'=0x5
    }
    and is blocked by
 => Oracle session identified by:
    {
                instance: 1 (hn4a.hn4a1)
                   os id: 2118476
              process id: 209, oracle@HA5_4ADB01
              session id: 616
        session serial #: 5
    }
    which is waiting for 'row cache lock' with wait info:
    {
                      p1: 'cache id'=0x7
                      p2: 'mode'=0x0
                      p3: 'request'=0x5
            time in wait: 0.491910 sec
      heur. time in wait: 26.859705 sec
           timeout after: 2.508090 sec
                 wait id: 308
                blocking: 1 session
            wait history:
              1.     event: 'row cache lock'
                   wait id: 307             p1: 'cache id'=0x7
               time waited: 2.929713 sec    p2: 'mode'=0x0
                                            p3: 'request'=0x5
              2.     event: 'row cache lock'
                   wait id: 306             p1: 'cache id'=0x7
               time waited: 2.929729 sec    p2: 'mode'=0x0
                                            p3: 'request'=0x5
              3.     event: 'row cache lock'
                   wait id: 305             p1: 'cache id'=0x7
               time waited: 2.929715 sec    p2: 'mode'=0x0
                                            p3: 'request'=0x5
    }
    and is blocked by the session at the start of the chain.
 
Chain 1 Signature: 'row cache lock'<='row cache lock' (cycle)
Chain 1 Signature Hash: 0x75bdd0c

...==>many similar chains

PROCESS 211:
  ----------------------------------------
  SO: 0x7000008a1370178, type: 2, owner: 0x0, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
   proc=0x7000008a1370178, name=process, file=ksu.h LINE:10706 ID:, pg=0
  (process) Oracle pid:211, ser:3, calls cur/top: 0x7000008ac61a0e8/0x7000008ac61a0e8
            flags : (0x0) -
            flags2: (0x0),  flags3: (0x0)
            int error: 0, call error: 0, sess error: 0, txn error 0
  ksudlp FALSE at location: 0
  (post info) last post received: 0 0 201
              last post received-location: kqr.h LINE:2181 ID:kqrbgl: compatible mode
              last process to post me: 70000089134b800 1 6
              last post sent: 0 0 0
              last post sent-location: No post
              last process posted by me: none
    (latch info) wait_event=0 bits=0
    Process Group: DEFAULT, pseudo proc: 0x7000008a153a4c8
    O/S info: user: oracle, term: UNKNOWN, ospid: 2114372
    OSD pid info: Unix process pid: 2114372, image: oracle@HA5_4ADB01
    Short stack dump:
<-ksedsts()+0254<-ksdxfstk()+0028<-ksdxcb()+05d8<-sspuser()+0074<-44c0<-skgpwwait()+00b8<-ksliwat()+0c48<-kslwaitctx()+0150<-kqrigt()+06ac<-kqrLockAndPinPo()+0250<-kqrpre1()+061c<-kqrpre()+001c<-kziavua()+0384<-kpolnb()+0ffc<-kpoauth()+07a0<-opiodr()+0b98<-ttcpip()+115c<-opitsk()+1700<-opiino()+09f0<-opiodr()+0b98<-opidrv()+0440<-sou2o()+0090<-opimai_real()+01b0<-main()+0090<-__start()+0098


    service name: SYS$USERS
    client details:
      O/S info: user: aiuap, term: unknown, ospid: 1234
      machine: HA5-4A26 program: JDBC Thin Client
      application name: JDBC Thin Client, hash value=2546894660
...
         proc=0x7000008a1370178, name=row cache enqueues, file=kqr.h LINE:2004 ID:, pg=0
        row cache enqueue: count=1 session=7000008916dee00 object=7000008dfd84428, request=X
        savepoint=0x3e
        row cache parent object: address=7000008dfd84428 cid=7(dc_users)
        hash=481062cf typ=21 transaction=700000888da17a0 flags=00000002
        own=7000008dfd844f0[7000008bf6e18f0,7000008bf6e18f0] wat=7000008dfd84500[7000008bf5d2750,7000008ef34e818] mode=X
        status=VALID/-/-/-/-/-/-/-/-
        request=N release=FALSE flags=8
        instance lock=QK 481062cf bb6f8fe9
...
      LIBRARY OBJECT LOCK: 7000008ef5069d8 handle=7000008dfd22338 mod=N
      pnc=0 pns=0 cbb=1 rpr=1 exc=1 ilh=0 ctx=0
      use=7000008916dee00 ses=7000008a15c44f0 cnt=1 flg=CNB/[0001] spn=0x4c3d09ea
      LIBRARY HANDLE:7000008dfd22338 bid=105620 hid=a3c59c94 lmd=N pmd=0 sta=VALD
      name=select privilege#,level from sysauth$ connect by grantee#=prior privilege# and privilege#>0 start with grantee#=:1 and privilege#>0
      hash=8973f7c629f09e3f081962c2a3c59c94 idn=0

众多wait chain的源头均是row cache lock,systemstate转储中该等待事件相关的p1,也就是cache id均为7;cache#为7的字典缓存代表着dc_users,即字典中的用户信息(包括用户名,密码等信息):

SQL> select cache#,parameter from v$rowcache group by cache#,parameter order by cache#;

    CACHE# PARAMETER
---------- --------------------------------
         0 dc_tablespaces
         1 dc_free_extents
         2 dc_segments
         3 dc_rollback_segments
         4 dc_used_extents
         5 dc_tablespace_quotas
         6 dc_files
         7 dc_users
         8 dc_object_grants
         8 dc_objects
         9 dc_qmc_cache_entries

    CACHE# PARAMETER
---------- --------------------------------
        10 dc_usernames
        11 dc_object_ids
        12 dc_constraints
        13 dc_sequences
        14 dc_profiles
        15 dc_database_links
        16 dc_histogram_data
        16 dc_histogram_defs
        17 dc_global_oids
        18 dc_outlines
        19 dc_partition_scns

    CACHE# PARAMETER
---------- --------------------------------
        19 dc_table_scns
        20 rule_info
        21 rule_fast_operators
        21 rule_or_piece
        22 dc_awr_control
        23 dc_qmc_ldap_cache_entries
        24 outstanding_alerts
        25 dc_hintsets
        26 global database name
        27 qmtmrcin_cache_entries
        28 qmtmrctn_cache_entries

    CACHE# PARAMETER
---------- --------------------------------
        29 qmtmrcip_cache_entries
        30 qmtmrctp_cache_entries
        31 qmtmrciq_cache_entries
        32 qmtmrctq_cache_entries
        33 kqlsubheap_object
        34 realm auth
        34 realm cache
        35 Command rule cache
        36 Realm Object cache
        36 Realm Subordinate Cache
        37 event map

    CACHE# PARAMETER
---------- --------------------------------
        38 format
        39 audit collector
        40 Rule Set Cache

47 rows selected.

大量的dc_users字典缓存锁可能引起后续会话登录挂起,用户登录无法成功,乃至整个实例hang住。那么又是什么样的操作可能引起这种row cache lock并发锁等待事件呢?

应用维护人员反馈问题发生时部分应用程序配置文件中的数据库用户密码不正确,同时该应用会在短期内发起大量会话登录数据库,但应用人员观察到这一问题后并没有联想到可能引发进一步的实例上的dc_user row cache lock问题,仅尝试修改密码并重启该部分的应用程序。至此该问题的root cause已经较为明了:由于短期内出现大量连接请求但连接发起端给出的密码均不正确,引起了Server端严重的用户信息字典缓存锁(users dictionary cache  lock)争用。

该问题诊断过程中提交了SR,Oracle GCS给出了进一步的结论,在11g release 1上Bug 7715339也可能造成类似的row cache lock问题,且该Bug一开始在AIX平台上没有one-off的patch,之后客户要求MOS开发出了该平台上的补丁,目前AIX上也有7715339的one-off patch了:

1. On 11.1.0.7, Many ‘row cache lock’ happened , application can not login to database.
2. The issue is caused by password has been changed but some application still used old password, so these application continuously trying connect to database and leads to dc_users contention.
3. This symptom matches Bug 7715339, but there is no patch for AIX available.
4. Customer requested one-off patch 7715339 on AIX Power System(64-bit), because they are afraid the issues again after changing password.

其实我一直认为Oracle的登录机制存在一定的问题,实际上不管用户提供的密码是否正确,Oracle都会为新的connection分配一个shadow process,该服务进程为了进一步验证用户信息都不得不获取少量资源;如果以此为契机,即便在无法正常登录的情况下依然能在短期内造成实例僵死。


Posted

in

by

Tags:

Comments

7 responses to “Row Cache lock Problem”

  1. admin Avatar
    admin

    Hdr: 9362179 11.1.0.7 RDBMS 11.1.0.7 ROW CACHE PRODID-5 PORTID-226 7715339
    Abstract: ROW CACHE ENQUEUES ON DC_USERS AFTER UPGRADE TO 11G

    PROBLEM:
    ——–
    After upgrade to 11g the row cache contention heavily increase for similar
    load.
    Avg
    wait % DB
    Event Waits Time(s) (ms) time Wait
    Class
    —————————— ———— ———– —— ——
    ———-
    row cache lock 10,939 231,475 21161 68.1
    Concurrenc
    wait for a undo record 1,233,565 52,004 42 15.3 Other
    db file sequential read 2,886,518 19,680 7 5.8 User
    I/O
    DB CPU 10,021 2.9
    read by other session 860,510 6,547 8 1.9 User
    I/O

    DIAGNOSTIC ANALYSIS:
    ——————–
    There are no X holders in the systemstates but there a lot of X requestors
    and S requestors for dc_users.

    For example:

    P0211: row cache enqueue: count=1 session=0x3a1bdd030
    object=0x377e52188, request=S
    P0211.1: row cache enqueue: count=1 session=0x39db30fb0
    object=0x377e52188, request=X

    P0213: row cache enqueue: count=1 session=0x395c37648
    object=0x377e52188, request=S
    P0213.1: row cache enqueue: count=1 session=0x39db26290
    object=0x377e52188, request=X

    P0214: row cache enqueue: count=1 session=0x39db43ea8
    object=0x377e52188, request=S
    P0214.1: row cache enqueue: count=1 session=0x399bb4e90
    object=0x377e52188, request=X

    WORKAROUND:
    ———–
    N/A

    RELATED BUGS:
    ————-
    Bug 4206084 fixed in this version.
    Bug 7715339 cannot be tested due conflicts with existent patches.

  2. casoul Avatar
    casoul

    大哥,你的订阅不能访问:http://www.oracledatabase12g.com/archives/2010/09/06/dns%E8%AE%BE%E7%BD%AE%E5%BC%95%E8%B5%B7%E7%9A%84%E7%99%BB%E5%BD%95%E5%BB%B6%E8%BF%9F.html,可能是里面包含中文了吧。

    1. admin Avatar
      admin

      因为包含了日期,我把永久链接改成/archives/%postname%.html的形式了

  3. 云飞扬IT Avatar

    你好,国庆节快乐!欢迎回访,光临我的小站!

  4. teapot Avatar
    teapot

    11gR2上碰到一例:
    解决办法:
    LIBRARY CACHE LOCKS DUE TO INVALID LOGIN ATTEMPTS [ID 1309738.1]
    EVENT=”28401 TRACE NAME CONTEXT FOREVER, LEVEL 1″

    SQL> SET linesize 120
    SQL> SET feedback off
    SQL> SET SERVEROUTPUT ON
    SQL> DECLARE
    2 err_msg VARCHAR2(120);
    3 BEGIN
    4 dbms_output.enable (1000000);
    5 FOR err_num IN 28400..28401
    6 LOOP
    7 err_msg := SQLERRM (-err_num);
    8 IF err_msg NOT LIKE ‘%Message ‘||err_num||’ not found%’ THEN
    9 dbms_output.put_line (err_msg);
    10 END IF;
    11 END LOOP;
    12 END;
    13 /
    ORA-28400: Event to enable TDE tracing
    ORA-28401: Event to disable delay after three failed login attempts
    SQL> exit

  5. 任季新 Avatar

    刘大 登录的时候为什么要进行row cache lock 哪

    1. Maclean Liu Avatar

      一些user info存放在row cache里

Leave a Reply to admin Cancel reply

Your email address will not be published. Required fields are marked *