Oracle ORA-4031错误技术背景

  1. SGA中的内存池包含不同大小的内存块。当数据库启动时,就有一个大的内存块分配并被hush buckets 里的空闲列表追踪。随着时间推移,随着内存的分配和释放,内存块被按照大小在不同的hush buckets间移动。当SGA里任何一个内存池里出现不能满足内部分配请求的情况时,ORA-04031就出现了。

shared pool共享池的管理方式不同于其它的内存池。。共享池存放与数据字典和library cache有关的信息。但是,这些内存区域根据空闲列表和最近使用算法(LRU)管理。当在共享池的所有搜索结束后,从LRU列表清除所有的可能清除的对象, 多次扫描空闲列表后,仍没有找到内存块,ORA-04031就出现了。这意味着ORA-04031很难预测。

 

  1. 对共享池的监测,可以看它是否包含许多类似的SQL,只有文字不同。 这种情况会占用更多的共享池内存并引共享池碎片,过多的共享池碎片(fragment)会导致虽然共享池中仍有大量的free memory,但都是尺寸较小的内存块(chunk),当Oracle进程申请一些较大的连续内存空间(memory chunk)时,虽然共享池中的free memory大小远大于申请的连续空间大小,仍会引发ORA-4031错误。使用绑定变量可以使SQL 共享。使用本文所附的脚本可以查出内存中是否有许多类似SQL。

即使使用了绑定变量后,仍然可能存在高version count(子指针)的情况。为了使子指针共享,CURSOR_SHARING参数可能需要调整。metalink 文档Note 296377.1 和 261020.1可以提供详细信息。若造成4031的原因是由于未绑定变量或者游标无法共享导致的过度硬解析(Hard Parse),则应当调整应用绑定变量或者调整初始化参数。

 

  1. 9i中开始引入shared pool subpool子池技术。设计多个共享池子池的目的是分散单个shared pool LRU Latch的并行压力。 ,每一个子池都包含自有的LRU LIST和保留区域等其他内存结构。 subpools子池的数量会在Oracle实例启动(startup nomount)时根据SGA_MAX_SIZE(或11g中的memory_max_target)以及服务器上的CPU数量而自动决定。子池数量最多为7个,在Card系统中为4个。

每一个subpool都是一个”迷你型”的共享池,其包括自有的Freelist、内存结构记录和LRU LIST。subpool子池技术是对shared pool并发扩张能力的增强,且每一个子池现在都使用独立的shared pool child latch来保护。这意味着不再像9i以前的版本那样因为只有一个shared pool latch而剧烈争用了。

但是在实际使用中发现版本9i中仍存在一些BUG,可能导致在子池之间的内存使用分布不平衡均匀,这可能导致虽然部分子池使用率不高,但是个别子池内存过度分配,从而导致ORA-4031错误。一般把这种现象称作”subpool imbalance”。

 

 

 

分析问题发生时段的ALERT.LOG内容如下:

 

 

Wed Nov 21 02:16:01 2012

ARC1: Evaluating archive   log 3 thread 1 sequence 2373

ARC1: Beginning to archive log 3 thread 1 sequence 2373

Creating archive destination LOG_ARCHIVE_DEST_1: ‘/card_arch1/1_2373.arc’

ARC1: Completed archiving  log 3 thread 1 sequence 2373

Wed Nov 21 16:11:46 2012

Errors in file /oracle/admin/card/udump/card1_ora_6782.trc:

ORA-04031: unable to allocate 4216 bytes of shared memory (“shared pool”,”select rowid, count(*) as TO…”,”sga heap(1,0)”,”library cache”)

ORA-04031: unable to allocate 4216 bytes of shared memory (“shared pool”,”select rowid, count(*) as TO…”,”sga heap(1,0)”,”library cache”)

Wed Nov 21 16:11:47 2012

Trace dumping is performing id=[cdmp_20121121161147]

Wed Nov 21 16:14:10 2012

Errors in file /oracle/admin/card/udump/card1_ora_6682.trc:

ORA-04031: unable to allocate 4216 bytes of shared memory (“shared pool”,”TD_REPTCASH”,”sga heap(1,0)”,”library cache”)

ORA-04031: unable to allocate 4216 bytes of shared memory (“shared pool”,”TD_REPTCASH”,”sga heap(1,0)”,”library cache”)

Wed Nov 21 16:14:11 2012

Trace dumping is performing id=[cdmp_20121121161411]

 

在16:11:46时SPID=6782的前台服务进程报ORA-4031错误,分析该进程的TRACE如下:

 

/oracle/admin/card/udump/card1_ora_6782.trc

Oracle9i Enterprise Edition Release 9.2.0.8.0 – 64bit Production

With the Partitioning and Real Application Clusters options

JServer Release 9.2.0.8.0 – Production

ORACLE_HOME = /oracle/product/9.2

System name:   HP-UX

Node name:      ecprod1

Release:    B.11.31

Version:   U

Machine:  ia64

Instance name: card1

Redo thread mounted by this instance: 1

Oracle process number: 114

Unix process pid: 6782, image: oracle@ecprod1 (TNS V1-V3)

 

 

*** 2012-11-21 16:06:40.006

*** SESSION ID:(498.18090) 2012-11-21 16:06:40.005

=================================

Begin 4031 Diagnostic Information

=================================

The following information assists Oracle in diagnosing

causes of ORA-4031 errors.  This trace may be disabled

by setting the init.ora parameter _4031_dump_bitvec = 0

======================================

Allocation Request Summary Information

======================================

Current information setting:  00654fff

Dump Interval=300 seconds  SGA Heap Dump Interval=3600 seconds

Last Dump Time=11/21/2012 16:06:39

Allocation request for: kkslpkp – literal info.

Heap: c000000353cb25d0, size: 4200

******************************************************

HEAP DUMP heap name=”sga heap”  desc=c000000381602030

extent sz=0xfe0 alt=200 het=32767 rec=9 flg=-126 opc=0

parent=0000000000000000 owner=0000000000000000 nex=0000000000000000 xsz=0x1

******************************************************

HEAP DUMP heap name=”library cache”  desc=c000000353cb25d0

extent sz=0x348 alt=32767 het=32 rec=0 flg=2 opc=2

parent=c000000381602030 owner=c00000032f85db08 nex=0000000000000000 xsz=0x1

Subheap has 4712 bytes of memory allocated

====================

Process State Object

====================

—————————————-

SO: c0000002e93dac10, type: 2, owner: 0000000000000000, flag: INIT/-/-/0x00

(process) Oracle pid=114, calls cur/top: c00000030275a888/c00000030275a888, flag: (0) –

int error: 0, call error: 0, sess error: 0, txn error 0

(post info) last post received: 0 0 111

last post received-location: kglpndl: post after freeing latch

last process to post me: c0000002e93dd1b0 137 0

last post sent: 1298803 0 16

last post sent-location: ksasnd

last process posted by me: c0000002e93bb2d0 1 6

(latch info) wait_event=0 bits=0

Process Group: DEFAULT, pseudo proc: c0000002ea3a4ff0

O/S info: user: oracle, term: UNKNOWN, ospid: 6782

OSD pid info: Unix process pid: 6782, image: oracle@ecprod1 (TNS V1-V3)

=========================

User Session State Object

=========================

—————————————-

SO: c0000002ea3fc930, type: 4, owner: c0000002e93dac10, flag: INIT/-/-/0x00

(session) trans: 0000000000000000, creator: c0000002e93dac10, flag: (100041) USR/- BSY/-/-/-/-/-

DID: 0001-0072-00161539, short-term DID: 0000-0000-00000000

txn branch: 0000000000000000

oct: 0, prv: 0, sql: c000000347a44a50, psql: c0000003009a42e0, user: 26/OC_MES

O/S info: user: root, term: , ospid: 1234, machine: ecrsouterapp1

program:

last wait for ‘latch free’ blocking sess=0x0 seq=33786 wait_time=12162

address=c00000030245cdc8, number=9d, tries=0

temporary object counter: 0

=========================

Current Parent KGL Object

=========================

LIBRARY OBJECT HANDLE: handle=c000000347a44a50

name=select rowid, count(:”SYS_B_0″) amount    from te_enterpriseinfo_cur a   where a.borrowerid = :”SYS_B_1″     and a.regionalismcode like :”SYS_B_2″

hash=5740b5 timestamp=11-21-2012 16:06:39

namespace=CRSR flags=RON/KGHP/TIM/PN0/MED/[50010000]

kkkk-dddd-llll=0000-0001-0001 lock=N pin=X latch#=9

lwt=c000000347a44a80[c000000347a44a80,c000000347a44a80] ltm=c000000347a44a90[c000000347a44a90,c000000347a44a90]

pwt=c000000347a44ab0[c000000347a44ab0,c000000347a44ab0] ptm=c000000347a44b40[c000000347a44b40,c000000347a44b40]

ref=c000000347a44a60[c000000347a44a60, c000000347a44a60] lnd=c000000347a44b58[c000000347a44b58,c000000347a44b58]

LIBRARY OBJECT: object=c00000037760ea28

type=CRSR flags=EXS[0001] pflags= [00] status=VALD load=0

CHILDREN: size=16

child#    table reference   handle

—— ——– ——— ——–

0 c00000037760ec88 c00000035b9ba6a8 c000000353207a90

 

C000000000000004 ?

C000000000032F90 ?

—– End of Call Stack Trace —–

===============================

Memory Utilization of Subpool 1

===============================

Allocation Name          Size

_________________________  __________

“free memory              ”    82400392

“miscellaneous            ”    43284312

“type object de           ”           0

“parameters               ”           0

“KCL name table           ”     3673416

“MTTR advisory            ”       98256

“Checkpoint queue         ”     9177280

“db_block_hash_buckets    ”     9699328

“PL/SQL MPCODE            ”           0

“trigger inform           ”           0

“sim memory hea           ”     1279200

“KCL extra lock elements  ”      864000

“PL/SQL PPCODE            ”           0

“ges reserved msg buffers ”     2096008

“fixed allocation callback”         744

“pl/sql source            ”           0

“KQR M PO                 ”      189272

“trigger source           ”           0

“ges enqueues             ”     3261024

“KSXR pending messages que”      853952

“gcs shadows              ”    12303112

“ges resource hash table  ”     1048576

“dictionary cache         ”     1068608

“ges resources            ”     1592696

“enqueue resources        ”      572504

“errors                   ”           0

“PL/SQL DIANA             ”           0

“library cache            ”   439251408

“KQR L PO                 ”      266936

“sql area                 ”       64536

“sessions                 ”      749008

“gcs resources            ”    20555384

“KQR S SO                 ”        2048

“trigger defini           ”           0

“event statistics per sess”     3137568

“KGLS heap                ”       26608

“KQR X PO                 ”       18032

“table definiti           ”           0

===============================

Memory Utilization of Subpool 2

===============================

Allocation Name          Size

_________________________  __________

“free memory              ”   108187608

“miscellaneous            ”    37722824

“table definiti           ”        1984

“ges enqueues             ”     3395408

“db_block_hash_buckets    ”     9794640

“ksm_file2sga region      ”      370496

“PL/SQL DIANA             ”      175992

“KQR S SO                 ”         512

“KQR L PO                 ”      633360

“gcs shadows              ”    28408552

“sim memory hea           ”     1286432

“parameters               ”      206424

“1M buffer                ”     1049088

“errors                   ”       40280

“type object de           ”           0

“KQR M PO                 ”      804672

“trigger defini           ”         272

“trigger source           ”          88

“partitioning d           ”       22728

“channel handle           ”      260392

“KGK heap                 ”         552

“db_files                 ”      253952

“dictionary cache         ”     1068608

“KQR X PO                 ”       23184

“gcs resources            ”    36658760

“transaction              ”      480384

“trigger inform           ”          64

“library cache            ”   159389832

“ges resources            ”     1482984

“sql area                 ”   226277160

“sessions                 ”      746304

“Checkpoint queue         ”     9177280

“event statistics per sess”     3148936

“fixed allocation callback”         720

“FileIdentificatonBlock   ”     3580496

“KGLS heap                ”      381736

“ktlbk state objects      ”      651240

“PL/SQL MPCODE            ”     1850264

===============================

Memory Utilization of Subpool 3

===============================

Allocation Name          Size

_________________________  __________

“free memory              ”   108746832

“miscellaneous            ”    43190000

“KGLS heap                ”      340640

“gcs resource hash table  ”    16105280

“trigger source           ”          80

“joxs heap init           ”        4240

“PL/SQL PPCODE            ”       23624

“errors                   ”        5592

“KQR L PO                 ”      612576

“trigger inform           ”          64

“table definiti           ”        1560

“sim memory hea           ”     1279200

“MTTR advisory            ”      106368

“1M buffer                ”     1049088

“ges regular msg buffers  ”     1088248

“gcs resources            ”    20555384

“KQR M PO                 ”      805872

“pl/sql source            ”           0

“KQR X PO                 ”       59248

“KQR S SO                 ”        2816

“ges enqueues             ”     3985496

“ges resources            ”     1917944

“dictionary cache         ”     1068608

“trigger defini           ”        9008

“type object de           ”           0

“PL/SQL DIANA             ”      410776

“parameters               ”      187768

“library cache            ”   197201256

“processes                ”     1376000

“sql area                 ”   209882928

“sessions                 ”      746304

“qmps connections         ”      486200

“event statistics per sess”     3137568

“Checkpoint queue         ”     9177280

“PLS non-lib hp           ”        2088

“gcs shadows              ”    12306208

“fixed allocation callback”         680

“PL/SQL MPCODE            ”     1661384

===============================

Memory Utilization of Subpool 4

===============================

Allocation Name          Size

_________________________  __________

“free memory              ”    82782928

“miscellaneous            ”    40984472

“ges resources            ”     1783744

“db_handles               ”     1160000

“parameters               ”      323936

“sim memory hea           ”     1279200

“KGLS heap                ”      697784

“KQR M PO                 ”      791440

“ges big msg buffers      ”     4674088

“temporary tabl           ”           0

“PL/SQL DIANA             ”      146680

“partitioning d           ”       67896

“KGK heap                 ”       65584

“KQR L PO                 ”      600312

“Checkpoint queue         ”     9177280

“PL/SQL MPCODE            ”     3006528

“enqueue                  ”     2332272

“FileOpenBlock            ”    16104056

“KSXR receive buffers     ”     1034000

“table definiti           ”        3272

“KQR X PO                 ”       46368

“ges enqueues             ”     4135704

“dictionary cache         ”     1068608

“gcs shadows              ”    12306208

“KQR S SO                 ”        2304

“PX subheap               ”       54640

“trigger inform           ”         680

“library cache            ”   138469232

“gcs resources            ”    20555384

“sql area                 ”   271924816

“sessions                 ”      746304

“errors                   ”       15304

“event statistics per sess”     3137568

“ges process array        ”     1273272

“pl/sql source            ”           0

“trigger defini           ”        4400

“type object de           ”           0

“fixed allocation callback”         728

LIBRARY CACHE STATISTICS:

namespace           gets hit ratio      pins hit ratio    reloads   invalids

————– ——— ——— ——— ——— ———- ———-

CRSR           1044847223     0.999 1198493483     0.551  507345521      90137

TABL/PRCD/TYPE 1190948651     1.000 2350380701     1.000     125865          0

BODY/TYBD         568579     1.000    568659     0.999         71          0

TRGR            12494087     1.000  12494089     1.000        774          0

INDX             2091941     0.999   2091942     0.999         34          0

CLST               68101     0.998     83171     0.996          0          0

OBJE                   0     0.000         0     0.000          0          0

PIPE                   0     0.000         0     0.000          0          0

LOB                    0     0.000         0     0.000          0          0

DIR                  348     0.994       216     0.778         22          0

QUEU                   0     0.000         0     0.000          0          0

OBJG                   0     0.000         0     0.000          0          0

PROP                   0     0.000         0     0.000          0          0

JVSC                 502     0.578       502     0.578          0          0

JVRE                 502     0.584       502     0.584          0          0

ROBJ                   0     0.000         0     0.000          0          0

REIP                   0     0.000         0     0.000          0          0

CPOB                   0     0.000         0     0.000          0          0

EVNT            11919560     1.000  17875331     1.000       1680          0

SUMM                   0     0.000         0     0.000          0          0

DIMN                   0     0.000         0     0.000          0          0

CTX                    0     0.000         0     0.000          0          0

OUTL                   0     0.000         0     0.000          0          0

RULS                   0     0.000         0     0.000          0          0

RMGR                   0     0.000         0     0.000          0          0

IFSD                   0     0.000         0     0.000          0          0

PPLN                   0     0.000         0     0.000          0          0

PCLS                   0     0.000         0     0.000          0          0

SUBS                   0     0.000         0     0.000          0          0

LOCS                   0     0.000         0     0.000          0          0

RMOB                   0     0.000         0     0.000          0          0

RSMD                   0     0.000         0     0.000          0          0

JVSD                   0     0.000         0     0.000          0          0

ENPR                   0     0.000         0     0.000          0          0

RELC                   0     0.000         0     0.000          0          0

STREAM                 0     0.000         0     0.000          0          0

APPLY                  0     0.000         0     0.000          0          0

APPLY SOURCE           0     0.000         0     0.000          0          0

APPLY DESTN            0     0.000         0     0.000          0          0

TEST                   0     0.000         0     0.000          0          0

CUMULATIVE     2262939494     0.999 3581988596     0.850  507473967      90137

 

 

 

 

 

当Oracle进程触发ORA-4031错误后会产生4031相关的TRACE,其中包括了当时简要的shared pool使用情况; 分析上述4031 trace中的信息可以发现存在4个subpool (Memory Utilization of Subpool 1、Memory Utilization of Subpool 2、Memory Utilization of Subpool 3、Memory Utilization of Subpool 4)。

以上4个subpool的主要使用情况如下:

 

  Subpool1 Subpool2 Subpool3 Subpool4 Total Size
Free Memory 82400392 108187608 108746832 82782928 364MB
library cache 439251408 159389832 197201256 138469232 947MB
Sql area 64536 226277160 209882928 271924816 675MB
Total size 637531776 637531776 637531776 620754624 2415MB

 

从以上表格可以看到library cache和SQL Area占用了1600MB的共享池空间,而实际free memory的总数也达到364MB,实际引发ORA-4031时所申请的chunk大小仅为4216 bytes。

 

进一步分析可以发现在subpool1中的sql area仅占用64536字节,而subpool 4中sql area占用271924816字节,说明subpool1与其他subpool之间在空间使用上存在不均衡。

 

 

 

 

 

 

为了进一步分析Card库的共享池使用情况,我们通过动态性能视图收集了11-23日 14时当前的shared pool使用状态,由于并非在ORA-4031当时所收集的信息,虽然不能作为诊断该问题的直接证据,但同样具有参考意义:

 

 

BUCKET                           KSMCHCLS   subpool       From      Count    Biggest    AvgSize      Total

——————————– ——– ———- ———- ———- ———- ———- ———-

0 (<140)                         free              1         40       1443         48         43      62112

0 (<140)                         free              1         50        373         56         56      20888

0 (<140)                         free              1         60        537         64         64      34368

0 (<140)                         free              1         70        364         72         72      26208

0 (<140)                         free              1         80       2778         88         86     241024

0 (<140)                         free              1         90        206         96         96      19776

0 (<140)                         free              1        100        230        104        104      23920

0 (<140)                         free              1        110        602        112        112      67424

0 (<140)                         free              1        120        706        128        124      88112

0 (<140)                         free              1        130        199        136        136      27064

0 (<140)                         free              2         40       1031         48         43      45232

0 (<140)                         free              2         50        766         56         56      42896

0 (<140)                         free              2         60        616         64         64      39424

0 (<140)                         free              2         70        374         72         72      26928

0 (<140)                         free              2         80       3421         88         86     295744

0 (<140)                         free              2         90        338         96         96      32448

0 (<140)                         free              2        100        451        104        104      46904

0 (<140)                         free              2        110        325        112        112      36400

0 (<140)                         free              2        120       1458        128        121     176832

0 (<140)                         free              2        130        233        136        136      31688

0 (<140)                         free              3         40       1604         48         44      70896

0 (<140)                         free              3         50        640         56         56      35840

0 (<140)                         free              3         60        694         64         64      44416

0 (<140)                         free              3         70        680         72         72      48960

0 (<140)                         free              3         80       3006         88         85     258256

0 (<140)                         free              3         90        615         96         96      59040

0 (<140)                         free              3        100        602        104        104      62608

0 (<140)                         free              3        110        487        112        112      54544

0 (<140)                         free              3        120       1298        128        123     160360

0 (<140)                         free              3        130        340        136        136      46240

0 (<140)                         free              4         40       1110         48         43      48424

0 (<140)                         free              4         50        758         56         56      42448

0 (<140)                         free              4         60        557         64         64      35648

0 (<140)                         free              4         70        451         72         72      32472

0 (<140)                         free              4         80       2678         88         85     230072

0 (<140)                         free              4         90        305         96         96      29280

0 (<140)                         free              4        100        333        104        104      34632

0 (<140)                         free              4        110        303        112        112      33936

0 (<140)                         free              4        120       1353        128        121     164832

0 (<140)                         free              4        130        243        136        136      33048

1 (140-267)                      free              1        200       1001        216        204     205048

1 (140-267)                      free              1        140        378        152        148      55968

1 (140-267)                      free              1        160        568        176        164      93208

1 (140-267)                      free              1        180        178        192        188      33608

1 (140-267)                      free              1        220       1536        232        231     355320

1 (140-267)                      free              1        240        449        256        249     112048

1 (140-267)                      free              1        260        135        264        264      35640

1 (140-267)                      free              2        200       1028        216        206     212768

1 (140-267)                      free              2        140        774        152        148     114560

1 (140-267)                      free              2        160       1293        176        171     221352

1 (140-267)                      free              2        180        576        192        189     109144

1 (140-267)                      free              2        220        377        232        228      86184

1 (140-267)                      free              2        240        921        256        250     230856

1 (140-267)                      free              2        260        272        264        264      71808

1 (140-267)                      free              3        200       1078        216        205     221824

1 (140-267)                      free              3        140        874        152        148     129448

1 (140-267)                      free              3        160       1642        176        168     276560

1 (140-267)                      free              3        180       2612        192        185     483816

1 (140-267)                      free              3        220        443        232        227     100824

1 (140-267)                      free              3        240        858        256        248     213608

1 (140-267)                      free              3        260        193        264        264      50952

1 (140-267)                      free              4        200        741        216        206     152928

1 (140-267)                      free              4        140        546        152        148      80816

1 (140-267)                      free              4        160       1164        176        169     197160

1 (140-267)                      free              4        180        408        192        187      76632

1 (140-267)                      free              4        220        304        232        227      69216

1 (140-267)                      free              4        240        578        256        249     144200

1 (140-267)                      free              4        260        172        264        264      45408

2 (268-523)                      free              1        300        494        336        320     158368

2 (268-523)                      free              1        250        450        296        284     127960

2 (268-523)                      free              2        300        719        336        320     230744

2 (268-523)                      free              2        500         20        520        518      10360

2 (268-523)                      free              2        250        655        296        283     185744

2 (268-523)                      free              2        450         51        496        480      24488

2 (268-523)                      free              3        300        956        336        322     308272

2 (268-523)                      free              3        250        641        296        281     180688

2 (268-523)                      free              4        300        636        336        320     203920

2 (268-523)                      free              4        400         46        424        409      18840

2 (268-523)                      free              4        250        582        296        283     165232

2 (268-523)                      free              4        350         23        392        383       8824

3-5 (524-4107)                   free              1        500          1        776        776        776

3-5 (524-4107)                   free              1       3000          1       3160       3160       3160

3-5 (524-4107)                   free              2        500        306        984        700     214312

3-5 (524-4107)                   free              2       1000        123       1496       1201     147760

3-5 (524-4107)                   free              2       1500       1139       1992       1878    2139208

3-5 (524-4107)                   free              2       2000       1444       2496       2322    3353672

3-5 (524-4107)                   free              2       2500       4534       2992       2751   12476912

3-5 (524-4107)                   free              2       3000       7520       3496       3211   24151448

3-5 (524-4107)                   free              2       3500       6105       3992       3770   23018216

3-5 (524-4107)                   free              2       4000       2403       4104       4065    9769384

3-5 (524-4107)                   free              3        500        652        992        858     559536

3-5 (524-4107)                   free              3       1000       1897       1496       1162    2204544

3-5 (524-4107)                   free              3       1500        289       1992       1726     498840

3-5 (524-4107)                   free              3       2000       2276       2496       2260    5144800

3-5 (524-4107)                   free              3       2500       4021       2992       2789   11217152

3-5 (524-4107)                   free              3       3000       1822       3496       3279    5975424

3-5 (524-4107)                   free              3       3500      11485       3992       3785   43475232

3-5 (524-4107)                   free              3       4000       2710       4104       4048   10970720

3-5 (524-4107)                   free              4        500         44        992        966      42512

3-5 (524-4107)                   free              4       1000        238       1488       1177     280248

3-5 (524-4107)                   free              4       1500        152       1968       1765     268392

3-5 (524-4107)                   free              4       2000       2015       2496       2376    4789544

3-5 (524-4107)                   free              4       2500       3400       2992       2754    9364976

3-5 (524-4107)                   free              4       3000       2037       3496       3287    6696072

3-5 (524-4107)                   free              4       3500       4731       3992       3710   17552504

3-5 (524-4107)                   free              4       4000       2043       4104       4073    8321968

6+ (4108+)                       free              1       4000          8       4712       4250      34000

6+ (4108+)                       free              1       5000          1       5912       5912       5912

6+ (4108+)                       free              1       7000          1       7048       7048       7048

6+ (4108+)                       free              1      13000          1      13416      13416      13416

6+ (4108+)                       free              1      15000          1      15752      15752      15752

6+ (4108+)                       free              1      19000          1      19600      19600      19600

6+ (4108+)                       free              1      21000          1      21360      21360      21360

6+ (4108+)                       free              1      25000          1      25232      25232      25232

6+ (4108+)                       free              2       4000        100       4272       4146     414648

6+ (4108+)                       free              2       8000          4       8456       8338      33352

6+ (4108+)                       free              2      11000          1      11872      11872      11872

6+ (4108+)                       free              3       4000        145       4720       4132     599248

6+ (4108+)                       free              3       8000          4       8664       8432      33728

6+ (4108+)                       free              4       4000         90       4528       4152     373728

6+ (4108+)                       free              4       6000          1       6896       6896       6896

6+ (4108+)                       free              4       7000          3       7848       7626      22880

6+ (4108+)                       free              4       8000          3       8456       8360      25080

 

 

以上列出了各subpool freelist bucket的情况,6+ (4108+)  代表存放大于4108bytes的free chunk的hash bucket, 可以看到大于4108bytes的free chunk总大小越为1.6M, 而3-5 (524-4107) bucket中chunk数量最多、总大小也最大。  由此可知card库共享池存在大量碎片。过多的共享池碎片导致连续尺寸大约4108字节的free chunk越来越少,当16:11时需要分配4216字节连续chunk时,由于无法分配该连续空间而触发了ORA-4031错误。

 

综上所述是由于共享池碎片和subpool内存分配不均衡最终引发了此次ORA-4031 。

 

18:33时触发过ORA-4031的Oracle进程陆续报ORA-00600: [729], [8680], [space leak]错误,该错误的stack call 如下 ksudel=> ksuxds=> ksmugf=> cold_ksmuhe

 

 

 

*** 2012-11-21 18:33:35.807

ksedmp: internal or fatal error

ORA-00600: internal error code, arguments: [729], [8680], [space leak], [], [], [], [], []

—– Call Stack Trace —–

calling              call     entry                argument values in hex

location             type     point                (? means dubious value)

——————– ——– ——————– —————————-

ksedmp()+528         call     _etext_f()+23058430  000000000 ?

09110304648          C000000000000996 ?

400000000285CD00 ?

ksfdmp()+64          call     _etext_f()+23058430  000000003 ?

09110304648

kgeriv()+432         call     _etext_f()+23058430  600000000004F280 ?

09110304648          000000003 ?

C000000000000716 ?

40000000052FF030 ?

00000863F ?

60000000004B5788 ?

000000000 ? 000000000 ?

kgesiv()+208         call     _etext_f()+23058430  600000000004F280 ?

09110304648          60000000004B7A08 ?

60000000000502F8 ?

600000000001D800 ?

600000000001D790 ?

ksesic2()+208        call     _etext_f()+23058430  600000000004F280 ?

09110304648          60000000005E5198 ?

0000002D9 ? 000000002 ?

9FFFFFFFFFFF9418 ?

$cold_ksmuhe()+432   call     _etext_f()+23058430  0000002D9 ?

09110304648          60000000004C1680 ?

9FFFFFFFFFFF9418 ?

60000000004BFFD0 ?

00000000A ?

4000000000874260 ?

60000000005E1780 ?

000000000 ?

ksmugf()+688         call     _etext_f()+23058430  0000021E8 ?

09110304648          600000000004F060 ?

60000000005E1778 ?

000000001 ?

C000000000000B9C ?

ksuxds()+4320        call     _etext_f()+23058430  4000000000C9B4B8 ?

09110304648          C0000002EB3B47B8 ?

400000000086A4C0 ?

C000000000001736 ?

4000000001EC4B60 ?

000008E73 ?

9FFFFFFFFFFF9480 ?

60000000004ABC40 ?

ksudel()+144         call     9fffffffffff94f8     C0000002EB3B4320 ?

 

通过上述stack call可以定位到Bug 3931332 : ORA-600 [729] LEAKS WITH ORA-4031 TRACES:

 

 

 

Hdr: 3931332 9.2.0.5.0 RDBMS 9.2.0.5.0 DICTIONARY PRODID-5 PORTID-46 ORA-600

Abstract: ORA-600 [729] LEAKS WITH ORA-4031 TRACES

 

sedmp: internal or fatal error

ORA-600: internal error code, arguments: [729], [4380], [space leak], [],

[], [], [], []

—– Call Stack Trace —–

 

ksmuhe ksmugf ksuxds ksudel

 

 

以上ORA-600 [729] LEAKS内部错误常由ORA-4031引发,一般通过避免ORA-4031可以绕过ORA-600错误

 

问题总结:

  1. 由于共享池碎片和subpool内存分配不均衡最终引发了此次ORA-4031,解决方案见下文建议部分
  2. ORA-600 [729] LEAKS内部错误由ORA-4031触发BUG 3931332引起,通过避免ORA-4031错误可以绕过该问题
  3. 执行statspack长时间无响应并消耗100%的CPU的问题在11-23日实际操作中未出现,由于没有当时的性能记录无法进一步追查。

 

 

 

1.1  建议

  1. 定期监控shared pool free memory空闲内存量和碎片情况,具体可以使用如下脚本:

 

 

set pages 1000 lines 120

col name for a60

col value for a30

 

select * from v$sgastat where pool like ‘shared%’  and name=’free memory’;

 

select ‘0 (<140)’ BUCKET, KSMCHCLS, KSMCHIDX, 10*trunc(KSMCHSIZ/10) “From”,

count(*) “Count” , max(KSMCHSIZ) “Biggest”,

trunc(avg(KSMCHSIZ)) “AvgSize”, trunc(sum(KSMCHSIZ)) “Total”

from x$ksmsp

where KSMCHSIZ<140

and KSMCHCLS=’free’

group by KSMCHCLS, KSMCHIDX, 10*trunc(KSMCHSIZ/10)

UNION ALL

select ‘1 (140-267)’ BUCKET, KSMCHCLS, KSMCHIDX,20*trunc(KSMCHSIZ/20) ,

count(*) , max(KSMCHSIZ) ,

trunc(avg(KSMCHSIZ)) “AvgSize”, trunc(sum(KSMCHSIZ)) “Total”

from x$ksmsp

where KSMCHSIZ between 140 and 267

and KSMCHCLS=’free’

group by KSMCHCLS, KSMCHIDX, 20*trunc(KSMCHSIZ/20)

UNION ALL

select ‘2 (268-523)’ BUCKET, KSMCHCLS, KSMCHIDX, 50*trunc(KSMCHSIZ/50) ,

count(*) , max(KSMCHSIZ) ,

trunc(avg(KSMCHSIZ)) “AvgSize”, trunc(sum(KSMCHSIZ)) “Total”

from x$ksmsp

where KSMCHSIZ between 268 and 523

and KSMCHCLS=’free’

group by KSMCHCLS, KSMCHIDX, 50*trunc(KSMCHSIZ/50)

UNION ALL

select ‘3-5 (524-4107)’ BUCKET, KSMCHCLS, KSMCHIDX, 500*trunc(KSMCHSIZ/500) ,

count(*) , max(KSMCHSIZ) ,

trunc(avg(KSMCHSIZ)) “AvgSize”, trunc(sum(KSMCHSIZ)) “Total”

from x$ksmsp

where KSMCHSIZ between 524 and 4107

and KSMCHCLS=’free’

group by KSMCHCLS, KSMCHIDX, 500*trunc(KSMCHSIZ/500)

UNION ALL

select ‘6+ (4108+)’ BUCKET, KSMCHCLS, KSMCHIDX, 1000*trunc(KSMCHSIZ/1000) ,

count(*) , max(KSMCHSIZ) ,

trunc(avg(KSMCHSIZ)) “AvgSize”, trunc(sum(KSMCHSIZ)) “Total”

from x$ksmsp

where KSMCHSIZ >= 4108

and KSMCHCLS=’free’

group by KSMCHCLS, KSMCHIDX, 1000*trunc(KSMCHSIZ/1000);

 

 

SELECT KSMCHCLS CLASS, COUNT(KSMCHCLS) NUM, SUM(KSMCHSIZ) SIZ,

To_char( ((SUM(KSMCHSIZ)/COUNT(KSMCHCLS)/1024)),’999,999.00′)||’k’ “AVG SIZE”

FROM X$KSMSP GROUP BY KSMCHCLS;

 

 

 

 

  1. 当shared pool 空闲内存少于80M或碎片过多时考虑alter system flush shared_pool强制冲刷共享池。若条件允许也可以考虑定期在业务空闲时段例如凌晨冲刷共享池,或者定期重启数据库实例

 

  1. 启用statspack自动收集性能快照,一般Interval为一个小时,statspack是久经考验的性能监控工具,对性能的损耗非常小。

 

  1. 考虑部署OSW性能监控工具:

OSWatcher(OS Watcher Black Box)是Oracle标准的Support工具,该工具通过一系列的SHELL脚本来收集多方面的OS信息,包括CPU、内存、网络等。

 

OSWatcher不会给系统带来较重的性能负载,OSWatcher本质上是调用最常见的OS命令如ps、vmstat、netstat。

 

通过大量广泛的OSW使用,已经证明了OSW是一套安全、可靠的资源监控工具。

 

 

 

更多OSW的信息可以参考以下Metalink文档:

 

OSW 用户手册:OSWatcher Black Box User Guide.

 

What Is The OSWATCHER Black Box Effect On The Server Performance And What Are The Alternative Commands? [ID 946107.1]

 

中文:OSWatcher Black Box: How to improve performance and monitor your system automatically (Mandarin) [ID 1492202.1]

 

OSWatcher Black Box Analyzer User Guide [ID 461053.1]

关注刘相兵的新浪微博

扫码加入微信Oracle小密圈,了解Oracle最新技术下载分享资源

Speak Your Mind

沪公网安备 31010802001379号

TEL/電話+86 13764045638
Email service@parnassusdata.com
QQ 47079569