11gR2新特性:Heavy swapping observed on system in last 5 mins.

在11gR2中DBRM(database resource manager,11gR2中新的后台进程,见《Learning 11g New Background Processes》)会在Alert.log告警日志中反映OS操作系统最近5分钟是否有剧烈的swap活动了, 具体的日志如下:

 

WARNING: Heavy swapping observed on system in last 5 mins.
pct of memory swapped in [3.07%] pct of memory swapped out [4.44%].
Please make sure there is no memory pressure and the SGA and PGA
are configured correctly. Look at DBRM trace file for more details.

 

进一步诊断可以观察DBRM后台进程的trace:

 

[oracle@vrh2 trace]$ cat VPROD2_dbrm_5466.trc
Trace file /s01/orabase/diag/rdbms/vprod/VPROD2/trace/VPROD2_dbrm_5466.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
ORACLE_HOME = /s01/orabase/product/11.2.0/dbhome_1
System name:    Linux
Node name:      vrh2.oracle.com
Release:        2.6.32-200.13.1.el5uek
Version:        #1 SMP Wed Jul 27 21:02:33 EDT 2011
Machine:        x86_64
Instance name: VPROD2
Redo thread mounted by this instance: 2
Oracle process number: 7
Unix process pid: 5466, image: oracle@vrh2.oracle.com (DBRM)

*** 2011-12-29 22:08:14.627
*** SESSION ID:(165.1) 2011-12-29 22:08:14.627
*** CLIENT ID:() 2011-12-29 22:08:14.627
*** SERVICE NAME:() 2011-12-29 22:08:14.627
*** MODULE NAME:() 2011-12-29 22:08:14.627
*** ACTION NAME:() 2011-12-29 22:08:14.627

kgsksysstop: blocking mode (2) timestamp: 1325214494612191
kgsksysstop: successful
kgsksysresume: successful

*** 2011-12-29 22:08:43.869
PQQ: Active Services changed
PQQ: Old service table
SvcIdx  SvcId Active ActDop
     5      5      1      0
     6      6      1      0
PQQ: New service table
SvcIdx  SvcId Active ActDop
     1      1      1      0
     2      2      1      0
     5      5      1      0
     6      6      1      0
2012-01-02 01:49:39.805820 : GSIPC:KSXPCB: msg 0x9bc353f0 status 34, type 12, dest 1, rcvr 0

*** 2012-01-02 01:49:54.509
PQQ: Skipping service checks
Trace file /s01/orabase/diag/rdbms/vprod/VPROD2/trace/VPROD2_dbrm_5466.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
ORACLE_HOME = /s01/orabase/product/11.2.0/dbhome_1
System name:    Linux
Node name:      vrh2.oracle.com
Release:        2.6.32-200.13.1.el5uek
Version:        #1 SMP Wed Jul 27 21:02:33 EDT 2011
Machine:        x86_64
Instance name: VPROD2
Redo thread mounted by this instance: 2
Oracle process number: 7
Unix process pid: 5466, image: oracle@vrh2.oracle.com (DBRM)

*** 2012-01-03 03:05:54.518
*** SESSION ID:(165.1) 2012-01-03 03:05:54.518
*** CLIENT ID:() 2012-01-03 03:05:54.518
*** SERVICE NAME:() 2012-01-03 03:05:54.518
*** MODULE NAME:() 2012-01-03 03:05:54.518
*** ACTION NAME:() 2012-01-03 03:05:54.518

PQQ: Skipping service checks
kgsksysstop: blocking mode (2) timestamp: 1325577954530079
kgsksysstop: successful
kgsksysresume: successful

*** 2012-01-03 03:05:59.270
PQQ: Active Services changed
PQQ: Old service table
SvcIdx  SvcId Active ActDop
     5      5      1      0
     6      6      1      0
PQQ: New service table
SvcIdx  SvcId Active ActDop
     1      1      1      0
     2      2      1      0
     5      5      1      0
     6      6      1      0
PQQ: Checking service limits

*** 2012-01-07 02:06:51.856
PQQ: Skipping service checks
PQQ: Checking service limits

*** 2012-01-08 23:12:11.302
PQQ: Skipping service checks
Heavy swapping observed in last 5 mins:    [pct of total memory][bytes]

*** 2012-01-09 22:39:51.619
total swpin [ 3.07%][124709K], total swpout [ 4.44%][180120K]
vm stats captured every 30 secs for last 5 mins:
swpin:                 swpout:  
[ 0.27%][     11096K]  [ 0.25%][     10451K]
[ 0.27%][     11240K]  [ 0.29%][     12000K]
[ 0.29%][     12001K]  [ 0.02%][       853K]
[ 0.16%][      6849K]  [ 0.02%][       966K]
[ 0.53%][     21604K]  [ 0.09%][      4031K]
[ 0.10%][      4415K]  [ 0.03%][      1414K]
[ 0.43%][     17808K]  [ 0.37%][     15016K]
[ 0.64%][     25972K]  [ 1.61%][     65515K]
[ 0.26%][     10560K]  [ 0.88%][     36051K]
[ 0.07%][      3164K]  [ 0.83%][     33823K]

 

可以看到dbrm收集到了短期内的swapin和swapout数据,这样便于我们诊断由swap造成的性能或者hang问题。

 

解决OS 系统严重swap的一些思路:

1.  诊断是否存在内存泄露的进程,解决内存泄露
2.  调优SGA/PGA ,减少oracle对内存的占用
3.  利用  echo 3 > /proc/sys/vm/drop_caches 命令可以暂时释放一些cache的内存
4. 调整系统VM内存管理参数, 例如Linux上sysctl.conf中的以下几个参数

vm.min_free_kbytes   :Raising the value in /proc/sys/vm/min_free_kbytes will cause the system to start reclaiming memory at an earlier time than it would have before.

vm.vfs_cache_pressure :        At the default value of vfs_cache_pressure = 100 the kernel will attempt to reclaim dentries and inodes at a “fair” rate with respect to pagecache and swapcache reclaim. Decreasing vfs_cache_pressure causes the kernel to prefer to retain dentry and inode caches. Increasing vfs_cache_pressure beyond 100 causes the kernel to prefer to reclaim dentries and inodes.

vm.swappiness  : default 60 ;Apparently /proc/sys/vm/swappiness on Red Hat Linux allows the admin to tune how aggressively the kernel swaps out processes’ memory. Decreasing the  swappiness setting may result in improved Directory performance as the kernel
holds more of the server process in memory longer before swapping it out.

设置以下值,减少out of memory的可能性:

# Oracle-Validated setting for vm.min_free_kbytes is 51200 to avoid OOM killer
vm.min_free_kbytes = 51200
#vm.swappiness = 40
vm.vfs_cache_pressure = 200

RAC中增大db_cache_size引发的ORA-04031错误

几个礼拜前, 有一套10.2.0.2 的 二节点RAC 数据库因为增大db_cache_size , 引发其中一个实例发生著名的ORA-04031 错误,日志如下:

 

Errors in file /oracle/oracle/admin/maclean/udump/u1_ora_13757.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 1048 bytes of shared memory
("shared pool","select name,online$,contents...","Typecheck","kgghteInit")
ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 1048 bytes of shared memory
("shared pool","select name,online$,contents...","Typecheck","seg:kggfaAllocSeg")
Thu Oct 13 08:25:05 2011
Log from www.oracledatabase12g.com  & www.askmaclean.com
Errors in file /oracle/oracle/admin/maclean/udump/u1_ora_1444.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 4120 bytes of shared memory
("shared pool","select name,online$,contents...","Typecheck","kgghtInit")
ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 4120 bytes of shared memory
("shared pool","select name,online$,contents...","Typecheck","kgghtInit")

 

以上错误出现的同时实例出现大量的row cache lock字典缓存和cursor:pin S wait on X等待事件,说明共享池中的row cache字典缓存和SQL area 执行计划因为Free Memory不足而被不断换出,导致硬解析增多并SQL解析性能下降,进一步造成了应用程序挂起,赶到现场后对该ORA-04031错误进行了分析。

SGA中的内存池包含不同大小的内存块。当数据库启动时,会有一块大的内存被分配并使用Free list的空闲列表追踪。随着时间推移,这些内存被不断分配和释放,内存块(chunk)被按照其大小在不同的Fress list中移动,当SGA里任何一个内存池出现不能满足内部分配一整块连续的内存块请求时,就可能出现ORA-04031错误。实际使用中造成ORA-04031错误的原因可能是Oracle软件bug、产品缺陷、应用程序设计不当、Oracle内存参数设置不当。

 

这里出现ORA-04031错误的内存池是shared pool即共享池,为了搞清楚ORA-04031错误发生的实际原因,我们通过AWR报告分析共享池的使用情况。

 

以下是 ORA-04031 问题发生前一天AWR报告中的共享池内存使用情况:

 

Pool Name Begin MB End MB % Diff
large free memory 112.00 112.00 0.00
shared ASH buffers 25.60 25.60 0.00
shared CCursor 19.44 20.16 3.70
shared Checkpoint queue 5.87 5.87 0.00
shared PCursor 10.57 11.14 5.38
shared event statistics per sess 7.72 7.72 0.00
shared free memory 32.99 33.00 0.02
shared gcs resources 78.75 78.75 0.00
shared gcs shadows 49.61 49.61 0.00
shared ges big msg buffers 15.03 15.03 0.00
shared ges reserved msg buffers 7.86 7.86 0.00
shared ges resource 5.28 5.28 0.00
shared kglsim heap 16.63 16.63 0.00
shared kglsim object batch 25.63 25.63 0.00
shared library cache 21.32 22.01 3.23
shared row cache 7.13 7.13 0.00
shared sql area 64.06 61.55 -3.91
streams free memory 64.00 64.00 0.00
buffer_cache 3,936.00 3,936.00 0.00
fixed_sga 2.08 2.08 0.00
log_buffer 3.09 3.09 0.00

 

以下是 ORA-04031 问题发生时AWR报告中的共享池内存使用情况:

 

Pool Name Begin MB End MB % Diff
large free memory 112.00 112.00 0.00
shared ASH buffers 25.60 25.60 0.00
shared Checkpoint queue 5.87 5.87 0.00
shared KCL name table 9.00 9.00 0.00
shared event statistics per sess 7.72 7.72 0.00
shared free memory 25.56 25.52 -0.12
shared gcs resources 143.39 143.39 0.00
shared gcs shadows 90.33 90.33 0.00
shared ges big msg buffers 15.03 15.03 0.00
shared ges reserved msg buffers 7.86 7.86 0.00
shared library cache 7.59 7.65 0.80
shared row cache 7.13 7.13 0.00
shared sql area 8.70 7.35 -15.57
streams free memory 64.00 64.00 0.00
buffer_cache 7,168.00 7,168.00 0.00
fixed_sga 2.09 2.09 0.00
log_buffer 3.09 3.09 0.00

 

红色部分标注了2个报告中差异最大的地方,在问题发生时共享池中gcs resources和gcs shadows 2种资源对比前一天增长了169M。 gcs资源在共享池中享有较高的优先级, 而普通的SQL语句或执行计划享有较低的优先级,因为gcs资源所占用空间的大量膨胀,导致在没有调大共享池大小的情况下sql area和row cache内存资源被换出进而引发SQL解析性能下降和ORA-04031问题。

 

gcs resources和gcs shadow资源均是Oracle RAC中特有的全局缓存服务资源,这些资源负责处理RAC中的全局buffer cache。 同时这些资源所占用共享池的空间视乎Oracle实例所使用高速缓存的大小而决定,Metalink文档说明了该问题:

 

“The ‘gcs resources’ and ‘gcs shadows’ structures are used for handling buffer caches in RAC, so their memory usages are depending on buffer cache size. We can use V$RESOURCE_LIMIT to monitor them.”

 

当实例高速缓存buffer cache被大小时gcs资源所占用的空间也相应增长,具体算法如下:

 

‘gcs_resources’ = initial_allocation * 120 bytes = “_gcs_resources parameter” * 120 bytes
‘gcs_shadows’ = initial_allocation * 72 bytes = “_gcs_shadow_locks parameter” * 72 bytes

select * from v$resource_limit where resource_name like '%gcs%';

RESOURCE_NAME CURRENT_UTILIZATION MAX_UTILIZATION INITIAL_ALLOCATION LIMIT_VALUE
------------------------------ ------------------- --------------- ---
gcs_resources 507772 514607 976083 976083
gcs_shadows 133862 139927 976083 976083

 

我们可以通过现有的v$resource_limit视图中的INITIAL_ALLOCATION估算Buffer cache增加后的INITIAL_ALLOCATION数量,例如我们准备将db_cache_size从10g增加到20g,那么可以通过下列公式算出有必要增加的共享池大小:

 

add_to_shared_pool_size= 140 * Buffer_cache增加的兆数 * 192 bytes * 1.6

= 140 * 10* 1024 * 192 * 1.6 = 440401920 = 420M

 

问题总结

 

由于RAC环境中Oracle 使用共享池中的gcs resource/shadow 资源管理 全局缓存 , 当实例的Buffer Cache总量增加时gcs resource/shadow 这些资源的数目也会相应上升 , 这导致共享池中可用的剩余空间大幅下降,又因为 gcs 全局缓存资源在共享池中享有较高的优先级( perm ,且在10.2中 gcs资源不能和其他如row cache或library cache 共享一个Extent的内存区间) , 引发了大量的row/dictionary cache字典缓存和SQL执行计划被换出共享池, 引发大量的解析等待: cursor pin s on x 和 row cache lock ,  最终还是 没有避免ORA-04031 的错误被触发。 这里要补充一句, 因为这套10g 的系统没有 启用ASMM(Automatic Shared Memory Management) 特性 且 共享池本身设置地较小(shared_pool_size=512MB) , 都是导致该ORA-04031 错误较为显性地被触发的因素 。

 

<深入了解ASMM>中我介绍了ASMM的一些优点, 这里再罗列一下:

手动管理SGA的缺点在于:

  • 个别组件如shared pool、default buffer pool的大小存在最优值,但组件之间无法交换内存
  • 在9i中就提供了多种内存建议(advisor),但都要求人工手动干预
  • 无法适应工作负载存在变化的环境
  • 往往会导致内存浪费,没有用到实处
  • 若设置不当,引发著名的ORA-04031错误的可能性大大提高

ASMM自动管理SGA的优势在于:

  • 全自动的共享内存管理
  • 无需再配置每一个内存组件大小参数
  • 仅使用一个参数sga_target驱动
  • 有效利用所有可用的内存,极大程度上减少内存浪费
  • 对比MSMM其内存管理模式:
    • 更加动态
    • 更加灵活
    • 并具备适应性
  • 易于使用
  • 一定程度上增强了性能,因为内存分配更为合理了
  • 当某个组件急需更多内存时可以有效提供,因此可以一定程度避免ORA-04031的发生

 

解决方案

 

可以通过以下2种方式避免该RAC环境中特有的由增大Buffer_Cache导致GCS资源空间膨胀造成的ORA-04031问题:

  1. 在增加Buffer_Cache的同时估算所需相应增长的shared pool共享池大小
  2. 使用10g的SGA自动管理内存ASMM方式可以一定程度上避免ORA-04031错误的发生,但是自动管理方式存在实际使用时存在一些缺点,建议在启用ASMM:SGA_Target的同时,设置 shared_pool_size和 db_cache_size的最小大小,以最大可能避免因resize造成的问题; 同时建议设置_enabled_shared_pool_duration=false,禁用shared pool duration特性,也可以一定程度上减少ORA-04031发生的概率。

_enabled_shared_pool_duration:该参数控制是否启用10g中特有的shared pool duration特性,当我们设置sga_target为0时该参数为false;同时在10.2.0.5前若cursor_space_for_time设置为true时该参数也为false,不过在10.2.0.5以后cursor_space_for_time参数被废弃

Slide:深入了解Oracle自动内存管理ASMM by Maclean Liu

PL/SQL Virtual Machine Memory Usage

PL/SQL Program Units即PL/SQL程序单元,常被叫做”library units”或lib-units.

参考以下模块类型:

  • package spec
  • package body
  • top-level function or procedure
  • type spec
  • type body
  • trigger
  • anonymous blocks.

PL/SQL 虚拟机的内存使用主要体现在4个方面:

  • PGA
    • PL/SQL stack call,用于保存本地变量和其他一些状态结构
    • NCOMP生成的动态链接库文件
  • CGA
    • 二级内存(secondary memory),分配的堆和大的可收缩本地变量如大的strings、Lob或collections
  • UGA
    • 程度单元的实例(library-unit instantiations),如package global variables, DL0/ DL1 dependency vectors, display frame等
  • SGA
    共享池中的MCODE子堆

KGL – Kernel Generic Library Manager
该layer管理会话间需要共享的资源,如PL/SQL MCODE,Diana,Source,SQL cursor,SQL Plan)

KGI – Kernel Generic Instantiation Layer.
该layer管理特定会话非共享的资源,如实例化的包含了包全局变量状态信息的PL/SQL程序单元

KOH/KGH 该layer用以管理heap service堆服务

KGL_Entry_PLSQL_UNIT

 

PLSQL MCODE Heap的属性

  • machine dependent binary format for a compiled PL/SQL library-unit.
  • to execute code in a lib-unit, its MCODE heap must be loaded in memory.
  • MCODE is loaded in SGA and is “pinned” for CALL duration.
  • once unpinned, the heap may be aged; hence, may need to get re-loaded.
  • important to page large data structures in SGA.

MCODE Heap: Subcomponents

  • EntryPoint Piece (PL_UEP)
  • Code Segment or Byte Code Piece (PL_UCP)
  • Constant Pool:
    • Data Segment (PL_UKP)
    • Handle Segment (PL_UHS)
  • SQL Strings Table (PL_USP)

PL/SQL Instantiations

  • When a lib-unit is first referenced by a program (session) an instantiation of the lib unit is created.
  • PL/SQL relies on KGI for inst obj mgmt.
  • A PL/SQL lib-unit instantiation consists of:
    • PLIO struct (the handle of the PL/SQL inst obj)
    • Static Frame
    • Secondary (Heap) Memory for package globals
  • PLIO Struct
    • first portion of PLIO struct is the KGIOB struct (kgi’s portion of the object handle)
    • points to the static frame struct (PLIOST)
    • also contains other book-keeping info (such as memory duration of instantiation’s work area, etc.)
  • Static Frame:
    • represents that part of instantiation’s work area whose size is compile-time determined.
    • the root of the static frame is PLIOST struct which leads the following sub-pieces:
      • depends-on array to global variable vectors (DL0)
      • depends-on array to other instantiations (DL1)
      • Display Frame (DPF)
      • global variable vector for this unit (GF)
      • primary memory for global variables.
  • Secondary Memory for package globals
    • used to allocate data types that are stored out-of-line (heap allocated) e.g., collections, large strings, large records, LOBs, datetime types, etc.

Structure of a PLSQL Instantiation Object
Memory Model In PLSQL Instantiation

SHMALL, SHMMAX and SGA sizing

Question:

I need to confirm my Linux kernel settings and also get pointers/explanation on how i need to properly setup my kernel for proper operation of the Oracle Server.
My aim for the SR is not so much to get actual answers on how to set values. Rather, I need help to clear up the concepts behind the numbers.

From the output of the commands below it can be seen that the server has 12 GB of memory and after the kernel is configured (see below output of ipcs -lms command), I have SHMMAX set at 8589933568.
After consulting various documents I have come to understand the following, please verify:

- The largest SGA size is that defined by PAGESIZE*kernel.shmall (in this case 16GB, which is a mistake apparently as the system only has 12GB of RAM)
- It is OK for shmmax to be smaller than the requested SGA. If additional size is needed, then the space will be allocated in multiple pages, as long as the size does not exceed PAGESIZE*kernel.shmall
- If more than one Oracle instances reside on the same server, then Linux Kernel settings will have to cater for the largest instance SGA, since
- … different instances will hold completely different memory segments, which will have to seperately adhere to kernel limitations, therefore the kernel limitations do not care for multiple instances, as those are different memory areas
- Memory for SGA is allocated completely by setting SGA_TARGET. In a different case, it will be allocated as needed

$ free
total used free shared buffers cached
Mem: 12299352 8217844 4081508 0 190816 6799828
-/+ buffers/cache: 1227200 11072152
Swap: 16775764 90912 16684852

ipcs -lms

—— Shared Memory Limits ——–
max number of segments = 4096
max seg size (kbytes) = 8388607
max total shared memory (kbytes) = 16777216
min seg size (bytes) = 1

—— Semaphore Limits ——–
max number of arrays = 128
max semaphores per array = 250
max semaphores system wide = 32000
max ops per semop call = 32000
semaphore max value = 32767

also ‘getconf PAGESIZE’ returns 4096

Answer:

- The largest SGA size is that defined by PAGESIZE*kernel.shmall (in this case 16GB, which is a mistake apparently as the system only has 12GB of RAM)

Comment :
Yes this needs to comply with the formula :
kernel.shmall = physical RAM size / pagesize as per NOTE:339510.1 .

- It is OK for shmmax to be smaller than the requested SGA. If additional size is needed, then the space will be allocated in multiple pages, as long as the size does not exceed PAGESIZE*kernel.shmall

Comment :
Yes it is ok to have SHMMAX<SGASIZE NOTE:567506.1 .
The allocation will be done in multiple shared segments either contigues
or non contiguous as per NOTE:15566.1

- If more than one Oracle instances reside on the same server, then Linux Kernel settings will have to cater for the largest instance SGA, since
different instances will hold completely different memory segments, which will have to seperately adhere to kernel limitations, therefore the kernel limitations do not care for multiple instances, as those are different memory areas.

Comment :
Yes thats valid for the SHMMAX , but for the SHMALL it is a systemwide
kernel variable affected by the physical memory and the pagesize .

- Memory for SGA is allocated completely by setting SGA_TARGET. In a different case, it will be allocated as needed.

comment :

Memory for the SGA is allocated completely by the SGA_MAX_SIZE .

I need to confirm my Linux kernel settings and also get pointers/explanation on how i need to properly setup my kernel for proper operation of the Oracle Server.
My aim for the SR is not so much to get actual answers on how to set values. Rather, I need help to clear up the concepts behind the numbers.

From the output of the commands below it can be seen that the server has 12 GB of memory and after the kernel is configured (see below output of ipcs -lms command), I have SHMMAX set at 8589933568.
After consulting various documents I have come to understand the following, please verify:

- The largest SGA size is that defined by PAGESIZE*kernel.shmall (in this case 16GB, which is a mistake apparently as the system only has 12GB of RAM)
- It is OK for shmmax to be smaller than the requested SGA. If additional size is needed, then the space will be allocated in multiple pages, as long as the size does not exceed PAGESIZE*kernel.shmall
- If more than one Oracle instances reside on the same server, then Linux Kernel settings will have to cater for the largest instance SGA, since
- … different instances will hold completely different memory segments, which will have to seperately adhere to kernel limitations, therefore the kernel limitations do not care for multiple instances, as those are different memory areas
- Memory for SGA is allocated completely by setting SGA_TARGET. In a different case, it will be allocated as needed

$ free
total used free shared buffers cached
Mem: 12299352 8217844 4081508 0 190816 6799828
-/+ buffers/cache: 1227200 11072152
Swap: 16775764 90912 16684852

ipcs -lms

—— Shared Memory Limits ——–
max number of segments = 4096
max seg size (kbytes) = 8388607
max total shared memory (kbytes) = 16777216
min seg size (bytes) = 1

—— Semaphore Limits ——–
max number of arrays = 128
max semaphores per array = 250
max semaphores system wide = 32000
max ops per semop call = 32000
semaphore max value = 32767

also ‘getconf PAGESIZE’ returns 4096

 

Large Memory Footprints on AIX

Connor Mcdonald-一位Oracle极客为我们分享了一个AIX平台上11g独享服务进程内存占用过量的问题,该问题最后被确认为Bug”11G SERVER PROCESSES CONSUMING MUCH MORE MEMORY THAT 10G OR 9I”,相关文档如下:

 

Memory Footprint For Dedicated Server Processes More Than Doubled After 11g Upgrade On AIX Platform [ID 1246995.1]

Bug 9796810: 11G SERVER PROCESSES CONSUMING MUCH MORE MEMORY THAT 10G OR 9I

Bug 10190759: PROCESSES CONSUMING ADDITIONAL MEMORY DUE TO ‘USLA HEAP’

可以看到上述问题仅发生在从9i/10g升级到11g后,作为一个已确认的升级Bug值得我们大家去关注;最近几年这样的升级会越来越多,同时希望该Bug能在11.2.0.3中修复。

实际上我在10.2.0.3上就遇到过类似的Process Large Footprints问题:用户在打上一个one-off patch[6110331]后单个server process的rss量明显上升,主机的内存使用量大幅提高,虽然这个问题同样提交了SR,但最后没有确认为Bug;用户试图询问Oracle GCS关于rss上升的原因,但语焉而不详。

Search Criteria:AIX 11.2

Memory Footprint For Dedicated Server Processes More Than Doubled After 11g Upgrade On AIX Platform (Doc ID 1246995.1)

1. Have you installed the patch 10190759 ?

Review the note:
Memory Footprint For Dedicated Server Processes More Than Doubled After 11g Upgrade On AIX Platform (Doc ID 1246995.1)

If you have not installed the patch ?
–>>there is one available for 11.2.0.2.0, 11.2.0.2.2, 11.2.0.2.3

If you need me to review the patches you have installed you can upload the opatch listing?

opatch lsinventory -patch -detail

2. If you have already installed the patch 10190759 then

The additional memory seen allocated to oracle processes in the 11.2 release is a consequence of the additional link options added to the oracle link
line, -bexpfull and -brtllib. The two link options were specifically added in 11.2.0.1 to support the online patching feature.
Patch Name or Number: 10190759

 

Changes in the make file have been implemented such that you can relink without these options (-bexpfull and -brtllib) to avoid
additional memory overhead incurred by adding these options.These changes are available via a one-off patch.

This is a known bug: BUG:10190759 – PROCESSES CONSUMING ADDITIONAL MEMORY DUE TO ‘USLA HEAP’

Install  Patch: 10190759

How GoldenGate process consumes memory

Question:
We are using Golden Gate to replicate the data from Oracle 9.2.0.8 on Solaris8 SPARC 64 bit (GoldenGate Version 10.4.0.31 Build 001) to Oracle RAC 11.2.0.1 on Solaris10 SPARC 64bit (GoldenGate Version 10.4.0.19 Build 002). Both GoldenGate Extract and Replicat process are working fine. Please refer below information for more easy to understand our goldengate setup.

Extract Side | Replicat Side
Hostname: HK8SN020 | Hostname: HK8SP226 (HK8SP227 dont have any goldengate client, all goldengate process are located on HK8SP226)
Oracle 9.2.0.8 (32bit binary) | Oracle 11.2.0.1 (64bit binary)
Solaris8 Sparc 64bit Kernel | Solaris10 Sparc 64bit kernel
GoldenGate Version 10.4.0.31 Build 001 | GoldenGate Version 10.4.0.19 Build 002

However, on 27-Mar-2010, we found the server memory utilization on Solaris10 HK8SP226 are unexpected continuously rising since around 01:30. At around 3:20, the server memory utilization are up to 100%. At around 5:20, the server memory utilization is suddenly drop and back to normal. We compared the “Sar -r” and Solaris server message logfile. We found that on 05:21:44, goldengate relicate process are terminated with error message ” malloc 2097152 bytes failed”. After that, seem the server memory are suddenly released and back to normal memory utilization level.
We suspected the abnormal server memory usage are cased by goldengate replicate process. Can you please help to investigate and find out the root cause?

Answer:
GoldenGate replicates only committed transactions, it stores the operations of each transaction in a managed virtual-memory pool known as a cache until it receives either a commit or a rollback for that transaction. One global cache operates as a shared resource of an Extract process. The following sub-pools of virtual memory are allocated from the global cache:(1)One sub-pool per log reader thread for most transaction row data. and (2)One sub-pool for BLOB data and possibly other large items.

Within each sub-pool, individual buffers are allocated from the global cache, each one containing information that is relative to a transaction that is being processed by GoldenGate. The sizes of the initial and incremental buffers are controlled by the CACHEBUFFERSIZE option of CACHEMGR.

The actual amount of physical memory that is used by any GoldenGate process is controlled by the operating system, not the GoldenGate process. The global cache size is controlled by the CACHESIZE option of CACHEMGR.Cache manager keeps a GoldenGate process working within the soft limit of its global cache size, only allocating virtual memory (not physical memory) on demand. The actual amount of physical memory that is used by any GoldenGate process is controlled by the operating system, not the GoldenGate program.

GoldenGate cache manager only takes advantage of the memory management functions of the operating system to ensure that GoldenGate processes work in a sustained and efficient manner. Within cache, OGG makes use of all the modern “virtual memory” techniques by allocating and managing active buffers efficiently and recycling old buffers instead of paging to disk, when possible and paging less-used information to disk, when necessary.

When COM initializes, by default it first determines how much virtual memory the OS has available for it and uses that to determine what CACHESIZE should be. Default for CACHESIZE is 8GB for 64-bit systems and 2GB for 32-bit systems.

The available virtual memory is reported with the PROCESS VM AVAIL FROM OS value in the report file. The CACHESIZE value will either be rejected or sized down if it is larger than, or sufficiently close to, the amount of virtual memory that is available to the process.

The CACHESIZE value will always be a power of two, rounded down from the value of PROCESS VM AVAIL FROM OS, unless the latter is itself a power of two, in which case it is halved. After the specified size is consumed by data, the memory manager will try to free up memory by paging data to disk or by reusing aged buffers, before requesting more memory from the system.

The memory manager generates statistics that can be viewed with the SEND EXTRACT or SEND REPLICAT command when used with the CACHEMANAGER option.The statistics show the size of the memory pool, the paging frequency, the size of the transactions, and other information that creates a system profile. Based on this profile, you might need to make adjustments to the memory cache if you see performance problems that appear to be related to file caching. The first step is to modify the CACHESIZE and CACHEPAGEOUTSIZE parameters. You might need to use a higher or lower cache size, a higher or lower page size, or a combination of both, based on the size and type of transactions that are being generated. You might also need to adjust the initial memory allocation with the CACHEBUFFERSIZE option. It is possible, however, that operating system constraints could limit the effect of modifying any components of the CACHEMGR parameter. In particular, if the operating system has a small per-process virtual memory limit, it will force more file caching, regardless of the CACHEMGR configuration.

Once the CACHESIZE is set to 1 GB, the GoldenGate process will use up to 1 GB virtual memory and then it will use swap space on disk.
If the CACHESIZE is explicitly set in process parameter file then the CACHEMGR will use only 1GB. Otherwise it will default to the Memory size depending upon the platform(32 or 64).If a fixed CACHESIZE is set in the parameter file then it will be taken by the process, if no the default will be taken by the process depending upon the platform. If very low virtual memory limit is set or available in the OS then it will force more file caching. There is always a difference between caching in memory buffers and file caching as it involves read and write i/o’s.

So try to set a default CACHESIZE for the GoldenGate Process (Extract/Replicat). Edit the respective source extract and target replicat parameter files and use the below mentioned CACHEMGR parameter with the options given and restart the processes.

CACHEMGR CACHEBUFFERSIZE 64KB, CACHESIZE 1GB, CACHEDIRECTORY
, CACHEDIRECTORY
Example:
CACHEMGR CACHEBUFFERSIZE 64KB, CACHESIZE 1GB, CACHEDIRECTORY /ggs/dirtmp, CACHEDIRECTORY /ggs2/temp

So once the CACHESIZE is set to 1 GB, the GoldenGate process will use up to 1 GB virtual memory only and then after it will use swap space on disk.

共享池中的NETWORK BUFFER

中午休闲时在itpub看到一个关于network buffer占用大量内存的求助帖,帖子原文如下:

各位大侠们,请教个问题。昨天遇到一个solaris10平台下的oracle10g(10.2.0.4)数据库报共享内存不足,发现数据库的sga_target才2512M,而在v$sgastat视图中查到的
shared pool–>NETWORK BUFFER就有1848744416字节,是什么引起network buffer这么大呢,在udmp目录下1分钟产生几个跟 ORA-4031相关的文件。

==================
SQL> show parameter sga

NAME                                 TYPE        VALUE
———————————— ———– ——————————
lock_sga                             boolean     FALSE
pre_page_sga                         boolean     FALSE
sga_max_size                         big integer 2512M
sga_target                           big integer 2512M
SQL> show parameter share

NAME                                 TYPE        VALUE
———————————— ———– ——————————
hi_shared_memory_address             integer     0
max_shared_servers                   integer
shared_memory_address                integer     0
shared_pool_reserved_size            big integer 72142028
shared_pool_size                     big integer 0
shared_server_sessions               integer
shared_servers                       inte


NETWORK BUFFER对我们来说或许有些陌生,那是因为绝大多数场合都采用dedicated server模式,共享服务器模式下NETWORK BUFFER将被大量使用。MOS文档[741523.1]叙述了NETWORK BUFFER的主要用途:

On 10.2, after upgrading from 9iR2, the following error occurs:

ORA-07445: exception encountered: core dump [] [] [] [] [] []

plus

Dispatcher Trace file contains an ORA-4031 Diagnostic trace, with:
Allocation request for: NETWORK BUFFER

…followed by…

found dead dispatcher ‘D000′, pid = (12, 1)

The amount of memory used by NETWORK BUFFERs in the shared pool has significantly grown between 9.2 and 10.2.  The side-effect is to run-out of Shared Pool memory (reporting an ORA-4031), when a large number of sessions are connecting to the server (in the order of 1000′s).

While a session is being established, we allocate 3 buffers each of 32k in size.  After the session is established, we use the 3 SDU-sized buffers, however we do not deallocate the 3x32k buffer we allocated initially.

This issue has been logged in unpublished Bug 5410481.

Additionally, there is  Bug 6907529.

NS buffers are allocated based on the SDU specified by the user. The negotiated SDU could be considerably lower. The difference between these two is wasted.

For example, the dispatcher specifies an SDU of 32k. Clients, by default, use an SDU of 8k. The remaining 24k is never used.

Issue in Bug 6907529 is fixed in 11.2.

Bug 5410481 is fixed in 10.2.0.3.

As a workaround to 5410481, the ADDRESS part of DISPATCHERS parameter can be used to specify a smaller SDU size.

For example:
DISPATCHERS=”(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp))(SDU=8192))”

To implement the change;

  1. connect to the database as SYSDBA
  2. alter system set dispatchers=’(address=(protocol=tcp)(host=IP-Address)(sdu=8192))(dispatchers=DispatcherCount)’ scope=spfile;
  • re-start the database
  • 你可能会问SDU是什么?Oracle NET缓存的数据以SDU为基本单位,SDU即 session data unit,一般默认为8192 bytes。当这些数据单元被写满,或被client读取时,他们将被传递给Oracle Network层(oracle network layer)。譬如Data Guard环境中redo传输的每个Chunk往往要大于8192 bytes,那么默认的SDU就不太适用。当有大量重做数据要传输到standby库时,增大SDU buffer的大小可以改善Oracle的网络性能。你可以很方便的通过修改sqlnet.ora配置文件来修改SDU,如在该文件内加入以下条目:
    DEFAULT_SDU_SIZE=32767 /*修改全局默认SDU到32k*/
    当然你也可以在tnsnames.ora中定义服务别名时个别指定SDU,下文我们会用到。
    如上文所述在版本10.2.0.3以前当会话建立时,Oracle会以dispatchers参数定义的SDU为单位,分配3个单位的NETWORK  BUFFER,而实际上client端可能并未指定和dispatchers一致的SDU,若dispatchers中定义的SDU为32k,而client使用默认的8k SDU,则一个会话可能要浪费3*32-3*8=72k的NETWORK BUFFER。

    为什么共享服务器模式下会用到共享池中的NETWORK BUFFER,而独享服务器模式下没有呢?因为在独享服务器模式下每个会话所分配的三个SDU是从PGA中获取的;当使用共享服务器模式时会话与服务进程形成一对多的映射关系,这三个SDU 的NETWORK BUFFER同UGA一样转移到了SGA中。

    下面我们通过实践来进一步验证。

    SQL> select * from v$version;
    
    BANNER
    ----------------------------------------------------------------
    Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
    PL/SQL Release 10.2.0.4.0 - Production
    CORE    10.2.0.4.0      Production
    TNS for Linux: Version 10.2.0.4.0 - Production
    NLSRTL Version 10.2.0.4.0 - Production
    /*实验服务器端采用10.2.0.4版本*/
    SQL> show parameter dispatch
    
    NAME                                 TYPE        VALUE
    ------------------------------------ ----------- ------------------------------
    dispatchers                          string      (address=(protocol=tcp)(host=1
                                                            92.168.1.103)(sdu=32768))(SERV
                                                            ICE=cXDB)(dispatchers=10)
    /*dispatchers中指定了SDU为32k*/
    
    C:\Windows\System32>tnsping cXDB
    TNS Ping Utility for 32-bit Windows: Version 11.2.0.1.0 - Production on 05-8月 -2010 22:51:27
    Copyright (c) 1997, 2010, Oracle.  All rights reserved.
    已使用的参数文件:
    D:\tools\adminstrator\11g\orahome\network\admin\sqlnet.ora
    已使用 TNSNAMES 适配器来解析别名
    尝试连接 (DESCRIPTION = (SDU=8192) (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.103)(PORT = 1521))) (CONNECT_DATA = (SERVER = SHARED) (SERVICE_NAME = cXDB)))
    OK (30 毫秒)
    /* client端采用11.2.0.1版本,定义了共享服务器模式的服务别名,显式指定SDU为8192字节*/
    

    这里我们要用到一个简单的java程序,用来模拟大量会话登录;这个程序很傻瓜,但是总比你一个个开SQLPLUS要明智的多:

    /*这是一个很简单的java程序,登录远程数据库,并尝试打开600个回话,并且都指定了SDU为8192*/
    package javaapplication2;
    import oracle.jdbc.*;
    import java.sql.*;
    public class Main
    {
        public static void main(String[] args) throws SQLException
        {
            try
            {
                Class.forName("oracle.jdbc.driver.OracleDriver");
            }
            catch(Exception e )
            {
            }
            Connection cnn1=DriverManager.getConnection("jdbc:oracle:thin:@(DESCRIPTION = (SDU=8192) (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.103)(PORT = 1521))) (CONNECT_DATA = (SERVER = SHARED) (SERVICE_NAME = cXDB)))", "system", "D2nb51tz");
            Statement stat1=cnn1.createStatement();
            ResultSet rst1=stat1.executeQuery("select * from v$version");
            while(rst1.next())
            {
                System.out.println(rst1.getString(1));
            }
            Connection m[]=new Connection[2000];
            Statement s[]=new Statement[2000];
            ResultSet r[]=new ResultSet[2000];
            int i=0;
            while(i<600)
            {
                try
                {
                    m[i]=DriverManager.getConnection("jdbc:oracle:thin:@(DESCRIPTION = (SDU=8192) (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.103)(PORT = 1521))) (CONNECT_DATA = (SERVER = SHARED) (SERVICE_NAME = cXDB)))", "system", "d2nb51tz");
                }
                catch (Exception em)
                {
                    System.out.println(em.getMessage());
                }
                try
                {
                    Thread.sleep(3);
                }
                catch (Exception e)
                {
                }
                s[i]=m[i].createStatement();
                m[i].setAutoCommit(false);
                i++;
                System.out.println(i+"is ok !");
            }
            System.out.println("We are waiting!");
            try
            {
                Thread.sleep(1000);
            }
            catch (Exception e)
            {
            }
        }
    }
    

    编译上面这段程序,尝试执行看看,执行的同时留意观察NETWORK BUFFER:

    SQL> select name,pool,bytes from v$sgastat where name like '%NETWORK%';
    
    NAME                       POOL              BYTES
    -------------------------- ------------ ----------
    NETWORK BUFFER             shared pool      328080
    
    java -jar ora_network_buffer_test_8.jar
    /*启动编译后的测试程序*/
    
    SQL> select name,pool,bytes from v$sgastat where name like '%NETWORK%';
    NAME                       POOL              BYTES
    -------------------------- ------------ ----------
    NETWORK BUFFER             shared pool    69608200
    
    SQL> select name,pool,bytes from v$sgastat where name like '%NETWORK%';
    NAME                       POOL              BYTES
    -------------------------- ------------ ----------
    NETWORK BUFFER             shared pool      348960
    /*会话终止后,NETWORK BUFFER回缩*/
    
    修改上述程序中的SDU到32k,重新编译后再次测试
    java -jar ora_network_buffer_test_32.jar
    
    SQL> select name,pool,bytes from v$sgastat where name like '%NETWORK%';
    
    NAME                       POOL              BYTES
    -------------------------- ------------ ----------
    NETWORK BUFFER             shared pool      328080
    
    SQL> select name,pool,bytes from v$sgastat where name like '%NETWORK%';
    
    NAME                       POOL              BYTES
    -------------------------- ------------ ----------
    NETWORK BUFFER             shared pool    99148576
    /*可以看到同样的会话数量,client端SDU增大到32k后,NETWORK BUFFER有了大幅增长*/
    
    我们修改dispatchers参数中的SDU到8k看看
    SQL> alter system set dispatchers='';
    
    System altered.
    
    SQL> alter system set dispatchers='(address=(protocol=tcp)(host=192.168.1.103)(sdu=8192))(SERVICE=cXDB)(dispatchers=10)';
    
    System altered.
    SQL> show parameter dispatchers
    
    NAME                                 TYPE        VALUE
    ------------------------------------ ----------- ------------------------------
    dispatchers                          string      (address=(protocol=tcp)(host=1
                                                            92.168.1.103)(sdu=8192))(SERVI
                                                            CE=cXDB)(dispatchers=10)
    SQL> select name,pool,bytes from v$sgastat where name like '%NETWORK%';
    
    NAME                       POOL              BYTES
    -------------------------- ------------ ----------
    NETWORK BUFFER             shared pool      328080
    
    java -jar ora_network_buffer_test_32.jar
    
    SQL> select name,pool,bytes from v$sgastat where name like '%NETWORK%';
    
    NAME                       POOL              BYTES
    -------------------------- ------------ ----------
    NETWORK BUFFER             shared pool    99148552
    /*看起来dispatcher中的SDU优先级并没有client中的高*/
    我们再来看看client中SDU为8k的情况
    SQL> show parameter dispatchers
    
    NAME                                 TYPE        VALUE
    ------------------------------------ ----------- ------------------------------
    dispatchers                          string      (address=(protocol=tcp)(host=1
                                                            92.168.1.103)(sdu=8192))(SERVI
                                                            CE=cXDB)(dispatchers=10)
    SQL> select name,pool,bytes from v$sgastat where name like '%NETWORK%';
    
    NAME                       POOL              BYTES
    -------------------------- ------------ ----------
    NETWORK BUFFER             shared pool      328080
    
    java -jar ora_network_buffer_test_8.jar
    
    SQL> select name,pool,bytes from v$sgastat where name like '%NETWORK%';
    NAME                       POOL              BYTES
    -------------------------- ------------ ----------
    NETWORK BUFFER             shared pool    69608200
    /*与dispatchers中为32k,而client为8k时一样*/
    

    由以上实践可知10.2.0.4之后,NETWORK BUFFER的使用量由客户端设定的SDU和共享服务器会话数决定。我在之前的博文中曾经列出过TNS协议的几个基础类描述(见《Oracle 网络TNS协议的几个基础类描述》),其中Session包含了setSDU(int i)方法,其代码如下:

    public void setSDU(int i)
    {
    if(i <= 0) sdu = 2048;
    else if(i > 32767)
    sdu = 32767;
    else if(i < 512)
    sdu = 512;
    else
    sdu = i;
    }
    

    由以上代码可知,客户端设定的SDU时,其最大最小值分别为32k和512bytes,大于32k时被强制设为32k,而小于512bytes时被强制设为512bytes,若设定SDU<0,则被强制修正为2048 bytes,在512 bytes- 32767 bytes之间则为原值不变。