The resmgr:cpu quantum wait event

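resmgr:cpu quantum is a wait event caused by the Resource Manager feature. In theory it can only appear in 10g and later, and it should only occur during time windows in which a resource manager plan is active.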
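To confirm whether a plan is active at the moment the waits are seen, a minimal sketch (assuming a session with SELECT access to the V$ views) is to look at V$RSRC_PLAN and the resource_manager_plan parameter. Depending on the release, a built-in plan may be listed even when no user-defined plan has been set, so treat the output as a starting point rather than a verdict.

```sql
-- Plans currently active on this instance (top plan and any subplans).
SELECT name, is_top_plan
  FROM v$rsrc_plan;

-- The plan currently set through the initialization parameter.
SELECT value
  FROM v$parameter
 WHERE name = 'resource_manager_plan';
```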

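The wait event exists because, when Resource Manager controls CPU scheduling, it must make a process temporarily stop using the CPU and move it onto an internal run queue. This ensures that the process's consumer group does not consume more CPU than the resource manager directive allows.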

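At that point the session waits on the internal run queue under the name "resmgr:cpu quantum" for a period of time, which reduces contention for the CPU. The wait event ends when the session is given the CPU again.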
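The throttling described above can be observed from the dynamic performance views. The following is a sketch only, assuming access to V$SESSION and V$RSRC_CONSUMER_GROUP: the first query lists the sessions that are waiting on the event right now, and the second shows the CPU accounting per consumer group under the active plan.

```sql
-- Sessions currently waiting on the event, with their consumer group.
SELECT sid, serial#, username, resource_consumer_group, seconds_in_wait
  FROM v$session
 WHERE event = 'resmgr:cpu quantum'
   AND state = 'WAITING';

-- CPU consumed and CPU waits per consumer group.
SELECT name, active_sessions, consumed_cpu_time, cpu_waits, cpu_wait_time
  FROM v$rsrc_consumer_group;
```

A group with a high cpu_wait_time relative to consumed_cpu_time is the one being held back by its directive.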

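Note that although the vast majority of databases in China are unlikely to have a resource plan explicitly configured and activated, starting with 10g the default automatic statistics-gathering job (gather_stats_job) runs in the default maintenance windows, which are open from 22:00 to 06:00 on weekday nights and all day on Saturday and Sunday. A maintenance window activates an Oracle-predefined resource plan by default, so server processes can still enter the resmgr:cpu quantum wait event while the window is open.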
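To see which windows exist on a given database and which plan each one switches to, a short query against DBA_SCHEDULER_WINDOWS is enough. This is a sketch that assumes a user with access to the DBA views; the window names returned differ between releases.

```sql
-- Scheduler windows, the resource plan each one activates,
-- and whether the window is enabled and currently open.
SELECT window_name, resource_plan, enabled, active
  FROM dba_scheduler_windows
 ORDER BY window_name;
```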

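A reasonable amount of resmgr:cpu quantum waiting is the necessary price of CPU control. However, some bugs can cause a server process to hang on resmgr:cpu quantum and wait on the event indefinitely. These bugs occur mainly in 10g releases before 10.2.0.5.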

AWR Top 5 Timed Events:

Event                           Waits  Time(s)  Avg Wait(ms)  % Total Call Time  Wait Class
--------------------------  ---------  -------  ------------  -----------------  ----------
resmgr:cpu quantum          1,193,863   96,152            81               67.0  Scheduler
enq: HW - contention           11,789   13,172          1117                9.2  Configurat
enq: US - contention           38,030   11,876           312                8.3  Other
buffer busy waits              19,224   10,762           560                7.5  Concurrenc
enq: TX - index contention      4,862    6,989          1437                4.9  Concurrenc
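In the AWR figures above, 96,152 seconds spread over 1,193,863 waits works out to roughly 81 ms per wait. The same ratio can be read for the running instance from V$SYSTEM_EVENT. This is a sketch only: the view is cumulative since instance startup, so it will not match the interval of any particular AWR report.

```sql
-- Cumulative totals for the event since instance startup.
SELECT event,
       wait_class,
       total_waits,
       ROUND(time_waited_micro / 1000000) AS time_waited_s,
       ROUND(time_waited_micro / 1000 / NULLIF(total_waits, 0)) AS avg_wait_ms
  FROM v$system_event
 WHERE event = 'resmgr:cpu quantum';
```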

 

There are two ways to resolve the resmgr:cpu quantum wait event:

I. Disable the resource plan on the scheduler windows

To disable the DEFAULT_MAINTENANCE_PLAN, you can use the steps below, as suggested in Note 786346.1:

1. Set the current resource manager plan to null (or another plan that is not restrictive):

alter system set resource_manager_plan='' scope=both;

2. Change the active windows to use the null resource manager plan (or other nonrestrictive plan) using:

execute dbms_scheduler.set_attribute('WEEKNIGHT_WINDOW','RESOURCE_PLAN','');
execute dbms_scheduler.set_attribute('WEEKEND_WINDOW','RESOURCE_PLAN','');

3. Then, for each window_name (WINDOW_NAME from DBA_SCHEDULER_WINDOWS), run:

execute dbms_scheduler.set_attribute('<window name>','RESOURCE_PLAN','');
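Step 3 can be scripted instead of being run once per window. The block below is a minimal sketch, not part of the note: it assumes it is run by a user who is allowed to alter the scheduler windows (for example SYS), and it reuses the same set_attribute call as above for every window that still has a plan attached.

```sql
BEGIN
  -- Clear the resource plan on every window that currently has one.
  FOR w IN (SELECT window_name
              FROM dba_scheduler_windows
             WHERE resource_plan IS NOT NULL)
  LOOP
    -- Windows are owned by SYS, so the name is qualified explicitly.
    dbms_scheduler.set_attribute('SYS.' || w.window_name, 'RESOURCE_PLAN', '');
  END LOOP;
END;
/
```

Querying dba_scheduler_windows again afterwards should show an empty resource_plan for every window.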

 

II. Apply the necessary Resource Manager patches

See the document Recommended Patches for CPU Resource Manager (Doc ID 1339803.1).

It applies to Oracle RDBMS Enterprise Edition, versions 10.2.0.4, 10.2.0.5, 11.1.0.7, 11.2.0.1, 11.2.0.2, and 11.2.0.3.
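To match a database against the versions listed, the release and the recorded patch bundles can be checked from SQL. This is a sketch under the assumption that DBA_REGISTRY_HISTORY is populated on the release in question. It only shows bundles such as PSUs that were recorded when applied, so one-off patches still have to be verified in the OPatch inventory.

```sql
-- Database release, to compare with the versions the note covers.
SELECT banner
  FROM v$version;

-- Patch bundles (for example PSUs) recorded in the registry history.
SELECT action_time, action, version, comments
  FROM dba_registry_history
 ORDER BY action_time;
```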

The Oracle Database Resource Manager can be used in many ways, such as managing CPU, I/O, parallel execution, runaway queries, etc. This note addresses issues that are specific to managing CPU.
CPU Resource Manager is enabled when the current Resource Plan contains CPU management directives. These directives are called cpu_p1, cpu_p2, cpu_p3, etc. in releases prior to 11.1. They are called mgmt_p1, mgmt_p2, mgmt_p3, etc. in 11.1 and all subsequent releases.
When using CPU Resource Manager, 3 types of issues are seen:

(1) Excessive throttling, resulting in under-utilization of the CPU.
(2) Poor conformance to the Resource Plan.
(3) Crashes or internal errors.
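Whether a plan actually contains the CPU directives mentioned above can be checked in DBA_RSRC_PLAN_DIRECTIVES. The query below is a sketch: the plan name is only an example taken from the earlier section, and on 11.1 and later the mgmt_p1, mgmt_p2, ... columns are the ones to read.

```sql
-- CPU allocation per consumer group (or subplan) at the first three levels.
SELECT plan, group_or_subplan, cpu_p1, cpu_p2, cpu_p3
  FROM dba_rsrc_plan_directives
 WHERE plan = 'DEFAULT_MAINTENANCE_PLAN'   -- example plan name
 ORDER BY group_or_subplan;
```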

Below is a list of the most critical, known bugs for CPU Resource Manager. These bug fixes are recommended for all customers that are either evaluating or using CPU Resource Manager.

For other recommended bug fixes, monitoring scripts, and other tips for Resource Manager, see the master MOS Document 1339769.1.

Bug 6874858
  Title: CONNECTION TIMEOUT WHEN RMAN BACKUP RUNING
  Description: Resource Manager under-utilizes the CPUs. This bug occurs with workloads that contain a lot of sequential I/Os.
  Recommended releases: 10.2.0.4

Unpublished bug 8793492
  Title: INSTANCE CAGING LETS TOO MANY ORACLE FOREGROUND PROCESSES RUNNING PERIODICALLY
  Description: The name for this bug is misleading, as it also occurs when Instance Caging is not used. The symptom is long wait times for “library cache: mutex X”, resulting in excessive CPU utilization.
  Recommended releases: 11.1.0.7; 11.2.0.1 (fixed in Exadata BP7)

Unpublished bug 6431266
  Title: TURN OFF QUANTUM DONATION: PLAN CONFORM DISPARITY + HIGH THRESHOLD BASED ON LOW
  Description: This fix improves the accuracy with which Resource Manager enforces the resource allocations specified in the Resource Plan.
  Recommended releases: 10.2.0.4

Unpublished bug 8636407
  Title: INSTANCE CRASH DUE TO INTERNAL ERROR ENCOUNTERED BY PMON
  Description: The symptom of this bug is ORA-600[kgskexitsch]. It is a very intermittent bug that can result in instance crashes.
  Recommended releases: 10.2.0.4; 10.2.0.5 (fixed in PSU6); 11.1.0.7 (fixed in PSU10); 11.2.0.1

Bug 10039731
  Title: HIGH WAITS FOR RUNNABLE PROCESS USING PX AND RESMGR
  Description: Resource Manager under-utilizes the CPUs. This fix addresses the under-utilization. It also improves the accuracy with which Resource Manager enforces the resource allocations specified in the Resource Plan.
  Recommended releases: 11.2.0.1 (fixed in Exadata BP8); 11.2.0.2 (fixed in Exadata BP2 and PSU4)

Bug 8660422
  Title: SINCE APPLYING PATCH FOR 8624887 “UNSPECIFIED WAIT EVENT” IS SEEN IN AWR
  Description: This bug causes Resource Manager to under-utilize the CPUs. This bug can also result in a large number of “unspecified” wait events.
  Recommended releases: 10.2.0.4; 10.2.0.5 (fixed in 10.2.0.5.4); 11.1.0.7 (fixed in 11.1.0.7.7)

Unpublished bug 9924349
  Title: DBMV2: ORA 600 [RESPLAN:TRYADD_3] IN THE CELL SIDE WITH GE/DBFS WORKLOAD
  Description: This bug only occurs on Exadata systems. The symptom is a storage cell crash with the error ORA 600 [RESPLAN:TRYADD_3]. The workaround is to avoid using Resource Plans with subplans, such as the “default_maintenance_plan”, which is enabled during the maintenance windows.
  Recommended releases: Exadata-only issue. 11.2.0.1 (fixed in Exadata BP12); 11.2.0.2 (fixed in Exadata BP3)

Unpublished bug 10326338
  Title: HIGH RESMGR:CPU QUANTUM WITH APPSQOS_PLAN IN PLACE
  Description: Resource Manager under-utilizes the CPUs. This bug can occur with wait-intensive workloads (often OLTP databases) with any resource plan.
  Recommended releases: 10.2.0.4; 10.2.0.5; 11.1.0.7; 11.2.0.1 (fixed in Exadata BP11); 11.2.0.2 (fixed in Exadata BP6 and PSU4)

Unpublished bug 11064851
  Title: CPU CONSUMED TIME DOES NOT TAKE ACCOUNT OF SHORT WAITS
  Description: This bug only affects the Resource Manager statistics. The Resource Manager statistics for “cpu consumed” is too low. It doesn’t include CPU consumed during short waits.
  Recommended releases: 11.2.0.1; 11.2.0.2 (fixed in Exadata BP 10)

Unpublished bug 7414919
  Title: CONSUMED_CPU_TIME IN SYS_GROUP IS INCORRECTLY REPORTED
  Description: This bug only affects the Resource Manager statistics. The reported value of v$rsrc_consumer_group.consumed_cpu_time is too high.
  Recommended releases: 10.2.0.4

Unpublished bug 12420002
  Title: LNX64-11203-RAC:DB HIT ORA-700 [KGSKRECALC:RECALCRUNCOUNT1]
  Description: This bug can cause intermittent errors. The symptoms of this bug are ORA-600 [kgskewtx] or ORA-700[kskplanresetact] or ORA-700 [kgskrecalc:recalcruncount1] or ORA-600[kgskrunnextint:state].
  Recommended releases: 10.2.0.4; 10.2.0.5; 11.1.0.7; 11.2.0.1; 11.2.0.2

Unpublished bug 10219583
  Title: DBMV2: DATABASE HUNG AFTER 4 HRS OF ENABLE/DISABLE IORMPLAN
  Description: This bug only occurs on Exadata. It can cause the database to hang when Resource Manager plans are being set while smart scans are running. This bug occurs very intermittently. The workaround is to avoid changing database resource plans while workloads are running.
  Recommended releases: Exadata-only issue. 11.1.0.7; 11.2.0.1; 11.2.0.2 (fixed in BP6)

================================================

SQL> /

STATUS	 SQL_ID
-------- -------------
EVENT
----------------------------------------------------------------
ACTIVE
resmgr:cpu quantum

SQL> /

STATUS	 SQL_ID
-------- -------------
EVENT
----------------------------------------------------------------
ACTIVE
resmgr:cpu quantum

SQL> oradebug short_stack;
ksedsts()+461<-ksdxfstk()+32<-ksdxcb()+1782<-sspuser()+112<-__restore_rt()<-pfrrun_no_tool()+63<-pfrrun()+1030<-plsql_run()+774<-peicnt()+301<-kkxuexe()+1082<-kkxmpsexe()+1034<-kgmexwi()+631<-kgmexec()+1059<-evapls()+1071<-evaopn2()+904<-qerocImageIterStart()+400<-qerocStart()+443<-selexe0()+1088<-opiexe()+14771<-opipls()+3103<-opiodr()+1149<-__PGOSF141_rpidrus()+211<-skgmstack()+148<-rpiswu2()+617<-rpidrv()+1347<-psddr0()+464<-psdnal()+462<-pevm_OPND()+142<-pfrinstr_OPND()+53<-pfrrun_no_tool()+63<-pfrrun()+1030<-plsql_run()+774<-peidxr_run()+258<-peidxexe()+79<-kkxdexe()+609<-kkxmpexe()+241<-kgmexwi()+631<-kgmexec()+1059<-evapls()+1071<-evaopn2()+904<-qerocImageIterStart()+400<-qerocStart()+443<-qersoStart()+1581<-selexe0()+1088<-opiexe()+14771<-kpoal8()+2299<-opiodr()+1149<-ttcpip()+1251<-opitsk()+1633<-opiino()+958<-opiodr()+1149<-opidrv()+570<-sou2o()+103<-opimai_real()+133<-ssthrdmain()+214<-main()+201<-__libc_start_main()+244

SQL> oradebug short_stack;
ksedsts()+461<-ksdxfstk()+32<-ksdxcb()+1782<-sspuser()+112<-__restore_rt()<-__PGOSF346_pfrinstr_RELBRNCH()+62<-pfrrun_no_tool()+63<-pfrrun()+1030<-plsql_run()+774<-peicnt()+301<-kkxuexe()+1082<-kkxmpsexe()+1034<-kgmexwi()+631<-kgmexec()+1059<-evapls()+1071<-evaopn2()+904<-qerocImageIterStart()+400<-qerocStart()+443<-selexe0()+1088<-opiexe()+14771<-opipls()+3103<-opiodr()+1149<-__PGOSF141_rpidrus()+211<-skgmstack()+148<-rpiswu2()+617<-rpidrv()+1347<-psddr0()+464<-psdnal()+462<-pevm_OPND()+142<-pfrinstr_OPND()+53<-pfrrun_no_tool()+63<-pfrrun()+1030<-plsql_run()+774<-peidxr_run()+258<-peidxexe()+79<-kkxdexe()+609<-kkxmpexe()+241<-kgmexwi()+631<-kgmexec()+1059<-evapls()+1071<-evaopn2()+904<-qerocImageIterStart()+400<-qerocStart()+443<-qersoStart()+1581<-selexe0()+1088<-opiexe()+14771<-kpoal8()+2299<-opiodr()+1149<-ttcpip()+1251<-opitsk()+1633<-opiino()+958<-opiodr()+1149<-opidrv()+570<-sou2o()+103<-opimai_real()+133<-ssthrdmain()+214<-main()+201<-__libc_start_main()+244