2011年4月28日

Oracle内部错误:ORA-00600[kgskdecrstat1]一例

mac

Author

8 min

Read Time

295

Views

一套Itanium HP-UX上的9.2.0.5系统最近出现了ORA-00600: internal error code, arguments: [kgskdecrstat1], [], [], [], [], [],内部错误，其错误日志如下:

Mon Apr 18 12:32:20 2011
ORA-00600: internal error code, arguments: [kgskdecrstat1], [], [], [], [], [], [], []
ARC0: Completed archiving  log 6 thread 1 sequence 831803
Mon Apr 18 12:32:22 2011
ORA-00600: internal error code, arguments: [kgskdecrstat1], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kgskdecrstat1], [], [], [], [], [], [], []
Mon Apr 18 12:32:23 2011
ORA-00600: internal error code, arguments: [kgskdecrstat1], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kgskdecrstat1], [], [], [], [], [], [], []

Trace file
----------
Oracle9i Enterprise Edition Release 9.2.0.5.0 - 64bit Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.5.0 - Production
ORACLE_HOME = /opt/oracle/product/9.2.0.5
System name:	HP-UX
Release:	B.11.23
Version:	U
Machine:	ia64
Redo thread mounted by this instance: 1
Oracle process number: 28
Unix process pid: 14827, image: oracle@(TNS V1-V3)

*** 2011-04-18 02:39:15.043
*** SESSION ID:(276.16183) 2011-04-18 02:39:15.042
DEADLOCK DETECTED
Current SQL statement for this session:
The following deadlock is not an ORACLE error. It is a
deadlock due to user error in the design of an application
or from issuing incorrect ad-hoc SQL. The following
information may aid in determining the deadlock:
Deadlock graph:
                      ---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name          process session holds waits  process session holds waits
TX-00030022-0070d56f        28     276     X            344     259           X
TX-000a0012-0058e4fc       344     259     X             28     276           X
session 276: DID 0001-001C-0000002C	session 259: DID 0001-0158-000020E6
session 259: DID 0001-0158-000020E6	session 276: DID 0001-001C-0000002C
Rows waited on:
Session 259: obj - rowid = 00001F0D - AABNCDAAEAAAAXYAAL
 (dictionary objn - 7949, file - 4, block - 1496, slot - 11)
Session 276: obj - rowid = 00001F0B - AABNCAAAEAAAAXCAAY
 (dictionary objn - 7947, file - 4, block - 1474, slot - 24)
Information on the OTHER waiting sessions:
Session 259:
 pid=344 serial=17280 audsid=106170227 user: 41
           program: JDBC Thin Client
 application name: JDBC Thin Client, hash value=0
 Current SQL Statement:
End of information on OTHER waiting sessions.
*** 2011-04-18 12:32:20.392
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kgskdecrstat1], [], [], [], [], [], [], []
----- Call Stack Trace -----
... kgskdecrstat  kskdecrstat  ktudecrustat  ktcdso  ktcrcm  ktdcmt  k2lcom ...
----- End of Call Stack Trace -----

PROCESS STATE
-------------
   program: JDBC Thin Client
   application name: JDBC Thin Client, hash value=0
   last wait for 'SQL*Net message from client' blocking sess=0x0 seq=39332 wait_time=21242
               driver id=74637000, #bytes=1, =0

可以看到以上引发ORA-00600[kgskdecrstat1]内部错误的进程同时发现了死锁(DEADLOCK)，在MOS上搜索可以发现"Bug 2894072: ORA-00600: INTERNAL ERROR CODE, ARGUMENTS: [KGSKDECRSTAT1]"，但是该Bug最早发生在9.2.0.3上，已经确定影响9.2.0.x所有版本，并且在9.2.0.5上没有backport的bug fix。提交sr后，Oracle Gcs给出了解决的solution: 1.升级数据库到10.2.0.5，11.1.0.7，11.2.0.2等目前支持的版本 2.解决引发该bug的ORA-00060 dead lock问题 3.如果在解决ORA-00060后仍出现以上ORA-00600[kgskdecrstat1]内部错误，且启用了9i早期版本中的resource manager的话，可以尝试禁用该特性: ALTER SYSTEM SET resource_manager_plan='' SCOPE=BOTH; 以便绕过bug 2494790。

admin 2011-04-28

Hdr: 3090309 9.2.0.3 RDBMS 9.2.0.3 TXN MGMT DIST PRODID-5 PORTID-453 2494790 Abstract: ORA-600 [KGSKDECRSTAT1] W/ SCHEDULE_PUSH AND RESOURCE_LIMIT=TRUE PROBLEM: -------- Fidelity continues to get the ORA-600 [kgskdecrstat1] Even after applying Bug Patch 2494790. DIAGNOSTIC ANALYSIS: -------------------- Current SQL statement for this session: DECLARE job BINARY_INTEGER := :job; next_date DATE := :mydate; broken BOOLEAN := FALSE; BEGIN declare rc binary_integer; begin rc := sys.dbms_defer_sys.push(destination=>'FSADB20.WORLD', stop_on_error= >FALSE, execution_seconds=>5400, delay_seconds=>50, parallelism=>0); end; :mydate := next_date; IF broken THEN :b := 1; ELSE :b := 0; END IF; END; ----- PL/SQL Call Stack ----- If I change resource_limit=FALSE, there is no error occurring. However, we do use profiles for some usernames and want the resource_limit=TRUE and need a solution to this ora-600 bug. WORKAROUND: ----------- none RELATED BUGS: ------------- 2894072 2494790 REPRODUCIBILITY: ---------------- Always TEST CASE: ---------- STACK TRACE: ------------ ksedmp()+248 CALL ksedst()+0 kgerinv()+160 PTR_CALL 00000000 kgeasnmierr()+28 CALL kgerinv()+0 kgskdecrstat()+448 CALL kgeasnmierr()+0 kskdecrstat()+500 CALL kgskdecrstat()+0 ktudecrustat()+460 CALL kskdecrstat()+0 ktcdso()+756 CALL ktudecrustat()+0 ktcrcm()+2812 CALL ktcdso()+0 ktdcmt()+132 CALL ktcrcm()+0 k2lcom()+88 CALL ktdcmt()+0 k2send()+1024 CALL k2lcom()+0 xctctl()+96 CALL k2send()+0 xctcom()+284 CALL xctctl()+0 psdtcm()+216 CALL xctcom()+0 pevm_icd_call_commo PTR_CALL 00000000 pfrrun()+7420 CALL pevm_icd_call_commo peicnt()+216 CALL pfrrun()+0 kkxexe()+524 CALL peicnt()+0 opiexe()+11508 CALL kkxexe()+0 opiodr()+2488 PTR_CALL 00000000 rpidrus()+112 CALL opiodr()+0 skgmstack()+152 PTR_CALL 00000000 rpidru()+124 CALL skgmstack()+0 rpiswu2()+328 PTR_CALL 00000000 rpidrv()+2428 CALL rpiswu2()+0 rpiexe()+32 CALL rpidrv()+0 kkjex1e()+4052 CALL rpiexe()+0 kkjsexe()+240 CALL kkjex1e()+0 kkjrdp()+804 CALL kkjsexe()+0 opirip()+992 CALL kkjrdp()+0 opidrv()+684 CALL opirip()+0 sou2o()+16 CALL opidrv()+0 main()+160 CALL sou2o()+0 _start()+264 CALL main()+0 SUPPORTING INFORMATION: ----------------------- The scenario is not a problem when I stop and start the resource_manager_plan repeatedly in a short period of time. The scenario is when then plan is active, and oracle advanced replication transactions are pushed, than the ORA-600 error occurs each time. 24 HOUR CONTACT INFORMATION FOR P1 BUGS: ---------------------------------------- DIAL-IN INFORMATION: -------------------- IMPACT DATE: ------------ ASAP Currently in the certification stage ... Going into Production within the next 1-2 weeks. Fidelity needs to resolve this issue before going to prod. Question from Client regarding _job_queue_interval ... ****************************************************** I am taking your direction and applying the patch, 2494790 on the solaris 2.8 machine and bouncing the icmq1 database at 4pm est today. My question, is the change to job_queue_interval from 60 to 2000. The issue to me is that we use job_queue_interval for 60 now w/ advanced replication and have a VERY high flow of transactions per minute, and my understanding is that the job_queue_interval wakes up and looks for work every 60 seconds. Changing this to 2000 means our schedule_push would be pushing every 33 minutes, instead of every one minute. This would be unacceptable if this assumption is correct. Please advise. From Fidelity : We have successfully installed the patch 2494790 and bounced the icmq1 database. I have restarted the database resource manager and did schedule push of oracle replicated transactions and reproduced the ORA-600 errors. I will upload the alert log and the latest .trc file for your review. Also, the job_queue_interval init.ora parameter is not a normal parameter available in 9.2.0.3 for me to modify. ************************* I did inform them this is an "underscore" parameter, but based on your latest update, IN will upload the latest trace and alert files, and wait further instructions. Here is the process that Fidelity uses to create the plan and start the resourse manager.... There is additional info as well. Procedure from Fidelity: The commands to start and stop the resource manager are... (run as my dba account...) exec dbms_resource_manager.switch_plan(''); exec dbms_resource_manager.switch_plan('WEEKLY_SYNCHRO_PLAN'); Here's my procedure to install the database resource manager via the oracle enterprise manager gui... 1. (assumes database was migrated from 8.1.7.4.0 to 9.2.0.3.0) 2. existing resource manager plan, called WEEKLY_SYNCHRO_PLAN deleted via oem 3. use create option (not create wizard) for resource plan and create a new plan called WEEKLY_SYNCHRO_PLAN via om 4. create new resource groups for DBA_GROUP, REPORTING_GROUP, SYNCHRO_GROUP. 5. assign users to default groups and assign 90% to 5 different levels for 5 different resource groups. 6. apply parallelism limitations for each group 7. turn on the plan.. exec dbms_resource_manager.switch_plan(`WEEKLY_SYNCHRO_PLAN?); 8. run a transaction on a table that is set up in real-time replication and ensure transaction is pushed via schedule_push from this database. 9. observe alert log to see if ORA-600 occurred. Additional Info from Fidelity: I think I need to clarify that the database resource plan is normally not running in production except for once/week from wed 8pm to thursday about 10am. So when the 8i to 9i database upgrade is done, the resource manager is definitely not running. I delete the resource plan while it is not active. Just tested by shutting off the plan and turning it on 5 minutes later, then pushing a replicated transaction, but did NOT cause an ORA-600 to occur. So I cannot prove that this alone causes the error. I had also reported earlier that I had run a large periodic update producing thousands of replicated transactions for a few hours and no additional ORA-600 was produced during this time period. Latest info from Fidelity: 1) true, problem only exists when resource_limit=TRUE this init.ora parameter setting is a requiremnt for us to go live. 2) I did not specify an undo limit for the resource manager. We are using the "old" rollback segments for this live install on 12-SEP-2003 and are NOT utilizing the new undo management feature for this release due to reports that it was "buggy" in the first release of oracle 9i. Since we are not utilizing the undo mgt feature, what exactly do you want me to try a workaround with? Also, I could not find info on the 10720 trace event. but since it undoes the undo mgt feature and we are not utilizing this feature, is this still an issue? . The current definition for the undo_pool feature within database resource manager looks like this... all 5 resource groups show the undo_pool entry as UNLIMITED. ---------------------- The only info that I could find about event 10720 is what you have already indicated and the little info in not 76237.1 "traces db scheduling". Fidelity has set the event 10720 in the system ... Here is the outcome ... ---------------- So far so good, event added to init.ora, icmq1 database bounced, I have turned the database resource manager plan on and off several times in row and pushed orep transactions and have so far, not resulted in any alert log error and no ORA-600 errors. I'll continue to test. Now, the patch that was applied to this database, 2494790, is this still required or should it be backed out? if it should be backed out, how is this done? -------------------- Can you provide feedback on the 2494790 patch?

admin 2011-04-28

Bug 2494790 Undo quota discrepancies may be reported for global transactions This note gives a brief overview of bug 2494790. The content was last updated on: 29-MAR-2004 Click here for details of each of the sections below. Affects: Product (Component) Oracle Server (Rdbms) Range of versions believed to be affected Versions >= 9.0 but < 10.1.0.2 Versions confirmed as being affected 9.2.0.3 Platforms affected Generic (all / most platforms affected) Fixed: This issue is fixed in 9.0.1.5 iAS Patch Set 9.2.0.4 (Server Patch Set) 10.1.0.2 (Base Release) Symptoms: Related To: (None Specified) XA / Distributed Transactions Description When the resource manager is enabled, global transactions can cause undo quota discrepancy messages in the alert log. eg: KTUDECRUSTAT: undo quota discrepancy: xcb=0, xcbusz=52, ResMgr=0 Workaround: These messages can be ignored as they are not a serious problem.

admin 2011-04-28

Type B - Defect Fixed in Product Version - Severity 2 - Severe Loss of Service Product Version 9.2.0.3.0 Status 36 - Duplicate Bug. To Filer Platform 23 - Oracle Solaris on SPARC (64-bit) Created 08-Apr-2003 Platform Version 5.8 Updated 26-Nov-2003 Base Bug 2494790 Database Version 9.2.0.3.0 Affects Platforms Generic Product Source Oracle Affected Version - Related Products Line Oracle Database Products Family Oracle Database Area Oracle Database Product 5 - Oracle Server - Enterprise Edition Hdr: 2894072 9.2.0.3.0 RDBMS 9.2.0.3.0 TXN MGMT LOCAL PRODID-5 PORTID-23 ORA-600 2494790 Abstract: ORA-600: INTERNAL ERROR CODE, ARGUMENTS: [KGSKDECRSTAT1] PROBLEM: -------- Customer is getting ORA-600: internal error code, arguments: [kgskdecrstat1] DIAGNOSTIC ANALYSIS: -------------------- WORKAROUND: ----------- Unknown RELATED BUGS: ------------- Bug:1533623> BugDB see Bug:1533623.-P No Base Bug Related Bugs : <BugMatrix:1533623> DDR: <BugDesc:1533623> Customer: INTERNAL Created: 07-DEC-00 Component: RDBMS Comp Ver: 9.0.0 Rel St: A Updated: 05-JUL-01 Sub Comp: RDBMS Ver: 9.0.0 By: ARHEE Status: 91,Closed, Could Not Reproduce Sup Pri: 2,Severe Loss of Service Fixed In Ver: O/S: 453 Sun SPARC Solaris PL Group: UNIX Gen/Port: G Error #: - Pub: N ************************************************************************** Hdr: 1533623 9.0.0 RDBMS 9.0.0 PRODID-5 PORTID-453 Abstract: ORA-600 [KGSKDECRSTAT1], [], [], [], [], [] ON SWITCH CONSUMER GROUPS REPRODUCIBILITY: ---------------- TEST CASE: ---------- N/A one time error STACK TRACE: ------------ ORA-600: internal error code, arguments: [kgskdecrstat1], [], [], [], [], [], [], [] ----- Call Stack Trace ----- calling call entry argument values in hex location type point (? means dubious value) -------------------- -------- -------------------- ---------------------------- ksedmp kgerinv kgeasnmierr kgskdecrstat kskdecrstat ktudecrustat ktcdso ktcrcm ktdcmt k2lcom k2send xctctl xctcom opiodr ttcpip opitsk opiino opiodr opidrv sou2o main _start SUPPORTING INFORMATION: ----------------------- Will upload related files.

评论 (3)