Oracle RAC内部错误:ORA-00600[keltnfy-ldmInit]一例

一套SUNOS上的2节点10.2.0.2 RAC系统日前出现ORA-00600: internal error code, arguments: [keltnfy-ldmInit], [46], [1], [], [], [], [], []内部错误,错误发生时系统操作人员误使用hostname命令修改了1号主机的主机名,之后陆续出现以上ora-00600错误,同时操作系统日志显示RAC CSS进程意外终止,具体日志如下:

================== OS Message=====================
Jan 10 11:15:10 cupd25k-a root: [ID 702911 user.error] Cluster Ready Services completed waiting on dependencies.
Jan 10 11:15:16 cupd25k-a root: [ID 702911 user.error] Duplicate Oracle CLSMON found. Killing and restarting it.
Jan 10 11:15:16 cupd25k-a root: [ID 702911 user.error] Oracle CSS daemon failed to start up. Check CRS logs for diagnostics.
Jan 10 11:15:16 cupd25k-a root: [ID 702911 user.error] Oracle CLSMON terminated with unexpected status 137. Respawning

/* 这里的Duplicate Oracle CLSMON found 因该指的是OCLSMON进程,
"In Oracle 10.2.0.2 and above there is an additional process called OCLSOMON
which monitors the CSS daemon for hangs or scheduling issues and can reboot a
node if there is a perceived hang. OCLSOMON is spawned in init.cssd and runs
as the Oracle user."
   oclsmon进程在10.2.0.2以后版本被引入,用以监视css进程,
   若发生hang或操作系统调度问题时该进程可能会reboot节点,
   oclsmon进程会被init.cssd脚本spawned.  */

==================oclsmon.log======================
2011-01-10 11:15:11.376
unspecified member number is (1)
Member 1 group OCLSMON_ in use. Is oclsmon already up?
2011-01-10 11:15:11.479
Internal Error Information:
  Category: 8
  Operation: skgxnreg: the member number is i
  Location: skgxnreg_7
  Other:
  Dep: 1
2011-01-10 11:15:11.737
unspecified member number is (1)
Member 1 group OCLSMON_ in use. Is oclsmon already up?
2011-01-10 11:15:11.751
Internal Error Information:
  Category: 8
  Operation: skgxnreg: the member number is i
  Location: skgxnreg_7
  Other:
  Dep: 1
2011-01-10 11:15:12.006
unspecified member number is (1)
Member 1 group OCLSMON_ in use. Is oclsmon already up?
2011-01-10 11:15:12.023
Internal Error Information:
  Category: 8
  Operation: skgxnreg: the member number is i
  Location: skgxnreg_7
  Other:
  Dep: 1
2011-01-10 11:15:12.278
unspecified member number is (1)
Member 1 group OCLSMON_ in use. Is oclsmon already up?
2011-01-10 11:15:12.293
Internal Error Information:
  Category: 8
  Operation: skgxnreg: the member number is i
  Location: skgxnreg_7
  Other:
  Dep: 1

/*  skgxn是Oracle Clusterware用以监视skgxn事件(即第三方CLUSTERWARE相关的事宜,他们应该有用sun的cluster);
    似乎是修改hostname导致了Oracle CSS出现了fatal error,并启动了一个以上的OCLSMON进程(Duplicate Oracle CLSMON found),
    最后"Oracle CSS daemon failed to start up. Check CRS logs for diagnostics",
    在Oracle instance启动的情况下25k-a节点的CSS进程意外终止,
    可能导致该节点上的所有实例的LMD(global Enqueue Service daemon)、LMON无法正常工作而导致实例hang住。*/

==========================alert.log====================
Errors in file /oracle/oracle/admin/BOCPCS/udump/bocpcs1_ora_12320.trc:
ORA-00600: internal error code, arguments: [keltnfy-ldmInit], [46], [1], [], [], [], [], []

=========================part of trace file===============
*** 2011-01-10 11:11:02.957
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [keltnfy-ldmInit], [46], [1], [], [], [], [], []
Current SQL information unavailable - no session.
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedmp()+716         CALL     ksedst()             FFFFFFFF7FFF9D40 ?
                                                   000000000 ? 0FFFFFFFF ?
                                                   FFFFFFFF7FFF8EE8 ?
                                                   FFFFFFFF7FFFA640 ?
                                                   000000008 ?
kgerinv()+200        PTR_CALL 0000000000000000     000000002 ? 10638A1CC ?
                                                   000000001 ? 000000000 ?
                                                   10638A000 ? 10638A1CC ?
kgeasnmierr()+28     CALL     kgerinv()            106384B98 ? 000000000 ?
                                                   105D3B940 ? 000000002 ?
                                                   FFFFFFFF7FFFDFF0 ?
                                                   000001430 ?
keltnfy()+784        CALL     kgeasnmierr()        106384B98 ? 1064DCBF0 ?
                                                   105D3B940 ? 000000002 ?
                                                   000000000 ? 00000002E ?
kscnfy()+552         PTR_CALL 0000000000000000     10639B498 ? 38001E7A8 ?
                                                   1055AC5D0 ? 10639B498 ?
                                                   000102C00 ? 10638A1C0 ?
ksucrp()+2436        CALL     kscnfy()             000008000 ? 000808214 ?
                                                   100C4C220 ? 1055C6680 ?
                                                   00000000F ? 000000001 ?
opiino()+2056        CALL     ksucrp()             000106387 ? 380007608 ?
                                                   000000000 ? 000380000 ?
                                                   000106000 ? 106387618 ?
opiodr()+1488        PTR_CALL 0000000000000000     10555A000 ?
                                                   FFFFFFFF7FFFF1C8 ?
                                                   00010555A ? 000106000 ?
                                                   105C83000 ? 000000001 ?
opidrv()+828         CALL     opiodr()             106391000 ? 000000000 ?
                                                   106390DD8 ? 106390000 ?
                                                   106391BD0 ? 000106000 ?
sou2o()+80           CALL     opidrv()             106394358 ? 000000001 ?
                                                   00000003C ? 000000000 ?
                                                   00000003C ? 000106000 ?
opimai_real()+124    CALL     sou2o()              FFFFFFFF7FFFF788 ?
                                                   00000003C ? 000000004 ?
                                                   FFFFFFFF7FFFF7B0 ?
                                                   105C82000 ? 000105C82 ?
main()+152           CALL     opimai_real()        000000002 ?
                                                   FFFFFFFF7FFFF888 ?
                                                   103F1BBCC ? 10632DB10 ?
                                                   002411E44 ? 000014400 ?
_start()+380         CALL     main()               000000002 ? 000000008 ?
                                                   000000000 ?
                                                   FFFFFFFF7FFFF898 ?
                                                   FFFFFFFF7FFFF9A8 ?
                                                   FFFFFFFF7C700200 ?

/* 可以看到以上trace文件指出了no session,
    在服务进程启动阶段遭遇了该keltnfy-ldmInit内部错误*/

metalink文档Startup Database Produces Ora-00600: [Keltnfy-Ldminit] [ID 336447.1]
介绍了该内部错误一般由主机上的不当网络配置引起,很显然使用hostname命令修改了一个无法解析的
主机名时可能引发该ORA-00600[keltnfy-ldmInit]内部错误。

Applies to:
Oracle Server - Enterprise Edition - Version: 10.2.0.1 to 10.2.0.3 - Release: 10.2 to 10.2
Information in this document applies to any platform.
***Checked for relevance on 09-Jun-2010***
Symptoms

An startup nomount on Oracle 10g Release 2 database produces the following exception in alert log

Starting up ORACLE RDBMS Version: 10.2.0.1.0.
Errors in file /opt/oracle/10.2/admin/ORCL/udump/ORCL_ora_535.trc:
ORA-00600: internal error code, arguments: [keltnfy-ldmInit], [46], [1], [], [], [], [], []
USER: terminating instance due to error 600
Instance terminated by USER, pid = 535
Cause

The problem is related to getting host information.
In this case, ldmInit()/sldmInit() is failing with error 46 : LDMERR_HOST_NOT_FOUND

The following exception may also occur :

LDMERR_SOSD_INIT         OSD init failed to be specific in these OSD failures
 LDMERR_BAD_ADDR         bad address when system call gethostname failed
 LDMERR_HOST_NOT_FOUND   gethostbyname system call fails
 LDMERR_NO_SUPPORT       when specific address type is not supported

Development has fixed two bugs so far regarding this issue

Bug:5438154 - Abstract: ORA-600[KELTNFY-LDMINIT]  STARTING THE DB
Release Notes:
ldmInit returned LDMERR_HOST_NOT_FOUND for the machine huge alias list/address list
Workaround:
reduce the alais list of the machine

Bug:5486074 - Abstract: ORA-600 [KELTNFY-LDMINIT] WHEN DNS IS NOT AVAILABLE
Release Notes:
Internal error is raised by the Server Generated Alert subsystem when it can not determine Host Name or
Network Address. This can be caused by DNS server being unaavilable. 

Solution

The fix for 5486074 will not fix any underlying error from gethostbyname(), it just change the internal error to a warning message :

 "Warning: keltnfy call to ldmInit failed with error 46"

You will still need to fix the network config issue.  

These are the check you can do verify the host information 

      Check permission on /etc/hosts 

$ ls -l /etc/hosts
-rw-r--r--  2 root root 194 Oct 17  2006 /etc/hosts

      Check if /etc/hosts file is correctly configured

              ( all of this on one line ). 

Check the hostname:
$ hostname
$ ping `hostname`

Make sure you are able to ping the hostname
      Check if /etc/nodename is correctly configured

If you have DNS setup, ping is not a tool to diagnose DNS problem. A better tool to use is nslookup, dnsquery, or dig.

$ nslookup
$ nslookup
$ nslookup 

The forward and reverse lookup should succeed and return consistent address/info.  

 Check nsswitch.conf

$ more nsswitch.conf
hosts:      files dns
Make sure host lookup is also done through the /etc/hosts file and not just dns.  It is recommended that FILES come first before DNS.
Also, check the resolv.conf. This makes sure that the DNS is working properly.

显然在生产主机上使用hostname命令是危险的,因为你很难保证你在打字的时候不会因为同事的一下拍击而输错,有人说在生产环境中rm命令因该被禁用,那么这种特殊待遇对hostname命令也适用,我们可以用什么来代替hostname查看主机名呢?选择可以有非常多,这里我推荐一种:

-bash-3.00$ oslevel -r 
5300-07

-bash-3.00$ hostname
oracledatabase12g.com

-bash-3.00$ uname -n
oracledatabase12g.com

/* uname -n完全可以满足你的需要! */
That's great!

Oracle内部错误:ORA-00600[OSDEP_INTERNAL]一例

一套HP-UX上的9.2.0.5系统在shutdown abort时出现ORA-00600: internal error code, arguments: [OSDEP_INTERNAL], [], [], [], [], [], [], []内部错误,伴随有ORA-27302: failure occurred at: skgpwinit4,ORA-27303: additional information: attach to invalid skgp shared ctx,具体日志如下:

/opt/oracle/product/9.2.0.5/rdbms/log/ngende_ora_7669.trc
Oracle9i Enterprise Edition Release 9.2.0.5.0 - 64bit Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.5.0 - Production
ORACLE_HOME = /opt/oracle/product/9.2.0.5
System name: HP-UX
Node name: yictngd3
Release: B.11.23
Version: U
Machine: ia64
Instance name: nGende
Redo thread mounted by this instance: 0 
Oracle process number: 0
7669

*** 2010-09-08 00:10:02.985
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [OSDEP_INTERNAL], [], [], [], [], [], [], []
ORA-27302: failure occurred at: skgpwinit4
ORA-27303: additional information: attach to invalid skgp shared ctx
Current SQL information unavailable - no session.

Call stack
--------------
ksedmp <- ksfdmp <- kgerinv <- kgerin <- kgerecoserr <- ksucrp <- ksucresg <- kpolna 
<- kpogsk <- opiodr <- ttcpip <- opitsk <- Cannot <- Cannot <- Cannot <- Cannot <- opiino 
<- opiodr <- opidrv <- sou2o <- main <- main_opd_entry

经查该内部错误与操作系统共享内存有关,相关的Note有:

Ora-00600: Internal Error Code, Arguments: [Osdep_internal] [ID 304027.1]
Applies to:
Oracle Server - Enterprise Edition - Version: 9.2.0.2 to 10.2.0.3 - Release: 9.2 to 10.2
Information in this document applies to any platform.
***Checked for relevance on 03-NOV-2010***

Getting ORA-600 [OSDEP_INTERNAL] errors while starting up the database:

ORA-00600: internal error code, arguments: [OSDEP_INTERNAL],
[], [], [], [], [], [], []
ORA-27302: failure occurred at: skgpwreset1
ORA-27303: additional information: invalid shared ctx
ORA-27146: post/wait initialization failed
ORA-27300: OS system dependent operation:semget failed with status: 28
ORA-27301: OS failure message: No space left on device
ORA-27302: failure occurred at: sskgpsemsper
Symptoms
Getting ORA-600 [OSDEP_INTERNAL]
Accompanied by the following errors
ORA-27302:Failure occured at: skgpwreset1
ORA-27303:additional information: invalid shared ctx
ORA-27146: post/wait initialization failed
ORA-27300: OS system dependent operation: segment failed with error 28
ORA-27301: OS system Failure message: No space left on device
ORA-27302: failure occured at: sskgpsemsper

Cause
The functions in the trace file generated point to the semaphore settings .
Smmns is set too low.

Solution
set semmns 32767
Arrange to make the changes persistent as per the Operating system then restart the server and check if the changes are persistent.
eg: Linux /etc/sysctl.conf

sem = semmsl semmns semopm semmni
kernel.sem = 256 32768 100 228

Getting ORA-00600 [OSDEP_INTERNAL]: Internal Error While Trying To Connect / As Sysdba [ID 253885.1]

Applies to:
Oracle Server - Enterprise Edition - Version: 9.2.0.3 and later   [Release: 9.2 and later ]
HP-UX PA-RISC (64-bit)
Symptoms
Getting following error while trying to connect as sysdba using sqlplus:

SQL> conn / as sysdba
ERROR:
ORA-01041: internal error. hostdef extension doesn't exist

Alert.log shows:

ORA-00600: internal error code, arguments: [OSDEP_INTERNAL], [], [], [], [], [],[], []
ORA-27302: failure occurred at: skgpwinit4
ORA-27303: additional information: attach to invalid skgp shared ctx
Cause
- Database was shutdown using "shutdown abort" option.
- Shared memory segment was not removed even though the instance was down.
Solution
+ Check which shared memory segments are owned by the oracle owner

Use the ipcs -bm command:

% ipcs -bm

m 34034336 0xf8f18468 --rw-r----- ORACLE dba 16777216

+ Delete the 'orphan' shared memory segments:

% ipcrm -m 34034336

If there is more than one instance running on the server and you are not sure how to identify the shared
memory segments then please contact support.

不恰当的设置OS VM参数可能导致该问题,而在HP-UX PA-RISC平台上使用'shotdown abort'命令时可能因为共享内存未能正常移除而出现该内部错误;因为实例还是以'abort'方式关闭的,仅仅是共享内存未能释放,所以只需要以ipcs->ipcrm等os命令将相应的共享内存段释放就可以了,不会造成其他影响。

Oracle内部错误:ORA-00600[15801], [1]一例

一套Sparc Solaris上的11.1.0.7系统,在创建索引时频繁出现ORA-00600: internal error code, arguments: [15801], [1], [], [], [], [], [], [], [], [], [], []内部错误,日志信息如下:

Tue Aug 17 17:34:21 2010
WARNING: Oracle executable binary mismatch detected.
Binary of new process does not match binary which started instance
issue alter system set "_disable_image_check" = true to disable these messages
Tue Aug 17 17:34:21 2010
Errors in file /u04/app/oracle/diag/rdbms/orahcmu/ORAHCMU/trace/ORAHCMU_p023_22262.trc (incident=12505):
ORA-00600: internal error code, arguments: [15801], [1], [], [], [], [], [], [], [], [], [], []
Incident details in: /u04/app/oracle/diag/rdbms/orahcmu/ORAHCMU/incident/incdir_12505/ORAHCMU_p023_22262_i12505.trc
Tue Aug 17 17:34:21 2010
Errors in file /u04/app/oracle/diag/rdbms/orahcmu/ORAHCMU/trace/ORAHCMU_p021_22258.trc (incident=12489):
ORA-00600: internal error code, arguments: [15801], [1], [], [], [], [], [], [], [], [], [], []
Incident details in: /u04/app/oracle/diag/rdbms/orahcmu/ORAHCMU/incident/incdir_12489/ORAHCMU_p021_22258_i12489.trc

Errors in file /u04/app/oracle/diag/rdbms/orahcmu/ORAHCMU/trace/ORAHCMU_p015_9328.trc (incident=19909):
ORA-00600: internal error code, arguments: [15801], [1], [], [], [], [], [], [], [], [], [], []
Errors in file /u04/app/oracle/diag/rdbms/orahcmu/ORAHCMU/trace/ORAHCMU_p043_9388.trc (incident=20133):
ORA-00600: internal error code, arguments: [15801], [1], [], [], [], [], [], [], [], [], [], []
Mon Aug 23 14:43:42 2010
Errors in file /u04/app/oracle/diag/rdbms/orahcmu/ORAHCMU/trace/ORAHCMU_p087_9668.trc (incident=20485):
ORA-00600: internal error code, arguments: [15801], [1], [], [], [], [], [], [], [], [], [], []
Mon Aug 23 14:43:42 2010
Errors in file /u04/app/oracle/diag/rdbms/orahcmu/ORAHCMU/trace/ORAHCMU_p012_9322.trc (incident=19885):
ORA-00600: internal error code, arguments: [15801], [1], [], [], [], [], [], [], [], [], [], []
Incident details in: /u04/app/oracle/diag/rdbms/orahcmu/ORAHCMU/incident/incdir_19789/ORAHCMU_ora_8602_i19789.trc
Mon Aug 23 14:43:43 2010
WARNING: Oracle executable binary mismatch detected.
Binary of new process does not match binary which started instance
issue alter system set "_disable_image_check" = true to disable these messages

Dump continued from file: /u04/app/oracle/diag/rdbms/orahcmu/ORAHCMU/trace/ORAHCMU_ora_8602.trc
ORA-00600: internal error code, arguments: [15801], [1], [], [], [], [], [], [], [], [], [], []

*** 2010-08-23 14:43:42.974
----- Current SQL Statement for this session (sql_id=00abhfx460qm9) -----
CREATE UNIQUE iNDEX PS_HM_BEN_GP_STG ON PS_HM_BEN_GP_STG (CAL_ID, GP_PAYGROUP, 
EMPLID, EMPL_RCD, HM_INCURRED_BY, HM_SUM_ASSURED) TABLESPACE PSINDEX STORAGE 
(INITIAL 40000 NEXT 100000 MAXEXTENTS UNLIMITED PCTINCREASE 0) PCTFREE 10 PARALLEL NOLOGGING

----- Call Stack Trace -----
ksedst1 ksedst dbkedDefDump dbgexPhaseII dbgexProcessError dbgePostErrorKGE kgeade kgerem
kxfpProcessError kxfpqidqr kxfpqdqr kxfxgs kxfxcp qerpxSendParse kxfpValidateSlaveGroup kxfpgsg
kxfrAllocSlaves kxfrialo kxfralo qerpx_rowsrc_start qerpxStart kdicrws kdicdrv opiexe opiosq0
kpooprx kpoal8 opiodr ttcpip opitsk opiino opiodr opidrv sou2o main



SO: 0x3bf0bbf20, type: 4, owner: 0x3bf5452d0, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x3bf5452d0, name=session, file=ksu.h LINE:10719 ID:, pg=0
(session) sid: 217 ser: 767 trans: 0x3bc0660f8, creator: 0x3bf5452d0
flags: (0x8000041) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
flags2: (0x44008) DDLT1/-
DID: , short-term DID:
txn branch: 0x0
oct: 9, prv: 0, sql: 0x3b5d14510, psql: 0x3b6d59820, user: 31/SYSADM
ksuxds FALSE at location: 0
service name: ORAHCMU
client details:
O/S info: user: Administrator, term: UJWALTPVM, ospid: 304:2892
machine: WORKGROUP\UJWALTPVM program: pside.exe
client info: ujwal,Administrator,UJWALTPVM,,pside.exe,
application name: pside.exe, hash value=2824484291
Current Wait Stack:
Not in wait; last wait ended 2.475286 sec ago
Wait State:
auto_close=0 flags=0x21 boundary=0x0/-1
Session Wait History:
0: waited for 'lient'
=c8, =1, =0
wait_id=10483 seq_num=10484 snap_id=1
wait times: snap=0.168502 sec, exc=0.168502 sec, total=0.168502 sec
wait times: max=2.000000 sec
wait counts: calls=1 os=1
occurred after 0.000903 sec of elapsed time
1: waited for ' waiting for ruleset'
=10010063, =1, =0
wait_id=10482 seq_num=10483 snap_id=1
wait times: snap=0.008580 sec, exc=0.008580 sec, total=0.008580 sec
wait times: max=2.000000 sec
wait counts: calls=1 os=1
occurred after 0.000731 sec of elapsed time
2: waited for ' waiting for ruleset'
=1001004f, =4, =0
wait_id=10481 seq_num=10482 snap_id=1
wait times: snap=0.000132 sec, exc=0.000132 sec, total=0.000132 sec
wait times: max=2.000000 sec
wait counts: calls=1 os=1
occurred after 0.000074 sec of elapsed time
3: waited for ' waiting for ruleset'
=1001004f, =3, =0
wait_id=10480 seq_num=10481 snap_id=1
wait times: snap=0.000002 sec, exc=0.000002 sec, total=0.000002 sec
wait times: max=2.000000 sec
wait counts: calls=1 os=1
occurred after 0.000065 sec of elapsed time

----- Session Cursor Dump -----
Current cursor: 1, pgadep=0

Open cursors(pls, sys, hwm, max): 3(0, 2, 64, 300)
NULL=1 SYNTAX=0 PARSE=0 BOUND=1 FETCH=0 ROW=1
Cached frame pages(total, free):
4k(14, 14), 8k(1, 1), 16k(1, 1), 32k(0, 0)

----- Current Cursor -----


----- Plan Table -----

============
Plan Table
============
----------------------------------------------------+-----------------------------------+-------------------------+
| Id | Operation | Name | Rows | Bytes | Cost | Time | TQ |IN-OUT|PQ Distrib |
----------------------------------------------------+-----------------------------------+-------------------------+
| 0 | CREATE INDEX STATEMENT | | | | 2 | | | | |
| 1 | PX COORDINATOR | | | | | | | | |
| 2 | PX SEND QC (ORDER) | :TQ10001 | 82 | 4510 | | |:Q1001| P->S |QC (ORDER) |
| 3 | INDEX BUILD UNIQUE | PS_HM_BEN_GP_STG| | | | |:Q1001| PCWP | |
| 4 | SORT CREATE INDEX | | 82 | 4510 | | |:Q1001| PCWP | |
| 5 | PX RECEIVE | | 82 | 4510 | 2 | 00:00:01 |:Q1001| PCWP | |
| 6 | PX SEND RANGE | :TQ10000 | 82 | 4510 | 2 | 00:00:01 |:Q1000| P->P |RANGE |
| 7 | PX BLOCK ITERATOR | | 82 | 4510 | 2 | 00:00:01 |:Q1000| PCWC | |
| 8 | TABLE ACCESS FULL | PS_HM_BEN_GP_STG| 82 | 4510 | 2 | 00:00:01 |:Q1000| PCWP | |
----------------------------------------------------+-----------------------------------+-------------------------+


----------------------------------------
Cursor#1(0xffffffff7ce31928) state=BOUND curiob=0xffffffff7ce57d28
curflg=4c fl2=0 par=0x0 ses=0x3bf0bbf20
----- Dump Cursor sql_id=00abhfx460qm9 xsc=0xffffffff7ce57d28 cur=0xffffffff7ce31928 -----
Dump Parent Cursor sql_id=00abhfx460qm9 phd=0x3b5d14510 plk=0x3b0bb3318
sqltxt(0x3b5d14510)=CREATE UNIQUE iNDEX PS_HM_BEN_GP_STG ON PS_HM_BEN_GP_STG 
(CAL_ID, GP_PAYGROUP, EMPLID, EMPL_RCD, HM_INCURRED_BY, HM_SUM_ASSURED) 
TABLESPACE PSINDEX STORAGE (INITIAL 40000 NEXT 100000 MAXEXTENTS UNLIMITED PCTINCREASE 0) 
PCTFREE 10 PARALLEL NOLOGGING
hash=616eaa631fc21f4c0029707748605a69
parent=0x3ae539590 maxchild=01 plk=0x3b0bb3318 ppn=n
cursor instantiation=0xffffffff7ce57d28 used=1282545779 exec_id=16777216 exec=1
child#0(0x3b5d05e10) pcs=0x3b678c128
clk=0x3b7e200d0 ci=0x3b5b204c8 pn=0x39955d2b8 ctx=0x3b86ee988
kgsccflg=0 llk[0xffffffff7ce57d30,0xffffffff7ce57d30] idx=0
xscflg=c0102276 fl2=c000400 fl3=2202008 fl4=100
Frames pfr 0xffffffff7ce67098 siz=85976 efr 0xffffffff7ce66fb8 siz=85960
Cursor frame dump
enxt: 7.0x00000168 enxt: 6.0x00008000 enxt: 5.0x00008000 enxt: 4.0x00003978
enxt: 3.0x00000490 enxt: 2.0x000000b8 enxt: 1.0x00000fa0
pnxt: 1.0x00000010
kxscphp=0xffffffff7dd80a18 siz=984 inu=312 nps=312
kxscwhp=0xffffffff7ddd2cc8 siz=8136 inu=6264 nps=3968
kxscefhp=0xffffffff7ce51468 siz=88456 inu=86128 nps=86128


FileName
----------------
ORAHCMU_ora_8602.trc

FileComment
----------------------


Oracle Support - August 27, 2010 6:13:39 PM GMT+08:00 [ODM Data Collection]
Name
--------
=== ODM Data Collection ===

=== ODM Data Collection ===

Trace file /u04/app/oracle/diag/rdbms/orahcmu/ORAHCMU/trace/ORAHCMU_p012_9322.trc


*** 2010-08-23 14:43:00.472
WARNING: Oracle executable binary mismatch detected.
Binary of new process does not match binary which started instance
issue alter system set "_disable_image_check" = true to disable these messages
startup image information
iid info sz=245752512 inode=65458 ts=0x4c6df668
current process image information
iid info sz=245750720 inode=65427 ts=0x4c7204b0
set _disable_image_check = TRUE to disable this check
qksceLinearToCe error

*** 2010-08-23 14:43:42.974
*** SESSION ID:(220.111) 2010-08-23 14:43:42.974
*** CLIENT ID:(ujwal) 2010-08-23 14:43:42.974
*** SERVICE NAME:(ORAHCMU) 2010-08-23 14:43:42.974

DDE: Problem Key 'ORA 600 [15801]' was flood controlled (0x6) (incident: 19885)
ORA-00600: internal error code, arguments: [15801], [1], [], [], [], [], [], [], [], [], [], []
kxfxdss
KXFXSLAVESTATE dump [0, 0]
(pgakid: 0 oercnt: 0 oerrcd: -2224892588)
kxfxdss
no current cursor context.
kxfxdss
no cursors.

关于binary no match的问题已知是由于在实例启动情况下relink导致的;这个case提交了SR,metalink认为ORA-600 15801一般由QC与服务子进程通信问题引起:

The ORA-600 15801 is reporting a communication problem between QC and slaves related with messages sent/received.
Alert log reports several of the following error on the ASM instance:
ORA-600: internal error code, arguments: [15801], [1], [], [], [], [], [], 
[]

last wait was for 'eq: Msg Fragment' 

DIAGNOSTIC ANALYSIS:
--------------------
There were also several of the following message in the alert log:
WARNING: Oracle executable binary mismatch detected.
 Binary of new process does not match binary which started instance
issue alter system set "_disable_image_check" = true to disable these 
messages

So, I asked the customer to set the "_disable_image_check" = true 
This had no impact on the ora-600 errors as expected.

ORA-600 [15801] is signalled when a message overflow occurs between  PQ 
processes.

WORKAROUND:
-----------
none 
RELATED BUGS:
-------------
none
REPRODUCIBILITY:
----------------
intermittent but frequently - occurs at all different times of the day.
STACK TRACE:
------------
*** ID:(29.2904) 2006-07-05 15:50:57.972
qksceLinearToCe error
*** 15:50:58.233
ksedmp: internal or fatal error
ORA-600: internal error code, arguments: [15801], [1], [], [], [], [], [], 
[]
----- Call Stack Trace -----

kxfxGeter qks3tttdefReceive kxfxsui kxfxsp kxfxmai kxfprdp 

    SO: 0x67977018, type: 4, owner: 0x6793f208, flag: INIT/-/-/0x00
    (session) sid: 29 trans: (nil), creator: 0x6793f208, flag: (c0000041) 
USR/- BSY/-/-/-/-/-
              DID: 0000-0012-0000FADB, short-term DID: 0000-0000-00000000
              txn branch: (nil)
              oct: 3, prv: 0, sql: (nil), psql: (nil), user: 0/SYS
    O/S info: user: oracle, term: , ospid: 4558, machine: 
    last wait for 'eq: Msg Fragment' blocking sess=0x(nil) seq=2 
wait_time=4441 seconds since wait started=3
                ct path write=1002ffff, ct path write temp=2, Network=0
    Dumping Session Wait History
     for 'eq: Msg Fragment' count=1 wait_time=4441
                ct path write=1002ffff, ct path write temp=2, Network=0
     for 'eq: Msg Fragment' count=1 wait_time=31
                ct path write=1002ffff, ct path write temp=1, Network=0
    temporary object counter: 0

最后这个case通过设置10235和10501事件后错误不再产生了:

event = "10235 trace name context forever, level 2"  

10235, 00000, "check memory manager internal structures" 

event = "10501 trace name context forever, level 1"
  
10501, 00000, "periodically check selected heap"
// *Cause:
// *Action:
//    Level:  0x01 PGA
//            0x02 SGA
//            0x04 UGA
//            0x08 current call
//            0x10 user call
//            0x20 large allocation pool

Oracle内部错误:ORA-00600:[4097]一例

一套Linux上的10.2.0.4系统在异常恢复后(使用_allow_resetlogs_corruption隐藏参数打开后遭遇ORA-00600:[40xx]相关的内部错误,创建并切换到了新的撤销表空间上)出现ORA-00600: internal error code, arguments: [4097], [], [], [], [], [], [], []内部错误,当该非内部错误(non-fatal)出现100次以上时会在告警日志alert.log中出现记录。
并有可能导致实例crash,具体日志如下:

 

如果自己搞不定可以找诗檀软件专业ORACLE数据库修复团队成员帮您恢复!

 

诗檀软件专业数据库修复团队

 

服务热线 : 400-690-3643   备用电话: 18501767907    邮箱:service@parnassusdata.com

 

 

Errors in file /s01/10gdb/admin/clinica/bdump/clinica_smon_21463.trc:
ORA-00600: internal error code, arguments: [4097], [], [], [], [], [], [], []
Tue Jan  4 23:13:19 2011
Non-fatal internal error happenned while SMON was doing logging scn->time mapping.
SMON encountered 1 out of maximum 100 non-fatal internal errors.

clinica_smon_21463.trc:
Dump of buffer cache at level 4 for tsn=1, rdba=8388633
BH (0x91fdf428) file#: 2 rdba: 0x00800019 (2/25) class: 19 ba: 0x91c62000
  set: 3 blksize: 8192 bsi: 0 set-flg: 0 pwbcnt: 0
  dbwrid: 0 obj: -1 objn: 0 tsn: 1 afn: 2
  hash: [fcf7dd68,fcf7dd68] lru: [91fdf5b8,91fdf398]
  ckptq: [NULL] fileq: [NULL] objq: [f5b53d60,f5b53d60]
  use: [fa694970,fa694970] wait: [NULL]
  st: XCURRENT md: SHR tch: 0
  flags: gotten_in_current_mode
  LRBA: [0x0.0.0] HSCN: [0xffff.ffffffff] HSUB: [65535]
  buffer tsn: 1 rdba: 0x00800019 (2/25)
  scn: 0x0000.0352d07c seq: 0x01 flg: 0x00 tail: 0xd07c2601
  frmt: 0x02 chkval: 0x0000 type: 0x26=KTU SMU HEADER BLOCK

/* 这里dump了一个tsn=1,file#=2的数据块,
    可以看到它的类型是KTU SMU HEADER BLOCK即某个回滚段头
*/

Hex dump of block: st=0, typ_found=1
........................
ORA-00600: internal error code, arguments: [4097], [], [], [], [], [], [], []
Current SQL statement for this session:
insert into smon_scn_time (thread, time_mp, time_dp, scn, scn_wrp, scn_bas,  num_mappings, tim_scn_map) 
values (0, :1, :2, :3, :4, :5, :6, :7)
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst()+31          call     ksedst1()            000000000 ? 000000001 ?
                                                   7FFFF53BC160 ? 7FFFF53BC1C0 ?
                                                   7FFFF53BC100 ? 000000000 ?
ksedmp()+610         call     ksedst()             000000000 ? 000000001 ?
                                                   7FFFF53BC160 ? 7FFFF53BC1C0 ?
                                                   7FFFF53BC100 ? 000000000 ?
ksfdmp()+21          call     ksedmp()             000000003 ? 000000001 ?
                                                   7FFFF53BC160 ? 7FFFF53BC1C0 ?
                                                   7FFFF53BC100 ? 000000000 ?
kgeriv()+176         call     ksfdmp()             000000003 ? 000000001 ?
                                                   7FFFF53BC160 ? 7FFFF53BC1C0 ?
                                                   7FFFF53BC100 ? 000000000 ?
kgesiv()+119         call     kgeriv()             0068C97C0 ? 2ABDF1D42BF0 ?
                                                   000000000 ? 0F4A33EA0 ?
                                                   7FFFF53BC100 ? 000000000 ?
ksesic0()+209        call     kgesiv()             0068C97C0 ? 2ABDF1D42BF0 ?
                                                   000001001 ? 000000000 ?
                                                   7FFFF53BCEE0 ? 000000000 ?
ktugti()+3200        call     ksesic0()            000001001 ? 0068C9940 ?
                                                   000000000 ? 00000009A ?
                                                   000000010 ? 101010101010101 ?
ktsftcmove()+4149    call     ktugti()             0B73F111C ? 7FFFF53BD278 ?
                                                   7FFFF53BD280 ? 000000000 ?
                                                   7FFFF53BD27C ? 7FFFF53BD270 ?
ktsf_gsp()+1937      call     ktsftcmove()         00000000A ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   7FFFF53BD27C ? 7FFFF53BD270 ?
kdtgsp()+512         call     ktsf_gsp()           000000000 ? 7FFFF53BF460 ?
                                                   000000024 ? 000000002 ?
                                                   7FFFF53BF460 ? 000000000 ?
kdccak()+111         call     kdtgsp()             2ABDF1D6A2D8 ? 7FFF00000000 ?
                                                   2ABDF1D68530 ? 000000002 ?
                                                   7FFFF53BF460 ? 000000000 ?
kdcgcs()+5419        call     kdccak()             2ABDF1D6A2D8 ? 000000001 ?
                                                   0F4A3BBA8 ? 000000000 ?
                                                   2ABDF1D6A370 ? 000000000 ?
kdcgsp()+1372        call     kdcgcs()             2ABDF1D6A2D8 ? 000000001 ?
                                                   0F4A3BBA8 ? 000000000 ?
                                                   2ABDF1D6A370 ? 000000000 ?
kdtInsRow()+1808     call     kdcgsp()             2ABDF1D6A2D8 ? 000000001 ?
                                                   0F4A3BBA8 ? 000000000 ?
                                                   2ABDF1D6A370 ? 000000000 ?
insrow()+342         call     kdtInsRow()          2ABDF1D6A2D8 ? 000000001 ?
                                                   0F4A3BBA8 ? 000000000 ?
                                                   2ABDF1D6A370 ? 000000000 ?
insdrv()+594         call     insrow()             2ABDF1D6A2D8 ? 7FFFF53BFCC8 ?
                                                   000000000 ? 0F4A33DE0 ?
                                                   2ABDF1D6A370 ? 000000000 ?
inscovexe()+404      call     insdrv()             2ABDF1D6A2D8 ? 7FFFF53BFCC8 ?
                                                   000000000 ? 2ABDF1D6D908 ?
                                                   2ABDF1D6A370 ? 000000000 ?
insExecStmtExecIniE  call     inscovexe()          0F4A33DE0 ? 0F4A3C230 ?
ngine()+85                                         7FFFF53C0EF0 ? 2ABDF1D69F20 ?
                                                   2ABDF1D6A370 ? 000000000 ?
insexe()+386         call     insExecStmtExecIniE  0F4A33DE0 ? 0F4A3C230 ?
                              ngine()              2ABDF1D69F20 ? 2ABDF1D69F20 ?
                                                   2ABDF1D6A370 ? 000000000 ?
opiexe()+9182        call     insexe()             0F4A333A8 ? 7FFFF53C0EF0 ?
                                                   0F4A33DE0 ? 2ABDF1D69F20 ?
                                                   2ABDF1D6A370 ? 2ABDF1D69F20 ?
opiall0()+1842       call     opiexe()             000000049 ? 000000003 ?
                                                   7FFFF53C12F8 ? 000000001 ?
..............

针对该ORA-00600:[4097]内部错误,metalink上Note [ID 1030620.6]介绍了一种workaround的方法:

An ORA-600 [4097] can be encountered through various activities that use 
rollback segments.

Solution Description: 
===================== 

The most likely cause of this is BUG 427389.  This BUG is fixed in
version 7.3.3.3.  The BUG is caused when Rollback Segments are dropped and 
recreated after a shutdown abort.  It is encountered through a very specific 
set of circumstances: 

When an instance has a rollback segment offline and the instance crashes, or 
the user does a shutdown abort, the rollback segment wrap number does not get 
updated.  If that segment is then dropped and recreated immediately after the 
instance is restarted, the wrap number could be lower than existing wrap 
numbers.  This will cause the ORA-600[4097] to occur in subsequent 
transactions using Rollback. 

To avoid encountering this bug, rollback segments should only be dropped and 
recreated after the instance has been shutdown normal and restarted.  If you 
have already encountered the bug, use the following workaround:  

   Select segment_name, segment_id from dba_rollback_segs; 

   Drop all Rollback Segments except for SYSTEM.  

   Recreate dummy (small) rollback segments with the same names in their place. 

   Then, recreate additional rollback segments you want to keep with their 
   permanent storage parameters.   

   Now drop the dummy ones. This should ensure that the segment_ids are not 
   reused. 

If you ever want to add a rollback segment you have to use the workaround steps
again.  If you do not fill the dummy slots you may see the problem re-appear.

我们可以尝试drop异常恢复前已有的可能存在问题的rollback segment来规避这个问题,虽然在10g下使用AMU(automatic managed undo)但仍可以做到这一点:

SQL> alter system set "_smu_debug_mode"=4;
System altered.

/* 设置SMU debug模式为4以便能够手动管理回滚段 */

SQL> set heading off 

SQL> select 'drop rollback segment "'||segment_name||'";' from dba_rollback_segs where segment_name!='SYSTEM';

drop rollback segment "_SYSSMU1$";
drop rollback segment "_SYSSMU2$";
drop rollback segment "_SYSSMU3$";
drop rollback segment "_SYSSMU4$";
drop rollback segment "_SYSSMU5$";
drop rollback segment "_SYSSMU6$";
drop rollback segment "_SYSSMU7$";
drop rollback segment "_SYSSMU8$";
drop rollback segment "_SYSSMU9$";
drop rollback segment "_SYSSMU10$";
drop rollback segment "_SYSSMU11$";
drop rollback segment "_SYSSMU12$";
drop rollback segment "_SYSSMU13$";
drop rollback segment "_SYSSMU14$";
drop rollback segment "_SYSSMU15$";
drop rollback segment "_SYSSMU16$";
drop rollback segment "_SYSSMU17$";
drop rollback segment "_SYSSMU18$";
drop rollback segment "_SYSSMU19$";
drop rollback segment "_SYSSMU20$";
drop rollback segment "_SYSSMU21$";
drop rollback segment "_SYSSMU22$";
drop rollback segment "_SYSSMU23$";
drop rollback segment "_SYSSMU24$";
drop rollback segment "_SYSSMU25$";
drop rollback segment "_SYSSMU26$";
drop rollback segment "_SYSSMU27$";
drop rollback segment "_SYSSMU28$";
drop rollback segment "_SYSSMU29$";
drop rollback segment "_SYSSMU30$";

30 rows selected.

/* 依次执行以上的drop rollback segment回滚段的命令
    注意当前撤销表空间上的回滚段仅能offline而无法drop掉,
    实际上我们需要做的也仅仅是把之前undo表空间上有问题的回滚段drop掉
*/

SQL> alter rollback segment "_SYSSMU30$" offline;
Rollback segment altered.

SQL> drop rollback segment "_SYSSMU30$";
drop rollback segment "_SYSSMU30$"
*
ERROR at line 1:
ORA-30025: DROP segment '_SYSSMU30$' (in undo tablespace) not allowed

SQL> alter rollback segment "_SYSSMU30$" online;
Rollback segment altered.

经过以上drop问题回滚段rollback segment后,系统不再出现ORA-00600:[4097]内部错误,实例恢复正常。在系统正常后,我们有必要重置之前所设的”_smu_debug_mode”UNDO管理debug模式的隐藏参数。

Oracle内部错误:ORA-00600:[6033]一例

一套HP-UX上的9.2.0.8系统,某条查询语句执行时出现ORA-00600: internal error code, arguments: [6033], [], [], [], [], [], [], []内部错误,错误trace信息如下:

*** SESSION ID:(583.18281) 2010-12-20 22:49:01.364
*** 2010-12-20 22:49:01.364
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [6033], [], [], [], [], [], [], []
Current SQL statement for this session:
SELECT INTERFACE_HEADER_ID, DOCUMENT_SUBTYPE, AGENT_ID, VENDOR_SITE_ID FROM PO_HEADERS_INTERFACE WHE
RE WF_GROUP_ID = :B1 ORDER BY INTERFACE_HEADER_ID
----- PL/SQL Call Stack -----
object line object
handle number name
c0000001067e3328 4332 package body APPS.PO_AUTOCREATE_DOC
c0000000fd267060 1 anonymous block
c000000108fe4d60 1979 package body APPS.WF_ENGINE_UTIL
c000000108fe4d60 1745 package body APPS.WF_ENGINE_UTIL
c000000108fe4d60 1099 package body APPS.WF_ENGINE_UTIL
c000000108fe4d60 560 package body APPS.WF_ENGINE_UTIL
c000000108fe4d60 1863 package body APPS.WF_ENGINE_UTIL
c000000108fe4d60 1099 package body APPS.WF_ENGINE_UTIL
c000000108fe4d60 560 package body APPS.WF_ENGINE_UTIL
c000000108fe4d60 1863 package body APPS.WF_ENGINE_UTIL
c000000108fe4d60 1099 package body APPS.WF_ENGINE_UTIL
c000000108fe4d60 560 package body APPS.WF_ENGINE_UTIL
c000000108fe4d60 1863 package body
PL/SQL call stack truncated after 1024 bytes.
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedmp()+184 ? ksedst() C0000000CEB36420 ?
400000000147994B ?

已知的ORA-00600:[6033]错误一般和索引逻辑讹误相关,metalink上有相关的Note建议在出现该错误后运行analyze table validate structure cascade语句以验证表与索引间的数据正确性。

ORA-600 [6033] "null value retrieved from index leaf lookup" [ID 45795.1]
Modified 03-JUN-2010 Type REFERENCE Status PUBLISHED
Note: For additional ORA-600 related information please read Note:146580.1

PURPOSE:
This article represents a partially published OERI note.
It has been published because the ORA-600 error has been
reported in at least one confirmed bug.
Therefore, the SUGGESTIONS section of this article may help
in terms of identifying the cause of the error.
This specific ORA-600 error may be considered for full publication
at a later date. If/when fully published, additional information
will be available here on the nature of this error.
SUGGESTIONS:
Run the ANALYZE command on any tables and indexes in the
trace file:
Example: ANALYZE TABLE
 VALIDATE STRUCTURE CASCADE;
Rebuild any corrupted indexes.
Index corruption.
Known Bugs

NB Bug Fixed Description
6401576 9.2.0.8.P22 OERI[ktbair1] / ORA-600 [6101] index corruption possible
5845232 9.2.0.8.P06 Block corruption / errors from concurrent dequeue operations
2718937 9.2.0.4, 10.1.0.2 OERI:6033 from SELECT on IOT with COMPRESSED PRIMARY KEY
1573283 8.1.7.2, 9.0.1.0 OERI:6033 from ALTER INDEX .. REBUILD ONLINE PARAMETERS ('OPTIMIZE FULL')
Certain index operations can lead to block corruption / memory corruption with varying symptoms such as ORA-600 [6033], ORA-600 [6101] , ORA-600 [ktbair1] , ORA-600 [kcbzpb_1], ORA-600 [4519] and ORA-600 [kcoapl_blkchk] if DB_BLOCK_CHECKING is enabled. Concurrent dequeue operations can lead to block corruption / memory corruption with varying symptoms such as ORA-600 [6033], ORA-600 [6101] and ORA-600 [kcoapl_blkchk] if DB_BLOCK_CHECKING is enabled. Note: This issue was previously fixed under bug 5559640 but that fix had a serious problem which could lead to SGA memory corruption. This fix supercedes the fix for bug 5559640. The problem with patch 5559640 is alerted in Note:414109.1 This fix is superceeded by the fix for bug 6401576.

通过analyze table validate structure cascade命令验证索引后若存在问题则会进一步产生相关的trace文件,一般这类索引逻辑讹误的问题可以通过drop-recreate索引来解决。

Oracle内部错误:ORA-00600[17175]一例

一套HP-UX上的10.2.0.4系统出现ORA-00600[17175] Oracle600内部错误,相关的日志信息如下:

Wed Dec 1 01:57:55 2010
Errors in file /u01/app/oracle/admin/xgp2/bdump/xgp21_pmon_3250.trc:
ORA-00600: internal error code, arguments: [17175], [255], [], [], [], [], [], []
ORA-00601: cleanup lock conflict
Wed Dec 1 01:57:57 2010
Trace dumping is performing id=[cdmp_20101201015757]
Wed Dec 1 01:58:05 2010
LGWR: terminating instance due to error 472
Wed Dec 1 01:58:05 2010
Errors in file /u01/app/oracle/admin/xgp2/bdump/xgp21_lms1_3291.trc:
ORA-00472: PMON process terminated with error
Wed Dec 1 01:58:05 2010
Errors in file /u01/app/oracle/admin/xgp2/bdump/xgp21_lms2_3293.trc:
ORA-00472: PMON process terminated with error
Wed Dec 1 01:58:05 2010
Errors in file /u01/app/oracle/admin/xgp2/bdump/xgp21_lms3_3295.trc:
ORA-00472: PMON process terminated with error
Wed Dec 1 01:58:05 2010
Errors in file /u01/app/oracle/admin/xgp2/bdump/xgp21_lms0_3289.trc:
ORA-00472: PMON process terminated with error
Wed Dec 1 01:58:05 2010
Errors in file /u01/app/oracle/admin/xgp2/bdump/xgp21_lmon_3283.trc:
ORA-00472: PMON process terminated with error
Wed Dec 1 01:58:05 2010
Errors in file /u01/app/oracle/admin/xgp2/bdump/xgp21_lmd0_3287.trc:
ORA-00472: PMON process terminated with error
Wed Dec 1 01:58:05 2010
Shutting down instance (abort)
License high water mark = 421

/u01/app/oracle/admin/xgp2/bdump/xgp21_pmon_3250.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/10.2.0/db_1
System name: HP-UX
Node name: XGP2_db1
Release: B.11.31
Version: U
Machine: ia64
Instance name: xgp21
Redo thread mounted by this instance: 1
Oracle process number: 2
Unix process pid: 3250, image: oracle@XGP2_db1 (PMON)

*** SERVICE NAME:(SYS$BACKGROUND) 2010-12-01 01:57:55.933
*** SESSION ID:(333.1) 2010-12-01 01:57:55.933
*** 2010-12-01 01:57:55.933
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [17175], [255], [], [], [], [], [], []
ORA-00601: cleanup lock conflict


ksedst <- ksedmp <- ksfdmp <- kgeriv <- kgesiv
<- kgesic1 <- kghcln <- kslilcr <- $cold_ksl_cleanup <- ksepop
<- kgepop <- kgesev <- ksesec0 <- $cold_kslges <- ksl_get_child_latch
<- kslgpl <- es <- ksfglt <- kghext_numa <- ksmasgn
<- kghnospc <- $cold_kghalo <- ksmdacnk <- ksmdget <- ksosp_alloc
<- ksoreq_submit <- ksbsrv <- kmmssv <- kmmlsa <- kmmlod
<- ksucln <- ksbrdp <- opirip <- $cold_opidrv <- sou2o
<- $cold_opimai_real <- main <- main_opd_entry

PROCESS STATE
-------------
Process global information:
process: c00000018d000078, call: c00000018d252238, xact: 0000000000000000, curses: c00000018d2508a8, usrses: c00000018d2508a8
----------------------------------------
SO: c00000018d000078, type: 2, owner: 0000000000000000, flag: INIT/-/-/0x00
(process) Oracle pid=2, calls cur/top: c00000018d252238/c00000018d252238, flag: (e) SYSTEM
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 48
last post received-location: ksoreq_reply
last process to post me: c00000018d037978 1 64
last post sent: 0 0 24
last post sent-location: ksasnd
last process posted by me: c00000018d001058 1 6
(latch info) wait_event=0 bits=90
holding (efd=5) c00000020001d500 Parent+children shared pool level=7
Location from where latch is held: kghfrunp: alloc: clatch nowait:
Context saved from call: 0
state=busy, wlstate=free
holding (efd=5) c00000020000b5f8 OS process allocation level=4
Location from where latch is held: ksoreq_submit:
Context saved from call: 13835058076152957304
state=busy, wlstate=free
Process Group: DEFAULT, pseudo proc: c0000004dd263230
O/S info: user: oracle, term: UNKNOWN, ospid: 3250
OSD pid info: Unix process pid: 3250, image: oracle@XGP2_db1 (PMON)


SO: c0000004df4d5f28, type: 19, owner: c00000018d000078, flag: INIT/-/-/0x00
GES MSG BUFFERS: st=emp chunk=0x0000000000000000 hdr=0x0000000000000000 lnk=0x0000000000000000 flags=0x0 inc=4
outq=0 sndq=0 opid=2 prmb=0x0
mbg[i]=(2 19) mbg[b]=(0 0) mbg[r]=(0 0)
fmq[i]=(4 1) fmq[b]=(0 0) fmq[r]=(0 0)
mop[s]=20 mop[q]=1 pendq=0 zmbq=0
nonksxp_recvs=0
------------process 0xc0000004df4d5f28--------------------
proc version : 0
Local node : 0
pid : 3250
lkp_node : 0
svr_mode : 0
proc state : KJP_NORMAL
Last drm hb acked : 0
Total accesses : 181
Imm. accesses : 180
Locks on ASTQ : 0
Locks Pending AST : 0
Granted locks : 0
AST_Q:
PENDING_Q:
GRANTED_Q:
----------------------------------------
SO: c00000018d2f3610, type: 11, owner: c00000018d000078, flag: INIT/-/-/0x00
(broadcast handle) flag: (2) ACTIVE SUBSCRIBER, owner: c00000018d000078,
event: 1, last message event: 1,
last message waited event: 1, messages read: 0
channel: (c0000004dd29fdb0) scumnt mount lock
scope: 1, event: 19, last mesage event: 0,
publishers/subscribers: 0/19,
messages published: 0
SO: c00000018d2508a8, type: 4, owner: c00000018d000078, flag: INIT/-/-/0x00
(session) sid: 333 trans: 0000000000000000, creator: c00000018d000078, flag: (51) USR/- BSY/-/-/-/-/-
DID: 0001-0002-00000003, short-term DID: 0000-0000-00000000
txn branch: 0000000000000000
oct: 0, prv: 0, sql: 0000000000000000, psql: 0000000000000000, user: 0/SYS
service name: SYS$BACKGROUND
last wait for 'latch: shared pool' blocking sess=0x0000000000000000 seq=342 wait_time=175677 seconds since wait started=0
address=c0000002000fff60, number=d6, tries=7
Dumping Session Wait History
for 'latch: shared pool' count=1 wait_time=175677
address=c0000002000fff60, number=d6, tries=7
for 'latch: shared pool' count=1 wait_time=97554
address=c0000002000fff60, number=d6, tries=6
for 'latch: shared pool' count=1 wait_time=78023
address=c0000002000fff60, number=d6, tries=5
for 'latch: shared pool' count=1 wait_time=38978
address=c0000002000fff60, number=d6, tries=4
for 'latch: shared pool' count=1 wait_time=38942
address=c0000002000fff60, number=d6, tries=3
for 'latch: shared pool' count=1 wait_time=19435
address=c0000002000fff60, number=d6, tries=2
for 'latch: shared pool' count=1 wait_time=12655
address=c0000002000fff60, number=d6, tries=1
for 'latch: shared pool' count=1 wait_time=8
address=c0000002000fff60, number=d6, tries=0
for 'os thread startup' count=1 wait_time=144253
=0, =0, =0
for 'os thread startup' count=1 wait_time=141360
=0, =0, =0


SO: c00000018d2f3500, type: 11, owner: c00000018d000078, flag: INIT/-/-/0x00
(broadcast handle) flag: (2) ACTIVE SUBSCRIBER, owner: c00000018d000078,
event: 2, last message event: 40,
last message waited event: 40, messages read: 1
channel: (c0000004dd29bbd8) system events broadcast channel
scope: 2, event: 224634, last mesage event: 40,
publishers/subscribers: 0/161,
messages published: 1


SO: c00000018d252238, type: 3, owner: c00000018d000078, flag: INIT/-/-/0x00
(call) sess: cur c00000018d2508a8, rec 0, usr c00000018d2508a8; depth: 0
----------------------------------------
SO: c00000018d2594b0, type: 5, owner: c00000018d252238, flag: INIT/-/-/0x00
(enqueue) PR-00000000-00000000 DID: 0001-0002-00000003
lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 res_flag: 0x2
res: 0xc0000004df401718, mode: X, lock_flag: 0x0
own: 0xc00000018d2508a8, sess: 0xc00000018d2508a8, proc: 0xc00000018d000078, prv: 0xc0000004df401728
----------------------------------------
SO: c00000018d30b710, type: 16, owner: c00000018d000078, flag: INIT/-/-/0x00
(osp req holder)
CHILD REQUESTS:
(osp req) type=2(BACKGROUND) flags=0x20001(STATIC/-) state=1(INITED) err=0
pg=0 arg1=0 arg2=(null) reply=(null) pname=S018
pid=0 parent=c00000018d30b710 fulfill=0000000000000000
----------------------------------------
SO: c0000004dbff09c0, type: 192, owner: c0000004dbff09c0, flag: -/-/-/0x00

在metalink上搜索600[17175]内部错误相关的文档,可以找到该错误的大量信息:

Keywords: ora-00600 [17175]

1. Bug 6250251: ORA-00600 17175 DURING KGI CLEANUP - DUMP - ORADEBUG
--ora-600 followed by ora-601 and instance crash with ORA-17175.
--Also, setting of heap check event triggers this problem. In this case
--it is event="10235 trace name context forever, level 27"

2. Bug 4216668 - Dump from INSERT / MERGE on internal columns (Doc ID 4216668.8)
--INSERT or MERGE commands might core dump if operating on object types and internal columns are involved.

3. Bug 7590297: ORA-600 [17175] [255] ORA-601: CLEANUP LOCK CONFLICT CRASHED THE DATABASE

4. SR 3-2296150050
--The error has occurred when Oracle was cleaning shared pool latch/heap information about the process 
which died in middle.
--There is no data corruption associated with this error.
--This is evident from the function kghcln in the trace stack at which it failed.
--This problem is usually the symptom of some earlier problem with the latch.
--Either after a process has died, or a process has signaled an error while holding a shared pool latch, 
and the index to the shared pool latch is invalid.

--There was a Bug 7590297 raised for this issue which could not be progressed due to unavailability of information.
--From few earlier known issues - This can be due to PMON may sometimes signal ORA-601 
while trying to start up additional shared servers or dispatchers.
--There the workaround suggested was to Start the instance with max # of shared servers.

--Can you reproduce the problem?If the instance has been restated the issue may not persist as it is related to memory.
--If the issue persists then we have to perform the following to monitoring the instance to investigate further:

--1. Set the following event in parameter file:
--event="10257 trace name context forever, level 10"
--event="601 trace name SYSTEMSTATE level 10"

--The first event will cause PMON to dump info about shared server startup.
--The second event will cause PMON to do a system state dump when the 601 occurs.

--2. You should also have the track of this in intervals and save the historical results from:

--SQL> select e.total_waits, e.total_timeouts, e.time_waited from v$session_event e, v$session s
, v$bgprocess b where b.name='PMON' and s.paddr=b.paddr and e.sid=s.sid and e.event='process startup';


5. SR 3-2123025401
--=== ODM Solution / Action Plan ===
--Disabled NUMA for resolution

6. SR 7314313.994

Analysis:

Bug 6250251 and bug 4216668 are not applicable to this case.
Bug 7590297 is applicable to this case, as the call stack, error message are the same with this case. 
But this patch is suspended as requested info is not available.

SR 3-2296150050: same error message, same DB version, similar call stack; closed without solution.
SR 3-2123025401: same error message, same DB version, similar call stack. 
The issue happened twice in that SR and solved by disabling NUMA
SR 7314313.994: same error message, same DB version, similar call stack; closed without solution.

ERROR:
ORA-600 [17175] [a]


VERSIONS:
versions 9.2 to 10.1


DESCRIPTION:


This error occurs when we are cleaning up a shared pool latch (either after a process has died, 
or a process has signaled an error while holding a shared pool latch), 
and the index to the shared pool latch is invalid.


ARGUMENTS:
Arg [a] index of the latch recovery structure - usually 255

FUNCTIONALITY:
Generic Heap Manager


IMPACT:
INSTANCE HANG
PROCESS FAILURE
INSTANCE FAILURE

以下为Oracle GCS给出的行动计划,GCS认为绝大多数ORA-00600 [17xxx]是由memory相关的问题引起的,这些问题往往在重启实例后就可以得到解决。并建议可以设置shared_servers=max_shared_servers后进一步观察:

From the uploaded files it looks like you were reported with ORA-00600 [17175] errors 
and crashed the instance.What is the current status after the restart of the database. 
Are you still reported with the same errors and crashing the instance ?
Mostly the ORA-00600 [17xxx] errors are memory releated and might have got resolved after the database restart.


Further looking at the uploaded trace file the failing functions and the error closely 
matches Bug 6958493and is closed as duplicate of BaseBug 6962340which is closed 
as could not able to reproduce the error.


Also a smillar issue is reported inBug 3104250which is fixed in 10g, but that doesn't mean 
you cannot get this error for a new reason and that the same workaround would fix it.
We need to implement the workaround and set: shared_servers=max_shared_servers 
if the error reproduces again. If this is still repeated issue then we can file a new bug with development for the same.


ACTION PLAN
===========
1. Monitor the alertlog for the ORA-00600 [17175] errors for the next few days and if the database still crashes then please
set shared_servers=max_shared_servers and see if the problem resolves or not.

Oracle内部错误:ORA-00600[kccchb_3]一例

一套Linux x86-64上的11.1.0.7 RAC系统,该RAC使用Netapps NFS作为共享存储。其中一个节点出现ORA-00600: internal error code, arguments: [kccchb_3]内部错误并导致实例意外终止,详细日志如下:

Mon Dec 27 00:03:13 2010
Error: Controlfile sequence number in file header is different from the one in memory
Please check that the correct mount optionsare used if controlfile is located on NFS
Errors in file /d02/oratst/TSMOTEBS/db/11.1.0/admin/TSMOTEBS1_l118464dcss3001/diag/rdbms/tsmotebs/TSMOTEBS1/trace/TSMOTEBS1_ckpt_15907.trc (incident=11353):
ORA-00600: internal error code, arguments: [kccchb_3], [555541], [555540], [555540], [], [], [], [], [], [], [], []
Incident details in: /d02/oratst/TSMOTEBS/db/11.1.0/admin/TSMOTEBS1_l118464dcss3001/diag/rdbms/tsmotebs/TSMOTEBS1/incident/incdir_11353/TSMOTEBS1_ckpt_15907_i11353.trc
Errors in file /d02/oratst/TSMOTEBS/db/11.1.0/admin/TSMOTEBS1_l118464dcss3001/diag/rdbms/tsmotebs/TSMOTEBS1/trace/TSMOTEBS1_ckpt_15907.trc:
ORA-00600: internal error code, arguments: [kccchb_3], [555541], [555540], [555540], [], [], [], [], [], [], [], []
CKPT (ospid: 15907): terminating the instance due to error 469

根据Metalink文档Note 303986.1 RAC instance using NFS via Netapps report Ora-600 [kccchb_3]:

Symptoms
RAC instance using NFS via Netapps occasionally reports:
ORA-00600 [kccchb_3], [689], [0], [],[],[],[]
ORA-00600 [kclchkblk_3], [7], [738515663], [14], [], [], [], []

Cause
This is most probably a Netapp issue wherein 'noac' doesn't work as expected.
NetApp recommended mount options for Oracle9i RAC on Solaris:
rw,bg,vers=3,proto=tcp,hard,intr,rsize=32768,wsize=32768,forcedirectio,noac

NetApp recommended mount options for Linux SLES9, RHEL4, RHEL3 QU3 and later:
rw,bg,hard,nointr,tcp,vers=3,timeo=600,rsize=ORACLE_BLOCK_SIZE,
wsize=ORACLE_BLOCK_SIZE,actimeo=0

NetApp recommended mount options for Linux RHAS2.1, RHEL3 pre QU3:
rw,bg,hard,nointr,tcp,vers=3,timeo=600,rsize=ORACLE_BLOCK_SIZE,
wsize=ORACLE_BLOCK_SIZE,noac

And for AS 2.1 in /etc/modules.conf add:
options nfs nfs_uncached_io=1
Solution

Workaround
set filesystemio_options=DIRECTIO for all instances (to bypass any buffer cache)

以上介绍了该ORA-00600 [kccchb_3]内部错误可以通过设置初始化参数filesystemio_options为DIRECTIO来workaround;实际上更好的选择可以是设置filesystemio_options为SETALL,即同时使用DIRECTIO和ASYNC异步IO。

顺便提一下在Linux上启用异步ASYNC IO不仅需要设置该filesystemio_options参数,我们还需要同时保证以下几点:
1.filesystemio_options参数设置为ASYNC或者SETALL

2.合理设置Linux Kernel内核参数fs.aio-max-nr,例如设置为3145728;以及参数fs.file-max =6815744

3.disk_asynch_io初始化参数设置为TRUE

4.在10gR2下oracle binary可能没有正确以async_on选项编译,我们需要手动重新make(The reason behind this behavior is that the LIBAIO_0.1 Linux OS library is not attached for io_getevents.)

SQL>shutdown immediate
[maclean@rh8 ~]$ cd $ORACLE_HOME/rdbms/lib
[maclean@rh8 lib]$ ln -s /usr/lib/libaio.so skgaio.o
[maclean@rh8 lib]$ make PL_ORALIBS=-laio -f ins_rdbms.mk async_on
rm -f /u01/oracle/product/10.2.0/db_1/rdbms/lib/skgaioi.o
cp /u01/oracle/product/10.2.0/db_1/rdbms/lib/skgaio.o /u01/oracle/product/10.2.0/db_1/rdbms/lib/skgaioi.o
chmod 755 /u01/oracle/product/10.2.0/db_1/bin

 - Linking Oracle 
rm -f /u01/oracle/product/10.2.0/db_1/rdbms/lib/oracle
gcc  -o /u01/oracle/product/10.2.0/db_1/rdbms/lib/oracle -L/u01/oracle/product/10.2.0/db_1/rdbms/lib/ -L/u01/oracle/product/10.2.0/db_1/lib/ -L/u01/oracle/product/10.2.0/db_1/lib/stubs/ -L/usr/lib -lirc   -Wl,-E `test -f /u01/oracle/product/10.2.0/db_1/rdbms/lib/sskgpsmti.o && echo /u01/oracle/product/10.2.0/db_1/rdbms/lib/sskgpsmti.o` /u01/oracle/product/10.2.0/db_1/rdbms/lib/opimai.o /u01/oracle/product/10.2.0/db_1/rdbms/lib/ssoraed.o /u01/oracle/product/10.2.0/db_1/rdbms/lib/ttcsoi.o /u01/oracle/product/10.2.0/db_1/rdbms/lib/defopt.o  -Wl,--whole-archive -lperfsrv10 -Wl,--no-whole-archive /u01/oracle/product/10.2.0/db_1/lib/nautab.o /u01/oracle/product/10.2.0/db_1/lib/naeet.o /u01/oracle/product/10.2.0/db_1/lib/naect.o /u01/oracle/product/10.2.0/db_1/lib/naedhs.o /u01/oracle/product/10.2.0/db_1/rdbms/lib/config.o  -lserver10 -lodm10 -lnnet10 -lskgxp10 -lsnls10 -lnls10  -lcore10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10 -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 -lhasgen10 -lcore10 -lskgxn2 -locr10 -locrb10 -locrutl10 -lhasgen10 -lcore10 -lskgxn2   -lclient10  -lvsn10 -lcommon10 -lgeneric10 -lknlopt `if /usr/bin/ar tv /u01/oracle/product/10.2.0/db_1/rdbms/lib/libknlopt.a | grep xsyeolap.o > /dev/null 2>&1 ; then echo "-loraolap10" ; fi` -lslax10 -lpls10  -lplp10 -lserver10 -lclient10  -lvsn10 -lcommon10 -lgeneric10 -lknlopt -lslax10 -lpls10  -lplp10 -ljox10 -lserver10 -lclsra10 -ldbcfg10 -locijdbcst10 -lwwg  `cat /u01/oracle/product/10.2.0/db_1/lib/ldflags`    -lnsslb10 -lncrypt10 -lnsgr10 -lnzjs10 -ln10 -lnnz10 -lnl10 -lnro10 `cat /u01/oracle/product/10.2.0/db_1/lib/ldflags`    -lnsslb10 -lncrypt10 -lnsgr10 -lnzjs10 -ln10 -lnnz10 -lnl10 -lmm -lsnls10 -lnls10  -lcore10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10 -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 `cat /u01/oracle/product/10.2.0/db_1/lib/ldflags`    -lnsslb10 -lncrypt10 -lnsgr10 -lnzjs10 -ln10 -lnnz10 -lnl10 -lnro10 `cat /u01/oracle/product/10.2.0/db_1/lib/ldflags`    -lnsslb10 -lncrypt10 -lnsgr10 -lnzjs10 -ln10 -lnnz10 -lnl10   -lsnls10 -lnls10  -lcore10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10 -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 `if /usr/bin/ar tv /u01/oracle/product/10.2.0/db_1/rdbms/lib/libknlopt.a | grep "kxmnsd.o" > /dev/null 2>&1 ; then echo " " ; else echo "-lordsdo10"; fi` -lctxc10 -lctx10 -lzx10 -lgx10 -lctx10 -lzx10 -lgx10 -lordimt10 -lsnls10 -lnls10  -lcore10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10 -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 -lsnls10 -lunls10  -lsnls10 -lnls10  -lcore10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10 -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 -laio    `cat /u01/oracle/product/10.2.0/db_1/lib/sysliblist` -Wl,-rpath,/u01/oracle/product/10.2.0/db_1/lib -lm    `cat /u01/oracle/product/10.2.0/db_1/lib/sysliblist` -ldl -lm   -L/u01/oracle/product/10.2.0/db_1/lib
/u01/oracle/product/10.2.0/db_1/lib//libcore10.a(lcd.o): In function `lcdprm':
lcd.c:(.text+0x8e8): warning: the `gets' function is dangerous and should not be used.
mv -f /u01/oracle/product/10.2.0/db_1/bin/oracle /u01/oracle/product/10.2.0/db_1/bin/oracleO
mv /u01/oracle/product/10.2.0/db_1/rdbms/lib/oracle /u01/oracle/product/10.2.0/db_1/bin/oracle
chmod 6751 /u01/oracle/product/10.2.0/db_1/bin/oracle

SQL> startup;

/* 通过以下方式可以验证ASYNC IO的工作情况 */

cat /proc/slabinfo | grep kio

[maclean@rh8 ~]$ cat /proc/slabinfo | grep kio
kioctx                56     72    320   12    1 : tunables   54   27    8 : slabdata      6      6      0
kiocb                 15     15    256   15    1 : tunables  120   60    8 : slabdata      1      1      0

/* 这里kioctx的值大于零,说明正在使用异步ASYNC IO */

如何是11g,那么无需那么麻烦要重新编译async_on,同时可以通过以下手段了解异步ASYNC IO的使用情况:

SQL> col name for a60 
SQL> SELECT name, asynch_io FROM v$datafile f,v$iostat_file i 
WHERE f.file# = i.file_no
AND filetype_name = 'Data File' ; 

NAME							     ASYNCH_IO
------------------------------------------------------------ ---------
/standby/oradata/PROD/datafile/o1_mf_system_6q9dwgwh_.dbf    ASYNC_ON
/standby/oradata/PROD/datafile/o1_mf_sysaux_6q9dwgyp_.dbf    ASYNC_ON
/standby/oradata/PROD/datafile/o1_mf_undotbs1_6q9dwh0r_.dbf  ASYNC_ON
/standby/oradata/PROD/datafile/o1_mf_users_6q9dwh2x_.dbf     ASYNC_ON
/standby/oradata/PROD/datafile/o1_mf_example_6q9dzhh1_.dbf   ASYNC_ON
/standby/oradata/PROD/datafile/o1_mf_enc_6q9fdzcx_.dbf	     ASYNC_ON

6 rows selected.

/* 可以看到ASYNCH_IO的状态为ON,说明正使用异步IO */

ORA-00600:[kclchkinteg_2]及[kjmsm_epc]内部错误一例

一套AIX上的9.2.0.6 RAC系统,alert日志中最初报LMS进程(Lock Manager Server process,即锁服务管理进程,仅出现在RAC系统中)遭遇ORA-00600[kclchkinteg_2],继而出现ORA-00600[[kjmsm_epc]内部错误导致实例crash(instance crashed)。相关日志如下:

Tue Dec 1 01:20:25 2009
Errors in file /oracle/admin/crmdb/bdump/crmdb1_lms7_2143190.trc:
ORA-00600: internal error code, arguments: [kclchkinteg_2], [], [], [], [], [], [], []
Tue Dec 1 01:20:25 2009
Errors in file /oracle/admin/crmdb/bdump/crmdb1_lms7_2143190.trc:
ORA-00600: internal error code, arguments: [kclchkinteg_2], [], [], [], [], [], [], []
Tue Dec 1 01:20:26 2009
Trace dumping is performing id=[cdmp_20091201012026]
Tue Dec 1 01:20:40 2009
Errors in file /oracle/admin/crmdb/bdump/crmdb1_lms7_2143190.trc:
ORA-00600: internal error code, arguments: [kjmsm_epc], [], [], [], [], [], [], []
Tue Dec 1 01:20:41 2009
Errors in file /oracle/admin/crmdb/bdump/crmdb1_lms7_2143190.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00600: internal error code, arguments: [kjmsm_epc], [], [], [], [], [], [], []
Tue Dec 1 01:20:43 2009

经过和metalink确认,认为此次实例意外终止时由于”Bug 3671867 – OERI[kclchkinteg_2] possible in RAC environment”:

Hdr: 3671867 10.1.0.2 RDBMS 10.1.0.2 RAC PRODID-5 PORTID-212 ORA-600
Abstract: INSTANCE TERMINATED WITH ORA-600 [KCLCHKINTEG_2]
PROBLEM:
--------
During testing, one instance was terminated with ORA-600 [KCLCHKINTEG_2]

DIAGNOSTIC ANALYSIS:
--------------------
Alert log contains:

Thu Jun  3 12:57:26 2004
Private_strands 0 at log switch
Thread 1 advanced to log sequence 153
  Current log# 1 seq# 153 mem# 0: +EMDBD1/dbotazc0/onlinelog/group_1.267.1
  Current log# 1 seq# 153 mem# 1: +EMDBD1/dbotazc0/onlinelog/group_1.268.1
Thu Jun  3 13:41:45 2004
Private_strands 0 at log switch
Thread 1 advanced to log sequence 154
  Current log# 2 seq# 154 mem# 0: +EMDBD1/dbotazc0/onlinelog/group_2.269.1
  Current log# 2 seq# 154 mem# 1: +EMDBD1/dbotazc0/onlinelog/group_2.270.1
Thu Jun  3 13:55:06 2004
Errors in file /DBA/nest/oracle/DBOTA1/logs/dbota1_dbw0_82244.trc:
ORA-600: internal error code, arguments: [kclchkinteg_2], [], [], [], [],
[], [], []
Thu Jun  3 13:55:08 2004
Errors in file /DBA/nest/oracle/DBOTA1/logs/dbota1_dbw0_82244.trc:
ORA-600: internal error code, arguments: [kclchkinteg_2], [], [], [], [],
[], [], []
Thu Jun  3 13:55:08 2004
DBW0: terminating instance due to error 471
Thu Jun  3 13:55:08 2004
Trace dumping is performing id=[cdmp_20040603135508]
Thu Jun  3 13:55:10 2004
Dump system state for local instance only
Thu Jun  3 13:55:10 2004
Trace dumping is performing id=[cdmp_20040603135510]
The only trace file we have been sent is the dbw trace.
WORKAROUND:
-----------
None
RELATED BUGS:
-------------
There are several which end in KCLCHKINTEG but not KCLCHKINTEG_2
REPRODUCIBILITY:
----------------
Appears only to have happened once so far.
TEST CASE:
----------
N/A
STACK TRACE:
------------
ksedmp ksfdmp kgerinv kgeasnmierr kclassertle kclchkinteg kclfwrite1 kcbbic1
kcbbiop kcbbdrv ksbabs ksbrdp opirip opidrv sou2o main start

SUPPORTING INFORMATION:
-----------------------
Bruce Carter has looked at this and suggested a bug be raised.
Affects:
    Product (Component)	Oracle Server (Rdbms)
    Range of versions believed to be affected	Versions < 10.2
    Versions confirmed as being affected
        * 10.1.0.3
    Platforms affected	Generic (all / most platforms affected)
Fixed:
    This issue is fixed in
        * 9.2.0.7 (Server Patch Set)
        * 10.1.0.4 (Server Patch Set)
        * 10.2.0.1 (Base Release)
Symptoms:
Related To:
    * Internal Error May Occur (ORA-600)
    * Instance May Crash
    * ORA-600 [kclchkinteg_2]
    * RAC (Real Application Clusters) / OPS
Description
    ORA-600 [kclchkinteg_2] possible in RAC environment

Oracle开发部分确认该3671867 bug已在9.2.0.8中得到修复,建议通过应用9.2.0.8补丁集或者打上bug 3671867的one-off patch来修复该问题;该Bug可能导致实例意外终止,因此其Severity也极高,值得手头仍有9i RAC系统需要管理的dba注意。