Oracle内部错误:ORA-00600[OSDEP_INTERNAL]一例

一套HP-UX上的9.2.0.5系统在shutdown abort时出现ORA-00600: internal error code, arguments: [OSDEP_INTERNAL], [], [], [], [], [], [], []内部错误,伴随有ORA-27302: failure occurred at: skgpwinit4,ORA-27303: additional information: attach to invalid skgp shared ctx,具体日志如下:

/opt/oracle/product/9.2.0.5/rdbms/log/ngende_ora_7669.trc
Oracle9i Enterprise Edition Release 9.2.0.5.0 - 64bit Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.5.0 - Production
ORACLE_HOME = /opt/oracle/product/9.2.0.5
System name: HP-UX
Node name: yictngd3
Release: B.11.23
Version: U
Machine: ia64
Instance name: nGende
Redo thread mounted by this instance: 0 
Oracle process number: 0
7669

*** 2010-09-08 00:10:02.985
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [OSDEP_INTERNAL], [], [], [], [], [], [], []
ORA-27302: failure occurred at: skgpwinit4
ORA-27303: additional information: attach to invalid skgp shared ctx
Current SQL information unavailable - no session.

Call stack
--------------
ksedmp <- ksfdmp <- kgerinv <- kgerin <- kgerecoserr <- ksucrp <- ksucresg <- kpolna 
<- kpogsk <- opiodr <- ttcpip <- opitsk <- Cannot <- Cannot <- Cannot <- Cannot <- opiino 
<- opiodr <- opidrv <- sou2o <- main <- main_opd_entry

经查该内部错误与操作系统共享内存有关,相关的Note有:

Ora-00600: Internal Error Code, Arguments: [Osdep_internal] [ID 304027.1]
Applies to:
Oracle Server - Enterprise Edition - Version: 9.2.0.2 to 10.2.0.3 - Release: 9.2 to 10.2
Information in this document applies to any platform.
***Checked for relevance on 03-NOV-2010***

Getting ORA-600 [OSDEP_INTERNAL] errors while starting up the database:

ORA-00600: internal error code, arguments: [OSDEP_INTERNAL],
[], [], [], [], [], [], []
ORA-27302: failure occurred at: skgpwreset1
ORA-27303: additional information: invalid shared ctx
ORA-27146: post/wait initialization failed
ORA-27300: OS system dependent operation:semget failed with status: 28
ORA-27301: OS failure message: No space left on device
ORA-27302: failure occurred at: sskgpsemsper
Symptoms
Getting ORA-600 [OSDEP_INTERNAL]
Accompanied by the following errors
ORA-27302:Failure occured at: skgpwreset1
ORA-27303:additional information: invalid shared ctx
ORA-27146: post/wait initialization failed
ORA-27300: OS system dependent operation: segment failed with error 28
ORA-27301: OS system Failure message: No space left on device
ORA-27302: failure occured at: sskgpsemsper

Cause
The functions in the trace file generated point to the semaphore settings .
Smmns is set too low.

Solution
set semmns 32767
Arrange to make the changes persistent as per the Operating system then restart the server and check if the changes are persistent.
eg: Linux /etc/sysctl.conf

sem = semmsl semmns semopm semmni
kernel.sem = 256 32768 100 228

Getting ORA-00600 [OSDEP_INTERNAL]: Internal Error While Trying To Connect / As Sysdba [ID 253885.1]

Applies to:
Oracle Server - Enterprise Edition - Version: 9.2.0.3 and later   [Release: 9.2 and later ]
HP-UX PA-RISC (64-bit)
Symptoms
Getting following error while trying to connect as sysdba using sqlplus:

SQL> conn / as sysdba
ERROR:
ORA-01041: internal error. hostdef extension doesn't exist

Alert.log shows:

ORA-00600: internal error code, arguments: [OSDEP_INTERNAL], [], [], [], [], [],[], []
ORA-27302: failure occurred at: skgpwinit4
ORA-27303: additional information: attach to invalid skgp shared ctx
Cause
- Database was shutdown using "shutdown abort" option.
- Shared memory segment was not removed even though the instance was down.
Solution
+ Check which shared memory segments are owned by the oracle owner

Use the ipcs -bm command:

% ipcs -bm

m 34034336 0xf8f18468 --rw-r----- ORACLE dba 16777216

+ Delete the 'orphan' shared memory segments:

% ipcrm -m 34034336

If there is more than one instance running on the server and you are not sure how to identify the shared
memory segments then please contact support.

不恰当的设置OS VM参数可能导致该问题,而在HP-UX PA-RISC平台上使用'shotdown abort'命令时可能因为共享内存未能正常移除而出现该内部错误;因为实例还是以'abort'方式关闭的,仅仅是共享内存未能释放,所以只需要以ipcs->ipcrm等os命令将相应的共享内存段释放就可以了,不会造成其他影响。

Oracle内部错误:ORA-00600[15801], [1]一例

一套Sparc Solaris上的11.1.0.7系统,在创建索引时频繁出现ORA-00600: internal error code, arguments: [15801], [1], [], [], [], [], [], [], [], [], [], []内部错误,日志信息如下:

Tue Aug 17 17:34:21 2010
WARNING: Oracle executable binary mismatch detected.
Binary of new process does not match binary which started instance
issue alter system set "_disable_image_check" = true to disable these messages
Tue Aug 17 17:34:21 2010
Errors in file /u04/app/oracle/diag/rdbms/orahcmu/ORAHCMU/trace/ORAHCMU_p023_22262.trc (incident=12505):
ORA-00600: internal error code, arguments: [15801], [1], [], [], [], [], [], [], [], [], [], []
Incident details in: /u04/app/oracle/diag/rdbms/orahcmu/ORAHCMU/incident/incdir_12505/ORAHCMU_p023_22262_i12505.trc
Tue Aug 17 17:34:21 2010
Errors in file /u04/app/oracle/diag/rdbms/orahcmu/ORAHCMU/trace/ORAHCMU_p021_22258.trc (incident=12489):
ORA-00600: internal error code, arguments: [15801], [1], [], [], [], [], [], [], [], [], [], []
Incident details in: /u04/app/oracle/diag/rdbms/orahcmu/ORAHCMU/incident/incdir_12489/ORAHCMU_p021_22258_i12489.trc

Errors in file /u04/app/oracle/diag/rdbms/orahcmu/ORAHCMU/trace/ORAHCMU_p015_9328.trc (incident=19909):
ORA-00600: internal error code, arguments: [15801], [1], [], [], [], [], [], [], [], [], [], []
Errors in file /u04/app/oracle/diag/rdbms/orahcmu/ORAHCMU/trace/ORAHCMU_p043_9388.trc (incident=20133):
ORA-00600: internal error code, arguments: [15801], [1], [], [], [], [], [], [], [], [], [], []
Mon Aug 23 14:43:42 2010
Errors in file /u04/app/oracle/diag/rdbms/orahcmu/ORAHCMU/trace/ORAHCMU_p087_9668.trc (incident=20485):
ORA-00600: internal error code, arguments: [15801], [1], [], [], [], [], [], [], [], [], [], []
Mon Aug 23 14:43:42 2010
Errors in file /u04/app/oracle/diag/rdbms/orahcmu/ORAHCMU/trace/ORAHCMU_p012_9322.trc (incident=19885):
ORA-00600: internal error code, arguments: [15801], [1], [], [], [], [], [], [], [], [], [], []
Incident details in: /u04/app/oracle/diag/rdbms/orahcmu/ORAHCMU/incident/incdir_19789/ORAHCMU_ora_8602_i19789.trc
Mon Aug 23 14:43:43 2010
WARNING: Oracle executable binary mismatch detected.
Binary of new process does not match binary which started instance
issue alter system set "_disable_image_check" = true to disable these messages

Dump continued from file: /u04/app/oracle/diag/rdbms/orahcmu/ORAHCMU/trace/ORAHCMU_ora_8602.trc
ORA-00600: internal error code, arguments: [15801], [1], [], [], [], [], [], [], [], [], [], []

*** 2010-08-23 14:43:42.974
----- Current SQL Statement for this session (sql_id=00abhfx460qm9) -----
CREATE UNIQUE iNDEX PS_HM_BEN_GP_STG ON PS_HM_BEN_GP_STG (CAL_ID, GP_PAYGROUP, 
EMPLID, EMPL_RCD, HM_INCURRED_BY, HM_SUM_ASSURED) TABLESPACE PSINDEX STORAGE 
(INITIAL 40000 NEXT 100000 MAXEXTENTS UNLIMITED PCTINCREASE 0) PCTFREE 10 PARALLEL NOLOGGING

----- Call Stack Trace -----
ksedst1 ksedst dbkedDefDump dbgexPhaseII dbgexProcessError dbgePostErrorKGE kgeade kgerem
kxfpProcessError kxfpqidqr kxfpqdqr kxfxgs kxfxcp qerpxSendParse kxfpValidateSlaveGroup kxfpgsg
kxfrAllocSlaves kxfrialo kxfralo qerpx_rowsrc_start qerpxStart kdicrws kdicdrv opiexe opiosq0
kpooprx kpoal8 opiodr ttcpip opitsk opiino opiodr opidrv sou2o main



SO: 0x3bf0bbf20, type: 4, owner: 0x3bf5452d0, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x3bf5452d0, name=session, file=ksu.h LINE:10719 ID:, pg=0
(session) sid: 217 ser: 767 trans: 0x3bc0660f8, creator: 0x3bf5452d0
flags: (0x8000041) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
flags2: (0x44008) DDLT1/-
DID: , short-term DID:
txn branch: 0x0
oct: 9, prv: 0, sql: 0x3b5d14510, psql: 0x3b6d59820, user: 31/SYSADM
ksuxds FALSE at location: 0
service name: ORAHCMU
client details:
O/S info: user: Administrator, term: UJWALTPVM, ospid: 304:2892
machine: WORKGROUP\UJWALTPVM program: pside.exe
client info: ujwal,Administrator,UJWALTPVM,,pside.exe,
application name: pside.exe, hash value=2824484291
Current Wait Stack:
Not in wait; last wait ended 2.475286 sec ago
Wait State:
auto_close=0 flags=0x21 boundary=0x0/-1
Session Wait History:
0: waited for 'lient'
=c8, =1, =0
wait_id=10483 seq_num=10484 snap_id=1
wait times: snap=0.168502 sec, exc=0.168502 sec, total=0.168502 sec
wait times: max=2.000000 sec
wait counts: calls=1 os=1
occurred after 0.000903 sec of elapsed time
1: waited for ' waiting for ruleset'
=10010063, =1, =0
wait_id=10482 seq_num=10483 snap_id=1
wait times: snap=0.008580 sec, exc=0.008580 sec, total=0.008580 sec
wait times: max=2.000000 sec
wait counts: calls=1 os=1
occurred after 0.000731 sec of elapsed time
2: waited for ' waiting for ruleset'
=1001004f, =4, =0
wait_id=10481 seq_num=10482 snap_id=1
wait times: snap=0.000132 sec, exc=0.000132 sec, total=0.000132 sec
wait times: max=2.000000 sec
wait counts: calls=1 os=1
occurred after 0.000074 sec of elapsed time
3: waited for ' waiting for ruleset'
=1001004f, =3, =0
wait_id=10480 seq_num=10481 snap_id=1
wait times: snap=0.000002 sec, exc=0.000002 sec, total=0.000002 sec
wait times: max=2.000000 sec
wait counts: calls=1 os=1
occurred after 0.000065 sec of elapsed time

----- Session Cursor Dump -----
Current cursor: 1, pgadep=0

Open cursors(pls, sys, hwm, max): 3(0, 2, 64, 300)
NULL=1 SYNTAX=0 PARSE=0 BOUND=1 FETCH=0 ROW=1
Cached frame pages(total, free):
4k(14, 14), 8k(1, 1), 16k(1, 1), 32k(0, 0)

----- Current Cursor -----


----- Plan Table -----

============
Plan Table
============
----------------------------------------------------+-----------------------------------+-------------------------+
| Id | Operation | Name | Rows | Bytes | Cost | Time | TQ |IN-OUT|PQ Distrib |
----------------------------------------------------+-----------------------------------+-------------------------+
| 0 | CREATE INDEX STATEMENT | | | | 2 | | | | |
| 1 | PX COORDINATOR | | | | | | | | |
| 2 | PX SEND QC (ORDER) | :TQ10001 | 82 | 4510 | | |:Q1001| P->S |QC (ORDER) |
| 3 | INDEX BUILD UNIQUE | PS_HM_BEN_GP_STG| | | | |:Q1001| PCWP | |
| 4 | SORT CREATE INDEX | | 82 | 4510 | | |:Q1001| PCWP | |
| 5 | PX RECEIVE | | 82 | 4510 | 2 | 00:00:01 |:Q1001| PCWP | |
| 6 | PX SEND RANGE | :TQ10000 | 82 | 4510 | 2 | 00:00:01 |:Q1000| P->P |RANGE |
| 7 | PX BLOCK ITERATOR | | 82 | 4510 | 2 | 00:00:01 |:Q1000| PCWC | |
| 8 | TABLE ACCESS FULL | PS_HM_BEN_GP_STG| 82 | 4510 | 2 | 00:00:01 |:Q1000| PCWP | |
----------------------------------------------------+-----------------------------------+-------------------------+


----------------------------------------
Cursor#1(0xffffffff7ce31928) state=BOUND curiob=0xffffffff7ce57d28
curflg=4c fl2=0 par=0x0 ses=0x3bf0bbf20
----- Dump Cursor sql_id=00abhfx460qm9 xsc=0xffffffff7ce57d28 cur=0xffffffff7ce31928 -----
Dump Parent Cursor sql_id=00abhfx460qm9 phd=0x3b5d14510 plk=0x3b0bb3318
sqltxt(0x3b5d14510)=CREATE UNIQUE iNDEX PS_HM_BEN_GP_STG ON PS_HM_BEN_GP_STG 
(CAL_ID, GP_PAYGROUP, EMPLID, EMPL_RCD, HM_INCURRED_BY, HM_SUM_ASSURED) 
TABLESPACE PSINDEX STORAGE (INITIAL 40000 NEXT 100000 MAXEXTENTS UNLIMITED PCTINCREASE 0) 
PCTFREE 10 PARALLEL NOLOGGING
hash=616eaa631fc21f4c0029707748605a69
parent=0x3ae539590 maxchild=01 plk=0x3b0bb3318 ppn=n
cursor instantiation=0xffffffff7ce57d28 used=1282545779 exec_id=16777216 exec=1
child#0(0x3b5d05e10) pcs=0x3b678c128
clk=0x3b7e200d0 ci=0x3b5b204c8 pn=0x39955d2b8 ctx=0x3b86ee988
kgsccflg=0 llk[0xffffffff7ce57d30,0xffffffff7ce57d30] idx=0
xscflg=c0102276 fl2=c000400 fl3=2202008 fl4=100
Frames pfr 0xffffffff7ce67098 siz=85976 efr 0xffffffff7ce66fb8 siz=85960
Cursor frame dump
enxt: 7.0x00000168 enxt: 6.0x00008000 enxt: 5.0x00008000 enxt: 4.0x00003978
enxt: 3.0x00000490 enxt: 2.0x000000b8 enxt: 1.0x00000fa0
pnxt: 1.0x00000010
kxscphp=0xffffffff7dd80a18 siz=984 inu=312 nps=312
kxscwhp=0xffffffff7ddd2cc8 siz=8136 inu=6264 nps=3968
kxscefhp=0xffffffff7ce51468 siz=88456 inu=86128 nps=86128


FileName
----------------
ORAHCMU_ora_8602.trc

FileComment
----------------------


Oracle Support - August 27, 2010 6:13:39 PM GMT+08:00 [ODM Data Collection]
Name
--------
=== ODM Data Collection ===

=== ODM Data Collection ===

Trace file /u04/app/oracle/diag/rdbms/orahcmu/ORAHCMU/trace/ORAHCMU_p012_9322.trc


*** 2010-08-23 14:43:00.472
WARNING: Oracle executable binary mismatch detected.
Binary of new process does not match binary which started instance
issue alter system set "_disable_image_check" = true to disable these messages
startup image information
iid info sz=245752512 inode=65458 ts=0x4c6df668
current process image information
iid info sz=245750720 inode=65427 ts=0x4c7204b0
set _disable_image_check = TRUE to disable this check
qksceLinearToCe error

*** 2010-08-23 14:43:42.974
*** SESSION ID:(220.111) 2010-08-23 14:43:42.974
*** CLIENT ID:(ujwal) 2010-08-23 14:43:42.974
*** SERVICE NAME:(ORAHCMU) 2010-08-23 14:43:42.974

DDE: Problem Key 'ORA 600 [15801]' was flood controlled (0x6) (incident: 19885)
ORA-00600: internal error code, arguments: [15801], [1], [], [], [], [], [], [], [], [], [], []
kxfxdss
KXFXSLAVESTATE dump [0, 0]
(pgakid: 0 oercnt: 0 oerrcd: -2224892588)
kxfxdss
no current cursor context.
kxfxdss
no cursors.

关于binary no match的问题已知是由于在实例启动情况下relink导致的;这个case提交了SR,metalink认为ORA-600 15801一般由QC与服务子进程通信问题引起:

The ORA-600 15801 is reporting a communication problem between QC and slaves related with messages sent/received.
Alert log reports several of the following error on the ASM instance:
ORA-600: internal error code, arguments: [15801], [1], [], [], [], [], [], 
[]

last wait was for 'eq: Msg Fragment' 

DIAGNOSTIC ANALYSIS:
--------------------
There were also several of the following message in the alert log:
WARNING: Oracle executable binary mismatch detected.
 Binary of new process does not match binary which started instance
issue alter system set "_disable_image_check" = true to disable these 
messages

So, I asked the customer to set the "_disable_image_check" = true 
This had no impact on the ora-600 errors as expected.

ORA-600 [15801] is signalled when a message overflow occurs between  PQ 
processes.

WORKAROUND:
-----------
none 
RELATED BUGS:
-------------
none
REPRODUCIBILITY:
----------------
intermittent but frequently - occurs at all different times of the day.
STACK TRACE:
------------
*** ID:(29.2904) 2006-07-05 15:50:57.972
qksceLinearToCe error
*** 15:50:58.233
ksedmp: internal or fatal error
ORA-600: internal error code, arguments: [15801], [1], [], [], [], [], [], 
[]
----- Call Stack Trace -----

kxfxGeter qks3tttdefReceive kxfxsui kxfxsp kxfxmai kxfprdp 

    SO: 0x67977018, type: 4, owner: 0x6793f208, flag: INIT/-/-/0x00
    (session) sid: 29 trans: (nil), creator: 0x6793f208, flag: (c0000041) 
USR/- BSY/-/-/-/-/-
              DID: 0000-0012-0000FADB, short-term DID: 0000-0000-00000000
              txn branch: (nil)
              oct: 3, prv: 0, sql: (nil), psql: (nil), user: 0/SYS
    O/S info: user: oracle, term: , ospid: 4558, machine: 
    last wait for 'eq: Msg Fragment' blocking sess=0x(nil) seq=2 
wait_time=4441 seconds since wait started=3
                ct path write=1002ffff, ct path write temp=2, Network=0
    Dumping Session Wait History
     for 'eq: Msg Fragment' count=1 wait_time=4441
                ct path write=1002ffff, ct path write temp=2, Network=0
     for 'eq: Msg Fragment' count=1 wait_time=31
                ct path write=1002ffff, ct path write temp=1, Network=0
    temporary object counter: 0

最后这个case通过设置10235和10501事件后错误不再产生了:

event = "10235 trace name context forever, level 2"  

10235, 00000, "check memory manager internal structures" 

event = "10501 trace name context forever, level 1"
  
10501, 00000, "periodically check selected heap"
// *Cause:
// *Action:
//    Level:  0x01 PGA
//            0x02 SGA
//            0x04 UGA
//            0x08 current call
//            0x10 user call
//            0x20 large allocation pool

Oracle内部错误:ORA-00600:[4097]一例

一套Linux上的10.2.0.4系统在异常恢复后(使用_allow_resetlogs_corruption隐藏参数打开后遭遇ORA-00600:[40xx]相关的内部错误,创建并切换到了新的撤销表空间上)出现ORA-00600: internal error code, arguments: [4097], [], [], [], [], [], [], []内部错误,当该非内部错误(non-fatal)出现100次以上时会在告警日志alert.log中出现记录。
并有可能导致实例crash,具体日志如下:

 

如果自己搞不定可以找ASKMACLEAN专业ORACLE数据库修复团队成员帮您恢复!

 

 

Errors in file /s01/10gdb/admin/clinica/bdump/clinica_smon_21463.trc:
ORA-00600: internal error code, arguments: [4097], [], [], [], [], [], [], []
Tue Jan  4 23:13:19 2011
Non-fatal internal error happenned while SMON was doing logging scn->time mapping.
SMON encountered 1 out of maximum 100 non-fatal internal errors.

clinica_smon_21463.trc:
Dump of buffer cache at level 4 for tsn=1, rdba=8388633
BH (0x91fdf428) file#: 2 rdba: 0x00800019 (2/25) class: 19 ba: 0x91c62000
  set: 3 blksize: 8192 bsi: 0 set-flg: 0 pwbcnt: 0
  dbwrid: 0 obj: -1 objn: 0 tsn: 1 afn: 2
  hash: [fcf7dd68,fcf7dd68] lru: [91fdf5b8,91fdf398]
  ckptq: [NULL] fileq: [NULL] objq: [f5b53d60,f5b53d60]
  use: [fa694970,fa694970] wait: [NULL]
  st: XCURRENT md: SHR tch: 0
  flags: gotten_in_current_mode
  LRBA: [0x0.0.0] HSCN: [0xffff.ffffffff] HSUB: [65535]
  buffer tsn: 1 rdba: 0x00800019 (2/25)
  scn: 0x0000.0352d07c seq: 0x01 flg: 0x00 tail: 0xd07c2601
  frmt: 0x02 chkval: 0x0000 type: 0x26=KTU SMU HEADER BLOCK

/* 这里dump了一个tsn=1,file#=2的数据块,
    可以看到它的类型是KTU SMU HEADER BLOCK即某个回滚段头
*/

Hex dump of block: st=0, typ_found=1
........................
ORA-00600: internal error code, arguments: [4097], [], [], [], [], [], [], []
Current SQL statement for this session:
insert into smon_scn_time (thread, time_mp, time_dp, scn, scn_wrp, scn_bas,  num_mappings, tim_scn_map) 
values (0, :1, :2, :3, :4, :5, :6, :7)
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst()+31          call     ksedst1()            000000000 ? 000000001 ?
                                                   7FFFF53BC160 ? 7FFFF53BC1C0 ?
                                                   7FFFF53BC100 ? 000000000 ?
ksedmp()+610         call     ksedst()             000000000 ? 000000001 ?
                                                   7FFFF53BC160 ? 7FFFF53BC1C0 ?
                                                   7FFFF53BC100 ? 000000000 ?
ksfdmp()+21          call     ksedmp()             000000003 ? 000000001 ?
                                                   7FFFF53BC160 ? 7FFFF53BC1C0 ?
                                                   7FFFF53BC100 ? 000000000 ?
kgeriv()+176         call     ksfdmp()             000000003 ? 000000001 ?
                                                   7FFFF53BC160 ? 7FFFF53BC1C0 ?
                                                   7FFFF53BC100 ? 000000000 ?
kgesiv()+119         call     kgeriv()             0068C97C0 ? 2ABDF1D42BF0 ?
                                                   000000000 ? 0F4A33EA0 ?
                                                   7FFFF53BC100 ? 000000000 ?
ksesic0()+209        call     kgesiv()             0068C97C0 ? 2ABDF1D42BF0 ?
                                                   000001001 ? 000000000 ?
                                                   7FFFF53BCEE0 ? 000000000 ?
ktugti()+3200        call     ksesic0()            000001001 ? 0068C9940 ?
                                                   000000000 ? 00000009A ?
                                                   000000010 ? 101010101010101 ?
ktsftcmove()+4149    call     ktugti()             0B73F111C ? 7FFFF53BD278 ?
                                                   7FFFF53BD280 ? 000000000 ?
                                                   7FFFF53BD27C ? 7FFFF53BD270 ?
ktsf_gsp()+1937      call     ktsftcmove()         00000000A ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   7FFFF53BD27C ? 7FFFF53BD270 ?
kdtgsp()+512         call     ktsf_gsp()           000000000 ? 7FFFF53BF460 ?
                                                   000000024 ? 000000002 ?
                                                   7FFFF53BF460 ? 000000000 ?
kdccak()+111         call     kdtgsp()             2ABDF1D6A2D8 ? 7FFF00000000 ?
                                                   2ABDF1D68530 ? 000000002 ?
                                                   7FFFF53BF460 ? 000000000 ?
kdcgcs()+5419        call     kdccak()             2ABDF1D6A2D8 ? 000000001 ?
                                                   0F4A3BBA8 ? 000000000 ?
                                                   2ABDF1D6A370 ? 000000000 ?
kdcgsp()+1372        call     kdcgcs()             2ABDF1D6A2D8 ? 000000001 ?
                                                   0F4A3BBA8 ? 000000000 ?
                                                   2ABDF1D6A370 ? 000000000 ?
kdtInsRow()+1808     call     kdcgsp()             2ABDF1D6A2D8 ? 000000001 ?
                                                   0F4A3BBA8 ? 000000000 ?
                                                   2ABDF1D6A370 ? 000000000 ?
insrow()+342         call     kdtInsRow()          2ABDF1D6A2D8 ? 000000001 ?
                                                   0F4A3BBA8 ? 000000000 ?
                                                   2ABDF1D6A370 ? 000000000 ?
insdrv()+594         call     insrow()             2ABDF1D6A2D8 ? 7FFFF53BFCC8 ?
                                                   000000000 ? 0F4A33DE0 ?
                                                   2ABDF1D6A370 ? 000000000 ?
inscovexe()+404      call     insdrv()             2ABDF1D6A2D8 ? 7FFFF53BFCC8 ?
                                                   000000000 ? 2ABDF1D6D908 ?
                                                   2ABDF1D6A370 ? 000000000 ?
insExecStmtExecIniE  call     inscovexe()          0F4A33DE0 ? 0F4A3C230 ?
ngine()+85                                         7FFFF53C0EF0 ? 2ABDF1D69F20 ?
                                                   2ABDF1D6A370 ? 000000000 ?
insexe()+386         call     insExecStmtExecIniE  0F4A33DE0 ? 0F4A3C230 ?
                              ngine()              2ABDF1D69F20 ? 2ABDF1D69F20 ?
                                                   2ABDF1D6A370 ? 000000000 ?
opiexe()+9182        call     insexe()             0F4A333A8 ? 7FFFF53C0EF0 ?
                                                   0F4A33DE0 ? 2ABDF1D69F20 ?
                                                   2ABDF1D6A370 ? 2ABDF1D69F20 ?
opiall0()+1842       call     opiexe()             000000049 ? 000000003 ?
                                                   7FFFF53C12F8 ? 000000001 ?
..............

针对该ORA-00600:[4097]内部错误,metalink上Note [ID 1030620.6]介绍了一种workaround的方法:

An ORA-600 [4097] can be encountered through various activities that use 
rollback segments.

Solution Description: 
===================== 

The most likely cause of this is BUG 427389.  This BUG is fixed in
version 7.3.3.3.  The BUG is caused when Rollback Segments are dropped and 
recreated after a shutdown abort.  It is encountered through a very specific 
set of circumstances: 

When an instance has a rollback segment offline and the instance crashes, or 
the user does a shutdown abort, the rollback segment wrap number does not get 
updated.  If that segment is then dropped and recreated immediately after the 
instance is restarted, the wrap number could be lower than existing wrap 
numbers.  This will cause the ORA-600[4097] to occur in subsequent 
transactions using Rollback. 

To avoid encountering this bug, rollback segments should only be dropped and 
recreated after the instance has been shutdown normal and restarted.  If you 
have already encountered the bug, use the following workaround:  

   Select segment_name, segment_id from dba_rollback_segs; 

   Drop all Rollback Segments except for SYSTEM.  

   Recreate dummy (small) rollback segments with the same names in their place. 

   Then, recreate additional rollback segments you want to keep with their 
   permanent storage parameters.   

   Now drop the dummy ones. This should ensure that the segment_ids are not 
   reused. 

If you ever want to add a rollback segment you have to use the workaround steps
again.  If you do not fill the dummy slots you may see the problem re-appear.

我们可以尝试drop异常恢复前已有的可能存在问题的rollback segment来规避这个问题,虽然在10g下使用AMU(automatic managed undo)但仍可以做到这一点:

SQL> alter system set "_smu_debug_mode"=4;
System altered.

/* 设置SMU debug模式为4以便能够手动管理回滚段 */

SQL> set heading off 

SQL> select 'drop rollback segment "'||segment_name||'";' from dba_rollback_segs where segment_name!='SYSTEM';

drop rollback segment "_SYSSMU1$";
drop rollback segment "_SYSSMU2$";
drop rollback segment "_SYSSMU3$";
drop rollback segment "_SYSSMU4$";
drop rollback segment "_SYSSMU5$";
drop rollback segment "_SYSSMU6$";
drop rollback segment "_SYSSMU7$";
drop rollback segment "_SYSSMU8$";
drop rollback segment "_SYSSMU9$";
drop rollback segment "_SYSSMU10$";
drop rollback segment "_SYSSMU11$";
drop rollback segment "_SYSSMU12$";
drop rollback segment "_SYSSMU13$";
drop rollback segment "_SYSSMU14$";
drop rollback segment "_SYSSMU15$";
drop rollback segment "_SYSSMU16$";
drop rollback segment "_SYSSMU17$";
drop rollback segment "_SYSSMU18$";
drop rollback segment "_SYSSMU19$";
drop rollback segment "_SYSSMU20$";
drop rollback segment "_SYSSMU21$";
drop rollback segment "_SYSSMU22$";
drop rollback segment "_SYSSMU23$";
drop rollback segment "_SYSSMU24$";
drop rollback segment "_SYSSMU25$";
drop rollback segment "_SYSSMU26$";
drop rollback segment "_SYSSMU27$";
drop rollback segment "_SYSSMU28$";
drop rollback segment "_SYSSMU29$";
drop rollback segment "_SYSSMU30$";

30 rows selected.

/* 依次执行以上的drop rollback segment回滚段的命令
    注意当前撤销表空间上的回滚段仅能offline而无法drop掉,
    实际上我们需要做的也仅仅是把之前undo表空间上有问题的回滚段drop掉
*/

SQL> alter rollback segment "_SYSSMU30$" offline;
Rollback segment altered.

SQL> drop rollback segment "_SYSSMU30$";
drop rollback segment "_SYSSMU30$"
*
ERROR at line 1:
ORA-30025: DROP segment '_SYSSMU30$' (in undo tablespace) not allowed

SQL> alter rollback segment "_SYSSMU30$" online;
Rollback segment altered.

经过以上drop问题回滚段rollback segment后,系统不再出现ORA-00600:[4097]内部错误,实例恢复正常。在系统正常后,我们有必要重置之前所设的”_smu_debug_mode”UNDO管理debug模式的隐藏参数。

Oracle内部错误:ORA-00600:[6033]一例

一套HP-UX上的9.2.0.8系统,某条查询语句执行时出现ORA-00600: internal error code, arguments: [6033], [], [], [], [], [], [], []内部错误,错误trace信息如下:

*** SESSION ID:(583.18281) 2010-12-20 22:49:01.364
*** 2010-12-20 22:49:01.364
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [6033], [], [], [], [], [], [], []
Current SQL statement for this session:
SELECT INTERFACE_HEADER_ID, DOCUMENT_SUBTYPE, AGENT_ID, VENDOR_SITE_ID FROM PO_HEADERS_INTERFACE WHE
RE WF_GROUP_ID = :B1 ORDER BY INTERFACE_HEADER_ID
----- PL/SQL Call Stack -----
object line object
handle number name
c0000001067e3328 4332 package body APPS.PO_AUTOCREATE_DOC
c0000000fd267060 1 anonymous block
c000000108fe4d60 1979 package body APPS.WF_ENGINE_UTIL
c000000108fe4d60 1745 package body APPS.WF_ENGINE_UTIL
c000000108fe4d60 1099 package body APPS.WF_ENGINE_UTIL
c000000108fe4d60 560 package body APPS.WF_ENGINE_UTIL
c000000108fe4d60 1863 package body APPS.WF_ENGINE_UTIL
c000000108fe4d60 1099 package body APPS.WF_ENGINE_UTIL
c000000108fe4d60 560 package body APPS.WF_ENGINE_UTIL
c000000108fe4d60 1863 package body APPS.WF_ENGINE_UTIL
c000000108fe4d60 1099 package body APPS.WF_ENGINE_UTIL
c000000108fe4d60 560 package body APPS.WF_ENGINE_UTIL
c000000108fe4d60 1863 package body
PL/SQL call stack truncated after 1024 bytes.
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedmp()+184 ? ksedst() C0000000CEB36420 ?
400000000147994B ?

已知的ORA-00600:[6033]错误一般和索引逻辑讹误相关,metalink上有相关的Note建议在出现该错误后运行analyze table validate structure cascade语句以验证表与索引间的数据正确性。

ORA-600 [6033] "null value retrieved from index leaf lookup" [ID 45795.1]
Modified 03-JUN-2010 Type REFERENCE Status PUBLISHED
Note: For additional ORA-600 related information please read Note:146580.1

PURPOSE:
This article represents a partially published OERI note.
It has been published because the ORA-600 error has been
reported in at least one confirmed bug.
Therefore, the SUGGESTIONS section of this article may help
in terms of identifying the cause of the error.
This specific ORA-600 error may be considered for full publication
at a later date. If/when fully published, additional information
will be available here on the nature of this error.
SUGGESTIONS:
Run the ANALYZE command on any tables and indexes in the
trace file:
Example: ANALYZE TABLE
 VALIDATE STRUCTURE CASCADE;
Rebuild any corrupted indexes.
Index corruption.
Known Bugs

NB Bug Fixed Description
6401576 9.2.0.8.P22 OERI[ktbair1] / ORA-600 [6101] index corruption possible
5845232 9.2.0.8.P06 Block corruption / errors from concurrent dequeue operations
2718937 9.2.0.4, 10.1.0.2 OERI:6033 from SELECT on IOT with COMPRESSED PRIMARY KEY
1573283 8.1.7.2, 9.0.1.0 OERI:6033 from ALTER INDEX .. REBUILD ONLINE PARAMETERS ('OPTIMIZE FULL')
Certain index operations can lead to block corruption / memory corruption with varying symptoms such as ORA-600 [6033], ORA-600 [6101] , ORA-600 [ktbair1] , ORA-600 [kcbzpb_1], ORA-600 [4519] and ORA-600 [kcoapl_blkchk] if DB_BLOCK_CHECKING is enabled. Concurrent dequeue operations can lead to block corruption / memory corruption with varying symptoms such as ORA-600 [6033], ORA-600 [6101] and ORA-600 [kcoapl_blkchk] if DB_BLOCK_CHECKING is enabled. Note: This issue was previously fixed under bug 5559640 but that fix had a serious problem which could lead to SGA memory corruption. This fix supercedes the fix for bug 5559640. The problem with patch 5559640 is alerted in Note:414109.1 This fix is superceeded by the fix for bug 6401576.

通过analyze table validate structure cascade命令验证索引后若存在问题则会进一步产生相关的trace文件,一般这类索引逻辑讹误的问题可以通过drop-recreate索引来解决。

Oracle内部错误:ORA-00600[17175]一例

一套HP-UX上的10.2.0.4系统出现ORA-00600[17175] Oracle600内部错误,相关的日志信息如下:

Wed Dec 1 01:57:55 2010
Errors in file /u01/app/oracle/admin/xgp2/bdump/xgp21_pmon_3250.trc:
ORA-00600: internal error code, arguments: [17175], [255], [], [], [], [], [], []
ORA-00601: cleanup lock conflict
Wed Dec 1 01:57:57 2010
Trace dumping is performing id=[cdmp_20101201015757]
Wed Dec 1 01:58:05 2010
LGWR: terminating instance due to error 472
Wed Dec 1 01:58:05 2010
Errors in file /u01/app/oracle/admin/xgp2/bdump/xgp21_lms1_3291.trc:
ORA-00472: PMON process terminated with error
Wed Dec 1 01:58:05 2010
Errors in file /u01/app/oracle/admin/xgp2/bdump/xgp21_lms2_3293.trc:
ORA-00472: PMON process terminated with error
Wed Dec 1 01:58:05 2010
Errors in file /u01/app/oracle/admin/xgp2/bdump/xgp21_lms3_3295.trc:
ORA-00472: PMON process terminated with error
Wed Dec 1 01:58:05 2010
Errors in file /u01/app/oracle/admin/xgp2/bdump/xgp21_lms0_3289.trc:
ORA-00472: PMON process terminated with error
Wed Dec 1 01:58:05 2010
Errors in file /u01/app/oracle/admin/xgp2/bdump/xgp21_lmon_3283.trc:
ORA-00472: PMON process terminated with error
Wed Dec 1 01:58:05 2010
Errors in file /u01/app/oracle/admin/xgp2/bdump/xgp21_lmd0_3287.trc:
ORA-00472: PMON process terminated with error
Wed Dec 1 01:58:05 2010
Shutting down instance (abort)
License high water mark = 421

/u01/app/oracle/admin/xgp2/bdump/xgp21_pmon_3250.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/10.2.0/db_1
System name: HP-UX
Node name: XGP2_db1
Release: B.11.31
Version: U
Machine: ia64
Instance name: xgp21
Redo thread mounted by this instance: 1
Oracle process number: 2
Unix process pid: 3250, image: oracle@XGP2_db1 (PMON)

*** SERVICE NAME:(SYS$BACKGROUND) 2010-12-01 01:57:55.933
*** SESSION ID:(333.1) 2010-12-01 01:57:55.933
*** 2010-12-01 01:57:55.933
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [17175], [255], [], [], [], [], [], []
ORA-00601: cleanup lock conflict


ksedst <- ksedmp <- ksfdmp <- kgeriv <- kgesiv
<- kgesic1 <- kghcln <- kslilcr <- $cold_ksl_cleanup <- ksepop
<- kgepop <- kgesev <- ksesec0 <- $cold_kslges <- ksl_get_child_latch
<- kslgpl <- es <- ksfglt <- kghext_numa <- ksmasgn
<- kghnospc <- $cold_kghalo <- ksmdacnk <- ksmdget <- ksosp_alloc
<- ksoreq_submit <- ksbsrv <- kmmssv <- kmmlsa <- kmmlod
<- ksucln <- ksbrdp <- opirip <- $cold_opidrv <- sou2o
<- $cold_opimai_real <- main <- main_opd_entry

PROCESS STATE
-------------
Process global information:
process: c00000018d000078, call: c00000018d252238, xact: 0000000000000000, curses: c00000018d2508a8, usrses: c00000018d2508a8
----------------------------------------
SO: c00000018d000078, type: 2, owner: 0000000000000000, flag: INIT/-/-/0x00
(process) Oracle pid=2, calls cur/top: c00000018d252238/c00000018d252238, flag: (e) SYSTEM
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 48
last post received-location: ksoreq_reply
last process to post me: c00000018d037978 1 64
last post sent: 0 0 24
last post sent-location: ksasnd
last process posted by me: c00000018d001058 1 6
(latch info) wait_event=0 bits=90
holding (efd=5) c00000020001d500 Parent+children shared pool level=7
Location from where latch is held: kghfrunp: alloc: clatch nowait:
Context saved from call: 0
state=busy, wlstate=free
holding (efd=5) c00000020000b5f8 OS process allocation level=4
Location from where latch is held: ksoreq_submit:
Context saved from call: 13835058076152957304
state=busy, wlstate=free
Process Group: DEFAULT, pseudo proc: c0000004dd263230
O/S info: user: oracle, term: UNKNOWN, ospid: 3250
OSD pid info: Unix process pid: 3250, image: oracle@XGP2_db1 (PMON)


SO: c0000004df4d5f28, type: 19, owner: c00000018d000078, flag: INIT/-/-/0x00
GES MSG BUFFERS: st=emp chunk=0x0000000000000000 hdr=0x0000000000000000 lnk=0x0000000000000000 flags=0x0 inc=4
outq=0 sndq=0 opid=2 prmb=0x0
mbg[i]=(2 19) mbg[b]=(0 0) mbg[r]=(0 0)
fmq[i]=(4 1) fmq[b]=(0 0) fmq[r]=(0 0)
mop[s]=20 mop[q]=1 pendq=0 zmbq=0
nonksxp_recvs=0
------------process 0xc0000004df4d5f28--------------------
proc version : 0
Local node : 0
pid : 3250
lkp_node : 0
svr_mode : 0
proc state : KJP_NORMAL
Last drm hb acked : 0
Total accesses : 181
Imm. accesses : 180
Locks on ASTQ : 0
Locks Pending AST : 0
Granted locks : 0
AST_Q:
PENDING_Q:
GRANTED_Q:
----------------------------------------
SO: c00000018d2f3610, type: 11, owner: c00000018d000078, flag: INIT/-/-/0x00
(broadcast handle) flag: (2) ACTIVE SUBSCRIBER, owner: c00000018d000078,
event: 1, last message event: 1,
last message waited event: 1, messages read: 0
channel: (c0000004dd29fdb0) scumnt mount lock
scope: 1, event: 19, last mesage event: 0,
publishers/subscribers: 0/19,
messages published: 0
SO: c00000018d2508a8, type: 4, owner: c00000018d000078, flag: INIT/-/-/0x00
(session) sid: 333 trans: 0000000000000000, creator: c00000018d000078, flag: (51) USR/- BSY/-/-/-/-/-
DID: 0001-0002-00000003, short-term DID: 0000-0000-00000000
txn branch: 0000000000000000
oct: 0, prv: 0, sql: 0000000000000000, psql: 0000000000000000, user: 0/SYS
service name: SYS$BACKGROUND
last wait for 'latch: shared pool' blocking sess=0x0000000000000000 seq=342 wait_time=175677 seconds since wait started=0
address=c0000002000fff60, number=d6, tries=7
Dumping Session Wait History
for 'latch: shared pool' count=1 wait_time=175677
address=c0000002000fff60, number=d6, tries=7
for 'latch: shared pool' count=1 wait_time=97554
address=c0000002000fff60, number=d6, tries=6
for 'latch: shared pool' count=1 wait_time=78023
address=c0000002000fff60, number=d6, tries=5
for 'latch: shared pool' count=1 wait_time=38978
address=c0000002000fff60, number=d6, tries=4
for 'latch: shared pool' count=1 wait_time=38942
address=c0000002000fff60, number=d6, tries=3
for 'latch: shared pool' count=1 wait_time=19435
address=c0000002000fff60, number=d6, tries=2
for 'latch: shared pool' count=1 wait_time=12655
address=c0000002000fff60, number=d6, tries=1
for 'latch: shared pool' count=1 wait_time=8
address=c0000002000fff60, number=d6, tries=0
for 'os thread startup' count=1 wait_time=144253
=0, =0, =0
for 'os thread startup' count=1 wait_time=141360
=0, =0, =0


SO: c00000018d2f3500, type: 11, owner: c00000018d000078, flag: INIT/-/-/0x00
(broadcast handle) flag: (2) ACTIVE SUBSCRIBER, owner: c00000018d000078,
event: 2, last message event: 40,
last message waited event: 40, messages read: 1
channel: (c0000004dd29bbd8) system events broadcast channel
scope: 2, event: 224634, last mesage event: 40,
publishers/subscribers: 0/161,
messages published: 1


SO: c00000018d252238, type: 3, owner: c00000018d000078, flag: INIT/-/-/0x00
(call) sess: cur c00000018d2508a8, rec 0, usr c00000018d2508a8; depth: 0
----------------------------------------
SO: c00000018d2594b0, type: 5, owner: c00000018d252238, flag: INIT/-/-/0x00
(enqueue) PR-00000000-00000000 DID: 0001-0002-00000003
lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 res_flag: 0x2
res: 0xc0000004df401718, mode: X, lock_flag: 0x0
own: 0xc00000018d2508a8, sess: 0xc00000018d2508a8, proc: 0xc00000018d000078, prv: 0xc0000004df401728
----------------------------------------
SO: c00000018d30b710, type: 16, owner: c00000018d000078, flag: INIT/-/-/0x00
(osp req holder)
CHILD REQUESTS:
(osp req) type=2(BACKGROUND) flags=0x20001(STATIC/-) state=1(INITED) err=0
pg=0 arg1=0 arg2=(null) reply=(null) pname=S018
pid=0 parent=c00000018d30b710 fulfill=0000000000000000
----------------------------------------
SO: c0000004dbff09c0, type: 192, owner: c0000004dbff09c0, flag: -/-/-/0x00

在metalink上搜索600[17175]内部错误相关的文档,可以找到该错误的大量信息:

Keywords: ora-00600 [17175]

1. Bug 6250251: ORA-00600 17175 DURING KGI CLEANUP - DUMP - ORADEBUG
--ora-600 followed by ora-601 and instance crash with ORA-17175.
--Also, setting of heap check event triggers this problem. In this case
--it is event="10235 trace name context forever, level 27"

2. Bug 4216668 - Dump from INSERT / MERGE on internal columns (Doc ID 4216668.8)
--INSERT or MERGE commands might core dump if operating on object types and internal columns are involved.

3. Bug 7590297: ORA-600 [17175] [255] ORA-601: CLEANUP LOCK CONFLICT CRASHED THE DATABASE

4. SR 3-2296150050
--The error has occurred when Oracle was cleaning shared pool latch/heap information about the process 
which died in middle.
--There is no data corruption associated with this error.
--This is evident from the function kghcln in the trace stack at which it failed.
--This problem is usually the symptom of some earlier problem with the latch.
--Either after a process has died, or a process has signaled an error while holding a shared pool latch, 
and the index to the shared pool latch is invalid.

--There was a Bug 7590297 raised for this issue which could not be progressed due to unavailability of information.
--From few earlier known issues - This can be due to PMON may sometimes signal ORA-601 
while trying to start up additional shared servers or dispatchers.
--There the workaround suggested was to Start the instance with max # of shared servers.

--Can you reproduce the problem?If the instance has been restated the issue may not persist as it is related to memory.
--If the issue persists then we have to perform the following to monitoring the instance to investigate further:

--1. Set the following event in parameter file:
--event="10257 trace name context forever, level 10"
--event="601 trace name SYSTEMSTATE level 10"

--The first event will cause PMON to dump info about shared server startup.
--The second event will cause PMON to do a system state dump when the 601 occurs.

--2. You should also have the track of this in intervals and save the historical results from:

--SQL> select e.total_waits, e.total_timeouts, e.time_waited from v$session_event e, v$session s
, v$bgprocess b where b.name='PMON' and s.paddr=b.paddr and e.sid=s.sid and e.event='process startup';


5. SR 3-2123025401
--=== ODM Solution / Action Plan ===
--Disabled NUMA for resolution

6. SR 7314313.994

Analysis:

Bug 6250251 and bug 4216668 are not applicable to this case.
Bug 7590297 is applicable to this case, as the call stack, error message are the same with this case. 
But this patch is suspended as requested info is not available.

SR 3-2296150050: same error message, same DB version, similar call stack; closed without solution.
SR 3-2123025401: same error message, same DB version, similar call stack. 
The issue happened twice in that SR and solved by disabling NUMA
SR 7314313.994: same error message, same DB version, similar call stack; closed without solution.

ERROR:
ORA-600 [17175] [a]


VERSIONS:
versions 9.2 to 10.1


DESCRIPTION:


This error occurs when we are cleaning up a shared pool latch (either after a process has died, 
or a process has signaled an error while holding a shared pool latch), 
and the index to the shared pool latch is invalid.


ARGUMENTS:
Arg [a] index of the latch recovery structure - usually 255

FUNCTIONALITY:
Generic Heap Manager


IMPACT:
INSTANCE HANG
PROCESS FAILURE
INSTANCE FAILURE

以下为Oracle GCS给出的行动计划,GCS认为绝大多数ORA-00600 [17xxx]是由memory相关的问题引起的,这些问题往往在重启实例后就可以得到解决。并建议可以设置shared_servers=max_shared_servers后进一步观察:

From the uploaded files it looks like you were reported with ORA-00600 [17175] errors 
and crashed the instance.What is the current status after the restart of the database. 
Are you still reported with the same errors and crashing the instance ?
Mostly the ORA-00600 [17xxx] errors are memory releated and might have got resolved after the database restart.


Further looking at the uploaded trace file the failing functions and the error closely 
matches Bug 6958493and is closed as duplicate of BaseBug 6962340which is closed 
as could not able to reproduce the error.


Also a smillar issue is reported inBug 3104250which is fixed in 10g, but that doesn't mean 
you cannot get this error for a new reason and that the same workaround would fix it.
We need to implement the workaround and set: shared_servers=max_shared_servers 
if the error reproduces again. If this is still repeated issue then we can file a new bug with development for the same.


ACTION PLAN
===========
1. Monitor the alertlog for the ORA-00600 [17175] errors for the next few days and if the database still crashes then please
set shared_servers=max_shared_servers and see if the problem resolves or not.

Oracle内部错误:ORA-00600[kccchb_3]一例

一套Linux x86-64上的11.1.0.7 RAC系统,该RAC使用Netapps NFS作为共享存储。其中一个节点出现ORA-00600: internal error code, arguments: [kccchb_3]内部错误并导致实例意外终止,详细日志如下:

Mon Dec 27 00:03:13 2010
Error: Controlfile sequence number in file header is different from the one in memory
Please check that the correct mount optionsare used if controlfile is located on NFS
Errors in file /d02/oratst/TSMOTEBS/db/11.1.0/admin/TSMOTEBS1_l118464dcss3001/diag/rdbms/tsmotebs/TSMOTEBS1/trace/TSMOTEBS1_ckpt_15907.trc (incident=11353):
ORA-00600: internal error code, arguments: [kccchb_3], [555541], [555540], [555540], [], [], [], [], [], [], [], []
Incident details in: /d02/oratst/TSMOTEBS/db/11.1.0/admin/TSMOTEBS1_l118464dcss3001/diag/rdbms/tsmotebs/TSMOTEBS1/incident/incdir_11353/TSMOTEBS1_ckpt_15907_i11353.trc
Errors in file /d02/oratst/TSMOTEBS/db/11.1.0/admin/TSMOTEBS1_l118464dcss3001/diag/rdbms/tsmotebs/TSMOTEBS1/trace/TSMOTEBS1_ckpt_15907.trc:
ORA-00600: internal error code, arguments: [kccchb_3], [555541], [555540], [555540], [], [], [], [], [], [], [], []
CKPT (ospid: 15907): terminating the instance due to error 469

根据Metalink文档Note 303986.1 RAC instance using NFS via Netapps report Ora-600 [kccchb_3]:

Symptoms
RAC instance using NFS via Netapps occasionally reports:
ORA-00600 [kccchb_3], [689], [0], [],[],[],[]
ORA-00600 [kclchkblk_3], [7], [738515663], [14], [], [], [], []

Cause
This is most probably a Netapp issue wherein 'noac' doesn't work as expected.
NetApp recommended mount options for Oracle9i RAC on Solaris:
rw,bg,vers=3,proto=tcp,hard,intr,rsize=32768,wsize=32768,forcedirectio,noac

NetApp recommended mount options for Linux SLES9, RHEL4, RHEL3 QU3 and later:
rw,bg,hard,nointr,tcp,vers=3,timeo=600,rsize=ORACLE_BLOCK_SIZE,
wsize=ORACLE_BLOCK_SIZE,actimeo=0

NetApp recommended mount options for Linux RHAS2.1, RHEL3 pre QU3:
rw,bg,hard,nointr,tcp,vers=3,timeo=600,rsize=ORACLE_BLOCK_SIZE,
wsize=ORACLE_BLOCK_SIZE,noac

And for AS 2.1 in /etc/modules.conf add:
options nfs nfs_uncached_io=1
Solution

Workaround
set filesystemio_options=DIRECTIO for all instances (to bypass any buffer cache)

以上介绍了该ORA-00600 [kccchb_3]内部错误可以通过设置初始化参数filesystemio_options为DIRECTIO来workaround;实际上更好的选择可以是设置filesystemio_options为SETALL,即同时使用DIRECTIO和ASYNC异步IO。

顺便提一下在Linux上启用异步ASYNC IO不仅需要设置该filesystemio_options参数,我们还需要同时保证以下几点:
1.filesystemio_options参数设置为ASYNC或者SETALL

2.合理设置Linux Kernel内核参数fs.aio-max-nr,例如设置为3145728;以及参数fs.file-max =6815744

3.disk_asynch_io初始化参数设置为TRUE

4.在10gR2下oracle binary可能没有正确以async_on选项编译,我们需要手动重新make(The reason behind this behavior is that the LIBAIO_0.1 Linux OS library is not attached for io_getevents.)

SQL>shutdown immediate
[maclean@rh8 ~]$ cd $ORACLE_HOME/rdbms/lib
[maclean@rh8 lib]$ ln -s /usr/lib/libaio.so skgaio.o
[maclean@rh8 lib]$ make PL_ORALIBS=-laio -f ins_rdbms.mk async_on
rm -f /u01/oracle/product/10.2.0/db_1/rdbms/lib/skgaioi.o
cp /u01/oracle/product/10.2.0/db_1/rdbms/lib/skgaio.o /u01/oracle/product/10.2.0/db_1/rdbms/lib/skgaioi.o
chmod 755 /u01/oracle/product/10.2.0/db_1/bin

 - Linking Oracle 
rm -f /u01/oracle/product/10.2.0/db_1/rdbms/lib/oracle
gcc  -o /u01/oracle/product/10.2.0/db_1/rdbms/lib/oracle -L/u01/oracle/product/10.2.0/db_1/rdbms/lib/ -L/u01/oracle/product/10.2.0/db_1/lib/ -L/u01/oracle/product/10.2.0/db_1/lib/stubs/ -L/usr/lib -lirc   -Wl,-E `test -f /u01/oracle/product/10.2.0/db_1/rdbms/lib/sskgpsmti.o && echo /u01/oracle/product/10.2.0/db_1/rdbms/lib/sskgpsmti.o` /u01/oracle/product/10.2.0/db_1/rdbms/lib/opimai.o /u01/oracle/product/10.2.0/db_1/rdbms/lib/ssoraed.o /u01/oracle/product/10.2.0/db_1/rdbms/lib/ttcsoi.o /u01/oracle/product/10.2.0/db_1/rdbms/lib/defopt.o  -Wl,--whole-archive -lperfsrv10 -Wl,--no-whole-archive /u01/oracle/product/10.2.0/db_1/lib/nautab.o /u01/oracle/product/10.2.0/db_1/lib/naeet.o /u01/oracle/product/10.2.0/db_1/lib/naect.o /u01/oracle/product/10.2.0/db_1/lib/naedhs.o /u01/oracle/product/10.2.0/db_1/rdbms/lib/config.o  -lserver10 -lodm10 -lnnet10 -lskgxp10 -lsnls10 -lnls10  -lcore10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10 -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 -lhasgen10 -lcore10 -lskgxn2 -locr10 -locrb10 -locrutl10 -lhasgen10 -lcore10 -lskgxn2   -lclient10  -lvsn10 -lcommon10 -lgeneric10 -lknlopt `if /usr/bin/ar tv /u01/oracle/product/10.2.0/db_1/rdbms/lib/libknlopt.a | grep xsyeolap.o > /dev/null 2>&1 ; then echo "-loraolap10" ; fi` -lslax10 -lpls10  -lplp10 -lserver10 -lclient10  -lvsn10 -lcommon10 -lgeneric10 -lknlopt -lslax10 -lpls10  -lplp10 -ljox10 -lserver10 -lclsra10 -ldbcfg10 -locijdbcst10 -lwwg  `cat /u01/oracle/product/10.2.0/db_1/lib/ldflags`    -lnsslb10 -lncrypt10 -lnsgr10 -lnzjs10 -ln10 -lnnz10 -lnl10 -lnro10 `cat /u01/oracle/product/10.2.0/db_1/lib/ldflags`    -lnsslb10 -lncrypt10 -lnsgr10 -lnzjs10 -ln10 -lnnz10 -lnl10 -lmm -lsnls10 -lnls10  -lcore10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10 -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 `cat /u01/oracle/product/10.2.0/db_1/lib/ldflags`    -lnsslb10 -lncrypt10 -lnsgr10 -lnzjs10 -ln10 -lnnz10 -lnl10 -lnro10 `cat /u01/oracle/product/10.2.0/db_1/lib/ldflags`    -lnsslb10 -lncrypt10 -lnsgr10 -lnzjs10 -ln10 -lnnz10 -lnl10   -lsnls10 -lnls10  -lcore10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10 -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 `if /usr/bin/ar tv /u01/oracle/product/10.2.0/db_1/rdbms/lib/libknlopt.a | grep "kxmnsd.o" > /dev/null 2>&1 ; then echo " " ; else echo "-lordsdo10"; fi` -lctxc10 -lctx10 -lzx10 -lgx10 -lctx10 -lzx10 -lgx10 -lordimt10 -lsnls10 -lnls10  -lcore10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10 -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 -lsnls10 -lunls10  -lsnls10 -lnls10  -lcore10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10 -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 -laio    `cat /u01/oracle/product/10.2.0/db_1/lib/sysliblist` -Wl,-rpath,/u01/oracle/product/10.2.0/db_1/lib -lm    `cat /u01/oracle/product/10.2.0/db_1/lib/sysliblist` -ldl -lm   -L/u01/oracle/product/10.2.0/db_1/lib
/u01/oracle/product/10.2.0/db_1/lib//libcore10.a(lcd.o): In function `lcdprm':
lcd.c:(.text+0x8e8): warning: the `gets' function is dangerous and should not be used.
mv -f /u01/oracle/product/10.2.0/db_1/bin/oracle /u01/oracle/product/10.2.0/db_1/bin/oracleO
mv /u01/oracle/product/10.2.0/db_1/rdbms/lib/oracle /u01/oracle/product/10.2.0/db_1/bin/oracle
chmod 6751 /u01/oracle/product/10.2.0/db_1/bin/oracle

SQL> startup;

/* 通过以下方式可以验证ASYNC IO的工作情况 */

cat /proc/slabinfo | grep kio

[maclean@rh8 ~]$ cat /proc/slabinfo | grep kio
kioctx                56     72    320   12    1 : tunables   54   27    8 : slabdata      6      6      0
kiocb                 15     15    256   15    1 : tunables  120   60    8 : slabdata      1      1      0

/* 这里kioctx的值大于零,说明正在使用异步ASYNC IO */

如何是11g,那么无需那么麻烦要重新编译async_on,同时可以通过以下手段了解异步ASYNC IO的使用情况:

SQL> col name for a60 
SQL> SELECT name, asynch_io FROM v$datafile f,v$iostat_file i 
WHERE f.file# = i.file_no
AND filetype_name = 'Data File' ; 

NAME							     ASYNCH_IO
------------------------------------------------------------ ---------
/standby/oradata/PROD/datafile/o1_mf_system_6q9dwgwh_.dbf    ASYNC_ON
/standby/oradata/PROD/datafile/o1_mf_sysaux_6q9dwgyp_.dbf    ASYNC_ON
/standby/oradata/PROD/datafile/o1_mf_undotbs1_6q9dwh0r_.dbf  ASYNC_ON
/standby/oradata/PROD/datafile/o1_mf_users_6q9dwh2x_.dbf     ASYNC_ON
/standby/oradata/PROD/datafile/o1_mf_example_6q9dzhh1_.dbf   ASYNC_ON
/standby/oradata/PROD/datafile/o1_mf_enc_6q9fdzcx_.dbf	     ASYNC_ON

6 rows selected.

/* 可以看到ASYNCH_IO的状态为ON,说明正使用异步IO */

ORA-00600:[kclchkinteg_2]及[kjmsm_epc]内部错误一例

一套AIX上的9.2.0.6 RAC系统,alert日志中最初报LMS进程(Lock Manager Server process,即锁服务管理进程,仅出现在RAC系统中)遭遇ORA-00600[kclchkinteg_2],继而出现ORA-00600[[kjmsm_epc]内部错误导致实例crash(instance crashed)。相关日志如下:

Tue Dec 1 01:20:25 2009
Errors in file /oracle/admin/crmdb/bdump/crmdb1_lms7_2143190.trc:
ORA-00600: internal error code, arguments: [kclchkinteg_2], [], [], [], [], [], [], []
Tue Dec 1 01:20:25 2009
Errors in file /oracle/admin/crmdb/bdump/crmdb1_lms7_2143190.trc:
ORA-00600: internal error code, arguments: [kclchkinteg_2], [], [], [], [], [], [], []
Tue Dec 1 01:20:26 2009
Trace dumping is performing id=[cdmp_20091201012026]
Tue Dec 1 01:20:40 2009
Errors in file /oracle/admin/crmdb/bdump/crmdb1_lms7_2143190.trc:
ORA-00600: internal error code, arguments: [kjmsm_epc], [], [], [], [], [], [], []
Tue Dec 1 01:20:41 2009
Errors in file /oracle/admin/crmdb/bdump/crmdb1_lms7_2143190.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00600: internal error code, arguments: [kjmsm_epc], [], [], [], [], [], [], []
Tue Dec 1 01:20:43 2009

经过和metalink确认,认为此次实例意外终止时由于”Bug 3671867 – OERI[kclchkinteg_2] possible in RAC environment”:

Hdr: 3671867 10.1.0.2 RDBMS 10.1.0.2 RAC PRODID-5 PORTID-212 ORA-600
Abstract: INSTANCE TERMINATED WITH ORA-600 [KCLCHKINTEG_2]
PROBLEM:
--------
During testing, one instance was terminated with ORA-600 [KCLCHKINTEG_2]

DIAGNOSTIC ANALYSIS:
--------------------
Alert log contains:

Thu Jun  3 12:57:26 2004
Private_strands 0 at log switch
Thread 1 advanced to log sequence 153
  Current log# 1 seq# 153 mem# 0: +EMDBD1/dbotazc0/onlinelog/group_1.267.1
  Current log# 1 seq# 153 mem# 1: +EMDBD1/dbotazc0/onlinelog/group_1.268.1
Thu Jun  3 13:41:45 2004
Private_strands 0 at log switch
Thread 1 advanced to log sequence 154
  Current log# 2 seq# 154 mem# 0: +EMDBD1/dbotazc0/onlinelog/group_2.269.1
  Current log# 2 seq# 154 mem# 1: +EMDBD1/dbotazc0/onlinelog/group_2.270.1
Thu Jun  3 13:55:06 2004
Errors in file /DBA/nest/oracle/DBOTA1/logs/dbota1_dbw0_82244.trc:
ORA-600: internal error code, arguments: [kclchkinteg_2], [], [], [], [],
[], [], []
Thu Jun  3 13:55:08 2004
Errors in file /DBA/nest/oracle/DBOTA1/logs/dbota1_dbw0_82244.trc:
ORA-600: internal error code, arguments: [kclchkinteg_2], [], [], [], [],
[], [], []
Thu Jun  3 13:55:08 2004
DBW0: terminating instance due to error 471
Thu Jun  3 13:55:08 2004
Trace dumping is performing id=[cdmp_20040603135508]
Thu Jun  3 13:55:10 2004
Dump system state for local instance only
Thu Jun  3 13:55:10 2004
Trace dumping is performing id=[cdmp_20040603135510]
The only trace file we have been sent is the dbw trace.
WORKAROUND:
-----------
None
RELATED BUGS:
-------------
There are several which end in KCLCHKINTEG but not KCLCHKINTEG_2
REPRODUCIBILITY:
----------------
Appears only to have happened once so far.
TEST CASE:
----------
N/A
STACK TRACE:
------------
ksedmp ksfdmp kgerinv kgeasnmierr kclassertle kclchkinteg kclfwrite1 kcbbic1
kcbbiop kcbbdrv ksbabs ksbrdp opirip opidrv sou2o main start

SUPPORTING INFORMATION:
-----------------------
Bruce Carter has looked at this and suggested a bug be raised.
Affects:
    Product (Component)	Oracle Server (Rdbms)
    Range of versions believed to be affected	Versions < 10.2
    Versions confirmed as being affected
        * 10.1.0.3
    Platforms affected	Generic (all / most platforms affected)
Fixed:
    This issue is fixed in
        * 9.2.0.7 (Server Patch Set)
        * 10.1.0.4 (Server Patch Set)
        * 10.2.0.1 (Base Release)
Symptoms:
Related To:
    * Internal Error May Occur (ORA-600)
    * Instance May Crash
    * ORA-600 [kclchkinteg_2]
    * RAC (Real Application Clusters) / OPS
Description
    ORA-600 [kclchkinteg_2] possible in RAC environment

Oracle开发部分确认该3671867 bug已在9.2.0.8中得到修复,建议通过应用9.2.0.8补丁集或者打上bug 3671867的one-off patch来修复该问题;该Bug可能导致实例意外终止,因此其Severity也极高,值得手头仍有9i RAC系统需要管理的dba注意。

ora-00600[kkocxj:pjpCtx]内部错误一例

一套HP-UX上的10.2.0.4系统在运行某条 select查询语句时出现ORA-00600[kkocxj:pjpCtx]内部错误,TRACE文件信息如下:


FILE VERSION
------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /oracle/app/oracle/product/10.2
System name:    HP-UX
Node name:      crmdb1
Release:        B.11.31
Version:        U
Machine:        ia64
Instance name: cbssnm
Redo thread mounted by this instance: 1

TRACE FILE
---------------
Filename = cbssnm_ora_29061.trc

*** ACTION NAME:(SQL 窗口 - 新建) 2010-07-02 15:59:46.238
*** MODULE NAME:(PL/SQL Developer) 2010-07-02 15:59:46.238
*** SERVICE NAME:(SYS$USERS) 2010-07-02 15:59:46.238
*** SESSION ID:(770.4341) 2010-07-02 15:59:46.237
*** 2010-07-02 15:59:46.237
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kkocxj : pjpCtx], [], [], [], [], [], [], []
Current SQL statement for this session:
select p.access_number, aa.name
 from crm.product p,
      (select aa.prod_id, os.name, os.staff_number
         from (select *
                 from (select prod_id,
                              party_id,
                              row_number() over(partition by prod_id order by start_dt desc) num
                         from crm.party_2_prod
                        where end_dt > sysdate
                          and party_product_rela_role_cd = 3)
                where num = 1) aa,
              crm.our_staff os
        where aa.party_id = os.staff_id) aa
where p.prod_id = aa.prod_id(+)
  and p.access_number = '15335581126'
----- Call Stack Trace -----
    ksedst <- ksedmp <- ksfdmp <- kgerinv <- kgeasnmierr        
<- $cold_kkocxj <- kkoiqb <- kkooqb <- kkoqbc <- apakkoqb         
<- apaqbdDescendents <- apaqbd <- kkqctCostTransfQB <- kkqctdrvJP 
<- kkqjpdttr          <- kkqctdrvTD <- kkqjpddrv <- kkqdrv <- kkqctdrvIT 
<- apadrv           <- opitca <- kksFullTypeCheck <- rpiswu2 <- kksLoadChild 
<- kxsGetRuntimeLock            <- kksfbc <- kkspsc0 <- kksParseCursor 
<- opiosq0 <- kpooprx             <- kpoal8 <- opiodr <- ttcpip <- opitsk 
<- opiino              <- opiodr <- opidrv <- sou2o <- opimai_real <- main               
<- main_opd_entry

根据错误代码和stack trace可以在metalink上匹配到如下Bug:

Bug:7014646
Abstract: ORA-600: INTERNAL ERROR CODE, ARGUMENTS: [KKOCXJ : PJPCTX], [], [], [], [], []
Affects:
    Product (Component)	Oracle Server (Rdbms)
    Range of versions believed to be affected	Versions < 11.2
    Versions confirmed as being affected	
        * 10.2.0.4
        * 11.1.0.6 
    Platforms affected	Generic (all / most platforms affected)
Fixed:
    This issue is fixed in	
        * 10.2.0.4 Patch 7 on Windows Platforms
        * 10.2.0.5 (Server Patch Set)
        * 11.1.0.7 (Server Patch Set)
        * 11.2 (Future Release) 
Symptoms:
    * Internal Error May Occur (ORA-600)
    * ORA-600 [kkocxj : pjpCtx] 
Related To:
    * Optimizer
    * _OPTIMIZER_PUSH_PRED_COST_BASED 
Description
    A complex query can fail during parse with 
    ORA-600 [kkocxj : pjpCtx]
    Workaround
     Set  "_optimizer_push_pred_cost_based"=false

该bug可以通过实施one off Patch 7014646修复,也可以尝试通过修改隐式参数_optimizer_push_pred_cost_based禁用基于成本的谓词前置特性(WORKAROUND: disable cost based push predicate)来规避该[KKOCXJ:PJPCTX]内部错误发生,具体的修改方法:

SQL> conn / as sysdba
SQL> alter system set "_optimizer_push_pred_cost_based"=false;
SQL> exit
/* 设置该隐式参数无需重启实例 */

Oracle GCS更推荐通过应用补丁7014646的方法来解决问题,而修改以上隐式参数则不一定百分之百能解决问题。