ORA-00600:[kclchkinteg_2]及[kjmsm_epc]内部错误一例

一套AIX上的9.2.0.6 RAC系统,alert日志中最初报LMS进程(Lock Manager Server process,即锁服务管理进程,仅出现在RAC系统中)遭遇ORA-00600[kclchkinteg_2],继而出现ORA-00600[[kjmsm_epc]内部错误导致实例crash(instance crashed)。相关日志如下:

Tue Dec 1 01:20:25 2009
Errors in file /oracle/admin/crmdb/bdump/crmdb1_lms7_2143190.trc:
ORA-00600: internal error code, arguments: [kclchkinteg_2], [], [], [], [], [], [], []
Tue Dec 1 01:20:25 2009
Errors in file /oracle/admin/crmdb/bdump/crmdb1_lms7_2143190.trc:
ORA-00600: internal error code, arguments: [kclchkinteg_2], [], [], [], [], [], [], []
Tue Dec 1 01:20:26 2009
Trace dumping is performing id=[cdmp_20091201012026]
Tue Dec 1 01:20:40 2009
Errors in file /oracle/admin/crmdb/bdump/crmdb1_lms7_2143190.trc:
ORA-00600: internal error code, arguments: [kjmsm_epc], [], [], [], [], [], [], []
Tue Dec 1 01:20:41 2009
Errors in file /oracle/admin/crmdb/bdump/crmdb1_lms7_2143190.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00600: internal error code, arguments: [kjmsm_epc], [], [], [], [], [], [], []
Tue Dec 1 01:20:43 2009

经过和metalink确认,认为此次实例意外终止时由于”Bug 3671867 – OERI[kclchkinteg_2] possible in RAC environment”:

Hdr: 3671867 10.1.0.2 RDBMS 10.1.0.2 RAC PRODID-5 PORTID-212 ORA-600
Abstract: INSTANCE TERMINATED WITH ORA-600 [KCLCHKINTEG_2]
PROBLEM:
--------
During testing, one instance was terminated with ORA-600 [KCLCHKINTEG_2]

DIAGNOSTIC ANALYSIS:
--------------------
Alert log contains:

Thu Jun  3 12:57:26 2004
Private_strands 0 at log switch
Thread 1 advanced to log sequence 153
  Current log# 1 seq# 153 mem# 0: +EMDBD1/dbotazc0/onlinelog/group_1.267.1
  Current log# 1 seq# 153 mem# 1: +EMDBD1/dbotazc0/onlinelog/group_1.268.1
Thu Jun  3 13:41:45 2004
Private_strands 0 at log switch
Thread 1 advanced to log sequence 154
  Current log# 2 seq# 154 mem# 0: +EMDBD1/dbotazc0/onlinelog/group_2.269.1
  Current log# 2 seq# 154 mem# 1: +EMDBD1/dbotazc0/onlinelog/group_2.270.1
Thu Jun  3 13:55:06 2004
Errors in file /DBA/nest/oracle/DBOTA1/logs/dbota1_dbw0_82244.trc:
ORA-600: internal error code, arguments: [kclchkinteg_2], [], [], [], [],
[], [], []
Thu Jun  3 13:55:08 2004
Errors in file /DBA/nest/oracle/DBOTA1/logs/dbota1_dbw0_82244.trc:
ORA-600: internal error code, arguments: [kclchkinteg_2], [], [], [], [],
[], [], []
Thu Jun  3 13:55:08 2004
DBW0: terminating instance due to error 471
Thu Jun  3 13:55:08 2004
Trace dumping is performing id=[cdmp_20040603135508]
Thu Jun  3 13:55:10 2004
Dump system state for local instance only
Thu Jun  3 13:55:10 2004
Trace dumping is performing id=[cdmp_20040603135510]
The only trace file we have been sent is the dbw trace.
WORKAROUND:
-----------
None
RELATED BUGS:
-------------
There are several which end in KCLCHKINTEG but not KCLCHKINTEG_2
REPRODUCIBILITY:
----------------
Appears only to have happened once so far.
TEST CASE:
----------
N/A
STACK TRACE:
------------
ksedmp ksfdmp kgerinv kgeasnmierr kclassertle kclchkinteg kclfwrite1 kcbbic1
kcbbiop kcbbdrv ksbabs ksbrdp opirip opidrv sou2o main start

SUPPORTING INFORMATION:
-----------------------
Bruce Carter has looked at this and suggested a bug be raised.
Affects:
    Product (Component)	Oracle Server (Rdbms)
    Range of versions believed to be affected	Versions < 10.2
    Versions confirmed as being affected
        * 10.1.0.3
    Platforms affected	Generic (all / most platforms affected)
Fixed:
    This issue is fixed in
        * 9.2.0.7 (Server Patch Set)
        * 10.1.0.4 (Server Patch Set)
        * 10.2.0.1 (Base Release)
Symptoms:
Related To:
    * Internal Error May Occur (ORA-600)
    * Instance May Crash
    * ORA-600 [kclchkinteg_2]
    * RAC (Real Application Clusters) / OPS
Description
    ORA-600 [kclchkinteg_2] possible in RAC environment

Oracle开发部分确认该3671867 bug已在9.2.0.8中得到修复,建议通过应用9.2.0.8补丁集或者打上bug 3671867的one-off patch来修复该问题;该Bug可能导致实例意外终止,因此其Severity也极高,值得手头仍有9i RAC系统需要管理的dba注意。

ora-00600[kkocxj:pjpCtx]内部错误一例

一套HP-UX上的10.2.0.4系统在运行某条 select查询语句时出现ORA-00600[kkocxj:pjpCtx]内部错误,TRACE文件信息如下:


FILE VERSION
------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /oracle/app/oracle/product/10.2
System name:    HP-UX
Node name:      crmdb1
Release:        B.11.31
Version:        U
Machine:        ia64
Instance name: cbssnm
Redo thread mounted by this instance: 1

TRACE FILE
---------------
Filename = cbssnm_ora_29061.trc

*** ACTION NAME:(SQL 窗口 - 新建) 2010-07-02 15:59:46.238
*** MODULE NAME:(PL/SQL Developer) 2010-07-02 15:59:46.238
*** SERVICE NAME:(SYS$USERS) 2010-07-02 15:59:46.238
*** SESSION ID:(770.4341) 2010-07-02 15:59:46.237
*** 2010-07-02 15:59:46.237
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kkocxj : pjpCtx], [], [], [], [], [], [], []
Current SQL statement for this session:
select p.access_number, aa.name
 from crm.product p,
      (select aa.prod_id, os.name, os.staff_number
         from (select *
                 from (select prod_id,
                              party_id,
                              row_number() over(partition by prod_id order by start_dt desc) num
                         from crm.party_2_prod
                        where end_dt > sysdate
                          and party_product_rela_role_cd = 3)
                where num = 1) aa,
              crm.our_staff os
        where aa.party_id = os.staff_id) aa
where p.prod_id = aa.prod_id(+)
  and p.access_number = '15335581126'
----- Call Stack Trace -----
    ksedst <- ksedmp <- ksfdmp <- kgerinv <- kgeasnmierr        
<- $cold_kkocxj <- kkoiqb <- kkooqb <- kkoqbc <- apakkoqb         
<- apaqbdDescendents <- apaqbd <- kkqctCostTransfQB <- kkqctdrvJP 
<- kkqjpdttr          <- kkqctdrvTD <- kkqjpddrv <- kkqdrv <- kkqctdrvIT 
<- apadrv           <- opitca <- kksFullTypeCheck <- rpiswu2 <- kksLoadChild 
<- kxsGetRuntimeLock            <- kksfbc <- kkspsc0 <- kksParseCursor 
<- opiosq0 <- kpooprx             <- kpoal8 <- opiodr <- ttcpip <- opitsk 
<- opiino              <- opiodr <- opidrv <- sou2o <- opimai_real <- main               
<- main_opd_entry

根据错误代码和stack trace可以在metalink上匹配到如下Bug:

Bug:7014646
Abstract: ORA-600: INTERNAL ERROR CODE, ARGUMENTS: [KKOCXJ : PJPCTX], [], [], [], [], []
Affects:
    Product (Component)	Oracle Server (Rdbms)
    Range of versions believed to be affected	Versions < 11.2
    Versions confirmed as being affected	
        * 10.2.0.4
        * 11.1.0.6 
    Platforms affected	Generic (all / most platforms affected)
Fixed:
    This issue is fixed in	
        * 10.2.0.4 Patch 7 on Windows Platforms
        * 10.2.0.5 (Server Patch Set)
        * 11.1.0.7 (Server Patch Set)
        * 11.2 (Future Release) 
Symptoms:
    * Internal Error May Occur (ORA-600)
    * ORA-600 [kkocxj : pjpCtx] 
Related To:
    * Optimizer
    * _OPTIMIZER_PUSH_PRED_COST_BASED 
Description
    A complex query can fail during parse with 
    ORA-600 [kkocxj : pjpCtx]
    Workaround
     Set  "_optimizer_push_pred_cost_based"=false

该bug可以通过实施one off Patch 7014646修复,也可以尝试通过修改隐式参数_optimizer_push_pred_cost_based禁用基于成本的谓词前置特性(WORKAROUND: disable cost based push predicate)来规避该[KKOCXJ:PJPCTX]内部错误发生,具体的修改方法:

SQL> conn / as sysdba
SQL> alter system set "_optimizer_push_pred_cost_based"=false;
SQL> exit
/* 设置该隐式参数无需重启实例 */

Oracle GCS更推荐通过应用补丁7014646的方法来解决问题,而修改以上隐式参数则不一定百分之百能解决问题。

ORA-00600: internal error code, arguments: [15160]

一套windows上的11.1.0.7系统,用户在查询时出现ORA-00600: internal error code, arguments: [15160], [], [], [], [], [], [], [], [], [], [], []错误如下:

Dump continued from file: c:\app\administrator\diag\rdbms\dbuat\dbuat\trace\dbuat_ora_544.trc
ORA-00600: internal error code, arguments: [15160], [], [], [], [], [], [], [], [], [], [], []

========= Dump for incident 16131 (ORA 600 [15160]) ========

*** 2010-07-20 15:49:20.015
----- Current SQL Statement for this session (sql_id=3vx5wh2859qy4) -----
SELECT * FROM ACVW_ACDCBIRD
WHERE CUST_AC_NO = NVL(:B5 , CUST_AC_NO)
AND BRANCH_CODE = :B4
AND ACCOUNT_CLASS = NVL(:B3 , ACCOUNT_CLASS)
AND CUST_NO = NVL(:B2 , CUST_NO)
AND CCY = NVL(:B1 , CCY)

----- PL/SQL Call Stack -----
object line object
handle number name
4648BA6C 811 package body FCUAT.ACPKS_FCJ_ACDCBIRD
4648BA6C 895 package body FCUAT.ACPKS_FCJ_ACDCBIRD
4648BA6C 199 package body FCUAT.ACPKS_FCJ_ACDCBIRD
4648BA6C 122 package body FCUAT.ACPKS_FCJ_ACDCBIRD
4648E458 13 anonymous block
4648F520 161 package body FCUAT.STPKS_FIDPKG_WRAPPER
464953BC 1940 package body FCUAT.STPKS_FCMAINT_SERVICE
464B8DE0 164 package body FCUAT.GWPKS_SERVICE_ROUTER
464B8DE0 1549 package body FCUAT.GWPKS_SERVICE_ROUTER
464B8DE0 1718 package body FCUAT.GWPKS_SERVICE_ROUTER
464B96E0 1 anonymous block

----- Call Stack Trace -----

PGOSF60__ksfdmp dbgexPhaseII dbgexProcessError dbgeExecuteForError

该ORA-600[15160]内部错误已知可能由以下Bug引起:

1)
Bug:8295719 - ORA-600 [15160] running query with FULL OUTER join Fixed-Releases: B201 Tags: ANSI CBO OERI REDISCOVERY INFORMATION:
If a query with nested views and Full Outer Join fails during parsing in Oracle11g with ORA-600 [15160] and Join Predicate Push Down (JPPD) is taking place and one of the nested view query blocks in the 10053 trace has a very high value for cost in the chosen plan (in section CBQT Join Predicate Push Down Additional Phase) then you may have encountered this bug.
WORKAROUND:
_optimizer_push_pred_cost_based = false

2)
Bug 7370515 - ORA-600[15160]: INTERNAL ERROR CODE
REDISCOVERY INFORMATION:
If you encountered ORA-600[15160] error when UNION/UNION ALL view is on the right side of null aware anti join, it could be this bug.
WORKAROUND:
Set "_optimizer_null_aware_antijoin"=false

3)
Bug 9826143 - UNION QUERY WITH OUTER JOIN AND GROUP BY CLAUSE FAILS WITH ORA-600 [15160] (36)
>>>>>>>>>>
Bug 9213751 - SQL GETS ORA-600 [15160] WITH _OPTIMIZER_PUSH_PRED_COST_BASED=TRUE (11)
From trace: [_optimizer_push_pred_cost_based = true]

4)
Bug 8947490 - ORA-600[15160] OCCURS WHEN USING SQL TUNING ADVISOR
REDISCOVERY INFORMATION:
a query with full outer joined union view produced ora-[15160] most likely during sqltune /index advisor.

可以从bug描述中了解到这是一个优化器相关的bug,一般与优化器相关的bug都可以通过关闭优化器的部分特性来workaround。针对ORA-600[15160]内部错误,我们可以尝试设置以下2个隐藏参数以绕过问题:

1) SQl>
Alter session set "_optimizer_null_aware_antijoin"=false;

/* to disable null aware anti join */

2) SQl>
Alter session set "_OPTIMIZER_PUSH_PRED_COST_BASED" =false ;

/*WORKAROUND: disable cost based push predicate */

以上多个bug据Oracle官方说法均已在11.2.0.1版本中修复了。

ORA-00600:[15570]内部错误一例

一套Linux上的10.2.0.1系统出现ORA-00600:[15570]内部错误,日志如下:

Sat Jun 5 11:33:17 2010
Memory Notification: Library Cache Object loaded into SGA
Heap size 2190K exceeds notification threshold (2048K)
KGL object name :XDB.XDbD/PLZ01TcHgNAgAIIegtw==
Sat Jun 5 14:57:25 2010
Thread 1 advanced to log sequence 16540
Current log# 3 seq# 16540 mem# 0: /ora_data/mantas/redo03.log
Sat Jun 5 14:58:37 2010
Errors in file /opt/oracle/admin/mantas/udump/mantas_ora_10803.trc:
ORA-00600: internal error code, arguments: [15570], [], [], [], [], [], [], []
Sat Jun 5 14:58:37 2010
Errors in file /opt/oracle/admin/mantas/udump/mantas_ora_10903.trc:
ORA-00600: internal error code, arguments: [15570], [], [], [], [], [], [], []
Sat Jun 5 14:58:39 2010
Errors in file /opt/oracle/admin/mantas/udump/mantas_ora_10801.trc:
ORA-00600: internal error code, arguments: [15570], [], [], [], [], [], [], []

[Read more…]

ORA-07445 [SIGBUS] [Object specific hardware error]错误一例

一套Solaris上的9.2.0.7系统,实例意外终止,告警日志中出现以下记录:

Thu Sep 2 02:15:41 2010
Errors in file /u01/app/oracle/admin/preg063/bdump/preg063_smon_11391.trc:
ORA-07445: exception encountered: core dump [0000000101E05500] [SIGBUS] [Object specific hardware error] [0xFFFFFFFF7CB3BF90] [] []
Thu Sep 2 02:15:48 2010
Errors in file /u01/app/oracle/admin/preg063/bdump/preg063_pmon_11379.trc:
ORA-00474: SMON process terminated with error
Thu Sep 2 02:15:48 2010
PMON: terminating instance due to error 474
Wed Sep 1 15:04:20 2010
Errors in file /u01/app/oracle/admin/preg063/udump/preg063_ora_1772.trc:
ORA-07445: exception encountered: core dump [FFFFFFFF7F400980] [SIGBUS] [Object specific hardware error] [0xFFFFFFFF7CA78000] [] []
Wed Sep 1 15:06:24 2010
Errors in file /u01/app/oracle/admin/preg063/udump/preg063_ora_5316.trc:
ORA-07445: exception encountered: core dump [0000000101E10F94] [SIGBUS] [Object specific hardware error] [0xFFFFFFFF7C843DC8] [] []
Wed Sep 1 15:06:24 2010
Errors in file /u01/app/oracle/admin/preg063/udump/preg063_ora_5207.trc:
ORA-07445: exception encountered: core dump [0000000101E10F94] [SIGBUS] [Object specific hardware error] [0xFFFFFFFF7C887A70] [] []
Wed Sep 1 15:06:24 2010
Errors in file /u01/app/oracle/admin/preg063/udump/preg063_ora_28532.trc:
ORA-07445: exception encountered: core dump [0000000101E05500] [SIGBUS] [Object specific hardware error] [0xFFFFFFFF7C9B7FE8] [] []
Wed Sep 1 15:06:31 2010
Errors in file /u01/app/oracle/admin/preg063/udump/preg063_ora_4392.trc:
ORA-07445: exception encountered: core dump [0000000100624600] [SIGBUS] [Object specific hardware error] [0xFFFFFFFF7CA80000] [] []
Wed Sep 1 15:06:33 2010
Errors in file /u01/app/oracle/admin/preg063/udump/preg063_ora_3748.trc:
ORA-07445: exception encountered: core dump [FFFFFFFF7F400980] [SIGBUS] [Object specific hardware error]

[Read more…]

ORA-03137: TTC protocol internal error : [12333]错误一例

Oracle Solaris上的一套11.2.0.1.0最近出现以下告警记录:

Dump file /cnbbs01/app/oracle/diag/rdbms/nbbsprd1/nbbsprd1/incident/incdir_373041/nbbsprd1_ora_24754_i373041.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
ORACLE_HOME = /cnbbs01/app/oracle/product/11.2.0/db_1
System name: SunOS
Node name: ut06db03
Release: 5.10
Version: Generic_142901-12
Machine: i86pc
Instance name: nbbsprd1
Redo thread mounted by this instance: 1
Oracle process number: 130
Unix process pid: 24754, image: oracle@ut06db03
*** 2010-08-25 02:01:19.169
*** SESSION ID:(1203.16390) 2010-08-25 02:01:19.169
*** CLIENT ID:() 2010-08-25 02:01:19.169
*** SERVICE NAME:(SYS$USERS) 2010-08-25 02:01:19.169
*** MODULE NAME:() 2010-08-25 02:01:19.169
*** ACTION NAME:() 2010-08-25 02:01:19.169

Dump continued from file: /cnbbs01/app/oracle/diag/rdbms/nbbsprd1/nbbsprd1/trace/nbbsprd1_ora_24754.trc
ORA-03137: TTC protocol internal error : [12333] [11] [120] [110] [] [] [] []

[Read more…]

ORA-00600:[32695], [hash aggregation can't be done]错误一例

还是那个hash group by算法的问题,日志文件中出现以下记录:

*** ACTION NAME:(SQL 窗口 - 新建) 2010-09-03 14:27:54.594
*** MODULE NAME:(PL/SQL Developer) 2010-09-03 14:27:54.594
*** SERVICE NAME:(HQYDB1) 2010-09-03 14:27:54.594
*** SESSION ID:(3205.17923) 2010-09-03 14:27:54.594
*** 2010-09-03 14:27:54.594
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [32695], [hash aggregation can't be done], [], [], [], [], [], []
Current SQL statement for this session:
create table zou_201008_cell_id as
select /* g_all_cdr02,60 */
calling_num mobile_number,
lac,
lpad(cell_id,5,'0') cell_id,
count(*) c,
sum(call_duration) call_duration,
sum(decode(record_type,'00',1,0)*call_duration) moc_call_duration,
sum(decode(record_type,'01',1,0)*call_duration) mtc_call_duarion
from  g_all_cdr02
where substr(calling_num,1,7) in (select mobile_prefix from zou_mobile_prefix)
group by
calling_num ,
lac,
lpad(cell_id,5,'0')
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst+001c          bl       ksedst1              100000000 ? 11055A9A0 ?
ksedmp+0290          bl       ksedst               104A54870 ?
ksfdmp+0018          bl       03F30204
kgeriv+0108          bl       _ptrgl
kgesiv+0080          bl       kgeriv               1050BE654 ? 1050BE604 ?
0000027E5 ? 080000000 ?
07FFFFFFF ?
ksesic1+0060         bl       kgesiv               43300000FFFF5310 ?
4530000000000000 ?
000000071 ? 000000001 ?
000000000 ?
qeshPartitionBuildH  bl       01F9CA24
D+04bc
qeshGBYOpenScan2+02  bl       qeshPartitionBuildH  0000027E5 ? 1105C06C0 ?
34                            D
qeshGBYOpenScan+001  bl       qeshGBYOpenScan2     FFFFFFFFFFF5740 ? 11055A938 ?
8                                                  000000000 ? 000000010 ?
qerghFetch+05e8      bl       qeshGBYOpenScan      000001000 ?
rwsfcd+0054          bl       _ptrgl
qerltFetch+036c      bl       03F2EB1C
ctcdrv+4160          bl       01F9C898
opiexe+2884          bl       ctcdrv               100000001 ? 100000001 ?
110467F30 ?
opiosq0+19f0         bl       opiexe               FFFFFFFFFFF8B50 ?
2824422142420820 ?
FFFFFFFFFFF8C10 ?
kpooprx+0168         bl       opiosq0              300000000 ? 000000000 ?
000000000 ? A4000000000000 ?
kpoal8+0400          bl       kpooprx              FFFFFFFFFFFB464 ?
FFFFFFFFFFFB068 ?
1BF000001BF ? 100000001 ?
000000000 ? A40000000000A4 ?
000000000 ? 1103878F8 ?
opiodr+0ae0          bl       _ptrgl
ttcpip+1020          bl       _ptrgl
opitsk+1124          bl       01F9F2A0
opiino+0990          bl       opitsk               000000000 ? 000000000 ?
opiodr+0ae0          bl       _ptrgl
opidrv+0484          bl       01F9E0E8
sou2o+0090           bl       opidrv               3C02DC1BBC ? 44065F000 ?
FFFFFFFFFFFF3A0 ?
opimai_real+01bc     bl       01F9B9F4
main+0098            bl       opimai_real          000000000 ? 000000000 ?
__start+0098         bl       main                 000000000 ? 000000000 ?
--------------------- Binary Stack Dump ---------------------

这次是因为应用人员不了解alter session的作用域,在PL/SQL Developer工具中的不同窗口(也就是不在同一会话中)中执行了”alter session set “_gby_hash_aggregation_enabled” = false;”和涉及group by操作的SQL,并导致了unpublished bug:6471770被触发。
我们比较容易地workaround绕过这个Bug:

/* 在会话级别设置优化参数_gby_hash_aggregation_enabled */
alter session set "_gby_hash_aggregation_enabled" = false;
/* 或者在语句中加入NO_USE_HASH_AGGREGATION的 hint */
select  /*+ NO_USE_HASH_AGGREGATION */ ....

以上提及的unpublished bug:6471770据称在10.2.0.5,11.1.0.7,11.2.0.1版本中被修正了。

Database Force open example

帮网友强制打开了一个没有备份的测试库,这个库没有备份也没有打开归档,因为之前也出现过active日志文件损毁,一直使用隐式参数才能正常打开:

                 _allow_resetlogs_corruption= TRUE

 

如果自己搞不定可以找诗檀软件专业ORACLE数据库修复团队成员帮您恢复!

诗檀软件专业数据库修复团队

服务热线 : 13764045638   QQ号:47079569    邮箱:service@parnassusdata.com

 

 

这次一开始这个库报ORA-600[2662]错误:

Mon Aug 23 09:37:00 2010
Errors in file /oracle/QAS/saptrace/usertrace/qas_ora_852096.trc:
ORA-00600: internal error code, arguments: [2662], [0], [130131504], [0], [130254136], [4264285], [], []
Mon Aug 23 09:37:02 2010
Errors in file /oracle/QAS/saptrace/usertrace/qas_ora_852096.trc:
ORA-00600: internal error code, arguments: [2662], [0], [130131506], [0], [130254136], [4264285], [], []

ORA-600 [2662] “Block SCN is ahead of Current SCN”错误是当数据块中的SCN领先于current SCN,由于后台进程或服务进程都会比对UGA中的dependent SCN和数据库当前的SCN,如果数据库当前SCN小于dependent SCN,那么该进程就会报ORA-600 [2662]错误,如果遭遇该错误的是服务进程,那么服务进程一般会异常终止;如果遭遇该错误的是后台进程譬如SMON,则会导致实例CRASH。
ORA-600 [2662]错误可以能由以下几种情况引起:
1.启用隐含参数_ALLOW_RESETLOGS_CORRUPTION后,以resetlogs形式打开数据库;这种情况下发生2662错误,根本原因是没有完全前滚导致控制文件中的SCN滞后于数据块中的SCN。
2.硬件故障导致数据库没法写控制文件和联机日志文件
3.错误的部分恢复数据库
4.恢复了控制文件,但是没有使用recover database using backup controlfile进行恢复
5.数据库crash后设置了_DISABLE_LOGGING隐含参数
6.在并行服务器环境中DLM存在问题

该错误的5个参数的具体含义如下:
ARGUMENTS:
Arg [a] Current SCN WRAP
Arg [b] Current SCN BASE
Arg [c] dependent SCN WRAP
Arg [d] dependent SCN BASE
Arg [e] Where present this is the DBA where the dependent SCN came from.

我们的case当中dependent SCN为130254136,而当前SCN为130131506,其差值为122630;从以上告警日志中可以看到数据库的当前SCN是在不断缓慢增长的,当我们遭遇到2662错误时,很滑稽的一点是只要不断重启数据库保持current SCN的增长,一段时间后2662错误会不药而愈。当然我们也可以不用这种笨办法,10015事件可以帮助我们调整数据库当前SCN:

/* 当数据库处于mount状态,可以使用10015事件来调整scn */
alter session  set events '10015 trace name adjust_scn level 1';
/* 这里可以设置level 2..10等 (level 1是在每次打开数据库时scn增加1000k)*/
/* 需要注意的是10g某些版本不同于9i,需要设置隐式参数_allow_error_simulation,才能真正增进scn */
SQL> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for Linux: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production
SQL> col current_scn format 999,999,999,999
SQL> select current_scn from v$database;
CURRENT_SCN
-----------
1141408
SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup mount;
ORACLE instance started.
Total System Global Area 1653518336 bytes
Fixed Size                  2213896 bytes
Variable Size             989857784 bytes
Database Buffers          654311424 bytes
Redo Buffers                7135232 bytes
Database mounted.
SQL> alter session set events '10015 trace name adjust_scn level 1';
Session altered.
SQL> alter database open;
Database altered.
SQL>  select current_scn from v$database;
CURRENT_SCN
-----------
1142031
/* 可以看到current_scn并未大量增加,10.2.0.4上默认10015 adjust_scn不被触发 */
SQL>  alter system set "_allow_error_simulation"=true scope=spfile;
System altered.
SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup mount;
ORACLE instance started.
Total System Global Area 1653518336 bytes
Fixed Size                  2213896 bytes
Variable Size             989857784 bytes
Database Buffers          654311424 bytes
Redo Buffers                7135232 bytes
Database mounted.
SQL> alter session set events '10015 trace name adjust_scn level 1';
Session altered.
SQL> alter database open;
Database altered.
SQL>select current_scn from v$database;
CURRENT_SCN
----------------
1,073,741,980

在接手之前该网友已经通过反复重启数据库将数据库的当前SCN提高到dependent SCN的127037138;原以为这样就可以打开数据库了,谁知道又出现了一下错误:

Wed Aug 25 07:43:53 2010
Errors in file /oracle/QAS/saptrace/usertrace/qas_ora_929958.trc:
ORA-00600: internal error code, arguments: [4000], [8], [], [], [], [], [], []
Wed Aug 25 07:43:53 2010
Errors in file /oracle/QAS/saptrace/usertrace/qas_ora_929958.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [4000], [8], [], [], [], [], [], []
Wed Aug 25 07:43:53 2010
Error 704 happened during db open, shutting down database

bootstrap自举过程中遭遇了ORA-600 [4000]错误,该错误一般当Oracle尝试读取数据字典(主要是undo$基表)中记录的USN对应的回滚段失败引起.,通过设置隐式参数_corrupted_rollback_segments可以一定程度上规避该错误,强制打开数据库,其Argument[a]代表造成读取失败的USN(undo segment number),但实际上有问题的回滚段可能不止这一个:

/* 通过strings工具从system表空间上找回各回滚段的名字  */
$strings system.dbf |grep _SYSSMU|less
_SYSSMU1$
_SYSSMU2$
_SYSSMU3$
_SYSSMU4$
_SYSSMU5$
_SYSSMU6$
_SYSSMU7$
_SYSSMU8$
_SYSSMU9$
.........
alter system set "_corrupted_rollback_segments"='(_SYSSMU1$, _SYSSMU2$, _SYSSMU3$, _SYSSMU4$, _SYSSMU5$, _SYSSMU6$, _SYSSMU7$, _SYSSMU8$, _SYSSMU9$, _SYSSMU10$, _SYSSMU11$, _SYSSMU12$)' scope=spfile;
System altered.
/* 即便设置了_corrupted_rollback_segments隐式参数,也还有一定概率遭遇4000错误,尝试加上10513事件,并多次重启数据库 */
SQL> alter system set event='10513 trace name context forever,level 2' scope=spfile;
System altered.
/* 再次出现4000 错误 */
Errors in file /oracle/QAS/saptrace/usertrace/qas_ora_1016014.trc:
ORA-00600: internal error code, arguments: [4000], [8], [], [], [], [], [], []
Thu Aug 26 09:43:39 2010
Errors in file /oracle/QAS/saptrace/usertrace/qas_ora_1016014.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [4000], [8], [], [], [], [], [], []
Thu Aug 26 09:43:39 2010
Error 704 happened during db open, shutting down database
/* 再次重启后发现4000错误不再出现 * /

再次重启发现不再出现ORA-600[4000]错误,但在字典检查阶段Oracle认为数据文件227不匹配于当前的incarnation:

Thu Aug 26 11:13:22 2010
Dictionary check beginning
Thu Aug 26 09:46:00 2010
Errors in file /oracle/QAS/saptrace/usertrace/qas_ora_897162.trc:
ORA-01177: data file does not match dictionary - probably old incarnation
ORA-01110: data file 227: '/oracle/QAS/sapdata2/qas_192/qas.data196'
Error 1177 happened during db open, shutting down database
USER: terminating instance due to error 1177
Instance terminated by USER, pid = 897162

初步判断出现ORA-01177可能为2种可能性:
1.数据字典出现讹误,227号文件对应的incarnation信息不正确
2.在之前的某次resetlogs open过程中,227号文件头由于某些原因没有正确更新incarnation信息

针对这样的情况如果一定要找回该数据文件上的数据的话只能通过手动修改数据字典或文件头,当然也可以尝试使用一些直接从数据文件上抽取数据的工具。
因为这是一次友情协助,就没有继续深入下去,通过重建控制文件并跳过该数据文件解决了:

CREATE CONTROLFILE REUSE DATABASE "QAS" RESETLOGS  NOARCHIVELOG
--  SET STANDBY TO MAXIMIZE PERFORMANCE
MAXLOGFILES 255
MAXLOGMEMBERS 3
MAXDATAFILES 254
MAXINSTANCES 50
MAXLOGHISTORY 36302
LOGFILE
GROUP 1 (
'/oracle/QAS/redolog/redolog11A.dbf',
'/oracle/QAS/redolog/redolog11B.dbf'
) SIZE 500M,
GROUP 2 (
'/oracle/QAS/redolog/redolog12A.dbf',
'/oracle/QAS/redolog/redolog12B.dbf'
) SIZE 500M
-- STANDBY LOGFILE
DATAFILE
'/oracle/QAS/sapdata1/system_1/system.data1',
........
'/oracle/QAS/sapdata2/qas_192/qas.data195'
CHARACTER SET WE8DEC
Thu Aug 26 14:04:50 2010
Successful mount of redo thread 1, with mount id 2117500093
Thu Aug 26 14:04:50 2010
Completed: CREATE CONTROLFILE REUSE DATABASE "QAS" RESETLOGS
Thu Aug 26 14:05:05 2010
alter database mount
Thu Aug 26 14:05:05 2010
ORA-1100 signalled during: alter database mount...
Thu Aug 26 14:05:15 2010
alter database open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 1125281471596
Resetting resetlogs activation ID 0 (0x0)
Online log 1 of thread 1 was previously cleared
Thu Aug 26 14:05:36 2010
Assigning activation ID 2117500093 (0x7e367cbd)
Thread 1 opened at log sequence 1
Current log# 2 seq# 1 mem# 0: /oracle/QAS/redolog/redolog12A.dbf
Current log# 2 seq# 1 mem# 1: /oracle/QAS/redolog/redolog12B.dbf
Successful open of redo thread 1
Thu Aug 26 14:05:36 2010
SMON: enabling cache recovery
Thu Aug 26 14:05:36 2010
Dictionary check beginning
Tablespace 'PSAPTEMP' #2 found in data dictionary,
but not in the controlfile. Adding to controlfile.
File #227 found in data dictionary but not in controlfile.
Creating OFFLINE file 'MISSING00227' in the controlfile.
This file can no longer be recovered so it must be dropped.
File #228 found in data dictionary but not in controlfile.
Creating OFFLINE file 'MISSING00228' in the controlfile.
This file can no longer be recovered so it must be dropped.
File #229 found in data dictionary but not in controlfile.
Creating OFFLINE file 'MISSING00229' in the controlfile.
This file can no longer be recovered so it must be dropped.
Dictionary check complete
Thu Aug 26 14:05:38 2010
SMON: enabling tx recovery
Thu Aug 26 14:05:38 2010
Database Characterset is WE8DEC
replication_dependency_tracking turned off (no async multimaster replication found)
Completed: alter database open resetlogs

这个case告诉我们,测试库并不一定就不重要,测试库也是需要备份的。

【Oracle数据恢复】ORA-600[4194]错误一例

ORA-600[4194]内部错误一般由重做记录与回滚记录不匹配引发。Oracle在验证Undo record number时,会对比redo change 和回滚段中的undo record number,若发现2者存在差异则报该4194错误。其错误argument[a][b],a代表回滚块中的最大undo record number,b代表重做日志中记录的undo record number。这个错误可能由回滚段或者redo log日志文件讹误引起。

ORA-00600[4194]错误的根本原因是 redo记录与回滚段(rollback/undo)记录之间的不一致。当ORACLE在验证undo记录时相对应的变化需要应用到undo数据块的最大undo记录上,此时若检验出错则会报ORA-00600[4194]

 

 

 

此错误不像ORA-600[2662]或ORA-600[4000]错误那样必然导致数据库无法打开,因为它很少出现在前滚阶段;当数据库被打开,smon开始执行事务恢复或一些回滚段的管理工作时则很有可能触发该错误。

 

ORA-600[4194]的2个的含义:

Arg [a] Maximum Undo record number in Undo block
Arg [b] Undo record number from Redo block

 

这个ORA-600[4194] 报错属于ORACLE内核从cache层到事务undo处理,可能的影响是进程失败或者可能的回滚段坏块。

 

可能的bug 包括:

 

8240762  10.2.0.5,
11.1.0.7.10,
11.2.0.1
Undo corruptions with ORA-600 [4193]/ORA-600 [4194] or ORA-600 [4137] /
SMON may spin to recover transaction

 

3210520 9.2.0.5, 10.1.0.2 OERI[kjccqmg:esm] / OERI[4194] / corruption possible in RAC

792610 8.0.6.0, 8.1.6.0 Rollback segment corruption

 

对于非自举对象non-bootstrap对象对应的undo记录可以通过如下方法搞定,如果涉及到的对象是bootstrap系统对象则可能需要手动 bbed来修复, 如果自己搞不定可以找ASKMACLEAN专业数据库修复团队成员帮您恢复

 

 

 

来具体看一下错误记录:

 

 

 

Thu Aug 26 18:58:50 2010
Errors in file /s01/10gdb/admin/YOUYUS/bdump/youyus_smon_6587.trc:
ORA-01595: error freeing extent (3) of rollback segment (4))
ORA-00600: internal error code, arguments: [4194], [53], [41], [], [], [], [], []
Thu Aug 26 18:58:50 2010
..............
Errors in file /s01/10gdb/admin/YOUYUS/bdump/youyus_j000_6630.trc:
ORA-00354: corrupt redo log block header
ORA-00353: log corruption near block 2 change 1617922 time 08/26/2010 18:35:39
ORA-00334: archived log: '/s01/10gdb/flash_recovery_area/YOUYUS/onlinelog/o1_mf_3_65psr4on_.log'
Thu Aug 26 19:00:31 2010
Errors in file /s01/10gdb/admin/YOUYUS/bdump/youyus_j000_6630.trc:
ORA-00600: internal error code, arguments: [4194], [53], [41], [], [], [], [], []
Thu Aug 26 19:00:34 2010
Errors in file /s01/10gdb/admin/YOUYUS/bdump/youyus_j000_6630.trc:
ORA-00354: corrupt redo log block header
ORA-00353: log corruption near block 2 change 1617922 time 08/26/2010 18:35:39
ORA-00334: archived log: '/s01/10gdb/flash_recovery_area/YOUYUS/onlinelog/o1_mf_3_65psr4on_.log'
ORA-00600: internal error code, arguments: [4194], [53], [41], [], [], [], [], []
Thu Aug 26 19:00:35 2010
Errors in file /s01/10gdb/admin/YOUYUS/bdump/youyus_j000_6630.trc:
ORA-00354: corrupt redo log block header
ORA-00353: log corruption near block 2 change 1617922 time 08/26/2010 18:35:39
ORA-00334: archived log: '/s01/10gdb/flash_recovery_area/YOUYUS/onlinelog/o1_mf_3_65psr4on_.log'
ORA-00600: internal error code, arguments: [4194], [53], [41], [], [], [], [], []

 

 

 

如果你因为ORA-600[4194]错误导致数据库无法打开,那么可以尝试设置以下事件:

 

 

SQL> alter system set event='10513 trace name context forever,level 2 : 10512 trace name context forever,level 1: 10511 trace name context forever,level 2: 10510 trace name context forever,level 1' scope=spfile;
System altered.
/* 10513事件用以阻止SMON在启动数据库后执行事务恢复(transaction recovery) */
/* 10512事件用以阻止SMON shrink rollback segment */
/* 10511事件用以阻止SMON check to cleanup undo dictionary */
/* 10500事件用以阻止SMON check to offline pending offline rollback segment */
SQL> alter system set undo_management=MANUAL scope=spfile;
System altered.
SQL> shutdown immediate;
ORA-03113: end-of-file on communication channel
SQL> startup mount;
ORACLE instance started.
Total System Global Area 2634022912 bytes
Fixed Size                  2086288 bytes
Variable Size            2382367344 bytes
Database Buffers          234881024 bytes
Redo Buffers               14688256 bytes
Database mounted.
SQL> alter database open;
Database altered.
SQL>  create undo tablespace undoc datafile size 300M;
SQL> alter system set undo_management=AUTO scope=spfile;
System altered.
SQL>  alter system set undo_tablespace=undoc scope=spfile;
System altered.
SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup mount;
ORACLE instance started.
Total System Global Area 2634022912 bytes
Fixed Size                  2086288 bytes
Variable Size            2382367344 bytes
Database Buffers          234881024 bytes
Redo Buffers               14688256 bytes
Database mounted.
SQL> alter database open;
Database altered.
/* 通过重建undo表空间可以避免一些4194错误,但不是全部 */
/* 这个库目前处于随时会crash的不可控状态,我们必须要导出数据并导入到新库中 * /
/* 这种情况下direct方式 可能可以规避一些意外错误 */
[maclean@rh2 dump]$ exp maclean/maclean file=full_maclean.dmp owner=maclean  direct=y statistics=none
Export: Release 10.2.0.4.0 - Production on Thu Aug 26 21:18:40 2010
Copyright (c) 1982, 2007, Oracle.  All rights reserved.
Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Export done in ZHS16GBK character set and UTF8 NCHAR character set
About to export specified users ...
. exporting pre-schema procedural objects and actions
. exporting foreign function library names for user MACLEAN
. exporting PUBLIC type synonyms
. exporting private type synonyms
. exporting object type definitions for user MACLEAN
About to export MACLEAN's objects ...
. exporting database links
. exporting sequence numbers
. exporting cluster definitions
. about to export MACLEAN's tables via Direct Path ...
Table SYS_EXPORT_TABLE_01 will be exported in conventional path.
. . exporting table            SYS_EXPORT_TABLE_01        256 rows exported
Table SYS_EXPORT_TABLE_02 will be exported in conventional path.
. . exporting table            SYS_EXPORT_TABLE_02        257 rows exported
Table SYS_EXPORT_TABLE_03 will be exported in conventional path.
..............
exporting refresh groups and children
. exporting dimensions
. exporting post-schema procedural objects and actions
. exporting statistics
Export terminated successfully with warnings.
/* we are lucky! */

ORA-00600:[qctcte1]内部错误一例

一套AIX上的4节点10.2.0.4 RAC系统在1月份出现实例hang住的现象,并伴随有ORA-00600:[qctcte1]内部错误,trace文件内容如下:

siposrc1_ora_102944.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
ORACLE_HOME = /oracle/product/10.2.0/db_1
System name: AIX
Node name: jszydb1
Release: 3
Version: 5
Machine: 00CE31834C00
Instance name: siposrc1
Redo thread mounted by this instance: 1
Oracle process number: 34
Unix process pid: 102944, image: oracle@jszydb1
*** ACTION NAME:() 2010-01-18 15:53:11.530
*** MODULE NAME:(JDBC Thin Client) 2010-01-18 15:53:11.530
*** SERVICE NAME:(siposrc) 2010-01-18 15:53:11.530
*** SESSION ID:(2175.6953) 2010-01-18 15:53:11.530
*** 2010-01-18 15:53:11.530
ksedmp: internal or fatal error
ORA-00600: 内部错误代码, 参数: [qctcte1], [0], [], [], [], [], [], []
Current SQL statement for this session:
SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE NO_PARALLEL(SAMPLESUB) opt_param('parallel_execution_enabled', 'false') NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */ NVL(SUM(C1),:"SYS_B_00000"), NVL(SUM(C2),:"SYS_B_00001") FROM (SELECT /*+ IGNORE_WHERE_CLAUSE NO_PARALLEL("SIPO_ECLA$_TEMP") FULL("SIPO_ECLA$_TEMP") NO_PARALLEL_INDEX("SIPO_ECLA$_TEMP") */ :"SYS_B_00002" AS C1, CASE WHEN "SIPO_ECLA$_TEMP"."SEQ_ID"=:"SYS_B_00003" OR "SIPO_ECLA$_TEMP"."SEQ_ID"=:"SYS_B_00004" OR "SIPO_ECLA$_TEMP"."SEQ_ID"=:"SYS_B_00005" OR "SIPO_ECLA$_TEMP"."SEQ_ID"=:"SYS_B_00006" OR "SIPO_ECLA$_TEMP"."SEQ_ID"=:"SYS_B_00007" OR "SIPO_ECLA$_TEMP"."SEQ_ID"=:"SYS_B_00008" OR "SIPO_ECLA$_TEMP"."SEQ_ID"=:"SYS_B_00009" OR "SIPO_ECLA$_TEMP"."SEQ_ID"=:"SYS_B_00010" OR "SIPO_ECLA$_TEMP"."SEQ_ID"=:"SYS_B_00011" OR "SIPO_ECLA$_TEMP"."SEQ_ID"=:"SYS_B_00012" OR "SIPO_ECLA$_TEMP"."SEQ_ID"=:"SYS_B_00013" OR "SIPO_ECLA$_TEMP"."SEQ_ID"=:"SYS_B_00014" OR "SIPO_ECLA$_TEMP"."SEQ_ID"=:"SYS_B_00015" OR "SIPO_ECLA$_TEMP"."SEQ_ID"=:"SYS_B_00016" OR "SIPO_ECLA$_TEMP"."SEQ_ID"=:"SYS_B_00017" OR "SIPO_ECLA$_TEMP"."SEQ_ID"=:"SYS_B_00018" OR "SIPO_ECLA
.
.
$_TEMP"."SEQ_ID"=:"SYS_B_40000" OR "SIPO_ECLA$_TEMP"."SEQ_ID"=:"SYS_B_40001" OR "SIPO_ECLA$_TEMP"."SEQ_ID"=:"SYS_B_40002" THEN :"SYS_B_40003" ELSE :"SYS_B_40004" END AS C2 FROM "SIPO_ECLA$_TEMP" SAMPLE BLOCK (:"SYS_B_40005" , :"SYS_B_40006") SEED (:"SYS_B_40007") "SIPO_ECLA$_TEM
STACK
qctcte qctocssm qctcopn qctcopn xtyxcssr xtyopncb qctcopn qctcpqb <- qctcpqbl <- xtydrv <- opitca <- kksFullTypeCheck <- rpiswu2 <- kksSetBindType <- kksfbc <- opiexe <- opiall0 <- opikpr <- opiodr <- rpidrus <- skgmstack <- rpidru <- rpiswu2 <- kprball <- IPRA <- IPRA <- kkedsSel <- kkecdn <- kkotap <- kkoiqb <- kkooqb <- kkoqbc <- apakkoqb <- apaqbdDescendents <- 3d4 <- apaqbdListReverse <- 06c <- apaqbd <- apadrv <- opitca <- kksLoadChild <- kxsGetRuntimeLock <- 810 <- kksfbc <- kkspsc0 <- kksParseCursor <- opiosq0 <- opiall0 <- opial7 <- opiodr <- ttcpip <- opitsk <- opiino <- opiodr <- opidrv

提交SR,MOS认为可能是Bug 6666870,给出以下方案:
There are a large number of possible bugs but Bug 6666870 is the most likely culprit. There is no one off patch available. 10205 is not yet available.

ACTION PLAN:
=============
1.Please disable Cost Based Transformation as a workaround, this can be done in the init.ora/spfile or at the session level, for example:
SQL> alter session set "_optimizer_cost_based_transformation"=off ;
2.Apply the 10205 patch set when it becomes available.
3.If the optimiser change fails to resolve your issue, please advise us of any recent changes to your DB or server?
4.In particular, did you recently apply the CPUJAN2008 Patch?
If so, please see Note.558901.1 Ext/Mod ORA-00600 internal error code, arguments [qctcte1] After Applying CPUJAN2008 Patch
5.If there is a function based index involved, please see;
Note.788124.1 Ext/Pub ORA-00600 [qctcte1] With Function Based Index Access
6.Changing your code to eliminate the parallel clauses may also act as a workaround.

沪ICP备14014813号

沪公网安备 31010802001379号