如何绕过ORA-00701错误和降低bootstrap对象的高水位

如何绕过ORA-701错误来实施对数据库自举对象bootstrap object的一些修改呢?

 

[oracle@mlab1 ~]$ oerr ora 701
00701, 00000, "object necessary for warmstarting database cannot be altered"
// *Cause:  Attempt to alter or drop a database object (table, cluster, or
//          index) which are needed for warmstarting the database.
// *Action: None.

 

 

首先需要说明的是,这纯粹为了技术教学,在实际的产品环境中不要使用如下手段!!!

 

SQL> alter index SYS.I_H_OBJ#_COL# rebuild;
alter index SYS.I_H_OBJ#_COL# rebuild
*
ERROR at line 1:
ORA-00701: object necessary for warmstarting database cannot be altered

SQL> ! oerr ora 38003
38003, 00000, "CBO Disable column stats for the dictionary objects in recursive SQL"
// *Cause:
// *Action:

 

 

1. 可以通过38003 event禁用递归SQL中队数据字典对象字段信息的核实,但是注意设置该38003 event需要重启实例:

 

 

SQL> alter system set event='38003 trace name context forever, level 10' scope=spfile;

System altered.

SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup ;
ORACLE instance started.

Total System Global Area 1653518336 bytes
Fixed Size                  2228904 bytes
Variable Size             536874328 bytes
Database Buffers         1107296256 bytes
Redo Buffers                7118848 bytes
Database mounted.
Database opened.

SQL>  alter index SYS.I_H_OBJ#_COL# rebuild;

Index altered.

 

以上38003 event对于alter index rebuild这种操作绝大多数场景都适用,但是如果你想MOVE/SHRINK OBJ$表则做不到:

 

2. 使用第二种方法必须将数据库置于MIGRATE/UPDATE模式,这要求停机时间:

 

 

create table tab$                                             /* table table */
( obj#          number not null,                            /* object number */
  /* DO NOT CREATE INDEX ON DATAOBJ#  AS IT WILL BE UPDATED IN A SPACE
   * TRANSACTION DURING TRUNCATE */

cluster c_obj#(obj#)
/

shutdown immediate;
startup upgrade;

SQL> alter table c_obj# enable row movement;

Table altered.

SQL> alter table c_obj# shrink space;

Table altered.

 

由于tab$其实是c_obj的cluster table所以,只需要SHRINK C_OBJ#即可回收空间。 但是请注意如果为了回收几百兆的空间去操作这些关键性的自举对象是没有意义的, 千万不要在产品环境中这样操作!!

Oracle数据恢复:解决ORA-00600:[4000] ORA-00704: bootstrap process failure错误一例

节前帮网友打开了一套断电后无法正常启动的10.2.0.1数据库,该库使用非归档模式且没有任何备份。

主机断电后,该网友尝试重启数据库实例,但是遇到了ORA-00600:[kccpb_sanity_check_2]内部错误:

 

如果自己搞不定可以找诗檀软件专业ORACLE数据库修复团队成员帮您恢复!

诗檀软件专业数据库修复团队

服务热线 : 13764045638   QQ号:47079569    邮箱:service@parnassusdata.com

 

SQL> select status from v$instance;                                             

STATUS
------------
STARTED                                                                         

SQL>
SQL> shutdown immediate;
ORA-01507: database not mounted                                                 

ORACLE instance shut down.
SQL> startup mount;
ORACLE instance started.                                                        

Total System Global Area 2147483648 bytes
Fixed Size                  1220432 bytes
Variable Size             486539440 bytes
Database Buffers         1644167168 bytes
Redo Buffers               15556608 bytes
ORA-00600: internal error code, arguments: [kccpb_sanity_check_2], [8198],
[8175], [0x0], [], [], [], []

 

该kccpb_sanity_check_2内部错误常由不正确的control file中seq#记录引起,MOS note <ORA-00600: [kccpb_sanity_check_2] During Instance Startup>简单描述了该问题:

 

ORA-00600: [kccpb_sanity_check_2] During Instance Startup

Applies to:
Oracle Server - Enterprise Edition - Version: 10.2.0.1 and later   [Release: 10.2 and later ]
Information in this document applies to any platform.
Symptoms
The database is getting the following errors on Startup:

ORA-00600: internal error code, arguments: [kccpb_sanity_check_2], [3621501], [3621462], [0x000000000]

Changes
In this case, the customer moved the box from one data center to another.
Cause
ORA-600 [kccpb_sanity_check_2] indicates that the seq# of the last read block is
higher than the seq# of the control file header block. This is indication of
the lost write of the header block during commit of the previous cf
transaction.

Solution

1) restore a backup of a controlfile and recover

OR

2) recreate the controlfile

OR

3) restore the database from last good backup and recover

NOTE:  If you do not have any special backup of control file to restore and you are
using Multiple Control File copies in your pfile/init.ora/spfile you can attempt to mount the
database using each control file one by one.
If you are able to mount the database with any of these control file copies
you can then issue 'alter database backup controlfile to trace' to recreate controlfile.

针对该由control file corruption引起的ORA-00600: [kccpb_sanity_check_2] ,可以尝试一下三种解决方法:

  1. restore controlfile from backup
  2. 若有多路复用controlfile,则尝试使用每个单独的control file来mount实例,若能成功则使用该可用controlfile复制到其余位置
  3. 使用controlfile backup trace来重建控制文件,若没有trace那么也可以手动写create controlfile语句来重建

 

接手这个问题后尝试第二种方法失败,只好到udump/bdump目录下搜索可用的controlfile backup trace,所幸找到了,重建控制文件并成功:

Create controlfile reuse set database "kmcdb"
MAXINSTANCES 8
MAXLOGHISTORY 1
MAXLOGFILES 16
MAXLOGMEMBERS 3
MAXDATAFILES 100
Datafile
'/u01/oracle/oradata/kmcdb/system01.dbf',
'/u01/oracle/oradata/kmcdb/undotbs01.dbf',
'/u01/oracle/oradata/kmcdb/sysaux01.dbf',
'/u01/oracle/oradata/kmcdb/users01.dbf',
'/u01/oracle/oradata/kmcdb/example01.dbf'
LOGFILE GROUP 1 ('/u01/oracle/oradata/kmcdb/redo01.log') SIZE 51200K,
GROUP 2 ('/u01/oracle/oradata/kmcdb/redo02.log') SIZE 51200K,
GROUP 3 ('/u01/oracle/oradata/kmcdb/redo03.log') SIZE 51200K RESETLOGS
Wed Sep 28 16:57:47 2011
WARNING: Default Temporary Tablespace not specified in CREATE DATABASE command
Default Temporary Tablespace will be necessary for a locally managed database in future release
Setting recovery target incarnation to 1
Wed Sep 28 16:57:48 2011
Successful mount of redo thread 1, with mount id 523482251
Wed Sep 28 16:57:48 2011
Completed: Create controlfile reuse set database "kmcdb"
MAXINSTANCES 8
MAXLOGHISTORY 1
MAXLOGFILES 16
MAXLOGMEMBERS 3
MAXDATAFILES 100
Datafile
'/u01/oracle/oradata/kmcdb/system01.dbf',
'/u01/oracle/oradata/kmcdb/undotbs01.dbf',
'/u01/oracle/oradata/kmcdb/sysaux01.dbf',
'/u01/oracle/oradata/kmcdb/users01.dbf',
'/u01/oracle/oradata/kmcdb/example01.dbf'
LOGFILE GROUP 1 ('/u01/oracle/oradata/kmcdb/redo01.log') SIZE 51200K,
GROUP 2 ('/u01/oracle/oradata/kmcdb/redo02.log') SIZE 51200K,
GROUP 3 ('/u01/oracle/oradata/kmcdb/redo03.log') SIZE 51200K RESETLOGS
Wed Sep 28 16:57:57 2011
alter database mount

 

成功mount实例后,尝试recover database 并 open 数据库 :

 

 recover database using backup controlfile until cancel;
因为controlfile是重建出来的,所以当前的controlfile并不知道那个在线日志是current的,需要手动指定
ALTER DATABASE RECOVER  database using backup controlfile until cancel
Wed Sep 28 17:27:09 2011
Media Recovery Start
WARNING! Recovering data file 1 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 2 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 3 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 4 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 5 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
parallel recovery started with 7 processes
ORA-279 signalled during: ALTER DATABASE RECOVER  database using backup controlfile until cancel  ...
Wed Sep 28 17:27:20 2011
ALTER DATABASE RECOVER    LOGFILE '/u01/oracle/oradata/kmcdb/redo02.log'
Wed Sep 28 17:27:20 2011
Media Recovery Log /u01/oracle/oradata/kmcdb/redo02.log
Errors with log /u01/oracle/oradata/kmcdb/redo02.log
ORA-339 signalled during: ALTER DATABASE RECOVER    LOGFILE '/u01/oracle/oradata/kmcdb/redo02.log'  ...
Wed Sep 28 17:27:20 2011
ALTER DATABASE RECOVER CANCEL
ORA-1547 signalled during: ALTER DATABASE RECOVER CANCEL ...
Wed Sep 28 17:27:37 2011
ALTER DATABASE RECOVER  database using backup controlfile until cancel
Wed Sep 28 17:27:37 2011
Media Recovery Start
WARNING! Recovering data file 1 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 2 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 3 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 4 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 5 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
parallel recovery started with 7 processes
ORA-279 signalled during: ALTER DATABASE RECOVER  database using backup controlfile until cancel  ...
Wed Sep 28 17:27:52 2011
ALTER DATABASE RECOVER    LOGFILE '/u01/oracle/oradata/kmcdb/redo03.log'
Wed Sep 28 17:27:52 2011
Media Recovery Log /u01/oracle/oradata/kmcdb/redo03.log
Wed Sep 28 17:27:52 2011
Incomplete recovery applied all redo ever generated.
Recovery completed through change 16228646
Wed Sep 28 17:27:52 2011
Media Recovery Complete (kmcdb)
Completed: ALTER DATABASE RECOVER    LOGFILE '/u01/oracle/oradata/kmcdb/redo03.log'
Wed Sep 28 17:28:12 2011

 

多次尝试后发现redo03.log是所需要的current group member,尝试alter database open resetlogs,但是出现ORA-00600: internal error code, arguments: [4000], [7], [], [], [], [], [], []错误:

 

alter database open resetlogs
Wed Sep 28 17:28:12 2011
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 16228646
Resetting resetlogs activation ID 523468297 (0x1f337e09)
Online log /u01/oracle/oradata/kmcdb/redo01.log: Thread 1 Group 1 was previously cleared
Online log /u01/oracle/oradata/kmcdb/redo02.log: Thread 1 Group 2 was previously cleared
Wed Sep 28 17:28:12 2011
Setting recovery target incarnation to 5
Wed Sep 28 17:28:12 2011
Assigning activation ID 523485789 (0x1f33c25d)
Thread 1 opened at log sequence 1
Current log# 3 seq# 1 mem# 0: /u01/oracle/oradata/kmcdb/redo03.log
Successful open of redo thread 1
Wed Sep 28 17:28:12 2011
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Wed Sep 28 17:28:12 2011
SMON: enabling cache recovery
Wed Sep 28 17:28:12 2011
Errors in file /u01/oracle/admin/kmcdb/udump/kmcdb_ora_31798.trc:
ORA-00600: internal error code, arguments: [4000], [7], [], [], [], [], [], []
Wed Sep 28 17:28:14 2011
Errors in file /u01/oracle/admin/kmcdb/udump/kmcdb_ora_31798.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [4000], [7], [], [], [], [], [], []

以上ORA-00600: [4000], [7],说明在使用usn=7的回滚段rollback数据块时发现rollback segment存在讹误(注意不要和ora600 [2662]错误混淆],且该ORA-00600: [4000]伴随有ORA-00704: bootstrap process failure错误,说明需要回滚的数据块是bootstrap重要的自举对象。

一般来说bootstrap object需要做rollback或cleanup而apply undo数据时,我们是无法使用_corrupted_rollback_segments,_offline_rollback_segments或10513事件来阻止ORA-00600: [4000]的发生。

 

但还是仍心存侥幸,修改_corrupted_rollback_segments,_offline_rollback_segments和event参数再次尝试:

alter system set event='10513 trace name context forever,level 2 :
10512 trace name context forever,level 1: 10511 trace name context forever,level 2:
10510 trace name context forever,level 1' scope=spfile;
ALTER SYSTEM SET _offline_rollback_segments='(_SYSSMU7$)' SCOPE=SPFILE;
ALTER SYSTEM SET _corrupted_rollback_segments='(_SYSSMU7$)' SCOPE=SPFILE;
再次open database
alter database open resetlogs
Wed Sep 28 17:47:05 2011
RESETLOGS after complete recovery through change 16228668
Resetting resetlogs activation ID 523471415 (0x1f338a37)
Online log /u01/oracle/oradata/kmcdb/redo01.log: Thread 1 Group 1 was previously cleared
Online log /u01/oracle/oradata/kmcdb/redo02.log: Thread 1 Group 2 was previously cleared
Wed Sep 28 17:47:05 2011
Setting recovery target incarnation to 11
Wed Sep 28 17:47:05 2011
Assigning activation ID 523466708 (0x1f3377d4)
Thread 1 opened at log sequence 1
Current log# 3 seq# 1 mem# 0: /u01/oracle/oradata/kmcdb/redo03.log
Successful open of redo thread 1
Wed Sep 28 17:47:05 2011
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Wed Sep 28 17:47:05 2011
SMON: enabling cache recovery
Wed Sep 28 17:47:05 2011
Errors in file /u01/oracle/admin/kmcdb/udump/kmcdb_ora_32261.trc:
ORA-00600: internal error code, arguments: [4000], [7], [], [], [], [], [], []
Wed Sep 28 17:47:06 2011
Errors in file /u01/oracle/admin/kmcdb/udump/kmcdb_ora_32261.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [4000], [7], [], [], [], [], [], []
Wed Sep 28 17:47:06 2011
Error 704 happened during db open, shutting down database
USER: terminating instance due to error 704
Instance terminated by USER, pid = 32261
ORA-1092 signalled during: alter database open resetlogs...

无法避免bootstrap对象发生ORA-00600:[4000],而这对打开数据库来说是致命的,要想恢复这个数据库就必须要解决该bootstrap对象。

怎么办呢?

我们先来读一读ORA-00600:[4000]内部错误的trace日志:

 

*** SERVICE NAME:() 2011-09-28 17:46:09.022
*** SESSION ID:(160.3) 2011-09-28 17:46:09.022
Recovery target incarnation = 10, activation ID = 0
Influx buffer limit = 98098 (50% x 196196)
Successfully allocated 7 recovery slaves
Using 156 overflow buffers per recovery slave
Start recovery at thread 1 ckpt scn 16228666 logseq 1 block 2
*** 2011-09-28 17:46:09.236
Media Recovery add redo thread 1
*** 2011-09-28 17:46:14.752
Media Recovery drop redo thread 1
*** 2011-09-28 17:46:41.788
Recovery target incarnation = 10, activation ID = 0
Influx buffer limit = 98098 (50% x 196196)
Successfully allocated 7 recovery slaves
Using 156 overflow buffers per recovery slave
Start recovery at thread 1 ckpt scn 16228666 logseq 1 block 2
*** 2011-09-28 17:46:41.842
Media Recovery add redo thread 1
*** 2011-09-28 17:46:52.799
Media Recovery Log /u01/oracle/oradata/kmcdb/redo03.log
----- Redo read statistics for thread 1 -----
Read rate (ASYNC): 0Kb in 10.99s => 0.00 Mb/sec
Total physical reads: 4096Kb
Longest record: 0Kb, moves: 0/1 (0%)
Change moves: 0/1 (0%), moved: 0Mb
Longest LWN: 0Kb, moves: 0/1 (0%), moved: 0Mb
Last redo scn: 0x0000.00f7a13b (16228667)
----------------------------------------------
*** 2011-09-28 17:46:52.835
Media Recovery drop redo thread 1
File 1 (stop scn 16228668) completed recovery at checkpoint scn 16228668
File 2 (stop scn 16228668) completed recovery at checkpoint scn 16228668
File 3 (stop scn 16228668) completed recovery at checkpoint scn 16228668
File 4 (stop scn 16228668) completed recovery at checkpoint scn 16228668
File 5 (stop scn 16228668) completed recovery at checkpoint scn 16228668
ARCH: Connecting to console port...
*** 2011-09-28 17:47:05.351
Prior to RESETLOGS processing...
ALTER SYSTEM ARCHIVE LOG ALL USING BACKUP CONTROLFILE start
Database is not in archivelog mode
ALTER SYSTEM ARCHIVE LOG ALL USING BACKUP CONTROLFILE complete
*** 2011-09-28 17:47:05.353
*** 2011-09-28 17:47:05.707
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [4000], [7], [], [], [], [], [], []
Current SQL statement for this session:
select ctime, mtime, stime from obj$ where obj# = :1
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst()+27          call     ksedst1()            0 ? 1 ?
ksedmp()+557         call     ksedst()             0 ? 9EF2E6D0 ? 0 ? 2A ?
9CCB06F4 ? 70000 ?
ksfdmp()+19          call     ksedmp()             3 ? BFAA18D0 ? AC152A0 ?
CBD2D40 ? 3 ? CB843B8 ?
kgeriv()+188         call     00000000             CBD2D40 ? 3 ?
kgeasi()+113         call     kgeriv()             CBD2D40 ? B7EB0020 ? FA0 ?
1 ? BFAA190C ?
ktudba()+264         call     kgeasi()             CBD2D40 ? B7EB0020 ? FA0 ?
2 ? 1 ? 0 ? 7 ? 0 ?
ktrgcm()+6207        call     ktudba()             7 ? BFAA1DEC ? 0 ? 0 ?
ktrgtc()+941         call     ktrgcm()             B7EFF4F4 ? 0 ? B7EFE054 ?
96F1A0B4 ? 96F10CE8 ? 198 ?
kdsgrp()+107         call     ktrgtc()             B7EFF4F4 ? B7EFF49C ?
9C22142 ? BFAA1F08 ? 240 ?
9C24DC4 ? 9C21D7C ?
kdsfbrcb()+513       call     kdsgrp()             B7EFF4F0 ? 0 ? B7EFF4F0 ?
qertbFetchByRowID()  call     kdsfbrcb()           B7EFF4F0 ? B7EFDFEC ? 0 ? 1 ?
+2052                                              0 ? 0 ?
opifch2()+5157       call     00000000             96F10A8C ? A11CDE4 ?
BFAA2534 ? 1 ?
引发ORA-00600:[4000]错误的数据块头信息:
Block header dump:  0x0040007a
Object id on Block? Y
seg/obj: 0x12  csc: 0x00.f84ef0  itc: 1  flg: -  typ: 1 - DATA
fsl: 0  fnx: 0x0 ver: 0x01
Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0007.01f.000011cf  0x00800057.0faf.07  --U-    1  fsc 0x0000.00f84ef1
data_block_dump,data header at 0x82bba044

以上的trace文件信息量极大,我们来抽丝剥茧的罗列这些信息量:

1.
File 1 (stop scn 16228668) completed recovery at checkpoint scn 16228668, 这条日志指出了1号数据文件最后恢复到的checkpoint scn,当尝试open database时数据库当前的scn将从16228668开始

2.
ORA-00600: internal error code, arguments: [4000], [7], [], [], [], [], [], []
Current SQL statement for this session:
select ctime, mtime, stime from obj$ where obj# = :1

引发ORA-00600:[4000], [7]错误的语句是”select ctime, mtime, stime from obj$ where obj# = :1″,这是一条常用的递归SQL语句,查询的对象是非常critical的bootstrap自举对象OBJ$基表,这说明需要cleanup的块时OBJ$表上的

3. 引发ORA-00600:[4000] raised是ktudba函数,由它触发了oracle 错误机制,其stack call为kdsgrp-> ktrgtc -> ktrgcm -> ktudba -> kgeasi(错误处理函数)

4. 引发ORA-00600:[4000], [7]错误的数据块是1号数据文件的122块,seg/obj为0x12,块类型为Data,且存在有一条ITL entry:

Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x0007.01f.000011cf 0x00800057.0faf.07 –U- 1 fsc 0x0000.00f84ef1

该ITL 的Scn为00f84ef1= 16273137 > current_scn=16228668 且lck=1

这里存在一个问题:
为什么这个”select ctime, mtime, stime from obj$ where obj# = :1″查询语句要针对该OBJ$的数据块访问该块上ITL相关的rollback segment呢?

这里要牵扯到Oracle中查询语句构建CR一致性读块和块延迟清除的一些概念:

1. 当现有的buffer block无法满足查询的SCN要求时,查询进程会通过现有buffer和undo信息构造出一个CR块来。譬如说cache中现有的一个block的SCN是10,而我的查询语句其实是从Scn=1时就开始的,那么这个查询语句的Snap_scn就是1,为了满足其SCN的需求,就需要访问相关的rollback segment 并对 buffer block做”rollback”。

2. 另一方面当查询语句访问块时,可能需要对块做cleanup,也就是我们说的延迟块清除(deferred block cleanout),当这种Block Cleanout发生时查询进程也需要去访问块上ITL相关的 rollback segment ,那么具体当访问到哪些块时需要做cleanout呢?这个将会在今后的文章中详述。

回归到上面的ORA-00600:[4000]中来,因为该致命内部错误涉及到的对象是重要的Bootstrap表OBJ$,所以无法使用传统的例如_corrupted_rollback_segments,_offline_rollback_segments或10513事件来阻止ORA-00600: [4000]的发生,必须使用块修改工具BBED来修改存在问题的数据块将ITL事务槽的FLAG从U修改为C(Commit)。

TRANSACTION_COMMITED = 0x08;
TRANSACTION_UPBOUND = 0x02;
TRANSACTION_ACTIVE = 0x01;

Flag= –U- 即TRANSACTION_UPBOUND时flag所占字节为0x02,需要将该字节修改为TRANSACTION_COMMITED = 0x08;

十分不巧的是,通过QQ远程协助帮助这位网友使用BBED打开了数据块,但因为这位网友的Terminal工具存在问题导致日志没有记全,特别是BBED这部分的:

[oracle@DBserver1 lib]$ cd /u01/oracle/oradata/kmcdb/
[oracle@DBserver1 kmcdb]$ ls
control01.ctl      control02.ctl.bak  ctl_backup     redo02.log    system01.dbf   users01.dbf
control01.ctl.bak  control03.ctl      example01.dbf  redo03.log    temp01.dbf
control02.ctl      control03.ctl.bak  redo01.log     sysaux01.dbf  undotbs01.dbf
[oracle@DBserver1 kmcdb]$ cp system01.dbf ctl_backup/
[oracle@DBserver1 kmcdb]$ cp system01.dbf system01.dbf.bak           
在正式BBED之前一定要备份所有的数据文件
oracle@DBserver1 kmcdb]$ bbed filename=system01.dbf password=blockedit mode=edit                                         
BBED: Release 2.0.0.0.0 - Limited Production on Wed Sep 28 21:22:41 2011                                        
Copyright (c) 1982, 2005, Oracle.  All rights reserved.                                                         
BBED> set blocksize 8192
BLOCKSIZE       8192                                                                                    
BBED> set block 123
BLOCK#          123                                                                                     
BBED> map /v
File: system01.dbf (0)
Block: 123                                   Dba:0x00000000
------------------------------------------------------------
KTB Data Block (Table/Cluster)                                                                                 
struct kcbh, 20 bytes                      @0
ub1 type_kcbh                           @0
ub1 frmt_kcbh                           @1
ub1 spare1_kcbh                         @2
ub1 spare2_kcbh                         @3
ub4 rdba_kcbh                           @4
ub4 bas_kcbh                            @8
ub2 wrp_kcbh                            @12
ub1 seq_kcbh                            @14
ub1 flg_kcbh                            @15
ub2 chkval_kcbh                         @16
ub2 spare3_kcbh                         @18                                                                 
struct ktbbh, 48 bytes                     @20
ub1 ktbbhtyp                            @20
union ktbbhsid, 4 bytes                 @24
struct ktbbhcsc, 8 bytes                @28
b2 ktbbhict                             @36
ub1 ktbbhflg                            @38
ub1 ktbbhfsl                            @39
ub4 ktbbhfnx                            @40
struct ktbbhitl[1], 24 bytes            @44                                                                 
struct kdbh, 14 bytes                      @68
ub1 kdbhflag                            @68
b1 kdbhntab                             @69
b2 kdbhnrow                             @70
sb2 kdbhfrre                            @72
sb2 kdbhfsbo                            @74
sb2 kdbhfseo                            @76
b2 kdbhavsp                             @78
b2 kdbhtosp                             @80                                                                 
struct kdbt[1], 4 bytes                    @82
b2 kdbtoffs                             @82
b2 kdbtnrow                             @84                                                                 
sb2 kdbr[108]                              @86                                                                 
ub1 freespace[873]                         @302                                                                
ub1 rowdata[7013]                          @1175                                                               
ub4 tailchk                                @8188      
ITL记录从    struct ktbbhitl[1], 24 bytes            @44  开始
BBED> set offset 61
OFFSET          61                                                                                      
BBED> d
File: system01.dbf (0)
Block: 123              Offsets:   61 to  572           Dba:0x00000000
------------------------------------------------------------------------
200000f1 4ef80000 016c00ff ffea0053 04690369 0300006c 007b1f3a 1ffe1ec1
1e801e40 1ef81db8 1d771d29 1dec1cb0 1c681c27 1ce71ba7 1b681b27 1beb1aaf
1a701a2f 1af319b4 19731937 19f118ae 18671827 18e717a7 1766172b 17e316a2
1661161f 16e315a3 15661529 15e914a8 146c142c 14ec13b0 13731332 13f112b0
126f122f 12ee11a0 11601121 11de109b 105f1023 10e60fa3 0f620f1d 0fd80e9c
0e5a0e14 0ed30d94 0d540d13 0dd30c93 0c570c17 0cd90b98 0b580b15 0bd00a8b
0a460a01 0ac10981 09410901 09c0087d 083e0801 08bc077e 074207fc 06bb0674
063206f0 05aa0564 051e05d8 04950453 04000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000      
BBED> modify /x 0x80
#modify /x 20 filename 'system01.dbf' block 123. offset 61.
sum apply
verify
因为该ITL事务槽的lck=1,因此还需要将lb: 0x1的row piece记录修改为lb: 0x0
在trace中,该记录为
tab 0, row 26, @0x18f1
tl: 70 fb: --H-FL-- lb: 0x1  cc: 17
col  0: [ 2]  c1 02
col  1: [ 4]  c3 08 61 2c
col  2: [ 1]  80
col  3: [12]  5f 4e 45 58 54 5f 4f 42 4a 45 43 54
col  4: [ 2]  c1 02
col  5: *NULL*
set offset 6454
modify /x 0x00
#modify /x 01 filename 'system01.dbf' block 123. offset 6454.
sum apply
verify
以上完成了对OBJ$上数据块的修改

 

完成以上BBED手动修改操作后,再次打开数据库成功,但是发现有很多坏块:

 

alter database open resetlogs
Wed Sep 28 21:42:53 2011
RESETLOGS after complete recovery through change 16228672
Resetting resetlogs activation ID 523466708 (0x1f3377d4)
Online log /u01/oracle/oradata/kmcdb/redo01.log: Thread 1 Group 1 was previously cleared
Online log /u01/oracle/oradata/kmcdb/redo02.log: Thread 1 Group 2 was previously cleared
Wed Sep 28 21:42:54 2011
Setting recovery target incarnation to 12
Wed Sep 28 21:42:54 2011
Assigning activation ID 523426694 (0x1f32db86)
Thread 1 opened at log sequence 1
Current log# 3 seq# 1 mem# 0: /u01/oracle/oradata/kmcdb/redo03.log
Successful open of redo thread 1
Wed Sep 28 21:42:54 2011
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Wed Sep 28 21:42:54 2011
SMON: enabling cache recovery
Wed Sep 28 21:42:54 2011
Dictionary check beginning
Tablespace 'TEMP' #3 found in data dictionary,
but not in the controlfile. Adding to controlfile.
Corrected file 5 plugged in read-only status in control file
Dictionary check complete
Wed Sep 28 21:42:54 2011
SMON: enabling tx recovery
Wed Sep 28 21:42:54 2011
*********************************************************************
WARNING: The following temporary tablespaces contain no files.
This condition can occur when a backup controlfile has
been restored.  It may be necessary to add files to these
tablespaces.  That can be done using the SQL statement:
ALTER TABLESPACE  ADD TEMPFILE
Alternatively, if these temporary tablespaces are no longer
needed, then they can be dropped.
Empty temporary tablespace: TEMP
*********************************************************************
Updating character set in controlfile to ZHS16GBK
Wed Sep 28 21:42:54 2011
Errors in file /u01/oracle/admin/kmcdb/bdump/kmcdb_smon_32531.trc:
ORA-00600: internal error code, arguments: [2662], [0], [16273152], [0], [16273158], [4220957], [], []
**********************************************************
WARNING: Files may exists in db_recovery_file_dest
that are not known to the database. Use the RMAN command
CATALOG RECOVERY AREA to re-catalog any such files.
One of the following events caused this:
1. A backup controlfile was restored.
2. A standby controlfile was restored.
3. The controlfile was re-created.
4. db_recovery_file_dest had previously been enabled and
then disabled.
**********************************************************
replication_dependency_tracking turned off (no async multimaster replication found)
WARNING: AQ_TM_PROCESSES is set to 0. System operation might be adversely affected.
Wed Sep 28 21:42:55 2011
LOGSTDBY: Validating controlfile with logical metadata
Wed Sep 28 21:42:55 2011
LOGSTDBY: Validation complete
ORA-01555 caused by SQL statement below (SQL ID: 96g93hntrzjtr, SCN: 0x0000.00f84f0a):
Wed Sep 28 21:42:55 2011
select /*+ rule */ bucket_cnt, row_cnt, cache_cnt, null_cnt, timestamp#, sample_size, minimum,
maximum, distcnt, lowval, hival, density, col#, spare1, spare2, avgcln from hist_head$ where obj#=:1 and intcol#=:2
Global Name changed to KMCDB
Wed Sep 28 21:42:55 2011
Non-fatal internal error happenned while SMON was doing non-existent object cleanup.
SMON encountered 1 out of maximum 100 non-fatal internal errors.
Wed Sep 28 21:42:56 2011
Completed: alter database open resetlogs
Wed Sep 28 21:42:57 2011
Hex dump of (file 1, block 61479) in trace file /u01/oracle/admin/kmcdb/bdump/kmcdb_smon_32531.trc
Corrupt block relative dba: 0x0040f027 (file 1, block 61479)
Fractured block found during buffer read
Data in bad block:
type: 6 format: 2 rdba: 0x0040f027
last change scn: 0x0000.00f80906 seq: 0x1 flg: 0x06
spare1: 0x0 spare2: 0x0 spare3: 0x0
consistency value in tail: 0x36a70601
check value in block header: 0xe5d2
computed block checksum: 0x3fa1
Reread of rdba: 0x0040f027 (file 1, block 61479) found same corrupted data
Wed Sep 28 21:42:57 2011
Errors in file /u01/oracle/admin/kmcdb/bdump/kmcdb_smon_32531.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01578: ORACLE data block corrupted (file # 1, block # 61479)
ORA-01110: data file 1: '/u01/oracle/oradata/kmcdb/system01.dbf'
Wed Sep 28 21:42:58 2011
Starting background process CJQ0
CJQ0 started with pid=24, OS id=724
Wed Sep 28 21:43:01 2011
Errors in file /u01/oracle/admin/kmcdb/bdump/kmcdb_m000_720.trc:
ORA-00600: internal error code, arguments: [2662], [0], [16273222], [0], [16273348], [12624044], [], []
Wed Sep 28 21:43:02 2011
Errors in file /u01/oracle/admin/kmcdb/bdump/kmcdb_m000_720.trc:
ORA-00600: internal error code, arguments: [2662], [0], [16273288], [0], [16273348], [12624044], [], []
ORA-00600: internal error code, arguments: [2662], [0], [16273222], [0], [16273348], [12624044], [], []
Wed Sep 28 21:44:35 2011
create temporary tablespace temp02 tempfile '/u01/oracle/oradata/temp02.dbf' size 400M
Wed Sep 28 21:44:35 2011
Completed:  create temporary tablespace temp02 tempfile '/u01/oracle/oradata/temp02.dbf' size 400M
Wed Sep 28 21:45:52 2011
create undo tablespace undo03 datafile '/u01/oracle/oradata/undo03.dbf' size 500M
Hex dump of (file 3, block 3839) in trace file /u01/oracle/admin/kmcdb/udump/kmcdb_ora_749.trc
Corrupt block relative dba: 0x00c00eff (file 3, block 3839)
Fractured block found during buffer read
Data in bad block:
type: 6 format: 2 rdba: 0x00c00eff
last change scn: 0x0000.00f7be10 seq: 0x1 flg: 0x06
spare1: 0x0 spare2: 0x0 spare3: 0x0
consistency value in tail: 0x49b20601
check value in block header: 0xe0a3
computed block checksum: 0xf7a2
Reread of rdba: 0x00c00eff (file 3, block 3839) found same corrupted data
Completed: create undo tablespace undo03 datafile '/u01/oracle/oradata/undo03.dbf' size 500M
Wed Sep 28 21:45:58 2011
Errors in file /u01/oracle/admin/kmcdb/bdump/kmcdb_smon_32531.trc:
ORA-00600: internal error code, arguments: [4097], [], [], [], [], [], [], []
Wed Sep 28 21:45:58 2011
Non-fatal internal error happenned while SMON was doing Corrupt Block Seg Info.
SMON encountered 2 out of maximum 100 non-fatal internal errors.
Wed Sep 28 21:47:15 2011
ALTER SYSTEM SET undo_management='AUTO' SCOPE=SPFILE;
Wed Sep 28 21:47:19 2011
Starting background process EMN0
EMN0 started with pid=22, OS id=757
Wed Sep 28 21:47:19 2011
Shutting down instance: further logons disabled
Wed Sep 28 21:47:19 2011
Stopping background process CJQ0
Wed Sep 28 21:47:19 2011

 

尝试创建并切换到新的Undo表空间,可能可以避免一些后续出现的ORA-600问题。

以上alert.log中2662和4097内部错误暂时不导致数据库crash,但是因为是用非常手段打开的数据库所以这套数据库已不再稳定,所以要第一时间逻辑和物理备份重要数据, 并着手重建数据库:

 

run{
set maxcorrupt for datafile 1,2,3,4,5 to 100;
backup as compressed backupset incremental level 0 database;
}
exp system/oracle full=y file=full.dmp

 

虽然该网友和我一再强调他的数据库中的数据十分重要,但是他的行为是在无声的诉说这套数据库里的数据是无关紧要的。

如果你觉得数据是important的,那么请善待数据。 善待数据的第一条准则是 数据备份重于一切,如果没有备份 那么请不要奢望数据是持久的!

了解你所不知道的SMON功能(三):清理obj$基表

SMON的作用还包括清理obj$数据字典基表(cleanup obj$)

OBJ$字典基表是Oracle Bootstarp启动自举的重要对象之一:

SQL> set linesize 80 ;
SQL> select sql_text from bootstrap$ where sql_text like 'CREATE TABLE OBJ$%';
SQL_TEXT
--------------------------------------------------------------------------------
CREATE TABLE OBJ$("OBJ#" NUMBER NOT NULL,"DATAOBJ#" NUMBER,"OWNER#" NUMBER NOT N
ULL,"NAME" VARCHAR2(30) NOT NULL,"NAMESPACE" NUMBER NOT NULL,"SUBNAME" VARCHAR2(
30),"TYPE#" NUMBER NOT NULL,"CTIME" DATE NOT NULL,"MTIME" DATE NOT NULL,"STIME"
DATE NOT NULL,"STATUS" NUMBER NOT NULL,"REMOTEOWNER" VARCHAR2(30),"LINKNAME" VAR
CHAR2(128),"FLAGS" NUMBER,"OID$" RAW(16),"SPARE1" NUMBER,"SPARE2" NUMBER,"SPARE3
" NUMBER,"SPARE4" VARCHAR2(1000),"SPARE5" VARCHAR2(1000),"SPARE6" DATE) PCTFREE
10 PCTUSED 40 INITRANS 1 MAXTRANS 255 STORAGE (  INITIAL 16K NEXT 1024K MINEXTEN
TS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 OBJNO 18 EXTENTS (FILE 1 BLOCK 121))

 

触发场景

OBJ$基表是一张低级数据字典表,该表几乎对库中的每个对象(表、索引、包、视图等)都包含有一行记录。很多情况下,这些条目所代表的对象是不存在的对象(non-existent),引起这种现象的一种可能的原因是对象本身已经被从数据库中删除了,但是对象条目仍被保留下来以满足消极依赖机制(negative dependency)。因为这些条目的存在会导致OBJ$表不断膨胀,这时就需要由SMON进程来删除这些不再需要的行。SMON会在实例启动(after startup of DB is started cleanup function again)时以及启动后的每12个小时执行一次清理任务(the cleanup is scheduled to run after startup and then every 12 hours)。

我们可以通过以下演示来了解SMON清理obj$的过程:

SQL>  BEGIN
2      FOR i IN 1 .. 5000 LOOP
3      execute immediate ('create synonym gustav' || i || ' for
4  perfstat.sometable');
5      execute immediate ('drop   synonym gustav' || i );
6      END LOOP;
7    END;
8    /
PL/SQL procedure successfully completed.
SQL> startup force;
ORACLE instance started.
Total System Global Area 1065353216 bytes
Fixed Size                  2089336 bytes
Variable Size             486542984 bytes
Database Buffers          570425344 bytes
Redo Buffers                6295552 bytes
Database mounted.
Database opened.
SQL>   select count(*) from user$ u, obj$ o
2        where u.user# (+)=o.owner# and o.type#=10 and not exists
3        (select p_obj# from dependency$ where p_obj# = o.obj#);
COUNT(*)
----------
5000
SQL> /
COUNT(*)
----------
5000
SQL> /
COUNT(*)
----------
4951
SQL> oradebug setospid 18457;
Oracle pid: 8, Unix process pid: 18457, image: oracle@rh2.oracle.com (SMON)
SQL> oradebug event 10046 trace name context forever ,level 1;
Statement processed.
SQL> oradebug tracefile_name;
/s01/admin/G10R2/bdump/g10r2_smon_18457.trc
select o.owner#,
o.obj#,
decode(o.linkname,
null,
decode(u.name, null, 'SYS', u.name),
o.remoteowner),
o.name,
o.linkname,
o.namespace,
o.subname
from user$ u, obj$ o
where u.use r#(+) = o.owner#
and o.type# = :1
and not exists
(select p_obj# from dependency$ where p_obj# = o.obj#)
order by o.obj#
for update
select null
from obj$
where obj# = :1
and type# = :2
and obj# not in
(select p_obj# from dependency$ where p_obj# = obj$.obj#)
delete from obj$ where obj# = :1
/* 删除过程其实较为复杂,可能要删除多个字典基表上的记录 */

现象

我们可以通过以下查询来了解obj$基表中NON-EXISTENT对象的条目总数(type#=10),若这个总数在不断减少说明smon正在执行清理工作
obj$_type#=10

    select trunc(mtime), substr(name, 1, 3) name, count(*)
from obj$
where type# = 10
and not exists (select * from dependency$ where obj# = p_obj#)
group by trunc(mtime), substr(name, 1, 3);
select count(*)
from user$ u, obj$ o
where u.user#(+) = o.owner#
and o.type# = 10
and not exists
(select p_obj# from dependency$ where p_obj# = o.obj#);

如何禁止SMON清理obj$基表

我们可以通过设置诊断事件event=’10052 trace name context forever’来禁止SMON清理obj$基表,当我们需要避免SMON因cleanup obj$的相关代码而意外终止或spin从而开展进一步的诊断时可以设置该诊断事件。在Oracle并行服务器或RAC环境中,也可以设置该事件来保证只有特定的某个节点来执行清理工作。

10052, 00000, "don't clean up obj$"
alter system set events '10052 trace name context forever, level 65535';
Problem Description: We are receiving the below warning during db startup:
WARNING: kqlclo() has detected the following :
Non-existent object 37336 NOT deleted because an object
of the same name exists already.
Object name: PUBLIC.USER$
This is caused by the SMON trying to cleanup the SYS.OJB$.
SMON cleans all dropped objects which have a SYS.OBJ$.TYPE#=10. 
This can happen very often when you create an object that have the same name as a public synonym. 
When SMON is trying to remove non-existent objects and fails because there are duplicates, 
multiple nonexistent objects with same name.
This query will returned many objects with same name under SYS schema:
select o.name,u.user# from user$ u, obj$ o where u.user# (+)=o.owner# and o.type#=10 
and not exists (select p_obj# from dependency$ where p_obj# = o.obj#);
To cleanup this message:
Take a full backup of the database - this is crucial. If anything goes wrong during this procedure, 
your only option would be to restore from backup, so make sure you have a good backup before proceeding. 
We suggest a COLD backup. If you plan to use a HOT backup, you will have to restore point in time if any problem happens
Normally DML against dictionary objects is unsupported, 
but in this case we know exactly what the type of corruption, 
also you are instructing to do this under guidance from Support.
Data dictionary patching must be done by an experienced DBA. 
This solution is unsupported. 
It means that if there were problems after applying this solution, a database backup must be restored.
1. Set event 10052 at parameter file to disable cleanup of OBJ$ by SMON
EVENT="10052 trace name context forever, level 65535"
2. Startup database in restricted mode
3. Delete from OBJ$, COMMIT
SQL> delete from obj$ where (name,owner#) in ( select o.name,u.user# from user$ u, obj$ o
where u.user# (+)=o.owner# and o.type#=10 and not exists (select p_obj# from
dependency$ where p_obj# = o.obj#) );
SQL> commit;
SQL> Shutdown abort.
4. remove event 10052 from init.ora
5. Restart the database and monitor for the message in the ALERT LOG file

沪ICP备14014813号

沪公网安备 31010802001379号