数据恢复:解决ORA-600[kghstack_free2][kghstack_err+0068]一例

数据库后台日志文件alert.log报以下异常:

Database Characterset is AL32UTF8
Opening with internal Resource Manager plan
where NUMA PG = 1, CPUs = 16
replication_dependency_tracking turned off (no async multimaster replication found)
Thu Sep 20 09:46:09 2012
Errors in file /user/oracle/admin/maclean/bdump/maclean_p003_856120.trc:
ORA-00600: internal error code, arguments: [kghstack_free2], [], [], [], [], [], [], []
Starting background process QMNC
QMNC started with pid=53, OS id=1229222
Thu Sep 20 09:46:11 2012
Errors in file /user/oracle/admin/maclean/bdump/maclean_p003_856120.trc:
ORA-07445: exception encountered: core dump [kghstack_err+0068] [SIGSEGV] [Address not mapped to object] [0x778670101010109] [] []
ORA-00600: internal error code, arguments: [kghstack_free2], [], [], [], [], [], [], []
Thu Sep 20 09:46:12 2012
Errors in file /user/oracle/admin/maclean/bdump/maclean_p003_856120.trc:
ORA-07445: exception encountered: core dump [kghstack_err+0068] [SIGSEGV] [Address not mapped to object] [0x778670101010109] [] []
ORA-00607: Internal error occurred while making a change to a data block
ORA-00602: internal programming exception
ORA-07445: exception encountered: core dump [kghstack_err+0068] [SIGSEGV] [Address not mapped to object] [0x778670101010109] [] []
ORA-00600: internal error code, arguments: [kghstack_free2], [], [], [], [], [], [], []
Thu Sep 20 09:46:13 2012
Errors in file /user/oracle/admin/maclean/bdump/maclean_p003_856120.trc:
ORA-07445: exception encountered: core dump [kghstack_err+0068] [SIGSEGV] [Address not mapped to object] [0x778670101010109] [] []
ORA-00607: Internal error occurred while making a change to a data block
ORA-00602: internal programming exception
ORA-07445: exception encountered: core dump [kghstack_err+0068] [SIGSEGV] [Address not mapped to object] [0x778670101010109] [] []
ORA-00607: Internal error occurred while making a change to a data block
ORA-00602: internal programming exception
ORA-07445: exception encountered: core dump [kghstack_err+0068] [SIGSEGV] [Address not mapped to object] [0x778670101010109] [] []
ORA-00600: internal error code, arguments: [kghstack_free2], [], [], [], [], [], [], []
Thu Sep 20 09:46:15 2012
Doing block recovery for file 370 block 140818
Doing block recovery for file 370 block 140818
Doing block recovery for file 370 block 140818
Doing block recovery for file 370 block 140818
Doing block recovery for file 370 block 140818
Thu Sep 20 09:46:16 2012
Completed: ALTER DATABASE OPEN
Thu Sep 20 09:46:18 2012
Doing block recovery for file 370 block 140818
Block recovery from logseq 336560, block 264 to scn 12971246479133
Thu Sep 20 09:46:18 2012
Recovery of Online Redo Log: Thread 1 Group 4 Seq 336560 Reading mem 0
Mem# 0: /oradata4/macleanDB/redolog/redolog4a.log
Mem# 1: /oradata4/macleanDB/redolog/redolog4b.log
Block recovery completed at rba 336560.27935.16, scn 3020.445245214
Thu Sep 20 09:46:18 2012
SMON: slave died unexpectedly, downgrading to serial recovery
Thu Sep 20 09:46:18 2012
Errors in file /user/oracle/admin/maclean/bdump/maclean_smon_729208.trc:
ORA-00600: internal error code, arguments: [kghstack_free2], [], [], [], [], [], [], []
Thu Sep 20 09:46:20 2012
Errors in file /user/oracle/admin/maclean/bdump/maclean_smon_729208.trc:
ORA-07445: exception encountered: core dump [kghstack_err+0068] [SIGSEGV] [Address not mapped to object] [0x778670101010109] [] []
ORA-00600: internal error code, arguments: [kghstack_free2], [], [], [], [], [], [], []
Thu Sep 20 09:46:21 2012
Errors in file /user/oracle/admin/maclean/bdump/maclean_smon_729208.trc:
ORA-07445: exception encountered: core dump [kghstack_err+0068] [SIGSEGV] [Address not mapped to object] [0x778670101010109] [] []
ORA-00607: Internal error occurred while making a change to a data block
ORA-00602: internal programming exception
ORA-07445: exception encountered: core dump [kghstack_err+0068] [SIGSEGV] [Address not mapped to object] [0x778670101010109] [] []
ORA-00600: internal error code, arguments: [kghstack_free2], [], [], [], [], [], [], []
Thu Sep 20 09:46:22 2012
Errors in file /user/oracle/admin/maclean/bdump/maclean_smon_729208.trc:
ORA-07445: exception encountered: core dump [kghstack_err+0068] [SIGSEGV] [Address not mapped to object] [0x778670101010109] [] []
ORA-00607: Internal error occurred while making a change to a data block
ORA-00602: internal programming exception
ORA-07445: exception encountered: core dump [kghstack_err+0068] [SIGSEGV] [Address not mapped to object] [0x778670101010109] [] []
ORA-00607: Internal error occurred while making a change to a data block
ORA-00602: internal programming exception
ORA-07445: exception encountered: core dump [kghstack_err+0068] [SIGSEGV] [Address not mapped to object] [0x778670101010109] [] []
ORA-00600: internal error code, arguments: [kghstack_free2], [], [], [], [], [], [], []
Thu Sep 20 09:46:24 2012
Errors in file /user/oracle/admin/maclean/bdump/maclean_pmon_786638.trc:
ORA-00474: SMON process terminated with error
Thu Sep 20 09:46:24 2012
PMON: terminating instance due to error 474
Instance terminated by PMON, pid = 786638
Thu Sep 20 10:38:31 2012
Starting ORACLE instance (normal)
sskgpgetexecname failed to get name
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Picked latch-free SCN scheme 3
Autotune of undo retention is turned on.
IMODE=BR
ILAT =61
LICENSE_MAX_USERS = 0
SYS auditing is enabled
ksdpec: called for event 13740 prior to event group initialization
Starting up ORACLE RDBMS Version: 10.2.0.4.0.
System parameters with non-default values:
processes = 500
sessions = 555
__shared_pool_size = 637534208
__large_pool_size = 16777216
__java_pool_size = 16777216
__streams_pool_size = 0
sga_target = 21474836480
control_files = /oradata1/macleanDB/control1/control01.ctl, /oradata1/macleanDB/control2/control02.ctl, /oradata1/macleanDB/control3/control03.ctl
db_block_size = 8192
__db_cache_size = 20786970624
compatible = 10.2.0.3.0
log_archive_dest_1 = LOCATION=/oraarch/maclean
log_archive_format = %t_%s_%r.dbf
db_files = 4000
db_file_multiblock_read_count= 16
undo_management = AUTO
undo_tablespace = UNDOTBS2
undo_retention = 7200
remote_login_passwordfile= EXCLUSIVE
audit_sys_operations = TRUE
db_domain =
dispatchers = (PROTOCOL=TCP) (SERVICE=macleanXDB)
job_queue_processes = 10
background_dump_dest = /user/oracle/admin/maclean/bdump
user_dump_dest = /user/oracle/admin/maclean/udump
core_dump_dest = /user/oracle/admin/maclean/cdump
audit_file_dest = /user/oracle/admin/maclean/adump
audit_trail = DB, EXTENDED
db_name = maclean
open_cursors = 300
pga_aggregate_target = 10737418240
MMAN started with pid=4, OS id=368872
PSP0 started with pid=3, OS id=184496
PMON started with pid=2, OS id=94512
DBW0 started with pid=5, OS id=331874
DBW1 started with pid=6, OS id=418128
LGWR started with pid=7, OS id=397498
CKPT started with pid=8, OS id=508388
SMON started with pid=9, OS id=459050
RECO started with pid=10, OS id=471536
CJQ0 started with pid=11, OS id=442822
MMON started with pid=12, OS id=180548
Thu Sep 20 10:38:38 2012
starting up 1 dispatcher(s) for network address ‘(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))’…
MMNL started with pid=13, OS id=352574
Thu Sep 20 10:38:38 2012
starting up 1 shared server(s) …
Thu Sep 20 10:38:42 2012
ALTER DATABASE MOUNT
Thu Sep 20 10:38:47 2012
Setting recovery target incarnation to 2
Thu Sep 20 10:38:47 2012
Successful mount of redo thread 1, with mount id 2376692082
Thu Sep 20 10:38:47 2012
Database mounted in Exclusive Mode
Completed: ALTER DATABASE MOUNT
Thu Sep 20 10:38:47 2012
ALTER DATABASE OPEN
Thu Sep 20 10:38:49 2012
Beginning crash recovery of 1 threads
parallel recovery started with 15 processes
Thu Sep 20 10:38:50 2012
Started redo scan
Thu Sep 20 10:38:51 2012
Completed redo scan
27994 redo blocks read, 8334 data blocks need recovery
Thu Sep 20 10:38:52 2012
Started redo application at
Thread 1: logseq 336560, block 3
Thu Sep 20 10:38:52 2012
Recovery of Online Redo Log: Thread 1 Group 4 Seq 336560 Reading mem 0
Mem# 0: /oradata4/macleanDB/redolog/redolog4a.log
Mem# 1: /oradata4/macleanDB/redolog/redolog4b.log
Thu Sep 20 10:38:53 2012
Completed redo application
Thu Sep 20 10:38:56 2012
Completed crash recovery at
Thread 1: logseq 336560, block 27997, scn 12971246499154
8334 data blocks read, 8334 data blocks written, 27994 redo blocks read
Thu Sep 20 10:38:57 2012
LGWR: STARTING ARCH PROCESSES
ARC0 started with pid=32, OS id=770284
ARC1 started with pid=33, OS id=762350
Thu Sep 20 10:38:57 2012
ARC0: Archival started
ARC1: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
Thu Sep 20 10:38:58 2012
Thread 1 advanced to log sequence 336561 (thread open)
Thread 1 opened at log sequence 336561
Current log# 2 seq# 336561 mem# 0: /oradata1/macleanDB/redolog/redolog2a.log
Current log# 2 seq# 336561 mem# 1: /oradata1/macleanDB/redolog/redolog2b.log
Successful open of redo thread 1
Thu Sep 20 10:38:58 2012
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Thu Sep 20 10:38:58 2012
ARC1: Becoming the ‘no FAL’ ARCH
ARC1: Becoming the ‘no SRL’ ARCH
Thu Sep 20 10:38:58 2012
ARC0: Becoming the heartbeat ARCH
Thu Sep 20 10:38:58 2012
SMON: enabling cache recovery
Thu Sep 20 10:39:03 2012
Successfully onlined Undo Tablespace 5.
Thu Sep 20 10:39:03 2012
SMON: enabling tx recovery
Thu Sep 20 10:39:03 2012
Database Characterset is AL32UTF8
Opening with internal Resource Manager plan
where NUMA PG = 1, CPUs = 16
Thu Sep 20 10:39:04 2012
Errors in file /user/oracle/admin/maclean/bdump/maclean_p003_491552.trc:
ORA-00600: internal error code, arguments: [kghstack_free2], [], [], [], [], [], [], []
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
QMNC started with pid=52, OS id=770310
Thu Sep 20 10:39:06 2012
Errors in file /user/oracle/admin/maclean/bdump/maclean_p003_491552.trc:
ORA-07445: exception encountered: core dump [kghstack_err+0068] [SIGSEGV] [Address not mapped to object] [0x778670101010109] [] []
ORA-00600: internal error code, arguments: [kghstack_free2], [], [], [], [], [], [], []
Thu Sep 20 10:39:08 2012
Errors in file /user/oracle/admin/maclean/bdump/maclean_p003_491552.trc:
ORA-07445: exception encountered: core dump [kghstack_err+0068] [SIGSEGV] [Address not mapped to object] [0x778670101010109] [] []
ORA-00607: Internal error occurred while making a change to a data block
ORA-00602: internal programming exception
ORA-07445: exception encountered: core dump [kghstack_err+0068] [SIGSEGV] [Address not mapped to object] [0x778670101010109] [] []
ORA-00600: internal error code, arguments: [kghstack_free2], [], [], [], [], [], [], []
Thu Sep 20 10:39:08 2012
Errors in file /user/oracle/admin/maclean/bdump/maclean_p003_491552.trc:
ORA-07445: exception encountered: core dump [kghstack_err+0068] [SIGSEGV] [Address not mapped to object] [0x778670101010109] [] []
ORA-00607: Internal error occurred while making a change to a data block
ORA-00602: internal programming exception
ORA-07445: exception encountered: core dump [kghstack_err+0068] [SIGSEGV] [Address not mapped to object] [0x778670101010109] [] []
ORA-00607: Internal error occurred while making a change to a data block
ORA-00602: internal programming exception
ORA-07445: exception encountered: core dump [kghstack_err+0068] [SIGSEGV] [Address not mapped to object] [0x778670101010109] [] []
ORA-00600: internal error code, arguments: [kghstack_free2], [], [], [], [], [], [], []
Thu Sep 20 10:39:11 2012
Doing block recovery for file 370 block 140818
Doing block recovery for file 370 block 140818
Doing block recovery for file 370 block 140818
Doing block recovery for file 370 block 140818
Doing block recovery for file 370 block 140818

 

设置10513 level 2内部事件后,SMON不再recover dead transaction ,打开数据库后实例不再意外终止。

SQL> show parameter event

NAME TYPE VALUE
———————————— ———– ——————————
event string 10513 trace name context forev
er, level 2

通过RMAN validate命令和dbv检测数据文件,发现表MAC_maclean_TAB上存在一个逻辑讹误块和一个物理讹误块。
SQL> select * from V$DATABASE_BLOCK_CORRUPTION ;

FILE# BLOCK# BLOCKS CORRUPTION_CHANGE# CORRUPTIO
———- ———- ———- —————— ———
370 140818 1 1.2971E+13 LOGICAL
84 23875 1 0 CORRUPT
鉴于用户可以接受少量数据的损失,且blockrecover以上物理讹误块在RMAN中实际运行时寻找该FILE 84 BLOCK 23875耗费大量时间,所以考虑利用ROWID的方式绕过讹误块,拷贝出绝大多数的表数据。

SQL> select dbms_rowid.rowid_create(1,353223,370,140818,0) from dual;

DBMS_ROWID.ROWID_C
——————
AAAh1CAFyAAAiYSAAA
SQL> select dbms_rowid.rowid_create(1,353223,370,140819,0) from dual;

DBMS_ROWID.ROWID_C
——————
AABWPHAFyAAAiYTAAA

 

alter session set events ‘10231 trace name context forever, level 10′;
SQL> select count(*) from MAC_maclean_TAB;

COUNT(*)
———-
8450840

 

insert /*+ APPEND */ into MAC_maclean_TAB_BAK select /*+ ROWID(A) */ * from MAC_maclean_TAB A where rowid <‘AABWPHAFyAAAiYSAAA';

insert /*+ APPEND */ into MAC_maclean_TAB_BAK select /*+ ROWID(A) */ * from MAC_maclean_TAB A where rowid >=’AABWPHAFyAAAiYTAAA';
SQL> select count(*) from MAC_maclean_TAB_BAK;
COUNT(*)
———-
8450796

alter table MAC_maclean_TAB rename to MAC_maclean_TAB_20120920_BAK;
alter table MAC_maclean_TAB_BAK rename to MAC_maclean_TAB;

 

SQL>
SQL> select count(*) from MAC_maclean_TAB;

COUNT(*)
———-
8450796

SQL> analyze table MAC_maclean_TAB validate structure;

Table analyzed.

drop table MAC_maclean_TAB_20120920_BAK;
原MAC_maclean_TAB共计8450840条数据,实际修复出8450796条记录,丢失44条。

后续使用CREATE TABLE命令将原存在逻辑坏块的数据块覆盖,避免上述ORA-600问题再次发生。
create table LARGE_TABLE (t1 int) tablespace MAC_REPORT_TBS ;

alter table LARGE_TABLE allocate extent (datafile ‘/maclean/maclean_32.dbf’ size 10M);

 
对全库做数据文件的验证操作,找出数据库中潜在的坏块,避免此类问题再次发生,若使用RMAN可以参考如下脚本:

RMAN> run {
allocate channel d1 type disk;
allocate channel d2 type disk;
allocate channel d3 type disk;
allocate channel d4 type disk;
backup validate check logical database;
}
以上会将找到的坏块信息汇总到V$DATABASE_BLOCK_CORRUPTION视图中:

select * from V$DATABASE_BLOCK_CORRUPTION ;