How to increase datafile copy concurrency in Oracle RMAN 10g

try something like this:
Change the path after each format clause to your desired target file name:
 run
 { allocate channel mac01  DEVICE TYPE DISK MAXOPENFILES=1 PARMS='BLKSIZE=1048576';
   allocate channel mac02  DEVICE TYPE DISK MAXOPENFILES=1 PARMS='BLKSIZE=1048576';
   allocate channel mac03  DEVICE TYPE DISK MAXOPENFILES=1 PARMS='BLKSIZE=1048576';
   allocate channel mac04  DEVICE TYPE DISK MAXOPENFILES=1 PARMS='BLKSIZE=1048576';
   backup as copy (datafile '+DATADG/MAC/DATAFILE/system.258.819691771'    format '/s01/backup/system.dbf'   channel mac01)
   (datafile '+DATADG/MAC/DATAFILE/sysaux.257.819691725'    format '/s01/backup/sysaux.dbf'   channel mac02)
   (datafile '+DATADG/MAC/DATAFILE/undotbs1.260.819691817'  format '/s01/backup/undotbs1.dbf' channel mac03);
 }
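When many datafiles need copying, a small script can generate the channel allocations and per-file backup clauses instead of writing them by hand. The following is a minimal sketch, not an official tool; the file list, destination directory, and channel names are illustrative assumptions:

```python
# Sketch: generate an RMAN "backup as copy" run block that spreads
# datafiles round-robin across N disk channels. Paths are hypothetical.
def make_rman_script(datafiles, channels=4, dest="/s01/backup"):
    lines = ["run", "{"]
    for i in range(1, channels + 1):
        lines.append(
            f"  allocate channel mac{i:02d} DEVICE TYPE DISK "
            f"MAXOPENFILES=1 PARMS='BLKSIZE=1048576';"
        )
    clauses = []
    for i, (path, name) in enumerate(datafiles):
        ch = f"mac{(i % channels) + 1:02d}"   # round-robin over channels
        clauses.append(
            f"  (datafile '{path}' format '{dest}/{name}' channel {ch})"
        )
    lines.append("  backup as copy")
    lines.extend(clauses)
    lines[-1] += ";"          # terminate the backup command
    lines.append("}")
    return "\n".join(lines)

script = make_rman_script(
    [("+DATADG/MAC/DATAFILE/system.258.819691771", "system.dbf"),
     ("+DATADG/MAC/DATAFILE/sysaux.257.819691725", "sysaux.dbf")]
)
print(script)
```

Paste the generated text into an RMAN session after reviewing it; each datafile then copies on its own channel in parallel.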

Flashing back a database to a restore point in Oracle


CREATE RESTORE POINT before_clean GUARANTEE FLASHBACK DATABASE;
 
==> Make sure the flashback recovery area has enough free disk space.
 
On a standby, remember to stop redo transport before performing this operation:
 
alter database recover managed standby database finish force;
 
alter database open;
 
Steps:
 
shutdown immediate;
startup mount;
 
flashback database to RESTORE POINT before_clean;

How to deliberately create physical block corruption in Oracle for testing


SQL> create table maclean_corrupt tablespace users as select * from dba_tables;
 
Table created.
 
SQL> select dbms_rowid.rowid_block_number(rowid),dbms_rowid.rowid_relative_fno(rowid) from maclean_corrupt where rownum<10;
 
DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID) DBMS_ROWID.ROWID_RELATIVE_FNO(ROWID)
------------------------------------ ------------------------------------
                              382307                                    6
                              382307                                    6
                              382307                                    6
                              382307                                    6
                              382307                                    6
                              382307                                    6
                              382307                                    6
                              382307                                    6
                              382307                                    6
 
9 rows selected.
 
 
 
SQL> alter system checkpoint;
 
System altered.
 
 
C:\Users\xiangbli>rman target /
 
Recovery Manager: Release 12.1.0.1.0 - Production on Sun Sep 22 09:23:50 2013
 
Copyright (c) 1982, 2013, Oracle and/or its affiliates.  All rights reserved.
 
connected to target database: MACLEAN (DBID=1694338843)
 
RMAN> recover datafile 6 block 382307 clear;
 
Starting recover at 22-SEP-13
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=254 device type=DISK
Finished recover at 22-SEP-13
 
 
 
SQL> alter system flush buffer_cache;
 
System altered.
 
SQL> select count(*) from maclean_corrupt;
select count(*) from maclean_corrupt
       *
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 6, block # 382307)
ORA-01110: data file 6: 'C:\APP\XIANGBLI\ORADATA\MACLEAN\USERS01.DBF'
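The same effect can also be produced outside RMAN by overwriting a block directly on disk. The offset arithmetic is trivial, but be warned: whether the datafile header counts as block 0 or block 1 differs between tools, so the dd command below is a hypothetical sketch for a scratch database only; verify with dbv before and after. Values mirror the demo above (block 382307, 8 KB blocks):

```python
# Sketch: byte offset of a data block inside its datafile, for
# deliberately corrupting it with dd on a TEST database only.
# ASSUMPTION: block numbering starts at 0 for byte-offset purposes;
# confirm against dbv output for your tool and version.
def block_offset(block_number, block_size=8192):
    """Byte offset of the block's first byte from the start of the file."""
    return block_number * block_size

offset = block_offset(382307)
print(offset)

# A hypothetical dd invocation that seeks in block-size units:
dd_cmd = ("dd if=/dev/zero of=USERS01.DBF bs=8192 "
          "seek=382307 count=1 conv=notrunc")
print(dd_cmd)
```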

ParnassusData successfully helped a public security agency in Shanxi recover a database of over 3 TB


 

Because the customer's database used raw devices and held about 3 TB of data, only the entire SYSTEM tablespace file and some datafile headers were backed up.
The backups were placed under /oracle/backup.

1) Back up the spfile
create pfile='/home/oracle/pfile1' from spfile;

2) Back up the SYSTEM tablespace file and datafile headers
SQL> select bytes/1024/1024 from v$datafile where file#=1;  --  2040
$ mkdir /oracle/backup

$ dd if=/dev/rrac_system01 of=/oracle/backup/rrac_system01.bak bs=1024000

$ dbv file=rrac_system01.bak blocksize=8192

SQL> set echo off
SQL> set heading off
SQL> set linesize 300 pagesize 0
SQL> select 'dd if='||name||' of=/oracle/backup/header_'||file# || ' bs=1024000 count=10' from v$datafile;

3) Back up the control file
$ dd if=/dev/rrac_ctrol01 of=/oracle/backup/rrac_ctrol01 bs=1024000
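The SQL above spools one dd command per datafile; the same generation can be sketched in Python when no SQL*Plus session is handy. The device paths below are hypothetical examples:

```python
# Sketch: generate "dd" commands that copy the first 10 MB (header
# area) of each datafile, mirroring the SQL spool above.
# Device paths here are hypothetical.
def header_backup_cmds(datafiles, dest="/oracle/backup",
                       bs=1024000, count=10):
    cmds = []
    for file_no, name in datafiles:
        cmds.append(f"dd if={name} of={dest}/header_{file_no} "
                    f"bs={bs} count={count}")
    return cmds

cmds = header_backup_cmds([(1, "/dev/rrac_system01"),
                           (2, "/dev/rrac_undo01")])
for c in cmds:
    print(c)
```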

2. On-site errors and opening the database

The following errors were seen during the data rescue:
Errors in file
/oracle/product/admin/orcl/udump/orcl2.ora_160642.trc:
ORA-00600: internal error code, arguments: [2662], [4], [4186310893], [4], [4186345409], [58720264], [],…

Errors in file 
/oracle/product/admin/orcl/udump/orcl2_ora_136116.trc:
ORA-00600: internal error code, arguments: [2256],[0], [1073741824], [5],[3],[],[],[]

Besides the SCN-related errors above, there was also corruption in the online redo logs and in undo. ParnassusData engineers corrected the SCN by setting system parameters, used events 10513 and 10231 to bypass the related consistency checks, and then recovered the database.

ParnassusData successfully recovered a power-failure-damaged Oracle database for a tourism company in Yunnan


The customer's website backend database could not open after an unexpected power outage that occurred before HA had been deployed. ParnassusData engineer @Biot_Wang opened the database and repaired the follow-up issues within two hours.

 


Resolving a case of ORA-00600 [ktspgfb-1]

Resolving a case of ORA-00600 [ktspgfb-1]; the trace information for this case is as follows:

 

 

 

 

ktspgfbs stands for "Get first level bitmap block for search".

ktspgfblk stands for "KTSP Get first level bitmap block".

When ORA-00600: internal error code, arguments: [ktspgfb-1] is raised, it is usually safe to assume that the object's first-level bitmap block (part of its basic space metadata) is logically corrupted.

The repair strategy for this error depends on the type of the affected data object.

If you cannot fix it yourself, the ParnassusData professional Oracle database recovery team can help!

ParnassusData professional database recovery team

Service hotline: 13764045638   QQ: 47079569   Email: service@parnassusdata.com

 

The bug information for this ORA-00600 [ktspgfb-1] error is as follows:

Resolving a case of ORA-00600 [kcbgcur_3]

Resolving a case of ORA-00600 [kcbgcur_3]. An ORA-00600 in the kcbgcur_3 function generally means that a cache buffer in "CURRENT" state holds the data address the caller expected, i.e. the tablespace number and relative DBA (RDBA), but Oracle found that the block does not belong to the expected object (a different data_object_id was found).

 

 


 

This generally indicates logical corruption of the data. There are two main causes:

  1. A severe lost write causing logical inconsistency
  2. An Oracle bug causing logical inconsistency

In this case the kcbgcur_3 error was ORA-00600: internal error code, arguments: [kcbgcur_3], [91738], [1], [0], [0], [], [], [], on database version 9.2.0.8. Referring to the kcbgcur_3 argument details in the appendix, the arguments mean:

 

Arg [a] 91738 is the object_id of the failing object

Arg [b] 1 indicates a temporary object

Object_id 91738 locates the failing object quickly; the repair method then depends on the object type.

 

The call stack in this case is as follows:

 

Appendix

 

For kcbgcur_3 ORA-00600, the meanings of the error arguments are as follows:

 

In 10gR2 and later:

 

Arg [a] Object Id passed to the cache by the layer accessing the cache.

Arg [b] Class of the block.

Arg [c] Flags which define characteristics of buffer usage

Arg [d] 1 for a temporary object.

 

In 10gR1:

 

Arg [a] Object Id passed to the cache by the layer accessing the cache.

Arg [b] Class of the block.

Arg [c] 1 for a temporary object.

 

In Oracle 9.X:

 

Arg [a] Object Id passed to the cache by the layer accessing the cache.

Arg [b] 1 for a temporary object.

 

In Oracle 8.0.4 through 8.1.7:

 

Arg [a] Class of the block.

Arg [b] Tablespace number.

Arg [c] Relative DBA.

Arg [d] Object Id in the cache buffer at the time of the error.

Arg [e] Object Id passed to the cache by the layer accessing the cache. Indicates if this is a temporary or a permanent object.

Arg [f] In version 8.0.4 to 8.1.5, this is 1 for a permanent object. In version 8.1.6 and 8.1.7, this is 1 for a temporary object.

 

In Oracle 8.0.3:

Arg [a] Class of the block.
Arg [b] Tablespace number.
Arg [c] Relative DBA.
Arg [d] Object Id in the cache buffer at the time of the error.
Arg [e] Object Id passed to the cache by the layer accessing the cache.
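The version-dependent argument layouts above can be captured in a small lookup table, which helps when triaging traces from databases on different versions. This is purely illustrative; the meanings are transcribed from the lists above and the keys are made-up version labels:

```python
# Sketch: version-dependent meanings of ORA-00600 [kcbgcur_3]
# arguments, transcribed from the appendix lists above.
KCBGCUR_3_ARGS = {
    "10gR2+": ["object id passed to the cache", "block class",
               "buffer usage flags", "1 for temporary object"],
    "10gR1":  ["object id passed to the cache", "block class",
               "1 for temporary object"],
    "9.x":    ["object id passed to the cache",
               "1 for temporary object"],
}

def describe(version, args):
    """Pair each trace argument with its documented meaning."""
    meanings = KCBGCUR_3_ARGS[version]
    return list(zip(args, meanings))

# The 9.2.0.8 case above: arguments [91738], [1]
print(describe("9.x", [91738, 1]))
```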

 

The bugs related to kcbgcur_3 are listed below:

 

Oracle ORA-00704 ORA-00604 ORA-01502: SYS.I_DEPENDENCY1 corrupted or unusable

If you cannot recover data by yourself, ask Parnassusdata, the professional ORACLE database recovery team for help.

Parnassusdata Software Database Recovery Team

Service Hotline:  +86 13764045638 E-mail: service@parnassusdata.com

 

 

A friend asked how to handle the error ORA-00704 ORA-00604 ORA-01502: index 'SYS.I_DEPENDENCY1' or partition of such index is in unusable state.

Handling I_DEPENDENCY1 corruption on 11.2
Handling I_DEPENDENCY2 corruption

When SYS.I_DEPENDENCY1 is corrupted or unusable, the following errors appear:

Mon Dec 22 09:13:32 2014

Errors in file /home/oracle/oracle/product/admin/orcl/udump/orcl_ora_19783_lunar.trc:

ORA-00704: bootstrap process failure

ORA-00604: error occurred at recursive SQL level 1

ORA-01502: index 'SYS.I_DEPENDENCY1' or partition of such index is in unusable state

First, the approaches on 10g; there are two:
1. Recover directly on 10g
2. Start the 11.2 database in upgrade mode, then recover

First, let's see which dependency-related objects exist in the database:

[oracle@lunar oracle]$ ss

 

SQL*Plus: Release 10.2.0.1.0 – Production on Mon Dec 22 08:10:07 2014

 

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

 

 

Connected to:

Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 – Production

With the Partitioning, OLAP and Data Mining options

 

SYS@lunar>select owner,object_id,object_name,object_type from dba_objects where object_name like '%DEPENDENCY%' and owner='SYS';

 

OWNER            OBJECT_ID OBJECT_NAME                    OBJECT_TYPE

————— ———- —————————— ——————-

SYS                     92 DEPENDENCY$                    TABLE

SYS                  46827 GENDEPENDENCYBLOCKSEQUENCE     TYPE

SYS                  46828 GENDEPENDENCYBLOCKSEQUENCESEQU TYPE

SYS                  46824 GENDEPENDENCYBLOCKSTRUCT       TYPE

SYS                  46826 GENDEPENDENCYBLOCKUNION        TYPE

SYS                  46825 GENINCLUDEDDEPENDENCYBLOCKSTRU TYPE

SYS                   1691 GV_$OBJECT_DEPENDENCY          VIEW

SYS                    122 I_DEPENDENCY1                  INDEX

SYS                    123 I_DEPENDENCY2                  INDEX

SYS                   3950 PUBLIC_DEPENDENCY              VIEW

SYS                  47048 SQLDEPENDENCYBLOCKSEQ          TYPE

SYS                  47049 SQLDEPENDENCYBLOCKSEQUENCE     TYPE

SYS                  47050 SQLDEPENDENCYBLOCKSEQUENCESEQ  TYPE

SYS                  47051 SQLDEPENDENCYBLOCKSEQUENCESEQU TYPE

SYS                  47045 SQLDEPENDENCYBLOCKSTRUCT       TYPE

SYS                  47047 SQLDEPENDENCYBLOCKUNION        TYPE

SYS                  47046 SQLINCLUDEDDEPENDENCYBLOCKSTRU TYPE

SYS                   1066 V_$OBJECT_DEPENDENCY           VIEW

 

18 rows selected.

 

SYS@lunar>

What matters here is the index information for DEPENDENCY$: the table has two indexes.
Corruption of each of the two indexes has a different impact, as shown below:

SYS@lunar>select owner,table_name,index_name,tablespace_name,status from dba_indexes where table_name='DEPENDENCY$';

 

OWNER           TABLE_NAME                     INDEX_NAME                     TABLESPACE_NAME                STATUS

————— —————————— —————————— —————————— ——–

SYS             DEPENDENCY$                    I_DEPENDENCY1                  SYSTEM                         VALID

SYS             DEPENDENCY$                    I_DEPENDENCY2                  SYSTEM                         VALID

 

SYS@lunar>

In 9i, 10g and 11.1, the definitions of some core objects lived in $ORACLE_HOME/rdbms/admin/sql.bsq,
for example tab$, obj$ and ind$.
From 11.2 on, the script that creates the most important core base tables is still sql.bsq, but the various objects were split into separate scripts:

dcore.bsq

dsqlddl.bsq

dmanage.bsq

dplsql.bsq

dtxnspc.bsq

dfmap.bsq

denv.bsq

drac.bsq

dsec.bsq

doptim.bsq

dobj.bsq

djava.bsq

dpart.bsq

drep.bsq

daw.bsq

dsummgt.bsq

dtools.bsq

dexttab.bsq

ddm.bsq

dlmnr.bsq

ddst.bsq

The definitions of dependency$, i_dependency1 and i_dependency2 are as follows (identical in 10.2 and 11.2):

create table dependency$                                 /* dependency table */

( d_obj#        number not null,                  /* dependent object number */

d_timestamp   date not null,   /* dependent object specification timestamp */

order#        number not null,                             /* order number */

p_obj#        number not null,                     /* parent object number */

p_timestamp   date not null,      /* parent object specification timestamp */

d_owner#      number,                           /*  dependent owner number */

property      number not null,                   /* 0x01 = HARD dependency */

/* 0x02 = REF  dependency */

/* 0x04 = FINER GRAINED dependency */

d_attrs       raw("M_CSIZ"), /* Finer grain attr. numbers if finer grained */

d_reason      raw("M_CSIZ"))  /* Reason mask of attrs causing invalidation */

storage (initial 10k next 100k maxextents unlimited pctincrease 0)

/

 

create unique index i_dependency1 on

dependency$(d_obj#, d_timestamp, order#)

storage (initial 10k next 100k maxextents unlimited pctincrease 0)

/

 

create index i_dependency2 on

dependency$(p_obj#, p_timestamp)

storage (initial 10k next 100k maxextents unlimited pctincrease 0)

/

Next, let's look at which blocks these segments occupy:

SYS@lunar>select owner,segment_name,segment_type,extent_id,file_id,block_id from DBA_EXTENTS where segment_name like '%DEPENDENCY%';

 

OWNER           SEGMENT_NAME                   SEGMENT_TYPE        EXTENT_ID    FILE_ID   BLOCK_ID

————— —————————— —————— ———- ———- ———-

SYS             DEPENDENCY$                    TABLE                       0          1        689

SYS             DEPENDENCY$                    TABLE                       1          1       6329

SYS             DEPENDENCY$                    TABLE                       2          1       6665

SYS             DEPENDENCY$                    TABLE                       3          1       6945

SYS             DEPENDENCY$                    TABLE                       4          1       8481

SYS             DEPENDENCY$                    TABLE                       5          1      10505

SYS             DEPENDENCY$                    TABLE                       6          1      12825

SYS             DEPENDENCY$                    TABLE                       7          1      13425

SYS             DEPENDENCY$                    TABLE                       8          1      14753

SYS             DEPENDENCY$                    TABLE                       9          1      15513

SYS             DEPENDENCY$                    TABLE                      10          1      17169

SYS             DEPENDENCY$                    TABLE                      11          1      19209

SYS             DEPENDENCY$                    TABLE                      12          1      19897

SYS             DEPENDENCY$                    TABLE                      13          1      26689

SYS             DEPENDENCY$                    TABLE                      14          1      28153

SYS             DEPENDENCY$                    TABLE                      15          1      29377

SYS             DEPENDENCY$                    TABLE                      16          1      31241

SYS             DEPENDENCY$                    TABLE                      17          1      45321

SYS             DEPENDENCY$                    TABLE                      18          1      51465

SYS             DEPENDENCY$                    TABLE                      19          1      55433

SYSMAN          MGMT_INV_DEPENDENCY_RULE       TABLE                       0          3      24009

SYSMAN          MGMT_METRIC_DEPENDENCY_DEF     TABLE                       0          3      26193

SYSMAN          MGMT_METRIC_DEPENDENCY         TABLE                       0          3      26209

SYSMAN          MGMT_METRIC_DEPENDENCY_DETAILS TABLE                       0          3      26225

SYS             I_DEPENDENCY1                  INDEX                       0          1        929

SYS             I_DEPENDENCY1                  INDEX                       1          1       6313

SYS             I_DEPENDENCY1                  INDEX                       2          1       6777

SYS             I_DEPENDENCY1                  INDEX                       3          1       8905

SYS             I_DEPENDENCY1                  INDEX                       4          1      12185

SYS             I_DEPENDENCY1                  INDEX                       5          1      13433

SYS             I_DEPENDENCY1                  INDEX                       6          1      14761

SYS             I_DEPENDENCY1                  INDEX                       7          1      15537

SYS             I_DEPENDENCY1                  INDEX                       8          1      18425

SYS             I_DEPENDENCY1                  INDEX                       9          1      19273

SYS             I_DEPENDENCY1                  INDEX                      10          1      26705

SYS             I_DEPENDENCY1                  INDEX                      11          1      28297

SYS             I_DEPENDENCY1                  INDEX                      12          1      30609

SYS             I_DEPENDENCY1                  INDEX                      13          1      32297

SYS             I_DEPENDENCY1                  INDEX                      14          1      34353

SYS             I_DEPENDENCY1                  INDEX                      15          1      37129

SYS             I_DEPENDENCY1                  INDEX                      16          1      38665

SYS             I_DEPENDENCY1                  INDEX                      17          1      47113

SYS             I_DEPENDENCY1                  INDEX                      18          1      51721

SYS             I_DEPENDENCY2                  INDEX                       0          1        937

SYS             I_DEPENDENCY2                  INDEX                       1          1       6337

SYS             I_DEPENDENCY2                  INDEX                       2          1       6657

SYS             I_DEPENDENCY2                  INDEX                       3          1       6961

SYS             I_DEPENDENCY2                  INDEX                       4          1       9545

SYS             I_DEPENDENCY2                  INDEX                       5          1      11481

SYS             I_DEPENDENCY2                  INDEX                       6          1      13281

SYS             I_DEPENDENCY2                  INDEX                       7          1      14369

SYS             I_DEPENDENCY2                  INDEX                       8          1      14841

SYS             I_DEPENDENCY2                  INDEX                       9          1      16617

SYS             I_DEPENDENCY2                  INDEX                      10          1      18409

SYS             I_DEPENDENCY2                  INDEX                      11          1      19889

SYS             I_DEPENDENCY2                  INDEX                      12          1      26681

SYS             I_DEPENDENCY2                  INDEX                      13          1      28129

SYS             I_DEPENDENCY2                  INDEX                      14          1      29369

SYS             I_DEPENDENCY2                  INDEX                      15          1      30649

SYS             I_DEPENDENCY2                  INDEX                      16          1      32137

SYS             I_DEPENDENCY2                  INDEX                      17          1      45449

SYS             I_DEPENDENCY2                  INDEX                      18          1      51593

SYS             I_DEPENDENCY2                  INDEX                      19          1      59785

SYSMAN          PK_MGMT_METRIC_DEPENDENCY_DEF  INDEX                       0          3      26201

SYSMAN          PK_MGMT_METRIC_DEPENDENCY      INDEX                       0          3      26217

 

65 rows selected.

 

SYS@lunar>

Now let's simulate i_dependency1 becoming unusable on 10.2 (the test environment is 10.2.0.1):

[oracle@lunar oracle]$ ss

 

SQL*Plus: Release 10.2.0.1.0 – Production on Mon Dec 22 08:10:07 2014

 

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

 

 

Connected to:

Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 – Production

With the Partitioning, OLAP and Data Mining options

 

SYS@lunar>alter index i_dependency1 unusable;

 

Index altered.

 

SYS@lunar>select owner,table_name,index_name,tablespace_name,status from dba_indexes where table_name='DEPENDENCY$';

 

OWNER           TABLE_NAME                     INDEX_NAME                     TABLESPACE_NAME                STATUS

————— —————————— —————————— —————————— ——–

SYS             DEPENDENCY$                    I_DEPENDENCY1                  SYSTEM                         UNUSABLE

SYS             DEPENDENCY$                    I_DEPENDENCY2                  SYSTEM                         VALID

 

SYS@lunar>select obj#,flags,to_char(flags,'xxxxxxxxxxxxx') from ind$ where obj# in (92,122,123);

 

OBJ#      FLAGS TO_CHAR(FLAGS,

---------- ---------- --------------

122       2051            803   -- 0x803 = 0x800 + 0x002 + 0x001; the 0x01 bit (/* unusable (dls) */) shows the index is marked unusable

123       2050            802

 

SYS@lunar>

After running alter index i_dependency1 unusable, alert.log shows the following:

Mon Dec 22 09:08:37 2014

Index SYS.I_DEPENDENCY1 or some [sub]partitions of the index have been marked unusable

 

[oracle@lunar bdump]$ date

Mon Dec 22 09:11:29 CST 2014

[oracle@lunar bdump]$

The definition of the flags column in ind$ (it can be found in the base-table creation scripts: sql.bsq in 10.2, dcore.bsq from 11.1 on):

flags         number not null,

/* mutable flags: anything permanent should go into property */

/* unusable (dls) : 0x01 */

/* analyzed       : 0x02 */

/* no logging     : 0x04 */

/* index is currently being built : 0x08 */

/* index creation was incomplete : 0x10 */

/* key compression enabled : 0x20 */

/* user-specified stats : 0x40 */

/* secondary index on IOT : 0x80 */

/* index is being online built : 0x100 */

/* index is being online rebuilt : 0x200 */

/* index is disabled : 0x400 */   -- note this flag: it marks an index as disabled (1024 in decimal)

/* global stats : 0x800 */

/* fake index(internal) : 0x1000 */

/* index on UROWID column(s) : 0x2000 */

/* index with large key : 0x4000 */

/* move partitioned rows in base table : 0x8000 */

/* index usage monitoring enabled : 0x10000 */

/* 4 bits reserved for bitmap index version : 0x1E0000 */

/* Delayed Segment Creation: 0x4000000 */

From the above, the definition of ind$.flags itself has not changed; only the internal handling changed.
This flag is also useful in practice. For example, on Oracle 9.2.0.4, when a bug involving the SYS XXX index prevents the database from starting, the database can be recovered with this flag.
Likewise, before 10g, cancelling an online index build was very painful, and the same approach can be used there.
From 10.2 on, dbms_repair.online_index_clean can clean up the index state, although this function is not covered in the 10.2 documentation (desc dbms_repair shows it).
Oracle's note on this is Bug 3965042 : NEW FUNCTION DBMS_REPAIR.ONLINE_INDEX_CLEAN NEEDS TO BE DOCUMENTED.
ALTER TABLE MOVE ONLINE on an IOT table can run into a similar problem, with the same workaround.
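The bit meanings above can be decoded mechanically. A minimal sketch, with the bit map transcribed from the ind$ comments above (only the bits relevant here are included):

```python
# Sketch: decode an ind$.flags bitmask using the bit meanings
# documented in the sql.bsq / dcore.bsq comments above.
IND_FLAG_BITS = {
    0x001: "unusable (dls)",
    0x002: "analyzed",
    0x004: "no logging",
    0x008: "being built",
    0x010: "creation incomplete",
    0x020: "key compression enabled",
    0x040: "user-specified stats",
    0x080: "secondary index on IOT",
    0x100: "being online built",
    0x200: "being online rebuilt",
    0x400: "index is disabled",
    0x800: "global stats",
}

def decode_ind_flags(flags):
    """Return the names of all flag bits set in an ind$.flags value."""
    return [name for bit, name in IND_FLAG_BITS.items() if flags & bit]

# 2051 = 0x803: the value I_DEPENDENCY1 shows after being marked unusable
print(decode_ind_flags(2051))   # ['unusable (dls)', 'analyzed', 'global stats']
print(decode_ind_flags(1024))   # ['index is disabled']
```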

At this point, starting the database reports ORA-00704 ORA-00604 ORA-01502: index 'SYS.I_DEPENDENCY1' or partition of such index is in unusable state:

SYS@lunar>shutdown abort

ORACLE instance shut down.

SYS@lunar>startup mount

ORACLE instance started.

 

Total System Global Area  167772160 bytes

Fixed Size                  1218316 bytes

Variable Size              71305460 bytes

Database Buffers           92274688 bytes

Redo Buffers                2973696 bytes

Database mounted.

SYS@lunar>oradebug setmypid

alter session set db_file_multiblocK_read_count=1;

alter session set tracefile_identifier='lunar';

Statement processed.

SYS@lunar>

Session altered.

 

SYS@lunar>oradebug event 10046 trace name context forever,level 12;

 

Session altered.

 

SYS@lunar>Statement processed.

SYS@lunar>oradebug tracefile_name

/home/oracle/oracle/product/admin/orcl/udump/orcl_ora_19783_lunar.trc

SYS@lunar>alter database open;

alter database open

*

ERROR at line 1:

ORA-01092: ORACLE instance terminated. Disconnection forced

 

 

SYS@lunar>

The alert.log shows the following:

Mon Dec 22 09:13:29 2014

SMON: enabling cache recovery

Mon Dec 22 09:13:29 2014

ARC2: Archival started

ARC0: STARTING ARCH PROCESSES COMPLETE

ARC0: Becoming the heartbeat ARCH

ARC2 started with pid=17, OS id=19789

Mon Dec 22 09:13:32 2014

Deleted Oracle managed file /home/oracle/oracle/product/flash_recovery_area/ORCL/archivelog/2009_11_21/o1_mf_1_42_5jg30x9m_.arc

Mon Dec 22 09:13:32 2014

Errors in file /home/oracle/oracle/product/admin/orcl/udump/orcl_ora_19783_lunar.trc:

ORA-00704: bootstrap process failure

ORA-00604: error occurred at recursive SQL level 1

ORA-01502: index 'SYS.I_DEPENDENCY1' or partition of such index is in unusable state

Mon Dec 22 09:13:32 2014

Error 704 happened during db open, shutting down database

USER: terminating instance due to error 704

Instance terminated by USER, pid = 19783

ORA-1092 signalled during: alter database open…

Trace file information:

PARSING IN CURSOR #5 len=179 dep=1 uid=0 oct=3 lid=0 tim=1385948058756927 hv=2812844157 ad='29b07834'

select owner#,name,namespace,remoteowner,linkname,p_timestamp,p_obj#, nvl(property,0),subname,d_attrs from dependency$ d, obj$ o where d_obj#=:1 and p_obj#=obj#(+) order by order#

END OF STMT

PARSE #5:c=6999,e=70263,p=8,cr=40,cu=0,mis=1,r=0,dep=1,og=4,tim=1385948058756919

=====================

PARSING IN CURSOR #2 len=84 dep=2 uid=0 oct=3 lid=0 tim=1385948058757830 hv=2686874206 ad='29b0652c'

select o.name, u.name from obj$ o, user$ u  where o.obj# = :1 and o.owner# = u.user#

END OF STMT

PARSE #2:c=1000,e=266,p=0,cr=0,cu=0,mis=1,r=0,dep=2,og=4,tim=1385948058757826

BINDS #2:

kkscoacd

Bind#0

oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00

oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0

kxsbbbfp=b72f40bc  bln=22  avl=03  flg=05

value=122  -- obj#=122 is exactly I_DEPENDENCY1

EXEC #2:c=0,e=658,p=0,cr=0,cu=0,mis=1,r=0,dep=2,og=4,tim=1385948058758543

WAIT #2: nam=’db file sequential read’ ela= 15160 file#=1 block#=98 blocks=1 obj#=-1 tim=1385948058773774

WAIT #2: nam=’db file sequential read’ ela= 481 file#=1 block#=90 blocks=1 obj#=-1 tim=1385948058774587

FETCH #2:c=2000,e=16086,p=2,cr=5,cu=0,mis=0,r=1,dep=2,og=4,tim=1385948058774648

STAT #2 id=1 cnt=1 pid=0 pos=1 obj=0 op='NESTED LOOPS  (cr=5 pr=2 pw=0 time=16090 us)'

STAT #2 id=2 cnt=1 pid=1 pos=1 obj=18 op='TABLE ACCESS BY INDEX ROWID OBJ$ (cr=3 pr=0 pw=0 time=28 us)'

STAT #2 id=3 cnt=1 pid=2 pos=1 obj=36 op='INDEX UNIQUE SCAN I_OBJ1 (cr=2 pr=0 pw=0 time=16 us)'

STAT #2 id=4 cnt=1 pid=1 pos=2 obj=22 op='TABLE ACCESS CLUSTER USER$ (cr=2 pr=2 pw=0 time=16050 us)'

STAT #2 id=5 cnt=1 pid=4 pos=1 obj=11 op='INDEX UNIQUE SCAN I_USER# (cr=1 pr=1 pw=0 time=15264 us)'

ORA-00704: bootstrap process failure

ORA-00604: error occurred at recursive SQL level 1

ORA-01502: index 'SYS.I_DEPENDENCY1' or partition of such index is in unusable state

EXEC #1:c=252962,e=5328707,p=155,cr=730,cu=0,mis=0,r=0,dep=0,og=1,tim=1385948059778098

ERROR #1:err=1092 tim=432829200

At this point the database may fail to open because of an error in the following recursive SQL:
select owner#,name,namespace,remoteowner,linkname,p_timestamp,p_obj#, nvl(property,0),subname,d_attrs
from dependency$ d, obj$ o where d_obj#=:1 and p_obj#=obj#(+) order by order#

The bind variable here is 122, which is I_DEPENDENCY1.

Now start the database in upgrade mode and set flags=1024 (that is, mark the index as disabled). Let's try it:

[oracle@lunar oracle]$ ss

 

SQL*Plus: Release 10.2.0.1.0 – Production on Mon Dec 22 09:22:06 2014

 

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

 

Connected to an idle instance.

 

SYS@lunar>startup mount

ORACLE instance started.

 

Total System Global Area  167772160 bytes

Fixed Size                  1218316 bytes

Variable Size              71305460 bytes

Database Buffers           92274688 bytes

Redo Buffers                2973696 bytes

Database mounted.

SYS@lunar>oradebug setmypid

Statement processed.

SYS@lunar>alter session set db_file_multiblocK_read_count=1;

 

Session altered.

 

SYS@lunar>alter session set tracefile_identifier='lunar';

 

Session altered.

 

SYS@lunar>oradebug event 10046 trace name context forever,level 12;

Statement processed.

SYS@lunar>oradebug tracefile_name

/home/oracle/oracle/product/admin/orcl/udump/orcl_ora_19834_lunar.trc

SYS@lunar>alter database open upgrade;

 

Database altered.

 

SYS@lunar>

The alert.log shows the following:

Mon Dec 22 09:23:02 2014

Successfully onlined Undo Tablespace 1.

Mon Dec 22 09:23:02 2014

SMON: enabling tx recovery

Mon Dec 22 09:23:03 2014

Database Characterset is ZHS16GBK

Mon Dec 22 09:23:06 2014

Stopping background process MMNL

Mon Dec 22 09:23:06 2014

Stopping background process MMON

Starting background process MMON

MMON started with pid=10, OS id=19842

Mon Dec 22 09:23:06 2014

Starting background process MMNL

MMNL started with pid=11, OS id=19844

Mon Dec 22 09:23:06 2014

ALTER SYSTEM enable restricted session;

Mon Dec 22 09:23:07 2014

ALTER SYSTEM SET _system_trig_enabled=FALSE SCOPE=MEMORY;

Mon Dec 22 09:23:07 2014

ALTER SYSTEM SET aq_tm_processes=0 SCOPE=MEMORY;

Mon Dec 22 09:23:07 2014

ALTER SYSTEM SET resource_manager_plan='' SCOPE=MEMORY;

replication_dependency_tracking turned off (no async multimaster replication found)

kwqiconfy: Warning AQ Kchunk open notifier failed with 1502

XDB UNINITIALIZED: XDB$SCHEMA not accessible

Completed: alter database open upgrade


 

 

 

The database is now open in upgrade mode. Searching for the statement that failed earlier shows what it looks like in upgrade mode:


=====================

PARSING IN CURSOR #4 len=221 dep=2 uid=0 oct=3 lid=0 tim=1385948615586776 hv=1926936385 ad='29b04538'

select /* ordered use_nl(o) index(d) index(o) */ owner#,name,namespace,remoteowner,linkname,p_timestamp,p_obj#, nvl(property,0),subname,d_attrs from dependency$ d, obj$ o where d_obj#=:1 and p_obj#=obj#(+) order by order#

END OF STMT

PARSE #4:c=2999,e=3438,p=4,cr=35,cu=0,mis=1,r=0,dep=2,og=4,tim=1385948615586774

=====================

PARSING IN CURSOR #2 len=84 dep=3 uid=0 oct=3 lid=0 tim=1385948615587498 hv=2686874206 ad='29b0396c'

select o.name, u.name from obj$ o, user$ u  where o.obj# = :1 and o.owner# = u.user#

END OF STMT

PARSE #2:c=0,e=279,p=0,cr=0,cu=0,mis=1,r=0,dep=3,og=4,tim=1385948615587494

BINDS #2:

kkscoacd

Bind#0

oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00

oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0

kxsbbbfp=b725753c  bln=22  avl=03  flg=05

value=122

EXEC #2:c=1000,e=534,p=0,cr=0,cu=0,mis=1,r=0,dep=3,og=4,tim=1385948615588087

FETCH #2:c=0,e=56,p=0,cr=5,cu=0,mis=0,r=1,dep=3,og=4,tim=1385948615588159

STAT #2 id=1 cnt=1 pid=0 pos=1 obj=0 op='NESTED LOOPS  (cr=5 pr=0 pw=0 time=60 us)'

STAT #2 id=2 cnt=1 pid=1 pos=1 obj=18 op='TABLE ACCESS BY INDEX ROWID OBJ$ (cr=3 pr=0 pw=0 time=26 us)'

STAT #2 id=3 cnt=1 pid=2 pos=1 obj=36 op='INDEX UNIQUE SCAN I_OBJ1 (cr=2 pr=0 pw=0 time=15 us)'

STAT #2 id=4 cnt=1 pid=1 pos=2 obj=22 op='TABLE ACCESS CLUSTER USER$ (cr=2 pr=0 pw=0 time=23 us)'

STAT #2 id=5 cnt=1 pid=4 pos=1 obj=11 op='INDEX UNIQUE SCAN I_USER# (cr=1 pr=0 pw=0 time=6 us)'

The exact SQL is:

select /* ordered use_nl(o) index(d) index(o) */ owner#,name,namespace,remoteowner,linkname,p_timestamp,p_obj#, nvl(property,0),subname,d_attrs

from dependency$ d, obj$ o where d_obj#=:1 and p_obj#=obj#(+) order by order#

Querying dba_objects raises ORA-00604 ORA-01502: index 'SYS.I_DEPENDENCY1' or partition of such index is in unusable state.
The reason, as shown above, is that the index is corrupted but Oracle still tries to use it:

SYS@lunar>select owner,object_id,object_name,object_type from dba_objects where object_id in (92,122,123);

select owner,object_id,object_name,object_type from dba_objects where object_id in (92,122,123)

*

ERROR at line 1:

ORA-00604: error occurred at recursive SQL level 1

ORA-01502: index 'SYS.I_DEPENDENCY1' or partition of such index is in unusable state

SYS@lunar>

So why can Oracle start at all?
When Oracle runs in upgrade mode, several operations differ from a normal startup.
In a normal startup, the recursive SQL that fetches OBJ# and USER# uses an execution plan that scans DEPENDENCY$ via I_DEPENDENCY1,
so with that index unusable the database cannot open. When the database is started in upgrade mode, the statement carries the /* ordered use_nl(o) index(d) index(o) */ hint
and OBJ# is obtained via oid$, so the unusable SYS.I_DEPENDENCY1 index is not touched and the database can open in upgrade mode.
tkprof shows the execution plan of this recursive SQL:

select /* ordered use_nl(o) index(d) index(o) */ owner#,name,namespace,

remoteowner,linkname,p_timestamp,p_obj#, nvl(property,0),subname,d_attrs

from

dependency$ d, obj$ o where d_obj#=:1 and p_obj#=obj#(+) order by order#

 

 

call     count       cpu    elapsed       disk      query    current        rows

——- ——  ——– ———- ———- ———- ———-  ———-

Parse        5      0.00       0.00          0          0          0           0

Execute      0      0.00       0.00          0          0          0           0

Fetch        0      0.00       0.00          0          0          0           0

——- ——  ——– ———- ———- ———- ———-  ———-

total        5      0.00       0.00          0          0          0           0

 

Misses in library cache during parse: 5

Optimizer mode: CHOOSE

Parsing user id: SYS   (recursive depth: 2)

********************************************************************************

 

select obj#

from

oid$ where user#=:1 and oid$=:2

 

 

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      2      0.00       0.00          0          0          0           0
Fetch        2      0.00       0.01          3          5          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        5      0.00       0.02          3          5          0           1

 

Misses in library cache during parse: 1

Misses in library cache during execute: 1

Optimizer mode: CHOOSE

Parsing user id: SYS   (recursive depth: 1)

 

Rows     Row Source Operation
-------  ---------------------------------------------------
      0  TABLE ACCESS BY INDEX ROWID OID$ (cr=2 pr=2 pw=0 time=7433 us)
      0   INDEX UNIQUE SCAN I_OID1 (cr=2 pr=2 pw=0 time=7419 us)(object id 179)

 

 

Elapsed times include waiting on following events:

Event waited on                             Times   Max. Wait  Total Waited
----------------------------------------   Waited  ----------  ------------
db file sequential read                         3        0.00          0.01

The definition of oid$ is as follows:

create table oid$                    /* OID mapping table for schema objects */
(
user#         number not null,   /* user this mapping is for (user$.user#) */
oid$          raw(16) not null,        /* OID for typed table/view or type */
obj#          number not null)         /* target object number (obj$.obj#) */
/* key: (user#, oid$) */
/

Now recover the index as follows:

[oracle@lunar oracle]$ ss

 

SQL*Plus: Release 10.2.0.1.0 - Production on Mon Dec 22 08:10:07 2014

 

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

 

 

Connected to:

Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production

With the Partitioning, OLAP and Data Mining options

 

SYS@lunar>select obj# from obj$ where name='I_DEPENDENCY1';

 

OBJ#

----------

122

 

 

SYS@lunar>update ind$ set flags=1024 where obj#=122;

 

1 row updated.

 

SYS@lunar>select flags,obj# from ind$ where obj#=122;

 

FLAGS       OBJ#

---------- ----------

1024        122

 

SYS@lunar>commit;

 

Commit complete.

 

SYS@lunar>

 

Restart the database in upgrade mode and rebuild the index:

SYS@lunar>alter index i_dependency1 rebuild

2  ;

 

Index altered.

 

SYS@lunar>analyze table dependency$ validate structure cascade;

 

Table analyzed.

 

SYS@lunar>select owner,table_name,index_name,tablespace_name,status from dba_indexes where table_name='DEPENDENCY$';

 

OWNER           TABLE_NAME                     INDEX_NAME                     TABLESPACE_NAME                STATUS

--------------- ------------------------------ ------------------------------ ------------------------------ --------

SYS             DEPENDENCY$                    I_DEPENDENCY1                  SYSTEM                         VALID

SYS             DEPENDENCY$                    I_DEPENDENCY2                  SYSTEM                         VALID

 

SYS@lunar>

At this point the database has been fully recovered.

 

ORA-15042 ORA-15040 ORA-15032 when adding a disk to an ASM diskgroup

There is a scenario in which an Oracle ASM ADD DISK operation completes normally while the diskgroup rebalance has not yet started, but a hardware fault on the newly added disk then destroys all of the metadata that ADD DISK wrote to its disk header. Because the diskgroup uses EXTERNAL REDUNDANCY, and ASM considers the disk a full member of the diskgroup, the total loss of metadata on that one disk leaves the whole diskgroup unable to MOUNT.

Moreover, because every piece of metadata on the newly added disk is lost, not just the KFBTYP_DISKHEAD block in the disk header, the problem cannot be worked around simply by using kfed to merge another disk's header information into a corrected KFBTYP_DISKHEAD; special manual processing is needed to bypass it.

For example:

 

SUCCESS: diskgroup TESTDG03 was created
NOTE: cache deleting context for group TESTDG03 1/0x86485c30
NOTE: cache registered group TESTDG03 number=1 incarn=0xab385c36
NOTE: cache began mount (first) of group TESTDG03 number=1 incarn=0xab385c36
NOTE: Assigning number (1,3) to disk (/oracleasm/asm-disk04)
NOTE: Assigning number (1,2) to disk (/oracleasm/asm-disk03)
NOTE: Assigning number (1,1) to disk (/oracleasm/asm-disk02)
NOTE: Assigning number (1,0) to disk (/oracleasm/asm-disk01)
Thu Jan 29 08:21:07 2015
NOTE: GMON heartbeating for grp 1
GMON querying group 1 at 92 for pid 20, osid 20176
Thu Jan 29 08:21:07 2015
NOTE: cache opening disk 0 of grp 1: TESTDG03_0000 path:/oracleasm/asm-disk01
NOTE: F1X0 found on disk 0 au 2 fcn 0.0
NOTE: cache opening disk 1 of grp 1: TESTDG03_0001 path:/oracleasm/asm-disk02
NOTE: cache opening disk 2 of grp 1: TESTDG03_0002 path:/oracleasm/asm-disk03
NOTE: cache opening disk 3 of grp 1: TESTDG03_0003 path:/oracleasm/asm-disk04
NOTE: cache mounting (first) external redundancy group 1/0xAB385C36 (TESTDG03)
NOTE: cache recovered group 1 to fcn 0.0
NOTE: redo buffer size is 256 blocks (1053184 bytes)
Thu Jan 29 08:21:07 2015
NOTE: LGWR attempting to mount thread 1 for diskgroup 1 (TESTDG03)
NOTE: LGWR found thread 1 closed at ABA 0.10750
NOTE: LGWR mounted thread 1 for diskgroup 1 (TESTDG03)
NOTE: LGWR opening thread 1 at fcn 0.0 ABA 2.0
NOTE: setting 11.2 start ABA for group TESTDG03 thread 1 to 2.0
NOTE: cache mounting group 1/0xAB385C36 (TESTDG03) succeeded
NOTE: cache ending mount (success) of group TESTDG03 number=1 incarn=0xab385c36
GMON querying group 1 at 93 for pid 13, osid 4612
Thu Jan 29 08:21:07 2015
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 1
SUCCESS: diskgroup TESTDG03 was mounted
SUCCESS: CREATE DISKGROUP TESTDG03 EXTERNAL REDUNDANCY  DISK '/oracleasm/asm-disk01' SIZE 129500M ,
'/oracleasm/asm-disk02' SIZE 128800M ,
'/oracleasm/asm-disk03' SIZE 129200M ,
'/oracleasm/asm-disk04' SIZE 128800M  ATTRIBUTE 'compatible.asm'='11.2.0.0.0','au_size'='1M' /* ASMCA */
Thu Jan 29 08:21:07 2015
NOTE: diskgroup resource ora.TESTDG03.dg is online
NOTE: diskgroup resource ora.TESTDG03.dg is updated
Thu Jan 29 08:21:23 2015
SQL> alter diskgroup testdg03  add disk '/oracleasm/asm-disk06' 
ORA-15032: not all alterations performed
ORA-15260: permission denied on ASM disk group
ERROR: alter diskgroup testdg03  add disk '/oracleasm/asm-disk06'
Thu Jan 29 08:21:31 2015
SQL> alter diskgroup testdg03  add disk '/oracleasm/asm-disk06' 
NOTE: Assigning number (1,4) to disk (/oracleasm/asm-disk06)
NOTE: requesting all-instance membership refresh for group=1
NOTE: initializing header on grp 1 disk TESTDG03_0004
NOTE: requesting all-instance disk validation for group=1
Thu Jan 29 08:21:32 2015
NOTE: skipping rediscovery for group 1/0xab385c36 (TESTDG03) on local instance.
NOTE: requesting all-instance disk validation for group=1
NOTE: skipping rediscovery for group 1/0xab385c36 (TESTDG03) on local instance.
NOTE: initiating PST update: grp = 1
Thu Jan 29 08:21:32 2015
GMON updating group 1 at 94 for pid 21, osid 22706
NOTE: PST update grp = 1 completed successfully 
NOTE: membership refresh pending for group 1/0xab385c36 (TESTDG03)
GMON querying group 1 at 95 for pid 13, osid 4612
NOTE: cache opening disk 4 of grp 1: TESTDG03_0004 path:/oracleasm/asm-disk06
GMON querying group 1 at 96 for pid 13, osid 4612
SUCCESS: refreshed membership for 1/0xab385c36 (TESTDG03)
SUCCESS: alter diskgroup testdg03  add disk '/oracleasm/asm-disk06'
NOTE: Attempting voting file refresh on diskgroup TESTDG03
Thu Jan 29 08:22:09 2015
SQL> alter diskgroup testdg03 dismount 
NOTE: cache dismounting (clean) group 1/0xAB385C36 (TESTDG03) 
NOTE: messaging CKPT to quiesce pins Unix process pid: 22730, image: oracle@mlab2.oracle.com (TNS V1-V3)
Thu Jan 29 08:22:10 2015
NOTE: LGWR doing clean dismount of group 1 (TESTDG03)
NOTE: LGWR closing thread 1 of diskgroup 1 (TESTDG03) at ABA 2.15
NOTE: cache dismounted group 1/0xAB385C36 (TESTDG03) 
Thu Jan 29 08:22:10 2015
GMON dismounting group 1 at 97 for pid 21, osid 22730
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
SUCCESS: diskgroup TESTDG03 was dismounted
NOTE: cache deleting context for group TESTDG03 1/0xab385c36
Thu Jan 29 08:22:10 2015
NOTE: diskgroup resource ora.TESTDG03.dg is offline
SUCCESS: alter diskgroup testdg03 dismount
NOTE: diskgroup resource ora.TESTDG03.dg is updated
SQL> alter diskgroup testdg03 mount 
NOTE: cache registered group TESTDG03 number=1 incarn=0x83f85c5f
NOTE: cache began mount (first) of group TESTDG03 number=1 incarn=0x83f85c5f
NOTE: Assigning number (1,3) to disk (/oracleasm/asm-disk04)
NOTE: Assigning number (1,2) to disk (/oracleasm/asm-disk03)
NOTE: Assigning number (1,1) to disk (/oracleasm/asm-disk02)
NOTE: Assigning number (1,0) to disk (/oracleasm/asm-disk01)
Thu Jan 29 08:22:22 2015
NOTE: GMON heartbeating for grp 1
GMON querying group 1 at 100 for pid 21, osid 22730
Thu Jan 29 08:22:22 2015
NOTE: Assigning number (1,4) to disk ()
GMON querying group 1 at 101 for pid 21, osid 22730
NOTE: cache dismounting (clean) group 1/0x83F85C5F (TESTDG03) 
NOTE: messaging CKPT to quiesce pins Unix process pid: 22730, image: oracle@mlab2.oracle.com (TNS V1-V3)
NOTE: dbwr not being msg'd to dismount
NOTE: lgwr not being msg'd to dismount
NOTE: cache dismounted group 1/0x83F85C5F (TESTDG03) 
NOTE: cache ending mount (fail) of group TESTDG03 number=1 incarn=0x83f85c5f
NOTE: cache deleting context for group TESTDG03 1/0x83f85c5f
GMON dismounting group 1 at 102 for pid 21, osid 22730
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
ERROR: diskgroup TESTDG03 was not mounted
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "4" is missing from group number "1" 
ERROR: alter diskgroup testdg03 mount
Thu Jan 29 08:27:37 2015
SQL> alter diskgroup testdg03 mount 
NOTE: cache registered group TESTDG03 number=1 incarn=0x56985c64
NOTE: cache began mount (first) of group TESTDG03 number=1 incarn=0x56985c64
NOTE: Assigning number (1,3) to disk (/oracleasm/asm-disk04)
NOTE: Assigning number (1,2) to disk (/oracleasm/asm-disk03)
NOTE: Assigning number (1,1) to disk (/oracleasm/asm-disk02)
NOTE: Assigning number (1,0) to disk (/oracleasm/asm-disk01)
Thu Jan 29 08:27:43 2015
NOTE: GMON heartbeating for grp 1
GMON querying group 1 at 105 for pid 21, osid 23017
NOTE: cache opening disk 0 of grp 1: TESTDG03_0000 path:/oracleasm/asm-disk01
NOTE: F1X0 found on disk 0 au 2 fcn 0.0
NOTE: cache opening disk 1 of grp 1: TESTDG03_0001 path:/oracleasm/asm-disk02
NOTE: cache opening disk 2 of grp 1: TESTDG03_0002 path:/oracleasm/asm-disk03
NOTE: cache opening disk 3 of grp 1: TESTDG03_0003 path:/oracleasm/asm-disk04
NOTE: cache mounting (first) external redundancy group 1/0x56985C64 (TESTDG03)
NOTE: cache recovered group 1 to fcn 0.609
NOTE: redo buffer size is 256 blocks (1053184 bytes)
Thu Jan 29 08:27:43 2015
NOTE: LGWR attempting to mount thread 1 for diskgroup 1 (TESTDG03)
NOTE: LGWR found thread 1 closed at ABA 2.15
NOTE: LGWR mounted thread 1 for diskgroup 1 (TESTDG03)
NOTE: LGWR opening thread 1 at fcn 0.609 ABA 3.16
NOTE: cache mounting group 1/0x56985C64 (TESTDG03) succeeded
NOTE: cache ending mount (success) of group TESTDG03 number=1 incarn=0x56985c64
GMON querying group 1 at 106 for pid 13, osid 4612
Thu Jan 29 08:27:43 2015
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 1
SUCCESS: diskgroup TESTDG03 was mounted
SUCCESS: alter diskgroup testdg03 mount
Thu Jan 29 08:27:43 2015
NOTE: diskgroup resource ora.TESTDG03.dg is online
NOTE: diskgroup resource ora.TESTDG03.dg is updated
Thu Jan 29 08:33:52 2015
SQL> alter diskgroup testdg03 check all norepair 
NOTE: starting check of diskgroup TESTDG03
Thu Jan 29 08:33:52 2015
GMON checking disk 0 for group 1 at 107 for pid 21, osid 23017
GMON checking disk 1 for group 1 at 108 for pid 21, osid 23017
GMON checking disk 2 for group 1 at 109 for pid 21, osid 23017
GMON checking disk 3 for group 1 at 110 for pid 21, osid 23017
ERROR: no kfdsk for  (4)
ERROR: check of diskgroup TESTDG03 found 1 total errors
ORA-15049: diskgroup "TESTDG03" contains 1 error(s)
ORA-15032: not all alterations performed
ORA-15049: diskgroup "TESTDG03" contains 1 error(s)
ERROR: alter diskgroup testdg03 check all norepair
Thu Jan 29 08:34:07 2015
SQL> alter diskgroup testdg03 check all 
NOTE: starting check of diskgroup TESTDG03
Thu Jan 29 08:34:07 2015
GMON checking disk 0 for group 1 at 111 for pid 21, osid 23017
GMON checking disk 1 for group 1 at 112 for pid 21, osid 23017
GMON checking disk 2 for group 1 at 113 for pid 21, osid 23017
GMON checking disk 3 for group 1 at 114 for pid 21, osid 23017
ERROR: no kfdsk for  (4)
ERROR: check of diskgroup TESTDG03 found 1 total errors
ORA-15049: diskgroup "TESTDG03" contains 1 error(s)
ORA-15032: not all alterations performed
ORA-15049: diskgroup "TESTDG03" contains 1 error(s)
ERROR: alter diskgroup testdg03 check all
SQL> alter diskgroup testdg03 check all repair 
NOTE: starting check of diskgroup TESTDG03
GMON checking disk 0 for group 1 at 115 for pid 21, osid 23017
GMON checking disk 1 for group 1 at 116 for pid 21, osid 23017
GMON checking disk 2 for group 1 at 117 for pid 21, osid 23017
GMON checking disk 3 for group 1 at 118 for pid 21, osid 23017
ERROR: no kfdsk for  (4)
ERROR: check of diskgroup TESTDG03 found 1 total errors
ORA-15049: diskgroup "TESTDG03" contains 1 error(s)
ORA-15032: not all alterations performed
ORA-15049: diskgroup "TESTDG03" contains 1 error(s)
ERROR: alter diskgroup testdg03 check all repair

[oracle@mlab2 oracleasm]$ oerr ora 15042
15042, 00000, "ASM disk \"%s\" is missing from group number \"%s\""
// *Cause:  The specified disk, which is a necessary part of a diskgroup,
//          could not be found on the system.
// *Action: Check the hardware configuration.
//

The ORA-15042 error occurs precisely because all metadata on the added disk was lost. Ironically, since the rebalance may not even have started, the new disk may hold no genuinely meaningful data at all; but because ASM regards the disk as already added, the disk must be available before the diskgroup can be mounted. The user cannot even force-DROP the disk, because dropping a disk requires the diskgroup to be in MOUNT state. This becomes a chicken-and-egg deadlock: to DROP the disk you must MOUNT the diskgroup, but to MOUNT the diskgroup you must first DROP the disk.

 

This problem generally requires a ParnassusData engineer to manually modify the ASM metadata to bypass it; alternatively, an earlier copy of the ASM metadata can be used if one exists.

 

If you cannot work it out yourself, the ParnassusData professional ORACLE database recovery team can help you recover!

ParnassusData professional database recovery team

Service hotline: 13764045638   QQ: 47079569   Email: service@parnassusdata.com

What does ASM kfed repair actually do?

Oracle's official description of the kfed repair command is terse: "Recover the disk header from the redundant copy of it maintained on an unused portion of the disk." The command restores the first 4096 bytes of the disk header, the KFBTYP_DISKHEAD structure. This recovery relies on the automatic disk header backup mechanism introduced in 10.2.0.5, which keeps a copy of KFBTYP_DISKHEAD in the penultimate block of the PST AU, i.e. AU 1 ("Read from PST (AU 1)'s penultimate Block").
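
The byte offset of that backup block follows directly from the AU size and the 4 KB metadata block size. A minimal arithmetic sketch (plain Python, not any Oracle API; the defaults assume the 1 MB AU used by this diskgroup):

```python
def backup_header_offset(au_size=1024 * 1024, blk_size=4096):
    """Byte offset of the automatic KFBTYP_DISKHEAD backup: the
    penultimate metadata block of AU 1 (kfed: aun=1 blkn=au_blocks-2)."""
    au_blocks = au_size // blk_size        # 256 blocks per 1 MB AU
    backup_blkn = au_blocks - 2            # penultimate block -> blkn 254
    return 1 * au_size + backup_blkn * blk_size

print(backup_header_offset())  # 2088960, the offset kfed reads in the strace later in this post
```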

 

For example:

[oracle@mlab2 oracleasm]$ kfed read asm-disk04 aun=1 blkn=254|less
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:              2147483651 ; 0x008: disk=3
kfbh.check:                    98849704 ; 0x00c: 0x05e453a8
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr:         ORCLDISK ; 0x000: length=8
kfdhdb.driver.reserved[0]:            0 ; 0x008: 0x00000000
kfdhdb.driver.reserved[1]:            0 ; 0x00c: 0x00000000
kfdhdb.driver.reserved[2]:            0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]:            0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]:            0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]:            0 ; 0x01c: 0x00000000
kfdhdb.compat:                168820736 ; 0x020: 0x0a100000
kfdhdb.dsknum:                        3 ; 0x024: 0x0003
kfdhdb.grptyp:                        3 ; 0x026: KFDGTP_HIGH
kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname:              DATA1_0003 ; 0x028: length=10
kfdhdb.grpname:                   DATA1 ; 0x048: length=5
kfdhdb.fgname:               DATA1_0003 ; 0x068: length=10
kfdhdb.capname:                         ; 0x088: length=0
kfdhdb.crestmp.hi:             33006980 ; 0x0a8: HOUR=0x4 DAYS=0xc MNTH=0x9 YEAR=0x7de
kfdhdb.crestmp.lo:           2555232256 ; 0x0ac: USEC=0x0 MSEC=0x370 SECS=0x4 MINS=0x26
kfdhdb.mntstmp.hi:             33008247 ; 0x0b0: HOUR=0x17 DAYS=0x13 MNTH=0xa YEAR=0x7de
kfdhdb.mntstmp.lo:           3341018112 ; 0x0b4: USEC=0x0 MSEC=0xf9 SECS=0x32 MINS=0x31
kfdhdb.secsize:                     512 ; 0x0b8: 0x0200
kfdhdb.blksize:                    4096 ; 0x0ba: 0x1000
kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000
kfdhdb.mfact:                    113792 ; 0x0c0: 0x0001bc80
kfdhdb.dsksize:                  128800 ; 0x0c4: 0x0001f720
kfdhdb.pmcnt:                         3 ; 0x0c8: 0x00000003
kfdhdb.fstlocn:                       1 ; 0x0cc: 0x00000001
kfdhdb.altlocn:                       2 ; 0x0d0: 0x00000002
:

Because ASM keeps this backup, kfed repair can automatically restore the first 4096 bytes of the ASM disk header. But since the backup covers only those 4096 bytes of metadata, it cannot cope with the loss of the large amount of other mandatory metadata beyond KFBTYP_DISKHEAD.

At a customer site we previously observed kfed repair also fixing some logical corruption in the PST's KFBTYP_PST_META, but we have never been able to reproduce this in our own environment.

 

We straced kfed repair to see exactly what it does:

 

[oracle@mlab2 oracleasm]$ dd if=/dev/zero of=asm-disk04 bs=4096 count=1 conv=notrunc
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 6.9811e-05 seconds, 58.7 MB/s
[oracle@mlab2 oracleasm]$ 
[oracle@mlab2 oracleasm]$ kfed read asm-disk04
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
7F70E2F06400 00000000 00000000 00000000 00000000  [................]
Repeat 255 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]
[oracle@mlab2 oracleasm]$ 
[oracle@mlab2 oracleasm]$ strace -o  kfed1.log kfed repair asm-disk04
munmap(0x7fd80aa2d000, 143360)          = 0
stat("asm-disk04", {st_mode=S_IFREG|0644, st_size=135056588800, ...}) = 0
access("asm-disk04", F_OK)              = 0
statfs("asm-disk04", {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=961422084, f_bfree=454635689, f_bavail=405808746, f_files=244187136, f_ffree=24
4182184, f_fsid={-138681668, -1790432782}, f_namelen=255, f_frsize=4096}) = 0
open("asm-disk04", O_RDWR)              = 7
lseek(7, 2088960, SEEK_SET)             = 2088960
read(7, "\1\202\1\1\0\0\0\0\3\0\0\200\250S\344\5\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
lseek(7, 0, SEEK_SET)                   = 0
read(7, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
lseek(7, 0, SEEK_SET)                   = 0
write(7, "\1\202\1\1\0\0\0\0\3\0\0\200\250S\344\5\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
close(7)

In the scenario above, kfed did not repair any KFBTYP_PST_META metadata: it merely read the 4096 bytes at offset 2088960 (2088960 / 4096 = 510 = 256 + 254, i.e. block 254, the penultimate 4 KB block of AU 1) and wrote them to offset 0.

 

 

We also examined the kfed source code below, and found no indication that it repairs metadata anywhere else:

 

#define KFEDOP_REPAIR ((kfedop)13)      /* Repair ASM disk header            */

case KFEDOP_REPAIR:
  /* Read from PST(AU 1)'s penultimate Block */
  cx->aunum_kfedcx  = (ub4)1;
  cx->blknum_kfedcx = (ub4)(bfact - 2);
  if (!kfedReadBlk(cx))
    goto done;
  /* Validate the Disk Header block read from PST */
  if (!kfedValidateBlk(cx, KFBTYP_DISKHEAD))
    goto done;
  /* Fix the block number and checksum in the buffer */
  if (!kfedFixBackupHeader(cx))
    goto done;
  /* Write to Disk Header(AU 0 and Block 0) */
  cx->aunum_kfedcx  = (ub4)0;
  cx->blknum_kfedcx = (ub4)0;
  if (!kfedWriteBlk(cx))
    goto done;
  break;

This code shows that the KFEDOP_REPAIR operation reads the penultimate block of the PST (AU 1); if the read fails it simply errors out. Otherwise it validates that the block read from the PST is a disk header (KFBTYP_DISKHEAD) block, fixes the block number and checksum in the buffer, and writes the result to the disk header location, AU 0 block 0.
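
As a sketch only, that read-validate-write sequence can be replayed with ordinary file I/O on a scratch file standing in for a real ASM disk. Here the validation is reduced to checking the kfbh.type byte, and the block-number/checksum fix-up performed by kfedFixBackupHeader is omitted:

```python
AU, BLK = 1048576, 4096
SRC_OFF = 1 * AU + (AU // BLK - 2) * BLK      # penultimate block of AU 1 = 2088960

def repair_header(path):
    """Copy the backup header block over the (damaged) header at AU 0, block 0."""
    with open(path, "r+b") as f:
        f.seek(SRC_OFF)
        blk = f.read(BLK)
        if blk[2] != 1:                       # kfbh.type byte: 1 = KFBTYP_DISKHEAD
            raise ValueError("backup block is not a disk header")
        f.seek(0)
        f.write(blk)                          # overwrite offset 0, as in the strace

# demo on a scratch "disk": zeroed header, fake backup block at SRC_OFF
with open("scratch.dsk", "wb") as f:
    f.truncate(2 * AU)
    f.seek(SRC_OFF)
    f.write(b"\x01\x82\x01" + b"\x00" * (BLK - 3))   # kfbh endian/hard/type bytes
repair_header("scratch.dsk")
```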

 

In short, if you really do hit this class of ASM problem: after fully backing up the disk headers (backing up the first 200 MB of each disk is essential), and provided the automatic disk header backup exists, try kfed repair first. If it does not succeed, a pile of metadata diagnosis and manual repair work lies ahead.
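
The "back up the first 200 MB" step is just a chunked raw copy. A hedged sketch (paths are placeholders; on a real system src would be the ASM disk device, equivalent to dd if=<disk> of=<backup> bs=1M count=200):

```python
def backup_disk_head(src, dst, mbytes=200, chunk=1024 * 1024):
    """Copy the first `mbytes` MB of a disk/device to a backup file."""
    remaining = mbytes * 1024 * 1024
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while remaining > 0:
            data = fin.read(min(chunk, remaining))
            if not data:              # source shorter than requested region
                break
            fout.write(data)
            remaining -= len(data)

# e.g. backup_disk_head("/oracleasm/asm-disk04", "/tmp/asm-disk04.first200m")
```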

 


 

ORACLE PRM is an Oracle database disaster recovery tool developed independently by ParnassusData, featuring a fully graphical interface that is simple and efficient to use.

You are welcome to download and try ORACLE PRM. Download: http://parnassusdata.com/sites/default/files/ParnassusData_PRMForOracle_3206.zip

PRM user manual: http://www.parnassusdata.com/sites/default/files/ParnassusData%20Recovery%20Manager%20For%20Oracle%20Database%E7%94%A8%E6%88%B7%E6%89%8B%E5%86%8C%20v0.3.pdf

 

 

 
