Oracle ASM磁盘头块 disk header block修复如何运行?

如果自己搞不定可以找诗檀软件专业ORACLE数据库修复团队成员帮您恢复!

诗檀软件专业数据库修复团队

服务热线 : 13764045638    QQ号:47079569    邮箱:[email protected]

 

本文中,我会解释我关于ASM磁盘头块的最后几个作品之一。

首先,我会在Oracle数据库中新建TEST表空间。TEST的数据文件由ASM进行管理。

[oracle@oel ~]$ . oraenv

ORACLE_SID = [+ASM] ? orcl

The Oracle base remains unchanged with value /u01/app/oracle

[oracle@oel ~]$ sqlplus / as sysdba

SQL> create tablespace test datafile ‘+data’ size 5M autoextend on maxsize unlimited;

Tablespace created.

SQL>  select instance_name from v$instance;

INSTANCE_NAME

—————-

orcl

SQL> select file_id,tablespace_name,file_name,status,online_status from dba_data_files

  2  /

   FILE_ID TABLESPACE_NAME                FILE_NAME                                          STATUS    ONLINE_

———- —————————— ————————————————– ——— ——-

         4 USERS                          /u02/oradata/orcl/users01.dbf                      AVAILABLE ONLINE

         3 UNDOTBS1                       /u02/oradata/orcl/undotbs01.dbf                    AVAILABLE ONLINE

         2 SYSAUX                         /u02/oradata/orcl/sysaux01.dbf                     AVAILABLE ONLINE

         1 SYSTEM                         /u02/oradata/orcl/system01.dbf                     AVAILABLE SYSTEM

         5 EXAMPLE                        /u02/oradata/orcl/example01.dbf                    AVAILABLE ONLINE

         6 TEST                           +DATA/orcl/datafile/test.258.798578613             AVAILABLE ONLINE

6 rows selected.

我的ASM实例管理两个磁盘组DATA(外部冗余)和RL(正常冗余)。

DATA是由ASMLib管理的2个磁盘组成的磁盘组

SQL> select group_number,name,type,state from v$asm_diskgroup

  2  /

GROUP_NUMBER NAME TYPE   STATE

———— —- —— ———–

           1 DATA EXTERN MOUNTED

           2 RL   NORMAL MOUNTED

SQL> select path from v$asm_disk where group_number=1;

PATH

———————————

ORCL:ASM1

ORCL:ASM2

现在,我们清除每个磁盘的头块。头块就是ASM磁盘设备中的第一个,其默认大小为4096字节(该尺寸在_asm_blksize未公开参数中可用)。

[oracle@oel ~]$ dd if=/dev/zero of=/dev/oracleasm/disks/ASM1 bs=4096 count=1

1+0 records in

1+0 records out

4096 bytes (4.1 kB) copied, 2.2e-05 seconds, 186 MB/s

[oracle@oel ~]$ dd if=/dev/zero of=/dev/oracleasm/disks/ASM2 bs=4096 count=1

1+0 records in

1+0 records out

4096 bytes (4.1 kB) copied, 0.000336 seconds, 12.2 MB/s

即使块头遭破坏,我还可以写在磁盘组中,并在其中分配区段。

SQL> alter database datafile 6 resize 10M;

Database altered.

SQL> create table t tablespace TEST as select * from dba_source;

Table created.

SQL> alter system checkpoint;

System altered.

SQL> select file#,checkpoint_time from v$datafile_header where file#=6;

     FILE# CHECKPOINT_TIME

———- ——————-

         6 05/11/2012 19:27:43

好,我们尝试重启rdbms实例:

SQL> shutdown immediate

Database closed.

Database dismounted.

ORACLE instance shut down.

SQL> startup

ORACLE instance started.

Total System Global Area 1068937216 bytes

Fixed Size                  2235208 bytes

Variable Size             377488568 bytes

Database Buffers          683671552 bytes

Redo Buffers                5541888 bytes

Database mounted.

Database opened.

SQL> select count(*) from t;

  COUNT(*)

———-

    702070

我仍然可以重启它并读取所有文件区。

这个演示中有点奇怪,ASM磁盘头无效但我仍然可以读取文件。我会用kfed验证我的头状态(下面仅展示了一个ASM磁盘):

oracle@oel ~]$ kfed read /dev/oracleasm/disks/ASM1

kfbh.endian:                          0 ; 0x000: 0x00

kfbh.hard:                            0 ; 0x001: 0x00

kfbh.type:                            0 ; 0x002: KFBTYP_INVALID

kfbh.datfmt:                          0 ; 0x003: 0x00

kfbh.block.blk:                       0 ; 0x004: blk=0

kfbh.block.obj:                       0 ; 0x008: file=0

kfbh.check:                           0 ; 0x00c: 0x00000000

kfbh.fcn.base:                        0 ; 0x010: 0x00000000

kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000

kfbh.spare1:                          0 ; 0x018: 0x00000000

kfbh.spare2:                          0 ; 0x01c: 0x00000000

2B469E002400 00000000 00000000 00000000 00000000  […………….]

  Repeat 255 times

KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

[oracle@oel ~]$ sudo /u01/app/oracle/product/11.2.0.3/dbhome_1/bin/kfed read /dev/sdb1

kfbh.endian:                          0 ; 0x000: 0x00

kfbh.hard:                            0 ; 0x001: 0x00

kfbh.type:                            0 ; 0x002: KFBTYP_INVALID

kfbh.datfmt:                          0 ; 0x003: 0x00

kfbh.block.blk:                       0 ; 0x004: blk=0

kfbh.block.obj:                       0 ; 0x008: file=0

kfbh.check:                           0 ; 0x00c: 0x00000000

kfbh.fcn.base:                        0 ; 0x010: 0x00000000

kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000

kfbh.spare1:                          0 ; 0x018: 0x00000000

kfbh.spare2:                          0 ; 0x01c: 0x00000000

2B413559C400 00000000 00000000 00000000 00000000  […………….]

  Repeat 255 times

KFED-00322: file not found; arguments: [kfbtTraverseBlock] [Invalid OSM block type] [] [0]

KFED已经确认我的头块无效。现在,我会尝试卸载并挂载磁盘组,看对磁盘组上的安装运行是否有任何影响。

[oracle@oel ~]$ sqlplus / as sysasm

SQL> alter diskgroup DATA dismount;

Diskgroup altered.

SQL> alter diskgroup DATA mount;

alter diskgroup DATA mount

*

ERROR at line 1:

ORA-15032: not all alterations performed

ORA-15017: diskgroup “DATA” cannot be mounted

ORA-15063: ASM discovered an insufficient number of disks for diskgroup “DATA”

所以,ASM磁盘头对于安装磁盘组似乎很重要,但对ASM扩展区分配等没有影响。

现在,我要用kfed和和修复操作来修复ASM磁盘,以复原ASM头块:

[oracle@oel ~]$ kfed repair /dev/oracleasm/disks/ASM1

[oracle@oel ~]$ kfed repair /dev/oracleasm/disks/ASM2

[oracle@oel ~]$ sqlplus / as sysasm

SQL> alter diskgroup DATA mount;

Diskgroup altered.

所以KFED 能够复原头块(用修复操作).

[oracle@oel ~]$ . oraenv

ORACLE_SID = [+ASM] ? orcl

The Oracle base remains unchanged with value /u01/app/oracle

[oracle@oel ~]$ sqlplus / as sysdba

Connected to an idle instance.

SQL> startup

Database opened.

SQL> select count(*) from T;

  COUNT(*)

———-

    702070

T表中的数据依然存在……没问题!

好,我们来总结一下……抹去我的磁盘组中两个ASM磁盘的头块,对文件的扩展区分配没有任何影响。唯一的影响是在挂载能力上。接下来,我用KFED修复操作复原了每个磁盘的头块……是的,但没有磁盘或磁盘头的任何备份。所以我想..我头块的备份在哪里?

首先,我要尝试一个很简单的方法。我会看着ASM磁盘设备中的所有元数据块。希望找到另一个头块。为此,我会使用一个基本的shell脚本来分析每个块。如果该脚本中找到一个KFBTYP_DISKHEAD类型的块,它将保持在块位置,并且在脚本执行的最后打印出来:

基本的脚本:

#!/bin/bash

#set -x

export ORAENV_ASK=NO

export ORACLE_SID=+ASM

. oraenv

i=0

ret=0

tab_cnt=0;

blk_typ=”;

while [ “$blk_typ” != “KFBTYP_INVALID” ];

do

  blk_typ=`kfed read $1 blkn=$i | grep kfbh.type | awk ‘{print $5;}’ | sed ‘s/^ *//g’ | sed ‘s/ *$//g’ `

  ret=$?

  if [ “$blk_typ” = “KFBTYP_DISKHEAD” ]; then

    t[$i]=$i

  fi

  let i=$i+1

done

echo “list of header block with KFBTYP_DISKHEAD type”

echo ${t[@]}

执行结果:

[oracle@oel ~]$ ./asm_surgery.sh /dev/oracleasm/disks/ASM1

The Oracle base remains unchanged with value /u01/app/oracle

list of header block with KFBTYP_DISKHEAD type

0 510

[oracle@oel ~]$ ./asm_surgery.sh /dev/oracleasm/disks/ASM2

The Oracle base remains unchanged with value /u01/app/oracle

list of header block with KFBTYP_DISKHEAD type

0 510

所以在我的磁盘上第510块有头块的副本。事实上,正如Bane Radulović 在博客中提到的,ASM磁盘头的备份副本位于分配单位1的倒数第二块。所以,如果我的AU1MB,块大小为4096字节,头块的副本将在((1048576/4096* 2- 1 =511)中可用,由于块#从0开始,它位于第510块。

要仔细检查这一点,我会清除第一个ASM磁盘的头块,并使用KFED修复和检查,以查看到底发生了什么:

[oracle@oel ~]$ dd if=/dev/zero of=/dev/oracleasm/disks/ASM1 bs=4096 count=1

1+0 records in

1+0 records out

4096 bytes (4.1 kB) copied, 2.3e-05 seconds, 178 MB/s

[oracle@oel ~]$ strace kfed repair /dev/oracleasm/disks/ASM1

…/…

stat(“/dev/oracleasm/disks/ASM1”, {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 17), …}) = 0

access(“/dev/oracleasm/disks/ASM1”, F_OK) = 0

statfs(“/dev/oracleasm/disks/ASM1”, {f_type=0x958459f6, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=255, f_frsize=4096}) = 0

open(“/dev/oracleasm/disks/ASM1”, O_RDWR) = 7

lseek(7, 2088960, SEEK_SET)             = 2088960

read(7, “\1\202\1\1\376\200<\206\371\7″…, 4096) = 4096

lseek(7, 0, SEEK_SET)                   = 0

read(7, “”…, 4096) = 4096

lseek(7, 0, SEEK_SET)                   = 0

write(7, “\1\202\1\1\200\302\206\371\7″…, 4096) = 4096

close(7)                                = 0

这很有意思.

第一次运行,打开设备,在2088960字节后读取块。但是 2088960

/4096=510,所以它读取盘的第510块。接下来,它在位置0(头块)读取块,并把它写入到第510块的内容。

现在我知道在秘密的位置每个头块都有副本! (块#510),Kfed使用该块修复主头块。

测试期间我注意到,有时ASM磁盘失去了ASM标签(不知道为什么)。因此,修复的磁盘不会再用oracleasm scandiskd重建(或重新启动后)。

[root@oel ~]# ls /dev/oracleasm/disks/*

/dev/oracleasm/disks/ASM1  /dev/oracleasm/disks/ASM2  /dev/oracleasm/disks/ASM3  /dev/oracleasm/disks/ASM4

[root@oel ~]# oracleasm scandisks

Reloading disk partitions: done

Cleaning any stale ASM disks…

Cleaning disk “ASM1”

Scanning system for ASM disks…

[root@oel ~]# ls /dev/oracleasm/disks/*

/dev/oracleasm/disks/ASM2  /dev/oracleasm/disks/ASM3  /dev/oracleasm/disks/ASM4

既然知道哪些设备与asmlib磁盘设备相对应,就可以用oracleasm renamedisk命令重新标记该磁盘(使用该命令一定要小心):

[root@oel ~]# oracleasm renamedisk -f /dev/sdb1 ASM1

Writing disk header: done

Instantiating disk “ASM1”: done

[root@oel ~]# oracleasm scandisks

Reloading disk partitions: done

Cleaning any stale ASM disks…

Scanning system for ASM disks…

[root@oel ~]# ls /dev/oracleasm/disks/*

/dev/oracleasm/disks/ASM1  /dev/oracleasm/disks/ASM2  /dev/oracleasm/disks/ASM3  /dev/oracleasm/disks/ASM4


Posted

in

by

Tags:

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *