Oracle数据恢复专题

恢复恢复是Oracle中永恒的话题, 只要有数据 就有备份恢复的需求。 而在国内对于备份以及备份的可用性往往被企业所忽视。这造成了再数据库恢复上存在着东西方的差异。 更多的老外DBA把经历花在对Oracle内部原理和性能优化的研究上。

oracle data block structure

而我们国内 DBA似乎必须要精通一门额外的技术==》 在没有任何备份的情况下 恢复Oracle数据库中数据的技术!  虽然这在大多数情况下是屠龙之技, 但很多时候却又变成了衡量一个DBA技术水准的标准了,(这样不好..  不好)。

 

当然也并不是说 这种无备份下的数据恢复是无技术含金量的,实际上它们很需要对Oracle数据文件、数据块及其数据结构的理解,以及对数据字典构成的了解。

 

这里我们总结Oracle数据恢复专题的专题,包括一些在无备份情况下的数据恢复:例如DUL和BBED工具恢复等技术。

 

 

【数据恢复】利用构造ROWID实现无备份情况下绕过ORA-1578、ORA-8103、ORA-1410等逻辑/物理坏块问题

【数据恢复】ORA-600[kccpb_sanity_check_2]一例

Oracle rman中set newname可能很慢

如何清除Oracle控制文件中的无用记录,例如v$archived_log中的deleted归档日志记录

如何找回被create or replace覆盖的PL/SQL对象

Archivelog Completed Before VS UNTIL TIME

ASM丢失disk header导致ORA-15032、ORA-15040、ORA-15042 Diskgroup无法mount

Overcome ORA-600[4xxx] open database

数据恢复:解决ORA-600[kghstack_free2][kghstack_err+0068]一例

清理RMAN Catalog恢复目录

如何rename datafile name中存在乱码的数据文件

11g新特性recover corruption list

解决ORA-01578错误一例

Script:收集介质恢复诊断信息

如何重建SYSAUX表空间上的对象

Oracle数据恢复:解决ORA-00600:[4000] ORA-00704: bootstrap process failure错误一例

Script:检查数据库当前是否有备份操作在执行中

数据恢复:模拟2个逻辑坏块

Script:收集Oracle备份恢复信息

Oracle备份恢复:Rman Backup缓慢问题一例

了解rman catalog的兼容性

Oracle内部错误:ORA-00600[2608]一例

使用bbed解决ORA-01189错误

Fractured block found during backing up datafile

手动递增SCN号的几种方法:How to increase System Change Number by manual

DBMS_REPAIR example

 

【Exadata一体机】Exadata Cell监控最佳实践

  1. Verify cable connections via the following steps

Visually inspect all cables for proper connectivity.

 

确认缆线链接正常

 

 

 

[root@dm01db01 ~]# cat /sys/class/net/ib0/carrier

1

[root@dm01db01 ~]# cat /sys/class/net/ib1/carrier

1

 

确认输出是1

 

 

检查这些命令,

ls -l /sys/class/infiniband/*/ports/*/*errors*

 

 

/opt/oracle.SupportTools/ibdiagtools 目录包含了verify_topology 和infinicheck工具 运行并确认网络。下面是这些工具的信息:

 

[root@dm01db01 ~]# cd /opt/oracle.SupportTools/

[root@dm01db01 oracle.SupportTools]# ls

asrexacheck         defaultOSchoose.pl  firstconf                        make_cellboot_usb  PS4ES            sys_dirs.tar

CheckHWnFWProfile   diagnostics.iso     flush_cache.sh                   MegaSAS.log        reclaimdisks.sh

CheckSWProfile.sh   em                  harden_passwords_reset_root_ssh  ocrvothostd        setup_ssh_eq.sh

dbserver_backup.sh  exachk              ibdiagtools                      onecommand         sundiag.sh

 

 

[root@dm01db01 oracle.SupportTools]# cd ibdiagtools/

[root@dm01db01 ibdiagtools]# ls

cells_conntest.log    dcli                  ibqueryerrors.log  perf_cells.log0  perf_mesh.log1     subnet_cells.log  VERSION_FILE

cells_user_equiv.log  diagnostics.output    infinicheck        perf_cells.log1  perf_mesh.log2     subnet_hosts.log  xmonib.sh

checkbadlinks.pl      hosts_conntest.log    monitord           perf_cells.log2  README             topologies

cleanup_remote.log    hosts_user_equiv.log  netcheck           perf_hosts.log0  SampleOutputs.txt  topology-zfs

clearcounters.log     ibping_test           netcheck_scratch   perf_mesh.log0   setup-ssh          verify-topology

 

 

 

[root@dm01db01 ibdiagtools]# ./verify-topology -h

 

[ DB Machine Infiniband Cabling Topology Verification Tool ]

[Version IBD VER 2.c 11.2.3.1.1  120607]

Usage: ./verify-topology [-v|--verbose] [-r|--reuse (cached maps)]  [-m|--mapfile]

[-ibn|--ibnetdiscover (specify location of ibnetdiscover output)]

[-ibh|--ibhosts (specify location of ibhosts output)]

[-ibs|--ibswitches (specify location of ibswitches output)]

[-t|--topology [torus | quarterrack ] default is fattree]

[-a|--additional [interconnected_quarterrack]

[-factory|--factory non-exadata machines are treated as error]

 

Please note that halfrack is now redundant. Checks for Half Racks

are now done by default.

-t quarterrack

option is needed to be used only if testing on a stand alone quarterrack

-a interconnected_quarterrack

option is to be used only when testing on large multi-rack setups

-t fattree

option is the default option and not required to be specified

 

Example : perl ./verify-topology

Example : ././verify-topology -t quarterrack

Example : ././verify-topology -t torus

Example : ././verify-topology -a interconnected_quarterrack

——— Some Important properties of the fattree cabling topology————–

(1) Every internal switch must be connected to every external switch

(2) No 2 external switches must be connected to each other

——————————————————————————-

Please note that switch guid can be determined by logging in to a switch and

trying either of these commands, depending on availability -

>module-firmware show

OR

>opensm

 

 

 

[root@dm01db01 ibdiagtools]# ./verify-topology -t fattree

 

[ DB Machine Infiniband Cabling Topology Verification Tool ]

[Version IBD VER 2.c 11.2.3.1.1  120607]

External non-Exadata-image nodes found: check for ZFS if on T4-4 – else ignore

Leaf switch found: dmibsw03.acs.oracle.com (212846902ba0a0)

Spine switch found: 10.146.24.251 (2128469c74a0a0)

Leaf switch found: dmibsw02.acs.oracle.com (21284692d4a0a0)

Spine switch found: 10.146.24.252 (2128b7f744c0a0)

Spine switch found: dmibsw01.acs.oracle.com (21286cc7e2a0a0)

Spine switch found: 10.146.24.253 (2128b7ac44c0a0)

 

Found 2 leaf, 4 spine, 0 top spine switches

 

Check if all hosts have 2 CAs to different switches……………[SUCCESS]

Leaf switch check: cardinality and even distribution…………..[SUCCESS]

Spine switch check: Are any Exadata nodes connected …………..[SUCCESS]

Spine switch check: Any inter spine switch links………………[ERROR]

Spine switches 10.146.24.251 (2128469c74a0a0) & 10.146.24.252 (2128b7f744c0a0) should not be connected

[ERROR]

Spine switches 10.146.24.251 (2128469c74a0a0) & 10.146.24.253 (2128b7ac44c0a0) should not be connected

[ERROR]

Spine switches 10.146.24.252 (2128b7f744c0a0) & dmibsw01.acs.oracle.com (21286cc7e2a0a0) should not be connected

[ERROR]

Spine switches 10.146.24.252 (2128b7f744c0a0) & 10.146.24.253 (2128b7ac44c0a0) should not be connected

[ERROR]

Spine switches dmibsw01.acs.oracle.com (21286cc7e2a0a0) & 10.146.24.253 (2128b7ac44c0a0) should not be connected

 

Spine switch check: Any inter top-spine switch links…………..[SUCCESS]

Spine switch check: Correct number of spine-leaf links…………[ERROR]

Leaf switch dmibsw03.acs.oracle.com (212846902ba0a0) must be linked

to spine switch 10.146.24.252 (2128b7f744c0a0) with

at least 1 links…0 link(s) found

[ERROR]

Leaf switch dmibsw02.acs.oracle.com (21284692d4a0a0) must be linked

to spine switch 10.146.24.252 (2128b7f744c0a0) with

at least 1 links…0 link(s) found

[ERROR]

Spine switch 10.146.24.252 (2128b7f744c0a0) has fewer than 2 links to leaf switches.

It has 0

[ERROR]

Leaf switch dmibsw03.acs.oracle.com (212846902ba0a0) must be linked

to spine switch 10.146.24.253 (2128b7ac44c0a0) with

at least 1 links…0 link(s) found

[ERROR]

Leaf switch dmibsw02.acs.oracle.com (21284692d4a0a0) must be linked

to spine switch 10.146.24.253 (2128b7ac44c0a0) with

at least 1 links…0 link(s) found

[ERROR]

Spine switch 10.146.24.253 (2128b7ac44c0a0) has fewer than 2 links to leaf switches.

It has 0

 

Leaf switch check: Inter-leaf link check……………………..[ERROR]

Leaf switches dmibsw03.acs.oracle.com (212846902ba0a0) & dmibsw02.acs.oracle.com (21284692d4a0a0) have 0 links between them

They should have 7 links instead.

 

Leaf switch check: Correct number of leaf-spine links………….[SUCCESS]

 

 

 

 

确认硬件和固件

 

cd /opt/oracle.cellos/

[root@dm01db01 oracle.cellos]# ./CheckHWnFWProfile

 

[SUCCESS] The hardware and firmware profile matches one of the supported profiles

 

 

确认平台软件

 

 

 

 

 

[root@dm01db01 oracle.cellos]# cd /opt/oracle.SupportTools/

[root@dm01db01 oracle.SupportTools]# ./CheckSWProfile.sh

usage: ./CheckSWProfile.sh options

 

This script returns 0 when the platform and software on the

machine on which it runs matches one of the suppored platform and

software profiles. It will return nonzero value in all other cases.

The check is applicable both to Exadata Cells and Database Nodes

with Oracle Enterprise Linux (OEL) and RedHat Enterprise Linux (RHEL).

 

OPTIONS:

-h    Show this message

-s    Show supported platforms and software profiles for this machine

-c    Check this machine for supported platform and software profiles

-I <No space comma separated list of Infiniband switch names/ip addresses>

To check configuration for SPINE switch prefix the switch host name or

ip address with IS_SPINE.

Example: CheckSWProfile.sh -I IS_SPINEswitch1.company.com,switch2.company.com

Check for the software revision on the managed Infiniband switches

in the Database Machine. You will need to supply the password for

admin user.

-S <No space comma separated list of Infiniband switch names/ip addresses>

Example: CheckSWProfile.sh -S switch1.company.com,switch2.company.com

Prints the Serial number and Hardware version for the switches

in the Database Machine. You will need to supply the password for

admin user for Voltaire switches and root user for Sun switches.

 

 

[root@dm01db01 oracle.SupportTools]# ./CheckSWProfile.sh  -c

[INFO] Software checker check option is only available on Exadata cells.

 

[root@dm01db01 oracle.SupportTools]# ssh dm01cel01-priv

 

[root@dm01cel01 oracle.SupportTools]# ./CheckSWProfile.sh -c

 

[INFO] SUCCESS: Meets requirements of operating platform and InfiniBand software.

[INFO] Check does NOT verify correctness of configuration for installed software.

 

 

[root@dm01cel01 oracle.SupportTools]# cd /opt/oracle.cellos/

[root@dm01cel01 oracle.cellos]# ./CheckHWnFWProfile

[SUCCESS] The hardware and firmware profile matches one of the supported profiles

 

 

 

If hardware is replaced, rerun the /opt/oracle.cellos/CheckHWnFWProfile script.

【Goldengate性能优化】优化Extract抽取进程性能,解决OGG抽取日志延迟

一般来说OGG Goldengate 抽取进程对CPU的压力非常小, 而对于I/O 、network的吞吐量有轻量级的要求。

用低配置AIX测试结果如下。

抽取进程支持DB Log生成峰值速度 = 4 * 2.1 = 8.4 MB/秒,或30GB/小时,或726 GB/天。
抽取进程平均CPU占用1.9% 。

投递进程支持DB Log生成平均速度 = 2,096,854 * 2.1 = 4.5 MB/秒,或16 GB/小时,或380 GB/天。
投递进程平均CPU占用7% 。

 

 

对于Extract抽取日志缓慢导致延迟的问题,优先采用如下方法诊断具体慢在 抽取 还是 写trail上:

 

1. 收集原始慢的Extract的性能信息

GGSCI> stats extract <extract_name>, totalsonly *, reportrate sec
GGSCI> stats extract <extract_name>, totalsonly *, reportrate min

 

2. 创建一个新的extract 参数文件

cp <extract_name>.prm ETEST.prm

3. 修改上述 etest params file中的extract名字 和 trail 位置

 

4. 加入TESTMAPPINGSPEED 参数到 etest的params files

TESTMAPPINGSPEED参数的作用是 不让extract 去写trail 文件 而仅仅抽取日志, 若加入该参数后抽取速度大幅提升则说明性能瓶颈在 write trail上

TESTMAPPINGSPEED
REPORTCOUNT EVERY 5000 RECORDS

 

5. 增加etest这个extract

GGSCI> add extract etest, tranlog, begin now

GGSCI> add exttrail ./dirdat/ma , extract etest , megabytes 200

 

6. 为etest指定 原始extract 存在抽取速度问题的archivelog 的sequence

GGSCI> alter extract etest, extseqno <arch_seq_no>, extrba 0

 

7. 启动etest 这个extract

GGSCI> start extract etest

 

等待5分钟并检查

GGSCI> stats extract etest, totalsonly *, reportrate sec
GGSCI> stats extract etest, totalsonly *, reportrate min

 

对比 原始慢的extract 与 新加入的etest的 stats reportrate 报告中的性能指标,若 TESTMAPPINGSPEED 后 性能明显提升则说明问题出在 写trail  (extract 写到本地的情况) 或者 网络传出慢( 直接写到目标机上)。

 

如果TESTMAPPINGSPEED 后性能也无明显变化则继续。

 

8. 将所有extract 的表都注释掉,而仅仅extract 一张很少变化记录的表, 若这样 后性能明显提升则说明 瓶颈不在读archivelog 上而在 日志记录的处理上 log record processing 。

一般来说redo日志的解析分成2部分:

A. Record parsing in Extract
B. Record fetching if needed

 

9.为了进一步确认问题 将TESTMAPPINGSPEED 注释掉, 并 加入 TRACE/TRACE2 参数 以便确认 Extract是否慢在fetch上

 

–TESTMAPPINGSPEED http://www.askmaclean.com
TRACE ./dirtmp/ext.trc
TRACE2 ./dirtmp/ext.trc2

 

10 检查生成的trace 文件 若 其中显示 大量的时间耗费在一些SELECT语句上,则需要DBA介入来调优这些SELECT SQL

 

11. 若看到一些与undo/rollback 相关的错误例如ORA-1555则确保UNDO 表空间可用 空间足够,  也可以加入  FETCHOPTIONS NOUSESNAPSHOT 让 Extract fetch column 数据是尽可能不要走UNDO CR READ

 

12. 如果将大部分表都去掉,只剩下一个不太用的表且仍无明显的性能增长, 且CPU 也不忙, 一般来说这可能是IO瓶颈造成的

 

13. 建议dd测一下archivelog 的读取速度

例如maclean>time dd if=<归档日志> of=/dev/null bs=1M

对比其他磁盘若有明显差异, 则考虑将archivelog 移动到对应磁盘并再次上述测试。

 

 

对于cache较小的sequence 可以引起在replicat DDL 时频繁执行 ALTER SEQUENCE “SEQ_NAME” CYCLE的DDL语句:

 

2013-04-22 09:54:06  INFO    OGG-01487  DDL found, operation [ALTER SEQUENCE "TXN_SEQ" CYCLE  (size 31)], start SCN [20181621217], commit SCN [20181621231] instance [C
ULPRODB (1)], DDL seqno [2734821], marker seqno [2736076].

2013-04-22 09:54:06  INFO    OGG-00487  DDL operation included [Include Mapped], optype [ALTER], objtype [SEQUENCE], objowner [OLSUSER], objname [TXN_SEQ].

2013-04-22 09:54:06  INFO    OGG-00497  Writing DDL operation to extract trail file.
Wildcard SEQUENCE resolved (entry olsuser.*):
  Sequence "OLSUSER"."TXN_SEQ";
Resolving source sequence OLSUSER.TXN_SEQ.

2013-04-22 09:54:07  INFO    OGG-01487  DDL found, operation [ALTER SEQUENCE "TXN_SEQ" CYCLE  (size 31)], start SCN [20181621236], commit SCN [20181621248] instance [C
ULPRODB (1)], DDL seqno [2734822], marker seqno [2736077].

2013-04-22 09:54:07  INFO    OGG-00487  DDL operation included [Include Mapped], optype [ALTER], objtype [SEQUENCE], objowner [OLSUSER], objname [TXN_SEQ].

2013-04-22 09:54:07  INFO    OGG-00497  Writing DDL operation to extract trail file.
Wildcard SEQUENCE resolved (entry olsuser.*):
  Sequence "OLSUSER"."TXN_SEQ";
Resolving source sequence OLSUSER.TXN_SEQ.

2013-04-22 09:54:08  INFO    OGG-01487  DDL found, operation [ALTER SEQUENCE "TXN_SEQ" CYCLE  (size 31)], start SCN [20181621251], commit SCN [20181621261] instance [C
ULPRODB (1)], DDL seqno [2734823], marker seqno [2736078].

2013-04-22 09:54:08  INFO    OGG-00487  DDL operation included [Include Mapped], optype [ALTER], objtype [SEQUENCE], objowner [OLSUSER], objname [TXN_SEQ].

2013-04-22 09:54:08  INFO    OGG-00497  Writing DDL operation to extract trail file.
Wildcard SEQUENCE resolved (entry olsuser.*):
  Sequence "OLSUSER"."TXN_SEQ";
Resolving source sequence OLSUSER.TXN_SEQ.

 

若该replicat target库上尚有extract则 extract挖掘日志时可能因为处理频繁的DDL操作而变得很慢,”Why GoldenGate replicat issues “alter sequence .. cycle|nocycle” in sequence replication? [ID 1535322.1]” 文档指出了 这种频繁的 是为了在target 上上可信赖的同步sequence的高水位。

但是这种超频繁的 几乎每2s 一次的ALTER SEQUENCE CYCLE操作确实拖慢了Extract的速度, 可以通过指定参数 DBOPTIONS _NOAUTOMATICSEQUENCEFLUSH来减少ALTER SEQUENCE CYCLE出现,实际并不能完全避免。 文档指出使用该 DBOPTIONS _NOAUTOMATICSEQUENCEFLUSH参数可能导致TARGET上的SEQUENCE 不同步。

实际优先考虑增加SOURCE上SEQUENCE的CACHE解决问题, 之后再考虑用 DBOPTIONS _NOAUTOMATICSEQUENCEFLUSH; 针对该SEQUENCE  在EXTRACT里 排除该索引 没有效果, 例如 DDL EXCLUDE OBJNAME “XX.SEQNAME”  或者 TABLEEXCLUDE “XX.SEQUENCE”在实际测试中均没有明显的改善, 但修改DBOPTIONS _NOAUTOMATICSEQUENCEFLUSH确实可以该少性能。

【数据恢复】利用构造ROWID实现无备份情况下绕过ORA-1578、ORA-8103、ORA-1410等逻辑/物理坏块问题

对于无备份情况下的ORA-1578、ORA-8103、ORA-1410等逻辑/物理坏块问题,可以通过如下PL/SQL 构造ROWID的方式挽救绝大多数非坏块的数据, 一般分成 2种情况 有索引可以用来获取ROWID, 或者 没有索引、索引不可用,必须通过dbms_rowid.ROWID_CREATE来构造ROWID的方式。

 

对于有索引的情况,可以直接使用MOS上提供的脚本:

 

 

REM Create a new table based on the table that is producing errors with no rows:

create table 
as
select *
from   
where  1=2;

REM Create the table to keep track of ROWIDs pointing to affected rows:

create table bad_rows (row_id rowid
                      ,oracle_error_code number);
set serveroutput on

DECLARE
  TYPE RowIDTab IS TABLE OF ROWID INDEX BY BINARY_INTEGER;
  CURSOR c1 IS select /*+ index(tab1) */ rowid
  from  tab1
  where  is NOT NULL;
  r RowIDTab;
  rows NATURAL := 20000;
  bad_rows number := 0 ;
  errors number;
  error_code number;
  myrowid rowid;
BEGIN
  OPEN c1;
  LOOP
   FETCH c1 BULK COLLECT INTO r LIMIT rows;
   EXIT WHEN r.count=0;
   BEGIN
    FORALL i IN r.FIRST..r.LAST SAVE EXCEPTIONS
     insert into 
     select /*+ ROWID(A) */ 
     from  A where rowid = r(i);
   EXCEPTION
   when OTHERS then
    BEGIN
     errors := SQL%BULK_EXCEPTIONS.COUNT;
     FOR err1 IN 1..errors LOOP
      error_code := SQL%BULK_EXCEPTIONS(err1).ERROR_CODE;
      if error_code in (1410, 8103, 1578) then
       myrowid := r(SQL%BULK_EXCEPTIONS(err1).ERROR_INDEX);
       bad_rows := bad_rows + 1;
       insert into bad_rows values(myrowid, error_code);
      else
       raise;
      end if;
     END LOOP;
    END;
   END;
   commit;
  END LOOP;
  commit;
  CLOSE c1;
  dbms_output.put_line('Total Bad Rows: '||bad_rows);
END;
/

 

 

 

对于没有索引或者索引损坏的情况可以使用如下方法:

 

 

创建示例数据

create table maclean_tab1 (t1 int,t2 date default sysdate) tablespace users
partition by range(t1) 
(partition p1 values less than (10000),
partition p2 values less than (20000),
partition p3 values less than (30000),
partition p4 values less than (40000),
partition p5 values less than (50000),
partition p6 values less than (60000),
partition p7 values less than (70000),
partition p8 values less than (80000),
partition p9 values less than (90000),
partition p10 values less than (100000),
partition p11 values less than (110000),
partition p12 values less than (120000),
partition p13 values less than (130000),
partition p14 values less than (140000),
partition p15 values less than (150000),
partition p16 values less than (160000))
;

insert into maclean_tab1(t1) select rownum from dual connect by level<160000;

commit;

SQL> select count(*) from maclean_tab1;

  COUNT(*)
----------
    159999

exec dbms_stats.gather_table_stats(USER,'MACLEAN_TAB1');   

SQL> alter system flush buffer_cache;

System altered.

SQL> /

System altered.

随即采样一些块来做 坏块 约涉及到10个块的数据

set linesize 200 pagesize 1400

select dbms_rowid.rowid_block_number(rowid) blkid,
       dbms_rowid.rowid_relative_fno(rowid) rfile
  from maclean_tab1 sample(0.01)
 where rownum <= 200
 group by dbms_rowid.rowid_block_number(rowid),
          dbms_rowid.rowid_relative_fno(rowid)
 order by 1;

     BLKID      RFILE
---------- ----------
    741833          4
    741850          4
    741994          4
    742030          4
    742085          4
    742141          4
    742159          4
    742172          4
    742173          4
    742179          4

 制造坏块

    [oracle@vrh8 udump]$ rman target  /

RMAN> blockrecover datafile 4 block 741833,741850,741994,742030,742085,742141,742159,742172,742173,742179 clear;

Starting blockrecover at 21-APR-13
using channel ORA_DISK_1
Finished blockrecover at 21-APR-13

SQL> select count(*) from maclean_tab1;
select count(*) from maclean_tab1
                     *
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 4, block # 741833)
ORA-01110: data file 4:
'/s01/oradata/G10R25/datafile/o1_mf_users_8nx5srgb_.dbf'

通过blockrecover datafile block clear 构造了一系列坏块且没有备份 ,我们通过下列脚本挽回大部分可用数据

一个实际操作过程



drop table maclean_tab_backup;

create table maclean_tab_backup 
tablespace users 
nologging compress pctfree 0 pctused 99        --可以注释掉的
 as select * from maclean_tab1 where 1=0;

drop table bad_rows;
create table bad_rows (row_id rowid,oracle_error_code varchar2(50))
tablespace users 
nologging compress pctfree 0 pctused 99        --可以注释掉的;

set serveroutput on;
set timing on;

DECLARE
  TYPE RowIDTab IS TABLE OF ROWID INDEX BY BINARY_INTEGER;
  CURSOR Crowid_info IS
    select Do.DATA_OBJECT_ID dataid,
           DE.FILE_ID        fid,
           DE.BLOCK_ID       blkid,
           DE.BLOCKS         blkcnt
      from dba_objects DO, dba_extents DE
     where DO.OBJECT_NAME = 'MACLEAN_TAB1' 
     --and DE.PARTITION_NAME='&PARTITION_NAME'          --若指定分区则取消注释
       and nvl(DO.SUBOBJECT_NAME,'-1') = nvl(DE.PARTITION_NAME,'-1')
       and DO.OBJECT_NAME = DE.SEGMENT_NAME
       and DO.owner = 'SYS'
     order by 1, 2, 3 asc;
  bad_rows   number := 0;
  errors     varchar2(500);
  error_code varchar2(500);
  myrowid    rowid;
BEGIN
  /* Maclean Liu http://www.askmaclean.com * Copy Right 2013-4-20 */ 
  execute immediate 'alter session set commit_write=''batch,nowait'' ';
  for i in Crowid_info loop
    for j in 0 .. i.blkcnt - 1 loop
      for z in 0 .. 2000 loop
        begin
          myrowid := dbms_rowid.ROWID_CREATE(1,
                                             i.dataid,
                                             i.fid,
                                             i.blkid + j,
                                             z);
          insert into maclean_tab_backup
            select /*+ ROWID(A) */
             *
              from maclean_tab1 A
             where rowid = myrowid;
        EXCEPTION
          when OTHERS then
            BEGIN
              errors     := SQLERRM;
              error_code := SQLCODE;
              if (error_code like '%1410%' or error_code like '%8103%' or  error_code like '%1578%') then
                bad_rows := bad_rows + 1;
                insert into bad_rows values (myrowid, error_code);
                commit;
              else
                raise;
              end if;
            END;
            commit;
        end;
      end loop;
    end loop;
  end loop;
  dbms_output.put_line('Total Bad Rows: ' || bad_rows);
  commit;
END;
/

Elapsed: 00:01:10.16

SQL> select count(*) from maclean_tab_backup;

  COUNT(*)
----------
    155921

 ===>损失了少量的10个块的数据

 

 

原始脚本如下:

步骤1 创建备份表

create table <new table name>
as
select *
from   <original table name>
where  1=2;

步骤2 创建bad_rows表

create table bad_rows (row_id rowid,oracle_error_code varchar2(50));

步骤3 运行下列脚本

DECLARE
  TYPE RowIDTab IS TABLE OF ROWID INDEX BY BINARY_INTEGER;
  CURSOR Crowid_info IS
    select Do.DATA_OBJECT_ID dataid,
           DE.FILE_ID        fid,
           DE.BLOCK_ID       blkid,
           DE.BLOCKS         blkcnt
      from dba_objects DO, dba_extents DE
     where DO.OBJECT_NAME = '&TABNAME' 
     --and DE.PARTITION_NAME='&PARTITION_NAME'          --若指定分区则取消注释
       and nvl(DO.SUBOBJECT_NAME,'-1') = nvl(DE.PARTITION_NAME,'-1')
       and DO.OBJECT_NAME = DE.SEGMENT_NAME
       and DO.owner = '&OWNER'
     order by 1, 2, 3 asc;
  bad_rows   number := 0;
  errors     varchar2(500);
  error_code varchar2(500);
  myrowid    rowid;
BEGIN
  /* Maclean Liu http://www.askmaclean.com * Copy Right 2013-4-20 */ 
  execute immediate 'alter session set commit_write=''batch,nowait'' ';
  for i in Crowid_info loop
    for j in 0 .. i.blkcnt - 1 loop
      for z in 0 .. 2000 loop
        begin
          myrowid := dbms_rowid.ROWID_CREATE(1,
                                             i.dataid,
                                             i.fid,
                                             i.blkid + j,
                                             z);
          insert into &backup_table
            select /*+ ROWID(A) */
             *
              from &source_table A
             where rowid = myrowid;
        EXCEPTION
          when OTHERS then
            BEGIN
              errors     := SQLERRM;
              error_code := SQLCODE;
              if (error_code like '%1410%' or error_code like '%8103%' or  error_code like '%1578%') then
                bad_rows := bad_rows + 1;
                insert into bad_rows values (myrowid, error_code);
                commit;
              else
                raise;
              end if;
            END;
            commit;
        end;
      end loop;
    end loop;
  end loop;
  dbms_output.put_line('Total Bad Rows: ' || bad_rows);
  commit;
END;
/

构造ROWID绕过ORA-1578、ORA-8103、ORA-1410脚本下载

【数据库升级】dbms_registry_sys.gather_stats过程可能过慢

dbms_registry_sys.gather_stats 存储过程在数据库字典升级 catupgrad.sql运行过程中被调用,该存储过程负责收集各 component 包括字典收集统计信息Gather Dictionary Schema Statistics ,如果 系统中AWR保存时间过长 例如30天以上,则可能耗费大量时间在收集一些WRH$、WRI$的AWR基础表上,这是正常的。 经验是 80GB的SYSAUX表空间在EMC中端存储下, Gather Dictionary Schema Statistics大约耗时58分钟; 有国外的朋友遇到过dbms_registry_sys.gather_stats(NULL)超过4小时。

 

PROCEDURE gather_stats (comp_id IN VARCHAR2);

 

文档During A Manual Database Upgrade To 11.2, Gathering Dictionary Statistics Takes Too Long (catupgrd.sql, cmpupend.sql) [ID 1425763.1]给出了一些解决方案,包括:

Solution

1. Make sure that you gather dictionary statistics as a preparation step before upgrading, by executing the following command:
SQL> EXECUTE dbms_stats.gather_dictionary_stats;

2. To reduce downtime for the database upgrade, set the following parameter in the init.ora (or spfile) before upgrading:

Note: DBUA will not retain the hidden parameter and this parameter is required to be set for the post-installation steps so setting this parameter will not help if you are upgrading database using DBUA .

_optim_dict_stats_at_db_cr_upg = false

Once all post install steps have been completed, remove (or comment out) the _optim_dict_stats_at_db_cr_upg parameter from the init.ora (or spfile)
NOTE: This parameter is only for reducing downtime during a database upgrade

3. Gather dictionary statistics again after the database has been upgraded successfully.

Please refer to the following articles for complete information about gathering statistics:
How to Gather Statistics on SYS Objects and ‘Fixed’ Objects? (Note:457926.1)
How to Gather Optimizer Statistics on 11g (Note:749227.1)

 

了解Oracle Linux近况

从2006年开始Oracle 提供自己的Linux发行版本: Oracle Enterprise Linux 后改称为Oracle Linux, 采用自己定制的UEK unbreakable Enterprise Kernel内核。

到 Linux 6开始仅在自己的Oracle Linux上提供ASMLIB软件,并集成了Ksplice

目前关于Oracle Linux的市场份额 gartner 和 idc没有对外公开的市场调查文档,但是有报告宣称在新的Linux 市场license许可证销售中Oracle Linux占了80.5%的比重(redhat的同学勿拍砖)。

在实际使用中 若搭建Oracle 、Mysql 数据库服务器,本人更推荐采用Oracle Linux 5/6。

同时Oracle官方宣称UEK比 redhat的kernel 在性能上有诸多优势:

  • 8kb flash cache reads 下UEK比 Red Hat compatible kernel ,IOPS高400%
  • Solid State Disk access 每秒吞吐量高137%
  • 8 socket database OLTP 下 每秒事务数 高75%

 

oracle linux uek performance

 

 

 

【Maclean Liu技术分 享】深入理解Oracle中 Mutex的内部原理

【Maclean Liu技术分 享】深入理解Oracle中 Mutex的内部原理

本文适合对Oracle Mutex/latch有兴趣了解其深入内部原理的同学

下载地址:

【Maclean Liu技术分享】深入理解Oracle中Mutex的内部原理

 

【Maclean Liu技术分 享】深入了解Oracle ASM(一)基础概念

【Maclean Liu技术分 享】深入了解Oracle ASM(一)基础概念

本文适合刚入门ASM,基础概念仍不清晰的同学。

 

下载地址:

【Maclean Liu技术分享】深入了解Oracle ASM(一)基础概念