清理RMAN Catalog恢复目录

有这样一个需求,用户使用RMAN Catalog恢复目录保存多台Database Server的备份信息。 由于每天都会执行大量的backup操作,而且所备份的DB的结构本身都已经十分复杂了,导致recovery catalog恢复目录占用空间迅速增长,且用户的磁盘空间较为紧张,可能在短期内无法扩disk space,这就让我们考虑到需要清理RMAN Recovery Catalog中一些不再需要的记录。

首先搞清楚在Catalog库中那些数据段占用了最多的空间Space:

 

这里 RMAN 用户是Catalog Schema 的拥有者

SQL>  select bytes/1024/1024 "MB" ,segment_name,segment_type
from dba_segments where owner='RMAN' order by bytes desc;

        MB SEGMENT_NAME         SEGMENT_TYPE
---------- -------------------- ------------------
        88 ROUT                 TABLE
        47 ROUT_U1              INDEX
        31 ROUT_I_RSR           INDEX
        23 ROUT_I_DB            INDEX
      .125 RSR                  TABLE

 

可以看到是ROUT表占用了最多的空间,那么这张表是做什么用处的呢?

ROUT表用于记录RMAN在执行backup等命令时的输出内容,实际上堆积了V$RMAN_OUTPUT视图中的内容,当目标数据库连接到CATALOG库执行操作或Resync Catalog时会将V$RMAN_OUTPUT中的信息同步到Catalog库的ROUT表中。

 

SQL> desc v$rman_output;
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 SID                                                NUMBER
 RECID                                              NUMBER
 STAMP                                              NUMBER
 SESSION_RECID                                      NUMBER
 SESSION_STAMP                                      NUMBER
 OUTPUT                                             VARCHAR2(130)
 RMAN_STATUS_RECID                                  NUMBER
 RMAN_STATUS_STAMP                                  NUMBER
 SESSION_KEY                                        NUMBER

SQL> select output from v$rman_output where output is not null and rownum <3;

OUTPUT
--------------------------------------------------------------------------------
connected to target database: VPROD (DBID=881465081)
connected to recovery catalog database

 

ROUT表的定义存放在$ORACLE_HOME/rdbms/admin/recover.sql 中,具体如下:

 

CREATE TABLE rout
(
db_key          NUMBER NOT NULL,        -- database output belongs to
rsr_key         NUMBER NOT NULL,        -- command that generated the output
rout_skey       NUMBER NOT NULL,        -- session that created the output
rout_recid      NUMBER NOT NULL,        -- record id from server
rout_stamp      NUMBER NOT NULL,        -- timestamp when row was added
rout_text       VARCHAR2(130) NOT NULL, -- RMAN output
CONSTRAINT rout_u1 UNIQUE(db_key, rout_skey, rsr_key, rout_recid, rout_stamp),
CONSTRAINT rout_f1 FOREIGN KEY(db_key)
  REFERENCES db ON DELETE CASCADE,
CONSTRAINT rout_f2 FOREIGN KEY(rsr_key)
  REFERENCES rsr ON DELETE CASCADE
) &tablespace&

 

正是因为该Catalog恢复目录每天都会记录大量结构复杂(拥有众多数据文件、归档日志)的数据库的备份输出信息,导致ROUT表迅速膨胀,可能会存放几十万条记录,在我们的例子中:

 

SQL> select count(*) from rman.rout;

  COUNT(*)
----------
   1069485

 

根据文档<Rman Catalog Resync Operation is Very slow at 10G [ID 378234.1]>的介绍ROUT表包含了所有RMAN会话的输出内容,且这些内容仅对 Enterprise Manager企业管理器有用。 默认情况下ROUT表也会在每次Resync Catalog操作时被清理(clean up),但是在版本10.2.0.1-10.2.0.3仍会保留最近60天来的记录,当RMAN被频繁使用时60天的ROUT数据也可能非常多。(The ROUT table contains the RMAN output generated during all rman sessions and is used only by Enterprise Manager. The ROUT table is cleaned up automatically during each resync operation leaving by default, the last 60 days worth of entries in ROUT.   However where RMAN is used  very frequently, 60 days of ROUT entries is still too much.)

 

而从版本10.2.0.4 开始清理ROUT的存储过程cleanupROUT默认仅保留最近7天的数据,这就保证了ROUT不会占用过多的空间。

 

这里实际控制ROUT表中数据如何清理的recovery.sql中定义的cleanupROUT存储过程,在版本10.2.0.1 中该过程的定义如下:

 

-- The procedure cleanupROUT (deletes) all RC_RMAN_OUTPUT rows corresponding
-- to job older than 60 days.
PROCEDURE cleanupROUT IS
  start_time       date;
  high_stamp       number;
  high_session_key number;
BEGIN
  IF (this_db_key IS NULL) THEN
    raise_application_error(-20021, 'Database not set');
  END IF;

  start_time := SYSDATE;
  high_stamp := date2stamp(start_time-60);

  SELECT nvl(max(session_key), 0) into high_session_key
     from rc_rman_status where session_stamp < high_stamp;

  DELETE FROM rout
        WHERE rout_skey <= high_session_key
          AND this_db_key = rout.db_key;

  deb('cleanupROUT deleted ' || sql%rowcount || ' rows from rout table');
  deb('cleanupROUT took ' || ((sysdate - start_time) * 86400) || ' seconds');

END cleanupROUT;

 

实际我们只要修改这里high_stamp := date2stamp(start_time-60); 指定的60天的范围,就可以改变10.2.0.4前保留60天ROUT历史数据导致该表暴涨的问题。

 

首先备份recovery.sql 文件, 之后我们修改该cleanupROUT过程的定义如下:

 

 

cd $ORACLE_HOME/rdbms/admin
cp  recover.bsq recover.bsq.bak

PROCEDURE cleanupROUT IS
  start_time       date;
  high_stamp       number;
  high_session_key number;
BEGIN
  IF (this_db_key IS NULL) THEN
    raise_application_error(-20021, 'Database not set');
  END IF;

  start_time      := SYSDATE;
  high_stamp      := date2stamp(start_time-7);

  SELECT max(rsr_key) into high_session_key
    FROM rsr, dbinc
   WHERE dbinc.db_key = this_db_key
     AND rsr.dbinc_key = dbinc.dbinc_key
     AND rsr.rsr_stamp < high_stamp;

  deb('cleanupROUT select took ' || ((sysdate - start_time) * 86400) ||
      ' seconds');

  -- Delete rows from rout table for jobs older than 7 days.
  If high_session_key IS NOT NULL THEN
     DELETE FROM rout
     WHERE  db_key     = this_db_key
       AND  rout_skey <= high_session_key;
     deb('cleanupROUT deleted ' || sql%rowcount || ' rows from rout table');
  END IF;

  deb('cleanupROUT took ' || ((sysdate - start_time) * 86400) || ' seconds');

END cleanupROUT;

 

完成上述定义修改后,需要使以上的定义生效,连接到catalog库并升级catalog:

 

[oracle@vrh8 admin]$ rman target / catalog rman/rman@VPROD

Recovery Manager: Release 10.2.0.1.0 - Production on Mon Dec 5 00:37:35 2011

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

connected to target database: G10R21 (DBID=2807279631)
connected to recovery catalog database

RMAN> upgrade catalog;

recovery catalog owner is RMAN
enter UPGRADE CATALOG command again to confirm catalog upgrade

RMAN> upgrade catalog;

recovery catalog upgraded to version 10.02.00.00
DBMS_RCVMAN package upgraded to version 10.02.00.00
DBMS_RCVCAT package upgraded to version 10.02.00.00

RMAN> resync catalog;

starting full resync of recovery catalog
full resync complete

 

完成上述resync后,ROUT会保留最近7天以来的记录,但是原本占用的空间不会释放,因为cleanupROUT仅仅是delete记录。

 

SQL> select count(*) from rman.rout;

  COUNT(*)
----------
        55

SQL>  select bytes/1024/1024 "MB" ,segment_name,segment_type from dba_segments where segment_name='ROUT';

        MB SEGMENT_NAME         SEGMENT_TYPE
---------- -------------------- ------------------
        88 ROUT                 TABLE

 

这里存在2种方案:

1. 直接truncate rout表,因为本身rman output的数据只对Enterprise Manager有用,对于很多没有部署EM的环境,ROUT上的这些记录实际没有太大的意义,直接truncate rout 可以回收其原占有的空间:

 

stop any rman catalog operation 停止任何RMAN操作

SQL> truncate table rman.rout;

Table truncated.

 

2. 利用10g以后出现的shrink space功能收缩空间,这样既保留了ROUT中的少量记录,又回收了必要的空间:

 

stop any rman catalog operation 停止任何RMAN操作

SQL>  alter table rman.rout enable row movement;

Table altered.

SQL> alter table rman.rout shrink space;

Table altered.

SQL> select bytes/1024/1024 "MB" ,segment_name,segment_type from dba_segments where segment_name='ROUT';

        MB SEGMENT_NAME         SEGMENT_TYPE
---------- -------------------- ------------------
     .0625 ROUT                 TABLE
  1. Rman Catalog Resync Operation is Very slow at 10G

    Applies to:
    Oracle Server – Enterprise Edition – Version: 10.2.0.2 and later [Release: 10.2 and later ]
    Information in this document applies to any platform.
    ***Checked for relevance on 29-Nov-2011***
    Symptoms

    Catalog resyncs are taking a long time to complete.

    Debug trace shows the time is being spent in the RMANOUTPUTRESYNC phase.

    The problem SQL in the catalog is:

    SELECT MAX(RSR_KEY)
    FROM
    RSR WHERE RSR.DBINC_KEY IN (SELECT DBINC_KEY FROM DB WHERE DB.DB_KEY = :B3 )
    AND RSR.RSR_SRECID = :B2 AND RSR.RSR_SSTAMP = :B1 AND RSR.RSR_TYPE =
    ‘SESSION’

    Additionally, the size of the ROUT table is very large: 6376845 rows.
    Cause
    Bug 5219484CATALOG RESYNCS ARE VERY SLOW – ROUT TABLE HAS 6 MILLION ROWS +

    The ROUT table contains the RMAN output generated during all rman sessions and is used only by Enterprise Manager.
    The ROUT table is cleaned up automatically during each resync operation leaving by default, the last 60 days worth of entries in ROUT. However where RMAN is used very frequently, 60 days of ROUT entries is still too much.
    Solution

    Apply Patch 5219484 or ask for backport

    Workaround:
    This change will help for a while but you will end up on the same again after some time.

    1. Reduce the amount of information retained in the ROUT table by reducing the default from 60 days to n – where n is 1-59, this will reflect the amount of log history you will be able to view via OEM.

    Do this by editing $ORACLE_HOME/rdbms/admin/recover.bsq file (take a backup of this file FIRST) and finding the cleanupROUT procedure:

    — The procedure cleanupROUT (deletes) all RC_RMAN_OUTPUT rows corresponding
    — to job older than 60 days.
    PROCEDURE cleanupROUT IS ….

    Replace this procedure completely with this:

    ——————-

    PROCEDURE cleanupROUT IS
    start_time date;
    high_stamp number;
    high_session_key number;
    BEGIN
    IF (this_db_key IS NULL) THEN
    raise_application_error(-20021, ‘Database not set’);
    END IF;

    start_time := SYSDATE;
    high_stamp := date2stamp(start_time-10);

    SELECT nvl(max(rsr_key), 0) INTO high_session_key
    FROM rsr, dbinc
    WHERE rsr.dbinc_key = dbinc.dbinc_key
    AND dbinc.db_key = this_db_key
    AND rsr.rsr_sstamp < high_stamp;

    DELETE FROM rout
    WHERE rout_skey <= high_session_key
    AND this_db_key = rout.db_key;

    deb('cleanupROUT - deleted ' || sql%rowcount || ' rows from rout table');
    deb('cleanupROUT - took ' || ((sysdate - start_time) * 86400) || ' seconds');

    -----------------------------------

    Note the line:

    high_stamp := date2stamp(start_time-10);
    This will keep 10 days worth of rman log information - you can reduce this figure as necessary.

    The new procedure will also limit the query of the RSR table to only those rows associated with 'this database'.

    To pick up the change in the recovery catalog you must UPGRADE the catalog to recompile the catalog schema:
    RMAN> upgrade catalog;

    2. Make sure also the following indexes exist in your catalog:

    SQL> CREATE INDEX rout_i_db on rout(db_key);
    CREATE INDEX rout_i_rsr on rout(rsr_key);

    References
    BUG:5219484 – CATALOG RESYNCS ARE VERY SLOW – ROUT TABLE HAS 6 MILLION ROWS +
    PATCH:5219484 – CATALOG RESYNCS ARE VERY SLOW – ROUT TABLE HAS 6 MILLION ROWS +

  2. Pingback: Oracle数据恢复专题 – ORACLE数据库数据恢复、性能优化、故障诊断来问问MACLEAN