Oracle数据库数据恢复、性能优化、故障诊断来问问Maclean

刘相兵 邮箱: liu.maclean@gmail.com 手机: 13764045638, ORA-ALLSTARS Exadata用户组QQ群:23549328

  • 关于我|了解Maclean Liu
  • About Me
  • 向Maclean提问
  • 文章列表
  • 网站地图
  • blogs.oracle.com上的镜像
  • Oracle Feed World
  • 自学Oracle之窗
  • Oracle数据恢复专题
  • Exadata专题
  • 上海Oracle用户组

Oracle中RMAN恢复目录性能问题研究

2009/09/15 by Maclean Liu Leave a Comment

下列文本摘自metalink:

Applies to:

Oracle Server – Enterprise Edition – Version: 9.0.1.0 to 11.1.0.8
Information in this document applies to any platform.

Purpose

This note discusses what to look for when queries against the RMAN repository (whether in the controlfile or a catalog database) are performing badly and what diagnostics needs to be gathered when raising a Service Request with Oracle Support Services.

This note is intended for use by Database Administrators and Support personnel investigating RMAN performance problems relating specifically to catalog resyncs or queries against the controlfile or catalog database whilst running RMAN.

Last Review Date

November 7, 2008

Instructions for the Reader

A Troubleshooting Guide is provided to assist in debugging a specific issue. When possible, diagnostic tools are included in the document to assist in troubleshooting.

Troubleshooting Details

Identifying a problem with the RMAN repository

The RMAN repository is updated whenever new backups are taken or housekeeping is done to maintain the backup metadata; it is queried when RMAN reports are run and during restore to check backup history and determine the best and most efficient way to carry out the restore. If a catalog is used, a number of resyncs are done first to update the catalog with any new information in the controlfile but once the resyncs are complete, ALL subsequent queries of RMAN metadata (during crosscheck, report, restore etc) are run against the catalog database only.

You know there is a problem if:

  • An explicit catalog resync takes a long time to complete
  • Statspack or AWR reports for a nocatalog run shows top wait events in the target are for CPU or control file IO:
    • “control file parallel write”
    • “control file sequential read”
  • Statspack or AWR reports for catalog run shows top wait events in the catalog are for CPU or the catalog datafile IO:
    • “db file scattered read”
    • “db file sequential read”
  • Statspack or AWR reports shows most expensive queries in terms of CPU or elapsed time are those issued against the controlfile or rman catalog tables
  • You have worked through Note 360443.1: RMAN Backup Performance, and identified the bottleneck to be in the Oracle Layer before the backup actually starts
  • You have worked through Note 740911.1: RMAN Restore Performance, and identified the bottleneck to be in the Oracle Layer before the restore actually starts

Control_file_record_keep_time and recovery window

Control_file_record_keep_time specifies the minimum amount of RMAN metadata guarranteed to be retained in the controlfile; the default is 7 days.  In practice the rate at which records age out of the controlfile depends on how long it takes for a recyclable section to fill up which may well be more than 7 days. It is important to note that control_file_record_keep_time and recovery window are NOT the same; the former governs when records can be overwritten in the controlfile, primarily to prevent unlimited controlfile growth whilst the latter is concerned with retention of RMAN metadata according to business requirements (see Bug 6448254: RAISE WARNING IF RECOVERY WINDOW IS > CONTROL_FILE_RECORD_KEEP_TIME IN NOCAT MODE). If a catalog is used and RMAN is run daily or at a frequency LESS than control_file_record_keep_time days, RMAN metadata will never be lost as implicit catalog resyncs ensure data is regularly propogated into the catalog.

Integrity of RMAN metadata

Constraints in the catalog ensure the integrity of the rman metadata. For example, a backup_set must have a corresponding backup_piece which must have corresponding backup_X (datafile, redolog ) entries. There are no such constraints in a controlfile; records in different sections can age out at different rates. For example, the one:many relationship between backup_set and backup_redolog entries means the backup_redologs section will fill up faster than the backup_set section and over time, a backup_redolog entry may age out of the controlfile BEFORE its corresponding backup_set entry, leaving a backup_set entry with missing backup_redolog entries (mis-matched). Running housekeeping operations (report/delete obsolete, crosscheck etc) against such a CONTROLFILE with mismatched entries will produce unpredictable results (see Bug 7357779: RMAN REPORT/DELETE OBSOLETE RETURNS ARCHIVELOG BACKUPS INSIDE RETENTION POLICY).

To avoid any issues with mis-matched entries, set control_file_record_keep_time equal to the recovery window +1 which ensures that RMAN metadata INSIDE the retention policy will never be affected. In practice this may not be possible if a very large recovery window is needed so typically a catalog is used, a retention policy configured and control_file_record_keep_time left to its default of 7 days. But there may still be times when the catalog database is unavailable (it is being upgraded perhaps or is under investigation) and an RMAN backup must still proceed in which case:

  • set control_file_record_keep_time days to the recovery window plus 1 to ensure no loss of metadata whilst the catalog is unavailable
  • DO NOT RUN ANY MAINTENANCE OPERATIONS until the catalog is again available expecially if your recovery window is greater than the original setting of control_file_record_keep_time days

Tackling a performance issue

AWR snapshot reports from the target (nocatalog) or catalog database should identify the specific SQL that is causing a problem in terms of CPU, high IO or elapsed times.

Refer to the following notes to see if the problematic SQL is a known issue:

Note 247611.1:Known RMAN Performance Problems
Note 463227.1: Oracle10g RMAN Recovery Catalog Known Performance Issues

If none of the issues in the above notes are relevant, then further investigation is needed and the following diagnostics will be useful and should be uploaded if a Service Request is raised with Oracle Support Services. There are three processes to consider:

  • The RMAN client – this should always be traced
  • The 1st default channel (for nocatalog operations only, this process runs all the rman queries against the controlfile)
  • The RMAN session in the catalog instance

All the traces will generate a lot of information so consider FIRST how to minimise the output by stripping the job down to a SINGLE rman command. Execute each command in a script in isolation to see which command produces the problem and trace just that one command. It is always worth running an explicit catalog resync first to see if the real problem lies in the implicit resync done for virtually all commands.

Tracing the RMAN client

Ideally, set debug trace on the RMAN command line and execute a simplied script to illustrate the problem eg

%rman target / catalog usr/pw@<alias> log rman.log trace rman.trc debug
run {
allocate channel t1 type sbt parms=(…………);
backup database plus archivelog all;
}

Otherwise trace only the command that you are interested in. The following example traces only the crosscheck command:

%rman target / catalog usr/pw@<alias> log rman.log trace rman.trc
run {
allocate channel t1 type sbt parms=(….);
allocate channel t2 type sbt parms=(….);
backup database;
backup archivelog all until time ' sysdate –7' delete all input;
debug on;
crosscheck backup;
debug off;
}

Tracing queries against the RMAN repository

Make sure before you invoke rman that the following environmental variables are set:

NLS_LANG='american_america.<charset>'
NLS_DATE_FORMAT='dd-mon-rr hh24:mi:ss'

And for the target database (nocatalog operations) or catalog database (catalog operations):

TIME_STATISTICS=TRUE
MAX_DUMP_FILE_SIZE=UNLIMITED

If a recovery catalog is normally used, try again WITHOUT the catalog; if the problem persists then for diagnostic purposes, get the trace against the CONTROLFILE only. Tracing the RMAN catalog session should only be done if the problem only occurs when a catalog is used.

The Controlfile (nocatalog)

%rman target / log rman.log trace rman.trc debug
run {
sql "alter session set events ''10046 trace name context forever, level 12''";
set command id to 'rman';
allocate channel for maintenance type sbt…………..;
backup database;
}

Note: Two single quotes before : 10046
Two single quotes and a double quote after : level 12

This traces the first default channel; trace file will be generated in the target udump directory using the format <SID>_ora_<spid>.trc. To find the <spid> of the 1st default channel:

col program format a20
col action format a20
select s.sid, p.spid, s.program, s.client_info,s.action, seq#, event, wait_time,
seconds_in_wait AS sec_wait
from v$session s, v$process p
where s.paddr = p.addr and s.program like '%rman%';

Typical output:

SID SPID PROGRAM                          CLIENT_INFO
---------------------------------------------------------
ACTION              SEQ# EVENT                      WAIT_TIME SEC_WAIT
-----------------------------------------------------------------------------
139 24822 rman@celcsol4 (TNS V1-V3)
0000012 FINISHED129    853  SQL*Net message from client 0        1794
147  4574 rman@celcsol4 (TNS V1-V3)
115  SQL*Net message from client 0        1797
135  5056 rman@celcsol4 (TNS V1-V3)       rman channel=ORA_DISK_1
0000046 FINISHED129   2219  SQL*Net message from client 0        1794

Note:
- sid 147 is the polling channel; it never has a value in ACTION
- sid 139 is the 1st default Channel (CLIENT_INFO is null)
- sid 135 is the allocated channel ORA_DISK_1 (per CLIENT_INFO)

The Catalog

Allow the rman job to make the catalog connection first – it may help to use the host command to allow time to find the catalogn session:

%rman target / catalog usr/pw@<alias> log rman.log trace rman.trc debug
run {
set command id to 'rman';
allocate channel for maintenance type sbt…………..;
host;
backup database;
}

The host command will halt execution and return control to the OS prompt.
To identify the RMAN session in the catalog database:

SELECT s.sid, s.serial#, p.spid, s.program, s.action
FROM v$session s, v$process p
WHERE s.paddr = p.addr
AND s.program like '%rman%';

Note the spid value eg 3164 then do:

oradebug setospid 3164
oradebug unlimit
oradebug event 10046 trace name context forever, level 12
exit

Once this is done, to return to the RMAN session and to continue execution of the script type: exit
A trace file will be generated in the CATALOG udump directory using the format <SID>_ora_<spid>.trc.

Upload Diagnostics

1. RMAN client trace file
2. 1st default channel 10046 trace file (nocatalog only)
3. RMAN catalog 10046 trace file (catalog issues only)
4. Related AWR/Statspack reports

Filed Under: Oracle Tagged With: backups, catalog database, database administrators, diagnostic tools, housekeeping, long time, metadata, metalink, Oracle, oracle server, oracle support services, performance problems, queries, repository, resync, server enterprise, service request, target, troubleshooting details, troubleshooting guide

好用的站内搜索



新浪微博

About Me

author's avatar The Maclean Liu

AskMaclean Logo AskMaclean Logo 10g_ocm ioug_member_logo

acoug

最新动态

  • 如何在无审计的环境中追踪Truncate/Drop等危险的DDL操作
  • 【Maclean Liu技术分享】拨开Oracle优化器迷雾探究Histogram之秘
  • Exadata Database Machine数据库一体机专题
  • 【CBO Optimizer优化器】IX_SEL索引选择率
  • 【数据恢复】详解ORA-1410错误
  • 【转Oracle补丁】老托的Oracle 数据库Patch概念性小常识
  • 【数据恢复】详解ORA-8103错误
  • Oracle数据恢复专题
  • 【Exadata一体机】Exadata Cell监控最佳实践
  • 【Goldengate性能优化】优化Extract抽取进程性能,解决OGG抽取日志延迟

数据恢复

  • 如何在无审计的环境中追踪Truncate/Drop等危险的DDL操作
  • 【数据恢复】详解ORA-1410错误
  • 【数据恢复】详解ORA-8103错误
  • Oracle数据恢复专题
  • 【数据恢复】利用构造ROWID实现无备份情况下绕过ORA-1578、ORA-8103、ORA-1410等逻辑/物理坏块问题
  • 【数据恢复】ORA-600[kccpb_sanity_check_2]一例
  • Oracle rman中set newname可能很慢
  • 如何清除Oracle控制文件中的无用记录,例如v$archived_log中的deleted归档日志记录
  • 如何找回被create or replace覆盖的PL/SQL对象
  • Archivelog Completed Before VS UNTIL TIME

内部原理研究

  • 【Maclean Liu技术分 享】深入理解Oracle中 Mutex的内部原理
  • 深入理解Oracle中的Mutex
  • kksfbc Clear parse
  • 了解你所不知道的SMON功能(十二):Shrink UNDO(rollback) SEGMENT
  • 了解Database Replay Capture内部原理
  • 实测DB_BLOCK_CHECKSUM=FULL的作用
  • DML UPDATE/DELETE与CR一致性读的秘密
  • Buffer cache spillover: only buffers
  • Crash/Instance Recovery与Media Recovery的本质区别
  • _CORRUPTED_ROLLBACK_SEGMENTS隐藏参数

RAC集群

  • gc buffer busy/gcs log flush sync与log file sync
  • 【视频教学】Maclean教你用Vbox在Linux 6.3上安装Oracle 11gR2 RAC
  • 针对11.2 RAC丢失OCR和Votedisk所在ASM Diskgroup的恢复手段
  • 从此SQLPLUS有了Top命令
  • 【教学视频】Maclean教你一步一步使用Vbox在Linux 5上安装Oracle 11gR2 RAC
  • [视频教学]Maclean教你用Vbox在Enterprise Linux 5上安装Oracle 10gR2 RAC
  • Global Cache CR Requested But Current Block Received
  • MINSCN与Cache Fusion Read Consistent
  • Script:收集RAC诊断信息
  • Script:Collect vip resource Diagnostic Information

11g新特性

  • 【11g新特性】Cardinality Feedback基数反馈
  • SQL Performance Analyzer SPA常用脚本汇总
  • 11gR2游标共享新特性带来的一些问题以及_cursor_features_enabled、_cursor_obsolete_threshold和106001 event
  • VKTM进程消耗大量CPU的问题
  • 了解Database Replay Capture内部原理
  • 那些在11gR2中可能惹祸的新特性,一张列表帮助你摆脱升级11gR2带来的烦恼
  • 11g中如何禁用自动统计信息收集作业
  • 针对11.2 RAC丢失OCR和Votedisk所在ASM Diskgroup的恢复手段
  • 在11gR2 RAC中修改ASM DISK Path磁盘路径
  • 了解11g OCM

GoldenGate OGG

  • 【Goldengate性能优化】优化Extract抽取进程性能,解决OGG抽取日志延迟
  • 了解GoldenGate中LAG的含义
  • 了解GoldenGate Replicat的HANDLECOLLISIONS参数
  • Oracle Goldengate OGG 11g与各操作系统及数据库版本的兼容列表
  • Goldengate Parameter SUPPRESSTRIGGERS & DEFERREFCONST
  • OGG-01154 SQL error 1400 cannot insert NULL into错误解析
  • Goldengate Best Parameters & TEST from Maclean
  • OGG-01738 BOUNDED RECOVERY
  • ALTER EXTSEQNO must be performed on each corresponding downstream reader
  • OGG常见初始化方案

Exadata数据库一体机

  • Exadata Database Machine数据库一体机专题
  • 【Exadata一体机】Exadata Cell监控最佳实践
  • Exadata一体机健康检查报告
  • Calibrate测试Exadata IO
  • 【转】Oracle SPARC SuperCluster全能王:不改大道至简本色
  • Oracle Open World 2012信息汇总贴
  • Exadata offload incremental backup
  • Exadata:Smart Scan(一)
  • 一些有用的Exadata诊断命令
  • Script:收集Exadata诊断信息

性能调优

  • 【Maclean Liu技术分享】拨开Oracle优化器迷雾探究Histogram之秘
  • 【CBO Optimizer优化器】IX_SEL索引选择率
  • 【Maclean技术分享】开Oracle调优鹰眼,深入理解AWR性能报告 第二讲
  • GATHER_STATS_JOB: Stopped by Scheduler. Consider increasing the maintenance window duration if this happens frequently.
  • 【11g新特性】Cardinality Feedback基数反馈
  • 【Maclean Liu技术分享】拨开Oracle优化器迷雾探究Histogram之秘
  • CBO Cost Formulas基于成本优化器的成本计算公式大全
  • 关于10053 trace中的UNCOMPBKTS和ENDPTVALS
  • Oracle CBO术语大集合
  • SQL Performance Analyzer SPA常用脚本汇总

案例诊断

  • latch free:SQL memory manager latch
  • 一次Exadata上的ORA-600[kjbmprlst:shadow]故障分析
  • Warning: lio_listio returned EAGAIN Performance degradation may be seen
  • MMON Slave日志[KDL_TRIM]: NEWLEN: MESSAGES
  • LIBCLNTSH.SO: WRONG ELF CLASS: ELFCLASS32错误一例
  • dbms_logmnr Unsupported SQLREDO
  • 了解SYSDATE函数
  • Oracle等待事件KJC:Wait for msg sends to complete
  • 了解你所不知道的SMON功能(四):维护col_usage$字典基表
  • Too many fragmentation in LMT?

12c新特性

  • Oracle数据库新版本12c信息汇总
  • 【Database 12c】手动创建CDB Container Database容器数据库
  • 【Oracle Database 12c新特性】 12c DataPump Expdp/Impdp新特性
  • 【12c新特性】安装12c Standalone Grid Infrastructure
  • 【12c新特性】EM Database Express
  • 将在版本11.2之后废弃或不再支持的特性
  • 解读Tom介绍的Oracle Database 12c的12个新特性
  • 解读Oracle Database 12.1新特性Pluggable Databases
  • 【转】Oracle Database 12c将亮相OpenWorld
  • 12c分页查询特性FETCH FIRST ROWS,OFFSET ROWS FETCH NEXT ROW LIMIT Clause子句

实用脚本大全

  • 【脚本】检测CPU计算能力
  • SQL Performance Analyzer SPA常用脚本汇总
  • Script:挖掘AWR实现查询SCN历史增长走势
  • 从此SQLPLUS有了Top命令
  • Maclean Liu的脚本工具盒
  • Script:诊断SYSAUX表空间使用情况
  • Script:查找表或索引增长的历史信息
  • Script:ASM修复脚本,寻找LISTHEAD和Kfed源数据
  • Script:找出ASM中的Spfile参数文件
  • 如何诊断ORA-125XX连接问题

文章分类

最新回复

  • 如何一步步搭建Exadata虚拟机——DB节点 | Oracle Exadata & Best practices on 【教学视频】Maclean教你一步一步使用Vbox在Linux 5上安装Oracle 11gR2 RAC
  • Ask_Maclean_liu_Oracle on 为什么RHEL 6上没有ASMLIB?
  • 微宇 on 为什么RHEL 6上没有ASMLIB?
  • Ask_Maclean_liu_Oracle on 11g默认审计选项
  • admxy on 在Linux 6上使用UDEV解决RAC ASM存储设备名问题
  • frankying on 在Linux 6上使用UDEV解决RAC ASM存储设备名问题
  • Ask_Maclean_liu_Oracle on 在Linux 6上使用UDEV解决RAC ASM存储设备名问题
  • admxy on 在Linux 6上使用UDEV解决RAC ASM存储设备名问题
  • admxy on 在Linux 6上使用UDEV解决RAC ASM存储设备名问题
  • ray on My Opinion On Exadata V2

国内Oracle圈

  • A MySQL DBA's Blog
  • AtCTO
  • banping
  • Beckdim Blog's
  • CHANEL [K]
  • chenjunlu的博客
  • dbalife
  • DBAROAD
  • eagle's home
  • Ligle
  • Luda's life
  • mahir-quluzade
  • mycsdn
  • Oracle Exadata & Best practice
  • Oracle Tech Zone
  • oraqidu blog
  • orczhou
  • OS与Oracle
  • OTN中文论坛
  • pandblife
  • ricky.zhu
  • Roger
  • stanleylog
  • Thomas Zhang的杂货铺
  • Todd Bao's Blog (包光磊)
  • vmcd's library Home
  • xifenfei
  • ZhaoYu Acoug DBA Life
  • 小荷's Blog
  • 我在博客园的镜像
  • 数据工人
  • 数据智能网
  • 深入MySQL内核
  • 老熊

Return to top of page

Copyright © 2013 · Minimum Theme on Genesis Framework · WordPress · Log in