GoldenGate实现Live Standby主备库切换(1)

Oracle Goldengate目前支持主被动式的双向配置,换而言之OGG可以将来自于激活的主库的数据变化完全复制到从库中,从库在不断同步数据的同时已经为计划内的和计划外的outages做好了故障切换的准备,也就是我们说的Live Standby。这里我们重点介绍一下配置Oracle Goldengate Live Standby系统的步骤,和具体的故障切换过程。

SQL> conn clinic/clinic
Connected.
SQL> drop table tv;
create table tv (t1 int primary key,t2 int,t3 varchar2(30));
Table dropped.

SQL> 

Table created.

SQL> drop sequence seqt1;

create sequence seqt1 start with 1 increment by 1;
Sequence dropped.

SQL> SQL>
Sequence created.

declare
  rnd number(9,2);
begin
   for i in 1..100000 loop
     insert into tv values(seqt1.nextval,i*dbms_random.value,'MACLEAN IS TESTING');
     commit;
   end loop;
end;
/

/* 以上脚本在primary主库的某个应用账户下创建了测试用的数据,
    接着我们可以使用各种工具将数据初始化到从库中,如果在这个过程中
    希望实时在线数据迁移的话,可以参考《Goldengate实现在线数据迁移》
*/

/* 注意我们在Live Standby的环境中往往需要复制sequence序列,以保证切换到备库时业务可以正常进行  */

/* 初始化备库数据后,确保已与主库完全一致 */
primary :
SQL> select sum(t2) from tv;

   SUM(T2)
----------
2498624495

SQL> select last_number from user_sequences;

LAST_NUMBER
-----------
     100001

standby:
SQL> select sum(t2) from tv;

   SUM(T2)
----------
2498624495

SQL> select last_number from user_sequences;

LAST_NUMBER
-----------
     100001

以上完成准备工作后,我们可以进入到正式配置Goldengate live stanby的阶段,包括以下步骤:

  1. 配置由主库到备库的extract、replicat、data pump,该步骤同普通的单向复制没有太大的区别
  2. 配置由备库到主库的extract、replicat、data pump
  3. 启动由主库到备库的extract、replicat、data pump

接下来我们会实践整个配置过程:

1.
创建由主库到备库的extract、data pump、replicat
GGSCI (rh2.oracle.com) 10> dblogin userid maclean
Password: 
Successfully logged into database.

GGSCI (rh2.oracle.com) 11> add trandata clinic.*
Logging of supplemental redo data enabled for table CLINIC.TV


GGSCI (rh2.oracle.com) 4> add extract extstd1,tranlog,begin now
EXTRACT added.

GGSCI (rh2.oracle.com) 5> add exttrail /d01/ext/cl,megabytes 100,extract extstd1
EXTTRAIL added.

GGSCI (rh2.oracle.com) 7> view params extstd1

-- Identify the Extract group:
EXTRACT extstd1
-- Specify database login information as needed for the database:
userid maclean, password maclean
-- Specify the local trail that this Extract writes to:
EXTTRAIL /d01/ext/cl
-- Specify sequences to be captured:
SEQUENCE clinic.seqt1;
-- Specify tables to be captured:
TABLE clinic.*;
-- Exclude specific tables from capture if needed:
-- TABLEEXCLUDE 

GGSCI (rh2.oracle.com) 17> add extract pumpstd1,exttrailsource /d01/ext/cl,begin now
EXTRACT added.

GGSCI (rh2.oracle.com) 98> add rmttrail /d01/rmt/cl,megabytes 100,extract pumpstd1
RMTTRAIL added.

GGSCI (rh2.oracle.com) 129> view params pumpstd1
-- Identify the data pump group:
EXTRACT pumpstd1
userid maclean, password maclean
-- Specify database login information as needed for the database:
userid maclean, password maclean
RMTHOST rh3.oracle.com, MGRPORT 7809
-- Specify the remote trail on the standby system:
RMTTRAIL /d01/rmt/cl
-- Pass data through without mapping, filtering, conversion:
PASSTHRU
sequence clinic.seqt1;
Table clinic.*;


在备库上配置由主库到备库的replicat:

GGSCI (rh3.oracle.com) 4> add replicat repstd1,exttrail /d01/rmt/cl,begin now
REPLICAT added.

GGSCI (rh3.oracle.com) 49> view params repstd1
-- Identify the Replicat group:
REPLICAT repstd1
-- State that source and target definitions are identical:
ASSUMETARGETDEFS
-- Specify database login information as needed for the database:
userid maclean, password maclean
-- Specify tables for delivery:
MAP clinic.*, TARGET clinic.*;
-- Exclude specific tables from delivery if needed:
-- MAPEXCLUDE 

2.
创建由备库到主库的extract、data pump、replicat

GGSCI (rh3.oracle.com) 51> dblogin userid maclean
Password: 
Successfully logged into database.

GGSCI (rh3.oracle.com) 52> add trandata clinic.*
Logging of supplemental redo data enabled for table CLINIC.TV.

/* 不要忘记在备库端的相关表加上追加日志 */

GGSCI (rh3.oracle.com) 53> add extract extstd2,tranlog,begin now
EXTRACT added.

GGSCI (rh3.oracle.com) 54> add exttrail /d01/ext/cl,megabytes 100,extract extstd2
EXTTRAIL added.

GGSCI (rh3.oracle.com) 58> view params extstd2
-- Identify the Extract group:
EXTRACT extstd2
-- Specify database login information as needed for the database:
userid maclean, password maclean
-- Specify the local trail that this Extract writes to:
EXTTRAIL /d01/ext/cl
-- Specify sequences to be captured:
SEQUENCE clinic.seqt1;
-- Specify tables to be captured:
TABLE clinic.*;
-- Exclude specific tables from capture if needed:
-- TABLEEXCLUDE 

GGSCI (rh3.oracle.com) 59> add extract pumpstd2,exttrailsource /d01/ext/cl,begin now
EXTRACT added.

GGSCI (rh3.oracle.com) 60> add rmttrail /d01/rmt/cl,megabytes 100,extract pumpstd2
RMTTRAIL added.

GGSCI (rh3.oracle.com) 63> view params pumpstd2

-- Identify the data pump group:
EXTRACT pumpstd2
userid maclean, password maclean
-- Specify database login information as needed for the database:
userid maclean, password maclean
RMTHOST rh2.oracle.com, MGRPORT 7809
-- Specify the remote trail on the standby system:
RMTTRAIL /d01/rmt/cl
-- Pass data through without mapping, filtering, conversion:
PASSTHRU
sequence clinic.seqt1;
Table clinic.*;

在主库上配置replicat:


GGSCI (rh2.oracle.com) 136> add replicat repstd2,exttrail /d01/rmt/cl,begin now,checkpointtable maclean.ck
REPLICAT added.

GGSCI (rh2.oracle.com) 138> view params repstd2

-- Identify the Replicat group:
REPLICAT repstd2
-- State that source and target definitions are identical:
ASSUMETARGETDEFS
-- Specify database login information as needed for the database:
userid maclean, password maclean
-- Specify tables for delivery:
MAP clinic.*, TARGET clinic.*;
-- Exclude specific tables from delivery if needed:
-- MAPEXCLUDE 

3.
完成以上OGG配置后,可以启动主库到备库的extract、pump、以及replicat:
GGSCI (rh2.oracle.com) 141> start extstd1
Sending START request to MANAGER ...
EXTRACT EXTSTD1 starting


GGSCI (rh2.oracle.com) 142> start pumpstd1
Sending START request to MANAGER ...
EXTRACT PUMPSTD1 starting



GGSCI (rh3.oracle.com) 70> start repstd1
Sending START request to MANAGER ...
REPLICAT REPSTD1 starting

/* 如果你是在offline状态下配置的话,那么此时可以启用应用了*/

接下来我们尝试做有计划的主备库切换演练:

1.
首先停止一切在主库上的应用,这一点和DataGuard Switchover一样。在保证没有活动事务的情况下,才能切换干净。
2.
在主库端使用LAG等命令了解extract的延迟,若返回如"At EOF, no more records to process"的信息,则说明所有事务均已被抽取。
GGSCI (rh2.oracle.com) 144> lag extstd1
Sending GETLAG request to EXTRACT EXTSTD1 ...
Last record lag: 0 seconds.
At EOF, no more records to process.

在EOF的前提下关闭extract:
GGSCI (rh2.oracle.com) 146> stop extstd1 
Sending STOP request to EXTRACT EXTSTD1 ...
Request processed.

3.
同样对pump使用LAG命令,若返回如"At EOF, no more records to process"的信息,则说明已抽取的数据都被发送到备库了。
GGSCI (rh2.oracle.com) 147> lag pumpstd1
Sending GETLAG request to EXTRACT PUMPSTD1 ...
Last record lag: 3 seconds.
At EOF, no more records to process.

在EOF的前提下,关闭data pump
GGSCI (rh2.oracle.com) 148> stop pumpstd1
Sending STOP request to EXTRACT PUMPSTD1 ...
Request processed.

3.
检查备库端replicat的同步情况,如返回"At EOF, no more records to process.",则说明所有记录均被复制。
GGSCI (rh3.oracle.com) 71> lag repstd1
Sending GETLAG request to REPLICAT REPSTD1 ...
Last record lag: 5 seconds.
At EOF, no more records to process.

在EOF的前提下关闭replicat

GGSCI (rh3.oracle.com) 72> stop repstd1
Sending STOP request to REPLICAT REPSTD1 ...
Request processed.

4.
紧接着我们可以在备库上为业务应用用户赋予必要的insert、update、delete权限,启用各种触发器trigger及cascade delete约束等;
以上手段在主库上对应的操作是收回应用业务的权限,disable掉各种触发器及cascade delete约束,
之所以这样做是为了保证在任何时候扮演备库角色的数据库均不应当接受任何除了OGG外的手动的或者应用驱动的业务数据变更,
以保证主备库间的数据一致。

5.
修改原备库上的extract的启动时间到现在,已保证它不去抽取那些之前的重做日志

GGSCI (rh3.oracle.com) 75> alter extstd2 ,begin now
EXTRACT altered.

GGSCI (rh3.oracle.com) 76> start extstd2

Sending START request to MANAGER ...
EXTRACT EXTSTD2 starting


若之前没有启动由备库到主库的pump和replicat的话可以在此时启动:

GGSCI (rh3.oracle.com) 78> start pumpstd2

Sending START request to MANAGER ...
EXTRACT PUMPSTD2 starting

GGSCI (rh2.oracle.com) 161> start repstd2

Sending START request to MANAGER ...
REPLICAT REPSTD2 starting

6.此时我们可以正式启动在原备库现在的主库上的应用了


接下来我们尝试回切到原主库上:
1.前提步骤与之前的切换相似,首先停止在原备库上的任何应用,
之后使用LAG命令确认extract和replicat的进度,在确认后关闭extract和replicat。
完成在主库上的维护工作:包括赋予权限,启用触发器等等。

2.修改原主库上的extract的开始时间为当前,保证它不去处理之前的重做日志:
GGSCI (rh2.oracle.com) 165> alter extract extstd1,begin now
EXTRACT altered.

3.此时我们已经可以启动在原主库现在的主库上的应用了

4.启动最早配置的由主库到备库的extract、pump、replicat:

GGSCI (rh2.oracle.com) 166> start extstd1

Sending START request to MANAGER ...
EXTRACT EXTSTD1 starting

GGSCI (rh2.oracle.com) 171> start pumpstd1

Sending START request to MANAGER ...
EXTRACT PUMPSTD1 starting

GGSCI (rh3.oracle.com) 86> start repstd1

Sending START request to MANAGER ...
REPLICAT REPSTD1 starting


以上完成了OGG的Live Standby中主备库之间的计划内的切换Switchover,That's Great!

ARCHIVER ERROR ORA-00354: CORRUPT REDO LOG BLOCK HEADER

Problem Description:
ORA-16038: log 2 sequence# 13831 cannot be archived
ORA-00354: corrupt redo log block header
ORA-00312: online log 2 thread 1: ‘/oradata/3/TOOLS/stdby_redo/srl1.log’

LOG FILE
---------------
Filename = alert_TOOLS5_from_1021.log
See ...

...
Wed Oct 28 11:41:59 2009
Primary database is in MAXIMUM AVAILABILITY mode
Standby controlfile consistent with primary
RFS[1]: Successfully opened standby log 1: '/oradata/3/TOOLS/stdby_redo/srl0.log'
Wed Oct 28 11:42:00 2009
ARC0: Log corruption near block 604525 change 10551037679542 time ?
Wed Oct 28 11:42:00 2009
Errors in file /tools/oracle/admin/TOOLS/bdump/tools_arc0_2143.trc:
ORA-00354: corrupt redo log block header
ORA-00353: log corruption near block 604525 change 10551037679542 time 10/28/2009 11:29:50
ORA-00312: online log 2 thread 1: '/oradata/3/TOOLS/stdby_redo/srl1.log'
ARC0: All Archive destinations made inactive due to error 354
Wed Oct 28 11:42:00 2009
ARC0: Closing local archive destination LOG_ARCHIVE_DEST_2: '/oradata/3/TOOLS/archive/dgarc/1_13831_635534096.arc' (error 354)
(TOOLS)
Committing creation of archivelog '/oradata/3/TOOLS/archive/dgarc/1_13831_635534096.arc' (error 354)
ARCH: Archival stopped, error occurred. Will continue retrying
Wed Oct 28 11:42:05 2009
ORACLE Instance TOOLS - Archival Error
Wed Oct 28 11:42:05 2009
ORA-16038: log 2 sequence# 13831 cannot be archived
ORA-00354: corrupt redo log block header
ORA-00312: online log 2 thread 1: '/oradata/3/TOOLS/stdby_redo/srl1.log'
Wed Oct 28 11:42:05 2009
Errors in file /tools/oracle/admin/TOOLS/bdump/tools_arc0_2143.trc:
ORA-16038: log 2 sequence# 13831 cannot be archived
ORA-00354: corrupt redo log block header
ORA-00312: online log 2 thread 1: '/oradata/3/TOOLS/stdby_redo/srl1.log'
Wed Oct 28 11:43:04 2009
ARCH: Archival stopped, error occurred. Will continue retrying
Wed Oct 28 11:43:04 2009
ORACLE Instance TOOLS - Archival Error
Wed Oct 28 11:43:04 2009
Primary database is in MAXIMUM AVAILABILITY mode
Changing standby controlfile to RESYNCHRONIZATION level
Wed Oct 28 11:43:04 2009
ORA-16014: log 1 sequence# 13832 not archived, no available destinations
ORA-00312: online log 1 thread 1: '/oradata/3/TOOLS/stdby_redo/srl0.log'
Wed Oct 28 11:43:04 2009
Errors in file /tools/oracle/admin/TOOLS/bdump/tools_arc1_2145.trc:
ORA-16014: log 1 sequence# 13832 not archived, no available destinations
ORA-00312: online log 1 thread 1: '/oradata/3/TOOLS/stdby_redo/srl0.log'
RFS[1]: Successfully opened standby log 2: '/oradata/3/TOOLS/stdby_redo/srl1.log'
Wed Oct 28 11:43:13 2009
RFS[3]: Archived Log: '/oradata/3/TOOLS/archive/dgarc/1_13831_635534096.arc'
Wed Oct 28 11:43:14 2009
RFS LogMiner: Registered logfile [/oradata/3/TOOLS/archive/dgarc/1_13831_635534096.arc] to LogMiner session id [4]
Wed Oct 28 11:43:15 2009
LOGMINER: Begin mining logfile for session 4 thread 1 sequence 13831, /oradata/3/TOOLS/archive/dgarc/1_13831_635534096.arc
Wed Oct 28 11:44:03 2009
RFS[3]: Archived Log: '/oradata/3/TOOLS/archive/dgarc/1_13832_635534096.arc'
...
LOG FILE
---------------
Filename = alert_TOOLS6_from_1021.log
See ...

...
Wed Oct 28 11:16:01 2009
Thread 1 advanced to log sequence 13830 (LGWR switch)
Current log# 8 seq# 13830 mem# 0: /oradata/1/redo/TOOLS/redo1a.log
Current log# 8 seq# 13830 mem# 1: /oradata/2/redo/TOOLS/redo1b.log
Current log# 8 seq# 13830 mem# 2: /oradata/3/redo/TOOLS/redo1c.log
Wed Oct 28 11:29:50 2009
LGWR: Standby redo logfile selected to archive thread 1 sequence 13831
LGWR: Standby redo logfile selected for thread 1 sequence 13831 for destination LOG_ARCHIVE_DEST_2
Wed Oct 28 11:29:50 2009
Thread 1 advanced to log sequence 13831 (LGWR switch)
Current log# 9 seq# 13831 mem# 0: /oradata/1/redo/TOOLS/redo2a.log
Current log# 9 seq# 13831 mem# 1: /oradata/2/redo/TOOLS/redo2b.log
Current log# 9 seq# 13831 mem# 2: /oradata/3/redo/TOOLS/redo2c.log
Wed Oct 28 11:41:59 2009
LGWR: Standby redo logfile selected to archive thread 1 sequence 13832
LGWR: Standby redo logfile selected for thread 1 sequence 13832 for destination LOG_ARCHIVE_DEST_2
Wed Oct 28 11:41:59 2009
Thread 1 advanced to log sequence 13832 (LGWR switch)
Current log# 10 seq# 13832 mem# 0: /oradata/1/redo/TOOLS/redo3a.log
Current log# 10 seq# 13832 mem# 1: /oradata/2/redo/TOOLS/redo3b.log
Current log# 10 seq# 13832 mem# 2: /oradata/3/redo/TOOLS/redo3c.log
Wed Oct 28 11:43:04 2009
Destination LOG_ARCHIVE_DEST_2 is UNSYNCHRONIZED
LGWR: Standby redo logfile selected to archive thread 1 sequence 13833
LGWR: Standby redo logfile selected for thread 1 sequence 13833 for destination LOG_ARCHIVE_DEST_2
Wed Oct 28 11:43:04 2009
Thread 1 advanced to log sequence 13833 (LGWR switch)
Current log# 11 seq# 13833 mem# 0: /oradata/1/redo/TOOLS/redo4a.log
Current log# 11 seq# 13833 mem# 1: /oradata/2/redo/TOOLS/redo4b.log
Current log# 11 seq# 13833 mem# 2: /oradata/3/redo/TOOLS/redo4c.log
Wed Oct 28 11:45:04 2009
Destination LOG_ARCHIVE_DEST_2 is SYNCHRONIZED
LGWR: Standby redo logfile selected to archive thread 1 sequence 13834
LGWR: Standby redo logfile selected for thread 1 sequence 13834 for destination LOG_ARCHIVE_DEST_2
Wed Oct 28 11:45:05 2009
Thread 1 advanced to log sequence 13834 (LGWR switch)
Current log# 8 seq# 13834 mem# 0: /oradata/1/redo/TOOLS/redo1a.log
Current log# 8 seq# 13834 mem# 1: /oradata/2/redo/TOOLS/redo1b.log
Current log# 8 seq# 13834 mem# 2: /oradata/3/redo/TOOLS/redo1c.log
Wed Oct 28 11:46:03 2009
Thread 1 cannot allocate new log, sequence 13835
Checkpoint not complete
Current log# 8 seq# 13834 mem# 0: /oradata/1/redo/TOOLS/redo1a.log
Current log# 8 seq# 13834 mem# 1: /oradata/2/redo/TOOLS/redo1b.log
Current log# 8 seq# 13834 mem# 2: /oradata/3/redo/TOOLS/redo1c.log
Wed Oct 28 11:46:10 2009
Destination LOG_ARCHIVE_DEST_2 is UNSYNCHRONIZED
LGWR: Standby redo logfile selected to archive thread 1 sequence 13835
LGWR: Standby redo logfile selected for thread 1 sequence 13835 for destination LOG_ARCHIVE_DEST_2
Wed Oct 28 11:46:11 2009
Thread 1 advanced to log sequence 13835 (LGWR switch)
Current log# 9 seq# 13835 mem# 0: /oradata/1/redo/TOOLS/redo2a.log
Current log# 9 seq# 13835 mem# 1: /oradata/2/redo/TOOLS/redo2b.log
Current log# 9 seq# 13835 mem# 2: /oradata/3/redo/TOOLS/redo2c.log
Wed Oct 28 11:48:03 2009
Thread 1 cannot allocate new log, sequence 13836
Checkpoint not complete
Current log# 9 seq# 13835 mem# 0: /oradata/1/redo/TOOLS/redo2a.log
Current log# 9 seq# 13835 mem# 1: /oradata/2/redo/TOOLS/redo2b.log
Current log# 9 seq# 13835 mem# 2: /oradata/3/redo/TOOLS/redo2c.log
Wed Oct 28 11:48:06 2009
...

From the standby, as at 2009-10-28, 11:42, when the archiver tried to archive the standby
redo logfile. it encountered this error:

ORA-00354: corrupt redo log block header
ORA-00353: log corruption near block 604525 change 10551037679542 time 10/28/2009 11:29:50
ORA-00312: online log 2 thread 1: '/oradata/3/TOOLS/stdby_redo/srl1.log'

Errors in file /tools/oracle/admin/TOOLS/bdump/tools_arc0_2143.trc

The real logfile is retrieved from primary by the standby RFS process, then the log apply continue as usual.
The fact that the standby redo logs are corrupted and identified as corrupt by the ARC process , makes it clear that there could be some sort of I/O errors which has caused.
Reviewing the alert.log file it is clear that the RFS process fetched the new copy of the file which is corrupted and the issue has been resolved.
This is more an issue to be concentrated from the system adminisration end to determine in case there are any issues at the I.O subsystem
.

list some Script to Collect Data Guard Primary Site Diagnostic Information:

Overview
——–
This script is intended to provide an easy method to provide information
necessary to troubleshoot Data Guard issues.

Script Notes
————-
This script is intended to be run via sqlplus as the SYS or Internal user.

Script
——-
- – - – - – - – - – - – - – - – Script begins here – - – - – - – - – - – - – - – -

– NAME: dg_prim_diag.sql (Run on PRIMARY with a LOGICAL or PHYSICAL STANDBY)
– ————————————————————————
– Copyright 2002, Oracle Corporation
– LAST UPDATED: 2/23/04

– Usage: @dg_prim_diag
– ————————————————————————
– PURPOSE:
– This script is to be used to assist in collection information to help
– troubeshoot Data Guard issues with an emphasis on Logical Standby.
– ————————————————————————
– DISCLAIMER:
– This script is provided for educational purposes only. It is NOT
– supported by Oracle World Wide Technical Support.
– The script has been tested and appears to work as intended.
– You should always run new scripts on a test instance initially.
– ————————————————————————
– Script output is as follows:

set echo off
set feedback off
column timecol new_value timestamp
column spool_extension new_value suffix
select to_char(sysdate,’Mondd_hhmi’) timecol,
‘.out’ spool_extension from sys.dual;
column output new_value dbname
select value || ‘_’ output
from v$parameter where name = ‘db_name’;
spool dg_prim_diag_&&dbname&&timestamp&&suffix
set linesize 79
set pagesize 35
set trim on
set trims on
alter session set nls_date_format = ‘MON-DD-YYYY HH24:MI:SS’;
set feedback on
select to_char(sysdate) time from dual;

set echo on

– In the following the database_role should be primary as that is what
– this script is intended to be run on. If protection_level is different
– than protection_mode then for some reason the mode listed in
– protection_mode experienced a need to downgrade. Once the error
– condition has been corrected the protection_level should match the
– protection_mode after the next log switch.

column role format a7 tru
column name format a10 wrap

select name,database_role role,log_mode,
protection_mode,protection_level
from v$database;

– ARCHIVER can be (STOPPED | STARTED | FAILED). FAILED means that the
– archiver failed to archive a log last time, but will try again within 5
– minutes. LOG_SWITCH_WAIT The ARCHIVE LOG/CLEAR LOG/CHECKPOINT event log
– switching is waiting for. Note that if ALTER SYSTEM SWITCH LOGFILE is
– hung, but there is room in the current online redo log, then value is
– NULL

column host_name format a20 tru
column version format a9 tru

select instance_name,host_name,version,archiver,log_switch_wait
from v$instance;

– The following query give us information about catpatch.
– This way we can tell if the procedure doesn’t match the image.

select version, modified, status from dba_registry
where comp_id = ‘CATPROC’;

– Force logging is not mandatory but is recommended. Supplemental
– logging must be enabled if the standby associated with this primary is
– a logical standby. During normal operations it is acceptable for
– SWITCHOVER_STATUS to be SESSIONS ACTIVE or TO STANDBY.

column force_logging format a13 tru
column remote_archive format a14 tru
column dataguard_broker format a16 tru

select force_logging,remote_archive,
supplemental_log_data_pk,supplemental_log_data_ui,
switchover_status,dataguard_broker
from v$database;

– This query produces a list of all archive destinations. It shows if
– they are enabled, what process is servicing that destination, if the
– destination is local or remote, and if remote what the current mount ID
– is.

column destination format a35 wrap
column process format a7
column archiver format a8
column ID format 99
column mid format 99

select dest_id “ID”,destination,status,target,
schedule,process,mountid mid
from v$archive_dest order by dest_id;

– This select will give further detail on the destinations as to what
– options have been set. Register indicates whether or not the archived
– redo log is registered in the remote destination control file.

set numwidth 8
column ID format 99

select dest_id “ID”,archiver,transmit_mode,affirm,async_blocks async,
net_timeout net_time,delay_mins delay,reopen_secs reopen,
register,binding
from v$archive_dest order by dest_id;

– The following select will show any errors that occured the last time
– an attempt to archive to the destination was attempted. If ERROR is
– blank and status is VALID then the archive completed correctly.

column error format a55 wrap

select dest_id,status,error from v$archive_dest;

– The query below will determine if any error conditions have been
– reached by querying the v$dataguard_status view (view only available in
– 9.2.0 and above):

column message format a80

select message, timestamp
from v$dataguard_status
where severity in (‘Error’,'Fatal’)
order by timestamp;

– The following query will determine the current sequence number
– and the last sequence archived. If you are remotely archiving
– using the LGWR process then the archived sequence should be one
– higher than the current sequence. If remotely archiving using the
– ARCH process then the archived sequence should be equal to the
– current sequence. The applied sequence information is updated at
– log switch time.

select ads.dest_id,max(sequence#) “Current Sequence”,
max(log_sequence) “Last Archived”
from v$archived_log al, v$archive_dest ad, v$archive_dest_status ads
where ad.dest_id=al.dest_id
and al.dest_id=ads.dest_id
group by ads.dest_id;

– The following select will attempt to gather as much information as
– possible from the standby. SRLs are not supported with Logical Standby
– until Version 10.1.

set numwidth 8
column ID format 99
column “SRLs” format 99
column Active format 99

select dest_id id,database_mode db_mode,recovery_mode,
protection_mode,standby_logfile_count “SRLs”,
standby_logfile_active ACTIVE,
archived_seq#
from v$archive_dest_status;

– Query v$managed_standby to see the status of processes involved in
– the shipping redo on this system. Does not include processes needed to
– apply redo.

select process,status,client_process,sequence#
from v$managed_standby;

– The following query is run on the primary to see if SRL’s have been
– created in preparation for switchover.

select group#,sequence#,bytes from v$standby_log;

– The above SRL’s should match in number and in size with the ORL’s
– returned below:

select group#,thread#,sequence#,bytes,archived,status from v$log;

– Non-default init parameters.

set numwidth 5
column name format a30 tru
column value format a48 wra
select name, value
from v$parameter
where isdefault = ‘FALSE’;

spool off

- – - – - – - – - – - – - – - – Script ends here – - – - – - – - – - – - – - – -

another one:

Overview
——–

This script is intended to provide an easy method to provide information
necessary to troubleshoot Data Guard issues.

Script Notes
————-

This script is intended to be run via sqlplus as the SYS or Internal user.

Script
——-

- – - – - – - – - – - – - – - – Script begins here – - – - – - – - – - – - – - – -

– NAME: DG_phy_stby_diag.sql
– ————————————————————————
– AUTHOR:
– Michael Smith – Oracle Support Services – DataServer Group
– Copyright 2002, Oracle Corporation
– ————————————————————————
– PURPOSE:
– This script is to be used to assist in collection information to help
– troubeshoot Data Guard issues.
– ————————————————————————
– DISCLAIMER:
– This script is provided for educational purposes only. It is NOT
– supported by Oracle World Wide Technical Support.
– The script has been tested and appears to work as intended.
– You should always run new scripts on a test instance initially.
– ————————————————————————
– Script output is as follows:

set echo off
set feedback off
column timecol new_value timestamp
column spool_extension new_value suffix
select to_char(sysdate,’Mondd_hhmi’) timecol,
‘.out’ spool_extension from sys.dual;
column output new_value dbname
select value || ‘_’ output
from v$parameter where name = ‘db_name’;
spool dgdiag_phystby_&&dbname&&timestamp&&suffix
set lines 200
set pagesize 35
set trim on
set trims on
alter session set nls_date_format = ‘MON-DD-YYYY HH24:MI:SS’;
set feedback on
select to_char(sysdate) time from dual;

set echo on


– ARCHIVER can be (STOPPED | STARTED | FAILED) FAILED means that the archiver failed
– to archive a — log last time, but will try again within 5 minutes. LOG_SWITCH_WAIT
– The ARCHIVE LOG/CLEAR LOG/CHECKPOINT event log switching is waiting for. Note that
– if ALTER SYSTEM SWITCH LOGFILE is hung, but there is room in the current online
– redo log, then value is NULL

column host_name format a20 tru
column version format a9 tru
select instance_name,host_name,version,archiver,log_switch_wait from v$instance;

– The following select will give us the generic information about how this standby is
– setup. The database_role should be standby as that is what this script is intended
– to be ran on. If protection_level is different than protection_mode then for some
– reason the mode listed in protection_mode experienced a need to downgrade. Once the
– error condition has been corrected the protection_level should match the protection_mode
– after the next log switch.

column ROLE format a7 tru
select name,database_role,log_mode,controlfile_type,protection_mode,protection_level
from v$database;

– Force logging is not mandatory but is recommended. Supplemental logging should be enabled
– on the standby if a logical standby is in the configuration. During normal
– operations it is acceptable for SWITCHOVER_STATUS to be SESSIONS ACTIVE or NOT ALLOWED.

column force_logging format a13 tru
column remote_archive format a14 tru
column dataguard_broker format a16 tru
select force_logging,remote_archive,supplemental_log_data_pk,supplemental_log_data_ui,
switchover_status,dataguard_broker from v$database;

– This query produces a list of all archive destinations and shows if they are enabled,
– what process is servicing that destination, if the destination is local or remote,
– and if remote what the current mount ID is. For a physical standby we should have at
– least one remote destination that points the primary set but it should be deferred.

COLUMN destination FORMAT A35 WRAP
column process format a7
column archiver format a8
column ID format 99

select dest_id “ID”,destination,status,target,
archiver,schedule,process,mountid
from v$archive_dest;

– If the protection mode of the standby is set to anything higher than max performance
– then we need to make sure the remote destination that points to the primary is set
– with the correct options else we will have issues during switchover.

select dest_id,process,transmit_mode,async_blocks,
net_timeout,delay_mins,reopen_secs,register,binding
from v$archive_dest;

– The following select will show any errors that occured the last time an attempt to
– archive to the destination was attempted. If ERROR is blank and status is VALID then
– the archive completed correctly.

column error format a55 tru
select dest_id,status,error from v$archive_dest;

– Determine if any error conditions have been reached by querying thev$dataguard_status
– view (view only available in 9.2.0 and above):

column message format a80
select message, timestamp
from v$dataguard_status
where severity in (‘Error’,'Fatal’)
order by timestamp;

– The following query is ran to get the status of the SRL’s on the standby. If the
– primary is archiving with the LGWR process and SRL’s are present (in the correct
– number and size) then we should see a group# active.

select group#,sequence#,bytes,used,archived,status from v$standby_log;

– The above SRL’s should match in number and in size with the ORL’s returned below:

select group#,thread#,sequence#,bytes,archived,status from v$log;

– Query v$managed_standby to see the status of processes involved in the
– configuration.

select process,status,client_process,sequence#,block#,active_agents,known_agents
from v$managed_standby;

– Verify that the last sequence# received and the last sequence# applied to standby
– database.

select al.thrd “Thread”, almax “Last Seq Received”, lhmax “Last Seq Applied”
from (select thread# thrd, max(sequence#) almax
from v$archived_log
where resetlogs_change#=(select resetlogs_change# from v$database)
group by thread#) al,
(select thread# thrd, max(sequence#) lhmax
from v$log_history
where first_time=(select max(first_time) from v$log_history)
group by thread#) lh
where al.thrd = lh.thrd;

– The V$ARCHIVE_GAP fixed view on a physical standby database only returns the next
– gap that is currently blocking redo apply from continuing. After resolving the
– identified gap and starting redo apply, query the V$ARCHIVE_GAP fixed view again
– on the physical standby database to determine the next gap sequence, if there is
– one.

select * from v$archive_gap;

– Non-default init parameters.

set numwidth 5
column name format a30 tru
column value format a50 wra
select name, value
from v$parameter
where isdefault = ‘FALSE’;

spool off

- – - – - – - – - – - – - – - – Script ends here – - – - – - – - – - – - – - – -

Database Force open example

帮网友强制打开了一个没有备份的测试库,这个库没有备份也没有打开归档,因为之前也出现过active日志文件损毁,一直使用隐式参数才能正常打开:

                 _allow_resetlogs_corruption= TRUE

这次一开始这个库报ORA-600[2662]错误:

Mon Aug 23 09:37:00 2010
Errors in file /oracle/QAS/saptrace/usertrace/qas_ora_852096.trc:
ORA-00600: internal error code, arguments: [2662], [0], [130131504], [0], [130254136], [4264285], [], []
Mon Aug 23 09:37:02 2010
Errors in file /oracle/QAS/saptrace/usertrace/qas_ora_852096.trc:
ORA-00600: internal error code, arguments: [2662], [0], [130131506], [0], [130254136], [4264285], [], []

ORA-600 [2662] “Block SCN is ahead of Current SCN”错误是当数据块中的SCN领先于current SCN,由于后台进程或服务进程都会比对UGA中的dependent SCN和数据库当前的SCN,如果数据库当前SCN小于dependent SCN,那么该进程就会报ORA-600 [2662]错误,如果遭遇该错误的是服务进程,那么服务进程一般会异常终止;如果遭遇该错误的是后台进程譬如SMON,则会导致实例CRASH。
ORA-600 [2662]错误可以能由以下几种情况引起:
1.启用隐含参数_ALLOW_RESETLOGS_CORRUPTION后,以resetlogs形式打开数据库;这种情况下发生2662错误,根本原因是没有完全前滚导致控制文件中的SCN滞后于数据块中的SCN。
2.硬件故障导致数据库没法写控制文件和联机日志文件
3.错误的部分恢复数据库
4.恢复了控制文件,但是没有使用recover database using backup controlfile进行恢复
5.数据库crash后设置了_DISABLE_LOGGING隐含参数
6.在并行服务器环境中DLM存在问题

该错误的5个参数的具体含义如下:
ARGUMENTS:
Arg [a] Current SCN WRAP
Arg [b] Current SCN BASE
Arg [c] dependent SCN WRAP
Arg [d] dependent SCN BASE
Arg [e] Where present this is the DBA where the dependent SCN came from.

我们的case当中dependent SCN为130254136,而当前SCN为130131506,其差值为122630;从以上告警日志中可以看到数据库的当前SCN是在不断缓慢增长的,当我们遭遇到2662错误时,很滑稽的一点是只要不断重启数据库保持current SCN的增长,一段时间后2662错误会不药而愈。当然我们也可以不用这种笨办法,10015事件可以帮助我们调整数据库当前SCN:

/* 当数据库处于mount状态,可以使用10015事件来调整scn */

alter session  set events '10015 trace name adjust_scn level 1';

/* 这里可以设置level 2..10等 (level 1是在每次打开数据库时scn增加1000k)*/

/* 需要注意的是10g某些版本不同于9i,需要设置隐式参数_allow_error_simulation,才能真正增进scn */

SQL> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for Linux: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production

SQL> col current_scn format 999,999,999,999

SQL> select current_scn from v$database;
CURRENT_SCN
-----------
    1141408

SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup mount;
ORACLE instance started.

Total System Global Area 1653518336 bytes
Fixed Size                  2213896 bytes
Variable Size             989857784 bytes
Database Buffers          654311424 bytes
Redo Buffers                7135232 bytes
Database mounted.

SQL> alter session set events '10015 trace name adjust_scn level 1';
Session altered.

SQL> alter database open;
Database altered.

SQL>  select current_scn from v$database;
CURRENT_SCN
-----------
    1142031

/* 可以看到current_scn并未大量增加,10.2.0.4上默认10015 adjust_scn不被触发 */

SQL>  alter system set "_allow_error_simulation"=true scope=spfile;
System altered.

SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.

SQL> startup mount;
ORACLE instance started.
Total System Global Area 1653518336 bytes
Fixed Size                  2213896 bytes
Variable Size             989857784 bytes
Database Buffers          654311424 bytes
Redo Buffers                7135232 bytes
Database mounted.

SQL> alter session set events '10015 trace name adjust_scn level 1';
Session altered.

SQL> alter database open;
Database altered.

SQL>select current_scn from v$database;
     CURRENT_SCN
----------------
   1,073,741,980

在接手之前该网友已经通过反复重启数据库将数据库的当前SCN提高到dependent SCN的127037138;原以为这样就可以打开数据库了,谁知道又出现了一下错误:

Wed Aug 25 07:43:53 2010
Errors in file /oracle/QAS/saptrace/usertrace/qas_ora_929958.trc:
ORA-00600: internal error code, arguments: [4000], [8], [], [], [], [], [], []
Wed Aug 25 07:43:53 2010
Errors in file /oracle/QAS/saptrace/usertrace/qas_ora_929958.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [4000], [8], [], [], [], [], [], []
Wed Aug 25 07:43:53 2010
Error 704 happened during db open, shutting down database

bootstrap自举过程中遭遇了ORA-600 [4000]错误,该错误一般当Oracle尝试读取数据字典(主要是undo$基表)中记录的USN对应的回滚段失败引起.,通过设置隐式参数_corrupted_rollback_segments可以一定程度上规避该错误,强制打开数据库,其Argument[a]代表造成读取失败的USN(undo segment number),但实际上有问题的回滚段可能不止这一个:

/* 通过strings工具从system表空间上找回各回滚段的名字  */
$strings system.dbf |grep _SYSSMU|less
_SYSSMU1$
_SYSSMU2$
_SYSSMU3$
_SYSSMU4$
_SYSSMU5$
_SYSSMU6$
_SYSSMU7$
_SYSSMU8$
_SYSSMU9$
.........
alter system set "_corrupted_rollback_segments"='(_SYSSMU1$, _SYSSMU2$, _SYSSMU3$, _SYSSMU4$, _SYSSMU5$, _SYSSMU6$, _SYSSMU7$, _SYSSMU8$, _SYSSMU9$, _SYSSMU10$, _SYSSMU11$, _SYSSMU12$)' scope=spfile;
System altered.

/* 即便设置了_corrupted_rollback_segments隐式参数,也还有一定概率遭遇4000错误,尝试加上10513事件,并多次重启数据库 */

SQL> alter system set event='10513 trace name context forever,level 2' scope=spfile;
System altered.

/* 再次出现4000 错误 */
Errors in file /oracle/QAS/saptrace/usertrace/qas_ora_1016014.trc:
ORA-00600: internal error code, arguments: [4000], [8], [], [], [], [], [], []
Thu Aug 26 09:43:39 2010
Errors in file /oracle/QAS/saptrace/usertrace/qas_ora_1016014.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [4000], [8], [], [], [], [], [], []
Thu Aug 26 09:43:39 2010
Error 704 happened during db open, shutting down database

/* 再次重启后发现4000错误不再出现 * /

再次重启发现不再出现ORA-600[4000]错误,但在字典检查阶段Oracle认为数据文件227不匹配于当前的incarnation:

Thu Aug 26 11:13:22 2010
Dictionary check beginning
Thu Aug 26 09:46:00 2010
Errors in file /oracle/QAS/saptrace/usertrace/qas_ora_897162.trc:
ORA-01177: data file does not match dictionary - probably old incarnation
ORA-01110: data file 227: '/oracle/QAS/sapdata2/qas_192/qas.data196'
Error 1177 happened during db open, shutting down database
USER: terminating instance due to error 1177
Instance terminated by USER, pid = 897162

初步判断出现ORA-01177可能为2种可能性:
1.数据字典出现讹误,227号文件对应的incarnation信息不正确
2.在之前的某次resetlogs open过程中,227号文件头由于某些原因没有正确更新incarnation信息

针对这样的情况如果一定要找回该数据文件上的数据的话只能通过手动修改数据字典或文件头,当然也可以尝试使用一些直接从数据文件上抽取数据的工具。
因为这是一次友情协助,就没有继续深入下去,通过重建控制文件并跳过该数据文件解决了:

CREATE CONTROLFILE REUSE DATABASE "QAS" RESETLOGS  NOARCHIVELOG
--  SET STANDBY TO MAXIMIZE PERFORMANCE
    MAXLOGFILES 255
    MAXLOGMEMBERS 3
    MAXDATAFILES 254
    MAXINSTANCES 50
    MAXLOGHISTORY 36302
LOGFILE
  GROUP 1 (
    '/oracle/QAS/redolog/redolog11A.dbf',
    '/oracle/QAS/redolog/redolog11B.dbf'
  ) SIZE 500M,
  GROUP 2 (
    '/oracle/QAS/redolog/redolog12A.dbf',
    '/oracle/QAS/redolog/redolog12B.dbf'
  ) SIZE 500M
-- STANDBY LOGFILE
DATAFILE
  '/oracle/QAS/sapdata1/system_1/system.data1',
........
  '/oracle/QAS/sapdata2/qas_192/qas.data195'
CHARACTER SET WE8DEC
Thu Aug 26 14:04:50 2010
Successful mount of redo thread 1, with mount id 2117500093
Thu Aug 26 14:04:50 2010
Completed: CREATE CONTROLFILE REUSE DATABASE "QAS" RESETLOGS
Thu Aug 26 14:05:05 2010
alter database mount
Thu Aug 26 14:05:05 2010
ORA-1100 signalled during: alter database mount...
Thu Aug 26 14:05:15 2010
alter database open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 1125281471596
Resetting resetlogs activation ID 0 (0x0)
Online log 1 of thread 1 was previously cleared
Thu Aug 26 14:05:36 2010
Assigning activation ID 2117500093 (0x7e367cbd)
Thread 1 opened at log sequence 1
  Current log# 2 seq# 1 mem# 0: /oracle/QAS/redolog/redolog12A.dbf
  Current log# 2 seq# 1 mem# 1: /oracle/QAS/redolog/redolog12B.dbf
Successful open of redo thread 1
Thu Aug 26 14:05:36 2010
SMON: enabling cache recovery
Thu Aug 26 14:05:36 2010
Dictionary check beginning
Tablespace 'PSAPTEMP' #2 found in data dictionary,
but not in the controlfile. Adding to controlfile.
File #227 found in data dictionary but not in controlfile.
Creating OFFLINE file 'MISSING00227' in the controlfile.
This file can no longer be recovered so it must be dropped.
File #228 found in data dictionary but not in controlfile.
Creating OFFLINE file 'MISSING00228' in the controlfile.
This file can no longer be recovered so it must be dropped.
File #229 found in data dictionary but not in controlfile.
Creating OFFLINE file 'MISSING00229' in the controlfile.
This file can no longer be recovered so it must be dropped.
Dictionary check complete
Thu Aug 26 14:05:38 2010
SMON: enabling tx recovery
Thu Aug 26 14:05:38 2010
Database Characterset is WE8DEC
replication_dependency_tracking turned off (no async multimaster replication found)
Completed: alter database open resetlogs

这个case告诉我们,测试库并不一定就不重要,测试库也是需要备份的。

DataGuard Managed recovery hang

Our team deleted some archivelog by mistake. Rolled the database forwards by RMAN incremental recovery to an SCN. Did a manual recovery to sync it with the primary. Managed recovery is now failing.
alter database recover managed standby database disconnect

Alert log has :

Fri Jan 22 13:50:22 2010
Attempt to start background Managed Standby Recovery process
MRP0 started with pid=12
MRP0: Background Managed Standby Recovery process started
Media Recovery Waiting for thread 1 seq# 193389
Fetching gap sequence for thread 1, gap sequence 193389-193391
Trying FAL server: ITS
Fri Jan 22 13:50:28 2010
Completed: alter database recover managed standby database d
Fri Jan 22 13:53:25 2010
Failed to request gap sequence. Thread #: 1, gap sequence: 193389-193391
All FAL server has been attempted.

Managed recovery was working earlier today after the Rman incremental and resolved two gaps automatically. But it now appears hung with the standby falling behind the primary.

SQL> show parameter fal

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
fal_client string ITS_STBY
fal_server string ITS

[v08k608:ITS:oracle]$ tnsping ITS_STBY

TNS Ping Utility for Solaris: Version 9.2.0.7.0 - Production on 22-JAN-2010 15:01:17

Copyright (c) 1997 Oracle Corporation. All rights reserved.

Used parameter files:
/oracle/product/9.2.0/network/admin/sqlnet.ora


Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL= TCP)(Host= v08k608.am.mot.com)(Port= 1526)) (CONNECT_DATA = (SID = ITS)))
OK (10 msec)
[v08k608:ITS:oracle]$ tnsping ITS

TNS Ping Utility for Solaris: Version 9.2.0.7.0 - Production on 22-JAN-2010 15:01:27

Copyright (c) 1997 Oracle Corporation. All rights reserved.

Used parameter files:
/oracle/product/9.2.0/network/admin/sqlnet.ora


Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL= TCP)(Host= 187.10.68.75)(Port= 1526)) (CONNECT_DATA = (SID = ITS)))
OK (320 msec)

Primary has :
SQL> show parameter log_archive_dest_2
log_archive_dest_2 string SERVICE=DRITS_V08K608 reopen=6
0 max_failure=10 net_timeout=1
80 LGWR ASYNC=20480 OPTIONAL
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
log_archive_dest_state_2 string ENABLE
[ITS]/its15/oradata/ITS/arch> tnsping DRITS_V08K608
TNS Ping Utility for Solaris: Version 9.2.0.7.0 - Production on 22-JAN-2010 15:03:24
Copyright (c) 1997 Oracle Corporation. All rights reserved.
Used parameter files:
/oracle/product/9.2.0/network/admin/sqlnet.ora
Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL= TCP)(Host= 10.177.13.57)(Port= 1526)) (CONNECT_DATA = (SID = ITS)))
OK (330 msec)

The arch process on the primary database might hang due to a bug below so that it couldn’t ship the missing archive log
files to the standby database.

BUG 6113783 ARC PROCESSES CAN HANG INDEFINITELY ON NETWORK
[ Not published so not viewable in My Oracle Support ]
Fixed 11.2, 10.2.0.5 patchset

We could work workaround the issue by killing the arch processes on the primary site and they will be respawned
automatically immediately without harming the primary database.

[maclean@rh2 ~]$ ps -ef|grep arc
maclean   8231     1  0 22:24 ?        00:00:00 ora_arc0_PROD
maclean   8233     1  0 22:24 ?        00:00:00 ora_arc1_PROD
maclean   8350  8167  0 22:24 pts/0    00:00:00 grep arc
[maclean@rh2 ~]$ kill -9 8231 8233
[maclean@rh2 ~]$ ps -ef|grep arc
maclean   8389     1  0 22:25 ?        00:00:00 ora_arc0_PROD
maclean   8391     1  1 22:25 ?        00:00:00 ora_arc1_PROD
maclean   8393  8167  0 22:25 pts/0    00:00:00 grep arc

and alert log will have:

Fri Jul 30 22:25:27 EDT 2010
ARCH: Detected ARCH process failure
ARCH: Detected ARCH process failure
ARCH: STARTING ARCH PROCESSES
ARC0 started with pid=26, OS id=8389
Fri Jul 30 22:25:27 EDT 2010
ARC0: Archival started
ARC1: Archival started
ARCH: STARTING ARCH PROCESSES COMPLETE
ARC1 started with pid=27, OS id=8391
Fri Jul 30 22:25:27 EDT 2010
ARC0: Becoming the 'no FAL' ARCH
ARC0: Becoming the 'no SRL' ARCH
Fri Jul 30 22:25:27 EDT 2010
ARC1: Becoming the heartbeat ARCH

Actually if we don’t kill some fatal process in 10g , oracle will respawn all nonfatal processes.
For example:

[maclean@rh2 ~]$ ps -ef|grep ora_|grep -v grep
maclean  14264     1  0 23:16 ?        00:00:00 ora_pmon_PROD
maclean  14266     1  0 23:16 ?        00:00:00 ora_psp0_PROD
maclean  14268     1  0 23:16 ?        00:00:00 ora_mman_PROD
maclean  14270     1  0 23:16 ?        00:00:00 ora_dbw0_PROD
maclean  14272     1  0 23:16 ?        00:00:00 ora_lgwr_PROD
maclean  14274     1  0 23:16 ?        00:00:00 ora_ckpt_PROD
maclean  14276     1  0 23:16 ?        00:00:00 ora_smon_PROD
maclean  14278     1  0 23:16 ?        00:00:00 ora_reco_PROD
maclean  14338     1  0 23:16 ?        00:00:00 ora_arc0_PROD
maclean  14340     1  0 23:16 ?        00:00:00 ora_arc1_PROD
maclean  14452     1  0 23:17 ?        00:00:00 ora_s000_PROD
maclean  14454     1  0 23:17 ?        00:00:00 ora_d000_PROD
maclean  14456     1  0 23:17 ?        00:00:00 ora_cjq0_PROD
maclean  14458     1  0 23:17 ?        00:00:00 ora_qmnc_PROD
maclean  14460     1  0 23:17 ?        00:00:00 ora_mmon_PROD
maclean  14462     1  0 23:17 ?        00:00:00 ora_mmnl_PROD
maclean  14467     1  0 23:17 ?        00:00:00 ora_q000_PROD
maclean  14568     1  0 23:18 ?        00:00:00 ora_q001_PROD

[maclean@rh2 ~]$ ps -ef|grep ora_|grep -v pmon|grep -v ckpt |grep -v lgwr|grep -v smon|grep -v grep|grep -v dbw|grep -v psp|grep -v mman |grep -v rec|awk '{print $2}'|xargs kill -9

and alert log will have:
Fri Jul 30 23:20:58 EDT 2010
ARCH: Detected ARCH process failure
ARCH: Detected ARCH process failure
ARCH: STARTING ARCH PROCESSES
ARC0 started with pid=20, OS id=14959
Fri Jul 30 23:20:58 EDT 2010
ARC0: Archival started
ARC1: Archival started
ARCH: STARTING ARCH PROCESSES COMPLETE
Fri Jul 30 23:20:58 EDT 2010
ARC0: Becoming the 'no FAL' ARCH
ARC0: Becoming the 'no SRL' ARCH
ARC1 started with pid=21, OS id=14961
ARC1: Becoming the heartbeat ARCH
Fri Jul 30 23:21:29 EDT 2010
found dead shared server 'S000', pid = (10, 3)
found dead dispatcher 'D000', pid = (11, 3)
Fri Jul 30 23:22:29 EDT 2010
Restarting dead background process CJQ0
Restarting dead background process QMNC
CJQ0 started with pid=12, OS id=15124
Fri Jul 30 23:22:29 EDT 2010
Restarting dead background process MMON
QMNC started with pid=13, OS id=15126
Fri Jul 30 23:22:29 EDT 2010
Restarting dead background process MMNL
MMON started with pid=14, OS id=15128
MMNL started with pid=16, OS id=15132

That's all right!

Fail to queue the whole FAL gap in dataguard一例

近日告警日志中出现以下记录:
FAL[server]: Fail to queue the whole FAL gap
GAP – thread 1 sequence 180-180
DBID 3731271451 branch 689955035

这是一个10.2.0.3的dataguard环境,采用物理备库,归档传输模式;查询metalink发现相关note:

Symptoms

When using ARCH transport, gaps could be flagged in the alert log even though the single log gap was for a log that had not been written at the primary yet.

alert.log on primary shows:

FAL[server]: Fail to queue the whole FAL gap

GAP – thread 1 sequence 63962-63962

DBID 1243807152 branch 631898097

or alert.log on standby shows:

Fetching gap sequence in thread 1, gap sequence 63962-63962

Thu Jan 24 14:36:30 2008

FAL[client]: Failed to request gap sequence

GAP – thread 1 sequence 63962-63962

DBID 2004523329 branch 594300676

FAL[client]: All defined FAL servers have been attempted.

v$archive_gap returns no rows

SQL> select * from v$archive_gap;

no rows selected

Cause

Bug 5526409 – FAL gaps reported at standby for log not yet written at primary

Solution

Bug 5526409 is fixed in 10.2.0.4 and 11.1.

Upgrade to 10.2.0.4 as Bug 5526409 is fixed in 10.2.0.4.

Their is no impact of these messages on the database. You can safely ignore these messages.

One-off Patch for Bug 5526409 on top of 10.2.0.3 is available for some platforms. Please check Patch 5526409 for your platform.

该note描述在10.1.0.2-10.2.0.3版本中,在ARCH传输的DataGuard环境中可能出现日志传输gap为单个在primary库中尚未写出的日志,该gap可能会在告警日志中以以上形式标示。
该bug(问题)在版本10.2.0.4和11.1中得到了修复,在10.2.0.3版本中部分平台上有one-off补丁。但实际上该bug(问题)对于主备库不会有任何影响,我们也可以将之忽略。

7月最新发布11.2.0.1.2 Patch set update

7月13日,11g release 2 的第二个补丁集更新发布了;9i的最终版本为9.2.0.8,10g上10.2.0.5很有可能成为最终版本,我们预期今后(11g,12g)中Patch set数量会有效减少,而patch set update数量可能大幅增加;这样的更新形式可以为Oracle Database提升一定的软件形象。可以猜想11gr2的最终版本号可能是11.2.0.2/3.x。

附该psu的readme note:

Released: July 13, 2010

This document is accurate at the time of release. For any changes and additional information regarding PSU 11.2.0.1.2, see these related documents that are available at My Oracle Support (http://support.oracle.com/):

  • Note 854428.1 Patch Set Updates for Oracle Products
  • Note 1089071.1 Oracle Database Patch Set Update 11.2.0.1.2 Known Issues

This document includes the following sections:

1 Patch Information

Patch Set Update (PSU) patches are cumulative. That is, the content of all previous PSUs is included in the latest PSU patch.

PSU 11.2.0.1.2 includes the fixes listed in Section 5, “Bugs Fixed by This Patch”.

Table 1 describes installation types and security content. For each installation type, it indicates the most recent PSU patch to include new security fixes that are pertinent to that installation type. If there are no security fixes to be applied to an installation type, then “None” is indicated. If a specific PSU is listed, then apply that or any later PSU patch to be current with security fixes.

Table 1 Installation Types and Security Content

Installation Type Latest PSU with Security Fixes
Server homes PSU 11.2.0.1.2


Client-Only Installations None
Instant Client Installations None

(The Instant Client installation is not the same as the client-only Installation. For additional information about Instant Client installations, see Oracle Database Concepts.)

2 Patch Installation and Deinstallation

This section includes the following sections:

2.1 Platforms for PSU 11.2.0.1.2

For a list of platforms that are supported in this Patch Set Update, see My Oracle Support Note 1060989.1 Critical Patch Update July 2010 Patch Availability Document for Oracle Products.

2.2 OPatch Utility Information

You must use the OPatch utility version 11.2.0.1.0 or later to apply this patch. Oracle recommends that you use the latest released OPatch 11.2, which is available for download from My Oracle Support patch 6880880 by selecting the 11.2.0.0.0 release.

For information about OPatch documentation, including any known issues, see My Oracle Support Note 293369.1 OPatch documentation list.

2.3 Patch Installation

These instructions are for all Oracle Database installations.

2.3.1 Patch Pre-Installation Instructions

Before you install PSU 11.2.0.1.2, perform the following actions to check the environment and to detect and resolve any one-off patch conflicts.

2.3.1.1 Environments with ASM

If you are installing the PSU to an environment that has Automatic Storage Management (ASM), note the following:

  • For Linux x86 and Linux x86-64 platforms, install either (A) the bug fix for 8898852 and the Database PSU patch 9654983, or (B) the Grid Infrastructure PSU patch 9343627.
  • For all other platforms, no action is required. The fix for 8898852 was included in the base 11.2.0.1.0 release.

2.3.1.2 Environment Checks
  1. Ensure that the $PATH definition has the following executables: make, ar, ld, and nm.The location of these executables depends on your operating system. On many operating systems, they are located in /usr/ccs/bin, in which case you can set your PATH definition as follows:
    export PATH=$PATH:/usr/ccs/bin
    

2.3.1.3 One-off Patch Conflict Detection and Resolution

For an introduction to the PSU one-off patch concepts, see “Patch Set Updates Patch Conflict Resolution” in My Oracle Support Note 854428.1 Patch Set Updates for Oracle Products.

The fastest and easiest way to determine whether you have one-off patches in the Oracle home that conflict with the PSU, and to get the necessary conflict resolution patches, is to use the Patch Recommendations and Patch Plans features on the Patches & Updates tab in My Oracle Support. These features work in conjunction with the My Oracle Support Configuration Manager. Recorded training sessions on these features can be found in Note 603505.1.

However, if you are not using My Oracle Support Patch Plans, follow these steps:

  1. Determine whether any currently installed one-off patches conflict with the PSU patch as follows:
    unzip p9654983_11201_<platform>.zip
    opatch prereq CheckConflictAgainstOHWithDetail -phBaseDir ./9654983
    
  2. The report will indicate the patches that conflict with PSU 9654983 and the patches for which PSU 9654983 is a superset.Note that Oracle proactively provides PSU 11.2.0.1.2 one-off patches for common conflicts.
  3. Use My Oracle Support Note 1061295.1 Patch Set Updates – One-off Patch Conflict Resolution to determine, for each conflicting patch, whether a conflict resolution patch is already available, and if you need to request a new conflict resolution patch or if the conflict may be ignored.
  4. When all the one-off patches that you have requested are available at My Oracle Support, proceed with Section 2.3.2, “Patch Installation Instructions”.

2.3.2 Patch Installation Instructions

Follow these steps:

  1. If you are using a Data Guard Physical Standby database, you must first install this patch on the primary database before installing the patch on the physical standby database. It is not supported to install this patch on the physical standby database before installing the patch on the primary database. For more information, see My Oracle Support Note 278641.1.
  2. Do one of the following, depending on whether this is a RAC environment:
    • If this is a RAC environment, choose one of the patch installation methods provided by OPatch (rolling, all node, or minimum downtime), and shutdown instances and listeners as appropriate for the installation method selected.This PSU patch is rolling RAC installable. Refer to My Oracle Support Note 244241.1 Rolling Patch – OPatch Support for RAC.
    • If this is not a RAC environment, shut down all instances and listeners associated with the Oracle home that you are updating. For more information, see Oracle Database Administrator’s Guide.
  3. Set your current directory to the directory where the patch is located and then run the OPatch utility by entering the following commands:
    unzip p9654983_11201_<platform>.zip
    cd 9654983
    opatch apply
    
  4. If there are errors, refer to Section 3, “Known Issues”.

2.3.3 Patch Post-Installation Instructions

After installing the patch, perform the following actions:

  1. Apply conflict resolution patches as explained in Section 2.3.3.1.
  2. Load modified SQL files into the database, as explained in Section 2.3.3.2.

2.3.3.1 Applying Conflict Resolution Patches

Apply the patch conflict resolution one-off patches that were determined to be needed when you performed the steps in Section 2.3.1.3, “One-off Patch Conflict Detection and Resolution”.

2.3.3.2 Loading Modified SQL Files into the Database

The following steps load modified SQL files into the database. For a RAC environment, perform these steps on only one node.

  1. For each database instance running on the Oracle home being patched, connect to the database using SQL*Plus. Connect as SYSDBA and run the catbundle.sql script as follows:
    cd $ORACLE_HOME/rdbms/admin
    sqlplus /nolog
    SQL> CONNECT / AS SYSDBA
    SQL> STARTUP
    SQL> @catbundle.sql psu apply
    SQL> QUIT
    

    The catbundle.sql execution is reflected in the dba_registry_history view by a row associated with bundle series PSU.

    For information about the catbundle.sql script, see My Oracle Support Note 605795.1 Introduction to Oracle Database catbundle.sql.

  2. Check the following log files in $ORACLE_HOME/cfgtoollogs/catbundle for any errors:
    catbundle_PSU_<database SID>_APPLY_<TIMESTAMP>.log
    catbundle_PSU_<database SID>_GENERATE_<TIMESTAMP>.log
    

    where TIMESTAMP is of the form YYYYMMMDD_HH_MM_SS. If there are errors, refer to Section 3, “Known Issues”.

2.3.4 Patch Post-Installation Instructions for Databases Created or Upgraded after Installation of PSU 11.2.0.1.2 in the Oracle Home

These instructions are for a database that is created or upgraded after the installation of PSU 11.2.0.1.2.

You must execute the steps in Section 2.3.3.2, “Loading Modified SQL Files into the Database” for any new database only if it was created by any of the following methods:

  • Using DBCA (Database Configuration Assistant) to select a sample database (General, Data Warehouse, Transaction Processing)
  • Using a script that was created by DBCA that creates a database from a sample database

2.4 Patch Deinstallation

These instructions apply if you need to deinstall the patch.

2.4.1 Patch Deinstallation Instructions for a Non-RAC Environment

Follow these steps:

  1. Verify that an $ORACLE_HOME/rdbms/admin/catbundle_PSU_<database SID>_ROLLBACK.sql file exists for each database associated with this ORACLE_HOME. If this is not the case, you must execute the steps in Section 2.3.3.2, “Loading Modified SQL Files into the Database” against the database before deinstalling the PSU.
  2. Shut down all instances and listeners associated with the Oracle home that you are updating. For more information, see Oracle Database Administrator’s Guide.
  3. Run the OPatch utility specifying the rollback argument as follows.
    opatch rollback -id 9654983
    
  4. If there are errors, refer to Section 3, “Known Issues”.

2.4.2 Patch Post-Deinstallation Instructions for a Non-RAC Environment

Follow these steps:

  1. Start all database instances running from the Oracle home. (For more information, see Oracle Database Administrator’s Guide.)
  2. For each database instance running out of the ORACLE_HOME, connect to the database using SQL*Plus as SYSDBA and run the rollback script as follows:
    cd $ORACLE_HOME/rdbms/admin
    sqlplus /nolog
    SQL> CONNECT / AS SYSDBA
    SQL> STARTUP
    SQL> @catbundle_PSU_<database SID>_ROLLBACK.sql
    SQL> QUIT
    

    In a RAC environment, the name of the rollback script will have the format catbundle_PSU_<database SID PREFIX>_ROLLBACK.sql.

  3. Check the log file for any errors. The log file is found in $ORACLE_HOME/cfgtoollogs/catbundle and is named catbundle_PSU_<database SID>_ROLLBACK_<TIMESTAMP>.log where TIMESTAMP is of the form YYYYMMMDD_HH_MM_SS. If there are errors, refer to Section 3, “Known Issues”.

2.4.3 Patch Deinstallation Instructions for a RAC Environment

Follow these steps for each node in the cluster, one node at a time:

  1. Shut down the instance on the node.
  2. Run the OPatch utility specifying the rollback argument as follows.
    opatch rollback -id 9654983
    

    If there are errors, refer to Section 3, “Known Issues”.

  3. Start the instance on the node as follows:
    srvctl start instance
    

2.4.4 Patch Post-Deinstallation Instructions for a RAC Environment

Follow the instructions listed in Section Section 2.4.2, “Patch Post-Deinstallation Instructions for a Non-RAC Environment” only on the node for which the steps in Section 2.3.3.2, “Loading Modified SQL Files into the Database” were executed during the patch application.

All other instances can be started and accessed as usual while you are executing the deinstallation steps.

3 Known Issues

For information about OPatch issues, see My Oracle Support Note 293369.1 OPatch documentation list.

For issues documented after the release of this PSU, see My Oracle Support Note 1089071.1 Oracle Database Patch Set Update 11.2.0.1.2 Known Issues.

Other known issues are as follows.

Issue 1
The following ignorable errors may be encountered while running the catbundle.sql script or its rollback script:

ORA-29809: cannot drop an operator with dependent objects
ORA-29931: specified association does not exist
ORA-29830: operator does not exist
ORA-00942: table or view does not exist
ORA-00955: name is already used by an existing object
ORA-01430: column being added already exists in table
ORA-01432: public synonym to be dropped does not exist
ORA-01434: private synonym to be dropped does not exist
ORA-01435: user does not exist
ORA-01917: user or role 'XDB' does not exist
ORA-01920: user name '<user-name>' conflicts with another user or role name
ORA-01921: role name '<role name>' conflicts with another user or role name
ORA-01952: system privileges not granted to 'WKSYS'
ORA-02303: cannot drop or replace a type with type or table dependents
ORA-02443: Cannot drop constraint - nonexistent constraint
ORA-04043: object <object-name> does not exist
ORA-29832: cannot drop or replace an indextype with dependent indexes
ORA-29844: duplicate operator name specified
ORA-14452: attempt to create, alter or drop an index on temporary table already in use
ORA-06512: at line <line number>. If this error follow any of above errors, then can be safely ignored.
ORA-01927: cannot REVOKE privileges you did not grant

4 References

The following documents are references for this patch.

Note 293369.1 OPatch documentation list

Note 360870.1 Impact of Java Security Vulnerabilities on Oracle Products

Note 468959.1 Enterprise Manager Grid Control Known Issues

Note 9352237.8 Bug 9352237 – 11.2.0.1.1 Patch Set Update (PSU)

5 Bugs Fixed by This Patch

This patch includes the following bug fixes.

5.1 CPU Molecules

CPU molecules in PSU 11.2.0.1.2:

PSU 11.2.0.1.2 contains the following new CPU molecules:

9676419 – DB-11.2.0.1-MOLECULE-004-CPUJUL2010

9676420 – DB-11.2.0.1-MOLECULE-005-CPUJUL2010

5.2 Bug Fixes

PSU 11.2.0.1.2 contains the following new fixes:

Automatic Storage Management

8755082 – ORA-00600: [KCFIS_TRANSLATE4:VOLUME LOOKUP], [2], [WRONG DEVICE NAME], [], [], [

8890026 - ASM PARTNERING CREATES IMBALANCED PARTNERSHIPS

9170608 - STBH:DD BLOCKS PINNED FOR QUERIES THAT DO NOT REQUEST USED SPACE

9363145 - STBH:DB INSTANCES TERMINATED BY ASMB DUE TO ORA-00600 [KFDSKALLOC0]

Buffer Cache

8330783 – HANGING DB WITH “CACHE BUFFER CHAINS” AND “BUFFER DEADLOCK” WAITS DURING INSERT

8822531 – TAKING AWR SNAP HANGS

Data Guard Broker

8918433 – UNPERSISTED FSFO STATE BITS CAN GET PERSISTED

9363384 – PHYSICAL STANDBY SERVICES NOT STARTED AFTER CONVERT FROM SNAPSHOT

9467635 – BROKER’S METADATA FILE UPGRADE TO 11.2 IS BROKEN

9467727 – GETSTATUS DOC YIELDS INCORRECT RESULT IF DBRESOURCE_ID PROP VALUE IS USED

Data Guard Logical

8774868 – LGSBFSFO: ORA-600 [3020], [3], [138] RAISED IN RECOVERY SLAVE

8822832 – V$ARCHIVE_DEST_STATUS HAS INCORRECT VALUE FOR APPLIED_SEQ#

DataGuard Redo Transport

8872096 – ARCHIVING FORCED DURING CLOSE WHEN NO STANDBY IS PRESENT

9399090 – STBH: CONSTANT/HIGH FREQUENT LOG SWITCHES ON BEEHIVE DATABASE IN THE LAST 3 DAYS

Shared Cursors

8865718 – RECURSIVE CURSORS CONTAINING “AS OF SNAPSHOT” CLAUSE ARE NOT SHARED

8981059 – HIGH VERSION COUNT:BIND_MISMATCH,USER_BIND_PEEK_MISMATCH,OPTIMIZER_MODE_MISMATCH

9010222 – APPS ST 11G ORA-00600 [KKSFBC-REPARSE-INFINITE-LOOP]

9067282 – TB:SH:ORA-00600:[KKSFBC-WRONG-KKSCSFLGS] WHILE RUNNING TPC-H

DML Drivers

9255542 – ARRAY INSERT TO PARTITIONED TABLE LOOSES ROWS DUE TO CONCURRENT DDL (ORA-14403)

9488887 – FORIEGN KEY VIOLATION WITH ARRAY-INSERT AND ONLINE IDX REBUILD AFTER BUG-9255542

Flashback Database

8834425 – ORA-240 IN RVWR PROCESS CAUSING 5MIN TRANSACTIONAL HANG

PLSQL

9210925 – AFTER MANUAL UPGRADE TO 11.1.0.7 PL/SQL CALLS INCORRECT FUNCTION

Automatic Memory Management

8505803 – PRE_PAGE_SGA RESULTS IN EXCESSIVE PAGE TABLE SIZE WHEN USING MEMORY_TARGET [AMM]

Partitioning

9165206 – PARTITIONING ORA-600 [KKPOLLS1] / [KKDOILSF1] – DURING PARTITION MAINTANANCE

Real Application Cluster

8875671 – LX64: ORA-600 ARGS [KJPNP_CHK:!MASTER_READY],

9093300 – LOTS OF REPEATED KJXOCDR: DROP DUPLICATE OPEN MESSAGE IN LMD TRACE

Row Access Method

8544696 – TABLE GROWTH – BLOCKS ARE NOT REUSED

Streams

8650719 – DOWNSTREAM CAPTURE ABORTS WITH ORA-26766

Secure Files

8856478 – RAM SECUREFILE PERF DEGRADATION WITH SF COMPRESSION ON SMALL LOBS DURING ATB MOVE

9272086 – STBH: DATA PUMP WRITER SEEMS TO BE WAITING ON WAIT FOR UNREAD MESSAGE ON BROADCA

DB Recovery

8909984 – APPSST GSI 11G: GAPS IN AWR SNAPSHOTS

9068088 – MEDIA RECOVERY WAS HUNG ON STANDBY

9145541 – ORA-600 [25027] / ORA-600 [4097] FOR ACTIVE TX IN A PLUGGED TABLESPACE

9167285 – PKT-BUGOLTP: ORA-07445: [KCRALC()+87]

Space Management

7519406 – ‘J000′ TRACE FILE REGARDING GATHER_STATS_JOB INTERMITTENTLY SINCE 10.2.0.4

8815639 – [11GR2-LNX-090813] MULTIPLE INSERT CAUSE DATA ALLOCATION ABOVE HHWM

9216806 – HIGH “ENQ: TS – CONTENTION” FOR TEMPORARY SEGMENT WHILE SQLLDR DIRECT PATH LOAD

9242411 – STRESS-BIGBH: LOTS OF OR-3113S IN BIGBH STRESS TEST

9461782 – ORA-7445 [KTSLF_SUMFSG()+54] [SIGSEGV] AND KTSLFSUM_CFS ON CALL STACK

Compression

9011088 – [11GR2]ADDING COLUMN TO COMPRESSED TABLE, DATA LOSS OCCURED.

9275072 – APPSST GSI 11G : BUFFER BUSY WAITS INSERTING INTO TABLES

9341448 – APPSST GSI 11G : BUFFER BUSY WAITS AND LATCH: CACHE BUFFERS WAITS WHEN INSERTING

9637033 – ORA-07445[KDR9IR2RST0] INSERT AS SELECT IN A COMPRESSED TABLE WITH > 255 COLUMNS

SQL Execution

8664189 – ORA-00600 [KDISS_UNCOMPRESS: BUFFER LENGTH]

9119194 – PSRC: DISTRIBUTED QUERY SLOWER IN 10.2.0.4 COMPARED TO 10.2.0.3

Transaction Management

8268775 – PERF: HIGH US ENQUEUE CONTENTION DURING A LOGIN STORM OR SESSION FAILOVER

8803762 – ORA-00600 [KDSGRP1] BLOCK CORRUPTION ON 11G DATABASE UPGRADE

Memory Management

8431487 – INSTANCE CRASH ORA-07445 [KGGHSTFEL()+192] ORA-07445[KGGHSTMAP()+241]

Message

9713537 – ENHANCE CAUSE/ACTION FIELDS OF THE INTERNAL ERROR ORA-00600

9714832 – ENHANCE CAUSE/ACTION FIELDS OF THE INTERNAL ERROR ORA-07445

x$ksusecst 内部视图详解

9i 中v$session_wait 是Oracle wait interface的一个主要用户接口,而该动态视图的内容来源于x$ksusecst内部视图:


SQL> select view_definition from v$fixed_view_definition where view_name='GV$SESSION_WAIT';

VIEW_DEFINITION
--------------------------------------------------------------------------------
select s.inst_id,s.indx,s.ksussseq,e.kslednam, e.ksledp1,s.ksussp1,s.ksussp1r,e.
ksledp2, s.ksussp2,s.ksussp2r,e.ksledp3,s.ksussp3,s.ksussp3r, decode(s.ksusstim,
0,0,-1,-1,-2,-2,   decode(round(s.ksusstim/10000),0,-1,round(s.ksusstim/10000)))
, s.ksusewtm, decode(s.ksusstim, 0, 'WAITING', -2, 'WAITED UNKNOWN TIME',  -1, '
WAITED SHORT TIME', 'WAITED KNOWN TIME')  from x$ksusecst s, x$ksled e where bit
and(s.ksspaflg,1)!=0 and bitand(s.ksuseflg,1)!=0 and s.ksussseq!=0 and s.ksussop
c=e.indx

SQL> desc x$ksusecst
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 ADDR                                               RAW(4)
//即 v$session中 saddr 会话的起始地址
 INDX                                               NUMBER
//即 instance_id
 INST_ID                                            NUMBER
//即 sid
 KSSPAFLG                                           NUMBER
 KSUSEFLG                                           NUMBER
//该session是否仍活着, 1 为 alive
 KSUSENUM                                           NUMBER
//另一个固有编号
 KSUSSSEQ                                           NUMBER
// 相当于v$session 视图的SERIAL#列
 KSUSSOPC                                           NUMBER
// 对应x$ksled视图indx列,等待事件列表的一个序列号
 KSUSSP1                                            NUMBER
// 即v$session_wait表的p1列
 KSUSSP1R                                           RAW(4)
// 即v$session_wait表的p1raw
 KSUSSP2                                            NUMBER
// 即v$session_wait表的p2
 KSUSSP2R                                           RAW(4)
// 即v$session_wait表的p2raw
 KSUSSP3                                            NUMBER
// 即v$session_wait表的p3
 KSUSSP3R                                           RAW(4)
// 即v$session_wait表的p3raw
 KSUSSTIM                                           NUMBER
// 即v$session_wait表的wait_time,但单位为微秒
 KSUSEWTM                                           NUMBER
// 即v$session_wait表的seconds_in_wait,单位仍为秒

粗略写了一个可以代替v$session_wait视图的查询语句,过滤了可能出现的空闲等待事件,同时细化wait_time列到us级别:


select s.inst_id,
       s.indx sid,
       s.ksussseq seq#,
       e.kslednam event,
       e.ksledp1 p1text,
       s.ksussp1 p1,
       s.ksussp1r p1raw,
       e.ksledp2 p2text,
       s.ksussp2 p2,
       s.ksussp2r p2raw,
       e.ksledp3 p3text,
       s.ksussp3 p3,
       s.ksussp3r p3raw,
       s.ksusstim wait_time,
       s.ksusewtm seconds_in_wait,
       decode(s.ksusstim,
              0,
              'WAITING',
              -2,
              'WAITED UNKNOWN TIME',
              -1,
              'WAITED SHORT TIME',
              'WAITED KNOWN TIME') state
 from x$ksusecst s, x$ksled e
 where bitand(s.ksspaflg, 1) != 0
   and bitand(s.ksuseflg, 1) != 0
   and s.ksussseq != 0
   and s.ksussopc = e.indx
   and e.kslednam not in ('pmon timer',
                          'VKTM Logical Idle Wait',
                          'VKTM Init Wait for GSGA',
                          'IORM Scheduler Slave Idle Wait',
                          'rdbms ipc message',
                          'i/o slave wait',
                          'VKRM Idle',
                          'wait for unread message on broadcast channel',
                          'wait for unread message on multiple broadcast channels',
                          'class slave wait',
                          'KSV master wait',
                          'PING',
                          'watchdog main loop',
                          'DIAG idle wait',
                          'ges remote message',
                          'gcs remote message',
                          'heartbeat monitor sleep',
                          'SGA: MMAN sleep for component shrink',
                          'MRP redo arrival',
                          'LNS ASYNC archive log',
                          'LNS ASYNC dest activation',
                          'LNS ASYNC end of log',
                          'simulated log write delay',
                          'LGWR real time apply sync',
                          'parallel recovery slave idle wait',
                          'LogMiner builder: idle',
                          'LogMiner builder: branch',
                          'LogMiner preparer: idle',
                          'LogMiner reader: log (idle)',
                          'LogMiner reader: redo (idle)',
                          'LogMiner client: transaction',
                          'LogMiner: other',
                          'LogMiner: activate',
                          'LogMiner: reset',
                          'LogMiner: find session',
                          'LogMiner: internal',
                          'Logical Standby Apply Delay',
                          'parallel recovery coordinator waits for slave cleanup',
                          'parallel recovery control message reply',
                          'parallel recovery slave next change',
                          'PX Deq: Txn Recovery Start',
                          'PX Deq: Txn Recovery Reply',
                          'fbar timer',
                          'smon timer',
                          'PX Deq: Metadata Update',
                          'Space Manager: slave idle wait',
                          'PX Deq: Index Merge Reply',
                          'PX Deq: Index Merge Execute',
                          'PX Deq: Index Merge Close',
                          'PX Deq: kdcph_mai',
                          'PX Deq: kdcphc_ack',
                          'shared server idle wait',
                          'dispatcher timer',
                          'cmon timer',
                          'pool server timer',
                          'JOX Jit Process Sleep',
                          'jobq slave wait',
                          'pipe get',
                          'PX Deque wait',
                          'PX Idle Wait',
                          'PX Deq: Join ACK',
                          'PX Deq Credit: need buffer',
                          'PX Deq Credit: send blkd',
                          'PX Deq: Msg Fragment',
                          'PX Deq: Parse Reply',
                          'PX Deq: Execute Reply',
                          'PX Deq: Execution Msg',
                          'PX Deq: Table Q Normal',
                          'PX Deq: Table Q Sample',
                          'Streams fetch slave: waiting for txns',
                          'Streams: waiting for messages',
                          'Streams capture: waiting for archive log',
                          'single-task message',
                          'SQL*Net message from client',
                          'SQL*Net vector message from client',
                          'SQL*Net vector message from dblink',
                          'PL/SQL lock timer',
                          'Streams AQ: emn coordinator idle wait',
                          'EMON slave idle wait',
                          'Streams AQ: waiting for messages in the queue',
                          'Streams AQ: waiting for time management or cleanup tasks',
                          'Streams AQ: delete acknowledged messages',
                          'Streams AQ: deallocate messages from Streams Pool',
                          'Streams AQ: qmn coordinator idle wait',
                          'Streams AQ: qmn slave idle wait',
                          'Streams AQ: RAC qmn coordinator idle wait',
                          'HS message to agent',
                          'ASM background timer',
                          'auto-sqltune: wait graph update',
                          'WCR: replay client notify',
                          'WCR: replay clock',
                          'WCR: replay paused',
                          'JS external job',
                          'cell worker idle',
                          'SQL*Net message to client');

关于参数log_file_name_convert

Oracle文档对于该参数的描述十分容易产生歧义:converts the filename of a new log file on the primary database to the filename of a log file on the standby database,有时被误解为归档日志的文件名转换。
如在某standby备库进行以下测试:

alter system set log_file_name_convert='orcl','ZZZZZZ' scope=spfile;

SQL> select fnnam,fnonm from x$kccfn;

FNNAM

--------------------------------------------------------------------------------

FNONM

--------------------------------------------------------------------------------

/u01/oradata/ZZZZZZ/redo03.log

/u01/oradata/orcl/redo03.log

/u01/oradata/ZZZZZZ/redo02.log

/u01/oradata/orcl/redo02.log

/u01/oradata/ZZZZZZ/redo01.log

/u01/oradata/orcl/redo01.log


alter system set log_file_name_convert='orcl','8888888' scope=spfile;

SQL> select fnnam,fnonm from x$kccfn;



FNNAM

--------------------------------------------------------------------------------

FNONM

--------------------------------------------------------------------------------

/u01/oradata/8888888/redo03.log

/u01/oradata/orcl/redo03.log

 /u01/oradata/8888888/redo02.log

/u01/oradata/orcl/redo02.log

/u01/oradata/8888888/redo01.log

/u01/oradata/orcl/redo01.log

v$datafile中的大部分信息来源于x$kccfn内部视图,kccfn意为[F]ile [N]ames来源于Controlfile,其中 fnnam为经过对controlfile中文件名记录转制(由db_file_name_convert或 log_file_name_convert等参数convert)后的记录,而fnonm为控制文件中的原始文件名(或曰文件路径)。若在Data Guard配置过程中遭遇到日志文件名或数据文件名的转制问题,可以通过查询该视图进一步分析。

author: maclean
permanent link:http://www.youyus.com/2010/05/31/%E5%85%B3%E4%BA%8E%E5%8F%82%E6%95%B0log_file_name_convert/
date:2010-05-31
All rights reserved.