9i中DG remote archive可能导致Primary Database挂起

9i中若dg的remove archive stuck卡住可能即便在Max Performance最大性能模式下也会导致Primary database 主库的local archive 本地归档无法正常进行, 进而导致Primary db hang挂起。

Solaris sparce 64  平台上的9.2.0.5 库 的相关日志:

 

Verifying the issue in the alert log

Fri Oct 28 08:10:12 2011
ALTER SYSTEM SET log_archive_dest_1='location=/u24/app/oradata/temp_arch' SCOPE=BOTH;
Fri Oct 28 08:10:51 2011
ORACLE Instance maclean - Can not allocate log, archival required
Fri Oct 28 08:10:51 2011
ARCH: Connecting to console port...
Thread 1 cannot allocate new log, sequence 18963
All online logs needed archiving
 Current log# 3 seq# 18962 mem# 0: /u23/app/oradata/maclean/redo3a.log
 Current log# 3 seq# 18962 mem# 1: /u24/app/oradata/maclean/redo3b.log
Fri Oct 28 08:12:22 2011
ALTER SYSTEM SET log_archive_dest_1='location=/u04/app/oracle/admin/maclean mandatory' SCOPE=BOTH; 

<== changing archive destination multiple times

Fri Oct 28 08:27:56 2011
Shutting down instance: further logons disabled
Shutting down instance (immediate) <===== issued shut immediate
License high water mark = 546
Fri Oct 28 08:30:55 2011
PMON failed to delete process, see PMON trace file
Fri Oct 28 08:33:11 2011
Active call for process 19878 user 'oracle' program 'oracle@e02k72 (TNS V1-V3)'
SHUTDOWN: waiting for active calls to complete. <=== looks there are active trans and waiting to complete
Fri Oct 28 08:36:40 2011
Starting ORACLE instance (force) <=== issued startup force

Fri Oct 28 08:37:03 2011
ALTER DATABASE   MOUNT
Fri Oct 28 08:37:08 2011
Successful mount of redo thread 1, with mount id 2399989231
Fri Oct 28 08:37:08 2011
Database mounted in Exclusive Mode.
Completed: ALTER DATABASE   MOUNT <=== mount completed

Fri Oct 28 08:37:08 2011
ALTER DATABASE OPEN
Fri Oct 28 08:37:08 2011
Beginning crash recovery of 1 threads <=== started crash recover as expected
Fri Oct 28 08:37:08 2011
Started redo scan
Fri Oct 28 08:37:08 2011
Completed redo scan
0 redo blocks read, 0 data blocks need recovery
Fri Oct 28 08:37:08 2011
Started recovery at
Thread 1: logseq 18962, block 409599, scn 2775.3390299666
Recovery of Online Redo Log: Thread 1 Group 3 Seq 18962 Reading mem 0
 Mem# 0 errs 0: /u23/app/oradata/maclean/redo3a.log
 Mem# 1 errs 0: /u24/app/oradata/maclean/redo3b.log
Fri Oct 28 08:37:08 2011
Completed redo application
Fri Oct 28 08:37:08 2011
Ended recovery at
Thread 1: logseq 18962, block 409599, scn 2775.3390319667
0 data blocks read, 0 data blocks written, 0 redo blocks read
Crash recovery completed successfully
Fri Oct 28 08:37:08 2011
LGWR: Primary database is in CLUSTER CONSISTENT mode
Fri Oct 28 08:37:08 2011
ARCH: Evaluating archive   log 1 thread 1 sequence 18959
ARCH: Beginning to archive log 1 thread 1 sequence 18959
Creating archive destination LOG_ARCHIVE_DEST_2: 'maclean_dr'
Creating archive destination LOG_ARCHIVE_DEST_1: '/u04/app/oracle/admin/maclean/maclean_1_18959.arch'
Fri Oct 28 08:41:12 2011
alter database open
Fri Oct 28 08:41:12 2011
ORA-1154 signalled during: alter database open...
Fri Oct 28 08:59:10 2011
ARCH: Completed archiving  log 1 thread 1 sequence 18959
Fri Oct 28 08:59:10 2011
LGWR: Primary database is in CLUSTER CONSISTENT mode
Thread 1 advanced to log sequence 18963
Thread 1 opened at log sequence 18963
 Current log# 1 seq# 18963 mem# 0: /u23/app/oradata/maclean/redo1a.log
 Current log# 1 seq# 18963 mem# 1: /u24/app/oradata/maclean/redo1b.log
Successful open of redo thread 1
Fri Oct 28 08:59:11 2011
LOG_CHECKPOINT_INTERVAL was set when MTTR advisory was switched on.
Fri Oct 28 08:59:11 2011
ARC0: Evaluating archive   log 2 thread 1 sequence 18960
ARC0: Beginning to archive log 2 thread 1 sequence 18960
Creating archive destination LOG_ARCHIVE_DEST_2: 'maclean_dr'
Fri Oct 28 08:59:11 2011
SMON: enabling cache recovery
Fri Oct 28 08:59:11 2011
Creating archive destination LOG_ARCHIVE_DEST_1: '/u04/app/oracle/admin/maclean/maclean_1_18960.arch'
Fri Oct 28 08:59:12 2011
Undo Segment 1 Onlined
Undo Segment 2 Onlined
Undo Segment 3 Onlined
Undo Segment 4 Onlined
Undo Segment 5 Onlined
Undo Segment 6 Onlined
Undo Segment 7 Onlined
Undo Segment 8 Onlined
Undo Segment 9 Onlined
Undo Segment 10 Onlined
Successfully onlined Undo Tablespace 1.
Fri Oct 28 08:59:12 2011
SMON: enabling tx recovery
Fri Oct 28 08:59:12 2011
Database Characterset is AL32UTF8
Fri Oct 28 08:59:12 2011
SMON: about to recover undo segment 17
Fri Oct 28 08:59:12 2011
replication_dependency_tracking turned off (no async multimaster replication found)
Fri Oct 28 08:59:12 2011
SMON: about to recover undo segment 17
SMON: mark undo segment 17 as available
SMON: about to recover undo segment 18
SMON: mark undo segment 18 as available
SMON: about to recover undo segment 17
SMON: mark undo segment 17 as available
SMON: about to recover undo segment 18
SMON: mark undo segment 18 as available
SMON: about to recover undo segment 17
SMON: mark undo segment 17 as available
SMON: about to recover undo segment 18
SMON: mark undo segment 18 as available
SMON: about to recover undo segment 17
SMON: mark undo segment 17 as available
SMON: about to recover undo segment 17
SMON: mark undo segment 17 as available
SMON: about to recover undo segment 17
SMON: mark undo segment 17 as available
SMON: about to recover undo segment 17
SMON: mark undo segment 17 as available
SMON: about to recover undo segment 17
SMON: mark undo segment 17 as available
SMON: about to recover undo segment 17
SMON: mark undo segment 17 as available
SMON: about to recover undo segment 17
SMON: mark undo segment 17 as available
SMON: about to recover undo segment 17
SMON: mark undo segment 17 as available
SMON: about to recover undo segment 17
SMON: mark undo segment 17 as available
SMON: about to recover undo segment 17
SMON: mark undo segment 17 as available
SMON: about to recover undo segment 17
SMON: mark undo segment 17 as available
SMON: about to recover undo segment 17
SMON: mark undo segment 17 as available
SMON: about to recover undo segment 17
SMON: mark undo segment 17 as available
SMON: about to recover undo segment 17
SMON: mark undo segment 17 as available
SMON: about to recover undo segment 17
SMON: mark undo segment 17 as available
SMON: about to recover undo segment 17
SMON: mark undo segment 17 as available
SMON: about to recover undo segment 17
SMON: mark undo segment 17 as available
SMON: about to recover undo segment 17
SMON: mark undo segment 17 as available
SMON: Parallel transaction recovery tried <=== transaction recovery
Fri Oct 28 08:59:14 2011
Completed: ALTER DATABASE OPEN <=== database is now open
Fri Oct 28 09:02:34 2011

This looks to be an expected behavior, as the database waited for the crash/transaction recovery to
complete and it opened the database just after the transaction recovery.

Note 414242.1 Database Hangs Because SMON is taking 100% CPU doing transaction recovery Startup Hang

Also Please note that 9i  is Sustaining Support as of  now, so we cannot raise any bug/backport requests for
this version of Oracle.
So, with limited resources and information We will do our best to solve your issue.
However, if there is no available workaround or fix for this version of Oracle, you may need to migrate to
a supported version like 11.2.0.3 to solve this problem.

Also please note that 9.2.0.5 is no more listed in the certification for Oracle Solaris on SPARC (64-bit) platform.
So i request you to upgrade to at least 9.2.0.8 immediately, if upgrade to 11.2.0.3 is not feasible immediately.

If it is up are you able to issue an "alter system switch logfile" command and it finish?
yes now alter system switch logfile is working. but we had to restart database couple of times by shutdown abort..
we did it around 4-5 times.. why it concerns us a lot bcoz produciton instance was down for 2-3 hours.

3. Upload the output of the command SQL> show parameter archive.

SQL> show parameter archive

NAME TYPE
------------------------------------ --------------------------------
VALUE
------------------------------
archive_lag_target integer
0
log_archive_dest string

log_archive_dest_1 string
location=/u04/app/oracle/admin
/maclean
log_archive_dest_10 string

NAME TYPE
------------------------------------ --------------------------------
VALUE
------------------------------
log_archive_dest_2 string
service=maclean_dr reopen=60
log_archive_dest_3 string

log_archive_dest_4 string

log_archive_dest_5 string

log_archive_dest_6 string

NAME TYPE
------------------------------------ --------------------------------
VALUE
------------------------------

log_archive_dest_7 string

log_archive_dest_8 string

log_archive_dest_9 string

log_archive_dest_state_1 string
enable

NAME TYPE
------------------------------------ --------------------------------
VALUE
------------------------------
log_archive_dest_state_10 string
enable
log_archive_dest_state_2 string
enable
log_archive_dest_state_3 string
enable
log_archive_dest_state_4 string
enable
log_archive_dest_state_5 string

NAME TYPE
------------------------------------ --------------------------------
VALUE
------------------------------
enable
log_archive_dest_state_6 string
enable
log_archive_dest_state_7 string
enable
log_archive_dest_state_8 string
enable
log_archive_dest_state_9 string
enable

NAME TYPE
------------------------------------ --------------------------------
VALUE
------------------------------
log_archive_duplex_dest string

log_archive_format string
maclean_%t_%s.arch
log_archive_max_processes integer
2
log_archive_min_succeed_dest integer
1
log_archive_start boolean

NAME TYPE
------------------------------------ --------------------------------
VALUE
------------------------------
TRUE
log_archive_trace integer
0
remote_archive_enable string
true
standby_archive_dest string
?/dbs/arch
SQL>

Destination LOG_ARCHIVE_DEST_2 is in CLUSTER CONSISTENT mode
Destination LOG_ARCHIVE_DEST_2 is in MAXIMUM PERFORMANCE mode
Destination LOG_ARCHIVE_DEST_2 is in CLUSTER CONSISTENT mode
Destination LOG_ARCHIVE_DEST_2 is in MAXIMUM PERFORMANCE mode

This indicates that your Primary Init.ora parameter log_archive_dest_2 is not in sync with your standby destination.

I presume your standby database maclean_dr is working properly?

The standby dataguard database may not be in a correct state. Archive logs not being successfully transported.
Possible password errors. Possible destination file errors, possible set up errors.

standby was in sync. we even defer log_archive_dest_2 during issue to check if log switch works after that.
but even after deferring the log_archive_dest_2 log switch didn't work .& database was in bad shape for 2-3 hours..

Fri Oct 28 08:35:32 2011
RFS: Possible network disconnect with primary database <<<<<DISCONNECT FROM PRIMARY - SHUTDOWN IMMEDIATE ON PRIMARY
Closing latent archivelog for thread 1 sequence 18960
EOF located at block 47105 low SCN 2775:-904684681 next SCN 16:0
Latent archivelog '/u04/app/oracle/admin/maclean/maclean_1_18960.arch'
If you wish to failover to this standby database, you should use the
following command to manually register the archivelog for recovery:
ALTER DATABASE REGISTER LOGFILE '/u04/app/oracle/admin/maclean/maclean_1_18960.arch';
Fri Oct 28 08:35:32 2011
Errors in file /u01/app/oracle/admin/maclean/udump/maclean_rfs_6173.trc:
ORA-00367: checksum error in log file header <<<<<<<<<<<<<<<<<<<<<<CHECKSUM ERROR IN FILES
ORA-00332: archived log is too small - may be incompletely archived
ORA-00334: archived log: '/u04/app/oracle/admin/maclean/maclean_1_18960.arch'

by 0954 the standby had caught up.

Fri Oct 28 09:54:41 2011
Media Recovery Log /u04/app/oracle/admin/maclean/maclean_1_18962.arch
Media Recovery Log /u04/app/oracle/admin/maclean/maclean_1_18963.arch
Media Recovery Log /u04/app/oracle/admin/maclean/maclean_1_18964.arch
Media Recovery Log /u04/app/oracle/admin/maclean/maclean_1_18965.arch
Media Recovery Log /u04/app/oracle/admin/maclean/maclean_1_18966.arch
Media Recovery Log /u04/app/oracle/admin/maclean/maclean_1_18967.arch
Media Recovery Log /u04/app/oracle/admin/maclean/maclean_1_18968.arch
Media Recovery Log /u04/app/oracle/admin/maclean/maclean_1_18969.arch
Media Recovery Log /u04/app/oracle/admin/maclean/maclean_1_18970.arch
Media Recovery Log /u04/app/oracle/admin/maclean/maclean_1_18971.arch
Media Recovery Log /u04/app/oracle/admin/maclean/maclean_1_18972.arch
Media Recovery Log /u04/app/oracle/admin/maclean/maclean_1_18973.arch

The following notes should help you to reduce some of the errors on the standby side.

"Data Guard 9i ORA-332 Error When Registering Partial Archive Log After Primary Server Goes Down (Doc ID 233253.1)"
"Note:14579.1 SOLARIS Configuring Keepalive on Solaris (SPARC): "

This shows the value of the keep alive timer

ndd /dev/tcp tcp_keepalive_interval

You can change the parameter with:

ndd -set /dev/tcp tcp_keepalive_interval <new-smaller-value>

Current value for tcp_keepalive_interval is 7200000,
Can you please suggest us the value for this parameter also let us know if this parameter requires
server reboot or it can be done online.

$ ndd /dev/tcp tcp_keepalive_interval
7200000

Fri Oct 28 04:59:21 2011
Creating archive destination LOG_ARCHIVE_DEST_1: '/u04/app/oracle/admin/maclean/maclean_1_18958.arch'
Fri Oct 28 04:59:22 2011
Creating archive destination LOG_ARCHIVE_DEST_1: '/u04/app/oracle/admin/maclean/maclean_1_18957.arch'
Fri Oct 28 05:01:48 2011
ORACLE Instance maclean - Can not allocate log, archival required
Fri Oct 28 05:01:48 2011
ARCH: Connecting to console port...
Thread 1 cannot allocate new log, sequence 18960
All online logs needed archiving
Current log# 1 seq# 18959 mem# 0: /u23/app/oradata/maclean/redo1a.log
Current log# 1 seq# 18959 mem# 1: /u24/app/oradata/maclean/redo1b.log
Fri Oct 28 05:12:10 2011
ARCH: Completed archiving log 2 thread 1 sequence 18957
ARCH: Evaluating archive log 4 thread 1 sequence 18956
ARCH: Unable to archive log 4 thread 1 sequence 18956
Log actively being archived by another process
Fri Oct 28 05:12:10 2011
Thread 1 advanced to log sequence 18960
Current log# 2 seq# 18960 mem# 0: /u23/app/oradata/maclean/redo2a.log
Current log# 2 seq# 18960 mem# 1: /u24/app/oradata/maclean/redo2b.log
Fri Oct 28 05:40:25 2011
ARC1: Completed archiving log 4 thread 1 sequence 18956
ARC1: Evaluating archive log 3 thread 1 sequence 18958
ARC1: Unable to archive log 3 thread 1 sequence 18958
Log actively being archived by another process
ARC1: Evaluating archive log 1 thread 1 sequence 18959
ARC1: Beginning to archive log 1 thread 1 sequence 18959
Creating archive destination LOG_ARCHIVE_DEST_2: 'maclean_dr'
Creating archive destination LOG_ARCHIVE_DEST_1: '/u04/app/oracle/admin/maclean/maclean_1_18959.arch'
Fri Oct 28 05:40:27 2011
ARCH: Evaluating archive log 3 thread 1 sequence 18958
ARCH: Unable to archive log 3 thread 1 sequence 18958
Log actively being archived by another process
ARCH: Evaluating archive log 1 thread 1 sequence 18959
ARCH: Unable to archive log 1 thread 1 sequence 18959
Log actively being archived by another process
Fri Oct 28 05:40:27 2011
Thread 1 advanced to log sequence 18961
Current log# 4 seq# 18961 mem# 0: /u23/app/oradata/maclean/redo4a.log
Current log# 4 seq# 18961 mem# 1: /u24/app/oradata/maclean/redo4b.log
Fri Oct 28 06:11:45 2011
ORACLE Instance maclean - Can not allocate log, archival required
Fri Oct 28 06:11:45 2011
ARCH: Connecting to console port...
Thread 1 cannot allocate new log, sequence 18962
All online logs needed archiving
Current log# 4 seq# 18961 mem# 0: /u23/app/oradata/maclean/redo4a.log
Current log# 4 seq# 18961 mem# 1: /u24/app/oradata/maclean/redo4b.log
Fri Oct 28 07:40:42 2011
Archive Log Stop
Archiving is disabled
Shutting down archive processes
Archiving is disabled
Archive Log Start
Archiving is enabled
ARCH: STARTING ARCH PROCESSES
ARCH: STARTING ARCH PROCESSES COMPLETE
Fri Oct 28 07:54:01 2011
ALTER SYSTEM SET log_archive_dest_state_2='defer' SCOPE=SPFILE;
Fri Oct 28 07:54:09 2011
ALTER SYSTEM SET log_archive_dest_state_2='defer' SCOPE=BOTH;
Fri Oct 28 08:03:52 2011
ALTER SYSTEM SET log_archive_dest_state_2='enable' SCOPE=BOTH;
Fri Oct 28 08:06:20 2011
ARC0: Completed archiving log 3 thread 1 sequence 18958
Fri Oct 28 08:06:20 2011
ARCH: Evaluating archive log 1 thread 1 sequence 18959
ARCH: Unable to archive log 1 thread 1 sequence 18959
Log actively being archived by another process
ARCH: Evaluating archive log 2 thread 1 sequence 18960
ARCH: Beginning to archive log 2 thread 1 sequence 18960
Creating archive destination LOG_ARCHIVE_DEST_2: 'maclean_dr'
Fri Oct 28 08:06:20 2011
Thread 1 advanced to log sequence 18962
Current log# 3 seq# 18962 mem# 0: /u23/app/oradata/maclean/redo3a.log
Current log# 3 seq# 18962 mem# 1: /u24/app/oradata/maclean/redo3b.log
Fri Oct 28 08:06:20 2011
ARCH shutting down
ARC0: Archival stopped
ARC0: Shutdown aborted (current state is 3)
ARC0: Evaluating archive log 1 thread 1 sequence 18959
ARC0: Unable to archive log 1 thread 1 sequence 18959
Log actively being archived by another process
ARC0: Evaluating archive log 2 thread 1 sequence 18960
ARC0: Unable to archive log 2 thread 1 sequence 18960
Log actively being archived by another process
Fri Oct 28 08:06:21 2011
ARC0: Evaluating archive log 4 thread 1 sequence 18961
Fri Oct 28 08:06:21 2011
Undo Segment 21 Onlined
ARC0: Beginning to archive log 4 thread 1 sequence 18961
Creating archive destination LOG_ARCHIVE_DEST_2: 'maclean_dr'
Fri Oct 28 08:06:21 2011
Undo Segment 22 Onlined
Fri Oct 28 08:06:21 2011
Undo Segment 23 Onlined
Fri Oct 28 08:06:21 2011
Undo Segment 24 Onlined
Fri Oct 28 08:06:21 2011
Undo Segment 25 Onlined
Fri Oct 28 08:06:21 2011
Undo Segment 26 Onlined
Fri Oct 28 08:06:21 2011
Creating archive destination LOG_ARCHIVE_DEST_1: '/u04/app/oracle/admin/maclean/maclean_1_18960.arch'
Fri Oct 28 08:06:21 2011
Creating archive destination LOG_ARCHIVE_DEST_1: '/u04/app/oracle/admin/maclean/maclean_1_18961.arch'
Fri Oct 28 08:10:12 2011
ALTER SYSTEM SET log_archive_dest_1='location=/u24/app/oradata/temp_arch' SCOPE=BOTH;
Fri Oct 28 08:10:51 2011
ORACLE Instance maclean - Can not allocate log, archival required
Fri Oct 28 08:10:51 2011
ARCH: Connecting to console port...
Thread 1 cannot allocate new log, sequence 18963
All online logs needed archiving
Current log# 3 seq# 18962 mem# 0: /u23/app/oradata/maclean/redo3a.log
Current log# 3 seq# 18962 mem# 1: /u24/app/oradata/maclean/redo3b.log
Fri Oct 28 08:12:22 2011
ALTER SYSTEM SET log_archive_dest_1='location=/u04/app/oracle/admin/maclean mandatory' SCOPE=BOTH;
Fri Oct 28 08:27:56 2011
Shutting down instance: further logons disabled
Shutting down instance (immediate)
License high water mark = 546
Fri Oct 28 08:30:55 2011
PMON failed to delete process, see PMON trace file
Fri Oct 28 08:33:11 2011
Active call for process 19878 user 'oracle' program 'oracle@e02k72 (TNS V1-V3)'
SHUTDOWN: waiting for active calls to complete.
Fri Oct 28 08:36:40 2011
Starting ORACLE instance (force)
License high water mark = 546
Instance terminated by USER, pid = 28373

FileName
----------------
alert_maclean.log

 

这个SR提交之后 , 一开始认可能是Solaris上tcp tcp_keepalive_interval 参数的问题, 后来esclated到serverity 1 后Oracle high availability support team接手, 发现是 因为 9i中若 remote archive stuck住 会导致本地归档无法正常,  造成数据库无响应。

 

 

Please refer the below Article which seems to match with your issue :
Doc ID 260040.1 -- Refining Remote Archival Over a Slow Network with the ARCH Process .

Please set the parameter "_LOG_ARCHIVE_CALLOUT" to 'LOCAL_FIRST=TRUE' and also
increase the number of ARCH processes from the default value of 2 to 6.

SQL> alter system set log_archive_max_processes=6 scope=both;

As in Doc ID 260040.1
If the remote archiving goes into unknown state, it can cause primary hang as it doesn't attempts
local archiving first in 9..2.0.5

It can still happen in max performance mode. The only workaround to prevent such problem is to
set _LOG_ARCHIVE_CALLOUT" to 'LOCAL_FIRST=TRUE'

With that parameter, it will do local archiving first before trying remote archiving preventing
any impact of standby setup to primary database.

You mentioned that you did try deferring remote archiving but it didn't help.
If the destination is stuck (in network) already, deferring the destination may not help.
But you rather restart instances.

Action plan suggested).

Please set _LOG_ARCHIVE_CALLOUT" to 'LOCAL_FIRST=TRUE'

If problem happens again, get multiple system state dump with 10046 trace
on alter system switch log file session. 

When the parameter "_LOG_ARCHIVE_CALLOUT" is set , 1 ARCH process is reserved for local archiving.
 So even if the network is slow or the Standby is down, the Primary will not be impacted.

Setting the parameter "_LOG_ARCHIVE_CALLOUT" should not cause log shipping issues.

 

 

“_LOG_ARCHIVE_CALLOUT”(Secondly, you can change the above outlined archiver behavior, to make sure the local destination is archived first, causing the OPTIONAL destination to behave really optionally. If the following parameter is set then the ARCH process will begin archiving to the local destination first. Once the redo log has been completely and successfully archived to at least one local destination. )

For 9iR2 (9.2.0.5 and above):

_log_archive_callout=’LOCAL_FIRST=TRUE’

–OR–

alter system set “_LOG_ARCHIVE_CALLOUT”=’LOCAL_FIRST=TRUE’ scope=both;

In Oracle 10g, this parameter is replaced by ‘log_archive_local_first’, which defaults to TRUE.

In Oracle 11g, ‘log_archive_local_first’ has been deprecated.

 

这个隐藏参数可以控制ARCH归档进程优先完成本地归档local archive工作,这样就避免了本地重做日志长期无法归档造成数据库无响应。

 

文档《Refining Remote Archival Over a Slow Network with the ARCH Process》介绍了更多信息:

 

Purpose

When archiving locally and remotely using the ARCH process where the remote destination is across a saturated or slow network you can receive the following errors in the alert log:

ARC0: Evaluating archive   log 2 thread 1 sequence 100

ARC0: Unable to archive log 2 thread 1 sequence 100

      Log actively being archived by another process

If the ARCH process is unable to archive at the rate at which online logs are switched then it is possible for the primary database to suspend while waiting for archiving to complete.  The following discussion describes how this can occur.

Default Behavior for 9iR2 and Below

The ARCH process sits in a very tight loop waiting for an update to the controlfile that states an online log needs to be archived.  Once the update occurs the ARCH process builds a list of archive destinations that need to be serviced.  Once this list is complete, the ARCH process will read a one megabyte chunk of data from the online log that is to be archived.  This one megabyte chunk is then sent to the first destination in the list.  When the write has completed, the same one megabyte chunk is written to the second destination.  This continues until all of the data from the online log being archived has been written to all destinations.  So it can be said that archiving is only as fast as the slowest destination.

A common misconception is that if the LOG_ARCHIVE_DEST_n parameter for a particular destination has the OPTIONAL attribute set, then that destination will not impede local archiving. This is true during error situations while archiving to that destination – e.g. a network disconnect error, but not during an archival over a slow network, which is not an error situation. In error situations, whether the destination is marked OPTIONAL or MANDATORY, Data Guard will close that destination and continue transmitting to all other valid destinations. Transmitting to the closed destination will be attempted again only after the time specified in the REOPEN attribute has expired and a log switch has occurred.  This process will continue for the number of times specified by the MAX_FAILURE attribute. During this time, it is possible that the log writer process recycles through the available online redo log groups and tries to use the online redo log file which has not yet been transmitted successfully to the remote destination. If the destination is marked OPTIONAL, log writer will reuse the online redo log file for the next set of redo. If the destination is marked MANDATORY,  log writer will not be able to reuse that online redo log file, and the primary database will delay processing until that online redo log file has been successfully transmitted to the remote destination.

However, the situation is very different if the transmission is being done over a slow network. In this case, no error is encountered and the destination is not closed. Transmission continues, but is very slow. Ultimately, with the unavailability of any more online redo log groups, Log writer may suspend because the archive process is taking a long time to complete its archival, including local archival.

Refining the Default Behavior

The following underscore parameter was introduced as of 9.2.0.5 to allow the DBA to change this default behavior:

_LOG_ARCHIVE_CALLOUT=’LOCAL_FIRST=TRUE’

This is a dynamic Parameter, so you can set it this Way:

SQL> alter system set “_LOG_ARCHIVE_CALLOUT”=’LOCAL_FIRST=TRUE’ scope=both;

If the above parameter is set then the ARCH process will begin archiving to the local destination first.  Once the redo log has been completely and successfully archived to at least one local destination, it will then be transmitted to the remote destination. This is the default behavior beginning with Oracle Database 10g Release 1.

Starting in 9.2.0.7 patchsets, one ARCH process will begin acting as a ‘dedicated’ archiver, handling only local archival duties. It will not perform remote log shipping or service FAL requests. This is a backport of behavior from 10gR1 to 9iR2.

 

 

设置_LOG_ARCHIVE_CALLOUT=’LOCAL_FIRST=TRUE’  + ’log_archive_max_processes=6′  一般可以解决该因为 归档状态异常而引起的实例无响应。


Posted

in

by

Tags:

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *