ORA-00600 [3020] when break remote mirror and startup database

If you cannot recover the data by yourself, ask Parnassusdata, the professional ORACLE database recovery team, for help.

Parnassusdata Software Database Recovery Team

Service Hotline:  +86 13764045638 E-mail: [email protected]

 

1. The customer is using HDS remote mirroring as a DR solution. After breaking the mirror, some databases at the DR site cannot start up, failing with the following errors:

a1.
ORA-01122: database file 2 failed verification check
ORA-01110: data file 2: '+DATA07_AI401PO1/ai401po1/datafile/sysaux_01.dbf'
ORA-01207: file is more recent than control file - old control file

a2 (same database as a1, after some commands).
ORA-00600: internal error code, arguments: [3020], [5], [896], [20972416], [], [], [], [], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 5, block# 896, file offset is 7340032 bytes)
ORA-10564: tablespace UNDOTBS2
ORA-01110: data file 5: '+DATA07_AI401PO1/ai401po1/datafile/undotbs2_01.dbf'
ORA-10560: block type 'KTU UNDO BLOCK'

Resolved by “recover datafile 3”.
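For a mismatch like the ORA-01207 in a1, one quick check before attempting recovery is to compare the controlfile's checkpoint SCN with the checkpoint recorded in each data file header. A minimal SQL*Plus sketch using the standard views (the values are of course site-specific):

SQL> -- the controlfile's view of the database checkpoint
SQL> SELECT checkpoint_change# FROM v$database;

SQL> -- the checkpoint recorded in each data file header (read from the files themselves)
SQL> SELECT file#, checkpoint_change#, fuzzy, status
       FROM v$datafile_header
       ORDER BY file#;

Any file whose header checkpoint_change# is higher than the controlfile's is exactly the "file is more recent than control file" condition reported by ORA-01207.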

b.
ERROR at line 1:
ORA-00600: internal error code, arguments: [kcratr1_lastbwr], [], [], [], [],
[], [], [], [], [], [], []

Resolved by “recover database”.
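For reference, the shape of that recovery session would have been roughly the following (a sketch only; the exact commands the customer ran were not captured):

SQL> -- mount without opening, so media recovery can be run
SQL> STARTUP MOUNT
SQL> -- apply redo (the "recover database" mentioned above)
SQL> RECOVER DATABASE
SQL> ALTER DATABASE OPEN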

Questions:
Q1. We understand from MOS notes 604683.1 and 784776.1 that the storage vendor (HDS) is responsible for meeting Oracle's "crash consistent" and "write ordering" requirements, as well as for the POC and procedure. Given the errors above, however, how can we tell whether the mirror break fulfilled Oracle's requirements or not?

Q2. For (b) above, MOS note 393984.1 matches it. It may happen even in a single-site crash recovery scenario. Is this an Oracle bug or expected behavior?

The database version is 11.2.0.2.

 

A1. The errors in a1 indicate that at least one datafile had a higher database checkpoint than the controlfile. A controlfile dump and data file header dumps would have helped to verify this.
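If the instance is still available, those dumps can be taken from SQL*Plus with the standard diagnostic events; the trace files land in the instance's trace directory, and level 10 is the usual verbose level:

SQL> -- dump the controlfile contents to a trace file
SQL> ALTER SESSION SET EVENTS 'immediate trace name CONTROLF level 10';
SQL> -- dump all data file headers to a trace file
SQL> ALTER SESSION SET EVENTS 'immediate trace name FILE_HDRS level 10';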

It's not clear what commands were issued to reach the state in a2, but an ORA-600 [3020] generally means that a data block was behind the file header checkpoint information. In other words, based on the file header, recovery started with logfile #N, but block 896 probably needed a redo record from logfile N-1 to be applied first. If recovering from an older backup of the data file worked, that would lend more weight to this theory. Note 30866.1 does, however, list some bugs where ORA-600 [3020] can still occur during regular recovery.
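When an ORA-600 [3020] points at a specific file and block, as the accompanying ORA-10567 in a2 does (file# 5, block# 896), it is often worth identifying which segment owns that block. A query along these lines, plugging in the numbers from the error, would do it (here the block sits in UNDOTBS2, so an undo segment is the expected answer):

SQL> SELECT owner, segment_name, segment_type
       FROM dba_extents
       WHERE file_id = 5
       AND 896 BETWEEN block_id AND block_id + blocks - 1;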

A2. You can also read the bugs that reference ORA-600 [kcratr_scan_lastbwr]. The ORA-600 [kcratr1_lastbwr] form seems to exist only in 11.2.0.1, not in 11.2.0.2; maybe the customer is really on 11.2.0.1 + PSU 2? In any case, it could be indicative of a stale mirror, as noted in bug 9584943, but there are other bugs that I did not read in detail.
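To settle the version question, it is worth checking the exact patch level on the instance that raised the error, along these lines (dba_registry_history only shows patches that registered themselves in the database, so an empty result is not conclusive):

SQL> SELECT banner FROM v$version;
SQL> -- patch/PSU history recorded in the database
SQL> SELECT action_time, action, version, comments
       FROM dba_registry_history
       ORDER BY action_time;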

 

The customer has just updated us that the EMC "consistent group" feature was not implemented, for some reason.

We are going to tell the customer that, with this break-remote-mirror DR solution, if Oracle's "crash consistent" and "write ordering" requirements cannot be met (MOS notes 604683.1 and 784776.1), then in the worst case the customer may not even be able to recover the database. Is this correct?

 

Recovery might work if they restore a prior backup and roll forward. 🙂
Even a full recovery from a backup could still result in transaction loss, though, if the active online redo logs are also corrupt because of lost writes to the mirror those redo logs reside on.
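If it comes to that, the roll-forward is the classic incomplete-recovery sequence. A rough SQL*Plus sketch, assuming the data files (and a backup controlfile) have already been restored and the archived logs are intact; the exact stopping point depends on how much usable redo survived:

SQL> STARTUP MOUNT
SQL> -- roll forward through the archived logs; CANCEL when no more usable redo exists
SQL> RECOVER DATABASE USING BACKUP CONTROLFILE UNTIL CANCEL;
SQL> -- incomplete recovery requires RESETLOGS, which discards the (possibly corrupt)
SQL> -- online redo and accepts the transaction loss mentioned above
SQL> ALTER DATABASE OPEN RESETLOGS;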

