A case of ORA-600:[ksnpost:ksnigb]

A 10.2.0.3 database on HP-UX Itanium raised ORA-00600: internal error code, arguments: [ksnpost:ksnigb], [], [], [], [], []. The relevant log entries are:

 

ORA-00600: internal error code, arguments: [kolaslGetLength-1], [], [], [], [], [], [], []

ORA-00600: internal error code, arguments: [ksnpost:ksnigb], [], [], [], [], [], [], []
ORA-00609: could not attach to incoming connection
ORA-12157: TNS:internal network communication error

ORA-27300: OS system dependent operation:fork failed with status: 12
ORA-27301: OS failure message: Not enough space
ORA-27302: failure occurred at: skgpspawn3

ORA-00600: internal error code, arguments: [kolaslGetLength-1], [], [], [], [], [], [], []
Current SQL statement for this session:
SELECT * FROM gv$sql

ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [ksnpost:ksnigb], [], [], [], [], [], [], []
ORA-00609: could not attach to incoming connection
ORA-12157: TNS:internal network communication error
Current SQL information unavailable - no session.

 

Related bug information from MOS:

 

Bug # 4634662 – Ora-600 [kolaslGetLength-1]
Bug # 7479468 – Ora-600 [ksnpost:ksnigb]

 

Issue 1)
This issue is described in note 357016.1. Please apply the patch mentioned in note 357016.1 to avoid this issue.

 

Issue 2)
This is due to an OS resource limitation; you need to check with HP on this. We are running out of OS resources, and hence this issue is occurring.
You can also set the limits reported by ulimit -a to unlimited for the oracle and root users to avoid this issue.

 

Bug 4634662 OERI:kolaslGetLength-1 from V$SQL in RAC
ORA-600[kolaslGetLength-1] selecting from V$SQL (or GV$SQL) on a RAC database with a varying width character set.
Range of versions believed to be affected: versions < 11
Versions confirmed as being affected:

10.1.0.5
10.2.0.3

This issue is fixed in:

10.2.0.3 Patch 5 on Windows Platforms
10.2.0.4 (Server Patch Set)
11.1.0.6 (Base Release)

Applies to:
Oracle Server – Enterprise Edition – Version: 10.1.0.2 to 10.2.0.3 – Release: 10.1 to 10.2
Information in this document applies to any platform.
Symptoms

select * from gv$sql in a RAC environment

errors with:
ORA-00600: internal error code, arguments: [kolaslGetLength-1], [], [], [], [], [], [], [].

with a multibyte character set (i.e. AL32UTF8, UTF8).
Note: This error has also been reported when selecting from SQLT$_GV$SQL_PLAN while running the SQLT Execute diagnostic utility (SQLTXPLAIN.SQLT$*).

Cause
This is Bug 4634662

Details: ORA-600[kolaslGetLength-1] received when selecting from V$SQL on a RAC database that has a varying width character set

Solution

Bug 4634662 is fixed in 10.2.0.4 and Rel 11.

To check if a patch exists for your platform, go to Metalink:

a) Click on Patches & Updates Folder.
b) Click on Simple Search.
c) Enter patch#: Patch 4634662
d) Select your O/S
e) Click Go.

ORA-600 [Ksnpost:Ksnigb] ORA-12157

Applies to:
Oracle Server – Enterprise Edition – Version: 10.2.0.4 and later [Release: 10.2 and later ]
Information in this document applies to any platform.
Symptoms

The following is reported in the alert log:

ORA-00600: internal error code, arguments: [ksnpost:ksnigb], [], [], [], [], [], [], []
ORA-00609: could not attach to incoming connection
ORA-12157: TNS:internal network communication error

Cause

ORA-600 [ksnpost:ksnigb] along with ORA-600 [729] errors are causing problems when new sessions are created.
All of these point to insufficient memory or OS limits.

Bug 7479468 was raised for a similar issue and was closed because insufficient swap space turned out to be the problem.
Solution

Check the swap space in the system and increase if required.
References
BUG:7479468 – ORA-600 [KSNPOST:KSNIGB] AND ORA-12157

 

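Analysis shows that 10.2.0.3 and other versions have a bug in which querying the gv$sql view on RAC raises ORA-600[kolaslGetLength-1]. This bug can be fixed by applying patch 4634662.

In addition, when available memory is insufficient or OS limits are set too low, Oracle may fail to create new sessions and raise ORA-600[ksnpost:ksnigb].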
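The status 12 in "ORA-27300: OS system dependent operation:fork failed with status: 12" is the errno returned by fork(). "Not enough space" is how HP-UX words ENOMEM, which points again at memory or swap rather than at Oracle itself. Below is a minimal Python sketch for decoding such a status on a Unix host. It assumes errno 12 means ENOMEM on the machine where it runs. That holds on Linux and HP-UX, but errno numbering is platform-specific. The message printed by os.strerror is the local platform's wording, not necessarily HP-UX's.

```python
import errno
import os


def decode_fork_status(status: int) -> str:
    """Turn the numeric status from ORA-27300 into an errno name and message.

    Assumption: errno numbers on this host match those on the database
    server (12 is ENOMEM on Linux and HP-UX, but this is platform-specific).
    """
    name = errno.errorcode.get(status, "UNKNOWN")
    return f"{status} = {name}: {os.strerror(status)}"


print(decode_fork_status(12))
```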

It is worth checking the output of ulimit -a. On HP-UX, consider setting the stack size to unlimited.

$ ulimit -a
time(seconds) unlimited
file(blocks) unlimited
data(kbytes) 1048576
stack(kbytes) 131072
memory(kbytes) unlimited
coredump(blocks) 4194303
nofiles(descriptors) 2048
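The same soft limits can also be read from inside a process. The following is a minimal sketch that assumes Python 3 on a Unix host, since the `resource` module does not exist on Windows. It reads only the limits of the Python process itself. That process inherits them from the shell, so it should be run as the oracle user from the same login environment. Only the data, stack and nofiles lines are mapped, because the "memory" line has no portable RLIMIT_* equivalent.

```python
import resource

# ulimit label -> (RLIMIT_* constant, divisor to convert bytes into ulimit's unit).
# getrlimit() reports DATA/STACK in bytes while `ulimit -a` prints kbytes.
LIMITS = {
    "data(kbytes)": (resource.RLIMIT_DATA, 1024),
    "stack(kbytes)": (resource.RLIMIT_STACK, 1024),
    "nofiles(descriptors)": (resource.RLIMIT_NOFILE, 1),
}


def fmt(value: int, unit: int) -> str:
    """Render a limit the way ulimit does: 'unlimited' or a scaled number."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return str(value // unit)


def soft_limits() -> dict:
    """Return this process's soft limits keyed by the ulimit label."""
    report = {}
    for label, (res, unit) in LIMITS.items():
        soft, _hard = resource.getrlimit(res)
        report[label] = fmt(soft, unit)
    return report


if __name__ == "__main__":
    for label, value in soft_limits().items():
        print(f"{label:<22}{value}")
```

Any limit that prints as a finite number here, such as a stack of 131072 kbytes, is a candidate for raising before retrying. The change itself belongs in the shell or OS configuration, not in this script.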

Negative datafile usage shown in EM dbconsole?

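A user on ITPUB reported that in 11g, EM dbconsole shows negative usage for datafiles, as in this screenshot:

[Screenshot: EM dbconsole datafile page showing negative usage values]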

In fact, the bugs that make EM (EM Home page -> Administration Tab -> Storage -> Datafiles) show negative usage exist in both 11g and 10g. They are listed here for reference:

 

10g Enterprise Manager Shows Negative Utilisation For An Oracle Datafile.

Applies to:
Oracle Server - Enterprise Edition - Version: 10.2.0.1 to 10.2.0.3
This problem can occur on any platform.
Symptoms

EM ( EM Home page -> Administration Tab -> Storage -> Datafiles ) shows negative 
utilization for some files. Value in Used (MB) column is negative.

Cause
This is because of Bug:5083393.

If an object is dropped and not purged, the value of FILE_ID / RELATIVE_FNO in 
DBA_FREE_SPACE may be wrong, and this causes EM to display a wrong value in the Used (MB) column.
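The small Python model below shows how a wrong FILE_ID alone can produce a negative value. It assumes that EM computes Used (MB) per datafile as the file size minus the sum of DBA_FREE_SPACE.BYTES grouped by FILE_ID, which is what the Grid Control query quoted later in this post does. The file IDs and sizes are hypothetical.

```python
from collections import defaultdict

# Hypothetical datafile sizes in MB (file_id -> size); not taken from the bug report.
DATAFILES = {6: 100, 7: 100}
# Free extents as (FILE_ID reported by DBA_FREE_SPACE, size in MB).
# The last extent really lives in file 7 but is reported under FILE_ID 6,
# mimicking the FILE_ID / RELATIVE_FNO mismatch described above.
FREE_EXTENTS = [(6, 60), (7, 30), (6, 50)]


def used_mb(datafiles, free_extents):
    """Used (MB) = file size - free space summed by the *reported* FILE_ID."""
    free = defaultdict(int)
    for file_id, size in free_extents:
        free[file_id] += size
    return {file_id: size - free[file_id] for file_id, size in datafiles.items()}


print(used_mb(DATAFILES, FREE_EXTENTS))  # {6: -10, 7: 70}
```

The misattributed extent makes file 6 look more than 100% free, which yields the negative value. File 7 looks fuller than it really is. Purging the recycle bin removes such extents, which matches the workaround below.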

Solution
If available, download Patch:5083393 to resolve this issue.
Note that this patch is available on 10.2.0.2/10.2.0.3 for some platforms at the 
time of writing (July 2007).

Workaround :
-------------
Purge all objects from the recycle bin.

$ sqlplus "/ as sysdba"
SQL> purge dba_recyclebin;

Hdr: 5083393 10.1.0.4 RDBMS 10.1.0.4 SPACE PRODID-5 PORTID-912
Abstract: VALUE OF FILE_ID AND RELATIVE_FNO OF DBA_FREE_SPACE IS DIFFERENT

PROBLEM:
--------
When the table is dropped, the value of the FILE_ID column and
the RELATIVE_FNO column of dba_free_space might be different.

SQL> select * from dba_free_space where tablespace_name = 'TBS_TEST1';

TABLESPACE_NAME FILE_ID BLOCK_ID    BYTES BLOCKS RELATIVE_FNO
--------------- ------- -------- -------- ------ ------------
TBS_TEST1             6      105    65536      8            6
TBS_TEST1             6      113    65536      8            6
TBS_TEST1             6      121    65536      8            6
TBS_TEST1             6        9    65536      8            7

SQL> purge table tbs_test01;

Table purged.

SQL> select * from dba_free_space where tablespace_name = 'TBS_TEST1';

TABLESPACE_NAME FILE_ID BLOCK_ID    BYTES BLOCKS RELATIVE_FNO
--------------- ------- -------- -------- ------ ------------
TBS_TEST1             6        9   983040    120            6
TBS_TEST1             7        9   983040    120            7
TBS_TEST1             8        9 20905984   2552            8

In short, there seems to be a problem in how free space is displayed
for segments that exist in the RECYCLE BIN.

WORKAROUND:
-----------
None

RELATED BUGS:
-------------
None

REPRODUCIBILITY:
----------------
  100% (both in-house and at the customer site)
  Rep?     Platform               RDBMS Ver. 
  -------  ---------------------  ---------- 
  Y(100%)  912 Intel Windows NT   10.1.0.4
  Y(100%)  912 Intel Windows NT   10.1.0.5
  Y(100%)  912 Intel Windows NT   10.2.0.2
  Y(100%)   46 Linux x86          10.1.0.4

 

11gR2 EMCONSOLE SHOWS NEGATIVE VALUES WHEN VIEWING DATAFILE USAGE SPACE IN 11.2.0.1

 

Hdr: 9868375 11.2.0.1 STORAGE 11.2.0.1 PRODID-1366 PORTID-226 9548105
Abstract: EMCONSOLE SHOWS NEGATIVE VALUES WHEN VIEWING DATAFILE USAGE SPACE IN 11.2.0.1

PROBLEM:
--------
The display of the usage of the data files (Tablespaces -> Datafiles) shows 
negative values, so the fill rate is not shown correctly.

This is the same exact issue as explained in note 559414.1 but on top of 
11.2.0.1 and also the issue does not seem fixed when trying the workaround by 
clearing the recycle bin or by turning it off.

DIAGNOSTIC ANALYSIS:
--------------------
Running the statements from SQL*Plus gave the correct values. Bug 5549540 
should have fixed this in 11g, but the same issue still occurs.

WORKAROUND:
-----------
Run the statements and check the datafile usage from SQL*Plus.

 

11gR1 Grid Control Shows Duplicate Datafile Names Multiple times (Some With Negative Values) When Drilling Onto Tablespace Link

Applies to:
Enterprise Manager Grid Control - Version: 10.2.0.5 to 11.1.0.1 - Release: 10.2 to 11.1
Information in this document applies to any platform.
Symptoms
The list of datafiles displayed for a tablespace in EM Grid Control is wrong, so that some 
datafiles are listed repeatedly multiple times, and some of the sizes are wrong and showing a negative value.
This is seen from database home page => server tab => tablespaces.
Clicking on a tablespace name should return a list of the datafiles associated with that tablespace.
Cause
If the SQL statement that Grid Control uses is run manually in SQL*Plus, the same 
datafile can also be seen listed multiple times. 
This is the SQL statement that Grid Control launches to collect the list of datafiles:
SELECT d.file_name, TO_CHAR((d.bytes / 1024 / 1024), '99999990.000'), 
NVL(TO_CHAR(((d.bytes - s.bytes) / 1024 / 1024), '99999990.000'), TO_CHAR((d.bytes / 1024 / 1024), '99999990.000')), 
d.file_id, d.autoextensible, d.increment_by, d.maxblocks FROM sys.dba_data_files d, 
(SELECT file_id, SUM(bytes) bytes FROM sys.dba_free_space WHERE tablespace_name = '&tablespaceName' 
GROUP BY file_id) s WHERE (s.file_id (+)= d.file_id) AND (d.tablespace_name = '&tablespaceName')
This is the issue described in the following bug:
Bug 8499373 GRID CONTROL SHOWS SAME DATAFILE NAME REPEATEDLY WHEN DRILL INTO A TABLESPACE
This bug was actually closed as a duplicate of the database bug Bug 8570838 WRONG RESULTS FOR QUERY AGAINST 
DBA_DATA_FILES IN 10.2.0.4
The root cause of the database bug is :
Bug 7155655 Intermittent Wrong Results/Plan Instability from Join Predicate Push Down (JPPD)
Solution
Apply the Patch 7155655. 
Note :
Depending on platform it may be necessary to apply a merge patch for this - eg. in some cases it was 
necessary to apply Patch 7278117.
Hdr: 9748363 11.2.0.1 STORAGE 11.1.0.1 PRODID-1366 PORTID-226 9548105
Abstract: NEGATIVE VALUES REPORTED BY GRIDCONTROL FOR TABLESPACE WITH AT LEAST TWO DATAFILE
*** 05/25/10 09:46 am ***
TAR
---
3-1772640611
Problem Description
-------------------
The Used (MB) column on the datafiles summary webpage in the 11.2 Grid Control 
shows negative values for a tablespace containing at least two datafiles.
Diagnostic Analysis
-------------------
This is the same as bug 9548105, except that this one is for Grid Control rather than 
dbconsole.
SELECT d.file_name, TO_CHAR((d.bytes / 1024 / 1024), '99999990.000'),
NVL(TO_CHAR(((d.bytes - s.bytes) / 1024 / 1024), '99999990.000'),
TO_CHAR((d.bytes / 1024/ 1024), '99999990.000')), d.file_id, 
d.autoextensible,
d.increment_by, d.maxblocks FROM sys.dba_data_files d,  
(SELECT ts.name tablespace_name, SUM(e.length * ts.blocksize) bytes  
FROM sys.fet$ e, sys.ts$ ts  WHERE ts.ts# = e.ts# GROUP BY ts.name  
UNION ALL  SELECT ts.name tablespace_name, SUM(e.blocks * ts.blocksize) bytes 
FROM sys.dba_lmt_free_space e, sys.ts$ ts  WHERE ts.ts# = e.tablespace_id 
GROUP BY ts.name) s
WHERE (s.tablespace_name = d.tablespace_name) AND (d.tablespace_name = :1)
Reproducibility
---------------
yes
Available Workarounds
---------------------
none
Related Bugs
------------
9548105
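Comparing the two Grid Control queries explains the "at least two datafiles" condition. The query in bug 9748363 groups free space by tablespace and joins it to DBA_DATA_FILES on tablespace_name, so every datafile has the free space of the whole tablespace subtracted from its own size. The earlier query instead groups by file_id and joins on file_id. Below is a Python model with hypothetical sizes. It assumes a locally managed tablespace, so that only the dba_lmt_free_space branch of the UNION ALL returns rows and there is one free-space row per tablespace.

```python
# Hypothetical tablespace with two datafiles; sizes in MB, not from the bug report.
DATAFILES = [("file_a.dbf", 100), ("file_b.dbf", 100)]
FREE_BY_FILE = {"file_a.dbf": 40, "file_b.dbf": 70}


def used_join_on_file(datafiles, free_by_file):
    """Subtract only the free space of the same datafile
    (free space grouped by file_id and joined on file_id)."""
    return {name: size - free_by_file.get(name, 0) for name, size in datafiles}


def used_join_on_tablespace(datafiles, free_by_file):
    """Subtract the whole tablespace's free space from every datafile
    (free space grouped by tablespace and joined on tablespace_name)."""
    ts_free = sum(free_by_file.values())
    return {name: size - ts_free for name, size in datafiles}


print(used_join_on_file(DATAFILES, FREE_BY_FILE))        # {'file_a.dbf': 60, 'file_b.dbf': 30}
print(used_join_on_tablespace(DATAFILES, FREE_BY_FILE))  # {'file_a.dbf': -10, 'file_b.dbf': -10}
```

With a single datafile, the tablespace free space equals the file free space and the two functions agree. The negative values can only appear once a tablespace has two or more files.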

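Oracle internal error: diagnosing a case of ORA-00600[13013][5001]

On Friday I was called out to a client site to resolve an ORA-00600[13013] internal error on a 10.2.0.4 RAC database. A colleague had already looked into it over a remote dial-up connection that morning. The initial assessment was that the ORA-600 was caused by index corruption.

Initial diagnosis

Before I arrived, my colleague had already done a preliminary diagnosis. The alert log of the primary node of this 10.2.0.4 RAC system repeatedly showed internal errors such as ORA-00600:[13013], [5001], ORA-00600:[qertbFetchByRowID] and ORA-00600: [25027]. The log entries are: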

Fri Sep 16 01:16:54 2011
Errors in file /oravl01/oracle/admin/CRMDB2/udump/crmdb22_ora_2138788.trc:
ORA-00600: internal error code, arguments: [13013], [5001], [66209], [50730514], [23], [50730514], [3], []
Fri Sep 16 01:16:55 2011
Trace dumping is performing id=[cdmp_20110916011655]
Fri Sep 16 01:17:06 2011
Errors in file /oravl01/oracle/admin/CRMDB2/udump/crmdb22_ora_2138788.trc:
ORA-00600: internal error code, arguments: [13013], [5001], [66209], [50730514], [23], [50730514], [3], []
The following is the 13013 trace:
ORA-00600: internal error code, arguments: [13013], [5001], [66209], [50730514], [23], [50730514], [3], []
Current SQL statement for this session:
UPDATE CUST_SUBSCRB_PERSONAL A SET a.cust_mobile='A04204441' WHERE a.subscrbid=71524739
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst+001c          bl       ksedst1              088424844 ? 041124844 ?
ksedmp+0290          bl       ksedst               104A54870 ?
ksfdmp+0018          bl       03F30204
kgeriv+0108          bl       _ptrgl
kgesiv+0080          bl       kgeriv               1000D8FE4 ? 0785F7165 ?
10501ED28 ? FFFFFFFFFFF6748 ?
06C626E72 ?
ksesic6+0060         bl       kgesiv               110072D08 ? 7000000AC52B8F8 ?
7000000AC52B3A8 ? 07FFFFFFF ?
000000000 ?
updThreePhaseExe+0c  bl       01F9D7A8
4c
updexe+041c          bl       updThreePhaseExe     7000000AC52B8F8 ? 000000000 ?
110540128 ? FFFFFFFFFFF7C80 ?
opiexe+27d8          bl       updexe               7000000AC52B8F8 ?
FFFFFFFFFFF82B8 ?
kpoal8+0edc          bl       opiexe               FFFFFFFFFFFB434 ?
FFFFFFFFFFFB198 ?
FFFFFFFFFFF9608 ?
opiodr+0ae0          bl       _ptrgl
ttcpip+1020          bl       _ptrgl
opitsk+1124          bl       01F9F2A0
opiino+0990          bl       opitsk               000000000 ? 000000000 ?
============
Plan Table
============
------------------------------------------------------+-----------------------------------+
| Id  | Operation           | Name                    | Rows  | Bytes | Cost  | Time      |
------------------------------------------------------+-----------------------------------+
| 0   | UPDATE STATEMENT    |                         |       |       |     1 |           |
| 1   |  UPDATE             | CUST_SUBSCRB_PERSONAL   |       |       |       |           |
| 2   |   INDEX UNIQUE SCAN | CUST_SUBSCRB_PERSONAL_PK|     1 |    33 |     1 |  00:00:01 |
------------------------------------------------------+-----------------------------------+
===============================================================================================================
ORA-00600:[qertbFetchByRowID] also appeared alongside it:
Fri Sep 16 01:08:57 2011
Errors in file /oravl01/oracle/admin/CRMDB2/udump/crmdb22_ora_2138788.trc:
ORA-00600: internal error code, arguments: [qertbFetchByRowID], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [qertbFetchByRowID], [], [], [], [], [], [], []
Current SQL statement for this session:
SELECT * from CUST_SUBSCRB_PERSONAL A WHERE a.subscrbid=307557025
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst+001c          bl       ksedst1              578318D500000003 ?
FFFFFFFFFFF80D0 ?
ksedmp+0290          bl       ksedst               104A54870 ?
ksfdmp+0018          bl       03F30204
kgerinv+00dc         bl       _ptrgl
kgeasnmierr+004c     bl       kgerinv              11048AE68 ? 000000000 ?
000000001 ? 104BDEC18 ?
110489398 ?
qertbFetchByRowID+0  bl       03F2EF38
d34
opifch2+141c         bl       03F2EB1C
opifch+003c          bl       opifch2              1100DD338 ? 000000000 ?
FFFFFFFFFFF9980 ?
============
Plan Table
============
---------------------------------------------------------------+-----------------------------------+
| Id  | Operation                    | Name                    | Rows  | Bytes | Cost  | Time      |
---------------------------------------------------------------+-----------------------------------+
| 0   | SELECT STATEMENT             |                         |       |       |     1 |           |
| 1   |  TABLE ACCESS BY INDEX ROWID | CUST_SUBSCRB_PERSONAL   |     1 |   287 |     1 |  00:00:01 |
| 2   |   INDEX UNIQUE SCAN          | CUST_SUBSCRB_PERSONAL_PK|     1 |       |     1 |  00:00:01 |
---------------------------------------------------------------+-----------------------------------+
===============================================================================================================
In addition, ORA-00600: [25027], [6], [1443670797], [], [], [], [], [] also occurred:
Fri Sep 16 15:06:00 2011
Errors in file /oravl01/oracle/admin/CRMDB2/udump/crmdb22_ora_1872436.trc:
ORA-00600: internal error code, arguments: [25027], [6], [1443670797], [], [], [], [], []
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [25027], [6], [4278267905], [], [], [], [], []
Current SQL statement for this session:
select q.subscrbid,q.serv_lvl,q.cust_lvl
from cust_subscrb_personal q
where q.serv_lvl is null
or q.cust_lvl is null
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst+001c          bl       ksedst1              088424844 ? 041124844 ?
ksedmp+0290          bl       ksedst               104A54870 ?
ksfdmp+0018          bl       03F30204
kgeriv+0108          bl       _ptrgl
kgesiv+0080          bl       kgeriv               000000000 ? 000000000 ?
000000000 ? 105673724 ?
00000000C ?
ksesic2+0060         bl       kgesiv               000000245 ? 0FFFFFFFF ?
700000010013BD0 ?
700000010018078 ? 110000998 ?
krtd2abh+040c        bl       ksesic2              61C3000061C3 ? 000000000 ?
000000006 ? 000000000 ?
0FF013001 ? 000000FA0 ?
105673FB8 ? 105673FA8 ?
kcbgtcr+24a0         bl       krtd2abh             700000471D95738 ?
7000001A9F0F050 ?
FFFFFFFFFFF78E0 ?
700000471D93500 ? 000000003 ?
ktrget+04ac          bl       kcbgtcr              1105A75A8 ? 000002000 ?
000000000 ? 000001940 ?
kdsgrp+00bc          bl       ktrget               000000064 ? 110000998 ?
700000010008000 ?
kdsgnp+0444          bl       kdsgrp               000000000 ? 000000000 ?
FFFFFFFFFFF80D0 ?
kafger+08fc          bl       kdsgnp               000000000 ? 000000000 ?
111119E38 ?
kdstf1100101km+0f58  bl       kafger               1105A74D8 ? 111119E38 ?
FFFFFFFFFFF8190 ? 000000000 ?
70000046DCA8488 ? 000000000 ?
0FFFF9608 ? 110471948 ?
kdsttgr+1a04         bl       kdstf1100101km       111119E38 ? 0000102A1 ?
70000042DF6DC50 ?
FFFFFFFFFFF8418 ? 0FFFFFFFF ?
000001FE8 ? 000000000 ?
000000000 ?
qertbFetch+09b8      bl       kdsttgr              111119E38 ? 000000418 ?
000000000 ? 000000000 ?
000000000 ? 000000000 ?
000000418 ? BE4610195B2C ?
opifch2+141c         bl       03F2EB1C
opifch+003c          bl       opifch2              1100DD338 ? 000000000 ?
FFFFFFFFFFF9980 ?
============
Plan Table
============
--------------------------------------------------+-----------------------------------+
| Id  | Operation          | Name                 | Rows  | Bytes | Cost  | Time      |
--------------------------------------------------+-----------------------------------+
| 0   | SELECT STATEMENT   |                      |       |       |  4898 |           |
| 1   |  TABLE ACCESS FULL | CUST_SUBSCRB_PERSONAL|    15 |   165 |  4898 |  00:00:59 |
--------------------------------------------------+-----------------------------------+

Let us start with the ORA-00600:[13013], [5001] internal error. MOS note "How to resolve ORA-00600 [13013], [5001]" [ID 816784.1] describes this ORA-600 in some detail. Its arguments mean:

Arg [a] Passcount
Arg [b] Data Object number
Arg [c] Tablespace Relative DBA of block containing the row to be updated
Arg [d] Row Slot number
Arg [e] Relative DBA of block being updated (should be same as [c])
Arg [f] Code

Here, [13013], [5001], [66209], [50730514], [23], [50730514], [3], [] therefore means:

data_object_id = 66209 = 0x102A1
RDBA = 50730514 = 0x3061612, i.e. file id = 12, block id = 398866
Row slot number = 23
Code = 3
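The decoding above can be reproduced with a small Python sketch. It assumes a smallfile tablespace, where a relative DBA keeps the relative file number in its top 10 bits and the block number in its low 22 bits. Inside the database, DBMS_UTILITY.DATA_BLOCK_ADDRESS_FILE and DBMS_UTILITY.DATA_BLOCK_ADDRESS_BLOCK perform the same split. The field names in FIELDS are hypothetical labels for the arguments listed in note 816784.1.

```python
# Hypothetical labels for ORA-600 [13013] arguments [a]..[f] (after the 13013 itself).
FIELDS = ["passcount", "data_object_id", "row_rdba", "row_slot",
          "updated_rdba", "code"]


def split_rdba(rdba: int) -> tuple:
    """Split a relative DBA into (relative file#, block#).

    Assumption: smallfile layout, file# in the top 10 bits, block# in the low 22.
    """
    return rdba >> 22, rdba & 0x3FFFFF


def decode_13013(args: list) -> dict:
    """Name the arguments and decode the object id and RDBA."""
    info = dict(zip(FIELDS, args))
    info["data_object_id_hex"] = hex(info["data_object_id"])
    info["rdba_hex"] = hex(info["row_rdba"])
    info["file"], info["block"] = split_rdba(info["row_rdba"])
    return info


info = decode_13013([5001, 66209, 50730514, 23, 50730514, 3])
print(info["data_object_id_hex"], info["rdba_hex"], info["file"], info["block"])
# 0x102a1 0x3061612 12 398866
```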

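In other words, Oracle found an inconsistency while performing the check with code 3 on row 23 of block 398866 in datafile 12, for the object whose data_object_id is 66209. The data_object_id identifies the object as CUST_SUBSCRB_PERSONAL, the table referenced by the SQL statement being executed.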

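ORA-00600:[13013], [5001] can be caused by logical corruption of a data row in the table or of an entry in an index. The point to stress is that this is logical corruption, not physical damage. Logical corruption is usually caused by an Oracle bug or by memory corruption. This ORA-00600:[13013], [5001] is known to be triggered by several bugs, including 5085288 and 4549673, and these bugs are hit relatively often on 10.2.0.4.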

The ORA-600 [qertbFetchByRowID] that accompanies ORA-00600:[13013], [5001] is often caused by index corruption. The following note describes this problem:

ORA-600 [qertbFetchByRowID] Select Queries on 10.2.0.4 Database [ID 755592.1]
Applies to:
Oracle Server - Enterprise Edition - Version: 10.2.0.4 and later   [Release: 10.2 and later ]
Symptoms
The following error was received during SELECT operation:
ORA-00600: internal error code, arguments: [qertbFetchByRowID], [], [], [], [], [], [], []
The call stack contains:
qertbFetchByRowID <- qergiFetch <- rwsfcd <- qerhjFetch
< - qerjotFetch <- rwsfcd <- qertqoFetch <- qerpxSlaveFetch <- qerpxFetch
Cause
Index corruption.
Solution
1.
Analyze the affected object to look for logical corruption.
SQL> analyze table TABLENAME validate structure cascade;
This will place an exclusive lock on the table.
If an error is reported try dropping and recreating the index(es) first.
2.
If this problem is occurring on a version less than 10.2.0.4, ensure the patch for
Bug 4883635 has been applied if Materialized Views are used.
BUG:5866783 - ORA-00600: [QERTBFETCHBYROWID] ON SELECT FROM ONE TABLE

ORA-00600: [25027] is a similar problem:

ORA-600 [25027] [ID 284433.1]
PURPOSE:
This article represents a partially published OERI note.
It has been published because the ORA-600 error has been
reported in at least one confirmed bug.
Therefore, the SUGGESTIONS section of this article may help
in terms of identifying the cause of the error.
This specific ORA-600 error may be considered for full publication
at a later date. If/when fully published, additional information
will be available here on the nature of this error.
ERROR:
Format: ORA-600 [25027] [a] [b]
VERSIONS:
versions 9.2 and above
ARGUMENTS:
Arg [a]  Tablespace Number (TSN)
Arg [b]  Decimal Relative Data Block Address (RDBA)
SUGGESTIONS:
1. If the Arg [b] (the RDBA) is 0 (zero), then this could be due to fake indexes.
The following query will list fake indexes:
select do.owner,do.object_name, do.object_type,sysind.flags
from dba_objects do, sys.ind$ sysind
where do.object_id = sysind.obj#
and bitand(sysind.flags,4096)=4096;
If the above query returns any rows, check the objects involved and consider
dropping them as they can cause this error. 
2. Run analyze table validate structure on the table referenced in the Current SQL statement in
the related trace file.
If the Known Issues section below does not help in terms of identifying
a solution, please submit the trace files and alert.log to Oracle
Support Services for further analysis.

Note, however, that in this case the Arg [b] (Decimal Relative Data Block Address, RDBA) reported by ORA-00600: [25027] is a completely unrelated RDBA. The reason is not yet clear.
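Decoding these RDBAs with the usual smallfile rule shows how far they are from file 12, block 398866. Under that rule, the file number sits in the top 10 bits and the block number in the low 22 bits. The sketch below applies this assumption. Note that 4278267905 is 0xFF013001, and that value also appears among the (dubious) argument values of the krtd2abh frame in the 25027 call stack.

```python
def split_rdba(rdba: int) -> tuple:
    """(relative file#, block#), assuming the smallfile 10/22-bit layout."""
    return rdba >> 22, rdba & 0x3FFFFF


SAMPLES = [
    ("ORA-600 [13013] arg [c]", 50730514),
    ("ORA-600 [25027] alert", 1443670797),
    ("ORA-600 [25027] trace", 4278267905),
]

for label, rdba in SAMPLES:
    file_no, block_no = split_rdba(rdba)
    print(f"{label:<26} rdba={rdba:<11} hex={rdba:#010x} file={file_no} block={block_no}")
# file/block pairs: (12, 398866), (344, 830221), (1020, 77825)
```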

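An interlude

There was a small interlude along the way. The client's application staff urgently needed to run a query against the CUST_SUBSCRB_PERSONAL table, but every time the statement ran it hit ORA-600 [qertbFetchByRowID]. It was Friday, so if the query could not be run that day it would have to wait until the following week.

So I was asked to solve this statement's execution problem first. I started by looking at its execution plan: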

SQL> explain plan for select count(*) from SHUCRM2C.cust_subscrb_personal a,  SHUCRM2C.wzy_bs20110916 b
2  where a.svcnum=b.svcnum and a.countyid='A00' and a.serv_lvl=0;
Explained.
SQL> select * from table(dbms_xplan.display());
PLAN_TABLE_OUTPUT
------------------------------------------------
Plan hash value: 3616548176
-----------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name                      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                           |     1 |    31 |   231   (1)| 00:00:03 |
|   1 |  SORT AGGREGATE               |                           |     1 |    31 |            |          |
|*  2 |   HASH JOIN                   |                           |   425 | 13175 |   231   (1)| 00:00:03 |
|*  3 |    TABLE ACCESS BY INDEX ROWID| CUST_SUBSCRB_PERSONAL     |   425 |  8075 |   208   (0)| 00:00:03 |
|*  4 |     INDEX RANGE SCAN          | CUST_SUBSCRB_PERSONAL_2IX | 42462 |       |    20   (0)| 00:00:01 |
|   5 |    TABLE ACCESS FULL          | WZY_BS20110916            | 39792 |   466K|    22   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("A"."SVCNUM"="B"."SVCNUM")
3 - filter(TO_NUMBER("A"."SERV_LVL")=0)
4 - access("A"."COUNTYID"='A00')
Note
-----
- dynamic sampling used for this statement
23 rows selected.
SQL> select count(*) from SHUCRM2C.cust_subscrb_personal a,  SHUCRM2C.wzy_bs20110916 b
2  where a.svcnum=b.svcnum and a.countyid='A00' and a.serv_lvl=0;
select count(*) from SHUCRM2C.cust_subscrb_personal a,  SHUCRM2C.wzy_bs20110916 b
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [qertbFetchByRowID], [], [], [], [], [], [], []

In the plan above, Oracle range-scans the index CUST_SUBSCRB_PERSONAL_2IX and then uses the returned ROWIDs to fetch rows from the table (qertbFetchByRowID). The ORA-600 was raised during the actual fetch by ROWID.

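Because the request was urgent, I considered whether the plan could be made to bypass this index. As a probe, I added a RULE hint to the statement and ran it again: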

SQL> select /*+ rule */
2  count(*) from SHUCRM2C.cust_subscrb_personal a,  SHUCRM2C.wzy_bs20110916 b
3  where a.svcnum=b.svcnum and a.countyid='A00' and a.serv_lvl=0;
COUNT(*)
----------
11559
Execution Plan
----------------------------------------------------------
Plan hash value: 2851452146
-------------------------------------------------------------------
| Id  | Operation                     | Name                      |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                           |
|   1 |  SORT AGGREGATE               |                           |
|   2 |   NESTED LOOPS                |                           |
|   3 |    TABLE ACCESS FULL          | WZY_BS20110916            |
|*  4 |    TABLE ACCESS BY INDEX ROWID| CUST_SUBSCRB_PERSONAL     |
|   5 |     AND-EQUAL                 |                           |
|*  6 |      INDEX RANGE SCAN         | CUST_SUBSCRB_SVCNUM_2006  |
|*  7 |      INDEX RANGE SCAN         | CUST_SUBSCRB_PERSONAL_2IX |
-------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - filter(TO_NUMBER("A"."SERV_LVL")=0)
6 - access("A"."SVCNUM"="B"."SVCNUM")
7 - access("A"."COUNTYID"='A00')
Note
-----
- rule based optimizer used (consider using cbo)

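Unexpectedly, it succeeded, even though the plan still contained the CUST_SUBSCRB_PERSONAL_2IX index. This made me suspect that the logical corruption was more likely in the table than in the index.

In any case, this at least solved the immediate problem. I passed the workaround on to the application team and then went back to the diagnosis.

Preliminary index verification

To determine whether the logical corruption was in the table or in an index, one would normally use ANALYZE TABLE ... VALIDATE STRUCTURE CASCADE to verify the structure of the table and its indexes. However, this command locks the table in share mode (TM mode=4), which is unacceptable for a frequently updated key application table on a production system. My colleague had already considered this when he took over the case, so he proposed using queries to check which indexes were corrupt, as follows:

Construct the ROWID with ROWID_CREATE: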
SQL> select dbms_rowid.ROWID_CREATE(1,66209,12,398866,23) from dual;
DBMS_ROWID.ROWID_C
------------------
AAAQKhAAMAABhYSAAX
SQL> select * from SHUCRM2O.CUST_SUBSCRB_PERSONAL a where rowid=chartorowid('AAAQKhAAMAABhYSAAX');
no rows selected
SQL> SELECT INDEX_NAME,COLUMN_NAME FROM DBA_IND_COLUMNS WHERE TABLE_NAME='CUST_SUBSCRB_PERSONAL' ORDER BY 1;
INDEX_NAME                     COLUMN_NAME
------------------------------ ----------------------------------------
CUST_SUBSCRB_CARD_NUM          VIP_CARD_NUM
CUST_SUBSCRB_MANAGERID_2006    MANAGERID
CUST_SUBSCRB_PERSONAL_2IX      COUNTYID
CUST_SUBSCRB_PERSONAL_3IX      CUST_LVL
CUST_SUBSCRB_PERSONAL_PK       SUBSCRBID
CUST_SUBSCRB_SERV_COUNTY       SERV_COUNTYID
CUST_SUBSCRB_SVCNUM_2006       SVCNUM
IDX_CUST_SUBSCRB_PERSONAL_01   SERV_LVL
Then force each index in turn with an INDEX hint:
SQL> select /*+ INDEX(a CUST_SUBSCRB_CARD_NUM ) */ * from SHUCRM2O.CUST_SUBSCRB_PERSONAL a
where rowid=chartorowid('AAAQKhAAMAABhYSAAX');
no rows selected
SQL> select /*+ INDEX(a CUST_SUBSCRB_MANAGERID_2006) */ * from SHUCRM2O.CUST_SUBSCRB_PERSONAL a
where rowid=chartorowid('AAAQKhAAMAABhYSAAX');
no rows selected
SQL> select /*+ INDEX(a CUST_SUBSCRB_PERSONAL_2IX) */ * from SHUCRM2O.CUST_SUBSCRB_PERSONAL a
where rowid=chartorowid('AAAQKhAAMAABhYSAAX');
select /*+ INDEX(a CUST_SUBSCRB_PERSONAL_2IX) */ * from SHUCRM2O.CUST_SUBSCRB_PERSONAL a
where rowid=chartorowid('AAAQKhAAMAABhYSAAX')
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [qertbFetchByRowID], [], [], [], [], [], [], []
SQL> select /*+ INDEX(a CUST_SUBSCRB_PERSONAL_3IX) */ * from SHUCRM2O.CUST_SUBSCRB_PERSONAL a
where rowid=chartorowid('AAAQKhAAMAABhYSAAX');
no rows selected
SQL> select /*+ INDEX(a CUST_SUBSCRB_PERSONAL_PK ) */ * from SHUCRM2O.CUST_SUBSCRB_PERSONAL a
where rowid=chartorowid('AAAQKhAAMAABhYSAAX');
select /*+ INDEX(a CUST_SUBSCRB_PERSONAL_PK ) */ * from SHUCRM2O.CUST_SUBSCRB_PERSONAL a
where rowid=chartorowid('AAAQKhAAMAABhYSAAX')
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [qertbFetchByRowID], [], [], [], [], [], [], []
SQL> select /*+ INDEX(a CUST_SUBSCRB_SERV_COUNTY) */ * from SHUCRM2O.CUST_SUBSCRB_PERSONAL a
where rowid=chartorowid('AAAQKhAAMAABhYSAAX');
no rows selected
SQL> select /*+ INDEX(a CUST_SUBSCRB_SVCNUM_2006) */ * from SHUCRM2O.CUST_SUBSCRB_PERSONAL a
where rowid=chartorowid('AAAQKhAAMAABhYSAAX');
select /*+ INDEX(a CUST_SUBSCRB_SVCNUM_2006) */ * from SHUCRM2O.CUST_SUBSCRB_PERSONAL a
where rowid=chartorowid('AAAQKhAAMAABhYSAAX')
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [qertbFetchByRowID], [], [], [], [], [], [], []
SQL> select /*+ INDEX(a IDX_CUST_SUBSCRB_PERSONAL_01) */ * from SHUCRM2O.CUST_SUBSCRB_PERSONAL a
where rowid=chartorowid('AAAQKhAAMAABhYSAAX');
select /*+ INDEX(a IDX_CUST_SUBSCRB_PERSONAL_01) */ * from SHUCRM2O.CUST_SUBSCRB_PERSONAL a
where rowid=chartorowid('AAAQKhAAMAABhYSAAX')
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [qertbFetchByRowID], [], [], [], [], [], [], []

As shown, this method hit ORA-00600:[qertbFetchByRowID] on four of the indexes. But it still cannot settle whether the logical corruption lies in the table or in an index.
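The call ROWID_CREATE(1, 66209, 12, 398866, 23) builds an extended ROWID (rowid_type 1) in the 18-character form OOOOOOFFFBBBBBBRRR. That is 6 characters of data object number, 3 of relative file number, 6 of block number and 3 of row slot. Each part is written in base 64 with the alphabet A-Z, a-z, 0-9, +, /. Below is a minimal Python sketch of that encoding, which can be used to check by hand that AAAQKhAAMAABhYSAAX really points at slot 23 of the block reported by ORA-600 [13013].

```python
# Base-64 alphabet used by extended ROWIDs.
B64 = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
       "abcdefghijklmnopqrstuvwxyz"
       "0123456789+/")


def to_b64(value: int, width: int) -> str:
    """Encode value as a fixed-width base-64 number, most significant digit first."""
    chars = []
    for _ in range(width):
        chars.append(B64[value % 64])
        value //= 64
    if value:
        raise ValueError("value does not fit in the requested width")
    return "".join(reversed(chars))


def extended_rowid(data_object_id: int, rel_fno: int, block: int, row: int) -> str:
    """OOOOOO FFF BBBBBB RRR: object, relative file, block, row slot."""
    return (to_b64(data_object_id, 6) + to_b64(rel_fno, 3)
            + to_b64(block, 6) + to_b64(row, 3))


print(extended_rowid(66209, 12, 398866, 23))  # AAAQKhAAMAABhYSAAX
```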

 

The necessary tool

Note that VALIDATE STRUCTURE also has an ONLINE option. In online mode the validation takes no TM lock on the table at all, which can be confirmed with event 10704:

 

10704, 00000, "Print out information about what enqueues are being obtained"
// *Cause:  When enabled, prints out arguments to calls to ksqcmi and
//          ksqlrl and the return values.
// *Action: Level indicates details:
//   Level: 1-4: print out basic info for ksqlrl, ksqcmi
//          5-9: also print out stuff in callbacks:  ksqlac, ksqlop
//          10+: also print out time for each line
SQL> create table validate_me (t1 int);
Table created.
SQL> create index ind_validate_me on validate_me(t1);
Index created.
SQL> insert into validate_me select rownum  from dba_tables where rownum<201;
200 rows created.
SQL> commit;
Commit complete.
SQL> oradebug setmypid;
Statement processed.
SQL> oradebug event 10704 trace name context forever,level 10;
Statement processed.
SQL> analyze table validate_me validate structure cascade online;
Table analyzed.
SQL> oradebug tracefile_name;
c:\app\diag\rdbms\g11r2\g11r2\trace\g11r2_ora_3020.trc
g11r2_ora_3020.trc:
=========================================================================
*** 2011-09-18 20:55:25.373
Oradebug command 'event 10704 trace name context forever,level 10' console output: <none>
*** 2011-09-18 20:55:49.765
ksqgtl *** TX-00060005-000006a9 mode=6 flags=0x401 timeout=0 ***
ksqgtl: xcb=0x31434194, ktcdix=2147483647, topxcb=0x31434194
ktcipt(topxcb)=0x0
*** 2011-09-18 20:55:49.766
ksucti: init txn DID from session DID
ksqgtl:
ksqlkdid: 0001-001B-00000006
*** 2011-09-18 20:55:49.766
*** ksudidTrace: ksqgtl
ktcmydid(): 0001-001B-00000006
ksusesdi:   0000-0000-00000000
ksusetxn:   0001-001B-00000006
ksqgtl: RETURNS 0
*** 2011-09-18 20:55:49.766
ksqrcl: TX,60005,6a9
ksqrcl: returns 0
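To check the trace mechanically instead of by eye, a small Python sketch can list every enqueue requested through ksqgtl. It assumes the ksqgtl line format shown above: a two-letter enqueue type, two hex ids and the requested mode. The script name enq10704.py and the function names are hypothetical. On the excerpt above it finds only the TX enqueue in mode 6 and no TM enqueue.

```python
import re
import sys

# Matches ksqgtl lines such as
#   ksqgtl *** TX-00060005-000006a9 mode=6 flags=0x401 timeout=0 ***
# groups: enqueue type, id1, id2, requested mode
KSQGTL_RE = re.compile(r"ksqgtl \*\*\* ([A-Z0-9]{2})-(\w+)-(\w+) mode=(\d+)")


def enqueues_requested(trace_text: str) -> list[tuple[str, int]]:
    """Return (enqueue type, requested mode) for every ksqgtl call in a 10704 trace."""
    return [(m.group(1), int(m.group(4))) for m in KSQGTL_RE.finditer(trace_text)]


def took_tm_lock(trace_text: str) -> bool:
    """True if any TM (table DML) enqueue was requested."""
    return any(etype == "TM" for etype, _ in enqueues_requested(trace_text))


if __name__ == "__main__":
    # Usage: python enq10704.py TRACEFILE [TRACEFILE ...]
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as fh:
            text = fh.read()
        print(path, enqueues_requested(text), "TM requested:", took_tm_lock(text))
```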

validate structure online does have a drawback: in online mode, the structure validation command does not populate the index_stats view with the index statistics. For example:

SQL> analyze index  ind_validate_me validate structure  ;
Index analyzed.
SQL> select count(*) from index_stats;
COUNT(*)
----------
1
SQL> conn / as sysdba
Connected.
SQL> analyze index  ind_validate_me validate structure  online;
Index analyzed.
SQL> select count(*) from index_stats;
COUNT(*)
----------
0

Here we only need validate structure for its structural check and are not interested in the detailed index statistics, so we can take full advantage of online mode.

 

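Verification

Use validate structure online to check the problem table and the indexes on it.

First validate only the table, to determine whether the logical corruption is in the table or in an index: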
SQL> analyze table SHUCRM2O.CUST_SUBSCRB_PERSONAL  validate structure online;
analyze table SHUCRM2O.CUST_SUBSCRB_PERSONAL  validate structure online
*
ERROR at line 1:
ORA-01498: block check failure - see trace file
If the validation finds a problem, ORA-01498 is raised and a trace file is written:
kdrchk:  row is marked as both a Key and Clustered
prow=0x7000001f241c45c flag=0xff
Block Checking: DBA = 50730514, Block Type = KTB-managed data block
data header at 0x7000001f241c07c
kdbchk: bad row tab 0, slot 23
Block header dump:  0x03061612
Object id on Block? Y
seg/obj: 0x102a1  csc: 0xb43.ecde68ca  itc: 3  flg: E  typ: 1 - DATA
brn: 0  bdba: 0x3061609 ver: 0x01 opc: 0
inc: 0  exflg: 0
............
We can see that the problem is in slot 23:
tab 0, row 23, @0x3e0
tl: 4 fb: KCHDFLPN lb: 0x1  cc: 0 cki: 1
tab 0, row 24, @0x3de
tl: 2 fb: --HD---N lb: 0x30
tab 0, row 25, @0x3dc
tl: 2 fb: --HD---- lb: 0x30
tab 0, row 26, @0x3d8
tl: 4 fb: KCHDFLPN lb: 0xff  cc: 0 cki: 255

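Surprisingly, the flag of the row piece in slot 23 is KCHDFLPN, i.e. fb = 0xff. The row piece is therefore marked as both a cluster key and a clustered row ("row is marked as both a Key and Clustered"). As a result, ORA-00600 is raised whenever a server process tries to update this row or to access it by ROWID. The error arguments differ, but both errors are caused by the corrupt flag of the row in slot 23 of this data block.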
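The fb byte can be decoded bit by bit. The Python sketch below assumes the usual meaning of the eight flag letters printed in block dumps:

- K: cluster key, 0x80
- C: cluster member, 0x40
- H: head piece, 0x20
- D: deleted, 0x10
- F: first piece, 0x08
- L: last piece, 0x04
- P: first column continues from the previous piece, 0x02
- N: last column continues in the next piece, 0x01

The helper names are hypothetical. The K-and-C test simply restates the condition named in the kdrchk message; it is not Oracle's actual check.

```python
# Row-piece flag bits in the order block dumps print them, highest bit first.
FLAG_BITS = [
    ("K", 0x80, "cluster key"),
    ("C", 0x40, "cluster member"),
    ("H", 0x20, "head of row piece"),
    ("D", 0x10, "deleted row"),
    ("F", 0x08, "first data piece"),
    ("L", 0x04, "last data piece"),
    ("P", 0x02, "first column continues from previous piece"),
    ("N", 0x01, "last column continues in next piece"),
]


def fb_string(flag: int) -> str:
    """Render a flag byte the way block dumps do, e.g. 0x2c -> '--H-FL--'."""
    return "".join(letter if flag & bit else "-" for letter, bit, _ in FLAG_BITS)


def fb_value(text: str) -> int:
    """Inverse of fb_string, e.g. '--HD---N' -> 0x31."""
    if len(text) != 8:
        raise ValueError("flag strings are 8 characters long")
    value = 0
    for ch, (letter, bit, _) in zip(text, FLAG_BITS):
        if ch == letter:
            value |= bit
        elif ch != "-":
            raise ValueError(f"unexpected flag character {ch!r}")
    return value


def is_key_and_clustered(flag: int) -> bool:
    """True when both K and C are set on the same row piece."""
    return bool(flag & 0x80) and bool(flag & 0x40)


if __name__ == "__main__":
    # Flags of slots 23-26 from the dump above.
    for slot, fb in [(23, "KCHDFLPN"), (24, "--HD---N"), (25, "--HD----"), (26, "KCHDFLPN")]:
        value = fb_value(fb)
        print(slot, fb, hex(value), "K+C set" if is_key_and_clustered(value) else "")
```

For comparison, an ordinary single-piece heap row normally prints as --H-FL-- (0x2c). Slot 23, with every bit set (0xff), is not a plausible value for a heap table row.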

For the causes of and fixes for ORA-00600:[13013], [5001], see the article "Manually simulating an Oracle data block logical corruption that triggers ORA-00600:[13013], [5001]".

 

To be revised!

Upgrade GI/CRS 11.1.0.7 to 11.2.0.2. Rootupgrade.sh Hanging


We installed the 11gR2 GI software and applied the PSU2 patches when prompted to run rootupgrade.sh. rootupgrade.sh hangs on the first node.

[root@vrh8 client]# uname -a
Linux vrh8 2.6.18-238.5.1.el5 #1 SMP Mon Feb 21 05:52:39 EST 2011 x86_64 x86_64 x86_64 GNU/Linux
cluvfy passed with 2 ignorable errors:

[root@vrh8 vrh8]# cd /tmp
[root@vrh8 tmp]# df -lh .
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg0-tmp 992M 263M 679M 28% /tmp

[root@vrh8 grid]# grep fail cluvfy_during_inst.log
/tmp l118464lwap1049 /tmp 713MB 1GB failed
Result: Free disk space check failed for "l118464lwap1049:/tmp"
/tmp vrh8 /tmp 692.131MB 1GB failed
Result: Free disk space check failed for "vrh8:/tmp"
Result: Check for multiple users with UID value 0 failed
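grep fail is a quick filter, but it matches any line containing "fail" anywhere. A slightly more selective Python sketch keeps only the lines that report one of the following:

- a failed or unsuccessful check
- an inconsistency
- a PRVF/PRVG message

The patterns are an assumption based on the cluvfy wording in this article and are not an exhaustive list of CVU messages. The function name is hypothetical.

```python
import re
import sys

# Wording taken from the cluvfy output in this article; not an exhaustive list.
PROBLEM_RE = re.compile(r"\bfailed\b|unsuccessful|not consistent|^PRV[FG]-\d+", re.IGNORECASE)


def cluvfy_problems(output: str) -> list[str]:
    """Return the cluvfy lines that report a failed check or a PRVF/PRVG message."""
    return [line.strip() for line in output.splitlines() if PROBLEM_RE.search(line.strip())]


if __name__ == "__main__":
    # Usage: python cluvfy_problems.py LOGFILE [LOGFILE ...]
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as fh:
            for line in cluvfy_problems(fh.read()):
                print(f"{path}: {line}")
```

This filter does not capture node names listed on the lines after a "Check failed on nodes:" line.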

We followed section 1A of "How to Proceed from Failed Upgrade to 11gR2 Grid Infrastructure on Linux/Unix [ID 969254.1]", but it didn't help.

[root@vrh8 bin]# ./crsctl query crs activeversion
Oracle Clusterware active version on the cluster is [11.1.0.7.0]

rootupgrade.sh output:

[root@vrh8 11.2.0.2]# ./rootupgrade.sh
Running Oracle 11g root script...

The following environment variables are set as:
ORACLE_OWNER= oracrs
ORACLE_HOME= /d22/oracrs/11.2.0.2

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The contents of "oraenv" have not changed. No need to overwrite.
The contents of "coraenv" have not changed. No need to overwrite.

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /d22/oracrs/11.2.0.2/crs/install/crsconfig_params
LOCAL ADD MODE
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
OLR initialization - successful
Adding daemon to inittab
ACFS-9200: Supported
ACFS-9300: ADVM/ACFS distribution files found.
ACFS-9312: Existing ADVM/ACFS installation detected.
ACFS-9314: Removing previous ADVM/ACFS installation.
ACFS-9315: Previous ADVM/ACFS components successfully removed.
ACFS-9307: Installing requested ADVM/ACFS software.
ACFS-9308: Loading installed ADVM/ACFS drivers.
ACFS-9321: Creating udev for ADVM/ACFS.
ACFS-9323: Creating module dependencies - this may take some time.
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9309: ADVM/ACFS installation correctness verified.

****hanging here for more than 2 hrs, so we cancelled it

INT at /d22/oracrs/11.2.0.2/crs/install/crsconfig_lib.pm line 1173.
/d22/oracrs/11.2.0.2/perl/bin/perl -I/d22/oracrs/11.2.0.2/perl/lib -I/d22/oracrs/11.2.0.2/crs/install /d22/oracrs/11.2.0.2/crs/install/rootcrs.pl execution failed
Oracle root script execution aborted!

1. The below logs are required to analyze this issue.

NEW_GRID_HOME/cfgtoollogs/crsconfig/*.*
NEW_GRID_HOME/log/<nodename>/*.*

Please upload the logs under the above directories. Zip and upload the files including the subdirectories.

2. While rootupgrade.sh was hanging, did you check the usage of /tmp? Was free space running out?

=== ODM Research ===

There have been multiple root script runs for the upgrade. I have taken the first incident from the file
rootcrs_vrh8.log:
-----------------------------------------

2011-02-13 13:07:55: Successfully started requested Oracle stack daemons
2011-02-13 13:07:55: Upgrading the existing voting disks!
2011-02-13 13:07:55: Executing /d22/oracrs/11.2.0.2/bin/cssvfupgd
2011-02-13 13:07:55: Executing cmd: /d22/oracrs/11.2.0.2/bin/cssvfupgd <<<<<<<<<<<<<<< The root script seems to hang at this point.
2011-02-13 15:01:16: ###### Begin DIE Stack Trace ######
2011-02-13 15:01:16: Package File Line Calling
2011-02-13 15:01:16: --------------- -------------------- ---- ----------
2011-02-13 15:01:16: 1: main rootcrs.pl 325 crsconfig_lib::dietrap
2011-02-13 15:01:16: 2: crsconfig_lib crsconfig_lib.pm 9301 main::__ANON__
2011-02-13 15:01:16: 3: crsconfig_lib crsconfig_lib.pm 9301 (eval)
2011-02-13 15:01:16: 4: crsconfig_lib crsconfig_lib.pm 9260 crsconfig_lib::system_cmd_capture1
2011-02-13 15:01:16: 5: crsconfig_lib crsconfig_lib.pm 9247 crsconfig_lib::system_cmd_capture
2011-02-13 15:01:16: 6: crsconfig_lib crsconfig_lib.pm 924 crsconfig_lib::system_cmd
2011-02-13 15:01:16: 7: oracss oracss.pm 275 crsconfig_lib::run_crs_cmd
2011-02-13 15:01:16: 8: crsconfig_lib crsconfig_lib.pm 1019 oracss::CSS_upgrade
2011-02-13 15:01:16: 9: crsconfig_lib crsconfig_lib.pm 1006 crsconfig_lib::start_cluster
2011-02-13 15:01:16: 10: main rootcrs.pl 697 crsconfig_lib::perform_start_cluster
2011-02-13 15:01:16: ####### End DIE Stack Trace #######
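When a root script stalls, the longest silence between consecutive timestamped log lines usually points at the step that hung. The Python sketch below implements that idea. It assumes each line starts with a "YYYY-MM-DD HH:MM:SS" timestamp, as in rootcrs_vrh8.log and cssvfupgd.log, and it skips lines without one. The function names are hypothetical.

```python
import sys
from datetime import datetime


def parse_ts(line: str):
    """Parse a leading 'YYYY-MM-DD HH:MM:SS' timestamp, or return None."""
    try:
        return datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return None


def largest_gap(lines):
    """Return (seconds, line_before, line_after) for the longest silence in a log."""
    best = (0.0, None, None)
    prev_ts, prev_line = None, None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue  # continuation lines, separators, blank lines
        if prev_ts is not None:
            gap = (ts - prev_ts).total_seconds()
            if gap > best[0]:
                best = (gap, prev_line, line)
        prev_ts, prev_line = ts, line
    return best


if __name__ == "__main__":
    # Usage: python log_gap.py LOGFILE [LOGFILE ...]
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as fh:
            gap, before, after = largest_gap(fh.read().splitlines())
        print(f"{path}: longest gap {gap:.0f} s")
        print("  before:", before)
        print("  after: ", after)
```

On the rootcrs_vrh8.log excerpt above, the longest gap runs from the cssvfupgd line at 13:07:55 to the DIE stack trace at 15:01:16. That is 6801 s, or about 1 h 53 min.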

cssvfupgd.log:
--------------------
Oracle Database 11g Clusterware Release 11.2.0.2.0 - Production Copyright 1996, 2010 Oracle. All rights reserved.
2011-02-13 13:07:55.356: [ OCRRAW][3605955376]prgval:buffer passed is too small
2011-02-13 13:07:55.361: [CSSVFUPG][3605955376]cssvfupgd_GetVFList: found voting file /s01/app/ocrvot/VOTEDISK/UAT2_vdisk1.dat
2011-02-13 13:07:55.365: [ OCRRAW][3605955376]prgval:buffer passed is too small
2011-02-13 13:07:55.369: [CSSVFUPG][3605955376]cssvfupgd_GetVFList: found voting file /s01/app/ocrvot/VOTEDISK/UAT2_vdisk2.dat
2011-02-13 13:07:55.373: [ OCRRAW][3605955376]prgval:buffer passed is too small
2011-02-13 13:07:55.377: [CSSVFUPG][3605955376]cssvfupgd_GetVFList: found voting file /s01/app/ocrvot/VOTEDISK/UAT2_vdisk3.dat
2011-02-13 13:07:55.402: [CSSVFUPG][3605955376]cssvfupgd_SetNum: Processing SYSTEM.css.misscount
2011-02-13 13:07:55.404: [CSSVFUPG][3605955376]cssvfupgd_SetNum: Processing SYSTEM.css.disktimeout
2011-02-13 13:07:55.406: [CSSVFUPG][3605955376]cssvfupgd_SetNum: Processing SYSTEM.css.reboottime
2011-02-13 13:07:55.408: [CSSVFUPG][3605955376]cssvfupgd_SetNum: Processing SYSTEM.css.diagwait
2011-02-13 13:07:55.414: [CSSVFUPG][3605955376]cssvfupgd_SetNum: Processing SYSTEM.css.pollinterval
2011-02-13 13:07:55.416: [CSSVFUPG][3605955376]cssvfupgd_GetGUID: Fetching GUID for /s01/app/ocrvot/VOTEDISK/UAT2_vdisk1.dat
2011-02-13 13:07:55.419: [ SKGFD][3605955376]NOTE: No asm libraries found in the system

2011-02-13 13:07:55.419: [ CLSF][3605955376]Allocated CLSF context
2011-02-13 13:07:55.419: [ SKGFD][3605955376]Discovery with str:/s01/app/ocrvot/VOTEDISK/UAT2_vdisk1.dat:

2011-02-13 13:07:55.419: [ SKGFD][3605955376]UFS discovery with :/s01/app/ocrvot/VOTEDISK/UAT2_vdisk1.dat:

2011-02-13 13:07:55.420: [ SKGFD][3605955376]Fetching UFS disk :/s01/app/ocrvot/VOTEDISK/UAT2_vdisk1.dat:

2011-02-13 13:07:55.420: [ SKGFD][3605955376]OSS discovery with :/s01/app/ocrvot/VOTEDISK/UAT2_vdisk1.dat:

2011-02-13 13:07:55.421: [ SKGFD][3605955376]Handle 0x124de360 from lib :UFS:: for disk :/s01/app/ocrvot/VOTEDISK/UAT2_vdisk1.dat:

2011-02-13 14:19:31.132: [ SKGFD][3605955376]WARNING:io_getevents timed out 2226 sec >>>>>>>>>>>>>>>>>>>> About 70 minutes later it reports a timeout.

2011-02-13 14:19:31.132: [ SKGFD][3605955376]WARNING:io_getevents timed out 2226 sec

The script has stalled at the voting disk upgrade phase. Please provide me the below details.

1. What cluster file system are you using for the voting files? Provide its details and the mount options used.

For OCFS, get its mount options:
mount | grep ocfs

2. Voting disks details:
ls -l /s01/app/ocrvot/VOTEDISK/UAT2_vdisk*

3. Get the diagwait detail:
OLD_CRS_HOME/bin/crsctl get css diagwait

1. What cluster file system are you using for the voting files? Provide its details and the mount options used.
/dev/emcpowera1 on /s01/app/ocrvot type ocfs2 (rw,_netdev,datavolume,nointr,heartbeat=local)

2. Voting disks details

[root@vrh8 11.2.0.2]# ls -l /s01/app/ocrvot/VOTEDISK/UAT2_vdisk*
-rw-r----- 1 oracrs oinstall 21004288 Jun 11 07:31 /s01/app/ocrvot/VOTEDISK/UAT2_vdisk1.dat
-rw-r----- 1 oracrs oinstall 21004288 Jun 11 07:31 /s01/app/ocrvot/VOTEDISK/UAT2_vdisk2.dat
-rw-r----- 1 oracrs oinstall 21004288 Jun 11 07:31 /s01/app/ocrvot/VOTEDISK/UAT2_vdisk3.dat

3. Get the diagwait detail

crsctl get css diagwait
Failure 33 in main Oracle Cluster Registry context initialization: PROC-33: Oracle Cluster Registry is not configured Operating System error [No such file or directory] [2]

An OWC (Oracle Web Conference) session may not be required now, as the issue we face is clear.

The diagwait should not error out, as explained in the following note,
11gR2 rootupgrade.sh Fails as cssvfupgd Can not Upgrade Voting Disk (Doc ID 1102283.1)

Make sure you are running 'crsctl get css diagwait' from the old CRS home. You can also check it on multiple nodes. If it errors out, this has to be fixed as explained in the above note.

Following that note, running ./oprocd stop returns an error:
[root@l118464lwap1049 bin]# ./oprocd stop
Jun 16 23:24:42.966 | ERR | failed to connect to daemon, errno(111)

ACFS-9200: Supported
ACFS-9300: ADVM/ACFS distribution files found.
ACFS-9307: Installing requested ADVM/ACFS software.
ACFS-9308: Loading installed ADVM/ACFS drivers.
ACFS-9321: Creating udev for ADVM/ACFS.
ACFS-9323: Creating module dependencies – this may take some time.
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9309: ADVM/ACFS installation correctness verified.

cssvfupgd.log
2011-02-13 23:36:49.311: [ OCRRAW][3394941744]prgval:buffer passed is too small
2011-02-13 23:36:49.315: [CSSVFUPG][3394941744]cssvfupgd_GetVFList: found voting file /s01/app/ocrvot/VOTEDISK/UAT2_vdisk2.dat
2011-02-13 23:36:49.319: [ OCRRAW][3394941744]prgval:buffer passed is too small
2011-02-13 23:36:49.323: [CSSVFUPG][3394941744]cssvfupgd_GetVFList: found voting file /s01/app/ocrvot/VOTEDISK/UAT2_vdisk3.dat
2011-02-13 23:36:49.351: [CSSVFUPG][3394941744]cssvfupgd_SetNum: Processing SYSTEM.css.misscount
2011-02-13 23:36:49.354: [CSSVFUPG][3394941744]cssvfupgd_SetNum: Processing SYSTEM.css.disktimeout
2011-02-13 23:36:49.356: [CSSVFUPG][3394941744]cssvfupgd_SetNum: Processing SYSTEM.css.reboottime
2011-02-13 23:36:49.358: [CSSVFUPG][3394941744]cssvfupgd_SetNum: Processing SYSTEM.css.diagwait
2011-02-13 23:36:49.367: [CSSVFUPG][3394941744]cssvfupgd_SetNum: Processing SYSTEM.css.pollinterval
2011-02-13 23:36:49.369: [CSSVFUPG][3394941744]cssvfupgd_GetGUID: Fetching GUID for /s01/app/ocrvot/VOTEDISK/UAT2_vdisk1.dat
2011-02-13 23:36:49.371: [ SKGFD][3394941744]NOTE: No asm libraries found in the system

2011-02-13 23:36:49.372: [ CLSF][3394941744]Allocated CLSF context
2011-02-13 23:36:49.372: [ SKGFD][3394941744]Discovery with str:/s01/app/ocrvot/VOTEDISK/UAT2_vdisk1.dat:

2011-02-13 23:36:49.372: [ SKGFD][3394941744]UFS discovery with :/s01/app/ocrvot/VOTEDISK/UAT2_vdisk1.dat:

2011-02-13 23:36:49.372: [ SKGFD][3394941744]Fetching UFS disk :/s01/app/ocrvot/VOTEDISK/UAT2_vdisk1.dat:

2011-02-13 23:36:49.372: [ SKGFD][3394941744]OSS discovery with :/s01/app/ocrvot/VOTEDISK/UAT2_vdisk1.dat:

2011-02-13 23:36:49.372: [ SKGFD][3394941744]Handle 0x98c4360 from lib :UFS:: for disk :/s01/app/ocrvot/VOTEDISK/UAT2_vdisk1.dat:

Question:
In your update about cssvfupgd.log you stated that it was hanging there.
Is there a timeout entry in that log file after about 70 minutes, like:

2011-02-13 23:36:49.372: [ SKGFD][3394941744]Handle 0x98c4360 from lib :UFS:: for disk :/s01/app/ocrvot/VOTEDISK/UAT2_vdisk1.dat:
2011-02-17 0:48:19.372: [ SKGFD][3394941744]WARNING:io_getevents timed out 4294 sec <<<< present ???

Please provide the following outputs:
rpm -qa|grep ocfs2
uname -a
cat /etc/redhat-release

[root@vrh8 ~]# rpm -qa|grep ocfs2
ocfs2console-1.4.4-1.el5
ocfs2-tools-1.4.4-1.el5
ocfs2-2.6.18-238.5.1.el5-1.4.7-1.el5
[root@vrh8 ~]# uname -a
Linux vrh8 2.6.18-238.5.1.el5 #1 SMP Mon Feb 21 05:52:39 EST 2011 x86_64 x86_64 x86_64 GNU/Linux
[root@vrh8 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.6 (Tikanga)
[root@vrh8 ~]#

Combinations that installed SUCCESSFULLY:

OEL5.4+ocfs2-1.4.7-1+ocfs2-tools-1.4.4
OEL5.6+ocfs2-1.4.8-1+ocfs2-tools-1.6.3
OEL5.6+ocfs2-1.4.7-1+ocfs2-tools-1.4.4
RHEL5.6+OEL kernel (Red Hat compatible kernel)+ocfs2-1.4.8-1+ocfs2-tools-1.6.3
RHEL5.6+OEL kernel (Red Hat compatible kernel)+ocfs2-1.4.7-1+ocfs2-tools-1.4.4
RHEL5.4

Combinations that failed:
RHEL5.6 (Red Hat kernel)+ocfs2-1.4.7-1+ocfs2-tools-1.4.4
RHEL5.6 (Red Hat kernel)+ocfs2-1.4.8-1+ocfs2-tools-1.6.3

The problem reproduces with the Red Hat kernel (RHEL 5.6 with 2.6.18-2xx kernels).

Please review the following Note to change the location of your voting disk
Note 428681.1
Title: How to ADD/REMOVE/REPLACE/MOVE Oracle Cluster Registry (OCR) and Voting Disk

Pasting info from:
Oracle® Clusterware Administration and Deployment Guide
11g Release 2 (11.2)

3 Managing Oracle Cluster Registry and Voting Disks
Oracle Universal Installer for Oracle Clusterware 11g release 2 (11.2), does not support the use of raw or block devices. However, if you upgrade from a previous Oracle Clusterware release, then you can continue to use raw or block devices.

[oracrs@vrh8 grid]$ grep fail cluvfy_during_inst_061711.log
/tmp l118464lwap1049 /tmp 706MB 1GB failed
Result: Free disk space check failed for “l118464lwap1049:/tmp”
/tmp vrh8 /tmp 927.1312MB 1GB failed
Result: Free disk space check failed for “vrh8:/tmp”
Result: Check for multiple users with UID value 0 failed
PRVF-5431 : Oracle Cluster Voting Disk configuration check failed

[oracrs@vrh8 grid]$ ./runcluvfy.sh stage -pre crsinst -n vrh8,l118464lwap1049 -verbose|tee cluvfy_during_inst.log

Please upload the following Cluvfy trace log:
$ORA_CRS_HOME/cv/log/cvutrace.log.0

Please download the latest CVU from OTN:
http://www.oracle.com/technetwork/database/clustering/downloads/cvu-download-homepage-099973.html

Please upload
/s02/app/crs/11.2.0.2/log/vrh8/agent/ohasd/oraagent_oracrs/oraagent_oracrs.log

In addition pls upload
/s02/app/crs/11.2.0.2/log/vrh8/agent/ohasd/oracssdagent_root/oracssdagent_root.log

Please run this command on both the new setup and your existing production setup for a quick comparison:
rpm -qa|grep ocfs2

Server with issue:
[root@vrh8 ohasd]# rpm -qa|grep ocfs2
ocfs2console-1.4.4-1.el5
ocfs2-tools-1.4.4-1.el5
ocfs2-2.6.18-238.5.1.el5-1.4.7-1.el5

Prod:

[root@vrh9  bin]# rpm -qa|grep ocfs2
ocfs2-2.6.18-194.el5-1.4.7-1.el5
ocfs2console-1.4.4-1.el5
ocfs2-tools-1.4.4-1.el5
ocfs2-2.6.18-194.8.1.el5-1.4.7-1.el5

@ . from Bug 11876815 (Doc ID 1321757.1)
@ combinations that install SUCCESSFUL:
@ .
@ OEL5.4+ocfs2-1.4.7-1+ocfs2-tools-1.4.4
@ OEL5.6+ocfs2-1.4.8-1+ocfs2-tools-1.6.3
@ OEL5.6+ocfs2-1.4.7-1+ocfs2-tools-1.4.4
@ RHLE5.6+OEL kernel(redhat compatible kernel)+ocfs2-1.4.8-1+ocfs2-tools-1.6.3
@ RHLE5.6+OEL kernel(redhat compatible kernel)+ocfs2-1.4.7-1+ocfs2-tools-1.4.4
@ RHEL5.4
@ .
@ combinations that failed:
@ RHLE5.6(redhat kernel)+ocfs2-1.4.7-1+ocfs2-tools-1.4.4
@ RHLE5.6(redhat kernel)+ocfs2-1.4.8-1+ocfs2-tools-1.6.3
@ .
@ .
@ So that is clear that , it is redhat kernel’s problem.Since RHEL5.6 redhat
@ provided 2.6.18-2xx kernels, we can’t fix redhat kernels, please use Oracle
@ Enterprise kernel (redhat compatible) for installation.

As per the last action plan, you need to contact Red Hat support to find out the cause of this issue. The workaround is to not use OCFS and to use a raw device instead, so that the upgrade succeeds.
An Oracle bug, 11876815, was logged internally for this hang issue. A few combinations of OEL, RHEL and OCFS2 were tried and tested, and the combination you are using did not work for us either (per the internal bug updates given above).
The solution provided by the Oracle bug developer is to use OEL rather than RHEL, or to contact RHEL support to identify the cause and a solution (in case they have already tested this setup).
Let me know if RHEL support is already engaged and provide the case ID. I can then open an internal SR for Oracle/Red Hat Joint Escalation Team (JET) engagement, so that both vendors can work together internally.

+ The SR issue of the grid upgrade from 11.1 to 11.2.0.2.2 is resolved:
- the voting disk was moved from OCFS to a raw device, as a workaround for Bug 11876815
- TMP and TEMP were set to a new directory with available space before running the installer, so that the prechecks succeed
- GI PSU #2 was applied before the rootupgrade.sh step
- the rootupgrade.sh step was successful on all nodes
- post-upgrade checks and logs were verified to confirm that the GI upgrade succeeded

+ The DB upgrade to 11.2.0.2 plus PSU #2 will be resumed shortly.

MMON Slave Trace [KDL_TRIM]: NEWLEN: MESSAGES

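A reader reported that on a 10.2.0.4 database on Windows 2003, the MMON slave background processes write trace entries containing "[kdl_trim]: newlen" messages, for example: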

*** ACTION NAME:(Auto ADDM Slave Action) 2011-09-09 18:00:12.053
*** MODULE NAME:(MMON_SLAVE) 2011-09-09 18:00:12.053
*** SERVICE NAME:(SYS$BACKGROUND) 2011-09-09 18:00:12.053
*** SESSION ID:(797.606) 2011-09-09 18:00:12.053
[kdl_trim]: newlen: 0
[kdl_trim]: newlen: 0
[kdl_trim]: newlen: 0
[kdl_trim]: newlen: 0
[kdl_trim]: newlen: 0
[kdl_trim]: newlen: 0
[kdl_trim]: newlen: 0
[kdl_trim]: newlen: 0
[kdl_trim]: newlen: 0
[kdl_trim]: newlen: 0
[kdl_trim]: newlen: 0
[kdl_trim]: newlen: 0

This is generally caused by a bug already confirmed on 10.2.0.4: "Bug 6972843 - Unnecessary trace files with "[kdl_trim]: newlen" text in them [ID 6972843.8]". The bug is currently confirmed to occur only on 10.2.0.4, and it is fixed in 10.2.0.4.1.

In short, the MMON slave processes needlessly write messages such as "[kdl_trim]: newlen: 0":

Unnecessary trace files may be generated with entries that read:'[kdl_trim]: newlen: '.
Hdr: 9600827 10.2.0.4 RDBMS 10.2.0.4 RAM LOBS PRODID-5 PORTID-212
Abstract: [KDL_TRIM]: NEWLEN: MESSAGES IN MMON SLAVES TRACES
PROBLEM:
--------
MMON slave processes traces contains the following entries despite patch 
6972843 applied
*** NAME:(Auto ADDM Slave Action) 2010-04-14 06:00:42.899
*** NAME:(MMON_SLAVE) 2010-04-14 06:00:42.899
*** NAME:(SYS$BACKGROUND) 2010-04-14 06:00:42.899
*** ID:(2472.45569) 2010-04-14 06:00:42.899
[kdl_trim]: newlen: 0
[kdl_trim]: newlen: 0
[kdl_trim]: newlen: 0
[kdl_trim]: newlen: 0
[kdl_trim]: newlen: 0
[kdl_trim]: newlen: 0
[kdl_trim]: newlen: 0
[kdl_trim]: newlen: 0
DIAGNOSTIC ANALYSIS:
--------------------
- Opatch contains patch 6972843 applied
WORKAROUND:
-----------
none
RELATED BUGS:
-------------
Bug 6972843 - HOURLY TRACE FILE CREATED IN BDUMP FOR AUTO ADDM SLAVE ACTION

This bug can be ignored. However, if the messages are written too often, consider applying a 10.2.0.4 PSU or upgrading directly to 10.2.0.5.
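Counting the messages per trace file helps judge whether they are frequent enough to justify a PSU or an upgrade. The Python sketch below assumes the MMON slave traces sit in one directory (for example background_dump_dest) and end in .trc. The directory is passed on the command line, and the function names are hypothetical.

```python
import sys
from pathlib import Path

MARKER = "[kdl_trim]: newlen"  # text described in Bug 6972843


def count_kdl_trim(lines) -> int:
    """Count trace lines containing the kdl_trim marker."""
    return sum(1 for line in lines if MARKER in line)


def scan_trace_dir(trace_dir: str, pattern: str = "*.trc") -> dict[str, int]:
    """Map trace file name -> number of kdl_trim lines, skipping files with none."""
    counts = {}
    for path in sorted(Path(trace_dir).glob(pattern)):
        with path.open(encoding="utf-8", errors="replace") as fh:
            n = count_kdl_trim(fh)
        if n:
            counts[path.name] = n
    return counts


if __name__ == "__main__":
    # Usage: python kdl_trim_count.py TRACE_DIR [TRACE_DIR ...]
    for trace_dir in sys.argv[1:]:
        result = scan_trace_dir(trace_dir)
        for name, n in result.items():
            print(f"{n:6d}  {name}")
        print(f"{trace_dir}: {len(result)} file(s), {sum(result.values())} kdl_trim line(s)")
```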

Adding a Node to 11.2.0.2 Grid Infrastructure

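In an earlier article I described the detailed steps for adding a node to a 10g RAC cluster. In 11gR2, Oracle CRS became Grid Infrastructure (GI), which makes CRS resources such as VIPs and ASM easier to manage. As a result, adding a node to GI in 11.2 differs considerably from 10gR2.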

Here we outline the key points of adding a node to GI in 11.2:

I. Preparation

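Preparation must not be skipped. In the article on adding a node to a 10g RAC cluster I listed the prerequisites that must be met. They still apply to 11.2 GI, but note the following points: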

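1. Configure user equivalence for the grid user (the GI installation owner) as well as for the oracle user. The only exception is installing both GI and the RDBMS as oracle, which is not recommended.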

2. 11.2 GI introduces octssd (Oracle Cluster Time Synchronization Service daemon), a time synchronization service. If you plan to use octssd, disable the ntpd time service as follows:

# service ntpd stop
Shutting down ntpd:                                        [  OK  ]
# chkconfig ntpd off
# mv /etc/ntp.conf /etc/ntp.conf.orig
# rm /var/run/ntpd.pid

3. Use the Cluster Verification Utility (cluvfy) to verify that the new node meets the cluster's requirements:

cluvfy stage -pre nodeadd -n <NEW NODE>
For example:
su - grid
[grid@vrh1 ~]$ cluvfy stage -pre nodeadd -n vrh3
Performing pre-checks for node addition 
Checking node reachability...
Node reachability check passed from node "vrh1"
Checking user equivalence...
User equivalence check passed for user "grid"
Checking node connectivity...
Checking hosts config file...
Verification of the hosts config file successful
Check: Node connectivity for interface "eth0"
Node connectivity passed for interface "eth0"
Node connectivity check passed
Checking CRS integrity...
CRS integrity check passed
Checking shared resources...
Checking CRS home location...
The location "/g01/11.2.0/grid" is not shared but is present/creatable on all nodes
Shared resources check for node addition passed
Checking node connectivity...
Checking hosts config file...
Verification of the hosts config file successful
Check: Node connectivity for interface "eth0"
Node connectivity passed for interface "eth0"
Check: Node connectivity for interface "eth1"
Node connectivity passed for interface "eth1"
Node connectivity check passed
Total memory check passed
Available memory check passed
Swap space check passed
Free disk space check passed for "vrh3:/tmp"
Free disk space check passed for "vrh1:/tmp"
Check for multiple users with UID value 54322 passed
User existence check passed for "grid"
Run level check passed
Hard limits check failed for "maximum open file descriptors"
Check failed on nodes:
vrh3
Soft limits check passed for "maximum open file descriptors"
Hard limits check passed for "maximum user processes"
Soft limits check passed for "maximum user processes"
System architecture check passed
Kernel version check passed
Kernel parameter check passed for "semmsl"
Kernel parameter check passed for "semmns"
Kernel parameter check passed for "semopm"
Kernel parameter check passed for "semmni"
Kernel parameter check passed for "shmmax"
Kernel parameter check passed for "shmmni"
Kernel parameter check passed for "shmall"
Kernel parameter check passed for "file-max"
Kernel parameter check passed for "ip_local_port_range"
Kernel parameter check passed for "rmem_default"
Kernel parameter check passed for "rmem_max"
Kernel parameter check passed for "wmem_default"
Kernel parameter check passed for "wmem_max"
Kernel parameter check passed for "aio-max-nr"
Package existence check passed for "make-3.81( x86_64)"
Package existence check passed for "binutils-2.17.50.0.6( x86_64)"
Package existence check passed for "gcc-4.1.2 (x86_64)( x86_64)"
Package existence check passed for "libaio-0.3.106 (x86_64)( x86_64)"
Package existence check passed for "glibc-2.5-24 (x86_64)( x86_64)"
Package existence check passed for "compat-libstdc++-33-3.2.3 (x86_64)( x86_64)"
Package existence check passed for "elfutils-libelf-0.125 (x86_64)( x86_64)"
Package existence check passed for "elfutils-libelf-devel-0.125( x86_64)"
Package existence check passed for "glibc-common-2.5( x86_64)"
Package existence check passed for "glibc-devel-2.5 (x86_64)( x86_64)"
Package existence check passed for "glibc-headers-2.5( x86_64)"
Package existence check passed for "gcc-c++-4.1.2 (x86_64)( x86_64)"
Package existence check passed for "libaio-devel-0.3.106 (x86_64)( x86_64)"
Package existence check passed for "libgcc-4.1.2 (x86_64)( x86_64)"
Package existence check passed for "libstdc++-4.1.2 (x86_64)( x86_64)"
Package existence check passed for "libstdc++-devel-4.1.2 (x86_64)( x86_64)"
Package existence check passed for "sysstat-7.0.2( x86_64)"
Package existence check passed for "ksh-20060214( x86_64)"
Check for multiple users with UID value 0 passed
Current group ID check passed
Checking OCR integrity...
OCR integrity check passed
Checking Oracle Cluster Voting Disk configuration...
Oracle Cluster Voting Disk configuration check passed
Time zone consistency check passed
Starting Clock synchronization checks using Network Time Protocol(NTP)...
NTP Configuration file check started...
No NTP Daemons or Services were found to be running
Clock synchronization check using Network Time Protocol(NTP) passed
User "grid" is not part of "root" group. Check passed
Checking consistency of file "/etc/resolv.conf" across nodes
File "/etc/resolv.conf" does not have both domain and search entries defined
domain entry in file "/etc/resolv.conf" is consistent across nodes
search entry in file "/etc/resolv.conf" is consistent across nodes
All nodes have one search entry defined in file "/etc/resolv.conf"
PRVF-5636 : The DNS response time for an unreachable node exceeded "15000" ms on following nodes: vrh3
File "/etc/resolv.conf" is not consistent across nodes
Pre-check for node addition was unsuccessful on all the nodes.

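If you do not use DNS for name resolution, the resolv.conf inconsistency can generally be ignored. In silent installation mode, however, it may prevent the operation from completing, as described later.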
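For step 2 above, a small Python pre-flight check can confirm that the files handled by the ntpd commands (/etc/ntp.conf and /var/run/ntpd.pid) are gone before you run the installer. Treating "no NTP files left" as "CTSS should run in active mode" is an assumption drawn from the 11.2 installation guidance, not CTSS's actual detection logic. On a running cluster, crsctl check ctss reports the mode actually in use. The function names are hypothetical.

```python
import os

# Files removed or renamed by the ntpd steps in the preparation list.
NTP_ARTIFACTS = ("/etc/ntp.conf", "/var/run/ntpd.pid")


def ntp_leftovers(root: str = "/") -> list[str]:
    """Return the NTP artifacts that still exist under root (default: the real filesystem)."""
    found = []
    for path in NTP_ARTIFACTS:
        if os.path.exists(os.path.join(root, path.lstrip("/"))):
            found.append(path)
    return found


def expected_ctss_mode(root: str = "/") -> str:
    """Assumption: with no NTP files left, CTSS should come up in active mode."""
    return "observer" if ntp_leftovers(root) else "active"


if __name__ == "__main__":
    print("NTP leftovers:", ntp_leftovers() or "none")
    print("expected CTSS mode:", expected_ctss_mode())
```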

II. Adding the New Node to GI

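Note that addNode.sh, the key script for adding a node in 11.2.0.2 GI, may have a bug. According to the official documentation, running addNode.sh is all it takes to add a node interactively through the OUI GUI. In practice, this is not what happens: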

documentation said:
Go to CRS_home/oui/bin and run the addNode.sh script on one of the existing nodes.
Oracle Universal Installer runs in add node mode and the Welcome page displays.
Click Next and the Specify Cluster Nodes for Node Addition page displays.
What we did:
addNode.sh must be run as the GI owner (usually the grid user) on an existing node where GI is running.
[grid@vrh1 ~]$ cd $ORA_CRS_HOME/oui/bin
[grid@vrh1 bin]$ ./addNode.sh
ERROR:
Value for CLUSTER_NEW_NODES not specified.
USAGE:
/g01/11.2.0/grid/cv/cvutl/check_nodeadd.pl  {-pre|-post} 
/g01/11.2.0/grid/cv/cvutl/check_nodeadd.pl -pre [-silent] CLUSTER_NEW_NODES={}
/g01/11.2.0/grid/cv/cvutl/check_nodeadd.pl -pre [-silent] CLUSTER_NEW_NODES={} 
CLUSTER_NEW_VIRTUAL_HOSTNAMES={}
/g01/11.2.0/grid/cv/cvutl/check_nodeadd.pl -pre [-silent] -responseFile
/g01/11.2.0/grid/cv/cvutl/check_nodeadd.pl -post [-silent]

We intended to add the node through the interactive OUI GUI (runInstaller -addnode). Instead, addNode.sh asks us for parameters, and the check_nodeadd.pl script it calls runs in silent mode.

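After searching MOS and Google, we found that nearly every document recommends adding nodes in silent mode, so we had no choice but to switch to a silent add. Silent mode actually needs only a few parameters, which may be one reason it is favored. However, here we hit another problem: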

Syntax:
./addNode.sh -silent \
"CLUSTER_NEW_NODES={node2}" \
"CLUSTER_NEW_PRIVATE_NODE_NAMES={node2-priv}" "CLUSTER_NEW_VIRTUAL_HOSTNAMES={node2-vip}"
In our example, the command is:
./addNode.sh -silent \
"CLUSTER_NEW_NODES={vrh3}" "CLUSTER_NEW_VIRTUAL_HOSTNAMES={vrh3-vip}" \
"CLUSTER_NEW_PRIVATE_NODE_NAMES={vrh3-priv}"
Because it runs in silent mode, the command above produces no terminal output; the output actually goes to the log file /tmp/silentInstall.log. Here is the same command without the -silent option:
./addNode.sh "CLUSTER_NEW_NODES={vrh3}" \
"CLUSTER_NEW_VIRTUAL_HOSTNAMES={vrh3-vip}" "CLUSTER_NEW_PRIVATE_NODE_NAMES={vrh3-priv}"
Performing pre-checks for node addition 
Checking node reachability...
Node reachability check passed from node "vrh1"
Checking user equivalence...
User equivalence check passed for user "grid"
Checking node connectivity...
Checking hosts config file...
Verification of the hosts config file successful
Check: Node connectivity for interface "eth0"
Node connectivity passed for interface "eth0"
Node connectivity check passed
Checking CRS integrity...
CRS integrity check passed
Checking shared resources...
Checking CRS home location...
The location "/g01/11.2.0/grid" is not shared but is present/creatable on all nodes
Shared resources check for node addition passed
Checking node connectivity...
Checking hosts config file...
Verification of the hosts config file successful
Check: Node connectivity for interface "eth0"
Node connectivity passed for interface "eth0"
Check: Node connectivity for interface "eth1"
Node connectivity passed for interface "eth1"
Node connectivity check passed
Total memory check passed
Available memory check passed
Swap space check passed
Free disk space check passed for "vrh3:/tmp"
Free disk space check passed for "vrh1:/tmp"
Check for multiple users with UID value 54322 passed
User existence check passed for "grid"
Run level check passed
Hard limits check failed for "maximum open file descriptors"
Check failed on nodes:
vrh3
Soft limits check passed for "maximum open file descriptors"
Hard limits check passed for "maximum user processes"
Soft limits check passed for "maximum user processes"
System architecture check passed
Kernel version check passed
Kernel parameter check passed for "semmsl"
Kernel parameter check passed for "semmns"
Kernel parameter check passed for "semopm"
Kernel parameter check passed for "semmni"
Kernel parameter check passed for "shmmax"
Kernel parameter check passed for "shmmni"
Kernel parameter check passed for "shmall"
Kernel parameter check passed for "file-max"
Kernel parameter check passed for "ip_local_port_range"
Kernel parameter check passed for "rmem_default"
Kernel parameter check passed for "rmem_max"
Kernel parameter check passed for "wmem_default"
Kernel parameter check passed for "wmem_max"
Kernel parameter check passed for "aio-max-nr"
Package existence check passed for "make-3.81( x86_64)"
Package existence check passed for "binutils-2.17.50.0.6( x86_64)"
Package existence check passed for "gcc-4.1.2 (x86_64)( x86_64)"
Package existence check passed for "libaio-0.3.106 (x86_64)( x86_64)"
Package existence check passed for "glibc-2.5-24 (x86_64)( x86_64)"
Package existence check passed for "compat-libstdc++-33-3.2.3 (x86_64)( x86_64)"
Package existence check passed for "elfutils-libelf-0.125 (x86_64)( x86_64)"
Package existence check passed for "elfutils-libelf-devel-0.125( x86_64)"
Package existence check passed for "glibc-common-2.5( x86_64)"
Package existence check passed for "glibc-devel-2.5 (x86_64)( x86_64)"
Package existence check passed for "glibc-headers-2.5( x86_64)"
Package existence check passed for "gcc-c++-4.1.2 (x86_64)( x86_64)"
Package existence check passed for "libaio-devel-0.3.106 (x86_64)( x86_64)"
Package existence check passed for "libgcc-4.1.2 (x86_64)( x86_64)"
Package existence check passed for "libstdc++-4.1.2 (x86_64)( x86_64)"
Package existence check passed for "libstdc++-devel-4.1.2 (x86_64)( x86_64)"
Package existence check passed for "sysstat-7.0.2( x86_64)"
Package existence check passed for "ksh-20060214( x86_64)"
Check for multiple users with UID value 0 passed
Current group ID check passed
Checking OCR integrity...
OCR integrity check passed
Checking Oracle Cluster Voting Disk configuration...
Oracle Cluster Voting Disk configuration check passed
Time zone consistency check passed
Starting Clock synchronization checks using Network Time Protocol(NTP)...
NTP Configuration file check started...
No NTP Daemons or Services were found to be running
Clock synchronization check using Network Time Protocol(NTP) passed
User "grid" is not part of "root" group. Check passed
Checking consistency of file "/etc/resolv.conf" across nodes
File "/etc/resolv.conf" does not have both domain and search entries defined
domain entry in file "/etc/resolv.conf" is consistent across nodes
search entry in file "/etc/resolv.conf" is consistent across nodes
All nodes have one search entry defined in file "/etc/resolv.conf"
PRVF-5636 : The DNS response time for an unreachable node exceeded "15000" ms on following nodes: vrh3
File "/etc/resolv.conf" is not consistent across nodes
Checking VIP configuration.
Checking VIP Subnet configuration.
Check for VIP Subnet configuration passed.
Checking VIP reachability
Check for VIP reachability passed.
Pre-check for node addition was unsuccessful on all the nodes.

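Before it formally adds the node, addNode.sh also calls the cluvfy tool to verify that the new node meets the requirements, and refuses to continue if it does not. Since we already verified the new node's readiness earlier, we can safely skip addNode.sh's own check here. Let's look at the contents of the addNode.sh script: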

[grid@vrh1 bin]$ cat addNode.sh 
#!/bin/sh
OHOME=/g01/11.2.0/grid
INVPTRLOC=$OHOME/oraInst.loc
ADDNODE="$OHOME/oui/bin/runInstaller -addNode -invPtrLoc $INVPTRLOC ORACLE_HOME=$OHOME $*"
if [ "$IGNORE_PREADDNODE_CHECKS" = "Y" -o ! -f "$OHOME/cv/cvutl/check_nodeadd.pl" ]
then
$ADDNODE
else
CHECK_NODEADD="$OHOME/perl/bin/perl $OHOME/cv/cvutl/check_nodeadd.pl -pre $*"
$CHECK_NODEADD
if [ $? -eq 0 ]
then
$ADDNODE
fi
fi
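The gating condition at the top of addNode.sh can be restated as a small function. This makes the two ways of bypassing the pre-check explicit: IGNORE_PREADDNODE_CHECKS is set to Y, or check_nodeadd.pl is absent. This is a minimal sketch for illustration only. The function name precheck_mode is hypothetical and is not part of the Grid Infrastructure kit.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with GI): mirrors the test at the top of addNode.sh.
# $1 = value of IGNORE_PREADDNODE_CHECKS, $2 = path to check_nodeadd.pl
# Prints "skip" when addNode.sh would run runInstaller directly,
# and "check" when it would first run check_nodeadd.pl -pre.
precheck_mode() {
  if [ "$1" = "Y" ] || [ ! -f "$2" ]; then
    echo skip
  else
    echo check
  fi
}
```

For example, `precheck_mode "$IGNORE_PREADDNODE_CHECKS" /g01/11.2.0/grid/cv/cvutl/check_nodeadd.pl` prints skip once the variable has been exported as Y. The sketch joins two `[` tests with `||` instead of the script's `-o`, which POSIX marks as obsolescent. The logic is the same.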

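As you can see, the IGNORE_PREADDNODE_CHECKS environment variable controls whether the node-addition pre-check is performed. We set this variable manually and then run addNode.sh again: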

export IGNORE_PREADDNODE_CHECKS=Y
./addNode.sh "CLUSTER_NEW_NODES={vrh3}" \
  "CLUSTER_NEW_VIRTUAL_HOSTNAMES={vrh3-vip}" "CLUSTER_NEW_PRIVATE_NODE_NAMES={vrh3-priv}" \
  > add_node.log 2>&1
Open another window to monitor the node-addition log:
tail -f add_node.log 
Starting Oracle Universal Installer...
Checking swap space: must be greater than 500 MB.   Actual 5951 MB    Passed
Checking monitor: must be configured to display at least 256 colors.    Actual 16777216    Passed
Oracle Universal Installer, Version 11.2.0.2.0 Production
Copyright (C) 1999, 2010, Oracle. All rights reserved.
Performing tests to see whether nodes vrh2,vrh3 are available
............................................................... 100% Done.
.
-----------------------------------------------------------------------------
Cluster Node Addition Summary
Global Settings
Source: /g01/11.2.0/grid
New Nodes
Space Requirements
New Nodes
vrh3
/: Required 6.66GB : Available 32.40GB
Installed Products
Product Names
Oracle Grid Infrastructure 11.2.0.2.0
Sun JDK 1.5.0.24.08
Installer SDK Component 11.2.0.2.0
Oracle One-Off Patch Installer 11.2.0.0.2
Oracle Universal Installer 11.2.0.2.0
Oracle USM Deconfiguration 11.2.0.2.0
Oracle Configuration Manager Deconfiguration 10.3.1.0.0
Enterprise Manager Common Core Files 10.2.0.4.3
Oracle DBCA Deconfiguration 11.2.0.2.0
Oracle RAC Deconfiguration 11.2.0.2.0
Oracle Quality of Service Management (Server) 11.2.0.2.0
Installation Plugin Files 11.2.0.2.0
Universal Storage Manager Files 11.2.0.2.0
Oracle Text Required Support Files 11.2.0.2.0
Automatic Storage Management Assistant 11.2.0.2.0
Oracle Database 11g Multimedia Files 11.2.0.2.0
Oracle Multimedia Java Advanced Imaging 11.2.0.2.0
Oracle Globalization Support 11.2.0.2.0
Oracle Multimedia Locator RDBMS Files 11.2.0.2.0
Oracle Core Required Support Files 11.2.0.2.0
Bali Share 1.1.18.0.0
Oracle Database Deconfiguration 11.2.0.2.0
Oracle Quality of Service Management (Client) 11.2.0.2.0
Expat libraries 2.0.1.0.1
Oracle Containers for Java 11.2.0.2.0
Perl Modules 5.10.0.0.1
Secure Socket Layer 11.2.0.2.0
Oracle JDBC/OCI Instant Client 11.2.0.2.0
Oracle Multimedia Client Option 11.2.0.2.0
LDAP Required Support Files 11.2.0.2.0
Character Set Migration Utility 11.2.0.2.0
Perl Interpreter 5.10.0.0.1
PL/SQL Embedded Gateway 11.2.0.2.0
OLAP SQL Scripts 11.2.0.2.0
Database SQL Scripts 11.2.0.2.0
Oracle Extended Windowing Toolkit 3.4.47.0.0
SSL Required Support Files for InstantClient 11.2.0.2.0
SQL*Plus Files for Instant Client 11.2.0.2.0
Oracle Net Required Support Files 11.2.0.2.0
Oracle Database User Interface 2.2.13.0.0
RDBMS Required Support Files for Instant Client 11.2.0.2.0
RDBMS Required Support Files Runtime 11.2.0.2.0
XML Parser for Java 11.2.0.2.0
Oracle Security Developer Tools 11.2.0.2.0
Oracle Wallet Manager 11.2.0.2.0
Enterprise Manager plugin Common Files 11.2.0.2.0
Platform Required Support Files 11.2.0.2.0
Oracle JFC Extended Windowing Toolkit 4.2.36.0.0
RDBMS Required Support Files 11.2.0.2.0
Oracle Ice Browser 5.2.3.6.0
Oracle Help For Java 4.2.9.0.0
Enterprise Manager Common Files 10.2.0.4.3
Deinstallation Tool 11.2.0.2.0
Oracle Java Client 11.2.0.2.0
Cluster Verification Utility Files 11.2.0.2.0
Oracle Notification Service (eONS) 11.2.0.2.0
Oracle LDAP administration 11.2.0.2.0
Cluster Verification Utility Common Files 11.2.0.2.0
Oracle Clusterware RDBMS Files 11.2.0.2.0
Oracle Locale Builder 11.2.0.2.0
Oracle Globalization Support 11.2.0.2.0
Buildtools Common Files 11.2.0.2.0
Oracle RAC Required Support Files-HAS 11.2.0.2.0
SQL*Plus Required Support Files 11.2.0.2.0
XDK Required Support Files 11.2.0.2.0
Agent Required Support Files 10.2.0.4.3
Parser Generator Required Support Files 11.2.0.2.0
Precompiler Required Support Files 11.2.0.2.0
Installation Common Files 11.2.0.2.0
Required Support Files 11.2.0.2.0
Oracle JDBC/THIN Interfaces 11.2.0.2.0
Oracle Multimedia Locator 11.2.0.2.0
Oracle Multimedia 11.2.0.2.0
HAS Common Files 11.2.0.2.0
Assistant Common Files 11.2.0.2.0
PL/SQL 11.2.0.2.0
HAS Files for DB 11.2.0.2.0
Oracle Recovery Manager 11.2.0.2.0
Oracle Database Utilities 11.2.0.2.0
Oracle Notification Service 11.2.0.2.0
SQL*Plus 11.2.0.2.0
Oracle Netca Client 11.2.0.2.0
Oracle Net 11.2.0.2.0
Oracle JVM 11.2.0.2.0
Oracle Internet Directory Client 11.2.0.2.0
Oracle Net Listener 11.2.0.2.0
Cluster Ready Services Files 11.2.0.2.0
Oracle Database 11g 11.2.0.2.0
-----------------------------------------------------------------------------
Instantiating scripts for add node (Monday, August 15, 2011 10:15:35 PM CST)
.                                                                 1% Done.
Instantiation of add node scripts complete
Copying to remote nodes (Monday, August 15, 2011 10:15:38 PM CST)
...............................................................................................                                 96% Done.
Home copied to new nodes
Saving inventory on nodes (Monday, August 15, 2011 10:21:02 PM CST)
.                                                               100% Done.
Save inventory complete
WARNING:A new inventory has been created on one or more nodes in this session.
However, it has not yet been registered as the central inventory of this system.
To register the new inventory please run the script at '/g01/oraInventory/orainstRoot.sh'
with root privileges on nodes 'vrh3'.
If you do not register the inventory, you may not be able to update or
patch the products you installed.
The following configuration scripts need to be executed as the "root" user in each cluster node.
/g01/oraInventory/orainstRoot.sh #On nodes vrh3
/g01/11.2.0/grid/root.sh #On nodes vrh3
To execute the configuration scripts:
1. Open a terminal window
2. Log in as "root"
3. Run the scripts in each cluster node
The Cluster Node Addition of /g01/11.2.0/grid was successful.
Please check '/tmp/silentInstall.log' for more details.
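Because the addNode.sh output was redirected to add_node.log, the final result can also be checked non-interactively instead of by watching tail -f. This is a minimal sketch that assumes the success message matches the line shown above; other OUI versions may word it differently. The function name node_add_succeeded is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) if the addNode.sh log contains
# OUI's final success line, as printed in the log above; fails otherwise,
# including when the log file does not exist.
# $1 = path to the redirected log, e.g. add_node.log
node_add_succeeded() {
  grep -q 'The Cluster Node Addition of .* was successful\.' "$1"
}
```

Usage: `node_add_succeeded add_node.log && echo "now run orainstRoot.sh and root.sh on vrh3"`.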

The GI software has now been installed successfully. Next, two key scripts must be run on the newly added node. Do not forget this step!

orainstRoot.sh and root.sh must be run as root:
su - root 
[root@vrh3]# cat /etc/oraInst.loc
inventory_loc=/g01/oraInventory                     -- this is the oraInventory location
inst_group=asmadmin
[root@vrh3 ~]# cd /g01/oraInventory
[root@vrh3 oraInventory]# ./orainstRoot.sh
Creating the Oracle inventory pointer file (/etc/oraInst.loc)
Changing permissions of /g01/oraInventory.
Adding read,write permissions for group.
Removing read,write,execute permissions for world.
Changing groupname of /g01/oraInventory to asmadmin.
The execution of the script is complete.
Run root.sh from the CRS home. Warnings may appear, but they are not serious:
[root@vrh3 ~]# cd $ORA_CRS_HOME
[root@vrh3 g01]# /g01/11.2.0/grid/root.sh
Running Oracle 11g root script...
The following environment variables are set as:
ORACLE_OWNER= grid
ORACLE_HOME=  /g01/11.2.0/grid
Enter the full pathname of the local bin directory: [/usr/local/bin]:
Copying dbhome to /usr/local/bin ...
Copying oraenv to /usr/local/bin ...
Copying coraenv to /usr/local/bin ...
Creating /etc/oratab file...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /g01/11.2.0/grid/crs/install/crsconfig_params
Creating trace directory
LOCAL ADD MODE
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
OLR initialization - successful
Adding daemon to inittab
ACFS-9200: Supported
ACFS-9300: ADVM/ACFS distribution files found.
ACFS-9307: Installing requested ADVM/ACFS software.
ACFS-9308: Loading installed ADVM/ACFS drivers.
ACFS-9321: Creating udev for ADVM/ACFS.
ACFS-9323: Creating module dependencies - this may take some time.
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9309: ADVM/ACFS installation correctness verified.
CRS-4402: The CSS daemon was started in exclusive mode but found an active CSS daemon on node vrh1, number 1, and is terminating
An active cluster was found during exclusive startup, restarting to join the cluster
clscfg: EXISTING configuration version 5 detected.
clscfg: version 5 is 11g Release 2.
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
/g01/11.2.0/grid/bin/srvctl start listener -n vrh3 ... failed
Failed to perform new node configuration at /g01/11.2.0/grid/crs/install/crsconfig_lib.pm line 8255.
/g01/11.2.0/grid/perl/bin/perl -I/g01/11.2.0/grid/perl/lib -I/g01/11.2.0/grid/crs/install 
/g01/11.2.0/grid/crs/install/rootcrs.pl execution failed

Two minor errors appear in the output above:

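1. The LISTENER startup failure on the new node can be ignored. It occurs because the RDBMS_HOME has not been installed yet, but CRS still tries to start the associated listener.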

[root@vrh3 g01]# /g01/11.2.0/grid/bin/srvctl start listener -n vrh3
PRCR-1013 : Failed to start resource ora.CRS_LISTENER.lsnr
PRCR-1064 : Failed to start resource ora.CRS_LISTENER.lsnr on node vrh3
CRS-5010: Update of configuration file "/s01/orabase/product/11.2.0/dbhome_1/network/admin/listener.ora" failed: details at "(:CLSN00014:)" in "/g01/11.2.0/grid/log/vrh3/agent/crsd/oraagent_oracle/oraagent_oracle.log"
CRS-5013: Agent "/g01/11.2.0/grid/bin/oraagent.bin" failed to start process "/s01/orabase/product/11.2.0/dbhome_1/bin/lsnrctl" for action "check": details at "(:CLSN00008:)" in "/g01/11.2.0/grid/log/vrh3/agent/crsd/oraagent_oracle/oraagent_oracle.log"
CRS-2674: Start of 'ora.CRS_LISTENER.lsnr' on 'vrh3' failed
CRS-5013: Agent "/g01/11.2.0/grid/bin/oraagent.bin" failed to start process "/s01/orabase/product/11.2.0/dbhome_1/bin/lsnrctl" for action "clean": details at "(:CLSN00008:)" in "/g01/11.2.0/grid/log/vrh3/agent/crsd/oraagent_oracle/oraagent_oracle.log"
CRS-5013: Agent "/g01/11.2.0/grid/bin/oraagent.bin" failed to start process "/s01/orabase/product/11.2.0/dbhome_1/bin/lsnrctl" for action "check": details at "(:CLSN00008:)" in "/g01/11.2.0/grid/log/vrh3/agent/crsd/oraagent_oracle/oraagent_oracle.log"
CRS-2678: 'ora.CRS_LISTENER.lsnr' on 'vrh3' has experienced an unrecoverable failure
CRS-0267: Human intervention required to resume its availability.
PRCC-1015 : LISTENER was already running on vrh3
PRCR-1004 : Resource ora.LISTENER.lsnr is already running

2. If the rootcrs.pl script fails, rerunning it once is usually enough:

[root@vrh3 bin]# /g01/11.2.0/grid/perl/bin/perl -I/g01/11.2.0/grid/perl/lib \
-I/g01/11.2.0/grid/crs/install /g01/11.2.0/grid/crs/install/rootcrs.pl
Using configuration parameter file: /g01/11.2.0/grid/crs/install/crsconfig_params
PRKO-2190 : VIP exists for node vrh3, VIP name vrh3-vip
PRKO-2420 : VIP is already started on node(s): vrh3
Preparing packages for installation...
cvuqdisk-1.0.9-1
Configure Oracle Grid Infrastructure for a Cluster ... succeeded

3. It is recommended to restart CRS on the new node and then use cluvfy to verify that the node addition completed successfully:

[root@vrh3 ~]# crsctl stop crs
[root@vrh3 ~]# crsctl start crs
[root@vrh3 ~]# su - grid
[grid@vrh3 ~]$ cluvfy stage -post nodeadd -n vrh1,vrh2,vrh3
Performing post-checks for node addition 
Checking node reachability...
Node reachability check passed from node "vrh1"
Checking user equivalence...
User equivalence check passed for user "grid"
Checking node connectivity...
Checking hosts config file...
Verification of the hosts config file successful
Check: Node connectivity for interface "eth0"
Node connectivity passed for interface "eth0"
Node connectivity check passed
Checking cluster integrity...
Cluster integrity check passed
Checking CRS integrity...
CRS integrity check passed
Checking shared resources...
Checking CRS home location...
The location "/g01/11.2.0/grid" is not shared but is present/creatable on all nodes
Shared resources check for node addition passed
Checking node connectivity...
Checking hosts config file...
Verification of the hosts config file successful
Check: Node connectivity for interface "eth0"
Node connectivity passed for interface "eth0"
Check: Node connectivity for interface "eth1"
Node connectivity passed for interface "eth1"
Node connectivity check passed
Checking node application existence...
Checking existence of VIP node application (required)
VIP node application check passed
Checking existence of NETWORK node application (required)
NETWORK node application check passed
Checking existence of GSD node application (optional)
GSD node application is offline on nodes "vrh3,vrh2,vrh1"
Checking existence of ONS node application (optional)
ONS node application check passed
Checking Single Client Access Name (SCAN)...
Checking TCP connectivity to SCAN Listeners...
TCP connectivity to SCAN Listeners exists on all cluster nodes
Checking name resolution setup for "vrh.cluster.oracle.com"...
ERROR:
PRVF-4664 : Found inconsistent name resolution entries for SCAN name "vrh.cluster.oracle.com"
ERROR:
PRVF-4657 : Name resolution setup check for "vrh.cluster.oracle.com" (IP address: 192.168.1.190) failed
ERROR:
PRVF-4664 : Found inconsistent name resolution entries for SCAN name "vrh.cluster.oracle.com"
Verification of SCAN VIP and Listener setup failed
User "grid" is not part of "root" group. Check passed
Checking if Clusterware is installed on all nodes...
Check of Clusterware install passed
Checking if CTSS Resource is running on all nodes...
CTSS resource check passed
Querying CTSS for time offset on all nodes...
Query of CTSS for time offset passed
Check CTSS state started...
CTSS is in Active state. Proceeding with check of clock time offsets on all nodes...
Check of clock time offsets passed
Oracle Cluster Time Synchronization Services check passed
Post-check for node addition was successful.

Oracle internal error: an example of ORA-07445 [kcflfi()+466] [INT_DIVIDE_BY_ZERO]

An 11.2.0.1 single-instance database on Windows hit the internal error ORA-07445: core dump [kcflfi()+466] [INT_DIVIDE_BY_ZERO] [] [PC:0x500282E] [] [] during the database open phase. The relevant logs are as follows:

LOG CONTENT
=======================ALERT.LOG============================
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Picked latch-free SCN scheme 2
Using LOG_ARCHIVE_DEST_1 parameter default value as USE_DB_RECOVERY_FILE_DEST
ARCH: Warning; less destinations available than specified
by LOG_ARCHIVE_MIN_SUCCEED_DEST init.ora parameter
Autotune of undo retention is turned on. 
IMODE=BR
ILAT =84
2011-08-01 13:13:47.068000 +08:00
LICENSE_MAX_USERS = 0
SYS auditing is disabled
Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options.
Using parameter settings in server-side spfile C:\APP\PRODUCT\11.2.0\DBHOME_1\DATABASE\SPFILEG11R2.ORA
System parameters with non-default values:
_spin_count              = 2000
processes                = 500
event                    = "10500 trace name context forever,level 8:10013 trace name context forever,level 10:10015 trace name context forever,level 10"
sga_max_size             = 600M
shared_pool_size         = 152M
large_pool_size          = 32M
java_pool_size           = 4M
streams_pool_size        = 0
_db_file_direct_io_count = 12
sga_target               = 0
memory_target            = 0
control_files            = "C:\APP\ORADATA\G11R2\CONTROLFILE\O1_MF_6VWCSH9J_.CTL"
control_files            = "C:\APP\FLASH_RECOVERY_AREA\G11R2\CONTROLFILE\O1_MF_6VWCSHNF_.CTL"
db_block_checksum        = "TRUE"
db_block_size            = 8192
db_cache_size            = 196M
_shared_io_pool_size     = 0
compatible               = "11.2.0.0.0"
log_archive_dest_2       = "service=stdby optional lgwr sync affirm valid_for=(online_logfiles,all_roles)"
log_buffer               = 10485760
db_create_file_dest      = "C:\app\oradata"
db_recovery_file_dest    = "C:\app\flash_recovery_area"
db_recovery_file_dest_size= 500000M
undo_tablespace          = "UNDOTBS1"
_kgl_bucket_count        = 2
remote_login_passwordfile= "EXCLUSIVE"
db_domain                = ""
session_cached_cursors   = 300
audit_file_dest          = "C:\APP\ADMIN\G11R2\ADUMP"
optimizer_features_enable= "10.2.0.4"
audit_trail              = "DB"
cell_offload_plan_display= "ALWAYS"
db_name                  = "G11R2"
open_cursors             = 3000
_optimizer_extended_cursor_sharing_rel= "NONE"
pga_aggregate_target     = 300M
diagnostic_dest          = "C:\APP"
2011-08-01 13:13:48.164000 +08:00
PMON started with pid=2, OS id=984 
VKTM started with pid=3, OS id=3656 at elevated priority
VKTM running at (10)millisec precision with DBRM quantum (100)ms
GEN0 started with pid=4, OS id=5824 
DIAG started with pid=5, OS id=5832 
DBRM started with pid=6, OS id=2784 
PSP0 started with pid=7, OS id=2500 
DIA0 started with pid=8, OS id=5320 
MMAN started with pid=9, OS id=4128 
DBW0 started with pid=10, OS id=5852 
LGWR started with pid=11, OS id=3960 
CKPT started with pid=12, OS id=4472 
SMON started with pid=13, OS id=5788 
RECO started with pid=14, OS id=6036 
MMON started with pid=15, OS id=5740 
MMNL started with pid=16, OS id=2112 
ORACLE_BASE from environment = C:\app
alter database mount exclusive
2011-08-01 13:13:52.390000 +08:00
Sweep [inc][135908]: completed
NSS2 started with pid=19, OS id=2728 
Sweep [inc][135901]: completed
Successful mount of redo thread 1, with mount id 2704081164
Database mounted in Exclusive Mode
2011-08-01 13:13:53.413000 +08:00
Lost write protection disabled
2011-08-01 13:13:54.578000 +08:00
Sweep [inc][135897]: completed
Sweep [inc2][135908]: completed
Sweep [inc2][135901]: completed
Sweep [inc2][135897]: completed
2011-08-01 13:13:55.788000 +08:00
Completed: alter database mount exclusive
alter database open
Beginning crash recovery of 1 threads
parallel recovery started with 3 processes
2011-08-01 13:13:56.959000 +08:00
Started redo scan
Completed redo scan
read 0 KB redo, 0 data blocks need recovery
Started redo application at
Thread 1: logseq 867, block 88140, scn 9122496
Recovery of Online Redo Log: Thread 1 Group 3 Seq 867 Reading mem 0
Mem# 0: C:\APP\ORADATA\G11R2\ONLINELOG\O1_MF_3_6VWCSMPO_.LOG
Mem# 1: C:\APP\FLASH_RECOVERY_AREA\G11R2\ONLINELOG\O1_MF_3_6VWCSNGX_.LOG
Completed redo application of 0.00MB
Completed crash recovery at
Thread 1: logseq 867, block 88140, scn 9142497
0 data blocks read, 0 data blocks written, 0 redo k-bytes read
2011-08-01 13:13:58.738000 +08:00
LGWR: STARTING ARCH PROCESSES
ARC0 started with pid=22, OS id=4784 
2011-08-01 13:13:59.765000 +08:00
ARC0: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC0: STARTING ARCH PROCESSES
ARC1 started with pid=24, OS id=2780 
ARC2 started with pid=25, OS id=1288 
ARC1: Archival started
LGWR: Primary database is in MAXIMUM AVAILABILITY mode
ARC2: Archival started
ARC1: Becoming the 'no FAL' ARCH
ARC1: Becoming the 'no SRL' ARCH
ARC2: Becoming the heartbeat ARCH
LGWR: Destination LOG_ARCHIVE_DEST_1 is not serviced by LGWR
ARC3 started with pid=26, OS id=3876 
2011-08-01 13:14:00.828000 +08:00
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
NSS2 started with pid=19, OS id=5156 
2011-08-01 13:14:29.008000 +08:00
ORA-16198: LGWR received timedout error from KSR
2011-08-01 13:14:35.980000 +08:00
Errors in file c:\app\diag\rdbms\g11r2\g11r2\trace\g11r2_lgwr_3960.trc:
ORA-16198: Timeout incurred on internal channel during remote archival
LGWR: Error 16198 verifying archivelog destination LOG_ARCHIVE_DEST_2
Destination LOG_ARCHIVE_DEST_2 is UNSYNCHRONIZED
LGWR: Continuing...
ARCH: LGWR is scheduled to archive destination LOG_ARCHIVE_DEST_2 after log switch
2011-08-01 13:14:38.629000 +08:00
Trying to expand controlfile section 11 for Oracle Managed Files
Exception [type: INT_DIVIDE_BY_ZERO, ] [] [PC:0x500282E, __VInfreq__kcflfi()+466]
Errors in file c:\app\diag\rdbms\g11r2\g11r2\trace\g11r2_arc0_4784.trc  (incident=136091):
ORA-07445: exception encountered: core dump [kcflfi()+466] [INT_DIVIDE_BY_ZERO] [] [PC:0x500282E] [] []
Incident details in: c:\app\diag\rdbms\g11r2\g11r2\incident\incdir_136091\g11r2_arc0_4784_i136091.trc
2011-08-01 13:14:40.283000 +08:00
Trace dumping is performing id=[cdmp_20110801131440]
2011-08-01 13:14:52.417000 +08:00
Sweep [inc][136091]: completed
Sweep [inc2][136091]: completed
2011-08-01 13:14:59.805000 +08:00
ARC2: Detected ARCH process failure
ARC2: STARTING ARCH PROCESSES
ARC0 started with pid=19, OS id=5016 
2011-08-01 13:15:00.836000 +08:00
ARC0: Archival started
ARC2: STARTING ARCH PROCESSES COMPLETE
2011-08-01 13:15:36.689000 +08:00
Deleted Oracle managed file C:\APP\FLASH_RECOVERY_AREA\G11R2\ARCHIVELOG\2011_08_01\O1_MF_1_866_73DFKWRK_.ARC
2011-08-01 13:15:38.013000 +08:00
Error 12154 received logging on to the standby
Errors in file c:\app\diag\rdbms\g11r2\g11r2\trace\g11r2_ora_4852.trc:
ORA-12154: TNS:could not resolve the connect identifier specified
ARCH: Error 12154 Creating archive log file to 'stdby'
Trying to expand controlfile section 11 for Oracle Managed Files
Exception [type: INT_DIVIDE_BY_ZERO, ] [] [PC:0x500282E, __VInfreq__kcflfi()+466]
Errors in file c:\app\diag\rdbms\g11r2\g11r2\trace\g11r2_ora_4852.trc  (incident=136051):
ORA-07445: exception encountered: core dump [kcflfi()+466] [INT_DIVIDE_BY_ZERO] [] [PC:0x500282E] [] []
Incident details in: c:\app\diag\rdbms\g11r2\g11r2\incident\incdir_136051\g11r2_ora_4852_i136051.trc
2011-08-01 13:15:39.680000 +08:00
Trace dumping is performing id=[cdmp_20110801131539]
2011-08-01 13:15:42.782000 +08:00
PMON (ospid: 984): terminating the instance due to error 397
2011-08-01 13:15:50.520000 +08:00
Instance terminated by PMON, pid = 984
=============================g11r2_ora_4852_i136051.trc=============================
Dump file c:\app\diag\rdbms\g11r2\g11r2\incident\incdir_136051\g11r2_ora_4852_i136051.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Windows NT Version V6.1 Service Pack 1 
CPU                 : 4 - type 586, 2 Physical Cores
Process Affinity    : 0x0x00000000
Memory (Avail/Total): Ph:2122M/3566M, Ph+PgF:5413M/7130M, VA:1084M/2047M 
Instance name: g11r2
Redo thread mounted by this instance: 1
Oracle process number: 17
Windows thread id: 4852, image: ORACLE.EXE (SHAD)
*** 2011-08-01 13:15:38.527
*** SESSION ID:(197.1) 2011-08-01 13:15:38.527
*** CLIENT ID:() 2011-08-01 13:15:38.527
*** SERVICE NAME:() 2011-08-01 13:15:38.527
*** MODULE NAME:(oradim.exe) 2011-08-01 13:15:38.527
*** ACTION NAME:() 2011-08-01 13:15:38.527
Dump continued from file: c:\app\diag\rdbms\g11r2\g11r2\trace\g11r2_ora_4852.trc
ORA-07445: exception encountered: core dump [kcflfi()+466] [INT_DIVIDE_BY_ZERO] [] [PC:0x500282E] [] []
========= Dump for incident 136051 (ORA 7445 [kcflfi()+466]) ========
----- Beginning of Customized Incident Dump(s) -----
Exception [type: INT_DIVIDE_BY_ZERO, ] [] [PC:0x500282E, __VInfreq__kcflfi()+466]
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
Process Id: 0x000010bc  Thread Id : 0x000012f4    Time : Mon Aug 01 13:15:38 
Excp. Code: 0xc0000094  Excp. Type: INT_DIVIDE    Flags: 0x00000000
------------------- Registers ----------------------------
eip = 0500282e esp = 0d9f525c ebp = 0d9f577c edi = 37eefe00 esi = 00000265
eax = 00000265 ebx = 00000000 ecx = 089ee234 edx = 00000000
ecs = 0000001b eds = 00000023 ees = 00000023 ess = 00000023
egs = 00000000 efs = 0000003b
eflags = 00010246
------------------- End of Registers ---------------------
*** 2011-08-01 13:15:38.536
dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x3, level=3, mask=0x0)
----- Current SQL Statement for this session (sql_id=a01hp0psv0rrh) -----
alter database open
----------- messages from pre-loading .sym files:
Symbol file C:\app\product\11.2.0\dbhome_1\RDBMS\ADMIN\oracommon11.SYM does not match binary.
Symbol TimeStamp=4bb5eaac, Module TimeStamp=0 are different
Symbol file C:\app\product\11.2.0\dbhome_1\RDBMS\ADMIN\oraclsra11.SYM does not match binary.
Symbol TimeStamp=4bb4cf99, Module TimeStamp=0 are different
----------- end of messages from pre-loading .sym files
----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
Symbol file C:\app\product\11.2.0\dbhome_1\BIN\oracommon11.SYM does not match binary.
Symbol TimeStamp=4bb5eaac, Module TimeStamp=0 are different
Symbol file C:\app\product\11.2.0\dbhome_1\BIN\oraclsra11.SYM does not match binary.
Symbol TimeStamp=4bb4cf99, Module TimeStamp=0 are different
EnumerateLoadedModules64 failed with error -1073741819
Symbol file oraclsra11.SYM does not match binary.
Symbol TimeStamp=4bb4cf99, Module TimeStamp=0 are different
Symbol file oracommon11.SYM does not match binary.
Symbol TimeStamp=4bb5eaac, Module TimeStamp=0 are different
__VInfreq__kcflfi()           00000000             
+466                                               
_kccrszf()+287       CALLrel  _kcflfi()            0 318345B8 34 31C0DD40 4000
265 4 7FFFFFFF 1 0 0
_kccrsd_expd()+1418  CALLrel  _kccrszf()           D9F7CEC 268 264
_kccwnc_reuse_expan  CALLrel  _kccrsd_expd()       D9F7CEC B 38
d()+640                                            
__VInfreq__kccwnc()  CALLrel  _kccwnc_reuse_expan  D9F7CEC B 26
+235                          d()                  
_krse_arc_complete(  CALLrel  _kccwnc()            D9F7CEC D9F6D38 B
)+1615                                             
_krse_arc_driver_co  CALLrel  _krse_arc_complete(  D9F78AC
re()+1307                     )                    
_krse_arc_driver()+  CALLrel  _krse_arc_driver_co  D9F7CEC 1 D9F7C6C 0 0 D9F7CC8
274                           re()                 0 0 0 0 0 0 0
_krsq_arch_to_force  CALLrel  _krse_arc_driver()   D9F7CEC 1 D9F7C6C 0 0 D9F7CC8
_switch()+196                                      0 0 0 0 0 0 0
__VInfreq__kcttsc()  CALLrel  _krsq_arch_to_force  D9F7CEC 1
+129                          _switch()            
_kcfopd()+1504       CALLrel  _kcttsc()            2
_adbdrv()+16700      CALLrel  _kcfopd()            0 0 0 0 D9FBBF8
_opiexe()+13594      CALLrel  _adbdrv()            4A C0000094 33644518 D9FBD38
6D60697 2F3FC5F0
_opiosq0()+6248      CALLrel  _opiexe()            4 0 D9FC704
_kpooprx()+277       CALLrel  _opiosq0()           3 E D9FC970 A4 0
_kpoal8()+632        CALLrel  _kpooprx()           D9FF074 D9FD3F8 13 1 0 A4
_opiodr()+1248       CALLreg  00000000             5E 1C D9FF070
___dyn_tls_init_cal  CALLreg  00000000             5E 1C D9FF070 1
lback()+2935122                                    
_opitsk()+1404       CALL???  00000000             C9A10E8 5E D9FF070 0 D9FED00
D9FF19C 53E52E 0 D9FF1C8
_opiino()+980        CALLrel  _opitsk()            0 0
_opiodr()+1248       CALLreg  00000000             3C 4 D9FFBC4
_opidrv()+1201       CALLrel  _opiodr()            3C 4 D9FFBC4 0
_sou2o()+55          CALLrel  _opidrv()            3C 4 D9FFBC4
_opimai_real()+124   CALLrel  _sou2o()             D9FFBD4 3C 4 D9FFBC4
_opimai()+125        CALLrel  _opimai_real()       2 D9FFBFC
_OracleThreadStart@  CALLrel  _opimai()            2 D9FFF3C 0 70 FFFFFFFF
4()+830                                            FFFFFFFF
___dyn_tls_init_cal  CALLptr  00000000             901FF6C D9FFFD4 776437F5
lback()+366382316                                  901FF6C 765D34CB 0
___dyn_tls_init_cal  CALLreg  00000000             901FF6C 765D34CB 0 0 901FF6C
lback()+367384440                                  0
___dyn_tls_init_cal  CALLrel  ___dyn_tls_init_cal  401326 901FF6C 0 0 0 0
lback()+367384392             lback()+367384403    
00000000             CALL???  00000000             
--------------------- Binary Stack Dump ---------------------
..................

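The logs above show that ORA-07445 was raised while the control file was being expanded ("Trying to expand controlfile section 11 for Oracle Managed Files"), in the call chain
_kccwnc_reuse_expand() -> _kccrsd_expd() -> _kccrszf() -> _kcflfi() -> __VInfreq__kcflfi()
The kcf layer "manages and coordinates operations on the control file(s)" (kcf.c). A code bug in this path raised INT_DIVIDE_BY_ZERO (an integer division by zero) while log files were being processed.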

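Searching MOS for the keywords 7445 and kcflfi turned up little. This suggests that the bug is very rarely triggered, and that some unusual parameter setting on this instance was likely the cause.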

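The prime suspect was the non-default hidden parameter "_db_file_direct_io_count" in the startup log. This parameter determines the I/O size of direct-path reads and writes. Starting with 9i, its unit is bytes rather than blocks. I had earlier run some tests with this parameter and left it at a small value (12, as shown in the parameter list above).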

Parameter: DB_FILE_DIRECT_IO_COUNT
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Versions:	8.0 - 8.1
This parameter is hidden in 9.0 onwards.
Parameter type:        integer
Parameter class:       dynamic, scope = ALTER SYSTEM DEFERRED
Default value:         64
Range of values:       operating system-dependent
Description:
~~~~~~~~~~~~
DB_FILE_DIRECT_IO_COUNT is used to specify the number of blocks to be used
for IO operations done by backup, restore or direct path read and write
functions. The IO buffer size is a product of DB_FILE_DIRECT_IO_COUNT and
DB_BLOCK_SIZE. The IO buffer size cannot exceed max_IO_size for your
platform.
Assigning a high value to this parameter results in greater use of PGA or
SGA memory.
o In Oracle8i, minimize the number of I/O requests by setting the
DB_FILE_DIRECT_IO_COUNT instance parameter so that
DB_BLOCK_SIZE x DB_FILE_DIRECT_IO_COUNT = max_io_size of system
In Oracle8i the default for this is 64 blocks.
(In Oracle9i, it is replaced by _DB_FILE_DIRECT_IO_COUNT which governs
the size of direct I/Os in BYTES (not blocks). The default is 1Mb but
will be sized down if the max_io_size of the system is smaller.)
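The sketch below makes the bytes-versus-blocks change concrete. It applies the 9i+ semantics quoted above: the value is a byte count, capped at the platform's max_io_size. It then converts the result into a whole number of db_block_size blocks. This is an illustration only, not Oracle's code. The conversion to blocks, the function name direct_io_blocks and the max_io_size argument are all assumptions. With the value 12 from this instance's spfile and an 8 KB block, the integer result is 0. This is one plausible way a later division could hit INT_DIVIDE_BY_ZERO, but the trace itself does not confirm it.

```shell
#!/bin/sh
# Illustration only, NOT Oracle's implementation.
# Assumes (per the note above) that since 9i _db_file_direct_io_count is a byte
# count that is sized down to the platform's max_io_size when it is larger.
# $1 = _db_file_direct_io_count (bytes), $2 = db_block_size (bytes),
# $3 = max_io_size (bytes, hypothetical input)
direct_io_blocks() {
  bytes=$1
  if [ "$bytes" -gt "$3" ]; then
    bytes=$3   # cap at the platform maximum
  fi
  # Shell arithmetic truncates: a byte count smaller than one block gives 0 blocks.
  echo $((bytes / $2))
}
```

With the defaults, `direct_io_blocks 1048576 8192 1048576` prints 128. With this instance's setting, `direct_io_blocks 12 8192 1048576` prints 0.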
ORA-19863 during RMAN duplicate
Applies to:
Oracle Server - Enterprise Edition - Version: 10.2.0.3
This problem can occur on any platform.
Symptoms
-- Problem Statement:
Duplicate failed during the datafile restore stage:
Starting restore at 2008-Apr-09 09:28:24
using channel ORA_AUX_DISK_1
channel ORA_AUX_DISK_1: starting datafile backupset restore
channel ORA_AUX_DISK_1: specifying datafile(s) to restore from backup set
restoring datafile 00003 to /u06/oradata/hcmprdc/sysaux01.dbf
...
restoring datafile 00121 to /u06/oradata/hcmprdc/waapp.dbf
channel ORA_AUX_DISK_1: reading from backup piece
/u04/oradata/flash_recovery_area/HCMPRD/mdjd8s5v_1_1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of Duplicate Db command at 04/09/2008 09:28:28
RMAN-03015: error occurred in stored script Memory Script
ORA-19870: error reading backup piece /u04/oradata/flash_recovery_area/HCMPRD/mdjd8s5v_1_1
ORA-19863: device block size 1040384 is larger than max allowed: 262144
Cause
The database parameter _db_file_direct_io_count in the target and auxiliary instance does not match.
Solution
-- To implement the solution:
Ensure that the parameter _db_file_direct_io_count is the same on the target and auxiliary databases.
_DB_FILE_DIRECT_IO_COUNT needs to be set to the same value in the source database
where the backup was taken and in the target database where the backup is being restored.
2.0 Size of Input/Output Buffers
================================
a. input buffers
----------------
NOTE: DB_FILE_DIRECT_IO_COUNT is not available from Oracle9i onwards.
In Oracle9i, it is replaced by the hidden parameter _DB_FILE_DIRECT_IO_COUNT, which
governs the size of direct I/Os in BYTES (not blocks). The default is
1 MB, but it will be sized down if the max_io_size of the system is smaller.
The input buffer size is:
buffersize = db_block_size * db_file_direct_io_count
As there are 4 input buffers, the total input buffer memory use per channel is:
memory(input) = #buffers * #files * buffersize
= 4 * #files * buffersize
For example, if 2 channels are used, and each of these channels backs up 3 
files, then for each channel
memory(input) = 4 * 3 * db_block_size * db_file_direct_io_count
b. output buffers
-----------------
For disk channels, the output buffer size is:
buffersize = db_block_size * db_file_direct_io_count
For SBT_TAPE channels, the output buffer size in Oracle8/8i is OS-dependent (on Solaris,
it defaults to 64 KB). On 9i/10g it defaults to 256 KB on all platforms. The BLKSIZE argument to 'allocate channel...' can be
used to override the default value.
As there are 4 output buffers,
memory(output) = #buffers * buffersize
= 4 * buffersize
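The formulas above can be checked with a small calculator. This sketch relies on the note's own assumption of 4 input and 4 output buffers per channel, and the function names are hypothetical. The buffer size is passed in directly because its definition changed: before Oracle9i it was db_block_size * db_file_direct_io_count, while from 9i onward _db_file_direct_io_count is already a size in bytes.

```python
NUM_BUFFERS = 4  # the note states 4 input and 4 output buffers per channel


def input_buffer_bytes(files_per_channel: int, buffer_size: int) -> int:
    """memory(input) = #buffers * #files * buffersize, for one channel."""
    return NUM_BUFFERS * files_per_channel * buffer_size


def output_buffer_bytes(buffer_size: int) -> int:
    """memory(output) = #buffers * buffersize, for one channel."""
    return NUM_BUFFERS * buffer_size


def legacy_buffer_size(db_block_size: int, db_file_direct_io_count: int) -> int:
    """Pre-9i buffer size: the count parameter was expressed in blocks."""
    return db_block_size * db_file_direct_io_count


if __name__ == "__main__":
    direct_io = 1048576  # 1 MB, the 11gR2 default on Windows
    # One channel backing up 3 files, with 256 KB SBT output buffers (9i/10g default).
    per_channel = input_buffer_bytes(3, direct_io) + output_buffer_bytes(256 * 1024)
    print(per_channel, 2 * per_channel)  # prints: 13631488 27262976
```

With 2 channels, 3 files each, 1 MB input buffers and 256 KB SBT output buffers, this gives 13 MB per channel and 26 MB in total.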

In general, the default value of this hidden parameter is all you need. Resetting the parameter fixed the startup problem:

SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
PL/SQL Release 11.2.0.1.0 - Production
CORE    11.2.0.1.0      Production
TNS for 32-bit Windows: Version 11.2.0.1.0 - Production
NLSRTL Version 11.2.0.1.0 - Production
SQL> alter system reset "_db_file_direct_io_count" scope=spfile;
System altered.
SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ
FROM SYS.x$ksppi x, SYS.x$ksppcv y
WHERE x.inst_id = USERENV ('Instance')
AND y.inst_id = USERENV ('Instance')
AND x.indx = y.indx
AND x.ksppinm LIKE '%db_file_direct_io_count%'
/
NAME                           VALUE                DESCRIB
------------------------------ -------------------- ------------------------------
_db_file_direct_io_count       1048576              Sequential I/O buf size
On Windows, 11gR2 sets this parameter to 1 MB (1048576 bytes) by default.

Oracle RAC internal error: a case of ORA-00600 [kjbmprlst:shadow]

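On a four-node 11.2.0.1 RAC system on Linux x86-64, the LMS (GCS service) process hit the internal error ORA-00600 [kjbmprlst:shadow]. The error caused the instance on that node to terminate unexpectedly. The relevant log entries are as follows: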

Fri Jul 08 02:04:43 2011
Errors in file /u01/app/oracle/diag/rdbms/PROD/PROD1/trace/PROD1_lms1_536.trc  (incident=1011732):
ORA-00600: internal error code, arguments: [kjbmprlst:shadow], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/PROD/PROD1/incident/incdir_1011732/PROD1_lms1_536_i1011732.trc
Fri Jul 08 02:04:44 2011
Trace dumping is performing id=[cdmp_20110708020444]
Errors in file /u01/app/oracle/diag/rdbms/PROD/PROD1/trace/PROD1_lms1_536.trc:
ORA-00600: internal error code, arguments: [kjbmprlst:shadow], [], [], [], [], [], [], [], [], [], [], []
Errors in file /u01/app/oracle/diag/rdbms/PROD/PROD1/trace/PROD1_lms1_536.trc:
ORA-00600: internal error code, arguments: [kjbmprlst:shadow], [], [], [], [], [], [], [], [], [], [], []
LMS1 (ospid: 536): terminating the instance due to error 484
Fri Jul 08 02:04:45 2011
opiodr aborting process unknown ospid (27387) as a result of ORA-1092
System state dump is made for local instance
System State dumped to trace file /u01/app/oracle/diag/rdbms/PROD/PROD1/trace/PROD1_diag_513.trc
Fri Jul 08 02:04:54 2011
Termination issued to instance processes. Waiting for the processes to exit
Fri Jul 08 02:04:58 2011
ORA-1092 : opitsk aborting process

This ORA-00600 [kjbmprlst:shadow] error was traced to Bug 10121589 or Bug 9458781 on 11.2.0.1:

Bug 10121589  ORA-600 [kjbmprlst:shadow] can occur in RAC
Affects:
Product (Component)	Oracle Server (Rdbms)
Range of versions believed to be affected 	Versions BELOW 12.1
Versions confirmed as being affected 	
11.2.0.1 
Platforms affected	Generic (all / most platforms affected)
Fixed:
This issue is fixed in	
12.1 (Future Release)
11.2.0.2 Bundle Patch 2 for Exadata Database
11.2.0.1 Bundle Patch 7 for Exadata Database 
Symptoms:
Related To:
Internal Error May Occur (ORA-600)
ORA-600 [kjbmprlst:shadow] 
RAC (Real Application Clusters) / OPS 
Description
An ORA-600 [kjbmprlst:shadow] can occur if the fix for bug 9979039
is present.
Note:
One off patches for 10200390 should also include this fix.
Bug 9458781  Missing close message to master leaves closed lock dangling crashing the instance with assorted Internal error
Affects:
Product (Component)	Oracle Server (Rdbms)
Range of versions believed to be affected 	Versions >= 11.2.0.1 but BELOW 11.2.0.2
Versions confirmed as being affected 	
11.2.0.1 
Platforms affected	Generic (all / most platforms affected)
Fixed:
This issue is fixed in	
11.2.0.2 (Server Patch Set)
11.2.0.1 Bundle Patch 4 for Exadata Database 
Symptoms:
Related To:
Instance May Crash
Internal Error May Occur (ORA-600)
ORA-600 [KJBMPRLST:SHADOW]
ORA-600 [KJBMOCVT:RID]
ORA-600 [KJBRREF:PKEY]
ORA-600 [KJBRASR:PKEY] 
RAC (Real Application Clusters) / OPS 
Description
A lock is closed without sending a message to the master.
This causes closed lock dangling at the master crashing the instance with different internal errors.
Reported internal errors so far are :
- KJBMPRLST:SHADOW
- KJBMOCVT:RID
- KJBRREF:PKEY
- KJBRASR:PKEY

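The kjbmprlst:shadow internal function manages kjbm shadow lock information (/libserver10.a/kjbm.o). The code has a defect in which a closed lock is not reported to the master node in time. At present there is no verified workaround other than installing the patch (disabling DRM appears to be ineffective):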

oradebug lkdebug (track resources, take dumps)
KCL history
KJBL history
KJL history
PCM (GCS) and non-PCM (GES) resources are kept separate and use separate code paths.
GES:
Resource table: kjr and kjrt
Lock table: kjlt
Processes: kjpt
GCS:
Resource table: kjbr
Lock table: kjbl
DLM Structures (continued)
/* PCM resource structure */
typedef struct kjbr {                                /* 68 bytes on sun4u */
kjsolk       hash_q_kjbr;                             /* hash list : hp */
ub4          resname_kjbr[2];	                     /* the resource name */
kjsolk       scan_q_kjbr; /* chain to lmd scan q of grantable resources */
kjsolk       grant_q_kjbr;                 /* list of granted resources */
kjsolk       convert_q_kjbr;       /* list of resources being converted */
ub4          diskscn_bas_kjbr;         /* scn(base) known to be on disk */
ub2          diskscn_wrap_kjbr;        /* scn(wrap) known to be on disk */
ub2          writereqscn_wrap_kjbr;    /* scn(wrap) requested for write */
ub4          writereqscn_bas_kjbr;     /* scn(base) requested for write */
struct kjbl *sender_kjbr;                 /* lock elected to send block */
ub2          senderver_kjbr;                  /* version# of above lock */
ub2          writerver_kjbr;                  /* version# of lock below */
struct kjbl *writer_kjbr;                /* lock elected to write block */
ub1          mode_role_kjbr; /* one of 'n', 's', 'x' && one of 'l' or 'g' */
ub1          flags_kjbr;                        /* ignorewip, free etc. */
ub1          rfpcount_kjbr;                      /* refuse ping counter */
ub1          history_kjbr;                /* resource operation history */
kxid         xid_kjbr;                          /* split transaction ID */
} kjbr ;
/* kjbl - PCM lock structure
** Clients and most of the DLM will use the KJUSER* or KJ_* modes and kscns  */
typedef struct kjbl {                                /* 52 bytes on sun4u */
union {                     /* discriminate lock@master and lock@client */
struct {                                           /* for lock@master */
kgglk        state_q_kjbl;             /* link to chain to resource */
kjbopqi     *rqinfo_kjbl;                             /* target bid */
struct kjbr *resp_kjbl;                   /* pointer to my resource */
} kjbllam;                                 /* KJB Lock Lock At Master */
struct {                                           /* for lock@client */
ub4         disk_base_kjbl;        /* disk version(base) for replay */
ub2         disk_wrap_kjbl;        /* disk version(wrap) for replay */
ub1         master_node_kjbl;                   /* master instance# */
ub1         client_flag_kjbl;     /* flags specific to client locks */
ub2         update_seq_kjbl;               /* last update to master */
} kjbllac;                                 /* KJB Lock Lock At Client */
} kjblmcd;                        /* KJB Lock Master Client Discrimnant */
void  *remote_lockp_kjbl;           /* pointer to client lock or shadow */
ub2    remote_ver_kjbl;                         /* remote lock version# */
ub2        ver_kjbl;                                     /* my version# */
ub2        msg_seq_kjbl;                         /* client->master seq# */
ub2        reqid_kjbl;                         /* requestid for convert */
ub2        creqid_kjbl; /* requestid for convert that has been cancelled */
ub2        pi_wrap_kjbl;                     /* scn(wrap) of highest pi */
ub4        pi_base_kjbl;                     /* scn(base) of highest pi */
ub1        mode_role_kjbl; /* one of 'n', 's', 'x' && one of 'l' or 'g' */
ub1        state_kjbl;       /* _L|_R|_W|_S, notify, which q, lock type */
ub1        node_kjbl;                       /* instance lock belongs to */
ub1        flags_kjbl;                                /* lock flag bits */
ub2        rreqid_kjbl;                               /* save the reqid */
ub2         write_wrap_kjbl;        /* last write request version(wrap) */
ub4         write_base_kjbl;        /* last write request version(base) */
ub4         history_kjbl;                     /* lock operation history */
} kjbl;
PCM DLM locks that are owned by the local instance are allocated and embedded in an LE structure.
PCM DLM locks that are owned by remote instances and mastered by the local instance are allocated in SHARED_POOL.
PCM Locks and Resources
Fields of interest in the kclle structure: kcllerls or releasing; kcllelnm or name(id1,id2);
kcllemode or held-mode; kclleacq or acquiring; kcllelck or DLM lock.
Fields of interest in the kjbr structure: resname_kjbr[2] or resource name; grant_q_kjbr or grant queue;
convert_q_kjbr or convert queue; mode_role_kjbr, which is a bitwise merge of the grant mode and the role, interpreted as
NULL (0x00), S (0x01), X (0x02); L0 Local (0x00), G0 Global without PI (0x08), G1 Global with PI (0x18).
The field mode_role_kjbl in kjbl is a bitwise merge of grant, request, and lock mode: 0x00 if grant NULL;
0x01 if grant S; 0x02 if grant X; 0x04 lock has been opened at master; 0x08 if global role (otherwise local);
0x10 has one or more PI; 0x20 if request CR; 0x40 if request S; 0x80 if request X.
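A small decoder makes the bit layouts above concrete. It is a sketch that assumes exactly the bit values listed in the text:

- the grant mode is in the low two bits;
- role bits 0x08/0x18 apply to kjbr;
- flag bits 0x04 to 0x80 apply to kjbl.

The function names are hypothetical, and this is not an Oracle tool.

```python
GRANT_MODES = {0x00: "NULL", 0x01: "S", 0x02: "X"}
KJBR_ROLES = {0x00: "L0 local", 0x08: "G0 global without PI", 0x18: "G1 global with PI"}


def decode_mode_role_kjbr(value: int) -> tuple:
    """Split mode_role_kjbr into (grant mode, role)."""
    grant = GRANT_MODES.get(value & 0x03, "unknown")
    role = KJBR_ROLES.get(value & 0x18, "unknown")
    return grant, role


def decode_mode_role_kjbl(value: int) -> dict:
    """Decode mode_role_kjbl (a ub1) using the bit meanings listed above."""
    if not 0 <= value <= 0xFF:
        raise ValueError("mode_role_kjbl is a single byte (0..255)")
    return {
        "grant": GRANT_MODES.get(value & 0x03, "unknown"),
        "opened_at_master": bool(value & 0x04),
        "role": "global" if value & 0x08 else "local",
        "has_pi": bool(value & 0x10),
        # Request bits: 0x20 CR, 0x40 S, 0x80 X.
        "requests": [m for bit, m in ((0x20, "CR"), (0x40, "S"), (0x80, "X")) if value & bit],
    }


if __name__ == "__main__":
    print(decode_mode_role_kjbr(0x1A))  # ('X', 'G1 global with PI')
    print(decode_mode_role_kjbl(0x49))  # grant S, global role, S requested
```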
Someone has to keep a list of all buffers and where they are mastered
This is called Global Resource Directory (GRD)
GRD is present on all the instances of the cluster
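To find out the master node of the cached blocks of a segment (replace &data_object_id with the segment's data object id):
select  b.dbablk, r.kjblmaster master_node
from x$le l, x$kjbl r, x$bh b
where b.obj = &data_object_id
and b.le_addr = l.le_addr
and l.le_kjbl = r.kjbllockp
/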

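Oracle Support states that the bug is fixed by 11.2.0.2 (Server Patch Set) or by 11.2.0.1 Bundle Patch 4 for Exadata Database. However, there are signs that ORA-00600 [kjbmprlst:shadow] can still occur on 11.2.0.2. The bug also occurs more often on RAC systems with more than 2 nodes.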

 

[Repost] How to Report Bugs Effectively

Author: Simon Tatham, professional free-software programmer

Chinese translation: Dasn

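Anyone who has written software for public use has probably received some very poor bug reports (a bug is an error in a program's code or a flaw in its behavior at run time; translator's note), for example:

Reports that just say "it doesn't work";

Reports that make no sense;

Reports in which the user gives too little information;

Reports that contain wrong information;

Reports of problems that turn out to be the user's own fault;

Reports of problems caused by errors in other programs;

Reports of problems caused by network failures;

This is why "technical support" is considered such a dreadful job: there are poor bug reports to deal with. Yet not every bug report is unpleasant. I maintain free software in my spare time, and sometimes I receive bug reports that are wonderfully clear, helpful and informative.

Here I will try to explain how to write a good bug report. Ideally I would like everyone to read this short essay before reporting bugs to anyone, and in particular I would like users to have read it before reporting bugs to me.

Put simply, the aim of a bug report is to let the programmer see the program failing. You can demonstrate it in person, or give detailed steps that make the program fail. If the program fails for them, the programmer will gather extra information until they find the cause. If it does not, they will ask you to keep watching the problem and gather more information.

In a bug report, try to make clear which parts are facts (for example, "I was at the computer" and "XX happened") and which are speculation (for example, "I think the problem may be caused by ..."). You can leave out the speculation if you like, but never leave out the facts.

When you report a bug (and you are doing so), you surely want it fixed promptly. So any aggressive or profane language (let alone abuse) aimed at the programmer does nothing to help. The mistake may be the programmer's, or it may be yours. You may have a right to be angry with them, but if you provide more useful information (rather than angry words) the bug may well be fixed sooner. Besides, remember that if the software is free, the author provided it out of kindness, so if too many people are rude to them they may "withdraw" that kindness.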

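"The program doesn't work"

Programmers are not idiots: if a program did not work at all, they could hardly fail to notice. If they don't know about the problem, it must be because the program appears to work fine for them. So either you have done something different from them, or your environment differs from theirs. They need information, and providing information is the whole point of a bug report. More information is always better.

Many programs, free software in particular, publish a "known bugs list". If the bug you found is already on the list, there is no need to report it again. However, if you think you have more information than the list contains, contact the programmers anyway. The information you provide may make the bug easier for them to fix.

Everything in this article is a guideline; none of it is a rule that must be followed. Different programmers prefer different forms of bug report. If the program comes with its own guidelines for reporting bugs, be sure to read them. If they contradict the guidelines in this article, follow theirs.

If you are not reporting a bug but asking for help, say where you have already looked for the answer (for example: "I read chapter 4 and section 2 of chapter 5, but I could not find a solution."). This tells the programmer where users like to look for answers, so that they can make the documentation easier to use.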

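"Show me"

One of the best ways to report a bug is to "show" it to the programmer. Have the programmer stand in front of the computer, run their program, and point out what goes wrong. Let them watch you start the computer, run the program, carry out your actions, and see how the program responds to your input.

They know their software inside out. They know which parts are unlikely to go wrong and which are most likely to. They know instinctively what to watch for. Before the program actually fails, they may already have noticed that something is off, and all of this gives them clues. They will watch every detail of the test and pick out the information they find useful.

That may not be enough. They may decide they need more information and ask you to repeat what you just did. They may need to talk to you during this, so that they can make the bug reappear when they want it to. They may vary the actions to see whether the fault is an isolated problem or one of a family of related problems. If you are unlucky, they may need to sit down with a pile of development tools and spend hours studying it properly. But the most important thing is for the programmer to be at the computer when the program fails. Once they have seen the problem, they can usually find the cause and start trying to fix it.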

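"Tell me how to do it"

This is the age of the Internet and of information exchange. With one mouse click I can send my program to a friend in Russia, and he can send me suggestions just as easily. But if something goes wrong with my program, I cannot be there beside him. "Showing" is an excellent approach, but it is often impossible.

If you have to report a bug and the programmer is not with you, you must find a way to make the bug happen again in front of them. Once they see the error with their own eyes, they can deal with it.

Tell the programmer exactly what you did. For a graphical program, tell them which buttons you pressed and in what order. For a command-line program, tell them exactly which commands you typed. Give as much detail as possible about the commands you entered and how the program responded.

Tell the programmer about every input you can think of. If the program reads a file, you may need to send them a copy of the file. If the program talks to another computer over a network, you probably can't send them a copy of that computer. You can at least say what kind of computer it is and (if possible) what software is installed on it.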

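"What's wrong? It looks fine to me!"

Suppose you give the programmer a long list of inputs and instructions, and they run them without any error occurring. In that case you have not given them enough information. Perhaps the fault does not show up on every computer, and your system differs from theirs in some way. Sometimes the program may simply behave differently from what you expected. That may be a misunderstanding: you think the program is wrong, while the programmer thinks it is correct.

Also describe what happened. Describe exactly what you saw. Tell them why you think what you saw is wrong, and ideally tell them what you think you should have seen. If you only say "the program went wrong", you have probably left out very important information.

If you saw error messages, report them to the programmer carefully and exactly; this really matters. In that case the programmer only has to fix the error, not find it. They need to know what went wrong, and the error messages the system printed tell them exactly that. If you have no better way to remember the messages, write them down. Reporting only "the program gave an error" is useless unless you include the error messages themselves.

In particular, if there are error message numbers, be sure to give those numbers to the programmer. Do not assume that because they mean nothing to you they mean nothing at all. Error numbers carry all kinds of information that programmers can read, and they very likely contain important clues. Error messages are numbered because describing computer errors in words is often confusing, and a number is the best way of telling you where the error is.

In this situation, debugging becomes detective work for the programmer. They do not know what happened and cannot go and watch it, so they are always searching for valuable clues. Error messages, error numbers and even unexplained delays are important clues, as important as fingerprints at a crime scene. Keep them.

If you use a UNIX system, the program may produce a core dump. Core dumps are a particularly useful source of clues, so don't throw them away. On the other hand, most programmers don't like receiving email with huge core files attached, so ask before sending one. Also note that a core file records the complete state of the program. Any secrets the program was handling at the time (private or confidential data) may therefore be contained in the core file.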

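"After the problem happened, I did ..."

When an error or bug occurs, you might do all sorts of things, but most people make matters worse. A friend of mine accidentally deleted all her Word files at school. Before asking anyone for help, she reinstalled Word and then ran a defragmenter. Neither did anything to help recover the files, because both operations scrambled the blocks on the disk. I doubt any undelete software in the world could recover her files now. Had she done nothing, there might have been some hope.

Users like this are like a cornered weasel (an animal in the weasel and sable family; translator's note): back to the wall, seeing death approach, it fights back in a frenzy. Such users believe that doing something is better than doing nothing. But that does not apply to software problems.

Don't be a weasel; be an antelope. When an antelope is confronted with something unexpected or is frightened, it freezes so as not to attract attention, while it thinks about the best way out (if antelopes had a technical support line, it would be engaged at that moment). Then, once it has found the safest course of action, it takes it.

When the program goes wrong, immediately stop whatever you are doing. Don't press any keys. Look carefully at the screen, note anything unusual, and remember it or write it down. Then carefully click "OK" or "Cancel", whichever is safest. Train yourself into a reflex: as soon as the computer goes wrong, don't touch anything. Closing the affected program or rebooting the computer are both poor ways to get rid of the problem. A good way to solve it is to make it happen again. Programmers love problems they can reproduce, and happy programmers fix bugs faster and more efficiently.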

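"I think the particle transition has to do with the wrong polarization"

It is not only non-expert users who write poor bug reports. I have seen some very bad bug reports from programmers, some of them very good programmers.

I once worked with another programmer who kept hunting for bugs in his code. He would often run into a bug he couldn't solve and call me over to help. "What's wrong?" I would ask. His answer was always some opinion about the bug. If his opinion was right, that was a good thing: it meant he had done half the work and we could finish the other half together. That is efficient and useful.

But in fact he was often wrong. We would then spend half an hour searching through perfectly good code while the problem was actually somewhere else. I'm sure he wouldn't do this with a doctor: "Doctor, I've got Hydroyoyodyne (a truly strange disease; translator's note), give me a prescription." People know not to say that to a doctor. You describe the symptoms: where you feel unwell, where it hurts, rashes, fever... Then you let the doctor diagnose what you have and decide how to treat it. Otherwise the doctor will dismiss you as a hypochondriac or a crank, which seems fair enough.

Programming is the same. Even if your own "diagnosis" is sometimes genuinely helpful, always state the "symptoms". The "diagnosis" is optional, but the "symptoms" are mandatory. Likewise, attaching a source change that fixes the bug is useful, but it is no substitute for the bug report itself.

If the programmer asks you for extra information, don't skimp. Someone once reported a bug to me, and I asked him to try a command. I knew the command would not work, but I wanted to see what error the program returned (that was an important clue). He never tried it. Instead he replied "that certainly won't work", and I had to spend quite a while persuading him to try it.

It helps the programmer when users think for themselves. Even if your deduction is wrong, the programmer should thank you for trying to help and make their job easier. But never forget to report the "symptoms", or you will only make matters worse.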

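"That's strange: it didn't work a moment ago, so why does it work now?"

Intermittent faults are a real headache for programmers. By comparison, a problem that a simple sequence of actions reliably triggers is easy: the programmer can repeat those actions under conditions that are easy to observe and watch every detail. Too many problems can't be solved that way. Examples are a program that fails once a week, one that fails at random, or one that never fails in front of the programmer (but fails as soon as the programmer leaves; translator's note). And of course, when the program's deadline arrives, it is bound to fail.

Most intermittent faults are not truly intermittent; most of them are connected to something. Some may be caused by memory leaks, and some by another program modifying an important file at the wrong moment. Some may happen only in the first half of every hour (I really have come across this).

Similarly, if you can reproduce the bug but the programmer can't, it is very likely that their computer differs from yours in some way that causes the problem. I once wrote a program whose window could curl up into a little ball and sit in the top-left corner of the screen. On other computers this only worked at 800×600 resolution, but on my machine it also worked at 1024×768.

The programmer wants to know anything that relates to the problem you found. If possible, try it on another machine, and try it several times (twice, three times) to see how often the problem occurs. The problem might appear only after a series of actions, rather than whenever you want it to. In that case it may be caused by running for a long time or by processing large files. When the program crashes, try to remember as much as possible about what you were doing, and if you saw any graphics, mention them too. Anything you can provide is helpful. Even a general description helps (for example, "the program often fails when EMACS is running in the background"). It may not point directly to the cause, but it may help the programmer reproduce the problem.

Most importantly, the programmer wants to determine what kind of fault they are dealing with. It might be a genuinely intermittent fault, or one that occurs only on a particular kind of computer. They want to know many details about your computer so that they can see how your machine differs from theirs. Many of those details depend on the particular program, but one thing you must always provide is version numbers. That means the version of the program, the version of the operating system, and the versions of any programs involved in the problem.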

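"I loaded the disk into Windows..."

Writing clearly is the most basic requirement of a bug report. If the programmer can't tell what you mean, you might as well have said nothing. I receive bug reports from all over the world, many from non-English-speaking countries, and their authors often apologize for their poor English. On the whole, bug reports from these users are clear and useful. Almost all unclear reports come from native English speakers, who assume that the programmer will understand whatever they casually say.

  • Be precise. If there are two ways to do the same thing, say which one you used. For example, "I selected 'Load'" could mean "I clicked 'Load' with the mouse" or "I pressed ALT+L". Say which you did; sometimes it matters.
  • Be detailed. More information is better than less! If you say a lot, the programmer can ignore some of it; if you say too little, they have to come back and ask you more questions. I once received a bug report that was a single sentence. Every time I asked for more, the reply was another single sentence, so it took me several weeks to get useful information.
  • Be careful with pronouns. Don't use words like "it" or "the window" when it isn't clear what they refer to. Consider: "I ran FooApp. It popped up a warning window. I tried to close it and it crashed." This is unclear: which window did the user try to close, the warning window or the whole FooApp program? You could say instead: "I ran FooApp, which popped up a warning window. I tried to close the warning window, and FooApp crashed." This is wordier, but clear and hard to misread.
  • Check. Reread your bug report: is it clear? If you listed a sequence of actions that makes the program fail, follow them yourself to see whether you missed a step.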

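Summary:

  • The first aim of a bug report is to let the programmer see the failure with their own eyes. If you can't show them in person, give them detailed steps that make the program fail.
  • If the first aim can't be achieved and the programmer can't see the program fail, the second aim is to describe what went wrong. Describe everything in detail: what you saw and what you expected to see. Write down the error messages, especially the error message numbers.
  • When your computer does something you didn't expect, freeze! Do nothing until you are calm. Don't do anything you think might be unsafe.
  • By all means try to diagnose the cause yourself if you think you can. Even if you make a diagnosis, you should still report the symptoms.
  • Be ready to provide extra information if the programmer asks for it. If they didn't need it, they wouldn't ask; they are not being awkward on purpose. Always have the program's version number to hand, as it will very likely be needed.
  • Write clearly, and make sure your meaning can't be misinterpreted.
  • Above all, be precise. Programmers like precision.

Disclaimer: I have never actually seen a weasel or an antelope, so my analogy may be inaccurate.

Copyright Simon Tatham 1999

This article is released under the OPL (OpenContent License); please observe the OPL when copying or using it.

Send any comments or criticism of this article to:

English version: anakin@pobox.com

Chinese version: dasn@users.sf.net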

 

 

Pre-checks before applying a one-off patch

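Apart from the familiar patch sets (the kind of patch bundle that is often 1 to 2 GB), Oracle Database patches also include Critical Patch Updates (CPU), Patch Set Updates (PSU) and one-off patches. Problems sometimes occur while a database is in use, most commonly ORA-00600 or ORA-07445 errors. The DBA then works with MOS to trace the problem to a specific bug in a given release. If a one-off patch for that bug already exists for that release, we can resolve the problem by applying it. One-off patches have the following characteristics:

  1. A one-off patch bundle is usually small, because it is built specifically to fix one or more bugs. It generally contains only the necessary inventory XML information and the library object files related to the bug (usually .o files, or members of .a archives).
  2. Most one-off patches are built on demand. Typically, Oracle staff discover a bug or a customer hits one in production, and a patch-building request is submitted to Oracle Development. On receiving the build or backport request, the developers produce a patch specifically for the database version and OS platform of the customer's system (if none exists yet). For example, a bug may already have a one-off patch for 10.2.0.4 on Linux while a customer hits the same bug on 10.2.0.3 on Solaris SPARC. The customer can then raise an SR asking Oracle Development to backport the patch.
  3. Most one-off patches only need to be applied with the OPatch tool and do not require upgrading or modifying the data dictionary, unlike patch sets, CPUs and PSUs.
  4. A one-off patch targets one specific version. For example, a one-off patch for 10.2.0.4 can only be applied to 10.2.0.4, and on 10.2.0.4.8 (10.2.0.4 PSU 8) the same patch may be a conflicting patch. In that case Oracle usually provides a replacement one-off patch for the current version. For example, 6996030 "MERGE LABEL REQUEST ON TOP OF 10.2.0.4 FOR AUDIT TRAIL CLEAN-UP" is a one-off patch for 10.2.0.4, but on 10.2.0.4.4 it conflicts. We therefore need a replacement, an equivalent patch for 10.2.0.4.4; in this example that patch is 9650152 "MERGE REQUEST ON TOP OF 10.2.0.4.4 FOR BUGS 9589005 6805009 6991606".
  5. Applying a one-off patch means your database software will behave differently from other mainstream installations of the same release. In addition, if you later find a new bug and want to apply another one-off patch, the patches may conflict. In most cases a one-off patch fixes a bug quickly, but every one-off patch you apply also risks costing more time in future patch maintenance. In this respect, applying one-off patches has an obvious drawback.
  6. 11g introduced the concept of hot patching, and the patches in question are one-off patches. So far online patching looks like little more than a gimmick, because very few patches can genuinely be applied online. It is hard to find a usable online patch in the Patches section of MOS.
  7. A personal view: Oracle GCS traces problems to bugs using call stacks, traces, dumps and similar information. The most common cases are internal errors such as ORA-00600 and ORA-07445, whose explicit error arguments and call stacks let GCS pinpoint the fault precisely. Other problems, such as spinning processes or abnormal wait events, produce no clear arguments, stacks or dumps, and tracing them to a bug is harder. We often see bug notes in which a one-off patch had no effect even after the customer applied it and set the specified event. There are many possible reasons for this. One is that the patch was flawed in development (the fix code is not triggered correctly); another is that the bug was not identified accurately.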

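Our dilemma is that, despite these drawbacks, we often have no choice but to apply one-off patches (mostly because of downtime constraints). Often the new patch must go into an environment that already carries a number of patch sets, PSUs, CPUs and one-off patches. To do this well, we should complete the following preparation before actually applying the patch:

1. Once the problem has been traced to a specific bug, we have a bug number. Search for patches related to that bug number in the Patches & Updates section of My Oracle Support. There are three possible situations:

  • Best case: a patch exists for the exact database version and OS platform.
  • A patch exists for the database version, but not for your OS platform (note that the version must match down to the specific PSU). In that case you can raise an SR asking for a build on your platform.
  • No patch exists for the database version. Again, the version must match down to the specific PSU; for example, a patch for 10.2.0.4 cannot be used directly on 10.2.0.4.x. In this case, first consult the Metalink note "Patch Set Updates – One-off Patch Conflict Resolution". It tells you whether the one-off patch is included in some PSU, or whether an equivalent patch exists on top of some PSU. Some one-off patches also work across PSUs; for example, some patches for 10.2.0.4.4 can be applied on 10.2.0.4.5. Whether a given patch works can be verified with the conflict-check method described below. If there is neither an equivalent patch nor a compatible one, raise an SR asking Oracle Development to build a merge patch.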
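The three cases above form a simple decision procedure. The sketch below encodes it under stated assumptions: the PatchBuild record, the next_step function and the returned action strings are hypothetical. The list of builds has to be gathered by hand from the MOS search. Versions are compared as exact strings, so a 10.2.0.4 patch does not match a 10.2.0.4.x home, following the rule above.

```python
from dataclasses import dataclass

APPLY = "download the existing patch and run the conflict pre-checks"
REQUEST_PORT = "raise an SR asking for a build on this platform"
REQUEST_MERGE = ("check the PSU one-off conflict resolution note; if there is no included, "
                 "equivalent or compatible patch, raise an SR for a merge patch")


@dataclass(frozen=True)
class PatchBuild:
    bug: int
    version: str   # full version down to the PSU level, e.g. "10.2.0.4.4"
    platform: str  # e.g. "Linux x86-64"


def next_step(bug: int, db_version: str, platform: str, builds: list) -> str:
    """Pick the next action for a bug, given the patch builds found on MOS."""
    # Exact string match: "10.2.0.4" deliberately does not match "10.2.0.4.4".
    exact_version = [b for b in builds if b.bug == bug and b.version == db_version]
    if any(b.platform == platform for b in exact_version):
        return APPLY
    if exact_version:
        return REQUEST_PORT
    return REQUEST_MERGE
```

For example, suppose MOS listed a build only for 10.2.0.4 on Linux x86-64 (hypothetical data). A 10.2.0.4.4 home on the same platform would then fall through to the third case.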

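2. A more complex case arises when specific patches (PSUs, CPUs or other one-off patches) have already been applied to the current version. We must make sure that the new one-off patches do not conflict with each other or with the patches already installed. For this we can use the opatch prereq command to check for conflicts and get recommendations, for example: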

[oracle@rh2 tmp]$ /s01/db_1/OPatch/opatch lsinventory
Invoking OPatch 11.2.0.1.3
Oracle Interim Patch Installer version 11.2.0.1.3
Copyright (c) 2010, Oracle Corporation.  All rights reserved.
Oracle Home       : /s01/db_1
Central Inventory : /s01/oraInventory
from           : /etc/oraInst.loc
OPatch version    : 11.2.0.1.3
OUI version       : 10.2.0.4.0
OUI location      : /s01/db_1/oui
Log file location : /s01/db_1/cfgtoollogs/opatch/opatch2011-05-06_21-41-39PM.log
Patch history file: /s01/db_1/cfgtoollogs/opatch/opatch_history.txt
Lsinventory Output file location : /s01/db_1/cfgtoollogs/opatch/lsinv/lsinventory2011-05-06_21-41-39PM.txt
--------------------------------------------------------------------------------
Installed Top-level Products (2): 
Oracle Database 10g                                                  10.2.0.1.0
Oracle Database 10g Release 2 Patch Set 3                            10.2.0.4.0
There are 2 products installed in this Oracle Home.
Interim patches (2) :
Patch  9654991      : applied on Fri May 06 21:33:38 CST 2011
Unique Patch ID:  12816418
Created on 30 Jun 2010, 18:30:40 hrs PST8PDT
Bugs fixed:
9714832, 9713537, 9678695, 9655017, 9173248, 8309642, 9678697, 9678690
9573054, 9654991
This patch overlays patches:
9352164
This patch needs patches:
9352164
as prerequisites
Patch  9352164      : applied on Fri May 06 21:29:20 CST 2011
Unique Patch ID:  12307477
Created on 1 Apr 2010, 11:33:14 hrs PST8PDT
Bugs fixed:
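/* The example environment has PSU 4 (9352164) and PSU 5 (9654991) for 10.2.0.4 applied */
/* In the example environment we want to apply patches 7527908 and 9696904. These two merge patches were built for PSU 4 (10.2.0.4.4),
but there are no replacement patches for PSU 5 (10.2.0.4.5), so we need to check whether these two patches conflict with PSU 5 */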
[oracle@rh2 ~]$ mkdir $ORACLE_HOME/patches
/* Create a directory under ORACLE_HOME to hold the patches */
[oracle@rh2 ~]$ cd $ORACLE_HOME/patches
[oracle@rh2 patches]$ unzip /home/oracle/patch/p7527908_102044_Linux-x86-64.zip 
Archive:  /home/oracle/patch/p7527908_102044_Linux-x86-64.zip
creating: 7527908/
creating: 7527908/files/
creating: 7527908/files/lib/
creating: 7527908/files/lib/libserver10.a/
inflating: 7527908/files/lib/libserver10.a/ktm.o
inflating: 7527908/files/lib/libserver10.a/ktu.o
inflating: 7527908/files/lib/libserver10.a/ktusm.o
creating: 7527908/etc/
creating: 7527908/etc/config/
inflating: 7527908/etc/config/inventory
inflating: 7527908/etc/config/actions
creating: 7527908/etc/xml/
inflating: 7527908/etc/xml/GenericActions.xml
inflating: 7527908/etc/xml/ShiphomeDirectoryStructure.xml
inflating: 7527908/README.txt      
[oracle@rh2 patches]$ unzip ~/patch/p9696904_102044_Linux-x86-64.zip
Archive:  /home/oracle/patch/p9696904_102044_Linux-x86-64.zip
creating: 9696904/
creating: 9696904/files/
creating: 9696904/files/lib/
creating: 9696904/files/lib/libserver10.a/
inflating: 9696904/files/lib/libserver10.a/kks1.o
inflating: 9696904/files/lib/libserver10.a/kksc.o
inflating: 9696904/files/lib/libserver10.a/kksh.o
inflating: 9696904/files/lib/libserver10.a/ksmp.o
inflating: 9696904/files/lib/libserver10.a/kspt.o
creating: 9696904/etc/
creating: 9696904/etc/config/
inflating: 9696904/etc/config/inventory
inflating: 9696904/etc/config/actions
creating: 9696904/etc/xml/
inflating: 9696904/etc/xml/GenericActions.xml
inflating: 9696904/etc/xml/ShiphomeDirectoryStructure.xml
inflating: 9696904/README.txt

First, we check whether the two patches conflict with each other, using the opatch prereq CheckConflictAmongPatchesWithDetail command:

[oracle@rh2 db_1]$ /s01/db_1/OPatch/opatch prereq CheckConflictAmongPatchesWithDetail -phBaseDir $ORACLE_HOME/patches
Invoking OPatch 11.2.0.1.3
Oracle Interim Patch Installer version 11.2.0.1.3
Copyright (c) 2010, Oracle Corporation.  All rights reserved.
PREREQ session
Oracle Home       : /s01/db_1
Central Inventory : /s01/oraInventory
from           : /etc/oraInst.loc
OPatch version    : 11.2.0.1.3
OUI version       : 10.2.0.4.0
OUI location      : /s01/db_1/oui
Log file location : /s01/db_1/cfgtoollogs/opatch/opatch2011-05-06_22-07-18PM.log
Patch history file: /s01/db_1/cfgtoollogs/opatch/opatch_history.txt
Invoking prereq "checkconflictamongpatcheswithdetail"
Prereq "checkConflictAmongPatchesWithDetail" passed.
OPatch succeeded.
/* The check shows that there are no conflicts among the patches to be applied */
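One rough way to see why these two patches do not collide is to compare the library members they replace, taken from the unzip listings above. The sketch below flags any archive member that more than one patch replaces. This is only a heuristic assumed for illustration: OPatch's own analysis is bug-based and also reports supersets, so its verdict is the one that counts. The function name is hypothetical.

```python
from itertools import combinations

# Library members shipped by each patch, copied from the unzip output above.
PATCH_MEMBERS = {
    7527908: {"libserver10.a/ktm.o", "libserver10.a/ktu.o", "libserver10.a/ktusm.o"},
    9696904: {"libserver10.a/kks1.o", "libserver10.a/kksc.o", "libserver10.a/kksh.o",
              "libserver10.a/ksmp.o", "libserver10.a/kspt.o"},
}


def shared_members(patch_members: dict) -> dict:
    """Return {(patch_a, patch_b): [members both replace]} for each overlapping pair."""
    overlaps = {}
    for (a, files_a), (b, files_b) in combinations(sorted(patch_members.items()), 2):
        common = files_a & files_b
        if common:
            overlaps[(a, b)] = sorted(common)
    return overlaps


if __name__ == "__main__":
    print(shared_members(PATCH_MEMBERS))  # {} : no object file is replaced by both patches
```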

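Next, we check whether the patches to be applied conflict with the patches already installed in the Oracle Home, using the opatch prereq CheckConflictAgainstOHWithDetail command, for example: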

[oracle@rh2 db_1]$ /s01/db_1/OPatch/opatch prereq CheckConflictAgainstOHWithDetail -phBaseDir /s01/db_1/patches/9696904
Invoking OPatch 11.2.0.1.3
Oracle Interim Patch Installer version 11.2.0.1.3
Copyright (c) 2010, Oracle Corporation.  All rights reserved.
PREREQ session
Oracle Home       : /s01/db_1
Central Inventory : /s01/oraInventory
from           : /etc/oraInst.loc
OPatch version    : 11.2.0.1.3
OUI version       : 10.2.0.4.0
OUI location      : /s01/db_1/oui
Log file location : /s01/db_1/cfgtoollogs/opatch/opatch2011-05-06_22-08-44PM.log
Patch history file: /s01/db_1/cfgtoollogs/opatch/opatch_history.txt
Invoking prereq "checkconflictagainstohwithdetail"
Prereq "checkConflictAgainstOHWithDetail" passed.
OPatch succeeded.
[oracle@rh2 db_1]$ /s01/db_1/OPatch/opatch prereq CheckConflictAgainstOHWithDetail -phBaseDir /s01/db_1/patches/7527908
Invoking OPatch 11.2.0.1.3
Oracle Interim Patch Installer version 11.2.0.1.3
Copyright (c) 2010, Oracle Corporation.  All rights reserved.
PREREQ session
Oracle Home       : /s01/db_1
Central Inventory : /s01/oraInventory
from           : /etc/oraInst.loc
OPatch version    : 11.2.0.1.3
OUI version       : 10.2.0.4.0
OUI location      : /s01/db_1/oui
Log file location : /s01/db_1/cfgtoollogs/opatch/opatch2011-05-06_22-08-57PM.log
Patch history file: /s01/db_1/cfgtoollogs/opatch/opatch_history.txt
Invoking prereq "checkconflictagainstohwithdetail"
Prereq "checkConflictAgainstOHWithDetail" passed.
OPatch succeeded.
/* Here is another example, in which the check does find a conflict */

[oracle@rh2 tmp]$ /s01/db_1/OPatch/opatch prereq CheckConflictAgainstOHWithDetail -phBaseDir 8557428
Invoking OPatch 11.2.0.1.3
Oracle Interim Patch Installer version 11.2.0.1.3
Copyright (c) 2010, Oracle Corporation.  All rights reserved.
PREREQ session
Oracle Home       : /s01/db_1
Central Inventory : /s01/oraInventory
from           : /etc/oraInst.loc
OPatch version    : 11.2.0.1.3
OUI version       : 10.2.0.4.0
OUI location      : /s01/db_1/oui
Log file location : /s01/db_1/cfgtoollogs/opatch/opatch2011-05-06_22-38-25PM.log
Patch history file: /s01/db_1/cfgtoollogs/opatch/opatch_history.txt
Invoking prereq "checkconflictagainstohwithdetail"
ZOP-47: The patch(es) has supersets with other patches installed in the Oracle Home (or) among themselves.
ZOP-40: The patch(es) has conflicts with other patches installed in the Oracle Home (or) among themselves.
Prereq "checkConflictAgainstOHWithDetail" failed.
Summary of Conflict Analysis:
There are no patches that can be applied now.
Following patches have conflicts. Please contact Oracle Support and get the merged patch of the patches :
9352164
Following patches are not required, as they are subset of the patches in Oracle Home or
subset of the patches in the given list : 
Following patches will be rolled back from Oracle Home on application of the patches in the given list :
9352164
Conflicts/Supersets for each patch are:
Patch : 8557428
Bug Conflict with 9352164
Conflicting bugs are:
8479537,  7462072,  8210889,  6404447

OPatch succeeded.
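/* The check above found a conflict between one-off patch 8557428 and 9352164 (PSU 4).
The bugs causing the conflict are 8479537, 7462072, 8210889 and 6404447.
opatch recommends raising a service request to get a merge of patches 8557428 and 9352164 */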
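When these checks are scripted, the console output shown above can be parsed. This is a minimal sketch that assumes the message formats printed by OPatch 11.2.0.1.3 in this post; other OPatch versions may word them differently, and the function names are hypothetical. Note that the failed example above still ends with "OPatch succeeded.", so the sketch looks at the Prereq line instead.

```python
import re


def prereq_passed(output: str) -> bool:
    """True if the output contains a line such as 'Prereq "<check>" passed.'.

    The final 'OPatch succeeded.' line is deliberately ignored: in the conflict
    example above it is printed even though the prereq check itself failed."""
    return re.search(r'^\s*Prereq "\w+" passed\.\s*$', output, re.MULTILINE) is not None


def conflicting_bugs(output: str) -> dict:
    """Return {checked patch: {conflicting patch: [bug numbers]}} parsed from the
    'Conflicts/Supersets for each patch are:' section of the output."""
    result = {}
    patch = other = None
    expect_bugs = False
    for line in (raw.strip() for raw in output.splitlines()):
        if m := re.fullmatch(r"Patch\s*:\s*(\d+)", line):
            patch, other, expect_bugs = int(m.group(1)), None, False
            result.setdefault(patch, {})
        elif m := re.fullmatch(r"Bug Conflict with\s+(\d+)", line):
            other = int(m.group(1))
        elif line == "Conflicting bugs are:":
            expect_bugs = True
        elif expect_bugs and line and patch is not None and other is not None:
            # e.g. "8479537,  7462072,  8210889,  6404447"
            result[patch][other] = [int(n) for n in re.findall(r"\d+", line)]
            expect_bugs = False
    return result
```

Fed the 8557428 output above, conflicting_bugs returns {8557428: {9352164: [8479537, 7462072, 8210889, 6404447]}}, and prereq_passed returns False.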

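These one-off patches were released for PSU 4 (10.2.0.4.4), yet the checks above found no conflicts when run against the PSU 5 (10.2.0.4.5) ORACLE_HOME. Next, we use the opatch apply -report command to rehearse the patch installation. The -report option makes opatch only report the actions it would perform, without actually performing them: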

[oracle@rh2 db_1]$ cd patches/7527908/
[oracle@rh2 7527908]$ ls
etc  files  README.txt
[oracle@rh2 7527908]$ /s01/db_1/OPatch/opatch apply -report
Invoking OPatch 11.2.0.1.3
Oracle Interim Patch Installer version 11.2.0.1.3
Copyright (c) 2010, Oracle Corporation.  All rights reserved.
Oracle Home       : /s01/db_1
Central Inventory : /s01/oraInventory
from           : /etc/oraInst.loc
OPatch version    : 11.2.0.1.3
OUI version       : 10.2.0.4.0
OUI location      : /s01/db_1/oui
Log file location : /s01/db_1/cfgtoollogs/opatch/opatch2011-05-06_22-19-25PM.log
Patch history file: /s01/db_1/cfgtoollogs/opatch/opatch_history.txt
ApplySession applying interim patch '7527908' to OH '/s01/db_1'
Running prerequisite checks...
[Report: skip "rm -rf /s01/db_1/ccr"]
[Report: skip unzipping "/s01/db_1/OPatch/ocm/ocm.zip"]
Provide your email address to be informed of security issues, install and
initiate Oracle Configuration Manager. Easier for you if you use your My
Oracle Support Email address/User Name.
Visit http://www.oracle.com/support/policies.html for details.
Email address/User Name: 
You have not provided an email address for notification of security issues.
Do you wish to remain uninformed of security issues ([Y]es, [N]o) [N]:  Yes
[Report: Skip calling "/bin/sh /s01/db_1/ccr/bin/setupCCR -R
/s01/db_1/.patch_storage/ocmRespFile -S OPatch -V 11.2.0.1.3" under "/s01/db_1/ccr/bin"]
OPatch detected non-cluster Oracle Home from the inventory and will patch the local system only.
Please shutdown Oracle instances running out of this ORACLE_HOME on the local system.
(Oracle Home = '/s01/db_1')
Is the local system ready for patching? [y|n]
y
User Responded with: Y
Backing up files and inventory (not for auto-rollback) for the Oracle Home
Users request no RAC file generation.  Do not create MP files.
Skip patching component oracle.rdbms, 10.2.0.4.0 and its actions.
The actions are reported here, but are not performed.
Updating archive file "/s01/db_1/lib/libserver10.a"  with "lib/libserver10.a/ktm.o"
Updating archive file "/s01/db_1/lib/libserver10.a"  with "lib/libserver10.a/ktu.o"
Updating archive file "/s01/db_1/lib/libserver10.a"  with "lib/libserver10.a/ktusm.o"
Running make for target ioracle
ApplySession skipping inventory update.
Verifying the update...
Inventory and System verification is performed here.
The local system has been patched and can be restarted.
OPatch succeeded.
[oracle@rh2 patches]$ cd 9696904/
[oracle@rh2 9696904]$ /s01/db_1/OPatch/opatch apply -report
Invoking OPatch 11.2.0.1.3
Oracle Interim Patch Installer version 11.2.0.1.3
Copyright (c) 2010, Oracle Corporation.  All rights reserved.
Oracle Home       : /s01/db_1
Central Inventory : /s01/oraInventory
from           : /etc/oraInst.loc
OPatch version    : 11.2.0.1.3
OUI version       : 10.2.0.4.0
OUI location      : /s01/db_1/oui
Log file location : /s01/db_1/cfgtoollogs/opatch/opatch2011-05-06_22-22-11PM.log
Patch history file: /s01/db_1/cfgtoollogs/opatch/opatch_history.txt
ApplySession applying interim patch '9696904' to OH '/s01/db_1'
Running prerequisite checks...
[Report: skip "rm -rf /s01/db_1/ccr"]
[Report: skip unzipping "/s01/db_1/OPatch/ocm/ocm.zip"]
Provide your email address to be informed of security issues, install and
initiate Oracle Configuration Manager. Easier for you if you use your My
Oracle Support Email address/User Name.
Visit http://www.oracle.com/support/policies.html for details.
Email address/User Name: 
You have not provided an email address for notification of security issues.
Do you wish to remain uninformed of security issues ([Y]es, [N]o) [N]:  Yes
[Report: Skip calling "/bin/sh /s01/db_1/ccr/bin/setupCCR -R /s01/db_1/.patch_storage/ocmRespFile -S 
OPatch -V 11.2.0.1.3" under "/s01/db_1/ccr/bin"]
OPatch detected non-cluster Oracle Home from the inventory and will patch the local system only.
Please shutdown Oracle instances running out of this ORACLE_HOME on the local system.
(Oracle Home = '/s01/db_1')
Is the local system ready for patching? [y|n]
y
User Responded with: Y
Backing up files and inventory (not for auto-rollback) for the Oracle Home
Users request no RAC file generation.  Do not create MP files.
Skip patching component oracle.rdbms, 10.2.0.4.0 and its actions.
The actions are reported here, but are not performed.
Updating archive file "/s01/db_1/lib/libserver10.a"  with "lib/libserver10.a/kks1.o"
Updating archive file "/s01/db_1/lib/libserver10.a"  with "lib/libserver10.a/kksc.o"
Updating archive file "/s01/db_1/lib/libserver10.a"  with "lib/libserver10.a/kksh.o"
Updating archive file "/s01/db_1/lib/libserver10.a"  with "lib/libserver10.a/ksmp.o"
Updating archive file "/s01/db_1/lib/libserver10.a"  with "lib/libserver10.a/kspt.o"
Running make for target ioracle
ApplySession skipping inventory update.
Verifying the update...
Inventory and System verification is performed here.
The local system has been patched and can be restarted.
OPatch succeeded.

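The opatch apply -report runs above further confirmed that applying these patches will not cause conflicts or other failures, so we can now go ahead and apply them for real!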
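The same "dry run first, then apply" routine can be scripted. The following is a minimal POSIX sh sketch and was not part of the original session: the function name apply_after_report is hypothetical, OPATCH defaults to the path used above, and it assumes OPatch returns a non-zero exit status when the -report run fails. The real opatch apply only runs if the report run succeeded. The OCM email and "ready for patching" prompts are still answered interactively.

```shell
#!/bin/sh
# Sketch only (hypothetical helper): run "opatch apply -report" for one patch
# directory and run the real "opatch apply" only if the report exits with 0.
# Assumption: OPatch signals a failed report run with a non-zero exit status.
OPATCH=${OPATCH:-/s01/db_1/OPatch/opatch}

# apply_after_report PATCH_DIR
# Returns 2 if PATCH_DIR is missing, 1 if the report run fails,
# otherwise the exit status of the real "opatch apply".
apply_after_report() {
    patch_dir=$1
    if [ ! -d "$patch_dir" ]; then
        echo "no such patch directory: $patch_dir" >&2
        return 2
    fi
    # Run OPatch from inside the patch directory, as in the transcript.
    # The subshell keeps the caller's working directory unchanged.
    (
        cd "$patch_dir" || exit 2
        if "$OPATCH" apply -report; then
            "$OPATCH" apply
        else
            echo "report run failed for $patch_dir, not applying" >&2
            exit 1
        fi
    )
}
```

Run from the patches directory, for example: for p in 7527908 9696904; do apply_after_report "$p" || break; done. Both patches list 9352164 as a prerequisite, so that patch must already be registered in the inventory.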

[oracle@rh2 patches]$ cd 7527908/
[oracle@rh2 7527908]$ /s01/db_1/OPatch/opatch apply
Invoking OPatch 11.2.0.1.3
Oracle Interim Patch Installer version 11.2.0.1.3
Copyright (c) 2010, Oracle Corporation.  All rights reserved.
Oracle Home       : /s01/db_1
Central Inventory : /s01/oraInventory
from           : /etc/oraInst.loc
OPatch version    : 11.2.0.1.3
OUI version       : 10.2.0.4.0
OUI location      : /s01/db_1/oui
Log file location : /s01/db_1/cfgtoollogs/opatch/opatch2011-05-06_22-24-46PM.log
Patch history file: /s01/db_1/cfgtoollogs/opatch/opatch_history.txt
ApplySession applying interim patch '7527908' to OH '/s01/db_1'
Running prerequisite checks...
Provide your email address to be informed of security issues, install and
initiate Oracle Configuration Manager. Easier for you if you use your My
Oracle Support Email address/User Name.
Visit http://www.oracle.com/support/policies.html for details.
Email address/User Name: 
You have not provided an email address for notification of security issues.
Do you wish to remain uninformed of security issues ([Y]es, [N]o) [N]:  Yes
OPatch detected non-cluster Oracle Home from the inventory and will patch the local system only.
Please shutdown Oracle instances running out of this ORACLE_HOME on the local system.
(Oracle Home = '/s01/db_1')
Is the local system ready for patching? [y|n]
y
User Responded with: Y
Backing up files and inventory (not for auto-rollback) for the Oracle Home
Backing up files affected by the patch '7527908' for restore. This might take a while...
Backing up files affected by the patch '7527908' for rollback. This might take a while...
Patching component oracle.rdbms, 10.2.0.4.0...
Updating archive file "/s01/db_1/lib/libserver10.a"  with "lib/libserver10.a/ktm.o"
Updating archive file "/s01/db_1/lib/libserver10.a"  with "lib/libserver10.a/ktu.o"
Updating archive file "/s01/db_1/lib/libserver10.a"  with "lib/libserver10.a/ktusm.o"
Running make for target ioracle
ApplySession adding interim patch '7527908' to inventory
Verifying the update...
Inventory check OK: Patch ID 7527908 is registered in Oracle Home inventory with proper meta-data.
Files check OK: Files from Patch ID 7527908 are present in Oracle Home.
The local system has been patched and can be restarted.
OPatch succeeded.
[oracle@rh2 7527908]$ cd ..
[oracle@rh2 patches]$ cd 9696904/
[oracle@rh2 9696904]$ /s01/db_1/OPatch/opatch apply
Invoking OPatch 11.2.0.1.3
Oracle Interim Patch Installer version 11.2.0.1.3
Copyright (c) 2010, Oracle Corporation.  All rights reserved.
Oracle Home       : /s01/db_1
Central Inventory : /s01/oraInventory
from           : /etc/oraInst.loc
OPatch version    : 11.2.0.1.3
OUI version       : 10.2.0.4.0
OUI location      : /s01/db_1/oui
Log file location : /s01/db_1/cfgtoollogs/opatch/opatch2011-05-06_22-25-58PM.log
Patch history file: /s01/db_1/cfgtoollogs/opatch/opatch_history.txt
ApplySession applying interim patch '9696904' to OH '/s01/db_1'
Running prerequisite checks...
Provide your email address to be informed of security issues, install and
initiate Oracle Configuration Manager. Easier for you if you use your My
Oracle Support Email address/User Name.
Visit http://www.oracle.com/support/policies.html for details.
Email address/User Name: 
You have not provided an email address for notification of security issues.
Do you wish to remain uninformed of security issues ([Y]es, [N]o) [N]:  Yes
OPatch detected non-cluster Oracle Home from the inventory and will patch the local system only.
Please shutdown Oracle instances running out of this ORACLE_HOME on the local system.
(Oracle Home = '/s01/db_1')
Is the local system ready for patching? [y|n]
y
User Responded with: Y
Backing up files and inventory (not for auto-rollback) for the Oracle Home
Backing up files affected by the patch '9696904' for restore. This might take a while...
Backing up files affected by the patch '9696904' for rollback. This might take a while...
Patching component oracle.rdbms, 10.2.0.4.0...
Updating archive file "/s01/db_1/lib/libserver10.a"  with "lib/libserver10.a/kks1.o"
Updating archive file "/s01/db_1/lib/libserver10.a"  with "lib/libserver10.a/kksc.o"
Updating archive file "/s01/db_1/lib/libserver10.a"  with "lib/libserver10.a/kksh.o"
Updating archive file "/s01/db_1/lib/libserver10.a"  with "lib/libserver10.a/ksmp.o"
Updating archive file "/s01/db_1/lib/libserver10.a"  with "lib/libserver10.a/kspt.o"
Running make for target ioracle
ApplySession adding interim patch '9696904' to inventory
Verifying the update...
Inventory check OK: Patch ID 9696904 is registered in Oracle Home inventory with proper meta-data.
Files check OK: Files from Patch ID 9696904 are present in Oracle Home.
The local system has been patched and can be restarted.
OPatch succeeded.
[oracle@rh2 9696904]$ /s01/db_1/OPatch/opatch lsinventory
Invoking OPatch 11.2.0.1.3
Oracle Interim Patch Installer version 11.2.0.1.3
Copyright (c) 2010, Oracle Corporation.  All rights reserved.
Oracle Home       : /s01/db_1
Central Inventory : /s01/oraInventory
from           : /etc/oraInst.loc
OPatch version    : 11.2.0.1.3
OUI version       : 10.2.0.4.0
OUI location      : /s01/db_1/oui
Log file location : /s01/db_1/cfgtoollogs/opatch/opatch2011-05-06_22-28-56PM.log
Patch history file: /s01/db_1/cfgtoollogs/opatch/opatch_history.txt
Lsinventory Output file location : /s01/db_1/cfgtoollogs/opatch/lsinv/lsinventory2011-05-06_22-28-56PM.txt
--------------------------------------------------------------------------------
Installed Top-level Products (2): 
Oracle Database 10g                                                  10.2.0.1.0
Oracle Database 10g Release 2 Patch Set 3                            10.2.0.4.0
There are 2 products installed in this Oracle Home.
Interim patches (4) :
Patch  9696904      : applied on Fri May 06 22:26:37 CST 2011
Unique Patch ID:  12575150
Created on 10 May 2010, 23:22:02 hrs PST8PDT
Bugs fixed:
7025450, 8575528, 6904068
This patch overlays patches:
9352164
This patch needs patches:
9352164
as prerequisites
Patch  7527908      : applied on Fri May 06 22:25:33 CST 2011
Unique Patch ID:  12546933
Created on 30 Apr 2010, 12:48:09 hrs PST8PDT
Bugs fixed:
7527908
This patch overlays patches:
9352164
This patch needs patches:
9352164
as prerequisites
Patch  9654991      : applied on Fri May 06 21:33:38 CST 2011
Unique Patch ID:  12816418
Created on 30 Jun 2010, 18:30:40 hrs PST8PDT
Bugs fixed:
9714832, 9713537, 9678695, 9655017, 9173248, 8309642, 9678697, 9678690
9573054, 9654991
This patch overlays patches:
9352164
This patch needs patches:
9352164
as prerequisites
Patch  9352164      : applied on Fri May 06 21:29:20 CST 2011
Unique Patch ID:  12307477
Created on 1 Apr 2010, 11:33:14 hrs PST8PDT
Bugs fixed:
.............................................................
/* The one-off patches were applied successfully. That's great! */
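To confirm the result without reading through the whole listing, the lsinventory output can be checked for the expected patch IDs. A minimal sketch, assuming the "Patch  <id>      : applied on ..." line format shown above (other OPatch versions may format this differently); check_patches_applied is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only (hypothetical helper): read "opatch lsinventory" output on stdin
# and check that every patch ID given as an argument has an interim-patch line
# of the form "Patch  <id>      : applied on ...", as in the listing above.
# Prints "missing: <id>" for each absent ID; returns 1 if any is missing.
check_patches_applied() {
    inv=$(cat)
    missing=0
    for id in "$@"; do
        # The " +" after the ID prevents a prefix match (e.g. 9696 vs 9696904);
        # "Unique Patch ID:" lines do not start with "Patch", so they are ignored.
        if ! printf '%s\n' "$inv" | grep -Eq "^Patch +$id +: applied on"; then
            echo "missing: $id"
            missing=1
        fi
    done
    return $missing
}
```

Usage: /s01/db_1/OPatch/opatch lsinventory | check_patches_applied 9352164 9654991 7527908 9696904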
