Upgrade 11.2.0.1 GI/CRS to 11.2.0.2 in Linux

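11.2.0.2 has been out for more than a year and is considerably more stable than 11.2.0.1. When we deploy new systems for customers, we now generally recommend installing 11.2.0.2 directly (out of place) and patching it to the PSU recommended in <Oracle Recommended Patches -- Oracle Database>.

For existing systems we recommend upgrading to 11.2.0.2 whenever the downtime window allows, although customers can of course also wait for the 11.2.0.3 release.

The upgrade from 11.2.0.1 to 11.2.0.2 differs somewhat from upgrades in 10g. For mission-critical databases, a proper upgrade rehearsal and backups are mandatory: upgrading Oracle database software has always been a complex and risky undertaking and must be handled with care.

Upgrading a RAC database is also more complex than upgrading a single instance. The work can be divided into the following steps:

1. If you run on Exadata Database Machine hardware, first check whether the Exadata Storage Software and InfiniBand switch versions need to be upgraded; see <Database Machine and Exadata Storage Server 11g Release 2 (11.2) Supported Versions>

2. Complete the preparation for the rolling upgrade of Grid Infrastructure

3. Perform the rolling upgrade of the Grid Infrastructure (GI) software

4. Complete the preparation for upgrading the RDBMS database software

5. Upgrade the RDBMS database software itself, including upgrading the data dictionary and recompiling invalid objects

This article focuses on the preparation and the actual steps for the rolling upgrade of the GI/CRS cluster software. Because 11.2.0.2 is the first patch set of 11gR2 and also the first large out-of-place patch set, most people are not yet familiar with the new upgrade model.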

 

Preparing for the GI Upgrade

 

1. Note that a rolling upgrade of GI/CRS from 11.2.0.1 to 11.2.0.2 can fail with unexpected errors. For details see <Pre-requsite for 11.2.0.1 to 11.2.0.2 ASM Rolling Upgrade>, quoted here:

Applies to:
Oracle Server - Enterprise Edition - Version: 11.2.0.1.0 to 11.2.0.2.0 - Release: 11.2 to 11.2
Oracle Server - Enterprise Edition - Version: 11.2.0.1 to 11.2.0.2   [Release: 11.2 to 11.2]
Information in this document applies to any platform.
Purpose
This note is to clarify the patch requirement when doing 11.2.0.1 to 11.2.0.2 rolling upgrade.
Scope and Application
Intended audience includes DBA, support engineers.
Pre-requsite for 11.2.0.1 to 11.2.0.2 ASM Rolling Upgrade

There has been some confusion as what patches need to be applied for 11.2.0.1 ASM rolling
upgrade to 11.2.0.2 to be successful. Documentation regarding this is not very clear
(at the time of writing) and a documentation bug has been filed and documentation will be updated in the future.

There are two bugs related to 11.2.0.1 ASM rolling upgrade to 11.2.0.2:

Unpublished bug 9413827: 11201 TO 11202 ASM ROLLING UPGRADE - OLD CRS STACK FAILS TO STOP

Unpublished bug 9706490: LNX64-11202-UD 11201 -> 11202, DG OFFLINE AFTER RESTART CRS STACK DURING UPGRADE

Some of the symptoms include error message when running rootupgrade.sh:

ORA-15154: cluster rolling upgrade incomplete (from bug: 9413827)

or

Diskgroup status is shown offline after the upgrade, crsd.log may have:

2010-05-12 03:45:49.029: [ AGFW][1506556224] Agfw Proxy Server sending the
last reply to PE for message:RESOURCE_START[ora.MYDG1.dg rwsdcvm44 1] ID 4098:1526
TextMessage[CRS-2674: Start of 'ora.MYDG1.dg' on 'rwsdcvm44' failed]
TextMessage[ora.MYDG1.dg rwsdcvm44 1]
ora.MYDG1.dg rwsdcvm44 1:

To overcome this issue, there are two actions you need to take:

a). apply proper patch.
b). change crsconfig_lib.pm

Applying Patch:

1). If $GI_HOME is on version 11.2.0.1.2 (i.e GI PSU2 is applied):

Action: You can apply Patch:9706490 for version 11.2.0.1.2.

Unpublished bug 9413827 is fixed in 11.2.0.1.2 GI PSU2. Patch:9706490 for version
11.2.0.1.2 is built on top of 11.2.0.1.2 GI PSU2 (i.e. includes the 11.2.0.1.2 GI PSU2,
hence includes the fix for 9413827). Applying Patch:9706490 includes both fixes.
opatch will recognize 9706490 is superset of 11.2.0.1.2 GI PSU2 (Patch: 9655006)
and rollback patch 9655006 before applying Patch: 9706490.

2). If $GI_HOME is on version 11.2.0.1.0 (i.e. no GI PSU applied).

Action: You can apply Patch:9706490 for version 11.2.0.1.2. This would make sure you have
applied 11.2.0.1.2 GI PSU2 plus both 9706490 and 9413827 (which is included in GI PSU2).

For platforms that do not have 11.2.0.1.2 GI PSU, then you can apply patch 9413827 on 11.2.0.1.0.

3). If $GI_HOME is on version 11.2.0.1.1 (GI PSU1) (this is rare since GI PSU1 was only
released for Linux platforms and was quite old).

Action: You can rollback GI PSU1 then apply Patch:9706490 on version 11.2.0.1.2
if your platform has 11.2.0.1.2 GI PSU. If your platform does not have 11.2.0.1.2GI PSU,
then apply patch 9413827.

Modify crsconfig_lib.pm

After patch is applied, modify $11.2.0.2_GI_HOME/crs/install/crsconfig_lib.pm:

Before the change:
# grep for bugs 9655006 or 9413827
@cmdout = grep(/(9655006|9413827)/, @output);

After the change:
# grep for bugs 9655006 or 9413827 or 9706490
@cmdout = grep(/(9655006|9413827|9706490)/, @output);

This would prevent rootupgrade.sh from failing when it validates the pre-requsite patches.

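Here we assume that no PSU has been applied to the 11.2.0.1 GI in our environment. To work around the "11201 TO 11202 ASM ROLLING UPGRADE - OLD CRS STACK FAILS TO STOP" bug and complete the GI rolling upgrade, the patch for bug 9413827 must be applied before the actual upgrade to the 11.2.0.2 patch set.

We also recommend using the latest OPatch tool, to avoid the problem of the OPatch shipped with 11.2.0.1 failing to recognize the relevant patches.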
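The patch choice described in the quoted note can be summarized as a small decision function. This is only a sketch: the function name `pick_prereq_patch` and its second argument (whether an 11.2.0.1.2 GI PSU exists for the platform) are hypothetical, the printed strings simply restate the note's wording, and nothing here calls opatch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: restates the quoted note's three cases.
#   $1 = patch level of the current GI home (11.2.0.1.0, 11.2.0.1.1, 11.2.0.1.2)
#   $2 = "yes" if an 11.2.0.1.2 GI PSU exists for the platform (default: yes)
pick_prereq_patch() {
  local gi_version="$1" has_psu2="${2:-yes}"
  case "$gi_version" in
    11.2.0.1.2)
      # GI PSU2 already applied: 9706490 is built on top of it.
      echo "apply 9706490" ;;
    11.2.0.1.0)
      # No GI PSU applied.
      if [ "$has_psu2" = "yes" ]; then
        echo "apply 9706490"
      else
        echo "apply 9413827"
      fi ;;
    11.2.0.1.1)
      # GI PSU1 applied (rare, per the note).
      if [ "$has_psu2" = "yes" ]; then
        echo "rollback GI PSU1, then apply 9706490"
      else
        echo "apply 9413827"
      fi ;;
    *)
      echo "not covered by the note" ;;
  esac
}

# Example call: pick_prereq_patch 11.2.0.1.0 no
```

The walkthrough in this article takes the 9413827 route on an 11.2.0.1.0 home.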

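So, to upgrade GI to 11.2.0.2, we first need to download three patches for the appropriate platform from MOS:

1.   11.2.0.2.0 PATCH SET FOR ORACLE DATABASE SERVER (patch ID 10098816). Note that the 11.2.0.2 patch set actually consists of as many as seven zip files, for example on Linux x86-64:

[Screenshot: download page for Patch 10098816, 11.2.0.2.0 PATCH SET FOR ORACLE DATABASE SERVER]

For the upgrade only zip files 1 through 3 are needed. Files 1 and 2 are the out-of-place patch set for the RDBMS database software, and file 3 is the out-of-place patch set for the Grid Infrastructure/CRS software. This article (which upgrades GI only) uses only p10098816_112020_Linux-x86-64_3of7.zip.

2.  Patch 9413827: 11201 TO 11202 ASM ROLLING UPGRADE - OLD CRS STACK FAILS TO STOP (patch ID 9413827)

3.  Patch 6880880: OPatch 11.2 (patch ID 6880880), the latest OPatch tool

2. Install the latest OPatch tool on all nodes. This step does not require stopping any services:

Switch to the GI owner, move the existing OPatch directory aside, and install the new OPatch into CRS_HOME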

su - grid

[grid@vrh1 ~]$ mv $CRS_HOME/OPatch $CRS_HOME/OPatch_old
[grid@vrh1 ~]$ unzip /tmp/p6880880_112000_Linux-x86-64.zip -d $CRS_HOME

Confirm the OPatch version

[grid@vrh1 ~]$ $CRS_HOME/OPatch/opatch
Invoking OPatch 11.2.0.1.6

Oracle Interim Patch Installer version 11.2.0.1.6
Copyright (c) 2011, Oracle Corporation.  All rights reserved.
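The version check can also be scripted against the banner shown above. The sketch below is an illustration only: both function names are hypothetical, the minimum version passed to `version_at_least` is whatever you decide to require (the article does not state one), and `sort -V` assumes GNU coreutils as found on Linux.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read OPatch banner text on stdin and print the
# version from the "Oracle Interim Patch Installer version ..." line.
opatch_version_from_output() {
  sed -n 's/^Oracle Interim Patch Installer version \([0-9.]*\).*$/\1/p' | head -n 1
}

# Hypothetical helper: succeed if dotted version $1 >= $2.
# Relies on GNU sort -V (version sort).
version_at_least() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example (run on a node where OPatch is installed):
#   v=$("$CRS_HOME"/OPatch/opatch version | opatch_version_from_output)
#   version_at_least "$v" 11.2.0.1.6 || echo "OPatch $v is older than expected"
```

With the values seen in this article, 11.2.0.1.6 (the newly unzipped OPatch) compares as newer than 11.2.0.1.1.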

3.  Apply the BUNDLE Patch for Base Bug 9413827 on all nodes in a rolling fashion:

1. Switch to the GI owner and check which patches are already installed

su - grid 

opatch lsinventory -detail -oh $CRS_HOME

Invoking OPatch 11.2.0.1.6

Oracle Interim Patch Installer version 11.2.0.1.6
Copyright (c) 2011, Oracle Corporation.  All rights reserved.

Oracle Home       : /g01/11.2.0/grid
Central Inventory : /g01/oraInventory
   from           : /etc/oraInst.loc
OPatch version    : 11.2.0.1.6
OUI version       : 11.2.0.1.0
Log file location : /g01/11.2.0/grid/cfgtoollogs/opatch/opatch2011-09-04_19-08-33PM.log

Lsinventory Output file location :
/g01/11.2.0/grid/cfgtoollogs/opatch/lsinv/lsinventory2011-09-04_19-08-33PM.txt

--------------------------------------------------------------------------------
Installed Top-level Products (1): 

Oracle Grid Infrastructure                                           11.2.0.1.0
There are 1 products installed in this Oracle Home.
........................
###########################################################################

2. Unzip the previously downloaded patch archive p9413827_112010_$platform.zip

 unzip p9413827_112010_Linux-x86-64.zip 

###########################################################################

3. Switch to the DB home owner and stop the resources running from the RDBMS DB home on the local node:

su - oracle

Syntax:
 % [RDBMS_HOME]/bin/srvctl stop home -o [RDBMS_HOME] -s [status file location] -n [node_name]

srvctl stop home -o $ORACLE_HOME  -n vrh1 -s stop_db_res           

cat stop_db_res
db-vprod


###########################################################################

4. Switch to root and run rootcrs.pl -unlock

[root@vrh1 ~]# $CRS_HOME/crs/install/rootcrs.pl -unlock 

2011-09-04 20:46:53: Parsing the host name
2011-09-04 20:46:53: Checking for super user privileges
2011-09-04 20:46:53: User has super user privileges
Using configuration parameter file: /g01/11.2.0/grid/crs/install/crsconfig_params
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'vrh1'
CRS-2673: Attempting to stop 'ora.crsd' on 'vrh1'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'vrh1'
CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'vrh1'
CRS-2673: Attempting to stop 'ora.SYSTEMDG.dg' on 'vrh1'
CRS-2673: Attempting to stop 'ora.registry.acfs' on 'vrh1'
CRS-2673: Attempting to stop 'ora.DATA.dg' on 'vrh1'
CRS-2673: Attempting to stop 'ora.FRA.dg' on 'vrh1'
CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'vrh1' succeeded
CRS-2673: Attempting to stop 'ora.vrh1.vip' on 'vrh1'
CRS-2677: Stop of 'ora.vrh1.vip' on 'vrh1' succeeded
CRS-2672: Attempting to start 'ora.vrh1.vip' on 'vrh2'
CRS-2677: Stop of 'ora.registry.acfs' on 'vrh1' succeeded
CRS-2676: Start of 'ora.vrh1.vip' on 'vrh2' succeeded
CRS-2677: Stop of 'ora.SYSTEMDG.dg' on 'vrh1' succeeded
CRS-2677: Stop of 'ora.FRA.dg' on 'vrh1' succeeded
CRS-2677: Stop of 'ora.DATA.dg' on 'vrh1' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'vrh1'
CRS-2677: Stop of 'ora.asm' on 'vrh1' succeeded
CRS-2673: Attempting to stop 'ora.ons' on 'vrh1'
CRS-2673: Attempting to stop 'ora.eons' on 'vrh1'
CRS-2677: Stop of 'ora.ons' on 'vrh1' succeeded
CRS-2673: Attempting to stop 'ora.net1.network' on 'vrh1'
CRS-2677: Stop of 'ora.net1.network' on 'vrh1' succeeded
CRS-2677: Stop of 'ora.eons' on 'vrh1' succeeded
CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'vrh1' has completed
CRS-2677: Stop of 'ora.crsd' on 'vrh1' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'vrh1'
CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'vrh1'
CRS-2673: Attempting to stop 'ora.ctssd' on 'vrh1'
CRS-2673: Attempting to stop 'ora.evmd' on 'vrh1'
CRS-2673: Attempting to stop 'ora.asm' on 'vrh1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'vrh1'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'vrh1'
CRS-2677: Stop of 'ora.cssdmonitor' on 'vrh1' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'vrh1' succeeded
CRS-2677: Stop of 'ora.evmd' on 'vrh1' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'vrh1' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'vrh1' succeeded
CRS-2677: Stop of 'ora.drivers.acfs' on 'vrh1' succeeded
CRS-2677: Stop of 'ora.asm' on 'vrh1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'vrh1'
CRS-2677: Stop of 'ora.cssd' on 'vrh1' succeeded
CRS-2673: Attempting to stop 'ora.diskmon' on 'vrh1'
CRS-2673: Attempting to stop 'ora.gipcd' on 'vrh1'
CRS-2677: Stop of 'ora.gipcd' on 'vrh1' succeeded
CRS-2677: Stop of 'ora.diskmon' on 'vrh1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'vrh1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
Successfully unlock /g01/11.2.0/grid

###########################################################################

5. As the RDBMS home owner, run the prepatch.sh script from the patch directory

su - oracle

% custom/server/9413827/custom/scripts/prepatch.sh -dbhome [RDBMS_HOME]

[oracle@vrh1 tmp]$ 9413827/custom/server/9413827/custom/scripts/prepatch.sh -dbhome $ORACLE_HOME

9413827/custom/server/9413827/custom/scripts/prepatch.sh completed successfully.

###########################################################################

6. Apply the patch

Run the following as the GI/CRS owner

 % opatch napply -local -oh [CRS_HOME] -id 9413827

su - grid

cd /tmp/9413827/

opatch napply -local -oh $CRS_HOME -id 9413827

Invoking OPatch 11.2.0.1.6

Oracle Interim Patch Installer version 11.2.0.1.6
Copyright (c) 2011, Oracle Corporation.  All rights reserved.

UTIL session

Oracle Home       : /g01/11.2.0/grid
Central Inventory : /g01/oraInventory
   from           : /etc/oraInst.loc
OPatch version    : 11.2.0.1.6
OUI version       : 11.2.0.1.0
Log file location : /g01/11.2.0/grid/cfgtoollogs/opatch/opatch2011-09-04_20-52-37PM.log

Verifying environment and performing prerequisite checks...
OPatch continues with these patches:   9413827  

Do you want to proceed? [y|n]
y
User Responded with: Y
All checks passed.
Provide your email address to be informed of security issues, install and
initiate Oracle Configuration Manager. Easier for you if you use your My
Oracle Support Email address/User Name.
Visit http://www.oracle.com/support/policies.html for details.
Email address/User Name: 

You have not provided an email address for notification of security issues.
Do you wish to remain uninformed of security issues ([Y]es, [N]o) [N]:  y

Please shutdown Oracle instances running out of this ORACLE_HOME on the local system.
(Oracle Home = '/g01/11.2.0/grid')

Is the local system ready for patching? [y|n]
y
User Responded with: Y
Backing up files...
Applying interim patch '9413827' to OH '/g01/11.2.0/grid'

Patching component oracle.crs, 11.2.0.1.0...
Patches 9413827 successfully applied.
Log file location: /g01/11.2.0/grid/cfgtoollogs/opatch/opatch2011-09-04_20-52-37PM.log

OPatch succeeded.

Run the following as the DB/RDBMS owner

su - oracle
cd /tmp/9413827/

% opatch napply custom/server/ -local -oh [RDBMS_HOME] -id 9413827

opatch napply custom/server/ -local -oh $ORACLE_HOME -id 9413827

Verifying the update...
Inventory check OK: Patch ID 9413827 is registered in Oracle Home inventory with proper meta-data.
Files check OK: Files from Patch ID 9413827 are present in Oracle Home.
Running make for target install
Running make for target install

The local system has been patched and can be restarted.

UtilSession: N-Apply done.

OPatch succeeded.

###########################################################################

7. Configure the home directories

Run the following as root

 chmod +w $CRS_HOME/log/[nodename]/agent
 chmod +w $CRS_HOME/log/[nodename]/agent/crsd

Run the following as the DB/RDBMS owner
su - oracle

 cd /tmp/9413827/

% custom/server/9413827/custom/scripts/postpatch.sh -dbhome [RDBMS_HOME]

[oracle@vrh1 9413827]$ custom/server/9413827/custom/scripts/postpatch.sh -dbhome $ORACLE_HOME
Reading /s01/orabase/product/11.2.0/dbhome_1/install/params.ora..
Reading /s01/orabase/product/11.2.0/dbhome_1/install/params.ora..
Parsing file /s01/orabase/product/11.2.0/dbhome_1/bin/racgwrap
Parsing file /s01/orabase/product/11.2.0/dbhome_1/bin/srvctl
Parsing file /s01/orabase/product/11.2.0/dbhome_1/bin/srvconfig
Parsing file /s01/orabase/product/11.2.0/dbhome_1/bin/cluvfy
Verifying file /s01/orabase/product/11.2.0/dbhome_1/bin/racgwrap
Verifying file /s01/orabase/product/11.2.0/dbhome_1/bin/srvctl
Verifying file /s01/orabase/product/11.2.0/dbhome_1/bin/srvconfig
Verifying file /s01/orabase/product/11.2.0/dbhome_1/bin/cluvfy
Reapplying file permissions on /s01/orabase/product/11.2.0/dbhome_1/bin/racgwrap
Reapplying file permissions on /s01/orabase/product/11.2.0/dbhome_1/bin/srvctl
Reapplying file permissions on /s01/orabase/product/11.2.0/dbhome_1/bin/srvconfig
Reapplying file permissions on /s01/orabase/product/11.2.0/dbhome_1/bin/cluvfy
Reapplying file permissions on /s01/orabase/product/11.2.0/dbhome_1/bin/racgmain
Reapplying file permissions on /s01/orabase/product/11.2.0/dbhome_1/bin/racgeut
Reapplying file permissions on /s01/orabase/product/11.2.0/dbhome_1/bin/diskmon.bin
Reapplying file permissions on /s01/orabase/product/11.2.0/dbhome_1/bin/lsnodes
Reapplying file permissions on /s01/orabase/product/11.2.0/dbhome_1/bin/osdbagrp
Reapplying file permissions on /s01/orabase/product/11.2.0/dbhome_1/bin/rawutl
Reapplying file permissions on /s01/orabase/product/11.2.0/dbhome_1/srvm/admin/ractrans
Reapplying file permissions on /s01/orabase/product/11.2.0/dbhome_1/srvm/admin/getcrshome
Reapplying file permissions on /s01/orabase/product/11.2.0/dbhome_1/bin/gnsd
Reapplying file permissions on /s01/orabase/product/11.2.0/dbhome_1/bin/crsdiag.pl
Reapplying file permissions on /s01/orabase/product/11.2.0/dbhome_1/lib/libhasgen11.so
Reapplying file permissions on /s01/orabase/product/11.2.0/dbhome_1/lib/libclsra11.so
Reapplying file permissions on /s01/orabase/product/11.2.0/dbhome_1/lib/libdbcfg11.so
Reapplying file permissions on /s01/orabase/product/11.2.0/dbhome_1/lib/libocr11.so
Reapplying file permissions on /s01/orabase/product/11.2.0/dbhome_1/lib/libocrb11.so
Reapplying file permissions on /s01/orabase/product/11.2.0/dbhome_1/lib/libocrutl11.so
Reapplying file permissions on /s01/orabase/product/11.2.0/dbhome_1/lib/libuini11.so
Reapplying file permissions on /s01/orabase/product/11.2.0/dbhome_1/lib/librdjni11.so
Reapplying file permissions on /s01/orabase/product/11.2.0/dbhome_1/lib/libgns11.so
Reapplying file permissions on /s01/orabase/product/11.2.0/dbhome_1/lib/libgnsjni11.so
Reapplying file permissions on /s01/orabase/product/11.2.0/dbhome_1/lib/libagfw11.so

###########################################################################

8. As root, restart the CRS stack

# $CRS_HOME/crs/install/rootcrs.pl -patch 

2011-09-04 21:03:32: Parsing the host name
2011-09-04 21:03:32: Checking for super user privileges
2011-09-04 21:03:32: User has super user privileges
Using configuration parameter file: /g01/11.2.0/grid/crs/install/crsconfig_params
CRS-4123: Oracle High Availability Services has been started.

# $ORACLE_HOME/bin/srvctl start home -o $ORACLE_HOME -s $STATUS_FILE -n nodename

###########################################################################

9. Use opatch to confirm that the patch was installed successfully

 opatch lsinventory -detail -oh $CRS_HOME
 opatch lsinventory -detail -oh $RDBMS_HOME

###########################################################################

10. Repeat the steps above on the other nodes until the patch has been installed on every node
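Because steps 3 through 8 are repeated on every node, it can help to print them as a checklist first. The following is a dry-run sketch only: `print_patch_plan` is a hypothetical name, it prints the commands from the steps above with your paths substituted and executes nothing, and the status file name `stop_db_res` is taken from the example in step 3. The opatch lines are meant to be run from the unzipped patch directory.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: print (do not execute) the per-node
# command sequence for patch 9413827, as walked through above.
#   $1 = node name, $2 = CRS home, $3 = RDBMS home, $4 = unzipped patch dir
print_patch_plan() {
  local node="$1" crs_home="$2" db_home="$3" patch_dir="$4"
  cat <<EOF
step 3: $db_home/bin/srvctl stop home -o $db_home -s stop_db_res -n $node
step 4: $crs_home/crs/install/rootcrs.pl -unlock
step 5: $patch_dir/custom/server/9413827/custom/scripts/prepatch.sh -dbhome $db_home
step 6: opatch napply -local -oh $crs_home -id 9413827
step 6: opatch napply custom/server/ -local -oh $db_home -id 9413827
step 7: chmod +w $crs_home/log/$node/agent
step 7: chmod +w $crs_home/log/$node/agent/crsd
step 7: $patch_dir/custom/server/9413827/custom/scripts/postpatch.sh -dbhome $db_home
step 8: $crs_home/crs/install/rootcrs.pl -patch
step 8: $db_home/bin/srvctl start home -o $db_home -s stop_db_res -n $node
EOF
}

# Example: print_patch_plan vrh2 /g01/11.2.0/grid "$ORACLE_HOME" /tmp/9413827
```

Which OS user runs each command is described in the individual steps; the sketch deliberately leaves that out.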

###########################################################################

Note that there are additional considerations on the AIX platform:

# Special Instruction for AIX
# ---------------------------
#
# During the application of this patch should you see any errors with regards
# to files being locked or opatch  being  unable to copy files then this
#
# could be as result of a process which requires termination or an additional
#
# file needing to be unloaded from the system cache.
#
#
# To try and identify the likely cause please execute the following  commands
#
# and provide the output to your support representative, who will be  able to
#
# identify the corrective steps.
#
#
#     genld -l | grep [CRS_HOME]
#
#     genkld | grep [CRS_HOME]    ( full or partial path will do )
#
#
# Simple Case Resolution:
#
# If genld returns data then a currently executing process has something open
# in
# the [CRS_HOME] directory, please terminate the process as
# required/recommended.
#
#
#  If genkld return data then please remove the enteries from the
#  OS system cache by using the slibclean command as root;
#
#
#     slibclean
#
###########################################################################
#
#  Patch Deinstallation Instructions:
#  ----------------------------------
#
#  To roll back the patch, follow all of the above steps 1-5. In step 6,
#  invoke the following opatch commands to roll back the patch in all homes.
#
#  % opatch rollback -id 9413827 -local -oh [CRS_HOME]
#
#  % opatch rollback -id 9413827 -local -oh [RDBMS_HOME]
#
#  Afterwards, continue with steps 7-9 to complete the procedure.
#
###########################################################################
#
#  If you have any problems installing this PSE or are not sure
#  about inventory setup please call Oracle support.
#
###########################################################################

 

Upgrading GI to 11.2.0.2

 

1. Unzip the software; as noted above, the third zip file contains the grid software

unzip p10098816_112020_Linux-x86-64_3of7.zip

 

2. As the GI owner, launch the GI/CRS OUI installer and choose an out-of-place installation directory

(grid)$ unset ORACLE_HOME ORACLE_BASE ORACLE_SID
(grid)$ export DISPLAY=:0
(grid)$ cd /u01/app/oracle/patchdepot/grid
(grid)$ ./runInstaller
Starting Oracle Universal Installer…

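On the "Select Installation Options" screen, choose Upgrade Oracle Grid Infrastructure or Oracle Automatic Storage Management

[Screenshot: upgrade_110202_GI]

[Screenshot: upgrade_110202_GI_a]

Choose a directory different from the existing GI software home

[Screenshot: upgrade_110202_GI_b]

When the installation completes, OUI prompts you to run rootupgrade.sh as root

[Screenshot: upgrade_110202_GI_c]

3. Note that before rootupgrade.sh is run, database services are available on all nodes. While rootupgrade.sh runs, CRS on the local node is shut down briefly, which means that at least one node is unavailable at a time during the rolling upgrade

Because of unpublished bug 10011084 and unpublished bug 10128494, the crsconfig_lib.pm file must be modified before rootupgrade.sh is run, as follows: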

cp $NEW_CRS_HOME/crs/install/crsconfig_lib.pm $NEW_CRS_HOME/crs/install/crsconfig_lib.pm.bak
vi $NEW_CRS_HOME/crs/install/crsconfig_lib.pm

Change the following lines in that file, and confirm the result with diff

From
 @cmdout = grep(/$bugid/, @output);
To
  @cmdout = grep(/(9655006|9413827)/, @output);

From
my @exp_func = qw(check_CRSConfig validate_olrconfig validateOCR
To
my @exp_func = qw(check_CRSConfig validate_olrconfig validateOCR read_file

$ diff crsconfig_lib.pm.bak crsconfig_lib.pm
699c699
< my @exp_func = qw(check_CRSConfig validate_olrconfig validateOCR
---
> my @exp_func = qw(check_CRSConfig validate_olrconfig validateOCR read_file
13277c13277
< @cmdout = grep(/$bugid/, @output);
---
> @cmdout = grep(/(9655006|9413827)/, @output);

cp /g01/11.2.0.2/grid/crs/install/crsconfig_lib.pm /g01/11.2.0.2/grid/crs/install/crsconfig_lib.pm.bak

Then copy the modified file to all nodes
scp /g01/11.2.0.2/grid/crs/install/crsconfig_lib.pm vrh2:/g01/11.2.0.2/grid/crs/install/crsconfig_lib.pm
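Instead of editing in vi, the same two changes can be applied with sed. This is a sketch under assumptions: `patch_crsconfig_lib` is a hypothetical name, `sed -i` is the GNU sed form available on Linux, and the patterns assume the two lines look exactly like the "From" lines quoted above (apart from leading or trailing whitespace). Always confirm the result with diff afterwards.

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply the two crsconfig_lib.pm edits described
# above, keeping a .bak copy of the original file next to it.
patch_crsconfig_lib() {
  local file="$1"
  cp "$file" "$file.bak"
  sed -i \
    -e 's#grep(/\$bugid/, @output)#grep(/(9655006|9413827)/, @output)#' \
    -e '/^my @exp_func = qw(check_CRSConfig validate_olrconfig validateOCR[[:space:]]*$/ s/[[:space:]]*$/ read_file/' \
    "$file"
}

# Example:
#   patch_crsconfig_lib "$NEW_CRS_HOME/crs/install/crsconfig_lib.pm"
#   diff "$NEW_CRS_HOME/crs/install/crsconfig_lib.pm.bak" \
#        "$NEW_CRS_HOME/crs/install/crsconfig_lib.pm"
```

Neither expression matches once a line has been changed, so running the sed step a second time leaves the file contents untouched; note, however, that it would overwrite the .bak copy with the already-modified file.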

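If that seems like too much trouble, you can also download an already-modified crsconfig_lib.pm directly from here

Because of bug 10056593 and bug 10241443, the following errors may also appear while rootupgrade.sh is running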

Due to bug 10056593, rootupgrade.sh will report this error and continue. This error is ignorable.

Failed to add (property/value):('OLD_OCR_ID/'-1') for checkpoint:ROOTCRS_OLDHOMEINFO.Error code is 256

Due to bug 10241443, rootupgrade.sh may report the following error when installing the cvuqdisk package.
This error is ignorable.

    ls: /usr/sbin/smartctl: No such file or directory
    /usr/sbin/smartctl not found.

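The errors above can be ignored and do not affect the upgrade.

4. Run the rootupgrade.sh script; it is advisable to start with the more heavily loaded node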

[root@vrh1 grid]# /g01/11.2.0.2/grid/rootupgrade.sh
Running Oracle 11g root script...

The following environment variables are set as:
ORACLE_OWNER= grid
ORACLE_HOME= /g01/11.2.0.2/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The contents of "oraenv" have not changed. No need to overwrite.
The contents of "coraenv" have not changed. No need to overwrite.

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /g01/11.2.0.2/grid/crs/install/crsconfig_params
Creating trace directory
Failed to add (property/value):('OLD_OCR_ID/'-1') for checkpoint:ROOTCRS_OLDHOMEINFO.Error code is 256

ASM upgrade has started on first node.

CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'vrh1'
CRS-2673: Attempting to stop 'ora.crsd' on 'vrh1'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'vrh1'
CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'vrh1'
CRS-2673: Attempting to stop 'ora.SYSTEMDG.dg' on 'vrh1'
CRS-2673: Attempting to stop 'ora.registry.acfs' on 'vrh1'
CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'vrh1' succeeded
CRS-2673: Attempting to stop 'ora.vrh1.vip' on 'vrh1'
CRS-2677: Stop of 'ora.vrh1.vip' on 'vrh1' succeeded
CRS-2672: Attempting to start 'ora.vrh1.vip' on 'vrh2'
CRS-2677: Stop of 'ora.registry.acfs' on 'vrh1' succeeded
CRS-2676: Start of 'ora.vrh1.vip' on 'vrh2' succeeded
CRS-2677: Stop of 'ora.SYSTEMDG.dg' on 'vrh1' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'vrh1'
CRS-2677: Stop of 'ora.asm' on 'vrh1' succeeded
CRS-2673: Attempting to stop 'ora.ons' on 'vrh1'
CRS-2673: Attempting to stop 'ora.eons' on 'vrh1'
CRS-2677: Stop of 'ora.ons' on 'vrh1' succeeded
CRS-2673: Attempting to stop 'ora.net1.network' on 'vrh1'
CRS-2677: Stop of 'ora.net1.network' on 'vrh1' succeeded
CRS-2677: Stop of 'ora.eons' on 'vrh1' succeeded
CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'vrh1' has completed
CRS-2677: Stop of 'ora.crsd' on 'vrh1' succeeded
CRS-2673: Attempting to stop 'ora.mdnsd' on 'vrh1'
CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'vrh1'
CRS-2673: Attempting to stop 'ora.ctssd' on 'vrh1'
CRS-2673: Attempting to stop 'ora.evmd' on 'vrh1'
CRS-2673: Attempting to stop 'ora.asm' on 'vrh1'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'vrh1'
CRS-2677: Stop of 'ora.cssdmonitor' on 'vrh1' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'vrh1' succeeded
CRS-2677: Stop of 'ora.evmd' on 'vrh1' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'vrh1' succeeded
CRS-2677: Stop of 'ora.drivers.acfs' on 'vrh1' succeeded
CRS-2677: Stop of 'ora.asm' on 'vrh1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'vrh1'
CRS-2677: Stop of 'ora.cssd' on 'vrh1' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'vrh1'
CRS-2673: Attempting to stop 'ora.diskmon' on 'vrh1'
CRS-2677: Stop of 'ora.diskmon' on 'vrh1' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'vrh1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'vrh1'
CRS-2677: Stop of 'ora.gipcd' on 'vrh1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'vrh1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
Successfully deleted 1 keys from OCR.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
OLR initialization - successful
Adding daemon to inittab
ACFS-9200: Supported
ACFS-9300: ADVM/ACFS distribution files found.
ACFS-9312: Existing ADVM/ACFS installation detected.
ACFS-9314: Removing previous ADVM/ACFS installation.
ACFS-9315: Previous ADVM/ACFS components successfully removed.
ACFS-9307: Installing requested ADVM/ACFS software.
ACFS-9308: Loading installed ADVM/ACFS drivers.
ACFS-9321: Creating udev for ADVM/ACFS.
ACFS-9323: Creating module dependencies - this may take some time.
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9309: ADVM/ACFS installation correctness verified.
clscfg: EXISTING configuration version 5 detected.
clscfg: version 5 is 11g Release 2.
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Preparing packages for installation...
cvuqdisk-1.0.9-1
Configure Oracle Grid Infrastructure for a Cluster ... succeeded

 

The node on which rootupgrade.sh is run last shows the following messages, indicating that GI/CRS was upgraded successfully:

 

Successfully deleted 1 keys from OCR.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
OLR initialization - successful
Adding daemon to inittab
ACFS-9200: Supported
ACFS-9300: ADVM/ACFS distribution files found.
ACFS-9312: Existing ADVM/ACFS installation detected.
ACFS-9314: Removing previous ADVM/ACFS installation.
ACFS-9315: Previous ADVM/ACFS components successfully removed.
ACFS-9307: Installing requested ADVM/ACFS software.
ACFS-9308: Loading installed ADVM/ACFS drivers.
ACFS-9321: Creating udev for ADVM/ACFS.
ACFS-9323: Creating module dependencies - this may take some time.
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9309: ADVM/ACFS installation correctness verified.
clscfg: EXISTING configuration version 5 detected.
clscfg: version 5 is 11g Release 2.
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Started to upgrade the Oracle Clusterware. This operation may take a few minutes.
Started to upgrade the CSS.
Started to upgrade the CRS.
The CRS was successfully upgraded.
Oracle Clusterware operating version was successfully set to 11.2.0.2.0

ASM upgrade has finished on last node.

Preparing packages for installation...
cvuqdisk-1.0.9-1
Configure Oracle Grid Infrastructure for a Cluster ... succeeded

5. Confirm the GI/CRS version

su - grid

$ crsctl query crs activeversion
Oracle Clusterware active version on the cluster is [11.2.0.2.0]


/g01/11.2.0.2/grid/OPatch/opatch lsinventory -oh /g01/11.2.0.2/grid
Invoking OPatch 11.2.0.1.1

Oracle Interim Patch Installer version 11.2.0.1.1
Copyright (c) 2009, Oracle Corporation.  All rights reserved.

Oracle Home       : /g01/11.2.0.2/grid
Central Inventory : /g01/oraInventory
   from           : /etc/oraInst.loc
OPatch version    : 11.2.0.1.1
OUI version       : 11.2.0.2.0
OUI location      : /g01/11.2.0.2/grid/oui
Log file location : /g01/11.2.0.2/grid/cfgtoollogs/opatch/opatch2011-09-05_02-17-19AM.log

Patch history file: /g01/11.2.0.2/grid/cfgtoollogs/opatch/opatch_history.txt

Lsinventory Output file location : /g01/11.2.0.2/grid/cfgtoollogs/opatch/lsinv/lsinventory2011-09-05_02-17-19AM.txt

--------------------------------------------------------------------------------
Installed Top-level Products (1): 

Oracle Grid Infrastructure                                           11.2.0.2.0
There are 1 products installed in this Oracle Home.
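The active-version check is easy to script, for example as a post-upgrade assertion. A minimal sketch, assuming the output line has exactly the form shown above; `active_version_from_output` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `crsctl query crs activeversion` output on
# stdin and print the version found between the square brackets.
active_version_from_output() {
  sed -n 's/^Oracle Clusterware active version on the cluster is \[\([0-9.]*\)\].*$/\1/p'
}

# Example (run as the GI owner on a cluster node):
#   v=$(crsctl query crs activeversion | active_version_from_output)
#   [ "$v" = "11.2.0.2.0" ] || echo "unexpected active version: $v"
```

The active version only changes to 11.2.0.2.0 after rootupgrade.sh has completed on the last node, as the output in step 4 shows.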

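6. Update .bash_profile so that CRS_HOME, ORACLE_HOME, PATH and related variables point to the new GI home

An INS-35354 Case During an 11gR2 RAC Installation

While installing an 11.2.0.2 RAC database today, I ran into error INS-35354:
[Screenshot: 11gR2-GI-INS-35354]

Since the 11.2.0.2 GI had already been installed successfully and everything about the cluster looked healthy, the error came as something of a surprise: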

[grid@vrh1 ~]$ crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online

A search on MOS suggested that the cause might be an incorrectly updated inventory.xml in oraInventory:

Applies to:
Oracle Server - Enterprise Edition - Version: 11.2.0.1 to 11.2.0.2 - Release: 11.2 to 11.2
Information in this document applies to any platform.
Symptoms

Installing 11gR2 database software in a Grid Infrastructure environment fails with the error INS-35354:

The system on which you are attempting to install Oracle RAC is not part of a valid cluster.

Grid Infrastructure (Oracle Clusterware) is running on all nodes in the cluster which can be verified with:

crsctl check crs

Changes
This is a new install.
Cause
As per 11gR2 documentation the error description is:

INS-35354: The system on which you are attempting to install Oracle RAC is not part of a valid cluster.

Cause: Prior to installing Oracle RAC, you must create a valid cluster. 
This is done by deploying Grid Infrastructure software, 
which will allow configuration of Oracle Clusterware and Automatic Storage Management.

However, the problem at hand may be that the central inventory is missing the "CRS=true" flag 
(for the Grid Infrastructure Home).
<inventory.xml>
-------------

<HOME_LIST>
<HOME NAME="Ora11g_gridinfrahome1" LOC="/u01/grid" TYPE="O" IDX="1">
<NODE_LIST>
<NODE NAME="node1"/>
<NODE NAME="node2"/>
</NODE_LIST>

 -------------

From the inventory.xml, we see that the HOME NAME line is missing the CRS="true" flag.

The error INS-35354 will occur when the central inventory entry for the Grid Infrastructure 
home is missing the flag that identifies it as CRS-type home.
Solution
Use the -updateNodeList option for the installer command to fix the inventory.

The full syntax is:

./runInstaller -updateNodeList "CLUSTER_NODES={node1,node2}"
ORACLE_HOME="" ORACLE_HOME_NAME="" LOCAL_NODE="Node_Name" CRS=[true|false]

Execute the command on any node in the cluster.

Examples:

For a two-node RAC cluster on UNIX:

Node1:
cd /u01/grid/oui/bin
./runInstaller -updateNodeList "CLUSTER_NODES={node1,node2}" ORACLE_HOME="/u01/crs" 
ORACLE_HOME_NAME="GI_11201" LOCAL_NODE="node1" CRS=true

For a 2-node RAC cluster on Windows:

Node 1:
cd e:\app\11.2.0\grid\oui\bin
e:\app\11.2.0\grid\oui\bin\setup -updateNodeList "CLUSTER_NODES={RACNODE1,RACNODE2}" 
ORACLE_HOME="e:\app\11.2.0\grid" ORACLE_HOME_NAME="OraCrs11g_home1" LOCAL_NODE="RACNODE1" CRS=true

The inventory.xml in my environment looks like this:

[grid@vrh1 ContentsXML]$ cat inventory.xml 
<?xml version="1.0" standalone="yes" ?>
<!-- Copyright (c) 1999, 2010, Oracle. All rights reserved. -->
<!-- Do not modify the contents of this file by hand. -->
<INVENTORY>
<VERSION_INFO>
   <SAVED_WITH>11.2.0.2.0</SAVED_WITH>
   <MINIMUM_VER>2.1.0.6.0</MINIMUM_VER>
</VERSION_INFO>
<HOME_LIST>
<HOME NAME="Ora11g_gridinfrahome1" LOC="/g01/11.2.0/grid" TYPE="O" IDX="1" >
   <NODE_LIST>
      <NODE NAME="vrh1"/>
      <NODE NAME="vrh2"/>
   </NODE_LIST>
</HOME>
</HOME_LIST>
</INVENTORY>

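Clearly the <HOME NAME entry is missing the CRS="true" flag, which makes OUI conclude during its checks that GI is not installed on this node.

The fix is actually simple: add CRS="true" and restart runInstaller. There is no need for the more involved runInstaller -updateNodeList command described in the note.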

[grid@vrh1 ContentsXML]$ cat /g01/oraInventory/ContentsXML/inventory.xml 
<?xml version="1.0" standalone="yes" ?>
<!-- Copyright (c) 1999, 2010, Oracle. All rights reserved. -->
<!-- Do not modify the contents of this file by hand. -->
<INVENTORY>
<VERSION_INFO>
   <SAVED_WITH>11.2.0.2.0</SAVED_WITH>
   <MINIMUM_VER>2.1.0.6.0</MINIMUM_VER>
</VERSION_INFO>
<HOME_LIST>
<HOME NAME="Ora11g_gridinfrahome1" LOC="/g01/11.2.0/grid" TYPE="O" IDX="1" CRS="true">
   <NODE_LIST>
      <NODE NAME="vrh1"/>
      <NODE NAME="vrh2"/>
   </NODE_LIST>
</HOME>
</HOME_LIST>
</INVENTORY>

With this change the problem is resolved and the installer screen displays normally:
[Screenshot: 11gr2-RAC-Installing-db-step-4-10]
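
Before restarting the installer, the missing flag can be confirmed with a small script. This is only a sketch: it works on a trimmed copy of the inventory.xml listing above, and the function name homes_missing_crs_flag is my own, not part of any Oracle tool.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the inventory.xml shown above, before the manual fix.
SAMPLE_INVENTORY = """<?xml version="1.0" standalone="yes" ?>
<INVENTORY>
<HOME_LIST>
<HOME NAME="Ora11g_gridinfrahome1" LOC="/g01/11.2.0/grid" TYPE="O" IDX="1" >
   <NODE_LIST>
      <NODE NAME="vrh1"/>
      <NODE NAME="vrh2"/>
   </NODE_LIST>
</HOME>
</HOME_LIST>
</INVENTORY>
"""


def homes_missing_crs_flag(inventory_xml, grid_home):
    """Return the names of HOME entries at grid_home that lack CRS="true"."""
    root = ET.fromstring(inventory_xml)
    missing = []
    for home in root.iter("HOME"):
        if home.get("LOC") == grid_home and home.get("CRS") != "true":
            missing.append(home.get("NAME"))
    return missing


print(homes_missing_crs_flag(SAMPLE_INVENTORY, "/g01/11.2.0/grid"))
```

On a real system the text would be read from ContentsXML/inventory.xml under the inventory directory instead of the embedded sample.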

crsctl status resource -t -init in 11.2.0.2 grid infrastructure

In 11.2.0.2 Grid Infrastructure, crsctl stat res no longer lists the lower-level resources such as ora.cssd, ora.ctssd and ora.diskmon. To see the state of these resources, add the -init option:

[grid@rh2 ~]$ crsctl query crs activeversion
Oracle Clusterware active version on the cluster is [11.2.0.2.0]

[grid@rh2 ~]$ crsctl stat res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  ONLINE       rh2
ora.LISTENER.lsnr
               OFFLINE OFFLINE      rh2
ora.asm
               ONLINE  ONLINE       rh2
ora.gsd
               OFFLINE OFFLINE      rh2
ora.net1.network
               ONLINE  ONLINE       rh2
ora.ons
               ONLINE  ONLINE       rh2
ora.registry.acfs
               OFFLINE OFFLINE      rh2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        OFFLINE OFFLINE
ora.cvu
      1        OFFLINE OFFLINE
ora.dw.db
      1        OFFLINE OFFLINE
ora.maclean.db
      1        OFFLINE OFFLINE
ora.oc4j
      1        OFFLINE OFFLINE
ora.prod.db
      1        OFFLINE OFFLINE
      2        OFFLINE OFFLINE
ora.prod.maclean.svc
      1        OFFLINE OFFLINE
      2        OFFLINE OFFLINE
ora.prod.maclean_pre.svc
      1        OFFLINE OFFLINE
      2        OFFLINE OFFLINE
ora.prod.maclean_pre_preconnect.svc
      1        OFFLINE OFFLINE
ora.prod.maclean_taf.svc
      1        OFFLINE OFFLINE
      2        OFFLINE OFFLINE
ora.rh2.vip
      1        OFFLINE OFFLINE
ora.rh3.vip
      1        OFFLINE OFFLINE
ora.scan1.vip
      1        OFFLINE OFFLINE                                       

[grid@rh2 ~]$ crsctl stat res -t -init 
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       rh2                      Started
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       rh2
ora.crf
      1        ONLINE  ONLINE       rh2
ora.crsd
      1        ONLINE  ONLINE       rh2
ora.cssd
      1        ONLINE  ONLINE       rh2
ora.cssdmonitor
      1        ONLINE  ONLINE       rh2
ora.ctssd
      1        ONLINE  ONLINE       rh2                      OBSERVER
ora.diskmon
      1        ONLINE  ONLINE       rh2
ora.drivers.acfs
      1        ONLINE  OFFLINE
ora.evmd
      1        ONLINE  ONLINE       rh2
ora.gipcd
      1        ONLINE  ONLINE       rh2
ora.gpnpd
      1        ONLINE  ONLINE       rh2
ora.mdnsd
      1        ONLINE  ONLINE       rh2
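
In a listing like the one above, the interesting rows are those whose STATE differs from TARGET (here ora.drivers.acfs). The following sketch picks them out of the tabular text. It assumes resource names start with "ora." and that each status row carries TARGET and STATE as its first two columns after an optional instance number. The function names are hypothetical.

```python
STATES = {"ONLINE", "OFFLINE", "INTERMEDIATE", "UNKNOWN"}

# A few rows copied from the -init listing above.
SAMPLE = """\
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.ctssd
      1        ONLINE  ONLINE       rh2                      OBSERVER
ora.drivers.acfs
      1        ONLINE  OFFLINE
ora.evmd
      1        ONLINE  ONLINE       rh2
"""


def parse_tabular_status(text):
    """Return (resource, target, state) tuples from `crsctl stat res -t` style text."""
    rows = []
    current = None
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if line.startswith("ora."):
            current = tokens[0]  # a resource name line
            continue
        if current is None:
            continue  # header lines before the first resource
        if tokens[0].isdigit():
            tokens = tokens[1:]  # drop the instance number column
        if len(tokens) >= 2 and tokens[0] in STATES and tokens[1] in STATES:
            rows.append((current, tokens[0], tokens[1]))
    return rows


def mismatched(rows):
    """Keep only the rows whose state differs from the target."""
    return [row for row in rows if row[1] != row[2]]


print(mismatched(parse_tabular_status(SAMPLE)))
```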

Likewise, in 11.2.0.2 GI the -init option is required to start, stop, or modify these init resources. Without it the command fails with CRS-2613: Could not find resource:

[grid@rh2 ~]$ crsctl stat res ora.asm
NAME=ora.asm
TYPE=ora.asm.type
TARGET=ONLINE
STATE=ONLINE on rh2

[grid@rh2 ~]$ crsctl modify res ora.asm -attr AUTO_START=never

[grid@rh2 ~]$ crsctl stat res ora.asm -p
NAME=ora.asm
TYPE=ora.asm.type
ACL=owner:grid:rwx,pgrp:oinstall:rwx,other::r--
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
AGENT_FILENAME=%CRS_HOME%/bin/oraagent%CRS_EXE_SUFFIX%
ALIAS_NAME=ora.%CRS_CSS_NODENAME%.ASM%CRS_CSS_NODENUMBER%.asm
AUTO_START=never
CHECK_INTERVAL=60
CHECK_TIMEOUT=30
DEFAULT_TEMPLATE=PROPERTY(RESOURCE_CLASS=asm) ELEMENT(INSTANCE_NAME= %GEN_USR_ORA_INST_NAME%)
DEGREE=1
DESCRIPTION=Oracle ASM resource
ENABLED=1
GEN_USR_ORA_INST_NAME=
GEN_USR_ORA_INST_NAME@SERVERNAME(rh2)=+ASM1
GEN_USR_ORA_INST_NAME@SERVERNAME(rh3)=+ASM2
LOAD=1
LOGGING_LEVEL=1
NLS_LANG=
NOT_RESTARTING_TEMPLATE=
OFFLINE_CHECK_INTERVAL=0
PROFILE_CHANGE_TEMPLATE=
RESTART_ATTEMPTS=5
SCRIPT_TIMEOUT=60
START_DEPENDENCIES=weak(ora.LISTENER.lsnr)
START_TIMEOUT=900
STATE_CHANGE_TEMPLATE=
STOP_DEPENDENCIES=
STOP_TIMEOUT=600
TYPE_VERSION=1.2
UPTIME_THRESHOLD=1d
USR_ORA_ENV=
USR_ORA_INST_NAME=+ASM%CRS_CSS_NODENUMBER%
USR_ORA_OPEN_MODE=mount
USR_ORA_OPI=false
USR_ORA_STOP_MODE=immediate
VERSION=11.2.0.2.0

[grid@rh2 ~]$ crsctl status resource  -init -t|grep -v ONLINE|tail -13
ora.asm
ora.cluster_interconnect.haip
ora.crf
ora.crsd
ora.cssd
ora.cssdmonitor
ora.ctssd
ora.diskmon
ora.drivers.acfs
ora.evmd
ora.gipcd
ora.gpnpd
ora.mdnsd

[grid@rh2 ~]$ crsctl status resource  -init -t|grep -v ONLINE|tail -13|xargs crsctl status resource
CRS-2613: Could not find resource 'ora.cluster_interconnect.haip'.
CRS-2613: Could not find resource 'ora.crf'.
CRS-2613: Could not find resource 'ora.crsd'.
CRS-2613: Could not find resource 'ora.cssd'.
CRS-2613: Could not find resource 'ora.cssdmonitor'.
CRS-2613: Could not find resource 'ora.ctssd'.
CRS-2613: Could not find resource 'ora.diskmon'.
CRS-2613: Could not find resource 'ora.drivers.acfs'.
CRS-2613: Could not find resource 'ora.evmd'.
CRS-2613: Could not find resource 'ora.gipcd'.
CRS-2613: Could not find resource 'ora.gpnpd'.
CRS-2613: Could not find resource 'ora.mdnsd'.
NAME=ora.asm
TYPE=ora.asm.type
TARGET=ONLINE
STATE=ONLINE on rh2
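
The xargs pipeline above fails for every name except ora.asm because -init is not passed along. Here is a sketch of building one command per resource with -init appended. The helper names are my own, and the commands are only constructed here, not executed.

```python
# Resource names as printed by the grep/tail pipeline above (shortened).
NAMES_OUTPUT = """\
ora.asm
ora.crsd
ora.cssd
ora.ctssd
"""


def init_resource_names(text):
    """Collect resource names, one per line, ignoring anything else."""
    return [line.strip() for line in text.splitlines() if line.startswith("ora.")]


def status_commands(names):
    """Build `crsctl status resource <name> -init` argument lists."""
    return [["crsctl", "status", "resource", name, "-init"] for name in names]


for argv in status_commands(init_resource_names(NAMES_OUTPUT)):
    print(" ".join(argv))
```

In the transcript, ora.asm is the one name that resolves without -init, since a resource of that name also appears in the plain crsctl stat res -t listing.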

[grid@rh2 ~]$ crsctl status res ora.crsd -init -p
NAME=ora.crsd
TYPE=ora.crs.type
ACL=owner:root:rw-,pgrp:oinstall:rw-,other::r--,user:grid:r-x
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
ACTIVE_PLACEMENT=0
AGENT_FILENAME=%CRS_HOME%/bin/orarootagent%CRS_EXE_SUFFIX%
AUTO_START=always
CARDINALITY=1
CHECK_ARGS=
CHECK_COMMAND=
CHECK_INTERVAL=30
CLEAN_ARGS=
CLEAN_COMMAND=
DAEMON_LOGGING_LEVELS=AGENT=1,AGFW=0,CLSFRAME=0,CLSVER=0,CLUCLS=0,COMMCRS=0,COMMNS=0,CRSAPP=0,CRSCCL=0,CRSCEVT=0,CRSCOMM=0,CRSD=0,CRSEVT=0,CRSMAIN=0,CRSOCR=0,CRSPE=0,CRSPLACE=0,CRSRES=0,CRSRPT=0,CRSRTI=0,CRSSE=0,CRSSEC=0,CRSTIMER=0,CRSUI=0,CSSCLNT=0,SuiteTes=1,UiServer=0,OCRAPI=1,OCRCLI=1,OCRSRV=1,OCRMAS=1,OCRMSG=1,OCRCAC=1,OCRRAW=1,OCRUTL=1,OCROSD=1,OCRASM=1
DAEMON_TRACING_LEVELS=AGENT=0,AGFW=0,CLSFRAME=0,CLSVER=0,CLUCLS=0,COMMCRS=0,COMMNS=0,CRSAPP=0,CRSCCL=0,CRSCEVT=0,CRSCOMM=0,CRSD=0,CRSEVT=0,CRSMAIN=0,CRSOCR=0,CRSPE=0,CRSPLACE=0,CRSRES=0,CRSRPT=0,CRSRTI=0,CRSSE=0,CRSSEC=0,CRSTIMER=0,CRSUI=0,CSSCLNT=0,SuiteTes=0,UiServer=0,OCRAPI=1,OCRCLI=1,OCRSRV=1,OCRMAS=1,OCRMSG=1,OCRCAC=1,OCRRAW=1,OCRUTL=1,OCROSD=1,OCRASM=1
DEFAULT_TEMPLATE=
DEGREE=1
DESCRIPTION="Resource type for CRSD"
DETACHED=true
ENABLED=1
FAILOVER_DELAY=0
FAILURE_INTERVAL=3
FAILURE_THRESHOLD=5
HOSTING_MEMBERS=
LOAD=1
LOGGING_LEVEL=1
NOT_RESTARTING_TEMPLATE=
OFFLINE_CHECK_INTERVAL=0
ORA_VERSION=11.2.0.2.0
PID_FILE=
PLACEMENT=balanced
PROCESS_TO_MONITOR=
PROFILE_CHANGE_TEMPLATE=
RESTART_ATTEMPTS=10
SCRIPT_TIMEOUT=60
SERVER_POOLS=
START_ARGS=
START_COMMAND=
START_DEPENDENCIES=hard(ora.asm,ora.cssd,ora.ctssd,ora.gipcd)pullup(ora.asm,ora.cssd,ora.ctssd,ora.gipcd)
START_TIMEOUT=600
STATE_CHANGE_TEMPLATE=
STOP_ARGS=
STOP_COMMAND=
STOP_DEPENDENCIES=hard(shutdown:ora.asm,intermediate:ora.cssd,intermediate:ora.gipcd)
STOP_MODE=NONE
STOP_TIMEOUT=43200
UPTIME_THRESHOLD=1m
USR_ORA_ENV=

[grid@rh2 ~]$ crsctl modify res ora.crsd -init -attr "SCRIPT_TIMEOUT"=65   
CRS-0245:  User doesn't have enough privilege to perform the operation
CRS-4000: Command Modify failed, or completed with errors.

/* Modifying the attributes of some resources requires root privileges */

[root@rh2 ~]# crsctl modify res ora.crsd -init -attr "SCRIPT_TIMEOUT"=65 

[root@rh2 ~]# crsctl status res ora.crsd -init -p|grep SCRIPT_TIMEOUT
SCRIPT_TIMEOUT=65

[root@rh2 ~]# crsctl status res ora.ctssd -p -init
NAME=ora.ctssd
TYPE=ora.ctss.type
ACL=owner:root:rw-,pgrp:oinstall:rw-,other::r--,user:grid:r-x
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
ACTIVE_PLACEMENT=0
AGENT_FILENAME=%CRS_HOME%/bin/orarootagent%CRS_EXE_SUFFIX%
AUTO_START=always
CARDINALITY=1
CHECK_ARGS=
CHECK_COMMAND=
CHECK_INTERVAL=30
CLEAN_ARGS=
CLEAN_COMMAND=
DAEMON_LOGGING_LEVELS=CLUCLS=0,CSSCLNT=0,CRSCCL=1,CTSS=5,OCRAPI=1,OCRCLI=1,OCRMSG=1
DAEMON_TRACING_LEVELS=CLUCLS=0,CSSCLNT=0,CRSCCL=1,CTSS=5,OCRAPI=1,OCRCLI=1,OCRMSG=1
DEFAULT_TEMPLATE=
DEGREE=1
DESCRIPTION="Resource type for Ctss Agents"
DETACHED=true
ENABLED=1
FAILOVER_DELAY=0
FAILURE_INTERVAL=3
FAILURE_THRESHOLD=5
HOSTING_MEMBERS=
LOAD=1
LOGGING_LEVEL=1
NOT_RESTARTING_TEMPLATE=
OFFLINE_CHECK_INTERVAL=0
ORA_VERSION=11.2.0.2.0
PID_FILE=
PLACEMENT=balanced
PROCESS_TO_MONITOR=
PROFILE_CHANGE_TEMPLATE=
RESTART_ATTEMPTS=5
SCRIPT_TIMEOUT=60
SERVER_POOLS=
START_ARGS=
START_COMMAND=
START_DEPENDENCIES=hard(ora.cssd,ora.gipcd)pullup(ora.cssd,ora.gipcd)
START_TIMEOUT=60
STATE_CHANGE_TEMPLATE=
STOP_ARGS=
STOP_COMMAND=
STOP_DEPENDENCIES=hard(ora.cssd,ora.gipcd)
STOP_TIMEOUT=60
UPTIME_THRESHOLD=1m
USR_ORA_ENV=

[root@rh2 ~]# crsctl status res ora.diskmon -p -init
NAME=ora.diskmon
TYPE=ora.diskmon.type
ACL=owner:root:rw-,pgrp:oinstall:rw-,other::r--,user:grid:r-x
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
ACTIVE_PLACEMENT=0
AGENT_FILENAME=%CRS_HOME%/bin/orarootagent%CRS_EXE_SUFFIX%
AUTO_START=never
CARDINALITY=1
CHECK_ARGS=
CHECK_COMMAND=
CHECK_INTERVAL=3
CHECK_TIMEOUT=30
CLEAN_ARGS=
CLEAN_COMMAND=
DAEMON_LOGGING_LEVELS=
DAEMON_TRACING_LEVELS=
DEFAULT_TEMPLATE=
DEGREE=1
DESCRIPTION="Resource type for Diskmon"
DETACHED=true
ENABLED=1
FAILOVER_DELAY=0
FAILURE_INTERVAL=3
FAILURE_THRESHOLD=5
HOSTING_MEMBERS=
LOAD=1
LOGGING_LEVEL=1
NOT_RESTARTING_TEMPLATE=
OFFLINE_CHECK_INTERVAL=0
ORA_VERSION=11.2.0.2.0
PID_FILE=
PLACEMENT=balanced
PROCESS_TO_MONITOR=
PROFILE_CHANGE_TEMPLATE=
RESTART_ATTEMPTS=10
SCRIPT_TIMEOUT=60
SERVER_POOLS=
START_ARGS=
START_COMMAND=
START_DEPENDENCIES=weak(concurrent:ora.cssd)pullup:always(ora.cssd)
START_TIMEOUT=600
STATE_CHANGE_TEMPLATE=
STOP_ARGS=
STOP_COMMAND=
STOP_DEPENDENCIES=
STOP_TIMEOUT=60
UPTIME_THRESHOLD=5s
USR_ORA_ENV=ORACLE_USER=grid
VERSION=11.2.0.2.0

[root@rh2 ~]# crsctl status res ora.cssd -init -p
NAME=ora.cssd
TYPE=ora.cssd.type
ACL=owner:root:rw-,pgrp:oinstall:rw-,other::r--,user:grid:r-x
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
ACTIVE_PLACEMENT=0
AGENT_FILENAME=%CRS_HOME%/bin/cssdagent%CRS_EXE_SUFFIX%
AGENT_HB_INTERVAL=0
AGENT_HB_MISCOUNT=10
AUTO_START=always
CARDINALITY=1
CHECK_ARGS=
CHECK_COMMAND=
CHECK_INTERVAL=30
CLEAN_ARGS=abort
CLEAN_COMMAND=
CSSD_MODE=
CSSD_PATH=%CRS_HOME%/bin/ocssd%CRS_EXE_SUFFIX%
CSS_USER=grid
DAEMON_LOGGING_LEVELS=CSSD=2,GIPCNM=2,GIPCGM=2,GIPCCM=2,CLSF=0,SKGFD=0,GPNP=1,OLR=0
DAEMON_TRACING_LEVELS=CSSD=0,GIPCNM=0,GIPCGM=0,GIPCCM=0,CLSF=0,SKGFD=0,GPNP=0,OLR=0
DEFAULT_TEMPLATE=
DEGREE=1
DESCRIPTION="Resource type for CSSD"
DETACHED=true
ENABLED=1
ENV_OPTS=
FAILOVER_DELAY=0
FAILURE_INTERVAL=3
FAILURE_THRESHOLD=5
HOSTING_MEMBERS=
LOAD=1
LOGGING_LEVEL=1
NOT_RESTARTING_TEMPLATE=
OFFLINE_CHECK_INTERVAL=0
OMON_INITRATE=1000
OMON_POLLRATE=500
ORA_OPROCD_MODE=
ORA_VERSION=11.2.0.2.0
PID_FILE=
PLACEMENT=balanced
PROCD_TIMEOUT=1000
PROCESS_TO_MONITOR=
PROFILE_CHANGE_TEMPLATE=
RESTART_ATTEMPTS=3
SCRIPT_TIMEOUT=600
SERVER_POOLS=
START_ARGS=
START_COMMAND=
START_DEPENDENCIES=weak(concurrent:ora.diskmon)hard(ora.cssdmonitor,ora.gpnpd,ora.gipcd)pullup(ora.gpnpd,ora.gipcd)
START_TIMEOUT=600
STATE_CHANGE_TEMPLATE=
STOP_ARGS=
STOP_COMMAND=
STOP_DEPENDENCIES=hard(intermediate:ora.gipcd,shutdown:ora.diskmon,intermediate:ora.cssdmonitor)
STOP_TIMEOUT=900
UPTIME_THRESHOLD=1m
USR_ORA_ENV=
VMON_INITLIMIT=16
VMON_INITRATE=500
VMON_POLLRATE=500
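
The START_DEPENDENCIES and STOP_DEPENDENCIES values printed above share one shape: a dependency kind, an optional modifier after a colon, and a parenthesized list of resources that may carry their own prefix. Here is a sketch for splitting such a string into a dictionary. The grammar is inferred only from the values shown in this article, so treat the pattern as an assumption rather than the full attribute syntax.

```python
import re

# kind, optional ":modifier", then "(member,member,...)"
DEP_CLAUSE = re.compile(r"([a-z]+)(?::[a-z]+)?\(([^)]*)\)")


def parse_dependencies(value):
    """Map each dependency kind to the resource names it lists."""
    deps = {}
    for kind, members in DEP_CLAUSE.findall(value):
        # "concurrent:ora.diskmon" -> "ora.diskmon"
        names = [m.split(":")[-1].strip() for m in members.split(",") if m.strip()]
        deps.setdefault(kind, []).extend(names)
    return deps


# START_DEPENDENCIES of ora.cssd, copied from the output above.
CSSD_START = ("weak(concurrent:ora.diskmon)"
              "hard(ora.cssdmonitor,ora.gpnpd,ora.gipcd)"
              "pullup(ora.gpnpd,ora.gipcd)")

print(parse_dependencies(CSSD_START))
```

Applied to the ora.crsd, ora.ctssd and ora.cssd values, the "hard" entries make it easy to see why ora.cssd has to be up before ora.ctssd, and both before ora.crsd.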

How to Recover from a Failed root.sh on 11.2 Grid Infrastructure

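From 10g Clusterware to 11g Release 2 Grid Infrastructure, Oracle has packed a great deal into the RAC framework. Even when you install 11.2.0.1 GI carefully by following a step-by-step installation guide, running the root.sh script can still fail in one way or another. Here is one example: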

[root@rh3 grid]# ./root.sh
Running Oracle 11g root.sh script...

The following environment variables are set as:
    ORACLE_OWNER= maclean
    ORACLE_HOME=  /u01/app/11.2.0/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]: 

The file "dbhome" already exists in /usr/local/bin.  Overwrite it? (y/n)
[n]: The file "oraenv" already exists in /usr/local/bin.  Overwrite it? (y/n)
[n]:
The file "coraenv" already exists in /usr/local/bin.  Overwrite it? (y/n)
[n]: 

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root.sh script.
Now product-specific root actions will be performed.
2011-03-28 20:43:13: Parsing the host name
2011-03-28 20:43:13: Checking for super user privileges
2011-03-28 20:43:13: User has super user privileges
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
LOCAL ADD MODE
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Adding daemon to inittab
CRS-4123: Oracle High Availability Services has been started.
ohasd is starting

ADVM/ACFS is not supported on oraclelinux-release-5-5.0.2

On one node, root.sh reported that ADVM/ACFS is not supported on OEL 5.5, even though Red Hat 5 and OEL 5 are among the few platforms that currently do support ACFS (The ACFS install would be on a supported Linux release - either Oracle Enterprise Linux 5 or Red Hat 5).

A Metalink search shows this to be a bug on the Linux platform, Bug 9474252: 'ACFSLOAD START' RETURNS "ADVM/ACFS IS NOT SUPPORTED ON DHL-RELEASE-…"

The Not Supported error did not appear when root.sh ran on the other node (also Enterprise Linux Enterprise Linux Server release 5.5 (Carthage)), so finding the difference between the two nodes should generally point to a fix:

Release-related rpm packages on the node without the error:

[maclean@rh6 tmp]$ cat /etc/issue
Enterprise Linux Enterprise Linux Server release 5.5 (Carthage)
Kernel \r on an \m

[maclean@rh6 tmp]$ rpm -qa|grep release
enterprise-release-notes-5Server-17
enterprise-release-5-0.0.22

Release-related rpm packages on the failing node:

[root@rh3 tmp]# rpm -qa | grep release
oraclelinux-release-5-5.0.2
enterprise-release-5-0.0.22
enterprise-release-notes-5Server-17
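
With only two or three packages the difference is obvious by eye, but the same comparison can be sketched for longer rpm -qa listings. The two strings below are the outputs shown above. package_diff is a hypothetical helper name.

```python
GOOD_NODE = """\
enterprise-release-notes-5Server-17
enterprise-release-5-0.0.22
"""

FAILING_NODE = """\
oraclelinux-release-5-5.0.2
enterprise-release-5-0.0.22
enterprise-release-notes-5Server-17
"""


def package_diff(reference, candidate):
    """Return (only_in_candidate, only_in_reference) as sorted lists."""
    ref = set(reference.split())
    cand = set(candidate.split())
    return sorted(cand - ref), sorted(ref - cand)


extra, missing = package_diff(GOOD_NODE, FAILING_NODE)
print("only on failing node:", extra)
print("only on good node:", missing)
```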

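Compared with the healthy node, the failing node has one extra rpm installed, oraclelinux-release-5-5.0.2, so we try removing it to see whether that resolves the problem. As an addendum: the problem can also be resolved by changing the contents of the /tmp/.linux_release file to enterprise-release-5-0.0.17, without removing the oraclelinux-release-5* package as we do here: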

[root@rh3 install]# rpm -e oraclelinux-release-5-5.0.2

[root@rh3 grid]# ./root.sh
Running Oracle 11g root.sh script...

The following environment variables are set as:
    ORACLE_OWNER= maclean
    ORACLE_HOME=  /u01/app/11.2.0/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The file "dbhome" already exists in /usr/local/bin.  Overwrite it? (y/n)
[n]:
The file "oraenv" already exists in /usr/local/bin.  Overwrite it? (y/n)
[n]:
The file "coraenv" already exists in /usr/local/bin.  Overwrite it? (y/n)
[n]: 

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root.sh script.
Now product-specific root actions will be performed.
2011-03-28 20:57:21: Parsing the host name
2011-03-28 20:57:21: Checking for super user privileges
2011-03-28 20:57:21: User has super user privileges
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
CRS is already configured on this node for crshome=0
Cannot configure two CRS instances on the same cluster.
Please deconfigure before proceeding with the configuration of new home.

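Running root.sh again on the failed node produces a message that the node must be deconfigured before it can be configured again. The official <Oracle Grid Infrastructure Installation Guide 11g Release 2> describes how to deconfigure Grid Infrastructure in 11g Release 2 (Deconfiguring Oracle Clusterware Without Removing Binaries):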

/* These are still Grid Infrastructure administration tasks, so run them as root */

[root@rh3 grid]# pwd
/u01/app/11.2.0/grid

/* We are now in the GRID_HOME directory */

[root@rh3 grid]# cd crs/install

/* Run the rootcrs.pl script with the -deconfig option */

[root@rh3 install]# ./rootcrs.pl -deconfig
2011-03-28 21:03:05: Parsing the host name
2011-03-28 21:03:05: Checking for super user privileges
2011-03-28 21:03:05: User has super user privileges
Using configuration parameter file: ./crsconfig_params
VIP exists.:rh3
VIP exists.: //192.168.1.105/255.255.255.0/eth0
VIP exists.:rh6
VIP exists.: //192.168.1.103/255.255.255.0/eth0
GSD exists.
ONS daemon exists. Local port 6100, remote port 6200
eONS daemon exists. Multicast port 20796, multicast IP address 234.227.83.81, listening port 2016
Please confirm that you intend to remove the VIPs rh3 (y/[n]) y
ACFS-9200: Supported
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rh3'
CRS-2673: Attempting to stop 'ora.crsd' on 'rh3'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'rh3'
CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'rh3'
CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'rh3' succeeded
CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'rh3' has completed
CRS-2677: Stop of 'ora.crsd' on 'rh3' succeeded
CRS-2673: Attempting to stop 'ora.mdnsd' on 'rh3'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'rh3'
CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'rh3'
CRS-2673: Attempting to stop 'ora.ctssd' on 'rh3'
CRS-2673: Attempting to stop 'ora.evmd' on 'rh3'
CRS-2677: Stop of 'ora.cssdmonitor' on 'rh3' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'rh3' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'rh3' succeeded
CRS-2677: Stop of 'ora.evmd' on 'rh3' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'rh3' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'rh3'
CRS-2677: Stop of 'ora.cssd' on 'rh3' succeeded
CRS-2673: Attempting to stop 'ora.diskmon' on 'rh3'
CRS-2673: Attempting to stop 'ora.gipcd' on 'rh3'
CRS-2677: Stop of 'ora.gipcd' on 'rh3' succeeded
CRS-2677: Stop of 'ora.diskmon' on 'rh3' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rh3' has completed
CRS-4133: Oracle High Availability Services has been stopped.
Successfully deconfigured Oracle clusterware stack on this node

/* If the deconfig above does not succeed, run rootcrs.pl again with the -force option added */

[root@rh3 install]# ./rootcrs.pl -deconfig -force
2011-03-28 21:41:00: Parsing the host name
2011-03-28 21:41:00: Checking for super user privileges
2011-03-28 21:41:00: User has super user privileges
Using configuration parameter file: ./crsconfig_params
VIP exists.:rh3
VIP exists.: //192.168.1.105/255.255.255.0/eth0
VIP exists.:rh6
VIP exists.: //192.168.1.103/255.255.255.0/eth0
GSD exists.
ONS daemon exists. Local port 6100, remote port 6200
eONS daemon exists. Multicast port 20796, multicast IP address 234.227.83.81, listening port 2016
ACFS-9200: Supported
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rh3'
CRS-2673: Attempting to stop 'ora.crsd' on 'rh3'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'rh3'
CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'rh3'
CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'rh3' succeeded
CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'rh3' has completed
CRS-2677: Stop of 'ora.crsd' on 'rh3' succeeded
CRS-2673: Attempting to stop 'ora.mdnsd' on 'rh3'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'rh3'
CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'rh3'
CRS-2673: Attempting to stop 'ora.ctssd' on 'rh3'
CRS-2673: Attempting to stop 'ora.evmd' on 'rh3'
CRS-2677: Stop of 'ora.cssdmonitor' on 'rh3' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'rh3' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'rh3' succeeded
CRS-2677: Stop of 'ora.evmd' on 'rh3' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'rh3' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'rh3'
CRS-2677: Stop of 'ora.cssd' on 'rh3' succeeded
CRS-2673: Attempting to stop 'ora.diskmon' on 'rh3'
CRS-2673: Attempting to stop 'ora.gipcd' on 'rh3'
CRS-2677: Stop of 'ora.gipcd' on 'rh3' succeeded
CRS-2677: Stop of 'ora.diskmon' on 'rh3' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rh3' has completed
CRS-4133: Oracle High Availability Services has been stopped.
Successfully deconfigured Oracle clusterware stack on this node

/* Fortunately this approach has always worked; otherwise we would have to uninstall and reinstall GI completely every time. */
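
The "try -deconfig, then fall back to -force" sequence above can be sketched as a small wrapper. This rests on assumptions that the transcripts do not confirm: that rootcrs.pl returns a non-zero exit status when deconfiguration fails, and that it should be started from crs/install (the log line "Using configuration parameter file: ./crsconfig_params" suggests a relative path). The function names are hypothetical, and the script must run as root.

```python
import subprocess


def _call(argv, cwd):
    """Default runner: execute argv in cwd and return its exit status."""
    return subprocess.call(argv, cwd=cwd)


def deconfigure(grid_home, run=_call):
    """Try a plain deconfig first, then retry with -force.

    Returns the argument list that succeeded. Assumes exit status 0
    means success, which is an assumption of this sketch.
    """
    install_dir = grid_home.rstrip("/") + "/crs/install"
    attempts = (
        ["./rootcrs.pl", "-deconfig"],
        ["./rootcrs.pl", "-deconfig", "-force"],
    )
    for argv in attempts:
        if run(argv, install_dir) == 0:
            return argv
    raise RuntimeError("rootcrs.pl -deconfig failed even with -force")
```

The run parameter exists so the logic can be exercised with a stand-in runner before it is ever pointed at a real cluster node, for example deconfigure("/u01/app/11.2.0/grid", run=my_fake_runner).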

With CRS successfully deconfigured, we can try the ill-fated root.sh once more:

[root@rh3 grid]# pwd
/u01/app/11.2.0/grid

[root@rh3 grid]# ./root.sh
Running Oracle 11g root.sh script...

The following environment variables are set as:
    ORACLE_OWNER= maclean
    ORACLE_HOME=  /u01/app/11.2.0/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The file "dbhome" already exists in /usr/local/bin.  Overwrite it? (y/n)
[n]:
The file "oraenv" already exists in /usr/local/bin.  Overwrite it? (y/n)
[n]:
The file "coraenv" already exists in /usr/local/bin.  Overwrite it? (y/n)
[n]: 

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root.sh script.
Now product-specific root actions will be performed.
2011-03-28 21:07:29: Parsing the host name
2011-03-28 21:07:29: Checking for super user privileges
2011-03-28 21:07:29: User has super user privileges
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
LOCAL ADD MODE
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Adding daemon to inittab
CRS-4123: Oracle High Availability Services has been started.
ohasd is starting
FATAL: Module oracleoks not found.
FATAL: Module oracleadvm not found.
FATAL: Module oracleacfs not found.
acfsroot: ACFS-9121: Failed to detect /dev/asm/.asm_ctl_spec.

acfsroot: ACFS-9310: ADVM/ACFS installation failed.

acfsroot: ACFS-9311: not all components were detected after the installation.

CRS-4402: The CSS daemon was started in exclusive mode but found an active CSS daemon on node rh6, number 1, and is terminating
An active cluster was found during exclusive startup, restarting to join the cluster
CRS-2672: Attempting to start 'ora.mdnsd' on 'rh3'
CRS-2676: Start of 'ora.mdnsd' on 'rh3' succeeded
CRS-2672: Attempting to start 'ora.gipcd' on 'rh3'
CRS-2676: Start of 'ora.gipcd' on 'rh3' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'rh3'
CRS-2676: Start of 'ora.gpnpd' on 'rh3' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rh3'
CRS-2676: Start of 'ora.cssdmonitor' on 'rh3' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rh3'
CRS-2672: Attempting to start 'ora.diskmon' on 'rh3'
CRS-2676: Start of 'ora.diskmon' on 'rh3' succeeded
CRS-2676: Start of 'ora.cssd' on 'rh3' succeeded
CRS-2672: Attempting to start 'ora.ctssd' on 'rh3'
CRS-2676: Start of 'ora.ctssd' on 'rh3' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'rh3'
CRS-2676: Start of 'ora.crsd' on 'rh3' succeeded
CRS-2672: Attempting to start 'ora.evmd' on 'rh3'
CRS-2676: Start of 'ora.evmd' on 'rh3' succeeded
/u01/app/11.2.0/grid/bin/srvctl start vip -i rh3 ... failed
Preparing packages for installation...
cvuqdisk-1.0.7-1
Configure Oracle Grid Infrastructure for a Cluster ... failed
Updating inventory properties for clusterware
Starting Oracle Universal Installer...

Checking swap space: must be greater than 500 MB.   Actual 5023 MB    Passed
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /s01/oraInventory
'UpdateNodeList' was successful.

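This got past the "ADVM/ACFS is not supported" problem, but a new one appeared: "FATAL: Module oracleoks/oracleadvm/oracleacfs not found", meaning the ACFS/ADVM kernel modules could not be found on Linux. A Metalink search turns up two confirmed GI 11.2.0.2 bugs, bug 10252497 and bug 10266447, yet the GI I installed is 11.2.0.1. Fortunately my environment uses NFS storage, so problems with storage options such as ADVM/ACFS can be ignored here. I plan to test again on 11.2.0.2.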
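
The root.sh failures met in this article each leave a recognizable line in the output. The sketch below scans a captured log for those lines. The patterns come only from the transcripts above, and the labels are my own naming, not Oracle terminology.

```python
import re

# (pattern, label) pairs taken from the messages seen in this article.
SIGNATURES = [
    (re.compile(r"ADVM/ACFS is not supported on (\S+)"), "acfs-unsupported-release"),
    (re.compile(r"CRS is already configured on this node"), "needs-deconfig"),
    (re.compile(r"FATAL: Module (\S+) not found"), "missing-kernel-module"),
    (re.compile(r"\.\.\. failed\s*$"), "step-failed"),
]

SAMPLE_LOG = """\
ohasd is starting
FATAL: Module oracleoks not found.
/u01/app/11.2.0/grid/bin/srvctl start vip -i rh3 ... failed
Configure Oracle Grid Infrastructure for a Cluster ... failed
'UpdateNodeList' was successful.
"""


def triage(log_text):
    """Return (label, line) for every line matching a known signature."""
    findings = []
    for line in log_text.splitlines():
        for pattern, label in SIGNATURES:
            if pattern.search(line):
                findings.append((label, line.strip()))
                break
    return findings


for label, line in triage(SAMPLE_LOG):
    print(label, "|", line)
```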

It has to be said that installing 11.2.0.1 GI has so many problems that Oracle Support had to write a number of troubleshooting notes, for example <Troubleshooting 11.2 Grid Infrastructure Installation Root.sh Issues [ID 1053970.1]> and <How to Proceed from Failed 11gR2 Grid Infrastructure (CRS) Installation [ID 942166.1]>. I have not yet tried 11.2.0.2 GI. I hope it is not as bad as the previous release!

How to find error messages in the OMS repository

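OEM (OMS) gives us a convenient interface for viewing database metric alerts and error messages. Can we also get at this information directly with manual queries? Yes, but it requires some understanding of the OMS repository: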

SQL> desc sysman.MGMT_CURRENT_METRIC_ERRORS;
 Name					   Null?    Type
 ----------------------------------------- -------- ----------------------------
 TARGET_GUID				   NOT NULL RAW(16)
 METRIC_GUID				   NOT NULL RAW(16)
 COLL_NAME				   NOT NULL VARCHAR2(64)
 AGENT_GUID				   NOT NULL RAW(16)
 COLLECTION_TIMESTAMP			   NOT NULL DATE
 METRIC_ERROR_MESSAGE				    VARCHAR2(4000)
 METRIC_ERROR_TYPE				    NUMBER(1)

/* MGMT_CURRENT_METRIC_ERRORS holds the metric errors currently shown in the OMS console */

SQL> desc sysman.MGMT_METRIC_ERRORS;
 Name					   Null?    Type
 ----------------------------------------- -------- ----------------------------
 TARGET_GUID				   NOT NULL RAW(16)
 METRIC_GUID				   NOT NULL RAW(16)
 COLL_NAME				   NOT NULL VARCHAR2(64)
 AGENT_GUID				   NOT NULL RAW(16)
 COLLECTION_TIMESTAMP			   NOT NULL DATE
 METRIC_ERROR_MESSAGE				    VARCHAR2(4000)
 METRIC_ERROR_TYPE				    NUMBER(1)

/* MGMT_METRIC_ERRORS holds the history of metric errors */

SQL> SELECT METRIC_ERROR_MESSAGE,COLLECTION_TIMESTAMP,METRIC_ERROR_TYPE FROM MGMT_CURRENT_METRIC_ERRORS WHERE COLLECTION_TIMESTAMP<'20-Mar-11';

METRIC_ERROR_MESSAGE				   COLLECTIO METRIC_ERROR_TYPE
-------------------------------------------------- --------- -----------------
Missing Properties : [SQLINPARAM1]		   03-MAR-11		     0
Missing Properties : [SQLINPARAM1]		   03-MAR-11		     0
Missing Properties : [SQLINPARAM1]		   03-MAR-11		     0
Missing Properties : [SQLINPARAM1,SQLINPARAM4]	   03-MAR-11		     0
Missing Properties : [SQLINPARAM1]		   03-MAR-11		     0
Missing Properties : [SQLINPARAM1]		   03-MAR-11		     0
Missing Properties : [SQLINPARAM1]		   03-MAR-11		     0
Missing Properties : [SQLINPARAM1]		   03-MAR-11		     0
Missing Properties : [SQLINPARAM1]		   03-MAR-11		     0
Result has repeating key value : OCM_APPLY,apply   05-MAR-11		     0

/* The query above lists records from before March 20 that are still shown in the console */
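
The query above compares a DATE column with the string '20-Mar-11', which relies on implicit conversion through the session's NLS_DATE_FORMAT. When running the same query from a script, a bind variable carrying a real date avoids that dependency. Here is a sketch that only builds the statement and its bind dictionary. The function name is hypothetical, and I am assuming a driver that accepts named binds in the :name style, as python-oracledb's cursor.execute(sql, binds) does.

```python
from datetime import date


def metric_errors_before(cutoff):
    """Return (sql, binds) selecting current metric errors older than cutoff."""
    sql = (
        "SELECT metric_error_message, collection_timestamp, metric_error_type "
        "FROM sysman.mgmt_current_metric_errors "
        "WHERE collection_timestamp < :cutoff "
        "ORDER BY collection_timestamp"
    )
    return sql, {"cutoff": cutoff}


sql, binds = metric_errors_before(date(2011, 3, 20))
print(sql)
print(binds)
```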

SQL> desc sysman.mgmt$alert_current;
 Name					   Null?    Type
 ----------------------------------------- -------- ----------------------------
 TARGET_NAME				   NOT NULL VARCHAR2(256)
 TARGET_TYPE				   NOT NULL VARCHAR2(64)
 TARGET_GUID				   NOT NULL RAW(16)
 VIOLATION_GUID 				    RAW(16)
 METRIC_NAME				   NOT NULL VARCHAR2(64)
 METRIC_COLUMN				   NOT NULL VARCHAR2(64)
 METRIC_GUID				   NOT NULL RAW(16)
 METRIC_LABEL					    VARCHAR2(64)
 COLUMN_LABEL					    VARCHAR2(256)
 KEY_VALUE					    VARCHAR2(256)
 KEY_VALUE2					    VARCHAR2(256)
 KEY_VALUE3					    VARCHAR2(256)
 KEY_VALUE4					    VARCHAR2(256)
 KEY_VALUE5					    VARCHAR2(256)
 COLLECTION_TIMESTAMP				    DATE
 ALERT_STATE					    VARCHAR2(18)
 VIOLATION_LEVEL			   NOT NULL NUMBER
 VIOLATION_TYPE 				    VARCHAR2(19)
 MESSAGE					    VARCHAR2(4000)
 MESSAGE_NLSID					    VARCHAR2(64)
 MESSAGE_PARAMS 				    VARCHAR2(4000)
 ACTION_MESSAGE 				    VARCHAR2(4000)
 ACTION_MESSAGE_NLSID				    VARCHAR2(64)
 ACTION_MESSAGE_PARAMS				    VARCHAR2(4000)
 TYPE_DISPLAY_NAME				    VARCHAR2(128)
 CURRENT_VALUE					    VARCHAR2(1024)

/* mgmt$alert_current holds the alerts currently active in Grid Control; similarly, mgmt$alert_history is its historical counterpart */

SQL> select target_name,message from SYSMAN.mgmt$alert_current;

TARGET_NAME	     MESSAGE
-------------------- ----------------------------------------------------------------------------------------------------
nas:3872	     Difference between OMS system time and Agent system time is 7680 mins and has exceeded the critical
		     threshold 120 mins

rh3:3872	     Agent Virtual Memory Growth is 1.53%
rh3:3872	     Count of targets not uploading exceeded the critical threshold (0). Current value: 2
EMREP_rh3	     User SYS logged on from rh3.oracle.com.
LSN1_rh3	     The listener is down: TNS-12541: TNS:no listener .
rh6:3872	     Difference between OMS system time and Agent system time is 7199 mins and has exceeded the critical
		     threshold 120 mins

Management Services  Management Service Status for nas:4889_Management_Service exceeded the critical threshold (DOWN). Cu
and Repository	     rrent value: DOWN


Installing and Configuring Oracle Grid Control 10.2.0.5 on OEL5

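Early Grid Control releases had so many problems that upgrading Grid Control from 10.2.0.1 to 10.2.0.5 is close to mission impossible. In addition, Grid Control releases before 10.2.0.5 do not support an 11g repository database, and 10.2.0.1 does not support RHEL 5 or OEL 5, so RHEL/OEL 4 is the only suitable platform for it. These constraints are quite limiting. For a fresh installation, a good approach seems to be to install the software only, skip the OMS configuration during the 10.2.0.1 install, and configure the OMS after upgrading to 10.2.0.5. The configuration process is still complicated, and this document is meant to help us get through it (including those who have done it before, since it is easy to forget).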

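1. We need an existing Oracle database. It should be clean (no existing EM repository), preferably version 10.2.0.5 or 11.2.0.2. We assume the database already meets all prerequisites for installing Grid Control, including several initialization parameter settings and the presence of the dbms_shared_pool package.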

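2. Next, download the software: the full 10.2.0.1 installation media and the 10.2.0.5 GC patch set, preferably along with the 10.2.0.5 agent (in my experience the agent is not very stable and sometimes needs to be reconfigured, so having the media on hand is convenient):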

  • Linux_Grid_Control_full_102010_disk1.zip
  • Linux_Grid_Control_full_102010_disk2.zip
  • Linux_Grid_Control_full_102010_disk3.zip
  • gc_x86_64_10205_part1of2.zip
  • gc_x86_64_10205_part2of2.zip
  • Linux_x86_64_Grid_Control_agent_download_10_2_0_5_0.zip

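Downloading all this media may be the most tedious part of the installation, especially when your network connection is poor.
They are all zip archives, so simply unzip each one into a suitable directory.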

3. Configure the kernel parameters and rpm packages on the OMS host. Sample values for the relevant configuration files follow:

/etc/sysctl.conf:
kernel.shmall = 2097152
kernel.shmmax = 536870912
kernel.shmmni = 4096
# semaphores: semmsl, semmns, semopm, semmni
kernel.sem = 250 32000 100 128
fs.file-max = 65536
net.ipv4.ip_local_port_range = 1024 65000
net.core.rmem_default = 262144
net.core.rmem_max = 262144
net.core.wmem_default = 262144
net.core.wmem_max = 262144

/* These values are not necessarily right for your host; see the Metalink documentation for configuration details */
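
To see which of the sample values a host falls short of, the settings can be parsed and compared with the current ones. This is a sketch under stated assumptions: "at least as large" is used as the test, which is a simplification, so the range-style net.ipv4.ip_local_port_range setting is skipped. The current values are passed in as a dictionary (they could be gathered with sysctl -n), and all function names are hypothetical.

```python
# Keys where "greater or equal" is not a meaningful comparison.
RANGE_KEYS = {"net.ipv4.ip_local_port_range"}


def parse_sysctl(text):
    """Parse sysctl.conf style text into {key: [int, ...]}."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = [int(v) for v in value.split()]
    return settings


def shortfalls(required, actual):
    """Return {key: (wanted, have)} for values missing or below the requirement."""
    problems = {}
    for key, wanted in required.items():
        if key in RANGE_KEYS:
            continue
        have = actual.get(key)
        if (have is None or len(have) != len(wanted)
                or any(h < w for h, w in zip(have, wanted))):
            problems[key] = (wanted, have)
    return problems


required = parse_sysctl("""
kernel.shmmax = 536870912
kernel.sem = 250 32000 100 128
fs.file-max = 65536
net.ipv4.ip_local_port_range = 1024 65000
""")
# Hypothetical current values for illustration only.
current = {"kernel.shmmax": [268435456], "kernel.sem": [250, 32000, 100, 128]}
print(shortfalls(required, current))
```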

Also edit the /etc/security/limits.conf file:
*               soft    nproc   2047
*               hard    nproc   16384
*               soft    nofile  1024
*               hard    nofile  65536

/* Replace the asterisk with your installation user, such as oracle or another member of the dba group */

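Install the appropriate rpm packages. Everything required for a database installation is required here as well. Pay particular attention to the following packages: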
compat-libstdc++-296-2.96-138.i386
libstdc++-devel-4.1.2-48.el5.x86_64
libstdc++-devel-4.1.2-48.el5.i386
glibc-devel-2.5-49.x86_64
glibc-devel-2.5-49.i386

Then create the following symbolic link:
ln -s /usr/lib/libgdbm.so.2.0.0 /usr/lib/libdb.so.2

4. With the above done, edit the response file to suit the installation. Work in the directory where the 10.2.0.1 media was unpacked:

[root@nas media]# ls
dcommon  doc  index.htm  install  libskgxn  oms  rdbms  response  runInstaller  stage

[root@nas media]# vi response/em_using_existing_db.rsp 

/* Edit the em_using_existing_db.rsp response file */

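The following parameters need to be changed from their defaults to specific values:
UNIX_GROUP_NAME="dba"
# dba should be a valid group that the installation user belongs to

FROM_LOCATION="/s01/media/oms/Disk1/stage/products.xml"
# FROM_LOCATION points to the products.xml file under the stage directory of the installation media

BASEDIR="/s01/app/gc"
# BASEDIR is the base directory for the Grid Control installation

INSTALLATION_NAME="oms10g"
# installation name

s_reposHost="rh3.oracle.com"
# host name or IP address of the repository database

s_reposPort="1521"
# listener port of the repository database

s_reposSID="EMREP"
# SID of the repository database

s_reposDBAPwd="maclean"
# SYS password of the repository database

s_mgmtTbsName="/s01/orabase/oradata/EMREP/mgmt.dbf"
# data file name for the repository database's future mgmt tablespace

s_ecmTbsName="/s01/orabase/oradata/EMREP/mgmt_ecm.dbf"
# data file name for the repository database's future ecm tablespace

s_securePassword="maclean"
# secure password that agents will use

s_securePasswordConfirm="maclean"
# confirms the password above

b_lockedSelected=false
# whether agent communication is locked

s_reposPwd="maclean"
# password of the repository schema owner (sysman)

s_reposPwdConfirm="maclean"
# confirms the password above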
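
A silent install gives little feedback when the response file contains a slip, so it can help to sanity-check the edited entries first. The sketch below covers only the value shapes used in the snippet above (a double-quoted string, or true/false). Real response files contain other shapes, so that restriction is an assumption, and the function names are hypothetical.

```python
import re

# Accept only "quoted string", true, or false as a value.
VALUE = re.compile(r'^("[^"]*"|true|false)$')

CONFIRM_PAIRS = (
    ("s_securePassword", "s_securePasswordConfirm"),
    ("s_reposPwd", "s_reposPwdConfirm"),
)


def parse_response(text):
    """Return (entries, problems) for KEY=VALUE lines, ignoring # comments."""
    entries, problems = {}, []
    for number, raw in enumerate(text.splitlines(), 1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep or not VALUE.match(value.strip()):
            problems.append(f"line {number}: unexpected format: {line}")
            continue
        entries[key.strip()] = value.strip().strip('"')
    return entries, problems


def check_confirmations(entries, pairs=CONFIRM_PAIRS):
    """Report password/confirmation pairs whose values differ."""
    return [f"{a} and {b} differ" for a, b in pairs if entries.get(a) != entries.get(b)]


# Made-up sample with two deliberate mistakes.
SAMPLE = '''\
s_reposPwd="secret"
s_reposPwdConfirm="secert"
b_lockedSelected=false
s_ecmTbsName=s_mgmtTbsName="/tmp/mgmt_ecm.dbf"
'''
entries, problems = parse_response(SAMPLE)
print(problems + check_confirmations(entries))
```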

5. Install Grid Control 10.2.0.1 in silent mode, without configuring the OMS:

[maclean@nas ~]$ export  TMP=/tmp

[maclean@nas ~]$  /s01/media/install/runInstaller -noconfig -silent -ignoreSysPrereqs -responseFile \
/s01/media/response/em_using_existing_db.rsp  use_prereq_checker=false b_skipDBValidation=true -force

Once the installation finishes, run the related scripts:
[maclean@nas ~]$ su - root -c "/home/maclean/oraInventory/orainstRoot.sh"

[maclean@nas ~]$ su - root -c "/s01/app/gc/oms10g/allroot.sh"

Then use the opmnctl command to stop HTTP and the other services:
[maclean@nas ~]$ /s01/app/gc/oms10g/opmn/bin/opmnctl stopall
opmnctl: stopping opmn and all managed processes...

5. Next, install the Grid Control 10.2.0.5 patch set. Again a response file must be edited:

[maclean@nas 10205]$ unzip /tmp/gc_x86_64_10205_part2of2.zip
Archive:  /tmp/gc_x86_64_10205_part2of2.zip
extracting: p3731593_10205_Linux-x86-64.zip

[maclean@nas 10205]$ unzip p3731593_10205_Linux-x86-64.zip
..............

[maclean@nas ~]$ cp /s01/10205/3731593/Disk1/response/patchset.rsp /s01/10205/3731593/Disk1/response/oms_patchset.rsp

/* In oms_patchset.rsp, set the existing parameters to the values below */

[maclean@nas ~]$ vi /s01/10205/3731593/Disk1/response/oms_patchset.rsp

ORACLE_HOME="/s01/app/gc/oms10g"
b_softwareonly=true
s_sysPassword="maclean"
sl_pwdInfo={ "maclean" }
oracle.iappserver.st_midtier:szl_InstanceInformation={ "maclean" }

ORACLE_HOME_NAME="oms10g"
# Add the entry above as a new line

[maclean@nas ~]$ /s01/10205/3731593/Disk1/runInstaller -noconfig -silent \
-responseFile /s01/10205/3731593/Disk1/response/oms_patchset.rsp

/* After the 10.2.0.5 patch set has been installed, root.sh must be run as well */

[maclean@nas ~]$ su - root -c "/s01/app/gc/oms10g/root.sh"

6. With the installation above complete, the OMS can now be configured:

[maclean@nas ~]$ export PERL5LIB=/s01/app/gc/oms10g/perl/lib/5.6.1

[maclean@nas ~]$ /s01/app/gc/oms10g/perl/bin/perl /s01/app/gc/oms10g/sysman/install/ConfigureGC.pl \
/s01/app/gc
Base Directory: /s01/app/gc

 Starting ito execute Configuration Assistants: 

Running the configuration assistants using the following command:
/s01/app/gc/oms10g/oui/bin/runConfig.sh INV_PTR_LOC=/s01/app/gc/oms10g/oraInst.loc
ORACLE_HOME=/s01/app/gc/oms10g ACTION=configure MODE=perform
COMPONENT_XML={encap_oms.1_0_0_0_0.xml}
perform - mode is starting for action: configure

The general form of the command above is:
<OMS ORACLE_HOME>/perl/bin/perl <OMS ORACLE_HOME>/sysman/install/ConfigureGC.pl <Parent Directory filepath> 
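
As a small illustration of that general form, the sketch below prints the full command line for a given OMS home. `configure_gc_cmd` is a hypothetical name, and the function assumes that the parent directory argument is simply the directory containing the OMS home and that the bundled Perl library lives under `perl/lib/5.6.1`, as in the example above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not run) the ConfigureGC.pl command line for
# a given OMS ORACLE_HOME. Assumes the base directory is the parent of the
# OMS home, as in the /s01/app/gc/oms10g example.
configure_gc_cmd() {
    local oms_home=$1 base_dir
    base_dir=$(dirname "$oms_home")
    printf 'PERL5LIB=%s/perl/lib/5.6.1 %s/perl/bin/perl %s/sysman/install/ConfigureGC.pl %s\n' \
        "$oms_home" "$oms_home" "$oms_home" "$base_dir"
}
```

Printing the command rather than executing it leaves room to review it first, since the configuration run is long and hard to interrupt cleanly.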

Configuration takes a long time; it is advisable to monitor its progress through the .../oms10g/cfgtoollogs/oui/configActions<>.log log file:

[root@nas oui]# cd /s01/app/gc/oms10g/cfgtoollogs/oui

[root@nas oui]# tail -f configActions2011-01-23_08-57-20-AM.log
... return status = 0 (success)
Oracle JAAS [Sun Jan 23 08:57:43 CST 2011]  $ORACLE_HOME/j2ee/home/config/jazn-data.xml is synchronized successfully to dcm repository.
Please check the log file [/s01/app/gc/oms10g/cfgtoollogs/jaznca.log] for details.

The plug-in Java Security Configuration Assistant has successfully been performed
------------------------------------------------------
------------------------------------------------------
The plug-in Web Cache Configuration Assistant is running

2
Start traversing...
got process-manager node
got ias-instance node
attrValue=IASPT

attrValue=DSA

attrValue=HTTP_Server

attrValue=LogLoader

attrValue=dcm-daemon

attrValue=OC4J

attrValue=WebCache

Entity found.

got ias-instance node
Current status is : enabled
Changing the value of port to enabled
 Modified ...
Before After WaitForComplete
After WaitForComplete
Completed smiSetStatus
Checking status ... enableconfiguration
In ... getWebcachePort
WebCache Default Port :7777
In ... updateApacheConf
Apache Port Value : 7777
Apache Before WaitForComplete
Apache After WaitForComplete
Checking for Apache updation status
Apache httpd.conf updated
smiTearDown
Will be checking the status ...
Webcache Configuration finished successfully

The plug-in Web Cache Configuration Assistant has successfully been performed
------------------------------------------------------
------------------------------------------------------
The plug-in Oracle Application Server Instance Configuration Assistant is running

The plug-in Oracle Application Server Instance Configuration Assistant has successfully been performed
------------------------------------------------------
------------------------------------------------------
The plug-in OC4J Instance Configuration Assistant is running

Reading ini file - '/s01/app/gc/oms10g/j2ee/deploy.ini'
Adding web-app 'IsWebCacheWorkingWeb.war' for app 'IsWebCacheWorking'.
Adding web-app 'wsrp-samples.war' for app 'portletapp'.
Initializing DCM...done.
OC4J instance 'home' already exists.
Starting OC4J instance 'home'...done.
Deploying application 'portletapp' to OC4J instance 'home'.
Notification ==> Application Deployer for portletapp STARTS [ 2011-01-23T08:58:00.972CST ]
Notification ==> Undeploy previous deployment
Notification ==> Removing files for app file:/s01/app/gc/oms10g/j2ee/home/applications/portletapp.ear
Notification ==> Copy the archive to /s01/app/gc/oms10g/j2ee/home/applications/portletapp.ear
Notification ==> Unpack portletapp.ear begins...
Notification ==> Unpack portletapp.ear ends...
Notification ==> Initialize portletapp.ear begins...
Notification ==> Initialize portletapp.ear ends...
Notification ==> Initialize wsrp-samples begins...
Notification ==> Initialize wsrp-samples ends...
Notification ==> deleting:  /s01/app/gc/oms10g/j2ee/home/applications/portletapp.ear
Notification ==> deleting:  /s01/app/gc/oms10g/j2ee/home/applications/portletapp/wsrp-samples.war
Notification ==> Application Deployer for portletapp COMPLETES [ 2011-01-23T08:58:01.319CST ] 

Deploying application 'IsWebCacheWorking' to OC4J instance 'home'.
Notification ==> Application Deployer for IsWebCacheWorking STARTS [ 2011-01-23T08:58:01.328CST ]
Notification ==> Undeploy previous deployment
Notification ==> Removing files for app file:/s01/app/gc/oms10g/j2ee/home/applications/IsWebCacheWorking.ear
Notification ==> Copy the archive to /s01/app/gc/oms10g/j2ee/home/applications/IsWebCacheWorking.ear
Notification ==> Unpack IsWebCacheWorking.ear begins...
Notification ==> Unpack IsWebCacheWorking.ear ends...
Notification ==> Initialize IsWebCacheWorking.ear begins...
Notification ==> Initialize IsWebCacheWorking.ear ends...
Notification ==> Initialize IsWebCacheWorkingWeb begins...
Notification ==> Initialize IsWebCacheWorkingWeb ends...
Notification ==> deleting:  /s01/app/gc/oms10g/j2ee/home/applications/IsWebCacheWorking.ear
Notification ==> deleting:  /s01/app/gc/oms10g/j2ee/home/applications/IsWebCacheWorking/IsWebCacheWorkingWeb.war
Notification ==> Application Deployer for IsWebCacheWorking COMPLETES [ 2011-01-23T08:58:01.362CST ] 

Calling updateConfig to notify DCM of new deployments...done.
Adding dependent libraries for application 'portletapp'...done.
Adding OC4J mount points for application 'portletapp'...done.
Adding OC4J mount points for application 'IsWebCacheWorking'...done.
Calling SMI to save changes.
SMISession.saveChanges succeeded.
Binding web app 'wsrp-samples' to default-web-site for application 'portletapp' in OC4J instance 'home'
Web app 'wsrp-samples' bound successfully.
Binding web app 'IsWebCacheWorkingWeb' to default-web-site for application 'IsWebCacheWorking' in OC4J instance 'home'
Web app 'IsWebCacheWorkingWeb' bound successfully.
Calling updateConfig to notify DCM of new web-bindings...done.
Adding application 'portletapp' to the DCM repository...done.
Application 'portletapp' deployed successfully.
Adding application 'IsWebCacheWorking' to the DCM repository...done.
Application 'IsWebCacheWorking' deployed successfully.
Stopping OC4J instance 'home'...done.
Calling SMI to retry init of failed plugins...done.
Terminating DCM...done.
Copying /s01/app/gc/oms10g/j2ee/deploy.ini to /s01/app/gc/oms10g/j2ee/deploy.ini.1295744298019.bak.
Writing any undeployed entries back to /s01/app/gc/oms10g/j2ee/deploy.ini.

Oc4jDeploy tool completed successfully!

The plug-in OC4J Instance Configuration Assistant has successfully been performed
------------------------------------------------------
------------------------------------------------------
The plug-in Register DCM Plug-Ins With EM is running

Operation successful.

The plug-in Register DCM Plug-Ins With EM has successfully been performed
------------------------------------------------------
------------------------------------------------------
The plug-in DCM Repository Backup Assistant is running

backup created: InstalledImage_EnterpriseManager0.nas

The plug-in DCM Repository Backup Assistant has successfully been performed
------------------------------------------------------
------------------------------------------------------
The plug-in EM Technology Stack Upgrade is running

The plug-in EM Technology Stack Upgrade has successfully been performed
------------------------------------------------------
------------------------------------------------------
The plug-in OMS Configuration is running

Operation Stopping OPMN Processes is in progress.
Operation EM Deploying is in progress.
Operation Creating OMS Respository is in progress.
Operation Configuring OMS is in progress.
OMS is being Secured and Lock is set to false.
Precompiling JSPs.
Performing installation of CLI services for client.
Operation Restarting OPMN Processes is in progress.

The plug-in OMS Configuration has successfully been performed
------------------------------------------------------
------------------------------------------------------
The plug-in Agent Configuration Assistant is running

Performing free port detection on host=nas
Securing the agent
Performing targets discovery and agent configuration

The plug-in Agent Configuration Assistant has failed its perform method
------------------------------------------------------
The action configuration has failed its perform method
###################################################
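
A log of this length is easier to read as a per-assistant summary. The sketch below is an illustration only: `summarize_config_log` is a hypothetical helper that relies solely on the two "The plug-in ... has ..." message patterns visible in the output above, so other message wordings would be silently skipped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one line per configuration assistant with its
# outcome, based on the success/failure messages seen in the log above.
summarize_config_log() {
    sed -n \
        -e 's/^The plug-in \(.*\) has successfully been performed[[:space:]]*$/OK: \1/p' \
        -e 's/^The plug-in \(.*\) has failed its perform method[[:space:]]*$/FAILED: \1/p' \
        "$1"
}

# Example call (log file name taken from the tail command above):
#   summarize_config_log /s01/app/gc/oms10g/cfgtoollogs/oui/configActions2011-01-23_08-57-20-AM.log
```

Piping the result through `grep '^FAILED'` gives a quick list of the assistants that need attention.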

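7. If the OMS configuration above completed successfully, the Grid Control web interface is already available for login. However, the agent on the local server is still at version 10.2.0.1, which is why the Agent Configuration Assistant failed. I recommend deleting the original agent directory and then reinstalling and deploying the agent from the 10.2.0.5 agent media (Linux_x86_64_Grid_Control_agent_download_10_2_0_5_0.zip); this avoids most problems.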
