Adding a LISTENER to an 11gR2 RAC and Registering It Statically

A reader recently wanted to add a listener to an 11gR2 RAC cluster. Looking at listener.ora, he found something puzzling:

[oracle@vrh2 ~]$ lsnrctl status
LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 04-DEC-2011 02:51:40

Copyright (c) 1991, 2011, Oracle. All rights reserved.

Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias LISTENER
Version TNSLSNR for Linux: Version 11.2.0.3.0 - Production
Start Date 02-DEC-2011 05:40:09
Uptime 1 days 21 hr. 11 min. 31 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Listener Parameter File /g01/11.2.0/grid/network/admin/listener.ora
Listener Log File /g01/orabase/diag/tnslsnr/vrh2/listener/alert/log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.163)(PORT=1521)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.164)(PORT=1521)))
Services Summary...
Service "+ASM" has 1 instance(s).
Instance "+ASM2", status READY, has 1 handler(s) for this service...
Service "VPROD" has 1 instance(s).
Instance "VPROD2", status READY, has 1 handler(s) for this service...
Service "VPRODXDB" has 1 instance(s).
Instance "VPROD2", status READY, has 1 handler(s) for this service...
The command completed successfully

[oracle@vrh2 ~]$ cat /g01/11.2.0/grid/network/admin/listener.ora

LISTENER_SCAN1=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))) # line added by Agent
LISTENER=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER)))) # line added by Agent
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER=ON # line added by Agent
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER_SCAN1=ON # line added by Agent
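ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LSN_MACLEAN=ON # line added by Agent

The entries in the listener.ora above were added by the agent during the Grid Infrastructure installation (During the Grid Infrastructure installation, the (default) node VIP listener is always created referencing the public network). The hardest part to understand is probably that LISTENER specifies only PROTOCOL=IPC. It gives no listening address or port.

In fact, in 11.2 GI the addresses and ports of LISTENER are managed by default through endpoints_listener.ora. This is a configuration file introduced in 11.2: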

Listener.ora

[grid@netrac1 admin]$ more listener.ora
LISTENER_SCAN2=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN2)))) # line added by Agent
LISTENER=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER))))# line added by Agent
LISTENER_SCAN1=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))) # line added by Agent

ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER_SCAN1=ON # line added by Agent
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER=ON # line added by Agent
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER_SCAN2=ON # line added by Agent

The ENABLE_GLOBAL_DYNAMIC_ENDPOINT_ parameter is set to allow the listener to accept connections
for pre-11.2 databases which did not register the dynamic endpoint.
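The sketch below summarizes a listener.ora so the pattern in these agent-generated files is visible at a glance. For each listener it lists the protocols in its address list. It also reports whether the matching ENABLE_GLOBAL_DYNAMIC_ENDPOINT_<name> flag is ON. This is a minimal Python illustration, not an Oracle tool, and the function name summarize_listener_ora is hypothetical. It assumes one parameter per line, as the agent writes them, so hand-edited multi-line entries are not handled.

```python
import re

# One "NAME=value" parameter per line, optionally followed by "# comment".
PARAM_RE = re.compile(r"^\s*([A-Za-z0-9_]+)\s*=\s*(.*?)\s*(#.*)?$")
PROTOCOL_RE = re.compile(r"\(PROTOCOL=(\w+)\)", re.IGNORECASE)
FLAG_PREFIX = "ENABLE_GLOBAL_DYNAMIC_ENDPOINT_"


def summarize_listener_ora(text):
    """Return {listener: {"protocols": [...], "global_dynamic_endpoint": bool}}."""
    listeners, flags = {}, {}
    for line in text.splitlines():
        m = PARAM_RE.match(line)
        if not m:
            continue  # blank lines and anything that is not NAME=value
        name, value = m.group(1).upper(), m.group(2)
        if name.startswith(FLAG_PREFIX):
            # ENABLE_GLOBAL_DYNAMIC_ENDPOINT_<listener>=ON|OFF
            flags[name[len(FLAG_PREFIX):]] = value.upper() == "ON"
        elif value.startswith("("):
            # A listener definition: collect the protocols of its addresses.
            protocols = [p.upper() for p in PROTOCOL_RE.findall(value)]
            listeners[name] = {"protocols": protocols}
    for name, info in listeners.items():
        info["global_dynamic_endpoint"] = flags.get(name, False)
    return listeners


if __name__ == "__main__":
    sample = """\
LISTENER_SCAN1=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))) # line added by Agent
LISTENER=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER)))) # line added by Agent
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER=ON # line added by Agent
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER_SCAN1=ON # line added by Agent
"""
    for name, info in summarize_listener_ora(sample).items():
        print(name, info)
```

Run against the files shown in this article, every listener reports only IPC. That is the point: the TCP endpoints are not kept in listener.ora.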

The status of the listener named "listener" shows one registered instance, i.e. the instance running on this node:

[grid@netrac1 admin]$ lsnrctl status listener
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER)))
STATUS of the LISTENER
------------------------
Alias LISTENER
Version TNSLSNR for Linux: Version 11.2.0.1.0 - Production
Start Date 15-FEB-2011 10:57:09
Uptime 0 days 0 hr. 0 min. 46 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Listener Parameter File /u01/app/11.2.0/grid/network/admin/listener.ora
Listener Log File /u01/app/grid/diag/tnslsnr/netrac1/listener/alert/log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=12.345.678.111)(PORT=1521))) ** Node IP Address **
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=12.345.678.888)(PORT=1521))) ** Node VIP Address **
Services Summary...
Service "v11gr2" has 1 instance(s).
Instance "v11gr21", status READY, has 2 handler(s) for this service...
The command completed successfully

New file for 11.2 called endpoints_listener.ora, showing the Node IP address and Node VIP address.

[grid@netrac1 admin]$ more endpoints_listener.ora
LISTENER_NETRAC1=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=netrac1-vip)(PORT=1521))
(ADDRESS=(PROTOCOL=TCP)(HOST=12.345.678.888)(PORT=1521)(IP=FIRST)))) # line added by Agent

The endpoints_listener.ora file exists for backward compatibility with pre-11.2 databases.
DBCA needs to know the endpoint locations to configure database parameters and the tnsnames.ora file.
It used to read them from listener.ora, but in 11.2 RAC listener.ora by default contains only IPC entries.

"Line added by Agent" means the entry was written by oraagent, the process that updates the listener.ora and endpoints_listener.ora files.
Endpoints_listener.ora showing the Node IP address and Node VIP address

[grid@netrac2 admin]$ more endpoints_listener.ora
LISTENER_NETRAC2=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=netrac2-vip)(PORT=1521))
(ADDRESS=(PROTOCOL=TCP)(HOST=12.345.678.999) (PORT=1521)(IP=FIRST)))) # line added by Agent
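The endpoints files above pair each node listener with the node VIP and the node IP; the node IP is marked IP=FIRST. Below is a minimal Python sketch that pulls those addresses out of one entry. It assumes the entry format shown here: comments start with #, and an entry may be wrapped over several lines. parse_endpoints is a hypothetical helper, not part of GI.

```python
import re

# A "(KEY=value)" pair whose value contains no parentheses.
KV_RE = re.compile(r"\((\w+)=([^()]*)\)")


def parse_endpoints(entry):
    """Return (listener_name, [address dicts]) for one endpoints_listener.ora entry."""
    # Strip "# ..." comments and join lines that were wrapped.
    text = "".join(line.split("#", 1)[0].strip() for line in entry.splitlines())
    name, _, value = text.partition("=")
    addresses = []
    for chunk in value.split("(ADDRESS=")[1:]:
        # Each chunk begins with the (PROTOCOL=..)(HOST=..)(PORT=..) pairs of one address.
        addresses.append({k.upper(): v.strip() for k, v in KV_RE.findall(chunk)})
    return name.strip(), addresses


if __name__ == "__main__":
    entry = """LISTENER_NETRAC1=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=netrac1-vip)(PORT=1521))
(ADDRESS=(PROTOCOL=TCP)(HOST=12.345.678.888)(PORT=1521)(IP=FIRST)))) # line added by Agent"""
    name, addrs = parse_endpoints(entry)
    for a in addrs:
        print(name, a["HOST"], a["PORT"], a.get("IP", ""))
```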

 

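At first I assumed the default address and port of LISTENER were stored in the OCR. However, dumping the registry with ocrdump showed no such entries. Only later did I discover that in 11.2 GI the listener's address and port information had moved to endpoints_listener.ora. "Line added by Agent" indicates that these entries are maintained by the oraagent process.

Note: when endpoints_listener.ora is in use, do not manage LISTENER with lsnrctl; use srvctl or crsctl instead. lsnrctl does not read the information in endpoints_listener.ora. As a result, the listener stops listening on the required addresses and ports. For example: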

[grid@vrh1 admin]$ lsnrctl status LSN_MACLEAN

LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 27-NOV-2011 10:45:26

Copyright (c) 1991, 2011, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LSN_MACLEAN)))
STATUS of the LISTENER
------------------------
Alias                     LSN_MACLEAN
Version                   TNSLSNR for Linux: Version 11.2.0.3.0 - Production
Start Date                26-NOV-2011 08:33:14
Uptime                    1 days 2 hr. 12 min. 11 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /g01/11.2.0/grid/network/admin/listener.ora
Listener Log File         /g01/orabase/diag/tnslsnr/vrh1/lsn_maclean/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LSN_MACLEAN)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.161)(PORT=1588)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.162)(PORT=1588)))
Services Summary...
Service "VPROD" has 1 instance(s).
  Instance "VPROD1", status READY, has 1 handler(s) for this service...
Service "VPRODXDB" has 1 instance(s).
  Instance "VPROD1", status READY, has 1 handler(s) for this service...
The command completed successfully

[grid@vrh1 admin]$ lsnrctl reload LSN_MACLEAN

LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 27-NOV-2011 10:45:39

Copyright (c) 1991, 2011, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LSN_MACLEAN)))
The command completed successfully

[grid@vrh1 admin]$ lsnrctl status LSN_MACLEAN

LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 27-NOV-2011 10:45:44

Copyright (c) 1991, 2011, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LSN_MACLEAN)))
STATUS of the LISTENER
------------------------
Alias                     LSN_MACLEAN
Version                   TNSLSNR for Linux: Version 11.2.0.3.0 - Production
Start Date                26-NOV-2011 08:33:14
Uptime                    1 days 2 hr. 12 min. 30 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /g01/11.2.0/grid/network/admin/listener.ora
Listener Log File         /g01/orabase/diag/tnslsnr/vrh1/lsn_maclean/alert/log.xml
Listening Endpoints Summary...
 (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LSN_MACLEAN)))
The listener supports no services
The command completed successfully

[grid@vrh1 admin]$ srvctl stop listener -l LSN_MACLEAN

[grid@vrh1 admin]$ srvctl start listener -l LSN_MACLEAN  

[grid@vrh1 admin]$ lsnrctl status LSN_MACLEAN

LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 27-NOV-2011 10:46:26

Copyright (c) 1991, 2011, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LSN_MACLEAN)))
STATUS of the LISTENER
------------------------
Alias                     LSN_MACLEAN
Version                   TNSLSNR for Linux: Version 11.2.0.3.0 - Production
Start Date                27-NOV-2011 10:46:22
Uptime                    0 days 0 hr. 0 min. 4 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /g01/11.2.0/grid/network/admin/listener.ora
Listener Log File         /g01/orabase/diag/tnslsnr/vrh1/lsn_maclean/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LSN_MACLEAN)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.161)(PORT=1588)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.162)(PORT=1588)))
The listener supports no services
The command completed successfully
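The transcript shows the symptom to watch for. After lsnrctl reload, the TCP endpoints disappear from the Listening Endpoints Summary, and srvctl brings them back. Below is a minimal Python sketch that compares two saved lsnrctl status outputs and reports the TCP endpoints that were lost. The function names are hypothetical, and the parsing assumes the 11.2 output layout shown above.

```python
import re

# Matches "(PROTOCOL=tcp)(HOST=x)(PORT=n)" or "(PROTOCOL=ipc)(KEY=x)".
ENDPOINT_RE = re.compile(
    r"\(PROTOCOL=(\w+)\)\((?:HOST|KEY)=([^)]*)\)(?:\(PORT=(\d+)\))?", re.IGNORECASE)


def listening_endpoints(status_output):
    """Return {(protocol, host_or_key, port)} listed under 'Listening Endpoints Summary...'."""
    endpoints = set()
    in_summary = False
    for line in status_output.splitlines():
        if line.strip().startswith("Listening Endpoints Summary"):
            in_summary = True
            continue
        if in_summary:
            m = ENDPOINT_RE.search(line)
            if not m:
                break  # the summary ends at the first line that is not an endpoint
            proto, where, port = m.groups()
            endpoints.add((proto.lower(), where, port))
    return endpoints


def lost_tcp_endpoints(before, after):
    """TCP endpoints listed in the 'before' output but missing from 'after'."""
    lost = listening_endpoints(before) - listening_endpoints(after)
    return {e for e in lost if e[0] == "tcp"}
```

Fed the first two lsnrctl status LSN_MACLEAN outputs above, it returns the two port 1588 endpoints. That is a sign the listener should be restarted through srvctl rather than reloaded with lsnrctl.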

 

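In 11.2 RAC, therefore, listener.ora records only the IPC entry for LISTENER, while the TCP endpoints are kept in endpoints_listener.ora. This arrangement still lets DBCA locate the endpoints when it configures database parameters and the tnsnames.ora file.

With this background, the listener entries in the listener.ora of an 11.2 RAC should no longer look strange.

A new listener can be added with the graphical netca tool or the srvctl command-line tool. Simply adding an entry to listener.ora by hand does not register the listener as a Cluster Ready Services resource, so CRS will not manage it.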

Method 1:

Use the graphical netca and netmgr tools to add the listener and register it statically.

1) Log in to any node as the Grid Infrastructure (GI) owner and run netca to start the GUI:

su - grid
(grid)$ export DISPLAY=:0
(grid)$ netca

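Select Listener Configuration.

Select Add.

Enter the listener name.

Select the subnet and the available protocols. The defaults are usually fine unless you have more than one public network subnet.

Enter the port number.

Select No.

Select the listener to start, i.e. the listener you just created.

Then click Finish to exit netca, and start netmgr to add the static registration information for the listener: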

su - grid
(grid)$ export DISPLAY=:0
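(grid)$ netmgr

Click the listener you just created and open the Database Services menu.

Enter the Global Database Name and the SID of the local instance. Confirm that the Oracle Home Directory is correct (it should be the Grid Infrastructure home), then click Save Network Configuration.

Then restart the listener with srvctl or crsctl for the change to take effect: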

[grid@vrh1 admin]$ crsctl status  res ora.MACLEAN_LISTENER.lsnr
NAME=ora.MACLEAN_LISTENER.lsnr
TYPE=ora.listener.type
TARGET=ONLINE        , ONLINE
STATE=ONLINE on vrh1, ONLINE on vrh2

[grid@vrh1 admin]$ crsctl stop  res ora.MACLEAN_LISTENER.lsnr
CRS-2673: Attempting to stop 'ora.MACLEAN_LISTENER.lsnr' on 'vrh1'
CRS-2673: Attempting to stop 'ora.MACLEAN_LISTENER.lsnr' on 'vrh2'
CRS-2677: Stop of 'ora.MACLEAN_LISTENER.lsnr' on 'vrh1' succeeded
CRS-2677: Stop of 'ora.MACLEAN_LISTENER.lsnr' on 'vrh2' succeeded

[grid@vrh1 admin]$ crsctl start  res ora.MACLEAN_LISTENER.lsnr
CRS-2672: Attempting to start 'ora.MACLEAN_LISTENER.lsnr' on 'vrh2'
CRS-2672: Attempting to start 'ora.MACLEAN_LISTENER.lsnr' on 'vrh1'
CRS-2676: Start of 'ora.MACLEAN_LISTENER.lsnr' on 'vrh1' succeeded
CRS-2676: Start of 'ora.MACLEAN_LISTENER.lsnr' on 'vrh2' succeeded

[grid@vrh1 admin]$ lsnrctl status MACLEAN_LISTENER

LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 27-NOV-2011 11:00:42

Copyright (c) 1991, 2011, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=MACLEAN_LISTENER)))
STATUS of the LISTENER
------------------------
Alias                     MACLEAN_LISTENER
Version                   TNSLSNR for Linux: Version 11.2.0.3.0 - Production
Start Date                27-NOV-2011 11:00:11
Uptime                    0 days 0 hr. 0 min. 31 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /g01/11.2.0/grid/network/admin/listener.ora
Listener Log File         /g01/orabase/diag/tnslsnr/vrh1/maclean_listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=MACLEAN_LISTENER)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.161)(PORT=1598)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.162)(PORT=1598)))
Services Summary...
Service "VPROD" has 1 instance(s).
  Instance "VPROD1", status UNKNOWN, has 1 handler(s) for this service...
The command completed successfully

[grid@vrh1 admin]$ srvctl stop listener -l MACLEAN_LISTENER

[grid@vrh1 admin]$ srvctl start listener -l MACLEAN_LISTENER  

[grid@vrh1 admin]$ srvctl config listener -l MACLEAN_LISTENER
Name: MACLEAN_LISTENER
Network: 1, Owner: grid
Home:
End points: TCP:1598

[grid@vrh1 admin]$ lsnrctl status MACLEAN_LISTENER

LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 27-NOV-2011 11:01:42

Copyright (c) 1991, 2011, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=MACLEAN_LISTENER)))
STATUS of the LISTENER
------------------------
Alias                     MACLEAN_LISTENER
Version                   TNSLSNR for Linux: Version 11.2.0.3.0 - Production
Start Date                27-NOV-2011 11:01:10
Uptime                    0 days 0 hr. 0 min. 31 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /g01/11.2.0/grid/network/admin/listener.ora
Listener Log File         /g01/orabase/diag/tnslsnr/vrh1/maclean_listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=MACLEAN_LISTENER)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.161)(PORT=1598)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.162)(PORT=1598)))
Services Summary...
Service "VPROD" has 1 instance(s).
  Instance "VPROD1", status UNKNOWN, has 1 handler(s) for this service...
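The command completed successfully

The netca and netmgr GUI tools have now been used to add a new listener and register it statically.

Method 2: add the listener with srvctl and add the static registration manually

Check the network number of the default network (the first field after "Network exists:", 1 in this example):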

[grid@vrh1 admin]$ srvctl config network
Network exists: 1/192.168.1.0/255.255.255.0/eth0, type static
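When scripting the next step, the network number can be read from this output. The minimal sketch below assumes the single-line format that this 11.2 srvctl prints: "Network exists: <num>/<subnet>/<netmask>/<interface>, type <type>". Later releases print the network configuration differently. parse_networks is a hypothetical helper.

```python
import re

# Format assumed from the 11.2 output above.
NETWORK_RE = re.compile(
    r"Network exists:\s*(\d+)/([\d.]+)/([\d.]+)/(\S+?),\s*type\s+(\w+)")


def parse_networks(output):
    """Parse 'srvctl config network' lines into a list of dicts."""
    networks = []
    for line in output.splitlines():
        m = NETWORK_RE.search(line)
        if m:
            num, subnet, mask, iface, kind = m.groups()
            networks.append({"number": int(num), "subnet": subnet,
                             "netmask": mask, "interface": iface, "type": kind})
    return networks


if __name__ == "__main__":
    out = "Network exists: 1/192.168.1.0/255.255.255.0/eth0, type static"
    print([n["number"] for n in parse_networks(out)])
```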

The syntax of srvctl add listener is:

[grid@vrh1 admin]$  srvctl add listener -h

Adds a listener configuration to the Oracle Clusterware.

Usage: srvctl add listener [-l <lsnr_name>] [-s] [-p "[TCP:]<port>[, ...][/IPC:<key>]
[/NMP:<pipe_name>][/TCPS:<s_port>] [/SDP:<port>]"] [-o <oracle_home>] [-k <net_num>]
    -l <lsnr_name>           Listener name (default name is LISTENER)
    -o <oracle_home>         ORACLE_HOME path (default value is CRS_HOME)
    -k <net_num>             network number (default number is 1)
    -s                       Skip the checking of ports
    -p "[TCP:]<port>[, ...][/IPC:<key>][/NMP:<pipe_name>][/TCPS:<s_port>] [/SDP:<port>]"  Comma separated tcp ports or listener endpoints
    -h                       Print usage

[grid@vrh1 admin]$  srvctl add listener -l NEW_MACLEAN_LISTENER -o $CRS_HOME -p 1601 -k 1

Here -k takes the network number obtained above, -p the port number, -l the listener name, and -o the GI home path.
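The same options can be assembled programmatically, for example in a provisioning script. The sketch below only builds the argument list from the options shown in the srvctl add listener -h output above. It does not run srvctl, and the function name is hypothetical.

```python
import shlex


def srvctl_add_listener_args(name, port, net_num=1, oracle_home=None, skip_port_check=False):
    """Build the argv for 'srvctl add listener' from the options in its -h output."""
    port = int(port)
    if not 0 < port < 65536:
        raise ValueError(f"invalid TCP port: {port}")
    args = ["srvctl", "add", "listener",
            "-l", name,           # listener name
            "-p", str(port),      # TCP port (plain form of the -p endpoint list)
            "-k", str(net_num)]   # network number from 'srvctl config network'
    if oracle_home:
        args += ["-o", oracle_home]  # when omitted, srvctl defaults to the CRS home
    if skip_port_check:
        args.append("-s")            # skip the checking of ports
    return args


if __name__ == "__main__":
    argv = srvctl_add_listener_args("NEW_MACLEAN_LISTENER", 1601, 1, "/g01/11.2.0/grid")
    print(shlex.join(argv))  # shlex.join needs Python 3.8+
```

Passing the list to subprocess.run without shell=True avoids quoting problems; shlex.join is used here only to display it.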

[grid@vrh1 admin]$ srvctl start listener -l NEW_MACLEAN_LISTENER

After the new listener is started with srvctl start listener, new entries appear in listener.ora and endpoints_listener.ora:

[grid@vrh1 admin]$ head -1 listener.ora
NEW_MACLEAN_LISTENER=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=NEW_MACLEAN_LISTENER)))) # line added by Agent

[grid@vrh1 admin]$ head -1 endpoints_listener.ora
NEW_MACLEAN_LISTENER_VRH1=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=vrh1-vip)(PORT=1601))(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.1.161)(PORT=1601)(IP=FIRST)))) # line added by Agent

This completes adding the listener, which shows that srvctl is the simpler tool for the job.

All that remains is to add the static registration information, for example:

SID_LIST_NEW_MACLEAN_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (GLOBAL_DBNAME = VPROD)
      (ORACLE_HOME = /g01/11.2.0/grid)
      (SID_NAME = VPROD1)
    )
  )
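The GI home is not shared in this cluster; cluvfy reports "/g01/11.2.0/grid" as not shared. Each node therefore has its own listener.ora, and each node's entry names its local instance. The minimal sketch below renders the block above for each node. The node-to-SID mapping is an assumption based on the instance names VPROD1 and VPROD2 that appear in this article, and sid_list_block is a hypothetical helper.

```python
def sid_list_block(listener, global_dbname, oracle_home, sid):
    """Render a SID_LIST_<listener> entry in the layout used above."""
    return "\n".join([
        f"SID_LIST_{listener} =",
        "  (SID_LIST =",
        "    (SID_DESC =",
        f"      (GLOBAL_DBNAME = {global_dbname})",
        f"      (ORACLE_HOME = {oracle_home})",
        f"      (SID_NAME = {sid})",
        "    )",
        "  )",
    ])


if __name__ == "__main__":
    # Assumed mapping of node to local instance, following VPROD1/VPROD2 above.
    for node, sid in {"vrh1": "VPROD1", "vrh2": "VPROD2"}.items():
        print(f"# listener.ora on {node}")
        print(sid_list_block("NEW_MACLEAN_LISTENER", "VPROD", "/g01/11.2.0/grid", sid))
```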

 

Add the entry above to listener.ora (the parameter is named SID_LIST_<listener_name>) and restart the listener to complete the static registration:

[grid@vrh1 admin]$ srvctl stop listener -l NEW_MACLEAN_LISTENER

[grid@vrh1 admin]$ srvctl start listener -l NEW_MACLEAN_LISTENER  

[grid@vrh1 admin]$ lsnrctl status NEW_MACLEAN_LISTENER

LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 27-NOV-2011 11:21:37

Copyright (c) 1991, 2011, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=NEW_MACLEAN_LISTENER)))
STATUS of the LISTENER
------------------------
Alias                     NEW_MACLEAN_LISTENER
Version                   TNSLSNR for Linux: Version 11.2.0.3.0 - Production
Start Date                27-NOV-2011 11:21:25
Uptime                    0 days 0 hr. 0 min. 11 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /g01/11.2.0/grid/network/admin/listener.ora
Listener Log File         /g01/11.2.0/grid/log/diag/tnslsnr/vrh1/new_maclean_listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=NEW_MACLEAN_LISTENER)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.161)(PORT=1601)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.162)(PORT=1601)))
Services Summary...
Service "VPROD" has 1 instance(s).
  Instance "VPROD1", status UNKNOWN, has 1 handler(s) for this service...
The command completed successfully

The srvctl tool has now been used to add a new listener and register it statically.

11gR2 New Feature: STANDBY_MAX_DATA_DELAY

Active Data Guard is one of the highlight features of Oracle 11g. 11g Release 2 adds further attractive features to it, making Active Data Guard a strong option for read/write splitting and reporting queries with Oracle.

STANDBY_MAX_DATA_DELAY is one of the most important 11gR2 enhancements to Active Data Guard. It is a session parameter that specifies how long, in seconds, changes committed on the primary database may take to become visible to queries on the standby database ("Specifies a limit for the amount of time (in seconds) allowed to elapse between when changes are committed on the primary and when those same changes can be queried on the standby database").

The syntax for setting STANDBY_MAX_DATA_DELAY is:

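ALTER SESSION SET STANDBY_MAX_DATA_DELAY = { NONE | integer }

Notes

  • The parameter cannot be used by SYS; setting it in a SYS session is ignored
  • If STANDBY_MAX_DATA_DELAY is not specified, its default value NONE applies, and queries on the physical standby run regardless of how far the standby lags behind the primary
  • If the delay exceeds the value specified by STANDBY_MAX_DATA_DELAY, the query fails with ORA-03172: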

03172, 00000, "STANDBY_MAX_DATA_DELAY of %s seconds exceeded"
// *Cause:  Standby recovery fell behind the STANDBY_MAX_DATA_DELAY
//          requirement.
// *Action: Tune recovery and retry the query later, or switch to another
//          standby database within the data delay requirement.
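The rules listed above can be condensed into a small decision function. This is a simplified model for illustration only, not how Oracle implements the check. apply_lag_seconds is a hypothetical input that stands for how far the standby trails the primary's commits. The special 200 ms handling of a zero setting is ignored here.

```python
class StandbyMaxDataDelayExceeded(Exception):
    """Stands in for ORA-03172 in this model."""


def check_standby_query(delay_limit, apply_lag_seconds, is_sys=False):
    """Return True if a standby query may run under the rules listed above.

    delay_limit: None for NONE (the default) or a non-negative number of seconds.
    apply_lag_seconds: hypothetical input, how far the standby trails the primary's commits.
    """
    if is_sys or delay_limit is None:
        # SYS ignores the setting, and NONE never blocks a query.
        return True
    if apply_lag_seconds > delay_limit:
        raise StandbyMaxDataDelayExceeded(
            f"ORA-03172: STANDBY_MAX_DATA_DELAY of {delay_limit} seconds exceeded")
    return True


if __name__ == "__main__":
    for limit, lag in [(None, 600), (5, 3), (5, 8)]:
        try:
            print(limit, lag, check_standby_query(limit, lag))
        except StandbyMaxDataDelayExceeded as exc:
            print(limit, lag, exc)
```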

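In practice, STANDBY_MAX_DATA_DELAY ensures that reporting queries on the standby do not return overly stale results. The parameter lets us state how much data delay a reporting application can tolerate.

You can also tolerate no delay at all by setting STANDBY_MAX_DATA_DELAY to 0, which gives real-time queries.

Configuring real-time (zero-delay) queries between the primary and standby databases involves the following points:

  • Only certain applications have zero tolerance for data delay; check whether yours really has such a strict requirement
  • A query run on the standby must return exactly the same result as the same query on the primary
  • STANDBY_MAX_DATA_DELAY must be set to 0
  • When the query starts, the standby must have caught up to the primary's current SCN
  • If the result is not returned within 200 ms, the query is terminated with ORA-03172
  • The primary database must run in maximum availability or maximum protection mode
  • Redo transport must use the SYNC option
  • Real-Time Query must be enabled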
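The configuration items in this list can be turned into a quick checklist. The minimal sketch below assumes you have already collected the values yourself, for example PROTECTION_MODE from V$DATABASE and the redo transport attribute of the relevant LOG_ARCHIVE_DEST_n. The dictionary keys and the function name are hypothetical.

```python
def zero_delay_problems(cfg):
    """Return the zero-delay prerequisites from the list above that cfg does not meet.

    cfg is a hypothetical dict filled in by hand; its key names are assumptions.
    """
    problems = []
    if cfg.get("standby_max_data_delay") != 0:
        problems.append("STANDBY_MAX_DATA_DELAY must be 0")
    if cfg.get("protection_mode") not in ("MAXIMUM AVAILABILITY", "MAXIMUM PROTECTION"):
        problems.append("primary must run in maximum availability or maximum protection mode")
    if cfg.get("redo_transport") != "SYNC":
        problems.append("redo transport must use SYNC")
    if not cfg.get("real_time_query"):
        problems.append("Real-Time Query must be enabled")
    return problems


if __name__ == "__main__":
    demo = {"standby_max_data_delay": 0, "protection_mode": "MAXIMUM AVAILABILITY",
            "redo_transport": "ASYNC", "real_time_query": True}
    print(zero_delay_problems(demo))
```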

 

Usage example

The following demonstration shows the effect of STANDBY_MAX_DATA_DELAY:

SQL> select * from v$version;  

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
PL/SQL Release 11.2.0.2.0 - Production
CORE    11.2.0.2.0      Production
TNS for Linux: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production

SQL> select * from global_name;

GLOBAL_NAME
--------------------------------------------------------------------------------
www.oracledatabase12g.com & www.askmaclean.com

Primary Database  SQL> conn maclean/maclean
Connected.

Primary Database SQL> select database_role,protection_mode from v$database;

DATABASE_ROLE    PROTECTION_MODE
---------------- --------------------
PRIMARY          MAXIMUM AVAILABILITY

Primary Database SQL>  create table TSMDD tablespace users as select * From dba_objects;
Table created.

Standby Database SQL> conn maclean/maclean
Connected.

Standby Database SQL> select database_role,protection_mode from v$database;

DATABASE_ROLE    PROTECTION_MODE
---------------- --------------------
PHYSICAL STANDBY MAXIMUM AVAILABILITY

Note that STANDBY_MAX_DATA_DELAY is a session parameter, not an instance parameter:

Standby Database SQL> select name from v$system_parameter where name='standby_max_data_delay';

no rows selected

Standby Database SQL> alter session set STANDBY_MAX_DATA_DELAY=0;

Session altered.

Standby Database SQL> select count(*) from TSMDD; 

  COUNT(*)
----------
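     13378

Actual testing shows that with STANDBY_MAX_DATA_DELAY=0, ORA-03172 is not raised simply because a query runs longer than 200 ms. It is raised when the standby has not caught up to the primary's current SCN within 200 ms of the query starting.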
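The 10046 trace later in this section is consistent with this reading. Each failed execution waits about 200 ms (ela= 201440 and ela= 200546 microseconds) on standby query scn advance before the error. The sketch below is a simplified model of that behaviour, for illustration only. The query proceeds if the standby's applied SCN reaches the primary's SCN at query start within a 200 ms window. The SCN samples are hypothetical inputs, not something Oracle exposes in this form.

```python
def zero_delay_query_allowed(primary_scn_at_start, standby_scn_samples, window_ms=200):
    """Model of STANDBY_MAX_DATA_DELAY=0: True if the standby catches up within the window.

    standby_scn_samples: hypothetical (elapsed_ms, applied_scn) pairs observed after
    the query starts, in time order.
    """
    for elapsed_ms, applied_scn in standby_scn_samples:
        if elapsed_ms > window_ms:
            break  # window closed: this is where ORA-03172 would be raised
        if applied_scn >= primary_scn_at_start:
            return True
    return False


if __name__ == "__main__":
    # Hypothetical samples: the standby is still behind when the window closes.
    print(zero_delay_query_allowed(1000, [(0, 990), (150, 995), (201, 1000)]))
```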

 

Standby Database SQL> alter session set STANDBY_MAX_DATA_DELAY=0;

Session altered.

Standby Database SQL> set timing on;

Standby Database SQL> select count(1) from TSMDD a, TSMDD b;

  COUNT(1)
----------
 178970884

Elapsed: 00:00:05.34

Standby Database SQL> alter session set events '10046 trace name context forever,level 12';
Session altered.

On the primary, run a large INSERT without committing it:

Primary Database SQL> insert into /*+ append */  tsmdd select * from tsmdd;

A query on the standby now raises ORA-03172:

Standby Database SQL> select count(*) from tsmdd
                     *
ERROR at line 1:
ORA-03172: STANDBY_MAX_DATA_DELAY of 0 seconds exceeded

Standby Database SQL>  /
select count(*) from tsmdd
*
ERROR at line 1:
ORA-03172: STANDBY_MAX_DATA_DELAY of 0 seconds exceeded

The 10046 trace of the query executions above:

PARSING IN CURSOR #47828795969456 len=26 dep=0 uid=34 oct=3 lid=34 tim=1316692536000853
hv=2314050071 ad='7115e798' sqlid='3smn48y4yv6hr'

select count(*) from tsmdd
END OF STMT
PARSE #47828795969456:c=0,e=61,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=1739041831,tim=1316692536000852
WAIT #47828795969456: nam='standby query scn advance'
ela= 201440 p1=770798 p2=0 p3=20 obj#=13873 tim=1316692536202337
WAIT #47828795969456: nam='SQL*Net break/reset to client' ela= 25 driver id=1650815232
break?=1 p3=0 obj#=13873 tim=1316692536202528
WAIT #47828795969456: nam='SQL*Net break/reset to client' ela= 144 driver id=1650815232
break?=0 p3=0 obj#=13873 tim=1316692536202694
WAIT #47828795969456: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1
p3=0 obj#=13873 tim=1316692536202715

*** 2011-09-22 19:55:37.983
WAIT #47828795969456: nam='SQL*Net message from client' ela= 1781108 driver
id=1650815232 #bytes=1 p3=0 obj#=13873 tim=1316692537983884
CLOSE #47828795969456:c=0,e=24,dep=0,type=0,tim=1316692537984068

===============================================================================================

PARSING IN CURSOR #47828795969456 len=26 dep=0 uid=34 oct=3 lid=34 tim=1316692537984172
hv=2314050071 ad='7115e798' sqlid='3smn48y4yv6hr'
select count(*) from tsmdd
END OF STMT
PARSE #47828795969456:c=0,e=53,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=1739041831,tim=1316692537984171
WAIT #47828795969456: nam='standby query scn advance' ela= 200546 p1=770914
p2=0 p3=20 obj#=13873 tim=1316692538184822
WAIT #47828795969456: nam='SQL*Net break/reset to client' ela= 10 driver
id=1650815232 break?=1 p3=0 obj#=13873 tim=1316692538184998
WAIT #47828795969456: nam='SQL*Net break/reset to client' ela= 103 driver
id=1650815232 break?=0 p3=0 obj#=13873 tim=1316692538185154
WAIT #47828795969456: nam='SQL*Net message to client' ela= 1 driver
id=1650815232 #bytes=1 p3=0 obj#=13873 tim=1316692538185182
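A short script is enough to pull these waits out of a larger 10046 trace. The minimal sketch below assumes each WAIT record sits on one line, as in the raw trace file; the listing above is wrapped for display. waits is a hypothetical helper, and it only extracts ela, p1 to p3 and obj#.

```python
import re

# One WAIT record of a 10046 trace, e.g.
# WAIT #c: nam='standby query scn advance' ela= 201440 p1=770798 p2=0 p3=20 obj#=13873 tim=...
WAIT_RE = re.compile(r"WAIT #(\d+): nam='([^']*)' ela=\s*(\d+)(.*?)tim=(\d+)")
P_RE = re.compile(r"\bp([123])=(\d+)")
OBJ_RE = re.compile(r"obj#=(-?\d+)")


def waits(trace_text, event=None):
    """Yield (event_name, ela_microseconds, params) for each WAIT line."""
    for m in WAIT_RE.finditer(trace_text):
        _cursor, name, ela, rest, _tim = m.groups()
        if event is not None and name != event:
            continue
        params = {f"p{n}": int(v) for n, v in P_RE.findall(rest)}
        obj = OBJ_RE.search(rest)
        if obj:
            params["obj#"] = int(obj.group(1))
        yield name, int(ela), params


if __name__ == "__main__":
    line = ("WAIT #47828795969456: nam='standby query scn advance' ela= 201440 "
            "p1=770798 p2=0 p3=20 obj#=13873 tim=1316692536202337")
    for name, ela, params in waits(line, "standby query scn advance"):
        print(f"{name}: {ela / 1000:.1f} ms", params)
```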

 

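Note the standby query scn advance wait event here. It evidently serves to check the SCN gap between the primary and the standby, yet it is another internal, undocumented wait event. My guess is that P1 is the standby database's current SCN and P3 may be the SCN gap between primary and standby. OBJ# is the object_id of the queried object: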

SQL> col owner for a20
SQL> col object_name for a20

SQL> select owner,object_name from dba_objects where object_id=13873;

OWNER                OBJECT_NAME
-------------------- --------------------
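MACLEAN              TSMDD

Tips

In practice there is no need to set STANDBY_MAX_DATA_DELAY by hand every time a session logs in; an AFTER LOGON trigger can do it for us.

11g Release 2 adds a new attribute, DATABASE_ROLE, to the USERENV context. It makes it easy to tell whether the database a user has logged in to is a primary or a standby. 11g SQL and PL/SQL clients can read it through the SYS_CONTEXT function.

With the following logon trigger, an application that logs in to a standby database with real-time query enabled automatically gets a suitable STANDBY_MAX_DATA_DELAY. This avoids changing application code while still enforcing a reasonable maximum data delay.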

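CREATE OR REPLACE TRIGGER AUTO_SMDD
  AFTER LOGON ON SCHEMA  -- fires when the schema that owns the trigger logs in
BEGIN
  -- Only limit the data delay when connected to a physical standby.
  IF (SYS_CONTEXT('USERENV', 'DATABASE_ROLE') IN ('PHYSICAL STANDBY')) THEN
    EXECUTE IMMEDIATE 'alter session set standby_max_data_delay=5';
  END IF;
END;
/

Note that this trigger only needs to be created on the primary database, as the application user; it reaches the standby through redo apply: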

Primary Database SQL>  conn maclean/maclean
Connected.

Primary Database SQL> CREATE OR REPLACE TRIGGER AUTO_SMDD
  2    AFTER LOGON ON MACLEAN.SCHEMA
  3  BEGIN
  4    IF (SYS_CONTEXT('USERENV', 'DATABASE_ROLE') IN ('PHYSICAL STANDBY')) THEN
  5      execute immediate 'alter session set standby_max_data_delay=0';
  6    END IF;
  7  END;
  8  /
Trigger created.

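Adding a Node to 11.2.0.2 Grid Infrastructure

In an earlier article I described the steps for adding a node to a 10g RAC cluster. In 11gR2, Oracle CRS became Grid Infrastructure (GI), which makes CRS resources such as VIPs and ASM easier to control. As a consequence, adding a node to 11.2 GI differs considerably from 10gR2.

Here are the key points of adding a node to GI in 11.2:

I. Preparation

Preparation must not be skipped. In the article on adding a node to a 10g RAC cluster I listed the prerequisites that must be met. They still apply to 11.2 GI, but note the following points:

1. Configure user equivalence not only for the oracle user but also for the grid user (the GI installation owner). The only exception is if you install both GI and the RDBMS as oracle, which is not recommended.

2. 11.2 GI introduces the octssd (Oracle Cluster Time Synchronization Service daemon) time synchronization service. If you intend to use octssd, disable the ntpd time service as follows: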

# service ntpd stop
Shutting down ntpd:                                        [  OK  ]
# chkconfig ntpd off
# mv /etc/ntp.conf /etc/ntp.conf.orig
# rm /var/run/ntpd.pid
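Whether CTSS actually takes over depends on what it finds on the nodes. The minimal sketch below estimates the expected mode on one node. It rests on a simplifying assumption: a leftover NTP configuration file or ntpd pid file keeps CTSS in observer mode, and their absence lets it run in active mode. The function name is hypothetical. Clusterware's own check is authoritative, so confirm the mode with crsctl check ctss once GI is running.

```python
from pathlib import Path


def expected_ctss_mode(ntp_conf="/etc/ntp.conf", ntpd_pid="/var/run/ntpd.pid"):
    """Guess the CTSS mode on this node: 'observer' if NTP traces remain, else 'active'.

    Simplifying assumption of this sketch: a remaining NTP configuration file or
    ntpd pid file means NTP is (or will be) in charge, so CTSS only observes.
    """
    if Path(ntp_conf).exists() or Path(ntpd_pid).exists():
        return "observer"
    return "active"


if __name__ == "__main__":
    print("expected CTSS mode:", expected_ctss_mode())
```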

3. Use the Cluster Verification Utility (cluvfy) to check that the new node meets the cluster requirements:

cluvfy stage -pre nodeadd -n <NEW NODE>

For example:

su - grid

[grid@vrh1 ~]$ cluvfy stage -pre nodeadd -n vrh3

Performing pre-checks for node addition 

Checking node reachability...
Node reachability check passed from node "vrh1"

Checking user equivalence...
User equivalence check passed for user "grid"

Checking node connectivity...

Checking hosts config file...

Verification of the hosts config file successful

Check: Node connectivity for interface "eth0"
Node connectivity passed for interface "eth0"

Node connectivity check passed

Checking CRS integrity...

CRS integrity check passed

Checking shared resources...

Checking CRS home location...
The location "/g01/11.2.0/grid" is not shared but is present/creatable on all nodes
Shared resources check for node addition passed

Checking node connectivity...

Checking hosts config file...

Verification of the hosts config file successful

Check: Node connectivity for interface "eth0"
Node connectivity passed for interface "eth0"

Check: Node connectivity for interface "eth1"
Node connectivity passed for interface "eth1"

Node connectivity check passed

Total memory check passed
Available memory check passed
Swap space check passed
Free disk space check passed for "vrh3:/tmp"
Free disk space check passed for "vrh1:/tmp"
Check for multiple users with UID value 54322 passed
User existence check passed for "grid"
Run level check passed
Hard limits check failed for "maximum open file descriptors"
Check failed on nodes:
        vrh3
Soft limits check passed for "maximum open file descriptors"
Hard limits check passed for "maximum user processes"
Soft limits check passed for "maximum user processes"
System architecture check passed
Kernel version check passed
Kernel parameter check passed for "semmsl"
Kernel parameter check passed for "semmns"
Kernel parameter check passed for "semopm"
Kernel parameter check passed for "semmni"
Kernel parameter check passed for "shmmax"
Kernel parameter check passed for "shmmni"
Kernel parameter check passed for "shmall"
Kernel parameter check passed for "file-max"
Kernel parameter check passed for "ip_local_port_range"
Kernel parameter check passed for "rmem_default"
Kernel parameter check passed for "rmem_max"
Kernel parameter check passed for "wmem_default"
Kernel parameter check passed for "wmem_max"
Kernel parameter check passed for "aio-max-nr"
Package existence check passed for "make-3.81( x86_64)"
Package existence check passed for "binutils-2.17.50.0.6( x86_64)"
Package existence check passed for "gcc-4.1.2 (x86_64)( x86_64)"
Package existence check passed for "libaio-0.3.106 (x86_64)( x86_64)"
Package existence check passed for "glibc-2.5-24 (x86_64)( x86_64)"
Package existence check passed for "compat-libstdc++-33-3.2.3 (x86_64)( x86_64)"
Package existence check passed for "elfutils-libelf-0.125 (x86_64)( x86_64)"
Package existence check passed for "elfutils-libelf-devel-0.125( x86_64)"
Package existence check passed for "glibc-common-2.5( x86_64)"
Package existence check passed for "glibc-devel-2.5 (x86_64)( x86_64)"
Package existence check passed for "glibc-headers-2.5( x86_64)"
Package existence check passed for "gcc-c++-4.1.2 (x86_64)( x86_64)"
Package existence check passed for "libaio-devel-0.3.106 (x86_64)( x86_64)"
Package existence check passed for "libgcc-4.1.2 (x86_64)( x86_64)"
Package existence check passed for "libstdc++-4.1.2 (x86_64)( x86_64)"
Package existence check passed for "libstdc++-devel-4.1.2 (x86_64)( x86_64)"
Package existence check passed for "sysstat-7.0.2( x86_64)"
Package existence check passed for "ksh-20060214( x86_64)"
Check for multiple users with UID value 0 passed
Current group ID check passed

Checking OCR integrity...

OCR integrity check passed

Checking Oracle Cluster Voting Disk configuration...

Oracle Cluster Voting Disk configuration check passed
Time zone consistency check passed

Starting Clock synchronization checks using Network Time Protocol(NTP)...

NTP Configuration file check started...
No NTP Daemons or Services were found to be running

Clock synchronization check using Network Time Protocol(NTP) passed

User "grid" is not part of "root" group. Check passed
Checking consistency of file "/etc/resolv.conf" across nodes

File "/etc/resolv.conf" does not have both domain and search entries defined
domain entry in file "/etc/resolv.conf" is consistent across nodes
search entry in file "/etc/resolv.conf" is consistent across nodes
All nodes have one search entry defined in file "/etc/resolv.conf"
PRVF-5636 : The DNS response time for an unreachable node exceeded "15000" ms on following nodes: vrh3

File "/etc/resolv.conf" is not consistent across nodes

Pre-check for node addition was unsuccessful on all the nodes.

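As a rule, if you do not resolve host names through DNS, the inconsistent resolv.conf can be ignored. In silent installation mode, however, it can prevent the operation from completing, as discussed later.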

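II. Adding the new node to GI

Note that addNode.sh, the key script for adding a node to 11.2.0.2 GI, may have a bug. The official documentation says that to add a node in interactive mode through the OUI GUI you only need to run addNode.sh. In practice, that is not what happens: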

The documentation says:
Go to CRS_home/oui/bin and run the addNode.sh script on one of the existing nodes.
Oracle Universal Installer runs in add node mode and the Welcome page displays.
Click Next and the Specify Cluster Nodes for Node Addition page displays.

What we did:

Run addNode.sh as the GI owner (usually the grid user) on an existing node where GI is running:

[grid@vrh1 ~]$ cd $ORA_CRS_HOME/oui/bin

[grid@vrh1 bin]$ ./addNode.sh
ERROR:
Value for CLUSTER_NEW_NODES not specified.

USAGE:
/g01/11.2.0/grid/cv/cvutl/check_nodeadd.pl  {-pre|-post} 

/g01/11.2.0/grid/cv/cvutl/check_nodeadd.pl -pre [-silent] CLUSTER_NEW_NODES={}
/g01/11.2.0/grid/cv/cvutl/check_nodeadd.pl -pre [-silent] CLUSTER_NEW_NODES={} 
CLUSTER_NEW_VIRTUAL_HOSTNAMES={}

/g01/11.2.0/grid/cv/cvutl/check_nodeadd.pl -pre [-silent] -responseFile
/g01/11.2.0/grid/cv/cvutl/check_nodeadd.pl -post [-silent]

We wanted to add the node through the interactive OUI GUI (runInstaller -addnode). Instead, addNode.sh asks for parameters, and the check_nodeadd.pl script it calls runs in silent mode.
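As its usage message above shows, addNode.sh expects the new node names as NAME={...} arguments. With several new nodes, these quoted, brace-wrapped arguments are easy to get wrong. The minimal sketch below builds the argument list, assuming that multiple nodes are given as a comma-separated list inside the braces. add_node_args is a hypothetical helper, and it does not run addNode.sh.

```python
import shlex


def add_node_args(nodes, vips, private_names, silent=True):
    """Build the argv for addNode.sh for one or more new nodes."""
    if not (len(nodes) == len(vips) == len(private_names)):
        raise ValueError("need one VIP name and one private name per new node")
    args = ["./addNode.sh"]
    if silent:
        args.append("-silent")  # must be a plain ASCII hyphen
    args += [
        "CLUSTER_NEW_NODES={%s}" % ",".join(nodes),
        "CLUSTER_NEW_VIRTUAL_HOSTNAMES={%s}" % ",".join(vips),
        "CLUSTER_NEW_PRIVATE_NODE_NAMES={%s}" % ",".join(private_names),
    ]
    return args


if __name__ == "__main__":
    # shlex.join (Python 3.8+) prints a shell-quoted command line.
    print(shlex.join(add_node_args(["vrh3"], ["vrh3-vip"], ["vrh3-priv"])))
```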

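After searching MOS and Google, I found that almost every document recommends adding nodes in silent mode, so I had to switch to a silent addition. Silent mode needs only a few parameters, which may be one reason it is favored. Here, however, we hit another problem:

Syntax:
./addNode.sh -silent \
  "CLUSTER_NEW_NODES={node2}" \
  "CLUSTER_NEW_PRIVATE_NODE_NAMES={node2-priv}" "CLUSTER_NEW_VIRTUAL_HOSTNAMES={node2-vip}"

In our example the command is: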

./addNode.sh -silent \
  "CLUSTER_NEW_NODES={vrh3}" "CLUSTER_NEW_VIRTUAL_HOSTNAMES={vrh3-vip}" \
  "CLUSTER_NEW_PRIVATE_NODE_NAMES={vrh3-priv}"

Because of -silent, the command above prints nothing to the terminal; its output actually goes to the /tmp/silentInstall.log log file. Running it again without -silent:

./addNode.sh "CLUSTER_NEW_NODES={vrh3}" \
  "CLUSTER_NEW_VIRTUAL_HOSTNAMES={vrh3-vip}" "CLUSTER_NEW_PRIVATE_NODE_NAMES={vrh3-priv}"

Performing pre-checks for node addition 

Checking node reachability...
Node reachability check passed from node "vrh1"

Checking user equivalence...
User equivalence check passed for user "grid"

Checking node connectivity...

Checking hosts config file...

Verification of the hosts config file successful

Check: Node connectivity for interface "eth0"
Node connectivity passed for interface "eth0"

Node connectivity check passed

Checking CRS integrity...

CRS integrity check passed

Checking shared resources...

Checking CRS home location...
The location "/g01/11.2.0/grid" is not shared but is present/creatable on all nodes
Shared resources check for node addition passed

Checking node connectivity...

Checking hosts config file...

Verification of the hosts config file successful

Check: Node connectivity for interface "eth0"
Node connectivity passed for interface "eth0"

Check: Node connectivity for interface "eth1"
Node connectivity passed for interface "eth1"

Node connectivity check passed

Total memory check passed
Available memory check passed
Swap space check passed
Free disk space check passed for "vrh3:/tmp"
Free disk space check passed for "vrh1:/tmp"
Check for multiple users with UID value 54322 passed
User existence check passed for "grid"
Run level check passed
Hard limits check failed for "maximum open file descriptors"
Check failed on nodes:
        vrh3
Soft limits check passed for "maximum open file descriptors"
Hard limits check passed for "maximum user processes"
Soft limits check passed for "maximum user processes"
System architecture check passed
Kernel version check passed
Kernel parameter check passed for "semmsl"
Kernel parameter check passed for "semmns"
Kernel parameter check passed for "semopm"
Kernel parameter check passed for "semmni"
Kernel parameter check passed for "shmmax"
Kernel parameter check passed for "shmmni"
Kernel parameter check passed for "shmall"
Kernel parameter check passed for "file-max"
Kernel parameter check passed for "ip_local_port_range"
Kernel parameter check passed for "rmem_default"
Kernel parameter check passed for "rmem_max"
Kernel parameter check passed for "wmem_default"
Kernel parameter check passed for "wmem_max"
Kernel parameter check passed for "aio-max-nr"
Package existence check passed for "make-3.81( x86_64)"
Package existence check passed for "binutils-2.17.50.0.6( x86_64)"
Package existence check passed for "gcc-4.1.2 (x86_64)( x86_64)"
Package existence check passed for "libaio-0.3.106 (x86_64)( x86_64)"
Package existence check passed for "glibc-2.5-24 (x86_64)( x86_64)"
Package existence check passed for "compat-libstdc++-33-3.2.3 (x86_64)( x86_64)"
Package existence check passed for "elfutils-libelf-0.125 (x86_64)( x86_64)"
Package existence check passed for "elfutils-libelf-devel-0.125( x86_64)"
Package existence check passed for "glibc-common-2.5( x86_64)"
Package existence check passed for "glibc-devel-2.5 (x86_64)( x86_64)"
Package existence check passed for "glibc-headers-2.5( x86_64)"
Package existence check passed for "gcc-c++-4.1.2 (x86_64)( x86_64)"
Package existence check passed for "libaio-devel-0.3.106 (x86_64)( x86_64)"
Package existence check passed for "libgcc-4.1.2 (x86_64)( x86_64)"
Package existence check passed for "libstdc++-4.1.2 (x86_64)( x86_64)"
Package existence check passed for "libstdc++-devel-4.1.2 (x86_64)( x86_64)"
Package existence check passed for "sysstat-7.0.2( x86_64)"
Package existence check passed for "ksh-20060214( x86_64)"
Check for multiple users with UID value 0 passed
Current group ID check passed

Checking OCR integrity...

OCR integrity check passed

Checking Oracle Cluster Voting Disk configuration...

Oracle Cluster Voting Disk configuration check passed
Time zone consistency check passed

Starting Clock synchronization checks using Network Time Protocol(NTP)...

NTP Configuration file check started...
No NTP Daemons or Services were found to be running

Clock synchronization check using Network Time Protocol(NTP) passed

User "grid" is not part of "root" group. Check passed
Checking consistency of file "/etc/resolv.conf" across nodes

File "/etc/resolv.conf" does not have both domain and search entries defined
domain entry in file "/etc/resolv.conf" is consistent across nodes
search entry in file "/etc/resolv.conf" is consistent across nodes
All nodes have one search entry defined in file "/etc/resolv.conf"
PRVF-5636 : The DNS response time for an unreachable node exceeded "15000" ms on following nodes: vrh3

File "/etc/resolv.conf" is not consistent across nodes

Checking VIP configuration.
Checking VIP Subnet configuration.
Check for VIP Subnet configuration passed.
Checking VIP reachability
Check for VIP reachability passed.

Pre-check for node addition was unsuccessful on all the nodes.
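Before deciding to bypass these pre-checks, it is worth listing exactly which checks failed. In this run they were the hard limit on open file descriptors on vrh3 and the DNS / /etc/resolv.conf consistency checks. The following is a minimal sketch, assuming the pre-check output was saved to a file. The grep patterns are just the wordings that appear in the transcript above, not a complete catalogue of cluvfy messages, and `cluvfy_failures` is a hypothetical helper name.

```bash
#!/bin/bash
# Print the lines of a saved cluvfy / addNode.sh pre-check log that point at
# a problem. Patterns are taken from the transcript above (an assumption, not
# an exhaustive list). Reads the file given as $1, or stdin if none is given.
cluvfy_failures() {
  grep -E 'check failed|PRVF-[0-9]+|not consistent|unsuccessful|^ERROR' "${1:--}"
}
```

For example, `cluvfy_failures /tmp/silentInstall.log` after a silent run, or pipe the interactive output through `tee precheck.log` first. Only skip the checks once every line it prints is something you have already reviewed.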

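Before actually adding the node, addNode.sh also calls the cluvfy tool to verify that the new node meets the requirements, and it refuses to go any further if they are not met. Since we had already verified the new node earlier, we can safely skip addNode.sh's own verification here. Let's look at what addNode.sh contains: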

[grid@vrh1 bin]$ cat addNode.sh 

#!/bin/sh
OHOME=/g01/11.2.0/grid
INVPTRLOC=$OHOME/oraInst.loc
ADDNODE="$OHOME/oui/bin/runInstaller -addNode -invPtrLoc $INVPTRLOC ORACLE_HOME=$OHOME $*"
if [ "$IGNORE_PREADDNODE_CHECKS" = "Y" -o ! -f "$OHOME/cv/cvutl/check_nodeadd.pl" ]
then
        $ADDNODE
else
        CHECK_NODEADD="$OHOME/perl/bin/perl $OHOME/cv/cvutl/check_nodeadd.pl -pre $*"
        $CHECK_NODEADD
        if [ $? -eq 0 ]
        then
        $ADDNODE
        fi
fi

As you can see, the IGNORE_PREADDNODE_CHECKS environment variable controls whether the node-addition pre-checks are performed. We set it manually and then run addNode.sh again:

export IGNORE_PREADDNODE_CHECKS=Y

./addNode.sh "CLUSTER_NEW_NODES={vrh3}" \
"CLUSTER_NEW_VIRTUAL_HOSTNAMES={vrh3-vip}" "CLUSTER_NEW_PRIVATE_NODE_NAMES={vrh3-priv}" \
> add_node.log 2>&1
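The same run can be wrapped so that IGNORE_PREADDNODE_CHECKS is set only for the one addNode.sh invocation instead of staying exported in the grid user's shell. This is a minimal sketch under two assumptions: the GI home is this article's /g01/11.2.0/grid, and the new node follows the `<node>-vip` / `<node>-priv` naming used here. `run_addnode` and the `DRY_RUN` switch are hypothetical names for illustration.

```bash
#!/bin/bash
# Run addNode.sh for one new node with the pre-checks skipped.
# Assumptions: GI home below, and "<node>-vip" / "<node>-priv" host naming.
GRID_HOME=${GRID_HOME:-/g01/11.2.0/grid}

run_addnode() {
  local node=$1 log=${2:-add_node.log}
  local -a args=(
    "CLUSTER_NEW_NODES={$node}"
    "CLUSTER_NEW_VIRTUAL_HOSTNAMES={$node-vip}"
    "CLUSTER_NEW_PRIVATE_NODE_NAMES={$node-priv}"
  )
  if [ "${DRY_RUN:-0}" = 1 ]; then
    # Only show what would be executed.
    echo "IGNORE_PREADDNODE_CHECKS=Y $GRID_HOME/oui/bin/addNode.sh ${args[*]}"
    return 0
  fi
  # The variable applies to this command only; it is not left exported.
  IGNORE_PREADDNODE_CHECKS=Y "$GRID_HOME/oui/bin/addNode.sh" "${args[@]}" >"$log" 2>&1
}
```

As the grid user on vrh1, `DRY_RUN=1 run_addnode vrh3` shows the command line, and `run_addnode vrh3 add_node.log &` followed by `tail -f add_node.log` reproduces the run above.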

In another terminal window, you can follow the node-addition log:

tail -f add_node.log 

Starting Oracle Universal Installer...

Checking swap space: must be greater than 500 MB.   Actual 5951 MB    Passed
Checking monitor: must be configured to display at least 256 colors.    Actual 16777216    Passed
Oracle Universal Installer, Version 11.2.0.2.0 Production
Copyright (C) 1999, 2010, Oracle. All rights reserved.

Performing tests to see whether nodes vrh2,vrh3 are available
............................................................... 100% Done.

.
-----------------------------------------------------------------------------
Cluster Node Addition Summary
Global Settings
   Source: /g01/11.2.0/grid
   New Nodes
Space Requirements
   New Nodes
      vrh3
         /: Required 6.66GB : Available 32.40GB
Installed Products
   Product Names
      Oracle Grid Infrastructure 11.2.0.2.0
      Sun JDK 1.5.0.24.08
      Installer SDK Component 11.2.0.2.0
      Oracle One-Off Patch Installer 11.2.0.0.2
      Oracle Universal Installer 11.2.0.2.0
      Oracle USM Deconfiguration 11.2.0.2.0
      Oracle Configuration Manager Deconfiguration 10.3.1.0.0
      Enterprise Manager Common Core Files 10.2.0.4.3
      Oracle DBCA Deconfiguration 11.2.0.2.0
      Oracle RAC Deconfiguration 11.2.0.2.0
      Oracle Quality of Service Management (Server) 11.2.0.2.0
      Installation Plugin Files 11.2.0.2.0
      Universal Storage Manager Files 11.2.0.2.0
      Oracle Text Required Support Files 11.2.0.2.0
      Automatic Storage Management Assistant 11.2.0.2.0
      Oracle Database 11g Multimedia Files 11.2.0.2.0
      Oracle Multimedia Java Advanced Imaging 11.2.0.2.0
      Oracle Globalization Support 11.2.0.2.0
      Oracle Multimedia Locator RDBMS Files 11.2.0.2.0
      Oracle Core Required Support Files 11.2.0.2.0
      Bali Share 1.1.18.0.0
      Oracle Database Deconfiguration 11.2.0.2.0
      Oracle Quality of Service Management (Client) 11.2.0.2.0
      Expat libraries 2.0.1.0.1
      Oracle Containers for Java 11.2.0.2.0
      Perl Modules 5.10.0.0.1
      Secure Socket Layer 11.2.0.2.0
      Oracle JDBC/OCI Instant Client 11.2.0.2.0
      Oracle Multimedia Client Option 11.2.0.2.0
      LDAP Required Support Files 11.2.0.2.0
      Character Set Migration Utility 11.2.0.2.0
      Perl Interpreter 5.10.0.0.1
      PL/SQL Embedded Gateway 11.2.0.2.0
      OLAP SQL Scripts 11.2.0.2.0
      Database SQL Scripts 11.2.0.2.0
      Oracle Extended Windowing Toolkit 3.4.47.0.0
      SSL Required Support Files for InstantClient 11.2.0.2.0
      SQL*Plus Files for Instant Client 11.2.0.2.0
      Oracle Net Required Support Files 11.2.0.2.0
      Oracle Database User Interface 2.2.13.0.0
      RDBMS Required Support Files for Instant Client 11.2.0.2.0
      RDBMS Required Support Files Runtime 11.2.0.2.0
      XML Parser for Java 11.2.0.2.0
      Oracle Security Developer Tools 11.2.0.2.0
      Oracle Wallet Manager 11.2.0.2.0
      Enterprise Manager plugin Common Files 11.2.0.2.0
      Platform Required Support Files 11.2.0.2.0
      Oracle JFC Extended Windowing Toolkit 4.2.36.0.0
      RDBMS Required Support Files 11.2.0.2.0
      Oracle Ice Browser 5.2.3.6.0
      Oracle Help For Java 4.2.9.0.0
      Enterprise Manager Common Files 10.2.0.4.3
      Deinstallation Tool 11.2.0.2.0
      Oracle Java Client 11.2.0.2.0
      Cluster Verification Utility Files 11.2.0.2.0
      Oracle Notification Service (eONS) 11.2.0.2.0
      Oracle LDAP administration 11.2.0.2.0
      Cluster Verification Utility Common Files 11.2.0.2.0
      Oracle Clusterware RDBMS Files 11.2.0.2.0
      Oracle Locale Builder 11.2.0.2.0
      Oracle Globalization Support 11.2.0.2.0
      Buildtools Common Files 11.2.0.2.0
      Oracle RAC Required Support Files-HAS 11.2.0.2.0
      SQL*Plus Required Support Files 11.2.0.2.0
      XDK Required Support Files 11.2.0.2.0
      Agent Required Support Files 10.2.0.4.3
      Parser Generator Required Support Files 11.2.0.2.0
      Precompiler Required Support Files 11.2.0.2.0
      Installation Common Files 11.2.0.2.0
      Required Support Files 11.2.0.2.0
      Oracle JDBC/THIN Interfaces 11.2.0.2.0
      Oracle Multimedia Locator 11.2.0.2.0
      Oracle Multimedia 11.2.0.2.0
      HAS Common Files 11.2.0.2.0
      Assistant Common Files 11.2.0.2.0
      PL/SQL 11.2.0.2.0
      HAS Files for DB 11.2.0.2.0
      Oracle Recovery Manager 11.2.0.2.0
      Oracle Database Utilities 11.2.0.2.0
      Oracle Notification Service 11.2.0.2.0
      SQL*Plus 11.2.0.2.0
      Oracle Netca Client 11.2.0.2.0
      Oracle Net 11.2.0.2.0
      Oracle JVM 11.2.0.2.0
      Oracle Internet Directory Client 11.2.0.2.0
      Oracle Net Listener 11.2.0.2.0
      Cluster Ready Services Files 11.2.0.2.0
      Oracle Database 11g 11.2.0.2.0
-----------------------------------------------------------------------------

Instantiating scripts for add node (Monday, August 15, 2011 10:15:35 PM CST)
.                                                                 1% Done.
Instantiation of add node scripts complete

Copying to remote nodes (Monday, August 15, 2011 10:15:38 PM CST)
...............................................................................................                                 96% Done.
Home copied to new nodes

Saving inventory on nodes (Monday, August 15, 2011 10:21:02 PM CST)
.                                                               100% Done.
Save inventory complete
WARNING:A new inventory has been created on one or more nodes in this session.
However, it has not yet been registered as the central inventory of this system.
To register the new inventory please run the script at '/g01/oraInventory/orainstRoot.sh'
with root privileges on nodes 'vrh3'.
If you do not register the inventory, you may not be able to update or
patch the products you installed.
The following configuration scripts need to be executed as the "root" user in each cluster node.
/g01/oraInventory/orainstRoot.sh #On nodes vrh3
/g01/11.2.0/grid/root.sh #On nodes vrh3
To execute the configuration scripts:
    1. Open a terminal window
    2. Log in as "root"
    3. Run the scripts in each cluster node

The Cluster Node Addition of /g01/11.2.0/grid was successful.
Please check '/tmp/silentInstall.log' for more details.
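Instead of watching `tail -f`, a script can wait for the success line and then pull out the root scripts OUI asks for. A minimal sketch, assuming the log wording is exactly as in the output above ("The Cluster Node Addition of ... was successful." and "... #On nodes ..."). Absence of the success line is simply treated as "still running" until the timeout. Both function names are hypothetical.

```bash
#!/bin/bash
# Wait (polling every 5 s) until the add-node log reports success.
# Returns 0 on success, 1 if the timeout (seconds) expires first.
wait_for_addnode() {
  local log=$1 timeout=${2:-3600} waited=0
  while [ "$waited" -lt "$timeout" ]; do
    if grep -q 'Cluster Node Addition of .* was successful' "$log" 2>/dev/null; then
      return 0
    fi
    sleep 5
    waited=$((waited + 5))
  done
  return 1
}

# List the scripts OUI says must be run as root ("<path> #On nodes <node>").
root_scripts_to_run() {
  awk '/#On nodes/ { print $1 }' "$1"
}
```

For example: `wait_for_addnode add_node.log && root_scripts_to_run add_node.log`.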

The GI software has now been installed successfully. Next, we must run two key scripts on the newly added node. Do not forget this step!

orainstRoot.sh and root.sh must be run as root:
su - root 
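If you prefer to script this step, the sketch below runs the two scripts in the order OUI listed them and stops at the first failure. Assumption: the inventory location is read from `$GRID_HOME/oraInst.loc`, the pointer file addNode.sh itself passes via `-invPtrLoc`. Pass `/etc/oraInst.loc` as the second argument if that is where your pointer file lives. `inventory_loc` and `run_root_scripts` are hypothetical helper names.

```bash
#!/bin/bash
# Extract inventory_loc=... from an oraInst.loc-style pointer file.
inventory_loc() {
  sed -n 's/^inventory_loc=\([^[:space:]]*\).*/\1/p' "$1"
}

# Run orainstRoot.sh, then root.sh. With DRY_RUN=1 only print the paths.
run_root_scripts() {
  local grid_home=${1:-/g01/11.2.0/grid}
  local ptr=${2:-$grid_home/oraInst.loc}
  local inv s
  inv=$(inventory_loc "$ptr")
  if [ -z "$inv" ]; then
    echo "inventory_loc not found in $ptr" >&2
    return 1
  fi
  if [ "${DRY_RUN:-0}" != 1 ] && [ "$(id -u)" -ne 0 ]; then
    echo "run this as root" >&2
    return 1
  fi
  for s in "$inv/orainstRoot.sh" "$grid_home/root.sh"; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "$s"
    else
      "$s" || return 1
    fi
  done
}
```

Note that root.sh is still interactive (it prompts for the local bin directory), so run this from a terminal.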

[root@vrh3]# cat /etc/oraInst.loc
inventory_loc=/g01/oraInventory                     -- this is the oraInventory location
inst_group=asmadmin

[root@vrh3 ~]# cd /g01/oraInventory

[root@vrh3 oraInventory]# ./orainstRoot.sh
Creating the Oracle inventory pointer file (/etc/oraInst.loc)
Changing permissions of /g01/oraInventory.
Adding read,write permissions for group.
Removing read,write,execute permissions for world.

Changing groupname of /g01/oraInventory to asmadmin.
The execution of the script is complete.

Run root.sh from the CRS_HOME. It may print warnings, but they are not a concern:

[root@vrh3 ~]# cd $ORA_CRS_HOME

[root@vrh3 g01]# /g01/11.2.0/grid/root.sh
Running Oracle 11g root script...

The following environment variables are set as:
    ORACLE_OWNER= grid
    ORACLE_HOME=  /g01/11.2.0/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
   Copying dbhome to /usr/local/bin ...
   Copying oraenv to /usr/local/bin ...
   Copying coraenv to /usr/local/bin ...

Creating /etc/oratab file...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.

Using configuration parameter file: /g01/11.2.0/grid/crs/install/crsconfig_params
Creating trace directory
LOCAL ADD MODE
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
OLR initialization - successful
Adding daemon to inittab
ACFS-9200: Supported
ACFS-9300: ADVM/ACFS distribution files found.
ACFS-9307: Installing requested ADVM/ACFS software.
ACFS-9308: Loading installed ADVM/ACFS drivers.
ACFS-9321: Creating udev for ADVM/ACFS.
ACFS-9323: Creating module dependencies - this may take some time.
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9309: ADVM/ACFS installation correctness verified.
CRS-4402: The CSS daemon was started in exclusive mode but found an active CSS daemon on node vrh1, number 1, and is terminating
An active cluster was found during exclusive startup, restarting to join the cluster
clscfg: EXISTING configuration version 5 detected.
clscfg: version 5 is 11g Release 2.
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
/g01/11.2.0/grid/bin/srvctl start listener -n vrh3 ... failed
Failed to perform new node configuration at /g01/11.2.0/grid/crs/install/crsconfig_lib.pm line 8255.
/g01/11.2.0/grid/perl/bin/perl -I/g01/11.2.0/grid/perl/lib -I/g01/11.2.0/grid/crs/install 
/g01/11.2.0/grid/crs/install/rootcrs.pl execution failed

Two minor errors appeared above:

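1. The LISTENER start failure on the new node can be ignored. It happens because the RDBMS_HOME has not been installed there yet, while CRS still tries to start the listener that belongs to it: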

[root@vrh3 g01]# /g01/11.2.0/grid/bin/srvctl start listener -n vrh3
PRCR-1013 : Failed to start resource ora.CRS_LISTENER.lsnr
PRCR-1064 : Failed to start resource ora.CRS_LISTENER.lsnr on node vrh3
CRS-5010: Update of configuration file "/s01/orabase/product/11.2.0/dbhome_1/network/admin/listener.ora" failed: details at "(:CLSN00014:)" in "/g01/11.2.0/grid/log/vrh3/agent/crsd/oraagent_oracle/oraagent_oracle.log"
CRS-5013: Agent "/g01/11.2.0/grid/bin/oraagent.bin" failed to start process "/s01/orabase/product/11.2.0/dbhome_1/bin/lsnrctl" for action "check": details at "(:CLSN00008:)" in "/g01/11.2.0/grid/log/vrh3/agent/crsd/oraagent_oracle/oraagent_oracle.log"
CRS-2674: Start of 'ora.CRS_LISTENER.lsnr' on 'vrh3' failed
CRS-5013: Agent "/g01/11.2.0/grid/bin/oraagent.bin" failed to start process "/s01/orabase/product/11.2.0/dbhome_1/bin/lsnrctl" for action "clean": details at "(:CLSN00008:)" in "/g01/11.2.0/grid/log/vrh3/agent/crsd/oraagent_oracle/oraagent_oracle.log"
CRS-5013: Agent "/g01/11.2.0/grid/bin/oraagent.bin" failed to start process "/s01/orabase/product/11.2.0/dbhome_1/bin/lsnrctl" for action "check": details at "(:CLSN00008:)" in "/g01/11.2.0/grid/log/vrh3/agent/crsd/oraagent_oracle/oraagent_oracle.log"
CRS-2678: 'ora.CRS_LISTENER.lsnr' on 'vrh3' has experienced an unrecoverable failure
CRS-0267: Human intervention required to resume its availability.
PRCC-1015 : LISTENER was already running on vrh3
PRCR-1004 : Resource ora.LISTENER.lsnr is already running

2. If rootcrs.pl fails, rerunning it once is usually enough:

[root@vrh3 bin]# /g01/11.2.0/grid/perl/bin/perl -I/g01/11.2.0/grid/perl/lib \
-I/g01/11.2.0/grid/crs/install /g01/11.2.0/grid/crs/install/rootcrs.pl

Using configuration parameter file: /g01/11.2.0/grid/crs/install/crsconfig_params
PRKO-2190 : VIP exists for node vrh3, VIP name vrh3-vip
PRKO-2420 : VIP is already started on node(s): vrh3
Preparing packages for installation...
cvuqdisk-1.0.9-1
Configure Oracle Grid Infrastructure for a Cluster ... succeeded

3. It is advisable to restart CRS on the new node and then use cluvfy to confirm that the node addition completed successfully:

[root@vrh3 ~]# crsctl stop crs

[root@vrh3 ~]# crsctl start crs

[root@vrh3 ~]# su - grid

[grid@vrh3 ~]$ cluvfy stage -post nodeadd -n vrh1,vrh2,vrh3

Performing post-checks for node addition 

Checking node reachability...
Node reachability check passed from node "vrh1"

Checking user equivalence...
User equivalence check passed for user "grid"

Checking node connectivity...

Checking hosts config file...

Verification of the hosts config file successful

Check: Node connectivity for interface "eth0"
Node connectivity passed for interface "eth0"

Node connectivity check passed

Checking cluster integrity...

Cluster integrity check passed

Checking CRS integrity...

CRS integrity check passed

Checking shared resources...

Checking CRS home location...
The location "/g01/11.2.0/grid" is not shared but is present/creatable on all nodes
Shared resources check for node addition passed

Checking node connectivity...

Checking hosts config file...

Verification of the hosts config file successful

Check: Node connectivity for interface "eth0"
Node connectivity passed for interface "eth0"

Check: Node connectivity for interface "eth1"
Node connectivity passed for interface "eth1"

Node connectivity check passed

Checking node application existence...

Checking existence of VIP node application (required)
VIP node application check passed

Checking existence of NETWORK node application (required)
NETWORK node application check passed

Checking existence of GSD node application (optional)
GSD node application is offline on nodes "vrh3,vrh2,vrh1"

Checking existence of ONS node application (optional)
ONS node application check passed

Checking Single Client Access Name (SCAN)...

Checking TCP connectivity to SCAN Listeners...
TCP connectivity to SCAN Listeners exists on all cluster nodes

Checking name resolution setup for "vrh.cluster.oracle.com"...

ERROR:
PRVF-4664 : Found inconsistent name resolution entries for SCAN name "vrh.cluster.oracle.com"

ERROR:
PRVF-4657 : Name resolution setup check for "vrh.cluster.oracle.com" (IP address: 192.168.1.190) failed

ERROR:
PRVF-4664 : Found inconsistent name resolution entries for SCAN name "vrh.cluster.oracle.com"

Verification of SCAN VIP and Listener setup failed

User "grid" is not part of "root" group. Check passed

Checking if Clusterware is installed on all nodes...
Check of Clusterware install passed

Checking if CTSS Resource is running on all nodes...
CTSS resource check passed

Querying CTSS for time offset on all nodes...
Query of CTSS for time offset passed

Check CTSS state started...
CTSS is in Active state. Proceeding with check of clock time offsets on all nodes...
Check of clock time offsets passed

Oracle Cluster Time Synchronization Services check passed

Post-check for node addition was successful.
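Step 2 above ("if rootcrs.pl fails, rerun it once") can be expressed as a small retry helper. This is a sketch, not an Oracle-provided tool. `retry` and `rerun_rootcrs` are hypothetical names, and the limit of 2 attempts simply mirrors "rerun it once"; it is not an Oracle-documented value. The command line is the same one used on vrh3 above and must be run as root.

```bash
#!/bin/bash
# Run "$@" up to $1 times; return 0 as soon as one attempt succeeds.
retry() {
  local attempts=$1 n=1
  shift
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      return 1
    fi
    echo "attempt $n failed: $*; retrying" >&2
    n=$((n + 1))
  done
  return 0
}

GRID_HOME=${GRID_HOME:-/g01/11.2.0/grid}

# Same rootcrs.pl invocation as in step 2; run as root on the new node.
rerun_rootcrs() {
  retry 2 "$GRID_HOME/perl/bin/perl" -I"$GRID_HOME/perl/lib" \
    -I"$GRID_HOME/crs/install" "$GRID_HOME/crs/install/rootcrs.pl"
}
```

If it still fails after the second attempt, read the rootcrs log rather than looping further.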

An INS-35354 issue when installing 11gR2 RAC

Today, while installing an 11.2.0.2 RAC database, I ran into INS-35354:
[Screenshot: 11gR2-GI-INS-35354]

Because 11.2.0.2 GI had already been installed successfully and the cluster was perfectly healthy, this error came as something of a surprise:

[grid@vrh1 ~]$ crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online

A search on MOS suggested that it might be caused by an incorrectly updated inventory.xml in oraInventory:

Applies to:
Oracle Server - Enterprise Edition - Version: 11.2.0.1 to 11.2.0.2 - Release: 11.2 to 11.2
Information in this document applies to any platform.
Symptoms

Installing 11gR2 database software in a Grid Infrastructure environment fails with the error INS-35354:

The system on which you are attempting to install Oracle RAC is not part of a valid cluster.

Grid Infrastructure (Oracle Clusterware) is running on all nodes in the cluster which can be verified with:

crsctl check crs

Changes
This is a new install.
Cause
As per 11gR2 documentation the error description is:

INS-35354: The system on which you are attempting to install Oracle RAC is not part of a valid cluster.

Cause: Prior to installing Oracle RAC, you must create a valid cluster. 
This is done by deploying Grid Infrastructure software, 
which will allow configuration of Oracle Clusterware and Automatic Storage Management.

However, the problem at hand may be that the central inventory is missing the "CRS=true" flag 
(for the Grid Infrastructure Home).
<inventory.xml>
-------------

<HOME_LIST>
<HOME NAME="Ora11g_gridinfrahome1" LOC="/u01/grid" TYPE="O" IDX="1">
<NODE_LIST>
<NODE NAME="node1"/>
<NODE NAME="node2"/>
</NODE_LIST>

 -------------

From the inventory.xml, we see that the HOME NAME line is missing the CRS="true" flag.

The error INS-35354 will occur when the central inventory entry for the Grid Infrastructure 
home is missing the flag that identifies it as CRS-type home.
Solution
Use the -updateNodeList option for the installer command to fix the inventory.

The full syntax is:

./runInstaller -updateNodeList "CLUSTER_NODES={node1,node2}"
ORACLE_HOME="" ORACLE_HOME_NAME="" LOCAL_NODE="Node_Name" CRS=[true|false]

Execute the command on any node in the cluster.

Examples:

For a two-node RAC cluster on UNIX:

Node1:
cd /u01/grid/oui/bin
./runInstaller -updateNodeList "CLUSTER_NODES={node1,node2}" ORACLE_HOME="/u01/crs" \
ORACLE_HOME_NAME="GI_11201" LOCAL_NODE="node1" CRS=true

For a 2-node RAC cluster on Windows:

Node 1:
cd e:\app\11.2.0\grid\oui\bin
e:\app\11.2.0\grid\oui\bin\setup -updateNodeList "CLUSTER_NODES={RACNODE1,RACNODE2}" 
ORACLE_HOME="e:\app\11.2.0\grid" ORACLE_HOME_NAME="OraCrs11g_home1" LOCAL_NODE="RACNODE1" CRS=true
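For comparison, the documented -updateNodeList route can be filled in with this cluster's own values: home name Ora11g_gridinfrahome1, home /g01/11.2.0/grid, nodes vrh1 and vrh2, taken from the inventory.xml shown below. This is a sketch only. `update_node_list` and `DRY_RUN` are hypothetical names, and the arguments are passed as separate words, which is equivalent to the quoted form in the note.

```bash
#!/bin/bash
# Build (and optionally run) the MOS -updateNodeList command for a GI home.
# Args: GI home, home name, comma-separated node list, local node.
update_node_list() {
  local home=$1 home_name=$2 nodes=$3 local_node=$4
  local -a cmd=("$home/oui/bin/runInstaller" -updateNodeList
    "CLUSTER_NODES={$nodes}" "ORACLE_HOME=$home"
    "ORACLE_HOME_NAME=$home_name" "LOCAL_NODE=$local_node" CRS=true)
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

As the grid user on vrh1: `DRY_RUN=1 update_node_list /g01/11.2.0/grid Ora11g_gridinfrahome1 vrh1,vrh2 vrh1`.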

The inventory.xml in my environment looked like this:

[grid@vrh1 ContentsXML]$ cat inventory.xml 
<?xml version="1.0" standalone="yes" ?>
<!-- Copyright (c) 1999, 2010, Oracle. All rights reserved. -->
<!-- Do not modify the contents of this file by hand. -->
<INVENTORY>
<VERSION_INFO>
   <SAVED_WITH>11.2.0.2.0</SAVED_WITH>
   <MINIMUM_VER>2.1.0.6.0</MINIMUM_VER>
</VERSION_INFO>
<HOME_LIST>
<HOME NAME="Ora11g_gridinfrahome1" LOC="/g01/11.2.0/grid" TYPE="O" IDX="1" >
   <NODE_LIST>
      <NODE NAME="vrh1"/>
      <NODE NAME="vrh2"/>
   </NODE_LIST>
</HOME>
</HOME_LIST>
</INVENTORY>
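A quick way to check for this condition on any node is to grep the GI home's `<HOME>` entry for the flag. This is a sketch under the assumption that the whole `<HOME ...>` tag sits on one line, as in the file above. It is a text check, not an XML parser, and `gi_home_has_crs_flag` is a hypothetical name.

```bash
#!/bin/bash
# Succeed if the <HOME> entry whose LOC matches $2 carries CRS="true".
gi_home_has_crs_flag() {
  local inv=$1 loc=$2
  grep -F "LOC=\"$loc\"" "$inv" | grep -q 'CRS="true"'
}
```

For example: `gi_home_has_crs_flag /g01/oraInventory/ContentsXML/inventory.xml /g01/11.2.0/grid || echo 'CRS="true" missing'`.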

Clearly the <HOME NAME entry is missing the CRS="true" flag, so the OUI installer concluded during its checks that GI was not installed on this node.

The fix is actually simple: just add CRS="true" and then restart runInstaller. There is no need for the more involved runInstaller -updateNodeList command described in the note.
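The manual edit can also be scripted. Since the file header says "Do not modify the contents of this file by hand", the sketch below keeps a timestamped backup and does nothing if the flag is already present. Assumptions: GNU sed (for `-i`), the one-line `<HOME ...>` format shown here, and a home path without a `#` character (used as the sed delimiter). `add_crs_flag` is a hypothetical name.

```bash
#!/bin/bash
# Append CRS="true" to the <HOME> tag whose LOC is $2, after backing up $1.
add_crs_flag() {
  local inv=$1 loc=$2
  # Already flagged: leave the file untouched.
  if grep -F "LOC=\"$loc\"" "$inv" | grep -q 'CRS="true"'; then
    return 0
  fi
  cp -p "$inv" "$inv.bak.$(date +%Y%m%d%H%M%S)" || return 1
  # On the matching line, replace the closing '>' (and any space before it).
  sed -i "\#LOC=\"$loc\"# s#[[:space:]]*>\$# CRS=\"true\">#" "$inv"
}
```

Run it as the inventory owner on every node whose inventory lacks the flag, then restart runInstaller.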

[grid@vrh1 ContentsXML]$ cat /g01/oraInventory/ContentsXML/inventory.xml 
<?xml version="1.0" standalone="yes" ?>
<!-- Copyright (c) 1999, 2010, Oracle. All rights reserved. -->
<!-- Do not modify the contents of this file by hand. -->
<INVENTORY>
<VERSION_INFO>
   <SAVED_WITH>11.2.0.2.0</SAVED_WITH>
   <MINIMUM_VER>2.1.0.6.0</MINIMUM_VER>
</VERSION_INFO>
<HOME_LIST>
<HOME NAME="Ora11g_gridinfrahome1" LOC="/g01/11.2.0/grid" TYPE="O" IDX="1" CRS="true">
   <NODE_LIST>
      <NODE NAME="vrh1"/>
      <NODE NAME="vrh2"/>
   </NODE_LIST>
</HOME>
</HOME_LIST>
</INVENTORY>

After the change above, the problem was resolved and the installer proceeded normally:
[Screenshot: 11gr2-RAC-Installing-db-step-4-10]

crsctl status resource -t -init in 11.2.0.2 grid infrastructure

In 11.2.0.2 Grid Infrastructure, crsctl stat res no longer shows low-level resources such as ora.cssd, ora.ctssd and ora.diskmon. To see the status of these resources, add the -init option:

[grid@rh2 ~]$ crsctl query crs activeversion
Oracle Clusterware active version on the cluster is [11.2.0.2.0]

[grid@rh2 ~]$ crsctl stat res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  ONLINE       rh2
ora.LISTENER.lsnr
               OFFLINE OFFLINE      rh2
ora.asm
               ONLINE  ONLINE       rh2
ora.gsd
               OFFLINE OFFLINE      rh2
ora.net1.network
               ONLINE  ONLINE       rh2
ora.ons
               ONLINE  ONLINE       rh2
ora.registry.acfs
               OFFLINE OFFLINE      rh2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        OFFLINE OFFLINE
ora.cvu
      1        OFFLINE OFFLINE
ora.dw.db
      1        OFFLINE OFFLINE
ora.maclean.db
      1        OFFLINE OFFLINE
ora.oc4j
      1        OFFLINE OFFLINE
ora.prod.db
      1        OFFLINE OFFLINE
      2        OFFLINE OFFLINE
ora.prod.maclean.svc
      1        OFFLINE OFFLINE
      2        OFFLINE OFFLINE
ora.prod.maclean_pre.svc
      1        OFFLINE OFFLINE
      2        OFFLINE OFFLINE
ora.prod.maclean_pre_preconnect.svc
      1        OFFLINE OFFLINE
ora.prod.maclean_taf.svc
      1        OFFLINE OFFLINE
      2        OFFLINE OFFLINE
ora.rh2.vip
      1        OFFLINE OFFLINE
ora.rh3.vip
      1        OFFLINE OFFLINE
ora.scan1.vip
      1        OFFLINE OFFLINE                                       

[grid@rh2 ~]$ crsctl stat res -t -init 
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       rh2                      Started
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       rh2
ora.crf
      1        ONLINE  ONLINE       rh2
ora.crsd
      1        ONLINE  ONLINE       rh2
ora.cssd
      1        ONLINE  ONLINE       rh2
ora.cssdmonitor
      1        ONLINE  ONLINE       rh2
ora.ctssd
      1        ONLINE  ONLINE       rh2                      OBSERVER
ora.diskmon
      1        ONLINE  ONLINE       rh2
ora.drivers.acfs
      1        ONLINE  OFFLINE
ora.evmd
      1        ONLINE  ONLINE       rh2
ora.gipcd
      1        ONLINE  ONLINE       rh2
ora.gpnpd
      1        ONLINE  ONLINE       rh2
ora.mdnsd
      1        ONLINE  ONLINE       rh2

Likewise, in 11.2.0.2 Grid Infrastructure you must add the -init option whenever you start, stop or modify these init resources; otherwise crsctl fails with CRS-2613: Could not find resource:

[grid@rh2 ~]$ crsctl stat res ora.asm
NAME=ora.asm
TYPE=ora.asm.type
TARGET=ONLINE
STATE=ONLINE on rh2

[grid@rh2 ~]$ crsctl modify res ora.asm -attr AUTO_START=never

[grid@rh2 ~]$ crsctl stat res ora.asm -p
NAME=ora.asm
TYPE=ora.asm.type
ACL=owner:grid:rwx,pgrp:oinstall:rwx,other::r--
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
AGENT_FILENAME=%CRS_HOME%/bin/oraagent%CRS_EXE_SUFFIX%
ALIAS_NAME=ora.%CRS_CSS_NODENAME%.ASM%CRS_CSS_NODENUMBER%.asm
AUTO_START=never
CHECK_INTERVAL=60
CHECK_TIMEOUT=30
DEFAULT_TEMPLATE=PROPERTY(RESOURCE_CLASS=asm) ELEMENT(INSTANCE_NAME= %GEN_USR_ORA_INST_NAME%)
DEGREE=1
DESCRIPTION=Oracle ASM resource
ENABLED=1
GEN_USR_ORA_INST_NAME=
GEN_USR_ORA_INST_NAME@SERVERNAME(rh2)=+ASM1
GEN_USR_ORA_INST_NAME@SERVERNAME(rh3)=+ASM2
LOAD=1
LOGGING_LEVEL=1
NLS_LANG=
NOT_RESTARTING_TEMPLATE=
OFFLINE_CHECK_INTERVAL=0
PROFILE_CHANGE_TEMPLATE=
RESTART_ATTEMPTS=5
SCRIPT_TIMEOUT=60
START_DEPENDENCIES=weak(ora.LISTENER.lsnr)
START_TIMEOUT=900
STATE_CHANGE_TEMPLATE=
STOP_DEPENDENCIES=
STOP_TIMEOUT=600
TYPE_VERSION=1.2
UPTIME_THRESHOLD=1d
USR_ORA_ENV=
USR_ORA_INST_NAME=+ASM%CRS_CSS_NODENUMBER%
USR_ORA_OPEN_MODE=mount
USR_ORA_OPI=false
USR_ORA_STOP_MODE=immediate
VERSION=11.2.0.2.0

[grid@rh2 ~]$ crsctl status resource  -init -t|grep -v ONLINE|tail -13
ora.asm
ora.cluster_interconnect.haip
ora.crf
ora.crsd
ora.cssd
ora.cssdmonitor
ora.ctssd
ora.diskmon
ora.drivers.acfs
ora.evmd
ora.gipcd
ora.gpnpd
ora.mdnsd

[grid@rh2 ~]$ crsctl status resource  -init -t|grep -v ONLINE|tail -13|xargs crsctl status resource
CRS-2613: Could not find resource 'ora.cluster_interconnect.haip'.
CRS-2613: Could not find resource 'ora.crf'.
CRS-2613: Could not find resource 'ora.crsd'.
CRS-2613: Could not find resource 'ora.cssd'.
CRS-2613: Could not find resource 'ora.cssdmonitor'.
CRS-2613: Could not find resource 'ora.ctssd'.
CRS-2613: Could not find resource 'ora.diskmon'.
CRS-2613: Could not find resource 'ora.drivers.acfs'.
CRS-2613: Could not find resource 'ora.evmd'.
CRS-2613: Could not find resource 'ora.gipcd'.
CRS-2613: Could not find resource 'ora.gpnpd'.
CRS-2613: Could not find resource 'ora.mdnsd'.
NAME=ora.asm
TYPE=ora.asm.type
TARGET=ONLINE
STATE=ONLINE on rh2
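A more robust variant of the pipeline above takes the resource names from `crsctl stat res -t -init` and queries each one with -init, which avoids the CRS-2613 errors. The awk filter keeps lines that start with `ora.`. That rule is an assumption based on the tabular layout shown in this article, not a documented output format. Both function names are hypothetical.

```bash
#!/bin/bash
# Extract resource names (lines starting with "ora.") from tabular crsctl output.
init_resource_names() {
  awk '/^ora\./ { print $1 }'
}

# Query every init resource WITH -init (requires crsctl in PATH).
show_init_resources() {
  local res
  crsctl stat res -t -init | init_resource_names | while read -r res; do
    crsctl stat res "$res" -init
  done
}
```

Unlike `grep -v ONLINE | tail -13`, this does not depend on the number of resources or on their state.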

[grid@rh2 ~]$ crsctl status res ora.crsd -init -p
NAME=ora.crsd
TYPE=ora.crs.type
ACL=owner:root:rw-,pgrp:oinstall:rw-,other::r--,user:grid:r-x
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
ACTIVE_PLACEMENT=0
AGENT_FILENAME=%CRS_HOME%/bin/orarootagent%CRS_EXE_SUFFIX%
AUTO_START=always
CARDINALITY=1
CHECK_ARGS=
CHECK_COMMAND=
CHECK_INTERVAL=30
CLEAN_ARGS=
CLEAN_COMMAND=
DAEMON_LOGGING_LEVELS=AGENT=1,AGFW=0,CLSFRAME=0,CLSVER=0,CLUCLS=0,COMMCRS=0,COMMNS=0,CRSAPP=0,CRSCCL=0,CRSCEVT=0,CRSCOMM=0,CRSD=0,CRSEVT=0,CRSMAIN=0,CRSOCR=0,CRSPE=0,CRSPLACE=0,CRSRES=0,CRSRPT=0,CRSRTI=0,CRSSE=0,CRSSEC=0,CRSTIMER=0,CRSUI=0,CSSCLNT=0,SuiteTes=1,UiServer=0,OCRAPI=1,OCRCLI=1,OCRSRV=1,OCRMAS=1,OCRMSG=1,OCRCAC=1,OCRRAW=1,OCRUTL=1,OCROSD=1,OCRASM=1
DAEMON_TRACING_LEVELS=AGENT=0,AGFW=0,CLSFRAME=0,CLSVER=0,CLUCLS=0,COMMCRS=0,COMMNS=0,CRSAPP=0,CRSCCL=0,CRSCEVT=0,CRSCOMM=0,CRSD=0,CRSEVT=0,CRSMAIN=0,CRSOCR=0,CRSPE=0,CRSPLACE=0,CRSRES=0,CRSRPT=0,CRSRTI=0,CRSSE=0,CRSSEC=0,CRSTIMER=0,CRSUI=0,CSSCLNT=0,SuiteTes=0,UiServer=0,OCRAPI=1,OCRCLI=1,OCRSRV=1,OCRMAS=1,OCRMSG=1,OCRCAC=1,OCRRAW=1,OCRUTL=1,OCROSD=1,OCRASM=1
DEFAULT_TEMPLATE=
DEGREE=1
DESCRIPTION="Resource type for CRSD"
DETACHED=true
ENABLED=1
FAILOVER_DELAY=0
FAILURE_INTERVAL=3
FAILURE_THRESHOLD=5
HOSTING_MEMBERS=
LOAD=1
LOGGING_LEVEL=1
NOT_RESTARTING_TEMPLATE=
OFFLINE_CHECK_INTERVAL=0
ORA_VERSION=11.2.0.2.0
PID_FILE=
PLACEMENT=balanced
PROCESS_TO_MONITOR=
PROFILE_CHANGE_TEMPLATE=
RESTART_ATTEMPTS=10
SCRIPT_TIMEOUT=60
SERVER_POOLS=
START_ARGS=
START_COMMAND=
START_DEPENDENCIES=hard(ora.asm,ora.cssd,ora.ctssd,ora.gipcd)pullup(ora.asm,ora.cssd,ora.ctssd,ora.gipcd)
START_TIMEOUT=600
STATE_CHANGE_TEMPLATE=
STOP_ARGS=
STOP_COMMAND=
STOP_DEPENDENCIES=hard(shutdown:ora.asm,intermediate:ora.cssd,intermediate:ora.gipcd)
STOP_MODE=NONE
STOP_TIMEOUT=43200
UPTIME_THRESHOLD=1m
USR_ORA_ENV=

[grid@rh2 ~]$ crsctl modify res ora.crsd -init -attr "SCRIPT_TIMEOUT"=65   
CRS-0245:  User doesn't have enough privilege to perform the operation
CRS-4000: Command Modify failed, or completed with errors.

/* Modifying the attributes of some resources requires root privileges */

[root@rh2 ~]# crsctl modify res ora.crsd -init -attr "SCRIPT_TIMEOUT"=65 

[root@rh2 ~]# crsctl status res ora.crsd -init -p|grep SCRIPT_TIMEOUT
SCRIPT_TIMEOUT=65
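To read back a single attribute after such a change, matching the exact `KEY=` prefix is safer than a substring grep. For example, a grep for a substring such as TIMEOUT would also hit CHECK_TIMEOUT, START_TIMEOUT and STOP_TIMEOUT. A minimal sketch. `res_attr` is a hypothetical helper that reads `crsctl stat res <name> [-init] -p` output on stdin.

```bash
#!/bin/bash
# Print the value of attribute $1 from "KEY=VALUE" lines on stdin.
res_attr() {
  local key=$1
  awk -v k="$key" 'index($0, k "=") == 1 { print substr($0, length(k) + 2); exit }'
}
```

For example: `crsctl status res ora.crsd -init -p | res_attr SCRIPT_TIMEOUT`.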

[root@rh2 ~]# crsctl status res ora.ctssd -p -init
NAME=ora.ctssd
TYPE=ora.ctss.type
ACL=owner:root:rw-,pgrp:oinstall:rw-,other::r--,user:grid:r-x
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
ACTIVE_PLACEMENT=0
AGENT_FILENAME=%CRS_HOME%/bin/orarootagent%CRS_EXE_SUFFIX%
AUTO_START=always
CARDINALITY=1
CHECK_ARGS=
CHECK_COMMAND=
CHECK_INTERVAL=30
CLEAN_ARGS=
CLEAN_COMMAND=
DAEMON_LOGGING_LEVELS=CLUCLS=0,CSSCLNT=0,CRSCCL=1,CTSS=5,OCRAPI=1,OCRCLI=1,OCRMSG=1
DAEMON_TRACING_LEVELS=CLUCLS=0,CSSCLNT=0,CRSCCL=1,CTSS=5,OCRAPI=1,OCRCLI=1,OCRMSG=1
DEFAULT_TEMPLATE=
DEGREE=1
DESCRIPTION="Resource type for Ctss Agents"
DETACHED=true
ENABLED=1
FAILOVER_DELAY=0
FAILURE_INTERVAL=3
FAILURE_THRESHOLD=5
HOSTING_MEMBERS=
LOAD=1
LOGGING_LEVEL=1
NOT_RESTARTING_TEMPLATE=
OFFLINE_CHECK_INTERVAL=0
ORA_VERSION=11.2.0.2.0
PID_FILE=
PLACEMENT=balanced
PROCESS_TO_MONITOR=
PROFILE_CHANGE_TEMPLATE=
RESTART_ATTEMPTS=5
SCRIPT_TIMEOUT=60
SERVER_POOLS=
START_ARGS=
START_COMMAND=
START_DEPENDENCIES=hard(ora.cssd,ora.gipcd)pullup(ora.cssd,ora.gipcd)
START_TIMEOUT=60
STATE_CHANGE_TEMPLATE=
STOP_ARGS=
STOP_COMMAND=
STOP_DEPENDENCIES=hard(ora.cssd,ora.gipcd)
STOP_TIMEOUT=60
UPTIME_THRESHOLD=1m
USR_ORA_ENV=

[root@rh2 ~]# crsctl status res ora.diskmon -p -init
NAME=ora.diskmon
TYPE=ora.diskmon.type
ACL=owner:root:rw-,pgrp:oinstall:rw-,other::r--,user:grid:r-x
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
ACTIVE_PLACEMENT=0
AGENT_FILENAME=%CRS_HOME%/bin/orarootagent%CRS_EXE_SUFFIX%
AUTO_START=never
CARDINALITY=1
CHECK_ARGS=
CHECK_COMMAND=
CHECK_INTERVAL=3
CHECK_TIMEOUT=30
CLEAN_ARGS=
CLEAN_COMMAND=
DAEMON_LOGGING_LEVELS=
DAEMON_TRACING_LEVELS=
DEFAULT_TEMPLATE=
DEGREE=1
DESCRIPTION="Resource type for Diskmon"
DETACHED=true
ENABLED=1
FAILOVER_DELAY=0
FAILURE_INTERVAL=3
FAILURE_THRESHOLD=5
HOSTING_MEMBERS=
LOAD=1
LOGGING_LEVEL=1
NOT_RESTARTING_TEMPLATE=
OFFLINE_CHECK_INTERVAL=0
ORA_VERSION=11.2.0.2.0
PID_FILE=
PLACEMENT=balanced
PROCESS_TO_MONITOR=
PROFILE_CHANGE_TEMPLATE=
RESTART_ATTEMPTS=10
SCRIPT_TIMEOUT=60
SERVER_POOLS=
START_ARGS=
START_COMMAND=
START_DEPENDENCIES=weak(concurrent:ora.cssd)pullup:always(ora.cssd)
START_TIMEOUT=600
STATE_CHANGE_TEMPLATE=
STOP_ARGS=
STOP_COMMAND=
STOP_DEPENDENCIES=
STOP_TIMEOUT=60
UPTIME_THRESHOLD=5s
USR_ORA_ENV=ORACLE_USER=grid
VERSION=11.2.0.2.0

[root@rh2 ~]# crsctl status res ora.cssd -init -p
NAME=ora.cssd
TYPE=ora.cssd.type
ACL=owner:root:rw-,pgrp:oinstall:rw-,other::r--,user:grid:r-x
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
ACTIVE_PLACEMENT=0
AGENT_FILENAME=%CRS_HOME%/bin/cssdagent%CRS_EXE_SUFFIX%
AGENT_HB_INTERVAL=0
AGENT_HB_MISCOUNT=10
AUTO_START=always
CARDINALITY=1
CHECK_ARGS=
CHECK_COMMAND=
CHECK_INTERVAL=30
CLEAN_ARGS=abort
CLEAN_COMMAND=
CSSD_MODE=
CSSD_PATH=%CRS_HOME%/bin/ocssd%CRS_EXE_SUFFIX%
CSS_USER=grid
DAEMON_LOGGING_LEVELS=CSSD=2,GIPCNM=2,GIPCGM=2,GIPCCM=2,CLSF=0,SKGFD=0,GPNP=1,OLR=0
DAEMON_TRACING_LEVELS=CSSD=0,GIPCNM=0,GIPCGM=0,GIPCCM=0,CLSF=0,SKGFD=0,GPNP=0,OLR=0
DEFAULT_TEMPLATE=
DEGREE=1
DESCRIPTION="Resource type for CSSD"
DETACHED=true
ENABLED=1
ENV_OPTS=
FAILOVER_DELAY=0
FAILURE_INTERVAL=3
FAILURE_THRESHOLD=5
HOSTING_MEMBERS=
LOAD=1
LOGGING_LEVEL=1
NOT_RESTARTING_TEMPLATE=
OFFLINE_CHECK_INTERVAL=0
OMON_INITRATE=1000
OMON_POLLRATE=500
ORA_OPROCD_MODE=
ORA_VERSION=11.2.0.2.0
PID_FILE=
PLACEMENT=balanced
PROCD_TIMEOUT=1000
PROCESS_TO_MONITOR=
PROFILE_CHANGE_TEMPLATE=
RESTART_ATTEMPTS=3
SCRIPT_TIMEOUT=600
SERVER_POOLS=
START_ARGS=
START_COMMAND=
START_DEPENDENCIES=weak(concurrent:ora.diskmon)hard(ora.cssdmonitor,ora.gpnpd,ora.gipcd)pullup(ora.gpnpd,ora.gipcd)
START_TIMEOUT=600
STATE_CHANGE_TEMPLATE=
STOP_ARGS=
STOP_COMMAND=
STOP_DEPENDENCIES=hard(intermediate:ora.gipcd,shutdown:ora.diskmon,intermediate:ora.cssdmonitor)
STOP_TIMEOUT=900
UPTIME_THRESHOLD=1m
USR_ORA_ENV=
VMON_INITLIMIT=16
VMON_INITRATE=500
VMON_POLLRATE=500
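
The START_DEPENDENCIES and STOP_DEPENDENCIES values above use a compact notation. Each clause consists of a dependency type (hard, weak, pullup, optionally qualified as in pullup:always) followed by a parenthesized list of resources. Each resource can carry a modifier prefix such as concurrent:, shutdown: or intermediate:. Below is a minimal Python sketch that splits such a string into its parts. The grammar is inferred from the examples in these profiles, not from an Oracle specification, so other forms may not parse. The names `Dependency` and `parse_dependencies` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class Dependency(NamedTuple):
    kind: str                     # hard, weak, pullup, ...
    kind_modifier: Optional[str]  # e.g. "always" in pullup:always(...)
    item_modifier: Optional[str]  # e.g. "concurrent", "shutdown", "intermediate"
    resource: str                 # e.g. "ora.cssd"


# One clause: kind[:modifier](item[,item...])
_CLAUSE = re.compile(r"([a-z_]+)(?::([a-z_]+))?\(([^)]*)\)")


def parse_dependencies(expr: str) -> list[Dependency]:
    deps: list[Dependency] = []
    for kind, kind_mod, body in _CLAUSE.findall(expr):
        for item in body.split(","):
            item = item.strip()
            if not item:
                continue
            # "concurrent:ora.diskmon" -> ("concurrent", "ora.diskmon");
            # "ora.cssd" -> ("", "ora.cssd")
            item_mod, _, resource = item.rpartition(":")
            deps.append(Dependency(kind, kind_mod or None, item_mod or None, resource))
    return deps


# START_DEPENDENCIES of ora.cssd, copied from the profile above.
cssd = "weak(concurrent:ora.diskmon)hard(ora.cssdmonitor,ora.gpnpd,ora.gipcd)pullup(ora.gpnpd,ora.gipcd)"
for dep in parse_dependencies(cssd):
    print(dep)
```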

A few questions about Oracle 11g ASM

Question:

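1. In 11g, the OCR and voting disks required by Oracle Clusterware can be stored in ASM, on a cluster file system, or on NFS. Raw devices are no longer supported for new installations. Is there any way to use raw devices anyway?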

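2. When using ASM, can the disks still be used if their disk path names on the storage change (are not consistent before and after)? What adjustments are needed?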

Answer:

1. A fresh 11gR2 Grid Infrastructure installation cannot use raw devices (You cannot install OCR or voting disk files on raw partitions. You can install only on Oracle ASM, or on supported network-attached storage or cluster file systems. The only use for raw devices is as Oracle ASM disks.). However, the OCR and voting disks can be moved to raw devices after installation, for example:

Replace the OCR:
ocrconfig -add rawdevice
ocrconfig -replace

Replace the voting disk:
crsctl add votedisk css  -force
crsctl delete votedisk css  -force

For details, see the Metalink note <How to ADD/REMOVE/REPLACE/MOVE Oracle Cluster Registry (OCR) and Voting Disk>.
In practice this is strongly discouraged, because if problems arise, Oracle GCS may decline to provide advice.

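2. ASM learns what a disk contains by reading its disk header. Disk path names must be consistent across all nodes at installation time. After installation, a change of path name does not affect ASM, as long as the new paths are still matched by the ASM_DISKSTRING discovery string.
Note that on AIX, if an hdisk LUN is given to ASM directly as an ASM disk, that hdisk must not have a PVID (If the disk device has a PVID, then ASM will fail to mount the diskgroup created on the disk device.). If raw logical volumes are used instead, the volume group they belong to should be a scalable volume group.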
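
ASM identifies its disks from their headers, so a path rename matters only for discovery: the renamed devices still have to be matched by the instance's ASM_DISKSTRING parameter. Below is a minimal Python sketch of that check. The device paths are hypothetical, and fnmatch is used only as an approximation of ASM's wildcard matching, which may differ in detail.

```python
from fnmatch import fnmatch


def discovered(paths: list[str], diskstrings: list[str]) -> list[str]:
    """Return the device paths matched by any discovery pattern.

    fnmatch approximates ASM's shell-style wildcard matching here; it is
    not ASM's actual discovery code.
    """
    return [p for p in paths if any(fnmatch(p, pat) for pat in diskstrings)]


# Hypothetical example: after maintenance the LUNs reappear under new names.
new_paths = ["/dev/mapper/asmdata01", "/dev/mapper/asmdata02", "/dev/sda1"]
print(discovered(new_paths, ["/dev/raw/raw*"]))          # old string: []
print(discovered(new_paths, ["/dev/mapper/asmdata*"]))   # adjusted string
```

If the old pattern no longer matches anything, adjusting ASM_DISKSTRING (rather than the disks themselves) is the usual fix, because the header contents are unchanged.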

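More about diskmon in 11gR2

The figure below shows the dependencies among all the resources/daemons managed by ohasd:

[Figure: dependency graph of the resources/daemons managed by ohasd]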
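
The hard START_DEPENDENCIES in the ora.ctssd and ora.cssd profiles shown in full above already determine part of this dependency graph, and a topological sort turns them into a valid start order. Below is a minimal sketch using Python's standard graphlib (3.9+). It only covers those two profiles and ignores weak dependencies, because ora.cssd and ora.diskmon reference each other weakly and would form a cycle. It illustrates dependency ordering and is not a description of how ohasd actually schedules startup.

```python
from graphlib import TopologicalSorter

# resource -> resources it has a hard START dependency on (its predecessors),
# copied from the ora.ctssd and ora.cssd profiles above.
HARD_START_DEPS = {
    "ora.ctssd": {"ora.cssd", "ora.gipcd"},
    "ora.cssd": {"ora.cssdmonitor", "ora.gpnpd", "ora.gipcd"},
}

# static_order() yields every node after all of its predecessors.
order = list(TopologicalSorter(HARD_START_DEPS).static_order())
print(order)
```

Resources with no dependency between them (for example ora.gpnpd and ora.gipcd) may appear in either order. Only the relative order along each edge is guaranteed.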

 

 

Diskmon

 

Master diskmon

• Monitors CELLSRVs and Network using Heartbeats
• Propagates Cell/Network state to ASM/RDBMS processes (dskm)
• Maintains a cluster-wide global view of Cells with other DISKMONs in the cluster
• Accepts fencing requests from CRS and delivers them to the Cells
• Accepts intradatabase IORM plans from RDBMS and sends them to the Cells
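• Provides communication with the cells

The diskmon daemon monitors the cell servers in Exadata, so it is only useful in an Exadata environment. However, in versions 11.2.0.1 through 11.2.0.2 the daemon is started by default even in non-Exadata environments. Version 11.2.0.3 changes this: diskmon is no longer started in non-Exadata environments.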

11.2.0.3 Grid Infrastructure diskmon Will be Offline by Default in Non-Exadata Environment

What is being announced?

As the Grid Infrastructure daemon diskmon.bin is used for Exadata fencing, starting from 11.2.0.3 the resource ora.diskmon will be offline in non-Exadata environments. This is an expected behaviour change.

Prior to 11.2.0.3:

ps -ef| grep diskmon.bin
grid      3361  3166  0 22:57 ?        00:00:00 /ocw/grid/bin/diskmon.bin -d -f

On 11.2.0.3:

ps -ef| grep diskmon.bin

>> no more diskmon.bin
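
The `ps -ef | grep diskmon.bin` check above also lists the grep process itself. Below is a minimal Python sketch of the same check that reads /proc directly and matches only argv[0]. It assumes a Linux /proc layout, and the function names are hypothetical.

```python
import os


def is_diskmon(cmdline: bytes) -> bool:
    """True if a NUL-separated /proc/<pid>/cmdline belongs to diskmon.bin.

    Only argv[0] is checked, so a `grep diskmon.bin` process is not counted.
    """
    argv0 = cmdline.split(b"\0", 1)[0]
    return os.path.basename(argv0) == b"diskmon.bin"


def diskmon_pids(proc_root: str = "/proc") -> list[int]:
    """PIDs of running diskmon.bin processes (Linux /proc layout assumed)."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip /proc/self, /proc/meminfo, ...
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                cmdline = f.read()
        except OSError:
            continue  # process exited or is not readable
        if is_diskmon(cmdline):
            pids.append(int(entry))
    return sorted(pids)


if __name__ == "__main__":
    pids = diskmon_pids()
    print(f"diskmon.bin running: {pids}" if pids else "no diskmon.bin running")
```

On 11.2.0.3 in a non-Exadata environment this is expected to report no diskmon.bin, which matches the behaviour change above.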

 

 

Some diskmon-related log excerpts:

[ CSSD]2009-07-27 10:27:36.419 [20] >TRACE: kgzf_dskm_conn4: unable to connect to master diskmon in 60174 msec
[ CSSD]2009-07-27 10:27:36.419 [20] >TRACE: kgzf_send_main1: connection to master diskmon timed out
[ CSSD]2009-07-27 10:27:36.421 [22] >TRACE: KGZF: Fatal diskmon condition, IO fencing is not available. For additional error info look at the master diskmon log file (diskmon.log)
[ CSSD]2009-07-27 10:27:36.421 [22] >ERROR: ASSERT clsssc.c 2471
[ CSSD]2009-07-27 10:27:36.421 [22] >ERROR: clssscSAGEInitFenceCompl: Fence completion failed, rc 56859

It appears that diskmon, the new process registered with Oracle Clusterware, is not able to communicate properly.

setsid: failed with -1/1
dskm_getenv_oracle_user: calling getpwnam_r for user oracle
dskm_getenv_oracle_user: info for user oracle complete
dskm_set_user: unable to change ownership for the log directory /optware/oracle/11.1.0.7/crs/log/shplab01/diskmon to user oracle, id 1101, errno 1
07/27/09 10:27:37: Master Diskmon starting

The tusc trace of the cssd process shows the following:
...
1248953770.528145 [/optware/ora][20944]{2992772}
unlink("/var/spool/sockets/pwgr/client20944") ERR#2 ENOENT
1248953770.612485 [/optware/ora][20944]{2992772}
unlink("/tmp/.oracle_master_diskmon") ERR#1 EPERM
1248953770.649479 [/optware/ora][20944]{2992772}
unlink("/tmp/.oracle_master_diskmon") ERR#1 EPERM
1248953770.656719 [/optware/ora][20944]{2992772}
unlink("/var/spool/sockets/pwgr/client20944") ERR#1 EPERM
...

There is a permission error on the file /tmp/.oracle_master_diskmon.
Solution

The resolution is to change the ownership of /tmp/.oracle_master_diskmon so that it is owned by oracle.
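
Before changing anything with chown, it helps to confirm who currently owns the file. Below is a minimal Python sketch that reports the owner and mode of a path. The expected user oracle comes from this example (the dskm_set_user messages above), and the function names are hypothetical.

```python
import os
import pwd


def owner_name(uid: int) -> str:
    """User name for a uid, or the numeric uid if it has no passwd entry."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def check_owner(path: str, expected_user: str) -> str:
    """Describe whether `path` exists and is owned by `expected_user`."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return f"{path}: not present"
    owner = owner_name(st.st_uid)
    if owner != expected_user:
        return f"{path}: owned by {owner}, expected {expected_user}"
    return f"{path}: OK (owner {owner}, mode {oct(st.st_mode & 0o777)})"


if __name__ == "__main__":
    print(check_owner("/tmp/.oracle_master_diskmon", "oracle"))
```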

diskmon.log
============
2011-12-01 22:14:49.510: [ DISKMON][14036:1093384512] SKGXP:[386927568.6]{0}:
(14036 -> 12265) SKGXP_CHECK_HEART_BEAT_RESP_EXPIRE: NO PATH to Monitor
entry: 0x17161490
2011-12-01 22:14:49.510: [ DISKMON][14036:1093384512] SKGXP:[386927568.7]{0}:
  Subnet: 0
2011-12-01 22:14:49.510: [ DISKMON][14036:1093384512] SKGXP:[386927568.8]{0}:
   Remote endpoint [192.168.10.3/44538] is DOWN
2011-12-01 22:14:49.510: [ DISKMON][14036:1093384512] SKGXP:[386927568.9]{0}:
   Local endpoint [192.168.10.1/45530] is UP
2011-12-01 22:14:49.510: [ DISKMON][14036:1093384512]
SKGXP:[386927568.10]{0}: SKGXP_DO_HEART_BEAT_RESP: Matching Monitor Entry Not
Found
2011-12-01 22:14:49.510: [ DISKMON][14036:1093384512]
SKGXP:[386927568.11]{0}:   SKGXPGPID Internet address 192.168.10.3 RDS port
number 44538
2011-12-01 22:14:49.510: [ DISKMON][14036:1093384512] dskm_hb_thrd_main11:
got status change
2011-12-01 22:14:49.510: [ DISKMON][14036:1093384512]
dskm_ant_rsc_monitor_start: rscnam: o/192.168.10.3 rsc: 0x171609c0 state:
UNREACHABLE reconn_attempts: 0 last_reconn_ts: 1322773921
2011-12-01 22:14:49.649: [ DISKMON][14036:1093384512]
dskm_node_guids_are_offline: query SM done. retcode = 56891(REACHABLE)
2011-12-01 22:14:49.657: [ DISKMON][14036:1093384512] dskm_oss_get_net_info5:
oss_get_net_info for device o/192.168.10.3 returned skgxpid
040302010001894cb5afca0419ed706ae92f000008000000000000000000000001030000c0a80a
03000000000000000000000000adfa00000000000016000000 and the following 1 ip
adresess. known_reid: Yes
2011-12-01 22:14:49.657: [ DISKMON][14036:1093384512]     192.168.10.1
2011-12-01 22:14:49.657: [ DISKMON][14036:1093384512]
dskm_ant_rsc_monitor_start6.5:Cell does support TCP monitor, and does support
SM Query, cell incarnation is 1, guid num is 2
2011-12-01 22:14:49.657: [ DISKMON][14036:1093384512] GUID-0 =
0x0021280001a0af15
2011-12-01 22:14:49.657: [ DISKMON][14036:1093384512] GUID-1 =
0x0021280001a0af16
2011-12-01 22:14:49.657: [ DISKMON][14036:1093384512]
dskm_ant_rsc_monitor_start2: Connected to Same_Inc OSS device: o/192.168.10.3
numIP: 1
2011-12-01 22:14:49.657: [ DISKMON][14036:1093384512]     192.168.10.1

2011-12-01 22:15:07.501: [ DISKMON][14036:1108523328] dskm_slave_thrd_main3:
peer disconnected
2011-12-01 22:15:07.501: [ DISKMON][14036:1108523328] dskm_slave_thrd_main5:
client +ASM1/ASM/15374 disconnected, reid
cid=14e1b2b4de58ef1eff5487b58dccc906,icin=188142332,nmn=1,lnid=188142332,gid=7
,gin=1,gmn=0,umemid=0,opid=8,opsn=1,lvl=process hdr=0x       0

2011-12-01 22:15:08.440: [ CSSCLNT]clsssRecvMsg: got a disconnect from the
server while waiting for message type 1
2011-12-01 22:15:08.440: [ CSSCLNT]clssgsGroupGetStatus:  communications
failed (0/3/-1)

2011-12-01 22:15:08.440: [ CSSCLNT]clssgsGroupGetStatus: returning 8

2011-12-01 22:15:08.440: [ DISKMON][14036:1102219584] CRITICAL: Diskmon
exiting: dskm_rac_thrd_main10: Diskmon is shutting down due to CSSD ABORT
event
2011-12-01 22:15:08.440: [ DISKMON][14036:1102219584] SHUTDOWN FORCE due to
CSSD ABORT
2011-12-01 22:15:08.440: [ DISKMON][14036:1102219584] dskm_rac_thrd_main:
exiting
2011-12-01 22:15:08.754: [ DISKMON][14036:1104320832] dskm_slave_thrd_main5:
client orarootagent/13701 disconnected, reid
cid=DUMMY,icin=-1,nmn=-1,lnid=-1,gid=-1,gin=-1,gmn=-1,umemid=-1,opid=-1,opsn=-
1,lvl=process hdr=0xfece0100
2011-12-01 22:15:09.988: [ DISKMON][14036:1191118288] dskm_cleanup_thrds:
cleaning up the rac event handling thread tid 1102219584
[ DISKMON][13016]

I/O Fencing and SKGXP HA monitoring daemon -- Version 1.2.0.0
Process 13016 started on 2011-12-01 at 22:15:39.863

2011-12-01 22:15:39.867: [ DISKMON][13016] dskm main: starting up
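
In this excerpt the key event is the SKGXP report that the remote endpoint 192.168.10.3 went DOWN while the local endpoint stayed UP. Below is a minimal Python sketch that extracts those endpoint states from diskmon.log text. The regular expression is inferred from the lines above and may need adjusting for other versions. Because it uses `\s+` between words, it also tolerates wrapped lines like the ones shown here.

```python
import re

# e.g. "Remote endpoint [192.168.10.3/44538] is DOWN"
_ENDPOINT = re.compile(
    r"(Remote|Local)\s+endpoint\s+\[([\d.]+)/(\d+)\]\s+is\s+(UP|DOWN)"
)


def endpoint_states(log_text: str) -> list[tuple[str, str, int, str]]:
    """Return (side, ip, port, state) for every SKGXP endpoint status line."""
    return [
        (side, ip, int(port), state)
        for side, ip, port, state in _ENDPOINT.findall(log_text)
    ]


excerpt = """\
2011-12-01 22:14:49.510: [ DISKMON][14036:1093384512] SKGXP:[386927568.8]{0}:
   Remote endpoint [192.168.10.3/44538] is DOWN
2011-12-01 22:14:49.510: [ DISKMON][14036:1093384512] SKGXP:[386927568.9]{0}:
   Local endpoint [192.168.10.1/45530] is UP
"""

for side, ip, port, state in endpoint_states(excerpt):
    print(f"{side:6} {ip}:{port} {state}")
```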

ocssd.log
==========
2011-12-01 22:15:04.223: [    CSSD][1127139648]clssgmmkLocalKillThread: Time
up. Timeout 60500 Start time 369099698 End time 369160198 Current time
369160198
2011-12-01 22:15:04.223: [    CSSD][1127139648]clssgmmkLocalKillResults:
Replying to kill request from remote node 2 kill id 1 Success map 0x00000000
Fail map 0x00000000
2011-12-01 22:15:04.224: [GIPCHAUP][1094015296] gipchaUpperProcessDisconnect:
processing DISCONNECT for hendp 0x2aa5550 [00000000000092e5] { gipchaEndpoint
: port 'nm2_gts-cluster/af9c-724c-2e3f-3946', peer
'gts1db02:205f-3cac-025e-c962', srcCid 00000000-000092e5,  dstCid
00000000-000009d9, numSend 0, maxSend 100, usrFlags 0x4000, flags 0x204 }
2011-12-01 22:15:04.224: [    CSSD][1122408768]clssnmeventhndlr:
Disconnecting endp 0x932d ninf 0x1c3a2c0
2011-12-01 22:15:04.224: [    CSSD][1122408768]clssnmDiscHelper: gts1db02,
node(2) connection failed, endp (0x932d), probe(0x3000000000), ninf->endp
0x932d
2011-12-01 22:15:04.224: [    CSSD][1122408768]clssnmDiscHelper: node 2 clean
up, endp (0x932d), init state 3, cur state 3
2011-12-01 22:15:04.224: [GIPCXCPT][1122408768] gipcInternalDissociate: obj
0x2e99290 [000000000000932d] { gipcEndpoint : localAddr
'gipcha://gts1db01:nm2_gts-cluster/af9c-724c-2e3f-394', remoteAddr
'gipcha://gts1db02:205f-3cac-025e-c96', numPend 0, numReady 0, numDone 0,
numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, flags 0x13860e, usrFlags
0x0 } not associated with any container, ret gipcretFail (1)
2011-12-01 22:15:04.224: [GIPCXCPT][1122408768] gipcDissociateF
[clssnmDiscHelper : clssnm.c : 3284]: EXCEPTION[ ret gipcretFail (1) ]  
failed to dissociate obj 0x2e99290 [000000000000932d] { gipcEndpoint :
localAddr 'gipcha://gts1db01:nm2_gts-cluster/af9c-724c-2e3f-394', remoteAddr
'gipcha://gts1db02:205f-3cac-025e-c96', numPend 0, numReady 0, numDone 0,
numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, flags 0x13860e, usrFlags
0x0 }, flags 0x0
2011-12-01 22:15:04.224: [GIPCXCPT][1122408768] gipcInternalDissociate: obj
0x2e99290 [000000000000932d] { gipcEndpoint : localAddr
'gipcha://gts1db01:nm2_gts-cluster/af9c-724c-2e3f-394', remoteAddr
'gipcha://gts1db02:205f-3cac-025e-c96', numPend 0, numReady 0, numDone 0,
numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, flags 0x13860e, usrFlags
0x0 } not associated with any container, ret gipcretFail (1)
2011-12-01 22:15:04.224: [GIPCXCPT][1122408768] gipcDissociateF
[clssnmDiscHelper : clssnm.c : 3430]: EXCEPTION[ ret gipcretFail (1) ]  
failed to dissociate obj 0x2e99290 [000000000000932d] { gipcEndpoint :
localAddr 'gipcha://gts1db01:nm2_gts-cluster/af9c-724c-2e3f-394', remoteAddr
'gipcha://gts1db02:205f-3cac-025e-c96', numPend 0, numReady 0, numDone 0,
numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, flags 0x13860e, usrFlags
0x0 }, flags 0x0
2011-12-01 22:15:04.224: [    CSSD][1122408768]clssnmDiscEndp: gipcDestroy
0x932d
2011-12-01 22:15:04.603: [    
CSSD][1104976192](:CSSNM00005:)clssnmvDiskKillCheck: Aborting, evicted by
node gts1db02, number 2, sync 188142334, stamp 393990918
2011-12-01 22:15:04.603: [    
CSSD][1104976192]###################################
2011-12-01 22:15:04.603: [    CSSD][1104976192]clssscExit: CSSD aborting from
thread clssnmvKillBlockThread
2011-12-01 22:15:04.603: [    
CSSD][1104976192]###################################
2011-12-01 22:15:04.603: [    CSSD][1104976192](:CSSSC00012:)clssscExit: A
fatal error occurred and the CSS daemon is terminating abnormally
gts1db01, number 1, has experienced a failure in thread number 10 and is
shutting down
2011-12-01 22:15:04.603: [    CSSD][1104976192]clssscExit: Starting CRSD
cleanup

2011-12-01 22:15:04.737: [    CSSD][1103399232]clssgmDiscEndpcl: gipcDestroy
0xa2ea2f7
2011-12-01 22:15:04.925: [    
CSSD][1112947008](:CSSNM00058:)clssnmvDiskCheck: No I/O completions for
3472942430 ms for voting file o/192.168.10.5/DBFS_DG_CD_04_gts1cel03)
2011-12-01 22:15:04.925: [    CSSD][1112947008]clssnmvDiskAvailabilityChange:
voting file o/192.168.10.5/DBFS_DG_CD_04_gts1cel03 now offline
2011-12-01 22:15:04.925: [    
CSSD][1112947008](:CSSNM00058:)clssnmvDiskCheck: No I/O completions for
3472942450 ms for voting file o/192.168.10.4/DBFS_DG_CD_02_gts1cel02)
2011-12-01 22:15:04.925: [    CSSD][1112947008]clssnmvDiskAvailabilityChange:
voting file o/192.168.10.4/DBFS_DG_CD_02_gts1cel02 now offline
2011-12-01 22:15:04.925: [    
CSSD][1112947008](:CSSNM00058:)clssnmvDiskCheck: No I/O completions for
3472942480 ms for voting file o/192.168.10.3/DBFS_DG_CD_02_gts1cel01)
2011-12-01 22:15:04.926: [    CSSD][1112947008]clssnmvDiskAvailabilityChange:
voting file o/192.168.10.3/DBFS_DG_CD_02_gts1cel01 now offline
2011-12-01 22:15:04.926: [    
CSSD][1112947008](:CSSNM00018:)clssnmvDiskCheck: Aborting, 0 of 3 configured
voting disks available, need 2
2011-12-01 22:15:04.926: [    CSSD][1112947008]clssscExit: abort already set
1
2011-12-01 22:15:04.926: [   SKGFD][1109793088]Lib :OSS:: closing handle
0x2538e70 for disk :o/192.168.10.5/DBFS_DG_CD_04_gts1cel03:

2011-12-01 22:15:04.926: [   SKGFD][1098676544]Lib :OSS:: closing handle
0x2aaaac0d7cb0 for disk :o/192.168.10.3/DBFS_DG_CD_02_gts1cel01:
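
The abort message "0 of 3 configured voting disks available, need 2" follows from the majority rule for voting files: a node must be able to access more than half of the configured voting disks. Below is a minimal Python sketch of that arithmetic. The function names are hypothetical.

```python
def votes_needed(configured: int) -> int:
    """Voting disks that must remain accessible: a strict majority."""
    return configured // 2 + 1


def css_can_continue(configured: int, available: int) -> bool:
    """True if enough voting disks are reachable for CSSD to keep running."""
    return available >= votes_needed(configured)


print(votes_needed(3))           # matches "need 2" in the log above
print(css_can_continue(3, 0))    # the situation in the log: CSSD aborts
```

This is also why an odd number of voting disks is normally configured: going from 3 to 4 disks raises the required count from 2 to 3 without tolerating any additional failure.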

Heartbeat timeout logic may fail to detect dead cells if diskmon has been running for over 40 days.

Rediscovery Notes:
 If diskmon has been running for over 40 days and DB processes start to hang
 after a cell death, you may have hit this bug.
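
The note above ties this problem to diskmon uptime. diskmon.log prints a start banner each time the daemon starts, for example "Process 13016 started on 2011-12-01 at 22:15:39.863" in the excerpt above. Below is a minimal Python sketch that estimates the current diskmon's uptime from the last such banner and compares it with the 40-day threshold. The banner format is inferred from that one excerpt, the sketch assumes the log has not been rotated since the last start, and the function name is hypothetical.

```python
import re
from datetime import datetime, timedelta

# Banner format taken from the diskmon.log excerpt above.
_BANNER = re.compile(
    r"Process (\d+) started on (\d{4}-\d{2}-\d{2}) at (\d{2}:\d{2}:\d{2}\.\d+)"
)

FORTY_DAYS = timedelta(days=40)


def diskmon_uptime(log_text: str, now: datetime) -> timedelta:
    """Time elapsed since the most recent diskmon start banner in log_text."""
    banners = _BANNER.findall(log_text)
    if not banners:
        raise ValueError("no diskmon start banner found")
    _pid, day, clock = banners[-1]  # last banner = latest diskmon start
    started = datetime.strptime(f"{day} {clock}", "%Y-%m-%d %H:%M:%S.%f")
    return now - started


log = "Process 13016 started on 2011-12-01 at 22:15:39.863"
uptime = diskmon_uptime(log, now=datetime(2012, 1, 15))
print(f"diskmon up {uptime.days} days; over 40 days: {uptime > FORTY_DAYS}")
```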

All nodes may hang because one of the heartbeat threads in diskmon gets stuck trying to notify the instance(s) that it has reconnected to the cell. However, if this occurs, insufficient diagnostic data is collected to help confirm why the hang occurred. This fix is a diagnostic enhancement for this scenario.

If the diskmon/DSKM processes are hung or stuck, this fix may help collect additional useful diagnostics.

PROBLEM:
--------
Diskmon logs fill up quickly, causing the disk/volume that holds them (/u00) to become full.

DIAGNOSTIC ANALYSIS:
--------------------
1) The following messages are logged consistently and repeatedly:

2010-12-17 03:22:25.848: [ DISKMON][17796:1089268032] dskm_rac_ini13: calling clssgsqgrp
2010-12-17 03:22:25.849: [ DISKMON][17796:1089268032] dskm_rac_ini80: called clssgsqgrp:
2010-12-17 03:22:25.849: [ DISKMON][17796:1089268032] dskm_dump_group_priv: vers: 0 flags: 0x0 confInc: 0 My confInc: 0
2010-12-17 03:22:25.849: [ DISKMON][17796:1089268032] dskm_dump_group_priv: CSS Msg Hdr: vers: 0 type: UNKNOWN (0) chunks: NO MORE CHUNKS (0) transport: UNKNOWN (0) mSize: 0
2010-12-17 03:22:25.849: [ DISKMON][17796:1089268032] dskm_dump_group_priv: Group Private Data is not of type DSKM_MSG_SS_REQ. Not proceeding with msg dump
2010-12-17 03:22:25.849: [ DISKMON][17796:1089268032] dskm_rac_ini15: Found my member number 1 to be busy. Waiting (attempts: 598) for OCSSD to clean up previous incarnation

2) Core files are generated, and the many diskmon stack dumps enlarge diskmon.log further.

   The following is frequently seen in diskmon.log:

2010-12-17 03:22:28.855: [ DISKMON][17796:1089268032] dskm_rac_ini16: OCSSD has notified that another diskmon is currently running in this node. This might be a duplicate startup. If not consult OCSSD log for additional information.
2010-12-17 03:22:28.855: [ DISKMON][17796] INCIDENT : EXCEPTION (SIGNAL: 6) in [gsignal()]
2010-12-17 03:22:28.855: [ DISKMON][17796] Thread 1089268032 got exception 6
2010-12-17 03:22:28.855: [ DISKMON][17796] Stack dump for thread 1089268032
[ DISKMON][17796]
....
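
When diskmon.log grows this fast, two numbers quickly show whether you are seeing this loop: which dskm_* routines dominate the log, and how high the "Waiting (attempts: N)" counter has climbed. Below is a minimal Python sketch that computes both. The regular expressions are inferred from the excerpts above and may need adjusting for other diskmon versions, and the function name is hypothetical.

```python
import re
from collections import Counter

# A message tag such as dskm_rac_ini15 follows the "[ DISKMON][pid:tid]"
# prefix; \s+ also tolerates a line break after the prefix.
_TAG = re.compile(r"\[ DISKMON\]\[[\d:]+\]\s+(dskm_\w+):")
_ATTEMPTS = re.compile(r"Waiting \(attempts: (\d+)\)")


def summarize(log_text: str) -> tuple[Counter, int]:
    """Count dskm_* message tags and return the highest retry counter seen."""
    tags = Counter(_TAG.findall(log_text))
    attempts = [int(n) for n in _ATTEMPTS.findall(log_text)]
    return tags, max(attempts, default=0)


excerpt = """\
2010-12-17 03:22:25.848: [ DISKMON][17796:1089268032] dskm_rac_ini13: calling clssgsqgrp
2010-12-17 03:22:25.849: [ DISKMON][17796:1089268032] dskm_rac_ini80: called clssgsqgrp:
2010-12-17 03:22:25.849: [ DISKMON][17796:1089268032] dskm_dump_group_priv: vers: 0 flags: 0x0 confInc: 0 My confInc: 0
2010-12-17 03:22:25.849: [ DISKMON][17796:1089268032] dskm_rac_ini15: Found my member number 1 to be busy. Waiting (attempts: 598) for OCSSD to clean up previous incarnation
"""

tags, worst = summarize(excerpt)
print(tags.most_common(3), "max attempts:", worst)
```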

Upgrading to RAC 11g R2: What You Should Know

How to upgrade?
Recommended: “Swing Kit”
• Set up a new cluster on separate hardware
• Install Grid Infrastructure at your leisure
• Avoid time pressure
• Create a physical standby on the new cluster
• At a pre-defined outage window, perform a switchover
• Remember: you don’t need to upgrade the database at the same time as you upgrade to Grid Infrastructure
• The trick is to keep other standby databases in sync

Other upgrade options
If you don’t have Swing Kit…
• Upgrade the existing software stack
• Grid Infrastructure is an out-of-place upgrade
• New Oracle home
• Pre-11.2 Clusterware home and ASM home made redundant after successful upgrade
• Out-of-place upgrade should make downgrade easier
• Have not tested downgrade personally
• Usually only one in the audience did
