Starting with 10.2.0.4, the VIP no longer automatically relocates back to the original node

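Starting with 10.2.0.4, the VIP no longer automatically relocates back to its original node. Oracle developers found that the following can happen in real-world use: relocating back requires stopping the VIP and starting it again on the original node, but if the public network on the original node is still unavailable, the relocation attempt fails again and the VIP fails over to the second node. The VIP is unavailable during this time, so starting from 10.2.0.4 and 11.1, the default instance check no longer automatically relocates the VIP to the original node.

See the Note below for details:
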
Applies to:
Oracle Server – Enterprise Edition – Version 10.2.0.4 to 11.1.0.7 [Release 10.2 to 11.1]
Information in this document applies to any platform.
Symptoms

Starting from 10.2.0.4 and 11.1, the VIP does not fail back to the original node even after the public network problem is resolved. This is the default behavior in 10.2.0.4 and 11.1, and it differs from that of 10.2.0.3.
Cause

This is the default behavior in 10.2.0.4 and 11.1.

In 10.2.0.3, on every instance check, the instance attempted to relocate the VIP back to the preferred node (original node), but that required stopping the VIP and then attempting to restart it on the original node. If the public network on the original node is still down, the attempt to relocate the VIP to the original node fails and the VIP fails over back to the secondary node. During this time the VIP is not available, so starting from 10.2.0.4 and 11.1, the default behavior is that the instance check does not attempt to relocate the VIP back to the original node.
Solution

If the default behavior of 10.2.0.4 and 11.1 is not desired and the VIP needs to relocate back to the original node automatically once the public network problem is resolved, use the following workaround.
Uncomment the line
ORA_RACG_VIP_FAILBACK=1 && export ORA_RACG_VIP_FAILBACK

in the racgwrap script in $ORACLE_HOME/bin
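
The uncommenting step could be scripted roughly as follows. This is a minimal sketch and not part of the Note: the function name enable_vip_failback is hypothetical, and it assumes the line is present in racgwrap commented out with a leading '#', worded exactly as quoted above. It keeps a .bak copy of the file before rewriting it.

```shell
#!/bin/sh
# Hypothetical helper (not from the Note): uncomment the
# ORA_RACG_VIP_FAILBACK line in the given racgwrap file.
# Assumption: the line is commented out with a leading '#'.
enable_vip_failback() {
    file=$1
    [ -f "$file" ] || { echo "not found: $file" >&2; return 1; }

    # Keep a backup copy, then rewrite the original in place so that
    # its permissions are preserved.
    cp -p "$file" "$file.bak" || return 1
    sed 's/^[[:space:]]*#[[:space:]]*\(ORA_RACG_VIP_FAILBACK=1 *&& *export  *ORA_RACG_VIP_FAILBACK\)/\1/' \
        "$file.bak" > "$file"
}
```

It would be called as enable_vip_failback "$ORACLE_HOME/bin/racgwrap" on each node. If the line is not found in the assumed form, the file is left unchanged apart from the backup copy.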

With the above workaround, VIP will relocate back to the original node when CRS performs the instance check, so in order for the VIP to relocate automatically, the node must have at least one instance running.

If the change is made on an existing cluster, either the instance or CRS must be restarted before the VIP starts relocating back to the original node automatically.

Relying on automatic relocation of the VIP can take up to 10 minutes because the instance check is performed once every 10 minutes. Manually relocating the VIP is the only way to guarantee quick relocation of the VIP back to the original node.
To manually relocate the VIP, start the nodeapps by issuing
srvctl start nodeapps -n <node name>

Starting the nodeapps does not harm the online resources such as ons and gsd.
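
To decide when a manual relocation is needed, it helps to see which VIPs are currently running away from their home node. The filter below is a rough sketch under two assumptions that the Note does not state: that crs_stat (without -t) prints NAME= and STATE= lines of the form "STATE=ONLINE on <host>", and that VIP resources are named ora.<nodename>.vip. The function name vips_away_from_home is hypothetical.

```shell
#!/bin/sh
# Hypothetical filter: reads crs_stat-style output on stdin and prints
# "<resource> <home node> <current host>" for every VIP that is online
# on a node other than the one in its resource name.
# Assumptions: NAME=/STATE= line format, VIPs named ora.<nodename>.vip.
vips_away_from_home() {
    awk -F= '
        $1 == "NAME"  { name = $2 }
        $1 == "STATE" && name ~ /^ora\..*\.vip$/ {
            home = name
            sub(/^ora\./, "", home)
            sub(/\.vip$/, "", home)
            n = split($2, words, " ")
            if (n == 3 && words[1] == "ONLINE" && words[2] == "on" && words[3] != home)
                print name, home, words[3]
        }
    '
}
```

Typical use would be crs_stat | vips_away_from_home, followed by srvctl start nodeapps -n <home node> for each line reported, once the public network on that node is confirmed healthy.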

Script: Collect VIP Resource Diagnostic Information

The following script can be used to collect diagnostic information for VIP resources or other CRS resources in Oracle RAC:

Action plan:
./runcluvfy.sh stage -post crsinst -n all  -verbose
./runcluvfy.sh stage -pre crsinst -n all  -verbose
or
cluvfy stage -post crsinst -n all -verbose
cluvfy stage -pre  crsinst -n all -verbose
1. Please upload the following logs and command output from all nodes:
$CRS_HOME/log/<nodename>/*.log
$CRS_HOME/log/<nodename>/crsd/*.log
$CRS_HOME/log/<nodename>/cssd/*.log
$CRS_HOME/log/<nodename>/racg/*.log
$CRS_HOME/log/<nodename>/client/*.log
$CRS_HOME/log/<nodename>/evmd/*.log
/etc/oracle/oprocd/*.log.* or /var/opt/oracle/oprocd/*.log.* (if present)
$ crs_stat -t
$ crsctl check crs
$ crsctl check boot
2. Please consult your sysadmin and make sure that the gateway is pingable at all times.
1- Test the gateway on every node.
Ask your sysadmin to set up a UNIX shell script, started from crontab, that pings the gateway of your public interface at a short interval (every 2 seconds, for example) and spools the result to /tmp/test_gw_<nodename>.log.
Upload the resulting ping log.
2- Increase the tracing level of the VIP resource
  as root user
  # cd $ORA_CRS_HOME/bin
  # crsctl debug log res <resname:level>
  # crsctl debug log res <vip resource name>:5
3- Restart the clusterware
4- Execute this test on both nodes at the same time
   $ script /tmp/testvip_<nodename>.log
   $ cd $ORA_CRS_HOME/bin
   $ hostname
   $ date
   $ cat /etc/hosts
   $ ifconfig -a
   $ oifcfg getif
   $ netstat -rn
   $ oifcfg iflist
   $ srvctl config nodeapps -n <nodename> -a -g -s -l               (repeat for all nodes)
   $ crs_stat -t
   $ exit
5- Reset the tracing level of the VIP resource
  as root user
  # cd $ORA_CRS_HOME/bin
  # crsctl debug log res <resname:level>
  # crsctl debug log res <vip resource name>:1
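
The gateway test in step 1 could look roughly like the sketch below. It is an illustration only: the function names and the PING_CMD variable are hypothetical, "ping -c 1" is the Linux form of a single probe and needs adjusting on other platforms, and the node name is assumed to equal the output of hostname. Each probe appends one timestamped OK or FAIL line to the log.

```shell
#!/bin/sh
# Command used for a single probe. "-c 1" is the Linux form (assumption);
# adjust it for AIX, Solaris or HP-UX.
PING_CMD=${PING_CMD:-"ping -c 1"}

# Probe the gateway once and append a timestamped result to the log file.
log_gateway_check() {
    gw=$1
    logfile=$2
    if $PING_CMD "$gw" >/dev/null 2>&1; then
        status=OK
    else
        status=FAIL
    fi
    echo "$(date '+%Y-%m-%d %H:%M:%S') $gw $status" >> "$logfile"
}

# Probe the gateway in an endless loop, every <interval> seconds (default 2).
# Assumption: <nodename> in the log file name is the output of hostname.
monitor_gateway() {
    gw=$1
    interval=${2:-2}
    logfile=/tmp/test_gw_$(hostname).log
    while :; do
        log_gateway_check "$gw" "$logfile"
        sleep "$interval"
    done
}
```

Because cron cannot schedule jobs more often than once a minute, a 2-second interval needs a loop like monitor_gateway, which crontab would only start (or restart) in the background, for example with nohup.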
Upon the next occurrence, please upload the following information from all nodes:
  a- /tmp/test_gw_<nodename>.log
  b- /tmp/testvip_<nodename>.log
  c- the crsd log
  d- the racg logs of the VIP resource
     $ORA_CRS_HOME/log/<nodename>/racg/vip*
  e- the racgvip script from
     $ORA_CRS_HOME/bin/racgvip
  f- RDA from all the nodes
     Note 314422.1 Remote Diagnostic Agent (RDA) 4.0 – Overview
  g- the O/S message file
     IBM: /bin/errpt -a > messages.out
     Linux: /var/log/messages
     Solaris: /var/adm/messages
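
Gathering the files from the list above into one archive per node could be sketched as follows. The helper name bundle_existing is hypothetical. It skips paths that do not exist on the node (for example, the message file of another platform) and fails if nothing is left to archive.

```shell
#!/bin/sh
# Hypothetical helper: bundle_existing <archive.tar> <path>...
# Adds only the paths that exist to the archive; missing ones are
# reported on stderr and skipped.
bundle_existing() {
    archive=$1
    shift
    count=$#
    i=0
    # Rotate through the arguments, keeping only the existing paths.
    while [ "$i" -lt "$count" ]; do
        f=$1
        shift
        if [ -e "$f" ]; then
            set -- "$@" "$f"
        else
            echo "skipping missing: $f" >&2
        fi
        i=$((i + 1))
    done
    [ "$#" -gt 0 ] || { echo "nothing to bundle" >&2; return 1; }
    tar -cf "$archive" "$@"
}
```

A call might look like bundle_existing /tmp/vip_diag_$(hostname).tar /tmp/test_gw_$(hostname).log /tmp/testvip_$(hostname).log $ORA_CRS_HOME/log/$(hostname)/racg/vip* $ORA_CRS_HOME/bin/racgvip /var/log/messages, again assuming that the node name matches hostname. The archive name is an arbitrary choice.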
3. CRS Diagnostics
See Note 330358.1 (CRS Diagnostic Collection Guide) and run the command below. All resulting .gz files, especially crsData_$HOST.tar.gz, need to be uploaded.
diagcollection.pl --collect
Please make sure to include *ALL* requested files from *ALL* nodes in a single zip and upload it (any missing file will delay or prevent identification of the root cause).
Note 330358.1 - CRS 10gR2/ 11gR1/ 11gR2 Diagnostic Collection Guide       
Note 298895.1 - Modifying the default gateway address used by the Oracle 10g VIP
Note 399213.1 - VIP Going Offline Intermittantly - Slow Response from Default Gateway
Note 401783.1 - Changes in Oracle Clusterware after applying 10.2.0.3 Patchset

How to Troubleshoot RAC VIP Problems

1.- Please provide the output of the following commands from each node:
srvctl config nodeapps -n <nodename> -a -g -s -l
ifconfig -a
cat /etc/hosts
2.- Please set debug mode for VIP resources and reproduce the problem. Please take note of the time of the test:
a.- As root user, issue the command :
crsctl debug log res "<ora.dbtest2.vip>:5"
(note: replace ora.dbtest2.vip with the name of each of your VIP resources)
b.- Take note of node, date, time
c.- Reproduce the problem
d.- You may turn off debugging with command :
crsctl debug log res "<ora.dbtest2.vip>:0"
3.- Set up OS Watcher as explained in the following note:
Note 301137.1: OS Watcher User Guide - upload output of OS Watcher
4.- Collect from each node:
a.- OS log files
/var/log/messages
b.- OS Watcher stats for the time of the test
c.- CRS log files:
From $ORA_CRS_HOME, run the following commands as root:
* $ script /tmp/diag.log
* $ env
* $ id
* $ cd $ORA_CRS_HOME/bin
o Execute diagcollection.pl, passing the CRS home as follows:
o export ORA_CRS_HOME=/u01/crs
o $ORA_CRS_HOME/bin/diagcollection.pl -crshome=$ORA_CRS_HOME --collect
This will create the crsData_<hostname>.tar.gz, ocrData_<hostname>.tar.gz,
oraData_<hostname>.tar.gz and basData_<hostname>.tar.gz. Additionally in 11gR2,
there will be os_<hostname>.tar.gz and ipd_
d.- If Vendor clusterware is not used then upload the oprocd logs.
They are in /var/opt/oracle/`hostname`/ on most platforms.
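
Raising and lowering the debug level in step 2 has to be repeated for every VIP resource, which a small wrapper can do. This is a sketch only: set_vip_debug and the DRY_RUN variable are hypothetical, and the resource names are passed without the angle brackets, on the assumption that the brackets in the commands above only mark a placeholder. By default the wrapper just prints the crsctl commands so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical wrapper: set_vip_debug <level> <resource>...
# With DRY_RUN=1 (the default) the crsctl commands are only printed;
# with DRY_RUN=0 they are executed, which must be done as root.
set_vip_debug() {
    level=$1
    shift
    for res in "$@"; do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "crsctl debug log res \"$res:$level\""
        else
            crsctl debug log res "$res:$level"
        fi
    done
}
```

For example, set_vip_debug 5 ora.dbtest1.vip ora.dbtest2.vip before reproducing the problem, and set_vip_debug 0 with the same names afterwards. ora.dbtest1.vip is an illustrative name modeled on the ora.dbtest2.vip example above.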
