Upgrade GI/CRS 11.1.0.7 to 11.2.0.2. Rootupgrade.sh Hanging

Upgrade grid 11.1.0.7 to 11.2.0.2. Rootupgrade.sh Hanging

We installed 11gR2 GI software and applied PSU2 patches upon getting runupgrade.sh prompt.runupgrade.sh hang on the first node.

[root@vrh8 client]# uname -a
Linux vrh8 2.6.18-238.5.1.el5 #1 SMP Mon Feb 21 05:52:39 EST 2011 x86_64

x86_64 x86_64 GNU/Linux
cluvfy passed with 2 ignorable errors:

[root@vrh8 vrh8]# cd /tmp
[root@vrh8 tmp]# df -lh .
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg0-tmp 992M 263M 679M 28% /tmp

[root@vrh8 grid]# grep fail cluvfy_during_inst.log
/tmp l118464lwap1049 /tmp 713MB 1GB failed
Result: Free disk space check failed for “l118464lwap1049:/tmp”
/tmp vrh8 /tmp 692.131MB 1GB failed
Result: Free disk space check failed for “vrh8:/tmp”
Result: Check for multiple users with UID value 0 failed

[root@vrh8 vrh8]# cd /tmp
[root@vrh8 tmp]# df -lh .
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg0-tmp 992M 263M 679M 28% /tmp

We installed 11gR2 GI software and applied PSU2 patches upon getting runupgrade.sh prompt.

runupgrade.sh hang on the first node. We followed “How to Proceed from Failed Upgrade to 11gR2

Grid Infrastructure on Linux/Unix [ID 969254.1]” 1A section, it didn’t help.

[root@vrh8 bin]# ./crsctl query crs activeversion
Oracle Clusterware active version on the cluster is [11.1.0.7.0]

rootupgrade.sh output:

[root@vrh8 11.2.0.2]# ./rootupgrade.sh
Running Oracle 11g root script…

The following environment variables are set as:
ORACLE_OWNER= oracrs
ORACLE_HOME= /d22/oracrs/11.2.0.2

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of “dbhome” have not changed. No need to overwrite.
The contents of “oraenv” have not changed. No need to overwrite.
The contents of “coraenv” have not changed. No need to overwrite.

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /d22/oracrs/11.2.0.2/crs/install/crsconfig_params
LOCAL ADD MODE
Creating OCR keys for user ‘root’, privgrp ‘root’..
Operation successful.
OLR initialization – successful
Adding daemon to inittab
ACFS-9200: Supported
ACFS-9300: ADVM/ACFS distribution files found.
ACFS-9312: Existing ADVM/ACFS installation detected.
ACFS-9314: Removing previous ADVM/ACFS installation.
ACFS-9315: Previous ADVM/ACFS components successfully removed.
ACFS-9307: Installing requested ADVM/ACFS software.
ACFS-9308: Loading installed ADVM/ACFS drivers.
ACFS-9321: Creating udev for ADVM/ACFS.
ACFS-9323: Creating module dependencies – this may take some time.
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9309: ADVM/ACFS installation correctness verified.

****hanging here for more than 2 hrs, so we cancelled it

INT at /d22/oracrs/11.2.0.2/crs/install/crsconfig_lib.pm line 1173.
/d22/oracrs/11.2.0.2/perl/bin/perl -I/d22/oracrs/11.2.0.2/perl/lib –

I/d22/oracrs/11.2.0.2/crs/install /d22/oracrs/11.2.0.2/crs/install/rootcrs.pl execution failed
Oracle root script execution aborted!

1. The below logs are required to analyze this issue.

NEW_GRID_HOME/cfgtoollogs/crsconfig/*.*
NEW_GRID_HOME/log/<nodename>/*.*

Please upload the logs under the above directories. Zip and upload the files including the subdirectories.

2. When the rootupgrade was handing, did you check the usage of /tmp. Was free space exhausting?

=== ODM Research ===

There has been multiple root script run for upgrade. I have taken the first incident from the file
rootcrs_vrh8.log:
—————————————–

2011-02-13 13:07:55: Successfully started requested Oracle stack daemons
2011-02-13 13:07:55: Upgrading the existing voting disks!
2011-02-13 13:07:55: Executing /d22/oracrs/11.2.0.2/bin/cssvfupgd
2011-02-13 13:07:55: Executing cmd: /d22/oracrs/11.2.0.2/bin/cssvfupgd <<<<<<<<<<<<<<< The root script seems to hang at this point.
2011-02-13 15:01:16: ###### Begin DIE Stack Trace ######
2011-02-13 15:01:16: Package File Line Calling
2011-02-13 15:01:16: ————— ——————– —- ———-
2011-02-13 15:01:16: 1: main rootcrs.pl 325 crsconfig_lib::dietrap
2011-02-13 15:01:16: 2: crsconfig_lib crsconfig_lib.pm 9301 main::__ANON__
2011-02-13 15:01:16: 3: crsconfig_lib crsconfig_lib.pm 9301 (eval)
2011-02-13 15:01:16: 4: crsconfig_lib crsconfig_lib.pm 9260 crsconfig_lib::system_cmd_capture1
2011-02-13 15:01:16: 5: crsconfig_lib crsconfig_lib.pm 9247 crsconfig_lib::system_cmd_capture
2011-02-13 15:01:16: 6: crsconfig_lib crsconfig_lib.pm 924 crsconfig_lib::system_cmd
2011-02-13 15:01:16: 7: oracss oracss.pm 275 crsconfig_lib::run_crs_cmd
2011-02-13 15:01:16: 8: crsconfig_lib crsconfig_lib.pm 1019 oracss::CSS_upgrade
2011-02-13 15:01:16: 9: crsconfig_lib crsconfig_lib.pm 1006 crsconfig_lib::start_cluster
2011-02-13 15:01:16: 10: main rootcrs.pl 697 crsconfig_lib::perform_start_cluster
2011-02-13 15:01:16: ####### End DIE Stack Trace #######

cssvfupgd.log:
——————–
Oracle Database 11g Clusterware Release 11.2.0.2.0 – Production Copyright 1996, 2010 Oracle. All rights reserved.
2011-02-13 13:07:55.356: [ OCRRAW][3605955376]prgval:buffer passed is too small
2011-02-13 13:07:55.361: [CSSVFUPG][3605955376]cssvfupgd_GetVFList: found voting file /s01/app/ocrvot/VOTEDISK/UAT2_vdisk1.dat
2011-02-13 13:07:55.365: [ OCRRAW][3605955376]prgval:buffer passed is too small
2011-02-13 13:07:55.369: [CSSVFUPG][3605955376]cssvfupgd_GetVFList: found voting file /s01/app/ocrvot/VOTEDISK/UAT2_vdisk2.dat
2011-02-13 13:07:55.373: [ OCRRAW][3605955376]prgval:buffer passed is too small
2011-02-13 13:07:55.377: [CSSVFUPG][3605955376]cssvfupgd_GetVFList: found voting file /s01/app/ocrvot/VOTEDISK/UAT2_vdisk3.dat
2011-02-13 13:07:55.402: [CSSVFUPG][3605955376]cssvfupgd_SetNum: Processing SYSTEM.css.misscount
2011-02-13 13:07:55.404: [CSSVFUPG][3605955376]cssvfupgd_SetNum: Processing SYSTEM.css.disktimeout
2011-02-13 13:07:55.406: [CSSVFUPG][3605955376]cssvfupgd_SetNum: Processing SYSTEM.css.reboottime
2011-02-13 13:07:55.408: [CSSVFUPG][3605955376]cssvfupgd_SetNum: Processing SYSTEM.css.diagwait
2011-02-13 13:07:55.414: [CSSVFUPG][3605955376]cssvfupgd_SetNum: Processing SYSTEM.css.pollinterval
2011-02-13 13:07:55.416: [CSSVFUPG][3605955376]cssvfupgd_GetGUID: Fetching GUID for /s01/app/ocrvot/VOTEDISK/UAT2_vdisk1.dat
2011-02-13 13:07:55.419: [ SKGFD][3605955376]NOTE: No asm libraries found in the system

2011-02-13 13:07:55.419: [ CLSF][3605955376]Allocated CLSF context
2011-02-13 13:07:55.419: [ SKGFD][3605955376]Discovery with str:/s01/app/ocrvot/VOTEDISK/UAT2_vdisk1.dat:

2011-02-13 13:07:55.419: [ SKGFD][3605955376]UFS discovery with :/s01/app/ocrvot/VOTEDISK/UAT2_vdisk1.dat:

2011-02-13 13:07:55.420: [ SKGFD][3605955376]Fetching UFS disk :/s01/app/ocrvot/VOTEDISK/UAT2_vdisk1.dat:

2011-02-13 13:07:55.420: [ SKGFD][3605955376]OSS discovery with :/s01/app/ocrvot/VOTEDISK/UAT2_vdisk1.dat:

2011-02-13 13:07:55.421: [ SKGFD][3605955376]Handle 0x124de360 from lib :UFS:: for disk :/s01/app/ocrvot/VOTEDISK/UAT2_vdisk1.dat:

2011-02-13 14:19:31.132: [ SKGFD][3605955376]WARNING:io_getevents timed out 2226 sec >>>>>>>>>>>>>>>>>>>> After about one hour it shows time out error.

2011-02-13 14:19:31.132: [ SKGFD][3605955376]WARNING:io_getevents timed out 2226 sec

The script has stalled at the voting disk upgrade phase. Please provide me the below details.

1. What cluster file system are you using for the voting files? provide its details and the mount options used.

for ocfs, get its mount options
mount | grep ocfs

3. Voting disks details
ls -l /s01/app/ocrvot/VOTEDISK/UAT2_vdisk*

4. Get the diagwait detail.
OLD_CRS_HOME/bin/crsctl get css diagwait

1. What cluster file system are you using for the voting files? provide its details and the mount options used
/dev/emcpowera1 on /s01/app/ocrvot type ocfs2 (rw,_netdev,datavolume,nointr,heartbeat=local)

2. Voting disks details

[root@vrh8 11.2.0.2]# ls -l /s01/app/ocrvot/VOTEDISK/UAT2_vdisk*
-rw-r—– 1 oracrs oinstall 21004288 Jun 11 07:31 /s01/app/ocrvot/VOTEDISK/UAT2_vdisk1.dat
-rw-r—– 1 oracrs oinstall 21004288 Jun 11 07:31 /s01/app/ocrvot/VOTEDISK/UAT2_vdisk2.dat
-rw-r—– 1 oracrs oinstall 21004288 Jun 11 07:31 /s01/app/ocrvot/VOTEDISK/UAT2_vdisk3.dat

3. Get the diagwait detail

crsctl get css diagwait
Failure 33 in main Oracle Cluster Registry context initialization: PROC-33: Oracle Cluster Registry is not configured Operating System error [No such file or directory] [2]

owc may not be required now as the issue we face is clear.

The diagwait should not error out, as explained in the following note,
11gR2 rootupgrade.sh Fails as cssvfupgd Can not Upgrade Voting Disk (Doc ID 1102283.1)

Make sure you are running ‘crsctl get css diagwait’ from the old crs home. You can also check it in multiple node. If it errors out, this has to be fixed as explained in the above note.

according to that note ,When I ./oprocd stop ,get error:
[root@l118464lwap1049 bin]# ./oprocd stop
Jun 16 23:24:42.966 | ERR | failed to connect to daemon, errno(111)

ACFS-9200: Supported
ACFS-9300: ADVM/ACFS distribution files found.
ACFS-9307: Installing requested ADVM/ACFS software.
ACFS-9308: Loading installed ADVM/ACFS drivers.
ACFS-9321: Creating udev for ADVM/ACFS.
ACFS-9323: Creating module dependencies – this may take some time.
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9309: ADVM/ACFS installation correctness verified.

cssvfupgd.log
2011-02-13 23:36:49.311: [ OCRRAW][3394941744]prgval:buffer passed is too small
2011-02-13 23:36:49.315: [CSSVFUPG][3394941744]cssvfupgd_GetVFList: found voting
file /s01/app/ocrvot/VOTEDISK/UAT2_vdisk2.dat
2011-02-13 23:36:49.319: [ OCRRAW][3394941744]prgval:buffer passed is too small
2011-02-13 23:36:49.323: [CSSVFUPG][3394941744]cssvfupgd_GetVFList: found voting
file /s01/app/ocrvot/VOTEDISK/UAT2_vdisk3.dat
2011-02-13 23:36:49.351: [CSSVFUPG][3394941744]cssvfupgd_SetNum: Processing SYST
EM.css.misscount
2011-02-13 23:36:49.354: [CSSVFUPG][3394941744]cssvfupgd_SetNum: Processing SYST
EM.css.disktimeout
2011-02-13 23:36:49.356: [CSSVFUPG][3394941744]cssvfupgd_SetNum: Processing SYST
EM.css.reboottime
2011-02-13 23:36:49.358: [CSSVFUPG][3394941744]cssvfupgd_SetNum: Processing SYST
EM.css.diagwait
2011-02-13 23:36:49.367: [CSSVFUPG][3394941744]cssvfupgd_SetNum: Processing SYST
EM.css.pollinterval
2011-02-13 23:36:49.369: [CSSVFUPG][3394941744]cssvfupgd_GetGUID: Fetching GUID
for /s01/app/ocrvot/VOTEDISK/UAT2_vdisk1.dat
2011-02-13 23:36:49.371: [ SKGFD][3394941744]NOTE: No asm libraries found in t
he system

2011-02-13 23:36:49.372: [ CLSF][3394941744]Allocated CLSF context
2011-02-13 23:36:49.372: [ SKGFD][3394941744]Discovery with str:/s01/app/ocrvo
t/VOTEDISK/UAT2_vdisk1.dat:

2011-02-13 23:36:49.372: [ SKGFD][3394941744]UFS discovery with :/s01/app/ocrv
ot/VOTEDISK/UAT2_vdisk1.dat:

2011-02-13 23:36:49.372: [ SKGFD][3394941744]Fetching UFS disk :/s01/app/ocrvo
t/VOTEDISK/UAT2_vdisk1.dat:

2011-02-13 23:36:49.372: [ SKGFD][3394941744]OSS discovery with :/s01/app/ocrv
ot/VOTEDISK/UAT2_vdisk1.dat:

2011-02-13 23:36:49.372: [ SKGFD][3394941744]Handle 0x98c4360 from lib :UFS::
for disk :/s01/app/ocrvot/VOTEDISK/UAT2_vdisk1.dat:

Question:
in Your update about cssvfupgd.log You stated it was hanging there.
Is there an entry after about 70 minutes about a timeout in that log file like:

2011-02-13 23:36:49.372: [ SKGFD][3394941744]Handle 0x98c4360 from lib :UFS::
for disk :/s01/app/ocrvot/VOTEDISK/UAT2_vdisk1.dat:
2011-02-17 0:48:19.372: [ SKGFD][3394941744]WARNING:io_getevents timed out 4294 sec <<<< present ???

Please provide the following outputs:
rpm -qa|grep ocfs2
uname -a
cat /etc/redhat-release

[root@vrh8 ~]# rpm -qa|grep ocfs2
ocfs2console-1.4.4-1.el5
ocfs2-tools-1.4.4-1.el5
ocfs2-2.6.18-238.5.1.el5-1.4.7-1.el5
[root@vrh8 ~]# uname -a
Linux vrh8 2.6.18-238.5.1.el5 #1 SMP Mon Feb 21 05:52:39 EST 2011 x86_64 x86_64 x86_64 GNU/Linux
[root@vrh8 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.6 (Tikanga)
[root@vrh8 ~]#

Combinations that install SUCCESSFUL:

OEL5.4+ocfs2-1.4.7-1+ocfs2-tools-1.4.4
OEL5.6+ocfs2-1.4.8-1+ocfs2-tools-1.6.3
OEL5.6+ocfs2-1.4.7-1+ocfs2-tools-1.4.4
RHLE5.6+OEL kernel(redhat compatible kernel)+ocfs2-1.4.8-1+ocfs2-tools-1.6.3
RHLE5.6+OEL kernel(redhat compatible kernel)+ocfs2-1.4.7-1+ocfs2-tools-1.4.4
RHEL5.4

Combinations that failed:
RHLE5.6(redhat kernel)+ocfs2-1.4.7-1+ocfs2-tools-1.4.4
RHLE5.6(redhat kernel)+ocfs2-1.4.8-1+ocfs2-tools-1.6.3

Problem reproduces with redhat kernel — RHEL 5.6 with 2.6.18-2xx kernels

Please review the following Note to change the location of your voting disk
Note 428681.1
Title: How to ADD/REMOVE/REPLACE/MOVE Oracle Cluster Registry (OCR) and Voting Disk

Pasting info from —
Oracle? Clusterware Administration and Deployment Guide
11g Release 2 (11.2)

3 Managing Oracle Cluster Registry and Voting Disks
Oracle Universal Installer for Oracle Clusterware 11g release 2 (11.2), does not support the use of raw or block devices. However, if you upgrade from a previous Oracle Clusterware release, then you can continue to use raw or block devices.

[oracrs@vrh8 grid]$ grep fail cluvfy_during_inst_061711.log
/tmp l118464lwap1049 /tmp 706MB 1GB failed
Result: Free disk space check failed for “l118464lwap1049:/tmp”
/tmp vrh8 /tmp 927.1312MB 1GB failed
Result: Free disk space check failed for “vrh8:/tmp”
Result: Check for multiple users with UID value 0 failed
PRVF-5431 : Oracle Cluster Voting Disk configuration check failed

[oracrs@vrh8 grid]$ ./runcluvfy.sh stage -pre crsinst -n vrh8,l118464lwap1049 -verbose|tee cluvfy_during_inst.log

Please upload the following Cluvfy trace log —
$ORA_CRS_HOME/cv/log/cvutrace.log.0

Please download the latest CVU from OTN:
http://www.oracle.com/technetwork/database/clustering/downloads/cvu-download-homepage-099973.html

Please upload
/s02/app/crs/11.2.0.2/log/vrh8/agent/ohasd/oraagent_oracrs/oraagent_oracrs.log

In addition pls upload
/s02/app/crs/11.2.0.2/log/vrh8/agent/ohasd/oracssdagent_root/oracssdagent_root.log

Please run this command on both the new setup and your existing production setup for a quick comparison —
rpm -qa|grep ocfs2

Server with issue:
[root@vrh8 ohasd]# rpm -qa|grep ocfs2
ocfs2console-1.4.4-1.el5
ocfs2-tools-1.4.4-1.el5
ocfs2-2.6.18-238.5.1.el5-1.4.7-1.el5

Prod:

[root@vrh9 bin]# rpm -qa|grep ocfs2
ocfs2-2.6.18-194.el5-1.4.7-1.el5
ocfs2console-1.4.4-1.el5
ocfs2-tools-1.4.4-1.el5
ocfs2-2.6.18-194.8.1.el5-1.4.7-1.el5

[root@vrh8 ~]# uname -a
Linux vrh8 2.6.18-238.5.1.el5 #1 SMP Mon Feb 21 05:52:39 EST 2011 x86_64 x86_64 x86_64 GNU/Linux

[root@vrh8 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.6 (Tikanga)

rpm -qa|grep ocfs2
ocfs2console-1.4.4-1.el5
ocfs2-tools-1.4.4-1.el5
ocfs2-2.6.18-238.5.1.el5-1.4.7-1.el5

@ . from Bug 11876815 (Doc ID 1321757.1)
@ combinations that install SUCCESSFUL:
@ .
@ OEL5.4+ocfs2-1.4.7-1+ocfs2-tools-1.4.4
@ OEL5.6+ocfs2-1.4.8-1+ocfs2-tools-1.6.3
@ OEL5.6+ocfs2-1.4.7-1+ocfs2-tools-1.4.4
@ RHLE5.6+OEL kernel(redhat compatible kernel)+ocfs2-1.4.8-1+ocfs2-tools-1.6.3
@ RHLE5.6+OEL kernel(redhat compatible kernel)+ocfs2-1.4.7-1+ocfs2-tools-1.4.4
@ RHEL5.4
@ .
@ combinations that failed:
@ RHLE5.6(redhat kernel)+ocfs2-1.4.7-1+ocfs2-tools-1.4.4
@ RHLE5.6(redhat kernel)+ocfs2-1.4.8-1+ocfs2-tools-1.6.3
@ .
@ .
@ So that is clear that , it is redhat kernel’s problem.Since RHEL5.6 redhat
@ provided 2.6.18-2xx kernels, we can’t fix redhat kernels, please use Oracle
@ Enterprise kernel (redhat compatible) for installation.

As per last action plan (conveyed if any) you need to contact REDHAT support to know the cause of this issue. Workaround is to not use OCFS and go for raw device for upgrade to succeed.
A Oracle bug 11876815 was logged internally for this hang issue and few combinations of OEL, RHEL, OCFS2 were tried and tested and the combination you are using has not worked for us too (per bug internal updates given above)
The solution provided by Oracle bug developer is to use OEL and not RHEL or contact RHEL support for identifying the cause and solution (incase they have already tested this setup).
Let me know if RHEL support is already engaged and provide the case id so that I can open internal SR for Oracle/Red Hat Joint Escalation Team (JET) Engagement for both vendors to work together internally.

+ the SR issue of grid upgrade from 11.1 to 11.2.0.2.2 is resolved
– voting disk was moved from ocfs to raw device – as a workaround for Bug 11876815
– set TMP and TEMP env to new dir with availabe space before running the installer and prechecks to succeed
– applied GIPSU#2 before the rootupgrade.sh step
– rootupgrade.sh step was successful on all nodes
– verified post upgrade checks and logs to confirm GI upgrade was success !

+ DB upgrade to 11.2.0.2 Plus PSU#2 will be resumed shorlty

Upgrade GI/CRS 11.1.0.7 to 11.2.0.2. Rootupgrade.sh Hanging

Comments

Leave a Reply Cancel reply