留言板历史

原留言板是为了记录网友的一些问题以及我的分析，现在有更方便的t.askmac.cn可以使用了，所以旧的留言板功能将被取消，本贴保留了留言板的历史记录。

(41) lx
Sat, 15 October 2011 11:39:11 +0800
兄弟，你link的那些国外oracle圈的网站怎么都没了？

(40) 重庆-漂流
Tue, 11 October 2011 09:56:10 +0800

数据库报这类错误，请帮看下怎么解决
Exception [type: ACCESS_VIOLATION, UNABLE_TO_READ] [ADDR:0xFFFFFFFF] [PC:0x7FFF65D0, {empty}]
Errors in file d:appadministratordiagrdbmsorclorcltraceorcl_ora_2368.trc (incident=143793):
ORA-07445: 出现异常错误: 核心转储 [PC:0x7FFF65D0] [ACCESS_VIOLATION] [ADDR:0xFFFFFFFF] [PC:0x7FFF65D0] [UNABLE_TO_READ] []
ORA-12080: IOQ 批处理缺失缓冲区高速缓存

Trace file d:appadministratordiagrdbmsorclorcltraceorcl_ora_2368.trc
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 – Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Windows Server 2003 Version V5.2 Service Pack 2
CPU : 8 – type 586, 2 Physical Cores
Process Affinity : 0x00000000
Memory (Avail/Total): Ph:6549M/8190M, Ph+PgF:7168M/9796M, VA:1761M/4095M
Instance name: orcl
Redo thread mounted by this instance: 1
Oracle process number: 47
Windows thread id: 2368, image: ORACLE.EXE (SHAD)

*** 2011-10-11 09:12:27.953
*** SESSION ID:(280.236) 2011-10-11 09:12:27.953
*** CLIENT ID:() 2011-10-11 09:12:27.953
*** SERVICE NAME:(orcl) 2011-10-11 09:12:27.953
*** MODULE NAME:(?？?？?？exe) 2011-10-11 09:12:27.953
*** ACTION NAME:() 2011-10-11 09:12:27.953

Exception [type: ACCESS_VIOLATION, UNABLE_TO_READ] [ADDR:0xFFFFFFFF] [PC:0x7FFF65D0, {empty}]
Incident 143793 created, dump file: d:appadministratordiagrdbmsorclorclincidentincdir_143793orcl_ora_2368_i143793.trc
ORA-07445: 出现异常错误: 核心转储 [PC:0x7FFF65D0] [ACCESS_VIOLATION] [ADDR:0xFFFFFFFF] [PC:0x7FFF65D0] [UNABLE_TO_READ] []
ORA-12080: IOQ 批处理缺失缓冲区高速缓存

(39) dume
Mon, 10 October 2011 21:03:15 +0800
再一次尝试的时候，发现使用多个窗口都可以正常登陆了，
窗口一：
C:Usersmdu>sqlplus sys/oracle@shared as sysdba

SQL*Plus: Release 10.2.0.1.0 – Production on 星期一 10月 10 20:54:03 20

连接到:
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 – Production
With the Partitioning, OLAP and Data Mining options

SQL>
窗口二：
C:Usersmdu>sqlplus sys/oracle@shared as sysdba

SQL*Plus: Release 10.2.0.1.0 – Production on 星期一 10月 10 20:54:12 2011

已连接。
SQL>
这里的疑问是：我明明是将shared_server_sessions设置为1，此时为什么还可以以shared模式再次登陆呢？
其它的参数与上一贴子中内容一样，就不再粘贴了。

—————————————
[oracle@node admin]$ more listener.ora
# listener.ora Network Configuration File: /u01/oracle/oracle/oracle10g/network/admin/listener.ora
# Generated by Oracle configuration tools.

SID_LIST_LISTENER =
(SID_LIST =
(SID_DESC =
(SID_NAME = PLSExtProc)
(ORACLE_HOME = /u01/oracle/oracle/oracle10g)
(PROGRAM = extproc)
)
)

LISTENER =
(DESCRIPTION_LIST =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.146.129)(PORT = 1521))
(ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC0))
)
)
—————————————————————–
# tnsnames.ora Network Configuration File: F:oracleproduct10.2.0client_1networkadmintnsnames.ora
# Generated by Oracle configuration tools.

GUOSHUN =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.146.129)(PORT = 1521))
)
(CONNECT_DATA =
(SERVICE_NAME = guoshun)
)
)
shared =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.146.129)(PORT = 1521))
)
(CONNECT_DATA =
(SERVICE_NAME = guoshun)
(SERVER = shared)
)
)

dedicated =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.146.129)(PORT = 1521))
)
(CONNECT_DATA =
(SERVICE_NAME = guoshun)
(SERVER = dedicated)
)
)

EXTPROC_CONNECTION_DATA =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC0))
)
(CONNECT_DATA =
(SID = PLSExtProc)
(PRESENTATION = RO)
)
)
——————————————————

(38) Maclean
Mon, 10 October 2011 19:10:10 +0800

dume

什么叫连不上？连不上怎么也得有个报错信息或者其他症状吧。

(37) Maclean
Mon, 10 October 2011 19:09:19 +0800

stoperp 把 pfile 或 spfile的内容上传

cat init$SID.ora

strings spfile$SID.ora

(36) dume
Mon, 10 October 2011 17:00:01 +0800

本人是想测试一下，如果一个数据库同时起用了共享服务器模式和专有服务器模式,在客户端连接时优先使用哪种方式连接？如果是共享服务器模式优先的话，那么当sessions数量受限于shared_server_sessions参数时如何处理到来的连接请求？是不是用使用dedicated方式进行连接?
环境：centos4.8, oracle10201
实验中shared_server_sessions设置为1，dipatchers=(protocol=tcp)(dispatchers=1)，同时在客户端的连接字符串中不指定server参数，具体如下：
GUOSHUN =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.146.129)(PORT = 1521))
)
(CONNECT_DATA =
(SERVICE_NAME = guoshun)
)
)
其它的一些参数：
SQL> show parameter share

NAME TYPE VALUE
———————————— ———– ——————————
hi_shared_memory_address integer 0
max_shared_servers integer
shared_memory_address integer 0
shared_pool_reserved_size big integer 7M
shared_pool_size big integer 0
shared_server_sessions integer 1
shared_servers integer 2
SQL> show parameter dispa

NAME TYPE VALUE
———————————— ———– ——————————
dispatchers string (protocol=tcp)(dispatchers=1)
max_dispatchers integer
SQL> show parameter shared_server_

NAME TYPE VALUE
———————————— ———– ——————————
max_shared_servers integer
shared_server_sessions integer 1
shared_servers integer 2
使用客户端连接,打开第一个窗口执行C:Usersmdu>sqlplus system/oracle@guoshun没有问题，打开第二窗口执行此命令就连不上了！这是为什么呢？
我认为第一次能连接上是正常的，第二次不能再以shared方式连接了，应该自动能够以dedicated方式连接上才对！

(35) stoperp
Mon, 10 October 2011 10:10:04 +0800

### BEGIN LOG – DATE: 111010, TIME: 095425 ###

[root@linux2 ~]#arch
ia64

[root@linux2 ~]# cat /etc/issue
Red Hat Enterprise Linux AS release 4 (Nahant Update 5)
Kernel r on an m

[root@linux2 ~]# whoami
root
[root@linux2 ~]# ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
file size (blocks, -f) unlimited
pending signals (-i) 1024
max locked memory (kbytes, -l) 128
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 12164
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
[root@linux2 ~]# df -h /dev/shm
Filesystem Size Used Avail Use% Mounted on
none 3.0G 0 3.0G 0% /dev/shm
[root@linux2 ~]# ipcs -ma

—— Shared Memory Segments ——–
key shmid owner perms bytes nattch status
0x00000000 32768 gdm 600 393216 2 dest
0x00000000 163841 oracle 640 2080768 19
0x00000000 196610 oracle 640 1610612736 19
0x3f327308 229379 oracle 640 16384 19

—— Semaphore Arrays ——–
key semid owner perms nsems
0xf1084ab0 294912 oracle 640 152
0xf1084ab1 327681 oracle 640 152
0xf1084ab2 360450 oracle 640 152

—— Message Queues ——–
key msqid owner perms used-bytes messages

[root@linux2 ~]# cat /etc/sysctl.conf
# Kernel sysctl configuration file for Red Hat Linux
#
# For binary values, 0 is disabled, 1 is enabled. See sysctl(8) and
# sysctl.conf(5) for more details.

# Controls IP packet forwarding
net.ipv4.ip_forward = 0

# Controls source route verification
net.ipv4.conf.default.rp_filter = 1

# Do not accept source routing
net.ipv4.conf.default.accept_source_route = 0

# Controls the System Request debugging functionality of the kernel
kernel.sysrq = 0

# Controls whether core dumps will append the PID to the core filename.
# Useful for debugging multi-threaded applications.
kernel.core_uses_pid = 1

kernel.shmall = 2097152
kernel.shmmax = 2147483648
kernel.shmmni = 4096
kernel.sem = 250 32000 100 128
fs.file-max = 65536
net.core.rmem_default = 524288
net.core.wmem_default = 524288
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.ipfrag_high_thresh=524288
net.ipv4.ipfrag_low_thresh=393216
net.ipv4.tcp_rmem=4096 524288 16777216
net.ipv4.tcp_wmem=4096 524288 16777216
net.ipv4.tcp_timestamps=0
net.ipv4.tcp_sack=0
net.ipv4.tcp_window_scaling=1
net.core.optmem_max=524287
net.core.netdev_max_backlog=2500
sunrpc.tcp_slot_table_entries=128
sunrpc.udp_slot_table_entries=128
net.ipv4.tcp_mem=16384 16384 16384
net.ipv4.ip_local_port_range = 1024 65000
[root@linux2 ~]# cat /proc/meminfo
MemTotal: 6126144 kB
MemFree: 6560 kB
Buffers: 86304 kB
Cached: 5442416 kB
SwapCached: 4864 kB
Active: 1020912 kB
Inactive: 4851920 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 6126144 kB
LowFree: 6560 kB
SwapTotal: 12287968 kB
SwapFree: 12228496 kB
Dirty: 205840 kB
Writeback: 0 kB
Mapped: 563904 kB
Slab: 161232 kB
CommitLimit: 15351040 kB
Committed_AS: 5687040 kB
PageTables: 9936 kB
VmallocTotal: 137430536192 kB
VmallocUsed: 7056 kB
VmallocChunk: 137430528368 kB
HugePages_Total: 0
HugePages_Free: 0
Hugepagesize: 262144 kB
[root@linux2 ~]# cat /proc/swaps
Filename Type Size Used Priority
/dev/sda3 partition 12287968 59472 -1
[root@linux2 ~]#
[root@linux2 ~]#
[root@linux2 ~]#
[root@linux2 ~]#
[root@linux2 ~]#
[root@linux2 ~]#
[root@linux2 ~]#
[root@linux2 ~]# su – oracle
[oracle@linux2 ~]$ opatch lsinventory -detail
Invoking OPatch 10.2.0.4.2

Oracle Home : /u01/oracle/product/10.2.0/db_1
Central Inventory : /u01/oraInventory
from : /var/opt/oracle/oraInst.loc
OPatch version : 10.2.0.4.2
OUI version : 10.2.0.5.0
OUI location : /u01/oracle/product/10.2.0/db_1/oui
Log file location : /u01/oracle/product/10.2.0/db_1/cfgtoollogs/opatch/opatch2011-10-10_09-57-39AM.log

Lsinventory Output file location : /u01/oracle/product/10.2.0/db_1/cfgtoollogs/opatch/lsinv/lsinventory2011-10-10_09-57-39AM.txt

——————————————————————————–
Installed Top-level Products (2):

Oracle Database 10g 10.2.0.1.0
Oracle Database 10g Release 2 Patch Set 4 10.2.0.5.0
There are 2 products installed in this Oracle Home.

Agent Required Support Files 10.2.0.1.0
Agent Required Support Files Patch 10.2.0.5.0
Assistant Common Files 10.2.0.1.0
Assistant Common Files Patch 10.2.0.5.0
Bali Share 1.1.18.0.0
Bali Share 1.1.19.0.0
Buildtools Common Files 10.2.0.1.0
Buildtools Common Files 10.2.0.5.0
Character Set Migration Utility 10.2.0.1.0
Character Set Migration Utility Patch 10.2.0.5.0
Database Configuration and Upgrade Assistants 10.2.0.1.0
Database Configuration and Upgrade Assistants Patch 10.2.0.5.0
Database SQL Scripts 10.2.0.1.0
Database SQL Scripts Patch 10.2.0.5.0
Database Workspace Manager 10.2.0.1.0
Database Workspace Manager 10.2.0.5.0
DBJAVA Required Support Files 10.2.0.1.0
DBJAVA Required Support Files Patch 10.2.0.5.0
Enterprise Edition Options 10.2.0.1.0
Enterprise Manager Agent Core 10.2.0.1.0
Enterprise Manager Agent Core Patch 10.2.0.5.0a
Enterprise Manager Agent DB 10.2.0.1.0
Enterprise Manager Agent DB 10.2.0.5.0
Enterprise Manager Baseline 10.2.0.1.0
Enterprise Manager Baseline 10.2.0.5.0
Enterprise Manager Common Core Patch 10.2.0.5.0a
Enterprise Manager Common Files 10.2.0.1.0
Enterprise Manager Minimal Integration 10.2.0.1.0
Enterprise Manager Minimal Integration 10.2.0.5.0
Enterprise Manager plugin Common Files 10.2.0.1.0
Enterprise Manager plugin Common Files 10.2.0.1.0
Enterprise Manager plugin Common Files Patch 10.2.0.5.0
Enterprise Manager Repository Core 10.2.0.1.0
Enterprise Manager Repository Core patch 10.2.0.5.0a
Enterprise Manager Repository DB 10.2.0.1.0
Enterprise Manager Repository DB 10.2.0.5.0
SQLJ Runtime 10.2.0.1.0
SQLJ Runtime Patch 10.2.0.5.0
SSL Required Support Files for InstantClient 10.2.0.1.0
SSL Required Support Files for InstantClient Patch 10.2.0.5.0
Sun JDK extensions 10.1.2.0.0
XDK Required Support Files 10.2.0.1.0
XDK Required Support Files Patch 10.2.0.5.0
XML Parser for Java 10.2.0.1.0
XML Parser for Java Patch 10.2.0.5.0
XML Parser for Oracle JVM 10.2.0.1.0
XML Parser for Oracle JVM Patch 10.2.0.5.0
There are 192 products installed in this Oracle Home.

There are no Interim patches installed in this Oracle Home.

——————————————————————————–

OPatch succeeded.

(34) maclean
Sun, 9 October 2011 23:17:08 +0800

#collect Linux info

arch
cat /etc/issue
whoami
ulimit -a
df -h /dev/shm
ipcs -ma
cat /etc/sysctl.conf
cat /proc/meminfo
cat /proc/swaps
cat /proc/vmstat
opatch lsinventory -detail

(33) stoperp
Sun, 9 October 2011 21:27:35 +0800
这就是打完补丁启动后报的错。
Completed: ALTER DATABASE OPEN
Sun Oct 09 18:19:10 CST 2011
Errors in file /oracle2/oracle/admin/orcl/bdump/orcl_ckpt_5240.trc:
ORA-00600: internal error code, arguments: [3751], [1], [93], [809], [109], [809], [], []
Sun Oct 09 18:19:17 CST 2011
Errors in file /oracle2/oracle/admin/orcl/bdump/orcl_mmnl_5250.trc:
ORA-00600: internal error code, arguments: [kewa_write_row: SW unaligned!], [0x4009A9B9A5A], [0x4009A9B9A60], [], [], [], [], []
Sun Oct 09 18:19:18 CST 2011
Errors in file /oracle2/oracle/admin/orcl/bdump/orcl_mmon_5248.trc:
ORA-07445: exception encountered: core dump [KGHISPIR()+16] [SIGSEGV] [unknown code] [0xFFFFFFFFFFFFFFFE] [] []
Sun Oct 09 18:19:24 CST 2011
Errors in file /oracle2/oracle/admin/orcl/bdump/orcl_dbw0_5236.trc:
ORA-07445: exception encountered: core dump [kcbbioq()+1120] [SIGSEGV] [unknown code] [0x9589DA8BBF0] [] []

(32) stoperp
Sun, 9 October 2011 21:00:59 +0800
操作系统： linux 安腾64位
数据库版本：10204，当前已升级到10205
前几天数据库异常宕机，重新startup，过一会就宕机了，日志里面报600,7445的错误，后来尝试做数据库全库不完全恢复，并且升级到10205，还是宕机。
ORA-00600: internal error code, arguments: [4519], [0], [4195978], [5], [], [], [], []
Fri Oct 7 12:43:40 2011
Errors in file /oracle2/oracle/admin/orcl/bdump/orcl_mmon_5215.trc:
ORA-00600: internal error code, arguments: [kgh_free_extents:ds], [0x40089000000], [], [], [], [], [], []
Fri Oct 7 12:43:50 2011
Errors in file /oracle2/oracle/admin/orcl/bdump/orcl_j006_5249.trc:
ORA-00600: internal error code, arguments: [xplUnCompact:magic], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [1100], [0x40046FAE578], [0x40046FAE578], [], [], [], [], []
ORA-08103: object no longer exists
Fri Oct 7 12:43:51 2011
Errors in file /oracle2/oracle/admin/orcl/bdump/orcl_j006_5249.trc:
ORA-07445: exception encountered: core dump [pfrlnitinit()+176] [SIGSEGV] [unknown code] [0x000000000] [] []
ORA-00600: internal error code, arguments: [xplUnCompact:magic], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [1100], [0x40046FAE578], [0x40046FAE578], [], [], [], [], []
ORA-08103: object no longer exists
Fri Oct 7 12:43:52 2011
Errors in file /oracle2/oracle/admin/orcl/bdump/orcl_ckpt_5207.trc:
ORA-07445: exception encountered: core dump [qesmmIFreeWorkArea()+688] [SIGSEGV] [unknown code] [0x000000008] [] []
Fri Oct 7 12:43:54 2011
Errors in file /oracle2/oracle/admin/orcl/bdump/orcl_pmon_5197.trc:
ORA-07445: exception encountered: core dump [ktcxdp()+784] [SIGSEGV] [unknown code] [0x000000008] [] []
Fri Oct 7 13:28:12 2011

(31) 广州-wendy
Mon, 19 September 2011 22:15:26 +0800

[oracle@cdb301 ~]$ cluvfy stage -post crsinst -n all -verbose

Performing post-checks for cluster services setup

Checking node reachability…

Check: Node reachability from node “cdb301”
Destination Node Reachable?
———————————— ————————
cdb302 yes
cdb301 yes
Result: Node reachability check passed from node “cdb301”.

Checking user equivalence…

Check: User equivalence for user “oracle”
Node Name Comment
———————————— ————————
cdb302 passed
cdb301 passed
Result: User equivalence check passed for user “oracle”.

Checking Cluster manager integrity…

Checking CSS daemon…
Node Name Status
———————————— ————————
cdb302 running
cdb301 running
Result: Daemon status check passed for “CSS daemon”.

Cluster manager integrity check passed.

Checking cluster integrity…

Node Name
————————————
cdb301
cdb302

Cluster integrity check passed

Checking OCR integrity…

Checking the absence of a non-clustered configuration…
All nodes free of non-clustered, local-only configurations.

Uniqueness check for OCR device passed.

Checking the version of OCR…
OCR of correct Version “2” exists.

Checking data integrity of OCR…
Data integrity check for OCR passed.

OCR integrity check passed.

Checking CRS integrity…

Checking daemon liveness…

Check: Liveness for “CRS daemon”
Node Name Running
———————————— ————————
cdb302 yes
cdb301 yes
Result: Liveness check passed for “CRS daemon”.

Checking daemon liveness…

Check: Liveness for “CSS daemon”
Node Name Running
———————————— ————————
cdb302 yes
cdb301 yes
Result: Liveness check passed for “CSS daemon”.

Checking daemon liveness…

Check: Liveness for “EVM daemon”
Node Name Running
———————————— ————————
cdb302 yes
cdb301 yes
Result: Liveness check passed for “EVM daemon”.

Liveness of all the daemons
Node Name CRS daemon CSS daemon EVM daemon
———— ———————— ———————— ———-
cdb302 yes yes yes
cdb301 yes yes yes

Checking CRS health…

Check: Health of CRS
Node Name CRS OK?
———————————— ————————
cdb302 unknown
cdb301 yes
Result: CRS health check failed.

CRS integrity check failed.

Checking node application existence…

Checking existence of VIP node application
Node Name Required Status Comment
———— ———————— ———————— ———-
cdb302 yes exists passed
cdb301 yes exists passed
Result: Check passed.

Checking existence of ONS node application
Node Name Required Status Comment
———— ———————— ———————— ———-
cdb302 no exists passed
cdb301 no exists passed
Result: Check passed.

Checking existence of GSD node application
Node Name Required Status Comment
———— ———————— ———————— ———-
cdb302 no exists passed
cdb301 no exists passed
Result: Check passed.

Post-check for cluster services setup was unsuccessful.
Checks did not pass for the following node(s):
cdb302

(30) 广州-wendy
Mon, 19 September 2011 19:38:36 +0800

[oracle@cdb301 ~]$ cluvfy comp ssa -n all

Verifying shared storage accessibility

Checking shared storage accessibility…

Disk Sharing Nodes (2 in count)
———————————— ————————
/dev/sda cdb302 cdb301
/dev/sdb cdb302 cdb301
/dev/sdc cdb302 cdb301
/dev/sdd cdb302 cdb301
/dev/sde cdb302 cdb301
/dev/sdf cdb302 cdb301
/dev/sdg cdb302 cdb301
/dev/sdh cdb302 cdb301
/dev/sdi cdb302 cdb301
/dev/sdj cdb302 cdb301
/dev/sdk cdb302 cdb301
/dev/sdl cdb302 cdb301
/dev/sdm cdb302 cdb301
/dev/sdn cdb302 cdb301
Shared storage check was successful on nodes “cdb302,cdb301”.
Verification of shared storage accessibility was successful.

[oracle@cdb301 ~]$ cluvfy comp nodecon -n all

Verifying node connectivity

Checking node connectivity…

Node connectivity check passed for subnet “192.168.254.0” with node(s) cdb302,cdb301.
Node connectivity check passed for subnet “192.168.0.0” with node(s) cdb302,cdb301.
Node connectivity check passed for subnet “172.16.3.0” with node(s) cdb302,cdb301.

Suitable interfaces for the private interconnect on subnet “192.168.254.0”:
cdb302 eth2:192.168.254.23
cdb301 eth2:192.168.254.21

Suitable interfaces for the private interconnect on subnet “192.168.0.0”:
cdb302 eth3:192.168.3.4
cdb301 eth3:192.168.3.3

Suitable interfaces for the private interconnect on subnet “172.16.3.0”:
cdb302 bond0:172.16.3.23
cdb301 bond0:172.16.3.21

ERROR:
Could not find a suitable set of interfaces for VIPs.

Node connectivity check failed.

Verification of node connectivity was unsuccessful on all the nodes.

[oracle@cdb301 ~]$ cluvfy comp nodereach -n all

Verifying node reachability

Checking node reachability…
Node reachability check passed from node “cdb301”.

Verification of node reachability was successful.

(29) maclean
Mon, 19 September 2011 16:29:36 +0800

尝试检查node2 上的votedisk 设备，

使用cluvfy 命令验证crs，

cluvfy comp nodecon -n vrh1,vrh2
cluvfy comp nodereach -n vrh1,vrh2
cluvfy comp ssa -n vrh1,vrh2

diagcollection.pl 脚本可以用于收集 crs日志，以便诊断

(28) 广州-wendy
Mon, 19 September 2011 16:11:28 +0800

大师，有时间帮我诊断个问题，谢谢！！！！

环境：rh5 + asm + database (10.2.0.4 64bit) 双节点rac
问题描述：由于power down，掉电后，启动后只能启动单节点（node1或者node2）

在两个节点上同时启动/home/oracle/product/10.2.0/crs/bin/crsctl start crs后，node1的上资源服务启动是正常的*（有时候时node2的资源服务是正常的），如下：

[root@cdb301 ~]# /home/oracle/product/10.2.0/crs/bin/crs_stat -t
Name Type Target State Host
————————————————————
ora….SM1.asm application ONLINE ONLINE cdb301
ora….01.lsnr application ONLINE ONLINE cdb301
ora.cdb301.gsd application ONLINE ONLINE cdb301
ora.cdb301.ons application ONLINE ONLINE cdb301
ora.cdb301.vip application ONLINE ONLINE cdb301
ora….SM2.asm application ONLINE OFFLINE
ora….02.lsnr application ONLINE OFFLINE
ora.cdb302.gsd application ONLINE OFFLINE
ora.cdb302.ons application ONLINE OFFLINE
ora.cdb302.vip application ONLINE ONLINE cdb301
ora.racdb.db application ONLINE ONLINE cdb301
ora….b1.inst application ONLINE ONLINE cdb301
ora….b2.inst application ONLINE OFFLINE

但是在node1上/home/oracle/product/10.2.0/crs/bin/crsctl stop crs后，node1上的资源服务停止，但是node2上的资源服务正常，如下：
[oracle@cdb302 ~]$ crs_stat -t
Name Type Target State Host
————————————————————
ora….SM1.asm application ONLINE OFFLINE
ora….01.lsnr application ONLINE OFFLINE
ora.cdb301.gsd application ONLINE OFFLINE
ora.cdb301.ons application ONLINE OFFLINE
ora.cdb301.vip application ONLINE ONLINE cdb302
ora….SM2.asm application ONLINE ONLINE cdb302
ora….02.lsnr application ONLINE ONLINE cdb302
ora.cdb302.gsd application ONLINE ONLINE cdb302
ora.cdb302.ons application ONLINE ONLINE cdb302
ora.cdb302.vip application ONLINE ONLINE cdb302
ora.racdb.db application ONLINE ONLINE cdb302
ora….b1.inst application ONLINE OFFLINE
ora….b2.inst application ONLINE ONLINE cdb302

在node1节点正常的情况下，node2节点上的crs启动日志如下：

[root@cdb302 ~]# tail -100f /home/oracle/product/10.2.0/crs/log/cdb302/cssd/ocssd.log
[ CSSD]2011-09-15 00:40:07.480 >USER: Copyright 2011, Oracle version 10.2.0.4.0
[ CSSD]2011-09-15 00:40:07.480 >USER: CSS daemon log for node cdb302, number 2, in cluster crs
[ clsdmt]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=cdb302DBG_CSSD))
[ CSSD]2011-09-15 00:40:07.485 [2565121264] >TRACE: clssscmain: local-only set to false
[ CSSD]2011-09-15 00:40:07.491 [2565121264] >TRACE: clssnmReadNodeInfo: added node 1 (cdb301) to cluster
[ CSSD]2011-09-15 00:40:07.498 [2565121264] >TRACE: clssnmReadNodeInfo: added node 2 (cdb302) to cluster
[ CSSD]2011-09-15 00:40:07.508 [1129740608] >TRACE: clssnm_skgxninit: Compatible vendor clusterware not in use
[ CSSD]2011-09-15 00:40:07.508 [1129740608] >TRACE: clssnm_skgxnmon: skgxn init failed
[ CSSD]2011-09-15 00:40:07.511 [2565121264] >TRACE: clssnmNMInitialize: misscount set to (60)
[ CSSD]2011-09-15 00:40:07.512 [2565121264] >TRACE: clssnmNMInitialize: Network heartbeat thresholds are: impending reconfig 30000 ms, reconfig start (misscount) 60000 ms
[ CSSD]2011-09-15 00:40:07.515 [2565121264] >TRACE: clssnmDiskStateChange: state from 1 to 2 disk (0//orac/crs/vote.crs)
[ CSSD]2011-09-15 00:40:07.515 [1129740608] >TRACE: clssnmvDPT: spawned for disk 0 (/orac/crs/vote.crs)
[ CSSD]2011-09-15 00:40:09.574 [1129740608] >TRACE: clssnmDiskStateChange: state from 2 to 4 disk (0//orac/crs/vote.crs)
[ CSSD]2011-09-15 00:40:09.588 [2565121264] >TRACE: clssnmFatalInit: fatal mode enabled
[ CSSD]2011-09-15 00:40:09.588 [1140230464] >TRACE: clssnmvKillBlockThread: spawned for disk 0 (/orac/crs/vote.crs) initial sleep interval (1000)ms
[ CSSD]2011-09-15 00:40:09.594 [1129740608] >TRACE: clssnmReadDskHeartbeat: node(1) is down. rcfg(2) wrtcnt(22) LATS(2817614) Disk lastSeqNo(22)
[ CSSD]2011-09-15 00:40:09.609 [1161210176] >TRACE: clssnmClusterListener: Listening on (ADDRESS=(PROTOCOL=tcp)(HOST=cdb302-priv)(PORT=49895))

[ CSSD]2011-09-15 00:40:09.609 [1161210176] >TRACE: clssnmconnect: connecting to node(1), con(0xd63f330), flags 0x0003
[ CSSD]2011-09-15 00:40:09.616 [1161210176] >TRACE: clsc_send_msg: (0xd63ebd0) NS err (12571, 12560), transport (530, 113, 0)

[ CSSD]2011-09-15 00:40:09.617 [1161210176] >TRACE: clssnmDiscHelper: cdb301, node(1) connection failed, con (0xd63f330), probe((nil))
[ CSSD]2011-09-15 00:40:09.635 [1171700032] >TRACE: clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=Oracle_CSS_LclLstnr_crs_2))
[ CSSD]2011-09-15 00:40:09.635 [1171700032] >TRACE: clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_cdb302_crs))
[ CSSD]2011-09-15 00:40:09.643 [1203169600] >TRACE: clssgmPeerListener: Listening on (ADDRESS=(PROTOCOL=tcp)(DEV=20)(HOST=192.168.254.23)(PORT=14930))
[ CSSD]2011-09-15 00:40:10.605 [1129740608] >TRACE: clssnmReadDskHeartbeat: node(1) is down. rcfg(2) wrtcnt(23) LATS(2818624) Disk lastSeqNo(23)
[ CSSD]2011-09-15 00:40:11.607 [1129740608] >TRACE: clssnmReadDskHeartbeat: node(1) is down. rcfg(2) wrtcnt(24) LATS(2819634) Disk lastSeqNo(24)
[ CSSD]2011-09-15 00:40:12.610 [1129740608] >TRACE: clssnmReadDskHeartbeat: node(1) is down. rcfg(2) wrtcnt(25) LATS(2820634) Disk lastSeqNo(25)
[ CSSD]2011-09-15 00:40:13.614 [1129740608] >TRACE: clssnmReadDskHeartbeat: node(1) is down. rcfg(2) wrtcnt(26) LATS(2821634) Disk lastSeqNo(26)
[ CSSD]2011-09-15 00:40:14.616 [1129740608] >TRACE: clssnmReadDskHeartbeat: node(1) is down. rcfg(2) wrtcnt(27) LATS(2822634) Disk lastSeqNo(27)
[ CSSD]2011-09-15 00:40:15.624 [1129740608] >TRACE: clssnmReadDskHeartbeat: node(1) is down. rcfg(2) wrtcnt(28) LATS(2823644) Disk lastSeqNo(28)
[ CSSD]2011-09-15 00:40:16.626 [1129740608] >TRACE: clssnmReadDskHeartbeat: node(1) is down. rcfg(2) wrtcnt(29) LATS(2824644) Disk lastSeqNo(29)
[ CSSD]2011-09-15 00:40:16.665 [1234639168] >TRACE: clssnmRcfgMgrThread: Local Join
[ CSSD]2011-09-15 00:40:16.665 [1234639168] >WARNING: clssnmLocalJoinEvent: takeover aborted due to ALIVE node on Disk
[ CSSD]2011-09-15 00:40:17.629 [1129740608] >TRACE: clssnmReadDskHeartbeat: node(1) is down. rcfg(2) wrtcnt(30) LATS(2825654) Disk lastSeqNo(30)
[ CSSD]2011-09-15 00:40:18.632 [1129740608] >TRACE: clssnmReadDskHeartbeat: node(1) is down. rcfg(2) wrtcnt(31) LATS(2826654) Disk lastSeqNo(31)

[root@cdb302 ~]# tail -100f /home/oracle/product/10.2.0/crs/log/cdb302/crsd/crsd.log

2011-09-15 00:40:03.133: [ default][22653680][ENTER]0
Oracle Database 10g CRS Release 10.2.0.4.0 Production Copyright 1996, 2004, Oracle. All rights reserved
2011-09-15 00:40:03.133: [ default][22653680]0CRS Daemon Starting
2011-09-15 00:40:03.133: [ CRSMAIN][22653680]0Checking the OCR device
2011-09-15 00:40:03.154: [ CRSMAIN][22653680]0Connecting to the CSS Daemon
2011-09-15 00:40:03.553: [ COMMCRS][1094519104]clsc_connect: (0xac7fe70) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_cdb302_crs))
2011-09-15 00:40:03.553: [ CSSCLNT][22653680]clsssInitNative: connect failed, rc 9
2011-09-15 00:40:03.553: [ CRSRTI][22653680]0CSS is not ready. Received status 3 from CSS. Waiting for good status ..
2011-09-15 00:40:04.953: [ COMMCRS][1094519104]clsc_connect: (0xac7fe70) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_cdb302_crs))
2011-09-15 00:40:04.953: [ CSSCLNT][22653680]clsssInitNative: connect failed, rc 9
2011-09-15 00:40:04.953: [ CRSRTI][22653680]0CSS is not ready. Received status 3 from CSS. Waiting for good status ..

[root@cdb302 ~]# tail -100f /home/oracle/product/10.2.0/crs/log/cdb302/evmd/evmd.log

2011-09-15 00:40:03.007: [ EVMD][2628647472]0EVMD Starting
2011-09-15 00:40:03.008: [ EVMD][2628647472]0
Oracle Database 10g CRS Release 10.2.0.4.0 Production Copyright 1996, 2006, Oracle. All rights reserved
2011-09-15 00:40:03.008: [ EVMD][2628647472]0Initializing OCR
2011-09-15 00:40:03.603: [ COMMCRS][1098111296]clsc_connect: (0xc82750) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_cdb302_crs))
2011-09-15 00:40:03.603: [ CSSCLNT][2628647472]clsssInitNative: connect failed, rc 9
2011-09-15 00:40:03.603: [ EVMD][2628647472]0EVMD waiting for CSS to be ready err = 3

[oracle@cdb302 ~]$ tail -100f product/10.2.0/db_1/admin/+ASM/bdump/alert_+ASM2.log
—无输出
[oracle@cdb302 ~]$ tail -100f product/10.2.0/db_1/admin/racdb/bdump/alert_racdb2.log
—无输出

node1节点检查：
[oracle@cdb301 ~]$ crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy

[oracle@cdb301 ~]$ ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 2
Total space (kbytes) : 4818928
Used space (kbytes) : 3848
Available space (kbytes) : 4815080
ID : 986584232
Device/File Name : /orac/crs/ocr.crs
Device/File integrity check succeeded

Device/File not configured

Cluster registry integrity check succeeded

node2节点检查：
[oracle@cdb302 ~]$ crsctl check crs
—命令挂死
[oracle@cdb302 ~]$ ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 2
Total space (kbytes) : 4818928
Used space (kbytes) : 3848
Available space (kbytes) : 4815080
ID : 986584232
Device/File Name : /orac/crs/ocr.crs
Device/File integrity check succeeded

Device/File not configured

Cluster registry integrity check succeeded

在node2节点上：
[root@cdb302 bin]# /etc/init.d/init.crs stop
Shutting down Oracle Cluster Ready Services (CRS):
Stopping resources. This could take several minutes.
Error while stopping resources. Possible cause: CRSD is down.
Shutdown has begun. The daemons should exit soon.

[root@cdb302 bin]# /etc/init.d/init.crs start
Startup will be queued to init within 30 seconds

–接着node2节点OS自动重启，CSSD进程异常，系统会立即重启。CSS服务通过多种心跳机制，实时监控集群健康状态，提供脑裂保护等基础集群服务功能。css服务有两种心跳机制：一种是通过私有网络的Network Heartbeat，另一种是通过Voting Disk的Disk Heartbeat

在node1节点上：
[root@cdb301 ~]# ps -ef | grep d.bin
oracle 4738 4727 0 00:39 ? 00:00:01 /home/oracle/product/10.2.0/crs/bin/evmd.bin
root 4804 32251 0 00:39 ? 00:01:10 /home/oracle/product/10.2.0/crs/bin/crsd.bin reboot
root 5190 4815 0 00:39 ? 00:00:00 /home/oracle/product/10.2.0/crs/bin/oprocd.bin run -t 1000 -m 500 -f
oracle 5369 4888 0 00:39 ? 00:00:19 /home/oracle/product/10.2.0/crs/bin/ocssd.bin
root 15307 14776 0 20:03 pts/5 00:00:00 grep d.bin

[root@cdb301 ~]# ps -ef | grep crs | grep -v grep
root 4727 32138 0 00:39 ? 00:00:00 /bin/su -l oracle -c sh -c ‘ulimit -c unlimited; cd /home/oracle/product/10.2.0/crs/log/cdb301/evmd; exec /home/oracle/product/10.2.0/crs/bin/evmd ‘
oracle 4738 4727 0 00:39 ? 00:00:01 /home/oracle/product/10.2.0/crs/bin/evmd.bin
root 4804 32251 0 00:39 ? 00:01:11 /home/oracle/product/10.2.0/crs/bin/crsd.bin reboot
root 5190 4815 0 00:39 ? 00:00:00 /home/oracle/product/10.2.0/crs/bin/oprocd.bin run -t 1000 -m 500 -f
root 5209 4839 0 00:39 ? 00:00:00 /sbin/runuser -l oracle -c /bin/sh -c ‘cd /home/oracle/product/10.2.0/crs/log/cdb301/cssd/oclsomon; -c unlimited; /home/oracle/product/10.2.0/crs/bin/oclsomon || exit $?’
oracle 5210 5209 0 00:39 ? 00:00:00 /bin/sh -c cd /home/oracle/product/10.2.0/crs/log/cdb301/cssd/oclsomon; ulimit -c unlimited; /home/oracle/product/10.2.0/crs/bin/oclsomon || exit $?
oracle 5238 5210 0 00:39 ? 00:00:00 /home/oracle/product/10.2.0/crs/bin/oclsomon.bin
oracle 5369 4888 0 00:39 ? 00:00:19 /home/oracle/product/10.2.0/crs/bin/ocssd.bin
oracle 5526 4738 0 00:39 ? 00:00:00 /home/oracle/product/10.2.0/crs/bin/evmlogger.bin -o /home/oracle/product/10.2.0/crs/evm/log/evmlogger.info -l /home/oracle/product/10.2.0/crs/evm/log/evmlogger.log
oracle 5768 1 0 00:39 ? 00:00:00 /home/oracle/product/10.2.0/crs/opmn/bin/ons -d
oracle 5769 5768 0 00:39 ? 00:00:00 /home/oracle/product/10.2.0/crs/opmn/bin/ons -d
root 32251 1 0 00:28 ? 00:00:00 /bin/sh /etc/init.d/init.crsd run

在node2节点上：
[root@cdb302 ~]# ps -ef | grep d.bin
oracle 8408 8407 0 15:34 ? 00:00:00 /home/oracle/product/10.2.0/crs/bin/evmd.bin
root 8449 7512 0 15:34 ? 00:00:00 /home/oracle/product/10.2.0/crs/bin/crsd.bin reboot
root 8857 8469 0 15:34 ? 00:00:00 /home/oracle/product/10.2.0/crs/bin/oprocd.bin run -t 1000 -m 500 -f
oracle 9026 8546 0 15:34 ? 00:00:00 /home/oracle/product/10.2.0/crs/bin/ocssd.bin
root 26585 25147 0 20:06 pts/2 00:00:00 grep d.bin

[root@cdb302 ~]# ps -ef | grep crs | grep -v grep
root 7512 1 0 15:34 ? 00:00:00 /bin/sh /etc/init.d/init.crsd run
root 8407 7509 0 15:34 ? 00:00:00 /bin/su -l oracle -c sh -c ‘ulimit -c unlimited; cd /home/oracle/product/10.2.0/crs/log/cdb302/evmd; exec /home/oracle/product/10.2.0/crs/bin/evmd ‘
oracle 8408 8407 0 15:34 ? 00:00:00 /home/oracle/product/10.2.0/crs/bin/evmd.bin
root 8449 7512 0 15:34 ? 00:00:00 /home/oracle/product/10.2.0/crs/bin/crsd.bin reboot
root 8857 8469 0 15:34 ? 00:00:00 /home/oracle/product/10.2.0/crs/bin/oprocd.bin run -t 1000 -m 500 -f
root 8864 8493 0 15:34 ? 00:00:00 /sbin/runuser -l oracle -c /bin/sh -c ‘cd /home/oracle/product/10.2.0/crs/log/cdb302/cssd/oclsomon; ulimit -c unlimited; /home/oracle/product/10.2.0/crs/bin/oclsomon || exit $?’
oracle 8865 8864 0 15:34 ? 00:00:00 /bin/sh -c cd /home/oracle/product/10.2.0/crs/log/cdb302/cssd/oclsomon; ulimit -c unlimited; /home/oracle/product/10.2.0/crs/bin/oclsomon || exit $?
oracle 8893 8865 0 15:34 ? 00:00:00 /home/oracle/product/10.2.0/crs/bin/oclsomon.bin
oracle 9026 8546 0 15:34 ? 00:00:00 /home/oracle/product/10.2.0/crs/bin/ocssd.bin

node1上比node2上多了三个进程.

(27) maclean
Fri, 16 September 2011 23:35:41 +0800

你好，

这2天比较忙，不能及时回复，请耐心等待！

(26) 广州-wendy
Wed, 14 September 2011 20:50:09 +0800
环境：rh5+asm+db10.2.0.4（64bit）
描述：
rac安装完后两边能正常启动，但是系统管理员reboot OS 后，就只能启动node1上的服务和数据库实例，node2上的服务完全起不来，
然后系统管理员又reboot一次OS，这次是node2节点上的服务和数据库实例能启动，而node1上的完全没有起来。
如下：
[oracle@cdb302 ~]$ crs_stat -t
Name Type Target State Host
————————————————————
ora….SM1.asm application ONLINE OFFLINE
ora….01.lsnr application ONLINE OFFLINE
ora.cdb301.gsd application ONLINE OFFLINE
ora.cdb301.ons application ONLINE OFFLINE
ora.cdb301.vip application ONLINE ONLINE cdb302
ora….SM2.asm application ONLINE ONLINE cdb302
ora….02.lsnr application ONLINE ONLINE cdb302
ora.cdb302.gsd application ONLINE ONLINE cdb302
ora.cdb302.ons application ONLINE ONLINE cdb302
ora.cdb302.vip application ONLINE ONLINE cdb302
ora.racdb.db application ONLINE ONLINE cdb302
ora….b1.inst application ONLINE OFFLINE
ora….b2.inst application ONLINE ONLINE cdb302

[root@cdb301 ~]# tail -100f /home/oracle/product/10.2.0/crs/log/cdb301/cssd/ocssd.log
[ CSSD]2011-09-14 20:40:28.278 [1135917376] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(2) wrtcnt(6498) LATS(5815924) Disk lastSeqNo(6498)
[ CSSD]2011-09-14 20:40:29.280 [1135917376] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(2) wrtcnt(6499) LATS(5816934) Disk lastSeqNo(6499)
[ CSSD]2011-09-14 20:40:30.282 [1135917376] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(2) wrtcnt(6500) LATS(5817934) Disk lastSeqNo(6500)
[ CSSD]2011-09-14 20:40:31.284 [1135917376] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(2) wrtcnt(6501) LATS(5818934) Disk lastSeqNo(6501)
[ CSSD]2011-09-14 20:40:32.286 [1135917376] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(2) wrtcnt(6502) LATS(5819934) Disk lastSeqNo(6502)
[ CSSD]2011-09-14 20:40:33.288 [1135917376] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(2) wrtcnt(6503) LATS(5820934) Disk lastSeqNo(6503)
[ CSSD]2011-09-14 20:40:34.290 [1135917376] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(2) wrtcnt(6504) LATS(5821934) Disk lastSeqNo(6504)
[ CSSD]2011-09-14 20:40:34.314 [1230326080] >TRACE: clssnmRcfgMgrThread: Local Join
[ CSSD]2011-09-14 20:40:34.314 [1230326080] >WARNING: clssnmLocalJoinEvent: takeover aborted due to ALIVE node on Disk
[ CSSD]2011-09-14 20:40:35.292 [1135917376] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(2) wrtcnt(6505) LATS(5822944) Disk lastSeqNo(6505)
[ CSSD]2011-09-14 20:40:36.294 [1135917376] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(2) wrtcnt(6506) LATS(5823944) Disk lastSeqNo(6506)
[ CSSD]2011-09-14 20:40:37.296 [1135917376] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(2) wrtcnt(6507) LATS(5824944) Disk lastSeqNo(6507)
[ CSSD]2011-09-14 20:40:38.298 [1135917376] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(2) wrtcnt(6508) LATS(5825944) Disk lastSeqNo(6508)
[ CSSD]2011-09-14 20:40:39.300 [1135917376] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(2) wrtcnt(6509) LATS(5826944) Disk lastSeqNo(6509)
[ CSSD]2011-09-14 20:40:40.303 [1135917376] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(2) wrtcnt(6510) LATS(5827954) Disk lastSeqNo(6510)
[ CSSD]2011-09-14 20:40:41.305 [1135917376] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(2) wrtcnt(6511) LATS(5828954) Disk lastSeqNo(6511)
[ CSSD]2011-09-14 20:40:41.328 [1230326080] >TRACE: clssnmRcfgMgrThread: Local Join
[ CSSD]2011-09-14 20:40:41.328 [1230326080] >WARNING: clssnmLocalJoinEvent: takeover aborted due to ALIVE node on Disk
[ CSSD]2011-09-14 20:40:42.307 [1135917376] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(2) wrtcnt(6512) LATS(5829954) Disk lastSeqNo(6512)
[ CSSD]2011-09-14 20:40:43.309 [1135917376] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(2) wrtcnt(6513) LATS(5830954) Disk lastSeqNo(6513)
[ CSSD]2011-09-14 20:40:44.311 [1135917376] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(2) wrtcnt(6514) LATS(5831954) Disk lastSeqNo(6514)
[ CSSD]2011-09-14 20:40:45.313 [1135917376] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(2) wrtcnt(6515) LATS(5832964) Disk lastSeqNo(6515)
[ CSSD]2011-09-14 20:40:46.315 [1135917376] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(2) wrtcnt(6516) LATS(5833964) Disk lastSeqNo(6516)
[ CSSD]2011-09-14 20:40:47.317 [1135917376] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(2) wrtcnt(6517) LATS(5834964) Disk lastSeqNo(6517)
[ CSSD]2011-09-14 20:40:48.319 [1135917376] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(2) wrtcnt(6518) LATS(5835964) Disk lastSeqNo(6518)

检查node2的crs：
[oracle@cdb302 ~]$ crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy

检查node1节点的crs：
[oracle@cdb301 ~]$ crsctl check crs

—命令挂起

(25) 青子
Sun, 26 June 2011 10:05:41 +0800

能否把Step By Step guide for installing Oracle RAC 11gR2 on Windows (2003 and 2008)
发一份文档到邮箱，谢谢

(24) maclean
Mon, 6 June 2011 21:55:49 +0800

action plan:

1. 执行relink all
rerun catpatch.sql

2. try to remove jvm

— Start of File full_rmjvm.sql
spool full_rmjvm.log
set echo on
connect / as sysdba
startup mount
alter system set “_system_trig_enabled” = false scope=memory;
alter system enable restricted session;
alter database open;
start ?/rdbms/admin/catnojav.sql
start ?/xdk/admin/rmxml.sql
start ?/javavm/install/rmjvm.sql
truncate table java$jvm$status;
select * from obj$
where obj#=0 and type#=0;
delete from obj$
where obj#=0 and type#=0;
commit;
select owner, count(*) from all_objects
where object_type like ‘%JAVA%’ group by owner;
select obj#, name from obj$
where type#=28 or type#=29 or type#=30 or namespace=32;
select o1.name from obj$ o1,obj$ o2
where o1.type#=5 and o1.owner#=1 and o1.name=o2.name and o2.type#=29;
shutdown immediate
set echo off
spool off
exit
— End of File full_rmjvm.sql

rerun catpatch.sql

3. try to reinstall after upgrade
remove jvm and install it again:

— Start of File full_jvminst.sql
spool full_jvminst.log;
set echo on
connect / as sysdba
startup mount
alter system set “_system_trig_enabled” = false scope=memory;
alter database open;
select obj#, name from obj$
where type#=28 or type#=29 or type#=30 or namespace=32;
start ?/javavm/install/initjvm.sql
select count(*), object_type from all_objects
where object_type like ‘%JAVA%’ group by object_type;
start ?/xdk/admin/initxml.sql
select count(*), object_type from all_objects
where object_type like ‘%JAVA%’ group by object_type;
start ?/xdk/admin/xmlja.sql
select count(*), object_type from all_objects
where object_type like ‘%JAVA%’ group by object_type;
start ?/rdbms/admin/catjava.sql
select count(*), object_type from all_objects
where object_type like ‘%JAVA%’ group by object_type;
shutdown immediate
set echo off
spool off
exit
— End of File full_jvminst.sql

(23) sleepzzzzz
Mon, 6 June 2011 21:51:21 +0800
sorry,是9.2.0.1升级到9.2.0.4

(22) maclean
Mon, 6 June 2011 21:51:16 +0800

sleepzzzzz 你好，
该
ORA-00600: internal error code, arguments: [545],
ORA-00600: internal error code, arguments: [26599]
是由于9.2上的Bug 3236196: ORA-600 [26599] ORA-600 [545] DURING UPGRADE引起的,bug note如下:
Hdr: 3236196 9.2.0.3.0 JAVAVM 9.2.0.3.0 UPGRADE PRODID-5 PORTID-38 ORA-600
Abstract: ORA-600 [26599] ORA-600 [545] DURING UPGRADE

PROBLEM
——–
Ct has a few 9.0.1 databases that they will upgrade to 9.2.0.3.0. They are
trying to test this upgrade plan.

First time when this upgrade was tested, this was successful but they had some
invalid packages even after running utlrp.sql. These packages were the
following:
DBMS_CDC_PUBLISH, DBMS_CDC_SUBSCRIBE, DBMS_CDC_UTILITY,`DBMS_REPCAT_RGT,
DBMS_SQLJTYPE, MS_SUMADVISOR

When they tried to recompile this manually they got all sorts of java errors.
Dropping these packages resulted in invalidation of other packages It was
found out that JVM was not installed on the 9.0.1 database. Ct wanted to have
a clean db so they installed JVM on the 9.0.1 db using dbca without any
errors(Verified by querying the dba_regsitry and also count(*) of java
objects).

They tried the upgrade once again and when running cmpdbmig.sql which upgrades
java and other components it fails with the following error

create or replace java system
begin if initjvmaux.
startstep(‘CREATE_JAVA_SYSTEM’) then
*
ERROR at line 1:
ORA-600: internal error code, arguments: [545], [0x7000000000231F0], [12],
[], [], [], [], []
ORA-600: internal error code, arguments: [26599], [1], [249], [], [], [],
[],
[]
ORA-6512: at “SYS.INITJVMAUX”, line 15
ORA-6512: at line 5

DIAGNOSTICS
————
PATH, LD_LIBRARY_PATH, ORA_NLS33, ORACLE_HOME are set properly. Tried to
relink oracle with no luck.

Trace files and alert log will be uploaded to ess30.

IMPACT
——-
Very critical as the 9.0.1 databases will be upgrade very shortly.

JVM Fails With ORA-600 [545] & ORA-600 [26599],[1], [249] [ID 472584.1]
Applies to:
Oracle Server – Enterprise Edition – Version: 9.2.0.1 to 10.2.0.4 – Release: 9.2 to 10.2
IBM AIX on POWER Systems (64-bit)
IBM AIX Based Systems (64-bit)
AIX5L Based Systems (64-bit)
***Checked for relevance on 22-Mar-2011***
Symptoms
When using the JVM(i.e during exp, upgrade, JVM installation,..) the following error may be reported:

ERROR at line 1:
ORA-00600: internal error code, arguments: [545], [0x7000000000235F0], [12],
[], [], [], [], []
ORA-00600: internal error code, arguments: [26599], [1], [249], [], [], [], [], []
ORA-06512: at “SYS.INITJVMAUX”, line 15
ORA-06512: at line 5

.
Cause
This is a problem with the version $ORACLE_HOME/lib/libjox9.a in 9i installation ( libjox10.a in case of 10g installation) file which was not correct for his Patchset version and platform.

So a problem during the Patchset installation caused this file to be not replaced correctly.
Solution
A- To check the file version please run the following command:

$cd $ORACLE_HOME/lib
$ls -l libjox*
$sum libjox9.a

Check the results of the commands with a good installation with the same Patchset level and platform
i.e.
For a 9.2.0.7 home the file should be 7756886 in size.

B-If the file size does not match, then this confirms a problem with the installation. Replacing the file from another good installation should fix the problem :

Stop all db’s and servcies running on this home..
Save the original libjox9.a/libjox10.a file to a new name.
Copy in the libjox9.a/libjox10.a file from a valid server.
Run a “relink all”:
Note 131321.1:How to Relink Oracle Database Software on UNIX.
Reload the JVM:
Note 209870.1:How to Reload the JVM in 9.2.0.X/Note 276554.1:How to Reload the JVM in 10.1.0.X and 10.2.0.X

Finally software re-installation will be the recommended solution to fix this inconsistency problem especially that other files could be affected.

(21) sleepzzzzz
Mon, 6 June 2011 21:49:43 +0800
sorry,是9.2.0.1升级到9.2.0.4　。

(20) sleepzzzzz
Mon, 6 June 2011 21:41:36 +0800

hi，maclean

OS:aix 64bit,从9.2.0.升级到9.2.0.8时，执行catpatch.sql时报错如下：

PL/SQL procedure successfully completed.

begin if initjvmaux.startstep(‘CREATE_JAVA_SYSTEM’) then
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [545], [0x7000000000235F0], [12],
[], [], [], [], []
ORA-00600: internal error code, arguments: [26599], [1], [249], [], [], [], [],
[]
ORA-06512: at “SYS.INITJVMAUX”, line 15
ORA-06512: at line 5

COUNT(*)
———-
10452

如何解决，非常感谢！

(19) maclean
Fri, 27 May 2011 01:17:57 +0800

Hello David,
该ORA-00600[kddummy_blkchk][18019]错误是在对L1 BMB一级位图块做逻辑检验时发现存在逻辑错误。

move table是通过对该表数据段所在原表空间移动来重构原先有逻辑错误的一级位图块，重构后不再有逻辑错误，所以可以将DB_BLOCK_CHECKSUM调回TRUE

(18) david
Thu, 26 May 2011 17:24:01 +0800
hi，maclean
我想问下，
alter table NARI.NR_AIN_SMP_18_R move;

执行move操作的目的的什么？

(17) maclean
Wed, 25 May 2011 22:00:55 +0800

Hdr: 10636941 10.2.0.4 RDBMS 10.2.0.4.0 SPACE PRODID-5 PORTID-226 ORA-600
Abstract: INSTANCE CRASHES DUE TO ORA-600: [KDDUMMY_BLKCHK], [45], [65998097], [18019]

BUG TYPE CHOSEN
===============
Code

SubComponent: Buffer Cache
==========================
DETAILED PROBLEM DESCRIPTION
============================
Insert into partitoned table fails with ORA-600 [kddummy_blkchk], [45],
[65998098], [18040], and after SMON fails to recover TX because of ORA-600
[kddummy_blkchk], [45], [65998097], [18019].
Both blocks are part of same partition objd=1965473 L1 Bitmap.

This is an 8 node RAC env using ASM.

DIAGNOSTIC ANALYSIS
===================
alert_racrdb6.log
———————
Tue Jan 4 14:37:54 2011
SUCCESS: diskgroup ALOG was mounted
SUCCESS: diskgroup ALOG was dismounted
SUCCESS: diskgroup ALOG was mounted
Tue Jan 4 14:38:35 2011
Errors in file /u01/app/oracle/admin/racrdb/udump/racrdb6_ora_6762.trc:
ORA-600: internal error code, arguments: [kddummy_blkchk], [45],
[65998098], [18040],
..
Tue Jan 4 14:40:58 2011
Errors in file /u01/app/oracle/admin/racrdb/udump/racrdb6_ora_6762.trc:
ORA-600: internal error code, arguments: [kddummy_blkchk], [45],
[65998097], [18019],
ORA-607: Internal error occurred while making a change to a data block
ORA-600: internal error code, arguments: [kddummy_blkchk], [45],
[65998098], [18040],
..
Tue Jan 4 14:49:18 2011
Errors in file /u01/app/oracle/admin/racrdb/bdump/racrdb6_smon_7887.trc:
ORA-600: internal error code, arguments: [kddummy_blkchk], [45],
[65998097], [18019]
Tue Jan 4 14:49:19 2011
Trace dumping is performing id=[cdmp_20110104144919]
Tue Jan 4 14:50:32 2011
ORACLE Instance racrdb6 (pid = 18) – Error 607 encountered while recovering
transaction (948, 9) on object 1965473.

racrdb6_ora_6762.trc
———————
*** ID:(1589.69) 2011-01-04 14:37:05.232
TYP:0 CLS: 8 AFN:45 DBA:0x03ef0d12 OBJ:1965473 SCN:0x0912.fd260262 SEQ: 1
OP:13.22
Redo on Level1 Bitmap Block
Redo to add range
bdba: 0x022ce6c9 Length: 328
GLOBAL CACHE ELEMENT DUMP (address: 0x353077848):
id1: 0x3ef0d12 id2: 0x2d0000 obj: 1965473 block: (45/65998098)

Block after image:
buffer tsn: 52 rdba: 0x03ef0d12 (1024/65998098)
scn: 0x0912.fd260262 seq: 0x01 flg: 0x00 tail: 0x02622001
frmt: 0x02 chkval: 0x0000 type: 0x20=FIRST LEVEL BITMAP BLOCK

*** 2011-01-04 14:38:35.690
ksedmp: internal or fatal error
ORA-600: internal error code, arguments: [kddummy_blkchk], [45],
[65998098], [18040],
Current SQL statement for this session:
INSERT INTO dwh_rb.rb_fact_pd_out_itm …
SELECT …… FROM dwh_rb_staging.rb_fact_pd_out_itm_s
WHERE project_id = 65962
AND period_id BETWEEN 20090199999030 AND 20091299999030

racrdb3_smon_5821.trc
———————–
*** ID:(1639.1) 2011-01-04 16:37:42.697
SMON: about to recover undo segment 948
TYP:0 CLS: 8 AFN:45 DBA:0x03ef0d11 OBJ:1965473 SCN:0x0912.fd260262 SEQ: 2
OP:13.22
Redo on Level1 Bitmap Block
Redo to delete range
Length: 56
GLOBAL CACHE ELEMENT DUMP (address: 0x162f8f5c8):
id1: 0x3ef0d11 id2: 0x2d0000 obj: 1965473 block: (45/65998097)

Block after image:
buffer tsn: 52 rdba: 0x03ef0d11 (1024/65998097)
scn: 0x0912.fd260262 seq: 0x02 flg: 0x04 tail: 0x02622002
frmt: 0x02 chkval: 0x80d0 type: 0x20=FIRST LEVEL BITMAP BLOCK
Hex dump of corrupt header 3 = CHKVAL

*** 2011-01-04 16:38:13.154
ksedmp: internal or fatal error
ORA-600: internal error code, arguments: [kddummy_blkchk], [45],
[65998097], [18019],

WORKAROUND?
===========
Yes

WORKAROUND INFORMATION
======================
We had to disable checksum and let SMon recover the Tx, and recreate the
partition.

TECHNICAL IMPACT
================
Database crashed, was not available for usage.

RELATED ISSUES (bugs, forums, RFAs)
===================================
I didn’t find any.

HOW OFTEN DOES THE ISSUE REPRODUCE AT CUSTOMER SITE?
====================================================
Never

DOES THE ISSUE REPRODUCE INTERNALLY?
====================================
Not attempted

EXPLAIN WHY THE ISSUE WAS NOT TESTED INTERNALLY.
================================================
I don’t have ct env in house

ORA-600 [Kddummy_blkchk] Error Occurred On Create Table

Applies to:

Oracle Server – Enterprise Edition – Version: 10.2.0.3
This problem can occur on any platform.
Symptoms

The creation of a simple table caused the following errors to appear and corrupt block messages in the alert log.

create table foo
2 ( c1 number)
3 tablespace default_users;
create table foo
*
ERROR at line 1:
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [kddummy_blkchk], [36], [432394],
[18019], [], [], [], []

Changes

This was recently upgraded to 10.2.0.3.0 from 9.2.0 and used transportable tablespaces moved from 9.2.0 to create this database instance.
Cause

The error in the error stack indicates that the issue is a block corruption failing with the code
18019. So it looks like we find a corrupt block when trying to apply undo to a L1 BMB block, and
in the trace file the block shows a corrupt header with a bad CHKVAL.

The following criteria are met to point to unpublished Bug 6075487 which is still in development status at this time.

– flashback database on

– ASSM tablespace showing the error

– plugged in tablespace.

Solution

To fix this issue:

1. Use the workaround of setting the flashback database off

alter database flashback off;

select flashback_on from v$database;

FLASHBACK_ON
——————
NO

2. Wait for the 10.2.0.4.0 patchset to become available where this will be fixed. There are no one-off backports available for this issue at this time.

(16) maclean
Wed, 25 May 2011 19:25:28 +0800

Hello David,

建议在观察后将DB_BLOCK_CHECKSUM调回原始状态，命令如下

alter system set DB_BLOCK_CHECKSUM=TRUE;

(15) david
Wed, 25 May 2011 19:21:16 +0800
hi ,maclean
按照你的步骤，现在数据库能运行正常了。
我想问的是，alter system set DB_BLOCK_CHECKSUM = OFF;
这个参数是否要改回原来的true

(14) david
Wed, 25 May 2011 19:19:28 +0800
hi ,maclean
按照你的方法，现在数据库正常，我想问下，
alter system set DB_BLOCK_CHECKSUM = OFF;
这个参数还要改回去吗，改回原来的true

(13) david
Wed, 25 May 2011 16:25:11 +0800
hi ,把应用起来后，日志里面报600的错误，如下：
Errors in file /oracle2/oracle/admin/orcl/udump/orcl_ora_21233.trc:
ORA-00600: internal error code, arguments: [kddummy_blkchk], [28], [2756885], [18019], [], [], [], []
Wed May 25 14:22:39 2011
Doing block recovery for file 28 block 2756885
Block recovery from logseq 45973, block 1346 to scn 1769075520
Wed May 25 14:22:39 2011
Recovery of Online Redo Log: Thread 1 Group 1 Seq 45973 Reading mem 0
Mem# 0: /oracle2/oracle/oradata/orcl/redo01.log
Block recovery completed at rba 45973.1516.16, scn 0.1769075521
Wed May 25 14:22:39 2011
Corrupt Block Found
TSN = 5, TSNAME = NARI_DATA
RFN = 28, BLK = 2756885, RDBA = 120197397
OBJN = 82567, OBJD = 52667, OBJECT = NR_AIN_SMP_18_R, SUBOBJECT =
SEGMENT OWNER = NARI, SEGMENT TYPE = Table Segment
Wed May 25 14:22:43 2011
Errors in file /oracle2/oracle/admin/orcl/udump/orcl_ora_21233.trc:
ORA-00600: internal error code, arguments: [kddummy_blkchk], [28], [2756885], [18019], [], [], [], []

启用应用之前我曾经执行了blockrecover datafile 28,block 2756885，

(12) maclean
Wed, 25 May 2011 16:11:40 +0800

DAVID 你好，

该ORA-00600[kddummy_blkchk][18019]可以尝试通过设置DB_BLOCK_CHECKSUM为OFF来绕过，具体方法如下:

alter system set DB_BLOCK_CHECKSUM = OFF;

将corrupt block所在数据表，如你这里的NR_AIN_SMP_18_R 执行move操作

alter table NARI.NR_AIN_SMP_18_R move;

之后重建该表上的所有索引

alter index XXX rebuild;

请将alert.log告警日志和trace中的内容贴全，以便诊断。
如果后续还有问题，请跟帖，谢谢。

(11) DAVID
Wed, 25 May 2011 15:55:44 +0800

ksedmp: internal or fatal error Current SQL statement for this session: ORA-00600: internal error code, arguments: [kddummy_blkchk], [28], [2756885], [18019], [], [], [], [] Current SQL statement for this session: INSERT INTO NR_AIN_SMP_18_R VALUES(1306307980 , -0.011, 0.229, 0.012, -0.011, 0.000, 0.000,1531.840, 0.197,348.497,1168.399, 13.783, 0.000, 0.000, 42.230,183.638, 0.139,368.674, 0.001, 5.808, 7.919, 8.117, 0.198, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000) —– Call Stack Trace —–

(10) maclean
Tue, 24 May 2011 20:25:01 +0800

Hi Tony, You can revoke the sys privileges ,but you can’n revoke object privileges ,which the object is owned by this schema.

SQL> create user maclean identified by maclean;

User created.

SQL> grant create session,create table to maclean;

Grant succeeded.

SQL> alter user maclean quota unlimited on users;

User altered.

SQL> conn maclean/maclean

Connected.

SQL> create table tv (t1 int);

Table created.

SQL> insert into tv values(1);

1 row created.

SQL> commit;

Commit complete.

SQL> conn / as sysdba Connected.

SQL> revoke create table from maclean; Revoke succeeded.

— As here,we can revoke a sys privileges — create table

SQL> revoke all on maclean.tv from maclean;

Revoke succeeded.

— As here, we try to revoke maclean owned object’s permission, but this statement won’t work!

SQL> conn maclean/maclean

Connected.

SQL> select * from maclean.tv;

———-

SQL> insert into maclean.tv values(1);

1 row created.

SQL> commit;

Commit complete.

SQL> alter table maclean.tv add t2 blob;

Table altered.

(9) CrazyTony
Tue, 24 May 2011 09:50:22 +0800

I want to create a user, and allow him to create a table. Once the table is
established, I want to revoke the system privilege of creating, modifying,
or dropping an object in his schema. Can I do that?
================
SQL> connect buck/passwd
Connected.

SQL> create user sam identified by uncle default tablespace users quota
unlimited on users;

User created.

SQL> grant create session to sam;

Grant succeeded.

SQL> grant create session to sam;

Grant succeeded.

SQL> grant create table to sam;

Grant succeeded.

SQL> connect sam/uncle
Connected.
SQL> create table states (state_name varchar2(2));

Table created.

SQL> insert into states (state_name) values (‘NY’);

1 row created.

SQL> commit;

Commit complete.

SQL> connect buck/passwd
Connected.
SQL> revoke create table from sam;

Revoke succeeded.

SQL> connect sam/uncle;
Connected.
SQL> alter table states add zip varchar2(5);

Table altered.

SQL>
SQL> create table countries (country_name varchar2(40));
create table countries (country_name varchar2(40))
*
ERROR at line 1:
ORA-01031: insufficient privileges

SQL>
SQL> select * from countries;
select * from countries
*
ERROR at line 1:
ORA-00942: table or view does not exist

(8) Wang Weihua
Fri, 8 April 2011 17:40:18 +0800

刚把你的网站放到了友情链接里~

(7) maclean
Wed, 30 March 2011 15:32:45 +0800

答网友提问关于oracle 11.2.0.1 RAC Service-Side TAF 问题

11.2.0.1中修复了10g release 2中部分Server-Side TAF的BUG，相对而言配置起来更容易了，我这里推荐使用srvctl命令来配置，如下例：

[oracle@rh2 admin]$ srvctl add service -h

Adds a service configuration to the Oracle Clusterware.

Usage: srvctl add service -d -s {-r “” [-a “”] [-P {BASIC | NONE | PRECONNECT}] | -g [-c {UNIFORM | SINGLETON}] } [-k ] [-l [PRIMARY][,PHYSICAL_STANDBY][,LOGICAL_STANDBY][,SNAPSHOT_STANDBY]] [-y {AUTOMATIC | MANUAL}] [-q {TRUE|FALSE}] [-x {TRUE|FALSE}] [-j {SHORT|LONG}] [-B {NONE|SERVICE_TIME|THROUGHPUT}] [-e {NONE|SESSION|SELECT}] [-m {NONE|BASIC}] [-z ] [-w ] [-t ] [-f]
-d Unique name for the database
-s Service name
-r “” Comma separated list of preferred instances
-a “” Comma separated list of available instances
-g Server pool name
-c {UNIFORM | SINGLETON} Service runs on every active server in the server pool hosting this service (UNIFORM) or just one server (SINGLETON)
-k network number (default number is 1)
-P {NONE | BASIC | PRECONNECT} TAF policy specification
-l Role of the service (primary, physical_standby, logical_standby, snapshot_standby)
-y Management policy for the service (AUTOMATIC or MANUAL)
-e Failover type (NONE, SESSION, or SELECT)
-m Failover method (NONE or BASIC)
-w Failover delay
-z Failover retries
-t Edition (or “” for empty edition value)
-j Connection Load Balancing Goal (SHORT or LONG). Default is LONG.
-B Runtime Load Balancing Goal (SERVICE_TIME, THROUGHPUT, or NONE)
-x Distributed Transaction Processing (TRUE or FALSE)
-q AQ HA notifications (TRUE or FALSE)
Usage: srvctl add service -d -s -u {-r “” | -a “”} [-f]
-d Unique name for the database
-s Service name
-u Add a new instance to service configuration
-r Name of new preferred instance
-a Name of new available instance
-f Force the add operation even though a listener is not configured for a network
-h Print usage

以上为srvctl add service的命令参数定义，如果我们这里用到PRECONNECT的TAF policy和SELECT级别的FAILOVER，那么可以如下方式添加:

srvctl add service -d PROD -s maclean_taf -r “PROD1,PROD2” -P PRECONNECT -e SELECT -m BASIC -w 5 -z 180

/* 以上添加了名为maclean_taf的TAF service，下面我们会启动并使用测试它 */

srvctl start service -d PROD -s maclean_taf

SQL> col name format a15
SQL> col failover_method format a11 heading ‘METHOD’
SQL> col failover_type format a10 heading ‘TYPE’
SQL> col failover_retries format 9999999 heading ‘RETRIES’
SQL> col goal format a10
SQL> col clb_goal format a8
SQL> col AQ_HA_NOTIFICATIONS format a5 heading ‘AQNOT’
SQL> select name, failover_method, failover_type, failover_retries,goal, clb_goal,aq_ha_notifications
2 from dba_services where name=’maclean_taf’;

NAME METHOD TYPE RETRIES GOAL CLB_GOAL AQNOT
————— ———– ———- ——– ———- ——– —–
maclean_taf BASIC SELECT 180 NONE LONG NO

/* 以上可以从dba_serviecs视图观察该service的属性 */

[oracle@rh2 admin]$ lsnrctl service

LSNRCTL for Linux: Version 11.2.0.2.0 – Production on 30-MAR-2011 15:28:58

Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
Services Summary…
Service “+ASM” has 1 instance(s).
Instance “+ASM1”, status READY, has 1 handler(s) for this service…
Handler(s):
“DEDICATED” established:0 refused:0 state:ready
LOCAL SERVER
Service “PROD” has 1 instance(s).
Instance “PROD1”, status READY, has 1 handler(s) for this service…
Handler(s):
“DEDICATED” established:0 refused:0 state:ready
LOCAL SERVER
Service “PRODXDB” has 1 instance(s).
Instance “PROD1”, status READY, has 1 handler(s) for this service…
Handler(s):
“D000” established:0 refused:0 current:0 max:1022 state:ready
DISPATCHER
(ADDRESS=(PROTOCOL=tcp)(HOST=rh2)(PORT=37714))
Service “maclean_taf” has 1 instance(s).
Instance “PROD1”, status READY, has 1 handler(s) for this service…
Handler(s):
“DEDICATED” established:0 refused:0 state:ready
LOCAL SERVER
The command completed successfully

/* 具体使用该TAF server-side service */

[oracle@rh2 admin]$ tnsping PROD_TAF

TNS Ping Utility for Linux: Version 11.2.0.2.0 – Production on 30-MAR-2011 15:29:31

Used parameter files:

Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = rh-cluster-scan)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = maclean_taf)))
OK (10 msec)

[oracle@rh2 admin]$ sqlplus system/password@PROD_TAF

SQL> select host_name,instance_name from v$instance;

HOST_NAME
—————————————————————-
INSTANCE_NAME
—————-
rh3.oracle.com
PROD2

/* 可以看到该会话的主会话链接到了PROD2实例上 */

/* 我们关闭PROD2实例 */

SQL> shutdown abort;
ORACLE instance shut down.

/* 再次在上述会话中查询可以看到链接透明切换到了PROD1上 */

SQL> select host_name,instance_name from v$instance;

HOST_NAME
—————————————————————-
INSTANCE_NAME
—————-
rh2.oracle.com
PROD1

(6) maclean
Wed, 30 March 2011 15:19:42 +0800

贴网友提问关于oracle 11.2.0.1 RAC Service-Side TAF 问题
ORACLE 10g RAC Service-Side TAF 配置方法两种：

第一种：用DBCA 配置Service
==========================
1）. 运行DBCA，选择ORACLE RAC Application Clusters database
2）. 在第二个界面选择：Services Management
3）. 第三个界面会出现RAC 数据库列表，用户可以在这个列表中选择要配置Service 的数据库
4）. 在Serice配置界面中，单击Add 创建新的Service，输入service名字。在Instance列表框定义实例角色，选择那个service1 作为 Preferred（首选实例），Service2 作为availiable(后备实例)。 TAF Policy有三个选项： None， Basic，Pre-connect。我们选Basic。最后点击Finish，完成Service 配置。

第二种：用srvctl 命令配置Service
=============================
1）创建service
#Srvctl add service -d -s -r “preferred-instance-list” -a “available-instance-list” -P
其中TAF-Policy可选：basic 和 preconnect。例如：
srvctl add service -d RAC -s Service2 -r “RAC1,RAC2” -a “RAC3,RAC4” -P basic

注意：在使用srvctl 创建service时，需要注意TAF策略选项必须通过dbms_service包来配置：

示例：
Begin
Dbms_service.modify_service(
Service_name=’>Service1′,
Failover_method=>dbms_service.failover_method_basic,
Failover_type=>dbms_service.failover_type_select,
Failover_retries=>180,
Failover_delay=>5
);
End;

总结：
======
注意：无论是使用dbca 工具还是使用srvctl 命令来配置service，都无法配置TAF的TYPE,DELAY,RETRIES 三个属性，必须使用dbms_service包来修改这些属性。

######################################################## oracle 11.2.0.1 RAC ##############################################################

问题1：没有在11.2的DBCA中找到配置Service的地方;

问题2: 11.2不能使用Dbms_service ; ORACLE BUG Bug 6886239

[oracle@bfb1 ~]$ srvctl add service -d bfb -s bfb_s -r bfb1,bfb2 -P basic
[oracle@bfb1 ~]$ srvctl enable service -d bfb -s bfb_s
PRCC-1010 : bfb_s was already enabled
[oracle@bfb1 ~]$ srvctl start service -d bfb -s bfb_s

修改数据字典：
begin
dbms_service.modify_service(
service_name=>’bfb_s’,
failover_method=>dbms_service.failover_method_basic,
failover_type=>dbms_service.failover_type_select,
failover_retries=>180,
failover_delay=>5
);
end;
/
===============================
set line 200;
col name for a20;
col FAILOVER_METHOD for a20;
col FAILOVER_TYPE for a15;

SQL> select name,failover_method, failover_type,FAILOVER_DELAY,FAILOVER_RETRIES from dba_services;
NAME FAILOVER_METHOD FAILOVER_TYPE FAILOVER_DELAY FAILOVER_RETRIES
——————– ——————– ————— ————– —————-
ltrac_s BASIC SELECT 5 180

RAC重启后：
===========
SQL> select name,failover_method, failover_type,FAILOVER_DELAY,FAILOVER_RETRIES from dba_services;
NAME FAILOVER_METHOD FAILOVER_TYPE FAILOVER_DELAY FAILOVER_RETRIES
——————– ——————– ————— ————– —————-
ltrac_s NONE NONE 0 0

注：RAC重启后修改数据字典的内容没有不见了。

matelink 文档：
How To Configure Server Side Transparent Application Failover [ID 460982.1]
5. Review the standard setup for the services

SQL>col name format a15
col failover_method format a11 heading ‘METHOD’
col failover_type format a10 heading ‘TYPE’
col failover_retries format 9999999 heading ‘RETRIES’
col goal format a10
col clb_goal format a8
col AQ_HA_NOTIFICATIONS format a5 heading ‘AQNOT’

SQL>select name, failover_method, failover_type, failover_retries,goal, clb_goal,aq_ha_notifications
from dba_services where service_id = 6

NAME METHOD TYPE RETRIES GOAL CLB_GOAL AQNOT
————— ———– ———- ——– ———- ——– —–
server_taf LONG NO
Please note there is no values for method, type or retries. These are required todo server side TAF.
Lack of values is due to Bug 6886239 DBMS_SERVICE parameters are not added using srvctl add service. This is fixed in release 11.2 onwards.

请问我该如何配置 Oracle 11.2.0.1 RAC 的 Service-Side TAF ？？

(5) maclean
Thu, 24 February 2011 18:52:11 +0800

Hello ConfusionBig,
1. Oracle said raw device is “fast but complex”, as I understand the “complex” means complex to manage, how many datafiles you have ,how many raw devices you need care. And it will be a little awkward when you are performing restore with raw devices(you have to recreate all raw devices)

2.files in filesystem is not a block device. A filesystem is responsible for organizing these storage sectors into files and directories, and keeping track of which sectors belong to which file and which are not being used. A filesystem may reside in a block device.
You can distinguish file and block device by the first symbol of ls command’s output, like:

[root@rh2 ~]# ls -l /dev/sda
brw-r—– 1 root disk 8, 0 Feb 18 22:39 /dev/sda -> “b” means block device

[root@rh2 ~]# ls -l /s01/enc.dbf
-rw-r—– 1 maclean oinstall 104865792 Feb 19 17:58 /s01/enc.dbf “-” means normal file

(4) ConfusionBig
Thu, 24 February 2011 12:30:55 +0800
ORACLE says raw device is “fast but complex”. How complex is it?
On linux, raw device is just binded to existing block devices(like lv) using raw utility. And in fact ,raw device is special block device, although it seems like character device by ls utility.

However,ORACLE says “Block or raw devices are not supported by Oracle Universal Installer (OUI) or Database Configuration Assistant (DBCA)”,I am sure DBCA can choose File System to locate database. Are files in FS not a block device?

Hope reply,thx lots.

(3) 贼车
Thu, 17 February 2011 18:17:52 +0800

先来占个位！
明天来提问~
这么好的oracle诊所居然没有人来咨询问题，怪哉！

(2) maclean
Sun, 5 December 2010 19:27:42 +0800

留言板回复测试

(1) maclean
Sat, 4 December 2010 23:52:21 +0800

测试留言板是否正常

Comments

Leave a Reply Cancel reply