How to troubleshoot ‘ASM does not discover disks’

Original link: http://www.dbaleet.org/how_to_troubleshoot_asm_does_not_discover_disks/

 

This scenario is far from new in a conventional Oracle database environment. There are several checks to work through if you come across this issue.

 

1. Check the ownership and permissions of the ASM candidate disks. They should be owned by the RDBMS software owner, e.g. oracle:dba, and their permissions should be 660. In most cases you can check this with the “ls -ltr” command.
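
When there are many candidate disks, the ownership and mode check in step 1 can be scripted. The following is a minimal sketch, assuming GNU stat on Linux. The function name check_asm_disk_perms and the /dev/asm/* glob in the usage comment are illustrative and do not come from any Oracle tool.

```shell
# Hypothetical helper: report whether each candidate disk has the expected
# owner:group and mode. Requires GNU stat (Linux).
check_asm_disk_perms() {
    local want_owner="$1"
    local want_mode="$2"
    shift 2
    local dev actual rc=0
    for dev in "$@"; do
        # -L follows symlinks, since udev/multipath names are often links
        if ! actual=$(stat -L -c '%U:%G %a' "$dev" 2>/dev/null); then
            echo "MISSING $dev"
            rc=1
        elif [ "$actual" = "$want_owner $want_mode" ]; then
            echo "OK      $dev ($actual)"
        else
            echo "BAD     $dev ($actual, expected $want_owner $want_mode)"
            rc=1
        fi
    done
    return $rc
}

# Example (owner and device glob are assumptions):
# check_asm_disk_perms oracle:dba 660 /dev/asm/*
```

The function returns non-zero if any device is missing or does not match, so it can be used in a pre-install check script.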

 

2. If both ownership and permissions are correct, try reading the disk manually with the OS command “dd” as user “oracle”. For example, if the name of the LUN to be used by ASM is “/dev/asm/ocr1”, you can read the disk with:

#su - oracle
$dd if=/dev/asm/ocr1 of=/dev/null bs=8192 count=10

If the command prints something like “10+0 records in” and “10+0 records out”, the disk itself is most likely not the problem.
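
To repeat the dd read test over several devices, a small loop helps. This is only a sketch: read_test_disks is a hypothetical helper name, and it should be run as the user that owns the ASM software so that the result reflects that user's access.

```shell
# Hypothetical helper: try to read the first 80 KB (10 x 8192 bytes) of each
# device with dd, mirroring the manual check. Run as the ASM software owner.
read_test_disks() {
    local dev rc=0
    for dev in "$@"; do
        if dd if="$dev" of=/dev/null bs=8192 count=10 2>/dev/null; then
            echo "READ OK   $dev"
        else
            echo "READ FAIL $dev"
            rc=1
        fi
    done
    return $rc
}

# Example (device glob is an assumption):
# read_test_disks /dev/asm/*
```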

 

3. If you are using a multipathing technology, do not forget to check the certification information before making a plan. Certification is well documented in the MOS note Oracle ASM and Multi-Pathing Technologies [ID 294869.1]. Be aware that IBM VPath is not supported with ASM; use the alternative solution, MPIO, instead.

 

4. If you are using ASMLib, first make sure that it has been properly configured. ASMLib depends on specific Linux kernel versions, and a mismatch between ASMLib and the Linux kernel will cause the ASMLib installation to fail. Second, try running:

/etc/init.d/oracleasm scandisks
/etc/init.d/oracleasm listdisks

If no disks are found, do not rush into building the ASM instance and ASM disk group. Investigating the underlying reason first will save you time.
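
Because ASMLib depends on a kernel driver that matches the running kernel, one quick sanity check is whether that driver is loaded at all. A minimal sketch, assuming the driver shows up under the name oracleasm in /proc/modules. The function name and its optional file argument (handy for testing) are illustrative.

```shell
# Hypothetical helper: succeed if the oracleasm kernel module is loaded.
# Reads /proc/modules by default; another file can be passed for testing.
asmlib_module_loaded() {
    local modules_file="${1:-/proc/modules}"
    # Each line of /proc/modules starts with the module name and a space
    grep -q '^oracleasm ' "$modules_file" 2>/dev/null
}

# Example:
# uname -r    # running kernel, to compare with the installed ASMLib driver
# asmlib_module_loaded && echo "oracleasm loaded" || echo "oracleasm NOT loaded"
```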

 

5. Make sure that your asm_diskstring parameter is set properly. ASM only discovers devices under the paths that asm_diskstring specifies.
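
To see which OS paths a discovery string would cover, its patterns can be expanded with shell globbing. This only approximates how ASM interprets asm_diskstring (a comma-separated list of wildcard patterns). expand_diskstring is a hypothetical helper, and the paths in the usage comment are assumptions.

```shell
# Hypothetical helper: expand a comma-separated discovery string with shell
# globbing and print every existing path that matches.
# Limitation: paths containing whitespace are not handled.
expand_diskstring() {
    local old_ifs="$IFS"
    local pattern path
    IFS=,
    set -f              # keep patterns unexpanded while splitting on commas
    set -- $1
    set +f
    IFS="$old_ifs"
    for pattern in "$@"; do
        for path in $pattern; do    # unquoted on purpose: glob expansion
            if [ -e "$path" ]; then
                echo "$path"
            fi
        done
    done
    return 0
}

# Example (patterns are assumptions):
# expand_diskstring '/dev/asm/*,/dev/mapper/asm*'
```

If the output is empty, or does not include the devices you expect, the discovery string is a likely culprit.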

 

6.  Last but not least, kfod is a friend you can count on.

 

$export LD_LIBRARY_PATH=/tmp/OraInstall2013-09-12_06-25-45PM/ext/lib
$cd /tmp/OraInstall2013-09-12_06-25-45PM/ext/bin
$./kfod op=disks disks=all

If the above command returns nothing, try tracing the process:

$strace -f ./kfod op=disks disks=all

Then investigate further using the output. It should contain the details needed to diagnose the issue.
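
strace output can be long. One way to narrow it down is to save the trace to a file with strace's -o option and pull out the system calls that failed with a permission error. This is a sketch; the function name and the trace file path are assumptions.

```shell
# Hypothetical helper: print trace lines for system calls that failed with
# EACCES or EPERM in a saved strace output file.
permission_failures() {
    grep -E '= -1 E(ACCES|PERM) ' "$1"
}

# Example (trace file path is an assumption):
# strace -f -o /tmp/kfod.trc ./kfod op=disks disks=all
# permission_failures /tmp/kfod.trc
```

Widening the pattern to include ENOENT can also be useful, but expect noise from ordinary library search paths.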

 

ON EXADATA:

Normally you can skip the first five steps above. Running the sixth step alone should be enough if your griddisks are all online.

On some rare occasions, a few trace environment variables need to be set before tracing kfod:

$export CELLCLIENT_TRACE_LEVEL="all,4"
$export CELLCLIENT_AUTOFLUSH_LEVEL="all,4"
$export CELLCLIENT_TRACE_INFO="autoflush_sync,on"
$cd /tmp/OraInstall2013-09-12_06-25-45PM/ext/bin
$./kfod op=disks disks=all
$strace -f ./kfod op=disks disks=all

 

We recently found that the ASM instance and cellsrv should not run on the same node. Otherwise, the ASM instance will not find any disks if cellsrv on that node is already up.

It seems the ASM instance searches for a library called “libcell11.so”. If there is a cell version of this file and cellsrv is up, the ASM instance stops discovering the griddisks.
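
To check whether more than one copy of the library is present on a node, a simple search can help. This is a rough sketch; find_libcell is a hypothetical helper, and the directories in the usage comment are assumptions that may differ on your system.

```shell
# Hypothetical helper: list libcell*.so files under the given directories,
# skipping directories that do not exist.
find_libcell() {
    local dir
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        find "$dir" -name 'libcell*.so' 2>/dev/null
    done
}

# Example (directories are assumptions):
# find_libcell "$ORACLE_HOME/lib" /opt/oracle/cell
```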

 

 

Juan Mosqueda contributed the Exadata part of this article.

Thank you, Juan.
