[Exadata Database Machine] Exadata Cell Monitoring Best Practices

  1. Verify cable connections via the following steps

Visually inspect all cables for proper connectivity.

 

Confirm that the cable links are up:

 

 

 

[root@dm01db01 ~]# cat /sys/class/net/ib0/carrier

1

[root@dm01db01 ~]# cat /sys/class/net/ib1/carrier

1

 

Confirm that the output is 1 for each interface.
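
The carrier check above can be scripted so that both ports are tested in one pass. This is a minimal sketch, not part of the Oracle tooling: the function name check_ib_carrier and its optional sysfs directory argument are assumptions introduced here for illustration.

```shell
#!/bin/sh
# check_ib_carrier [SYSFS_NET_DIR] [IFACE...]
# Hypothetical helper: prints the link state of each interface and
# returns non-zero if any carrier file does not contain 1.
# Defaults: /sys/class/net and the interfaces ib0 and ib1.
check_ib_carrier() {
    net_dir="${1:-/sys/class/net}"
    if [ "$#" -gt 0 ]; then shift; fi
    if [ "$#" -eq 0 ]; then set -- ib0 ib1; fi
    rc=0
    for iface in "$@"; do
        carrier_file="$net_dir/$iface/carrier"
        # Reading carrier can fail on an interface that is down, so
        # errors are discarded and treated as "not up".
        if [ -r "$carrier_file" ] && [ "$(cat "$carrier_file" 2>/dev/null)" = "1" ]; then
            echo "$iface: link up"
        else
            echo "$iface: link down or not readable"
            rc=1
        fi
    done
    return "$rc"
}

# Usage on a database node (assumed): check_ib_carrier
```

A non-zero return value makes the function easy to call from a cron job or a monitoring agent.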

 

 

Check the InfiniBand port error counters with the following command:

ls -l /sys/class/infiniband/*/ports/*/*errors*
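
Listing the counter files shows only that they exist, so it can help to print the ones with a non-zero value. This is a rough sketch that reuses the glob from the command above. The function name report_ib_errors and the optional directory argument are hypothetical, and the exact location of the counter files may differ between driver versions.

```shell
#!/bin/sh
# report_ib_errors [SYSFS_IB_DIR]
# Hypothetical helper: for every file matched by the glob
# <dir>/*/ports/*/*errors*, prints "<file> <value>" when the content is
# not exactly 0. Returns 1 if anything was printed, 0 otherwise.
report_ib_errors() {
    ib_dir="${1:-/sys/class/infiniband}"
    found=0
    for f in "$ib_dir"/*/ports/*/*errors*; do
        [ -f "$f" ] || continue    # skips the literal pattern when nothing matches
        value=$(cat "$f" 2>/dev/null) || continue
        if [ "$value" != "0" ]; then
            echo "$f $value"
            found=1
        fi
    done
    return "$found"
}

# Usage (assumed): report_ib_errors || echo "non-zero error counters found"
```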

 

 

The /opt/oracle.SupportTools/ibdiagtools directory contains the verify-topology and infinicheck tools. Run them to verify the network. Information about these tools follows:

 

[root@dm01db01 ~]# cd /opt/oracle.SupportTools/

[root@dm01db01 oracle.SupportTools]# ls

asrexacheck         defaultOSchoose.pl  firstconf                        make_cellboot_usb  PS4ES            sys_dirs.tar

CheckHWnFWProfile   diagnostics.iso     flush_cache.sh                   MegaSAS.log        reclaimdisks.sh

CheckSWProfile.sh   em                  harden_passwords_reset_root_ssh  ocrvothostd        setup_ssh_eq.sh

dbserver_backup.sh  exachk              ibdiagtools                      onecommand         sundiag.sh

 

 

[root@dm01db01 oracle.SupportTools]# cd ibdiagtools/

[root@dm01db01 ibdiagtools]# ls

cells_conntest.log    dcli                  ibqueryerrors.log  perf_cells.log0  perf_mesh.log1     subnet_cells.log  VERSION_FILE

cells_user_equiv.log  diagnostics.output    infinicheck        perf_cells.log1  perf_mesh.log2     subnet_hosts.log  xmonib.sh

checkbadlinks.pl      hosts_conntest.log    monitord           perf_cells.log2  README             topologies

cleanup_remote.log    hosts_user_equiv.log  netcheck           perf_hosts.log0  SampleOutputs.txt  topology-zfs

clearcounters.log     ibping_test           netcheck_scratch   perf_mesh.log0   setup-ssh          verify-topology

 

 

 

[root@dm01db01 ibdiagtools]# ./verify-topology -h

 

[ DB Machine Infiniband Cabling Topology Verification Tool ]

[Version IBD VER 2.c 11.2.3.1.1  120607]

Usage: ./verify-topology [-v|--verbose] [-r|--reuse (cached maps)]  [-m|--mapfile]

[-ibn|--ibnetdiscover (specify location of ibnetdiscover output)]

[-ibh|--ibhosts (specify location of ibhosts output)]

[-ibs|--ibswitches (specify location of ibswitches output)]

[-t|--topology [torus | quarterrack ] default is fattree]

[-a|--additional [interconnected_quarterrack]

[-factory|--factory non-exadata machines are treated as error]

 

Please note that halfrack is now redundant. Checks for Half Racks

are now done by default.

-t quarterrack

option is needed to be used only if testing on a stand alone quarterrack

-a interconnected_quarterrack

option is to be used only when testing on large multi-rack setups

-t fattree

option is the default option and not required to be specified

 

Example : perl ./verify-topology

Example : ././verify-topology -t quarterrack

Example : ././verify-topology -t torus

Example : ././verify-topology -a interconnected_quarterrack

——— Some Important properties of the fattree cabling topology————–

(1) Every internal switch must be connected to every external switch

(2) No 2 external switches must be connected to each other

——————————————————————————-

Please note that switch guid can be determined by logging in to a switch and

trying either of these commands, depending on availability -

>module-firmware show

OR

>opensm

 

 

 

[root@dm01db01 ibdiagtools]# ./verify-topology -t fattree

 

[ DB Machine Infiniband Cabling Topology Verification Tool ]

[Version IBD VER 2.c 11.2.3.1.1  120607]

External non-Exadata-image nodes found: check for ZFS if on T4-4 - else ignore

Leaf switch found: dmibsw03.acs.oracle.com (212846902ba0a0)

Spine switch found: 10.146.24.251 (2128469c74a0a0)

Leaf switch found: dmibsw02.acs.oracle.com (21284692d4a0a0)

Spine switch found: 10.146.24.252 (2128b7f744c0a0)

Spine switch found: dmibsw01.acs.oracle.com (21286cc7e2a0a0)

Spine switch found: 10.146.24.253 (2128b7ac44c0a0)

 

Found 2 leaf, 4 spine, 0 top spine switches

 

Check if all hosts have 2 CAs to different switches……………[SUCCESS]

Leaf switch check: cardinality and even distribution…………..[SUCCESS]

Spine switch check: Are any Exadata nodes connected …………..[SUCCESS]

Spine switch check: Any inter spine switch links………………[ERROR]

Spine switches 10.146.24.251 (2128469c74a0a0) & 10.146.24.252 (2128b7f744c0a0) should not be connected

[ERROR]

Spine switches 10.146.24.251 (2128469c74a0a0) & 10.146.24.253 (2128b7ac44c0a0) should not be connected

[ERROR]

Spine switches 10.146.24.252 (2128b7f744c0a0) & dmibsw01.acs.oracle.com (21286cc7e2a0a0) should not be connected

[ERROR]

Spine switches 10.146.24.252 (2128b7f744c0a0) & 10.146.24.253 (2128b7ac44c0a0) should not be connected

[ERROR]

Spine switches dmibsw01.acs.oracle.com (21286cc7e2a0a0) & 10.146.24.253 (2128b7ac44c0a0) should not be connected

 

Spine switch check: Any inter top-spine switch links…………..[SUCCESS]

Spine switch check: Correct number of spine-leaf links…………[ERROR]

Leaf switch dmibsw03.acs.oracle.com (212846902ba0a0) must be linked

to spine switch 10.146.24.252 (2128b7f744c0a0) with

at least 1 links…0 link(s) found

[ERROR]

Leaf switch dmibsw02.acs.oracle.com (21284692d4a0a0) must be linked

to spine switch 10.146.24.252 (2128b7f744c0a0) with

at least 1 links…0 link(s) found

[ERROR]

Spine switch 10.146.24.252 (2128b7f744c0a0) has fewer than 2 links to leaf switches.

It has 0

[ERROR]

Leaf switch dmibsw03.acs.oracle.com (212846902ba0a0) must be linked

to spine switch 10.146.24.253 (2128b7ac44c0a0) with

at least 1 links…0 link(s) found

[ERROR]

Leaf switch dmibsw02.acs.oracle.com (21284692d4a0a0) must be linked

to spine switch 10.146.24.253 (2128b7ac44c0a0) with

at least 1 links…0 link(s) found

[ERROR]

Spine switch 10.146.24.253 (2128b7ac44c0a0) has fewer than 2 links to leaf switches.

It has 0

 

Leaf switch check: Inter-leaf link check……………………..[ERROR]

Leaf switches dmibsw03.acs.oracle.com (212846902ba0a0) & dmibsw02.acs.oracle.com (21284692d4a0a0) have 0 links between them

They should have 7 links instead.

 

Leaf switch check: Correct number of leaf-spine links………….[SUCCESS]
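
A long report like the one above is easier to act on when the result markers are tallied. This is a minimal sketch, assuming only that the tool marks results with [SUCCESS] and [ERROR] as in the output shown. The function name summarize_topology_checks is hypothetical.

```shell
#!/bin/sh
# summarize_topology_checks
# Hypothetical helper: reads verify-topology output on stdin, counts the
# lines that contain [SUCCESS] and the lines that contain [ERROR], prints
# both counts, and returns 1 if at least one [ERROR] line was seen.
summarize_topology_checks() {
    awk '
        /\[SUCCESS\]/ { ok++ }
        /\[ERROR\]/   { err++ }
        END {
            printf "success=%d error=%d\n", ok, err
            exit (err > 0)
        }
    '
}

# Usage (assumed): ./verify-topology -t fattree | summarize_topology_checks
```

The count is per line, so a check with several [ERROR] detail lines contributes more than once to the error total.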

 

 

 

 

Verify the hardware and firmware:

 

cd /opt/oracle.cellos/

[root@dm01db01 oracle.cellos]# ./CheckHWnFWProfile

 

[SUCCESS] The hardware and firmware profile matches one of the supported profiles

 

 

Verify the platform software:

 

 

 

 

 

[root@dm01db01 oracle.cellos]# cd /opt/oracle.SupportTools/

[root@dm01db01 oracle.SupportTools]# ./CheckSWProfile.sh

usage: ./CheckSWProfile.sh options

 

This script returns 0 when the platform and software on the

machine on which it runs matches one of the suppored platform and

software profiles. It will return nonzero value in all other cases.

The check is applicable both to Exadata Cells and Database Nodes

with Oracle Enterprise Linux (OEL) and RedHat Enterprise Linux (RHEL).

 

OPTIONS:

-h    Show this message

-s    Show supported platforms and software profiles for this machine

-c    Check this machine for supported platform and software profiles

-I <No space comma separated list of Infiniband switch names/ip addresses>

To check configuration for SPINE switch prefix the switch host name or

ip address with IS_SPINE.

Example: CheckSWProfile.sh -I IS_SPINEswitch1.company.com,switch2.company.com

Check for the software revision on the managed Infiniband switches

in the Database Machine. You will need to supply the password for

admin user.

-S <No space comma separated list of Infiniband switch names/ip addresses>

Example: CheckSWProfile.sh -S switch1.company.com,switch2.company.com

Prints the Serial number and Hardware version for the switches

in the Database Machine. You will need to supply the password for

admin user for Voltaire switches and root user for Sun switches.

 

 

[root@dm01db01 oracle.SupportTools]# ./CheckSWProfile.sh  -c

[INFO] Software checker check option is only available on Exadata cells.

 

[root@dm01db01 oracle.SupportTools]# ssh dm01cel01-priv

 

[root@dm01cel01 oracle.SupportTools]# ./CheckSWProfile.sh -c

 

[INFO] SUCCESS: Meets requirements of operating platform and InfiniBand software.

[INFO] Check does NOT verify correctness of configuration for installed software.
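
The usage text states that the script returns 0 when the machine matches a supported profile, so the check can be driven by its exit status rather than by reading the messages. This is a sketch under that assumption, intended for a cell because the -c option is reported as cell-only above. The wrapper name run_sw_profile_check and its optional script argument are hypothetical.

```shell
#!/bin/sh
# run_sw_profile_check [SCRIPT]
# Hypothetical wrapper: runs the checker with -c and reports the result
# from its exit status (0 = supported profile, per the usage text).
# On failure it returns the checker's own exit status.
run_sw_profile_check() {
    script="${1:-/opt/oracle.SupportTools/CheckSWProfile.sh}"
    if "$script" -c; then
        echo "software profile: supported"
    else
        status=$?
        echo "software profile: check failed (exit status $status)"
        return "$status"
    fi
}

# Usage on a cell (assumed): run_sw_profile_check
```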

 

 

[root@dm01cel01 oracle.SupportTools]# cd /opt/oracle.cellos/

[root@dm01cel01 oracle.cellos]# ./CheckHWnFWProfile

[SUCCESS] The hardware and firmware profile matches one of the supported profiles

 

 

 

If hardware is replaced, rerun the /opt/oracle.cellos/CheckHWnFWProfile script.

