Tips for Diagnosing SCSI Disk Errors on Linux

诗檀软件 (ParnassusData) Professional Database Recovery Team

When a SCSI disk error occurs, collect at least the following data:

/var/log/messages*

/sbin/fdisk -l
/bin/cat /proc/scsi/scsi
/bin/dmesg
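
Scripting the collection step keeps anything from being missed when the problem comes back. What follows is a minimal Python sketch (Python 3.7+) that assumes the command paths listed above. Adjust them if your distribution installs these tools elsewhere. Running fdisk -l and reading the log files normally requires root. The helper names `run_to_file` and `collect` are hypothetical and not part of any existing tool.

```python
import glob
import shutil
import subprocess
from pathlib import Path

# Output file name -> command, taken from the list above (paths may differ per distribution).
COMMANDS = {
    "fdisk-l.txt": ["/sbin/fdisk", "-l"],
    "proc-scsi-scsi.txt": ["/bin/cat", "/proc/scsi/scsi"],
    "dmesg.txt": ["/bin/dmesg"],
}


def run_to_file(cmd, dest):
    """Run cmd, write its stdout and stderr to dest, and report success without raising."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True,
                              errors="replace", timeout=60)
        dest.write_text(proc.stdout + proc.stderr)
        return proc.returncode == 0
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Missing binary, permission problem, or a hung command.
        dest.write_text(f"failed to run {cmd}: {exc}\n")
        return False


def collect(outdir, commands=COMMANDS, log_glob="/var/log/messages*"):
    """Save command output plus copies of the rotated syslog files into outdir."""
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    status = {name: run_to_file(cmd, out / name) for name, cmd in commands.items()}
    for path in sorted(glob.glob(log_glob)):
        name = Path(path).name
        try:
            shutil.copy2(path, out / name)  # also preserves the file's timestamps
            status[name] = True
        except OSError:
            status[name] = False
    return status
```

For example, run `collect("/tmp/scsi-diag")` as root. The returned dictionary shows which items were saved successfully, so a missing command or an unreadable log file shows up in the result instead of being silently skipped.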

 

Once you have collected this data, search these logs for entries related to SCSI errors.

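If the errors always occur on the same device, consider replacing that device. If they occur on different devices on the same bus, more diagnosis is needed. In that case the fault may lie in something those devices share, such as the bus, its cabling or termination, or the host adapter.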
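
This rule of thumb can be applied mechanically by counting error lines per device address. Below is a minimal Python sketch. It assumes the 2.4-kernel message format used in the examples in this article (`SCSI disk error : host H channel C id I lun L ...`). Newer kernels prefix messages differently (for example `sd 0:0:1:0:`), so the regular expression would need adapting. The function names are hypothetical.

```python
import re
from collections import Counter

# 2.4-era format, e.g.
# "SCSI disk error : host 0 channel 0 id 1 lun 0 return code = 27010000"
DISK_ERROR = re.compile(r"SCSI disk error : host (\d+) channel (\d+) id (\d+) lun (\d+)")


def errors_by_device(lines):
    """Count 'SCSI disk error' lines per (host, channel, id, lun) address."""
    counts = Counter()
    for line in lines:
        m = DISK_ERROR.search(line)
        if m:
            counts[tuple(int(g) for g in m.groups())] += 1
    return counts


def triage(counts):
    """Turn per-device counts into the article's advice."""
    if not counts:
        return "no SCSI disk errors found"
    if len(counts) == 1:
        (host, channel, target, lun), n = next(iter(counts.items()))
        return (f"{n} errors, all on host {host} channel {channel} id {target} "
                f"lun {lun}: consider replacing this device")
    return f"errors on {len(counts)} different devices: diagnose the bus further"
```

For example, `triage(errors_by_device(open("/var/log/messages", errors="replace")))` on a log like the first example below would report that every error is on host 0 channel 0 id 1 lun 0.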

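In the messages below, host, channel, id and lun give the device's address: the host adapter number, the channel (bus) on that adapter, the device's target ID on the bus, and the logical unit number. The id therefore identifies which device reported the error.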

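In the example below, the errors are on host 0, channel 0, id 1, lun 0, and every error line points to the same id. The errors are also reported against different sectors of that disk (sector 0 and sector 13631520).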

 

 

/var/log/messages

Dec 20 10:33:23 localhost kernel: SCSI disk error : host 0 channel 0 id 1 lun 0 return code = 27010000
Dec 20 10:33:23 localhost kernel: I/O error: dev 08:03, sector 0
Dec 20 10:33:23 localhost kernel: SCSI disk error : host 0 channel 0 id 1 lun 0 return code = 27010000
Dec 20 10:33:23 localhost kernel: I/O error: dev 08:03, sector 13631520
Dec 20 10:33:23 localhost kernel: EXT3-fs error (device sd(8,3)): ext3_get_inode_loc: unable to read inode block - inode=852064, block=1703940
Dec 20 10:33:23 localhost kernel: SCSI disk error : host 0 channel 0 id 1 lun 0 return code = 27010000
Dec 20 10:33:23 localhost kernel: I/O error: dev 08:03, sector 0
Dec 20 10:33:23 localhost kernel: EXT3-fs error (device sd(8,3)) in ext3_reserve_inode_write: IO failure
Dec 20 10:33:24 localhost kernel: SCSI disk error : host 0 channel 0 id 1 lun 0 return code = 27010000
Dec 20 10:33:24 localhost kernel: I/O error: dev 08:03, sector 0
Dec 20 10:33:24 localhost kernel: SCSI disk error : host 0 channel 0 id 1 lun 0 return code = 27010000
Dec 20 10:33:24 localhost kernel: I/O error: dev 08:03, sector 13631520
Dec 20 10:33:24 localhost kernel: EXT3-fs error (device sd(8,3)): ext3_get_inode_loc: unable to read inode block - inode=852064, block=1703940
Dec 20 10:33:24 localhost kernel: SCSI disk error : host 0 channel 0 id 1 lun 0 return code = 27010000
Dec 20 10:33:24 localhost kernel: I/O error: dev 08:03, sector 0
Dec 20 10:33:24 localhost kernel: EXT3-fs error (device sd(8,3)) in ext3_reserve_inode_write: IO failure
Dec 20 10:33:24 localhost kernel: SCSI disk error : host 0 channel 0 id 1 lun 0 return code = 27010000
Dec 20 10:33:24 localhost kernel: I/O error: dev 08:03, sector 0
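
Two numeric fields in these lines are worth decoding. The `return code` is the 32-bit SCSI result word, printed in hexadecimal. Linux packs it as the driver byte (bits 31-24), host byte (23-16), message byte (15-8) and status byte (7-0). The `dev 08:03` pair is the major:minor device number. This sketch assumes that, as in 2.4 kernels, the pair is printed in hexadecimal. It also assumes that major 8 covers the first 16 SCSI disks, with 16 minors per disk. The constant names match those in the Linux kernel's SCSI headers of that era. The tables list only common values, and the function names are hypothetical.

```python
# Common values only, named as in the Linux SCSI headers of the 2.4/2.6 era.
HOST_BYTE = {
    0x00: "DID_OK", 0x01: "DID_NO_CONNECT", 0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT", 0x04: "DID_BAD_TARGET", 0x05: "DID_ABORT",
    0x06: "DID_PARITY", 0x07: "DID_ERROR", 0x08: "DID_RESET",
}
DRIVER_CODE = {  # low nibble of the driver byte
    0x0: "DRIVER_OK", 0x1: "DRIVER_BUSY", 0x2: "DRIVER_SOFT",
    0x3: "DRIVER_MEDIA", 0x4: "DRIVER_ERROR", 0x5: "DRIVER_INVALID",
    0x6: "DRIVER_TIMEOUT", 0x7: "DRIVER_HARD", 0x8: "DRIVER_SENSE",
}
SUGGEST = {  # high nibble of the driver byte
    0x0: None, 0x1: "SUGGEST_RETRY", 0x2: "SUGGEST_ABORT",
    0x3: "SUGGEST_REMAP", 0x4: "SUGGEST_DIE", 0x8: "SUGGEST_SENSE",
}


def decode_return_code(hex_text):
    """Split a 'return code = ...' value (hexadecimal) into its four bytes."""
    value = int(hex_text, 16)
    driver, host = (value >> 24) & 0xFF, (value >> 16) & 0xFF
    return {
        "driver": DRIVER_CODE.get(driver & 0x0F, hex(driver & 0x0F)),
        "suggest": SUGGEST.get(driver >> 4, hex(driver >> 4)),
        "host": HOST_BYTE.get(host, hex(host)),
        "msg": (value >> 8) & 0xFF,
        "status": value & 0xFF,  # raw status byte
    }


def kdev_to_sd(dev_text):
    """Map a hexadecimal 'MM:mm' device number with major 8 to an sdX[N] name."""
    major, minor = (int(part, 16) for part in dev_text.split(":"))
    if major != 8:
        raise ValueError("this sketch only handles SCSI disk major 8 (sda to sdp)")
    disk, partition = divmod(minor, 16)
    return "sd" + "abcdefghijklmnop"[disk] + (str(partition) if partition else "")
```

With these tables, 27010000 decodes to host byte DID_NO_CONNECT, with DRIVER_HARD and SUGGEST_ABORT. The adapter could not connect to the target at all, which fits I/O failing at unrelated sectors such as 0 and 13631520. `08:03` decodes to sda3, matching `sd(8,3)` in the EXT3 messages. `08:17` in the next example decodes to sdb7.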

 

 

Another example:

 

 

Sep 21 23:35:41 localhost kernel: klogd 1.4.1, log source = /proc/kmsg started.
Sep 21 23:35:41 localhost kernel: Inspecting /boot/System.map-2.4.18-17.7.x.4smp
Sep 21 23:35:41 localhost kernel: Loaded 17857 symbols from /boot/System.map-2.4.18-17.7.x.4smp.
Sep 21 23:35:41 localhost kernel: Symbols match kernel version 2.4.18.
Sep 21 23:35:41 localhost kernel: Loaded 256 symbols from 11 modules.
Sep 21 23:35:41 localhost kernel: SCSI disk error : host 0 channel 0 id 1 lun 0 return code = 27010000
Sep 21 23:35:41 localhost kernel: I/O error: dev 08:17, sector 66453508
Sep 21 23:35:41 localhost kernel: SCSI disk error : host 0 channel 0 id 1 lun 0 return code = 27010000
:
:
Sep 21 23:35:49 localhost kernel: scsi : aborting command due to timeout : pid 43891492, scsi0, channel 0, id 1, lun 0 Write (10) 00 00 4b ae 5b 00 00 02 00
Sep 21 23:35:49 localhost kernel: mptscsih: OldAbort scheduling ABORT SCSI IO (sc=c2db7200)
Sep 21 23:35:49 localhost kernel: IOs outstanding = 5
Sep 21 23:35:49 localhost kernel: scsi : aborting command due to timeout : pid 43891493, scsi0, channel 0, id 1, lun 0 Write (10) 00 00 43 2e 5d 00 00 02 00
:
:
Sep 21 23:35:49 localhost kernel: mptscsih: ioc0: Issue of TaskMgmt Successful!
Sep 21 23:35:49 localhost kernel: SCSI host 0 abort (pid 43891492) timed out - resetting
Sep 21 23:35:49 localhost kernel: SCSI bus is being reset for host 0 channel 0.
Sep 21 23:35:50 localhost kernel: mptscsih: OldReset scheduling BUS_RESET (sc=c2db7200)
Sep 21 23:35:50 localhost kernel: IOs outstanding = 6
Sep 21 23:35:50 localhost kernel: SCSI host 0 abort (pid 43891493) timed out - resetting
:
:
Sep 21 23:35:51 localhost kernel: SCSI host 0 reset (pid 43891492) timed out again -
Sep 21 23:35:51 localhost kernel: probably an unrecoverable SCSI bus or device hang.
Sep 21 23:35:51 localhost kernel: SCSI host 0 reset (pid 43891493) timed out again -
Sep 21 23:35:51 localhost kernel: SCSI Error Report =-=-= (0:0:0)
Sep 21 23:35:51 localhost kernel: SCSI_Status=02h (CHECK CONDITION)
Sep 21 23:35:51 localhost kernel: Original_CDB[]: 28 00 02 B1 4E 62 00 00 04 00
Sep 21 23:35:51 localhost kernel: SenseData[12h]: 70 00 06 00 00 00 00 0A 00 00 00 00 29 02 02 00 00 00
Sep 21 23:35:51 localhost kernel: SenseKey=6h (UNIT ATTENTION); FRU=02h
Sep 21 23:35:51 localhost kernel: ASC/ASCQ=29h/02h "SCSI BUS RESET OCCURRED"
Sep 21 23:35:51 localhost kernel: SCSI Error Report =-=-= (0:1:0)
Sep 21 23:35:51 localhost kernel: SCSI_Status=02h (CHECK CONDITION)
Sep 21 23:35:51 localhost kernel: Original_CDB[]: 2A 00 00 45 EE 5F 00 00 02 00
Sep 21 23:35:51 localhost kernel: SenseData[12h]: 70 00 06 00 00 00 00 0A 00 00 00 00 29 02 02 00 00 00
Sep 21 23:35:51 localhost kernel: SenseKey=6h (UNIT ATTENTION); FRU=02h
Sep 21 23:35:51 localhost kernel: ASC/ASCQ=29h/02h "SCSI BUS RESET OCCURRED"
Sep 21 23:35:51 localhost kernel: md3: no spare disk to reconstruct array! -- continuing in degraded mode
Sep 21 23:35:51 localhost kernel: md: updating md2 RAID superblock on device
Sep 21 23:35:52 localhost kernel: md: (skipping faulty sdb5 )
Sep 21 23:35:52 localhost kernel: md: sda5 [events: 00000012](write) sda5's sb offset: 4192832
Sep 21 23:35:52 localhost kernel: raid1: sda7: redirecting sector 30736424 to another mirror
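
The `SenseData[12h]` lines carry the raw 18-byte fixed-format sense data, which the driver then summarises as SenseKey and ASC/ASCQ. In fixed-format sense data (defined in the SCSI Primary Commands standard, SPC), the bytes are laid out as follows:

- Byte 0 holds the response code: 70h for current errors, 71h for deferred ones. Bit 7 marks the information field as valid.
- The low nibble of byte 2 is the sense key.
- Bytes 3-6 are the information field.
- Bytes 12, 13 and 14 are the ASC, ASCQ and FRU code.

A minimal Python sketch of that layout follows. Its ASC/ASCQ table deliberately holds only the two codes that appear in this article, and the function name is hypothetical.

```python
SENSE_KEYS = {
    0x0: "NO SENSE", 0x1: "RECOVERED ERROR", 0x2: "NOT READY",
    0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR", 0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION", 0x7: "DATA PROTECT", 0x8: "BLANK CHECK",
    0x9: "VENDOR SPECIFIC", 0xA: "COPY ABORTED", 0xB: "ABORTED COMMAND",
    0xD: "VOLUME OVERFLOW", 0xE: "MISCOMPARE",
}
# Only the codes seen in this article; the full list is in SPC.
ASC_ASCQ = {
    (0x29, 0x02): "SCSI BUS RESET OCCURRED",
    (0x0C, 0x02): "WRITE ERROR - AUTO REALLOCATION FAILED",
}


def decode_fixed_sense(hex_text):
    """Decode fixed-format (70h/71h) sense data given as space-separated hex bytes."""
    data = bytes.fromhex(hex_text)
    response = data[0] & 0x7F
    if response not in (0x70, 0x71) or len(data) < 15:
        raise ValueError("not fixed-format sense data of at least 15 bytes")
    key, asc, ascq = data[2] & 0x0F, data[12], data[13]
    return {
        "deferred": response == 0x71,
        "info_valid": bool(data[0] & 0x80),
        "sense_key": SENSE_KEYS.get(key, f"{key:X}h"),
        "information": int.from_bytes(data[3:7], "big"),
        "asc_ascq": ASC_ASCQ.get((asc, ascq), f"{asc:02X}h/{ascq:02X}h"),
        "fru": data[14],
    }
```

Both reports above decode to UNIT ATTENTION with ASC/ASCQ 29h/02h. A target typically returns this after a bus reset. Here it most likely records the consequence of the reset logged just before it (`SCSI bus is being reset for host 0 channel 0`), rather than pointing to a separate fault on id 0.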

More examples:

scsi0: ERROR on channel 0, id 0, lun 0, CDB: Write (10) 00 06 1d 3a 0d 00 00 08 00
Info fld=0x61d3a0d, Deferred sd08:02: sense key Medium Error
Additional sense indicates Write error
I/O error: dev 08:02, sector 102498376
SCSI Error: (0:0:0) Status=02h (CHECK CONDITION)
Key=3h (MEDIUM ERROR); FRU=0Ch
ASC/ASCQ=0Ch/02h ""
CDB: 2A 00 07 F9 3A 2D 00 00 08 00
scsi0: ERROR on channel 0, id 0, lun 0, CDB: Write (10) 00 07 f9 3a 2d 00 00 08 00
Info fld=0x8153a0d, Deferred sd08:02: sense key Medium Error
Additional sense indicates Write error - auto reallocation failed
I/O error: dev 08:02, sector 133693544
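
The CDB in these lines shows exactly which blocks the disk was asked to write. In a 10-byte READ(10) (28h) or WRITE(10) (2Ah) command, bytes 2-5 hold the logical block address and bytes 7-8 the transfer length in blocks. The `scsi0: ERROR ... CDB: Write (10)` lines print the opcode as a name followed by only the remaining nine bytes, so 2A has to be prepended before decoding them. Here is a minimal Python sketch, with a hypothetical function name:

```python
OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}


def decode_cdb10(hex_text):
    """Decode a 10-byte CDB: opcode in byte 0, LBA in bytes 2-5, length in bytes 7-8."""
    cdb = bytes.fromhex(hex_text)
    if len(cdb) != 10:
        raise ValueError("expected exactly 10 CDB bytes")
    return {
        "op": OPCODES.get(cdb[0], f"{cdb[0]:02X}h"),
        "lba": int.from_bytes(cdb[2:6], "big"),     # big-endian block address
        "blocks": int.from_bytes(cdb[7:9], "big"),  # transfer length in blocks
    }


# "CDB: Write (10) 00 06 1d 3a 0d ..." omits the opcode byte, so prepend 2A.
first = decode_cdb10("2A 00 06 1d 3a 0d 00 00 08 00")
# "CDB: 2A 00 07 F9 3A 2D ..." already includes it.
second = decode_cdb10("2A 00 07 F9 3A 2D 00 00 08 00")

for cdb, io_error_sector in ((first, 102498376), (second, 133693544)):
    print(cdb["op"], cdb["lba"], cdb["lba"] - io_error_sector)
# prints:
# WRITE(10) 102578701 80325
# WRITE(10) 133773869 80325
```

Decoded this way, the two WRITE(10) commands target LBA 102578701 (the same value as Info fld=0x61d3a0d) and LBA 133773869. Each is exactly 80325 sectors higher than the sector in the matching `I/O error: dev 08:02` line. That would be consistent with the following:

- The I/O error line reports a sector relative to the start of sda2.
- The CDB carries the absolute LBA on the disk.
- sda2 starts at sector 80325.

This is only an inference from these numbers. Check the partition's start sector with fdisk -l (in sector units) before mapping errors to filesystem blocks.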

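Locate the device: match the host, channel, id and lun from the error messages against the entries in /proc/scsi/scsi to find the disk's vendor and model.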

cat /proc/scsi/scsi
Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: SEAGATE Model: ST373307LC Rev: 0007
Type: Direct-Access ANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 01 Lun: 00
Vendor: SEAGATE Model: ST373307LC Rev: 0007
Type: Direct-Access

Sector 3228343
scsi0 (1:0): rejecting I/O to offline device
RAID1 conf printout:
--- wd:1 rd:2
disk 0, wo:0, o:1, dev:sda1
disk 1, wo:1, o:0, dev:sdb1
RAID1 conf printout:
--- wd:1 rd:2
disk 0, wo:0, o:1, dev:sda1
scsi0 (1:0): rejecting I/O to offline device
md: write_disk_sb failed for device sdb2
md: errors occurred during superblock update, repeating
scsi0 (1:0): rejecting I/O to offline device
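
To tie an address from the error messages to a physical drive, look up the host/channel/id/lun numbers in /proc/scsi/scsi, as in the output above (id 1 on scsi0 is a SEAGATE ST373307LC). Below is a minimal Python sketch that assumes the layout shown above. The file exists only on kernels built with CONFIG_SCSI_PROC_FS. On systems without it, a tool such as lsscsi shows the same addresses from sysfs. The function names are hypothetical.

```python
import re

# "Host: scsi0 Channel: 00 Id: 01 Lun: 00" followed by "Vendor: ... Model: ... Rev: ...".
ENTRY = re.compile(
    r"Host: scsi(\d+) Channel: (\d+) Id: (\d+) Lun: (\d+)\s+"
    r"Vendor: (.*?)\s+Model: (.*?)\s+Rev: (\S*)"
)


def parse_proc_scsi(text):
    """Map (host, channel, id, lun) to (vendor, model, revision)."""
    devices = {}
    for m in ENTRY.finditer(text):
        address = tuple(int(g) for g in m.groups()[:4])
        devices[address] = tuple(g.strip() for g in m.groups()[4:])
    return devices


def describe(devices, host, channel, target, lun):
    """Human-readable description of the device at one SCSI address."""
    vendor, model, rev = devices.get((host, channel, target, lun), ("?", "?", "?"))
    return f"scsi{host} channel {channel} id {target} lun {lun}: {vendor} {model} (rev {rev})"
```

For example, `describe(parse_proc_scsi(open("/proc/scsi/scsi").read()), 0, 0, 1, 0)` names the drive behind the `host 0 channel 0 id 1 lun 0` errors.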

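Use smartmontools. Its smartctl command collects information from the disk drive itself, such as whether SMART is supported and enabled, and the SMART health status. smartctl is normally given the whole-disk device (for example /dev/sda) rather than a partition; the example below was run against /dev/sda1. Example smartctl output: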

[root@xxx-a log]# smartctl -a /dev/sda1
smartctl version 5.38 [x86_64-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
Device: SEAGATE ST360057SSUN600G Version: 0B25
Serial number: 001112223333
Device type: disk
Transport protocol: SAS
Local Time is: Wed Oct 10 09:07:02 2012 PDT
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK
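
For scripted checks, smartctl's exit status is often more useful than its text, because it is a bit mask. According to the smartctl(8) man page, the bits mean:

- Bit 0: a command-line error.
- Bit 1: a failure to open the device.
- Bit 2: a failed SMART command or a checksum error.
- Bit 3: a "DISK FAILING" health status.
- Bits 4 and 5: attributes at or below threshold, now or in the past.
- Bits 6 and 7: entries in the device error log and the self-test log.

A minimal Python sketch follows. It assumes smartctl is installed and run as root; the default device path and the helper names are illustrative assumptions. It also extracts the one-line health verdict. SAS/SCSI disks print that verdict as `SMART Health Status:` (as above), and ATA disks print it as `SMART overall-health self-assessment test result:`.

```python
import subprocess

# Bit meanings paraphrased from the EXIT STATUS section of smartctl(8).
EXIT_BITS = [
    "command line did not parse",
    "device open failed or device did not respond",
    "a SMART or other command to the disk failed, or a SMART checksum error",
    "SMART status check returned DISK FAILING",
    "prefail attributes at or below threshold",
    "some attributes were at or below threshold in the past",
    "device error log contains errors",
    "device self-test log contains errors",
]


def decode_exit_status(code):
    """List the conditions encoded in a smartctl exit status."""
    return [text for bit, text in enumerate(EXIT_BITS) if code & (1 << bit)]


def health_verdict(output):
    """Return the health verdict from smartctl output, or None if absent."""
    for line in output.splitlines():
        if line.startswith(("SMART Health Status:",
                            "SMART overall-health self-assessment test result:")):
            return line.split(":", 1)[1].strip()
    return None


def check_disk(device="/dev/sda"):
    """Run 'smartctl -H' on a whole-disk device; needs root and smartctl installed."""
    proc = subprocess.run(["smartctl", "-H", device],
                          capture_output=True, text=True, errors="replace")
    return health_verdict(proc.stdout), decode_exit_status(proc.returncode)
```

A verdict of OK or PASSED with bit 6 or 7 set still means errors were logged. In that case, read the full `smartctl -a` output.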
