[Oracle Data Recovery] ORA-01115, ORA-01110, ORA-27091, ORA-27070, OSD-04006, O/S-Error

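A customer's database on Windows 2003 developed I/O problems on the SYSTEM tablespace data file system.dbf after a storage failure. Opening the database (OPEN database) failed with the following errors: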

 

 

ORA-01115: IO error reading block from file 15 
ORA-01110: data file … 
ORA-27091: unable to queue I/O 
ORA-27070: async read/write failed 
OSD-04006: ReadFile() failure, unable to read from file 
O/S-Error: (OS 121) The semaphore timeout period has expired.


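The errors above (ORA-01115, ORA-01110, ORA-27091, ORA-27070, OSD-04006, O/S-Error) do not originate in the Oracle database layer. The root cause is that the file on the corresponding Windows disk drive cannot be read. This may be an OS bug, or the disk may have bad sectors or another physical fault. The first priority is therefore to fix the file read problem at the OS level. If the problem cannot be resolved at the OS level or from a backup, special recovery techniques can be considered.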
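Whether the OS can read the data file at all can be checked without involving Oracle. The following is a minimal sketch, not a recovery tool: it reads a file sequentially in fixed-size chunks and records the offsets where the OS reports a read error. The function name `scan_file` and the 1 MB chunk size are assumptions made for illustration, and the scan should be run against a copy or with the database shut down.

```python
import os


def scan_file(path, chunk_size=1024 * 1024):
    """Read `path` sequentially; return (readable_bytes, bad_ranges).

    bad_ranges is a list of (offset, length, error_text) tuples for
    chunks the OS refused to read.
    """
    bad_ranges = []
    readable_bytes = 0
    size = os.path.getsize(path)
    # buffering=0 so that each read() maps to a single OS-level read call.
    with open(path, "rb", buffering=0) as f:
        offset = 0
        while offset < size:
            length = min(chunk_size, size - offset)
            try:
                f.seek(offset)
                data = f.read(length)
                if not data:
                    break  # unexpected end of file
                readable_bytes += len(data)
                offset += len(data)
            except OSError as exc:
                # Record the failing range and continue with the next chunk.
                bad_ranges.append((offset, length, str(exc)))
                offset += length
    return readable_bytes, bad_ranges
```

If `bad_ranges` comes back non-empty, the failure is reproducible outside the database, which supports treating it as a storage or OS problem first.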

 

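If you cannot resolve the problem yourself, the professional Oracle database repair team at 诗檀软件 can help you recover the data.

诗檀软件 professional database repair team

Hotline: 13764045638 QQ: 47079569 Email: [email protected]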

 

 

Error: OSD 4006
Text: ReadFile() failure, unable to read from file
—————————————————————————
Cause: Unexpected return from Windows NT system service ReadFile()
Action: Check OS error code and consult Windows NT documentation

This is due to a problem in Windows such that when Oracle attempted to access the data file on that device, it could not because the device timed out. This suggests that Windows has run out of asynchronous I/O buffers or there is a communications delay on the device.

Nothing can be done at the database level to resolve this error, other than moving the data files to another drive. Ask the OS system administrator to run diagnostic tools to check for faulty hardware and disk corruption on the disk device where the error is showing in the loader log. If the error persists, log a call with Microsoft Support.

Oracle processes may encounter various (OS 1117) errors on a Windows 2003 Server. The text of the (OS 1117) error can be seen as follows:

C:\>net helpmsg 1117
The request could not be performed because of an I/O device error.
This error may manifest itself in different ways, depending on which Oracle process encounters the error:

Oracle RDBMS Instance Encounters (OS 1117) error

1. If an Oracle RDBMS instance encounters the error, you may see messages such as the following in the alert log for the RDBMS instance:

==========================================================================

Fri Jul 13 01:21:33 2007
Errors in file d:\oracle\db\product\admin\mydb\bdump\mydb1_lmon_4608.trc:
ORA-27091 : unable to queue I/O
ORA-27070 : async read/write failed
OSD-04006: ReadFile() failure, unable to read from file
O/S-Error: (OS 1117) The request could not be performed because of an I/O device error.
==========================================================================
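When an alert log contains many such entries, it can help to tally which OS error codes occur. The sketch below assumes the `O/S-Error: (OS nnn) text` line format shown in the excerpts in this article; the function name `count_os_errors` is hypothetical.

```python
import re
from collections import Counter

# Matches lines such as:
#   O/S-Error: (OS 1117) The request could not be performed ...
OS_ERROR_RE = re.compile(r"O/S-Error:\s*\(OS\s+(\d+)\)\s*(.*)")


def count_os_errors(lines):
    """Return (counts, messages) for the O/S-Error lines in `lines`.

    counts maps OS error code -> number of occurrences;
    messages maps OS error code -> first message text seen for it.
    """
    counts = Counter()
    messages = {}
    for line in lines:
        match = OS_ERROR_RE.search(line)
        if match:
            code = int(match.group(1))
            counts[code] += 1
            messages.setdefault(code, match.group(2).strip())
    return counts, messages
```

Any iterable of strings works as input, so an open log file can be passed in directly.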

Oracle ASM Instance Encounters (OS 1117) errors

2. If an Oracle ASM instance encounters the error, you may see similar errors in the ASM instance’s Alert log:

============================================
Fri Jul 13 01:22:10 2007
Errors in file d:\oracle\asm\product\admin\+asm\bdump\+asm1_gmon_3836.trc:
ORA-27091 : unable to queue I/O
ORA-27070 : async read/write failed
OSD-04016: Error queuing an asynchronous I/O request.
O/S-Error: (OS 1117) The request could not be performed because of an I/O device error.
============================================

CRS Daemon (crsd.exe) encounters 1117 errors

3. If you are running in an Oracle Clusterware environment, then you may also see errors in the crsd.log and/or certain resource logs, indicating a problem accessing the OCR (Oracle Cluster Registry). An example of those errors would be:

================================================
2007-07-13 01:21:51.766: [ OCROSD][4272]utwrite:4: Problem writing the buffer phy offset 184320 and oserror 1117
2007-07-13 01:21:51.766: [ OCROSD][4352]utwrite:4: Problem writing the buffer phy offset 184320 and oserror 1117
2007-07-13 01:21:51.766: [ OCRRAW][4352]beginlog: problem 26 clearing the log metadata buffer
2007-07-13 01:21:51.766: [ OCRRAW][4352]proprdkey: Problem in begin log
2007-07-13 01:21:51.766: [ OCRRAW][4352]proprseterror: Error in accessing physical storage [26] Marking context invalid.
================================================

CSS Daemon (ocssd.exe) encounters 1117 errors

4. Also, in an Oracle Clusterware environment, the Cluster Synchronization Services daemon (ocssd.exe) may experience problems accessing the voting disk. If this occurs, you will see an error in the ocssd.log similar to the following:

============================================
[ CSSD]2007-07-13 01:22:12.501 [4052] >ERROR: Internal Error Information:
Category: 1234
Operation: scls_block_write
Location: WriteFile
Other: unable to write block(s)
Dep: 1117

[ CSSD]2007-07-13 01:22:12.501 [4052] >ERROR: clssnmvReadBlocks: read failed 1 at offset 533 of \\.\votedsk2
[ CSSD]2007-07-13 01:22:12.501 [4052] >TRACE: clssnmDiskStateChange: state from 4 to 3 disk (1/\\.\votedsk2)
[ CSSD]2007-07-13 01:22:12.501 [2200] >TRACE: clssnmDiskPMT: disk offline (1/\\.\votedsk2)
[ CSSD]2007-07-13 01:22:12.501 [2200] >ERROR: clssnmDiskPMT: Aborting, 1 of 2 voting disks unavailable
[ CSSD]2007-07-13 01:22:12.501 [2200] >ERROR: ###################################
[ CSSD]2007-07-13 01:22:12.501 [2200] >ERROR: clssscExit: CSSD aborting
[ CSSD]2007-07-13 01:22:12.501 [2200] >ERROR: ###################################
==============================================

5. When you are running in an Oracle Clusterware environment, if the ocssd process encounters an I/O error when accessing the Voting Disk, the CSS daemon will evict the node from the cluster. This is done by signalling the Oracle Fence Driver (OraFencedrv.sys) to reboot the machine. When the fence driver reboots the machine, this will be seen as a bugcheck with stop code 0x0000ffff. You will be able to see this in the System Log with a message such as:

The computer has rebooted from a bugcheck.
The bugcheck was: 0x0000ffff (0x0000000000000000, 0x0000000000000000,
0x0000000000000000, 0x0000000000000000).
A dump was saved in: C:\WINDOWS\MEMORY.DMP.

Note that the bugcheck is expected behavior when ocssd.exe (the Cluster Synchronization Services daemon) encounters an I/O error when accessing the voting disk. The node experiencing the I/O error is intentionally rebooted to avoid a split-brain and possible data corruption when access to the voting disk is lost.
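The "Aborting, 1 of 2 voting disks unavailable" message in the ocssd.log excerpt is consistent with a strict-majority rule, under which a node must be able to access more than half of the configured voting disks. The sketch below only illustrates that arithmetic under this assumption. It is not Oracle's implementation, and the function name `node_keeps_membership` is hypothetical.

```python
def node_keeps_membership(total_disks, accessible_disks):
    """Return True if the node can still see a strict majority of voting disks.

    Assumption: membership requires access to MORE than half of the
    configured voting disks.
    """
    if total_disks <= 0 or not 0 <= accessible_disks <= total_disks:
        raise ValueError("invalid voting disk counts")
    return accessible_disks * 2 > total_disks


# Two voting disks with one lost leaves exactly half, which is not a majority.
two_disk_result = node_keeps_membership(2, 1)
# Three voting disks with one lost still leaves a majority.
three_disk_result = node_keeps_membership(3, 2)
```

Under this rule an odd number of voting disks tolerates the loss of one disk, while a two-disk configuration does not.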
CHANGES

You may encounter this error after upgrading the Microsoft Storport driver to version 5.2.3790.4021 or later.

CAUSE

Refer to Microsoft KB article 932755, available at the following URL:

http://support.microsoft.com/default.aspx?scid=kb;EN-US;932755

Per that article, one of the changes introduced in this version of the Storport driver is the following:

=========================================================
If a target returns a SCSI status of BUSY or Task Set Full, the port driver retries the command immediately. Storport retries the command an unlimited number of times. Therefore, if the busy status continues, the system could eventually experience problems.

This update configures the following behavior:

• It limits the number of retries. The default is 20.

• If the target returns a status of BUSY, the Storport driver performs a time-based pause before the Storport driver retries the command.

• If the target returns a status of Task Set Full, the Storport driver performs an I/O completion-based pause before the Storport driver retries the command.
=========================================================

Therefore, prior to upgrading the Storport driver, if a storage path had become saturated, the Storport driver would immediately continue to retry – indefinitely. This would result in slow I/O and perhaps a hang or spin scenario, but no error would be returned.

With the later version of the Storport driver, the retries are limited to 20 retries by default, with a pause between each retry. After 20 failures with a device busy status, the (OS 1117) error is returned to applications waiting on I/O. For more information on changes to the Storport driver, you must contact Microsoft.
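The difference between the two behaviors can be illustrated with a small simulation. This is a toy model, not the Storport driver: the names `submit_io` and `make_target` are hypothetical, the target simply reports "BUSY" a fixed number of times, and it is an assumption here that the limit means one initial attempt followed by up to 20 retries.

```python
import time

ERROR_IO_DEVICE = 1117  # OS error code quoted in this article


def make_target(busy_count):
    """Return a fake storage target that answers BUSY `busy_count` times, then GOOD."""
    remaining = [busy_count]

    def target():
        if remaining[0] > 0:
            remaining[0] -= 1
            return "BUSY"
        return "GOOD"

    return target


def submit_io(target, max_retries=20, pause_seconds=0.0):
    """Issue one I/O; return (os_error_code, attempts). 0 means success.

    max_retries=None models the old behavior (retry without limit);
    an integer models the updated behavior (bounded retries with a pause).
    """
    attempts = 1
    status = target()
    while status == "BUSY":
        if max_retries is not None and attempts - 1 >= max_retries:
            return ERROR_IO_DEVICE, attempts
        time.sleep(pause_seconds)  # pause before retrying
        status = target()
        attempts += 1
    return 0, attempts
```

With a target that stays busy for 100 responses, `submit_io(make_target(100))` gives up with 1117 after the retry limit, whereas `submit_io(make_target(100), max_retries=None)` eventually succeeds but only after 101 attempts. This mirrors the slow I/O without an error described above.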

SOLUTION

This is an I/O performance problem. You will need to increase the performance/capacity of the storage system to avoid the prolonged BUSY status. Specific solutions will vary, depending on your storage vendor, so the storage vendor may need to be contacted to assist with tuning the storage. One potential solution includes implementing multi-pathing technology to improve the throughput of the storage.

