Too many fragmentation in LMT?

这周和同事讨论技术问题时,他告诉我客户的一套11.1.0.6的数据库中某个本地管理表空间上存在大量的Extents Fragment区间碎片,这些连续的Extents没有正常合并为一个大的Extent,他怀疑这是由于11.1.0.6上的bug造成了LMT上存在大量碎片。

同事判断该表空间上有碎片的依据是从dba_free_space视图中查询到大量连续的Free Extents:

SQL> select tablespace_name,EXTENT_MANAGEMENT,ALLOCATION_TYPE from dba_tablespaces where tablespace_name='FRAGMENT';

TABLESPACE_NAME                EXTENT_MAN ALLOCATIO
------------------------------ ---------- ---------
FRAGMENT                       LOCAL      SYSTEM

SQL> select block_id,blocks from dba_free_space where tablespace_name='FRAGMENT' and rownum<10;
BLOCK_ID     BLOCKS
---------- ----------
40009     222136
25          8
9           8
17          8
33          8
41          8
49          8
57          8
65          8
.............. 

SQL> select count(*),blocks from dba_free_space where tablespace_name='FRAGMENT' and blocks=8 group by blocks;

  COUNT(*)     BLOCKS
---------- ----------
      5000          8

以上可以看到FRAGMENT表空间使用autoallocate的Local Extent Management,的确存在大量的连续Extents没有合并。在DMT即字典管理表空间模式下需要SMON进程定期维护FET$基表将tablespace上的连续空闲Extents合并为更大的一个Extents。而在LMT模式下因为采用数据文件头上(datafile header 3-8 blocks in 10g)的位图管理区间,所以无需某个后台进程特意去合并区间。

为什么LMT下连续空闲Extents没有合并而造成碎片呢?因为这套库采用11gr1较不稳定的11.1.0.6版本,所以把问题归咎为某个bug似乎可以讲得通。一开始我较为认同同事的bug论,且和同事一起查询了Metalink上11gr1上一些已知的bug,但并没有发现症状匹配的bug note。

这让我反思这个问题,过早的将cause定位到bug过于主观了,并不是所有我们预期外的情况(unexpected)都属于bug。

实际上dba_free_space所显示的信息可能并不”真实”,这种幻象往往由10g以后出现的flashback table特性引起:

SQL> select text from dba_views where view_name='DBA_FREE_SPACE';

TEXT
--------------------------------------------------------------------------------
======DMT REAL FREE EXTENTS=============

select ts.name, fi.file#, f.block#,
       f.length * ts.blocksize, f.length, f.file#
from sys.ts$ ts, sys.fet$ f, sys.file$ fi
where ts.ts# = f.ts#
  and f.ts# = fi.ts#
  and f.file# = fi.relfile#
  and ts.bitmapped = 0

union all

======LMT REAL FREE EXTENTS=============

select /*+ ordered use_nl(f) use_nl(fi) */
       ts.name, fi.file#, f.ktfbfebno,
       f.ktfbfeblks * ts.blocksize, f.ktfbfeblks, f.ktfbfefno
from sys.ts$ ts, sys.x$ktfbfe f, sys.file$ fi
where ts.ts# = f.ktfbfetsn
  and f.ktfbfetsn = fi.ts#
  and f.ktfbfefno = fi.relfile#
  and ts.bitmapped <> 0 and ts.online$ in (1,4) and ts.contents$ = 0

union all

======LMT RECYCLEBIN FREE EXTENTS=============

select /*+ ordered use_nl(u) use_nl(fi) */
       ts.name, fi.file#, u.ktfbuebno,
       u.ktfbueblks * ts.blocksize, u.ktfbueblks, u.ktfbuefno
from sys.recyclebin$ rb, sys.ts$ ts, sys.x$ktfbue u, sys.file$ fi
where ts.ts# = rb.ts#
  and rb.ts# = fi.ts#
  and u.ktfbuefno = fi.relfile#
  and u.ktfbuesegtsn = rb.ts#
  and u.ktfbuesegfno = rb.file#
  and u.ktfbuesegbno = rb.block#
  and ts.bitmapped <> 0 and ts.online$ in (1,4) and ts.contents$ = 0

union all

======DMT RECYCLEBIN FREE EXTENTS=============

select ts.name, fi.file#, u.block#,
       u.length * ts.blocksize, u.length, u.file#
from sys.ts$ ts, sys.uet$ u, sys.file$ fi, sys.recyclebin$ rb
where ts.ts# = u.ts#
  and u.ts# = fi.ts#
  and u.segfile# = fi.relfile#
  and u.ts# = rb.ts#
  and u.segfile# = rb.file#
  and u.segblock# = rb.block#
  and ts.bitmapped = 0

以上我们通过解析10g中的dba_free_space视图可以了解到该视图所显示的Free Extents由以下四个部分组成:

  1. LMT表空间上真正空闲的Extents
  2. DMT表空间上真正空闲的Extents
  3. LMT表空间上被RECYCLEBIN中对象占用的Extents
  4. DMT表空间上被RECYCLEBIN中对象占用的Extents

而在10g以前的版本中因为没有recyclebin特性的”干扰”,所以dba_free_space所显示的Free Extents由前2个部分组成,因此我们可以在10g中创建一个兼容视图以实现对真正空闲空间的查询:

create view dba_free_space_pre10g as
select ts.name TABLESPACE_NAME,
       fi.file# FILE_ID,
       f.block# BLOCK_ID,
       f.length * ts.blocksize BYTES,
       f.length BLOCKS,
       f.file# RELATIVE_FNO
  from sys.ts$ ts, sys.fet$ f, sys.file$ fi
 where ts.ts# = f.ts#
   and f.ts# = fi.ts#
   and f.file# = fi.relfile#
   and ts.bitmapped = 0
union all
select /*+ ordered use_nl(f) use_nl(fi) */
 ts.name TABLESPACE_NAME,
 fi.file# FILE_ID,
 f.ktfbfebno BLOCK_ID,
 f.ktfbfeblks * ts.blocksize BYTES,
 f.ktfbfeblks BLOCKS,
 f.ktfbfefno RELATIVE_FNO
  from sys.ts$ ts, sys.x$ktfbfe f, sys.file$ fi
 where ts.ts# = f.ktfbfetsn
   and f.ktfbfetsn = fi.ts#
   and f.ktfbfefno = fi.relfile#
   and ts.bitmapped <> 0
   and ts.online$ in (1, 4)
   and ts.contents$ = 0
 /

create view dba_free_space_recyclebin as
select /*+ ordered use_nl(u) use_nl(fi) */
 ts.name TABLESPACE_NAME,
 fi.file# FILE_ID,
 u.ktfbuebno BLOCK_ID,
 u.ktfbueblks * ts.blocksize BYTES,
 u.ktfbueblks BLOCKS,
 u.ktfbuefno RELATIVE_FNO
  from sys.recyclebin$ rb, sys.ts$ ts, sys.x$ktfbue u, sys.file$ fi
 where ts.ts# = rb.ts#
   and rb.ts# = fi.ts#
   and u.ktfbuefno = fi.relfile#
   and u.ktfbuesegtsn = rb.ts#
   and u.ktfbuesegfno = rb.file#
   and u.ktfbuesegbno = rb.block#
   and ts.bitmapped <> 0
   and ts.online$ in (1, 4)
   and ts.contents$ = 0
union all
select ts.name TABLESPACE_NAME,
       fi.file# FILE_ID,
       u.block# BLOCK_ID,
       u.length * ts.blocksize BYTES,
       u.length BLOCKS,
       u.file# RELATIVE_FNO
  from sys.ts$ ts, sys.uet$ u, sys.file$ fi, sys.recyclebin$ rb
 where ts.ts# = u.ts#
   and u.ts# = fi.ts#
   and u.segfile# = fi.relfile#
   and u.ts# = rb.ts#
   and u.segfile# = rb.file#
   and u.segblock# = rb.block#
   and ts.bitmapped = 0
/

通过以上创建的dba_free_space_pre10g和dba_free_space_recyclebin视图,我们可以很明确地区分表空间上空闲Extents。

针对本例中的LMT上存在大量连续的空闲Extent碎片,可以直接从上述视图中得到答案:

SQL> select * from dba_free_space_pre10g where tablespace_name='FRAGMENT';

TABLESPACE_NAME                   FILE_ID   BLOCK_ID      BYTES     BLOCKS RELATIVE_FNO
------------------------------ ---------- ---------- ---------- ---------- ------------
FRAGMENT                               13      40009 1819738112     222136           13

SQL> select count(*),blocks from dba_free_space_recyclebin where tablespace_name='FRAGMENT' group by blocks;

  COUNT(*)     BLOCKS
---------- ----------
      5000          8

显然是RECYCLEBIN中存在大量的小"对象"从而造成了LMT上出现大量碎片的假象

SQL> select space,count(*) from dba_recyclebin where ts_name='FRAGMENT' group by space;

     SPACE   COUNT(*)
---------- ----------
         8       5000

我们可以通过purge recyclebin来"合并"这些Extents碎片

SQL> purge dba_recyclebin;

DBA Recyclebin purged.

SQL>  select count(*),blocks from dba_free_space where tablespace_name='FRAGMENT' group by blocks;

  COUNT(*)     BLOCKS
---------- ----------
         1     262136

如果应用程序创建大量的小型堆(heap)表来存放临时数据,在不再需要这些数据时将这些堆表drop掉,那么就可能造成上述LMT”碎片”问题。我们在实际处理10g以后的这类空间问题时一定搞清楚,哪些是真正的Free Extents,而哪些是来自RECYCLEBIN的Extents。

另一方面这个case还告诉我们不要一遇到预料外的行为方式(unexpected behavior)就将问题定位到bug,这样会过早僵化我们的诊断预期。为了尽可能地发散思维,我们有必要如围棋中所提倡的”保留变化”那样来安排诊断步骤。

监控一个大事务的回滚

我们在大的事务失败时往往面临长时间的回滚,在回滚期间表会被加以TM-3 SX sub-exclusive锁,此时一般我们是无法针对表实施DDL操作的。长时间的大事务回滚可能耗尽我们的耐心,不过我们还是有办法预估何时回滚能够完成的,参考中的脚本<Script:when transaction will finish rollback>中的脚本,注意该脚本需要访问x$ktuxe内部视图,所以需要以sysdba身份方能执行。

SQL> select * from v$lock where type in ('TM','TX');
ADDR     KADDR           SID TY        ID1        ID2      LMODE    REQUEST      CTIME      BLOCK
-------- -------- ---------- -- ---------- ---------- ---------- ---------- ---------- ----------
0ED0F30C 0ED0F33C          9 TM      13865          0          3          0       3757          0
2C3975FC 2C39763C          9 TX      65557        677          6          0       3757          0
SQL> select object_name,object_type from dba_objects where object_id=13865;
OBJECT_NAM OBJECT_TYPE
---------- -------------------
SAMPLE     TABLE
因为表上存在TM-3锁,所以此时是无法对表执行需要持有TM-6 exclusive排它锁的DDL操作的
SQL> drop table sample;
drop table sample
*
ERROR at line 1:
ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired
SQL> select * from sample where rownum=1 for update nowait;
select * from sample where rownum=1 for update nowait
*
ERROR at line 1:
ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired
rollback完成之前相关行上的row level lock不会被释放,因此dml操作会被block
SQL> select xidusn,xidslot,xidsqn,status from v$transaction;
XIDUSN    XIDSLOT     XIDSQN STATUS
---------- ---------- ---------- ----------------
1         21        677 ACTIVE
从v$tranasction视图中可以看到事务1.21.677处于active状态
使用set time on命令显示当前时间
SQL> set time on;
20:54:26 SQL>
set serveroutput on
set feedback off
prompt
prompt Looking for transactions that are rolling back ...
prompt
declare
cursor tx is
select
s.username,
t.xidusn,
t.xidslot,
t.xidsqn,
x.ktuxesiz
from
sys.x$ktuxe  x,
sys.v_$transaction  t,
sys.v_$session  s
where
x.inst_id = userenv('Instance') and
x.ktuxesta = 'ACTIVE' and
x.ktuxesiz > 1 and
t.xidusn = x.ktuxeusn and
t.xidslot = x.ktuxeslt and
t.xidsqn = x.ktuxesqn and
s.saddr = t.ses_addr;
user_name  varchar2(30);
xid_usn    number;
xid_slot   number;
xid_sqn    number;
used_ublk1 number;
used_ublk2 number;
begin
open tx;
loop
fetch tx into user_name, xid_usn, xid_slot, xid_sqn, used_ublk1;
exit when tx%notfound;
if tx%rowcount = 1
then
sys.dbms_lock.sleep(10);
end if;
select
sum(ktuxesiz)
into
used_ublk2
from
sys.x$ktuxe
where
inst_id = userenv('Instance') and
ktuxeusn = xid_usn and
ktuxeslt = xid_slot and
ktuxesqn = xid_sqn and
ktuxesta = 'ACTIVE';
if used_ublk2 < used_ublk1
then
sys.dbms_output.put_line(
user_name ||
'''s transaction ' ||
xid_usn  || '.' ||
xid_slot || '.' ||
xid_sqn  ||
' will finish rolling back at approximately ' ||
to_char(
sysdate + used_ublk2 / (used_ublk1 - used_ublk2) / 6 / 60 / 24,
'HH24:MI:SS DD-MON-YYYY'
)
);
end if;
end loop;
if user_name is null
then
sys.dbms_output.put_line('No transactions appear to be rolling back.');
end if;
end;
/
MACLEAN's transaction 1.21.677 will finish rolling back at approximately 21:46:45 25-MAY-2011

以上脚本给出了估计的1.21.677事务的回滚结束时间为21:46:45,即还有50分钟,
注意这只是根据之前单位时间内rollback的进度所得到的一个估算值,所以并不精确,但还是可以做为一个参考
再次运行以上脚本,会发现2次的结果不一定一致
SQL> /
MACLEAN's transaction 1.21.677 will finish rolling back at approximately 21:35:47 25-MAY-2011

在产品数据库中应当尽可能避免出现大事务回滚的现象,因为除去lock的影响外,large transaction rollback还会导致CPU使用率大幅上升并产生往往比事务本身所产生的更多的redo重做日志记录。

解决sqlplus的segmentation fault或hang问题

sqlplus应当是DBA 1.0时代使用最为频繁的管理工具,经常有经验丰富的老DBA会提到自己敲过几万次的sqlplus:),但有的时候这个吃饭家伙也会不好用,偶尔还会出现Segmentation fault错误,亦或者彻底hang住。在这里我介绍几种应对sqlplus无法正常使用的应对方法:

1.出现Segmentation fault,这种情况下一般是sqlplus 2进制文件被损坏了,可以通过重新build一个sqlplus来解决问题

[oracle@rh2 bin]$ sqlplus
Segmentation fault
/* 使用$ORACLE_HOME/sqlplus/lib目录下的make文件,编译一个新的sqlplus */
[oracle@rh2 ~]$ make -f $ORACLE_HOME/sqlplus/lib/ins_sqlplus.mk  newsqlplus
Linking /s01/oracle/product/11.2.0/dbhome_1/sqlplus/bin/sqlplus
rm -f /s01/oracle/product/11.2.0/dbhome_1/sqlplus/bin/sqlplus
gcc -o /s01/oracle/product/11.2.0/dbhome_1/sqlplus/bin/sqlplus -m64
-L/s01/oracle/product/11.2.0/dbhome_1/sqlplus/lib/ -L/s01/oracle/product/11.2.0/dbhome_1/lib/
-L/s01/oracle/product/11.2.0/dbhome_1/lib/stubs/  /s01/oracle/product/11.2.0/dbhome_1/sqlplus/lib/s0afimai.o
-lsqlplus -lclntsh  `cat /s01/oracle/product/11.2.0/dbhome_1/lib/ldflags`    -lncrypt11 -lnsgr11 -lnzjs11
-ln11 -lnl11 -lnro11 `cat /s01/oracle/product/11.2.0/dbhome_1/lib/ldflags`    -lncrypt11 -lnsgr11 -lnzjs11
-ln11 -lnl11 -lnnz11 -lzt11 -lztkg11 -lztkg11 -lclient11 -lnnetd11  -lvsn11 -lcommon11 -lgeneric11 -lmm
-lsnls11 -lnls11  -lcore11 -lsnls11 -lnls11 -lcore11 -lsnls11 -lnls11 -lxml11 -lcore11 -lunls11 -lsnls11 -lnls11
-lcore11 -lnls11 `cat /s01/oracle/product/11.2.0/dbhome_1/lib/ldflags`    -lncrypt11 -lnsgr11 -lnzjs11 -ln11
-lnl11 -lnro11 `cat /s01/oracle/product/11.2.0/dbhome_1/lib/ldflags`    -lncrypt11 -lnsgr11 -lnzjs11 -ln11 -lnl11
-lclient11 -lnnetd11  -lvsn11 -lcommon11 -lgeneric11   -lsnls11 -lnls11  -lcore11 -lsnls11 -lnls11 -lcore11 -lsnls11
-lnls11 -lxml11 -lcore11 -lunls11 -lsnls11 -lnls11 -lcore11 -lnls11 -lclient11 -lnnetd11  -lvsn11
-lcommon11 -lgeneric11 -lsnls11 -lnls11  -lcore11 -lsnls11 -lnls11 -lcore11 -lsnls11 -lnls11 -lxml11
-lcore11 -lunls11 -lsnls11 -lnls11 -lcore11 -lnls11   `cat /s01/oracle/product/11.2.0/dbhome_1/lib/sysliblist`
-Wl,-rpath,/s01/oracle/product/11.2.0/dbhome_1/lib -lm -lpthread   `cat /s01/oracle/product/11.2.0/dbhome_1/lib/sysliblist`
-ldl -lm -lpthread  -L/s01/oracle/product/11.2.0/dbhome_1/lib
/bin/chmod 755 /s01/oracle/product/11.2.0/dbhome_1/sqlplus/bin/sqlplus
rm -f /s01/oracle/product/11.2.0/dbhome_1/bin/sqlplus
mv -f /s01/oracle/product/11.2.0/dbhome_1/sqlplus/bin/sqlplus /s01/oracle/product/11.2.0/dbhome_1/bin/sqlplus
/bin/chmod 751 /s01/oracle/product/11.2.0/dbhome_1/bin/sqlplus
rm -f /s01/oracle/product/11.2.0/dbhome_1/sqlplus/lib/libsqlplus.so
rm -rf /s01/oracle/product/11.2.0/dbhome_1/sqlplus/bin/sqlplus
[oracle@rh2 ~]$ sqlplus  / as sysdba
SQL*Plus: Release 11.2.0.2.0 Production on Wed May 11 21:38:21 2011
Copyright (c) 1982, 2010, Oracle.  All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options

2.出现sqlplus之后hang住的现象,hang的原因存在多种可能:
1)instance hanging数据库实例hang住,这种情况下sqlplus无法正常登陆到正hang的实例,而登陆到其他实例是可以的;若在10g以后版本中可以使用-prelim选项登陆实例,使用该选项登陆后无法执行普通的SQL语句,但可以使用oradebug内部调试工具,通过oradebug收集必要的hanganalyze信息后,可以进一步判断hang住的原因并决定下一步的操作。

[oracle@rh2 ~]$ sqlplus  / as sysdba
.............................we suspend here!!!

[oracle@rh2 ~]$ sqlplus  -prelim / as sysdba
SQL*Plus: Release 11.2.0.2.0 Production on Wed May 11 21:46:27 2011
Copyright (c) 1982, 2010, Oracle.  All rights reserved.
SQL> oradebug setmypid;
Statement processed.
SQL> oradebug dump hanganalyze 4;
Statement processed.
SQL> oradebug dump systemstate 266;
Statement processed.
SQL> oradebug tracefile_name
/s01/orabase/diag/rdbms/prod/PROD1/trace/PROD1_ora_23436.trc   -- where dump resides

将以上trc文件提交给Oracle Support或者资深的Oracle技术人员,以便他们分析出实例hang住的原因,通过调整参数或者修复bug可以避免再次出现类似的状况。

2)一执行sqlplus就出现挂起现象,甚至没有登陆任何数据库。一般这种情况是在读取sqlplus 2进制文件或其相关的共享库文件(.so文件)时遇到了问题,或者是在实际system call系统调用execve(“sqlplus”)时遇到了错误,一般我们可以使用系统跟踪工具strace(Linux)或truss(Unix)工具来分析这种挂起现象:

/* Unix */
truss -o sqlplus_hang.log sqlplus
/* Linux */
strace -o sqlplus_hang.log sqlplus
head -10 sqlplus_hang.log
execve("/s01/db_1/bin/sqlplus", ["sqlplus"], [/* 28 vars */]) = -1 ENOEXEC (Exec format error)

可以看到以上strace记录中发现了调用execve函数(execve() executes the program pointed to by filename)运行sqlplus程序时出现了ENOEXEC错误,该ENOEXEC错误代码说明我们正在执行一个格式无效的可执行文件,具体的解释如下:

This error indicates that a request has been made to execute a file which, although it has the
appropriate permissions, does not start with a valid magic number. A magic number is the first two
bytes in a file, used to determine what type of file it is.
You tried to execute a file that is not in a valid executable format. The most common format for
binary programs under linux is called ELF. Note that your shell will run ascii files that have the
executable bit set as a shell script (ie run it as shell commands).
You can reproduce this by doing
$ dd if=/dev/random of=myfile bs=1k count=1 $ chmod +x myfile $ ./myfile zsh: exec format error: ./myfile
Note that there is a very slight possibility that you could create a valid program that does
something bad to your system!!
Note, you can have user defined ways of running programs using Linux's binfmt_misc. See
/usr/src/linux/Documentation/binfmt_misc.txt

to be continued …………

解决UDE-31623错误一例

一套Linux x86-64上的11.2.0.2数据库在使用data pump数据泵expdp工具导出时遭遇了ORA-31623错误,详细日志如下:

[oracle@rh2 dbs]$ expdp system/password dumpfile=s01:sh.dmp schemas=sh
Export: Release 11.2.0.2.0 - Production on Mon May 2 22:46:57 2011
Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.
Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
UDE-31623: operation generated ORACLE error 31623
ORA-31623: a job is not attached to this session via the specified handle
ORA-06512: at "SYS.DBMS_DATAPUMP", line 3326
ORA-06512: at "SYS.DBMS_DATAPUMP", line 4551
ORA-06512: at line 1
/* 伴随以上ORA-31623错误在告警日志中出现了以下记录 */
2011-05-02 22:47:05.792000 +08:00
Errors in file /s01/orabase/diag/rdbms/prod/PROD1/trace/PROD1_ora_22060.trc  (incident=54525):
ORA-04031: unable to allocate 56 bytes of shared memory 
("streams pool","unknown object","streams pool","fixed allocation callback")
Incident details in: /s01/orabase/diag/rdbms/prod/PROD1/incident/incdir_54525/PROD1_ora_22060_i54525.trc
2011-05-02 22:47:08.189000 +08:00
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Dumping diagnostic data in directory=[cdmp_20110502224708], requested by (instance=1, osid=22060), summary=[incident=54525].
2011-05-02 22:47:10.502000 +08:00
Sweep [inc][54525]: completed
Sweep [inc2][54525]: completed
SQL> show parameter memory
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
hi_shared_memory_address             integer     0
memory_max_target                    big integer 912M
memory_target                        big integer 912M
shared_memory_address                integer     0

从以上日志可以看到导出进程在为”fixed allocation callback”从streams pool流池中分配内存时遇到了ORA-04031错误,进而导致了UDE-31623错误。为什么expdp要从streams pool中分配内存呢?data pump数据泵在11g中开始Advanced Queue高级队列来控制其job作业的启动、停止和重启了。如果streams pool的当前size为0,那么显然无法分配到任何内存;而当前系统中就只设置了AMM特性的memory_target内存参数,并没有显式地设置streams_pool_size参数,这就导致了问题的发生。
我们可以通过上症状在MOS上搜索可以找到Note”UDE-31623 Error with Data Pump Export”:

Applies to:
Oracle Server - Enterprise Edition - Version: 11.2.0.0. and later   [Release: 11.2 and later ]
Information in this document applies to any platform.
Goal
Data Pump Export does not start:
$ expdp system/ directory=test dumpfile=system.dmp logfile=abcd.log full=y
Export: Release 11.2.0.1.0 - Production on Thu Mar 25 12:11:31 2010
Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.
Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
UDE-31623: operation generated ORACLE error 31623
ORA-31623: a job is not attached to this session via the specified handle
ORA-06512: at "SYS.DBMS_DATAPUMP", line 3263
ORA-06512: at "SYS.DBMS_DATAPUMP", line 4488
ORA-06512: at line 1
Solution
For this Data Pump error, you will need to configure the database with some Streams Pool.
Perform the following query to find out how much STREAMS_POOL_SIZE has been allocated
select * from v$sgainfo;
...
Streams Pool Size                               0 Yes
Data Pump now uses Advanced Queuing (AQ) internally, so it can handle job stops/starts/re-starts.  
However, AQ using the Streams Pool, where the database currently has the STREAMS_POOL_SIZE set to ZERO.
Manually set the STREAMS_POOL_SIZE, re-start the database and re-attempt the Data Pump Export.
Alternately, turn on Automatic Shared Memory management (ASMM) by setting SGA_TARGET or MEMORY_TARGET.

显然上述Note在有一点上说错了,那就是设置memory_target参数无益于解决该UDE-31623问题;最好的做法还是显式地去设置streams_pool_size参数,但遗憾的是streams_pool_size虽然是一个可以动态修改的参数,但实际上是无法动态修改的:

SQL> col name for a20
SQL>
SQL> select name,isinstance_modifiable from v$system_parameter where name='streams_pool_size';
NAME                 ISINS
-------------------- -----
streams_pool_size    TRUE
SQL> alter system set streams_pool_size=60M sid='PROD1';
alter system set streams_pool_size=60M sid='PROD1'
*
ERROR at line 1:
ORA-02097: parameter cannot be modified because specified value is invalid
ORA-04033: Insufficient memory to grow pool
SQL> alter system set streams_pool_size=60M sid='PROD1' scope=spfile;
System altered.
Restart Instance .................
[oracle@rh2 dbs]$ expdp system/password dumpfile=s01:sh1.dmp schemas=sh
Export: Release 11.2.0.2.0 - Production on Mon May 2 23:28:33 2011
Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.
Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
Starting "SYSTEM"."SYS_EXPORT_SCHEMA_01":  system/******** dumpfile=s01:sh1.dmp schemas=sh
Estimate in progress using BLOCKS method...
Processing object type SCHEMA_EXPORT/TABLE/TABLE_DATA
Total estimation using BLOCKS method: 273.8 MB
Processing object type SCHEMA_EXPORT/USER
Processing object type SCHEMA_EXPORT/SYSTEM_GRANT
Processing object type SCHEMA_EXPORT/ROLE_GRANT
Processing object type SCHEMA_EXPORT/DEFAULT_ROLE
Processing object type SCHEMA_EXPORT/PRE_SCHEMA/PROCACT_SCHEMA
Processing object type SCHEMA_EXPORT/TABLE/TABLE
Processing object type SCHEMA_EXPORT/TABLE/GRANT/OWNER_GRANT/OBJECT_GRANT
Processing object type SCHEMA_EXPORT/TABLE/INDEX/INDEX
Processing object type SCHEMA_EXPORT/TABLE/CONSTRAINT/CONSTRAINT
Processing object type SCHEMA_EXPORT/TABLE/INDEX/STATISTICS/INDEX_STATISTICS
Processing object type SCHEMA_EXPORT/TABLE/COMMENT
Processing object type SCHEMA_EXPORT/VIEW/VIEW
Processing object type SCHEMA_EXPORT/TABLE/CONSTRAINT/REF_CONSTRAINT
Processing object type SCHEMA_EXPORT/TABLE/INDEX/FUNCTIONAL_AND_BITMAP/INDEX
Processing object type SCHEMA_EXPORT/TABLE/INDEX/STATISTICS/FUNCTIONAL_AND_BITMAP/INDEX_STATISTICS
Processing object type SCHEMA_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
Processing object type SCHEMA_EXPORT/TABLE/INDEX/DOMAIN_INDEX/INDEX
Processing object type SCHEMA_EXPORT/MATERIALIZED_VIEW
Processing object type SCHEMA_EXPORT/DIMENSION
. . exported "SH"."CUSTOMERS"                            9.853 MB   55500 rows
. . exported "SH"."COSTS":"COSTS_Q1_1998"                139.5 KB    4411 rows
. . exported "SH"."COSTS":"COSTS_Q1_1999"                183.5 KB    5884 rows
. . exported "SH"."COSTS":"COSTS_Q1_2000"                120.6 KB    3772 rows
. . exported "SH"."COSTS":"COSTS_Q1_2001"                227.8 KB    7328 rows
. . exported "SH"."COSTS":"COSTS_Q2_1998"                79.52 KB    2397 rows
. . exported "SH"."COSTS":"COSTS_Q2_1999"                132.5 KB    4179 rows
. . exported "SH"."COSTS":"COSTS_Q2_2000"                119.0 KB    3715 rows
. . exported "SH"."COSTS":"COSTS_Q2_2001"                184.5 KB    5882 rows
. . exported "SH"."COSTS":"COSTS_Q3_1998"                131.1 KB    4129 rows
. . exported "SH"."COSTS":"COSTS_Q3_1999"                137.3 KB    4336 rows
. . exported "SH"."COSTS":"COSTS_Q3_2000"                151.4 KB    4798 rows
. . exported "SH"."COSTS":"COSTS_Q3_2001"                234.4 KB    7545 rows
. . exported "SH"."COSTS":"COSTS_Q4_1998"                144.7 KB    4577 rows
. . exported "SH"."COSTS":"COSTS_Q4_1999"                159.0 KB    5060 rows
. . exported "SH"."COSTS":"COSTS_Q4_2000"                160.2 KB    5088 rows
. . exported "SH"."COSTS":"COSTS_Q4_2001"                278.4 KB    9011 rows
. . exported "SH"."SALES":"SALES_Q1_1998"                1.412 MB   43687 rows
. . exported "SH"."SALES":"SALES_Q1_1999"                2.071 MB   64186 rows
. . exported "SH"."SALES":"SALES_Q1_2000"                2.012 MB   62197 rows
. . exported "SH"."SALES":"SALES_Q1_2001"                1.965 MB   60608 rows
. . exported "SH"."SALES":"SALES_Q2_1998"                1.160 MB   35758 rows
. . exported "SH"."SALES":"SALES_Q2_1999"                1.754 MB   54233 rows
. . exported "SH"."SALES":"SALES_Q2_2000"                1.802 MB   55515 rows
. . exported "SH"."SALES":"SALES_Q2_2001"                2.051 MB   63292 rows
. . exported "SH"."SALES":"SALES_Q3_1998"                1.633 MB   50515 rows
. . exported "SH"."SALES":"SALES_Q3_1999"                2.166 MB   67138 rows
. . exported "SH"."SALES":"SALES_Q3_2000"                1.909 MB   58950 rows
. . exported "SH"."SALES":"SALES_Q3_2001"                2.130 MB   65769 rows
. . exported "SH"."SALES":"SALES_Q4_1998"                1.581 MB   48874 rows
. . exported "SH"."SALES":"SALES_Q4_1999"                2.014 MB   62388 rows
. . exported "SH"."SALES":"SALES_Q4_2000"                1.814 MB   55984 rows
. . exported "SH"."SALES":"SALES_Q4_2001"                2.257 MB   69749 rows
. . exported "SH"."SUPPLEMENTARY_DEMOGRAPHICS"           697.3 KB    4500 rows
. . exported "SH"."FWEEK_PSCAT_SALES_MV"                 419.8 KB   11266 rows
. . exported "SH"."PROMOTIONS"                           58.89 KB     503 rows
. . exported "SH"."TIMES"                                380.8 KB    1826 rows
. . exported "SH"."CAL_MONTH_SALES_MV"                   6.312 KB      48 rows
. . exported "SH"."CHANNELS"                              7.25 KB       5 rows
. . exported "SH"."COUNTRIES"                            10.20 KB      23 rows
. . exported "SH"."PRODUCTS"                             26.17 KB      72 rows
. . exported "SH"."COSTS":"COSTS_1995"                       0 KB       0 rows
. . exported "SH"."COSTS":"COSTS_1996"                       0 KB       0 rows
. . exported "SH"."COSTS":"COSTS_H1_1997"                    0 KB       0 rows
. . exported "SH"."COSTS":"COSTS_H2_1997"                    0 KB       0 rows
. . exported "SH"."COSTS":"COSTS_Q1_2002"                    0 KB       0 rows
. . exported "SH"."COSTS":"COSTS_Q1_2003"                    0 KB       0 rows
. . exported "SH"."COSTS":"COSTS_Q2_2002"                    0 KB       0 rows
. . exported "SH"."COSTS":"COSTS_Q2_2003"                    0 KB       0 rows
. . exported "SH"."COSTS":"COSTS_Q3_2002"                    0 KB       0 rows
. . exported "SH"."COSTS":"COSTS_Q3_2003"                    0 KB       0 rows
. . exported "SH"."COSTS":"COSTS_Q4_2002"                    0 KB       0 rows
. . exported "SH"."COSTS":"COSTS_Q4_2003"                    0 KB       0 rows
. . exported "SH"."DIMENSION_EXCEPTIONS"                     0 KB       0 rows
. . exported "SH"."SALES":"SALES_1995"                       0 KB       0 rows
. . exported "SH"."SALES":"SALES_1996"                       0 KB       0 rows
. . exported "SH"."SALES":"SALES_H1_1997"                    0 KB       0 rows
. . exported "SH"."SALES":"SALES_H2_1997"                    0 KB       0 rows
. . exported "SH"."SALES":"SALES_Q1_2002"                    0 KB       0 rows
. . exported "SH"."SALES":"SALES_Q1_2003"                    0 KB       0 rows
. . exported "SH"."SALES":"SALES_Q2_2002"                    0 KB       0 rows
. . exported "SH"."SALES":"SALES_Q2_2003"                    0 KB       0 rows
. . exported "SH"."SALES":"SALES_Q3_2002"                    0 KB       0 rows
. . exported "SH"."SALES":"SALES_Q3_2003"                    0 KB       0 rows
. . exported "SH"."SALES":"SALES_Q4_2002"                    0 KB       0 rows
. . exported "SH"."SALES":"SALES_Q4_2003"                    0 KB       0 rows
Master table "SYSTEM"."SYS_EXPORT_SCHEMA_01" successfully loaded/unloaded
******************************************************************************
Dump file set for SYSTEM.SYS_EXPORT_SCHEMA_01 is:
/s01/sh1.dmp
Job "SYSTEM"."SYS_EXPORT_SCHEMA_01" successfully completed at 23:30:20
/* 11g中加强了data pump在告警日志中的记录信息,可以看到这里成功启动了DM进程*/
DM00 started with pid=55, OS id=25347, job SYSTEM.SYS_EXPORT_SCHEMA_01
2011-05-02 23:28:35.814000 +08:00
Starting background process SMCO
SMCO started with pid=56, OS id=25349
DW00 started with pid=57, OS id=25351, wid=1, job SYSTEM.SYS_EXPORT_SCHEMA_01
2011-05-02 23:28:54.029000 +08:00
XDB installed.
XDB initialized.

解决ORA-27103:internal error错误一例

一套Linux x86-64上的11.2.0.2数据库在startup启动阶段遭遇了ORA-27103:internal error内部错误,其出错日志如下:

SQL> startup nomount;
ORA-27103: internal error
Linux-x86_64 Error: 2: No such file or directory
Additional information: 9404423
Additional information: 2
oerr 27103
Usage: oerr facility error
Facility is identified by the prefix string in the error message.
For example, if you get ORA-7300, "ora" is the facility and "7300"
is the error.  So you should type "oerr ora 7300".
If you get LCD-111, type "oerr lcd 111", and so on.
================= alert.log ====================
This instance was first to mount
2011-05-02 21:49:47.009000 +08:00
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /s01/orabase/diag/rdbms/prod/PROD1/trace/PROD1_asmb_14386.trc:
ORA-04031: unable to allocate 393240 bytes of shared memory 
("large pool","unknown object","large pool","ASM map operations hashtable")
ASMB (ospid: 14386): terminating the instance due to error 4031
System state dump requested by (instance=1, osid=14386 (ASMB)), summary=[abnormal instance termination].
System State dumped to trace file /s01/orabase/diag/rdbms/prod/PROD1/trace/PROD1_diag_14346.trc
Dumping diagnostic data in directory=[cdmp_20110502214947], requested by 
(instance=1, osid=14386 (ASMB)), summary=[abnormal instance termination].
Instance terminated by ASMB, pid = 14386
=============================system state dump============================
PROCESS 24: ASMB
----------------------------------------
SO: 0x92c955c8, type: 2, owner: (nil), flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x92c955c8, name=process, file=ksu.h LINE:12451, pg=0
(process) Oracle pid:24, ser:1, calls cur/top: 0x9288c778/0x9288c778
flags : (0x6) SYSTEM
flags2: (0x0),  flags3: (0x0)
intr error: 0, call error: 0, sess error: 0, txn error 0
intr queue: empty
ksudlp FALSE at location: 0
(post info) last post received: 2296 0 2
last post received-location: ksl2.h LINE:2293 ID:kslpsr
last process to post me: 92c8e248 1 6
last post sent: 0 0 26
last post sent-location: ksa2.h LINE:282 ID:ksasnd
last process posted by me: 92c8e248 1 6
(latch info) wait_event=0 bits=0
Process Group: DEFAULT, pseudo proc: 0x92d24ae0
O/S info: user: oracle, term: UNKNOWN, ospid: 14386
OSD pid info: Unix process pid: 14386, image: oracle@rh2.oracle.com (ASMB)
----------------------------------------
SO: 0x92e80a58, type: 4, owner: 0x92c955c8, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x92c955c8, name=session, file=ksu.h LINE:12459, pg=0
(session) sid: 13 ser: 1 trans: (nil), creator: 0x92c955c8
flags: (0x51) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
flags2: (0x408) -/-
DID: , short-term DID:
txn branch: (nil)
oct: 0, prv: 0, sql: (nil), psql: (nil), user: 0/SYS
ksuxds FALSE at location: 0
service name: SYS$BACKGROUND
Current Wait Stack:
Not in wait; last wait ended 1.501399 sec ago
Wait State:
fixed_waits=0 flags=0x21 boundary=(nil)/-1
Session Wait History:
elapsed time of 1.501469 sec since last wait
0: waited for 'SGA: allocation forcing component growth'
=0x0, =0x0, =0x0
wait_id=14 seq_num=27 snap_id=7
wait times: snap=0.000000 sec, exc=0.305374 sec, total=0.305390 sec
wait times: max=infinite
wait counts: calls=6 os=6
occurred after 0.000000 sec of elapsed time
1: waited for 'SGA: allocation forcing component growth'
=0x0, =0x0, =0x0
wait_id=20 seq_num=26 snap_id=1
wait times: snap=0.000001 sec, exc=0.000001 sec, total=0.000001 sec
wait times: max=infinite
wait counts: calls=1 os=1
occurred after 0.000000 sec of elapsed time
2: waited for 'SGA: allocation forcing component growth'
=0x0, =0x0, =0x0
wait_id=14 seq_num=25 snap_id=6

在没有阅读告警日志前我的第一反应可能是上次shutdown时Oracle进程没有被清理干净,导致shared memory segments一直没有释放,从而造成了以上ORA-27103错误。

不过其实这个问题告警日志里有明确的信息,即RDBMS Instance数据库实例在mount阶段asmb进程(负责db instance与asm instance的交互)试图从large pool大池中分配390k的空间,但遭遇了ORA-04031错误,如果asmb后台进程无法正常工作将直接导致db实例无法找到asm存储上的必要Extent,因此导致出出现了”ORA-27103: internal error:Linux-x86_64 Error: 2: No such file or directory”。

换而言之ORA-04031错误才是罪魁祸首,我们来是看该实例初始化的内存参数:

[oracle@rh2 dbs]$ strings spfilePROD1.ora |egrep "sga|memory|pool"
PROD1.__large_pool_size=16777216
*.memory_target=943718400
*.shared_pool_size=314572800
*.streams_pool_size=0

因为是11g的实例所以采用了automatic memory management特性管理直接设置了memory_target参数为900M,并设置了1号实例的large pool最小为16M,900M的大小对于10g的实例而言仍是绰绰有余的,但是显然在11gr2中设置memory_target为900M是不足以驱动这样一个”庞然大物”的。我们需要配置更多的内存,亦或者可以通过设置更大的large pool来解决令人郁闷的ORA-04031错误:

[oracle@rh2 dbs]$ strings spfilePROD1.ora > initPROD1.ora
[oracle@rh2 dbs]$ rm spfilePROD1.ora 
[oracle@rh2 dbs]$ vi initPROD1.ora
/* 修改memory_target为至少912M */
*.memory_target=1200M
/* 成功启动!  */
SQL> startup ;
ORACLE instance started.
Total System Global Area 1252663296 bytes
Fixed Size                  2226072 bytes
Variable Size             687868008 bytes
Database Buffers          553648128 bytes
Redo Buffers                8921088 bytes
Database mounted.
Database opened.
Private Interface 'eth1:1' configured from GPnP for use as a private interconnect.
[name='eth1:1', type=1, ip=169.254.236.169, mac=94-0c-6d-71-8c-c2, net=169.254.0.0/16, mask=255.255.0.0, use=haip:cluster_interconnect/62]
Public Interface 'eth0' configured from GPnP for use as a public interface.
[name='eth0', type=1, ip=192.168.1.121, mac=6c-f0-49-03-5f-99, net=192.168.1.0/24, mask=255.255.255.0, use=public/1]
Public Interface 'eth0:1' configured from GPnP for use as a public interface.
[name='eth0:1', type=1, ip=192.168.1.133, mac=6c-f0-49-03-5f-99, net=192.168.1.0/24, mask=255.255.255.0, use=public/1]
Public Interface 'eth0:2' configured from GPnP for use as a public interface.
[name='eth0:2', type=1, ip=192.168.1.122, mac=6c-f0-49-03-5f-99, net=192.168.1.0/24, mask=255.255.255.0, use=public/1]
Picked latch-free SCN scheme 3
2011-05-02 22:28:04.408000 +08:00
WARNING: db_recovery_file_dest is same as db_create_file_dest
Autotune of undo retention is turned on. 
LICENSE_MAX_USERS = 0
SYS auditing is disabled
Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options.
Using parameter settings in server-side pfile /s01/oracle/product/11.2.0/dbhome_1/dbs/initPROD1.ora
System parameters with non-default values:
processes                = 150
shared_pool_size         = 304M
streams_pool_size        = 0
memory_target            = 1200M
control_files            = "+DATA/prod/controlfile/current.261.747100215"
control_files            = "+DATA/prod/controlfile/current.260.747100215"
db_block_size            = 8192
db_flash_cache_file      = "/flashcard/prod1cache.dsk"
db_flash_cache_size      = 20G
compatible               = "11.2.0.0.0"
log_archive_dest_1       = "location=+DATA"
cluster_database         = TRUE
db_create_file_dest      = "+DATA"
db_recovery_file_dest    = "+DATA"
db_recovery_file_dest_size= 40320M
thread                   = 1
undo_tablespace          = "UNDOTBS1"
instance_number          = 1
db_domain                = ""
dispatchers              = "(PROTOCOL=TCP) (SERVICE=PRODXDB)"
remote_listener          = "rh-cluster-scan:1521"
remote_listener          = "*.remote_login_pas"
audit_file_dest          = "/s01/orabase/admin/PROD/adump"
audit_trail              = "DB"
db_name                  = "PROD"
open_cursors             = 300
diagnostic_dest          = "/s01/orabase"
Cluster communication is configured to use the following interface(s) for this instance
169.254.236.169
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
2011-05-02 22:28:07.675000 +08:00
ORA-00132: syntax error or unresolved network name '*.remote_login_pas'
PMON started with pid=2, OS id=19807 
PSP0 started with pid=3, OS id=19809 
2011-05-02 22:28:08.754000 +08:00
VKTM started with pid=4, OS id=19811 at elevated priority
GEN0 started with pid=5, OS id=19815 
VKTM running at (1)millisec precision with DBRM quantum (100)ms
DIAG started with pid=6, OS id=19817 
DBRM started with pid=7, OS id=19819 
PING started with pid=8, OS id=19821 
ACMS started with pid=9, OS id=19823 
DIA0 started with pid=10, OS id=19825 
LMON started with pid=11, OS id=19827 
LMD0 started with pid=12, OS id=19829 
LMS0 started with pid=13, OS id=19831 at elevated priority
RMS0 started with pid=14, OS id=19835 
LMHB started with pid=15, OS id=19837 
MMAN started with pid=16, OS id=19839 
* Load Monitor used for high load check 
* New Low - High Load Threshold Range = [1920 - 2560] 
LGWR started with pid=18, OS id=19843 
DBW0 started with pid=17, OS id=19841 
CKPT started with pid=19, OS id=19845 
SMON started with pid=20, OS id=19847 
RECO started with pid=21, OS id=19849 
RBAL started with pid=22, OS id=19851 
ASMB started with pid=23, OS id=19853 
MMON started with pid=24, OS id=19855 
starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
MMNL started with pid=25, OS id=19857 
starting up 1 shared server(s) ...
lmon registered with NM - instance number 1 (internal mem no 0)
2011-05-02 22:28:09.825000 +08:00
NOTE: initiating MARK startup 
Starting background process MARK
MARK started with pid=28, OS id=19866 
NOTE: MARK has subscribed 
Reconfiguration started (old inc 0, new inc 2)
List of instances:
1 (myinst: 1) 
Global Resource Directory frozen
* allocate domain 0, invalid = TRUE 
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Set master node info 
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Post SMON to start 1st pass IR
Submitted all GCS remote-cache requests
Post SMON to start 1st pass IR
Fix write in gcs resources
Reconfiguration complete
LCK0 started with pid=30, OS id=19872 
Starting background process RSMN
RSMN started with pid=31, OS id=19874 
ORACLE_BASE from environment = /s01/orabase
2011-05-02 22:28:12.112000 +08:00
ALTER DATABASE   MOUNT
This instance was first to mount
2011-05-02 22:28:13.202000 +08:00
NOTE: Loaded library: System 
ALTER SYSTEM SET 
local_listener='(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.1.122)(PORT=1521))))' 
SCOPE=MEMORY SID='PROD1';
SUCCESS: diskgroup DATA was mounted
NOTE: dependency between database PROD and diskgroup resource ora.DATA.dg is established

/* 也可以直接增大large_pool_size来解决上述问题 */
large_pool_size=30M
memory_target=912M

ORA-20001错误一例

一套11.1.0.7上的Oracle Application Object Library应用程序,在收集schema统计信息时出现了ORA-20001错误,具体错误日志如下:

SQL> exec fnd_stats.gather_schema_statistics('AP');
PL/SQL procedure successfully completed.
SQL> show error
No errors.
============================================
Concurrent request error Log
------------------------------------
**Starts**14-APR-2011 02:20:53
**Ends**14-APR-2011 04:40:43
ORA-0000: normal, successful completion
+---------------------------------------------------------------------------+
Start of log messages from FND_FILE
+---------------------------------------------------------------------------+
In GATHER_SCHEMA_STATS , schema_name= ALL percent= 10 degree = 8 internal_flag= NOBACKUP
stats on table AQ$_WF_CONTROL_P is locked 
stats on table FND_CP_GSM_IPC_AQTBL is locked 
stats on table WF_NOTIFICATION_OUT is locked 
Error #1: ERROR: While GATHER_TABLE_STATS:
object_name=AP.JE_FR_DAS_010***ORA-20001: invalid column name or 
duplicate columns/column groups/expressions in method_opt***
Error #2: ERROR: While GATHER_TABLE_STATS:
object_name=AP.JE_FR_DAS_010_NEW***ORA-20001: invalid column name or 
duplicate columns/column groups/expressions in method_opt***
Error #3: ERROR: While GATHER_TABLE_STATS:
object_name=AP.JG_ZZ_SYS_FORMATS_ALL_B***ORA-20001: invalid column name or 
duplicate columns/column groups/expressions in method_opt***
+---------------------------------------------------------------------------+
End of log messages from FND_FILE
+---------------------------------------------------------------------------+

经确认该问题由bug 7601966: GATHER SCHEMA STATS ON AP SCHEMA FAILS WITH ORA-20001: INVALID COLUMN NAME 引起,可以通过follow文档<Gather Schema Statistics fails with Ora-20001 errors after 11G database upgrade [ID 781813.1]>解决该问题。

Probe how does your PGA consume

前2天有客户报一套10.2.0.3的数据库个别服务进程PGA使用量暴涨,疑似内存泄露(memory leak);遂提供on-site service,赶到用户现场时问题进程已经消失,系统内存使用量恢复正常,客户之前除了保留了v$process动态性能视图的信息外未抓取其他有用的诊断信息。

查看保存的v$process视图信息可以看到进程991714的PGA内存使用量达到13个G:

select spid,program,PGA_USED_MEM,PGA_ALLOC_MEM from v$process;
SPID                     PROGRAM                                          PGA_USED_MEM PGA_ALLOC_MEM
------------------------ ------------------------------------------------ ------------ -------------
991714                         oracleBTS@oam_app_a             14427510986 14432001786
oracle@oam_app_a@/oracle/product/10.2.0/dbs $ ulimit -a
time(seconds)        unlimited
file(blocks)         unlimited
data(kbytes)         unlimited
stack(kbytes)        4194304
memory(kbytes)       unlimited
coredump(blocks)     unlimited
nofiles(descriptors) unlimited
SQL> select x.ksppinm name,y.ksppstvl value
2  from sys.x$ksppi x, sys.x$ksppcv y
3  where x.inst_id=USERENV('Instance')
4  and y.inst_id = USERENV('Instance')
5  and x.indx = y.indx
6  and x.ksppinm like '%pga%';
pga_aggregate_target
200715200
NAME
--------------------------------------------------------------------------------
VALUE
--------------------------------------------------------------------------------
_pga_max_size
209715200
SQL> select x.ksppinm name,y.ksppstvl value
2  from sys.x$ksppi x, sys.x$ksppcv y
3  where x.inst_id=USERENV('Instance')
4  and y.inst_id = USERENV('Instance')
5  and x.indx = y.indx
6  and x.ksppinm like '%hash_join%';
NAME
--------------------------------------------------------------------------------
VALUE
--------------------------------------------------------------------------------
_hash_join_enabled
TRUE
SQL> show parameter work
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
fileio_network_adapters              string
workarea_size_policy                 string      AUTO

可以看到该系统使用自动PGA管理且pga_aggregate_target参数值为较小的191M,查询隐藏参数_pga_max_size可发现该参数值也为191M。

从告警日志alert.log中找不到任何信息,单个服务进程PGA使用量达到13G居然没有报ORA-04030错误!

到实例的user_dump目录下ls -ltr了一把有意外收获,找到了该991714进程最近的trace文件:

Dump file /oracle/product/10.2.0/admin/BTS/udump/bts_ora_991714.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit Production
With the Partitioning, OLAP and Data Mining options
ORACLE_HOME = /oracle/product/10.2.0
System name:    AIX
Node name:      oam_app_a
Release:        3
Version:        5
Machine:        0000E5DBD600
Instance name: BTS
Redo thread mounted by this instance: 1
Oracle process number: 46
Unix process pid: 991714, image: oracleBTS@oam_app_a
*** 2011-04-19 18:27:07.766
*** SERVICE NAME:(SYS$USERS) 2011-04-19 18:27:07.733
*** SESSION ID:(248.45987) 2011-04-19 18:27:07.733
WARNING: out of private memory [1]

以上trace文件中唯一有用的信息就是”WARNING: out of private memory [1]”了,但在metalink上搜索”out of private memory”找不到任何有用的信息;因为未发生ORA-04030错误,而用户也没有手动去收集过PGA Heap的堆使用情况,所以未产生任何对该进程PGA内存使用细节描述的转储信息文件,难以从dump信息中获取线索继续探索,线索断裂。

通过查询PGA状态历史记录视图可以发现在之前的2个快照时间窗口内,inused PGA总量为13G,之后PGA内存使用量又恢复正常:

select * from dba_hist_pgastat where name='total PGA inuse' order by value asc
SNAP_ID       DBID INSTANCE_NUMBER NAME                                                                  VALUE
---------- ---------- --------------- ---------------------------------------------------------------- ----------
16048 3731271451               1 total PGA inuse                                                  1.4883E+10
16047 3731271451               1 total PGA inuse                                                  1.4888E+10

查询在以上时间窗口中991714的活动会话记录,可以发现该进程一直处于”cursor: pin S wait on X”中;Oracle 10.2.0.2以后利用mutex来替代latch保护SQL游标,当硬解析发生时会出现”cursor: pin s wait on X”等待。这里可以看到需要硬解析的SQL的ID为”bp7y9fvtqra1h”,而且此处SQL_CHILD_NUMBER为-1,说明Oracle共享池中之前没有缓存该SQL游标,该问题进程是首次为该SQL语句执行硬解析。

     	SNAP_ID	DBID	INSTANCE_NUMBER	SAMPLE_ID	SAMPLE_TIME	SESSION_ID	SESSION_SERIAL#	USER_ID	SQL_ID	SQL_CHILD_NUMBER	SQL_PLAN_HASH_VALUE	FORCE_MATCHING_SIGNATURE	SQL_OPCODE	PLSQL_ENTRY_OBJECT_ID	PLSQL_ENTRY_SUBPROGRAM_ID	PLSQL_OBJECT_ID	PLSQL_SUBPROGRAM_ID	SERVICE_HASH	SESSION_TYPE	SESSION_STATE	QC_SESSION_ID	QC_INSTANCE_ID	BLOCKING_SESSION	BLOCKING_SESSION_STATUS	BLOCKING_SESSION_SERIAL#	EVENT	EVENT_ID	SEQ#	P1TEXT	P1	P2TEXT	P2	P3TEXT	P3	WAIT_CLASS	WAIT_CLASS_ID	WAIT_TIME	TIME_WAITED	XID	CURRENT_OBJ#	CURRENT_FILE#	CURRENT_BLOCK#	PROGRAM	MODULE	ACTION	CLIENT_ID
1	16047	3731271451	1	57716200	19-4月 -11 04.00.35.625 下午	248	45987	25	bp7y9fvtqra1h	-1	0	0	0					3427055676	FOREGROUND	WAITING				UNKNOWN		cursor: pin S wait on X	1729366244	1140	idn	4083918896	value	1.14676E+12	where|sleeps	21474836525	Concurrency	3875070507	0	9772		-1	0	0
2	16047	3731271451	1	57716210	19-4月 -11 04.00.45.629 下午	248	45987	25	bp7y9fvtqra1h	-1	0	0	0					3427055676	FOREGROUND	WAITING				UNKNOWN		cursor: pin S wait on X	1729366244	2137	idn	4083918896	value	1.14676E+12	where|sleeps	21474837522	Concurrency	3875070507	0	9773		-1	0	0
3	16047	3731271451	1	57716220	19-4月 -11 04.00.55.633 下午	248	45987	25	bp7y9fvtqra1h	-1	0	0	0					3427055676	FOREGROUND	WAITING				UNKNOWN		cursor: pin S wait on X	1729366244	3134	idn	4083918896	value	1.14676E+12	where|sleeps	21474838519	Concurrency	3875070507	0	9775		-1	0	0
4	16047	3731271451	1	57716230	19-4月 -11 04.01.05.637 下午	248	45987	25	bp7y9fvtqra1h	-1	0	0	0					3427055676	FOREGROUND	WAITING				UNKNOWN		cursor: pin S wait on X	1729366244	4126	idn	4083918896	value	1.14676E+12	where|sleeps	21474839511	Concurrency	3875070507	0	9772		-1	0	0

但无论是从SQL历史记录视图(Workload Repository)还是v$SQL中都无法找到该条语句,可以判断该问题进程最后也没能成功完成对该’bp7y9fvtqra1h’语句的解析!因为没有该SQL的记录,所以也无法了解该SQL的执行计划和workarea工作区的使用状况,线索再次断裂!

SQL> select * from dba_hist_sqltext where sql_id='bp7y9fvtqra1h';
no rows selected
SQL> select * From v$sql where sql_id='bp7y9fvtqra1h';
no rows selected

通过查询进程内存使用总结历史视图dba_hist_process_mem_summary可以发现占用主要内存的分类是Other,猜测这些Other内存是用来解析游标使用的临时内存(call heap).

SQL> select snap_id,category,num_processes,non_zero_allocs,used_total,allocated_total,allocated_max from dba_hist_process_mem_summary where snap_id in (16047,16048,16049) order by allocated_total desc ,snap_id ;
SNAP_ID CATEGORY        NUM_PROCESSES NON_ZERO_ALLOCS USED_TOTAL ALLOCATED_TOTAL ALLOCATED_MAX
---------- --------------- ------------- --------------- ---------- --------------- -------------
16047 Other                      95              95                 1.5062E+10    1.4665E+10
16048 Other                      96              96                 1.5053E+10    1.4664E+10
16049 Other                      94              94                  386117660      53794202
16047 Freeable                   36              36          0        36372480       3670016
16048 Freeable                   33              33          0        32112640       3670016
16047 SQL                        78              31   27346656        29785568      28016616
16048 SQL                        79              34   26812992        29240400      27885544
16049 Freeable                   27              27          0        26738688       3670016
16049 SQL                        77              31     591528         1242168        125816
16048 PL/SQL                     96              96     272224          601040         68944
16049 PL/SQL                     94              94     218104          536592         68944
16047 PL/SQL                     95              95     212816          506536         68800

因为证据断裂,无法根据现有的信息为该服务进程过量使用PGA内存来定位Bug。到MOS上搜索找不到类似的Bug,但即使有也很难定论,因为没有heapdump的话即便提交SR,Oracle GCS(Global Customer Service)也不太愿定位到Bug。

那么如果出现以上类似的PGA内存泄露的问题,我们因当如何第一时间收集有用的信息,以供后续诊断呢?我在这里提供一些可选的方案:

  1. 定期收集系统内PGA/UGA的使用情况,具体可以使用脚本<Script To Monitor RDBMS Session UGA and PGA Current And Maximum Usage>
  2. 防御式地在系统级别设置4030 dump heapdump 536870917级别的event dump事件,虽然此案例中未发生ORA-04030错误,但不代表下一次也不发生
  3. 在问题发生时,第一时间使用oradebug PGA_DETAIL_GET命令填充v$process_memory_detail视图,以便了解问题进程PGA内存的使用细节
  4. 若觉得v$process_memory_detail动态性能视图的信息还不够丰富的话,也可以做systemstate 266和heapdump 536870917级别的dump

虽然以上我们介绍了一些针对PGA内存泄露问题发生时的诊断方法,但可能你还是不了解具体要如何实践,接下来我们通过实践来体会一下,首先我们特意构建一个PGA内存泄露的假象(测试说明,不要用于生产环境!!):

[oracle@rh2 ~]$ wc -l very_large.sql 
18526 very_large.sql
/* 这里very_large.sql是我们"精心"构造的一条万分复杂的SQL语句,解析该SQL语句需要消耗大量的PGA内存 !!* /
SQL> @very_large
/* 执行very_large SQL语句开始模拟内存泄露,将持续较长时间  */
SQL> @MEMORY_USAGE_SCRIPT.SQL
Oracle Memory Usage Report: PGA And UGA Memory Usage Per Session
Host........: rh2.oracle.com
Name........: PROD1
Version.....: 11.2.0.2.0
Startup Time: 2011-04-20 19:41:32
Current Time: 2011.04.21-19:57:16
Worst possible value of concurrent PGA + UGA memory usage per session:
SID AND SERIAL#     USERNAME OR PROGRAM                              SUM(VALUE) SESSION START TIME
------------------- ------------------------------------------------ ---------- -------------------
29,465              SYS                                               180444688 2011-04-21 19:52:16
152,987             SYS                                                67781616 2011-04-21 18:40:59
146,1               oracle@rh2.oracle.com (ARC3)                       37598184 2011-04-20 19:41:43
19,5                oracle@rh2.oracle.com (ARC2)                       36484072 2011-04-20 19:41:43
17,7                oracle@rh2.oracle.com (ARC0)                       33141736 2011-04-20 19:41:42
145,1               oracle@rh2.oracle.com (ARC1)                       19837928 2011-04-20 19:41:43
125,7               oracle@rh2.oracle.com (CJQ0)                       15826432 2011-04-20 19:41:50
135,1               oracle@rh2.oracle.com (LGWR)                       13480936 2011-04-20 19:41:33
131,1               oracle@rh2.oracle.com (LMS0)                       11973608 2011-04-20 19:41:33
7,1                 oracle@rh2.oracle.com (LMS1)                       11973608 2011-04-20 19:41:33
6,1                 oracle@rh2.oracle.com (LMD0)                       11842536 2011-04-20 19:41:33
5,1                 oracle@rh2.oracle.com (DIA0)                       10580296 2011-04-20 19:41:33
25,57               SYS                                                 9854112 2011-04-21 19:56:59
10,1                oracle@rh2.oracle.com (DBW0)                        9105992 2011-04-20 19:41:33
136,1               oracle@rh2.oracle.com (SMON)                        8777056 2011-04-20 19:41:33
140,1               oracle@rh2.oracle.com (MARK)                        8565736 2011-04-20 19:41:33
130,1               oracle@rh2.oracle.com (LMON)                        8238120 2011-04-20 19:41:33
138,1               oracle@rh2.oracle.com (MMON)                        7215184 2011-04-20 19:41:33
31,1                oracle@rh2.oracle.com (SMCO)                        7123896 2011-04-20 19:43:52
3,1                 oracle@rh2.oracle.com (DIAG)                        6730728 2011-04-20 19:41:33
16,1                oracle@rh2.oracle.com (RSMN)                        5420008 2011-04-20 19:41:35
150,5               oracle@rh2.oracle.com (Q000)                        5001608 2011-04-20 19:41:46
23,1                oracle@rh2.oracle.com (Q001)                        3445984 2011-04-20 19:41:46
22,1                oracle@rh2.oracle.com (QMNC)                        3314960 2011-04-20 19:41:45
12,1                oracle@rh2.oracle.com (RECO)                        3249448 2011-04-20 19:41:33
11,1                oracle@rh2.oracle.com (CKPT)                        3086120 2011-04-20 19:41:33
128,1               oracle@rh2.oracle.com (DBRM)                        2667304 2011-04-20 19:41:33
14,1                oracle@rh2.oracle.com (MMNL)                        2143208 2011-04-20 19:41:33
127,1               oracle@rh2.oracle.com (GEN0)                        2012136 2011-04-20 19:41:33
158,183             SYS                                                 1758344 2011-04-21 07:44:57
143,23              SYS                                                 1692808 2011-04-21 07:45:01
142,1               oracle@rh2.oracle.com (LCK0)                        1299288 2011-04-20 19:41:34
149,1               oracle@rh2.oracle.com (RCBG)                        1160120 2011-04-20 19:41:45
33,59               oracle@rh2.oracle.com (W000)                         963512 2011-04-21 19:55:14
4,1                 oracle@rh2.oracle.com (PING)                         898024 2011-04-20 19:41:33
126,1               oracle@rh2.oracle.com (PSP0)                         832488 2011-04-20 19:41:32
13,1                oracle@rh2.oracle.com (ASMB)                         832488 2011-04-20 19:41:33
134,1               oracle@rh2.oracle.com (MMAN)                         832488 2011-04-20 19:41:33
144,1               oracle@rh2.oracle.com (O000)                         832488 2011-04-20 19:41:36
129,1               oracle@rh2.oracle.com (ACMS)                         832488 2011-04-20 19:41:33
133,1               oracle@rh2.oracle.com (RMS0)                         832488 2011-04-20 19:41:33
1,1                 oracle@rh2.oracle.com (PMON)                         832488 2011-04-20 19:41:32
9,1                 oracle@rh2.oracle.com (LMHB)                         832488 2011-04-20 19:41:33
21,1                oracle@rh2.oracle.com (GTX0)                         832488 2011-04-20 19:41:45
18,3                oracle@rh2.oracle.com (O001)                         832488 2011-04-20 19:41:37
137,1               oracle@rh2.oracle.com (RBAL)                         832488 2011-04-20 19:41:33
2,1                 oracle@rh2.oracle.com (VKTM)                         832488 2011-04-20 19:41:33
Worst possible total and average values of concurrent PGA + UGA memory usage:
564679192 bytes (total) and ~6007225 bytes (average), for ~47 sessions.
Approximate value of current PGA + UGA memory usage per session:
SID AND SERIAL#     USERNAME OR PROGRAM                              SUM(VALUE) SESSION START TIME
------------------- ------------------------------------------------ ---------- -------------------
29,465              SYS                                               178083824 2011-04-21 19:52:16
146,1               oracle@rh2.oracle.com (ARC3)                       36484072 2011-04-20 19:41:43
19,5                oracle@rh2.oracle.com (ARC2)                       35369960 2011-04-20 19:41:43
17,7                oracle@rh2.oracle.com (ARC0)                       33141736 2011-04-20 19:41:42
145,1               oracle@rh2.oracle.com (ARC1)                       19837928 2011-04-20 19:41:43
135,1               oracle@rh2.oracle.com (LGWR)                       13480936 2011-04-20 19:41:33
7,1                 oracle@rh2.oracle.com (LMS1)                       11973608 2011-04-20 19:41:33
131,1               oracle@rh2.oracle.com (LMS0)                       11973608 2011-04-20 19:41:33
6,1                 oracle@rh2.oracle.com (LMD0)                       11842536 2011-04-20 19:41:33
5,1                 oracle@rh2.oracle.com (DIA0)                       10580296 2011-04-20 19:41:33
10,1                oracle@rh2.oracle.com (DBW0)                        8712776 2011-04-20 19:41:33
140,1               oracle@rh2.oracle.com (MARK)                        8565736 2011-04-20 19:41:33
130,1               oracle@rh2.oracle.com (LMON)                        8238120 2011-04-20 19:41:33
3,1                 oracle@rh2.oracle.com (DIAG)                        6730728 2011-04-20 19:41:33
152,987             SYS                                                 6224040 2011-04-21 18:40:59
16,1                oracle@rh2.oracle.com (RSMN)                        5420008 2011-04-20 19:41:35
125,7               oracle@rh2.oracle.com (CJQ0)                        4854824 2011-04-20 19:41:50
25,57               SYS                                                 4738504 2011-04-21 19:56:59
138,1               oracle@rh2.oracle.com (MMON)                        4165448 2011-04-20 19:41:33
136,1               oracle@rh2.oracle.com (SMON)                        3863504 2011-04-20 19:41:33
150,5               oracle@rh2.oracle.com (Q000)                        3108848 2011-04-20 19:41:46
11,1                oracle@rh2.oracle.com (CKPT)                        2561832 2011-04-20 19:41:33
12,1                oracle@rh2.oracle.com (RECO)                        2538120 2011-04-20 19:41:33
31,1                oracle@rh2.oracle.com (SMCO)                        2536376 2011-04-20 19:43:52
128,1               oracle@rh2.oracle.com (DBRM)                        2339768 2011-04-20 19:41:33
23,1                oracle@rh2.oracle.com (Q001)                        2339672 2011-04-20 19:41:46
22,1                oracle@rh2.oracle.com (QMNC)                        2242336 2011-04-20 19:41:45
127,1               oracle@rh2.oracle.com (GEN0)                        2012136 2011-04-20 19:41:33
14,1                oracle@rh2.oracle.com (MMNL)                        1946600 2011-04-20 19:41:33
158,183             SYS                                                 1692856 2011-04-21 07:44:57
143,23              SYS                                                 1561784 2011-04-21 07:45:01
142,1               oracle@rh2.oracle.com (LCK0)                        1299288 2011-04-20 19:41:34
149,1               oracle@rh2.oracle.com (RCBG)                        1160120 2011-04-20 19:41:45
33,59               oracle@rh2.oracle.com (W000)                         963512 2011-04-21 19:55:14
4,1                 oracle@rh2.oracle.com (PING)                         898024 2011-04-20 19:41:33
13,1                oracle@rh2.oracle.com (ASMB)                         832488 2011-04-20 19:41:33
134,1               oracle@rh2.oracle.com (MMAN)                         832488 2011-04-20 19:41:33
129,1               oracle@rh2.oracle.com (ACMS)                         832488 2011-04-20 19:41:33
1,1                 oracle@rh2.oracle.com (PMON)                         832488 2011-04-20 19:41:32
9,1                 oracle@rh2.oracle.com (LMHB)                         832488 2011-04-20 19:41:33
21,1                oracle@rh2.oracle.com (GTX0)                         832488 2011-04-20 19:41:45
18,3                oracle@rh2.oracle.com (O001)                         832488 2011-04-20 19:41:37
137,1               oracle@rh2.oracle.com (RBAL)                         832488 2011-04-20 19:41:33
126,1               oracle@rh2.oracle.com (PSP0)                         832488 2011-04-20 19:41:32
133,1               oracle@rh2.oracle.com (RMS0)                         832488 2011-04-20 19:41:33
2,1                 oracle@rh2.oracle.com (VKTM)                         832488 2011-04-20 19:41:33
144,1               oracle@rh2.oracle.com (O000)                         832488 2011-04-20 19:41:36
Current total and average values of concurrent PGA + UGA memory usage:
463473320 bytes (total) and ~4930567 bytes (average), for ~47 sessions.
Maximum value of PGA memory usage per session:
SID AND SERIAL#     USERNAME OR PROGRAM                                   VALUE SESSION START TIME
------------------- ------------------------------------------------ ---------- -------------------
29,465              SYS                                               177212856 2011-04-21 19:52:16
152,987             SYS                                                57208040 2011-04-21 18:40:59
146,1               oracle@rh2.oracle.com (ARC3)                       37416168 2011-04-20 19:41:43
19,5                oracle@rh2.oracle.com (ARC2)                       36302056 2011-04-20 19:41:43
17,7                oracle@rh2.oracle.com (ARC0)                       32959720 2011-04-20 19:41:42
145,1               oracle@rh2.oracle.com (ARC1)                       19655912 2011-04-20 19:41:43
135,1               oracle@rh2.oracle.com (LGWR)                       13298920 2011-04-20 19:41:33
125,7               oracle@rh2.oracle.com (CJQ0)                       13045176 2011-04-20 19:41:50
131,1               oracle@rh2.oracle.com (LMS0)                       11791592 2011-04-20 19:41:33
7,1                 oracle@rh2.oracle.com (LMS1)                       11791592 2011-04-20 19:41:33
6,1                 oracle@rh2.oracle.com (LMD0)                       11660520 2011-04-20 19:41:33
5,1                 oracle@rh2.oracle.com (DIA0)                       10398280 2011-04-20 19:41:33
10,1                oracle@rh2.oracle.com (DBW0)                        8923976 2011-04-20 19:41:33
140,1               oracle@rh2.oracle.com (MARK)                        8383720 2011-04-20 19:41:33
130,1               oracle@rh2.oracle.com (LMON)                        8056104 2011-04-20 19:41:33
31,1                oracle@rh2.oracle.com (SMCO)                        6876392 2011-04-20 19:43:52
3,1                 oracle@rh2.oracle.com (DIAG)                        6548712 2011-04-20 19:41:33
25,57               SYS                                                 6163896 2011-04-21 19:56:59
136,1               oracle@rh2.oracle.com (SMON)                        5893352 2011-04-20 19:41:33
138,1               oracle@rh2.oracle.com (MMON)                        5294872 2011-04-20 19:41:33
16,1                oracle@rh2.oracle.com (RSMN)                        5237992 2011-04-20 19:41:35
150,5               oracle@rh2.oracle.com (Q000)                        3910216 2011-04-20 19:41:46
11,1                oracle@rh2.oracle.com (CKPT)                        2904104 2011-04-20 19:41:33
23,1                oracle@rh2.oracle.com (Q001)                        2551016 2011-04-20 19:41:46
22,1                oracle@rh2.oracle.com (QMNC)                        2485480 2011-04-20 19:41:45
12,1                oracle@rh2.oracle.com (RECO)                        2485480 2011-04-20 19:41:33
128,1               oracle@rh2.oracle.com (DBRM)                        2223336 2011-04-20 19:41:33
14,1                oracle@rh2.oracle.com (MMNL)                        1961192 2011-04-20 19:41:33
127,1               oracle@rh2.oracle.com (GEN0)                        1830120 2011-04-20 19:41:33
158,183             SYS                                                 1510840 2011-04-21 07:44:57
143,23              SYS                                                 1445304 2011-04-21 07:45:01
142,1               oracle@rh2.oracle.com (LCK0)                        1117272 2011-04-20 19:41:34
149,1               oracle@rh2.oracle.com (RCBG)                         912616 2011-04-20 19:41:45
33,59               oracle@rh2.oracle.com (W000)                         716008 2011-04-21 19:55:14
4,1                 oracle@rh2.oracle.com (PING)                         716008 2011-04-20 19:41:33
144,1               oracle@rh2.oracle.com (O000)                         650472 2011-04-20 19:41:36
137,1               oracle@rh2.oracle.com (RBAL)                         650472 2011-04-20 19:41:33
134,1               oracle@rh2.oracle.com (MMAN)                         650472 2011-04-20 19:41:33
133,1               oracle@rh2.oracle.com (RMS0)                         650472 2011-04-20 19:41:33
129,1               oracle@rh2.oracle.com (ACMS)                         650472 2011-04-20 19:41:33
126,1               oracle@rh2.oracle.com (PSP0)                         650472 2011-04-20 19:41:32
21,1                oracle@rh2.oracle.com (GTX0)                         650472 2011-04-20 19:41:45
18,3                oracle@rh2.oracle.com (O001)                         650472 2011-04-20 19:41:37
13,1                oracle@rh2.oracle.com (ASMB)                         650472 2011-04-20 19:41:33
9,1                 oracle@rh2.oracle.com (LMHB)                         650472 2011-04-20 19:41:33
2,1                 oracle@rh2.oracle.com (VKTM)                         650472 2011-04-20 19:41:33
1,1                 oracle@rh2.oracle.com (PMON)                         650472 2011-04-20 19:41:32
Worst possible total and average values of concurrent PGA memory usage:
528694504 bytes (total) and ~11248819 bytes (average), for ~47 sessions.
Maximum value of UGA memory usage per session:
SID AND SERIAL#     USERNAME OR PROGRAM                                   VALUE SESSION START TIME
------------------- ------------------------------------------------ ---------- -------------------
152,987             SYS                                                10573576 2011-04-21 18:40:59
25,57               SYS                                                 3690216 2011-04-21 19:56:59
29,465              SYS                                                 3231832 2011-04-21 19:52:16
136,1               oracle@rh2.oracle.com (SMON)                        2883704 2011-04-20 19:41:33
125,7               oracle@rh2.oracle.com (CJQ0)                        2781256 2011-04-20 19:41:50
138,1               oracle@rh2.oracle.com (MMON)                        1920312 2011-04-20 19:41:33
150,5               oracle@rh2.oracle.com (Q000)                        1091392 2011-04-20 19:41:46
23,1                oracle@rh2.oracle.com (Q001)                         894968 2011-04-20 19:41:46
22,1                oracle@rh2.oracle.com (QMNC)                         829480 2011-04-20 19:41:45
12,1                oracle@rh2.oracle.com (RECO)                         763968 2011-04-20 19:41:33
128,1               oracle@rh2.oracle.com (DBRM)                         443968 2011-04-20 19:41:33
158,183             SYS                                                  247504 2011-04-21 07:44:57
149,1               oracle@rh2.oracle.com (RCBG)                         247504 2011-04-20 19:41:45
143,23              SYS                                                  247504 2011-04-21 07:45:01
33,59               oracle@rh2.oracle.com (W000)                         247504 2011-04-21 19:55:14
31,1                oracle@rh2.oracle.com (SMCO)                         247504 2011-04-20 19:43:52
146,1               oracle@rh2.oracle.com (ARC3)                         182016 2011-04-20 19:41:43
145,1               oracle@rh2.oracle.com (ARC1)                         182016 2011-04-20 19:41:43
144,1               oracle@rh2.oracle.com (O000)                         182016 2011-04-20 19:41:36
142,1               oracle@rh2.oracle.com (LCK0)                         182016 2011-04-20 19:41:34
140,1               oracle@rh2.oracle.com (MARK)                         182016 2011-04-20 19:41:33
137,1               oracle@rh2.oracle.com (RBAL)                         182016 2011-04-20 19:41:33
135,1               oracle@rh2.oracle.com (LGWR)                         182016 2011-04-20 19:41:33
134,1               oracle@rh2.oracle.com (MMAN)                         182016 2011-04-20 19:41:33
133,1               oracle@rh2.oracle.com (RMS0)                         182016 2011-04-20 19:41:33
131,1               oracle@rh2.oracle.com (LMS0)                         182016 2011-04-20 19:41:33
130,1               oracle@rh2.oracle.com (LMON)                         182016 2011-04-20 19:41:33
129,1               oracle@rh2.oracle.com (ACMS)                         182016 2011-04-20 19:41:33
127,1               oracle@rh2.oracle.com (GEN0)                         182016 2011-04-20 19:41:33
126,1               oracle@rh2.oracle.com (PSP0)                         182016 2011-04-20 19:41:32
21,1                oracle@rh2.oracle.com (GTX0)                         182016 2011-04-20 19:41:45
19,5                oracle@rh2.oracle.com (ARC2)                         182016 2011-04-20 19:41:43
18,3                oracle@rh2.oracle.com (O001)                         182016 2011-04-20 19:41:37
17,7                oracle@rh2.oracle.com (ARC0)                         182016 2011-04-20 19:41:42
16,1                oracle@rh2.oracle.com (RSMN)                         182016 2011-04-20 19:41:35
14,1                oracle@rh2.oracle.com (MMNL)                         182016 2011-04-20 19:41:33
13,1                oracle@rh2.oracle.com (ASMB)                         182016 2011-04-20 19:41:33
11,1                oracle@rh2.oracle.com (CKPT)                         182016 2011-04-20 19:41:33
10,1                oracle@rh2.oracle.com (DBW0)                         182016 2011-04-20 19:41:33
9,1                 oracle@rh2.oracle.com (LMHB)                         182016 2011-04-20 19:41:33
7,1                 oracle@rh2.oracle.com (LMS1)                         182016 2011-04-20 19:41:33
6,1                 oracle@rh2.oracle.com (LMD0)                         182016 2011-04-20 19:41:33
5,1                 oracle@rh2.oracle.com (DIA0)                         182016 2011-04-20 19:41:33
4,1                 oracle@rh2.oracle.com (PING)                         182016 2011-04-20 19:41:33
3,1                 oracle@rh2.oracle.com (DIAG)                         182016 2011-04-20 19:41:33
2,1                 oracle@rh2.oracle.com (VKTM)                         182016 2011-04-20 19:41:33
1,1                 oracle@rh2.oracle.com (PMON)                         182016 2011-04-20 19:41:32
Worst possible total and average values of concurrent UGA memory usage:
35984688 bytes (total) and ~765631 bytes (average), for ~47 sessions.
Current value of PGA memory usage per session:
SID AND SERIAL#     USERNAME OR PROGRAM                                   VALUE SESSION START TIME
------------------- ------------------------------------------------ ---------- -------------------
29,465              SYS                                               177802680 2011-04-21 19:52:16
146,1               oracle@rh2.oracle.com (ARC3)                       36302056 2011-04-20 19:41:43
19,5                oracle@rh2.oracle.com (ARC2)                       35187944 2011-04-20 19:41:43
17,7                oracle@rh2.oracle.com (ARC0)                       32959720 2011-04-20 19:41:42
145,1               oracle@rh2.oracle.com (ARC1)                       19655912 2011-04-20 19:41:43
135,1               oracle@rh2.oracle.com (LGWR)                       13298920 2011-04-20 19:41:33
131,1               oracle@rh2.oracle.com (LMS0)                       11791592 2011-04-20 19:41:33
7,1                 oracle@rh2.oracle.com (LMS1)                       11791592 2011-04-20 19:41:33
6,1                 oracle@rh2.oracle.com (LMD0)                       11660520 2011-04-20 19:41:33
5,1                 oracle@rh2.oracle.com (DIA0)                       10398280 2011-04-20 19:41:33
10,1                oracle@rh2.oracle.com (DBW0)                        8530760 2011-04-20 19:41:33
140,1               oracle@rh2.oracle.com (MARK)                        8383720 2011-04-20 19:41:33
130,1               oracle@rh2.oracle.com (LMON)                        8056104 2011-04-20 19:41:33
3,1                 oracle@rh2.oracle.com (DIAG)                        6548712 2011-04-20 19:41:33
16,1                oracle@rh2.oracle.com (RSMN)                        5237992 2011-04-20 19:41:35
152,987             SYS                                                 4582632 2011-04-21 18:40:59
125,7               oracle@rh2.oracle.com (CJQ0)                        3935672 2011-04-20 19:41:50
25,57               SYS                                                 3787544 2011-04-21 19:56:59
136,1               oracle@rh2.oracle.com (SMON)                        3140840 2011-04-20 19:41:33
138,1               oracle@rh2.oracle.com (MMON)                        3066648 2011-04-20 19:41:33
150,5               oracle@rh2.oracle.com (Q000)                        2468424 2011-04-20 19:41:46
11,1                oracle@rh2.oracle.com (CKPT)                        2379816 2011-04-20 19:41:33
31,1                oracle@rh2.oracle.com (SMCO)                        2288872 2011-04-20 19:43:52
12,1                oracle@rh2.oracle.com (RECO)                        2223336 2011-04-20 19:41:33
128,1               oracle@rh2.oracle.com (DBRM)                        2092264 2011-04-20 19:41:33
23,1                oracle@rh2.oracle.com (Q001)                        1961192 2011-04-20 19:41:46
22,1                oracle@rh2.oracle.com (QMNC)                        1961192 2011-04-20 19:41:45
127,1               oracle@rh2.oracle.com (GEN0)                        1830120 2011-04-20 19:41:33
14,1                oracle@rh2.oracle.com (MMNL)                        1764584 2011-04-20 19:41:33
158,183             SYS                                                 1510840 2011-04-21 07:44:57
143,23              SYS                                                 1379768 2011-04-21 07:45:01
142,1               oracle@rh2.oracle.com (LCK0)                        1117272 2011-04-20 19:41:34
149,1               oracle@rh2.oracle.com (RCBG)                         912616 2011-04-20 19:41:45
33,59               oracle@rh2.oracle.com (W000)                         716008 2011-04-21 19:55:14
4,1                 oracle@rh2.oracle.com (PING)                         716008 2011-04-20 19:41:33
144,1               oracle@rh2.oracle.com (O000)                         650472 2011-04-20 19:41:36
137,1               oracle@rh2.oracle.com (RBAL)                         650472 2011-04-20 19:41:33
134,1               oracle@rh2.oracle.com (MMAN)                         650472 2011-04-20 19:41:33
133,1               oracle@rh2.oracle.com (RMS0)                         650472 2011-04-20 19:41:33
129,1               oracle@rh2.oracle.com (ACMS)                         650472 2011-04-20 19:41:33
126,1               oracle@rh2.oracle.com (PSP0)                         650472 2011-04-20 19:41:32
21,1                oracle@rh2.oracle.com (GTX0)                         650472 2011-04-20 19:41:45
18,3                oracle@rh2.oracle.com (O001)                         650472 2011-04-20 19:41:37
13,1                oracle@rh2.oracle.com (ASMB)                         650472 2011-04-20 19:41:33
9,1                 oracle@rh2.oracle.com (LMHB)                         650472 2011-04-20 19:41:33
2,1                 oracle@rh2.oracle.com (VKTM)                         650472 2011-04-20 19:41:33
1,1                 oracle@rh2.oracle.com (PMON)                         650472 2011-04-20 19:41:32
Current total and average values of concurrent PGA memory usage:
449247816 bytes (total) and ~9558464 bytes (average), for ~47 sessions.
Current value of UGA memory usage per session:
SID AND SERIAL#     USERNAME OR PROGRAM                                   VALUE SESSION START TIME
------------------- ------------------------------------------------ ---------- -------------------
152,987             SYS                                                 1641408 2011-04-21 18:40:59
138,1               oracle@rh2.oracle.com (MMON)                        1098800 2011-04-20 19:41:33
25,57               SYS                                                  950960 2011-04-21 19:56:59
125,7               oracle@rh2.oracle.com (CJQ0)                         919152 2011-04-20 19:41:50
136,1               oracle@rh2.oracle.com (SMON)                         722664 2011-04-20 19:41:33
150,5               oracle@rh2.oracle.com (Q000)                         640424 2011-04-20 19:41:46
23,1                oracle@rh2.oracle.com (Q001)                         378480 2011-04-20 19:41:46
12,1                oracle@rh2.oracle.com (RECO)                         314784 2011-04-20 19:41:33
29,465              SYS                                                  281144 2011-04-21 19:52:16
22,1                oracle@rh2.oracle.com (QMNC)                         281144 2011-04-20 19:41:45
149,1               oracle@rh2.oracle.com (RCBG)                         247504 2011-04-20 19:41:45
128,1               oracle@rh2.oracle.com (DBRM)                         247504 2011-04-20 19:41:33
33,59               oracle@rh2.oracle.com (W000)                         247504 2011-04-21 19:55:14
31,1                oracle@rh2.oracle.com (SMCO)                         247504 2011-04-20 19:43:52
158,183             SYS                                                  182016 2011-04-21 07:44:57
146,1               oracle@rh2.oracle.com (ARC3)                         182016 2011-04-20 19:41:43
145,1               oracle@rh2.oracle.com (ARC1)                         182016 2011-04-20 19:41:43
144,1               oracle@rh2.oracle.com (O000)                         182016 2011-04-20 19:41:36
143,23              SYS                                                  182016 2011-04-21 07:45:01
142,1               oracle@rh2.oracle.com (LCK0)                         182016 2011-04-20 19:41:34
140,1               oracle@rh2.oracle.com (MARK)                         182016 2011-04-20 19:41:33
137,1               oracle@rh2.oracle.com (RBAL)                         182016 2011-04-20 19:41:33
135,1               oracle@rh2.oracle.com (LGWR)                         182016 2011-04-20 19:41:33
134,1               oracle@rh2.oracle.com (MMAN)                         182016 2011-04-20 19:41:33
133,1               oracle@rh2.oracle.com (RMS0)                         182016 2011-04-20 19:41:33
131,1               oracle@rh2.oracle.com (LMS0)                         182016 2011-04-20 19:41:33
130,1               oracle@rh2.oracle.com (LMON)                         182016 2011-04-20 19:41:33
129,1               oracle@rh2.oracle.com (ACMS)                         182016 2011-04-20 19:41:33
127,1               oracle@rh2.oracle.com (GEN0)                         182016 2011-04-20 19:41:33
126,1               oracle@rh2.oracle.com (PSP0)                         182016 2011-04-20 19:41:32
21,1                oracle@rh2.oracle.com (GTX0)                         182016 2011-04-20 19:41:45
19,5                oracle@rh2.oracle.com (ARC2)                         182016 2011-04-20 19:41:43
18,3                oracle@rh2.oracle.com (O001)                         182016 2011-04-20 19:41:37
17,7                oracle@rh2.oracle.com (ARC0)                         182016 2011-04-20 19:41:42
16,1                oracle@rh2.oracle.com (RSMN)                         182016 2011-04-20 19:41:35
14,1                oracle@rh2.oracle.com (MMNL)                         182016 2011-04-20 19:41:33
13,1                oracle@rh2.oracle.com (ASMB)                         182016 2011-04-20 19:41:33
11,1                oracle@rh2.oracle.com (CKPT)                         182016 2011-04-20 19:41:33
10,1                oracle@rh2.oracle.com (DBW0)                         182016 2011-04-20 19:41:33
9,1                 oracle@rh2.oracle.com (LMHB)                         182016 2011-04-20 19:41:33
7,1                 oracle@rh2.oracle.com (LMS1)                         182016 2011-04-20 19:41:33
6,1                 oracle@rh2.oracle.com (LMD0)                         182016 2011-04-20 19:41:33
5,1                 oracle@rh2.oracle.com (DIA0)                         182016 2011-04-20 19:41:33
4,1                 oracle@rh2.oracle.com (PING)                         182016 2011-04-20 19:41:33
3,1                 oracle@rh2.oracle.com (DIAG)                         182016 2011-04-20 19:41:33
2,1                 oracle@rh2.oracle.com (VKTM)                         182016 2011-04-20 19:41:33
1,1                 oracle@rh2.oracle.com (PMON)                         182016 2011-04-20 19:41:32
Current total and average values of concurrent UGA memory usage:
14225504 bytes (total) and ~302670 bytes (average), for ~47 sessions.
Current SGA structure sizings:
Total System Global Area  939495424 bytes
Fixed Size                  2232088 bytes
Variable Size             398459112 bytes
Database Buffers          532676608 bytes
Redo Buffers                6127616 bytes
Some initialization parameter values at instance startup:
large_pool_size=0
pga_aggregate_target=0
sga_target=0
shared_pool_size=0
sort_area_size=65536
streams_pool_size=0
Current Time: 2011.04.21-19:57:16
/* 可以从以上输出看到sid,serial=29,465会话的PGA内存使用量异常,达到了170M,
虽然跟以上案例中的PGA泄露情况比较不算什么  */
/* 使用sid和serial定位到具体的操作系统进程号 */
SQL> select spid,pid,PGA_USED_MEM,PGA_MAX_MEM from v$process
2  where addr=(select paddr from v$session where sid=&1 and serial#=&2);
Enter value for 1: 29
Enter value for 2: 465
old   2: where addr=(select paddr from v$session where sid=&1 and serial#=&2)
new   2: where addr=(select paddr from v$session where sid=29 and serial#=465)
SPID                            PID PGA_USED_MEM PGA_MAX_MEM
------------------------ ---------- ------------ -----------
26932                            48    129716228   130034996
1 row selected.
SQL> oradebug setospid 26932;
Oracle pid: 48, Unix process pid: 26932, image: oracle@rh2.oracle.com (TNS V1-V3)
SQL> oradebug dump heapdump 536870917;
Statement processed.
SQL> oradebug dump processstate 10;
Statement processed.
SQL> oradebug tracefile_name;
/s01/orabase/diag/rdbms/prod/PROD1/trace/PROD1_ora_26932.trc
/* 接下来对堆转储文件进行分析,通过grep可以找出其中较大的SubHEAP子堆  */
[oracle@rh2 ~]$ egrep "HEAP DUMP heap name|Total heap size|Permanent space"
/s01/orabase/diag/rdbms/prod/PROD1/trace/PROD1_ora_26932.trc
HEAP DUMP heap name="session heap"  desc=0x2ac0b37a67f8
Total heap size    =   130840
Permanent space    =    62680
HEAP DUMP heap name="Alloc environm"  desc=0x2ac0b37ce090
Total heap size    =     4040
Permanent space    =     4040
HEAP DUMP heap name="PLS UGA hp"  desc=0x2ac0b37be7f0
Total heap size    =     1992
Permanent space    =     1080
HEAP DUMP heap name="koh-kghu sessi"  desc=0x2ac0b37cf660
Total heap size    =     1128
Permanent space    =       80
HEAP DUMP heap name="pga heap"  desc=0xb7c8ba0
Total heap size    =  2689432
Permanent space    =   660560
HEAP DUMP heap name="Alloc environm"  desc=0x2ac0b35ba5c8
Total heap size    =  1706816
Permanent space    =      464
HEAP DUMP heap name="Alloc server h"  desc=0x2ac0b35b9000
Total heap size    =  1704400
Permanent space    =  1694816
HEAP DUMP heap name="diag pga"  desc=0x2ac0b32537e0
Total heap size    =    65448
Permanent space    =     3672
HEAP DUMP heap name="KFK_IO_SUBHEAP"  desc=0x2ac0b35eb2b0
Total heap size    =    10992
Permanent space    =       80
HEAP DUMP heap name="peshm.c:Proces"  desc=0x2ac0b35e7ad0
Total heap size    =     4000
Permanent space    =       80
HEAP DUMP heap name="KSFQ heap"  desc=0x2ac0b35c6d70
Total heap size    =     3256
Permanent space    =     3256
HEAP DUMP heap name="top call heap"  desc=0xb7ce3c0
Total heap size    =155918560
Permanent space    =      448
HEAP DUMP heap name="callheap"  desc=0xb7cd4c0
Total heap size    =152906784
Permanent space    =       80
HEAP DUMP heap name="TCHK^30c42b7a"  desc=0x2ac0b378ff48
Total heap size    =151414512
Permanent space    =       80
HEAP DUMP heap name="kggec.c.kggfa"  desc=0x2ac0b4e76ec8
Total heap size    =     1016
Permanent space    =      736
HEAP DUMP heap name="kxs-heap-c"  desc=0x2ac0b37800c0
Total heap size    =  1489464
Permanent space    =  1485928
HEAP DUMP heap name="top uga heap"  desc=0xb7ce5e0
Total heap size    =   131024
Permanent space    =       80
HEAP DUMP heap name="session heap"  desc=0x2ac0b37a67f8
Total heap size    =   130840
Permanent space    =    62680
HEAP DUMP heap name="Alloc environm"  desc=0x2ac0b37ce090
Total heap size    =     4040
Permanent space    =     4040
HEAP DUMP heap name="PLS UGA hp"  desc=0x2ac0b37be7f0
Total heap size    =     1992
Permanent space    =     1080
HEAP DUMP heap name="koh-kghu sessi"  desc=0x2ac0b37cf660
Total heap size    =     1128
Permanent space    =       80
HEAP DUMP heap name="SQLA^30c42b7a"  desc=0x6f4c3ab8
Total heap size    =  4919904
Permanent space    =       80
HEAP DUMP heap name="KGLH0^30c42b7a"  desc=0x6ef44290
Total heap size    =     4032
Permanent space    =     2648
/* 以上heapdump表明TCHK^30c42b7a子堆占用了PGA中绝大多数的内存,
其中子堆的包含结构为PGA->top call heap -> call heap  -> TCHK 
*/
/* 接着processstate dump还可以让我们了解该问题进程的最近活动历史,及之前所运行的SQL语句
以便进一步诊断,以下为其调用堆栈      */
ksedsts()+461<-ksdxfstk()+32<-ksdxcb()+1900<-sspuser()+112<-__sighandler()<-qcsfccc()+206
<-qcsIsColInFro()+309<-qcsRslvColWithinQbc()+179<-qcsWeakColRslv()+94<-qcsRslvName()+2541<-qcsridn()+105
<-qcsraic()+455<-qcspqbDescendents()+527<-qcspqb()+260<-qcspqbDescendents()+2744<-qcspqb()+260
<-kkmdrv()+182<-opiSem()+1947<-opiprs()+293<-__PGOSF632_kksParseChildCursor()+572<-rpiswu2()+1618
<-kksLoadChild()+5167<-kxsGetRuntimeLock()+2066<-kksfbc()+14527<-kkspsc0()+2025<-kksParseCursor()+144
<-opiosq0()+2027<-kpooprx()+274<-kpoal8()+800<-opiodr()+910<-ttcpip()+2289<-opitsk()+1670<-opiino()+966
<-opiodr()+910<-opidrv()+570<-sou2o()+103<-opimai_real()+133<-ssthrdmain()+252<-main()+201<-__libc_start_main()+244
<-_start()+36

通过以上获取的信息"call heap-> TCHK"可以从MOS查找到11.2上的”Bug 11782790: ORA-4030 IS GENERATED IN HARD PARSE"和"Bug 12360198: ORA-04030 (TCHK^82665CD8,CHEDEF : QCUATC)";虽然我们这里故意为之而非真实的Bug,也可以看出"TCHK subheap"在11.2中充当SQL解析时的临时调用子堆(parse call subheap)。

to be continued.....

ORA-07274 spdcr:access error错误一例

下午在客户office遭遇了ORA-07274错误,那是一套AIX 5.3上10.2.0.4的2节点RAC系统,乍看到7274还以为是ORA-7445的孪生兄弟,心中暗想10.2.0.4的RAC问题还真是不少:

Errors in file /s01/oracle/product/10.2.0/db_1/rdbms/log/prod21_psp0_4272.trc:
ORA-07274: spdcr: access error, access to oracle denied.
No such file or directory
Mon Apr 18 17:05:44 2011
PSP0: terminating instance due to error 7274
Mon Apr 18 17:05:44 2011
Errors in file /s01/oracle/product/10.2.0/db_1/rdbms/log/prod21_lms1_4285.trc:
ORA-07274: spdcr: access error, access to oracle denied.
Mon Apr 18 17:05:44 2011
Errors in file /s01/oracle/product/10.2.0/db_1/rdbms/log/prod21_lms0_4279.trc:
ORA-07274: spdcr: access error, access to oracle denied.
Mon Apr 18 17:05:44 2011
Errors in file /s01/oracle/product/10.2.0/db_1/rdbms/log/prod21_lmd0_4276.trc:
ORA-07274: spdcr: access error, access to oracle denied.
Mon Apr 18 17:05:44 2011
Errors in file /s01/oracle/product/10.2.0/db_1/rdbms/log/prod21_lmon_4274.trc:
ORA-07274: spdcr: access error, access to oracle denied.
Mon Apr 18 17:05:45 2011
System state dump is made for local instance
System State dumped to trace file /s01/oracle/product/10.2.0/db_1/rdbms/log/prod21_diag_4270.trc
Mon Apr 18 17:05:45 2011
Trace dumping is performing id=[cdmp_20110418170545]
Mon Apr 18 17:05:46 2011
Instance terminated by PSP0, pid = 4272
====/s01/oracle/product/10.2.0/db_1/rdbms/log/prod21_psp0_4272.trc======
error 7274 detected in background process
ORA-07274: spdcr: access error, access to oracle denied.
ksuitm: waiting up to [5] seconds before killing DIAG
[oracle@vrh1 ~]$ oerr ora 7274
07274, 00000, "spdcr: access error, access to oracle denied."
// *Cause:  Unable to access "oracle" program. Verify ?/bin/oracle or
//          $ORABCKPRG exist, and are executable.
// *Action: Check errno returned.

以上ORA-07274错误直接导致RAC的2号节点实例意外终止!

引起ORA-07274错误的唯一直接原因是$ORACLE_HOME/bin目录下的oracle 2进制文件在错误发生时无法正常访问,可能造成该重要文件无法访问的原因可能有:

1.文件系统被umount或者ORACLE_HOME所在磁盘出现错误
2.oracle 2进制文件的权限设置错误,其正确权限应为6751
3.oracle 2进制文件被意外删除;当然relink oracle程序需要短暂移动该文件,所以也可能造成以上问题

在发生以上ORA-07274错误后我们第一时间使用ls命令查看了oracle 2进制文件的状态:

[oracle@vrh1 bin]$ ls -l $ORACLE_HOME/bin/oracle     
-rwsr-s--x 1 oracle oinstall 96789226 Apr 18 17:03 /s01/oracle/product/10.2.0/db_1/bin/oracle
[oracle@vrh1 bin]$ stat $ORACLE_HOME/bin/oracle
File: `/s01/oracle/product/10.2.0/db_1/bin/oracle'
Size: 96789226        Blocks: 189248     IO Block: 4096   regular file
Device: fd00h/64768d    Inode: 4434759     Links: 1
Access: (6751/-rwsr-s--x)  Uid: (  500/  oracle)   Gid: (  500/oinstall)
Access: 2011-04-18 17:04:55.000000000 -0400
Modify: 2011-04-18 17:03:44.000000000 -0400
Change: 2011-04-18 17:25:09.000000000 -0400
[oracle@vrh1 bin]$ date
Mon Apr 18 17:25:30 EDT 2011

并发现该文件在近期内被修改过(Modify: 2011-04-18 17:03:44.000000000 -0400),首先我想到的是可能有人在实例启动的情况下重新编译了oracle 2进制程序,但虽然的调查发现$ORACLE_HOME下大部分的目录修改时间(Modify time)都变成了最近。

这个时候一副”案件现场”已经呈现在我脑海中了,很明显是有人误删除了$ORACLE_HOME下的文件,在发现是误操作后立即终止了rm命令,但这个时候Oracle instance已经因为oracle 2进制程序文件丢失而意外终止terminated了,为了让现场恢复到原样,其又试图通过将1号节点上的Oracle Software传输到2号节点上来解决问题。

为了进一步证明我的想法,我求助于OS管理人员,希望通过传说中的系统命令审计来”锁定”问题。OS管理人员打开一个文本文件同我一起开始review起来,但在我看来这个审计文件的信息似乎过于简单了,只记录操作用户名下运行过的shell命令;一开始我们浏览了root用户的审计记录,没有发现任何问题;在准备查看oracle用户的审计文件时,系统管理人员才告之我,我们在查看的审计文件其实是用户home目录下的.sh_history,天哪!这就是传说中的审计文件,真神奇!

一般来说UNIX上的shell history只保留最近1000条的命令记录,为了不让这些唯一的证据丢失,我们首先将该.sh_history复制了一份,之后review历史记录文件后发现oracle用户执行过如下命令:

..............
rm -rf *
rcp -r node:/s01/oracle/product/10.2.0/db_1/* ./

rm -rf命令的确应当成为DBA的禁忌,无论是旧历江湖的老鸟还是初出茅庐的嫩头青都可能栽在rm手上。根据抽屉原理,最坏的情况总会发生,而我们又无法真正绝对意义上杜绝对rm命令的使用(我之前的文章介绍过由hostname误操作修改了RAC中一个节点的主机名的例子,这样的情况推荐使用uname -n来获取主机名字,可以杜绝使用hostname),那么我们真的很有必要在敲下rm命令前思考那么几秒,或者请另一位头脑清醒的仁兄帮自己确认一下,当然常年在船上走没有不湿鞋的,特别是当操作者处在一种panic模式的情景中时。

这个case因为oracle用户的密码只有少数几个人拥有,所以之后很容易能定位到责任人。多年之前在我第一家服务的公司中,office中到处贴着各种IT从业人员应当知晓的守则,其中就有一条”Passwords are like bubble gum”,是的密码就像是吹泡泡糖!

Restarting dead background process QMNC一例

一套10.2.0.5 的数据库,告警日志中反复出现”Restarting dead background process QMNC”的信息,具体日志如下:

 

 

Thu Jun 07 10:38:28 GMT 2012
Restarting dead background process QMNC
QMNC started with pid=271, OS id=17059
Thu Jun 07 10:43:08 GMT 2012
Restarting dead background process QMNC
QMNC started with pid=234, OS id=18263

 

但是通过ps 命令查看 重启的QMNC进程很快又消失掉,

Metalink Note Restarting dead background process QMNC recorded in the alert.log file [ID 395137.1]说明了该问题

 

Applies to:
Oracle Server - Enterprise Edition - Version: 10.1.0.2 to 11.2.0.2 - Release: 10.1 to 11.2
Information in this document applies to any platform.
Checked for currency 06-Sep-10
Symptoms
The following messages are recorded in the alert log every 5 minutes
Restarting dead background process QMNC
Changes
Cause
1. database startup :
The database has not opened for an undetermined reason, i.e there is no statement:
Completed: ALTER DATABASE OPEN
recorded in the alert.log file.
2. database shutdown :
A process may be preventing the database from shutting down
cleanly. As a consequence, this may be causing the qmnc process problems.
Solution
This is expected behaviour.
The reason why the database has not opened or shutdown should be investigated.
These messages are a symptom of the problem and not the underlying cause.

 

该问题除了会在 alert.log中每五分钟生成一次 “Restarting dead background process QMNC”的信息外,不会造成其他影响,一般可以忽略。

 

该警告信息可能由于ORACLE INSTANCE未能正常关闭引起(即shutdown不是clean干净的), 查询之前的alert.log 发现有一个EMN0进程在上一次shutdown immediate时被启动,且一直存在着:

 

Starting background process EMN0
Wed Jun 06 19:30:26 GMT 2012
Shutting down instance: further logons disabled
EMN0 started with pid=47, OS id=322

 

EMN0是一个高级队列advanced Queuing相关的进程,该进程不是关键后台进程 , 必须时可以被 KILL掉

 

The EMNO process is new to Oracle8i, and has to do with improvements made to Advanced
Queuing, specifically:

A combination of features are introduced to allow a publish/subscribe style of  messaging
between applications. These features include rule-based subscribers,
message propagation, the listen feature and notification capabilities.

OCI functions are available to register a callback for message notification.  This
interface is only valid for the asynchronous mode of message delivery. In this mode, a
subscriber issues a registration call which specifies a callback.  When messages are
received that match the subscription criteria, the callback
is invoked.  A new background process event monitor (EMON) is used to notify the
subscriber.  This process is started automatically when the first notification is issued
for the instance.  The application may then issue an explicit message_receive(dequeue) to
retrieve the message.

This functionality is triggered with the OCI call OCISubscriptionRegister. OCI clients
can use the new call OCISubscriptionRegister to register a callback for
message notification. The client issues a registration call which specifies a
subscription name and a callback. When messages for the subscription are received, the
callback is invoked. The callback may then issue an explicit dequeue to retrieve the
message.

Oracle内部错误ORA-00600:[2667]一例

一套Power AIX上的9.2.0.1系统在数据库打开过程中遇到ORA-00600:[2667]内部错误,详细日志如下:

 

如果自己搞不定可以找诗檀软件专业ORACLE数据库修复团队成员帮您恢复!

诗檀软件专业数据库修复团队

服务热线 : 13764045638   QQ号:47079569    邮箱:service@parnassusdata.com

 

Wed Mar 9 19:03:38 2011
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 117699122479
Resetting resetlogs activation ID 2197911857 (0x83017931)
Wed Mar 9 19:03:47 2011
LGWR: Primary database is in CLUSTER CONSISTENT mode
Assigning activation ID 2284878888 (0x88307c28)
Thread 1 opened at log sequence 1
Current log# 3 seq# 1 mem# 0: /s01/maclean/oradata/PROD/redo03.log
Successful open of redo thread 1.
Wed Mar 9 19:03:47 2011
SMON: enabling cache recovery
Wed Mar 9 19:03:47 2011
Errors in file /s01/maclean/admin/PROD/udump/PROD_ora_585914.trc:
ORA-07445: exception encountered: core dump [] [] [] [] [] []
Wed Mar 9 19:03:48 2011
Errors in file /s01/maclean/admin/PROD/bdump/PROD_pmon_602286.trc:
ORA-07445: exception encountered: core dump [] [] [] [] [] []
Wed Mar 9 19:03:50 2011
Errors in file /s01/maclean/admin/PROD/bdump/PROD_lgwr_548884.trc:
ORA-00600: internal error code, arguments: [2667], [1], [1], [3], [1739273391], [1739273391], [1739273391], [7267]
LGWR: terminating instance due to error 600
相关的初始化参数
fast_start_mttr_target = 300
_allow_resetlogs_corruption= TRUE
undo_management = AUTO
undo_tablespace = UNDOTBS1

以上可以看到lgwr关键进程在数据库open后几秒后遭遇了ORA-00600:[2667]内部错误后终止了实例。
该数据库在之前因为丢失当前日志文件进行了已经实施了一系列的非常规恢复操作,包括设置一系列的underscore参数:

Before I provide the steps to fix the ora-00600 error, I want to tell you that this database is opened with the unsupported parameter
"allow_resetlogs_corruption".
*************************************************************************
* By forcing open the database using this parameter, there is a strong *
* likelihood of logical corruption, possibly affecting the data *
* dictionary. Oracle does not guarantee that all of the data will be *
* accessible nor will it support a database that has been opened by *
* this method and that the database users will be allowed to continue *
* work. All this does is provide a way to get at the contents of the *
* database for extraction, usually by export. It is up to you to *
* determine the amount of lost data and to correct any logical *
* corruption issues. *
* *
*************************************************************************
2) The steps to get rid of the ora-00600 are as follows:
+ Change UNDO_MANAGEMENT=AUTO to
UNDO_MANAGEMENT=MANUAL
+ Remove or comment out UNDO_TABLESPACE and UNDO_RETENTION.
+ Add
_CORRUPTED_ROLLBACK_SEGMENTS =(comma separated list of Automatic Undo segments)
Example:
_CORRUPTED_ROLLBACK_SEGMENTS = (_SYSSMU1$, _SYSSMU2$, _SYSSMU3$, _SYSSMU4$,
_SYSSMU5$, _SYSSMU6$, _SYSSMU7$, _SYSSMU8$, _SYSSMU9$, _SYSSMU10$)
Note, sometimes the alert log will tell you what Automatic Undo segments are in use. 
Search the alert log for SYSS. If the alert log does not contain that information then use _SYSSMU1$ 
through _SYSSMU10$ as shown in the example above.
In UNIX you can issue this command to get the undo segment names:
$ strings system01.dbf | grep _SYSSMU | cut -d $ -f 1 | sort -u
From the output of the strings command above, add a $ to end of each _SYSSMU undo segment name.
++ Startup mount the database as follows:
SQL > startup mount
SQL > recover database;
SQl > alter database open;
*._corrupted_rollback_segments= (_SYSSMU730$, _SYSSMU731$, _SYSSMU732$, _SYSSMU733$, _SYSSMU734$, 
_SYSSMU735$, _SYSSMU736$, _SYSSMU737$, _SYSSMU738$, _SYSSMU739$, _SYSSMU744$, _SYSSMU740$, _SYSSMU741$, 
_SYSSMU742$, _SYSSMU743$, _SYSSMU744$, _SYSSMU745$, _SYSSMU746$, _SYSSMU747$, _SYSSMU748$, _SYSSMU749$, _SYSSMU74t$, 
_SYSSMU75$, _SYSSMU750$, _SYSSMU751$, _SYSSMU752$, _SYSSMU753$, _SYSSMU754$, _SYSSMU755$, _SYSSMU756$, _SYSSMU757$, 
_SYSSMU758$, _SYSSMU759$, _SYSSMU76$, _SYSSMU760$, _SYSSMU761$, _SYSSMU762$, _SYSSMU763$, _SYSSMU764$, _SYSSMU765$, 
_SYSSMU766$, _SYSSMU767$, _SYSSMU768$)

但在完成以上设置后仍不能避免ora-00600[2667]的发生,下位errpt日志中磁盘阵列损坏信息:

errpt -a
---------------------------------------------------------------------------
LABEL: FCP_ARRAY_ERR6
IDENTIFIER: B9735AF4
Date/Time: Thu Mar 10 19:01:51 THAIST 2011
Sequence Number: 106800
Machine Id: 000786F9D600
Node Id: p550su
Class: H
Type: PERM
Resource Name: hdisk5
Resource Class: disk
Resource Type: array
Location: U787B.001.DNWGK9Y-P1-C4-T1-W200500A0B8484C21-L1000000000000
Description
SUBSYSTEM COMPONENT FAILURE
Probable Causes
ARRAY DASD MEDIA
POWER OR FAN COMPONENT
Failure Causes
ARRAY DASD MEDIA
POWER OR FAN COMPONENT
Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
Detail Data
SENSE DATA
0600 0308 0000 FF00 0000 0004 0000 0000 0000 0000 0000 0000 0000 0000 7000 0600
0000 0098 0000 0000 3FC6 0600 0000 0000 0000 0000 0000 D544 0000 0000 0000 0000
0008 5000 0000 0000 0000 0000 0000 0000 0000 5347 3830 3730 3033 3339 2020 2020
2020 0660 2200 0001 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0005 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 5B8C 32C1 3033 3130 3131 2F30 3630 3533 3000 0000 0000 0000 0000 0000
0000 0000 526D 6000 F205 3703 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001
0000 0000
---------------------------------------------------------------------------
LABEL: FCP_ARRAY_ERR6
IDENTIFIER: B9735AF4
Date/Time: Thu Mar 10 19:01:51 THAIST 2011
Sequence Number: 106799
Machine Id: 000786F9D600
Node Id: p550su
Class: H
Type: PERM
Resource Name: hdisk5
Resource Class: disk
Resource Type: array
Location: U787B.001.DNWGK9Y-P1-C4-T1-W200500A0B8484C21-L1000000000000
Description
SUBSYSTEM COMPONENT FAILURE
Probable Causes
ARRAY DASD MEDIA
POWER OR FAN COMPONENT
Failure Causes
ARRAY DASD MEDIA
POWER OR FAN COMPONENT
Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
Detail Data
SENSE DATA
0600 0308 0000 FF00 0000 0004 0000 0000 0000 0000 0000 0000 0000 0000 7000 0600
0000 0098 0000 0000 3FC6 0600 0000 0000 0000 0000 0000 D524 0000 0000 0000 0000
0008 5000 0000 0000 0000 0000 0000 0000 0000 5347 3830 3730 3033 3339 2020 2020
2020 0660 2200 0001 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0005 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 5B8C 32C0 3033 3130 3131 2F30 3630 3533 3000 0000 0000 0000 0000 0000
0000 0000 526D 6000 F205 3703 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001
0000 0000
---------------------------------------------------------------------------
LABEL: FCP_ARRAY_ERR10
IDENTIFIER: C86ACB7E
Date/Time: Thu Mar 10 19:01:51 THAIST 2011
Sequence Number: 106798
Machine Id: 000786F9D600
Node Id: p550su
Class: H
Type: INFO
Resource Name: dac0
Resource Class: array
Resource Type: ibm-dac-V4
Location: U787B.001.DNWGK9Y-P1-C4-T1-W200500A0B8484C21
VPD:
Manufacturer................IBM
Machine Type and Model......1814 FAStT
Part Number.................24288-00
ROS Level and ID............0916
Description
ARRAY CONFIGURATION CHANGED
Probable Causes
ARRAY CONTROLLER
CABLES AND CONNECTIONS
Failure Causes
ARRAY CONTROLLER
CABLES AND CONNECTIONS
Recommended Actions
NO ACTION NECESSARY
Detail Data
SENSE DATA
0600 0308 0000 FF00 0000 0004 0000 0000 0000 0000 0000 0000 0000 0000 7000 0600
0000 0098 0000 0000 9502 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0008 1800 0000 0000 0000 0000 0000 0000 0000 5347 3830 3730 3033 3339 2020 2020
2020 0660 2200 0001 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0005 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 5B8C 32BF 3033 3130 3131 2F30 3630 3533 3000 0000 0000 0000 0000 0000
0000 0000 526D 6000 F205 3703 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001
0000 0000
---------------------------------------------------------------------------
LABEL: FCP_ARRAY_ERR6
IDENTIFIER: B9735AF4
Date/Time: Thu Mar 10 19:01:47 THAIST 2011
Sequence Number: 106797
Machine Id: 000786F9D600
Node Id: p550su
Class: H
Type: PERM
Resource Name: hdisk4
Resource Class: disk
Resource Type: array
Location: U787B.001.DNWGK9Y-P1-C4-T1-W200500A0B8484C21-L0
Description
SUBSYSTEM COMPONENT FAILURE
Probable Causes
ARRAY DASD MEDIA
POWER OR FAN COMPONENT
Failure Causes
ARRAY DASD MEDIA
POWER OR FAN COMPONENT
Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
Detail Data
SENSE DATA
0600 0308 0000 FF00 0000 0004 0000 0000 0000 0000 0000 0000 0000 0000 7000 0600
0000 0098 0000 0000 3FC6 0600 0000 0000 0000 0000 0000 D544 0000 0000 0000 0000
0008 5000 0000 0000 0000 0000 0000 0000 0000 5347 3830 3730 3033 3339 2020 2020
2020 0660 2200 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0005 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 5B8C 3215 3033 3130 3131 2F30 3630 3532 3600 0000 0000 0000 0000 0000
0000 0000 526D 6000 F205 3703 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001
0000 0000
---------------------------------------------------------------------------
LABEL: FCP_ARRAY_ERR6
IDENTIFIER: B9735AF4
Date/Time: Thu Mar 10 19:01:47 THAIST 2011
Sequence Number: 106796
Machine Id: 000786F9D600
Node Id: p550su
Class: H
Type: PERM
Resource Name: hdisk4
Resource Class: disk
Resource Type: array
Location: U787B.001.DNWGK9Y-P1-C4-T1-W200500A0B8484C21-L0
Description
SUBSYSTEM COMPONENT FAILURE
Probable Causes
ARRAY DASD MEDIA
POWER OR FAN COMPONENT
Failure Causes
ARRAY DASD MEDIA
POWER OR FAN COMPONENT
Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
Detail Data
SENSE DATA
0600 0308 0000 FF00 0000 0004 0000 0000 0000 0000 0000 0000 0000 0000 7000 0600
0000 0098 0000 0000 3FC6 0600 0000 0000 0000 0000 0000 D524 0000 0000 0000 0000
0008 5000 0000 0000 0000 0000 0000 0000 0000 5347 3830 3730 3033 3339 2020 2020
2020 0660 2200 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0005 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 5B8C 3214 3033 3130 3131 2F30 3630 3532 3600 0000 0000 0000 0000 0000
0000 0000 526D 6000 F205 3703 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001
0000 0000
---------------------------------------------------------------------------
LABEL: FCP_ARRAY_ERR10
IDENTIFIER: C86ACB7E
Date/Time: Thu Mar 10 19:01:47 THAIST 2011
Sequence Number: 106795
Machine Id: 000786F9D600
Node Id: p550su
Class: H
Type: INFO
Resource Name: dac0
Resource Class: array
Resource Type: ibm-dac-V4
Location: U787B.001.DNWGK9Y-P1-C4-T1-W200500A0B8484C21
VPD:
Manufacturer................IBM
Machine Type and Model......1814 FAStT
Part Number.................24288-00
ROS Level and ID............0916
Description
ARRAY CONFIGURATION CHANGED
Probable Causes
ARRAY CONTROLLER
CABLES AND CONNECTIONS
Failure Causes
ARRAY CONTROLLER
CABLES AND CONNECTIONS
Recommended Actions
NO ACTION NECESSARY
Detail Data
SENSE DATA
0600 0308 0000 FF00 0000 0004 0000 0000 0000 0000 0000 0000 0000 0000 7000 0600
0000 0098 0000 0000 9502 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0008 1800 0000 0000 0000 0000 0000 0000 0000 5347 3830 3730 3033 3339 2020 2020
2020 0660 2200 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0005 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 5B8C 3213 3033 3130 3131 2F30 3630 3532 3600 0000 0000 0000 0000 0000
0000 0000 526D 6000 F205 3703 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001
0000 0000
---------------------------------------------------------------------------
LABEL: FCP_ARRAY_ERR4
IDENTIFIER: D5385D18
Date/Time: Thu Mar 10 18:49:23 THAIST 2011
Sequence Number: 106794
Machine Id: 000786F9D600
Node Id: p550su
Class: H
Type: TEMP
Resource Name: hdisk5
Resource Class: disk
Resource Type: array
Location: U787B.001.DNWGK9Y-P1-C4-T1-W200500A0B8484C21-L1000000000000
Description
ARRAY OPERATION ERROR
Probable Causes
ARRAY DASD MEDIA
ARRAY DASD DEVICE
Failure Causes
DASD MEDIA
DISK DRIVE
Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
Detail Data
SENSE DATA
0A00 2800 4644 2F00 0003 0004 0000 0000 0000 0000 0007 7497 0200 0B00 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 5262 E000 F205 3703 0000 0200 0000 0000 0000 0000 0000 0002 0000 0010
0000 0000
---------------------------------------------------------------------------
LABEL: FCP_ARRAY_ERR4
IDENTIFIER: D5385D18
Date/Time: Thu Mar 10 18:49:23 THAIST 2011
Sequence Number: 106793
Machine Id: 000786F9D600
Node Id: p550su
Class: H
Type: TEMP
Resource Name: hdisk5
Resource Class: disk
Resource Type: array
Location: U787B.001.DNWGK9Y-P1-C4-T1-W200500A0B8484C21-L1000000000000
Description
ARRAY OPERATION ERROR
Probable Causes
ARRAY DASD MEDIA
ARRAY DASD DEVICE
Failure Causes
DASD MEDIA
DISK DRIVE
Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
Detail Data
SENSE DATA
0A00 2A00 132E C948 0000 0804 0000 0000 0000 0000 0007 7497 0200 0B00 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 5262 E000 F205 3703 0000 0200 0000 0000 0000 0000 0000 0002 0000 0010
0000 0000
---------------------------------------------------------------------------
LABEL: FCP_ARRAY_ERR4
IDENTIFIER: D5385D18
Date/Time: Thu Mar 10 18:49:23 THAIST 2011
Sequence Number: 106792
Machine Id: 000786F9D600
Node Id: p550su
Class: H
Type: TEMP
Resource Name: hdisk5
Resource Class: disk
Resource Type: array
Location: U787B.001.DNWGK9Y-P1-C4-T1-W200500A0B8484C21-L1000000000000
Description
ARRAY OPERATION ERROR
Probable Causes
ARRAY DASD MEDIA
ARRAY DASD DEVICE
Failure Causes
DASD MEDIA
DISK DRIVE
Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
Detail Data
SENSE DATA
0A00 2A00 1335 55F0 0000 0804 0000 0000 0000 0000 0007 7497 0200 0B00 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 5262 E000 F205 3703 0000 0200 0000 0000 0000 0000 0000 0002 0000 0010
0000 0000
---------------------------------------------------------------------------
LABEL: FCP_ARRAY_ERR4
IDENTIFIER: D5385D18
Date/Time: Thu Mar 10 18:49:23 THAIST 2011
Sequence Number: 106791
Machine Id: 000786F9D600
Node Id: p550su
Class: H
Type: TEMP
Resource Name: hdisk5
Resource Class: disk
Resource Type: array
Location: U787B.001.DNWGK9Y-P1-C4-T1-W200500A0B8484C21-L1000000000000
Description
ARRAY OPERATION ERROR
Probable Causes
ARRAY DASD MEDIA
ARRAY DASD DEVICE
Failure Causes
DASD MEDIA
DISK DRIVE
Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
Detail Data
SENSE DATA
0A00 2A00 136C 0F98 0000 0804 0000 0000 0000 0000 0007 7497 0200 0B00 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 5262 E000 F205 3703 0000 0200 0000 0000 0000 0000 0000 0002 0000 0010
0000 0000
---------------------------------------------------------------------------
LABEL: FCP_ARRAY_ERR4
IDENTIFIER: D5385D18
Date/Time: Thu Mar 10 18:49:23 THAIST 2011
Sequence Number: 106790
Machine Id: 000786F9D600
Node Id: p550su
Class: H
Type: TEMP
Resource Name: hdisk5
Resource Class: disk
Resource Type: array
Location: U787B.001.DNWGK9Y-P1-C4-T1-W200500A0B8484C21-L1000000000000
Description
ARRAY OPERATION ERROR
Probable Causes
ARRAY DASD MEDIA
ARRAY DASD DEVICE
Failure Causes
DASD MEDIA
DISK DRIVE
Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
Detail Data
SENSE DATA
0A00 2A00 13D7 29B0 0000 0804 0000 0000 0000 0000 0007 7497 0200 0B00 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 5262 E000 F205 3703 0000 0200 0000 0000 0000 0000 0000 0002 0000 0010
0000 0000
---------------------------------------------------------------------------
LABEL: FCP_ARRAY_ERR4
IDENTIFIER: D5385D18
Date/Time: Thu Mar 10 18:49:23 THAIST 2011
Sequence Number: 106789
Machine Id: 000786F9D600
Node Id: p550su
Class: H
Type: TEMP
Resource Name: hdisk5
Resource Class: disk
Resource Type: array
Location: U787B.001.DNWGK9Y-P1-C4-T1-W200500A0B8484C21-L1000000000000
Description
ARRAY OPERATION ERROR
Probable Causes
ARRAY DASD MEDIA
ARRAY DASD DEVICE
Failure Causes
DASD MEDIA
DISK DRIVE
Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
Detail Data
SENSE DATA
0A00 2A00 1337 FD88 0000 0804 0000 0000 0000 0000 0007 7497 0200 0B00 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 5262 E000 F205 3703 0000 0200 0000 0000 0000 0000 0000 0002 0000 0010
0000 0000
---------------------------------------------------------------------------
LABEL: FCP_ARRAY_ERR4
IDENTIFIER: D5385D18
Date/Time: Thu Mar 10 18:49:23 THAIST 2011
Sequence Number: 106788
Machine Id: 000786F9D600
Node Id: p550su
Class: H
Type: TEMP
Resource Name: hdisk5
Resource Class: disk
Resource Type: array
Location: U787B.001.DNWGK9Y-P1-C4-T1-W200500A0B8484C21-L1000000000000
Description
ARRAY OPERATION ERROR
Probable Causes
ARRAY DASD MEDIA
ARRAY DASD DEVICE
Failure Causes
DASD MEDIA
DISK DRIVE
Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
Detail Data
SENSE DATA
0A00 2A00 136C 1F08 0000 0804 0000 0000 0000 0000 0007 7497 0200 0B00 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 5262 E000 F205 3703 0000 0200 0000 0000 0000 0000 0000 0002 0000 0010
0000 0000
---------------------------------------------------------------------------
LABEL: FCP_ARRAY_ERR4
IDENTIFIER: D5385D18
Date/Time: Thu Mar 10 18:49:23 THAIST 2011
Sequence Number: 106787
Machine Id: 000786F9D600
Node Id: p550su
Class: H
Type: TEMP
Resource Name: hdisk5
Resource Class: disk
Resource Type: array
Location: U787B.001.DNWGK9Y-P1-C4-T1-W200500A0B8484C21-L1000000000000
Description
ARRAY OPERATION ERROR
Probable Causes
ARRAY DASD MEDIA
ARRAY DASD DEVICE
Failure Causes
DASD MEDIA
DISK DRIVE
Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
Detail Data
SENSE DATA
0A00 2A00 2578 E838 0000 0804 0000 0000 0000 0000 0007 7497 0200 0B00 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 5262 E000 F205 3703 0000 0200 0000 0000 0000 0000 0000 0002 0000 0010
0000 0000
---------------------------------------------------------------------------
LABEL: FCP_ARRAY_ERR4
IDENTIFIER: D5385D18
Date/Time: Thu Mar 10 18:49:23 THAIST 2011
Sequence Number: 106786
Machine Id: 000786F9D600
Node Id: p550su
Class: H
Type: TEMP
Resource Name: hdisk5
Resource Class: disk
Resource Type: array
Location: U787B.001.DNWGK9Y-P1-C4-T1-W200500A0B8484C21-L1000000000000
Description
ARRAY OPERATION ERROR
Probable Causes
ARRAY DASD MEDIA
ARRAY DASD DEVICE
Failure Causes
DASD MEDIA
DISK DRIVE
Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
Detail Data
SENSE DATA
0A00 2800 4644 3300 0001 0004 0000 0000 0000 0000 0007 7497 0200 0400 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 5262 E000 F205 3703 0000 0200 0000 0000 0000 0000 0000 0002 0000 0010
0000 0000
---------------------------------------------------------------------------
LABEL: FCP_ARRAY_ERR4
IDENTIFIER: D5385D18
Date/Time: Thu Mar 10 18:49:23 THAIST 2011
Sequence Number: 106785
Machine Id: 000786F9D600
Node Id: p550su
Class: H
Type: TEMP
Resource Name: hdisk5
Resource Class: disk
Resource Type: array
Location: U787B.001.DNWGK9Y-P1-C4-T1-W200500A0B8484C21-L1000000000000
Description
ARRAY OPERATION ERROR
Probable Causes
ARRAY DASD MEDIA
ARRAY DASD DEVICE
Failure Causes
DASD MEDIA
DISK DRIVE
Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
Detail Data
SENSE DATA
0A00 2800 4643 BA00 0004 0004 0000 0000 0000 0000 0007 7497 0200 0B00 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 5262 E000 F205 3703 0000 0200 0000 0000 0000 0000 0000 0002 0000 0010
0000 0000
---------------------------------------------------------------------------
LABEL: FCP_ARRAY_ERR8
IDENTIFIER: 483C9D10
Date/Time: Thu Mar 10 18:49:23 THAIST 2011
Sequence Number: 106784
Machine Id: 000786F9D600
Node Id: p550su
Class: H
Type: INFO
Resource Name: dac0
Resource Class: array
Resource Type: ibm-dac-V4
Location: U787B.001.DNWGK9Y-P1-C4-T1-W200500A0B8484C21
VPD:
Manufacturer................IBM
Machine Type and Model......1814 FAStT
Part Number.................24288-00
ROS Level and ID............0916
Description
ARRAY ACTIVE CONTROLLER SWITCH
Probable Causes
ARRAY CONTROLLER
CABLES AND CONNECTIONS
Failure Causes
ARRAY CONTROLLER
CABLES AND CONNECTIONS
Recommended Actions
NO ACTION NECESSARY
Detail Data
SENSE DATA
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0400 00EE 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 526D 6000 F205 3703 0000 0000 0000 0000 0000 0000 0000 0002 0000 0001
0000 0000
---------------------------------------------------------------------------
LABEL: FCP_ARRAY_ERR3
IDENTIFIER: D9770360
Date/Time: Thu Mar 10 18:48:47 THAIST 2011
Sequence Number: 106783
Machine Id: 000786F9D600
Node Id: p550su
Class: H
Type: PERM
Resource Name: dac0utm
Resource Class: NONE
Resource Type: NONE
Location:
Description
ARRAY OPERATION ERROR
Probable Causes
ARRAY DASD DEVICE
STORAGE DEVICE CABLE
EB4F
Failure Causes
DISK DRIVE
DISK DRIVE ELECTRONICS
STORAGE DEVICE CABLE
ARRAY CONTROLLER
Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
Detail Data
SENSE DATA
0A00 2800 0000 0200 0000 0104 0000 0000 0000 0000 0000 0013 0200 0B00 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 1103 8000 F205 3703 0000 0200 0000 0000 0000 0000 0040 0002 0000 0000
0000 0000
---------------------------------------------------------------------------
LABEL: FSCSI_ERR4
IDENTIFIER: 3074FEB7
Date/Time: Thu Mar 10 18:48:47 THAIST 2011
Sequence Number: 106782
Machine Id: 000786F9D600
Node Id: p550su
Class: H
Type: TEMP
Resource Name: fscsi0
Resource Class: driver
Resource Type: efscsi
Location: U787B.001.DNWGK9Y-P1-C4-T1
Description
ADAPTER ERROR
Probable Causes
ADAPTER HARDWARE OR CABLE
ADAPTER MICROCODE
FIBRE CHANNEL SWITCH OR FC-AL HUB
Failure Causes
ADAPTER
CABLES AND CONNECTIONS
DEVICE
Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
CHECK CABLES AND THEIR CONNECTIONS
VERIFY DEVICE CONFIGURATION
Detail Data
SENSE DATA
0000 0000 0000 00B1 0000 0045 0200 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001 0500 0000 0000
0001 0400 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 422F 0000 0812 0002 0000 0100 0000 0000 0001 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0100 2005 00A0
B848 4C21 2004 00A0 B848 4C20 0200 0000 0000 0000 0000 0000 0000 0000 0000 0000
1105 C000
---------------------------------------------------------------------------
LABEL: FCP_ARRAY_ERR9
IDENTIFIER: 8B79A4BD
Date/Time: Thu Mar 10 18:48:45 THAIST 2011
Sequence Number: 106781
Machine Id: 000786F9D600
Node Id: p550su
Class: H
Type: PERM
Resource Name: dac1
Resource Class: array
Resource Type: ibm-dac-V4
Location: U787B.001.DNWGK9Y-P1-C1-T1-W200400A0B8484C21
VPD:
Manufacturer................IBM
Machine Type and Model......1814 FAStT
Part Number.................24288-00
ROS Level and ID............0916
Description
ARRAY CONTROLLER SWITCH FAILURE
Probable Causes
ARRAY CONTROLLER
CABLES AND CONNECTIONS
Failure Causes
ARRAY CONTROLLER
CABLES AND CONNECTIONS
Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
Detail Data
SENSE DATA
0A00 5A00 2C01 0000 0002 0004 0000 0000 0000 0000 0000 0000 0200 0B00 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 526D 6000 F205 3703 0000 0000 0000 0010 0000 0000 0040 0002 0000 0000
0000 0000
---------------------------------------------------------------------------
LABEL: FSCSI_ERR4
IDENTIFIER: 3074FEB7
Date/Time: Thu Mar 10 18:48:45 THAIST 2011
Sequence Number: 106780
Machine Id: 000786F9D600
Node Id: p550su
Class: H
Type: TEMP
Resource Name: fscsi1
Resource Class: driver
Resource Type: efscsi
Location: U787B.001.DNWGK9Y-P1-C1-T1
Description
ADAPTER ERROR
Probable Causes
ADAPTER HARDWARE OR CABLE
ADAPTER MICROCODE
FIBRE CHANNEL SWITCH OR FC-AL HUB
Failure Causes
ADAPTER
CABLES AND CONNECTIONS
DEVICE
Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
CHECK CABLES AND THEIR CONNECTIONS
VERIFY DEVICE CONFIGURATION
Detail Data
SENSE DATA
0000 0000 0000 00B1 0000 0045 0200 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001 0100 0000 0000
0001 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 422F 0000 1212 0002 0000 0100 0000 0000 0001 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0100 2004 00A0
B848 4C21 2004 00A0 B848 4C20 0200 0000 0000 0000 0000 0000 0000 0000 0000 0000
0FFF A000
---------------------------------------------------------------------------
LABEL: FSCSI_ERR4
IDENTIFIER: 3074FEB7
Date/Time: Thu Mar 10 18:48:43 THAIST 2011
Sequence Number: 106779
Machine Id: 000786F9D600
Node Id: p550su
Class: H
Type: TEMP
Resource Name: fscsi1
Resource Class: driver
Resource Type: efscsi
Location: U787B.001.DNWGK9Y-P1-C1-T1
Description
ADAPTER ERROR
Probable Causes
ADAPTER HARDWARE OR CABLE
ADAPTER MICROCODE
FIBRE CHANNEL SWITCH OR FC-AL HUB
Failure Causes
ADAPTER
CABLES AND CONNECTIONS
DEVICE
Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
CHECK CABLES AND THEIR CONNECTIONS
VERIFY DEVICE CONFIGURATION
Detail Data
SENSE DATA
0000 0000 0000 00B1 0000 0045 0200 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001 0100 0000 0000
0001 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 422F 0000 1112 0002 0000 0100 0000 0000 0001 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0100 2004 00A0
B848 4C21 2004 00A0 B848 4C20 0200 0000 0000 0000 0000 0000 0000 0000 0000 0000
0FFF A000
---------------------------------------------------------------------------
LABEL: FSCSI_ERR4
IDENTIFIER: 3074FEB7
Date/Time: Thu Mar 10 18:48:42 THAIST 2011
Sequence Number: 106778
Machine Id: 000786F9D600
Node Id: p550su
Class: H
Type: TEMP
Resource Name: fscsi0
Resource Class: driver
Resource Type: efscsi
Location: U787B.001.DNWGK9Y-P1-C4-T1
Description
ADAPTER ERROR
Probable Causes
ADAPTER HARDWARE OR CABLE
ADAPTER MICROCODE
FIBRE CHANNEL SWITCH OR FC-AL HUB
Failure Causes
ADAPTER
CABLES AND CONNECTIONS
DEVICE
Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
CHECK CABLES AND THEIR CONNECTIONS
VERIFY DEVICE CONFIGURATION
Detail Data
SENSE DATA
0000 0000 0000 00B1 0000 0045 0200 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001 0500 0000 0000
0001 0400 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 422F 0000 0712 0002 0000 0100 0000 0000 0001 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0100 2005 00A0
B848 4C21 2004 00A0 B848 4C20 0200 0000 0000 0000 0000 0000 0000 0000 0000 0000
1105 C000
---------------------------------------------------------------------------
LABEL: FSCSI_ERR4
IDENTIFIER: 3074FEB7
Date/Time: Thu Mar 10 18:48:41 THAIST 2011
Sequence Number: 106777
Machine Id: 000786F9D600
Node Id: p550su
Class: H
Type: TEMP
Resource Name: fscsi1
Resource Class: driver
Resource Type: efscsi
Location: U787B.001.DNWGK9Y-P1-C1-T1
Description
ADAPTER ERROR
Probable Causes
ADAPTER HARDWARE OR CABLE
ADAPTER MICROCODE
FIBRE CHANNEL SWITCH OR FC-AL HUB
Failure Causes
ADAPTER
CABLES AND CONNECTIONS
DEVICE
Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
CHECK CABLES AND THEIR CONNECTIONS
VERIFY DEVICE CONFIGURATION
Detail Data
SENSE DATA
0000 0000 0000 00B1 0000 0045 0200 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001 0100 0000 0000
0001 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 422F 0000 1012 0002 0000 0100 0000 0000 0001 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0100 2004 00A0
B848 4C21 2004 00A0 B848 4C20 0200 0000 0000 0000 0000 0000 0000 0000 0000 0000
0FFF A000
---------------------------------------------------------------------------
LABEL: FSCSI_ERR4
IDENTIFIER: 3074FEB7
Date/Time: Thu Mar 10 18:48:39 THAIST 2011
Sequence Number: 106776
Machine Id: 000786F9D600
Node Id: p550su
Class: H
Type: TEMP
Resource Name: fscsi1
Resource Class: driver
Resource Type: efscsi
Location: U787B.001.DNWGK9Y-P1-C1-T1
Description
ADAPTER ERROR
Probable Causes
ADAPTER HARDWARE OR CABLE
ADAPTER MICROCODE
FIBRE CHANNEL SWITCH OR FC-AL HUB
Failure Causes
ADAPTER
CABLES AND CONNECTIONS
DEVICE
Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
CHECK CABLES AND THEIR CONNECTIONS
VERIFY DEVICE CONFIGURATION
Detail Data
SENSE DATA
0000 0000 0000 00B1 0000 0045 0200 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001 0100 0000 0000
0001 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 422F 0000 0F12 0002 0000 0100 0000 0000 0001 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0100 2004 00A0
B848 4C21 2004 00A0 B848 4C20 0200 0000 0000 0000 0000 0000 0000 0000 0000 0000
0FFF A000
---------------------------------------------------------------------------
LABEL: FSCSI_ERR4
IDENTIFIER: 3074FEB7

因为lgwr的crash是发生在database open后,所以实际上我们是可以在这段时间内操作数据库的,这个case最终通过新建undo tablespace代替老的问题回滚表空间解决了。

We have been repair the disk controller, but ora-600[2667] still occur when we try to open database.
After recreating the undo tablespace we were able to open the database.

沪ICP备14014813号

沪公网安备 31010802001379号