ORA-19808错误一例

一套Linux上的11.2.0.2 RAC系统,其中一个节点startup mount时出现ORA-19808错误,日志如下:


SQL> alter database mount;
alter database mount
*
ERROR at line 1:
ORA-01105: mount is incompatible with mounts by other instances
ORA-19808: recovery destination parameter mismatch


SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options


[oracle@rh3 ~]$ oerr ora 19808
19808, 00000, "recovery destination parameter mismatch"
// *Cause:  The value of parameters DB_RECOVERY_FILE_DEST and
//          DB_RECOVERY_FILE_DEST_SIZE must be same in all instances.
//          instance. All databases must have same recovery destination
//          parameters.
// *Action: Check DB_RECOVERY_FILE_DEST and DB_RECOVERY_FILE_DEST_SIZE
//          values in all instances.

SQL> col name for a30
SQL> col value for a30

SQL> select name ,value,inst_id from gv$system_parameter  where name='db_recovery_file_dest_size';

NAME                           VALUE                             INST_ID
------------------------------ ------------------------------ ----------
db_recovery_file_dest_size     0                                       1
db_recovery_file_dest_size     42278584320                             2

该错误是由于启动节点使用了pfile形式的参数文件,而该参数文件中的db_recovery_file_dest_size值与已经启动的另一个节点的db_recovery_file_dest_size不一致所造成。

如果使用共享的server parameter file则不可能出现上述情况,当然也可以通过在启动节点上修改db_recovery_file_dest_size来解决问题。

Oracle内部错误:ORA-00600[kccchb_3]一例

一套Linux x86-64上的11.1.0.7 RAC系统,该RAC使用Netapps NFS作为共享存储。其中一个节点出现ORA-00600: internal error code, arguments: [kccchb_3]内部错误并导致实例意外终止,详细日志如下:

Mon Dec 27 00:03:13 2010
Error: Controlfile sequence number in file header is different from the one in memory
Please check that the correct mount optionsare used if controlfile is located on NFS
Errors in file /d02/oratst/TSMOTEBS/db/11.1.0/admin/TSMOTEBS1_l118464dcss3001/diag/rdbms/tsmotebs/TSMOTEBS1/trace/TSMOTEBS1_ckpt_15907.trc (incident=11353):
ORA-00600: internal error code, arguments: [kccchb_3], [555541], [555540], [555540], [], [], [], [], [], [], [], []
Incident details in: /d02/oratst/TSMOTEBS/db/11.1.0/admin/TSMOTEBS1_l118464dcss3001/diag/rdbms/tsmotebs/TSMOTEBS1/incident/incdir_11353/TSMOTEBS1_ckpt_15907_i11353.trc
Errors in file /d02/oratst/TSMOTEBS/db/11.1.0/admin/TSMOTEBS1_l118464dcss3001/diag/rdbms/tsmotebs/TSMOTEBS1/trace/TSMOTEBS1_ckpt_15907.trc:
ORA-00600: internal error code, arguments: [kccchb_3], [555541], [555540], [555540], [], [], [], [], [], [], [], []
CKPT (ospid: 15907): terminating the instance due to error 469

根据Metalink文档Note 303986.1 RAC instance using NFS via Netapps report Ora-600 [kccchb_3]:

Symptoms
RAC instance using NFS via Netapps occasionally reports:
ORA-00600 [kccchb_3], [689], [0], [],[],[],[]
ORA-00600 [kclchkblk_3], [7], [738515663], [14], [], [], [], []

Cause
This is most probably a Netapp issue wherein 'noac' doesn't work as expected.
NetApp recommended mount options for Oracle9i RAC on Solaris:
rw,bg,vers=3,proto=tcp,hard,intr,rsize=32768,wsize=32768,forcedirectio,noac

NetApp recommended mount options for Linux SLES9, RHEL4, RHEL3 QU3 and later:
rw,bg,hard,nointr,tcp,vers=3,timeo=600,rsize=ORACLE_BLOCK_SIZE,
wsize=ORACLE_BLOCK_SIZE,actimeo=0

NetApp recommended mount options for Linux RHAS2.1, RHEL3 pre QU3:
rw,bg,hard,nointr,tcp,vers=3,timeo=600,rsize=ORACLE_BLOCK_SIZE,
wsize=ORACLE_BLOCK_SIZE,noac

And for AS 2.1 in /etc/modules.conf add:
options nfs nfs_uncached_io=1
Solution

Workaround
set filesystemio_options=DIRECTIO for all instances (to bypass any buffer cache)

以上介绍了该ORA-00600 [kccchb_3]内部错误可以通过设置初始化参数filesystemio_options为DIRECTIO来workaround;实际上更好的选择可以是设置filesystemio_options为SETALL,即同时使用DIRECTIO和ASYNC异步IO。

顺便提一下在Linux上启用异步ASYNC IO不仅需要设置该filesystemio_options参数,我们还需要同时保证以下几点:
1.filesystemio_options参数设置为ASYNC或者SETALL

2.合理设置Linux Kernel内核参数fs.aio-max-nr,例如设置为3145728;以及参数fs.file-max =6815744

3.disk_asynch_io初始化参数设置为TRUE

4.在10gR2下oracle binary可能没有正确以async_on选项编译,我们需要手动重新make(The reason behind this behavior is that the LIBAIO_0.1 Linux OS library is not attached for io_getevents.)

SQL>shutdown immediate
[maclean@rh8 ~]$ cd $ORACLE_HOME/rdbms/lib
[maclean@rh8 lib]$ ln -s /usr/lib/libaio.so skgaio.o
[maclean@rh8 lib]$ make PL_ORALIBS=-laio -f ins_rdbms.mk async_on
rm -f /u01/oracle/product/10.2.0/db_1/rdbms/lib/skgaioi.o
cp /u01/oracle/product/10.2.0/db_1/rdbms/lib/skgaio.o /u01/oracle/product/10.2.0/db_1/rdbms/lib/skgaioi.o
chmod 755 /u01/oracle/product/10.2.0/db_1/bin
- Linking Oracle 
rm -f /u01/oracle/product/10.2.0/db_1/rdbms/lib/oracle
gcc  -o /u01/oracle/product/10.2.0/db_1/rdbms/lib/oracle -L/u01/oracle/product/10.2.0/db_1/rdbms/lib/ -L/u01/oracle/product/10.2.0/db_1/lib/ -L/u01/oracle/product/10.2.0/db_1/lib/stubs/ -L/usr/lib -lirc   -Wl,-E `test -f /u01/oracle/product/10.2.0/db_1/rdbms/lib/sskgpsmti.o && echo /u01/oracle/product/10.2.0/db_1/rdbms/lib/sskgpsmti.o` /u01/oracle/product/10.2.0/db_1/rdbms/lib/opimai.o /u01/oracle/product/10.2.0/db_1/rdbms/lib/ssoraed.o /u01/oracle/product/10.2.0/db_1/rdbms/lib/ttcsoi.o /u01/oracle/product/10.2.0/db_1/rdbms/lib/defopt.o  -Wl,--whole-archive -lperfsrv10 -Wl,--no-whole-archive /u01/oracle/product/10.2.0/db_1/lib/nautab.o /u01/oracle/product/10.2.0/db_1/lib/naeet.o /u01/oracle/product/10.2.0/db_1/lib/naect.o /u01/oracle/product/10.2.0/db_1/lib/naedhs.o /u01/oracle/product/10.2.0/db_1/rdbms/lib/config.o  -lserver10 -lodm10 -lnnet10 -lskgxp10 -lsnls10 -lnls10  -lcore10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10 -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 -lhasgen10 -lcore10 -lskgxn2 -locr10 -locrb10 -locrutl10 -lhasgen10 -lcore10 -lskgxn2   -lclient10  -lvsn10 -lcommon10 -lgeneric10 -lknlopt `if /usr/bin/ar tv /u01/oracle/product/10.2.0/db_1/rdbms/lib/libknlopt.a | grep xsyeolap.o > /dev/null 2>&1 ; then echo "-loraolap10" ; fi` -lslax10 -lpls10  -lplp10 -lserver10 -lclient10  -lvsn10 -lcommon10 -lgeneric10 -lknlopt -lslax10 -lpls10  -lplp10 -ljox10 -lserver10 -lclsra10 -ldbcfg10 -locijdbcst10 -lwwg  `cat /u01/oracle/product/10.2.0/db_1/lib/ldflags`    -lnsslb10 -lncrypt10 -lnsgr10 -lnzjs10 -ln10 -lnnz10 -lnl10 -lnro10 `cat /u01/oracle/product/10.2.0/db_1/lib/ldflags`    -lnsslb10 -lncrypt10 -lnsgr10 -lnzjs10 -ln10 -lnnz10 -lnl10 -lmm -lsnls10 -lnls10  -lcore10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10 -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 `cat /u01/oracle/product/10.2.0/db_1/lib/ldflags`    -lnsslb10 -lncrypt10 -lnsgr10 -lnzjs10 -ln10 -lnnz10 -lnl10 -lnro10 `cat /u01/oracle/product/10.2.0/db_1/lib/ldflags`    -lnsslb10 -lncrypt10 -lnsgr10 -lnzjs10 -ln10 -lnnz10 -lnl10   -lsnls10 -lnls10  -lcore10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10 -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 `if /usr/bin/ar tv /u01/oracle/product/10.2.0/db_1/rdbms/lib/libknlopt.a | grep "kxmnsd.o" > /dev/null 2>&1 ; then echo " " ; else echo "-lordsdo10"; fi` -lctxc10 -lctx10 -lzx10 -lgx10 -lctx10 -lzx10 -lgx10 -lordimt10 -lsnls10 -lnls10  -lcore10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10 -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 -lsnls10 -lunls10  -lsnls10 -lnls10  -lcore10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10 -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 -laio    `cat /u01/oracle/product/10.2.0/db_1/lib/sysliblist` -Wl,-rpath,/u01/oracle/product/10.2.0/db_1/lib -lm    `cat /u01/oracle/product/10.2.0/db_1/lib/sysliblist` -ldl -lm   -L/u01/oracle/product/10.2.0/db_1/lib
/u01/oracle/product/10.2.0/db_1/lib//libcore10.a(lcd.o): In function `lcdprm':
lcd.c:(.text+0x8e8): warning: the `gets' function is dangerous and should not be used.
mv -f /u01/oracle/product/10.2.0/db_1/bin/oracle /u01/oracle/product/10.2.0/db_1/bin/oracleO
mv /u01/oracle/product/10.2.0/db_1/rdbms/lib/oracle /u01/oracle/product/10.2.0/db_1/bin/oracle
chmod 6751 /u01/oracle/product/10.2.0/db_1/bin/oracle
SQL> startup;
/* 通过以下方式可以验证ASYNC IO的工作情况 */
cat /proc/slabinfo | grep kio
[maclean@rh8 ~]$ cat /proc/slabinfo | grep kio
kioctx                56     72    320   12    1 : tunables   54   27    8 : slabdata      6      6      0
kiocb                 15     15    256   15    1 : tunables  120   60    8 : slabdata      1      1      0
/* 这里kioctx的值大于零,说明正在使用异步ASYNC IO */
如何是11g,那么无需那么麻烦要重新编译async_on,同时可以通过以下手段了解异步ASYNC IO的使用情况:
SQL> col name for a60 
SQL> SELECT name, asynch_io FROM v$datafile f,v$iostat_file i 
WHERE f.file# = i.file_no
AND filetype_name = 'Data File' ; 
NAME							     ASYNCH_IO
------------------------------------------------------------ ---------
/standby/oradata/PROD/datafile/o1_mf_system_6q9dwgwh_.dbf    ASYNC_ON
/standby/oradata/PROD/datafile/o1_mf_sysaux_6q9dwgyp_.dbf    ASYNC_ON
/standby/oradata/PROD/datafile/o1_mf_undotbs1_6q9dwh0r_.dbf  ASYNC_ON
/standby/oradata/PROD/datafile/o1_mf_users_6q9dwh2x_.dbf     ASYNC_ON
/standby/oradata/PROD/datafile/o1_mf_example_6q9dzhh1_.dbf   ASYNC_ON
/standby/oradata/PROD/datafile/o1_mf_enc_6q9fdzcx_.dbf	     ASYNC_ON
6 rows selected.
/* 可以看到ASYNCH_IO的状态为ON,说明正使用异步IO */

ORA-01652 even though there is sufficient space in RECYCLE BIN

There is a bug 6977045 which may cause ORA-1652 raised even though there is sufficient space in RECYCLE BIN. Version under 11.2 believed to be affected

[oracle@rh2 ~]$ oerr ora 1652
01652, 00000, "unable to extend temp segment by %s in tablespace %s"
// *Cause:  Failed to allocate an extent of the required number of blocks for
//          a temporary segment in the tablespace indicated.
// *Action: Use ALTER TABLESPACE ADD DATAFILE statement to add one or more
//          files to the tablespace indicated.
Bug 6977045  ORA-1652 even though there is sufficient space in RECYCLE BIN
This note gives a brief overview bug 6977045.
The content was last updated on: 06-DEC-2010
Click here for details of each of the sections below.
Affects:
Product (Component)	Oracle Server (Rdbms)
Range of versions believed to be affected 	Versions BELOW 11.2
Versions confirmed as being affected
11.1.0.7
Platforms affected	Generic (all / most platforms affected)
Fixed:
This issue is fixed in
11.2.0.1 (Base Release)
11.1.0.7 Patch 32 on Windows Platforms
Symptoms:
Related To:
Error May Occur
Storage Space Usage Affected
ORA-1652
Recycle Bin
Description
Under space pressure an ORA-1652 may be signalled even if there is sufficient
space in the recyclebin.
Rediscovery Notes:
Under space pressure, space allocation fails, even though there
is sufficient free space in recycle bin.
Workaround
Turn off the recycle bin.
OR
Purge the recyclebin.
Hdr: 12582291 11.1.0.7 RDBMS 11.1.0.7 SPACE PRODID-5 PORTID-59
Abstract: UPDATING A LOB FAILS WHILE CLEARING RECYCLE BIN EVEN WHEN ENOUGH FREE SPACE IS A
BUG TYPE CHOSEN
===============
Code
SubComponent: Recovery
======================
DETAILED PROBLEM DESCRIPTION
============================
An OCI application module tried to update a LOB object, and this operation
internally & recursively tried to clear off a few segments from the recycle
bin. As ct. had enabled triggers preventing uncontrolled droppings of
segments, this apparently prevented the application module from succeeding.
Further, since this error did not show up on the application module that
failed, this customer-facing critical application of this large enterprise
was down for considerable time.
DIAGNOSTIC ANALYSIS
===================
None. This bug is raised mainly as a Q/A to get clarifications for customer,
who is demanding an answer and possible action plan so that they can prevent
such disastrous situation in future.
WORKAROUND?
===========
Yes
WORKAROUND INFORMATION
======================
Disable the trigger or not using the recycle bin (Though neither operation
is acceptable to ct. because of their business reasons).
TECHNICAL IMPACT
================
Critical application module fails.
RELATED ISSUES (bugs, forums, RFAs)
===================================
None (MOS Note 978045.1 was referenced by ct.)
Hdr: 6977045 10.2 RDBMS 10.2 RAM DATA PRODID-5 PORTID-23 ORA-1652
Abstract: ORA-1652  LMT SPACE NOT REALLOCATED CORRECTLY AFTER DROP TABLE
*** 04/16/08 12:57 pm ***
TAR:
----
6880393.992
PROBLEM:
--------
ORA-12801: error signaled in parallel query server P038
ORA-1652: unable to extend temp segment by 320 in tablespace ERROR_TS
After dropping a table in a LMT the space is not properly returned to the
tablespace datafiles .
Only after purge tablespace error_ts; do we see the space returned correctly.
Subsequently the test plan is successful and the table is created.
DIAGNOSTIC ANALYSIS:
--------------------
See attached test case. test_output.log
WORKAROUND:
-----------
none
RELATED BUGS:
-------------
REPRODUCIBILITY:
----------------
TEST CASE:
----------
See attached test case. test_output.log
STACK TRACE:
------------
SUPPORTING INFORMATION:
-----------------------
24 HOUR CONTACT INFORMATION FOR P1 BUGS:
----------------------------------------
DIAL-IN INFORMATION:
--------------------
IMPACT DATE:
------------
*** 04/16/08 01:29 pm ***
*** 04/16/08 02:04 pm ***
the problem here is that even though the objects are occupying the same space
when they were created, dba_free_space shows one datafile to contain all the
free space reclaimed by the drop table command.
*** 04/16/08 02:35 pm ***
Please confirm this is a duplicate of bug 5083393.
*** 04/17/08 10:56 am ***
*** 04/17/08 05:09 pm ***
*** 04/17/08 05:14 pm *** (CHG: Sta->10)
*** 04/17/08 05:14 pm ***
*** 04/21/08 11:06 am *** (CHG: Sta->16)
*** 04/21/08 11:06 am ***
please review uploaded file ora_test1.log.
Patch 5083393 has been applied to this instance and the test was ran against
this patch.
Notice the query immedatly following the ORA_1652 error.  The temporary
segments seem to be causing the failure and specifically segment 1199.88012  .
*** 04/22/08 01:55 pm ***
Current SQL statement for this session:
create table seckle.my_test2_tb
nologging tablespace error_ts
parallel (degree 6)
as
select * from ecm.E08401AH_GEMINI_CMF_WIDE_TB
ERROR parallelizer slave or internal
qbas:54482
pgakid:2 pgadep:0
qerpx: error stack: OER(12805)
qbas_qerpxs: 54482
dfo_qerpxs: 0x4b7ba89e0 dfo1_qerpxs: 0x4b7ba9178
ntq_qerpxs: 1 ntqi_qerpxs: 0
nbfs_qerpxs: 0
nobj_qerpxs: 2  ngdef_qerpxs: 1
mflg_qerpxs: 0x2c
slave set 1 DFO dump:
kkfdo: (0x4b7ba9178)
kkfdo->kkfdochi: (0x0)
kkfdo->kkfdopar: (0x0)
kkfdo->kkfdonxt: (0x0)
kkfdo->kkfdotqi: 0
kkfdo->kkfdontbl: 2
kkfdo->kkfdongra: 1
kkfdo->kkfdofigra: 0
kkfdo->kkfdoflg: 0x2818
kkfdo->kkfdooct: 1
kkfdo->kkfdonumeopn: 0
Output table queue: (0x4b7fab1b8)
kxfqd     : 0x4b7fa5728
kxfqdtqi  : 0            TQ id
kxfqdcc   : 0x14         TQ: from slave set 1 to QC
kxfqdpty  : 4
kxfqdsmp  : 0            number of samples
kxfqdflg  : 0x4
kxfqdfmt  :              TQ format
kxfqfnco  : 5            number of TQ columns
kxfqfnky  : 0            number of key columns
TQ column        kxfqcbfl   kxfqcdty   kxfqcflg   kxfqcplen
kxfqfcol[   0]:  4          23         0x0          4
kxfqfcol[   1]:  32720      23         0x80         32720
kxfqfcol[   2]:  1          23         0x0          1
kxfqfcol[   3]:  76         23         0x0          76
kxfqfcol[   4]:  32720      23         0x0          32720
slave set 2 DFO dump:
np_qerpxm: 6 mflg_qerpxm: 0xa7
cdfo_qerpxm: 0x4b7ba9178 (tqid 0) sdfo_qerpxm: 0x0 (tqid -1)
ctqh_qerpxm: 0xffffffff79378ac8 dump:
kxfqh     : 0xffffffff79378ac8
kxfqhflg  : 0x15         TQ handle open
kxfqhmkr  : 0x4          QC
kxfqhpc   : 2            1:producer 2:consumer 3:ranger
kxfqepty  : 4
kxfqhnsam : 6
kxfqhnth  : 6
kxfqhdsc  :              TQ descriptor
kxfqd     : 0x4b7fa5728
kxfqdtqi  : 0            TQ id
kxfqdcc   : 0x14         TQ: from slave set 1 to QC
kxfqdpty  : 4
kxfqdsmp  : 0            number of samples
kxfqdflg  : 0x4
kxfqdfmt  :              TQ format
kxfqfnco  : 5            number of TQ columns
kxfqfnky  : 0            number of key columns
TQ column        kxfqcbfl   kxfqcdty   kxfqcflg   kxfqcplen
kxfqfcol[   0]:  4          23         0x0          4
kxfqfcol[   1]:  32720      23         0x80         32720
kxfqfcol[   2]:  1          23         0x0          1
kxfqfcol[   3]:  76         23         0x0          76
kxfqfcol[   4]:  32720      23         0x0          32720
dnst_qerpxm[cur,par]: 6,0 dcnt_qerpxm[cur,par]: 0,0
ppxv_qerpxm[0]: 0xffffffff79377f50 count[np..1]:1 1 1 1 1 1
pqv1_qerpxm: 0xffffffff79377f38 bits[np..1]: 111111
pqv2_qerpxm: 0xffffffff79377f40 bits[np..1]: 000000

If you have enabled recyclebin ,then you should check tablespace free space with dba_free_space and recyclebin space also like:

create view dba_free_space_pre10g as
select ts.name TABLESPACE_NAME,
fi.file# FILE_ID,
f.block# BLOCK_ID,
f.length * ts.blocksize BYTES,
f.length BLOCKS,
f.file# RELATIVE_FNO
from sys.ts$ ts, sys.fet$ f, sys.file$ fi
where ts.ts# = f.ts#
and f.ts# = fi.ts#
and f.file# = fi.relfile#
and ts.bitmapped = 0
union all
select /*+ ordered use_nl(f) use_nl(fi) */
ts.name TABLESPACE_NAME,
fi.file# FILE_ID,
f.ktfbfebno BLOCK_ID,
f.ktfbfeblks * ts.blocksize BYTES,
f.ktfbfeblks BLOCKS,
f.ktfbfefno RELATIVE_FNO
from sys.ts$ ts, sys.x$ktfbfe f, sys.file$ fi
where ts.ts# = f.ktfbfetsn
and f.ktfbfetsn = fi.ts#
and f.ktfbfefno = fi.relfile#
and ts.bitmapped <> 0
and ts.online$ in (1, 4)
and ts.contents$ = 0
/
create view dba_free_space_recyclebin as
select /*+ ordered use_nl(u) use_nl(fi) */
ts.name TABLESPACE_NAME,
fi.file# FILE_ID,
u.ktfbuebno BLOCK_ID,
u.ktfbueblks * ts.blocksize BYTES,
u.ktfbueblks BLOCKS,
u.ktfbuefno RELATIVE_FNO
from sys.recyclebin$ rb, sys.ts$ ts, sys.x$ktfbue u, sys.file$ fi
where ts.ts# = rb.ts#
and rb.ts# = fi.ts#
and u.ktfbuefno = fi.relfile#
and u.ktfbuesegtsn = rb.ts#
and u.ktfbuesegfno = rb.file#
and u.ktfbuesegbno = rb.block#
and ts.bitmapped <> 0
and ts.online$ in (1, 4)
and ts.contents$ = 0
union all
select ts.name TABLESPACE_NAME,
fi.file# FILE_ID,
u.block# BLOCK_ID,
u.length * ts.blocksize BYTES,
u.length BLOCKS,
u.file# RELATIVE_FNO
from sys.ts$ ts, sys.uet$ u, sys.file$ fi, sys.recyclebin$ rb
where ts.ts# = u.ts#
and u.ts# = fi.ts#
and u.segfile# = fi.relfile#
and u.ts# = rb.ts#
and u.segfile# = rb.file#
and u.segblock# = rb.block#
and ts.bitmapped = 0
/

dba_free_space_pre10g which shows the real free space like 9i behavior , dba_free_space_recyclebin shows free space resided in recyclebin.

How to trouble shooting Library cache lock/pin

1.Troubleshooting Library Cache: Lock, Pin and Load Lock (Doc ID 444560.1)
What is "Library cache lock" ?
This event controls the concurrency between clients of the library cache. It acquires a lock on
the object handle so that either:
* One client can prevent other clients from accessing the same object.
* The client can maintain a dependency for a long time (for example, so that no other client can change the object).
This lock is also obtained to locate an object in the library cache.
Library cache lock will be obtained on database objects referenced during parsing or compiling of
SQL or PL/SQL statements (table, view, procedure, function, package, package body, trigger, index, cluster, synonym).
The lock will be released at the end of the parse or compilation.
Cursors (SQL and PL/SQL areas), pipes and any other transient objects do not use this lock.
Library cache lock is not deadlock sensitive and the operation is synchronous.
Parameters:
* handle address
Address of the object being loaded.
* lock address
Address of the load lock being used. This is not the same thing as a latch or an enqueue, it is a State Object.
* Mode
Indicates the data pieces of the object which need to be loaded.
* Namespace
The name of the object namespace as it is displayed in V$DB_OBJECT_CACHE view
How can Library cache lock be reduced?
In general , high contention on library cache lock is usually a result of an under-sized shared pool or
non-sharing of sql. Some ways of reducing the contention are:
* Reduce the reloads by increasing the shared pool size as the locks may take a long time if the pool is undersized.
* Increase sharing by setting the cursor_sharing to similar or force.
Be aware this may change the execution plan; so setting the parameter should be thoroughly tested.
* Reduce the invalidations by running the batch jobs to collect statistics or any other maintenance jobs
separately from OLTP.
Note 122793.1 How to Find which Session is Holding a Particular Library Cache Lock
Known Bugs:
Note:10018789.8Spin in kgllock / DB hang with high library cache lock waits
Note:7423411.8Process may hang waiting for "library cache load lock" with no holder
Note:7706138.8Process may hang waiting for "library cache load lock" with no holder
Note:9675816.8Bug 9675816 - Self deadlock with 'library cache lock' waits
2.How to Find which Session is Holding a Particular Library Cache Lock (Doc ID 122793.1)
Method 1: Systemstate Analysis
For older versions you can use the following syntax that is also possible in higher versions.The level 266 is
not available before 9.2.0.6
alter session set max_dump_file_size=unlimited;
alter session set events 'immediate trace name systemstate level 10'
Oracle will create a systemstate tracefile in your USER_DUMP_DEST directory.
METHOD 2: EXAMINE THE X$KGLLK TABLE
select sid,saddr from v$session where event= 'library cache lock';
SID SADDR
---------- --------
16 572ed244
select kgllkhdl Handle,kgllkreq Request, kglnaobj Object
from x$kgllk where kgllkses = '572ed244'
and kgllkreq > 0;
HANDLE REQUEST OBJECT
-------- ---------- ------------------------------------------------------------
62d064dc 2 EMPLOYEES
A:It's possible that one library cache lock can block all the other sessions if this table need to be
queried by other sessions.
The problem is why the library cache lock was held for so long and wasn't released.
Usually, library cache lock only cost a few milliseconds.
It could be these Known Bugs:
Note:10018789.8 Spin in kgllock / DB hang with high library cache lock waits
Note:7423411.8 Process may hang waiting for "library cache load lock" with no holder
Note:7706138.8 Process may hang waiting for "library cache load lock" with no holder
Note:9675816.8 Bug 9675816 - Self deadlock with 'library cache lock' waits
For more information, you can refer note 444560.1 and 122793.1.
To understand the root cause, we need more information to analyze.
You may take systemstate and hanganalyze next time and we'll help you to analyze them.
1. Please generate systemstate dump as sysdba:
SQL>conn / as sysdba;
SQL>alter session set max_dump_file_size = unlimited;
SQL>alter session set events 'immediate trace name systemstate level 10';
Wait for some some seconds
SQL>alter session set events 'immediate trace name systemstate level 10';
Wait for some some seconds
SQL>alter session set events 'immediate trace name systemstate level 10';
2. Open another session as sysdba:
SQL>conn / as sysdba;
SQL>oradebug setmypid
SQL>oradebug unlimit;
SQL>oradebug dump hanganalyze 3
Wait for some seconds
SQL>oradebug dump hanganalyze 3
Wait for some seconds
SQL>oradebug dump hanganalyze 3
3. The generated files will be under udump. Please upload these files.
$cd udump
$ls –ltr

V$LOCK视图显示TX锁没有对应的TM锁的几种可能现象

v$lock视图是Oracle中经典的锁检测接口之一,该视图列出了当前实例中被持有的锁信息;其中TYPE列指出了该持有锁的类型,比较常见的TX锁即事务队列锁(Transaction enqueue)和TM锁即DML enqueue锁(俗称表锁);一般某个会话在其事务(transaction)结束前总是持有一个TX锁,并一个或多个TM ROW-X(SX)锁 (一个事务中可能对多个表或对象进行了更新). 但偶尔我们会发现某个会话持有一个TX锁,却没有对应的TM锁的情况. 第一次接触到该问题,可能会觉得有些不可思议,但这种情况却的确存在:

[Read more…]

Row Cache lock Problem

一套AIX上的11.1.0.7系统,应用启动时出现大量row cache lock等待,具体的systemstate dump信息如下:

FILE VERSIONS
-----------------------
Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production
With the Partitioning, Real Application Clusters, Data Mining and Real Application Testing options
ORACLE_HOME = /oracle/product/11.1.0/database
System name: AIX
Node name: HA5_4ADB01
Release: 3
Version: 5
Machine: 000687C2D900
Instance name: HN4A1
TRACE FILE
---------------------------
Filename=HN4A1_ora_2061038.trc
Chains most likely to have caused the hang:
[a] Chain 1 Signature: 'row cache lock'< ='row cache lock' (cycle)
Chain 1 Signature Hash: 0x75bdd0c
[b] Chain 2 Signature: 'row cache lock'<='row cache lock' (cycle)
Chain 2 Signature Hash: 0x75bdd0c
[c] Chain 3 Signature: 'row cache lock'<='row cache lock' (cycle)
Chain 3 Signature Hash: 0x75bdd0c
===============================================================================
Cycles:
-------------------------------------------------------------------------------
Chain 1:
-------------------------------------------------------------------------------
Oracle session identified by:
{
instance: 1 (hn4a.hn4a1)
os id: 2114372
process id: 211, oracle@HA5_4ADB01
session id: 610
session serial #: 5
}
is waiting for 'row cache lock' with wait info:
{
p1: 'cache id'=0x7
p2: 'mode'=0x0
p3: 'request'=0x5
time in wait: 0.491964 sec
heur. time in wait: 26.859741 sec
timeout after: 2.508036 sec
wait id: 311
blocking: 1 session
wait history:
1.     event: 'row cache lock'
wait id: 310             p1: 'cache id'=0x7
time waited: 2.929713 sec    p2: 'mode'=0x0
p3: 'request'=0x5
2.     event: 'row cache lock'
wait id: 309             p1: 'cache id'=0x7
time waited: 2.929726 sec    p2: 'mode'=0x0
p3: 'request'=0x5
3.     event: 'row cache lock'
wait id: 308             p1: 'cache id'=0x7
time waited: 2.929720 sec    p2: 'mode'=0x0
p3: 'request'=0x5
}
and is blocked by
=> Oracle session identified by:
{
instance: 1 (hn4a.hn4a1)
os id: 2118476
process id: 209, oracle@HA5_4ADB01
session id: 616
session serial #: 5
}
which is waiting for 'row cache lock' with wait info:
{
p1: 'cache id'=0x7
p2: 'mode'=0x0
p3: 'request'=0x5
time in wait: 0.491910 sec
heur. time in wait: 26.859705 sec
timeout after: 2.508090 sec
wait id: 308
blocking: 1 session
wait history:
1.     event: 'row cache lock'
wait id: 307             p1: 'cache id'=0x7
time waited: 2.929713 sec    p2: 'mode'=0x0
p3: 'request'=0x5
2.     event: 'row cache lock'
wait id: 306             p1: 'cache id'=0x7
time waited: 2.929729 sec    p2: 'mode'=0x0
p3: 'request'=0x5
3.     event: 'row cache lock'
wait id: 305             p1: 'cache id'=0x7
time waited: 2.929715 sec    p2: 'mode'=0x0
p3: 'request'=0x5
}
and is blocked by the session at the start of the chain.
Chain 1 Signature: 'row cache lock'< ='row cache lock' (cycle)
Chain 1 Signature Hash: 0x75bdd0c
...==>many similar chains
PROCESS 211:
----------------------------------------
SO: 0x7000008a1370178, type: 2, owner: 0x0, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x7000008a1370178, name=process, file=ksu.h LINE:10706 ID:, pg=0
(process) Oracle pid:211, ser:3, calls cur/top: 0x7000008ac61a0e8/0x7000008ac61a0e8
flags : (0x0) -
flags2: (0x0),  flags3: (0x0)
int error: 0, call error: 0, sess error: 0, txn error 0
ksudlp FALSE at location: 0
(post info) last post received: 0 0 201
last post received-location: kqr.h LINE:2181 ID:kqrbgl: compatible mode
last process to post me: 70000089134b800 1 6
last post sent: 0 0 0
last post sent-location: No post
last process posted by me: none
(latch info) wait_event=0 bits=0
Process Group: DEFAULT, pseudo proc: 0x7000008a153a4c8
O/S info: user: oracle, term: UNKNOWN, ospid: 2114372
OSD pid info: Unix process pid: 2114372, image: oracle@HA5_4ADB01
Short stack dump:
< -ksedsts()+0254<-ksdxfstk()+0028<-ksdxcb()+05d8<-sspuser()+0074<-44c0<-skgpwwait()+00b8<-ksliwat()+0c48<-kslwaitctx()+0150<-kqrigt()+06ac<-kqrLockAndPinPo()+0250<-kqrpre1()+061c<-kqrpre()+001c<-kziavua()+0384<-kpolnb()+0ffc<-kpoauth()+07a0<-opiodr()+0b98<-ttcpip()+115c<-opitsk()+1700<-opiino()+09f0<-opiodr()+0b98<-opidrv()+0440<-sou2o()+0090<-opimai_real()+01b0<-main()+0090<-__start()+0098
service name: SYS$USERS
client details:
O/S info: user: aiuap, term: unknown, ospid: 1234
machine: HA5-4A26 program: JDBC Thin Client
application name: JDBC Thin Client, hash value=2546894660
...
proc=0x7000008a1370178, name=row cache enqueues, file=kqr.h LINE:2004 ID:, pg=0
row cache enqueue: count=1 session=7000008916dee00 object=7000008dfd84428, request=X
savepoint=0x3e
row cache parent object: address=7000008dfd84428 cid=7(dc_users)
hash=481062cf typ=21 transaction=700000888da17a0 flags=00000002
own=7000008dfd844f0[7000008bf6e18f0,7000008bf6e18f0] wat=7000008dfd84500[7000008bf5d2750,7000008ef34e818] mode=X
status=VALID/-/-/-/-/-/-/-/-
request=N release=FALSE flags=8
instance lock=QK 481062cf bb6f8fe9
...
LIBRARY OBJECT LOCK: 7000008ef5069d8 handle=7000008dfd22338 mod=N
pnc=0 pns=0 cbb=1 rpr=1 exc=1 ilh=0 ctx=0
use=7000008916dee00 ses=7000008a15c44f0 cnt=1 flg=CNB/[0001] spn=0x4c3d09ea
LIBRARY HANDLE:7000008dfd22338 bid=105620 hid=a3c59c94 lmd=N pmd=0 sta=VALD
name=select privilege#,level from sysauth$ connect by grantee#=prior privilege# and privilege#>0 start with grantee#=:1 and privilege#>0
hash=8973f7c629f09e3f081962c2a3c59c94 idn=0

众多wait chain的源头均是row cache lock,systemstate转储中该等待事件相关的p1,也就是cache id均为7;cache#为7的字典缓存代表着dc_users,即字典中的用户信息(包括用户名,密码等信息):
[Read more…]

诊断ORA-08103错误

如果自己搞不定可以找诗檀软件专业ORACLE数据库修复团队成员帮您恢复!

诗檀软件专业数据库修复团队

服务热线 : 13764045638   QQ号:47079569    邮箱:service@parnassusdata.com

 

 

ORA-08103问题的诊断最好是能生成8103错误的ERROR STACK TRACE, 在TRACE中会记录具体引发8103的对象的OBJ和OBJD,这便于我们定位可能存在corruption的对象。

问题在于往往前台进程遇到ORA-08103错误不会在后台生成TRACE文件,这需要我们手动设置8103 触发ERRORSTACK的EVENTS:

 

ALTER SYSTEM SET  EVENTS  ‘ 8103 TRACE NAME ERRORSTACK LEVEL 3’;

解决思路包括:
1. 通过OBJD和DBA定位到具体的表名和块号
2. 有条件的情况下对该表做一个analyze .. validate structure
3. 有条件的情况下对该表所在tablespace做一个 dbms_space_admin.ASSM_TABLESPACE_VERIFY
4. 有条件的情况下move这张表或者相关的分区,尝试绕过该问题
5. 有条件的情况下降该表或分区移动到MSSM表空间上,绕过该问题

execute dbms_space_admin.tablespace_verify(‘&tablespace_name’)
oradebug setmypid
oradebug tracefile_name

 

execute dbms_space_admin.assm_tablespace_verify(‘&tablespace_name’,dbms_space_admin.TS_VERIFY_BITMAPS)
oradebug setmypid
oradebug tracefile_name

 

 

 

[oracle@nas ~]$ oerr ora 8103
08103, 00000, "object no longer exists"
// *Cause: The object has been deleted by another user since the operation
// began, or a prior incomplete recovery restored the database to
// a point in time during the deletion of the object.
// *Action: Delete the object if this is the result of an incomplete
// recovery.
@ Using the call stack trace arguments to identify the block producing the ORA-8103.
@ struct kcbds
@ {
@ ktid kcbdstid; /* full relative DBA plus object number */
@ .....@ struct ktid /* relative dba + objd */
@ {
@ kdbafr dbr_ktid; /* a relative dba */
@ kobjd objd_ktid; /* data object number */
@ kobjn objn_ktid; /* dictionary object number */@ struct kdbafr /* full relative dba */
@ {
@ ktsn tsn_kdbafr; 4bytes /* a tablespace number */
@ krdba dba_kdbafr; 4bytes /* a relative dba */
@ };
@ alter session set db_file_multiblock_read_count=1;
@ alter session set events '8103 trace name errorstack level 3';
@ kcbgtcr(kcbds *ds,...
@ ktecgshx(sdes, ...)
@ kcbds *sdes;
@ ktecgetsh(cdes, ...)
@ kcbds *cdes;
@ Example from a trace file with function ktecgshx being called by kteinicnt1:
@ kteinicnt1()+796 CALL ktecgshx() FFFFFFFF7FFF8F78 ?
@ 000000003 ? 000000004 ?
@ 0000001BC ? 000000000 ?
@ 1007AA000 ?
@ Argument/Register addr=0xFFFFFFFF7FFF8F78.
@ Dump of memory from 0xFFFFFFFF7FFF8F38 to 0xFFFFFFFF7FFF9078
@ FFFFFFFF7FFF8F30 00000000 00000000 [........]
@ FFFFFFFF7FFF8F40 00000000 00000000 FFFFFFFF 00000001 [................]
@ FFFFFFFF7FFF8F50 00000000 00000000 00000000 00000000 [................]
@ Repeat 1 times
@ FFFFFFFF7FFF8F70 00000000 00000000 0000000C 01006402 [..............d.]
After increase in load, queries against ASSM table intermittently fail with ORA-8103 when executed in
parallel if there are concurrent updates performed on the table.
This appears to only manifest when access is in parallel.
Cause
This is caused by Bug 5637976 ORA-8103 EVEN WITH THE WORKAROUND FROM Bug 3569503 fixed in 11.1g.
Concurrent inserts and direct path exports on an ASSM table causes ORA-8103/ORA-1410.
This is due to the fact that newly formatted blocks between low and high water mark do not get flushed to disk and query sees old copies from disk.
Rediscovery Information:
1. Concurrent inserts and exports on ASSM tables
2. ORA-8103/ORA-1410
3. redo dump shows 'ktspbfredo - Format Pagetable Datablock' for that rdba
REDO RECORD - Thread:2 RBA: 0x00045b.001887a1.0028 LEN: 0x008c VLD: 0x01
SCN: 0x0578.6eddf7be SUBSCN: 1 07/19/2012 12:11:00
CHANGE #1 TYP:1 CLS: 4 AFN:370 DBA:0x5ca5f32e OBJ:1638047 SCN:0x0578.6eddf7bd SEQ: 1 OP:13.17
ktsphfredo - Format Pagetable Segment Header
StartDBA 0x5ca5f32b nblks: 32 ForceL3 :1 Tsn: 15 objd: 1638047
REDO RECORD - Thread:2 RBA: 0x00045b.001887a5.0198 LEN: 0x008c VLD: 0x01
SCN: 0x0578.6eddf7c7 SUBSCN: 1 07/19/2012 12:11:00
CHANGE #1 TYP:1 CLS: 4 AFN:284 DBA:0x4718cbee OBJ:1638047 SCN:0x0578.6eddf7c2 SEQ: 1 OP:13.17
ktsphfredo - Format Pagetable Segment Header
BH (70000039ffb5108) file#: 370 rdba: 0x5ca5f32e (370/2487086) class: 7 ba: 70000039f230000
set: 94 blksize: 32768 bsi: 0 set-flg: 0 pwbcnt: 0
dbwrid: 5 obj: 1638047 objn: 148393 tsn: 15 afn: 370
hash: [700000fde5e6380,700000fde5e6380] lru: [7000005e7fcbdc0,700000b91fb4ce8]
lru-flags: hot_buffer
ckptq: [NULL] fileq: [NULL] objq: [700000f7c3f8288,70000063cfbac28]
st: SCURRENT md: NULL tch: 2 le: 70000069bff76a0
flags: remote_transfer
LRBA: [0x0.0.0] HSCN: [0xffff.ffffffff] HSUB: [65535]
buffer tsn: 15 rdba: 0x5ca5f32e (370/2487086)
scn: 0x0578.6eded558 seq: 0x01 flg: 0x00 tail: 0xd5582401
frmt: 0x02 chkval: 0x0000 type: 0x24=PAGETABLE EXTENT MAP BLOCK
Hex dump of block: st=0, typ_found=1
EMB Dump: 
Map Header:: next 0x4718cbee #extents: 1112 obj#: 1638047 flag: 0x10000000
Inc # 0 
Extent Map
-----------------------------------------------------------------
0x5ca5f32b length: 32 
0x5ceff1eb length: 32 
0x5d15360b length: 32 
0x5d5ddbcb length: 32 
0x5d9d106b length: 32
0x5dc000ab length: 32
0x5e09e1ab length: 32
0x5e4a8c0b length: 32
0x5e80d24b length: 32
0x5ec9a10b length: 32
0x5f009feb length: 32
0x5f40b74b length: 32
0x5f895f2b length: 32
0x5fd254cb length: 32
BH (700000dbcfc0ea8) file#: 284 rdba: 0x4718cbee (284/1625070) class: 7 ba: 700000dbc750000
set: 67 blksize: 32768 bsi: 0 set-flg: 0 pwbcnt: 0
dbwrid: 2 obj: 1638047 objn: 148393 tsn: 15 afn: 284
hash: [700000fdc387588,700000fdc387588] lru: [7000002f1fbcf90,700000a77fcfc30]
lru-flags: hot_buffer
ckptq: [NULL] fileq: [NULL] objq: [700000fc67dd420,700000453fb1828]
st: SCURRENT md: NULL tch: 143 le: 700000665fd8200
flags: remote_transfer
LRBA: [0x0.0.0] HSCN: [0xffff.ffffffff] HSUB: [65535]
buffer tsn: 15 rdba: 0x4718cbee (284/1625070)
scn: 0x0578.6ee3867a seq: 0x01 flg: 0x00 tail: 0x867a2401
frmt: 0x02 chkval: 0x0000 type: 0x24=PAGETABLE EXTENT MAP BLOCK
Hex dump of block: st=0, typ_found=1
EMB Dump:
Map Header:: next 0x00000000 #extents: 1983 obj#: 1638047 flag: 0x10000000
Inc # 0
Extent Map
-----------------------------------------------------------------
0x4718cbeb length: 32
0x475b598b length: 32
0x47989f6b length: 32
0x47d84f2b length: 32
ORA-8103 - objd: 1638108 objn: 1338416 tsn: 15 rdba: 0x4b8bf059
ksedmp: internal or fatal error
ORA-08103: object no longer exists
Current SQL statement for this session:
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst+001c          bl       ksedst1              000000001 ? 000000000 ?
ksedmp+0290          bl       ksedst               104C23090 ?
ksddoa+0308          bl       _ptrgl
ksdpcg+0104          bl       ksddoa               110490160 ? 11048ACB8 ?
ksdpec+00e8          bl       ksdpcg               FFFFFFFFFFEEF20 ?
700000010007FE0 ?
FFFFFFFFFFEEFF0 ?
ksfpec+00a4          bl       03F37234
kgesev+007c          bl       _ptrgl
ksesec0+0048         bl       kgesev               000007FE8 ? 104FD1FE0 ?
000000000 ? 000000000 ?
FFFFFFFFFFEF410 ?
kteinicnt1+0384      bl       01FC3F98
qertbFetch+0288      bl       03F386EC
qertqoFetch+0298     bl       01FC3FD8
qerpx_resume+0370    bl       01FC3FD8
qerpxFetch+0e08      bl       qerpx_resume         000000000 ? 11055A520 ?
rwsfcd+0054          bl       _ptrgl
insfch+00b4          bl       _ptrgl
insdrv+042c          bl       insfch               104C2BAE8 ? 000000000 ?
inscovexe+02d8       bl       insdrv               1104A81B0 ?
insExecStmtExecIniE  bl       _ptrgl
ngine+005c
insexe+0318          bl       insExecStmtExecIniE  000000000 ? 000000400 ?
ngine                11048A818 ?
opiexe+2840          bl       insexe               1104BF320 ? FFFFFFFFFFF1678 ?
opipls+1888          bl       opiexe               FFFFFFFFFFF29C8 ?
FFFFFFFFFFF2AB0 ?
FFFFFFFFFFF2968 ?
opiodr+0b2c          bl       _ptrgl
rpidrus+01dc         bl       opiodr               66FFFF47D0 ? 6FFFF4800 ?
FFFFFFFFFFF5900 ? A00000000 ?
skgmstack+00c8       bl       _ptrgl
rpidru+0088          bl       skgmstack            000000003 ? 000000003 ?
000000002 ? 000000000 ?
FFFFFFFFFFF50B0 ?
rpiswu2+0368         bl       _ptrgl
rpidrv+097c          bl       rpiswu2              70000100553C598 ? 000000000 ?
700000010003520 ? 110566428 ?
110566464 ? 96FFFF5B30 ?
1104C6010 ? 000000000 ?
Argument/Register addr=0x0FFFFFFFFFFEF410.
Dump of memory from 0x0FFFFFFFFFFEF3D0 to 0x0FFFFFFFFFFEF510
FFFFFFFFFFEF3D0 00000000 00000000 00000001 1048A818 [.............H..]
FFFFFFFFFFEF3E0 00000000 00002000 00000001 1019C060 [...... ........`]
FFFFFFFFFFEF3F0 0FFFFFFF FFFEF5E0 48220080 00000B9D [........H"......]
FFFFFFFFFFEF400 00000000 00000000 00000000 00000000 [................]
FFFFFFFFFFEF410 0000000F 4B8BF059 0018FEDC 00146C30 [....K..Y......l0]
FFFFFFFFFFEF420 00080003 00007FE8 00000000 100733A8 [..............3.]
00146C30=> 1338416=> ORA-8103 - objd: 1638108 objn: 1338416
kjbhistory[0xbf059.12e0000,(pkey 4294967295.0)(where 1)]
*** 2012-07-19 15:05:23.818
GLOBAL CACHE ELEMENT DUMP (address: 70000018cfe95a0):
id1: 0xbf059 id2: 0x12e0000 pkey: INVALID block: (302/782425)
lock: NC rls: 0x0000 acq: 0x0003 latch: 20
flags: 0xc1 fair: 0 recovery: 0 fpin: 'ktewh25: kteinicnt'
bscn: 0x578.6ee51801 bctx: 0 write: 0 scan: 0x0 xflg: 0 xid: 0x0.0.0
lcp: 700000fd843f070 lnk: [700000fd843f090,700000fd843f090] lch: [700000bdbfbb338,700000bdbfbb338]
seq: 25664 hist: 7 352 477 329 144:6 384 7 352 477 329
LIST OF BUFFERS LINKED TO THIS GLOBAL CACHE ELEMENT:
flg: 0x00080000 state: READING mode: EXCL
pin: 'ktewh25: kteinicnt'
addr: 700000bdbfbb228 obj: 1638108 cls: SEG HEAD bscn: 0x577.a4f2674f
Note= OERR: ORA-8103 "object no longer exists" / Troubleshooting, Diagnostic and Solution (Doc ID 8103.1)
==>
Cause
ORA-8103 is caused by an invalid block type. The block header has an invalid block type or the 
block type inside the block is not expected; e.g. a data block (Type=6) was expected but the 
actual block information is not a data block (Type!=6).
ORA-8103 is also caused by an unexpected data_object_id where it is changing for the involved 
objects while the affected SQL statement is executed.
$sqlplus / as sysdba
Note: please replace literal '<owner>' with actual owner 
--------------------<
set lines 500
set long 9999
set pages 999
set serveroutput on size 1000000
set feedback off
SET MARKUP HTML ON SPOOL ON HEAD "<TITLE>SQL*Plus Report</title><STYLE TYPE='TEXT/CSS'><!--BODY {background: ffffc6} --></STYLE>"
spool query_result.html
set echo off
alter session set nls_date_format = 'yyyy/mm/dd hh24:mi:ss';
SELECT * FROM DBA_TAB_MODIFICATIONS where table_owner = '<owner>'
and table_name in ('RAW_BORM','MG_34_FEE_DTL','RAW_BOIS','MG_34_CA_AMT_BK','RAW_BLDVNI');
spool off
SET MARKUP HTML OFF
set echo on
-------------------->
1. run the hcheck script against the database "using note hcheck.sql" script to check for 
known problems in Oracle8i, Oracle9i, Oracle10g and Oracle 11g (Doc ID 136697.1) and provide the output to SR.
Please do not provide a print screen, but the spool file obtained
2. set event for ORA 8103 to capture the errorstack
alter system set events='8103 trace name errorstack, level 3';
3. wait for the error to reproduce and upload the trace file created for the error

SQL*Net break/reset to client等待事件

一般情况下无法从动态视图(v$session/v$session_wait)看到这个等待事件,因为它十分短暂。其本质从字面意思上来解释的话,是一种网络等待(network issue);
举例而言,如果运行的代码中包含某种可能的错误,且在调用中触发了的话,服务器端本地的服务进程有义务对远程客户端告知该信息,这个告知的过程中服务进程就处于SQL*Net break/reset to client等待中,直到客户端收到问题信息为止。与一般意义上的Sever-client模式一样,使用dblink时也可能出现该种等待事件。
下面我们来尝试演示该种等待事件:

SQL>  create table tv (t1 int unique);
Table created.
SQL> insert into tv values(1);
1 row created.
SQL> commit;
Commit complete.
SQL> oradebug setmypid;
Statement processed.
SQL>  oradebug event 10046 trace name context forever,level 8;
Statement processed.
SQL> insert into tv values(1);
insert into tv values(1)
*
ERROR at line 1:
ORA-00001: unique constraint (SYS.SYS_C009343) violated
SQL>  oradebug event 10046 trace name context off;
Statement processed.
SQL> select event, total_waits from v$session_event where event like '%break%' and sid =(select distinct sid from v$mystat);
EVENT                                                            TOTAL_WAITS
---------------------------------------------------------------- -----------
SQL*Net break/reset to client                                              2

可以看到出现了2次SQL*Net break/reset to client ,进一步分析trace文件可以发现以下记录:

EXEC #2:c=8000,e=9020,p=0,cr=10,cu=9,mis=0,r=0,dep=0,og=1,plh=0,tim=1279117896316991
ERROR #2:err=1 tim=1279117896317039
STAT #2 id=1 cnt=0 pid=0 pos=1 obj=0 op='LOAD TABLE CONVENTIONAL  (cr=0 pr=0 pw=0 time=0 us)'
WAIT #2: nam='SQL*Net break/reset to client' ela= 33 driver id=1413697536 break?=1 p3=0 obj#=-1 tim=1279117896317323
WAIT #2: nam='SQL*Net break/reset to client' ela= 521 driver id=1413697536 break?=0 p3=0 obj#=-1 tim=1279117896317915
WAIT #2: nam='SQL*Net message to client' ela= 5 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1279117896317964

2次SQL*Net break/reset to client分别为33us和521us; 设计不良的应用程序或者程序中存在较多未安置的exception都可能导致SQL*Net break/reset to client等待事件。

AIX平台上大型OLTP数据库的shutdown问题

客户的新系统上线已经一年有余,核心系统硬件采用IBM P595,操作系统版本AIX 5300-09,存储使用DS6000,数据库版本为10.2.0.4,没有打额外的patch set update.此套系统平时会话数量在2000-3000的水平,每秒新建会话在10个左右。

客户这套系统一直有一个问题,即每次准备关闭实例进行一些维护工作时,在多次手动执行检查点(确保脏块被写出)后,shudown immediate命令仍需要非常长的时间才完成关闭数据库动作;之前客户一直使用在shutdown之前将大部分应用服务进程杀死的方法,可以缩短 shutdown immediate命令完成的时间。

实际上服务进程在2000-3000的OLTP系统在当前已经很普及了,而在其他平台上(譬如:Linux,SUN OS)上则不会出现一个shutdown操作持续半个小时以上的情况。

通过查询My Oracle support发现一个9i上shutdown immediate费时半个小时的note:

Hdr: 3484589 9.2.0.4.0 RDBMS 9.2.0.4.0 PRODID-5 PORTID-212
Abstract: BUG:3046394  WHICH IS A REWORK OF BUG :2674297 DOES NOT  STILL FIX THE PROBLEM.
PROBLEM:
--------
Shutdown abort on AIX5L takes 7 min. and Shutdown immediate takes 30 min.
Patch for the bug 3046394( which is a rework of Bug 2674297) is applied ,but
there is no improvement in the time taken ot shutdown.
The problem is same as addressed in these two bugs but has not been fixed.
2. Pertinent configuration information
none
3. Indication of the frequency and predictability of the problem
Consistent.
4. Sequence of events leading to the problem
Shutdown abort
5. Technical impact on the customer. Include persistent after effects.
Shutdown taking a long time which is unacceptable to the  Customer.
DIAGNOSTIC ANALYSIS:
--------------------
From the alert log we see time taken for shutdown is around 7 min.
The patch 3046394 is applie,which is confirmed form the output of
Apply_3046394_02-20-2004_19-11-35.log
The shutdown abort was tried again now,it still takes the same amount of time.
The system calls made by the shutdown abort include:-
(These are a subset of the total Kill calls made)
196022:    kill(359980, 9)                    = 0
196022:    kill(359980, 9)                    = 0
196022:    kill(359980, 0)                    = 0
196022:    _nsleep(0x0FFFFFFFFFFF9D80, 0x0FFFFFFFFFFF9E50)    = 0
196022:    kill(359980, 0)                    = 0
196022:    _nsleep(0x0FFFFFFFFFFF9D80, 0x0FFFFFFFFFFF9E50)    = 0
196022:    kill(359980, 0)                    = 0
196022:    _nsleep(0x0FFFFFFFFFFF9D80, 0x0FFFFFFFFFFF9E50)    = 0
196022:    kill(359980, 0)                    = 0
196022:    _nsleep(0x0FFFFFFFFFFF9D80, 0x0FFFFFFFFFFF9E50)    = 0
WORKAROUND:
-----------
No Workaround available.
RELATED BUGS:
-------------
Bug:3046394 , Bug :2674297

另一个文档指出,shutdown abort慢的主要原因Oracle关闭进程使用的是system call kill()函数,Bug 3484589的补丁中已经将Oracle 在shutdown实例时可能的等待问题移除了。

Shutdown abort taking long time on AIX (patch 3046394 applied) [ID 274399.1]
Applies To
Oracle Server - Enterprise Edition - Version: 9.2.0.1 to 9.2.0.4
AIX5L Based Systems (64-bit)
Symptoms
Shutdown abort taking a long time to complete on AIX.
Cause
This has been filed as bug 2674297. The wait time was reduced to shorten the shutdown time. This still 
takes around 50ms for each process to be terminated. This has been re-worked in Bug 3046394 to remove the 
delay during the shutdown and compensate for it during database startup.
But we still see the delay in shutdown abort on AIX. The delay during shutdown abort persists after applying 
the above patch (3046394) due to OS level handling of the shutdown command.
Fix
There is no fix. The problem still persists at the OS level. This is because of the way the kill command is 
processed at the OS level. Bug 3484589, has clearly mentioned that all the waits from the Oracle side have 
been removed. There is no further code that can be implemented to reduce the time at shutdown.

另一个文档则指出了这种连shutdown abort都慢的情况仅出现在AIX平台上:

Hdr: 3485690 9.2.0.3 RDBMS 9.2.0.3 OSD PRODID-5 PORTID-319 3046394
Abstract: SHUTDOWN ABORT TAKES A LONG TIME ON IBM AIX
PROBLEM:
--------
Ct has up to  8000 connections on their database. When they issue a shutdown
abort, it takes 8 minutes to complete. With 175 connections, it took 2.5
minutes to shutdown abort. With no one attached it takes 14 seconds.
When I test this on our database in house, AIX took 12 seconds to shutdown.
Linux takes .2 seconds, as does Solaris, and Windows 2000.
DIAGNOSTIC ANALYSIS:
--------------------
Applied patch for bug 2674297, and shutdown abort with 5000 connections took 5
minutes.

看起来这是一个操作系统调用引起的问题,且因为是操作系统的问题,Oracle无法提供进一步彻底解决方法;

总结起来当服务进程数量较多时,在AIX平台上shutdown immediate的缓慢程度往往是用户无法接受的,我们可以采取一个折中的方案,使用shutdown abort来缩短关闭实例所用的时间:

1.  alter system checkpoint;             -- 手动做检查点三次,保证脏块写出
2.  shutdown abort;                      -- abort实例,相当于断电
3.  startup restrict ;                   -- 以限制模式开启实例,普通应用此时无法连接数据库,故不会产生额外的服务进程
4.  shutdown normal;                     -- 以普通模式关闭实例,会再次进行完全检查点,并回滚事务

其中第二步的shutdown abort,可以以杀死pmon进程的方式替代(这样其实会更快);采用以上折中方式前,请确认您的online redo log处于复用或已镜像的状态下。

Script:收集UNDO诊断信息

以下脚本可以用于收集Automatic Undo Management的必要诊断信息,以sysdba身份运行:

spool Undo_Diag.out  
ttitle off
set pages 999
set lines 150
set verify off 
set termout off
set trimout on
set trimspool on
REM   
REM ------------------------------------------------------------------------  
REM   
REM  -----------------------------------------------------------------  
REM  
set space 2  
REM  REPORTING TABLESPACE INFORMATION: 
REM   
REM  This looks at Tablespace Sizing - Total bytes and free bytes  
REM   
column tablespace_name  format a30            heading 'TS Name'  
column sbytes           format 9,999,999,999  heading 'Total MBytes'  
column fbytes           format 9,999,999,999  heading 'Free MBytes'  
column file_name        format a30            heading 'File Name'
column kount            format 999            heading 'Ext'  
compute sum of fbytes on tablespace_name  
compute sum of sbytes on tablespace_name  
compute sum of sbytes on report  
compute sum of fbytes on report  
break on tablespace_name skip 2  
select a.tablespace_name,  a.file_name,  round(a.bytes/1024/1024,0) sbytes,  
round(sum(b.bytes/1024/1024),0) fbytes,  count(*) kount, autoextensible  
from   dba_data_files a,  dba_free_space b  
where  a.file_id  =  b.file_id  
and a.tablespace_name in (select z.tablespace_name from dba_tablespaces z where retention like '%GUARANTEE')
group  by a.tablespace_name, a.file_name, a.bytes, autoextensible
order  by a.tablespace_name  
/  
set linesize 160  
REM   
REM  If you can significantly reduce physical reads by adding incremental  
REM  data buffers...do it.  To determine whether adding data buffers will  
REM  help, set db_block_lru_statistics = TRUE and  
REM  db_block_lru_extended_statistics = TRUE in the init.ora parameters.  
REM  You can determine how many extra hits you would get from memory as  
REM  opposed to physical I/O from disk.  **NOTE:  Turning these on will  
REM  impact performance.  One shift of statistics gathering should be enough  
REM  to get the required information.  
REM   
REM   
REM  -----------------------------------------------------------------  
REM
set lines 160
col tablespace_name format a30 heading "Tablespace"
col tb format a15 heading "TB Status"
col df format a10 heading "DF Status"
col extent_management format a15 heading "Extent|Management"
col allocation_type format a8 heading "Type"
col segment_space_management format a7 heading "Auto|Segment"
col retention format a11 heading "Retention|Level"
col autoextensible format a5 heading "Auto?"
col mx format 999,999,999 heading "Max Allowed"
select t.tablespace_name, t.status tb, d.status df,
extent_management, allocation_type, segment_space_management, retention,
autoextensible, (maxbytes/1024/1024) mx
from dba_tablespaces t, dba_data_files d
where t.tablespace_name = d.tablespace_name
and retention like '%GUARANTEE'
/
col status format a20 head "Status"
col cnt format 999,999,999 head "How Many?"
select status, count(*) cnt
from dba_rollback_segs
group by status
/
set termout on
set trimout off
set trimspool off
set lines 120
set pages 999
set termout off
set trimout on
set trimspool on
alter session set nls_date_format='dd-Mon-yyyy hh24:mi';
prompt
prompt  ############## RUNTIME ############## 
prompt
col rdate head "Run Time"
select sysdate rdate from dual;
prompt 
prompt  ############## DATAFILES ############## 
prompt 
col retention head "Retention"
col tablespace_name format a30 head "TBSP Name"
col file_id format 999 head "File #"
col a format 999,999,999,999,999 head "Bytes Alloc (MB)"
col b format 999,999,999,999,999 head "Max Bytes Used (MB)"
col autoextensible head "Auto|Ext"
col extent_management head "Ext Mngmnt"
col allocation_type head "Type"
col segment_space_management head "SSM"
select tablespace_name, file_id, sum(bytes)/1024/1024 a, 
sum(maxbytes)/1024/1024 b, 
autoextensible
from dba_data_files
where tablespace_name in (select tablespace_name from dba_tablespaces
where retention like '%GUARANTEE' )
group by file_id, tablespace_name, autoextensible
order by tablespace_name
/
set termout on
set trimout off
set trimspool off
ttitle off
set pages 999
set lines 150
set verify off 
set termout off
set trimout on
set trimspool on
REM   
REM ------------------------------------------------------------------------  
REM   
REM  -----------------------------------------------------------------  
REM  
REM
REM  REPORTING UNDO EXTENTS INFORMATION:  
REM   
REM  -----------------------------------------------------------------  
REM 
REM  Undo Extents breakdown information
REM
ttitle center "Rollback Segments Breakdown" skip 2
col status format a20
col cnt format 999,999,999 head "How Many?"
select status, count(*) cnt from dba_rollback_segs
group by status
/
ttitle center "Undo Extents" skip 2
col segment_name format a30 heading "Name"
col "ACT BYTES" format 999,999,999,999 head "Active|Extents"
col "UNEXP BYTES" format 999,999,999,999 head "Unxpired|Extents"
col "EXP BYTES" format 999,999,999,999 head "Expired|Extents"
select segment_name,
nvl(sum(act),0) "ACT BYTES",
nvl(sum(unexp),0) "UNEXP BYTES",
nvl(sum(exp),0) "EXP BYTES"
from (
select segment_name,
nvl(sum(bytes),0) act,00 unexp, 00 exp
from DBA_UNDO_EXTENTS
where status='ACTIVE' group by segment_name
union
select segment_name,
00 act, nvl(sum(bytes),0) unexp, 00 exp
from DBA_UNDO_EXTENTS
where status='UNEXPIRED' group by segment_name
union
select segment_name,
00 act, 00 unexp, nvl(sum(bytes),0) exp
from DBA_UNDO_EXTENTS
where status='EXPIRED' group by segment_name
) group by segment_name;
ttitle center "Undo Extents Statistics" skip 2
col size format 999,999,999,999 heading "Size"
col "HOW MANY" format 999,999,999 heading "How Many?"
col st heading a12 heading "Status"
select distinct status st, count(*) "HOW MANY", sum(bytes) "SIZE"
from dba_undo_extents
group by status
/
col segment_name format a30 heading "Name"
col TABLESPACE_NAME for a20
col BYTES for 999,999,999,999
col BLOCKS for 999,999,999
col status for a15 heading "Status"
col segment_name heading "Segment"
col extent_id heading "ID"
select SEGMENT_NAME, TABLESPACE_NAME, EXTENT_ID, 
FILE_ID, BLOCK_ID, BYTES, BLOCKS, STATUS
from dba_undo_extents
order by 1,3,4,5
/
REM
REM  -----------------------------------------------------------------  
REM 
REM  Undo Extents Contention breakdown
REM  Take out column TUNED_UNDORETENTION if customer 
REM   prior to 10.2.x
REM
REM   The time frame can be adjusted with this query
REM   By default using around 4 hour window of time
REM
REM   Ex.
REM   Using sysdate-.04 looking at the last hour
REM   Using sysdate-.16 looking at the last 4 hours
REM   Using sysdate-.32 looking at the last 8 hours
REM   Using sysdate-1 looking at the last 24 hours
REM
set linesize 140
ttitle center "Undo Extents Error Conditions (Default - Last 4 Hours)" skip 2
col UNXPSTEALCNT format 999,999,999  heading "# Unexpired|Stolen"
col EXPSTEALCNT format 999,999,999   heading "# Expired|Reused"
col SSOLDERRCNT format 999,999,999   heading "ORA-1555|Error"
col NOSPACEERRCNT format 999,999,999 heading "Out-Of-space|Error"
col MAXQUERYLEN format 999,999,999   heading "Max Query|Length"
col TUNED_UNDORETENTION format 999,999,999  heading "Auto-Ajusted|Undo Retention"
col hours format 999,999 heading "Tuned|(HRs)"
select inst_id, to_char(begin_time,'MM/DD/YYYY HH24:MI') begin_time, 
UNXPSTEALCNT, EXPSTEALCNT , SSOLDERRCNT, NOSPACEERRCNT, MAXQUERYLEN,
TUNED_UNDORETENTION, TUNED_UNDORETENTION/60/60 hours
from gv$undostat
where begin_time between (sysdate-.16) 
and sysdate
order by inst_id, begin_time
/
set termout on
set trimout off
set trimspool off
ttitle off
set pages 999
set lines 150
set verify off 
set termout off
set trimout on
set trimspool on
REM   
REM ------------------------------------------------------------------------  
col name format a30  
col gets format 9,999,999  
col waits format 9,999,999  
PROMPT  ROLLBACK HIT STATISTICS:  
REM   
REM  GETS - # of gets on the rollback segment header 
REM  WAITS - # of waits for the rollback segment header  
set head on;  
select name, waits, gets  
from   v$rollstat, v$rollname  
where  v$rollstat.usn = v$rollname.usn  
/  
col pct head "< 2% ideal"
select 'The average of waits/gets is '||  
round((sum(waits) / sum(gets)) * 100,2)||'%' PCT 
From    v$rollstat  
/  
PROMPT  REDO CONTENTION STATISTICS:
REM   
REM  If the ratio of waits to gets is more than 1% or 2%, consider  
REM  creating more rollback segments  
REM   
REM  Another way to gauge rollback contention is:  
REM   
column xn1 format 9999999  
column xv1 new_value xxv1 noprint  
select class, count  
from   v$waitstat  
where  class in ('system undo header', 'system undo block', 
'undo header',        'undo block'          )  
/  
set head off
select 'Total requests = '||sum(count) xn1, sum(count) xv1  
from    v$waitstat  
/  
select 'Contention for system undo header = '||  
(round(count/(&xxv1+0.00000000001),4)) * 100||'%'  
from  v$waitstat  
where   class = 'system undo header'  
/  
select 'Contention for system undo block = '||  
(round(count/(&xxv1+0.00000000001),4)) * 100||'%'  
from    v$waitstat  
where   class = 'system undo block'  
/  
select 'Contention for undo header = '||  
(round(count/(&xxv1+0.00000000001),4)) * 100||'%'  
from    v$waitstat  
where   class = 'undo header'  
/  
select 'Contention for undo block = '||  
(round(count/(&xxv1+0.00000000001),4)) * 100||'%'  
from    v$waitstat  
where   class = 'undo block'  
/  
REM   
REM  NOTE: Not as useful with AUM configured 
REM 
REM  If the percentage for an area is more than 1% or 2%, consider  
REM  creating more rollback segments.  Note:  This value is usually very  
REM  small 
REM  and has been rounded to 4 places.  
REM   
REM ------------------------------------------------------------------------  
REM   
REM  The following shows how often user processes had to wait for space in  
REM  the redo log buffer:  
select name||' = '||value  
from   v$sysstat  
where  name = 'redo log space requests'  
/  
REM   
REM  This value should be near 0.  If this value increments consistently,  
REM  processes have had to wait for space in the redo buffer.  If this  
REM  condition exists over time, increase the size of LOG_BUFFER in the  
REM  init.ora file in increments of 5% until the value nears 0.  
REM  ** NOTE: increasing the LOG_BUFFER value will increase total SGA size.  
REM   
REM  -----------------------------------------------------------------------  
col name format a15  
col gets format 9999999  
col misses format 9999999  
col immediate_gets heading 'IMMED GETS' format 9999999  
col immediate_misses heading 'IMMED MISS' format 9999999  
col sleeps format 999999  
PROMPT  LATCH CONTENTION:  
REM   
REM  GETS - # of successful willing-to-wait requests for a latch  
REM  MISSES - # of times an initial willing-to-wait request was unsuccessful  
REM  IMMEDIATE_GETS - # of successful immediate requests for each latch  
REM  IMMEDIATE_MISSES = # of unsuccessful immediate requests for each latch  
REM  SLEEPS - # of times a process waited and requests a latch after an  
REM           initial willing-to-wait request  
REM   
REM  If the latch requested with a willing-to-wait request is not  
REM  available, the requesting process waits a short time and requests  
REM  again.  
REM  If the latch requested with an immediate request is not available,  
REM  the requesting process does not wait, but continues processing  
REM   
set head on  
select name,          gets,              misses,  
immediate_gets,  immediate_misses,  sleeps  
from   v$latch  
where  name in ('redo allocation',  'redo copy')  
/  
set head off 
select 'Ratio of MISSES to GETS: '||  
round((sum(misses)/(sum(gets)+0.00000000001) * 100),2)||'%'  
from    v$latch  
where   name in ('redo allocation',  'redo copy')  
/  
select 'Ratio of IMMEDIATE_MISSES to IMMEDIATE_GETS: '||  
round((sum(immediate_misses)/  
(sum(immediate_misses+immediate_gets)+0.00000000001) * 100),2)||'%' 
from    v$latch  
where   name in ('redo allocation',  'redo copy')  
/  
set head on
REM   
REM  If either ratio exceeds 1%, performance will be affected.  
REM   
REM  Decreasing the size of LOG_SMALL_ENTRY_MAX_SIZE reduces the number of  
REM  processes copying information on the redo allocation latch.  
REM   
REM  Increasing the size of LOG_SIMULTANEOUS_COPIES will reduce contention  
REM  for redo copy latches.  
REM   
REM  -----------------------------------------------------------------  
REM  This looks at overall i/o activity against individual  
REM  files within a tablespace  
REM   
REM  Look for a mismatch across disk drives in terms of I/O  
REM   
REM  Also, examine the Blocks per Read Ratio for heavily accessed  
REM  TSs - if this value is significantly above 1 then you may have  
REM  full tablescans occurring (with multi-block I/O)  
REM   
REM  If activity on the files is unbalanced, move files around to balance  
REM  the load.  Should see an approximately even set of numbers across files  
REM   
set space 1  
PROMPT  REPORTING I/O STATISTICS:
column pbr       format 99999999  heading 'Physical|Blk Read'  
column pbw       format 999999    heading 'Physical|Blks Wrtn'  
column pyr       format 999999    heading 'Physical|Reads'  
column readtim   format 99999999  heading 'Read|Time'  
column name      format a55       heading 'DataFile Name'  
column writetim  format 99999999  heading 'Write|Time'  
compute sum of f.phyblkrd, f.phyblkwrt on report  
select fs.name name,  f.phyblkrd pbr,  f.phyblkwrt pbw, 
f.readtim,     f.writetim  
from   v$filestat f, v$datafile fs  
where  f.file#  =  fs.file#  
order  by fs.name  
/  
REM   
REM  -----------------------------------------------------------------  
PROMPT  GENERATING WAIT STATISTICS:  
REM   
REM  This will show wait stats for certain kernel instances.  This  
REM  may show the need for additional rbs, wait lists, db_buffers  
REM   
column class  heading 'Class Type'  
column count  heading 'Times Waited'  format 99,999,999 
column time   heading 'Total Times'   format 99,999,999  
select class,  count,  time  
from   v$waitstat  
where  count > 0  
order  by class  
/  
REM   
REM  Look at the wait statistics generated above (if any). They will  
REM  tell you where there is contention in the system.  There will  
REM  usually be some contention in any system - but if the ratio of  
REM  waits for a particular operation starts to rise, you may need to  
REM  add additional resource, such as more database buffers, log buffers,  
REM  or rollback segments  
REM   
REM  -----------------------------------------------------------------  
PROMPT  ROLLBACK EXTENT STATISTICS:  
REM   
column usn        format 999          heading 'Undo #'
column extents    format 999          heading 'Extents'  
column rssize     format 999,999,999  heading 'Size in|Bytes'  
column optsize    format 999,999,999  heading 'Optimal|Size'  
column hwmsize    format 99,999,999   heading 'High Water|Mark'  
column shrinks    format 9,999        heading 'Num of|Shrinks'  
column wraps      format 9,999        heading 'Num of|Wraps'  
column extends    format 999,999      heading 'Num of|Extends'  
column aveactive  format 999,999,999  heading 'Average size|Active Extents'  
column rownum noprint  
select usn, extents, rssize,    optsize,  hwmsize,  
shrinks,   wraps,    extends,  aveactive  
from   v$rollstat  
order  by rownum  
/  
set termout on
set trimout off
set trimspool off
set lines 120
set pages 999
set termout off
set trimout on
set trimspool on
prompt
prompt  ############## RUNTIME ############## 
prompt
col rdate head "Run Time"
select sysdate rdate from dual;
prompt 
prompt  ############## HISTORICAL DATA ############## 
prompt 
col x format 999,999 head "Max Concurrent|Last 7 Days"
col y format 999,999 head "Max Concurrent|Since Startup"
select max(maxconcurrency) x from v$undostat
/
select max(maxconcurrency) y from sys.wrh$_undostat
/
col i format 999,999 head "1555 Errors"
col j format 999,999 head "Undo Space Errors"
select sum(ssolderrcnt) i from v$undostat
where end_time > sysdate-2
/
select sum(nospaceerrcnt) j from v$undostat
where end_time > sysdate-2
/
prompt 
prompt  ############## CURRENT STATUS OF SEGMENTS  ############## 
prompt  ##############   SNAPSHOT IN TIME INFO     ##############
prompt  ##############(SHOWS CURRENT UNDO ACTIVITY)##############
prompt 
col segment_name format a30 head "Segment Name"
col "ACT BYTES" format 999,999,999,999 head "Active Bytes"
col "UNEXP BYTES" format 999,999,999,999 head "Unexpired Bytes"
col "EXP BYTES" format 999,999,999,999 head "Expired Bytes"
select segment_name, nvl(sum(act),0) "ACT BYTES", 
nvl(sum(unexp),0) "UNEXP BYTES",
nvl(sum(exp),0) "EXP BYTES"
from (select segment_name, nvl(sum(bytes),0) act,00 unexp, 00 exp
from dba_undo_extents where status='ACTIVE' group by segment_name
union 
select segment_name, 00 act, nvl(sum(bytes),0) unexp, 00 exp
from dba_undo_extents where status='UNEXPIRED' group by segment_name
union
select segment_name, 00 act, 00 unexp, nvl(sum(bytes),0) exp
from dba_undo_extents where status='EXPIRED' group by segment_name)
group by segment_name
order by 1
/
prompt 
prompt  ############## UNDO SPACE USAGE ############## 
prompt 
col usn format 999,999 head "Segment#"
col shrinks format 999,999,999 head "Shrinks"
col aveshrink format 999,999,999 head "Avg Shrink Size"
select usn, shrinks, aveshrink from v$rollstat
/
set termout on
set trimout off
set trimspool off
set pages 999
set termout off
set trimout on
set trimspool on
prompt
prompt  ############## RUNTIME ############## 
prompt
col rdate head "Run Time"
select sysdate rdate from dual;
col inst_id format 999 head "Instance #"
col Parameter format a35 wrap
col "Session Value" format a25 wrapped
col "Instance Value" format a25 wrapped
prompt
prompt  ############## PARAMETERS ############## 
prompt
select  a.inst_id, a.ksppinm  "Parameter",
b.ksppstvl "Session Value",
c.ksppstvl "Instance Value"
from x$ksppi a, x$ksppcv b, x$ksppsv c
where a.indx = b.indx and a.indx = c.indx
and a.inst_id=b.inst_id and b.inst_id=c.inst_id
and a.ksppinm in ('_undo_autotune', '_smu_debug_mode',
'_highthreshold_undoretention',
'undo_tablespace','undo_retention','undo_management')
order by 2;
set termout on
set trimout off
set trimspool off
set pages 999
set termout off
set trimout on
set trimspool on
prompt
prompt  ############## RUNTIME ############## 
prompt
col rdate head "Run Time"
select sysdate rdate from dual;
prompt 
prompt  ############## WAITS FOR UNDO (Since Startup) ############## 
prompt 
col inst_id head "Instance#"
col eq_type format a3 head "Enq"
col total_req# format 999,999,999,999,999,999 head "Total Requests"
col total_wait# format 999,999 head "Total Waits"
col succ_req# format 999,999,999,999,999,999 head "Successes"
col failed_req# format 999,999,999999 head "Failures"
col cum_wait_time format 999,999,999 head "Cummalitve|Time"
select * from v$enqueue_stat where eq_type='US'
union
select * from v$enqueue_stat where eq_type='HW'
/
prompt 
prompt  ############## LOCKS FOR UNDO ############## 
prompt 
col addr head "ADDR"
col KADDR head "KADDR"
col sid head "Session"
col osuser format a10 head "OS User"
col machine format a15 head "Machine"
col program format a17 head "Program"
col process format a7 head "Process"
col lmode head "Lmode"
col request head "Request"
col ctime format 9,999 head "Time|(Mins)"
col block head "Blocking?"
select /*+ RULE */  a.SID, b.process,
b.OSUSER,  b.MACHINE,  b.PROGRAM, 
addr, kaddr, lmode, request, round(ctime/60/60,0) ctime, block 
from 
v$lock a, 
v$session b 
where 
a.sid=b.sid
and a.type='US'
/
prompt 
prompt  ############## TUNED RETENTION HISTORY (Last 2 Days) ############## 
prompt  ##############        LOWEST AND HIGHEST DATA        ############## 
prompt 
col low format 999,999,999,999 head "Undo Retention|Lowest Tuned Value"
col high format 999,999,999,999 head "Undo Retention|Highest Tuned Value"
select end_time, tuned_undoretention from v$undostat where tuned_undoretention = (
select min(tuned_undoretention) low
from v$undostat
where end_time > sysdate-2)
/
select end_time, tuned_undoretention from v$undostat where tuned_undoretention = (
select max(tuned_undoretention) high
from v$undostat
where end_time > sysdate-2)
/
prompt 
prompt  ############## CURRENT TRANSACTIONS ############## 
prompt 
col sql_text format a40 word_wrapped head "SQL Code"
select a.start_date, a.start_scn, a.status, c.sql_text
from v$transaction a, v$session b, v$sqlarea c
where b.saddr=a.ses_addr and c.address=b.sql_address
and b.sql_hash_value=c.hash_value
/
select current_scn from v$database
/
col a format 999,999 head "UnexStolen"
col b format 999,999 head "ExStolen"
col c format 999,999 head "UnexReuse"
col d format 999,999 head "ExReuse"
prompt 
prompt  ############## WHO'S STEALING WHAT? (Last 2 Days) ############## 
prompt 
select unxpstealcnt a, expstealcnt b,
unxpblkreucnt c, expblkreucnt d
from v$undostat
where (unxpstealcnt > 0 or expstealcnt > 0)
and end_time > sysdate-2
/
set termout on
set trimout off
set trimspool off
set pages 999
set termout off
set trimout on
set trimspool on
prompt
prompt  ############## RUNTIME ############## 
prompt
col rdate head "Run Time"
select sysdate rdate from dual;
col current_scn head "SCN Now"
col start_date head "Trans Started"
col start_scn head "SCN for Trans"
col ses_addr head "ADDR"
prompt 
prompt  ############## Historical V$UNDOSTAT (Last 2 Days) ############## 
prompt 
col end_time format a18 Head "Date/Time"
col maxq format 999,999 head "Query|Maximum|Minutes"
col maxquerysqlid head "SqlID"
col undotsn format 999,999 head "TBS"
col undoblks format 999,999,999 head "Undo|Blocks"
col txncount format 999,999,999 head "# of|Trans"
col unexpiredblks format 999,999,999 head "# of Unexpired"
col expiredblks format 999,999,999 head "# of Expired"
col tuned format 999,999 head "Tuned Retention|(Minutes)"
select end_time, round(maxquerylen/60,0) maxq, maxquerysqlid,
undotsn, undoblks, txncount, unexpiredblks, expiredblks, 
round(tuned_undoretention/60,0) Tuned
from dba_hist_undostat
where end_time > sysdate-2
order by 1
/
prompt 
prompt  ############## RECENT MISSES FOR UNDO (Last 2 Days) ############## 
prompt 
set lines 500
select * from v$undostat where maxquerylen > tuned_undoretention
and end_time > sysdate-2
order by 2
/
select * from sys.wrh$_undostat where maxquerylen > tuned_undoretention
and end_time > sysdate-2
order by 2
/
prompt 
prompt  ############## AUTO-TUNING TUNE-DOWN DATA    ############## 
prompt  ############## ROLLBACK DATA (Since Startup) ############## 
prompt 
col name format a60 head "Name"
col value format 999,999,999 head "Counters"
select name, value from v$sysstat
where name like '%down retention%' or name like 'une down%'
or name like '%undo segment%' or name like '%rollback%'
or name like '%undo record%'
/
prompt 
prompt  ############## Long Running Query History ############## 
prompt 
col end_time head "Date"
col maxquerysqlid head "SQL ID"
col runawayquerysqlid format a15 head "Runaway SQL ID"
col results format a35 word_wrapped head "Space Issues"
col status head "Status"
col newret head "Tuned Down|Retention"
select end_time, maxquerysqlid, runawayquerysqlid, status,
decode(status,1,'Slot Active',4,'Reached Best Retention',5,'Reached Best Retention',
8, 'Runaway Query',9,'Runaway Query-Active',10,'Space Pressure',
11,'Space Pressure Currently',
16, 'Tuned Down (to undo_retention) due to Space Pressure', 
17,'Tuned Down (to undo_retention) due to Space Pressure-Active',
18, 'Tuning Down due to Runaway', 19, 'Tuning Down due to Runaway-Active',
28, 'Runaway tuned down to last tune down value',
29, 'Runaway tuned down to last tune down value',
32, 'Max Tuned Down - Not Auto-Tuning',
33, 'Max Tuned Down - Not Auto-Tuning (Active)',
37, 'Max Tuned Down - Not Auto-Tuning (Active)', 
38, 'Max Tuned Down - Not Auto-Tuning', 
39, 'Max Tuned Down - Not Auto-Tuning (Active)', 
40, 'Max Tuned Down - Not Auto-Tuning', 
41, 'Max Tuned Down - Not Auto-Tuning (Active)', 
42, 'Max Tuned Down - Not Auto-Tuning', 
44, 'Max Tuned Down - Not Auto-Tuning', 
45, 'Max Tuned Down - Not Auto-Tuning (Active)', 
'Other ('||status||')') Results, spcprs_retention NewRet
from sys.wrh$_undostat
where status > 1
/
prompt 
prompt  ############## Details on Long Run Queries ############## 
prompt 
col sql_fulltext head "SQL Text"
Col sql_id heading "SQL ID"
select sql_id, sql_fulltext, last_load_time "Last Load", 
round(elapsed_time/60/60/24,0) "Elapsed Days" 
from v$sql where sql_id in 
(select maxquerysqlid from sys.wrh$_undostat 
where status > 1)
/
set termout on
set trimout off
set trimspool off
set pages 999
set termout off
set trimout on
set trimspool on
prompt
prompt  ############## RUNTIME ############## 
prompt
col rdate head "Run Time"
select sysdate rdate from dual;
prompt 
prompt  ############## IN USE Undo Data ############## 
prompt 
select 
((select (nvl(sum(bytes),0)) 
from dba_undo_extents 
where tablespace_name in (select tablespace_name from dba_tablespaces
where retention like '%GUARANTEE' )
and status in ('ACTIVE','UNEXPIRED')) *100) / 
(select sum(bytes) 
from dba_data_files 
where tablespace_name in (select tablespace_name from dba_tablespaces
where retention like '%GUARANTEE' )) "PCT_INUSE" 
from dual; 
select tablespace_name, extent_management, allocation_type,
segment_space_management, retention
from dba_tablespaces where retention like '%GUARANTEE'
/
col c format 999,999,999,999 head "Sum of Free"
select (nvl(sum(bytes),0)) c from dba_free_space
where tablespace_name in
(select tablespace_name from dba_tablespaces where retention like '%GUARANTEE')
/
col d format 999,999,999,999 head "Total Bytes"
select sum(bytes) d from dba_data_files
where tablespace_name in
(select tablespace_name from dba_tablespaces where retention like '%GUARANTEE')
/
PROMPT
PROMPT  ############## UNDO SEGMENTS ############## 
PROMPT
col status head "Status"
col z format 999,999 head "Total Extents"
break on report
compute sum on report of z
select status, count(*) z from dba_undo_extents
group by status
/
col z format 999,999 head "Undo Segments"
select status, count(*) z from dba_rollback_segs
group by status
/
prompt 
prompt  ############## CURRENT STATUS OF SEGMENTS  ############## 
prompt  ##############   SNAPSHOT IN TIME INFO     ##############
prompt  ##############(SHOWS CURRENT UNDO ACTIVITY)##############
prompt 
col segment_name format a30 head "Segment Name"
col "ACT BYTES" format 999,999,999,999 head "Active Bytes"
col "UNEXP BYTES" format 999,999,999,999 head "Unexpired Bytes"
col "EXP BYTES" format 999,999,999,999 head "Expired Bytes"
select segment_name, nvl(sum(act),0) "ACT BYTES", 
nvl(sum(unexp),0) "UNEXP BYTES",
nvl(sum(exp),0) "EXP BYTES"
from (select segment_name, nvl(sum(bytes),0) act,00 unexp, 00 exp
from dba_undo_extents where status='ACTIVE' group by segment_name
union 
select segment_name, 00 act, nvl(sum(bytes),0) unexp, 00 exp
from dba_undo_extents where status='UNEXPIRED' group by segment_name
union
select segment_name, 00 act, 00 unexp, nvl(sum(bytes),0) exp
from dba_undo_extents where status='EXPIRED' group by segment_name)
group by segment_name
order by 1
/
prompt 
prompt  ############## UNDO SPACE USAGE ############## 
prompt 
col usn format 999,999 head "Segment#"
col shrinks format 999,999,999 head "Shrinks"
col aveshrink format 999,999,999 head "Avg Shrink Size"
select usn, shrinks, aveshrink from v$rollstat
/
set termout on
set trimout off
set trimspool off
spool off

沪ICP备14014813号

沪公网安备 31010802001379号