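Introduction to Oracle Supplemental Logging

Oracle supplemental logging comes in several types, depending on its purpose: minimal, all columns (ALL), primary key (PRIMARY KEY), unique key (UNIQUE), and foreign key (FOREIGN KEY). Columns of type LONG, LOB, and LONG RAW, as well as collection types, cannot be supplementally logged.

Enabling minimal supplemental logging lets the LogMiner (logmnr) tool support chained rows, clustered tables, and index-organized tables. The following SQL checks whether minimal supplemental logging is enabled: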

SELECT supplemental_log_data_min FROM v$database;

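If the query returns YES or IMPLICIT, minimal supplemental logging is enabled. It is enabled implicitly (the query returns IMPLICIT) whenever ALL, PRIMARY KEY, UNIQUE, or FOREIGN KEY supplemental logging is in use.

With a logical standby we usually enable primary key and unique key supplemental logging, but a table may have no primary key, unique key, or unique index. The experiments below summarize how Oracle behaves in that case.

First, create the test table: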

alter database add supplemental log data (primary key,unique index) columns ;

create table test (t1 int , t2 int ,t3 int ,t4 int );

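alter table test add constraint pk_t1 primary key (t1); -- add the primary key

Next, insert some rows using a loop (not shown).

update test set t2=10;       commit;   -- update the data

Analyzing these operations with LogMiner shows that the SQL recorded in the redo looks like this:

update "SYS"."TEST" set "T2" = '10' where "T1" = '64' and "T2" = '65' and ROWID = 'AAAMiSAABAAAOhiAA/';

The WHERE clause records the primary key value, the previous value of the modified column, and the ROWID of the original row.

Now drop the primary key from the table and observe the result.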

alter table test drop constraint pk_t1 ;

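update test set t2=11;       commit;   -- update the data

LogMiner shows the following SQL recorded in the redo:

update "SYS"."TEST" set "T2" = '11' where "T1" = '1' and "T2" = '10' and "T3" = '3' and "T4" = '4' and ROWID = 'AAAMiSAABAAAOhiAAA';

Without a primary key, the WHERE clause records every column value plus the ROWID.

The next experiment shows the behavior when a unique index exists: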

create unique index pk_t1 on test(t1);

update test set t2=15; commit;

LogMiner shows the following SQL recorded in the redo:

update "SYS"."TEST" set "T2" = '15' where "T1" = '9' and "T2" = '11' and "T3" = '11' and "T4" = '12' and ROWID = 'AAAMiSAABAAAOhiAAI';

That was with a unique index on t1 but no NOT NULL constraint. Now add the NOT NULL constraint:

alter table test modify t1 not null;

update test set t2=21; commit;

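LogMiner shows the following SQL recorded in the redo:

update "SYS"."TEST" set "T2" = '21' where "T1" = '2' and "T2" = '15' and ROWID = 'AAAMiSAABAAAOhiAAB';

As the SQL above shows, with only a unique index the WHERE clause still records all columns and the ROWID. With a unique index plus a NOT NULL constraint, the behavior matches the primary key case.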
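The three cases observed above can be summarized as a small decision rule. The Python sketch below is only a model of the behavior seen in this experiment (database-level primary key and unique index supplemental logging, single-table UPDATE). The function name `logged_where_columns` and its parameters are hypothetical and are not part of any Oracle API.

```python
def logged_where_columns(all_columns, updated_columns, primary_key=None,
                         unique_not_null_key=None):
    """Model which columns appear in the LogMiner WHERE clause for an UPDATE.

    Assumption: database-level (primary key, unique index) supplemental
    logging is on, as in the experiment above. ROWID is always present and
    is not modelled here.
    """
    # A primary key, or a unique index whose columns are all NOT NULL,
    # is enough to identify the row.
    key = primary_key or unique_not_null_key
    if key:
        # Key columns first, then the before-image of the updated columns.
        extra = [c for c in updated_columns if c not in key]
        return list(key) + extra
    # No usable key (including a unique index on a nullable column):
    # every column is logged.
    return list(all_columns)


cols = ["T1", "T2", "T3", "T4"]
print(logged_where_columns(cols, ["T2"], primary_key=["T1"]))          # ['T1', 'T2']
print(logged_where_columns(cols, ["T2"]))                              # ['T1', 'T2', 'T3', 'T4']
print(logged_where_columns(cols, ["T2"], unique_not_null_key=["T1"]))  # ['T1', 'T2']
```

A unique index on a nullable column is deliberately treated the same as having no key at all, because that is what the redo showed in the test.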

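When a table has many columns and neither a primary key nor a unique index combined with a NOT NULL constraint, enabling supplemental logging can greatly increase the total volume of redo.

First, create a table with 250 columns: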

Drop table test;

create table test (

t1 varchar2(5),

t2 varchar2(5),

t3 varchar2(5),

t4 varchar2(5),  …t250 varchar2(5))

insert into test values ('TEST','TEST' ……);   commit; -- populate all 250 columns

alter database drop supplemental log data (primary key,unique index) columns;  -- disable supplemental logging

set autotrace on;

update test set t2='BZZZZ' where t1='TEST'; commit;

The autotrace statistics show that this update generated 516 bytes of redo.

alter database add supplemental log data (primary key,unique index) columns;  -- re-enable supplemental logging

update test set t2='FSDSD' where t1='TEST';

This time the autotrace statistics show 3044 bytes of redo.
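To put the two autotrace figures side by side, here is a minimal Python calculation. The helper name `redo_growth` is hypothetical; the only inputs are the two redo sizes reported above (516 and 3044).

```python
def redo_growth(before, after):
    """Return (absolute increase, growth factor) between two redo sizes."""
    return after - before, after / before


extra, factor = redo_growth(516, 3044)
print(f"extra redo per update: {extra}, growth factor: {factor:.1f}x")
# prints: extra redo per update: 2528, growth factor: 5.9x
```

So for this 250-column table without a usable key, the same single-column update writes roughly six times as much redo once supplemental logging is on.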

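Supplemental logging can also be divided by scope into database-level and table-level. Table-level supplemental logging is either conditional or unconditional. Conditional table-level supplemental logging takes effect only when particular columns are updated; it is rarely used, so we do not discuss it here.

Let us look at how unconditional table-level supplemental logging behaves: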

alter database drop supplemental log data (primary key,unique index) columns;

alter table test add supplemental log data (primary key,unique index) columns;

update test set t2='ZZZZZ'; commit;

Viewing the SQL in the redo with LogMiner:

update "SYS"."TEST" set "T2" = 'ZZZZZ' where "T1" = 'TEST' and "T2" = 'AAAAA' and "T3" = 'TEST'………

The WHERE clause contains every column value.

delete test; commit;

Viewing the SQL in the redo with LogMiner:

delete from "SYS"."TEST" where "T1" = 'TEST' and "T2" = 'ZZZZZ' and "T3" = 'TEST' and "T4" = 'TEST' and "T5" ……

The DELETE likewise has every column value in its WHERE clause.

We can also create a supplemental log group on specific columns to reduce the number of column values that appear in the WHERE clause.

alter table test drop supplemental log data (primary key,unique index) columns;  -- drop the existing table-level supplemental logging

alter table test add supplemental log group test_lgp (t1 ,t2,t3,t4,t5,t6,t12,t250) always; -- create the supplemental log group

update test set t2='XXXXX' ; commit;

Viewing the SQL in the redo with LogMiner:

update "SYS"."TEST" set "T2" = 'XXXXX' where "T1" = 'TEST' and "T2" = 'TEST' and "T3" = 'TEST' and "T4" = 'TEST' and "T5" = 'TEST' and "T6" = 'TEST' and "T12" = 'TEST' and "T250" = 'TEST' and ROWID = 'AAAMieAABAAAOhnAAA';

As shown above, the redo for the UPDATE now records only the columns named in the log group.

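delete test;

Viewing the SQL in the redo with LogMiner:

delete from "SYS"."TEST" where "T1" = 'TEST' and "T2" = 'XXXXX' and "T3" = 'TEST' ……

The DELETE still keeps every column value in the redo.

For tables with many columns, when a combination of columns is guaranteed to be unique and non-null (that is, a primary key in application terms), we can define a supplemental log group on those columns to reduce the redo generated by UPDATE operations. DELETE operations, however, cannot be improved this way.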
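The effect of the ALWAYS log group can be modelled in a few lines. This Python sketch only mirrors what the LogMiner output in this test showed; the function `logged_columns` is hypothetical, and the column list is rebuilt from the `t1 … t250` naming used above.

```python
def logged_columns(all_columns, log_group, operation, updated_columns=()):
    """Model the columns seen in the LogMiner WHERE clause.

    Assumption: the table has no primary key and only an ALWAYS
    (unconditional) supplemental log group, as in the test above.
    """
    if operation == "DELETE":
        # The delete in the experiment still showed every column.
        return list(all_columns)
    if operation == "UPDATE":
        # Group columns are always logged; updated columns keep their
        # before-image as well.
        wanted = set(log_group) | set(updated_columns)
        return [c for c in all_columns if c in wanted]
    raise ValueError(f"unsupported operation: {operation}")


columns = [f"T{i}" for i in range(1, 251)]          # T1 .. T250
group = ["T1", "T2", "T3", "T4", "T5", "T6", "T12", "T250"]

print(len(logged_columns(columns, group, "UPDATE", ["T2"])))  # 8
print(len(logged_columns(columns, group, "DELETE")))          # 250
```

The model makes the trade-off visible: the log group shrinks the UPDATE record from 250 columns to 8, while the DELETE record is unchanged.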

PMON: TERMINATING INSTANCE DUE TO ERROR 600 on 8i

Alert logfile reported as below:

*********************
Wed May 27 13:11:47 2009
Errors in file /u01/app/oracle/admin/proa021/udump/proa021_ora_9533.trc:
ORA-07445: exception encountered: core dump [memset()+116] [SIGSEGV] [Address not mapped to object] [0] [] []
From Trace file
********************
Dump file /u01/app/oracle/admin/proa021/udump/proa021_ora_9533.trc
Oracle8i Enterprise Edition Release 8.1.7.4.0 - Production
With the Partitioning option
JServer Release 8.1.7.4.0 - Production
ORACLE_HOME = /u01/app/oracle/product/817proa021
System name:	SunOS
Node name:	v08k01
Release:	5.8
Version:	Generic_117350-38
Machine:	sun4u
Instance name: proa021
Redo thread mounted by this instance: 1
Process Info
******************
Oracle process number: 117
Unix process pid: 9533, image: oracle@v08k01 (TNS V1-V3)
Error
*********
2009-05-27 13:11:47.847
ksedmp: internal or fatal error
ORA-07445: exception encountered: core dump [memset()+116] [SIGSEGV] [Address not mapped to object] [0] [] []
Current SQL(Current SQL statement for this session)
***********************************************************************
:
SELECT COUNT(PO_LINE_ID) FROM PO_LINES_INTERFACE WHERE PO_HEADER_ID = :b1
Call Stack functions
*************************
ksedmp <- ssexhd <- sigacthandler <- memset
#####################################################################################
From Alert logfile
*********************
Wed May 27 13:18:39 2009
Errors in file /u01/app/oracle/admin/proa021/bdump/proa021_pmon_9584.trc:
ORA-00600: internal error code, arguments: [1115], [], [], [], [], [], [], []
Wed May 27 13:18:56 2009
Errors in file /u01/app/oracle/admin/proa021/bdump/proa021_pmon_9584.trc:
ORA-00600: internal error code, arguments: [1115], [], [], [], [], [], [], []
From Tracefile
*******************
Dump file /u01/app/oracle/admin/proa021/bdump/proa021_pmon_9584.trc
Oracle8i Enterprise Edition Release 8.1.7.4.0 - Production
With the Partitioning option
JServer Release 8.1.7.4.0 - Production
ORACLE_HOME = /u01/app/oracle/product/817proa021
System name:	SunOS
Node name:	v08k01
Release:	5.8
Version:	Generic_117350-38
Machine:	sun4u
Instance name: proa021
Redo thread mounted by this instance: 1
Process Info
****************
Oracle process number: 2
Unix process pid: 9584, image: oracle@v08k01 (PMON)
Error
********
2009-05-27 13:18:39.766
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [1115], [], [], [], [], [], [], []
Call Stack Functions:
****************************
ksedmp <- kgeriv <- kgesiv <- ksesic0 <- kssdch
<- ksuxds <- kssxdl <- kssdch <- ksudlp <- kssxdl
<- ksuxdl <- ksuxda <- ksucln <- ksbrdp <- opirip
<- opidrv <- sou2o <- main <- start
CURRENT SESSION'S INSTANTIATION STATE
*********************************************************
current session=8c8fdfbc
---- Cursor Dump ------
Current cursor: 0, pgadep: 0
Cursor Dump:
End of cursor dump
END OF PROCESS STATE
******************** Cursor Dump ************************
Current cursor: 0, pgadep: 0
Cursor Dump:
End of cursor dump
ksedmp: no current context area

ERROR: ORA-600 [1115]

VERSIONS: versions 6.0 to 10.1

DESCRIPTION: We are encountering a problem while cleaning up a state object.

The State Object is already on free list or has the wrong parent State Object.

FUNCTIONALITY: Kernel Service State object manager

IMPACT:
POSSIBLE INSTANCE FAILURE
PROCESS FAILURE
NON CORRUPTIVE - No underlying data corruption.

SUGGESTIONS: This error may be reported as a direct result of another earlier problem.

Several related bugs have been reported:

Bug 3837965 : Abstract: ORA-7445'S AND 600'S LEADING UP TO DB CRASH
Comp Version: 8.1.7.4.0
Fixed In Version: 9.2.0.
-------------------------------------------------------------

Bug 3134843 : Abstract: ORACLE PROCESSES CRASHING WITH ORA-7445 SEGVIO ON A NUMBER OF DATABASES
Comp Version: 8.1.7.4
Status: Closed, could not be reproduced
----------------------------------------------------------------

Bug 2760836: Abstract: PMON cleanup of dead shared servers/dispatchers can crash instance(OERI:26599 / OERI 1115)

--------------------------------------------------------------
Note 2760836.8 PMON cleanup of dead shared servers/dispatchers can crash instance (OERI 26599 / OERI 1115)
----------------------------------------------------------------

PROPOSED SOLUTION JUSTIFICATION(S)
==================================
1. The one-off patch for Bug 2760836 fixes this issue; once the customer applies it, the issue is resolved.

OR

2. The issue is fixed in 9.2.0.4 and later versions; upgrading to at least 9.2.0.4 resolves it.

The solution is justified by the following:

Note 2760836.8 PMON cleanup of dead shared servers/dispatchers can crash instance (OERI 26599 / OERI 1115)

Network Interface No Longer Operational?

On Oracle databases running on Solaris, the alert log occasionally contains "Network Interface No Longer Operational" entries such as:

ospid 11223: network interface with IP address 192.4.1.22 no longer operational
requested interface 192.4.1.22 ioctl get mtu. Check output from ifconfig command

This message is usually caused by Solaris operating system Bug 6546482 and can generally be ignored.

Script to Detect Tablespace Fragmentation

create table SPACE_TEMP (
TABLESPACE_NAME        CHAR(30),
CONTIGUOUS_BYTES       NUMBER)
/
declare
  -- Walk the free extents of each data file in block order.
  cursor query is select *
    from dba_free_space
    order by tablespace_name, file_id, block_id;
  this_row        query%rowtype;
  previous_row    query%rowtype;
  total           number;
begin
  open query;
  fetch query into this_row;
  previous_row := this_row;
  total := previous_row.bytes;
  loop
    fetch query into this_row;
    exit when query%notfound;
    -- Extents are contiguous only within the same tablespace and data file.
    if this_row.tablespace_name = previous_row.tablespace_name
       and this_row.file_id = previous_row.file_id
       and this_row.block_id = previous_row.block_id + previous_row.blocks then
      total := total + this_row.bytes;
      -- Placeholder row (NULL bytes) so that count(*) still counts this extent.
      insert into SPACE_TEMP (tablespace_name)
        values (previous_row.tablespace_name);
    else
      insert into SPACE_TEMP values (previous_row.tablespace_name, total);
      total := this_row.bytes;
    end if;
    previous_row := this_row;
  end loop;
  insert into SPACE_TEMP values (previous_row.tablespace_name, total);
  close query;
end;
.
/
set pagesize 60
set newpage 0
set echo off
ttitle center 'Contiguous Extents Report'  skip 3
break on "TABLESPACE NAME" skip page duplicate
spool contig_free_space.lis
rem
column "CONTIGUOUS BYTES"       format 999,999,999,999
column "COUNT"                  format 999
column "TOTAL BYTES"            format 999,999,999,999
column "TODAY"   noprint new_value new_today format a1
rem
select TABLESPACE_NAME  "TABLESPACE NAME",
CONTIGUOUS_BYTES "CONTIGUOUS BYTES"
from SPACE_TEMP
where CONTIGUOUS_BYTES is not null
order by TABLESPACE_NAME, CONTIGUOUS_BYTES desc;
select tablespace_name, count(*) "# OF EXTENTS",
sum(contiguous_bytes) "TOTAL BYTES"
from space_temp
group by tablespace_name;
spool off
drop table SPACE_TEMP
/
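
The coalescing logic in the PL/SQL block can be hard to follow in cursor form. The Python sketch below shows the same idea on in-memory rows: sort the free extents, then merge each extent into the running chunk when it starts exactly where the previous one ended. The function name and the sample rows are hypothetical. As a design choice of this sketch, extents in different data files are never merged, which is why `file_id` is part of the sort and grouping key.

```python
from itertools import groupby


def contiguous_free_chunks(extents):
    """Merge adjacent free extents into contiguous chunks.

    `extents` is an iterable of (tablespace_name, file_id, block_id, blocks,
    bytes) tuples, i.e. the DBA_FREE_SPACE columns the script reads.
    Returns a list of (tablespace_name, contiguous_bytes) tuples.
    """
    chunks = []
    ordered = sorted(extents, key=lambda e: (e[0], e[1], e[2]))
    # Extents can only be adjacent inside the same tablespace and data file.
    for (tablespace, _file_id), group in groupby(ordered, key=lambda e: (e[0], e[1])):
        total = 0
        next_block = None
        for _, _, block_id, blocks, nbytes in group:
            if next_block is not None and block_id != next_block:
                # Gap found: close the current chunk and start a new one.
                chunks.append((tablespace, total))
                total = 0
            total += nbytes
            next_block = block_id + blocks
        chunks.append((tablespace, total))
    return chunks


# Hypothetical sample rows with an 8 KB block size.
sample = [
    ("USERS", 4, 128, 8, 65536),
    ("USERS", 4, 136, 8, 65536),    # starts where the previous extent ends
    ("USERS", 4, 200, 16, 131072),  # gap before this extent
    ("SYSTEM", 1, 500, 48, 393216),
]
for name, size in contiguous_free_chunks(sample):
    print(f"{name:<10} {size:>12,}")
```

In the sample, the first two USERS extents merge into one 131,072-byte chunk, and the third stays separate because of the gap at block 144.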

example output:

SQL> @TFSTSFRM
Table created.
PL/SQL procedure successfully completed.
Contiguous Extents Report
TABLESPACE NAME                CONTIGUOUS BYTES
------------------------------ ----------------
EXAMPLE                              32,768,000
Contiguous Extents Report
TABLESPACE NAME                CONTIGUOUS BYTES
------------------------------ ----------------
SYSAUX                                3,211,264
Contiguous Extents Report
TABLESPACE NAME                CONTIGUOUS BYTES
------------------------------ ----------------
SYSTEM                              371,130,368
SYSTEM                                  393,216
Contiguous Extents Report
TABLESPACE NAME                CONTIGUOUS BYTES
------------------------------ ----------------
UNDOTBS1                             13,500,416
UNDOTBS1                                524,288
UNDOTBS1                                458,752
UNDOTBS1                                458,752
UNDOTBS1                                327,680
UNDOTBS1                                262,144
UNDOTBS1                                196,608
UNDOTBS1                                131,072
UNDOTBS1                                131,072
UNDOTBS1                                131,072
UNDOTBS1                                 65,536
UNDOTBS1                                 65,536
UNDOTBS1                                 65,536
UNDOTBS1                                 65,536
UNDOTBS1                                 65,536
UNDOTBS1                                 65,536
UNDOTBS1                                 65,536
Contiguous Extents Report
TABLESPACE NAME                CONTIGUOUS BYTES
------------------------------ ----------------
USERS                            10,995,367,936
USERS                                 1,048,576
USERS                                   393,216
USERS                                   262,144
USERS                                   196,608
26 rows selected.
Contiguous Extents Report
TABLESPACE_NAME                # OF EXTENTS      TOTAL BYTES
------------------------------ ------------ ----------------
EXAMPLE                                   1       32,768,000
UNDOTBS1                                 17       16,580,608
USERS                                     7   10,997,268,480
SYSAUX                                    1        3,211,264
SYSTEM                                    2      371,523,584
Table dropped.

EVENT: 10231 "skip corrupted blocks on _table_scans_"

Event: 10231
Text:  skip corrupted blocks on _table_scans_
-------------------------------------------------------------------------------
Cause:
Action: Corrupt blocks are skipped in table scans, and listed in trace files.
Explanation:
This is NOT an error but is a special EVENT code.
It should *NOT* be used unless explicitly requested by ST support.
8.1 onwards:
~~~~~~~~~~~~
The "7.2 onwards" notes below still apply but in Oracle8i
there is a PL/SQL <Package:DBMS_REPAIR> which can be used
to check corrupt blocks.  See <DocIndex:DBMS_REPAIR>.
It is possible to simulate 10231 on a table using
DBMS_REPAIR.SKIP_CORRUPT_BLOCKS('schema','table').
The SKIP_CORRUPT column of DBA_TABLES shows tables which
have been marked to allow skipping of corrupt blocks.
7.2 onwards:
~~~~~~~~~~~~
Event 10231 causes SOFTWARE CORRUPT or MEDIA corrupt blocks
to be skipped on FULL TABLE SCANS only.  (E.g: on export)
Software corrupt blocks are defined below.  Media corrupt
blocks are Oracle blocks where the header field information
is not what was expected.  These can now be skipped with
the 10231 event.
Before 7.2:
~~~~~~~~~~~
Event 10231 causes SOFTWARE CORRUPT blocks to be skipped on
FULL TABLE SCANS only.  (E.g: on export).
A 'software corrupt' block is a block that has a SEQ number of ZERO.
This raises an ORA-1578 error.
NB: Blocks may be internally corrupt and still cause problems or
raise ORA-1578.  If a block is physically corrupt and the SEQ
is not set to ZERO, you cannot use 10231 to skip it.  You have
to try to scan around the block instead.
To manually corrupt a block and cause it to be skipped you
must: Set SEQ to ZERO.
Set the INCSEQ at the end of the block to match.
You can set event numbers 10210, 10211, and 10212 to check blocks
at the data level and mark them software corrupt if they are found
to be corrupt.  You CANNOT use these events to mark a physically
corrupt block as software corrupt because the block never reaches
the data layer.
When a block is skipped, any data in the block is totally ignored.
Usage:  Event="10231 trace name context forever, level 10".
This should be removed from the instance parameters immediately after
it has been used.
Alternatively it can be set at session level:
alter session set events '10231 trace name context forever, level 10'
@Articles:
@       Customer FAX Explaining How to Use Event 10231	 Note 33405.1
@       Data, Index & Cluster Block  <Event:10210><Event:10211><Event:10212>
@	Skip Blocks on Index Range Scan			 <Event:10233>
@	Physical Oracle Data Block Layout		 Note 33242.1

Script to Collect RAC Diagnostic Information (racdiag.sql)

Script:

-- NAME: RACDIAG.SQL
-- SYS OR INTERNAL USER, CATPARR.SQL ALREADY RUN, PARALLEL QUERY OPTION ON
-- ------------------------------------------------------------------------
-- AUTHOR:
-- Michael Polaski - Oracle Support Services
-- Copyright 2002, Oracle Corporation
-- ------------------------------------------------------------------------
-- PURPOSE:
-- This script is intended to provide a user friendly guide to troubleshoot
-- RAC hung sessions or slow performance scenarios. The script includes
-- information to gather a variety of important debug information to determine
-- the cause of a RAC session level hang. The script will create a file
-- called racdiag_<dbname>_<timestamp>.out in your local directory while
-- dumping hang analyze dumps in the user_dump_dest(s) and
-- background_dump_dest(s) on all nodes.
--
-- ------------------------------------------------------------------------
-- DISCLAIMER:
-- This script is provided for educational purposes only. It is NOT
-- supported by Oracle World Wide Technical Support.
-- The script has been tested and appears to work as intended.
-- You should always run new scripts on a test instance initially.
-- ------------------------------------------------------------------------
-- Script output is as follows:
set echo off
set feedback off
column timecol new_value timestamp
column spool_extension new_value suffix
select to_char(sysdate,'Mondd_hhmi') timecol,
'.out' spool_extension from sys.dual;
column output new_value dbname
select value || '_' output
from v$parameter where name = 'db_name';
spool racdiag_&&dbname&&timestamp&&suffix
set lines 200
set pagesize 35
set trim on
set trims on
alter session set nls_date_format = 'MON-DD-YYYY HH24:MI:SS';
alter session set timed_statistics = true;
set feedback on
select to_char(sysdate) time from dual;
set numwidth 5
column host_name format a20 tru
select inst_id, instance_name, host_name, version, status, startup_time
from gv$instance
order by inst_id;
set echo on
-- Taking Hang Analyze dumps
-- This may take a little while...
oradebug setmypid
oradebug unlimit
oradebug -g all hanganalyze 3
-- This part may take the longest, you can monitor bdump or udump to see if
-- the file is being generated.
oradebug -g all dump systemstate 267
-- WAITING SESSIONS:
-- The entries that are shown at the top are the sessions that have
-- waited the longest amount of time that are waiting for non-idle wait
-- events (event column). You can research and find out what the wait
-- event indicates (along with its parameters) by checking the Oracle
-- Server Reference Manual or look for any known issues or documentation
-- by searching Metalink for the event name in the search bar. Example
-- (include single quotes): [ 'buffer busy due to global cache' ].
-- Metalink and/or the Server Reference Manual should return some useful
-- information on each type of wait event. The inst_id column shows the
-- instance where the session resides and the SID is the unique identifier
-- for the session (gv$session). The p1, p2, and p3 columns will show
-- event specific information that may be important to debug the problem.
-- To find out what the p1, p2, and p3 indicates see the next section.
-- Items with wait_time of anything other than 0 indicate we do not know
-- how long these sessions have been waiting.
--
set numwidth 10
column state format a7 tru
column event format a25 tru
column last_sql format a40 tru
select sw.inst_id, sw.sid, sw.state, sw.event, sw.seconds_in_wait seconds,
sw.p1, sw.p2, sw.p3, sa.sql_text last_sql
from gv$session_wait sw, gv$session s, gv$sqlarea sa
where sw.event not in
('rdbms ipc message','smon timer','pmon timer',
'SQL*Net message from client','lock manager wait for remote message',
'ges remote message', 'gcs remote message', 'gcs for action', 'client message',
'pipe get', 'null event', 'PX Idle Wait', 'single-task message',
'PX Deq: Execution Msg', 'KXFQ: kxfqdeq - normal deqeue',
'listen endpoint status','slave wait','wakeup time manager')
and sw.seconds_in_wait > 0
and (sw.inst_id = s.inst_id and sw.sid = s.sid)
and (s.inst_id = sa.inst_id and s.sql_address = sa.address)
order by seconds desc;
-- EVENT PARAMETER LOOKUP:
-- This section will give a description of the parameter names of the
-- events seen in the last section. p1text describes the p1 value shown
-- in the WAITING SESSIONS section, while p2text describes the p2 value
-- and p3text describes the p3 value. The
-- parameter values in the first section can be helpful for debugging
-- the wait event.
--
column event format a30 tru
column p1text format a25 tru
column p2text format a25 tru
column p3text format a25 tru
select distinct event, p1text, p2text, p3text
from gv$session_wait sw
where sw.event not in ('rdbms ipc message','smon timer','pmon timer',
'SQL*Net message from client','lock manager wait for remote message',
'ges remote message', 'gcs remote message', 'gcs for action', 'client message',
'pipe get', 'null event', 'PX Idle Wait', 'single-task message',
'PX Deq: Execution Msg', 'KXFQ: kxfqdeq - normal deqeue',
'listen endpoint status','slave wait','wakeup time manager')
and seconds_in_wait > 0
order by event;
-- GES LOCK BLOCKERS:
-- This section will show us any sessions that are holding locks that
-- are blocking other users. The inst_id will show us the instance that
-- the session resides on while the sid will be a unique identifier for
-- the session. The grant_level will show us how the GES lock is granted to
-- the user. The request_level will show us what status we are trying to
-- obtain.  The lockstate column will show us what status the lock is in.
-- The last column shows how long this session has been waiting.
--
set numwidth 5
column state format a16 tru;
column event format a30 tru;
select dl.inst_id, s.sid, p.spid, dl.resource_name1,
decode(substr(dl.grant_level,1,8),'KJUSERNL','Null','KJUSERCR','Row-S (SS)',
'KJUSERCW','Row-X (SX)','KJUSERPR','Share','KJUSERPW','S/Row-X (SSX)',
'KJUSEREX','Exclusive',dl.grant_level) as grant_level,
decode(substr(dl.request_level,1,8),'KJUSERNL','Null','KJUSERCR','Row-S (SS)',
'KJUSERCW','Row-X (SX)','KJUSERPR','Share','KJUSERPW','S/Row-X (SSX)',
'KJUSEREX','Exclusive',request_level) as request_level,
decode(substr(dl.state,1,8),'KJUSERGR','Granted','KJUSEROP','Opening',
'KJUSERCA','Canceling','KJUSERCV','Converting') as state,
s.sid, sw.event, sw.seconds_in_wait sec
from gv$ges_enqueue dl, gv$process p, gv$session s, gv$session_wait sw
where blocker = 1
and (dl.inst_id = p.inst_id and dl.pid = p.spid)
and (p.inst_id = s.inst_id and p.addr = s.paddr)
and (s.inst_id = sw.inst_id and s.sid = sw.sid)
order by sw.seconds_in_wait desc;
-- GES LOCK WAITERS:
-- This section will show us any sessions that are waiting for locks that
-- are blocked by other users. The inst_id will show us the instance that
-- the session resides on while the sid will be a unique identifier for
-- the session. The grant_level will show us how the GES lock is granted to
-- the user. The request_level will show us what status we are trying to
-- obtain.  The lockstate column will show us what status the lock is in.
-- The last column shows how long this session has been waiting.
--
set numwidth 5
column state format a16 tru;
column event format a30 tru;
select dl.inst_id, s.sid, p.spid, dl.resource_name1,
decode(substr(dl.grant_level,1,8),'KJUSERNL','Null','KJUSERCR','Row-S (SS)',
'KJUSERCW','Row-X (SX)','KJUSERPR','Share','KJUSERPW','S/Row-X (SSX)',
'KJUSEREX','Exclusive',dl.grant_level) as grant_level,
decode(substr(dl.request_level,1,8),'KJUSERNL','Null','KJUSERCR','Row-S (SS)',
'KJUSERCW','Row-X (SX)','KJUSERPR','Share','KJUSERPW','S/Row-X (SSX)',
'KJUSEREX','Exclusive',request_level) as request_level,
decode(substr(dl.state,1,8),'KJUSERGR','Granted','KJUSEROP','Opening',
'KJUSERCA','Cancelling','KJUSERCV','Converting') as state,
s.sid, sw.event, sw.seconds_in_wait sec
from gv$ges_enqueue dl, gv$process p, gv$session s, gv$session_wait sw
where blocked = 1
and (dl.inst_id = p.inst_id and dl.pid = p.spid)
and (p.inst_id = s.inst_id and p.addr = s.paddr)
and (s.inst_id = sw.inst_id and s.sid = sw.sid)
order by sw.seconds_in_wait desc;
-- LOCAL ENQUEUES:
-- This section will show us if there are any local enqueues. The inst_id will
-- show us the instance that the session resides on while the sid will be a
-- unique identifier for the session. The addr column will show the lock address. The type
-- will show the lock type. The id1 and id2 columns will show specific
-- parameters for the lock type.
--
set numwidth 12
column event format a12 tru
select l.inst_id, l.sid, l.addr, l.type, l.id1, l.id2,
decode(l.block,0,'blocked',1,'blocking',2,'global') block,
sw.event, sw.seconds_in_wait sec
from gv$lock l, gv$session_wait sw
where (l.sid = sw.sid and l.inst_id = sw.inst_id)
and l.block in (0,1)
order by l.type, l.inst_id, l.sid;
-- LATCH HOLDERS:
-- If there is latch contention or 'latch free' wait events in the WAITING
-- SESSIONS section we will need to find out which processes are holding
-- latches. The inst_id will show us the instance that the session resides
-- on while the sid will be a unique identifier for the session. The username column
-- will show the session's username. The os_user column will show the os
-- user that the user logged in as. The name column will show us the type
-- of latch being waited on. You can search Metalink for the latch name in
-- the search bar. Example (include single quotes):
-- [ 'library cache' latch ]. Metalink should return some useful information
-- on the type of latch.
--
set numwidth 5
select distinct lh.inst_id, s.sid, s.username, p.username os_user, lh.name
from gv$latchholder lh, gv$session s, gv$process p
where (lh.sid = s.sid and lh.inst_id = s.inst_id)
and (s.inst_id = p.inst_id and s.paddr = p.addr)
order by lh.inst_id, s.sid;
-- LATCH STATS:
-- This view will show us latches with less than optimal hit ratios
-- The inst_id will show us the instance for the particular latch. The
-- latch_name column will show us the type of latch. You can search Metalink
-- for the latch name in the search bar. Example (include single quotes):
-- [ 'library cache' latch ]. Metalink should return some useful information
-- on the type of latch. The hit_ratio shows the percentage of time we
-- successfully acquired the latch.
--
column latch_name format a30 tru
select inst_id, name latch_name,
round((gets-misses)/decode(gets,0,1,gets),3) hit_ratio,
round(sleeps/decode(misses,0,1,misses),3) "SLEEPS/MISS"
from gv$latch
where round((gets-misses)/decode(gets,0,1,gets),3) < .99
and gets != 0
order by round((gets-misses)/decode(gets,0,1,gets),3);
-- No Wait Latches:
--
select inst_id, name latch_name,
round((immediate_gets/(immediate_gets+immediate_misses)), 3) hit_ratio,
round(sleeps/decode(immediate_misses,0,1,immediate_misses),3) "SLEEPS/MISS"
from gv$latch
where round((immediate_gets/(immediate_gets+immediate_misses)), 3) < .99 and immediate_gets + immediate_misses > 0
order by round((immediate_gets/(immediate_gets+immediate_misses)), 3);
-- GLOBAL CACHE CR PERFORMANCE
-- This shows the average latency of a consistent block request.
-- AVG CR BLOCK RECEIVE TIME, which should typically be about 15 milliseconds
-- depending on your system configuration and volume, is the average
-- latency of a consistent-read request round-trip from the requesting
-- instance to the holding instance and back to the requesting instance. If
-- your CPU has limited idle time and your system typically processes
-- long-running queries, then the latency may be higher. However, it is
-- possible to have an average latency of less than one millisecond with
-- User-mode IPC. Latency can be influenced by a high value for the
-- DB_FILE_MULTIBLOCK_READ_COUNT parameter. This is because a requesting
-- process can issue more than one request for a block depending on the
-- setting of this parameter. Correspondingly, the requesting process may
-- wait longer. Also check interconnect bandwidth, OS tcp settings, and OS
-- udp settings if AVG CR BLOCK RECEIVE TIME is high.
--
set numwidth 20
column "AVG CR BLOCK RECEIVE TIME (ms)" format 9999999.9
select b1.inst_id, b2.value "GCS CR BLOCKS RECEIVED",
b1.value "GCS CR BLOCK RECEIVE TIME",
((b1.value / b2.value) * 10) "AVG CR BLOCK RECEIVE TIME (ms)"
from gv$sysstat b1, gv$sysstat b2
where b1.name = 'global cache cr block receive time' and
b2.name = 'global cache cr blocks received' and b1.inst_id = b2.inst_id
or b1.name = 'gc cr block receive time' and
b2.name = 'gc cr blocks received' and b1.inst_id = b2.inst_id ;
-- GLOBAL CACHE LOCK PERFORMANCE
-- This shows the average global enqueue get time.
-- Typically AVG GLOBAL LOCK GET TIME should be 20-30 milliseconds. The
-- elapsed time for a get includes the allocation and initialization of a
-- new global enqueue. If the average global enqueue get (global cache
-- get time) or average global enqueue conversion times are excessive,
-- then your system may be experiencing timeouts. See the 'WAITING SESSIONS',
-- 'GES LOCK BLOCKERS', 'GES LOCK WAITERS', and 'TOP 10 WAIT EVENTS ON SYSTEM'
-- sections if the AVG GLOBAL LOCK GET TIME is high.
--
set numwidth 20
column "AVG GLOBAL LOCK GET TIME (ms)" format 9999999.9
select b1.inst_id, (b1.value + b2.value) "GLOBAL LOCK GETS",
b3.value "GLOBAL LOCK GET TIME",
(b3.value / (b1.value + b2.value) * 10) "AVG GLOBAL LOCK GET TIME (ms)"
from gv$sysstat b1, gv$sysstat b2, gv$sysstat b3
where b1.name = 'global lock sync gets' and
b2.name = 'global lock async gets' and b3.name = 'global lock get time'
and b1.inst_id = b2.inst_id and b2.inst_id = b3.inst_id
or b1.name = 'global enqueue gets sync' and
b2.name = 'global enqueue gets async' and b3.name = 'global enqueue get time'
and b1.inst_id = b2.inst_id and b2.inst_id = b3.inst_id;
-- RESOURCE USAGE
-- This section will show how much of our resources we have used.
--
set numwidth 8
select inst_id, resource_name, current_utilization, max_utilization,
initial_allocation
from gv$resource_limit
where max_utilization > 0
order by inst_id, resource_name;
-- DLM TRAFFIC INFORMATION
-- This section shows how many tickets are available in the DLM. If the
-- TCKT_WAIT column says "YES" then we have run out of DLM tickets which
-- could cause a DLM hang. Make sure that you also have enough TCKT_AVAIL.
--
set numwidth 5
select * from gv$dlm_traffic_controller
order by TCKT_AVAIL;
-- DLM MISC
--
set numwidth 10
select * from gv$dlm_misc;
-- LOCK CONVERSION DETAIL:
-- This view shows the types of lock conversion being done on each instance.
--
select * from gv$lock_activity;
-- TOP 10 WRITE PINGING/FUSION OBJECTS
-- This view shows the top 10 objects for write pings across instances.
-- The inst_id column shows the node that the block was pinged on. The name
-- column shows the object name of the offending object. The file# shows the
-- offending file number (gc_files_to_locks). The STATUS column will show the
-- current status of the pinged block. The READ_PINGS will show us read
-- converts and the WRITE_PINGS will show us objects with write converts.
-- Any rows that show up are objects that are concurrently accessed across
-- more than 1 instance.
--
set numwidth 8
column name format a20 tru
column kind format a10 tru
select inst_id, name, kind, file#, status, BLOCKS,
READ_PINGS, WRITE_PINGS
from (select p.inst_id, p.name, p.kind, p.file#, p.status,
count(p.block#) BLOCKS, sum(p.forced_reads) READ_PINGS,
sum(p.forced_writes) WRITE_PINGS
from gv$ping p, gv$datafile df
where p.file# = df.file# (+)
group by p.inst_id, p.name, p.kind, p.file#, p.status
order by sum(p.forced_writes) desc)
where rownum < 11
order by WRITE_PINGS desc;
-- TOP 10 READ PINGING/FUSION OBJECTS
-- This view shows the top 10 objects for read pings. The inst_id column shows
-- the node that the block was pinged on. The name column shows the object
-- name of the offending object. The file# shows the offending file number
-- (gc_files_to_locks). The STATUS column will show the current status of the
-- pinged block. The READ_PINGS will show us read converts and the WRITE_PINGS
-- will show us objects with write converts. Any rows that show up are objects
-- that are concurrently accessed across more than 1 instance.
--
set numwidth 8
column name format a20 tru
column kind format a10 tru
select inst_id, name, kind, file#, status, BLOCKS,
READ_PINGS, WRITE_PINGS
from (select p.inst_id, p.name, p.kind, p.file#, p.status,
count(p.block#) BLOCKS, sum(p.forced_reads) READ_PINGS,
sum(p.forced_writes) WRITE_PINGS
from gv$ping p, gv$datafile df
where p.file# = df.file# (+)
group by p.inst_id, p.name, p.kind, p.file#, p.status
order by sum(p.forced_reads) desc)
where rownum < 11
order by READ_PINGS desc;
-- TOP 10 FALSE PINGING OBJECTS
-- This view shows the top 10 objects for false pings. This can be avoided by
-- better gc_files_to_locks configuration. The inst_id column shows the node
-- that the block was pinged on. The name column shows the object name of the
-- offending object. The file# shows the offending file number
-- (gc_files_to_locks). The STATUS column will show the current status of the
-- pinged block. The READ_PINGS will show us read converts and the WRITE_PINGS
-- will show us objects with write converts. Any rows that show up are objects
-- that are concurrently accessed across more than 1 instance.
--
set numwidth 8
column name format a20 tru
column kind format a10 tru
select inst_id, name, kind, file#, status, BLOCKS,
READ_PINGS, WRITE_PINGS
from (select p.inst_id, p.name, p.kind, p.file#, p.status,
count(p.block#) BLOCKS, sum(p.forced_reads) READ_PINGS,
sum(p.forced_writes) WRITE_PINGS
from gv$false_ping p, gv$datafile df
where p.file# = df.file# (+)
group by p.inst_id, p.name, p.kind, p.file#, p.status
order by sum(p.forced_writes) desc)
where rownum < 11
order by WRITE_PINGS desc;
-- INITIALIZATION PARAMETERS:
-- Non-default init parameters for each node.
--
set numwidth 5
column name format a30 tru
column value format a50 wra
column description format a60 tru
select inst_id, name, value, description
from gv$parameter
where isdefault = 'FALSE'
order by inst_id, name;
-- TOP 10 WAIT EVENTS ON SYSTEM
-- This view will provide a summary of the top wait events in the db.
--
set numwidth 10
column event format a25 tru
select inst_id, event, time_waited, total_waits, total_timeouts
from (select inst_id, event, time_waited, total_waits, total_timeouts
from gv$system_event where event not in ('rdbms ipc message','smon timer',
'pmon timer', 'SQL*Net message from client','lock manager wait for remote message',
'ges remote message', 'gcs remote message', 'gcs for action', 'client message',
'pipe get', 'null event', 'PX Idle Wait', 'single-task message',
'PX Deq: Execution Msg', 'KXFQ: kxfqdeq - normal deqeue',
'listen endpoint status','slave wait','wakeup time manager')
order by time_waited desc)
where rownum < 11 order by time_waited desc;
-- SESSION/PROCESS REFERENCE:
-- This section is very important for most of the above sections to find out
-- which user/os_user/process is identified to which session/process.
--
set numwidth 7
column event format a30 tru
column program format a25 tru
column username format a15 tru
select p.inst_id, s.sid, s.serial#, p.pid, p.spid, p.program, s.username,
p.username os_user, sw.event, sw.seconds_in_wait sec
from gv$process p, gv$session s, gv$session_wait sw
where (p.inst_id = s.inst_id and p.addr = s.paddr)
and (s.inst_id = sw.inst_id and s.sid = sw.sid)
order by p.inst_id, s.sid;
-- SYSTEM STATISTICS:
-- All System Stats with values of > 0. These can be referenced in the
-- Server Reference Manual
--
set numwidth 5
column name format a60 tru
column value format 9999999999999999999999999
select inst_id, name, value
from gv$sysstat
where value > 0
order by inst_id, name;
-- CURRENT SQL FOR WAITING SESSIONS:
-- Current SQL for any session in the WAITING SESSIONS list
--
set numwidth 5
column sql format a80 wra
select sw.inst_id, sw.sid, sw.seconds_in_wait sec, sa.sql_text sql
from gv$session_wait sw, gv$session s, gv$sqlarea sa
where sw.sid = s.sid (+)
and sw.inst_id = s.inst_id (+)
and s.sql_address = sa.address
and s.inst_id = sa.inst_id
and sw.event not in ('rdbms ipc message','smon timer','pmon timer',
'SQL*Net message from client','lock manager wait for remote message',
'ges remote message', 'gcs remote message', 'gcs for action', 'client message',
'pipe get', 'null event', 'PX Idle Wait', 'single-task message',
'PX Deq: Execution Msg', 'KXFQ: kxfqdeq - normal deqeue',
'listen endpoint status','slave wait','wakeup time manager')
and sw.seconds_in_wait > 0
order by sw.seconds_in_wait desc;
-- Taking Hang Analyze dumps
-- This may take a little while...
oradebug setmypid
oradebug unlimit
oradebug -g all hanganalyze 3
-- This part may take the longest, you can monitor bdump or udump to see
-- if the file is being generated.
oradebug -g all dump systemstate 267
set echo off
select to_char(sysdate) time from dual;
spool off
-- ---------------------------------------------------------------------------
Prompt;
Prompt racdiag output files have been written to:;
Prompt;
host pwd
Prompt alert log and trace files are located in:;
column host_name format a12 tru
column name format a20 tru
column value format a60 tru
select distinct i.host_name, p.name, p.value
from gv$instance i, gv$parameter p
where p.inst_id = i.inst_id (+)
and p.name like '%_dump_dest'
and p.name != 'core_dump_dest';

Sample Output:

TIME
--------------------
AUG-11-2001 12:06:36
1 row selected.
INST_ID INSTANCE_NAME    HOST_NAME            VERSION        STATUS  STARTUP_TIME
------- ---------------- -------------------- -------------- ------- ------------
      1 V9201            opcbsol1             9.2.0.1.0      OPEN    AUG-01-2002
      2 V9202            opcbsol2             9.2.0.1.0      OPEN    JUL-09-2002
2 rows selected.
SQL>
SQL> -- Taking Hanganalyze Dumps
SQL> -- This may take a little while...
SQL> oradebug setmypid
Statement processed.
SQL> oradebug unlimit
Statement processed.
SQL> oradebug setinst all
Statement processed.
SQL> oradebug -g def hanganalyze 3
Hang Analysis in /u02/32bit/app/oracle/admin/V9232/bdump/v92321_diag_29495.trc
SQL>
SQL> -- WAITING SESSIONS:
SQL> -- The entries that are shown at the top are the sessions that have
SQL> -- waited the longest amount of time that are waiting for non-idle wait
SQL> -- events (event column).  You can research and find out what the wait
SQL> -- event indicates (along with its parameters) by checking the Oracle
SQL> -- Server Reference Manual or look for any known issues or documentation
SQL> -- by searching Metalink for the event name in the search bar.  Example
SQL> -- (include single quotes): [ 'buffer busy due to global cache' ].
SQL> -- Metalink and/or the Server Reference Manual should return some useful
SQL> -- information on each type of wait event.  The inst_id column shows the
SQL> -- instance where the session resides and the SID is the unique identifier
SQL> -- for the session (gv$session).  The p1, p2, and p3 columns will show
SQL> -- event specific information that may be important to debug the problem.
SQL> -- To find out what the p1, p2, and p3 indicates see the next section.
SQL> -- Items with wait_time of anything other than 0 indicate we do not know
SQL> -- how long these sessions have been waiting.
SQL> --
