resmgr:internal state change等待

resmgr:internal state change等待

 resmgr:internal state change==> session 对应的resource manager plan正在改变中

Hang or similar symptoms when changing resource manager plans during
 heavy workload activity.
 One or more sessions block on a wait for "resmgr:internal state change"

Enqueue enq:WL WL Enqueue等待事件

该Enqueue enq:WL WL Enqueue 队列等待的资源是用来锁定指定的online redo log在线日志文件,其在下列操作时被使用:

  • 当增加一个新的成员到一个日志组
  • 当从日志组中移出一个日志文件
  • 当在清理日志文件内容时
  • 当重命名一个日志文件
  • 当在归档一个日志文件

其ID1 和 ID2的可能的含义:

Log number (id1) ,零(id2)

Redo thread (id1),log sequence number (id2)

Redo thread (id1),log sequence number (id2) but id1’s high order bit is set (The “1<<16” th bit) when we are performing a logical standby related operation

enq: DX – contention等待事件

等待事件说明

在Oracle中enq: DX 队列锁一般用意保护分布式事务(used to protect distributed transactions),对应的就存在 enq: DX – contention等待事件。

 

Id1 / Id2 含义

id2总是0。id1 代表其希望锁定的记录,所以总是distributed transaction elements队列中的一个条记录数(一个整数),由实例参数”distributed_transactions”决定。

 

50: waiting for 'enq: DX - contention' [Enq DX-00000005-00000000]
 .
 Short stack dump:
 ksdxfstk()+36<-ksdxcb()+2472<-sspuser()+176<-__sighndlr()+12<-call_user_handle
 r()+992<-sigacthandler()+104<-_syscall6()+32<-sskgpwwait()+196<-ksliwat()+1020
 <-kslwaitns_timed()+48<-kskthbwt()+232<-kslwait()+296<-K2GTElock1()+1160<-k2gg
 et()+524<-ksupucg()+3312<-opiodr()+2608<-ttcpip()+1188<-opitsk()+1532<-opiino(
 7/11/13 Bug 14663113 - QUERY OVER DBLINK GENERATING 2 SESSIONS ON REMOTE LOCKING EACH OTHER
 )+1128<-opiodr()+1536<-opidrv()+828<-sou2o()+80<-opimai_real()+124<-main()+152
 <-_start()+380
 .
 waiting for 'enq: DX - contention' wait_time=0, seconds since wait
 started=0
 name|mode=44580006, transaction entry #=5, 0=0
 blocking sess=0x41055c078 seq=15904
 Dumping Session Wait History
 for 'inactive transaction branch' count=1 wait_time=0.672186 sec
 branch#=0, waited=0, =0
 for 'enq: DX - contention' count=1 wait_time=2.929764 sec
 name|mode=44580006, transaction entry #=5, 0=0
 for 'enq: DX - contention' count=1 wait_time=2.929774 sec
 name|mode=44580006, transaction entry #=5, 0=0
 for 'inactive transaction branch' count=1 wait_time=0.976637 sec

 

1.1    DX, Distributed TX

How Many Resources:

1

How Many Locks:

1 per  process that tries to do a distributed transaction.

How Many Users:

1/ instance

Who Uses:

RECO’s

When Used:

Used to ensure that only one process is using a  distributed transaction slot at a time.

Id1, Id2 Combination:

Slot Number (of distributed transaction table), Always 0

Lock Value Block:

Not Used.

Init.ora Parameters:

distributed_transactions. (Distributed Database Option)

Scope:

Local Enqueue.

Deadlock Sensitive:

Yes.

Operation:

Synchronous.

know more about ‘CSS initialization’

know more about ‘CSS initialization’

 

‘CSS initialization’ and other CSS registration events takes 8 seconds.
If the slaves write to disk, then they need to register with CSS. This is unavoidable.
If PCIX is run twice, the first run will do CSS registration. PCIX takes 20 seconds.
The second run will not do CSS registration, PCIX takes 12 seconds.
So we estimate CSS registration takes 8 seconds
The resolution is to run PCIX twice, and use second run’s results.

 

Customer is facing the DB performance slow issues.database response is very
slow due to which many of online transactions are declining & high CPU
utilization the server .
Experiencing a high number of CSS initialization and CSS operation action
wait events.even after applying the patch 11069614

 

Excessive CSS registrations were caused by SELECT statements that were doinghash joins, running out of PGA memory and spilling to disk (direct write totemp), thus needing to register with CSS. This is expected behavior.

Know more about “os thread startup”

Know more about “os thread startup”

 

‘os thread startup’ takes significant amount of time in ‘create index parallel’.
All slaves are allocated one by one in serial.
SQL tracing on foreground, there is one ‘os thread startup’ wait per slave, each wait takes 100ms. –> May
need investigation
When there are 512 slaves, ‘os thread startup’ wait take 50 seconds before the slaves start to do any job.
Resolution is to set *.parallel_min_servers=512 to pre-allocated 512 slaves per instance duirng instance
startup, or to run PCIX twice and ignore the first run

 

 

【Oracle等待事件】Transaction等待

【Oracle等待事件】Transaction等待

 

注意如果 不当设置了例如10513 level 2的等待事件,那么可能造成该Transaction等待,原因是一些实例CRASH所产生的死事务,将不会被SMON主动回滚,这可能造成部分行数据始终被锁定 而不释放。

 

Definition:     Wait for a blocking transaction to be rolled back. Continue
waiting until the transaction has been rolled back.

Wait Time:      Wait for up to 1 second for a blocking transaction to be
rolled back. Continue waiting until transaction has been
rolled out.

Parameters:
P1 – Undo Segment No | Slot No within Undo header.
P2 – Wrap Sequence [ max value is UB4MAXVAL ].
P3 – Count

undo seg#
SELECT * FROM v$rollstat WHERE usn = <undo seg#>

slot#   This is the slot# within the transaction table that is being
used to store a transaction.

wrap#   Wrap or sequence number of the slot. For each new transaction
this number increases.

count   Number of times that Oracle7 has waited on this transaction.

Commentary:     Used within kctwit() [ General TX wait routine . ]

Once we have acquired the TX enqueue of the transaction we
are waiting for , we must continue waiting until the blocking
transaction has been committed/rolled back. If the transaction
has been aborted, then we are waiting for PMON or SMON
to complete the rollback (We may be waiting for SMON if a
rollback segment needs recovery).

This wait is interleaved with “undo segment recovery”, because
on each ‘loop’ we call ktucun().
Whether we see this event depends on the availability of the
failed transactions enqueue. This wait may be more prevalent in
a 2pc environment.

Dependency:     We are dependent on the transaction being rolled out, but we
could be waiting for either the local PMON, or a PMON from
another instance [parallel server].

The Undo Segment No will tell us which instance we are waiting
for , but at present this will remain an eyeball operation
rather than automating it within ph2.

Validity:    N/A

Advise:         The waiting basically depends on the size of that transaction
that is being rolled back. You may want to check if PMON / SMON
is doing the rollback for a killed session and if so check
Parameter:CLEANUP_ROLLBACK_ENTRIES.

From 7.3 onwards View:X$KTUXE can be used to see the
state of a transaction – it shows the transaction table
entries in the rollback segment header.

Related:        Lock:TX
Note:34540.1 Enqueue Metrics.
Bug:960962   Oracle8 OPS issue which can cause OPS
deferred transaction recovery to take a long
time, hence causing ‘transaction’ waits.

 

enq: TA – contention 等待事件

enq: TA – contention 等待事件

 

This enqueue is used when undo tablespace operations are being performed. Some examples of such
operations are:
When dropping an undo tablespace we acquire the enqueue in exclsuive mode to stop other sessions
using the undo tablespace at that point. If we cannot acquire the enqueue because the undo tablespace
is in use then we return an error.
When onlining an undo segment.
Undo segment recovery.
Instance shutdown.

 

The id1 value records the operation in use and maps to the following values under 11.2 (beta):
KTU_INSTANCE_SEG1 (1) – instance lock on undo segment lock 1
KTU_INSTANCE_IUP (2) – instance lock on instance existence
KTU_INSTANCE_SEG2 (3) – instance lock on undo segment lock 2
KTU_INSTANCE_EXC (4) – mutual-exclusive lock on starting up
KTU_XA_LOCKS (5) – transaction enqueue
KTU_INSTANCE_UT (6) – instance lock on undo tablespace
KTU_SWITCH_UT (7) – switch (& recovery of) undo tbs operation
KTU_INSTANCE_INIT (8) – undo segment initialisation (during start of cache recovery)
Id2 is typically used to identify the undo segment but for some operations it is just hard-coded as 0.

 

 

 

latch: undo global data Oracle等待

latch: undo global data Oracle等待:

This latch is taken out when we need to search or manipulate the array of rollback (undo) segments used by
the instance (this is pointed to by the fixed SGA variable “ktugd” of type “struct ktugt”).
We need to search / manipulate the array to locate the rollback segment for such operations as SET
TRANSACTION USE ROLLBACK SEGMENT, shutdown, any ALTER ROLLBACK SEGMENT
operation, binding a rollback segment to a transaction etc.
Reducing Contention
1. General latch contention issues.
2. This latch is acquired in many places throughout the code and hence a good first step is to determine
exactly where in the code you are seeing most of the contention. Repeated selecting from
v$latch_misses for this latch and noting the increments may reveal where the contention lies and give a
clue as to how best to avoid it.

latch: undo global data在11g上是非共享的,有多个child的latch

ktusm_stealext_2
ktusmupst: KSLBEGIN
ktusm_stealext: KSLBEGIN
ktuGetRetentionDuration
ktusmasp: ktugd_tuux
ktudba: KSLBEGIN

深入了解db file parallel read等待事件

我们平日所常见的User I/O类物理读等待事件包括db file sequential read和db file scattered read,相信细心的同学也见过db file parallel read,我们今天来探究一下该db file parallel read(db list of blocks read)等待事件的特质。

 

 

SQL> select * from v$version where rownum=1;

BANNER                                                                               CON_ID
-------------------------------------------------------------------------------- ----------
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production              0
www.askmaclean.com

使用目前最新的12c R1 12.1.0.1版本测试

alter session set events '10046 trace name context forever,level 12';

set autotrace traceonly;

alter system flush buffer_cache;               ==》保证物理读

alter session set "_optimizer_batch_table_access_by_rowid"=true;
// BATCH IO一般都会用到db file parallel read

oracle@localhost:~$ strace -o read.log -p 16410 -t -s 200  

在实际操作前使用Strace对IO CALL做TRACE

 select /*+ index(sample ind_t2) */ * from sample where t2 between 1 and 999997;

 

 

 

我们可以从10046 TRACE中获取如下信息, 接着我们到Strace日志中去找对应的SYSCALL:

 

 

1:WAIT #140194991664112: nam='db file sequential read' ela= 12 file#=6 block#=1553 blocks=1 obj#=92176 tim=33606113133
2:WAIT #140194991664112: nam='db file sequential read' ela= 13 file#=6 block#=9324 blocks=1 obj#=92176 tim=33606113191
3:WAIT #140194991664112: nam='db file parallel read' ela= 236 files=1 blocks=13 requests=13 obj#=92176 tim=33606114196
4:WAIT #140194991664112: nam='db file sequential read' ela= 20 file#=6 block#=2192 blocks=1 obj#=92176 tim=33606114373

 

 

我们把10046 trace和 Strace对应起来:

 

 

列出Extent MAP:
1* select extent_id,file_id,block_id ,block_id+blocks-1 from dba_extents where segment_name='SAMPLE'
SQL> /
EXTENT_ID    FILE_ID   BLOCK_ID BLOCK_ID+BLOCKS-1
---------- ---------- ---------- -----------------
0          6        176               183
1          6        184               191
2          6        192               199
3          6        200               207
4          6        208               215
5          6        216               223
6          6        224               231
7          6        232               239
8          6        240               247
9          6        248               255
10          6        256               263
11          6        264               271
12          6        272               279
13          6        280               287
14          6        288               295
15          6        296               303
16          6        384               511
17          6        512               639
18          6        640               767
19          6        768               895
对应于1:
07:21:28 pread(257, "\6\242\0\0\21\6\200\1BZ\36\0\0\0\1\4\211\350\0\0\1\0\0\0\20h\1\0BZ\36\0\0\0\0\0\3\0002\0\0\6\200\1\377\377\0\0\0\
0\0\0\0\0\0\0\0\0\0\0\0\200\0\0\3079\36\0\2\0\37\0\36\7\0\0-p\0\1\203\0010\0\0\200\0\0\315Y\36\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0
\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\221\2\377\3774\0055\5\1\0\1\0\0\0\221\2b\37D\37&\37\10\37\352\36\314\36\256\36\220\36r\36T\0366\36\30\3
6\372\35\334\35\276\35\240\35\202\35d\35F\35(\35\n\35\355\34\317\34\261\34\223\34u\34W\0349\34\33\34"..., 8192, 12722176) = 8192
12722176/8192=1553  block#
07:21:28 write(12, "WAIT #140194991652056: nam='db file sequential read' ela= 221 file#=6 block#=1553 blocks=1 obj#=92176 tim=33620843
632", 117) = 117
对应于2:
07:21:28 pread(257, "\6\242\0\0l$\200\1BZ\36\0\0\0\1\4yN\0\0\1\0\37\0\20h\1\0BZ\36\0\0\0\0\0w\0372\0\3!\200\1\2\0\37\0\36\7\0\0\36p\0\
1\203\1E\0\0\200\0\0\315Y\36\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\200\0\0\0\0\0\0\0\0\0
\0\0\0\0\0\0\0\0\0\0\0\0\0\0\200\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\200\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\200\0
\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192, 76382208) = 8192
76382208/8192= 9324  block#
07:21:28 write(12, "WAIT #140194991652056: nam='db file sequential read' ela= 175 file#=6 block#=9324 blocks=1 obj#=92176 tim=33620844
373", 117) = 117
对应3:
07:21:28 mmap(0x7f81b0151000, 1114112, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 7, 0) = 0x7f81b0151000
07:21:28 pread(257,..., 8192, 2269184) = 8192    ==> block#=277
07:21:28 pread(257,..., 8192, 2449408) = 8192    ==> block#=299
07:21:28 pread(257,..., 8192, 4055040) = 8192    ==> block#=495
07:21:28 pread(257,..., 8192, 5382144) = 8192    ==> block#=657
5799936
5832704
5922816
8863744
8937472
9699328
10371072
12042240
13131776
07:21:28 write(12, "WAIT #140194991652056: nam='db file parallel read' ela= 3982 files=1 blocks=13 requests=13 obj#=92176 tim=33620849
111", 117) = 117

 

 

 

这里看到db file parallel read 物理读等待事件涉及到的数据块均是不连续的、同时还可以跨越Extent,这点不像db file scattered read。

db file parallel read 等待事件是Oracle 可以对多个数据文件实施并行地物理读取并加载到不连续的内存空间中(可能是PGA也可能是Buffer Cache)。  该db file parallel read 往往出现在 recovery操作或者 buffer prefetch以优化多个单块读的操作中。若该等待事件很突出,一般可以参考db file sequential read的优化方法来调优。

该等待事件的 P1、P2、P3如下:

 

Parameters:
P1	files	  	Number of files being requested
P2	blocks		Total number of blocks being requested
P3	requests	Number of actual AIO requests

[Oracle等待事件】SQL*NET MORE data to client

SQL*NET MORE data to client等待事件:

意味着server process服务进程正在发送更多 数据/信息给client, wait time代表SEND发送这个操作实际完成的时间。

driver id
In Oracle8i onwards P1RAW can be decoded into ASCII characters to give a clue as to which Net driver is used.
Eg: P1RAW=0x62657100 = ‘beq\0’ , P1RAW=0x54435000 = ‘TCP\0’ etc.
In earlier releases the value here is the value of the disconnect function of the Net driver being used (which is not much use).

#bytes
The number of bytes expected to be sent by the server process to the client. Note that in some cases the bytes value may be misleading and may not reflect the actual number of bytes sent.

对该等待时间的一般建议是检查和调整网络或数据库连接

与相关的bug:

 

NB Bug Fixed Description
14249402 12.1.0.1 Poor RAT replay performance with LOBS – “SQL*Net more data to client” waits
8477973 11.2.0.2, 12.1.0.1 Multiple open DB links / ORA-2020 / distributed deadlock / ORA-600 possible using DB Links

 

Wait Time:

This wait blocks until the message is sent (or until an abnormal end of file condition occurs on the underlying Net transport layer). There is not usually an Oracle timeout on the wait, although the Oracle Net layer can be configured to time out if required.
Finding Blockers:

The blocker is generally due to any network delay.
Systemwide Waits:

This event normally denotes time outside of the server waiting to send more data to the client so is treated as a “Network” wait when looking at systemwide timings.
Reducing Waits / Wait times:

This is a normal wait but if the times are excessive one should look at the Net transport to the client process and determine if the time is due to:
Is there an excessive amount of data being sent from server to client
time in the network between the server and the client
(SQL*Net trace (with TIMESTAMPS) can be helpful to check out the time in the network between the client and server. See Note:16658.1 for details of Net tracing).

For the ‘SQL*Net more data to client’ event wait, Oracle uses SDU (Session Data Unit) to write to the SDU buffer which is written to the TCP socket buffer.  If data is larger than the the initial size of Session Data Unit then multiple chunks of data need to be sent. If there is more data to send then after each batch sent the session will wait on the  ‘SQL*Net more data to client’ wait event.

 

The following note describes SDU:

Document 44694.1 SQL*Net Packet Sizes (SDU & TDU Parameters)
How to Diagnose Waits for the ‘SQL*Net message to client’ and ‘SQL*Net more data to client’ events

The best way to diagnose the wait is by running 10046 trace.  A process or a single sql can be traced using 10046 trace:

Document 376442.1 How To Collect 10046 Trace (SQL_TRACE) Diagnostics for Performance Issues
For  example, in the following we are running a select from from SQL*Plus:

SELECT * FROM emp;

call     count       cpu    elapsed       disk      query    current        rows
——- ——  ——– ———- ———- ———- ———-  ———-
Parse        1      0.00       0.01          0          0          0           0
Execute      1      0.00       0.02          0          3          0           0
Fetch        2      0.00       0.06          0          0          0          14
——- ——  ——– ———- ———- ———- ———-  ———-
total        4      0.00       0.10          0          3          0          14

Elapsed times include waiting on following events:
Event waited on                             Times   Max. Wait  Total Waited
—————————————-   Waited  ———-  ————
SQL*Net message to client                       2        0.00          0.00
PX Deq: Execute Reply                          13        0.01          0.01
SQL*Net message from client                     2       26.06         26.06
PX Deq: Signal ACK RSG                          8        0.01          0.01
PX Deq: Signal ACK EXT                          8        0.02          0.02
PX Deq: Slave Session Stats                     8        0.00          0.00
*****************************************************************************
Once the trace is obtained, you can TKProf it to see the timings and waits. Individual waits for  ‘SQL*Net message to client’ are usually of very short duration (in this case the total wait is < 1 microsecond).  The wait is recorded but since it has taken ‘zero’ time, the wait is not necessarily a performance issue or cause for concern.

If you notice unusually high waits for these events, for example as a top wait in  statspack or AWR, then start the tuning process by tracing the process or the sql.

Potential Solutions

1. SDU size

Remember that ‘SQL*net message to client’ is normally not a network issue, as the throughput is based on the TCP packet.  The first session is sent the contents of the SDU buffer which is written to TCP buffer then the session waits for the ‘SQL*net message to client’ event.  The wait is associated with the following factors:

Oracle SDU size
Amount of data returned to the client
One solution is to increase the SDU size. The following document can help with that:

Document 44694.1 SQL*Net Packet Sizes (SDU & TDU Parameters)
2. Arraysize

If the application is using large amount of data, consider increasing the arraysize in the application.  If small arraysize is used to fetch the data, then the query will use multiple fetch calls, each of these will wait for the ‘SQL*net message to client’ event. With a small arraysize and a large amount of data, the number of waits can become significant.

If running SQL from sqlplus, the arraysize can be increased using the sqlplus “set” command:

set arraysize 1000
From the raw 10046 tracefile, the fetch buffer size or the arraysize can be seen from the r (rows) of the fetch line:

FETCH #18446744071490060104:c=0,e=17086,p=2,cr=3,cu=0,mis=0,r=1,dep=1,og=4,plh=872636971,tim=28473178755694
Here the r=1 is indicating arraysize of 1.  1 may be too low; so try increasing it if the wait for ‘SQL*net message to client’ events is large.

There is more information on arraysize and how to increase it in different applications in the following document:

Document 1419023.1 Row Prefetching and its impact on logical reads and fetch calls
3. TCP

Tune TCP connections and make sure the TCP is configured correctly. The following note may help:

Document 1037210.1 How to tune TCP parameters for better performance
Tuning TCP is outside the scope of Oracle Support. If you have difficulties please consult with your network team.

 

沪ICP备14014813号

沪公网安备 31010802001379号