解决Oracle中Split Partition缓慢的问题

有这样一个case , 用户的10g产品数据库中有一张按照月份划分的RANGE分区表 , 其范围为Less than (maxvalue)的最后一个Partition分区总是为空Empty的, 用户每隔半年会对这个MaxValue Partition 执行Split Partition 操作, 以便存放后续月份的数据, 同时这个表上还有一个非分区索引Nonpartitioned indexes。

 

满以为Split 这样一个Empry Partition会是一个回车就结束的任务, 但是Performance issue就在这平淡无奇的分区维护工作中出现了, 实际在执行”Alter Table Split partition Pn at (value) into …” 的DDL语句时,发现需要花费超过十多分钟才能完成一次Split。问题就在于,如果是有大量数据的Partition分区 , Split 操作慢一些也是正常的(expected预期内的) , 但是实际这里的Max Partition总是为空的,没有任何一条数据, 为什么Split 一个空的分区也会是这种蜗牛的速度呢?

 

我们来模拟这个现象, 首先创建一张分区表,Maxvalue的Partition是Empty的,且有一个普通索引:

 

 

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bi
PL/SQL Release 10.2.0.1.0 - Production
CORE    10.2.0.1.0      Production
TNS for Linux: Version 10.2.0.1.0 - Production
NLSRTL Version 10.2.0.1.0 - Production

SQL> select * from global_name;

GLOBAL_NAME
--------------------------------------------------------------------------------
www.askmaclean.com & www.askmaclean.com

SQL>   CREATE TABLE Maclean
  2     (    "OWNER" VARCHAR2(30),
  3          "OBJECT_NAME" VARCHAR2(128),
  4          "SUBOBJECT_NAME" VARCHAR2(30),
  5          "OBJECT_ID" NUMBER,
  6          "DATA_OBJECT_ID" NUMBER,
  7          "OBJECT_TYPE" VARCHAR2(19),
  8          "CREATED" DATE,
  9          "LAST_DDL_TIME" DATE,
 10          "TIMESTAMP" VARCHAR2(19),
 11          "STATUS" VARCHAR2(7),
 12          "TEMPORARY" VARCHAR2(1),
 13          "GENERATED" VARCHAR2(1),
 14          "SECONDARY" VARCHAR2(1)
 15     ) nologging
 16     partition by range(object_id)
 17     (partition p1 values less than (99999) tablespace users,
 18     partition p2 values less than (maxvalue) tablespace users);

Table created.

SQL> insert /*+ append */ into maclean select * from maclean1;

38736384 rows created.

SQL> commit; 

Commit complete.

SQL> create index ind_obj on maclean(DATA_OBJECT_ID,OBJECT_ID,LAST_DDL_TIME,TIMESTAMP,object_type,owner,status)
nologging parallel
  2  ;

Index created.

SQL> alter index ind_obj noparallel;

Index altered.

SQL> exec dbms_stats.gather_table_stats('SYS','MACLEAN',cascade=>true,degree=>2);

PL/SQL procedure successfully completed.

SQL> select num_rows from dba_tables where table_name='MACLEAN';

  NUM_ROWS
----------
  38818438

SQL>  select * from maclean partition (p2);

no rows selected

/* Maclean表有大量的数据 ,但是都在p1分区中, p2分区没有任何数据 */

 

 

我们执行Split partition 的DDL 语句,并使用10046 level 12 event监控该过程:

 

 oradebug setmypid;
oradebug event 10046 trace name context forever,level 12;
SQL>  alter table maclean split partition p2 at (100001)
2   into (partition p3, partition p4);
Table altered.
[oracle@vrh8 ~]$ tkprof /s01/admin/G10R21/udump/g10r21_ora_4896.trc g10r21_ora_4896.tkf
TKPROF: Release 10.2.0.1.0 - Production on Thu Nov 17 23:42:48 2011
Copyright (c) 1982, 2005, Oracle.  All rights reserved.
从tkf 文件中可以找出以下内容:
alter table maclean split partition p2 at (100001)
into (partition p3, partition p4)
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.13       0.30         20       1139          0           0
Execute      1      0.01       0.18          3          6         33           0
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      0.14       0.48         23       1145         33           0
select /*+ FIRST_ROWS PARALLEL("MACLEAN", 1) */ 1
from
 "SYS"."MACLEAN" PARTITION ("P2")  where ( (  (  ( "OBJECT_ID" < 100001 )  )
  ) ) and rownum < 2
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1     24.85      39.15     371276     385828          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3     24.85      39.15     371276     385828          0           0
Misses in library cache during parse: 1
Optimizer mode: FIRST_ROWS
Parsing user id: SYS   (recursive depth: 1)
Rows     Row Source Operation
-------  ---------------------------------------------------
      0  COUNT STOPKEY (cr=385828 pr=371276 pw=0 time=39153836 us)
      0   TABLE ACCESS BY GLOBAL INDEX ROWID MACLEAN PARTITION: 2 2 (cr=385828 pr=371276 pw=0 time=39153817 us)
38736384    INDEX FULL SCAN IND_OBJ (cr=385828 pr=371276 pw=0 time=309891137 us)(object id 52832)
Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  db file sequential read                    371276        0.08         19.46
  latch: cache buffers lru chain                  1        0.00          0.00
select /*+ FIRST_ROWS PARALLEL("MACLEAN", 1) */ 1
from
 "SYS"."MACLEAN" PARTITION ("P2")  where ( (  (  ( "OBJECT_ID" >= 100001 OR
  "OBJECT_ID" IS NULL  )  )  ) ) and rownum < 2
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1      0.00       0.00          0          3          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3      0.00       0.00          0          3          0           0
Misses in library cache during parse: 1
Optimizer mode: FIRST_ROWS
Parsing user id: SYS   (recursive depth: 1)
Rows     Row Source Operation
-------  ---------------------------------------------------
      0  COUNT STOPKEY (cr=3 pr=0 pw=0 time=83 us)
      0   PARTITION RANGE SINGLE PARTITION: 2 2 (cr=3 pr=0 pw=0 time=63 us)
      0    TABLE ACCESS FULL MACLEAN PARTITION: 2 2 (cr=3 pr=0 pw=0 time=36 us)

 

 

可以看到在执行”Alter table Split partition”的时候该DDL 语句产生了另外2条递归SQL(recursive sql)被调用,即上例中红色标注的SQL语句,这2条递归SQL分别以 “OBJECT_ID” >= 100001 OR “OBJECT_ID” IS NULL  和  “OBJECT_ID” < 100001 作为条件查询P2分区中的数据, Split Partition的DDL需要使用这2条SQL来找出是否存在满足分隔条件过滤的数据(注意语句中有rownum<2 , 所以实际最多也只返回1条数据,Oracle这样来判定分隔条件的左端或右端是否有数据)。

 

但是这里可以看到,其中 以”OBJECT_ID” < 100001 作为条件的语句运行耗时39.15s, 产生了大量的逻辑和物理读, 究其原因是该SQL的执行计划采用了Index FULL SCAN ,而另外一条以 “OBJECT_ID” >= 100001 OR “OBJECT_ID” IS NULL 作为条件的递归SQL语句则采用了TABLE ACCESS FULL MACLEAN PARTITION, 因为实际P2分区中是没有任何数据的,所以后者运行时间是us级别的, 而前者所要扫描的是一整个没有分区的索引, 这产生了大量的”db file sequential read”等待事件, 我们再来看一下其执行计划:

 

 

SQL> explain plan for
2  select /*+ FIRST_ROWS PARALLEL("MACLEAN", 1) */
3   1
4    from "SYS"."MACLEAN" PARTITION("P2")
5   where (((("OBJECT_ID" < 100001))))
6     and rownum < 2; 
SQL> select * from table(dbms_xplan.display());
PLAN_TABLE_OUTPUT
-------------------------------------------------
Plan hash value: 985826631
---------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name    | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |         |     1 |    13 |   380K  (1)| 01:16:01 |       |       |
|*  1 |  COUNT STOPKEY                      |         |       |       |            |          |       |       |
|   2 |   TABLE ACCESS BY GLOBAL INDEX ROWID| MACLEAN |     1 |    13 |   380K  (1)| 01:16:01 |     2 |     2 |
|*  3 |    INDEX FULL SCAN                  | IND_OBJ |    38M|       |   380K  (1)| 01:16:01 |       |       |
---------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<2)
   3 - access("OBJECT_ID"<100001)
       filter("OBJECT_ID"<100001)

 

注意以上递归SQL中的Hint “FIRST_ROWS PARALLEL(“MACLEAN”, 1)”是该Recursive SQL所固有的,换句话说是写死在Split Partition的Oracle代码层里的。

我们可以分析该Recursive SQL 采用INDEX FULL SCAN的原因可能有2种:
1. Split 所指定的分区的分区统计信息不准确,或者已经被清除。 注意一旦我们Split 某个分区后该分区原有统计信息将失效,且被清除。

如下面的例子:

 

SQL> exec dbms_stats.gather_table_stats('SYS','MACLEAN',cascade=>true,degree=>2);
PL/SQL procedure successfully completed.
SQL> col high_value for a20
SQL> select partition_name,high_value,num_rows,blocks from dba_tab_partitions where table_name='MACLEAN';
PARTITION_NAME                 HIGH_VALUE             NUM_ROWS     BLOCKS
------------------------------ -------------------- ---------- ----------
P1                             99999                  38789142     533240
P3                             100001                        0          0
P4                             MAXVALUE                      0          0
SQL>  alter table maclean split partition p4 at (100010)
2   into (partition p5, partition p4);
这里我们再次Split 当前的MAXVALUE分区p4
SQL> select partition_name,high_value,num_rows,blocks from dba_tab_partitions where table_name='MACLEAN';
PARTITION_NAME                 HIGH_VALUE             NUM_ROWS     BLOCKS
------------------------------ -------------------- ---------- ----------
P1                             99999                  38789142     533240
P3                             100001                        0          0
P4                             MAXVALUE
P5                             100010

 

 

可以发现Split Partiton 会导致原Partiton的统计信息失效,即便使用dbms_stats.lock_table_stats锁住统计信息也无法避免这种失效。
且单个partiton的统计信息失效并不会导致动态采用的发生(dynamic sampling):

 

SQL> show parameter dyna 
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
optimizer_dynamic_sampling           integer     2
SQL> select * from maclean partition (p4);
no rows selected
Execution Plan
----------------------------------------------------------
Plan hash value: 3900731449
--------------------------------------------------------------------------------------------------
| Id  | Operation              | Name    | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |         |  9699K|   860M| 29715   (2)| 00:05:57 |       |       |
|   1 |  PARTITION RANGE SINGLE|         |  9699K|   860M| 29715   (2)| 00:05:57 |     4 |     4 |
|   2 |   TABLE ACCESS FULL    | MACLEAN |  9699K|   860M| 29715   (2)| 00:05:57 |     4 |     4 |
--------------------------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
124  recursive calls
0  db block gets
23  consistent gets
0  physical reads
0  redo size
1139  bytes sent via SQL*Net to client
458  bytes received via SQL*Net from client
1  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
0  rows processed
SQL> alter system flush shared_pool;
System altered.
SQL> alter session set optimizer_dynamic_sampling=10;
Session altered.
SQL> select * from maclean partition (p4);
no rows selected
Execution Plan
----------------------------------------------------------
Plan hash value: 3900731449
--------------------------------------------------------------------------------------------------
| Id  | Operation              | Name    | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |         |  9699K|   860M| 29715   (2)| 00:05:57 |       |       |
| 1 | PARTITION RANGE SINGLE | | 9699K| 860M| 29715 (2)| 00:05:57 | 4 | 4 |
|   2 |   TABLE ACCESS FULL    | MACLEAN |  9699K|   860M| 29715   (2)| 00:05:57 |     4 |     4 |
--------------------------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
5812  recursive calls
0  db block gets
1141  consistent gets
22  physical reads
0  redo size
1139  bytes sent via SQL*Net to client
458  bytes received via SQL*Net from client
1  SQL*Net roundtrips to/from client
139  sorts (memory)
0  sorts (disk)
0  rows processed
SQL> exec dbms_stats.gather_table_stats('SYS','MACLEAN',partname=>'P4');
PL/SQL procedure successfully completed.
SQL> select * from maclean partition (p4);
no rows selected
Execution Plan
----------------------------------------------------------
Plan hash value: 3900731449
--------------------------------------------------------------------------------------------------
| Id  | Operation              | Name    | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |         |     1 |   177 |     2   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE SINGLE|         |     1 |   177 |     2   (0)| 00:00:01 |     4 |     4 |
|   2 |   TABLE ACCESS FULL    | MACLEAN |     1 |   177 |     2   (0)| 00:00:01 |     4 |     4 |
--------------------------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
0  recursive calls
0  db block gets
3  consistent gets
0  physical reads
0  redo size
1139  bytes sent via SQL*Net to client
458  bytes received via SQL*Net from client
1  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
0  rows processed

 

 

由于Split Partition 本身会导致分区的统计信息失效, 这叫造成由该Split DDL所生成的递归查询SQL语句在解析时CBO Optimizer优化器无法了解该分区的必要统计信息,所以优化器会根据整张Table的统计信息来估算(根据算法来估,而没有实际的统计操作), 实际统计信息中整张表有38818438 行 , 且共有4个分区, 所以P4分区就顺利成长的拥有38818438 /4 = 9704k Rows了 , 实际上例中红色标注的估算值时9699k rows, 因为CBO 优化器得到的统计信息是该分区中有大量的数据 ,这导致其最终选择了 FULL INDEX SCAN的执行计划 ,而不是去扫描其实是空空如也的分区。

 

这是我们在 10g 中执行Split Partition 操作时需要密切注意的一个问题, 解决方法是没执行一次Split Partition DDL语句之前都收集一遍MaxValue Partiton 的统计信息, 因为该分区是空的 ,所以这种统计是十分迅速的:

 

exec dbms_stats.gather_table_stats('&OWNER','&TABNAME',partname=>'&PARNAME');

 

 

另一种手段则是在每次Split 之前都 手动修改Maxvalue分区的统计信息, 这样做会更快一些:

 

exec dbms_stats.set_table_stats(ownname => '&OWNER',tabname => '&TABNAME',partname => '&PARNAME',
numrows => 0,numblks => 0,force => TRUE);

 

 

2. 另一个原因是 相关的递归SQL语句 被嵌入了”First Rows”的hint , 该提示会让CBO更倾向于使用索引以便快速返回前几行的结果,注意因为这些递归SQL实际只要求返回一行结果,所以First Rows 可以说是恰当且明智的; 另外在分区表+本地分区的情景中, 即便这个分区是非空的且存有大量的数据, 那么使用索引都可以说是正确的选择。

 

但是在这里选择INDEX FULL SCAN 恰恰是不明智的, 很显然Oracle开发部门没有为分区表+ 非分区索引(Non-partitioned Indexes) 或 全局索引(global partitioned indexes)的场景考虑该First Rows提示可能带来的后果,  已知在版本10.2.0.2 和 10.2.0.3 上都存在该不恰当的递归SQL hint造成的Split Partiton性能问题,Bug Note<Bug 6435544: SPLIT PARTITION SLOW BECAUSE OF HINTED RECURSIVE SQL>说明了该问题:

 

Hdr: 6435544 10.2.0.2.0 RDBMS 10.2.0.2.0 QRY OPTIMIZER PRODID-5 PORTID-226
Abstract: SPLIT PARTITION SLOW BECAUSE OF HINTED RECURSIVE SQL
PROBLEM:
--------
Split partition operation took more than 45 minutes to complete. Almost all
the time is taken up by the following SQL -
select /*+ FIRST_ROWS PARALLEL("D_CUSTOMER_ORDER_ITEM_CHANNELS", 8) */ 1
from
"BOOKER"."D_CUSTOMER_ORDER_ITEM_CHANNELS" PARTITION ("COIC101_MAX")
where ( ( ( ( "LEGAL_ENTITY_ID" < 101 )  )  OR  ( "LEGAL_ENTITY_ID" = 101 AND
(
"ORDER_DAY" < TO_DATE(' 2007-11-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS',
'NLS_CALENDAR=GREGORIAN') )  )  ) ) and rownum < 2
DIAGNOSTIC ANALYSIS:
--------------------
a. Table is partitioned on - (LEGAL_ENTITY_ID, ORDER_DAY).
Index PK_D_CUST_ORDER_ITEM_CHANNELS is on -
(CUSTOMER_ORDER_ITEM_ID, MARKETPLACE_ID, LEGAL_ENTITY_ID, ORDER_DAY)
b. Table has 555 partitions. Index is a global index. 
c. From the tkprof output -
call   count       cpu    elapsed       disk      query  current  rows
----- ------  -------- ---------- ---------- ---------- --------  ----
Parse      1      0.00       0.00          0          0        0     0
Execute    1      0.00       0.00          0          0        0     0
Fetch      1    405.50    2796.28    1408207    1444973        0     0
----- ------  -------- ---------- ---------- ---------- --------  ----
total      3    405.50    2796.28    1408207    1444973        0     0
Misses in library cache during parse: 1
Optimizer mode: FIRST_ROWS
Parsing user id: 108     (recursive depth: 1)
Rows     Row Source Operation
-------  ---------------------------------------------------
time=6 us)(object id 1160811)
d. Although we have partition_name in the query, we go for INDEX FULL scan
because of hard coded hint FIRST_ROWS. A FTS of the concerned partition would
be much faster in this case.
WORKAROUND:
-----------
Drop the global index which would force an FTS on a single partition.
RELEASE NOTES:
]] Poor performance was possible for ALTER TABLE SPLIT PARTITION if there
]] was an index on the partition key.
REDISCOVERY INFORMATION:
If you get poor performance for an ALTER TABLE SPLIT PARTITION operation
and there is an index on the partition key, you may be hitting this bug.
WORKAROUND:
None

 

 

Metalink给出的Workaround方案是将分区表上的global index 全局索引drop 掉, 这样可以让CBO只能选择对single partition的FULL TABLE SCAN。

 

实际上这个Solution并很不能让人满意,在产品环境中我们不可能随意drop掉某张关键表上的索引,所以这个solution的可操作性很低。

 

补充:我们来看一下First Rows Hint 在CBO计算成本时如何起作用的:

 

SQL> oradebug setmypid;
Statement processed.
SQL> oradebug event 10053 trace name context forever,level 1;
Statement processed.
SQL> explain plan for select /*+ FIRST_ROWS PARALLEL("MACLEAN", 1) */
2   1
3    from "SYS"."MACLEAN" PARTITION("P4")
4   where (((("OBJECT_ID" < 100010))))
5     and rownum < 2; Explained. SQL> oradebug tracefile_name;
10053 trace content ================================================
*************************************
PARAMETERS WITH ALTERED VALUES
******************************
optimizer_mode_hinted               = true
optimizer_mode                      = first_rows
***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: MACLEAN  Alias: MACLEAN  Partition [3]
    #Rows: 0  #Blks:  1  AvgRowLen:  0.00
    #Rows: 0  #Blks:  1  AvgRowLen:  0.00
Index Stats::
  Index: IND_OBJ  Col#: 5 4 8 9 6 1 10
    LVLS: 3  #LB: 380544  #DK: 50447  LB/K: 7.00  DB/K: 777.00  CLUF: 39208542.00
***************************************
SINGLE TABLE ACCESS PATH
  Column (#4): OBJECT_ID(NUMBER)  Part#: 3
    AvgLen: 22.00 NDV: 0 Nulls: 0 Density: 0.0000e+00 Min: 0 Max: 0
  Column (#4): OBJECT_ID(NUMBER)
    AvgLen: 22.00 NDV: 0 Nulls: 0 Density: 0.0000e+00 Min: 0 Max: 0
  Table: MACLEAN  Alias: MACLEAN     
    Card: Original: 0  Rounded: 1  Computed: 0.00  Non Adjusted: 0.00
  Access Path: TableScan
    Cost:  2.00  Resp: 2.00  Degree: 0
      Cost_io: 2.00  Cost_cpu: 7121
      Resp_io: 2.00  Resp_cpu: 7121
kkofmx: index filter:"MACLEAN"."OBJECT_ID"<100010 AND ROWNUM<2
  Access Path: index (skip-scan)
    SS sel: 0.0000e+00  ANDV (#skips): 4073
    SS io: 32584.00 vs. table scan io: 2.00
    Skip Scan rejected
  Access Path: index (FullScan)
    Index: IND_OBJ
    resc_io: 380547.00  resc_cpu: 10551751028
    ix_sel: 1  ix_sel_with_filters: 1
    Cost: 382007.38  Resp: 382007.38  Degree: 1
  Best:: AccessPath: IndexRange  Index: IND_OBJ
         Cost: 382007.38  Degree: 1  Resp: 382007.38  Card: 0.00  Bytes: 0
***************************************
OPTIMIZER STATISTICS AND COMPUTATIONS
***************************************
GENERAL PLANS
***************************************
Considering cardinality-based initial join order.
***********************
Join order[1]:  MACLEAN[MACLEAN]#0
***********************
Best so far: Table#: 0  cost: 382007.3822  card: 0.0000  bytes: 13
(newjo-stop-1) k:0, spcnt:0, perm:1, maxperm:80000
*********************************
Number of join permutations tried: 1
*********************************
prefetching is on for IND_OBJ
Final - First Rows Plan:  Best join order: 1
  Cost: 382007.3822  Degree: 1  Card: 1.0000  Bytes: 13
  Resc: 382007.3822  Resc_io: 380547.0000  Resc_cpu: 12512178128
  Resp: 382007.3822  Resp_io: 380547.0000  Resc_cpu: 12512178128
kkoipt: Query block SEL$1 (#0)
******* UNPARSED QUERY IS *******
SELECT /*+ FIRST_ROWS NO_PARALLEL ("MACLEAN") */ 1 "1" FROM "SYS"."MACLEAN" PARTITION ("P4") 
"MACLEAN" WHERE ROWNUM<2 AND "MACLEAN"."OBJECT_ID"<100010
kkoqbc-end
          : call(in-use=46464, alloc=49080), compile(in-use=39288, alloc=40552)
apadrv-end: call(in-use=46464, alloc=49080), compile(in-use=40072, alloc=40552)
sql_id=ff1ft3uxsq105.
Current SQL statement for this session:
explain plan for select /*+ FIRST_ROWS PARALLEL("MACLEAN", 1) */
 1
  from "SYS"."MACLEAN" PARTITION("P4")
 where (((("OBJECT_ID" < 100010))))
   and rownum < 2
============
Plan Table
============
-------------------------------------------------------+-----------------------------------+---------------+
| Id  | Operation                            | Name    | Rows  | Bytes | Cost  | Time      | Pstart| Pstop |
-------------------------------------------------------+-----------------------------------+---------------+
| 0   | SELECT STATEMENT                     |         |       |       |  373K |           |       |       |
| 1   |  COUNT STOPKEY                       |         |       |       |       |           |       |       |
| 2   |   TABLE ACCESS BY GLOBAL INDEX ROWID | MACLEAN |     1 |    13 |  373K |  01:16:25 | 4     | 4     |
| 3   |    INDEX FULL SCAN                   | IND_OBJ |   37M |       |  373K |  01:16:25 |       |       |
-------------------------------------------------------+-----------------------------------+---------------+
Predicate Information:
----------------------
1 - filter(ROWNUM<2)
3 - access("OBJECT_ID"<100010)
3 - filter("OBJECT_ID"<100010)
Content of other_xml column
===========================
  db_version     : 10.2.0.1
  parse_schema   : SYS
  plan_hash      : 985826631
  Outline Data:
  /*+
    BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('10.2.0.1')
      FIRST_ROWS
      OUTLINE_LEAF(@"SEL$1")
      INDEX(@"SEL$1" "MACLEAN"@"SEL$1" ("MACLEAN"."DATA_OBJECT_ID" "MACLEAN"."OBJECT_ID"
"MACLEAN"."LAST_DDL_TIME" "MACLEAN"."TIMESTAMP" "MACLEAN"."OBJECT_TYPE" "MACLE
AN"."OWNER" "MACLEAN"."STATUS"))
    END_OUTLINE_DATA
  */

 

 

可以看到虽然INDEX FULL SCAN的成本(cost:382007)大于 Access Path: TableScan (cost : 2.00) 很多 ,但是optimizer 最终仍然选择了Index Full Scan , 因为其是满足First Rows 要求的执行计划(红色标注部分)。

 

于是我开始自己寻找workaround的路径,目标是让优化器忽略”First Rows”的影响。我一开始寄望于能够通过设置一些影响CBO计算cost的优化器参数来让optimizer 迷途知返, 包括设置optimizer_index_cost_adj和”_db_file_optimizer_read_count”的值到一个很大水平,但发现并不起作用:

 

SQL> alter session set "_db_file_optimizer_read_count"=65535;
Session altered.
SQL>  alter session set optimizer_index_cost_adj=10000;
Session altered.
SQL> alter system flush shared_pool;
System altered.
SQL> select /*+ FIRST_ROWS PARALLEL("MACLEAN", 1) */
2   1
3    from "SYS"."MACLEAN" PARTITION("P4")
4   where (((("OBJECT_ID" < 100010))))
5     and rownum < 2;
no rows selected
Execution Plan
----------------------------------------------------------
Plan hash value: 985826631
---------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name    | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |         |     1 |    13 |    38M  (1)|127:20:09 |       |       |
|*  1 |  COUNT STOPKEY                      |         |       |       |            |          |       |       |
|   2 |   TABLE ACCESS BY GLOBAL INDEX ROWID| MACLEAN |     1 |    13 |    38M  (1)|127:20:09 |     4 |     4 |
|*  3 |    INDEX FULL SCAN                  | IND_OBJ |    39M|       |    38M  (1)|127:20:09 |       |       |
---------------------------------------------------------------------------------------------------------------

 

得益于好奇心,我以’%optimizer%ignore’的Like语句去查了Oracle的隐式参数表,果然有志者事竟成,最终有所斩获:

 

SQL> col name for a40
SQL> col value for a20
SQL> col describ for a60
SQL> set linesize 200 pagesize 1400
SQL> SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ
2   FROM SYS.x$ksppi x, SYS.x$ksppcv y
3   WHERE x.inst_id = USERENV ('Instance')
4   AND y.inst_id = USERENV ('Instance')
5   AND x.indx = y.indx
6  AND x.ksppinm LIKE '%optimizer%ignore%';
NAME                                     VALUE                DESCRIB
---------------------------------------- -------------------- ---------------------
_optimizer_ignore_hints                  TRUE                 enables the embedded hints to be ignored

 

 

在metalink上查了下没有关于该”_optimizer_ignore_hints” 隐式参数的任何有用信息, 就注释来看是可以启用是否忽略SQL中嵌入的HINT提示信息, 我们来具体看以下是否其作用:

 

 

SQL> alter system flush shared_pool;
System altered.
SQL> explain plan for select /*+ FIRST_ROWS PARALLEL("MACLEAN", 1) */
2   1
3    from "SYS"."MACLEAN" PARTITION("P4")
4   where (((("OBJECT_ID" < 100010))))
5     and rownum < 2; 
Explained. 
SQL> select * from table(dbms_xplan.display());
PLAN_TABLE_OUTPUT
---------------------------------------------
Plan hash value: 985826631
---------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name    | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |         |     1 |    13 |    38M  (1)|127:20:09 |       |       |
|*  1 |  COUNT STOPKEY                      |         |       |       |            |          |       |       |
|   2 |   TABLE ACCESS BY GLOBAL INDEX ROWID| MACLEAN |     1 |    13 |    38M  (1)|127:20:09 |     4 |     4 |
|*  3 |    INDEX FULL SCAN                  | IND_OBJ |    39M|       |    38M  (1)|127:20:09 |       |       |
---------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<2)
   3 - access("OBJECT_ID"<100010)
       filter("OBJECT_ID"<100010)
SQL> alter session set "_optimizer_ignore_hints"=true;
Session altered.
SQL> oradebug setmypid;
Statement processed.
SQL> oradebug event 10053 trace name context forever , level 1;
Statement processed.
SQL> explain plan for select /*+ FIRST_ROWS PARALLEL("MACLEAN", 1) */
2   1
3    from "SYS"."MACLEAN" PARTITION("P4")
4   where (((("OBJECT_ID" < 100010))))
5     and rownum < 2; Explained. SQL> select * from table(dbms_xplan.display());
PLAN_TABLE_OUTPUT
-----------------------------------------------
Plan hash value: 4280157877
---------------------------------------------------------------------------------------------------
| Id  | Operation               | Name    | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |         |     1 |    13 |     2   (0)| 00:00:01 |       |       |
|*  1 |  COUNT STOPKEY          |         |       |       |            |          |       |       |
|   2 |   PARTITION RANGE SINGLE|         |     1 |    13 |     2   (0)| 00:00:01 |     4 |     4 |
|*  3 |    TABLE ACCESS FULL    | MACLEAN |     1 |    13 |     2   (0)| 00:00:01 |     4 |     4 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<2)
   3 - filter("OBJECT_ID"<100010)

 

惊喜! 该 “_optimizer_ignore_hints”参数确实起到无视HINT提示的作用, 且可以直接在session级别修改,十分方便, 我们透过10053 event来观察该参数是如何其作用的:

 

***************************************
PARAMETERS USED BY THE OPTIMIZER
********************************
*************************************
PARAMETERS WITH ALTERED VALUES
******************************
_optimizer_ignore_hints             = true
***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: MACLEAN  Alias: MACLEAN  Partition [3]
    #Rows: 0  #Blks:  1  AvgRowLen:  0.00
    #Rows: 0  #Blks:  1  AvgRowLen:  0.00
Index Stats::
  Index: IND_OBJ  Col#: 5 4 8 9 6 1 10
    LVLS: 3  #LB: 380544  #DK: 50447  LB/K: 7.00  DB/K: 777.00  CLUF: 39208542.00
***************************************
SINGLE TABLE ACCESS PATH
  Column (#4): OBJECT_ID(NUMBER)  Part#: 3
    AvgLen: 22.00 NDV: 0 Nulls: 0 Density: 0.0000e+00 Min: 0 Max: 0
  Column (#4): OBJECT_ID(NUMBER)
    AvgLen: 22.00 NDV: 0 Nulls: 0 Density: 0.0000e+00 Min: 0 Max: 0
  Table: MACLEAN  Alias: MACLEAN
    Card: Original: 0  Rounded: 1  Computed: 0.00  Non Adjusted: 0.00
  Access Path: TableScan
    Cost:  2.00  Resp: 2.00  Degree: 0
      Cost_io: 2.00  Cost_cpu: 7121
      Resp_io: 2.00  Resp_cpu: 7121
kkofmx: index filter:"MACLEAN"."OBJECT_ID"<100010 AND ROWNUM<2
  Access Path: index (skip-scan)
    SS sel: 0.0000e+00  ANDV (#skips): 4073
    SS io: 32584.00 vs. table scan io: 2.00
    Skip Scan rejected
  Access Path: index (FullScan)
    Index: IND_OBJ
    resc_io: 380547.00  resc_cpu: 10551751028
    ix_sel: 1  ix_sel_with_filters: 1
    Cost: 382007.38  Resp: 382007.38  Degree: 1
  Best:: AccessPath: TableScan
***************************************
OPTIMIZER STATISTICS AND COMPUTATIONS
***************************************
GENERAL PLANS
***************************************
Considering cardinality-based initial join order.
***********************
Join order[1]:  MACLEAN[MACLEAN]#0
***********************
Best so far: Table#: 0  cost: 2.0008  card: 0.0000  bytes: 13
(newjo-stop-1) k:0, spcnt:0, perm:1, maxperm:80000
*********************************
Number of join permutations tried: 1
*********************************
Final - All Rows Plan:  Best join order: 1
  Cost: 2.0008  Degree: 1  Card: 1.0000  Bytes: 13
  Resc: 2.0008  Resc_io: 2.0000  Resc_cpu: 7121
  Resp: 2.0008  Resp_io: 2.0000  Resc_cpu: 7121
Current SQL statement for this session:
explain plan for select /*+ FIRST_ROWS PARALLEL("MACLEAN", 1) */
 1
  from "SYS"."MACLEAN" PARTITION("P4")
 where (((("OBJECT_ID" < 100010))))
   and rownum < 2
============
Plan Table
============
-------------------------------------------+-----------------------------------+---------------+
| Id  | Operation                | Name    | Rows  | Bytes | Cost  | Time      | Pstart| Pstop |
-------------------------------------------+-----------------------------------+---------------+
| 0   | SELECT STATEMENT         |         |       |       |     2 |           |       |       |
| 1   |  COUNT STOPKEY           |         |       |       |       |           |       |       |
| 2   |   PARTITION RANGE SINGLE |         |     1 |    13 |     2 |  00:00:01 | 4     | 4     |
| 3   |    TABLE ACCESS FULL     | MACLEAN |     1 |    13 |     2 |  00:00:01 | 4     | 4     |
-------------------------------------------+-----------------------------------+---------------+
Predicate Information:
----------------------
1 - filter(ROWNUM<2)
3 - filter("OBJECT_ID"<100010)
Content of other_xml column
===========================
  db_version     : 10.2.0.1
  parse_schema   : SYS
  plan_hash      : 4280157877
  Outline Data:
  /*+
    BEGIN_OUTLINE_DATA
IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('10.2.0.1')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      FULL(@"SEL$1" "MACLEAN"@"SEL$1")
    END_OUTLINE_DATA
  */
Dumping Hints
=============
  atom_hint=(@=0x988453c0 err=20 resol=1 used=0 token=453 org=1 lvl=2 txt=FIRST_ROWS )
  atom_hint=(@=0x988451f8 err=0 resol=1 used=1 token=177 org=1 lvl=3 txt=NO_PARALLEL ("MACLEAN") )
********** WARNING: SOME HINTS HAVE ERRORS *********

 

实际 “_optimizer_ignore_hints”参数是起到的作用是使SQL 隐式地回归到默认的optimizer_mode=ALL_ROWS上来,  我们终于战胜了嵌入在SQL语句中的”First Rows” Hint 。

 

总结

在针对10g 多个版本上的Split Partition 可能因 其DDL所附带的递归SQL 使用了 固定的 而又不恰当的”First rows Hint” 提示而造成语句执行缓慢的问题, 我们可以通过 以下3个方案解决:

A.  通过每次执行Split之前都收集指定分区的统计,并设置”_optimizer_ignore_hints” =true 来规避 分区统计信息失效和 不恰当的”First Rows hint” 所可能带来的问题, 方法如下:

exec dbms_stats.gather_table_stats('&OWNER','&TABNAME',partname=>'&PARNAME');
alter session set "_optimizer_ignore_hints"=true;

推荐使用方案A

补充:

有同学反映:

”隐藏参数_optimizer_ignore_hints在分区表的split操作中并未启到作用。

从我的测试结果来看,虽然导致split操作慢的根源是FIRST_ROWS优化器模式下的分区表select语句:

select /*+ FIRST_ROWS PARALLEL(“MACLEAN”, 1) */ 1
from
 MACLEAN PARTITION (“P4”)  where ( (  (  (
  “OBJECT_ID” < ‘1000010’ )  )  ) ) and rownum < 2;

不过以上的sql语句在_optimizer_ignore_hints参数调整后仍使用FIRST_ROWS的优化器模式,即没有生效。

我觉得这可能与oracle在执行自己内部命令时,会忽略该参数的设置。(直接执行select的sql语句确实会忽略hint)”

 

实际通过10053 事件追踪该SPLIT PARTITION DDL语句所生产的递归SQL语句,发现带有”FIRST_ROWS”提示的SELECT语句甚至没有解析的过程, 很有可能是该递归SQL语句直接使用了内部硬编码的存储大纲OUTLINES所导致的。

 

换句话说之前_optimizer_ignore_hints的隐式参数针对我们手动执行的带有FIRST_ROWS HINT的SELECT语句是有效的,而对于DDL所附带的递归SQL无效。

 

注:实际在对嵌入了HINT的非recursive SQL语句做调优时,若我们无法修改该SQL的HINT,则还是可以利用到”_optimizer_ignore_hints”的。

 

想了一下可以通过设置较旧的优化器特性了解决该问题(alter session set optimizer_features_enable=’8.0.0′;),该optimizer_features_enable参数同样可以在session级别设置, 如:

 

TEST A:
SQL> set timing on;
SQL> oradebug setmypid;
Statement processed.
SQL> oradebug event 10046 trace name context forever,level 12;
Statement processed.
SQL> oradebug tracefile_name;
/s01/admin/G10R21/udump/g10r21_ora_13646.trc
SQL> alter table maclean split partition p4 at (10000081) into (partition p14, partition p4);
Table altered.
Elapsed: 00:00:42.50
TEST B:
SQL> set timing on;
SQL> oradebug setmypid;
Statement processed.
SQL> oradebug tracefile_name;
/s01/admin/G10R21/udump/g10r21_ora_13656.trc
SQL> alter session set optimizer_features_enable='8.0.0';
Session altered.
Elapsed: 00:00:00.01
SQL> alter table maclean split partition p4 at (10000091) into (partition p15, partition p4);
Table altered.
Elapsed: 00:00:00.05
PARSING IN CURSOR #2 len=152 dep=1 uid=0 oct=3 lid=0 tim=1291531645417871 hv=2124209225 ad='a74b41f0'
select /*+ FIRST_ROWS PARALLEL("MACLEAN", 1) */ 1 from "SYS"."MACLEAN" PARTITION ("P4") 
where ( (  (  ( "OBJECT_ID" < 10000091 )  )  ) ) and rownum < 2
END OF STMT
PARSE #2:c=1000,e=684,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=2,tim=1291531645417864
BINDS #2:
EXEC #2:c=0,e=71,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=2,tim=1291531645417984
FETCH #2:c=0,e=92,p=0,cr=3,cu=0,mis=0,r=0,dep=1,og=2,tim=1291531645418094
STAT #2 id=1 cnt=0 pid=0 pos=1 obj=0 op='COUNT STOPKEY (cr=3 pr=0 pw=0 time=122 us)'
STAT #2 id=2 cnt=0 pid=1 pos=1 obj=0 op='PARTITION RANGE SINGLE PARTITION: 13 13 (cr=3 pr=0 pw=0 time=95 us)'
STAT #2 id=3 cnt=0 pid=2 pos=1 obj=52809 op='TABLE ACCESS FULL MACLEAN PARTITION: 13 13 (cr=3 pr=0 pw=0 time=59 us)'
=====================
PARSING IN CURSOR #2 len=177 dep=1 uid=0 oct=3 lid=0 tim=1291531645418799 hv=339345368 ad='a74cedd8'
select /*+ FIRST_ROWS PARALLEL("MACLEAN", 1) */ 1 from "SYS"."MACLEAN" PARTITION ("P4")
 where ( (  (  ( "OBJECT_ID" >= 10000091 OR "OBJECT_ID" IS NULL  )  )  ) ) and
rownum < 2
END OF STMT
PARSE #2:c=1000,e=589,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=2,tim=1291531645418792
BINDS #2:
EXEC #2:c=0,e=67,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=2,tim=1291531645418908
FETCH #2:c=0,e=66,p=0,cr=3,cu=0,mis=0,r=0,dep=1,og=2,tim=1291531645418992
STAT #2 id=1 cnt=0 pid=0 pos=1 obj=0 op='COUNT STOPKEY (cr=3 pr=0 pw=0 time=92 us)'
STAT #2 id=2 cnt=0 pid=1 pos=1 obj=0 op='PARTITION RANGE SINGLE PARTITION: 13 13 (cr=3 pr=0 pw=0 time=64 us)'
STAT #2 id=3 cnt=0 pid=2 pos=1 obj=52809 op='TABLE ACCESS FULL MACLEAN PARTITION: 13 13 (cr=3 pr=0 pw=0 time=36 us)'
WAIT #1: nam='control file sequential read' ela= 31 file#=0 block#=1 blocks=1 obj#=-1 tim=1291531645419383
WAIT #1: nam='control file sequential read' ela= 12 file#=1 block#=1 blocks=1 obj#=-1 tim=1291531645419432
WAIT #1: nam='control file sequential read' ela= 23 file#=0 block#=15 blocks=1 obj#=-1 tim=1291531645419494
WAIT #1: nam='control file sequential read' ela= 10 file#=0 block#=17 blocks=1 obj#=-1 tim=1291531645419556

 

B. 如果确实遇到了该问题 ,也可以将Index FULL SCAN 所使用的全局索引drop 掉来达到强制使用FULL single table partition SCAN的目的, 实际使用中不推荐

C. 避免使用Global Partitioned index 或 Non-partitioned Index , 而采用Local partitioned index , 这似乎更难做到

D. 也可以通过将原Maxvalue的分区drop掉之后(前提是该分区真的是空的),再添加新分区的做法来绕过该问题

解决ORA-14098分区交换索引不匹配错误

上周在客户一套BRM系统上执行分区交换Exchange Partition操作的时候出现了ORA-14098错误,该错误是由于分区表上的LOCAL分区索引与非分区表上的索引不匹配造成的,我们来看一下这个错误:

[oracle@rh2 ~]$ oerr ora 14098
14098, 00000, "index mismatch for tables in ALTER TABLE EXCHANGE PARTITION"
// *Cause:  The two tables specified in the EXCHANGE have indexes which are
//          not equivalent
// *Action: Ensure that the indexes for the two tables have indexes which
//          follow this rule
//          For every non partitioned index for the non partitioned table,
//          there has to be an identical LOCAL index on the partitioned
//          table and vice versa. By identical, the column position, type
//          and size have to be the same.
SQL> ALTER TABLE sales EXCHANGE PARTITION SALES_Q4_2003 with
table SALES_TMP INCLUDING INDEXES WITH VALIDATION UPDATE GLOBAL INDEXES;
ALTER TABLE sales EXCHANGE PARTITION SALES_Q4_2003 with
table SALES_TMP INCLUDING INDEXES WITH VALIDATION UPDATE GLOBAL INDEXES
*
ERROR at line 1:
ORA-14098: index mismatch for tables in ALTER TABLE EXCHANGE PARTITION

如果表上有很多的索引,以至于你无法确定到底是哪个索引引发了ORA-14098错误,那么我们可以通过trace的方式来协助定位到具体的索引:

SQL>  select * from v$version;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for Linux: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production
SQL> select * from global_name;
GLOBAL_NAME
--------------------------------------------------------------------------------
www.askmaclean.com
SQL> alter session set max_dump_file_size = unlimited;
Session altered.
SQL> alter session set events '10046 trace name context forever, level 12';
Session altered.
SQL> alter session set events '14098 trace name errorstack forever, level 4';
Session altered.
##SQL> alter system flush buffer_cache;
System altered.
Rerun Exchange Partition DDL 
SQL> ALTER TABLE sales EXCHANGE PARTITION SALES_Q4_2003 with table SALES_TMP
INCLUDING INDEXES WITH VALIDATION UPDATE GLOBAL INDEXES;
ALTER TABLE sales EXCHANGE PARTITION SALES_Q4_2003 with table SALES_TMP
INCLUDING INDEXES WITH VALIDATION UPDATE GLOBAL INDEXES
*
ERROR at line 1:
ORA-14098: index mismatch for tables in ALTER TABLE EXCHANGE PARTITION
11g中直接查询v$diag_info就可以得到trace的路径,10g执行gettracename.sql
SELECT    d.VALUE
|| '/'
|| LOWER (RTRIM (i.INSTANCE, CHR (0)))
|| '_ora_'
|| p.spid
|| '.trc' trace_file_name
FROM (SELECT p.spid
FROM v$mystat m, v$session s, v$process p
WHERE m.statistic# = 1 AND s.SID = m.SID AND p.addr = s.paddr) p,
(SELECT t.INSTANCE
FROM v$thread t, v$parameter v
WHERE v.NAME = 'thread'
AND (v.VALUE = 0 OR t.thread# = TO_NUMBER (v.VALUE))) i,
(SELECT VALUE
FROM v$parameter
WHERE NAME = 'user_dump_dest') d;
TRACE_FILE_NAME
--------------------------------------------------------------------------------
/s01/admin/G10R2/udump/g10r2_ora_17749.trc
==========================10046/errorstack trace contents========================
PARSING IN CURSOR #1 len=127 dep=0 uid=64 oct=15 lid=64 tim=1277655207436065 hv=1207961095 ad='9098f018'
ALTER TABLE sales EXCHANGE PARTITION SALES_Q4_2003 with table SALES_TMP
INCLUDING INDEXES WITH VALIDATION UPDATE GLOBAL INDEXES
END OF STMT
PARSE #1:c=0,e=1145,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,tim=1277655207436059
BINDS #1:
STAT #2 id=2 cnt=0 pid=1 pos=1 obj=98001 op='INDEX FULL SCAN SALES_UNID_TMP (cr=1 pr=0 pw=0 time=39 us)'
*** 2011-06-17 21:55:32.417
ksedmp: internal or fatal error
ORA-14098: index mismatch for tables in ALTER TABLE EXCHANGE PARTITION
Current SQL statement for this session:
ALTER TABLE sales EXCHANGE PARTITION SALES_Q4_2003 with table SALES_TMP
INCLUDING INDEXES WITH VALIDATION UPDATE GLOBAL INDEXES
我们可以在trace中看到在出现ORA-14098错误前,正在对索引SALES_UNID_TMP的Fast Full Scan

除了通过10046/errorstack的trace信息诊断外,更多的问题可以直接从DDL语句中发现,在以上示例中非分区表的DDL语句:

-- Create table
create table SALES_TMP
(
UNI_ID        NUMBER NOT NULL,
PROD_ID       NUMBER not null,
CUST_ID       NUMBER not null,
TIME_ID       DATE not null,
CHANNEL_ID    NUMBER not null,
PROMO_ID      NUMBER not null,
QUANTITY_SOLD NUMBER(10,2) not null,
AMOUNT_SOLD   NUMBER(10,2) not null
);
create index SALES_CHANNEL_TMP       ON SALES_TMP (CHANNEL_ID) ;
create index SALES_CUST_TMP          ON SALES_TMP (CUST_ID)    ;
create index SALES_UNID_TMP          ON SALES_TMP  (UNI_ID,TIME_ID);  --注意细节该索引是非UNIQUE的
create index SALES_PROD_TMP          ON SALES_TMP (PROD_ID)    ;
create index SALES_PROMO_TMP         ON SALES_TMP (PROMO_ID)   ;
create index SALES_TIME_TMP          ON SALES_TMP (TIME_ID)    ;

下为分区表的DDL语句:

-- Create table
create table SALES
(
UNI_ID        NUMBER NOT NULL,
PROD_ID       NUMBER not null,
CUST_ID       NUMBER not null,
TIME_ID       DATE not null,
CHANNEL_ID    NUMBER not null,
PROMO_ID      NUMBER not null,
QUANTITY_SOLD NUMBER(10,2) not null,
AMOUNT_SOLD   NUMBER(10,2) not null
)
partition by range (TIME_ID)
...............
create index SALES_CHANNEL       ON SALES (CHANNEL_ID) LOCAL;
create index SALES_CUST          ON SALES (CUST_ID)    LOCAL;
create UNIQUE index SALES_UNID   ON SALES (UNI_ID,TIME_ID) LOCAL;      -- 对应的索引是UNIQUE的
create index SALES_PROD          ON SALES (PROD_ID)    LOCAL;
create index SALES_PROMO         ON SALES (PROMO_ID)   LOCAL;
create index SALES_TIME          ON SALES (TIME_ID)    LOCAL;

解决ORA-14098错误的要点是要找出引发错误的原因。当我们交换分区的时候,我们要确保所有交换表上的索引和分区表上的本地索引匹配。这意味着如果在分区表上有N个LOCAL INDEXES,那么在交换表上就应当有N个等价的索引。这里的等价要求存在映射关系的2个索引,在列的位置、类型、大小及UNIQUE/NON-UNIQUE都要一致。

可以利用如下SQL语句来找出分区表和交换表上索引的差异:

set linesize 160 pagesize 1400
col TABLE_NAME for a30
col INDEX_NAME for a30
col COLUMN_NAME for a30
col COLUMN_POSITION for 99
col COLUMN_LENGTH for 99
col CHAR_LENGTH for 99
col DESCEND for a4
Select TABLE_NAME,INDEX_NAME, COLUMN_NAME,COLUMN_POSITION, COLUMN_LENGTH, CHAR_LENGTH, DESCEND
FROM SYS.DBA_IND_COLUMNS DICN
WHERE INDEX_OWNER = '&own'
and DICN.TABLE_NAME in ('&TABNAME1','&TABNAME2')
ORDER BY  INDEX_NAME, COLUMN_POSITION
/
select TABLE_NAME, INDEX_NAME, INDEX_TYPE, UNIQUENESS, PARTITIONED
from dba_indexes
where owner='&OWNER'
and TABLE_NAME in ('&TABNAME1', '&TABNAME2')
order by index_name
/

也可以使用Toad的Single Schema Object Compare功能来对比检验索引:
single_object_compare

对于存在主键的分区表,可以在主键上以DISABLE VALIDATE方式创建unique constraint约束,以代替全局的主键索引。若交换表(Exchange Table)上存在主键索引的话,那么建议在交换前暂时将该索引drop掉,待交换完成后再重建。

如果实在无法解决该ORA-14098错误,那么可以尝试使用EXCLUDING INDEXES子句以跳过索引维护,而在交换完成后重建相关失效索引。

owner = '&OWNER1'

Script:列出失效索引或索引分区

以下脚本可用于列出数据库中的失效的索引、索引分区、子分区:

REM list of the unusable index,index partition,index subpartition in Database 
Select owner, index_name, status
From dba_indexes
where status = 'UNUSABLE'
and owner not in ('SYS','SYSTEM',
'SYSMAN',
'EXFSYS',
'WMSYS',
'OLAPSYS',
'OUTLN',
'DBSNMP',
'ORDSYS',
'ORDPLUGINS',
'MDSYS',
'CTXSYS',
'AURORA$ORB$UNAUTHENTICATED',
'XDB',
'FLOWS_030000',
'FLOWS_FILES')
order by 1, 2 
/
select index_owner, index_name, partition_name
from dba_ind_partitions
where status ='UNUSABLE'
and index_owner not in ('SYS',
'SYSTEM',
'SYSMAN',
'EXFSYS',
'WMSYS',
'OLAPSYS',
'OUTLN',
'DBSNMP',
'ORDSYS',
'ORDPLUGINS',
'MDSYS',
'CTXSYS',
'AURORA$ORB$UNAUTHENTICATED',
'XDB',
'FLOWS_030000',
'FLOWS_FILES') order by 1,2
/
Select
Index_Owner
, Index_Name
, partition_name
, SUBPARTITION_NAME
From 
DBA_IND_SUBPARTITIONS
Where
status = 'UNUSABLE'  
and index_owner not in ('SYS',
'SYSTEM',
'SYSMAN',
'EXFSYS',
'WMSYS',
'OLAPSYS',
'OUTLN',
'DBSNMP',
'ORDSYS',
'ORDPLUGINS',
'MDSYS',
'CTXSYS',
'AURORA$ORB$UNAUTHENTICATED',
'XDB',
'FLOWS_030000',
'FLOWS_FILES') order by 1, 2
/

ORA-00600: internal error code, arguments: [kdsgrp1] example

一套Linux x86-64上的11.2.0.1系统,alert日志中出现ORA-00600: internal error code, arguments: [kdsgrp1]错误,相关trace的部分内容如下:

Dump file /u01/app/oracle/diag/rdbms/utdw016/utdw016b/incident/incdir_276035/utdw016b_ora_5756_i276035.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/112utdw016
System name:	Linux
Node name:	x42k601
Release:	2.6.18-164.el5
Version:	#1 SMP Tue Aug 18 15:51:48 EDT 2009
Machine:	x86_64
Instance name: utdw016b
Redo thread mounted by this instance: 2
Oracle process number: 52
Unix process pid: 5756, image: oracle@x42k601
*** 2010-07-28 11:08:35.394
*** SESSION ID:(577.11818) 2010-07-28 11:08:35.394
*** CLIENT ID:() 2010-07-28 11:08:35.394
*** SERVICE NAME:(utdw016-edw) 2010-07-28 11:08:35.394
*** MODULE NAME:(pmdtm@x42k604-zone1 (TNS V1-V3)) 2010-07-28 11:08:35.394
*** ACTION NAME:() 2010-07-28 11:08:35.394
Dump continued from file: /u01/app/oracle/diag/rdbms/utdw016/utdw016b/trace/utdw016b_ora_5756.trc
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []
========= Dump for incident 276035 (ORA 600 [kdsgrp1]) ========
*** 2010-07-28 11:08:35.395
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
----- Current SQL Statement for this session (sql_id=9hfdcgzzgmgvp) -----
DELETE FROM EIMABS.FACT_OPEN_SALES_ORDER WHERE DIM_SNAPSHOT_TYPE_ID = (SELECT DIM_SNAPSHOT_TYPE_ID FROM EIMABS.DIM_SNAPSHOT_TYPE WHERE SNAPSHOT_TYPE_CODE='FUTURE')
----- PL/SQL Stack -----
----- PL/SQL Call Stack -----
object line object
handle number name
0x5578e0528 6 anonymous block
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
skdstdst()+36 call kgdsdst() 000000000 ? 000000000 ?
7FFFA85D7418 ? 000000001 ?
7FFFA85DB918 ? 000000000 ?
ksedst1()+98 call skdstdst() 000000000 ? 000000000 ?
7FFFA85D7418 ? 000000001 ?
000000000 ? 000000000 ?
ksedst()+34 call ksedst1() 000000000 ? 000000001 ?
7FFFA85D7418 ? 000000001 ?
000000000 ? 000000000 ?
dbkedDefDump()+2736 call ksedst() 000000000 ? 000000001 ?
7FFFA85D7418 ? 000000001 ?
000000000 ? 000000000 ?
ksedmp()+36 call dbkedDefDump() 000000003 ? 000000002 ?
7FFFA85D7418 ? 000000001 ?
000000000 ? 000000000 ?
ksfdmp()+64 call ksedmp() 000000003 ? 000000002 ?
7FFFA85D7418 ? 000000001 ?
000000000 ? 000000000 ?
dbgexPhaseII()+1764 call ksfdmp() 000000003 ? 000000002 ?
7FFFA85D7418 ? 000000001 ?
000000000 ? 000000000 ?
dbgexProcessError() call dbgexPhaseII() 2B30E1B74718 ? 2B30E1B795F8 ?
+2279 7FFFA85E3708 ? 000000001 ?
000000000 ? 000000000 ?
dbgeExecuteForError call dbgexProcessError() 2B30E1B74718 ? 2B30E1B795F8 ?
()+83 000000001 ? 000000000 ?
7FFF00000000 ? 000000000 ?
dbgePostErrorKGE()+ call dbgeExecuteForError 2B30E1B74718 ? 2B30E1B795F8 ?
1615 () 000000001 ? 000000001 ?
000000000 ? 000000000 ?
dbkePostKGE_kgsf()+ call dbgePostErrorKGE() 000000000 ? 2B30E1B87F38 ?
63 000000258 ? 2B30E1B795F8 ?
000000000 ? 000000000 ?
kgeadse()+383 call dbkePostKGE_kgsf() 00A9B2EE0 ? 2B30E1B87F38 ?
000000258 ? 2B30E1B795F8 ?
000000000 ? 000000000 ?
kgerinv_internal()+ call kgeadse() 00A9B2EE0 ? 2B30E1B87F38 ?
45 000000258 ? 000000000 ?
000000000 ? 000000000 ?
kgerinv()+33 call kgerinv_internal() 00A9B2EE0 ? 2B30E1B87F38 ?
A85DBCA000000000 ?
000000258 ? 000000000 ?
000000000 ?
kgeasnmierr()+143 call kgerinv() 00A9B2EE0 ? 2B30E1B87F38 ?
A85DBCA000000000 ?
000000000 ? 000000000 ?
000000000 ?
kdsgrp1_dump()+832 call kgeasnmierr() 00A9B2EE0 ? 2B30E1B87F38 ?
A85DBCA000000000 ?
000000000 ? 000000003 ?
000000003 ?
kdsgrp1()+19 call kdsgrp1_dump() 00A9B2EE0 ? 2B30E1B87F38 ?
7FFFA85E4360 ? 000000000 ?
000000003 ? 000000003 ?
kdsgrp()+6346 call kdsgrp1() 00A9B2EE0 ? 2B30E1B87F38 ?
7FFFA85E4360 ? 000000000 ?
000000003 ? 000000003 ?
qetlbr()+211 call kdsgrp() 2B30E2018620 ? 000000000 ?
2B30E2018620 ? 000000000 ?
000000003 ? 000000003 ?
qertbFetchByRowID() call qetlbr() 2B30E2018620 ? 00000001A ?
+850 000000000 ? 0000000C8 ?
00400F521 ? 000000000 ?
qergiFetch()+276 call qertbFetchByRowID() 000000000 ? 2B30E2018518 ?
000000000 ? 000000000 ?
000000001 ? 000000000 ?
qerdlFetch()+678 call qergiFetch() 511B78238 ? 2B30E20187F8 ?
000000000 ? 000000000 ?
000000001 ? 000000000 ?
delexe()+952 call qerdlFetch() 511B78090 ? 2B30E2018900 ?
000000000 ? 000000000 ?
000007FFF ? 000000000 ?
opiexe()+14449 call delexe() 000000000 ? 000000000 ?
000000000 ? 2B30E27E1E08 ?
2B30E1FCA000 ? 000000000 ?
opipls()+3098 call opiexe() 000000004 ? 000000005 ?
7FFFA85E6708 ? 2B30E27E1E08 ?
2B30E1FCA000 ? 000000000 ?
opiodr()+1149 call opipls() 000000066 ? 000000007 ?
7FFFA85E7F90 ? 000000000 ?
2B30E2701C28 ? 300000000 ?
__PGOSF141_rpidrus( call opiodr() 000000066 ? 000000007 ?
)+206 7FFFA85E7F90 ? 000000003 ?
008C71EF0 ? 300000000 ?
skgmstack()+148 call __PGOSF141_rpidrus( 7FFFA85E79A0 ? 000000007 ?
) 7FFFA85E7F90 ? 000000003 ?
008C71EF0 ? 300000000 ?
rpiswu2()+612 call skgmstack() 7FFFA85E7978 ? 00A9B2B20 ?
00000F618 ? 008468D40 ?
7FFFA85E79A0 ? 300000000 ?
rpidrv()+1342 call rpiswu2() 55CE5B2F8 ? 000000078 ?
55CE5B37C ? 000000009 ?
2B30E1FEEF98 ? 000000078 ?
psddr0()+459 call rpidrv() 000000003 ? 000000066 ?
7FFFA85E7F90 ? 000000039 ?
2B30E1FEEF98 ? 000000078 ?
psdnal()+457 call psddr0() 000000003 ? 000000066 ?
7FFFA85E7F90 ? 000000031 ?
2B30E1FEEF98 ? 000000078 ?
pevm_EXECC()+376 call psdnal() 7FFFA85E9370 ? 7FFFA85E9580 ?
7FFFA85E7F90 ? 2B30E2742370 ?
2B30E1FEEF98 ? 000000078 ?
pfrinstr_EXECC()+75 call pevm_EXECC() 2B30E273F0A0 ? 7FFFA85E9580 ?
000000020 ? 2B30E2742370 ?
2B30E1FEEF98 ? 000000078 ?
pfrrun_no_tool()+63 call pfrinstr_EXECC() 2B30E273F0A0 ? 51148B482 ?
2B30E273F110 ? 2B30E2742370 ?
2B30E1FEEF98 ? 2B3000000020 ?
pfrrun()+1025 call pfrrun_no_tool() 2B30E273F0A0 ? 51148B482 ?
2B30E273F110 ? 2B30E2742370 ?
2B30E1FEEF98 ? 2B3000000020 ?
plsql_run()+769 call pfrrun() 2B30E273F0A0 ? 000000000 ?
2B30E273F110 ? 7FFFA85E9370 ?
2B30E1FEEF98 ? 503AC23E2 ?
peicnt()+296 call plsql_run() 2B30E273F0A0 ? 000000001 ?
000000000 ? 7FFFA85E9370 ?
2B30E1FEEF98 ? 000000000 ?
kkxexe()+520 call peicnt() 7FFFA85E9370 ? 2B30E273F0A0 ?
2B30E2418960 ? 7FFFA85E9370 ?
2B30E2486008 ? 000000000 ?
opiexe()+14796 call kkxexe() 2B30E2013538 ? 2B30E273F0A0 ?
000000000 ? 7FFFA85E9370 ?
2B30E2486008 ? 000000000 ?
kpoal8()+2283 call opiexe() 000000049 ? 000000003 ?
7FFFA85EA8F8 ? 7FFFA85E9370 ?
2B30E2486008 ? 000000000 ?
opiodr()+1149 call kpoal8() 00000005E ? 00000001C ?
7FFFA85ED9D0 ? 7FFFA85E9370 ?
2B30E2486008 ? 5E00000001 ?
ttcpip()+1251 call opiodr() 00000005E ? 00000001C ?
7FFFA85ED9D0 ? 000000000 ?
008C71DB0 ? 5E00000001 ?
opitsk()+1628 call ttcpip() 00A9C6450 ? 0086CCB08 ?
7FFFA85ED9D0 ? 000000000 ?
7FFFA85ED430 ? 7FFFA85EDBD4 ?
opiino()+953 call opitsk() 00A9C6450 ? 000000000 ?
7FFFA85ED9D0 ? 000000000 ?
7FFFA85ED430 ? 7FFFA85EDBD4 ?
opiodr()+1149 call opiino() 00000003C ? 000000004 ?
7FFFA85EF0C8 ? 000000000 ?
7FFFA85ED430 ? 7FFFA85EDBD4 ?
opidrv()+565 call opiodr() 00000003C ? 000000004 ?
7FFFA85EF0C8 ? 000000000 ?
008C71860 ? 7FFFA85EDBD4 ?
sou2o()+98 call opidrv() 00000003C ? 000000004 ?
7FFFA85EF0C8 ? 000000000 ?
008C71860 ? 7FFFA85EDBD4 ?
opimai_real()+128 call sou2o() 7FFFA85EF0A0 ? 00000003C ?
000000004 ? 7FFFA85EF0C8 ?
008C71860 ? 7FFFA85EDBD4 ?
ssthrdmain()+209 call opimai_real() 000000002 ? 7FFFA85EF290 ?
000000004 ? 7FFFA85EF0C8 ?
008C71860 ? 7FFFA85EDBD4 ?
main()+196 call ssthrdmain() 000000002 ? 7FFFA85EF290 ?
000000001 ? 000000000 ?
008C71860 ? 7FFFA85EDBD4 ?
__libc_start_main() call main() 000000002 ? 7FFFA85EF438 ?
+244 000000001 ? 000000000 ?
008C71860 ? 7FFFA85EDBD4 ?
_start()+36 call __libc_start_main() 0009D3E88 ? 000000002 ?
7FFFA85EF428 ? 000000000 ?
008C71860 ? 000000002 ?

错误由此删除语句引起:

DELETE FROM EIMABS.FACT_OPEN_SALES_ORDER WHERE 
DIM_SNAPSHOT_TYPE_ID = 
(SELECT DIM_SNAPSHOT_TYPE_ID FROM EIMABS.DIM_SNAPSHOT_TYPE WHERE SNAPSHOT_TYPE_CODE='FUTURE')

其中EIMABS.DIM_SNAPSHOT_TYPE是一个普通堆表,而EIMABS.FACT_OPEN_SALES_ORDER是一个分区表,针对该分区表的各个分区进行了validate structure分析:

SQL> @?/rdbms/admin/utlvalid
Table created.
SQL> set time on;
23:32:51 SQL> analyze table EIMABS.DIM_SNAPSHOT_TYPE validate structure cascade;
Table analyzed.
00:30:46 SQL> analyze table EIMABS.FACT_OPEN_SALES_ORDER partition (P_2455405) validate structure cascade into invalid_rows;
analyze table EIMABS.FACT_OPEN_SALES_ORDER partition (P_2455405) validate structure cascade into invalid_rows
*
ERROR at line 1:
ORA-01499: table/index cross reference failure - see trace file

推测可能是P_2455405分区相关的索引可能出现了讹误,尝试重建该表上的15个索引,再次测试相关应用发现问题仍未解决。此次产生的trace文件:

Dump file /u01/app/oracle/diag/rdbms/utdw016/utdw016a/incident/incdir_310187/utdw016a_ora_26473_i310187.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/112utdw016
System name:	Linux
Node name:	x42k600
Release:	2.6.18-164.el5
Version:	#1 SMP Tue Aug 18 15:51:48 EDT 2009
Machine:	x86_64
Instance name: utdw016a
Redo thread mounted by this instance: 1
Oracle process number: 62
Unix process pid: 26473, image: oracle@x42k600
*** 2010-07-30 04:10:19.931
*** SESSION ID:(1993.31898) 2010-07-30 04:10:19.931
*** CLIENT ID:() 2010-07-30 04:10:19.931
*** SERVICE NAME:(utdw016-edw) 2010-07-30 04:10:19.931
*** MODULE NAME:(pmdtm@x42k604-zone2 (TNS V1-V3)) 2010-07-30 04:10:19.931
*** ACTION NAME:() 2010-07-30 04:10:19.931
Dump continued from file: /u01/app/oracle/diag/rdbms/utdw016/utdw016a/trace/utdw016a_ora_26473.trc
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []
========= Dump for incident 310187 (ORA 600 [kdsgrp1]) ========
*** 2010-07-30 04:10:19.932
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
----- Current SQL Statement for this session (sql_id=9hfdcgzzgmgvp) -----
DELETE FROM EIMABS.FACT_OPEN_SALES_ORDER WHERE DIM_SNAPSHOT_TYPE_ID = (SELECT DIM_SNAPSHOT_TYPE_ID FROM EIMABS.DIM_SNAPSHOT_TYPE WHERE SNAPSHOT_TYPE_CODE='FUTURE')
----- PL/SQL Stack -----
----- PL/SQL Call Stack -----
SO: 0x5334781d0, type: 52, owner: 0x55d2646b8, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x55caf0778, name=transaction, file=ktccts.h LINE:388, pg=0
(trans) flg = 0x00001e03, flg2 = 0x000c0000, flg3 = 0x00000000, prx = (nil), ros = 2147483647
flg  = 0x00001e03: ALC TRN VUS VID CHG USN
flg2 = 0x000c0000: PGA NIP
flg3 = 0x00000000:
bsn = 0x8419 bndsn = 0x841b spn = 0x85c7
efd = 15 DID:
file:kta.c lineno:1665
parent xid: 0x0000.000.00000000
env: (scn: 0x0a19.a0b7c9d4  xid: 0x0021.017.00028415  uba: 0x00e00545.0438.12  statement num=0  parent xid: xid: 0x0000.000.00000000  scn: 0x0a19.a0b7c9d9 0sch: scn: 0x0000.00000000  mascn: (scn: 0x0000.00000000)
cev: (spc = 11546  arsp = 0x5335450a8  ubkds (ubk:tsn: 2 rdba: 0x00e00545 flag:0x8 hdl:(nil) addr:0x35c50c014)  useg tsn: 2 rdba: 0x00e2c990
hwm uba: 0x00e00545.0438.12  col uba: 0x00000000.0000.00
num bl: 3 bk list: 0x53db19518)
cr opc: 0x0 spc: 11546 uba: 0x00e00545.0438.12
ccbstg: 0x00000000
(enqueue) TX-00210017-00028415	DID: 0001-003E-00000C70
lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  res_flag: 0x47
mode: X, lock_flag: 0x0, lock: 0x533478248, res: 0x5414c2da0
own: 0x55d2646b8, sess: 0x8b37718, proc: 0x55caf0778, prv: 0x5414c2db0
slk: (nil)
xga: (nil), heap: UGA
Trans IMU st: 0 Pool index 65535, Redo pool 0x533478980, Undo pool 0x533478a68
Redo pool range [0x2b13aacccb90 0x2b13aacccb90 0x2b13aacd1390]
Undo pool range [0x2b13aacc8390 0x2b13aacc8390 0x2b13aacccb90]
chnf control flags 0x0         CHNF hwm uba uba: 0x00000000.0000.00        ----------------------------------------
SO: 0x53db19518, type: 51, owner: 0x5334781d0, flag: -/-/-/0x00 if: 0x3 c: 0x3
SO: 0x55d2646b8, type: 4, owner: 0x55caf0778, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x55caf0778, name=session, file=ksu.h LINE:11467, pg=0
(session) sid: 1993 ser: 31898 trans: 0x5334781d0, creator: 0x55caf0778
flags: (0x45) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
flags2: (0x40008) -/-
DID: , short-term DID:
txn branch: (nil)
oct: 7, prv: 0, sql: 0x5534eae70, psql: 0x55a4ff230, user: 120/EIMABS
ksuxds FALSE at location: 0
service name: utdw016-edw
client details:
O/S info: user: infom, term: , ospid: 21955
machine: x42k604-zone2 program: pmdtm@x42k604-zone2 (TNS V1-V3)
application name: pmdtm@x42k604-zone2 (TNS V1-V3), hash value=4200765728
Current Wait Stack:
Not in wait; last wait ended 1.028093 sec ago
Wait State:
fixed_waits=0 flags=0x21 boundary=(nil)/-1
Session Wait History:
elapsed time of 1.028173 sec since last wait
0: waited for 'Disk file operations I/O'
FileOperation=0x5, fileno=0x1, filetype=0x3
wait_id=4666 seq_num=4860 snap_id=2
wait times: snap=0.000039 sec, exc=0.000181 sec, total=0.000687 sec
wait times: max=infinite
wait counts: calls=0 os=0
occurred after 0.000000 sec of elapsed time
1: waited for 'ASM file metadata operation'
msgop=0x12, locn=0x0, =0x0
wait_id=4667 seq_num=4859 snap_id=2
wait times: snap=0.000002 sec, exc=0.000005 sec, total=0.000506 sec
wait times: max=infinite
wait counts: calls=0 os=0
occurred after 0.000000 sec of elapsed time
2: waited for 'KSV master wait'
=0x0, =0x0, =0x0
wait_id=4668 seq_num=4858 snap_id=1
wait times: snap=0.000501 sec, exc=0.000501 sec, total=0.000501 sec
wait times: max=infinite
wait counts: calls=1 os=1
occurred after 0.000000 sec of elapsed time
3: waited for 'ASM file metadata operation'
msgop=0x12, locn=0x0, =0x0
wait_id=4667 seq_num=4857 snap_id=1
wait times: snap=0.000003 sec, exc=0.000003 sec, total=0.000003 sec
wait times: max=infinite
wait counts: calls=0 os=0
occurred after 0.000000 sec of elapsed time
4: waited for 'Disk file operations I/O'
FileOperation=0x5, fileno=0x1, filetype=0x3
wait_id=4666 seq_num=4856 snap_id=1
wait times: snap=0.000142 sec, exc=0.000142 sec, total=0.000142 sec
wait times: max=infinite
wait counts: calls=0 os=0
......
Block header dump:  0x0400f521 Data
Block header dump:  0x2317a579 Index
Block header dump:  0x2317a577 " "
Block header dump:  0x048c8c3c " "
Block header dump:  0x2317a1bb " "
Block header dump:  0x048c78ad " "
Block header dump:  0x0400f51f Data
Block header dump:  0x2317a33b Index
Block header dump:  0x048c78f3 " "
Block header dump:  0x07541a44 " "
Block header dump:  0x07577b92 " "
Block header dump:  0x078a4eb7 " "
Block header dump:  0x2317a4b7 " "
Block header dump:  0x040fe621 " "
Block header dump:  0x2317a47b " "

同时应用人员发现此系统中另一SQL语句也会导致ORA-600[kdsgrp1]错误的出现,如下:

SELECT
F.DIM_BIZ_SECTOR_ID AS DIM_BIZ_SECTOR_ID,
F.DW_REVENUE_ID ,
F.DIM_ENT_ITEM_ID AS DIM_ENT_ITEM_ID,
F.DIM_REV_PCH_KEY_ACCT_ID AS DIM_PCH_KEY_ACCOUNT_ID,
CT.DIM_PGH_COUNTRY_GROUP_ID AS DIM_PGH_COUNTRY_GROUP_ID,
F.DIM_REVENUE_DATE_ID AS DIM_FORECAST_DATE_ID,
NEXT_DAY(SYSDATE-4.000000000000000,'FRI') AS SNAPSHOT_DATE,
F.REVENUE_DT AS FORECAST_DATE,
F.REVENUE_QTY AS REVENUE_QTY
FROM
EIMABS.FACT_REVENUE F,
EIMABS.DIM_BIZ_SECTOR B,
EIMABS.DIM_ENT_ITEM E,
EIMABS.DIM_PCH_KEY_ACCOUNT PCH,
EIMABS.DIM_COUNTRY CT,
EIMABS.DIM_FISCAL_CAL_DATE CAL
WHERE
F.DIM_BIZ_SECTOR_ID = B.DIM_BIZ_SECTOR_ID
AND F.DIM_ENT_ITEM_ID = E.DIM_ENT_ITEM_ID
AND F.DIM_REV_PCH_KEY_ACCT_ID = PCH.DIM_PCH_KEY_ACCOUNT_ID
AND F.DIM_ULT_DEST_COUNTRY_ID = CT.DIM_COUNTRY_ID
AND UPPER(B.BIZ_SECTOR_CODE) = 'MDB'
AND F.DIM_REVENUE_DATE_ID = CAL.DIM_CALENDAR_DATE_ID
AND CAL.FISCAL_MONTH_NUM IN
(SELECT FISCAL_MONTH_NUM FROM EIMABS.DIM_FISCAL_CAL_MONTH
WHERE DIM_CALENDAR_MONTH_ID IN (
(SELECT B.DIM_CALENDAR_MONTH_ID -1 FROM EIMABS.DIM_FISCAL_CAL_DATE A , EIMABS.DIM_FISCAL_CAL_MONTH B
WHERE CALENDAR_DT = TRUNC(SYSDATE) AND A.DIM_CALENDAR_MONTH_ID = B.DIM_CALENDAR_MONTH_ID),
(SELECT B.DIM_CALENDAR_MONTH_ID -2 FROM EIMABS.DIM_FISCAL_CAL_DATE A , EIMABS.DIM_FISCAL_CAL_MONTH B
WHERE CALENDAR_DT = TRUNC(SYSDATE) AND A.DIM_CALENDAR_MONTH_ID = B.DIM_CALENDAR_MONTH_ID)));
ERROR at line 12:
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [],
[], [], [], [], []
/*相关trace文件如下*/
Trace file /u01/app/oracle/diag/rdbms/utdw016/utdw016a/trace/utdw016a_ora_26473.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/112utdw016
System name:	Linux
Node name:	x42k600
Release:	2.6.18-164.el5
Version:	#1 SMP Tue Aug 18 15:51:48 EDT 2009
Machine:	x86_64
Instance name: utdw016a
Redo thread mounted by this instance: 1
Oracle process number: 62
Unix process pid: 26473, image: oracle@x42k600
*** 2010-07-30 04:10:18.249
*** SESSION ID:(1993.31898) 2010-07-30 04:10:18.249
*** CLIENT ID:() 2010-07-30 04:10:18.249
*** SERVICE NAME:(utdw016-edw) 2010-07-30 04:10:18.249
*** MODULE NAME:(pmdtm@x42k604-zone2 (TNS V1-V3)) 2010-07-30 04:10:18.249
*** ACTION NAME:() 2010-07-30 04:10:18.249
* kdsgrp1-1: *************************************************
row 0x0400f521.1a continuation at
0x0400f521.1a file# 16 block# 62753 slot 26 not found
KDSTABN_GET: 0 ..... ntab: 1
curSlot: 26 ..... nrows: 28
kdsgrp - dump CR block dba=0x0400f521
Block header dump:  0x0400f521
Object id on Block? Y
seg/obj: 0x11527d  csc: 0xa19.61e20b3c  itc: 2  flg: E  typ: 1 - DATA
brn: 0  bdba: 0x400f500 ver: 0x01 opc: 0
inc: 0  exflg: 0
Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0040.02f.0001bfcd  0x2822b8ea.0224.34  C-U-    0  scn 0x0a19.61e1aa84
0x02   0x003f.01c.00028920  0x2822f9de.01e0.6b  --U-   26  fsc 0x0000.61e21527
bdba: 0x0400f521
data_block_dump,data header at 0x2d63ec064

这个case同时提交了SR,Oracle GCS支持分析日志后认为告警日志中曾经出现kewastUnPackStats记录,说明可能由11.2.0.1上的Bug 8967729 引起的:

I researched the issue further based upon the error that followed the log switch command,
Tue Aug 03 10:00:04 2010
ALTER SYSTEM ARCHIVE LOG
Tue Aug 03 10:00:05 2010
Thread 1 advanced to log sequence 8652 (LGWR switch)
Current log# 4 seq# 8652 mem# 0: +UAT_DATA/utdw016/redo041a.log
Current log# 4 seq# 8652 mem# 1: +UAT_ARCH/utdw016/redo042a.log
Tue Aug 03 10:00:14 2010
Archived Log entry 16346 added for thread 1 sequence 8651 ID 0x8d616825 dest 1:
Tue Aug 03 10:00:21 2010
ALTER SYSTEM ARCHIVE LOG
Tue Aug 03 10:00:23 2010
Thread 1 advanced to log sequence 8653 (LGWR switch)
Current log# 5 seq# 8653 mem# 0: +UAT_DATA/utdw016/redo051a.log
Current log# 5 seq# 8653 mem# 1: +UAT_ARCH/utdw016/redo052a.log
Tue Aug 03 10:00:23 2010
Archived Log entry 16348 added for thread 1 sequence 8652 ID 0x8d616825 dest 1:
Tue Aug 03 10:00:45 2010
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> kewastUnPackStats(): bad magic 1 (0x2b29e289bf13, 0)
We need to resolve this issue and then determine whether these indexes corrupt again within the database.
I found Bug 8967729 11.2.0.1 Abstract: KEWASTUNPACKSTATS(): BAD MAGIC MESSAGE IN ALERT.LOG.
Basically, there is a archivelog switch issue that is causes string corruption.
This issue was introduced by the fix of bug #8715387 and has been fully fixed as part of Bug 8730312 11.2 Abstract: FWD MERGE FOR BASE Bug 8715387 FOR 12G.
Please download and install patch 8730312 for your Oracle and OS versions.
However, if you have installed patches in this 11.2 database, then upload the results of the Opatch inventory.
We will check the installed database patches against patch 8730312 for any contradictions.
Hdr: 8967729 11.2.0.1 RDBMS 11.2.0.1 SVRMAN AWR PRODID-5 PORTID-226 8730312
Abstract: KEWASTUNPACKSTATS(): BAD MAGIC MESSAGE IN ALERT.LOG
PROBLEM:
--------
In a two-node RAC cluster (+ Data Guard), the rac instances write many
strange messages in alert log, such:
kewastUnPackStats(): bad magic 1 (0x2ac893121390, 0)
The hex address changes.
DIAGNOSTIC ANALYSIS:
--------------------
I don't see any error associated with this message.
It looks like Bug 8730312 description, but the base bug 8715387 should be
already fixed in 11.2.0.1 version.
WORKAROUND:
-----------
n/a
RELATED BUGS:
-------------
8715387, 8730312
REPRODUCIBILITY:
----------------
TEST CASE:
----------
STACK TRACE:
------------
Tue Sep 22 09:15:34 2009
ALTER SYSTEM ARCHIVE LOG
Tue Sep 22 09:15:34 2009
LGWR: Standby redo logfile selected to archive thread 1 sequence 1244
LGWR: Standby redo logfile selected for thread 1 sequence 1244 for
destination LOG_ARCHIVE_DEST_2
Thread 1 advanced to log sequence 1244 (LGWR switch)
Current log# 2 seq# 1244 mem# 0:
+DGCTRL/saneons/onlinelog/group_2.264.697133919
Archived Log entry 3903 added for thread 1 sequence 1243 ID 0x5186bbfd dest
1:
Tue Sep 22 10:00:02 2009
kewastUnPackStats(): bad magic 1 (0x2b1d760d3038, 0)
kewastUnPackStats(): bad magic 1 (0x2b1d760d3038, 0)

MOS认为应用补丁8730312可以解决我们的问题,但实际apply patch后EIMABS.FACT_OPEN_SALES_ORDER上的DELETE语句可以正常执行了,而之后发现的查询语句仍会报ORA-00600:[kdsgrp1]错误:

We have applied the patch. And we're in process of analyze all the partitions. So far there's no error reported. We analyzed all the partitions in table EIMABS.FACT_REVENUE, there's no error either. However, when we run the query, we still get the same error. Please note the other tables in the query are not partitioned.
SELECT
F.DIM_BIZ_SECTOR_ID AS DIM_BIZ_SECTOR_ID,
F.DW_REVENUE_ID ,
F.DIM_ENT_ITEM_ID AS DIM_ENT_ITEM_ID,
F.DIM_REV_PCH_KEY_ACCT_ID AS DIM_PCH_KEY_ACCOUNT_ID,
CT.DIM_PGH_COUNTRY_GROUP_ID AS DIM_PGH_COUNTRY_GROUP_ID,
F.DIM_REVENUE_DATE_ID AS DIM_FORECAST_DATE_ID,
NEXT_DAY(SYSDATE-4.000000000000000,'FRI') AS SNAPSHOT_DATE,
F.REVENUE_DT AS FORECAST_DATE,
F.REVENUE_QTY AS REVENUE_QTY
FROM
EIMABS.FACT_REVENUE F,
EIMABS.DIM_BIZ_SECTOR B,
EIMABS.DIM_ENT_ITEM E,
EIMABS.DIM_PCH_KEY_ACCOUNT PCH,
EIMABS.DIM_COUNTRY CT,
EIMABS.DIM_FISCAL_CAL_DATE CAL
WHERE
F.DIM_BIZ_SECTOR_ID = B.DIM_BIZ_SECTOR_ID
AND F.DIM_ENT_ITEM_ID = E.DIM_ENT_ITEM_ID
AND F.DIM_REV_PCH_KEY_ACCT_ID = PCH.DIM_PCH_KEY_ACCOUNT_ID
AND F.DIM_ULT_DEST_COUNTRY_ID = CT.DIM_COUNTRY_ID
AND UPPER(B.BIZ_SECTOR_CODE) = 'MDB'
AND F.DIM_REVENUE_DATE_ID = CAL.DIM_CALENDAR_DATE_ID
AND CAL.FISCAL_MONTH_NUM IN
(SELECT FISCAL_MONTH_NUM FROM EIMABS.DIM_FISCAL_CAL_MONTH
WHERE DIM_CALENDAR_MONTH_ID IN (
(SELECT B.DIM_CALENDAR_MONTH_ID -1 FROM EIMABS.DIM_FISCAL_CAL_DATE A , EIMABS.DIM_FISCAL_CAL_MONTH B
WHERE CALENDAR_DT = TRUNC(SYSDATE) AND A.DIM_CALENDAR_MONTH_ID = B.DIM_CALENDAR_MONTH_ID),
(SELECT B.DIM_CALENDAR_MONTH_ID -2 FROM EIMABS.DIM_FISCAL_CAL_DATE A , EIMABS.DIM_FISCAL_CAL_MONTH B
WHERE CALENDAR_DT = TRUNC(SYSDATE) AND A.DIM_CALENDAR_MONTH_ID = B.DIM_CALENDAR_MONTH_ID)));
SELECT
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [],
[], [], [], [], []
/*以下为trace文件*/
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/112utdw016
System name:	Linux
Node name:	x42k600
Release:	2.6.18-164.el5
Version:	#1 SMP Tue Aug 18 15:51:48 EDT 2009
Machine:	x86_64
Instance name: utdw016a
Redo thread mounted by this instance: 1
Oracle process number: 64
Unix process pid: 7170, image: oracle@x42k600 (TNS V1-V3)
*** 2010-08-05 19:10:19.776
*** SESSION ID:(10.4634) 2010-08-05 19:10:19.776
*** CLIENT ID:() 2010-08-05 19:10:19.776
*** SERVICE NAME:(SYS$USERS) 2010-08-05 19:10:19.776
*** MODULE NAME:(sqlplus@x42k600 (TNS V1-V3)) 2010-08-05 19:10:19.776
*** ACTION NAME:() 2010-08-05 19:10:19.776
* kdsgrp1-1:
*************************************************
row 0x04608b75.40 continuation at
0x04608b75.40 file# 17 block# 2132853 slot 64 not found
KDSTABN_GET: 0 ..... ntab: 1
curSlot: 64 ..... nrows: 130
kdsgrp - dump CR block dba=0x04608b75
Block header dump:  0x04608b75
Object id on Block? Y
seg/obj: 0x1c8071  csc: 0xa19.622a6023  itc: 2  flg: E  typ: 1 - DATA
brn: 0  bdba: 0x4608b40 ver: 0x01 opc: 0
inc: 0  exflg: 0
Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x000f.040.0001267d  0x01067067.022e.1e  --U-   64  fsc 0x0000.622a621d
0x02   0x0029.026.0001be36  0x282115c8.0223.0a  C---    0  scn 0x0a19.62297970  <---------- this TX shows in the index too
bdba: 0x04608b75
data_block_dump,data header at 0x47c16c064
===============
tsiz: 0x3f98
hsiz: 0x116
pbl: 0x47c16c064
76543210
flag=--------
ntab=1
nrow=130
frre=64
..
..
tab 0, row 63, @0x3a65
tl: 110 fb: --H-FL-- lb: 0x1  cc: 22
col  0: [ 5]  c4 0e 31 32 04
col  1: [ 2]  c1 02
col  2: [ 5]  c4 02 44 33 44
col  3: [ 5]  c4 03 14 10 23
col  4: [ 5]  c4 09 5f 35 38
col  5: [ 5]  c4 05 32 42 0b
col  6: [ 2]  c2 03
col  7: [ 5]  c4 03 2e 36 49
col  8: [ 1]  80
col  9: [ 1]  80
col 10: [ 7]  32 30 34 31 39 32 35
col 11: [ 1]  4e
col 12: [ 7]  78 6e 06 18 01 01 01
col 13: [ 2]  c1 06
col 14: [ 3]  c2 4d 33
col 15: [ 7]  78 6e 07 1b 0a 0d 04
col 16: [ 4]  c3 05 2f 25
col 17: [ 2]  c1 09
col 18: [ 3]  c2 02 50
col 19: [ 3]  c2 02 50
col 20: [ 2]  c1 0e
col 21: [ 8]  38 30 30 31 31 38 35 38
end_of_block_dump
Block dump from cache:
Dump of buffer cache at level 4 for tsn=13, rdba=73436021
BH (0x47fd84178) file#: 17 rdba: 0x04608b75 (17/2132853) class: 1 ba: 0x47c16c000
set: 47 pool 3 bsz: 16384 bsi: 0 sflg: 1 pwc: 0,25
dbwrid: 0 obj: 1867889 objn: 1130693 tsn: 13 afn: 17 hint: f
hash: [0x54be43860,0x54be43860] lru: [0x49beca1a0,0x3b7e094c0]
ckptq: [NULL] fileq: [NULL] objq: [0x52a0e8580,0x52a0e8580]
use: [0x54271be40,0x54271be40] wait: [NULL]
st: SCURRENT md: SHR tch: 0 le: 0x3effd22c8
flags:
LRBA: [0x0.0.0] LSCN: [0x0.0] HSCN: [0xffff.ffffffff] HSUB: [65535]
cr pin refcnt: 0 sh pin refcnt: 0
Block dump from disk:
buffer tsn: 13 rdba: 0x04608b75 (17/2132853)
scn: 0x0a19.622a621d seq: 0x01 flg: 0x06 tail: 0x621d0601
frmt: 0x02 chkval: 0xdf67 type: 0x06=trans data
Block header dump:  0x04608b75
Object id on Block? Y
seg/obj: 0x1c8071  csc: 0xa19.622a6023  itc: 2  flg: E  typ: 1 - DATA
brn: 0  bdba: 0x4608b40 ver: 0x01 opc: 0
inc: 0  exflg: 0
Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x000f.040.0001267d  0x01067067.022e.1e  --U-   64  fsc 0x0000.622a621d
0x02   0x0029.026.0001be36  0x282115c8.0223.0a  C---    0  scn 0x0a19.62297970
bdba: 0x04608b75
Incident 322216 created, dump file: /u01/app/oracle/diag/rdbms/utdw016/utdw016a/incident/incdir_322216/utdw016a_ora_7170_i322216.trc
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []
utdw016a_ora_7170_i322218.trc
-------------------------------
----- Current SQL Statement for this session (sql_id=88kca6rwpm9pc) -----
SELECT
F.DIM_BIZ_SECTOR_ID AS DIM_BIZ_SECTOR_ID,
F.DW_REVENUE_ID ,
F.DIM_ENT_ITEM_ID AS DIM_ENT_ITEM_ID,
F.DIM_REV_PCH_KEY_ACCT_ID AS DIM_PCH_KEY_ACCOUNT_ID,
CT.DIM_PGH_COUNTRY_GROUP_ID AS DIM_PGH_COUNTRY_GROUP_ID,
F.DIM_REVENUE_DATE_ID AS DIM_FORECAST_DATE_ID,
NEXT_DAY(SYSDATE-4.000000000000000,'FRI') AS SNAPSHOT_DATE,
F.REVENUE_DT AS FORECAST_DATE,
F.REVENUE_QTY AS REVENUE_QTY
FROM
EIMABS.FACT_REVENUE F,
EIMABS.DIM_BIZ_SECTOR B,
EIMABS.DIM_ENT_ITEM E,
EIMABS.DIM_PCH_KEY_ACCOUNT PCH,
EIMABS.DIM_COUNTRY CT,
EIMABS.DIM_FISCAL_CAL_DATE CAL
WHERE
F.DIM_BIZ_SECTOR_ID = B.DIM_BIZ_SECTOR_ID
AND F.DIM_ENT_ITEM_ID = E.DIM_ENT_ITEM_ID
AND F.DIM_REV_PCH_KEY_ACCT_ID = PCH.DIM_PCH_KEY_ACCOUNT_ID
AND F.DIM_ULT_DEST_COUNTRY_ID = CT.DIM_COUNTRY_ID
AND UPPER(B.BIZ_SECTOR_CODE) = 'MDB'
AND F.DIM_REVENUE_DATE_ID = CAL.DIM_CALENDAR_DATE_ID
AND CAL.FISCAL_MONTH_NUM IN
(SELECT FISCAL_MONTH_NUM FROM EIMABS.DIM_FISCAL_CAL_MONTH
WHERE DIM_CALENDAR_MONTH_ID IN (
(SELECT B.DIM_CALENDAR_MONTH_ID -1 FROM EIMABS.DIM_FISCAL_CAL_DATE A , EIMABS.DIM_FISCAL_CAL_MONTH B
WHERE CALENDAR_DT = TRUNC(SYSDATE) AND A.DIM_CALENDAR_MONTH_ID = B.DIM_CALENDAR_MONTH_ID),
(SELECT B.DIM_CALENDAR_MONTH_ID -2 FROM EIMABS.DIM_FISCAL_CAL_DATE A , EIMABS.DIM_FISCAL_CAL_MONTH B
WHERE CALENDAR_DT = TRUNC(SYSDATE) AND A.DIM_CALENDAR_MONTH_ID = B.DIM_CALENDAR_MONTH_ID)))
----- Call Stack Trace -----
kdsgrp1  kdsgrp  qetlbr  qertbFetchByRowID  qerjotRowProc  qerbtFetch  wProc  qergiFetch
BH (0x47fd84178) file#: 17 rdba: 0x04608b75 (17/2132853) class: 1 ba: 0x47c16c000
set: 47 pool 3 bsz: 16384 bsi: 0 sflg: 1 pwc: 0,25
dbwrid: 0 obj: 1867889 objn: 1130693 tsn: 13 afn: 17 hint: f
addr: 0x47fd84178 obj: 1867889 cls: DATA bscn: 0xa19.622a621d
buffer tsn: 13 rdba: 0x04608b75 (17/2132853)
scn: 0x0a19.622a621d seq: 0x01 flg: 0x06 tail: 0x621d0601
frmt: 0x02 chkval: 0xdf67 type: 0x06=trans data
Block header dump:  0x04608b75
Object id on Block? Y
seg/obj: 0x1c8071  csc: 0xa19.622a6023  itc: 2  flg: E  typ: 1 - DATA
brn: 0  bdba: 0x4608b40 ver: 0x01 opc: 0
inc: 0  exflg: 0
Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x000f.040.0001267d  0x01067067.022e.1e  --U-   64  fsc 0x0000.622a621d
0x02   0x0029.026.0001be36  0x282115c8.0223.0a  C---    0  scn 0x0a19.62297970
bdba: 0x04608b75
data_block_dump,data header at 0x47c16c064
===============
tsiz: 0x3f98
hsiz: 0x116
pbl: 0x47c16c064
76543210
flag=--------
ntab=1
nrow=130
frre=64
BH (0x49fe84af8) file#: 29 rdba: 0x076e5058 (29/3035224) class: 1 ba: 0x49ee0c000
set: 44 pool 3 bsz: 16384 bsi: 0 sflg: 1 pwc: 0,25
dbwrid: 1 obj: 1893758 objn: 1893758 tsn: 17 afn: 29 hint: f
addr: 0x49fe84af8 obj: 1893758 cls: DATA bscn: 0xa19.62297970
buffer tsn: 17 rdba: 0x076e5058 (29/3035224)
scn: 0x0a19.62297970 seq: 0x01 flg: 0x06 tail: 0x79700601
frmt: 0x02 chkval: 0xe878 type: 0x06=trans data
Block header dump:  0x076e5058
Object id on Block? Y
seg/obj: 0x1ce57e  csc: 0xa19.6229787e  itc: 2  flg: E  typ: 2 - INDEX
brn: 0  bdba: 0x76e5054 ver: 0x01 opc: 0
inc: 0  exflg: 0
Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
0x02   0x0029.026.0001be36  0x282115de.0223.2a  --U-   14  fsc 0x0000.62297970  <--------------- same TX as in the table

之后MOS又有回复,认为设置隐式参数_row_cr=FALSE可以解决问题:

There are few bugs that could raise the ora-600[KDSGRP1]  in 11.2
1) based on the stack --> Bug:8771916 ORA-600 [KDSGRP1] WHEN DOING AN UPDATE
Abstract:            OERI [kdsgrp1] during CR read
Fixed-Releases:     C100
Tags:               OERI
Details:
ORA-600 [kdsgrp1] can occur intermittently when using
a query access path to a row in a table via an index
due to a consistent read problem.
Workaround:
Disabling rowCR (which is an optimization to reduce consistent-read
rollbacks during queries) by setting "_row_cr"=FALSE in the
initialization files in one workaround. However, this could cause
performance degradation of queries - the statistics "RowCR hits" /
"RowCR attempts" can help show if this workaround may be detrimental
to performance.
2) based on the fact it is hit on partitioned tables
Bug 8546356  ORA-8102/ORA-1499/OERI[kdsgrp1] Composite Partitioned Index corruption after rebuild ONLINE
Details:
When a COMPOSITE PARTITIONED index is rebuild ONLINE and concurrent DML on
the same table is run on separate RAC instances, index corruption might
happen where the index is missing some keys. This is only pertaining to
composite partitioned indexes.
Wrong results or the next errors can be produced by SQL statements:
ORA-8102 by a delete/update
ORA-1499 by "analyze table validate structure cascade"
ORA-600 [kdsgrp1]
ORA-600 [qertbFetchByRowID]
Workaround:
Do not run concurrent DML from any instance other than the one where index
online rebuild is happening or rebuild the index without the ONLINE clause.
To fix it: Rebuild the index.
Please set in init.ora this parameter _row_cr=FALSE and bounce database,
see after that if OERI[kdsgrp1] still reproduces.
There is one-off patch available for Bug 8771916 for 11.2.0.1 on LInux x86_64
and for Bug 8546356 the one-off patch is done for 11.2.0.1.2
There is also Bug:8951812 hit a lot in 11.2.0.1, one-off patch available
An index can become corrupt during index rebuild online producing
ORA-600 errors or other external like ORA-8102 by an update/delete,
ORA-1499 by "analyze table validate structure cascade".

这个case仍在继续,to be continued .....

沪ICP备14014813号

沪公网安备 31010802001379号