【Maclean Liu技术分享】拨开Oracle优化器迷雾探究Histogram之秘

【Maclean Liu技术分享】拨开Oracle CBO 优化器迷雾, 探究Histogram直方图之秘,讲座文档正式版已上传
http://t.askmaclean.com/thread-2172-1-1.html

预计时长: 1.5个小时

适合参与成员: 对于性能调优和CBO优化器有兴趣的同学,或急于提升SQL调优技能的同学。

教学视频已上传 , 收看请猛击下面的地址:

http://www.tudou.com/programs/view/TGvP2pIyvwI/

【Maclean Liu技术分享】Histogram直方图技术演示脚本如下:

【Maclean Liu技术分享】Histogram直方图技术演示脚本.txt (9.93 KB, 下载次数: 65)

讲座材料presentation 当前正式版本下载:

【Maclean Liu技术分享】拨开Oracle CBO优化器迷雾,探究Histogram直方图之秘 20130429.pdf (1.11 MB, 下载次数: 3646)

 
 

 

【CBO Optimizer优化器】IX_SEL索引选择率

ix_sel – Index selectivity 索引选择率是Oracle中CBO 基于成本优化器重要的参考指标 ,反应了符合谓词条件 通过索引主导列访问表上数据行的比例。(ix_sel – Index selectivity. Fraction of the table rows accessed by the indexes leading column in order to meet the predicate supplied. (10053)。

 

注意仅仅leading column即索引的主导列用作计算ix_sel

举一个简单的计算ix_sel的例子:

SQL> create index ind_maclean on sh.sales( prod_id,CUST_ID,TIME_ID);

Index created.

SQL> exec dbms_stats.gather_table_stats(‘SH’,'SALES’,cascade=>true,method_opt=>’FOR ALL COLUMNS SIZE 1′);

PL/SQL procedure successfully completed.

 

SQL> oradebug setmypid
Statement processed.
SQL> oradebug event 10053 trace name context forever ,level 1;
Statement processed.
SQL> explain plan for select * from sh.sales where prod_id=13 and CUST_ID=987;

Explained.

 

 

 

 
BEGIN Single Table Cardinality Estimation
—————————————–
Column (#1): PROD_ID(NUMBER)
AvgLen: 4.00 NDV: 72 Nulls: 0 Density: 0.013889 Min: 13 Max: 148
Column (#2): CUST_ID(NUMBER)
AvgLen: 5.00 NDV: 5828 Nulls: 0 Density: 1.7159e-04 Min: 2 Max: 100989
Table: SALES Alias: SALES
Card: Original: 924076 Rounded: 2 Computed: 2.20 Non Adjusted: 2.20
—————————————–
END Single Table Cardinality Estimation
—————————————–
Access Path: TableScan
Cost: 414.20 Resp: 414.20 Degree: 0
Cost_io: 389.00 Cost_cpu: 215902675
Resp_io: 389.00 Resp_cpu: 215902675

www.askmaclean.com
Access Path: index (RangeScan)
Index: IND_MACLEAN
resc_io: 5.00 resc_cpu: 37017
ix_sel: 2.3831e-06 ix_sel_with_filters: 2.3831e-06
Cost: 5.00 Resp: 5.00 Degree: 1

 

 

 

ix_sel= 1/ (72*5828)=2.3831e-06

 

对于 Equality predicates 且变量可见(硬绑定或 绑定可窥视) IX_SEL=1 / (NDV1* NDV2*..)

 

同样变量可见情况下>、<开放范围 IX_SEL=(MAX- 代入的范围值) / (MAX-MIN)

 

 

 

而变量不可见(cursor_sharing=FORCE、_optim_peek_user_binds=false)的情况:

 

1、Equality predicates 等式谓词情况下,IX_SEL一般等于列的Density

 

2、 对于> <大于、小于的开放范围谓词 ix_sel一般恒等于0.009,  对于 (object_id>:i and object_id<:b;)的闭包则恒等于 0.0045

例如:

select count(*) from test where object_id>:i

Access Path: index (IndexOnly)
Index: TEST_IDX
resc_io: 3.00 resc_cpu: 160764
ix_sel: 0.009 ix_sel_with_filters: 0.009
Cost: 3.02 Resp: 3.02 Degree: 1
Best:: AccessPath: IndexRange Index: TEST_IDX
Cost: 3.02 Degree: 1 Resp: 3.02 Card: 3869.30 Bytes: 0

 
select count(*) from test where object_id>:i and object_id<:b

Access Path: index (IndexOnly)
Index: TEST_IDX
resc_io: 2.00 resc_cpu: 84043
ix_sel: 0.0045 ix_sel_with_filters: 0.0045
Cost: 2.01 Resp: 2.01 Degree: 1
Best:: AccessPath: IndexRange Index: TEST_IDX
Cost: 2.01 Degree: 1 Resp: 2.01 Card: 193.47 Bytes: 0

ix_sel的 0.009和0.0045 都是写死在代码里的常数值,具体可以参考下表:

 

/* defaults */
 
/* Default selectivities are set low to
 
keep cost values low for future resource limiter use
keep cost values low for permutation cutoff in kko
Defaults are used for bind variables, general expressions and
 
unanalyzed tables, except for equality where defaults are not
 
needed for bind variables.
 
*/
 
#define KKEDSREL 0.05 /* default selectivity for < <= > >= */
#define KKEDSEQ 0.01 /* default selectivity for = */
#define KKEDSNE 0.05 /* default selectivity for != */
#define KKEDSDF 0.05 /* default selectivity for all other ops */
#define KKEDSIRL 0.009 /* default selectivity for relation on indexed col */
#define KKEDSBRL 0.009 /* def sel for relation with bind var on index col*/
#define KKEDSIEQ 0.004 /* default selectivity for = on indexed col */
#define KKEDMBR 8 /* default multiblock read factor */
#define KKEDMBW 8 /* default multiblock write factor */
#define KKEDFNR 100.0 /* default – fixed table cardinality */
#define KKEDFRL 20 /* default – fixed table row length */
#define KKEDDNR 2000.0 /* default – remote table cardinality */
#define KKEDDRL 100 /* default – remote table avg row length */
#define KKEDDNB 100 /* default – default # of blocks */
#define KKEDDSC 13.0 /* default – default scan cost */
#define KKEDILV 1 /* default – default index levels */
#define KKEDILB 25 /* default – number of index leaf blocks */
#define KKEDLBK 1 /* default – number leaf blocks/key */
#define KKEDDBK 1 /* default – number of data blocks/key */
#define KKEDKEY 100 /* default – number of distinct keys */
#define KKEDCLF (KKEDDNB*8) /* default – clustering factor */
#define KKECRI 1.5 /* remote table access cost increase factor */
#define KKECFSC 1.0 /* fixed table scan cost */
#define KKECFNB 0 /* fixed table number of blocks */
#define KKECMXB 15 /* maximum byte length for normalization */
#define KKECBBS 256.0 /* base for byte sequence normalization */
#define KKECSPC ‘ ‘ /* space byte value */
#define KKECSPD 86400.0 /* seconds per day */
#define KKESROH 10.0 /* sort per row overhead in bytes */
#define KKESAUT 0.75 /* sort area utilization */
#define KKESROP 0.10 /* sort row overhead percent */
#define KKESRML 2.0 /* sort run multiple */
#define KKESTP 0×01 /* single table predicate */
#define KKETEQ 0×02 /* equi join */
#define KKETBCPJ 0×04 /* Cartesian product join */
#define KKESOK 0×08 /* input swap ok */
#define KKESWP 0×10 /* inputs swapped */
#define KKEEQP 0×20 /* equipartitioned */
#define KKELKNWC 0×01 /* LIKE no wild card */
#define KKELKTWC 0×02 /* LIKE trailing wild card */
#define KKELKEWC 0×04 /* LIKE embedded wild card */
#define KKELKLWC 0×08 /* LIKE leading wild card */
#define KKELKOWC 0×10 /* LIKE only wild card */

 

 

 

【Maclean技术分享】开Oracle调优鹰眼,深入理解AWR性能报告 第二讲

【技术分享】开Oracle调优鹰眼,深入理解AWR性能报告 第二讲

涉及性能优化教学知识:Host CPU、Instance CPU、Wait Class、SQL Statistics、AWR FOR RAC集群特定调优

适合的学员: 对性能优化有兴趣,或给予提升自己Oracle调优技能的同学

预计时长: 2个小时左右

本次公开教学的视频观看:

 

 

 

正式版文档材料已上传:

【Maclean Liu技术分享】开Oracle调优鹰眼,深入理解AWR性能报告 第二讲 正式版 20130.pdf (2.27 MB, 下载次数: 861)

GATHER_STATS_JOB: Stopped by Scheduler. Consider increasing the maintenance window duration if this happens frequently.

《Does GATHER_STATS_JOB gather all objects’ stats every time?》一文中 , 我们详细介绍了GATHER_STATS_JOB的一些特点,例如数据库一直打开的情况下,GATHER_STATS_JOB会伴随维护窗口一起被启动,默认情况下如果到维护窗口关闭该JOB仍未结束则将被终止(这取决于该JOB的属性stop_on_window_close),剩下的有待收集信息的对象将在下一个维护窗口中得到处理。

 

在版本10gR2中对于那些需要收集而在维护窗口中没来及收集的对象,会由执行收集作业的J00x作业子进程写出哪些没能来得及收集对象的名单,例如:

 

GATHER_STATS_JOB: Stopped by Scheduler.
Consider increasing the maintenance window duration if this happens frequently.
The following objects/segments were not analyzed due to timeout:
TABLE: “MACLEAN”.”HANA”.”ORACLE”

……………………

error 1013 in job queue process
ORA-01013: user requested cancel of current operation

 

 

对于统计信息收集在版本12c中得到进一步的加强的dbms_stats.REPORT_GATHER_AUTO_STATS可以告知用户其将自动收集哪些对象。

【11g新特性】Cardinality Feedback基数反馈

Cardinality Feedback基数反馈是版本11.2中引入的关于SQL 性能优化的新特性,该特性主要针对 统计信息陈旧、无直方图或虽然有直方图但仍基数计算不准确的情况, Cardinality基数的计算直接影响到后续的JOIN COST等重要的成本计算评估,造成CBO选择不当的执行计划。以上是Cardinality Feedback特性引入的初衷。

 

Cardinality Feedback2

Cardinality Feedback1

 

但是每一个Oracle新版本引入的新特性 都被一些老外DBA称之为buggy ,Cardinality Feedback基数反馈多少也造成了一些麻烦,典型的情况是测试语句性能时,第一次的性能最好,之后再运行其性能变差。

 

我们来看一下 Cardinality Feedback基数反馈是如何作用的:
注意使用普通用户来测试Cardinality Feedback,sys用户被默认禁用该特性

 

 

Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> conn maclean/oracle
已连接。

SQL> show parameter dynamic

NAME                                 TYPE                   VALUE
------------------------------------ ---------------------- ------------------------------
optimizer_dynamic_sampling           integer                0

SQL> create table test as select * from dba_tables;

表已创建。

SQL> select /*+ gather_plan_statistics */ count(*) from test;

  COUNT(*)
----------
      2873

SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------
SQL_ID  0p4u1wqwg6t9z, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from test

Plan hash value: 1950795681

-------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |00:00:00.01 |     104 |
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.01 |     104 |
|   2 |   TABLE ACCESS FULL| TEST |      1 |   8904 |   2873 |00:00:00.01 |     104 |
-------------------------------------------------------------------------------------

已选择14行。

SQL> select /*+ gather_plan_statistics */ count(*) from test;

  COUNT(*)
----------
      2873

SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------
SQL_ID  0p4u1wqwg6t9z, child number 1
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from test

Plan hash value: 1950795681

-------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |00:00:00.01 |     104 |
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.01 |     104 |
|   2 |   TABLE ACCESS FULL| TEST |      1 |   2873 |   2873 |00:00:00.01 |     104 |
-------------------------------------------------------------------------------------

Note
-----
   - cardinality feedback used for this statement

已选择18行。

 

 

上例中第一次运行时,由于未收集表上的统计信息且optimizer_dynamic_sampling=0 关闭了动态采样所以基数评估值(1)和实际值(2873)有着较大的差距。

 

 

cardinality feedback used for this statement这个信息说明第二次执行时使用了Cardinality Feedback基数反馈,且其基数评估也十分精确了,这是因为第二次执行时考虑到第一次执行时的基数反馈,我们来看看Oracle到底是如何做到的:

 

 

 

SQL> alter system flush shared_pool;

系统已更改。

SQL>
SQL> alter session set events '10053 trace name context forever, level 1';

会话已更改。

SQL>  select /*+ gather_plan_statistics */ count(*) from test;

  COUNT(*)
----------
      2873

SQL>  select /*+ gather_plan_statistics */ count(*) from test;

  COUNT(*)
----------
      2873

10053 trace:

第一次执行:

sql= select /*+ gather_plan_statistics */ count(*) from test
----- Explain Plan Dump -----
----- Plan Table -----

============
Plan Table
============
--------------------------------------+-----------------------------------+
| Id  | Operation           | Name    | Rows  | Bytes | Cost  | Time      |
--------------------------------------+-----------------------------------+
| 0   | SELECT STATEMENT    |         |       |       |    31 |           |
| 1   |  SORT AGGREGATE     |         |     1 |       |       |           |
| 2   |   TABLE ACCESS FULL | TEST    |  8904 |       |    31 |  00:00:01 |
--------------------------------------+-----------------------------------+

SELECT /*+ OPT_ESTIMATE (TABLE "TEST" ROWS=2873.000000 ) */ COUNT(*) "COUNT(*)" FROM "MACLEAN"."TEST" "TEST"

SINGLE TABLE ACCESS PATH 
  Single Table Cardinality Estimation for TEST[TEST] 
  Table: TEST  Alias: TEST
    Card: Original: 8904.000000    >> Single Tab Card adjusted from:8904.000000 to:2873.000000
  Rounded: 2873  Computed: 2873.00  Non Adjusted: 8904.00
  Access Path: TableScan
    Cost:  31.10  Resp: 31.10  Degree: 0
      Cost_io: 31.00  Cost_cpu: 1991217
      Resp_io: 31.00  Resp_cpu: 1991217
  Best:: AccessPath: TableScan
         Cost: 31.10  Degree: 1  Resp: 31.10  Card: 2873.00  Bytes: 0

sql= select /*+ gather_plan_statistics */ count(*) from test
----- Explain Plan Dump -----
----- Plan Table -----

============
Plan Table
============
--------------------------------------+-----------------------------------+
| Id  | Operation           | Name    | Rows  | Bytes | Cost  | Time      |
--------------------------------------+-----------------------------------+
| 0   | SELECT STATEMENT    |         |       |       |    31 |           |
| 1   |  SORT AGGREGATE     |         |     1 |       |       |           |
| 2   |   TABLE ACCESS FULL | TEST    |  2873 |       |    31 |  00:00:01 |
--------------------------------------+-----------------------------------+

 

 

 

 

可以看到第二次执行时SQL最终转换加入了 OPT_ESTIMATE (TABLE “TEST” ROWS=2873.000000 )的HINT ,OPT_ESTIMATE HINT一般由 kestsaFinalRound()内核函数生成。该HINT用以纠正各种类型的优化器评估,例如某表上的基数或某个列的最大、最小值。反应出优化的不足或者BUG。

 

可以通过V$SQL_SHARED_CURSOR和来找出现有系统shared pool中仍存在的 使用了Cardinality Feedback基数反馈的子游标:

 

 

SQL> select sql_ID,USE_FEEDBACK_STATS  FROM V$SQL_SHARED_CURSOR where USE_FEEDBACK_STATS ='Y';

SQL_ID                     US
-------------------------- --
159sjt1f6khp2              Y

 

 

 

还可以使用cardinality HINT来强制使用Cardinality Feedback 。

select /*+ cardinality(test,  1) */ count(*) from test;

 

 

如何禁用Cardinality Feedback基数反馈

 

对于这些”惹火”特性,为了stable,往往考虑关闭该特性。

可以通过多种方法禁用该特性

1. 使用 _optimizer_use_feedback 隐藏参数

session 级别

SQL> alter session set “_optimizer_use_feedback”=false;

会话已更改。

system级别

SQL> alter system set “_optimizer_use_feedback”=false;

系统已更改。

 

2. 使用opt_param(‘_optimizer_use_feedback’ ‘false’) HINT

例如:

select /*+ opt_param(‘_optimizer_use_feedback’ ‘false’) cardinality(test,1) */ count(*) from test;

 

 

 

【Maclean Liu技术分享】拨开Oracle优化器迷雾探究Histogram之秘

【Maclean Liu技术分享】拨开Oracle CBO 优化器迷雾, 探究Histogram直方图之秘,讲座文档正式版已上传 
http://t.askmaclean.com/thread-2172-1-1.html  

 

 

预计时长: 1.5个小时

适合参与成员: 对于性能调优和CBO优化器有兴趣的同学,或急于提升SQL调优技能的同学。

讲座材料presentation 当前正式版本下载:

 

【Maclean Liu技术分享】拨开Oracle CBO优化器迷雾,探究Histogram直方图之秘_0321.pdf.pdf(1.1 MB, 下载次数: 749) 

CBO Cost Formulas基于成本优化器的成本计算公式大全

CBO Cost Formulas成本计算公式大全:

成本模型:连接方式Join method

注意 连接基数(Join Cardinality)不受到连接方式(join method) 的影响, oracle中主要的三种join method HASH JOIN、Nested Loops、Sort Merge:

  • Nested Loops嵌套循环成本公式:
    • Cost(outer)+Cost(inner))*cardinality(outer)
  • Sort merge 合并排序连接成本公式:
    • Cost(outer) + Cost(inner) + Sort(outer) + Sort(inner)
  • Hash Join 哈希连接公式:
    • Cost(outer) + Cost(inner) + Build(outer) + Probe(inner)

 

Index Unique Scan Cost成本计算
INDEX UNIQUE SCAN COST = (BLEVEL (1-(OIC/100)) + 1) * (OICA/100)

 

Index Range Scan Cost成本计算
INDEX RANGE SCAN COST = (BLEVEL + FF*LFBL)*(1-(OIC/100))+ FF*CLUF)* (OICA/100)

 

 

formula does not include the CPU cost

  • BLEVEL = number of branch levels in index
  • add +1 for leaf block
  • FF = filtering factor – selectivity
  • LFBL = number of leaf blocks
  • CLUF = index clustering factor
  • OIC = optimizer_index_caching(default 0)
  • OICA = optimizer_index_cost_adj parameter(default=100)

 

CPU costing启用的情况下:

mreadtime -Average time , in milliseconds, for a multi-block read (according to sys.aux_stats$)

sreadtime – Average time , in milliseconds, for a single-block read (according to sys.aux_stats$)

MBRC – Average number of blocks to be read in a multi-block read (according to sys.aux_stats$

#SRDs – number of single block reads

#MRDs – number of multi block reads

#CPUCycles – number of CPU Cycles

sreadtime = ioseektim + db_block_size/iotfrspeed

mreadtim = ioseektim + db_file_multiblock_read_count * db_block_size / iotrfspeed

#MRds = #Blks/MBRC

Cost 成本本身 =(#SRds * sreadtim +#MRds * mreadtim + #CPUCycles/cpuspeed)/sreadtim ,  

Cost成本的单位 为 single-block read time=sreadtim

关于10053 trace中的UNCOMPBKTS和ENDPTVALS

关于10053 trace中的Histogram部分的UNCOMPBKTS和ENDPTVALS

当Histogram直方图类型为frequency histograms( Histogram: Freq)时UncompBkts 等于统计信息中采样的总行数-NULLS(Card: Original- NULLS,因为dbms_stats默认是auto_sample_size采样,所以这栏其实是采样到的原始Card-NULLS), 而EndPtVals 等于bucket总数,或者说NDV,因为frequency histograms中 NDV=number of buckets 

当直方图类型为height balanced histograms (Histogram: HtBal) UncompBkts  等于bucket的数目(其实也等于10053 trace中#Bkts的数目),而EndPtVals 等于已经被压缩的Histogram的大小,其实是等于: select count(*) from dba_tab_histograms where table_name=’MACLEANLIU’ and column_name=’MACLEANLIU’的实际总和。  通过这2个值对比,可以了解到popular值的多少以及数据的倾斜度, 是有多个大量重复的值(popular value)还是仅有一个巨大的重复值。

 

仍不明确少数概念的话,来看看下面这个图:

Histogram representation