有哪些功能是DBMS_STATS无法实现而Analyze命令可以做到的?

今天有同行问我DBMS_STATS与Analyze的区别, 我简单地介绍了一下DBMS_STATS提供了更丰富的功能,从8i开始推荐使用DBMS_STATS来完全替代Analayz Table/Index/Cluster 命令收集数据库中对象的统计信息。

 

之后他又问起我有那些功能是Analyze 命令可以做到,反而DBMS_STATS做不到的?

 

这个问题我倒是没有很系统地去考虑过, 闪过脑子的2个念头就是Analyze Table/Index validate structure 和  Analyze Table list chained rows into chained_rows 这2个命令。

 

Google了一下相关的信息,发现Asktom对这个已经讨论过了,从8i以后analyze被提名为具备”validate”验证功能的命令,主要负责验证表和索引的结构以及链式行(chained and migrated rows)信息; 而DBMS_STATS包负责统计信息的管理。

 

analyze is slated to do this and this alone at some time in the future.  analyze is destined to be
the “validate” thing — analyze validate structure, analyze list chained rows, etc.

dbms_stats will stick with stats and stats only.

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:735625536552

 

因为Oracle Development 开发部门已经给予 Analyze 命令较为精确地功能定位了,  所以其最主要的2个功能validate structure和list chained rows目前和将来都不会被植入到DBMS_STATS包当中。

 

Analyze validate structure 的主要语法包括:

 

@?/rdbms/admin/utlvalid.sql                  /* 创建存放验证信息的表 */
desc invalid_rows

ANALYZE INDEX <index_name>
[PARTITION <partition_name>]
[SUBPARTITION <subpartition_name>]
VALIDATE STRUCTURE CASCADE
INTO <table_name> <OFFLINE | ONLINE>

select * from index_stats;

ANALYZE TABLE <table_name>
[PARTITION <partition_name>]
[SUBPARTITION <subpartition_name>]
VALIDATE STRUCTURE CASCADE
[INTO <table_name>] <OFFLINE | ONLINE>

ANALYZE CLUSTER <cluster_name> VALIDATE STRUCTURE CASCADE
INTO <table_name> <OFFLINE | ONLINE>

 

 

补充:analyze index validate structure会填充index_stats视图,该视图包含了丰富的索引空间信息:

 

SQL> desc index_stats;

 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 HEIGHT                                             NUMBER
 BLOCKS                                             NUMBER
 NAME                                               VARCHAR2(30)
 PARTITION_NAME                                     VARCHAR2(30)
 LF_ROWS                                            NUMBER
 LF_BLKS                                            NUMBER
 LF_ROWS_LEN                                        NUMBER
 LF_BLK_LEN                                         NUMBER
 BR_ROWS                                            NUMBER
 BR_BLKS                                            NUMBER
 BR_ROWS_LEN                                        NUMBER
 BR_BLK_LEN                                         NUMBER
 DEL_LF_ROWS                                        NUMBER
 DEL_LF_ROWS_LEN                                    NUMBER
 DISTINCT_KEYS                                      NUMBER
 MOST_REPEATED_KEY                                  NUMBER
 BTREE_SPACE                                        NUMBER
 USED_SPACE                                         NUMBER
 PCT_USED                                           NUMBER
 ROWS_PER_KEY                                       NUMBER
 BLKS_GETS_PER_ACCESS                               NUMBER
 PRE_ROWS                                           NUMBER
 PRE_ROWS_LEN                                       NUMBER
 OPT_CMPR_COUNT                                     NUMBER
 OPT_CMPR_PCTSAVE                                   NUMBER

 

但是请注意validate structure会锁住table/index/cluster等对象,对于在线业务可以考虑使用validate structure online在线验证方法,但是validate strucutre online也有它的缺点,那就是在线模式下结构验证命令将不填充索引的状态信息到index_stats视图。

 

Analyze list chained rows的主要语法包括:

 

@?/rdbms/admin/utlchain.sql

desc chained_rows

ANALYZE TABLE <table_name> LIST CHAINED ROWS
INTO <table_name>;

ANALYZE CLUSTER <cluster_name> LIST CHAINED ROWS INTO <table_name>;

 

注意因为DBMS_STATS仅仅收集对CBO有用的统计信息,所以一些空间使用情况信息和使用FreeList管理的信息都不会被收集,这些信息包括:

 

If statistics unrelated to the cost based optimizer are required, then these
must still be collected using the the ANALYZE command. These statistics include:

Space Usage information :

EMPTY_BLOCKS,
AVG_SPACE,
CHAIN_CNT

Information on freelist blocks

AVG_SPACE_FREELIST_BLOCKS,
NUM_FREELIST_BLOCKS

 

因为以上信息对于CBO计算成本并没有帮助,所以DBMS_STATS也就无意也无法收集它们,但是Analyze命令还是可以做到收集以上这些信息。

 

此外因为CBO其实并不会参考Cluster类型对象的统计信息来计算Cost成本,而是使用cluster中各个表的统计信息(DBMS_STATS does not gather cluster statistics, but you can use DBMS_STATS to gather statistics on the individual tables instead of the whole cluster. )
所以DBMS_STATS也不支持收集Cluster的统计信息。

 

这么总结一下,发现Analyze命令的独有功能还真不少,因为Oracle公司已经明确了Analyze作为”validate”验证命令的功能定位,且很多内部的工具和脚本仍在使用Analyze Table/Cluster/Index的特有功能,所以可以预期Analyze命令在未来的一段时间内也不会被废弃。

 

总结以上DBMS_STATS无法实现,而传统的Analyze命令可以做到的功能:

  1. Analyze validate structure 验证表、簇、索引的结构的完整性,使用cascade选项可以交叉验证表和索引的数据完整,online选项可以做到在线验证
  2. Analyze list chained rows 收集表、簇上的Migrated and Chained Rows链式或迁移行信息
  3. Analyze table compute statistics 收集表上的 empty_blocks、avg_space 空间使用信息
  4. Analyze Cluster 收集簇的信息,其实cluster上唯一可统计的信息是DBA_CLUSTERS.AVG_BLOCKS_PER_KEY(Number of blocks in the table divided by number of cluster keys) , 所以收集cluster的statistics意义不大

 

附Analyze命令的详细语法如下:

 

Syntax:

   ANALYZE TABLE tablename COMPUTE|ESTIMATE|DELETE STATISTICS ptnOption options
   ANALYZE INDEX indexname COMPUTE|ESTIMATE|DELETE STATISTICS ptnOption options
   ANALYZE CLUSTER clustername COMPUTE|ESTIMATE|DELETE STATISTICS options

ptnOption
   PARTITION (partion)
   SUBPARTITION (subpartition)

options
   VALIDATE STRUCTURE [CASCADE] [INTO tablename]
   LIST CHAINED ROWS [INTO tablename]
   COMPUTE|ESTIMATE STATISTICS FOR TABLE
   COMPUTE|ESTIMATE STATISTICS FOR ALL COLUMNS
   COMPUTE|ESTIMATE STATISTICS FOR ALL INDEXED COLUMNS
   COMPUTE|ESTIMATE STATISTICS FOR COLUMNS [SIZE int] column [SIZE int]

   When Estimating statistics you can optionally
    specify
    ... ESTIMATE STATISTICS SAMPLE n ROWS
    ... ESTIMATE STATISTICS SAMPLE n PERCENT
  1. How do you collect statistics about schema objects?

    There are two ways of gathering data related to schema objects
    Using the ANALYZE statement

    The ANALYZE statement

    You use the ANALYZE statement to do the following:
    Collect or delete statistics about an index or index partition, table or table partition, index-organized table, cluster, or scalar object attribute.
    Validate the structure of an index or index partition, table or table partition, index-organized table, cluster, or object reference (REF).
    Identify migrated and chained rows of a table or cluster.

    Using the DBMS_STATS package

    DBMS_STATS

    DBMS_STATS provides a mechanism to view and modify optimizer statistics gathered for database objects. The statistics can reside in the dictionary or in a table created in the user’s schema for this purpose. Only statistics stored in the dictionary have an impact on the cost-based optimizer

    When do you use the ANALYZE statement?

    You use the ANALYZE statement to sample a number (rather than a percentage) of rows and to collect statistics that are not related to the cost-based optimizer, such as:
    Using the VALIDATE or LIST CHAINED ROWS clauses
    Collecting information on freelist blocks

    When do you use the DBMS_STATS package to gather statistics?

    You use the DBMS_STATS package to enable the cost-based optimizer to use statistics. This is because the optimizer uses statistics gathered by the DBMS_STATS package. In addition, you can use the DBMS_STATS package to collect statistics in parallel, collect global statistics for partitioned objects, and fine-tune your statistics collection in other ways.