No way to disable datapump estimate?

昨天晚上去客户现场配合一个导入导出作业;这是一套Solaris 10上的10.2.0.1系统,导出采用expdp数据泵工具,需要导出的数据源是一张大小在120G左右的单表,该表存放了最近一年的数据,实际导出时只需要最近三个月的数据,所以使用了QUERY参数,并指定了并行度为2。
该导出作业之前未经测试过,语句也是临时写的,实际执行导出工作时发现在评估阶段(estimate phase)耗费了大约十多分钟的时间,estimate默认使用blocks模式,即通过计算对象段占用的数据库块来预估dumpfile的大小;此外还有statistics模式通过对象的统计信息来推算导出文件的大小:

The block method of estimate is the least accurate. We investigate how many blocks were allocated for the storage segments for the tables specified. If the table had been recently analyzed, the estimate=statistics would be the most accurate. So remember, it is important to generate statistics before calculating the estimate using estimate=statistics method. Additionally, ESTIMATE=BLOCKS method could be inaccurate when:

a) The table was created with a much bigger initial extent size than was needed for the actual table data

b) Many rows have been deleted from the table, or a very small percentage of each block is used.

# When statistics have been explicitly set via the DBMS_STATS package in order to force certain optimizer plans, instead of being directly collected from the data, ESTIMATE=STATISTICS may not report correct results.

但实际上这2中方式在使用QUERY的情况下都无法正确预估导出文件的大小:

[maclean@rh2 udump]$ expdp maclean/maclean directory=dump  tables=estimate_me query=estimate_me:'"where t1<2000"'  estimate=statistics

Export: Release 10.2.0.4.0 - 64bit Production on Tuesday, 02 November, 2010 21:28:26

Copyright (c) 2003, 2007, Oracle.  All rights reserved.

Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
FLASHBACK automatically enabled to preserve database integrity.
Starting "MACLEAN"."SYS_EXPORT_TABLE_01":  maclean/******** directory=dump tables=estimate_me query=estimate_me:"where t1<2000" estimate=statistics
Estimate in progress using STATISTICS method...
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
.  estimated "MACLEAN"."ESTIMATE_ME"                     104.7 MB
Total estimation using STATISTICS method: 104.7 MB
Processing object type TABLE_EXPORT/TABLE/TABLE
Processing object type TABLE_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
. . exported "MACLEAN"."ESTIMATE_ME"                     32.42 KB    1999 rows
Master table "MACLEAN"."SYS_EXPORT_TABLE_01" successfully loaded/unloaded
******************************************************************************
Dump file set for MACLEAN.SYS_EXPORT_TABLE_01 is:
  /s01/dump/expdat.dmp
Job "MACLEAN"."SYS_EXPORT_TABLE_01" successfully completed at 21:28:29

/* statistics模式使用统计来判断导出文件大小,比较容易控制 */

begin
 dbms_stats.set_table_stats(ownname => 'MACLEAN',tabname => 'ESTIMATE_ME',numrows => 999999999999999,numblks => 99999999999999999999);
  end;
/

PL/SQL procedure successfully completed.

Estimate in progress using STATISTICS method...
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
.  estimated "MACLEAN"."ESTIMATE_ME"                     10244 GB
Total estimation using STATISTICS method: 10244 GB

/* 再次导出时数据泵采用了"作假的"统计信息*/

[maclean@rh2 udump]$ expdp maclean/maclean directory=dump  tables=estimate_me query=estimate_me:'"where t1<2000"'  estimate=blocks

Export: Release 10.2.0.4.0 - 64bit Production on Tuesday, 02 November, 2010 21:31:58

Copyright (c) 2003, 2007, Oracle.  All rights reserved.

Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
FLASHBACK automatically enabled to preserve database integrity.
Starting "MACLEAN"."SYS_EXPORT_TABLE_01":  maclean/******** directory=dump tables=estimate_me query=estimate_me:"where t1<2000" estimate=blocks
Estimate in progress using BLOCKS method...
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
.  estimated "MACLEAN"."ESTIMATE_ME"                       192 MB
Total estimation using BLOCKS method: 192 MB
Processing object type TABLE_EXPORT/TABLE/TABLE
Processing object type TABLE_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
. . exported "MACLEAN"."ESTIMATE_ME"                     32.42 KB    1999 rows
Master table "MACLEAN"."SYS_EXPORT_TABLE_01" successfully loaded/unloaded
******************************************************************************
Dump file set for MACLEAN.SYS_EXPORT_TABLE_01 is:
  /s01/dump/expdat.dmp
Job "MACLEAN"."SYS_EXPORT_TABLE_01" successfully completed at 21:32:01

/* 可以看到以上BLOCKS method预估的dumpfile大小为192MB,而STATISTICS方式预估为104.7MB;
    然而实际的使用QUERY参数导出的1999行数据只占用32k的空间!
    estimate在这里摆了一个大乌龙
*/

查了一下metalink发现10.2.0.1上存在expdp在estimate评估阶段耗时很久的bug:

EXPDP Slow and Estimate Takes Long Time [ID 822283.1]
Applies to:
Oracle Server - Enterprise Edition - Version: 10.2.0.1 to 10.2.0.3
This problem can occur on any platform.
Symptoms
Data pump Export is very slow and long time is spent in Estimating the data size.
EXPDP is stuck for long at :

Starting "SYSTEM"."SYS_EXPORT_SCHEMA_01": system/******** parfile=dpexp.par
Estimate in progress using BLOCKS method...

AWR report taken when EXPDP is running slow show the below query took long time to complete .

SELECT /*+rule*/ SYS_XMLGEN(VALUE(KU$),
XMLFORMAT.createFormat2('TABLE_DATA_T', '7')), 0 ,KU$.BASE_OBJ.NAME
,KU$.BASE_OBJ.OWNER_NAME ,'TABLE' ,to_char(KU$.BYTES_ALLOC) ,
to_char(KU$.ET_PARALLEL) ,KU$.FGAC ,KU$.NONSCOPED_REF ,KU$.XMLSCHEMACOLS
,KU$.NAME , KU$.NAME ,'TABLE_DATA' ,KU$.PART_NAME ,KU$.SCHEMA_OBJ.OWNER_NAME
,KU$.TS_NAME , KU$.TRIGFLAG
,decode(KU$.SCHEMA_OBJ.TYPE_NUM,2,decode(bitand(KU$.PROPERTY,8192),8192,
'NESTED TABLE','T'),19,'PARTITION',20,'PARTITION','SUBPARTITION')
,to_char(KU$.UNLOAD_METHOD)
FROM SYS.KU$_TABLE_DATA_VIEW KU$
WHERE NOT BITAND(KU$.BASE_OBJ.FLAGS,128)!=0 AND NOT (BITAND
(KU$.BASE_OBJ.FLAGS,16)=16) AND KU$.BASE_OBJ.NAME IN
('PA_DRAFT_INVOICES_ALL') AND KU$.BASE_OBJ.OWNER_NAME IN ('PA') AND NOT
EXISTS (SELECT 1 FROM SYS.KU$NOEXP_TAB A WHERE A.OBJ_TYPE='TABLE' AND
A.NAME=KU$.BASE_OBJ.NAME AND A.SCHEMA=KU$.BASE_OBJ.OWNER_NAME) AND NOT EXISTS
(SELECT 1 FROM SYS.KU$NOEXP_TAB A WHERE A.OBJ_TYPE='SCHEMA' AND
A.NAME=KU$.BASE_OBJ.OWNER_NAME);

Cause
The issue here is with using RBO.
Data pump is not choosing Cost Based Optimizer.
This is because of the 'unpublished' bug 5929373 -DATA PUMP TAKES LONG TIME TO EXPORT DATA

This bug is closed as duplicate of another bug << 5464834 >>

Solution

   1. Upgrade to 10.2.0.4 . The bug is fixed in 10.2.0.4
      Or
   2. Apply the << patch 5464834 >> if available for your platform .

        * Login to metalink
        * Go to 'Patches and Updates'
        * Click on 'Simple Search'
        * Enter the patch number 5464834
        * Select your platform from the dropdown list .
        * Patch,if available , will be displayed .
        * Download and apply the patch .

该Bug可以通过升级版本解决,但远水解不了近火,最初想disable掉estimate功能;但查阅若干资料后发现目前似乎没有能够关掉estimate功能的办法,这个就是悲哀所在了!耐心等estimate阶段完成后,实际数据导出阶段倒也只花了十来分钟,就这一点看来datapump很有中国式的办事风格。

另外当指定的导出文件数小于指定的parallelism,且导出数据量较大时可能出现ORA-39095错误,会导致导出作业意外终止:

39095, 00000, "Dump file space has been exhausted: Unable to allocate %s bytes"
// *Cause:  The Export job ran out of dump file space before the job was
//          completed.
// *Action: Reattach to the job and add additional dump files to the job
//          restarting the job.

EXPDP generating ORA-39095 : " dump file space has been exhausted"
Applies to:
Oracle Server - Enterprise Edition - Version: 10.2.0.1 to 11.2.0.2 - Release: 10.2 to 11.2
Information in this document applies to any platform.
Symptoms
Checked for relevance on 10-22-2010

Errors while performing Datapump Export:

- when exporting a large number of tables then it completely stops
- the jobs may remain in the datapump export tables
- No trace generated at the time of the failure.
ERROR:
ORA - 39095 : " dump file space has been exhausted. Unable to allocate 4096 bytes".jobsystem.sys_export_full_02 stops due to fatal error.

Cause
Customer chose only limited number of files in "dumpfile" parameter with limited "filesize" , i.e.:

   full=Y
   directory=dpump_dir
   dumpfile=full_1.dmp,full_2.dmp,full_3.dmp,full_4.dmp
   filesize=3000000000

This means 4 dump files with filesize 3000000000 byte (2.79 G).

the dumpfile parameter is the issue here.

.
Solution
1- You can use the dynamic format (i.e dumpfile=full_%U.dmp) :

The 'wildcard' specification for the dump file can expand up to 99 files. If 99 files have been generated before the export has completed, it will again return the ORA-39095 error.

2- If this is yet not enough and more files are needed, a workaround would be to speficy a bigger 'filesize' parameter.

3- If this is inconvenient, another option is to use this syntax:

      dumpfile=fullexp%U.dmp, fullexp2_%U.dmp, fullexp3_%U.dmp

which can expand up to 3*99 files.

If encountering problems containing the dump in a single directory using this solution, you may prefer this syntax:

     dumpfile=dmpdir1:fullexp1_%U.dmp, dmpdir2:fullexp2_%U.dmp, dmpdir3:fullexp3_U.dmp

(assuming the 3 directory objects listed above had been already created first).

解决方法很简单就是去掉并行度,或者指定动态的导出文件名。
记以录之,回去补觉了!

  • The block method of estimate is the least accurate. We investigate how many blocks were allocated for the storage segments for the tables specified. If the table had been recently analyzed, the estimate=statistics would be the most accurate. So remember, it is important to generate statistics before calculating the estimate using estimate=statistics method. Additionally, ESTIMATE=BLOCKS method could be inaccurate when:a) The table was created with a much bigger initial extent size than was needed for the actual table data

    b) Many rows have been deleted from the table, or a very small percentage of each block is used.

  • When statistics have been explicitly set via the DBMS_STATS package in order to force certain optimizer plans, instead of being directly collected from the data, ESTIMATE=STATISTICS may not report correct results.

关注dbDao.com的新浪微博

扫码关注dbDao.com 微信公众号:

Comments

  1. 老熊的fans says:

    原来导入导出都能写的这么神奇~

Speak Your Mind

TEL/電話+86 13764045638
Email service@parnassusdata.com
QQ 47079569