流言终结者: AWR的保留天数和SYSAUX表空间的使用率有关吗?

今天在QQ群的技术讨论中有人提及AWR实际保留的天数并非10g的 7天 或 11g 的 8天 ,而是视乎SYSAUX表空间的使用率而定,当SYSAUX表空间空闲空间较多时会将AWR数据保留地更久。

虽然不知道以上这番理论出自那部书籍,但是至少是说的有模有样的,而且网友还告诉我这是他测试过的结果。

实际是这样吗?

我相信这位网友并没有吹牛,他很可能查询dba_hist_snapshot等AWR视图且看到了的确有7天之前的快照仍被保留着,而没有被清理掉。我们来重演他所看到的现场:

测试使用版本11.2.0.2 , 11g中默认AWR保留8天:

 

SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
PL/SQL Release 11.2.0.2.0 - Production
CORE    11.2.0.2.0      Production
TNS for Linux: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production
SQL> select * from global_name;
GLOBAL_NAME
----------------------------------------------------------
www.askmaclean.com & www.askmaclean.com
SQL> col SNAP_INTERVAL for a20
SQL> col RETENTION for a20
SQL> select snap_interval,retention from dba_hist_wr_control;
SNAP_INTERVAL        RETENTION
-------------------- --------------------
+00000 01:00:00.0    +00008 00:00:00.0
以上确认了默认的快照间隔为1小时 ,且保留时间为8天
检查当前SYSAUX表空间的使用率
REM tablespace report
set linesize 200
select a.tablespace_name,
round(a.bytes_alloc / 1024 / 1024) megs_alloc,
round(nvl(b.bytes_free, 0) / 1024 / 1024) megs_free,
round((a.bytes_alloc - nvl(b.bytes_free, 0)) / 1024 / 1024) megs_used,
round((nvl(b.bytes_free, 0) / a.bytes_alloc) * 100) Pct_Free,
100 - round((nvl(b.bytes_free, 0) / a.bytes_alloc) * 100) Pct_used,
round(maxbytes / 1048576) Max
from (select f.tablespace_name,
sum(f.bytes) bytes_alloc,
sum(decode(f.autoextensible, 'YES', f.maxbytes, 'NO', f.bytes)) maxbytes
from dba_data_files f
group by tablespace_name) a,
(select f.tablespace_name, sum(f.bytes) bytes_free
from dba_free_space f
group by tablespace_name) b
where a.tablespace_name = b.tablespace_name(+)
union all
select h.tablespace_name,
round(sum(h.bytes_free + h.bytes_used) / 1048576) megs_alloc,
round(sum((h.bytes_free + h.bytes_used) - nvl(p.bytes_used, 0)) /
1048576) megs_free,
round(sum(nvl(p.bytes_used, 0)) / 1048576) megs_used,
round((sum((h.bytes_free + h.bytes_used) - nvl(p.bytes_used, 0)) /
sum(h.bytes_used + h.bytes_free)) * 100) Pct_Free,
100 -
round((sum((h.bytes_free + h.bytes_used) - nvl(p.bytes_used, 0)) /
sum(h.bytes_used + h.bytes_free)) * 100) pct_used,
round(sum(f.maxbytes) / 1048576) max
from sys.v_$TEMP_SPACE_HEADER h,
sys.v_$Temp_extent_pool  p,
dba_temp_files           f
where p.file_id(+) = h.file_id
and p.tablespace_name(+) = h.tablespace_name
and f.file_id = h.file_id
and f.tablespace_name = h.tablespace_name
group by h.tablespace_name
ORDER BY 1
/
TABLESPACE_NAME                MEGS_ALLOC  MEGS_FREE  MEGS_USED   PCT_FREE   PCT_USED        MAX
------------------------------ ---------- ---------- ---------- ---------- ---------- ----------
MGMT_AD4J_TS                          200        199          1         99          1      32768
MGMT_ECM_DEPOT_TS                      40         13         27         32         68      32768
MGMT_TABLESPACE                      1350         86       1265          6         94      32768
SYSAUX                                600        295        305         49         51      32768
SYSTEM                                700        231        469         33         67      32768
TEMP                                   21         21          0        100          0      32768
UNDOTBS1                              495        358        137         72         28      32768
USERS                                1950       1243        707         64         36      32768
SYSAUX表空间剩余295MB空间,空闲率较高

 

因为这套数据库在2011-10-17之后就一直没有打开过,所以Automatic Workload Repository中最早保留的快照信息是在2011-10-10,通过查询dba_hist_snapshot视图可以反映这一点:

 

select snap_id,
to_char(begin_interval_time, 'YYYY-MM-DD'),
to_char(end_interval_time, 'YYYY-MM-DD')
from dba_hist_snapshot
order by snap_id;
SNAP_ID TO_CHAR(BE TO_CHAR(EN
---------- ---------- ----------
96 2011-10-10 2011-10-10
97 2011-10-10 2011-10-10
98 2011-10-10 2011-10-10
99 2011-10-10 2011-10-10
100 2011-10-10 2011-10-10
101 2011-10-10 2011-10-11
102 2011-10-11 2011-10-11
103 2011-10-11 2011-10-11
...................
221 2011-10-17 2011-10-17
222 2011-10-24 2011-10-24
SQL> select sysdate from dual;
SYSDATE
---------
24-OCT-11

 

当前的日期是24-OCT-11,而最早的快照信息是在2011-10-10,这样就达成了网友所说的AWR的保留时间并非7或8天,”awr保留天数根据sysaux大小决定。” 或 “默认7天,sysaux足够大这个7天没有意义” 的说法。

 

事实是这样吗?

不是的!

那么为什么能看到早于7天的快照呢?

回答: 不要被所看到的信息所蒙蔽,虽然我们常说事实胜于雄辩或实践是检验真知的唯一 , 但事情的表象往往会欺骗我们。

以上这个问题的关键点并非在于是否能看到早于7天的snapshot快照信息,而在于当MMON后台进程(该进程负责收集和清理AWR数据)在执行对过期快照清理工作时是否会清除7 或 8 天之前的snapshot,以及MMON后台进程多久才Purge一次AWR Snapshot。

以上这些问题 , 我们可以通过_swrf_test_action参数和10046 trace搞清楚:

 

SQL> SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ
2   FROM SYS.x$ksppi x, SYS.x$ksppcv y
3   WHERE x.inst_id = USERENV ('Instance')
4   AND y.inst_id = USERENV ('Instance')
5   AND x.indx = y.indx
6  AND x.ksppinm LIKE '%_swrf%';
NAME                 VALUE      DESCRIB
-------------------- ---------- --------------------------------------------------
_swrf_test_action    0          test action parameter for SWRF
_swrf_mmon_flush     TRUE       Enable/disable SWRF MMON FLushing
_swrf_mmon_metrics   TRUE       Enable/disable SWRF MMON Metrics Collection
_swrf_metric_frequen FALSE      Enable/disable SWRF Metric Frequent Mode Collectio
t_mode                          n
_swrf_on_disk_enable TRUE       Parameter to enable/disable SWRF
d
_swrf_mmon_dbfus     TRUE       Enable/disable SWRF MMON DB Feature Usage
_swrf_test_dbfus     FALSE      Enable/disable DB Feature Usage Testing

 

_swrf_test_action 隐藏参数用以调试MMON的行为,设置该参数并10046事件:

 

SQL> alter session set "_swrf_test_action" = 28; 
Session altered.
SQL> alter session set "_swrf_test_action" = 10; 
Session altered.
[oracle@vrh4 ContentsXML]$ ps -ef|grep mmon
oracle    2872     1  0 18:28 ?        00:00:00 ora_mmon_SBDB
oracle    3446  3289  0 18:44 pts/1    00:00:00 tail -f SBDB_mmon_2872.trc
oracle    3997  3407  0 19:17 pts/2    00:00:00 grep mmon
SQL> oradebug setospid 2872;
Oracle pid: 15, Unix process pid: 2872, image: oracle@vrh4.oracle.com (MMON)
SQL> oradebug event 10046 trace name context forever,level 12;
Statement processed.

 

完成以上操作后等待一段时间,MMON进程的trace文件会陆续写出一些信息,如:

 

*** 2011-10-24 18:45:24.795
KEWRAFM: Beginning one MMON Auto-Flush cycle ...
Finished one MMON Auto-Flush cycle.
*** 2011-10-24 18:46:24.874
KEWRAFM: Beginning one MMON Auto-Flush cycle ...
Finished one MMON Auto-Flush cycle.
*** 2011-10-24 18:47:24.952
KEWRAFM: Beginning one MMON Auto-Flush cycle ...
Finished one MMON Auto-Flush cycle.
*** 2011-10-24 18:48:25.053
KEWRAFM: Beginning one MMON Auto-Flush cycle ...
Finished one MMON Auto-Flush cycle.

说明MMON每分钟都会自动刷新一定的数据到磁盘上。

 

此外还可以看到MMON清理过期快照的信息:

 

*** 2011-10-24 18:58:25.290
KEWRAFM: Beginning one MMON Auto-Flush cycle ...
Finished one MMON Auto-Flush cycle.
KEWRAPM: Beginning one MMON Auto-Purge cycle ...
KEWRAPM: Finished one MMON Auto-Purge cycle.
KEWRAPC: Auto Purge Action Completed.
*** 2011-10-24 19:28:26.091
KEWRAFM: Beginning one MMON Auto-Flush cycle ...
Finished one MMON Auto-Flush cycle.
KEWRAPM: Beginning one MMON Auto-Purge cycle ..
KEWRAPM: Finished one MMON Auto-Purge cycle
*** 2011-10-24 19:58:27.041
KEWRAFM: Beginning one MMON Auto-Flush cycle ...
Finished one MMON Auto-Flush cycle.
KEWRAPM: Beginning one MMON Auto-Purge cycle ...
KEWRAPM: Finished one MMON Auto-Purge cycle.

 

可以看到在默认情况下MMON每30分钟会自动去清理一次Automatic Workload Repository自动负载仓库中的过期快照信息,当18:58:25.290的第一次清理工作完成后查询dba_hist_snapshot可以发现过期快照消失了:

 

SQL> select snap_id,
2         to_char(begin_interval_time, 'YYYY-MM-DD'),
3         to_char(end_interval_time, 'YYYY-MM-DD')
4    from dba_hist_snapshot
5   order by snap_id;
SNAP_ID TO_CHAR(BE TO_CHAR(EN
---------- ---------- ----------
194 2011-10-16 2011-10-16
195 2011-10-16 2011-10-16
196 2011-10-16 2011-10-16
197 2011-10-16 2011-10-16
.................
222 2011-10-24 2011-10-24

 

通过以上演示我们可知AWR快照的保留天数与SYSAUX的使用率并无关系,实际控制AWR保留天数的最主要因素是MMON何时、如何地清理过期快照? MMON的清理操作直接受到dba_hist_wr_control.retention设置值的影响,默认情况10g 为保留7天,而11g为保留8天,MMON只已清理过期的快照。

同时KEWRAPM的trace信息也说明了,默认情况下MMON每30分钟做一次”MMON Auto-Purge cycle”清理工作。

Slide:深入了解Oracle自动内存管理ASMM by Maclean Liu

MMON Slave日志[KDL_TRIM]: NEWLEN: MESSAGES

有网友反映windows 2003 上的10.2.0.4数据库,MMON的Slave后台进程报带有NEWLEN: MESSAGES的相关日志,如:

*** ACTION NAME:(Auto ADDM Slave Action) 2011-09-09 18:00:12.053
*** MODULE NAME:(MMON_SLAVE) 2011-09-09 18:00:12.053
*** SERVICE NAME:(SYS$BACKGROUND) 2011-09-09 18:00:12.053
*** SESSION ID:(797.606) 2011-09-09 18:00:12.053
[kdl_trim]: newlen: 0
[kdl_trim]: newlen: 0
[kdl_trim]: newlen: 0
[kdl_trim]: newlen: 0
[kdl_trim]: newlen: 0
[kdl_trim]: newlen: 0
[kdl_trim]: newlen: 0
[kdl_trim]: newlen: 0
[kdl_trim]: newlen: 0
[kdl_trim]: newlen: 0
[kdl_trim]: newlen: 0
[kdl_trim]: newlen: 0

这一般是由于10.2.0.4上已经confirmed的bug:”Bug 6972843 – Unnecessary trace files with “[kdl_trim]: newlen” text in them [ID 6972843.8]”所引起的,在bug目前确认仅在10.2.0.4上发生,而到10.2.0.4.1中已经被修复了。

主要是MMON的辅助slave进程会不必要地输出如”[kdl_trim]: newlen: 0″的日志信息:

Unnecessary trace files may be generated with entries that read:'[kdl_trim]: newlen: '.
Hdr: 9600827 10.2.0.4 RDBMS 10.2.0.4 RAM LOBS PRODID-5 PORTID-212
Abstract: [KDL_TRIM]: NEWLEN: MESSAGES IN MMON SLAVES TRACES
PROBLEM:
--------
MMON slave processes traces contains the following entries despite patch 
6972843 applied
*** NAME:(Auto ADDM Slave Action) 2010-04-14 06:00:42.899
*** NAME:(MMON_SLAVE) 2010-04-14 06:00:42.899
*** NAME:(SYS$BACKGROUND) 2010-04-14 06:00:42.899
*** ID:(2472.45569) 2010-04-14 06:00:42.899
[kdl_trim]: newlen: 0
[kdl_trim]: newlen: 0
[kdl_trim]: newlen: 0
[kdl_trim]: newlen: 0
[kdl_trim]: newlen: 0
[kdl_trim]: newlen: 0
[kdl_trim]: newlen: 0
[kdl_trim]: newlen: 0
DIAGNOSTIC ANALYSIS:
--------------------
- Opatch contains patch 6972843 applied
WORKAROUND:
-----------
none
RELATED BUGS:
-------------
Bug 6972843 - HOURLY TRACE FILE CREATED IN BDUMP FOR AUTO ADDM SLAVE ACTION

该bug可以被忽略,但是如果日志产生过于频繁的话可以考虑升级10.2.0.4的PSU或者直接升级到10.2.0.5。

沪ICP备14014813号

沪公网安备 31010802001379号