了解你所不知道的SMON功能(十):维护SMON_SCN_TIME字典基表

SMON后台进程的作用还包括维护SMON_SCN_TIME基表。

SMON_SCN_TIME基表用于记录过去时间段中SCN(system change number)与具体的时间戳(timestamp)之间的映射关系,因为是采样记录这种映射关系,所以SMON_SCN_TIME可以较为较为粗糙地(不精确地)定位某个SCN的时间信息。实际的SMON_SCN_TIME是一张cluster table簇表。

 

 

 

SMON_SCN_TIME时间映射表最大的用途是为闪回类型的查询(flashback type queries)提供一种将时间映射为SCN的途径(The SMON time mapping is mainly for flashback type queries to map a time to an SCN)。

 

Metalink文档<Error ORA-01466 while executing a flashback query. [ID 281510.1]>介绍了SMON更新SMON_SCN_TIME的规律:

  • 在版本10g中SMON_SCN_TIME每6秒钟被更新一次(In Oracle Database 10g, smon_scn_time is updated every 6 seconds hence that is the minimum time that the flashback query time needs to be behind the timestamp of the first change to the table.)
  • 在版本9.2中SMON_SCN_TIME每5分钟被更新一次(In Oracle Database 9.2, smon_scn_time is updated every 5 minutes hence the required delay between the flashback time and table properties change is at least 5 minutes.)

 

另外从10g开始SMON也会清理SMON_SCN_TIME中的记录了,SMON后台进程会每5分钟被唤醒一次,检查SMON_SCN_TIME在磁盘上的映射记录总数,若总数超过144000条,则会使用以下语句删除最老的一条记录(time_mp最小):

 

delete from smon_scn_time
 where thread = 0
   and time_mp = (select min(time_mp) from smon_scn_time where thread = 0)

 

若仅仅删除一条记录不足以获得足够的空间,那么SMON会反复多次执行以上DELETE语句。

 

触发场景

 

虽然Metalink文档<Error ORA-01466 while executing a flashback query. [ID 281510.1]>指出了在10g中SMON会以每6秒一次的频率更新SMON_SCN_TIME基表,但是实际观测可以发现更新频率与SCN的增长速率相关,在较为繁忙的实例中SCN的上升极快时SMON可能会以6秒一次的最短间隔频率更新 , 但是在空闲的实例中SCN增长较慢,则仍会以每5或10分钟一次频率更新,例如:

 

[oracle@vrh8 ~]$ ps -ef|grep smon|grep -v grep
oracle    3484     1  0 Nov12 ?        00:00:02 ora_smon_G10R21

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bi
PL/SQL Release 10.2.0.1.0 - Production
CORE    10.2.0.1.0      Production
TNS for Linux: Version 10.2.0.1.0 - Production
NLSRTL Version 10.2.0.1.0 - Production

SQL> select * from global_name;

GLOBAL_NAME
--------------------------------------------------------------------------------
www.askmaclean.com & www.askmaclean.com

SQL> oradebug setospid 3484;
Oracle pid: 8, Unix process pid: 3484, image: oracle@vrh8.oracle.com (SMON)

SQL> oradebug event 10500 trace name context forever,level 10 : 10046 trace name context forever,level 12;
Statement processed.
SQL>
SQL> oradebug tracefile_name;
/s01/admin/G10R21/bdump/g10r21_smon_3484.trc

/* 等待一定时间 */

 

找出SMON trace文件中insert数据到SMON_SCN_TIME的记录:

 

 grep -A20 "insert into smon_scn_time" /s01/admin/G10R21/bdump/g10r21_smon_3484.trc
insert into smon_scn_time (thread, time_mp, time_dp, scn, scn_wrp, scn_bas,  num_mappings, tim_scn_map)
values (0, :1, :2, :3, :4, :5, :6, :7)
END OF STMT
PARSE #4:c=0,e=43,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=1290280848899596
BINDS #4:
kkscoacd
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fb29844edb8  bln=22  avl=06  flg=05
  value=767145793
 Bind#1
  oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00
  oacflg=10 fl2=0001 frm=00 csi=00 siz=8 off=0
  kxsbbbfp=7fff023ae780  bln=07  avl=07  flg=09
  value="11/14/2011 0:3:13"
 Bind#2
  oacdty=02 mxl=22(04) mxlc=00 mal=00 scl=00 pre=00
  oacflg=10 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fff023ae70c  bln=22  avl=04  flg=09
  value=954389
 Bind#3
--
insert into smon_scn_time (thread, time_mp, time_dp, scn, scn_wrp, scn_bas,  num_mappings, tim_scn_map)
values (0, :1, :2, :3, :4, :5, :6, :7)
END OF STMT
PARSE #1:c=0,e=21,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=1290281434933390
BINDS #1:
kkscoacd
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fb29844edb8  bln=22  avl=06  flg=05
  value=767146393
 Bind#1
  oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00
  oacflg=10 fl2=0001 frm=00 csi=00 siz=8 off=0
  kxsbbbfp=7fff023ae780  bln=07  avl=07  flg=09
  value="11/14/2011 0:13:13"
 Bind#2
  oacdty=02 mxl=22(04) mxlc=00 mal=00 scl=00 pre=00
  oacflg=10 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fff023ae70c  bln=22  avl=04  flg=09
  value=954720
 Bind#3
--
insert into smon_scn_time (thread, time_mp, time_dp, scn, scn_wrp, scn_bas,  num_mappings, tim_scn_map)
values (0, :1, :2, :3, :4, :5, :6, :7)
END OF STMT
PARSE #3:c=0,e=20,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=1290281727955249
BINDS #3:
kkscoacd
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fb29844e960  bln=22  avl=06  flg=05
  value=767146993
 Bind#1
  oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00
  oacflg=10 fl2=0001 frm=00 csi=00 siz=8 off=0
  kxsbbbfp=7fff023ae780  bln=07  avl=07  flg=09
  value="11/14/2011 0:23:13"
 Bind#2
  oacdty=02 mxl=22(04) mxlc=00 mal=00 scl=00 pre=00
  oacflg=10 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fff023ae70c  bln=22  avl=04  flg=09
  value=954926
 Bind#3
insert into smon_scn_time (thread, time_mp, time_dp, scn, scn_wrp, scn_bas,  num_mappings, tim_scn_map)
values (0, :1, :2, :3, :4, :5, :6, :7)
END OF STMT
PARSE #4:c=0,e=30,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=1290282313990553
BINDS #4:
kkscoacd
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fb29844edb8  bln=22  avl=06  flg=05
  value=767147294
 Bind#1
  oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00
  oacflg=10 fl2=0001 frm=00 csi=00 siz=8 off=0
  kxsbbbfp=7fff023ae780  bln=07  avl=07  flg=09
  value="11/14/2011 0:28:14"
 Bind#2
  oacdty=02 mxl=22(04) mxlc=00 mal=00 scl=00 pre=00
  oacflg=10 fl2=0001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fff023ae70c  bln=22  avl=04  flg=09
  value=955036
 Bind#3

 

 

可以通过以上INSERT语句的TIME_DP绑定变量值中发现其更新SMON_SCN_TIME的时间规律,一般为5或10分钟一次。这说明SMON_SCN_TIME的更细频率与数据库实例的负载有关,其最短的间隔是每6秒一次,最长的间隔为10分钟一次。

 

由于SMON_SCN_TIME的更新频率问题可能引起ORA-01466错误,详见:
Error ORA-01466 while executing a flashback query. [ID 281510.1]

 

由于SMON_SCN_TIME的数据不一致可能引起ORA-00600[6711]或频繁地执行”delete from smon_scn_time”删除语句,详见:
ORA-00600[6711]错误一例
High Executions Of Statement “delete from smon_scn_time…” [ID 375401.1]

 

SMON维护SMON_SCN_TIME时相关的Stack CALL,ktf_scn_time是更新SMON_SCN_TIME的主要函数:

 

ksedst ksedmp ssexhd kghlkremf kghalo kghgex kghalf kksLoadChild kxsGetRuntimeLock kksfbc
kkspsc0 kksParseCursor opiosq0 opiall0 opikpr opiodr rpidrus skgmstack rpidru rpiswu2 kprball
ktf_scn_time
ktmmon ktmSmonMain ksbrdp opirip opidrv sou2o opimai_real main main_opd_entry

 

SMON 还可能使用以下SQL语句维护SMON_SCN_TIME字典基表:

 

select smontabv.cnt,
smontab.time_mp,
smontab.scn,
smontab.num_mappings,
smontab.tim_scn_map,
smontab.orig_thread
from smon_scn_time smontab,
(select max(scn) scnmax,
count(*) + sum(NVL2(TIM_SCN_MAP, NUM_MAPPINGS, 0)) cnt
from smon_scn_time
where thread = 0) smontabv
where smontab.scn = smontabv.scnmax
and thread = 0
insert into smon_scn_time
(thread,
time_mp,
time_dp,
scn,
scn_wrp,
scn_bas,
num_mappings,
tim_scn_map)
values
(0, :1, :2, :3, :4, :5, :6, :7)
update smon_scn_time
set orig_thread  = 0,
time_mp      = :1,
time_dp      = :2,
scn          = :3,
scn_wrp      = :4,
scn_bas      = :5,
num_mappings = :6,
tim_scn_map  = :7
where thread = 0
and scn = (select min(scn) from smon_scn_time where thread = 0)
delete from smon_scn_time
where thread = 0
and scn = (select min(scn) from smon_scn_time where thread = 0)

 

如何禁止SMON更新SMON_SCN_TIME基表

 

 

可以通过设置诊断事件event=’12500 trace name context forever, level 10’来禁止SMON更新SMON_SCN_TIME基表(Setting the 12500 event at system level should stop SMON from updating the SMON_SCN_TIME table.):

SQL>  alter system set events '12500 trace name context forever, level 10';
System altered.

 

一般我们不推荐禁止SMON更新SMON_SCN_TIME基表,因为这样会影响flashback Query闪回查询的正常使用,但是在某些异常恢复的场景中SMON_SCN_TIME数据讹误可能导致实例的Crash,那么可以利用以上12500事件做到不触发SMON_SCN_TIME被更新。

 

 

如何手动清除SMON_SCN_TIME的数据

 

因为SMON_SCN_TIME不是bootstrap自举核心对象,所以我们可以手动更新该表上的数据、及重建其索引。

如我在<ORA-00600[6711]错误一例>中介绍了因为SMON_SCN_TIME与其索引的数据不一致时,可以通过重建索引来解决问题:

connect / as sysdba
drop index smon_scn_time_scn_idx;
drop index smon_scn_time_tim_idx;
create unique index smon_scn_time_scn_idx on smon_scn_time(scn);
create unique index smon_scn_time_tim_idx on smon_scn_time(time_mp);
analyze table smon_scn_time validate structure cascade;

 

可以在设置了12500事件后手动删除SMON_SCN_TIME上的记录,重启实例后SMON会继续正常更新SMON_SCN_TIME。除非是因为SMON_SCN_TIME表上的记录与索引smon_scn_time_tim_idx或smon_scn_time_scn_idx上的不一致造成DELETE语句无法有效删除该表上的记录:文档<LOCK ON SYS.SMON_SCN_TIME [ID 747745.1]>说明了该问题,否则我们没有必要手动去清除SMON_SCN_TIME上的数据。

具体方法如下:

 

SQL> conn / as sysdba
/* Set the event at system level */
SQL> alter system set events '12500 trace name context forever, level 10';
/* Delete the records from SMON_SCN_TIME */
SQL> delete from smon_scn_time;
SQL> commit;
SQL> alter system set events '12500 trace name context off';
完成以上步骤后重启实例restart instance
shutdown immediate;
startup;

【Oracle数据恢复】ORA-00600[6711]错误一例

一套Linux上的10.2.0.4系统,日志中频繁出现ORA-00600[6711]内部错误:

 

如果自己搞不定可以找ASKMACLEAN专业ORACLE数据库修复团队成员帮您恢复!

 

Wed Sep  1 21:24:30 2010
Errors in file /s01/10gdb/admin/YOUYUS/bdump/youyus_smon_5622.trc:
ORA-00600: internal error code, arguments: [6711], [4256248], [1], [4256242], [0], [], [], []
Wed Sep  1 21:24:31 2010
Non-fatal internal error happenned while SMON was doing logging scn->time mapping.

 

 

MOS上有一个关于6711内部错误十分简单的Note,该文档声称出现6711错误极有可能是部分类型为簇(cluster)的数据字典表存在潜在的讹误,这个Note甚至没有告诉我们该错误argument参数的意义。
不过其实我们可以猜出来,因为是和corruption相关的错误,那么实际上可能关联的几个因素无非是obj#,file#,block#;4256248和4256242 两个数字像极了Data Block Address,把他们当做dba来看待,也就指向了1号数据文件的61938块和61944数据块,我们来看看这些块属于哪个对象:

SQL> set linesize 200;
SQL> select segment_name, segment_type
2    from dba_extents
3   where relative_fno = 1
4     and (61938 between block_id and block_id + blocks or
5         61944 between block_id and block_id + blocks);
SEGMENT_NAME                                                                      SEGMENT_TYPE
--------------------------------------------------------------------------------- ------------------
SMON_SCN_TO_TIME                                                                  CLUSTER

不出意料是一个cluster,SMON_SCN_TO_TIME是SMON_SCN_TIME表的基簇,SMON_SCN_TIME表用以记录数据库中scn对应的时间戳。我们直接查看用以创建数据字典的sql.bsq文件,可以进一步了解他们的结构:

cat $ORACLE_HOME/rdbms/admin/sql.bsq|grep -A 24 "create cluster smon_scn_to_time"
create cluster smon_scn_to_time (
thread number                         /* thread, compatibility */
)
/
create index smon_scn_to_time_idx on cluster smon_scn_to_time
/
create table smon_scn_time (
thread number,                         /* thread, compatibility */
time_mp number,                        /* time this recent scn represents */
time_dp date,                          /* time as date, compatibility */
scn_wrp number,                        /* scn.wrp, compatibility */
scn_bas number,                        /* scn.bas, compatibility */
num_mappings number,
tim_scn_map raw(1200),
scn number default 0,                  /* scn */
orig_thread number default 0           /* for downgrade */
) cluster smon_scn_to_time (thread)
/
create unique index smon_scn_time_tim_idx on smon_scn_time(time_mp)
/
create unique index smon_scn_time_scn_idx on smon_scn_time(scn)
/

从以上脚本可以看到这个簇上存在多个索引,我们需要进一步validate验证所有这些对象:

SQL> analyze table SMON_SCN_TIME validate structure;
Table analyzed.
SQL>analyze table SMON_SCN_TIME validate structure cascade;
Table analyzed.
SQL> analyze cluster SMON_SCN_TO_TIME validate structure;
Cluster analyzed.
SQL> analyze cluster SMON_SCN_TO_TIME validate structure cascade;
analyze cluster SMON_SCN_TO_TIME validate structure cascade
*
ERROR at line 1:
ORA-01499: table/index cross reference failure - see trace file

到这里问题已经很清晰了,问题出在SMON_SCN_TO_TIME的索引smon_scn_to_time_idx身上,极有可能是该索引上出现了逻辑讹误。所幸有问题的仅仅是索引,找出问题所在后要解决就显得容易得多了:

SQL> alter index smon_scn_to_time_idx rebuild ;
Index altered.
/* 在索引出现讹误的情况下仅仅rebuild往往是无效的,在我们rebuild的同时告警日志中再次出现了ORA-00600[6711]错误 !!! */
/* 我们需要的彻底把有问题的索引drop掉,并再次创建!!! */
SQL> drop index smon_scn_to_time_idx ;
Index dropped.
SQL> create index smon_scn_to_time_idx on cluster smon_scn_to_time;
Index created.
/* 至此问题解决,告警日志中不再出现错误! * /
/* That's great! * /

沪ICP备14014813号

沪公网安备 31010802001379号