11gR2新特性:STANDBY_MAX_DATA_DELAY

Active Data Guard 是 Oracle 11g 的亮点特性之一,而在11G release 2中对Active Data Guard引入了更多诱人的新特性,这些特性将Active Data Guard打造成Oracle 读写分离或报表查询的理想方案之一。

 

STANDBY_MAX_DATA_DELAY是11gr2中对Active Data Guard的最大增强(buffer)之一,这是一个可以在会话级别指定的参数(session parameter),该参数指定了在Primary Database已commit提交的变化与standby Database数据库上涉及相关变化的查询之间所允许的时间延迟,单位为second 秒(Specifies a limit for the amount of time (in seconds) allowed to elapse between when changes are committed on the primary and when those same changes can be queries  on the standby database)。

 

使用该STANDBY_MAX_DATA_DELAY参数的语法如下:

ALTER SESSION SET STANDBY_MAX_DATA_DELAY ={ NONE | INTEGER }

 

注意事项

  • 该参数无法为SYS用户所用,在SYS用户的SESSION下设置该参数将被忽略
  • 若没有指定STANDBY_MAX_DATA_DELAY,即使用其默认值NONE,那么无论主备库之间有多大的延迟,在Physical Standby上的查询都会被执行
  • 若查询延迟超过STANDBY_MAX_DATA_DELAY所指定的值那么,将报ORA-03172错误:

 

03172, 00000, "STANDBY_MAX_DATA_DELAY of %s seconds exceeded"
// *Cause:  Standby recovery fell behind the STANDBY_MAX_DATA_DELAY
//          requirement.
// *Action: Tune recovery and retry the query later, or switch to another
//          standby database within the data delay requirement.

在实际运用中STANDBY_MAX_DATA_DELAY保证了在Standby数据库上所作的报表查询不会得到过于陈旧的结果(stale result),通过该参数我们可以指定一个报表应用所容许的数据时间延迟。

当然也可以指定不容许任何数据延迟,即设置STANDBY_MAX_DATA_DELAY为零,以便做到实时数据查询。

配置Primary 与 Standby 数据库之间的实时查询或者说零延迟查询有以下注意事项:

  • 只有特定的应用程序才会对数据延迟有零容忍的需求,注意你的应用程序是否有如此苛刻的要求
  • 在Standby数据库上执行的查询语句必须返回和主库上查询的完全一致的结果
  • 必须设置STANDBY_MAX_DATA_DELAY 为0
  • 在查询开始的那一刻,Standby数据库必须同步到与Primary数据库一致的Current Scn
  • 若结果没有在200ms内返回,则查询会因ORA-03172而终止
  • Primary数据库必须采用最大可用(max availability)或最大保护(maximum protection)模式
  • redo 传输必须使用SYNC 选项
  • 必须启用 Real-Time Query 特性

 

实际使用

 

以下我们通过演示来了解该STANDBY_MAX_DATA_DELAY的效果:

SQL> select * from v$version;  

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
PL/SQL Release 11.2.0.2.0 - Production
CORE    11.2.0.2.0      Production
TNS for Linux: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production

SQL> select * from global_name;

GLOBAL_NAME
--------------------------------------------------------------------------------
www.askmac.cn & www.askmac.cn

Primary Database  SQL> conn maclean/maclean
Connected.

Primary Database SQL> select database_role,protection_mode from v$database;

DATABASE_ROLE    PROTECTION_MODE
---------------- --------------------
PRIMARY          MAXIMUM AVAILABILITY

Primary Database SQL>  create table TSMDD tablespace users as select * From dba_objects;
Table created.

Standby Database SQL> conn maclean/maclean
Connected.

Standby Database SQL> select database_role,protection_mode from v$database;

DATABASE_ROLE    PROTECTION_MODE
---------------- --------------------
PHYSICAL STANDBY MAXIMUM AVAILABILITY

注意STANDBY_MAX_DATA_DELAY是一个会话参数session parameter,而非实例参数instance parameter

Standby Database SQL> select name from v$system_parameter where name='standby_max_data_delay';

no rows selected

Standby Database SQL> alter session set STANDBY_MAX_DATA_DELAY=0;

Session altered.

Standby Database SQL> select count(*) from TSMDD; 

  COUNT(*)
----------
     13378

 

实际测试可以发现当STANDBY_MAX_DATA_DELAY=0时,并不是查询语句执行时间超过200ms就返回ORA-03172错误,而是指从查询开始的200ms内,若备库没有追上主库的Current SCN时出现ORA-03172。

 

Standby Database SQL> alter session set STANDBY_MAX_DATA_DELAY=0;

Session altered.

Standby Database SQL> set timing on;

Standby Database SQL> select count(1) from TSMDD a, TSMDD b;

  COUNT(1)
----------
 178970884

Elapsed: 00:00:05.34

Standby Database SQL> alter session set events '10046 trace name context forever,level 12';
Session altered.

在主库上执行大数据量的insert操作,但是不提交commit;

Primary Database SQL> insert into /*+ append */  tsmdd select * from tsmdd;

此时在Standby 数据库 上执行查询语句将触发ORA-3172错误

Standby Database SQL> select count(*) from tsmdd
                     *
ERROR at line 1:
ORA-03172: STANDBY_MAX_DATA_DELAY of 0 seconds exceeded

Standby Database SQL>  /
select count(*) from tsmdd
*
ERROR at line 1:
ORA-03172: STANDBY_MAX_DATA_DELAY of 0 seconds exceeded

 

以上查询语句执行过程中的10046 trace如下:

 

PARSING IN CURSOR #47828795969456 len=26 dep=0 uid=34 oct=3 lid=34 tim=1316692536000853
hv=2314050071 ad='7115e798' sqlid='3smn48y4yv6hr'

select count(*) from tsmdd
END OF STMT
PARSE #47828795969456:c=0,e=61,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=1739041831,tim=1316692536000852
WAIT #47828795969456: nam='standby query scn advance'
ela= 201440 p1=770798 p2=0 p3=20 obj#=13873 tim=1316692536202337
WAIT #47828795969456: nam='SQL*Net break/reset to client' ela= 25 driver id=1650815232
break?=1 p3=0 obj#=13873 tim=1316692536202528
WAIT #47828795969456: nam='SQL*Net break/reset to client' ela= 144 driver id=1650815232
break?=0 p3=0 obj#=13873 tim=1316692536202694
WAIT #47828795969456: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1
p3=0 obj#=13873 tim=1316692536202715

*** 2011-09-22 19:55:37.983
WAIT #47828795969456: nam='SQL*Net message from client' ela= 1781108 driver
id=1650815232 #bytes=1 p3=0 obj#=13873 tim=1316692537983884
CLOSE #47828795969456:c=0,e=24,dep=0,type=0,tim=1316692537984068

===============================================================================================

PARSING IN CURSOR #47828795969456 len=26 dep=0 uid=34 oct=3 lid=34 tim=1316692537984172
hv=2314050071 ad='7115e798' sqlid='3smn48y4yv6hr'
select count(*) from tsmdd
END OF STMT
PARSE #47828795969456:c=0,e=53,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=1739041831,tim=1316692537984171
WAIT #47828795969456: nam='standby query scn advance' ela= 200546 p1=770914
p2=0 p3=20 obj#=13873 tim=1316692538184822
WAIT #47828795969456: nam='SQL*Net break/reset to client' ela= 10 driver
id=1650815232 break?=1 p3=0 obj#=13873 tim=1316692538184998
WAIT #47828795969456: nam='SQL*Net break/reset to client' ela= 103 driver
id=1650815232 break?=0 p3=0 obj#=13873 tim=1316692538185154
WAIT #47828795969456: nam='SQL*Net message to client' ela= 1 driver
id=1650815232 #bytes=1 p3=0 obj#=13873 tim=1316692538185182

 

注意这里出现的standby query scn advance等待事件,显然该等待事件是为了确认Primary与Standby之间的Scn差距,但这又是一个Internal的undocumented 等待事件。我猜测是P1是Standby数据库的Current Scn,而p3可能是Primary 与 Standby之间的Scn 差距。OBJ#是查询对象的object_id:

 

SQL> col owner for a20
SQL> col object_name for a20

SQL> select owner,object_name from dba_objects where object_id=13873;

OWNER                OBJECT_NAME
-------------------- --------------------
MACLEAN              TSMDD

 

使用技巧

 

在实际的使用过程中我们没有必要每次登录会话查询都去指定STANDBY_MAX_DATA_DELAY参数,可以通过创建AFTER LOGON触发器来简化工作。

在11 g Release 2中引入了USERENV Context的一种新属性DATABASE_ROLE,使用该属性可以便捷地定位用户所登录数据库的角色是Primary 还是 Standby,11g的SQL 和 PL/SQL客户端程序均可以通过 SYS_CONTEXT 函数获取该数据库角色信息。

通过创建以下登陆后触发器可以做到当应用程序登录到启用实时查询的Standby数据库上后即自动设置合适的STANDBY_MAX_DATA_DELAY参数。这样即避免了修改应用程序的代码,有做到了配置合理的最大数据延迟。

CREATE OR REPLACE TRIGGER AUTO_SMDD
  AFTER LOGON ON USER.SCHEMA
BEGIN
  IF (SYS_CONTEXT('USERENV', 'DATABASE_ROLE') IN ('PHYSICAL STANDBY')) THEN
    execute immediate 'alter session set standby_max_data_delay=5';
  END IF;
END;

 

注意以上trigger 只需要在Primary Database上以应用相关用户身份建立即可,会同步到Standby上:

 

Primary Database SQL>  conn maclean/maclean
Connected.

Primary Database SQL> CREATE OR REPLACE TRIGGER AUTO_SMDD
  2    AFTER LOGON ON MACLEAN.SCHEMA
  3  BEGIN
  4    IF (SYS_CONTEXT('USERENV', 'DATABASE_ROLE') IN ('PHYSICAL STANDBY')) THEN
  5      execute immediate 'alter session set standby_max_data_delay=0';
  6    END IF;
  7  END;
  8  /
Trigger created.

Posted

in

by

Tags:

Comments

2 responses to “11gR2新特性:STANDBY_MAX_DATA_DELAY”

  1. maclean Avatar

    Hdr: 5218701 10.2.0.2 RDBMS 10.2.0.2 DATAGUARD_LSBY PRODID-5 PORTID-46
    Abstract: ENHANCE DELAY MECHANISM FOR DATA GUARD STANDBYS

    The customer made the following enhancement request:
    Currently, when you set the DELAY parameter for a standby database in a Data
    Guard configuration, the delay is relative to a log switch. So, if you only
    switch logfiles every 20 minutes, your standby will be 20 minutes + DELAY +
    apply time minutes behind the primary. This is acceptable for physical
    standbys, but not for logical standbys used for reporting.

    We need a
    delay on the logical to ensure it is always “in the past” behind the primary,

    and therefore can always be successfully reinstated.

    There are times after a failover when the logical can not be reinstated, so
    you have to manually recreate it from a new backup. This can be
    time-consuming. Plus, if you are using the logical to store additional data,
    such as archive data, it is a real pain to recreate the database.

  2. tsc Avatar
    tsc

    请问若结果没有在200ms内返回,则查询会因ORA-03172而终止这里的200ms 是从何得知?我查询了很多资料,没有找到任何这方面的记录。

Leave a Reply

Your email address will not be published. Required fields are marked *