【Goldengate性能优化】优化Extract抽取进程性能,解决OGG抽取日志延迟

一般来说OGG Goldengate 抽取进程对CPU的压力非常小, 而对于I/O 、network的吞吐量有轻量级的要求。

用低配置AIX测试结果如下。

抽取进程支持DB Log生成峰值速度 = 4 * 2.1 = 8.4 MB/秒,或30GB/小时,或726 GB/天。
抽取进程平均CPU占用1.9% 。

投递进程支持DB Log生成平均速度 = 2,096,854 * 2.1 = 4.5 MB/秒,或16 GB/小时,或380 GB/天。
投递进程平均CPU占用7% 。

 

 

对于Extract抽取日志缓慢导致延迟的问题,优先采用如下方法诊断具体慢在 抽取 还是 写trail上:

 

1. 收集原始慢的Extract的性能信息

GGSCI> stats extract <extract_name>, totalsonly *, reportrate sec
GGSCI> stats extract <extract_name>, totalsonly *, reportrate min

 

2. 创建一个新的extract 参数文件

cp <extract_name>.prm ETEST.prm

3. 修改上述 etest params file中的extract名字 和 trail 位置

 

4. 加入TESTMAPPINGSPEED 参数到 etest的params files

TESTMAPPINGSPEED参数的作用是 不让extract 去写trail 文件 而仅仅抽取日志, 若加入该参数后抽取速度大幅提升则说明性能瓶颈在 write trail上

TESTMAPPINGSPEED
REPORTCOUNT EVERY 5000 RECORDS

 

5. 增加etest这个extract

GGSCI> add extract etest, tranlog, begin now

GGSCI> add exttrail ./dirdat/ma , extract etest , megabytes 200

 

6. 为etest指定 原始extract 存在抽取速度问题的archivelog 的sequence

GGSCI> alter extract etest, extseqno <arch_seq_no>, extrba 0

 

7. 启动etest 这个extract

GGSCI> start extract etest

 

等待5分钟并检查

GGSCI> stats extract etest, totalsonly *, reportrate sec
GGSCI> stats extract etest, totalsonly *, reportrate min

 

对比 原始慢的extract 与 新加入的etest的 stats reportrate 报告中的性能指标,若 TESTMAPPINGSPEED 后 性能明显提升则说明问题出在 写trail  (extract 写到本地的情况) 或者 网络传出慢( 直接写到目标机上)。

 

如果TESTMAPPINGSPEED 后性能也无明显变化则继续。

 

8. 将所有extract 的表都注释掉,而仅仅extract 一张很少变化记录的表, 若这样 后性能明显提升则说明 瓶颈不在读archivelog 上而在 日志记录的处理上 log record processing 。

一般来说redo日志的解析分成2部分:

A. Record parsing in Extract
B. Record fetching if needed

 

9.为了进一步确认问题 将TESTMAPPINGSPEED 注释掉, 并 加入 TRACE/TRACE2 参数 以便确认 Extract是否慢在fetch上

 

–TESTMAPPINGSPEED http://www.askmaclean.com
TRACE ./dirtmp/ext.trc
TRACE2 ./dirtmp/ext.trc2

 

10 检查生成的trace 文件 若 其中显示 大量的时间耗费在一些SELECT语句上,则需要DBA介入来调优这些SELECT SQL

 

11. 若看到一些与undo/rollback 相关的错误例如ORA-1555则确保UNDO 表空间可用 空间足够,  也可以加入  FETCHOPTIONS NOUSESNAPSHOT 让 Extract fetch column 数据是尽可能不要走UNDO CR READ

 

12. 如果将大部分表都去掉,只剩下一个不太用的表且仍无明显的性能增长, 且CPU 也不忙, 一般来说这可能是IO瓶颈造成的

 

13. 建议dd测一下archivelog 的读取速度

例如maclean>time dd if=<归档日志> of=/dev/null bs=1M

对比其他磁盘若有明显差异, 则考虑将archivelog 移动到对应磁盘并再次上述测试。

 

 

对于cache较小的sequence 可以引起在replicat DDL 时频繁执行 ALTER SEQUENCE “SEQ_NAME” CYCLE的DDL语句:

 

2013-04-22 09:54:06  INFO    OGG-01487  DDL found, operation [ALTER SEQUENCE "TXN_SEQ" CYCLE  (size 31)], start SCN [20181621217], commit SCN [20181621231] instance [C
ULPRODB (1)], DDL seqno [2734821], marker seqno [2736076].

2013-04-22 09:54:06  INFO    OGG-00487  DDL operation included [Include Mapped], optype [ALTER], objtype [SEQUENCE], objowner [OLSUSER], objname [TXN_SEQ].

2013-04-22 09:54:06  INFO    OGG-00497  Writing DDL operation to extract trail file.
Wildcard SEQUENCE resolved (entry olsuser.*):
  Sequence "OLSUSER"."TXN_SEQ";
Resolving source sequence OLSUSER.TXN_SEQ.

2013-04-22 09:54:07  INFO    OGG-01487  DDL found, operation [ALTER SEQUENCE "TXN_SEQ" CYCLE  (size 31)], start SCN [20181621236], commit SCN [20181621248] instance [C
ULPRODB (1)], DDL seqno [2734822], marker seqno [2736077].

2013-04-22 09:54:07  INFO    OGG-00487  DDL operation included [Include Mapped], optype [ALTER], objtype [SEQUENCE], objowner [OLSUSER], objname [TXN_SEQ].

2013-04-22 09:54:07  INFO    OGG-00497  Writing DDL operation to extract trail file.
Wildcard SEQUENCE resolved (entry olsuser.*):
  Sequence "OLSUSER"."TXN_SEQ";
Resolving source sequence OLSUSER.TXN_SEQ.

2013-04-22 09:54:08  INFO    OGG-01487  DDL found, operation [ALTER SEQUENCE "TXN_SEQ" CYCLE  (size 31)], start SCN [20181621251], commit SCN [20181621261] instance [C
ULPRODB (1)], DDL seqno [2734823], marker seqno [2736078].

2013-04-22 09:54:08  INFO    OGG-00487  DDL operation included [Include Mapped], optype [ALTER], objtype [SEQUENCE], objowner [OLSUSER], objname [TXN_SEQ].

2013-04-22 09:54:08  INFO    OGG-00497  Writing DDL operation to extract trail file.
Wildcard SEQUENCE resolved (entry olsuser.*):
  Sequence "OLSUSER"."TXN_SEQ";
Resolving source sequence OLSUSER.TXN_SEQ.

 

若该replicat target库上尚有extract则 extract挖掘日志时可能因为处理频繁的DDL操作而变得很慢,”Why GoldenGate replicat issues “alter sequence .. cycle|nocycle” in sequence replication? [ID 1535322.1]” 文档指出了 这种频繁的 是为了在target 上上可信赖的同步sequence的高水位。

但是这种超频繁的 几乎每2s 一次的ALTER SEQUENCE CYCLE操作确实拖慢了Extract的速度, 可以通过指定参数 DBOPTIONS _NOAUTOMATICSEQUENCEFLUSH来减少ALTER SEQUENCE CYCLE出现,实际并不能完全避免。 文档指出使用该 DBOPTIONS _NOAUTOMATICSEQUENCEFLUSH参数可能导致TARGET上的SEQUENCE 不同步。

实际优先考虑增加SOURCE上SEQUENCE的CACHE解决问题, 之后再考虑用 DBOPTIONS _NOAUTOMATICSEQUENCEFLUSH; 针对该SEQUENCE  在EXTRACT里 排除该索引 没有效果, 例如 DDL EXCLUDE OBJNAME “XX.SEQNAME”  或者 TABLEEXCLUDE “XX.SEQUENCE”在实际测试中均没有明显的改善, 但修改DBOPTIONS _NOAUTOMATICSEQUENCEFLUSH确实可以该少性能。

关注dbDao.com的新浪微博

扫码关注dbDao.com 微信公众号:

Comments

  1. Why GoldenGate replicat issues “alter sequence .. cycle|nocycle” in sequence replication?Applies to:Oracle GoldenGate – Version 11.2.1.0.4 and laterInformation in this document applies to any platform.GoalI am using GoldenGate to replicate sequence last_number. As this is for failover purpose, the DDL setup (including DDL trigger) is enabled in target database. during sequence replication by replicat, I saw lots of following DDLs captured in ggs_ddl_trace.log:alter sequence …. cycle; —- for the sequence with cycle option alter sequence … nocycle; —- for the sequence without cycle option What is this DDL for, and can I disable them to improve the purpose?FixThe replicat changes the last_number in target sequence by querying the nextval, but this change is not secured if the target database goes down. The observed DDL will move the high water mark (HWM) of target sequence and make sure sequences replicate reliably for each sequence record. The DDL may be skipped with replicat parameter: DBOPTIONS _NOAUTOMATICSEQUENCEFLUSH However, if target db goes down, the sequence number may be out of sync with source. then you will need to use FLUSH to make sure they are in sync (or FLUSH it before the target goes down, for scheduled shutdown).Other options to reduce the overhead from target DDL trigger:1. disable the target DDL trigger if possible, and enable it only before the failover happens.2. increase the cach size (e.g., from default 20 to 2000), which will reduce the number of sequence records to be replicated.3. modify ddl_filter.sql in target to filter out the DDL (note 1420197.1 shows how to use ddl_filter.sql). ReferencesNOTE:1331998.1 – Replicat For Sequence do not Checkpoint when (1) having cycle option, or (2) target last_number is significantly less than source one

  2. guest says:

    。。。。。

  3. GOGOGO!!! says:

    学习中。。感谢知识的分享

  4. GOGOGO!!! says:

    学习中。。感谢知识的分享

Speak Your Mind

TEL/電話+86 13764045638
Email service@parnassusdata.com
QQ 47079569