How to set LOGSOURCE PATHMAP on GoldenGate

Question:
the source machine os is solaris ,the mid machine os is linux ,
the target machine os is solaris,i want extract archive log file and pump trail file
to target on mid machine , i check GoldenGate Reference Guide version 10.4 about TRANLOGOPTIONS,
must set ARCHIVEDLOGONLY and LOGSOURCE PATHMAP , in my environment ,how to set LOGSOURCE PATHMAP ?

Answer:
There two primary things that need to be met when extracting the data remotely.

1. Endian order (How the bytes are stored)
Big endian OS    : AIX, HPUX, MVS, SOLARIS, S290
Little endian OS : LINUX, VMS, WINDOWS

2. Bit (32 or 64 bit).

If your source is SOLARIS(Big Endian) and mid machine where archive logs resides as LINUX(Little Endian),
then you cannot use LOGSOURCE as it violates the ENDIAN order.

For exampel you can either have source and mid machine as solaris solaris or AIX ,HPUX  or vice versa

The general syntax to extract data from transaction logs which resides on a platform other than the one
which hosts the database If you the source and mid machine are Solaris you can use the following syntax.

TRANLOGOPTIONS, LOGSOURCE SOLARIS, PATHMAP

Please refer Oracle GoldenGate Reference guide page 356 for the example

How GoldenGate process consumes memory

Question:
We are using Golden Gate to replicate the data from Oracle 9.2.0.8 on Solaris8 SPARC 64 bit (GoldenGate Version 10.4.0.31 Build 001) to Oracle RAC 11.2.0.1 on Solaris10 SPARC 64bit (GoldenGate Version 10.4.0.19 Build 002). Both GoldenGate Extract and Replicat process are working fine. Please refer below information for more easy to understand our goldengate setup.

Extract Side | Replicat Side
Hostname: HK8SN020 | Hostname: HK8SP226 (HK8SP227 dont have any goldengate client, all goldengate process are located on HK8SP226)
Oracle 9.2.0.8 (32bit binary) | Oracle 11.2.0.1 (64bit binary)
Solaris8 Sparc 64bit Kernel | Solaris10 Sparc 64bit kernel
GoldenGate Version 10.4.0.31 Build 001 | GoldenGate Version 10.4.0.19 Build 002

However, on 27-Mar-2010, we found the server memory utilization on Solaris10 HK8SP226 are unexpected continuously rising since around 01:30. At around 3:20, the server memory utilization are up to 100%. At around 5:20, the server memory utilization is suddenly drop and back to normal. We compared the “Sar -r” and Solaris server message logfile. We found that on 05:21:44, goldengate relicate process are terminated with error message ” malloc 2097152 bytes failed”. After that, seem the server memory are suddenly released and back to normal memory utilization level.
We suspected the abnormal server memory usage are cased by goldengate replicate process. Can you please help to investigate and find out the root cause?

Answer:
GoldenGate replicates only committed transactions, it stores the operations of each transaction in a managed virtual-memory pool known as a cache until it receives either a commit or a rollback for that transaction. One global cache operates as a shared resource of an Extract process. The following sub-pools of virtual memory are allocated from the global cache:(1)One sub-pool per log reader thread for most transaction row data. and (2)One sub-pool for BLOB data and possibly other large items.

Within each sub-pool, individual buffers are allocated from the global cache, each one containing information that is relative to a transaction that is being processed by GoldenGate. The sizes of the initial and incremental buffers are controlled by the CACHEBUFFERSIZE option of CACHEMGR.

The actual amount of physical memory that is used by any GoldenGate process is controlled by the operating system, not the GoldenGate process. The global cache size is controlled by the CACHESIZE option of CACHEMGR.Cache manager keeps a GoldenGate process working within the soft limit of its global cache size, only allocating virtual memory (not physical memory) on demand. The actual amount of physical memory that is used by any GoldenGate process is controlled by the operating system, not the GoldenGate program.

GoldenGate cache manager only takes advantage of the memory management functions of the operating system to ensure that GoldenGate processes work in a sustained and efficient manner. Within cache, OGG makes use of all the modern “virtual memory” techniques by allocating and managing active buffers efficiently and recycling old buffers instead of paging to disk, when possible and paging less-used information to disk, when necessary.

When COM initializes, by default it first determines how much virtual memory the OS has available for it and uses that to determine what CACHESIZE should be. Default for CACHESIZE is 8GB for 64-bit systems and 2GB for 32-bit systems.

The available virtual memory is reported with the PROCESS VM AVAIL FROM OS value in the report file. The CACHESIZE value will either be rejected or sized down if it is larger than, or sufficiently close to, the amount of virtual memory that is available to the process.

The CACHESIZE value will always be a power of two, rounded down from the value of PROCESS VM AVAIL FROM OS, unless the latter is itself a power of two, in which case it is halved. After the specified size is consumed by data, the memory manager will try to free up memory by paging data to disk or by reusing aged buffers, before requesting more memory from the system.

The memory manager generates statistics that can be viewed with the SEND EXTRACT or SEND REPLICAT command when used with the CACHEMANAGER option.The statistics show the size of the memory pool, the paging frequency, the size of the transactions, and other information that creates a system profile. Based on this profile, you might need to make adjustments to the memory cache if you see performance problems that appear to be related to file caching. The first step is to modify the CACHESIZE and CACHEPAGEOUTSIZE parameters. You might need to use a higher or lower cache size, a higher or lower page size, or a combination of both, based on the size and type of transactions that are being generated. You might also need to adjust the initial memory allocation with the CACHEBUFFERSIZE option. It is possible, however, that operating system constraints could limit the effect of modifying any components of the CACHEMGR parameter. In particular, if the operating system has a small per-process virtual memory limit, it will force more file caching, regardless of the CACHEMGR configuration.

Once the CACHESIZE is set to 1 GB, the GoldenGate process will use up to 1 GB virtual memory and then it will use swap space on disk.
If the CACHESIZE is explicitly set in process parameter file then the CACHEMGR will use only 1GB. Otherwise it will default to the Memory size depending upon the platform(32 or 64).If a fixed CACHESIZE is set in the parameter file then it will be taken by the process, if no the default will be taken by the process depending upon the platform. If very low virtual memory limit is set or available in the OS then it will force more file caching. There is always a difference between caching in memory buffers and file caching as it involves read and write i/o’s.

So try to set a default CACHESIZE for the GoldenGate Process (Extract/Replicat). Edit the respective source extract and target replicat parameter files and use the below mentioned CACHEMGR parameter with the options given and restart the processes.

CACHEMGR CACHEBUFFERSIZE 64KB, CACHESIZE 1GB, CACHEDIRECTORY
, CACHEDIRECTORY
Example:
CACHEMGR CACHEBUFFERSIZE 64KB, CACHESIZE 1GB, CACHEDIRECTORY /ggs/dirtmp, CACHEDIRECTORY /ggs2/temp

So once the CACHESIZE is set to 1 GB, the GoldenGate process will use up to 1 GB virtual memory only and then after it will use swap space on disk.

Will Goldengate use Remote Procedure Call (RPC)?

Question:
My Customer using Oracle GoldenGate to sync data from 11.2.0.1 RAC (on Solaris 10 SPRAc 64bit) to 9.2.0.8 standalone DB (On solaris Sparc 64bit). My customer recently doing an OS security scan check for node of 11.2.0.1 RAC. The security scan check result saying the RAC node (HK8SP226) has a potential security issue on “rpcstatd: RPC statd remote file creation and removal”. My customer suspect that Goldengate is using RPC for file transfer. Would you please help to check if goldengate need to use RPC or not? if goldengate dont need to use RPC, then i will ask my csutomer to close this function in server

Supplementory information for security check result.

———————————————————————-

hk8sp226 {172.20.104.178} Solaris
H
rpcstatd: RPC statd remote file creation and removal
Remote Procedure Call (RPC) statd maintains state information in cooperation with RPC lockd to provide crash and recovery
functionality for file locking across the Network File System (NFS). Statd does not validate information received from a remote lockd.
By sending to the statd service an RCP or RDIST request including references to the parent directory (“..”), an attacker can provide
false information to the rpc.statd file, allowing the creation of a file in an arbitrary directory on the host. This can be used to overwrite
pre-existing files or create new files on the host.

Answer:
Oracle GoldenGate transfers trail files over TCP/IP to the remote host which internally uses RPC for transferring the data to remote. You could let customer know about it and could exclude it from hardening.

配置GoldenGate意外处理及记录Oracle错误信息(1)

GoldenGate自身不提供异常处理的程序。在默认情况下,若Replicat进程遭遇操作故障都会导致其异常终止(ABEND),同时将会回滚事务到最近的检查点。在生产环境中这往往并不理想。当我们在做无缝的数据迁移时会用到HANDLECOLLISIONS和NOHANDLECOLLISIONS参数,这2个参数控制了Replicat是否会试图解决重复记录和缺失记录的错误,但这错误真的应该被忽略吗?这个问题只有熟悉应用的维护人员才能解答,我们需要做的是在出现这类错误后及时记录错误的相关信息。

这就要求我们在Replicat中配置相关的意外处理程序,该程序用以记录是哪个Replicat进程,是那些数据引起了错误的产生。

接下来我们会尝试建立不同的意外处理程序,用以捕获复制过程中Oracle出现的相关错误,但在这些故障发生后我们允许Replicat继续它的工作:

1. 首先我们要做的是创建用以记录错误相关信息的错误记录表:
SQL> conn maclean/maclean
Connected.
SQL> create table exception_log
  2  ( replicat_name varchar2(10),
  3    table_name varchar2(100),
  4    errno number,
  5    dberrmsg varchar2(4000),
  6    optype varchar2(20),
  7    errtype varchar2(20),
  8    logrba number,
  9    logposition number,
 10    committimestamp timestamp);

Table created.

SQL> alter table exception_log add primary key (logrba,logposition,committimestamp);
Table altered.

以上意外处理记录表应建立于Goldengate管理员账户名下,可用以记录所有Replicat的意外数据。

2.修改各Replicat的参数文件添加相关的意外处理程序:
GGSCI (rh3.oracle.com) 59> view params rep1

replicat rep1
userid maclean,password maclean
ASSUMETARGETDEFS
discardfile /s01/discard/rep1.log,append,megabytes 10
REPERROR (DEFAULT, EXCEPTION)
REPERROR (DEFAULT2,ABEND)
map defs.tbc, target defs.tbc;
map defs.tbc, target maclean.exception_log, 
EXCEPTIONSONLY,
INSERTALLRECORDS,
COLMAP (   replicat_name = "rep1"
, table_name = @GETENV ("GGHEADER", "TABLENAME")
, errno = @GETENV ("LASTERR", "DBERRNUM")
, dberrmsg = @GETENV ("LASTERR", "DBERRMSG") 
, optype = @GETENV ("LASTERR", "OPTYPE")
, errtype = @GETENV ("LASTERR", "ERRTYPE")
, logrba = @GETENV ("GGHEADER", "LOGRBA")
, logposition = @GETENV ("GGHEADER", "LOGPOSITION")
, committimestamp = @GETENV ("GGHEADER", "COMMITTIMESTAMP"));

REPERROR参数用以控制Replicat进程如何响应映射过程中发生的错误
DEFAULT参数代表一种全局错误类型,即除去所有已明确指定的错误外的一切错误
DEFAULT2参数代表当DEFAULT错误以Exception方式响应时,所有MAP映射中未定义Exception部分出现的所有错误

3.停止并重启Replicat 进程
GGSCI (rh3.oracle.com) 60> stop rep1

Sending STOP request to REPLICAT REP1 ...
Request processed.


GGSCI (rh3.oracle.com) 61> start rep1

Sending START request to MANAGER ...
REPLICAT REP1 starting

4.以上完成了对单表defs.tbc意外处理的配置,接着我们可以启动相关的应用了

5.通过某些手动篡改数据或不同步操作,可以很容易地触发意外处理而将相关错误信息记录到我们的exception_log表中,我们可以来查看
相关记录:
SQL> col dberrmsg for a1;
SQL> col table_name for a10;
SQL> select * from exception_log;
 
REPLICAT_NAME TABLE_NAME      ERRNO D OPTYPE               ERRTYPE                  LOGRBA LOGPOSITION COMMITTIMESTAMP
------------- ---------- ---------- - -------------------- -------------------- ---------- ----------- ---------
rep1          DEFS.TBC         1403   PK UPDATE            DB                          231    59259920 23-DEC-10 1
rep1          DEFS.TBC         1403   PK UPDATE            DB                          231    59260812 23-DEC-10 1
rep1          DEFS.TBC         1403   PK UPDATE            DB                          231    94620688 23-DEC-10 1
rep1          DEFS.TBC         1403   PK UPDATE            DB                          231    94621580 23-DEC-10 1
rep1          DEFS.TBC         1403   PK UPDATE            DB                          231    94682640 23-DEC-10 1
rep1          DEFS.TBC         1403   PK UPDATE            DB                          231    94683532 23-DEC-10 1
 
6 rows selected

可以看到以上为基于主键更新时出现了1403 "no data found"的Oracle常规错误。

意外处理程序所能记录的相关信息还不于止此,我们还可以用它来记录如Update操作的前后数据镜像,这些信息可以在冲突解决时派上用场。

How to handle ddl in GoldenGate environment without ddl replication?

Question:gg only config to replicate dml, when source has some ddl changes, how to handle in source and target?

Answer:
GoldenGate capture the dml changes only by default,if you have made any changes in DDL structure,then stop the extract and replicat process and make changes in DDL structure and restart the extract and replicat process,then only extract can be able to capture the changes done at DDL structure.

It is advisible that stop extract,pump and replicat process make changes in the DDL and restart the extract,pump and replicat.
The above suggested procedure will not make any impact in DDL replication.

Also you can follow the below procedure
1. stop extract
2. stop data pump
3. there are lag in replicat, so replicat still running to reduce lag
4. do ddl in source, when replicat is running
5. stop replicat untill lag=0
6.do ddl in target
7.start extract and pump and replicat

Question:After do ddl in source, whether need to do delete and re-add the trandata on the table which is been changed?

Answer:
yes,you are correct,After doing DDL changes,you need to do delete and re-add the trandata on the table which is being changed.

GoldenGate Build for Oracle 8i

BUILD REQUEST INSTALLATION MEDIA
BUILD REQUEST ORACLE GOLDENGATE BUILD FOR 8I -32BIT ON WINDOWS (XP, 2003) 1040_WINNT.zip
BUILD REQUEST SUN SOLARIS 5.8 SPARC (64-BIT) , ORACLE 8I GG VERSION 10.0.0.52_00 ggs_solaris8_sparc_ora81_32bit_v10_0_0_52_001.tar
GGS V10.4.0.31.001 BUILD FOR HPUX 11.11 PA-RISC 64BIT / ORACLE 8.1 ggs_hpux1111_pa_ora81_32bit_v10_4_0_31_001.tar

GoldenGate实现Live Standby主备库切换(2)

在《GoldenGate实现Live Standby主备库切换(1)》中我们介绍了如何针对GoldenGate Live standby环境执行计划内的Switchover切换。除去计划内的主备切换,实际生产中更多的故障切换发生在主机故障或主库不可用的情况下,这种情况下一般我们已经无法在Primary上停止应用及extract了;当我们在这样的情况下failover到Standby上后如同在DataGuard环境下一样即便Primary上的数据库恢复了我们也无法直接进行回切了,需要做的是重新配置Primary上的OGG并将Standby上的数据以initial load的形式还原回去,在数据重新同步后才能再切换到Primary上。下面我们就来介绍如何在计划外的情况下从主库failover到备库,并尝试回切:

1.
使用lag replicat命令了解standby上的replicat的延迟情况,若返回"At EOF (end of file)"则说明replicat已应用所有trail中的数据到备库上。
GGSCI (rh3.oracle.com) 1> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING                                           
EXTRACT     STOPPED     EXTSTD2     00:00:00      23:42:47    
EXTRACT     STOPPED     PUMPSTD2    00:00:00      23:41:29    
REPLICAT    RUNNING     REPSTD1     00:00:00      00:00:00  

GGSCI (rh3.oracle.com) 5> lag replicat repstd1

Sending GETLAG request to REPLICAT REPSTD1 ...
Last record lag: 5 seconds.
At EOF, no more records to process.

2.
停止standby上的replicat
GGSCI (rh3.oracle.com) 6> stop replicat repstd1

Sending STOP request to REPLICAT REPSTD1 ...
Request processed.

3.
在standby上执行必要的赋予DML权限,启动triggers触发器和cascade delete约束的脚本

4.
启动standby上的extract,
在此之前先确认Standby上的data pump group不被启动,以保证trail文件堆积在standby上
GGSCI (rh3.oracle.com) 15> info all
Program     Status      Group       Lag           Time Since Chkpt
MANAGER     RUNNING                                           
EXTRACT     STOPPED     EXTSTD2     00:00:00      24:04:16    
EXTRACT     STOPPED     PUMPSTD2    00:00:00      24:02:57    
REPLICAT    STOPPED     REPSTD1     00:00:00      00:00:06    

GGSCI (rh3.oracle.com) 16> start extstd2
Sending START request to MANAGER ...
EXTRACT EXTSTD2 starting

5.
此时可以将应用切换到standby上了

==============================================================================
以上步骤完成了故障切换到Standby的过程,接下来我们尝试将应用还原到primary上
1.如果原primary主机已损毁则需要重装Oracle软件,并重建Primary系统上的Goldengate软件目录
2.从primary端启动GGSCI命令
3.删除primary上相关的extract及EXTTRAIL,并重建

GGSCI (rh2.oracle.com) 6> delete extract extstd1
Deleted EXTRACT EXTSTD1.

GGSCI (rh2.oracle.com) 7> delete exttrail /d01/ext/cl

GGSCI (rh2.oracle.com) 14> add extract extstd1,tranlog,begin now
EXTRACT added.

GGSCI (rh2.oracle.com) 15> add exttrail /d01/ext/cl,megabytes 100,extract extstd1
EXTTRAIL added.

4.
在primary上启动Manager
GGSCI (rh2.oracle.com) 18> start Manager
Manager started.

5.
接着在primary上执行disable trigger触发器和cascade delete约束的脚本

6.
在standby上对执行热备份(逻辑,物理的均可);并记录该热备的结束时间

7.
使用standby上的热备份来完成primary上的initial load后,再以HANDLECOLLISIONS选项启动Standby上的replicat
GGSCI (rh2.oracle.com) 22> view params repstd2

-- Identify the Replicat group:
REPLICAT repstd2
-- State that source and target definitions are identical:
ASSUMETARGETDEFS
-- Specify database login information as needed for the database:
userid maclean, password maclean
HANDLECOLLISIONS
-- Specify tables for delivery:
MAP clinic.*, TARGET clinic.*;
-- Exclude specific tables from delivery if needed:
-- MAPEXCLUDE 


GGSCI (rh2.oracle.com) 23> start replicat repstd2

Sending START request to MANAGER ...
REPLICAT REPSTD2 starting

8.并启动standby上的data pump group,将堆积的trail文件传输到Primary上
GGSCI (rh3.oracle.com) 19> start pumpstd2
Sending START request to MANAGER ...
EXTRACT PUMPSTD2 starting

9.使用info replicat观察primary上的replicat,观察其进度是否已晚于完成初始化导出的时间


10.
禁用primary上目前使用的HANDLECOLLISIONS选项
GGSCI (rh2.oracle.com) 26> send replicat repstd2,NOHANDLECOLLISIONS

11.
关闭之前切换到Standby上的一切应用

12.
12.若需要进行数据验证则关闭Standby上的extract、pump及Primary上的replicat:
GGSCI (rh2.oracle.com) 31> lag replicat repstd2
Sending GETLAG request to REPLICAT REPSTD2 ...
Last record lag: 3 seconds.
At EOF, no more records to process.

GGSCI (rh3.oracle.com) 28> stop extstd2
Sending STOP request to EXTRACT EXTSTD2 ...
Request processed.


GGSCI (rh3.oracle.com) 26> stop pumpstd2
Sending STOP request to EXTRACT PUMPSTD2 ...
Request processed.

GGSCI (rh2.oracle.com) 34> stop replicat repstd2
Sending STOP request to REPLICAT REPSTD2 ...
Request processed.

/* 使用Oracle GoldenGate Veridata等工具验证数据一致性,
   若不一致则修复
*/

standby库上:
SQL> select sum(t2) from tv;

   SUM(T2)
----------
5355944997

primary库上:
SQL> select sum(t2) from tv;

   SUM(T2)
----------
5355944997

13.在primary系统上赋予应用相关DML权限,启用触发器及删除约束

14.
修改primary系统上的extract group的begin time为当前,启动Primary到Standby的extract、pump及replicat
GGSCI (rh2.oracle.com) 36> alter extstd1 ,begin now
EXTRACT altered.


GGSCI (rh2.oracle.com) 52> start extract extstd1

Sending START request to MANAGER ...
EXTRACT EXTSTD1 starting


GGSCI (rh2.oracle.com) 53> start extract pumpstd1

Sending START request to MANAGER ...
EXTRACT PUMPSTD1 starting


GGSCI (rh3.oracle.com) 3> start repstd1

Sending START request to MANAGER ...
REPLICAT REPSTD1 starting

此时系统切换回原始的primary->standby状态.
That's great!

How to understand goldengate report file statistics

- Are total process records = inserts + updates + deletes +discards + ignores?
Generally total process records = inserts + updates + deletes +discards + ignores

- Are update collisions is part of updates ?
Yes

- Are delete collisions is part of deletes?
Yes

- Are total successfully process records = inserts + updates + deletes?
Yes

- Can HANDLECOLLISIONS handle ORA-00001 error?
No, HANDLECOLLISIONS can not handle ORA-00001 error. This error is an unique constraint violation error.
HANDLECOLLISIONS control whether or not replicate tries to resolve duplicate-record
and missing-record errors when applying SQL on the target.
These errors can occur during an initial load, when data from source tables is
being loaded to target tables while Oracle Golden Gate is replicating
transactional changes that are being made to those tables.
You can read more about HANDLECOLLISIONS and NOHANDLECOLLISIONS in Oracle® GoldenGate Windows and UNIX Reference Guide.

- How to find the value of bind variable when error is occured by update statement?
In case you want to capture conflict records, you can capture the conflict records to discard file.
Also the conflict record can be written to trace table.

- How to capture the conflict records to discard file?
A discard file is generated when the error condition is encountered by the extract or replicat,
and there is a database error generated from a DDL operation.
Discard file is generally used to do the troubleshooting.

The location of the discard file, is determined by the DISCARDFILE parameter in the Extract or Replicat parameter file.

DISCARDFILE  [, APPEND | PURGE | MEGABYTES ]

Where:

<file name> is the discard file name.
APPEND adds new content to existing content if the file already exists.
PURGE purges the file vefore writing new content.
MEGABYTES <N> sets the maximum size of the file, in megabytes. The default is 1MB.