2009年9月4日

RAC动态资源(DRM)管理介绍

mac

Author

8 min

Read Time

369

Views

以下文本摘自: metalink doc 390483.1 Subject: DRM - Dynamic Resource management Doc ID: 390483.1 Type: BULLETIN Modified Date : 13-JAN-2009 Status: PUBLISHED In this Document Purpose Scope and Application DRM - Dynamic Resource management DRM - Dynamic Resource Mastering References Applies to: Oracle Server - Enterprise Edition - Version: 10.1.0.2 to 11.1.0 Oracle Server - Standard Edition - Version: 10.1.0.2 to 11.1.0 Information in this document applies to any platform. Oracle Real application Clusters Purpose To describe the concept of DRM (Dynamic Resource Mastering) Scope and Application This note in intended for experienced Real application cluster DBA's DRM - Dynamic Resource management DRM - Dynamic Resource Mastering When using Real application Clusters (RAC), Each instance has its own SGA and buffer cache. RAC will ensure that these block changes are co -ordinated to maximize performance and to ensure data intergrity. Each copy of the buffer also called as a cache resource has a master which is one of the nodes of the cluster. In database releases before 10g (10.1.0.2) once a cache resource is mastered on an instance, a re-mastering or a change in the master would take place only during a reconfiguration that would happen automatically during both normal operations like instance startup or instance shutdown or abnormal events like Node eviction by Cluster Manager. So if Node B is the master of a cache resource, this resource will remain mastered on Node B until reconfiguration. 10g introduces a concept of resource remastering via DRM. With DRM a resource can be re-mastered on another node say from Node B to Node A if it is found that the cache resource is accessed more frequently from Node A. A reconfiguration is no longer the only reason for a resource to be re-mastered. In 10gR1 DRM is driven by affinity of files and in 10gR2 it is based on objects. Sample LMD trace file during a DRM operation

Begin DRM(202) - transfer pkey 4294951314 to 0 oscan 1.1
*** 2006-08-01 17:34:54.645
Begin DRM(202) - transfer pkey 4294951315 to 0 oscan 1.1
*** 2006-08-01 17:34:54.646
Begin DRM(202) - transfer pkey 4294951316 to 0 oscan 1.1
*** 2006-08-01 17:34:54.646
Begin DRM(202) - transfer pkey 4294951317 to 0 oscan 1.1

DRM attributes are intentionally undocumented since they may change depending on the version. These attributes should not be changed without discussing with Support. @DRM is driven by the following @ 1.) _gc_affinity_time = Time in minutes at which statistics will be evaluated (default = 10 mins) @ 2.) _gc_affinity_limit = # of times a node accesses a file/object (default = 50) @ 3.) _gc_affinity_minimum = minimum # of times per minute a file/object is accessed before affinity kicks in @ (default = 600 per minute per cpu ) It is important to note that

Two instance will not start a DRM operation at the same time however lmd,lms,lmon processes from all instances collectively take part in the DRM operation.
Normal activity on the database is not affected due to DRM. This means users continue insert/update/delete operations without any interruptions. Also DRM operations complete very quickly.

@ Disable DRM Generally DRM should not be disabled unless Oracle Support/Development has suggested turning it off due to some known issues. @To disable DRM, set @To disable DRM, set @_gc_affinity_time=0 # Only if DB version is 10.1 or 10.2 @_gc_undo_affinity=FALSE # Only if Db version is 10.2 @_gc_policy_time=FALSE # Only if DB version is 11.1 or higher @_gc_affinity_time has been renamed to _gc_policy_time in 11g

admin 2010-08-02

quote: "10g Real Application Clusters introduced a concept of resource remastering via Dynamic Resource Mastering (DRM). With DRM a resource can be re-mastered on another node in the cluster if it is found that the cache resource is accessed more frequently from that node. In 10G R1 this was file based whereas the 10G R2 it is object based. In 10G R1 due to a few bugs many related to high CPU usage during the DRM freeze window most customers disabled DRM by setting the following parameters _gc_affinity_time=0 _gc_undo_affinity=FALSE _gc_affinity_time defines the frequency in minutes to check if remastering is needed. _gc_affinity_limit defines the number of times a node must access an object for it to be a DRM candidate _gc_affinity_minimum defines the minimum number of times an object is accessed per minute before affinity kicks in The performance problems may manifest themselves in terms of a DRM related wait event like 'gcs drm freeze in enter server mode' In 10G R2 this feature appears to be more stable. You can also manually remaster an object on a different node which is different from the node on which the object is currrently mastered as shown below SQL> select object_id,current_master, previous_master ,remaster_cnt from V$GCSPFMASTER_INFO where object_id = 144615 OBJECT_ID CURRENT_MASTER PREVIOUS_MASTER REMASTER_CNT ---------- -------------- --------------- ------------ 144615 0 2 0 The object 144615 is currently mastered on node 0. To remaster the object onto node 2 connect to node 2 as sysdba Go to instance 2 NODE2> oradebug setmypid Statement processed. NODE2> oradebug lkdebug -m pkey 144615 Statement processed. NODE2> select object_id,current_master, previous_master ,remaster_cnt from V$GCSPFMASTER_INFO where object_id = 144615 OBJECT_ID CURRENT_MASTER PREVIOUS_MASTER REMASTER_CNT ---------- -------------- --------------- ------------ 144615 2 0 0 Note: In V$GCSPFMASTER_INFO you will also see resources with object ids in the 4Gb range (e.g. 4294950913) These are for undo segments. To dissolve remastering of this object on this instance SQL> oradebug lkdebug -m dpkey 144615 Statement processed. SQL> select object_id,current_master, previous_master ,remaster_cnt from V$GCSPFMASTER_INFO where object_id = 144615; no rows selected The remaster_cnt appears to be 0 for all objects. I have got Oracle to log bug 5649377 on this issue. SQL> select distinct remaster_cnt from V$GCSPFMASTER_INFO ; REMASTER_CNT ------------ 0 DRM statistics are available in X$KJDRMAFNSTATS SQL> select * from X$KJDRMAFNSTATS 2 / ADDR INDX INST_ID DRMS AVG_DRM_TIME OBJECTS_PER_DRM QUISCE_T FRZ_T CLEANUP_T REPLAY_T FIXWRITE_T SYNC_T -------- ---------- ---------- ---------- ------------ --------------- ---------- ---------- ---------- ---------- ---------- ---------- RES_CLEANED REPLAY_S REPLAY_R MY_OBJECTS ----------- ---------- ---------- ---------- 200089CC 0 1 32 214 1 3 14 0 0 0 99 0 2441 6952 30 The column MY_OBJECTS denotes the number of objects mastered on that node. This should match with the following SQL> select count(*) from V$GCSPFMASTER_INFO where current_master=0 2 / COUNT(*) ---------- 30 "

admin 2010-08-14

Hdr: 5649377 10.2.0.1.0 RDBMS 10.2.0.1.0 DLM PRODID-5 PORTID-46 Abstract: REMASTER_CNT IN V$GCSPFMASTER_INFO IS NOT GETTING UPDATED PROBLEM: -------- When a object is getting remastered the column REMASTER_CNT in V$GCSPFMASTER_INFO is not getting updated DIAGNOSTIC ANALYSIS: -------------------- SQL> create table t1 ( a number); Table created. SQL> begin 2 for i in 1..100 3 loop 4 insert into t1 values(i); 5 end loop; 6 commit; 7 end; 8 / PL/SQL procedure successfully completed. SQL> select count(*) from t1; COUNT(*) ---------- 100 SQL> select * from t1; A Run select * from t1 multiple times and get the object mastered on any node. In our case the object is currently mastered on the node elephant elephant> select * from V$GCSPFMASTER_INFO where object_id = 290525; FILE_ID OBJECT_ID CURRENT_MASTER PREVIOUS_MASTER REMASTER_CNT ---------- ---------- -------------- --------------- ------------ 0 290525 0 32767 0 Force a remastering onto 2nd node hippo hippo>>oradebug lkdebug -m pkey 290525 Statement processed. hippo>> select * from V$GCSPFMASTER_INFO where object_id = 290525; FILE_ID OBJECT_ID CURRENT_MASTER PREVIOUS_MASTER REMASTER_CNT ---------- ---------- -------------- --------------- ------------ 0 290525 2 0 0 The remaster_cnt has not increased. Force a remastering onto 3rd node rhino rhino> oradebug lkdebug -m pkey 290525 Statement processed. rhino> select * from V$GCSPFMASTER_INFO where object_id = 290525; FILE_ID OBJECT_ID CURRENT_MASTER PREVIOUS_MASTER REMASTER_CNT ---------- ---------- -------------- --------------- ------------ 0 290525 1 0 0 Again the remaster count has not increased. WORKAROUND: ----------- NA RELATED BUGS: ------------- NA REPRODUCIBILITY: ---------------- Reproducible TEST CASE: ---------- STACK TRACE: ------------ SUPPORTING INFORMATION: ----------------------- 24 HOUR CONTACT INFORMATION FOR P1 BUGS: ---------------------------------------- DIAL-IN INFORMATION: -------------------- IMPACT DATE: ------------ REDISCOVERY INFORMATION: Remaster count (REMASTER_CNT) in V$GCSPFMASTER_INFO is always 0, even after remastering has occurred. WORKAROUND: None RELEASE NOTES: ]]When dynamic remastering occurred, the remaster_cnt column in the dynamic ]]view v$gcspfmaster_info was not being updated. This has now been fixed.

maclean 2010-08-18

lmon Process Terminated With Error Ora-481 Applies to: Oracle Server - Enterprise Edition - Version: 10.1.0.4 to 10.2.0.2 This problem can occur on any platform. RAC 10.1.0.3 (or 10.1.0.4) Symptoms Instance finished due to ORA-481 in lmon. Cause This can be produced for the internal bug: 3659289 Lmon Took Off Line During Remastering Sync. The bug matches: 1. lmon produces the ORA-481 2. instance crashed 3. lmon failed to get an anwer from the lmd during a dynamic reconfiguration in the same step: * kjfcdrmrfg: SYNC TIMEOUT while waiting for lmd in sync step 31 .. sync() timed out - lmon exiting 4. a quiesce step was executed before the error: Solution Before 10.2.0.3, we should not use DRM. The main reason is that we have some issues fixed in 10.2.0.3.0 which cannot be backported to earlier versions. To disable DRM the following parameters must be set on the instances: - 10gR1 _gc_affinity_time=0 - 10gR2 _gc_undo_affinity=FALSE _gc_affinity_time=0

maclean 2010-08-18

Hdr: 5181297 10.2.0.1.0 RDBMS 10.2.0.1.0 RAC PRODID-5 PORTID-197 ORA-481 Abstract: LMON TERMINATES INSTANCE DUE TO SYNC TIMEOUT (STEP 31) - ORA-481 PROBLEM: -------- o 4 node HP-UX Itanium RAC cluster running 10.2.0.1.0 o each node has 48 processors o customer running migration testing, replacing PA RISC with Itanium processors o on Apr 17 at around 23:15 instance on node 1 terminated due to LMON dying o LMON apparently terminated due to a timeout recorded in the LMON trace file kjfcdrmrfg: SYNC TIMEOUT (48358, 47457, 900), step 31 o prior to the instance termination all instances were experiencing a hang o after instance 1 died other instance "unfroze" DIAGNOSTIC ANALYSIS: -------------------- There is no obvious reason for the instance termination. Resource starvation appears unlikely as the CPU load during the time frame in qusestion is rather low. The highest load on some processors is 20%. WORKAROUND: ----------- none RELATED BUGS: ------------- REPRODUCIBILITY: ---------------- intermittent TEST CASE: ---------- n/a STACK TRACE: ------------ LMON trace shows the following call stack: ksedst ksedmp $cold_ksddoa ksdpcg ksdpec ksfpec kgesev ksesec0 kjfcdrmrfg kjfcln ksbrdp opirip opidrv sou2o opimai_real main main_opd_entry SUPPORTING INFORMATION: ----------------------- 24 HOUR CONTACT INFORMATION FOR P1 BUGS: ---------------------------------------- DIAL-IN INFORMATION: -------------------- IMPACT DATE: ------------ on ngmp1. ========= From ngmp1_lmon_15085.trc ------------------------- *** 23:15:44.052 kjfcdrmrfg: SYNC TIMEOUT (48358, 47457, 900), step 31 Submitting asynchronized dump request [28] ... ... <> error 481 detected in background process ORA-481: LMON process terminated with error *** 23:16:37.822 ksuitm: waiting for [5] seconds before killing DIAG From alert.log -------------- Mon Apr 17 23:16:06 2006 Errors in file /oracle/app/oracle/admin/NGMP/bdump/ngmp1_lmon_15085.trc: ORA-481: LMON process terminated with error ... ... <> Mon Apr 17 23:16:42 2006 Instance terminated by LMON, pid = 15085 *** 04/23/06 10:55 pm *** on ngmp2 ======== From alert.log -------------- Mon Apr 17 23:15:23 2006 GES: Potential blocker (pid=23648) on resource SS-0x3-0x2; enqueue info in file /oracle/app/oracle/admin/NGMP/udump/ngmp2_ora_3359.trc and DIAG trace file Mon Apr 17 23:16:08 2006 Trace dumping is performing id=[cdmp_20060417231607] Mon Apr 17 23:16:47 2006 Reconfiguration started (old inc 16, new inc 18) List of nodes: 1 2 3 on ngmp3 ======== From alert.log -------------- Mon Apr 17 23:16:08 2006 Trace dumping is performing id=[cdmp_20060417231607] Mon Apr 17 23:16:47 2006 Reconfiguration started (old inc 16, new inc 18) List of nodes: 1 2 3 on ngmp4 ======== From alert.log -------------- Mon Apr 17 23:15:23 2006 GES: Potential blocker (pid=23803) on resource CI-0xe-0x1; enqueue info in file /oracle/app/oracle/admin/NGMP/bdump/ngmp4_lmd0_23655.trc and DIAG trace file Mon Apr 17 23:16:08 2006 Trace dumping is performing id=[cdmp_20060417231607] Mon Apr 17 23:16:47 2006 Reconfiguration started (old inc 16, new inc 18) List of nodes: 1 2 3

admin 2010-08-18

Hdr: 5960580 10.2.0.2 RDBMS 10.2.0.2 RAC PRODID-5 PORTID-23 5181297 Abstract: LMON CRASHING WITH ORA-481 ERROR PROBLEM: -------- LMON crashed with ora-481 on Mon Mar 26 18:53:47 2007. Ct said that a particular job which was completing in 3.5 hours on Friday March 23, was taking 5 hours on Saturday March 24 and after the crash (March 26) is taking 12 to 15 hours. I am not sure whether the crash and the performance are related. The crash did NOT happen when the job was running. Let us consider this bug for only the LMON crash. DIAGNOSTIC ANALYSIS: -------------------- alert_PFNR1011.log: ========================== Mon Mar 26 18:53:36 2007 WARNING: inbound connection timed out (ORA-3136) Mon Mar 26 18:53:36 2007 WARNING: inbound connection timed out (ORA-3136) Mon Mar 26 18:53:36 2007 WARNING: inbound connection timed out (ORA-3136) Mon Mar 26 18:53:36 2007 WARNING: inbound connection timed out (ORA-3136) Mon Mar 26 18:53:36 2007 ??????? Mon Mar 26 18:53:47 2007 Errors in file /oracle/g01/admin/PFNR1011/bdump/pfnr1011_lmon_1349.trc: ORA-481: LMON process terminated with error Mon Mar 26 18:53:47 2007 LMON: terminating instance due to error 481 Mon Mar 26 18:53:47 2007 Errors in file /oracle/g01/admin/PFNR1011/bdump/pfnr1011_lck0_1767.trc: ORA-481: LMON process terminated with error Mon Mar 26 18:53:47 2007 Errors in file /oracle/g01/admin/PFNR1011/bdump/pfnr1011_dbw2_1485.trc: ORA-481: LMON process terminated with error Mon Mar 26 18:53:47 2007 Errors in file /oracle/g01/admin/PFNR1011/bdump/pfnr1011_lms0_1354.trc: ORA-481: LMON process terminated with error Mon Mar 26 18:53:47 2007 Errors in file /oracle/g01/admin/PFNR1011/bdump/pfnr1011_lmd0_1352.trc: ORA-481: LMON process terminated with error Mon Mar 26 18:53:47 2007 Errors in file /oracle/g01/admin/PFNR1011/bdump/pfnr1011_pmon_1341.trc: ORA-481: LMON process terminated with error Mon Mar 26 18:53:47 2007 Errors in file /oracle/g01/admin/PFNR1011/bdump/pfnr1011_lms2_1363.trc: ORA-481: LMON process terminated with error Mon Mar 26 18:53:47 2007 Errors in file /oracle/g01/admin/PFNR1011/bdump/pfnr1011_lms4_1410.trc: ORA-481: LMON process terminated with error Mon Mar 26 18:53:47 2007 Errors in file /oracle/g01/admin/PFNR1011/bdump/pfnr1011_lms6_1425.trc: ORA-481: LMON process terminated with error Mon Mar 26 18:53:47 2007 System state dump is made for local instance Mon Mar 26 18:53:47 2007 Errors in file /oracle/g01/admin/PFNR1011/bdump/pfnr1011_lms1_1359.trc: ORA-481: LMON process terminated with error Mon Mar 26 18:53:47 2007 Errors in file /oracle/g01/admin/PFNR1011/bdump/pfnr1011_lms3_1397.trc: ORA-481: LMON process terminated with error Mon Mar 26 18:53:47 2007 Errors in file /oracle/g01/admin/PFNR1011/bdump/pfnr1011_lms5_1419.trc: ORA-481: LMON process terminated with error Mon Mar 26 18:53:47 2007 Errors in file /oracle/g01/admin/PFNR1011/bdump/pfnr1011_lms7_1432.trc: ORA-481: LMON process terminated with error System State dumped to trace file /oracle/g01/admin/PFNR1011/bdump/pfnr1011_diag_1344.trc Mon Mar 26 18:53:47 2007 Errors in file /oracle/g01/admin/PFNR1011/bdump/pfnr1011_ckpt_1513.trc: ORA-481: LMON process terminated with error Mon Mar 26 18:53:47 2007 Errors in file /oracle/g01/admin/PFNR1011/bdump/pfnr1011_j002_5622.trc: ORA-481: LMON process terminated with error Mon Mar 26 18:53:47 2007 Errors in file /oracle/g01/admin/PFNR1011/bdump/pfnr1011_dbw3_1491.trc: ORA-481: LMON process terminated with error Mon Mar 26 18:53:47 2007 Errors in file /oracle/g01/admin/PFNR1011/bdump/pfnr1011_j003_5637.trc: ORA-481: LMON process terminated with error Mon Mar 26 18:53:52 2007 Instance terminated by LMON, pid = 1349 Mon Mar 26 18:54:50 2007 Starting ORACLE instance (normal) Mon Mar 26 18:55:29 2007 Reviewing pfnr1011_lmon_1349.trc: =================================== *** 18:53:27.122 kjfcdrmrfg: SYNC TIMEOUT (275372, 274471, 900), step 31 Submitting asynchronized dump request [28] kjctseventdump-end tail 205 heads 0 @ 0 205 @ 1047039906 sync() timed out - lmon exiting kjfsprn: sync status inst 0 tmout 900 (sec) kjfsprn: sync propose inc 8 level 396 kjfsprn: sync inc 8 level 396 waiting for 'ges remote message' blocking sess=0x0 seq=4510 wait_time=0 seconds since wait started=442 waittime=40, loop=0, p3=0 Dumping Session Wait History for 'ges remote message' count=1 wait_time=746343 waittime=40, loop=0, p3=0 WORKAROUND: ----------- no RELATED BUGS: ------------- similar to bug 5399702 - base bug 5181297 could be bug 4947571 - base bug 4940890 REPRODUCIBILITY: ---------------- no TEST CASE: ---------- no STACK TRACE: ------------ *** 18:53:27.159 Dumping diagnostic information for ospid 1352: OS pid = 1352 loadavg : 1.44 1.39 1.39 swap info: free_mem = 34099.71M rsv = 27064.04M alloc = 26321.46M avail = 95169.25 swap_free = 95911.83M F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD 0 S oracle 1352 1 0 40 20 ? 2646619 ? Mar 23 console 22:22 ora_lmd0_PFNR1011 1352: ora_lmd0_PFNR1011 ----------------- lwp# 1 / thread# 1 -------------------- ffffffff7a8cde8c ioctl (9, 373f, 105e6ba10) ffffffff7fffc910) + 15f4 710 0000000100e662e0 ksliwat (105c00, 2, 8, 91a000160, 0, 0) + b60 0000000100e66820 kslwaitns (8, 1, 32, 0, 40, 0) + 20 00000001010d61f4 kskthbwt (8, 1, 32, 0, 40, 0) + d4 0000000100e667dc kslwait (8, 32, 0, 15e, 190, 0) + 5c 00000001010e8344 ksxprcv (104fb4, 105d68558, 104fb4128, 1468, 105d68, 104fb4000) + 364 0000000101591254 kjctr_rksxp (40, 403fe88f8, 0, ffffffff7fffd978, 14, ffffffff7fffd974) + 1f4 0000000101592e24 kjctrcv (ffffffff79626208, 403fe88f8, 105e992c0, ffffffff7fffe1bc, 40, 32) + 164 000000010157f6a0 kjcsrmg (ffffffff796261f0, 0, 40, 32, 0, 105d71) + 60 00000001015dc098 kjmdm (a, 44, 4097ab030, 8, 4097ab030, 0) + 2ff8 0000000101002a60 ksbrdp (105d6b, 380007774, 380000, 38000e, 105c00, 1015d90a0) + 380 00000001024219f8 opirip (105d75000, 105c00, 105d7d, 380007000, 105d75, 105df1ae0) + 338 00000001002fe790 opidrv (105d77d18, 1, 32, 0, 32, 105c00) + 4b0 00000001002f8e30 sou2o (ffffffff7ffff468, 32, 4, ffffffff7ffff490, 1056ac000, 1056ac) + 50 00000001002bc2ec opimai_real (3, ffffffff7ffff568, 0, 0, 1e42ee4, 14400) + 10c 00000001002bc118 main (1, ffffffff7ffff678, 0, ffffffff7ffff570, ffffffff7ffff680, ffffffff7aa00140) + 98 00000001002bc03c _start (0, 0, 0, 0, 0, 0) + 17c SUPPORTING INFORMATION: ----------------------- will be uploaded 24 HOUR CONTACT INFORMATION FOR P1 BUGS: ---------------------------------------- Viral Shah - customer - 267-467-6950. This is a sev2 p1 SR. DIAL-IN INFORMATION: -------------------- IMPACT DATE: ------------ Ct is on pre-production. Planning to go production next week. Ct saidd he cannot upgrade to 10.2.0.3.

admin 2010-08-18

Hdr: 9405510 10.2.0.4 RDBMS 10.2.0.4 RAC PRODID-5 PORTID-197 ORA-481 6960699 Abstract: LMON TERMINATES INSTANCE DUE TO SYNC TIMEOUT, STEP 31 PROBLEM: -------- Two nodes cluster. Two node always shutdown abort because of ORA-481 LMON apparently terminated due to a timeout recorded in the LMON trace file kjfcdrmrfg: SYNC TIMEOUT (289446, 288545, 900), step 31 DIAGNOSTIC ANALYSIS: -------------------- alert_CSP1.log.20100203 ------------ Thu Jan 28 17:57:09 2010 Errors in file /cfs/oradata/CSP/admin/CSP/bdump/csp1_lmon_18295.trc: ORA-481: LMON process terminated with error Thu Jan 28 17:57:09 2010 LMON: terminating instance due to error 481 Thu Jan 28 17:57:09 2010 Errors in file /cfs/oradata/CSP/admin/CSP/bdump/csp1_lms0_18299.trc: ORA-481: LMON process terminated with error Thu Jan 28 17:57:09 2010 Errors in file /cfs/oradata/CSP/admin/CSP/bdump/csp1_lms1_18307.trc: ORA-481: LMON process terminated with error Thu Jan 28 17:57:09 2010 Errors in file /cfs/oradata/CSP/admin/CSP/bdump/csp1_lmd0_18297.trc: ORA-481: LMON process terminated with error Thu Jan 28 17:57:09 2010 System state dump is made for local instance System State dumped to trace file /cfs/oradata/CSP/admin/CSP/bdump/csp1_diag_18291.trc Thu Jan 28 17:57:09 2010 Errors in file /cfs/oradata/CSP/admin/CSP/bdump/csp1_pmon_18289.trc: ORA-481: LMON process terminated with error Thu Jan 28 17:57:11 2010 Shutting down instance (abort) alert_CSP2.log.20100203 ------------------ Thu Jan 28 17:57:10 2010 Trace dumping is performing id=[cdmp_20100128175709] Thu Jan 28 17:57:18 2010 Reconfiguration started (old inc 64, new inc 66) List of nodes: 1 From csp1_lmon_18295.trc ========= *** 17:41:47.285 Begin DRM(1328) sent syncr inc 64 lvl 8809 to 0 (64,0/31/0) sent synca inc 64 lvl 8809 (64,0/31/0) ... sent syncr inc 64 lvl 8840 to 0 (64,0/38/0) sent synca inc 64 lvl 8840 (64,0/38/0) End DRM(1328) Begin DRM(1329) *** 17:56:49.084 kjfcdrmrfg: SYNC TIMEOUT (289446, 288545, 900), step 31 Submitting asynchronized dump request [28] == WORKAROUND: ----------- Disable DRM as below _gc_undo_affinity=FALSE _gc_affinity_time=0 RELATED BUGS: ------------- 5181297 REPRODUCIBILITY: ---------------- TEST CASE: ---------- STACK TRACE: ------------ SUPPORTING INFORMATION: ----------------------- This seems to be very much similar to bug 5181297, which is fixed in 10203. But ct is on 10204. 24 HOUR CONTACT INFORMATION FOR P1 BUGS: ---------------------------------------- DIAL-IN INFORMATION: -------------------- IMPACT DATE: ------------

admin 2010-08-18

Hdr: 5407742 10.2.0.2 RDBMS 10.2.0.2 RAC PRODID-5 PORTID-46 ORA-481 Abstract: LMON CRASHE WITH ORA-481 KJFCDRMRFG: SYNC TIMEOUT STEP 31 PROBLEM: -------- 12 nodes with 2 instance on each nodes They ran into bug 4631662 and upgrade to 10.2.0.2 on 7/9/06. Intance 1 of NMC database crashed with ora-481. Thu Jul 20 09:25:19 2006 Errors in file /opt/app/oracle/product/10.2.0/RAC/admin/NMC/bdump/nmc1_lmon_9754.trc: ORA-481: LMON process terminated with error Thu Jul 20 09:25:19 2006 LMON: terminating instance due to error 481 This is their production and can't afford to have instance crashed on the time DIAGNOSTIC ANALYSIS: -------------------- From the lmon trace: nmc1_lmon_9754.trc --------------------------- *** 09:10:01.138 sent syncr inc 718 lvl 17953 to 0 (718,0/31/0) sent synca inc 718 lvl 17953 (718,0/31/0) sent syncr inc 718 lvl 17954 to 0 (718,0/34/0) sent synca inc 718 lvl 17954 (718,0/34/0) sent syncr inc 718 lvl 17955 to 0 (718,0/36/0) sent synca inc 718 lvl 17955 (718,0/36/0) sent syncr inc 718 lvl 17956 to 0 (718,0/38/0) sent synca inc 718 lvl 17956 (718,0/38/0) *** 09:25:04.835 kjfcdrmrfg: SYNC TIMEOUT (924581, 923680, 900), step 31 Submitting asynchronized dump request [28] KJC Communication Dump: state 0x5 flags 0x0 mode 0x0 inst 0 inc 718 nrcv 3 nsp 3 nrcvbuf 1000 reg_msg: sz 420 cur 216 (s:0 i:216) max 925 ini 2350 big_msg: sz 4128 cur 38 (s:0 i:38) max 205 ini 2350 rsv_msg: sz 4128 cur 0 (s:0 i:0) max 0 tot 1000 rcvr: id 2 orapid 8 ospid 9788 rcvr: id 1 orapid 7 ospid 9784 rcvr: id 0 orapid 6 ospid 9756 ................ kjctseventdump-end tail 247 heads 0 @ 0 247 @ 931725070 247 @ 931725070 247 @ 931725070 247 @ 931725070 247 @ 931725070 247 @ 931725070 247 @ 931725070 247 931725070 247 @ 931725070 247 @ 931725070 247 @ 931725070 sync() timed out - lmon exiting kjfsprn: sync status inst 0 tmout 900 (sec) kjfsprn: sync propose inc 718 level 17956 kjfsprn: sync inc 718 level 17956 kjfsprn: sync bitmap 0 1 2 3 4 5 6 7 8 9 10 11 kjfsprn: dmap ver 718 (step 0) kjfsprn: ftdone bitmap 0 1 2 3 5 6 7 8 9 10 11 This seem to match Bug 5237240 and that bug is at status 30. WORKAROUND: ----------- none RELATED BUGS: ------------- Bug 5181297 - LMON TERMINATES INSTANCE DUE TO SYNC TIMEOUT (STEP 31) - ORA-481 - fixed in 10.2.0.3 Bug 5131042 - ORA-481: LMON PROCESS TERMINATED WITH ERROR SYNC TIMEOUT - dup of 4903532 - fix in 10.2.0.3 REPRODUCIBILITY: ---------------- TEST CASE: ---------- STACK TRACE: ------------ there is no lmon stack trace SUPPORTING INFORMATION: ----------------------- 24 HOUR CONTACT INFORMATION FOR P1 BUGS: ---------------------------------------- DIAL-IN INFORMATION: -------------------- IMPACT DATE: ------------

评论 (7)