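Readers subscribed to the oracle-l@freelists.org mailing list may have received, a few days ago, an email from Guillermo Alan Bort asking where to find material on Oracle's internal workings. (Incidentally, subscribing to this list is a good way to learn Oracle: it is completely free and many Oracle "gurus" frequent it, although you must pass moderation before you can post. You can find the entry point on freelists.org.)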

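The lair of the well-known Julian Dyke, who has written many original papers; I have learned a great deal from his work http://juliandyke.com
oaktable.net gathers the latest technical articles that many experts publish on their blogs http://www.oaktable.net/members
Dedicated to digging into RAC technology, arguably the best site for studying RAC http://www.oracleracsig.org
The site of the author of Oracle8i Internal Services; it has not been updated for years, but it is a lesson you should not miss http://www.ixora.com.au/
Tanel Poder, possibly the world's first real OCM, has written a lot about internals and performance http://blog.tanelpoder.com/
The WordPress blog of Jonathan Lewis, author of the well-known "Cost-Based Oracle Fundamentals" http://jonathanlewis.wordpress.com/
Something like a library devoted to Oracle material http://www.morganslibrary.org/library.html
Pete Finnigan is an expert on Oracle security and has published many articles on hacking Oracle http://www.petefinnigan.com
The entry point to the mailing list mentioned above http://www.freelists.org/webpage/oracle-l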

To be continued...

[Repost] Internet Cartography

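Carlos Bueno, one of Facebook's engineers, published this piece on the intriguing subject of Internet Cartography this Monday. If you happen to be "unable to browse" Facebook pages, you can read the copy I quote here: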

Every generation likes to think it reinvents the world from scratch. But some things are shaped by history and geography as much as anything. Mountains, rivers, archipelagos, and long terrestrial crossings play a big role in deciding where, how, and how well different parts of the Earth get connected.

This is a map of the global telegraph network from 110 years ago side-by-side with the internet of today:

One way to see the internet is as a physical manifestation of trade volume between cities, on a 40-year moving average. That is about how long it takes for economic ties to develop, demand to rise, and high-volume communications routes to be financed and built. Once built, these links tend to stick around.

Governments and empires have come and gone, bandwidth has increased a billion-fold, but the network has the same general shape it had back when Mark Twain was sending witty telegrams. The only big change since then is greater ties between the US and Asia.

Just from looking at where the cables go you can guess how long it would take to send a message. A telegram from San Francisco to Hong Kong in 1901 must have taken many hops through British Empire cables to Europe, through the Middle East, and so on. London to New York was fast and direct. The vestiges of the Spanish and Portuguese Empires show up in the many links between South America, the Caribbean archipelago, and the Iberian peninsula.

A cool thing is that you can measure these relative latencies yourself, using the present-day internet. If you run a website with a decent amount of worldwide traffic, you can use that traffic to map out how the internet responds with regards to you, and see how that matches with the gross structure of the ‘net.

I wrote about a cheap and cheerful way to generate this data last year, and the code has since been open-sourced as part of Yahoo’s Boomerang measurement framework. The basic idea is to have your users perform two tiny network requests: one to a throwaway hostname generated in the moment, like 8j48sas.dns.example.net/A.gif, then another to a different single-pixel image on the same host, 8j48sas.dns.example.net/B.gif. The first request will require a DNS lookup, TCP handshake, and HTTP transaction. The second only needs to do the TCP and HTTP steps. Now you have fuzzy measurements of how long it took to do a full HTTP round-trip (B) and to do a full end-to-end DNS lookup (A – B).
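The arithmetic behind the two-request trick can be sketched in a few lines of Python. This is only an illustration of how the A and B timings combine: the function name `estimate_dns_latency` and the sample timings are assumptions made for this sketch, not part of Boomerang.

```python
from statistics import median

def estimate_dns_latency(samples):
    """Estimate DNS and HTTP latency from paired timings.

    Each sample is (a_ms, b_ms): a_ms is the time of the first request
    (DNS lookup + TCP + HTTP), b_ms the time of the second (TCP + HTTP).
    Returns (median DNS time, median HTTP time) in milliseconds.
    """
    # Noise can make B slower than A; such pairs would give a negative
    # DNS time, so they are dropped.
    usable = [(a, b) for a, b in samples if a >= b]
    if not usable:
        raise ValueError("no usable samples")
    dns_ms = median(a - b for a, b in usable)
    http_ms = median(b for _, b in usable)
    return dns_ms, http_ms

# Hypothetical timings in milliseconds, for illustration only.
timings = [(320, 180), (410, 200), (150, 170), (290, 190)]
dns_ms, http_ms = estimate_dns_latency(timings)
print(dns_ms, http_ms)
```

Medians are used here because the article reports 50th-percentile figures; a real collector would also want percentiles further out, given the long tail the density plot shows.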

Real-world data on DNS performance is generally considered hard to come by. The domain name system is designed with caching and intermediaries at all levels, so you as a site owner only see part of the story during normal operation. You can buy data like this from commercial services like Gomez or Keynote, or get it yourself if you happen to have, say, a browser plugin installed on millions of computers. Otherwise, this Javascript method is less accurate but works well enough.

Here is a chart of median (50th percentile) DNS latencies experienced by a random sample of Facebook users, broken down by country. As you can see, there are several lines crowding together at the bottom. That is the US and parts of Europe like the UK and Belgium. Facebook’s DNS servers tend to be physically close to users in those countries. Spain and France are a bit higher up, and the rest of the graph is a mix of Asian and South American countries. [1]

The median value only tells part of the story. Here is the worldwide DNS latency data as a density plot, to show the distribution. Notice that a substantial number of users took more than 500 milliseconds just to look up a hostname. This is the uncached worst-case, of course, but it’s something to keep in mind.

HTTP Latencies
Here is the chart for measurement B, the TCP + HTTP latency. This better reflects the real “geography” of the internet, because the HTTP requests travel all of the way back to our web tiers in the United States. There is much less volatility in these measurements day-to-day; it’s controlled more by basic network conditions and speed-of-light and less by the health of various DNS recursors around the world.

How low can you go?
So how fast are these links between countries, compared to what is possible? Below is a chart of the same median HTTP latency data, averaged over a week. The short light-grey bars represent the theoretical minimum. If you could carve a direct line between any two spots along the surface of the planet, this grey bar would be the internet round trip time between the US and the given country. [2]

We can learn a lot of things from this chart. The most obvious is that HTTP latency between Asia and the US is worse than US-Europe. The Pacific Ocean is wider than the Atlantic, of course, but raw distance is not the only factor. Economics and local geography play their part.

Look at the ratios between the black bars (real) and the grey bars (theoretical). Both the fastest European and Asian countries have real-world latencies at or below 2X the theoretical minimum, which is pretty impressive. Few technologies get within spitting distance of the physical limits of the universe.

These low-multiple countries tend to have fortunate geography, or a strong history of economic relations with the United States, or both. Other countries with less-strong trade ties, such as Spain, or lots of little islands like the Philippines, have multiples nearer to 2.5X and above. While Australia is a bit farther than Thailand it’s 15% closer as far as the internet is concerned. More investment has been put in by the cable operators to make that route fast and wide. In fact, Australia (population 22M) has a comparable amount of bandwidth to the US as all of South America (population 385M).

The multiples of South American countries start at 3.5X and go up from there. North-South routes are hurt by an unlucky trifecta of mountains, long land crossings, and archipelagos. There is only one cable that serves the Pacific side from Los Angeles to Panama. It’s hard to justify building lots of capacity on the Pacific side, because the Andes mountains cut off that part of the continent from the rest. Most traffic follows a long and painful path across the entire length of the US to the Atlantic, then takes a right turn and down another 800 miles of the Florida peninsula. It exits Miami and immediately hits a congested maze of cables, hopping in and out of the water as it navigates the islands of the Caribbean. Someday South America will get better connected, but natural barriers drive the costs way up.

There are other interesting cases such as Belgium, which has the lowest latency and lowest multiple (1.6X) of any European country. The reason is that Belgium is well-placed as an internet nexus, being a) close to Britain but away from the Channel and b) geographically convenient for branching off into the rest of Europe.

Try this at home
These measurements are very skewed towards the United States. It would be awesome to see measurements from other spots and different traffic patterns from around the world. The code to collect this data (and a lot more) is open-source and simple to implement. So try the experiment for yourself and let us know what you find.

Carlos Bueno, an Engineer at Facebook, loves pinging the tubes.

[1] This chart generally agrees with data gathered by Yahoo and Microsoft. The data is very US-centric; the picture will be quite different if you were to run the experiment from a site based on another continent. Facebook’s servers are largely in the US, so naturally we care most about how to get bits from here to there and less about, say, between India and Saudi Arabia.

[2] The theoretical minimum latency is calculated using the average speed of light through optical fiber, over a hypothetical cable laid in a great circle line between the town of Independence, Kansas and the centroid of the given country. This time is multiplied by 4 to approximate the two round-trips necessary to complete a TCP handshake and HTTP transaction. You can read all about Great Circle routes and the speed of light through fiber in Wikipedia, or just use Wolfram Alpha to do it for you.
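The calculation described in footnote [2] can be sketched as follows. The haversine formula is one common way to get a great-circle distance; the refractive index of 1.47, the approximate coordinates, and all function names are assumptions of this sketch rather than values taken from the article.

```python
import math

EARTH_RADIUS_KM = 6371.0            # mean Earth radius
SPEED_OF_LIGHT_KM_S = 299_792.458   # speed of light in vacuum
FIBER_REFRACTIVE_INDEX = 1.47       # assumed typical value for optical fiber

def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))

def min_http_latency_ms(lat1, lon1, lat2, lon2):
    """One-way fiber time multiplied by 4, i.e. two full round trips."""
    fiber_speed_km_s = SPEED_OF_LIGHT_KM_S / FIBER_REFRACTIVE_INDEX
    one_way_s = great_circle_km(lat1, lon1, lat2, lon2) / fiber_speed_km_s
    return 4 * one_way_s * 1000.0

# Approximate coordinates, for illustration only.
INDEPENDENCE_KS = (37.22, -95.71)
EXAMPLE_CENTROID = (50.5, 4.5)      # roughly the middle of Belgium

print(round(min_http_latency_ms(*INDEPENDENCE_KS, *EXAMPLE_CENTROID)), "ms")
```

Dividing a measured median by this figure gives the kind of multiple (1.6X, 2.5X, 3.5X) discussed in the article.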

OCFS2 certification with 11gR2 RAC

We are planning to implement 11gR2 RAC on the OCFS2 file system. We are going to place the OCR files, voting disk files, database files, and flash recovery area files on OCFS2.
Generic Note
OCFS2 is certified for Oracle 11gR2, but Oracle recommends using ASM.
For more information, see http://download.oracle.com/docs/cd/E11882_01/install.112/e10812/storage.htm#CWLIN262
3.1.3 Supported Storage Options
The following table shows the storage options supported for storing Oracle Clusterware and Oracle RAC files.


For information about OCFS2, refer to the following Web site:
If you plan to install an Oracle RAC home on a shared OCFS2 location, then you must upgrade OCFS2 to at least version 1.4.1, which supports shared writable mmaps.

For OCFS2 certification status, and for other cluster file system support, refer to the Certify page on My Oracle Support.

Table 3-1 Supported Storage Options for Oracle Clusterware and Oracle RAC
Linux x86-64
11gR2 RAC
RAC for Linux: RAC Technologies Compatibility Matrix (RTCM) for Linux platforms.


Note 183408.1 Raw Devices and Cluster Filesystems With Real Application Clusters
Note 238278.1 How to find the current OCFS or OCFS2 version for Linux
Note 811306.1 RAC Assurance Support Team: RAC Starter Kit and Best Practices (Linux)









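After lunch I hurried to the customer's site and, to my surprise, sat in traffic for quite a while; I did not expect a rush hour here a little after 1 p.m.

It was past 2 p.m. when I arrived. From Section Chief Zhang and Engineer Zhu I got a rough picture of the system. The main system here is a three-node RAC on Linux; the hardware is three Dell 3950 PC servers and one IBM DS4700 storage array, and the three instances serve business from different regions. The customer is not very satisfied with the performance of this RAC system and hopes this technical exchange will produce a preliminary plan. They also have questions about the high memory usage on the hosts.

I first collected RDA and AWR reports; thanks to these tools, gathering diagnostic and tuning information for Oracle is now well integrated. To my surprise this system also uses ASM. A RAC system on Linux can manage storage in roughly the following ways: raw devices (the easiest and also the most troublesome), NFS (not usable in production), ASM (recommended by Oracle), OCFS2 (abandoned in 11g), and other cluster file systems (for example Red Hat GFS or Veritas Cluster File System). Under 10g the most fashionable choice is still ASM. The technology is novel, but 10g ASM is widely considered not yet mature, or at least customers lack experience with it: once a problem occurs, there are few options besides asking Oracle GCS for help. Moreover, this ASM setup uses block devices of the form /dev/sd* directly as disks. Unlike raw devices, block devices cause the operating system to cache the blocks a second time (OS cache). The system runs nothing but the database, total memory is 24 GB, and each instance's SGA is under 5 GB, so the vast majority of memory ends up caching block devices.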

[oracle@qhds2 ~]$ free
total       used       free     shared    buffers     cached
Mem:      16408324   16276472    131852          0      44184   13969312
-/+ buffers/cache:    2262976      14145348
Swap:      2031608      90224    1941384

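As shown above, free physical memory is 131852 KB, about 128 MB, and the OS cache is about 13 GB. Oracle's own buffer cache (Buffer_Cache) is more effective than the OS cache, yet the three instances currently have Buffer_Cache set to between 2 and 3 GB. I advised the customer to raise Buffer_Cache to 5-10 GB. In addition, to rule out an outage caused by a burst of paging when the operating system runs short of physical memory, I advised setting the Linux kernel parameter vm.min_free_kbytes to 512000 (that is, always keep 500 MB of physical memory free).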
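The figures quoted from the `free` output can be re-derived with a short Python sketch. It assumes the older `free` layout shown above (columns total, used, free, shared, buffers, cached, all in KB); the helper names are made up for this illustration.

```python
def parse_mem_line(line):
    """Parse the 'Mem:' line of the older `free` layout (values in KB)."""
    names = ("total", "used", "free", "shared", "buffers", "cached")
    values = line.split()[1:7]
    return dict(zip(names, (int(v) for v in values)))

def summarize(mem):
    """Reproduce the '-/+ buffers/cache' row and convert a few units."""
    reclaimable_kb = mem["buffers"] + mem["cached"]
    return {
        "used_by_processes_kb": mem["used"] - reclaimable_kb,
        "free_plus_cache_kb": mem["free"] + reclaimable_kb,
        "free_mb": mem["free"] / 1024,
        "cached_gb": mem["cached"] / 1024 ** 2,
    }

mem = parse_mem_line(
    "Mem:      16408324   16276472    131852          0      44184   13969312"
)
report = summarize(mem)
print(report)

# The recommended vm.min_free_kbytes value expressed in MB.
min_free_mb = 512000 / 1024
print(min_free_mb)
```

The two derived KB values match the `-/+ buffers/cache` row of the output (2262976 used, 14145348 free), which is why almost all of the "used" memory here is really page cache.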

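Across the three instances the overall load is low. Instance 1 is relatively the highest: over the 16-hour snapshot window its Average Active Sessions (AAS, an important indicator of database load in 10g) is AAS = DB_TIME / Elapsed = 2327 / 960 ≈ 2.42. On instance 2 AAS is 592 / 960 ≈ 0.62, and on instance 3 it is 1405 / 960 ≈ 1.46.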
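As a quick check of the AAS arithmetic, here is a minimal Python sketch. It assumes the DB time figures are in minutes, which is what the 960-minute (16-hour) divisor implies; the function name is chosen for this illustration.

```python
def average_active_sessions(db_time_min, elapsed_min):
    """AAS = DB time / elapsed wall-clock time, both in the same unit."""
    if elapsed_min <= 0:
        raise ValueError("elapsed time must be positive")
    return db_time_min / elapsed_min

ELAPSED_MIN = 16 * 60  # 16-hour snapshot window

# DB time per instance, as quoted from the AWR reports.
db_time_by_instance = {1: 2327, 2: 592, 3: 1405}

aas_by_instance = {
    inst: average_active_sessions(db_time, ELAPSED_MIN)
    for inst, db_time in db_time_by_instance.items()
}
for inst, aas in aas_by_instance.items():
    print(f"instance {inst}: AAS = {aas:.2f}")
```

An AAS well below the number of CPU cores on a node is usually read as a lightly loaded instance, which is consistent with the "low overall load" assessment.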
Analysis of the AWR reports shows the following:


Event                      Waits      Time(s)  Avg Wait(ms)  % Total Call Time  Wait Class
enq: TM - contention       98,218     40,208   409           28.8               Application
gc cr multi block request  5,877,437  36,369   6             26.0               Cluster
CPU time                              8,511                  6.1
gc buffer busy             64,538     5,455    85            3.9                Cluster
db file parallel read      519,589    4,116    8             2.9                User I/O


Event                      Waits      Time(s)  Avg Wait(ms)  % Total Call Time  Wait Class
CPU time                              7,962                  22.4
gc cr multi block request  2,903,270  6,027    2             17.0               Cluster
db file scattered read     843,958    3,070    4             8.6                User I/O
db file parallel read      421,792    3,010    7             8.5                User I/O
PX Deq Credit: send blkd   248,625    1,755    7             4.9                Other


Event                      Waits      Time(s)  Avg Wait(ms)  % Total Call Time  Wait Class
CPU time                              18,381                 21.8
gc cr multi block request  6,534,528  18,204   3             21.6               Cluster
db file scattered read     1,562,020  5,575    4             6.6                User I/O
db file parallel read      563,113    4,281    8             5.1                User I/O
db file sequential read    709,698    2,502    4             3.0                User I/O


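  • enq: TM - contention
  • gc cr multi block request

"gc cr multi block request" is the global cache multi-block request event, a cluster-class wait event. When the data touched by the instances of a RAC overlaps heavily, this event becomes the database's main performance bottleneck. If the application cannot be made more cluster-friendly and the tables cannot be partitioned by instance, I recommend dividing the RAC instances into a primary and standbys to get better performance. For example, host 1, which has the better hardware, can serve as the primary, while hosts 2 and 3 act as standbys that normally accept no application sessions and take over only when host 1 has to go offline. The customer's main reason for using RAC is high availability, and after I explained roughly how the RAC Global Cache works, the customer agreed to this plan.

"enq: TM - contention" is a wait on the TM lock enqueue. It generally arises when the application takes a table lock at share level or above (S, SSX, or X) and holds it too long, or when the table has foreign key constraints (typically ones whose columns are not indexed). In most OLTP applications, S, SSX, or X locks on a whole table are unnecessary; row-level locks are enough. The customer already suspects the application but, lacking solid evidence, cannot push the vendor to change the code. Because the data and business volumes of this application are still small, the conflict is not yet acute, so no effective remediation has ever been carried out.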





too many tweets for twitter?

I just found that Twitter is over capacity and unable to respond, as in the image below:

I am now curious about Twitter's web architecture and would like to write a note on it.

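EnterpriseDB Replication: Testing Replication of Oracle Data (2)

Having covered the basic principles of the EnterpriseDB replication software and the points to watch, we now walk through an actual replication of Oracle data to EnterpriseDB Advanced Server.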


SQL> drop user  source cascade;

User dropped.

SQL> create user source identified by source;

User created.

SQL> grant dba to source;

Grant succeeded.

SQL> grant create any trigger to source;

Grant succeeded.

SQL> conn source/source
SQL> create table EMP
2  (
3    EMPNO    NUMBER(4) not null,
4    ENAME    VARCHAR2(10),
5    JOB      VARCHAR2(9),
6    MGR      NUMBER(4),
7    HIREDATE DATE,
8    SAL      NUMBER(7,2),
9    COMM     NUMBER(7,2),
10    DEPTNO   NUMBER(2)
11  )
12  tablespace USERS;

Table created.

SQL> alter table EMP
2    add constraint pk_empno primary key (EMPNO);

Table altered.

SQL> create table DEPT
2  (
3    DEPTNO NUMBER(2) not null,
4    DNAME  VARCHAR2(14),
5    LOC    VARCHAR2(13)
6  )
7  tablespace USERS;

Table created.

SQL> alter table DEPT
2    add constraint PK_DEPT primary key (DEPTNO);

Table altered.

SQL> alter table EMP
2    add constraint fk_deptno foreign key (DEPTNO)
3    references dept (DEPTNO);

Table altered.


At the same time, create the target database and user in EnterpriseDB Advanced Server:

edb=#  create user subuser password 'subuser';
ERROR:  role "subuser" already exists
edb=# alter user subuser with Superuser;
edb=# create database subuser tablespace users;

The EnterpriseDB replication service relies on the DBA Management Server service; it operates as shown in the figure below:


[enterprisedb@rh2 ~]$ source edb_83.env
[enterprisedb@rh2 ~]$ cd $EDBHOME
[enterprisedb@rh2 edba]$ cd mgmtsvr/bin
[enterprisedb@rh2 bin]$ ls
attachments            DBA_Management_Server.pid  jboss_init_redhat.sh  mgmtsvr.000  run.conf  shutdown.jar  twiddle.sh    wrapper.log   wsrunclient.sh
BrowserLauncher.class  deployer.sh                jboss_init_suse.sh    mgmtsvr.sh   run.jar   shutdown.sh   wrapper_83    wsconsume.sh  wstools.sh
classpath.sh           jboss_init_hpux.sh         kill_wrapper.sh       probe.sh     run.sh    twiddle.jar   wrapper.conf  wsprovide.sh
[enterprisedb@rh2 bin]$ ./mgmtsvr.sh  status
DBA Management Server is running (31198).           -- the service is currently running
[enterprisedb@rh2 bin]$ ./mgmtsvr.sh  stop
Stopping DBA Management Server...
Stopped DBA Management Server.
[enterprisedb@rh2 bin]$ ./mgmtsvr.sh  start           -- if it is not running, start it
Starting DBA Management Server...

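Next, open the replication console from the Applications menu on the desktop and choose Register Management Server:

Once the server is registered, the JVM option must be configured in both the publication service and the subscription service. Right-click Publication service, choose Advanced JVM options, and insert one record in that window, as shown:

The entry is -Djava.rmi.server.hostname=$IP, where $IP is the IP address on which the registered DBA Management Server listens. After configuring, start the publication and subscription services in turn.

The subscription service needs the same configuration: the entry added to its JVM options is likewise -Djava.rmi.server.hostname=$IP.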





[enterprisedb@rh2 ~]$ psql subuser subuser
Password for user subuser:
Welcome to psql, the EnterpriseDB interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help with edb-psql commands
       \g or terminate with semicolon to execute query
       \q to quit

subuser=# desc source.emp;
                 Table "source.emp"
  Column  |            Type             | Modifiers
 empno    | numeric(4,0)                | not null
 ename    | character varying(10)       |
 job      | character varying(9)        |
 mgr      | numeric(4,0)                |
 hiredate | timestamp without time zone |
 sal      | numeric(7,2)                |
 comm     | numeric(7,2)                |
 deptno   | numeric(2,0)                |
    "pk_empno" PRIMARY KEY, btree (empno)

subuser=# desc source.dept;
            Table "source.dept"
 Column |         Type          | Modifiers
 deptno | numeric(2,0)          | not null
 dname  | character varying(14) |
 loc    | character varying(13) |
    "pk_dept" PRIMARY KEY, btree (deptno)


SQL> insert into dept select * from scott.dept;

4 rows created.

SQL> commit;

Commit complete.

SQL> select * from dept;

    DEPTNO DNAME          LOC
---------- -------------- -------------
        10 ACCOUNTING     NEW YORK
        20 RESEARCH       DALLAS
        30 SALES          CHICAGO
        40 OPERATIONS     BOSTON

SQL> begin
  2  for i in 1..9999 loop
  3  insert into emp values(i,'Maclean','DBA',1,sysdate-365,8888,50,10);
  4  end loop;
  5  commit;
  6  end;
  7  /

PL/SQL procedure successfully completed.

SQL> select count(*) from emp;




Source database connectivity info…
conn =jdbc:oracle:thin:@
user =source
Target database connectivity info…
conn =jdbc:edb://
user =subuser
Importing redwood schema SOURCE…
Table List: ‘DEPT’,’EMP’
Loading Table Data in 8 MB batches…
Disabling FK constraints & triggers on source.dept before truncate…
Truncating table DEPT before data load…
Disabling indexes on source.dept before data load…
Loading Table: DEPT …
Migrated 4 rows.
Enabling FK constraints & triggers on source.dept…
Enabling indexes on source.dept after data load…
Creating Constraint: PK_DEPT
Table Data Load Summary: Total Time(s): 1.261 Total Rows: 4
Disabling FK constraints & triggers on source.emp before truncate…
Truncating table EMP before data load…
Disabling indexes on source.emp before data load…
Loading Table: EMP …
Migrated 9999 rows.
Enabling FK constraints & triggers on source.emp…
Enabling indexes on source.emp after data load…
Creating Constraint: PK_EMPNO
Table Data Load Summary: Total Time(s): 3.782 Total Rows: 9999 Total Size(MB): 0.494140625
Performing ANALYZE on EnterpriseDB database…
Data Load Summary: Total Time (sec): 5.043 Total Rows: 10003 Total Size(MB): 0.506

Schema SOURCE imported successfully.

Migration process completed successfully.

Migration logs have been saved to /s01/edba/mgmtsvr/server/default/deploy/edb-rrep-ws.war/WEB-INF/logs

******************** Migration Summary ********************
Tables: 2 out of 2
Constraints: 2 out of 2

Total objects: 4
Successful count: 4
Failure count: 0


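As you can see, the snapshot copied the data we needed. Now let's try synchronize mode. First define a Scheduled Task that runs continuously at a fixed interval: select the subscription service, click Configure Schedule in the toolbar, choose Synchronize and Continuously, and set the interval to 10s: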


-- source database
20:08:51 SQL> select sum(sal) from emp;


20:09:09 SQL> update emp set sal=sal*1.1 ;

9999 rows updated.

20:09:34 SQL> commit;

Commit complete.

20:09:36 SQL> select sum(sal) from emp;

-- EnterpriseDB side
subuser=# select sum(sal) from source.emp;
(1 row)

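That's it: with the EnterpriseDB Replication software we have configured the simplest possible data replication service from an Oracle database to EDB Advanced Server.

EnterpriseDB Migration: Testing the Migration Tool (2)

Next we test how the EnterpriseDB Migration tool handles Oracle large objects (LOBs):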

SQL> create table tlob (t1 int primary key,t2 clob,t3 blob);
Table created.
-- and populate it with data
SQL> begin
  2  for i in 1..100 loop
  3  insert into tlob values(i,rpad('A',9999,'Z'),hextoraw(i) );
  4  end loop;
  5  commit;
  6  end;
  7  /

PL/SQL procedure successfully completed.

Open the EnterpriseDB Migration tool, find the table TLOB to be migrated in the tree view, and choose online migration:


[Starting Migration]
conn =jdbc:oracle:thin:@rh2.home:1521:G10R21
user =system
conn =jdbc:edb://rh2.home:5444/subuser
user =maclean
Importing Redwood schema SCOTT…
Table List: ‘TLOB’
Creating Table: TLOB
Created 1 table(s).
Loading Table Data in 8 MB batches…
Loading Large Objects into table: TLOB…
Table Data Load Summary: Total Time(s): 1.122 Total Rows: 100 Total Size(MB): 0.380859375
Data Load Summary: Total Time (sec): 1.122 Total Rows: 100 Total Size(MB): 0.39
Creating Constraint: SYS_C005182

Schema SCOTT imported successfully.


Migration logs have been saved to C:\Users\windesk\.enterprisedb\migrationstudio\build60

******************** Migration Summary ********************
Tables: 1 out of 1
Constraints: 1 out of 1

Total objects: 2
Successful count: 2
Failure count: 0



[enterprisedb@rh2 ~]$ psql
Welcome to psql, the EnterpriseDB interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help with edb-psql commands
       \g or terminate with semicolon to execute query
       \q to quit

edb=# \c subuser
You are now connected to database "subuser".
subuser=# desc scott.tlob;
      Table "scott.tlob"
 Column |  Type   | Modifiers
 t1     | numeric | not null
 t2     | text    |
 t3     | bytea   |
    "sys_c005182" PRIMARY KEY, btree (t1)

subuser=# select count(*) from scott.tlob;

(1 row)

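As you can see, the conversion mapped the clob type to text and the blob type to bytea. In PostgreSQL, text is a variable-length text type with no declared length limit (variable unlimited length).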



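In addition, bytea is a variable-length binary string. The PostgreSQL documentation does not state the upper storage limit of these two types very explicitly; from the literature found so far, what is certain is that in PostgreSQL V7 both types are limited to 1 GB.

So if a Blob/Clob in Oracle is larger than 1 GB, the migration may fail to complete.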
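Before migrating, it may be worth screening LOB sizes against that limit. The sketch below assumes the sizes have already been collected from Oracle as bytes per primary key (for BLOBs, DBMS_LOB.GETLENGTH returns bytes; for CLOBs it returns characters, so the byte size would have to be estimated). The function name and the sample data are hypothetical.

```python
ONE_GB = 1 << 30  # the 1 GB field limit cited above

def lobs_over_limit(lob_sizes, limit_bytes=ONE_GB):
    """Return the keys whose LOB size in bytes exceeds limit_bytes."""
    return sorted(key for key, size in lob_sizes.items() if size > limit_bytes)

# Hypothetical sizes keyed by primary key, for illustration only.
sizes = {1: 9999, 2: ONE_GB + 1, 3: ONE_GB}
print(lobs_over_limit(sizes))
```

Rows flagged this way would need separate handling, for example being split or stored outside the table, before the online migration is attempted.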

explore my oracle support using firefox 3.6

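Anyone who has upgraded to FF3.6 will have noticed that many My Oracle Support pages now open blank in Firefox, for example this article on the compatibility of FF3.6 with the current Oracle support site: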

Firefox 3.6 Not Officially Supported by My Oracle Support

Although much of the functionality of My Oracle Support is accessible via Firefox 3.6, please be aware that Firefox 3.6 is not officially supported by My Oracle Support at this time. Please review the My Oracle Support FAQ (Knowledge Document #747252.5) for details on the current browser requirements for accessing My Oracle Support.

We are aware of a current issue when viewing Knowledge documents using Firefox 3.6. Using Firefox 3.6, when clicking on a link for a Knowledge document the document appears blank.  The workaround is to instead open the documents in a new tab or new window.

We are working on a fix for this issue and hope to have this resolved in the near future. We apologize for any inconvenience.

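It has been more than a month since Firefox moved to 3.6. Oracle Support earlier issued a statement that it would resolve this compatibility issue, and I had assumed it was a small matter that should be fully fixed within a few weeks. Instead it has now published a statement that the browser is not officially supported for the time being (most documents can still be read by opening them in a new window).

It seems that building and maintaining large websites really is not Oracle's strength. Is APEX, or HTML DB, truly suitable for a very large web site?

ORA-7445 [kghalp+0500] [SIGSEGV] error

I did not go out today (it seems problems crop up especially easily when I am not on site). Around 10 a.m. I got a call saying that a 7445 error had occurred on the crm11 instance. I was about to connect over the web VPN to take a look when I discovered that Windows 7 does not support this kind of VPN (more precisely, neither IE8 nor Firefox does), so I had no choice but to use dial-up.
The alert log contained a large number of 7445 error records: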

Fri Mar 26 09:24:53 2010
Errors in file /oravl01/oracle/admin/CRMDB1/udump/crmdb11_ora_6754320.trc:
ORA-07445: exception encountered: core dump [kghalp+0500] [SIGSEGV] [Invalid permissions for mapped object] [0x00000003B] [] []
Fri Mar 26 09:24:55 2010
Trace dumping is performing id=[cdmp_20100326092455]
Fri Mar 26 09:31:16 2010
Errors in file /oravl01/oracle/admin/CRMDB1/udump/crmdb11_ora_2994552.trc:
ORA-07445: exception encountered: core dump [kghalp+0500] [SIGSEGV] [Invalid permissions for mapped object] [0x00000003B] [] []

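My first impression on seeing kghalp is that it is a function used by Oracle's heap management.
Let's guess at the literal meaning: k -> kernel, g -> generic, h -> heap, a -> allocation, p -> point
Now let's look at the call stack at that moment: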

Exception signal: 11 (SIGSEGV), code: 51 (Invalid permissions for mapped object), addr: 0x3b, PC: [0x1000973e0, kghalp+0500]
iar: 00000001000973e0, msr: a00000000000d0b2
lr: 00000001013a6df8,  cr: 0000000022292484
r00: 0000000000000010, r01: 0ffffffffffcb160, r02: 000000011022a9c0,
r03: 0000000000000002, r04: 0000000000000000, r05: 0000000000000100,
r06: 0000000000000001, r07: 0000000000000000, r08: 0000000000000000,
r09: 0000000000000000, r10: 00000000101b60d8, r11: 0000000000000004,
r12: 0000000024592484, r13: 000000011026bfe0, r14: 0000000000000000,
r15: 0000000000009000, r16: 0000000110195b2c, r17: 0000000000000000,
r18: 0000000000000001, r19: 0000000000000000, r20: 0000000000001000,
r21: 0000000000000000, r22: 0000000000000100, r23: 0000000000000001,
r24: 0000000000000000, r25: 0000000000000000, r26: 0000000000000001,
r27: 0000000104c7fd44, r28: 0000000000000000, r29: 0000000000000100,
r30: 0000000000000000, r31: 0000000110195a58,
*** 2010-03-26 09:57:28.679
ksedmp: internal or fatal error
ORA-07445: exception encountered: core dump [kghalp+0500] [SIGSEGV] [Invalid permissions for mapped object] [0x00000003B] [] []
Current SQL statement for this session:
----- PL/SQL Call Stack -----
object      line  object
handle    number  name
70000043da500d0        10  anonymous block
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst+001c          bl       ksedst1              000000000 ? 104A54EED ?
ksedmp+0290          bl       ksedst               104A54870 ?
ssexhd+03e0          bl       ksedmp               300001D15 ?
000044C0             ?        00000000
parchk+01f4          bl       kghalp               000000000 ?
2842288200000001 ?
000000000 ? 000000000 ?
000001040 ? 110195B2C ?
ptmak+0168           bl       parchk               FFFFFFFFFFCB560 ?
pdybF00_Init+0244    bl       ptmak                10008049C ? 000000000 ?
pdy1F79_Init+00c8    bl       pdybF00_Init         110BEB1D0 ?
pdy1F01_Driver+0048  bl       pdy1F79_Init         FFFFFFFFFFCBC40 ?
pdli_new_cog+00f0    bl       pdy1F01_Driver       FFFFFFFFFFCBCE0 ? 000000000 ?
pdlifu+0264          bl       pdli_new_cog         1013885F4 ? FFFFFFFFFFCCB00 ?
7000004383E7680 ?
phpcog+0010          bl       pdlifu               FFFFFFFFFFCD958 ?
7000004383E7680 ? 104C95048 ?
phpcmp+0f80          bl       phpcog               FFFFFFFFFFCC4F0 ? 000000000 ?
pcicms2+02d4         bl       phpcmp               FFFFFFFFFFCD958 ?

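kghalp, the topmost function where the error occurred, was called by parchk, which looks like a package check function (just a guess). To sort out our thinking: parchk called kghalp to allocate memory on its behalf but got back an illegal low address [0x00000003B], in a range normally used by the text segment. This looks very much like a bug.
Let's search support.oracle.com with the keywords 7445, kghalp, and sigsegv (often you do not need the ora 600/7445 lookup tool).
Bug 8244533 stands out: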

ksedst <- ksedmp <- ssexhd <- 000044BC <- parchk        <- ptmak <-
pdybF00_Init <- pdy1F79_Init <- pdy1F01_Driver <- pdli_new_cog         <-
pdlifu <- phpcog <- phpcmp <- pcicms2 <- pcicms          <- kkxcms <- kkxswcm
<- kkxmpbms <- kkxmesu <- xtypls           <- qctopls <- qctcopn <- qctcopn
Exception signal: 11 (SIGSEGV), code: 51 (Invalid permissions for mapped
addr: 0x3b, PC: [0x1000973e0, kghalp+0500]
iar: 00000001000973e0, msr: a00000000000d0b2
lr: 000000010139ffb8,  cr: 00000000222a2484
r00: 0000000000000010, r01: 0ffffffffffe2980, r02: 00000001101e5ab8,
r03: 0000000000000002, r04: 0000000000000000, r05: 0000000000000100,
r06: 0000000000000001, r07: 0000000000000000, r08: 0000000000000000,
r09: 0000000000000000, r10: 0000000010171200, r11: 0000000000000004,
r12: 00000000245a2484, r13: 000000011021fbc0, r14: 0000000000000000,
r15: 0000000000009000, r16: 0000000110150c54, r17: 0000000000000000,
r18: 0000000000000001, r19: 0000000000000000, r20: 0000000000001000,
r21: 0000000000000000, r22: 0000000000000100, r23: 0000000000000001,
r24: 0000000000000000, r25: 0000000000000000, r26: 0000000000000001,
r27: 0000000104c5983c, r28: 0000000000000000, r29: 0000000000000100,
r30: 0000000000000000, r31: 0000000110150b80,
*** 16:37:14.603
ksedmp: internal or fatal error
ORA-7445: exception encountered: core dump [kghalp+0500] [SIGSEGV]
[Invalid permissions for mapped object] [0x00000003B] [] []
Current SQL statement for this session:
select dummy from dual where  ora_dict_obj_type = 'TABLE'
----- Call Stack Trace -----
ptmak pdybF00_Init pdy1F79_Init pdy1F01_Driver pdli_new_cog pdlifu
phpcog phpcmp pcicms2 pcicms kkxcms kkxswcm kkxmpbms kkxmesu xtypls
To Filer.
Based on this call stack this would appear a likely match for
bug 6951953 Abstract: ORA-7445 [PTMAK] IMPORTING PACKAGE COMPILED DEBUG.
This bug is fixed on and there is a patch available for IBM AIX Based Systems (64-bit).
It maybe worth while to have the customer apply the patch to see
if it resolves the issue.
Also the uploaded files included test.sql is this a reproducable testcase?
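Matching a local call stack against the one quoted in a bug report can be done mechanically. The Python sketch below normalizes both stacks (dropping `+offset` suffixes and raw hex addresses) and counts how many leading frames agree. The two stack strings are copied from the traces above; the helper names and the idea of using the length of the common prefix as a similarity measure are assumptions of this sketch, not an Oracle Support procedure.

```python
import re

HEX_ADDRESS = re.compile(r"[0-9A-Fa-f]{8,16}")

def frames(stack_text):
    """Split 'a+0x10 <- b <- c' into bare function names."""
    names = []
    for token in stack_text.replace("\n", " ").split("<-"):
        name = token.strip().split("+")[0]
        # Skip empty tokens and unresolved raw addresses such as 000044C0.
        if name and not HEX_ADDRESS.fullmatch(name):
            names.append(name)
    return names

def common_prefix_len(a, b):
    """Number of leading frames that two stacks share."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count

local_stack = (
    "ksedst+001c <- ksedmp+0290 <- ssexhd+03e0 <- 000044C0 <- parchk+01f4 "
    "<- ptmak+0168 <- pdybF00_Init+0244 <- pdy1F79_Init+00c8 "
    "<- pdy1F01_Driver+0048 <- pdli_new_cog+00f0 <- pdlifu+0264 "
    "<- phpcog+0010 <- phpcmp+0f80 <- pcicms2+02d4"
)
bug_stack = (
    "ksedst <- ksedmp <- ssexhd <- 000044BC <- parchk <- ptmak <- "
    "pdybF00_Init <- pdy1F79_Init <- pdy1F01_Driver <- pdli_new_cog <- "
    "pdlifu <- phpcog <- phpcmp <- pcicms2 <- pcicms <- kkxcms <- kkxswcm"
)

shared = common_prefix_len(frames(local_stack), frames(bug_stack))
print(shared, "of", len(frames(local_stack)), "local frames match")
```

Every resolved frame captured in the local trace lines up with the bug's stack, which supports treating the bug as a likely match.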

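This bug appears to occur only on IBM AIX on POWER Systems (64-bit), with some probability when a package is compiled in DEBUG mode.
Now that we know the likely trigger, we can analyze further. Next, look at the SO records in the errorstack trace.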

SO: 70000043d217668, type: 53, owner: 70000048cee2238, flag: INIT/-/-/0x00
LIBRARY OBJECT LOCK: lock=70000043d217668 handle=700000446261588 mode=N
call pin=0 session pin=0 hpc=0000 hlc=0000
htl=70000043d2176e8[70000042b52b368,70000042bb9a808] htb=70000044929b460 ssga=70000044929ad68
user=70000048cee2238 session=70000048eb33010 count=1 flags=[0000] savepoint=0x4bac1488
LIBRARY OBJECT HANDLE: handle=700000446261588 mtx=7000004462616b8(1) cdp=1
hash=164e6a8942406cee159f8943a1a3c85e timestamp=03-26-2010 09:52:12
namespace=CRSR flags=RON/KGHP/TIM/PN0/SML/KST/DBN/MTX/[120100d0]
kkkk-dddd-llll=0000-0001-0001 lock=N pin=0 latch#=16 hpc=0002 hlc=0002
lwt=700000446261630[700000446261630,700000446261630] ltm=700000446261640[700000446261640,700000446261640]
pwt=7000004462615f8[7000004462615f8,7000004462615f8] ptm=700000446261608[700000446261608,700000446261608]
ref=700000446261660[700000446261660,700000446261660] lnd=700000446261678[700000446261678,700000446261678]
LIBRARY OBJECT: object=70000045adbc1e8
type=CRSR flags=EXS[0001] pflags=[0000] status=VALD load=0
CHILDREN: size=16
child#    table reference   handle
5 70000041776f5c0 70000045ae44720 70000042bfa3a20
data#     heap  pointer    status pins change whr
0 70000043d9fed20 70000045adbc300 I/P/A/-/-    0 NONE   00

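There is indeed a statement compiling an object in debug mode, but the object is a trigger rather than a package; it seems any object that can be compiled in debug mode may trigger the problem.
At this point the problem is fairly clear: the application side recompiled an object (here a trigger) in DEBUG mode and hit Oracle bug 8244533, which crashed the corresponding server process. It was a false alarm in the end. Afterwards, using the machine and user information in the trace, we tracked down the application-side person who made the change and set them straight.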


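When an Oracle system hangs and no login method works (including sqlplus -prelim / as sysdba), we can use the gdb debugger to take a system dump of Oracle and use that dump to determine the specific cause of the hang. If you simply kill the process, the scene is lost and nothing is learned that could help avoid such a hang in the future.

To use the external gdb tool, you need to know the process ID of a background process in the current instance.

We usually list the Oracle processes with the following command: ps -ef|grep <SID>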

[oracle@rh2 ~]$ ps -ef|grep oraclewebmoney
oracle   16996 16995  0 21:55 ?        00:00:00 oraclewebmoney (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
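Picking the PID out of the `ps -ef` listing can be scripted. This Python sketch assumes the usual Linux `ps -ef` column order (UID, PID, PPID, C, STIME, TTY, TIME, CMD) and that dedicated server processes are named `oracle<SID>` as in the output above; the function name and the extra `grep` line in the sample are made up for illustration.

```python
def oracle_pids(ps_output, sid):
    """Return PIDs of processes whose command name is exactly oracle<sid>."""
    pids = []
    for line in ps_output.splitlines():
        # UID PID PPID C STIME TTY TIME CMD (CMD may contain spaces)
        fields = line.split(None, 7)
        if len(fields) < 8 or not fields[1].isdigit():
            continue
        command_name = fields[7].split()[0]
        # Comparing the command name itself excludes the grep process.
        if command_name == f"oracle{sid}":
            pids.append(int(fields[1]))
    return pids

sample = """\
oracle   16996 16995  0 21:55 ?        00:00:00 oraclewebmoney (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle   17010 16950  0 21:56 pts/1    00:00:00 grep oraclewebmoney
"""
print(oracle_pids(sample, "webmoney"))
```

The resulting PID is what gets passed to gdb as its second argument.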

Then start gdb, specifying the location of the oracle binary in the Oracle software home and the process ID:

[oracle@rh2 udump]$ gdb $ORACLE_HOME/bin/oracle  16996
GNU gdb Red Hat Linux (
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type “show copying” to see the conditions.
There is absolutely no warranty for GDB.  Type “show warranty” for details.
This GDB was configured as “x86_64-redhat-linux-gnu”…
(no debugging symbols found)
Using host libthread_db library “/lib64/tls/libthread_db.so.1″.

Attaching to program: /u01/oracle/product/10.2.0/db_1/bin/oracle, process 14594
Reading symbols from /u01/oracle/product/10.2.0/db_1/lib/libskgxp10.so…(no debugging symbols found)…done.
Loaded symbols for /u01/oracle/product/10.2.0/db_1/lib/libskgxp10.so
Reading symbols from /u01/oracle/product/10.2.0/db_1/lib/libhasgen10.so…done.
Loaded symbols for /u01/oracle/product/10.2.0/db_1/lib/libhasgen10.so
Reading symbols from /u01/oracle/product/10.2.0/db_1/lib/libskgxn2.so…done.
Loaded symbols for /u01/oracle/product/10.2.0/db_1/lib/libskgxn2.so
Reading symbols from /u01/oracle/product/10.2.0/db_1/lib/libocr10.so…done.
Loaded symbols for /u01/oracle/product/10.2.0/db_1/lib/libocr10.so
Reading symbols from /u01/oracle/product/10.2.0/db_1/lib/libocrb10.so…done.
Loaded symbols for /u01/oracle/product/10.2.0/db_1/lib/libocrb10.so
Reading symbols from /u01/oracle/product/10.2.0/db_1/lib/libocrutl10.so…done.
Loaded symbols for /u01/oracle/product/10.2.0/db_1/lib/libocrutl10.so
Reading symbols from /u01/oracle/product/10.2.0/db_1/lib/libjox10.so…

At the gdb prompt, enter print ksudss(10), that is:

(gdb) print ksudss(10)



System name:    Linux
Node name:      rh2
Release:        2.6.9-78.ELsmp
Version:        #1 SMP Wed Jul 9 15:46:26 EDT 2008
Machine:        x86_64
Instance name: webmoney
Redo thread mounted by this instance: 1
Oracle process number: 15
Unix process pid: 16996, image: oracle@rh2 (TNS V1-V3)

*** 2009-09-07 21:57:14.100
*** SERVICE NAME:(SYS$USERS) 2009-09-07 21:57:14.100
*** SESSION ID:(528.2041) 2009-09-07 21:57:14.100
System global information:
processes: base 0x91637c30, size 500, cleanup 0x9167a2e0
allocation: free sessions 0x91779840, free calls (nil)
control alloc errors: 0 (process), 0 (session), 0 (call)
PMON latch cleanup depth: 0
seconds since PMON’s last scan for dead processes: 45
system statistics:

[gview file=”http://askmaclean.com/wp-content/uploads/resource/gdb.pdf”]

