[转]著名安全专家Litchfield对Oracle开火

February 2nd, 2012 1 comment

著名的数据库安全专家 Litchfield为自己赋予的使命就是告诉全世界数据库软件并不安全——特别是Oracle的数据库。Litchfield曾经公开批评Oracle,甚至要求Oracle首席安全官Mary Ann Davidson下台。

Litchfield认为,长期以来,Oracle及其用户在安全领域里一直象鸵鸟一样把头插在沙子中。 Oracle采用了错误的方式来解决安全问题。

英国下一代安全软件的合作创办人Litchfield正在进行一场圣战。今年一月,他出版了一本Oracle黑客手册。手册的封面上说为读者提供了完整的访问和防护Oracle系统的方法。

在批判Oracle的同时,Litchfield却对微软极力推崇。他曾经公开声称微软最新的数据库软件SQL Server 2005是安全的。这种声明一定严重的伤害到了微软的主要竞争对手Oracle。Oracle已经眼看着一大块数据库市场划归了华盛顿Redmond的软件巨人。

在上周召开的Black Hat DC大会上,Litchfield讨论到了一种新的袭击技术使Oracle数据软件的漏洞问题更加严重。他向ZDNet澳洲的姐妹网站CNET News.com解释了揭露漏洞的必要性。

问:为什么您对数据库安全如此关注?还有其他那么多软件。
Litchfield: 数据库安全对于任何组织机构来说就象是王冠上的珠宝。这个星球上的每家机构都有数据库,而这组织机构存在的活力之源。没有什么比从源头进行把握更有效的安全措施。我们能够在周边进行安全工作,但是如果软件本身带有SQL injection这样的漏洞,那么安全措施就前功尽弃了。

我与Oracle的关系已经有所缓和。

尽管有防火墙,尽管网络服务器已经被锁定,但是网络应用中的SQL injection缺陷就能让我们一路畅通的进入数据库服务器的后端。如果这个数据库没有采用最低权限,或者没有完全打好补丁,那么我们就能对数据库进行充分的访问并攫取全部数据。

数据库必须是安全的。问题是在最近以前,没有人真正的处理过数据库服务器的后端。也就是说过去人们采取的都不过是外围安全措施。

最近您对Oracle的数据库相当关注。是有什么特别的原因让您对Oracle倾注更多吗?
Litchfield:是的。SQL Server 2005是安全的。因为微软解决了问题。Oracle正在解决问题。对于IBM,我研究过DB2和Informix,并为他们指出了从缓存溢出到权限增加等大约50个bug,IBM安全部门的反应是成熟的。

最近,Oracle安全部门的反应就没那么成熟。他们气势汹汹的,与“这个家伙在让我们的产品更安全”的想法完全相反。不过他们的态度现在有所好转。Oracle正在开始理解我和他们站在同一条战线上,只是彼此的看法不同。

当Oracle这样的厂家态度强硬时,您就会变得更加强硬?
Litchfield:是的。很遗憾我正是这样行事的。但是如果你不得不保护自己,那么就保护自己吧。我更愿意去工作,就象我对微软和IBM那样,与他们的安全响应团队一起工作。我们与微软和IBM拥有良好的关系。有什么比良好的关系好的成事方法呢?我可不想站在浑水中互相指责。

我与Oracle的关系有所缓解,他们理解这并不是一场意志上的对决。我努力使他们了解他们数据库所存在的问题,因为这些问题对我造成了直接的对影响。如果有人闯入数据库服务器然后窃取了我的信息,付出代价的是我,而不是Oracle

有人可能会认为这有点象敲诈。
Litchfield:我可从来没有向Oracle索要过财物。如果人们这么想,那么他们得到的信息可能有误。

 

 

那么微软也没有雇用你来说SQL Server 2005是安全的?
Litchfield:我说微软的产品是安全的但是没有从微软那里得到什么报酬,如果任何人在SQL Server
2005中找到bug,那个人最好是我。如果别人找到什么bug,它会破坏我将来判断产品是否安全的能力。因此,如果在SQL Server 2005中的确存在bug,我希望是我首先发现。我很期待。

 

 

微软过去和现在是否是NGS软件的客户?
Litchfield:NGS的确在微软工作,但我们并不是受雇来说他们是安全的——我们被雇来使他们的产品更安全。对于微软和NGS来说,现在以及将来的独立性都很重要。否则我们工作的正确性以及微软为使产品更安全所进行的努力就会遭到怀疑。这就是NGS 依然在为微软的产品提出安全建议的原因。

 

 

我听说您曾经担任SQL Server 2005的安全审计工作,是这样吗?
Litchfield:我不能说具体的说到我们所做的项目。这样,如果有人对SQL Server是否比Oracle安全的问题存在疑问,他所要做的就是想想包括那么多顶级研究人员在内的很多人都曾经研究过两个产品,寻找过安全漏洞。而SQL Server已经很长时间没有被发现问题了。我再重复一遍,如果有人在SQL Server 2005中发现严重的漏洞,那么我希望那个人是我。

 

 

Oracle是否曾经是NGS软件的客户?
Litchfield:是的,过去我们与Oracle合作过几个项目。

 

 

NGS软件的主要业务是什么?
Litchfield:我们的业务分三个方面。我们销售评估安全状况和是否遵从萨班斯 – 奥克斯利法案的工具;我们为一些组织机构提供顾问服务;而且我们还进行漏洞调研并销售调研报告。

 

 

你们一般调研对象是什么样的机构?
Litchfield:负责和保护关键性国家基础设施的政府机构。我们试着对他们的安全问题提出事前警告。我们能够告诉他们某个产品存在缺陷,并且提供消除问题的策略。甚至没有厂家提供的补丁,系统也能得到保护。

靠无知来保证安全是行不通的,因为某个人的无知就是别人的生财之道。

 

 

NGS过去几年发展顺利,这些需求来自哪里?
Litchfield:主要是顾问工作。说起来惭愧,我最初要成立一家软件企业,但现在却更象一家顾问公司。尽管我没有放弃,但也算是我个人的一次失败。我们到某个阶段还会成为一家软件公司。
顾问一般怎么工作?
Litchfield:他可能会做渗透测试,审查代码或者模仿入侵。我们所做的不是安装防火墙那样的工作,我们所从事的是高端工作。

 

是什么每天推动您进行工作?
Litchfield:是因为我对次很擅长。如果你很擅长某件事情,您的动力就会更足。如果我是优秀的画家,我就会画很多作品。如果我对此一窍不通,我当然就不会费心劳力的去画画。我很享受我的工作。

 

是不是特别享受发现bug的工作?
Litchfield:是的。这是一个关于分析的问题。如果我尝试推翻某个系统,我该怎么做呢?另一个原因是它会影响每个人的生活。现在,不是在拿死马当活马医。我知道明天数据库服务器将会更加安全。打个比方说,到那一天,更多的信用卡用户会更安全。

 

如果Oracle的人说你暴露缺陷的的行为实际上伤害了安全,你会怎么说?

Litchfield:在他们假设的情况下这样做的确会提高了风险等级。好的,这的确是这类工作最主要的问题。不过,在风险度提高以后,人们会更倾向于保护自己的系统。
举例来说,我刚刚披露了一种能使没有特殊权限的入侵者利用只有具有更高权限用户才能使用的漏洞进行袭击的方法。现在我们知道这种担心是不对的,因为人们没道理知道这个缺陷以后不打补丁。

有人在我贴出新方法后的零时间内利用我的方法修改入侵手段,并进行公布。于是任何人都可以使用这种手段,所以这的确增加了风险。

回头看2002年8月,我发布的一些代码被用做SQL Slammer病毒的基础。这属于最初的风险增加,但是短痛之后,打过补丁的SQL Servers数量增加了。短期风险成为了长期的受益。这是我对此的看法。

 

有人可能会说我们不想知道都有什么安全隐患,也就不会有人进行利用。你认为这有道理吗?
Litchfield:我不这么认为。世界上总有坏人。如果没有好人来帮助厂家弥补这些漏洞,那么我们会自以为我们是安全的,但实际上我们并不安全。对安全问题视而不见是起不了作用的,因为一个人的无知就是另外一个人的生财之道。

 

什么使您觉得最烦恼?
Litchfield:当人们说我增加了风险或者我的行为出于自私目的时,实际上并不是那样。不过我不会总那么受欢迎,我只是希望诽谤能够少一些。

 

您最近出版了Oracle黑客手册。您的目的是什么?
Litchfield:Oracle的安全世界里充斥着自鸣得意。我希望能够揭掉他们自我蒙蔽的毯子。外面有太多人认为Oracle的产品是安全的,他们无需采取任何措施。这是不负责任的,而我对此很在意。

 

你希望人们怎么看待你?
Litchfield:我希望能够成为帮助人们认识到数据库安全的重要性的人。我希望能够通过我的工作,以及我对行业的了解来改造Oracle 和微软这样的企业处理安全问题的方式。

历数几款第三方的Oracle数据库安全及漏洞扫描软件

February 1st, 2012 No comments

虽然oracle公司自有一套丰富的数据库产品线, 包括 oracle advanced security, VDP , Database vault , lable security , Database FireWall 等等。

但我们还是有必要关注一些第三方的 安全工具, 这些安全工具的主要用途 包括: 漏洞扫描,风险评估,安全建议,审计等。

 

Secure Oracle Auditor -  Secure Bytes 的产品 图形化的集中式审计工具, 可以自定义审计策略; 并分析数据库风险,
产品主页: http://www.secure-bytes.com/soa.php

软件截图:

 

 

Oracle Database Encryption Wizard For Oracle  – Relational Database Consultants, Inc (RDC)的产品  主要功能是 数据加密,  支持 AES256 and DES3 Encryption加密算法  , 在从版本7开始支持Oracle 11gR2 及HSM( Hardware Security Modules )。
产品主页:http://www.relationalwizards.com/html/ora_encyrption.html

软件截图:

 

 

DB  Protect  – AppSecInc  的产品 , 提供企业级 的数据安全方案, 功能包括 隔离敏感数据库, 发现及修正可能存在的数据风险, 控制企业员工的数据访问权限,  监控越权行为等。
产品主页: http://www.appsecinc.com/products/dbprotect/index.shtml

软件截图:

 

 

NGS SQuirreL for Oracle -  NGS secure 的产品 , 算是已经在国内比较知名的 oracle 风险评估漏洞扫描工具, 支持从oracle 7.3 到 11g的主要版本。
产品主页: http://www.ngssecure.com/services/information-security-software/ngs-squirrel-for-oracle.aspx

 

产品介绍:

 

Oracle数据库安全漏洞扫描工具——NGSSQuirreL for Oracle

NGSSQuirrel for Oracle是国际顶级的数据库安全漏洞扫描工具。深圳市九州安域科技有限公司是NGSSQuirrel for Oracle数据库安全漏洞扫描工具中国区授权代理。

NGSSQuirreL for Oracle支持Oracle 8i,9i和10g,并且可以检查几千个可能存在的安全威胁、补丁状况、对象和权限信息、登陆和密码机制、存储过程以及启动过程。NGSSQuirrel提供强大的密码审计功能,包括字典和暴力破解模式。

NGSSQuirrel

1.         专业数据库漏洞评估工具;

支持MSSQL Server,Oracle,Informix,DB2,MySQL ,Sybase ASE数据库

支持对数据库所有实例的特权、角色、表单、视图、存储过程等进行安全检测。

2.         用于保护基础数据库平台安全并确保数据库满足安全法规的要求;

3.         创建Lockdown脚本,用于自动修复数据库扫描中未发现的漏洞

4.         通过check selection功能,可实行具体扫描或针对目标客户的扫描。

可以为特定目标扫描存储某一定制模板

可根据具体合规性扫描选择合规性模板

5.     提供业界专业的数据库安全资讯以及安全教材。

NGS SQuirreL 数据库扫描检测内容:

1.  扫描数据库默认口令、弱口令

2.     检测触发器、存储程序、表单、包等的访问权限

3.     识别默认Object的漏洞

4.     利用密码哈希值运行密码审计

5.     审查密码策略

6.     审查数据库的版本以及补丁情况

7.     检查数据库所有安全配置以及安全审计配置

8.     针对所有发现的安全问题提供修复建议

9.     可针对单一数据库或单一实例进行扫描

NGS SQuirreL for Oracle, SQL Server, MySQL, DB2 & Informix 行业合规性审计

NGS扫描器其内置有如下所有的合规性模块:

PCI DSS(支付款行业数据安全标准V1.2或V2.2)

SOX

HIPAA

Gramm-Leach Bliley Act

FISMA

SANS Top 20

CIS Benchmark for Oracle 9i/10g Ver. 2.0

CIS Benchmark for SQL Server 2005 v1.0

CIS Benchmark for MySQL v1.0.2

Oracle 基准

NSA SQL Server 2000 V1.5 安全配置和管理指南

这些模板都会保持持续更新;NGS 审计客户将满足这些标准的要求

 

NGSSQuirrel for Oracle 目前在国内有 九州安域 和   XLSoft  2家代理。

 

软件截图:

 

 

DB Audit – SoftTree的产品    功能强大的数据库安全和审计产品, 支持Oracle, Sybase, DB2, MySQL, Microsoft SQL Server等主流数据库。  DB Audit Expert是一款专业的数据库安全评估,审计和提供解决方案的数据库管理系统。DB Audit Expert允许数据库及系统管理员,安全管理员,审计人员和操作人患跟踪和分析数据库的活动,包括对数据库的访问,使用,对象的建立,修改和删除等。 DB Audit真正独特的是它内置多个审计方式,让您灵活地选择最适合你的数据库安全性要求的审计方式。
产品主页:http://www.softtreetech.com/

 

 

产品介绍

 

   DB Audit Expert是一款专业的数据库安全评估,审计和提供解决方案的数据库管理系统。DB Audit Expert允许数据库及系统管理员,安全管理员,审计人员和操作人患跟踪和分析数据库的活动,包括对数据库的访问,使用,对象的建立,修改和删除等。DB Audit真正独特的是它内置多个审计方式,让您灵活地选择最适合你的数据库安全性要求的审计方式。

主要优势:

提高系统的安全性并确保系统问责制。
捕获常规和“后门”访问被审计的数据库系统。
从易于管理的单一位置,集中了安全和审计控制多个数据库系统的功能,
统一审计的图形界面功能,缩短了学习曲线,很容易使用。
提供分析报告,全面总结概括,减少审核大量数据从而使轻松地识别各种数据库的安全性侵犯。
提供分析报告,以确定哪些进程和用户占用系统资源。
提供本地数据库审计不可用的审计线索的细节。
当敏感数据发生变化时,提供能够生成对关键人员生成电子邮件警报。
解放了DBA,不再需要创建和管理用于数据更改审计目的精心调校的数据库触发器。
支持灵活的审计配置,使安全人员可以选择必须监督和审计跟踪记录的数据库操作和数据修改的特定类型。
对现有的应用程序提供完全透明的系统级和数据更改审计,无需任何修改这些应用。
完全兼容所有主机操作系统可以运行支持的数据库,包括但不限于Windows NT,UNIX和Linux,虚拟机,OS/390,z/OS。

DB Audit 在多种平台,多种数据库上都有完整的解决方案:
安全性预防管理
侦测和安全配置分析
审计与监控
弱点及渗入测试
校正

多种数据库统一管理,操作简便学习周期短,方便使用。通过左侧数据库树型目录可以方便管理各种数据库;右侧大块的工作区域中,将所有的审计按功能分类,可清晰地完成所有的配置。

DB Audit 可以出色的工作在多种平台,多数据库的复杂环境中,通过警告中心服务器收集、存储和分析各种数据库的审计警告,并按照管理中心所配置将审计报表或警告发送到不同的部门及用户。

DB Audit客户:
DB Audit拥有众多大客户,例如,M&T银行,道琼斯公司,富士银行(Fuji Bank),亨廷顿银行(Huntington Bank),Wells Fargo银行,北方信托公司, (The Reserve Funds)储备基金,第一资本金融公司(Capital One Financial Corp.),3M公司,AT&T公司,IBM公司,戴尔公司,JP摩根大通,惠普,壳牌,索尼,美国军队,美国航空航天局等。

 

软件截图:

 

DB Audit 目前在国内有一家授权代理  北京铸锐数码科技

 

 

Audit DB – LuMigent 公司的产品 , 功能包括 数据库活动监控、审计、 用户权限监控、变更复核、访问监控等。
产品主页: http://www.lumigent.com/products/audit-db

软件截图:

 

 

 

DBCoffer – 难得一见的国产数据库安全产品。 相关介绍: 国内首款主动预防型 数据库安全加固产品,存储层、数据访问层、应用访问层全方位防止数据泄密。
产品主页: http://www.schina.cn/a/fangan/shujukubaoxianxiang/about.html

目前找不到该款产品的软件截图 www.oracledatabase12g.com Here

 

Script:找出ASM中的Spfile参数文件

January 30th, 2012 No comments

以下脚本可以用于找出ASM存储中的Spfile参数文件,因为使用asmcmd去查找很不方便,而spfile丢失又是很头大的事情, 所以有一个脚本代劳可以省不少功夫呢!

 

 

--- listspfiles.sql
--- Purpose: Sample script to list spfiles kept in ASM instance
--- Usage: This should be run against an ASM instance,
--- not a database instance.
---
--- cut here --%<----%<----%<----%<----%<----%<--

--list all spfiles

set lines 120
col full_path for a110
SELECT full_path, dir, sys
FROM
(SELECT
CONCAT('+'||gname,SYS_CONNECT_BY_PATH(aname,'/')) full_path,
dir, sys FROM
(SELECT g.name gname,
a.parent_index pindex, a.name aname,
a.reference_index rindex, a.ALIAS_DIRECTORY dir,
a.SYSTEM_CREATED sys
FROM v$asm_alias a, v$asm_diskgroup g
WHERE a.group_number = g.group_number)
START WITH (MOD(pindex, POWER(2, 24))) = 0
CONNECT BY PRIOR rindex = pindex
ORDER BY dir desc, full_path asc)
WHERE UPPER(full_path) LIKE '%SPFILE%'
/

Sample output:

FULL_PATH                                                                                                      D S
-------------------------------------------------------------------------------------------------------------- - -
+DATA/Aspfile.ora                                                                                              N N
+DATA/VPROD/PARAMETERFILE/spfile.273.766620265                                                                 N Y
+DATA/VPROD/PARAMETERFILE/spfile.365.773976489                                                                 N Y
+DATA/VPROD/spfileVPROD.ora                                                                                    N N

Oracle Internal Event:10235 Heap Checking诊断事件

January 30th, 2012 No comments

10235 (check memory manager internal structures) event内部诊断事件,当oracle SGA内存堆heap 管理操作发生错误时会触发该检测事件(For the sga heap, the internal error is signaled only when heap checking (event 10235) is enabled),其级别列表如下:

 

[oracle@nas ~]$ oerr ora 10235
10235, 00000, "check memory manager internal structures"
// *Cause:
// *Action:

List of Levels for Event 10235
Level 1: Check heap before heap is freed.
Level 2: Level 1 + fill allocation and frees with 0xff.
Level 3: level 2 + checks to make sure chunk belongs to heap prior to operation (free, grow, make chunk as
freeable with mark). Scan of extents, could get expensive
Level 4: level 3 + allocates permanent chunks as freeable chunks. This accomplishes two things:
all permanent allocations are in their own chunks and the comments for the permanent chunks appear in a heap dump.
The following levels should be OR'ed in together. For example, in order to use levels 2, 8 and 16,
event 10235 needs to be set to level 26.
Level 8: Check heap on every operation to the heap.
Level 16: Level 8 + check top PGA heap and SGA heap.
Level 32: Level 16 + check all heaps in the top pga heap, recursively; check SGA heap.
Level 64: Level 32 + check SGA heap if started up in single_process=true.
Level 256: align chunks at the bottom of a page and page protect the page that follows the chunk,
to catch writes off the end of a chunk, for all heaps.
Level 512: Like level 256, except for all heaps in the pga heap; so this event is useful for detecting
corruption in the pga heap and pga subheaps. This can also be enabled on specific heaps or chunks with
specific comments.
Level 65536: Enable CPM(Commented PerManent chunk) Keep comment for permanent allocation. It's useful for
detecting memory growth/overrun in permanent space. If you need to investigate the allocations for "library cache",
The combination of event 10049 level 10100 and this event will help to have more descriptive information.

 

注意只有当 怀疑oracle实例存在内存讹误(memory corruption)或者深入研究KGH堆管理内部原理的时候才有必要使用该10235事件,不要在生产库设置该诊断事件!! 不推荐在session会话级别设置该event 可能引起问题,具体的设置方法如下:

 

alter system set event=' 10235 trace name context forever,level 512 ' scope=spfile;
restart instance;

 

 

实际上绝大多数情况下我们不会用到10235 event, 即便是提交有关内存泄露(memory leak)或者内存讹误(memory corruption)的SR后 oracle Support 要求你上传一些heapdump的trace信息, 直接做heapdump转储也已经足够了, 具体用法如下:

 

 

alter session set event 'trace name immediate heapdump level <n>';

level 级别列表如下:

 1:          pga heap,  1025:            pga heap w/ contents
 2:          sga heap,  2050:            sga heap w/ contents
 4:          uga heap,  5000:            uga heap w/ contents
 8:          current call heap,  8200: current call heap w/ contents
16:         user call heap, 16400:     user call heap w/ contents
32:         large alloc heap, 32800:  large alloc heap w/ contents

若希望dump转储某个特定的subheap ,则先要知道该heap descriptor 的address地址

alter session set event 'trace name immediate headump_addr level <addr>';

如何诊断ASMLIB故障

January 30th, 2012 No comments

虽然我并不推荐你使用ASMLIB绑定设备名, 详见这篇文章《Why ASMLIB and why not?》 。  但大概因为介绍ASMLIB的文章远多于UDEV的缘故, 导致有大量对RAC安装配置不太熟悉的朋友仍执意采用ASMLIB,又因为ASMLIB的配置不算太简单所以在实际安装RAC之前的ASMLIB实施过程中有不少人遇到了问题, 其次在ASMLIB的使用过程中麻烦也不少。

这里总结了一下ASMLIB的诊断思路, 如下脚本:

 

cat /etc/sysconfig/oracleasm

1) uname -a
2) rpm -qa | grep ^oracleasm
3) rpm -V oracleasmlib
4) multipath -ll

1) output of command line

# rpm -V oracleasm-support

# /etc/init.d/oracleasm scandisks

# /etc/init.d/oracleasm listdisks

# ls -l -R /dev/oracleasm/

# ls -l /etc/sysconfig/oracleasm

# cat /etc/sysconfig/oracleasm

# mount

2) oracleasm log file

/var/log/oracleasm

3) sosreport

By default the "sos" package should be installed into EL4u6 or later.
(If not, please download the sos package from ULN https://linux.oracle.com)

You just need type command "sosreport" as root user, and press "Enter" or "yes" for all the questions.

The sosreport will run for several minutes, according to different system, the running time might be more longer.
Once completed, "sosreport" will generate a compressed sosreport-xx-xx.bz2 file under /tmp.

[summary]
- confirm system build asm disk on muoltipath devices
- modify /etc/udev/rules.d.90-dm.rules
- currently both nodes could find the asm disk from scandisks and listdisks
- sharon.honor will try installer again, if necessary need get help from application(RAC) team

1. Reboot the box.

2. Run the following commands
#fdisk -l
#multipath -ll
#blkid
#cat /etc/sysconfig/oracleasm
#cat /etc/sysconfig/oracleasm-_dev_oracleasm

#uptime
#/etc/init.d/oracleasm start
#/etc/init.d/oracleasm listdisks

#uptime
#/etc/init.d/oracleasm scandisk

Please also modify the /etc/sysconfig/oracleasm-_dev_oracleasm with below.
ORACLEASM_SCANORDER="dm"
ORACLEASM_SCANEXCLUDE="sd"

The devices that asmlib will scan is controlled in the /etc/sysconfig/oracleasm file
with the "scanorder" and "scanexclude" parameters.
Categories: Oracle, Oracle ASM Tags: ,

如何诊断ORA-125XX连接问题

January 25th, 2012 No comments

以下这个Action Script是我收集的在解决ORA-125XX(如ORA-12560)这类网络链接故障时的一些思路,主要包括 现有的网络配置(client & server side)、监听日志、SQLNET Client trace等信息– How to troubleshooting ORA-125** connection issues:

 

ORA-12560

 

# CHECK FIREWALL, WINDOWS FIREWALL , ANTI-Virus Software First !
ping hostname
tnsping TNS
trcroute TNS
telnet  <hostname> <port>  

tracert hostname

client side
sqlplus scott/tiger@TNS

&
server side
sqlplus scott/tiger@TNS

cat /etc/hosts
cat /etc/resolv.conf
cat /etc/nsswitch.conf
ipconfig -a
ping 127.0.0.1

$ORACLE_HOME/network/admin/sqlnet.ora
$ORACLE_HOME/network/admin/tnsnames.ora
$ORACLE_HOME/network/admin/listener.ora
$ORACLE_HOME/network/admin/endpoints_listener.ora
$ORACLE_HOME/network/log/*
sqlnet.log listener.log
/var/log/messages
/var/adm/messages
errpt -a

ls -ld $ORACLE_HOME
netstat -rn
ps -ef | grep -i tns
lsnrctl status {listener_name}
lsnrvtl services {listener_name}

ulimit -a

1. Complete database alert log.

2. If the database was not restarted from the time of last occurance of the
issue,

select * from v$resource_limit

3. RAM and SWAP configured on the server.

4. ulimit settings for oracle user:

ulimit -aS
ulimit -aH

5. Kernel parameter settings:

/etc/sysctl.conf

dblogin

show parameter cluster_database

show parameter listener

$srvctl config vip -n {nodename}

$lsnrctl status listener

agent.log and the crsd.log ..

crsd agent log and the crsd.log
$crsctl getperm resource ora.LISTENER.lsnr

sql net client trace , Client side tracing is done by adding the following syntax to the client’s sqlnet.ora file:

We will need a timestamped matching set of client/listener sqlnet traces while error is reproduced in order to find the root cause of the issue.

++ Enable client sqlnet tracing.
=======================

To do this add the following to client sqlnet.ora:

TRACE_LEVEL_CLIENT=16
TRACE_UNIQUE_CLIENT=TRUE
TRACE_DIRECTORY_CLIENT=path
TRACE_FILE_CLIENT=client
TRACE_TIMESTAMP_CLIENT=ON

replace path with a local directory for the trace files. (for example c:\temp)
Do a test connection from the problematic client and check if the trace files are created.
Upload the traces containing the error to me on metalink.

++ Enable listener sqlnet tracing.
==========================
To do this edit the listener.ora and add,

TRACE_LEVEL_{listener name}=16
TRACE_TIMESTAMP_{listener name}=TRUE
TRACE_DIRECTORY_{listener name}=/tmp {– this can be any directory other than a top level directory like / or c:\

Replace {listener name} with the name of the listener. For example if your listener was called LISTENER then TRACE_LEVEL_LISTENER=16

You need to restrict the amount of disk space used by the tracing then you must also set,

TRACE_FILELEN_{listener name}=500000 {– size of the files in K
TRACE_FILENO_{listener name}=10 {– number of files

This will limit the traces to 10 files of around 500Mb, so 5000Mb in total. When the 10th file is full it will reuse file number one.
You will need to stop/start the listener for this to take effect.
When the problem reproduces please can you upload the listener trace and the listener log.

Trace_level_client=16
Trace_directory_client={path_to_the_trace_directory} # use the full path to the trace directory
Trace_unique_client=on
Trace_timestamp_client=on
Diag_adr_enabled=off

trace Local listener or SCAN listeners
TRACE_LEVEL_{listener_name}= 16
TRACE_TIMESTAMP_{listener_name}=on
TRACE_DIRECTORY_{listener_name}={path_to_the_trace_directory}

truss -o /tmp/lisener.out -fae lsnrctl start {listener_name}

Some Useful Note:

Note.444705.1 TroubleShooting Guide For ORA-12514 TNS listener could not resolve SERVICE_NAME given in connect descriptor
Note.761740.1 Technicians Unable To Receive Orders While MWM Components Display ODBC Errors And Are Connected
Note.119007.1 ORA-12560: Administering the Listener on UNIX – Troubleshooting
Note 276812.1 TNS-12542 Error When Executing Batch Jobs or in High Transaction Environment
Note.219208.1 Ext/Pub Client Connection via Connect Manager Fails with TNS-12564
Note.393941.1 Ext/Mod ORA-12564 Reported When Using 10g Connection Manager
Note.1116960.1 ORA-609 TNS-12537 and TNS-12547 in 11g Alert.log
Note.550859.1 Abstract TROUBLESHOOTING GUIDE TNS-12518 TNS listener could not hand off client connection.
Note.207303.1 Client / Server / Interoperability Support Between Different Oracle Versions
Note.119706.1 Troubleshooting Guide TNS-12535 or ORA-12535 or ORA-12170 Errors

For database links between different Oracle versions connections must be supported in BOTH directions in the matrix found in Note 207303.1
eg: As 9.2 -} 7.3.4 is not supported then database links between these version
are not supported in either direction.
You are trying to connect two versions (client-server) that are not certified (as confirmed by Note 207303.1) and between which exist many technical incompatibilities.

CLIENT — LISTENER — SERVER RESULT

8 11.1 8 OK
9 11.1 9 OK
10 11.1 10 OK
11 11.1 11 OK

8 11.2 8 FAILS
9 11.2 9 OK
10 11.2 10 OK
11 11.2 11 OK

9 11.1 8 OK
10 11.1 8 OK
11 11.1 8 OK

9 11.2 8 FAILS
10 11.2 8 FAILS
11 11.2 8 FAILS

The relevant relationship that appears to be at issue is LISTENER and DATABASE. Client version is not a factor.

But if the ultimate outcome is that the 11.2 (11gR2) LISTENER is indicated (though I still haven’t seen documentation of this) as not compatible with use on a ORACLE 8i (8.1.7.0) DATABASE, then we’ll capture that here and move on. I would, however, like to see some evidence of this, if it is available. I can find notes in the KB about 10gR2′s listener not supporting 8i database, and I can find notes about 11gR1 having resolved that regression. But I can find nothing regarding listener/database compatibility that mentions 11gR2, that would explain our results.
Clients should be complied with Servers , For Sever 11.2 the only supported clients are 11.2.0 , 11.1.0 , 10.2.0 : 10g end MUST be at 10.2.0.2 (or higher) respectively in order to use PLSQL between those versions. See Note:4511371.8 for more details and finally 10.1.0.5 only with extended support .

On the other Side in order to connect from listener to DB server in a supported way , Listener version should be greater than or equal to the server version .

Note 207303.1 should still be followed.

Oracle为何会发生归档日志archivelog大小远小于联机重做日志online redo log size的情况?

January 24th, 2012 2 comments

《rac不到redo大小就切换是什么原因?》

 

这个帖子详细讨论了RAC或单机环境中归档日志没到online redolog size 大小就马上切换问题的成因, 主要是由于oracle内部算法所决定的,  一般不会造成实例 性能或可用性问题, 唯一的困扰可能是 生成了较多的小归档日志。

 

Know more about RBA redo block address

January 24th, 2012 No comments

A “Redo Block Address” (RBA) describes a physical location within a redo log file. An RBA consists of the following:

Log sequence number
Block number within log file
Byte offset within block

The structure of the redo block address is given below :
1 – The number of the redo log.
[4 bytes]

2 – The block offset from the beginning of the redo log file.
[4 bytes]

3 – The byte offset within the block.
[2 bytes]

Log sequence number v$logmnr_contents.RBASQN
Block number within log file v$logmnr_contents.RBABLK
Byte offset within block v$logmnr_contents.RBABYTE
So in this case, RBA [0x19.2.10] maps to Log squence 25, Block number 2 with byte offset 16.
With this information you should be able to find the details from v$logmnr_contents.

In particular, the buffer header information includes two fields: low-RBA and high-SCN. The low-RBA field indicates the point in the redo-log corresponding to the block’s first update after it came into the buffer cache. The high-SCN field indicates the SCN of the last update to the block.

***************************************************************************
CHECKPOINT PROGRESS RECORDS
***************************************************************************
 (size = 8180, compat size = 8180, section max = 11, section in-use = 0,
  last-recid= 0, old-recno = 0, last-recno = 0)
 (extent = 1, blkno = 2, numrecs = 11)
THREAD #1 - status:0x2 flags:0x0 dirty:47
low cache rba:(0x37a.11c.0) on disk rba:(0x37a.151.0)
on disk scn: 0x0000.0045d2f2 12/19/2011 01:03:21
resetlogs scn: 0x0000.00080634 11/07/2011 04:08:50
heartbeat: 770246919 mount id: 2811012380
THREAD #2 - status:0x0 flags:0x0 dirty:0
low cache rba:(0x0.0.0) on disk rba:(0x0.0.0)
on disk scn: 0x0000.00000000 01/01/1988 00:00:00
resetlogs scn: 0x0000.00000000 01/01/1988 00:00:00
heartbeat: 0 mount id: 0

9i中DG remote archive可能导致Primary Database挂起

January 23rd, 2012 No comments

9i中若dg的remove archive stuck卡住可能即便在Max Performance最大性能模式下也会导致Primary database 主库的local archive 本地归档无法正常进行, 进而导致Primary db hang挂起。

Solaris sparce 64  平台上的9.2.0.5 库 的相关日志:

 

Verifying the issue in the alert log

Fri Oct 28 08:10:12 2011
ALTER SYSTEM SET log_archive_dest_1='location=/u24/app/oradata/temp_arch' SCOPE=BOTH;
Fri Oct 28 08:10:51 2011
ORACLE Instance maclean - Can not allocate log, archival required
Fri Oct 28 08:10:51 2011
ARCH: Connecting to console port...
Thread 1 cannot allocate new log, sequence 18963
All online logs needed archiving
 Current log# 3 seq# 18962 mem# 0: /u23/app/oradata/maclean/redo3a.log
 Current log# 3 seq# 18962 mem# 1: /u24/app/oradata/maclean/redo3b.log
Fri Oct 28 08:12:22 2011
ALTER SYSTEM SET log_archive_dest_1='location=/u04/app/oracle/admin/maclean mandatory' SCOPE=BOTH;

<== changing archive destination multiple times

Fri Oct 28 08:27:56 2011
Shutting down instance: further logons disabled
Shutting down instance (immediate) <===== issued shut immediate
License high water mark = 546
Fri Oct 28 08:30:55 2011
PMON failed to delete process, see PMON trace file
Fri Oct 28 08:33:11 2011
Active call for process 19878 user 'oracle' program 'oracle@e02k72 (TNS V1-V3)'
SHUTDOWN: waiting for active calls to complete. <=== looks there are active trans and waiting to complete
Fri Oct 28 08:36:40 2011
Starting ORACLE instance (force) <=== issued startup force

Fri Oct 28 08:37:03 2011
ALTER DATABASE   MOUNT
Fri Oct 28 08:37:08 2011
Successful mount of redo thread 1, with mount id 2399989231
Fri Oct 28 08:37:08 2011
Database mounted in Exclusive Mode.
Completed: ALTER DATABASE   MOUNT <=== mount completed

Fri Oct 28 08:37:08 2011
ALTER DATABASE OPEN
Fri Oct 28 08:37:08 2011
Beginning crash recovery of 1 threads <=== started crash recover as expected
Fri Oct 28 08:37:08 2011
Started redo scan
Fri Oct 28 08:37:08 2011
Completed redo scan
0 redo blocks read, 0 data blocks need recovery
Fri Oct 28 08:37:08 2011
Started recovery at
Thread 1: logseq 18962, block 409599, scn 2775.3390299666
Recovery of Online Redo Log: Thread 1 Group 3 Seq 18962 Reading mem 0
 Mem# 0 errs 0: /u23/app/oradata/maclean/redo3a.log
 Mem# 1 errs 0: /u24/app/oradata/maclean/redo3b.log
Fri Oct 28 08:37:08 2011
Completed redo application
Fri Oct 28 08:37:08 2011
Ended recovery at
Thread 1: logseq 18962, block 409599, scn 2775.3390319667
0 data blocks read, 0 data blocks written, 0 redo blocks read
Crash recovery completed successfully
Fri Oct 28 08:37:08 2011
LGWR: Primary database is in CLUSTER CONSISTENT mode
Fri Oct 28 08:37:08 2011
ARCH: Evaluating archive   log 1 thread 1 sequence 18959
ARCH: Beginning to archive log 1 thread 1 sequence 18959
Creating archive destination LOG_ARCHIVE_DEST_2: 'maclean_dr'
Creating archive destination LOG_ARCHIVE_DEST_1: '/u04/app/oracle/admin/maclean/maclean_1_18959.arch'
Fri Oct 28 08:41:12 2011
alter database open
Fri Oct 28 08:41:12 2011
ORA-1154 signalled during: alter database open...
Fri Oct 28 08:59:10 2011
ARCH: Completed archiving  log 1 thread 1 sequence 18959
Fri Oct 28 08:59:10 2011
LGWR: Primary database is in CLUSTER CONSISTENT mode
Thread 1 advanced to log sequence 18963
Thread 1 opened at log sequence 18963
 Current log# 1 seq# 18963 mem# 0: /u23/app/oradata/maclean/redo1a.log
 Current log# 1 seq# 18963 mem# 1: /u24/app/oradata/maclean/redo1b.log
Successful open of redo thread 1
Fri Oct 28 08:59:11 2011
LOG_CHECKPOINT_INTERVAL was set when MTTR advisory was switched on.
Fri Oct 28 08:59:11 2011
ARC0: Evaluating archive   log 2 thread 1 sequence 18960
ARC0: Beginning to archive log 2 thread 1 sequence 18960
Creating archive destination LOG_ARCHIVE_DEST_2: 'maclean_dr'
Fri Oct 28 08:59:11 2011
SMON: enabling cache recovery
Fri Oct 28 08:59:11 2011
Creating archive destination LOG_ARCHIVE_DEST_1: '/u04/app/oracle/admin/maclean/maclean_1_18960.arch'
Fri Oct 28 08:59:12 2011
Undo Segment 1 Onlined
Undo Segment 2 Onlined
Undo Segment 3 Onlined
Undo Segment 4 Onlined
Undo Segment 5 Onlined
Undo Segment 6 Onlined
Undo Segment 7 Onlined
Undo Segment 8 Onlined
Undo Segment 9 Onlined
Undo Segment 10 Onlined
Successfully onlined Undo Tablespace 1.
Fri Oct 28 08:59:12 2011
SMON: enabling tx recovery
Fri Oct 28 08:59:12 2011
Database Characterset is AL32UTF8
Fri Oct 28 08:59:12 2011
SMON: about to recover undo segment 17
Fri Oct 28 08:59:12 2011
replication_dependency_tracking turned off (no async multimaster replication found)
Fri Oct 28 08:59:12 2011
SMON: about to recover undo segment 17
SMON: mark undo segment 17 as available
SMON: about to recover undo segment 18
SMON: mark undo segment 18 as available
SMON: about to recover undo segment 17
SMON: mark undo segment 17 as available
SMON: about to recover undo segment 18
SMON: mark undo segment 18 as available
SMON: about to recover undo segment 17
SMON: mark undo segment 17 as available
SMON: about to recover undo segment 18
SMON: mark undo segment 18 as available
SMON: about to recover undo segment 17
SMON: mark undo segment 17 as available
SMON: about to recover undo segment 17
SMON: mark undo segment 17 as available
SMON: about to recover undo segment 17
SMON: mark undo segment 17 as available
SMON: about to recover undo segment 17
SMON: mark undo segment 17 as available
SMON: about to recover undo segment 17
SMON: mark undo segment 17 as available
SMON: about to recover undo segment 17
SMON: mark undo segment 17 as available
SMON: about to recover undo segment 17
SMON: mark undo segment 17 as available
SMON: about to recover undo segment 17
SMON: mark undo segment 17 as available
SMON: about to recover undo segment 17
SMON: mark undo segment 17 as available
SMON: about to recover undo segment 17
SMON: mark undo segment 17 as available
SMON: about to recover undo segment 17
SMON: mark undo segment 17 as available
SMON: about to recover undo segment 17
SMON: mark undo segment 17 as available
SMON: about to recover undo segment 17
SMON: mark undo segment 17 as available
SMON: about to recover undo segment 17
SMON: mark undo segment 17 as available
SMON: about to recover undo segment 17
SMON: mark undo segment 17 as available
SMON: about to recover undo segment 17
SMON: mark undo segment 17 as available
SMON: about to recover undo segment 17
SMON: mark undo segment 17 as available
SMON: about to recover undo segment 17
SMON: mark undo segment 17 as available
SMON: Parallel transaction recovery tried <=== transaction recovery
Fri Oct 28 08:59:14 2011
Completed: ALTER DATABASE OPEN <=== database is now open
Fri Oct 28 09:02:34 2011

This looks to be an expected behavior, as the database waited for the crash/transaction recovery to
complete and it opened the database just after the transaction recovery.

Note 414242.1 Database Hangs Because SMON is taking 100% CPU doing transaction recovery Startup Hang

Also Please note that 9i  is Sustaining Support as of  now, so we cannot raise any bug/backport requests for
this version of Oracle.
So, with limited resources and information We will do our best to solve your issue.
However, if there is no available workaround or fix for this version of Oracle, you may need to migrate to
a supported version like 11.2.0.3 to solve this problem.

Also please note that 9.2.0.5 is no more listed in the certification for Oracle Solaris on SPARC (64-bit) platform.
So i request you to upgrade to at least 9.2.0.8 immediately, if upgrade to 11.2.0.3 is not feasible immediately.

If it is up are you able to issue an "alter system switch logfile" command and it finish?
yes now alter system switch logfile is working. but we had to restart database couple of times by shutdown abort..
we did it around 4-5 times.. why it concerns us a lot bcoz produciton instance was down for 2-3 hours.

3. Upload the output of the command SQL> show parameter archive.

SQL> show parameter archive

NAME TYPE
------------------------------------ --------------------------------
VALUE
------------------------------
archive_lag_target integer
0
log_archive_dest string

log_archive_dest_1 string
location=/u04/app/oracle/admin
/maclean
log_archive_dest_10 string

NAME TYPE
------------------------------------ --------------------------------
VALUE
------------------------------
log_archive_dest_2 string
service=maclean_dr reopen=60
log_archive_dest_3 string

log_archive_dest_4 string

log_archive_dest_5 string

log_archive_dest_6 string

NAME TYPE
------------------------------------ --------------------------------
VALUE
------------------------------

log_archive_dest_7 string

log_archive_dest_8 string

log_archive_dest_9 string

log_archive_dest_state_1 string
enable

NAME TYPE
------------------------------------ --------------------------------
VALUE
------------------------------
log_archive_dest_state_10 string
enable
log_archive_dest_state_2 string
enable
log_archive_dest_state_3 string
enable
log_archive_dest_state_4 string
enable
log_archive_dest_state_5 string

NAME TYPE
------------------------------------ --------------------------------
VALUE
------------------------------
enable
log_archive_dest_state_6 string
enable
log_archive_dest_state_7 string
enable
log_archive_dest_state_8 string
enable
log_archive_dest_state_9 string
enable

NAME TYPE
------------------------------------ --------------------------------
VALUE
------------------------------
log_archive_duplex_dest string

log_archive_format string
maclean_%t_%s.arch
log_archive_max_processes integer
2
log_archive_min_succeed_dest integer
1
log_archive_start boolean

NAME TYPE
------------------------------------ --------------------------------
VALUE
------------------------------
TRUE
log_archive_trace integer
0
remote_archive_enable string
true
standby_archive_dest string
?/dbs/arch
SQL>

Destination LOG_ARCHIVE_DEST_2 is in CLUSTER CONSISTENT mode
Destination LOG_ARCHIVE_DEST_2 is in MAXIMUM PERFORMANCE mode
Destination LOG_ARCHIVE_DEST_2 is in CLUSTER CONSISTENT mode
Destination LOG_ARCHIVE_DEST_2 is in MAXIMUM PERFORMANCE mode

This indicates that your Primary Init.ora parameter log_archive_dest_2 is not in sync with your standby destination.

I presume your standby database maclean_dr is working properly?

The standby dataguard database may not be in a correct state. Archive logs not being successfully transported.
Possible password errors. Possible destination file errors, possible set up errors.

standby was in sync. we even defer log_archive_dest_2 during issue to check if log switch works after that.
but even after deferring the log_archive_dest_2 log switch didn't work .& database was in bad shape for 2-3 hours..

Fri Oct 28 08:35:32 2011
RFS: Possible network disconnect with primary database <<<<<DISCONNECT FROM PRIMARY - SHUTDOWN IMMEDIATE ON PRIMARY
Closing latent archivelog for thread 1 sequence 18960
EOF located at block 47105 low SCN 2775:-904684681 next SCN 16:0
Latent archivelog '/u04/app/oracle/admin/maclean/maclean_1_18960.arch'
If you wish to failover to this standby database, you should use the
following command to manually register the archivelog for recovery:
ALTER DATABASE REGISTER LOGFILE '/u04/app/oracle/admin/maclean/maclean_1_18960.arch';
Fri Oct 28 08:35:32 2011
Errors in file /u01/app/oracle/admin/maclean/udump/maclean_rfs_6173.trc:
ORA-00367: checksum error in log file header <<<<<<<<<<<<<<<<<<<<<<CHECKSUM ERROR IN FILES
ORA-00332: archived log is too small - may be incompletely archived
ORA-00334: archived log: '/u04/app/oracle/admin/maclean/maclean_1_18960.arch'

by 0954 the standby had caught up.

Fri Oct 28 09:54:41 2011
Media Recovery Log /u04/app/oracle/admin/maclean/maclean_1_18962.arch
Media Recovery Log /u04/app/oracle/admin/maclean/maclean_1_18963.arch
Media Recovery Log /u04/app/oracle/admin/maclean/maclean_1_18964.arch
Media Recovery Log /u04/app/oracle/admin/maclean/maclean_1_18965.arch
Media Recovery Log /u04/app/oracle/admin/maclean/maclean_1_18966.arch
Media Recovery Log /u04/app/oracle/admin/maclean/maclean_1_18967.arch
Media Recovery Log /u04/app/oracle/admin/maclean/maclean_1_18968.arch
Media Recovery Log /u04/app/oracle/admin/maclean/maclean_1_18969.arch
Media Recovery Log /u04/app/oracle/admin/maclean/maclean_1_18970.arch
Media Recovery Log /u04/app/oracle/admin/maclean/maclean_1_18971.arch
Media Recovery Log /u04/app/oracle/admin/maclean/maclean_1_18972.arch
Media Recovery Log /u04/app/oracle/admin/maclean/maclean_1_18973.arch

The following notes should help you to reduce some of the errors on the standby side.

"Data Guard 9i ORA-332 Error When Registering Partial Archive Log After Primary Server Goes Down (Doc ID 233253.1)"
"Note:14579.1 SOLARIS Configuring Keepalive on Solaris (SPARC): "

This shows the value of the keep alive timer

ndd /dev/tcp tcp_keepalive_interval

You can change the parameter with:

ndd -set /dev/tcp tcp_keepalive_interval <new-smaller-value>

Current value for tcp_keepalive_interval is 7200000,
Can you please suggest us the value for this parameter also let us know if this parameter requires
server reboot or it can be done online.

$ ndd /dev/tcp tcp_keepalive_interval
7200000

Fri Oct 28 04:59:21 2011
Creating archive destination LOG_ARCHIVE_DEST_1: '/u04/app/oracle/admin/maclean/maclean_1_18958.arch'
Fri Oct 28 04:59:22 2011
Creating archive destination LOG_ARCHIVE_DEST_1: '/u04/app/oracle/admin/maclean/maclean_1_18957.arch'
Fri Oct 28 05:01:48 2011
ORACLE Instance maclean - Can not allocate log, archival required
Fri Oct 28 05:01:48 2011
ARCH: Connecting to console port...
Thread 1 cannot allocate new log, sequence 18960
All online logs needed archiving
Current log# 1 seq# 18959 mem# 0: /u23/app/oradata/maclean/redo1a.log
Current log# 1 seq# 18959 mem# 1: /u24/app/oradata/maclean/redo1b.log
Fri Oct 28 05:12:10 2011
ARCH: Completed archiving log 2 thread 1 sequence 18957
ARCH: Evaluating archive log 4 thread 1 sequence 18956
ARCH: Unable to archive log 4 thread 1 sequence 18956
Log actively being archived by another process
Fri Oct 28 05:12:10 2011
Thread 1 advanced to log sequence 18960
Current log# 2 seq# 18960 mem# 0: /u23/app/oradata/maclean/redo2a.log
Current log# 2 seq# 18960 mem# 1: /u24/app/oradata/maclean/redo2b.log
Fri Oct 28 05:40:25 2011
ARC1: Completed archiving log 4 thread 1 sequence 18956
ARC1: Evaluating archive log 3 thread 1 sequence 18958
ARC1: Unable to archive log 3 thread 1 sequence 18958
Log actively being archived by another process
ARC1: Evaluating archive log 1 thread 1 sequence 18959
ARC1: Beginning to archive log 1 thread 1 sequence 18959
Creating archive destination LOG_ARCHIVE_DEST_2: 'maclean_dr'
Creating archive destination LOG_ARCHIVE_DEST_1: '/u04/app/oracle/admin/maclean/maclean_1_18959.arch'
Fri Oct 28 05:40:27 2011
ARCH: Evaluating archive log 3 thread 1 sequence 18958
ARCH: Unable to archive log 3 thread 1 sequence 18958
Log actively being archived by another process
ARCH: Evaluating archive log 1 thread 1 sequence 18959
ARCH: Unable to archive log 1 thread 1 sequence 18959
Log actively being archived by another process
Fri Oct 28 05:40:27 2011
Thread 1 advanced to log sequence 18961
Current log# 4 seq# 18961 mem# 0: /u23/app/oradata/maclean/redo4a.log
Current log# 4 seq# 18961 mem# 1: /u24/app/oradata/maclean/redo4b.log
Fri Oct 28 06:11:45 2011
ORACLE Instance maclean - Can not allocate log, archival required
Fri Oct 28 06:11:45 2011
ARCH: Connecting to console port...
Thread 1 cannot allocate new log, sequence 18962
All online logs needed archiving
Current log# 4 seq# 18961 mem# 0: /u23/app/oradata/maclean/redo4a.log
Current log# 4 seq# 18961 mem# 1: /u24/app/oradata/maclean/redo4b.log
Fri Oct 28 07:40:42 2011
Archive Log Stop
Archiving is disabled
Shutting down archive processes
Archiving is disabled
Archive Log Start
Archiving is enabled
ARCH: STARTING ARCH PROCESSES
ARCH: STARTING ARCH PROCESSES COMPLETE
Fri Oct 28 07:54:01 2011
ALTER SYSTEM SET log_archive_dest_state_2='defer' SCOPE=SPFILE;
Fri Oct 28 07:54:09 2011
ALTER SYSTEM SET log_archive_dest_state_2='defer' SCOPE=BOTH;
Fri Oct 28 08:03:52 2011
ALTER SYSTEM SET log_archive_dest_state_2='enable' SCOPE=BOTH;
Fri Oct 28 08:06:20 2011
ARC0: Completed archiving log 3 thread 1 sequence 18958
Fri Oct 28 08:06:20 2011
ARCH: Evaluating archive log 1 thread 1 sequence 18959
ARCH: Unable to archive log 1 thread 1 sequence 18959
Log actively being archived by another process
ARCH: Evaluating archive log 2 thread 1 sequence 18960
ARCH: Beginning to archive log 2 thread 1 sequence 18960
Creating archive destination LOG_ARCHIVE_DEST_2: 'maclean_dr'
Fri Oct 28 08:06:20 2011
Thread 1 advanced to log sequence 18962
Current log# 3 seq# 18962 mem# 0: /u23/app/oradata/maclean/redo3a.log
Current log# 3 seq# 18962 mem# 1: /u24/app/oradata/maclean/redo3b.log
Fri Oct 28 08:06:20 2011
ARCH shutting down
ARC0: Archival stopped
ARC0: Shutdown aborted (current state is 3)
ARC0: Evaluating archive log 1 thread 1 sequence 18959
ARC0: Unable to archive log 1 thread 1 sequence 18959
Log actively being archived by another process
ARC0: Evaluating archive log 2 thread 1 sequence 18960
ARC0: Unable to archive log 2 thread 1 sequence 18960
Log actively being archived by another process
Fri Oct 28 08:06:21 2011
ARC0: Evaluating archive log 4 thread 1 sequence 18961
Fri Oct 28 08:06:21 2011
Undo Segment 21 Onlined
ARC0: Beginning to archive log 4 thread 1 sequence 18961
Creating archive destination LOG_ARCHIVE_DEST_2: 'maclean_dr'
Fri Oct 28 08:06:21 2011
Undo Segment 22 Onlined
Fri Oct 28 08:06:21 2011
Undo Segment 23 Onlined
Fri Oct 28 08:06:21 2011
Undo Segment 24 Onlined
Fri Oct 28 08:06:21 2011
Undo Segment 25 Onlined
Fri Oct 28 08:06:21 2011
Undo Segment 26 Onlined
Fri Oct 28 08:06:21 2011
Creating archive destination LOG_ARCHIVE_DEST_1: '/u04/app/oracle/admin/maclean/maclean_1_18960.arch'
Fri Oct 28 08:06:21 2011
Creating archive destination LOG_ARCHIVE_DEST_1: '/u04/app/oracle/admin/maclean/maclean_1_18961.arch'
Fri Oct 28 08:10:12 2011
ALTER SYSTEM SET log_archive_dest_1='location=/u24/app/oradata/temp_arch' SCOPE=BOTH;
Fri Oct 28 08:10:51 2011
ORACLE Instance maclean - Can not allocate log, archival required
Fri Oct 28 08:10:51 2011
ARCH: Connecting to console port...
Thread 1 cannot allocate new log, sequence 18963
All online logs needed archiving
Current log# 3 seq# 18962 mem# 0: /u23/app/oradata/maclean/redo3a.log
Current log# 3 seq# 18962 mem# 1: /u24/app/oradata/maclean/redo3b.log
Fri Oct 28 08:12:22 2011
ALTER SYSTEM SET log_archive_dest_1='location=/u04/app/oracle/admin/maclean mandatory' SCOPE=BOTH;
Fri Oct 28 08:27:56 2011
Shutting down instance: further logons disabled
Shutting down instance (immediate)
License high water mark = 546
Fri Oct 28 08:30:55 2011
PMON failed to delete process, see PMON trace file
Fri Oct 28 08:33:11 2011
Active call for process 19878 user 'oracle' program 'oracle@e02k72 (TNS V1-V3)'
SHUTDOWN: waiting for active calls to complete.
Fri Oct 28 08:36:40 2011
Starting ORACLE instance (force)
License high water mark = 546
Instance terminated by USER, pid = 28373

FileName
----------------
alert_maclean.log

 

这个SR提交之后 , 一开始认可能是Solaris上tcp tcp_keepalive_interval 参数的问题, 后来esclated到serverity 1 后Oracle high availability support team接手, 发现是 因为 9i中若 remote archive stuck住 会导致本地归档无法正常,  造成数据库无响应。

 

 

Please refer the below Article which seems to match with your issue :
Doc ID 260040.1 -- Refining Remote Archival Over a Slow Network with the ARCH Process .

Please set the parameter "_LOG_ARCHIVE_CALLOUT" to 'LOCAL_FIRST=TRUE' and also
increase the number of ARCH processes from the default value of 2 to 6.

SQL> alter system set log_archive_max_processes=6 scope=both;

As in Doc ID 260040.1
If the remote archiving goes into unknown state, it can cause primary hang as it doesn't attempts
local archiving first in 9..2.0.5

It can still happen in max performance mode. The only workaround to prevent such problem is to
set _LOG_ARCHIVE_CALLOUT" to 'LOCAL_FIRST=TRUE'

With that parameter, it will do local archiving first before trying remote archiving preventing
any impact of standby setup to primary database.

You mentioned that you did try deferring remote archiving but it didn't help.
If the destination is stuck (in network) already, deferring the destination may not help.
But you rather restart instances.

Action plan suggested).

Please set _LOG_ARCHIVE_CALLOUT" to 'LOCAL_FIRST=TRUE'

If problem happens again, get multiple system state dump with 10046 trace
on alter system switch log file session.

When the parameter "_LOG_ARCHIVE_CALLOUT" is set , 1 ARCH process is reserved for local archiving.
 So even if the network is slow or the Standby is down, the Primary will not be impacted.

Setting the parameter "_LOG_ARCHIVE_CALLOUT" should not cause log shipping issues.

 

 

“_LOG_ARCHIVE_CALLOUT”(Secondly, you can change the above outlined archiver behavior, to make sure the local destination is archived first, causing the OPTIONAL destination to behave really optionally. If the following parameter is set then the ARCH process will begin archiving to the local destination first. Once the redo log has been completely and successfully archived to at least one local destination. )

For 9iR2 (9.2.0.5 and above):

_log_archive_callout=’LOCAL_FIRST=TRUE’

–OR–

alter system set “_LOG_ARCHIVE_CALLOUT”=’LOCAL_FIRST=TRUE’ scope=both;

In Oracle 10g, this parameter is replaced by ‘log_archive_local_first’, which defaults to TRUE.

In Oracle 11g, ‘log_archive_local_first’ has been deprecated.

 

这个隐藏参数可以控制ARCH归档进程优先完成本地归档local archive工作,这样就避免了本地重做日志长期无法归档造成数据库无响应。

 

文档《Refining Remote Archival Over a Slow Network with the ARCH Process》介绍了更多信息:

 

Purpose

When archiving locally and remotely using the ARCH process where the remote destination is across a saturated or slow network you can receive the following errors in the alert log:

ARC0: Evaluating archive   log 2 thread 1 sequence 100

ARC0: Unable to archive log 2 thread 1 sequence 100

      Log actively being archived by another process

If the ARCH process is unable to archive at the rate at which online logs are switched then it is possible for the primary database to suspend while waiting for archiving to complete.  The following discussion describes how this can occur.

Default Behavior for 9iR2 and Below

The ARCH process sits in a very tight loop waiting for an update to the controlfile that states an online log needs to be archived.  Once the update occurs the ARCH process builds a list of archive destinations that need to be serviced.  Once this list is complete, the ARCH process will read a one megabyte chunk of data from the online log that is to be archived.  This one megabyte chunk is then sent to the first destination in the list.  When the write has completed, the same one megabyte chunk is written to the second destination.  This continues until all of the data from the online log being archived has been written to all destinations.  So it can be said that archiving is only as fast as the slowest destination.

A common misconception is that if the LOG_ARCHIVE_DEST_n parameter for a particular destination has the OPTIONAL attribute set, then that destination will not impede local archiving. This is true during error situations while archiving to that destination – e.g. a network disconnect error, but not during an archival over a slow network, which is not an error situation. In error situations, whether the destination is marked OPTIONAL or MANDATORY, Data Guard will close that destination and continue transmitting to all other valid destinations. Transmitting to the closed destination will be attempted again only after the time specified in the REOPEN attribute has expired and a log switch has occurred.  This process will continue for the number of times specified by the MAX_FAILURE attribute. During this time, it is possible that the log writer process recycles through the available online redo log groups and tries to use the online redo log file which has not yet been transmitted successfully to the remote destination. If the destination is marked OPTIONAL, log writer will reuse the online redo log file for the next set of redo. If the destination is marked MANDATORY,  log writer will not be able to reuse that online redo log file, and the primary database will delay processing until that online redo log file has been successfully transmitted to the remote destination.

However, the situation is very different if the transmission is being done over a slow network. In this case, no error is encountered and the destination is not closed. Transmission continues, but is very slow. Ultimately, with the unavailability of any more online redo log groups, Log writer may suspend because the archive process is taking a long time to complete its archival, including local archival.

Refining the Default Behavior

The following underscore parameter was introduced as of 9.2.0.5 to allow the DBA to change this default behavior:

_LOG_ARCHIVE_CALLOUT=’LOCAL_FIRST=TRUE’

This is a dynamic Parameter, so you can set it this Way:

SQL> alter system set “_LOG_ARCHIVE_CALLOUT”=’LOCAL_FIRST=TRUE’ scope=both;

If the above parameter is set then the ARCH process will begin archiving to the local destination first.  Once the redo log has been completely and successfully archived to at least one local destination, it will then be transmitted to the remote destination. This is the default behavior beginning with Oracle Database 10g Release 1.

Starting in 9.2.0.7 patchsets, one ARCH process will begin acting as a ‘dedicated’ archiver, handling only local archival duties. It will not perform remote log shipping or service FAL requests. This is a backport of behavior from 10gR1 to 9iR2.

 

 

设置_LOG_ARCHIVE_CALLOUT=’LOCAL_FIRST=TRUE’  + ’log_archive_max_processes=6′  一般可以解决该因为 归档状态异常而引起的实例无响应。

Oracle数据库升级与补丁

January 21st, 2012 No comments

以下是Maclean.Liu 编写或收集的数据库升级(Upgrade)与补丁(patch fix)方面的知识:

 

甲骨文发布2012 1月数据库安全补丁Critical Patch Update January 2012
Oracle数据库版本10.2实际进入扩展支持Extended Support周期
Patch Set Update and Critical Patch Update October 2011补丁更新发布了
11.2.0.3 Patch Set – List of Bug Fixes by Problem Type
快速升级Oracle 11.2.0.2 RAC到11.2.0.3
Upgrade GI/CRS 11.1.0.7 to 11.2.0.2. Rootupgrade.sh Hanging
Oracle补丁集的补丁号Patch ID/Number速查
Oracle 11gR2发布11.2.0.3 Patchset补丁集-又一重量级更新
Slide:11g新特性-在线实施补丁online patching
Slide:Oracle数据库升级前必要的准备工作
深入了解Oracle数据字典升级脚本catupgrd.sql调用过程
Slide:如何安装Oracle one-off 临时小补丁及注意事项 by Maclean.liu
Slide:了解Oracle critical patch update
Slide:Upgrade 11.2.0.1 GI/CRS to 11.2.0.2 in Linux
Slide:Upgrade 11.2.0.1 RAC DB/RDBMS to 11.2.0.2 in Linux By Maclean
Upgrade 11.2.0.1 DB/RDBMS to 11.2.0.2 in Linux
Upgrade 11.2.0.1 GI/CRS to 11.2.0.2 in Linux
Applying 11G R2 GI PSU 11.2.0.2.3
Critical Patch Update July 2011 Released
了解Oracle Critical Patch Update
了解Oracle数据库的版本号
Pre-check while you are applying one-off patch
Advise on OS patch upgrade with RAC
Oracle Patch Set Update and Critical Patch Update April 2011 Released
11g新特性:Rolling Upgrade With Physical Standby
Patch your 10g Oracle database to PSU 10.2.0.4.5
7月最新发布10.2.0.4.5 Patch Set Update
Oracle Exadata Database Recommended Patch (BP3) for Bug 10387939
Startup Upgrade为我们做了什么?
深入理解Oracle Universal Installer (OUI)
Script to Collect DB Upgrade/Migrate Diagnostic Information (dbupgdiag.sql)
介绍dbms_registry PL/SQL程序包
Applying online patch on 11gr2
January 2011 Patch set Update发布
Advice on upgrading to 11.2.0.2 and converting to RAC
Oracle 9iR2 RAC to Oracle 10gR2 RAC Upgrade
Upgrading to Oracle Database 11g
如何确定所打Patch是否需要停机
Database Upgrade using Transportable Tablespaces
Upgrading to 11g–Why,How and Best Practices
Upgrade from Oracle Database 10g to 11g:What to Expect From the Optimizer
Upgrade to Oracle Database 11g: Single Instance to RAC & ASM
Applying Database PSU 10.2.0.4.6
Oct 12: Patch Set Update Released
Oracle Critical Patch Updates Unwrapped
11.2.0.2补丁集安装体验
Upgrading to RAC 11g R2 What you should know
Oracle 9i/10g/11g数据库升级路线图总览
OPatch工具相关的环境变量
使用OPATCH_DEBUG环境变量调试Opatch工具
opatch java.lang.OutOfMemoryError:Java heap space错误一例