Login Delays Caused by DNS Settings

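On a Linux server running Oracle 11.1.0.7, the OS administrators recently changed the network configuration file /etc/nsswitch.conf. Before the change, the hostname resolution entry was "hosts: files dns"; afterwards it read "hosts: files [NOTFOUND=continue] dns". From then on, every attempt by application staff to log in to the database remotely from that host with "sqlplus username/password@connect_string" was delayed by as much as five minutes. The lsnrctl status command showed that the LISTENER was healthy, so the initial suspicion was that DNS resolution was causing the long delay.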

To find the exact cause of the delay, the first step was to enable client-side Oracle Net tracing by adding the following entries to $ORACLE_HOME/network/admin/sqlnet.ora:

TRACE_LEVEL_CLIENT = 16
TRACE_FILE_CLIENT = client
TRACE_DIRECTORY_CLIENT = [any valid directory path]
TRACE_TIMESTAMP_CLIENT = ON
DIAG_ADR_ENABLED=off

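With these settings in place, the next login attempt writes an Oracle Net trace file to the directory given by TRACE_DIRECTORY_CLIENT. (In 11g, if DIAG_ADR_ENABLED is not switched off, TRACE_DIRECTORY_CLIENT has no effect and the trace is written under the 11g-specific diag directory instead.)
The trace file produced by the test login contained: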

[02-SEP-2010 07:36:57:719] nsc2addr: (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=m218279apss2012-vip)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=MOTOIDP.MOT.COM)(INSTANCE_NAME=MOTOIDP1)(CID=(PROGRAM=sqlplus)(HOST=m218279apss2012.mot.com)(USER=oraoid))))
[02-SEP-2010 07:36:57:719] nttbnd2addr: entry
[02-SEP-2010 07:36:57:719] snlinGetAddrInfo: entry
[02-SEP-2010 07:36:57:719] snlinGetAddrInfo: getaddrinfo() failed with error -2
[02-SEP-2010 07:36:57:719] snlinGetAddrInfo: exit
[02-SEP-2010 07:36:57:719] nttbnd2addr: looking up IP addr for host: m218279apss2012-vip
[02-SEP-2010 07:36:57:719] snlinGetAddrInfo: entry
[02-SEP-2010 07:36:57:719] snlinGetAddrInfo: exit
[02-SEP-2010 07:36:57:719] snlinFreeAddrInfo: entry
[02-SEP-2010 07:36:57:719] snlinFreeAddrInfo: exit
[02-SEP-2010 07:36:57:719] nttbnd2addr: exit
..................
[02-SEP-2010 07:36:57:722] nspsend: packet dump
[02-SEP-2010 07:36:57:722] nspsend: 01 1E 00 00 01 00 00 00  |........|
[02-SEP-2010 07:36:57:722] nspsend: 01 3A 01 2C 0C 41 20 00  |.:.,.A..|
[02-SEP-2010 07:36:57:722] nspsend: 7F FF 7F 08 00 00 01 00  |........|
[02-SEP-2010 07:36:57:722] nspsend: 00 E4 00 3A 00 00 02 00  |...:....|
[02-SEP-2010 07:36:57:722] nspsend: 41 41 00 00 00 00 00 00  |AA......|
[02-SEP-2010 07:36:57:722] nspsend: 00 00 00 00 00 00 00 00  |........|
[02-SEP-2010 07:36:57:722] nspsend: 00 00 00 00 00 00 00 00  |........|
[02-SEP-2010 07:36:57:722] nspsend: 00 00 28 44 45 53 43 52  |..(DESCR|
[02-SEP-2010 07:36:57:722] nspsend: 49 50 54 49 4F 4E 3D 28  |IPTION=(|
[02-SEP-2010 07:36:57:722] nspsend: 41 44 44 52 45 53 53 3D  |ADDRESS=|
[02-SEP-2010 07:36:57:722] nspsend: 28 50 52 4F 54 4F 43 4F  |(PROTOCO|
[02-SEP-2010 07:36:57:722] nspsend: 4C 3D 54 43 50 29 28 48  |L=TCP)(H|
[02-SEP-2010 07:36:57:722] nspsend: 4F 53 54 3D 6D 32 31 38  |OST=m218|
[02-SEP-2010 07:36:57:722] nspsend: 32 37 39 61 70 73 73 32  |279apss2|
[02-SEP-2010 07:36:57:722] nspsend: 30 31 32 2D 76 69 70 29  |012-vip)|
[02-SEP-2010 07:36:57:722] nspsend: 28 50 4F 52 54 3D 31 35  |(PORT=15|
[02-SEP-2010 07:36:57:722] nspsend: 32 31 29 29 28 43 4F 4E  |21))(CON|
[02-SEP-2010 07:36:57:722] nspsend: 4E 45 43 54 5F 44 41 54  |NECT_DAT|
[02-SEP-2010 07:36:57:722] nspsend: 41 3D 28 53 45 52 56 45  |A=(SERVE|
[02-SEP-2010 07:36:57:722] nspsend: 52 3D 44 45 44 49 43 41  |R=DEDICA|
[02-SEP-2010 07:36:57:722] nspsend: 54 45 44 29 28 53 45 52  |TED)(SER|
[02-SEP-2010 07:36:57:722] nspsend: 56 49 43 45 5F 4E 41 4D  |VICE_NAM|
[02-SEP-2010 07:36:57:722] nspsend: 45 3D 4D 4F 54 4F 49 44  |E=MOTOID|
[02-SEP-2010 07:36:57:722] nspsend: 50 2E 4D 4F 54 2E 43 4F  |P.MOT.CO|
[02-SEP-2010 07:36:57:722] nspsend: 4D 29 28 49 4E 53 54 41  |M)(INSTA|
[02-SEP-2010 07:36:57:722] nspsend: 4E 43 45 5F 4E 41 4D 45  |NCE_NAME|
[02-SEP-2010 07:36:57:722] nspsend: 3D 4D 4F 54 4F 49 44 50  |=MOTOIDP|
[02-SEP-2010 07:36:57:722] nspsend: 31 29 28 43 49 44 3D 28  |1)(CID=(|
[02-SEP-2010 07:36:57:722] nspsend: 50 52 4F 47 52 41 4D 3D  |PROGRAM=|
[02-SEP-2010 07:36:57:722] nspsend: 73 71 6C 70 6C 75 73 29  |sqlplus)|
[02-SEP-2010 07:36:57:722] nspsend: 28 48 4F 53 54 3D 6D 32  |(HOST=m2|
[02-SEP-2010 07:36:57:722] nspsend: 31 38 32 37 39 61 70 73  |18279aps|
[02-SEP-2010 07:36:57:722] nspsend: 73 32 30 31 32 2E 6D 6F  |s2012.mo|
[02-SEP-2010 07:36:57:722] nspsend: 74 2E 63 6F 6D 29 28 55  |t.com)(U|
[02-SEP-2010 07:36:57:722] nspsend: 53 45 52 3D 6F 72 61 6F  |SER=orao|
[02-SEP-2010 07:36:57:722] nspsend: 69 64 29 29 29 29        |id))))  |
[02-SEP-2010 07:36:57:722] nspsend: 286 bytes to transport
[02-SEP-2010 07:36:57:722] nspsend: normal exit
[02-SEP-2010 07:36:57:722] nscon: exit (0)
[02-SEP-2010 07:36:57:722] nsdo: nsctxrnk=0
[02-SEP-2010 07:36:57:722] nsdo: normal exit
[02-SEP-2010 07:36:57:722] nsdo: entry
[02-SEP-2010 07:36:57:722] nsdo: cid=0, opcode=68, *bl=512, *what=9, uflgs=0x0, cflgs=0x3
[02-SEP-2010 07:36:57:722] nsdo: rank=64, nsctxrnk=0
[02-SEP-2010 07:36:57:722] nsdo: nsctx: state=2, flg=0x4005, mvd=0
[02-SEP-2010 07:36:57:722] nsdo: gtn=10, gtc=10, ptn=10, ptc=8155
[02-SEP-2010 07:36:57:722] nscon: entry
[02-SEP-2010 07:36:57:722] nscon: recving a packet
[02-SEP-2010 07:36:57:722] nsprecv: entry
[02-SEP-2010 07:36:57:722] nsprecv: reading from transport...
[02-SEP-2010 07:36:57:722] nttrd: entry
[02-SEP-2010 07:41:57:741] nttrd: socket 9 had bytes read=8
[02-SEP-2010 07:41:57:741] nttrd: exit
[02-SEP-2010 07:41:57:741] nsprecv: 8 bytes from transport
[02-SEP-2010 07:41:57:741] nsprecv: tlen=8, plen=8, type=11
[02-SEP-2010 07:41:57:741] nsprecv: packet dump
[02-SEP-2010 07:41:57:741] nsprecv: 00 08 00 00 0B 00 00 00  |........|
[02-SEP-2010 07:41:57:741] nsprecv: normal exit
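
The stall is easier to locate by scanning the trace for large jumps between consecutive timestamps. The Python sketch below is only an illustration: the helper names (parse_line, find_gaps) and the one-second threshold are assumptions and not part of any Oracle tooling, and it assumes the timestamp layout shown in the trace above.

```python
import re
from datetime import datetime, timedelta

# Month abbreviations as they appear in the trace (avoids locale-dependent parsing).
MONTHS = {name: num for num, name in enumerate(
    "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split(), start=1)}

# Matches lines such as: [02-SEP-2010 07:36:57:722] nttrd: entry
STAMP = re.compile(
    r"^\[(\d{2})-([A-Z]{3})-(\d{4}) (\d{2}):(\d{2}):(\d{2}):(\d{3})\] (.*)$")


def parse_line(line):
    """Return (timestamp, message) for a trace line, or None if it has no timestamp."""
    match = STAMP.match(line.strip())
    if match is None:
        return None
    day, mon, year, hh, mm, ss, ms, text = match.groups()
    stamp = datetime(int(year), MONTHS[mon], int(day),
                     int(hh), int(mm), int(ss), int(ms) * 1000)
    return stamp, text


def find_gaps(lines, threshold=timedelta(seconds=1)):
    """Return (gap, message_before, message_after) for every jump >= threshold."""
    gaps = []
    previous = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        if previous is not None and parsed[0] - previous[0] >= threshold:
            gaps.append((parsed[0] - previous[0], previous[1], parsed[1]))
        previous = parsed
    return gaps


# Three consecutive lines copied from the trace above.
sample = [
    "[02-SEP-2010 07:36:57:722] nsprecv: reading from transport...",
    "[02-SEP-2010 07:36:57:722] nttrd: entry",
    "[02-SEP-2010 07:41:57:741] nttrd: socket 9 had bytes read=8",
]

for gap, before, after in find_gaps(sample):
    print(f"{gap.total_seconds():.3f}s between '{before}' and '{after}'")
```

On these three lines the sketch reports a gap of about 300 seconds between "nttrd: entry" and the first bytes read from the socket, which is where the client sits blocked for the five minutes. To scan a whole trace, pass an open file object instead of the sample list.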

The trace contains the record "snlinGetAddrInfo: getaddrinfo() failed with error -2". Searching MOS for the keyword "snlinGetAddrInfo" turns up the following Note:

Remote Connections Take Very Long to Establish

Applies to:

Oracle Net Services – Version: 11.1.0.6 to 11.1.0.7 – Release: 11.1 to 11.1
Information in this document applies to any platform.

Symptoms

When you connect remotely, it takes very long until the connection is established. Once connected, everything works fine.

Local bequeath connections work in a timely manner.

Resolution of hostnames is done via a centralized DNS server.

Cause

Listener traces show the listener is wasting time while calling the “snlinGetAddrInfo” function :

2009-07-08 13:58:35.135311 : nttcnp:exit
2009-07-08 13:58:35.135327 : snlinGetAddrInfo:entry
2009-07-08 13:58:55.135643 : snlinGetAddrInfo:getaddrinfo() failed with error -5
2009-07-08 13:58:55.135703 : snlinGetAddrInfo:exit

Solution

We need to make sure that the DNS server(s) configured are reachable.

The “snlinGetAddrInfo” function is a TCP layer function which Oracle uses (since version 11g) for hostname-to-IP mappings.

When the DNS server is unreachable, the listener will wait for some time until it will time out and fail over to the next method configured for resolution of hostnames (usually the local “hosts” file).

Note : On UNIX systems, the order of methods used for resolution of hostnames can be specified via the /etc/nsswitch.conf and /etc/host.conf files :

on Linux systems, you can specify the order by setting “hosts” to “files” and / or “dns”

Example for /etc/nsswitch.conf :
hosts: files [NOTFOUND=continue] dns

on HP-UX and Solaris systems, you need to specify the order using the “ipnodes” keyword

Example for /etc/nsswitch.conf :
ipnodes: files [NOTFOUND=continue] dns

Note : Since 11.2, this behaviour has changed and naming lookup is no longer performed at this stage (the change was addressed through unpublished Bug 9593134).

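This note explains that the snlinGetAddrInfo function maps hostnames to IP addresses, and recommends setting "hosts: files [NOTFOUND=continue] dns" in /etc/nsswitch.conf, which is exactly what we had done! Another note [ID 803838.1] says that on other UNIX platforms the delay can be avoided with "ipnodes: files [NOTFOUND=continue] dns", but Linux has no ipnodes entry.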
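
To check outside Oracle whether name resolution itself is the slow step, one option is to time a getaddrinfo() call directly. The following Python sketch is an assumption-laden illustration rather than anything from the MOS notes: time_lookup is a hypothetical helper, and "localhost" and port 1521 are placeholder arguments to be replaced with the HOST and PORT from the connect descriptor. Python's socket.getaddrinfo() goes through the C library resolver, so it should follow the same /etc/nsswitch.conf order as other getaddrinfo() callers on the host. On Linux with glibc, the error code -2 seen in the trace normally corresponds to EAI_NONAME (name or service not known).

```python
import socket
import time


def time_lookup(host, port=1521):
    """Time one getaddrinfo() call; return (elapsed_seconds, outcome)."""
    start = time.monotonic()
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        # Each entry is (family, type, proto, canonname, sockaddr);
        # sockaddr[0] is the resolved address.
        outcome = sorted({info[4][0] for info in infos})
    except socket.gaierror as exc:
        # exc.args is (EAI_* code, message)
        outcome = f"getaddrinfo failed with error {exc.args[0]}: {exc.args[1]}"
    return time.monotonic() - start, outcome


if __name__ == "__main__":
    # Placeholder host: substitute the hostname from tnsnames.ora.
    elapsed, outcome = time_lookup("localhost")
    print(f"{elapsed:.3f}s -> {outcome}")
```

A lookup that takes seconds here, before any Oracle code is involved, points at the resolver configuration rather than at the listener or the database.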

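Hostname resolution in 11g has become rather temperamental. This case was finally resolved by removing the dns option from the hosts entry in nsswitch.conf. For Oracle, plain file-based resolution still seems to be the safest choice, no matter how new the version.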

Posted

September 6, 2010

mac

Comments

5 responses to "Login Delays Caused by DNS Settings"

admin

September 6, 2010

11g Network Layer Does Not Use /etc/hosts on UNIX

Applies to:

Oracle Net Services – Version: 11.1.0.6 to 11.1.0.7
Generic UNIX

Symptoms

After upgrading to 11g, Oracle functionality bypasses the /etc/hosts file when resolving hostnames to IPs and instead queries the DNS server. This can introduce some delay in establishing a connection to a remote host compared with 10g.

SQL*Plus and tnsping show this behavior but this delay can appear also when opening a dblink or anytime tcpip connections are established by the Oracle Network layer.

This happens even if Name Server Switch configuration (nsswitch) specifies the hosts file prior to dns lookups:

/etc/nsswitch.conf
hosts: files [NOTFOUND=continue] dns

or only:
hosts: files

If we obtain truss/tusc trace for a SQL*Plus connection we find the following sequence of OS system calls:

On 10g after reading the nsswitch.conf file, library “libnss_files.so” is loaded then /etc/hosts is read and the socket is opened:

open("/etc/nsswitch.conf", O_RDONLY|0x800, 0666) = 5
ioctl(5, TCGETA, 0x9fffffffffffaca0) ERR#25 ENOTTY
read(5, "# \n# / e t c / n s s w i t c ".., 8192) = 92
read(5, 0x60000000001e6078, 8192) = 0
close(5) = 0
open("/usr/lib/hpux64/libnss_files.so.1", O_RDONLY|0x800, 0) = 5
fstat(5, 0x9fffffffffffa720) = 0
pread(5, "7fE L F 0202010101\0\0\0\0\0\0\0".., 1024, 0) = 1024
stat("/usr/lib/hpux64/dpd", 0x9fffffffffff9cd0) = 0
open("/usr/lib/hpux64/dpd/libnss_files.so.1.bpd", O_RDONLY|0x800, 0) ERR#2 ENOENT
getuid() = 305 (305)
getgid() = 303 (303)
mmap(NULL, 85872, PROT_READ|PROT_EXEC, MAP_SHARED|MAP_SHLIB, 5, 0) = 0xc0000000008d8000
mmap(NULL, 3159, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_SHLIB, 5, 131072) = 0x9fffffffbf63d000
close(5) = 0
getuid() = 305 (305)
getgid() = 303 (303)
open("/etc/hosts", O_RDONLY|0x800, 0666) = 5
ioctl(5, TCGETA, 0x9fffffffffffac30) ERR#25 ENOTTY
read(5, "# / e t c / h o s t s \n# \n# ".., 8192) = 8192
lseek(5, 18446744073709549410, SEEK_CUR) = 5986
close(5) = 0
socket(AF_INET, SOCK_STREAM, 0) = 5
connect(5, 0x60000000001f0680, 16) = 0
getsockname(5, 0x9fffffffffffb750, 0x9fffffffffffb740) = 0
getsockopt(5, SOL_SOCKET, SO_SNDBUF, 0x9fffffffffffb890, 0x9fffffffffffb894) = 0
getsockopt(5, SOL_SOCKET, SO_RCVBUF, 0x9fffffffffffb890, 0x9fffffffffffb894) = 0
setsockopt(5, 0x6, TCP_NODELAY, 0x9fffffffffffb89c, 4) = 0

But on 11g, after reading nsswitch.conf, library "libnss_dns.so" is loaded, then /etc/resolv.conf (which specifies available domain name servers) is read, and much later a socket for TCP/IP (SOCK_STREAM) is opened:

open("/etc/nsswitch.conf", O_RDONLY|0x800, 0666) = 5
ioctl(5, TCGETA, 0x9fffffffffffa360) ERR#25 ENOTTY
read(5, "# \n# / e t c / n s s w i t c ".., 8192) = 92
read(5, 0x60000000001c9058, 8192) = 0
close(5) = 0
open("/usr/lib/hpux64/libnss_dns.so.1", O_RDONLY|0x800, 0) = 5
fstat(5, 0x9fffffffffff9de0) = 0
pread(5, "7fE L F 0202010101\0\0\0\0\0\0\0".., 1024, 0) = 1024
stat("/usr/lib/hpux64/dpd", 0x9fffffffffff9390) = 0
open("/usr/lib/hpux64/dpd/libnss_dns.so.1.bpd", O_RDONLY|0x800, 0) ERR#2 ENOENT
getuid() = 305 (305)
getgid() = 303 (303)
mmap(NULL, 49440, PROT_READ|PROT_EXEC, MAP_SHARED|MAP_SHLIB, 5, 0) = 0xc00000000b054000
mmap(NULL, 800, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_SHLIB, 5, 65536) = 0x9fffffffbf68d000
close(5) = 0
getuid() = 305 (305)
getgid() = 303 (303)
getuid() = 305 (305)
getgid() = 303 (303)
open("/test/local/oracle/11.1.0.7/lib/libdl.so.1", O_RDONLY|0x800, 0) ERR#2 ENOENT
open("/test/local/oracle/11.1.0.7/lib32/libdl.so.1", O_RDONLY|0x800, 0) ERR#2 ENOENT
getuid() = 305 (305)
getgid() = 303 (303)
open("/usr/lib/hpux64/libdl.so.1", O_RDONLY|0x800, 0) = 5
fstat(5, 0x9fffffffffff9cf0) = 0
read(5, "7fE L F 0202010101\0\0\0\0\0\0\0".., 64) = 64
close(5) = 0
socket(AF_INET, SOCK_DGRAM, 0) = 5
ioctl(5, SIOCGIFNUM, 0x9fffffffffff9680) = 0
ioctl(5, SIOCGIFCONF, 0x9fffffffffff9690) = 0
socket(AF_INET6, SOCK_DGRAM, 0) = 6
ioctl(6, SIOCGLIFNUM, 0x9fffffffffff9684) = 0
ioctl(6, SIOCGLIFCONF, 0x9fffffffffff96a0) = 0
ioctl(5, SIOCGIFFLAGS, 0x9fffffffffff96b0) = 0
close(5) = 0
close(6) = 0
gettimeofday(0x9fffffffffff7dd0, NULL) = 0
getpid() = 22968 (22967)
open("/etc/resolv.conf", O_RDONLY|0x800, 0666) = 5
ioctl(5, TCGETA, 0x9fffffffffff7da0) ERR#25 ENOTTY
read(5, "d o m a i n t e s t . c o m \n".., 8192) = 453
read(5, 0x60000000001dddf8, 8192) = 0
close(5) = 0
…………….

………….

socket(AF_INET, SOCK_STREAM, 0) = 5
connect(5, 0x60000000001eba50, 16) = 0
getsockname(5, 0x9fffffffffff9da0, 0x9fffffffffff94c0) = 0
getsockopt(5, SOL_SOCKET, SO_SNDBUF, 0x9fffffffffffa000, 0x9fffffffffffa004) = 0
getsockopt(5, SOL_SOCKET, SO_RCVBUF, 0x9fffffffffffa000, 0x9fffffffffffa004) = 0
setsockopt(5, 0x6, TCP_NODELAY, 0x9fffffffffffa00c, 4) = 0

Changes

Nothing was changed in the configuration of the OS, only the upgrade from Oracle 10g to 11g was done.

Cause

What was changed between the two is the way Oracle resolves hostnames to IPs, more specifically the system call used to do that.

Oracle 11g now uses getaddrinfo(), while 10g used gethostbyname().

These system functions require different configuration in /etc/nsswitch.conf.

gethostbyname() requires the keyword "hosts", while getaddrinfo() requires the keyword "ipnodes".

Notes:
Even though this has only been observed on HP-UX and Solaris, this may be UNIX generic.
With Solaris, ipnodes has a different meaning (specify a file for IPV6 addresses resolution, gethostbyname and getaddrinfo both use hosts or ipnodes file).
Linux on the other hand does not use ipnodes in nsswitch.conf

Solution

Add a line in the /etc/nsswitch.conf file similar to the following:

ipnodes: files [NOTFOUND=continue] dns

The line starting with keyword “hosts” must not be deleted.

This way calls made by getaddrinfo() will search first in /etc/hosts then, if the name is not found, will contact the dns server.
Thus there will be no connection delay for any lookup of host names existing in the local /etc/hosts file.
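
Before and after editing /etc/nsswitch.conf, it can help to confirm which sources each database entry actually lists. The Python sketch below is a minimal illustration under stated assumptions: lookup_order is a hypothetical helper, and the sample text is made up to mirror the lines quoted in this note. It only reads the file syntax; it does not show how the resolver behaves at run time.

```python
import re


def lookup_order(nsswitch_text, database):
    """Return the source and action tokens configured for one database, or None."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line or ":" not in line:
            continue
        name, _, rest = line.partition(":")
        if name.strip() == database:
            # Keep a bracketed action such as [NOTFOUND=continue] as one token.
            return re.findall(r"\[[^\]]*\]|\S+", rest)
    return None


# Made-up sample mirroring the entries discussed above.
sample = """\
# /etc/nsswitch.conf (sample)
hosts:   files [NOTFOUND=continue] dns
ipnodes: files [NOTFOUND=continue] dns
"""

for db in ("hosts", "ipnodes"):
    print(db, "->", lookup_order(sample, db))
```

A result of None for "ipnodes" on HP-UX or Solaris would mean the entry this note asks for is missing. To inspect a real host, read /etc/nsswitch.conf and pass its contents as nsswitch_text.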

admin

September 6, 2010

Sqlnet connection via TCP Hangs for about 10seconds Before Connection is Established

Applies to:

Oracle Server – Enterprise Edition – Version: 10.1.0.2 to 11.2.0.1.0 – Release: 10.1 to 11.2
Information in this document applies to any platform.

Symptoms

10 nodes RAC cluster, sqlnet connection via TCP consistently takes 10 seconds before the connection with the server is finally established.

$date;sqlplus apps/xxx@dwnodea @time.sql
Tue Feb 2 17:00:39 CST 2010

SQL*Plus: Release 11.1.0.7.0 – Production on Tue Feb 2 17:00:39 2010

Copyright (c) 1982, 2008, Oracle. All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 – 64bit Production
With the Partitioning, Real Application Clusters, OLAP and Real Application Testing options

TO_CHAR(SYSDATE,'YYYY-MM-DDHH24:MI:SS')
————————————————————————————————————————
2010-02-02 17:00:49

Sqlnet tracing (both server and client) against sqlplus connection with level 16 shows:

Server side trace:
2010-02-02 17:00:39.432277 : nttcnp:exit
2010-02-02 17:00:39.432308 : snlinGetAddrInfo:entry

*** 2010-02-02 17:00:49.440
2010-02-02 17:00:49.440655 : snlinGetAddrInfo:getaddrinfo() failed with error -2
2010-02-02 17:00:49.440736 : snlinGetAddrInfo:exit
2010-02-02 17:00:49.440751 : nttcon:exit
2010-02-02 17:00:49.440767 : nsopen:transport is open
2010-02-02 17:00:49.440785 : nsoptions:entry
2010-02-02 17:00:49.440802 : nsoptions:lcl[0]=0x0, lcl[1]=0x0, gbl[0]=0x0, gbl[1]=0x0, cha=0x0

Client side trace:
2010-02-02 17:00:39.418598 : nscon:recving a packet
2010-02-02 17:00:39.418714 : nsprecv:entry
2010-02-02 17:00:39.418833 : nsprecv:reading from transport…
2010-02-02 17:00:39.418949 : nttrd:entry
2010-02-02 17:00:49.444643 : nttrd:socket 12 had bytes read=8
2010-02-02 17:00:49.444832 : nttrd:exit

Changes

This is a new installation.

Cause

This issue is caused by invalid nameserver entry in /etc/resolv.conf.

Examining the system configuration files shows:

resolv.conf
——–
nameserver 192.168.1.20
nameserver 192.168.4.12

nsswitch.conf
———-
#hosts: db files ldap nis dns
hosts: files dns

These nameservers are incorrect for the customer's configuration; they do not exist. Once they are removed, there is no longer any delay in the sqlnet connection.

Solution

1. Remove incorrect setting for nameserver in /etc/resolv.conf if DNS is not in use.
or
2. Consult with the network administrator to provide correct nameserver IP address if DNS is in use.
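
A quick way to spot stale nameserver entries is to list them from /etc/resolv.conf and probe each one. The Python sketch below is a rough illustration with assumptions throughout: nameservers and answers_on_tcp53 are hypothetical helper names, and the TCP probe on port 53 is only a coarse check, since a server that answers DNS over UDP only, or a firewall that filters TCP, would be reported as unreachable even though lookups work.

```python
import socket


def nameservers(resolv_text):
    """Return the addresses listed on 'nameserver' lines of resolv.conf text."""
    servers = []
    for raw in resolv_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):  # skip comments
            continue
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def answers_on_tcp53(address, timeout=2.0):
    """Best-effort probe: True if a TCP connection to port 53 succeeds in time."""
    try:
        with socket.create_connection((address, 53), timeout=timeout):
            return True
    except OSError:
        return False


# The two entries quoted in the note above.
sample = """\
nameserver 192.168.1.20
nameserver 192.168.4.12
"""

print(nameservers(sample))
```

On the affected host, calling answers_on_tcp53() for each address returned by nameservers() gives a first hint as to which entries should be removed or corrected.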

admin

September 6, 2010

=========================================================================
Interim Patch for Base Bug: 9593134
=========================================================================

Date: Tue Aug 10 18:18:48 2010
————————————————————————-
Platform Patch for : Linux-x86
Product Patched : SQLNET
Product Version # : 11.1.0.7.0
RAC Rolling Installable : YES

Bugs Fixed by this patch:
————————-
9593134: DNS OR NIS MIS-CONFIGURATION CAN CAUSE SLOW DATABASE CONNECTS

Hdr: 8307164 11.1.0.7 NET 11.1.0.7 PRODID-115 PORTID-197
Abstract: TNSPING 11G USING DNS AND NOT HOSTS FILE

*** 03/04/09 05:24 am ***
TAR:
—-

Problem:
——–
Customer has 2 oracle homes on the same HP-UX server,
one 10g and one 11.1.0.7.
He executes tnsping in both versions and traces each run with truss -o out
tnsping.
In 10g truss shows that the hostname from the connect descriptor is resolved
using /etc/hosts file but in 11g the same hostname is resolved using dns.
The same behavior is observed for sqlplus.
Because of this, connections to the database are slow because of the slow
hostname resolution using the dns.

Versions:
———

Diagnostic Analysis:
——————–
nsswitch.conf
==============
hosts: files [NOTFOUND=continue] dns

Reproducibility:
—————-
on customer site, also I observed same behavior on
internal HP-UX server with 11g database

Test Case:
———-

Workaround:
———–
none

maclean

May 22, 2011

Hdr: 5047830 9.2.0.6 NET 9.2.0.6 PRODID-115 PORTID-59 2728394
Abstract: TNSPING HANGS AND CONSUMES MEMORY UUNTIL THE SERVER LOCKS UP

Problem:
——–
Problem Description
~~~~~~~~~~~~~~~~~~~~
Firstly, the problem with tnsping hanging when locating a bad tnsnames
alias was reported in Bug2728394 “TNSPING HANGS AND DOES NOT RETURN ERROR
WHEN SERVICE NAME IS WRONG”
This was shown for all 9.2.0.x versions.

HOWEVER, the customer has responded saying that the “workaround” of making
sure the tnsnames.ora aliases are not corrupt is unacceptable for the
following severe reason:

If a tnsping is inadvertently issued on a production server against a "bad"
tnsnames.ora alias, the tnsping command hangs, BUT memory resources leak
and eventually, after a few minutes (depending on the resources available)
the OS runs out of memory and causes a production outage.
The solution is to kill any hanging tnsping attempt but this could happen
without the DBA’s knowledge until the situation is severe.

Versions:
———
SQL*Net / Net Services 9.2.0.6 and 9.2.0.7
Tested against HP-UX 11 (customer’s platform) = 9.2.0.6
Tested against Solaris 2.9 = 9.2.0.7.
Same results each time.

Diagnostic Analysis:
——————–
1. Manually create a “bad” tnsnames.ora alias.
2. Run tnsping against the alias.
3. Run ‘top’ or some other tool to check memory leak.
4. Kill the hung process

Reproducibility:
—————-
Everytime in-house and customer’s site.

Test Case:
———-
Testcase uploaded as testcase.txt

Workaround:
———–
Kill the hanging tnsping process.
However, not easy if there is no knowledge of the tnsping that was run.

Lunar

September 17, 2012

Thanks, ML. I learned a lot from this; very informative.
