admin
2010-09-06
<p><strong>11g Network Layer Does Not Use /etc/hosts on UNIX</strong></p>
<h2>Applies to: </h2>
Oracle Net Services - Version: 11.1.0.6 to 11.1.0.7<br>
Generic UNIX<br>
<h2><a name="SYMPTOM" rel="nofollow"></a>Symptoms</h2>
<p>After upgrading to 11g, Oracle bypasses the /etc/hosts file when resolving hostnames to IP addresses and queries the DNS server instead. This can delay establishing a connection to a remote host compared with 10g.</p>
<p>SQL*Plus and tnsping show this behavior, but the delay can also appear when opening a database link or whenever the Oracle Network layer establishes a TCP/IP connection.</p>
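<p>To put a number on the delay outside of Oracle, a single name lookup can be timed. The following is a minimal sketch, not part of the original note: the function name <code>time_lookup</code> and the port 1521 are illustrative assumptions. Python's <code>socket.getaddrinfo</code> wraps the C library <code>getaddrinfo()</code> call, so it should exercise the same resolver configuration as an 11g client on the same host.</p>

```python
import socket
import time


def time_lookup(hostname, resolver=socket.getaddrinfo, port=1521):
    """Return (elapsed_seconds, result) for one name lookup.

    The resolver is injectable so the timing logic can be exercised
    without touching the network. Port 1521 is only a placeholder
    (assumption); it does not affect how the name is resolved.
    """
    start = time.perf_counter()
    try:
        result = resolver(hostname, port)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        result = exc
    elapsed = time.perf_counter() - start
    return elapsed, result
```

<p>Calling <code>time_lookup("myhost")</code> with a hostname that exists in /etc/hosts (here <code>myhost</code> is a stand-in) before and after a configuration change gives a rough comparison of lookup latency.</p>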
<p>The DNS lookup occurs even when the Name Service Switch configuration (nsswitch) lists the hosts file ahead of DNS:</p>
<div>/etc/nsswitch.conf <br>
hosts: files [NOTFOUND=continue] dns <br>
or only: <br>
hosts: files </div>
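<p>To see which sources a given nsswitch.conf line actually lists, the file can be read programmatically. This is a rough sketch that assumes a simplified reading of the file format (one database per line, <code>#</code> comments, bracketed criteria ignored); the function name <code>lookup_sources</code> is hypothetical.</p>

```python
import re


def lookup_sources(nsswitch_text, database="hosts"):
    """Return the ordered sources for one database, or None if absent.

    Bracketed criteria such as [NOTFOUND=continue] are removed so that
    only the source names (files, dns, ...) remain.
    """
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        name, _, rest = line.partition(":")
        if name.strip() != database:
            continue
        rest = re.sub(r"\[[^\]]*\]", " ", rest)  # strip [criteria]
        return rest.split()
    return None


sample = "hosts: files [NOTFOUND=continue] dns\n"
print(lookup_sources(sample))             # ['files', 'dns']
print(lookup_sources(sample, "ipnodes"))  # None
```

<p>A result of <code>None</code> for a database means the file has no entry for it at all, which is different from an entry that lists only <code>files</code>.</p>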
<p>A truss/tusc trace of a SQL*Plus connection shows the following sequence of OS system calls:</p>
<p>On 10g, after the nsswitch.conf file is read, the library "<strong>libnss_files.so</strong>" is loaded, then /etc/hosts is read and the socket is opened:</p>
<div>
<p><strong><em>open("/etc/nsswitch.conf", O_RDONLY|0x800, 0666) = 5</em></strong> <br>
ioctl(5, TCGETA, 0x9fffffffffffaca0) ERR#25 ENOTTY <br>
read(5, "# \n# / e t c / n s s w i t c ".., 8192) = 92 <br>
read(5, 0x60000000001e6078, 8192) = 0 <br>
close(5) = 0 <br>
<strong><em>open("/usr/lib/hpux64/libnss_files.so.1", O_RDONLY|0x800, 0) = 5</em></strong> <br>
fstat(5, 0x9fffffffffffa720) = 0 <br>
pread(5, "7fE L F 0202010101\0\0\0\0\0\0\0".., 1024, 0) = 1024 <br>
stat("/usr/lib/hpux64/dpd", 0x9fffffffffff9cd0) = 0 <br>
open("/usr/lib/hpux64/dpd/libnss_files.so.1.bpd", O_RDONLY|0x800, 0) ERR#2 ENOENT <br>
getuid() = 305 (305) <br>
getgid() = 303 (303) <br>
mmap(NULL, 85872, PROT_READ|PROT_EXEC, MAP_SHARED|MAP_SHLIB, 5, 0) = 0xc0000000008d8000 <br>
mmap(NULL, 3159, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_SHLIB, 5, 131072) = 0x9fffffffbf63d000 <br>
close(5) = 0 <br>
getuid() = 305 (305) <br>
getgid() = 303 (303) <br>
<strong><em>open("/etc/hosts", O_RDONLY|0x800, 0666) = 5</em></strong> <br>
ioctl(5, TCGETA, 0x9fffffffffffac30) ERR#25 ENOTTY <br>
read(5, "# / e t c / h o s t s \n# \n# ".., 8192) = 8192 <br>
lseek(5, 18446744073709549410, SEEK_CUR) = 5986 <br>
close(5) = 0 <br>
<strong><em>socket(AF_INET, SOCK_STREAM, 0) = 5 <br>
</em></strong>connect(5, 0x60000000001f0680, 16) = 0 <br>
getsockname(5, 0x9fffffffffffb750, 0x9fffffffffffb740) = 0 <br>
getsockopt(5, SOL_SOCKET, SO_SNDBUF, 0x9fffffffffffb890, 0x9fffffffffffb894) = 0 <br>
getsockopt(5, SOL_SOCKET, SO_RCVBUF, 0x9fffffffffffb890, 0x9fffffffffffb894) = 0 <br>
setsockopt(5, 0x6, TCP_NODELAY, 0x9fffffffffffb89c, 4) = 0</p>
</div>
<p> </p>
<p>On 11g, however, after the nsswitch.conf file is read, the library "<strong>libnss_dns.so</strong>" is loaded, then /etc/resolv.conf (which specifies the available domain name servers) is read, and only much later is a TCP/IP socket (SOCK_STREAM) opened:</p>
<div>
<p><strong><em>open("/etc/nsswitch.conf", O_RDONLY|0x800, 0666) = 5</em> <br>
</strong>ioctl(5, TCGETA, 0x9fffffffffffa360) ERR#25 ENOTTY <br>
read(5, "# \n# / e t c / n s s w i t c ".., 8192) = 92 <br>
read(5, 0x60000000001c9058, 8192) = 0 <br>
close(5) = 0 <br>
<strong><em>open("/usr/lib/hpux64/libnss_dns.so.1", O_RDONLY|0x800, 0) = 5</em></strong> <br>
fstat(5, 0x9fffffffffff9de0) = 0 <br>
pread(5, "7fE L F 0202010101\0\0\0\0\0\0\0".., 1024, 0) = 1024 <br>
stat("/usr/lib/hpux64/dpd", 0x9fffffffffff9390) = 0 <br>
open("/usr/lib/hpux64/dpd/libnss_dns.so.1.bpd", O_RDONLY|0x800, 0) ERR#2 ENOENT <br>
getuid() = 305 (305) <br>
getgid() = 303 (303) <br>
mmap(NULL, 49440, PROT_READ|PROT_EXEC, MAP_SHARED|MAP_SHLIB, 5, 0) = 0xc00000000b054000 <br>
mmap(NULL, 800, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_SHLIB, 5, 65536) = 0x9fffffffbf68d000 <br>
close(5) = 0 <br>
getuid() = 305 (305) <br>
getgid() = 303 (303) <br>
getuid() = 305 (305) <br>
getgid() = 303 (303) <br>
open("/test/local/oracle/11.1.0.7/lib/libdl.so.1", O_RDONLY|0x800, 0) ERR#2 ENOENT <br>
open("/test/local/oracle/11.1.0.7/lib32/libdl.so.1", O_RDONLY|0x800, 0) ERR#2 ENOENT <br>
getuid() = 305 (305) <br>
getgid() = 303 (303) <br>
open("/usr/lib/hpux64/libdl.so.1", O_RDONLY|0x800, 0) = 5 <br>
fstat(5, 0x9fffffffffff9cf0) = 0 <br>
read(5, "7fE L F 0202010101\0\0\0\0\0\0\0".., 64) = 64 <br>
close(5) = 0 <br>
socket(AF_INET, SOCK_DGRAM, 0) = 5 <br>
ioctl(5, SIOCGIFNUM, 0x9fffffffffff9680) = 0 <br>
ioctl(5, SIOCGIFCONF, 0x9fffffffffff9690) = 0 <br>
socket(AF_INET6, SOCK_DGRAM, 0) = 6 <br>
ioctl(6, SIOCGLIFNUM, 0x9fffffffffff9684) = 0 <br>
ioctl(6, SIOCGLIFCONF, 0x9fffffffffff96a0) = 0 <br>
ioctl(5, SIOCGIFFLAGS, 0x9fffffffffff96b0) = 0 <br>
close(5) = 0 <br>
close(6) = 0 <br>
gettimeofday(0x9fffffffffff7dd0, NULL) = 0 <br>
getpid() = 22968 (22967) <br>
<strong><em>open("/etc/resolv.conf", O_RDONLY|0x800, 0666) = 5</em></strong> <br>
ioctl(5, TCGETA, 0x9fffffffffff7da0) ERR#25 ENOTTY <br>
read(5, "d o m a i n t e s t . c o m \n".., 8192) = 453 <br>
read(5, 0x60000000001dddf8, 8192) = 0 <br>
close(5) = 0 <br>
................</p>
<p>.............</p>
<p><strong><em>socket(AF_INET, SOCK_STREAM, 0) = 5</em></strong> <br>
connect(5, 0x60000000001eba50, 16) = 0 <br>
getsockname(5, 0x9fffffffffff9da0, 0x9fffffffffff94c0) = 0 <br>
getsockopt(5, SOL_SOCKET, SO_SNDBUF, 0x9fffffffffffa000, 0x9fffffffffffa004) = 0 <br>
getsockopt(5, SOL_SOCKET, SO_RCVBUF, 0x9fffffffffffa000, 0x9fffffffffffa004) = 0 <br>
setsockopt(5, 0x6, TCP_NODELAY, 0x9fffffffffffa00c, 4) = 0 <br>
</p>
</div>
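<p>When comparing long traces, it can help to extract only the NSS backend libraries that were loaded successfully. The sketch below assumes the truss/tusc output format shown in the two traces above; the names <code>NSS_OPEN</code> and <code>nss_backends</code> are illustrative.</p>

```python
import re

# Matches successful open() calls on an NSS backend library, e.g.
#   open("/usr/lib/hpux64/libnss_dns.so.1", O_RDONLY|0x800, 0) = 5
# Failed opens (ERR#2 ENOENT) have no "= <fd>" and are skipped.
NSS_OPEN = re.compile(
    r'open\("[^"]*/libnss_(\w+)\.so[^"]*",[^)]*\)\s*=\s*\d+'
)


def nss_backends(trace_text):
    """Return NSS backend names (files, dns, ...) in the order loaded."""
    return [m.group(1) for m in NSS_OPEN.finditer(trace_text)]


trace_10g = 'open("/usr/lib/hpux64/libnss_files.so.1", O_RDONLY|0x800, 0) = 5'
trace_11g = 'open("/usr/lib/hpux64/libnss_dns.so.1", O_RDONLY|0x800, 0) = 5'
print(nss_backends(trace_10g))  # ['files']
print(nss_backends(trace_11g))  # ['dns']
```

<p>A result of <code>['dns']</code> with no <code>files</code> entry matches the 11g symptom described here.</p>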
<p><br>
</p>
<h2><a name="CHANGE" rel="nofollow"></a>Changes</h2>
<p>Nothing was changed in the OS configuration; only the upgrade from Oracle 10g to 11g was performed.</p>
<h2><a name="CAUSE" rel="nofollow"></a>Cause</h2>
<p>What changed between the two releases is the way Oracle resolves hostnames to IP addresses, more specifically the library function used to do so.</p>
<p>Oracle 11g now uses <strong><em>getaddrinfo()</em></strong>, while 10g used <strong><em>gethostbyname()</em></strong>.</p>
<p>These functions require different configuration in /etc/nsswitch.conf.</p>
<p><strong><em>gethostbyname()</em></strong> requires the keyword "hosts", while <strong><em>getaddrinfo()</em></strong> requires the keyword "ipnodes".</p>
<p> </p>
<p> </p>
<p>Notes:<br>
Although this has only been observed on HP-UX and Solaris, it may be generic to UNIX.<br>
On Solaris, ipnodes has a different meaning: it specifies a file for IPv6 address resolution, and gethostbyname and getaddrinfo both use the hosts or ipnodes file.<br>
Linux, on the other hand, does not use ipnodes in nsswitch.conf.</p>
<h2><a name="FIX" rel="nofollow"></a>Solution</h2>
<p>Add a line in the /etc/nsswitch.conf file similar to the following:<br>
ipnodes: files [NOTFOUND=continue] dns</p>
<p>The line starting with keyword "hosts" must not be deleted.</p>
<p>This way, calls made by getaddrinfo() search /etc/hosts first and contact the DNS server only if the name is not found there.<br>
As a result, there is no connection delay when looking up host names that exist in the local /etc/hosts file.</p>
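<p>The change described in this section can be sketched as a small text transformation. This is an illustration only, under the assumption that the file is edited as plain text and that a copy is reviewed before replacing /etc/nsswitch.conf; the names <code>IPNODES_LINE</code> and <code>ensure_ipnodes</code> are hypothetical.</p>

```python
IPNODES_LINE = "ipnodes: files [NOTFOUND=continue] dns"


def ensure_ipnodes(nsswitch_text, line=IPNODES_LINE):
    """Return nsswitch.conf text with an ipnodes entry appended if missing.

    Existing lines, including the hosts entry, are left untouched.
    """
    for raw in nsswitch_text.splitlines():
        active = raw.split("#", 1)[0].strip()  # ignore comments
        if ":" in active and active.partition(":")[0].strip() == "ipnodes":
            return nsswitch_text  # already configured; change nothing
    if nsswitch_text and not nsswitch_text.endswith("\n"):
        nsswitch_text += "\n"
    return nsswitch_text + line + "\n"


before = "hosts: files [NOTFOUND=continue] dns\n"
print(ensure_ipnodes(before), end="")
```

<p>Running the function a second time on its own output returns the text unchanged, so it is safe to apply repeatedly.</p>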
<p> </p>