OS settings in RHEL/CentOS 7 that affect running Oracle database instances

1. Oracle Linux 7 and Red Hat Linux 7: socket files in /var/tmp/.oracle are deleted

Oracle Database - Enterprise Edition - Version 11.2.0.4 and later
Linux x86-64

Symptoms

On Oracle Linux 7 and Red Hat Linux 7, socket files in /var/tmp/.oracle are deleted for no apparent reason.

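Changes

Cause

Both Oracle Linux 7 and Red Hat Linux 7 ship a system service, systemd-tmpfiles-clean.service, which is managed by systemd and deletes files from temporary locations.

This service deletes:

  1. Files and directories in /tmp that have not been accessed for more than 10 days (defined in tmp.conf)
  2. Files and directories in /var/tmp that have not been accessed for more than 30 days (defined in tmp.conf)

"Not accessed" is determined by checking all of the atime, mtime, and ctime of each file or directory.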
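
To see how close a given socket file is to the cleanup threshold, its three timestamps can be inspected directly. The following is a minimal sketch, assuming GNU stat and date are available; the function names newest_timestamp and is_older_than_days are hypothetical helpers, not part of systemd, and the check only approximates the service's own aging logic.

```shell
# Print the newest of atime/mtime/ctime for a file, in seconds since the epoch.
# Assumes GNU stat (-c with %X, %Y, %Z).
newest_timestamp() {
    stat -c '%X %Y %Z' -- "$1" | tr ' ' '\n' | sort -n | tail -n 1
}

# Succeed (exit 0) only if all three timestamps are older than $2 days.
is_older_than_days() {
    _newest=$(newest_timestamp "$1")
    [ -n "$_newest" ] || return 2          # stat failed, e.g. file missing
    _now=$(date +%s)
    [ $(( _now - _newest )) -gt $(( $2 * 86400 )) ]
}

# Example (not executed here): flag sockets that look eligible for cleanup.
# find /var/tmp/.oracle -type s | while read -r s; do
#     is_older_than_days "$s" 30 && echo "aged: $s"
# done
```

Because the newest of the three timestamps is used, a file counts as aged only when none of them has changed within the threshold.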

 

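Solution

Exclude the socket files from deletion by systemd-tmpfiles-clean.service

To keep the tmpfiles clean service from deleting the socket files in the tmp directories, edit /usr/lib/tmpfiles.d/tmp.conf and add the following lines: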

x /tmp/.oracle*

x /var/tmp/.oracle*

x /usr/tmp/.oracle*

The "x" entries above instruct systemd-tmpfiles-clean.service to skip the listed paths and their contents during cleanup.
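
The exclusion lines can also be added by script so that repeated runs do not duplicate them. This is a sketch only: add_oracle_tmp_exclusions is a hypothetical helper, and the target file is passed as a parameter. The note above edits /usr/lib/tmpfiles.d/tmp.conf. As an assumption outside that note, a separate drop-in file under /etc/tmpfiles.d/ may be preferable, since files under /usr/lib can be replaced by package updates.

```shell
# Append the three Oracle exclusion lines to the given tmpfiles.d file,
# skipping any line that is already present (exact, literal match).
add_oracle_tmp_exclusions() {
    for _dir in /tmp /var/tmp /usr/tmp; do
        _line="x $_dir/.oracle*"
        grep -q -x -F -- "$_line" "$1" 2>/dev/null || printf '%s\n' "$_line" >> "$1"
    done
}

# Example (not executed here, needs root):
# add_oracle_tmp_exclusions /usr/lib/tmpfiles.d/tmp.conf
```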

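Note: the /var/tmp/.oracle directory contains many "special" socket files that local clients use to connect over the IPC protocol (sqlnet) to various Oracle processes, including the TNS listener, the CSS, CRS, and EVM daemons, and even database or ASM instances. If socket files are deleted while Clusterware is running, the symptoms described in Doc ID 391790.1 appear.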

ALERT: Setting RemoveIPC=yes on Redhat 7.2 and higher Crashes ASM and Database Instances as Well as Any Application That Uses a Shared Memory Segment (SHM) or Semaphores (SEM) (Doc ID 2081410.1)

 

The removal of IPC objects by systemd-logind is controlled by the RemoveIPC option in the /etc/systemd/logind.conf configuration file;
see man logind.conf(5) for details.

The default value for RemoveIPC in RHEL7.2 and higher is yes.
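
A quick way to check which value is in effect on a host is to read the setting from the configuration file. The sketch below is an assumption-laden helper (effective_removeipc is a hypothetical name): it reads a single file only, ignores any drop-in directories, and falls back to "yes" because that is the default stated above for RHEL 7.2 and higher.

```shell
# Print the last active (uncommented) RemoveIPC value in the given file.
# Prints "yes" when no active setting is found, matching the RHEL 7.2+ default.
effective_removeipc() {
    _removeipc_value=$(sed -n -E \
        's/^[[:space:]]*RemoveIPC[[:space:]]*=[[:space:]]*([^[:space:]]+).*$/\1/p' \
        "$1" 2>/dev/null | tail -n 1)
    printf '%s\n' "${_removeipc_value:-yes}"
}

# Example (not executed here):
# effective_removeipc /etc/systemd/logind.conf
```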

As a result, when the last oracle or grid user disconnects, the OS removes shared memory segments and semaphores for those users.
As Oracle ASM and Databases use shared memory segments for SGA, removing shared memory segments will crash the Oracle ASM and database instances.

Please refer to the Redhat bug 1264533  – https://bugzilla.redhat.com/show_bug.cgi?id=1264533

OCCURRENCE

The problem affects every application that uses shared memory segments or semaphores, so both Oracle ASM and database instances are affected.

Oracle Linux 7.2 avoids this problem by explicitly setting RemoveIPC=no in the /etc/systemd/logind.conf configuration file.
However, if /etc/systemd/logind.conf was touched or modified before the upgrade started, yum update writes the new, correct configuration file (with RemoveIPC=no) as logind.conf.rpmnew.
If the original configuration file is then retained, the failures described in this note will most likely occur.
To avoid this problem, edit logind.conf after the upgrade and set RemoveIPC=no. This is documented in the Oracle Linux 7.2 release notes.
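
After an upgrade, the presence of a leftover .rpmnew file is a hint that the original configuration was kept. A small sketch of that check follows; report_rpmnew is a hypothetical helper name, and it only reports, leaving the comparison and edit to the administrator.

```shell
# Report whether "<conf>.rpmnew" exists next to the given configuration file.
report_rpmnew() {
    if [ -f "$1.rpmnew" ]; then
        echo "pending: $1.rpmnew exists; review it and confirm RemoveIPC=no in $1"
    else
        echo "ok: no $1.rpmnew found"
    fi
}

# Example (not executed here):
# report_rpmnew /etc/systemd/logind.conf
```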

SYMPTOMS

1) Installing 11.2 and 12c GI/CRS fails, because ASM crashes towards the end of the installation.

2) Upgrading to 11.2 and 12c GI/CRS fails.

3) After Redhat Linux is upgraded to 7.2 and higher, 11.2 and 12c ASM and database instances crash.

 

systemd-logind may remove the IPC objects at any time, so the failure patterns vary greatly. The following are examples of what the failures can look like:

 

The most common error is the following, found in the ASM or database alert.log:
ORA-27157: OS post/wait facility removed
ORA-27300: OS system dependent operation:semop failed with status: 43
ORA-27301: OS failure message: Identifier removed
ORA-27302: failure occurred at: sskgpwwait1
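
To check whether an instance has already hit this failure, the alert log can be searched for the error signature shown above. This is a minimal sketch: has_removed_ipc_signature is a hypothetical helper, and the alert log path in the usage comment is a placeholder that differs per installation.

```shell
# Succeed if the given log contains ORA-27157 or the "Identifier removed"
# OS failure message from the error stack above.
has_removed_ipc_signature() {
    grep -q -E 'ORA-27157|ORA-27301: OS failure message: Identifier removed' -- "$1"
}

# Example (not executed here; the path is a placeholder):
# has_removed_ipc_signature /path/to/alert_SID.log && echo "IPC removal signature found"
```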

 

The second observed error occurs during installation and upgrade when asmca fails with the following error:
KFOD-00313: No ASM instances available. CSS group services were successfully initilized by kgxgncin
KFOD-00105: Could not open pfile ‘[email protected]

 

The third observed error occurred during installation and upgrade:
Creation of ASM password file failed. Following error occurred: Error in Process: $GRID_HOME/bin/orapwd
Enter password for SYS:

OPW-00009: Could not establish connection to Automatic Storage Management instance

2015/11/20 21:38:45 CLSRSC-184: Configuration of ASM failed
2015/11/20 21:38:46 CLSRSC-258: Failed to configure and start ASM

 

The fourth observed error is the following message in the /var/log/messages file around the time the ASM or database instance crashed:
Nov 20 21:38:43 testc201 kernel: traps: oracle[24861] trap divide error
ip:3896db8 sp:7ffef1de3c40 error:0 in oracle[400000+ef57000]

 

WORKAROUND

1) Set RemoveIPC=no in /etc/systemd/logind.conf

2) Reboot the server or restart systemd-logind as follows:
# systemctl daemon-reload
# systemctl restart systemd-logind
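
Step 1 of the workaround can be scripted so that it is safe to run more than once. The following is a sketch under stated assumptions: set_removeipc_no is a hypothetical helper, GNU sed is required for -i, and when no RemoveIPC line exists the setting is appended at the end of the file, which assumes [Login] is the last section (as in the stock logind.conf). Back up the file before using anything like this.

```shell
# Force RemoveIPC=no in the given logind.conf-style file.
# Rewrites any existing (commented or active) RemoveIPC line; appends otherwise.
set_removeipc_no() {
    if grep -q -E '^[#[:space:]]*RemoveIPC=' "$1"; then
        sed -i -E 's/^[#[:space:]]*RemoveIPC=.*/RemoveIPC=no/' "$1"
    else
        printf 'RemoveIPC=no\n' >> "$1"
    fi
}

# Example (not executed here, needs root):
# cp /etc/systemd/logind.conf /etc/systemd/logind.conf.bak
# set_removeipc_no /etc/systemd/logind.conf
```

The restart commands in step 2 are still required afterwards for the change to take effect.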

PATCHES

Migrating to Oracle Linux 7.2 and higher from Redhat 7.2 and higher resolves this problem.

If migrating to Oracle Linux 7.2 is not possible, please use the above workaround by setting RemoveIPC=no in /etc/systemd/logind.conf

