解决sqlplus的segmentation fault或hang问题

sqlplus应当是DBA 1.0时代使用最为频繁的管理工具,经常有经验丰富的老DBA会提到自己敲过几万次的sqlplus:),但有的时候这个吃饭家伙也会不好用,偶尔还会出现Segmentation fault错误,亦或者彻底hang住。在这里我介绍几种应对sqlplus无法正常使用的应对方法:

1.出现Segmentation fault,这种情况下一般是sqlplus 2进制文件被损坏了,可以通过重新build一个sqlplus来解决问题

[oracle@rh2 bin]$ sqlplus
Segmentation fault

/* 使用$ORACLE_HOME/sqlplus/lib目录下的make文件,编译一个新的sqlplus */

[oracle@rh2 ~]$ make -f $ORACLE_HOME/sqlplus/lib/ins_sqlplus.mk  newsqlplus

Linking /s01/oracle/product/11.2.0/dbhome_1/sqlplus/bin/sqlplus
rm -f /s01/oracle/product/11.2.0/dbhome_1/sqlplus/bin/sqlplus
gcc -o /s01/oracle/product/11.2.0/dbhome_1/sqlplus/bin/sqlplus -m64
-L/s01/oracle/product/11.2.0/dbhome_1/sqlplus/lib/ -L/s01/oracle/product/11.2.0/dbhome_1/lib/
-L/s01/oracle/product/11.2.0/dbhome_1/lib/stubs/  /s01/oracle/product/11.2.0/dbhome_1/sqlplus/lib/s0afimai.o
-lsqlplus -lclntsh  `cat /s01/oracle/product/11.2.0/dbhome_1/lib/ldflags`    -lncrypt11 -lnsgr11 -lnzjs11
-ln11 -lnl11 -lnro11 `cat /s01/oracle/product/11.2.0/dbhome_1/lib/ldflags`    -lncrypt11 -lnsgr11 -lnzjs11
-ln11 -lnl11 -lnnz11 -lzt11 -lztkg11 -lztkg11 -lclient11 -lnnetd11  -lvsn11 -lcommon11 -lgeneric11 -lmm
-lsnls11 -lnls11  -lcore11 -lsnls11 -lnls11 -lcore11 -lsnls11 -lnls11 -lxml11 -lcore11 -lunls11 -lsnls11 -lnls11
-lcore11 -lnls11 `cat /s01/oracle/product/11.2.0/dbhome_1/lib/ldflags`    -lncrypt11 -lnsgr11 -lnzjs11 -ln11
-lnl11 -lnro11 `cat /s01/oracle/product/11.2.0/dbhome_1/lib/ldflags`    -lncrypt11 -lnsgr11 -lnzjs11 -ln11 -lnl11
-lclient11 -lnnetd11  -lvsn11 -lcommon11 -lgeneric11   -lsnls11 -lnls11  -lcore11 -lsnls11 -lnls11 -lcore11 -lsnls11
-lnls11 -lxml11 -lcore11 -lunls11 -lsnls11 -lnls11 -lcore11 -lnls11 -lclient11 -lnnetd11  -lvsn11
-lcommon11 -lgeneric11 -lsnls11 -lnls11  -lcore11 -lsnls11 -lnls11 -lcore11 -lsnls11 -lnls11 -lxml11
-lcore11 -lunls11 -lsnls11 -lnls11 -lcore11 -lnls11   `cat /s01/oracle/product/11.2.0/dbhome_1/lib/sysliblist`
-Wl,-rpath,/s01/oracle/product/11.2.0/dbhome_1/lib -lm -lpthread   `cat /s01/oracle/product/11.2.0/dbhome_1/lib/sysliblist`
-ldl -lm -lpthread  -L/s01/oracle/product/11.2.0/dbhome_1/lib
/bin/chmod 755 /s01/oracle/product/11.2.0/dbhome_1/sqlplus/bin/sqlplus
rm -f /s01/oracle/product/11.2.0/dbhome_1/bin/sqlplus
mv -f /s01/oracle/product/11.2.0/dbhome_1/sqlplus/bin/sqlplus /s01/oracle/product/11.2.0/dbhome_1/bin/sqlplus
/bin/chmod 751 /s01/oracle/product/11.2.0/dbhome_1/bin/sqlplus
rm -f /s01/oracle/product/11.2.0/dbhome_1/sqlplus/lib/libsqlplus.so
rm -rf /s01/oracle/product/11.2.0/dbhome_1/sqlplus/bin/sqlplus

[oracle@rh2 ~]$ sqlplus  / as sysdba

SQL*Plus: Release 11.2.0.2.0 Production on Wed May 11 21:38:21 2011

Copyright (c) 1982, 2010, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options

2.出现sqlplus之后hang住的现象,hang的原因存在多种可能:
1)instance hanging数据库实例hang住,这种情况下sqlplus无法正常登陆到正hang的实例,而登陆到其他实例是可以的;若在10g以后版本中可以使用-prelim选项登陆实例,使用该选项登陆后无法执行普通的SQL语句,但可以使用oradebug内部调试工具,通过oradebug收集必要的hanganalyze信息后,可以进一步判断hang住的原因并决定下一步的操作。

[oracle@rh2 ~]$ sqlplus  / as sysdba
.............................we suspend here!!!

[oracle@rh2 ~]$ sqlplus  -prelim / as sysdba

SQL*Plus: Release 11.2.0.2.0 Production on Wed May 11 21:46:27 2011

Copyright (c) 1982, 2010, Oracle.  All rights reserved.

SQL> oradebug setmypid;
Statement processed.

SQL> oradebug dump hanganalyze 4;
Statement processed.

SQL> oradebug dump systemstate 266;
Statement processed.

SQL> oradebug tracefile_name

/s01/orabase/diag/rdbms/prod/PROD1/trace/PROD1_ora_23436.trc   -- where dump resides

将以上trc文件提交给Oracle Support或者资深的Oracle技术人员,以便他们分析出实例hang住的原因,通过调整参数或者修复bug可以避免再次出现类似的状况。

2)一执行sqlplus就出现挂起现象,甚至没有登陆任何数据库。一般这种情况是在读取sqlplus 2进制文件或其相关的共享库文件(.so文件)时遇到了问题,或者是在实际system call系统调用execve(“sqlplus”)时遇到了错误,一般我们可以使用系统跟踪工具strace(Linux)或truss(Unix)工具来分析这种挂起现象:

/* Unix */

truss -o sqlplus_hang.log sqlplus

/* Linux */

strace -o sqlplus_hang.log sqlplus

head -10 sqlplus_hang.log

execve("/s01/db_1/bin/sqlplus", ["sqlplus"], [/* 28 vars */]) = -1 ENOEXEC (Exec format error)

可以看到以上strace记录中发现了调用execve函数(execve() executes the program pointed to by filename)运行sqlplus程序时出现了ENOEXEC错误,该ENOEXEC错误代码说明我们正在执行一个格式无效的可执行文件,具体的解释如下:

This error indicates that a request has been made to execute a file which, although it has the
appropriate permissions, does not start with a valid magic number. A magic number is the first two
bytes in a file, used to determine what type of file it is.

You tried to execute a file that is not in a valid executable format. The most common format for
binary programs under linux is called ELF. Note that your shell will run ascii files that have the
executable bit set as a shell script (ie run it as shell commands).

You can reproduce this by doing
    $ dd if=/dev/random of=myfile bs=1k count=1 $ chmod +x myfile $ ./myfile zsh: exec format error: ./myfile

Note that there is a very slight possibility that you could create a valid program that does
something bad to your system!!
Note, you can have user defined ways of running programs using Linux's binfmt_misc. See
    /usr/src/linux/Documentation/binfmt_misc.txt

to be continued …………


Posted

in

by

Tags:

Comments

2 responses to “解决sqlplus的segmentation fault或hang问题”

  1. maclean Avatar

    AIX: ENOEXEC unresolved symbols or invalid XCOFF header

    Problem Description
    ——————-

    You are attempting to install Oracle8i on an AIX system. You
    have verified that you are able to run X Windows programs like
    /usr/bin/X11/xclock. You change to the cdrom directory where
    Oracle8i is mounted and run the command “./runInstaller” and
    answer “y” to the prompt:

    Has ‘rootpre.sh’ been run by root? [y/n] (n)

    After a brief moment, a message similar to the following is
    displayed:

    Initializing Java Virtual Machine from ../stage/Components
    /oracle.swd.jre/1.1.8.3/1/DataFiles/Expanded/jre/aix/bin/jre.
    Please wait…
    Warning: JIT compiler “none” not found. Will use interpreter.
    java.lang.UnsatisfiedLinkError: Found /releases/rdbms/32bit
    /816/stage/Components/oracle.swd.jre/1.1.8.3/1/DataFiles
    /Expanded/jre/aix/lib/aix/native_threads/libawt.a but could
    not load it
    ENOEXEC: unresolved symbols or invalid XCOFF header
    30 /releases/rdbms/32bit/816/stage/Components/oracle.swd.jre
    /1.1.8.3/1/DataFiles/Expanded/jre/aix/lib/aix/native_threads/libawt.
    36 _XmStrings 44 /usr/lpp/X11/lib/libXm.a shr4.o
    36 XmGetFocusWidget 64 /usr/lpp/X11/lib/libXm.a shr4.o
    36 _XmGetMenuState 68 /usr/lpp/X11/lib/libXm.a shr4.o
    36 XmStringCreateLocalized 78 /usr/lpp/X11/lib/libXm.a shr4.o
    36 XmListPosSelected 83 /usr/lpp/X11/lib/libXm.a shr4.o
    36 XmFontListEntryLoad 90 /usr/lpp/X11/lib/libXm.a shr4.o
    36 XmFontListAppendEntry 91 /usr/lpp/X11/lib/libXm.a shr4.o
    36 XmFontListEntryFree 92 /usr/lpp/X11/lib/libXm.a shr4.o
    36 XmGetTearOffControl 93 /usr/lpp/X11/lib/libXm.a shr4.o
    36 XmFontListNextEntry 119 /usr/lpp/X11/lib/libXm.a shr4.o
    36 XmFontListEntryGetFont 120 /usr/lpp/X11/lib/libXm.a shr4.o
    36 XmFontListEntryGetTag 121 /usr/lpp/X11/lib/libXm.a shr4.o
    36 XmChangeColor 124 /usr/lpp/X11/lib/libXm.a shr4.o
    36 XmFontListEntryCreate 129 /usr/lpp/X11/lib/libXm.a shr4.o
    36 XOpenIM 198 /usr/lpp/X11/lib/libX11.a shr4.o
    36 XCreateIC 199 /usr/lpp/X11/lib/libX11.a shr4.o
    36 XSetICFocus 200 /usr/lpp/X11/lib/libX11.a shr4.o
    36 XFilterEvent 201 /usr/lpp/X11/lib/libX11.a shr4.o
    36 XmbLookupString 202 /usr/lpp/X11/lib/libX11.a shr4.o
    36 XGetICValues 236 /usr/lpp/X11/lib/libX11.a shr4.o
    36 XSetICValues 237 /usr/lpp/X11/lib/libX11.a shr4.o
    36 XmbResetIC 238 /usr/lpp/X11/lib/libX11.a shr4.o
    36 XUnsetICFocus 239 /usr/lpp/X11/lib/libX11.a shr4.o
    36 XDestroyIC 240 /usr/lpp/X11/lib/libX11.a shr4.o
    36 XGetIMValues 241 /usr/lpp/X11/lib/libX11.a shr4.o
    36 XCloseIM 242 /usr/lpp/X11/lib/libX11.a shr4.o
    36 XtStrings 247 /usr/lpp/X11/lib/libXt.a shr4.o
    36 XtShellStrings 248 /usr/lpp/X11/lib/libXt.a shr4.o
    36 _XtToolkitInitialize 255 /usr/lpp/X11/lib/libXt.a shr4.o
    36 _XtInitialize 256 /usr/lpp/X11/lib/libXt.a shr4.o
    36 XtCreateWidget 257 /usr/lpp/X11/lib/libXt.a shr4.o
    36 XtCreatePopupShell 258 /usr/lpp/X11/lib/libXt.a shr4.o
    36 _XtAppInitialize 259 /usr/lpp/X11/lib/libXt.a shr4.o
    36 XtAppCreateShell 260 /usr/lpp/X11/lib/libXt.a shr4.o
    36 XtVaCreateManagedWidget 305 /usr/lpp/X11/lib/libXt.a shr4.o
    at java.lang.Runtime.loadLibrary(Runtime.java)
    at java.lang.System.loadLibrary(System.java)
    at
    at java.awt.Toolkit.getDefaultToolkit(Toolkit.java:394)
    at oracle.sysman.oii.oiif.oiifm.OiifmSplashScreen.getImage(OiifmSplashScreen.java:67)
    at oracle.sysman.oii.oiif.oiifm.OiifmSplashScreen.(OiifmSplashScreen.java:45)
    at oracle.sysman.oii.oiic.OiicInstaller.main(OiicInstaller.java:417)

    Solution Description
    ——————–

    You should remove /usr/lpp/X11/lib from the environment variable LIBPATH.

    To remove it from the current environment, use this command:

    LIBPATH=`echo $LIBPATH | sed “s/\/usr\/lpp\/X11\/lib//g”`
    export LIBPATH

    Explanation
    ———–

    LIBPATH is an environment variable which determines where AIX will
    look for libraries loaded dynamically at run time. The normal
    directory AIX searches for dynamically loaded libraries is /usr/lib
    and /lib or the value of LIBPATH. If you include /usr/lpp/X11/lib
    in LIBPATH and have loaded the X11 compatibility libraries, then
    runInstaller will attempt to use the older incompatible libraries.

  2. maclean Avatar

    Segmentation Fault When Execute Sqlplus, Oracle, Lsnrctl After New/Patchset Install

    Applies to:
    Oracle Server – Standard Edition – Version: 9.2.0.1 to 10.1.0.2 – Release: 9.2 to 10.1
    Oracle Server – Enterprise Edition – Version: 9.2.0.1 to 10.1.0.2 [Release: 9.2 to 10.1]
    Linux x86
    Linux x86-64
    Checked for relevance on 24-Jul-2010
    Symptoms
    After fresh install and latest patchset 9.2.0.6.0 on Linux x86_64 every attempt to use sqlplus, lsnrctl fails with an segmentation fault.
    Changes
    New install or patchset apply.
    Cause
    The gcc/g++ compilers in /usr/bin are a script (with additional parameters)
    which caused an incorrect compilation.

    make.log shows this incorrect compilation:
    /tpp/oracle/920/bin/genclntsh
    /usr/bin/ld: skipping incompatible /usr/lib/gcc-lib/i386-redhat-linux/2.96/libgcc.a when
    searching for -lgcc
    /usr/bin/ld: skipping incompatible /usr/lib/gcc-lib/i386-redhat-linux/2.96/libgcc.a when
    searching for -lgcc
    /usr/bin/ld: warning: i386 architecture of input file
    `/usr/lib/gcc-lib/i386-redhat-linux/2.96/crtbeginS.o’ is incompatible with i386:x86-64 output
    /usr/bin/ld: warning: i386 architecture of input file
    `/usr/lib/gcc-lib/i386-redhat-linux/2.96/crtendS.o’ is incompatible with i386:x86-64 output.
    Solution
    To implement the solution, please execute the following steps:

    1. cd /usr/bin (as root)
    2. mv gcc gcc.script
    3. mv g++ g++.script
    4. ln -s gcc32 gcc
    5. ln -s g++32 g++
    6. login as oracle software owner (make sure environment is correct)
    7. cd $ORACLE_HOME/bin
    8. relink all

Leave a Reply

Your email address will not be published. Required fields are marked *