APPLIES TO:
Oracle Database - Enterprise Edition - Version 8.1.7.4 and later Information in this document applies to any platform. ***Checked for relevance on 30-Sep-2014***PURPOSE
The purpose of this troubleshooting article is to provide an insight into key areas of internal 600/7445 trace files to assist in deriving to a known bug or to highlight what might be needed by ORACLE Support to progress an issue further. Whilst the article can be used to some extent on 11g it is primarily written for versions <V10204 due to :- a) The worked example is based on a bug fixed in V10204 and higher releases b) The alert/trace file structure in 11g has changed The article will make reference to other notes so that the traces for 11g can also be proactively reviewed if wished. The nature of Internal errors means no one approach can guarantee getting to a solution but again the intention is to provide an indication on notes that support make available to customers and a simple workflow process that assists for a vast majority of SRs. A simple example is illustrated to generate a known bug that is harmless to the database when generated, that said it should ONLY be used on a TEST database as it's not encouraged to willingly raise such errors in 'LIVE' environments. This example will then be used to highlight various sections and how each section can be useful for general 600/7445 analysis and again to provide the best chance of identifying a known bugfix should one be available. The article also makes links to a number of others for completeness including Note:232963.1 should a testcase be required as is the ideal case for all internal errors but understandably is not always possible dependent on the problem area. Tests were made on SUN but the bug is not platform specific, it may however show slightly different messages within the alert log and for Windows based platforms some tools/notes might not be applicable. However this does not distract from the primary aim of how a trace file can be analyzed. An Internal Error whether ORA-00600 or ORA-07445 can fall into many categories :- Issue reproducible and testcase can be provided Issue reproducible but no simple testcase perhaps due to code being oracle SQL, 3rd party SQL Issue not reproducible but persistent and formulates to some pattern e.g. once a day Issue not reproducible and random pattern to occurrences By definition if an issue is not reproducible at will in a customer environment a testcase may be very difficult to obtain but should always be attempted where possible. These are a simplified subset, depending on root cause there can be many 600/7445 errors. Typically the argument(s) of the error when unique should mean each are different problems but if occurring on same timestamp or one always soon follows another there might be some relationship and this will be for support to determine.TROUBLESHOOTING STEPS
Worked Example ------------------ A real life example will follow and a working methodology will then be provided. Conclusions will then be made at the end of the analysis and some standard questions will be commented on that should always be appropriate to analysis of Internal errors and perhaps any SR raised into support. The bug number identified will be reported in a later section so as to show the workflow/analysis methodology used without knowing the solution in advance. This article is only suitable for <V10204 databases in terms of the testcase as the bug it relates to if fixed in the V10204PSR and higher RDBMS versions. The article is still appropriate as an introduction to analysis for 11g RDBMS and we will mention the main differences for when an internal error is seen in 11g. Due to the nature of Internal errors it is possible that the error is reported differently between versions/PSRs/platforms making analysis more complex. The example and trace results for this article came from V10201/64bitSUN but the same principles apply to all. sqlplus scott/tiger drop table a; drop table b; create table A(col11 number, col12 number); create table B(col21 number, col22 number); insert into a values (-3,-7); insert into a values (null,-1); insert into b values ( -7,-3); update a sET col11 = (select avg(b.col22) keep (dense_rank first order by (col22)) FROM b where b.col21= a.col12); The UPDATE will fail at session level with ORA-03113 error. A trace file will also be written to the UDUMP destination of the database and will now be highlighted, where appropriate if a section can be found using keyword searches in the trace it will be mentioned. Before any analysis of UDUMP/BDUMP traces take place there should be an understanding of how the error(s) are reported in the alert log. ALERT LOG --------------- As this is a worked example we know what to expect in advance but the alert and an understanding of what actions take place around an internal error can be fundamental to resolution. It is certainly possible for internal errors to be a consequence of a problem rather than the cause. Errors in file xxxx/xxxx_ora_24779.trc: ORA-07445: exception encountered: core dump [_memcpy()+592] [SIGSEGV] [Address not mapped to object] [0xFFFFFFFF7B180000] [] [] ORA-00600: internal error code, arguments: [kgh_heap_sizes:ds], [0xFFFFFFFF7B179E98], [], [], [], [], [], [] ORA-07445: exception encountered: core dump [_memcpy()+592] [SIGSEGV] [Address not mapped to object] [0xFFFFFFFF7B180000] [] [] ORA-00600: internal error code, arguments: [kghGetHpSz1], [0xFFFFFFFF7B179EA8] The errors continue in the alert and as we can see >1 error is raised but all are reported to be within the same tracefile. There will be cases where >1 error and >1 trace exist but this is beyond the scope of this 'Introduction'. Various sections of the trace are now reported based on this example, not all 600/7445 traces will allow for each section to be reviewed. Again this is a very simple example and often a combination of traces will need to be understood which is beyond the scope of this article.Section 1 : Trace header information
-------------------------------------
xxxx/xxxx_ora_24779.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production
With the Partitioning, OLAP and Data Mining options
ORACLE_HOME = xxxxxx
System name: SunOS
Node name: xxxxxx
Release: 5.8
Version: Generic_117350-38
Machine: sun4u
Instance name: xxxxx
Redo thread mounted by this instance: 1
Oracle process number: 15
Unix process pid: 24779, image: oracle@xxxxx (TNS V1-V3)
This section gives general information on the machine/instance and version of RDBMS where the internal error has been seen, whilst important its certainly does not really contain much to narrow bug searches as this info is customer/machine specific where as a bug itself should really hold information generic to all customers.
Section 2 : The 600 or 7445 internal error
------------------------------------------
*** SESSION ID:(143.5) 2006-07-18 10:45:03.004
Exception signal: 11 (SIGSEGV), code: 1 (Address not mapped to object), addr: 0xffffffff7b280000,
PC: [0xffffffff7b700b58, memcpy()+1196]
*** 2006-07-18 10:45:03.008
ksedmp: internal or fatal error
ORA-07445: exception encountered: core dump [memcpy()+1196] [SIGSEGV] [Address not mapped to object]
[0xFFFFFFFF7B280000] [] []
This article is not written to show the differences between ORA-00600/ORA-07445 errors in any detail.
The basic difference between these two errors, is that
an ORA-600 is a trapped error condition in the Oracle code that should not occur, whereasan ORA-7445 is an untrapped
error condition detected by the operating system.
Section 3 : Current SQL statement
---------------------------------
Current SQL statement for this session:
update a sET col11 =
(select avg(b.col22) keep (dense_rank first order by (col22)) FROM b where b.col21= a.col12);
Not all internal issues will show a current SQL statement and there is no simple reason why this is the case. If this does happen it is recommended to try and use all other sections to narrow the search space down. When present in an internal (600/7445) trace file it should always be locatable using a search in the trace file of 'Current SQL' and should be the first hit found.
In addition there maybe a 'PLSQL Call Stack' that pinpoints the schema.object and line number for an internal issue.
Section 4 : Call Stack Trace
-----------------------------
The call stack within the trace file is seen
as follows :-
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedmp()+744 CALL ksedst() 000000840 ?
FFFFFFFF7FFE7D5C ?
000000000 ?
FFFFFFFF7FFE4850 ?
FFFFFFFF7FFE35B8 ?
FFFFFFFF7FFE3FB8 ?
ssexhd()+1000 CALL ksedmp() 000106000 ? 106324A04 ?
106324000 ? 000106324 ?
000106000 ? 106324A04 ?
sigacthandler()+44 PTR_CALL 0000000000000000 000380007 ?
FFFFFFFF7FFEB9A0 ?
000000067 ? 000380000 ?
00000000B ? 106324A00 ?
_memcpy()+592 PTR_CALL 0000000000000000 00000000B ?
FFFFFFFF7FFEB9A0 ?
FFFFFFFF7FFEB6C0 ?
FFFFFFFFFFFFFFFF ?
000000004 ? 000000000 ?
ksxb1bqb()+36 FRM_LESS __1cY__nis_writeCol FFFFFFFF7B2F0000 ?
dStartFile6FpcpnNdi FFFFFFFF7B35EE90 ?
rectory_obj__i_()+3 FFFFFFFFFFFE6DA0 ?
75 FFFFFFFFFFFFFFFF ?
000000004 ? 000000000 ?
kxhrPack()+1088 CALL ksxb1bqb() FFFFFFFF7B0F7220 ?
FFFFFFFF7B354BBF ?
FFFFFFFFFFFF1070 ?
FFFFFFFF7B2E5D60 ?
FFFFFFFF7B0F7220 ?
000001FF0 ?
qescaInsert()+280 CALL kxhrPack() 000000080 ?
FFFFFFFF7B34A5E8 ?
The stack function is the first column on each line, and so reads:
ksedmp ssexhd sigacthandler memcpy ksxb1bqb .....
For clarity the full stack trace is summarised to :-
Function List (to Full stack) (to Summary stack)
ksedmp ssexhd sigacthandler memcpy ksxb1bqb kxhrPack
qescaInsert subsr3 evaopn2 upderh upduaw kdusru
kauupd updrow qerupRowProcedure qerupFetch updaul updThreePhaseExe
updexe opiexe kpoal8 opiodr ttcpip opitsk opiino opiodr opidrv
sou2o opimai_real main start
There is no automated tool for customers to get into this
form. Note:211909.1 reports a further example
of ORA-7445 stack analysis.
The call stack can be very powerful to narrow a problem down to known issues
but if used incorrectly will generate many hits totally unrelated to the
problem being encountered.
As the functions and their purpose are ORACLE proprietry this article can only give pointers towards good practice and these include :-
a) Ignore function names that are before the 600/7445 error so for this worked example searches on 'ksedmp','ssexhd' or 'sigacthandler' will not benefit.
The top most routines are for error handling, so this is
why the failing function 'memcpy()' is not at the top of
the stack, and why the top most functions can be ignored.
b) Ignore calls towards the end of the call stack
c) Try a number of different searches based on the failing function from the 600/7445 and 4-5 functions after
In this case a useful 'My ORACLE Support' (MOS) search would be : memcpy ksxb1bqb kxhrPack qescaInsert subsr3Section 5 : Session Information
-------------------------------
SO: 392b5a188, type: 4, owner: 392a5a5b8, flag: INIT/-/-/0x00
(session) sid: 143 trans: 3912d2f78, creator: 392a5a5b8, flag: (41) USR/- BSY/-/-/-/-/-
DID: 0001-000F-0000000F, short-term DID: 0000-0000-00000000
txn branch: 0
oct: 6, prv: 0, sql: 38d8c05b0, psql: 38d8c0b20, user: 173/SCOTT
O/S info: user: xxxxxx, term: pts/21, ospid: 24482, machine: xxxxxx
program: sqlplus@xxxxxx (TNS V1-V3)
application name: SQL*Plus, hash value=3669949024
last wait for 'SQL*Net message from client' blocking sess=0x0 seq=282 wait_time=830 seconds since wait started=1
A 600/7445 trace should always have a 'Call stack Trace' whether it contains function names or not, after this follows 'PROCESS STATE' information and in this we can search on keyword 'session'.
The first hit should take us to the section of the trace file that shows where the error came from and the session/user affected (above they have been blanked to xxxxxx). In many cases we will see user, terminal and machine information that can often be useful for any issue that needs to be reproduced for further investigation. On many occasions an internal error will be reported in alert logs and the DBA will not necessarily have been informed by any users of the database/application that they encountered a problem.
This section can often also show 3rd party sessions e.g. TOAD/PLSQL or even where the client is another ORACLE product e.g. FORMS/APPS and even if just a client process e.g. EXP/RMAN. This can be another powerful method of narrowing a problem down, in the case of a problem coming from 3rd party software it is important to determine wherever possible if an issue can be reproduced in SQLPLUS.Section 6 : Explain Plan Information
-------------------------------------
============
Plan Table
============
---------------------------------------+-----------------------------------+
| Id | Operation | Name | Rows | Bytes | Cost | Time |
---------------------------------------+-----------------------------------+
| 0 | UPDATE STATEMENT | | | | 3 | |
| 1 | UPDATE | A | | | | |
| 2 | TABLE ACCESS FULL | A | 2 | 52 | 3 | 00:00:01 |
| 3 | SORT AGGREGATE | | 1 | 26 | | |
| 4 | TABLE ACCESS FULL | B | 1 | 26 | 3 | 00:00:01 |
---------------------------------------+-----------------------------------+
This section can be found in the 600/7445 trace (when present) using the case
sensitive search 'Explain plan' and will be the first hit unless the SQL for
example contained the same string.
In this section the 'Plan Table' for the failing SQL will be reported and
especially for CBO related internal errors might give a pointer to finding
a workaround. For this worked example nothing really can be used but we might
for example see the plan using a 'hash join' and if so a possible w/a to
try would be hash_join_enabled=false. Caution should be used here, permanent
workarounds should only be set in place if matching to a known bug and the
implications are understood.Section 7 : Current Cursor Information
--------------------------------------
Current cursor: 4, pgadep: 0
Open cursors(pls, sys, hwm, max): 11(3, 7, 50, 50)
NULL 6 SYNTAX 0 PARSE 0 BOUND 5 FETCH 0 ROW 0
Cached frame pages(total, free):
4k(61, 58), 8k(10, 10), 16k(1, 1), 32k(0, 0)
pgactx: 38d2c9e98 ctxcbk: 38d2c9b30 ctxqbc: 0 ctxrws: 38e751110
This can be found using the case sensitive search in the trace file for 'Current cursor:' [do not include the quotes], this section will either tie up with the current SQL statement reported in the 600/7445 or when there is no SQL statement reported it 'might' assist in further analysis.
Once the current cursor is known the information on this can be found using the search 'Cursor#X' where X is the value shown e.g in this case 4.
However this is >=10.2 specific
and so it should be noted that prior to 10.2
use the search 'Cursor X '.
Cursor#4(ffffffff7b231828) state=BOUND curiob=ffffffff7b2456f8
curflg=4c fl2=0 par=0 ses=392b5a188
sqltxt(38d8c05b0)=update a sET col11 =
(select avg(b.col22) keep (dense_rank first order by (col22)) FROM b where b.col21= a.col12)
hash=e2ef469857c74a273b881c20276493b5
parent=38eabc4b8 maxchild=01 plk=38f664f50 ppn=nll
If a SQL statement is using BIND variables it is the cursor information that often proves useful at isolating the values used in the SQL which again can be the key to reproducing a problem if data specific.ORA-00600/ORA-07445 Analysis : General Guidelines
--------------------------------------------------
Now that the sections have been explained we need to ensure we use as systematic a procedure as possible to determine if a known issue/bug exists. It should be clear that a 600/7445 trace contains far more information but if the above sections together with what follows cannot conclude to a known issue it is recommended that a SR is raised to support.
Three useful documents types can assist here :-
OERI notes
----------
An 'OERI' article is a document provided where possible to customers that provides a brief meaning of the error/impact and known bugs relating to the 600/7445. The quickest way to find this information from MOS is to search as :
a) 7445 <1st argument> oeri
or
b) 600 <1st argument> oeri
So if the internal error was 600[12333] the search is:600 12333 oeri
So for this article and worked example the search would be : 7445 memcpy oeri
Equally the 600/7445 tool as per Note:153788.1 can be used and if the trace file is submitted into this tool an OERI note will be reported where published. If the testcase is attempted on a WINDOWS platform the error reported in the alert and trace may not be the same and the tool might not return any hits. Finding a suitable example for all platforms is beyond the scope of this document.
So for the latter search we should find Note:310531.1, it should be noted that not all bugs associated with an error will be tagged but ideally they would be.
It should also be clear from the notes seen in this type of search that ORA-600[x]/7445[y] does not mean only one bug can attribute to the Internal error.
BUG.8 notes
-----------
If the OERI or if a search gets a hit to a bug there maybe a link to a note where the MOS Noteid is <BUGNO>.8
For this worked example Note:5162852.8 exists, also to illustrate their purpose we can take for example Note:4451759.8
These articles are automatically generated when a bug becomes fixed and will typically include :-
--Product affected
--Range of releases the bug/error is believe appropriate to
--Known affected version(s)
--Fixed version
Together with the 'symptoms' of the bug and where possible workarounds. This information is also in the bug itself but the note provides this into a much clearer form.
Patch Set : List of Bugfixes by Problem Type Articles
-----------------------------------------------------
When a PSR is released ORACLE ensures there is an article that summarises all the bugfixes made into that PSR, this articles allows a quick search to customers to see if there is an ORA-600/ORA-7445 error of the same type in the PSR. As mentioned earlier in the OERI section just because there is a fix for the same type, it does not guarantee that the problem is the same.
For example if one of these PSR notes mentions ORA-00600[729] it means that the fix relates to one specifc way this error can be encountered, it does not stop 600[729] occurring for other reasons and even with the latest PSR applied to a given RDBMS release the internal error could occur due to a new unknown bug.
These documents can be found in MOS using a search with keywords like :-
<version> patch set bug fixes problem type
So if we are looking to see what fixes are made into the V92070 PSR we would use
the six keywords : 9.2.0.7 patch set bug fixes problem type
MOS will produce many hits but then the internet browsers search function can be used to search on the string '9.2.0.7' to quickly isolate the note which in this case is Note:308894.1
It is recommended to search always on the highest PSR available for the version you are running on since the note will then also contain links to the previous PSRs.
ORA-00600/ORA-07445 Analysis : Keyword Searches
-----------------------------------------------
This is the stage of the article where we need to use the research gathered from trace files and known MOS notes to see if we can use the MOS knowledge base to determine if the internal error is a known bug. It was commented that Section4 of a trace file can be very powerful here but again it is impossible to provide one unique way to search the knowledge bases. Instead we will use the worked example and will give some recommendations for 'bad' and 'good' search practices.
Bad Searches Practices
----------------------
a) Using first few calls of stack trace e.g ksedmp ssexhd sigacthandler memcpy
b) Use of MOS 'Advanced Search' and using the 'using any of the words' option
The reason this is a poor search method is that it is likely too many hits will be returned especially if keywords like 600/7445 are used. The default search functionality is an AND base search where all keywords must be present and this provides for better searches and fewer unrelated hits.
c) Using all of the stack trace. As explained above a default MOS search will find hits that contain all the keywords and using the whole stack allows a real risk of getting zero hits returned.
Good Searches Practices
------------------------
It should be noted that more than one search might actually be needed to get a feeling for what bugs/notes are relevant to the customers issue. The scope of what can be searched will depend on if a problem is reproducible.
a) Isolating a good section of the 'Call Stack' from Section4
b) If SQL is known then look at SQL to see if there is something special on the SQL e.g for this worked example we know dense_rank is being used and this is not common SQL
c) If a problem is reproducible and an explain plan is available look at the plan to see if there is something that can be changed at session level. An example was reported in Section6 and if a parameter is seen to make a query work this parameter can then become a very useful keyword for determining if a known bug exists.
d) Using the RDBMS version as keyword can sometimes show if any bugs exist that contain a oneoff fix for the same internal error argument but caution is needed as it is not a guarantee that the bug will cover the same cause.
With this information in mind the two searches :-
memcpy ksxb1bqb kxhrPack qescaInsert subsr3
memcpy dense_rank
both get very few bug hits and BUG:5162852 is quickly identified.