ALTER SYSTEM KILL SESSION 'sid,serial#' IMMEDIATE;and alert.log error message:
ORA-00700: soft internal error, arguments: [ksepop:1 ksepop recursion ] ORA-00601: cleanup lock conflict PMON: terminating the instance due to ORA error 12752 Cause - 'Instance is being terminated due to fatal process death (CL01)'Note: Tested on Oracle 19.13 in Linux.
Update (2023-02-26): By this Blog, Oracle fixed the described and reproduced problem with Bug 33060212 (from 19.18 in January 2023):
Bug 33060212 - ORA-07445 [kgldmp0] and Instance Termination Due to Background Process CLMN (Doc ID 33060212.8) The fix for 33060212 is first included in 19.18.0.0.230117 (January 2023) DB Release Update (DB RU)
1. Test Setup
Open 3 Sqlplus sessions (SID-1, SID-2, SID-3). SID-1 and SID-2 are two test sessions. SID-3 is a monitor session.
In the test, SID-1 is "blocked session", and SID-2 is "blocker session".
At first, in each session, we run a query to get its session info (SID, SERIAL#, PID, SPID).
In SID-1, we also run two queries on dbms_standard to warm up (avoid hard parsing), which will be used in later test.
---================== SID-1 (blocked session, SID=100, SERIAL#=1001, SPID=10011) ==================---
-- grant execute on sys.dbms_support to k;
15:25:41 SID-1 > select s.program, s.sid, s.serial#, pid, spid, last_call_et
from v$session s, v$process p where s.paddr=p.addr and s.sid = sys.dbms_support.mysid;
PROGRAM SID SERIAL# PID SPID LAST_CALL_ET
------------ ---- ------- ---- -------- ------------
sqlplus.exe 100 1001 51 10011 0
15:25:46 SID-1 > select localtimestamp, dbms_standard.ora_max_name_len_supported from dual;
LOCALTIMESTAMP ORA_MAX_NAME_LEN_SUPPORTED
-------------------- --------------------------
15:25:47 128
15:25:48 SID-1 > select dbms_standard.database_name from dual;
DATABASE_NAME
----------------------------------------
TESTDB
---================== SID-2 (blocker session, SID=200, SERIAL#=2002, SPID=20022) ==================---
15:26:01 SID-2 > select s.program, s.sid, s.serial#, pid, spid, last_call_et
from v$session s, v$process p where s.paddr=p.addr and s.sid = sys.dbms_support.mysid;
PROGRAM SID SERIAL# PID SPID LAST_CALL_ET
------------ ---- ------- ---- -------- ------------
sqlplus.exe 200 2002 52 20022 0
---================== SID-3 (monitor session, SID=300, SERIAL#=3003, SPID=30033) ==================---
15:26:08 SID-3 > select s.program, s.sid, s.serial#, pid, spid, last_call_et
from v$session s, v$process p where s.paddr=p.addr and s.sid = sys.dbms_support.mysid;
PROGRAM SID SERIAL# PID SPID LAST_CALL_ET
------------ ---- ------- ---- -------- ------------
sqlplus.exe 300 3003 53 30033 0
2. Test Run
We start a gdb on SID-2 process, set breakpoint on "kgxRelease". In each step, we print out "step_count", and display actual time by "shell date".
Then in Sqlplus SID-2, we run:
SID-2 > select dbms_standard.database_name from dual;
It is waiting for gdb instruction.Now we run gdb continue command. It stops on "kgxRelease", which indicates that SID-2 is holding a mutex (by kglGetMutex -> kgxExclusive), but not yet released by "kgxRelease".
We go to Sqlplus SID-1, run:
SID-1 (Run-1) > select localtimestamp, dbms_standard.ora_max_name_len_supported from dual;
It returns and not blocked.Repeat the same actions ("gdb kgxRelease" and "SID-1 run") till SID-1 is hanging.
In our test, it occurs in Step-8 (see following test action flow diagram). At this moment, it signifies that SID-1 is blocked by a mutex held by SID-2. SID-1 is waiting on event "library cache: mutex X" (see later Section 3).
Then, from SID-3, we run "kill session immediate" on SID-1:
15:32:41 SID-3 > alter system kill session '100,1001' immediate;
(Note* instead of killing "blocker session" SID-2, we kill "blocked session" SID-1)
Now SID-1 shows "ORA-03114: not connected to ORACLE".Immediately from v$session, we saw both SID-1 and one CLMN/CL01 are waiting on event "library cache: mutex X" (see later Section 3). In order to to cleanup killed processes (Marked process newly dead), CL01 from PMON group (PMON, CLMN, CLnn) is called, but also blocked (see later Section 3).
After about 5 minutes, DB instance crashed.
========= gdb Session-2 process ========= | ================================ SQL Session-1 ===================================
oracle@testdb > gdb -p 20022 |
|
(gdb) break kgxRelease |
Breakpoint 1 at 0x12f90180 |
(gdb) set $step_count = 0 |
(gdb) display $step_count++ |
1: $step_count++ = 0 |
(gdb) shell date |
15:28:02 |
|
-------------------------- 15:28:30 SID-2 > select dbms_standard.database_name from dual; ---------------------------------
|
------------------------------------- Step-1 --------------------------------------------------------------------------------
(gdb) c |
Continuing. |
Breakpoint 1, 0x12f90180 in kgxRelease () |
1: $step_count++ = 1 |
(gdb) shell date |
15:28:58 |
--->>>
| SID-1 (Run-1) > select localtimestamp, dbms_standard.ora_max_name_len_supported from dual;
|
| LOCALTIMESTAMP ORA_MAX_NAME_LEN_SUPPORTED
| -------------------- --------------------------
| 15:29:01 128
------------------------------------- Step-2 --------------------------------------------------------------------------------
(gdb) c |
Continuing. |
Breakpoint 1, 0x12f90180 in kgxRelease () |
1: $step_count++ = 2 |
(gdb) shell date |
15:29:17 |
--->>>
| SID-1 (Run-2) > select localtimestamp, dbms_standard.ora_max_name_len_supported from dual;
|
| LOCALTIMESTAMP ORA_MAX_NAME_LEN_SUPPORTED
| -------------------- --------------------------
| 15:29:27 128
------------------------------------- Step-3 --------------------------------------------------------------------------------
(gdb) c |
Continuing. |
Breakpoint 1, 0x12f90180 in kgxRelease () |
1: $step_count++ = 3 |
(gdb) shell date |
15:29:32 |
--->>>
| SID-1 (Run-3) > select localtimestamp, dbms_standard.ora_max_name_len_supported from dual;
|
| LOCALTIMESTAMP ORA_MAX_NAME_LEN_SUPPORTED
| -------------------- --------------------------
| 15:29:37 128
------------------------------------- Step-4 --------------------------------------------------------------------------------
(gdb) c |
Continuing. |
Breakpoint 1, 0x12f90180 in kgxRelease () |
1: $step_count++ = 4 |
(gdb) shell date |
15:29:45 |
--->>>
| SID-1 (Run-4) > select localtimestamp, dbms_standard.ora_max_name_len_supported from dual;
|
| LOCALTIMESTAMP ORA_MAX_NAME_LEN_SUPPORTED
| -------------------- --------------------------
| 15:29:50 128
------------------------------------- Step-5 --------------------------------------------------------------------------------
(gdb) c |
Continuing. |
Breakpoint 1, 0x12f90180 in kgxRelease () |
1: $step_count++ = 5 |
(gdb) shell date |
15:29:57 |
--->>>
| SID-1 (Run-5) > select localtimestamp, dbms_standard.ora_max_name_len_supported from dual;
|
| LOCALTIMESTAMP ORA_MAX_NAME_LEN_SUPPORTED
| -------------------- --------------------------
| 15:30:06 128
------------------------------------- Step-6 --------------------------------------------------------------------------------
(gdb) c |
Continuing. |
Breakpoint 1, 0x12f90180 in kgxRelease () |
1: $step_count++ = 6 |
(gdb) shell date |
15:30:11 |
--->>>
| SID-1 (Run-6) > select localtimestamp, dbms_standard.ora_max_name_len_supported from dual;
|
| LOCALTIMESTAMP ORA_MAX_NAME_LEN_SUPPORTED
| -------------------- --------------------------
| 15:30:38 128
------------------------------------- Step-7 --------------------------------------------------------------------------------
(gdb) c |
Continuing. |
Breakpoint 1, 0x12f90180 in kgxRelease () |
1: $step_count++ = 7 |
(gdb) shell date |
15:30:46 |
--->>>
| SID-1 (Run-7) > select localtimestamp, dbms_standard.ora_max_name_len_supported from dual;
|
| LOCALTIMESTAMP ORA_MAX_NAME_LEN_SUPPORTED
| -------------------- --------------------------
| 15:30:50 128
------------------------------------- Step-8 --------------------------------------------------------------------------------
(gdb) c |
Continuing. |
Breakpoint 1, 0x12f90180 in kgxRelease () |
1: $step_count++ = 8 |
(gdb) shell date |
15:31:05 |
--->>>
| SID-1 (Run-8) > select localtimestamp, dbms_standard.ora_max_name_len_supported from dual;
| select localtimestamp, dbms_standard.ora_max_name_len_supported from dual
|
| *** SID-1 is blocked ***
|
-------------------------- 15:32:41 SID-3 > alter system kill session '100,1001' immediate; ---------------------------------
|
| ERROR at line 1:
| ORA-03113: end-of-file on communication channel
| Process ID: 10011
| Session ID: 100 Serial number: 1001
| ERROR:
| ORA-03114: not connected to ORACLE
|
-----------------------------------------------------------------------------------------------------------------------------
From gdb, we can also show call stack and register info of of SID-2.
(gdb) display $r8d
2: $r8d = 4
(gdb) bt
#0 0x12f90180 in kgxRelease ()
#1 0x12f6a588 in kglReleaseMutex ()
#2 0x12f635b7 in kglLoadOnLock ()
#3 0x12f62255 in kgllkal ()
#4 0x12f5d90e in kglLock ()
#5 0x12f587d5 in kglget ()
#6 0x034d1e68 in kksaxs ()
#7 0x034d016d in kksauc ()
#8 0x034e3db9 in kkscscid_auc_eval ()
#9 0x12c3d85a in kkscsCheckCriteria ()
#10 0x034e1d17 in kkscsCheckCursor ()
#11 0x034e0eec in kkscsSearchChildList ()
#12 0x12c2e5c6 in kksfbc ()
#13 0x12c27e37 in kkspsc0 ()
#14 0x12c27552 in kksParseCursor ()
#15 0x12e139c6 in opiosq0 ()
#16 0x12b7d71f in kpooprx ()
#17 0x12b7b115 in kpoal8 ()
#18 0x12b08902 in opiodr ()
#19 0x12ed2fbe in ttcpip ()
#20 0x028afa8c in opitsk ()
#21 0x028b43e8 in opiino ()
#22 0x12b08902 in opiodr ()
#23 0x028ab806 in opidrv ()
#24 0x03425585 in sou2o ()
#25 0x00dcb266 in opimai_real ()
#26 0x03430fb1 in ssthrdmain ()
#27 0x00dcb090 in main ()
(gdb) info r
rax 0x0 0
rbx 0x7fecd71ec9c0 140655198128576
rcx 0x7fecd70f4738 140655197112120
rdx 0x0 0
rsi 0x9b71f4b0 2607936688
rdi 0x7fecd71ec9c0 140655198128576
rbp 0x7ffebb73bbe0 0x7ffebb73bbe0
rsp 0x7ffebb73bb78 0x7ffebb73bb78
r8 0x4 4
r9 0x80002801 2147493889
r10 0x0 0
r11 0x2 2
r12 0x9b71f4b0 2607936688
r13 0xa27c0820 2726037536
r14 0x2 2
r15 0x7ffebb73c9a0 140732043348384
rip 0x12f90180 0x12f90180
eflags 0x246 [ PF ZF IF ]
cs 0x33 51
ss 0x2b 43
Once SID-1 is blocked in Step-8, if we make an errorstack dump of SID-1 before SID-1 kill session,
the Call Stack shows that SID-1 is blocked in "kglGetMutex -> kgxExclusive -> kgxWait":
oradebug dump errorstack 3
----- Current SQL Statement for this session (sql_id=27t9knmt7u5pq) -----
select localtimestamp, dbms_standard.ora_max_name_len_supported from dual
----- Call Stack Trace -----
FRAME [1] (ksedst1()+95 -> kgdsdst())
FRAME [2] (ksedst()+58 -> ksedst1())
FRAME [3] (dbkedDefDump()+23448 -> ksedst())
FRAME [4] (ksedmp()+577 -> dbkedDefDump())
FRAME [5] (ksdxdmp()+1425 -> ksedmp())
FRAME [6] (ksdxfdmp()+152 -> ksdxdmp())
FRAME [7] (ksdxcb()+872 -> ksdxfdmp())
FRAME [8] (sspuser()+200 -> ksdxcb())
FRAME [9] (__sighandler() -> sspuser())
FRAME [10] (semtimedop()+10 -> __sighandler())
FRAME [11] (sskgpwwait()+245 -> semtimedop())
FRAME [12] (skgpwwait()+187 -> sskgpwwait())
FRAME [13] (ksliwat()+2293 -> skgpwwait())
FRAME [14] (kslwaitctx()+200 -> ksliwat())
FRAME [15] (kgxWait()+1291 -> kslwaitctx())
FRAME [16] (kgxExclusive()+712 -> kgxWait())
FRAME [17] (kglGetMutex()+147 -> kgxExclusive())
FRAME [18] (kglpin()+696 -> kglGetMutex())
FRAME [19] (kglpnp()+439 -> kglpin())
FRAME [20] (kgiina()+352 -> kglpnp())
FRAME [21] (kgiinp()+39 -> kgiina())
FRAME [22] (peiinspbn()+146 -> kgiinp())
FRAME [23] (kkxpispbn()+212 -> peiinspbn())
FRAME [24] (kgmexec()+688 -> kkxpispbn())
FRAME [25] (evapls()+1270 -> kgmexec())
FRAME [26] (evaopn2()+747 -> evapls())
FRAME [27] (kpofcr()+7390 -> evaopn2())
FRAME [28] (qerfiFetch()+143 -> kpofcr())
FRAME [29] (opifch2()+3211 -> qerfiFetch())
FRAME [30] (kpoal8()+4185 -> opifch2())
FRAME [31] (opiodr()+1202 -> kpoal8())
FRAME [32] (ttcpip()+1246 -> opiodr())
3. Session Event Monitoring During Test
From SID-3 (monitor session), we can see the Session Event change in three steps:
(1). No Blocking (2). User Session Blocked on "library cache: mutex X" when mutex not released (kgxRelease) before Kill Session (3). Both User Session and CL01 Blocked on "library cache: mutex X" when mutex not released (kgxRelease) after Kill Session
----------================== SID-3 (monitor session, SID=300, SERIAL#=3003, SPID=30033) ===========================================================
-----****** Before blocking
15:30:30 SID-3 > select s.program, event, s.sid, s.serial#, pid, spid, p1raw, p2raw, p3raw, last_call_et, p1text, p2text, p3text
from v$session s, v$process p
where s.paddr=p.addr and (lower(s.program) like '%sql%' or lower(s.program) like '%pmon%'
or lower(s.program) like '%(cl%' or lower(s.program) like '%(dia%')
order by s.program desc;
PROGRAM EVENT SID SERIAL# PID SPID P1RAW P2RAW P3RAW LAST_CALL_ET P1TEXT P2TEXT P3TEXT
--------------------- ---------------------------- ---- ------- ---- ----- -------- ---------------- ---------------- ------------ --------- ------- --------------------
sqlplus.exe SQL*Net message from client 100 1001 51 10011 54435000 0000000000000001 00 24 driver id #bytes
sqlplus.exe SQL*Net message from client 200 2002 52 20022 54435000 0000000000000001 00 61 driver id #bytes
sqlplus.exe SQL*Net message to client 300 3003 53 30033 54435000 0000000000000001 00 0 driver id #bytes
oracle@testdb (PMON) pmon timer 359 33127 2 31362 0000012C 00 00 562 duration
oracle@testdb (DIAG) DIAG idle wait 897 49978 11 31394 00000003 0000000000000001 00 560 component where wait time(millisec)
oracle@testdb (DIA0) DIAG idle wait 3 31438 18 31411 00000003 0000000000000001 00 560 component where wait time(millisec)
oracle@testdb (CLMN) pmon timer 539 36539 3 31366 0000012C 00 00 562 duration
oracle@testdb (CL01) pmon timer 186 46807 49 31939 0000012C 00 00 465 duration
oracle@testdb (CL00) pmon timer 722 15363 34 31937 0000012C 00 00 465 duration
-----****** During "library cache: mutex X" blocking of User Session SID-1 (SID=100) before Kill Session
15:31:26 SID-3 > select s.program, event, s.sid, s.serial#, pid, spid, p1raw, p2raw, p3raw, last_call_et, p1text, p2text, p3text
from v$session s, v$process p
where s.paddr=p.addr and (lower(s.program) like '%sql%' or lower(s.program) like '%pmon%'
or lower(s.program) like '%(cl%' or lower(s.program) like '%(dia%')
order by s.program desc;
PROGRAM EVENT SID SERIAL# PID SPID P1RAW P2RAW P3RAW LAST_CALL_ET P1TEXT P2TEXT P3TEXT
--------------------- ---------------------------- ---- ------- ---- ----- -------- ---------------- ---------------- ------------ --------- ------ --------------------
sqlplus.exe library cache: mutex X 100 1001 51 10011 7CE2FAC4 00000C8000000000 000004C500010004 18 idn value where
sqlplus.exe SQL*Net message from client 200 2002 52 20022 54435000 0000000000000001 00 117 driver id #bytes
sqlplus.exe SQL*Net message to client 300 3003 53 30033 54435000 0000000000000001 00 0 driver id #bytes
oracle@testdb (PMON) pmon timer 359 33127 2 31362 0000012C 00 00 618 duration
oracle@testdb (DIAG) DIAG idle wait 897 49978 11 31394 00000003 0000000000000001 00 616 component where wait time(millisec)
oracle@testdb (DIA0) DIAG idle wait 3 31438 18 31411 00000003 0000000000000001 00 616 component where wait time(millisec)
oracle@testdb (CLMN) pmon timer 539 36539 3 31366 0000012C 00 00 618 duration
oracle@testdb (CL01) pmon timer 186 46807 49 31939 0000012C 00 00 521 duration
oracle@testdb (CL00) pmon timer 722 15363 34 31937 0000012C 00 00 521 duration
-----****** "kill session immediate" on SID-1
15:32:41 SID-3 > alter system kill session '100,1001' immediate;
System altered.
-----****** During "library cache: mutex X" blocking of both User Session SID-1 (SID=100), and CLMN/CLnn (CL01 SID=186) after Kill Session
15:33:05 SID-3 > select s.program, event, s.sid, s.serial#, pid, spid, p1raw, p2raw, p3raw, last_call_et, p1text, p2text, p3text
from v$session s, v$process p
where s.paddr=p.addr and (lower(s.program) like '%sql%' or lower(s.program) like '%pmon%'
or lower(s.program) like '%(cl%' or lower(s.program) like '%(dia%')
order by s.program desc;
PROGRAM EVENT SID SERIAL# PID SPID P1RAW P2RAW P3RAW LAST_CALL_ET P1TEXT P2TEXT P3TEXT
--------------------- ---------------------------- ---- ------- ---- ----- -------- ---------------- ---------------- ------------ ---------- ------- --------------------
sqlplus.exe library cache: mutex X 100 1001 51 10011 7CE2FAC4 00000C8000000000 000004C500010004 117 idn value where
sqlplus.exe SQL*Net message from client 200 2002 52 20022 54435000 0000000000000001 00 216 driver id #bytes
sqlplus.exe SQL*Net message to client 300 3003 53 30033 54435000 0000000000000001 00 0 driver id #bytes
oracle@testdb (PMON) pmon timer 359 33127 2 31362 0000012C 00 00 717 duration
oracle@testdb (DIAG) DIAG idle wait 897 49978 11 31394 00000003 0000000000000001 00 715 component where wait time(millisec)
oracle@testdb (DIA0) DIAG idle wait 3 31438 18 31411 00000003 0000000000000001 00 715 component where wait time(millisec)
oracle@testdb (CLMN) pmon timer 539 36539 3 31366 0000012C 00 00 717 duration
oracle@testdb (CL01) library cache: mutex X 186 46807 49 31939 7CE2FAC4 00000C8000000000 000004C50001004F 620 idn value where
oracle@testdb (CL00) pmon timer 722 15363 34 31937 0000012C 00 00 620 duration
In some tests, instead of CLnn, we also saw CLMN is waiting for "library cache: mutex X".P2RAW in Event "library cache: mutex X" is mutex "value", which contains mutex holder (blocker) SID. In this test, it is 200 = 0xC8.
P3RAW in Event "library cache: mutex X" is mutex "where". Its last 8-bit byte indicates the blocking subroutine location.
In this test, it is 79 = 0x4F (kglrfcl1).
We can print out those locations from kglMutexLocations[] array:
#Define Command to PrintkglMutexLocations
define PrintkglMutexLocations
set $i = 0
while $i < $arg0 + $arg0
x /s *(&kglMutexLocations + $i)
set $i = $i + 2
end
end
(gdb) PrintkglMutexLocations 174
kglpin1 4 (0x04)
kglhdgn1 62 (0x3E)
kglrfcl1 79 (0x4F)
kgllkdl1 85 (0x55)
kglhdgn2 106 (0x6A)
In a few tests, we saw CLnn having "P3RAW=000004C500010055" on "library cache: mutex X".
After about two minutes (see Section DB alert.log), DIAG (DIA0) was active to perform certain actions on Hung Sessions,
and CLMN took over "library cache: mutex X" with "P3RAW=000004C50001004F" on "library cache: mutex X".
(changed from kgllkdl1 (0x55: library cache lock delete) to kglrfcl1 (0x4F: library cache reference clear))
4. DB alert.log, Dumps and Incident Files
For instance crash, we can also have a look of alert.log, dumps from user and system sessions (CLMN/CLnn, PMON, DIAG/DIAn, MMON/MMON_SLAVE).
4.1 DB alert.log
alert.log showed that KILL SESSION at 15:32:41, DIA0 is active at 15:35:12 (151 seconds later), and finally at 15:37:59 (167 seconds later), errors are signalled and Instance termination was alerted:
ORA-00700: soft internal error, arguments: [ksepop:1 ksepop recursion ]
ORA-00601: cleanup lock conflict
'Instance is being terminated due to fatal process death (pid: 49, ospid: 31939, CL01)'
----------================== alert.log ==================----------
15:32:41.540893
Process termination requested for pid 10011 [source = rdbms], [info = 2] [request issued by pid: 30033, uid: 100]
15:32:41.590770
KILL SESSION for sid=(100, 1001):
Reason = alter system kill session
Mode = KILL HARD SAFE -/-/-
Requestor = USER (orapid = 53, ospid = 30033, inst = 1)
Owner = Process: USER (orapid = 51, ospid = 10011)
Result = ORA-0
15:35:12.167943
DIA0 Critical Database Process Blocked: Hang ID 1 blocks 2 sessions
Final blocker is session ID 200 serial# 2002 OSPID 20022 on Instance 1
No resolution will be attempted by Hang Manager
15:37:55.685771
Errors in file /orabin/app/oracle/admin/testdb/diag/rdbms/testdb/testdb/trace/testdb_cl01_31939.trc (incident=89993):
ORA-00700: soft internal error, arguments: [ksepop:1 ksepop recursion ], [], [], [], [], [], [], [], [], [], [], []
ORA-00601: cleanup lock conflict
ORA-00601: cleanup lock conflict
Incident details in: /orabin/app/oracle/admin/testdb/diag/rdbms/testdb/testdb/incident/incdir_89993/testdb_cl01_31939_i89993.trc
15:37:57.240744
Dumping diagnostic data in directory=[cdmp_20220222153757], requested by (instance=1, osid=31939 (CL01)), summary=[incident=89993].
15:37:59.637484
PMON (ospid: 31362): terminating the instance due to ORA error 12752
15:37:59.637658
Cause - 'Instance is being terminated due to fatal process death (pid: 49, ospid: 31939, CL01)'
15:37:59.638404
System state dump requested by (instance=1, osid=31362 (PMON)), summary=[abnormal instance termination].
At 15:35:12.167943, DIA0 detected that 2 sessions were blocked by SID-2 (session ID 200 serial# 2002 OSPID 20022 ).
The 2 blocked sessions are SID-1 (orapid = 51, ospid = 10011) and CL01.
4.2 CL01 Dump
cl01.trc looks like:
----------================== testdb_cl01_31939.trc ==================----------
Unix process pid: 31939, image: oracle@testdb (CL01)
*** 15:32:41.593097
*** SESSION ID:(186.46807) 15:32:41.593132
KGX cleanup...
KGX Atomic Operation Log 0x9b77d058
Mutex 0xa27c0820(727, 0) idn 7ce2fac4 oper GET_EXCL(5)
Library Cache uid 548 efd 7 whr 4 slp 9271
oper=0 pt1=0xa27c06d0 pt2=(nil) pt3=(nil)
pt4=(nil) pt5=(nil) ub4=0 flg=0x0 uw1=0 uw2=0
msk=0000-0000-0000-0000-0000
*** 15:35:11.727606
KGL UOL cleanup...
KGX Atomic Operation Log 0x9c44cf58
Mutex (nil)(0, 0) idn 0 oper NONE(0)
Library Cache uid 186 efd 15 whr 77 slp 0
oper=259 pt1=0x9b6a3938 pt2=0x9b6a2800 pt3=(nil)
pt4=0x9b6a3f58 pt5=(nil) ub4=0 flg=0x0 uw1=0 uw2=0
msk=0000-0000-0000-0000-0000
LibraryHandle: Address=0x9b6a3938 Hash=0 LockMode=0 PinMode=0 LoadLockMode=0 Status=INVL Subpool=1
Name: Namespace=SQL AREA(00) Type=CURSOR(00) ContainerId=0
Statistics: InvalidationCount=1 ExecutionCount=9 LoadCount=1 ActiveLocks=0 TotalLockCount=3 TotalPinCount=10
Counters: BrokenCount=2 RevocablePointer=2 KeepDependency=0 Version=0 BucketInUse=0 HandleInUse=0 HandleReferenceCount=0
Concurrency: DependencyMutex=0x9b6a39e8(0, 0, 0, 0) Mutex=0x9b6a51e0(186, 34, 0, 6)
Flags=RON/PIN/PN0/EXP/CHD/[10010111] Flags2=[0000]
WaitersLists:
Lock=0x9b6a39c8[0x9b6a39c8,0x9b6a39c8]
Pin=0x9b6a39a8[0x9b6a39a8,0x9b6a39a8]
LoadLock=0x9b6a3a20[0x9b6a3a20,0x9b6a3a20]
LibraryObject: Address=0x9b6a2800 HeapMask=0000-0000-0000-0000
NamespaceDump:
Child Cursor: Heap0=0x9b6a28e0 Heap6=0x7f65526469c0 Heap0 Load Time=15:25:46 Heap6 Load Time=15:25:46 15:37:55.686246
Incident 89993 created, dump file: /orabin/app/oracle/admin/testdb/diag/rdbms/testdb/testdb/incident/incdir_89993/testdb_cl01_31939_i89993.trc
*** 15:37:55.686302
ORA-00700: soft internal error, arguments: [ksepop:1 ksepop recursion ], [], [], [], [], [], [], [], [], [], [], []
ORA-00601: cleanup lock conflict
It shows "KGX cleanup" on "idn 7ce2fac4" at 15:32:41,
and "KGL UOL cleanup" at 15:35:11 (it is close to above DIA0 detection time of 15:35:12).With following query, we can see "7CE2FAC4" (HASH_VALUE=2095250116) is SYS.DBMS_STANDARD Package Spec, which is also showed as P1RAW for the session event "library cache: mutex X".
select owner, name, namespace, type, hash_value, to_char(hash_value, 'XXXXXXXX') hash_value_HEX
from v$db_object_cache v where name like 'DBMS_STANDARD';
OWNER NAME NAMESPACE TYPE HASH_VALUE HASH_VALUE_HEX
-------- -------------------- -------------------- ---------- ---------- --------------------
K DBMS_STANDARD TABLE/PROCEDURE CURSOR 599930705 23C23751
PUBLIC DBMS_STANDARD TABLE/PROCEDURE SYNONYM 2691725148 A070775C
SYS DBMS_STANDARD TABLE/PROCEDURE PACKAGE 2095250116 7CE2FAC4
SYS DBMS_STANDARD BODY CURSOR 3231142607 C09752CF
Text "Mutex=0x9b6a51e0(186, 34, 0, 6)" shows cl01 (SESSION ID: 186) in Execlusuve lock (6).
4.3 CL01 Incident File
----------================== incident/incdir_89993/testdb_cl01_31939_i89993.trc ==================----------
*** SESSION ID:(186.46807) 15:37:55
ORA-00700: soft internal error, arguments: [ksepop:1 ksepop recursion ], [], [], [], [], [], [], [], [], [], [], []
----- Call Stack Trace -----
FRAME [11] (kgesoftnmierr()+712 -> kgerinv_internal())
FRAME [12] (ksepop()+1007 -> kgesoftnmierr())
FRAME [13] (kgepop()+135 -> ksepop())
FRAME [14] (kgesecl0()+145 -> kgepop())
FRAME [15] (kgxWait()+1403 -> kgesecl0())
FRAME [16] (kgxExclusive()+712 -> kgxWait())
FRAME [17] (kglGetMutex()+147 -> kgxExclusive())
FRAME [18] (kglrfcl()+354 -> kglGetMutex())
FRAME [19] (kglobcl()+589 -> kglrfcl())
FRAME [20] (kglobfr()+366 -> kglobcl())
FRAME [21] (kgllccl()+695 -> kglobfr())
FRAME [22] (kglMutexCleanupAll()+1499 -> kgllccl())
FRAME [23] (kglOnErrorMutexCleanup()+76 -> kglMutexCleanupAll())
FRAME [24] (ksepop()+351 -> kglOnErrorMutexCleanup())
FRAME [25] (kgepop()+135 -> ksepop())
FRAME [26] (kgepop()+472 -> kgepop())
FRAME [27] (ksesecl0()+189 -> kgepop())
FRAME [28] (ksucln_bump_cleanup_timer()+1053 -> ksesecl0())
FRAME [29] (ksucln_check_wait_tasks()+137 -> ksucln_bump_cleanup_timer())
FRAME [30] (ksliwat()+3923 -> ksucln_check_wait_tasks())
FRAME [31] (kslwaitctx()+200 -> ksliwat())
FRAME [32] (kgxWait()+1291 -> kslwaitctx())
FRAME [33] (kgxExclusive()+712 -> kgxWait())
FRAME [34] (kglGetMutex()+147 -> kgxExclusive())
FRAME [35] (kglrfcl()+354 -> kglGetMutex())
FRAME [36] (kglobcl()+589 -> kglrfcl())
FRAME [37] (kglobfr()+366 -> kglobcl())
FRAME [38] (kglobf0()+327 -> kglobfr())
FRAME [39] (kgllkdl()+1532 -> kglobf0())
FRAME [40] (kss_del_cb()+811 -> kgllkdl())
Current Wait Stack:
1: waiting for 'library cache: mutex X'
idn=0x7ce2fac4, value=0xc8000000000, where=0x4c50001004f
wait_id=202 seq_num=203 snap_id=1
wait times: snap=2 min 44 sec, exc=2 min 44 sec, total=2 min 44 sec
wait times: max=infinite, heur=2 min 44 sec
wait counts: calls=16383 os=16383
in_wait=1 iflags=0x1532
0: waiting for 'library cache: mutex X'
idn=0x7ce2fac4, value=0xc8000000000, where=0x4c50001004f
wait_id=201 seq_num=202 snap_id=1
wait times: snap=5 min 14 sec, exc=2 min 30 sec, total=5 min 14 sec
wait times: max=infinite, heur=5 min 14 sec
wait counts: calls=15000 os=15000
in_wait=1 iflags=0x15b2
There is at least one session blocking this session.
Dumping 1 direct blocker(s):
inst: 1, sid: 200, ser: 2002
Dumping final blocker:
inst: 1, sid: 200, ser: 2002
The error message said:
"ORA-00700: soft internal error, arguments: [ksepop:1 ksepop recursion ]"
In the Call Stack, we can see two same calls in two different frames:
FRAME [13] (kgepop()+135 -> ksepop())
FRAME [25] (kgepop()+135 -> ksepop())
Probably that is interpreted as "recursion" in "[ksepop:1 ksepop recursion ]".In "Current Wait Stack" section, we can see 'library cache: mutex X' with "idn=0x7ce2fac4", and blocker is SID-1 (sid: 200, ser: 2002).
4.4 SID-1 (blocked session) Dump
SID-1 is waiting for 'library cache: mutex X' with "idn=0x7ce2fac4" and "where=0x4c500010004" (kglpin1).
----------================== testdb_ora_10011_bucket.trc ==================----------
Unix process pid: 32246, image: oracle@testdb
*** SESSION ID:(100.1001) 15:37:57.302316
*** 15:37:57.302326
Process diagnostic dump for oracle@testdb, OS id=10011,
-------------------------------------------------------------------------------
current sql:
client details:
machine: SYS\M7080 program: sqlplus.exe
Current Wait Stack:
0: waiting for 'library cache: mutex X'
idn=0x7ce2fac4, value=0xc8000000000, where=0x4c500010004
wait_id=357 seq_num=359 snap_id=1
wait times: snap=6 min 48 sec, exc=6 min 48 sec, total=6 min 48 sec
wait times: max=infinite, heur=6 min 48 sec
wait counts: calls=9271 os=9271
in_wait=1 iflags=0x55b2
4.5 SID-2 (blocker session) Dump
There are 2 sessions blocked by this session (SID-2). One waiter is CL01 (sid: 186, ser: 46807), waiting for 'library cache: mutex X' with "idn=0x7ce2fac4" and "where=0x4c50001004f" (kglrfcl1).
----------================== testdb_ora_20022_bucket.trc ==================----------
Unix process pid: 20022, image: oracle@testdb
*** SESSION ID:(200.2002) 15:37:57.303902
*** 15:37:57.303913
Process diagnostic dump for oracle@testdb, OS id=20022,
-------------------------------------------------------------------------------
current sql: select dbms_standard.database_name from dual
client details:
machine: SYS\M7080 program: sqlplus.exe
Current Wait Stack:
Not in wait; last wait ended 9 min 3 sec ago
There are 2 sessions blocked by this session.
Dumping one waiter:
inst: 1, sid: 186, ser: 46807
wait event: 'library cache: mutex X'
p1: 'idn'=0x7ce2fac4
p2: 'value'=0xc8000000000
p3: 'where'=0x4c50001004f
row_wait_obj#: 4294967295, block#: 0, row#: 0, file# 0
min_blocked_time: 310 secs, waiter_cache_ver: 327
Blog:
DBMS_SCHEDULER Job Not Running and Used Slaves
has some further discussions on Oracle Kill Statement and UNIX Kill Command.