Note: Tested in Oracle 19.17 on Linux.
1. Test Code
The test involves a cluster table and two Sql statements, one is a select, another is an alter.
drop table test_tab;
drop cluster test_cluster;
create cluster test_cluster (x number);
create index test_cluster_indx on cluster test_cluster;
create table test_tab (x, y) cluster test_cluster(x)
as select level, 'abc' from dual connect by level <= 3;
create or replace procedure test_proc_sel_tab(p_cnt number) as
begin
for i in 1..p_cnt loop
for c in (select * from test_tab) loop
null;
end loop;
dbms_session.sleep(0.01);
end loop;
end;
/
-- exec test_proc_sel(5);
create or replace procedure test_proc_alt_cluster(p_cnt number) as
begin
for i in 1..p_cnt loop
execute immediate 'ALTER CLUSTER test_cluster PCTFREE 18 INITRANS 19';
end loop;
end;
/
-- exec test_proc_alt(5);
2. Test Run
Open two Sqlplus sessions, do following test. After a few seconds, Session-2 throws: ORA-00060: deadlock
-- Session-1@T1, run about 10 seconds
exec test_proc_sel_tab(1e4);
-- Session-2@T2, run within 10 seconds
exec test_proc_alt_cluster(1e4);
-- output
-- ORA-00060: deadlock detected while waiting for resource
-- ORA-06512: at line 3
In order to make deadlock reproducible, both sessions execute statements concurrently in a loop. If both statements are executed only occasionally at the same time, probably the deadlock appears only intermittently.
3. Trace Files
Here trace files after deadlock occurrence.
3.1 Session-1 Trace File
----- Current SQL Statement for this session (sql_id=g677askf3bry6) -----
SELECT * FROM TEST_TAB
----- PL/SQL Call Stack -----
object line object
handle number name
0x9a8b9220 4 procedure K.TEST_PROC_SEL_TAB
0x9a67a4b0 1 anonymous block
--------------------- Binary Stack Dump ---------------------
[1] (ksedst1()+95 -> kgdsdst())
...
[8] (kqrLockPo()+274 -> kqrget())
[9] (kqrpre()+1095 -> kqrLockPo())
[10] (kkdlgstcd()+347 -> kqrpre())
[11] (kkdlgstd()+3385 -> kkdlgstcd())
[12] (kkmfcbloCbk()+179 -> kkdlgstd())
[13] (kkmpfcbk()+235 -> kkmfcbloCbk())
[14] (qcsprfro()+648 -> kkmpfcbk())
[15] (qcsprfro_tree()+372 -> qcsprfro())
[16] (qcsprfro_tree()+138 -> qcsprfro_tree())
[17] (qcspafq()+240 -> qcsprfro_tree())
[18] (qcspqbDescendents()+264 -> qcspafq())
[19] (qcspqb()+255 -> qcspqbDescendents())
[20] (kkmdrv()+180 -> qcspqb())
[21] (opiSem()+2191 -> kkmdrv())
[22] (opiprs()+321 -> opiSem())
[23] (kksParseChildCursor()+527 -> opiprs())
[24] (rpiswu2()+2004 -> kksParseChildCursor())
[25] (kksLoadChild()+5287 -> rpiswu2())
[26] (kxsGetRuntimeLock()+2335 -> kksLoadChild())
[27] (kksfbc()+19638 -> kxsGetRuntimeLock())
[28] (opiexe()+2864 -> kksfbc())
-- kqrLockPo.rsi=r15 = 000000009AE73170 = v$rowcache_parent.ADDRESS of TEST_CLUSTER
========== FRAME [9] (kqrpre()+1095 -> kqrLockPo()) ==========
defined by frame pointers 0x7ffdf28dce80 and 0x7ffdf28dcd30
CALL TYPE: call ERROR SIGNALED: no COMPONENT: (null)
RDI 000000000000000B RSI 000000009AE73170 RDX 00000000B76FF440
RCX 0000000000000003 R8 0000000000000000 R9 00007F4C00000000
RAX 0000000000000000 RBX 0000000000000000 RBP 00007FFDF28DCE80
R10 00007FFDF28DB010 R11 0000000000000000 R12 0000000000000003
R13 00007F4CDA143140 R14 00000000A789CB80 R15 000000009AE73170
RSP 00007FFDF28DCD40 RIP 0000000012DCE457
SOC: 0x9c32fb98, type: row cache enqueues (111), map: 0xa5f39680
state: LIVE (0x99fc), flags: INIT (0x1)
row cache enqueue: count=1 session=0xb71de728 object=0x9ae73170, request=S
flag=03 WAI/TRC/-/-/-/-/-/- savepoint=0xd46f
row cache parent object: addr=0x9ae73170 cid=8(dc_objects)
conid=0 conuid=0 inc=1, pdbinc=1
hash=d5c4d330 typ=1 transaction=0xac528090 flags=0000002a
version=13 mtx version=134
objectno=4774244 ownerid=49 nsp=5
name=TEST_CLUSTER
own=0x9ae73240[0x9d422ca0,0x9d422ca0] wat=0x9ae73250[0x9c32fc18,0x9c32fc18] mode=X req=N
Current Wait Stack:
0: waiting for 'row cache lock'
cache id=0x8, mode=0x0, request=0x3
wait_id=6425 seq_num=6454 snap_id=1
wait times: snap=1.030535 sec, exc=1.030535 sec, total=1.030535 sec
wait times: max=infinite, heur=1.030535 sec
wait counts: calls=1 os=1
in_wait=1 iflags=0x55a2
Dumping one waiter:
inst: 1, sid: 369, ser: 13483
wait event: 'library cache lock'
p1: 'handle address'=0x9aebb010
p2: 'lock address'=0x9d8cc410
p3: '100*mode+namespace'=0x48d96600010003
row_wait_obj#: 4294967295, block#: 0, row#: 0, file# 0
min_blocked_time: 0 secs, waiter_cache_ver: 117
3.2 Session-2 Trace file
----- Current SQL Statement for this session (sql_id=42y5q6y5vyxsw) -----
ALTER CLUSTER test_cluster PCTFREE 18 INITRANS 19
----- PL/SQL Call Stack -----
object line object
handle number name
0x9a67e1c0 4 procedure K.TEST_PROC_ALT_CLUSTER
0x9a6665d8 1 anonymous block
--------------------- Binary Stack Dump ---------------------
[1] (ksedst1()+95 -> kgdsdst())
...
[7] (kglLockWait()+1240 -> ksfgti())
[8] (kgllkal()+4711 -> kglLockWait())
[9] (kglLock()+1425 -> kgllkal())
[10] (kglget()+293 -> kglLock())
[11] (kkdllk0()+418 -> kglget())
[12] (kkdllcl()+557 -> kkdllk0())
[13] (kkdllct()+26 -> kkdllcl())
[14] (acldrv()+1359 -> kkdllct())
[15] (opiexe()+30659 -> acldrv())
-- kgllkal.rdx = 000000009AEBB010 = v$db_object_cache.ADDR for TEST_TAB
========== FRAME [9] (kglLock()+1425 -> kgllkal()) ==========
defined by frame pointers 0x7ffd39edb340 and 0x7ffd39edade0
CALL TYPE: call ERROR SIGNALED: no COMPONENT: LIBCACHE
RDI 00007F087E5419C0 RSI 00007FFD39EDB2E0 RDX 000000009AEBB010
RCX 0000000000000003 R8 000000009AEBB090 R9 00007FFD39EDBAA0
RAX 00000000A541C1F8 RBX 0000000000000001 RBP 00007FFD39EDB340
R10 00007FFD39ED8FF0 R11 00000000132262E0 R12 0000000000000000
R13 0000000000000000 R14 0000000000000000 R15 0000000000000006
RSP 00007FFD39EDADF0 RIP 00000000130AC291
Current Wait Stack:
0: waiting for 'library cache lock'
handle address=0x9aebb010, lock address=0x9d8cc410, 100*mode+namespace=0x48d96600010003
wait_id=5715 seq_num=5755 snap_id=1
wait times: snap=1.002169 sec, exc=1.002169 sec, total=1.002169 sec
wait times: max=15 min 0 sec, heur=1.002169 sec
wait counts: calls=1 os=1
in_wait=1 iflags=0x55a2
Dumping one waiter:
inst: 1, sid: 549, ser: 40906
wait event: 'row cache lock'
p1: 'cache id'=0x8
p2: 'mode'=0x0
p3: 'request'=0x3
row_wait_obj#: 105930, block#: 271548, row#: 0, file# 47
min_blocked_time: 0 secs, waiter_cache_ver: 117
3.3 diag_base trace file
DEADLOCK DETECTED (id=0x770be968)
Chain Signature: 'library cache lock'<='row cache lock' (cycle)
-------------------------------------------------------------------------------
Oracle session identified by:
{
process id: 57, oracle@test_db
session id: 549
}
is waiting for 'row cache lock' with wait info:
{
p1: 'cache id'=0x8 -- P1=8 dc_objects (PARENT) or dc_object_grants
p2: 'mode'=0x0
p3: 'request'=0x3 -- P3=3 (share mode), P3=5 (exclusive)
current sql: SELECT * FROM TEST_TAB
}
and is blocked by
=> Oracle session identified by:
{
process id: 50, oracle@test_db
session id: 369
}
which is waiting for 'library cache lock' with wait info:
{
p1: 'handle address'=0x9aebb010 -- P1=0x9aebb010 TEST_TAB
p2: 'lock address'=0x9d8cc410
p3: '100*mode+namespace'=0x48d96600010003
Name: K.TEST_TAB [Nsp: 1 Typ: 2 Id: 4774246]
timeout after: 14 min 59 sec -- 'library cache lock' maximum 900 seconds
current sql: ALTER CLUSTER test_cluster PCTFREE 18 INITRANS 19
}
and is blocked by the session at the start of the chain.
4. GDB Debug
4.1 Meta Info
First, we collect some Meta Info of our test objects.
select name, type, addr, hash_value, v.* from v$db_object_cache v
where name in ('TEST_CLUSTER', 'TEST_CLUSTER_INDX', 'TEST_TAB');
NAME TYPE ADDR HASH_VALUE
------------------ -------- --------- ----------
TEST_CLUSTER_INDX INDEX 9A9F23B8 531250600
TEST_CLUSTER CLUSTER 9AE66A10 2742975119
TEST_TAB TABLE 9AEBB010 1905210899
-- Blog: Oracle ROWCACHE Views and Contents (http://ksun-oracle.blogspot.com/2018/10/oracle-rowcache-views.html)
----helper function----
create or replace function dump_hex2str (dump_hex varchar2) return varchar2 is
l_str varchar2(100);
begin
with sq_pos as (select level pos from dual connect by level <= 1000)
,sq_chr as (select pos, chr(to_number(substr(dump_hex, (pos-1)*2+1, 2), 'XX')) ch
from sq_pos where pos <= length(dump_hex)/2)
select listagg(ch, '') within group (order by pos) word
into l_str
from sq_chr;
return l_str;
end;
/
select * from (
select to_number(ltrim((substr(key, 7, 2)||substr(key, 5, 2)||substr(key, 3, 2)||substr(key, 1, 2)), '0'), 'XXXX') Schema_User_ID,
(select username from dba_users where user_id =
to_number(ltrim((substr(key, 7, 2)||substr(key, 5, 2)||substr(key, 3, 2)||substr(key, 1, 2)), '0'), 'XXXX')) username,
dump_hex2str(rtrim(substr(key, 13), '0')) dc_object_name,
to_number(trim(both '0' from substr(key, 11, 2)||substr(key, 9, 2)), 'XXXX') key_str_len,
v.*
from v$rowcache_parent v
where cache_name in ('dc_objects')
) where dc_object_name = 'TEST_CLUSTER';
DC_OBJECT_NAME ADDRESS
-------------- --------
TEST_CLUSTER 9AE73170
4.2 GDB script
Based on above meta info, we compose a gdb script:
------------------------- gdb -x gdb_rowcache_libcache_deadlock -p 3214 -------------------------
# first run test_proc_alt_cluster(1) to make test reproducible.
# For kglnaobj_and_schema, Tracing Library Cache Locks 3: Object Names
# (https://nenadnoveljic.com/blog/library-cache-lock-object-name/)
# kgllkal.rdx = 9AEBB010 = v$db_object_cache.ADDR for TEST_TAB
# kqrLockPo.rsi=r15 = 9AE73170 = v$rowcache_parent.ADDRESS of TEST_CLUSTER
set pagination off
set logging file /tmp/gdb.log
set logging overwrite on
set logging on
set $KGLLKHDL_ADDR=0X9AEBB010
set $ROWCACHE_PO_ADDR=0X9AE73170
set $s = 0
set $k = 0
set $r = 0
break *kgllkal if $rdx==$KGLLKHDL_ADDR
command
printf "S-%i ===== LibCache Seq: %i -- kgllkal (kgllkhdl Addr(rdx)=>%X, Mode(rcx)=>%X), kglnaobj_and_schema: %s --\n", ++$s, ++$k, $rdx, $rcx, ($rdx+0x1c0)
bt 3
cont
end
# kgllkdl or kglUnLock. ALTER CLUSTER GDB calls kgllkdl.
break *kgllkdl if $r8==$KGLLKHDL_ADDR
command
printf "S-%i ===== LibCache Seq: %i -- kgllkdl (kgllkhdl Addr(r8)=>%X) --\n", ++$s, ++$k, $r8
bt 3
cont
end
# kgllkdl or kglUnLock. QUERY GDB calls kglUnLock.
break *kglUnLock if $r8==$KGLLKHDL_ADDR || $r15==$KGLLKHDL_ADDR
command
printf "S-%i ===== LibCache Seq: %i -- kglUnLock (kgllkhdl Addr(r8)=>%X or Addr(r15)=>%X) --\n", ++$s, ++$k, $r8, $r15
bt 3
cont
end
break *kqrLockPo if $rsi==$ROWCACHE_PO_ADDR || $r15==$ROWCACHE_PO_ADDR
command
printf "S-%i ===== RowCache Seq: %i -- kqrLockPo (Addr(rsi)=>%X, Request Mode(rcx)=>%X) --\n", ++$s, ++$r, $rsi, $rcx
bt 3
cont
end
# kqrReleaseLock or kqreqd
break *kqrReleaseLock if $rdx==$ROWCACHE_PO_ADDR || $rdi==$ROWCACHE_PO_ADDR
command
printf "S-%i ===== RowCache Seq: %i -- kqrReleaseLock (Addr(rdx)=>%X) --\n", ++$s, ++$r, $rdx
bt 3
cont
end
cont
4.3 ALTER CLUSTER GDB
We run ALTER CLUSTER with gdb trace, here the output:
06:55:14 SQL > exec test_proc_alt_cluster(1);
Breakpoint 4, 0x0000000012dd0890 in kqrLockPo ()
S-1 ===== RowCache Seq: 1 -- kqrLockPo (Addr(rsi)=>9AE73170, Request Mode(rcx)=>5) --
#0 0x0000000012dd0890 in kqrLockPo ()
#1 0x0000000012dce457 in kqrpre ()
#2 0x00000000030b573d in kkdlGetRowCacheOb ()
Breakpoint 1, 0x00000000130b0b40 in kgllkal ()
S-2 ===== LibCache Seq: 1 -- kgllkal (kgllkhdl Addr(rdx)=>9AEBB010, Mode(rcx)=>3), kglnaobj_and_schema: TEST_TABK --
#0 0x00000000130b0b40 in kgllkal ()
#1 0x00000000130ac291 in kglLock ()
#2 0x00000000130a70c5 in kglget ()
Breakpoint 1, 0x00000000130b0b40 in kgllkal ()
S-3 ===== LibCache Seq: 2 -- kgllkal (kgllkhdl Addr(rdx)=>9AEBB010, Mode(rcx)=>1), kglnaobj_and_schema: TEST_TABK --
#0 0x00000000130b0b40 in kgllkal ()
#1 0x00000000130ac291 in kglLock ()
#2 0x00000000130a70c5 in kglget ()
Breakpoint 2, 0x00000000130b34d0 in kgllkdl ()
S-4 ===== LibCache Seq: 3 -- kgllkdl (kgllkhdl Addr(r8)=>9AEBB010) --
#0 0x00000000130b34d0 in kgllkdl ()
#1 0x00000000130ae53c in kglUnLock ()
#2 0x0000000012bbaad1 in ktagetg0 ()
Breakpoint 5, 0x0000000012dd5b40 in kqrReleaseLock ()
S-5 ===== RowCache Seq: 2 -- kqrReleaseLock (Addr(rdx)=>9AE73170) --
#0 0x0000000012dd5b40 in kqrReleaseLock ()
#1 0x0000000012dd4e9a in kqreqd ()
#2 0x00000000128efaaa in kss_del_cb ()
4.4 Query GDB
We run Query with gdb trace, here the output:
06:57:32 SQL > exec test_proc_sel_tab(1);
Breakpoint 1, 0x00000000130b0b40 in kgllkal ()
S-1 ===== LibCache Seq: 1 -- kgllkal (kgllkhdl Addr(rdx)=>9AEBB010, Mode(rcx)=>2), kglnaobj_and_schema: TEST_TABK --
#0 0x00000000130b0b40 in kgllkal ()
#1 0x00000000130ac291 in kglLock ()
#2 0x00000000130a70c5 in kglget ()
Breakpoint 4, 0x0000000012dd0890 in kqrLockPo ()
S-2 ===== RowCache Seq: 1 -- kqrLockPo (Addr(rsi)=>9AE73170, Request Mode(rcx)=>3) --
#0 0x0000000012dd0890 in kqrLockPo ()
#1 0x0000000012dce457 in kqrpre ()
#2 0x00000000030bf8ab in kkdlgstcd ()
Breakpoint 5, 0x0000000012dd5b40 in kqrReleaseLock ()
S-3 ===== RowCache Seq: 2 -- kqrReleaseLock (Addr(rdx)=>9AE73170) --
#0 0x0000000012dd5b40 in kqrReleaseLock ()
#1 0x0000000012dd4e9a in kqreqd ()
#2 0x0000000012dd4833 in kqrprl ()
Breakpoint 4, 0x0000000012dd0890 in kqrLockPo ()
S-4 ===== RowCache Seq: 3 -- kqrLockPo (Addr(rsi)=>9AE73170, Request Mode(rcx)=>3) --
#0 0x0000000012dd0890 in kqrLockPo ()
#1 0x0000000012dd7b22 in kqrpre1 ()
#2 0x00000000032d0fdf in kqlobjlod ()
Breakpoint 5, 0x0000000012dd5b40 in kqrReleaseLock ()
S-5 ===== RowCache Seq: 4 -- kqrReleaseLock (Addr(rdx)=>9AE73170) --
#0 0x0000000012dd5b40 in kqrReleaseLock ()
#1 0x0000000012dd4e9a in kqreqd ()
#2 0x0000000012dd4833 in kqrprl ()
Breakpoint 3, 0x00000000130ae460 in kglUnLock ()
S-6 ===== LibCache Seq: 2 -- kglUnLock (kgllkhdl Addr(r8)=>9AEBB010) --
#0 0x00000000130ae460 in kglUnLock ()
#1 0x00000000030aaf05 in kkdcicf ()
#2 0x0000000003094ab6 in kkmpost ()
Breakpoint 3, 0x00000000130ae460 in kglUnLock ()
S-7 ===== LibCache Seq: 3 -- kglUnLock (kgllkhdl Addr(r8)=>9AEBB010) --
#0 0x00000000130ae460 in kglUnLock ()
#1 0x00000000030aaf05 in kkdcicf ()
#2 0x0000000003094ab6 in kkmpost ()
5. Root Cacuse Analysis
Look gdb lock request order of both sessions:
(a). ALTER CLUSTER GDB shows:
S-1 ===== kqrLockPo () 'row cache lock' reqeust on TEST_CLUSTER
S-2 ===== kgllkal () 'library cache lock' allocate on TEST_TAB
S-3 ===== kgllkal () 'library cache lock' delete on TEST_TAB
S-4 ===== kgllkdl () 'library cache lock' delete on TEST_TAB
S-5 ===== kqrReleaseLock () 'row cache lock' release on TEST_CLUSTER
(b). Query GDB shows
S-1 ===== kgllkal () 'library cache lock' allocate on TEST_TAB
S-2 ===== kqrLockPo () 'row cache lock' reqeust on TEST_CLUSTER
S-3 ===== kqrReleaseLock () 'row cache lock' release on TEST_CLUSTER
S-4 ===== kqrLockPo () 'row cache lock' reqeust on TEST_CLUSTER
S-5 ===== kqrReleaseLock () 'row cache lock' release on TEST_CLUSTER
S-6 ===== kglUnLock () 'library cache lock' unlock on TEST_TAB
S-7 ===== kglUnLock () 'library cache lock' unlock on TEST_TAB
ALTER CLUSTER GDB shows that 'library cache lock' (from allocate till delete) is completely held within 'row cache lock' holding.(hence both locks can be held at the same time, and 'row cache lock' can be held longer time)
Query GDB shows that 'row cache lock' (from request till release) is completely held within 'library cache lock' holding.
(hence both locks can be held at the same time, and 'library cache lock' can be held longer time)
S1 and S2 in ALTER CLUSTER GDB is reverse order of S1 and S2 in Query GDB.
Therefore the deadlock is caused by the crossing order of 'row cache lock' and 'library cache lock' reqeusts.
There are also similar cases of deadlocks documented in:
Oracle MOS: Deadlock on Row Cache Lock and Library Cache Lock Dropping REF-PARTITION (Doc ID 2796372.1)
Deadlock Caused by Row Cache and Library Cache Locks (https://nenadnoveljic.com/blog/deadlock-row-cache-lock-library-cache-lock/)