Friday, July 9, 2021

Oracle Program and Table dc_objects "row cache lock"

In previous Blog: One "row cache lock" Test Case, we discussed dc_users "row cache lock".

In this Blog, we will look two types of dc_objects "row cache lock", one is Plsql program, another is table/partitions.

Note: Tested in Oracle 19c.


1. Test Setup


At first, we create two list partitioned tables, and Plsql procedures to add and drop partitions.

drop table test_tab_1 purge;
drop table test_tab_2 purge;

create table test_tab_1 (part_id number, val number) partition by list (part_id) (partition p_0 values (0));

create table test_tab_2 (part_id number, val number) partition by list (part_id) (partition p_0 values (0));

create or replace procedure add_part (p_tab_name varchar2, p_part_id number) as
  l_part_name  varchar2(10);
begin
  select decode(sign(p_part_id) , 1, 'P_', 'N_')||abs(p_part_id) into l_part_name from dual;
  execute immediate 'alter table '||p_tab_name||' add partition '||l_part_name||' values('||p_part_id||')';
  execute immediate 'insert into '||p_tab_name||' values ('||p_part_id||','||p_part_id||')';
  commit;
end;
/

create or replace procedure drop_part(p_tab_name varchar2, p_part_id number) as
  l_part_name  varchar2(10);
begin
  select decode(sign(p_part_id) , 1, 'P_', 'N_')||abs(p_part_id) into l_part_name from dual;
	execute immediate 'alter table '||p_tab_name||' drop partition '||l_part_name;
end;
/

-- exec add_part('TEST_TAB_1', 1);
-- exec drop_part('TEST_TAB_1', 1);
And one procedure to loop add and drop partitions:

create or replace procedure add_drop_part_loop(p_tab_name varchar2, p_cnt number, p_dur number, p_sign number := 1) as
	l_start   number := dbms_utility.get_time;
	l_run_cnt number := 0;
begin
  -- cleanup
	for c in (select * from dba_tab_partitions 
	           where table_name = p_tab_name and partition_name != 'P_0' and partition_name like (decode(p_sign, 1, 'P', 'N')||'_%')) 
	loop
	  drop_part(c.table_name, p_sign*substr(c.partition_name, 3));
	end loop;
	
  l_start  := dbms_utility.get_time;
  while (dbms_utility.get_time - l_start) < p_dur*100 loop
	  for i in 1..p_cnt loop
	    add_part(p_tab_name, p_sign*i);
	  end loop;
	  
	  for i in 1..p_cnt loop
	    drop_part(p_tab_name, p_sign*i);
	  end loop;
	  
	  l_run_cnt := l_run_cnt + 1;
	end loop;
	dbms_output.put_line('Table='||p_tab_name||', run_cnt='||l_run_cnt||', elapsed='||round((dbms_utility.get_time-l_start)/100));
end;
/


2. Program dc_objects "row cache lock"


We create a small Plsql program that depends on both above created tables.

create or replace procedure test_tab_bind as
  l_cnt number;
begin
  select count(*) into l_cnt from TEST_TAB_1, TEST_TAB_2; 
end;
/
Open two Sqlplus sessions (SID-1: 186, SID-2: 550), and run following test code for a duration of 300 seconds.
     In SID-1, we add_drop 'TEST_TAB_1';
     in SID-2, we add_drop 'TEST_TAB_2'.

-- SID-1: 186, add_drop 'TEST_TAB_1'
select sid, event, total_waits, time_waited, average_wait, max_wait from v$session_event 
where event = 'row cache lock' and sid = (select sid from v$mystat where rownum=1);

exec add_drop_part_loop('TEST_TAB_1', 100, 300);

select sid, event, total_waits, time_waited, average_wait, max_wait from v$session_event 
where event = 'row cache lock' and sid = (select sid from v$mystat where rownum=1);


--SID-2: 550, add_drop 'TEST_TAB_2'
select sid, event, total_waits, time_waited, average_wait, max_wait from v$session_event 
where event = 'row cache lock' and sid = (select sid from v$mystat where rownum=1);

exec add_drop_part_loop('TEST_TAB_2', 100, 300);

select sid, event, total_waits, time_waited, average_wait, max_wait from v$session_event 
where event = 'row cache lock' and sid = (select sid from v$mystat where rownum=1);
Here the test output:

-- SID-1: 186
select sid, event, total_waits, time_waited, average_wait, max_wait from v$session_event ...

  no rows selected

exec add_drop_part_loop('TEST_TAB_1', 100, 300);

  Table=TEST_TAB_1, run_cnt=79, elapsed=300

select sid, event, total_waits, time_waited, average_wait, max_wait from v$session_event ...   

  SID EVENT          TOTAL_WAITS TIME_WAITED AVERAGE_WAIT   MAX_WAIT
  --- -------------- ----------- ----------- ------------ ----------
  186 row cache lock        4143        3509          .85          3


--SID-2: 550
select sid, event, total_waits, time_waited, average_wait, max_wait from v$session_event ...

  no rows selected

exec add_drop_part_loop('TEST_TAB_2', 100, 300);

  Table=TEST_TAB_2, run_cnt=76, elapsed=300

select sid, event, total_waits, time_waited, average_wait, max_wait from v$session_event ...

  SID EVENT          TOTAL_WAITS TIME_WAITED AVERAGE_WAIT   MAX_WAIT
  --- -------------- ----------- ----------- ------------ ----------
  550 row cache lock        4142        3805          .92          3
We can see that there are about 4000 'row cache lock' concurrency waits in each session.

During the test, we can run query below to show the contention on 'dc_objects'
(cache_id: P1=8, Mode held: P2=0 (null), Mode request : P3=5 (exclusive))
(see MOS Docu: WAITEVENT: "row cache lock" Reference Note (Doc ID 34609.1)):

select chain_signature, sid, pid, osid, blocker_sid, blocker_is_valid, p1, p2, p3 from v$wait_chains;

  CHAIN_SIGNATURE    SID  PID OSID   BLOCKER_SID BLOCK   P1   P2   P3
  ---------------- ----- ---- ------ ----------- ----- ---- ---- ----
  'row cache lock'   186   49 11421          550 TRUE     8    0    5
  <not in a wait>    550   57 11575              FALSE
If we check DIAG dia0 or mmnl trace files, we can see the blocked session, blocker session, and the requested row cache parent address (po 0x9c79f958) in mode X. (we will give a further look in Section 4: dc_objects Internals).

--testdb_dia0_10885_base_1.trc
--testdb_mmnl_10915.trc

*** 2021-06-25T11:37:49.281348+02:00
kqrhngc ph1: po 0x9c79f958 req X from session 0xb8d1d508 lock 0x9cbfb5e8
kqrhngc ph1: blocker #1 owner  mode X session 0xb8990bb8 lock 0x9d32f7e0
With following query, we can find the contention dc_object: TEST_TAB_BIND (see Blog: Oracle ROWCACHE Views and Contents ):

select to_number(ltrim((substr(key, 7, 2)||substr(key, 5, 2)||substr(key, 3, 2)||substr(key, 1, 2)), '0'), 'XXXX') user_id,
       (select username from dba_users where user_id = 
          to_number(ltrim((substr(key, 7, 2)||substr(key, 5, 2)||substr(key, 3, 2)||substr(key, 1, 2)), '0'), 'XXXX')) user_name,
       dump_hex2str(rtrim(substr(key, 13), '0')) dc_object_name, 
       to_number(trim(both '0' from substr(key, 11, 2)||substr(key, 9, 2)), 'XXXX') key_len,
       indx, hash, address, cache#, cache_name, existent, lock_mode, lock_request, txn, saddr --,v.* 
from v$rowcache_parent v
where cache_name in ('dc_objects') and address like upper('%9c79f958');

  USER_ID USER_NAME  DC_OBJECT_NAME  KEY_LEN  INDX  HASH ADDRESS  CACHE# CACHE_NAME E LOCK_MODE LOCK_REQUEST TXN      SADDR
  ------- ---------- --------------- ------- ----- ----- -------- ------ ---------- - --------- ------------ -------- --------
       49 K          TEST_TAB_BIND        13  7196  5950 9C79F958      8 dc_objects Y         0            5 AE97BC88 B8D1D508
       49 K          TEST_TAB_BIND        13  7197  5950 9C79F958      8 dc_objects Y         5            0 AE97BC88 B8990BB8
       49 K          TEST_TAB_BIND        13 11549 27359 9C79F958      8 dc_objects Y         5            0 AE92EF30 B8D1D508
We can also list the involved sessions and (recursive) transactions:
(Note:
   gv$transaction is defined on x$ktcxb with filter: bitand (ksspaflg, 1) != 0 and bitand (ktcxbflg, 2) != 0.
   We use x$ktcxb since they are recursive transactions, which are filtered out in gv$transaction)

select sid, dc_object_name, r.indx, hash, address, cache#, cache_name, existent, lock_mode, lock_request, 
       txn, r.saddr, s.blocking_session --, s.*, t.* 
from (
  select to_number(ltrim((substr(key, 7, 2)||substr(key, 5, 2)||substr(key, 3, 2)||substr(key, 1, 2)), '0'), 'XXXX') user_id,
         (select username from dba_users where user_id = 
            to_number(ltrim((substr(key, 7, 2)||substr(key, 5, 2)||substr(key, 3, 2)||substr(key, 1, 2)), '0'), 'XXXX')) user_name,
         dump_hex2str(rtrim(substr(key, 13), '0')) dc_object_name, 
         to_number(trim(both '0' from substr(key, 11, 2)||substr(key, 9, 2)), 'XXXX') key_len, v.* 
  from v$rowcache_parent v
  where cache_name in ('dc_objects') and address like upper('%9c79f958')
  order by key) r, gv$session s, x$ktcxb t
where r.saddr = s.saddr(+) and r.txn = t.ktcxbxba(+);

  SID DC_OBJECT_NAME INDX  HASH  ADDRESS  CACHE# CACHE_NAME E LOCK_MODE LOCK_REQUEST TXN      SADDR    BLOCKING_SESSION
  ---- ------------- ----- ----- -------- ------ ---------- - --------- ------------ -------- -------- ----------------
   186 TEST_TAB_BIND  7291  5950 9C79F958      8 dc_objects Y         0            5 AEAB4380 B8990BB8              550
   186 TEST_TAB_BIND 11641 27359 9C79F958      8 dc_objects Y         5            0 AE7F2248 B8990BB8              550
   550 TEST_TAB_BIND  7292  5950 9C79F958      8 dc_objects Y         5            0 AEAB4380 B8D1D508
We can see that the contention of dc_objects 'row cache lock' is on 'TEST_TAB_BIND' because it binds TEST_TAB_1 and TEST_TAB_2 together.

If we invalidate 'TEST_TAB_BIND' by "dbms_utility.invalidate" and repeat above test, no more 'row cache lock' contention can be observed (Probably this can be used as a quick workaround to fix such contention).

select object_name, object_id, status from dba_objects where object_name = 'TEST_TAB_BIND';

  OBJECT_NAME    OBJECT_ID  STATUS
  ------------- ---------- -------
  TEST_TAB_BIND    4509953  VALID

exec sys.dbms_utility.invalidate(p_object_id =>4509953);  

--to validate:
--alter procedure TEST_TAB_BIND compile;

select object_name, object_id, status from dba_objects where object_name = 'TEST_TAB_BIND';

  OBJECT_NAME    OBJECT_ID  STATUS
  ------------- ---------- --------
  TEST_TAB_BIND    4509953  INVALID
In contrast to "dbms_utility.invalidate", if we created an INVALID Plsql program as follows and repeat above test, there still exists the same 'row cache lock' contention.

create or replace procedure test_tab_bind as
  l_cnt number;
begin
  -- xxxx_cnt is wrong name
  select count(*) into xxxx_cnt from TEST_TAB_1, TEST_TAB_2;  
end;
/

  Warning: Procedure created with compilation errors.
  
  5/24     PLS-00201: identifier 'XXXX_CNT' must be declared
  5/33     PL/SQL: ORA-00904: : invalid identifier

select object_name, object_id, status from dba_objects where object_name = 'TEST_TAB_BIND';

  OBJECT_NAME    OBJECT_ID  STATUS
  ------------- ---------- --------
  TEST_TAB_BIND    4509953  INVALID
After test, we re-create our VALID Plsql program:

create or replace procedure test_tab_bind as
  l_cnt number;
begin
  select count(*) into l_cnt from TEST_TAB_1, TEST_TAB_2; 
end;
/

select object_name, object_id, status from dba_objects where object_name = 'TEST_TAB_BIND';

  OBJECT_NAME    OBJECT_ID  STATUS
  ------------- ---------- -------
  TEST_TAB_BIND    4509953  VALID


3. Table and Partition dc_objects "row cache lock"


In last section, we demonstrated dc_objects "row cache lock" on Plsql program.
In this section, we will look dc_objects "row cache lock" on table/partition.

In two above Sqlplus sessions (SID-1: 186, SID-2: 550), run following test code for a duration of 300 seconds.
     In SID-1, we add_drop 'TEST_TAB_1' partitions prefixed by 'P_';
     in SID-2, we add_drop 'TEST_TAB_1' partitions prefixed by 'N_'.

-- SID-1: 186, add_drop 'TEST_TAB_1' partitions prefixed by 'P_'
select sid, event, total_waits, time_waited, average_wait, max_wait from v$session_event 
where event = 'row cache lock' and sid = (select sid from v$mystat where rownum=1);

-- Create/Drop partitions with name prefixed 'P_'
exec add_drop_part_loop('TEST_TAB_1', 100, 300, p_sign=>1);

select sid, event, total_waits, time_waited, average_wait, max_wait from v$session_event 
where event = 'row cache lock' and sid = (select sid from v$mystat where rownum=1);
       

-- SID-2: 550, add_drop 'TEST_TAB_1' partitions prefixed by 'N_'.    
select sid, event, total_waits, time_waited, average_wait, max_wait from v$session_event 
where event = 'row cache lock' and sid = (select sid from v$mystat where rownum=1);

-- Create/Drop partitions with name prefixed 'N_'
exec add_drop_part_loop('TEST_TAB_1', 100, 300, p_sign=>-1);

select sid, event, total_waits, time_waited, average_wait, max_wait from v$session_event 
where event = 'row cache lock' and sid = (select sid from v$mystat where rownum=1);
Here the test output. It shows that each session has about 500 'row cache lock' concurrency waits (5780-5217=563, 5751-5218=533).

-- SID-1: 186
select sid, event, total_waits, time_waited, average_wait, max_wait from v$session_event ...

  SID EVENT          TOTAL_WAITS TIME_WAITED AVERAGE_WAIT   MAX_WAIT
  --- -------------- ----------- ----------- ------------ ----------
  186 row cache lock        5217        5563         1.07          4

exec add_drop_part_loop('TEST_TAB_1', 100, 300, p_sign=>1);

  Table=TEST_TAB_1, run_cnt=23, elapsed=300

select sid, event, total_waits, time_waited, average_wait, max_wait from v$session_event ...

  SID EVENT          TOTAL_WAITS TIME_WAITED AVERAGE_WAIT   MAX_WAIT
  --- -------------- ----------- ----------- ------------ ----------
  186 row cache lock        5780        7246         1.25          7
       
   
-- SID-2: 500           
select sid, event, total_waits, time_waited, average_wait, max_wait from v$session_event ...

  SID EVENT          TOTAL_WAITS TIME_WAITED AVERAGE_WAIT   MAX_WAIT
  --- -------------- ----------- ----------- ------------ ----------
  550 row cache lock        5218        5804         1.11          4

exec add_drop_part_loop('TEST_TAB_1', 100, 300, p_sign=>-1);

  Table=TEST_TAB_1, run_cnt=22, elapsed=304
   
select sid, event, total_waits, time_waited, average_wait, max_wait from v$session_event ...

  SID EVENT          TOTAL_WAITS TIME_WAITED AVERAGE_WAIT   MAX_WAIT
  --- -------------- ----------- ----------- ------------ ----------
  550 row cache lock        5751        7334         1.28          6
During the test, we run query below to show the contention on 'dc_objects'
(cache_id: P1=8, Mode held: P2=0 (null), Mode request : P3=3 (share mode)).

Comparing to above P3=5 (exclusive) in case of dc_object Plsql program, dc_object table/partition request is blocked with P3=3 (share mode). (we will give a further look in Section 4: dc_objects Internals).

select chain_signature, sid, pid, osid, blocker_sid, blocker_is_valid, p1, p2, p3 from v$wait_chains;

  CHAIN_SIGNATURE                   SID PID OSID   BLOCKER_SID BLOCK  P1  P2  P3
  --------------------------------- --- --- ------ ----------- ----- --- --- ---
  <not in a wait><='row cache lock' 550  57 11575          186 TRUE    8   0   3
  <not in a wait><='row cache lock' 186  49 11421              FALSE
If we check DIAG dia0 or mmnl trace files, we can see the blocked session, blocker session. The blocked session requested row cache parent address (po 0x9c971fa8) in mode S, and blocked by mode X, held by blocker session.
      
--testdb_dia0_10885_base_1.trc
--testdb_mmnl_10915.trc

*** 2021-06-25T12:07:10.296235+02:00
kqrhngc ph1: po 0x9c971fa8 req S from session 0xb8990bb8 lock 0x9cfb6d50
kqrhngc ph1: blocker #1 owner  mode X session 0xb8d1d508 lock 0x9cbfd7b8
Similar to above Plsql program, we can also list the sessions and (recursive) transactions with po 0x9c971fa8:

select sid, dc_object_name, r.indx, hash, address, cache#, cache_name, existent, lock_mode, lock_request, 
       txn, r.saddr, s.blocking_session --, s.*, t.* 
from (
  select to_number(ltrim((substr(key, 7, 2)||substr(key, 5, 2)||substr(key, 3, 2)||substr(key, 1, 2)), '0'), 'XXXX') user_id,
         (select username from dba_users where user_id = 
            to_number(ltrim((substr(key, 7, 2)||substr(key, 5, 2)||substr(key, 3, 2)||substr(key, 1, 2)), '0'), 'XXXX')) user_name,
         dump_hex2str(rtrim(substr(key, 13), '0')) dc_object_name, 
         to_number(trim(both '0' from substr(key, 11, 2)||substr(key, 9, 2)), 'XXXX') key_len, v.* 
  from v$rowcache_parent v
  where cache_name in ('dc_objects') and address like upper('%9c971fa8')
  order by key) r, gv$session s, x$ktcxb t
where r.saddr = s.saddr(+) and r.txn = t.ktcxbxba(+);

  SID DC_OBJECT_NAME  INDX  HASH ADDRESS  CACHE# CACHE_NAME E LOCK_MODE LOCK_REQUEST TXN      SADDR    BLOCKING_SESSION
  --- --------------- ---- ----- -------- ------ ---------- - --------- ------------ -------- -------- ----------------
  186 TEST_TAB_1      5267 19145 9C971FA8      8 dc_objects Y         0            3 AEB04B20 B8990BB8              550
  550 TEST_TAB_1      8110 22657 9C971FA8      8 dc_objects Y         5            0 AEB04B20 B8D1D508
  186 TEST_TAB_1      8109 22657 9C971FA8      8 dc_objects Y         0            3 AEB04B20 B8990BB8              550
  550 TEST_TAB_1      5268 19145 9C971FA8      8 dc_objects Y         5            0 AEB04B20 B8D1D508


4. dc_objects Internals


To further understand 'row cache lock' contentions, we will make a blocking test with breakpoint, trace session activities, and dump Row Cache Data.


4.1 Blocking Test


At first, add one partition into 'TEST_TAB_1', and one into 'TEST_TAB_2':

exec add_part('TEST_TAB_1', 1);
exec add_part('TEST_TAB_2', 1);
Continue with two above Sqlplus sessions (SID-1: 186, SID-2: 550).

In SID-1 (ospid: 11421), set a breakpoint at retq of kqrpre1 with condition of above Plsql program dc_object (TEST_TAB_BIND: po 0x9c79f958):

gdb -p 11421

# break *kqrpre1 if $rdi==0x8 && $rcx==0x5
# <+3564>:	retq  
break *kqrpre1+3564 if $rdx==0x9c79f958
In SID-1, drop one partition of 'TEST_TAB_1':

-- SID-1
exec drop_part('TEST_TAB_1', 1);
It stopped with following frame info:

(gdb) c
Continuing.

Breakpoint 1, 0x00000000037ac6bc in kqrpre1 ()

(gdb) display /x $rdx
1: /x $rdx = 0x9c79f958

(gdb) bt 6
#0  0x00000000037ac6bc in kqrpre1 ()
#1  0x00000000032e95d6 in kqlidp0_int ()
#2  0x00000000032e7c19 in kqlidp0 ()
#3  0x0000000002631424 in atbFMdrop ()
#4  0x000000000261976a in atbdrv ()
#5  0x00000000127b19b0 in opiexe ()
(More stack frames follow...)
In SID-2, drop one partition of 'TEST_TAB_2':

-- SID-2
exec drop_part('TEST_TAB_2', 1);
Here the output of v$wait_chains (Mode request: P3=5 (exclusive)):

select chain_signature, sid, pid, osid, blocker_sid, blocker_is_valid, p1, p2, p3 from v$wait_chains;

  CHAIN_SIGNATURE                   SID PID OSID  BLOCKER_SID BLOCK  P1  P2  P3
  --------------------------------- --- --- ----- ----------- ----- --- --- ---
  <not in a wait><='row cache lock' 550  57 11575         186 TRUE    8   0   5
  <not in a wait><='row cache lock' 186  49 11421             FALSE
Some new lines appear in dia0 and mmnl trace files:

--testdb_dia0_10885_base_1.trc
--testdb_mmnl_10915.trc

*** 2021-06-25T11:37:49.281348+02:00
kqrhngc ph1: po 0x9c79f958 req X from session 0xb8d1d508 lock 0x9cbfb5e8
kqrhngc ph1: blocker #1 owner  mode X session 0xb8990bb8 lock 0x9d32f7e0
We can also list the contention details with the sessions and (recursive) transactions on po 0x9c79f958:

select sid, dc_object_name, r.indx, hash, address, cache#, cache_name, existent, lock_mode, lock_request, 
       txn, r.saddr, s.blocking_session --, s.*, t.* 
from (
  select to_number(ltrim((substr(key, 7, 2)||substr(key, 5, 2)||substr(key, 3, 2)||substr(key, 1, 2)), '0'), 'XXXX') user_id,
         (select username from dba_users where user_id = 
            to_number(ltrim((substr(key, 7, 2)||substr(key, 5, 2)||substr(key, 3, 2)||substr(key, 1, 2)), '0'), 'XXXX')) user_name,
         dump_hex2str(rtrim(substr(key, 13), '0')) dc_object_name, 
         to_number(trim(both '0' from substr(key, 11, 2)||substr(key, 9, 2)), 'XXXX') key_len, v.* 
  from v$rowcache_parent v
  where cache_name in ('dc_objects') and address like upper('%9c79f958')
  order by key) r, gv$session s, x$ktcxb t
where r.saddr = s.saddr(+) and r.txn = t.ktcxbxba(+);

  SID DC_OBJECT_NAME  INDX  HASH ADDRESS  CACHE# CACHE_NAME E LOCK_MODE LOCK_REQUEST TXN      SADDR    BLOCKING_SESSION
  --- -------------- ----- ----- -------- ------ ---------- - --------- ------------ -------- -------- ----------------
  550 TEST_TAB_BIND   6779  5950 9C79F958      8 dc_objects Y         0            5 AE92C090 B8D1D508              186
  186 TEST_TAB_BIND  28027 27359 9C79F958      8 dc_objects Y         5            0 AE92C090 B8990BB8
  550 TEST_TAB_BIND  28026 27359 9C79F958      8 dc_objects Y         0            5 AE92C090 B8D1D508              186
  186 TEST_TAB_BIND   6780  5950 9C79F958      8 dc_objects Y         5            0 AE92C090 B8990BB8
dia0 also shows blocking chain and call stack:

--testdb_dia0_10885_base_1.trc

*** 2021-06-25T16:19:35.806182+02:00
HM: Session with ID 550 serial # 22155 (FG) on single instance 1 is hung
    and is waiting on 'row cache lock' for 334 seconds.
    Final Blocker is Session ID 186 serial# 60254 on instance 1
     which is 'not in a wait' for 586 seconds
     
-------------------------------------------------------------------------------
Chain 1:
-------------------------------------------------------------------------------
    Oracle session identified by:
    {
              session id: 550
    }
    is waiting for 'row cache lock' with wait info:
    {
                      p1: 'cache id'=0x8
                      p2: 'mode'=0x0
                      p3: 'request'=0x5
            time in wait: 1.370457 sec (last interval)
            time in wait: 5 min 34 sec (total)  
    }
    and is blocked by
 => Oracle session identified by:
    {
              session id: 186
    }
              
HM: Short Stack of immediate waiter session ID 550 serial# 22155, OSPID 11575 (FG)
Short stack dump: 
ksedsts()+426<-ksdxfstk()+58<-ksdxcb()+872<-sspuser()+200<-__sighandler()<-semtimedop()+10<-skgpwwait()+192<-ksliwat()+2192<-kslwaitctx()+200
<-kqrget()+1397<-kqrLockPo()+1644<-kqrpre1()+2373<-kqlidp0_int()+5990<-kqlidp0()+521<-atbFMdrop()+692<-atbdrv()+7722<-opiexe()+30672
 
HM: current SQL: alter table TEST_TAB_2 drop partition P_1
After the test, quit debugger to complete both partition dropping.


4.2 Trace Files


We will add one partition into 'TEST_TAB_1', and then trace the activities of partition drop: (see Blog: Oracle row cache objects Event: 10222, Dtrace Scripts (I) )

Add one partition:

exec add_part('TEST_TAB_1', 1);
Then drop it with tracing:

alter session set max_dump_file_size = UNLIMITED;
alter session set events='10046 trace name context forever, level 12 : 
                          10704 trace name context forever, level 3 : 
                          10222 trace name context forever, level 4294967295' 
                  tracefile_identifier='rcl_1';

exec drop_part('TEST_TAB_1', 1);

alter session set events='10046 trace name context off : 10704 trace name context off : 10222 trace name context off'; 
In the first part of trace file, we can see that "name=TEST_TAB_1" (po 0x9c971fa8) is gotten in "mode=S". That is probably why P3=3 (share mode) in case of table/partition 'row cache lock' contention. The flag: "transaction=(nil)" also shows that there is no transaction in this part:

KQR: cid  8 bucket  19145 marked HOT
kqrpre: start hash=7c144ac9 mode=S keyIndex=0 dur=CALL opt=FALSE hot=TRUE
kqrScan: hot hash chain: processing po 0x9c971fa8
kqrpre: found cached object po=0x9c971fa8 flg=2
kqrmpin : kqrpre: found po Pin po 0x9c971fa8 cid=8 flg=2 hash=7c144ac9
time=2774049915
kqrpre: pinned po=0x9c971fa8 flg=2 pso=0xa0294020
pinned stack po=0x9c971fa8 cid=8: 
----- Abridged Call Stack Trace -----
ksedsts()+426<-kqrpre()+1985<-kkdlgonm()+229<-ktfa_check_ddl_fda_tables()+729<-opiSem()+23554<-opiprs()+321<-kksParseChildCursor()+527
<-rpiswu2()+2004<-kksLoadChild()+5283<-kxsGetRuntimeLock()+1983<-kksfbc()+18656<-kkspsc0()+1566<-kksParseCursor()+114<-opiosq0()+2310
----- End of Abridged Call Stack Trace -----
Partial short call stack signature: 0x4d19b7541a515b26
kqrAllocateEnqueue: bis: po=0x9c971fa8 flag=00010000 before=00000002 after=00010002
kqrget bic: po=0x9c971fa8 flag=00010000 before=00010002 after=00000002
kqrpre: done po=0x9c971fa8 cid=8 flg=2 eq=0x9cfbbb88 pso=0xa0294020 dur=CALL
kqrpre: keyIndex=0 hash=7c144ac9 99535881 0
kqrpre: obobn=4547616 obname=TEST_TAB_1 obtyp=2 obsta=1 obflg=0
kqrpre: SQL=alter table TEST_TAB_1 drop partition P_1 kqrpre: time=2774053161
kqrpre: po=0x9c971fa8 cid=8 stack: 
----- Abridged Call Stack Trace -----
ksedsts()+426<-kqrpreDoneTrace()+613<-kqrpre()+2954<-kkdlgonm()+229<-ktfa_check_ddl_fda_tables()+729<-opiSem()+23554<-opiprs()+321<-
----- End of Abridged Call Stack Trace -----
kqrpre: returnVal: TRUE
kqrprl: eq=0x9cfbbb88 fpInUse=FALSE

kqreqd: eq=0x9cfbbb88 res=0x9c972078
----------------------------------------
row cache enqueue: count=0 session=0xb8990bb8 object=0x9c971fa8, mode=S
flag=00 -/-/-/-/-/-/-/- savepoint=0x91da37
row cache parent object: addr=0x9c971fa8 cid=8(dc_objects)
hash=7c144ac9 typ=1 transaction=(nil) flags=00000002
objectno=4547616 ownerid=49 nsp=1
name=TEST_TAB_1
own=0x9c972078[0x9cfbbc08,0x9cfbbc08] wat=0x9c972088[0x9c972088,0x9c972088] mode=S req=N
status=VALID/-/-/-/-/-/-/-/-/-/-/-  KGH pinned  
            [........]        
kqreqd: processing po=0x9c971fa8 flag=2 mode=S
kqrmupin : kqrReleaseLock Unpin po 0x9c971fa8 cid=8 flg=2 hash=7c144ac9 time=2774055412
kqreqd freed enqeue: eq: 0x9cfbbb88
time=2774055435
            [........] 
In the second part of trace file, we can see all three involved objects (partition, program, table):

  name=TEST_TAB_1subname=P_1
  name=TEST_TAB_BIND
  name=TEST_TAB_1
which are all gotten in "mode=X".

For table "name=TEST_TAB_1", in the first part, it is gotten in "mode=S", in the second part, it is gotten in "mode=X".

For Plsql program "name=TEST_TAB_BIND", it is only gotten once in this second part, that is "mode=X" . That is probably why (P3=5 (exclusive)) in case of Plsql program 'row cache lock' contention. The flag: "transaction=0xae92d7e0" shows that they are all involved in one (recursive) transaction.

Here some extract lines from trace file of 'dc_objects' and transaction call stacks (xctCommitTxn).
           
kqreqd: eq=0x9a64a500 res=0x9c8f2d40
----------------------------------------
SO: 0x9caf9990, type: row cache enqueues (111), map: 0x9a64a500
SOC: 0x9a64a500, type: row cache enqueues (111), map: 0x9caf9990
row cache enqueue: count=2 session=0xb8990bb8 object=0x9c8f2c70, mode=X
flag=00 -/-/-/-/-/-/-/- savepoint=0x91dd75
row cache parent object: addr=0x9c8f2c70 cid=8(dc_objects)
hash=28cd6d03 typ=1 transaction=0xae92d7e0 flags=00002001
objectno=4605248 ownerid=49 nsp=1
name=TEST_TAB_1subname=P_1
own=0x9c8f2d40[0x9a64a580,0x9a64a580] wat=0x9c8f2d50[0x9c8f2d50,0x9c8f2d50] mode=X req=N
status=EMPTY/-/-/-/-/-/-/-/NEEDS INV/-/-/-  KGH pinned  
            [........]        
kqreqd: processing po=0x9c8f2c70 flag=2001 mode=X
kqrInvalidateObject clear: ob=0x9c8f2c70 before=00002001 after=00000000
kqrmupin : kqrReleaseLock Unpin po 0x9c8f2c70 cid=8 flg=0 hash=28cd6d03 time=2779225491
kqreqd freed enqeue: eq: 0x9a64a500
time=2779225499


kqreqd: eq=0x9d32f938 res=0x9c79fa28
----------------------------------------
SO: 0xa6e6c660, type: row cache enqueues (111), map: 0x9d32f938
SOC: 0x9d32f938, type: row cache enqueues (111), map: 0xa6e6c660
row cache enqueue: count=1 session=0xb8990bb8 object=0x9c79f958, mode=X
flag=40 -/-/-/-/-/-/XLG/- savepoint=0x91da6e
row cache parent object: addr=0x9c79f958 cid=8(dc_objects)
hash=2526973e typ=1 transaction=0xae92d7e0 flags=00000002
objectno=4509953 ownerid=49 nsp=1
name=TEST_TAB_BIND
own=0x9c79fa28[0x9d32f9b8,0x9d32f9b8] wat=0x9c79fa38[0x9c79fa38,0x9c79fa38] mode=X req=N
status=VALID/-/-/-/-/-/-/-/-/-/-/-  KGH pinned  
            [........]        
kqreqd: processing po=0x9c79f958 flag=2 mode=X
kqrmupin : kqrReleaseLock Unpin po 0x9c79f958 cid=8 flg=2 hash=2526973e time=2779225965
kqreqd freed enqeue: eq: 0x9d32f938
time=2779225973


kqreqd: eq=0x9cfb2428 res=0x9c972078
----------------------------------------
SO: 0xa6e79900, type: row cache enqueues (111), map: 0x9cfb2428
SOC: 0x9cfb2428, type: row cache enqueues (111), map: 0xa6e79900
row cache enqueue: count=6 session=0xb8990bb8 object=0x9c971fa8, mode=X
flag=00 -/-/-/-/-/-/-/- savepoint=0x91da49
row cache parent object: addr=0x9c971fa8 cid=8(dc_objects)
hash=7c144ac9 typ=1 transaction=0xae92d7e0 flags=00000002
objectno=4547616 ownerid=49 nsp=1
name=TEST_TAB_1
own=0x9c972078[0x9cfb24a8,0x9cfb24a8] wat=0x9c972088[0x9c972088,0x9c972088] mode=X req=N
status=VALID/-/-/-/-/-/-/-/-/-/-/-  KGH pinned  
            [........]        
kqreqd: processing po=0x9c971fa8 flag=2 mode=X
kqrmupin : kqrReleaseLock Unpin po 0x9c971fa8 cid=8 flg=2 hash=7c144ac9 time=2779226293
kqreqd freed enqeue: eq: 0x9cfb2428
time=2779226300


Abridged Call Stack Trace -----
ksedsts()+426<-kqrpre()+1985<-ktssexist_segment()+159<-ktssdro_segment()+317<-ktadrpc()+103
<-ktcccDeleteCommitCallbacks()+439<-ktcccdel()+46<-ktccpcmt()+310<-ktcCommitTxn_new()+1779
<-ktcCommitTxn()+94<-ktdcmt()+120<-k2lcom()+165<-k2send()+1220<-xctctl()+94
<-xctCommitTxn()+669<-opiexe()+19234
----- End of Abridged Call Stack Trace -----
In above trace, we can also see some lines like:

  kqrpre: optimistic lookup: hash=2546ac00
  kqrpre: optimistic lookup: searching cache 0 po 0xa22b6ac8 hash      226 ver 1
  kqrpre: optimistic lookup: searching cache 1 po 0xa22b6ac8 hash      226 ver 1
  ...
  kqrpre: optimistic lookup: searching cache 6 empty
  ...
  kqrpre: optimistic lookup: searching cache 9 po 0xa22b6ac8 hash      226 ver 1
  kqrpre: optimistic lookup: reading hash mtx addr=0xa85e7570 version=78
  kqrpre: optimistic lookup: processing po 0xa3def4a0 mtx addr=0xa3def640 version=82
  kqrpre: optimistic lookup: lock mode 0 flag 00000002
  kqrpre: optimistic lookup: fail
It can be toggled by a hidden parameter (18c changed default from FALSE to TRUE):

   _kqr_optimistic_reads	   optimistic reading of row cache objects	    TRUE


4.3. Row Cache Data Dump


With following command, we can dump 'dc_objects'.
(See MOS Bug 19354335 - Diagnostic enhancement for rowcache data dumps (Doc ID 19354335.8)
and Blog: Row Cache Object and Row Cache Mutex Case Study )
    
alter session set max_dump_file_size = UNLIMITED;
alter session set tracefile_identifier = 'dc_objects_dump';
-- dump level 0x82b: f is cache id 8 ('dc_objects'), 2 is single cacheiddump, b is level of 11
alter session set events 'immediate trace name row_cache level 0x82b';
alter session set events 'immediate trace name row_cache off';
The dump file consists of two parts, the first is ROW CACHE STATISTICS; and the second is 'dc_objects' data sorted by BUCKET:

ROW CACHE STATISTICS:
cache                          size     gets  misses  hit ratio  
--------------------------  -------  -------  ------  ---------  
dc_objects                     1056  3224531   49309      0.985  
  dc_object_grants               96     1673     308      0.845

dc_users                       2072  2877996     102      1.000  
  dc_users                      392        0       0      0.000
  dc_user_grants                376   353752      26      1.000
  dc_app_role                   320        0       0      0.000
  user's audit policies          64        0       0      0.000
  user's audit contexts         320        0       0      0.000
  CONTAINER_DATA_Attributes     584        0       0      0.000

WARNING: Restricting rowcache dump to just cacheid 8
ROW CACHE HASH TABLE: cid=8 ht=0xa8760de0 size=32768
Buckets with more than 20 objects:
NONE
Hash Chain Size     Number of Buckets
---------------     -----------------
              0                     0
              1                 11697
              2                  4810
              3                  1239
              4                   272
              5                    48
  ...
            >20                     0

BUCKET 5951:
  row cache parent object: addr=0x9c79f958 cid=8(dc_objects)
  hash=2526973e typ=1 transaction=(nil) flags=00000002
  objectno=4509953 ownerid=49 nsp=1
  name=TEST_TAB_BIND
  data=
  00000031 4554000d 545f5453 425f4241 00444e49 00000000 00000000 00000000 ...
  BUCKET 5951 total object count=1
             
BUCKET 19146:
  row cache parent object: addr=0x9c971fa8 cid=8(dc_objects)
  hash=7c144ac9 typ=1 transaction=(nil) flags=00000002
  objectno=4547616 ownerid=49 nsp=1
  name=TEST_TAB_1
  data=
  00000031 4554000a 545f5453 315f4241 00000000 00000000 00000000 00000000 ...
  BUCKET 19146 total object count=1
  
BUCKET 27908:
  row cache parent object: addr=0x9c8f2c70 cid=8(dc_objects)
  hash=28cd6d03 typ=1 transaction=(nil) flags=00000000
  objectno=4605248 ownerid=49 nsp=1
  name=TEST_TAB_1  subname=P_1
  data=
  00000031 4554000a 545f5453 315f4241 00000000 00000000 00000000 00000000 ...
  BUCKET 27908 total object count=1
In DB alert.log, we can see some buckets marked HOT:

2021-06-25T17:43:19.272935+02:00
KQR: cid 10 bucket  20941 marked HOT
KQR: cid  8 bucket  22482 marked HOT
KQR: cid  2 bucket  11264 marked HOT
which are probably checked by:

     kqrScan: reached _kqr_max_hot_copies limit %d
but "_kqr_max_hot_copies" is not (yet) exposed (maybe it will be similar to "_kgl_hot_object_copies").

Update 2023-Aug-23: "_kqr_max_hot_copies" is now exposed in Oracle 21c

   _kqr_max_hot_copies: upper limit of object hot copies allowed (default 16)
In DB alert.log, we can see:

2023-08-18T05:14:13.661259+02:00
  KQR row cache hit _kqr_max_hot_copies limit 16 cid 10 bucket 17930
2023-08-18T06:21:09.844089+02:00
  KQR row cache hit _kqr_max_hot_copies limit 16 cid 10 bucket 31900

2023-08-18T11:21:01.397443+02:00
  KQR row cache contention check: gets 1000000 sleeps 5577 limit 5000
  KQR row cache hash chain marked hot: cid 0 hash f27284a6
  KQR: cid  0 bucket      6 marked HOT (kqrpre: mutex sleeps exceed limit)
MOS "Bug 31135517 - High row cache mutex waits on hot objects - superseded (Doc ID 31135517.8)" describes that the fix implements 'hot object cloning' to alleviate row cache mutex contention.

In Trace Files, we can also see the 19.9 new introduced subroutine "kqrScan" (called by kqrpre or kqrpre2).