Friday, August 21, 2020

Row Cache Object and Row Cache Mutex Case Study

12cR2 introduced "row cache mutex" to replace previous "latch: row cache objects". In this Blog, we will choose 'dc_props' and 'dc_cdbprops' to test "row cache mutex". Both have a fixed number of objects, a fixed pattern to use mutex for each object access, so that test can be repeated, and results are identical for each run.

First we run queries and make dumps to expose row cache data structure and content. Then we perform 10222 event trace and gdb script to understand row cache data access and row cache mutex usage. Finally we make a few discussions on the test result.

Note: Tested in Oracle 19c on Linux.

Acknowledgement: This work results from the help and discussions with a longtime Oracle specialist.


1. Row Cache: 'dc_props' and 'dc_cdbprops'


Following query shows that dc_props has 85 objects, and dc_cdbprops has 6.

select cache#, type, subordinate#, parameter, count, usage, fixed, gets, fastgets 
from v$rowcache where parameter in ('dc_props', 'dc_cdbprops');

  CACHE# TYPE   SUBORDINATE# PARAMETER   COUNT USAGE FIXED    GETS FASTGETS
  ------ ------ ------------ ----------- ----- ----- ----- ------- --------
      15 PARENT              dc_props       85    85     0 1162164        0
      60 PARENT              dc_cdbprops     6     6     0    1127        0
One query on nls_database_parameters requires 88 row cache Gets. (GETS_DELTA should be 85, the difference of 3 is due to v$rowcache query. dc_props is the backbone of nls_database_parameters. See Blog: nls_database_parameters, dc_props, latch: row cache objects )

column gets new_value gets_old;
column fastgets new_value gets_oldf;
select sum(gets) gets, sum(fastgets) fastgets from v$rowcache where parameter = 'dc_props';

select value from nls_database_parameters where parameter = 'NLS_CHARACTERSET';

select (sum(gets)-&gets_old) gets_delta, (sum(fastgets)-&gets_oldf) gets_deltaf, sum(gets) gets, sum(fastgets) fastgets from v$rowcache where parameter = 'dc_props';

  GETS_DELTA  GETS_DELTAF        GETS    FASTGETS
  ----------  -----------  ----------  ----------
          88            0      138749           0    
By the way, Oracle has a hidden parameter to turn on FASTGETS when TRUE, and 19c changed the default value from FALSE to TRUE.
    _kqr_optimistic_reads
Description: optimistic reading of row cache objects
Default: FALSE (Oracle 12.1); TRUE (Oracle 19.6)
Change: alter system set "_kqr_optimistic_reads"=false; (require DB restart)
One more query reveals that 85 'dc_props' objects are distributed into 5 hash buckets with size varied from 1 to 6 (no hash_bucket_size 5). Most of objects are hashed into bucket size of 1 or 2. Later we will see the same hash mapping in row_cache dump.

select Hash_Bucket_Size, count(*) Number_of_Buckets, (Hash_Bucket_Size * count(*)) Number_of_RowCache_Objects from 
  (select (hash+1) Hash_Bucket, count(*) Hash_Bucket_Size from v$rowcache_parent where cache_name = 'dc_props' group by hash)
group by Hash_Bucket_Size order by 2 desc;

  HASH_BUCKET_SIZE NUMBER_OF_BUCKETS NUMBER_OF_ROWCACHE_OBJECTS
  ---------------- ----------------- --------------------------
                 1                22                         22
                 2                13                         26
                 3                 5                         15
                 4                 4                         16
                 6                 1                          6
Here the similar query for 6 'dc_cdbprops' objects:

select Hash_Bucket_Size, count(*) Number_of_Buckets, (Hash_Bucket_Size * count(*)) Number_of_RowCache_Objects from 
  (select (hash+1) Hash_Bucket, count(*) Hash_Bucket_Size from v$rowcache_parent where cache_name = 'dc_cdbprops' group by hash)
group by Hash_Bucket_Size order by 2 desc;

  HASH_BUCKET_SIZE NUMBER_OF_BUCKETS NUMBER_OF_ROWCACHE_OBJECTS
  ---------------- ----------------- --------------------------
                 2                 2                          4
                 1                 2                          2      
With following helper function and query, we can reveal the exact name of each 'dc_props' row cache object (See Blog: Oracle ROWCACHE Views and Contents)

----helper function----
create or replace function dump_hex2str (dump_hex varchar2) return varchar2 is
  l_str varchar2(100);
begin
  with sq_pos as (select level pos from dual connect by level <= 1000)
      ,sq_chr as (select pos, chr(to_number(substr(dump_hex, (pos-1)*2+1, 2), 'XX')) ch
                  from sq_pos where pos <= length(dump_hex)/2)
  select listagg(ch, '') within group (order by pos) word
    into l_str
  from sq_chr;
  return l_str;
end;
/
     
select dump_hex2str(rtrim(key, '0')) dc_prop_name, v.* 
from v$rowcache_parent v 
where cache_name in ('dc_props') 
order by key; 

  DC_PROP_NAME         INDX HASH ADDRESS          CACHE# CACHE_NAME EXISTENT
  -------------------- ---- ---- ---------------- ------ ---------- --------
  COMMON_DATA_MA       8840    1 0000000094E51558     15 dc_props   N
  MAX_PDB_SNAPSHOTS    8838    0 0000000093638360     15 dc_props   Y
  NLS_SPECIAL_CHARS    8842    2 000000009F55FB18     15 dc_props   N
  NLS_TIMESTAMP_FORMAT 8841    2 000000008C2726C8     15 dc_props   Y
  PDB_AUTO_UPGRADE     8839    0 0000000095BCE218     15 dc_props   N
  ...
  85 rows selected.              


2. Row Cache Data Dump


With following command, we can dump 'dc_props'. (See MOS Bug 19354335 - Diagnostic enhancement for rowcache data dumps (Doc ID 19354335.8))

alter session set tracefile_identifier = 'dc_props_dump';
-- dump level 0xf2b: f is cache id 15 ('dc_props'), 2 is single cacheiddump, b is level of 11
alter session set events 'immediate trace name row_cache level 0xf2b';
alter session set events 'immediate trace name row_cache off';
The dump consists of 3 sections, the first is about all ROW CACHE STATISTICS, the second is ROW CACHE HASH TABLE for our specified 'dc_props' (same as above query output about HASH_BUCKET), the third is about each BUCKET and its grouped rows. For example, 4 rows are hashed to BUCKET 50. After BUCKET 50 is BUCKET 52, there is no BUCKET 51.

We will pick BUCKET 50 and its last row cache parent object "addr=0x8d7bc858" in our later discussion.

ROW CACHE STATISTICS:
cache                          size     gets  misses  hit ratio  
--------------------------  -------  -------  ------  ---------  
dc_tablespaces                  560    94329    3277      0.966  
dc_free_extents                 336        0       0      0.000  
dc_segments                     416     9751    3050      0.762  
dc_rollback_segments            496   769333     367      1.000  
...

ROW CACHE HASH TABLE: cid=15 ht=0xa709f910 size=64
  Hash Chain Size     Number of Buckets
  ---------------     -----------------
  0                    0
  1                   22
  2                   13
  3                    5
  4                    4
  5                    0    <<< HASH_BUCKET_SIZE 5 has 0 Buckets. Same as previous query
  6                    1               
           
BUCKET 50:
  row cache parent object: addr=0x8d776c38 cid=15(dc_props) conid=0 conuid=0
  hash=641d81f1 typ=21 transaction=(nil) flags=00000001 inc=1, pdbinc=1
  own=0x8d776d08[0x8d776d08,0x8d776d08] wat=0x8d776d18[0x8d776d18,0x8d776d18] mode=N req=N
  status=EMPTY/-/-/-/-/-/-/-/-  KGH unpinned
  data=
  
  row cache parent object: addr=0x8e8c5b10 cid=15(dc_props) conid=0 conuid=0
  hash=534cbb1 typ=21 transaction=(nil) flags=00000002 inc=1, pdbinc=1
  own=0x8e8c5be0[0x8e8c5be0,0x8e8c5be0] wat=0x8e8c5bf0[0x8e8c5bf0,0x8e8c5bf0] mode=N req=N
  status=VALID/-/-/-/-/-/-/-/-  KGH unpinned
  
  row cache parent object: addr=0x8e8e5be0 cid=15(dc_props) conid=0 conuid=0
  hash=3cf1dc71 typ=21 transaction=(nil) flags=00000002 inc=1, pdbinc=1
  own=0x8e8e5cb0[0x8e8e5cb0,0x8e8e5cb0] wat=0x8e8e5cc0[0x8e8e5cc0,0x8e8e5cc0] mode=N req=N
  status=VALID/-/-/-/-/-/-/-/-  KGH unpinned
  
  row cache parent object: addr=0x8d7bc858 cid=15(dc_props) conid=0 conuid=0
  hash=819e131 typ=21 transaction=(nil) flags=00000002 inc=1, pdbinc=1
  own=0x8d7bc928[0x8d7bc928,0x8d7bc928] wat=0x8d7bc938[0x8d7bc938,0x8d7bc938] mode=N req=N
  status=VALID/-/-/-/-/-/-/-/-  KGH unpinned
  data=
  
  BUCKET 50 total object count=4
BUCKET 52:
  row cache parent object: addr=0x8d76c148 cid=15(dc_props) conid=0 conuid=0


3. Processstate Dump


Additionally, we make a Processstate Dump, which contains "call: 0xb74520f0", and can be mapped to later 10222 trace "pso=0xb74520f0", and gdb "kqrpre1 pso (SOC): b74520f0".

alter session set tracefile_identifier = 'proc_state';
alter session set events 'immediate trace name PROCESSSTATE level 10'; 
alter session set events 'immediate trace name PROCESSSTATE off'; 
Each SO (State Object) seems having one SOC (State Object Call (or Copy) as current instance)

PROCESS STATE
-------------
Process global information:
     process: 0xb84202a0, call: 0xb74520f0, xact: (nil), curses: 0xb898ecc8, usrses: 0xb898ecc8
     in_exception_handler: no
  ----------------------------------------
  SO: 0xb8eeb9a8, type: process (2), map: 0xb84202a0
      state: LIVE (0x4532), flags: 0x1
      owner: (nil), proc: 0xb8eeb9a8
      link: 0xb8eeb9c8[0xb8eeb9c8, 0xb8eeb9c8]
      child list count: 6, link: 0xb8eeba18[0xb8f060c8, 0xa6fe8208]
      pg: 0
  SOC: 0xb84202a0, type: process (2), map: 0xb8eeb9a8
       state: LIVE (0x99fc), flags: INIT (0x1)
  (process) Oracle pid:55, ser:96, calls cur/top: 0xb74520f0/0xb74520f0
...
    ----------------------------------------
    SO: 0xb8f226a8, type: call (3), map: 0xb74520f0
        state: LIVE (0x4532), flags: 0x1
        owner: 0xb8eeb9a8, proc: 0xb8eeb9a8
        link: 0xb8f226c8[0xa6fe8208, 0xb4ed1eb0]
        child list count: 0, link: 0xb8f22718[0xb8f22718, 0xb8f22718]
        pg: 0
    SOC: 0xb74520f0, type: call (3), map: 0xb8f226a8
         state: LIVE (0x99fc), flags: INIT (0x1)
    (call) sess: cur b898ecc8, rec 0, usr b898ecc8; flg:20 fl2:1; depth:0
    svpt(xcb:(nil) sptn:0x13f uba: 0x00000000.0000.00 uba: 0x00000000.0000.00)


4. Row Cache Event 10222


Make one 10222 trace: (See Blog: Oracle row cache objects Event: 10222, Dtrace Scripts (I))

alter session set tracefile_identifier = 'dc_props_10222';
alter session set events '10222 trace name context forever, level 4294967295';  
select value from nls_database_parameters where parameter = 'NLS_CHARACTERSET';
alter session set events '10222 trace name context off';
The trace file contains Function Calls and Call Stack, for example, Parent Object po=0x8d7bc858 has hash=819e131, pso=0xb74520f0. The same information can be found in the output of next gdb Script.

kqrpre: start hash=819e131 mode=S keyIndex=0 dur=CALL opt=FALSE
kqrpre: found cached object po=0x8d7bc858 flg=2
kqrmpin : kqrpre: found po Pin po 0x8d7bc858 cid=15 flg=2 hash=819e131
time=978904531
kqrpre: pinned po=0x8d7bc858 flg=2 pso=0xb74520f0
pinned stack po=0x8d7bc858 cid=15: ----- Abridged Call Stack Trace -----
ksedsts()+426<-kqrpre2 code="" kkdlpexecsql="" kkdlpexecsqlcbk="" kkdlpget="" kqrpre1="">


5. gdb Script


In Oracle session, we run again query on nls_database_parameters, and at the same time, trace the process with Blog appended Script: gdb_cmd_script2 in a UNIX window, write the trace output into file mutex-test2.log.

$ > gdb -x gdb_cmd_script2 -p 18158

SQL > select value from nls_database_parameters where parameter = 'NLS_CHARACTERSET';
Excerpted from mutex-test2.log, here the access pattern for row cache row with addr=0x8d7bc858.

Breakpoint 334, 0x00000000126e3930 in kqrpre1 ()
<<<==================== (1) kqrpre1 pso (SOC): b74520f0 ====================>>>
Breakpoint 124, 0x00000000126dd4e0 in kqrpre2 ()
Breakpoint 125, 0x00000000126df840 in kqrhsh ()
Breakpoint 335, 0x00000000126df8ad in kqrhsh ()
     (1) kqrhsh retq hash (rax): hash=819e131
Breakpoint 135, 0x00000000126e33f0 in kqrGetHashMutexInt ()
Breakpoint 336, 0x00000000129da970 in kgxExclusive ()
     (1-1) ---> kgxExclusive Mutex addr (rsi): a70a01a8
Breakpoint 146, 0x00000000126e5fd0 in kqrGetPOMutex ()
Breakpoint 336, 0x00000000129da970 in kgxExclusive ()
     (1-2) ---> kgxExclusive Mutex addr (rsi): 8d7bc960
Breakpoint 137, 0x00000000126e38c0 in kqrIncarnationMismatch ()
Breakpoint 132, 0x00000000126e2170 in kqrCacheHit ()
Breakpoint 338, 0x00000000126e2980 in kqrmpin ()
     (1) kqrmpin v$rowcache_parent.address (rdi): addr=8d7bc858 cid=15
Breakpoint 131, 0x00000000126e1fa0 in kqrFreeHashMutex ()
Breakpoint 337, 0x00000000129db980 in kgxRelease ()
     (1-1) <--- kgxRelease Mutex addr (*rsi): a70a01a8
Breakpoint 127, 0x00000000126dfd30 in kqrLockPo ()
Breakpoint 128, 0x00000000126e0180 in kqrAllocateEnqueue ()
Breakpoint 129, 0x00000000126e0790 in kqrget ()
Breakpoint 137, 0x00000000126e38c0 in kqrIncarnationMismatch ()
Breakpoint 137, 0x00000000126e38c0 in kqrIncarnationMismatch ()
Breakpoint 126, 0x00000000126dfc40 in kqrFreePOMutex ()
Breakpoint 337, 0x00000000129db980 in kgxRelease ()
     (1-2) <--- kgxRelease Mutex addr (*rsi): 8d7bc960
Breakpoint 139, 0x00000000126e3980 in kqrprl ()
Breakpoint 140, 0x00000000126e3ed0 in kqreqd ()
Breakpoint 146, 0x00000000126e5fd0 in kqrGetPOMutex ()
Breakpoint 336, 0x00000000129da970 in kgxExclusive ()
     (1-3) ---> kgxExclusive Mutex addr (rsi): 8d7bc960
Breakpoint 141, 0x00000000126e4d40 in kqrReleaseLock ()
Breakpoint 143, 0x00000000126e5910 in kqrmupin ()
Breakpoint 142, 0x00000000126e4ff0 in kqrpspr ()
Breakpoint 337, 0x00000000129db980 in kgxRelease ()
     (1-3) <--- kgxRelease Mutex addr (*rsi): 8d7bc960
Breakpoint 144, 0x00000000126e5e30 in kqrFreeEnqueue ()


6. Discussions

  1. gdb output mutex-test2.log contains 95 kqrpre1 Calls, each for one 'dc_props' / 'dc_cdbprops' object
    (total 85+6=91, 4 are fetched twice).

  2. For each row cache access, first compute a hash value (hash=819e131) and then access object (addr=8d7bc858).

  3. Each RowCache Object Get triggers three kgxExclusive Mutex Gets,
    one from kqrGetHashMutexInt (Mutex addr (rsi): a70a01a8),
    two from kqrGetPOMutex (Mutex addr (rsi): 8d7bc960).

  4. Mutex a70a01a8 is used in kqrGetHashMutexInt to protect hash BUCKET, hence BUCKET mutex.
    In above example, BUCKET 50 contains 4 row cache objects, so there are 4 occurrences in mutex-test2.log.

    Mutex 8d7bc960 is used in kqrGetPOMutex to protect individual row cache object, hence OBJECT mutex.
    So each row cache object has its own mutex.

  5. kqrGetHashMutexInt Mutex is released after first kqrGetPOMutex Get (interleaved).
    So at certain instant, one session can hold two Mutexes at the same time (same as latch).

    For latch, there is MOS Note: ORA-600 [504] "Trying to obtain a latch which is already held" (Doc ID 28104.1), which said:
    "This ORA-600 triggers when a latch request violates the ordering rules for obtaining latches and granting the latch would potentially result in a deadlock." (See Blog: ORA-600 [504] LATCH HIERARCHY ERROR Deadlock)

    For Mutex, it is not clear if there could exist such similar deadlock.

  6. The first Mutex Get is from kqrGetHashMutexInt, called by kqrpre, and released only after the second Mutex Get (the first of kqrGetPOMutex mutex Get). So kqrpre can hold two Mutexes and keep the first Mutex longer (till kqrFreeHashMutex).

    kqreqd makes one Mutex Get.

    In case of Latch (Oracle 12c), there are three Latch Gets with 3 different Locations:
    (see Blog: Oracle row cache objects Event: 10222, Dtrace Scripts (I) )
           Where=>4441(0x1159): kqrpre: find obj -> kslgetl()  -- latch Get at 1st Location
           Where=>4464(0x1170): kqreqd           -> kslgetl()  -- latch Get at 2nd Location
           Where=>4465(0x1171): kqreqd: reget    -> kslgetl()  -- latch Get at 3rd Location
    
    They are probably mapped to three kgxExclusive Mutex Gets as discussed in above Point (c).

  7. If one session, which held a mutex, but abnormally terminated, CLMN wrote a log about its cleanup and recovery of the mutex (0xa70a01a8).
           KGX cleanup...
           KGX Atomic Operation Log 0x928ad480
            Mutex 0xa70a01a8(14, 0) idn f00003c oper EXCL(6)
            Row Cache uid 14 efd 6 whr 19 slp 0
            oper=0 pt1=(nil) pt2=(nil) pt3=(nil)
            pt4=(nil) pt5=(nil) ub4=15 flg=0x8
           KQR UOL Recovery lc 928ad480 options = 1
    


7. kqrMutexLocations[] array


For all kqrMutexLocations in kqr.c, we can try to list them with following command. They often appear in AWR section "Mutex Sleep Summary" for Mutex Type: "Row Cache" (or V$MUTEX_SLEEP / V$MUTEX_SLEEP_HISTORY.location).
   
#Define Command to kqrMutexLocations
define PrintkqrMutexLocations
  set $i = 0
  while $i < $arg0 + $arg0
    x /s *(&kqrMutexLocations + $i)
    set $i = $i + 2
  end
end

(gdb) PrintkqrMutexLocations 51
0x13e7f36c:     "[01] kqrHashTableInsert"
0x13e7f384:     "[02] kqrUnlinkFromHashTable"
0x13e7f3a0:     "[03] kqrUpdateHashTable"
0x13e7f3b8:     "[04] kqrshu"
0x13e7f3c4:     "[05] kqrWaitForObjectLoad"
0x13e7f3e0:     "[06] kqrGetClusterLock"
0x13e7f3f8:     "[07] kqrBackgroundInvalidate"
0x13e7f418:     "[08] kqrget"
0x13e7f424:     "[09] kqrReleaseLock"
0x13e7f438:     "[10] kqreqd"
0x13e7f444:     "[11] kqrfpo"
0x13e7f450:     "[12] kqrpad"
0x13e7f45c:     "[13] kqrsad"
0x13e7f468:     "[14] "
0x13e7f470:     "[15] kqrCacheHit"
0x13e7f484:     "[16] kqrReadFromDB"
0x13e7f498:     "[17] kqrCreateUsingSecondaryKey"
0x13e7f4b8:     "[18] kqrCreateNewVersion"
0x13e7f4d4:     "[19] kqrpre"
0x13e7f4e0:     "[20] kqrpla"
...
0x13e7f698:     "[50] kqrpspr"
0x13e7f6a8:     "[51] kqrglblkrs"


8. gdb script: gdb_cmd_script2


Before running the script, the breakpoint number and address have to be adjusted.

### ----================ gdb script: gdb -x gdb_cmd_script2 -p 18158 ================----
set pagination off
set logging file /temp/mutex-test2.log
set logging overwrite on
set logging on
set $k = 0
set $g = 0
set $r = 0
set $spacex = ' '

rbreak ^kqr.*
rbreak ^kgx.*

command 1-333                               
continue
end

#kqrmpin 
delete 133 
#kqrpre1   
delete 138 
#kgxExclusive   
delete 326 
#kgxRelease                               
delete 331   

#10222 trc pso, processstate dump SOC (State Object Call): $r8
break *kqrpre1
command 
printf "<<<==================== (%i) kqrpre1 pso (SOC): %x ====================>>>\n", ++$k, $r8
set $g = 0 
set $r = 0              
continue
end

#kqrhsh+109 retq, output hash value in "oradebug dump row_cache 0xf2b" (0xf2b is for "dc_props") 
break *0x00000000126df8ad     
command
printf "%5c(%i) kqrhsh retq hash (rax): hash=%x\n", $spacex, $k, $rax 
continue
end

break *kgxExclusive
command 
printf "%5c(%i-%i) ---> kgxExclusive Mutex addr (rsi): %x\n", $spacex, $k, ++$g, $rsi             
continue
end

break *kgxRelease
command
printf "%5c(%i-%i) <--- kgxRelease Mutex addr (*rsi): %x\n", $spacex, $k, ++$r, *((int *)$rsi)   
continue
end

#output v$rowcache_parent.address for row cache object
break *kqrmpin     
command
printf "%5c(%i) kqrmpin v$rowcache_parent.address (rdi): addr=%x cid=%i\n", $spacex, $k, $rdi, $r11
continue
end