Tuesday, February 25, 2020

Cache Buffer Chains Latch Contention Case Study-2: Reverse Primary Key Index

Continue with previous Blog: Cache Buffer Chains Latch Contention Case Study-1: Reverse Key Index, now we are going to make the second Case Study: Reverse Primary Key Index. It shows the diligent work of Oracle optimization.

Note: Tested in Oracle 12c, 18c, 19c.


1. Test Setup


First we drop the previous reverse index and create a new reverse primary key index, then get object_id of that index:

--drop table test_tab purge;
--create table test_tab (id number, val number);

drop index test_tab#r;

create unique index test_tab#r on test_tab(id) reverse;

alter table test_tab add constraint test_tab#r primary key (id);

select object_id, to_char(object_id, 'XXXXXXXX') id_hex, data_object_id, to_char(data_object_id, 'XXXXXXXX') did_hex 
from dba_objects where object_name = 'TEST_TAB#R';

  OBJECT_ID ID_HEX DATA_OBJECT_ID DID_HEX
  --------- ------ -------------- -------
    2460326 258AA6        2460326  258AA6


2. Leaf Block Split Point Finding


With appended Plsql procedure: leaf_split_finder, we can find one exact point of leaf block split.

SQL > exec leaf_split_finder(5000);

--- Leaf Split at row count: 4906, (id between 1 and 11) in 2 Blocks
block: 2661262, cnt: 5, min_id: 1, max_id: 5
block: 3665613, cnt: 6, min_id: 6, max_id: 11
Above output shows index split occurs at row count: 4906. In the previous Blog, that is 4708. So primary key index is more compact (4.2%) than normal index.


3. CBC Latch Contention Producing


Open a Plsql Window, truncate table, insert 4905 rows at first.

truncate table test_tab;

insert into test_tab select level, -level from dual connect by level <= 4905;

commit;

-- using above index object_id to substitute for 2460326
select blk, count(*), min(id) min_id, max(id) max_id from (
  select id 
        ,dbms_rowid.rowid_block_number(sys_op_lbid (2460326, 'L', rowid)) blk
    from test_tab)  
where id between 1 and 11
group by blk order by min_id, blk;

      BLK COUNT(*) MIN_ID MAX_ID
  ------- -------- ------ ------
  2661263       11      1     11
We can see all rows from 1 to 11 are in the same leaf block (2661263).

Now we start two groups of jobs, each group having 6 jobs in a query loop (see appended test_tab_select_job). All 6 jobs in the first group access row 1 to 5; whereas 6 jobs in the second group access row 6 to 10;

exec test_tab_select_job(6, 1e9, 1, 0);

exec test_tab_select_job(6, 1e9, 6, 0);

-- stop all jobs by:
--   exec clean_jobs;
Each job in the first group executes the query with id from 1 to 5 in a loop, whereas job in the second group executes it with id from 6 to 10. Both are with the same Xplan (INDEX UNIQUE SCAN. In previous Blog, it is INDEX RANGE SCAN).

select val from test_tab where id = :B1
 
------------------------------------------------------------------------------------------
| Id  | Operation                   | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |            |       |       |     1 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| TEST_TAB   |     1 |    26 |     1   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | TEST_TAB#R |     1 |       |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------
Now insert a single new row:

insert into test_tab values(4906, -4906);

select blk, count(*), min(id) min_id, max(id) max_id from (
  select id 
        ,dbms_rowid.rowid_block_number(sys_op_lbid (2460326, 'L', rowid)) blk
    from test_tab)  
where id between 1 and 11
group by blk order by min_id, blk;

      BLK COUNT(*) MIN_ID MAX_ID
  ------- -------- ------ ------
  2661263        5      1      5
  3665609        6      6     11
One leaf block split occurs. Row 6 to 11 are moved to new block (3665609). Now if we look again all 12 job sessions. Only 6 sessions in the second group have contention, similar to the case of previous Blog.

select s.client_identifier, s.sid, s.serial#, p.pid, p.spid, s.event, p1, p2, p3, latchwait, latchspin
  from v$session s, v$process p
 where s.paddr=p.addr and lower(s.program) like '%j0%' 
 order by s.client_identifier; 

  CLIENT_IDENTIFIER     SID  SERIAL#  PID  SPID  EVENT                        P1          P2       P3  LATCHWAIT         LATCHSPIN
  --------------------  ---  -------  ---  ----  ---------------------------  ----------  -------  --  ----------------  ---------
  test_tab_select_id_1  725  53859    46   5226  buffer busy waits            1548        2661259  1
  test_tab_select_id_1  549  449      51   5236  null event                   0           0        0
  test_tab_select_id_1  547  35779    45   5224  buffer busy waits            1548        2661263  1
  test_tab_select_id_1  6    60174    42   5218  buffer busy waits            1548        2661263  1
  test_tab_select_id_1  8    49157    48   5230  buffer busy waits            1548        2661263  1
  test_tab_select_id_1  187  44089    43   5220  null event                   0           0        0
  test_tab_select_id_6  727  18689    52   5238  latch: cache buffers chains  6636068072  228      0   000000018B8A5CE8
  test_tab_select_id_6  369  18373    50   5234  latch: cache buffers chains  6636068072  228      0   000000018B8A5CE8
  test_tab_select_id_6  189  58221    49   5232  latch: cache buffers chains  6636068072  228      0   000000018B8A5CE8
  test_tab_select_id_6  905  10131    53   5240  latch: cache buffers chains  6636068072  228      0   000000018B8A5CE8
  test_tab_select_id_6  367  49253    44   5222  latch: cache buffers chains  6636068072  228      0
  test_tab_select_id_6  903  64294    47   5228  latch: cache buffers chains  6636068072  228      0   000000018B8A5CE8  
A further query on v$latch_children shows majority of CBC SLEEPS (140174) is on child latch: 000000018B8A5CE8 of second group. The child latch: 000000018B8DC5E8 of first group almost has no SLEEPS and WAIT_TIME.

select latch#, child#, addr, gets, misses, sleeps, spin_gets, wait_time, round(wait_time/sleeps) us_per_sleep 
  from v$latch_children 
 where addr = hextoraw('000000018B8DC5E8') or addr = hextoraw('000000018B8A5CE8'); 

  LATCH#  CHILD#  ADDR              GETS       MISSES    SLEEPS   SPIN_GETS  WAIT_TIME  US_PER_SLEEP
  ------  ------  ----------------  ---------  --------  -------  ---------  ---------  ------------
  228     31143   000000018B8DC5E8  57352278   59783     4       59779      81         20
  228     29979   000000018B8A5CE8  104090722  28359401  140174  28213248   334439191  2386  
With one more query on v$latch_misses, we can display CBC latch contention locations.

select * from v$latch_misses 
 where parent_name = 'cache buffers chains' and sleep_count > 0
 order by sleep_count desc;
 
  PARENT_NAME           WHERE                     NWFAIL_COUNT  SLEEP_COUNT  WTR_SLP_COUNT  LONGHOLD_COUNT  LOCATION
  --------------------  ------------------------  ------------  -----------  -------------  --------------  -----------------------
  cache buffers chains  kcbgtcr: slow path excl    0             181709       24123          2864            kcbgtcr: slow path excl
  cache buffers chains  kcbgtcr: fast path exam    0             61093        160708         10363           kcbgtcr: fast path exam
  cache buffers chains  kcbgkcbcr                  0             8999         65758          1806            kcbgkcbcr
  cache buffers chains  kcbrls: pin release        0             7171         10946          1580            kcbrls: pin release
  cache buffers chains  kcb_trim_hash_chain        0             2191         19             30              kcb_trim_hash_chain
  cache buffers chains  kcbchg: cr pin cleanout    0             444          82             43              kcbchg: cr pin cleanout
  cache buffers chains  kcbcge                     0             3            54             1               kcbcge
  cache buffers chains  kcbchg: cr pin change      0             3            0              0               kcbchg: cr pin change
  cache buffers chains  kcbzwb                     0             1            0              0               kcbzwb
  cache buffers chains  kcbgtcr: retry after read  0             1            0              0               kcbgtcr: retry after read 
The top Location are "kcbgtcr: slow path excl" and "kcbgtcr: fast path exam". "kcbgtcr: slow path excl" has higher SLEEP_COUNT, but "kcbgtcr: fast path exam" has much higher WTR_SLP_COUNT (Number of times a waiter slept) and LONGHOLD_COUNT (Number of times someone held a latch for the entire duration of someone else's sleep).

In the previous Blog for normal reverse index, the top Location is "kcbgtcr: fast path pin".

LONGHOLD_COUNT is a parmountal indicator for latch misses since it signifies the sleeping sessions can go to sleep again ("Recurrent Sleeps", see Blog: Oracle Latch Misses, Spin_gets, Sleeps and Recurrent_Misses ). But it is not included in AWR - "Latch Miss Sources" section.

If we make an index TEST_TAB#R tree dump, we can see two leaf blocks: 2661263 and 3665609:

alter session set tracefile_identifier = "TD_TEST_TAB#R_1";
alter session set events 'immediate trace name treedump level 2460326';

----- begin tree dump
branch: 0x13589b 1267867 (0: nrow: 16, level: 1)
   leaf: 0x289b8f 2661263 (-1: row:291.291 avs:3932)
   leaf: 0x37eec9 3665609 (0: row:282.282 avs:4060)
   leaf: 0x1358a1 1267873 (1: row:295.295 avs:3878)
Above dump shows that index root node block is 1267867. With following query, we can see that it is the next block of header_block: 1267866 on index TEST_TAB#R segment:

SQL > select segment_name, header_file, header_block from dba_segments where segment_name = 'TEST_TAB#R';

  SEGMENT_NAME HEADER_FILE HEADER_BLOCK
  ------------ ----------- ------------
  TEST_TAB#R            22      1267866
Even though there is heavy "latch: cache buffers chains" contention, it seems that no blocking sessions are tracked for such wait event:

SQL > select * from v$wait_chains where chain_signature like 'latch: cache buffers chains' or blocker_sid is not null;

      no rows selected


4. Latch State Transition Diagram By Dtrace


Pick one CBC latch contention session (SID: 367, Oracle PID: 44 (0x2C), spid: 5222) from above v$session and v$process query.

Run following Dtrace Script on contention latch address: 0x18B8A5CE8 (For Latch State Transition Diagram, See Blog: PL/SQL Function Result Cache Invalidation (I)).

sudo dtrace -w -n \
'
typedef unsigned int ub4;

typedef struct latchstate /* latch state */ {
    ub4 nproc_pid;  /* Nproc or pid   */
    ub4 flag;       /* blocking flag  */ 
    ub4 gets;       /* number of gets */
    ub4 latch;      /* latch number   */
    ub4 level;      /* latch level    */
    ub4 where; } latchstate;
    
BEGIN {laddr = $1; step = 1; begin_gets = 0, end_gets = 0; prev_ts = timestamp;
   self->evt = "begin"; self->loc = 0x0;}
pid$target::kslgess:entry /arg0 == laddr && (self->evt != probefunc || self->loc != arg4)/ { 
   printf("\n---------Step %d at %Y (Elapsed-ns=%d)---------", step, walltimestamp, (timestamp-prev_ts));
   ls = ((latchstate *) copyin(arg0, sizeof(latchstate)));
   begin_gets = (begin_gets == 0 ? ls->gets: begin_gets); end_gets = ls->gets;
   printf("\n***State Entry>>>(nproc_pid=>%d, flag=>%d, gets=>%d, latch=>%d, level=>%d, where=>%d)"
           ,ls->nproc_pid, ls->flag, ls->gets, ls->latch, ls->level, ls->where);
   printf("\n>>>Transition>>>%s:%s\n      (Addr=>0x%X, Where=>%d)", 
          probefunc, probename, arg0, arg4);
   self->evt = probefunc; self->loc = arg4;    
   ustack(10, 0); 
   /*stop(); exit(0);*/    /* no latch blocking, failed get, try again */
   step = step + 1; self->entry = 1; prev_ts = timestamp;} 
pid$target::kslgess:return /self->entry == 1/ {
   ls = ((latchstate *) copyin(laddr, sizeof(latchstate))); 
   printf("\n>>>State Return***(nproc_pid=>%d, flag=>%d, gets=>%d, latch=>%d, level=>%d, where=>%d)"
           ,ls->nproc_pid, ls->flag, ls->gets, ls->latch, ls->level, ls->where);
   /*stop(); exit(0);*/    /* simulate long latch holder, run by: dtrace -w -n */
   self->entry = 2;}   
pid$target::kslfre:entry /self->entry == 2 && arg0 == laddr && self->evt != probefunc/ {
   printf("\n---------Step %d at %Y (Elapsed-ns=%d)---------", step, walltimestamp, (timestamp-prev_ts));
   ls = ((latchstate *) copyin(arg0, sizeof(latchstate)));
   end_gets = ls->gets;
   printf("\n>>>Transition>>>%s:%s(Addr=>0x%-X)", probefunc, probename, arg0);
   self->evt = probefunc; self->loc = 0x0;
   /*stop(); exit(0);*/      /* simulate long latch holder, run by: dtrace -w -n */
   printf("\n===Latch freed===");
   ustack(10, 0); step = step + 1; prev_ts = timestamp;}      
END {
   printf("\n---------Step %d at %Y (Elapsed-ns=%d)---------", step, walltimestamp, (timestamp-prev_ts));
   printf("\nLatch Gets = %d (%d - %d)", (end_gets - begin_gets), (unsigned long)begin_gets, (unsigned long)end_gets); }
' -p 5222 0x18B8A5CE8
The output is some repeated occurrences of following pattern:

   kslgess:entry
---------Step 1825 at 2020 Feb 24 15:51:09 (Elapsed-ns=6088)---------
***State Entry>>>(nproc_pid=>47, flag=>536870912, gets=>461619058, latch=>228, level=>1, where=>2168)
>>>Transition>>>kslgess:entry
      (Addr=>0x18B8A5CE8, Where=>2218)
              oracle`kslgess
              a.out`kcbgtcr+0x1a323
              a.out`ktrgtc2+0x316
              a.out`qeilbk1+0x4a1
              a.out`qeilbk0+0xe
              a.out`qerixFetchUniqueIndex+0x17f
              a.out`qerstFetch+0x543
              a.out`qertbFetchByRowID+0x39e
              a.out`qerstFetch+0x4ca
              a.out`opifch2+0x188b

   kslgess:return
>>>State Return***(nproc_pid=>44, flag=>536870912, gets=>461619062, latch=>228, level=>1, where=>2218)

   kslgess:entry
---------Step 1826 at 2020 Feb 24 15:51:09 (Elapsed-ns=250117)---------
***State Entry>>>(nproc_pid=>43, flag=>536870912, gets=>461619113, latch=>228, level=>1, where=>2186)
>>>Transition>>>kslgess:entry
      (Addr=>0x18B8A5CE8, Where=>2186)
              oracle`kslgess
              oracle`kcbrls+0x170c
              a.out`ktrgtc2+0x554
              a.out`qeilbk1+0x4a1
              a.out`qeilbk0+0xe
              a.out`qerixFetchUniqueIndex+0x17f
              a.out`qerstFetch+0x543
              a.out`qertbFetchByRowID+0x39e
              a.out`qerstFetch+0x4ca
              a.out`opifch2+0x188b
It shows the CBC Latch State Transition. At first, C subroutine kslgess was called to get CBC latch in entry location (where=>2168) and next location (where=>2186). Then freed by kslfre. CallStack shows that each row is accessed by qertbFetchByRowID via qerixFetchUniqueIndex (TABLE ACCESS BY INDEX ROWID via INDEX UNIQUE SCAN).

For the where location, we can use the query below to find them.

CREATE OR REPLACE FORCE VIEW X_GV$LATCH_MISSES as
SELECT t1.indx              INDX,
       t1.inst_id           INST_ID,       
       t1.ksllasnam         PARENT_NAME,   
       t2.ksllwnam          "WHERE",       
       t1.kslnowtf          NWFAIL_COUNT,  
       t1.kslsleep          SLEEP_COUNT,   
       t1.kslwscwsl         WTR_SLP_COUNT, 
       t1.kslwsclthg        LONGHOLD_COUNT,
       t2.ksllwnam          LOCATION,      
       t1.con_id            CON_ID         
  FROM x$ksllw t2, x$kslwsc t1
 WHERE t2.indx = t1.indx;
 
 select * from sys.X_GV$LATCH_MISSES 
where parent_name in ('cache buffers chains') 
  and indx in (2168, 2186, 2218);
  
INDX  INST_ID  PARENT_NAME           WHERE                   NWFAIL_COUNT  SLEEP_COUNT  WTR_SLP_COUNT  LONGHOLD_COUNT  LOCATION
----  -------  --------------------  ----------------------  ------------  -----------  -------------  --------------  ---------------------
2168  1        cache buffers chains  kcbgtcr: fast path exam  0             106323       185902         15798           kcbgtcr: fast path exam
2186  1        cache buffers chains  kcbrls: pin release      0             14959        22770          3187            kcbrls: pin release
2218  1        cache buffers chains  kcbgtcr: slow path excl  0             189591       48044          3576            kcbgtcr: slow path excl
The above output showed that the most LONGHOLD_COUNT is in location "kcbgtcr: fast path exam" when using INDEX UNIQUE SCAN to make TABLE ACCESS. Whereas when using INDEX RANGE SCAN, the top location is "kcbgtcr: fast path pin". This tiny nuance of location change can play a critical role in CBC latch contention investigation. It also shows the different handling of reverse primary key index vs.normal reverse index, which is probably some diligent work of Oracle optimization.

In the above Dtrace, we trace kslgess function. Blog: Oracle 12 and latches, part 2 described it as:
  kslgess: kernel service latch get spin shared (latch)
Blog: Latch, mutex and beyond written:
  kslgess: shared latch wait get
About kcbrls, Blog: Buffer is pinned count said:
  The function kcbrls (Kernel Cache Buffer Release) is used to release a buffer, also it is possible from another places, 
  for example from kcb_post_apply can call kcbzfs (detach handle from x$kcbbf) directly.


5. Buffer Cache Activity Internals


If we make 10200 and 10046 trace for 100 loops for id from 6 to 10, we can see 500 "ktrgtc2 -> ktrexc" INDEX UNIQUE SCAN on the new block (3665609=0x0037eec9).
("ktrexc: for unique index and index root" and "ktrexf: for non-unique index".
see Blog: Oracle Unique Index Non-Consistency Read with RowCR)

alter session set events='10046 trace name context forever, level 12: 
                          10200 trace name context forever, level 10' 
      tracefile_identifier='10046_10200_index_rp_2';  
exec test_tab_select(6, 1000, 6, 0.00);
alter session set events='10046 trace name context off : 10200 trace name context off ';


ktrgtc2(): started for block <0x07cf : 0x0037eec9> objd: 0x00258aa6
ktrexc(): returning 9 
ktrgtc2(): completed for block <0x07cf : 0x0037eec9> objd: 0x00258aa6


6. X$TRACE Latch Operations


In this section, we try to use X$TRACE to trace latch operations (10005). Pick one CBC latch contention session (SID: 367, Oracle PID: 44 (0x2C), spid: 5222) and trace all levels (15).

$ > oerr ora 10005
10005, 00000, "trace latch operations for debugging"
// *Action: Enable tracing for various latch operations
//    level 1 - trace latch gets and frees
//    level 2 - protect/unprotect latch recovery areas
//    level 4 - trace multiple posts by processes when latch is freed
//    level 8 - signal error on bad latch waits


alter tracing enable "10005:15:44";

SQL> select * from x$trace_events where procs in (44);
ADDR              INDX  INST_ID  CON_ID  EVENT  TRCLEVEL  STATUS  PROCS  FLAGS
----------------  ----  -------  ------  -----  --------  ------  -----  -----
FFFF80FFBC1361C8  5     1        0       10005  15        1       44     1
Following query on x$trace shows the latch begin (KSL WAIT BEG) and end (KSL WAIT END) for each seq_num on contention child latch (6636068072/0x18b8a5ce8). One has a longer elapsed time of 169973 usecs, another is short as of 81 usecs.

169973 usecs is about 169 ms. For a CPU time slice of 10 ms, the process can be scheduled to run after 17 sleeps. That is also the case we have observed in some real applications.

select event, seq#, pid, file_loc,function, data 
  from x$trace 
 where pid in (44) 
 order by time;

EVENT SEQ# PID FILE_LOC    FUNCTION  DATA
----- ---- --- ----------- --------- ----------------------------------------------------------------------------------------------------------------------
10005 7186 44  ksl2.c@2966 kslwtbctx KSL WAIT BEG [latch: cache buffers chains] 6636068072/0x18b8a5ce8 228/0xe4 0/0x0 wait_id=11277 seq_num=11278 snap_id=1
10005 7200 44  ksl2.c@3109 kslwtectx KSL WAIT END [latch: cache buffers chains] 6636068072/0x18b8a5ce8 228/0xe4 0/0x0 wait_id=11277 seq_num=11278 snap_id=1
10005 7201 44  ksl2.c@3117 kslwtectx KSL WAIT END wait times (usecs) - snap=169973, exc=169973, tot=169973

10005 7280 44  ksl2.c@2966 kslwtbctx KSL WAIT BEG [latch: cache buffers chains] 6636068072/0x18b8a5ce8 228/0xe4 0/0x0 wait_id=11278 seq_num=11279 snap_id=1
10005 7301 44  ksl2.c@3109 kslwtectx KSL WAIT END [latch: cache buffers chains] 6636068072/0x18b8a5ce8 228/0xe4 0/0x0 wait_id=11278 seq_num=11279 snap_id=1
10005 7302 44  ksl2.c@3117 kslwtectx KSL WAIT END wait times (usecs) - snap=35031, exc=35031, tot=35031

10005 7312 44  ksl2.c@2966 kslwtbctx KSL WAIT BEG [latch: cache buffers chains] 6636068072/0x18b8a5ce8 228/0xe4 0/0x0 wait_id=11279 seq_num=11280 snap_id=1
10005 7317 44  ksl2.c@3109 kslwtectx KSL WAIT END [latch: cache buffers chains] 6636068072/0x18b8a5ce8 228/0xe4 0/0x0 wait_id=11279 seq_num=11280 snap_id=1
10005 7318 44  ksl2.c@3117 kslwtectx KSL WAIT END wait times (usecs) - snap=81, exc=81, tot=81