Saturday, July 28, 2018

row cache mutex in Oracle 12.2.0.1.0

In Oracle 12.2.0.1.0 (12cR2), "row cache mutex" replaced 12.1.0.2.0 (12cR1) "latch: row cache objects", similar to "latch: library cache" substitution by "library cache: mutex X" in the previous release.

Oracle Global Customer Support Document: Oracle Database 12c Release 2 Support Lessons Learned from Early Adopters wrote:
    "row cache mutex" replaces "latch: row cache objects" but rarely seen

As a start point to be acquainted with this new mechanism, this Blog takes the same Test Code in Blog: nls_database_parameters, dc_props, latch: row cache objects to focus only on Dictionary Cache dc_props.

We will run tests in both 12cR2 and 12cR1 by varying the number of parallel sessions, compare their behaviors, and then try to measure their scalability.

Note: All tests are done in 12.2.0.1.0 (12cR2) and 12.1.0.2.0 (12cR1) on Linux with 6 physical processors.

Update: (31-Aug-2021): "row cache mutex" seems caused hot objects, see:

  Bug 31135517 - High row cache mutex waits on hot objects - superseded (Doc ID 31135517.8)
  
  Description
      This problem is introduced in release 12.2 when latches are replaced with the introduction
      of row cache mutexes.
       
      In nearly all cases, the mutex architecture is better because the locking can be targeted at
      the object level. Because the mutexes are embedded within the objects themselves, there are
      no false conflicts caused by sessions trying to lock different objects within the same cache.
      So in general, contention is improved.
       
      It is unfortunate that where  one or two very 'hot' row cache objects exist(s), the previous
      architecture is better because the current architecture requires two mutex compared to one
      latch get.  The reasons for this are complex, but it is primarily because many row cache objects
      have multiple keys, and are linked into the hash table on multiple hash chains. The potential for
      contention is increased because the two mutex gets introduce multiple choke points.
       
      The fix implements 'hot object cloning' to alleviate row cache mutex  contention.  
       
      Stack from hang analyze trace will likely contain:
      ... kslwaitctx ksfwaitctx kgxWait kgxExclusive kqrGetMutexByAddr ..


1. 12cR2 "row cache mutex" vs. 12cR1 "latch: row cache objects"


The new Event "row cache mutex" is exposed with row cache parameters:

  P1TEXT = cache id 
  P2TEXT = where requested

select event, p1text, p1, p2text, p2, p3text, p3
  from v$session v where event in ('row cache mutex');
  
EVENT            P1TEXT    P1  P2TEXT           P2  P3TEXT  P3
---------------  --------  --  ---------------  --  ------  --
row cache mutex  cache id  15  where requested  19          0
"cache id" can be used to directly pinpoint the exact contention row cache objects.

Whereas in lower Releases, Event "latch: row cache objects" is exposed with latch parameters, and v$latch_children.child# is used to join with x$kqrst.kqrstcln and filtered by v$latch_children.addr = :P1RAW to find row cache info.

select event, p1text, p1, p1raw, p2text, p2, p2raw, p3text, p3, p3raw
  from v$session v where event in ('latch: row cache objects');

EVENT                     P1TEXT   P1          P1RAW             P2TEXT  P2   P2RAW             P3TEXT  P3  P3RAW
------------------------  -------  ----------  ----------------  ------  ---  ----------------  ------  --  -----
latch: row cache objects  address  5970309816  0000000163DBB2B8  number  411  000000000000019B  tries   0   00

select la.child# latch_child, kqrstcid cache#, kqrsttxt parameter, la.addr latch_addr --, dc.*, la.*
from x$kqrst dc, v$latch_children la 
where dc.kqrstcln = la.child# 
  and la.name     = 'row cache objects'
  and la.addr     = '0000000182D98958'; 

LATCH_CHILD  CACHE#  PARAMETER  LATCH_ADDR
-----------  ------  ---------  ----------------
18           15      dc_props   0000000163DBB2B8
12cR2 v$rowcache contains 71 rows, one new dictionary cache: 'dc_cdbprops' is added.

By the way, 'row cache objects' LATCH# is numbered as "506" (12cR1 is "411"), but hardly used.

select name, latch# from v$latch where name in ('row cache objects');

NAME               LATCH#  GETS
-----------------  ------  ----
row cache objects  506     456


2. Test


Run following test script in 12cR2 and 12cR1 on a Linux with 6 physical processors.
The script consists of 8 cases with 1, 2, 3, 4, 5, 6, 9 and 12 parallel sessions respectively.
Each case runs for 10 minutes.

declare
  type    t_num_tab is table of number;
  l_nums  t_num_tab := new t_num_tab(1, 2, 3, 4, 5, 6, 9, 12);
begin
  clean_jobs; 
  sys.dbms_workload_repository.create_snapshot('ALL');
  
  for n in 1..l_nums.count loop
    nls_select_jobs(l_nums(n), 1e9);    -- n Sessions
    dbms_lock.sleep(600);               -- 10 minutes
    clean_jobs; 
    sys.dbms_workload_repository.create_snapshot('ALL');
  end loop;
end;
/  


3. Test Result


After test, collect all 8 AWR reports in 12cR2 and 12cR1, extract relevant 4 Sections to compare.


3.1 12cR2


Top 10 Foreground Events by Total Wait Time

Session# Event            Waits     Total Wait Time (sec)   Avg Wait   % DB time  Wait Class
-------- ---------------  --------  ---------------------  --------   ---------   -----------
1        row cache mutex  9         0                       10.33us    0.0        Concurrency
2        row cache mutex  12,836    18                      1.46ms     1.6        Concurrency
3        row cache mutex  48,390    111                     2.31ms     5.9        Concurrency
4        row cache mutex  111,312   280                     2.52ms     10.5       Concurrency
5        row cache mutex  199,253   491                     2.47ms     14.6       Concurrency
6        row cache mutex  322,542   886                     2.75ms     20.1       Concurrency
9        row cache mutex  418,819   3030                    7.24ms     35.6       Concurrency
12       row cache mutex  573,556   4673                    8.15ms     38.7       Concurrency

SQL ordered by Elapsed Time
(SQL Id:   7mgsfc44trnr8
 SQL Text: SELECT VALUE FROM NLS_DATABASE_PARAMETERS WHERE PARAMETER = 'NLS_CHARACTERSET')

Session# Elapsed Time (s) Executions  Elapsed Time per Exec (s)  %Total  %CPU     %IO 
-------- ---------------- ----------  -------------------------  ------  -------  ----
1        498              3,174,194   0.00                       81.47   100.40   0.00
2        983              5,876,575   0.00                       83.48   98.52    0.00
3        1,515            8,383,434   0.00                       79.99   93.22    0.00
4        2,087            10,518,830  0.00                       77.82   87.48    0.00
5        2,533            11,688,284  0.00                       74.97   81.51    0.00
6        3,112            12,242,878  0.00                       70.67   71.93    0.00
9        4,894            9,121,274   0.00                       57.49   34.12    0.00
12       6,685            9,300,453   0.00                       55.32   25.77    0.00

Mutex Sleep Summary 

Session# Mutex Type  Location     Sleeps     Wait Time (ms)
-------- ----------  -----------  ---------  --------------
1        Row Cache   [19] kqrpre  12         0   
2        Row Cache   [19] kqrpre  16,734     18,744  
3        Row Cache   [19] kqrpre  70,398     108,322
4        Row Cache   [19] kqrpre  164,963    266,506
5        Row Cache   [19] kqrpre  290,989    466,843
6        Row Cache   [19] kqrpre  478,910    831,835
9        Row Cache   [19] kqrpre  890,437    2,929,696   
16       Row Cache   [19] kqrpre  1,268,073  4,499,378

1        Row Cache   [13] kqreqd  8          0 
2        Row Cache   [13] kqreqd  415        75
3        Row Cache   [13] kqreqd  1,536      300
4        Row Cache   [13] kqreqd  4,555      850
5        Row Cache   [13] kqreqd  9,913      1,625
6        Row Cache   [13] kqreqd  27,487     3,565
9        Row Cache   [13] kqreqd  20,223     14,429
16       Row Cache   [13] kqreqd  24,497     22,772

(Row Cache  [19] kqrpre: find obj 
 Row Cache  [13] kqreqd: reget)

Dictionary Cache Stats

Session# Cache     Get Requests  Pc t Miss  Scan Reqs  Pct Miss  Mod Reqs  Final Usage
-------- --------  ------------  ---------  ---------  --------  --------  -----------
1        dc_props  222,194,045   0.00       0                    0         70
2        dc_props  406,987,135   0.00       0                    0         70
3        dc_props  572,806,160   0.00       0                    0         70
4        dc_props  708,102,414   0.00       0                    0         70
5        dc_props  776,478,519   0.00       0                    0         70
6        dc_props  807,080,147   0.00       0                    0         70
9        dc_props  605,040,134   0.00       0                    0         70
12       dc_props  616,520,863   0.00       0                    0         70


3.2 12cR1


Top 10 Foreground Events by Total Wait Time

Session# Event                      Waits      Total Wait Time (sec)  Avg Wait(ms)   % DB time   Wait Class
-------- ------------------------   ---------  ---------------------  ------------   ---------   -----------
1        latch: row cache objects   12         0                      0.02           0.0         Concurrency
2        latch: row cache objects   19,775     1                      0.03           0.1         Concurrency
3        latch: row cache objects   259,069    6                      0.02           0.3         Concurrency
4        latch: row cache objects   612,329    16                     0.03           0.7         Concurrency
5        latch: row cache objects   1,008,481  36                     0.04           1.2         Concurrency
6        latch: row cache objects   1,189,823  116                    0.10           3.1         Concurrency
9        latch: row cache objects   1,036,139  2441                   2.36           30.2        Concurrency
12       latch: row cache objects   948,170    3861                   4.07           34.5        Concurrency

SQL ordered by Elapsed Time

Session# Elapsed Time (s)  Executions  Elapsed Time per Exec (s)  %Total  %CPU    %IO 
-------- ----------------  ----------  -------------------------  ------  ------  ----
1        506               6,164,161   0.00                       85.76   100.53  0.00
2        1,056             10,848,660  0.00                       88.01   100.32  0.00
3        1,614             9,977,224   0.00                       91.07   99.91   0.00
4        2,227             8,172,259   0.00                       93.72   99.69   0.00
5        2,843             6,951,871   0.00                       94.78   99.06   0.00
6        3,493             6,890,154   0.00                       93.93   96.45   0.00
9        5,243             7,245,168   0.00                       64.80   49.04   0.00
12       6,728             7,562,209   0.00                       60.19   36.65   0.00

Latch Sleep Breakdown

Session# Latch Name         Get Requests   Misses       Sleeps     Spin Gets
-------- -----------------  -------------  -----------  ---------  -----------
1        row cache objects  1,109,597,151  633          14         619   
2        row cache objects  1,953,334,110  105,279,309  20,757     105,259,556
3        row cache objects  1,796,543,339  378,078,181  269,547    377,819,261
4        row cache objects  1,471,777,609  541,981,017  632,528    541,369,153
5        row cache objects -3,043,081,012  596,060,413  1,034,539  595,053,127
6        row cache objects  1,229,364,378  689,496,609  1,200,975  688,326,423
9        row cache objects  1,304,630,648  561,205,207  1,091,371  560,171,774
16       row cache objects  1,361,872,758  607,120,518  1,007,494  606,174,258

 * -3,043,081,012 is due to 32-bit overflow, it should be 1,251,886,284 (=2^32 - 3,043,081,012).

Dictionary Cache Stats

Session# Cache     Get Requests  Pc t Miss  Scan Reqs  Pct Miss  Mod Reqs  Final Usage
-------- --------  ------------  ---------  ---------  --------  --------  -----------
1        dc_props  369,850,105   0.00       0                    0         60
2        dc_props  651,060,932   0.00       0                    0         60
3        dc_props  598,832,676   0.00       0                    0         60
4        dc_props  490,574,970   0.00       0                    0         60
5        dc_props  417,291,212   0.00       0                    0         60
6        dc_props  413,622,949   0.00       0                    0         60
9        dc_props  434,872,658   0.00       0                    0         60
12       dc_props  453,953,683   0.00       0                    0         60


4. Discussions


4.1 Top 10 Foreground Events by Total Wait Time


    12cR2 "row cache mutex" has about 2 times less "Waits" than 12cR1 "latch: row cache objects".
    That is the advantage of light-weight "Mutex" over "Latch".
    "Total Wait Time" and "Avg Wait" increase linearly when Session# <= 6 (number of physical processors), non-linearly when > 6.
    12cR2 "row cache mutex" has higher concurrency ("% DB time") than 12cR1 "latch: row cache objects".

    Strangely 12cR2 Mutex "Avg Wait" is much higher than that of 12cR1 Latch.


4.2 SQL ordered by Elapsed Time


    "Executions" in 12cR2 reaches maximum at Session# = 6, 20% higher than 12cR1 peak value at Session# = 2.
    It means 12cR1 can use less CPU resource to reach its peak value.

    12cR1 has higher throughput than 12cR2 when less Sessions (less than 4); but lower when more Sessions.


4.3 Mutex Sleep Summary vs Latch Sleep Breakdown


    12cR2 "row cache mutex" Sleeps and Wait Time increase non-linearly.
    12cR1 "latch: row cache objects" Misses and Sleeps increase non-linearly although "Get Requests" are almost similar.

    In 12cR1, "Waits" in AWR "Top 10 Foreground Events by Total Wait Time" is close to "Sleeps" in "Latch Sleep Breakdown".
    in 12cR2, it is far from "Sleeps" in "Mutex Sleep Summary".


4.4 Dictionary Cache Stats


    dc_props "Get Requests" depends on query "Executions".
    Each query (SQL Id: 7mgsfc44trnr8) Execution triggers 70 "Get Requests" in 12cR2, and 60 in 12cR1
    since dc_props rows (Field "Final Usage") increased from 60 in 2cR1 to 70 in 12cR2.


5. Test Code


create or replace procedure nls_select(p_cnt number) as
  l_val VARCHAR2 (256 Byte);
begin
for i in 1..p_cnt loop
  select value into l_val from nls_database_parameters where parameter = 'NLS_CHARACTERSET';
end loop;
end;
/

create or replace procedure nls_select_jobs(p_job_cnt number, p_cnt number) as
  l_job_id pls_integer;
begin
  for i in 1.. p_job_cnt loop
    dbms_job.submit(l_job_id, 'begin while true loop nls_select('||p_cnt||'); end loop; end;');
  end loop;
  commit;
end;    
/

create or replace procedure clean_jobs as
begin
  for c in (select job from dba_jobs) loop
    begin
       dbms_job.remove (c.job);
    exception when others then null;
    end;
    commit;
  end loop;

  for c in (select d.job, d.sid, (select serial# from v$session where sid = d.sid) ser 
              from dba_jobs_running d) loop
    begin
      execute immediate
             'alter system kill session '''|| c.sid|| ',' || c.ser|| ''' immediate';
      dbms_job.remove (c.job);
    exception when others then null;
    end;
    commit;
  end loop;
  
  dbms_lock.sleep(2);
  
  -- select * from dba_jobs;
  -- select * from dba_jobs_running;
end;
/

-- exec nls_select_jobs(2, 1e9);
-- exec clean_jobs; -- stop all jobs