Monday, May 25, 2020

One "row cache lock" Test Case

"row cache lock" is an Oracle locking mechanisms to protect disk layer persistent data (e.g. 'sys.user$'), whereas "latch: row cache objects" to protect memory (cache) layer volatile data (e.g. 'dc_users').

To improve performance, disk stored meta data are cached in memory layer. For any modifications, "row cache lock" is requested to synchronize and maintain data integrity and consistency.

"row cache lock" is well documented in MOS Docu:
  WAITEVENT: "row cache lock" Reference Note (Doc ID 34609.1)
    Parameters:
     P1 = cache - ID of the dictionary cache 
     P2 = mode - Mode held 
     P3 = request - Mode requested 
    p1:
     SELECT cache#, type, parameter FROM v$rowcache WHERE cache# = &P1;
    P2 and P3:
      KQRMNULL 0   null mode - not locked
      KQRMS    3   share mode
      KQRMX    5   exclusive mode
      KQRMFAIL 10  fail to acquire instance lock
In this Blog, we will demonstrate "row cache lock" with one test case, and then make further investigation with different tracing methods.

Note: Tested in Oracle 12cR1

(In Oracle 12.2.0.1.0 (12cR2), "row cache mutex" replaced 12.1.0.2.0 (12cR1) "latch: row cache objects", see Blog: row cache mutex in Oracle 12.2.0.1.0 )


1. Test Setup


At first, we create a test user and a procedure to change its identification.

drop user test_user cascade;
create user test_user identified by 123;
grant create session, alter session, create any procedure to test_user;

------- login as test_user -------
sqlplus test_user/123@testdb

create or replace procedure alter_user(p_cnt number) as
begin
  for i in 1..p_cnt loop
    execute immediate 'alter user test_user identified by 123';
  end loop;
end;
/

-- exec alter_user(1e1);


2. Contention Test


As first test, open two Sqlplus execution sessions: SID1, SID2 by test_user. And also open one monitoing session SID3. In SID1 and SID2, run above created procedure:

SID1 (367) > exec alter_user(1e6);

SID2 (549) > exec alter_user(1e6);
In monitoring session SID3, we can observe the contention on "row cache lock".

SID3 > 
select sid, event, total_waits, total_timeouts, time_waited, average_wait, max_wait, time_waited_micro 
from v$session_event 
where event = 'row cache lock' or sid in (367, 549)
order by total_waits desc;

  SID EVENT                     TOTAL_WAITS TOTAL_TIMEOUTS TIME_WAITED AVERAGE_WAIT   MAX_WAIT TIME_WAITED_MICRO
  --- ------------------------- ----------- -------------- ----------- ------------ ---------- -----------------
  549 row cache lock                 844170              0      510871          .61          6        5108705693
  367 row cache lock                 842831              0      510068          .61          5        5100680960
  549 library cache: mutex X          51920              0       12591          .24          2         125907876
  367 library cache: mutex X          46584              0       10950          .24          2         109501712
  549 latch: row cache objects         5198              0          17            0          0            165028
  367 latch: row cache objects         4963              0          16            0          0            156902
  549 cursor: mutex X                  1920              0         420          .22          1           4202576
  367 cursor: mutex X                  1746              0         380          .22          1           3801026
  549 latch: shared pool               1496              0           4            0          0             43000
  367 latch: shared pool               1344              0           4            0          0             39746

select event, total_waits, total_timeouts, time_waited, average_wait 
      ,total_waits_fg, total_timeouts_fg, time_waited_fg, average_wait_fg
from v$system_event where lower(event) like '%row cache%';

  EVENT                     TOTAL_WAITS TOTAL_TIMEOUTS TIME_WAITED AVERAGE_WAIT TOTAL_WAITS_FG TOTAL_TIMEOUTS_FG TIME_WAITED_FG AVERAGE_WAIT_FG
  ------------------------- ----------- -------------- ----------- ------------ -------------- ----------------- -------------- ---------------
  latch: row cache objects        14106              0      208739         14.8          13975              0            113296            8.11
  row cache lock                3440163              0    11800085         3.43        3440163              0          11800085            3.43

select chain_signature, osid, pid, sid, blocker_is_valid bvalid, blocker_sid, 
       p1, p1_text, p2, p3, in_wait_secs, num_waiters
  from v$wait_chains w
 order by in_wait_secs desc nulls first;
 
  CHAIN_SIGNATURE                   OSID PID SID BVALI BLOCKER_SID P1 P1_TEXT  P2 P3 IN_WAIT_SECS NUM_WAITERS
  --------------------------------- ---- --- --- ----- ----------- -- -------- -- -- ------------ -----------
  <not in a wait><='row cache lock' 2873  20 367 FALSE                                                      1
  <not in a wait><='row cache lock' 2876  22 549 TRUE          367 10 cache id  0  3            0           0

select * from v$lock where sid in (367, 549);

  ADDR      KADDR     SID TY ID1 ID2 LMODE REQUEST CTIME BLOCK 
  --------- --------- --- -- --- --- ----- ------- ----- ----- 
  18C2A1A30 18C2A1AA8 549 AE 100   0     4       0  5141     0 
  18C2A1818 18C2A1890 367 AE 100   0     4       0  5164     0 
  18C236F68 18C236FE0 549 DT   0   0     4       0     0     0 
  18C29CF40 18C29CFB8 367 DT   0   0     4       0     0     0 
Above v$wait_chains.P1 = 10 ("cache id") is dc_users. Mode requested P3 = 3 is share mode.

Last query on v$lock shows DT lock is acquired (we will have more look later).

Note that if SID1 and SID2 are opened with a user other than test_user, run the same test, P3 = 5.

If let SID2 execute alter statement on another user, for example, test_user_2:

'alter user test_user_2 identified by 123'
the session wait events are:
  buffer busy waits
  latch: cache buffers chains
As we know, Row Cache or Data Dictionary Cache in SGA shared pool holds data as rows instead of buffers. Each row cache enqueue lock is a lock on one individual row (see later discussions on v$rowcache_parent).

select * from v$sgastat where name = 'row cache';

  POOL        NAME        BYTES
  ----------- --------- -------
  shared pool row cache 8640160
After the test, kill both SID1 and SID2.


3. Blocking Test


Open two new Sqlplus sessions: SID1 (sid:371, spid:2883), and SID2 (sid:559, spid:2886).

Open one UNIX window, start a Dtrace script on SID1:

$> sudo dtrace -w -n 'pid$target:oracle:kqrpre1:return{@CNT[ustack(15, 0)] = count(); stop(); exit(0);}' -p 2883
In SID1, change user identification:

SID1 (371) >  alter user test_user identified by 123;
Immediately SID1 is suspended, and Dtrace output shows:

  dtrace: description 'pid$target:oracle:kqrpre1:return' matched 1 probe
            oracle`kqrpre1+0x6ee
            a.out`kkdlGetBaseUser+0xd7
            a.out`kzulgt1+0x91
            a.out`kksLockUserSchema+0x4f
            a.out`kksLoadChild+0x527
            a.out`kglobld+0x422
            a.out`kglobpn+0x4d0
            a.out`kglpim+0x1e9
            a.out`kglpin+0x6f9
            oracle`kxsGetRuntimeLock+0x404
            oracle`kksfbc+0x146c
            a.out`kkspsc0+0x9d7
            oracle`kksParseCursor+0x74
            oracle`opiosq0+0x70a
            a.out`kpooprx+0x102
              1
Run the same alter statement in SID2:

SID2 (559) > alter user test_user identified by 123;
SID2 is blocked by SID1 with wait event: 'row cache lock', but p3=5 (exclusive mode).

select chain_signature, osid, pid, sid, blocker_is_valid bvalid, blocker_sid, 
       p1, p1_text, p2, p3, in_wait_secs, num_waiters
  from v$wait_chains w
 order by in_wait_secs desc nulls first;
 
  CHAIN_SIGNATURE                   OSID PID SID BVALI BLOCKER_SID P1 P1_TEXT  P2 P3 IN_WAIT_SECS NUM_WAITERS
  --------------------------------- ---- --- --- ----- ----------- -- -------- -- -- ------------ ------------
  <not in a wait><='row cache lock' 2883  24 371 FALSE                                                      1
  <not in a wait><='row cache lock' 2886  26 559 TRUE          371 10 cache id  0  5          111           0
Here the Call Stack of SID2:

$> pstack 2886
   ffff80ffbbf93d4b semsys   (4, 27, ffff80ffbfff2d48, 1, ffff80ffbfff2d50)
   000000000578acd8 sskgpwwait () + f8
   000000000578a965 skgpwwait () + c5
   0000000005944ffc ksliwat () + 8dc
   0000000005944350 kslwaitctx () + 90
   0000000005bec5b8 kqrget () + 498
   0000000005b99696 kqrLockAndPinPo () + 256
   0000000005b95644 kqrpre1 () + 8d4
   000000000b787bc3 kzdugt () + 1b3
   000000000b77839d kzuial () + 6dd
   0000000005709a35 opiexe () + 4cf5
   0000000005bcdb5e opiosq0 () + 1e2e
   0000000005d202d2 kpooprx () + 102
   0000000005c72b38 kpoal8 () + 308
Resume all above executions by:

$> prun 2883
We can also make a further test on "latch: row cache objects". At first, executing the query in Blog: Oracle ROWCACHE Views and Contents - Section: 2.5. dc_users, we can find the latch address of dc_users: TEST_USER.

select user_or_role_name, indx, hash, address, cache#, cache_name, existent, lock_mode, lock_request from (
  select to_number(substr(key, 3, 2)||substr(key, 1, 2), 'XXXX') key_len,
         dump_hex2str(rtrim(substr(key, 5, 2*to_number(substr(key, 3, 2)||substr(key, 1, 2), 'XXXX')), '0')) user_or_role_name, 
         v.* 
  from v$rowcache_parent v 
  where cache_name in ('dc_users') 
  order by key)
where user_or_role_name in ('TEST_USER');

  USER_OR_ROLE_NAME  INDX  HASH ADDRESS          CACHE# CACHE_NAME E  LOCK_MODE LOCK_REQUEST
  ----------------- ----- ----- ---------------- ------ ---------- - ---------- ------------
  TEST_USER         16173 11383 000000014D5AAC08     10 dc_users   Y          0            0
  TEST_USER         12629 53120 000000014D5AAC08     10 dc_users   Y          0            0
Start Dtrace below on SID1:

$> sudo dtrace -w -n '
pid$target:oracle:kqrLockAndPinPo:entry /arg1 == 0x14D5AAC08/
{printf("\n%s:%s (arg0=>0x%X, arg1=>0x%X, arg2=>0x%X, arg3=>0x%X, arg4=>0x%X, arg5=>0x%X, arg6=>0x%X, arg7=>0x%X, arg8=>0x%X)", 
        probefunc, probename, arg0, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8);
@CNT[ustack(15, 0)] = count(); stop(); exit(0);
}' -p 2883
In SID1, change user identification:

SID1 (371) >  alter user test_user identified by 123;

SID1 is suspended, and Dtrace shows kqrpre (row cache parent read) calling kqrLockAndPinPo:

  dtrace: description '
  pid$target:oracle:kqrLockAndPinPo:entry ' matched 1 probe
    kqrLockAndPinPo:entry (arg0=>0x7, arg1=>0x14D5AAC08, arg2=>0x17A0F8520, arg3=>0x3, arg4=>0x0, 
                           arg5=>0x0, arg6=>0x0, arg7=>0x5B9555E, arg8=>0x16FFF1D40)
       oracle`kqrLockAndPinPo
       oracle`kqrpre1+0x8d4
       oracle`kqrpre+0x24
       a.out`kkdlGetBaseUser+0xd7
       a.out`kzulgt1+0x91
       a.out`kksLockUserSchema+0x4f
       a.out`kksLoadChild+0x527
       a.out`kglobld+0x422
       a.out`kglobpn+0x4d0
       a.out`kglpim+0x1e9
       a.out`kglpin+0x6f9
       oracle`kxsGetRuntimeLock+0x404
       oracle`kksfbc+0x146c
       a.out`kkspsc0+0x9d7
       oracle`kksParseCursor+0x74
         1
In SID2, change user identification:

SID2 (559) > alter user test_user identified by 123;
SID2 is blocked by SID1 with wait event: 'latch: row cache objects'.

select chain_signature, osid, pid, sid, blocker_is_valid bvalid, blocker_sid, 
       p1, p1_text, p2, p3, in_wait_secs, num_waiters
  from v$wait_chains w
 order by in_wait_secs desc nulls first;

  CHAIN_SIGNATURE                              OSID  PID SID BVALI BLOCKER_SID         P1 P1_TEXT     P2 P3 IN_WAIT_SECS NUM_WAITERS
  -------------------------------------------- ----- --- --- ----- ----------- ---------- ---------- --- -- ------------ -----------
  <not in a wait><='latch: row cache objects'   2883  24 371 FALSE                                                                 4
  <not in a wait><='latch: row cache objects'  15590  34 723 TRUE          371 6490635480 address    411  0          163           0
  <not in a wait><='latch: row cache objects'  16187  31 187 TRUE          371 6490635480 address    411  0          126           0
  <not in a wait><='latch: row cache objects'  16189  32 375 TRUE          371 6490635480 address    411  0          125           0
  <not in a wait><='latch: row cache objects'   2886  26 559 TRUE          371 6490635480 address    411  0           18           0
  
  5 rows selected.   

  Legend:
     SID PROGRAM
     --- ----------------------------------------
     371 sqlplus.exe     <== SID1
     723 oracle@testdb (CJQ0)                 
     187 oracle@testdb (W003) KTSJ KTSJ Slave
     375 oracle@testdb (W001) KTSJ KTSJ Slave
     559 sqlplus.exe     <== SID2
Above v$wait_chains output also shows that there are other sessions also blocked by SID1 on 'latch: row cache objects'.

Here the Call Stack of SID2.

$ > pstack 2886
   ffff80ffbbf93d4b semsys   (2, 27, ffff80ffbfff2fe8, 1, 1b)
   000000000578adc5 sskgpwwait () + 1e5
   000000000578a965 skgpwwait () + c5
   000000000577cb19 kslges () + 5b9
   0000000005b9549e kqrpre1 () + 72e
   0000000005b32014 kqrpre () + 24
   000000000b265c97 kkdlGetBaseUser () + d7
   000000000b75bf31 kzulgt1 () + 91
   000000000c7ca61f kksLockUserSchema () + 4f
   0000000005d23e17 kksLoadChild () + 527
   0000000005dee052 kglobld () + 422
   0000000005ded180 kglobpn () + 4d0
   0000000005dec229 kglpim () + 1e9
   0000000005df9c59 kglpin () + 6f9
   0000000005b45804 kxsGetRuntimeLock () + 404
   0000000005b3bbac kksfbc () + 146c
   0000000005d20dd7 kkspsc0 () + 9d7
   0000000005bd04b4 kksParseCursor () + 74
   0000000005bcc43a opiosq0 () + 70a
   0000000005d202d2 kpooprx () + 102
   0000000005c72b38 kpoal8 () + 308


4. "row cache lock" Internals


In this section, we try to trace alter statement with event 10046, 10704 and Dtrace. Then look their output.

First set events: 10046 and 10704 in SID1:

SID1 > 
  alter session set events='10046 trace name context forever, level 1 : 10704 trace name context forever, level 3' 
                    tracefile_identifier='10704_rcl_7';
Then in UNIX Window, start Dtrace on SID1:

$> sudo dtrace -w -n '
pid$target:oracle:ksqgtlctx:entry,
pid$target:oracle:ksqrcl:entry, 
pid$target:oracle:kqrLockAndPinPo:entry
{printf("\n%s:%s (arg0=>0x%X, arg1=>0x%X, arg2=>0x%X)", probefunc, probename, arg0, arg1, arg2);
}' -p 2883
Run alter statement:
     
SID1 > alter user test_user identified by 123;
Terminate tracing events:

SID1 > alter session set events='10046 trace name context off : 10704 trace name context off '; 
Here the output of tracing events 10046 and 10704 (unrelated text are removed). It shows all SQL statemens, and 3 enqueue locks: DT, TM, TX.

PARSING IN CURSOR #18446604434603456984 sqlid='4dq0n3hjpcshj'
  alter user test_user iden

ksqgtl *** DT-00000000-00000000-00000000-00000000 mode=4 flags=0x10400 timeout=21474836 ***
ksucti: init txn DID from session DID 0001-001A-00006AC6

PARSING IN CURSOR #18446604434601674064 sqlid='g2rtajauwtbb6'
  select 1 from dual where exists (select 1 from sys.sysauth$ sa where privilege#=:1 and not exists 
    (select 1 from sys.user$ u where sa.grantee# = u.user# and u.type# = 1)) 
           or exists (select 1 from sys.codeauth$ where privilege#=:1)

PARSING IN CURSOR #18446604434601637344 sqlid='2z0udr4rc402m'
  select exptime, ltime, astatus, lcount from user$ where user#=:1

PARSING IN CURSOR #18446604434601672424 sqlid='0y6nfkk5dwd60'
  select pwd_verifier, pv_type from default_pwd$ where user_name = :1

PARSING IN CURSOR #18446604434601670784 sqlid='6mcm7j3g90vub'
  update user$ set user#=:1,password=:3,datats#=:4,tempts#=:5,type#=:6,defrole=:7,resource$=:8,
    ptime=DECODE(to_char(:9, 'YYYY-MM-DD'), '0000-00-00', to_date(NULL), :9),defschclass=:10, 
    spare1=:11, spare4=:12 where name=:2

ksqgtl *** TM-0018475C-00000000-00000000-00000000 mode=3 flags=0x400 timeout=21474836 ***
ksqgtl *** TX-00040006-007717CE-00000000-00000000 mode=6 flags=0x401 timeout=0 ***

EXEC #18446604434601670784

PARSING IN CURSOR #18446604434601669144 sqlid='b84cknyvnyq25'
  update user$ set exptime=DECODE(to_char(:2, 'YYYY-MM-DD'), '0000-00-00', to_date(NULL), :2),
    ltime=DECODE(to_char(:3, 'YYYY-MM-DD'), '0000-00-00', to_date(NULL), :3),
    astatus = :4, lcount = :5 where user#=:1

EXEC #18446604434601669144

ksqrcl: TX-00040006-007717CE-00000000-00000000
ksqrcl: TM-0018475C-00000000-00000000-00000000
ksqrcl: DT-00000000-00000000-00000000-00000000
We can see that DT lock is first acquired immediately after alter statement. DT is for Default Temporary Tablespace enqueue request, described as Serializes changing the default temporary tablespace and user creation.

For a normal table update, no DT lock is required.

TM and TX locks are acquired before last update statement.

The "TM-0018475C" lock is on sys.user$ table:

select owner, object_name, object_id, object_type 
from dba_objects where object_id = to_number('18475C', 'XXXXXX');

  OWNER OBJECT_NAM OBJECT_ID OBJECT_TYPE
  ----- ---------- --------- -----------
  SYS   USER$        1591132 TABLE
The involved tables are: sys.user$, sys.sysauth$, sys.codeauth$, and default_pwd$. which are all above user identifications.

Here Dtrace output (some similar lines are removed), where arg0 is v$rowcache_parent.cache#, arg1 is v$rowcache_parent.address, arg3 seems mode requested for kqrLockAndPinPo.

  kqrLockAndPinPo:entry (arg0=>0x7, arg1=>0x14D5AAC08, arg2=>0x18EE596C8)
  kqrLockAndPinPo:entry (arg0=>0x7, arg1=>0x14D5AAC08, arg2=>0x18EE596C8)
  kqrLockAndPinPo:entry (arg0=>0xA, arg1=>0x1688AEB50, arg2=>0x18EE596C8)
  kqrLockAndPinPo:entry (arg0=>0xA, arg1=>0x1688AEB50, arg2=>0x18EE596C8)
  kqrLockAndPinPo:entry (arg0=>0xA, arg1=>0x1688AEB50, arg2=>0x18EE596C8)
  kqrLockAndPinPo:entry (arg0=>0x7, arg1=>0x14D5AAC08, arg2=>0x18EE596C8)
  kqrLockAndPinPo:entry (arg0=>0x7, arg1=>0x166398748, arg2=>0x18EE596C8)
  kqrLockAndPinPo:entry (arg0=>0xA, arg1=>0x14D5AAC08, arg2=>0x184EB81B8)
  ksqgtlctx:entry       (arg0=>0x18C29CFB8, arg1=>0x4, arg2=>0x0)
  kqrLockAndPinPo:entry (arg0=>0x7, arg1=>0x166398748, arg2=>0x18EE596C8)
  kqrLockAndPinPo:entry (arg0=>0x23, arg1=>0x17BE91DB0, arg2=>0x18EE596C8)
  kqrLockAndPinPo:entry (arg0=>0xE, arg1=>0x1789631B0, arg2=>0x18EE596C8)
  kqrLockAndPinPo:entry (arg0=>0x3A, arg1=>0x178984728, arg2=>0x18EE596C8)
  ksqgtlctx:entry       (arg0=>0x184CE0220, arg1=>0x3, arg2=>0x0)
  ksqgtlctx:entry       (arg0=>0x184EB8238, arg1=>0x6, arg2=>0x0)
  ksqrcl:entry          (arg0=>0x184EB8238, arg1=>0x1, arg2=>0xFFFF80FFBFFF48A0)
  ksqrcl:entry          (arg0=>0x184CE0220, arg1=>0x1, arg2=>0x0)

Legend
  arg0 in kqrLockAndPinPo:
     select distinct to_char(cache#, 'XX') cache#_hex, cache#, cache_name
     from v$rowcache_parent where cache# in (7, 10, 35, 14, 58) order by cache#;
     
       CACHE#_HEX CACHE# CACHE_NAME
       ---------- ------ --------------------------
         7             7 dc_users
         A            10 dc_users
         E            14 dc_profiles
        23            35 triton security name to ID
        3A            58 dc_pdbdba
  
  arg0 in ksqgtlctx and ksqrcl is v$lock.kaddr, for example, 0x18C29CFB8 is DT's KADDR in above v$lock.
First there are a few kqrLockAndPinPo calls for "latch: row cache objects" requests, then enqueue get: ksqgtlctx and release: ksqrcl on "row cache lock".

Following query shows the involved dc_users:

select user_or_role_name, indx, hash, address, cache#, cache_name, existent, lock_mode, lock_request from (
  select to_number(substr(key, 3, 2)||substr(key, 1, 2), 'XXXX') key_len,
         dump_hex2str(rtrim(substr(key, 5, 2*to_number(substr(key, 3, 2)||substr(key, 1, 2), 'XXXX')), '0')) user_or_role_name, 
         v.* 
  from v$rowcache_parent v 
  where cache_name in ('dc_users') 
  order by key)
where address in ('00000001688AEB50', '000000014D5AAC08', '0000000166398748');

  USER_OR_ROLE_NAME  INDX  HASH ADDRESS          CACHE# CACHE_NAME E LOCK_MODE LOCK_REQUEST
  ----------------- ----- ----- ---------------- ------ ---------- - --------- ------------
  OUTLN             16187 21366 00000001688AEB50     10 dc_users   Y         0            0
  OUTLN             12616 41803 00000001688AEB50     10 dc_users   Y         0            0
  PUBLIC            16189 25134 0000000166398748      7 dc_users   Y         0            0
  PUBLIC            12595 19631 0000000166398748      7 dc_users   Y         0            0
  TEST_USER         16173 11383 000000014D5AAC08     10 dc_users   Y         0            0
  TEST_USER         12629 53120 000000014D5AAC08     10 dc_users   Y         0            0
We can also extend the query to find involved sessions and (recursive) transactions:

select user_or_role_name, r.indx, hash, address, cache#, cache_name, existent, lock_mode, lock_request, txn, r.saddr, s.*, t.* from (
  select to_number(substr(key, 3, 2)||substr(key, 1, 2), 'XXXX') key_len,
         k.dump_hex2str(rtrim(substr(key, 5, 2*to_number(substr(key, 3, 2)||substr(key, 1, 2), 'XXXX')), '0')) user_or_role_name, 
         v.* 
  from v$rowcache_parent v
  where cache_name in ('dc_users') 
  order by key) r, gv$session s, x$ktcxb t
where r.saddr = s.saddr(+) and r.txn = t.ktcxbxba(+)
  and txn not in ('00');
  
  -- gv$transaction is defined on x$ktcxb with filter: bitand (ksspaflg, 1) != 0 and bitand (ktcxbflg, 2) != 0