Wednesday, October 10, 2018

Result Cache: RC Enqueue and Contention (V)

This Blog will discuss 'enq: RC - Result Cache: Contention' when two (or multi) sessions adding the same Result Cache item at about the same time.

In the previous 4 Blogs, we showed 4 different cases of Result Cache Invalidation and the subsequent Result Cache: RC Latch Contention.
1. PL/SQL Function Result Cache Invalidation (I) (DML and Select for Update)
2. Result Cache: RC Latch Contention and RESULT_CACHE_MAX_SIZE (II))
3. Result Cache: RC Latch Contention and PL/SQL Exception Handler (III)
4. Result Cache:0 rows updated Invalidations (IV) 
Note: All tests are done in Oracle 12.1.0.2.


1. RC Latch vs. RC Enqueue


Adding a new RC item requires 4 RC Latch Gets (1 S Mode and successive 3 X Mode) and 4 subsequent Releases (kslfre), so that the single RC Latch is held solely in the necessary critical code path for a short time during whole RC item building time. (see Blog: PL/SQL Function Result Cache Invalidation (I))

Prepare Test:

--============ Test Setup ============--
create or replace function test_build_time (p_id number) return number result_cache as
begin
 dbms_lock.sleep(p_id);
  return p_id;
end;
/

SQL > exec dbms_result_cache.flush;
Open two Sessions to add two different new RC items at about same time.

--============ Session_1 @T1 (started at 05:06:24) ============--
05:06:08 SQL > var ret number;
05:06:15 SQL > exec :ret := test_build_time(9.1);
Elapsed: 00:00:09.12
05:06:33 SQL >

--============ Session_2 @T2 (started at 05:06:25) ============--
05:06:09 SQL > var ret number;
05:06:16 SQL > exec :ret := test_build_time(9.2);
Elapsed: 00:00:09.22
05:06:34 SQL >
Above test demonstrates that each adding takes the specified build_time (09.12, 09.22), hence no RC Latch contentions. Both seemed running independently alone even though they are running concurrently.

Now it arises the question, what happens when two different sessions adding identical RC item at about same time ?

RC Latch (ksl_get_shared_latch) does not have the knowledge of RC item content, and two sessions do not hold it during whole RC item building time. Therefore RC Latch is not able to protect the uniqueness of this new RC item.

The Oracle answer is RC Enqueue to coordinate such access, analogue to TX Enqueue.


2. Building Phases


Run test:

--============ Test Run ============--
SQL > exec :ret := test_build_time(15.8);

--============ Test Output ============--
06:53:10 SQL > var ret number;
06:53:20 SQL > exec :ret := test_build_time(15.8);
Elapsed: 00:00:15.80
06:53:36 SQL >
If during the Test Run, we frequently run query (for example, with SQLcl repeat command):

select id, status, name, creation_timestamp, (sysdate-creation_timestamp)*86400 elapsed_seconds, 
       scan_count, lru_number, build_time --, cache_id, cache_key
from v$result_cache_objects 
where type = 'Result' and name like '%TEST_BUILD_TIME%'
order by id desc;
We can observe that adding a new RC item experiences 3 phases, with status changed at different timestamp:
  New (0 - 10 second)
      -> Bypass (10 - 15.8 second)  
            -> Published (after 15.8 second)
10 seconds is a timeout and will be looked later.

At the end, we have the output:

 ID STATUS    NAME                      CREATION_TIMESTAMP  ELAPSED_SECONDS SCAN_COUNT LRU_NUMBER BUILD_TIME
--- --------- ------------------------- ------------------- --------------- ---------- ---------- ----------
  1 Published "K"."TEST_BUILD_TIME"::8. 2018-10-09 06:53:20              17          0        886       1580
The two columns in v$result_cache_objects denote:
  CREATION_TIMESTAMP: Starting TimeStamp of Build
  BUILD_TIME:         Elapsed Time of Build (in centisecond)
The intermediate status "Bypass" (10 - 15.8 second) seems related to "Bypass Object (BO)" mentioned in:
Bug 23738186 : HIGH RESULT CACHE LATCHES AFTER MIGRATION TO 12.1.0.2
(superseded by Bug 26436717 : RESULT CACHE: "BYPASS OBJECTS" INCORRECTLY USED BY WAITING PROCESSES.
This bug fixes a flaw in the original fix for bug# 23738186)

See Blog: Result Cache: RC Latch Contention and PL/SQL Exception Handler (III)


3. Result Cache: RC Enqueue


Run following test, we can see that adding a new Row requires 2 enq REQs (311-309=2).

select eq_name, total_req#, succ_req#, cum_wait_time from v$enqueue_statistics where eq_type in ('RC');
exec :ret := test_build_time(16.8);
select eq_name, total_req#, succ_req#, cum_wait_time from v$enqueue_statistics where eq_type in ('RC');

--============ Test Output ============--
11:53:24 SQL > select eq_name, total_req#, succ_req#, cum_wait_time from v$enqueue_statistics where eq_type in ('RC');
  EQ_NAME                        TOTAL_REQ#  SUCC_REQ# CUM_WAIT_TIME
  ------------------------------ ---------- ---------- -------------
  Result Cache: Enqueue                 309        309             0

11:54:02 SQL > exec :ret := test_build_time(16.8);
Elapsed: 00:00:16.82

11:54:19 SQL > select eq_name, total_req#, succ_req#, cum_wait_time from v$enqueue_statistics where eq_type in ('RC');
  EQ_NAME                        TOTAL_REQ#  SUCC_REQ# CUM_WAIT_TIME
  ------------------------------ ---------- ---------- -------------
  Result Cache: Enqueue                 311        311             0

4. Two Sessions adding the same Result Cache item at about the same time


Add the same Result Cache item at about the same time in two sessions, and trace both session with Event 10704 & 10046.

During test, run query below to watch 'enq: RC - Result Cache: Contention':

  select * from v$wait_chains;

4.1. Session_1 adding item started at 18:55:06


--============ Session_1 @T1 (18:55:06) ============--
exec dbms_result_cache.flush;

set time on timing on
column eq_name format a25
var ret number;

select eq_name, total_req#, total_wait#, succ_req#, failed_req#, cum_wait_time from v$enqueue_statistics where eq_type in ('RC');
alter session set events='10046 trace name context forever, level 12 : 10704 trace name context forever, level 3' tracefile_identifier='RC_Enq_1';
exec :ret := test_build_time(18.8);
alter session set events='10046 trace name context off : 10704 trace name context off ';
select eq_name, total_req#, total_wait#, succ_req#, failed_req#, cum_wait_time from v$enqueue_statistics where eq_type in ('RC');

--============ Session_1 Output (started at 18:55:06)============--
18:55:04 SQL> select eq_name, total_req#, total_wait#, succ_req#, failed_req#, cum_wait_time from v$enqueue_statistics where eq_type in ('RC');
  EQ_NAME                   TOTAL_REQ# TOTAL_WAIT#  SUCC_REQ# FAILED_REQ# CUM_WAIT_TIME
  ------------------------- ---------- ----------- ---------- ----------- -------------
  Result Cache: Enqueue            670           8        666           4        215480

18:55:06 SQL> exec :ret := test_build_time(18.8);
Elapsed: 00:00:18.83

18:55:25 SQL> select eq_name, total_req#, total_wait#, succ_req#, failed_req#, cum_wait_time from v$enqueue_statistics where eq_type in ('RC');
  EQ_NAME                   TOTAL_REQ# TOTAL_WAIT#  SUCC_REQ# FAILED_REQ# CUM_WAIT_TIME
  ------------------------- ---------- ----------- ---------- ----------- -------------
  Result Cache: Enqueue            674           9        669           5        225480

--============ Session_1 Trace (irelevant lines removed) ============--
*** SESSION ID:(189.37589) 2018-10-09 18:55:06.768
PARSING IN CURSOR #139879855170464 len=43 dep=0 uid=49 oct=47 lid=49 tim=12989538062736 hv=3533163271 ad='1513d9108' sqlid='4ffwn7z99gks7'
BEGIN :ret := test_build_time(18.8); END;
ksqgtl *** RC-00000002-FFFFFFFF-00000000-00000000 mode=4 flags=0x10400 timeout=21474836 ***
ksqgtl: RETURNS 0
ksqgtl *** RC-00000002-0024034D-00000000-00000000 mode=3 flags=0x10000 timeout=0 ***
ksqgtl: RETURNS 0
ksqgtl *** RC-00000001-00000001-00000000-00000000 mode=6 flags=0x10000 timeout=0 ***
ksqgtl: RETURNS 0

*** 2018-10-09 18:55:25.592
WAIT #139879855170464: nam='PL/SQL lock timer' ela= 18805802 duration=0 p2=0 p3=0 obj#=-1 tim=12989556871029
EXEC #139879855170464:c=859,e=18808207,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=1,plh=0,tim=12989556871192

4.2. Session_2 adding item started at 18:55:10 (4 seconds after Session_1)


--============ Session_2 @T2 (18:55:10) ============--
set time on timing on
column eq_name format a25
var ret number;

select eq_name, total_req#, total_wait#, succ_req#, failed_req#, cum_wait_time from v$enqueue_statistics where eq_type in ('RC');
alter session set events='10046 trace name context forever, level 12 : 10704 trace name context forever, level 3' tracefile_identifier='RC_Enq_2';
exec :ret := test_build_time(18.8);
alter session set events='10046 trace name context off : 10704 trace name context off ';
select eq_name, total_req#, total_wait#, succ_req#, failed_req#, cum_wait_time from v$enqueue_statistics where eq_type in ('RC');

--============ Session_2 Output (started at 18:55:10, 4 seconds after Session_1) ============--
17:54:08 SQL> select eq_name, total_req#, total_wait#, succ_req#, failed_req#, cum_wait_time from v$enqueue_statistics where eq_type in ('RC');
  EQ_NAME                   TOTAL_REQ# TOTAL_WAIT#  SUCC_REQ# FAILED_REQ# CUM_WAIT_TIME
  ------------------------- ---------- ----------- ---------- ----------- -------------
  Result Cache: Enqueue            673           8        669           4        215480

18:55:10 SQL> exec :ret := test_build_time(18.8);
Elapsed: 00:00:28.84

18:55:39 SQL> select eq_name, total_req#, total_wait#, succ_req#, failed_req#, cum_wait_time from v$enqueue_statistics where eq_type in ('RC');
  EQ_NAME                   TOTAL_REQ# TOTAL_WAIT#  SUCC_REQ# FAILED_REQ# CUM_WAIT_TIME
  ------------------------- ---------- ----------- ---------- ----------- -------------
  Result Cache: Enqueue            674           9        669           5        225480

--============ Session_2 Trace (irelevant lines removed) ============--
*** SESSION ID:(363.50996) 2018-10-09 18:55:11.039
PARSING IN CURSOR #139824425361968 len=43 dep=0 uid=49 oct=47 lid=49 tim=12989542327336 hv=3533163271 ad='1513d9108' sqlid='4ffwn7z99gks7'
BEGIN :ret := test_build_time(18.8); END;
ksqgtl *** RC-00000001-00000001-00000000-00000000 mode=4 flags=0x10000 timeout=10 ***
ksqgtl: no transaction
ksqgtl: use existing ksusetxn DID
ksqgtl:
 ksqlkdid: 0001-001A-00000B01
*** ksudidTrace: ksqgtl
 ksusesdi:   0000-0000-00000000
 ksusetxn:   0001-001A-00000B01
ksqcmi: RC-00000001-00000001-00000000-00000000 mode=4 timeout=10

*** 2018-10-09 18:55:21.052
WAIT #139824425361968: nam='enq: RC - Result Cache: Contention' ela= 10003442 name|mode=1380122628 chunkNo=1 blockNo=1 obj#=1591122 tim=12989552331028
ksqcmi: returns 51
ksqgtl: RETURNS 51

*** 2018-10-09 18:55:39.858
WAIT #139824425361968: nam='PL/SQL lock timer' ela= 18805872 duration=0 p2=0 p3=0 obj#=-1 tim=12989571137038

4.3. Discussions


Sqlplus output shows:
  Session_1 started at 18:55:06, and took 18.83 seconds.
  
  Session_2 started at 18:55:10 (4 seconds later), and took 28.84 seconds.
            28.84 is (10 + 18.8). We will look it later.
Above trace shows:
  Session_1 made 3 "ksqgtl *** RC" in mode=4, 3, 6, all succeded "ksqgtl: RETURNS 0",
                    with ela= 18805802 us on 'PL/SQL lock timer'.
           
  Session_2 made 1 "ksqgtl *** RC" in mode=4, failed "ksqgtl: RETURNS 51", 
                    with ela= 10003442 us on 'enq: RC - Result Cache: Contention'
                     and ela= 18805872 us on 'PL/SQL lock timer'.
                    total ela is about 28.8 seconds.
           "ksqgtl *** RC" is marked with "timeout=10".

               name|mode=1380122628 chunkNo=1 blockNo=1 obj#=1591122
            means:
               "1380122628" = 0x52430004 = RC 04 
               "1591122" is object_id of SYS.OBJ$
               "chunkNo=1 blockNo=1" is RC row pointer, similar to Heap Table ROWID: "block_id row_number".  
v$enqueue_statistics query output is consistent with above trace output:
  4 REQ# (674-670), 1 WAIT# (9-8), 3 SUCC_REQ# (669-666), 1 FAILED_REQ# (5-4)
  CUM_WAIT_TIME increase 10000 ms (225480 - 215480).

5. Hidden Parameters: _result_cache_timeout


Above the trace output reveals "timeout=10" for 'enq: RC - Result Cache: Contention' (ela= 10003442). Lookup all 17 Result Cache Hidden Parameters listed in Blog: PL/SQL Function Result Cache Invalidation (I)

The only matched one with default value 10 is:
  _result_cache_timeout: maximum time (sec) a session waits for a result, default = 10
Let's make a few tests to gamble its behaviours:
    Each test is done by two sessions to add the same item with different time delay.
    Different tests are carried out by varying build_time.
    Then modify "_result_cache_timeout" from 10 to 33 to make more tests.


Case_1. build_time=8.5 < 10, Session_2 starts 2 seconds aftet Session_1.


-- Session_2 started within NEW phase (10 seconds timeout) of Session_1.
-- Item ready in Session_1 at 11:36:05. 
-- Session_2's timeout at 11:35:59 + 10 =  11:36:09.
-- (11:36:05 < 11:36:09): Session_2 can get result before timeout. 

--============ Session_1 (started at 11:35:57) ============--
11:35:06 SQL> exec :ret := test_build_time(8.5);
Elapsed: 00:00:08.51
11:36:05 SQL>

--============ Session_2 (started at 11:35:59) ============--
11:35:06 SQL> exec :ret := test_build_time(8.5);
Elapsed: 00:00:06.17
11:36:05 SQL>

Case_2. build_time=20.1 > 10, Session_2 starts 2 seconds aftet Session_1.


-- Session_2 started within NEW phase of Session_1.
-- Item ready in Session_1 at 11:52:35. 
-- Session_2's timeout at 11:52:17 + 10 =  11:52:27.
-- (11:52:35 > 11:52:27): Session_2 can NOT get result before timeout. 

--============ Session_1 (started at 11:52:15) ============--
11:47:34 SQL> exec :ret := test_build_time(20.1);
Elapsed: 00:00:20.10
11:52:35 SQL>

--============ Session_2 (started at 11:52:17) ============--
11:47:47 SQL> exec :ret := test_build_time(20.1);
Elapsed: 00:00:30.12     -- timeout(10) + build_time(20.1) = 30.12
11:52:47 SQL>

Case_3. build_time=20.2 > 10, Session_2 starts 13 seconds aftet Session_1.


-- Session_2 started within BYPASS phase of Session_1.
-- Item ready in Session_1 at 11:53:53. 
-- Session_2's timeout at 11:53:47 + 10 =  11:53:57.
-- (11:53:53 < 11:53:57): Session_2 can get result before timeout.  

--============ Session_1 (started at 11:53:33) ============--
11:52:35 SQL> exec :ret := test_build_time(20.2);
Elapsed: 00:00:20.21
11:53:53 SQL>

--============ Session_2 (started at 11:53:47) ============--
11:52:47 SQL> exec :ret := test_build_time(20.2);
Elapsed: 00:00:20.21
11:54:07 SQL>

Increase "_result_cache_timeout" from 10 to 33, and make further tests:

alter system set "_result_cache_timeout"=33;

Case_4. build_time=20.4 < 33, Session_2 starts 13 seconds aftet Session_1.


-- Session_2 started within NEW phase (33 seconds timeout) of Session_1.
-- Item ready in Session_1 at 12:19:15. 
-- Session_2's timeout at 12:19:08 + 33 =  12:19:42.
-- (12:19:15 < 12:19:42): Session_2 can get result before timeout.  

--============ Session_1 (started at 12:18:55) ============--
12:18:51 SQL> exec :ret := test_build_time(20.4);
Elapsed: 00:00:20.42
12:19:15 SQL>

--============ Session_2 (started at 12:19:08) ============--
12:19:06 SQL> exec :ret := test_build_time(20.4);
Elapsed: 00:00:07.00
12:19:15 SQL>

Case_5. build_time=40.2 > 33, Session_2 starts 8 seconds aftet Session_1.


-- Session_2 started within NEW phase of Session_1.
-- Item ready in Session_1 at 13:14:56. 
-- Session_2's timeout at 13:14:20 + 33 =  13:14:53.
-- (13:14:56 > 13:14:53): Session_2 can NOT get result before timeout.  

--============ Session_1 (started at 13:14:12) ============--
13:14:02 SQL> exec :ret := test_build_time(40.2);
Elapsed: 00:00:40.20
13:14:56 SQL>

--============ Session_2 (started at 13:14:20) ============--
13:12:57 SQL> exec :ret := test_build_time(40.2);
Elapsed: 00:01:13.21   -- timeout(33) + build_time(40.2) = 73.2 = 01:13.21
13:15:33 SQL>

Case_6. build_time=40.1 > 33, Session_2 starts 12 seconds aftet Session_1.


-- Session_2 started within NEW phase of Session_1.
-- Item ready in Session_1 at 12:24:39. 
-- Session_2's timeout at 12:24:11 + 33 =  12:24:43.
-- (12:24:39 < 12:24:43): Session_2 can get result before timeout.  

--============ Session_1 (started at 12:23:59) ============--
12:19:15 SQL> exec :ret := test_build_time(40.1);
Elapsed: 00:00:40.12
12:24:39 SQL>

--============ Session_2 (started at 12:24:11) ============--
12:19:15 SQL> exec :ret := test_build_time(40.1);
Elapsed: 00:00:28.71
12:24:39 SQL>

Case_7. build_time=40.3 > 33, Session_2 starts 35 seconds aftet Session_1.


-- Session_2 started within BYPASS phase of Session_1.
-- Item ready in Session_1 at 13:39:42. 
-- Session_2's timeout at 13:39:37 + 33 =  13:40:10.
-- 13:39:42 < 13:40:10): Session_2 can get result before timeout.  

--============ Session_1 (started at 13:39:02) ============--
13:38:39 SQL > exec :ret := test_build_time(40.3);
Elapsed: 00:00:40.30
13:39:42 SQL >

--============ Session_2 (started at 13:39:37) ============--
13:38:32 SQL > exec :ret := test_build_time(40.3);
Elapsed: 00:00:40.30
13:40:17 SQL >
In summary, it looks like:
   When Session_2 started within NEW phase of Session_1, it experienced a long wait if it can not get the result built by Session_1
   within its timeout, for example, Case_2 (_result_cache_timeout=10) and Case_5 (_result_cache_timeout=33).

   When Session_2 started within BYPASS phase of Session_1, it built the RC item itself, hence "BYPASS" the Session_1's Build.
   It took exact the specified time, for example, Case_3 (_result_cache_timeout=10) and Case_7 (_result_cache_timeout=33).

If we repeat the test of Section 4, trace file will show "timeout=33":

ksqgtl *** RC-00000001-00000001-00000000-00000000 mode=4 flags=0x10000 timeout=33 ***
By the way, above tests showed that Result Cache can not be used in "Delay" (or "TIMEOUT") functions since only Session_1 has a specified duration in the first time, all other function calls take (almost) 0 time, or nondeterministic time.