Monday, September 18, 2017

Result Cache: RC Latch Contention and PL/SQL Exception Handler (III)

If we have fixed all the causes of Result Cache: RC Latch Contention discussed in previous two Blogs, the system still suffers from such Contention. There could exist the cases, which we have not yet covered.

In this Blog, we will discuss the third case: "Result Cache: RC Latch Contention and PL/SQL Exception Handler". So in total we have found 3 Cases:
1. PL/SQL Function Result Cache Invalidation (I) (DML and Select for Update)
2. Result Cache: RC Latch Contention and RESULT_CACHE_MAX_SIZE (II))
3. Result Cache: RC Latch Contention and PL/SQL Exception Handler (III)
Note: All tests are done in Oracle 11.2.0.4, 12.1.0.2, 12.2.0.1 on AIX, Solaris, Linux with 6 physical processors.

Update (2017-Sep-19): One day after this Blog appeared, it turns out that there exists an on-going bug fix, which probably addresses Case (II) and (III).
    Bug 23738186 : HIGH RESULT CACHE LATCHES AFTER MIGRATION TO 12.1.0.2
This fix addresses the Result Cache (RC) latch contention when the RC is in the LRU regime.
The following changes are part of this fix:

1. Bypass Object (BO)
    Currently, when a RC function throws an exception or a RC SQL exceeds memory
    limit, the under-construction Result Object (RO) is marked invalid and released.
    The customer's application has some RC functions that have a high rate of
    exceptions, which leads to high rate of invalid RO creation, which in turn leads
    to noticeable x-latch contention. To reduce these wasteful RO creations, we will
    instead store a BO when the function+parameters throws an exception. A BO is a
    regular RO that is created, on client directive, by the RC, but unlike regular
    RO's a BO can never be read. A BO uses exactly one block that contains the RO
    header but no data. A client trying to read a BO always gets a (non-storable)
    miss which forces the client to handle the miss by executing the function.
    
    Note, unlike regular misses, a non-storable miss does not allow the client to
      create a new RO. We have introduced two new statistics in v$result_cache_statistics
      to track the creation and use of BO's (for an example see tkqgcbug.sql/log).
 
2. Group release of invalid objects
    Release all invalid objects together (instead of one at a time).
 
3. Improved LRU list processing
    Instead of scanning all object on the list looking for the min lru#, stop
    the scan when the first un-scanned object (i.e. scan_count is 0) is reached.


1. Test Code


Out test table (TEST_TAB) contains 4 rows (1-4). Any select not in (1, 2, 3, 4) will get NO_DATA_FOUND exception.

We create 4 result_cache functions for 4 different scenarios:
  1. rc_c1: exception handler and return 
  2. rc_c2: exception handler and no return
  3. rc_c3: exception handler, raise again
  4. rc_c4: no exception handler

--------------------------------- Test Setup ---------------------------------
drop table test_tab;
drop table rc_latch_hist;
drop table rc_stats_hist;

create table test_tab as select level x from dual connect by level <= 4;

create table rc_latch_hist as 
select 'aaaaaaaaaaaaaaaaaaaa' snap_name, systimestamp ts, v.* from v$latch v where name = 'Result Cache: RC Latch'; 

create table rc_stats_hist as 
select 'aaaaaaaaaaaaaaaaaaaa' snap_name, systimestamp ts, v.* from v$result_cache_statistics v;

-- exception handler: return
create or replace function rc_c1(p_x number) return number result_cache as
  l_ret number;
begin
  select x into l_ret from test_tab where x = p_x;
  return l_ret;
  exception when others then return null;   --NO_DATA_FOUND
end;
/

-- exception handler: no return
create or replace function rc_c2(p_x number) return number result_cache as
  l_ret number;
begin
  select x into l_ret from test_tab where x = p_x;
  return l_ret;
  exception when others then null;
end;
/

-- exception handler: raise again
create or replace function rc_c3(p_x number) return number result_cache as
  l_ret number;
  my_excep exception;
  pragma exception_init(my_excep, -20001);
begin
  select x into l_ret from test_tab where x = p_x;
  return l_ret;
  exception when others then raise_application_error(-20001, 'My Private Exception'); 
end;
/

-- no exception handler
create or replace function rc_c4(p_x number) return number result_cache as
  l_ret number;
begin
  select x into l_ret from test_tab where x = p_x;
  return l_ret;
end;
/

create or replace procedure rc_proc_case(p_case number, p_rc_limit number) as
  l_sql_bool_val boolean := true;
  l_ret varchar2(10);
begin
 dbms_application_info.set_module('RC Test', 'RC_CASE_'||p_case);
  for x in 1..p_rc_limit loop
   begin
    case p_case 
      when 1 then l_ret := rc_c1(x);
      when 2 then l_ret := rc_c2(x);
      when 3 then l_ret := rc_c3(x);
      when 4 then l_ret := rc_c4(x);
    end case;
    exception when others then null;
    end;
  end loop;
end;
/

create or replace procedure rc_case_job(p_case number, p_job_cnt number, p_rc_limit number) as
  l_job_id pls_integer;
begin
  for i in 1.. p_job_cnt loop
    dbms_job.submit(l_job_id, 'begin while true loop rc_proc_case('||p_case||', '||p_rc_limit||'); end loop; end;');
  end loop;
  commit;
end;    
/

create or replace procedure clean_jobs as
begin
  for c in (select job from dba_jobs) loop
    begin
       dbms_job.remove (c.job);
    exception when others then null;
    end;
    commit;
  end loop;

  for c in (select d.job, d.sid, (select serial# from v$session where sid = d.sid) ser 
              from dba_jobs_running d) loop
    begin
      execute immediate
             'alter system kill session '''|| c.sid|| ',' || c.ser|| ''' immediate';
      dbms_job.remove (c.job);
    exception when others then null;
    end;
    commit;
  end loop;
  
  -- select * from dba_jobs;
  -- select * from dba_jobs_running;
end;
/

create or replace procedure rc_record_rc_latch(p_snap_name varchar2) as
begin
  insert into rc_latch_hist
  select p_snap_name snap_name, systimestamp ts, v.* from v$latch v where name = 'Result Cache: RC Latch';
  insert into rc_stats_hist
  select p_snap_name snap_name, systimestamp ts, v.* from v$result_cache_statistics v;
  commit; 
end;    
/

create or replace procedure rc_case_test(p_sleep number) as
begin
  clean_jobs;
  rc_record_rc_latch('Start');             
  rc_case_job(p_case=> 1, p_job_cnt=> 2, p_rc_limit=> 4); dbms_lock.sleep(p_sleep);
  rc_record_rc_latch('No_Error');          
  rc_case_job(p_case=> 1, p_job_cnt=> 1, p_rc_limit=> 5); dbms_lock.sleep(p_sleep);
  rc_record_rc_latch('Except_Return') ;     
  rc_case_job(p_case=> 2, p_job_cnt=> 1, p_rc_limit=> 5); dbms_lock.sleep(p_sleep);
  rc_record_rc_latch('Except_No_Return');  
  rc_case_job(p_case=> 3, p_job_cnt=> 1, p_rc_limit=> 5); dbms_lock.sleep(p_sleep);
  rc_record_rc_latch('Except_Raise');      
  rc_case_job(p_case=> 4, p_job_cnt=> 1, p_rc_limit=> 5); dbms_lock.sleep(p_sleep);
  rc_record_rc_latch('No_Hanler');
  clean_jobs;
end;    
/


2. Single Session Observation Test


Run following small test code:

column name format    a20
column rc_name format a48
column value format   a10
set numformat         999,999,999,999,999
column id format      999,999
alter session set nls_date_format ='dd-mon-yyyy hh24:mi:ss';
exec dbms_result_cache.flush;

select id, type, status, name rc_name, creation_timestamp, scn, lru_number from v$result_cache_objects v order by 3 desc, name, status;

var l_x number;
exec :l_x := rc_c4(1);
exec dbms_lock.sleep(1);
select id, type, status, name rc_name, creation_timestamp, scn, lru_number from v$result_cache_objects v order by 3 desc, name, status;

exec :l_x := rc_c4(5);
exec dbms_lock.sleep(1);
select id, type, status, name rc_name, creation_timestamp, scn, lru_number from v$result_cache_objects v order by 3 desc, name, status;
Output shows that there is "Invalid" Result when hitting NO_DATA_FOUND.

SQL > select id, type, status, name rc_name, creation_timestamp, scn, lru_number from v$result_cache_objects v order by 3 desc, name, status;
  no rows selected

SQL > exec :l_x := rc_c4(1);
SQL > select id, type, status, name rc_name, creation_timestamp, scn, lru_number from v$result_cache_objects v order by 3 desc, name, status;
   ID TYPE       STATUS    RC_NAME                                     CREATION_TIMESTAMP            SCN           LRU_NUMBER
  --- ---------- --------- ------------------------------------------- -------------------- -------------------- ---------------
    1 Result     Published "K"."RC_C4"::8."RC_C4"#3048d2af80817a01 #1  2017*sep*18 06:58:33    9,237,974,126,701   1,985,444,743
    0 Dependency Published K.RC_C4                                     2017*sep*18 06:58:33    9,237,974,126,701               0
    2 Dependency Published K.TEST_TAB                                  2017*sep*18 06:58:33    9,237,974,126,701               0

SQL > exec :l_x := rc_c4(5);
  ORA-01403: no data found
SQL > select id, type, status, name rc_name, creation_timestamp, scn, lru_number from v$result_cache_objects v order by 3 desc, name, status;
   ID TYPE       STATUS    RC_NAME                                     CREATION_TIMESTAMP            SCN           LRU_NUMBER
  --- ---------- --------- ------------------------------------------- -------------------- -------------------- ---------------
    1 Result     Published "K"."RC_C4"::8."RC_C4"#3048d2af80817a01 #1  2017*sep*18 06:58:33    9,237,974,126,701   1,985,444,743
    0 Dependency Published K.RC_C4                                     2017*sep*18 06:58:33    9,237,974,126,701               0
    2 Dependency Published K.TEST_TAB                                  2017*sep*18 06:58:33    9,237,974,126,701               0
    3 Result     Invalid   "K"."RC_C4"::8."RC_C4"#3048d2af80817a01 #1  2017*sep*18 06:58:34    9,237,974,126,701               0


3. Multiple Session Concurrency Test


On a UNIX machine with 6 physical processors, we start 6 Oracle Jobs one after another with an interval of about 60 seconds before each launch. At same time, we collect RC Latch and RC Stats between each launch.
1. First 2 Jobs (base load) select row 1-4, and there is no exception. 
   All other Jobs select row 1-5, and hence get exception when select row 5.
2. Job 3 has exception handler and returns null. 
3. Job 4 has exception handler, but does not have return. 
4. Job 5 has exception handler, but raises again.
5. Job 6 does not have exception handler.

truncate table rc_latch_hist;
truncate table rc_stats_hist;
exec dbms_result_cache.flush;
exec rc_case_test(60);
During the test, if we monitor RC Objects, we can see the status: "New" and "Invalid".

select type, status, name, count(*) from v$result_cache_objects 
group by type, status, name order by type desc, status, name;

TYPE       STATUS    NAME                                         COUNT(*)
---------- --------- ------------------------------------------ ----------
Result     Invalid   "K"."RC_C2"::8."RC_C2"#3048d2af80817a01 #1          7
Result     Invalid   "K"."RC_C3"::8."RC_C3"#3048d2af80817a01 #1          1
Result     New       "K"."RC_C3"::8."RC_C3"#3048d2af80817a01 #1          1
Result     New       "K"."RC_C4"::8."RC_C4"#3048d2af80817a01 #1          1
Result     Published "K"."RC_C1"::8."RC_C1"#3048d2af80817a01 #1          5
Result     Published "K"."RC_C2"::8."RC_C2"#3048d2af80817a01 #1          4
Result     Published "K"."RC_C3"::8."RC_C3"#3048d2af80817a01 #1          4
Result     Published "K"."RC_C4"::8."RC_C4"#3048d2af80817a01 #1          4
Dependency Published K.RC_C1                                             1
Dependency Published K.RC_C2                                             1
Dependency Published K.RC_C3                                             1
Dependency Published K.RC_C4                                             1
Dependency Published K.TEST_TAB                                          1
After the test, we look the collected stats, which shows the difference of RC Latch SLEEPS in 4 different scenarios.

The test result demonstrates that except the first scenario: "rc_c1: exception handler and return", all other 3 PL/SQL Result Cache implantations can cause RC Latch Contentions.

select v.*, round(wait_time/(nullif (sleeps, 0)), 2) wait_time_us_per_sleep
from (
    select snap_name, ts
          ,gets-lag(gets) over(order by ts) gets
          ,misses-lag(misses) over(order by ts) misses
          ,sleeps-lag(sleeps) over(order by ts) sleeps
          ,spin_gets-lag(spin_gets) over(order by ts) spin_gets
          ,wait_time-lag(wait_time) over(order by ts) wait_time
    from rc_latch_hist
) v order by ts;

SNAP_NAME         TS                                  GETS      MISSES     SLEEPS    SPIN_GETS    WAIT_TIME  WAIT_TIME_US_PER_SLEEP
----------------- --------------------------- ------------ ----------- ---------- ------------ ------------ -----------------------
Start             18-SEP-2017 07:59:55 +02:00                                                                
No_Error          18-SEP-2017 08:00:55 +02:00   61,623,726     790,876     0           790,876            0  
Except_Return     18-SEP-2017 08:01:55 +02:00  127,973,369   6,743,323     1         6,743,469            3                       3
Except_No_Return  18-SEP-2017 08:02:55 +02:00  135,558,711  14,221,514  1,277,086   12,948,140    7,762,707                       6
Except_Raise      18-SEP-2017 08:03:56 +02:00  123,932,503  16,996,212  2,304,628   14,752,639   14,765,338                       6
No_Hanler         18-SEP-2017 08:04:56 +02:00  112,230,114  18,895,634  3,322,022   15,724,684   22,069,024                       7

select name, snap_name, ts
      ,value-lag(value) over(partition by name order by ts) value
from rc_stats_hist
where name in ('Create Count Success', 'Delete Count Invalid')
order by name, ts;

NAME                    SNAP_NAME            TS                             VALUE
----------------------- -------------------- ------------------------------ ----------
Create Count Success    Start                18-SEP-2017 07:59:55 +02:00
Create Count Success    No_Error             18-SEP-2017 08:00:55 +02:00        4
Create Count Success    Except_Return        18-SEP-2017 08:01:55 +02:00        1
Create Count Success    Except_No_Return     18-SEP-2017 08:02:55 +02:00       747,926
Create Count Success    Except_Raise         18-SEP-2017 08:03:56 +02:00     1,410,675
Create Count Success    No_Hanler            18-SEP-2017 08:04:56 +02:00     2,025,890
Delete Count Invalid    Start                18-SEP-2017 07:59:55 +02:00
Delete Count Invalid    No_Error             18-SEP-2017 08:00:55 +02:00        0
Delete Count Invalid    Except_Return        18-SEP-2017 08:01:55 +02:00        0
Delete Count Invalid    Except_No_Return     18-SEP-2017 08:02:55 +02:00       747,902
Delete Count Invalid    Except_Raise         18-SEP-2017 08:03:56 +02:00     1,410,676
Delete Count Invalid    No_Hanler            18-SEP-2017 08:04:56 +02:00     2,025,891


4. SQL Result Cache Difference


In the above test, we use PL/SQL Result Cache. if we replace all of them by SQL Result Cache, for example:

create or replace function rc_c4(p_x number) return number as
  l_ret number;
begin
  select /*+ result_cache */ x into l_ret from test_tab where x = p_x;   
  return l_ret;
end;
/
make a similar test:

exec clean_jobs;
exec dbms_result_cache.flush;
exec rc_case_job(p_case=> 4, p_job_cnt=> 4, p_rc_limit=> 5);
and monitor RC Objects by:

select type, status, name, count(*) from v$result_cache_objects 
group by type, status, name order by type desc, status, name;

TYPE       STATUS    NAME                                                        COUNT(*)
---------- --------- ---------------------------------------------------------- ---------
Result     Published SELECT /*+ result_cache */ X FROM TEST_TAB WHERE X = :B1           5
Dependency Published K.TEST_TAB                                                         1
There does exist any RC status "New" or "Invalid". We can hardly observe RC Latch Contentions because there are no RC Latch SLEEPS. Therefore SQL Result Cache and PL/SQL Result Cache are probably NOT equivalent.

It is not clear how Oracle internally handles differently SQL Result Cache and PL/SQL Result Cache. But it could affect real applications when using them differently.