Tuesday, October 31, 2017

ORA-04025 library object lock on DATABASE Truncate Triggers

In this Blog, we will discuss ORA-04025 Errors caused by DATABASE Truncate Triggers.

Note 1. All tests are done in Oracle 12.1.0.2.0 on AIX, Solaris, Linux.
Note 2. ORA-04025 Maximum Allowed Library Object Lock Allocated is also discussed in Blog:
             ORA-04025 library object lock on AQ queue and SQL cursor


1. DATABASE Truncate Triggers


In Oracle 12c, there exist two DATABASE Triggers:
    XDB_PI_TRIG
    AW_TRUNC_TRG
They can be discovered by:

create or replace function getlong(p_trigger_name varchar2) return varchar2 as 
  l_data long; 
begin 
  select trigger_body into l_data from dba_triggers where trigger_name = p_trigger_name; 
  return substr(l_data, 1, 4000);
end; 
/

column owner format            a6
column trigger_name format     a12
column trigger_type format     a12
column triggering_event format a15
column status format           a8
column description format      a40
set wrap off

SQL> select owner, trigger_name, trigger_type, triggering_event, status, description 
         --,getlong(trigger_name) trigger_body
     from dba_triggers where triggering_event like '%TRUNCATE%';

  OWNER  TRIGGER_NAME TRIGGER_TYPE TRIGGERING_EVEN STATUS   DESCRIPTION
  ------ ------------ ------------ --------------- -------- ----------------------------------------
  SYS    XDB_PI_TRIG  BEFORE EVENT DROP OR TRUNCAT ENABLED  sys.xdb_pi_trig
  SYS    AW_TRUNC_TRG AFTER EVENT  TRUNCATE        ENABLED  aw_trunc_trg AFTER TRUNCATE ON DATABASE


set wrap on

SQL> select getlong(trigger_name) trigger_body from dba_triggers where trigger_name = 'XDB_PI_TRIG';

BEGIN
  BEGIN
    IF (sys.is_vpd_enabled(sys.dictionary_obj_owner, sys.dictionary_obj_name, xdb.DBMS_XDBZ.IS_ENABLED_CONTENTS)) THEN
      xdb.XDB_PITRIG_PKG.pitrig_truncate(sys.dictionary_obj_owner, sys.dictionary_obj_name);
    ELSIF (sys.is_vpd_enabled(sys.dictionary_obj_owner, sys.dictionary_obj_name, xdb.DBMS_XDBZ.IS_ENABLED_RESMETADATA)) THEN
      xdb.XDB_PITRIG_PKG.pitrig_dropmetadata(sys.dictionary_obj_owner, sys.dictionary_obj_name);
    END IF;
  EXCEPTION
    WHEN OTHERS THEN
     null;
  END;
END;

SQL> select getlong(trigger_name) trigger_body from dba_triggers where trigger_name = 'AW_TRUNC_TRG';

BEGIN
  aw_trunc_proc(ora_dict_obj_type, ora_dict_obj_name, ora_dict_obj_owner);
END;


Note: ora_dict_obj_owner, ora_dict_obj_name, ora_dict_obj_type are published Attributes (public synonyms) of System Events.
      (replacing functions of earlier releases: sys.dictionary_obj_owner, sys.dictionary_obj_name, sys.dictionary_obj_type 
       defined in dbms_standard).


2. Test


Run following test (see appended Test Code):

alter system set session_cached_cursors=2000 scope=spfile;
alter system set processes=1100 scope=spfile;
alter system set job_queue_processes=1000 scope=spfile;
alter system set "_session_cached_instantiations"=1001 scope=spfile;

-- restart DB
SQL> startup force

-- start Test by launching 66 Jobs (66 > floor(65535/1000)=65)
SQL> exec gtt_truncate_job(66, 1e9);

--after Test, clean jobs
-- exec clean_jobs;
During test, monitor v$db_object_cache.LOCKS for both triggers, we can see their increasing:

column name format a80
set wrap off

SQL> select hash_value, locks, pins, locked_total, pinned_total, name  --v.*
       from v$db_object_cache v
      where locks > 1000
        and (hash_value in (2219505151, 679830350) 
         or (name like '%xdb.DBMS_XDBZ.%' or name like '%aw_trunc_proc%')); 

  HASH_VALUE  LOCKS  PINS LOCKED_TOTAL PINNED_TOTAL NAME
  ---------- ------ ----- ------------ ------------ --------------------------------------------------------------------------------
  2219505151  64141     0      1164253      1164234 BEGIN   BEGIN     IF (sys.is_vpd_enabled(sys.dictionary_obj_owner, sys.dictionar
  2219505151  64238     0      1164365            1 BEGIN   BEGIN     IF (sys.is_vpd_enabled(sys.dictionary_obj_owner, sys.dictionar
   679830350  64506     0      1164192      1164167 BEGIN   aw_trunc_proc(ora_dict_obj_type, ora_dict_obj_name, ora_dict_obj_owner);
   679830350  64602     0      1164289            1 BEGIN   aw_trunc_proc(ora_dict_obj_type, ora_dict_obj_name, ora_dict_obj_owner);
After about 10 minutes, LOCKS increased closing to 65535 (66 sessions, each of which allocated about 1000 LOCKS).

database alert.log shows ORA-04025 Errors caused by two triggers:

ORA-04025: maximum allowed library object lock allocated for BEGIN
  aw_trunc_proc(ora_dict_obj_type, ora_dict_obj_name, ora_dict_obj_owner);
END;

ORA-04025: maximum allowed library object lock allocated for BEGIN
  BEGIN
    IF (sys.is_vpd_enabled(sys.dictionary_obj_owner, sys.dictionary_obj_name, xdb.DBMS_XDBZ.IS_ENABLED_CONTENTS)) THEN
      xdb.XDB_PITRIG_PKG.pitrig_truncate(sys.dictionary_obj_owner, sys.dictionary_obj_name);
    ELSIF (sys.is_vpd_enabled(sys.dictionary_obj_owner, sys.dictionary_obj_name, xdb.DBMS_XDBZ.IS_ENABLED_RESMETADATA)) THEN
      xdb.XDB_PITRIG_PKG.pitrig_dropmetadata(sys.dictionary_obj_owner, sys.dictionary_obj_name);
    END IF;
  EXCEPTION
    WHEN OTHERS THEN
     null;
  END;
END;
Both cursors can also be found by:

column sql_text format a100
set wrap off

SQL > select sql_id, hash_value, executions, parse_calls, sql_text from v$sqlarea v 
      where sql_id in ('ct6c4h224pxgz', 'gx87sann8asuf') or hash_value in (2219505151, 679830350);

  SQL_ID        HASH_VALUE EXECUTIONS PARSE_CALLS SQL_TEXT
  ------------- ---------- ---------- ----------- --------------------------------------------------
  ct6c4h224pxgz 2219505151   13537973    13623969 BEGIN   BEGIN     IF (sys.is_vpd_enabled(sys.dicti
  gx87sann8asuf  679830350   13649026    13603140 BEGIN   aw_trunc_proc(ora_dict_obj_type, ora_dict_
Blog: ORA-04025 library object lock on AQ queue and SQL cursor showed that Maximum Allowed Library Object Lock is hard limited to 65535. That is why at least 66 Jobs (66 > floor(65535/1000)=65) have to be launched to provoke ORA-04025 Error.

Note that in the above test, we set
  session_cached_cursors=2000
because of two DATABASE Truncate Triggers, each of both can acquire about 1000 LOCKs.

As a justification, even we start one single session, LOCK_CNT is around 1000.

SQL > exec clean_jobs;
 
SQL > exec gtt_truncate_job(1, 1e9);

SQL > with sq as
        (select /*+ materialize */ sid, program, saddr, event
            ,to_char(p1, 'xxxxxxxxxxxxxxx') p1, p1text
            ,to_char(p2, 'xxxxxxxxxxxxxxx') p2, p2text
            ,to_char(p3, 'xxxxxxxxxxxxxxx') p3, p3text
         from v$session
        where program like '%(J%')
      select o.kglnahsh, s.sid, holding_user_session, object_handle, l.type, mode_held, mode_requested,
             count(*) lock_cnt     -- it goes up to about 1000
        from v$libcache_locks l, x$kglob o, sq s
       where 1= 1
         and o.kglnahsh in (2219505151, 679830350)     
         and l.object_handle = o.kglhdadr
         and l.holding_user_session = s.saddr
      group by o.kglnahsh, s.sid, holding_user_session, object_handle, l.type, mode_held, mode_requested
      order by lock_cnt desc, s.sid, holding_user_session, l.type, mode_held, mode_requested;
  
    KGLNAHSH  SID HOLDING_USER_SES OBJECT_HANDLE    TYPE  MODE_HELD MODE_REQUESTED   LOCK_CNT
  ---------- ---- ---------------- ---------------- ---- ---------- -------------- ----------
  2219505151  152 070001010D31BD08 07000100EDC018C8 LOCK          1              0       1012
  2219505151  152 070001010D31BD08 07000100ED2F4110 LOCK          1              0       1011
   679830350  152 070001010D31BD08 07000100ECDBA460 LOCK          1              0       1009
   679830350  152 070001010D31BD08 07000100ECDBBF28 LOCK          1              0       1009


3. "library cache: mutex X" and "cursor: pin X"


During test, heavy "library cache: mutex X" and "cursor: pin X" are observed on GTT truncate statement:

truncate table my_gtt
 
SQL > select mutex_identifier, mutex_type, count(*) cnt from v$mutex_sleep_history 
      group by mutex_identifier, mutex_type 
      order by cnt desc;

  MUTEX_IDENTIFIER MUTEX_TYPE      CNT
  ---------------- -------------- ----
         167539573 Library Cache   278
         167539573 Cursor Pin       81
   
SQL > select hash_value, locks, pins, locked_total, pinned_total, invalidations, name 
      from v$db_object_cache where hash_value in (167539573);

  HASH_VALUE  LOCKS  PINS LOCKED_TOTAL PINNED_TOTAL INVALIDATIONS NAME
  ---------- ------ ----- ------------ ------------ ------------- ---------------------
   167539573      1     0        14295        26690         13002 truncate table my_gtt
   167539573      2     0        14309        26510         13026 truncate table my_gtt
   167539573      1     0        14134        26218         12874 truncate table my_gtt
   ......
 
SQL > select sql_id, hash_value, executions, parse_calls, invalidations, sql_text 
      from v$sqlarea v where sql_id in ('1jh104s4zswvp') or hash_value in (167539573); 
 
  SQL_ID        HASH_VALUE EXECUTIONS PARSE_CALLS INVALIDATIONS SQL_TEXT
  ------------- ---------- ---------- ----------- ------------- ---------------------
  1jh104s4zswvp  167539573         86          90       1325129 truncate table my_gtt
It seems that "library cache: mutex X" is caused by dynamic SQL statement "EXECUTE IMMEDIATE" because GTT truncate is executed by:
    execute immediate 'truncate table my_gtt'
and both above DATABASE Truncate Triggers are probably also called by "EXECUTE IMMEDIATE" since v$db_object_cache.NAME and v$sqlarea.SQL_TEXT are formatted in lower and upper mixed cases.

See Blogs:
    Divide and conquer the true mutex contention
    "library cache: mutex X" and Application Context
for more discussions.


4. XDB_PI_TRIG Trigger Components and Cost


Run GTT truncate with SQL Trace 10046:

alter trigger sys.AW_TRUNC_TRG disable;
alter trigger sys.XDB_PI_TRIG enable;

SQL > alter system flush shared_pool; 
SQL > alter session set events '10046 trace name context forever, level 12';
SQL > exec gtt_truncate(1);
SQL > alter session set events '10046 trace name context off';
We can see all the TOP statements:

--SQL_ID: 86kwhy1f0bttn 
SELECT /*+ ALL_ROWS */
      COUNT (*)
  FROM DBA_POLICIES V
 WHERE     V.OBJECT_OWNER = :B3
       AND V.OBJECT_NAME = :B2
       AND (V.POLICY_NAME LIKE '%xdbrls%' OR V.POLICY_NAME LIKE '%$xd_%')
       AND V.FUNCTION = :B1
       
--SQL_ID: 1jh104s4zswvp 
truncate table my_gtt
       
--SQL_ID: 1u4r80k42x8jj 
select count(FA#) from SYS_FBA_TRACKEDTABLES where OBJ# = 1001511 and bitand(FLAGS, 128)=0

--SQL_ID: au7qd3c3yuk1u 
LOCK TABLE "MY_GTT" IN EXCLUSIVE MODE WAIT 5

--SQL_ID: 9ff116syaf279 
SELECT TO_NUMBER(XS_SYS_CONTEXT('XS$SESSION', 'SESSION_XS_USER_GUID')) FROM DUAL

--SQL_ID: au7qd3c3yuk1u 
LOCK TABLE "MY_GTT" IN EXCLUSIVE MODE WAIT 5
In fact, our GTT table: my_gtt is irrelevant to XDB:

SELECT /*+ ALL_ROWS */
      *
  FROM DBA_POLICIES V
 WHERE     V.OBJECT_OWNER = 'XDB'
     --AND V.OBJECT_NAME = 'MY_GTT'
       AND (V.POLICY_NAME LIKE '%xdbrls%' OR V.POLICY_NAME LIKE '%$xd_%')
       AND V.FUNCTION in ('CHECKPRIVRLS_SELECTPF', 'CHECKPRIVRLS_SELECTPROPF');         


5. Query against SYS_FBA_TRACKEDTABLES


We also noticed one query on SYS_FBA_TRACKEDTABLES having TOP Executions and Parse Calls. Both number are very close, which means that each Execution requires a Parse Call.

SQL > select sql_id, hash_value, executions, parse_calls, sql_text from v$sqlarea v 
where sql_id in ('1u4r80k42x8jj') or hash_value in (2284757553);

  SQL_ID        HASH_VALUE EXECUTIONS PARSE_CALLS SQL_TEXT
  ------------- ---------- ---------- ----------- -------------------------------------------------------------------------------------------
  1u4r80k42x8jj 2284757553   35663414    35536924 select count(FA#) from SYS_FBA_TRACKEDTABLES where OBJ# = 1001511 and bitand(FLAGS, 128)=0

-- where 1001511 is MY_GT

SQL > select object_name, object_id, object_type from dba_objects where object_id = 1001511;

  OBJECT_NAME   OBJECT_ID  OBJECT_TYPE
  -----------  ---------- ------------
  MY_GTT          1001511        TABLE
Searching in Oracle MOS, it looks like:
     Bug 15931756 - ORA-4031 / Queries against SYS_FBA_TRACKEDTABLES not shared (do not use binds) (Doc ID 15931756.8)


6. Disable/Drop Triggers


Trigger XDB_PI_TRIG is applied for Oracle XML Database (XDB), and AW_TRUNC_TRG for OLAP Analytic Workspaces(AW). If they are not installed/used, one can disable/drop them. These can be checked by:

select comp_name, version, status from dba_registry order by 1;
select * from v$option where parameter = 'OLAP';  
select * from dba_feature_usage_statistics where name like '%XDB%' or name like '%OLAP%';
select * from dba_aws;

(Note: XDB_PI_TRIG trigger moved from schema XDB to SYS with PSU 11.1.0.7.16)  
It is not clear why AW_TRUNC_TRG got installed although we do not buy/pay the license. Besides, the table name MY_GTT is not like 'AW$_%'.


7. Boundary Value and Cursor Leak ?


Set session_cached_cursors to Boundary Value: 65535:

alter system set session_cached_cursors=65535 scope=spfile;
alter system set "_session_cached_instantiations"=1001 scope=spfile;
alter system set open_cursors=400 scope=spfile;
alter system set cursor_sharing=exact scope=spfile;

-- enable only one single truncate trigger
alter trigger sys.AW_TRUNC_TRG disable;  
alter trigger sys.XDB_PI_TRIG  enable;

-- Restart DB
--SQL> startup force
Open one Sqlplus session and run (it takes about 10 minutes):

SQL (567) > exec gtt_truncate(65535);
Immediately after its finished, run following queries about Library Cache and Cursor, and look the output:

set lines 400
column name format a50
set wrap off

select hash_value, locks, pins, locked_total, pinned_total, executions, child_latch, name
  from v$db_object_cache v
 where (hash_value in (2219505151) or (name like 'BEGIN%xdb.DBMS_XDBZ.%')) 
   and locks > 100; 
   
  HASH_VALUE  LOCKS  PINS LOCKED_TOTAL PINNED_TOTAL EXECUTIONS CHILD_LATCH NAME
  ---------- ------ ----- ------------ ------------ ---------- ----------- --------------------------------------------------
  2219505151  65532     0       131070       131072      65535           0 BEGIN   BEGIN     IF (sys.is_vpd_enabled(sys.dicti
  2219505151  65533     0       131073            1      65535       62975 BEGIN   BEGIN     IF (sys.is_vpd_enabled(sys.dicti


select hash_value, sql_id, child_number, users_opening, executions, parse_calls, invalidations
from  v$sql v
where (sql_id in ('ct6c4h224pxgz') or hash_value in (2219505151))
  and users_opening > 100;   
  
  HASH_VALUE SQL_ID        CHILD_NUMBER USERS_OPENING EXECUTIONS PARSE_CALLS INVALIDATIONS
  ---------- ------------- ------------ ------------- ---------- ----------- -------------
  2219505151 ct6c4h224pxgz            1         65531      65535       65535             1  


select count(*) from v$open_cursor
where (sql_id in ('ct6c4h224pxgz') or hash_value in (2219505151))
  and sid in (567);
  
   COUNT(*)
  ---------
      65530 
    
select * from v$open_cursor
where (sql_id in ('ct6c4h224pxgz') or hash_value in (2219505151))
  and sid in (567) 
  and rownum <= 3;  

   SID HASH_VALUE SQL_ID        SQL_TEXT                              LAST_SQL SQL_EXEC CURSOR_TYPE                     CHILD_ADDRESS   
  ---- ---------- ------------- ------------------------------------- -------- -------- ------------------------------- ----------------
   567 2219505151 ct6c4h224pxgz BEGIN   BEGIN     IF (sys.is_vpd_enab                   DICTIONARY LOOKUP CURSOR CACHED 000000017636BA68
   567 2219505151 ct6c4h224pxgz BEGIN   BEGIN     IF (sys.is_vpd_enab                   DICTIONARY LOOKUP CURSOR CACHED 000000017636BA68
   567 2219505151 ct6c4h224pxgz BEGIN   BEGIN     IF (sys.is_vpd_enab                   DICTIONARY LOOKUP CURSOR CACHED 000000017636BA68


select s.program, s.sid, n.name p_name, t.value
from v$session s, v$sesstat t, v$statname n 
where s.sid=t.sid and n.statistic# = t.statistic# 
  and name = 'session cursor cache count'
  and s.sid in (567);
  
  PROGRAM       SID  P_NAME                      VALUE
  ------------ ---- --------------------------- ------
  sqlplus.exe   567  session cursor cache count  65535
  
      
select sql_id, child_number from v$sql_shared_cursor
where sql_id in ('ct6c4h224pxgz');

  SQL_ID        CHILD_NUMBER
  ------------- ------------
  ct6c4h224pxgz            0
  ct6c4h224pxgz            1      
The above result shows:
  1. v$db_object_cache.LOCKS is close to 65535
  2. v$sql.USERS_OPENING is close to 65535
  3. v$open_cursor count is close to 65535
  4. 'session cursor cache count' is 65535
Contrary to Blog: ORA-04025 library object lock on AQ queue and SQL cursor
  1. SESSION_CACHED_CURSORS (65535) sets hard limit of LIBRARY OBJECT LOCK, 
     instead of "_session_cached_instantiations".
  2. OPEN_CURSORS has no effect. 
v$open_cursor shows that one single session for one single cursor has 65535 CURSORS (opened and parsed, or cached), all marked with CURSOR_TYPE: "DICTIONARY LOOKUP CURSOR CACHED".
65535 is clearly over open_cursors=400, and according to Oracle Doc:
    OPEN_CURSORS specifies the maximum number of open cursors (handles to private SQL areas) a session can have at once.
It is not clear if this a Cursor Leak.

(MOS: Bug 25477055: INCORRECT CURSOR_TYPE IN V$OPEN_CURSOR FOR SESSION CACHED CURSOR said:
    "DICTIONARY LOOKUP CURSOR CACHED" is misreported, it should be "SESSION CURSOR CACHED").

Immediately after above gtt_truncate(65535) terminated, run following statement in another session, it will raise ORA-04025:

SQL (1124)> exec gtt_truncate(100);

  ORA-04025: maximum allowed library object lock allocated for BEGIN
  BEGIN
  IF (sys.is_vpd_enabled(sys.dictionary_obj_owner, sys.dictionary_obj_name, xdb.DBMS_XDBZ.IS_ENABLED_CONTENTS)) THEN
  xdb.XDB_PITRIG_PKG.pitrig_truncate(sys.dictionary_obj_owner, sys.dictionary_obj_name);
  ELSIF (sys.is_vpd_enabled(sys.dictionary_obj_owner, sys.dictionary_obj_name, xdb.DBMS_XDBZ.IS_ENABLED_RESMETADATA)) THEN
  xdb.XDB_PITRIG_PKG.pitrig_dropmetadata(sys.dictionary_obj_owner, sys.dictionary_obj_name);
  END IF;
  EXCEPTION
  WHEN OTHERS THEN
  null;
  END;
  END;  


8. Simple Truncate Trigger Reproducing


We will use a (almost) empty Truncate Trigger to demonstrate that ORA-04025 is caused by the existence of Truncate Trigger, but irrelevant to its substances.

Disable (or Drop) both Oracle Truncate Triggers:

alter trigger sys.AW_TRUNC_TRG disable;
alter trigger sys.XDB_PI_TRIG  disable;
Create a new trivial Trigger:

create or replace trigger SYS.KSUN_TEST_TRIG
before drop or truncate ON database 
  BEGIN
    IF 1=2 THEN
      null;
    END IF;
  END;
/
Configure session_cached_cursors to maximum value:

alter system set session_cached_cursors=65535 scope=spfile;
alter system set "_session_cached_instantiations"=1001 scope=spfile;
alter system set open_cursors=400 scope=spfile;
alter system set cursor_sharing=exact scope=spfile;
Run test:

SQL> exec gtt_truncate_job(2, 1e9);
Monitor v$db_object_cache.LOCKS:

column name format a80
set wrap off

SQL> select hash_value, locks, pins, locked_total, pinned_total, executions, name  --v.*
       from v$db_object_cache v
      where locks > 1000
        and (name like '%1=2%' or name like '%KSUN_TEST_TRIG%');
        
  HASH_VALUE  LOCKS  PINS LOCKED_TOTAL PINNED_TOTAL EXECUTIONS NAME
  ---------- ------ ----- ------------ ------------ ---------- -----------------------------------------------------
  2564248212  65514     0        65516        65517      65516 BEGIN     IF 1=2 THEN       null;     END IF;   END;
  2564248212  65514     0        65516            1      65515 BEGIN     IF 1=2 THEN       null;     END IF;   END;        
and alert.log shows:
        
ORA-04025: maximum allowed library object lock allocated for BEGIN
    IF 1=2 THEN
      null;
    END IF;
  END;

9. Hidden Parameter: _system_trig_enabled


Oracle provided a hidden Parameter:
   _system_trig_enabled   are system triggers enabled   default: TRUE
to disable system triggers.

If it is deactivated by:

  alter system set "_system_trig_enabled"=false;  
query below returns no more both trigger rows:

 select hash_value, locks, pins, locked_total, pinned_total, name  --v.*
       from v$db_object_cache v
      where locks > 0
        and (hash_value in (2219505151, 679830350) 
         or (name like '%xdb.DBMS_XDBZ.%' or name like '%aw_trunc_proc%'))
Both triggers are listed in following sql:

select a.obj#, a.sys_evts, b.name, a.*, b.*
  from sys.trigger$ a, sys.obj$ b
 where a.sys_evts > 0 and a.obj#=b.obj#
   and name in ('XDB_PI_TRIG', 'AW_TRUNC_TRG');
Hence they are effectively disabled with "_SYSTEM_TRIG_ENABLED=FALSE".
(see MOS Doc ID 244524.1: How to find triggers being disabled with _SYSTEM_TRIG_ENABLED parameter)


10. Test Code



create global temporary table my_gtt (id number) on commit preserve rows;

create or replace procedure gtt_truncate(p_cnt number) is
begin
 for i in 1..p_cnt loop
    execute immediate 'truncate table my_gtt';
    --dbms_lock.sleep(0.01);
  end loop;
end;
/

create or replace procedure gtt_truncate_job(p_job_cnt number, p_cnt number) as
  l_job_id pls_integer;
begin
  for i in 1..p_job_cnt loop
    dbms_job.submit(l_job_id, 'begin while true loop gtt_truncate('||p_cnt||'); end loop; end;');
  end loop;
  commit;
end;    
/

create or replace procedure clean_jobs as
begin
  for c in (select job from dba_jobs) loop
    begin
       dbms_job.remove (c.job);
    exception when others then null;
    end;
    commit;
  end loop;

  for c in (select d.job, d.sid, (select serial# from v$session where sid = d.sid) ser 
              from dba_jobs_running d) loop
    begin
      execute immediate
             'alter system kill session '''|| c.sid|| ',' || c.ser|| ''' immediate';
      dbms_job.remove (c.job);
    exception when others then null;
    end;
    commit;
  end loop;
  
  -- select * from dba_jobs;
  -- select * from dba_jobs_running;
end;
/

Wednesday, October 4, 2017

Result Cache:0 rows updated Invalidations (IV)

We have hit a new case of Result Cache Invalidations, in which result_cache Table has "0 rows updated". If we update another non result_cache Table in the same transaction, Result Cache becomes Invalid.

At the end of this Blog, we also give a Workaround.

This is the 4th case of Result Cache Invalidation, which can cause Result Cache: RC Latch Contention.
1. PL/SQL Function Result Cache Invalidation (I) (DML and Select for Update)
2. Result Cache: RC Latch Contention and RESULT_CACHE_MAX_SIZE (II))
3. Result Cache: RC Latch Contention and PL/SQL Exception Handler (III)
4. Result Cache:0 rows updated Invalidations (IV) 
Note: All tests are done in Oracle 11.2.0.4, 12.1.0.2, 12.2.0.1 on AIX, Solaris, Linux with 6 physical processors.

Update (25-Jan-2018): The reported issue accepted as Bug and public visible.
    Bug 26964029 : RESULT CACHE INVALIDATED EVEN THOUGH UPDATE ZERO ROWS


1. Test Setup



alter session set nls_date_format = 'yyyy-MON-dd hh24:mi:ss';

drop table noise_tab;

create table noise_tab as select level x, rpad('ABC', 100, 'X') y from dual connect by level <= 3; 

drop table rc_tab;

create table rc_tab as select level id, level*10 val from dual connect by level <= 5;

create or replace function get_val (p_id number) return number result_cache as
  l_val     number;
begin
  select val into l_val from rc_tab where id = p_id;
  return l_val;
end;
/

create or replace procedure run_test as
  l_val number;
begin
  for i in 1 .. 5 loop
    l_val := get_val(i);
  end loop;
end;
/


2. Test_1. Pure Test, No Invalid



----------------- Test 1. Pure Test, No Invalid -----------------

exec dbms_result_cache.flush;

exec run_test;

select creation_timestamp, type, status, name from v$result_cache_objects;

update rc_tab set val = val+1 where 1 = 2;

commit;

select creation_timestamp, type, status, name from v$result_cache_objects;


3. Test_2. Noise Test, Invalid



----------------- Test 2. Noise Test, Invalid -----------------

exec dbms_result_cache.flush;

exec run_test;

select creation_timestamp, type, status, name from v$result_cache_objects;

update rc_tab set val = val+1 where 1 = 2;

update noise_tab set y = 'xx' where x = 2;

commit;

select creation_timestamp, type, status, name from v$result_cache_objects;


4. Test Output



--============================= Test Output =============================--

SQL > ----------------- Test 1. Pure Test, No Invalid -----------------
SQL > exec dbms_result_cache.flush;

SQL > exec run_test;

SQL > select creation_timestamp, type, status, name from v$result_cache_objects;

CREATION_TIMESTAMP   TYPE       STATUS    NAME
-------------------- ---------- --------- -------------------------------------------------------------------
2017-OCT-04 13:38:34 Dependency Published K.RC_TAB
2017-OCT-04 13:38:34 Dependency Published K.GET_VAL
2017-OCT-04 13:38:34 Result     Published "K"."GET_VAL"::8."GET_VAL"#3048d2af80817a01 #1
2017-OCT-04 13:38:34 Result     Published "K"."GET_VAL"::8."GET_VAL"#3048d2af80817a01 #1
2017-OCT-04 13:38:34 Result     Published "K"."GET_VAL"::8."GET_VAL"#3048d2af80817a01 #1
2017-OCT-04 13:38:34 Result     Published "K"."GET_VAL"::8."GET_VAL"#3048d2af80817a01 #1
2017-OCT-04 13:38:34 Result     Published "K"."GET_VAL"::8."GET_VAL"#3048d2af80817a01 #1

7 rows selected.

SQL > update rc_tab set val = val+1 where 1 = 2;

0 rows updated.

SQL > commit;

SQL > select creation_timestamp, type, status, name from v$result_cache_objects;

CREATION_TIMESTAMP   TYPE       STATUS    NAME
-------------------- ---------- --------- -------------------------------------------------------------------
2017-OCT-04 13:38:34 Dependency Published K.RC_TAB
2017-OCT-04 13:38:34 Dependency Published K.GET_VAL
2017-OCT-04 13:38:34 Result     Published "K"."GET_VAL"::8."GET_VAL"#3048d2af80817a01 #1
2017-OCT-04 13:38:34 Result     Published "K"."GET_VAL"::8."GET_VAL"#3048d2af80817a01 #1
2017-OCT-04 13:38:34 Result     Published "K"."GET_VAL"::8."GET_VAL"#3048d2af80817a01 #1
2017-OCT-04 13:38:34 Result     Published "K"."GET_VAL"::8."GET_VAL"#3048d2af80817a01 #1
2017-OCT-04 13:38:34 Result     Published "K"."GET_VAL"::8."GET_VAL"#3048d2af80817a01 #1

7 rows selected.

SQL > ----------------- Test 2. Noise Test, Invalid -----------------

SQL > exec dbms_result_cache.flush;

SQL > exec run_test;

SQL > select creation_timestamp, type, status, name from v$result_cache_objects;

CREATION_TIMESTAMP   TYPE       STATUS    NAME
-------------------- ---------- --------- -------------------------------------------------------------------
2017-OCT-04 13:38:34 Dependency Published K.RC_TAB
2017-OCT-04 13:38:34 Dependency Published K.GET_VAL
2017-OCT-04 13:38:34 Result     Published "K"."GET_VAL"::8."GET_VAL"#3048d2af80817a01 #1
2017-OCT-04 13:38:34 Result     Published "K"."GET_VAL"::8."GET_VAL"#3048d2af80817a01 #1
2017-OCT-04 13:38:34 Result     Published "K"."GET_VAL"::8."GET_VAL"#3048d2af80817a01 #1
2017-OCT-04 13:38:34 Result     Published "K"."GET_VAL"::8."GET_VAL"#3048d2af80817a01 #1
2017-OCT-04 13:38:34 Result     Published "K"."GET_VAL"::8."GET_VAL"#3048d2af80817a01 #1

7 rows selected.

SQL > update rc_tab set val = val+1 where 1 = 2;

0 rows updated.

SQL > update noise_tab set y = 'xx' where x = 2;

1 row updated.

SQL > commit;

SQL >
SQL > select creation_timestamp, type, status, name from v$result_cache_objects;

CREATION_TIMESTAMP   TYPE       STATUS    NAME
-------------------- ---------- --------- -------------------------------------------------------------------
2017-OCT-04 13:38:34 Dependency Published K.RC_TAB
2017-OCT-04 13:38:34 Dependency Published K.GET_VAL
2017-OCT-04 13:38:34 Result     Invalid   "K"."GET_VAL"::8."GET_VAL"#3048d2af80817a01 #1
2017-OCT-04 13:38:34 Result     Invalid   "K"."GET_VAL"::8."GET_VAL"#3048d2af80817a01 #1
2017-OCT-04 13:38:34 Result     Invalid   "K"."GET_VAL"::8."GET_VAL"#3048d2af80817a01 #1
2017-OCT-04 13:38:34 Result     Invalid   "K"."GET_VAL"::8."GET_VAL"#3048d2af80817a01 #1
2017-OCT-04 13:38:34 Result     Invalid   "K"."GET_VAL"::8."GET_VAL"#3048d2af80817a01 #1

7 rows selected.


5. Workaround



exec dbms_result_cache.flush;

exec run_test;

select creation_timestamp, type, status, name from v$result_cache_objects;

--lock table rc_tab in ROW EXCLUSIVE mode;
lock table rc_tab in EXCLUSIVE mode;

begin
  if  1 = 2 then
     update rc_tab set val = val+1;
  end if;
end;
/

update noise_tab set y = 'xx' where x = 2;

commit;

select creation_timestamp, type, status, name from v$result_cache_objects;

--============================= Workaround Output =============================--

SQL > exec dbms_result_cache.flush;

SQL > exec run_test;

SQL > select creation_timestamp, type, status, name from v$result_cache_objects;

 CREATION_TI TYPE       STATUS    NAME
 ----------- ---------- --------- ---------------------------------------------------------------------------
 09-OCT-2017 Dependency Published K.RC_TAB
 09-OCT-2017 Dependency Published K.GET_VAL
 09-OCT-2017 Result     Published "K"."GET_VAL"::8."GET_VAL"#3048d2af80817a01 #1
 09-OCT-2017 Result     Published "K"."GET_VAL"::8."GET_VAL"#3048d2af80817a01 #1
 09-OCT-2017 Result     Published "K"."GET_VAL"::8."GET_VAL"#3048d2af80817a01 #1
 09-OCT-2017 Result     Published "K"."GET_VAL"::8."GET_VAL"#3048d2af80817a01 #1
 09-OCT-2017 Result     Published "K"."GET_VAL"::8."GET_VAL"#3048d2af80817a01 #1

 7 rows selected.

SQL > lock table rc_tab in EXCLUSIVE mode;

 Table(s) Locked.

SQL > begin
   2    if  1 = 2 then
   3       update rc_tab set val = val+1;
   4    end if;
   5  end;
   6  /

SQL > update noise_tab set y = 'xx' where x = 2;

 1 row updated.

SQL > commit;

SQL > select creation_timestamp, type, status, name from v$result_cache_objects;

 CREATION_TI TYPE       STATUS    NAME
 ----------- ---------- --------- ---------------------------------------------------------------------------
 09-OCT-2017 Dependency Published K.RC_TAB
 09-OCT-2017 Dependency Published K.GET_VAL
 09-OCT-2017 Result     Published "K"."GET_VAL"::8."GET_VAL"#3048d2af80817a01 #1
 09-OCT-2017 Result     Published "K"."GET_VAL"::8."GET_VAL"#3048d2af80817a01 #1
 09-OCT-2017 Result     Published "K"."GET_VAL"::8."GET_VAL"#3048d2af80817a01 #1
 09-OCT-2017 Result     Published "K"."GET_VAL"::8."GET_VAL"#3048d2af80817a01 #1
 09-OCT-2017 Result     Published "K"."GET_VAL"::8."GET_VAL"#3048d2af80817a01 #1
 
 7 rows selected.


6. v$result_cache_statistics


V$RESULT_CACHE_STATISTICS contains 3 Rows reporting different kind of Invalidations. To recap all our previous discussion on triggering originals of Result Cache Invalidation, we can try to map those Rows to the originals.

ID 8. Invalidation Count: DML updates caused, see: PL/SQL Function Result Cache Invalidation (I) and Result Cache:0 rows updated Invalidations (IV)

ID 9. Delete Count Invalid:  PL/SQL RC Exception Handler related Result Cache, see: RC Latch Contention and PL/SQL Exception Handler (III)

ID 10. Delete Count Valid: Memory Limit Reached, see: Result Cache: RC Latch Contention and RESULT_CACHE_MAX_SIZE (II)

ID 11. Hash Chain Length:  length of hash chain. When no invalid, it is 1 or a constant number. When invalidate/create, it is a range.

If Result Cache experienced a heavy Invalidations, and got recreated, we can see the difference of "ID 5. Create Count Success" before and after recreated, and that of "ID 9. Delete Count Invalid" are quite close. In such case, if we rebuild the Invalidations by a few parallel sessions, latch free of "Result Cache: RC Latch" will appear at the top wait events. It could be fixed by either to flush Result Cache and rebuild from fresh, or use a single session to recreate them.


Update (18Mar2018)


In Blog: RESULT_CACHE hint expiration options , dependencies test showed one case of Result Cache Invalidation by "0 rows deleted".

drop table DEMO;
create table DEMO as select rownum id from xmltable('1 to 100000');
exec if dbms_result_cache.flush then dbms_output.put_line('Flushed.'); end if;

set autotrace on
select /*+ result_cache */ count(*) from DEMO;

set autotrace off
select * from v$result_cache_dependency;
  --
  --  RESULT_ID  DEPEND_ID  OBJECT_NO     CON_ID
  --  --------- ---------- ---------- ----------
  --          1          0    2305802          0
          
select id,type,status,name,cache_id,invalidations from v$result_cache_objects order by id;
  --  ID TYPE       STATUS    NAME                                          CACHE_ID                   INVALIDATIONS
  --  -- ---------- --------- --------------------------------------------- -------------------------- -------------
  --   0 Dependency Published K.DEMO                                        K.DEMO                                 0
  --   1 Result     Published select /*+ result_cache */ count(*) from DEMO 74s7myvzxca2u8jn4yfkftapdg             0

delete from DEMO where null is not null;
  --  0 rows deleted.
  
commit;

select * from v$result_cache_dependency;
  --  no rows selected
  
select id,type,status,name,cache_id,invalidations from v$result_cache_objects order by id;
  -- ID TYPE       STATUS    NAME                                          CACHE_ID                   INVALIDATIONS
  -- -- ---------- --------- --------------------------------------------- -------------------------- -------------
  --  0 Dependency Published K.DEMO                                        K.DEMO                                 1
  --  1 Result     Invalid   select /*+ result_cache */ count(*) from DEMO 3du8t8wtmx913djt9nvwm561sb             0
As we know, in order to record plan_table, "autotrace on" started a new transaction after first select statement with some DML like:

  DELETE FROM PLAN_TABLE WHERE STATEMENT_ID=:1
  insert into plan_table (statement_id, timestamp, operation ..
We can verify it by:

set autotrace off
commit;

select sum(s.sid), count(*)from v$transaction t, v$session s where t.addr=s.taddr and s.sid = sys.dbms_support.mysid;
  --  SUM(S.SID)   COUNT(*)
  --  ---------- ----------
  --                      0

set autotrace on

select dummy from dual;

select sum(s.sid), count(*)from v$transaction t, v$session s where t.addr=s.taddr and s.sid = sys.dbms_support.mysid;
  --  SUM(S.SID)   COUNT(*)
  --  ---------- ----------
  --         543          1
Removing "set autotrace on" and rerun the above test, there is no more such Invalid.

The original Blog got updated by a new Blog: Result cache invalidation caused by DML locks with deep analysis. Additionally it shows the Invalidation changes triggered by referential integrity in 12c, which needs to be further investigated.

Monday, September 18, 2017

Result Cache: RC Latch Contention and PL/SQL Exception Handler (III)

If we have fixed all the causes of Result Cache: RC Latch Contention discussed in previous two Blogs, the system still suffers from such Contention. There could exist the cases, which we have not yet covered.

In this Blog, we will discuss the third case: "Result Cache: RC Latch Contention and PL/SQL Exception Handler". So in total we have found 3 Cases:
1. PL/SQL Function Result Cache Invalidation (I) (DML and Select for Update)
2. Result Cache: RC Latch Contention and RESULT_CACHE_MAX_SIZE (II))
3. Result Cache: RC Latch Contention and PL/SQL Exception Handler (III)
Note: All tests are done in Oracle 11.2.0.4, 12.1.0.2, 12.2.0.1 on AIX, Solaris, Linux with 6 physical processors.

Update (2017-Sep-19): One day after this Blog appeared, it turns out that there exists an on-going bug fix, which probably addresses Case (II) and (III).
    Bug 23738186 : HIGH RESULT CACHE LATCHES AFTER MIGRATION TO 12.1.0.2
This fix addresses the Result Cache (RC) latch contention when the RC is in the LRU regime.
The following changes are part of this fix:

1. Bypass Object (BO)
    Currently, when a RC function throws an exception or a RC SQL exceeds memory
    limit, the under-construction Result Object (RO) is marked invalid and released.
    The customer's application has some RC functions that have a high rate of
    exceptions, which leads to high rate of invalid RO creation, which in turn leads
    to noticeable x-latch contention. To reduce these wasteful RO creations, we will
    instead store a BO when the function+parameters throws an exception. A BO is a
    regular RO that is created, on client directive, by the RC, but unlike regular
    RO's a BO can never be read. A BO uses exactly one block that contains the RO
    header but no data. A client trying to read a BO always gets a (non-storable)
    miss which forces the client to handle the miss by executing the function.
    
    Note, unlike regular misses, a non-storable miss does not allow the client to
      create a new RO. We have introduced two new statistics in v$result_cache_statistics
      to track the creation and use of BO's (for an example see tkqgcbug.sql/log).
 
2. Group release of invalid objects
    Release all invalid objects together (instead of one at a time).
 
3. Improved LRU list processing
    Instead of scanning all object on the list looking for the min lru#, stop
    the scan when the first un-scanned object (i.e. scan_count is 0) is reached.


1. Test Code


Out test table (TEST_TAB) contains 4 rows (1-4). Any select not in (1, 2, 3, 4) will get NO_DATA_FOUND exception.

We create 4 result_cache functions for 4 different scenarios:
  1. rc_c1: exception handler and return 
  2. rc_c2: exception handler and no return
  3. rc_c3: exception handler, raise again
  4. rc_c4: no exception handler

--------------------------------- Test Setup ---------------------------------
drop table test_tab;
drop table rc_latch_hist;
drop table rc_stats_hist;

create table test_tab as select level x from dual connect by level <= 4;

create table rc_latch_hist as 
select 'aaaaaaaaaaaaaaaaaaaa' snap_name, systimestamp ts, v.* from v$latch v where name = 'Result Cache: RC Latch'; 

create table rc_stats_hist as 
select 'aaaaaaaaaaaaaaaaaaaa' snap_name, systimestamp ts, v.* from v$result_cache_statistics v;

-- exception handler: return
create or replace function rc_c1(p_x number) return number result_cache as
  l_ret number;
begin
  select x into l_ret from test_tab where x = p_x;
  return l_ret;
  exception when others then return null;   --NO_DATA_FOUND
end;
/

-- exception handler: no return
create or replace function rc_c2(p_x number) return number result_cache as
  l_ret number;
begin
  select x into l_ret from test_tab where x = p_x;
  return l_ret;
  exception when others then null;
end;
/

-- exception handler: raise again
create or replace function rc_c3(p_x number) return number result_cache as
  l_ret number;
  my_excep exception;
  pragma exception_init(my_excep, -20001);
begin
  select x into l_ret from test_tab where x = p_x;
  return l_ret;
  exception when others then raise_application_error(-20001, 'My Private Exception'); 
end;
/

-- no exception handler
create or replace function rc_c4(p_x number) return number result_cache as
  l_ret number;
begin
  select x into l_ret from test_tab where x = p_x;
  return l_ret;
end;
/

create or replace procedure rc_proc_case(p_case number, p_rc_limit number) as
  l_sql_bool_val boolean := true;
  l_ret varchar2(10);
begin
 dbms_application_info.set_module('RC Test', 'RC_CASE_'||p_case);
  for x in 1..p_rc_limit loop
   begin
    case p_case 
      when 1 then l_ret := rc_c1(x);
      when 2 then l_ret := rc_c2(x);
      when 3 then l_ret := rc_c3(x);
      when 4 then l_ret := rc_c4(x);
    end case;
    exception when others then null;
    end;
  end loop;
end;
/

create or replace procedure rc_case_job(p_case number, p_job_cnt number, p_rc_limit number) as
  l_job_id pls_integer;
begin
  for i in 1.. p_job_cnt loop
    dbms_job.submit(l_job_id, 'begin while true loop rc_proc_case('||p_case||', '||p_rc_limit||'); end loop; end;');
  end loop;
  commit;
end;    
/

create or replace procedure clean_jobs as
begin
  for c in (select job from dba_jobs) loop
    begin
       dbms_job.remove (c.job);
    exception when others then null;
    end;
    commit;
  end loop;

  for c in (select d.job, d.sid, (select serial# from v$session where sid = d.sid) ser 
              from dba_jobs_running d) loop
    begin
      execute immediate
             'alter system kill session '''|| c.sid|| ',' || c.ser|| ''' immediate';
      dbms_job.remove (c.job);
    exception when others then null;
    end;
    commit;
  end loop;
  
  -- select * from dba_jobs;
  -- select * from dba_jobs_running;
end;
/

create or replace procedure rc_record_rc_latch(p_snap_name varchar2) as
begin
  insert into rc_latch_hist
  select p_snap_name snap_name, systimestamp ts, v.* from v$latch v where name = 'Result Cache: RC Latch';
  insert into rc_stats_hist
  select p_snap_name snap_name, systimestamp ts, v.* from v$result_cache_statistics v;
  commit; 
end;    
/

create or replace procedure rc_case_test(p_sleep number) as
begin
  clean_jobs;
  rc_record_rc_latch('Start');             
  rc_case_job(p_case=> 1, p_job_cnt=> 2, p_rc_limit=> 4); dbms_lock.sleep(p_sleep);
  rc_record_rc_latch('No_Error');          
  rc_case_job(p_case=> 1, p_job_cnt=> 1, p_rc_limit=> 5); dbms_lock.sleep(p_sleep);
  rc_record_rc_latch('Except_Return') ;     
  rc_case_job(p_case=> 2, p_job_cnt=> 1, p_rc_limit=> 5); dbms_lock.sleep(p_sleep);
  rc_record_rc_latch('Except_No_Return');  
  rc_case_job(p_case=> 3, p_job_cnt=> 1, p_rc_limit=> 5); dbms_lock.sleep(p_sleep);
  rc_record_rc_latch('Except_Raise');      
  rc_case_job(p_case=> 4, p_job_cnt=> 1, p_rc_limit=> 5); dbms_lock.sleep(p_sleep);
  rc_record_rc_latch('No_Hanler');
  clean_jobs;
end;    
/


2. Single Session Observation Test


Run following small test code:

column name format    a20
column rc_name format a48
column value format   a10
set numformat         999,999,999,999,999
column id format      999,999
alter session set nls_date_format ='dd-mon-yyyy hh24:mi:ss';
exec dbms_result_cache.flush;

select id, type, status, name rc_name, creation_timestamp, scn, lru_number from v$result_cache_objects v order by 3 desc, name, status;

var l_x number;
exec :l_x := rc_c4(1);
exec dbms_lock.sleep(1);
select id, type, status, name rc_name, creation_timestamp, scn, lru_number from v$result_cache_objects v order by 3 desc, name, status;

exec :l_x := rc_c4(5);
exec dbms_lock.sleep(1);
select id, type, status, name rc_name, creation_timestamp, scn, lru_number from v$result_cache_objects v order by 3 desc, name, status;
Output shows that there is "Invalid" Result when hitting NO_DATA_FOUND.

SQL > select id, type, status, name rc_name, creation_timestamp, scn, lru_number from v$result_cache_objects v order by 3 desc, name, status;
  no rows selected

SQL > exec :l_x := rc_c4(1);
SQL > select id, type, status, name rc_name, creation_timestamp, scn, lru_number from v$result_cache_objects v order by 3 desc, name, status;
   ID TYPE       STATUS    RC_NAME                                     CREATION_TIMESTAMP            SCN           LRU_NUMBER
  --- ---------- --------- ------------------------------------------- -------------------- -------------------- ---------------
    1 Result     Published "K"."RC_C4"::8."RC_C4"#3048d2af80817a01 #1  2017*sep*18 06:58:33    9,237,974,126,701   1,985,444,743
    0 Dependency Published K.RC_C4                                     2017*sep*18 06:58:33    9,237,974,126,701               0
    2 Dependency Published K.TEST_TAB                                  2017*sep*18 06:58:33    9,237,974,126,701               0

SQL > exec :l_x := rc_c4(5);
  ORA-01403: no data found
SQL > select id, type, status, name rc_name, creation_timestamp, scn, lru_number from v$result_cache_objects v order by 3 desc, name, status;
   ID TYPE       STATUS    RC_NAME                                     CREATION_TIMESTAMP            SCN           LRU_NUMBER
  --- ---------- --------- ------------------------------------------- -------------------- -------------------- ---------------
    1 Result     Published "K"."RC_C4"::8."RC_C4"#3048d2af80817a01 #1  2017*sep*18 06:58:33    9,237,974,126,701   1,985,444,743
    0 Dependency Published K.RC_C4                                     2017*sep*18 06:58:33    9,237,974,126,701               0
    2 Dependency Published K.TEST_TAB                                  2017*sep*18 06:58:33    9,237,974,126,701               0
    3 Result     Invalid   "K"."RC_C4"::8."RC_C4"#3048d2af80817a01 #1  2017*sep*18 06:58:34    9,237,974,126,701               0


3. Multiple Session Concurrency Test


On a UNIX machine with 6 physical processors, we start 6 Oracle Jobs one after another with an interval of about 60 seconds before each launch. At same time, we collect RC Latch and RC Stats between each launch.
1. First 2 Jobs (base load) select row 1-4, and there is no exception. 
   All other Jobs select row 1-5, and hence get exception when select row 5.
2. Job 3 has exception handler and returns null. 
3. Job 4 has exception handler, but does not have return. 
4. Job 5 has exception handler, but raises again.
5. Job 6 does not have exception handler.

truncate table rc_latch_hist;
truncate table rc_stats_hist;
exec dbms_result_cache.flush;
exec rc_case_test(60);
During the test, if we monitor RC Objects, we can see the status: "New" and "Invalid".

select type, status, name, count(*) from v$result_cache_objects 
group by type, status, name order by type desc, status, name;

TYPE       STATUS    NAME                                         COUNT(*)
---------- --------- ------------------------------------------ ----------
Result     Invalid   "K"."RC_C2"::8."RC_C2"#3048d2af80817a01 #1          7
Result     Invalid   "K"."RC_C3"::8."RC_C3"#3048d2af80817a01 #1          1
Result     New       "K"."RC_C3"::8."RC_C3"#3048d2af80817a01 #1          1
Result     New       "K"."RC_C4"::8."RC_C4"#3048d2af80817a01 #1          1
Result     Published "K"."RC_C1"::8."RC_C1"#3048d2af80817a01 #1          5
Result     Published "K"."RC_C2"::8."RC_C2"#3048d2af80817a01 #1          4
Result     Published "K"."RC_C3"::8."RC_C3"#3048d2af80817a01 #1          4
Result     Published "K"."RC_C4"::8."RC_C4"#3048d2af80817a01 #1          4
Dependency Published K.RC_C1                                             1
Dependency Published K.RC_C2                                             1
Dependency Published K.RC_C3                                             1
Dependency Published K.RC_C4                                             1
Dependency Published K.TEST_TAB                                          1
After the test, we look the collected stats, which shows the difference of RC Latch SLEEPS in 4 different scenarios.

The test result demonstrates that except the first scenario: "rc_c1: exception handler and return", all other 3 PL/SQL Result Cache implantations can cause RC Latch Contentions.

select v.*, round(wait_time/(nullif (sleeps, 0)), 2) wait_time_us_per_sleep
from (
    select snap_name, ts
          ,gets-lag(gets) over(order by ts) gets
          ,misses-lag(misses) over(order by ts) misses
          ,sleeps-lag(sleeps) over(order by ts) sleeps
          ,spin_gets-lag(spin_gets) over(order by ts) spin_gets
          ,wait_time-lag(wait_time) over(order by ts) wait_time
    from rc_latch_hist
) v order by ts;

SNAP_NAME         TS                                  GETS      MISSES     SLEEPS    SPIN_GETS    WAIT_TIME  WAIT_TIME_US_PER_SLEEP
----------------- --------------------------- ------------ ----------- ---------- ------------ ------------ -----------------------
Start             18-SEP-2017 07:59:55 +02:00                                                                
No_Error          18-SEP-2017 08:00:55 +02:00   61,623,726     790,876     0           790,876            0  
Except_Return     18-SEP-2017 08:01:55 +02:00  127,973,369   6,743,323     1         6,743,469            3                       3
Except_No_Return  18-SEP-2017 08:02:55 +02:00  135,558,711  14,221,514  1,277,086   12,948,140    7,762,707                       6
Except_Raise      18-SEP-2017 08:03:56 +02:00  123,932,503  16,996,212  2,304,628   14,752,639   14,765,338                       6
No_Hanler         18-SEP-2017 08:04:56 +02:00  112,230,114  18,895,634  3,322,022   15,724,684   22,069,024                       7

select name, snap_name, ts
      ,value-lag(value) over(partition by name order by ts) value
from rc_stats_hist
where name in ('Create Count Success', 'Delete Count Invalid')
order by name, ts;

NAME                    SNAP_NAME            TS                             VALUE
----------------------- -------------------- ------------------------------ ----------
Create Count Success    Start                18-SEP-2017 07:59:55 +02:00
Create Count Success    No_Error             18-SEP-2017 08:00:55 +02:00        4
Create Count Success    Except_Return        18-SEP-2017 08:01:55 +02:00        1
Create Count Success    Except_No_Return     18-SEP-2017 08:02:55 +02:00       747,926
Create Count Success    Except_Raise         18-SEP-2017 08:03:56 +02:00     1,410,675
Create Count Success    No_Hanler            18-SEP-2017 08:04:56 +02:00     2,025,890
Delete Count Invalid    Start                18-SEP-2017 07:59:55 +02:00
Delete Count Invalid    No_Error             18-SEP-2017 08:00:55 +02:00        0
Delete Count Invalid    Except_Return        18-SEP-2017 08:01:55 +02:00        0
Delete Count Invalid    Except_No_Return     18-SEP-2017 08:02:55 +02:00       747,902
Delete Count Invalid    Except_Raise         18-SEP-2017 08:03:56 +02:00     1,410,676
Delete Count Invalid    No_Hanler            18-SEP-2017 08:04:56 +02:00     2,025,891


4. SQL Result Cache Difference


In the above test, we use PL/SQL Result Cache. if we replace all of them by SQL Result Cache, for example:

create or replace function rc_c4(p_x number) return number as
  l_ret number;
begin
  select /*+ result_cache */ x into l_ret from test_tab where x = p_x;   
  return l_ret;
end;
/
make a similar test:

exec clean_jobs;
exec dbms_result_cache.flush;
exec rc_case_job(p_case=> 4, p_job_cnt=> 4, p_rc_limit=> 5);
and monitor RC Objects by:

select type, status, name, count(*) from v$result_cache_objects 
group by type, status, name order by type desc, status, name;

TYPE       STATUS    NAME                                                        COUNT(*)
---------- --------- ---------------------------------------------------------- ---------
Result     Published SELECT /*+ result_cache */ X FROM TEST_TAB WHERE X = :B1           5
Dependency Published K.TEST_TAB                                                         1
There does exist any RC status "New" or "Invalid". We can hardly observe RC Latch Contentions because there are no RC Latch SLEEPS. Therefore SQL Result Cache and PL/SQL Result Cache are probably NOT equivalent.

It is not clear how Oracle internally handles differently SQL Result Cache and PL/SQL Result Cache. But it could affect real applications when using them differently.

Monday, September 11, 2017

Result Cache: RC Latch Contention and RESULT_CACHE_MAX_SIZE (II)

Result Cache memory is limited by RESULT_CACHE_MAX_SIZE. Its default value is 1% of SHARED_POOL_SIZE. When allocated space reaches RESULT_CACHE_MAX_SIZE, any new request can cause contention of "Result Cache: RC Latch". RC Latch "ksl_get_shared_latch" Gets jumps from an order of magnitude Microseconds (us) to Millisecond (ms). The single RC Latch could eventually serialize the system ("Result Cache: Serialization12").

Each RC row takes at least one RC BLOCK (1K). In case of caching SQL Collection, it can occupy more than 1 BLOCK.

In the previous Blog, we discussed the RC Latch Contention caused by RC invalidation triggered by DML statements (Blog: PL/SQL Function Result Cache Invalidation)

Note: all tests are done in Oracle 11.2.0.4, 12.1.0.2, 12.2.0.1 on AIX, Solaris, Linux.

Update (2017-Sep-18): Oracle MOS: SQL Query Result Cache. Includes: [Video] (Doc ID 1108133.1)
RESULT_CACHE_MAX_SIZE defines the memory allocated to the result cache. When set to 0, the Result cache is disabled. The Default value is either 0.25% of MEMORY_TARGET or 0.5% of SGA_TARGET or 1% of SHARED_POOL_SIZE and is chosen by the database based on total memory available to the SGA and on the memory management method currently in use.


1. Test_1: Under Memory Limit


Test DB runs on a UNIX machine with 6 physical processors, and configured with:
 
  shared_pool_size         1408M
  
  result_cache_max_size    14432K
  result_cache_max_result  5
  result_cache_mode        MANUAL
When not specially set, result_cache_max_size is derived from shared_pool_size, which is about 1% of shared_pool_size in default.

Each RC row requires at least one RC BLOCK (1K), and with result_cache_max_size = 14432K, we can have maximum about 14432 RC Rows (ignoring some overhead).

Run test with 4 parallel sessions to create 13000 RC Rows (see appended Test Code):
 
exec clean_jobs;
exec dbms_result_cache.flush;
exec dbms_result_cache.memory_report(TRUE);
exec rc_fun_jobs(4, 13000);
Run query below, we can hardly observe RC Latch Sleeps increasing:
 
select latch#, name, gets, misses, spin_gets, sleeps, wait_time 
from v$latch where name = 'Result Cache: RC Latch'; 
-- latch# in (436, 559, 665) for Oracle 11.2.0.4/12.1.0.2, 12.2.0.1
Latch Misses is comprised of Spin_gets and Sleeps (see Blog: Is latch misses statistic gathered or deduced ? ), and the long wait is exposed by Sleeps.

In AWR report - "Top 10 Foreground Events by Total Wait Time" Section, it is presented as column "Wait Avg(ms)". In case of RC Latch, we can see it in the line "latch free". AWR "Latch Sleep Breakdown" Section contains more details.

Crosscheck RC memory usage by:
 
SQL > exec dbms_result_cache.memory_report(TRUE);
R e s u l t   C a c h e   M e m o r y   R e p o r t
[Parameters]
Block Size          = 1K bytes
Maximum Cache Size  = 14432K bytes (14432 blocks)
Maximum Result Size = 721K bytes (721 blocks)
[Memory]
Total Memory = 13540016 bytes [0.917% of the Shared Pool]

... Fixed Memory = 46272 bytes [0.003% of the Shared Pool]
....... Memory Mgr = 208 bytes
....... Cache Mgr  = 256 bytes
....... Bloom Fltr = 2K bytes

....... State Objs = 43760 bytes
... Dynamic Memory = 13493744 bytes [0.914% of the Shared Pool]
....... Overhead = 157168 bytes
........... Hash Table    = 64K bytes (4K buckets)
........... Chunk Ptrs    = 24K bytes (3K slots)
........... Chunk Maps    = 12K bytes
........... Miscellaneous = 157168 bytes
....... Cache Memory = 13024K bytes (13024 blocks)
........... Unused Memory = 23 blocks
........... Used Memory = 13001 blocks
............... Dependencies = 1 blocks (1 count)
............... Results = 13000 blocks
................... PLSQL   = 13000 blocks (13000 count)


SQL > select id, name, value from v$result_cache_statistics order by name;

   ID NAME                           VALUE
----- ------------------------------ ----------
    3 Block Count Current            13024
    2 Block Count Maximum            14432
    1 Block Size (Bytes)             1024
    6 Create Count Failure           0
    5 Create Count Success           13321
    9 Delete Count Invalid           321
   10 Delete Count Valid             0
   12 Find Copy Count                34959999
    7 Find Count                     34370958
   11 Hash Chain Length              1-5
    8 Invalidation Count             0
   13 Latch (Share)                  0
    4 Result Size Maximum (Blocks)   721


SQL > select name, bytes from v$sgastat where name in ('Result Cache');

NAME               BYTES
------------- ----------
Result Cache    13493744


SQL > column name     format a50
SQL > column cache_id format a30
SQL > select name, cache_id, type, status, namespace, sum(block_count) blocks, sum(ceil(row_count*row_size_avg/1024)) est_blocks
      from v$result_cache_objects
      group by name, cache_id, type, status, namespace;

NAME                                               CACHE_ID                       TYPE       STATUS    NAMESPACE      BLOCKS EST_BLOCKS
-------------------------------------------------- ------------------------------ ---------- --------- ---------- ---------- ----------
K.RC_FUN                                           K.RC_FUN                       Dependency Published                     1          0
"K"."RC_FUN"::8."RC_FUN"#8cd0e193d8974f19 #1       cyma7sjgpkr5zbtbp804tt27r7     Result     Published PLSQL           13000      13000
RC memory is under its maximum Limit.

"Block Count Current: 13024" < "Block Count Maximum: 14432", and "Delete Count Invalid: 0".

Measuring latch Get time by dtrace:
 
sudo dtrace -n \
'BEGIN {self->start_wts = walltimestamp; self->start_ts = timestamp;}
pid$target::qesrcCM_Acquire:entry /execname == "oracle"/ { self->rc = 1; }
pid$target::ksl_get_shared_latch:entry /execname == "oracle" && self->rc == 1/ { self->ts = timestamp; }
pid$target::ksl_get_shared_latch:return /self->ts > 0/ {
        @lquant["ns"] = lquantize(timestamp - self->ts, 0, 10000, 1000);
        @avgs["AVG_ns"] = avg(timestamp - self->ts);
        @mins["MIN_ns"] = min(timestamp - self->ts);
        @maxs["MAX_ns"] = max(timestamp - self->ts);
        @sums["SUM_ms"] = sum((timestamp - self->ts)/1000000);
        @counts[ustack(10, 0)] = count();
        self->rc = 0; self->ts = 0;}
END { printf("Start: %Y, End: %Y, Elapsed_ms: %d\n", self->start_wts, walltimestamp, (timestamp - self->start_ts)/1000000);}
' -p 13945


:BEGIN dtrace started: 2017 Sep 11 07:11:57
:END dtrace ended: 2017 Sep 11 07:11:59 ,Elapsed_ms: 1978
  ns
           value  ------------- Distribution ------------- count
             < 0 |                                         0
               0 |@@@@@@@@@@@@@@                           102706
            1000 |@@@@@@@@@@@@@@@@@@@@@@@@@@               186749
            2000 |                                         2193
            3000 |                                         54
            4000 |                                         49
            5000 |                                         42
            6000 |                                         28
            7000 |                                         16
            8000 |                                         9
            9000 |                                         10
        >= 10000 |                                         107

  AVG_ns     1167
  MIN_ns      827
  MAX_ns   169083
  SUM_ms        0

           oracle`ksl_get_shared_latch+0x1b8
           oracle`qesrcCM_Acquire+0x354
           oracle`psdfrcLookup+0x12d
           oracle`pevm_ENTERX+0x143
           oracle`pfrinstr_ENTERX+0x47
           oracle`pfrrun_no_tool+0x12a
           oracle`pfrrun+0x4c0
           oracle`plsql_run+0x288
           oracle`peicnt+0x946
           oracle`kkxexe+0x2f3
  291963
In average, each Get takes 1,167 Nanoseconds:


2. Test_2: Over Memory Limit


Stop above tests, re-run the test to create 15000 RC Rows, make the same checks:
 
exec clean_jobs;
exec dbms_result_cache.flush;
exec dbms_result_cache.memory_report(TRUE);
exec rc_fun_jobs(4, 15000);

SQL > exec dbms_result_cache.memory_report(TRUE);
R e s u l t   C a c h e   M e m o r y   R e p o r t
[Parameters]
Block Size          = 1K bytes
Maximum Cache Size  = 14432K bytes (14432 blocks)
Maximum Result Size = 721K bytes (721 blocks)
[Memory]
Total Memory = 14984624 bytes [1.015% of the Shared Pool]

... Fixed Memory = 46272 bytes [0.003% of the Shared Pool]
....... Memory Mgr = 208 bytes
....... Cache Mgr  = 256 bytes
....... Bloom Fltr = 2K bytes

....... State Objs = 43760 bytes
... Dynamic Memory = 14938352 bytes [1.012% of the Shared Pool]
....... Overhead = 159984 bytes
........... Hash Table    = 64K bytes (4K buckets)
........... Chunk Ptrs    = 24K bytes (3K slots)
........... Chunk Maps    = 12K bytes
........... Miscellaneous = 159984 bytes
....... Cache Memory = 14432K bytes (14432 blocks)
........... Unused Memory = 0 blocks
........... Used Memory = 14432 blocks
............... Dependencies = 1 blocks (1 count)
............... Results = 14431 blocks
................... PLSQL   = 14431 blocks (14431 count)


SQL > select id, name, value from v$result_cache_statistics order by name;

   ID NAME                           VALUE
----- ------------------------------ ----------
    3 Block Count Current            14432
    2 Block Count Maximum            14432
    1 Block Size (Bytes)             1024
    6 Create Count Failure           0
    5 Create Count Success           143128
    9 Delete Count Invalid           169
   10 Delete Count Valid             128528
   12 Find Copy Count                80928
    7 Find Count                     86758
   11 Hash Chain Length              1-5
    8 Invalidation Count             0
   13 Latch (Share)                  0
    4 Result Size Maximum (Blocks)   721


SQL > select name, bytes from v$sgastat where name in ('Result Cache');

NAME                   BYTES
----------------- ----------
Result Cache        14938352


SQL > select name, cache_id, type, status, namespace, sum(block_count) blocks, sum(ceil(row_count*row_size_avg/1024)) est_blocks
      from v$result_cache_objects
      group by name, cache_id, type, status, namespace;

NAME                                               CACHE_ID                       TYPE       STATUS    NAMESPACE      BLOCKS EST_BLOCKS
-------------------------------------------------- ------------------------------ ---------- --------- ---------- ---------- ----------
"K"."RC_FUN"::8."RC_FUN"#8cd0e193d8974f19 #1       cyma7sjgpkr5zbtbp804tt27r7     Result     New       PLSQL               1          0
K.RC_FUN                                           K.RC_FUN                       Dependency Published                     1          0
"K"."RC_FUN"::8."RC_FUN"#8cd0e193d8974f19 #1       cyma7sjgpkr5zbtbp804tt27r7     Result     Published PLSQL           14430      14430
dbms_result_cache.memory_report shows RC Memory reaches result_cache_max_size:
 
 Total Memory = 14984624 bytes [1.015% of the Shared Pool]
"Block Count Current: 14432" = "Block Count Maximum: 14432", and "Delete Count Invalid: 128528".

Output of query on v$result_cache_objects occasionally contains one line with Status "NEW" when repeatedly running.

Perform the latch Sleeps check:
 
select latch#, name, gets, misses, spin_gets, sleeps, wait_time 
from v$latch where name = 'Result Cache: RC Latch';

 LATCH# NAME                         GETS     MISSES  SPIN_GETS     SLEEPS  WAIT_TIME
------- ---------------------- ---------- ---------- ---------- ---------- ----------
    559 Result Cache: RC Latch  696395773 3217974991 2909396116 1023307692 7.9453E+11
It shows that Sleeps goes up steadily when each time run the query.

Again measuring latch Get time by dtrace:
 
Start: 2017 Sep 11 07:40:44, End: 2017 Sep 11 07:40:46, Elapsed_ms: 2333
  ns
           value  ------------- Distribution ------------- count
               0 |                                         0
            1000 |@                                        15
            2000 |                                         1
            3000 |                                         0
            4000 |                                         0
            5000 |                                         0
            6000 |                                         0
            7000 |                                         0
            8000 |                                         0
            9000 |                                         0
        >= 10000 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  747

  AVG_ns     3029147
  MIN_ns        1031
  MAX_ns    87959260
  SUM_ms        1951

            oracle`ksl_get_shared_latch+0x1b8
            oracle`qesrcCM_Acquire+0x354
            oracle`psdfrcLookup+0x12d
            oracle`pevm_ENTERX+0x143
            oracle`pfrinstr_ENTERX+0x47
            oracle`pfrrun_no_tool+0x12a
            oracle`pfrrun+0x4c0
            oracle`plsql_run+0x288
            oracle`peicnt+0x946
            oracle`kkxexe+0x2f3
  763
In average, each get takes 3,029,147 Nanoseconds (3 Microseconds).

RC Latch "ksl_get_shared_latch" Gets jumps from an order of magnitude Microseconds (us) to Millisecond (ms).

In real applications, we can also build a histogram on RC creation_timestamp to track RC usages. (see Blog: Latch free waits, Dynamic Statistics, and the Result Cache and MOS Doc ID 2002089.1: High "Latch Free" Waits with Contention on 'Result Cache: RC Latch' when RESULT_CACHE_MODE = MANUAL on Oracle 12c)
 
select trunc(creation_timestamp, 'MI') rc_create_ts
      ,name, cache_id, type, status, namespace
      ,count(*) rc_count, round(avg(scan_count)) avg_scan_cnt, sum(block_count) blocks, sum(ceil(row_count*row_size_avg/1024)) est_blocks 
from   v$result_cache_objects t
where  lru_number > 0
group by name, cache_id, type, status, namespace, trunc(creation_timestamp, 'MI')
order by 1, rc_count desc, avg_scan_cnt desc; 
Blog: PL/SQL Function Result Cache Invalidation contains 5 views in Section: Causal Analysis of Result Cache, and provides some code for Result Cache History Recording.


3. RC Latch Contention and CPU Usage


There exists one common understanding that Latch Contention is manifested in UNIX as high CPU Usage. In both above tests, we have a machine with 6 physical processors, and started 4 parallel sessions, so there is no CPU saturation.

When creating 13000 RC Rows, we observe our 4 processes at the TOP of CPU usage:
 
load averages:  3.98,  2.85,  2.09;                    up 148+17:47:10 07:55:59
73 processes: 68 sleeping, 5 on cpu
CPU states: 32.9% idle, 66.8% user,  0.3% kernel,  0.0% iowait,  0.0% swap
Memory: 24G phys mem, 9991M free mem, 30G total swap, 30G free swap

   PID USERNAME LWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
 16030 oracle     1   3    0 4712M 4632M cpu/5    4:50 16.75% oracle
 16032 oracle     1   1    0 4712M 4632M cpu/4    4:50 16.69% oracle
 16026 oracle     1   1    0 4712M 4632M cpu/2    4:50 16.45% oracle
 16028 oracle     1   1    0 4712M 4632M cpu/1    4:50 16.39% oracle
When creating 15000 RC Rows:
 
load averages:  1.16,  1.69,  1.84;                    up 148+17:54:36 08:03:25
73 processes: 71 sleeping, 2 on cpu
CPU states: 81.7% idle, 17.5% user,  0.8% kernel,  0.0% iowait,  0.0% swap
Memory: 24G phys mem, 9991M free mem, 30G total swap, 30G free swap

   PID USERNAME LWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
 16134 oracle     1   1    0 4712M 4632M cpu/2    1:35  6.52% oracle
 16132 oracle     1   1    0 4712M 4637M sleep    1:31  4.76% oracle
 16138 oracle     1   1    0 4712M 4632M sleep    1:39  3.42% oracle
 16136 oracle     1   1    0 4712M 4632M sleep    1:36  3.23% oracle
The differences are:
 
Creating 13000 RC Rows has:
  load averages:  3.98
  5 on cpu
  66.8% user
  
Creating 15000 RC Rows has:
  load averages:  1.16
  2 on cpu
  17.5% user
Probably Latch Spins is burning CPU; whereas Latch Sleeps yields CPU. Therefore when talking about Latch contentions, it would be necessary to distinguish between Spin_gets and Sleeps. As tested, usually Spin_gets of Latch Misses are caused by frequently concurrent access; whereas Sleeps of Latch Misses are triggered by Invalidations or DML Modifications.

On an AIX machine with 6 cores and SMT=4 (see Blog: IBM AIX POWER7 CPU Usage and Throughput), run test:
 
  exec rc_fun_jobs(4, 15000);
and list call stacks of all 4 processes for 4 running Oracle sessions (common code removed):
 
$ > procstack 52166700 57278636 58982430 62521360

52166700: ora_j004_aix3
  thread_wait(0xbb800000bb8) + 0x22c
  sskgpwwait(??, ??, ??, ??, ??) + 0x54
  skgpwwait(??, ??, ??, ??, ??) + 0x12c
  ksliwat(??, ??, ??) + 0x1f08
  kslwaitctx(??, ??) + 0xe0
  ksqcmi(??, ??, ??, ??, ??, ??, ??, ??) + 0x6b4c
  ksqgtlctx(??, ??, ??, ??, ??, ??, ??, ??) + 0x10bc
  ksqgelctx(??, ??, ??, ??, ??, ??, ??, ??) + 0x21c
  qesrcCM_Wait(??, ??, ??) + 0x3c4
  qesrcCM_Acquire(??, ??, ??, ??, ??, ??) + 0x474
  psdfrcLookup(??, ??, ??, ??, ??, ??) + 0x124
  pevm_ENTERX(??, ??, ??, ??) + 0x150
  pfrinstr_ENTERX(0xfffffffffff44a0, 0x7000100e5e4e618, 0x122053f58) + 0x60
  pfrrun_no_tool(??, ??, ??) + 0x7c

57278636: ora_j003_aix3
  sskgpwpost(??, ??, ??, ??) + 0x44
  sskgpwwait(??, ??, ??, ??, ??) + 0x54
  skgpwwait(??, ??, ??, ??, ??) + 0x12c
  kslgess(0x0, 0x1600000010, 0x200000002, 0x1, 0x14fe000014fe) + 0x9c8
  ksl_get_shared_latch(??, ??, ??, ??, ??, ??) + 0x460
  qesrcCM_Release(??, ??, ??) + 0x18c
  psdfrcWriteResult(??, ??, ??, ??) + 0x360
  pevm_RET(??, ??) + 0x380
  pfrinstr_RET(0xfffffffffff4490, 0x7000100e5e4e658, 0x122053f58) + 0x4c
  pfrrun_no_tool(??, ??, ??) + 0x7c

58982430: ora_j001_aix3
  thread_wait(0x10e0000012c) + 0x22c
  sskgpwwait(??, ??, ??, ??, ??) + 0x54
  skgpwwait(??, ??, ??, ??, ??) + 0x12c
  kslgess(0x8, 0x8e00000008, 0x200000002, 0x1, 0x14ee000014ee) + 0x9c8
  ksl_get_shared_latch(??, ??, ??, ??, ??, ??) + 0x460
  qesrcPin_Get(??, ??, ??, ??, ??, ??, ??, ??) + 0x218
  qesrcCM_Acquire(??, ??, ??, ??, ??, ??) + 0x368
  psdfrcLookup(??, ??, ??, ??, ??, ??) + 0x124
  pevm_ENTERX(??, ??, ??, ??) + 0x150
  pfrinstr_ENTERX(0xfffffffffff4490, 0x7000100e5e4e618, 0x122053f58) + 0x60

62521360: ora_j000_aix3
  qesrcCM_MakeRoom(??, ??, ??) + 0xec
  qesrcMM_Alloc(??, ??, ??) + 0x16c
  qesrcRO_New(??, ??, ??, ??) + 0xa9c
  qesrcPin_Get(??, ??, ??, ??, ??, ??, ??, ??) + 0x6f4
  qesrcCM_Acquire(??, ??, ??, ??, ??, ??) + 0x368
  psdfrcLookup(??, ??, ??, ??, ??, ??) + 0x124
  pevm_ENTERX(??, ??, ??, ??) + 0x150
  pfrinstr_ENTERX(0xfffffffffff4590, 0x7000100e5e4e618, 0x122053f58) + 0x60
  pfrrun_no_tool(??, ??, ??) + 0x7c
  pfrrun(??) + 0x136c
  plsql_run(??, ??, ??) + 0x310
  peicnt(??, ??) + 0x284
  kkxexe(??) + 0x2b4
  opiexe(??, ??, ??) + 0x4594
  opiodr(??, ??, ??, ??) + 0x3d4
  rpidrus(??) + 0x21c
  skgmstack(??, ??, ??, ??, ??) + 0xc0
  rpidru(??) + 0xa8
  rpiswu2(??, ??, ??, ??, ??, ??, ??, ??) + 0x5ac
  rpidrv(??, ??, ??, ??) + 0xe14
  rpiexe(??) + 0x80
  kkjex1e(??) + 0x1824
  kkjex1e_cdb(??, ??, ??, ??, ??) + 0x1d8
  kkjsexe() + 0x4f4
  kkjrdp() + 0x384
  opirip(??, ??, ??) + 0x408
  opidrv(??, ??, ??) + 0x484
  sou2o(??, ??, ??, ??) + 0x110
  opimai_real(??, ??) + 0x144
  ssthrdmain(0x0, 0x300000003, 0xfffffffffffebf8) + 0x1e4
  main(??, ??) + 0xf0
  __start() + 0x70
We can see that 4 processes are executing respectively:
 
  thread_wait(0xbb800000bb8)
  sskgpwpost(??, ??, ??, ??)
  thread_wait(0x10e0000012c)
  qesrcCM_MakeRoom(??, ??, ??) 
Two processes are waiting with thread_wait. AIX Document said:
  int thread_wait(timeout):  Specifies the maximum length of time, in milliseconds, to wait for a posting.
and two timeout values are:
 
  0x12c =  300
  0xbb8 = 3000
One speculation is that the first timeout is 300, that is, thread_wait(300). In case of later Sleep (not got RC Latch after first 300ms sleeping), timeout is multiplied by 10 (3000).

The thread_post subroutine posts the thread the occurrence of an event, hence waking up the sleeping threads.

The last process (Oracle session) is creating a new RC Row after holding Latch (qesrcPin_Get):
 
  qesrcCM_MakeRoom(??, ??, ??) 
  qesrcMM_Alloc(??, ??, ??)
  qesrcRO_New(??, ??, ??, ??) 
  qesrcPin_Get(??, ??, ??, ??, ??, ??, ??, ??)
  qesrcCM_Acquire(??, ??, ??, ??, ??, ??)


4. Big Rows


In above tests, each RC row has a size less than 1024 Bytes. When caching SQL Collections, we can make the row size more than 1K. Here two similar tests (see appended Collection Test Code) with each row taking about 2K (ROW_SIZE_AVG = 1099 Bytes, i.e 2 RC Blocks). one is under result_cache_max_size, another is beyond that.
 
exec clean_jobs;
exec dbms_result_cache.flush;
exec dbms_result_cache.memory_report(TRUE);
exec rc_tab_jobs(4, 6500, 100);

exec clean_jobs;
exec dbms_result_cache.flush;
exec dbms_result_cache.memory_report(TRUE);
exec rc_tab_jobs(4, 7500, 100);


5. Possible Solutions


Adapt (increase) result_cache_max_size to satisfy cache requests or reduce cache usage. It can be dynamically increased by:
 
alter system set result_cache_max_size = 30M;
We can also disable Result Cache by either dynamically setting result_cache_max_size = 0, or bypass RC:
 
alter system set result_cache_max_size = 0;

exec dbms_result_cache.bypass(true);
exec dbms_result_cache.bypass(false);
The difference is that "result_cache_max_size = 0" releases RC memory, but bypass instructs code bypassing RC, but still keeps the RC memory.

Oracle 12.2 introduced RC Black List, and dbms_result_cache is enhanced with 4 APIs:
 
  exec dbms_result_cache.Black_List_Add('cyma7sjgpkr5zbtbp804tt27r7', TRUE);     -- Add cache_id to black-list
  exec dbms_result_cache.Black_List_Remove('cyma7sjgpkr5zbtbp804tt27r7', TRUE);  -- Remove cache_id from black-list
  exec dbms_result_cache.Black_List_clear(TRUE);                                 -- Remove all cache_id's from black-list
  select * from table(dbms_result_cache.Black_List);                             -- List all contents of the black-list
They allow us to regulate RC usage in granularity of each single cache_id, instead of toggling whole RC Cache.


6. Test Code


 
create or replace function rc_fun(p_var_1 varchar2, p_var_2 varchar2) return pls_integer result_cache as
  l_ret pls_integer;
begin
  l_ret := length(p_var_1) + 1e3*length(p_var_2);
  return l_ret;
end;
/

create or replace procedure rc_proc(p_rc_limit number) as
  l_ret pls_integer;
begin
  for i in 1..p_rc_limit loop
    l_ret := rc_fun('p1_'||i, 'p2_'||(i + p_rc_limit));
  end loop;
end;
/

create or replace procedure rc_fun_jobs(p_job_cnt number, p_rc_limit number) as
  l_job_id pls_integer;
begin
  for i in 1.. p_job_cnt loop
    dbms_job.submit(l_job_id, 'begin while true loop rc_proc('||p_rc_limit||'); end loop; end;');
  end loop;
  commit;
end;    
/

create or replace procedure clean_jobs as
begin
  for c in (select job from dba_jobs) loop
    begin
       dbms_job.remove (c.job);
    exception when others then null;
    end;
    commit;
  end loop;

  for c in (select d.job, d.sid, (select serial# from v$session where sid = d.sid) ser 
              from dba_jobs_running d) loop
    begin
      execute immediate
             'alter system kill session '''|| c.sid|| ',' || c.ser|| ''' immediate';
      dbms_job.remove (c.job);
    exception when others then null;
    end;
    commit;
  end loop;
  
  -- select * from dba_jobs;
  -- select * from dba_jobs_running;
end;
/


7. Collection Test Code


 
create or replace type num_tab force is table of number(10)
/

create or replace function rc_fun_tab(p_id number, p_tab_cnt number) return num_tab result_cache as
  l_ret  num_tab := new num_tab();
begin
  for i in 1..p_tab_cnt loop
    l_ret.extend;
    l_ret(i) := p_id + i;
  end loop;
  return l_ret;
end;
/

create or replace procedure rc_proc_tab(p_rc_limit number, p_tab_cnt number) as
  l_ret num_tab := new num_tab();
begin
  for i in 1..p_rc_limit loop
    l_ret := rc_fun_tab(i, p_tab_cnt);
  end loop;
end;
/

create or replace procedure rc_tab_jobs(p_job_cnt number, p_rc_limit number, p_tab_cnt number) as
  l_job_id pls_integer;
begin
  for i in 1.. p_job_cnt loop
    dbms_job.submit(l_job_id, 'begin while true loop rc_proc_tab('||p_rc_limit||', '||p_tab_cnt||'); end loop; end;');
  end loop;
  commit;
end;    
/

Saturday, July 22, 2017

nls_database_parameters, dc_props, latch: row cache objects

This Blog will demonstrate latch: row cache objects on Dictionary Cache dc_props when querying nls_database_parameters.

Note: all tests are done in Oracle 12.1.0.2.0.

Reference: NLS understanding in this Blog is acquired and cited from:
    Oracle MOS: The Priority of NLS Parameters Explained (Where To Define NLS Parameters) (Doc ID 241047.1)


1. latch: row cache objects


At first, find v$rowcache.cache# mapping to v$latch_children.child#, then query nls_database_parameters, collecting statistics before and after test:

select r.kqrstcid cache#, r.kqrsttxt parameter, c.child#  --, r.*, c.* 
from x$kqrst r, v$latch_children c 
where r.kqrstcln = c.child# 
  and r.kqrsttxt = 'dc_props'
  and c.name     = 'row cache objects';

 CACHE# PARAMETER  CHILD#
------- --------- -------
     15 dc_props       18

select r.cache#, r.type, r.parameter, r.gets, r.count, c.name, c.child#, c.gets latch_gets
from v$rowcache r, v$latch_children c
where r.parameter = 'dc_props' and c.name = 'row cache objects' and c.child# = 18;  

 CACHE#   TYPE PARAMETER     GETS  COUNT              NAME  CHILD#  LATCH_GETS
------- ------ --------- -------- ------ ----------------- ------- -----------
     15 PARENT dc_props    140920     60 row cache objects      18      422820
                                                                    
select value from nls_database_parameters where parameter = 'NLS_CHARACTERSET';
  VALUE   
  --------
  AL32UTF8
  
select r.cache#, r.type, r.parameter, r.gets, r.count, c.name, c.child#, c.gets latch_gets
from v$rowcache r, v$latch_children c
where r.parameter = 'dc_props' and c.name = 'row cache objects' and c.child# = 18;  

 CACHE#   TYPE PARAMETER     GETS  COUNT              NAME  CHILD#  LATCH_GETS
------- ------ --------- -------- ------ ----------------- ------- -----------
     15 PARENT dc_props    140980     60 row cache objects      18      423000
We can see that dc_props GETs increased 60 (140980 - 140920), and LATCH_GETS increased 180 (423000 - 422820).

nls_database_parameters is defined by:

create or replace force view sys.nls_database_parameters (parameter, value) as
  select name, substr (value$, 1, 64) from x$props where name like 'NLS%';
A query on v$rowcache_parent for 'dc_props' shows:

select cache_name, existent, count(*) cnt from v$rowcache_parent 
 where cache_name = 'dc_props' group by cache_name, existent;

  CACHE_NAME  EXISTENT   CNT
  ----------- --------- ----
  dc_props    Y           37
  dc_props    N           23
It contains 60 rows, of which 37 are existing objects, 23 are not. Total number of 'dc_props' Cache entries: v$rowcache.COUNT = 60 is corresponding to 60 parent objects of v$rowcache_parent.

37 existing objects can be confirmed by:

select count(*) from x$props; 
   COUNT(*)
  ---------
         37
These 37 existing objects are probably originated from:

select * from sys.props$;
select * from database_properties;
Since the query on nls_database_parameters is a TABLE FULL scan on X$PROPS.

select value from nls_database_parameters where parameter = 'NLS_CHARACTERSET';

----------------------------------------------------------------------------
| Id  | Operation        | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT |         |     1 |    28 |     0   (0)| 00:00:01 |
|*  1 |  FIXED TABLE FULL| X$PROPS |     1 |    28 |     0   (0)| 00:00:01 |
----------------------------------------------------------------------------
and v$rowcache_parent contains 60 rows for 'dc_props', a TABLE FULL scan makes 60 GETs in v$rowcache, although the underlying X$PROPS contains only 37 EXISTENT objects. It seems that NON-EXISTENT objects also require the similar handling.

For each rowcache GET, it requires 3 "latch: row cache objects" GETs, that is why we have 180 LATCH_GETS.
(see Oracle Core: Essential Internals for DBAs and Developers Page 167:
    there was a common pattern indicating three latch gets for each dictionary cache get)

Oracle 12c V$ROWCACHE_PARENT wrote:

V$ROWCACHE_PARENT displays information for parent objects in the data dictionary. There is one row per lock owner, and one waiter for each object. This row shows the mode held or requested. For objects with no owners or waiters, a single row is displayed.

In case of acute wait of latch: row cache objects, we can try to find the lock owner or waiter by:

select s.sid, s.program, r.* 
  from v$rowcache_parent r, v$session s
 where cache_name = 'dc_props' 
   and (lock_mode != '0' or lock_request != '0')
   and r.saddr = s.saddr(+);
In essence, dc_props seems the Dictionary Cache (Row Cache) of Sys Table: sys.props$.


2. NLS_CHARACTERSET semantics and Solutions


Oracle stores NLS settings in 3 levels: database/instance/session:
  nls_database_parameters
  nls_instance_parameters
  nls_session_parameters
  
and the precedence is ordered from low to high.
(Note: Any setting explicit used in SQL will always take precedence about all other settings)
In application semantics, session specific NLS settings should be selected because of precedence.

Additionally, it defines:
  v$nls_parameters 
A view that shows the current session parameters and the *DATABASE* characterset as seen in the NLS_DATABASE_PARAMETERS view.

MOS Doc ID 241047.1 Section C) The Database Parameters - NLS_DATABASE_PARAMETERS wrote:

These parameters are "overruled" by NLS_INSTANCE_PARAMETERS and NLS_SESSION_PARAMETERS.
The only exception are the NLS_CHARACTERSET and NLS_NCHAR_CHARACTERSET settings.
The NLS_CHARACTERSET and NLS_NCHAR_CHARACTERSET parameters cannot be overruled by instance or session parameters.
They are defined by the value specified in "create database ..." and are not intended to be changed afterwords dynamically.

Look the definition of nls_session_parameters:

CREATE OR REPLACE FORCE VIEW SYS.NLS_SESSION_PARAMETERS (PARAMETER, VALUE) AS
SELECT SUBSTR (parameter, 1, 30), SUBSTR (VALUE, 1, 64)
  FROM v$nls_parameters
 WHERE     parameter != 'NLS_CHARACTERSET'
       AND parameter != 'NLS_NCHAR_CHARACTERSET';
and definition of gv$nls_parameters (underlying view):

SELECT inst_id,
       parameter,
       VALUE,
       con_id
  FROM x$nls_parameters
 WHERE parameter != 'NLS_SPECIAL_CHARS'
and count each by:

select count(*) from nls_session_parameters;  
   COUNT(*)
  ---------
         17
  
select count(*) from v$nls_parameters;             
   COUNT(*)
  ---------
         19  
The discrepancy are exactly the two excluded NLS_(NCHAR)_CHARACTERSET parameters, nls_session_parameters is a subset of v$nls_parameters.

select parameter, value from v$nls_parameters 
minus
select parameter, value from nls_session_parameters;

  PARAMETER              VALUE
  ---------------------- ---------
  NLS_CHARACTERSET       AL32UTF8
  NLS_NCHAR_CHARACTERSET AL16UTF16
Therefore, in order to avoid latch: row cache objects on dc_props, we can run the equivalent:

  select value from v$nls_parameters where parameter = 'NLS_CHARACTERSET';
instead of query:

  select value from nls_database_parameters where parameter = 'NLS_CHARACTERSET';
The later generates 60 rowcache GETs, and 180 latch: row cache objects on dc_props for each run. Therefore, severe row cache objects latch contention occurs if we frequently query nls_database_parameters.

We can verify this solution by:

select r.cache#, r.type, r.parameter, r.gets, r.count, c.name, c.child#, c.gets latch_gets
from v$rowcache r, v$latch_children c
where r.parameter = 'dc_props' and c.name = 'row cache objects' and c.child# = 18;  

 CACHE#   TYPE PARAMETER    GETS  COUNT              NAME  CHILD#  LATCH_GETS
------- ------ --------- ------- ------ ----------------- ------- -----------
     15 PARENT dc_props   141046     60 row cache objects      18      423198
                                                                    
select value from v$nls_parameters where parameter = 'NLS_CHARACTERSET';
  VALUE   
  --------
  AL32UTF8
   
select r.cache#, r.type, r.parameter, r.gets, r.count, c.name, c.child#, c.gets latch_gets
from v$rowcache r, v$latch_children c
where r.parameter = 'dc_props' and c.name = 'row cache objects' and c.child# = 18;  

 CACHE#   TYPE PARAMETER    GETS  COUNT              NAME  CHILD#  LATCH_GETS
------- ------ --------- ------- ------ ----------------- ------- -----------
     15 PARENT dc_props   141046     60 row cache objects      18      423198
Obviously there are neither rowcache GETs, nor latch: row cache objects, because NLS session parameters (v$nls_parameters) are stored in session memory UGA and are not visible for other sessions, whereas nls_database_parameters is setup in a DB wide globally shared space (SGA) and have to be protected by latch: row cache objects.

In general, latch: row cache objects is documented as protecting data dictionary cache, but we saw that it is also used to safeguard X$PROPS.

Similar to v$nls_parameters, database NLS_CHARACTERSET can also be gotten by:

select sys_context('userenv','language') from dual;
select userenv ('language') from dual;    -- userenv 12c deprecated 
without rowcache GETs and latch: row cache objects.

sys_context('userenv','language') returns the language and territory currently used by the current session, along with the database character set in UNIX-like format:
   language_territory.characterset
Usually SYS_CONTEXT returns the value of parameter associated with the context namespace at the current instant for the current session. Since database NLS_CHARACTERSET is a database wide parameter, sys_context returns the correct value.

Further more, MOS Doc ID 241047.1 explained the difference between database character set and OS env NLS_LANG character set:

The database character set is not the same as the character set of the NLS_LANG that you started this connection with!
The OS environment NLS_LANG (from where you started this connection) is reflected in v$session_connect_info.CLIENT_CHARSET:

  select CLIENT_CHARSET, v.* from v$session_connect_info v;
MOS Doc ID 241047.1 wrote:

D1) sys.props$
    SQL>select name,value$ from sys.props$ where name like '%NLS%';
    This gives the same info as NLS_DATABASE_PARAMETERS. You should use NLS_DATABASE_PARAMETERS instead of props$

We saw that NLS_DATABASE_PARAMETERS is underlined by X$PROPS, whereas SYS.PROPS$ is a real table. The former select generates row cache GETs and requires latch: row cache objects, the later not. Therefore from performance point of view, they are not the same.

In summary:
(A). nls_database_parameters is underlined by x$props, which requires latch: row cache objects on Dictionary Cache dc_props.

(B). v$nls_parameters is underlined by x$nls_parameters.

(C). sys_context('userenv','language') is underlined by sys.props$, which is a real table.
     It returns the language and territory of current session in UNIX (NLS_LANG) like format: 
        language_territory.characterset

(D). nls_session_parameters is a subset of v$nls_parameters.
     v$nls_parameters has two additional rows of *DATABASE* characterset (NLS_CHARACTERSET, NLS_NCHAR_CHARACTERSET) 
     from nls_database_parameters.
   
(E). To find *DATABASE* characterset (NLS_CHARACTERSET, NLS_NCHAR_CHARACTERSET), 
     both nls_database_parameters and v$nls_parameters can be used. 
     The difference is that nls_database_parameters requires "latch: row cache objects", but v$nls_parameters not.


3. 'latch: row cache objects' Contention Test


Launch 2 Jobs (see appended Test Code), monitor "latch: row cache objects" contentions.

SQL > exec nls_select_jobs(2, 1e9);

column p1text format a10                
column p2text format a10 
column p3text format a10                                      
column event format a25 

SQL > select sid, event, p1text, p1, p1raw, p2text, p2, p2raw, p3text, p3, p3raw
        from v$session where program like '%(J0%';

    SID EVENT                     P1TEXT           P1 P1RAW            P2TEXT    P2 P2RAW            P3TEXT   P3 P3RAW
  ----- ------------------------- -------- ---------- ---------------- ------- ---- ---------------- ------- --- -----
      6 latch: row cache objects  address  6376016176 000000017C0A4930 number   411 000000000000019B tries     0 00
    287 latch: row cache objects  address  6376016176 000000017C0A4930 number   411 000000000000019B tries     0 00
  
--After Test stop all jobs by:
--SQL > exec clean_jobs;
Display the top latch misses and their locations, which are kqrpre and kqreqd. The sum of SLEEP_COUNT is close to sum of WTR_SLP_COUNT. Their data distribution could indicate certain Holder-Waiter logic.

column PARENT_NAME format a30                                                     
column WHERE format a20 

SQL > select * from (select * from v$latch_misses order by sleep_count desc)
       where rownum <= 5;

  PARENT_NAME                  WHERE                NWFAIL_COUNT SLEEP_COUNT WTR_SLP_COUNT LONGHOLD_COUNT LOCATION
  ---------------------------- -------------------- ------------ ----------- ------------- -------------- ----------------
  row cache objects            kqrpre: find obj                0       64009         94027              0 kqrpre: find obj
  row cache objects            kqreqd: reget                   0       61416         30578              0 kqreqd: reget
  row cache objects            kqreqd                          0       35943         36768              0 kqreqd
  space background task latch  ktsj_grab_task                  0           4             4              0 ktsj_grab_task
  call allocation              ksuxds                          0           3             3              0 ksuxds

(Note: WHERE column is obsolete, it is always equal to the value in LOCATION)


4. Blocker and Waiter Demo


As a demo, we manually create 'latch: row cache objects' blocking. In one Session (SID 123, SPID 3456), stop it before it release the held 'latch: row cache objects', so it becomes a latch Holder, then run sql. Session is stopped with a callstack.

$ > dtrace -w  -n \
'pid$target::kqreqd:entry  /execname == "oracle"/ {self->rin = 1;}
pid$target::kslgetl:return  /execname == "oracle" && self->rin > 0 / 
{@[pid, ustack(5, 0)] = count(); stop(); exit(0);}
' -p 3456 

  -- Blocker SID = 123 --
  oracle`kslgetl+0x185
  oracle`kkdlpExecSql+0x20c
  oracle`kkdlpftld+0x147
  oracle`qerfxFetch+0x125f
  oracle`opifch2+0x188b
  
$ > pstack 3456
 0000000001ec38f8 kqreqd () + d8
 00000000028ee50c kkdlpExecSql () + 20c
 000000000757a7f7 kkdlpftld () + 147
 000000000204467f qerfxFetch () + 125f
 0000000001ee765b opifch2 () + 188b  


SQL (123) > exec nls_select(1);

// one can also use:
// dtrace -w -n 'pid$target::kqrLockAndPinPo:entry {@[pid, ustack(5, 0)] = count(); stop(); exit(0);}' -p 3456 
In a second session, run the same statement. It will be blocked, and pstack prints out its requesting code path.

SQL (789) > exec nls_select(1);

$ > pstack 5678

  -- Waiter SID = 789 --
  fffffd7ffc9d3e3b semsys   (2, 1000001f, fffffd7fffdf0af8, 1, 124a)
  0000000001ab90f5 sskgpwwait () + 1e5
  0000000001ab8c95 skgpwwait () + c5
  0000000001aaae69 kslges () + 5b9
  0000000001ebd12e kqrpre1 () + 72e
  00000000028ee49b kkdlpExecSql () + 19b
  000000000757a7f7 kkdlpftld () + 147
  000000000204467f qerfxFetch () + 125f
  0000000001ee765b opifch2 () + 188b


select sid, event, p1text, p1, p1raw, p2text, p2, p2raw, p3text, p3, p3raw, blocking_session, final_blocking_session
 from v$session s where sid in (123, 789);
 
SID EVENT                        P1TEXT             P1 P1RAW     P2TEXT   P2 P2RAW  P3TEXT P3 P3RAW  bs fbs
--- ---------------------------- ---------- ---------- --------- ------ ---- ------ ------ -- ----- --- ---
123 SQL*Net message from client  driver id  1413697536 054435000 #bytes    1 000001         0 00                                       
789 latch: row cache objects     address    6376016176 17C0A4930 number  411 00019B tries   0 00    123 123

(Note: shortcut bs for BLOCKING_SESSION, shortcut fbs for FINAL_BLOCKING_SESSION)
 
select * from v$latchholder;
select * from v$session_blockers where sid in (123, 789);
select * from v$session_wait where sid in (123, 789);
select * from v$session_wait_history where sid in (123, 789) order by seq# desc, sid;


5. rowcache Gets vs. latch Gets


For each rowcache GET, it requires 3 "latch: row cache objects" GETs as revealed by Book: Oracle Core: Essential Internals for DBAs and Developers (Page 167).

With following dtrace, we can see the code path and call count. All of them stem from kkdl (dictionary lookup) subroutines.

As demonstrated at the Blog's beginning, 'dc_props' contains 60 rows, 1000 runs will result in 60'000 rowcache Gets, and 180'000 latch Gets. Interestingly the output shows that 60 'dc_props' rowcache rows seems splitted into two Buckets (kkdlpftld+0x147 and kkdlpftld+0x351), each with 30 rows.

SQL > exec nls_select(1000);

sudo dtrace -n \
'pid$target::kqreqd:entry  /execname == "oracle"/ {self->rin = 1;}
pid$target::kslgetl:entry  /execname == "oracle"/ {self->lin = 1;}
pid$target::kqreqd:return  /self->rin > 0/ {@ROWCO_CNT[ustack(6, 0)] = count(); self->rin = 0;}
pid$target::kslgetl:return /self->lin > 0/ {@LATCH_CNT[ustack(6, 0)] = count(); self->lin = 0;}  
' -p 7482 

//---- rowcache Gets ----
    oracle`kqreqd+0x359        // row cache ...
    a.out`kkdlpftld+0x351      // dictionary lookup ...
    a.out`qerfxFetch+0x125f    // fixed table Fetch
    oracle`opifch2+0x188b      // fetch main routine
    oracle`opiefn0+0x1f7       // initialize opi
    oracle`opipls+0x352c
  30000
  
    oracle`kqreqd+0x359
    a.out`kkdlpftld+0x147
    a.out`qerfxFetch+0x125f
    oracle`opifch2+0x188b
    oracle`opiefn0+0x1f7
    oracle`opipls+0x352c
  30000
  

//------- latch Gets -------  
    oracle`kslgetl+0x185         // get latch
    a.out`kkdlpExecSql+0x19b
    a.out`kkdlpftld+0x351
    a.out`qerfxFetch+0x125f
    oracle`opifch2+0x188b
    oracle`opiefn0+0x1f7
  30000
  
    oracle`kslgetl+0x185
    a.out`kkdlpExecSql+0x19b
    a.out`kkdlpftld+0x147
    a.out`qerfxFetch+0x125f
    oracle`opifch2+0x188b
    oracle`opiefn0+0x1f7
  30000
  
    oracle`kslgetl+0x185
    a.out`kkdlpExecSql+0x20c
    a.out`kkdlpftld+0x351
    a.out`qerfxFetch+0x125f
    oracle`opifch2+0x188b
    oracle`opiefn0+0x1f7
  60000
  
    oracle`kslgetl+0x185
    a.out`kkdlpExecSql+0x20c
    a.out`kkdlpftld+0x147
    a.out`qerfxFetch+0x125f
    oracle`opifch2+0x188b
    oracle`opiefn0+0x1f7
  60000
Following documents have a deep investigation of Latches and Mutexes:
  New features of Latches and Mutexes in Oracle 12c.
  Latch, mutex and beyond
  Latch, mutex and beyond(Dtrace)
  Oracle 12 and latches
  Oracle 12 and latches, part 2
The kslgetl (get the exclusive latch) and ksl_get_shared_latch functions take the following arguments:
  1-latch address
  2-immediate get (0 means yes, 1 means no)
  3-where (X$KSLLW.INDX)
  4-why   (X$KSLWSC.INDX)
  5-mode (8=shared,16=exclusive; only for ksl_get_shared_latch function)
kslgetl() performs:
  – sskgslgf (immediate latch get)
  – kslges (kernel service latch get spinning, wait latch get)
  — kskthbwt
  — kslwlmod (setup wait list)
  — sskgslgf (immediate latch get)
  — skgpwwait (sleep latch get)
  — semop

6. row cache objects Queries


When there are contentions on 'latch: row cache objects', the instant and historical associated dc objects can be found by:

select c.addr, c.latch#, c.child#, c.name, c.gets latch_gets, c.misses, c.sleeps, c.spin_gets, c.wait_time, c.gets/(nullif(r.gets, 0)) ratio
      ,r.cache#, r.type, r.parameter, r.count, r.gets rowcache_gets, r.getmisses
      ,s.sid, s.serial#, program, s.user#, s.username, s.sql_id, s.row_wait_obj#
      ,(select owner||'.'||object_name from dba_objects where object_id = s.row_wait_obj#) object_name
      ,s.event, s.p1text, s.p1, s.p1raw, s.p2text, s.p2, s.p2raw, s.p3text, s.p3, s.p3raw
      ,(select  owner || '.' || object_name
              || case when procedure_name is not null then '.' || procedure_name end
         from dba_procedures
        where     object_id = s.plsql_entry_object_id and subprogram_id = s.plsql_entry_subprogram_id)
         plsql_entry
      ,(select   owner || '.'|| object_name
              || case when procedure_name is not null then  '.' || procedure_name end
         from dba_procedures
        where  object_id = s.plsql_object_id and subprogram_id = s.plsql_subprogram_id)
         plsql   
      ,(select dbms_lob.substr (sql_text, 50, 1) from dba_hist_sqltext where sql_id = s.sql_id and rownum = 1)
       sql_txt   
from v$session s, v$latch_children c, x$kqrst x, v$rowcache r
where (s.event = 'latch: row cache objects' or s.p2 = c.latch#)
  and s.p1raw = c.addr 
  and c.child# = x.kqrstcln
  and x.kqrstcid = r.cache#
  and c.name = 'row cache objects' 
--  and r.parameter = 'dc_props' and c.child# = 18
order by sid; 
  
with sq as (select /*+ materialize */ * from v$active_session_history where sample_time > sysdate - 10/1440)  -- last 10 minutes
select /*+ leading(sq) */
      c.addr, c.latch#, c.child#, c.name, c.gets latch_gets, c.misses, c.sleeps, c.spin_gets, c.wait_time, c.gets/(nullif(r.gets, 0)) ratio
      ,r.cache#, r.type, r.parameter, r.count, r.gets rowcache_gets, r.getmisses
      ,s.sample_time, s.session_id, s.session_serial#, program, s.user_id, s.sql_id, s.current_obj#
      ,(select owner||'.'||object_name from dba_objects where object_id = s.current_obj#) object_name
      ,s.event, s.p1text, s.p1, to_char(s.p1, 'XXXXXXXXXXXXXXXXXXXX') p1raw
      ,s.p2text, s.p2, to_char(s.p2, 'XXXXXXXXXXXXXXXXXXXX') p2raw 
      ,s.p3text, s.p3, to_char(s.p3, 'XXXXXXXXXXXXXXXXXXXX') p3raw
      ,(select  owner || '.' || object_name
              || case when procedure_name is not null then '.' || procedure_name end
         from dba_procedures
        where     object_id = s.plsql_entry_object_id and subprogram_id = s.plsql_entry_subprogram_id)
         plsql_entry
      ,(select   owner || '.'|| object_name
              || case when procedure_name is not null then  '.' || procedure_name end
         from dba_procedures
        where  object_id = s.plsql_object_id and subprogram_id = s.plsql_subprogram_id)
         plsql   
      ,(select dbms_lob.substr (sql_text, 50, 1) from dba_hist_sqltext where sql_id = s.sql_id and rownum = 1)
       sql_txt   
from sq s, v$latch_children c, x$kqrst x, v$rowcache r
where (s.event = 'latch: row cache objects' or s.p2 = c.latch#)
  and c.addr like '%'||(trim(to_char(p1, 'XXXXXXXXXXXXXXXXXXXX')))||'%'  
  and c.child# = x.kqrstcln
  and x.kqrstcid = r.cache#
  and c.name = 'row cache objects' 
--  and r.parameter = 'dc_props' and c.child# = 18
order by s.sample_time desc, s.session_id, s.session_serial#; 
Here two respective queries in short of access to x$kqrst. Then map CHILD# to dc CACHE# with a mapping table (created from a test DB of same Oracle Version by the query at the beginning of Blog).

select c.addr, c.latch#, c.child#, c.name, c.gets latch_gets, c.misses, c.sleeps, c.spin_gets, c.wait_time
      ,s.sid, s.serial#, program, s.user#, s.username, s.sql_id, s.row_wait_obj#
      ,(select owner||'.'||object_name from dba_objects where object_id = s.row_wait_obj#) object_name
      ,s.event, s.p1text, s.p1, s.p1raw, s.p2text, s.p2, s.p2raw, s.p3text, s.p3, s.p3raw
      ,(select  owner || '.' || object_name
              || case when procedure_name is not null then '.' || procedure_name end
         from dba_procedures
        where     object_id = s.plsql_entry_object_id and subprogram_id = s.plsql_entry_subprogram_id)
         plsql_entry
      ,(select   owner || '.'|| object_name
              || case when procedure_name is not null then  '.' || procedure_name end
         from dba_procedures
        where  object_id = s.plsql_object_id and subprogram_id = s.plsql_subprogram_id)
         plsql   
      ,(select dbms_lob.substr (sql_text, 50, 1) from dba_hist_sqltext where sql_id = s.sql_id and rownum = 1)
       sql_txt   
from v$session s, v$latch_children c
where (s.event = 'latch: row cache objects' or s.p2 = c.latch#)
  and s.p1raw = c.addr 
  and c.name = 'row cache objects'
--  and c.child# = 18
order by sid; 
  
with sq as (select /*+ materialize */ * from v$active_session_history where sample_time > sysdate - 10/1440)  -- last 10 minutes
select /*+ leading(sq) */
      c.addr, c.latch#, c.child#, c.name, c.gets latch_gets, c.misses, c.sleeps, c.spin_gets, c.wait_time
      ,s.sample_time, s.session_id, s.session_serial#, program, s.user_id, s.sql_id, s.current_obj#
      ,(select owner||'.'||object_name from dba_objects where object_id = s.current_obj#) object_name
      ,s.event, s.p1text, s.p1, to_char(s.p1, 'XXXXXXXXXXXXXXXXXXXX') p1raw
      ,s.p2text, s.p2, to_char(s.p2, 'XXXXXXXXXXXXXXXXXXXX') p2raw 
      ,s.p3text, s.p3, to_char(s.p3, 'XXXXXXXXXXXXXXXXXXXX') p3raw
      ,(select  owner || '.' || object_name
              || case when procedure_name is not null then '.' || procedure_name end
         from dba_procedures
        where     object_id = s.plsql_entry_object_id and subprogram_id = s.plsql_entry_subprogram_id)
         plsql_entry
      ,(select   owner || '.'|| object_name
              || case when procedure_name is not null then  '.' || procedure_name end
         from dba_procedures
        where  object_id = s.plsql_object_id and subprogram_id = s.plsql_subprogram_id)
         plsql   
      ,(select dbms_lob.substr (sql_text, 50, 1) from dba_hist_sqltext where sql_id = s.sql_id and rownum = 1)
       sql_txt   
from sq s, v$latch_children c
where (s.event = 'latch: row cache objects' or s.p2 = c.latch#)
  and c.addr like '%'||(trim(to_char(p1, 'XXXXXXXXXXXXXXXXXXXX')))||'%'  
  and c.name = 'row cache objects' 
--  and c.child# = 18
order by s.sample_time desc, s.session_id, s.session_serial#; 


LATCH_CHILD  CACHE# PARAMETER (Oracle Version 12.1.0.2.0)
----------- ------- -------------------------------------
          1       3 dc_rollback_segments
          2       1 dc_free_extents
          3       4 dc_used_extents
          4       2 dc_segments
          5       0 dc_tablespaces
          6       5 dc_tablespace_quotas
          7       6 dc_files
          8       7 dc_users
          8      10 dc_users
          9       8 dc_object_grants
          9       8 dc_objects
         10      17 dc_global_oids
         11      12 dc_constraints
         12      13 dc_sequences
         13      16 dc_histogram_data
         13      16 dc_histogram_defs
         14      54 dc_sql_prs_errors
         15      32 kqlsubheap_object
         16      19 dc_partition_scns
         16      19 dc_table_scns
         17      18 dc_outlines
         18      15 dc_props
         19      14 dc_profiles
       ....      ....


7. Test Code



create or replace procedure nls_select(p_cnt number) as
            l_val VARCHAR2 (256 Byte);
begin
for i in 1..p_cnt loop
  select value into l_val from nls_database_parameters where parameter = 'NLS_CHARACTERSET';
end loop;
end;
/

create or replace procedure nls_select_jobs(p_job_cnt number, p_cnt number) as
  l_job_id pls_integer;
begin
  for i in 1.. p_job_cnt loop
    dbms_job.submit(l_job_id, 'begin while true loop nls_select('||p_cnt||'); end loop; end;');
  end loop;
  commit;
end;    
/

create or replace procedure clean_jobs as
begin
  for c in (select job from dba_jobs) loop
    begin
       dbms_job.remove (c.job);
    exception when others then null;
    end;
    commit;
  end loop;

  for c in (select d.job, d.sid, (select serial# from v$session where sid = d.sid) ser 
              from dba_jobs_running d) loop
    begin
      execute immediate
             'alter system kill session '''|| c.sid|| ',' || c.ser|| ''' immediate';
      dbms_job.remove (c.job);
    exception when others then null;
    end;
    commit;
  end loop;
  
  -- select * from dba_jobs;
  -- select * from dba_jobs_running;
end;
/

-- exec nls_select_jobs(2, 1e9);
-- exec clean_jobs; -- stop all jobs