On Oracle: March 2014

Thursday, March 27, 2014

dbms_aq.dequeue - latch: row cache objects on AIX

dbms_aq.dequeue uses an ADT object_type to pick payload from queue, and this can cause a heavy
   latch: row cache objects (latch#: 280 and child#: 8)

which is used to protect Row Cache "dc_users" (cache# 10 and 7 in v$rowcache) in a CPU overloaded AIX System with Oracle 11.2.0.3.0.

At first, let's look at the special behaviour of dbms_aq.dequeue (see appended TestCase):
exec qstat;
exec deq(1, 5);
exec qstat;

The output looks like:
SQL> exec qstat;
  LOCKED_TOTAL=9,   PINNED_TOTAL=9
  SQL_TEXT=select , EXECUTIONS=19, ROWS_PROCESSED=3
  SQL_TEXT=insert , EXECUTIONS=3, ROWS_PROCESSED=3
SQL> exec deq(1, 5);
SQL> exec qstat;
  LOCKED_TOTAL=10,  PINNED_TOTAL=10
  SQL_TEXT=select , EXECUTIONS=22, ROWS_PROCESSED=3
  SQL_TEXT=insert , EXECUTIONS=3, ROWS_PROCESSED=3

It shows that the dequeue SELECT executed 3 times, and the LOCKED_TOTAL on the payload object: qtab_obj increased 1. By a 10046 event trace, we can see that two selects were run immediately after dequeue, and the third one was after 5 seconds, and finally throws: "ORA-25228: timeout".

By enqueuing one message, and then dequeue it:
exec qstat;
exec enq(1);
exec qstat;
exec deq(1, 5);
exec qstat;

The output is:
SQL> exec qstat;
  LOCKED_TOTAL=12,  PINNED_TOTAL=12
  SQL_TEXT=select , EXECUTIONS=23, ROWS_PROCESSED=4
  SQL_TEXT=insert , EXECUTIONS=4, ROWS_PROCESSED=4
SQL> exec enq(1);
SQL> exec qstat;
  LOCKED_TOTAL=13,  PINNED_TOTAL=13
  SQL_TEXT=select , EXECUTIONS=23, ROWS_PROCESSED=4
  SQL_TEXT=insert , EXECUTIONS=5, ROWS_PROCESSED=5
SQL> exec deq(1, 5);
SQL> exec qstat;
  LOCKED_TOTAL=14,  PINNED_TOTAL=14
  SQL_TEXT=select , EXECUTIONS=24, ROWS_PROCESSED=5
  SQL_TEXT=insert , EXECUTIONS=5, ROWS_PROCESSED=5

We can see 1 enqueue INSERT, 1 dequeue SELECT, and LOCKED_TOTAL increased 1 for enqueue, and 1 for dequeue.

Based on this observation, probably we can reconstruct one dbms_aq.dequeue pseudo code:
declare
  l_rec qtab%rowtype;
   cursor c is select /*+ INDEX(TAB AQ$_QTAB_I) */ ... for update skip locked;
begin
  open c;

  -- 1st fetch
  fetch c into l_rec;
  if c%found then return; end if;

  -- 2nd fetch
   fetch c into l_rec;
  if c%found then return; end if;

  -- thread_wait with timeout 5 seconds.
-- Within 5 seconds, if there is message available, waked up by a thread_post.
  thread_wait(5);

  fetch c into l_rec;
  if c%found then return; end if;

  finally close c;
          throw ORA-25228;
end;

By the way, "skip locked" option seems originally designed for AQ, and this select can not guarantee the consistency due to "skip locked" rows which are not deterministic to the result set.

Now we run a CPU intensive test on AIX Power7 System with Entitled Capacity 4, SMT=4 and Oracle 11.2.0.3.0:

exec loop_job_cpu_burning(48);
exec loop_job_enq(48, 0.01);
exec loop_job_deq(48, 1);

Then repeatedly run:
select event, v.* from v$active_session_history v where p2 = 280 order by sample_time desc;
select event, v.* from v$session v where p2 = 280;
select * from v$latch_children where latch# = 280 and child# in (8);
select * from v$latchholder;

we will observe:
latch: row cache objects

To make this event more noticeable, launch some dequeue jobs with NO_WAIT:
exec loop_job_deq(16, 0);

then there will be even more latch misses and sleeps (also some other events: "latch: cache buffers chains", "buffer busy waits", "redo copy").

This also indicates that if dequeue is faster than enqueue, a higher dequeue wait will reduce CPU load because each empty dequeue also triggers three dequeue select, and hence locks on payload object_type.

By running,
select locked_total, pinned_total, locks, pins, v.*
from v$db_object_cache v
order by v.locked_total desc;

we can notice huge LOCKED_TOTAL on payload object_type: "QTAB_OBJ". This is probably the root cause of "latch: row cache objects" caused by dbms_aq.dequeue.

In fact, it was first discovered by a 10046 Event Trace on a dequeue session and a system library_cache dump with:
alter session set events 'immediate trace name library_cache level 10';

Once the test is finished, remove all the jobs by:
exec clean_job;

UNIX Process Monitoring

By AIX truss or trace on a dequeue session process, we can observe continuous
thread_wait, thread_post
to sysnchonize IPC mechanism, but hardly see real work (for example: kread).

Running ps on a dequeue session process a few times (for example, PID 31785184):
ps -flp 31785184
it shows that both Column C (CPU utilization) and PRI (priority) are changed each time. And IBM AIX document states:

for the sched_other policy (AIX default), CPU utilization is used in determining process scheduling priority. Large values indicate a CPU intensive process and result in lower process priority, whereas small values indicate an I/O intensive process and result in a more favorable priority.

One reasonable explanation from UNIX level is that the process holds the single:
latch: row cache objects (latch#: 280 and child#: 8)
consumes a certain CPU (spin_gets), and was evaluated to a lower priority, yield CPU to others, and is not able to release this single latch. Whereas the other processes which are desperately requesting this latch are starving the CPU, but can't get this latch. And finally it results in a single latch contention, and all processes are serialized by this single Oracle resource.

We often see this latch taking on average several ms (3-40 ms), and occasionally more than one hour (that contradicts the common believe that latch is a short activity).

As we know that AIX Schedo Dispatcher uses a timeslice to choose the next processes.The default value for timeslice is a single clock tick, that is, 10 milliseconds.

One possible guess for the huge LOCKED_TOTAL on payload object_type: "QTAB_OBJ" is that AIX run-time linking problem in which each dequeue action requires a lock on "QTAB_OBJ" object file (or shared library (SO)).

Bug 14382262 latch free wait event in"kokc descriptor allocation latch"

Oracle seems aware of this huge LOCKED_TOTAL and published a patch to fix it. As tested, the patch stopped the increasing of LOCKED_TOTAL, but "latch: row cache objects" gets even worse. So all the tests in this Blog is without this patch.

latchstat tool

Imitated from vmstat, a small script: latchstat is created to to monitor latch activities, for example:
exec latchstat(4, 1, 280, 8);

(code is appended at the end of Blog).

---------------------- setup ----------------------
create or replace type qtab_obj is object (id integer, pr_id integer);
/
begin sys.dbms_aqadm.drop_queue_table(queue_table => 'QTAB', force=> TRUE); end;
/
begin
sys.dbms_aqadm.create_queue_table
(
    queue_table           => 'QTAB'
   ,queue_payload_type    => 'QTAB_OBJ'
   ,compatible            => '8.1'
   ,sort_list             => 'PRIORITY,ENQ_TIME'
   ,multiple_consumers    => false
   ,message_grouping      => 0
   ,comment               => 'Test Queue Table'
   ,secure                => false
   );
end;
/
begin
sys.dbms_aqadm.stop_queue (queue_name => 'QTAB_QUEUE');
sys.dbms_aqadm.drop_queue (queue_name => 'QTAB_QUEUE');
end;
/
begin
sys.dbms_aqadm.create_queue
(
    queue_name     => 'QTAB_QUEUE'
   ,queue_table    => 'QTAB'
   ,queue_type     => sys.dbms_aqadm.normal_queue
   ,max_retries    => 100
   ,retry_delay    => 2
   ,retention_time => 604800
   ,comment        => 'Test Queue'
   );
end;
/
begin sys.dbms_aqadm.start_queue(queue_name => 'QTAB_QUEUE', enqueue => true, dequeue => true); end;
/

---------------------- queuing ----------------------
create or replace procedure cpu_work(p_cnt number) as
l_x number;
begin
for i in 1..p_cnt loop
    l_x := l_x + sqrt(i);
end loop;
end;
/
create or replace procedure enq(p_cnt number, p_sleep_second number := 0.01) as
l_enqueue_options     dbms_aq.enqueue_options_t;
l_message_properties dbms_aq.message_properties_t;
l_message_id          raw(16);
l_message             qtab_obj;
begin
cpu_work(1000);
for i in 1..p_cnt loop
    l_message := qtab_obj(i, 2*i);
    if p_sleep_second > 0 then dbms_lock.sleep(p_sleep_second); end if;
    dbms_aq.enqueue(queue_name            => 'QTAB_QUEUE',
                    enqueue_options       => l_enqueue_options,
                    message_properties    => l_message_properties,
                    payload               => l_message,
                    msgid                 => l_message_id);
    commit;
end loop;
cpu_work(1000);
end;
/

create or replace procedure deq(p_cnt number, p_wait_second binary_integer := 1) as
l_dequeue_options     dbms_aq.dequeue_options_t;
l_message_properties dbms_aq.message_properties_t;
l_message_id          raw(16);
l_message             qtab_obj := qtab_obj(null, null);
begin
l_dequeue_options.wait       := p_wait_second;
cpu_work(1000);
for i in 1..p_cnt loop
   begin
     dbms_aq.dequeue(queue_name            => 'QTAB_QUEUE',
                     dequeue_options       => l_dequeue_options,
                     message_properties    => l_message_properties,
                     payload               => l_message,
                     msgid                 => l_message_id);
     commit;
  exception when others then null;
end;
end loop;
cpu_work(1000);
end;
/
create or replace procedure loop_job_enq(p_job_cnt number, p_sleep_second number := 0.01)
as
   l_job_id pls_integer;
begin
    for i in 1.. p_job_cnt loop
      dbms_job.submit(l_job_id, 'begin while true loop enq(100, '|| p_sleep_second ||'); end loop; end;');
    end loop;
    commit;
end;
/
create or replace procedure loop_job_deq(p_job_cnt number, p_wait_second binary_integer := 1)
as
   l_job_id pls_integer;
begin
    for i in 1.. p_job_cnt loop
      dbms_job.submit(l_job_id, 'begin while true loop deq(100, '|| p_wait_second ||'); end loop; end;');
    end loop;
    commit;
end;
/
create or replace procedure loop_job_cpu_burning (p_job_cnt number)
as
   l_job_id pls_integer;
begin
    for i in 1.. p_job_cnt loop
      dbms_job.submit(l_job_id, 'begin while true loop null; end loop; end;');
    end loop;
    commit;
end;
/
create or replace procedure qstat as
l_txt varchar2(100);
begin
select 'QTAB_TOTAL= '||count(*)||', '||'READY='||count(decode(state, 0, 1))||', At: '||systimestamp
into l_txt from QTAB;
dbms_output.put_line(l_txt);

select 'LOCKED_TOTAL='||locked_total||',   '||'PINNED_TOTAL='||pinned_total into l_txt
from v$db_object_cache v where name in ('QTAB_OBJ');
dbms_output.put_line(l_txt);

-- 313buw98w5y1a insert into "K"."QTAB"
-- 3yxj9h80vc0jw select /*+ INDEX(TAB AQ$_QTAB_I) */ ... for update skip locked

for c in (select substr(sql_text, 1, 7) sql_text, executions, rows_processed
            from v$sql v where sql_id in ('313buw98w5y1a','3yxj9h80vc0jw') and executions > 0)
loop
    dbms_output.put_line('SQL_TEXT='||c.sql_text||', '||'EXECUTIONS='||c.executions
                                    ||', '||'ROWS_PROCESSED='||c.rows_processed);
end loop;
end;
/
create or replace procedure clean_job as
begin
for c in (select d.job, d.sid, (select serial# from v$session where sid = d.sid) ser
            from dba_jobs_running d) loop
    begin
      execute immediate 'alter system kill session '''|| c.sid || ',' || c.ser ||''' immediate';
      dbms_job.remove(c.job);
    exception when others then null;
    end;
    commit;
end loop;

for c in (select job from dba_jobs) loop
    begin
      dbms_job.remove(c.job);
    exception when others then null;
    end;
    commit;
end loop;
end;
/

---------------------- latchstat----------------------
set echo on
drop type t_latch_rec_obj force;
create or replace type t_latch_rec_obj as object(
      snap                    number
     ,tim                     timestamp
     ,name                    varchar2(30)
     ,v1                      number
     ,v2                      number
     ,v3                      number
     ,v4                      number
     ,v5                      number
     ,v6                      number
     ,v7                      number
     ,v8                      number
     ,v9                      number
     ,v10                     number
    );
/

create or replace type t_lacth_rec_tab as table of t_latch_rec_obj;
/
create or replace procedure latchstat(cnt number, interval number, latch_num number, child_num number := null) as
l_col_len_s       pls_integer := 8;
l_col_len_l       pls_integer := 16;
l_rec_tab         t_lacth_rec_tab := t_lacth_rec_tab();
l_rec_tab_child   t_lacth_rec_tab := t_lacth_rec_tab();
l_latch_name      varchar2(20);
l_children_cnt    pls_integer;
l_top_5_children varchar2(100);
begin

select max(name), count(*)
into l_latch_name, l_children_cnt
from v$latch_children v where latch# in (latch_num);

select listagg(child# ||'('|| sleeps ||')', ' / ') within group (order by wait_time desc) child_sleeps
into   l_top_5_children
from (select * from v$latch_children v where latch# in (latch_num) order by wait_time desc)
where rownum <= 5;
dbms_output.put_line(' Latch: '                 || l_latch_name
                    || ', Children#: '             || l_children_cnt
                    || ', Top 5 Children(Sleeps): '|| l_top_5_children
                    || ', At: '                    || systimestamp);

dbms_output.put_line(null);

dbms_output.put_line(lpad('P_gets', l_col_len_s)
                 ||lpad('misses', l_col_len_s)
                 ||lpad('sleeps', l_col_len_s)
                 ||lpad('spin_gets', l_col_len_l)
                 ||lpad('wait_time(ms)', l_col_len_l)
                 ||lpad('avg_wait_time', l_col_len_l)
                 ||lpad(' | ', l_col_len_s)
                 ||lpad('C_gets', l_col_len_s)
                 ||lpad('misses', l_col_len_s)
                 ||lpad('sleeps', l_col_len_s)
                 ||lpad('spin_gets', l_col_len_l)
                 ||lpad('wait_time(ms)', l_col_len_l)
                 ||lpad('avg_wait_time', l_col_len_l)
                 );

for i in 1..cnt loop
    select t_latch_rec_obj(i, systimestamp, l.name, l.gets, l.misses, l.sleeps, l.spin_gets, l.wait_time, k.gets, k.misses, k.sleeps, k.spin_gets, k.wait_time)
    bulk collect into l_rec_tab
    from v$latch l, v$latch_children k
    where l.latch# = latch_num
    and l.latch# = k.latch#(+)
      and child_num = k.child#(+);

    dbms_lock.sleep(interval);

    -- only select one Row --
    for c in
      (select l.name, v1.snap, v1.tim snap_start, systimestamp snap_end
             ,(l.gets - v1.v1) l_get_d, (l.misses - v1.v2) l_misses_d, (l.sleeps - v1.v3) l_sleeps_d
             ,(l.spin_gets - v1.v4) l_spin_gets_d, round((l.wait_time - v1.v5)/1000) l_wait_time_d
             ,(case when (l.gets - v1.v1) > 0 then round((l.wait_time - v1.v5)/(l.gets - v1.v1), 2)
                    else null
               end) l_avg_wait_time_d
             ,(k.gets - v1.v6) k_get_d, (k.misses - v1.v7) k_misses_d, (k.sleeps - v1.v8) k_sleeps_d
        ,(k.spin_gets - v1.v9) k_spin_gets_d, round((k.wait_time - v1.v10)/1000) k_wait_time_d
        ,(case when (k.gets - v1.v6) > 0 then round((k.wait_time - v1.v10)/(k.gets - v1.v6), 2)
               else null
          end) k_avg_wait_time_d
       from   table(l_rec_tab) v1, v$latch l, v$latch_children k
       where l.latch# = latch_num
       and l.latch# = k.latch#(+)
         and child_num = k.child#(+)
      )
    loop
      -- Parent --
      dbms_output.put( lpad(c.l_get_d, l_col_len_s)
                      ||lpad(c.l_misses_d, l_col_len_s)
                      ||lpad(c.l_sleeps_d, l_col_len_s)
                      ||lpad(c.l_spin_gets_d, l_col_len_l)
                      ||lpad(c.l_wait_time_d, l_col_len_l)
                      ||lpad(c.l_avg_wait_time_d, l_col_len_l));

      -- Child --
   dbms_output.put( lpad(' | ', l_col_len_s)
                   ||lpad(c.k_get_d, l_col_len_s)
                   ||lpad(c.k_misses_d, l_col_len_s)
                   ||lpad(c.k_sleeps_d, l_col_len_s)
                   ||lpad(c.k_spin_gets_d, l_col_len_l)
                   ||lpad(c.k_wait_time_d, l_col_len_l)
                   ||lpad(c.k_avg_wait_time_d, l_col_len_l));

    dbms_output.put_line(null);
    end loop;

end loop;
end;
/

/*
----- Usage latchstat(cnt, interval, latch_num, child_num) ------
----- Example (latch_num=280: "row cache objects", child_num=8: "dc_users" ) ------

  exec latchstat(4, 1, 280);
  exec latchstat(4, 1, 280, 8);
*/
set serveroutput on
set echo off

Tuesday, March 11, 2014

Hot Block Identification and latch: cache buffers chains

When Hot Block Event:

    latch: cache buffers chains
    buffer busy waits

occurs intensively, system performance is sluggish and CPU starved.
Locating Hot Block originals could be the first step before tuning system and SQL statements.
This Blog will first show 2 queries to list the Hot Blocks, and 1 query to connect Hot Blocks and their fetching Sessions. Then try to demonstrate some Workaround and Fixes.

Note. All tests are done in Oracle 12.1.0.2.0 and 11.2.0.3.0

1. Hot Block x$bh Query


with sq as (select object_name, data_object_id from dba_objects)
select hladdr cbc_latch_addr
      ,sum(tch) tch_sum
      ,listagg(tch || '-' || obj || '(' || object_name || ')/' || file# || '/' ||dbablk, ';') 
         within group (order by tch desc)  "tch-obj(name)/file/blk_list"
      ,count(*) protected_db_blocks_cnt
from  x$bh b, sq
where b.obj = sq.data_object_id
  and tch > 0
group by hladdr
order by tch_sum desc;

2. Hot Block Tracking Query


alter system set "_db_hot_block_tracking"=true;

with sq as (select object_name, data_object_id from dba_objects)    
select object_name ,h.*, b.*
  from x$kslhot h, v$bh b, sq
where b.block# = h.kslhot_id
  and b.objd   = sq.data_object_id
order by h.kslhot_ref desc, b.file#, b.block#;

Once detecting the Hot Block, listing the rows in the block is straightforward. If the Hot Block belongs to a heap table, dbms_rowid.rowid_block_number will do that.

In case of Index block and IOT table block, an internal Oracle function can be used to map block to the entries. Voilà an example:


create table test_iot
(
  id number(9), name varchar2(10),  
  constraint test_iot_key
  primary key (id, name)
  enable validate
)
organization index;

insert into test_iot select level, 'test_'||level from dual connect by level < 100000;

commit;

select object_id from dba_objects where object_name = 'TEST_IOT_KEY';

--returns 

  278020

Now we can build the mapping between block# and rows:


select dbms_rowid.rowid_block_number(sys_op_lbid (278020, 'L', t.rowid))  block#
      ,t.*
  from test_iot t
 where id is not null or name is not null;

By the way, following short query can be used to check index quality:


select block#
      ,count(*) rows_per_block
from (
  select dbms_rowid.rowid_block_number(sys_op_lbid (278020, 'L', t.rowid))  block#
        ,t.*
    from test_iot t
   where id is not null or name is not null
)
group by block#
order by block#;

as discussed in Blog: "Index Efficiency 3"

3. Hot Blocks and Sessions

We can connect Data Blocks with fetching Sessions via CBC Latch Address during the last 10 wait events as follows:


with lch_sid as 
 (select /*+ materialize */ event, p1, sid, sum(wait_time_micro) wait_time_micro, count(*) wait_cnt
   from v$session_wait_history 
  where (p2=228 or event = 'latch: cache buffers chains') group by event, p1, sid)
,lch as 
 (select /*+ materialize */ event, p1
       ,listagg(sid || '(' || wait_time_micro || '/' || wait_cnt || ')', ';')
          within group (order by wait_time_micro desc)  "sid(wait_time_micro/wait_cnt)"
    from lch_sid group by event, p1) 
select hladdr cbc_latch_addr, w.p1 latch_addr_number, sum(tch) tch_sum, count(*) protected_db_blocks_cnt
       ,w."sid(wait_time_micro/wait_cnt)"
       ,listagg(tch || '-' || b.obj || '(' || ob.object_name || ')/' || file# || '/' ||dbablk, ';') 
          within group (order by tch desc)  "tch-obj(name)/file/blk_list"
  from  x$bh b, dba_objects ob, lch w
 where b.obj = ob.data_object_id(+)
   and tch > 0
   and to_number(hladdr, 'XXXXXXXXXXXXXXXX') = w.p1
 group by hladdr, w.p1
 order by tch_sum desc;

4. Workaround and Fixes

Hot Block is a phenomenon of frequently access / update of same DB Block concurrently by multiple sessions, exposed as "latch: cache buffers chains" (CBC Latch) contention.

DB Data Block is either Table Block or Index Block.

4.1 Table Block

We can use "MINIMIZE RECORDS_PER_BLOCK" to control the number of rows in each Block, one optimal value should be no more than "_db_block_max_cr_dba" (Maximum Allowed Number of CR buffers per dba), which is 6 in default (PCTFREE is not able to control the exact number of rows). For example,


drop table test_tab;
              
create table test_tab 
  INITRANS   26           -- prevent Segments ITL Waits and Deadlocks
as select level id, rpad('ABC', 10, 'X') val from dual connect by level <= 5;

alter table test_tab minimize records_per_block;

truncate table test_tab;

insert into test_tab select level id, rpad('ABC', 10, 'X') val from dual connect by level <= 100;

commit;

Then verify the number of entries per Block:


select block, count (*) cnt 
  from (select rowid rid, 
               dbms_rowid.rowid_object (rowid) object, 
               dbms_rowid.rowid_relative_fno (rowid) fil, 
               dbms_rowid.rowid_block_number (rowid) block, 
               dbms_rowid.rowid_row_number (rowid) ro, 
               t.* 
          from test_tab t 
         order by fil, block, ro) 
 group by block 
 order by block, cnt desc;
 

     BLOCK   CNT
  -------- -----
   1035776     5
   1035777     5
   1035778     5
   ... 
  20 rows selected.

select dbms_rowid.rowid_to_absolute_fno(t.rowid, 'K', 'TEST_TAB') af,
       dbms_rowid.rowid_relative_fno (rowid) rf,
       dbms_rowid.rowid_block_number (rowid) block,
       dbms_rowid.rowid_row_number(rowid)    row_no,
      (sys_op_rpb(rowid)) rpb,
       t.* 
  from test_tab t
 order by af, rf, block, rpb;

      AF  RF     BLOCK  ROW_NO  RPB   ID VAL
  ----- --- --------- ------- ---- ---- ----------
   1548   0   1035776       0    0   31 ABCXXXXXXX
   1548   0   1035776       1    1   32 ABCXXXXXXX
   1548   0   1035776       2    2   33 ABCXXXXXXX
   1548   0   1035776       3    3   34 ABCXXXXXXX
   1548   0   1035776       4    4   35 ABCXXXXXXX
   1548   0   1035777       0    0   36 ABCXXXXXXX
   ...
  100 rows selected.

Blog: Spreading Out Data With MINIMIZE RECORDS_PER_BLOCK reveals the hidden mechanic ("Hakan factor") behind the scene:


select spare1
      ,least(1, bitand(spare1, (power(2, 15)))) 
         "16th bit Hakan factor"          -- "minimize records_per_block" flag
from sys.tab$ where obj# = (select obj# from sys.obj$ where name = 'TEST_TAB');

   SPARE1 16th bit Hakan factor
  ------- ---------------------
    32772                     1

4.2 Index Block

We need to use PCTFREE, and INITRANS to approximate "RECORDS_PER_BLOCK".


drop index test_tab#u1;

create unique index test_tab#u1 on k.test_tab (id) 
  pctfree    90
  initrans   26        -- prevent Segments ITL Waits and Deadlocks
;

exec dbms_stats.gather_table_stats(null, 'TEST_TAB', cascade => TRUE);

Then verify the number of entries per Block:


select object_id from dba_objects where object_name = 'TEST_TAB#U1';

-- return 2267423

select block#
      ,count(*) rows_per_block
from (
  select dbms_rowid.rowid_block_number(sys_op_lbid (2267423, 'L', t.rowid))  block#
        ,t.*
    from TEST_TAB t
   where id is not null
)
group by block#
order by block#; 

    BLOCK# ROWS_PER_BLOCK
  -------- --------------
   1035796              5
   1035797              5
   1035798              5
   ...
  20 rows selected.

Above method is not able to regulate exact number of index entries per block. Moreover, we need to check if both Leaf block and Branch block are subject to the same "PCTFREE, and INITRANS" conditions.

Dumping Leaf block and Branch block:


Leaf block dump
===============
header address 18446741324863441060=0xfffffd7ffc0108a4
kdxcolev 0              <<<--- index level (0 represents Leaf blocks)                                     
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y
kdxconco 1
kdxcosdc 0
kdxconro 5              <<<--- number of index entries
kdxcofbo 46=0x2e
kdxcofeo 7401=0x1ce9
kdxcoavs 7355
kdxlespl 0
kdxlende 0
kdxlenxt 1035799=0xfce17
kdxleprv 1035797=0xfce15
kdxledsz 6
kdxlebksz 7456

kdxconro: number of index entries

Branch block dump
=================
header address 18446741324863505996=0xfffffd7ffc02064c
kdxcolev 1              <<<--- index level (1 represents Branch blocks)         
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y
kdxconco 1
kdxcosdc 0
kdxconro 19              <<<--- number of index entries
kdxcofbo 66=0x42
kdxcofeo 7923=0x1ef3
kdxcoavs 7857
kdxbrlmc 1035796=0xfce14
kdxbrsno 0
kdxbrbksz 8056 
kdxbr2urrc 15

which shows the number of index entries:

  Leaf block    "kdxconro 5"    
  Branch block  "kdxconro 19"

So it seems that Branch block is not obliged to the same "PCTFREE, and INITRANS" as Leaf block.

Another observation is that the 5 Index Rows per Block is not persistent after Table is modified. That means some blocks can contain more or less than 5 Rows.

5. Index Hot Block Application Fix

Index Block can experience more acute CBC Latch contention because its size is usually smaller, and hence one block can contain more entries.

Since we have no way to apply "minimize records_per_block" to Index Blocks, in the application, we can try to minimize INDEX usage to cut down excess index access.

For example, by caching ROWID of previously fetched ROWs, at subsequent time, we can make the direct table fetch, and therefore bypass INDEX access:


declare
  l_id           number;
  l_val          varchar2(10);
  type tab_rowid is table of rowid index by pls_integer;
 rowid_cache    tab_rowid;
begin
  for k in 1..100 loop
    for j in 1..10 loop
      if rowid_cache.exists(k) then
        begin
          select /*+ not use index access */ id, val, rowid into l_id, l_val, rowid_cache(k) 
            from test_tab 
           where rowid = rowid_cache(k)
             and id    = k;
        exception when others then dbms_output.put_line('No_Data_Found: id='||k||', rowid='||rowid_cache(k));
        end;
      end if;
      
      if l_id is null or l_id != k then
        select /*+ use index access */ id, val, rowid into l_id, l_val, rowid_cache(k)  
          from test_tab 
         where id    = k;      
        dbms_output.put_line('Fill rowid_cache: id='||k||', rowid='||rowid_cache(k));
      end if; 
    end loop;
  end loop;
end;
/

Then we have two XPLANs: one single Run with Index reading 3 Blocks; other repeated Runs without Index reading one single Block.


  /*+ use index access */:     3 Block Reads (1 Branch block + 1 Leaf block + 1 Table Block)
  /*+ not use index access */: 1 Block Read  (1 Table Block)
  

SELECT /*+ use index access */ ID, VAL, ROWID FROM TEST_TAB WHERE ID = :B1

  Rows     Row Source Operation
  -------  ---------------------------------------------------
        1  TABLE ACCESS BY INDEX ROWID TEST_TAB (cr=3 pr=3 pw=0 time=145 us cost=1 size=14 card=1)
        1   INDEX UNIQUE SCAN TEST_TAB#U1 (cr=2 pr=2 pw=0 time=102 us cost=1 size=0 card=1)(object id 2267423)
      

SELECT /*+ not use index access */ ID, VAL, ROWID FROM TEST_TAB WHERE ROWID = :B1 AND ID = :B2
 
  Rows     Row Source Operation
  -------  ---------------------------------------------------
        1  TABLE ACCESS BY USER ROWID TEST_TAB (cr=1 pr=0 pw=0 time=10 us cost=1 size=14 card=1)