latch: row cache objects (latch#: 280 and child#: 8)
which is used to protect Row Cache "dc_users" (cache# 10 and 7 in v$rowcache) in a CPU overloaded AIX System with Oracle 11.2.0.3.0.
At first, let's look at the special behaviour of dbms_aq.dequeue (see appended TestCase):
exec qstat;
exec deq(1, 5);
exec qstat;
The output looks like:
SQL> exec qstat;
LOCKED_TOTAL=9, PINNED_TOTAL=9
SQL_TEXT=select , EXECUTIONS=19, ROWS_PROCESSED=3
SQL_TEXT=insert , EXECUTIONS=3, ROWS_PROCESSED=3
SQL> exec deq(1, 5);
SQL> exec qstat;
LOCKED_TOTAL=10, PINNED_TOTAL=10
SQL_TEXT=select , EXECUTIONS=22, ROWS_PROCESSED=3
SQL_TEXT=insert , EXECUTIONS=3, ROWS_PROCESSED=3
It shows that the dequeue SELECT executed 3 times, and the LOCKED_TOTAL on the payload object: qtab_obj increased 1. By a 10046 event trace, we can see that two selects were run immediately after dequeue, and the third one was after 5 seconds, and finally throws: "ORA-25228: timeout".
By enqueuing one message, and then dequeue it:
exec qstat;
exec enq(1);
exec qstat;
exec deq(1, 5);
exec qstat;
The output is:
SQL> exec qstat;
LOCKED_TOTAL=12, PINNED_TOTAL=12
SQL_TEXT=select , EXECUTIONS=23, ROWS_PROCESSED=4
SQL_TEXT=insert , EXECUTIONS=4, ROWS_PROCESSED=4
SQL> exec enq(1);
SQL> exec qstat;
LOCKED_TOTAL=13, PINNED_TOTAL=13
SQL_TEXT=select , EXECUTIONS=23, ROWS_PROCESSED=4
SQL_TEXT=insert , EXECUTIONS=5, ROWS_PROCESSED=5
SQL> exec deq(1, 5);
SQL> exec qstat;
LOCKED_TOTAL=14, PINNED_TOTAL=14
SQL_TEXT=select , EXECUTIONS=24, ROWS_PROCESSED=5
SQL_TEXT=insert , EXECUTIONS=5, ROWS_PROCESSED=5
We can see 1 enqueue INSERT, 1 dequeue SELECT, and LOCKED_TOTAL increased 1 for enqueue, and 1 for dequeue.
Based on this observation, probably we can reconstruct one dbms_aq.dequeue pseudo code:
declare
l_rec qtab%rowtype;
cursor c is select /*+ INDEX(TAB AQ$_QTAB_I) */ ... for update skip locked;
begin
open c;
-- 1st fetch
fetch c into l_rec;
if c%found then return; end if;
-- 2nd fetch
fetch c into l_rec;
if c%found then return; end if;
-- thread_wait with timeout 5 seconds.
-- Within 5 seconds, if there is message available, waked up by a thread_post.
thread_wait(5);
fetch c into l_rec;
if c%found then return; end if;
finally close c;
throw ORA-25228;
end;
By the way, "skip locked" option seems originally designed for AQ, and this select can not guarantee the consistency due to "skip locked" rows which are not deterministic to the result set.
Now we run a CPU intensive test on AIX Power7 System with Entitled Capacity 4, SMT=4 and Oracle 11.2.0.3.0:
exec loop_job_cpu_burning(48);
exec loop_job_enq(48, 0.01);
exec loop_job_deq(48, 1);
Then repeatedly run:
select event, v.* from v$active_session_history v where p2 = 280 order by sample_time desc;
select event, v.* from v$session v where p2 = 280;
select * from v$latch_children where latch# = 280 and child# in (8);
select * from v$latchholder;
we will observe:
latch: row cache objects
To make this event more noticeable, launch some dequeue jobs with NO_WAIT:
exec loop_job_deq(16, 0);
then there will be even more latch misses and sleeps (also some other events: "latch: cache buffers chains", "buffer busy waits", "redo copy").
This also indicates that if dequeue is faster than enqueue, a higher dequeue wait will reduce CPU load because each empty dequeue also triggers three dequeue select, and hence locks on payload object_type.
By running,
select locked_total, pinned_total, locks, pins, v.*
from v$db_object_cache v
order by v.locked_total desc;
we can notice huge LOCKED_TOTAL on payload object_type: "QTAB_OBJ". This is probably the root cause of "latch: row cache objects" caused by dbms_aq.dequeue.
In fact, it was first discovered by a 10046 Event Trace on a dequeue session and a system library_cache dump with:
alter session set events 'immediate trace name library_cache level 10';
Once the test is finished, remove all the jobs by:
exec clean_job;
UNIX Process Monitoring
By AIX truss or trace on a dequeue session process, we can observe continuous
thread_wait, thread_post
to sysnchonize IPC mechanism, but hardly see real work (for example: kread).
Running ps on a dequeue session process a few times (for example, PID 31785184):
ps -flp 31785184
it shows that both Column C (CPU utilization) and PRI (priority) are changed each time. And IBM AIX document states:
for the sched_other policy (AIX default), CPU utilization is used in determining process scheduling priority. Large values indicate a CPU intensive process and result in lower process priority, whereas small values indicate an I/O intensive process and result in a more favorable priority.
One reasonable explanation from UNIX level is that the process holds the single:
latch: row cache objects (latch#: 280 and child#: 8)
consumes a certain CPU (spin_gets), and was evaluated to a lower priority, yield CPU to others, and is not able to release this single latch. Whereas the other processes which are desperately requesting this latch are starving the CPU, but can't get this latch. And finally it results in a single latch contention, and all processes are serialized by this single Oracle resource.
We often see this latch taking on average several ms (3-40 ms), and occasionally more than one hour (that contradicts the common believe that latch is a short activity).
As we know that AIX Schedo Dispatcher uses a timeslice to choose the next processes.The default value for timeslice is a single clock tick, that is, 10 milliseconds.
One possible guess for the huge LOCKED_TOTAL on payload object_type: "QTAB_OBJ" is that AIX run-time linking problem in which each dequeue action requires a lock on "QTAB_OBJ" object file (or shared library (SO)).
Bug 14382262 latch free wait event in"kokc descriptor allocation latch"
Oracle seems aware of this huge LOCKED_TOTAL and published a patch to fix it. As tested, the patch stopped the increasing of LOCKED_TOTAL, but "latch: row cache objects" gets even worse. So all the tests in this Blog is without this patch.
latchstat tool
Imitated from vmstat, a small script: latchstat is created to to monitor latch activities, for example:
exec latchstat(4, 1, 280, 8);
(code is appended at the end of Blog).
---------------------- setup ----------------------
create or replace type qtab_obj is object (id integer, pr_id integer);
/
begin sys.dbms_aqadm.drop_queue_table(queue_table => 'QTAB', force=> TRUE); end;
/
begin
sys.dbms_aqadm.create_queue_table
(
queue_table => 'QTAB'
,queue_payload_type => 'QTAB_OBJ'
,compatible => '8.1'
,sort_list => 'PRIORITY,ENQ_TIME'
,multiple_consumers => false
,message_grouping => 0
,comment => 'Test Queue Table'
,secure => false
);
end;
/
begin
sys.dbms_aqadm.stop_queue (queue_name => 'QTAB_QUEUE');
sys.dbms_aqadm.drop_queue (queue_name => 'QTAB_QUEUE');
end;
/
begin
sys.dbms_aqadm.create_queue
(
queue_name => 'QTAB_QUEUE'
,queue_table => 'QTAB'
,queue_type => sys.dbms_aqadm.normal_queue
,max_retries => 100
,retry_delay => 2
,retention_time => 604800
,comment => 'Test Queue'
);
end;
/
begin sys.dbms_aqadm.start_queue(queue_name => 'QTAB_QUEUE', enqueue => true, dequeue => true); end;
/
---------------------- queuing ----------------------
create or replace procedure cpu_work(p_cnt number) as
l_x number;
begin
for i in 1..p_cnt loop
l_x := l_x + sqrt(i);
end loop;
end;
/
create or replace procedure enq(p_cnt number, p_sleep_second number := 0.01) as
l_enqueue_options dbms_aq.enqueue_options_t;
l_message_properties dbms_aq.message_properties_t;
l_message_id raw(16);
l_message qtab_obj;
begin
cpu_work(1000);
for i in 1..p_cnt loop
l_message := qtab_obj(i, 2*i);
if p_sleep_second > 0 then dbms_lock.sleep(p_sleep_second); end if;
dbms_aq.enqueue(queue_name => 'QTAB_QUEUE',
enqueue_options => l_enqueue_options,
message_properties => l_message_properties,
payload => l_message,
msgid => l_message_id);
commit;
end loop;
cpu_work(1000);
end;
/
create or replace procedure deq(p_cnt number, p_wait_second binary_integer := 1) as
l_dequeue_options dbms_aq.dequeue_options_t;
l_message_properties dbms_aq.message_properties_t;
l_message_id raw(16);
l_message qtab_obj := qtab_obj(null, null);
begin
l_dequeue_options.wait := p_wait_second;
cpu_work(1000);
for i in 1..p_cnt loop
begin
dbms_aq.dequeue(queue_name => 'QTAB_QUEUE',
dequeue_options => l_dequeue_options,
message_properties => l_message_properties,
payload => l_message,
msgid => l_message_id);
commit;
exception when others then null;
end;
end loop;
cpu_work(1000);
end;
/
create or replace procedure loop_job_enq(p_job_cnt number, p_sleep_second number := 0.01)
as
l_job_id pls_integer;
begin
for i in 1.. p_job_cnt loop
dbms_job.submit(l_job_id, 'begin while true loop enq(100, '|| p_sleep_second ||'); end loop; end;');
end loop;
commit;
end;
/
create or replace procedure loop_job_deq(p_job_cnt number, p_wait_second binary_integer := 1)
as
l_job_id pls_integer;
begin
for i in 1.. p_job_cnt loop
dbms_job.submit(l_job_id, 'begin while true loop deq(100, '|| p_wait_second ||'); end loop; end;');
end loop;
commit;
end;
/
create or replace procedure loop_job_cpu_burning (p_job_cnt number)
as
l_job_id pls_integer;
begin
for i in 1.. p_job_cnt loop
dbms_job.submit(l_job_id, 'begin while true loop null; end loop; end;');
end loop;
commit;
end;
/
create or replace procedure qstat as
l_txt varchar2(100);
begin
select 'QTAB_TOTAL= '||count(*)||', '||'READY='||count(decode(state, 0, 1))||', At: '||systimestamp
into l_txt from QTAB;
dbms_output.put_line(l_txt);
select 'LOCKED_TOTAL='||locked_total||', '||'PINNED_TOTAL='||pinned_total into l_txt
from v$db_object_cache v where name in ('QTAB_OBJ');
dbms_output.put_line(l_txt);
-- 313buw98w5y1a insert into "K"."QTAB"
-- 3yxj9h80vc0jw select /*+ INDEX(TAB AQ$_QTAB_I) */ ... for update skip locked
for c in (select substr(sql_text, 1, 7) sql_text, executions, rows_processed
from v$sql v where sql_id in ('313buw98w5y1a','3yxj9h80vc0jw') and executions > 0)
loop
dbms_output.put_line('SQL_TEXT='||c.sql_text||', '||'EXECUTIONS='||c.executions
||', '||'ROWS_PROCESSED='||c.rows_processed);
end loop;
end;
/
create or replace procedure clean_job as
begin
for c in (select d.job, d.sid, (select serial# from v$session where sid = d.sid) ser
from dba_jobs_running d) loop
begin
execute immediate 'alter system kill session '''|| c.sid || ',' || c.ser ||''' immediate';
dbms_job.remove(c.job);
exception when others then null;
end;
commit;
end loop;
for c in (select job from dba_jobs) loop
begin
dbms_job.remove(c.job);
exception when others then null;
end;
commit;
end loop;
end;
/
---------------------- latchstat----------------------
set echo on
drop type t_latch_rec_obj force;
create or replace type t_latch_rec_obj as object(
snap number
,tim timestamp
,name varchar2(30)
,v1 number
,v2 number
,v3 number
,v4 number
,v5 number
,v6 number
,v7 number
,v8 number
,v9 number
,v10 number
);
/
create or replace type t_lacth_rec_tab as table of t_latch_rec_obj;
/
create or replace procedure latchstat(cnt number, interval number, latch_num number, child_num number := null) as
l_col_len_s pls_integer := 8;
l_col_len_l pls_integer := 16;
l_rec_tab t_lacth_rec_tab := t_lacth_rec_tab();
l_rec_tab_child t_lacth_rec_tab := t_lacth_rec_tab();
l_latch_name varchar2(20);
l_children_cnt pls_integer;
l_top_5_children varchar2(100);
begin
select max(name), count(*)
into l_latch_name, l_children_cnt
from v$latch_children v where latch# in (latch_num);
select listagg(child# ||'('|| sleeps ||')', ' / ') within group (order by wait_time desc) child_sleeps
into l_top_5_children
from (select * from v$latch_children v where latch# in (latch_num) order by wait_time desc)
where rownum <= 5;
dbms_output.put_line(' Latch: ' || l_latch_name
|| ', Children#: ' || l_children_cnt
|| ', Top 5 Children(Sleeps): '|| l_top_5_children
|| ', At: ' || systimestamp);
dbms_output.put_line(null);
dbms_output.put_line(lpad('P_gets', l_col_len_s)
||lpad('misses', l_col_len_s)
||lpad('sleeps', l_col_len_s)
||lpad('spin_gets', l_col_len_l)
||lpad('wait_time(ms)', l_col_len_l)
||lpad('avg_wait_time', l_col_len_l)
||lpad(' | ', l_col_len_s)
||lpad('C_gets', l_col_len_s)
||lpad('misses', l_col_len_s)
||lpad('sleeps', l_col_len_s)
||lpad('spin_gets', l_col_len_l)
||lpad('wait_time(ms)', l_col_len_l)
||lpad('avg_wait_time', l_col_len_l)
);
for i in 1..cnt loop
select t_latch_rec_obj(i, systimestamp, l.name, l.gets, l.misses, l.sleeps, l.spin_gets, l.wait_time, k.gets, k.misses, k.sleeps, k.spin_gets, k.wait_time)
bulk collect into l_rec_tab
from v$latch l, v$latch_children k
where l.latch# = latch_num
and l.latch# = k.latch#(+)
and child_num = k.child#(+);
dbms_lock.sleep(interval);
-- only select one Row --
for c in
(select l.name, v1.snap, v1.tim snap_start, systimestamp snap_end
,(l.gets - v1.v1) l_get_d, (l.misses - v1.v2) l_misses_d, (l.sleeps - v1.v3) l_sleeps_d
,(l.spin_gets - v1.v4) l_spin_gets_d, round((l.wait_time - v1.v5)/1000) l_wait_time_d
,(case when (l.gets - v1.v1) > 0 then round((l.wait_time - v1.v5)/(l.gets - v1.v1), 2)
else null
end) l_avg_wait_time_d
,(k.gets - v1.v6) k_get_d, (k.misses - v1.v7) k_misses_d, (k.sleeps - v1.v8) k_sleeps_d
,(k.spin_gets - v1.v9) k_spin_gets_d, round((k.wait_time - v1.v10)/1000) k_wait_time_d
,(case when (k.gets - v1.v6) > 0 then round((k.wait_time - v1.v10)/(k.gets - v1.v6), 2)
else null
end) k_avg_wait_time_d
from table(l_rec_tab) v1, v$latch l, v$latch_children k
where l.latch# = latch_num
and l.latch# = k.latch#(+)
and child_num = k.child#(+)
)
loop
-- Parent --
dbms_output.put( lpad(c.l_get_d, l_col_len_s)
||lpad(c.l_misses_d, l_col_len_s)
||lpad(c.l_sleeps_d, l_col_len_s)
||lpad(c.l_spin_gets_d, l_col_len_l)
||lpad(c.l_wait_time_d, l_col_len_l)
||lpad(c.l_avg_wait_time_d, l_col_len_l));
-- Child --
dbms_output.put( lpad(' | ', l_col_len_s)
||lpad(c.k_get_d, l_col_len_s)
||lpad(c.k_misses_d, l_col_len_s)
||lpad(c.k_sleeps_d, l_col_len_s)
||lpad(c.k_spin_gets_d, l_col_len_l)
||lpad(c.k_wait_time_d, l_col_len_l)
||lpad(c.k_avg_wait_time_d, l_col_len_l));
dbms_output.put_line(null);
end loop;
end loop;
end;
/
/*
----- Usage latchstat(cnt, interval, latch_num, child_num) ------
----- Example (latch_num=280: "row cache objects", child_num=8: "dc_users" ) ------
exec latchstat(4, 1, 280);
exec latchstat(4, 1, 280, 8);
*/
set serveroutput on
set echo off