In the first Blog (I), we ran tests to show the Wait Events "library cache: mutex X" on the queue object and "latch free" on the Latch "channel operations parent latch".
In the second Blog (II), we first traced the dequeue with GDB. Based on the test output, we built a Latch and Mutex model, examined it against real DB stats, and subsequently revealed implementation problems.
In this third Blog (III), we trace the dequeue operation with different numbers of parallel sessions in Dtrace, present the performance stats, and evaluate the Blog (II) model against real DB stats.
We conclude the Blogs with a workaround (fix) for the identified "library cache: mutex X" and "latch free".
Note: tested on Oracle 19.7 on Solaris.
1. Single Message Dequeue Latch and Mutex
As in Blog (II), we first get the child latch and mutex addresses as follows:
select addr from v$latch_children where latch#=98 and child#=144;
-- D371C240
select kglhdmtx, kglhdadr, v.* from sys.x_KGLOB v where kglnaobj = 'MSG_QUEUE_MULTIPLE';
-- 99CF9A20 99CF98D0
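For cross-reference, latch# 98 is the "channel operations parent latch" investigated in Blog (I) (latch numbers can vary by version), and child# 144 is the child latch identified in Blog (II). The latch number can be confirmed with:
select latch#, name from v$latchname where name = 'channel operations parent latch';
-- 98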
Then we compose the following Dtrace script:
sudo dtrace -n \
'
BEGIN{
/* child latch and mutex handle addresses from the queries above */
LATCHCHD_ADDR = 0XD371C240;
KGLHDMTX_ADDR = 0X99CF9A20;
l_get_seq = 0; l_rel_seq = 0;
m_get_seq = 0; m_rel_seq = 0}
pid$target:oracle:ksl_get_shared_latch:entry /arg0 == LATCHCHD_ADDR/
{printf("Latch Get Seq: %d --- %s (Addr=>0x%X, Mode=>%d)", ++l_get_seq, probefunc, arg0, arg4)}
pid$target:oracle:kslfre:entry /arg0 == LATCHCHD_ADDR/
{printf("Latch Fre Seq: %d-%d --- %s (Addr=>0x%X)", l_get_seq, ++l_rel_seq, probefunc, arg0)}
pid$target:oracle:kglGetMutex:entry /arg1 == KGLHDMTX_ADDR/
{printf("Mutex Get Seq: %d --- %s (KGLHDMTX_ADDR=>0x%X, Location=>%d, KGLLKHDL_ADDR=>0x%X)", ++m_get_seq, probefunc, arg1, arg4, arg5);}
pid$target:oracle:kglReleaseMutex:entry /arg1 == KGLHDMTX_ADDR/
{printf("Mutex Rel Seq: %d-%d --- %s (KGLHDMTX_ADDR=>0x%X)", m_get_seq, ++m_rel_seq, probefunc, arg1);}
END{printf("Total Counters: Latch GETs: %d, Latch FREs: %d, Mutex GETs: %d, Mutex RELs: %d", l_get_seq, l_rel_seq, m_get_seq, m_rel_seq)}
' -p 7476
Open one Sqlplus session (UNIX process Pid: 7476) and enqueue one message:
SQL > exec enq(1);
Then start the above Dtrace script on process Pid: 7476. In the Sqlplus session, run:
SQL > exec listen_deq(1);
Here is the Dtrace output:
Mutex Get Seq: 1 --- kglGetMutex (KGLHDMTX_ADDR=>0x99CF9A20, Location=>1, KGLLKHDL_ADDR=>0x99CF98D0)
Mutex Rel Seq: 1-1 --- kglReleaseMutex (KGLHDMTX_ADDR=>0x99CF9A20)
Mutex Rel Seq: 1-2 --- kglReleaseMutex (KGLHDMTX_ADDR=>0x99CF9A20)
Mutex Rel Seq: 1-3 --- kglReleaseMutex (KGLHDMTX_ADDR=>0x99CF9A20)
Mutex Get Seq: 2 --- kglGetMutex (KGLHDMTX_ADDR=>0x99CF9A20, Location=>4, KGLLKHDL_ADDR=>0x99CF98D0)
Mutex Rel Seq: 2-4 --- kglReleaseMutex (KGLHDMTX_ADDR=>0x99CF9A20)
Mutex Get Seq: 3 --- kglGetMutex (KGLHDMTX_ADDR=>0x99CF9A20, Location=>90, KGLLKHDL_ADDR=>0x99CF98D0)
Mutex Get Seq: 4 --- kglGetMutex (KGLHDMTX_ADDR=>0x99CF9A20, Location=>71, KGLLKHDL_ADDR=>0x99CF98D0)
Mutex Rel Seq: 4-5 --- kglReleaseMutex (KGLHDMTX_ADDR=>0x99CF9A20)
Mutex Rel Seq: 4-6 --- kglReleaseMutex (KGLHDMTX_ADDR=>0x99CF9A20)
Mutex Get Seq: 5 --- kglGetMutex (KGLHDMTX_ADDR=>0x99CF9A20, Location=>95, KGLLKHDL_ADDR=>0x99CF98D0)
Mutex Rel Seq: 5-7 --- kglReleaseMutex (KGLHDMTX_ADDR=>0x99CF9A20)
Mutex Get Seq: 6 --- kglGetMutex (KGLHDMTX_ADDR=>0x99CF9A20, Location=>85, KGLLKHDL_ADDR=>0x99CF98D0)
Mutex Rel Seq: 6-8 --- kglReleaseMutex (KGLHDMTX_ADDR=>0x99CF9A20)
Latch Get Seq: 1 --- ksl_get_shared_latch (Addr=>0xD371C240, Mode=>16)
Latch Fre Seq: 1-1 --- kslfre (Addr=>0xD371C240)
Mutex Get Seq: 7 --- kglGetMutex (KGLHDMTX_ADDR=>0x99CF9A20, Location=>1, KGLLKHDL_ADDR=>0x99CF98D0)
Mutex Rel Seq: 7-9 --- kglReleaseMutex (KGLHDMTX_ADDR=>0x99CF9A20)
Mutex Rel Seq: 7-10 --- kglReleaseMutex (KGLHDMTX_ADDR=>0x99CF9A20)
Mutex Rel Seq: 7-11 --- kglReleaseMutex (KGLHDMTX_ADDR=>0x99CF9A20)
Mutex Get Seq: 8 --- kglGetMutex (KGLHDMTX_ADDR=>0x99CF9A20, Location=>4, KGLLKHDL_ADDR=>0x99CF98D0)
Mutex Rel Seq: 8-12 --- kglReleaseMutex (KGLHDMTX_ADDR=>0x99CF9A20)
Mutex Get Seq: 9 --- kglGetMutex (KGLHDMTX_ADDR=>0x99CF9A20, Location=>90, KGLLKHDL_ADDR=>0x99CF98D0)
Mutex Get Seq: 10 --- kglGetMutex (KGLHDMTX_ADDR=>0x99CF9A20, Location=>71, KGLLKHDL_ADDR=>0x99CF98D0)
Mutex Rel Seq: 10-13 --- kglReleaseMutex (KGLHDMTX_ADDR=>0x99CF9A20)
Mutex Rel Seq: 10-14 --- kglReleaseMutex (KGLHDMTX_ADDR=>0x99CF9A20)
Mutex Get Seq: 11 --- kglGetMutex (KGLHDMTX_ADDR=>0x99CF9A20, Location=>95, KGLLKHDL_ADDR=>0x99CF98D0)
Mutex Rel Seq: 11-15 --- kglReleaseMutex (KGLHDMTX_ADDR=>0x99CF9A20)
Mutex Get Seq: 12 --- kglGetMutex (KGLHDMTX_ADDR=>0x99CF9A20, Location=>85, KGLLKHDL_ADDR=>0x99CF98D0)
Mutex Rel Seq: 12-16 --- kglReleaseMutex (KGLHDMTX_ADDR=>0x99CF9A20)
Latch Get Seq: 2 --- ksl_get_shared_latch (Addr=>0xD371C240, Mode=>16)
Latch Fre Seq: 2-2 --- kslfre (Addr=>0xD371C240)
Latch Get Seq: 3 --- ksl_get_shared_latch (Addr=>0xD371C240, Mode=>16)
Latch Fre Seq: 3-3 --- kslfre (Addr=>0xD371C240)
Mutex Get Seq: 13 --- kglGetMutex (KGLHDMTX_ADDR=>0x99CF9A20, Location=>1, KGLLKHDL_ADDR=>0x99CF98D0)
Mutex Rel Seq: 13-17 --- kglReleaseMutex (KGLHDMTX_ADDR=>0x99CF9A20)
Mutex Get Seq: 14 --- kglGetMutex (KGLHDMTX_ADDR=>0x99CF9A20, Location=>4, KGLLKHDL_ADDR=>0x99CF98D0)
Mutex Rel Seq: 14-18 --- kglReleaseMutex (KGLHDMTX_ADDR=>0x99CF9A20)
Mutex Get Seq: 15 --- kglGetMutex (KGLHDMTX_ADDR=>0x99CF9A20, Location=>90, KGLLKHDL_ADDR=>0x99CF98D0)
Mutex Get Seq: 16 --- kglGetMutex (KGLHDMTX_ADDR=>0x99CF9A20, Location=>71, KGLLKHDL_ADDR=>0x99CF98D0)
Mutex Get Seq: 17 --- kglGetMutex (KGLHDMTX_ADDR=>0x99CF9A20, Location=>72, KGLLKHDL_ADDR=>0x99CF98D0)
Mutex Rel Seq: 17-19 --- kglReleaseMutex (KGLHDMTX_ADDR=>0x99CF9A20)
Mutex Rel Seq: 17-20 --- kglReleaseMutex (KGLHDMTX_ADDR=>0x99CF9A20)
Mutex Get Seq: 18 --- kglGetMutex (KGLHDMTX_ADDR=>0x99CF9A20, Location=>95, KGLLKHDL_ADDR=>0x99CF98D0)
Mutex Rel Seq: 18-21 --- kglReleaseMutex (KGLHDMTX_ADDR=>0x99CF9A20)
Total Counters: Latch GETs: 3, Latch FREs: 3, Mutex GETs: 18, Mutex RELs: 21
The above output shows the number of Latch and Mutex GETs/FREEs (RELEASEs) for one message dequeue running alone (without any interference):
Latch GETs: 3
Latch FREs: 3
Mutex GETs: 18
Mutex RELs: 21
We can see that a single message dequeue needs 3 Latch GETs and 3 subsequent Latch FREEs (GET and FREE come in matched pairs). But for the Mutex, there are 18 GETs and 21 RELs. The mismatch comes from consecutive Mutex GETs at different Locations, for example:
Mutex Get Seq: 3 --- kglGetMutex (KGLHDMTX_ADDR=>0x99CF9A20, Location=>90, KGLLKHDL_ADDR=>0x99CF98D0)
Mutex Get Seq: 4 --- kglGetMutex (KGLHDMTX_ADDR=>0x99CF9A20, Location=>71, KGLLKHDL_ADDR=>0x99CF98D0)
and from consecutive Mutex RELs following a single GET, for example:
Mutex Get Seq: 1 --- kglGetMutex (KGLHDMTX_ADDR=>0x99CF9A20, Location=>1, KGLLKHDL_ADDR=>0x99CF98D0)
Mutex Rel Seq: 1-1 --- kglReleaseMutex (KGLHDMTX_ADDR=>0x99CF9A20)
Mutex Rel Seq: 1-2 --- kglReleaseMutex (KGLHDMTX_ADDR=>0x99CF9A20)
Mutex Rel Seq: 1-3 --- kglReleaseMutex (KGLHDMTX_ADDR=>0x99CF9A20)
The above Latch and Mutex GETs match the GDB test outcome in the previous Blog (II) for one single message dequeue:
3 Latch Gets (ksl_get_shared_latch)
18 Mutex Gets (kglGetMutex)
The Blog (II) GDB trace shows the same Mutex behaviour (consecutive Mutex GETs/RELs):
===== Mutext Get Seq: 3 -- kglGetMutex(kglhdmtx Addr(rsi)=>98417CA8, Location(r8)=>90) --
===== Mutext Get Seq: 4 -- kglGetMutex(kglhdmtx Addr(rsi)=>98417CA8, Location(r8)=>71) --
===== Lock Get Seq: 1 -- kgllkal (kgllkhdl Addr(rdx)=>98417B58, Mode(rcx)=>2), kglnaobj_and_schema: MSG_QUEUE_MULTIPLEK --
===== Mutext Rel Seq: 1-1 -- kglReleaseMutex(kglhdmtx Addr(rsi)=>98417CA8, Location(r8)=>4) --
===== Mutext Rel Seq: 1-2 -- kglReleaseMutex(kglhdmtx Addr(rsi)=>98417CA8, Location(r8)=>4) --
===== Mutext Rel Seq: 1-3 -- kglReleaseMutex(kglhdmtx Addr(rsi)=>98417CA8, Location(r8)=>4) --
2. Parallel Session Test
The above test shows the minimum number of Latch and Mutex GETs/FREEs (RELEASEs) for one single message dequeue. We now continue the dequeue test with parallel sessions and monitor the Latch and Mutex behaviour, reusing the test harness procedures from Blog (I).
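As a reminder of the harness's shape, here is a minimal sketch of start_enq_jobs, assuming a hypothetical helper enq_loop(i, dur) that enqueues messages with corr_id 'KSUB_'||i for dur seconds (Blog (I) contains the actual implementation):
create or replace procedure start_enq_jobs(p_jobs number, p_dur number) is
begin
  -- one scheduler job per enqueue session
  for i in 1..p_jobs loop
    dbms_scheduler.create_job(
      job_name   => 'ENQ_JOB_'||i,
      job_type   => 'PLSQL_BLOCK',
      job_action => 'begin enq_loop('||i||', '||p_dur||'); end;',
      enabled    => true,
      auto_drop  => true);
  end loop;
end;
/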
First, we compose a Dtrace script to collect the test stats (see Section 6. Dtrace Script: Latch and Mutex Stats).
Then we start Dtrace on the Sqlplus session (process Pid: 7476) and run the following test:
----------------- Parallel Sessions = 4 -----------------
begin
clearup_test;
purge_queue_table;
start_enq_jobs(4, 60);
start_deq_jobs(3, 1, 60);
sel_deq_loop(4, 1, 60);
end;
/
-- Output
-- Listener Deq Execs (SUC and Failed) = 9420, Duration (seconds) = 60
It starts 4 enqueue Job sessions; 3 dequeue Jobs consume the messages of the first 3 enqueue sessions for 60 seconds, and this Sqlplus session itself dequeues the messages of the 4th enqueue session for 60 seconds. After the test run, we collect the dequeue stats with the query:
-- For each RUN, only select last corr_id, which is the traced session
select corr_id, count(*), min(enq_time), max(enq_time), min(deq_time), max(deq_time),
(max(enq_time)- min(enq_time))*86400 enq_delta, (max(deq_time) - min(deq_time))*86400 deq_delta
from aq$MultiConsumerMsgs_qtab
where corr_id = (select 'KSUB_'||count(distinct corr_id) from aq$MultiConsumerMsgs_qtab)
and deq_time is not null and msg_state = 'PROCESSED'
group by corr_id order by 1;
CORR_ID COUNT(*) MIN(ENQ_TIM MAX(ENQ_TIM MIN(DEQ_TIM MAX(DEQ_TIM ENQ_DELTA DEQ_DELTA
---------- ---------- ----------- ----------- ----------- ----------- ---------- ----------
KSUB_4 9420 10:56:41 10:57:41 10:56:41 10:57:41 60 60
and extract Latch and Mutex stats from Dtrace output:
L_CNTGET ksl_get_shared_latch = 155,862
L_AVG average_ns = 2,834
ALL_MUTEX_GETs kglGetMutex = 415,825
ALL_MUTEX_FREs kglReleaseMutex = 533,074
M_AVG average_ns = 4,306
The above output shows that during the 60 seconds, one session (KSUB_4) dequeued 9,420 messages.
There are 155,862 Latch GETs, about 5.5 times more than the 9,420 * 3 = 28,260 needed if the messages were dequeued without parallel sessions.
There are 415,825 Mutex GETs, about 2.5 times more than the 9,420 * 18 = 169,560 needed if the messages were dequeued without parallel sessions.
The average Latch GET takes 2,834 ns, varying between 1,024 and 524,288 ns (see L_QUANT in Section 5.1 Dtrace Output).
The average Mutex GET takes 4,306 ns, varying between 2,048 and 8,388,608 ns (see M_QUANT in Section 5.1 Dtrace Output).
We run one more test with 64 Parallel Sessions and compare the performance:
----------------- Parallel Sessions = 64 -----------------
begin
clearup_test;
purge_queue_table;
start_enq_jobs(64, 60);
start_deq_jobs(63, 1, 60);
sel_deq_loop(64, 1, 60);
end;
/
CORR_ID COUNT(*) MIN(ENQ_TIM MAX(ENQ_TIM MIN(DEQ_TIM MAX(DEQ_TIM ENQ_DELTA DEQ_DELTA
---------- ---------- ----------- ----------- ----------- ----------- ---------- ----------
KSUB_64 667 11:18:41 11:19:41 11:18:48 11:19:41 60 53
L_CNTGET ksl_get_shared_latch = 93,584
L_AVG average_ns = 29,496
ALL_MUTEX_GETs kglGetMutex = 258,710
ALL_MUTEX_FREs kglReleaseMutex = 343,015
M_AVG average_ns = 40,750
The above output shows that during the 60 seconds, one session (KSUB_64) dequeued 667 messages.
There are 93,584 Latch GETs, about 47 times more than the 667 * 3 = 2,001 needed if the messages were dequeued without parallel sessions.
There are 258,710 Mutex GETs, about 22 times more than the 667 * 18 = 12,006 needed if the messages were dequeued without parallel sessions.
The average Latch GET takes 29,496 ns, varying between 1,024 and 268,435,456 ns (see L_QUANT in Section 5.2 Dtrace Output).
The average Mutex GET takes 40,750 ns, varying between 2,048 and 536,870,912 ns (see M_QUANT in Section 5.2 Dtrace Output).
For the full Dtrace output, see Section 5.1 (4 Parallel Sessions) and Section 5.2 (64 Parallel Sessions).
3. Test Stats for Different Numbers of Parallel Sessions
We run the same test for 1, 2, 4, 8, 16, 32, 64 Parallel Sessions. Here are the performance stats.
Dequeue Stats
RUN PX CORR_ID COUNT(*) MIN(ENQ_TIM) MAX(ENQ_TIM) MIN(DEQ_TIM) MAX(DEQ_TIM) ENQ_DELTA DEQ_DELTA
--- -- ------- -------- ------------ ------------ ------------ ------------ --------- ---------
1 1 KSUB_1 11,884 10:49:17 10:50:17 10:49:17 10:50:16 60 59
2 2 KSUB_2 11,640 10:53:24 10:54:24 10:53:24 10:54:24 60 60
3 4 KSUB_4 9,420 10:56:41 10:57:41 10:56:41 10:57:41 60 60
4 8 KSUB_8 6,519 10:59:02 11:00:02 10:59:02 11:00:01 60 59
5 16 KSUB_16 3,181 11:02:08 11:03:08 11:02:08 11:03:07 60 59
6 32 KSUB_32 1,441 11:04:38 11:05:38 11:04:39 11:05:37 60 58
7 64 KSUB_64 667 11:18:41 11:19:41 11:18:48 11:19:41 60 53
Note: the PX column is the number of parallel sessions.
Latch Stats (ns_Per_Latch_Get is Dtrace L_AVG, in nanoseconds)
RUN PX MSGs Total_MSGs Latch_Gets Total_Latch_Gets ns_Per_Latch_Get Latch_Gets_per_deq Latch_ns_per_Deq
--- -- ------ ---------- ---------- ---------------- ---------------- ------------------ ----------------
1 1 11,884 11,884 108,318 108,318 2,497 9 22,759
2 2 11,640 23,280 148,303 296,606 2,562 13 32,642
3 4 9,420 37,680 155,862 623,448 2,834 17 46,891
4 8 6,519 52,152 243,571 1,948,568 3,969 37 148,295
5 16 3,181 50,896 187,259 2,996,144 6,699 59 394,357
6 32 1,441 46,112 138,192 4,422,144 10,954 96 1,050,489
7 64 667 42,688 93,584 5,989,376 29,496 140 4,138,461
(Latch_ns_per_Deq = ns_Per_Latch_Get * Latch_Gets_per_deq)
Mutex Stats (ns_Per_Mutex_Get is Dtrace M_AVG, in nanoseconds)
RUN PX MSGs Total_MSGs Mutex_Gets Total_Mutex_Gets ns_Per_Mutex_Get Mutex_Gets_per_deq Mutex_ns_per_Deq
--- -- ------ ---------- ---------- ---------------- ---------------- ------------------ ----------------
1 1 11,884 11,884 361,105 361,105 3,438 30 104,466
2 2 11,640 23,280 429,493 858,986 3,629 37 133,903
3 4 9,420 37,680 415,825 1,663,300 4,306 44 190,079
4 8 6,519 52,152 630,757 5,046,056 5,337 97 516,391
5 16 3,181 50,896 497,035 7,952,560 9,033 156 1,411,417
6 32 1,441 46,112 380,492 12,175,744 17,947 264 4,738,855
7 64 667 42,688 258,710 16,557,440 40,750 388 15,805,746
(Mutex_ns_per_Deq = ns_Per_Mutex_Get * Mutex_Gets_per_deq)
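As a cross-check, the per-dequeue columns are derived from the unrounded ratios. For RUN 7, for example:
Mutex_Gets_per_deq = 258,710 / 667 = 387.9 (shown rounded to 388)
Mutex_ns_per_Deq = 40,750 * 387.9 ≈ 15,805,746 ns, i.e. about 15.8 ms of Mutex GET time per dequeue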
We can see that the total dequeue throughput (Total_MSGs) increases with the number of Parallel Sessions until it peaks at 8 Parallel Sessions (Total_MSGs = 52,152), then declines. The total numbers of Latch GETs (Total_Latch_Gets) and Mutex GETs (Total_Mutex_Gets) increase almost linearly with the number of Parallel Sessions.
The Latch and Mutex GET time per dequeue (Latch_ns_per_Deq and Mutex_ns_per_Deq, in nanoseconds) increases much faster once the number of Parallel Sessions exceeds 8.
For example, comparing Mutex_ns_per_Deq for 32 vs. 16 Parallel Sessions: the session count doubled, but the time more than tripled (4,738,855 / 1,411,417 = 3.36).
Comparing the above DB stats with the Blog (II) model, we can see that the model is approximate but close to the real DB stats.
If we chart the test stats, we see the same trend as the model formulas.
3.1 Test Outputs
Here are all the parallel tests and stats details.
----------------- 1 Session -----------------
begin
clearup_test;
purge_queue_table;
start_enq_jobs(1, 60);
sel_deq_loop(1, 1, 60);
end;
/
L_CNTGET ksl_get_shared_latch = 108318
L_AVG average_ns = 2497
ALL_MUTEX_GETs kglGetMutex = 361105
ALL_MUTEX_FREs kglReleaseMutex = 445357
M_AVG average_ns = 3438
KSUB_1 11884 10:49:17 10:50:17 10:49:17 10:50:16 60 59
----------------- 2 Parallel Sessions -----------------
begin
clearup_test;
purge_queue_table;
start_enq_jobs(2, 60);
start_deq_jobs(1, 1, 60);
sel_deq_loop(2, 1, 60);
end;
/
L_CNTGET ksl_get_shared_latch = 148303
L_AVG average_ns = 2562
ALL_MUTEX_GETs kglGetMutex = 429493
ALL_MUTEX_FREs kglReleaseMutex = 539133
M_AVG average_ns = 3629
KSUB_2 11640 10:53:24 10:54:24 10:53:24 10:54:24 60 60
----------------- 4 Parallel Sessions -----------------
begin
clearup_test;
purge_queue_table;
start_enq_jobs(4, 60);
start_deq_jobs(3, 1, 60);
sel_deq_loop(4, 1, 60);
end;
/
L_CNTGET ksl_get_shared_latch = 155862
L_AVG average_ns = 2834
ALL_MUTEX_GETs kglGetMutex = 415825
ALL_MUTEX_FREs kglReleaseMutex = 533074
M_AVG average_ns = 4306
KSUB_4 9420 10:56:41 10:57:41 10:56:41 10:57:41 60 60
----------------- Parallel Sessions = 8 -----------------
begin
clearup_test;
purge_queue_table;
start_enq_jobs(8, 60);
start_deq_jobs(7, 1, 60);
sel_deq_loop(8, 1, 60);
end;
/
L_CNTGET ksl_get_shared_latch = 243571
L_AVG average_ns = 3969
ALL_MUTEX_GETs kglGetMutex = 630757
ALL_MUTEX_FREs kglReleaseMutex = 822641
M_AVG average_ns = 5337
KSUB_8 6519 10:59:02 11:00:02 10:59:02 11:00:01 60 59
----------------- 16 Parallel Sessions -----------------
begin
clearup_test;
purge_queue_table;
start_enq_jobs(16, 60);
start_deq_jobs(15, 1, 60);
sel_deq_loop(16, 1, 60);
end;
/
L_CNTGET ksl_get_shared_latch = 187259
L_AVG average_ns = 6699
ALL_MUTEX_GETs kglGetMutex = 497035
ALL_MUTEX_FREs kglReleaseMutex = 653138
M_AVG average_ns = 9033
KSUB_16 3181 11:02:08 11:03:08 11:02:08 11:03:07 60 59
----------------- 32 Parallel Sessions -----------------
begin
clearup_test;
purge_queue_table;
start_enq_jobs(32, 60);
start_deq_jobs(31, 1, 60);
sel_deq_loop(32, 1, 60);
end;
/
L_CNTGET ksl_get_shared_latch = 138192
L_AVG average_ns = 10954
ALL_MUTEX_GETs kglGetMutex = 380492
ALL_MUTEX_FREs kglReleaseMutex = 503141
M_AVG average_ns = 17947
KSUB_32 1426 11:04:38 11:05:38 11:04:39 11:05:37 60 58
----------------- Parallel Sessions = 64 -----------------
begin
clearup_test;
purge_queue_table;
start_enq_jobs(64, 60);
start_deq_jobs(63, 1, 60);
sel_deq_loop(64, 1, 60);
end;
/
L_CNTGET ksl_get_shared_latch = 93584
L_AVG average_ns = 29496
ALL_MUTEX_GETs kglGetMutex = 258710
ALL_MUTEX_FREs kglReleaseMutex = 343015
M_AVG average_ns = 40750
KSUB_64 667 11:18:41 11:19:41 11:18:48 11:19:41 60 53
4. Workaround
To mitigate "library cache: mutex X" on the queue object, we can mark it as a hot object as follows:
select object_name, namespace from dba_objects where object_name = 'MSG_QUEUE_MULTIPLE';
-- MSG_QUEUE_MULTIPLE 10
alter system set "_kgl_hot_object_copies"= 254 scope=spfile;
alter system set "_kgl_debug"= "name='MSG_QUEUE_MULTIPLE' schema='K' namespace=10 debug=33554432" scope=spfile;
-- See Blog: "library cache: mutex X" and Application Context
-- (http://ksun-oracle.blogspot.com/2016/11/library-cache-mutex-x-and-application.html)
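Both parameters have scope=spfile, so they take effect after an instance restart. As a rough check afterwards (an assumption on our part: the hot copies appear as additional object handles under the same kglnaobj), we can count the handles through the same sys.x_KGLOB view used in Section 1:
-- more than one row would suggest that hot object copies were created
select count(*) from sys.x_KGLOB where kglnaobj = 'MSG_QUEUE_MULTIPLE';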
To work around "latch free" on the Latch "channel operations parent latch", we use subscribers instead of dbms_aq.listen (see Blog (I), procedure subscr_deq; a sketch is included at the end of this section). In Blog (I), we created 100 subscribers with "create_subscribers('msg_queue_multiple', 100)". If a subscriber does not exist, Oracle temporarily creates one and uses it for enqueue and dequeue, for example:
begin
for i in 1..1035 loop
enq(9990000+i);
listen_deq(9990000+i, 10);
end loop;
end;
/
-- only 1024 rows are listed; not all of the 1035 rows are exposed in v$persistent_subscribers.
select count(*) from v$persistent_subscribers where queue_name = 'MSG_QUEUE_MULTIPLE' and subscriber_name like 'KSUB_999%';
-- 1024
select count(*) from v$persistent_subscribers where queue_name = 'MSG_QUEUE_MULTIPLE' and subscriber_name NOT like 'KSUB_999%';
-- 100
select count(*) from dba_queue_subscribers where queue_name = 'MSG_QUEUE_MULTIPLE' and consumer_name like 'KSUB_999%';
-- 0
select count(*) from dba_queue_subscribers where queue_name = 'MSG_QUEUE_MULTIPLE' and consumer_name NOT like 'KSUB_999%';
-- 100
-- all messages are stored in queue table
select * from multiconsumermsgs_qtab where corrid like 'KSUB_999%' order by corrid;
Then we can see 1024 new subscribers with names like 'KSUB_999%' (a maximum of 1024 are kept even though 1035 were created). The temporarily created subscribers are visible in v$persistent_subscribers (though not all of them are exposed there), but they are not stored in dba_queue_subscribers.
dba_queue_subscribers contains the list of subscribers on all queues. They are created by dbms_aqadm.add_subscriber, can be removed by dbms_aqadm.remove_subscriber, and persist across DB restarts.
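A minimal sketch of how Blog (I)'s create_subscribers('msg_queue_multiple', 100) might wrap dbms_aqadm.add_subscriber (an assumption for illustration; Blog (I) has the actual version):
create or replace procedure create_subscribers(p_queue varchar2, p_cnt number) is
begin
  -- one named subscriber per planned dequeue session; the names follow
  -- the 'KSUB_<n>' pattern used throughout the tests
  for i in 1..p_cnt loop
    dbms_aqadm.add_subscriber(
      queue_name => p_queue,
      subscriber => sys.aq$_agent('KSUB_'||i, null, null));
  end loop;
end;
/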
The maximum number of subscribers per queue is 1024; exceeding it raises:
ORA-24067: exceeded maximum number of subscribers for queue MSG_QUEUE_MULTIPLE
See Blog: One Test of ORA-00600: [KGL-heap-size-exceeded] With AQ Subscriber
(http://ksun-oracle.blogspot.com/2023/03/one-test-of-ora-00600-kgl-heap-size.html)
According to the Oracle documentation, v$persistent_subscribers displays information about all active subscribers of the persistent queues in the database (the above test shows this is not entirely the case: only 1024 of the 1035 active subscribers are exposed in v$persistent_subscribers).
There is one row per instance per queue per subscriber, and the rows are deleted when the database restarts.
So it is better to use a pre-defined, limited number of subscribers to improve performance.
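For reference, here is a minimal sketch of the subscriber-based dequeue (the subscr_deq approach from Blog (I)); the payload type msg_typ is a stand-in for whatever type Blog (I) defines, and the parameter list is illustrative:
create or replace procedure subscr_deq(p_id number, p_wait number default 10) is
  l_deq_opt  dbms_aq.dequeue_options_t;
  l_msg_prop dbms_aq.message_properties_t;
  l_msgid    raw(16);
  l_payload  msg_typ;   -- stand-in for the payload type from Blog (I)
begin
  -- dequeue as a named subscriber, avoiding the dbms_aq.listen path
  -- that hits the "channel operations parent latch"
  l_deq_opt.consumer_name := 'KSUB_'||p_id;
  l_deq_opt.wait          := p_wait;
  l_deq_opt.navigation    := dbms_aq.first_message;
  dbms_aq.dequeue(
    queue_name         => 'msg_queue_multiple',
    dequeue_options    => l_deq_opt,
    message_properties => l_msg_prop,
    payload            => l_payload,
    msgid              => l_msgid);
  commit;
end;
/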
5. Dtrace Output for Parallel Sessions
5.1 4-Parallel Sessions
----------------- 4 Parallel Sessions -----------------
BEGIN at 10:56:31
END at 10:57:49
ELAPSED_Seconds = 77
------------------ Latch Child Stats ------------------
L_GET_CNT = 155862
L_TOTAL_NS = 441736077
L_CNTGET ksl_get_shared_latch = 155862
L_CNTFRE kslfre = 155862
L_AVG average_ns = 2834
L_QUANT quantize_ns =
value ------------- Distribution ------------- count
512 | 0
1024 |@@@@@@@@@@@@@@@@@@@@@@@@ 94257
2048 |@@@@@@@@@@@ 42290
4096 |@@@@ 17383
8192 | 1127
16384 | 287
32768 | 158
65536 | 337
131072 | 17
262144 | 2
524288 | 4
1048576 | 0
------------------ Mutex Stats ------------------
ALL_MUTEX_GETs kglGetMutex = 415825
ALL_MUTEX_FREs kglReleaseMutex = 533074
M_GET_CNT = 339405
M_TOTAL_NS = 1461574177
M_CNTGET kglGetMutex, 21 (Mode) = 1
M_CNTGET kglGetMutex, 109 (Mode) = 1
M_CNTGET kglGetMutex, 112 (Mode) = 1
M_CNTGET kglGetMutex, 6 (Mode) = 2
M_CNTGET kglGetMutex, 28 (Mode) = 2
M_CNTGET kglGetMutex, 75 (Mode) = 2
M_CNTGET kglGetMutex, 106 (Mode) = 2
M_CNTGET kglGetMutex, 157 (Mode) = 2
M_CNTGET kglGetMutex, 85 (Mode) = 62185
M_CNTGET kglGetMutex, 1 (Mode) = 69301
M_CNTGET kglGetMutex, 4 (Mode) = 69302
M_CNTGET kglGetMutex, 90 (Mode) = 69302
M_CNTGET kglGetMutex, 95 (Mode) = 69302
M_CNTFRE kglReleaseMutex = 339405
M_AVG average_ns = 4306
M_QUANT quantize_ns =
value ------------- Distribution ------------- count
1024 | 0
2048 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 235503
4096 |@@@@@@@@@@@@ 98902
8192 | 3262
16384 | 947
32768 | 568
65536 | 119
131072 | 44
262144 | 14
524288 | 12
1048576 | 9
2097152 | 6
4194304 | 5
8388608 | 14
16777216 | 0
5.2 64-Parallel Sessions
---------- Parallel Sessions = 64 ----------
BEGIN at 11:18:35
END at 11:19:53
ELAPSED_Seconds = 77
------------------ Latch Child Stats ------------------
L_GET_CNT = 93584
L_TOTAL_NS = -1534546885 (negative: the accumulator is typed as a 32-bit int and overflowed; see the comment in the Section 6 script)
L_CNTGET ksl_get_shared_latch = 93584
L_CNTFRE kslfre = 93584
L_AVG average_ns = 29496
L_QUANT quantize_ns =
value ------------- Distribution ------------- count
512 | 0
1024 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 64970
2048 |@@@@@@@@@ 21078
4096 |@@ 5382
8192 | 459
16384 | 680
32768 | 483
65536 | 302
131072 | 114
262144 | 26
524288 | 26
1048576 | 11
2097152 | 9
4194304 | 17
8388608 | 10
16777216 | 4
33554432 | 4
67108864 | 4
134217728 | 3
268435456 | 2
536870912 | 0
------------------ Mutex Stats ------------------
ALL_MUTEX_GETs kglGetMutex = 258710
ALL_MUTEX_FREs kglReleaseMutex = 343015
M_GET_CNT = 214951
M_TOTAL_NS = 169358324 (also truncated by the 32-bit overflow: 214,951 GETs at ~40,750 ns each would exceed 2^32 ns)
M_CNTGET kglGetMutex, 21 (Mode) = 1
M_CNTGET kglGetMutex, 75 (Mode) = 1
M_CNTGET kglGetMutex, 157 (Mode) = 1
M_CNTGET kglGetMutex, 6 (Mode) = 2
M_CNTGET kglGetMutex, 28 (Mode) = 2
M_CNTGET kglGetMutex, 106 (Mode) = 2
M_CNTGET kglGetMutex, 85 (Mode) = 42475
M_CNTGET kglGetMutex, 1 (Mode) = 43116
M_CNTGET kglGetMutex, 4 (Mode) = 43117
M_CNTGET kglGetMutex, 90 (Mode) = 43117
M_CNTGET kglGetMutex, 95 (Mode) = 43117
M_CNTFRE kglReleaseMutex = 214951
M_AVG average_ns = 40750
M_QUANT quantize_ns =
value ------------- Distribution ------------- count
1024 | 0
2048 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 166007
4096 |@@@@@@@@ 40643
8192 |@ 2751
16384 | 1731
32768 | 1890
65536 | 891
131072 | 405
262144 | 211
524288 | 158
1048576 | 73
2097152 | 31
4194304 | 59
8388608 | 47
16777216 | 17
33554432 | 18
67108864 | 7
134217728 | 5
268435456 | 5
536870912 | 2
1073741824 | 0
6. Dtrace Script: Latch and Mutex Stats
--------- Latch and Mutex stats (matched-pair timing excludes consecutive Mutex Gets and Releases) ---------
sudo dtrace -n \
'
BEGIN{
START_TS = timestamp;
LATCHCHD_ADDR = 0XD371C240;
/* initialize the time accumulators as 64-bit: a plain 0 would type them
   as 32-bit int and overflow, producing the negative L_TOTAL_NS seen in
   the Section 5.2 output */
L_GET_CNT = 0; L_TOTAL_NS = (long long) 0;
KGLHDMTX_ADDR = 0X99CF9A20;
M_GET_CNT = 0; M_TOTAL_NS = (long long) 0;
m_get = 0; m_rel = 1; /* exclude consecutive Mutex Gets and Releases */
printf("\nBEGIN at %Y ", walltimestamp);}
pid$target:oracle:ksl_get_shared_latch:entry /arg0 == LATCHCHD_ADDR/
{self->L_ts = timestamp; L_GET_CNT = L_GET_CNT + 1;
@L_CNTGET[probefunc] = count()}
pid$target:oracle:kslfre:entry /arg0 == LATCHCHD_ADDR/
{L_TOTAL_NS = L_TOTAL_NS + (timestamp - self->L_ts);
@L_AVG["average_ns"] = avg(timestamp - self->L_ts);
@L_QUANT["quantize_ns"] = quantize(timestamp - self->L_ts);
@L_CNTFRE[probefunc] = count()}
pid$target:oracle:kglGetMutex:entry /arg1 == KGLHDMTX_ADDR/
{@M_ALL_CNTGET[probefunc] = count();}
pid$target:oracle:kglReleaseMutex:entry /arg1 == KGLHDMTX_ADDR/
{@M_ALL_CNTFRE[probefunc] = count();}
pid$target:oracle:kglGetMutex:entry /arg1 == KGLHDMTX_ADDR && m_get == 0 && m_rel == 1/
{self->M_ts = timestamp; M_GET_CNT = M_GET_CNT + 1;
@M_CNTGET[probefunc, arg4] = count();
m_get = 1; m_rel = 0}
pid$target:oracle:kglReleaseMutex:entry /arg1 == KGLHDMTX_ADDR && m_get == 1 && m_rel == 0/
{M_TOTAL_NS = M_TOTAL_NS + (timestamp - self->M_ts);
@M_CNTFRE[probefunc] = count();
@M_AVG["average_ns"] = avg(timestamp - self->M_ts);
@M_QUANT["quantize_ns"] = quantize(timestamp - self->M_ts);
m_get = 0; m_rel = 1}
END{printf("\nEND at %Y\n", walltimestamp);
printf("\nELAPSED_Seconds = %i", (timestamp - START_TS)/1000/1000/1000);
printf("\n------------------ Latch Child Stats ------------------");
printf("\nL_GET_CNT = %i", L_GET_CNT);
printf("\nL_TOTAL_NS = %i", L_TOTAL_NS);
/* printf("\nL_AVG_GET_NS = %i", L_TOTAL_NS/L_GET_CNT); same as L_AVG */
printa("\nL_CNTGET %20s = %12@d", @L_CNTGET);
printa("\nL_CNTFRE %20s = %12@d", @L_CNTFRE);
printa("\nL_AVG %20s = %12@d", @L_AVG);
printa("\nL_QUANT %20s = %12@d", @L_QUANT);
printf("\n------------------ Mutex Stats ------------------");
printa("\nALL_MUTEX_GETs %20s = %12@d", @M_ALL_CNTGET);
printa("\nALL_MUTEX_FREs %20s = %12@d\n", @M_ALL_CNTFRE);
printf("\nM_GET_CNT = %i", M_GET_CNT);
printf("\nM_TOTAL_NS = %i", M_TOTAL_NS);
/* printf("\nM_AVG_GET_NS = %i", M_TOTAL_NS/M_GET_CNT); same as M_AVG */
printa("\nM_CNTGET %20s, %4d (Mode) = %12@d", @M_CNTGET);
printa("\nM_CNTFRE %20s = %12@d", @M_CNTFRE);
printa("\nM_AVG %20s = %12@d", @M_AVG);
printa("\nM_QUANT %20s = %12@d", @M_QUANT);}
' -p 7476
-- to parameterize the Pid, replace "-p 7476" above with "-p $1", then run:
-- ksh ./aq_latch_mutex_dtrace 7476