Sunday, April 9, 2023

Oracle Multi-Consumer AQ dequeue with dbms_aq.listen Performance Analysis and Workaround (III)

Wait Events (I)           Performance Model (II)           Performance Analysis and Workaround (III)     


In the first Blog (I), we ran tests to show the Wait Events: "library cache: mutex X" on the queue object and "latch free" on the Latch "channel operations parent latch".

In the second Blog (II), we first traced the dequeue with GDB. Based on the test output, we built a Latch and Mutex model, examined it against real DB stats, and subsequently revealed implementation problems.

In this third Blog (III), we will trace the dequeue operation with different numbers of parallel sessions using Dtrace, present the performance stats, and evaluate the Blog (II) model against the real DB stats.

We conclude the series with a workaround (fix) for the identified "library cache: mutex X" and "latch free".

Note: tested on Oracle 19.7 on Solaris.


1. Single Message Dequeue Latch and Mutex


As in Blog (II), we first get the child latch and mutex addresses as follows:

select addr from v$latch_children where latch#=98 and child#=144;
    -- D371C240

select kglhdmtx, kglhdadr, v.* from  sys.x_KGLOB v where kglnaobj = 'MSG_QUEUE_MULTIPLE';  
    -- 99CF9A20	 99CF98D0
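The latch# and child# were identified in Blog (I). For reference, they can be re-derived with queries like the following (a sketch, assuming, as in Blog (I), that latch# 98 maps to "channel operations parent latch" on this 19.7 DB; latch numbers are version dependent):

select latch#, name from v$latchname
 where name = 'channel operations parent latch';
    -- 98

select addr, child#, gets, misses, sleeps from v$latch_children
 where latch# = 98 order by sleeps desc;
    -- child# 144 is the contended child latch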
Then we compose the Dtrace script below:

sudo dtrace -n \
'
 BEGIN{
   LATCHCHD_ADDR = 0XD371C240;
   KGLHDMTX_ADDR = 0X99CF9A20;
   l_get_seq = 0; l_rel_seq = 0;
   m_get_seq = 0; m_rel_seq = 0}
   
 pid$target:oracle:ksl_get_shared_latch:entry /arg0 == LATCHCHD_ADDR/  
  {printf("Latch Get Seq: %d --- %s (Addr=>0x%X, Mode=>%d)", ++l_get_seq, probefunc, arg0, arg4)}
  
 pid$target:oracle:kslfre:entry /arg0 == LATCHCHD_ADDR/  
  {printf("Latch Fre Seq: %d-%d --- %s (Addr=>0x%X)", l_get_seq, ++l_rel_seq, probefunc, arg0)}

 pid$target:oracle:kglGetMutex:entry /arg1 == KGLHDMTX_ADDR/ 
  {printf("Mutex Get Seq: %d --- %s (KGLHDMTX_ADDR=>0x%X, Location=>%d, KGLLKHDL_ADDR=>0x%X)", ++m_get_seq, probefunc, arg1, arg4, arg5);}
 
 pid$target:oracle:kglReleaseMutex:entry /arg1 == KGLHDMTX_ADDR/ 
  {printf("Mutex Rel Seq: %d-%d --- %s (KGLHDMTX_ADDR=>0x%X)", m_get_seq, ++m_rel_seq, probefunc, arg1);}

 END{printf("Total Counters: Latch GETs: %d, Latch FREs: %d, Mutex GETs: %d, Mutex RELs: %d", l_get_seq, l_rel_seq, m_get_seq, m_rel_seq)}
' -p 7476
Open one Sqlplus session (UNIX process Pid: 7476) and enqueue one message:

SQL > exec enq(1);
Then start the above Dtrace script against process Pid 7476.
In the Sqlplus session, run:

SQL > exec listen_deq(1);
Here is the Dtrace output:

  Mutex Get Seq: 1 --- kglGetMutex (KGLHDMTX_ADDR=>0x99CF9A20, Location=>1, KGLLKHDL_ADDR=>0x99CF98D0)
  Mutex Rel Seq: 1-1 --- kglReleaseMutex (KGLHDMTX_ADDR=>0x99CF9A20)
  Mutex Rel Seq: 1-2 --- kglReleaseMutex (KGLHDMTX_ADDR=>0x99CF9A20)
  Mutex Rel Seq: 1-3 --- kglReleaseMutex (KGLHDMTX_ADDR=>0x99CF9A20)
  Mutex Get Seq: 2 --- kglGetMutex (KGLHDMTX_ADDR=>0x99CF9A20, Location=>4, KGLLKHDL_ADDR=>0x99CF98D0)
  Mutex Rel Seq: 2-4 --- kglReleaseMutex (KGLHDMTX_ADDR=>0x99CF9A20)
  Mutex Get Seq: 3 --- kglGetMutex (KGLHDMTX_ADDR=>0x99CF9A20, Location=>90, KGLLKHDL_ADDR=>0x99CF98D0)
  Mutex Get Seq: 4 --- kglGetMutex (KGLHDMTX_ADDR=>0x99CF9A20, Location=>71, KGLLKHDL_ADDR=>0x99CF98D0)
  Mutex Rel Seq: 4-5 --- kglReleaseMutex (KGLHDMTX_ADDR=>0x99CF9A20)
  Mutex Rel Seq: 4-6 --- kglReleaseMutex (KGLHDMTX_ADDR=>0x99CF9A20)
  Mutex Get Seq: 5 --- kglGetMutex (KGLHDMTX_ADDR=>0x99CF9A20, Location=>95, KGLLKHDL_ADDR=>0x99CF98D0)
  Mutex Rel Seq: 5-7 --- kglReleaseMutex (KGLHDMTX_ADDR=>0x99CF9A20)
  Mutex Get Seq: 6 --- kglGetMutex (KGLHDMTX_ADDR=>0x99CF9A20, Location=>85, KGLLKHDL_ADDR=>0x99CF98D0)
  Mutex Rel Seq: 6-8 --- kglReleaseMutex (KGLHDMTX_ADDR=>0x99CF9A20)
  Latch Get Seq: 1 --- ksl_get_shared_latch (Addr=>0xD371C240, Mode=>16)
  Latch Fre Seq: 1-1 --- kslfre (Addr=>0xD371C240)
  Mutex Get Seq: 7 --- kglGetMutex (KGLHDMTX_ADDR=>0x99CF9A20, Location=>1, KGLLKHDL_ADDR=>0x99CF98D0)
  Mutex Rel Seq: 7-9 --- kglReleaseMutex (KGLHDMTX_ADDR=>0x99CF9A20)
  Mutex Rel Seq: 7-10 --- kglReleaseMutex (KGLHDMTX_ADDR=>0x99CF9A20)
  Mutex Rel Seq: 7-11 --- kglReleaseMutex (KGLHDMTX_ADDR=>0x99CF9A20)
  Mutex Get Seq: 8 --- kglGetMutex (KGLHDMTX_ADDR=>0x99CF9A20, Location=>4, KGLLKHDL_ADDR=>0x99CF98D0)
  Mutex Rel Seq: 8-12 --- kglReleaseMutex (KGLHDMTX_ADDR=>0x99CF9A20)
  Mutex Get Seq: 9 --- kglGetMutex (KGLHDMTX_ADDR=>0x99CF9A20, Location=>90, KGLLKHDL_ADDR=>0x99CF98D0)
  Mutex Get Seq: 10 --- kglGetMutex (KGLHDMTX_ADDR=>0x99CF9A20, Location=>71, KGLLKHDL_ADDR=>0x99CF98D0)
  Mutex Rel Seq: 10-13 --- kglReleaseMutex (KGLHDMTX_ADDR=>0x99CF9A20)
  Mutex Rel Seq: 10-14 --- kglReleaseMutex (KGLHDMTX_ADDR=>0x99CF9A20)
  Mutex Get Seq: 11 --- kglGetMutex (KGLHDMTX_ADDR=>0x99CF9A20, Location=>95, KGLLKHDL_ADDR=>0x99CF98D0)
  Mutex Rel Seq: 11-15 --- kglReleaseMutex (KGLHDMTX_ADDR=>0x99CF9A20)
  Mutex Get Seq: 12 --- kglGetMutex (KGLHDMTX_ADDR=>0x99CF9A20, Location=>85, KGLLKHDL_ADDR=>0x99CF98D0)
  Mutex Rel Seq: 12-16 --- kglReleaseMutex (KGLHDMTX_ADDR=>0x99CF9A20)
  Latch Get Seq: 2 --- ksl_get_shared_latch (Addr=>0xD371C240, Mode=>16)
  Latch Fre Seq: 2-2 --- kslfre (Addr=>0xD371C240)
  Latch Get Seq: 3 --- ksl_get_shared_latch (Addr=>0xD371C240, Mode=>16)
  Latch Fre Seq: 3-3 --- kslfre (Addr=>0xD371C240)
  Mutex Get Seq: 13 --- kglGetMutex (KGLHDMTX_ADDR=>0x99CF9A20, Location=>1, KGLLKHDL_ADDR=>0x99CF98D0)
  Mutex Rel Seq: 13-17 --- kglReleaseMutex (KGLHDMTX_ADDR=>0x99CF9A20)
  Mutex Get Seq: 14 --- kglGetMutex (KGLHDMTX_ADDR=>0x99CF9A20, Location=>4, KGLLKHDL_ADDR=>0x99CF98D0)
  Mutex Rel Seq: 14-18 --- kglReleaseMutex (KGLHDMTX_ADDR=>0x99CF9A20)
  Mutex Get Seq: 15 --- kglGetMutex (KGLHDMTX_ADDR=>0x99CF9A20, Location=>90, KGLLKHDL_ADDR=>0x99CF98D0)
  Mutex Get Seq: 16 --- kglGetMutex (KGLHDMTX_ADDR=>0x99CF9A20, Location=>71, KGLLKHDL_ADDR=>0x99CF98D0)
  Mutex Get Seq: 17 --- kglGetMutex (KGLHDMTX_ADDR=>0x99CF9A20, Location=>72, KGLLKHDL_ADDR=>0x99CF98D0)
  Mutex Rel Seq: 17-19 --- kglReleaseMutex (KGLHDMTX_ADDR=>0x99CF9A20)
  Mutex Rel Seq: 17-20 --- kglReleaseMutex (KGLHDMTX_ADDR=>0x99CF9A20)
  Mutex Get Seq: 18 --- kglGetMutex (KGLHDMTX_ADDR=>0x99CF9A20, Location=>95, KGLLKHDL_ADDR=>0x99CF98D0)
  Mutex Rel Seq: 18-21 --- kglReleaseMutex (KGLHDMTX_ADDR=>0x99CF9A20)
  
  Total Counters: Latch GETs: 3, Latch FREs: 3, Mutex GETs: 18, Mutex RELs: 21
The above output shows the number of Latch and Mutex GETs/FREEs (RELEASEs) for one message dequeue when it runs alone (without any interference):

  Latch GETs: 3
  Latch FREs: 3
  Mutex GETs: 18
  Mutex RELs: 21
We can see that one single message dequeue needs 3 Latch GETs and 3 subsequent Latch FREEs (GET and FREE come in matched pairs).

But for the Mutex, there are 18 GETs and 21 RELs. The mismatch is caused by consecutive Mutex GETs at different Locations, for example:

  Mutex Get Seq: 3 --- kglGetMutex (KGLHDMTX_ADDR=>0x99CF9A20, Location=>90, KGLLKHDL_ADDR=>0x99CF98D0)
  Mutex Get Seq: 4 --- kglGetMutex (KGLHDMTX_ADDR=>0x99CF9A20, Location=>71, KGLLKHDL_ADDR=>0x99CF98D0)
and consecutive Mutex RELs after a single GET, for example:

  Mutex Get Seq: 1 --- kglGetMutex (KGLHDMTX_ADDR=>0x99CF9A20, Location=>1, KGLLKHDL_ADDR=>0x99CF98D0)
  Mutex Rel Seq: 1-1 --- kglReleaseMutex (KGLHDMTX_ADDR=>0x99CF9A20)
  Mutex Rel Seq: 1-2 --- kglReleaseMutex (KGLHDMTX_ADDR=>0x99CF9A20)
  Mutex Rel Seq: 1-3 --- kglReleaseMutex (KGLHDMTX_ADDR=>0x99CF9A20)
The above Latch and Mutex GETs are the same as the GDB test outcome in the previous Blog (II) for one single message dequeue:

   3   Latch Gets         (ksl_get_shared_latch)
   18  Mutex Gets         (kglGetMutex)
The Blog (II) GDB trace shows the same Mutex behaviour (consecutive Mutex GETs/RELs):

===== Mutext Get Seq: 3 -- kglGetMutex(kglhdmtx Addr(rsi)=>98417CA8, Location(r8)=>90) --
===== Mutext Get Seq: 4 -- kglGetMutex(kglhdmtx Addr(rsi)=>98417CA8, Location(r8)=>71) --

===== Lock Get Seq: 1 -- kgllkal (kgllkhdl Addr(rdx)=>98417B58, Mode(rcx)=>2), kglnaobj_and_schema: MSG_QUEUE_MULTIPLEK --
===== Mutext Rel Seq: 1-1 -- kglReleaseMutex(kglhdmtx Addr(rsi)=>98417CA8, Location(r8)=>4) --
===== Mutext Rel Seq: 1-2 -- kglReleaseMutex(kglhdmtx Addr(rsi)=>98417CA8, Location(r8)=>4) --
===== Mutext Rel Seq: 1-3 -- kglReleaseMutex(kglhdmtx Addr(rsi)=>98417CA8, Location(r8)=>4) --
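For reference, the traced call is essentially one dbms_aq.listen followed by one dbms_aq.dequeue. Below is a minimal sketch of what the helper listen_deq could look like (the actual procedure is defined in Blog (I); the payload type message_typ and the defaults here are assumptions):

create or replace procedure listen_deq(p_id number, p_wait number default 3) as
  l_agent_list  dbms_aq.aq$_agent_list_t;
  l_agent       sys.aq$_agent;
  l_deq_opt     dbms_aq.dequeue_options_t;
  l_msg_prop    dbms_aq.message_properties_t;
  l_msg_id      raw(16);
  l_payload     message_typ;    -- payload type from Blog (I), assumed
begin
  -- listen as consumer KSUB_<p_id> on the multi-consumer queue
  l_agent_list(1) := sys.aq$_agent('KSUB_'||p_id, 'msg_queue_multiple', null);
  -- dbms_aq.listen raises ORA-25254 on timeout (the "Failed" execs in the test output)
  dbms_aq.listen(agent_list => l_agent_list, wait => p_wait, agent => l_agent);
  -- a message is ready: dequeue it for the notified consumer
  l_deq_opt.consumer_name := l_agent.name;
  l_deq_opt.wait          := dbms_aq.no_wait;
  l_deq_opt.navigation    := dbms_aq.first_message;
  dbms_aq.dequeue(queue_name         => 'msg_queue_multiple',
                  dequeue_options    => l_deq_opt,
                  message_properties => l_msg_prop,
                  payload            => l_payload,
                  msgid              => l_msg_id);
  commit;
end;
/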


2. Parallel Session Test


The above test shows the minimum number of Latch and Mutex GETs/FREEs (RELEASEs) for one single message dequeue. We now continue the dequeue test with parallel sessions and monitor the Latch and Mutex behaviour.

At first, we compose a Dtrace script to collect the test stats (see Section 6. Dtrace Script: Latch and Mutex Stats).

Then we start Dtrace on the Sqlplus session (process Pid: 7476) and run the following test:

----------------- Parallel Sessions = 4 -----------------
begin
  clearup_test;
  purge_queue_table;
  start_enq_jobs(4, 60);
  start_deq_jobs(3, 1, 60);
  sel_deq_loop(4, 1, 60);
end;
/

-- Output
--   Listener Deq Execs (SUC and Failed) = 9420, Duration (seconds) = 60
It starts 4 enqueue Job sessions and 3 dequeue Jobs consuming from the first 3 enqueue sessions for 60 seconds, while the Sqlplus session itself dequeues the messages of the 4th enqueue session for 60 seconds.
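The test helpers come from Blog (I). As an illustration, a job starter like start_deq_jobs could be sketched with dbms_scheduler as below (the parameter meanings (jobs, wait, duration) and job names are assumptions; each job runs the same sel_deq_loop used by the Sqlplus session):

create or replace procedure start_deq_jobs(p_jobs number, p_wait number, p_dur number) as
begin
  -- one scheduler job per dequeue session, each looping for p_dur seconds
  for i in 1 .. p_jobs loop
    dbms_scheduler.create_job(
      job_name   => 'DEQ_JOB_'||i,
      job_type   => 'PLSQL_BLOCK',
      job_action => 'begin sel_deq_loop('||i||', '||p_wait||', '||p_dur||'); end;',
      start_date => systimestamp,
      enabled    => true,
      auto_drop  => true);
  end loop;
end;
/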

After the test run, we collect the dequeue stats with this query:

-- For each RUN, only select last corr_id, which is the traced session

select corr_id, count(*), min(enq_time), max(enq_time), min(deq_time), max(deq_time),
       (max(enq_time)- min(enq_time))*86400 enq_delta, (max(deq_time) - min(deq_time))*86400 deq_delta
from aq$MultiConsumerMsgs_qtab 
where corr_id = (select 'KSUB_'||count(distinct corr_id) from aq$MultiConsumerMsgs_qtab) 
  and deq_time is not null and msg_state = 'PROCESSED' 
group by corr_id order by 1;

  CORR_ID    COUNT(*)   MIN(ENQ_TIM  MAX(ENQ_TIM  MIN(DEQ_TIM  MAX(DEQ_TIM  ENQ_DELTA   DEQ_DELTA
  ---------- ---------- -----------  -----------  -----------  -----------  ----------  ----------
  KSUB_4     9420       10:56:41     10:57:41     10:56:41     10:57:41     60          60
and extract the Latch and Mutex stats from the Dtrace output:

  L_CNTGET   ksl_get_shared_latch =       155,862
  L_AVG                average_ns =         2,834
  ALL_MUTEX_GETs      kglGetMutex =       415,825
  ALL_MUTEX_FREs  kglReleaseMutex =       533,074
  M_AVG                average_ns =         4,306
The above output shows that during 60 seconds, one session (KSUB_4) dequeued 9420 messages.

  There are 155,862 Latch GETs, much higher than 9420 * 3 = 28,260 if the messages were dequeued without parallel sessions.
  There are 415,825 Mutex GETs, much higher than 9420 * 18 = 169,560 if the messages were dequeued without parallel sessions.
  A Latch GET takes 2,834 ns on average, varying between 1,024 and 524,288 ns (see L_QUANT in Section 5.1 Dtrace Output).
  A Mutex GET takes 4,306 ns on average, varying between 2,048 and 8,388,608 ns (see M_QUANT in Section 5.1 Dtrace Output).
Per dequeue, that is 155,862/9,420 = about 17 Latch GETs and 415,825/9,420 = about 44 Mutex GETs, matching the per-dequeue columns in Section 3.
We run one more test with 64 parallel Sessions and compare the performance:

----------------- Parallel Sessions = 64 -----------------
begin
  clearup_test;
  purge_queue_table;
  start_enq_jobs(64, 60);
  start_deq_jobs(63, 1, 60);
  sel_deq_loop(64, 1, 60);
end;
/

  CORR_ID    COUNT(*)   MIN(ENQ_TIM  MAX(ENQ_TIM  MIN(DEQ_TIM  MAX(DEQ_TIM  ENQ_DELTA   DEQ_DELTA
  ---------- ---------- -----------  -----------  -----------  -----------  ----------  ----------
  KSUB_64    667        11:18:41     11:19:41     11:18:48     11:19:41     60          53

  L_CNTGET  ksl_get_shared_latch =        93,584
  L_AVG               average_ns =        29,496
  ALL_MUTEX_GETs     kglGetMutex =       258,710
  ALL_MUTEX_FREs kglReleaseMutex =       343,015
  M_AVG               average_ns =        40,750  
The above output shows that during 60 seconds, one session (KSUB_64) dequeued 667 messages.

  There are 93,584 Latch GETs, much higher than 667 * 3 = 2,001 if the messages were dequeued without parallel sessions.
  There are 258,710 Mutex GETs, much higher than 667 * 18 = 12,006 if the messages were dequeued without parallel sessions.
  A Latch GET takes 29,496 ns on average, varying between 1,024 and 268,435,456 ns (see L_QUANT in Section 5.2 Dtrace Output).
  A Mutex GET takes 40,750 ns on average, varying between 2,048 and 536,870,912 ns (see M_QUANT in Section 5.2 Dtrace Output).
Per dequeue, that is 93,584/667 = about 140 Latch GETs and 258,710/667 = about 388 Mutex GETs.
For the full Dtrace output, see Section 5.1 for 4 Parallel Sessions and Section 5.2 for 64 Parallel Sessions.


3. Test Stats for different numbers of Parallel Sessions


We run the same test for 1, 2, 4, 8, 16, 32 and 64 Parallel Sessions. Here are the performance stats.

Dequeue Stats


RUN PX  CORR_ID  COUNT(*)  MIN(ENQ_TIM)  MAX(ENQ_TIM)  MIN(DEQ_TIM)  MAX(DEQ_TIM)  ENQ_DELTA  DEQ_DELTA
--- --  -------  --------  ------------  ------------  ------------  ------------  ---------  ---------
1   1   KSUB_1   11,884    10:49:17      10:50:17      10:49:17      10:50:16       60        59
2   2   KSUB_2   11,640    10:53:24      10:54:24      10:53:24      10:54:24       60        60
3   4   KSUB_4    9,420    10:56:41      10:57:41      10:56:41      10:57:41       60        60
4   8   KSUB_8    6,519    10:59:02      11:00:02      10:59:02      11:00:01       60        59
5   16  KSUB_16   3,181    11:02:08      11:03:08      11:02:08      11:03:07       60        59
6   32  KSUB_32   1,441    11:04:38      11:05:38      11:04:39      11:05:37       60        58
7   64  KSUB_64     667    11:18:41      11:19:41      11:18:48      11:19:41       60        53

Note: PX column: number of parallel sessions
Latch Stats (ns_Per_Latch_Get is Dtrace L_AVG, in nanoseconds)

RUN PX  MSGs    Total_MSGs  Latch_Gets  Total_Latch_Gets  ns_Per_Latch_Get  Latch_Gets_per_deq  Latch_ns_per_Deq
--- --  ------  ----------  ----------  ----------------  ----------------  ------------------  ----------------
1   1   11,884  11,884      108,318     108,318           2,497             9                   22,759   
2   2   11,640  23,280      148,303     296,606           2,562             13                  32,642   
3   4    9,420  37,680      155,862     623,448           2,834             17                  46,891   
4   8    6,519  52,152      243,571     1,948,568         3,969             37                  148,295  
5   16   3,181  50,896      187,259     2,996,144         6,699             59                  394,357  
6   32   1,441  46,112      138,192     4,422,144         10,954            96                  1,050,489
7   64     667  42,688      93,584      5,989,376         29,496            140                 4,138,461

(Latch_ns_per_Deq = ns_Per_Latch_Get * Latch_Gets_per_deq)
Mutex Stats (ns_Per_Mutex_Get is Dtrace M_AVG, in nanoseconds)

RUN PX  MSGs    Total_MSGs  Mutex_Gets  Total_Mutex_Gets  ns_Per_Mutex_Get  Mutex_Gets_per_deq  Mutex_ns_per_Deq     
--- --  ------  ----------  ----------  ----------------  ----------------  ------------------  ----------------
1   1   11,884  11,884      361,105     361,105           3,438             30                  104,466   
2   2   11,640  23,280      429,493     858,986           3,629             37                  133,903   
3   4    9,420  37,680      415,825     1,663,300         4,306             44                  190,079   
4   8    6,519  52,152      630,757     5,046,056         5,337             97                  516,391   
5   16   3,181  50,896      497,035     7,952,560         9,033             156                 1,411,417 
6   32   1,441  46,112      380,492     12,175,744        17,947            264                 4,738,855 
7   64     667  42,688      258,710     16,557,440        40,750            388                 15,805,746

(Mutex_ns_per_Deq = ns_Per_Mutex_Get * Mutex_Gets_per_deq)
We can see that the total dequeue throughput (Total_MSGs) increases almost linearly with the number of Parallel Sessions up to 8 Parallel Sessions, where the maximum is reached (Total_MSGs = 52,152).

The total numbers of Latch GETs (Total_Latch_Gets) and Mutex GETs (Total_Mutex_Gets), and hence the GETs per dequeue, increase almost linearly with the number of Parallel Sessions.

The Latch and Mutex GET times per dequeue (Latch_ns_per_Deq and Mutex_ns_per_Deq, in nanoseconds) increase much faster once the number of Parallel Sessions exceeds 8.

For example, comparing Mutex_ns_per_Deq for 32 vs. 16 Parallel Sessions: the number of Parallel Sessions doubled, but the time per dequeue increased more than 3 times (4,738,855/1,411,417 = 3.36).

Comparing the above DB stats with the Blog (II) model, we can see that the model is approximate, but close to the real DB stats.

If we draw a chart of the test stats, we can see a similar trend as in the model formulas.
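As a cross-check, the derived columns of RUN 3 (PX = 4) can be recomputed from the raw counters:

select round(155862/9420)        latch_gets_per_deq,   -- => 17
       round(155862*2834/9420)   latch_ns_per_deq,     -- => 46,891
       round(415825/9420)        mutex_gets_per_deq,   -- => 44
       round(415825*4306/9420)   mutex_ns_per_deq      -- => 190,079
  from dual;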


3.1 Test Outputs


Here are all the parallel tests and stats details.

----------------- 1 Session -----------------
begin
  clearup_test;
  purge_queue_table;
  start_enq_jobs(1, 60);
  sel_deq_loop(1, 1, 60);
end;
/

L_CNTGET   ksl_get_shared_latch =       108318
L_AVG                average_ns =         2497
ALL_MUTEX_GETs      kglGetMutex =       361105
ALL_MUTEX_FREs  kglReleaseMutex =       445357
M_AVG                average_ns =         3438

KSUB_1	11884	10:49:17	10:50:17	10:49:17	10:50:16	60	59

----------------- 2 Parallel Sessions -----------------
begin
  clearup_test;
  purge_queue_table;
  start_enq_jobs(2, 60);
  start_deq_jobs(1, 1, 60);
  sel_deq_loop(2, 1, 60);
end;
/

L_CNTGET  ksl_get_shared_latch =       148303
L_AVG               average_ns =         2562
ALL_MUTEX_GETs     kglGetMutex =       429493
ALL_MUTEX_FREs kglReleaseMutex =       539133
M_AVG               average_ns =         3629

KSUB_2	11640	10:53:24	10:54:24	10:53:24	10:54:24	60	60

----------------- 4 Parallel Sessions -----------------
begin
  clearup_test;
  purge_queue_table;
  start_enq_jobs(4, 60);
  start_deq_jobs(3, 1, 60);
  sel_deq_loop(4, 1, 60);
end;
/

L_CNTGET  ksl_get_shared_latch =       155862
L_AVG               average_ns =         2834
ALL_MUTEX_GETs     kglGetMutex =       415825
ALL_MUTEX_FREs kglReleaseMutex =       533074
M_AVG               average_ns =         4306

KSUB_4	9420	10:56:41	10:57:41	10:56:41	10:57:41	60	60

----------------- 8 Parallel Sessions -----------------
begin
  clearup_test;
  purge_queue_table;
  start_enq_jobs(8, 60);
  start_deq_jobs(7, 1, 60);
  sel_deq_loop(8, 1, 60);
end;
/

L_CNTGET  ksl_get_shared_latch =       243571
L_AVG               average_ns =         3969
ALL_MUTEX_GETs     kglGetMutex =       630757
ALL_MUTEX_FREs kglReleaseMutex =       822641
M_AVG               average_ns =         5337

KSUB_8	6519	10:59:02	11:00:02	10:59:02	11:00:01	60	59

----------------- 16 Parallel Sessions -----------------
begin
  clearup_test;
  purge_queue_table;
  start_enq_jobs(16, 60);
  start_deq_jobs(15, 1, 60);
  sel_deq_loop(16, 1, 60);
end;
/

L_CNTGET  ksl_get_shared_latch =       187259
L_AVG               average_ns =         6699
ALL_MUTEX_GETs     kglGetMutex =       497035
ALL_MUTEX_FREs kglReleaseMutex =       653138
M_AVG               average_ns =         9033

KSUB_16	3181	11:02:08	11:03:08	11:02:08	11:03:07	60	59

----------------- 32 Parallel Sessions -----------------
begin
  clearup_test;
  purge_queue_table;
  start_enq_jobs(32, 60);
  start_deq_jobs(31, 1, 60);
  sel_deq_loop(32, 1, 60);
end;
/

L_CNTGET  ksl_get_shared_latch =       138192
L_AVG               average_ns =        10954
ALL_MUTEX_GETs     kglGetMutex =       380492
ALL_MUTEX_FREs kglReleaseMutex =       503141
M_AVG               average_ns =        17947

KSUB_32	1426	11:04:38	11:05:38	11:04:39	11:05:37	60	58

----------------- 64 Parallel Sessions -----------------
begin
  clearup_test;
  purge_queue_table;
  start_enq_jobs(64, 60);
  start_deq_jobs(63, 1, 60);
  sel_deq_loop(64, 1, 60);
end;
/

L_CNTGET  ksl_get_shared_latch =        93584
L_AVG               average_ns =        29496
ALL_MUTEX_GETs     kglGetMutex =       258710
ALL_MUTEX_FREs kglReleaseMutex =       343015
M_AVG               average_ns =        40750

KSUB_64	667	11:18:41	11:19:41	11:18:48	11:19:41	60	53


4. Workaround


To mitigate "library cache: mutex X" on the queue object, we can mark it as a hot object as follows:

select object_name, namespace from dba_objects where object_name = 'MSG_QUEUE_MULTIPLE';
  -- MSG_QUEUE_MULTIPLE	10

alter system set "_kgl_hot_object_copies"= 254 scope=spfile;

alter system set "_kgl_debug"= "name='MSG_QUEUE_MULTIPLE' schema='K' namespace=10 debug=33554432" scope=spfile;

  -- See Blog: "library cache: mutex X" and Application Context 
  --           (http://ksun-oracle.blogspot.com/2016/11/library-cache-mutex-x-and-application.html)
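Both parameters only take effect after an instance restart (scope=spfile). Once active, each hot copy gets its own library cache handle and mutex, which can be checked with a query like below (a sketch using the same x_KGLOB view as in Section 1; expect multiple rows for the copies):

select kglnaobj, kglhdadr, kglhdmtx from sys.x_KGLOB
 where kglnaobj like 'MSG_QUEUE_MULTIPLE%';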
To work around "latch free" on the Latch "channel operations parent latch", we use a subscriber dequeue instead of dbms_aq.listen. See Blog (I), procedure subscr_deq.
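A minimal sketch of such a subscriber dequeue (the actual subscr_deq is defined in Blog (I); the payload type and defaults are assumptions):

create or replace procedure subscr_deq(p_id number, p_wait number default 3) as
  l_deq_opt   dbms_aq.dequeue_options_t;
  l_msg_prop  dbms_aq.message_properties_t;
  l_msg_id    raw(16);
  l_payload   message_typ;    -- payload type from Blog (I), assumed
begin
  -- dequeue directly as subscriber KSUB_<p_id>, no dbms_aq.listen involved
  l_deq_opt.consumer_name := 'KSUB_'||p_id;
  l_deq_opt.wait          := p_wait;
  l_deq_opt.navigation    := dbms_aq.first_message;
  dbms_aq.dequeue(queue_name         => 'msg_queue_multiple',
                  dequeue_options    => l_deq_opt,
                  message_properties => l_msg_prop,
                  payload            => l_payload,
                  msgid              => l_msg_id);
  commit;
end;
/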

In Blog (I), we created 100 subscribers with "create_subscribers('msg_queue_multiple', 100)". If a subscriber does not exist, Oracle temporarily creates one and uses it for enqueue and dequeue, for example:

begin
  for i in 1..1035 loop
    enq(9990000+i);
    listen_deq(9990000+i, 10); 
  end loop;
end;
/

-- only 1024 rows are listed; not all of the 1035 rows are exposed in v$persistent_subscribers.
select count(*) from v$persistent_subscribers where queue_name = 'MSG_QUEUE_MULTIPLE' and subscriber_name like 'KSUB_999%';
  -- 1024

select count(*) from v$persistent_subscribers where queue_name = 'MSG_QUEUE_MULTIPLE' and subscriber_name NOT like 'KSUB_999%';
  -- 100
  
select count(*) from dba_queue_subscribers where queue_name = 'MSG_QUEUE_MULTIPLE' and consumer_name like 'KSUB_999%';
  -- 0

select count(*) from dba_queue_subscribers where queue_name = 'MSG_QUEUE_MULTIPLE' and consumer_name NOT like 'KSUB_999%';
  -- 100
  
-- all messages are stored in queue table
select * from multiconsumermsgs_qtab where corrid like 'KSUB_999%' order by corrid;
Then we can see 1024 new subscribers with names like 'KSUB_999%' (a maximum of 1024 are kept, even though 1035 were created).

The temporarily created subscribers are visible in v$persistent_subscribers (although not all of them are exposed there), but they are not stored in dba_queue_subscribers.

dba_queue_subscribers contains the list of subscribers on all queues. They are created by dbms_aqadm.add_subscriber, can be removed by dbms_aqadm.remove_subscriber, and persist across DB restarts.
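For example, a persistent subscriber is added and removed like this (KSUB_101 is an arbitrary example name):

begin
  -- add a persistent subscriber (visible in dba_queue_subscribers, survives restart)
  dbms_aqadm.add_subscriber(queue_name => 'msg_queue_multiple',
                            subscriber => sys.aq$_agent('KSUB_101', null, null));
  -- ... and remove it again
  dbms_aqadm.remove_subscriber(queue_name => 'msg_queue_multiple',
                               subscriber => sys.aq$_agent('KSUB_101', null, null));
end;
/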

The maximum number of subscribers per queue is 1024; exceeding it raises an error:

  ORA-24067: exceeded maximum number of subscribers for queue MSG_QUEUE_MULTIPLE

     See Blog: One Test of ORA-00600: [KGL-heap-size-exceeded] With AQ Subscriber 
        (http://ksun-oracle.blogspot.com/2023/03/one-test-of-ora-00600-kgl-heap-size.html)
According to the Oracle documentation, v$persistent_subscribers displays information about all active subscribers of the persistent queues in the database
(the above test shows that this is not always the case: only 1024 of the 1035 active subscribers are exposed in v$persistent_subscribers).
There is one row per instance per queue per subscriber, and the rows are deleted when the database restarts.

So it is better to use a pre-defined, limited number of subscribers to improve performance.
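Such a pre-defined set can be created once, as done in Blog (I) with create_subscribers('msg_queue_multiple', 100), for which a minimal sketch is:

create or replace procedure create_subscribers(p_queue varchar2, p_cnt number) as
begin
  -- pre-create p_cnt persistent subscribers KSUB_1 .. KSUB_<p_cnt>
  for i in 1 .. p_cnt loop
    dbms_aqadm.add_subscriber(queue_name => p_queue,
                              subscriber => sys.aq$_agent('KSUB_'||i, null, null));
  end loop;
end;
/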


5. Dtrace Output for Parallel Sessions



5.1 4-Parallel Sessions



----------------- 4 Parallel Sessions -----------------

BEGIN at 10:56:31
END   at 10:57:49

ELAPSED_Seconds = 77
------------------ Latch Child Stats ------------------
L_GET_CNT  = 155862
L_TOTAL_NS = 441736077
L_CNTGET ksl_get_shared_latch =       155862
L_CNTFRE               kslfre =       155862
L_AVG              average_ns =         2834
L_QUANT          quantize_ns  =
           value  ------------- Distribution ------------- count
             512 |                                         0
            1024 |@@@@@@@@@@@@@@@@@@@@@@@@                 94257
            2048 |@@@@@@@@@@@                              42290
            4096 |@@@@                                     17383
            8192 |                                         1127
           16384 |                                         287
           32768 |                                         158
           65536 |                                         337
          131072 |                                         17
          262144 |                                         2
          524288 |                                         4
         1048576 |                                         0

------------------ Mutex Stats ------------------
ALL_MUTEX_GETs      kglGetMutex =       415825
ALL_MUTEX_FREs      kglReleaseMutex =       533074

M_GET_CNT  = 339405
M_TOTAL_NS = 1461574177
M_CNTGET          kglGetMutex,   21 (Mode) =            1
M_CNTGET          kglGetMutex,  109 (Mode) =            1
M_CNTGET          kglGetMutex,  112 (Mode) =            1
M_CNTGET          kglGetMutex,    6 (Mode) =            2
M_CNTGET          kglGetMutex,   28 (Mode) =            2
M_CNTGET          kglGetMutex,   75 (Mode) =            2
M_CNTGET          kglGetMutex,  106 (Mode) =            2
M_CNTGET          kglGetMutex,  157 (Mode) =            2
M_CNTGET          kglGetMutex,   85 (Mode) =        62185
M_CNTGET          kglGetMutex,    1 (Mode) =        69301
M_CNTGET          kglGetMutex,    4 (Mode) =        69302
M_CNTGET          kglGetMutex,   90 (Mode) =        69302
M_CNTGET          kglGetMutex,   95 (Mode) =        69302
M_CNTFRE      kglReleaseMutex =       339405
M_AVG              average_ns =         4306
M_QUANT           quantize_ns =
           value  ------------- Distribution ------------- count
            1024 |                                         0
            2048 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@             235503
            4096 |@@@@@@@@@@@@                             98902
            8192 |                                         3262
           16384 |                                         947
           32768 |                                         568
           65536 |                                         119
          131072 |                                         44
          262144 |                                         14
          524288 |                                         12
         1048576 |                                         9
         2097152 |                                         6
         4194304 |                                         5
         8388608 |                                         14
        16777216 |                                         0


5.2 64-Parallel Sessions



---------- Parallel Sessions = 64 ----------

BEGIN at 11:18:35
END   at 11:19:53

ELAPSED_Seconds = 77
------------------ Latch Child Stats ------------------
L_GET_CNT  = 93584
L_TOTAL_NS = -1534546885    <- 32-bit overflow in the printf output; the true total is about 93,584 * 29,496 ns = 2.76e9 ns
L_CNTGET ksl_get_shared_latch =        93584
L_CNTFRE               kslfre =        93584
L_AVG              average_ns =        29496
L_QUANT          quantize_ns  =
           value  ------------- Distribution ------------- count
             512 |                                         0
            1024 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@             64970
            2048 |@@@@@@@@@                                21078
            4096 |@@                                       5382
            8192 |                                         459
           16384 |                                         680
           32768 |                                         483
           65536 |                                         302
          131072 |                                         114
          262144 |                                         26
          524288 |                                         26
         1048576 |                                         11
         2097152 |                                         9
         4194304 |                                         17
         8388608 |                                         10
        16777216 |                                         4
        33554432 |                                         4
        67108864 |                                         4
       134217728 |                                         3
       268435456 |                                         2
       536870912 |                                         0

------------------ Mutex Stats ------------------
ALL_MUTEX_GETs      kglGetMutex =       258710
ALL_MUTEX_FREs      kglReleaseMutex =       343015

M_GET_CNT  = 214951
M_TOTAL_NS = 169358324
M_CNTGET          kglGetMutex,   21 (Mode) =            1
M_CNTGET          kglGetMutex,   75 (Mode) =            1
M_CNTGET          kglGetMutex,  157 (Mode) =            1
M_CNTGET          kglGetMutex,    6 (Mode) =            2
M_CNTGET          kglGetMutex,   28 (Mode) =            2
M_CNTGET          kglGetMutex,  106 (Mode) =            2
M_CNTGET          kglGetMutex,   85 (Mode) =        42475
M_CNTGET          kglGetMutex,    1 (Mode) =        43116
M_CNTGET          kglGetMutex,    4 (Mode) =        43117
M_CNTGET          kglGetMutex,   90 (Mode) =        43117
M_CNTGET          kglGetMutex,   95 (Mode) =        43117
M_CNTFRE      kglReleaseMutex =       214951
M_AVG              average_ns =        40750
M_QUANT           quantize_ns =
           value  ------------- Distribution ------------- count
            1024 |                                         0
            2048 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@          166007
            4096 |@@@@@@@@                                 40643
            8192 |@                                        2751
           16384 |                                         1731
           32768 |                                         1890
           65536 |                                         891
          131072 |                                         405
          262144 |                                         211
          524288 |                                         158
         1048576 |                                         73
         2097152 |                                         31
         4194304 |                                         59
         8388608 |                                         47
        16777216 |                                         17
        33554432 |                                         18
        67108864 |                                         7
       134217728 |                                         5
       268435456 |                                         5
       536870912 |                                         2
      1073741824 |                                         0


6. Dtrace Script: Latch and Mutex Stats



--------- Latch and Mutex (consecutive Mutex Gets and Releases are excluded from the timing stats) ---------

sudo dtrace -n \
'
BEGIN{
   START_TS = timestamp;
   LATCHCHD_ADDR = 0XD371C240;
   L_GET_CNT  = 0; L_TOTAL_NS = 0;
   KGLHDMTX_ADDR = 0X99CF9A20;
   M_GET_CNT  = 0; M_TOTAL_NS = 0;
   m_get = 0; m_rel = 1;       /* exclude Mutex consecutive Gets and Release */
   printf("\nBEGIN at %Y ", walltimestamp);}

 pid$target:oracle:ksl_get_shared_latch:entry /arg0 == LATCHCHD_ADDR/  
  {self->L_ts = timestamp; L_GET_CNT = L_GET_CNT + 1;
   @L_CNTGET[probefunc] = count()}
  
 pid$target:oracle:kslfre:entry /arg0 == LATCHCHD_ADDR/  
  {L_TOTAL_NS = L_TOTAL_NS + (timestamp - self->L_ts);
   @L_AVG["average_ns"]    = avg(timestamp - self->L_ts);
   @L_QUANT["quantize_ns"] = quantize(timestamp - self->L_ts);
   @L_CNTFRE[probefunc]    = count()} 

 pid$target:oracle:kglGetMutex:entry /arg1 == KGLHDMTX_ADDR/
  {@M_ALL_CNTGET[probefunc] = count();} 

 pid$target:oracle:kglReleaseMutex:entry /arg1 == KGLHDMTX_ADDR/    
  {@M_ALL_CNTFRE[probefunc] = count();}
  
 pid$target:oracle:kglGetMutex:entry /arg1 == KGLHDMTX_ADDR && m_get == 0 && m_rel == 1/
  {self->M_ts = timestamp; M_GET_CNT = M_GET_CNT + 1;
   @M_CNTGET[probefunc, arg4]        = count();
   m_get = 1; m_rel = 0}
  
 pid$target:oracle:kglReleaseMutex:entry /arg1 == KGLHDMTX_ADDR && m_get == 1 && m_rel == 0/    
  {M_TOTAL_NS = M_TOTAL_NS + (timestamp - self->M_ts); 
   @M_CNTFRE[probefunc]    = count();
   @M_AVG["average_ns"]    = avg(timestamp - self->M_ts);
   @M_QUANT["quantize_ns"] = quantize(timestamp - self->M_ts);
   m_get = 0; m_rel = 1} 
   
END{printf("\nEND at %Y\n", walltimestamp);
    printf("\nELAPSED_Seconds = %i",     (timestamp - START_TS)/1000/1000/1000);
    printf("\n------------------ Latch Child Stats ------------------");
    printf("\nL_GET_CNT  = %i",         L_GET_CNT);
    printf("\nL_TOTAL_NS = %i",         L_TOTAL_NS);
    /* printf("\nL_AVG_GET_NS = %i",       L_TOTAL_NS/L_GET_CNT);  same as L_AVG */
    printa("\nL_CNTGET %20s = %12@d",   @L_CNTGET);
    printa("\nL_CNTFRE %20s = %12@d",   @L_CNTFRE);
    printa("\nL_AVG    %20s = %12@d",       @L_AVG);
    printa("\nL_QUANT %20s  = %12@d",   @L_QUANT);
    
    printf("\n------------------ Mutex Stats ------------------");
    printa("\nALL_MUTEX_GETs %20s = %12@d",       @M_ALL_CNTGET);
    printa("\nALL_MUTEX_FREs %20s = %12@d\n",     @M_ALL_CNTFRE);
    printf("\nM_GET_CNT  = %i",                   M_GET_CNT);
    printf("\nM_TOTAL_NS = %i",                   M_TOTAL_NS);
    /* printf("\nM_AVG_GET_NS = %i",                 M_TOTAL_NS/M_GET_CNT);  same as M_AVG */
    printa("\nM_CNTGET %20s, %4d (Mode) = %12@d", @M_CNTGET);
    printa("\nM_CNTFRE %20s = %12@d",             @M_CNTFRE);
    printa("\nM_AVG    %20s = %12@d",                @M_AVG);
    printa("\nM_QUANT  %20s = %12@d",             @M_QUANT);}
' -p 7476


-- Alternatively, replace "-p 7476" by "-p $1", save the script as a ksh file, and call it with the Pid:
--   ksh ./aq_latch_mutex_dtrace 7476