Wednesday, May 12, 2021

Oracle 19c new shared pool "SO private sga" and "SO private so latch" Performance Impacts

Oracle 19c introduced new shared pool component "SO private sga" and latch "SO private so latch". In this Blog, we will make tests to reveal their behaviours and performance impacts.

Normally Oracle can have up to 7 subpools in shared pool. However shared pool dump shows that "SO private sga" is only allocated to one single subpool. In such case, "SO private sga" can be one of the TOP 5 memory components, and hence high memory pressure on that particular subpool. Additionally "alter system flush shared_pool" cannot force the release of "SO private sga" memory.

The new 19c latch "SO private so latch" is positioned in LEVEL#=9, and "shared pool" latch LEVEL# is increased to 10 from 7 in 18c (latch ordering rule: Once a process acquires a latch of level x, it can only acquire another latch of level higher than x). To update "SO private sga" (kss_grow_from_global_cache), "SO private so latch" (kslgetl immediate_gets) is requested at first, which again requests library cache lock (kglLock) and mutex (kglGetMutex).

Note: Tested in Oracle 19.10

Update (08Jul2021): With test code of this Blog, Oracle delivered fix:
     Bug 32940955 : ORA-4031 DUE TO LARGE "SO PRIVATE SGA" ALLOCATION IN ONE SHARED POOL SUBPOOL


1. Test Setup


At first, we elaborate a standalone test, which can touch both "SO private sga" and "SO private so latch" for each new execution in a new Oracle connection.

create or replace package so_private_pkg as
  s_ts      timestamp with time zone;
  procedure proc1(p_cnt number);
end;
/

create or replace package body so_private_pkg as
  procedure proc1(p_cnt number) as
  begin
    s_ts  := systimestamp;   -- key to kss_grow_from_global_cache 
    dbms_session.set_nls('nls_territory',           'AMERICA');
    dbms_session.set_nls('nls_language',            'AMERICAN');
    dbms_session.set_nls('nls_sort',                'BINARY');
    dbms_session.set_nls('nls_numeric_characters',  '''.,''');
    commit;
    dbms_session.set_nls('nls_timestamp_format','''YYYY-MM-DD"T"HH24:MI:SS''');
  end;
end;
/

create or replace procedure K.start_job_so(p_count number) as
begin
  for i in 1..p_count loop
    dbms_scheduler.create_job (
      job_name        => 'TEST_JOB_SO_'||i,
      job_type        => 'PLSQL_BLOCK',
      job_action      =>
        'begin
           dbms_lock.sleep(29);
           so_private_pkg.proc1('||i||');
           --test_proc_dynamic(100, 1000);
           dbms_lock.sleep(1);
        end;',
      start_date      => systimestamp,
      repeat_interval => 'systimestamp',
      auto_drop       => true,
      enabled         => true);
  end loop;
end;
/


2. "SO private sga" and "SO private so latch"


Login to a 19c DB, we can see the new shared pool component "SO private sga":

select * from v$sgastat where name= 'SO private sga';

POOL           NAME                  BYTES   CON_ID
-------------- ---------------- ---------- --------
shared pool    SO private sga     40027648        0
If we make a shared pool dump, we can see "SO private sga" is only allocated to one single subpool (2) as TOP memory Subheap.

5 LARGEST SUB HEAPS for heap name="sga heap(2,0)"   desc=0x6015dc18
  Subheap ds=0x60006c98  heap name=  SO private sga  size=        40029040
   owner=(nil)  latch=(nil)
  Subheap ds=0x6000a060  heap name=  KSFD SGA I/O b  size=         4190424
   owner=(nil)  latch=(nil)
  Subheap ds=0x729ff880  heap name=   SQLA^15b219ae  size=         1242144
   owner=0x729ff730  latch=(nil)
  Subheap ds=0x9fd7ae80  heap name=   SQLA^9bd0be72  size=         1046128
   owner=0x9fd7ad30  latch=(nil)
  Subheap ds=0x8d9ebcf8  heap name=  PLMCD^e38f5ee0  size=         1038512
   owner=0x8dbbf2e0  latch=(nil)
Make a new connection and run test below to show latch usage:
(Note: For each new Connection, DO NOT RUN any script like glogin.sql (Site Profile file), login.sql (User Profile))

$ sqlplus /nolog

SQL> conn k/s@db19c
Connected.

SQL> select name, sum(gets), sum(immediate_gets) from v$latch_children where name in('shared pool', 'SO private so latch') group by name;

NAME                  SUM(GETS) SUM(IMMEDIATE_GETS)
-------------------- ---------- -------------------
shared pool            68985540               15463
SO private so latch     2793045             2808678

SQL> exec so_private_pkg.proc1(1);

PL/SQL procedure successfully completed.

SQL> select name, sum(gets), sum(immediate_gets) from v$latch_children where name in('shared pool', 'SO private so latch') group by name;

NAME                  SUM(GETS) SUM(IMMEDIATE_GETS)
-------------------- ---------- -------------------
shared pool            68985562               15463
SO private so latch     2793045             2808679
We can see "shared pool" SUM(GETS) increased 22 (68985562-68985540).
"SO private so latch" SUM(IMMEDIATE_GETS) increased 1 (2808679-2808678).

By the way, referring to x$ksmsp, we can see that Oracle shared pool memory is organized in 5 levels:

   shared_pool -> subpool (ksmchidx) -> component (ksmchcom) -> area (ksmchpar) -> chunk (ksmchptr)

select * from x$ksmsp where ksmchcom in ('SO private sga') and rownum <= 2;
 
  ADDR         INDX INST_ID CON_ID KSMCHIDX KSMCHDUR KSMCHCOM       KSMCHPTR KSMCHSIZ KSMCHCLS KSMCHTYP KSMCHPAR
  ------------ ---- ------- ------ -------- -------- -------------- -------- -------- -------- -------- --------
  7FBF49B7AF38    4       1      0        3        1 SO private sga 9FEFCBE8  1048536 recr         4095 9E29C270
  7FBF49B65BE8  837       1      0        3        1 SO private sga 9BFBDDA8   253976 freeabl         0 9E29C270
Following two queries list all memory components which are only allocated into one single subpool.

---- x$ksmsp lists each memomy chunk (ksmchptr, minimum unit) in each area (ksmchpar) for each component (ksmchcom) in subpool (ksmchidx)
---- x$ksmsp does not contain reserved extents
 
select ksmchcom, count(cnt) subpool_cnt, sum(siz) subpool_size
from (select ksmchidx, ksmchcom, count(*) cnt, sum(ksmchsiz) siz from x$ksmsp group by ksmchidx, ksmchcom)
  --where ksmchcom in ('SO private sga')
group by ksmchcom
having count(cnt) = 1
order by subpool_size desc; 
 
 
---- x$ksmss (v$sgastat) is about stats of SGA component (ksmssnam) in each subpool (ksmdsidx)
---- ksmdsidx = 0 is for reserved extents
 
select ksmssnam, count(cnt) subpool_cnt, sum(siz) subpool_siz 
  from (select ksmdsidx, ksmssnam, count(*) cnt, sum(ksmsslen) siz from x$ksmss group by ksmdsidx, ksmssnam)
  --where ksmssnam in ('SO private sga')
group by ksmssnam
having count(cnt) = 1
order by subpool_siz desc; 


3. SO private Activity Tracing


At frist, we get latch address, and then we compose one gdb script with those address (see Appendix "gdb_latch_script_3.txt")
(Note: the test DB is set with "_kghdsidx_count"=3 to create 3 subpools in shared pool)

select addr, latch#, child#, level#, name, gets, immediate_gets 
  from v$latch_children where name in ('shared pool', 'SO private so latch') and child# <=3 order by name, child#;

ADDR         LATCH#     CHILD#     LEVEL#  NAME                       GETS IMMEDIATE_GETS
-------- ---------- ---------- ----------  -------------------- ---------- --------------
B626A638         42          1          9  SO private so latch      931312         936136
B626A6F0         42          2          9  SO private so latch      930302         935888
B626A7A8         42          3          9  SO private so latch      931444         936680
60560A38        619          1         10  shared pool            22816547           5894
60560AD8        619          2         10  shared pool            22630853           5199
60560B78        619          3         10  shared pool            23540102           4370
Each time when making a test, we start a new connection:

SQL> conn k/s@db19c
Connected.
Get its UNIX process id: 123

Start tracing with the composed script:

gdb -x gdb_latch_script_3.txt -p 123
Run the test:

SQL> exec so_private_pkg.proc1(1);
Here the tracing log:

===== Library Cache Lock (1) <<< kgllkhdl: 879D79B0, kgllkmod 1, kglnaobj: BEGIN so_private_pkg.proc1(1); END;
===== Library Cache Lock (2) <<< kgllkhdl: 879DDD28, kgllkmod 1, kglnaobj: >>>=====
===== Library Cache Lock (3) <<< kgllkhdl: 6E851DD0, kgllkmod 1, kglnaobj: SO_PRIVATE_PKGK>>>=====
===== Library Cache Lock (4) <<< kgllkhdl: 707490A8, kgllkmod 1, kglnaobj: SO_PRIVATE_PKGK>>>=====
===== Library Cache Lock (5) <<< kgllkhdl: A2EA2A50, kgllkmod 1, kglnaobj: STANDARDSYS>>>=====
===== Library Cache Lock (6) <<< kgllkhdl: A5B6FA50, kgllkmod 1, kglnaobj: STANDARDSYS>>>=====
===== Library Cache Lock (7) <<< kgllkhdl: 87E9EC90, kgllkmod 1, kglnaobj: DBMS_SESSIONPUBLIC>>>=====

Breakpoint 9, 12d7f230 in kglGetMutex ()
=====--- kglGetMutex (30) ---> Mutex addr (rsi): A127C888, Location(r8d): 106
#0  12d7f230 in kglGetMutex ()
#1  12d74ea2 in kglhdgn ()

Breakpoint 10, 12da4830 in kgxExclusive ()
=====----- kgxExclusive (20) ---> Mutex addr (rsi): A127C888

Breakpoint 8, 12d76c90 in kgllkal ()
===== Library Cache Lock (8) <<< kgllkhdl: A127C738, kgllkmod 1, kglnaobj: DBMS_SESSIONSYS>>>=====
#0  12d76c90 in kgllkal ()
#1  12d72627 in kglLock ()

Breakpoint 1, 125c9ba0 in kslgetl ()
===== kslgetl shared latch (1) <<< Addr(rdi): 60560A38, Imget: 1, Why: 0, Where: 6293 >>>=====
#0  125c9ba0 in kslgetl ()
#1  1259da43 in ksfglt ()
#2  12d36f13 in kghalo ()
#3  12d33c24 in kghgex ()
#4  12d38442 in kghfnd ()
#5  12d36ae2 in kghalo ()
#6  12d7e6f8 in kglGetSO ()
#7  12d76dbf in kgllkal ()

Breakpoint 2, 125cf6c0 in kslfre ()
===== kslfre shared latch (1) <<< Addr(rdi): 60560A38 >>>=====
#0  125cf6c0 in kslfre ()
#1  1259dded in ksfflt ()

Breakpoint 11, 12da5850 in kgxRelease ()
=====----- kgxRelease (20) ---> Mutex addr (r15): A127C888 
$20 = {0, 909, 7105784, 253203}

......

===== Library Cache Lock (9) <<< kgllkhdl: 9EF37618, kgllkmod 1, kglnaobj: DBMS_SESSIONSYS>>>=====
===== Library Cache Lock (10) <<< kgllkhdl: 9EF30B90, kgllkmod 1, kglnaobj: alter session set nls_territory = AMERICA>>>=====
===== Library Cache Lock (11) <<< kgllkhdl: 9FAF9F50, kgllkmod 1, kglnaobj: alter session set nls_language = AMERICAN>>>=====
===== Library Cache Lock (12) <<< kgllkhdl: 9FAF8A28, kgllkmod 1, kglnaobj: alter session set nls_sort = BINARY>>>=====
===== Library Cache Lock (13) <<< kgllkhdl: A069FC50, kgllkmod 1, kglnaobj: alter session set nls_numeric_characters = '.,'>>>=====

......

Breakpoint 9, 12d7f230 in kglGetMutex ()
=====--- kglGetMutex (51) ---> Mutex addr (rsi): A1CA99D0, Location(r8d): 57
#0  12d7f230 in kglGetMutex ()
#1  04c9de7c in kglLockCursor ()

Breakpoint 10, 12da4830 in kgxExclusive ()
=====----- kgxExclusive (37) ---> Mutex addr (rsi): A1CA99D0

Breakpoint 8, 12d76c90 in kgllkal ()
===== Library Cache Lock (14) <<< kgllkhdl: A1CA9880, kgllkmod 1, kglnaobj: COMMIT>>>=====
#0  12d76c90 in kgllkal ()
#1  04c9df43 in kglLockCursor ()

Breakpoint 7, 012a0700 in kss_grow_from_global_cache ()
===== kss_grow_from_global_cache (1) <<  (r14): B7F27168, kss private so Chunk Addr (r14-4112): B7F26158 >>>=====
#0  012a0700 in kss_grow_from_global_cache ()
#1  1260c5ca in kss_add_child ()

Breakpoint 3, 125c9ba0 in kslgetl ()
===== kslgetl so private (1) <<< Addr(rdi): B626A6F0, Imget: 0, Why: 0, Where: 290 >>>=====
#0  125c9ba0 in kslgetl ()
#1  012a0844 in kss_grow_from_global_cache ()
#2  1260c5ca in kss_add_child ()
#3  12d7e4a5 in kglGetSO ()
#4  12d76dbf in kgllkal ()
#5  04c9df43 in kglLockCursor ()
#6  035b5e3f in kkspbd0 ()
#7  12a3bc9a in kksParseCursor ()

Breakpoint 4, 125cf6c0 in kslfre ()
===== kslfre so private (1) <<< Addr(rdi): B626A6F0 >>>=====
#0  125cf6c0 in kslfre ()
#1  012a088b in kss_grow_from_global_cache ()

Breakpoint 11, 12da5850 in kgxRelease ()
=====----- kgxRelease (37) ---> Mutex addr (r15): A1CA99D0 
$37 = {0, 909, 1253713, 34977169}

......

Breakpoint 8, 12d76c90 in kgllkal ()
===== Library Cache Lock (15) <<< kgllkhdl: 95589568, kgllkmod 1, kglnaobj: alter session set nls_timestamp_format = 'YYYY-MM-DD"T"HH24:MI:SS'>>>=====
#0  12d76c90 in kgllkal ()
#1  12d72627 in kglLock ()
The above output shows that latch get/free is fully contained within mutex get/release:

(1). kgxExclusive mutex get   A127C888
            kslgetl latch get   60560A38 (Imget: 1 for non IMMEDIATE_GETS)
            kslfre  latch free  60560A38
     kgxRelease mutex release A127C888 

(2). kgxExclusive mutex get   A1CA99D0
            kslgetl latch get   B626A6F0 (Imget: 0 for IMMEDIATE_GETS)
            kslfre  latch free  B626A6F0
     kgxRelease mutex release A1CA99D0 
The first latch get 60560A38 is triggered by kghalo to allocate shared pool memory.
Between kgxExclusive (mutex get) and kslgetl (latch get), there is one kgllkal (Library Cache Lock Allocate).
So the call sequence to allocate shared pool memory is kgxExclusive -> kgllkal -> kslgetl.

The second latch get B626A6F0 is triggered by kss_grow_from_global_cache to update (or insert) "SO private sga" memory component.

With following query, we can find the touched memory chunk for kss_grow_from_global_cache (r14): B7F27168 in "SO private sga".

select v.*, to_number('B7F27168', 'xxxxxxxx') - to_number(ksmchptr, 'xxxxxxxxxxxxxxxx') offset
  from x$ksmsp v 
 where ksmchcom in ('SO private sga') 
   and to_number('B7F27168', 'xxxxxxxx') between to_number(ksmchptr, 'xxxxxxxxxxxxxxxx') 
   and to_number(ksmchptr, 'xxxxxxxxxxxxxxxx') + ksmchsiz-1;
   
ADDR               INDX INST_ID  CON_ID KSMCHIDX   KSMCHDUR KSMCHCOM        KSMCHPTR         KSMCHSIZ KSMCHCLS KSMCHTYP KSMCHPAR         OFFSET
---------------- ------ ------- ------- -------- ---------- -------------- ----------------- -------- -------- -------- ---------------- -------
00007EFEA3952810 117341       1       0        2          1 SO private sga  00000000B7EF4878  1048536 recr     4095     0000000060006C98  207088
The trace log also contains Library Cache Locks/Pins (kglLock/kglpin), Mutex Gets/Releases (kgxExclusive and kgxRelease). For details, see Blog: Oracle PLITBLM "library cache: mutex X".


4. "latch: shared pool" and "library cache: mutex X" Blocking Test


Based on above tracing output, we can make two blocking wait tests.
One is with kslfre to block "latch: shared pool" get,
another is with kss_grow_from_global_cache to block "SO private sga" update.


4.1 kslfre Blocking Test


In this blocking test, we will show two wait events: "latch: shared pool", "library cache: mutex X", and then look their respective code path.

First we start 4 Job sessions:

exec start_job_so(4);
Then make a new DB connection.

SQL> conn k/s@db19c
Connected.
Get its UNIX process id: 9558 (Oracle session id: 909)

Start tracing and set a breakpoint

gdb -p 9558

break kslfre if $rdi==0x60560A38 || $rdi==0x60560AD8 || $rdi==0x60560B78
Run the test:

SQL> exec so_private_pkg.proc1(1);
Resume process running in gdb. After a few seconds, we reached the breakpoint, and display call stack.

(gdb) c
Continuing.

Breakpoint 1, 0x125cf6c0 in kslfre ()
(gdb) bt 16
#0  0x125cf6c0 in kslfre ()
#1  0x1259dded in ksfflt ()
#2  0x12d3605c in kghalo ()
#3  0x12d33c24 in kghgex ()
#4  0x12d38442 in kghfnd ()
#5  0x12d36ae2 in kghalo ()
#6  0x12d7e6f8 in kglGetSO ()
#7  0x12d76dbf in kgllkal ()
#8  0x12d72627 in kglLock ()
#9  0x12d6d4b5 in kglget ()
#10 0x04c86f19 in kglgob ()
#11 0x04c878bd in kglgob ()
#12 0x12d9b0c0 in kgiind ()
#13 0x056d98ec in pfri8_inst_spec ()
#14 0x056d9714 in pfri1_inst_spec ()
#15 0x12daed50 in pfrrun ()
We can see that all Job sessions are blocked with "latch: shared pool" or "library cache: mutex X" by session 909. So one session can cause two different wait events in the blocked sessions. (Note: we started 4 Job sessions, but v$session shows 6 due to dbms_scheduler job delayed start/stop cleanup).

v$mutex_sleep_history shows mutex sleeping stats by BLOCKING_SESSION 909.

v$latchholder shows that SID 909 is holding shared pool CHILD# 1 latch 60560A38.

SQL> select program, event, sid, serial#, p1, p2raw, p3raw, final_blocking_session
    from v$session
    where lower(program) like '%sql%' or lower(program) like '%j0%'
    order by program;

PROGRAM                        EVENT                      SID    SERIAL#  P1         P2RAW            P3RAW            FINAL_BLOCKING_SESSION
------------------------------ ------------------------- ------ --------- ---------- ---------------- ---------------- ----------------------
oracle@db19c (J000)         latch: shared pool            1011    58459   1616251448 000000000000026B 0000000093FD6BD0                    909
oracle@db19c (J001)         library cache: mutex X         426    51552   3168695887 0000038D00000000 0000130A0001006A                    909
oracle@db19c (J002)         library cache: mutex X         372     7360   3168695887 0000038D00000000 0000130A0001006A                    909
oracle@db19c (J003)         latch: shared pool             122    30477   1616251448 000000000000026B 000000008C6E1038                    909
oracle@db19c (J004)         library cache: mutex X         277     6387   3168695887 0000038D00000000 0000130A0001006A                    909
oracle@db19c (J005)         library cache: mutex X         408    13814   3168695887 0000038D00000000 0000130A0001006A                    909
sqlplus@db19c (TNS V1-V3)   SQL*Net message from client    909    33954   1413697536 0000000000000001 00
        
SQL> select mutex_identifier, sleep_timestamp, mutex_type, gets, sleeps, requesting_session, blocking_session, mutex_value, p1raw, location
     from v$mutex_sleep_history
     where sleep_timestamp > sysdate -2/1440 order by sleep_timestamp desc;

MUTEX_IDENTIFIER SLEEP_TIMESTAMP MUTEX_TYPE         GETS  SLEEPS REQUESTING_SESSION BLOCKING_SESSION MUTEX_VALUE      P1RAW            LOCATION
---------------- --------------- --------------- ------- ------- ------------------ ---------------- ---------------- ---------------- ------------
      3168695887 01:10:44        Library Cache   7108586   41476                277              909 0000038D00000000 00000000A127C738 kglhdgn2 106
      3168695887 01:10:44        Library Cache   7108586   41457                426              909 0000038D00000000 00000000A127C738 kglhdgn2 106
      3168695887 01:10:44        Library Cache   7108586   41470                408              909 0000038D00000000 00000000A127C738 kglhdgn2 106
      3168695887 01:10:44        Library Cache   7108586   41447                372              909 0000038D00000000 00000000A127C738 kglhdgn2 106
              15 01:10:38        Row Cache         46638       1                 76              360 0000016800000000 00               [19] kqrpre

  -- 38D in MUTEX_VALUE and P2RAW are blocking session id: 909 (=0x38D).

SQL> select * from v$latchholder;

 PID   SID LADDR     NAME                            GETS  CON_ID
---- ----- --------- --------------------------- -------- -------
  35   909 60560A38  shared pool                 22844512       0
  47  1011 6005F500  parameter table management  17143486       0
For "library cache: mutex X", query below finds the library object (P1RAW and P1 in above query output). ADDR is its object_handle (see Blog: Oracle PLITBLM "library cache: mutex X").

SQL > select hash_value, addr, owner, name, namespace, type from v$db_object_cache where hash_value in (3168695887);

HASH_VALUE ADDR             OWNER  NAME         NAMESPACE            TYPE
---------- ---------------- ------ ------------ -------------------- ----------
3168695887 00000000A127C738 SYS    DBMS_SESSION TABLE/PROCEDURE      PACKAGE
Now we can have a further look of two different wait events.


4.1.1 "library cache: mutex X" Wait


Above gdb trace shows that shared pool latch Get/Free is fully contained within mutex Get/Release. If the blocked session comes in the same code path, it is blocked by "library cache: mutex X" because it first invokes kglGetMutex (kgxExclusive) to get the same mutex.

Here is what diag LWS db19c_dia0_30593_lws_1.trc showed for Session ID 277. The Short stack dump shows kglGetMutex and kgxExclusive calls.

*** 2021-05-12T01:04:22.093264+02:00
HM: Early Warning - Session ID 277 serial# 6387 OS PID 5888 (J004)
     is waiting on 'library cache: mutex X' for 32 seconds, wait id 62
     p1: 'idn'=0xbcde764f, p2: 'value'=0x38d00000000, p3: 'where'=0x130a0001006a
    Final Blocker is Session ID 909 serial# 33954 on instance 1
     which is 'not in a wait' for 60 seconds
 
 Total  Self-         Total  Total  Outlr  Outlr  Outlr           
  Hung  Rslvd  Rslvd   Wait WaitTm   Wait WaitTm   Wait           
  Sess  Hangs  Hangs  Count   Secs  Count   Secs  Count Wait Event
------ ------ ------ ------ ------ ------ ------ ------ -----------
  2632      0      0 812304 7729347  12122 7695264      0 library cache: mutex X
 
HM: Dumping Short Stack of pid[61.5888] (sid:277, ser#:6387)
Short stack dump: 
ksedsts()+426<-ksdxfstk()+58<-ksdxcb()+872<-sspuser()+200<-__sighandler()<-semtimedop()+10<-skgpwwait()+187
<-ksliwat()+2224<-kslwaitctx()+188<-kgxWait()+1304<-kgxExclusive()+712<-kglGetMutex()+151<-kglhdgn()+898
<-kglLock()+562<-kglget()+293<-kglgob()+281<-kglgob()+2749<-kgiind()+4256<-pfri8_inst_spec()+140<-pfri1_inst_spec()+68
<-pfrrun()+544<-plsql_run()+752<-peicnt()+279<-kkxexe()+720<-opiexe()+25325<-kpoal8()+2387<-opiodr()+1202
<-kpoodr()+689<-upirtrc()+2760<-kpurcsc()+100<-kpuexec()+10994<-OCIStmtExecute()+41<-jslvec_execcb()+2537
<-jslvswu()+409<-jslvCDBSwitchUsr()+672<-jslve_execute0()+6761<-jslve_execute()+1529<-jslve_cdb_execute()+112
<-rpiswu2()+2004<-kkjex1e_cdb()+222<-kkjsexe()+2333<-kkjrdp()+1588<-opirip()+889<-opidrv()+581<-sou2o()+165
<-opimai_real()+173<-ssthrdmain()+417<-main()+256<-__libc_start_main()+245
Here oradebug short_stack for 'library cache: mutex X' wait:

SQL> oradebug setorapid 61  
Oracle pid: 61, Unix process pid: 5888, image: oracle@db19c (J004)

SQL> oradebug short_stack
ksedsts()+426<-ksdxfstk()+58<-ksdxcb()+872<-sspuser()+200<-__sighandler()<-semtimedop()+10<-skgpwwait()+187<-ksliwat()+2224
<-kslwaitctx()+188<-kgxWait()+1304<-kgxExclusive()+712<-kglGetMutex()+151<-kglhdgn()+898<-kglLock()+562<-kglget()+293<-kglgob()+281
<-kglgob()+2749<-kgiind()+4256<-pfri8_inst_spec()+140<-pfri1_inst_spec()+68<-pfrrun()+544<-plsql_run()+752<-peicnt()+279
<-kkxexe()+720<-opiexe()+25325<-kpoal8()+2387<-opiodr()+1202<-kpoodr()+689<-upirtrc()+2760<-kpurcsc()+100<-kpuexec()+10994
<-OCIStmtExecute()+41<-jslvec_execcb()+2537<-jslvswu()+409<-jslvCDBSwitchUsr()+672<-jslve_execute0()+6761<-jslve_execute()+1529
<-jslve_cdb_execute()+112<-rpiswu2()+2004<-kkjex1e_cdb()+222<-kkjsexe()+2333<-kkjrdp()+1588<-opirip()+889<-opidrv()+581
<-sou2o()+165<-opimai_real()+173<-ssthrdmain()+417<-main()+256<-__libc_start_main()+245


4.1.2 "latch: shared pool" Wait


There is also other code path, which directly requests 'latch: shared pool' and does not require mutex get. Here is what diag LWS db19c_dia0_30593_lws_1.trc showed for Session ID 122. The Short stack dump does not have kglGetMutex and kgxExclusive calls.

*** 2021-05-12T01:04:43.183727+02:00
HM: Early Warning - Session ID 122 serial# 30477 OS PID 5785 (J003)
     is waiting on 'latch: shared pool' for 52 seconds, wait id 61
     p1: 'address'=0x60560a38, p2: 'number'=0x26b, p3: 'why'=0x8c6e1038
    Final Blocker is Session ID 909 serial# 33954 on instance 1
     which is 'not in a wait' for 61 seconds 
                                                     IO           
 Total  Self-         Total  Total  Outlr  Outlr  Outlr           
  Hung  Rslvd  Rslvd   Wait WaitTm   Wait WaitTm   Wait           
  Sess  Hangs  Hangs  Count   Secs  Count   Secs  Count Wait Event
------ ------ ------ ------ ------ ------ ------ ------ -----------
    20      0      0  36534   9032     19   8544      0 latch: shared pool
 
HM: Dumping Short Stack of pid[60.5785] (sid:122, ser#:30477)
Short stack dump: 
ksedsts()+426<-ksdxfstk()+58<-ksdxcb()+872<-sspuser()+200<-__sighandler()<-semop()+7<-skgpwwait()+187<-kslges()+1534
<-kslgetl()+2489<-ksfglt()+163<-kghfre()+3985<-ksuxds()+1061<-kss_del_cb()+218<-kssdel()+216<-ksudel_int()+280
<-ksudel()+68<-kkjrdp()+2207<-opirip()+889<-opidrv()+581<-sou2o()+165<-opimai_real()+173<-ssthrdmain()+417
<-main()+256<-__libc_start_main()+245
Here oradebug short_stack for 'latch: shared pool' wait:

SQL> oradebug setorapid 60 
Oracle pid: 60, Unix process pid: 5785, image: oracle@db19c (J003)
SQL> oradebug short_stack
ksedsts()+426<-ksdxfstk()+58<-ksdxcb()+872<-sspuser()+200<-__sighandler()<-semop()+7<-skgpwwait()+187<-kslges()+1534<-kslgetl()+2489
<-ksfglt()+163<-kghfre()+3985<-ksp_param_handle_free()+779<-kspdesc()+142<-ksmugf()+208<-ksuxds()+3727<-kss_del_cb()+218
<-kssdel()+216<-ksudel_int()+280<-ksudel()+68<-kkjrdp()+2207<-opirip()+889<-opidrv()+581<-sou2o()+165<-opimai_real()+173
<-ssthrdmain()+417<-main()+256<-__libc_start_main()+245
In fact, if we use the same gdb script to trace one Job session, we can see the occurrence of kslgetl / kslfre on 60560A38 which is not contained within kglGetMutex kgxExclusive / kgxRelease.

Breakpoint 2, 0x125cf6c0 in kslfre ()
===== kslfre shared latch (14) <<< Addr(rdi): 60560B78 >>>=====
#0  0x125cf6c0 in kslfre ()
#1  0x1259dded in ksfflt ()

Breakpoint 1, 0x125c9ba0 in kslgetl ()
===== kslgetl shared latch (15) <<< Addr(rdi): 60560A38, Imget: 1, Why: 7165CBE8, Where: 6298 >>>=====
#0  0x125c9ba0 in kslgetl ()
#1  0x1259da43 in ksfglt ()
#2  0x12d2c341 in kghfre ()
#3  0x011380eb in ksp_param_handle_free ()
#4  0x01137d7e in kspdesc ()
#5  0x010f7990 in ksmugf ()
#6  0x1260e4cf in ksuxds ()
#7  0x1260503a in kss_del_cb ()

Breakpoint 2, 0x125cf6c0 in kslfre ()
===== kslfre shared latch (15) <<< Addr(rdi): 60560A38 >>>=====
#0  0x125cf6c0 in kslfre ()
#1  0x1259dded in ksfflt ()

Breakpoint 1, 0x125c9ba0 in kslgetl ()
===== kslgetl shared latch (16) <<< Addr(rdi): 60560AD8, Imget: 1, Why: 7D84F9C0, Where: 6298 >>>=====
db19c_dia0_30593_vfy_12.trc shows blocking graph for both wait events of two above sessions, which are blocked by session id: 909, and session id: 909 itself is in state 'CPU or Wait CPU' (although the fact is that session id: 909 is suspended by our manual breakpoint and in Process Status: t (stopped by debugger during trace) with %CPU=0.0, which does not consume any CPU).

*** 2021-05-12T01:05:52.275135+02:00
-------------------------------------------------------------------------------
Chain 1:
-------------------------------------------------------------------------------
    Oracle session identified by:
    {
                   os id: 5785
              process id: 60, oracle@db19c (J003)
              session id: 122
    }
    is waiting for 'latch: shared pool' with wait info:
    {
                      p1: 'address'=0x60560a38
                      p2: 'number'=0x26b
                      p3: 'why'=0x8c6e1038
            time in wait: 2 min 0 sec           
    }
    and is blocked by
 => Oracle session identified by:
    {
                   os id: 9558
              process id: 35, oracle@db19c
              session id: 909
             module name: 0 (SQL*Plusdb19c (TNS V1-V3))
    }
    which is on CPU or Wait CPU:
    {
               last wait: 2 min 30 sec ago
                blocking: 7 sessions
    }
 
Chain 1 Signature: 'CPU or Wait CPU'<='latch: shared pool'
===============================================================================
Chain 2:
-------------------------------------------------------------------------------
    Oracle session identified by:
    {
                   os id: 5888
              process id: 61, oracle@db19c (J004)
              session id: 277
    }
    is waiting for 'library cache: mutex X' with wait info:
    {
                      p1: 'idn'=0xbcde764f
                      p2: 'value'=0x38d00000000
                      p3: 'where'=0x130a0001006a
            time in wait: 2 min 1 sec
           timeout after: never
    }
    and is blocked by 'instance: 1, os id: 9558, session id: 909',
    which is a member of 'Chain 1'.


4.2 "SO private sga" kss_grow_from_global_cache Blocking Test


In this blocking test, only wait event: "library cache: mutex X" can be observed. We will also look its code path.

Make a new DB connection.

SQL> conn k/s@db19c
Connected.
Get its UNIX process id: 789 (Oracle session id: 909)

Start tracing and set a breakpoint

gdb -p 789

break kss_grow_from_global_cache
Run the test:

SQL> exec so_private_pkg.proc1(1);
Resume process running. After a few seconds, we reached the breakpoint, and display call stack.

(gdb) c
Continuing.

Breakpoint 1, 0x12a0700 in kss_grow_from_global_cache ()
(gdb) bt 21
#0  0x012a0700 in kss_grow_from_global_cache ()
#1  0x1260c5ca in kss_add_child ()
#2  0x12d7e4a5 in kglGetSO ()
#3  0x12d76dbf in kgllkal ()
#4  0x04c9df43 in kglLockCursor ()
#5  0x035b5e3f in kkspbd0 ()
#6  0x12a3bc9a in kksParseCursor ()
#7  0x12c11736 in opiosq0 ()
#8  0x129a7280 in opipls ()
#9  0x12990c52 in opiodr ()
#10 0x12aae556 in rpidrus ()
#11 0x12d501a1 in skgmstack ()
#12 0x12aae0d4 in rpidru ()
#13 0x12aad12f in rpiswu2 ()
#14 0x12aac4d2 in rpidrv ()
#15 0x12a89a83 in psddr0 ()
#16 0x12a88eb0 in psdnal ()
#17 0x12dbcfe2 in pevm_EXECC ()
#18 0x12db1a68 in pfrinstr_EXECC ()
#19 0x12db0544 in pfrrun_no_tool ()
#20 0x12daeeb6 in pfrrun ()
We can see that all Job sessions are blocked with "library cache: mutex X" by session 909. v$mutex_sleep_history shows mutex sleeping stats by BLOCKING_SESSION 909. (v$latchholder returns no rows because kslgetl is not yet invoked).

SQL> select program, event, sid, serial#, p1, p2raw, p3raw, final_blocking_session
    from v$session
    where lower(program) like '%sql%' or lower(program) like '%j0%'
    order by program;

PROGRAM                        EVENT                 SID   SERIAL#   P1        P2RAW            P3RAW            FINAL_BLOCKING_SESSION
------------------------------ -------------------- ------ --------- --------- ---------------- ---------------- ----------------------
oracle@db19c (J000)         library cache: mutex X    996     38685  255718823 0000038D00000000 0F3DF5A700000039                    909
oracle@db19c (J002)         library cache: mutex X    372     57666  255718823 0000038D00000000 0F3DF5A700000039                    909
oracle@db19c (J003)         library cache: mutex X    122     13135  255718823 0000038D00000000 0F3DF5A700000039                    909
oracle@db19c (J007)         library cache: mutex X   1011     38406  255718823 0000038D00000000 0F3DF5A700000039                    909
sqlplus@db19c (TNS V1-V3)   PGA memory operation      909     25675      65536 0000000000000001 00               

SQL> select mutex_identifier, sleep_timestamp, mutex_type, gets, sleeps, requesting_session, blocking_session, mutex_value, p1raw, location
     from v$mutex_sleep_history
     where sleep_timestamp > sysdate -2/1440 order by sleep_timestamp desc;

MUTEX_IDENTIFIER SLEEP_TIMESTAMP MUTEX_TYPE        GETS  SLEEPS REQUESTING_SESSION BLOCKING_SESSION MUTEX_VALUE      P1RAW            LOCATION
---------------- --------------- -------------  ------- ------- ------------------ ---------------- ---------------- ---------------- ---------------
       255718823 01:18:57        Library Cache  1254886    5155               1011              909 0000038D00000000 00000000A1CA9880 kgllkc1   57
       255718823 01:18:57        Library Cache  1254886    5145                122              909 0000038D00000000 00000000A1CA9880 kgllkc1   57
       255718823 01:18:57        Library Cache  1254886    5136                996              909 0000038D00000000 00000000A1CA9880 kgllkc1   57
       255718823 01:18:57        Library Cache  1254886    5141                372              909 0000038D00000000 00000000A1CA9880 kgllkc1   57
               0 01:17:36        Row Cache      4330026       1                122             1011 000003F300000000 00               [19] kqrpre

SQL> select * from v$latchholder;
  no rows selected


4.3 AWR Report


In AWR Section - Latch Sleep Breakdown, we can see stats of both "shared pool" and "SO private so latch".


Latch Sleep Breakdown

Latch Name                         Get Requests	 Misses  Sleeps Spin Gets
---------------------------------- ------------ ------- ------- ---------
cache buffers chains                 20,218,430   3,052     379     2,980
shared pool                             756,911   2,208     364     1,849
SO private so latch                      30,950      34       8	       27
kokc descriptor allocation latch            534       5       7         4
In Section - Latch Miss Sources, "SO private so latch" is displayed with correct Latch Name.

However, no Latch Name "shared pool" can be found. Probably it is re-named as "unknown". The location "Where" clearly shows that "kghalo" and "kghfre" (heap manager allocation/free). The sum of Sleeps for "unknown latch" is almost same as Sleeps (364) in above Section - Latch Sleep Breakdown.


Latch Miss Sources

Latch Name            Where                       NoWait Misses    Sleeps   Waiter Sleeps
--------------------- --------------------------- --------------  --------  -------------
SO private so latch   kss_grow_from_global_cache               0         7              0
SO private so latch   kss_shrink_private_so_list               0         1              8

unknown latch         kghalo                                   0       303            264
unknown latch         kghfre                                   0        31             78
unknown latch         kghupr1                                  0        15             12
unknown latch         kghalp                                   0         5              9
unknown latch         kgh_heap_sizes                           0         5              1
unknown latch         kghfnd: req scan                         0         2              0
Here the shared pool Child Latch Stats:

Child Latch Statistics

Latch Name    Child Num   Get Requests   Misses   Sleeps   Spin & Sleeps 1->3+
-----------  ----------  -------------  -------  -------  --------------------
shared pool           3        265,194      823      127      696/0/0/0
shared pool           2        256,577      775      149      629/0/0/0
shared pool           1        235,275      610       88      524/0/0/0
For other discussions of latch stats ("Get Requests", "Misses", "Sleeps", "Spin Gets"), see Blog: Is latch misses statistic gathered or deduced ?


4.4 "row cache mutex" Wait


diag LWS db19c_dia0_30593_lws_1.trc also shows "row cache mutex" Wait for 'cache id'=0xa (dc_users), which also involves 'latch: shared pool' ('address'=0x60560a38)

*** 2021-05-12T06:48:46.642290+02:00
HM: Early Warning - Session ID 277 serial# 6298 OS PID 7016 (J005)
     is waiting on 'row cache mutex' for 37 seconds, wait id 277
     p1: 'cache id'=0xa, p2: 'where requested'=0x13, p3: ''=0x0
    Blocked by Session ID 996 serial# 21988 on instance 1
     which is waiting on 'latch: shared pool' for 31 seconds
     p1: 'address'=0x60560a38, p2: 'number'=0x26b, p3: 'why'=0x0
    Final Blocker is Session ID 599 serial# 13104 on instance 1
     which is 'not in a wait' for 32 seconds
    Session ID 277 is blocking 2 sessions
    Blocking Session ID 765 serial# 16220 on instance 1
     which is waiting on 'row cache mutex' for 21 seconds
     p1: 'cache id'=0xa, p2: 'where requested'=0x13, p3: ''=0x0
                                                     IO           
 Total  Self-         Total  Total  Outlr  Outlr  Outlr           
  Hung  Rslvd  Rslvd   Wait WaitTm   Wait WaitTm   Wait           
  Sess  Hangs  Hangs  Count   Secs  Count   Secs  Count Wait Event
------ ------ ------ ------ ------ ------ ------ ------ -----------
    51      0      0 473776  30128    198  19008      0 row cache mutex
------------------------------------------

HM: Dumping Short Stack of pid[61.7016] (sid:277, ser#:6298)
Short stack dump: 
ksedsts()+426<-ksdxfstk()+58<-ksdxcb()+872<-sspuser()+200<-__sighandler()<-semtimedop()+10<-skgpwwait()+187<-ksliwat()+2224
<-kslwaitctx()+188<-kgxWait()+1304<-kgxExclusive()+712<-kqrGetPOMutexInt()+195<-kqrpre1()+792<-jsksGetDBObjectName()+771
<-jslvepost_exec_post()+1020<-jslvsst_session_stop()+5631<-jslve_execute0()+7742<-jslve_execute()+1529<-jslve_cdb_execute()+112
<-rpiswu2()+2004<-kkjex1e_cdb()+222<-kkjsexe()+2333<-kkjrdp()+1588<-opirip()+889<-opidrv()+581<-sou2o()+165<-opimai_real()+173
<-ssthrdmain()+417<-main()+256<-__libc_start_main()+245


5 Appendix gdb_latch_script_3.txt



set pagination off
set logging file latch_output_3.log
set logging overwrite on
set logging on
set $socg = 0
set $shcg = 0
set $socf = 0
set $shcf = 0
set $spec_lck = 0
set $spec_pin = 0
set $body_lck = 0
set $body_pin = 0
set $mutex_get = 0
set $mutex_ex = 0
set $mutex_fr = 0
set $mutex_addr = 0x0
set $body_locked = 0
set $sogrow = 0
set $i = 0

# Adjust beakpoint conditions
#   select name, listagg('$rdi==0X'||trim(leading 0 from addr), ' || ') within group (order by child#) addr
#    from v$latch_children where name in ('shared pool', 'SO private so latch') and child# <=3 group by name;
#   NAME                 ADDR
#   -------------------- ----------------------------------------------------
#   shared pool          $rdi==0X60560A38 || $rdi==0X60560AD8 || $rdi==0X60560B78
#   SO private so latch  $rdi==0xB626A638 || $rdi==0xB626A6F0 || $rdi==0xB626A7A8

# -- Usage: (1). make a trace new connection, (2). start gdb trace, (3). run a sql command, (4). stop gdb trace, (5). look trace output
# SQL> conn k/s@testdb
#      Connected.
# -- get spid (17191), gdb -x gdb_latch_script_3.txt -p 17191
# SQL> exec so_private_pkg.proc1(1);


break kslgetl if $rdi==0x60560A38 || $rdi==0x60560AD8 || $rdi==0x60560B78
commands
printf "===== kslgetl shared latch (%i) <<< Addr(rdi): %X, Imget: %i, Why: %X, Where: %i >>>=====\n", ++$shcg, $rdi, $rsi, $rdx, $rcx
backtrace 8
continue
end

break kslfre if $rdi==0x60560A38 || $rdi==0x60560AD8 || $rdi==0x60560B78
commands
printf "===== kslfre shared latch (%i) <<< Addr(rdi): %X >>>=====\n", ++$shcf, $rdi
backtrace 2
continue
end

break kslgetl if $rdi==0xB626A638 || $rdi==0xB626A6F0 || $rdi==0xB626A7A8
commands
printf "===== kslgetl so private (%i) <<< Addr(rdi): %X, Imget: %i, Why: %X, Where: %i >>>=====\n", ++$socg, $rdi, $rsi, $rdx, $rcx
backtrace 8
continue
end

break kslfre if $rdi==0xB626A638 || $rdi==0xB626A6F0 || $rdi==0xB626A7A8
commands
printf "===== kslfre so private (%i) <<< Addr(rdi): %X >>>=====\n", ++$socf, $rdi
backtrace 2
continue
end
    
break ksl_get_shared_latch if $rdi==0x60560A38 || $rdi==0x60560AD8 || $rdi==0x60560B78
commands
printf "===== ksl_get_shared_latch shared latch (%i) <<< addr(rdi): %X, Imget: %i, Why: %X, Where: %i, Mode: %X >>>=====\n", ++$i, $rdi, $rsi, $rdx, $rcx, $r8
backtrace 8
continue
end

break ksl_get_shared_latch if $rdi==0xB626A638 || $rdi==0xB626A6F0 || $rdi==0xB626A7A8
commands
printf "===== ksl_get_shared_latch SO latch (%i) <<< addr(rdi): %X, Imget: %i, Why: %X, Where: %i, Mode: %X >>>=====\n", ++$i, $rdi, $rsi, $rdx, $rcx, $r8
backtrace 8
continue
end

# Most "kss private so " has size 5136 in shared_poo dump like:   0b7f17b18 sz=     5136    cprm      "kss private so "
# Find Chunk which covers $r14. Offset 4112 to $r14 is only an example.
break kss_grow_from_global_cache
commands
printf "===== kss_grow_from_global_cache (%i) <<  (r14): %X, kss private so Chunk Addr (r14-4112): %X >>>=====\n", ++$sogrow, $r14, ($r14-4112)
backtrace 10
continue
end

break kgllkal 
#break kgllkal if $rdx==0X9FB08E08 || $rdx==0XA043B758
commands
printf "===== Library Cache Lock (%i) <<< kgllkhdl: %X, kgllkmod %x, kglnaobj: %s>>>=====\n", ++$body_lck, $rdx, $rcx, ($rdx+0x1c0)
backtrace 8 
continue
end

break kglGetMutex if $body_lck > 0 
command 
printf "=====--- kglGetMutex (%i) ---> Mutex addr (rsi): %X, Location(r8d): %d\n", ++$mutex_get, $rsi, $r8d
backtrace 4
continue
end

break kgxExclusive if $body_lck > 0 
#break kgxExclusive if $r9==0X9EC403D0 && $body_lck > 0 
command 
printf "=====----- kgxExclusive (%i) ---> Mutex addr (rsi): %X\n", ++$mutex_ex, $rsi
#backtrace 4
set $mutex_addr = $rsi  
continue
end

break kgxRelease if $r15==$mutex_addr && $body_lck > 0 
command 
printf "=====----- kgxRelease (%i) ---> Mutex addr (r15): %X \n", ++$mutex_fr, $r15 
p/d (int[4])*$r15
# x/4dw $r15     
continue
end