Normally Oracle can have up to 7 subpools in shared pool. However shared pool dump shows that "SO private sga" is only allocated to one single subpool. In such case, "SO private sga" can be one of the TOP 5 memory components, and hence high memory pressure on that particular subpool. Additionally "alter system flush shared_pool" cannot force the release of "SO private sga" memory.
The new 19c latch "SO private so latch" is positioned in LEVEL#=9, and "shared pool" latch LEVEL# is increased to 10 from 7 in 18c (latch ordering rule: Once a process acquires a latch of level x, it can only acquire another latch of level higher than x). To update "SO private sga" (kss_grow_from_global_cache), "SO private so latch" (kslgetl immediate_gets) is requested at first, which again requests library cache lock (kglLock) and mutex (kglGetMutex).
Note: Tested in Oracle 19.10
Update (08Jul2021): With test code of this Blog, Oracle delivered fix:
Bug 32940955 : ORA-4031 DUE TO LARGE "SO PRIVATE SGA" ALLOCATION IN ONE SHARED POOL SUBPOOL
1. Test Setup
At first, we elaborate a standalone test, which can touch both "SO private sga" and "SO private so latch" for each new execution in a new Oracle connection.
create or replace package so_private_pkg as
s_ts timestamp with time zone;
procedure proc1(p_cnt number);
end;
/
create or replace package body so_private_pkg as
procedure proc1(p_cnt number) as
begin
s_ts := systimestamp; -- key to kss_grow_from_global_cache
dbms_session.set_nls('nls_territory', 'AMERICA');
dbms_session.set_nls('nls_language', 'AMERICAN');
dbms_session.set_nls('nls_sort', 'BINARY');
dbms_session.set_nls('nls_numeric_characters', '''.,''');
commit;
dbms_session.set_nls('nls_timestamp_format','''YYYY-MM-DD"T"HH24:MI:SS''');
end;
end;
/
create or replace procedure K.start_job_so(p_count number) as
begin
for i in 1..p_count loop
dbms_scheduler.create_job (
job_name => 'TEST_JOB_SO_'||i,
job_type => 'PLSQL_BLOCK',
job_action =>
'begin
dbms_lock.sleep(29);
so_private_pkg.proc1('||i||');
--test_proc_dynamic(100, 1000);
dbms_lock.sleep(1);
end;',
start_date => systimestamp,
repeat_interval => 'systimestamp',
auto_drop => true,
enabled => true);
end loop;
end;
/
2. "SO private sga" and "SO private so latch"
Login to a 19c DB, we can see the new shared pool component "SO private sga":
select * from v$sgastat where name= 'SO private sga';
POOL NAME BYTES CON_ID
-------------- ---------------- ---------- --------
shared pool SO private sga 40027648 0
If we make a shared pool dump, we can see "SO private sga" is only allocated to one single subpool (2) as TOP memory Subheap.
5 LARGEST SUB HEAPS for heap name="sga heap(2,0)" desc=0x6015dc18
Subheap ds=0x60006c98 heap name= SO private sga size= 40029040
owner=(nil) latch=(nil)
Subheap ds=0x6000a060 heap name= KSFD SGA I/O b size= 4190424
owner=(nil) latch=(nil)
Subheap ds=0x729ff880 heap name= SQLA^15b219ae size= 1242144
owner=0x729ff730 latch=(nil)
Subheap ds=0x9fd7ae80 heap name= SQLA^9bd0be72 size= 1046128
owner=0x9fd7ad30 latch=(nil)
Subheap ds=0x8d9ebcf8 heap name= PLMCD^e38f5ee0 size= 1038512
owner=0x8dbbf2e0 latch=(nil)
Make a new connection and run test below to show latch usage:(Note: For each new Connection, DO NOT RUN any script like glogin.sql (Site Profile file), login.sql (User Profile))
$ sqlplus /nolog
SQL> conn k/s@db19c
Connected.
SQL> select name, sum(gets), sum(immediate_gets) from v$latch_children where name in('shared pool', 'SO private so latch') group by name;
NAME SUM(GETS) SUM(IMMEDIATE_GETS)
-------------------- ---------- -------------------
shared pool 68985540 15463
SO private so latch 2793045 2808678
SQL> exec so_private_pkg.proc1(1);
PL/SQL procedure successfully completed.
SQL> select name, sum(gets), sum(immediate_gets) from v$latch_children where name in('shared pool', 'SO private so latch') group by name;
NAME SUM(GETS) SUM(IMMEDIATE_GETS)
-------------------- ---------- -------------------
shared pool 68985562 15463
SO private so latch 2793045 2808679
We can see "shared pool" SUM(GETS) increased 22 (68985562-68985540)."SO private so latch" SUM(IMMEDIATE_GETS) increased 1 (2808679-2808678).
By the way, referring to x$ksmsp, we can see that Oracle shared pool memory is organized in 5 levels:
shared_pool -> subpool (ksmchidx) -> component (ksmchcom) -> area (ksmchpar) -> chunk (ksmchptr)
select * from x$ksmsp where ksmchcom in ('SO private sga') and rownum <= 2;
ADDR INDX INST_ID CON_ID KSMCHIDX KSMCHDUR KSMCHCOM KSMCHPTR KSMCHSIZ KSMCHCLS KSMCHTYP KSMCHPAR
------------ ---- ------- ------ -------- -------- -------------- -------- -------- -------- -------- --------
7FBF49B7AF38 4 1 0 3 1 SO private sga 9FEFCBE8 1048536 recr 4095 9E29C270
7FBF49B65BE8 837 1 0 3 1 SO private sga 9BFBDDA8 253976 freeabl 0 9E29C270
Following two queries list all memory components which are only allocated into one single subpool.
---- x$ksmsp lists each memomy chunk (ksmchptr, minimum unit) in each area (ksmchpar) for each component (ksmchcom) in subpool (ksmchidx)
---- x$ksmsp does not contain reserved extents
select ksmchcom, count(cnt) subpool_cnt, sum(siz) subpool_size
from (select ksmchidx, ksmchcom, count(*) cnt, sum(ksmchsiz) siz from x$ksmsp group by ksmchidx, ksmchcom)
--where ksmchcom in ('SO private sga')
group by ksmchcom
having count(cnt) = 1
order by subpool_size desc;
---- x$ksmss (v$sgastat) is about stats of SGA component (ksmssnam) in each subpool (ksmdsidx)
---- ksmdsidx = 0 is for reserved extents
select ksmssnam, count(cnt) subpool_cnt, sum(siz) subpool_siz
from (select ksmdsidx, ksmssnam, count(*) cnt, sum(ksmsslen) siz from x$ksmss group by ksmdsidx, ksmssnam)
--where ksmssnam in ('SO private sga')
group by ksmssnam
having count(cnt) = 1
order by subpool_siz desc;
3. SO private Activity Tracing
At frist, we get latch address, and then we compose one gdb script with those address (see Appendix "gdb_latch_script_3.txt")
(Note: the test DB is set with "_kghdsidx_count"=3 to create 3 subpools in shared pool)
select addr, latch#, child#, level#, name, gets, immediate_gets
from v$latch_children where name in ('shared pool', 'SO private so latch') and child# <=3 order by name, child#;
ADDR LATCH# CHILD# LEVEL# NAME GETS IMMEDIATE_GETS
-------- ---------- ---------- ---------- -------------------- ---------- --------------
B626A638 42 1 9 SO private so latch 931312 936136
B626A6F0 42 2 9 SO private so latch 930302 935888
B626A7A8 42 3 9 SO private so latch 931444 936680
60560A38 619 1 10 shared pool 22816547 5894
60560AD8 619 2 10 shared pool 22630853 5199
60560B78 619 3 10 shared pool 23540102 4370
Each time when making a test, we start a new connection:
SQL> conn k/s@db19c
Connected.
Get its UNIX process id: 123Start tracing with the composed script:
gdb -x gdb_latch_script_3.txt -p 123
Run the test:
SQL> exec so_private_pkg.proc1(1);
Here the tracing log:
===== Library Cache Lock (1) <<< kgllkhdl: 879D79B0, kgllkmod 1, kglnaobj: BEGIN so_private_pkg.proc1(1); END;
===== Library Cache Lock (2) <<< kgllkhdl: 879DDD28, kgllkmod 1, kglnaobj: >>>=====
===== Library Cache Lock (3) <<< kgllkhdl: 6E851DD0, kgllkmod 1, kglnaobj: SO_PRIVATE_PKGK>>>=====
===== Library Cache Lock (4) <<< kgllkhdl: 707490A8, kgllkmod 1, kglnaobj: SO_PRIVATE_PKGK>>>=====
===== Library Cache Lock (5) <<< kgllkhdl: A2EA2A50, kgllkmod 1, kglnaobj: STANDARDSYS>>>=====
===== Library Cache Lock (6) <<< kgllkhdl: A5B6FA50, kgllkmod 1, kglnaobj: STANDARDSYS>>>=====
===== Library Cache Lock (7) <<< kgllkhdl: 87E9EC90, kgllkmod 1, kglnaobj: DBMS_SESSIONPUBLIC>>>=====
Breakpoint 9, 12d7f230 in kglGetMutex ()
=====--- kglGetMutex (30) ---> Mutex addr (rsi): A127C888, Location(r8d): 106
#0 12d7f230 in kglGetMutex ()
#1 12d74ea2 in kglhdgn ()
Breakpoint 10, 12da4830 in kgxExclusive ()
=====----- kgxExclusive (20) ---> Mutex addr (rsi): A127C888
Breakpoint 8, 12d76c90 in kgllkal ()
===== Library Cache Lock (8) <<< kgllkhdl: A127C738, kgllkmod 1, kglnaobj: DBMS_SESSIONSYS>>>=====
#0 12d76c90 in kgllkal ()
#1 12d72627 in kglLock ()
Breakpoint 1, 125c9ba0 in kslgetl ()
===== kslgetl shared latch (1) <<< Addr(rdi): 60560A38, Imget: 1, Why: 0, Where: 6293 >>>=====
#0 125c9ba0 in kslgetl ()
#1 1259da43 in ksfglt ()
#2 12d36f13 in kghalo ()
#3 12d33c24 in kghgex ()
#4 12d38442 in kghfnd ()
#5 12d36ae2 in kghalo ()
#6 12d7e6f8 in kglGetSO ()
#7 12d76dbf in kgllkal ()
Breakpoint 2, 125cf6c0 in kslfre ()
===== kslfre shared latch (1) <<< Addr(rdi): 60560A38 >>>=====
#0 125cf6c0 in kslfre ()
#1 1259dded in ksfflt ()
Breakpoint 11, 12da5850 in kgxRelease ()
=====----- kgxRelease (20) ---> Mutex addr (r15): A127C888
$20 = {0, 909, 7105784, 253203}
......
===== Library Cache Lock (9) <<< kgllkhdl: 9EF37618, kgllkmod 1, kglnaobj: DBMS_SESSIONSYS>>>=====
===== Library Cache Lock (10) <<< kgllkhdl: 9EF30B90, kgllkmod 1, kglnaobj: alter session set nls_territory = AMERICA>>>=====
===== Library Cache Lock (11) <<< kgllkhdl: 9FAF9F50, kgllkmod 1, kglnaobj: alter session set nls_language = AMERICAN>>>=====
===== Library Cache Lock (12) <<< kgllkhdl: 9FAF8A28, kgllkmod 1, kglnaobj: alter session set nls_sort = BINARY>>>=====
===== Library Cache Lock (13) <<< kgllkhdl: A069FC50, kgllkmod 1, kglnaobj: alter session set nls_numeric_characters = '.,'>>>=====
......
Breakpoint 9, 12d7f230 in kglGetMutex ()
=====--- kglGetMutex (51) ---> Mutex addr (rsi): A1CA99D0, Location(r8d): 57
#0 12d7f230 in kglGetMutex ()
#1 04c9de7c in kglLockCursor ()
Breakpoint 10, 12da4830 in kgxExclusive ()
=====----- kgxExclusive (37) ---> Mutex addr (rsi): A1CA99D0
Breakpoint 8, 12d76c90 in kgllkal ()
===== Library Cache Lock (14) <<< kgllkhdl: A1CA9880, kgllkmod 1, kglnaobj: COMMIT>>>=====
#0 12d76c90 in kgllkal ()
#1 04c9df43 in kglLockCursor ()
Breakpoint 7, 012a0700 in kss_grow_from_global_cache ()
===== kss_grow_from_global_cache (1) << (r14): B7F27168, kss private so Chunk Addr (r14-4112): B7F26158 >>>=====
#0 012a0700 in kss_grow_from_global_cache ()
#1 1260c5ca in kss_add_child ()
Breakpoint 3, 125c9ba0 in kslgetl ()
===== kslgetl so private (1) <<< Addr(rdi): B626A6F0, Imget: 0, Why: 0, Where: 290 >>>=====
#0 125c9ba0 in kslgetl ()
#1 012a0844 in kss_grow_from_global_cache ()
#2 1260c5ca in kss_add_child ()
#3 12d7e4a5 in kglGetSO ()
#4 12d76dbf in kgllkal ()
#5 04c9df43 in kglLockCursor ()
#6 035b5e3f in kkspbd0 ()
#7 12a3bc9a in kksParseCursor ()
Breakpoint 4, 125cf6c0 in kslfre ()
===== kslfre so private (1) <<< Addr(rdi): B626A6F0 >>>=====
#0 125cf6c0 in kslfre ()
#1 012a088b in kss_grow_from_global_cache ()
Breakpoint 11, 12da5850 in kgxRelease ()
=====----- kgxRelease (37) ---> Mutex addr (r15): A1CA99D0
$37 = {0, 909, 1253713, 34977169}
......
Breakpoint 8, 12d76c90 in kgllkal ()
===== Library Cache Lock (15) <<< kgllkhdl: 95589568, kgllkmod 1, kglnaobj: alter session set nls_timestamp_format = 'YYYY-MM-DD"T"HH24:MI:SS'>>>=====
#0 12d76c90 in kgllkal ()
#1 12d72627 in kglLock ()
The above output shows that latch get/free is fully contained within mutex get/release:
(1). kgxExclusive mutex get A127C888
kslgetl latch get 60560A38 (Imget: 1 for non IMMEDIATE_GETS)
kslfre latch free 60560A38
kgxRelease mutex release A127C888
(2). kgxExclusive mutex get A1CA99D0
kslgetl latch get B626A6F0 (Imget: 0 for IMMEDIATE_GETS)
kslfre latch free B626A6F0
kgxRelease mutex release A1CA99D0
The first latch get 60560A38 is triggered by kghalo to allocate shared pool memory.Between kgxExclusive (mutex get) and kslgetl (latch get), there is one kgllkal (Library Cache Lock Allocate).
So the call sequence to allocate shared pool memory is kgxExclusive -> kgllkal -> kslgetl.
The second latch get B626A6F0 is triggered by kss_grow_from_global_cache to update (or insert) "SO private sga" memory component.
With following query, we can find the touched memory chunk for kss_grow_from_global_cache (r14): B7F27168 in "SO private sga".
select v.*, to_number('B7F27168', 'xxxxxxxx') - to_number(ksmchptr, 'xxxxxxxxxxxxxxxx') offset
from x$ksmsp v
where ksmchcom in ('SO private sga')
and to_number('B7F27168', 'xxxxxxxx') between to_number(ksmchptr, 'xxxxxxxxxxxxxxxx')
and to_number(ksmchptr, 'xxxxxxxxxxxxxxxx') + ksmchsiz-1;
ADDR INDX INST_ID CON_ID KSMCHIDX KSMCHDUR KSMCHCOM KSMCHPTR KSMCHSIZ KSMCHCLS KSMCHTYP KSMCHPAR OFFSET
---------------- ------ ------- ------- -------- ---------- -------------- ----------------- -------- -------- -------- ---------------- -------
00007EFEA3952810 117341 1 0 2 1 SO private sga 00000000B7EF4878 1048536 recr 4095 0000000060006C98 207088
The trace log also contains Library Cache Locks/Pins (kglLock/kglpin), Mutex Gets/Releases (kgxExclusive and kgxRelease).
For details, see Blog:
Oracle PLITBLM "library cache: mutex X".
4. "latch: shared pool" and "library cache: mutex X" Blocking Test
Based on above tracing output, we can make two blocking wait tests.
One is with kslfre to block "latch: shared pool" get,
another is with kss_grow_from_global_cache to block "SO private sga" update.
4.1 kslfre Blocking Test
In this blocking test, we will show two wait events: "latch: shared pool", "library cache: mutex X", and then look their respective code path.
First we start 4 Job sessions:
exec start_job_so(4);
Then make a new DB connection.
SQL> conn k/s@db19c
Connected.
Get its UNIX process id: 9558 (Oracle session id: 909)Start tracing and set a breakpoint
gdb -p 9558
break kslfre if $rdi==0x60560A38 || $rdi==0x60560AD8 || $rdi==0x60560B78
Run the test:
SQL> exec so_private_pkg.proc1(1);
Resume process running in gdb. After a few seconds, we reached the breakpoint, and display call stack.
(gdb) c
Continuing.
Breakpoint 1, 0x125cf6c0 in kslfre ()
(gdb) bt 16
#0 0x125cf6c0 in kslfre ()
#1 0x1259dded in ksfflt ()
#2 0x12d3605c in kghalo ()
#3 0x12d33c24 in kghgex ()
#4 0x12d38442 in kghfnd ()
#5 0x12d36ae2 in kghalo ()
#6 0x12d7e6f8 in kglGetSO ()
#7 0x12d76dbf in kgllkal ()
#8 0x12d72627 in kglLock ()
#9 0x12d6d4b5 in kglget ()
#10 0x04c86f19 in kglgob ()
#11 0x04c878bd in kglgob ()
#12 0x12d9b0c0 in kgiind ()
#13 0x056d98ec in pfri8_inst_spec ()
#14 0x056d9714 in pfri1_inst_spec ()
#15 0x12daed50 in pfrrun ()
We can see that all Job sessions are blocked with "latch: shared pool" or "library cache: mutex X" by session 909.
So one session can cause two different wait events in the blocked sessions.
(Note: we started 4 Job sessions, but v$session shows 6 due to dbms_scheduler job delayed start/stop cleanup).v$mutex_sleep_history shows mutex sleeping stats by BLOCKING_SESSION 909.
v$latchholder shows that SID 909 is holding shared pool CHILD# 1 latch 60560A38.
SQL> select program, event, sid, serial#, p1, p2raw, p3raw, final_blocking_session
from v$session
where lower(program) like '%sql%' or lower(program) like '%j0%'
order by program;
PROGRAM EVENT SID SERIAL# P1 P2RAW P3RAW FINAL_BLOCKING_SESSION
------------------------------ ------------------------- ------ --------- ---------- ---------------- ---------------- ----------------------
oracle@db19c (J000) latch: shared pool 1011 58459 1616251448 000000000000026B 0000000093FD6BD0 909
oracle@db19c (J001) library cache: mutex X 426 51552 3168695887 0000038D00000000 0000130A0001006A 909
oracle@db19c (J002) library cache: mutex X 372 7360 3168695887 0000038D00000000 0000130A0001006A 909
oracle@db19c (J003) latch: shared pool 122 30477 1616251448 000000000000026B 000000008C6E1038 909
oracle@db19c (J004) library cache: mutex X 277 6387 3168695887 0000038D00000000 0000130A0001006A 909
oracle@db19c (J005) library cache: mutex X 408 13814 3168695887 0000038D00000000 0000130A0001006A 909
sqlplus@db19c (TNS V1-V3) SQL*Net message from client 909 33954 1413697536 0000000000000001 00
SQL> select mutex_identifier, sleep_timestamp, mutex_type, gets, sleeps, requesting_session, blocking_session, mutex_value, p1raw, location
from v$mutex_sleep_history
where sleep_timestamp > sysdate -2/1440 order by sleep_timestamp desc;
MUTEX_IDENTIFIER SLEEP_TIMESTAMP MUTEX_TYPE GETS SLEEPS REQUESTING_SESSION BLOCKING_SESSION MUTEX_VALUE P1RAW LOCATION
---------------- --------------- --------------- ------- ------- ------------------ ---------------- ---------------- ---------------- ------------
3168695887 01:10:44 Library Cache 7108586 41476 277 909 0000038D00000000 00000000A127C738 kglhdgn2 106
3168695887 01:10:44 Library Cache 7108586 41457 426 909 0000038D00000000 00000000A127C738 kglhdgn2 106
3168695887 01:10:44 Library Cache 7108586 41470 408 909 0000038D00000000 00000000A127C738 kglhdgn2 106
3168695887 01:10:44 Library Cache 7108586 41447 372 909 0000038D00000000 00000000A127C738 kglhdgn2 106
15 01:10:38 Row Cache 46638 1 76 360 0000016800000000 00 [19] kqrpre
-- 38D in MUTEX_VALUE and P2RAW are blocking session id: 909 (=0x38D).
SQL> select * from v$latchholder;
PID SID LADDR NAME GETS CON_ID
---- ----- --------- --------------------------- -------- -------
35 909 60560A38 shared pool 22844512 0
47 1011 6005F500 parameter table management 17143486 0
For "library cache: mutex X", query below finds the library object (P1RAW and P1 in above query output).
ADDR is its object_handle (see Blog:
Oracle PLITBLM "library cache: mutex X").
SQL > select hash_value, addr, owner, name, namespace, type from v$db_object_cache where hash_value in (3168695887);
HASH_VALUE ADDR OWNER NAME NAMESPACE TYPE
---------- ---------------- ------ ------------ -------------------- ----------
3168695887 00000000A127C738 SYS DBMS_SESSION TABLE/PROCEDURE PACKAGE
Now we can have a further look of two different wait events.
4.1.1 "library cache: mutex X" Wait
Above gdb trace shows that shared pool latch Get/Free is fully contained within mutex Get/Release. If the blocked session comes in the same code path, it is blocked by "library cache: mutex X" because it first invokes kglGetMutex (kgxExclusive) to get the same mutex.
Here is what diag LWS db19c_dia0_30593_lws_1.trc showed for Session ID 277. The Short stack dump shows kglGetMutex and kgxExclusive calls.
*** 2021-05-12T01:04:22.093264+02:00
HM: Early Warning - Session ID 277 serial# 6387 OS PID 5888 (J004)
is waiting on 'library cache: mutex X' for 32 seconds, wait id 62
p1: 'idn'=0xbcde764f, p2: 'value'=0x38d00000000, p3: 'where'=0x130a0001006a
Final Blocker is Session ID 909 serial# 33954 on instance 1
which is 'not in a wait' for 60 seconds
Total Self- Total Total Outlr Outlr Outlr
Hung Rslvd Rslvd Wait WaitTm Wait WaitTm Wait
Sess Hangs Hangs Count Secs Count Secs Count Wait Event
------ ------ ------ ------ ------ ------ ------ ------ -----------
2632 0 0 812304 7729347 12122 7695264 0 library cache: mutex X
HM: Dumping Short Stack of pid[61.5888] (sid:277, ser#:6387)
Short stack dump:
ksedsts()+426<-ksdxfstk()+58<-ksdxcb()+872<-sspuser()+200<-__sighandler()<-semtimedop()+10<-skgpwwait()+187
<-ksliwat()+2224<-kslwaitctx()+188<-kgxWait()+1304<-kgxExclusive()+712<-kglGetMutex()+151<-kglhdgn()+898
<-kglLock()+562<-kglget()+293<-kglgob()+281<-kglgob()+2749<-kgiind()+4256<-pfri8_inst_spec()+140<-pfri1_inst_spec()+68
<-pfrrun()+544<-plsql_run()+752<-peicnt()+279<-kkxexe()+720<-opiexe()+25325<-kpoal8()+2387<-opiodr()+1202
<-kpoodr()+689<-upirtrc()+2760<-kpurcsc()+100<-kpuexec()+10994<-OCIStmtExecute()+41<-jslvec_execcb()+2537
<-jslvswu()+409<-jslvCDBSwitchUsr()+672<-jslve_execute0()+6761<-jslve_execute()+1529<-jslve_cdb_execute()+112
<-rpiswu2()+2004<-kkjex1e_cdb()+222<-kkjsexe()+2333<-kkjrdp()+1588<-opirip()+889<-opidrv()+581<-sou2o()+165
<-opimai_real()+173<-ssthrdmain()+417<-main()+256<-__libc_start_main()+245
Here oradebug short_stack for 'library cache: mutex X' wait:
SQL> oradebug setorapid 61
Oracle pid: 61, Unix process pid: 5888, image: oracle@db19c (J004)
SQL> oradebug short_stack
ksedsts()+426<-ksdxfstk()+58<-ksdxcb()+872<-sspuser()+200<-__sighandler()<-semtimedop()+10<-skgpwwait()+187<-ksliwat()+2224
<-kslwaitctx()+188<-kgxWait()+1304<-kgxExclusive()+712<-kglGetMutex()+151<-kglhdgn()+898<-kglLock()+562<-kglget()+293<-kglgob()+281
<-kglgob()+2749<-kgiind()+4256<-pfri8_inst_spec()+140<-pfri1_inst_spec()+68<-pfrrun()+544<-plsql_run()+752<-peicnt()+279
<-kkxexe()+720<-opiexe()+25325<-kpoal8()+2387<-opiodr()+1202<-kpoodr()+689<-upirtrc()+2760<-kpurcsc()+100<-kpuexec()+10994
<-OCIStmtExecute()+41<-jslvec_execcb()+2537<-jslvswu()+409<-jslvCDBSwitchUsr()+672<-jslve_execute0()+6761<-jslve_execute()+1529
<-jslve_cdb_execute()+112<-rpiswu2()+2004<-kkjex1e_cdb()+222<-kkjsexe()+2333<-kkjrdp()+1588<-opirip()+889<-opidrv()+581
<-sou2o()+165<-opimai_real()+173<-ssthrdmain()+417<-main()+256<-__libc_start_main()+245
4.1.2 "latch: shared pool" Wait
There is also other code path, which directly requests 'latch: shared pool' and does not require mutex get. Here is what diag LWS db19c_dia0_30593_lws_1.trc showed for Session ID 122. The Short stack dump does not have kglGetMutex and kgxExclusive calls.
*** 2021-05-12T01:04:43.183727+02:00
HM: Early Warning - Session ID 122 serial# 30477 OS PID 5785 (J003)
is waiting on 'latch: shared pool' for 52 seconds, wait id 61
p1: 'address'=0x60560a38, p2: 'number'=0x26b, p3: 'why'=0x8c6e1038
Final Blocker is Session ID 909 serial# 33954 on instance 1
which is 'not in a wait' for 61 seconds
IO
Total Self- Total Total Outlr Outlr Outlr
Hung Rslvd Rslvd Wait WaitTm Wait WaitTm Wait
Sess Hangs Hangs Count Secs Count Secs Count Wait Event
------ ------ ------ ------ ------ ------ ------ ------ -----------
20 0 0 36534 9032 19 8544 0 latch: shared pool
HM: Dumping Short Stack of pid[60.5785] (sid:122, ser#:30477)
Short stack dump:
ksedsts()+426<-ksdxfstk()+58<-ksdxcb()+872<-sspuser()+200<-__sighandler()<-semop()+7<-skgpwwait()+187<-kslges()+1534
<-kslgetl()+2489<-ksfglt()+163<-kghfre()+3985<-ksuxds()+1061<-kss_del_cb()+218<-kssdel()+216<-ksudel_int()+280
<-ksudel()+68<-kkjrdp()+2207<-opirip()+889<-opidrv()+581<-sou2o()+165<-opimai_real()+173<-ssthrdmain()+417
<-main()+256<-__libc_start_main()+245
Here oradebug short_stack for 'latch: shared pool' wait:
SQL> oradebug setorapid 60
Oracle pid: 60, Unix process pid: 5785, image: oracle@db19c (J003)
SQL> oradebug short_stack
ksedsts()+426<-ksdxfstk()+58<-ksdxcb()+872<-sspuser()+200<-__sighandler()<-semop()+7<-skgpwwait()+187<-kslges()+1534<-kslgetl()+2489
<-ksfglt()+163<-kghfre()+3985<-ksp_param_handle_free()+779<-kspdesc()+142<-ksmugf()+208<-ksuxds()+3727<-kss_del_cb()+218
<-kssdel()+216<-ksudel_int()+280<-ksudel()+68<-kkjrdp()+2207<-opirip()+889<-opidrv()+581<-sou2o()+165<-opimai_real()+173
<-ssthrdmain()+417<-main()+256<-__libc_start_main()+245
In fact, if we use the same gdb script to trace one Job session, we can see the occurrence of kslgetl / kslfre on 60560A38
which is not contained within kglGetMutex kgxExclusive / kgxRelease.
Breakpoint 2, 0x125cf6c0 in kslfre ()
===== kslfre shared latch (14) <<< Addr(rdi): 60560B78 >>>=====
#0 0x125cf6c0 in kslfre ()
#1 0x1259dded in ksfflt ()
Breakpoint 1, 0x125c9ba0 in kslgetl ()
===== kslgetl shared latch (15) <<< Addr(rdi): 60560A38, Imget: 1, Why: 7165CBE8, Where: 6298 >>>=====
#0 0x125c9ba0 in kslgetl ()
#1 0x1259da43 in ksfglt ()
#2 0x12d2c341 in kghfre ()
#3 0x011380eb in ksp_param_handle_free ()
#4 0x01137d7e in kspdesc ()
#5 0x010f7990 in ksmugf ()
#6 0x1260e4cf in ksuxds ()
#7 0x1260503a in kss_del_cb ()
Breakpoint 2, 0x125cf6c0 in kslfre ()
===== kslfre shared latch (15) <<< Addr(rdi): 60560A38 >>>=====
#0 0x125cf6c0 in kslfre ()
#1 0x1259dded in ksfflt ()
Breakpoint 1, 0x125c9ba0 in kslgetl ()
===== kslgetl shared latch (16) <<< Addr(rdi): 60560AD8, Imget: 1, Why: 7D84F9C0, Where: 6298 >>>=====
db19c_dia0_30593_vfy_12.trc shows blocking graph for both wait events of two above sessions, which are blocked by session id: 909,
and session id: 909 itself is in state 'CPU or Wait CPU'
(although the fact is that session id: 909 is suspended by our manual breakpoint and
in Process Status: t (stopped by debugger during trace) with %CPU=0.0, which does not consume any CPU).
*** 2021-05-12T01:05:52.275135+02:00
-------------------------------------------------------------------------------
Chain 1:
-------------------------------------------------------------------------------
Oracle session identified by:
{
os id: 5785
process id: 60, oracle@db19c (J003)
session id: 122
}
is waiting for 'latch: shared pool' with wait info:
{
p1: 'address'=0x60560a38
p2: 'number'=0x26b
p3: 'why'=0x8c6e1038
time in wait: 2 min 0 sec
}
and is blocked by
=> Oracle session identified by:
{
os id: 9558
process id: 35, oracle@db19c
session id: 909
module name: 0 (SQL*Plusdb19c (TNS V1-V3))
}
which is on CPU or Wait CPU:
{
last wait: 2 min 30 sec ago
blocking: 7 sessions
}
Chain 1 Signature: 'CPU or Wait CPU'<='latch: shared pool'
===============================================================================
Chain 2:
-------------------------------------------------------------------------------
Oracle session identified by:
{
os id: 5888
process id: 61, oracle@db19c (J004)
session id: 277
}
is waiting for 'library cache: mutex X' with wait info:
{
p1: 'idn'=0xbcde764f
p2: 'value'=0x38d00000000
p3: 'where'=0x130a0001006a
time in wait: 2 min 1 sec
timeout after: never
}
and is blocked by 'instance: 1, os id: 9558, session id: 909',
which is a member of 'Chain 1'.
4.2 "SO private sga" kss_grow_from_global_cache Blocking Test
In this blocking test, only wait event: "library cache: mutex X" can be observed. We will also look its code path.
Make a new DB connection.
SQL> conn k/s@db19c
Connected.
Get its UNIX process id: 789 (Oracle session id: 909)Start tracing and set a breakpoint
gdb -p 789
break kss_grow_from_global_cache
Run the test:
SQL> exec so_private_pkg.proc1(1);
Resume process running. After a few seconds, we reached the breakpoint, and display call stack.
(gdb) c
Continuing.
Breakpoint 1, 0x12a0700 in kss_grow_from_global_cache ()
(gdb) bt 21
#0 0x012a0700 in kss_grow_from_global_cache ()
#1 0x1260c5ca in kss_add_child ()
#2 0x12d7e4a5 in kglGetSO ()
#3 0x12d76dbf in kgllkal ()
#4 0x04c9df43 in kglLockCursor ()
#5 0x035b5e3f in kkspbd0 ()
#6 0x12a3bc9a in kksParseCursor ()
#7 0x12c11736 in opiosq0 ()
#8 0x129a7280 in opipls ()
#9 0x12990c52 in opiodr ()
#10 0x12aae556 in rpidrus ()
#11 0x12d501a1 in skgmstack ()
#12 0x12aae0d4 in rpidru ()
#13 0x12aad12f in rpiswu2 ()
#14 0x12aac4d2 in rpidrv ()
#15 0x12a89a83 in psddr0 ()
#16 0x12a88eb0 in psdnal ()
#17 0x12dbcfe2 in pevm_EXECC ()
#18 0x12db1a68 in pfrinstr_EXECC ()
#19 0x12db0544 in pfrrun_no_tool ()
#20 0x12daeeb6 in pfrrun ()
We can see that all Job sessions are blocked with "library cache: mutex X" by session 909.
v$mutex_sleep_history shows mutex sleeping stats by BLOCKING_SESSION 909.
(v$latchholder returns no rows because kslgetl is not yet invoked).
SQL> select program, event, sid, serial#, p1, p2raw, p3raw, final_blocking_session
from v$session
where lower(program) like '%sql%' or lower(program) like '%j0%'
order by program;
PROGRAM EVENT SID SERIAL# P1 P2RAW P3RAW FINAL_BLOCKING_SESSION
------------------------------ -------------------- ------ --------- --------- ---------------- ---------------- ----------------------
oracle@db19c (J000) library cache: mutex X 996 38685 255718823 0000038D00000000 0F3DF5A700000039 909
oracle@db19c (J002) library cache: mutex X 372 57666 255718823 0000038D00000000 0F3DF5A700000039 909
oracle@db19c (J003) library cache: mutex X 122 13135 255718823 0000038D00000000 0F3DF5A700000039 909
oracle@db19c (J007) library cache: mutex X 1011 38406 255718823 0000038D00000000 0F3DF5A700000039 909
sqlplus@db19c (TNS V1-V3) PGA memory operation 909 25675 65536 0000000000000001 00
SQL> select mutex_identifier, sleep_timestamp, mutex_type, gets, sleeps, requesting_session, blocking_session, mutex_value, p1raw, location
from v$mutex_sleep_history
where sleep_timestamp > sysdate -2/1440 order by sleep_timestamp desc;
MUTEX_IDENTIFIER SLEEP_TIMESTAMP MUTEX_TYPE GETS SLEEPS REQUESTING_SESSION BLOCKING_SESSION MUTEX_VALUE P1RAW LOCATION
---------------- --------------- ------------- ------- ------- ------------------ ---------------- ---------------- ---------------- ---------------
255718823 01:18:57 Library Cache 1254886 5155 1011 909 0000038D00000000 00000000A1CA9880 kgllkc1 57
255718823 01:18:57 Library Cache 1254886 5145 122 909 0000038D00000000 00000000A1CA9880 kgllkc1 57
255718823 01:18:57 Library Cache 1254886 5136 996 909 0000038D00000000 00000000A1CA9880 kgllkc1 57
255718823 01:18:57 Library Cache 1254886 5141 372 909 0000038D00000000 00000000A1CA9880 kgllkc1 57
0 01:17:36 Row Cache 4330026 1 122 1011 000003F300000000 00 [19] kqrpre
SQL> select * from v$latchholder;
no rows selected
4.3 AWR Report
In AWR Section - Latch Sleep Breakdown, we can see stats of both "shared pool" and "SO private so latch".
Latch Sleep Breakdown
Latch Name Get Requests Misses Sleeps Spin Gets
---------------------------------- ------------ ------- ------- ---------
cache buffers chains 20,218,430 3,052 379 2,980
shared pool 756,911 2,208 364 1,849
SO private so latch 30,950 34 8 27
kokc descriptor allocation latch 534 5 7 4
In Section - Latch Miss Sources, "SO private so latch" is displayed with correct Latch Name. However, no Latch Name "shared pool" can be found. Probably it is re-named as "unknown". The location "Where" clearly shows that "kghalo" and "kghfre" (heap manager allocation/free). The sum of Sleeps for "unknown latch" is almost same as Sleeps (364) in above Section - Latch Sleep Breakdown.
Latch Miss Sources
Latch Name Where NoWait Misses Sleeps Waiter Sleeps
--------------------- --------------------------- -------------- -------- -------------
SO private so latch kss_grow_from_global_cache 0 7 0
SO private so latch kss_shrink_private_so_list 0 1 8
unknown latch kghalo 0 303 264
unknown latch kghfre 0 31 78
unknown latch kghupr1 0 15 12
unknown latch kghalp 0 5 9
unknown latch kgh_heap_sizes 0 5 1
unknown latch kghfnd: req scan 0 2 0
Here the shared pool Child Latch Stats:
Child Latch Statistics
Latch Name Child Num Get Requests Misses Sleeps Spin & Sleeps 1->3+
----------- ---------- ------------- ------- ------- --------------------
shared pool 3 265,194 823 127 696/0/0/0
shared pool 2 256,577 775 149 629/0/0/0
shared pool 1 235,275 610 88 524/0/0/0
For other discussions of latch stats ("Get Requests", "Misses", "Sleeps", "Spin Gets"), see Blog:
Is latch misses statistic gathered or deduced ?
4.4 "row cache mutex" Wait
diag LWS db19c_dia0_30593_lws_1.trc also shows "row cache mutex" Wait for 'cache id'=0xa (dc_users), which also involves 'latch: shared pool' ('address'=0x60560a38)
*** 2021-05-12T06:48:46.642290+02:00
HM: Early Warning - Session ID 277 serial# 6298 OS PID 7016 (J005)
is waiting on 'row cache mutex' for 37 seconds, wait id 277
p1: 'cache id'=0xa, p2: 'where requested'=0x13, p3: ''=0x0
Blocked by Session ID 996 serial# 21988 on instance 1
which is waiting on 'latch: shared pool' for 31 seconds
p1: 'address'=0x60560a38, p2: 'number'=0x26b, p3: 'why'=0x0
Final Blocker is Session ID 599 serial# 13104 on instance 1
which is 'not in a wait' for 32 seconds
Session ID 277 is blocking 2 sessions
Blocking Session ID 765 serial# 16220 on instance 1
which is waiting on 'row cache mutex' for 21 seconds
p1: 'cache id'=0xa, p2: 'where requested'=0x13, p3: ''=0x0
IO
Total Self- Total Total Outlr Outlr Outlr
Hung Rslvd Rslvd Wait WaitTm Wait WaitTm Wait
Sess Hangs Hangs Count Secs Count Secs Count Wait Event
------ ------ ------ ------ ------ ------ ------ ------ -----------
51 0 0 473776 30128 198 19008 0 row cache mutex
------------------------------------------
HM: Dumping Short Stack of pid[61.7016] (sid:277, ser#:6298)
Short stack dump:
ksedsts()+426<-ksdxfstk()+58<-ksdxcb()+872<-sspuser()+200<-__sighandler()<-semtimedop()+10<-skgpwwait()+187<-ksliwat()+2224
<-kslwaitctx()+188<-kgxWait()+1304<-kgxExclusive()+712<-kqrGetPOMutexInt()+195<-kqrpre1()+792<-jsksGetDBObjectName()+771
<-jslvepost_exec_post()+1020<-jslvsst_session_stop()+5631<-jslve_execute0()+7742<-jslve_execute()+1529<-jslve_cdb_execute()+112
<-rpiswu2()+2004<-kkjex1e_cdb()+222<-kkjsexe()+2333<-kkjrdp()+1588<-opirip()+889<-opidrv()+581<-sou2o()+165<-opimai_real()+173
<-ssthrdmain()+417<-main()+256<-__libc_start_main()+245
5 Appendix gdb_latch_script_3.txt
set pagination off
set logging file latch_output_3.log
set logging overwrite on
set logging on
set $socg = 0
set $shcg = 0
set $socf = 0
set $shcf = 0
set $spec_lck = 0
set $spec_pin = 0
set $body_lck = 0
set $body_pin = 0
set $mutex_get = 0
set $mutex_ex = 0
set $mutex_fr = 0
set $mutex_addr = 0x0
set $body_locked = 0
set $sogrow = 0
set $i = 0
# Adjust beakpoint conditions
# select name, listagg('$rdi==0X'||trim(leading 0 from addr), ' || ') within group (order by child#) addr
# from v$latch_children where name in ('shared pool', 'SO private so latch') and child# <=3 group by name;
# NAME ADDR
# -------------------- ----------------------------------------------------
# shared pool $rdi==0X60560A38 || $rdi==0X60560AD8 || $rdi==0X60560B78
# SO private so latch $rdi==0xB626A638 || $rdi==0xB626A6F0 || $rdi==0xB626A7A8
# -- Usage: (1). make a trace new connection, (2). start gdb trace, (3). run a sql command, (4). stop gdb trace, (5). look trace output
# SQL> conn k/s@testdb
# Connected.
# -- get spid (17191), gdb -x gdb_latch_script_3.txt -p 17191
# SQL> exec so_private_pkg.proc1(1);
break kslgetl if $rdi==0x60560A38 || $rdi==0x60560AD8 || $rdi==0x60560B78
commands
printf "===== kslgetl shared latch (%i) <<< Addr(rdi): %X, Imget: %i, Why: %X, Where: %i >>>=====\n", ++$shcg, $rdi, $rsi, $rdx, $rcx
backtrace 8
continue
end
break kslfre if $rdi==0x60560A38 || $rdi==0x60560AD8 || $rdi==0x60560B78
commands
printf "===== kslfre shared latch (%i) <<< Addr(rdi): %X >>>=====\n", ++$shcf, $rdi
backtrace 2
continue
end
break kslgetl if $rdi==0xB626A638 || $rdi==0xB626A6F0 || $rdi==0xB626A7A8
commands
printf "===== kslgetl so private (%i) <<< Addr(rdi): %X, Imget: %i, Why: %X, Where: %i >>>=====\n", ++$socg, $rdi, $rsi, $rdx, $rcx
backtrace 8
continue
end
break kslfre if $rdi==0xB626A638 || $rdi==0xB626A6F0 || $rdi==0xB626A7A8
commands
printf "===== kslfre so private (%i) <<< Addr(rdi): %X >>>=====\n", ++$socf, $rdi
backtrace 2
continue
end
break ksl_get_shared_latch if $rdi==0x60560A38 || $rdi==0x60560AD8 || $rdi==0x60560B78
commands
printf "===== ksl_get_shared_latch shared latch (%i) <<< addr(rdi): %X, Imget: %i, Why: %X, Where: %i, Mode: %X >>>=====\n", ++$i, $rdi, $rsi, $rdx, $rcx, $r8
backtrace 8
continue
end
break ksl_get_shared_latch if $rdi==0xB626A638 || $rdi==0xB626A6F0 || $rdi==0xB626A7A8
commands
printf "===== ksl_get_shared_latch SO latch (%i) <<< addr(rdi): %X, Imget: %i, Why: %X, Where: %i, Mode: %X >>>=====\n", ++$i, $rdi, $rsi, $rdx, $rcx, $r8
backtrace 8
continue
end
# Most "kss private so " has size 5136 in shared_poo dump like: 0b7f17b18 sz= 5136 cprm "kss private so "
# Find Chunk which covers $r14. Offset 4112 to $r14 is only an example.
break kss_grow_from_global_cache
commands
printf "===== kss_grow_from_global_cache (%i) << (r14): %X, kss private so Chunk Addr (r14-4112): %X >>>=====\n", ++$sogrow, $r14, ($r14-4112)
backtrace 10
continue
end
break kgllkal
#break kgllkal if $rdx==0X9FB08E08 || $rdx==0XA043B758
commands
printf "===== Library Cache Lock (%i) <<< kgllkhdl: %X, kgllkmod %x, kglnaobj: %s>>>=====\n", ++$body_lck, $rdx, $rcx, ($rdx+0x1c0)
backtrace 8
continue
end
break kglGetMutex if $body_lck > 0
command
printf "=====--- kglGetMutex (%i) ---> Mutex addr (rsi): %X, Location(r8d): %d\n", ++$mutex_get, $rsi, $r8d
backtrace 4
continue
end
break kgxExclusive if $body_lck > 0
#break kgxExclusive if $r9==0X9EC403D0 && $body_lck > 0
command
printf "=====----- kgxExclusive (%i) ---> Mutex addr (rsi): %X\n", ++$mutex_ex, $rsi
#backtrace 4
set $mutex_addr = $rsi
continue
end
break kgxRelease if $r15==$mutex_addr && $body_lck > 0
command
printf "=====----- kgxRelease (%i) ---> Mutex addr (r15): %X \n", ++$mutex_fr, $r15
p/d (int[4])*$r15
# x/4dw $r15
continue
end