In this Blog, Application Context is used as a concrete case to discuss Oracle "library cache: mutex X".
Note: All tests are done in Oracle 12.1.0.2 on AIX, Solaris, Linux with 6 physical processors.
1. Test
Run the appended Test Code by launching 4 Jobs:
exec ctx_set_jobs(4);
Monitor Job sessions:
select sid, program, event, p1text, p1, p2text, p2, p3text, p3
from v$session where program like '%(J%';
SID PROGRAM EVENT P1TEXT P1 P2TEXT P2 P3TEXT P3
----- -------------------- ------------------------- ------ ----------- ------ -------------- ------ -----------------
38 oracle@testdb (J003) library cache: mutex X idn 1317011825 value 3968549781504 where 9041305591414788
890 oracle@testdb (J000) library cache: mutex X idn 1317011825 value 163208757248 where 9041305591414874
924 oracle@testdb (J001) library cache: mutex X idn 1317011825 value 4556960301056 where 9041305591414874
1061 oracle@testdb (J002) library cache: mutex X idn 1317011825 value 3968549781504 where 9041305591414879
Pick idn (P1): 1317011825, and query v$db_object_cache:
select name, namespace, type, hash_value, locks, pins, locked_total, pinned_total
from v$db_object_cache where hash_value in (1317011825);
NAME NAMESPACE TYPE HASH_VALUE LOCKS PINS LOCKED_TOTAL PINNED_TOTAL
---------- --------------- --------------- ----------- ----------- ----------- ------------ ------------
TEST_CTX APP CONTEXT APP CONTEXT 1317011825 4 0 4 257802287
It shows that "library cache: mutex X" is on application context: TEST_CTX, and PINNED_TOTAL is probably increased for each access.
Although TEST_CTX is a local context and its values is stored in the User Global Area (UGA), the content of "library cache: mutex X" is globally on its definition.
select namespace, package, type from dba_context where namespace = 'TEST_CTX';
NAMESPACE PACKAGE TYPE
---------- ------------ ----------------
TEST_CTX TEST_CTX_PKG ACCESSED LOCALLY
After test, clean-up all jobs by:exec clean_jobs;
2. Mutex Contention and Performance
Run the test, monitor Mutex Contention and Performance:
SQL > exec ctx_set_jobs(4);
column owner format a6
column name format a10
column property format a10
column namespace format a12
column type format a12
SQL > select owner, name, property, hash_value, locks, pins, locked_total, pinned_total, executions, sharable_mem, namespace, type, full_hash_value
from v$db_object_cache v
where (name in ('TEST_CTX') or hash_value in (1317011825) or property like '%HOT%');
OWNER NAME PROPERTY HASH_VALUE LOCKS PINS LOCKED_TOTAL PINNED_TOTAL EXECUTIONS SHARABLE_MEM NAMESPACE TYPE FULL_HASH_VALUE
------ ---------- ---------- ---------- ------ ----- ------------ ------------ ---------- ------------ ------------ ------------ --------------------------------
SYS TEST_CTX 1317011825 4 0 4 1191106893 1191106886 4096 APP CONTEXT APP CONTEXT 3581f5a97dfac7485a3330954e800171
SQL > select * from v$mutex_sleep order by sleeps desc, location;
MUTEX_TYPE LOCATION SLEEPS WAIT_TIME
----------------- --------------------------- ---------- ----------
Library Cache kglpndl1 95 760307 76983943
Library Cache kglpin1 4 288210 172310041
Library Cache kglpnal1 90 193282 38493570
Library Cache kglGetHandleReference 123 1 0
--display mutex REQUESTING/BLOCKING details for each session
SQL > select * from v$mutex_sleep_history order by sleep_timestamp desc, location;
MUTEX_IDENTIFIER SLEEP_TIMESTAMP MUTEX_TYPE GETS SLEEPS REQUESTING_SESSION BLOCKING_SESSION LOCATION MUTEX_VALUE P1 P1RAW
---------------- -------------------- ------------- ---------- ------- ------------------ ---------------- -------------------------- ------------------ --- ----------------
1317011825 24-OCT-2017 14:15:40 Library Cache 675726540 449377 7 0 kglpin1 4 00 0 0000000176D1E860
1317011825 24-OCT-2017 14:15:40 Library Cache 675726542 442641 368 0 kglpndl1 95 00 0 0000000176D1E860
1317011825 24-OCT-2017 14:15:40 Library Cache 675711683 444299 901 7 kglpndl1 95 0000000700000000 0 0000000176D1E860
1317011825 24-OCT-2017 14:15:40 Library Cache 675709618 438207 187 0 kglpin1 4 00 0 0000000176D1E860
1317011825 24-OCT-2017 14:09:06 Library Cache 2806872 1 900 0 kglGetHandleReference 123 00 0 0000000176D1E860
Pick spid of one Oracle session, for example, 10684, get callstack:
$ > pstack 10684
10684: ora_j000_testdb
fffffd7ffc9d3e3b semsys (4, e000013, fffffd7fffdf5658, 1, fffffd7fffdf5660)
0000000001ab9008 sskgpwwait () + f8
0000000001ab8c95 skgpwwait () + c5
0000000001c710d5 ksliwat () + 8f5
0000000001c70410 kslwaitctx () + 90
0000000001e6ffb0 kgxWait () + 520
000000000dd1ae6f kgxExclusive () + 1cf
00000000021cc025 kglGetMutex () + b5
000000000212400e kglpin () + 2fe
00000000026aa159 kglpnp () + 269
00000000026a71ab kgiina () + 1db
000000000dd118b9 kgintu_named_toplevel_unit () + 39
0000000007ac16a6 kzctxBInfoGet () + 746
0000000007ac38ed kzctxChkTyp () + fd
0000000007ac43f0 kzctxesc () + 510
0000000002781d9d pevm_icd_call_common () + 29d
0000000002781930 pfrinstr_ICAL () + 90
0000000001a435ca pfrrun_no_tool () + 12a
0000000001a411e0 pfrrun () + 4c0
0000000001a3fb48 plsql_run () + 288
Where semsys(4, ...) is semtimedop(int semid, struct sembuf *sops, size_t nsops, const struct timespec *timeout) specified in syscall.h.
Run a small dtrace script:
$ > sudo dtrace -n \
'BEGIN {self->start_wts = walltimestamp; self->start_ts = timestamp;}
pid$target::kglpndl:entry /execname == "oracle"/ { self->rc = 1; }
pid$target::kgxExclusive:entry /execname == "oracle" && self->rc == 1/ { self->ts = timestamp; }
pid$target::kgxExclusive:return /self->ts > 0/ {
@lquant["ns"] = lquantize(timestamp - self->ts, 0, 10000, 1000);
@avgs["AVG_ns"] = avg(timestamp - self->ts);
@mins["MIN_ns"] = min(timestamp - self->ts);
@maxs["MAX_ns"] = max(timestamp - self->ts);
@sums["SUM_ms"] = sum((timestamp - self->ts)/1000000);
@counts[ustack(10, 0)] = count();
self->rc = 0; self->ts = 0;}
END { printf("Start: %Y, End: %Y, Elapsed_ms: %d\n", self->start_wts, walltimestamp, (timestamp - self->start_ts)/1000000);}
' -p 10684
dtrace: description 'BEGIN ' matched 8 probes
Start: 2017 Oct 24 14:30:02, End: 2017 Oct 24 14:31:08, Elapsed_ms: 66183
ns
value ------------- Distribution ------------- count
< 0 | 0
0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 3352394
1000 |@@@@@@@@ 803168
2000 | 11598
3000 | 1484
4000 | 890
5000 | 626
6000 | 460
7000 | 315
8000 | 265
9000 | 147
>= 10000 | 2227
AVG_ns 1999
MIN_ns 777
MAX_ns 20411473
SUM_ms 4214
oracle`kgxExclusive+0x105
oracle`kglpndl+0x1fe
oracle`kglUnPin+0x101
a.out`kzctxChkTyp+0x14e
a.out`kzctxesc+0x510
a.out`pevm_icd_call_common+0x29d
a.out`pfrinstr_ICAL+0x90
oracle`pfrrun_no_tool+0x12a
oracle`pfrrun+0x4c0
oracle`plsql_run+0x288
4173574
The above CallStack shows that kgxExclusive is triggered by kglpin via kglGetMutex.Solaris "prstat -mL" show about 30% percentage of time the process has spent sleeping (SLP).
3. Hot library cache objects
Learning from Blog: Divide and conquer the true mutex contention
KGLPIN: KGL: PIN heaps and load data pieces of an object KGLPNDL: KGL PiN DeLete KGLPNAL1: KGL PiN ALlOcate KGLHBH1 63, KGLHDGN2 106: Invalid Password, Application Context(eg: SYS_CONTEXT)"library cache: mutex X" can be allevaited by creating multiple copies of hot objects, which are controlled by two hidden parameters:
_kgl_hot_object_copies: controls the maximum number of copies _kgl_debug: marks hot library cache objects as a candidate for cloningConfigure two hidden parameters:
SQL > alter system set "_kgl_hot_object_copies"= 255 scope=spfile;
--alter system reset "_kgl_hot_object_copies" scope=spfile;
SQL > alter system set "_kgl_debug"=
"name='TEST_CTX' schema='SYS' namespace=21 debug=33554432",
"name='PLITBLM' schema='PUBLIC' namespace=1 debug=33554432"
scope=spfile;
--alter system reset "_kgl_debug" scope=spfile;
--- added in 25-Apr-2021 ---
MOS Bug 19373224 - dbms_shared_pool.unmarkhot spins if _kgl_hot_object_copies is 255 (Doc ID 19373224.8) wrote
Executing dbms_shared_pool.unmarkhot() might have entered a spin under kglget() if
_kgl_hot_object_copies was set (or derived) to a value of 255.
Workaround
Reduce the value of _kgl_hot_object_copies to 254 or below.
PUBLIC SYNONYM (namespace=1) 'PLITBLM' is added here to show multiple library cache objects can be specified in _kgl_debug.
PLITBLM is package for PLSQL Index TaBLe Mangement, i.e PLSQL Collections (Associative Arrays, Nested Table, Varrays).
All its implementations are through c interface. library cache object NAMESPACE number, NAMESPACE name and TYPE name can be listed by following queries ("_kgl_debug" and dbms_shared_pool.markhot accept number as NAMESPACE):
SQL > select distinct namespace, object_type from dba_objects v order by 1;
SQL > select distinct namespace, type# from sys.obj$ order by 1;
SQL > select distinct kglhdnsp NAMESPACE_id, kglhdnsd NAMESPACE_name from x$kglob
--where kglhdnsd in ('APP CONTEXT')
order by kglhdnsp;
SQL > select distinct kglobtyp TYPE_id, kglobtyd TYPE_name from x$kglob
--where kglobtyd in ('APP CONTEXT')
order by kglobtyp;
Run the same test:
-- Stop all Jobs
SQL > exec clean_jobs;
--Restart DB to activate Hot library cache objects
SQL> startup force
SQL > select owner, name, property, hash_value, locks, pins, locked_total, pinned_total, executions, sharable_mem, namespace, type, full_hash_value
from v$db_object_cache v
where (name in ('TEST_CTX') or hash_value in (1317011825) or property like '%HOT%');
OWNER NAME PROPERTY HASH_VALUE LOCKS PINS LOCKED_TOTAL PINNED_TOTAL EXECUTIONS SHARABLE_MEM NAMESPACE TYPE FULL_HASH_VALUE
------ ---------- ---------- ---------- ----- ---- ------------ ------------ ---------- ------------ ------------ -------- -----------------------------
SYS TEST_CTX HOT 1317011825 0 0 1 0 0 0 APP CONTEXT CURSOR 3581f5a97dfac7485a3330954e800171
SQL > exec ctx_set_jobs(4);
SQL > select sid, program, event, p1text, p1, p2text, p2, p3text, p3
from v$session where program like '%(J%';
SID PROGRAM EVENT P1TEXT P1 P2TEXT P2 P3TEXT P3
---- ------------------------ ------------------------- ------ ----------- ------ -------------- ------ -----------------
5 oracle@s5d00003 (J001) null event 0 0 0
186 oracle@s5d00003 (J004) null event 0 0 0
369 oracle@s5d00003 (J005) null event 0 0 0
902 oracle@s5d00003 (J000) null event 0 0 0
SQL > select owner, name, property, hash_value, locks, pins, locked_total, pinned_total, executions, sharable_mem, namespace, type, full_hash_value
from v$db_object_cache v
where (name in ('TEST_CTX') or hash_value in (1317011825) or property like '%HOT%');
OWNER NAME PROPERTY HASH_VALUE LOCKS PINS LOCKED_TOTAL PINNED_TOTAL EXECUTIONS SHARABLE_MEM NAMESPACE TYPE FULL_HASH_VALUE
------ ---------- ---------- ---------- ----- ---- ------------ ------------ ---------- ------------ ------------ ------------ --------------------------------
SYS TEST_CTX HOT 1317011825 0 0 1 0 0 0 APP CONTEXT CURSOR 3581f5a97dfac7485a3330954e800171
SYS TEST_CTX HOTCOPY6 1487681198 1 0 2 151394920 151394917 4096 APP CONTEXT APP CONTEXT 1047cbfbca3fd100cf5758a258ac36ae
SYS TEST_CTX HOTCOPY138 3082567164 1 0 2 151821083 151821080 4096 APP CONTEXT APP CONTEXT 31ac20c09044dfec01acd307b7bc3dfc
SYS TEST_CTX HOTCOPY187 3192676979 1 0 2 151252013 151252010 4096 APP CONTEXT APP CONTEXT 6635075a7adcc68672a02262be4c6273
SYS TEST_CTX HOTCOPY115 4198626891 1 0 2 150529629 150529626 4096 APP CONTEXT APP CONTEXT d6724a1c9f480d55f73dc8fcfa41f64b
SQL > select * from v$mutex_sleep order by sleeps desc, location;
MUTEX_TYPE LOCATION SLEEPS WAIT_TIME
-------------------------------- ---------------------------------------- ---------- ----------
Cursor Pin kkslce [KKSCHLPIN2] 2 20118
SQL > select * from v$mutex_sleep_history order by sleep_timestamp desc, location;
MUTEX_IDENTIFIER SLEEP_TIMESTAMP MUTEX_TYPE GETS SLEEPS REQUESTING_SESSION BLOCKING_SESSION LOCATION MUTEX_VALUE P1 P1RAW
---------------- -------------------- ---------- ---- ------ ------------------ ---------------- ------------------- ---------------- -- -----
2816823972 24-OCT-2017 15:09:13 Cursor Pin 1 1 183 364 kkslce [KKSCHLPIN2] 0000016C00000000 2 00
2214650983 24-OCT-2017 15:04:50 Cursor Pin 1 1 5 902 kkslce [KKSCHLPIN2] 0000038600000000 2 00
Run the same dtrace script:
$ > sudo dtrace -n \
'BEGIN {self->start_wts = walltimestamp; self->start_ts = timestamp;}
pid$target::kglpndl:entry /execname == "oracle"/ { self->rc = 1; }
pid$target::kgxExclusive:entry /execname == "oracle" && self->rc == 1/ { self->ts = timestamp; }
pid$target::kgxExclusive:return /self->ts > 0/ {
@lquant["ns"] = lquantize(timestamp - self->ts, 0, 10000, 1000);
@avgs["AVG_ns"] = avg(timestamp - self->ts);
@mins["MIN_ns"] = min(timestamp - self->ts);
@maxs["MAX_ns"] = max(timestamp - self->ts);
@sums["SUM_ms"] = sum((timestamp - self->ts)/1000000);
@counts[ustack(10, 0)] = count();
self->rc = 0; self->ts = 0;}
END { printf("Start: %Y, End: %Y, Elapsed_ms: %d\n", self->start_wts, walltimestamp, (timestamp - self->start_ts)/1000000);}
' -p 11751
Start: 2017 Oct 24 15:21:02, End: 2017 Oct 24 15:22:40, Elapsed_ms: 97999
ns
value ------------- Distribution ------------- count
< 0 | 0
0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 8050589
1000 |@@ 330902
2000 | 1106
3000 | 1606
4000 | 1352
5000 | 630
6000 | 322
7000 | 201
8000 | 133
9000 | 94
>= 10000 | 481
AVG_ns 897
MIN_ns 813
MAX_ns 315083
SUM_ms 0
oracle`kgxExclusive+0x105
oracle`kglpndl+0x1fe
oracle`kglUnPin+0x101
a.out`kzctxChkTyp+0x14e
a.out`kzctxesc+0x510
a.out`pevm_icd_call_common+0x29d
a.out`pfrinstr_ICAL+0x90
oracle`pfrrun_no_tool+0x12a
oracle`pfrrun+0x4c0
oracle`plsql_run+0x288
8387416
Solaris "prstat -mL" show almost 100% percentage of time the process has spent in user mode (USR).Try with official API in dbms_shared_pool, it seems that NAMESPACE: 'APP CONTEXT' not yet supported.
-- Stop all Jobs
SQL > exec clean_jobs;
SQL > alter system reset "_kgl_debug" scope=spfile;
--Restart DB
SQL> startup force
SQL > exec sys.dbms_shared_pool.markhot('SYS', 'TEST_CTX', 21);
--exec sys.dbms_shared_pool.unmarkhot('SYS', 'TEST_CTX', 21);
ORA-26680: object type not supported
ORA-06512: at "SYS.DBMS_SHARED_POOL", line 133
-- Using 32 Byte (16 hexadecimal) V$DB_OBJECT_CACHE.FULL_HASH_VALUE
SQL > exec sys.dbms_shared_pool.markhot(hash=>'3581f5a97dfac7485a3330954e800171', NAMESPACE=>21);
--exec sys.dbms_shared_pool.unmarkhot(hash=>'3581f5a97dfac7485a3330954e800171', NAMESPACE=>21);
ORA-26680: object type not supported
ORA-06512: at "SYS.DBMS_SHARED_POOL", line 138
Comparing "_kgl_debug" and markhot, it seems that "_kgl_debug" is persistent after DB restart, but not always stable after DB restart.
Several sessions can still contend for the same library cache objects without creating/using HOT objects.Whereas markhot seems stable after DB restart, but not always persistent after DB restart. Moreover, markhot does not support all NAMESPACEs of library cache objects.
When a SYNONYM is marked HOT, it can encounter core dump with Error:
ORA-00600: internal error code, arguments: [kgltti-no-dep1]
with CallStack:
kgltti()+1358 -> dbgeEndDDEInvocation() // ERROR SIGNALED: yes COMPONENT: LIBCACHE
kqlCompileSynonym()+3840 -> kgltti()
kqllod_new()+3768 -> kqlCompileSynonym()
kqlCallback()+79 -> kqllod_new()
kqllod()+710 -> kqlCallback()
kglobld()+1058 -> kqllod()
kglobpn()+1232 -> kglobld()
kglpim()+489 -> kglobpn()
kglpin()+1785 -> kglpim()
kglgob()+493 -> kglpin()
kgiind()+1529 -> kglgob()
pfri8_inst_spec()+126 -> kgiind()
pfri1_inst_spec()+69 -> pfri8_inst_spec()
pfrrun()+1506 -> pfri1_inst_spec()
plsql_run()+648 -> pfrrun()
--->
This error is addressed by Oracle MOS:ORA-00600 [kgltti-no-dep1] When Synonym Marked Hot (Doc ID 2153847.1)
4. V$MUTEX_SLEEP_HISTORY
V$MUTEX_SLEEP_HISTORY displays time-series data. Each row in this view is for a specific time, mutex type, location, requesting session and blocking session combination. The data in this view is contained within a circular buffer, with the most recent sleeps shown (Oracle V$MUTEX_SLEEP_HISTORY).
Two fields are documented as:
GETS The total number of gets since the mutex was created and up until the time of the wait (and from all sessions past and present) SLEEPS The number of times the requester had to sleep before obtaining the mutexIt seems that GETS is an instance-wide accumulated historized data, whereas SLEEPS is per REQUESTING_SESSION for a specific time, mutex type, location.
We can try to edit a query to reveal Mutex contention details. Column SLEEPS and GETs_Per_MS are the points to monitor.
with mutex_1 as (select /*+ materializse */ m.*
,(gets -lag(gets) over(partition by mutex_identifier, mutex_type, location order by sleep_timestamp, requesting_session)) Total_GETs_Delta
,(((sleep_timestamp -lag(sleep_timestamp) over(partition by mutex_identifier, mutex_type, location
order by sleep_timestamp, requesting_session)))) Elapsed
from v$mutex_sleep_history m)
,mutex_2 as (select /*+ materializse */ m.*
,(extract(second from Elapsed) + extract(minute from Elapsed) * 60 + extract(hour from Elapsed))*1e3 Elapsed_MS
from mutex_1 m)
,mutex as (select /*+ materializse */ m.*
,round(Total_GETs_Delta/nullif(Elapsed_MS, 0)) GETs_Per_MS
from mutex_2 m)
,oc as (select /*+ materializse */ * from v$db_object_cache where hash_value in (select mutex_identifier from mutex))
,ses as (select /*+ materializse */ s.*
,(select owner|| '.' ||object_name||case when procedure_name is not null then '.' ||procedure_name end
from dba_procedures
where object_id = s.plsql_entry_object_id and subprogram_id = s.plsql_entry_subprogram_id) plsql_entry
,(select owner||'.'||object_name||case when procedure_name is not null then '.' || procedure_name end
from dba_procedures
where object_id = s.plsql_object_id and subprogram_id = s.plsql_subprogram_id) plsql
,(select dbms_lob.substr(sql_text, 50, 1) from dba_hist_sqltext where sql_id = s.sql_id and rownum = 1) sql_text
from v$active_session_history s where sample_time <= sysdate-10/1440) -- only last 10 minutes
select m.*, o.*, s.*
from mutex m, oc o, ses s;
where m.mutex_identifier = o.hash_value(+)
and m.requesting_session = s.session_id(+)
and m.sleep_timestamp between (s.sample_time(+) - INTERVAL'1'SECOND) and s.sample_time(+)
order by m.mutex_identifier, m.mutex_type, m.location, m.gets desc, m.sleep_timestamp desc, m.requesting_session;
5. Code Path of "library cache: mutex X"
Run Application Context settings 1000 times and at the same time dtrace its SPID: 1217.
It prints out 29 "kgxExclusive" callstacks (Top 3 callstacks are listed at first). Editing them together, we can build up a small dictionary of "library cache: mutex X" to lookup different occurrences and frequencies of mutex X.
The test (on a fresh started instance) shows that 'TEST_CTX' was pinned 1000 times for 1000 executions.
1000 Application Context settings are implemented by 1000:
PinAndPrepare: kglpnp -> kglpin PinAndAllocate: kglpin -> kglpnal PinAndDelete: kglUnPin -> kglpndl
SQL > select name, locks, pins, locked_total, pinned_total, executions
from v$db_object_cache where name in ('TEST_CTX') or hash_value in (1317011825);
NAME LOCKS PINS LOCKED_TOTAL PINNED_TOTAL EXECUTIONS
---------- ----------- ----------- ------------ ------------ -----------
TEST_CTX 1 0 1 1000 999
SQL > exec ctx_set(1000, 123);
SQL > select name, locks, pins, locked_total, pinned_total, executions
from v$db_object_cache where name in ('TEST_CTX') or hash_value in (1317011825);
NAME LOCKS PINS LOCKED_TOTAL PINNED_TOTAL EXECUTIONS
---------- ----------- ----------- ------------ ------------ -----------
TEST_CTX 1 0 1 2000 1999
$ > sudo dtrace -n \
'pid$target::kgxExclusive:entry /execname == "oracle"/ { self->in = 1; }
pid$target::kgxExclusive:return /self->in > 0/ {
@counts[ustack(10, 0)] = count();
self->in = 0;}
' -p 1217
dtrace: description 'pid$target::kgxExclusive:entry ' matched 4 probes
oracle`kgxExclusive+0x105 oracle`kgxExclusive+0x105 oracle`kgxExclusive+0x105
a.out`kglpin+0x2fe a.out`kglpndl+0x1fe a.out`kglpnal+0x1ae
a.out`kglpnp+0x269 a.out`kglUnPin+0x101 a.out`kglpin+0x52f
a.out`kgiina+0x1db a.out`kzctxChkTyp+0x14e a.out`kglpnp+0x269
oracle`kgintu_named_toplevel_unit+0x39 a.out`kzctxesc+0x510 a.out`kgiina+0x1db
a.out`kzctxBInfoGet+0x746 a.out`pevm_icd_call_common+0x29d oracle`kgintu_named_toplevel_unit+0x39
a.out`kzctxChkTyp+0xfd a.out`pfrinstr_ICAL+0x90 a.out`kzctxBInfoGet+0x746
a.out`kzctxesc+0x510 oracle`pfrrun_no_tool+0x12a a.out`kzctxChkTyp+0xfd
a.out`pevm_icd_call_common+0x29d oracle`pfrrun+0x4c0 a.out`kzctxesc+0x510
a.out`pfrinstr_ICAL+0x90 oracle`plsql_run+0x288 a.out`pevm_icd_call_common+0x29d
1000 1000 1000
oracle`kgxExclusive+0x105 oracle`kgxExclusive+0x105 oracle`kgxExclusive+0x105
a.out`kglpin+0x2fe a.out`kgldtin+0x85d a.out`kglhdgn+0xc2
a.out`kglgob+0x1ed a.out`kgldti+0x63 a.out`kglLock+0x679
a.out`kgldpo0+0x4d8 a.out`kksauc+0x3b7 a.out`kglget+0x11b
a.out`kglgbo+0x95 a.out`kkscscid_auc_eval+0x176 a.out`kkspsc0+0x744
a.out`kksauc+0x888 a.out`kkscsCheckCriteria+0x691 a.out`kksParseCursor+0x74
a.out`kkscscid_auc_eval+0x176 a.out`kkscsCheckCursor+0x30a a.out`opiosq0+0x70a
a.out`kkscsCheckCriteria+0x691 a.out`kkscsSearchChildList+0x356 a.out`kpooprx+0x102
a.out`kkscsCheckCursor+0x30a a.out`kksfbc+0x916 a.out`kpoal8+0x308
a.out`kkscsSearchChildList+0x356 a.out`kkspsc0+0x9d7 oracle`opiodr+0x433
1 1 1
oracle`kgxExclusive+0x105 oracle`kgxExclusive+0x105 oracle`kgxExclusive+0x105
a.out`kglLock+0x93d a.out`kglpin+0x2fe a.out`kglpndl+0x1fe
a.out`kglget+0x11b a.out`kglgob+0x1ed a.out`kglUnPin+0x101
a.out`kglgob+0x13b a.out`kkdogty+0x3fd a.out`kksauc+0x3d0
a.out`kkdogty+0x3fd a.out`kkdot2t+0x36 a.out`kkscscid_auc_eval+0x176
a.out`kkdot2t+0x36 a.out`opibnd0+0xbe0 a.out`kkscsCheckCriteria+0x691
a.out`psdgbaa+0x791 a.out`opibnd+0x175 a.out`kkscsCheckCursor+0x30a
a.out`pevm_GBVAR+0xdc a.out`kpopbnd+0xc7 a.out`kkscsSearchChildList+0x356
a.out`pfrinstr_GBVAR+0x37 a.out`kpoal8+0x1431 a.out`kksfbc+0x916
oracle`pfrrun_no_tool+0x12a oracle`opiodr+0x433 a.out`kkspsc0+0x9d7
1 1 1
oracle`kgxExclusive+0x105 oracle`kgxExclusive+0x105 oracle`kgxExclusive+0x105
a.out`kglLock+0x93d a.out`kglati+0x6a a.out`kglLockCursor+0xf6
a.out`kglget+0x11b a.out`kksaxs+0x76b a.out`kxsGetLookupLock+0x70
a.out`kglgob+0x13b a.out`kksauc+0x1cc a.out`kkscsCheckCursor+0x198
a.out`kgldpo0+0x4d8 a.out`kkscscid_auc_eval+0x176 a.out`kkscsSearchChildList+0x356
a.out`kglgbo+0x95 a.out`kkscsCheckCriteria+0x691 a.out`kksfbc+0x916
a.out`kksauc+0x888 a.out`kkscsCheckCursor+0x30a a.out`kkspsc0+0x9d7
a.out`kkscscid_auc_eval+0x176 a.out`kkscsSearchChildList+0x356 a.out`kksParseCursor+0x74
a.out`kkscsCheckCriteria+0x691 a.out`kksfbc+0x916 a.out`opiosq0+0x70a
a.out`kkscsCheckCursor+0x30a a.out`kkspsc0+0x9d7 a.out`kpooprx+0x102
1 1 1
oracle`kgxExclusive+0x105 oracle`kgxExclusive+0x105 oracle`kgxExclusive+0x105
a.out`kglhdgn+0xc2 a.out`kglLock+0x93d a.out`kgllkdl+0x196
a.out`kglLock+0x679 a.out`kglget+0x11b a.out`kglUnLock+0xf8
a.out`kglget+0x11b a.out`kglgob+0x13b a.out`kxsUnlock+0x1b1
a.out`kglgob+0x13b a.out`kkdogty+0x3fd a.out`kxsFreeXsc+0x16e
a.out`kkdogty+0x3fd a.out`kkdot2t+0x36 a.out`kksCloseCursor+0x60a
a.out`kkdot2t+0x36 oracle`psdsbaa+0x244 a.out`opicca+0x81
a.out`opibnd0+0xbe0 a.out`pevm_SBVAR+0x123 a.out`opiclo+0x96
a.out`opibnd+0x175 a.out`pfrinstr_SBVAR+0x3a a.out`kpoclsa+0x44
a.out`kpopbnd+0xc7 oracle`pfrrun_no_tool+0x12a oracle`opiodr+0x433
1 1 1
oracle`kgxExclusive+0x105 oracle`kgxExclusive+0x105 oracle`kgxExclusive+0x105
a.out`kgllkdl+0x196 a.out`kglpnal+0x1ae a.out`kglLock+0x2aa
a.out`kglUnLock+0xf8 a.out`kglpin+0x52f a.out`kglget+0x11b
a.out`kksauc+0x3e0 a.out`kglgob+0x1ed a.out`kksaxs+0x52d
a.out`kkscscid_auc_eval+0x176 a.out`kgldpo0+0x4d8 a.out`kksauc+0x1cc
a.out`kkscsCheckCriteria+0x691 a.out`kglgbo+0x95 a.out`kkscscid_auc_eval+0x176
a.out`kkscsCheckCursor+0x30a a.out`kksauc+0x888 a.out`kkscsCheckCriteria+0x691
a.out`kkscsSearchChildList+0x356 a.out`kkscscid_auc_eval+0x176 a.out`kkscsCheckCursor+0x30a
a.out`kksfbc+0x916 a.out`kkscsCheckCriteria+0x691 a.out`kkscsSearchChildList+0x356
a.out`kkspsc0+0x9d7 a.out`kkscsCheckCursor+0x30a a.out`kksfbc+0x916
1 1 1
oracle`kgxExclusive+0x105 oracle`kgxExclusive+0x105 oracle`kgxExclusive+0x105
a.out`kglpin+0x2fe a.out`kgllkdl+0x196 a.out`kglpnal+0x1ae
a.out`kglpnp+0x269 a.out`kglUnLock+0xf8 a.out`kglpin+0x52f
a.out`kgiinb+0x5e0 a.out`kxsReleaseParentLock+0xb6 a.out`kglgob+0x1ed
a.out`pfri7_inst_body_common+0x18d a.out`kxsFreeXsc+0x181 a.out`kkdogty+0x3fd
a.out`pfri3_inst_body+0x45 a.out`kksCloseCursor+0x60a a.out`kkdot2t+0x36
oracle`pfrrun+0x685 a.out`opicca+0x81 a.out`opibnd0+0xbe0
oracle`plsql_run+0x288 a.out`opiclo+0x96 a.out`opibnd+0x175
oracle`peicnt+0x946 a.out`kpoclsa+0x44 a.out`kpopbnd+0xc7
oracle`kkxexe+0x2f3 oracle`opiodr+0x433 a.out`kpoal8+0x1431
3 1 1
oracle`kgxExclusive+0x105 oracle`kgxExclusive+0x105 oracle`kgxExclusive+0x105
a.out`kglhdgn+0x23e a.out`kglpnal+0x1ae a.out`kgllkdl+0x196
a.out`kglLock+0x679 a.out`kglpin+0x52f a.out`kss_del_cb+0x19d
a.out`kglget+0x11b a.out`kglpnp+0x269 a.out`kssdel+0xf2
a.out`kglgob+0x13b a.out`kgiinb+0x5e0 a.out`kssdch_int+0x326
a.out`kkdogty+0x3fd a.out`pfri7_inst_body_common+0x18d a.out`ksudlc+0xf6
a.out`kkdot2t+0x36 a.out`pfri3_inst_body+0x45 a.out`kss_del_cb+0x106
a.out`opibnd0+0xbe0 oracle`pfrrun+0x685 a.out`kssdel+0xf2
a.out`opibnd+0x175 oracle`plsql_run+0x288 a.out`ksupop+0x23a
a.out`kpopbnd+0xc7 oracle`peicnt+0x946 oracle`opiodr+0x4b1
1 3 1
oracle`kgxExclusive+0x105 oracle`kgxExclusive+0x105 oracle`kgxExclusive+0x105
a.out`kglnti+0x83 a.out`kglhdgn+0x23e a.out`kglpin+0x2fe
a.out`kksauc+0x7c0 a.out`kglLock+0x679 a.out`kglpnp+0x269
a.out`kkscscid_auc_eval+0x176 a.out`kglget+0x11b a.out`kgiind+0xbae
a.out`kkscsCheckCriteria+0x691 a.out`kkspsc0+0x744 a.out`pfri8_inst_spec+0x7e
a.out`kkscsCheckCursor+0x30a a.out`kksParseCursor+0x74 a.out`pfri1_inst_spec+0x45
a.out`kkscsSearchChildList+0x356 a.out`opiosq0+0x70a oracle`pfrrun+0x5e2
a.out`kksfbc+0x916 a.out`kpooprx+0x102 oracle`plsql_run+0x288
a.out`kkspsc0+0x9d7 a.out`kpoal8+0x308 oracle`peicnt+0x946
a.out`kksParseCursor+0x74 oracle`opiodr+0x433 oracle`kkxexe+0x2f3
1 1 4
oracle`kgxExclusive+0x105 oracle`kgxExclusive+0x105
a.out`kglpnal+0x1ae a.out`kglpndl+0x1fe
a.out`kglpin+0x52f a.out`kss_del_cb+0x19d
a.out`kglpnp+0x269 a.out`kssdel+0xf2
a.out`kgiind+0xbae a.out`kssdch_int+0x326
a.out`pfri8_inst_spec+0x7e a.out`ksudlc+0xf6
a.out`pfri1_inst_spec+0x45 a.out`kss_del_cb+0x106
oracle`pfrrun+0x5e2 a.out`kssdel+0xf2
oracle`plsql_run+0x288 a.out`ksupop+0x23a
oracle`peicnt+0x946 oracle`opiodr+0x4b1
4 8
6. Mutex vs. Latch
Latch is an instance-wide centralized locking mechanism, whereas Mutex is a distributed locking mechanism, directly attached on the specific shared memory data structures. That is why there exists v$latch (v$latch_children) for all Latches, whereas Mutex is exposed as V$DB_OBJECT_CACHE.hash_value. Latch is pre-defined and limited, whereas Mutex is dynamically created/released when requested.
Blog: Reducing "library cache: mutex X" concurrency with dbms_shared_pool.markhot lists top 3 differences between mutexes and latches:
– A mutex can protect a single structure, latches often protect many structures – A mutex get is about 30-35 instructions in the algorithm, compared to 150-200 instructions for a latch get – A mutex is around 16 bytes in size, compared to 112-200 bytes for a latchMutex looks about 5 times slimmer and hence hopefully proportionally faster than Latch.
Further deep discussion can be found in Blog: LATCHES, LOCKS, PINS AND MUTEXES
7. Mutex Contention Test on Linux
Run the test on Linux, the same "library cache: mutex X" contentions can be observed.
Pick one Oracle session's spid, for example, 16060.
Run Command strace, We can see continues output of lines beginning with semtimedop:
$ > strace -Tp 16060
semtimedop(20873218, {{41, -1, 0}}, 1, {0, 10000000}) = -1 EAGAIN (Resource temporarily unavailable) <0.011305>
Refer to Linux Documentations:
int semtimedop(int semid, struct sembuf *sops, size_t nsops, const struct timespec *timeout);
struct sembuf {
unsigned short sem_num; /* semaphore index in array, semaphore number */
short sem_op; /* semaphore operation */
short sem_flg; /* operation flags */
};
struct timespec {
time_t tv_sec;
long tv_nsec;
};
EAGAIN: An operation could not proceed immediately and
either IPC_NOWAIT was specified in sem_flg or the time limit specified in timeout expired.
timespec.tv_nsec = 10000000 ns (0.011305 Second) looks like CFS Scheduler Time Slice as 10ms.We can try to simulate Oracle ORA-07445 by:
kill -s SEGV 16060
We get database alert log and incident file of killed session:
-- alert log
Thu Oct 26 09:04:08 2017
Exception [type: SIGSEGV, unknown code] [ADDR:0x6400003F6A] [PC:0x7F98B7BA728A, semtimedop()+10] [exception issued by pid: 16234, uid: 100] [flags: 0x0, count: 1]
Errors in file /orabin/app/oracle/admin/c0d00445/diag/rdbms/c0d00445/c0d00445/trace/c0d00445_j001_16060.trc (incident=22697):
ORA-07445: exception encountered: core dump [semtimedop()+10] [SIGSEGV] [ADDR:0x6400003F6A] [PC:0x7F98B7BA728A] [unknown code] []
Incident details in: /orabin/app/oracle/admin/c0d00445/diag/rdbms/c0d00445/c0d00445/incident/incdir_22697/c0d00445_j001_16060_i22697.trc
-- incident file
Exception [type: SIGSEGV, unknown code] [ADDR:0x6400003F6A] [PC:0x7F98B7BA728A, semtimedop()+10]
[exception issued by pid: 16234, uid: 100] [flags: 0x0, count: 1]
Registers:
%rax: 0xfffffffffffffffc %rbx: 0x0000000000000000 %rcx: 0xffffffffffffffff
%rdx: 0x0000000000000001 %rdi: 0x00000000013e8002 %rsi: 0x00007ffc98536278
%rsp: 0x00007ffc985360a8 %rbp: 0x00007ffc985362a0 %r8: 0x0000000000002710
%r9: 0x0000000000989680 %r10: 0x00007ffc98536228 %r11: 0x0000000000000202
%r12: 0x00007ffc985364d0 %r13: 0x00000000013e8002 %r14: 0x00000001708d3998
%r15: 0x0000000000000000 %rip: 0x00007f98b7ba728a %efl: 0x0000000000000202
semtimedop()+0 (0x7f98b7ba7280) mov %rcx,%r10
semtimedop()+3 (0x7f98b7ba7283) mov $0xdc,%eax
semtimedop()+8 (0x7f98b7ba7288) syscall
> semtimedop()+10 (0x7f98b7ba728a) cmp $0xfffff001,%rax
semtimedop()+16 (0x7f98b7ba7290) jae 0x7f98b7ba7293
semtimedop()+18 (0x7f98b7ba7292) ret
semtimedop()+19 (0x7f98b7ba7293) mov 0x2a2d0e(%rip),%rcx
semtimedop()+26 (0x7f98b7ba729a) xor %edx,%edx
sslsshandler()+456 -> ssexhd() // CALL TYPE: call ERROR SIGNALED: yes COMPONENT: (null)
__sighandler() -> sslsshandler()
semtimedop()+10 -> __sighandler()
sskgpwwait()+237 -> semtimedop()
skgpwwait()+200 -> sskgpwwait()
ksliwat()+2091 -> skgpwwait()
kslwaitctx()+161 -> ksliwat()
ksfwaitctx()+32 -> kslwaitctx()
kgxWait()+1076 -> ksfwaitctx()
kgxExclusive()+741 -> kgxWait()
kglGetMutex()+170 -> kgxExclusive()
kglpndl()+503 -> kglGetMutex()
kglpnds()+81 -> kglpndl()
kglUnPin()+228 -> kglpnds()
kzctxChkTyp()+306 -> kglUnPin()
kzctxesc()+347 -> kzctxChkTyp()
pevm_icd_call_common()+411 -> kzctxesc()
pfrinstr_ICAL()+135 -> pevm_icd_call_common()
pfrrun_no_tool()+60 -> pfrinstr_ICAL()
pfrrun()+1155 -> pfrrun_no_tool()
plsql_run()+708 -> pfrrun()
8. No Read Consistency in Application Context (Addendum 2017.06.06)
Modify above local context as global context, and start a job to update context value by:
exec clean_jobs;
create or replace context test_ctx using test_ctx_pkg accessed globally;
create or replace procedure ctx_set_global(p_cnt number) as
begin
for i in 1..p_cnt loop
test_ctx_pkg.set_val(i);
end loop;
end;
/
create or replace procedure ctx_set_jobs_global as
l_job_id pls_integer;
begin
dbms_job.submit(l_job_id, 'begin while true loop ctx_set_global(1000000); end loop; end;');
commit;
end;
/
exec ctx_set_jobs_global;
And then query systimestamp and global context by:
alter session set nls_timestamp_tz_format ='yyyy-mon-dd hh24:mi:ss.ff tzr tzd';
column name format a8
column min_time format a35
column max_time format a35
column min_ctx format a10
column max_ctx format a10
-- grace period for job start
exec dbms_lock.sleep(10);
with sq_time_1 as (select systimestamp tm from dual connect by level <= 10000),
sq_ctx_1 as (select sys_context('test_ctx', 'attr') ctx from dual connect by level <= 100),
sq_time_2 as (select systimestamp tm from dual connect by level <= 10000),
sq_ctx_2 as (select sys_context('test_ctx', 'attr') ctx from dual connect by level <= 100)
select 'Snap_1' name, min(tm) min_time, max(tm) max_time, min(ctx) min_ctx, max(ctx) max_ctx from sq_time_1, sq_ctx_1
union all
select 'Snap_2' name, min(tm) min_time, max(tm) max_time, min(ctx) min_ctx, max(ctx) max_ctx from sq_time_2, sq_ctx_2;
NAME MIN_TIME MAX_TIME MIN_CTX MAX_CTX
-------- ----------------------------------- ----------------------------------- ---------- ----------
Snap_1 2017-jun-06 08:05:09.625790 +02:00 2017-jun-06 08:05:09.625790 +02:00 263007 263007
Snap_2 2017-jun-06 08:05:09.625790 +02:00 2017-jun-06 08:05:09.625790 +02:00 362922 362922
exec dbms_lock.sleep(4);
with sq_ctx_1 as (select sys_context('test_ctx', 'attr') ctx from dual connect by level <= 100),
sq_time_1 as (select systimestamp tm from dual connect by level <= 10000)
select 'Snap_1' name, min(tm) min_time, max(tm) max_time, min(ctx) min_ctx, max(ctx) max_ctx from sq_time_1, sq_ctx_1
union all
select 'Snap_2' name, systimestamp, systimestamp, min(sys_context('test_ctx', 'attr')), max(sys_context('test_ctx', 'attr'))
from dual connect by level <= 100;
NAME MIN_TIME MAX_TIME MIN_CTX MAX_CTX
-------- ----------------------------------- ----------------------------------- ---------- ----------
Snap_1 2017-jun-06 08:05:17.354265 +02:00 2017-jun-06 08:05:17.354265 +02:00 693574 693574
Snap_2 2017-jun-06 08:05:17.354265 +02:00 2017-jun-06 08:05:17.354265 +02:00 794397 794396
The output shows that systimestamp satisfies Read Consistency, but global context has two or more different values,
hence global context does not guarantee Statement-Level Read Consistency.Even for local context, Blog: Gotcha: Application Contexts demonstrated Non Read Consistency.
9. Test Code
create or replace context test_ctx using test_ctx_pkg;
create or replace package test_ctx_pkg is
procedure set_val (val number);
end;
/
create or replace package body test_ctx_pkg is
procedure set_val (val number) as
begin
dbms_session.set_context('test_ctx', 'attr', val);
end;
end;
/
create or replace procedure ctx_set(p_cnt number, val number) as
begin
for i in 1..p_cnt loop
test_ctx_pkg.set_val(val); -- 'library cache: mutex X' on TEST_CTX
end loop;
end;
/
create or replace procedure ctx_set_jobs(p_job_cnt number) as
l_job_id pls_integer;
begin
for i in 1.. p_job_cnt loop
dbms_job.submit(l_job_id, 'begin while true loop ctx_set(100000, '||i||'); end loop; end;');
end loop;
commit;
end;
/
create or replace procedure clean_jobs as
begin
for c in (select job from dba_jobs) loop
begin
dbms_job.remove (c.job);
exception when others then null;
end;
commit;
end loop;
for c in (select d.job, d.sid, (select serial# from v$session where sid = d.sid) ser
from dba_jobs_running d) loop
begin
execute immediate
'alter system kill session '''|| c.sid|| ',' || c.ser|| ''' immediate';
dbms_job.remove (c.job);
exception when others then null;
end;
commit;
end loop;
-- select * from dba_jobs;
-- select * from dba_jobs_running;
end;
/
/**
exec ctx_set_jobs(4);
exec clean_jobs;
column program format a20
column event format a25
column p1text format a6
column p1 format 9999999999
column p2text format a6
column p2 format 9999999999999
column p3text format a6
column p3 format 9999999999999999
select sid, program, event, p1text, p1, p2text, p2, p3text, p3
from v$session where program like '%(J%';
SID PROGRAM EVENT P1TEXT P1 P2TEXT P2 P3TEXT P3
----- -------------------- ------------------------- ------ ----------- ------ -------------- ------ -----------------
38 oracle@testdb (J003) library cache: mutex X idn 1317011825 value 3968549781504 where 9041305591414788
890 oracle@testdb (J000) library cache: mutex X idn 1317011825 value 163208757248 where 9041305591414874
924 oracle@testdb (J001) library cache: mutex X idn 1317011825 value 4556960301056 where 9041305591414874
1061 oracle@testdb (J002) library cache: mutex X idn 1317011825 value 3968549781504 where 9041305591414879
column name format a10
column namespace format a15
column type format a15
set numformat 9999999999
select name, namespace, type, hash_value, locks, pins, locked_total, pinned_total
from v$db_object_cache where hash_value in (1317011825);
NAME NAMESPACE TYPE HASH_VALUE LOCKS PINS LOCKED_TOTAL PINNED_TOTAL
---------- --------------- --------------- ----------- ----------- ----------- ------------ ------------
TEST_CTX APP CONTEXT APP CONTEXT 1317011825 4 0 4 257802287
**/