(I)-Tracing Methods (II)-Object Type(Part-1) (III)-Object Type(Part-2) (IV)-Sql Executions (V)-Contentions and Scalability
This Blog tests the scalability of the row cache objects discussed in the previous 4 Blogs:
1. Oracle row cache objects Event: 10222, Dtrace Script (I)
2. Row Cache Objects, Row Cache Latch on Object Type: Plsql vs Java Call (Part-1) (II)
3. Row Cache Objects, Row Cache Latch on Object Type: Plsql vs Java Call (Part-2) (III)
4. Row Cache and Sql Executions (IV)
and then attempts to build a mathematical model to assess such a system.
Empirical data and model prediction demonstrate that such a system can hardly scale with the number of CPUs because of Latch contentions.
In a pure row cache object system, theoretically 3 Processors (CPUs), and experimentally 9 Processors, are needed to (almost) achieve the maximum throughput. Extra Processors only drive up contentions.
Possible alternatives are to use Oracle basic (built-in) data types or object string serialization.
Finally, we try to show that the "row cache objects" Latch suffers the same fate as the single "Result Cache: RC Latch" discussed in Blog: PL/SQL Function Result Cache Invalidation (I).
As a by-product of this exercise, "library cache: mutex X" contentions related to Oracle Object Types are observed.
Note: All tests are done in Oracle 12.1.0.2.0 on Solaris, Linux, AIX (SMT 4, LCPU=24) with 6 physical processors.
1. Test
We will make the test based on the code of Blog: Row Cache Objects, Row Cache Latch on Object Type: Plsql vs Java Call (Part-1) (II)
1.1. Plsql Test
Run (see appended Test Code: Plsql):
exec foo_proc_sessions_ctrl(1e9, 10);
which performs a dynamic Plsql Call (foo_proc) for 9 different parallel degrees, each of which runs for 10 minutes. Then collect the test data with the query:
select * from exec_stats_v;
Table below shows number of parallel (concurrent) sessions, total number of executions, cpu_time in seconds, concurrency_wait_time in seconds,
and microseconds per execution.
exec_stats_v (Plsql SQL_ID: 241pfd82cu4wj)
SESSIONS EXECUTIONS CPU_TIME_S CONCURRENCY_WAIT_TIME_S US_PER_EXEC
-------- ---------- ---------- ----------------------- -----------
Solaris
1 8,096,318 413 0 51
3 21,850,342 1,212 45 57
6 32,609,592 1,923 520 75
12 42,217,169 2,555 1,693 125
18 42,890,535 2,615 3,548 193
24 42,955,245 2,631 5,667 265
36 43,117,305 2,633 9,372 406
42 42,523,671 2,616 11,551 480
48 40,546,142 2,506 13,915 561
AIX
1 6,535,357 251 0 65
3 17,355,458 714 75 74
6 21,327,093 1,069 664 117
12 24,797,761 1,353 1,814 198
18 25,037,784 1,740 2,992 297
24 20,691,357 2,597 2,537 542
36 19,048,665 2,945 7,872 963
42 19,199,984 2,872 11,683 1,132
48 19,214,011 2,815 15,315 1,306
Linux
1 13,992,932 413 0 30
3 32,715,487 1,216 44 39
6 52,665,062 2,222 316 48
12 49,743,420 2,146 2,321 98
18 50,448,264 2,249 4,525 149
24 50,836,921 2,340 6,895 201
36 51,864,133 2,458 12,002 307
42 48,914,411 2,495 14,942 390
48 48,549,375 2,535 17,989 460
The above result shows:
Throughput (EXECUTIONS) climbs approximately linearly from 1 to 3 SESSIONS; at 6 SESSIONS it tends to flatten or descend. Max throughput is achieved with around 9 parallel Sessions; performance is saturated beyond 12 parallel Sessions.
The response time per execution (US_PER_EXEC) increases, probably due to the Latch contentions (sessions spend more time on Latch Gets instead of real work).
During the test, we also collect the data for Row Cache Object Gets and Row Cache Latch Gets.
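The saturation can be quantified by normalizing each throughput against the single-session run; a minimal arithmetic sketch using the Solaris EXECUTIONS column above:

```python
# Speedup of the Solaris Plsql test relative to 1 session (EXECUTIONS column above).
executions = {1: 8_096_318, 3: 21_850_342, 6: 32_609_592,
              12: 42_217_169, 24: 42_955_245, 48: 40_546_142}
base = executions[1]
speedup = {n: round(e / base, 2) for n, e in executions.items()}
print(speedup)
# 3 sessions scale near-linearly (~2.7x), 12 to 48 sessions stay flat around 5x,
# and 48 sessions even regress below the 24-session throughput.
```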
Here is the output for parallel sessions 12 and 42.
select * from rco_stats_v where sessions in (12, 42) order by child#;
rco_stats_v
SESSIONS CHILD# RC_PARAMETER GETS MISSES SLEEPS SPIN_GETS WAIT_TIME_S RC_GETS RC_GETMISSES RC_THEORY_GETS CACHE#_LIST PARAMETER_LIST
-------- ------ ---------------- --------- -------- ------ --------- ----------- --------- ------------ -------------- ----------- ----------------------------
12 8 dc_users 633984132 19649121 52958 19596500 262 211328141 0 1901952396 7, 10 dc_users, dc_users
12 9 dc_object_grants 633984845 54784381 58712 54726007 292 211328352 0 1901954535 8, 8 dc_objects, dc_object_grants
12 10 dc_global_oids 633982830 52451846 68813 52383487 346 211327687 0 1901948490 17 dc_global_oids
42 8 dc_users 638662605 22653084 56234 22597401 2274 212887753 47 1915987815 7, 10 dc_users, dc_users
42 9 dc_object_grants 638674026 56604969 65236 56540334 3006 212891172 552 1916022078 8, 8 dc_objects, dc_object_grants
42 10 dc_global_oids 638657482 64387264 76460 64311564 3905 212885997 30 1915972446 17 dc_global_oids
Look at the case of 42 SESSIONS: although the Latch Gets of all 3 children are similar, dc_global_oids suffers most, then dc_object_grants, and dc_users least.
This is exposed by the different MISSES, SLEEPS, SPIN_GETS, WAIT_TIME values. We will use this info in the later causality modeling. Note that Latch GETS is 3 times RC_GETS.
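The 3-to-1 relation between Latch GETS and RC_GETS can be checked directly from the dc_users row for 12 SESSIONS above:

```python
# dc_users, 12 SESSIONS: each row cache get costs ~3 latch gets.
latch_gets = 633_984_132   # GETS
rc_gets = 211_328_141      # RC_GETS
print(round(latch_gets / rc_gets, 4))    # ~3.0
# RC_THEORY_GETS in rco_stats_v is defined as 3 * latch GETS:
print(3 * latch_gets)                    # 1901952396, matching RC_THEORY_GETS
```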
1.2. Java Test
Run appended shell script:
ksh ./foo_proc_sessions_java 600
which invokes Java (RCOObjTypeJDBC1) to call foo_proc for 9 different parallel degrees, each of which runs for 10 minutes, same as the Plsql Test above. Then collect the same kind of data for the Java Call as in the Table below.
exec_stats_v (Java SQL_ID: 8nkt65d7hcnxz)
SESSIONS EXECUTIONS CPU_TIME_S CONCURRENCY_WAIT_TIME_S US_PER_EXEC
-------- ---------- ---------- ----------------------- -----------
Solaris
1 3,790,741 294 0 77
3 9,154,792 774 112 99
6 13,334,385 1,213 631 149
12 16,216,973 1,565 1,711 264
18 16,971,173 1,720 2,622 400
24 17,062,335 1,745 3,962 551
36 16,616,651 1,735 6,995 925
42 16,492,514 1,733 10,003 1,117
48 16,264,599 1,721 13,106 1,323
AIX
1 3,288,798 166 0 86
3 8,965,468 498 39 99
6 11,661,246 701 221 148
12 14,770,591 1,186 972 274
18 14,791,463 1,715 1,534 459
24 13,030,697 2,220 1,727 749
36 12,096,987 2,211 5,475 1,173
42 11,845,242 2,090 8,166 1,388
48 11,732,373 2,080 11,252 1,661
Linux
1 4,788,119 197 0 50
3 12,325,022 647 22 66
6 20,053,654 1,233 261 91
12 20,646,925 1,050 1,518 156
18 20,118,792 1,016 3,261 257
24 19,740,172 1,148 5,525 393
36 19,285,554 1,229 10,114 662
42 19,069,527 1,226 12,541 806
48 18,363,626 1,194 15,031 978
The result shows that throughput (EXECUTIONS) is about 2 times lower than Plsql, and response time (US_PER_EXEC) is about 2 times higher than Plsql.
In Blog: Row Cache Objects, Row Cache Latch on Object Type: Plsql vs Java Call (Part-1) (II), we saw that each dynamic Plsql foo execution requires 45 Row Cache Latch Gets, whereas Java needs 63. That is about 40% more Latch Gets in Java, yet its throughput is about 2 times lower than Plsql.
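The two observations can be put side by side with a quick calculation (Solaris, single-session figures from both tables):

```python
# Plsql vs Java on Solaris at SESSIONS=1.
plsql_execs, java_execs = 8_096_318, 3_790_741   # EXECUTIONS
plsql_latch, java_latch = 45, 63                 # Row Cache Latch Gets per execution
print(round(plsql_execs / java_execs, 2))        # ~2.14: Java throughput ~2x lower
print(round(java_latch / plsql_latch - 1, 2))    # 0.4: only 40% more latch gets
# The throughput gap is therefore not explained by the latch-get count alone.
```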
2. Modeling
Let's try to model the 3 Row Cache Object (CID: 17, 11, 7) Gets and their respective Latch Gets, each of which goes through 3 latch Locations ("Where").
In the following discussion, we model:
Oracle Session as Job
Row Cache Object Get as Task
Task Processing Server as Machine
Suppose we have one Workshop W, and n Job Generators:
G = (G1, G2, ..., Gn)
each of which holds one single Job at any instant (one terminated, a new one produced).
Each Job is made of 3 Tasks (sub-Jobs):
J_i = (Si_1, Si_2, Si_3)
Each Task consists of 3 Work Units:
Si_1 = (Si_1_u1, Si_1_u2, Si_1_u3)
Si_2 = (Si_2_u1, Si_2_u2, Si_2_u3)
Si_3 = (Si_3_u1, Si_3_u2, Si_3_u3)
They are subject to constraints:
All 3 Tasks in each Job have to be processed sequentially.
All 3 Work Units in each Task have to be processed sequentially.
The Tasks and Work Units among different Jobs can run in parallel.
The Workshop is equipped with an assembly line, which consists of 3 Machines (Processors):
W = (M1, M2, M3)
Each Machine possesses 3 Processing Units:
M1 = (p1_1, p1_2, p1_3)
M2 = (p2_1, p2_2, p2_3)
M3 = (p3_1, p3_2, p3_3)
The 3 Machines are dedicated respectively to the 3 Tasks:
M1 exclusively processes Si_1
M2 exclusively processes Si_2
M3 exclusively processes Si_3
M1, M2, M3 run in parallel (inter-parallel); but each Machine's 3 Processing Units hardly run in parallel (no intra-parallel).
The service times of the 3 Machines are:
t1 for M1 to process Si_1
t2 for M2 to process Si_2
t3 for M3 to process Si_3
So the minimum processing time of each Job is (t1 + t2 + t3).
Let's look at the processing of the first n Jobs.
Assume t1 < t2 < t3 (later empirical data support this presumption). After the i-th Job has been processed by M1, there are
(n-i)*(t2-t1)/t1
Jobs waiting before M2; and there are
(n-i)*(t3-t2)/t2
Jobs waiting before M3 when the i-th Job is being processed by M3.
So M3 processing the i-th Job causes a delay of
(t3-t2)
for
(n-i)*(t3-t2)/t2
Jobs; that means the accumulated delay caused by the i-th Job (J_i) is:
(t3-t2) * (n-i) * (t3-t2)/t2    for 1 <= i <= n
The total accumulated waiting time before M3 for processing all n Jobs is:
(t3-t2) * (n*(n-1)/2) * (t3-t2)/t2 = n*(n-1) * (t3-t2)^2 / (2 * t2)
In the case t1 < t2 < t3, all Jobs wait before M3. If we only consider the fastest machine M1 and the slowest machine M3, i.e. min(t1, t2, t3) = t1 and max(t1, t2, t3) = t3, where M1 is the first machine and M3 the last one, both running in parallel, the accumulated waiting time inside the whole Workshop can be approximately estimated as:
n*(n-1) * (t3-t1)^2 / (2 * t1)
The average waiting time per Job is then:
(n-1) * (t3-t1)^2 / (2 * t1)
Putting it all together, the average response time can be expressed as:
avg_response_time (rt) = waiting_time (wt) + service_time (st) = (n-1) * (t3-t1)^2 / (2 * t1) + t3
throughput = 1 / rt <= 1 / t3
Therefore,
the total response time is quadratic in n;
the average response time is linear in n, which is similar to the test result;
the maximum throughput is 1 / t3 (reached when there is no waiting).
Take the example 10222 trace file of Solaris shown in the next section:
t1 = 255 us for cid = 17
t2 = 324 us for cid = 11
t3 = 410 us for cid = 7 (covered by cid = 11)
t1 + t2 + t3 = 989 (255+324+410) = sum
The time collected in the 10222 trace is much bigger than the Solaris US_PER_EXEC for one single session (SESSIONS=1), probably because of the overhead of 10222 tracing. So we first convert them to the numbers without tracing (all are commented after "->"):
US_PER_EXEC = 51 -- Solaris SESSIONS=1
t1 + t2 + t3 = 989 (255+324+410) = sum -> 51
t1 = 255 -> 51 * 255/989 = 13
t2 = 324 -> 51 * 324/989 = 17
t3 = 410 -> 51 * 410/989 = 21
Now we can make the calculation for 12 SESSIONS (n=12). The accumulated waiting time inside the entire Workshop is:
n*(n-1) * (t3-t1)^2 / (2 * t1) = 12*11*(21-13)*(21-13)/(2*13) = 324
Average waiting time for each Job:
(n-1) * (t3-t1)^2 / (2 * t1) = 11*(21-13)*(21-13)/(2*13) = 27
Average response time for each Job:
27 + 21 = 48
In the above discussion, Job is modelled by 2 Layers:
Task
Work Unit
Workshop is also modelled by 2 Layers correspondingly:
Machine
Processing Units
Till now, we have only assessed the first Layer. The second Layer has not yet been taken into account:
Work Units
Processing Units
Another deficiency is that there are n Job Generators, and each Generator produces the next Job once the previous Job has terminated (boundary condition), so there are in total always n Jobs in the system; but the above model only considers one round of processing the first n Jobs.
Back to Row Cache: the model is still too coarse to reflect the real system, because we only consider inter-parallel, but no intra-parallel. This is not precise since intra-parallel is related to the 2nd and 3rd latch Get Locations, whereas inter-parallel is related to the 1st latch Get Location.
Besides that, the real system is influenced by Latch timeout, spinning, process Preemption, Multithreading, and hardware specialities (e.g. Solaris LWP, AIX SMT).
Note also that no deadlock can occur in such a system, since every Job acquires the Machines strictly in the same order (M1, M2, M3).
3. Model Justification
To verify the model, 3 parameters t1, t2, t3 have to be acquired.
Run foo_proc (see Blog: Row Cache Objects, Row Cache Latch on Object Type: Plsql vs Java Call (Part-1) (II)) with 10222 trace in an idle system.
Then excerpt all 3 Cache Gets (CID: 17, 11, 7) for one Object instance Get, for example T_OBJ_OUT (this is only illustrative; in practice, more representative data should be captured).
Here is an example from Solaris (irrelevant lines removed):
(12cR1 10222 trace is hard to read, 12cR2 improved. see Blog: Oracle row cache objects Event: 10222, Dtrace Script (I))
--================ Start T_OBJ_OUT Get ================--
kqrfrpo : freed to fixed free list po=175d2f708 time=1459184662
##********* cid = 17, Start time=1459184662 *********--
kqrpad: new po 17727bac0 from kqrpre1.1
kqrReadFromDB : kqrpre1.1 po=17727bac0 flg=8000 cid=17 eq=1753067e0 idx=0 dsflg=0
kqrpre1 : done po=17727bac0 cid=17 flg=2 hash=3c4bb9d0 0 eq=1753067e0 SQL=begin :1 := foo(:2, :3, :4); end; time=1459184804
##------- cid = 17, End time=1459184917, Elapsed = 255 (1459184917-1459184662) -------
kqrfrpo : freed to fixed free list po=17727bac0 time=1459184917
##--********* cid = 11, Start time=1459184917, including cid = 7 *********--
kqrpad: new po 175d2f708 from kqrpre1.1
kqrReadFromDB : kqrpre1.3 po=175d2f708 flg=8000 cid=11 eq=1753067e0 idx=0 dsflg=0
kqrpre1 : done po=175d2f708 cid=11 flg=2 hash=94b841cf a9461655 eq=1753067e0
obobn=2360170 obname=T_OBJ_OUT obtyp=13 obsta=1 obflg=0 SQL=begin :1 := foo(:2, :3, :4); end; time=1459185132
##--********* cid = 7, Start time=1459185132 *********--
kqrpad: new po 170d94488 from kqrpre1.1
kqrReadFromDB : kqrpre1.3 po=170d94488 flg=8000 cid=7 eq=176e410b8 idx=0 dsflg=0
kqrpre1 : done po=170d94488 cid=7 flg=2 hash=de7751cd 395edb55 eq=176e410b8
SQL=begin :1 := foo(:2, :3, :4); end; time=1459185415
##------- cid = 7, End time=1459185542, Elapsed = 410 (1459185542-1459185132) -------
kqrfrpo : freed to heap po=170d94488 time=1459185542
kqrmupin : kqrpspr2 Unpin po 175d2f708 cid=11 flg=2 hash=a9461655 time=1459185556
##------- cid = 11, End time=1459185651, Elapsed = 734 (1459185651-1459184917), Pure = 324 (734-410) -------
kqrfrpo : freed to fixed free list po=175d2f708 time=1459185651
--================ End T_OBJ_OUT Get ================--
From that example output from Solaris (irrelevant lines removed; AIX and Linux are added for comparison), the values for the 3 Cache Gets are captured ("time" is in microseconds), then converted into values without 10222 tracing (all prefixed by "->") according to the value for SESSIONS=1 (no concurrency) in Table-1.
----------------------- Solaris -----------------------
US_PER_EXEC = 51 -- Table-1 Solaris SESSIONS=1
t1 = 255 us -> 13 for cid = 17
t2 = 324 us -> 17 for cid = 11
t3 = 410 us -> 21 for cid = 7 (covered by cid = 11)
total Elapsed = 989 us (255+324+410) -> 51
----------------------- AIX -----------------------
US_PER_EXEC = 65 -- Table-1 AIX SESSIONS=1
t1 = 143 us -> 9 for cid = 17
t2 = 186 us -> 12 for cid = 11
t3 = 252 us -> 16 for cid = 7 (covered by cid = 11)
total Elapsed = 581 us (143+186+252) -> 65
----------------------- Linux -----------------------
US_PER_EXEC = 30 -- Table-1 Linux SESSIONS=1
t1 = 136 us -> 8 for cid = 17
t2 = 164 us -> 9 for cid = 11
t3 = 229 us -> 13 for cid = 7 (covered by cid = 11)
total Elapsed = 529 us (136+164+229) -> 30
Using the above average response time formula:
(n-1) * (t3-t1)^2 / (2 * t1) + t3
Substitute all variables and run the following queries (for SESSIONS varying from 1 to 48); we get the average response time in microseconds (us) per execution: MODEL_US_PER_EXEC. Both test data and model data are shown in the Table below.
-------- Solaris --------
select * from (
  select level sessions, round((level-1)*(21-13)*(21-13)/(2*13)) + 21 model_us_per_exec
  from dual connect by level < 100
) where sessions in (1, 3, 6, 12, 18, 24, 36, 42, 48);
-------- AIX --------
select * from (
  select level sessions, round((level-1)*(16-9)*(16-9)/(2*9)) + 16 model_us_per_exec
  from dual connect by level < 100
) where sessions in (1, 3, 6, 12, 18, 24, 36, 42, 48);
-------- Linux --------
select * from (
  select level sessions, round((level-1)*(13-8)*(13-8)/(2*8)) + 13 model_us_per_exec
  from dual connect by level < 100
) where sessions in (1, 3, 6, 12, 18, 24, 36, 42, 48);
All tests are done in Oracle 12.1.0.2.0 on Solaris, Linux, AIX (SMT 4, LCPU=24) with 6 physical processors. Linux and AIX are added for comparison.
Parallel | Test | Test | Model
SESSIONS | EXECUTIONS | US_PER_EXEC | US_PER_EXEC
---------| ---------- | ----------- | -----------
Solaris | | |
1 | 8,096,318 | 51 | 21
3 | 21,850,342 | 57 | 26
6 | 32,609,592 | 75 | 33
12 | 42,217,169 | 125 | 48
18 | 42,890,535 | 193 | 63
24 | 42,955,245 | 265 | 78
36 | 43,117,305 | 406 | 107
42 | 42,523,671 | 480 | 122
48 | 40,546,142 | 561 | 137
---------| ---------- | ----------- | -----------
AIX | | |
1 | 6,535,357 | 65 | 16
3 | 17,355,458 | 74 | 21
6 | 21,327,093 | 117 | 30
12 | 24,797,761 | 198 | 46
18 | 25,037,784 | 297 | 62
24 | 20,691,357 | 542 | 79
36 | 19,048,665 | 963 | 111
42 | 19,199,984 | 1,132 | 128
48 | 19,214,011 | 1,306 | 144
---------| ---------- | ----------- | -----------
Linux | | |
1 | 13,992,932 | 30 | 13
3 | 32,715,487 | 39 | 16
6 | 52,665,062 | 48 | 21
12 | 49,743,420 | 98 | 30
18 | 50,448,264 | 149 | 40
24 | 50,836,921 | 201 | 49
36 | 51,864,133 | 307 | 68
42 | 48,914,411 | 390 | 77
48 | 48,549,375 | 460 | 86
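The MODEL_US_PER_EXEC column produced by the three SQL queries can be cross-checked with a short script (same formula, same rounding):

```python
# round(average waiting) + t3 for each platform, as in the connect-by queries above.
def model_us_per_exec(n, t1, t3):
    return round((n - 1) * (t3 - t1) ** 2 / (2 * t1)) + t3

sessions = [1, 3, 6, 12, 18, 24, 36, 42, 48]
for name, (t1, t3) in [("Solaris", (13, 21)), ("AIX", (9, 16)), ("Linux", (8, 13))]:
    print(name, [model_us_per_exec(n, t1, t3) for n in sessions])
# Solaris [21, 26, 33, 48, 63, 78, 107, 122, 137]
# AIX     [16, 21, 30, 46, 62, 79, 111, 128, 144]
# Linux   [13, 16, 21, 30, 40, 49, 68, 77, 86]
```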
Now it is open to the reader to judge the model by comparing the empirical observations with the model-predicted values, and to inspect its capability of extrapolation.
4. Row Cache Objects Latch: dedicated
Following query shows that there exists a 1-to-1 mapping between Row Cache and Row Cache Latch. Each Cache is protected by one single dedicated Latch. Hence accessing Row Cache is serialized by its designated Latch.
create view sys.x_kqrst as select * from x$kqrst;
select count(distinct latch_child), count(distinct cache#)
from (
select la.child# latch_child, kqrstcid cache#, kqrsttxt parameter, dc.*, la.*
from sys.x_kqrst dc, v$latch_children la
where dc.kqrstcln = la.child# and la.name='row cache objects'
)
where kqrsttyp =1
order by latch_child;
COUNT(DISTINCTLATCH_CHILD) COUNT(DISTINCTCACHE#)
-------------------------- ---------------------
56 56
select *
from (
select la.child# latch_child, kqrstcid cache#, kqrsttxt parameter, kqrsttyp --, dc.*, la.*
from sys.x_kqrst dc, v$latch_children la
where dc.kqrstcln = la.child# and la.name='row cache objects'
)
where kqrsttyp =1 -- parent
order by latch_child;
LATCH_CHILD CACHE# PARAMETER KQRSTTYP
----------- ------- --------------------- ---------
8 10 dc_users 1
9 8 dc_objects 1
10 17 dc_global_oids 1
18 15 dc_props 1
For example, to access Object Type T_OBJ_OUT, we have to sequentially go through 3 Caches:
dc_global_oids
dc_objects
dc_users
each of which is guarded by its responsible Child Latch: 10, 9, 8.
It looks like the "row cache objects" Latch works similarly to the single "Result Cache: RC Latch" discussed in Blog: PL/SQL Function Result Cache Invalidation (I). The difference is that the latter is optional and can be turned off, whereas the former is an unavoidable obstacle.
At this point, we can take a further look at the other Oracle Latches. Probably only the cache buffers chains Latch is conceived to be scalable (notwithstanding the "latch: cache buffers chains" Wait Event), since each of its children has a limited responsibility and they amount to the majority of all Latches (> 75%). Any excessive usage of other Latches is rated as irrational, and hence leads to contentions.
select l.name, count(*) cnt, round(100*ratio_to_report(count(*)) over (), 2) ratio
from v$latch l, v$latch_children k
where l.latch# = k.latch#(+)
-- and l.name in ('cache buffers chains', 'row cache objects', 'shared pool', 'Result Cache: RC Latch', 'kokc descriptor allocation latch')
-- and lower(l.name) like '%kokc%'
group by l.name order by cnt desc, l.name;
NAME CNT RATIO
--------------------------------- ------ ------
cache buffers chains 32768 75.38
simulator hash latch 2048 4.71
...
row cache objects 56 0.13
...
shared pool 7 0.02
...
Result Cache: RC Latch 1 0
...
kokc descriptor allocation latch 1 0
770 rows selected.
(-------------- cache buffers chains --------------
Hidden Parameters and default values:
_db_block_hash_latches=32768   Number of database block hash latches
_db_block_hash_buckets=1048576 Number of database block hash buckets
1048576/32768 = 32 buckets per latch
Blog: Hot Block Identification and latch: cache buffers chains
-------------- simulator hash latch --------------
Blog: Row Cache and Sql Executions (IV)
-------------- row cache objects --------------
"Current Location" and Blog: nls_database_parameters, dc_props, latch: row cache objects
-------------- shared pool --------------
Blog: sql_id and idn to shared pool subpool
-------------- Result Cache: RC Latch --------------
Blog: Result Cache:0 rows updated Invalidations (IV)
-------------- kokc descriptor allocation latch --------------
Bug 14382262 latch free wait event in "kokc descriptor allocation latch"
Blog: dbms_aq.dequeue - latch: row cache objects on AIX
Blog: Shared Pool - KKSSP
)
5. AWR/ASH - Row Cache Objects
We collect AWR/ASH reports for one 10-minute Run of the Plsql Test with 42 parallel sessions, and extract all Row Cache Objects related Sections.
5.1. Load Profile
DB CPU(s) Per Second: 5.9 shows that all 6 CPUs are busy (CPU intensive), and about 100*(1-5.9/66.2) = 91% of DB Time is spent waiting.
Load Profile
Per Second | Per Transaction | Per Exec | Per Call | |
DB Time(s): | 66.2 | 373.6 | 0 | 195.01 |
DB CPU(s): | 5.9 | 33.2 | 0 | 17.34 |
5.2. Top 10 Foreground Events by Total Wait Time
23% of DB time is spent waiting on Event latch: row cache objects, with Wait Avg(ms) 46.85 (probably bigger than the 10 ms CPU scheduling time slice).
9.6% of DB time is spent waiting on Event library cache: mutex X, with Wait Avg(ms) 10.49.
Top 10 Foreground Events by Total Wait Time
Event | Waits | Total Wait Time (sec) | Wait Avg(ms) | % DB time |
latch: row cache objects | 196,081 | 9185.8 | 46.85 | 23 |
library cache: mutex X | 367,581 | 3856 | 10.49 | 9.6 |
DB CPU | 3554.8 | 8.9 |
5.3. SQL ordered by Elapsed Time
SQL_ID: 241pfd82cu4wj has 42,533,404 Executions with Elapsed Time 20,423.63 seconds, with no IO (%IO = 0) and only 12.81% on CPU.
SQL ordered by Elapsed Time
Elapsed Time (s) | Executions | Elapsed Time per Exec (s) | %Total | %CPU | %IO | SQL Id | SQL Text |
24,895.60 | 42 | 592.75 | 62.27 | 14.26 | 0 | 22thkvvjndnmj | DECLARE job BINARY_INTEGER := ... |
20,423.63 | 42,533,404 | 0 | 51.09 | 12.81 | 0 | 241pfd82cu4wj | begin :1 := foo(:2, :3, :4); e... |
5.4. Latch Activity and Latch Sleep Breakdown
Total Latch Gets is 1,916,333,412, with 197,952 Sleeps and about 7.5% Latch Miss (Misses/Gets = 143,650,808/1,916,333,412). The 197,952 Sleeps match the Latch Waits in Section "Top 10 Foreground Events by Total Wait Time".
Looking at the rco_stats_v table for 42 SESSIONS, the GETS of the 3 Caches (dc_global_oids, dc_object_grants, dc_users) sum to 1,915,994,113, close to the above Total Latch Gets.
The same matches hold for MISSES, SLEEPS, SPIN_GETS, WAIT_TIME_S.
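Both cross-checks are plain arithmetic over the AWR and rco_stats_v figures:

```python
# AWR Latch Activity vs rco_stats_v (42 SESSIONS).
awr_gets, awr_misses = 1_916_333_412, 143_650_808
print(round(100 * awr_misses / awr_gets, 1))           # 7.5 -> Pct Get Miss
# Sum of GETS over the 3 child latches in rco_stats_v:
rco_gets = 638_662_605 + 638_674_026 + 638_657_482
print(rco_gets)                                        # 1915994113, close to AWR total
```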
Latch Activity
Latch Name | Get Requests | Pct Get Miss | Avg Slps /Miss | Wait Time (s) |
row cache objects | 1,916,333,412 | 7.5 | 0 | 9186 |
Latch Sleep Breakdown
Latch Name | Get Requests | Misses | Sleeps | Spin Gets |
row cache objects | 1,916,333,412 | 143,650,808 | 197,952 | 143,454,769 |
5.5. Latch Miss Sources
Blog: Oracle row cache objects Event: 10222, Dtrace Script (I) shows that each Latch Get goes through 3 consecutive Locations ("Where"):
Where=>4441(0x1159): kqrpre: find obj -- latch Get at 1st Location
Where=>4464(0x1170): kqreqd -- latch Get at 2nd Location
Where=>4465(0x1171): kqreqd: reget -- latch Get at 3rd Location
Here we can see that "kqrpre: find obj" is the main blocker (Waiter Sleeps: 179,972), whereas "kqreqd: reget" is the top victim (Sleeps: 157,629), accounting for about 80% (157,629/197,952) of Latch Sleeps.
In the above discussion, we noted that inter-parallel is related to the 1st Latch Get Location.
Latch Miss Sources
Latch Name | Where | NoWait Misses | Sleeps | Waiter Sleeps |
row cache objects | kqreqd: reget | 0 | 157,629 | 3,121 |
row cache objects | kqrpre: find obj | 0 | 33,484 | 179,972 |
row cache objects | kqreqd | 0 | 6,812 | 14,827 |
5.6. Mutex Sleep Summary
This is related to library cache: mutex X (to be discussed in next Section).
Mutex Sleep Summary
Mutex Type | Location | Sleeps | Wait Time (ms) |
Library Cache | kglpndl1 95 | 163,409 | 1,187,289 |
Library Cache | kglhdgn2 106 | 161,608 | 1,191,330 |
Library Cache | kglpin1 4 | 58,848 | 393,368 |
Library Cache | kgllkdl1 85 | 31,551 | 186,930 |
Library Cache | kglpnal1 90 | 30,451 | 176,736 |
Library Cache | kglget1 1 | 30,157 | 233,526 |
Library Cache | kglhdgn1 62 | 26,839 | 229,241 |
5.7. Child Latch Statistics
See rco_stats_v.
Child Latch Statistics
Latch Name | Child Num | Get Requests | Misses | Sleeps | Spin & Sleeps 1->3+ |
row cache objects | 9 | 638,804,551 | 56,606,984 | 65,237 | 56542348/0/0/0 |
row cache objects | 8 | 638,793,715 | 22,654,054 | 56,248 | 22598358/0/0/0 |
row cache objects | 10 | 638,788,857 | 64,389,752 | 76,467 | 64314045/0/0/0 |
5.8. Dictionary Cache Stats
See rco_stats_v.
I almost overlooked this Section because each popular Oracle term gets two names.
Dictionary Cache Stats
Cache | Get Requests | Pct Miss | Scan Reqs | Pct Miss | Mod Reqs | Final Usage |
dc_global_oids | 212,933,433 | 0 | 0 | 0 | 69 | |
dc_objects | 212,938,294 | 0 | 0 | 117 | 3,585 | |
dc_users | 212,935,101 | 0 | 0 | 0 | 89 |
5.9. Top SQL with Top Events, Top Events
This ASH Section lists our foo statement as the trigger of the Top Events.
Top SQL with Top Events
SQL ID | Plan Hash | Executions | % Activity | Event | % Event | SQL Text |
241pfd82cu4wj | 2048 | 84.37 | latch: row cache objects | 41.39 | begin :1 := foo(:2, :3, :4); e... | |
241pfd82cu4wj | 2048 | 84.37 | CPU + Wait for CPU | 34.99 | begin :1 := foo(:2, :3, :4); e... | |
241pfd82cu4wj | 2048 | 84.37 | library cache: mutex X | 7.92 | begin :1 := foo(:2, :3, :4); e... |
Top Events
Event | Event Class | Session Type | % Activity | Avg Active Sessions |
CPU + Wait for CPU | CPU | FOREGROUND | 45.57 | 18.39 |
latch: row cache objects | Concurrency | FOREGROUND | 41.39 | 16.71 |
library cache: mutex X | Concurrency | FOREGROUND | 12.92 | 5.22 |
5.10. Top Event P1/P2/P3 Values
Top Event P1/P2/P3 Values
Event | % Event | P1, P2, P3 Values | % Activity | Parameter 1 | Parameter 2 | Parameter 3 |
latch: row cache objects | 41.39 | "6490258456","411","0" | 18.91 | address | number | tries |
latch: row cache objects | "6490257608","411","0" | 11.94 | address | number | tries | |
latch: row cache objects | "6490258032","411","0" | 10.54 | address | number | tries | |
library cache: mutex X | 12.92 | "595386336", "3143916060672", "10136848668098666" | 0.66 | idn | value | where |
Pick all 3 P1 Parameter values of the latch Event and run the query below. The output shows that they point to the 3 Child Latches (10, 9, 8) respectively.
select to_number(addr, 'XXXXXXXXXXXXXXXX') addr_num, addr,
child#, gets, misses, sleeps, spin_gets, wait_time
from v$latch_children lat
where latch#=411
and (addr like '%'||trim(to_char(6490258456, 'XXXXXXXXX'))
or addr like '%'||trim(to_char(6490257608, 'XXXXXXXXX'))
or addr like '%'||trim(to_char(6490258032, 'XXXXXXXXX')))
order by sleeps desc;
ADDR_NUM ADDR CHILD# GETS MISSES SLEEPS SPIN_GETS WAIT_TIME
---------- ---------------- ------ --------- -------- ------ --------- ----------
6490258456 0000000182D97C18 10 638829560 64389752 76467 64314045 3905162356
6490258032 0000000182D97A70 9 638989001 56607014 65237 56542378 3006243205
6490257608 0000000182D978C8 8 639105230 22654062 56248 22598366 2274407991
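The conversion behind that query is just a change of base: ASH reports the latch address (P1) in decimal, while v$latch_children.ADDR holds the same value as 16-digit hex. For example, for Child# 10:

```python
# ASH P1 (decimal latch address) <-> v$latch_children.ADDR (16-digit hex).
p1 = 6490258456
addr = format(p1, "016X")
print(addr)                                # 0000000182D97C18, the ADDR of Child# 10
print(int("0000000182D97C18", 16) == p1)   # True: round-trips back to the P1 value
```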
6. library cache: mutex X
Look at the ASH Section - Top Event P1/P2/P3 Values, pick the P1 Parameter of the mutex Event, and run the query below.
The output shows that P1 Parameter "595386336" represents T_OBJ_IN.
with start_snap as (
select mutex_identifier, location, max(gets) gets, max(sleeps) sleeps, min(sleep_timestamp) min_ts
from v$mutex_sleep_history
where sleep_timestamp < sysdate -30/1440
group by mutex_identifier, location)
,end_snap as (
select mutex_identifier, location, min(gets) gets, min(sleeps) sleeps, max(sleep_timestamp) max_ts
from v$mutex_sleep_history
where sleep_timestamp < sysdate -1/1440
group by mutex_identifier, location)
,delta as (
select e.mutex_identifier, e.location,
e.gets - nvl(s.gets, 0) gets,
e.sleeps - nvl(s.sleeps, 0) sleeps, min_ts, max_ts
from start_snap s, end_snap e
where e.mutex_identifier = s.mutex_identifier (+)
and e.location = s.location (+))
,snap as (
select mutex_identifier, sum(gets) gets, sum(sleeps) sleeps, min(min_ts) min_ts, max(max_ts) max_ts
from delta group by mutex_identifier
order by sleeps desc
)
select o.hash_value, name, s.gets, s.sleeps, s.min_ts, s.max_ts
from v$db_object_cache o, snap s
where o.hash_value = s.mutex_identifier
-- and mutex_identifier = 595386336
and rownum <= 5
order by s.sleeps desc;
HASH_VALUE NAME GETS SLEEPS
---------- ----------- ---------- ------
2812823600 T_OBJ_RET 1063400668 19362
668251247 T_OBJ_INOUT 1276396207 13660
595386336 T_OBJ_IN 1063241644 9290
415170529 T_OBJ_OUT 1063009333 8719
4198378475 FOO 254328227 1902
By the above output, if T_OBJ_IN accounts for 0.66 %Activity, T_OBJ_RET can be estimated as 1.38 %Activity (0.66*19362/9290).
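The extrapolation is a simple proportion over the SLEEPS counts:

```python
# Scale T_OBJ_IN's ASH %Activity by the ratio of mutex sleeps to estimate T_OBJ_RET.
t_obj_in_activity = 0.66                         # % Activity from ASH Top Event P1/P2/P3
t_obj_in_sleeps, t_obj_ret_sleeps = 9_290, 19_362
estimate = round(t_obj_in_activity * t_obj_ret_sleeps / t_obj_in_sleeps, 2)
print(estimate)                                  # 1.38
```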
Probably the TOP consumer is not always displayed in ASH. The following query gives an output similar to AWR Section: Mutex Sleep Summary.
(The data in v$mutex_sleep_history is kept in a circular buffer, with only the most recent sleeps shown.)
with start_snap as (
select mutex_identifier, location, max(gets) gets, max(sleeps) sleeps
from v$mutex_sleep_history
where sleep_timestamp < timestamp'2018-12-17 08:04:30'
group by mutex_identifier, location)
,end_snap as (
select mutex_identifier, location, min(gets) gets, min(sleeps) sleeps
from v$mutex_sleep_history
where sleep_timestamp < timestamp'2018-12-17 08:14:50'
group by mutex_identifier, location)
,delta as (
select e.mutex_identifier, e.location,
e.gets - nvl(s.gets, 0) gets,
e.sleeps - nvl(s.sleeps, 0) sleeps
from start_snap s, end_snap e
where e.mutex_identifier = s.mutex_identifier (+)
and e.location = s.location (+))
select location, sum(gets) gets, sum(sleeps) sleeps
from delta group by location
order by sleeps desc;
LOCATION GETS SLEEPS
-------------------------- ---------- ------
kglhdgn2 106 977757576 13812
kglpndl1 95 1062042442 12725
kglpin1 4 1020903933 11455
kglpnal1 90 680931048 9432
kgllkdl1 85 425439695 5679
kksLockDelete [KKSCHLPIN6] 425095566 3199
kglget1 1 255195587 2600
kglhdgn1 62 336370181 2466
kksxsccmp [KKSCHLPIN5] 169434152 242
kksfbc [KKSCHLFSP2] 169595310 232
A quick alternative is to create multiple copies of the hot objects, for example for T_OBJ_IN and T_OBJ_RET
(see Blog: "library cache: mutex X" and Application Context):
alter system set "_kgl_hot_object_copies"= 255 scope=spfile;
alter system set "_kgl_debug"=
"name='T_OBJ_IN' schema='K' namespace=1 debug=33554432",
"name='T_OBJ_RET' schema='K' namespace=1 debug=33554432"
scope=spfile;
7. Test Code: Plsql
drop sequence foo_seq;
create sequence foo_seq;
create or replace force view rco_stats_record_v as
with rc as (
select min(parameter) rc_parameter, sum(gets) rc_gets, sum(getmisses) rc_getmisses,
decode(substr(parameter, 1, 9), 'dc_users', 8, 'dc_object', 9, 'dc_global', 10) la_child#,
listagg(cache#, ', ') within group (order by cache#) cache#_list,
listagg(parameter, ', ') within group (order by length(parameter)) parameter_list
from v$rowcache where cache# in (7,10, 8,11, 17) and gets > 0
group by substr(parameter, 1, 9))
select localtimestamp ts, addr, child#, name, gets, misses, sleeps, spin_gets, wait_time, rc.*, 3*gets RC_theory_gets
from v$latch_children la, rc
where la.latch# = 411 and la.child# in (8, 9, 10)
and la.child# = rc.la_child#
order by la.child#;
drop table rco_stats;
create table rco_stats as select 0 run, 0 sessions, v.* from rco_stats_record_v v where 1=2;
create or replace force view rco_stats_v as
select run, sessions, ts, child#, rc_parameter
,gets - lag(gets) over (partition by rc_parameter order by run, sessions) gets
,misses - lag(misses) over (partition by rc_parameter order by run, sessions) misses
,sleeps - lag(sleeps) over (partition by rc_parameter order by run, sessions) sleeps
,spin_gets - lag(spin_gets) over (partition by rc_parameter order by run, sessions) spin_gets
,round((wait_time - lag(wait_time) over (partition by rc_parameter order by run, sessions))/1e6) wait_time_s
,rc_gets - lag(rc_gets) over (partition by rc_parameter order by run, sessions) rc_gets
,rc_getmisses - lag(rc_getmisses) over (partition by rc_parameter order by run, sessions) rc_getmisses
,rc_theory_gets - lag(rc_theory_gets) over (partition by rc_parameter order by run, sessions) rc_theory_gets
,cache#_list, parameter_list
from rco_stats v;
-- Plsql 241pfd82cu4wj begin :1 := foo(:2, :3, :4); end;
-- Java 8nkt65d7hcnxz BEGIN :1 := K.FOO(:2, :3, :4); END;
create or replace force view exec_stats_record_v as
select localtimestamp ts, sql_id, executions, elapsed_time, cpu_time, concurrency_wait_time
from v$sqlarea v
where (lower(sql_text) like 'begin :1 := %foo(:2, :3, :4); end;%' or sql_id in ('241pfd82cu4wj', '8nkt65d7hcnxz'))
and last_active_time = (select max(last_active_time) from v$sqlarea where sql_id in ('241pfd82cu4wj', '8nkt65d7hcnxz'))
and rownum = 1
order by v.last_active_time desc;
drop table exec_stats;
create table exec_stats as select 0 run, 0 sessions, v.* from exec_stats_record_v v where 1=2;
create or replace force view exec_stats_v as
select run, sessions, ts
,executions - lag(executions) over (order by run, sessions) executions
,round((elapsed_time - lag(elapsed_time) over (order by run, sessions))/1e6) elapsed_time_s
,round((cpu_time - lag(cpu_time) over (order by run, sessions))/1e6) cpu_time_s
,round((concurrency_wait_time - lag(concurrency_wait_time) over (order by run, sessions))/1e6) concurrency_wait_time_s
,round((elapsed_time - lag(elapsed_time) over (order by run, sessions))/(nullif (executions - lag(executions) over (order by run, sessions), 0))) us_per_exec
,sql_id
from exec_stats;
create or replace procedure record_stats (p_sessions number) as
l_run number;
begin
l_run := foo_seq.nextval;
insert into rco_stats select l_run, p_sessions, v.* from rco_stats_record_v v;
insert into exec_stats select l_run, p_sessions, v.* from exec_stats_record_v v;
commit;
sys.dbms_workload_repository.create_snapshot('ALL');
end;
/
create or replace procedure k.foo_proc_sessions(p_session_cnt number, p_cnt number) as
l_job_id pls_integer;
begin
for i in 1.. p_session_cnt loop
--dbms_job.submit(l_job_id, 'begin foo_proc('||p_cnt||'); end;');
dbms_job.submit(l_job_id, 'begin while true loop foo_proc('||p_cnt||'); end loop; end;');
end loop;
commit;
end;
/
--exec foo_proc_sessions(6, 1e9);
--exec clean_jobs;
create or replace procedure k.foo_proc_sessions_ctrl(p_cnt number, p_dur_min number := 10) as
type t_num_tab is table of number;
l_nums t_num_tab := new t_num_tab(1, 3, 6, 12, 18, 24, 36, 42, 48);
l_run number;
begin
foo_proc(10); -- warmup
clean_jobs;
record_stats(0);
for n in 1..l_nums.count loop
foo_proc_sessions(l_nums(n), p_cnt); -- start l_nums(n) concurrent sessions
dbms_lock.sleep(p_dur_min*60); -- let them run for p_dur_min minutes
clean_jobs;
record_stats(l_nums(n));
end loop;
end;
/
-- exec foo_proc_sessions_ctrl(1e9, 10);
8. Test Code: UNIX Script for Java Test
-------------- create Script: foo_proc_sessions_java ------
#!/bin/ksh
# setup CLASSPATH
export CLASSPATH=$CLASSPATH:java-path:jdbc-path:RCOObjTypeJDBC1-path
function db_record_stats {
sqlplus -s /nolog << EOF
connect k/k
exec k.record_stats($1);
exit;
EOF
}
# warm up
/usr/java/bin/java -Xmx30m -Xms8m RCOObjTypeJDBC1 "jdbc:oracle:thin:k/s@testdb:1521:testdb" 10 2
sleep 2
# MOS (Doc ID 2336567.1): new Oracle Database 12c installation.
# Note: wrapping the sqlplus heredoc in backquotes makes the shell execute its output, e.g.:
#   ./foo_proc_sessions_java[20]: db_record_stats[9]: PL/SQL: not found [No such file or directory]
db_record_stats 0
for i in 1 3 6 12 18 24 36 42 48
do
echo "Sessions Count: $i"
j=1
while [[ $j -le $i ]]
do
echo "Run: $j"
/usr/java/bin/java -Xmx30m -Xms8m RCOObjTypeJDBC1 "jdbc:oracle:thin:k/s@testdb:1521:testdb" 1000000000 2 &
(( j = j + 1 ))
done
sleep $1 # run interval
for pid in $(ps -ef | awk '/RCOObjTypeJDBC1/ {print $2}')
do
if [ -n "$pid" -a -e /proc/$pid ]; then
echo "Kill Pid: $pid" # kill if exist
kill -9 $pid
fi
done
sleep 2 # grace period for killing
db_record_stats $i
done
-------------- called by ----
chmod a+x ./foo_proc_sessions_java
ksh ./foo_proc_sessions_java 600