Wednesday, November 7, 2018

Latch: row cache objects Contentions and Scalability (V)


(I)-Tracing Methods      (II)-Object Type(Part-1)      (III)-Object Type(Part-2)      (IV)-Sql Executions      (V)-Contentions and Scalability


This Blog will test the scalability of the row cache objects discussed in the previous 4 Blogs:
  1. Oracle row cache objects Event: 10222, Dtrace Script (I)
  2. Row Cache Objects, Row Cache Latch on Object Type: Plsql vs Java Call (Part-1) (II)
  3. Row Cache Objects, Row Cache Latch on Object Type: Plsql vs Java Call (Part-2) (III)
  4. Row Cache and Sql Executions (IV)
and then attempt to build a mathematical model to assess such a system.

Empirical data and model prediction demonstrate that such a system hardly scales with the number of CPUs because of Latch contentions.

In a pure row cache object system, theoretically 3 Processors (CPUs), experimentally 9 Processors, are needed to (almost) achieve the maximum throughput. Extra Processors only drive up contentions.

The possible alternatives could be using Oracle basic (Built-in) data types or object string serialization.

Finally, we try to show that "row cache objects Latch" meets the same fate as the single "Result Cache: RC Latch" discussed in Blog: PL/SQL Function Result Cache Invalidation (I)

As a secondary finding of this exercise, "library cache: mutex X" contentions related to Oracle Object Types are observed.

Note: All tests are done in Oracle 12.1.0.2.0 on Solaris, Linux, AIX (SMT 4, LCPU=24) with 6 physical processors.


1. Test


We will make the test based on the code of Blog: Row Cache Objects, Row Cache Latch on Object Type: Plsql vs Java Call (Part-1) (II)


1.1. Plsql Test


Run (see appended Test Code: Plsql):
exec foo_proc_sessions_ctrl(1e9, 10);
which performs a dynamic Plsql call (foo_proc) for 9 different parallel degrees, each running for 10 minutes.

Then collect the test data by query:
select * from exec_stats_v;

exec_stats_v (Plsql SQL_ID: 241pfd82cu4wj)

 RUN SESSIONS  TS     EXECUTIONS ELAPSED_TIME_S CPU_TIME_S CONCURRENCY_WAIT_TIME_S    US_PER_EXEC
---- --------  ----   ---------- -------------- ---------- -------------------------- -----------
AIX                   
   1       0  08:19   
   2       1  08:30    6,535,357            427        251                          0          65
   3       3  08:40   17,355,458          1,291        714                         75          74
   4       6  08:50   21,327,093          2,492      1,069                        664         117
   5      12  09:01   24,797,761          4,921      1,353                      1,814         198
   6      18  09:11   25,037,784          7,438      1,740                      2,992         297
   7      24  09:22   20,691,357         11,215      2,597                      2,537         542
   8      36  09:32   19,048,665         18,347      2,945                      7,872         963
   9      42  09:43   19,199,984         21,742      2,872                     11,683       1,132
  10      48  09:53   19,214,011         25,094      2,815                     15,315       1,306
                      
Linux                 
   1       0  09:56    
   2       1  10:06   13,992,932            414        413                          0          30
   3       3  10:16   32,715,487          1,265      1,216                         44          39
   4       6  10:26   52,665,062          2,552      2,222                        316          48
   5      12  10:36   49,743,420          4,853      2,146                      2,321          98
   6      18  10:46   50,448,264          7,509      2,249                      4,525         149
   7      24  10:56   50,836,921         10,230      2,340                      6,895         201
   8      36  11:06   51,864,133         15,913      2,458                     12,002         307
   9      42  11:16   48,914,411         19,080      2,495                     14,942         390
  10      48  11:27   48,549,375         22,317      2,535                     17,989         460
                      
Solaris               
   1       0  16:32   
   2       1  16:42    8,096,318            411        413                          0          51
   3       3  16:52   21,850,342          1,253      1,212                         45          57
   4       6  17:02   32,609,592          2,438      1,923                        520          75
   5      12  17:12   42,217,169          5,263      2,555                      1,693         125
   6      18  17:22   42,890,535          8,293      2,615                      3,548         193
   7      24  17:32   42,955,245         11,387      2,631                      5,667         265
   8      36  17:42   43,117,305         17,501      2,633                      9,372         406
   9      42  17:52   42,523,671         20,418      2,616                     11,551         480
  10      48  18:02   40,546,142         22,747      2,506                     13,915         561
The above result shows:
  Throughput (EXECUTIONS) climbs approximately linearly from 1 to 3 SESSIONS; from 6 SESSIONS on it tends to be flat or descending.
  Max throughput is achieved with around 9 parallel Sessions.
  Performance is saturated beyond 12 parallel Sessions.
  The response time per execution (US_PER_EXEC) increases, probably due to Latch contentions
    (sessions spend more time on Latch Gets instead of real work; a quick way to watch this live is sketched below).
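
While a Run is in flight, the build-up of these contentions can also be watched system-wide. Here is a minimal sketch against v$system_event (its counters are cumulative since instance startup, so take the delta of two samples):

select event, total_waits, round(time_waited_micro/1e6) time_waited_s
from v$system_event
where event in ('latch: row cache objects', 'library cache: mutex X')
order by time_waited_micro desc;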
During the test, we also collect the data for Row Cache Object Gets, and Row Cache Latch Gets.

Here is the output for 12 and 42 parallel sessions.
select * from rco_stats_v where sessions in (12, 42) order by child#;

rco_stats_v

SESSIONS  CHILD#  RC_PARAMETER      GETS       MISSES    SLEEPS  SPIN_GETS  WAIT_TIME_S  RC_GETS    RC_GETMISSES  RC_THEORY_GETS  CACHE#_LIST  PARAMETER_LIST
--------  ------  ----------------  ---------  --------  ------  ---------  -----------  ---------  ------------  --------------  -----------  ----------------------------
   12       8     dc_users          633984132  19649121  52958   19596500   262          211328141  0             1901952396      7, 10        dc_users, dc_users
   12       9     dc_object_grants  633984845  54784381  58712   54726007   292          211328352  0             1901954535      8, 8         dc_objects, dc_object_grants
   12       10    dc_global_oids    633982830  52451846  68813   52383487   346          211327687  0             1901948490      17           dc_global_oids
            
   42       8     dc_users          638662605  22653084  56234   22597401   2274         212887753  47            1915987815      7, 10        dc_users, dc_users
   42       9     dc_object_grants  638674026  56604969  65236   56540334   3006         212891172  552           1916022078      8, 8         dc_objects, dc_object_grants
   42       10    dc_global_oids    638657482  64387264  76460   64311564   3905         212885997  30            1915972446      17           dc_global_oids
Looking at the case of 42 SESSIONS: although the GETS of all 3 Latches are similar, dc_global_oids suffers most, then dc_object_grants, and dc_users least, as exposed by their different MISSES, SLEEPS, SPIN_GETS, WAIT_TIME values. We will use this information in the later causality modeling.

Note that Latch GETS is 3 times RC_GETS.
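
The 3-to-1 relation can be spot-checked directly. Here is a sketch mirroring the RC_THEORY_GETS column of the appended view rco_stats_record_v (like there, the substr/decode mapping groups dc_objects together with dc_object_grants):

select la.child#, rc.rc_parameter, la.gets latch_gets, rc.rc_gets, 3*rc.rc_gets rc_theory_gets
from v$latch_children la,
     (select min(parameter) rc_parameter,
             decode(substr(parameter, 1, 9), 'dc_users', 8, 'dc_object', 9, 'dc_global', 10) la_child#,
             sum(gets) rc_gets
      from v$rowcache
      where cache# in (7, 10, 8, 11, 17) and gets > 0
      group by substr(parameter, 1, 9)) rc
where la.name = 'row cache objects'
  and la.child# = rc.la_child#
order by la.child#;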


1.2. Java Test


Run appended shell script:
ksh ./foo_proc_sessions_java 600
which invokes Java (RCOObjTypeJDBC1) to call foo_proc for 9 different parallel degrees, each running for 10 minutes, the same as the above Plsql Test.

Then collect the similar data for Java Call.

exec_stats_v (Java SQL_ID: 8nkt65d7hcnxz)

 RUN SESSIONS  TS     EXECUTIONS ELAPSED_TIME_S CPU_TIME_S CONCURRENCY_WAIT_TIME_S    US_PER_EXEC
---- -------- -----   ---------- -------------- ---------- -------------------------- -----------
AIX Java                                                                                         
  11       0  13:04                                                                              
  12       1  13:14    3,288,798             282        166                         0          86
  13       3  13:25    8,965,468             891        498                        39          99
  14       6  13:36   11,661,246           1,721        701                       221         148
  15      12  13:46   14,770,591           4,044      1,186                       972         274
  16      18  13:56   14,791,463           6,788      1,715                     1,534         459
  17      24  14:07   13,030,697           9,756      2,220                     1,727         749
  18      36  14:17   12,096,987          14,192      2,211                     5,475       1,173
  19      42  14:28   11,845,242          16,442      2,090                     8,166       1,388
  20      48  14:38   11,732,373          19,487      2,080                    11,252       1,661
                                                                                                 
Linux Java                                                                                       
  11       0  10:36                                                                              
  12       1  10:46    4,788,119             239        197                         0          50
  13       3  10:56   12,325,022             819        647                        22          66
  14       6  11:07   20,053,654           1,828      1,233                       261          91
  15      12  11:17   20,646,925           3,214      1,050                     1,518         156
  16      18  11:27   20,118,792           5,172      1,016                     3,261         257
  17      24  11:37   19,740,172           7,763      1,148                     5,525         393
  18      36  11:47   19,285,554          12,775      1,229                    10,114         662
  19      42  11:57   19,069,527          15,361      1,226                    12,541         806
  20      48  12:07   18,363,626          17,958      1,194                    15,031         978
                                                                                                 
Solaris Java                                                                                     
  11       0  10:38                                                                              
  12       1  10:48    3,790,741             293        294                         0          77
  13       3  10:58    9,154,792             906        774                       112          99
  14       6  11:08   13,334,385           1,983      1,213                       631         149
  15      12  11:18   16,216,973           4,287      1,565                     1,711         264
  16      18  11:28   16,971,173           6,792      1,720                     2,622         400
  17      24  11:38   17,062,335           9,406      1,745                     3,962         551
  18      36  11:48   16,616,651          15,363      1,735                     6,995         925
  19      42  11:58   16,492,514          18,416      1,733                    10,003       1,117
  20      48  12:08   16,264,599          21,518      1,721                    13,106       1,323
The result shows that throughput (EXECUTIONS) is about 2 times lower than Plsql, and response time (US_PER_EXEC) about 2 times higher.

In Blog: Row Cache Objects, Row Cache Latch on Object Type: Plsql vs Java Call (Part-1) (II), we saw that each dynamic Plsql foo execution requires 45 Row Cache Latch Gets, whereas a Java execution requires 63. That is about 40% more Latch Gets in Java, yet its throughput is about 2 times lower than Plsql.


2. Modeling


Let's try to model 3 Row Cache Object (CID: 17, 11, 7) Gets, and respective Latch Gets, each of which contains 3 latch Locations ("Where").

In the following discussion, we model:
  Oracle Session        as Job
  Row Cache Object Get  as Task
  Row Cache  Latch Get  as Machine
Suppose we have one Workshop W, and n Job Generators:
  G = (G1, G2, ... ,Gn)
each holds one single Job in each instant (one terminated, a new produced).

Each Job is made of 3 Tasks (sub-Jobs):
  J_i = (Si_1, Si_2, Si_3)
Each Task consists of 3 Work Units:
  Si_1 = (Si_1_u1, Si_1_u2, Si_1_u3)
  Si_2 = (Si_2_u1, Si_2_u2, Si_2_u3)
  Si_3 = (Si_3_u1, Si_3_u2, Si_3_u3)
They are subject to constraints:
  All 3 Tasks in each Job have to be processed sequentially.
  All 3 Work Units in each Task have to be processed sequentially.
  The Tasks and Work Units among different Jobs can be in parallel.
The Workshop is equipped with an assembly line, which consists of 3 Machines (Processors):
  W = (M1, M2, M3)
Each Machine possesses 3 Processing Units:
  M1 = (p1_1, p1_2, p1_3)
  M2 = (p2_1, p2_2, p2_3)
  M3 = (p3_1, p3_2, p3_3)
3 Machines are dedicated respectively for 3 Tasks:
  Si_1 is processed exclusively by M1
  Si_2 is processed exclusively by M2
  Si_3 is processed exclusively by M3
3 processing Units in each Machine are running sequentially.

M1, M2, M3 run in parallel (inter-parallelism); but each Machine's 3 processing Units can hardly run in parallel (no intra-parallelism).

The service time of 3 Machines are:
  t1 
  t2 
  t3 
So the minimum processing time of each Job is (t1 + t2 + t3).

Assume t1 < t2 < t3 (later empirical data support this presumption). After the n-th Job has been processed, there will be:
  n*(t2-t1)/t1 
Jobs waiting before M2 after M1;
  n*(t3-t2)/t2 
Jobs waiting before M3 after M2;

In total, there will be:
  n*(t3-t1)/t2 
Jobs waiting inside Workshop (approximated).

i-th Job (J_i) waiting time before M3 is:
  i*(t3-t2)      1 <= i <= n*(t3-t2)/t2 
The accumulated waiting time before M3 is:
   (t3-t2) * (n*(t3-t2)/t2)*(1 + n*(t3-t2)/t2) / 2
  = n^2 * (t3-t2)^3 / (2 * t2^2)       -- approximated
The accumulated waiting time inside the Workshop is (approximated; waiting occurs before M2 and M3, but per the empirical data none before M1):
  n^2 * (t3-t1)^3 / (2 * t1^2)
The average waiting time (over the n Jobs) is:
  n * (t3-t1)^3 / (2 * t1^2)
Putting it all together, the average response time can be expressed as:
    response_time (rt) = waiting_time (wt) + service_time (st)
                       = n * (t3-t1)^3 / (2 * t1^2) + t3
    
    throughput         = (1 / rt)  
                      <= (1 / t3)
Therefore,
  total response time is quadratic in n.
  average response time is linear in n.
  maximum throughput is 1 / max(t1, t2, t3).
Since there are n Job Generators, and each Generator produces its next Job only once the previous one has terminated (boundary condition), there are in total n Jobs in the system.

Take an example:
  n  =  12
  
  t1 = 255
  t2 = 324
  t3 = 410
    
  t1 + t2 + t3 = 989 (255+324+410)
The accumulated waiting time inside the entire Workshop:
           
  12*12*(410-255)*(410-255)*(410-255)/(2*255*255) = 4123
average waiting time:
  12*(410-255)*(410-255)*(410-255)/(2*255*255)    = 343 = 4123/12 
average response time:
  343 + 410 = 753
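
The same arithmetic can be reproduced in SQL (trunc matches the truncation used above):

select trunc(12*12 * power(410-255, 3) / (2*255*255))        accum_wait_us
      ,trunc(12    * power(410-255, 3) / (2*255*255))        avg_wait_us
      ,trunc(12    * power(410-255, 3) / (2*255*255)) + 410  avg_response_us
from dual;

ACCUM_WAIT_US AVG_WAIT_US AVG_RESPONSE_US
------------- ----------- ---------------
         4123         343             753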
In the above discussion, Job is modelled by 2 Layers:
  Task
  Work Unit
Workshop is also modelled by 2 Layers correspondingly:
  Machine 
  Processing Units
So far, we have only assessed the first Layer. The second Layer has not yet been taken into account:
  Work Units 
  Processing Units 
Back to Row Cache: the model is still too coarse to reflect the real system because we only consider inter-parallelism, but no intra-parallelism. This is imprecise because intra-parallelism relates to the 2nd and 3rd Latch Get Locations, whereas inter-parallelism relates to the 1st Latch Get Location.

Besides that, the real system is influenced by Latch timeouts, spinning, process preemption, multithreading, and hardware specifics (e.g. Solaris LWP, AIX SMT).

As evidenced by the model, there exists no deadlock in such a system since all operations are performed sequentially.


3. Model Justification


To verify the model, 3 parameters t1, t2, t3 have to be acquired.

Run foo_proc (see Blog: Row Cache Objects, Row Cache Latch on Object Type: Plsql vs Java Call (Part-1) (II)) with 10222 trace in an idle system.

Then excerpt all 3 Cache Gets (CID: 17, 11, 7) for one Object instance Get, for example, T_OBJ_OUT (this is only illustrative; in practice, more representative data should be captured).

Here is an example from Solaris (irrelevant lines removed):
(12cR1 10222 trace is hard to read, 12cR2 improved. see Blog: Oracle row cache objects Event: 10222, Dtrace Script (I))

--================ Start T_OBJ_OUT Get ================--

kqrfrpo : freed to fixed free list po=175d2f708 time=1459184662

  ##********* cid = 17, Start time=1459184662 *********--
kqrpad: new po 17727bac0 from kqrpre1.1

kqrReadFromDB : kqrpre1.1 po=17727bac0 flg=8000 cid=17 eq=1753067e0 idx=0 dsflg=0
kqrpre1 : done po=17727bac0 cid=17 flg=2 hash=3c4bb9d0 0 eq=1753067e0 SQL=begin :1 := foo(:2, :3, :4); end; time=1459184804

  ##------- cid = 17, End time=1459184917, Elapsed = 255 (1459184917-1459184662) ------- 
kqrfrpo : freed to fixed free list po=17727bac0 time=1459184917

  ##--********* cid = 11, Start time=1459184917, including cid = 7 *********--

kqrpad: new po 175d2f708 from kqrpre1.1

kqrReadFromDB : kqrpre1.3 po=175d2f708 flg=8000 cid=11 eq=1753067e0 idx=0 dsflg=0
kqrpre1 : done po=175d2f708 cid=11 flg=2 hash=94b841cf a9461655 eq=1753067e0 
          obobn=2360170 obname=T_OBJ_OUT obtyp=13 obsta=1 obflg=0 SQL=begin :1 := foo(:2, :3, :4); end; time=1459185132

  ##--********* cid = 7, Start time=1459185132 *********--

kqrpad: new po 170d94488 from kqrpre1.1

kqrReadFromDB : kqrpre1.3 po=170d94488 flg=8000 cid=7 eq=176e410b8 idx=0 dsflg=0
kqrpre1 : done po=170d94488 cid=7 flg=2 hash=de7751cd 395edb55 eq=176e410b8 
          SQL=begin :1 := foo(:2, :3, :4); end; time=1459185415

  ##------- cid = 7, End time=1459185542, Elapsed = 410 (1459185542-1459185132) -------
kqrfrpo : freed to heap po=170d94488 time=1459185542

kqrmupin : kqrpspr2 Unpin po 175d2f708 cid=11 flg=2 hash=a9461655 time=1459185556

  ##------- cid = 11, End time=1459185651, Elapsed = 734 (1459185651-1459184917), Pure = 324 (734-410) -------
kqrfrpo : freed to fixed free list po=175d2f708 time=1459185651

--================ End T_OBJ_OUT Get ================--
The following values are captured (the trace file "time" is in microseconds):
----------------------- Solaris ----------------------
  t1 = 255 us   for cid = 17
  t2 = 324 us   for cid = 11 
  t3 = 410 us   for cid =  7 (covered by cid = 11)
  
  total Elapsed = 989 us (255+324+410)
Here data for Linux and AIX:
----------------------- Linux ----------------------
  t1 = 136 us   for cid = 17
  t2 = 164 us   for cid = 11 
  t3 = 229 us   for cid =  7 (covered by cid = 11)
  
  total Elapsed = 529 us (136+164+229) 

----------------------- AIX ----------------------
  t1 = 143 us   for cid = 17
  t2 = 186 us   for cid = 11 
  t3 = 252 us   for cid =  7 (covered by cid = 11)
  
  total Elapsed = 581 us (143+186+252)  
Using the above average response time formula:
    n * (t3-t1)^3 / (2 * t1^2) + t3
for example, in the case of Solaris, substitute the variables and run the query:

select * from (
  select level sessions,
         round(level*(410-255)*(410-255)*(410-255)/(2*255*255)) + 410 model_us_per_exec
  from dual connect by level < 100
) where sessions in (1, 3, 6, 12, 18, 24, 36, 42, 48); 
we obtain the average response time in microseconds (us) per execution: MODEL_US_PER_EXEC. Multiplying it by EXECUTIONS and dividing by 1e6 gives MODEL_ELAPSED_TIME_S in seconds.
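
For instance, for Solaris with 12 sessions (MODEL_US_PER_EXEC = 754, EXECUTIONS = 42,217,169 from the test table):

select round(754 * 42217169 / 1e6) model_elapsed_time_s from dual;

MODEL_ELAPSED_TIME_S
--------------------
               31832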

Both test data and model data are shown in the table below.

                                   Test                      Model
SESSIONS  EXECUTIONS | ELAPSED_TIME_S US_PER_EXEC | US_PER_EXEC ELAPSED_TIME_S
--------- ---------- | -------------- ----------- | ----------- --------------
AIX Plsql            |                            |                         
       0             |                            |                         
       1   6,535,357 |            427          65 |         284          1,856
       3  17,355,458 |          1,291          74 |         347          6,022
       6  21,327,093 |          2,492         117 |         442          9,427
      12  24,797,761 |          4,921         198 |         632         15,672
      18  25,037,784 |          7,438         297 |         822         20,581
      24  20,691,357 |         11,215         542 |       1,012         20,940
      36  19,048,665 |         18,347         963 |       1,392         26,516
      42  19,199,984 |         21,742       1,132 |       1,582         30,374
      48  19,214,011 |         25,094       1,306 |       1,772         34,047
                     |                            |                         
Linux Plsql          |                            |                         
       0   1,170,004 |             34          29 |                         
       1  13,992,932 |            414          30 |         251          3,512
       3  32,715,487 |          1,265          39 |         294          9,618
       6  52,665,062 |          2,552          48 |         359         18,907
      12  49,743,420 |          4,853          98 |         490         24,374
      18  50,448,264 |          7,509         149 |         620         31,278
      24  50,836,921 |         10,230         201 |         751         38,179
      36  51,864,133 |         15,913         307 |       1,012         52,487
      42  48,914,411 |         19,080         390 |       1,142         55,860
      48  48,549,375 |         22,317         460 |       1,273         61,803
                     |                            |                         
Solaris Plsql        |                            |                         
       0             |                            |                         
       1   8,096,318 |            411          51 |         439          3,554
       3  21,850,342 |          1,253          57 |         496         10,838
       6  32,609,592 |          2,438          75 |         582         18,979
      12  42,217,169 |          5,263         125 |         754         31,832
      18  42,890,535 |          8,293         193 |         925         39,674
      24  42,955,245 |         11,387         265 |       1,097         47,122
      36  43,117,305 |         17,501         406 |       1,441         62,132
      42  42,523,671 |         20,418         480 |       1,613         68,591
      48  40,546,142 |         22,747         561 |       1,784         72,334
Now it is open to the reader to judge the model by comparing the empirical observations with the model's predicted values, and to inspect its capability of extrapolation.


4. Row Cache Objects Latch: dedicated


The following query shows that there exists a 1-to-1 mapping between Row Cache and Row Cache Latch. Each Cache is protected by one single dedicated Latch; hence access to a Row Cache is serialized by its designated Latch.

create view sys.x_kqrst as select * from x$kqrst;

select count(distinct latch_child), count(distinct cache#)
  from (
  select la.child# latch_child, kqrstcid cache#, kqrsttxt parameter, dc.*, la.* 
  from sys.x_kqrst dc, v$latch_children la 
  where dc.kqrstcln = la.child# and la.name='row cache objects'
  )
where kqrsttyp =1
order by latch_child;

COUNT(DISTINCTLATCH_CHILD)  COUNT(DISTINCTCACHE#)
--------------------------  ---------------------
56                          56


select *
  from (
  select la.child# latch_child, kqrstcid cache#, kqrsttxt parameter, kqrsttyp  --, dc.*, la.* 
  from sys.x_kqrst dc, v$latch_children la 
  where dc.kqrstcln = la.child# and la.name='row cache objects'
  )
where kqrsttyp =1   -- parent
order by latch_child;

LATCH_CHILD  CACHE#  PARAMETER              KQRSTTYP
----------- ------- ---------------------  ---------
8            10      dc_users               1
9            8       dc_objects             1
10           17      dc_global_oids         1
18           15      dc_props               1
For example, to access Object Type: T_OBJ_OUT, we have to sequentially go through 3 Caches:
  dc_global_oids
  dc_objects
  dc_users
and each of which is guarded by its responsible Child Latch:
  10
  9
  8  
It looks like "row cache objects Latch" works similarly to the single "Result Cache: RC Latch" discussed in Blog: PL/SQL Function Result Cache Invalidation (I). The difference is that the latter is optional and can be turned off, whereas the former is an unavoidable obstacle.

At this point, we can take a further look at other Oracle Latches. Probably only the cache buffers chains Latch is conceived to be scalable (notwithstanding the "latch: cache buffers chains" Wait Event), since each of its children has a limited responsibility and they amount to the majority of all Latch children (> 75%). Any excessive usage of other Latches is rated as irrational, hence contentions.

select l.name, count(*) cnt, round(100*ratio_to_report(count(*)) over (), 2) ratio 
from v$latch l, v$latch_children k
where l.latch# = k.latch#(+) 
--  and l.name in ('cache buffers chains', 'row cache objects', 'shared pool', 'Result Cache: RC Latch', 'kokc descriptor allocation latch')
--  and lower(l.name) like '%kokc%'
group by l.name order by cnt desc, l.name;

NAME                               CNT     RATIO
--------------------------------- ------  ------
cache buffers chains               32768   75.38
simulator hash latch               2048    4.71
...                                        
row cache objects                  56      0.13
...                                        
shared pool                        7       0.02
...                                        
Result Cache: RC Latch             1       0
... 
kokc descriptor allocation latch   1       0

770 rows selected.
(-------------- cache buffers chains --------------
Hidden Parameters and default values: 
  _db_block_hash_latches=32768    Number of database block hash latches,
  _db_block_hash_buckets=1048576  Number of database block hash buckets 
           1048576/32768=32       buckets per latch
 
Blog: Hot Block Identification and latch: cache buffers chains
-------------- simulator hash latch --------------
Blog: Row Cache and Sql Executions (IV)
-------------- row cache objects --------------
"Current Location" 
and
Blog: nls_database_parameters, dc_props, latch: row cache objects
-------------- shared pool --------------
Blog: sql_id and idn to shared pool subpool
-------------- Result Cache: RC Latch --------------
Blog: Result Cache:0 rows updated Invalidations (IV)
-------------- kokc descriptor allocation latch --------------
Bug 14382262  latch free wait event in"kokc descriptor allocation latch"

Blog: dbms_aq.dequeue - latch: row cache objects on AIX
Blog: Shared Pool - KKSSP
)


5. AWR/ASH - Row Cache Objects


We collect AWR/ASH reports for one 10-minute Run of the Plsql Test with 42 parallel sessions, and extract all Row Cache Objects related Sections.


5.1. Load Profile


DB CPU(s) Per Second: 5.9 shows that all 6 CPUs are busy (CPU intensive), and about 100*(1-5.9/66.2) = 91% of DB Time is spent waiting.

Load Profile
             Per Second  Per Transaction  Per Exec  Per Call
DB Time(s):        66.2            373.6         0    195.01
DB CPU(s):          5.9             33.2         0     17.34


5.2. Top 10 Foreground Events by Total Wait Time


23% of DB time is spent waiting on latch: row cache objects, with Wait Avg(ms) 46.85 (probably bigger than the 10 ms CPU scheduling time slice).

9.6% of DB time is spent waiting on library cache: mutex X, with Wait Avg(ms) 10.49.

Top 10 Foreground Events by Total Wait Time
Event                     Waits    Total Wait Time (sec)  Wait Avg(ms)  % DB time
latch: row cache objects  196,081                 9185.8         46.85       23
library cache: mutex X    367,581                   3856         10.49       9.6
DB CPU                                            3554.8                     8.9


5.3. SQL ordered by Elapsed Time


SQL_ID: 241pfd82cu4wj has 42,533,404 Executions with Elapsed Time 20,423.63 seconds and %IO 0 (no I/O involved).

SQL ordered by Elapsed Time
Elapsed Time (s)  Executions  Elapsed Time per Exec (s)  %Total  %CPU   %IO  SQL Id         SQL Text
       24,895.60          42                     592.75   62.27  14.26    0  22thkvvjndnmj  DECLARE job BINARY_INTEGER := ...
       20,423.63  42,533,404                          0   51.09  12.81    0  241pfd82cu4wj  begin :1 := foo(:2, :3, :4); e...


5.4. Latch Activity and Latch Sleep Breakdown


Total Latch Gets is 1,916,333,412, with 197,952 Sleeps and a Latch Miss ratio of about 7.5% (Misses/Gets = 143,650,808/1,916,333,412). The 197,952 Sleeps roughly match the Latch Waits in Section "Top 10 Foreground Events by Total Wait Time".

Looking at the rco_stats_v output for 42 SESSIONS, the GETS of the 3 Caches (dc_global_oids, dc_object_grants, dc_users) sum to 1,915,994,113, close to the above Total Latch Gets.

The same correspondence holds for MISSES, SLEEPS, SPIN_GETS, WAIT_TIME_S.
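
Outside AWR, the same ratio can be computed from v$latch at any time; a sketch (the counters are cumulative since instance startup, so deltas between two snapshots are needed to isolate one Run):

select name, gets, misses, sleeps, spin_gets
      ,round(100 * misses / nullif(gets, 0), 1) pct_miss
from v$latch
where name = 'row cache objects';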

Latch Activity
Latch Name         Get Requests   Pct Get Miss  Avg Slps /Miss  Wait Time (s)
row cache objects  1,916,333,412           7.5               0           9186

Latch Sleep Breakdown
Latch Name         Get Requests   Misses       Sleeps   Spin Gets
row cache objects  1,916,333,412  143,650,808  197,952  143,454,769


5.5. Latch Miss Sources


Blog: Oracle row cache objects Event: 10222, Dtrace Script (I) shows that each Latch Get goes through 3 consecutive Locations ("Where"):
  Where=>4441(0x1159): kqrpre: find obj   -- latch Get at 1st Location
  Where=>4464(0x1170): kqreqd             -- latch Get at 2nd Location
  Where=>4465(0x1171): kqreqd: reget      -- latch Get at 3rd Location
Here we can see that "kqrpre: find obj" is the main blocker (Waiter Sleeps: 179,972), whereas "kqreqd: reget" is the top victim (Sleeps: 157,629), accounting for about 80% (157,629/197,952) of the Latch Sleeps.

In the above discussion, we noted that inter-parallelism is related to the 1st Latch Get Location.

Latch Miss Sources
Latch Name         Where             NoWait Misses  Sleeps   Waiter Sleeps
row cache objects  kqreqd: reget                 0  157,629          3,121
row cache objects  kqrpre: find obj              0   33,484        179,972
row cache objects  kqreqd                        0    6,812         14,827
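
Outside AWR, a similar breakdown can be sampled from v$latch_misses; a sketch (again cumulative counters):

select location, sleep_count, wtr_slp_count
from v$latch_misses
where parent_name = 'row cache objects'
  and sleep_count > 0
order by sleep_count desc;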


5.6. Mutex Sleep Summary


This is related to library cache: mutex X (to be discussed in next Section).

Mutex Sleep Summary
Mutex Type     Location      Sleeps   Wait Time (ms)
Library Cache  kglpndl1 95   163,409       1,187,289
Library Cache  kglhdgn2 106  161,608       1,191,330
Library Cache  kglpin1 4      58,848         393,368
Library Cache  kgllkdl1 85    31,551         186,930
Library Cache  kglpnal1 90    30,451         176,736
Library Cache  kglget1 1      30,157         233,526
Library Cache  kglhdgn1 62    26,839         229,241


5.7. Child Latch Statistics


See rco_stats_v.

Child Latch Statistics
Latch Name         Child Num  Get Requests  Misses      Sleeps  Spin & Sleeps 1->3+
row cache objects          9   638,804,551  56,606,984  65,237  56542348/0/0/0
row cache objects          8   638,793,715  22,654,054  56,248  22598358/0/0/0
row cache objects         10   638,788,857  64,389,752  76,467  64314045/0/0/0


5.8. Dictionary Cache Stats


See rco_stats_v.

I almost overlooked this Section because each popular Oracle term gets two names.

Dictionary Cache Stats
Cache           Get Requests  Pct Miss  Scan Reqs  Pct Miss  Mod Reqs  Final Usage
dc_global_oids   212,933,433         0          0                   0           69
dc_objects       212,938,294         0          0                 117        3,585
dc_users         212,935,101         0          0                   0           89


5.9. Top SQL with Top Events, Top Events


This ASH Section lists our foo statement as the trigger of the Top Events.

Top SQL with Top Events
SQL ID         Plan Hash  Executions  % Activity  Event                     % Event  SQL Text
241pfd82cu4wj                   2048       84.37  latch: row cache objects    41.39  begin :1 := foo(:2, :3, :4); e...
241pfd82cu4wj                   2048       84.37  CPU + Wait for CPU          34.99  begin :1 := foo(:2, :3, :4); e...
241pfd82cu4wj                   2048       84.37  library cache: mutex X       7.92  begin :1 := foo(:2, :3, :4); e...

Top Events
Event                     Event Class  Session Type  % Activity  Avg Active Sessions
CPU + Wait for CPU        CPU          FOREGROUND         45.57                18.39
latch: row cache objects  Concurrency  FOREGROUND         41.39                16.71
library cache: mutex X    Concurrency  FOREGROUND         12.92                 5.22


5.10. Top Event P1/P2/P3 Values



Top Event P1/P2/P3 Values
Event                     % Event  P1, P2, P3 Values                                   % Activity  Parameter 1  Parameter 2  Parameter 3
latch: row cache objects    41.39  "6490258456","411","0"                                   18.91  address      number       tries
latch: row cache objects           "6490257608","411","0"                                   11.94  address      number       tries
latch: row cache objects           "6490258032","411","0"                                   10.54  address      number       tries
library cache: mutex X      12.92  "595386336", "3143916060672", "10136848668098666"        0.66  idn          value        where

Pick all 3 P1 values of the latch Event and run the query below. The output shows that they point to the 3 Child Latches (10, 9, 8) respectively.

select to_number(addr, 'XXXXXXXXXXXXXXXX') addr_num, addr,
       child#, gets, misses, sleeps, spin_gets, wait_time
 from v$latch_children lat
where latch#=411 
  and (addr like '%'||trim(to_char(6490258456, 'XXXXXXXXX'))
    or addr like '%'||trim(to_char(6490257608, 'XXXXXXXXX'))
    or addr like '%'||trim(to_char(6490258032, 'XXXXXXXXX')))
order by sleeps desc;

ADDR_NUM    ADDR              CHILD#  GETS       MISSES    SLEEPS  SPIN_GETS  WAIT_TIME
----------  ----------------  ------  ---------  --------  ------  ---------  ----------
6490258456  0000000182D97C18  10      638829560  64389752  76467   64314045   3905162356
6490258032  0000000182D97A70  9       638989001  56607014  65237   56542378   3006243205
6490257608  0000000182D978C8  8       639105230  22654062  56248   22598366   2274407991


6. library cache: mutex X


Look at the ASH Section - Top Event P1/P2/P3 Values, pick the P1 Parameter of the mutex Event, and run the query below.

The output shows that P1 Parameter "595386336" represents T_OBJ_IN.

with start_snap as (
  select mutex_identifier, location, max(gets) gets, max(sleeps) sleeps, min(sleep_timestamp) min_ts
   from v$mutex_sleep_history
  where sleep_timestamp < sysdate -30/1440 
   group by mutex_identifier, location)
 ,end_snap as (
  select mutex_identifier, location, min(gets) gets, min(sleeps) sleeps, max(sleep_timestamp) max_ts
   from v$mutex_sleep_history
  where sleep_timestamp < sysdate -1/1440 
   group by mutex_identifier, location)
 ,delta as (
  select e.mutex_identifier, e.location, 
       e.gets   - nvl(s.gets, 0)   gets,
       e.sleeps - nvl(s.sleeps, 0) sleeps, min_ts, max_ts
  from start_snap s, end_snap e
  where e.mutex_identifier = s.mutex_identifier (+) 
    and e.location = s.location (+))
 ,snap as (
  select mutex_identifier, sum(gets) gets, sum(sleeps) sleeps, min(min_ts) min_ts, max(max_ts) max_ts
    from delta group by mutex_identifier
   order by sleeps desc
  )  
select o.hash_value, name, s.gets, s.sleeps, s.min_ts, s.max_ts
 from v$db_object_cache o, snap s
where o.hash_value = s.mutex_identifier
--  and mutex_identifier = 595386336
  and rownum <= 5
order by s.sleeps desc;

HASH_VALUE  NAME         GETS        SLEEPS
----------  -----------  ----------  ------
2812823600  T_OBJ_RET    1063400668  19362
668251247   T_OBJ_INOUT  1276396207  13660
595386336   T_OBJ_IN     1063241644   9290
415170529   T_OBJ_OUT    1063009333   8719
4198378475  FOO           254328227   1902
By the above output, if T_OBJ_IN accounts for 0.66 %Activity, T_OBJ_RET can be estimated at 1.38 %Activity (0.66*19362/9290). So the TOP consumer is probably not always the one displayed in ASH.

The following query gives an output similar to AWR Section: Mutex Sleep Summary.
(The data in v$mutex_sleep_history is contained within a circular buffer, with only the most recent sleeps retained.)

with start_snap as (
  select mutex_identifier, location, max(gets) gets, max(sleeps) sleeps
   from v$mutex_sleep_history
  where sleep_timestamp < timestamp'2018-12-17 08:04:30' 
   group by mutex_identifier, location)
 ,end_snap as (
  select mutex_identifier, location, min(gets) gets, min(sleeps) sleeps
   from v$mutex_sleep_history
  where sleep_timestamp < timestamp'2018-12-17 08:14:50'
   group by mutex_identifier, location)
 ,delta as (
  select e.mutex_identifier, e.location, 
       e.gets   - nvl(s.gets, 0)   gets,
       e.sleeps - nvl(s.sleeps, 0) sleeps
  from start_snap s, end_snap e
  where e.mutex_identifier = s.mutex_identifier (+) 
    and e.location = s.location (+))
select location, sum(gets) gets, sum(sleeps) sleeps
  from delta group by location
 order by sleeps desc;
 
LOCATION                    GETS        SLEEPS
--------------------------  ----------  ------
kglhdgn2 106                977757576   13812
kglpndl1  95                1062042442  12725
kglpin1   4                 1020903933  11455
kglpnal1  90                 680931048   9432
kgllkdl1  85                 425439695   5679
kksLockDelete [KKSCHLPIN6]   425095566   3199
kglget1   1                  255195587   2600
kglhdgn1  62                 336370181   2466
kksxsccmp [KKSCHLPIN5]       169434152    242
kksfbc [KKSCHLFSP2]          169595310    232
A quick workaround is to create multiple copies of hot objects, for example for T_OBJ_IN and T_OBJ_RET (see Blog: "library cache: mutex X" and Application Context):

alter system set "_kgl_hot_object_copies"= 255 scope=spfile;

alter system set "_kgl_debug"=
      "name='T_OBJ_IN'   schema='K'    namespace=1  debug=33554432", 
      "name='T_OBJ_RET'  schema='K'    namespace=1  debug=33554432"
      scope=spfile;
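
After a restart, one way to check that the copies took effect (an assumption to verify, not a documented guarantee) is that each hot copy should surface as an additional library cache handle under the same name:

select name, namespace, count(*) handle_cnt
from v$db_object_cache
where name in ('T_OBJ_IN', 'T_OBJ_RET')
group by name, namespace;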


7. Test Code: Plsql



drop sequence foo_seq;

create sequence foo_seq;

create or replace force view rco_stats_record_v as
with rc as (
  select min(parameter) rc_parameter, sum(gets) rc_gets, sum(getmisses) rc_getmisses, 
         decode(substr(parameter, 1, 9), 'dc_users', 8, 'dc_object', 9, 'dc_global', 10) la_child#,
         listagg(cache#, ', ') within group (order by cache#) cache#_list,
         listagg(parameter, ', ') within group (order by length(parameter)) parameter_list
  from   v$rowcache where cache# in (7,10, 8,11, 17) and gets > 0 
  group by substr(parameter, 1, 9))
select localtimestamp ts, addr, child#, name, gets, misses, sleeps, spin_gets, wait_time, rc.*, 3*gets RC_theory_gets
from v$latch_children la, rc
where la.latch# = 411 and la.child# in (8, 9, 10) 
  and la.child# = rc.la_child# 
order by la.child#; 

drop table rco_stats;

create table rco_stats as select 0 run, 0 sessions, v.* from rco_stats_record_v v where 1=2;

create or replace force view rco_stats_v  as
select run, sessions, ts, child#, rc_parameter
      ,gets - lag(gets) over (partition by rc_parameter order by run, sessions) gets
      ,misses - lag(misses) over (partition by rc_parameter order by run, sessions) misses
      ,sleeps - lag(sleeps) over (partition by rc_parameter order by run, sessions) sleeps
      ,spin_gets - lag(spin_gets) over (partition by rc_parameter order by run, sessions) spin_gets
      ,round((wait_time - lag(wait_time) over (partition by rc_parameter order by run, sessions))/1e6) wait_time_s
      ,rc_gets - lag(rc_gets) over (partition by rc_parameter order by run, sessions) rc_gets
      ,rc_getmisses - lag(rc_getmisses) over (partition by rc_parameter order by run, sessions) rc_getmisses
      ,rc_theory_gets - lag(rc_theory_gets) over (partition by rc_parameter order by run, sessions) rc_theory_gets
      ,cache#_list, parameter_list
from rco_stats v;


-- Plsql  241pfd82cu4wj    begin :1 := foo(:2, :3, :4); end; 
-- Java   8nkt65d7hcnxz    BEGIN :1 := K.FOO(:2, :3, :4); END;

create or replace force view exec_stats_record_v as
select localtimestamp ts, sql_id, executions, elapsed_time, cpu_time, concurrency_wait_time
from v$sqlarea v 
where (lower(sql_text) like 'begin :1 := %foo(:2, :3, :4); end;%' or sql_id in ('241pfd82cu4wj', '8nkt65d7hcnxz'))
  and last_active_time = (select max(last_active_time) from v$sqlarea where sql_id in ('241pfd82cu4wj', '8nkt65d7hcnxz'))
  and rownum = 1
order by v.last_active_time desc;     

drop table exec_stats;

create table exec_stats as select 0 run, 0 sessions, v.* from exec_stats_record_v v where 1=2;

create or replace force view exec_stats_v as
select run, sessions, ts
      ,executions - lag(executions) over (order by run, sessions) executions
      ,round((elapsed_time - lag(elapsed_time) over (order by run, sessions))/1e6) elapsed_time_s
      ,round((cpu_time - lag(cpu_time) over (order by run, sessions))/1e6) cpu_time_s
      ,round((concurrency_wait_time - lag(concurrency_wait_time) over (order by run, sessions))/1e6) concurrency_wait_time_s
      ,round((elapsed_time - lag(elapsed_time) over (order by run, sessions))/(nullif (executions - lag(executions) over (order by run, sessions), 0))) us_per_exec
      ,sql_id
from exec_stats;

create or replace procedure record_stats (p_sessions number) as 
  l_run   number;
begin
  l_run := foo_seq.nextval;
  insert into rco_stats  select l_run, p_sessions, v.* from rco_stats_record_v v;
  insert into exec_stats select l_run, p_sessions, v.* from exec_stats_record_v v;
  commit;
  sys.dbms_workload_repository.create_snapshot('ALL');  
end;
/

create or replace procedure k.foo_proc_sessions(p_session_cnt number, p_cnt number) as
  l_job_id pls_integer;
begin
  for i in 1.. p_session_cnt loop
    --dbms_job.submit(l_job_id, 'begin foo_proc('||p_cnt||'); end;');
    dbms_job.submit(l_job_id, 'begin while true loop foo_proc('||p_cnt||'); end loop; end;');
  end loop;
  commit;
end;
/

--exec foo_proc_sessions(6, 1e9);

--exec clean_jobs;

create or replace procedure k.foo_proc_sessions_ctrl(p_cnt number, p_dur_min number := 10) as
  type    t_num_tab is table of number;
  l_nums  t_num_tab := new t_num_tab(1, 3, 6, 12, 18, 24, 36, 42, 48);
  l_run   number;
begin
  foo_proc(10);   -- warmup
  clean_jobs; 
  record_stats(0);
  
  for n in 1..l_nums.count loop
    foo_proc_sessions(l_nums(n), p_cnt);    -- n Sessions
    dbms_lock.sleep(p_dur_min*60);          -- m minutes
    clean_jobs; 
    record_stats(l_nums(n)); 
  end loop;
end;
/

-- exec foo_proc_sessions_ctrl(1e9, 10);
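
-- clean_jobs is called above but not listed here; a minimal sketch of its assumed
-- behavior (remove all dbms_job entries of the schema; sessions of still-running
-- jobs may additionally need to be killed):
create or replace procedure clean_jobs as
begin
  for c in (select job from user_jobs) loop
    begin
      dbms_job.remove(c.job);
    exception
      when others then null;   -- job may have just terminated
    end;
  end loop;
  commit;
end;
/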


8. Test Code: UNIX Script for Java Test



-------------- create Script: foo_proc_sessions_java ------
#!/bin/ksh

# setup CLASSPATH
export CLASSPATH=$CLASSPATH:java-path:jdbc-path:RCOObjTypeJDBC1-path

function db_record_stats {
  # heredoc terminator must start at column 1; backticks removed (no need to re-execute sqlplus output)
  sqlplus -s /nolog <<EOF
  connect k/k
  exec k.record_stats($1);
  exit;
EOF
}

# warm up
/usr/java/bin/java -Xmx30m -Xms8m RCOObjTypeJDBC1 "jdbc:oracle:thin:k/s@testdb:1521:testdb"  10 2
sleep 2

# MOS (Doc ID 2336567.1). new Oracle Database 12c installation.
# ./foo_proc_sessions_java[20]: db_record_stats[9]: PL/SQL: not found [No such file or directory]

db_record_stats 0

for i in 1 3 6 12 18 24 36 42 48
do
  echo "Sessions Count: $i"
  j=1
  while [[ $j -le $i ]]
  do
    echo "Run: $j"
    /usr/java/bin/java -Xmx30m -Xms8m RCOObjTypeJDBC1 "jdbc:oracle:thin:k/s@testdb:1521:testdb"  1000000000 2 &
    (( j = j + 1 ))
  done
  
  sleep $1   # run interval
  
  for pid in $(ps -ef | awk '/RCOObjTypeJDBC1/ {print $2}')
  do 
      if [ -n "$pid" -a -e /proc/$pid ]; then
          echo "Kill Pid: $pid"   # kill if exist
          kill -9 $pid 
      fi    
  done
  sleep 2     # grace period for killing 
  
  db_record_stats $i
done


-------------- called by ----
chmod a+x ./foo_proc_sessions_java
ksh ./foo_proc_sessions_java 600

Row Cache and Sql Executions (IV)


(I)-Tracing Methods      (II)-Object Type(Part-1)      (III)-Object Type(Part-2)      (IV)-Sql Executions      (V)-Contentions and Scalability


A common belief is that Row Cache is used in Sql hard parsing or Plsql compiling. While minimizing the number of Sql parses reduces non-productive overhead, maximizing the number of Sql executions achieves the utmost productive outcome (throughput).

If each Sql execution is bundled with Row Cache GETs, going hand in hand with them, it will gradually be dragged down, and eventually at some point be capped by Row Cache Latch contentions (see next Blog).

In the previous two Blogs, we have shown that the Object Type Row Cache Gets are proportional to the number of executions in the host languages Plsql and Java (it would still be interesting to see other languages).

This Blog will look at Row Cache Gets and Sql Executions.

Note: All tests are done in Oracle 12.1.0.2.


1. dc_tablespaces, dc_users and Hash Joins


First, let's run a test of 1000 Hash Joins (after a warmup), and then examine the output:

------------------------- Setup -------------------------
drop table hj1;
drop table hj2;

create table hj1 as select level x from dual connect by level <= 1000; 
create table hj2 as select level x from dual connect by level <= 1000; 

------------------------- Test -------------------------
column name format a20
column parameter format a20
column cache#_list format a20

select parameter, sum(gets) gets, listagg(cache#, ',') within group (order by cache#) as cache#_list 
from v$rowcache where parameter in ('dc_tablespaces', 'dc_users') and gets > 0
group by parameter;

select name, gets, addr, latch#, child#
from v$latch_children v where name in ('row cache objects') and child# in (5, 8);

select name, gets, addr, latch#
      ,(select count(*) from v$latch_children where name = v.name) children_cnt
from v$latch v where name in ('simulator hash latch');

select a.name, b.value from v$statname a, v$mystat b 
where a.statistic# = b.statistic# and name in ('parse count (total)', 'parse count (hard)');

declare
  l_cnt number;
begin
  for i in 1..1000 loop
    select /*+ use_hash(t2) */ count(*) into l_cnt from hj1 t1, hj2 t2 where t1.x=t2.x;
    -- with sq as (select /*+ materialize*/ * from hj1) select count(*) into l_cnt from sq;  -- for next test
  end loop;
end;
/

--grace period
exec dbms_lock.sleep(1);  

select a.name, b.value from v$statname a, v$mystat b 
where a.statistic# = b.statistic# and name in ('parse count (total)', 'parse count (hard)');

select parameter, sum(gets) gets, listagg(cache#, ',') within group (order by cache#) as cache#_list 
from v$rowcache where parameter in ('dc_tablespaces', 'dc_users') and gets > 0
group by parameter;

select name, gets, addr, latch#, child#
from v$latch_children v where name in ('row cache objects') and child# in (5, 8);

select name, gets, addr, latch#
      ,(select count(*) from v$latch_children where name = v.name) children_cnt
from v$latch v where name in ('simulator hash latch');

------------------------- Output -------------------------
---*** Before Test ***---
   PARAMETER             GETS   CACHE#_LIST
   --------------------- ------ -----------
   dc_tablespaces        11000            0
   dc_users              18746         7,10
   
   NAME                  GETS   ADDR              LATCH#  CHILD#
   --------------------- ------ ----------------- ------- ------
   row cache objects     34571  0000000182D973D0  411     5
   row cache objects     56110  0000000182D978C8  411     8
   
   NAME                  GETS   ADDR              LATCH#  CHILDREN_CNT
   --------------------- ------ ----------------- ------- ------------
   simulator hash latch  55711  00000000600296D0  240     2048

   NAME                  VALUE
   -------------------- ------
   parse count (total)     321
   parse count (hard)        1

---*** After Test ***---
   NAME                  VALUE
   -------------------- ------
   parse count (total)     328
   parse count (hard)        1

   PARAMETER             GETS   CACHE#_LIST
   --------------------- ------ -----------
   dc_tablespaces        12000            0
   dc_users              19756         7,10
   
   NAME                  GETS   ADDR              LATCH#  CHILD#
   --------------------- ------ ----------------- ------- ------
   row cache objects     37571  0000000182D973D0  411     5
   row cache objects     59140  0000000182D978C8  411     8
   
   NAME                  GETS   ADDR              LATCH#  CHILDREN_CNT
   --------------------- ------ ----------------- ------- ------------
   simulator hash latch  57712  00000000600296D0  240     2048

------------------------- Delta -------------------------
   NAME                 Delta-Parse Count
   -------------------- -----------------
   parse count (total)                  7
   parse count (hard)                   0

   PARAMETER             Delta-GETS   CACHE#_LIST
   --------------------- ------------ -----------
   dc_tablespaces        1000                   0
   dc_users              1010                7,10
                                      
   NAME                  Delta-GETS   ADDR              LATCH#  CHILD#
   --------------------- ------------ ----------------- ------- -----------------
   row cache objects     3000         0000000182D973D0  411     5 (dc_tablespaces)
   row cache objects     3030         0000000182D978C8  411     8 (dc_users)     
                                      
   NAME                  Delta-GETS   ADDR              LATCH#  CHILDREN_CNT
   --------------------- ------------ ----------------- ------- ------------
   simulator hash latch  2001         00000000600296D0  240     2048
The Delta values for 1000 Hash Joins show:
  total parse count is 7, hard parse count is 0, hence no hard parse is observed (misconception ?).
  Row Cache: dc_tablespaces (CID=0) and dc_users (CID=7, 10) made about 1000 GETs each.
  Row Cache Latch: dc_tablespaces (Child 5) and dc_users (Child 8) made about 3000 GETs each (3 times the Row Cache GETs).
  simulator hash latch made about 2000 GETs.
(Note: more statistics are listed with Tom Kyte Runstats)

The same behaviour can be observed with 10222 trace or DTrace. For example,
run the dtrace script rco_dtrace_cache; the output looks like this (the CID ADDR values are the same too):

CID = 0  ADDR = 17C376EF8
           kqrLockAndPinPo:entry
    oracle`kqrLockAndPinPo
    oracle`kqrpre1+0x8d4
    oracle`ktatminextsz+0x2f2
    oracle`qerhjComputeFanoutAndBPS+0x7a5
    oracle`qerhjComputeHjParameters+0x12f

CID = 7  ADDR = 17BD52070
           kqrLockAndPinPo:entry
    oracle`kqrLockAndPinPo
    oracle`kqrpre1+0x8d4
    oracle`ktsitbs_info+0x53
    oracle`ktatminextsz+0x9a3
    oracle`qerhjComputeFanoutAndBPS+0x7a5

  *************** CID Stats with Address ****************
  CID = 11   ADDR = 178FC8160        CNT = 3
  CID = 17   ADDR = 179A6D968        CNT = 3
  CID = 0    ADDR = 17C376EF8        CNT = 1000
  CID = 7    ADDR = 17BD52070        CNT = 1000
  *************** CID Stats ****************
  CID = 11   CNT = 3
  CID = 17   CNT = 3
  CID = 0    CNT = 1000
  CID = 7    CNT = 1000
  *************** CID Stats Summary ****************
  CNT = 2006  
If we put one more table (e.g., hj3) in the above select, all the GETs (except "simulator hash latch") will double because of the one additional Hash Join.
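
For example, a sketch with one additional table hj3 (created like hj1; the extra hint is illustrative):

create table hj3 as select level x from dual connect by level <= 1000;

declare
  l_cnt number;
begin
  for i in 1..1000 loop
    select /*+ use_hash(t2) use_hash(t3) */ count(*) into l_cnt
    from hj1 t1, hj2 t2, hj3 t3
    where t1.x = t2.x and t2.x = t3.x;
  end loop;
end;
/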

This behaviour is described in MOS:
  Bug 13902396 Hash joins cause "row cache objects" latch gets and "shared pool" latch gets (disabled fix)
    The function making these calls, ktatminextsz(), returns the minimum extent size for the users temporary tablespace
It can be switched off (except "simulator hash latch") by a special event:

  alter system set events '45053 trace name context forever, level 255';
  
  -- Switched back on by: 
  alter system set events '45053 trace name context off';
  
  (45053 Event has no effect on "simulator hash latch")
where 255 is obtained by:

(Blog: 如何诊断latch: row cache objects [How to diagnose latch: row cache objects])
  SELECT name, decode (bitmapped, 0, least (DFLincr, dflinit), bitmapped) -1 "Level" 
  FROM sys.ts$ where name in ('TEMP', 'UNDO', 'SYSTEM');
    NAME    Level
    ------- -----
    TEMP    255      --TEMP allocation_type is UNIFORM
    UNDO    7
    SYSTEM  7
    
  select tablespace_name, block_size, initial_extent, next_extent, allocation_type
        ,(least(initial_extent, coalesce(next_extent, initial_extent))/block_size) - 1 "Level" 
  from dba_tablespaces where tablespace_name in ('TEMP', 'UNDO', 'SYSTEM');   
  
    TABLESPACE_NAME  BLOCK_SIZE  INITIAL_EXTENT  NEXT_EXTENT  ALLOCATION_TYPE  Level
    ---------------- ----------- --------------- ------------ ---------------- -----
    SYSTEM           8192        65536                        SYSTEM           7
    TEMP             8192        2097152         2097152      UNIFORM          255
    UNDO             8192        65536                        SYSTEM           7 
With the following DTrace script, we can verify the return value (255) of subroutine ktatminextsz. Setting the event to level 255 (the value derived above for TEMP) effectively bypasses the repeated calls of ktatminextsz.

declare
  l_cnt number;
begin
  for i in 1..1000 loop
    select /*+ use_hash(t2) */ count(*) into l_cnt from hj1 t1, hj2 t2 where t1.x=t2.x;
  end loop;
end;
/

sudo dtrace -n \
'pid$target::ktatminextsz:return /execname == "oracle"/ {
  @RET_CNT[arg0, arg1] = count();}
END {
  printf("\n**** ktatminextsz Output ****");
  printa("\nProgram Counter Offset = %d \nMINEXTSZ(Return Value) = %d \nCNT = %-10@d", @RET_CNT); }
' -p 17419


  **** ktatminextsz Output ****
  Program Counter Offset = 2578
  MINEXTSZ(Return Value) = 255
  CNT = 1000
Bug 13902396 Note said:
  This fix is disabled by default. ...
  This fix should ONLY be used under the direction of Oracle.
The Bug was first created on Mar 29, 2012 for Oracle 11.2.0.3, and it is still reproducible in 12cR1 since the fix is disabled by default according to the Note.

We have seen that in Oracle, the second TOP Latch by children count (the first is: cache buffers chains, see next Blog) is:

  NAME                     CNT     RATIO
  ----------------------- ------  ------
  simulator hash latch    2048    4.71
which is also heavily used in Hash Joins, but hardly any contention is noticed, owing to the number of its children
("simulator hash latch" is included here because it has long been a searched-for latch in the Oracle community).

In the above test, if we use the same Table join by replacing hj2 with hj1:

declare
  l_cnt number;
begin
  for i in 1..1000 loop
    select /*+ use_hash(t2) */ count(*) into l_cnt from hj1 t1, hj1 t2 where t1.x=t2.x;
  end loop;
end;
/
There are (almost) no more GETs of "simulator hash latch".

By the way, 45053 is a kg_event with valid levels from 0 to 2^32-1.

SQL> ORADEBUG DOC EVENT NAME

     Events in library GENERIC:
     ------------------------------
     kg_event[]           Support old error number events (use err# for short)

45053 Level Range: [0, 4294967295]  


2. dc_tablespaces, dc_users and temporary tables


This raises the question of whether 45053 is a universal trace event that is also relevant to temporary tables.

Let's try it. In the above test, replace:

    select /*+ use_hash(t2) */ count(*) into l_cnt from hj1 t1, hj2 t2 where t1.x=t2.x;
by a Subquery Factoring Clause with materialize hint to resolve the subquery as a global temporary table:

    with sq as (select /*+ materialize*/ * from hj1) select count(*) into l_cnt from sq;
Then run two tests: one without the 45053 trace event (default), and one with it.

We got identical results in both cases:

------------------------- Delta -------------------------
   PARAMETER             Delta-GETS   CACHE#_LIST
   --------------------- ------------ -----------
   dc_tablespaces        5000                   0
   dc_users              2000                7,10
                                      
   NAME                  Delta-GETS   ADDR              LATCH#  CHILD#
   --------------------- ------------ ----------------- ------- -----------------
   row cache objects     15000        0000000182D973D0  411     5 (dc_tablespaces)
   row cache objects     6000         0000000182D978C8  411     8 (dc_users)     
                                      
   NAME                  Delta-GETS   ADDR              LATCH#  CHILDREN_CNT
   --------------------- ------------ ----------------- ------- ------------
   simulator hash latch  2399         00000000600296D0  240     2048
Therefore setting 45053 trace event does not reduce the Row Cache Gets.

Run the dtrace script rco_dtrace_cache; the output is consistent with the result above:

*************** CID Stats with Address (double counting) ****************
CID = 7    ADDR = 17BF59D30        CNT = 1
CID = 11   ADDR = 16669DED0        CNT = 3
CID = 17   ADDR = 179A6D968        CNT = 3
CID = 7    ADDR = 17D7CACD0        CNT = 2000
CID = 0    ADDR = 17C376EF8        CNT = 5005
*************** CID Stats ****************
CID = 11   CNT = 3
CID = 17   CNT = 3
CID = 7    CNT = 2001
CID = 0    CNT = 5005
*************** CID Stats Summary ****************
CNT = 7012
Compared to the Hash Join test in Section 1, the dc_tablespaces GETs are 5 times higher, and dc_users 2 times.

If we investigate the reason further by checking the calls of subroutine "ktatminextsz", there is no output:

sudo dtrace -n \
'pid$target::ktatminextsz:return /execname == "oracle"/ {
  @RET_CNT[arg0, arg1] = count();}
END {
  printf("\n**** ktatminextsz Output ****");
  printa("\nProgram Counter Offset = %d \nMINEXTSZ(Return Value) = %d \nCNT = %-10@d", @RET_CNT); }
' -p 13767

**** ktatminextsz Output ****
That is probably why activating the 45053 trace event has no effect on the above temporary table Factoring Clause.

For now, it is unclear how to enumerate all the cases in which Row Cache Gets are proportional to Sql Executions.