Monday, December 22, 2014

Local prefixed vs. nonprefixed indexes: Column Null Values

According to the Oracle documentation, Oracle does not index table rows in which all key columns are null, except in the case of bitmap indexes (Database SQL Language Reference).

Therefore, in a composite B-tree index, if at least one key column contains a non-null value, the row is indexed. Since a local prefixed index is a composite index that includes the partition keys on the leading edge of its definition, and the partition keys are never all null, all rows are always indexed.

For applications, this means that a local prefixed index is always usable, whereas a local nonprefixed index is not. From a performance point of view, column null values act as a lever in the choice between local prefixed and nonprefixed indexes, with a weight proportional to Number_of_Null_Columns x Number_of_Rows.

Let us take the same example from Expert Oracle Database Architecture, Page 593, but update column b of one row in PART_1 to NULL.

 create table partitioned_table
 ( a int,
   b int,
   data char(20)
 )
 partition by range (a)
 (
 partition part_1 values less than(2) tablespace p1,
 partition part_2  values less than(3) tablespace p2
 )
 /

 create index local_prefixed on partitioned_table (a,b) local;

 create index local_nonprefixed on partitioned_table (b) local;

 insert into partitioned_table
  select mod(rownum-1,2)+1, rownum, 'x' from all_objects where rownum <= 1000;

 update partitioned_table set b=null where a=1 and b=99;
  
 begin
    dbms_stats.gather_table_stats
    ( user,
     'PARTITIONED_TABLE',
      cascade=>TRUE );
 end;
 /


We can see that PART_1 has one row whose column b is NULL:

 select a, count(*), count(a), count(b)
  from partitioned_table group by a;

           A   COUNT(*)   COUNT(A)   COUNT(B)
  ---------- ---------- ---------- ----------
           1        500        500        499
           2        500        500        500

 select index_name, partition_name, num_rows
   from dba_ind_partitions
  where index_name in ('LOCAL_PREFIXED', 'LOCAL_NONPREFIXED');

 INDEX_NAME                     PARTITION_NAME                   NUM_ROWS
 ------------------------------ ------------------------------ ----------
 LOCAL_NONPREFIXED              PART_1                                499
 LOCAL_NONPREFIXED              PART_2                                500
 LOCAL_PREFIXED                 PART_1                                500
 LOCAL_PREFIXED                 PART_2                                500


Now if we run the following queries, all of them will use the LOCAL_PREFIXED index (even those hinted with LOCAL_NONPREFIXED).

 select /*+ index(t local_prefixed) */ count(a) from partitioned_table t where a = 1 and b is null;

 select /*+ index(t local_nonprefixed) */ count(a) from partitioned_table t where a = 1 and b is null;

 select /*+ index(t local_prefixed) */ count(a) from partitioned_table t where b is null;

 select /*+ index(t local_nonprefixed) */ count(a) from partitioned_table t where b is null;

 select /*+ index(t local_prefixed) */ count(a) from partitioned_table t where a is null and b is null;

 select /*+ index(t local_nonprefixed) */ count(a)
   from partitioned_table t where a is null and b is null;

  -----------------------------------------------------------------------------------------------
  | Id  | Operation               | Name           | Rows  | Bytes | Cost (%CPU)| Pstart| Pstop |
  -----------------------------------------------------------------------------------------------
  |   0 | SELECT STATEMENT        |                |     1 |     7 |     1   (0)|       |       |
  |   1 |  SORT AGGREGATE         |                |     1 |     7 |            |       |       |
  |   2 |   PARTITION RANGE SINGLE|                |     1 |     7 |     1   (0)|     1 |     1 |
  |*  3 |    INDEX RANGE SCAN     | LOCAL_PREFIXED |     1 |     7 |     1   (0)|     1 |     1 |
  -----------------------------------------------------------------------------------------------

  Predicate Information (identified by operation id):
  ---------------------------------------------------

     3 - access("A"=1 AND "B" IS NULL)


For the queries with the partition key being NULL (a is null), PARTITION RANGE EMPTY elimination is used.

Even though PART_2 does not contain any rows whose column b is NULL, the LOCAL_NONPREFIXED index cannot be used in the following queries either.

 select /*+ index(t local_prefixed) */ count(a) from partitioned_table t where a = 2 and b is null;

 select /*+ index(t local_nonprefixed) */ count(a) from partitioned_table t where a = 2 and b is null;

 select /*+ index(t local_nonprefixed) */ count(a)
   from partitioned_table partition (part_2) t where b is null;
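Whether the optimizer can actually use an index for such predicates is easy to verify with an explain plan; a minimal sketch against the tables above:

 explain plan for
 select count(a) from partitioned_table t where a = 2 and b is null;

 select * from table(dbms_xplan.display);

Since a B-tree index on (b) alone contains no entries for rows where b is NULL, the optimizer cannot rely on LOCAL_NONPREFIXED to answer a "b is null" predicate and typically resorts to scanning the partition instead.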

Monday, November 10, 2014

reverse key index: the remedy is worse than the symptoms

In general, a reverse key index mitigates data block contention (buffer busy waits) when multiple concurrent sessions insert monotonically increasing keys into the right-most index block. The disadvantage is its inability to support range scans.
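For reference, a reverse key index is created with the REVERSE keyword, and an existing index can be converted in place by a rebuild; a minimal sketch (table and index names are illustrative):

 create index t_id_rev on t(id) reverse;

 -- convert an existing index to reverse key, and back
 alter index t_id_rev rebuild reverse;
 alter index t_id_rev rebuild noreverse;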

Based on AWR reports, this blog lists the main points to check when using a reverse key index, and shows how to find an acceptable trade-off between its benefits and drawbacks. The whole discussion stems from the author's experience with a single-instance DB (non-RAC).

1. Top 5 Timed Foreground Events

When some events like:
 enq: TX - index contention
 buffer busy waits
 read by other session

appear among the top events, a reverse key index is often hastily resorted to.

However "db file sequential read" should be checked after experimenting reverse key index since all the original inserts of right-most single index block are now spread over many blocks. One should try to balance the fast logical buffer activity against slow physical disk access.

2. Time Model Statistics

Since more blocks are involved with a reverse key index, DBWR is more active, and therefore:
 background elapsed time
 background cpu time

are augmented.

Higher "db file parallel write" and "DBWR checkpoint buffers written" can also be observed.

3. SQL ordered by Gets

The insert statement generates fewer "Buffer Gets" with a reverse key index.

4. SQL ordered by Reads

A reverse key index incurs higher "Physical Reads" for the insert statement, almost 1 "Reads per Exec", since each insert touches a different single block.

Moreover, queries using the index also cause higher "Physical Reads" because index entries that are conceptually adjacent are spread widely and physically isolated.

5. Segment Statistics

The absolute figures of buffer contention and physical activity can be extracted by looking at:
 Segments by Logical Reads
 Segments by Row Lock Waits
 Segments by Buffer Busy Waits

 Segments by Physical Reads
 Segments by Physical Writes

Based on them, a quick magnitude comparison of the I/O and memory figures against the system statistics can give an approximate assessment of the reverse key index.
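A sketch of such an extraction from v$segment_statistics (the index name is illustrative; the statistic names are the underlying counterparts of the AWR sections listed above):

 select statistic_name, value
   from v$segment_statistics
  where object_name = 'MY_REVERSE_INDEX'
    and statistic_name in ('logical reads', 'row lock waits', 'buffer busy waits',
                           'physical reads', 'physical writes');

Comparing these figures before and after converting the index gives the raw numbers behind the AWR sections.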

Discussion


D-1. Expert Oracle Database Architecture said:

 One way to improve the performance of the PL/SQL implementation with a regular index would be to introduce a small wait. That would reduce the contention on the right-hand side of the index and increase overall performance (Page 439)

 reverse key index (designed for Oracle RAC) (Page 491)


Usually, sequence-based inserts in an OLTP system are inherently paced by a slight delay due to the interactive nature of OLTP, whereas batch processing can be impeded by heavy buffer contention on index blocks.

D-2. Introduction To Reverse Key Indexes: Part III (A Space Oddity) and
        Oracle B-Tree Index Internals: Rebuilding The Truth wrote:

If the "right-most" block is filled by the maximum current value in the index, Oracle performs 90-10 block splits meaning that full index blocks are left behind in the index structure.

I prefer to call them 99-1 block splits as 90-10 is misleading.

Generally, a reverse key index tends to use 50-50 splits, which cost more block creations and touches.
In principle, it shifts buffer contention on a single block to multiple accesses of separate blocks.

D-3. Oracle MOS has a cautious Doc:
        Using Reverse Key Indexes For Oracle 11.2.0.2 (Doc ID 1352477.1)

 However, it must be noted that : Now the entire index had better be in the buffer cache whereas before - only the hot right hand side needed to be in the cache for efficient inserts. If the index cannot fit into the cache we might well have turned a buffer busy wait into a physical IO wait which could be even worse (the remedy is worse than the symptoms).

That memorable last remark is adopted as the title of this blog.

D-4. Oracle MOS Doc:
 Troubleshooting 'enq: TX - index contention' Waits in a RAC Environment. (Doc ID 873243.1)
explains "enq: TX - index contention":

 The reason for this is the index block splits while inserting a new row into the index. The transactions will have to wait for TX lock in mode 4, until the session that is doing the block splits completes the operations.

and suggests looking at the following Instance Activity Stats (see the query sketch after this list):
 root node splits
 branch node splits
 leaf node splits
 leaf node 90-10 splits
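A sketch to read those counters directly (the statistic names are as listed above):

 select name, value
   from v$sysstat
  where name in ('root node splits', 'branch node splits',
                 'leaf node splits', 'leaf node 90-10 splits');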


In a database, reverse key indexes can be found by:
 select * from dba_indexes where index_type = 'NORMAL/REV';

Thursday, November 6, 2014

Low High Water Mark, High High Water Mark, and Parallel Query

Recently we saw a parallel SELECT running for 4 hours:

 select /*+  parallel(t,2) parallel_index(t,2) dbms_stats cursor_sharing_exact
   use_weak_name_resl dynamic_sampling(0) no_monitoring no_substrb_pad 
   */count(*), count("ID"), count(distinct "ID"),
   sum(sys_op_opnsize("ID")), substrb(dump(min("ID"),16,0,32),1,120),
   substrb(dump(max("ID"),16,0,32),1,120),
   ...
 from "X"."TABLE_1" sample (  5.0000000000)  t;


which is generated when executing:
 dbms_stats.gather_table_stats('X', 'TABLE_1', estimate_percent=>5, degree=>2);
on both Oracle 11.2.0.3.0 and 11.2.0.4.0.

The table is about 160 GB and contains 20 million blocks (8 KB per block) in an ASSM locally managed tablespace; all the columns are NUMBER or VARCHAR2.

AWR report shows the SELECT statement statistics:
 Buffer Gets            14,805,983,166
 Physical Reads             40,474,817
 UnOptimized Read Reqs       1,881,624


TABLE_1 Segment Statistics:
 Segments by Logical Reads        16,312,997,632
 Segments by Physical Reads           38,580,663
 Segments by Direct Physical Reads    38,557,265
 Segments by Table Scans                 263,660
 Segments by Buffer Busy Waits            53,492


At first glance, the surprisingly large "Table Scans" figure certainly indicates abnormal behavior.

The huge number of "Buffer Gets" and the double "Physical Reads" also explain the slow running.

"Table Scans" for Parallel Query is the Execs of "TABLE ACCESS FULL" below "PX BLOCK ITERATOR" in xplan, since "PX BLOCK ITERATOR" at first breaks up the table into granules which are then dispatched to each PX slaves. In the case of SAMPLE select, it is the Execs of "TABLE ACCESS SAMPLE BY ROWID RANGE".

The number of granules processed by each PX slave is controlled by two hidden parameters (a query to inspect them follows this list):
 _px_min_granules_per_slave: minimum number of rowid range granules to generate per slave (default 13)
 _px_max_granules_per_slave: maximum number of rowid range granules to generate per slave (default 100)
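Their current values can be inspected as SYS, since x$ fixed tables are involved; a sketch:

 select i.ksppinm name, v.ksppstvl value, i.ksppdesc description
   from x$ksppi i, x$ksppcv v
  where i.indx = v.indx
    and i.ksppinm in ('_px_min_granules_per_slave', '_px_max_granules_per_slave');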


For instance, parallel(t,2) contributes a minimum of 26 (2*13) "Table Scans" and a maximum of 200 (2*100),
while parallel(t,16) generates a minimum of 208 (16*13) "Table Scans" and a maximum of 1600 (16*100).

Therefore, when increasing the degree from 2 to 16,
 dbms_stats.gather_table_stats(null, 'TABLE_1', estimate_percent=>5, degree=>16);
will take about 32 hours (8 x 4: degree 16 produces 8 times as many granules, and hence "Table Scans", as degree 2, whose run already took 4 hours). That is counterintuitive behavior for a parallel query.

The "Table Scans" can be monitored by:
     select * from V$segment_statistics
   where object_name = 'TABLE_1' and statistic_name = 'segment scans';

 
The hidden parameters can be changed by, for example:
 alter system set "_px_min_granules_per_slave"=19;
 
 
Searching MOS, we found:


 Bug 6474009 - On ASSM tablespaces Scans using Parallel Query can be slower than in 9.2 (Doc ID 6474009.8)


which says:

 Scans using Parallel Query on ASSM tablespaces on 10.2 can be slower compared to 9.2 when there is a wide separation between the High and Low High Water Mark.  There can be High contention and/or a large number of accesses to bitmap blocks in PQ slaves during extent-based scans
 of table/index segments.

 AWR reports may show "read by other session" as one of the top wait events and Section "Buffer Wait Statistics" may report "1st level bmb" as the most important wait class.

 The difference between the High and Low High Water Mark can be identified by running procedure DBMS_SPACE_ADMIN.ASSM_SEGMENT_SYNCHWM with parameter check_only=1.


The AWR report confirms the above note:
 read by other session  Waits=11,302,432
 1st level bmb          Waits=98,112


and calling:
 DBMS_SPACE_ADMIN.ASSM_SEGMENT_SYNCHWM('X','TABLE_1','TABLE',NULL,1) returns 1.

MOS also gives a Workaround:
   Execute procedure DBMS_SPACE_ADMIN.ASSM_SEGMENT_SYNCHWM with check_only=0 to synchronize the gap between the High and Low High Water Mark.
   IMPORTANT: A patch for bug 6493013 must be installed before running that procedure.

  
DBMS_SPACE_ADMIN.ASSM_SEGMENT_SYNCHWM only signals the existence of the gap,
but does not report its size. However, the width between the low HWM and the HWM
determines the performance of a full table scan (see Section "Segment Space and the High Water Mark" later in this blog).
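For completeness, the check-only call can be scripted as follows; a sketch, assuming (as the call above suggests) that this undocumented routine behaves as a function returning 1 when a gap exists:

 set serveroutput on
 declare
   l_gap number;
 begin
   -- check_only = 1: detect the gap without fixing it (per MOS Doc 6474009.8)
   l_gap := dbms_space_admin.assm_segment_synchwm('X', 'TABLE_1', 'TABLE', null, 1);
   dbms_output.put_line('HWM gap detected: ' || l_gap);
 end;
 /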

To find the exact gap size, we can dump the segment header by:
 alter system dump datafile 94 block 2246443; 

Here is the output:
  -----------------------------------------------------------------
  Extent Header:: spare1: 0      spare2: 0      #extents: 1242809 #blocks: 19884944
                  last map  0x9d178f31  #maps: 2446   offset: 2720 
      Highwater::  0x9d179f90  ext#: 1242808 blk#: 16     ext size: 16   
  #blocks in seg. hdr's freelists: 0    
  #blocks below: 19804642
  mapblk  0x9d178f31  offset: 441  
                   Unlocked
  --------------------------------------------------------
  Low HighWater Mark :
      Highwater::  0x1e243af9  ext#: 448588 blk#: 16     ext size: 16   
  #blocks in seg. hdr's freelists: 0    
  #blocks below: 7177417
  mapblk  0x1e242fd9  offset: 225  
  Level 1 BMB for High HWM block: 0x9d179910
  Level 1 BMB for Low HWM block: 0x1aace0d9
  --------------------------------------------------------


We can see there are 19,804,642 blocks below the Highwater mark 0x9d179f90, but only 7,177,417 below the Low HighWater Mark 0x1e243af9. So 12,627,225 blocks (19,804,642 - 7,177,417) lie in between, about 64% of the segment.

The distance between the Level 1 BMB for the High HWM block and that for the Low HWM block is 2,188,032,055
(0x9d179910 - 0x1aace0d9 = 2635569424 - 447537369 = 2188032055).
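Note that those hex values are data block addresses (DBAs); DBMS_UTILITY can decode them into relative file and block numbers, which makes the distance easier to interpret. A sketch:

 select dbms_utility.data_block_address_file(to_number('9D179910', 'XXXXXXXX')) file#,
        dbms_utility.data_block_address_block(to_number('9D179910', 'XXXXXXXX')) block#
   from dual;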

After synchronizing by:
  DBMS_SPACE_ADMIN.ASSM_SEGMENT_SYNCHWM('X','TABLE_1','TABLE',NULL,0)
the gap is closed, as shown by a new header dump:
  
  -----------------------------------------------------------------
  Extent Header:: spare1: 0      spare2: 0      #extents: 1242809 #blocks: 19884944
                  last map  0x9d178f31  #maps: 2446   offset: 2720 
      Highwater::  0x9d179f90  ext#: 1242808 blk#: 16     ext size: 16   
  #blocks in seg. hdr's freelists: 0    
  #blocks below: 19804642
  mapblk  0x9d178f31  offset: 441  
                   Unlocked
  --------------------------------------------------------
  Low HighWater Mark :
      Highwater::  0x9d179f90  ext#: 1242808 blk#: 16     ext size: 16   
  #blocks in seg. hdr's freelists: 0    
  #blocks below: 19884944
  mapblk  0x9d178f31  offset: 441  
  Level 1 BMB for High HWM block: 0x9d179910
  Level 1 BMB for Low HWM block: 0x9d179910
  --------------------------------------------------------
  
 
In "Low HighWater Mark" section:
     -. "#blocks below: 7177417" is updated to "#blocks below: 19884944" (even bigger than 19804642).
     -. "Level 1 BMB for Low HWM block: 0x1aace0d9" is changed to "Level 1 BMB for Low HWM block: 0x9d179910",
               which got the same block id (0x9d179910) as "Level 1 BMB for High HWM block: 0x9d179910".

Incidentally, we also noticed that the gap between the High and Low High Water Marks can be synchronized by:
 select /*+ parallel(t, 2) */ count(*) from TABLE_1 t;


Segment Space and the High Water Mark


Oracle has a detailed description of the High HWM (or simply HWM) and the Low HWM
(see Segment Space and the High Water Mark):

 Suppose that a transaction inserts rows into the segment. The database must allocate a group of blocks to hold the rows. The allocated blocks fall below the HWM. The database formats a bitmap block in this group to hold the metadata,  but does not preformat the remaining blocks in the group. 

 The low HWM is important in a full table scan. Because blocks below the HWM are formatted only when used, some blocks could be unformatted, as in Figure 12-25. For this reason, the database reads the bitmap block to obtain the location of the low HWM. The database reads all blocks up to the low HWM because they are known to be formatted, and then carefully reads only the formatted blocks between the low HWM and the HWM.


We can see that ASSM uses bitmaps to track the state of blocks in the segment, and introduces the LHWM and HHWM to manage segment space.

Low High-Water Mark (LHWM): like the old High Water Mark.
    All blocks below this block have been formatted for the table.

High High-Water Mark (HHWM):
    All blocks above this block have not been formatted.

The LHWM and the HHWM may not hold the same value, depending on how the bitmap tree was traversed. If they differ, the blocks between them may or may not be formatted for use. The HHWM is necessary so that direct load operations can guarantee contiguous unformatted blocks.
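The formatted/unformatted block counts that these marks delimit can be queried with DBMS_SPACE.SPACE_USAGE on an ASSM segment; a minimal sketch:

 set serveroutput on
 declare
   l_unf_blks   number; l_unf_bytes  number;
   l_fs1_blks   number; l_fs1_bytes  number;
   l_fs2_blks   number; l_fs2_bytes  number;
   l_fs3_blks   number; l_fs3_bytes  number;
   l_fs4_blks   number; l_fs4_bytes  number;
   l_full_blks  number; l_full_bytes number;
 begin
   dbms_space.space_usage('X', 'TABLE_1', 'TABLE',
                          l_unf_blks, l_unf_bytes,
                          l_fs1_blks, l_fs1_bytes, l_fs2_blks, l_fs2_bytes,
                          l_fs3_blks, l_fs3_bytes, l_fs4_blks, l_fs4_bytes,
                          l_full_blks, l_full_bytes);
   -- unformatted blocks sit between the low HWM and the HWM
   dbms_output.put_line('unformatted blocks: ' || l_unf_blks ||
                        ', full blocks: '      || l_full_blks);
 end;
 /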

During the parallel query, a select on v$bh shows more than 100 CR versions of the TABLE_1 segment header block. This could be triggered by bitmap block (BMB) checking.

When looking at v$session.event, we saw heavy activity on:
 "db file parallel read"

AWR report also shows:
 db file parallel read:  202,467

Oracle says that this event happens during recovery or buffer prefetching. But in this case, it could be the reads of multiple BMB blocks from the different files in which the BMB blocks of each group are located.

Friday, May 23, 2014

Oracle 11g AUTO_SAMPLE_SIZE performance on Small Tables

Oracle 11g enhanced DBMS_STATS.AUTO_SAMPLE_SIZE with a new algorithm for calculating NDV
(see Blog: Improvement of AUTO sampling statistics gathering feature in Oracle Database 11g).

Recent experience on large databases (a few TB, with objects of mixed size) reveals a performance disadvantage on small tables when using AUTO_SAMPLE_SIZE. In most cases, objects under 100 MB are 20% to 40% slower.

Such behaviour can be reproduced on a small test DB. First, create a few tables of different sizes. Then call dbms_stats.gather_table_stats on them and record the session statistics for each case (see the appended TestCode).

Launch the test by:
 truncate table test_stats;
 exec test_tab_run(10);


then collect the test results by:

 with sq as
   (select t.*, value - lag(value) over (partition by table_name, stat_name order by ts) val
      from test_stats t
      where stat_name in ('CPU used by this session', 'undo change vector size', 'sorts (rows)',
                          'db block gets', 'db block changes', 'physical reads', 'consistent gets'))
 select a.table_name, a.table_size, a.stat_name, a.val "AUTO_VAL", p.val "100%_VAL"
       ,(a.val - p.val) diff, round(100*(a.val - p.val)/p.val, 2) "DIFF_%"
 from  (select * from sq where estimate_percent = 'AUTO') a
      ,(select * from sq where estimate_percent = '100%') p
 where a.table_name = p.table_name
   and a.stat_name  = p.stat_name
 order by a.stat_name, a.table_size;


TABLE_NAME      TABLE_SIZE  STAT_NAME                 AUTO_VAL  100%_VAL       DIFF  DIFF_%
--------------  ----------  ------------------------  --------  --------  ---------  ------
TEST_TAB_0KB        131072  CPU used by this session        62        32         30   93.75
TEST_TAB_200KB      262144  CPU used by this session        48        35         13   37.14
TEST_TAB_800KB      917504  CPU used by this session        51        54         -3   -5.56
TEST_TAB_2MB       2228224  CPU used by this session        58        91        -33  -36.26
TEST_TAB_8MB       8519680  CPU used by this session        89       289       -200   -69.2
TEST_TAB_20MB     20971520  CPU used by this session       150       702       -552  -78.63
TEST_TAB_80MB     83755008  CPU used by this session       436      2800      -2364  -84.43
TEST_TAB_0KB        131072  consistent gets              25595     14167      11428   80.67
TEST_TAB_200KB      262144  consistent gets              18976     13582       5394   39.71
TEST_TAB_800KB      917504  consistent gets              19730     14376       5354   37.24
TEST_TAB_2MB       2228224  consistent gets              21338     15990       5348   33.45
TEST_TAB_8MB       8519680  consistent gets              28935     23622       5313   22.49
TEST_TAB_20MB     20971520  consistent gets              44004     38464       5540    14.4
TEST_TAB_80MB     83755008  consistent gets             124858    114275      10583    9.26
TEST_TAB_0KB        131072  db block changes              3347      1748       1599   91.48
TEST_TAB_200KB      262144  db block changes              2277      1639        638   38.93
TEST_TAB_800KB      917504  db block changes              2308      1684        624   37.05
TEST_TAB_2MB       2228224  db block changes              2344      1640        704   42.93
TEST_TAB_8MB       8519680  db block changes              2302      1813        489   26.97
TEST_TAB_20MB     20971520  db block changes              2318      1685        633   37.57
TEST_TAB_80MB     83755008  db block changes              6998      1713       5285  308.52
TEST_TAB_0KB        131072  db block gets                 3531      1611       1920  119.18
TEST_TAB_200KB      262144  db block gets                 2785      1497       1288   86.04
TEST_TAB_800KB      917504  db block gets                 2786      1555       1231   79.16
TEST_TAB_2MB       2228224  db block gets                 2849      1499       1350   90.06
TEST_TAB_8MB       8519680  db block gets                 2785      1659       1126   67.87
TEST_TAB_20MB     20971520  db block gets                 2815      1540       1275   82.79
TEST_TAB_80MB     83755008  db block gets                 2765      1577       1188   75.33
TEST_TAB_0KB        131072  physical reads                1885       988        897   90.79
TEST_TAB_200KB      262144  physical reads                1136       906        230   25.39
TEST_TAB_800KB      917504  physical reads                1208       965        243   25.18
TEST_TAB_2MB       2228224  physical reads                1382      1146        236   20.59
TEST_TAB_8MB       8519680  physical reads                2139      1903        236    12.4
TEST_TAB_20MB     20971520  physical reads                3639     25991     -22352     -86
TEST_TAB_80MB     83755008  physical reads               11188    101658     -90470  -88.99
TEST_TAB_0KB        131072  sorts (rows)                 13623      9873       3750   37.98
TEST_TAB_200KB      262144  sorts (rows)                 12609     69442     -56833  -81.84
TEST_TAB_800KB      917504  sorts (rows)                 12609    248542    -235933  -94.93
TEST_TAB_2MB       2228224  sorts (rows)                 12609    606742    -594133  -97.92
TEST_TAB_8MB       8519680  sorts (rows)                 12609   2397742   -2385133  -99.47
TEST_TAB_20MB     20971520  sorts (rows)                 12609   5979742   -5967133  -99.79
TEST_TAB_80MB     83755008  sorts (rows)                 12609  23889742  -23877133  -99.95
TEST_TAB_0KB        131072  undo change vector size     143968     84360      59608   70.66
TEST_TAB_200KB      262144  undo change vector size      92536     81572      10964   13.44
TEST_TAB_800KB      917504  undo change vector size     102888     82036      20852   25.42
TEST_TAB_2MB       2228224  undo change vector size     103436     81240      22196   27.32
TEST_TAB_8MB       8519680  undo change vector size     104356     95100       9256    9.73
TEST_TAB_20MB     20971520  undo change vector size      94504     90772       3732    4.11
TEST_TAB_80MB     83755008  undo change vector size      94528     90664       3864    4.26


The above table points out:

  1. "CPU used by this session" favors AUTO_SAMPLE_SIZE only when the table size is over 800KB.
  2. "consistent gets" and "physical reads" increase with table size.
     They reflect the real work of gather_table_stats.
  3. "db block changes", "db block gets" and "undo change vector size" are almost constant,
     but mostly higher for AUTO_SAMPLE_SIZE than for "100%".
     They represent the base load of gather_table_stats: since gather_table_stats hardly modifies the tables,
     they recur, regardless of table size, for each call of gather_table_stats.
  4. "sorts (rows)" is constant for AUTO_SAMPLE_SIZE, but increases rapidly for "100%".
     This exposes the effective advantage of the new algorithm.

So AUTO_SAMPLE_SIZE only exhibits its advantage once the base load of Point 3 is outweighed.

A query on elapsed time (in milliseconds) also confirms how performance varies with object size for both methods:

  with sq as
    (select t.*
           ,(ts - lag(ts) over (partition by table_name, stat_name order by ts))dur
       from test_stats t
       where stat_name in ('CPU used by this session'))
  select a.table_name, a.table_size
        ,round(extract(minute from a.dur)*60*1000 + extract(second from a.dur)*1000) "AUTO_DUR"
        ,round(extract(minute from p.dur)*60*1000 + extract(second from p.dur)*1000) "100%_DUR"
        ,round(extract(minute from (a.dur-p.dur))*60*1000 + extract(second from (a.dur-p.dur))*1000) diff_dur
  from  (select * from sq where estimate_percent = 'AUTO') a
       ,(select * from sq where estimate_percent = '100%') p
  where a.table_name = p.table_name
    and a.stat_name  = p.stat_name
  order by a.stat_name, a.table_size;
 

TABLE_NAME      TABLE_SIZE  AUTO_DUR  100%_DUR  DIFF_DUR
--------------  ----------  --------  --------  --------
TEST_TAB_0KB        131072      1852       882       970
TEST_TAB_200KB      262144       986       748       239
TEST_TAB_800KB      917504      1066      1029        37
TEST_TAB_2MB       2228224      1179      1648      -469
TEST_TAB_8MB       8519680      1645      4819     -3174
TEST_TAB_20MB     20971520      2650     11407     -8756
TEST_TAB_80MB     83755008      7300     44910    -37610


In conclusion, this blog demonstrated the existence of a crossover point in object size. Only beyond this point is AUTO_SAMPLE_SIZE profitable, and the benefit grows with the object size.

AUTO_SAMPLE_SIZE Internals


From a 10046 trace taken when running:
   exec dbms_stats.gather_table_stats(null, 'TEST_TAB_2MB', estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE);

we can see the new operation APPROXIMATE NDV:

 Rows     Row Source Operation
 -------  ---------------------------------------------------
       1  SORT AGGREGATE (cr=300 pr=296 pw=0 time=246170 us)
   30000   APPROXIMATE NDV AGGREGATE (cr=300 pr=296 pw=0 time=44330 us ...)
   30000    TABLE ACCESS FULL TEST_TAB_2MB (cr=300 pr=296 pw=0 time=23643 us ...)


As discussed in Amit Poddar's "One Pass Distinct Sampling" (which includes details of the mathematical foundations), the most expensive operation in the old method is the aggregate function (count(distinct ..)) used to calculate the NDVs, whereas the new algorithm scans the full table once and builds a synopsis per column to calculate the column NDV.

To explore APPROXIMATE NDV AGGREGATE internals, Oracle designated a special event:
   ORA-10832: Trace approximate NDV row source

The details of the approximate NDV algorithm can be traced with event 10832 set at level 1.
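For example, at session level:

 alter session set events '10832 trace name context forever, level 1';
 exec dbms_stats.gather_table_stats(null, 'TEST_TAB_2MB', estimate_percent => dbms_stats.auto_sample_size);
 alter session set events '10832 trace name context off';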

The new algorithm is coded in DBMS_SQLTUNE_INTERNAL.GATHER_SQL_STATS, which is called by DBMS_STATS_INTERNAL.GATHER_SQL_STATS, in turn called by DBMS_STATS.GATHER_BASIC_USING_APPR_NDV, and finally by DBMS_STATS.GATHER_TABLE_STATS.

TestCode


drop view stats_v;

create view stats_v as
select n.name stat_name, s.value from v$statname n, v$mystat s where n.statistic# = s.statistic#;


drop table test_stats;

create table test_stats as
select timestamp'2013-11-22 10:20:30' ts, 'TEST_TAB_xxxxxxxxxx' table_name, 0 table_size, 'AUTO_or_100%' estimate_percent, v.*
from stats_v v where 1=2;


drop table test_tab_0kb;
create table test_tab_0kb(x number, y varchar2(100));
insert into test_tab_0kb(y) values(null);
commit;


drop table test_tab_200kb;
create table test_tab_200kb(x number, y varchar2(100));
insert into test_tab_200kb select level, rpad('abc', mod(level, 100), 'x') from dual connect by level <= 3000;
commit;


drop table test_tab_800kb;
create table test_tab_800kb(x number, y varchar2(100));
insert into test_tab_800kb select level, rpad('abc', mod(level, 100), 'x') from dual connect by level <= 12000;
commit;


drop table test_tab_1mb;
create table test_tab_1mb(x number, y varchar2(100));
insert into test_tab_1mb select level, rpad('abc', mod(level, 100), 'x') from dual connect by level <= 15000;
commit;


drop table test_tab_2mb;
create table test_tab_2mb(x number, y varchar2(100));
insert into test_tab_2mb select level, rpad('abc', mod(level, 100), 'x') from dual connect by level <= 30000;
commit;


drop table test_tab_8mb;
create table test_tab_8mb(x number, y varchar2(100));
insert into test_tab_8mb select level, rpad('abc', mod(level, 100), 'x') from dual connect by level <= 120000;
commit;


drop table test_tab_20mb;
create table test_tab_20mb(x number, y varchar2(100));
insert into test_tab_20mb select level, rpad('abc', mod(level, 100), 'x') from dual connect by level <= 300000;
commit;


drop table test_tab_80mb;
create table test_tab_80mb(x number, y varchar2(100));
insert into test_tab_80mb select level, rpad('abc', mod(level, 100), 'x') from dual connect by level <= 1200000;
commit;


create or replace procedure fill_tab(p_table_name varchar2, p_size_mb number) as
begin 
  for i in 1..p_size_mb loop
    execute immediate 'insert into ' || p_table_name||' select * from test_tab_1mb';
  end loop;
  commit;
end;
/


create or replace procedure test_tab_proc (p_table_name varchar2, p_cnt number) as
  l_table_size number;
 
  procedure flush as
    begin
      execute immediate 'alter system flush shared_pool';
      execute immediate 'alter system flush buffer_cache';   
    end;
begin
  execute immediate 'select bytes from dba_segments where segment_name = :s' into l_table_size using p_table_name;
  insert into test_stats select systimestamp, p_table_name, l_table_size, 'start', v.* from stats_v v;
  flush;
  for i in 1..p_cnt loop
    dbms_stats.gather_table_stats(null, p_table_name, estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE);   
  end loop;
  insert into test_stats select systimestamp, p_table_name, l_table_size, 'AUTO', v.* from stats_v v;
  flush;
  for i in 1..p_cnt loop
    dbms_stats.gather_table_stats(null, p_table_name, estimate_percent => 100);   
  end loop;
  insert into test_stats select systimestamp, p_table_name, l_table_size, '100%', v.* from stats_v v; 
  commit;
end;
/


create or replace procedure test_tab_run(p_cnt number) as
begin
 test_tab_proc('TEST_TAB_0KB',  p_cnt);
 test_tab_proc('TEST_TAB_200KB', p_cnt);
 test_tab_proc('TEST_TAB_800KB', p_cnt);
 test_tab_proc('TEST_TAB_2MB',  p_cnt);
 test_tab_proc('TEST_TAB_8MB',  p_cnt);
 test_tab_proc('TEST_TAB_20MB', p_cnt);
 test_tab_proc('TEST_TAB_80MB', p_cnt);
end;
/

Thursday, May 8, 2014

java stored procedure calls and latch: row cache objects, and performance

In the last Blog, we demonstrated java stored procedure calls and latch: row cache objects.

Is "latch: row cache objects" a real performance problem ?

Let's run some scalability tests for both cases (see TestCode in Blog: java stored procedure calls and latch: row cache objects) on an AIX Power7 system with Entitled Capacity 4, SMT=4, and Oracle 11.2.0.3.0.

begin
  xpp_test.run_var(p_case => 1, p_job_var => 32, p_dur_seconds => 60);
  xpp_test.run_var(p_case => 2, p_job_var => 32, p_dur_seconds => 60);
end;
/


The above code runs Case_1 and Case_2 with the number of jobs varying from 1 to 32, each combination for 60 seconds.

Scalability


Once the test has terminated, run the query:

with run as
 (select case, job_cnt, sum(dur)*1000 elapsed,
         sum(run_cnt) run_cnt, sum(java_cnt) java_cnt, round(sum(java_nano)/1000/1000) java_elapsed
  from test_stats
  group by case, job_cnt)
select r1.job_cnt, r1.elapsed,
       r1.run_cnt c1_run_cnt, r2.run_cnt c2_run_cnt,
       r1.java_cnt c1_java_cnt, r2.java_cnt c2_java_cnt,
       r1.java_elapsed c1_java_elapsed, r2.java_elapsed c2_java_elapsed
from (select * from run where case = 1) r1
    ,(select * from run where case = 2) r2
where r1.job_cnt = r2.job_cnt(+)
order by r1.job_cnt;


The output shows that the throughput of both cases (C1_RUN_CNT and C2_RUN_CNT) is very close.

JOB_CNT  ELAPSED  C1_RUN_CNT  C2_RUN_CNT  C1_JAVA_CNT  C2_JAVA_CNT  C1_JAVA_ELAPSED  C2_JAVA_ELAPSED
-------  -------  ----------  ----------  -----------  -----------  ---------------  ---------------
      1    60000         118         119        86612          119              390             1599
      2   120000         240         242       176160          242              794             2944
      3   180000         360         361       264240          361             1196             4439
      4   240000         461         459       338374          459             1566             5756
      5   300000         476         503       349384          503             1852             7055
      6   360000         515         527       378010          527             2101             8040
      7   420000         551         555       404434          555             2408             8952
      8   480000         569         580       417646          580             2689            10464
      9   540000         597         582       438198          582             2978            11459
     10   600000         601         591       441134          591             3163            12797
     11   660000         613         635       449942          635             3350            14150
     12   720000         646         654       474164          654             3655            15019
     13   780000         683         668       501322          668             3912            16619
     14   840000         696         714       510864          714             4260            16985
     15   900000         714         722       524076          722             4435            19440
     16   960000         733         730       538022          730             4822            20250
     17  1020000         730         728       535820          728             4947            22637
     18  1080000         726         744       532884          744             5022            21034
     19  1140000         743         727       545362          727             5274            23924
     20  1200000         734         733       538756          733             5611            24402
     21  1260000         737         729       540958          729             6075            26138
     22  1320000         730         727       535820          727             6406            28011
     23  1380000         737         735       540958          735             6555            27972
     24  1440000         746         730       547564          730             7040            30529
     25  1500000         733         742       538022          742             7181            29288
     26  1560000         738         729       541692          729             6620            33637
     27  1620000         740         735       543160          735             7269            32531
     28  1680000         741         738       543894          738             7881            36467
     29  1740000         749         747       549766          747             7652            34837
     30  1800000         737         735       540958          735             8285            37758
     31  1860000         746         743       547564          743             8759            38612
     32  1920000         759         755       557106          755             8326            40002

                                                                     Table-1

Drawing a graph with the above data, we can see that throughput is linear up to JOB_CNT=4, and reaches its peak at JOB_CNT=16.


                                                                    Figure-1

Latch: row cache objects

                      
If we run the query for latch: row cache objects:

with rc as
  (select s.case, s.job_cnt, (e.gets-s.gets) gets, (e.misses-s.misses) misses, (e.sleeps-s.sleeps) sleeps,
        (e.spin_gets-s.spin_gets) spin_gets, (e.wait_time-s.wait_time) wait_time
  from  (select * from rc_latch_stats where step = 'start') s
       ,(select * from rc_latch_stats where step = 'end')   e
  where s.case = e.case and s.job_cnt = e.job_cnt)
select c1.job_cnt, c1.gets c1_gets, c2.gets c2_gets, c1.misses c1_misses, c2.misses c2_misses,
       c1.sleeps c1_sleeps, c2.sleeps c2_sleeps, c1.spin_gets c1_spin_gets, c2.spin_gets c2_spin_gets,
       round(c1.wait_time/1000) c1_wait_time, round(c2.wait_time/1000) c2_wait_time
from (select * from rc where case = 1) c1
    ,(select * from rc where case = 2) c2
where  c1.job_cnt = c2.job_cnt(+)
order by c1.job_cnt;


and look at its output:

JOB_CNT  C1_GETS  C2_GETS  C1_MISSES  C2_MISSES  C1_SLEEPS  C2_SLEEPS  C1_SPIN_GETS  C2_SPIN_GETS  C1_WAIT_TIME  C2_WAIT_TIME
-------  -------  -------  ---------  ---------  ---------  ---------  ------------  ------------  ------------  ------------
      1   261925    38360          1         47          0          0             1            47             0             0
      2   590567     4070       1950          0          0          0          1950             0             0             0
      3   755390     2755       6342          0          0          0          6342             0             0             0
      4   941682     3459      13772          0          2          0         13770             0             3             0
      5   958751     4947      60853          0          4          0         60850             0            14             0
      6  1128800     5224      92700        146          1          0         92699           146             1             0
      7  1193847     4719     102209         31          1          0        102208            31             1             0
      8  1219302    19716     151143          0          0          0        151143             0             0             0
      9  1261478     5227     155337          8         13          0        155324             8           464             0
     10  1256130     5404     195400          0          1          0        195399             0             0             0
     11  1264005     7931     251793         27          0          0        251793            27             0             0
     12  1320357     6014     254131         73          1          0        254130            73             0             0
     13  1492126     7266     394229        121          0          0        394229           121             0             0
     14  1496853     6948     408408        173          0          0        408408           173             0             0
     15  1511372     7124     417898        472         13          0        417885           472          1795             0
     16  1521562     8392     437642        283         11          0        437631           283          2271             0
     17  1494286    20916     355035        179         36          0        354999           179          4324             0
     18  1459280     7891     320098        420         53          0        320046           420          9841             0
     19  1611370     9055     335870        212         37          0        335833           212          9564             0
     20  1566755     9186     280001        218        100          5        279903           213         23946           530
     21  1553970     8186     268234         52         70         14        268165            38         21438          3470
     22  1499580     9350     230573        112        157          0        230420           112         40523             0
     23  1474488     8538     244989        142        158          4        244832           138         36357           392
     24  1628831     9677     229768         69        197          0        229573            69         54006             0
     25  1568732    11112     226365        344        392          5        225984           339        103739           311
     26  1548840    22742     229974        155        323          0        229656           155         76637             0
     27  1519217    10326     204424        148        418          0        204014           148         86043             0
     28  1479227     9322     200061         78        476          0        199596            78        102191             0
     29  1641837     9841     226975        220        493          0        226486           220        107470             0
     30  1556361    12102     210739        159        449          1        210305           158         93532            18
     31  1518952    10004     191122        255        558         22        190572           233        104639          4785
     32  1544149    10248     191564        254        628          9        190940           245        114982           447


                                                                   Table-2

There is a wide difference in latch gets and misses between Case_1 and Case_2.

Also drawing a graph for Case_1:


                                                                  Figure-2

For Case_1, latch misses reach their peak at about JOB_CNT=16, and starting from JOB_CNT=15 there is considerable latch wait_time.

Unbalanced Workload


One interesting query reveals the unbalanced workload among the job sessions (UNIX processes):

select job_cnt, min(run_cnt) min, max(run_cnt) max, (max(run_cnt) - min(run_cnt)) delta_max
      ,round(avg(run_cnt), 2) avg, round(stddev(run_cnt), 2) stddev
from test_stats t
where case = 1
group by case, job_cnt
order by case, job_cnt;


which shows a noticeable difference (STDDEV) starting from JOB_CNT = 4 on the 4-physical-CPU AIX (Entitled Capacity = 4), since at least one CPU is required to perform UNIX system and other Oracle tasks.
(Only Case_1 is shown, but Case_2 looks similar.)


JOB_CNT  MIN  MAX  DELTA_MAX     AVG  STDDEV
-------  ---  ---  ---------  ------  ------
      1  118  118          0     118       0
      2  120  120          0     120       0
      3  120  120          0     120       0
      4  109  120         11  115.25    5.62
      5   74  119         45    95.2   19.98
      6   75   97         22   85.83     7.7
      7   66  100         34   78.71   13.05
      8   63   77         14   71.13    4.82
      9   58   76         18   66.33     6.3
     10   56   73         17    60.1    4.95
     11   48   67         19   55.73    6.26
     12   49   64         15   53.83    5.06
     13   47   66         19   52.54     6.1
     14   46   65         19   49.71    5.95
     15   45   51          6    47.6     2.2
     16   44   47          3   45.81    0.75
     17   38   46          8   42.94    2.05
     18   35   44          9   40.33    2.47
     19   32   46         14   39.11    4.23
     20   32   43         11    36.7     3.7
     21   28   42         14    35.1    4.37
     22   25   39         14   33.18    3.43
     23   26   39         13   32.04    3.13
     24   25   36         11   31.08    2.65
     25   25   32          7   29.32    1.73
     26   25   35         10   28.38     2.5
     27   25   33          8   27.41    1.89
     28   24   30          6   26.46    1.53
     29   24   28          4   25.83    1.07
     30   22   28          6   24.57    1.33
     31   22   33         11   24.06    2.06
     32   21   30          9   23.72     1.9

                                   Table-3