According to the Oracle documentation, Oracle does not index table rows in which all key columns are null, except in the case of bitmap indexes (Database SQL Language Reference).
Therefore, in a composite B-tree index, a row is indexed as long as at least one key column contains a non-null value. Since a local prefixed index is a composite index that includes the partition keys on the leading edge of its definition, and the partition keys are never all null, all rows are always indexed.
For applications, this means that a local prefixed index is always usable, while a local nonprefixed index is not. From a performance point of view, column null values tip the balance between local prefixed and nonprefixed indexes (roughly Number_of_Null_Columns x Number_of_Rows).
Let's take the same example from Expert Oracle Database Architecture, page 593, but update column B of one row in PART_1 to NULL.
create table partitioned_table
( a int,
b int,
data char(20)
)
partition by range (a)
(
partition part_1 values less than(2) tablespace p1,
partition part_2 values less than(3) tablespace p2
)
/
create index local_prefixed on partitioned_table (a,b) local;
create index local_nonprefixed on partitioned_table (b) local;
insert into partitioned_table
select mod(rownum-1,2)+1, rownum, 'x' from all_objects where rownum <= 1000;
update partitioned_table set b=null where a=1 and b=99;
begin
dbms_stats.gather_table_stats
( user,
'PARTITIONED_TABLE',
cascade=>TRUE );
end;
/
We can see that PART_1 has one row with column B being NULL:
select a, count(*), count(a), count(b)
from partitioned_table group by a;
A COUNT(*) COUNT(A) COUNT(B)
---------- ---------- ---------- ----------
1 500 500 499
2 500 500 500
select index_name, partition_name, num_rows
from dba_ind_partitions
where index_name in ('LOCAL_PREFIXED', 'LOCAL_NONPREFIXED');
INDEX_NAME PARTITION_NAME NUM_ROWS
------------------------------ ------------------------------ ----------
LOCAL_NONPREFIXED PART_1 499
LOCAL_NONPREFIXED PART_2 500
LOCAL_PREFIXED PART_1 500
LOCAL_PREFIXED PART_2 500
Now if we run the following queries, all of them use the LOCAL_PREFIXED index.
select /*+ index(t local_prefixed) */ count(a) from partitioned_table t where a = 1 and b is null;
select /*+ index(t local_nonprefixed) */ count(a) from partitioned_table t where a = 1 and b is null;
select /*+ index(t local_prefixed) */ count(a) from partitioned_table t where b is null;
select /*+ index(t local_nonprefixed) */ count(a) from partitioned_table t where b is null;
select /*+ index(t local_prefixed) */ count(a) from partitioned_table t where a is null and b is null;
select /*+ index(t local_nonprefixed) */ count(a)
from partitioned_table t where a is null and b is null;
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Pstart| Pstop |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 7 | 1 (0)| | |
| 1 | SORT AGGREGATE | | 1 | 7 | | | |
| 2 | PARTITION RANGE SINGLE| | 1 | 7 | 1 (0)| 1 | 1 |
|* 3 | INDEX RANGE SCAN | LOCAL_PREFIXED | 1 | 7 | 1 (0)| 1 | 1 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("A"=1 AND "B" IS NULL)
For the queries where the partition key is NULL, PARTITION RANGE EMPTY partition elimination is used.
Even though PART_2 does not contain any row whose column B is NULL, the LOCAL_NONPREFIXED index cannot be used for the following queries either.
select /*+ index(t local_prefixed) */ count(a) from partitioned_table t where a = 2 and b is null;
select /*+ index(t local_nonprefixed) */ count(a) from partitioned_table t where a = 2 and b is null;
select /*+ index(t local_nonprefixed) */ count(a)
from partitioned_table partition (part_2) t where b is null;
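A commonly cited workaround (a sketch only, not verified against this exact example) is to append a constant to the nonprefixed index, so that no index key is entirely null; the index then contains every row and can serve B IS NULL predicates inside a pruned partition:
-- hypothetical variant of LOCAL_NONPREFIXED: the trailing constant 0 guarantees that
-- no index entry has all key columns null, so rows with B IS NULL are indexed
create index local_nonprefixed_b0 on partitioned_table (b, 0) local;
select /*+ index(t local_nonprefixed_b0) */ count(a)
from partitioned_table t where a = 2 and b is null;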
Monday, December 22, 2014
Monday, November 10, 2014
reverse key index: the remedy is worse than the symptoms
In general, a reverse key index mitigates data block contention (buffer busy waits) when multiple concurrent sessions insert monotonically increasing keys into the right-most index block. The disadvantage is its inability to support index range scans.
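For reference, a minimal sketch of creating a reverse key index and converting it back (hypothetical table and index names):
create index t_id_idx on t (id) reverse;
-- an existing index can be converted in place, and back again:
alter index t_id_idx rebuild reverse;
alter index t_id_idx rebuild noreverse;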
Based on AWR reports, this Blog lists the main checking points when using a reverse key index, and shows how to find an acceptable trade-off between benefits and drawbacks. The whole discussion stems from the author's experience with single-instance (non-RAC) databases.
1. Top 5 Timed Foreground Events
When events like:
enq: TX - index contention
buffer busy waits
read by other session
appear among the Top Events, a reverse key index is often hastily resorted to.
However, "db file sequential read" should be checked after experimenting with a reverse key index, since all the inserts that originally hit the right-most single index block are now spread over many blocks. One should try to balance fast logical buffer activity against slow physical disk access.
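For example, the relevant wait events could be compared from v$system_event before and after the change (a sketch; the values are cumulative since instance startup, so snapshots should be diffed):
select event, total_waits, time_waited
from v$system_event
where event in ('enq: TX - index contention', 'buffer busy waits',
                'read by other session', 'db file sequential read');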
2. Time Model Statistics
Since more blocks are involved with a reverse key index, DBWR is more active, and therefore:
background elapsed time
background cpu time
are increased.
Higher "db file parallel write" and "DBWR checkpoint buffers written" can also be observed.
3. SQL ordered by Gets
The insert statement shows lower "Buffer Gets" with a reverse key index.
4. SQL ordered by Reads
A reverse key index incurs higher "Physical Reads" for the insert statement, almost 1 "Reads per Exec", since each insert touches a different block.
Moreover, queries using the index also cause higher "Physical Reads", because conceptually adjacent index entries are spread widely over physically separated blocks.
5. Segment Statistics
The absolute figures for buffer contention and physical activity can be extracted by looking at:
Segments by Logical Reads
Segments by Row Lock Waits
Segments by Buffer Busy Waits
Segments by Physical Reads
Segments by Physical Writes
Based on these, a quick order-of-magnitude comparison of I/O and memory activity against the system statistics gives an approximate assessment of the reverse key index.
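Outside AWR, the same segment-level figures can also be pulled from v$segment_statistics, for example (the segment name is a placeholder):
select object_name, statistic_name, value
from v$segment_statistics
where object_name = 'MY_REVERSE_INDEX'
  and statistic_name in ('logical reads', 'row lock waits', 'buffer busy waits',
                         'physical reads', 'physical writes');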
Discussion
D-1. Expert Oracle Database Architecture said:
One way to improve the performance of the PL/SQL implementation with a regular index would be to introduce a small wait. That would reduce the contention on the right-hand side of the index and increase overall performance (Page 439)
reverse key index (designed for Oracle RAC) (Page 491)
Usually, sequence inserts in an OLTP system inherently come with a slight delay due to the interactive nature of OLTP, whereas batch processing can be impeded by heavy buffer contention on index blocks.
D-2. Introduction To Reverse Key Indexes: Part III (A Space Oddity) and
Oracle B-Tree Index Internals: Rebuilding The Truth wrote:
If the "right-most" block is filled by the maximum current value in the index, Oracle performs 90-10 block splits meaning that full index blocks are left behind in the index structure.
I prefer to call them 99-1 block splits as 90-10 is misleading.
Generally a reverse key index tends to use 50-50 splits, which cost more block creations and touches.
In principle, it shifts single-block buffer contention to accesses spread over many separate blocks.
D-3. Oracle MOS has a cautious Doc:
Using Reverse Key Indexes For Oracle 11.2.0.2 (Doc ID 1352477.1)
However, it must be noted that : Now the entire index had better be in the buffer cache whereas before - only the hot right hand side needed to be in the cache for efficient inserts. If the index cannot fit into the cache we might well have turned a buffer busy wait into a physical IO wait which could be even worse (the remedy is worse than the symptoms).
That memorable last remark was chosen as the title of this Blog.
D-4. Oracle MOS Doc:
Troubleshooting 'enq: TX - index contention' Waits in a RAC Environment. (Doc ID 873243.1)
explains "enq: TX - index contention":
The reason for this is the index block splits while inserting a new row into the index. The transactions will have to wait for TX lock in mode 4, until the session that is doing the block splits completes the operations.
and suggests looking at the following Instance Activity Stats (see the query sketch after the list):
root node splits
branch node splits
leaf node splits
leaf node 90-10 splits
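A sketch of reading these counters from v$sysstat (cumulative values, to be compared over an interval):
select name, value
from v$sysstat
where name in ('root node splits', 'branch node splits',
               'leaf node splits', 'leaf node 90-10 splits');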
In a database, reverse key indexes can be found by:
select * from dba_indexes where index_type = 'NORMAL/REV';
Thursday, November 6, 2014
Low High Water Mark, High High Water Mark, and Parallel Query
Recently we saw a parallel SELECT that ran for 4 hours:
select /*+ parallel(t,2) parallel_index(t,2) dbms_stats cursor_sharing_exact
use_weak_name_resl dynamic_sampling(0) no_monitoring no_substrb_pad
*/count(*), count("ID"), count(distinct "ID"),
sum(sys_op_opnsize("ID")), substrb(dump(min("ID"),16,0,32),1,120),
substrb(dump(max("ID"),16,0,32),1,120),
...
from "X"."TABLE_1" sample ( 5.0000000000) t;
which is generated when executing:
dbms_stats.gather_table_stats('X', 'TABLE_1', estimate_percent=>5, degree=>2);
on both Oracle 11.2.0.3.0 and 11.2.0.4.0.
The table is about 160 GB and contains 20 million blocks (8 KB per block) in an ASSM locally managed tablespace; all the columns are NUMBER or VARCHAR2.
AWR report shows the SELECT statement statistics:
Buffer Gets 14,805,983,166
Physical Reads 40,474,817
UnOptimized Read Reqs 1,881,624
TABLE_1 Segment Statistics:
Segments by Logical Reads 16,312,997,632
Segments by Physical Reads 38,580,663
Segments by Direct Physical Reads 38,557,265
Segments by Table Scans 263,660
Segments by Buffer Busy Waits 53,492
At first glance, the surprisingly big "Table Scans" figure certainly indicates abnormal behavior.
The huge number of "Buffer Gets" and the double "Physical Reads" (roughly twice the number of blocks in the table) also explain the slow run.
"Table Scans" for a Parallel Query is the number of Execs of "TABLE ACCESS FULL" below "PX BLOCK ITERATOR" in the execution plan, since "PX BLOCK ITERATOR" first breaks the table up into granules which are then dispatched to the PX slaves. In the case of a SAMPLE select, it is the Execs of "TABLE ACCESS SAMPLE BY ROWID RANGE".
The number of granules processed by each PX slave is controlled by two hidden parameters:
_px_min_granules_per_slave: minimum number of rowid range granules to generate per slave default 13
_px_max_granules_per_slave: maximum number of rowid range granules to generate per slave default 100
For instance, parallel(t,2) contributes a minimum of 26 (2*13) "Table Scans" and a maximum of 200 (2*100), while parallel(t,16) generates a minimum of 208 (16*13) and a maximum of 1600 (16*100).
Therefore, when increasing the degree from 2 to 16,
dbms_stats.gather_table_stats(null, 'TABLE_1', estimate_percent=>5, degree=>16);
will take 32 (8*4) hours. That is counterintuitive behavior when using a parallel query.
The "Table Scans" can be monitored by:
select * from V$segment_statistics
where object_name = 'TABLE_1' and statistic_name = 'segment scans';
The hidden parameters can be changed by, for example:
alter system set "_px_min_granules_per_slave"=19;
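Their current values can be checked with the usual hidden-parameter query (a sketch; the x$ tables require a SYSDBA connection):
select i.ksppinm parameter, v.ksppstvl value
from x$ksppi i, x$ksppcv v
where i.indx = v.indx
  and i.ksppinm in ('_px_min_granules_per_slave', '_px_max_granules_per_slave');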
Searching MOS, we found:
Bug 6474009 - On ASSM tablespaces Scans using Parallel Query can be slower than in 9.2 (Doc ID 6474009.8)
which says:
Scans using Parallel Query on ASSM tablespaces on 10.2 can be slower compared to 9.2 when there is a wide separation between the High and Low High Water Mark. There can be High contention and/or a large number of accesses to bitmap blocks in PQ slaves during extent-based scans
of table/index segments.
AWR reports may show "read by other session" as one of the top wait events and Section "Buffer Wait Statistics" may report "1st level bmb" as the most important wait class.
The difference between the High and Low High Water Mark can be identified by running procedure DBMS_SPACE_ADMIN.ASSM_SEGMENT_SYNCHWM with parameter check_only=1.
The AWR report confirms the above notice:
read by other session Waits=11,302,432
1st level bmb Waits=98,112
and calling:
DBMS_SPACE_ADMIN.ASSM_SEGMENT_SYNCHWM('X','TABLE_1','TABLE',NULL,1) returns 1.
MOS also gives a Workaround:
Execute procedure DBMS_SPACE_ADMIN.ASSM_SEGMENT_SYNCHWM with check_only=0 to synchronize the gap between the High and Low High Water Mark.
IMPORTANT: A patch for bug 6493013 must be installed before running that procedure.
DBMS_SPACE_ADMIN.ASSM_SEGMENT_SYNCHWM only signals the existence of the gap, but does not report its size. However, the width between the low HWM and the HWM determines the performance of a full table scan (see Section "Segment Space and the High Water Mark" later in this Blog).
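The datafile and block numbers used in the dump command below are those of the segment header; a sketch of how they could be looked up (assuming owner X and table TABLE_1 as above):
select header_file, header_block
from dba_segments
where owner = 'X' and segment_name = 'TABLE_1';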
To find the exact gap size, we can dump the segment header:
alter system dump datafile 94 block 2246443;
Here is the output:
-----------------------------------------------------------------
Extent Header:: spare1: 0 spare2: 0 #extents: 1242809 #blocks: 19884944
last map 0x9d178f31 #maps: 2446 offset: 2720
Highwater:: 0x9d179f90 ext#: 1242808 blk#: 16 ext size: 16
#blocks in seg. hdr's freelists: 0
#blocks below: 19804642
mapblk 0x9d178f31 offset: 441
Unlocked
--------------------------------------------------------
Low HighWater Mark :
Highwater:: 0x1e243af9 ext#: 448588 blk#: 16 ext size: 16
#blocks in seg. hdr's freelists: 0
#blocks below: 7177417
mapblk 0x1e242fd9 offset: 225
Level 1 BMB for High HWM block: 0x9d179910
Level 1 BMB for Low HWM block: 0x1aace0d9
--------------------------------------------------------
We can see there are 19,804,642 blocks below the Highwater mark 0x9d179f90, but only 7,177,417 below the Low HighWater Mark 0x1e243af9. So 12,627,225 (19804642-7177417) blocks, about 64%, lie in between.
The distance between Level 1 BMB High HWM block and Low HWM block is 2,188,032,055
(0x9d179910 - 0x1aace0d9 = 2635569424 - 447537369 = 2188032055).
After synchronizing by:
DBMS_SPACE_ADMIN.ASSM_SEGMENT_SYNCHWM('X','TABLE_1','TABLE',NULL,0)
the gap is closed, as shown by a new header dump:
-----------------------------------------------------------------
Extent Header:: spare1: 0 spare2: 0 #extents: 1242809 #blocks: 19884944
last map 0x9d178f31 #maps: 2446 offset: 2720
Highwater:: 0x9d179f90 ext#: 1242808 blk#: 16 ext size: 16
#blocks in seg. hdr's freelists: 0
#blocks below: 19804642
mapblk 0x9d178f31 offset: 441
Unlocked
--------------------------------------------------------
Low HighWater Mark :
Highwater:: 0x9d179f90 ext#: 1242808 blk#: 16 ext size: 16
#blocks in seg. hdr's freelists: 0
#blocks below: 19884944
mapblk 0x9d178f31 offset: 441
Level 1 BMB for High HWM block: 0x9d179910
Level 1 BMB for Low HWM block: 0x9d179910
--------------------------------------------------------
In the "Low HighWater Mark" section:
- "#blocks below: 7177417" is updated to "#blocks below: 19884944" (even bigger than 19804642).
- "Level 1 BMB for Low HWM block: 0x1aace0d9" is changed to "Level 1 BMB for Low HWM block: 0x9d179910", which is the same block id (0x9d179910) as "Level 1 BMB for High HWM block".
By chance, we also noticed that the gap between the High and Low High Water Marks can be synchronized by:
select /*+ parallel(t, 2) */ count(*) from TABLE_1 t;
Segment Space and the High Water Mark
Oracle has a detailed description of the High HWM (or simply HWM) and the Low HWM (see Segment Space and the High Water Mark):
Suppose that a transaction inserts rows into the segment. The database must allocate a group of blocks to hold the rows. The allocated blocks fall below the HWM. The database formats a bitmap block in this group to hold the metadata, but does not preformat the remaining blocks in the group.
The low HWM is important in a full table scan. Because blocks below the HWM are formatted only when used, some blocks could be unformatted, as in Figure 12-25. For this reason, the database reads the bitmap block to obtain the location of the low HWM. The database reads all blocks up to the low HWM because they are known to be formatted, and then carefully reads only the formatted blocks between the low HWM and the HWM.
We can see that ASSM uses bitmaps to track the state of blocks in the segment, and introduces the LHWM and HHWM to manage segment space.
Low High-Water Mark (LHWM): Like the old High Water Mark.
All blocks below this block have been formatted for the table.
High High-Water Mark (HHWM)
All blocks above this block have not been formatted.
The LHWM and the HHWM may not be the same value depending on how the bitmap tree was traversed. If different the blocks between them may or may not be formatted for use. The HHWM is necessary so that direct load operation can guarantee contiguous unformatted blocks.
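The number of unformatted blocks implied by such a gap could also be checked with DBMS_SPACE.SPACE_USAGE, for example (a sketch using the owner and table name from above; output via dbms_output):
declare
  l_unf_blks  number;  l_unf_bytes  number;
  l_fs1_blks  number;  l_fs1_bytes  number;
  l_fs2_blks  number;  l_fs2_bytes  number;
  l_fs3_blks  number;  l_fs3_bytes  number;
  l_fs4_blks  number;  l_fs4_bytes  number;
  l_full_blks number;  l_full_bytes number;
begin
  -- reports block counts of an ASSM segment, including blocks not yet formatted
  dbms_space.space_usage('X', 'TABLE_1', 'TABLE',
    l_unf_blks, l_unf_bytes,
    l_fs1_blks, l_fs1_bytes, l_fs2_blks, l_fs2_bytes,
    l_fs3_blks, l_fs3_bytes, l_fs4_blks, l_fs4_bytes,
    l_full_blks, l_full_bytes);
  dbms_output.put_line('unformatted blocks: ' || l_unf_blks ||
                       ', full blocks: ' || l_full_blks);
end;
/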
During the parallel query, a select on v$bh shows more than 100 CR versions of the TABLE_1 segment header block. This could be triggered by bitmap block (BMB) checking.
When looking at v$session.event, we saw heavy activity on:
"db file parallel read"
AWR report also shows:
db file parallel read: 202,467
Oracle says that this event happens during recovery or buffer prefetching. But in this case, it could be caused by reading multiple BMB blocks from the different files in which the BMB blocks for each group are located.
Friday, May 23, 2014
Oracle 11g AUTO_SAMPLE_SIZE performance on Small Tables
Oracle 11g enhanced DBMS_STATS.AUTO_SAMPLE_SIZE with a new algorithm for calculating NDV
(see Blog: Improvement of AUTO sampling statistics gathering feature in Oracle Database 11g ).
Recent experience on large databases (a few TB, with objects of mixed sizes) reveals a performance disadvantage on small tables when using AUTO_SAMPLE_SIZE. In most cases, objects under 100 MB are 20% to 40% slower.
Such behaviour can be reproduced on a small test DB. First, create a few tables of different sizes, then call dbms_stats.gather_table_stats on them and record session statistics for each case (see the appended TestCode).
Launch the test by:
truncate table test_stats;
exec test_tab_run(10);
then collect the test results by:
with sq as
(select t.*, value - lag(value) over (partition by table_name, stat_name order by ts) val
from test_stats t
where stat_name in ('CPU used by this session', 'undo change vector size', 'sorts (rows)',
'db block gets', 'db block changes', 'physical reads', 'consistent gets'))
select a.table_name, a.table_size, a.stat_name, a.val "AUTO_VAL", p.val "100%_VAL"
,(a.val - p.val) diff, round(100*(a.val - p.val)/p.val, 2) "DIFF_%"
from (select * from sq where estimate_percent = 'AUTO') a
,(select * from sq where estimate_percent = '100%') p
where a.table_name = p.table_name
and a.stat_name = p.stat_name
order by a.stat_name, a.table_size;
TABLE_NAME | TABLE_SIZE | STAT_NAME | AUTO_VAL | 100%_VAL | DIFF | DIFF_%
---|---|---|---|---|---|---
TEST_TAB_0KB | 131072 | CPU used by this session | 62 | 32 | 30 | 93.75
TEST_TAB_200KB | 262144 | CPU used by this session | 48 | 35 | 13 | 37.14
TEST_TAB_800KB | 917504 | CPU used by this session | 51 | 54 | -3 | -5.56
TEST_TAB_2MB | 2228224 | CPU used by this session | 58 | 91 | -33 | -36.26
TEST_TAB_8MB | 8519680 | CPU used by this session | 89 | 289 | -200 | -69.2
TEST_TAB_20MB | 20971520 | CPU used by this session | 150 | 702 | -552 | -78.63
TEST_TAB_80MB | 83755008 | CPU used by this session | 436 | 2800 | -2364 | -84.43
TEST_TAB_0KB | 131072 | consistent gets | 25595 | 14167 | 11428 | 80.67
TEST_TAB_200KB | 262144 | consistent gets | 18976 | 13582 | 5394 | 39.71
TEST_TAB_800KB | 917504 | consistent gets | 19730 | 14376 | 5354 | 37.24
TEST_TAB_2MB | 2228224 | consistent gets | 21338 | 15990 | 5348 | 33.45
TEST_TAB_8MB | 8519680 | consistent gets | 28935 | 23622 | 5313 | 22.49
TEST_TAB_20MB | 20971520 | consistent gets | 44004 | 38464 | 5540 | 14.4
TEST_TAB_80MB | 83755008 | consistent gets | 124858 | 114275 | 10583 | 9.26
TEST_TAB_0KB | 131072 | db block changes | 3347 | 1748 | 1599 | 91.48
TEST_TAB_200KB | 262144 | db block changes | 2277 | 1639 | 638 | 38.93
TEST_TAB_800KB | 917504 | db block changes | 2308 | 1684 | 624 | 37.05
TEST_TAB_2MB | 2228224 | db block changes | 2344 | 1640 | 704 | 42.93
TEST_TAB_8MB | 8519680 | db block changes | 2302 | 1813 | 489 | 26.97
TEST_TAB_20MB | 20971520 | db block changes | 2318 | 1685 | 633 | 37.57
TEST_TAB_80MB | 83755008 | db block changes | 6998 | 1713 | 5285 | 308.52
TEST_TAB_0KB | 131072 | db block gets | 3531 | 1611 | 1920 | 119.18
TEST_TAB_200KB | 262144 | db block gets | 2785 | 1497 | 1288 | 86.04
TEST_TAB_800KB | 917504 | db block gets | 2786 | 1555 | 1231 | 79.16
TEST_TAB_2MB | 2228224 | db block gets | 2849 | 1499 | 1350 | 90.06
TEST_TAB_8MB | 8519680 | db block gets | 2785 | 1659 | 1126 | 67.87
TEST_TAB_20MB | 20971520 | db block gets | 2815 | 1540 | 1275 | 82.79
TEST_TAB_80MB | 83755008 | db block gets | 2765 | 1577 | 1188 | 75.33
TEST_TAB_0KB | 131072 | physical reads | 1885 | 988 | 897 | 90.79
TEST_TAB_200KB | 262144 | physical reads | 1136 | 906 | 230 | 25.39
TEST_TAB_800KB | 917504 | physical reads | 1208 | 965 | 243 | 25.18
TEST_TAB_2MB | 2228224 | physical reads | 1382 | 1146 | 236 | 20.59
TEST_TAB_8MB | 8519680 | physical reads | 2139 | 1903 | 236 | 12.4
TEST_TAB_20MB | 20971520 | physical reads | 3639 | 25991 | -22352 | -86
TEST_TAB_80MB | 83755008 | physical reads | 11188 | 101658 | -90470 | -88.99
TEST_TAB_0KB | 131072 | sorts (rows) | 13623 | 9873 | 3750 | 37.98
TEST_TAB_200KB | 262144 | sorts (rows) | 12609 | 69442 | -56833 | -81.84
TEST_TAB_800KB | 917504 | sorts (rows) | 12609 | 248542 | -235933 | -94.93
TEST_TAB_2MB | 2228224 | sorts (rows) | 12609 | 606742 | -594133 | -97.92
TEST_TAB_8MB | 8519680 | sorts (rows) | 12609 | 2397742 | -2385133 | -99.47
TEST_TAB_20MB | 20971520 | sorts (rows) | 12609 | 5979742 | -5967133 | -99.79
TEST_TAB_80MB | 83755008 | sorts (rows) | 12609 | 23889742 | -23877133 | -99.95
TEST_TAB_0KB | 131072 | undo change vector size | 143968 | 84360 | 59608 | 70.66
TEST_TAB_200KB | 262144 | undo change vector size | 92536 | 81572 | 10964 | 13.44
TEST_TAB_800KB | 917504 | undo change vector size | 102888 | 82036 | 20852 | 25.42
TEST_TAB_2MB | 2228224 | undo change vector size | 103436 | 81240 | 22196 | 27.32
TEST_TAB_8MB | 8519680 | undo change vector size | 104356 | 95100 | 9256 | 9.73
TEST_TAB_20MB | 20971520 | undo change vector size | 94504 | 90772 | 3732 | 4.11
TEST_TAB_80MB | 83755008 | undo change vector size | 94528 | 90664 | 3864 | 4.26
The above table points out:
- "CPU used by this session" is favorable to AUTO_SAMPLE_SIZE only when the table size is over 800 KB.
- "consistent gets" and "physical reads" increase with table size. These reflect the real work of gather_table_stats.
- "db block changes", "db block gets" and "undo change vector size" are almost constant, but AUTO_SAMPLE_SIZE is mostly higher than 100%. They stand for the base load of gather_table_stats, since gather_table_stats hardly makes any modifications to the tables; regardless of table size, they are incurred by each call of gather_table_stats.
- "sorts (rows)" is constant for AUTO_SAMPLE_SIZE, but increases rapidly for 100%. This exposes the effective advantage of the new algorithm.
The query on elapsed time (in milliseconds) also confirms how performance varies with object size for both methods:
with sq as
(select t.*
,(ts - lag(ts) over (partition by table_name, stat_name order by ts))dur
from test_stats t
where stat_name in ('CPU used by this session'))
select a.table_name, a.table_size
,round(extract(minute from a.dur)*60*1000 + extract(second from a.dur)*1000) "AUTO_DUR"
,round(extract(minute from p.dur)*60*1000 + extract(second from p.dur)*1000) "100%_DUR"
,round(extract(minute from (a.dur-p.dur))*60*1000 + extract(second from (a.dur-p.dur))*1000) diff_dur
from (select * from sq where estimate_percent = 'AUTO') a
,(select * from sq where estimate_percent = '100%') p
where a.table_name = p.table_name
and a.stat_name = p.stat_name
order by a.stat_name, a.table_size;
TABLE_NAME | TABLE_SIZE | AUTO_DUR | 100%_DUR | DIFF_DUR
---|---|---|---|---
TEST_TAB_0KB | 131072 | 1852 | 882 | 970
TEST_TAB_200KB | 262144 | 986 | 748 | 239
TEST_TAB_800KB | 917504 | 1066 | 1029 | 37
TEST_TAB_2MB | 2228224 | 1179 | 1648 | -469
TEST_TAB_8MB | 8519680 | 1645 | 4819 | -3174
TEST_TAB_20MB | 20971520 | 2650 | 11407 | -8756
TEST_TAB_80MB | 83755008 | 7300 | 44910 | -37610
In conclusion, this Blog demonstrated the existence of a crossover point in object size. Only above this point is AUTO_SAMPLE_SIZE profitable, and the benefit is proportional to the object size.
AUTO_SAMPLE_SIZE Internals
By a 10046 trace when running:
exec dbms_stats.gather_table_stats(null, 'TEST_TAB_2MB', estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE);
we can see the new Operation: APPROXIMATE NDV:
Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=300 pr=296 pw=0 time=246170 us)
30000 APPROXIMATE NDV AGGREGATE (cr=300 pr=296 pw=0 time=44330 us ...)
30000 TABLE ACCESS FULL TEST_TAB_2MB (cr=300 pr=296 pw=0 time=23643 us ...)
As discussed in Amit Poddar's "One Pass Distinct Sampling" (which covers the mathematical foundations in detail), the most expensive operation in the old method is the aggregate function (count(distinct ..)) used to calculate the NDVs, whereas the new algorithm scans the full table once and builds a synopsis per column to calculate the column NDV.
To explore APPROXIMATE NDV AGGREGATE internals, Oracle provides a dedicated event:
ORA-10832: Trace approximate NDV row source
The details of approximate NDV algorithm can be traced with event 10832 set at level 1.
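For example, for the current session (a sketch using the standard event syntax):
alter session set events '10832 trace name context forever, level 1';
-- run dbms_stats.gather_table_stats(..., estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE) here
alter session set events '10832 trace name context off';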
The new algorithm is coded in DBMS_SQLTUNE_INTERNAL.GATHER_SQL_STATS, which is called by DBMS_STATS_INTERNAL.GATHER_SQL_STATS, which in turn is called by DBMS_STATS.GATHER_BASIC_USING_APPR_NDV, and finally by DBMS_STATS.GATHER_TABLE_STATS.
TestCode
drop view stats_v;
create view stats_v as
select n.name stat_name, s.value from v$statname n, v$mystat s where n.statistic# = s.statistic#;
drop table test_stats;
create table test_stats as
select timestamp'2013-11-22 10:20:30' ts, 'TEST_TAB_xxxxxxxxxx' table_name, 0 table_size, 'AUTO_or_100%' estimate_percent, v.*
from stats_v v where 1=2;
drop table test_tab_0kb;
create table test_tab_0kb(x number, y varchar2(100));
insert into test_tab_0kb(y) values(null);
commit;
drop table test_tab_200kb;
create table test_tab_200kb(x number, y varchar2(100));
insert into test_tab_200kb select level, rpad('abc', mod(level, 100), 'x') from dual connect by level <= 3000;
commit;
drop table test_tab_800kb;
create table test_tab_800kb(x number, y varchar2(100));
insert into test_tab_800kb select level, rpad('abc', mod(level, 100), 'x') from dual connect by level <= 12000;
commit;
drop table test_tab_1mb;
create table test_tab_1mb(x number, y varchar2(100));
insert into test_tab_1mb select level, rpad('abc', mod(level, 100), 'x') from dual connect by level <= 15000;
commit;
drop table test_tab_2mb;
create table test_tab_2mb(x number, y varchar2(100));
insert into test_tab_2mb select level, rpad('abc', mod(level, 100), 'x') from dual connect by level <= 30000;
commit;
drop table test_tab_8mb;
create table test_tab_8mb(x number, y varchar2(100));
insert into test_tab_8mb select level, rpad('abc', mod(level, 100), 'x') from dual connect by level <= 120000;
commit;
drop table test_tab_20mb;
create table test_tab_20mb(x number, y varchar2(100));
insert into test_tab_20mb select level, rpad('abc', mod(level, 100), 'x') from dual connect by level <= 300000;
commit;
drop table test_tab_80mb;
create table test_tab_80mb(x number, y varchar2(100));
insert into test_tab_80mb select level, rpad('abc', mod(level, 100), 'x') from dual connect by level <= 1200000;
commit;
create or replace procedure fill_tab(p_table_name varchar2, p_size_mb number) as
begin
for i in 1..p_size_mb loop
execute immediate 'insert into ' || p_table_name||' select * from test_tab_1mb';
end loop;
commit;
end;
/
create or replace procedure test_tab_proc (p_table_name varchar2, p_cnt number) as
  l_table_size number;
  -- flush shared pool and buffer cache so each measurement starts cold
  procedure flush as
  begin
    execute immediate 'alter system flush shared_pool';
    execute immediate 'alter system flush buffer_cache';
  end;
begin
  execute immediate 'select bytes from dba_segments where segment_name = :s' into l_table_size using p_table_name;
  -- baseline snapshot of session statistics
  insert into test_stats select systimestamp, p_table_name, l_table_size, 'start', v.* from stats_v v;
  flush;
  -- gather statistics p_cnt times with AUTO_SAMPLE_SIZE, then snapshot
  for i in 1..p_cnt loop
    dbms_stats.gather_table_stats(null, p_table_name, estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE);
  end loop;
  insert into test_stats select systimestamp, p_table_name, l_table_size, 'AUTO', v.* from stats_v v;
  flush;
  -- gather statistics p_cnt times with estimate_percent => 100, then snapshot
  for i in 1..p_cnt loop
    dbms_stats.gather_table_stats(null, p_table_name, estimate_percent => 100);
  end loop;
  insert into test_stats select systimestamp, p_table_name, l_table_size, '100%', v.* from stats_v v;
  commit;
end;
/
create or replace procedure test_tab_run(p_cnt number) as
begin
test_tab_proc('TEST_TAB_0KB', p_cnt);
test_tab_proc('TEST_TAB_200KB', p_cnt);
test_tab_proc('TEST_TAB_800KB', p_cnt);
test_tab_proc('TEST_TAB_2MB', p_cnt);
test_tab_proc('TEST_TAB_8MB', p_cnt);
test_tab_proc('TEST_TAB_20MB', p_cnt);
test_tab_proc('TEST_TAB_80MB', p_cnt);
end;
/
Thursday, May 8, 2014
java stored procedure calls and latch: row cache objects, and performance
In the last Blog, we demonstrated java stored procedure calls and latch: row cache objects.
Is "latch: row cache objects" a real performance problem?
Let's run some scalability tests for both cases (see TestCode in the Blog: java stored procedure calls and latch: row cache objects) on an AIX Power7 system with Entitled Capacity 4, SMT=4, and Oracle 11.2.0.3.0.
begin
xpp_test.run_var(p_case => 1, p_job_var => 32, p_dur_seconds => 60);
xpp_test.run_var(p_case => 2, p_job_var => 32, p_dur_seconds => 60);
end;
/
The above code runs Case_1 and Case_2 with the number of Jobs varying from 1 to 32, each run lasting 60 seconds.
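The rc_latch_stats snapshots used later in the latch query could, for instance, be captured before and after each run from v$latch (a sketch):
select gets, misses, sleeps, spin_gets, wait_time
from v$latch
where name = 'row cache objects';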
Scalability
Once the test has terminated, run the query:
with run as
(select case, job_cnt, sum(dur)*1000 elapsed,
sum(run_cnt) run_cnt, sum(java_cnt) java_cnt, round(sum(java_nano)/1000/1000) java_elapsed
from test_stats
group by case, job_cnt)
select r1.job_cnt, r1.elapsed,
r1.run_cnt c1_run_cnt, r2.run_cnt c2_run_cnt,
r1.java_cnt c1_java_cnt, r2.java_cnt c2_java_cnt,
r1.java_elapsed c1_java_elapsed, r2.java_elapsed c2_java_elapsed
from (select * from run where case = 1) r1
,(select * from run where case = 2) r2
where r1.job_cnt = r2.job_cnt(+)
order by r1.job_cnt;
The output shows that the throughput of the two cases (C1_RUN_CNT and C2_RUN_CNT) is very close.
JOB_CNT | ELAPSED | C1_RUN_CNT | C2_RUN_CNT | C1_JAVA_CNT | C2_JAVA_CNT | C1_JAVA_ELAPSED | C2_JAVA_ELAPSED
---|---|---|---|---|---|---|---
1 | 60000 | 118 | 119 | 86612 | 119 | 390 | 1599
2 | 120000 | 240 | 242 | 176160 | 242 | 794 | 2944
3 | 180000 | 360 | 361 | 264240 | 361 | 1196 | 4439
4 | 240000 | 461 | 459 | 338374 | 459 | 1566 | 5756
5 | 300000 | 476 | 503 | 349384 | 503 | 1852 | 7055
6 | 360000 | 515 | 527 | 378010 | 527 | 2101 | 8040
7 | 420000 | 551 | 555 | 404434 | 555 | 2408 | 8952
8 | 480000 | 569 | 580 | 417646 | 580 | 2689 | 10464
9 | 540000 | 597 | 582 | 438198 | 582 | 2978 | 11459
10 | 600000 | 601 | 591 | 441134 | 591 | 3163 | 12797
11 | 660000 | 613 | 635 | 449942 | 635 | 3350 | 14150
12 | 720000 | 646 | 654 | 474164 | 654 | 3655 | 15019
13 | 780000 | 683 | 668 | 501322 | 668 | 3912 | 16619
14 | 840000 | 696 | 714 | 510864 | 714 | 4260 | 16985
15 | 900000 | 714 | 722 | 524076 | 722 | 4435 | 19440
16 | 960000 | 733 | 730 | 538022 | 730 | 4822 | 20250
17 | 1020000 | 730 | 728 | 535820 | 728 | 4947 | 22637
18 | 1080000 | 726 | 744 | 532884 | 744 | 5022 | 21034
19 | 1140000 | 743 | 727 | 545362 | 727 | 5274 | 23924
20 | 1200000 | 734 | 733 | 538756 | 733 | 5611 | 24402
21 | 1260000 | 737 | 729 | 540958 | 729 | 6075 | 26138
22 | 1320000 | 730 | 727 | 535820 | 727 | 6406 | 28011
23 | 1380000 | 737 | 735 | 540958 | 735 | 6555 | 27972
24 | 1440000 | 746 | 730 | 547564 | 730 | 7040 | 30529
25 | 1500000 | 733 | 742 | 538022 | 742 | 7181 | 29288
26 | 1560000 | 738 | 729 | 541692 | 729 | 6620 | 33637
27 | 1620000 | 740 | 735 | 543160 | 735 | 7269 | 32531
28 | 1680000 | 741 | 738 | 543894 | 738 | 7881 | 36467
29 | 1740000 | 749 | 747 | 549766 | 747 | 7652 | 34837
30 | 1800000 | 737 | 735 | 540958 | 735 | 8285 | 37758
31 | 1860000 | 746 | 743 | 547564 | 743 | 8759 | 38612
32 | 1920000 | 759 | 755 | 557106 | 755 | 8326 | 40002
Table-1
If we draw a graph of the above data, we can see that throughput is linear up to JOB_CNT=4 and reaches its peak at JOB_CNT=16.
Figure-1
Latch: row cache objects
If we run the query for latch: row cache objects:
with rc as
(select s.case, s.job_cnt, (e.gets-s.gets) gets, (e.misses-s.misses) misses, (e.sleeps-s.sleeps) sleeps,
(e.spin_gets-s.spin_gets) spin_gets, (e.wait_time-s.wait_time) wait_time
from (select * from rc_latch_stats where step = 'start') s
,(select * from rc_latch_stats where step = 'end') e
where s.case = e.case and s.job_cnt = e.job_cnt)
select c1.job_cnt, c1.gets c1_gets, c2.gets c2_gets, c1.misses c1_misses, c2.misses c2_misses,
c1.sleeps c1_sleeps, c2.sleeps c2_sleeps, c1.spin_gets c1_spin_gets, c2.spin_gets c2_spin_gets,
round(c1.wait_time/1000) c1_wait_time, round(c2.wait_time/1000) c2_wait_time
from (select * from rc where case = 1) c1
,(select * from rc where case = 2) c2
where c1.job_cnt = c2.job_cnt(+)
order by c1.job_cnt;
and look at its output:
JOB_CNT | C1_GETS | C2_GETS | C1_MISSES | C2_MISSES | C1_SLEEPS | C2_SLEEPS | C1_SPIN_GETS | C2_SPIN_GETS | C1_WAIT_TIME | C2_WAIT_TIME
---|---|---|---|---|---|---|---|---|---|---
1 | 261925 | 38360 | 1 | 47 | 0 | 0 | 1 | 47 | 0 | 0
2 | 590567 | 4070 | 1950 | 0 | 0 | 0 | 1950 | 0 | 0 | 0
3 | 755390 | 2755 | 6342 | 0 | 0 | 0 | 6342 | 0 | 0 | 0
4 | 941682 | 3459 | 13772 | 0 | 2 | 0 | 13770 | 0 | 3 | 0
5 | 958751 | 4947 | 60853 | 0 | 4 | 0 | 60850 | 0 | 14 | 0
6 | 1128800 | 5224 | 92700 | 146 | 1 | 0 | 92699 | 146 | 1 | 0
7 | 1193847 | 4719 | 102209 | 31 | 1 | 0 | 102208 | 31 | 1 | 0
8 | 1219302 | 19716 | 151143 | 0 | 0 | 0 | 151143 | 0 | 0 | 0
9 | 1261478 | 5227 | 155337 | 8 | 13 | 0 | 155324 | 8 | 464 | 0
10 | 1256130 | 5404 | 195400 | 0 | 1 | 0 | 195399 | 0 | 0 | 0
11 | 1264005 | 7931 | 251793 | 27 | 0 | 0 | 251793 | 27 | 0 | 0
12 | 1320357 | 6014 | 254131 | 73 | 1 | 0 | 254130 | 73 | 0 | 0
13 | 1492126 | 7266 | 394229 | 121 | 0 | 0 | 394229 | 121 | 0 | 0
14 | 1496853 | 6948 | 408408 | 173 | 0 | 0 | 408408 | 173 | 0 | 0
15 | 1511372 | 7124 | 417898 | 472 | 13 | 0 | 417885 | 472 | 1795 | 0
16 | 1521562 | 8392 | 437642 | 283 | 11 | 0 | 437631 | 283 | 2271 | 0
17 | 1494286 | 20916 | 355035 | 179 | 36 | 0 | 354999 | 179 | 4324 | 0
18 | 1459280 | 7891 | 320098 | 420 | 53 | 0 | 320046 | 420 | 9841 | 0
19 | 1611370 | 9055 | 335870 | 212 | 37 | 0 | 335833 | 212 | 9564 | 0
20 | 1566755 | 9186 | 280001 | 218 | 100 | 5 | 279903 | 213 | 23946 | 530
21 | 1553970 | 8186 | 268234 | 52 | 70 | 14 | 268165 | 38 | 21438 | 3470
22 | 1499580 | 9350 | 230573 | 112 | 157 | 0 | 230420 | 112 | 40523 | 0
23 | 1474488 | 8538 | 244989 | 142 | 158 | 4 | 244832 | 138 | 36357 | 392
24 | 1628831 | 9677 | 229768 | 69 | 197 | 0 | 229573 | 69 | 54006 | 0
25 | 1568732 | 11112 | 226365 | 344 | 392 | 5 | 225984 | 339 | 103739 | 311
26 | 1548840 | 22742 | 229974 | 155 | 323 | 0 | 229656 | 155 | 76637 | 0
27 | 1519217 | 10326 | 204424 | 148 | 418 | 0 | 204014 | 148 | 86043 | 0
28 | 1479227 | 9322 | 200061 | 78 | 476 | 0 | 199596 | 78 | 102191 | 0
29 | 1641837 | 9841 | 226975 | 220 | 493 | 0 | 226486 | 220 | 107470 | 0
30 | 1556361 | 12102 | 210739 | 159 | 449 | 1 | 210305 | 158 | 93532 | 18
31 | 1518952 | 10004 | 191122 | 255 | 558 | 22 | 190572 | 233 | 104639 | 4785
32 | 1544149 | 10248 | 191564 | 254 | 628 | 9 | 190940 | 245 | 114982 | 447
Table-2
There is a huge difference in latch gets and misses between Case_1 and Case_2.
Drawing a graph for Case_1 as well:
Figure-2
For Case_1, latch misses reach their peak at about JOB_CNT=16, and starting from JOB_CNT=15 there is considerable latch wait_time.
Unbalanced Workload
One interesting query to reveal the unbalanced workload of Job sessions (UNIX processes) is:
select job_cnt, min(run_cnt) min, max(run_cnt) max, (max(run_cnt) - min(run_cnt)) delta_max
,round(avg(run_cnt), 2) avg, round(stddev(run_cnt), 2) stddev
from test_stats t
where case = 1
group by case, job_cnt
order by case, job_cnt;
which shows a noticeable difference (STDDEV) starting from JOB_CNT = 4 on this 4-physical-CPU AIX box (Entitled Capacity = 4), since at least one CPU is required for UNIX system and other Oracle tasks.
(Only Case_1 is shown, but Case_2 looks similar.)
JOB_CNT | MIN | MAX | DELTA_MAX | AVG | STDDEV
---|---|---|---|---|---
1 | 118 | 118 | 0 | 118 | 0
2 | 120 | 120 | 0 | 120 | 0
3 | 120 | 120 | 0 | 120 | 0
4 | 109 | 120 | 11 | 115.25 | 5.62
5 | 74 | 119 | 45 | 95.2 | 19.98
6 | 75 | 97 | 22 | 85.83 | 7.7
7 | 66 | 100 | 34 | 78.71 | 13.05
8 | 63 | 77 | 14 | 71.13 | 4.82
9 | 58 | 76 | 18 | 66.33 | 6.3
10 | 56 | 73 | 17 | 60.1 | 4.95
11 | 48 | 67 | 19 | 55.73 | 6.26
12 | 49 | 64 | 15 | 53.83 | 5.06
13 | 47 | 66 | 19 | 52.54 | 6.1
14 | 46 | 65 | 19 | 49.71 | 5.95
15 | 45 | 51 | 6 | 47.6 | 2.2
16 | 44 | 47 | 3 | 45.81 | 0.75
17 | 38 | 46 | 8 | 42.94 | 2.05
18 | 35 | 44 | 9 | 40.33 | 2.47
19 | 32 | 46 | 14 | 39.11 | 4.23
20 | 32 | 43 | 11 | 36.7 | 3.7
21 | 28 | 42 | 14 | 35.1 | 4.37
22 | 25 | 39 | 14 | 33.18 | 3.43
23 | 26 | 39 | 13 | 32.04 | 3.13
24 | 25 | 36 | 11 | 31.08 | 2.65
25 | 25 | 32 | 7 | 29.32 | 1.73
26 | 25 | 35 | 10 | 28.38 | 2.5
27 | 25 | 33 | 8 | 27.41 | 1.89
28 | 24 | 30 | 6 | 26.46 | 1.53
29 | 24 | 28 | 4 | 25.83 | 1.07
30 | 22 | 28 | 6 | 24.57 | 1.33
31 | 22 | 33 | 11 | 24.06 | 2.06
32 | 21 | 30 | 9 | 23.72 | 1.9
Table-3