Wednesday, May 17, 2017

PLSQL Context Switch Functions and Cost

This Blog will try to investigate the cost of PLSQL context switch on the (speculated) Oracle internal function calls:
    (1) plsql_run
    (2) pfrrun
    (3) evapls
    (4) kgiPinObject
We will use Dtrace to acquire call count and duration in case of different 12c PLSQL features.

Blog: Oracle 12c: PRAGMA UDF – the truth debugs plsql_run calls of 12c PRAGMA UDF, and claims the impact of fetch size.

Blog: PL/SQL context switch calls Linux SystemTap to get stacktraces of first 3 functions, and applies perf to sample stacktraces, finally generates flamegraph. Because of sampling, it does not have an exact call counter, and Nyquist sampling theorem requires the sampling should be double fast.

The last function is discussed in Blog: Blog: Using DTrace to understand why PL/SQL packages can be faster than stored procedures . It uses Dtrace to compare PL/SQL packages and stored procedures performance, and demonstrates PL/SQL packages prevails the performance of stored procedures. Probably because PL/SQL packages are stateful objects, they have to be pinned in Library Cache. Whereas stored procedures are stateless, they can be freely discarded and re-loaded if necessary.

Even 4 popular PL/SQL errors are designated hereinafter for package state:
  ORA-04068: existing state of packages has been discarded
  ORA-04061: existing state of package %s has been invalidated
  ORA-04065: not executed, altered or dropped package %s
  ORA-06508: PL/SQL: could not find program unit being called:%s
This Blog will try to demonstrate that plsql_run, pfrrun and kgiPinObject are hardly to be the deciding factors of PLSQL context switch cost because lowing the calls and elapsed time of these 3 functions to an ignorable value does not produce the proportional benefits. Only evapls keeps constant and hence probably stays in the critical code path.

But all the above approaches only discuss context switch from SQL to PLSQL. It is still open to see how to investigate the inverse context switch from PLSQL to SQL.


1. Test


Run run_test_all (see appended code), we get the following result.

Note that 'base_sql_only' is an implementation in pure SQL, 'base_plsql_only' is one in pure PLSQL. Both will be served as base for context switch comparisons.
   
SQL > exec run_test_all(10000);
  base_sql_only      Elaplsed (ms) =       60
  base_plsql_only    Elaplsed (ms) =      790
  get_sql_cnt        Elaplsed (ms) =     2140
  get_udf_cnt        Elaplsed (ms) =     2130
  get_bulk_cnt       Elaplsed (ms) =     1130
  get_with_cnt       Elaplsed (ms) =     1100  
  update_loop_cnt    Elaplsed (ms) =     2940
  update_forall_cnt  Elaplsed (ms) =     2040

'get_with_cnt' is coded in 12c WITH Clause PL/SQL Function, which seems double fast. 'get_bulk_cnt' is also faster due to bulk collect.


2. Test with Dtrace


Run the similar test case in Sqlplus Window, and call Dtrace script (see appended code) to trace the process (pid 5678) by:
   ./plsql_cost.d -p 5678
we get the result as follows.

SQL > exec run_test('base_sql_only'      , 10000);
  plsql_run    Count = 2,     Total_time (ms) = 61,   Elpased (ms) = 71
  pfrrun       Count = 2,     Total_time (ms) = 61,   Elpased (ms) = 71
  evapls       Count = 0,     Total_time (ms) = 0,    Elpased (ms) = 0
  kgiPinObject Count = 10,    Total_time (ms) = 0,    Elpased (ms) = 70

SQL > exec run_test('base_plsql_only'   , 10000);
  plsql_run    Count = 2,     Total_time (ms) = 760,  Elpased (ms) = 763
  pfrrun       Count = 2,     Total_time (ms) = 760,  Elpased (ms) = 763
  evapls       Count = 0,     Total_time (ms) = 0,    Elpased (ms) = 0
  kgiPinObject Count = 9,     Total_time (ms) = 0,    Elpased (ms) = 763

SQL > exec run_test('get_sql_cnt'      , 10000);
  plsql_run    Count = 10001, Total_time (ms) = 1296, Elpased (ms) = 2860
  pfrrun       Count = 10001, Total_time (ms) = 1175, Elpased (ms) = 2860
  evapls       Count = 10000, Total_time (ms) = 1770, Elpased (ms) = 2859
  kgiPinObject Count = 10009, Total_time (ms) = 112,  Elpased (ms) = 2860

SQL > exec run_test('get_udf_cnt'      , 10000);
  plsql_run    Count = 10001, Total_time (ms) = 1291, Elpased (ms) = 3175
  pfrrun       Count = 10001, Total_time (ms) = 1170, Elpased (ms) = 3175
  evapls       Count = 10000, Total_time (ms) = 1773, Elpased (ms) = 3173
  kgiPinObject Count = 10009, Total_time (ms) = 116,  Elpased (ms) = 3175

SQL > exec run_test('get_bulk_cnt'     , 10000);
  plsql_run    Count = 10001, Total_time (ms) = 1252, Elpased (ms) = 1720
  pfrrun       Count = 10001, Total_time (ms) = 1145, Elpased (ms) = 1720
  evapls       Count = 10000, Total_time (ms) = 1529, Elpased (ms) = 1708
  kgiPinObject Count = 10010, Total_time (ms) = 49,   Elpased (ms) = 1720

SQL > exec run_test('get_with_cnt'     , 10000);
  plsql_run    Count = 2,     Total_time (ms) = 0,    Elpased (ms) = 1664
  pfrrun       Count = 10001, Total_time (ms) = 1141, Elpased (ms) = 1664
  evapls       Count = 10000, Total_time (ms) = 1244, Elpased (ms) = 1653
  kgiPinObject Count = 10,    Total_time (ms) = 0,    Elpased (ms) = 1664
      
SQL > exec run_test('update_loop_cnt'  , 10000);
  plsql_run    Count = 10001, Total_time (ms) = 1352, Elpased (ms) = 3760
  pfrrun       Count = 10001, Total_time (ms) = 1229, Elpased (ms) = 3760
  evapls       Count = 10000, Total_time (ms) = 1859, Elpased (ms) = 3756
  kgiPinObject Count = 10007, Total_time (ms) = 123,  Elpased (ms) = 3760

SQL > exec run_test('update_forall_cnt', 10000);
  plsql_run    Count = 10001, Total_time (ms) = 1352, Elpased (ms) = 3083
  pfrrun       Count = 10001, Total_time (ms) = 1231, Elpased (ms) = 3083
  evapls       Count = 10000, Total_time (ms) = 1661, Elpased (ms) = 3077
  kgiPinObject Count = 10008, Total_time (ms) = 57,   Elpased (ms) = 3083


3. PLSQL_CODE_TYPE NATIVE


Changing plsql_code_type from default INTERPRETED to NATIVE
   alter system set plsql_code_type=native;
recompile the PLSQL Test Code, and repeat the same tests with Dtrace.

(check NATIVE comipled source by:
   select * from dba_stored_settings where object_name = 'BASE_SQL_ONLY';)

Run all test cases:

SQL > exec run_test_all(10000);
  base_sql_only      Elaplsed (ms) =       60
  base_plsql_only    Elaplsed (ms) =      800
  get_sql_cnt        Elaplsed (ms) =     2130
  get_udf_cnt        Elaplsed (ms) =     2110
  get_bulk_cnt       Elaplsed (ms) =     1110
  get_with_cnt       Elaplsed (ms) =     1040
  update_loop_cnt    Elaplsed (ms) =     2920
  update_forall_cnt  Elaplsed (ms) =     2040

There seems not much difference compared to INTERPRETED code.

Then test with Dtrace:

SQL > exec run_test('base_sql_only'   , 10000);
  plsql_run    Count = 2,     Total_time (ms) = 64,   Elpased (ms) = 73
  pfrrun       Count = 5,     Total_time (ms) = 0,    Elpased (ms) = 73
  evapls       Count = 0,     Total_time (ms) = 0,    Elpased (ms) = 0
  kgiPinObject Count = 7,     Total_time (ms) = 0,    Elpased (ms) = 73
  
SQL > exec run_test('base_plsql_only'   , 10000);
  plsql_run    Count = 2,     Total_time (ms) = 760,  Elpased (ms) = 763
  pfrrun       Count = 2,     Total_time (ms) = 760,  Elpased (ms) = 763
  evapls       Count = 0,     Total_time (ms) = 0,    Elpased (ms) = 0
  kgiPinObject Count = 9,     Total_time (ms) = 0,    Elpased (ms) = 763

SQL > exec run_test('get_sql_cnt'      , 10000);
  plsql_run    Count = 10001, Total_time (ms) = 1171, Elpased (ms) = 2639
  pfrrun       Count = 4,     Total_time (ms) = 0,    Elpased (ms) = 2639
  evapls       Count = 10000, Total_time (ms) = 1604, Elpased (ms) = 2636
  kgiPinObject Count = 10002, Total_time (ms) = 111,  Elpased (ms) = 2639
  
SQL > exec run_test('get_udf_cnt'      , 10000);
  plsql_run    Count = 10001, Total_time (ms) = 1175, Elpased (ms) = 3056
  pfrrun       Count = 4,     Total_time (ms) = 0,    Elpased (ms) = 3056
  evapls       Count = 10000, Total_time (ms) = 1642, Elpased (ms) = 3055
  kgiPinObject Count = 10004, Total_time (ms) = 110,  Elpased (ms) = 3056
  
SQL > exec run_test('get_bulk_cnt'     , 10000);
  plsql_run    Count = 10001, Total_time (ms) = 1162, Elpased (ms) = 1953
  pfrrun       Count = 5,     Total_time (ms) = 0,    Elpased (ms) = 1953
  evapls       Count = 10000, Total_time (ms) = 1435, Elpased (ms) = 1942
  kgiPinObject Count = 10007, Total_time (ms) = 47,   Elpased (ms) = 1953
    
SQL > exec run_test('get_with_cnt'     , 10000);
  plsql_run    Count = 2,     Total_time (ms) = 0,    Elpased (ms) = 1655
  pfrrun       Count = 10005, Total_time (ms) = 1138, Elpased (ms) = 1655
  evapls       Count = 10000, Total_time (ms) = 1238, Elpased (ms) = 1644
  kgiPinObject Count = 5,     Total_time (ms) = 0,    Elpased (ms) = 1655
        
SQL > exec run_test('update_loop_cnt'  , 10000);
  plsql_run    Count = 10001, Total_time (ms) = 1227, Elpased (ms) = 4377
  pfrrun       Count = 4,     Total_time (ms) = 0,    Elpased (ms) = 4377
  evapls       Count = 10000, Total_time (ms) = 1730, Elpased (ms) = 4373
  kgiPinObject Count = 10004, Total_time (ms) = 121,  Elpased (ms) = 4377
  
SQL > exec run_test('update_forall_cnt', 10000);
  plsql_run    Count = 10001, Total_time (ms) = 1235, Elpased (ms) = 3568
  pfrrun       Count = 5,     Total_time (ms) = 0,    Elpased (ms) = 3568
  evapls       Count = 10000, Total_time (ms) = 1549, Elpased (ms) = 3564
  kgiPinObject Count = 10004, Total_time (ms) = 59,   Elpased (ms) = 3568

By the way, changing optimization level from default 2 to 1 by
   alter system set plsql_optimize_level=1;
seems no impact.


4. Reasoning


Look all above test result, we can conclude the facts below:
(a). All approaches except 'get_with_cnt' have similar behavior, but different between INTERPRETED and NATIVE.

(b). NATIVE Compile eliminates pfrrun to the level of base tests in all cases except 'get_with_cnt'.

(c). 'get_with_cnt' has plsql_run and kgiPinObject close to base tests, and behaves same in NATIVE and INTERPRETED.
      Probably it is implemented as NATIVE compiled.

(d). 'kgiPinObject' has a comparable call count as that of plsql_run, but its duration is less than 10% of total elapsed time.
      Therefore, it alone seems having not so much contribution.

(e). Comparing all tests with base tests, we can see only evapls keeps constant 
      and hence probably stays in the critical code path.


5. PLSQL Test Code



drop table test_tab;
drop table test_tab2;

create table test_tab as select level id, rpad('ABC_'||level, 1000, 'x') name from dual connect by level <= 10000; 

create index test_tab#ix_id on test_tab(id);

select count(*) from test_tab;

create table test_tab2 as select * from test_tab;

create index test_tab2#ix_id on test_tab2(id);


create or replace function get_sql (p_id in number) return varchar2 is
  l_name varchar2(1000);
begin
  select name into l_name from test_tab where id = p_id;
  return l_name;
end;
/

create or replace function get_udf (p_id in number) return varchar2 is
  l_name varchar2(1000);
  PRAGMA UDF;
begin
  select name into l_name from test_tab where id = p_id;
  return l_name;
end;
/

create or replace procedure base_sql_only (p_cnt number) is
 type t_name_tab is table of test_tab.name%type;
 l_name_tab         t_name_tab   := t_name_tab();
begin
  select name bulk collect into l_name_tab from test_tab where rownum <= p_cnt;
  dbms_output.put_line('get_bulk_cnt.count = '||l_name_tab.count);
end;
/

create or replace procedure base_plsql_only (p_cnt number) is
 l_name varchar2(1000);
begin
 for i in 1 .. p_cnt loop
    l_name := rpad('ABC_'||i, 1000, 'x');
  end loop;
end;
/

create or replace procedure get_sql_cnt (p_cnt number) is
  l_name varchar2(1000);
begin
 for i in 1 .. p_cnt loop
    select get_sql(i) into l_name from test_tab where id = i;
  end loop;
end;
/

create or replace procedure get_udf_cnt (p_cnt number) is
  l_name varchar2(1000);
begin
 for i in 1 .. p_cnt loop
    select get_udf(i) into l_name from test_tab where id = i;
  end loop;
end;
/

create or replace procedure get_bulk_cnt (p_cnt number) is
 type t_name_tab is table of test_tab.name%type;
 l_name_tab         t_name_tab   := t_name_tab();
begin
  select get_sql(id) bulk collect into l_name_tab from test_tab where rownum <= p_cnt;
  dbms_output.put_line('get_bulk_cnt.count = '||l_name_tab.count);
end;
/

create or replace procedure get_with_cnt (p_cnt number) is
 type t_name_tab is table of test_tab.name%type;
 l_name_tab         t_name_tab   := t_name_tab();
begin
  execute immediate q'[
    with function get (p_id in number) return varchar2 is
      l_name varchar2(1000);
    begin
      select name into l_name from test_tab where id = p_id;
      return l_name;
    end;
    select get(id) from test_tab where rownum <= :p_cnt]'
    bulk collect into l_name_tab
    using p_cnt;
  dbms_output.put_line('get_with_cnt.count = '||l_name_tab.count);
end;
/

create or replace procedure update_loop_cnt (p_cnt number) is
  type t_id_tab is table of test_tab.id%type;
  l_id_tab         t_id_tab   := t_id_tab();
begin
 select id bulk collect into l_id_tab from test_tab where id <= p_cnt;
  for i in 1 .. p_cnt loop
    update test_tab2 set name = get_sql(id) where id = l_id_tab(i);
  end loop;
  commit;    
end;
/

create or replace procedure update_forall_cnt (p_cnt number) is
  type t_id_tab is table of test_tab.id%type;
  l_id_tab         t_id_tab   := t_id_tab();
begin
  select id bulk collect into l_id_tab from test_tab where id <= p_cnt;
    forall i in 1 .. p_cnt
      update test_tab2 set name = get_sql(id) where id = l_id_tab(i);
    dbms_output.put_line('update_forall_cnt.sql%rowcount = '||sql%rowcount);
    commit;    
end;
/

create or replace procedure run_test (p_case varchar2, p_cnt number) as
 l_start  number := dbms_utility.get_time;
begin
  case p_case
    when 'base_sql_only'     then base_sql_only(p_cnt);
    when 'base_plsql_only'   then base_plsql_only(p_cnt); 
    when 'get_sql_cnt'       then get_sql_cnt(p_cnt); 
    when 'get_udf_cnt'       then get_udf_cnt(p_cnt); 
    when 'get_bulk_cnt'      then get_bulk_cnt(p_cnt);
    when 'get_with_cnt'      then get_with_cnt(p_cnt);
    when 'update_loop_cnt'   then update_loop_cnt(p_cnt);
    when 'update_forall_cnt' then update_forall_cnt(p_cnt);     
  end case;
  dbms_output.put_line(rpad(p_case, 18)||' Elaplsed (ms) = '||lpad((dbms_utility.get_time - l_start)*10, 8));   
end;
/

create or replace procedure run_test_all(p_cnt number) as
begin
  run_test('base_sql_only'    , p_cnt);
  run_test('base_plsql_only'  , p_cnt);
  run_test('get_sql_cnt'      , p_cnt);
  run_test('get_udf_cnt'      , p_cnt);
  run_test('get_with_cnt'     , p_cnt);
  run_test('get_bulk_cnt'     , p_cnt);
  run_test('update_loop_cnt'  , p_cnt);
  run_test('update_forall_cnt', p_cnt);
end;
/


6. Dtrace Script: plsql_cost.d



#!/usr/sbin/dtrace -s

/* plsql_cost.d
 * chmod u+x plsql_cost.d
 * ./plsql_cost.d -p 
*/

BEGIN  {
  ttime_sr = 0; stime_sr = timestamp; etime_sr = timestamp;  /** plsql_run    **/
  ttime_pr = 0; stime_pr = timestamp; etime_pr = timestamp;  /** pfrrun       **/
  ttime_er = 0; stime_er = timestamp; etime_er = timestamp;  /** evapls       **/
  ttime_kp = 0; stime_kp = timestamp; etime_kp = timestamp;  /** kgiPinObject **/
} 

pid$target::plsql_run:entry { self->t_sr = timestamp; }
 
pid$target::pfrrun:entry    { self->t_pr = timestamp; } 
 
pid$target::evapls:entry    { self->t_er = timestamp; } 

pid$target::kgiPinObject:entry    { self->t_kp = timestamp; } 
  
pid$target::plsql_run:return /self->t_sr != 0/ { 
  @CNT_SR[probefunc] = count(); 
  ttime_sr   = ttime_sr + (timestamp - self->t_sr)/1000;
  self->t_sr = 0;
  etime_sr   = timestamp;
}
 
pid$target::pfrrun:return /self->t_pr != 0/ { 
  @CNT_PR[probefunc] = count(); 
  ttime_pr   = ttime_pr + (timestamp - self->t_pr)/1000;
  self->t_pr = 0;
  etime_pr   = timestamp;
} 
 
pid$target::evapls:return /self->t_er != 0/ { 
  @CNT_ER[probefunc] = count(); 
  ttime_er   = ttime_er + (timestamp - self->t_er)/1000;
  self->t_er = 0;
  etime_er   = timestamp;
}  
 
pid$target::kgiPinObject:return /self->t_kp != 0/ { 
  @CNT_KP[probefunc] = count(); 
  ttime_kp   = ttime_kp + (timestamp - self->t_kp)/1000;
  self->t_kp = 0;
  etime_kp   = timestamp;
}  
 
END { 
  /** plsql_run **/
  printa("\n%s    Count = %@d"     ,@CNT_SR);  
  printf(", Total_time (ms) = %d"  ,ttime_sr/1000);
  printf(", Elpased (ms) = %d\n"   ,(etime_sr - stime_sr)/1000000);
  
  /** pfrrun **/
  printa("%s       Count = %@d"    ,@CNT_PR);  
  printf(", Total_time (ms) = %d"  ,ttime_pr/1000);
  printf(", Elpased (ms) = %d\n"   ,(etime_pr - stime_pr)/1000000);
  
  /** evapls **/
  printa("%s       Count = %@d"    ,@CNT_ER);
  printf("%s", ttime_er == 0 ? "evapls       Count = 0":"");   
  printf(", Total_time (ms) = %d"  ,ttime_er/1000);
  printf(", Elpased (ms) = %d\n"   ,(etime_er - stime_er)/1000000); 
  
  /** kgiPinObject **/
  printa("%s Count = %@d"       ,@CNT_KP);   
  printf(", Total_time (ms) = %d"  ,ttime_kp/1000);
  printf(", Elpased (ms) = %d\n"   ,(etime_kp - stime_kp)/1000000);  
}


7. References


(1). The Joy of Low Hanging Fruit, Part 2: the Dismay and the Delight Steven Feuerstein  

(2). Oracle 12c: PRAGMA UDF – the truth Kamil Stawiarski

(3). PL/SQL context switch Frits Hoogland
 
(4). Using DTrace to understand why PL/SQL packages can be faster than stored procedures Kamil Stawiarski