Thursday, September 24, 2015

SQL DML Exceptions, Rollbacks and PL/SQL Exception Handlers

When SQL DML statements hit runtime errors, Oracle rollbacks only the last DML statement, which caused the error. PL/SQL exception handling can make use of this behavior to save the not rollbacked work and keep the application continue running. Oracle is optimized to rollback the entire transaction if there is only one single DML statement inside the transaction.

Oracle "Database PL/SQL Language Reference" Section: "Retrying Transactions After Handling Exceptions" said:

   To retry a transaction after handling an exception that it raised, use this technique:
    ......
   If the transaction fails, control transfers to the exception-handling part of the sub-block, and after the exception handler runs, the loop repeats.

which interprets the exception-handling as Transaction-Level. Over there the example contains two DML statements (delete and insert), and rollback is controlled by a savepoint.

At first pick the test code from Book Oracle PL/SQL Programming (5th edition, Steven Feuerstein) Chapter 14: Section: DML and Exception Handling (Page 446), and add some extra lines:
   
drop table books;

create table books (book_id integer);

create or replace function tabcount return pls_integer is
   l_return pls_integer;
begin
   select count (*) into l_return from books;
   return l_return;
end tabcount;
/

create or replace procedure empty_library (pre_empty_count out pls_integer) is
begin
   pre_empty_count := tabcount ();
   dbms_output.put_line ('empty_library pre_empty_count='||pre_empty_count);
   dbms_output.put_line ('empty_library tabcount S1='||tabcount());
   delete from books where book_id=1;
   dbms_output.put_line ('empty_library tabcount S2='||tabcount());
   raise no_data_found;
end;
/

Run first test with Exception Handler:

set serveroutput on;

truncate table books;

declare
  table_count number := -1;
begin
  insert into books (book_id) values (1);
  insert into books (book_id) values (2);
  dbms_output.put_line ('tabcount S1='||tabcount());
  empty_library (table_count);
  exception when others then
    dbms_output.put_line ('tabcount S2='||tabcount());
    dbms_output.put_line ('table_count S3='||table_count);
end;
/

select * from books;

The output looks like:

tabcount S1=2
empty_library pre_empty_count=2
empty_library tabcount S1=2
empty_library tabcount S2=1
tabcount S2=1
table_count S3=-1

SQL> select * from books;
   BOOK_ID
----------
         2   

Run another test without Exception Handler:

set serveroutput on;

truncate table books;

declare
  table_count number := -1;
begin
  insert into books (book_id) values (1);
  insert into books (book_id) values (2);
  dbms_output.put_line ('tabcount S1='||tabcount());
  empty_library (table_count);
end;
/

select * from books;

The output is:

tabcount S1=2
empty_library pre_empty_count=2
empty_library tabcount S1=2
empty_library tabcount S2=1
ORA-01403: no data found
ORA-06512: at "K.EMPTY_LIBRARY", line 8

SQL> select * from books;
no rows selected  

Reading text in Page 446:

  When an exception occurs in a PL/SQL block, the Oracle database does not roll back
  any of the changes made by DML statements in that block.


and Page 447:

  If an exception propagates past the outermost block (i.e., it goes “unhandled”),
  then in most host execution environments for PL/SQL like SQL*Plus, a rollback is
  automatically executed, reversing any outstanding changes.


Crosschecking above two tests, we can see SQL*Plus does not roll back if there is an error handler,
and does a rollback to the beginning of block if there is no error handler (Unhandled Exceptions).

Relevant information can also be found in Page 145 about Unhandled Exceptions:

  If an exception is raised in your program, and it is not handled by an exception section
  in either the current or enclosing PL/SQL blocks, that exception is unhandled. PL/SQL
  returns the error that raised the unhandled exception all the way back to the application
  environment from which PL/SQL was run. That environment (a tool like SQL*Plus,
  Oracle Forms, or a Java program) then takes an action appropriate to the situation; in
  the case of SQL*Plus, a ROLLBACK of any DML changes from within that top-level
  block’s logic is automatically performed.


and Page 137,

  When this procedure(RAISE_APPLICATION_ERROR) is run, execution of the current PL/SQL block halts immediately,
  and any changes made to OUT or IN OUT arguments (if present and without the NOCOPY hint) will be reversed.

Book Expert Oracle Database Architecture (3rd Edition, Thomas Kyte, Darl Kuhn) - Chapter 8, Section: Atomicity (Page 277-283) explains the principle of Statement-Level Atomicity, and mimicks the work Oracle normally does with the SAVEPOINT:

 Savepoint sp;
 statement;
 If error then rollback to sp;


It further shows that Oracle considers Procedure-Level Atomicity (PL/SQL anonymous blocks) to be statements as well.

This Blog will try to show how to apply above principle to PL/SQL blocks with Exception Handlers in order to keep application not interrupted.

We will run 5 test cases to demonstrate such rollbacks in Statement-Level. When error occurs, it is not "transaction fails", but only "statement fails".

All Testcode is appended at the end of Blog.

1. undo_tbs_test_1


Run following code:

  set serveroutput on lines=200
  select n.name, s.value from v$mystat s, v$statname n
  where s.statistic#=n.statistic#
    and name in ('user commits', 'transaction rollbacks', 'rollback changes - undo records applied');
  select id, ts, step from test_t1 where id <= 2;

  exec undo_tbs_test_1;

  select n.name, s.value from v$mystat s, v$statname n
  where s.statistic#=n.statistic#
    and name in ('user commits', 'transaction rollbacks', 'rollback changes - undo records applied');
  select id, ts, step from test_t1 where id <= 2;


The output look as follows:

  NAME                                             VALUE
  ------------------------------------------- ----------
  user commits                                       685
  rollback changes - undo records applied         147854
  transaction rollbacks                              126

  ID TS                                     STEP
  -- -------------------------------- ----------
   1 22-SEP-2015 07:15:01                      0
   2 22-SEP-2015 07:15:01                      0

  ORA-30036: unable to extend segment by 8 in undo tablespace 'TEMPUNDO', at Step=1
  ORA-30036: unable to extend segment by 8 in undo tablespace 'TEMPUNDO', at Step=2
  ORA-30036: unable to extend segment by 8 in undo tablespace 'TEMPUNDO', at Step=3
  End committed, at Step=4

  NAME                                            VALUE
  ------------------------------------------ ----------
  user commits                                      685
  rollback changes - undo records applied        151098
  transaction rollbacks                             129

  ID TS                                   STEP
  -- ------------------------------ ----------
   1 22-SEP-2015 07:15:01                    0
   2 22-SEP-2015 07:15:01                    0


we can see 3 (129-126) "transaction rollbacks", and no user commits (685-685), and hence no rows updated.

2. undo_tbs_test_2


Run following code:

  update test_t1 set acc = null;
  commit;
  select n.name, s.value from v$mystat s, v$statname n
  where s.statistic#=n.statistic#
    and name in ('user commits', 'transaction rollbacks', 'rollback changes - undo records applied');
  select count(*) from test_t1 where acc is null;

  exec undo_tbs_test_2;

  select n.name, s.value from v$mystat s, v$statname n
  where s.statistic#=n.statistic#
    and name in ('user commits', 'transaction rollbacks', 'rollback changes - undo records applied');
  select count(*) from test_t1 where acc is null;


The output look as follows:

  NAME                                                    VALUE
  -------------------------------------------------- ----------
  user commits                                              707
  rollback changes - undo records applied                153199
  transaction rollbacks                                     129

  sql%rowcount=50 Updated_1, at Step=1
  ORA-30036: unable to extend segment by 8 in undo tablespace 'TEMPUNDO', at Step=1
  sql%rowcount=50 Updated_1, at Step=2
  ORA-30036: unable to extend segment by 8 in undo tablespace 'TEMPUNDO', at Step=2
  ......
  ORA-30036: unable to extend segment by 8 in undo tablespace 'TEMPUNDO', at Step=18
  sql%rowcount=50 Updated_1, at Step=19
  ORA-30036: unable to extend segment by 8 in undo tablespace 'TEMPUNDO', at Step=19
  sql%rowcount=50 Updated_1, at Step=20
  sql%rowcount=0 Updated_1, at Step=20
  End committed, at Step=20

  NAME                                                    VALUE
  -------------------------------------------------- ----------
  user commits                                              727
  rollback changes - undo records applied                154923
  transaction rollbacks                                     129

  sql> select count(*) from test_t1 where acc is null;
     COUNT(*)
   ----------
            0


we can see 20 user commits (727-707), no "transaction rollbacks" (129-129), and all rows are updated. From application point of view, all the updates are successfully performed, even with a small UNDO Tablespace, which is not met one-run requirement.

This could be used as a workaround in case of UNDO Tablespace is limited.

3. unique_constraint_test_1


Run following code:

  update test_t1 set id = rownum;
  commit;
  select n.name, s.value from v$mystat s, v$statname n
  where s.statistic#=n.statistic#
    and name in ('user commits', 'transaction rollbacks', 'rollback changes - undo records applied');

  exec unique_constraint_test_1;

  select n.name, s.value from v$mystat s, v$statname n
  where s.statistic#=n.statistic#
    and name in ('user commits', 'transaction rollbacks', 'rollback changes - undo records applied');


The output look as follows:

  NAME                                                    VALUE
  -------------------------------------------------- ----------
  user commits                                              728
  rollback changes - undo records applied                154923
  transaction rollbacks                                     129

  ORA-00001: unique constraint (K.TEST_T1_PK) violated, at Step=1
  ORA-00001: unique constraint (K.TEST_T1_PK) violated, at Step=2
  ORA-00001: unique constraint (K.TEST_T1_PK) violated, at Step=3
  End committed, at Step=4

  NAME                                                    VALUE
  -------------------------------------------------- ----------
  user commits                                              728
  rollback changes - undo records applied                157962
  transaction rollbacks                                     132


Again we see 3 (132-129) "transaction rollbacks", and no user commits (728-728), and hence no rows updated.

4. unique_constraint_test_2


Run following code:

 update test_t1 set id = rownum, step = 0, acc = 0;
 commit;
 select id, ts, step, acc from test_t1
  where id in (1, 900, 901, 1000, -1, -900, -901, 1000) order by step, acc, id;
 select n.name, s.value from v$mystat s, v$statname n
 where s.statistic#=n.statistic#
   and name in ('user commits', 'transaction rollbacks', 'rollback changes - undo records applied');

 exec unique_constraint_test_2;

 select n.name, s.value from v$mystat s, v$statname n
 where s.statistic#=n.statistic#
   and name in ('user commits', 'transaction rollbacks', 'rollback changes - undo records applied'); 
 select id, ts, step, acc from test_t1
  where id in (1, 900, 901, 1000, -1, -900, -901, 1000) order by step, acc, id;


The output look as follows:

   ID TS                               STEP     ACC
 ---- -------------------------- ---------- -------
    1 22-SEP-2015 07:35:13                0       0
  900 22-SEP-2015 07:35:13                0       0
  901 22-SEP-2015 07:35:13                0       0
 1000 22-SEP-2015 07:35:13                0       0

 NAME                                           VALUE
 ----------------------------------------- ----------
 user commits                                     742
 rollback changes - undo records applied       163307
 transaction rollbacks                            145

 sql%rowcount=100 Updated_1, at Step=1
 sql%rowcount=100 Updated_1, at Step=1
 sql%rowcount=100 Updated_1, at Step=1
 sql%rowcount=100 Updated_1, at Step=1
 sql%rowcount=100 Updated_1, at Step=1
 sql%rowcount=100 Updated_1, at Step=1
 sql%rowcount=100 Updated_1, at Step=1
 sql%rowcount=100 Updated_1, at Step=1
 sql%rowcount=100 Updated_1, at Step=1
 ORA-00001: unique constraint (K.TEST_T1_PK) violated, at Step=1
 ORA-00001: unique constraint (K.TEST_T1_PK) violated, at Step=2
 ORA-00001: unique constraint (K.TEST_T1_PK) violated, at Step=3
 End committed, at Step=4

 NAME                                          VALUE
 ---------------------------------------- ----------
 user commits                                    743
 rollback changes - undo records applied      164201
 transaction rollbacks                           147

   ID TS                              STEP     ACC
 ---- ------------------------- ---------- -------
  901 22-SEP-2015 07:35:13               0       0
 1000 22-SEP-2015 07:35:13               0       0
   -1 22-SEP-2015 07:35:13               1       1
 -900 22-SEP-2015 07:35:13               1       9


we can see 1 user commits (743-742), 2 "transaction rollbacks" (147-145), 900 rows are updated, 100 rows not updated.

5. deadlock_test


Restore test table:

 column name format a50
 update test_t1 set id = rownum, name = null, acc = 0;

 commit;


Open 3 Sessions, run following 3 scripts in 3 different sessions sequetially at time T1, T2, and T3:

 exec deadlock_s1;  --Session_1_T1

 exec deadlock_s2;  --Session_2_T2

 exec deadlock_s3;  --Session_3_T3


Check the updates by:

 select id, ts, name, acc from test_t1 where id <= 3 order by id; 
 ID TS                         NAME                                ACC
 -- -------------------------- -------------------------------- ----------
  1 22-SEP-2015 07:35:13       Session_1_T1/Session_2_T2/                2
  2 22-SEP-2015 07:35:13       Session_2_T2/                         1
  3 22-SEP-2015 07:35:13       Session_1_T1/Session_3_T3/                2


All Sessions are terminated in about 120 seconds, and Session_1 throws Exeception:
 ORA-00060: deadlock detected while waiting for resource, in Session_1_T1/

Row 1 updated twice (Session_1 and Session_2), Row 3 updated twice (Session_1 and Session_3),
Row 2 updated once (Session_2).

In fact, Row 2 was also updated once by Session_1, but it hit deadlock error, and the update is rollbacked. However, the other two row updates (Row 1 and 3) by Session_1 are still kept, thus committed.

Session_3 is blocked by Session_1, no transaction can be started (see v$transaction).
It will wait till Session_1's transaction terminated.

TestCode


--ensure no transaction alive on TEMPUNDO and all TEMPUNDO'terminated DML's undo_retention expired.
drop tablespace tempundo;


create undo tablespace tempundo
datafile '/testdb/undo/tempundo.dbf'
size 8m reuse autoextend off retention noguarantee
/


alter system set undo_tablespace = tempundo scope=both;
select name, value from v$parameter where name like '%undo%';
drop table test_t1;
create table test_t1
  (id number, ts timestamp default systimestamp, step number, name varchar2(4000), acc number);
alter table test_t1 add constraint test_t1_pk primary key (id);
insert into test_t1 select level, systimestamp, 0, lpad('x', 4000, 'y'), null
  from dual connect by level <= 1000;
commit;
select bytes from dba_segments where segment_name ='TEST_T1';  
--8'388'608

create or replace procedure undo_tbs_test_1 as
  l_step number := 0;
begin
  loop
    begin
      l_step := l_step + 1;
      exit when l_step > 3;  -- limit the number of retries. without it, endless loop.
      update test_t1 set step=l_step, name = lower(name) where id >= 2;
      dbms_output.put_line('sql%rowcount='||sql%rowcount||' Updated_2, at Step='||l_step);
      exit;
    exception when others then
      dbms_output.put_line(sqlerrm||', at Step='||l_step);
      commit;
    end;
  end loop;
  commit;
  dbms_output.put_line('End committed'||', at Step='||l_step);
end;
/


create or replace procedure undo_tbs_test_2 as
  l_step number := 0;
begin
  loop
    begin
      l_step := l_step + 1;
      exit when l_step > 20;
     
      for i in 1..3 loop
        update test_t1 set step=l_step, name = lower(name), acc=rownum

         where acc is null and rownum <=50;
        dbms_output.put_line('sql%rowcount='||sql%rowcount||' Updated_1, at Step='||l_step);
        exit when sql%rowcount=0;
      end loop;
     
      exit;
    exception when others then
      dbms_output.put_line(sqlerrm||', at Step='||l_step);
      commit;
    end;
  end loop;
  commit;
  dbms_output.put_line('End committed'||', at Step='||l_step);
end;
/


create or replace procedure unique_constraint_test_1 as
  l_step number := 0;
begin
  loop
    begin
      l_step := l_step + 1;
      exit when l_step > 3;   -- limit the number of retries. without it, endless loop.
      update test_t1 set id = - mod(id, 998);
      dbms_output.put_line('sql%rowcount='||sql%rowcount||' Updated_1, at Step='||l_step);
      exit;
    exception when others then
      dbms_output.put_line(sqlerrm||', at Step='||l_step);
      commit;
    end;
  end loop;
  commit;
  dbms_output.put_line('End committed'||', at Step='||l_step);
end;
/


create or replace procedure unique_constraint_test_2 as
  l_step number := 0;
begin
  loop
    begin
      l_step := l_step + 1;
      exit when l_step > 3;
     
      for i in 1..10 loop
        update test_t1 set id = - mod(id, 998), step = l_step, acc = i where id > 0 and rownum <=100;
        dbms_output.put_line('sql%rowcount='||sql%rowcount||' Updated_1, at Step='||l_step);
        exit when sql%rowcount=0;
      end loop;
     
      exit;
    exception when others then
      dbms_output.put_line(sqlerrm||', at Step='||l_step);
      commit;
    end;
  end loop;
  commit;
  dbms_output.put_line('End committed'||', at Step='||l_step);
end;
/


create or replace procedure deadlock_s1 as
begin
  update test_t1 set name = name||'Session_1_T1/', acc = acc + 1 where id = 3; 

  -- updated, keep locked till committed
  update test_t1 set name = name||'Session_1_T1/', acc = acc + 1 where id = 1; 

  -- updated, keep locked till committed
  dbms_lock.sleep(60);
  update test_t1 set name = name||'Session_1_T1/', acc = acc + 1 where id = 2; 

  -- rollback due to deadlock
  commit;
exception when others then
  dbms_output.put_line(sqlerrm||', in Session_1_T1/');
  dbms_lock.sleep(60);
  commit; 
end;
/

create or replace procedure deadlock_s2 as
begin
  update test_t1 set name = name||'Session_2_T2/', acc = acc + 1 where id = 2; 

  -- updated, keep locked till committed
  dbms_lock.sleep(60);
  update test_t1 set name = name||'Session_2_T2/', acc = acc + 1 where id = 1; 

  -- updated, keep locked till committed
  commit;
exception when others then
  dbms_output.put_line(sqlerrm||', in Session_2_T2/');
  dbms_lock.sleep(60);
  commit;   
end;
/

create or replace procedure deadlock_s3 as
begin
  update test_t1 set name = name||'Session_3_T3/', acc = acc + 1 where id = 3;  
  -- blocked 120 seconds by Session_1, no transaction started, see v$transaction
  commit;
end;
/

Monday, September 14, 2015

Limit PGA Memory Usage

Following MOS Notes lists 2 measures to control and limit PGA memory usage, but neither are recommended by Oracle. However, the best option is to fix the application.

This Blog is a follow-up of previous Blog: ORA-04030 incident file and alert.log.

1.   Event 10261


Oracle MOS:
    ORA-00600 [723], [67108952], [pga heap] When Event 10261 Set To Limit The PGA Leak (Doc ID 1162423.1)

This event is useful for PGA memory leaks (and UGA if the UGA is in the PGA). The event causes Oracle to raise an ORA-600 if the PGA tries to grow above the specified size.

In pfile/spfile, for example, the below event:
  event = 10261 trace name context forever,level 3145728
enforces a 3.2Gb Gb limit on the PGA size, and replaces the ORA-4030 with an ORA-600 [723] error.

Let's make two tests (see appended Test Code, tested with Oracle 11.2.0.3.0 on AIX):

alter system set event = "10261 trace name context forever,level 3145728" scope=spfile;
    -- DB must be restarted

SQL > exec pga_mem_test.allo(2*1024);
    -- 2GB allocation is OK
SQL > exec pga_mem_test.allo(4*1024);
    -- 4GB allocation throws error
   
ORA-00600: internal error code, arguments: [723], [65520], [top uga heap], [], [], [], [], [], [], [], [], []

Incident Dump shows:

 ORA-00600: internal error code, arguments: [723], [65520], [top uga heap], [], [], [], [], [], [], [], [], []
 
 ========= Dump for incident 16985 (ORA 600 [723]) ========
 ----- Beginning of Customized Incident Dump(s) -----
 ****** ERROR: PGA size limit exceeded in rfg: 3221284184 > 3221225472 *****
 ******************************************************

where the bottom limit is computed as:
   3221225472 = 3145728*1024

2.   Limit Parameters


Oracle MOS:
   PLSQL Procedure Causing ORA-04030: (pga heap,control file i/o buffer) And ORA-04030:
   (koh-kghu sessi,pmuccst: adt/record) or ORA-04030: (koh-kghucall ,pmucalm coll) Errors (Doc ID 1325100.1)

Either Change the page count (memory map entries per process) limit at the OS level, or adjust realfree heap pagesize at the database level:
Change the page count at the OS level:
     more /proc/sys/vm/max_map_count
  sysctl -w vm.max_map_count=262144 (for example)


Adjust the realfree heap pagesize within the database by setting the following parameters in the init/spfile and restart the database.

For versions 11.2.0.4 and lower:
  _use_realfree_heap=TRUE
  _realfree_heap_pagesize_hint = 262144

For 12.1 and higher:
  _use_realfree_heap=TRUE
  _realfree_heap_pagesize = 262144

As a test with Oracle 11.2.0.4.0 on Linux with following configuration:

Linux$ cat /proc/sys/vm/max_map_count
65530
 
Linux$ > free -mt
             total       used       free     shared    buffers     cached
Mem:         24160       5571      18588       1761        131       4257
-/+ buffers/cache:       1183      22977
Swap:            0          0          0
Total:       24160       5571      18588

The system has 24GB physical memory, and database SGA is 4GB.

At first, limit PGA per Session to 65530*4K = 256M

SQL > alter system set "_realfree_heap_pagesize_hint"=4K scope=spfile;
   -- restart DB
SQL > exec pga_mem_test.allo(1024); 

the incident dump shows:      

Dump of Real-Free Memory Allocator Heap [0x7fd64003a000]
mag=0xfefe0001 flg=0x5000003 fds=0x6 blksz=4096
blkdstbl=0x7fd64003a010, iniblk=331776 maxblk=524288 numsegs=71
In-use num=65353 siz=1335451648, Freeable num=32 siz=155648, Free num=113 siz=2225766
...
******************* Dumping process map ****************
...
7f455e47e000-7f455e483000 rw-p 00000000 00:05 4219                       /dev/zero
...
 
where blksz=4096.

However, siz/num=1335451648/65353=20434, it means that average pagesize is about 20 instead of specified 4k.
The entries in process map also confirmed it:
 7f455e47e000-7f455e483000
is decimal:
 139935911239680-139935911260160
i.e 20k.

Then, increase PGA per Session limit to 65530*256K = 16G

SQL > alter system set "_realfree_heap_pagesize_hint"=256K scope=spfile;
   -- restart DB

Open 3 Sqlplus sessions, at first session, run:

SQL(162,55) > exec pga_mem_test.allo(1024*13, 120);

Wait 5 seconds, at second session, run query:

SQL(84,59) > set numformat 99,999,999,999
SQL(84,59) > select s.sid, s.program, p.pga_used_mem, p.pga_alloc_mem 
               from v$session s, v$process p
              where s.paddr=p.addr and p.pga_used_mem >1e9;

 SID PROGRAM        PGA_USED_MEM   PGA_ALLOC_MEM
---- ----------- --------------- ---------------
 162 sqlplus.exe  15,957,853,158  15,959,926,406

At third session, run:

SQL(242,559) > exec pga_mem_test.allo(1024*13, 120);

Wait 5 seconds, at second session, run again the query:

SQL(84,59) > select s.sid, s.program, p.pga_used_mem, p.pga_alloc_mem 
               from v$session s, v$process p
              where s.paddr=p.addr and p.pga_used_mem >1e9;
 
 SID PROGRAM        PGA_USED_MEM   PGA_ALLOC_MEM
---- ----------- --------------- ---------------
 242 sqlplus.exe  15,957,853,158  15,959,926,406 

Look again first session:

SQL(162,55) > exec pga_mem_test.allo(1024*13, 120);
ERROR:
ORA-03114: not connected to ORACLE
BEGIN pga_mem_test.allo(1024*13, 120); END;
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 19710
Session ID: 162 Serial number: 55

Checking Oracle alert.log, trace, and incident files, there are nothing about the disconnected 1st session (Process ID: 19710, Session ID: 162 Serial number: 55).

Resorting to Linux dmesg, we can see:

[12:15:20] oracle invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
[12:15:20] oracle cpuset=/ mems_allowed=0
[12:15:20] Pid: 19555, comm: oracle Not tainted 2.6.32-642.el6.x86_64 #1  <--- Pid: 19555 PMON
[12:15:20] Call Trace:
[12:15:20] [<ffffffff81131640>] ? dump_header+0x90/0x1b0
[12:15:20] [<ffffffff8123c20c>] ? security_real_capable_noaudit+0x3c/0x70
[12:15:20] [<ffffffff81131ac2>] ? oom_kill_process+0x82/0x2a0
[12:15:20] [<ffffffff81131a01>] ? select_bad_process+0xe1/0x120
[12:15:20] [<ffffffff81131f00>] ? out_of_memory+0x220/0x3c0
[12:15:20] [<ffffffff8113e8dc>] ? __alloc_pages_nodemask+0x93c/0x950
[12:15:20] [<ffffffff81177b2a>] ? alloc_pages_current+0xaa/0x110
[12:15:20] [<ffffffff8112ea37>] ? __page_cache_alloc+0x87/0x90
[12:15:20] [<ffffffff8112e41e>] ? find_get_page+0x1e/0xa0
[12:15:20] [<ffffffff8112f9d7>] ? filemap_fault+0x1a7/0x500
[12:15:20] [<ffffffff81159394>] ? __do_fault+0x54/0x530
[12:15:20] [<ffffffff81159967>] ? handle_pte_fault+0xf7/0xb20
[12:15:20] [<ffffffff8122959c>] ? sem_lock+0x6c/0x130
[12:15:20] [<ffffffff8122b298>] ? sys_semtimedop+0x338/0xae0
[12:15:20] [<ffffffff8115a629>] ? handle_mm_fault+0x299/0x3d0
[12:15:20] [<ffffffff8100bc0e>] ? apic_timer_interrupt+0xe/0x20
[12:15:20] [<ffffffff81052156>] ? __do_page_fault+0x146/0x500
[12:15:20] [<ffffffff810688ed>] ? thread_group_times+0x3d/0x120
[12:15:20] [<ffffffff81079b8e>] ? mmput+0x1e/0x120
[12:15:20] [<ffffffff8109c348>] ? getrusage+0x158/0x340
[12:15:20] [<ffffffff8154dbce>] ? do_page_fault+0x3e/0xa0
[12:15:20] [<ffffffff8154aed5>] ? page_fault+0x25/0x30
...
[12:15:20] [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
[12:15:20] [19555]   100 19555  1168551      871   0       0             0 oracle  <--- PMON
...
[12:15:20] [19710]   100 19710  5072861  3995279   0       0             0 oracle  <--- 1st session
[12:15:20] [19713]   100 19713  2974770  1866502   5       0             0 oracle  <--- 3rd session
[12:15:20] [19715]   100 19715  1171449    60050   4       0             0 oracle  <--- 2nd session
...
[12:15:20] Out of memory: Kill process 19710 (oracle) score 647 or sacrifice child
[12:15:20] Killed process 19710, UID 100, (oracle) total-vm:20291444kB, anon-rss:15600524kB, file-rss:380592kB
The above log shows that Oracle PMON(19555) calls oom_kill_process() to kill one memory offending process(19710) to satisfy the request of the new process(19713).

process 19710 is killed due to badness score 647:
  Out of memory: Kill process 19710 (oracle) score 647 or sacrifice child

The baseline for the badness score is the proportion of RAM that each task's rss, pagetable and swap space use.

In Linux oom_kill.c, out_of_memory() calls select_bad_process() to find processes to be killed.
  If found, kill by oom_kill_process().
  If not found, panic the system (halt the system, never return) by:
     panic("Out of memory and no killable processes...\n");

and it is emphasized by the comment:

  out_of_memory - kill the "best" process when we run out of memory
  Found nothing?!?! Either we hang forever, or we panic.

The hidden parameter _pga_max_size does not limit a process size, only the work area.

3.   12c PGA_AGGREGATE_LIMIT


Oracle MOS:
   Doc ID 1520324.1: Limiting process size with database parameter PGA_AGGREGATE_LIMIT

While PGA_AGGREGATE_TARGET only controls allocations of tunable memory, PGA_AGGREGATE_LIMIT aborts or terminates the sessions or processes that are consuming the most untunable PGA memory, such as:
     pl/sql memory areas
  session context, cursor caches (MOS Doc ID 284951.1).

This new initialization parameter dynamically sets an instance-wide hard limit for PGA memory.

If the value of PGA_AGGREGATE_LIMIT is reached, a 12c new error message will be reported:
  ORA-04036: PGA memory used by the instance exceeds PGA_AGGREGATE_LIMIT

4.  PGA Memory Components


Blog TUNING PGA : PART – I (Anju Garg) classified PGA in functions, and by tunability:

PGA components:
    Stack space
        bind variables
        arrays (PL/SQL)
   UGA
        Session information such as logon information, and other information required by a database Session.
        SQL Work areas : used for sorting, hash operations etc.
        Private SQL Area : contains Open/Closed cursors and cursor state information for open cursors for example, 
                                        the number of rows retrieved so far in a full table scan.

PGA areas:
    Untunable PGA
         Context information of each session
         Each open cursor
         PL/SQL, OLAP or Java memory
   Tunable PGA
         SQL work areas

5.  PGA Overallocation vs. tunable and non-tunable areas


Oracle MOS: LOW PGA HIT RATIO THOUGH OVER ALLOCATION COUNT IS NONE (Doc ID 284951.1) said:

 Over-allocating PGA memory can happen if the value of PGA_AGGREGATE_TARGET is too small to accommodate the PGA component un tunable (session context, cursor caches, etc) memory plus the minimum memory required to execute the work area workload. When this happens, Oracle cannot honor the initialization parameter PGA_AGGREGATE_TARGET, and extra PGA memory needs to be allocated.
 
The number of times Oracle had to allocate more PGA memory then the PGA_AGGREGATE_TARGET suggested. This indicates the PGA target was set too small to accommodate the un tunable (session context, cursor caches, etc) memory plus the tunable component. This count should be zero ideally.

Oracle MOS: How To Avoid ORA-04030/ORA-12500 In 32-bit Windows Environment [Video] (Doc ID 373602.1) wrote:

 Within the PGA we have "tunable" and "non-tunable" areas. The tunable part is memory allocated for intensive SQL operations such as sorts, hash-joins, bitmap merge, and bitmap index create. This memory can be shrunk and expanded in response to system load. However, the non-tunable part cannot be managed in the same way. Importantly the non-tunable part includes cursors. We can control the number of cursors by the init.ora OPEN_CURSORS parameter, but if this is exceeded we get an ORA-1000 error, which is clearly undesirable. See unpublished Note:1012266.6 - "Overview of ORA-1000 Maximum Number of Cursors Exceeded" for more info. More importantly, however, we have no control over the size of a cursor, and users may open very large cursors dependent on their SQL or PLSQL.
 
 Also note that if we set PGA_AGGREGATE_TARGET too small to accommodate the non-tunable part of the PGA plus the minimum memory required to execute the tunable part, then Oracle cannot honour the PGA_AGGREGATE_TARGET value, and will attempt to allocate extra memory. This is known as overallocation, and an estimation of this can be seen in the view V$PGA_TARGET_ADVICE under the column ESTD_OVERALLOC_COUNT.

PGA memory is divided as tunable and non-tunable areas, while tunable is constrained under PGA_AGGREGATE_TARGET, non-tunable can be over allocated in any size (till ORA-04030), and Oracle records these activities in column ESTD_OVERALLOC_COUNT of V$PGA_TARGET_ADVICE. Therefore ESTD_OVERALLOC_COUNT is caused by over request of non-tunable areas.

Before each overallocation, probably Oracle tries to deallocate certain less used memory (for example, LRU Algorithm) at first. If not satisfied, new memory is allocated.

OPEN_CURSORS specifies the maximum number of open cursors (handles to private SQL areas) a session can have at once. Under overallocation, session cursor caches could be subject to memory deallocation.

In Oracle, each child cursor is associated with one KGLH0 and one SQLA (both in SGA), where KGLH0 stores environment information, SQLA stores parsing tree and xplan. When memory is reclaimed, KGLH0 stays, whereas SQLA is deallocated.

Once child cursor is cleaned out from session cursor caches, and no more found in SQLA. The later re-use of the same child cursor will result in a hard-parsing, and normally Oracle SQL Trace shows it as:

     Misses in library cache during parse:   0
    Misses in library cache during execute: 1


Probably the parsing "during execute" is a quick parsing with less optimization compared to “during parse”, for example, limiting "_optimizer_max_permutations" to a small number, so that it could be faster and not disturb too much execution phase. But in the negative side, it could be that it would not find the optimal xplan.

We have observed such sub-optimal xplan generated in "during execute" phase, but in general we don't know how to determine if an xplan is created "during parse" or "during execute".

In order to fix such sub-optimal xplan generated in "during execute" phase, PGA_AGGREGATE_TARGET is increased to avoid PGA overallocation, and at the same time, the SQL statement is manipulated to be different so that it is forced to be hard parsed from scratch.

The problem sql involves partitions of a partitioned table, in which all partitions are unbalanced (some of them almost empty). Hence the pure sql with partition key is hardly optimal for each partition select. So we try to put some dummy hint with partition key to prevent cursor sharing (this sounds counterproductive). Since the sql is heavy, and at one time period, only a few partitions are selected, this paid off seems justified.

Since the problem occurs when upgrade to 11.2.0.4 and much more "Parse Calls" than "Executions" are found in AWR report, probably we hit:
     Oracle MOS: ORA-04030 occurred while executing PLSQL procedure (Doc ID 1953999.1)
which is supposed to be fixed by interim Patch 18384537:
     OPIPLS TAKES TOO MANY RETRIES TO LOAD CURSOR.

In summary, the above discussion reveals one aspect of cooperative work between SGA and PGA. Here we are trying to bring two commonly regarded independent components together.

Test Code



create or replace package pga_mem_test as
  procedure allo (p_mb int, p_sleep number := 0);
end;
/

create or replace package body pga_mem_test as
  type t_tab_kb   is table of char(1024);   -- 1KB
  p_tab_1mb          t_tab_kb := t_tab_kb();
  type t_tab_mb   is table of t_tab_kb;     
  p_tab_mb           t_tab_mb := t_tab_mb();
  p_sid              number   := sys.dbms_support.mysid;
  
  -------------------------------------------
  procedure rpt(l_name varchar) is
     l_v$process_mem            varchar2(4000);
     l_v$process_memory_mem     varchar2(4000);
  begin
   select 'Used/Alloc/Freeable/Max >>> '||
           round(pga_used_mem/1024/1024)    ||'/'||round(pga_alloc_mem/1024/1024)||'/'||
             round(pga_freeable_mem/1024/1024)||'/'||round(pga_max_mem/1024/1024)
       into l_v$process_mem
       from v$process 
       where addr = (select paddr from v$session where sid = p_sid);
      
    select 'Category(Alloc/Used/Max) >>> '||
             listagg(Category||'('||round(allocated/1024/1024)||'/'||
                     round(used/1024/1024)||'/'||round(max_allocated/1024/1024)||') > ')
     within group (order by Category desc) name_usage_list
       into l_v$process_memory_mem
       from v$process_memory
       where pid = (select pid from v$process
                     where addr = (select paddr from v$session where sid = p_sid));
  
    dbms_output.put_line(rpad(l_name, 20)||' > '||rpad(l_v$process_mem, 50));
    dbms_output.put_line('             ------ '||l_v$process_memory_mem);
  end rpt;
   
  -------------------------------------------   
  procedure allo (p_mb int, p_sleep number) is
  begin
   rpt('Start allocate: '||p_mb||' MB');
   
   select 'M' bulk collect into p_tab_1mb from dual connect by level <= 1024;  -- 1MB
   
   for i in 1..p_mb loop   -- p_mb MB
    p_tab_mb.extend;
    p_tab_mb(i) := p_tab_1mb;
   end loop;
  
   rpt('End allocate: '||p_mb||' MB');
   dbms_lock.sleep(p_sleep);
  end allo;
  
end;
/

/*
   exec dbms_session.reset_package;
   set  serveroutput on
   exec pga_mem_test.allo(1024*1, 30);       -- allocate 1GB
*/

------------------------------------------- 
create or replace procedure pga_mem_test_jobs(p_job_cnt number, p_mb number, p_sleep number := 0)
as
   l_job_id pls_integer;
begin
    for i in 1.. p_job_cnt loop
      dbms_job.submit(l_job_id, 'begin pga_mem_test.allo('||p_mb||', '||p_sleep||'); end;');
    end loop;
    commit;
end;    
/

--exec pga_mem_test_jobs(4, 1024*2, 60);   -- 4 Jobs, each allocates 2 GB, sleeping 60 seconds