Monday, September 14, 2015

Limit PGA Memory Usage

Following MOS Notes lists 2 measures to control and limit PGA memory usage, but neither are recommended by Oracle. However, the best option is to fix the application.

This Blog is a follow-up of previous Blog: ORA-04030 incident file and alert.log.

1.   Event 10261

Oracle MOS:
    ORA-00600 [723], [67108952], [pga heap] When Event 10261 Set To Limit The PGA Leak (Doc ID 1162423.1)

This event is useful for PGA memory leaks (and UGA if the UGA is in the PGA). The event causes Oracle to raise an ORA-600 if the PGA tries to grow above the specified size.

In pfile/spfile, for example, the below event:
  event = 10261 trace name context forever,level 3145728
enforces a 3.2Gb Gb limit on the PGA size, and replaces the ORA-4030 with an ORA-600 [723] error.

Let's make two tests (see appended Test Code, tested with Oracle on AIX):

alter system set event = "10261 trace name context forever,level 3145728" scope=spfile;
    -- DB must be restarted

SQL > exec pga_mem_test.allo(2*1024);
    -- 2GB allocation is OK
SQL > exec pga_mem_test.allo(4*1024);
    -- 4GB allocation throws error
ORA-00600: internal error code, arguments: [723], [65520], [top uga heap], [], [], [], [], [], [], [], [], []

Incident Dump shows:

 ORA-00600: internal error code, arguments: [723], [65520], [top uga heap], [], [], [], [], [], [], [], [], []
 ========= Dump for incident 16985 (ORA 600 [723]) ========
 ----- Beginning of Customized Incident Dump(s) -----
 ****** ERROR: PGA size limit exceeded in rfg: 3221284184 > 3221225472 *****

where the bottom limit is computed as:
   3221225472 = 3145728*1024
Update: In Oracle 19.13, the output looks like:

SQL > exec pga_mem_test.allo(4*1024);
BEGIN pga_mem_test.allo(4*1024); END;

ERROR at line 1:
ORA-04068: existing state of packages has been discarded
ORA-10260: PGA limit (3072 MB) exceeded - process terminated
ORA-06512: at "K.PGA_MEM_TEST", line 44
ORA-06512: at line 1

2.   Limit Parameters

Oracle MOS:
   PLSQL Procedure Causing ORA-04030: (pga heap,control file i/o buffer) And ORA-04030:
   (koh-kghu sessi,pmuccst: adt/record) or ORA-04030: (koh-kghucall ,pmucalm coll) Errors (Doc ID 1325100.1)

Either Change the page count (memory map entries per process) limit at the OS level, or adjust realfree heap pagesize at the database level:
Change the page count at the OS level:
     more /proc/sys/vm/max_map_count
  sysctl -w vm.max_map_count=262144 (for example)

Adjust the realfree heap pagesize within the database by setting the following parameters in the init/spfile and restart the database.

For versions and lower:
  _realfree_heap_pagesize_hint = 262144

For 12.1 and higher:
  _realfree_heap_pagesize = 262144

As a test with Oracle on Linux with following configuration:

Linux$ cat /proc/sys/vm/max_map_count
Linux$ > free -mt
             total       used       free     shared    buffers     cached
Mem:         24160       5571      18588       1761        131       4257
-/+ buffers/cache:       1183      22977
Swap:            0          0          0
Total:       24160       5571      18588

The system has 24GB physical memory, and database SGA is 4GB.

At first, limit PGA per Session to 65530*4K = 256M

SQL > alter system set "_realfree_heap_pagesize_hint"=4K scope=spfile;
   -- restart DB
SQL > exec pga_mem_test.allo(1024); 

the incident dump shows:      

Dump of Real-Free Memory Allocator Heap [0x7fd64003a000]
mag=0xfefe0001 flg=0x5000003 fds=0x6 blksz=4096
blkdstbl=0x7fd64003a010, iniblk=331776 maxblk=524288 numsegs=71
In-use num=65353 siz=1335451648, Freeable num=32 siz=155648, Free num=113 siz=2225766
******************* Dumping process map ****************
7f455e47e000-7f455e483000 rw-p 00000000 00:05 4219                       /dev/zero
where blksz=4096.

However, siz/num=1335451648/65353=20434, it means that average pagesize is about 20 instead of specified 4k.
The entries in process map also confirmed it:
is decimal:
i.e 20k.

Then, increase PGA per Session limit to 65530*256K = 16G

SQL > alter system set "_realfree_heap_pagesize_hint"=256K scope=spfile;
   -- restart DB

Open 3 Sqlplus sessions, at first session, run:

SQL(162,55) > exec pga_mem_test.allo(1024*13, 120);

Wait 5 seconds, at second session, run query:

SQL(84,59) > set numformat 99,999,999,999
SQL(84,59) > select s.sid, s.program, p.pga_used_mem, p.pga_alloc_mem 
               from v$session s, v$process p
              where s.paddr=p.addr and p.pga_used_mem >1e9;

---- ----------- --------------- ---------------
 162 sqlplus.exe  15,957,853,158  15,959,926,406

At third session, run:

SQL(242,559) > exec pga_mem_test.allo(1024*13, 120);

Wait 5 seconds, at second session, run again the query:

SQL(84,59) > select s.sid, s.program, p.pga_used_mem, p.pga_alloc_mem 
               from v$session s, v$process p
              where s.paddr=p.addr and p.pga_used_mem >1e9;
---- ----------- --------------- ---------------
 242 sqlplus.exe  15,957,853,158  15,959,926,406 

Look again first session:

SQL(162,55) > exec pga_mem_test.allo(1024*13, 120);
ORA-03114: not connected to ORACLE
BEGIN pga_mem_test.allo(1024*13, 120); END;
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 19710
Session ID: 162 Serial number: 55

Checking Oracle alert.log, trace, and incident files, there are nothing about the disconnected 1st session (Process ID: 19710, Session ID: 162 Serial number: 55).

Resorting to Linux dmesg, we can see:

[12:15:20] oracle invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
[12:15:20] oracle cpuset=/ mems_allowed=0
[12:15:20] Pid: 19555, comm: oracle Not tainted 2.6.32-642.el6.x86_64 #1  <--- Pid: 19555 PMON
[12:15:20] Call Trace:
[12:15:20] [<ffffffff81131640>] ? dump_header+0x90/0x1b0
[12:15:20] [<ffffffff8123c20c>] ? security_real_capable_noaudit+0x3c/0x70
[12:15:20] [<ffffffff81131ac2>] ? oom_kill_process+0x82/0x2a0
[12:15:20] [<ffffffff81131a01>] ? select_bad_process+0xe1/0x120
[12:15:20] [<ffffffff81131f00>] ? out_of_memory+0x220/0x3c0
[12:15:20] [<ffffffff8113e8dc>] ? __alloc_pages_nodemask+0x93c/0x950
[12:15:20] [<ffffffff81177b2a>] ? alloc_pages_current+0xaa/0x110
[12:15:20] [<ffffffff8112ea37>] ? __page_cache_alloc+0x87/0x90
[12:15:20] [<ffffffff8112e41e>] ? find_get_page+0x1e/0xa0
[12:15:20] [<ffffffff8112f9d7>] ? filemap_fault+0x1a7/0x500
[12:15:20] [<ffffffff81159394>] ? __do_fault+0x54/0x530
[12:15:20] [<ffffffff81159967>] ? handle_pte_fault+0xf7/0xb20
[12:15:20] [<ffffffff8122959c>] ? sem_lock+0x6c/0x130
[12:15:20] [<ffffffff8122b298>] ? sys_semtimedop+0x338/0xae0
[12:15:20] [<ffffffff8115a629>] ? handle_mm_fault+0x299/0x3d0
[12:15:20] [<ffffffff8100bc0e>] ? apic_timer_interrupt+0xe/0x20
[12:15:20] [<ffffffff81052156>] ? __do_page_fault+0x146/0x500
[12:15:20] [<ffffffff810688ed>] ? thread_group_times+0x3d/0x120
[12:15:20] [<ffffffff81079b8e>] ? mmput+0x1e/0x120
[12:15:20] [<ffffffff8109c348>] ? getrusage+0x158/0x340
[12:15:20] [<ffffffff8154dbce>] ? do_page_fault+0x3e/0xa0
[12:15:20] [<ffffffff8154aed5>] ? page_fault+0x25/0x30
[12:15:20] [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
[12:15:20] [19555]   100 19555  1168551      871   0       0             0 oracle  <--- PMON
[12:15:20] [19710]   100 19710  5072861  3995279   0       0             0 oracle  <--- 1st session
[12:15:20] [19713]   100 19713  2974770  1866502   5       0             0 oracle  <--- 3rd session
[12:15:20] [19715]   100 19715  1171449    60050   4       0             0 oracle  <--- 2nd session
[12:15:20] Out of memory: Kill process 19710 (oracle) score 647 or sacrifice child
[12:15:20] Killed process 19710, UID 100, (oracle) total-vm:20291444kB, anon-rss:15600524kB, file-rss:380592kB
The above log shows that Oracle PMON(19555) calls oom_kill_process() to kill one memory offending process(19710) to satisfy the request of the new process(19713).

process 19710 is killed due to badness score 647:
  Out of memory: Kill process 19710 (oracle) score 647 or sacrifice child

The baseline for the badness score is the proportion of RAM that each task's rss, pagetable and swap space use.

In Linux oom_kill.c, out_of_memory() calls select_bad_process() to find processes to be killed.
  If found, kill by oom_kill_process().
  If not found, panic the system (halt the system, never return) by:
     panic("Out of memory and no killable processes...\n");

and it is emphasized by the comment:

  out_of_memory - kill the "best" process when we run out of memory
  Found nothing?!?! Either we hang forever, or we panic.

The hidden parameter _pga_max_size does not limit a process size, only the work area.


Oracle MOS:
   Doc ID 1520324.1: Limiting process size with database parameter PGA_AGGREGATE_LIMIT

While PGA_AGGREGATE_TARGET only controls allocations of tunable memory, PGA_AGGREGATE_LIMIT aborts or terminates the sessions or processes that are consuming the most untunable PGA memory, such as:
     pl/sql memory areas
  session context, cursor caches (MOS Doc ID 284951.1).

This new initialization parameter dynamically sets an instance-wide hard limit for PGA memory.

If the value of PGA_AGGREGATE_LIMIT is reached, a 12c new error message will be reported:
  ORA-04036: PGA memory used by the instance exceeds PGA_AGGREGATE_LIMIT

4.  PGA Memory Components

Blog TUNING PGA : PART – I (Anju Garg) classified PGA in functions, and by tunability:

PGA components:
    Stack space
        bind variables
        arrays (PL/SQL)
        Session information such as logon information, and other information required by a database Session.
        SQL Work areas : used for sorting, hash operations etc.
        Private SQL Area : contains Open/Closed cursors and cursor state information for open cursors for example, 
                                        the number of rows retrieved so far in a full table scan.

PGA areas:
    Untunable PGA
         Context information of each session
         Each open cursor
         PL/SQL, OLAP or Java memory
   Tunable PGA
         SQL work areas

5.  PGA Overallocation vs. tunable and non-tunable areas


 Over-allocating PGA memory can happen if the value of PGA_AGGREGATE_TARGET is too small to accommodate the PGA component un tunable (session context, cursor caches, etc) memory plus the minimum memory required to execute the work area workload. When this happens, Oracle cannot honor the initialization parameter PGA_AGGREGATE_TARGET, and extra PGA memory needs to be allocated.
The number of times Oracle had to allocate more PGA memory then the PGA_AGGREGATE_TARGET suggested. This indicates the PGA target was set too small to accommodate the un tunable (session context, cursor caches, etc) memory plus the tunable component. This count should be zero ideally.

Oracle MOS: How To Avoid ORA-04030/ORA-12500 In 32-bit Windows Environment [Video] (Doc ID 373602.1) wrote:

 Within the PGA we have "tunable" and "non-tunable" areas. The tunable part is memory allocated for intensive SQL operations such as sorts, hash-joins, bitmap merge, and bitmap index create. This memory can be shrunk and expanded in response to system load. However, the non-tunable part cannot be managed in the same way. Importantly the non-tunable part includes cursors. We can control the number of cursors by the init.ora OPEN_CURSORS parameter, but if this is exceeded we get an ORA-1000 error, which is clearly undesirable. See unpublished Note:1012266.6 - "Overview of ORA-1000 Maximum Number of Cursors Exceeded" for more info. More importantly, however, we have no control over the size of a cursor, and users may open very large cursors dependent on their SQL or PLSQL.
 Also note that if we set PGA_AGGREGATE_TARGET too small to accommodate the non-tunable part of the PGA plus the minimum memory required to execute the tunable part, then Oracle cannot honour the PGA_AGGREGATE_TARGET value, and will attempt to allocate extra memory. This is known as overallocation, and an estimation of this can be seen in the view V$PGA_TARGET_ADVICE under the column ESTD_OVERALLOC_COUNT.

PGA memory is divided as tunable and non-tunable areas, while tunable is constrained under PGA_AGGREGATE_TARGET, non-tunable can be over allocated in any size (till ORA-04030), and Oracle records these activities in column ESTD_OVERALLOC_COUNT of V$PGA_TARGET_ADVICE. Therefore ESTD_OVERALLOC_COUNT is caused by over request of non-tunable areas.

Before each overallocation, probably Oracle tries to deallocate certain less used memory (for example, LRU Algorithm) at first. If not satisfied, new memory is allocated.

OPEN_CURSORS specifies the maximum number of open cursors (handles to private SQL areas) a session can have at once. Under overallocation, session cursor caches could be subject to memory deallocation.

In Oracle, each child cursor is associated with one KGLH0 and one SQLA (both in SGA), where KGLH0 stores environment information, SQLA stores parsing tree and xplan. When memory is reclaimed, KGLH0 stays, whereas SQLA is deallocated.

Once child cursor is cleaned out from session cursor caches, and no more found in SQLA. The later re-use of the same child cursor will result in a hard-parsing, and normally Oracle SQL Trace shows it as:

     Misses in library cache during parse:   0
    Misses in library cache during execute: 1

Probably the parsing "during execute" is a quick parsing with less optimization compared to “during parse”, for example, limiting "_optimizer_max_permutations" to a small number, so that it could be faster and not disturb too much execution phase. But in the negative side, it could be that it would not find the optimal xplan.

We have observed such sub-optimal xplan generated in "during execute" phase, but in general we don't know how to determine if an xplan is created "during parse" or "during execute".

In order to fix such sub-optimal xplan generated in "during execute" phase, PGA_AGGREGATE_TARGET is increased to avoid PGA overallocation, and at the same time, the SQL statement is manipulated to be different so that it is forced to be hard parsed from scratch.

The problem sql involves partitions of a partitioned table, in which all partitions are unbalanced (some of them almost empty). Hence the pure sql with partition key is hardly optimal for each partition select. So we try to put some dummy hint with partition key to prevent cursor sharing (this sounds counterproductive). Since the sql is heavy, and at one time period, only a few partitions are selected, this paid off seems justified.

Since the problem occurs when upgrade to and much more "Parse Calls" than "Executions" are found in AWR report, probably we hit:
     Oracle MOS: ORA-04030 occurred while executing PLSQL procedure (Doc ID 1953999.1)
which is supposed to be fixed by interim Patch 18384537:

In summary, the above discussion reveals one aspect of cooperative work between SGA and PGA. Here we are trying to bring two commonly regarded independent components together.

Test Code

create or replace package pga_mem_test as
  procedure allo (p_mb int, p_sleep number := 0);

create or replace package body pga_mem_test as
  type t_tab_kb   is table of char(1024);   -- 1KB
  p_tab_1mb          t_tab_kb := t_tab_kb();
  type t_tab_mb   is table of t_tab_kb;     
  p_tab_mb           t_tab_mb := t_tab_mb();
  p_sid              number;  --   := sys.dbms_support.mysid;
  procedure rpt(l_name varchar) is
     l_v$process_mem            varchar2(4000);
     l_v$process_memory_mem     varchar2(4000);
   select 'Used/Alloc/Freeable/Max >>> '||
           round(pga_used_mem/1024/1024)    ||'/'||round(pga_alloc_mem/1024/1024)||'/'||
       into l_v$process_mem
       from v$process 
       where addr = (select paddr from v$session where sid = p_sid);
    select 'Category(Alloc/Used/Max) >>> '||
                     round(used/1024/1024)||'/'||round(max_allocated/1024/1024)||') > ')
     within group (order by Category desc) name_usage_list
       into l_v$process_memory_mem
       from v$process_memory
       where pid = (select pid from v$process
                     where addr = (select paddr from v$session where sid = p_sid));
    dbms_output.put_line(rpad(l_name, 20)||' > '||rpad(l_v$process_mem, 50));
    dbms_output.put_line('             ------ '||l_v$process_memory_mem);
  end rpt;
  procedure allo (p_mb int, p_sleep number) is
   select sid into p_sid from v$mystat s where rownum <=1;
   rpt('Start allocate: '||p_mb||' MB');
   select 'M' bulk collect into p_tab_1mb from dual connect by level <= 1024;  -- 1MB
   for i in 1..p_mb loop   -- p_mb MB
    p_tab_mb(i) := p_tab_1mb;
   end loop;
   rpt('End allocate: '||p_mb||' MB');
  end allo;

   exec dbms_session.reset_package;
   set  serveroutput on
   exec pga_mem_test.allo(1024*1, 30);       -- allocate 1GB

create or replace procedure pga_mem_test_jobs(p_job_cnt number, p_mb number, p_sleep number := 0)
   l_job_id pls_integer;
    for i in 1.. p_job_cnt loop
      dbms_job.submit(l_job_id, 'begin pga_mem_test.allo('||p_mb||', '||p_sleep||'); end;');
    end loop;

--exec pga_mem_test_jobs(4, 1024*2, 60);   -- 4 Jobs, each allocates 2 GB, sleeping 60 seconds