This Blog is a follow-up of previous Blog: ORA-04030 incident file and alert.log.
1. Event 10261
ORA-00600 , , [pga heap] When Event 10261 Set To Limit The PGA Leak (Doc ID 1162423.1)
This event is useful for PGA memory leaks (and UGA if the UGA is in the PGA). The event causes Oracle to raise an ORA-600 if the PGA tries to grow above the specified size.
In pfile/spfile, for example, the below event:
event = 10261 trace name context forever,level 3145728
enforces a 3.2Gb Gb limit on the PGA size, and replaces the ORA-4030 with an ORA-600  error.
Let's make two tests (see appended Test Code, tested with Oracle 18.104.22.168.0 on AIX):
alter system set event = "10261 trace name context forever,level 3145728" scope=spfile; -- DB must be restarted SQL > exec pga_mem_test.allo(2*1024); -- 2GB allocation is OK SQL > exec pga_mem_test.allo(4*1024); -- 4GB allocation throws error ORA-00600: internal error code, arguments: , , [top uga heap], , , , , , , , ,  Incident Dump shows: ORA-00600: internal error code, arguments: , , [top uga heap], , , , , , , , ,  ========= Dump for incident 16985 (ORA 600 ) ======== ----- Beginning of Customized Incident Dump(s) ----- ****** ERROR: PGA size limit exceeded in rfg: 3221284184 > 3221225472 ***** ****************************************************** where the bottom limit is computed as: 3221225472 = 3145728*1024
2. Limit Parameters
PLSQL Procedure Causing ORA-04030: (pga heap,control file i/o buffer) And ORA-04030:
(koh-kghu sessi,pmuccst: adt/record) or ORA-04030: (koh-kghucall ,pmucalm coll) Errors (Doc ID 1325100.1)
Either Change the page count (memory map entries per process) limit at the OS level, or adjust realfree heap pagesize at the database level:
Change the page count at the OS level:
sysctl -w vm.max_map_count=262144 (for example)
Adjust the realfree heap pagesize within the database by setting the following parameters in the init/spfile and restart the database.
For versions 22.214.171.124 and lower:
_realfree_heap_pagesize_hint = 262144
For 12.1 and higher:
_realfree_heap_pagesize = 262144
As a test with Oracle 126.96.36.199.0 on Linux with following configuration:
The system has 24GB physical memory, and database SGA is 4GB.
Linux$ cat /proc/sys/vm/max_map_count 65530 Linux$ > free -mt total used free shared buffers cached Mem: 24160 5571 18588 1761 131 4257 -/+ buffers/cache: 1183 22977 Swap: 0 0 0 Total: 24160 5571 18588
At first, limit PGA per Session to 65530*4K = 256M
Then, increase PGA per Session limit to 65530*256K = 16G
SQL > alter system set "_realfree_heap_pagesize_hint"=4K scope=spfile; -- restart DB SQL > exec pga_mem_test.allo(1024); the incident dump shows: Dump of Real-Free Memory Allocator Heap [0x7fd64003a000] mag=0xfefe0001 flg=0x5000003 fds=0x6 blksz=4096 blkdstbl=0x7fd64003a010, iniblk=331776 maxblk=524288 numsegs=71 In-use num=65353 siz=1335451648, Freeable num=32 siz=155648, Free num=113 siz=2225766 ... ******************* Dumping process map **************** ... 7f455e47e000-7f455e483000 rw-p 00000000 00:05 4219 /dev/zero ... where blksz=4096. However, siz/num=1335451648/65353=20434, it means that average pagesize is about 20 instead of specified 4k. The entries in process map also confirmed it: 7f455e47e000-7f455e483000 is decimal: 139935911239680-139935911260160 i.e 20k.
Open 3 Sqlplus sessions, at first session, run:
SQL > alter system set "_realfree_heap_pagesize_hint"=256K scope=spfile; -- restart DB
Wait 5 seconds, at second session, run query:
SQL(162,55) > exec pga_mem_test.allo(1024*13, 120);
At third session, run:
SQL(84,59) > set numformat 99,999,999,999 SQL(84,59) > select s.sid, s.program, p.pga_used_mem, p.pga_alloc_mem from v$session s, v$process p where s.paddr=p.addr and p.pga_used_mem >1e9; SID PROGRAM PGA_USED_MEM PGA_ALLOC_MEM ---- ----------- --------------- --------------- 162 sqlplus.exe 15,957,853,158 15,959,926,406
Wait 5 seconds, at second session, run again the query:
SQL(242,559) > exec pga_mem_test.allo(1024*13, 120);
Look again first session:
SQL(84,59) > select s.sid, s.program, p.pga_used_mem, p.pga_alloc_mem from v$session s, v$process p where s.paddr=p.addr and p.pga_used_mem >1e9; SID PROGRAM PGA_USED_MEM PGA_ALLOC_MEM ---- ----------- --------------- --------------- 242 sqlplus.exe 15,957,853,158 15,959,926,406
Checking Oracle alert.log, trace, and incident files, there are nothing about the disconnected 1st session (Process ID: 19710, Session ID: 162 Serial number: 55).
SQL(162,55) > exec pga_mem_test.allo(1024*13, 120); ERROR: ORA-03114: not connected to ORACLE BEGIN pga_mem_test.allo(1024*13, 120); END; ERROR at line 1: ORA-03113: end-of-file on communication channel Process ID: 19710 Session ID: 162 Serial number: 55
Resorting to Linux dmesg, we can see:
The above log shows that Oracle PMON(19555) calls oom_kill_process() to kill one memory offending process(19710) to satisfy the request of the new process(19713).
[12:15:20] oracle invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0 [12:15:20] oracle cpuset=/ mems_allowed=0 [12:15:20] Pid: 19555, comm: oracle Not tainted 2.6.32-642.el6.x86_64 #1 <--- Pid: 19555 PMON [12:15:20] Call Trace: [12:15:20] [<ffffffff81131640>] ? dump_header+0x90/0x1b0 [12:15:20] [<ffffffff8123c20c>] ? security_real_capable_noaudit+0x3c/0x70 [12:15:20] [<ffffffff81131ac2>] ? oom_kill_process+0x82/0x2a0 [12:15:20] [<ffffffff81131a01>] ? select_bad_process+0xe1/0x120 [12:15:20] [<ffffffff81131f00>] ? out_of_memory+0x220/0x3c0 [12:15:20] [<ffffffff8113e8dc>] ? __alloc_pages_nodemask+0x93c/0x950 [12:15:20] [<ffffffff81177b2a>] ? alloc_pages_current+0xaa/0x110 [12:15:20] [<ffffffff8112ea37>] ? __page_cache_alloc+0x87/0x90 [12:15:20] [<ffffffff8112e41e>] ? find_get_page+0x1e/0xa0 [12:15:20] [<ffffffff8112f9d7>] ? filemap_fault+0x1a7/0x500 [12:15:20] [<ffffffff81159394>] ? __do_fault+0x54/0x530 [12:15:20] [<ffffffff81159967>] ? handle_pte_fault+0xf7/0xb20 [12:15:20] [<ffffffff8122959c>] ? sem_lock+0x6c/0x130 [12:15:20] [<ffffffff8122b298>] ? sys_semtimedop+0x338/0xae0 [12:15:20] [<ffffffff8115a629>] ? handle_mm_fault+0x299/0x3d0 [12:15:20] [<ffffffff8100bc0e>] ? apic_timer_interrupt+0xe/0x20 [12:15:20] [<ffffffff81052156>] ? __do_page_fault+0x146/0x500 [12:15:20] [<ffffffff810688ed>] ? thread_group_times+0x3d/0x120 [12:15:20] [<ffffffff81079b8e>] ? mmput+0x1e/0x120 [12:15:20] [<ffffffff8109c348>] ? getrusage+0x158/0x340 [12:15:20] [<ffffffff8154dbce>] ? do_page_fault+0x3e/0xa0 [12:15:20] [<ffffffff8154aed5>] ? page_fault+0x25/0x30 ... [12:15:20] [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name [12:15:20]  100 19555 1168551 871 0 0 0 oracle <--- PMON ... [12:15:20]  100 19710 5072861 3995279 0 0 0 oracle <--- 1st session [12:15:20]  100 19713 2974770 1866502 5 0 0 oracle <--- 3rd session [12:15:20]  100 19715 1171449 60050 4 0 0 oracle <--- 2nd session ... [12:15:20] Out of memory: Kill process 19710 (oracle) score 647 or sacrifice child [12:15:20] Killed process 19710, UID 100, (oracle) total-vm:20291444kB, anon-rss:15600524kB, file-rss:380592kB
process 19710 is killed due to badness score 647:
Out of memory: Kill process 19710 (oracle) score 647 or sacrifice child
The baseline for the badness score is the proportion of RAM that each task's rss, pagetable and swap space use.
In Linux oom_kill.c, out_of_memory() calls select_bad_process() to find processes to be killed.
If found, kill by oom_kill_process().
If not found, panic the system (halt the system, never return) by:
panic("Out of memory and no killable processes...\n");
and it is emphasized by the comment:
out_of_memory - kill the "best" process when we run out of memory
Found nothing?!?! Either we hang forever, or we panic.
The hidden parameter _pga_max_size does not limit a process size, only the work area.
3. 12c PGA_AGGREGATE_LIMIT
Doc ID 1520324.1: Limiting process size with database parameter PGA_AGGREGATE_LIMIT
While PGA_AGGREGATE_TARGET only controls allocations of tunable memory, PGA_AGGREGATE_LIMIT aborts or terminates the sessions or processes that are consuming the most untunable PGA memory, such as:
pl/sql memory areas
session context, cursor caches (MOS Doc ID 284951.1).
This new initialization parameter dynamically sets an instance-wide hard limit for PGA memory.
If the value of PGA_AGGREGATE_LIMIT is reached, a 12c new error message will be reported:
ORA-04036: PGA memory used by the instance exceeds PGA_AGGREGATE_LIMIT
4. PGA Memory Components
Blog TUNING PGA : PART – I (Anju Garg) classified PGA in functions, and by tunability:
Session information such as logon information, and other information required by a database Session.
SQL Work areas : used for sorting, hash operations etc.
Private SQL Area : contains Open/Closed cursors and cursor state information for open cursors for example,
the number of rows retrieved so far in a full table scan.
Context information of each session
Each open cursor
PL/SQL, OLAP or Java memory
SQL work areas
5. PGA Overallocation vs. tunable and non-tunable areas
Oracle MOS: LOW PGA HIT RATIO THOUGH OVER ALLOCATION COUNT IS NONE (Doc ID 284951.1) said:
Over-allocating PGA memory can happen if the value of PGA_AGGREGATE_TARGET is too small to accommodate the PGA component un tunable (session context, cursor caches, etc) memory plus the minimum memory required to execute the work area workload. When this happens, Oracle cannot honor the initialization parameter PGA_AGGREGATE_TARGET, and extra PGA memory needs to be allocated.
The number of times Oracle had to allocate more PGA memory then the PGA_AGGREGATE_TARGET suggested. This indicates the PGA target was set too small to accommodate the un tunable (session context, cursor caches, etc) memory plus the tunable component. This count should be zero ideally.
Oracle MOS: How To Avoid ORA-04030/ORA-12500 In 32-bit Windows Environment [Video] (Doc ID 373602.1) wrote:
Within the PGA we have "tunable" and "non-tunable" areas. The tunable part is memory allocated for intensive SQL operations such as sorts, hash-joins, bitmap merge, and bitmap index create. This memory can be shrunk and expanded in response to system load. However, the non-tunable part cannot be managed in the same way. Importantly the non-tunable part includes cursors. We can control the number of cursors by the init.ora OPEN_CURSORS parameter, but if this is exceeded we get an ORA-1000 error, which is clearly undesirable. See unpublished Note:1012266.6 - "Overview of ORA-1000 Maximum Number of Cursors Exceeded" for more info. More importantly, however, we have no control over the size of a cursor, and users may open very large cursors dependent on their SQL or PLSQL.
Also note that if we set PGA_AGGREGATE_TARGET too small to accommodate the non-tunable part of the PGA plus the minimum memory required to execute the tunable part, then Oracle cannot honour the PGA_AGGREGATE_TARGET value, and will attempt to allocate extra memory. This is known as overallocation, and an estimation of this can be seen in the view V$PGA_TARGET_ADVICE under the column ESTD_OVERALLOC_COUNT.
PGA memory is divided as tunable and non-tunable areas, while tunable is constrained under PGA_AGGREGATE_TARGET, non-tunable can be over allocated in any size (till ORA-04030), and Oracle records these activities in column ESTD_OVERALLOC_COUNT of V$PGA_TARGET_ADVICE. Therefore ESTD_OVERALLOC_COUNT is caused by over request of non-tunable areas.
Before each overallocation, probably Oracle tries to deallocate certain less used memory (for example, LRU Algorithm) at first. If not satisfied, new memory is allocated.
OPEN_CURSORS specifies the maximum number of open cursors (handles to private SQL areas) a session can have at once. Under overallocation, session cursor caches could be subject to memory deallocation.
In Oracle, each child cursor is associated with one KGLH0 and one SQLA (both in SGA), where KGLH0 stores environment information, SQLA stores parsing tree and xplan. When memory is reclaimed, KGLH0 stays, whereas SQLA is deallocated.
Once child cursor is cleaned out from session cursor caches, and no more found in SQLA. The later re-use of the same child cursor will result in a hard-parsing, and normally Oracle SQL Trace shows it as:
Misses in library cache during parse: 0
Misses in library cache during execute: 1
Probably the parsing "during execute" is a quick parsing with less optimization compared to “during parse”, for example, limiting "_optimizer_max_permutations" to a small number, so that it could be faster and not disturb too much execution phase. But in the negative side, it could be that it would not find the optimal xplan.
We have observed such sub-optimal xplan generated in "during execute" phase, but in general we don't know how to determine if an xplan is created "during parse" or "during execute".
In order to fix such sub-optimal xplan generated in "during execute" phase, PGA_AGGREGATE_TARGET is increased to avoid PGA overallocation, and at the same time, the SQL statement is manipulated to be different so that it is forced to be hard parsed from scratch.
The problem sql involves partitions of a partitioned table, in which all partitions are unbalanced (some of them almost empty). Hence the pure sql with partition key is hardly optimal for each partition select. So we try to put some dummy hint with partition key to prevent cursor sharing (this sounds counterproductive). Since the sql is heavy, and at one time period, only a few partitions are selected, this paid off seems justified.
Since the problem occurs when upgrade to 188.8.131.52 and much more "Parse Calls" than "Executions" are found in AWR report, probably we hit:
Oracle MOS: ORA-04030 occurred while executing PLSQL procedure (Doc ID 1953999.1)
which is supposed to be fixed by interim Patch 18384537:
OPIPLS TAKES TOO MANY RETRIES TO LOAD CURSOR.
In summary, the above discussion reveals one aspect of cooperative work between SGA and PGA. Here we are trying to bring two commonly regarded independent components together.
create or replace package pga_mem_test as procedure allo (p_mb int, p_sleep number := 0); end; / create or replace package body pga_mem_test as type t_tab_kb is table of char(1024); -- 1KB p_tab_1mb t_tab_kb := t_tab_kb(); type t_tab_mb is table of t_tab_kb; p_tab_mb t_tab_mb := t_tab_mb(); p_sid number := sys.dbms_support.mysid; ------------------------------------------- procedure rpt(l_name varchar) is l_v$process_mem varchar2(4000); l_v$process_memory_mem varchar2(4000); begin select 'Used/Alloc/Freeable/Max >>> '|| round(pga_used_mem/1024/1024) ||'/'||round(pga_alloc_mem/1024/1024)||'/'|| round(pga_freeable_mem/1024/1024)||'/'||round(pga_max_mem/1024/1024) into l_v$process_mem from v$process where addr = (select paddr from v$session where sid = p_sid); select 'Category(Alloc/Used/Max) >>> '|| listagg(Category||'('||round(allocated/1024/1024)||'/'|| round(used/1024/1024)||'/'||round(max_allocated/1024/1024)||') > ') within group (order by Category desc) name_usage_list into l_v$process_memory_mem from v$process_memory where pid = (select pid from v$process where addr = (select paddr from v$session where sid = p_sid)); dbms_output.put_line(rpad(l_name, 20)||' > '||rpad(l_v$process_mem, 50)); dbms_output.put_line(' ------ '||l_v$process_memory_mem); end rpt; ------------------------------------------- procedure allo (p_mb int, p_sleep number) is begin rpt('Start allocate: '||p_mb||' MB'); select 'M' bulk collect into p_tab_1mb from dual connect by level <= 1024; -- 1MB for i in 1..p_mb loop -- p_mb MB p_tab_mb.extend; p_tab_mb(i) := p_tab_1mb; end loop; rpt('End allocate: '||p_mb||' MB'); dbms_lock.sleep(p_sleep); end allo; end; / /* exec dbms_session.reset_package; set serveroutput on exec pga_mem_test.allo(1024*1, 30); -- allocate 1GB */ ------------------------------------------- create or replace procedure pga_mem_test_jobs(p_job_cnt number, p_mb number, p_sleep number := 0) as l_job_id pls_integer; begin for i in 1.. p_job_cnt loop dbms_job.submit(l_job_id, 'begin pga_mem_test.allo('||p_mb||', '||p_sleep||'); end;'); end loop; commit; end; / --exec pga_mem_test_jobs(4, 1024*2, 60); -- 4 Jobs, each allocates 2 GB, sleeping 60 seconds