Friday, February 3, 2017

Oracle JVM Java OutOfMemoryError and lazy GC

Java Stored Procedure running in Oracle RDBMS embedded JVM can throw OutOfMemoryError in connection with lazy GC.
All tests are done in Oracle 12.1.0.2.0.


1. Test Setup



create or replace java source named S."OracleJavaOOM512" as
public class OracleJavaOOM512 {
  // final static byte[] SB = new byte[1024*1024*500];  // workaround
  public static void createBuffer(int bufferSize) {
    // OracleRuntime.setMaxRunspaceSize(1024*1024*1024) // set 1GB, no help
    byte[] SB = new byte[bufferSize];
  }
}
/

alter JAVA CLASS S."OracleJavaOOM512" COMPILE;

create or replace procedure createBuffer512(p_buffer_size in number) is language java
name 'OracleJavaOOM512.createBuffer(int)';
/

create or replace procedure createBuffer512_loop 
  (p_cnt number, p_start number, p_delta number, p_sleep_seconds number := 0) as
  l_size number := p_start;
begin
  for i in 1..p_cnt loop
    dbms_lock.sleep(p_sleep_seconds);
    dbms_output.put_line('Buffer Size (MB) = '||l_size);
    createBuffer512(1024*1024*l_size);
    l_size := l_size + p_delta;
  end loop;
end;
/


2. Tests


Run first test to allocate 511MB:

Sqlplus > exec createBuffer512(1024*1024*511);
    PL/SQL procedure successfully completed

succeeded without Error.

Run second test for 512MB:

Sqlplus > exec createBuffer512(1024*1024*512);
    ORA-29532: Java call terminated by uncaught Java exception: java.lang.OutOfMemoryError
    ORA-06512: at "S.CREATEBUFFER512", line 1

It hits java.lang.OutOfMemoryError, which probably indicates that the JVM is limited by 512MB.

Calling OracleRuntime.setMaxRunspaceSize(1024*1024*1024) to set MaxRunspaceSize to 1GB does not help.

If interested, make a 29532 event trace:

Sqlplus > alter session set max_dump_file_size = UNLIMITED;
Sqlplus > alter session set tracefile_identifier='OOM512_29532'; 
Sqlplus > alter session set events '29532 trace name errorstack level 3'; 
Sqlplus > exec createBuffer512(1024*1024*512);
    ORA-29532: Java call terminated by uncaught Java exception: java.lang.OutOfMemoryError

Open a new Oracle session, run third test case, which gradually allocates memory from 1MB to 511MB, each time increases 1MB per call.

Sqlplus > exec createBuffer512_loop(511, 1, 1);

    Buffer Size (MB) = 1
    Buffer Size (MB) = 2
    ...
    Buffer Size (MB) = 265
    Buffer Size (MB) = 266
    BEGIN createBuffer512_loop(511); END;
    
    ERROR at line 1:
    ORA-29532: Java call terminated by uncaught Java exception: java.lang.OutOfMemoryError
    ORA-04030: out of process memory when trying to allocate 283285584 bytes (joxu pga heap,f:OldSpace)
    ORA-06512: at "S.CREATEBUFFER512", line 1
    ORA-06512: at "S.CREATEBUFFER512_LOOP", line 5

It throws OutOfMemoryError when allocating 266MB (the number can be varied, but before reaching 511), and generates trace file and incident file.


3. Oracle JVM Garbage Collection


Oracle 12c Document: Database Java Developer's Guide Chapter 1 Introduction to Java in Oracle Database, Section Memory Spaces Management depicts a 3-Layer Java memory Management:
  1.  New Space in Call space
  2.  Old Space in Call space
  3.  Session Memory in Session space
and 3 corresponding garbage collection algorithms:
  1.  Generational scavenging for short-lived objects
  2.  Mark and lazy sweep collection for objects that exist for the life of a single call
  3.  Copying collector for long-lived objects, that is, objects that live across calls within a session
It also mentions that Oracle JVM Garbage collection uses Oracle Database scheduling facilities.

Book: Oracle Database Programming using Java and Web Services (Kuassi Mensah) Section 2.2 Java Memory Management (Page 41-51) reveals internal GC mechanisms (probably written for Oracle 10g).

Garbage Collection Techniques
  The OracleJVM memory manager uses a set of GC techniques for its various memory structures (listed in Table 2.1), including the generational GC, mark-sweep GC, and copy GC.

These 3 GC techniques in Oracle 10g look similar to GC algorithms in 12c.

Mark-sweep GC
  A mark-sweep GC consists of two phases: (1) the mark phase (garbage detection) and the (2) sweep phase (Garbage Collection). The sweep phase places the “garbaged” memory on the freelists. The allocation is serviced out of the freelists. Lazy sweeping is a variation technique that marks objects normally, but, in the sweep phase, leaves it up to the allocation phase to determine whether the marked objects are live (no need to reclaim) or not (can be reclaimed), rather than sweeping the object memory at that time.

Java Memory Areas
  Oldspace is an object memory used for holding long-lived or large objects (i.e., larger than 1-K Bytes) for the duration of a call. It is cleaned up using Marksweep GC (described earlier) for large objects; it uses a variation of “lazy sweeping” for small objects.

Looking up Table 2.1 "Summary of OracleJVM Memory Structure" (Page 50), since our test allocation is more than 1-K Bytes, it should be allocated into Old-space using Buddy memory allocation, and garbage collected by Mark-Sweep.


4. Reasoning


The error message in Sqlplus Window indicates that memory problem is caused by:
    joxu pga heap,f:OldSpace
which also appears in trace file.

An excerpt of incident file looks like:

=======================================
TOP 10 MEMORY USES FOR THIS PROCESS
---------------------------------------
84%   91 MB,  17 chunks: "f:OldSpace                "  JAVA
         joxu pga heap   ds=fffffd7ffbe8bcc8  dsprt=fffffd7ffc0a9078
 2% 2432 KB,  92 chunks: "free memory               "  
         pga heap        ds=fffffd7ffc345640  dsprt=0
 2% 1767 KB, 437 chunks: "free memory               "  
         session heap    ds=fffffd7ffc02d728  dsprt=fffffd7ffc358350
 1% 1064 KB,   2 chunks: "permanent memory          "  JAVA
         joxu pga heap   ds=fffffd7ffbe8bcc8  dsprt=fffffd7ffc0a9078
...
=======================================
PRIVATE MEMORY SUMMARY FOR THIS PROCESS
---------------------------------------
******************************************************
PRIVATE HEAP SUMMARY DUMP
108 MB total:
   104 MB commented, 519 KB permanent
  4125 KB free (0 KB in empty extents),
      97 MB,   1 heap:    "joxp heap      "            2315 KB free held
    8894 KB,   1 heap:    "session heap   "            679 KB free held
------------------------------------------------------
Summary of subheaps at depth 1
104 MB total:
   101 MB commented, 952 KB permanent
  1814 KB free (25 KB in empty extents),
      94 MB,   1 heap:    "joxu pga heap  "           
------------------------------------------------------
Summary of subheaps at depth 2
99 MB total:
    97 MB commented, 1408 KB permanent
   272 KB free (0 KB in empty extents),
      91 MB,  17 chunks:  "f:OldSpace                "

=========================================
REAL-FREE ALLOCATOR DUMP FOR THIS PROCESS
-----------------------------------------
 
Dump of Real-Free Memory Allocator Heap [0xfffffd7ffbfed000]
mag=0xfefe0001 flg=0x5000003 fds=0x0 blksz=65536
blkdstbl=0xfffffd7ffbfed010, iniblk=523264 maxblk=524288 numsegs=255
In-use num=133 siz=112984064, Freeable num=224 siz=2914910208, Free num=15 siz=21889024

...
-------------------------
Top 10 processes:
-------------------------
(percentage is of 27 GB total allocated memory)
99% pid 66: 105 MB used of 27 GB allocated (27 GB freeable) <= CURRENT PROC
 0% pid 11: 7792 KB used of 8294 KB allocated 
...
======================================================
ESTIMATED MEMORY USES FOR ALL PROCESSES
------------------------------------------------------
(from 1 snapshot out of 67 processes)
99%     27 GB,   66 processes: "unsnapshotted               "   
 1%    172 MB,    1 process  : "free memory                 "   
           pga heap             95 chunks
 0%     93 MB,    1 process  : "f:OldSpace                  "  JAVA
...

******************* Dumping process map ****************
    Start addr     -      End addr       Size    PgSz       Shmid    Perms  Object name
------------------ -  ----------------- -------  ----     ---------- ------------------------------
...
        0x80000000 -        0x1af000000 4964352K    4K     0x40000039 rwxs-  [ anon ] osm 
...
0xfffffd792326d000 - 0xfffffd79333bd000 263488K    4K     0xffffffff rw---  [ anon ] 
0xfffffd793355d000 - 0xfffffd794365d000 263168K    4K     0xffffffff rw---  [ anon ] 
0xfffffd794374d000 - 0xfffffd795374d000 262144K    4K     0xffffffff rw---  [ anon ] 
0xfffffd795377d000 - 0xfffffd796293d000 247552K    4K     0xffffffff rw---  [ anon ] 
0xfffffd7962aad000 - 0xfffffd7971b6d000 246528K    4K     0xffffffff rw---  [ anon ]
...

There are about 30GB segments are marked as:
   rw--- [ anon ]
which are session's PGA.

Above output shows:
   105 MB used of 27 GB allocated (27 GB freeable) <= CURRENT PROC
among 27 GB allocated, almost all of which is freeable.

However due to Oracle JVM lazy GC, they are not immediately released, and eventually leads to:
   ORA-04030: out of process memory when trying to allocate 283285584 bytes (joxu pga heap,f:OldSpace)

(By the way, biggest chunk 4964352K is SGA, created with Optimized Shared Memory (OSM) available since Oracle 12c when sga_max_size is not configured, or is set greater than the computed value.
See ISM, DISM in Oracle 11g Blog: PGA, SGA space usage watching from UNIX)

Concerning the freeable memory, V$PROCESS_MEMORY documentation said:

    For the "Freeable" category, Column ALLOCATED is the amount of free PGA memory eligible to be released to the operating system.

Note that OutOfMemoryError process still holds the "Freeable" memory and does not immediately put them back to freelists since the process is still alive. It takes time for GC to reclaim them gradually. If the process (Oracle session) terminated (disconnected), it is immediately reusable by other processes.

You can query V$PROCESS_MEMORY or V$PROCESS to monitor it.

Crosschecking with Solaris pmap output:

oracle@test$ pmap -x 1122
1122:  testdb (LOCAL=NO)
         Address     Kbytes        RSS       Anon     Locked Mode   Mapped File
...         
0000000080000000    4964352    4964352          -    4964352 rwxs-    [ osm shmid=0x40000039 ]
...
FFFFFD785FC1D000     306816     306812     306812          - rw---    [ anon ]
FFFFFD787296D000     305792     305788     305788          - rw---    [ anon ]
FFFFFD78854BD000     297472     297452     297452          - rw---    [ anon ]
FFFFFD7897F1D000     296448     296412     296412          - rw---    [ anon ]
FFFFFD78AA76D000     295424     295420     295420          - rw---    [ anon ]
FFFFFD78BCECD000     294336     294332     294332          - rw---    [ anon ]
...
---------------- ---------- ---------- ---------- ----------
        total Kb   31990020   31933340   26615772    4982784

(you can even use Solaris MDB Formatting Dcmds to display content at one above virtual address space)

Incident file and pmap output show that the majority of memory are allocated as "anon" pages, but not removed from the mappings, which means that there are more mmap than munmap.

We can see this un-matching mmap and munmap with:

oracle@test$ truss -cp -t mmap,munmap 1122

syscall               seconds   calls  errors
mmap                     .056    1014
munmap                  6.447      68

The output shows that munmap is 1716 (=(6.447/68)/(0.056/1014)) times expensive. That is probably one reason why the GC strategy is lazy delayed.

or watch the details by:

oracle@test$ truss -p -t mmap,munmap 1122  

...
mmap(0xFFFFFD7C6E8ED000, 202375168, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANON, 4294967295, 0) = 0xFFFFFD7C6E8ED000
mmap(0xFFFFFD7C625ED000, 203423744, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANON, 4294967295, 0) = 0xFFFFFD7C625ED000
mmap(0xFFFFFD7C561FD000, 204537856, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANON, 4294967295, 0) = 0xFFFFFD7C561FD000
...
mmap(0xFFFFFD7A43ACD000, 243924992, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANON, 4294967295, 0) = 0xFFFFFD7A43ACD000
mmap(0xFFFFFD7A34F7D000, 244973568, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANON, 4294967295, 0) = 0xFFFFFD7A34F7D000
mmap(0xFFFFFD7A2632D000, 246022144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANON, 4294967295, 0) = 0xFFFFFD7A2632D000
mmap(0xFFFFFD7A175DD000, 247136256, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANON, 4294967295, 0) = 0xFFFFFD7A175DD000
munmap(0xFFFFFD7A2632D000, 246022144)           = 0
munmap(0xFFFFFD7A34F7D000, 244973568)           = 0
munmap(0xFFFFFD7A43ACD000, 243924992)           = 0
...

It shows that not all mapped chunks are un-mapped, and unmapping occurs in reverse order (last mapping is first unmapped).

With Solaris dtrace, we can count number of mmap and munmap, as well as sum of their amount (in MB). The large difference between mmap and munmap is probably pointing to the leaked memory.

oracle@test$ dtrace -n 'syscall::mmap:entry,syscall::munmap:entry/pid==1122 && arg0 != 0x0/
{
 @CNT[probefunc] = count();
 @SUM[probefunc] = sum(arg1/1024/1024);
}
END
{
 printf("\n%10s %10s  %10s\n",  "NAME", "COUNT", "SUM(MB)");
 printa(  "%10s %10@d %10@d\n",         @CNT,   @SUM);
}'

------------ output -------------
      NAME      COUNT     SUM(MB)
    munmap         44       3987
      mmap        404      32400

There are 360 (404-44) more mmap, 32400 MB allocated, but only 3987 MB de-allocated. which results in 28413 (32400-3987) MB un-released allocation.

Other dtrace commands can help us understand largeobj (oldSpace) allocation and GC activity.

oracle@test$ dtrace -n 'pid$target::eoa_new_largeobj:entry {@[pid, ustack(40, 0)] = count(); }' -p 1122

oracle@test$ dtrace -n 'pid$target::eoa_oldspace_gc_scan_xt:entry {@[pid, ustack(40, 0)] = count(); }' -p 1122

Two tests below have no problem:

Sqlplus > exec createBuffer512_loop(511, 511, -1);

Sqlplus > exec createBuffer512_loop(511, 511, 0);

Both above tests propose us a quick workaround by introducing a class (static) variable and setting it to a enough big value as follows:
    final static byte[] SB = new byte[1024*1024*500];
Note that java class variable in Java Stored Procedure is stateful object, which survives across Java calls in the same RDBMS session (like PL/SQL Package variable), and can be reset by dbms_java.endsession (dbms_session.reset_package). Therefore, if JAVA VM memory is under pressure, dbms_java.endsession can be called after certain threshold to release the memory, a trade-off between Memory (JavaGC) and Performance.


5. Oracle Performance Views


Query GC statistics by:

select s.program, s.sid, n.name p_name, t.value, round(t.value/1e6, 2) mb
from v$session s, v$sesstat t, v$statname n 
where 1=1
  and s.sid=t.sid 
  and n.statistic# = t.statistic# 
  and name = 'java call heap gc count'
  and s.sid in (920);

Populate v$process_memory_detail by:
(see dbms_session.get_package_memory_utilization and limitations)

Sqlplus > exec pga_sampling(920, 120, 1);

and then run query:

select round(bytes/1024/1024, 2) mb, v.* from  process_memory_detail_v v
where 1=1
  and name in ('f:OldSpace', 'free memory')
  and round(bytes/1024/1024, 2) > 10
order by timestamp;

Oracle performance views v$process, v$process_memory reports "freeable" memory, whereas view:
   v$sesstat, v$active_session_history, dba_hist_active_sess_history
seem having difficult to record the real figures, probably due to "freeable" memory (27GB) which is signaled by the 27 GB "unsnapshotted" line in the incident file:
   99% 27 GB, 66 processes: "unsnapshotted "


6. Linux Test


All above tests are performed on Solaris. Repeat the same test on a Linux with 24GB of total usable memory.

Sqlplus > exec createBuffer512_loop(511, 1, 1);

    ORA-03113: end-of-file on communication channel
    Process ID: 32251

There are no trace file or incident file generated. Checking dmesg out, it looks like that process is killed by PSP0 (call oom-killer) when reaching system limit of 24GB, but before reaching 32GB PGA limit.

[08:07:36] ora_psp0_td invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
[08:07:36] ora_psp0_td cpuset=/ mems_allowed=0
[08:07:36] Pid: 23201, comm: ora_psp0_td Not tainted 2.6.32-642.6.2.el6.x86_64 #1
[08:07:36] Call Trace:
[08:07:36] [] ? dump_header+0x90/0x1b0
[08:07:36] [] ? security_real_capable_noaudit+0x3c/0x70
[08:07:36] [] ? oom_kill_process+0x82/0x2a0
[08:07:36] [] ? select_bad_process+0xe1/0x120
[08:07:36] [] ? out_of_memory+0x220/0x3c0
[08:07:36] [] ? __alloc_pages_nodemask+0x93c/0x950
[08:07:36] [] ? alloc_pages_current+0xaa/0x110
[08:07:36] [] ? __page_cache_alloc+0x87/0x90
[08:07:36] [] ? find_get_page+0x1e/0xa0
[08:07:36] [] ? filemap_fault+0x1a7/0x500
[08:07:36] [] ? __do_fault+0x54/0x530
[08:07:36] [] ? handle_pte_fault+0xf7/0xb20
[08:07:36] [] ? sem_lock+0x6c/0x130
[08:07:36] [] ? sys_semtimedop+0x338/0xae0
[08:07:36] [] ? handle_mm_fault+0x299/0x3d0
[08:07:36] [] ? __do_page_fault+0x146/0x500
[08:07:36] [] ? wait_consider_task+0x7e6/0xb20
[08:07:36] [] ? read_tsc+0x9/0x10
[08:07:36] [] ? ktime_get_ts+0xbf/0x100
[08:07:36] [] ? poll_select_copy_remaining+0xf8/0x150
[08:07:36] [] ? do_page_fault+0x3e/0xa0
[08:07:36] [] ? page_fault+0x25/0x30
...
[08:07:36] Out of memory: Kill process 32251 (oracle_32251_td) score 619 or sacrifice child
[08:07:36] Killed process 32251, UID 100, (oracle_32251_td) total-vm:20111724kB, anon-rss:15083248kB, file-rss:215188kB
[08:07:36] oracle_32251_td: page allocation failure. order:0, mode:0x280da
[08:07:36] Pid: 32251, comm: oracle_32251_td Not tainted 2.6.32-642.6.2.el6.x86_64 #1

Note Buddy memory request "order=0", and oom-killer reason: "score 619"


References


1. Database Java Developer's Guide Chapter 1 Introduction to Java in Oracle Database, Section Memory Spaces Management (Old Space)

2.Oracle Database Programming using Java and Web Services (Kuassi Mensah) Section 2.2 Java Memory Management (Page 41-49)

3. dbms_session.get_package_memory_utilization and limitations

4.Oracle Java OutOfMemoryError (11g Blog, no more reproducible in 12c)

5. PGA, SGA space usage watching from UNIX