Sunday, February 14, 2016

Native Code Files: Ora-7445 [Ioc_pin_shared_executable_object()] Reproducing

PL/SQL and JAVA Native Code Files are used in Oracle to boost performance.

Blog: Remove Stale Native Code Files on AIX discussed the generation and remove of Native Code Files.

Blog: What the heck are the /dev/shm/JOXSHM_EXT_x files on Linux? revealed the contents and loading of Native Code Files.

Both Blogs follow MOS Note Ora-7445 [Ioc_pin_shared_executable_object()] (Doc ID 1316906.1), which said:
  If ORA-7445 [ioc_pin_shared_executable_object] is encountered, one should look into the following:
  1) Is there any user or job that may have tried to clean up files in the directory where the backing store images are located?
  2) Is the location where the backing store files are located out of space?
  3) Is the location where the backing store files are located accessible?

This Blog will try to reproduce the error on Linux with Oracle Version 11.2.0.4.0.

1. PL/SQL Native Code Files


Create Package by:
 alter system set plsql_code_type=native;    
 -- Default  INTERPRETED
 
 create or replace package bla as
   procedure foo;
 end;
 /
 create or replace package body bla as
   procedure foo is begin null; end;
 end;
 /
   
One Native Code File is generated after create package body bla (not generated after create package spec), and stored in tmpfs (/dev/shm/):
   linux: ls -lt /dev/shm/
   -rwx------ 1 oracle dba 424 Feb 12 07:08 PESLD_testdb_56ae0002_b8c1106001000000*

In Sqlplus, run:
 07:09:48 (243,7)>  exec bla.foo;

In Linux, map into the virtual memory of calling process (15022 is PID of Oracle session (243,7)):
  linux: pmap -x 15022
   Address           Kbytes     RSS   Dirty Mode   Mapping
   00007ff61a7cb000       4       4       4 r-x--  PESLD_testdb_56ae0002_b8c1106001000000

  linux: cat /proc/15022/maps
   7ff61a7cb000-7ff61a7cc000 r-xp 00000000 00:0f 21930676  /dev/shm/PESLD_testdb_56ae0002_b8c1106001000000

if we remove it:
  linux: rm /dev/shm/PESLD_testdb_56ae0002_b8c1106001000000

they are marked as "deleted" for PID 15022:

   00007f19bcd64000       4       4       4 r-x--  PESLD_testdb_56ae0002_b8c1106001000000 (deleted)
   7f19bcd64000-7f19bcd65000 r-xp 00000000 00:0f 21618650 /dev/shm/PESLD_testdb_56ae0002_b8c1106001000000 (deleted)

Open a new SQL session, run:
 07:18:07 (325,21) > exec bla.foo;
 ORA-00600: internal error code, arguments: [pesldl01_Get_Object: shm_open failed: errno 2 errmsg No such fil], [/PESLD_testdb_56ae0002_b8c1106001000000]

strace shows:
 open("/dev/shm/PESLD_testdb_56ae0002_b8c1106001000000", O_RDWR|O_NOFOLLOW|O_CLOEXEC) = -1 ENOENT (No such file or directory):
 
occasionally there is incident dump, which shows:
 ORA-07445: exception encountered: core dump [ioc_pin_shared_executable_object()+888] [SIGSEGV] [ADDR:0x0] [PC:0x30E1CD0] [Address not mapped to object] []
 
In the already opened session, it is still OK:
 07:29:48 (243,7) > exec bla.foo;

Native Code Files are stored in tmpfs (/dev/shm/) and there is no locking mechanisms to prevent them from removing by the user. In Linux, tmpfs is one part of cache (see output of command free and vmstat). In case of low memory situations, it could be subject to garbage collection or swapping out.

2. Java JIT Native Code Files


Run
 alter system set java_jit_enabled=true;
 create or replace and compile java source named "Hello" as
  public class Hello {
     public static String world ()
     {
        return "hello world";
     }
  }
 /
 create or replace function helloworld
  return varchar2 as language java
  name 'Hello.world () return java.lang.string';
 /
 select helloworld from dual;

We can see more than 100 files are created after compile java source. They all look like:
 /dev/shm/JOXSHM_EXT_553_testdb_1451786242*


3. Ora-7445 [Ioc_pin_shared_executable_object()] Reproducing


At first restart database by:
 shutdown immediate   
 startup 
 
so that all Native Code Files are cleaned away, no more files under /dev/shm/
(note: "shutdown abort" will not clean up /dev/shm/, see Blog: Remove Stale Native Code Files on AIX).


Open one Plsql session, run a simple query:

 08:07:38 (124,13)> select helloworld from dual;
  hello world
 
and at the same time, one can also watch the process in Linux (51876 id PID of Session 124):

 linux: strace -p 51876 
  open("/dev/shm/JOXSHM_EXT_162_testdb_1454571522", O_RDWR|O_CREAT|O_TRUNC|O_NOFOLLOW|O_CLOEXEC, 0770) = 7
  fchmod(7, 0770)                         = 0
  ftruncate(7, 12288)                     = 0
  write(7, "\0", 1)                       = 1
  lseek(7, 4095, SEEK_CUR)                = 4096
  write(7, "\0", 1)                       = 1
  lseek(7, 4095, SEEK_CUR)                = 8192
  write(7, "\0", 1)                       = 1
  mmap(NULL, 12288, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_SHARED, 7, 0) = 0x7f9f77a52000
  close(7)                                = 0
 
 linux: ls -ltr /dev/shm/* |wc
     167    1503   14258 
     
 linux: pmap -x 51876    
  00007f1fb1b96000       4       4       4 rwxs-  JOXSHM_EXT_47_testdb_1453129730
  ...
  00007f1fb1b97000       4       4       4 rwxs-  JOXSHM_EXT_8_testdb_1453129730
 
Allocate 33GB PGA memory (see Test Code in Blog: Limit PGA Memory Usage):

 08:09:25 (124,13)> exec pga_mem_test.allo(33*1024);
  Start allocate: 3379 > Used/Alloc/Freeable/Max >>> 11/14/1/16
               ------ Category(Alloc/Used/Max) >>> SQL(1/0/5) > PL/SQL(2/2/4) > Other(6//6) > JAVA(4/4/5) > Freeable(1/0/) >
  ORA-04030: out of process memory when trying to allocate 16328 bytes (koh-kghu sessi,pl/sql vc2)
 
 08:09:36 (124,13)> select helloworld from dual;
  hello world
  
 and the trace dump shows:
  Memory (Avail / Total) = 640.90M / 15943.66M
  Swap (Avail / Total) = 8152.43M /  8187.99M
  ioc_allocate(size: 335927, heap name: *** ??? ***, flags: a0008) caught 4030
 
Run a new query, which requires new mmap:
 08:10:31 (124,13)> select dbms_java.get_compiler_option('', 'encoding')from dual;
  ORA-04030: out of process memory when trying to allocate 262224 bytes (joxcx callheap,f:Class)
  ORA-06512: at "SYS.DBMS_JAVA", line 147

continue running that query, get the same error.

and strace shows:
 linux: strace -p 51876
  unlink("/dev/shm/JOXSHM_EXT_162_testdb_1454571522") = 0
  open("/dev/shm/JOXSHM_EXT_162_testdb_1454571522", O_RDWR|O_CREAT|O_TRUNC|O_NOFOLLOW|O_CLOEXEC, 0770) = 19
  fchmod(19, 0770)                        = 0
  ftruncate(19, 12288)                    = 0
  write(19, "\0", 1)                      = 1
  lseek(19, 4095, SEEK_CUR)               = 4096
  write(19, "\0", 1)                      = 1
  lseek(19, 4095, SEEK_CUR)               = 8192
  write(19, "\0", 1)                      = 1
  mmap(NULL, 12288, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_SHARED, 19, 0) = 0x7f9f77a52000
  close(19)        

Open a new session, run the same select:
 08:13:12 (356,7)> select dbms_java.get_compiler_option('', 'encoding')from dual;
  UTF-8

watch the new loaded files:
 linux: ls -ltr /dev/shm|wc
     317    2846   24555

Back to the error session, run the select again:
 08:15:22 (124,13)> select dbms_java.get_compiler_option('', 'encoding')from dual;
  ORA-03113: end-of-file on communication channel
  Process ID: 51876
  Session ID: 124 Serial number: 13
  ORA-03114: not connected to ORACLE
 
and strace said:
 linux: strace -p 51876
  open("/dev/shm/JOXSHM_EXT_2_testdb_1454571522", O_RDWR|O_NOFOLLOW|O_CLOEXEC) = 8
  mmap(NULL, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_SHARED, 8, 0) = -1 ENOMEM (Cannot allocate memory)
  ...
  --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0} ---

The same Native Code File in Session 356 (54806 is PID of Session 356):
 pmap -x 54806 |grep JOXSHM_EXT_2_
  00007ff1785ba000       4       4       4 rwxs-  JOXSHM_EXT_2_testdb_1453719554
  
 (Note that "PROT_READ|PROT_WRITE|PROT_EXEC" are mapped to "rwx", "MAP_SHARED" to "s") 
 

The incident dump of error session shows:
 ORA-07445: exception encountered: core dump [ioc_pin_shared_executable_object()+888] [SIGSEGV] [ADDR:0x0] [PC:0x30E1CD0] [Address not mapped to object] []
 > ioc_pin_shared_executable_object()+888 (0x30e1cd0) mov (%rax),%rdx
 mmap(offset=207478784, len=8192) failed with errno=12 for the file oracletestdb

and the trace dump contains:
 mmap failed with error 12
 error message:Cannot allocate memory
 ORA-04035: unable to allocate 4096 bytes of shared memory in shared object cache "JOXSHM" of size "117440512"
 Assertion failure at joez.c:3422
 Bulk load of method oracle/sql/DatumWithConnection.getPhysicalConnection failed; insufficient shm-object space

(prefix "ioc" in ORA-07445 message could be acronym of "Inversion of Control", something like callbacks)

The same information is also written into MMON_SLAVE dump, for example, M000
 *** MODULE NAME:(MMON_SLAVE) 2016-02-10T07:02:30.076240+01:00
 *** ACTION NAME:(JAVAVM JIT slave action) 2016-02-10T07:02:30.076244+01:00
 ORA-07445: exception encountered: core dump 

With following script, one can watch the allocated size (Kbytes):
  linux: pmap -x 51876 |grep -e PESLD -e JOXSHM |awk '{cnt+=1; Kbytes+=$2; RSS+=$3; Dirty+=$4} END {print "Count=",cnt,",Kbytes=",Kbytes,",RSS=",RSS,",Dirty=",Dirty}'
         Count= 65280 ,Kbytes= 4210688 ,RSS= 4180512 ,Dirty= 4180512


In summary, after a session hits:
  ORA-04030: out of process memory when trying to allocate 262224 bytes (joxcx callheap,f:Class)
and when it wants to load more Native Code Files:
  (1). If the Native Code Files are not found in tmpfs (/dev/shm/), it continues throwing ORA-04030.
  (2). If they are found in tmpfs (which are loaded by other sessions), it throws:
            ORA-07445: exception encountered: core dump [ioc_pin_shared_executable_object()+888] [SIGSEGV]
        and disconnects the session.