Thursday, December 15, 2022

OracleJVM JAVA_JIT_ENABLED Linux /dev/shm mount noexec EPERM and Performance

In Linux, when JAVA_JIT_ENABLED is enabled, the native compiled Java code is stored in /dev/shm/JOEZSHM_*.
However, if tmpfs /dev/shm/ is mounted with "noexec" option as follows:

mount | grep shm
   tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,noexec)
mmap of JOEZSHM_* gets error EPERM ("Operation not permitted"), and JAVA_JIT_ENABLED does not work.

In MZnn and session trace files, there are plenty of "Operation not permitted" (EPERM) for JOEZSHM_*.

As a consequence, Oracle falls back to JAVA_JIT_ENABLED=false and the performance is degraded upto 50 times slower than the functional case.

Note: Tested on Oracle 19.13.


1. Test-1 Simple Case


First we make a simple SQL Java call (dbms_java.getversion calls java.lang.System.getProperty via dbms_java.get_ojvm_property), and strace its Linux process:

SQL > select 'JServer version: '||dbms_java.getversion, 'JDK version:'||dbms_java.get_jdk_version from dual;  

$> strace -tT -o mmap_strace.log -p 16637
After the call, we can see the new created memory files: /dev/shm/JOEZSHM_*:

$> ls -l /dev/shm
  total 32768
  -rwxrwx--- 1 oracle dba 16777216 Dec 13 07:36 JOEZSHM_testdb_1_0_0_0_0_2084322508
  -rwxrwx--- 1 oracle dba 16777216 Dec 13 07:36 JOEZSHM_testdb_1_0_1_0_0_749348400
  
-- no entry found, no JOEZSHM_* mapped to process address space
$> pmap -X -p 16637 |grep -i JOEZSHM
mmap_strace.log shows:

  07:36:23 open("/dev/shm/JOEZSHM_testdb_1_0_0_0_0_2084322508", O_RDWR|O_CREAT|O_NOFOLLOW|O_CLOEXEC, 0770) = 7 <0.000456>
  07:36:23 lseek(7, 0, SEEK_CUR)          = 0 <0.000018>
  07:36:23 lseek(7, 0, SEEK_END)          = 16777216 <0.000023>
  07:36:23 lseek(7, 0, SEEK_SET)          = 0 <0.000030>
  07:36:23 mmap(NULL, 16777216, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_SHARED, 7, 0) = -1 EPERM (Operation not permitted) <0.000022>
MZ00 and session trace files are filled with:

sjoezshm_map_obj mmap failed for /JOEZSHM_testdb_1_0_0_0_0_2084322508: Operation not permitted
size  = 16777216, prot_mode  = 7, map_flag = 1
joez_shm_open_object failed: size = 16777216, extnam = /JOEZSHM_testdb_1_0_0_0_0_2084322508  flags = 0x34 
joez: Failed loading machine code: Unable to allocate code space

sjoezshm_map_obj mmap failed for /JOEZSHM_testdb_1_0_1_0_0_749348400: Operation not permitted
size  = 16777216, prot_mode  = 7, map_flag = 1
joez_shm_open_object failed: size = 16777216, extnam = /JOEZSHM_testdb_1_0_1_0_0_749348400  flags = 0x34 
joez: Failed loading machine code: Unable to allocate code space


2. Test-2 Complex Case


Use the same test code in Blog: Oracle 19.4 OracleJVM JAVA_JIT_ENABLED Not Working on AIX to read a 27MB signed JAR file testJar.jar (security signatures under META-INF directory).

$> ls -l /tmp/testJar.jar
  -rw-r--r-- 1 oracle dba 28944084 Dec  6 17:39 /tmp/testJar.jar

$> unzip -l /tmp/testJar.jar
  Archive:  /tmp/testJar.jar
    Length      Date    Time    Name
  ---------  ---------- -----   ----
        144  12-05-2019 08:27   META-INF/MANIFEST.MF
        306  12-05-2019 08:27   META-INF/KUNALIAS.SF
       1471  12-05-2019 08:27   META-INF/KUNALIAS.DSA
          0  12-05-2019 08:26   META-INF/
  110947240  12-05-2019 08:23   test1.txt
  ---------                     -------
  110949161                     5 files
First we invoke OracleJVMJarInputStream without signature verify, it takes 3 seconds:

Sql > set serveroutput on size 50000
Sql > exec dbms_java.set_output(50000); 

SQL > exec OracleJVMJarInputStream(p_verify => 'false', p_info => 'no');

********* getNextJarEntry *********
------ NextJarEntry: 1, Name: META-INF/KUNALIAS.SF ------
         getNextEntry ElapsedMills: 31, at: 1670863088140
         Insert DB 1 row, blob size 0 at Mon Dec 12 17:38:08 CET 2022
         readContent ElapsedMills: 5, at: 1670863088145
------ NextJarEntry: 2, Name: META-INF/KUNALIAS.DSA ------
         getNextEntry ElapsedMills: 0, at: 1670863088145
         Insert DB 1 row, blob size 0 at Mon Dec 12 17:38:08 CET 2022
         readContent ElapsedMills: 2, at: 1670863088147
------ NextJarEntry: 3, Name: META-INF/ ------
         getNextEntry ElapsedMills: 0, at: 1670863088148
         Insert DB 1 row, blob size 0 at Mon Dec 12 17:38:08 CET 2022
         readContent ElapsedMills: 2, at: 1670863088150
------ NextJarEntry: 4, Name: test1.txt ------
         getNextEntry ElapsedMills: 0, at: 1670863088150
         Insert DB 1 row, blob size 110920480 at Mon Dec 12 17:38:11 CET 2022
         readContent ElapsedMills: 3125, at: 1670863091275

PL/SQL procedure successfully completed.
Elapsed: 00:00:03.20
Then we invoke OracleJVMJarInputStream with signature verify, it takes 3 minutes:

Sql > exec OracleJVMJarInputStream(p_verify => 'true', p_info => 'no');

********* getNextJarEntry *********
------ NextJarEntry: 1, Name: META-INF/KUNALIAS.SF ------
         getNextEntry ElapsedMills: 1, at: 1670863115333
         Insert DB 1 row, blob size 0 at Mon Dec 12 17:38:35 CET 2022
         readContent ElapsedMills: 6, at: 1670863115339
------ NextJarEntry: 2, Name: META-INF/KUNALIAS.DSA ------
         getNextEntry ElapsedMills: 0, at: 1670863115339
         Insert DB 1 row, blob size 0 at Mon Dec 12 17:38:35 CET 2022
         readContent ElapsedMills: 423, at: 1670863115762
------ NextJarEntry: 3, Name: META-INF/ ------
         getNextEntry ElapsedMills: 0, at: 1670863115762
         Insert DB 1 row, blob size 0 at Mon Dec 12 17:38:35 CET 2022
         readContent ElapsedMills: 3, at: 1670863115765
------ NextJarEntry: 4, Name: test1.txt ------
         getNextEntry ElapsedMills: 0, at: 1670863115765
         Insert DB 1 row, blob size 110920480 at Mon Dec 12 17:41:35 CET 2022
         readContent ElapsedMills: 179651, at: 1670863295417

PL/SQL procedure successfully completed.
Elapsed: 00:03:00.12
If JAVA_JIT_ENABLED works (/dev/shm is mounted without "noexec"), the execution with signature verify is completed in 5 seconds (instead of 3 minutes).


3. Test-3 Standalone mmap.c Test


Web Page: mmap() fails on tmpfs (operation not permitted) #2974 provided a mmap.c test code and showed "mmap() failed: : Operation not permitted".

We can use the same code to simulate Oracle JIT /dev/shm/JOEZSHM_* test.
(with small adaptations of open and mmap arguments to Oracle parameters according to strace output in above Test-1 Simple Case).

// mmap-fail.cpp
#include <stdint.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

// $ g++ mmap-fail.cpp -o mmap-fail
// $ sudo ./mmap-fail
// mmap() failed: : Operation not permitted

int main(int argc, const char* argv[])
{
    //const char* path = "./eightbytes.bin"; // okay on lxfs
    //const char* path = "/run/eightbytes.bin";
    //int fd = open(path, O_RDWR|O_CREAT|O_CLOEXEC|O_NOCTTY|O_NOFOLLOW, 0644);
    int fd = open("/dev/shm/JOEZSHM_testdb_1_0_0_0_0_2084322508", O_RDWR|O_CREAT|O_NOFOLLOW|O_CLOEXEC, 0770);
    if (fd < 0) {
        perror("open() failed: ");
        return 1;
    }
    int r = posix_fallocate(fd, 0, sizeof(uint64_t));
    if (r < 0) {
        perror("posix_fallocate() failed: ");
        return 1;
    }
    //uint64_t *p = (uint64_t*)mmap(NULL, sizeof(uint64_t), PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
    uint64_t *p = (uint64_t*)mmap(NULL, sizeof(uint64_t), PROT_READ|PROT_WRITE|PROT_EXEC, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        perror("mmap() failed: ");
        return 1;
    }
    int r2 = close(fd);

    printf("Whee! mmap SUC\n");
    return 0;
}
Run the test with strace:

$> strace -tT -o mmap-fail_strace.log ./mmap-fail
  mmap() failed: : Operation not permitted
mmap-fail_strace.log shows:

  07:42:37 open("/dev/shm/JOEZSHM_testdb_1_0_0_0_0_2084322508", O_RDWR|O_CREAT|O_NOFOLLOW|O_CLOEXEC, 0770) = 3 <0.000022>
  07:42:37 fallocate(3, 0, 0, 8)          = 0 <0.000018>
  07:42:37 mmap(NULL, 8, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_SHARED, 3, 0) = -1 EPERM (Operation not permitted) <0.000017>


4. mmap.c EPERM Operation not permitted


From source mm/mmap.c, we can see that EPERM (Operation not permitted) is thrown in do_mmap
(case MAP_SHARED fallthrough to case MAP_PRIVATE).

// mm/mmap.c
unsigned long do_mmap(struct file *file, unsigned long addr,
  ...
	if (file) {
		...
		switch (flags & MAP_TYPE) {
		case MAP_SHARED:
      ...
			fallthrough;
		case MAP_SHARED_VALIDATE:
      ...
			fallthrough;
		case MAP_PRIVATE:
			if (!(file->f_mode & FMODE_READ))
				return -EACCES;
			if (path_noexec(&file->f_path)) {
				if (vm_flags & VM_EXEC)
					return -EPERM;
				vm_flags &= ~VM_MAYEXEC;
			}
			
			
include/uapi/asm-generic/errno-base.h
  #define	EPERM		 1	/* Operation not permitted */


5. Fix


To fix EPERM problem on Linux for JAVA_JIT_ENABLED, remove noexec option for tmpfs /dev/shm and make it permanent in /etc/fstab:

$> mount -o remount,exec /dev/shm
So that it looks like:

$> mount |grep shm
     tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)

$> cat /etc/fstab |grep shm
     tmpfs /dev/shm tmpfs nosuid,nodev 0 0


6. Related Work and Other Observations


For further discussions on OracleJVM JAVA_JIT_ENABLED, see:
     What the heck are the /dev/shm/JOXSHM_EXT_x files on Linux?
     Oracle 19.4 OracleJVM JAVA_JIT_ENABLED Not Working on AIX

There is also contre-proposition:
     The Oracle Linux operating system must mount /dev/shm with the noexec option.

When JAVA_JIT_ENABLED works, and we strace MZ00 session for a while.
Trace file shows that MZ00 is continuously compiling Java, each time, only one method (not entire class).
The timestamp of /dev/shm/JOEZSHM_* shows that those memory files are also updated.

$> strace -tT -s 128 -o mz00_strace.txt -p 23037

-- (fd 6 is testdb_mz00_23037.trc)
$> grep "Done compiling.*locale*" mz00_strace.txt
  15:48:11 write(6, "Done compiling sun/util/locale/provider/TimeZoneNameUtility$TimeZoneNameGetter.getObject", 88) = 88 <0.000026>
  15:50:44 write(6, "Done compiling sun/util/locale/provider/TimeZoneNameUtility$TimeZoneNameGetter.getObject", 88) = 88 <0.000019>
  15:50:44 write(6, "Done compiling sun/util/locale/provider/TimeZoneNameProviderImpl.getDisplayNameArray", 84) = 84 <0.000025>
  15:50:44 write(6, "Done compiling sun/util/locale/provider/TimeZoneNameUtility$TimeZoneNameGetter.getName", 86) = 86 <0.000046>
  15:50:44 write(6, "Done compiling sun/util/locale/provider/LocaleResources.removeEmptyReferences", 77) = 77 <0.000035>
  15:50:45 write(6, "Done compiling sun/util/locale/provider/LocaleResources.getTimeZoneNames", 72) = 72 <0.000039>
  15:50:46 write(6, "Done compiling sun/util/locale/BaseLocale$Key.normalize", 55) = 55 <0.000030>
  15:50:47 write(6, "Done compiling sun/util/locale/provider/TimeZoneNameProviderImpl.getDisplayName", 79) = 79 <0.000057>
By the way, in the old Oracle release, there are many JIT compiled small files (KB) with name pattern like JOXSHM_EXT_*,
In Oracle 19c, there are only a few big files (16MB) with name pattern like JOEZSHM_*.

Update (2023-02-26)Oracle 19c Java Developer's Guide: 9.1 Oracle JVM Just-in-Time Compiler (JIT) documented this behaviour:
Note:
 
On Linux, Oracle JVM JIT uses POSIX shared memory that requires access to the /dev/shm directory.
The /dev/shm directory should be of type tmpfs and you must mount this directory as follows:
  -. With rw and execute permissions set on it
  -. Without noexec or nosuid set on it
If the correct mount options are not used, then the following failure may occur during installation of the database:
  ORA-29516: Aurora assertion failure: Assertion failure at joez.c:
             Bulk load of method java/lang/Object. failed; insufficient shm- object space