Tuesday, May 24, 2022

How volatile is ORA-00600 [qernsRowP] ?

ORA-600 [qernsRowP] seems related to a parallel query when the execution plan includes SORT GROUP BY NOSORT.

This Blog will demonstrate ORA-00600 [qernsRowP] with one small test code (probably one shortest ORA-00600 test code).

Note: Tested in Oracle 19.13, 19.7, 18.9, 12.1


1. Test Setup



drop type t_char100_varray50 force;

create or replace noneditionable type t_char100_varray50 as varray(50) of varchar2(100)
/

drop table test_tab cascade constraints;

create table test_tab as select level id, t_char100_varray50('a', 'b', 'c') vary
  from dual connect by level <= 1e4;
  
-- No primary key, no error
alter table test_tab add constraint test_tab#p primary key (id);


2. Test Run


Run following query, it throws ORA-00600 [qernsRowP].

select /*+ parallel(4) index(t test_tab#p) */ t.id, count(*)
  from test_tab t, table(t.vary) v
where rownum <= 3000
group by t.id;

    ID   COUNT(*)
 ----- ----------
     1          3
     2          3
     3          3
  ...
  2602          3
  2603          3
  
  ERROR:
  ORA-00600: internal error code, arguments: [qernsRowP], [1], [], [], [], [], [], [], [], [], [], []

  615 rows selected.
  
  Note that sometime error occurs alternatively (not in first run, but in second run).
If we limit rownum to a small number, e.g. 100, no error occurs, but exact number is hard to find (varying with Oracle release, OS, and run sequence).

-- rownum <= 100, no error. Exact number is not fixed. 
select /*+ parallel(4) index(t test_tab#p) */ t.id, count(*)
  from test_tab t, table(t.vary) v
where rownum <= 100
group by t.id;

    ID   COUNT(*)
  ---- ----------
     1          3
     2          3
   ... 
    33          3
    34          1

  34 rows selected.
If we remove parallel hint, there is also no more errors.

--remove parallel hint, no error.
select /*+ index(t test_tab#p) */  t.id, count(*)
  from test_tab t, table(t.vary) v 
where rownum <= 1000
group by t.id; 

        ID   COUNT(*)
  ----- ----------
      1          3
      2          3
    ...  
    333          3
    334          1

  334 rows selected.
For ORA-00600 [qernsRowP] query, Plan Table is dumped in session trace and incident file.
The rowsource line 1 contains "SORT GROUP BY NOSORT".

Plan Table
--------------------------------------------------------------------+-----------------------------------+-------------------------+
| Id  | Operation                             | Name      | Rows  | Bytes | Cost  | Time      | ObjectId  |  TQ  |IN-OUT|PQ Distrib |
--------------------------------------------------------------------+-----------------------------------+-------------------------+
| 0   | SELECT STATEMENT                      |           |       |       |  6016 |           |           |      |      |           |
| 1   |  SORT GROUP BY NOSORT                 |           |  3000 | 5540K |  6016 |  06:26:57 |           |      |      |           |
| 2   |   COUNT STOPKEY                       |           |       |       |       |           |           |      |      |           |
| 3   |    NESTED LOOPS                       |           |   46M |   86G |  6016 |  06:26:57 |           |      |      |           |
| 4   |     PX COORDINATOR                    |           |       |       |       |           |           |      |      |           |
| 5   |      PX SEND QC (RANDOM)              | :TQ10001  |  5963 |   11M |    44 |  00:03:50 |           |:Q1001| P->S |QC (RANDOM)|
| 6   |       TABLE ACCESS BY INDEX ROWID     | TEST_TAB  |  5963 |   11M |    44 |  00:03:50 | 4683207   |:Q1001| PCWP |           |
| 7   |        BUFFER SORT                    |           |       |       |       |           |           |:Q1001| PCWC |           |
| 8   |         PX RECEIVE                    |           |  5963 |       |    11 |  00:00:43 |           |:Q1001| PCWP |           |
| 9   |          PX SEND HASH (BLOCK ADDRESS) | :TQ10000  |  5963 |       |    11 |  00:00:43 |           |      | S->P |HASH (BLOCK ADDRESS)|
| 10  |           INDEX FULL SCAN             | TEST_TAB#P|  5963 |       |    11 |  00:00:43 | 4683210   |      |      |           |
| 11  |     COLLECTION ITERATOR PICKLER FETCH |           |  8168 |       |     8 |  00:00:31 |           |      |      |           |
--------------------------------------------------------------------+-----------------------------------+-------------------------+

Predicate Information:
----------------------
2 - filter(ROWNUM<=3000)
 
Content of other_xml column
===========================
  dop_reason     : hint
  dop            : 4
  px_in_memory_imc: no
  px_in_memory   : no
  db_version     : 19.0.0.0
-----------------

  Hint Report:
    Query Block: SEL$F5BB74E1
      Table: ("T"@"SEL$1") index(t test_tab#p)
    Statement: parallel(4)
Call Stack in incident file looks like:

  --------------------- Binary Stack Dump ---------------------
  [1]  (ksedst1()+95 -> kgdsdst())
  [2]  (ksedst()+58 -> ksedst1())
  [3]  (dbkedDefDump()+23448 -> ksedst())
  [4]  (ksedmp()+577 -> dbkedDefDump())
  [5]  (dbgexPhaseII()+2092 -> ksedmp())
  [6]  (dbgexProcessError()+1871 -> dbgexPhaseII())
  [7]  (dbgePostErrorKGE()+1853 -> dbgexProcessError())
  [8]  (dbkePostKGE_kgsf()+71 -> dbgePostErrorKGE())
  [9]  (kgeadse()+447 -> dbkePostKGE_kgsf())
  [10] (kgerinv_internal()+44 -> kgeadse())
  [11] (kgerinv()+40 -> kgerinv_internal())
  [12] (kgesinv()+21 -> kgerinv())
  [13] (ksesin()+180 -> kgesinv())
  [14] (qernsRowP()+501 -> ksesin())          --ERROR SIGNALED: yes   COMPONENT: SQL_Execution
  [15] (qercoRop()+111 -> qernsRowP())
  [16] (qerocpFetch()+428 -> qercoRop())
  [17] (qerocFetch()+201 -> qerocpFetch())
  [18] (qerjotRowProc()+397 -> qerocFetch())
  [19] (qerpxFetch()+995 -> qerjotRowProc())
  [20] (qerjotFetch()+2094 -> qerpxFetch())
  [21] (qercoFetch()+299 -> qerjotFetch())
  [22] (qernsFetch()+424 -> qercoFetch())
  [23] (opifch2()+3211 -> qernsFetch())
  [24] (opifch()+61 -> opifch2())
  [25] (opiodr()+1202 -> opifch())
  [26] (ttcpip()+1246 -> opiodr())  
MOS provides a workaround, but test shows that it does not work.

     ORA-600 [qernsrowp] (Doc ID 285913.1)
     ORA-00600 [QERNSROWP] When Running a Parallel Query With Group By NOSORT Option (Doc ID 984955.1)

workarounds

  Alter session set events '10119 trace name context forever, level 12';
  alter session set events '10119 trace name context forever';
MOS: Receiving ORA-600 [qernsRowP] Internal Error When Saving Changes. (Doc ID 455139.1) provides solution:

  -- To implement the solution, please execute the following steps::
  Setting cursor_sharing=EXACT
But our test DB is already set "Setting cursor_sharing=EXACT"

  SQL > show parameter cursor_sharing
  
     NAME             TYPE    VALUE
     ---------------- ------- -----
     cursor_sharing   string  exact

One Case of ORA-00036: Maximum Number Of Recursive SQL Levels (50) Exceeded

ORA-00036 is documented as:
  00036, 00000, "maximum number of recursive SQL levels (%s) exceeded"
  // *Cause:  An attempt was made to go more than the specified number
  //          of recursive SQL levels.
  // *Action: Remove the recursive SQL, possibly a recursive trigger.
The most (possibly only) circulated test code is about recursive trigger
(see MOS: PL/SQL Trigger causes ORA-00036: Maximum Number Of Recursive SQL Levels (50) Exceeded (Doc ID 1478056.1)).

This Blog will demonstrate one most often occurred case of ORA-00036 in Oracle applications.

Note: Tested in Oracle 19.13, 19.7, 18.9, 12.1


1. Test Setup


We create a Plsql procedure, which makes dynamic recursive calls with execute immediate.

create or replace procedure recursive_dynamic (p_depth number) as 
begin
  dbms_output.put_line('Depth = '|| p_depth); 
  execute immediate q'[begin recursive_dynamic (:dep); end;]' using p_depth + 1;
end;
/


2. Dynamic Recursive Call


We run the test with "36 trace" and "10046 trace". (See MOS: OERR: ORA-36 "maximum number of recursive SQL levels (%s) exceeded" Reference Note (Doc ID 48793.1))

alter session set max_dump_file_size = UNLIMITED;
alter session set events='36 trace name errorstack level 3: 10046 trace name context forever, level 12' 
                  tracefile_identifier='recursive_trc';
            
begin      
  execute immediate q'[begin recursive_dynamic (:dep); end;]' using 1;
end;
/

alter session set events='36 trace name errorstack off: 10046 trace name context off';
After a few seconds, session throws ORA-00036 after 51 recursive calls:

Depth = 1
Depth = 2
Depth = 3
...
Depth = 49
Depth = 50
Depth = 51

ORA-00036: maximum number of recursive SQL levels (50) exceeded
ORA-06512: at "K.RECURSIVE_DYNAMIC", line 4
ORA-06512: at line 1

Elapsed: 00:00:02.92
Trace file shows 51 "PARSING IN CURSOR" with dep=1 to dep=51 and Bind#0 from value=1 to value=51:

PARSING IN CURSOR #140643820632152 dep=1 tim=142118502429 hv=3989675504 ad='8cbdaf18' sqlid='2s8jqqgqwv7gh'
begin recursive_dynamic (:dep); end;
END OF STMT
PARSE #140643820632152:c=238,e=238,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=1,plh=0,tim=142118502429
BINDS #140643820632152:

 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=03 fl2=1206001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fea315aff88  bln=22  avl=02  flg=05
  value=1
  
...
 
PARSING IN CURSOR #140643819871240 dep=51 tim=142118509346 hv=3989675504 ad='8cbdaf18' sqlid='2s8jqqgqwv7gh'
begin recursive_dynamic (:dep); end;
END OF STMT
PARSE #140643819871240:c=12,e=12,p=0,cr=0,cu=0,mis=0,r=0,dep=51,og=1,plh=0,tim=142118509346
BINDS #140643819871240:

 Bind#0
  oacdty=02 mxl=22(21) mxlc=00 mal=00 scl=00 pre=00
  oacflg=13 fl2=206001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fea30ebb770  bln=22  avl=02  flg=09
  value=51
Here the Call Stack:

----- Error Stack Dump -----
ORA-00036: maximum number of recursive SQL levels (50) exceeded
Current SQL information unavailable - no cursor.
----- PL/SQL Call Stack -----
  object      line  object
  handle    number  name
0x8cccbf90         4  procedure K.RECURSIVE_DYNAMIC
0x8cbcedf0         1  anonymous block
0x8cccbf90         4  procedure K.RECURSIVE_DYNAMIC
0x8cbcedf0         1  anonymous block
0x8cccbf90         4  procedure K.RECURSIVE_DYNAMIC
0x8cbcedf0         1  anonymous block
0x8cccbf90         4  procedure K.RECURSIVE_DYNAMIC  
  
--------------------- Binary Stack Dump ---------------------
[13] (dbkePostKGE_kgsf()+71 -> dbgePostErrorKGE()) 
[14] (kgeade()+392 -> dbkePostKGE_kgsf()) 
[15] (kgeselv()+89 -> kgeade()) 
[16] (ksesec1()+205 -> kgeselv()) 
[17] (ksuprc()+1629 -> ksesec1()) 
[18] (opiodr()+760 -> ksuprc()) 
[19] (rpidrus()+198 -> opiodr()) 
[20] (skgmstack()+65 -> rpidrus()) 
[21] (rpidru()+132 -> skgmstack()) 
[22] (rpiswu2()+543 -> rpidru()) 
[23] (rpidrv()+1266 -> rpiswu2()) 
[24] (psddr0()+467 -> rpidrv()) 
[25] (psdopn()+72 -> psddr0()) 
[26] (plcurOpen()+64 -> psdopn()) 
[27] (kgscGetCursor()+4842 -> plcurOpen()) 
[28] (pevm_I4EXIM()+601 -> kgscGetCursor()) 
[29] (pfrinstr_I4EXIM()+167 -> pevm_I4EXIM()) 
[30] (pfrrun_no_tool()+60 -> pfrinstr_I4EXIM()) 
[31] (pfrrun()+902 -> pfrrun_no_tool()) 
[32] (plsql_run()+752 -> pfrrun()) 
In Call Stack Trace, we can see that Oracle detected "recursion pattern":

----- Call Stack Trace -----

**** At frame 91 recursion pattern of size 17 found, for return address 
     rpiswu2()+543 suppressing  printing.
**** At frame 100 recursion pattern broken, last return was 
     plsql_run()
     
**** At frame 891 recursion pattern of size 17 found, for return address 
     rpidrv()+1266 suppressing  printing.
**** At frame 900 recursion pattern broken, last return was 
     peicnt()
The 51 recursive calls generated 51 Cursors from Cursor#7 to Cursor#57 with Bind#0 from value=1 to value=51:

----------------------------------------
Cursor#7(0x7fea318a0908) state=BOUND curiob=0x7fea30f7f858
 curflg=0xcd fl2=0x0 fl3=0x0 par=(nil) ses=0xb8c54908
----- Dump Cursor sql_id=2s8jqqgqwv7gh xsc=0x7fea30f7f858 cur=0x7fea318a0908 -----

LibraryHandle:  Address=0x8cbdaf18 Hash=edcd9df0 LockMode=N PinMode=0 LoadLockMode=0 Status=VALD 
  ObjectName:  Name=begin recursive_dynamic (:dep); end; 
  
----- Bind Info (kkscoacd) -----
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=03 fl2=1206001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fea315aff88  bln=22  avl=02  flg=05
  value=1
  
...

----------------------------------------
Cursor#57(0x7fea318a2848) state=BOUND curiob=0x7fea30ec5c08
 curflg=0xc5 fl2=0x0 fl3=0x0 par=(nil) ses=0xb8c54908
----- Dump Cursor sql_id=2s8jqqgqwv7gh xsc=0x7fea30ec5c08 cur=0x7fea318a2848 -----

LibraryHandle:  Address=0x8cbdaf18 Hash=edcd9df0 LockMode=N PinMode=0 LoadLockMode=0 Status=VALD 
  ObjectName:  Name=begin recursive_dynamic (:dep); end; 

----- Bind Info (kkscoacd) -----
 Bind#0
  oacdty=02 mxl=22(21) mxlc=00 mal=00 scl=00 pre=00
  oacflg=13 fl2=206001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7fea30ebb770  bln=22  avl=02  flg=09
  value=51
10046 SQL trace shows Parse and Execute with count=51:

SQL ID: 2s8jqqgqwv7gh Plan Hash: 0

begin recursive_dynamic (:dep); end;


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse       51      0.00       0.00          0          0          0           0
Execute     51      2.76       2.89          0          0          0           0
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      102      2.76       2.89          0          0          0           0

Misses in library cache during parse: 1
Misses in library cache during execute: 2
Optimizer mode: ALL_ROWS
Parsing user id: 49     (recursive depth: 1)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  PGA memory operation                            7        0.00          0.00


3. Static Recursive Call


In contrast to "execute immediate" dynamic calls, we can also test static recursive calls as follows.

create or replace procedure recursive_static (p_depth number) as 
begin
  recursive_static(p_depth + 1);
end;
/
   
alter session set max_dump_file_size = UNLIMITED;
alter session set events='36 trace name errorstack level 3: 10046 trace name context forever, level 12' 
                  tracefile_identifier='static_trc'; 
begin      
  execute immediate q'[begin recursive_static (:dep); end;]' using 1;
end;
/

alter session set events='36 trace name errorstack off: 10046 trace name context off';
If the machine has sufficient memory (more than 32 GB), after about one hour, session throws ORA-03114.

begin
  execute immediate q'[begin recursive_static (:dep); end;]' using 1;
end;
 /
 
ORA-03114: not connected to ORACLE

ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 11285
Session ID: 564 Serial number: 15134

Elapsed: 00:59:46.25
In DB alert.log / Trace / Incident file, we can see ORA-04030: out of process memory due to PL/SQL STACK.

-- DB alert.log / Trace / Incident file
ORA-04030: out of process memory when trying to allocate 8216 bytes (PLS PGA hp,PL/SQL STACK)

82%   26 GB, 3421439 chunks: "PL/SQL STACK              "  PL/SQL
         PLS PGA hp      ds=7fffbdb2bd40  dsprt=7fffbdb831f0
17% 5436 MB, 349774 chunks: "pl/sql vc2                "  PL/SQL
         koh-kghu sessi  ds=7fffbbfa3050  dsprt=7fffbd8f96b8
 1%  200 MB, 12842 chunks: "pmucalm coll              "  PL/SQL
         koh-kghu sessi  ds=7fffbd749660  dsprt=7fffbd8f96b8
If the machine has not sufficient memory (less than 32 GB), session also hits ORA-03114.

begin
  execute immediate q'[begin recursive_static (:dep); end;]' using 1;
end;
/

ORA-03114: not connected to ORACLE

ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 20417
Session ID: 14 Serial number: 53851

Elapsed: 00:01:27.51
But DB alert.log / Trace / Incident file reported ORA-6544:

ORA-6544 [pevm_peruws_callback-1] [27102] [] [] [] [] [] [] [] [] [] []

========= Dump for incident 28438 (ORA 6544 [pevm_peruws_callback-1]) ========

----- Current SQL Statement for this session (sql_id=g3u50dymhsuwn) -----
begin recursive_static (:dep); end;

----- PL/SQL Call Stack -----
  object      line  object
  handle    number  name
0x8cccb400         1  procedure K.RECURSIVE_STATIC
0x8cccb400         4  procedure K.RECURSIVE_STATIC
0x8cccb400         4  procedure K.RECURSIVE_STATIC
0x8cccb400         4  procedure K.RECURSIVE_STATIC
0x8cccb400         4  procedure K.RECURSIVE_STATIC
0x8cccb400         4  procedure K.RECURSIVE_STATIC

[8]  (dbgePostErrorDirect()+798 -> dbgePostErrorDirectVaList_int()) 
[9]  (pevm_peruws_callback()+1233 -> dbgePostErrorDirect()) 
[10] (kgepop()+438 -> pevm_peruws_callback()) 
[11] (kgersel()+256 -> kgepop()) 
[12] (ksmrf_init_alloc()+508 -> kgersel()) 
[13] (ksmapg()+539 -> ksmrf_init_alloc()) 
[14] (kgh_invoke_alloc_cb()+494 -> ksmapg()) 
[15] (kghgex()+2751 -> kgh_invoke_alloc_cb()) 
[16] (kghfnd()+1030 -> kghgex()) 
[17] (kghalo()+6631 -> kghfnd()) 
[18] (kghgex()+760 -> kghalo()) 
[19] (kghalf()+1607 -> kghgex()) 
[20] (pfrsgr()+246 -> kghalf()) 
[21] (pevm_ENTER()+3089 -> pfrsgr()) 
[22] (pfrinstr_ENTER()+59 -> pevm_ENTER()) 
[23] (pfrrun_no_tool()+60 -> pfrinstr_ENTER()) 
[24] (pfrrun()+902 -> pfrrun_no_tool()) 
[25] (plsql_run()+752 -> pfrrun()) 
In Oracle, ORA-6544 is documented as:
  06544, 00000, "PL/SQL: internal error, arguments: [%s], [%s], [%s], [%s], [%s], [%s], [%s], [%s]"
  // *Cause: A pl/sql internal error occurred.
  // *Action:Report as a bug; the first argument is the internal error nuber.

Note*: typo "nuber"
In such case, Linux dmesg shows that session process hits "Out of memory", so in-kernel, that is still something similar to ORA-04030.

[09:55:19] Out of memory: Kill process 20417 (oracle_20417_c0) score 784 or sacrifice child
[09:55:19] Killed process 20417 (oracle_20417_c0) total-vm:23949296kB, anon-rss:18819800kB, file-rss:2332kB, shmem-rss:375432kB