Sunday, December 4, 2022

One Test of Oracle too many parse errors with ORA-12850 and ORA-12872

In this Blog, we first make a test to demonstrate Oracle "too many parse errors" associated with ORA-12850 and ORA-12872.
Then we reveal both errors in Oracle trace files. It looks like a limitation of materialized WITH subquery in parallel statements.

Note: Reproduced in Oracle 19.9, 19.10, 19.11, 19.13, but not 19.17. All are non RAC DB.


1. Test Setup



create table test_tab as select level x from dual connect by level <= 10; 

create or replace procedure test_proc (p_cnt number, p_limit pls_integer, p_sleeps number) as
  type t_num_tab      is table of number;
  l_num_tab           t_num_tab := new t_num_tab();
begin
  for i in 1..p_cnt loop 
    dbms_session.sleep(p_sleeps);
    with sq as (select /*+ materialize  */ x from test_tab)
    select /*+ parallel(2) */ t.x  
      bulk collect into l_num_tab
      from test_tab t, sq 
     where rownum <= p_limit;
   end loop; 
   
   dbms_output.put_line('Selected Rows = '||l_num_tab.count);
end;
/


2. Test Run


(Step-1) Open a new Sqlplus session

(Step-2) Flush Shared Pool
            alter system flush shared_pool; 

(Step-3) Run test procedure
            exec test_proc(1000, 3, 0.1);

(Step-4) Check DB alert.log if there is message:
            WARNING: too many parse errors
   
(Step-5) If no such message, repeat Step-1 to Step-5.
         That means opening next new Sqlplus session to do the same test.
         Most of tests showed that one or two Sqlplus sessions are sufficient to reproduce the problem.
Here the message in DB alert.log. PARSE ERROR are signalled with error=12850 or error=12872.
Note that the message is only written for each 100 such errors (count=13300 jump to count=13400).

2022-12-01T20:01:08.221827+01:00
WARNING: too many parse errors, count=13300 SQL hash=0x49952973
PARSE ERROR: ospid=31115, error=12850 for statement: 
2022-12-01T20:01:08.221953+01:00
WITH SQ AS (SELECT /*+ materialize  */ X FROM TEST_TAB) SELECT /*+ parallel(2) */ T.X FROM TEST_TAB T, SQ WHERE ROWNUM <= :B1
Additional information: hd=0x97efe2e0 phd=0x9437c118 flg=0x110676 cisid=86 sid=86 ciuid=86 uid=86 sqlid=3z4qz1d4taabm
...Current username=K
...Application:  Action: 
2022-12-01T20:01:09.170414+01:00
WARNING: too many parse errors, count=13400 SQL hash=0x49952973
PARSE ERROR: ospid=31117, error=12872 for statement: 
2022-12-01T20:01:09.170531+01:00
WITH SQ AS (SELECT /*+ materialize  */ X FROM TEST_TAB) SELECT /*+ parallel(2) */ T.X FROM TEST_TAB T, SQ WHERE ROWNUM <= :B1
Additional information: hd=0x97efe2e0 phd=0x9437c118 flg=0x110676 cisid=86 sid=86 ciuid=86 uid=86 sqlid=3z4qz1d4taabm
...Current username=K
...Application: SQL*Plus Action:  
Appendix-1 lists a few related Oracle Error Documents.

Appendix-2 shows parallel xplan.


3. ORA-10388 Trace


Oracle (Doc ID 2180844.1) has a description of ORA-12850 (PX QC and worker signature-plan mismatch) and documents Troubleshooting Steps.

How to get plan signature tracing for ORA-12850 (Doc ID 2180844.1)
  Purpose
      ORA-12850 errors are usually caused by the worker getting a plan signature different from that of the QC.  
      These are typically, although not always, QRY OPTIMIZER bugs. This note provides the recommended tracing needed 
      in order to see if there is a signature-plan mismatch and diagnose the cause of ORA-12850.†

  Troubleshooting Steps
      alter system set†"_parallel_blackbox_size" =†524288;†
      -- prevents traces from being truncated
      alter session set max_dump_file_size = unlimited;                         
      -- customer will be prompted for a string that will appear in the tracefile name
      alter session set tracefile_identifier = '&TRACEFILE_IDENTIFIER';         
      alter session set events 'trace[SQL_Compiler.*] disk highest';
      alter session set events 'trace[Parallel_Execution.*] disk highest';
      
      --run test code, for example, in this Blog:
      exec test_proc(10, 3, 0.1);
      
      -- turn tracing off
      alter session set events 'trace[SQL_Compiler.*] off';
      alter session set events 'trace[Parallel_Execution.*] off';
    
    Look for the plan signatures in the QC vs. worker traces.  
    The worker that will have the mismatch should be the one with the 10388 error in it.  
    Notice that the plan signatures do NOT match, which causes the ORA-12850.
	
  3. Look for the plan signatures in the QC vs. worker traces.  
     The worker that will have the mismatch should be the one with the 10388 error in it.  
     Notice that the plan signatures do NOT match, which causes the ORA-12850.
     
        QC Plan Signature from Parse Msg:  SQL Plan Signature
        Slave Plan Signature:  SQL Plan Signature
Following above Oracle instructions, we get Plan Signature in worker (pnnn) traces like:

  Slave Plan Signature:  SQL Plan Signature (0x93838ff0):
  QC Plan Signature from Parse Msg:  SQL Plan Signature (0x7f0fed684134):
In Oracle Document, ORA-10388 Comment said: commands a slave to leave the server group (query failed).
So we will look for 10388 error in the trace files.

In worker (pnnn) trace files, internal oracle error=10388 (message to terminate a parallel query) is precised by "In signal-handler after catching 12850", and finalized by "Exiting parallel client parallel query execution(6) on error=10388".

====================== error 12850 ======================
2022-12-01 11:28:29.288*:PX_Control:kxfx.c@4849:kxfxsp1(begin): 
	Open & parse cursor(EXPRESS/-): 
	[123:WITH SQ AS (SELECT /*+ materialize  */ X FROM TEST_TAB) SELECT /*+ parallel(2) */ T.X FROM TEST_TAB T, SQ WHERE ROWNUM <= :B1 ]
2022-12-01 11:28:29.288*:PX_Control:kxfx.c@4853:kxfxsp1(end): 
2022-12-01 11:28:29.288*:PX_Control:kxfx.c@5885:kxfxsp(end): 
2022-12-01 11:28:29.327*:PX_Control:kxfx.c@6114:kxfxsStatus(): 
	joining FALSE ParseState KXFXUPARSE2 (3)
2022-12-01 11:28:29.331*:PX_Control:kxfx.c@10728:kxfxmai(): 
	In signal-handler after catching 12850
2022-12-01 11:28:29.333*:PX_Control:kxfx.c@10728:kxfxmai(): 
	In signal-handler after catching 10388
2022-12-01 11:28:29.333*:PX_Control:kxfx.c@5678:kxfxsuf(): 
	kxfx slave finish
2022-12-01 11:28:29.333*:PX_Messaging:kxfp.c@14665:kxfprdp_int(): 
	Exiting parallel client parallel query execution(6) on error=10388
In the same trace files, "Received error 12872" is noted with "First parse failed" during Semantic Analysis:

====================== error 12872 ======================
2022-12-01 11:29:30.411*:PX_Control:kxfx.c@6829:kxfxsExecute(): 
	Received error 12872 from first slave parse.
	First parse failed. On to outlined parse 
2022-12-01 11:29:30.411*:PX_Control:kxfx.c@13917:kxfxsStmtDump(): 
	kxfxsSmtExecute stmt:0x7f0fed69ff50
	 UGA: InUse:255KB Alloc:255KB
	 PGA: InUse:1665KB Alloc:1835KB
Enabling tracing for cur#=1 sqlid=3z4qz1d4taabm top-level
Parsing cur#=1 sqlid=3z4qz1d4taabm len=123 
sql=WITH SQ AS (SELECT /*+ materialize  */ X FROM TEST_TAB) SELECT /*+ parallel(2) */ T.X FROM TEST_TAB T, SQ WHERE ROWNUM <= :B1 
End parsing of cur#=1 sqlid=3z4qz1d4taabm
Semantic Analysis cur#=1 sqlid=3z4qz1d4taabm
OPTIMIZER INFORMATION


4. Performance Impacts


When multiple sessions frequently run such statement, especially with high degree of parallelism (DOP), DB performance degradation is often caused by following heavy wait events:

  cursor: pin S wait on X
  library cache lock
  kksfbc child completion
  PX Deq: Parse Reply
  PX Deq: Execution Msg
When looking the wait chains, often one PX particular worker is the blocker of QC or other workers. We also noticed that V$SQL.child_number can go over 1000 ("_cursor_obsolete_threshold" default 1024).

A quick workaround is to remove parallel hint to make serial execution.

Blog: 12c Parse listed a few activity stats to monitor:
  opened cursors cumulative         
  enqueue requests                  
  enqueue releases                  
  sql area purged                   
  sql area evicted                  
  parse count (total)               
  parse count (hard)                
  parse count (failures)            
Tests showed that the call (1000 executions of query):

  exec test_proc(1000, 3, 0.1); 
generated 6000 lines of "too many parse errors" in DB alert.log:
(the count seems reading from V$SQLAREA.invalidations for each sql_id)

  WARNING: too many parse errors, count=53700 SQL hash=0x49952973 ... sqlid=3z4qz1d4taabm
  ...
  WARNING: too many parse errors, count=59700 SQL hash=0x49952973 ... sqlid=3z4qz1d4taabm
v$sysstat query:

select v.value, v.* from v$sysstat v
where name in ('parse count (failures)', 'sql area purged', 'sql area evicted', 
               'parse count (hard)', 'parse count (total)', 
               'opened cursors cumulative', 'enqueue releases', 'enqueue requests') 
order by v.value;
shwed that:
   the value of 'parse count (failures)', 'sql area purged', 'sql area evicted' increased 6000
   the value of 'parse count (hard)', 'parse count (total)' increased a little more than 6000 (if no other DB activities)
So each test query execution created 6 "too many parse errors" related to 6 'parse count (failures)'.


5. ORA-00600 [xplUnCompact:magic]


In very rare case, QC session throws ORA-00600 as follows. It looks like that the error occurs when optimizer automatic sql reoptimisation (kkoarSalvageStats, COMPONENT: SQL_Costing) checks cardinality feedback (kkocfbCheckCardEstRws, COMPONENT: SQL_Costing) in xplan (xplUnCompact, COMPONENT: explain).

ORA-00600: internal error code, arguments: [xplUnCompact:magic], [0], [0x7FFE5C370D98]

----- Call Stack Trace -----

[12] (kgeasnmierr()+146 -> kgerinv())
[13] (xplUnCompact()+1320 -> kgeasnmierr())
[14] (kkocfbCheckCardEstRws()+815 -> xplUnCompact())
[15] (kkocfbCheckCardEstRws()+259 -> kkocfbCheckCardEstRws())
[16] (kkocfbCheckCardEstRws()+259 -> kkocfbCheckCardEstRws())
     -- a couple of same calling lines
[35] (kkocfbCheckCardEstRws()+259 -> kkocfbCheckCardEstRws())
[36] (kkocfbCheckCardEst()+571 -> kkocfbCheckCardEstRws())
[37] (kkoarSalvageStats()+910 -> kkocfbCheckCardEst())
[38] (kxsFreeWorkArea()+149 -> kkoarSalvageStats())
[39] (kxsFreeExecutionHeap()+60 -> kxsFreeWorkArea())
[40] (kksumc()+631 -> kxsFreeExecutionHeap())
[41] (opiexe()+10228 -> kksumc())

Session Wait History:
    elapsed time of 0.993485 sec since last wait
 0: waited for 'enq: PS - contention'


6. Related Oracle BUGs and Related Work


There are a few related BUGs in Oracle MOS, we tested them (patch installed or hidden parameter set. Test DB are not RAC).
All have no effect, and DB still throws the same "too many parse errors":

Bug 31602782 ††Contention on "CURSOR: PIN S WAIT ON X" when PQ slave's execution plan does not match with QC       
   Versions confirmed as being affected: 12.2.0.1 (Base Release) - 19.13.0
   Description
     When a parallel query is executed, QC sends parse message to all workers in all  
     instances. In a instance one worker tries to parse the cursor and other workers 
     waits (cursor pin s wait for x) for that cursor to parsed and built. 
      
     If this worker could not build the cursor with the same plan as QC and could not  
     join, another worker tries the same and so on until all workers get exhausted 
     and query runs in serial or without workers from that instance. 
      
     This can cause a huge contention on CURSOR: PIN S WAIT ON X .
      
     After this bug fix, if a worker on one instance fails to produce same plan as QC, 
     other workers on the same instance do not retry the parse and workers from 
     that instance are not allocted.
    
Bug 32121673††A Process May Crash With Ora-600 [xpluncompact:magic]
 
Warning: Too Many Parse Errors With ORA-12850/ORA-12872 (Doc ID 2816169.1)
  Cause
     It appears that this is due to Bug 31877812 : WHEN HIGH CONCURRENCY TOO MUCH CPU UTILIZATION AND PARALLEL 
     QUERY IS FAILING WITH ORA-12872, this bug is currently waiting on two other bugs to be resolved first. 
  Solution
    †Bug is not resolved as of yet but what has been able to get by the issue is setting the following parameter:
       alter system set "_nlj_batching_enabled"=0;
  
RAC: ORA-12801 and ORA-12850: Could Not Allocate Slaves On All Specified Instances: 2 Needed, 1 (Doc ID 1503358.1)
Blog: ORA-12850: Could not allocate slaves on all specified instances error and a workaround for my scripts talked about GV$ views for reading cluster-wide ASH data and PX slaves used for fetching remote data due to Xplan temp table transformations of materialized WITH subqueries, and provided workaround of "INLINE" hint in some TPT scripts.

The ORA-12850 indicates the limitation of materialized WITH subquery in Oracle 19c RAC installations because of GV$ view access.
This Blog test shows the limitation of materialized WITH subquery in parallel statements (non RAC).

Blog: Oracle WITH subquery_factoring_clause TEMP TABLE TRANSFORMATION and log file sync talked about "INLINE" hint in materialized WITH subqueries and log file sync.


Appendix-1. Oracle Error Document


ORA-12850, "Could not allocate slaves on all specified instances: %s needed, %s allocate d"
  *Cause: When executing a query on a gv$ fixed view, one or more                                                  
          instances failed to allocate a slave to process query.                                                   
  *Action:Check trace output for instances on which slaves failed to start.                                        
          GV$ query can only proceed if slaves can be allocated on all                                             
          instances.  
          
ORA-12872, "First slave parse gave different plan"                                                               
  *Cause: First hard parse on slave given QC-supplied environment and                                              
          parameters gave different plan from QC. Try again with outline.                                          
  *Action: No external action. Internally used for outline-based reparse.                                          

ORA-12842, "Cursor invalidated during parallel execution"
  *Cause:  The cursor was invalidated during the parse phase of deferred
           parallel processing, e.g. when set operands are parallelized.
  *Action: Depends on why the cursor was invalidated. Possible causes include
           DDL on a schema object and shared pool being flushed.

ORA-10388, "parallel query server interrupt (failure)"
  *Cause: internal use only
  *Action: this event should never be set externally
  *Comment: commands a slave to leave the server group (query failed)

  The ORA-10388 is an internal oracle message to terminate a parallel query when the operation is failed.
  This message should not be visible in alert.log

ORA-10387, "parallel query server interrupt (normal)"
  *Cause: internal use only
  *Action: this event should never be set externally
  *Comment: commands a slave to leave the server group normally

ORA-10382,  "parallel query server interrupt (reset)"            
  *Cause: internal use only                                       
  *Action: this event should never be set externally              
  *Comment: commands a slave to clear its buffers and reset itself


Appendix-2. Xplan



SQL_ID  3z4qz1d4taabm, child number 0
-------------------------------------
WITH SQ AS (SELECT /*+ materialize  */ X FROM TEST_TAB) SELECT /*+ 
parallel(2) */ T.X FROM TEST_TAB T, SQ WHERE ROWNUM <= :B1
 
Plan hash value: 1215440870
 
---------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name                        | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
---------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |                             |       |       |   170 (100)|          |        |      |            |
|   1 |  TEMP TABLE TRANSFORMATION             |                             |       |       |            |          |        |      |            |
|   2 |   PX COORDINATOR                       |                             |       |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM)                 | :TQ10000                    |   327 |  4251 |     2   (0)| 00:00:01 |  Q1,00 | P->S | QC (RAND)  |
|   4 |     LOAD AS SELECT (TEMP SEGMENT MERGE)| SYS_TEMP_0FD9D664E_DBF2D292 |       |       |            |          |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR                 |                             |   327 |  4251 |     2   (0)| 00:00:01 |  Q1,00 | PCWC |            |
|*  6 |       TABLE ACCESS FULL                | TEST_TAB                    |   327 |  4251 |     2   (0)| 00:00:01 |  Q1,00 | PCWP |            |
|*  7 |   COUNT STOPKEY                        |                             |       |       |            |          |        |      |            |
|   8 |    PX COORDINATOR                      |                             |       |       |            |          |        |      |            |
|   9 |     PX SEND QC (RANDOM)                | :TQ20001                    |   106K|  1357K|   168   (0)| 00:00:01 |  Q2,01 | P->S | QC (RAND)  |
|* 10 |      COUNT STOPKEY                     |                             |       |       |            |          |  Q2,01 | PCWC |            |
|  11 |       MERGE JOIN CARTESIAN             |                             |   106K|  1357K|   168   (0)| 00:00:01 |  Q2,01 | PCWP |            |
|  12 |        PX BLOCK ITERATOR               |                             |   327 |  4251 |     2   (0)| 00:00:01 |  Q2,01 | PCWC |            |
|* 13 |         TABLE ACCESS FULL              | TEST_TAB                    |   327 |  4251 |     2   (0)| 00:00:01 |  Q2,01 | PCWP |            |
|  14 |        BUFFER SORT                     |                             |   327 |       |   166   (0)| 00:00:01 |  Q2,01 | PCWP |            |
|  15 |         PX RECEIVE                     |                             |   327 |       |            |          |  Q2,01 | PCWP |            |
|  16 |          PX SEND BROADCAST             | :TQ20000                    |   327 |       |            |          |  Q2,00 | P->P | BROADCAST  |
|  17 |           VIEW                         |                             |   327 |       |            |          |  Q2,00 | PCWP |            |
|  18 |            PX BLOCK ITERATOR           |                             |   327 |  4251 |     2   (0)| 00:00:01 |  Q2,00 | PCWC |            |
|* 19 |             TABLE ACCESS FULL          | SYS_TEMP_0FD9D664E_DBF2D292 |   327 |  4251 |     2   (0)| 00:00:01 |  Q2,00 | PCWP |            |
---------------------------------------------------------------------------------------------------------------------------------------------------
 
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
 
   1 - SEL$2       
   2 - SEL$1       
   6 - SEL$1        / TEST_TAB@SEL$1
  13 - SEL$2        / T@SEL$2
  17 - SEL$D67CB2D2 / SQ@SEL$2
  19 - SEL$D67CB2D2 / T1@SEL$D67CB2D2
 
Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('19.1.0')
      DB_VERSION('19.1.0')
      OPT_PARAM('optimizer_dynamic_sampling' 1)
      OPT_PARAM('_optimizer_use_feedback' 'false')
      OPT_PARAM('_optimizer_gather_stats_on_load' 'false')
      OPT_PARAM('_px_adaptive_dist_method' 'off')
      OPT_PARAM('_optimizer_strans_adaptive_pruning' 'false')
      OPT_PARAM('_optimizer_nlj_hj_adaptive_join' 'false')
      OPT_PARAM('optimizer_index_cost_adj' 50)
      OPT_PARAM('optimizer_index_caching' 95)
      OPT_PARAM('_fix_control' '23473108:0 20424684:0 22582700:0')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      OUTLINE_LEAF(@"SEL$D67CB2D2")
      MATERIALIZE(@"SEL$1")
      OUTLINE_LEAF(@"SEL$2")
      FULL(@"SEL$2" "T"@"SEL$2")
      NO_ACCESS(@"SEL$2" "SQ"@"SEL$2")
      LEADING(@"SEL$2" "T"@"SEL$2" "SQ"@"SEL$2")
      USE_MERGE_CARTESIAN(@"SEL$2" "SQ"@"SEL$2")
      PQ_DISTRIBUTE(@"SEL$2" "SQ"@"SEL$2" NONE BROADCAST)
      FULL(@"SEL$D67CB2D2" "T1"@"SEL$D67CB2D2")
      FULL(@"SEL$1" "TEST_TAB"@"SEL$1")
      END_OUTLINE_DATA
  */
Peeked Binds (identified by position):
--------------------------------------
   1 - :B1 (NUMBER): 3
 
Predicate Information (identified by operation id):
---------------------------------------------------
   6 - access(:Z>=:Z AND :Z<=:Z)
   7 - filter(ROWNUM<=:B1)
  10 - filter(ROWNUM<=:B1)
  13 - access(:Z>=:Z AND :Z<=:Z)
  19 - access(:Z>=:Z AND :Z<=:Z)
 
Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 2
---------------------------------------------------------------------------
   0 -  STATEMENT
           -  parallel(2)
 
   2 -  SEL$1
           -  materialize
Note
-----
   - Degree of Parallelism is 2 because of hint