Then we reveal both errors in Oracle trace files. It looks like a limitation of materialized WITH subquery in parallel statements.
Note: Reproduced in Oracle 19.9, 19.10, 19.11, 19.13, but not 19.17. All are non RAC DB.
1. Test Setup
create table test_tab as select level x from dual connect by level <= 10;
create or replace procedure test_proc (p_cnt number, p_limit pls_integer, p_sleeps number) as
type t_num_tab is table of number;
l_num_tab t_num_tab := new t_num_tab();
begin
for i in 1..p_cnt loop
dbms_session.sleep(p_sleeps);
with sq as (select /*+ materialize */ x from test_tab)
select /*+ parallel(2) */ t.x
bulk collect into l_num_tab
from test_tab t, sq
where rownum <= p_limit;
end loop;
dbms_output.put_line('Selected Rows = '||l_num_tab.count);
end;
/
2. Test Run
(Step-1) Open a new Sqlplus session
(Step-2) Flush Shared Pool
alter system flush shared_pool;
(Step-3) Run test procedure
exec test_proc(1000, 3, 0.1);
(Step-4) Check DB alert.log if there is message:
WARNING: too many parse errors
(Step-5) If no such message, repeat Step-1 to Step-5.
That means opening next new Sqlplus session to do the same test.
Most of tests showed that one or two Sqlplus sessions are sufficient to reproduce the problem.
Here the message in DB alert.log. PARSE ERROR are signalled with error=12850 or error=12872.Note that the message is only written for each 100 such errors (count=13300 jump to count=13400).
2022-12-01T20:01:08.221827+01:00
WARNING: too many parse errors, count=13300 SQL hash=0x49952973
PARSE ERROR: ospid=31115, error=12850 for statement:
2022-12-01T20:01:08.221953+01:00
WITH SQ AS (SELECT /*+ materialize */ X FROM TEST_TAB) SELECT /*+ parallel(2) */ T.X FROM TEST_TAB T, SQ WHERE ROWNUM <= :B1
Additional information: hd=0x97efe2e0 phd=0x9437c118 flg=0x110676 cisid=86 sid=86 ciuid=86 uid=86 sqlid=3z4qz1d4taabm
...Current username=K
...Application: Action:
2022-12-01T20:01:09.170414+01:00
WARNING: too many parse errors, count=13400 SQL hash=0x49952973
PARSE ERROR: ospid=31117, error=12872 for statement:
2022-12-01T20:01:09.170531+01:00
WITH SQ AS (SELECT /*+ materialize */ X FROM TEST_TAB) SELECT /*+ parallel(2) */ T.X FROM TEST_TAB T, SQ WHERE ROWNUM <= :B1
Additional information: hd=0x97efe2e0 phd=0x9437c118 flg=0x110676 cisid=86 sid=86 ciuid=86 uid=86 sqlid=3z4qz1d4taabm
...Current username=K
...Application: SQL*Plus Action:
Appendix-1 lists a few related Oracle Error Documents.Appendix-2 shows parallel xplan.
3. ORA-10388 Trace
Oracle (Doc ID 2180844.1) has a description of ORA-12850 (PX QC and worker signature-plan mismatch) and documents Troubleshooting Steps.
How to get plan signature tracing for ORA-12850 (Doc ID 2180844.1)
Purpose
ORA-12850 errors are usually caused by the worker getting a plan signature different from that of the QC.
These are typically, although not always, QRY OPTIMIZER bugs. This note provides the recommended tracing needed
in order to see if there is a signature-plan mismatch and diagnose the cause of ORA-12850.†
Troubleshooting Steps
alter system set†"_parallel_blackbox_size" =†524288;†
-- prevents traces from being truncated
alter session set max_dump_file_size = unlimited;
-- customer will be prompted for a string that will appear in the tracefile name
alter session set tracefile_identifier = '&TRACEFILE_IDENTIFIER';
alter session set events 'trace[SQL_Compiler.*] disk highest';
alter session set events 'trace[Parallel_Execution.*] disk highest';
--run test code, for example, in this Blog:
exec test_proc(10, 3, 0.1);
-- turn tracing off
alter session set events 'trace[SQL_Compiler.*] off';
alter session set events 'trace[Parallel_Execution.*] off';
Look for the plan signatures in the QC vs. worker traces.
The worker that will have the mismatch should be the one with the 10388 error in it.
Notice that the plan signatures do NOT match, which causes the ORA-12850.
3. Look for the plan signatures in the QC vs. worker traces.
The worker that will have the mismatch should be the one with the 10388 error in it.
Notice that the plan signatures do NOT match, which causes the ORA-12850.
QC Plan Signature from Parse Msg: SQL Plan Signature
Slave Plan Signature: SQL Plan Signature
Following above Oracle instructions, we get Plan Signature in worker (pnnn) traces like:
Slave Plan Signature: SQL Plan Signature (0x93838ff0):
QC Plan Signature from Parse Msg: SQL Plan Signature (0x7f0fed684134):
In Oracle Document, ORA-10388 Comment said: commands a slave to leave the server group (query failed). So we will look for 10388 error in the trace files.
In worker (pnnn) trace files, internal oracle error=10388 (message to terminate a parallel query) is precised by "In signal-handler after catching 12850", and finalized by "Exiting parallel client parallel query execution(6) on error=10388".
====================== error 12850 ======================
2022-12-01 11:28:29.288*:PX_Control:kxfx.c@4849:kxfxsp1(begin):
Open & parse cursor(EXPRESS/-):
[123:WITH SQ AS (SELECT /*+ materialize */ X FROM TEST_TAB) SELECT /*+ parallel(2) */ T.X FROM TEST_TAB T, SQ WHERE ROWNUM <= :B1 ]
2022-12-01 11:28:29.288*:PX_Control:kxfx.c@4853:kxfxsp1(end):
2022-12-01 11:28:29.288*:PX_Control:kxfx.c@5885:kxfxsp(end):
2022-12-01 11:28:29.327*:PX_Control:kxfx.c@6114:kxfxsStatus():
joining FALSE ParseState KXFXUPARSE2 (3)
2022-12-01 11:28:29.331*:PX_Control:kxfx.c@10728:kxfxmai():
In signal-handler after catching 12850
2022-12-01 11:28:29.333*:PX_Control:kxfx.c@10728:kxfxmai():
In signal-handler after catching 10388
2022-12-01 11:28:29.333*:PX_Control:kxfx.c@5678:kxfxsuf():
kxfx slave finish
2022-12-01 11:28:29.333*:PX_Messaging:kxfp.c@14665:kxfprdp_int():
Exiting parallel client parallel query execution(6) on error=10388
In the same trace files, "Received error 12872" is noted with "First parse failed" during Semantic Analysis:
====================== error 12872 ======================
2022-12-01 11:29:30.411*:PX_Control:kxfx.c@6829:kxfxsExecute():
Received error 12872 from first slave parse.
First parse failed. On to outlined parse
2022-12-01 11:29:30.411*:PX_Control:kxfx.c@13917:kxfxsStmtDump():
kxfxsSmtExecute stmt:0x7f0fed69ff50
UGA: InUse:255KB Alloc:255KB
PGA: InUse:1665KB Alloc:1835KB
Enabling tracing for cur#=1 sqlid=3z4qz1d4taabm top-level
Parsing cur#=1 sqlid=3z4qz1d4taabm len=123
sql=WITH SQ AS (SELECT /*+ materialize */ X FROM TEST_TAB) SELECT /*+ parallel(2) */ T.X FROM TEST_TAB T, SQ WHERE ROWNUM <= :B1
End parsing of cur#=1 sqlid=3z4qz1d4taabm
Semantic Analysis cur#=1 sqlid=3z4qz1d4taabm
OPTIMIZER INFORMATION
4. Performance Impacts
When multiple sessions frequently run such statement, especially with high degree of parallelism (DOP), DB performance degradation is often caused by following heavy wait events:
cursor: pin S wait on X
library cache lock
kksfbc child completion
PX Deq: Parse Reply
PX Deq: Execution Msg
When looking the wait chains, often one PX particular worker is the blocker of QC or other workers.
We also noticed that V$SQL.child_number can go over 1000 ("_cursor_obsolete_threshold" default 1024).A quick workaround is to remove parallel hint to make serial execution.
Blog: 12c Parse listed a few activity stats to monitor:
opened cursors cumulative
enqueue requests
enqueue releases
sql area purged
sql area evicted
parse count (total)
parse count (hard)
parse count (failures)
Tests showed that the call (1000 executions of query):
exec test_proc(1000, 3, 0.1);
generated 6000 lines of "too many parse errors" in DB alert.log:(the count seems reading from V$SQLAREA.invalidations for each sql_id)
WARNING: too many parse errors, count=53700 SQL hash=0x49952973 ... sqlid=3z4qz1d4taabm
...
WARNING: too many parse errors, count=59700 SQL hash=0x49952973 ... sqlid=3z4qz1d4taabm
v$sysstat query:
select v.value, v.* from v$sysstat v
where name in ('parse count (failures)', 'sql area purged', 'sql area evicted',
'parse count (hard)', 'parse count (total)',
'opened cursors cumulative', 'enqueue releases', 'enqueue requests')
order by v.value;
shwed that:
the value of 'parse count (failures)', 'sql area purged', 'sql area evicted' increased 6000
the value of 'parse count (hard)', 'parse count (total)' increased a little more than 6000 (if no other DB activities)
So each test query execution created 6 "too many parse errors" related to 6 'parse count (failures)'.
5. ORA-00600 [xplUnCompact:magic]
In very rare case, QC session throws ORA-00600 as follows. It looks like that the error occurs when optimizer automatic sql reoptimisation (kkoarSalvageStats, COMPONENT: SQL_Costing) checks cardinality feedback (kkocfbCheckCardEstRws, COMPONENT: SQL_Costing) in xplan (xplUnCompact, COMPONENT: explain).
ORA-00600: internal error code, arguments: [xplUnCompact:magic], [0], [0x7FFE5C370D98]
----- Call Stack Trace -----
[12] (kgeasnmierr()+146 -> kgerinv())
[13] (xplUnCompact()+1320 -> kgeasnmierr())
[14] (kkocfbCheckCardEstRws()+815 -> xplUnCompact())
[15] (kkocfbCheckCardEstRws()+259 -> kkocfbCheckCardEstRws())
[16] (kkocfbCheckCardEstRws()+259 -> kkocfbCheckCardEstRws())
-- a couple of same calling lines
[35] (kkocfbCheckCardEstRws()+259 -> kkocfbCheckCardEstRws())
[36] (kkocfbCheckCardEst()+571 -> kkocfbCheckCardEstRws())
[37] (kkoarSalvageStats()+910 -> kkocfbCheckCardEst())
[38] (kxsFreeWorkArea()+149 -> kkoarSalvageStats())
[39] (kxsFreeExecutionHeap()+60 -> kxsFreeWorkArea())
[40] (kksumc()+631 -> kxsFreeExecutionHeap())
[41] (opiexe()+10228 -> kksumc())
Session Wait History:
elapsed time of 0.993485 sec since last wait
0: waited for 'enq: PS - contention'
6. Related Oracle BUGs and Related Work
There are a few related BUGs in Oracle MOS, we tested them (patch installed or hidden parameter set. Test DB are not RAC).
All have no effect, and DB still throws the same "too many parse errors":
Bug 31602782 ††Contention on "CURSOR: PIN S WAIT ON X" when PQ slave's execution plan does not match with QC
Versions confirmed as being affected: 12.2.0.1 (Base Release) - 19.13.0
Description
When a parallel query is executed, QC sends parse message to all workers in all
instances. In a instance one worker tries to parse the cursor and other workers
waits (cursor pin s wait for x) for that cursor to parsed and built.
If this worker could not build the cursor with the same plan as QC and could not
join, another worker tries the same and so on until all workers get exhausted
and query runs in serial or without workers from that instance.
This can cause a huge contention on CURSOR: PIN S WAIT ON X .
After this bug fix, if a worker on one instance fails to produce same plan as QC,
other workers on the same instance do not retry the parse and workers from
that instance are not allocted.
Bug 32121673††A Process May Crash With Ora-600 [xpluncompact:magic]
Warning: Too Many Parse Errors With ORA-12850/ORA-12872 (Doc ID 2816169.1)
Cause
It appears that this is due to Bug 31877812 : WHEN HIGH CONCURRENCY TOO MUCH CPU UTILIZATION AND PARALLEL
QUERY IS FAILING WITH ORA-12872, this bug is currently waiting on two other bugs to be resolved first.
Solution
†Bug is not resolved as of yet but what has been able to get by the issue is setting the following parameter:
alter system set "_nlj_batching_enabled"=0;
RAC: ORA-12801 and ORA-12850: Could Not Allocate Slaves On All Specified Instances: 2 Needed, 1 (Doc ID 1503358.1)
Blog:
ORA-12850: Could not allocate slaves on all specified instances error and a workaround for my scripts
talked about GV$ views for reading cluster-wide ASH data and PX slaves used for fetching remote data
due to Xplan temp table transformations of materialized WITH subqueries, and provided workaround of "INLINE" hint in some TPT scripts.The ORA-12850 indicates the limitation of materialized WITH subquery in Oracle 19c RAC installations because of GV$ view access.
This Blog test shows the limitation of materialized WITH subquery in parallel statements (non RAC).
Blog: Oracle WITH subquery_factoring_clause TEMP TABLE TRANSFORMATION and log file sync talked about "INLINE" hint in materialized WITH subqueries and log file sync.
Appendix-1. Oracle Error Document
ORA-12850, "Could not allocate slaves on all specified instances: %s needed, %s allocate d"
*Cause: When executing a query on a gv$ fixed view, one or more
instances failed to allocate a slave to process query.
*Action:Check trace output for instances on which slaves failed to start.
GV$ query can only proceed if slaves can be allocated on all
instances.
ORA-12872, "First slave parse gave different plan"
*Cause: First hard parse on slave given QC-supplied environment and
parameters gave different plan from QC. Try again with outline.
*Action: No external action. Internally used for outline-based reparse.
ORA-12842, "Cursor invalidated during parallel execution"
*Cause: The cursor was invalidated during the parse phase of deferred
parallel processing, e.g. when set operands are parallelized.
*Action: Depends on why the cursor was invalidated. Possible causes include
DDL on a schema object and shared pool being flushed.
ORA-10388, "parallel query server interrupt (failure)"
*Cause: internal use only
*Action: this event should never be set externally
*Comment: commands a slave to leave the server group (query failed)
The ORA-10388 is an internal oracle message to terminate a parallel query when the operation is failed.
This message should not be visible in alert.log
ORA-10387, "parallel query server interrupt (normal)"
*Cause: internal use only
*Action: this event should never be set externally
*Comment: commands a slave to leave the server group normally
ORA-10382, "parallel query server interrupt (reset)"
*Cause: internal use only
*Action: this event should never be set externally
*Comment: commands a slave to clear its buffers and reset itself
Appendix-2. Xplan
SQL_ID 3z4qz1d4taabm, child number 0
-------------------------------------
WITH SQ AS (SELECT /*+ materialize */ X FROM TEST_TAB) SELECT /*+
parallel(2) */ T.X FROM TEST_TAB T, SQ WHERE ROWNUM <= :B1
Plan hash value: 1215440870
---------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |
---------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 170 (100)| | | | |
| 1 | TEMP TABLE TRANSFORMATION | | | | | | | | |
| 2 | PX COORDINATOR | | | | | | | | |
| 3 | PX SEND QC (RANDOM) | :TQ10000 | 327 | 4251 | 2 (0)| 00:00:01 | Q1,00 | P->S | QC (RAND) |
| 4 | LOAD AS SELECT (TEMP SEGMENT MERGE)| SYS_TEMP_0FD9D664E_DBF2D292 | | | | | Q1,00 | PCWP | |
| 5 | PX BLOCK ITERATOR | | 327 | 4251 | 2 (0)| 00:00:01 | Q1,00 | PCWC | |
|* 6 | TABLE ACCESS FULL | TEST_TAB | 327 | 4251 | 2 (0)| 00:00:01 | Q1,00 | PCWP | |
|* 7 | COUNT STOPKEY | | | | | | | | |
| 8 | PX COORDINATOR | | | | | | | | |
| 9 | PX SEND QC (RANDOM) | :TQ20001 | 106K| 1357K| 168 (0)| 00:00:01 | Q2,01 | P->S | QC (RAND) |
|* 10 | COUNT STOPKEY | | | | | | Q2,01 | PCWC | |
| 11 | MERGE JOIN CARTESIAN | | 106K| 1357K| 168 (0)| 00:00:01 | Q2,01 | PCWP | |
| 12 | PX BLOCK ITERATOR | | 327 | 4251 | 2 (0)| 00:00:01 | Q2,01 | PCWC | |
|* 13 | TABLE ACCESS FULL | TEST_TAB | 327 | 4251 | 2 (0)| 00:00:01 | Q2,01 | PCWP | |
| 14 | BUFFER SORT | | 327 | | 166 (0)| 00:00:01 | Q2,01 | PCWP | |
| 15 | PX RECEIVE | | 327 | | | | Q2,01 | PCWP | |
| 16 | PX SEND BROADCAST | :TQ20000 | 327 | | | | Q2,00 | P->P | BROADCAST |
| 17 | VIEW | | 327 | | | | Q2,00 | PCWP | |
| 18 | PX BLOCK ITERATOR | | 327 | 4251 | 2 (0)| 00:00:01 | Q2,00 | PCWC | |
|* 19 | TABLE ACCESS FULL | SYS_TEMP_0FD9D664E_DBF2D292 | 327 | 4251 | 2 (0)| 00:00:01 | Q2,00 | PCWP | |
---------------------------------------------------------------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$2
2 - SEL$1
6 - SEL$1 / TEST_TAB@SEL$1
13 - SEL$2 / T@SEL$2
17 - SEL$D67CB2D2 / SQ@SEL$2
19 - SEL$D67CB2D2 / T1@SEL$D67CB2D2
Outline Data
-------------
/*+
BEGIN_OUTLINE_DATA
IGNORE_OPTIM_EMBEDDED_HINTS
OPTIMIZER_FEATURES_ENABLE('19.1.0')
DB_VERSION('19.1.0')
OPT_PARAM('optimizer_dynamic_sampling' 1)
OPT_PARAM('_optimizer_use_feedback' 'false')
OPT_PARAM('_optimizer_gather_stats_on_load' 'false')
OPT_PARAM('_px_adaptive_dist_method' 'off')
OPT_PARAM('_optimizer_strans_adaptive_pruning' 'false')
OPT_PARAM('_optimizer_nlj_hj_adaptive_join' 'false')
OPT_PARAM('optimizer_index_cost_adj' 50)
OPT_PARAM('optimizer_index_caching' 95)
OPT_PARAM('_fix_control' '23473108:0 20424684:0 22582700:0')
ALL_ROWS
OUTLINE_LEAF(@"SEL$1")
OUTLINE_LEAF(@"SEL$D67CB2D2")
MATERIALIZE(@"SEL$1")
OUTLINE_LEAF(@"SEL$2")
FULL(@"SEL$2" "T"@"SEL$2")
NO_ACCESS(@"SEL$2" "SQ"@"SEL$2")
LEADING(@"SEL$2" "T"@"SEL$2" "SQ"@"SEL$2")
USE_MERGE_CARTESIAN(@"SEL$2" "SQ"@"SEL$2")
PQ_DISTRIBUTE(@"SEL$2" "SQ"@"SEL$2" NONE BROADCAST)
FULL(@"SEL$D67CB2D2" "T1"@"SEL$D67CB2D2")
FULL(@"SEL$1" "TEST_TAB"@"SEL$1")
END_OUTLINE_DATA
*/
Peeked Binds (identified by position):
--------------------------------------
1 - :B1 (NUMBER): 3
Predicate Information (identified by operation id):
---------------------------------------------------
6 - access(:Z>=:Z AND :Z<=:Z)
7 - filter(ROWNUM<=:B1)
10 - filter(ROWNUM<=:B1)
13 - access(:Z>=:Z AND :Z<=:Z)
19 - access(:Z>=:Z AND :Z<=:Z)
Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 2
---------------------------------------------------------------------------
0 - STATEMENT
- parallel(2)
2 - SEL$1
- materialize
Note
-----
- Degree of Parallelism is 2 because of hint