Monday, October 3, 2022

Oracle WITH subquery_factoring_clause TEMP TABLE TRANSFORMATION and log file sync

If WITH subquery_factoring_clause in a query is transformed as a temporary table, the session can experience log file sync when Log Write (LGWR) has problem.

As further check, we also make tests without supsending LGWR, and show real system observations.

Note: Tested in Oracle 19c


1. Test


Open 3 Sqlplus sessions:
   Oradebug session
   Test session
   Monitor session


1.1 Oradebug session


In order to understand Oracle internals, run following script to suspend LGWR for 30 seconds:

------------------------ Oradebug session ------------------------
col pid new_value orapid
select pid from v$process where program like '%LGWR%';  

alter system switch logfile;

exec dbms_lock.sleep(1); 

oradebug setorapid &orapid
oradebug suspend
exec dbms_lock.sleep(30);
oradebug resume 


1.2 Test session


The test sql contains one WITH subquery_factoring_clause. Before and After the sql execution, we collect session redo/undo stats, and we also dump CURRENT redo logfile.

------------------------ Test session ------------------------
select 'BEFORE---', s.sid, n.name, s.value from v$statname n, v$mystat s 
 where n.statistic# = s.statistic#
   and name in ('redo entries', 'redo size', 'undo change vector size', 'redo synch time (usec)');

with sq as (select class, rownum r from v$waitstat)
  select session_id, 
         (select min(r) from sq) mir, 
         (select min(r) from sq) mar
    from dba_hist_active_sess_history
    --from gv$active_session_history
    where rownum <= 3;   
    
select 'AFTER---', s.sid, n.name, s.value from v$statname n, v$mystat s 
 where n.statistic# = s.statistic#  
   and name in ('redo entries', 'redo size', 'undo change vector size', 'redo synch time (usec)');
   

-- dump redo log
col member new_value redolog 
col trcfile new_value trcfile 
select member, 'redodump_G'||l.group#||'_'||to_char(sysdate,'hh24_mi_ss')  trcfile 
  from v$log l, v$logfile f where l.status = 'CURRENT' and f.group# = l.group#;

alter session set tracefile_identifier='&trcfile';
alter system dump logfile '&redolog'; 


1.3 Monitor session


During the test, run a query to check the existence of transaction and get its xid:

------------------------ Monitor session ------------------------
-- gv$transaction is defined on x$ktcxb with filter: bitand (ksspaflg, 1) != 0 and bitand (ktcxbflg, 2) != 0

set lines 300
col state for a10
col event for a30

select sid, state, event, last_call_et, kxidusn, kxidslt, kxidsqn -- ,x.*, s.*
  from x$ktcxb x, v$session s
 where ktcxbses = saddr and kxidusn != 0 and event = 'log file sync';  


2. Test Output



2.1 Test session Output



    BEFORE--        SID NAME                              VALUE
    --------- ---------- ---------------------------- ----------
    BEFORE---        383 redo entries                          5
    BEFORE---        383 redo size                          1140
    BEFORE---        383 redo synch time (usec)         45940025
    BEFORE---        383 undo change vector size             272
                                                       
                                                       
    AFTER--        SID NAME                              VALUE
    -------- ---------- ------------------------------ --------
    AFTER---        383 redo entries                          7
    AFTER---        383 redo size                          1516
    AFTER---        383 redo synch time (usec)         66985927
    AFTER---        383 undo change vector size             352
Comparing "BEFORE" and "AFTER" session stats, we can see:
   "redo entries"            Increase: 7-5       = 2
   "redo size"               Increase: 1516-1140 = 376
   "undo change vector size" Increase: 352-272   = 80                                                      
Later in redo logfile dump, we can see the two "redo entries" (REDO RECORD) and "undo change vector size".


2.2 Monitor session Output



    SID STATE      EVENT            LAST_CALL_ET    KXIDUSN    KXIDSLT    KXIDSQN
    --- ---------- ---------------- ------------ ---------- ---------- ----------
    383 WAITING    log file sync              15        106         19      48951
We can see that the session has one transaction with:

  xid=KXIDUSN.KXIDSLT.KXIDSQN=106.19.48951=0x6a.13.bf37
which can also be found in redo logfile dump.

If during the hanging, we printout short_stack on Test session, it looks like:

SQL> oradebug short_stack

  ksedsts()+426<-ksdxfstk()+58<-ksdxcb()+872<-sspuser()+200<-__sighandler()
  <-semtimedop()+10<-skgpwwait()+187<-ksliwat()+2293<-kslwaitctx()+200
  <-kcrf_commit_force_int()+1866<-ksupop()+3275<-opiodr()+1525<-ttcpip()+1246
  <-opitsk()+1900<-opiino()+936<-opiodr()+1202<-opidrv()+1094<-sou2o()+165
  <-opimai_real()+422<-ssthrdmain()+417<-main()+256<-__libc_start_main()+245
It seems that the transaction for CURSOR DURATION MEMORY temporary table is proceeded with Oracle internal subroutines:

  ktiTxnPoolAllocate ()
  kcrf_commit_force_int ()
  ktuIMTabCacheCommittedTxn ()
CURSOR DURATION MEMORY temporary table is created by "qks3tCrtTmpTabl" during parsing with "kksParseCursor" when first time executed in the system (the created temporary table is a DB-wide shared cursor, not session private).


2.3 redo logfile dump


The complete dump is:

REDO RECORD - Thread:1 RBA: 0x00008e.00000002.0010 LEN: 0x0110 VLD: 0x05 CON_UID: 0
SCN: 0x00000b7305de450c SUBSCN:  1 10/03/2022 08:21:38
(LWN RBA: 0x00008e.00000002.0010 LEN: 0x00000001 NST: 0x0001 SCN: 0x00000b7305de450c)
CHANGE #1 CON_ID:0 TYP:0 CLS:227 AFN:3 DBA:0x00c003d0 OBJ:4294967295 SCN:0x00000b7305de44f7 SEQ:1 OP:5.2 ENC:0 RBL:0 FLG:0x0000
ktudh redo: slt: 0x0013 sqn: 0x0000bf37 flg: 0x0411 siz: 80 fbi: 0
            uba: 0x00c001cb.1f32.07    pxid:  0x0000.000.00000000
CHANGE #2 CON_ID:0 TYP:0 CLS:228 AFN:3 DBA:0x00c001cb OBJ:4294967295 SCN:0x00000b7305de44f6 SEQ:1 OP:5.1 ENC:0 RBL:0 FLG:0x0000
ktudb redo: siz: 80 spc: 7454 flg: 0x0010 seq: 0x1f32 rec: 0x07
            xid:  0x006a.013.0000bf37  
ktubl redo: slt: 19 wrp: 1 flg: 0x0c08 prev dba:  0x00000000 rci: 0 opc: 5.7 [objn: 0 objd: 0 tsn: 0]
[Undo type  ] Regular undo  [User undo done   ]  No  [Last buffer split]  No 
[Temp object]           No  [Tablespace Undo  ]  No  [User only        ]  No 
Begin trans    
 prev ctl uba: 0x00c001cb.1f32.06 prev ctl max cmt scn:  0x00000b7305de40a5 
 prev tx cmt scn:  0x00000b7305de40a6 
 txn start scn:  0xffffffffffffffff  logon user: 49
 prev brb:  0x00c001bf  prev bcl:  0x00000000
BuExt idx: 0 flg2: 0
 
REDO RECORD - Thread:1 RBA: 0x00008e.00000002.0120 LEN: 0x0068 VLD: 0x01 CON_UID: 0
SCN: 0x00000b7305de450d SUBSCN:  1 10/03/2022 08:21:38
CHANGE #1 CON_ID:0 TYP:0 CLS:227 AFN:3 DBA:0x00c003d0 OBJ:4294967295 SCN:0x00000b7305de450c SEQ:1 OP:5.4 ENC:0 RBL:0 FLG:0x0000
ktucm redo: slt: 0x0013 sqn: 0x0000bf37 srt: 0 sta: 9 flg: 0x2 ktucf redo: uba: 0x00c001cb.1f32.07 ext: 2 spc: 7372 fbi: 0 
END OF REDO DUMP
There are two REDO RECORDs (text after ### are added comment), which are the two "redo entries" in session stats:

  First REDO RECORD with: 
     CHANGE #1 OP:5.2    ###  Update rollback segment header - KTURDH    ('undo header' CLS:227)
     CHANGE #2 OP:5.1    ###  Undo block or undo segment header - KTURDB ('undo block'  CLS:228, siz: 80)

  Second REDO RECORD with: 
     CHANGE #1 OP:5.4    ###  Commit transaction (transaction table update) - KTURCM (CLS:227)
For OP code, See Layer and corresponding Operation codes description.

  Layer 5 : Transaction Undo - KCOCOTUN [ktucts.h]
    Opcode 1 : Undo block or undo segment header - KTURDB
    Opcode 2 : Update rollback segment header - KTURDH
    Opcode 3 : Rollout a transaction begin - KTURBG
    Opcode 4 : Commit transaction (transaction table update) - KTURCM - no undo record


2.4 Sql Xplan


Xplan shows that WITH subquery_factoring_clause is treated as a temporary table load:

SQL_ID  gvrzr3dmans6k, child number 0
-------------------------------------
with sq as (select class, rownum r from v$waitstat)   select 
session_id,          (select min(r) from sq) mir,          (select 
min(r) from sq) mar     from dba_hist_active_sess_history     --from 
gv$active_session_history     where rownum <= 3
 
Plan hash value: 1419524593
 
----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name                        | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                             |       |       |   453 (100)|          |       |       |
|   1 |  SORT AGGREGATE                          |                             |     1 |    13 |            |          |       |       |
|   2 |   VIEW                                   |                             |     1 |    13 |     2   (0)| 00:00:01 |       |       |
|   3 |    TABLE ACCESS FULL                     | SYS_TEMP_0FD9D6B1B_42ABBB2  |     1 |     6 |     2   (0)| 00:00:01 |       |       |
|   4 |  SORT AGGREGATE                          |                             |     1 |    13 |            |          |       |       |
|   5 |   VIEW                                   |                             |     1 |    13 |     2   (0)| 00:00:01 |       |       |
|   6 |    TABLE ACCESS FULL                     | SYS_TEMP_0FD9D6B1B_42ABBB2  |     1 |     6 |     2   (0)| 00:00:01 |       |       |
|   7 |  TEMP TABLE TRANSFORMATION               |                             |       |       |            |          |       |       |
|   8 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6B1B_42ABBB2  |       |       |            |          |       |       |
|   9 |    COUNT                                 |                             |       |       |            |          |       |       |
|* 10 |     FIXED TABLE FULL                     | X$KCBWAIT                   |     1 |     6 |     0   (0)|          |       |       |
|* 11 |   COUNT STOPKEY                          |                             |       |       |            |          |       |       |
|* 12 |    HASH JOIN RIGHT OUTER                 |                             | 29795 |   931K|   449   (0)| 00:00:01 |       |       |
|  13 |     INDEX FAST FULL SCAN                 | WRH$_EVENT_NAME_PK          |  2054 | 28756 |     5   (0)| 00:00:01 |       |       |
|  14 |     PARTITION RANGE ALL                  |                             | 29795 |   523K|   444   (0)| 00:00:01 |     1 |     8 |
|  15 |      TABLE ACCESS FULL                   | WRH$_ACTIVE_SESSION_HISTORY | 29795 |   523K|   444   (0)| 00:00:01 |     1 |     8 |
----------------------------------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
  10 - filter((MOD("INDX",19)<>0 AND "INST_ID"=USERENV('INSTANCE')))
  11 - filter(ROWNUM<=3)
  12 - access("ASH"."DBID"="EVT"."DBID" AND "ASH"."EVENT_ID"="EVT"."EVENT_ID")
Note that the name of temporary table (SYS_TEMP_0FD9D6B1B_42ABBB2) is changed after each new parsing of the same statement, but SQL_ID (gvrzr3dmans6k), child number (0) and Plan hash value (1419524593) are always the same. (the first number: 0FD9D6B1B increased by 1 after each new parsing, the second number: 42ABBB2 stays the same in each database)


3. INLINE hint


With an INLINE hint, the subquery is treated as an inline view instead of a temporary table. Then there is no more log file sync is observed.

with sq as (select /*+ INLINE */ class, rownum r from v$waitstat)
  select session_id, 
         (select min(r) from sq) mir, 
         (select min(r) from sq) mar
    from dba_hist_active_sess_history
    --from gv$active_session_history
    where rownum <= 3; 
Update (04-Dec-2022)
Blog: ORA-12850: Could not allocate slaves on all specified instances error and a workaround for my scripts talked about GV$ views for reading cluster-wide ASH data and PX slaves used for fetching remote data due to Xplan temp table transformations of materialized WITH subqueries, and provided workaround of "INLINE" hint in some TPT scripts.


4. TEMP TABLE TRANSFORMATION and Performance


As a further test without suspending LGWR, we can run following code for both cases of With/Without INLINE hint and compare their outcome:

create or replace procedure testp (p_sleep_seconds number, p_job_id number, p_hint varchar2 := null) as
  l_start_time    number := dbms_utility.get_time;
  l_cnt           number := 0;
  type num_tab    is table of number;
  l_num_tab_1     num_tab;
  l_num_tab_2     num_tab;
  l_num_tab_3     num_tab;
begin
  loop
    execute immediate q'[
      with sq as (select /*+ ]' ||p_hint||q'[ */ class, rownum r from v$waitstat)
      select session_id, 
            (select min(r) from sq) mir, 
            (select min(r) from sq) mar
       from dba_hist_active_sess_history
      where rownum <= ]' || p_job_id
      bulk collect into l_num_tab_1, l_num_tab_2, l_num_tab_3;

    l_cnt := l_cnt + 1;
    exit when mod(l_cnt, 1000) = 0 and (dbms_utility.get_time - l_start_time) >= 100*p_sleep_seconds;
  end loop;
end;
/

-- exec testp(3, 2);
-- exec testp(3, 2, 'INLINE');

create or replace procedure start_jobs (p_count number, p_sleep_seconds number, p_hint varchar2 := null) as
begin
  for i in 1..p_count loop
    dbms_scheduler.create_job (
      job_name        => 'TEST_JOB_'||i,
      job_type        => 'PLSQL_BLOCK',
      job_action      => 'begin testp('||p_sleep_seconds||','||i||','''||p_hint||'''); end;',    
      start_date      => systimestamp,
      auto_drop       => true,
      enabled         => true);
  end loop;
end;
/


-- Start Test for 10 minutes

exec start_jobs(16, 600);

--exec start_jobs(16, 600, 'INLINE');
During the test, we can check system behaviour with:

select program, event, p1, p2, p3, p1text, p2text, p3text, v.* 
  from v$session v 
 where program like '%(J0%'
 order by v.event; 

select program, event, p1, p2, p3, p1text, p2text, p3text, v.* 
  from v$active_session_history v
 order by sample_time desc, v.event; 
 
select * from v$sysstat 
 where name in ('redo entries', 'redo size', 'undo change vector size', 'redo synch time');
In real OLTP systems, which have many short running queries composed of WITH subquery_factoring_clause transformed as a temporary table, "SQL memory manager workarea list latch" ("Where": qesmmIRegisterWorkArea, qesmmIUnRegisterWorkArea) is observed in top Latch Miss Sources.

"log file sync" is listed in Top 10 Foreground Events with following redo stats:
   log file sync Avg Wait:  1.12 ms
   Redo size per second:      90 K
   Redo entries per second:  240
   Redo size per Redo entry: 380 Bytes
   log switches per Hour:    1