Monday, May 2, 2011

Update Restart and new Active Undo Extent

Blog:  That old restart problem again...
(http://tkyte.blogspot.com/2010/04/that-old-restart-problem-again.html) or Link: That old restart problem again...
presents a new case of update restart, which demonstrates that restart can also happen within a single session, not necessarily two sessions with one being blocked by a update from another session. Inherently single session restart is distinct from the normal two sessions' restart  in which all rows till first detected changed row are restarted, however in single session restart, only occasionally a few rows are restarted.

Referring to Oracle documentation, it is not evident to get a clue about such behavior. Sometimes I hear the warning: "Don't go searching for it in the documentation" (Blog:  NO_DATA_NEEDED - something I learned recently in http://tkyte.blogspot.com/2010/04/nodataneeded-something-i-learned.html).

So it is worth of pursuing some research on it. Following modified script shows certain coincidence of new active undo_extent and restart.

I slightly modified the code in Blog:  That old restart problem again...: by adding a helper function to count the active undo extents, and some lines to print out this count.

create or replace function get_undo_extent_cnt return number
as
  l_cnt number;
begin   
 select count(*) into l_cnt from dba_undo_extents where status = 'ACTIVE';
 return l_cnt;
end;
/

create or replace package pacepack
as
  type array is table of number index by varchar2(40);
  g_data                         array;
  g_cnt                           number;
  g_restart_cnt                number;
  g_undo_extent_cnt       number;
end pacepack;
/

drop table delete_data;

create table delete_data as
select owner, object_name, rownum id
from  dba_objects
where rownum  < 53001;
    
create or replace trigger delete_data_bt
  before delete or update on delete_data
  for each row
begin
  if ( pacepack.g_data.exists(rowidtochar(:old.rowid)))
  then
    pacepack.g_restart_cnt := pacepack.g_restart_cnt + 1;
   
    dbms_output.put_line( 'doing "' || :old.rowid ||
                           '" again was called ' || pacepack.g_cnt );
    dbms_output.put_line('-- Restart#: ' || pacepack.g_restart_cnt ||
                            ', Undo Extent Count: ' || get_undo_extent_cnt);
  else
    pacepack.g_data(rowidtochar(:old.rowid)) := 1;
  end if;
      
  pacepack.g_cnt        := pacepack.g_cnt + 1;
end;
/

Assume you are the only one on the database, then run the same script:

declare
  cursor l_delete_csr is
    select * from delete_data for update;
  l_cnt number := 0;
begin
  pacepack.g_data.delete;
  pacepack.g_cnt         := 0;
  pacepack.g_restart_cnt := 0;
         
  for l_delete_row in l_delete_csr loop
    update delete_data set owner = lower(owner) where current of l_delete_csr;
    l_cnt := l_cnt + 1;
   
    if l_cnt = 1 then
     for c in (select usn, extents, extends from v$rollstat where xacts > 0) loop
      dbms_output.put_line('---- Begin Rollstat: '||'USN='||c.usn||', EXTENTS='||c.extents||', EXTENDS='||c.extends);
     end loop;
    end if;
  end loop;
    
  dbms_output.put_line( 'trigger count = ' || pacepack.g_cnt || ' local count = '   || l_cnt );   
  for c in (select usn, extents, extends from v$rollstat where xacts > 0) loop
    dbms_output.put_line('---- End Rollstat: '||'USN='||c.usn||', EXTENTS='||c.extents||', EXTENDS='||c.extends);
  end loop;
 end;
 /

output:

---- Begin Rollstat: USN=20, EXTENTS=24, EXTENDS=21
doing "AABEYTAAAAAC9WMAB6" again was called 7144
-- Restart#: 1, Undo Extent Count: 23
doing "AABEYTAAAAAC9XMACo" again was called 18475
-- Restart#: 2, Undo Extent Count: 24
doing "AABEYTAAAAAC9YpACA" again was called 29740
-- Restart#: 3, Undo Extent Count: 25
doing "AABEYTAAAAAC9ZsAB1" again was called 41005
-- Restart#: 4, Undo Extent Count: 26
doing "AABEYTAAAAAC9axAAa" again was called 52298
-- Restart#: 5, Undo Extent Count: 27
trigger count = 53005 local count = 53000
---- End Rollstat: USN=20, EXTENTS=29, EXTENDS=26

The output shows 5 Restarts(53005-53000) , 5 new Extents(29-24) created by 5 extending(26-21).
So for each restart, a new active Undo Extent is created. If you run it repeatedly, you will see the same behavior.

As we know, Oracle fulfils a DML by following 5 steps:
  create UNDO Change Vector
  create REDO Change Vector
  combine both Change Vectors into Redo Buffer, and then write to redo log
  write UNDO record into UNDO file
  write modified Data into DATA file

When a transaction fills rollback segment and finds that the last remaining block is not sufficient to hold the UNDO record, it will trigger SMON to perform a recursive space transaction, which will allocate a new UNDO extent. Hereafter Oracle recommences the above 5 steps, and hence an "Update Restart".

As I tested in 10g and 11g (undo_management=AUTO), this output is confirmed,  but I would be careful to have any earlier conclusion before Oracle says.

Update (2022Aug25) --------------------------------------------------

Blog: That Old Restart Problem Strikes Back: Setting the Stage (1 August 2019)
sets DML diagnostic events in session:

  alter session set events 'trace[dml]:sql_trace wait=true';
and shows that there is one ORA-01551 error for each restart.

Setting this DML event, repeat our above test.
For above 5 Restarts, we can see 5 occurrences of following text in the trace file:

updThreePhaseExe: objn=4891285 phase=NOT LOCKED
updaul: phase is NOT LOCKED snap oldsnap env: (scn: 0x00000b7302b6b7ac  xid: 0x006e.011.00006316  uba: 0x00c0227c.1936.58 
updrow: objn=4891285 phase=NOT LOCKED
updrow: objn=4891285 error=1551
updrow: qecinvsub objn=4891285
updrow: setting ROW_RETRY objn=4891285
updrow: retry_this_row: ROW_RETRY set, objn= 4891285 phase=NOT LOCKED
The difference of those 5 occurrences are the second line about (scn, xid, uba),
where scn and uba are different for each restart, but for the same xid:

updaul: phase is NOT LOCKED snap oldsnap env: (scn: 0x00000b7302b6b7ac  xid: 0x006e.011.00006316  uba: 0x00c0227c.1936.58 
updaul: phase is NOT LOCKED snap oldsnap env: (scn: 0x00000b7302b6b7b0  xid: 0x006e.011.00006316  uba: 0x00c0517c.1937.5a 
updaul: phase is NOT LOCKED snap oldsnap env: (scn: 0x00000b7302b6b7b2  xid: 0x006e.011.00006316  uba: 0x00c051fc.1938.5a 
updaul: phase is NOT LOCKED snap oldsnap env: (scn: 0x00000b7302b6b7b3  xid: 0x006e.011.00006316  uba: 0x00c0527c.1939.59 
updaul: phase is NOT LOCKED snap oldsnap env: (scn: 0x00000b7302b6b7b7  xid: 0x006e.011.00006316  uba: 0x00c052fc.193a.5a 
If we made a 1551 errorstack trace:

  alter session set events='1551 trace name errorstack level 3';
the Error Stack shows:

----- Error Stack Dump -----
ORA-01551: extended rollback segment, pinned blocks released
----- Current SQL Statement for this session (sql_id=4u6b4b82z7xrp) -----
SELECT * FROM DELETE_DATA FOR UPDATE
----- PL/SQL Call Stack -----

[15] (kgeselv()+89                    
[16] (ksesecl0()+189                  
[17] (ktusmasp()+1088   --->ktusmasp - kernel transaction undo smu (system managed undo) add space              
[18] (ktuchg2()+9603                  
[19] (ktbchg2()+231                   
[20] (kddchg()+611                    
[21] (kddlok()+2350                   
[22] (kddlkr()+322                    
[23] (qerfuProcessFetchedRow()+669    
[24] (qerfuLockingRowProcedure()+68   
[25] (qersoRowP()+880                 
[26] (kdstf000110100000000km()+1014   
[27] (kdsttgr()+2154                  
[28] (qertbFetch()+1090               
[29] (qersoProcessULS()+281           
[30] (qersoFetchSimple()+2251         
[31] (qersoFetch()+210                
[32] (qerfuStart()+602                
[33] (selexe0()+3409                  
[34] (opiexe()+11896  
End of Update (2022Aug25)--------------------------------------------------



The same behavior is also observed for the single row update, in which the cursor is defined with an input parameter p_id to select a single row:

declare
   cursor l_delete_csr (p_id number) is
     select * from delete_data where id = p_id for update;
   l_cnt number := 0;
begin
  pacepack.g_data.delete;
  pacepack.g_cnt         := 0;
  pacepack.g_restart_cnt := 0;

  for i in 1..53000 loop       
    for l_delete_row in l_delete_csr(i) loop
      update delete_data set owner = lower(owner) where current of l_delete_csr;
      l_cnt := l_cnt + 1;
     
      if l_cnt = 1 then
       for c in (select usn, extents, extends from v$rollstat where xacts > 0) loop
        dbms_output.put_line('---- Begin Rollstat: '||'USN='||c.usn||', EXTENTS='||c.extents||', EXTENDS='||c.extends);
       end loop;
      end if;     
    end loop;
  end loop;
  
  dbms_output.put_line( 'trigger count = ' || pacepack.g_cnt ||' local count = '   || l_cnt );
  for c in (select usn, extents, extends from v$rollstat where xacts > 0) loop
    dbms_output.put_line('---- End Rollstat: '||'USN='||c.usn||', EXTENTS='||c.extents||', EXTENDS='||c.extends);
  end loop;      
end;
/



Output:

---- Begin Rollstat: USN=34, EXTENTS=29, EXTENDS=25
doing "AABEYTAAAAAC9WhABL" again was called 11315
-- Restart#: 1, Undo Extent Count: 29
doing "AABEYTAAAAAC9XnAAB" again was called 22612
-- Restart#: 2, Undo Extent Count: 30
doing "AABEYTAAAAAC9ZCAAe" again was called 33877
-- Restart#: 3, Undo Extent Count: 31
doing "AABEYTAAAAAC9aHAAr" again was called 45142
-- Restart#: 4, Undo Extent Count: 32
trigger count = 53004 local count = 53000
---- End Rollstat: USN=34, EXTENTS=33, EXTENDS=29


The Blog: Three new "hints" (http://rwijk.blogspot.com/2009/10/three-new-hints.html) shows a new  Oracle 11.2.0.1 hint: RETRY_ON_ROW_CHANGE, which I would call it block level restart, in comparing to the traditional row level (maybe this hint already exists in 10g since it is mentioned in 10g  "Bug 6082734  Dump (kdr9igtl) accessing a row from an IOT").

One interesting code in this Blog is that it uses after trigger to catch the restart.