Monday, November 21, 2011

One Oracle 12g Pre-announce of bug fix in dbms_debug_jdwp

4 years ago, I filed a bug of dbms_debug_jdwp in 10gR2, after 2 years of pingpong playing, Oracle finally acknowledged the reproducibility of submitted testcase, but only promised to fix it in Oracle 12g.

Here the test code:

alter system set plsql_optimize_level = 1;

CREATE OR REPLACE PACKAGE ksun_nocopy_called
IS
   type t_rec is record(
    a                    number := 100
   ,b                    number
   );
  PROCEDURE called(
    o_rec     OUT NOCOPY t_rec
  );
END ksun_nocopy_called;
/

CREATE OR REPLACE PACKAGE BODY ksun_nocopy_called
IS
  PROCEDURE called(
    o_rec     OUT NOCOPY t_rec
  )
  IS
  BEGIN
    null;
  END called;
END ksun_nocopy_called;
/

CREATE OR REPLACE PACKAGE ksun_nocopy_caller
IS
  PROCEDURE caller;
END ksun_nocopy_caller;
/

CREATE OR REPLACE PACKAGE BODY ksun_nocopy_caller
IS
  b_rec_old                   ksun_nocopy_called.t_rec;
  PROCEDURE caller
  IS
  BEGIN
    ksun_nocopy_called.called(b_rec_old);
  END caller;
END ksun_nocopy_caller;
/


To run the testcase, start JDeveloper (Version 10.1.3.2.0), set a breakpoint at the line in ksun_nocopy_called.called:
    null;
   
Open a Sqlplus window, run the following code:

exec dbms_debug_jdwp.connect_tcp('pc-123', 4000);
exec ksun_nocopy_caller.caller
exec dbms_debug_jdwp.disconnect;

Then get the following message:

*********START PLSQL RUNTIME DUMP************
***Got internal error Exception caught in pl/sql run-time while running PLSQL***
***Got ORA-6544 while running PLSQL***
PACKAGE BODY K.KSUN_NOCOPY_CALLER:


Due to my impatience and their reluctance, two alternative approaches were worked out.

One is to substitute the package body variable with a procedure local variable as:

CREATE OR REPLACE PACKAGE BODY ksun_nocopy_caller
IS
  b_rec_old                   ksun_nocopy_called.t_rec;
  PROCEDURE caller
  IS
    l_rec_copy                ksun_nocopy_called.t_rec;
  BEGIN
    l_rec_copy := b_rec_old;
    ksun_nocopy_called.called(l_rec_copy);
    b_rec_old := l_rec_copy;
  END caller;
END ksun_nocopy_caller;

Another is to eliminate initial assignment in record declaration as:

   type t_rec is record(
    a                    number
   ,b                    number
   );
  
Probably Oracle was either encouraged by the simplicity of testcase, or inspired by the intuitive workarounds, they finally recognized the bug and projected to fix it in Oracle 12g.

By the way, DEBUG parameters are not updated in Oracle 12c.

Addendum (2015.10.19):  Oracle 12c document said:
   The PLSQL_DEBUG parameter is deprecated. It is retained for backward compatibility only (see PLSQL_DEBUG)

 it is replaced by:
   PLSQL_OPTIMIZE_LEVEL = 1

However 12c DBMS_TRACE document wrote:
   You can enable a program unit by compiling it debug.
       alter session set plsql_debug=true;

(see Database PL/SQL Packages and Types Reference)



 

Thursday, November 17, 2011

Oracle 11gR2 single session "library cache pin"

Abstract:


Recently we encountered a single session cyclic  "library cache pin" during upgrade to 11.2.0.3.0 from a lower version, in which the application generated types (in SYS schema) are cleaned out by:

     drop type sys.SYS_PLSQL_X_Y_Z force;

Addendum (2015.11.02): Above drop is done by SMON (see Section CLEANUP_NON_EXIST_OBJ Task of follow-up Blog: Oracle 12c single session "library cache lock (cycle)" deadlock).

In order to understand this special deadlock case, run the attached the test code (at the end of this Blog). which consists of three code parts (SYS_PLSQL_3238_382_1 is a package generated sys type):

   create package spec and package body
   drop type sys.SYS_PLSQL_3238_382_1 force
   alter package suk_lc_pin# compile body


and session is waiting for "library cache pin" for a duration of 15 minutes, which can be observed by:

   select * from v$session where event like 'library cache pin';
   select * from v$wait_chains; 

After 15 minutes, session throws:

      ORA-04021: timeout occurred while waiting to lock object

15 minutes timeout is the default value of "_kgl_time_to_wait_for_locks" (time to wait for locks and pins before timing out).

We can break the blocked session and remove all invalids by:

      alter package lc_pin# compile;

The attached testcase is based on dba_tables, but it is reproducible with dba_segments, dba_objects, dba_indexes.     

Affected Oracle Version:

      11.2.0.1.0, 11.2.0.2.0, 11.2.0.3.0, but not 10.2.0.4.0.

Reasoning:


The problem seems caused by the "with" factoring clause (see attached testcase at the end of this Blog).

When Oracle parses "with" factoring clause, it acquires a "library cache pin" in the Share Mode (S) on the dependent objects, in this case, it is "t_vc", then it proceeds to main clause,  in which it realizes that the dependent object: "t_dba_row_tab" is invalid. In order to resolve this invalid, Oracle attempts to recompile package spec, which requests Exclusive Mode (X) on the related objects.

Since the already held mode (S) on "t_vc" is not consistent with requesting mode (X), Oracle session spins on the wait event "library cache pin". By default, Oracle throws an error:
    ORA-04021: timeout occurred while waiting to lock object
after 15 minutes.

In theory, this single session cyclic lock should be detected as a deadlock, however, Oracle "library cache pin" is not deadlock sensitive. That is probably the reason why the waiting event is controlled by a 15 minute timeout in Oracle implementation.

A further query:

select (select kglnaobj||'('||kglobtyd||')'
          from x$kglob v
         where kglhdadr = object_handle and rownum=1) kglobj_name
       ,v.*
from v$libcache_locks v
where v.holding_user_session  =
         (select saddr from v$session
           where event ='library cache pin' and rownum = 1)
  and object_handle in (select object_handle from v$libcache_locks where mode_requested !=0)

order by kglobj_name, holding_user_session, type, mode_held, mode_requested;

shows there are two rows on SYS_PLSQL_263602_21_1(TYPE) with TYPE: PIN, in which the HOLDING_USER_SESSION and HOLDING_SESSION are different in the row with
        MODE_REQUESTED = 3 (Exclusive mode)

KGLOBJ_NAMETYPEADDRHOLDING_USER_SESSIONHOLDING_SESSIONOBJECT_HANDLELOCK_HELDREFCOUNTMODE_HELDMODE_REQUESTEDSAVEPOINT_NUMBER
SYS_PLSQL_263602_21_1(TYPE)PIN07000000A5C224E007000000C09CE61007000000C1A399B007000000A8AEFD2000037504
SYS_PLSQL_263602_21_1(TYPE)PIN07000000A724383807000000C09CE61007000000C09CE61007000000A8AEFD2007000000A72439381206235

From the query result, we can see that HOLDING_USER_SESSION already held a PIN mode of 2(Share mode), but at the same time designates a different recursive session to request a PIN mode of 3(Exclusive mode). The column SAVEPOINT_NUMBER seems recording the sequence of PIN get and request.

This probably explains why Oracle can't detect such deadlock case.

In Oracle, HOLDING_USER_SESSION is the session we see in v$session, whereas HOLDING_SESSION is the recursive session when both are not the same. Normally recursive session is spawned out when HOLDING_USER_SESSION requires "SYS" user privilege to perform certain tasks.

By the way, recursive session is not exported in v$session because of filter predicate:
       bitand("s"."ksuseflg",1)<>0
on x$ksuse, where bitand (s.ksuseflg, 19) = 1 for 'USER' session; = 17 for 'BACKGROUND', = 2 for 'RECURSIVE'.
(Tanel has a Blog on recursive session in Recursive sessions…)

A quick workaround is to recompile the package spec by:
   
    alter package lc_pin# compile;

The new recompiled sources can be listed by:

    select * from dba_objects order by last_ddl_time desc;
   
from which one can see the t_vc is recompiled even it was valid before (this confirms the Exclusive Mode (X) request on t_vc).

Further statistics shows that Oracle is continuously enriching its library cache with the newer versions, for example, v$db_object_cache contains 13 columns in 10.2.0.4.0, 21 in 11.2.0.1.0 and 11.2.0.2.0, and now 23 in 11.2.0.3.0.

Once all are valid, you can run the query:

   select * from table(lc_pin#.soo);   
   

Fundamentals:

   
I would like to share the basics I used to approach this problem.

The Oracle generated representations for package defined types are denoted as:

    SYS_PLSQL_263602_9_1  for t_dba_row_tab
    SYS_PLSQL_3238_382_1  for sys.dba_tables%rowtype
    SYS_PLSQL_263602_31_1 for t_vc_tab
    SYS_PLSQL_263602_21_1 for t_vc
   
where 263602 is the object_id of lc_pin# (package spec, not package body) in dba_objects, 3238 is that of DBA_TABLES.
   
Additionally a type

    SYS_PLSQL_263602_DUMMY_1 as table of number;
   
is created, probably to index the two PL/SQL nested tables: t_dba_row_tab and t_vc_tab, even though both are not declared as associative array (formerly called PL/SQL table or index-by table). We can also guess that PL/SQL nested table is internally implemented as conventional index-by table.

The generated sources can be listed by:

select * from dba_source where name in (
     'SYS_PLSQL_263602_9_1'
     ,'SYS_PLSQL_3238_382_1'
     ,'SYS_PLSQL_263602_31_1'
     ,'SYS_PLSQL_263602_21_1'
     ,'SYS_PLSQL_263602_DUMMY_1');
   
One can also notice that after:

    drop type sys.SYS_PLSQL_3238_382_1 force;

SYS_PLSQL_3238_382_1 is no more registered in dba_objects, but still retained in sys.obj$. In sys.obj$, however, it is altered from type# 13 (TYPE) to type# 10 object  (also named NON-EXISTENT object in Oracle).

v$libcache_locks contains rich details on pin and lock (refcount, mode_held, mode_requested).

After the first publication of this Blog, Tanel's powerful kglpn.sql (TPT_public.zip) showed me the following output:

PIN_MODE REQ_MODE PINNED_BLOCKS  OBJECT_NAME
-------- -------- -------------- -----------------------
Share     None     0 6           SYS.SYS_PLSQL_263602_21_1
None      Excl                   SYS.SYS_PLSQL_263602_21_1


This confirms again the self-deadlock caused by:
   SYS.SYS_PLSQL_263602_21_1
since its PIN_MODE = Share, and REQ_MODE = Excl.

AIX Trace


On AIX, running trace command with session's PID: 28246034 for 20 seconds by:

trace -a -A 28246034 -o trc_raw; sleep 20; trcstop; trcrpt -O "exec=on,pid=on" trc_raw

we get the output (some details are removed):

ID   PID       ELAPSED_SEC     DELTA_MSEC   SYSCALL KERNEL  INTERRUPT
001  16318614  0.000000000       0.000000           TRACE ON channel 0
200  28246034  1.243906765    1243.906765           resume  oracletestdb iar=DF09C cpuid=FFFFFFFF
104  28246034  1.243909585       0.002820   return from system call
101  28246034  1.243953916       0.044331   _poll LR = 9000000012D75D4
252  28246034  1.243956337       0.002421   SOCK soo_select fp=xx so= corl=0 reqevents= rtneventsp=
252  28246034  1.243956876       0.000539   SOCK return from soo_select fp= so= error=0
104  28246034  1.243957916       0.001040   return from _poll [4 usec]
101  28246034  1.243965021       0.007105   _thread_wait LR = 90000000129C2AC
200  28246034  4.243980259    3000.015238           resume  oracletestdb iar=DF09C cpuid=FFFFFFFF
104  28246034  4.243982353       0.002094   return from _thread_wait [3.000017 sec]
101  28246034  4.244031337       0.048984   _thread_wait LR = 90000000129C2AC
200  28246034  7.244053839    3000.022502           resume  oracletestdb iar=DF09C cpuid=FFFFFFFF
104  28246034  7.244055822       0.001983   return from _thread_wait [3.000024 sec]
101  28246034  7.244095943       0.040121   _thread_wait LR = 90000000129C2AC
200  28246034 10.244111460    3000.015517           resume  oracletestdb iar=DF09C cpuid=FFFFFFFF


it shows that session resumes after each 3 seconds of thread_wait (v$wait_chains is also refreshed each 3 seconds).



------------------------- TestCase -------------------------
-- This test is with dba_tables.
-- It is also reproducible with dba_segments, dba_objects, dba_indexes.

drop package lc_pin#;

-- Create Package and Package Body
create or replace package lc_pin#

as
  type t_dba_row_tab is table of sys.dba_tables%rowtype; 
  type t_vc          is record (name varchar2(30));
  type t_vc_tab      is table of t_vc;
   
  function foo return t_vc_tab pipelined;
  function koo return t_dba_row_tab pipelined;
  function soo return t_dba_row_tab pipelined;
end lc_pin#;
/

create or replace package body lc_pin#
as

  function foo return t_vc_tab pipelined
  is
    l_result  t_vc;
  begin
    l_result.name     := 'lc_test';
    pipe row(l_result);
    return;
  end foo;  
   
  function koo return t_dba_row_tab pipelined
  is
  begin
    for c in (select * from dba_tables where  rownum = 1) loop
      pipe row(c);
    end loop;
  end koo;
 
  function soo return t_dba_row_tab pipelined
  is
  begin
    for c in (
      with sq as (select * from table(foo))
      select nt.*
      from   sq 
            ,(select * from table(koo)) nt
     
      -- following re-write works
      -- select nt.* from (select * from table(foo)) sq, (select * from table(koo)) nt
    ) loop
      pipe row(c);
    end loop;
  end soo;
 
end lc_pin#;
/

-- Generate "drop type sys.SYS_PLSQL_3238_382_1 force;" and execute
declare

    l_stmt    varchar2(100);
begin
    select 'drop type sys.' || object_name || ' force' drop_stmt
    into   l_stmt
    from   dba_objects
    where  object_name like
            (select 'SYS_PLSQL_' || object_id || '%_1'
             from   dba_objects
             where  owner = 'SYS' and object_name = 'DBA_TABLES' and object_type = 'VIEW')
      and  object_name not like '%DUMMY%'
    ;
   
    dbms_output.put_line('Run l_stmt: ' || l_stmt);
   
    execute immediate l_stmt;
end;
/

-- Compile Package Body, session is in wait event "library cache pin" for 15 minutes,
-- then throws "ORA-04021: timeout occurred while waiting to lock object"
alter package lc_pin# compile body;

 

Tuesday, July 19, 2011

Redo/Undo explosion from thick declared table insert

This blog presents a Redo/Undo explosion caused by thick declared table insert. Originally there is a thin table (thin_tab) consisting of two columns: a number and a 40 char varchar2. Later two new columns with 1000 char varchar2 each are required to store some seldom occurred message, so a new table (thick_tab) is created by adding these two new columns.

The test code is performed on 10gR2 and 11gR2 on a NOARCHIVELOG-mode database with
   undo_management=AUTO,
   db_block_size=8192,
   nls_characterset=AL32UTF8.

First we insert 10000 rows into the thin_tab, then we insert the same content into the thick_tab. The two new columns in thick_tab are not used at all. Table test_stats is used to store the test statistics for each step.

drop table test_stats;

create table test_stats
    (step  varchar2(10),
     name  varchar2(30),
     value number       );

drop table thin_tab;

create table thin_tab
(
  num    number,
  txt    varchar2(40 char)
)
tablespace sysaux
pctused    0
pctfree    10
initrans   1
maxtrans   255
storage    (
            initial          64k
            next             1m
            minextents       1
            maxextents       unlimited
            pctincrease      0
            buffer_pool      default
           )
logging
nocompress
nocache
noparallel
monitoring;

drop table thick_tab;

create table thick_tab
(
  num    number,
  txt    varchar2(40 char),
  txtn1  varchar2(1000 char),
  txtn2  varchar2(1000 char)
)
tablespace sysaux
pctused    0
pctfree    10
initrans   1
maxtrans   255
storage    (
            initial          64k
            next             1m
            minextents       1
            maxextents       unlimited
            pctincrease      0
            buffer_pool      default
           )
logging
nocompress
nocache
noparallel
monitoring;

---- step_1 ----
insert into test_stats
select 'step_1' step, vn.name, vs.value
from   v$sesstat  vs
     , v$statname vn
where  vs.sid = userenv('sid')
  and  vs.statistic# = vn.statistic#
  and  vn.name in ('redo size', 'undo change vector size');

---- step_2 insert into thin_tab ----
insert into thin_tab(num, txt)
select level, 'abc' from dual connect by level <= 10000;

insert into test_stats
select 'step_2' step, vn.name, vs.value
from   v$sesstat  vs
     , v$statname vn
where  vs.sid = userenv('sid')
  and  vs.statistic# = vn.statistic#
  and  vn.name in ('redo size', 'undo change vector size');

---- step_3 insert into thick_tab ----
insert into thick_tab(num, txt)
select level, 'abc' from dual connect by level <= 10000;

insert into test_stats
select 'step_3' step, vn.name, vs.value
from   v$sesstat  vs
     , v$statname vn
where  vs.sid = userenv('sid')
  and  vs.statistic# = vn.statistic#
  and  vn.name in ('redo size', 'undo change vector size');
 
select step, name, value,
       (value - lag(value) over (partition by name order by step)) diff
from   test_stats;

commit;

select segment_name, blocks, bytes
from   dba_segments
where  segment_name in ('THIN_TAB', 'THICK_TAB');


Output:

STEP    NAME                     VALUE       DIFF
------- ------------------------ ----------  ----------
step_1  redo size                11'592'572 
step_2  redo size                11'793'484     200'912
step_3  redo size                14'335'284   2'541'800
step_1  undo change vector size   3'011'440 
step_2  undo change vector size   3'037'820      26'380
step_3  undo change vector size   3'720'120     682'300

SEGMENT_NAME  BLOCKS  BYTES
------------- ------  -------
THICK_TAB         24  196'608
THIN_TAB          24  196'608

Above output demonstrates thick_tab insert generated 10 times redo,and 30 times undo than thin_tab even though the two new columns in thick_tab have nothing inserted. But both data segments have the similar size.


By dumping the redo logfile, it turns out that Oracle uses row array allocation for thin_tab, but single row allocation for thick_tab. Probably that is an Oracle internal optimization.

If using direct-path insert (insert /*+ append */ ), there will be no big difference for both inserts, and redo and undo will be much less (redo size = 10K, undo change vector size = 2K).


For partitioned table, when multiple sessions concurrently make the direct-path(with append hint) for each partition per session, the PARTITION clause is mandatory:

 insert /*+ append */ into test_table_1 PARTITION (part_1) select *  from test_table_2;

Otherwise there is a TM lock contention among the sessions on global table.
The reason is because without PARTITION clause, direct-path makes:

 TM lock with LMODE 6 on global table;
 TM lock with LMODE 3 on the specified partition;


however, with PARTITION clause, they are:

  TM lock with LMODE 3 on global table;
 TM lock with LMODE 6 on the specified partition;


For non direct-path, with or without PARTITION clause are the same:

 TM lock with LMODE 3 on global table;
 TM lock with LMODE 3 on the specified partition;


By the way, Oracle official document about:
   Locking Considerations with Direct-Path INSERT
states:
  During direct-path INSERT, the database obtains exclusive locks on the table (or on all partitions of a partitioned table).
(see Oracle® Database Administrator's Guide 11g Release 2 (11.2)

This claim only holds when no PARTITION clause is specified.

Thursday, May 5, 2011

ORA-04027: self-deadlock during automatic validation

Sometimes ago I hit an error:

   ORA-04027: self-deadlock during automatic validation for object

I wonder why there is a self-deadlock in a single session (no autonomous transaction involved). Then I tried to extract a short test case to reproduce it as follows.

drop table testt1;
drop package testp1;
drop procedure proc1;

create table testt1 (a number(2));
insert into testt1 values (12);
commit;

create or replace package testp1 as
  function f1 return number;
  procedure p1;
end testp1;
/

create or replace procedure proc1              
as
  procedure prt(p_name varchar2) as
  begin
      for c in (select p_name || object_name || '(' || object_type || ')' || status s
                from   dba_objects
                where  object_name in ('TESTP1', 'PROC1'))
      loop
        dbms_output.put_line(c.s);
      end loop; 
  end;
begin
  prt('Before Alter: ');
  execute immediate 'alter table testt1 modify (a number(2))';
  prt('After Alter:  ');
  update testt1 set a=testp1.f1;
end proc1;
/

create or replace package body testp1 as
  function f1 return number as
    begin
      return 10;
    end;
   
  procedure p1 is
    begin
      proc1;
    end;
end testp1;
/

Now if I run:

SQL> exec proc1;

Before Alter: TESTP1(PACKAGE BODY)VALID
Before Alter: TESTP1(PACKAGE)VALID
Before Alter: PROC1(PROCEDURE)VALID
After Alter: TESTP1(PACKAGE BODY)INVALID
After Alter: TESTP1(PACKAGE)VALID
After Alter: PROC1(PROCEDURE)INVALID
BEGIN proc1; END;

*
ERROR at line 1:
ORA-04045: errors during recompilation/revalidation of TESTP1
ORA-04027: self-deadlock during automatic validation for object PROC1
ORA-06512: at "PROC1", line 16
ORA-06512: at line 1

There are two invalids: TESTP1(PACKAGE BODY) and PROC1(PROCEDURE). You can make both valid by:


alter PACKAGE TESTP1 COMPILE BODY;

but whenever you call proc1, they are again invalid.

The dependency is TESTP1(PACKAGE BODY) on PROC1, and PROC1 & TESTP1(PACKAGE BODY) on TESTP1(PACKAGE).

Here is an attempt to make some reasoning.

When running "exec proc1;", proc1 is pinned, after "alter table testt1", proc1 is invalid, which in turn, causes the invalid of testp1 body due to dependency (the pinned version of proc1 is still valid since it is the directly called unit), then proc1 runs to the update statement, which requires a valid testp1 body, thus it demands an X-lock on testp1 body, which again asks for an X-lock of proc1 (due to dependency). Since proc1 is already pinned (Share-lock) by its own, it is not possible to allocate an X-lock to itself. So a self-deadlock is generated during validation of proc1.
                       
The code was tested on 10gr2 and 11gr2. Toggling 11gr2 hidden parameters:
_disable_fast_validate(TRUE, FALSE), _ignore_fg_deps  (TABLES, PLSQL, ALL, NONE)
seems having no influence on the behavior.

Wednesday, May 4, 2011

Foreign Key on Nested Table

This Blog will make an attempt to define a referential constraint on a nested table column.

At first I copy and run the code from Book: Expert Oracle Database Architecture: 9i and 10g (http://www.apress.com/9781590595305).

drop table dept_and_emp cascade constraints;
drop type emp_tab_type;
drop type emp_type;

create or replace NONEDITIONABLE type emp_type  -- NONEDITIONABLE since 11.2
--create or replace type emp_type
as object
(empno       number(4),
 ename       varchar2(10),
 job         varchar2(9),
 mgr         number(4),
 hiredate    date,
 sal         number(7, 2),
 comm        number(7, 2)
)
/

create or replace NONEDITIONABLE type emp_tab_type  -- NONEDITIONABLE since 11.2
--create or replace type emp_tab_type
as table of emp_type
/

create table dept_and_emp
(deptno    number(2) primary key,
 dname     varchar2(14),
 loc       varchar2(13),
 emps      emp_tab_type
)
nested table emps store as emps_nt
/

alter table emps_nt add constraint
emps_empno_unique unique(empno)
/

insert into dept_and_emp values(1, 'Dept_1', 'Loc_1',
  new emp_tab_type(new emp_type(11, 'Emp_11', 'Job_11', null, date'1962-01-11', 1, 1),
                   new emp_type(12, 'Emp_12', 'Job_12', 11, date'1962-01-12', 2, 2)
                   ));
                  
insert into dept_and_emp values(2, 'Dept_2', 'Loc_2',
  new emp_tab_type(new emp_type(21, 'Emp_21', 'Job_21', null, date'1962-02-11', 1, 1),
                   new emp_type(22, 'Emp_22', 'Job_22', 21, date'1962-02-12', 2, 2)
                   )); 
                  
commit;
As already showed in the book, if run:

alter table emps_nt add constraint mgr_fk foreign key(mgr) references emps_nt(empno);

one will get an error:

ORA-30730: referential constraint not allowed on nested table column

Now I will try with other columns. At first, I list the columns from master and nested table (not relevant columns are removed):

SQL > select o.object_name tname, c.name, c.type# ctype
        from   dba_objects o, sys.col$ c
       where c.obj# = o.object_id
         and o.object_name in ('EMPS_NT', 'DEPT_AND_EMP')
       order by o.object_name, c.name;

DEPT_AND_EMP            SYS_NC0000400005$    23
EMPS_NT                 NESTED_TABLE_ID      23

I see both DEPT_AND_EMP.SYS_NC0000400005$ and EMPS_NT.NESTED_TABLE_ID having the same type: 23(SQLT_BIN).
   
then run:

alter table emps_nt add constraint emps_nt_fk foreign key(nested_table_id)
references dept_and_emp(sys_nc0000400005$);

it will create a foreign key constraint EMPS_NT_FK.

I can verify it by:

SQL > select constraint_name, constraint_type, table_name 
        from dba_constraints where table_name = 'EMPS_NT';

EMPS_NT_FK          R     EMPS_NT
EMPS_EMPNO_UNIQUE   U     EMPS_NT

Tested on Oracle 10gr2 and 11gr2, both works.

Now I am lost with Oracle documentation:

ORA-30730: referential constraint not allowed on nested table column
    Cause: An attempt was made to define a referential constraint on a nested table column.
    Action: Do not specify referential constraints on nested table columns.

By the way, on 11gr2, with the query:

SQL > select  index_name, table_name, column_name 
        from  dba_ind_columns 
       where table_name in ('EMPS_NT', 'DEPT_AND_EMP');

SYS_C0016818               DEPT_AND_EMP      DEPTNO
SYS_C0016819               DEPT_AND_EMP      EMPS
SYS_FK0000184243N00004$    EMPS_NT           NESTED_TABLE_ID
EMPS_EMPNO_UNIQUE          EMPS_NT           EMPNO

we can see the column NESTED_TABLE_ID in EMPS_NT is indexed by: SYS_FK0000184243N00004$. So Oracle follows its rule: "foreign keys should be indexed" (Oracle® Database Concepts 11g Release 2 (11.2) Part Number E16508-05 in http://download.oracle.com/docs/cd/E11882_01/server.112/e16508/datainte.htm#CNCPT88886)

However, this index can not be found in 10gr2.    

We can also verify this foreign key constraint:

SQL > delete dept_and_emp where deptno = 1;
ORA-02292: integrity constraint (K.EMPS_NT_FK) violated - child record found

Now disabling this constraint, delete deptno 1 again:

SQL > alter table emps_nt disable constraint emps_nt_fk;
SQL > delete dept_and_emp where deptno = 1;
1 row deleted.

We can see that both parent and child rows are deleted:

SQL > select SYS_NC0000400005$, t.* from dept_and_emp t;

SQL > select /*+ nested_table_get_refs */ NESTED_TABLE_ID, SYS_NC_ROWINFO$, t.* from emps_nt t;

(Without hint: nested_table_get_refs, it hits: "ORA-22812: cannot reference nested table column's storage table")

However, if we delete with two additional hints:

SQL > delete /*+ no_index(t SYS_FK0000184243N00004$) nested_table_get_refs */ dept_and_emp t
      where deptno = 2;
1 row deleted.

We can see that only parent row is deleted, but child rows remain:

SQL > select SYS_NC0000400005$, t.* from dept_and_emp t;
no rows selected

SQL > select /*+ nested_table_get_refs */ NESTED_TABLE_ID, SYS_NC_ROWINFO$, t.* from emps_nt t;

E40DC694BEAC31C7E04478E7D1E92954 (21; Emp_21; Job_21; ; 11.02.1962; 1; 1) 21 Emp_21 Job_21  11.02.1962 1 1

E40DC694BEAC31C7E04478E7D1E92954 (22; Emp_22; Job_22; 21; 12.02.1962; 2; 2) 22 Emp_22 Job_22 21 12.02.1962 2 2

2 rows selected.

Taking the NESTED_TABLE_ID of above query, we can delete these child rows directly:

SQL > delete /*+ nested_table_get_refs */ emps_nt t
      where nested_table_id=hextoraw('E40DC694BEAC31C7E04478E7D1E92954');
2 rows deleted.

SQL > select /*+ nested_table_get_refs */ NESTED_TABLE_ID, SYS_NC_ROWINFO$, t.* from emps_nt t;
no rows selected

 

Monday, May 2, 2011

Update Restart and new Active Undo Extent

Blog:  That old restart problem again...
(http://tkyte.blogspot.com/2010/04/that-old-restart-problem-again.html) or Link: That old restart problem again...
presents a new case of update restart, which demonstrates that restart can also happen within a single session, not necessarily two sessions with one being blocked by a update from another session. Inherently single session restart is distinct from the normal two sessions' restart  in which all rows till first detected changed row are restarted, however in single session restart, only occasionally a few rows are restarted.

Referring to Oracle documentation, it is not evident to get a clue about such behavior. Sometimes I hear the warning: "Don't go searching for it in the documentation" (Blog:  NO_DATA_NEEDED - something I learned recently in http://tkyte.blogspot.com/2010/04/nodataneeded-something-i-learned.html).

So it is worth of pursuing some research on it. Following modified script shows certain coincidence of new active undo_extent and restart.

I slightly modified the code in Blog:  That old restart problem again...: by adding a helper function to count the active undo extents, and some lines to print out this count.

create or replace function get_undo_extent_cnt return number
as
  l_cnt number;
begin   
 select count(*) into l_cnt from dba_undo_extents where status = 'ACTIVE';
 return l_cnt;
end;
/

create or replace package pacepack
as
  type array is table of number index by varchar2(40);
  g_data                         array;
  g_cnt                           number;
  g_restart_cnt                number;
  g_undo_extent_cnt       number;
end pacepack;
/

drop table delete_data;

create table delete_data as
select owner, object_name, rownum id
from  dba_objects
where rownum  < 53001;
    
create or replace trigger delete_data_bt
  before delete or update on delete_data
  for each row
begin
  if ( pacepack.g_data.exists(rowidtochar(:old.rowid)))
  then
    pacepack.g_restart_cnt := pacepack.g_restart_cnt + 1;
   
    dbms_output.put_line( 'doing "' || :old.rowid ||
                           '" again was called ' || pacepack.g_cnt );
    dbms_output.put_line('-- Restart#: ' || pacepack.g_restart_cnt ||
                            ', Undo Extent Count: ' || get_undo_extent_cnt);
  else
    pacepack.g_data(rowidtochar(:old.rowid)) := 1;
  end if;
      
  pacepack.g_cnt        := pacepack.g_cnt + 1;
end;
/

Assume you are the only one on the database, then run the same script:

declare
  cursor l_delete_csr is
    select * from delete_data for update;
  l_cnt number := 0;
begin
  pacepack.g_data.delete;
  pacepack.g_cnt         := 0;
  pacepack.g_restart_cnt := 0;
         
  for l_delete_row in l_delete_csr loop
    update delete_data set owner = lower(owner) where current of l_delete_csr;
    l_cnt := l_cnt + 1;
   
    if l_cnt = 1 then
     for c in (select usn, extents, extends from v$rollstat where xacts > 0) loop
      dbms_output.put_line('---- Begin Rollstat: '||'USN='||c.usn||', EXTENTS='||c.extents||', EXTENDS='||c.extends);
     end loop;
    end if;
  end loop;
    
  dbms_output.put_line( 'trigger count = ' || pacepack.g_cnt || ' local count = '   || l_cnt );   
  for c in (select usn, extents, extends from v$rollstat where xacts > 0) loop
    dbms_output.put_line('---- End Rollstat: '||'USN='||c.usn||', EXTENTS='||c.extents||', EXTENDS='||c.extends);
  end loop;
 end;
 /

output:

---- Begin Rollstat: USN=20, EXTENTS=24, EXTENDS=21
doing "AABEYTAAAAAC9WMAB6" again was called 7144
-- Restart#: 1, Undo Extent Count: 23
doing "AABEYTAAAAAC9XMACo" again was called 18475
-- Restart#: 2, Undo Extent Count: 24
doing "AABEYTAAAAAC9YpACA" again was called 29740
-- Restart#: 3, Undo Extent Count: 25
doing "AABEYTAAAAAC9ZsAB1" again was called 41005
-- Restart#: 4, Undo Extent Count: 26
doing "AABEYTAAAAAC9axAAa" again was called 52298
-- Restart#: 5, Undo Extent Count: 27
trigger count = 53005 local count = 53000
---- End Rollstat: USN=20, EXTENTS=29, EXTENDS=26

The output shows 5 Restarts(53005-53000) , 5 new Extents(29-24) created by 5 extending(26-21).
So for each restart, a new active Undo Extent is created. If you run it repeatedly, you will see the same behavior.

As we know, Oracle fulfils a DML by following 5 steps:
  create UNDO Change Vector
  create REDO Change Vector
  combine both Change Vectors into Redo Buffer, and then write to redo log
  write UNDO record into UNDO file
  write modified Data into DATA file

When a transaction fills rollback segment and finds that the last remaining block is not sufficient to hold the UNDO record, it will trigger SMON to perform a recursive space transaction, which will allocate a new UNDO extent. Hereafter Oracle recommences the above 5 steps, and hence an "Update Restart".

As I tested in 10g and 11g (undo_management=AUTO), this output is confirmed,  but I would be careful to have any earlier conclusion before Oracle says.

Update (2022Aug25) --------------------------------------------------

Blog: That Old Restart Problem Strikes Back: Setting the Stage (1 August 2019)
sets DML diagnostic events in session:

  alter session set events 'trace[dml]:sql_trace wait=true';
and shows that there is one ORA-01551 error for each restart.

Setting this DML event, repeat our above test.
For above 5 Restarts, we can see 5 occurrences of following text in the trace file:

updThreePhaseExe: objn=4891285 phase=NOT LOCKED
updaul: phase is NOT LOCKED snap oldsnap env: (scn: 0x00000b7302b6b7ac  xid: 0x006e.011.00006316  uba: 0x00c0227c.1936.58 
updrow: objn=4891285 phase=NOT LOCKED
updrow: objn=4891285 error=1551
updrow: qecinvsub objn=4891285
updrow: setting ROW_RETRY objn=4891285
updrow: retry_this_row: ROW_RETRY set, objn= 4891285 phase=NOT LOCKED
The difference of those 5 occurrences are the second line about (scn, xid, uba),
where scn and uba are different for each restart, but for the same xid:

updaul: phase is NOT LOCKED snap oldsnap env: (scn: 0x00000b7302b6b7ac  xid: 0x006e.011.00006316  uba: 0x00c0227c.1936.58 
updaul: phase is NOT LOCKED snap oldsnap env: (scn: 0x00000b7302b6b7b0  xid: 0x006e.011.00006316  uba: 0x00c0517c.1937.5a 
updaul: phase is NOT LOCKED snap oldsnap env: (scn: 0x00000b7302b6b7b2  xid: 0x006e.011.00006316  uba: 0x00c051fc.1938.5a 
updaul: phase is NOT LOCKED snap oldsnap env: (scn: 0x00000b7302b6b7b3  xid: 0x006e.011.00006316  uba: 0x00c0527c.1939.59 
updaul: phase is NOT LOCKED snap oldsnap env: (scn: 0x00000b7302b6b7b7  xid: 0x006e.011.00006316  uba: 0x00c052fc.193a.5a 
If we made a 1551 errorstack trace:

  alter session set events='1551 trace name errorstack level 3';
the Error Stack shows:

----- Error Stack Dump -----
ORA-01551: extended rollback segment, pinned blocks released
----- Current SQL Statement for this session (sql_id=4u6b4b82z7xrp) -----
SELECT * FROM DELETE_DATA FOR UPDATE
----- PL/SQL Call Stack -----

[15] (kgeselv()+89                    
[16] (ksesecl0()+189                  
[17] (ktusmasp()+1088   --->ktusmasp - kernel transaction undo smu (system managed undo) add space              
[18] (ktuchg2()+9603                  
[19] (ktbchg2()+231                   
[20] (kddchg()+611                    
[21] (kddlok()+2350                   
[22] (kddlkr()+322                    
[23] (qerfuProcessFetchedRow()+669    
[24] (qerfuLockingRowProcedure()+68   
[25] (qersoRowP()+880                 
[26] (kdstf000110100000000km()+1014   
[27] (kdsttgr()+2154                  
[28] (qertbFetch()+1090               
[29] (qersoProcessULS()+281           
[30] (qersoFetchSimple()+2251         
[31] (qersoFetch()+210                
[32] (qerfuStart()+602                
[33] (selexe0()+3409                  
[34] (opiexe()+11896  
End of Update (2022Aug25)--------------------------------------------------



The same behavior is also observed for the single row update, in which the cursor is defined with an input parameter p_id to select a single row:

declare
   cursor l_delete_csr (p_id number) is
     select * from delete_data where id = p_id for update;
   l_cnt number := 0;
begin
  pacepack.g_data.delete;
  pacepack.g_cnt         := 0;
  pacepack.g_restart_cnt := 0;

  for i in 1..53000 loop       
    for l_delete_row in l_delete_csr(i) loop
      update delete_data set owner = lower(owner) where current of l_delete_csr;
      l_cnt := l_cnt + 1;
     
      if l_cnt = 1 then
       for c in (select usn, extents, extends from v$rollstat where xacts > 0) loop
        dbms_output.put_line('---- Begin Rollstat: '||'USN='||c.usn||', EXTENTS='||c.extents||', EXTENDS='||c.extends);
       end loop;
      end if;     
    end loop;
  end loop;
  
  dbms_output.put_line( 'trigger count = ' || pacepack.g_cnt ||' local count = '   || l_cnt );
  for c in (select usn, extents, extends from v$rollstat where xacts > 0) loop
    dbms_output.put_line('---- End Rollstat: '||'USN='||c.usn||', EXTENTS='||c.extents||', EXTENDS='||c.extends);
  end loop;      
end;
/



Output:

---- Begin Rollstat: USN=34, EXTENTS=29, EXTENDS=25
doing "AABEYTAAAAAC9WhABL" again was called 11315
-- Restart#: 1, Undo Extent Count: 29
doing "AABEYTAAAAAC9XnAAB" again was called 22612
-- Restart#: 2, Undo Extent Count: 30
doing "AABEYTAAAAAC9ZCAAe" again was called 33877
-- Restart#: 3, Undo Extent Count: 31
doing "AABEYTAAAAAC9aHAAr" again was called 45142
-- Restart#: 4, Undo Extent Count: 32
trigger count = 53004 local count = 53000
---- End Rollstat: USN=34, EXTENTS=33, EXTENDS=29


The Blog: Three new "hints" (http://rwijk.blogspot.com/2009/10/three-new-hints.html) shows a new  Oracle 11.2.0.1 hint: RETRY_ON_ROW_CHANGE, which I would call it block level restart, in comparing to the traditional row level (maybe this hint already exists in 10g since it is mentioned in 10g  "Bug 6082734  Dump (kdr9igtl) accessing a row from an IOT").

One interesting code in this Blog is that it uses after trigger to catch the restart.