Monday, November 21, 2011

One Oracle 12g Pre-announce of bug fix in dbms_debug_jdwp

4 years ago, I filed a bug of dbms_debug_jdwp in 10gR2, after 2 years of pingpong playing, Oracle finally acknowledged the reproducibility of submitted testcase, but only promised to fix it in Oracle 12g.

Here the test code:

alter system set plsql_optimize_level = 1;

CREATE OR REPLACE PACKAGE ksun_nocopy_called
IS
   type t_rec is record(
    a                    number := 100
   ,b                    number
   );
  PROCEDURE called(
    o_rec     OUT NOCOPY t_rec
  );
END ksun_nocopy_called;
/

CREATE OR REPLACE PACKAGE BODY ksun_nocopy_called
IS
  PROCEDURE called(
    o_rec     OUT NOCOPY t_rec
  )
  IS
  BEGIN
    null;
  END called;
END ksun_nocopy_called;
/

CREATE OR REPLACE PACKAGE ksun_nocopy_caller
IS
  PROCEDURE caller;
END ksun_nocopy_caller;
/

CREATE OR REPLACE PACKAGE BODY ksun_nocopy_caller
IS
  b_rec_old                   ksun_nocopy_called.t_rec;
  PROCEDURE caller
  IS
  BEGIN
    ksun_nocopy_called.called(b_rec_old);
  END caller;
END ksun_nocopy_caller;
/


To run the testcase, start JDeveloper (Version 10.1.3.2.0), set a breakpoint at the line in ksun_nocopy_called.called:
    null;
   
Open a Sqlplus window, run the following code:

exec dbms_debug_jdwp.connect_tcp('pc-123', 4000);
exec ksun_nocopy_caller.caller
exec dbms_debug_jdwp.disconnect;

Then get the following message:

*********START PLSQL RUNTIME DUMP************
***Got internal error Exception caught in pl/sql run-time while running PLSQL***
***Got ORA-6544 while running PLSQL***
PACKAGE BODY K.KSUN_NOCOPY_CALLER:


Due to my impatience and their reluctance, two alternative approaches were worked out.

One is to substitute the package body variable with a procedure local variable as:

CREATE OR REPLACE PACKAGE BODY ksun_nocopy_caller
IS
  b_rec_old                   ksun_nocopy_called.t_rec;
  PROCEDURE caller
  IS
    l_rec_copy                ksun_nocopy_called.t_rec;
  BEGIN
    l_rec_copy := b_rec_old;
    ksun_nocopy_called.called(l_rec_copy);
    b_rec_old := l_rec_copy;
  END caller;
END ksun_nocopy_caller;

Another is to eliminate initial assignment in record declaration as:

   type t_rec is record(
    a                    number
   ,b                    number
   );
  
Probably Oracle was either encouraged by the simplicity of testcase, or inspired by the intuitive workarounds, they finally recognized the bug and projected to fix it in Oracle 12g.

By the way, DEBUG parameters are not updated in Oracle 12c.

Addendum (2015.10.19):  Oracle 12c document said:
   The PLSQL_DEBUG parameter is deprecated. It is retained for backward compatibility only (see PLSQL_DEBUG)

 it is replaced by:
   PLSQL_OPTIMIZE_LEVEL = 1

However 12c DBMS_TRACE document wrote:
   You can enable a program unit by compiling it debug.
       alter session set plsql_debug=true;

(see Database PL/SQL Packages and Types Reference)



 

Thursday, November 17, 2011

Oracle 11gR2 single session "library cache pin"

Abstract:


Recently we encountered a single session cyclic  "library cache pin" during upgrade to 11.2.0.3.0 from a lower version, in which the application generated types (in SYS schema) are cleaned out by:

     drop type sys.SYS_PLSQL_X_Y_Z force;

Addendum (2015.11.02): Above drop is done by SMON (see Section CLEANUP_NON_EXIST_OBJ Task of follow-up Blog: Oracle 12c single session "library cache lock (cycle)" deadlock).

In order to understand this special deadlock case, run the attached the test code (at the end of this Blog). which consists of three code parts (SYS_PLSQL_3238_382_1 is a package generated sys type):

   create package spec and package body
   drop type sys.SYS_PLSQL_3238_382_1 force
   alter package suk_lc_pin# compile body


and session is waiting for "library cache pin" for a duration of 15 minutes, which can be observed by:

   select * from v$session where event like 'library cache pin';
   select * from v$wait_chains; 

After 15 minutes, session throws:

      ORA-04021: timeout occurred while waiting to lock object

15 minutes timeout is the default value of "_kgl_time_to_wait_for_locks" (time to wait for locks and pins before timing out).

We can break the blocked session and remove all invalids by:

      alter package lc_pin# compile;

The attached testcase is based on dba_tables, but it is reproducible with dba_segments, dba_objects, dba_indexes.     

Affected Oracle Version:

      11.2.0.1.0, 11.2.0.2.0, 11.2.0.3.0, but not 10.2.0.4.0.

Reasoning:


The problem seems caused by the "with" factoring clause (see attached testcase at the end of this Blog).

When Oracle parses "with" factoring clause, it acquires a "library cache pin" in the Share Mode (S) on the dependent objects, in this case, it is "t_vc", then it proceeds to main clause,  in which it realizes that the dependent object: "t_dba_row_tab" is invalid. In order to resolve this invalid, Oracle attempts to recompile package spec, which requests Exclusive Mode (X) on the related objects.

Since the already held mode (S) on "t_vc" is not consistent with requesting mode (X), Oracle session spins on the wait event "library cache pin". By default, Oracle throws an error:
    ORA-04021: timeout occurred while waiting to lock object
after 15 minutes.

In theory, this single session cyclic lock should be detected as a deadlock, however, Oracle "library cache pin" is not deadlock sensitive. That is probably the reason why the waiting event is controlled by a 15 minute timeout in Oracle implementation.

A further query:

select (select kglnaobj||'('||kglobtyd||')'
          from x$kglob v
         where kglhdadr = object_handle and rownum=1) kglobj_name
       ,v.*
from v$libcache_locks v
where v.holding_user_session  =
         (select saddr from v$session
           where event ='library cache pin' and rownum = 1)
  and object_handle in (select object_handle from v$libcache_locks where mode_requested !=0)

order by kglobj_name, holding_user_session, type, mode_held, mode_requested;

shows there are two rows on SYS_PLSQL_263602_21_1(TYPE) with TYPE: PIN, in which the HOLDING_USER_SESSION and HOLDING_SESSION are different in the row with
        MODE_REQUESTED = 3 (Exclusive mode)

KGLOBJ_NAMETYPEADDRHOLDING_USER_SESSIONHOLDING_SESSIONOBJECT_HANDLELOCK_HELDREFCOUNTMODE_HELDMODE_REQUESTEDSAVEPOINT_NUMBER
SYS_PLSQL_263602_21_1(TYPE)PIN07000000A5C224E007000000C09CE61007000000C1A399B007000000A8AEFD2000037504
SYS_PLSQL_263602_21_1(TYPE)PIN07000000A724383807000000C09CE61007000000C09CE61007000000A8AEFD2007000000A72439381206235

From the query result, we can see that HOLDING_USER_SESSION already held a PIN mode of 2(Share mode), but at the same time designates a different recursive session to request a PIN mode of 3(Exclusive mode). The column SAVEPOINT_NUMBER seems recording the sequence of PIN get and request.

This probably explains why Oracle can't detect such deadlock case.

In Oracle, HOLDING_USER_SESSION is the session we see in v$session, whereas HOLDING_SESSION is the recursive session when both are not the same. Normally recursive session is spawned out when HOLDING_USER_SESSION requires "SYS" user privilege to perform certain tasks.

By the way, recursive session is not exported in v$session because of filter predicate:
       bitand("s"."ksuseflg",1)<>0
on x$ksuse, where bitand (s.ksuseflg, 19) = 1 for 'USER' session; = 17 for 'BACKGROUND', = 2 for 'RECURSIVE'.
(Tanel has a Blog on recursive session in Recursive sessions…)

A quick workaround is to recompile the package spec by:
   
    alter package lc_pin# compile;

The new recompiled sources can be listed by:

    select * from dba_objects order by last_ddl_time desc;
   
from which one can see the t_vc is recompiled even it was valid before (this confirms the Exclusive Mode (X) request on t_vc).

Further statistics shows that Oracle is continuously enriching its library cache with the newer versions, for example, v$db_object_cache contains 13 columns in 10.2.0.4.0, 21 in 11.2.0.1.0 and 11.2.0.2.0, and now 23 in 11.2.0.3.0.

Once all are valid, you can run the query:

   select * from table(lc_pin#.soo);   
   

Fundamentals:

   
I would like to share the basics I used to approach this problem.

The Oracle generated representations for package defined types are denoted as:

    SYS_PLSQL_263602_9_1  for t_dba_row_tab
    SYS_PLSQL_3238_382_1  for sys.dba_tables%rowtype
    SYS_PLSQL_263602_31_1 for t_vc_tab
    SYS_PLSQL_263602_21_1 for t_vc
   
where 263602 is the object_id of lc_pin# (package spec, not package body) in dba_objects, 3238 is that of DBA_TABLES.
   
Additionally a type

    SYS_PLSQL_263602_DUMMY_1 as table of number;
   
is created, probably to index the two PL/SQL nested tables: t_dba_row_tab and t_vc_tab, even though both are not declared as associative array (formerly called PL/SQL table or index-by table). We can also guess that PL/SQL nested table is internally implemented as conventional index-by table.

The generated sources can be listed by:

select * from dba_source where name in (
     'SYS_PLSQL_263602_9_1'
     ,'SYS_PLSQL_3238_382_1'
     ,'SYS_PLSQL_263602_31_1'
     ,'SYS_PLSQL_263602_21_1'
     ,'SYS_PLSQL_263602_DUMMY_1');
   
One can also notice that after:

    drop type sys.SYS_PLSQL_3238_382_1 force;

SYS_PLSQL_3238_382_1 is no more registered in dba_objects, but still retained in sys.obj$. In sys.obj$, however, it is altered from type# 13 (TYPE) to type# 10 object  (also named NON-EXISTENT object in Oracle).

v$libcache_locks contains rich details on pin and lock (refcount, mode_held, mode_requested).

After the first publication of this Blog, Tanel's powerful kglpn.sql (TPT_public.zip) showed me the following output:

PIN_MODE REQ_MODE PINNED_BLOCKS  OBJECT_NAME
-------- -------- -------------- -----------------------
Share     None     0 6           SYS.SYS_PLSQL_263602_21_1
None      Excl                   SYS.SYS_PLSQL_263602_21_1


This confirms again the self-deadlock caused by:
   SYS.SYS_PLSQL_263602_21_1
since its PIN_MODE = Share, and REQ_MODE = Excl.

AIX Trace


On AIX, running trace command with session's PID: 28246034 for 20 seconds by:

trace -a -A 28246034 -o trc_raw; sleep 20; trcstop; trcrpt -O "exec=on,pid=on" trc_raw

we get the output (some details are removed):

ID   PID       ELAPSED_SEC     DELTA_MSEC   SYSCALL KERNEL  INTERRUPT
001  16318614  0.000000000       0.000000           TRACE ON channel 0
200  28246034  1.243906765    1243.906765           resume  oracletestdb iar=DF09C cpuid=FFFFFFFF
104  28246034  1.243909585       0.002820   return from system call
101  28246034  1.243953916       0.044331   _poll LR = 9000000012D75D4
252  28246034  1.243956337       0.002421   SOCK soo_select fp=xx so= corl=0 reqevents= rtneventsp=
252  28246034  1.243956876       0.000539   SOCK return from soo_select fp= so= error=0
104  28246034  1.243957916       0.001040   return from _poll [4 usec]
101  28246034  1.243965021       0.007105   _thread_wait LR = 90000000129C2AC
200  28246034  4.243980259    3000.015238           resume  oracletestdb iar=DF09C cpuid=FFFFFFFF
104  28246034  4.243982353       0.002094   return from _thread_wait [3.000017 sec]
101  28246034  4.244031337       0.048984   _thread_wait LR = 90000000129C2AC
200  28246034  7.244053839    3000.022502           resume  oracletestdb iar=DF09C cpuid=FFFFFFFF
104  28246034  7.244055822       0.001983   return from _thread_wait [3.000024 sec]
101  28246034  7.244095943       0.040121   _thread_wait LR = 90000000129C2AC
200  28246034 10.244111460    3000.015517           resume  oracletestdb iar=DF09C cpuid=FFFFFFFF


it shows that session resumes after each 3 seconds of thread_wait (v$wait_chains is also refreshed each 3 seconds).



------------------------- TestCase -------------------------
-- This test is with dba_tables.
-- It is also reproducible with dba_segments, dba_objects, dba_indexes.

drop package lc_pin#;

-- Create Package and Package Body
create or replace package lc_pin#

as
  type t_dba_row_tab is table of sys.dba_tables%rowtype; 
  type t_vc          is record (name varchar2(30));
  type t_vc_tab      is table of t_vc;
   
  function foo return t_vc_tab pipelined;
  function koo return t_dba_row_tab pipelined;
  function soo return t_dba_row_tab pipelined;
end lc_pin#;
/

create or replace package body lc_pin#
as

  function foo return t_vc_tab pipelined
  is
    l_result  t_vc;
  begin
    l_result.name     := 'lc_test';
    pipe row(l_result);
    return;
  end foo;  
   
  function koo return t_dba_row_tab pipelined
  is
  begin
    for c in (select * from dba_tables where  rownum = 1) loop
      pipe row(c);
    end loop;
  end koo;
 
  function soo return t_dba_row_tab pipelined
  is
  begin
    for c in (
      with sq as (select * from table(foo))
      select nt.*
      from   sq 
            ,(select * from table(koo)) nt
     
      -- following re-write works
      -- select nt.* from (select * from table(foo)) sq, (select * from table(koo)) nt
    ) loop
      pipe row(c);
    end loop;
  end soo;
 
end lc_pin#;
/

-- Generate "drop type sys.SYS_PLSQL_3238_382_1 force;" and execute
declare

    l_stmt    varchar2(100);
begin
    select 'drop type sys.' || object_name || ' force' drop_stmt
    into   l_stmt
    from   dba_objects
    where  object_name like
            (select 'SYS_PLSQL_' || object_id || '%_1'
             from   dba_objects
             where  owner = 'SYS' and object_name = 'DBA_TABLES' and object_type = 'VIEW')
      and  object_name not like '%DUMMY%'
    ;
   
    dbms_output.put_line('Run l_stmt: ' || l_stmt);
   
    execute immediate l_stmt;
end;
/

-- Compile Package Body, session is in wait event "library cache pin" for 15 minutes,
-- then throws "ORA-04021: timeout occurred while waiting to lock object"
alter package lc_pin# compile body;