Tuesday, November 3, 2015

Oracle 12c single session "library cache lock (cycle)" deadlock

Note:

This is a 12c follow-up of previous Blog: Oracle 11gR2 single session "library cache pin"

Update (2020-Oct-23):

The discussed behaviour is no more reproducible in 18c or later version due to 18c new introduced a hidden parameter "_EnableShadowTypes" (enable shadow types, default FALSE). To reproduce the problem in 18c or later version, set "_EnableShadowTypes" to TRUE.

It is documented in MOS Note:
Pipeline Function Returning UDT Defined in the Package Specification Raises ORA-00902 Post Database Upgrade to 18c (Doc ID 2696498.1)

Prior to 18c, when such pipeline function is compiled, database implicitly creates shadow objects at SQL level 
(e.g. SYS_PLSQL_135723_9_1) so that the pipeline function can return the value. 
However, from RDBMS version 18c and onward, it fails because implicit creation of shadow object type 
for pipelined table functions returning user defined object type defined in package has been stopped. 
This change has been introduced by the following internal.

Alternatively, there's a backout switch to re-enable the generation of shadow object types 
for pipelined table functions returning user defined object type defined in package specification. 
The backout switch is to set the hidden parameter "_EnableShadowTypes" to TRUE.

      alter session SET "_EnableShadowTypes" = true;
(also see MOS Note:
-. Bug 29935685 - ORA-7445 [qcopgoty] Error During Compilation of Package When Using Pipelined Table Functions in 19.3 and Above (Doc ID 29935685.8)
-. Pipelined Function Consumes More Time IN 18C Than 12.2.0.1 (Doc ID 2603568.1) )

library cache lock (cycle)


Recently we observed a single session "library cache lock (cycle)" deadlock when upgrading to 12c (12.1.0.2.0) from 11gR2.

To reproduce the deadlock, run the attached TestCode (at the end of this Blog) to create lc_pin# package and package body.

To observe the new generated objects, run query:
 select owner, object_name, object_type from dba_objects
 where last_ddl_time > sysdate -10/1440
 order by object_name;


it lists all new created sources:

 TEST LC_PIN#                    PACKAGE BODY
 TEST LC_PIN#                    PACKAGE
 TEST SYS_PLSQL_6174CDA6_21_1    TYPE
 TEST SYS_PLSQL_6174CDA6_31_1    TYPE
 TEST SYS_PLSQL_6174CDA6_9_1     TYPE
 TEST SYS_PLSQL_6174CDA6_DUMMY_1 TYPE
 SYS   SYS_PLSQL_750F00_462_1     TYPE
 SYS   SYS_PLSQL_750F00_DUMMY_1   TYPE


Run query on dba_source:
 select * from dba_source
 where name like 'SYS_PLSQL_6174CDA6%' or name like 'SYS_PLSQL_750F00%'
 order by name, line;


it shows the mapping between LC_PIN# defined types and new generated types:

 TEST SYS_PLSQL_6174CDA6_21_1     for t_vc
 TEST SYS_PLSQL_6174CDA6_31_1     for t_vc_tab (table of SYS_PLSQL_6174CDA6_21_1)
 TEST SYS_PLSQL_6174CDA6_9_1      for t_dba_row_tab (table of SYS_PLSQL_750F00_462_1)
 TEST SYS_PLSQL_6174CDA6_DUMMY_1  for index table of SYS_PLSQL_6174CDA6_31_1
 SYS  SYS_PLSQL_750F00_462_1      for sys.dba_tables%rowtype
 SYS  SYS_PLSQL_750F00_DUMMY_1    for index table of SYS_PLSQL_6174CDA6_9_1


Now if we manually drop the generated SYS type: (in real applications, it is done by CLEANUP_NON_EXIST_OBJ Task. see later discussion):

 SQL > drop type SYS.SYS_PLSQL_750F00_462_1 force;

SYS_PLSQL_750F00_462_1 is no more registered in dba_objects, but still retained in sys.obj$.

 select * from sys.obj$
 where mtime > sysdate -10/1440
 order by mtime;


In sys.obj$, however, it is altered from type# 13 (TYPE) to type# 10 object (also named NON-EXISTENT object in Oracle).

SYS_PLSQL_6174CDA6_9_1 is a table of dropped SYS_PLSQL_750F00_462_1, it becomes invalid, try to recompile it:

 SQL > alter type test.sys_plsql_6174cda6_9_1 compile;
   Warning: Type altered with compilation errors.

 SQL > show error
   Errors for TYPE TEST.SYS_PLSQL_6174CDA6_9_1:
   LINE/COL ERROR
   -------- -----------------------------------------------------------------
   0/0      PL/SQL: Compilation unit analysis terminated
   1/46     PLS-00201: identifier 'SYS.SYS_PLSQL_750F00_462_1' must be
            declared


The invalid SYS_PLSQL_6174CDA6_9_1 is used in package body, try to compile it:

 SQL > alter package lc_pin# compile body;
   Warning: Package Body altered with compilation errors.

 SQL > show error
   Errors for PACKAGE BODY LC_PIN#:
   LINE/COL ERROR
   -------- -----------------------------------------------------------------
   25/8     PL/SQL: ORA-04020: deadlock detected while trying to lock object
            TEST.SYS_PLSQL_6174CDA6_31_1
   25/8     PL/SQL: SQL Statement ignored
   33/8     PL/SQL: Statement ignored
   33/17    PLS-00364: loop index variable 'C' use is invalid


where Line 25 (see attached TestCode) is
  with sq as (select * from table(foo))

Now we have SYS_PLSQL_6174CDA6_9_1 and LC_PIN# package body invalid, but LC_PIN# package spec still valid.

The above compile body took a few seconds. If we open a new Sqlplus session, and run a query on v$wait_chains during the compile body:

 select chain_signature, to_char(p1, 'xxxxxxxxxxxxxxxxxxxx') p1, p1_text,
    to_char(p2, 'xxxxxxxxxxxxxxxxxxxxxxxx') p2, p2_text,
    to_char(p3, 'xxxxxxxxxxxxxxxxx') p3, p3_text,
       in_wait_secs, time_remaining_secs
 from v$wait_chains;


it shows CHAIN_SIGNATURE: 'library cache lock' (cycle) (v$session.event: 'library cache lock', see later discussion):

CHAIN_SIGNATURE P1 P1_TEXT P2 P2_TEXT P3 P3_TEXT IN_WAIT_SECS TIME_REMAINING_SECS
'library cache lock' (cycle) 15ab8f290 handle address 15f2189a8 lock address 585a300010003 100*mode+namespace 1 898


It will also generates an trace file, which contains a line:
     ORA-04020: deadlock detected while trying to lock object TEST.SYS_PLSQL_6174CDA6_31_1

(In 11gR2, they are "library cache pin" and "ORA-04021: timeout occurred while waiting to lock object".
see Blog: Oracle 11gR2 single session "library cache pin")

A quick workaround is to recompile the package spec even it is still valid:

     alter package lc_pin# compile;

It re-compiled SYS_PLSQL_6174CDA6_9_1 (TYPE) and LC_PIN# (PACKAGE BODY), but not LC_PIN# (PACKAGE).

After the re-compilation, all are valid, we can run the query:

   select * from table(lc_pin#.soo);    

And watch object dependencies currently loaded in the shared pool by:

 select (select to_name from v$object_dependency where to_hash = d.from_hash and rownum=1) from_name
       ,(select sql_text from v$sql where hash_value = d.from_hash) sql_text
       ,d.*
 from v$object_dependency d
 where to_name like 'SYS_PLSQL_6174CDA6%' or to_name like 'SYS_PLSQL_750F00%' or to_name = 'LC_PIN#'
 order by to_name;


Oracle view v$object_dependency has to_name column, but not from_name column, we can add two from_name columns with query:

with sq as (select hash_value, name, type from v$db_object_cache v where name = 'LC_PIN#')
select (select name||' ('||type||')' from v$db_object_cache where hash_value = from_hash and rownum=1) from_name_1
      ,(select to_name from v$object_dependency where to_hash = d.from_hash and rownum=1) from_name_2
      ,(select substr(sql_text, 1, 20) from v$sql where hash_value = d.from_hash and rownum=1) sql_text
      ,d.*
  from v$object_dependency d
 where from_hash in (select hash_value from sq) or to_hash in (select hash_value from sq) 
 order by from_hash, to_name;

Reasoning:


The problem seems caused by the "with" factoring clause (see TestCode, function soo in package body lc_pin#).

When Oracle parses "with" factoring clause, it acquires a "library cache pin" in the Share Mode (S) on the dependent objects, in this case, it is "t_vc_tab", then it proceeds to main clause, in which it realizes that the dependent object: "t_dba_row_tab" is invalid. In order to resolve this invalid, Oracle attempts to recompile package spec, which requests Exclusive Mode (X) on the related objects.

Since the already held mode (S) on "t_vc_tab" is not consistent with requesting mode (X), Oracle session generated a dump:

 A deadlock among DDL and parse locks is detected.
 ORA-04020: deadlock detected while trying to lock object TEST.SYS_PLSQL_6174CDA6_31_1

 Short stack dump of current process:
  ksedsts()+244<-ksfdmp()+59<-kgllkde()+742<-kglLockWait()+1216<-kgllkal()+1493<-kglLock()+1533
  <-kglget()+354<-kokt_gtyp()+1195<-koktgdptt()+1678<-koktsdcu()+327<-koktdrp()+3321<-opiexe()+23402

 --------------------------------------------------------
   object   waiting  waiting       blocking blocking
   handle   session     lock mode   session     lock mode
 --------  -------- -------- ----  -------- -------- ----
 15ab8f290  18fbfb3c0 15f2189a8    X  18fbfb3c0 165dbbe28    S


 ------------- WAITING LOCK -------------
 SO: 0x15f2189a8, type: 96, owner: 0x180658498, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
  LibraryObjectLock:  Address=15f2189a8 Handle=15ab8f290 RequestMode=X
   CanBeBrokenCount=9 Incarnation=7 ExecutionCount=0
   User=18fbfb3c0 Session=18fbff560 ReferenceCount=0
   Flags=[0000] SavepointNum=2043e
   LibraryHandle:  Address=15ab8f290 Hash=f71a33e4 LockMode=S PinMode=S LoadLockMode=0 Status=VALD

 ------------- BLOCKING LOCK ------------
 SO: 0x165dbbe28, type: 96, owner: 0x15f102fe0, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
  LibraryObjectLock:  Address=165dbbe28 Handle=15ab8f290 Mode=S
   CallPin=155fbeed8 CanBeBrokenCount=9 Incarnation=7 ExecutionCount=0
   User=18fbfb3c0 Session=18fbfb3c0 ReferenceCount=1
   Flags=CNB/PNC/[0003] SavepointNum=203a9
  LibraryHandle:  Address=15ab8f290 Hash=f71a33e4 LockMode=S PinMode=S LoadLockMode=0 Status=VALD


If we quickly select on v$wait_chains by:

 select chain_signature, to_char(p1, 'xxxxxxxxxxxxxxxxxxxx') p1, p1_text,
    to_char(p2, 'xxxxxxxxxxxxxxxxxxxxxxxx') p2, p2_text,
    to_char(p3, 'xxxxxxxxxxxxxxxxx') p3, p3_text,
       in_wait_secs, time_remaining_secs
 from v$wait_chains;


We got:

CHAIN_SIGNATURE P1 P1_TEXT P2 P2_TEXT P3 P3_TEXT IN_WAIT_SECS TIME_REMAINING_SECS
'library cache lock' (cycle) 15ab8f290 handle address 15f2189a8 lock address 585a300010003 100*mode+namespace 1 898


Although time_remaining_secs is 898 seconds (total wait_time is about 900), the above row disappeared after 9 seconds,
probably because the session already generated the dump. The time_remaining_secs is still calculated with 15 minutes "library cache pin" timeout in 11gR2 (Blog: Oracle 11gR2 single session "library cache pin")

Total wait_time of 900 seconds (15 minutes) seems coming from default setting of "_kgl_time_to_wait_for_locks" (time in minute, modification requires instance restart), which controls time to wait for locks and pins before timing out (see Blog: One Oracle AQ Deadlock ). But with Oracle 12c, 15 minutes (900 seconds) is only for display, no more real honoured.

A further query:

  select (select kglnaobj||'('||kglobtyd||')'
            from x$kglob v 
           where kglhdadr = object_handle and rownum=1) kglobj_name
         ,v.*
  from v$libcache_locks v
  where v.holding_user_session  =
           (select saddr from v$session
             where event ='library cache lock' and rownum = 1)
    and object_handle in (select object_handle from v$libcache_locks where mode_requested !=0)
  order by kglobj_name, holding_user_session, type, mode_held, mode_requested;


shows there are two rows on SYS_PLSQL_6174CDA6_31_1(TYPE) with TYPE: LOCK.
In the first row, the HOLDING_USER_SESSION and HOLDING_SESSION are different, it has
 MODE_REQUESTED = 3 (Exclusive mode).

KGLOBJ_NAME TYPE ADDR HOLDING_USER_SESSION HOLDING_SESSION OBJECT_HANDLE LOCK_HELD REFCOUNT MODE_HELD MODE_REQUESTED SAVEPOINT_NUMBER CON_ID
SYS_PLSQL_6174CDA6_31_1(TYPE) LOCK 15F2189A8 18FBFB3C0 18FBFF56015AB8F290 0 0 0 3 132158 0
SYS_PLSQL_6174CDA6_31_1(TYPE) LOCK 165DBBE28 18FBFB3C0 18FBFB3C0 15AB8F290155FBEED8 1 2 0 132009 0
(some leading ZEROs are removed)

From the query result, we can see that HOLDING_USER_SESSION already held a LOCK mode of 2(Share mode), but at the same time designates a different recursive session to request a LOCK mode of 3(Exclusive mode). The column SAVEPOINT_NUMBER seems recording the sequence of LOCK GET (132009) and REQUEST(132158), so first has GET, then REQUEST (132009 < 132158).

Oracle throws such deadlock since both GET and REQUEST are originated from same HOLDING_USER_SESSION.

Crossing check with above dump file under "WAITING LOCK", we can see:
    User=18fbfb3c0 Session=18fbff560
where HOLDING_USER_SESSION (18fbfb3c0) is different from HOLDING_SESSION (18fbff560), but under "BLOCKING LOCK", both are same (18fbfb3c0).

SavepointNum is hex: 2043e (decimal 132158), and 203a9 (decimal 132009).

In Oracle, HOLDING_USER_SESSION is the session we see in v$session, whereas HOLDING_SESSION is the recursive session when both are not the same. Normally recursive session is spawned out when HOLDING_USER_SESSION requires "SYS" user privilege to perform certain tasks.

By the way, recursive session is not exported in v$session because of filter predicate:
    bitand("s"."ksuseflg",1)<>0
on x$ksuse, where bitand (s.ksuseflg, 19) =
    1     for 'USER' session,
    17   for 'BACKGROUND',
    2     for 'RECURSIVE'.

(Tanel has a Blog on recursive session)

One more point, in 11.2.0.3.0, the names of generated types are composed by DBA_OBJECTS.object_id of lc_pin# and dba_tables, but in 12c, they are named "6174CDA6" and "750F00". It is not clear to me where they come from.

Repeat the same test in 11.2.0.4.0, the behavior is same as 12c, but the name convention is similar to 11.2.0.3.0.

TYPE Dropping


In the above discussion, we manually drop the type to force the invalid by:

 drop type SYS.SYS_PLSQL_750F00_462_1 force;

Before upgrade to 12c, the query in 11gR2:

 select owner, object_name, object_type
 from dba_objects
 where (object_name like 'LC_PIN#'
     or object_name like 'SYS_PLSQL_1349946%' or object_name like 'SYS_PLSQL_3238%'
     or object_name like 'SYS_PLSQL_6174CDA6%' or object_name like 'SYS_PLSQL_750F00%')
 order by object_name;


returns:

 TEST LC_PIN#                    PACKAGE BODY
 TEST LC_PIN#                    PACKAGE
 TEST SYS_PLSQL_1349946_21_1     TYPE
 TEST SYS_PLSQL_1349946_31_1     TYPE
 TEST SYS_PLSQL_1349946_9_1      TYPE
 TEST SYS_PLSQL_1349946_DUMMY_1  TYPE
 SYS  SYS_PLSQL_3238_382_1       TYPE
 SYS  SYS_PLSQL_3238_DUMMY_1     TYPE


and after the upgrade to 12c, it returns:

 TEST LC_PIN#                     PACKAGE BODY
 TEST LC_PIN#                     PACKAGE
 TEST SYS_PLSQL_6174CDA6_21_1     TYPE
 TEST SYS_PLSQL_6174CDA6_31_1     TYPE
 TEST SYS_PLSQL_6174CDA6_9_1      TYPE
 TEST SYS_PLSQL_6174CDA6_DUMMY_1  TYPE
 SYS  SYS_PLSQL_750F00_462_1      TYPE
 SYS  SYS_PLSQL_750F00_DUMMY_1    TYPE


so all the TYPES in 12c are newly created, and the TYPES in 11gR2 were dropped during upgrade.

Look the DDL of DBA_TABLES,
  select * from  dba_tab_cols where table_name = 'DBA_TABLES' order by column_name; 

we can see that is is augmented to 65 Columns in 12c from 55 in 11gR2. Query:
  select * from dba_objects where object_name = 'DBA_TABLES';
shows the created time and last_ddl_time.

CLEANUP_NON_EXIST_OBJ Task


Oracle 12c has introduced a few CLEANUP JOBs. We can see them by:

 select * from dba_scheduler_jobs
 where job_name like 'CLEANUP%';
  

output:

 CLEANUP_NON_EXIST_OBJ
 CLEANUP_ONLINE_IND_BUILD
 CLEANUP_ONLINE_PMO
 CLEANUP_TAB_IOT_PMO
 CLEANUP_TRANSIENT_PKG
 CLEANUP_TRANSIENT_TYPE


Look JOB: CLEANUP_NON_EXIST_OBJ, the comments Column said:

 Cleanup Non Existent Objects in obj$

and the job_action Column:

 declare
   myinterval   number;
 begin
   myinterval := dbms_pdb.cleanup_task (1);
   if myinterval <> 0
   then
     next_date := systimestamp + numtodsinterval (myinterval, 'second');
   end if;
 end;


If we run the above block, the NON-EXISTENT object, in this case,
        SYS.SYS_PLSQL_750F00_462_1,
will be removed.

Open package SYS.dbms_pdb, it is documented as:

        The following routine is related to operations done in SMON
        until 11.2. But, with the introduction of PDBs in 12c and with
        the possibility of having multiple PDBs in a single
        CDB, we want to move this cleanup out of SMON so that SMON is not
        overloaded with work that can be done in some other background process.
        The goal is to move everything except transaction recovery out of SMON.


        cleanup_task - cleanup task previously done in SMON
        This procedure performs cleanup task previously done in SMON


It looks like the type was dropped by SMON in 11gR2; but in 12c, performed by a dedicated JOB.

But in one 12c instance crash, SMON dump contains one section on its most recent 20 activities:

  SMON's recent activities: (most recent first)
    Activity  1:              Active transaction recovery -  started 2205" ago, not ended 
    Activity  2:                  Doing Sort Segment work -  0" (started 2205" ago)
    ...
    Activity  9:                         Coalesce extents -  1" (started 2221" ago)
    Activity 10:                      Auto-tuning for IMU -  0" (started 2221" ago)
    Activity 11:                        scn-->time update -  0" (started 2221" ago)    
    ...
    Activity 20:        read afn from prewarm fet channel -  0" (started 2521" ago)
  End -- SMON's most recent 20 activities.
in which we can see SMON 10 activities.
  Active transaction recovery
  Auto-tuning for IMU
  Coalesce extents
  Doing Sort Segment work
  Launch background space manager
  Offline undo segments
  read afn from prewarm fet channel
  Read from broadcast channel
  read from segment info channel
  scn-->time update
So SMON is not only perfroming Active transaction recovery, but also other 9 (at least) activities. It is not clear how to match above SYS.dbms_pdb.cleanup_task package description with real SMON dump. (In general, both sources are higly reliable since the description should be wriiten by the developers, and SMON dump reports real existed facts, which have been performed.)

There are also other 12c new JOBs:

 select * from dba_scheduler_jobs where owner = 'SYS';

For example, FILE_SIZE_UPD, whose comments said: Update file size periodically, and job_action:

 declare
   myinterval   number;
 begin
   myinterval := dbms_pdb.cleanup_task (7);
   if myinterval <> 0
   then
     next_date := systimestamp + numtodsinterval (myinterval, 'second');
   end if;
 end;



TestCode


 -- This test is with dba_tables.
 -- It is also reproducible with dba_segments, dba_objects, dba_indexes.

drop package lc_pin#;

create or replace package lc_pin#
 as
   type t_dba_row_tab is table of sys.dba_tables%rowtype; 
   type t_vc          is record (name varchar2(30));
   type t_vc_tab      is table of t_vc;
    
   function foo return t_vc_tab pipelined;
   function koo return t_dba_row_tab pipelined;
   function soo return t_dba_row_tab pipelined;
 end lc_pin#;
 /

create or replace package body lc_pin#
 as

  function foo return t_vc_tab pipelined
   is
     l_result  t_vc;
   begin
     l_result.name     := 'lc_test';
     pipe row(l_result);
     return;
   end foo;  
   
  function koo return t_dba_row_tab pipelined
   is
   begin
     for c in (select * from dba_tables where  rownum = 1) loop
       pipe row(c);
     end loop;
   end koo;
  
  function soo return t_dba_row_tab pipelined
   is
   begin
     for c in (
       with sq as (select * from table(foo))    -- Line 25
       select nt.*
      from   sq 
            ,(select * from table(koo)) nt
      
      -- following re-write works
      -- select nt.* from (select * from table(foo)) sq, (select * from table(koo)) nt
     ) loop
       pipe row(c);                 -- Line 33
     end loop;
   end soo;
  
end lc_pin#;
/