Monday, November 2, 2015

Oracle 12c single session "library cache lock (cycle)" deadlock

Note: This is a 12c follow-up of the previous Blog: Oracle 11gR2 single session "library cache pin"

library cache lock (cycle)

Recently we observed a single session "library cache lock (cycle)" deadlock when upgrading to 12c from 11gR2.

First, run the attached TestCode (at the end of this Blog) to create the lc_pin# package and package body. The query:

 select owner, object_name, object_type from dba_objects
 where last_ddl_time > sysdate -10/1440
 order by object_name;

lists all newly created objects:

 TEST LC_PIN#                    PACKAGE BODY
 TEST LC_PIN#                    PACKAGE
 SYS  SYS_PLSQL_750F00_462_1     TYPE

 select * from dba_source
 where name like 'SYS_PLSQL_6174CDA6%' or name like 'SYS_PLSQL_750F00%'
 order by name, line;

shows the mapping between the types defined in LC_PIN# and the newly generated types:

 SYS_PLSQL_6174CDA6_21_1     for t_vc
 SYS_PLSQL_6174CDA6_31_1     for t_vc_tab
 SYS_PLSQL_6174CDA6_9_1      for t_dba_row_tab
 SYS_PLSQL_6174CDA6_DUMMY_1  for index table of SYS_PLSQL_6174CDA6_31_1
 SYS_PLSQL_750F00_462_1      for sys.dba_tables%rowtype
 SYS_PLSQL_750F00_DUMMY_1    for index table of SYS_PLSQL_6174CDA6_9_1

Now we drop the generated SYS type (the reason is discussed later, under TYPE Dropping):

 SQL > drop type SYS.SYS_PLSQL_750F00_462_1 force;

SYS_PLSQL_750F00_462_1 is no longer listed in dba_objects, but it is still retained in sys.obj$:

 select * from sys.obj$
 where mtime > sysdate -10/1440
 order by mtime;

In sys.obj$, however, it has changed from type# 13 (TYPE) to type# 10 (which Oracle names a NON-EXISTENT object).
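
The type# change can be checked directly for the dropped object. The query below is a minimal sketch, assuming SYS access and the generated name from this test (the hex part of the name will differ in other databases). The decode labels cover only the two type# values mentioned above.

```sql
-- Show how the dropped type is recorded in sys.obj$.
-- Assumption: the object name is the one generated in this test run.
select obj#,
       name,
       type#,
       decode(type#, 10, 'NON-EXISTENT',
                     13, 'TYPE',
                         'OTHER') type_label,
       mtime
from   sys.obj$
where  name = 'SYS_PLSQL_750F00_462_1';
```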

Since SYS_PLSQL_6174CDA6_9_1 has become invalid, we try to recompile it:

 SQL > alter type test.sys_plsql_6174cda6_9_1 compile;
   Warning: Type altered with compilation errors.

 SQL > show error
   Errors for TYPE TEST.SYS_PLSQL_6174CDA6_9_1:
   -------- -----------------------------------------------------------------
   0/0      PL/SQL: Compilation unit analysis terminated
   1/46     PLS-00201: identifier 'SYS.SYS_PLSQL_750F00_462_1' must be

Compile package body:

 SQL > alter package lc_pin# compile body;
   Warning: Package Body altered with compilation errors.

 SQL > show error
   Errors for PACKAGE BODY LC_PIN#:
   -------- -----------------------------------------------------------------
   25/8     PL/SQL: ORA-04020: deadlock detected while trying to lock object
   25/8     PL/SQL: SQL Statement ignored
   33/8     PL/SQL: Statement ignored
   33/17    PLS-00364: loop index variable 'C' use is invalid

where Line 25 (see attached TestCode) is
  with sq as (select * from table(foo))

Now SYS_PLSQL_6174CDA6_9_1 and the LC_PIN# package body are invalid, but the LC_PIN# package spec is still valid.
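
The validity of each object can be confirmed from dba_objects. This is only a sketch: the name pattern 6174CDA6 is the one generated in this test and is an assumption for any other database.

```sql
-- List the status (VALID / INVALID) of the package and its generated types.
-- Assumption: the hex part (6174CDA6) is specific to this test database.
select owner, object_name, object_type, status
from   dba_objects
where  object_name = 'LC_PIN#'
   or  object_name like 'SYS_PLSQL_6174CDA6%'
order by object_name, object_type;
```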

A quick workaround is to recompile the package spec even though it is valid:
    alter package lc_pin# compile;

which re-compiled SYS_PLSQL_6174CDA6_9_1 (TYPE) and LC_PIN# (PACKAGE BODY), but not LC_PIN# (PACKAGE).

After the recompilation, all objects are valid, and you can run the query:

   select * from table(lc_pin#.soo);    

And object dependencies currently loaded in the shared pool can be shown by:

 select (select to_name from v$object_dependency where to_hash = d.from_hash and rownum=1) from_name
       ,(select sql_text from v$sql where hash_value = d.from_hash) sql_text
 from v$object_dependency d
 where to_name like 'SYS_PLSQL_6174CDA6%' or to_name like 'SYS_PLSQL_750F00%' or to_name = 'LC_PIN#'
 order by to_name;
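
For comparison, the dependencies stored in the data dictionary (as opposed to those currently loaded in the shared pool) can be listed from dba_dependencies. This is a sketch that reuses the same name patterns, which are specific to this test.

```sql
-- Dictionary-level dependencies on the package and the generated types.
-- Assumption: the hex parts (6174CDA6, 750F00) come from this test run.
select owner, name, type,
       referenced_owner, referenced_name, referenced_type
from   dba_dependencies
where  referenced_name like 'SYS_PLSQL_6174CDA6%'
   or  referenced_name like 'SYS_PLSQL_750F00%'
   or  referenced_name = 'LC_PIN#'
order by referenced_name, name;
```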


The problem seems to be caused by the "with" subquery factoring clause (see TestCode, function soo in package body lc_pin#).

When Oracle parses the "with" factoring clause, it acquires a "library cache pin" in Share mode (S) on the dependent object, in this case "t_vc_tab". It then proceeds to the main clause, where it finds that the dependent object "t_dba_row_tab" is invalid. To resolve the invalid object, Oracle attempts to recompile the package spec, which requests Exclusive mode (X) on the related objects.

Since the mode already held (S) on "t_vc_tab" is incompatible with the requested mode (X), the Oracle session generated a dump:

 A deadlock among DDL and parse locks is detected.
 ORA-04020: deadlock detected while trying to lock object TEST.SYS_PLSQL_6174CDA6_31_1
   object   waiting  waiting       blocking blocking
   handle   session     lock mode   session     lock mode
 --------  -------- -------- ----  -------- -------- ----
 15ab8f290  18fbfb3c0 15f2189a8    X  18fbfb3c0 165dbbe28    S

 ------------- WAITING LOCK -------------
 SO: 0x15f2189a8, type: 96, owner: 0x180658498, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
  LibraryObjectLock:  Address=15f2189a8 Handle=15ab8f290 RequestMode=X
   CanBeBrokenCount=9 Incarnation=7 ExecutionCount=0
   User=18fbfb3c0 Session=18fbff560 ReferenceCount=0
   Flags=[0000] SavepointNum=2043e
   LibraryHandle:  Address=15ab8f290 Hash=f71a33e4 LockMode=S PinMode=S LoadLockMode=0 Status=VALD

 ------------- BLOCKING LOCK ------------
 SO: 0x165dbbe28, type: 96, owner: 0x15f102fe0, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
  LibraryObjectLock:  Address=165dbbe28 Handle=15ab8f290 Mode=S
   CallPin=155fbeed8 CanBeBrokenCount=9 Incarnation=7 ExecutionCount=0
   User=18fbfb3c0 Session=18fbfb3c0 ReferenceCount=1
   Flags=CNB/PNC/[0003] SavepointNum=203a9
  LibraryHandle:  Address=15ab8f290 Hash=f71a33e4 LockMode=S PinMode=S LoadLockMode=0 Status=VALD
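
To locate such a dump, one option is to ask the session for its trace file name. This is a sketch that assumes the deadlock dump was written to the default trace file of the session that received ORA-04020, and that it is run from that same session.

```sql
-- Run in the session that hit ORA-04020.
-- Assumption: the dump went to this session's default trace file.
select value trace_file
from   v$diag_info
where  name = 'Default Trace File';
```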

If we quickly query v$wait_chains with:

 select chain_signature, to_char(p1, 'xxxxxxxxxxxxxxxxxxxx') p1, p1_text,
    to_char(p2, 'xxxxxxxxxxxxxxxxxxxxxxxx') p2, p2_text,
    to_char(p3, 'xxxxxxxxxxxxxxxxx') p3, p3_text,
       in_wait_secs, time_remaining_secs
 from v$wait_chains;

we get (the row is shown from p1 onwards):

 p1                  = 15ab8f290      (handle address)
 p2                  = 15f2189a8      (lock address)
 p3                  = 585a300010003  (100*mode+namespace)
 in_wait_secs        = 1
 time_remaining_secs = 898

Although time_remaining_secs is more than 800 seconds, the above row disappeared after 9 seconds, probably because the session had already generated the dump. The time_remaining_secs value still reflects the 15-minute "library cache pin" timeout seen in 11gR2 (Blog: Oracle 11gR2 single session "library cache pin").
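
The p3 value can be split into its parts. The sketch below assumes a layout that is not confirmed by this test: lock mode in the lowest 2 bytes, namespace in the next 2 bytes, and an object id in the remaining upper bytes. Under that assumption the lowest part evaluates to 3, which would agree with RequestMode=X in the dump above.

```sql
-- Split p3 = 0x585a300010003 into three parts.
-- Assumption (not verified here): low 16 bits = lock mode,
-- next 16 bits = namespace, upper bits = object id.
with w as (
  select to_number('585a300010003', 'xxxxxxxxxxxxx') p3 from dual
)
select mod(p3, 65536)                lock_mode,
       mod(trunc(p3 / 65536), 65536) namespace,
       trunc(p3 / 4294967296)        object_id
from   w;
```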

A further query:

  select (select kglnaobj||'('||kglobtyd||')'
            from x$kglob k
           where k.kglhdadr = v.object_handle and rownum = 1) kglobj_name
        ,v.*
  from v$libcache_locks v
  where v.holding_user_session =
           (select saddr from v$session
             where event = 'library cache lock' and rownum = 1)
    and v.object_handle in (select object_handle from v$libcache_locks where mode_requested != 0)
  order by kglobj_name, holding_user_session, type, mode_held, mode_requested;

shows two rows on SYS_PLSQL_6174CDA6_31_1(TYPE) with TYPE: LOCK. In the row with MODE_REQUESTED = 3 (Exclusive mode), HOLDING_USER_SESSION and HOLDING_SESSION are different.

(columns: KGLOBJ_NAME, TYPE, ADDR, HOLDING_USER_SESSION, HOLDING_SESSION, OBJECT_HANDLE, LOCK_HELD, REFCOUNT, MODE_HELD, MODE_REQUESTED, SAVEPOINT_NUMBER, CON_ID)

SYS_PLSQL_6174CDA6_31_1(TYPE) LOCK 15F2189A8 18FBFB3C0 18FBFF560 15AB8F290 0         0 0 3 132158 0
SYS_PLSQL_6174CDA6_31_1(TYPE) LOCK 165DBBE28 18FBFB3C0 18FBFB3C0 15AB8F290 155FBEED8 1 2 0 132009 0
(some leading ZEROs are removed)

From the query result, we can see that HOLDING_USER_SESSION already holds a LOCK in mode 2 (Share mode), but at the same time designates a different recursive session to request a LOCK in mode 3 (Exclusive mode). The column SAVEPOINT_NUMBER seems to record the sequence of LOCK GET (132009) and REQUEST (132158), so the GET came first, then the REQUEST (132009 < 132158).

Oracle raises this deadlock because both GET and REQUEST originate from the same HOLDING_USER_SESSION.
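
The same rows can be made easier to read by translating the numeric modes and flagging the recursive requester. This is a sketch only: the handle address is the one from this test run (padded with the removed leading zeros) and will differ in any other run, and the mode labels assume the usual library cache lock encoding (0 None, 1 Null, 2 Share, 3 Exclusive).

```sql
-- Order the locks on one library cache handle by SAVEPOINT_NUMBER.
-- Assumption: 000000015AB8F290 is the handle address seen in this test.
select type,
       holding_user_session,
       holding_session,
       case when holding_user_session <> holding_session
            then 'RECURSIVE' else 'USER' end requester,
       decode(mode_held,      0, 'None', 1, 'Null', 2, 'Share', 3, 'Exclusive',
              to_char(mode_held))      mode_held_txt,
       decode(mode_requested, 0, 'None', 1, 'Null', 2, 'Share', 3, 'Exclusive',
              to_char(mode_requested)) mode_requested_txt,
       savepoint_number
from   v$libcache_locks
where  type = 'LOCK'
  and  object_handle = hextoraw('000000015AB8F290')
order by savepoint_number;
```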

Cross-checking with the dump file above, under "WAITING LOCK" we can see:
    User=18fbfb3c0 Session=18fbff560
where HOLDING_USER_SESSION (18fbfb3c0) is different from HOLDING_SESSION (18fbff560), but under "BLOCKING LOCK" both are the same (18fbfb3c0).

SavepointNum is in hex: 2043e (decimal 132158) and 203a9 (decimal 132009).
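
The hex to decimal conversion can be reproduced in SQL, which ties the SavepointNum values in the dump to SAVEPOINT_NUMBER in v$libcache_locks. The column aliases are illustrative names only.

```sql
-- Convert the two SavepointNum values from the dump to decimal.
-- Returns 132158 and 132009.
select to_number('2043e', 'xxxxx') waiting_savepoint,
       to_number('203a9', 'xxxxx') blocking_savepoint
from   dual;
```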

In Oracle, HOLDING_USER_SESSION is the session we see in v$session, whereas HOLDING_SESSION is the recursive session when the two differ. Normally a recursive session is spawned when HOLDING_USER_SESSION requires "SYS" user privileges to perform certain tasks.

By the way, recursive sessions are not exposed in v$session because of the filter predicate on x$ksuse, where bitand (s.ksuseflg, 19) = 1 for a 'USER' session, = 17 for 'BACKGROUND', and = 2 for 'RECURSIVE'.
(Tanel has a Blog on recursive sessions)
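
To see the recursive sessions that v$session hides, x$ksuse can be queried directly as SYS. This is a sketch under assumptions: x$ fixed tables are undocumented, and the column names used here (indx, ksuseser, ksuseflg, ksspaflg) are taken from the v$session view definition and may vary between versions.

```sql
-- Connected as SYS: list sessions including RECURSIVE ones.
-- Assumption: column names as used in the v$session definition.
select s.addr     saddr,
       s.indx     sid,
       s.ksuseser serial#,
       decode(bitand(s.ksuseflg, 19),
              17, 'BACKGROUND',
               1, 'USER',
               2, 'RECURSIVE',
                  '?') session_type
from   x$ksuse s
where  bitand(s.ksspaflg, 1) != 0
order by session_type, sid;
```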

One more point: in 11gR2, the names of the generated types are composed from the DBA_OBJECTS.object_id of lc_pin# and dba_tables, but in 12c they are named "6174CDA6" and "750F00". It is not clear to me where these come from.

Repeat the same test in, the behavior is same as 12c, but the name convention is similar to

TYPE Dropping

In the above discussion, we dropped the type manually to force the invalidation:

 drop type SYS.SYS_PLSQL_750F00_462_1 force;

Before the upgrade to 12c, the following query, run in 11gR2, returns the first list below:

 select owner, object_name, object_type
 from dba_objects
 where (object_name like 'LC_PIN#'
     or object_name like 'SYS_PLSQL_1349946%' or object_name like 'SYS_PLSQL_3238%'
     or object_name like 'SYS_PLSQL_6174CDA6%' or object_name like 'SYS_PLSQL_750F00%')
 order by object_name;


 TEST LC_PIN#                    PACKAGE BODY
 TEST LC_PIN#                    PACKAGE
 TEST SYS_PLSQL_1349946_21_1     TYPE
 TEST SYS_PLSQL_1349946_31_1     TYPE
 TEST SYS_PLSQL_1349946_9_1      TYPE
 SYS  SYS_PLSQL_3238_382_1       TYPE

and after the upgrade to 12c, it returns:

 TEST LC_PIN#                     PACKAGE BODY
 TEST LC_PIN#                     PACKAGE
 TEST SYS_PLSQL_6174CDA6_21_1     TYPE
 TEST SYS_PLSQL_6174CDA6_31_1     TYPE
 TEST SYS_PLSQL_6174CDA6_9_1      TYPE
 SYS  SYS_PLSQL_750F00_462_1      TYPE

So all the TYPEs in 12c are newly created, and the TYPEs from 11gR2 were dropped during the upgrade.

Looking at the columns of DBA_TABLES:
  select * from  dba_tab_cols where table_name = 'DBA_TABLES' order by column_name; 

we can see that it has grown from 55 columns in 11gR2 to 65 columns in 12c. The query:
  select * from dba_objects where object_name = 'DBA_TABLES';
shows the created time and last_ddl_time.
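
Both checks can be narrowed to the SYS view itself, so that the PUBLIC synonym of the same name is not mixed in. This is a small sketch. Running the first query in each release should give the column counts compared above.

```sql
-- Number of columns of the SYS.DBA_TABLES view in the current release.
select count(*) column_count
from   dba_tab_cols
where  owner = 'SYS'
  and  table_name = 'DBA_TABLES';

-- Creation and last DDL time of the view only.
select object_type, created, last_ddl_time
from   dba_objects
where  owner = 'SYS'
  and  object_name = 'DBA_TABLES';
```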


Oracle 12c has introduced a few CLEANUP JOBs. We can see them by:

 select * from dba_scheduler_jobs
 where job_name like 'CLEANUP%';



Looking at the JOB CLEANUP_NON_EXIST_OBJ, its comments column says:

 Cleanup Non Existent Objects in obj$

and the job_action Column:

   declare
     myinterval   number;
   begin
     myinterval := dbms_pdb.cleanup_task (1);
     if myinterval <> 0 then
       next_date := systimestamp + numtodsinterval (myinterval, 'second');
     end if;
   end;

If we run the above block, the NON-EXISTENT object, in this case SYS_PLSQL_750F00_462_1, will be removed.
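
Another way to trigger the same action is to run the scheduler job itself and then look at sys.obj$ again. This is a sketch under assumptions: it was not part of the original test, it presumes a suitably privileged user (for example SYS) is allowed to run the job manually, and the object name is the one from this test.

```sql
-- Assumption: the job may be run on demand by a privileged user.
begin
  dbms_scheduler.run_job(job_name => 'SYS.CLEANUP_NON_EXIST_OBJ');
end;
/

-- If the cleanup removed the NON-EXISTENT entry, this returns no row.
select obj#, name, type#
from   sys.obj$
where  name = 'SYS_PLSQL_750F00_462_1';
```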

Opening the package SYS.dbms_pdb, we find it documented as:

        The following routine is related to operations done in SMON
        until 11.2. But, with the introduction of PDBs in 12c and with
        the possibility of having multiple PDBs in a single
        CDB, we want to move this cleanup out of SMON so that SMON is not
        overloaded with work that can be done in some other background process.
        The goal is to move everything except transaction recovery out of SMON.

        cleanup_task - cleanup task previously done in SMON
        This procedure performs cleanup task previously done in SMON

It looks like the type was dropped by SMON in 11gR2, whereas in 12c this is performed by a dedicated JOB.

But in one 12c instance crash, the SMON dump contains a section on its most recent 20 activities:

  SMON's recent activities: (most recent first)
    Activity  1:              Active transaction recovery -  started 2205" ago, not ended 
    Activity  2:                  Doing Sort Segment work -  0" (started 2205" ago)
    Activity  9:                         Coalesce extents -  1" (started 2221" ago)
    Activity 10:                      Auto-tuning for IMU -  0" (started 2221" ago)
    Activity 11:                        scn-->time update -  0" (started 2221" ago)    
    Activity 20:        read afn from prewarm fet channel -  0" (started 2521" ago)
  End -- SMON's most recent 20 activities.
in which we can see 10 SMON activities:
  Active transaction recovery
  Auto-tuning for IMU
  Coalesce extents
  Doing Sort Segment work
  Launch background space manager
  Offline undo segments
  read afn from prewarm fet channel
  Read from broadcast channel
  read from segment info channel
  scn-->time update
So SMON is not only performing active transaction recovery, but also (at least) 9 other activities. It is not clear how to reconcile the SYS.dbms_pdb.cleanup_task package description above with the real SMON dump. (In general, both sources are highly reliable, since the description should have been written by the developers, and the SMON dump reports activities that were actually performed.)

There are also other new 12c JOBs:

 select * from dba_scheduler_jobs where owner = 'SYS';

For example, FILE_SIZE_UPD, whose comments column says "Update file size periodically", and whose job_action is:

   declare
     myinterval   number;
   begin
     myinterval := dbms_pdb.cleanup_task (7);
     if myinterval <> 0 then
       next_date := systimestamp + numtodsinterval (myinterval, 'second');
     end if;
   end;


TestCode

-- This test is with dba_tables.
-- It is also reproducible with dba_segments, dba_objects, dba_indexes.

drop package lc_pin#;

create or replace package lc_pin# as
   type t_dba_row_tab is table of sys.dba_tables%rowtype; 
   type t_vc          is record (name varchar2(30));
   type t_vc_tab      is table of t_vc;
   function foo return t_vc_tab pipelined;
   function koo return t_dba_row_tab pipelined;
   function soo return t_dba_row_tab pipelined;
end lc_pin#;
/

create or replace package body lc_pin# as

  function foo return t_vc_tab pipelined
  is
     l_result  t_vc;
  begin
     l_result.name := 'lc_test';
     pipe row(l_result);
     return;
  end foo;

  function koo return t_dba_row_tab pipelined
  is
  begin
     for c in (select * from dba_tables where rownum = 1) loop
       pipe row(c);
     end loop;
     return;
  end koo;

  function soo return t_dba_row_tab pipelined
  is
  begin
     for c in (
       with sq as (select * from table(foo))    -- Line 25
       select nt.*
       from   sq 
             ,(select * from table(koo)) nt
       -- the following rewrite works:
       --   select nt.*
       --   from (select * from table(foo)) sq, (select * from table(koo)) nt
     ) loop
       pipe row(c);                 -- Line 33
     end loop;
     return;
  end soo;

end lc_pin#;
/