Friday, April 12, 2013

cursor: pin S wait on X

Oracle introduced mutex (replacing latch) to protect cursor and library cache, and consequently following wait events:

   cursor: mutex S
   cursor: mutex X

   cursor: pin S
   cursor: pin X
   cursor: pin S wait on X

   library cache: mutex S
   library cache: mutex X


Staring in perplexity, I wonder why Oracle conceived such an asymmetrical third wait event for "cursor: pin":

      cursor: pin S wait on X

but not for "cursor: mutex" and "library cache: mutex".

A recent session crash dump shed some light on this deliberate design.

Session Wait History shows that it is desperately waiting for 'cursor: pin S wait on X'.

  waited for 'cursor: pin S wait on X'                               
  idn=0xb9a7f11c, value=0x1fef00000000, where=0x500000000
  wait_id=26539 seq_num=26582 snap_id=1
  wait times: snap=0.010060 sec, exc=0.010060 sec, total=0.010060 sec
  wait times: max=infinite
  wait counts: calls=2 os=2
  occurred after 0.014483 sec of elapsed time


pick the idn value, and search further in the dump file, we find the waited object.

  LibraryHandle:  Address=70000521ff2b1e8 Hash=b9a7f11c LockMode=N PinMode=0 LoadLockMode=0 Status=VALD
   ObjectName:  Name=SELECT NAME FROM
     FullHashValue=28a0bd52de76008db45a3f29b9a7f11c Namespace=SQL AREA(00) Type=CURSOR(00)
     Children: 
       Child:  childNum='0'
         LibraryHandle:  Address=70000521f025fc0 LockMode=N PinMode=0 LoadLockMode=0 Status=VALD
           Name:  Namespace=SQL AREA(00) Type=CURSOR(00)
       Child:  childNum='1'
         LibraryHandle:  Address=70000521d0eeeb0 LockMode=N PinMode=X LoadLockMode=0 Status=VALD
           Name:  Namespace=SQL AREA(00) Type=CURSOR(00)
       Child:  childNum='2'
         LibraryHandle:  Address=70000521dec3790 LockMode=0 PinMode=0 LoadLockMode=0 Status=VALD
           Name:  Namespace=SQL AREA(00) Type=CURSOR(00)
           ObjectFreed=last freed from HPD addn data CBK
       Child:  childNum='3'
         LibraryHandle:  Address=70000521adcce68 LockMode=N PinMode=0 LoadLockMode=0 Status=VALD
           Name:  Namespace=SQL AREA(00) Type=CURSOR(00)

          
A query on dba_hist_active_sess_history reveals that only this session is with SQL_CHILD_NUMBER = 2, all other sessions are with SQL_CHILD_NUMBER = 1. Before and after this session crashes(exit), other sessions are still running.

Look carefully at childNum='1', we can see this child cursor marked with:

    PinMode=X
          
Now a possible interpretation for 'cursor: pin S wait on X' would be:

   When Oracle wants to "pin S" a cursor, it sequentially traverses all child cursors under the same parent.
   If it hits a cursor marked with: PinMode=X before it reaches the matched cursor, it posts:

        cursor: pin S wait on X


1. Stack Trace


Pick UNIX pid of blocked Oracle sessions waiting on Event: "cursor: pin S wait on X", make a stack trace.
Here the outputs on both AIX and Solaris, where the line:
    kkscsSearchChildList
probably indicates the sequential examination along all child cursors.

oracle@aix > procstack 64815340
 64815340: oracledb (LOCAL=NO)
 __fd_select(??, ??, ??, ??, ??) + 0xbc
 select(??, ??, ??, ??, ??) + 0x14
 skgpnap(??, ??, ??, ??) + 0x16c
 skgpwwait(??, ??, ??, ??, ??) + 0x204
 kgxWait(??, ??, ??, ??, ??, ??) + 0x1b8
 kgxModifyRefCount(??, ??, ??, ??) + 0x288
 kgxSharedExamine(??, ??, ??) + 0x3c
 kxsGetRuntimeLock(??, ??, ??, ??, ??) + 0x144
 kkscsCheckCursor(??, ??, ??, ??, ??, ??, ??, ??) + 0x1ec
 kkscsSearchChildList(??, ??, ??, ??, ??, ??, ??, ??) + 0x7b8
 kksfbc(??, ??, ??, ??, ??, ??, ??, ??) + 0x51f4
 kkspsc0(??, ??, ??, ??, ??, ??, ??) + 0x8b0
 kksParseCursor(??) + 0x78
 opiosq0(??, ??, ??, ??, ??) + 0x7f4
 kpooprx(0x7000100fb4086a0, 0x0, 0x0, 0x100000001, 0x0, 0xa4000000a4, 0x0) + 0x16c
 kpoal8(??, ??, ??) + 0x1bc
 opiodr(??, ??, ??, ??) + 0x3b4
 ttcpip(??, ??, ??, ??, ??, ??, ??, ??) + 0x34c
 opitsk(??, ??) + 0x2224
 opiino(??, ??, ??) + 0xb98
 opiodr(??, ??, ??, ??) + 0x3b4
 opidrv(??, ??, ??) + 0x414
 sou2o(??, ??, ??, ??) + 0xf0
 opimai_real(??, ??) + 0x1c0
 ssthrdmain(0x0, 0x200000002, 0xfffffffffffed18) + 0x1c4
 main(??, ??) + 0xd0
 __start() + 0x70


oracle@solaris :/export/home/oracle > pstack 1098
 pollsys  (fffffd7fffdf5860, 0, fffffd7fffdf58f0, 0)
 pselect () + 154
 select () + 72
 skgpwwait () + 168
 kgxWait () + 216
 kgxModifyRefCount () + 15e
 kgxSharedExamine () + 3b
 kxsGetRuntimeLock () + 137
 kkscsCheckCursor () + 22a
 kkscsSearchChildList () + 58f
 kksfbc () + 915
 kkspsc0 () + 9d7
 kksParseCursor () + 74
 opiosq0 () + 70a
 kpooprx () + 102
 kpoal8 () + 308
 opiodr () + 433
 ttcpip () + 593
 opitsk () + 6cc
 opiino () + 3c5
 opiodr () + 433
 opidrv () + 33b
 sou2o () + 7a
 opimai_real () + 265
 ssthrdmain () + 22e
 main () + a4
 ???????? ()

2. Mutex Sleep Reason


We have a SQL statement, whose parsing takes about 11 minutes. If we start two sessions to parse the same statement. One session is parsing, another is waiting for Event: cursor: pin S wait on X.

Collect AWR report for the duration of 11 minutes, Section: Mutex Sleep Summary shows:

Mutex Type Location Sleeps Wait Time (ms)
Cursor Pin kkslce [KKSCHLPIN2] 61,421 616,083
Library Cache kglhdgn2 106 1 0
Library Cache kglpnal1 90 1 0

We can see the 11 minutes of 616,083 Wait Time (ms), Sleeps of 61,421.
That is probably due to Scheduler clock tick frequency of 10 milliseconds, approximately the equation:
    61,421 x 10 = 616,083

The above observation seems conforming to MOS (Doc ID 1268724.1):
   "Cursor: Pin S Wait On X" Contention Mutex Sleep Reason Primarily ' kkslce [KKSCHLPIN2]'

MOS: How to Determine the Blocking Session for Event: 'cursor: pin S wait on X' (Doc ID 786507.1) has a detailed description:
   A session waits on this event when requesting a mutex for shareable operations related to pins (such as executing a cursor),
   but the mutex cannot be granted because it is being held exclusively by another session (which is most likely parsing the cursor).
   The column P2RAW in v$session or v$session_wait gives the blocking session for wait event cursor: pin S wait on X.


and gave the query to check it:

select p2raw,to_number(substr(to_char(rawtohex(p2raw)),1,8),'XXXXXXXX') sid
  from v$session
 where event = 'cursor: pin S wait on X'; 

in 64 bit platforms, 8 bytes are used.
  Top 4 bytes hold the session id (if the mutex is held X)
  Bottom 4 bytes hold the ref count (if the mutex is held S). 


3. cursor: pin S wait on X    Simulation


Setup test:

drop table testt;

create table testt as select 1 id, 'Test' name from dual;

Open a Sqlplus Session,

SQL (72) > select sid, pid, spid from v$session s, v$process p where s.paddr=p.addr and s.sid = sys.dbms_support.mysid;

 SID  PID  SPID
---- ---- -----
  72   34  1122

SQL (72) > alter system flush shared_pool; 

Pick one kks subroutine, for instance, kksFullTypeCheck. On Solaris, execute:

$ sudo dtrace -w -n 'pid$target::kksFullTypeCheck:entry {@[pid, ustack(10, 0)] = count(); stop(); exit(0);}' -p 1122  

On above Sqlplus Session, run:

SQL (72) > select id, name from testt where id = 1;

The Session is stopped.

Open a second Sqlplus Session, and run the same query:

SQL (83) > select sid, pid, spid from v$session s, v$process p where s.paddr=p.addr and s.sid = sys.dbms_support.mysid;

 SID  PID  SPID
---- ---- -----
 83    45  6677

SQL (83) > select id, name from testt where id = 1;

SID 83 is blocked.

Observe the "cursor: pin S wait on X" event by:

select sid, osid, chain_signature, blocker_is_valid, wait_event_text, in_wait_secs, num_waiters
from v$wait_chains where sid in (72, 83);

To unblock, execute:
  $ prun 1122


4. Slow Parsing in multiple Sessions


Further tests show that if a slow parsing takes x seconds for one statement in one single session, concurrently parsing it in n sessions will take a total elapsed time of (n*(n-1)*x + x) seconds for all sessions, even worse if CPU is saturated. So in average, that is about (n-1)*x seconds per session (assume n is relatively big). The main wait event is "cursor: pin S wait on X" among the concurrent sessions.

One workaround to get off the racing condition is to make the statement passed to each session semantic equivalent, but syntax (sql text) different, for example, adding different dummy hint, or appending different constant to "order by" clause in each session's statement. The downside is more space in shared pool.