Thursday, March 28, 2013

Shared Pool - Memory Allocation - unbalanced

Continuing from my last field experience of ORA-04031 (Shared Pool - KKSSP), I would like to share my further observations of shared pool memory allocation.

At first I will expose some figures (facts), then discuss the unbalanced memory allocation among subpools, which is eventually the cause of ORA-04031.

Figures


1. Areas (Subheaps)


There are 959 areas (Oracle 11.2.0.3.0) in the shared pool dump.

The query:

   select count(*) from v$sgastat where pool = 'shared pool';

returns only 881, so not all areas are registered in v$sgastat, for example, "kokcd" and "post agent".
(You will probably see 1062 in Oracle 12.1.0.1.0, and 35, 593, 729, 857 in the earlier releases enumerated in Oracle Performance Firefighting.)
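For reference, the area figures above come from a shared pool heap dump. A minimal sketch of how such a dump can be produced (assuming a suitably privileged session; level 2 covers the SGA heaps, including all shared pool subpools):

   -- dump the SGA heaps (shared pool subpools) into the session trace file
   alter session set events 'immediate trace name heapdump level 2';

   -- or via oradebug:
   -- oradebug setmypid
   -- oradebug dump heapdump 2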

2. Subpools


Subpool sizes can differ by more than 20% (ranging from 2'684'354 KB to 3'355'443 KB, all sizes in KB), and the areas contained in each subpool also differ. There are 959 areas in total, but each subpool holds at most 316 areas.

Name          Subpool_1  Subpool_2  Subpool_3  Subpool_4  Subpool_5  Subpool_6  Subpool_7  Sum
------------  ---------  ---------  ---------  ---------  ---------  ---------  ---------  ----------
Subpool_Size  2'684'354  3'355'443  3'355'443  2'684'354  3'355'443  2'684'354  3'355'443  21'474'836
Area_Count          295        303        308        306        306        308        316         959


Top 5 Areas (all sizes in KB):

Name                   Subpool_1  Subpool_2  Subpool_3  Subpool_4  Subpool_5  Subpool_6  Subpool_7  Sum
---------------------  ---------  ---------  ---------  ---------  ---------  ---------  ---------  ---------
KKSSP                    617'693    563'340    590'296    590'798    577'532    643'504    577'237  4'160'404
db_block_hash_buckets    529'039    535'429    529'039    539'525    535'429    529'044    535'429  3'732'935
KGLH0                    554'532    485'312    464'006    353'634    450'528    346'045    332'455  2'986'514
SQLA                       1'110    588'943    565'317    185'664    574'008    155'498    272'452  2'342'994
free memory              306'562    358'029    353'146    306'659    342'194    325'386    304'535  2'296'513


3. Single Area Allocation


There are 649 areas allocated to only one single subpool. 110 areas are allocated to two subpools, but often one of the two gets merely 24 Bytes and the other gets the rest; for example, FileOpenBlock has 24 Bytes in Subpool_6 and 510,025,424 Bytes in Subpool_7.

  Here are the top 10 (all sizes in Bytes):
                                                                
Name                    Subpool_1   Subpool_2   Subpool_3   Subpool_4    Subpool_5   Subpool_6   Subpool_7    Sum         ALLOC_COUNT
----------------------- ----------- ----------- ----------- ------------ ----------- ----------- ------------ ----------- -----------
FileOpenBlock                                                                        24          510,025,424  510,025,448           2
enqueue                 39,257,104                                                               24           39,257,128            2
KSK VT POOL                                                 24           19,130,696                           19,130,720            2
ktlbk state objects                             17,971,200                                                    17,971,200            1
Wait History Segment                                                                 15,733,120               15,733,120            1
Global Context                                              11,017,024                                        11,017,024            1
call                    24           10,057,864                                                               10,057,888            2
keswx:plan en                        10,011,896                                                               10,011,896            1
KEWS sesstat values                                         9,802,944                                         9,802,944             1
FileIdentificatonBlock               24         7,711,368                                                     7,711,392             2

4. Unbalanced Areas


The following area allocations are not well balanced (difference from the average > 30%).

  The top 10 are (all sizes in Bytes):

Name                 Subpool_1   Subpool_2   Subpool_3   Subpool_4    Subpool_5   Subpool_6   Subpool_7    Sum           ALLOC_COUNT
-------------------- ----------- ----------- ----------- ------------ ----------- ----------- ------------ ------------- -----------
SQLA                 1,110,321   588,943,273 565,317,412 185,664,243  574,008,432 155,498,158 272,452,386  2,342,994,419           7
FileOpenBlock                                                                     24          510,025,424  510,025,448             2
SQLP                 761,808     66,155,040  61,548,864  15,324,936   57,623,952  10,234,544  21,440,800   233,089,944             7
KGLS                 737,792     9,169,960   8,730,128   4,925,272    8,779,216   4,559,760   5,985,728    42,887,856              7
KQR M PO             30,720      799,232     512         255,896      30,568,128  138,680     8,590,256    40,383,424              7
enqueue              39,257,104                                                               24           39,257,128              2
PLDIA                0           7,246,864   8,072,928   4,285,000    7,129,088   3,520,016   5,066,408    35,320,304              7
write state object   4,662,208   4,662,184   48          9,324,368    24          4,662,208   4,662,208    27,973,248              7
KQR L PO                         924,032     21,262,248  117,768                                           22,304,048              3
PRTDS                241,720     4,668,976   4,023,352   2,747,344    5,001,776   2,203,384   1,311,744    20,198,296              7


Discussion


Based on the above figures, we can see that each subpool holds at most about 1/3 of the total areas, and certain areas are allocated to only 1 or 2 subpools. For example,

  • "FileOpenBlock" is in Subpool_6 and Subpool_7, but Subpool_6 has only 24 Bytes.
  • "enqueue" is in Subpool_1 and Subpool_7, but Subpool_7 has only 24 Bytes.
  • "PLDIA" is in Subpool_1 and Subpool_7, but Subpool_1 has 0 Bytes.
  • "Global Context" is only in Subpool_4.

Now we have some more explanation for the "Global Context" contention experienced in the past, since it is located entirely in one single subpool.

Certain areas are extremely unbalanced among subpools. For example,

  • Subpool_1 has   1'110 KB for SQLA.  
  • Subpool_2 has 588'943 KB for SQLA.

That probably results in the error:

       ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","SELECT MAX(XX) FROM...","SQLA","tmp")

(Oracle could probably improve the error message by including the subpool number.)

Now I wonder whether this shared pool architecture is, by design, hard to balance. For static area allocations, like "db_block_hash_buckets", it could be acceptable. However, dynamic areas with frequent memory fragmentation put tremendous pressure on memory management.

ORA-04031 is a cumulative outcome after a certain period of ineffective memory usage. Therefore it is hard to predict and reproduce. Quite often it is thrown by a victim session, which is not necessarily the cause of the error.

From what we learned from LISP (CLOS) and Java, which use automatic memory management (garbage collection), we can also say that shared pool memory management puts a remarkable challenge on this technique, and hence has a long way to go before being perfect.

For example, the Oracle shared pool durations "instance", "session", "cursor", and "execution" (comparable to LISP garbage collector generations) probably correspond to the second parameter of the heap dump title:
   HEAP DUMP heap name="sga heap(1,0)"
where 1 is the subpool number and 0 is the duration.

If Redo is claimed to be the most critical mechanism in Oracle, the shared pool should probably be declared the most sophisticated one.

BTW, to inspect the number of subpools, run the following query (instead of checking "_kghdsidx_count").

    select * from v$latch_children where name = 'shared pool';

It returns 7 rows (maximum 7 subpools), but only the active ones are those with higher GETS (or MISSES).
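As a small illustration (assuming that GETS alone is enough to tell the active children apart), the same query can be ordered by GETS so that the active subpools stand out:

    select child#, gets, misses, sleeps
      from v$latch_children
     where name = 'shared pool'
     order by gets desc;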

Wednesday, March 27, 2013

Shared Pool - KKSSP

In a recent DB hang caused by ORA-04031, we saw huge allocations of KKSSP, and a session dump with:

     ORA-04031: unable to allocate 256 bytes of shared memory ("shared pool","unknown object","KKSSP^9876","kgllk").

Even the smallest memory chunk of 32 bytes was no longer obtainable:

     ORA-04031: unable to allocate 32 bytes of shared memory
 
The DB is configured with a 20GB shared pool and a 300GB buffer cache, running as a dedicated server with 6000 concurrent login sessions on Oracle 11.2.0.3.0.

Additionally, Automatic Shared Memory Management (ASMM) is disabled and cursor_sharing is EXACT
("_memory_imm_mode_without_autosga" is set to FALSE, that is, immediate mode without an sga/memory target is disallowed.
See Oracle Doc ID 1269139.1: SGA Re-Sizes Occurring Despite AMM/ASMM Being Disabled (MEMORY_TARGET/SGA_TARGET=0)).

The whole KKSSP consumption amounted to 4GB just before the crash, so on average about 700KB per session.
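The total can be cross-checked without a heap dump; a minimal sketch, assuming KKSSP is registered as an area in v$sgastat (as shown in the previous post):

    select pool, name, bytes from v$sgastat where name = 'KKSSP';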

A heapdump of the shared pool shows that all KKSSP chunks are listed with comments like:

      Chunk  70000009fad14d0 sz=     2136    freeable  "KKSSP^2566     "  ds=7000000A5DA1798

where 2566 is the SID of the login session. So KKSSP is allocated per session and is a session-specific area (subheap) in the shared pool.

This contradicts the common belief that the shared pool is shared (at least in the majority) across all sessions, because these are not negligible memory consumers.

Pick the address marked with ds (descriptor), and dig further with a KKSSP address dump:

    ORADEBUG DUMP HEAPDUMP_ADDR 2 0x7000000A5DA1798

then aggregate it with Tanel Poder's Heapdump Analyzer:

  Total_size #Chunks  Chunk_size,     From_heap,       Chunk_type,  Alloc_reason
  ---------- ------- ------------ ----------------- --------------- -----------------
      188160  735        256 ,       KKSSP^2566,         freeable,  kgllk         
      181504  709        256 ,       KKSSP^2566,         freeable,  kglpn         
       56320  220        256 ,       KKSSP^2566,         freeable,  KQR ENQ       
       28896  516         56 ,       KKSSP^2566,         freeable,  kglseshtSegs  
       12312    1      12312 ,       KKSSP^2566,         freeable,  kglseshtTable
 

The above table shows that the top 3 memory consumers are kgllk, kglpn and KQR ENQ. Each single one of them needs 256 Bytes. More than half of the memory is allocated to kgllk and kglpn, since the application is coded in PL/SQL packages and types, which require kgllk and kglpn during each call to keep them stateful.

To inspect the objects touched by kgllk and kglpn, the following query will list them:

select s.sid, username, logon_time
      ,(select kglnaobj||'('||kglobtyd||')' from x$kglob o
         where o.kglhdadr = v.object_handle and rownum=1) kobj_name
      ,v.*
from v$libcache_locks v, v$session s
where holding_session = s.saddr
  and s.sid = 2566;


The above query can also help to debug "library cache pin" and "library cache lock" waits, for example:

select * from x$kglob
 where kglhdadr in (select p1raw from v$session where sid = :blocked_session);

Instead of a heapdump, a direct way to get the KKSSP memory consumption of a session is:

 select count(*), sum(ksmchsiz) from x$ksmsp where ksmchcom='KKSSP^2566';

The following select can also give the address for the KKSSP address heapdump:

 select ksmchpar from x$ksmsp where ksmchcom='KKSSP^2566' and ksmchcls = 'recr'; 

 (The output looks like 07000000A5DA1798, replace first 0 with 0X to dump:
  oradebug dump heapdump_addr 2 0X7000000A5DA1798)

By the way, the above dump probably finds the lost "KQR ENQ" discussed on page 169 of the book
Oracle Core: Essential Internals for DBAs and Developers:

... when I ran the same query against an instance of 11.2.0.2 there was no entry for KQR ENQ ...

We can see "KQR ENQ" now moved to KKSSP in Oracle 11.2.0.3.0.

MOS note ORA-4031 Or Excessive Memory Consumption On KKSSP Due To Parse Failures (Doc ID 2369127.1)
says:
   KKSSP is just a type of internal memory allocation related to child cursors.

I also noticed that high usage of kgllk and kglpn was accompanied by heavy contention on the KOKC latch (kokc descriptor allocation latch), which is responsible for pinning, unpinning and freeing objects. KOKC is a single latch without children, thus a single point of contention.
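A minimal sketch to watch this contention, assuming the latch appears in v$latch under the name quoted above:

    select name, gets, misses, sleeps, wait_time
      from v$latch
     where name = 'kokc descriptor allocation latch';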

Friday, December 7, 2012

Delete Restart and new Active Undo Extent

Oracle dbms_lock.allocate_unique allocates a unique lock ID to a named lock, and returns the handle associated with the lock ID so that the application can use this handle in subsequent calls to request, convert, and release.

allocate_unique occasionally runs:
   delete dbms_lock_allocated where expiration < sysdate;
to clean up expired UL locks.
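To put allocate_unique, request and release in context, here is a minimal usage sketch; the lock name 'KSUN_TEST_LOCK' and the 10-second timeout are hypothetical, not from the original post:

declare
  l_handle varchar2(128);
  l_status integer;
begin
  -- map the lock name to a handle (this call may run the cleanup delete shown above)
  dbms_lock.allocate_unique(lockname => 'KSUN_TEST_LOCK', lockhandle => l_handle);

  -- request the lock in exclusive mode, wait up to 10 seconds
  l_status := dbms_lock.request(lockhandle        => l_handle,
                                lockmode          => dbms_lock.x_mode,
                                timeout           => 10,
                                release_on_commit => false);

  -- ... protected work here ...

  l_status := dbms_lock.release(lockhandle => l_handle);
end;
/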

This delete statement makes me rethink my previous blog "Update Restart and new Active Undo Extent":

can this delete also cause a restart?

The appended test code confirms it (tested on AIX and Solaris in Release 11.2.0.3.0).

When several sessions call allocate_unique, it can cause certain "enq: TX - row lock contention" on dbms_lock_allocated.

As an unintentional side effect, once more than 100,000 "UL" locks are generated by dbms_lock.request, queries on v$lock and v$enqueue_lock are heavily affected, even after all "UL" locks have been released.
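A simple way to watch the "UL" lock count while such a test runs (just an illustrative check, not from the original post):

    select count(*) from v$lock where type = 'UL';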

Test Code


create or replace function get_undo_extent_cnt return number
as
  l_cnt number;
begin  
 select count(*) into l_cnt from dba_undo_extents where status = 'ACTIVE';
 return l_cnt;
end;
/


create or replace package pacepack
as
  type array is table of number index by varchar2(40);
  g_data                  array;
  g_cnt                   number;
  g_restart_cnt           number;
  g_undo_extent_cnt       number;
end pacepack;
/


drop table delete_data;

create table delete_data as
  select sysdate + level/86400 tim, rpad('abc', 100, 'x') txt from dual connect by level <= 100000;
   
create or replace trigger delete_data_bt
  before delete or update on delete_data
  for each row
begin
  if ( pacepack.g_data.exists(rowidtochar(:old.rowid)))
  then
    pacepack.g_restart_cnt := pacepack.g_restart_cnt + 1;
  
    dbms_output.put_line( 'doing "' || :old.rowid ||
                           '" again was called ' || pacepack.g_cnt );
    dbms_output.put_line('-- Restart#: ' || pacepack.g_restart_cnt ||
                            ', Undo Extent Count: ' || get_undo_extent_cnt);
  else
    pacepack.g_data(rowidtochar(:old.rowid)) := 1;
  end if;
     
  pacepack.g_cnt        := pacepack.g_cnt + 1;
end;
/


create or replace procedure testp as
  l_cnt     number := 0;
  l_del_cnt number;
begin
  pacepack.g_data.delete;
  pacepack.g_cnt         := 0;
  pacepack.g_restart_cnt := 0;
       
  delete delete_data where tim <= (select max(tim) from delete_data);
  
  l_del_cnt := sql%rowcount;
  dbms_output.put_line( 'trigger_count (' || pacepack.g_cnt         || ') = ' ||
                        'delete_count  (' || l_del_cnt              || ') + ' ||
                        'restart_count (' || pacepack.g_restart_cnt || ') = ' ||
                         (l_del_cnt +  pacepack.g_restart_cnt)
                      );
       
  commit;
 
 end;
/


exec testp;
/

Output


doing "AAC2jpAAAAAHmm2AAF" again was called 10716
-- Restart#: 1, Undo Extent Count: 5
doing "AAC2jpAAAAAHmoFAAS" again was called 15581
-- Restart#: 2, Undo Extent Count: 6
doing "AAC2jpAAAAAHmpSAAf" again was called 20446
-- Restart#: 3, Undo Extent Count: 7
doing "AAC2jpAAAAAHmqhAAs" again was called 25311
-- Restart#: 4, Undo Extent Count: 8
doing "AAC2jpAAAAAHmruAA5" again was called 30176
-- Restart#: 5, Undo Extent Count: 9
doing "AAC2jpAAAAAHms+AAH" again was called 35041
-- Restart#: 6, Undo Extent Count: 10
doing "AAC2jpAAAAAHmuNAAU" again was called 39906
-- Restart#: 7, Undo Extent Count: 11
doing "AAC2jpAAAAAHmvaAAh" again was called 44771
-- Restart#: 8, Undo Extent Count: 12
doing "AAC2jpAAAAAHmwpAAu" again was called 49636
-- Restart#: 9, Undo Extent Count: 13
doing "AAC2jpAAAAAHmx2AA7" again was called 54501
-- Restart#: 10, Undo Extent Count: 14
doing "AAC2jpAAAAAHmzGAAJ" again was called 59366
-- Restart#: 11, Undo Extent Count: 15
doing "AAC2jpAAAAAHm0VAAW" again was called 64231
-- Restart#: 12, Undo Extent Count: 16
doing "AAC2jpAAAAAHm1iAAj" again was called 69096
-- Restart#: 13, Undo Extent Count: 17
doing "AAC2jpAAAAAHm2xAAw" again was called 73961
-- Restart#: 14, Undo Extent Count: 18
doing "AAC2jpAAAAAHm3+AA9" again was called 78826
-- Restart#: 15, Undo Extent Count: 19
doing "AAC2jpAAAAAHm5OAAL" again was called 83691
-- Restart#: 16, Undo Extent Count: 20
doing "AAC2jpAAAAAHm6dAAY" again was called 88556
-- Restart#: 17, Undo Extent Count: 21
doing "AAC2jpAAAAAHm7qAAl" again was called 93421
-- Restart#: 18, Undo Extent Count: 22
doing "AAC2jpAAAAAHm85AAy" again was called 98286
-- Restart#: 19, Undo Extent Count: 23
trigger_count (100019) = delete_count  (100000) + restart_count (19) = 100019


So for each restart, a new active Undo Extent is created.

Wednesday, November 21, 2012

One Oracle AQ Deadlock

A few days ago, an AQ based Oracle application (Oracle 11.2.0.3.0) threw a deadlock after more than 1,000,000 unobstructed runs over a few years.

The incident is frustrating since the reliability of the AQ based implementation is now disputed, and getting a resolution from Oracle appears to be a long way to go.

This demonstrates again that correctness proofs of computer programs are hardly achievable, let alone of concurrent programs.

The deadlock is caused by the following simplified pseudo code:

 dbms_aqadm.start_queue;
 enqueue some message;
 dbms_job.submit to launch a dequeue job in background;  -- can also be dbms_scheduler
 wait till queue is empty;
 dbms_aqadm.stop_queue;


which is started in the main session and concurrently forks a background job session.

The trace dump shows that the main session is waiting for 'library cache pin' in the step dbms_aqadm.stop_queue, whereas the job session is waiting for 'enq: TM - contention' with the current SQL statement:

 select ... from queue_table ... for update skip locked;

Although reproducing this deadlock is not evident, we can still try to make some plausible experiments and some sort of post-mortem analysis.

At first we open 3 Sqlplus sessions:

  observer session: observer_sid
 main session:     main_sid
 job session:      job_sid


Before and after each step, we run the following 3 statements to check the delta of the statistics:

 select sid, type, id1, id2, lmode, request,ctime, block
   from v$lock where type in ('TM', 'TQ');
 select eq_type, eq_name, req_reason, total_req#, total_wait#,succ_req#,failed_req#, cum_wait_time
   from v$enqueue_statistics where eq_type in ('TQ', 'TM');
 select locked_total, pinned_total, locks, pins, lock_mode, pin_mode, executions
   from v$db_object_cache v where name = 'KSUN_QUEUE';


Inspired by perturbation theory, we "perturb" the time series with a table lock so that we can put Oracle's internal locking mechanisms under the microscope.

Here is the time sequence and the actions of the first experiment on the observer session and the main session, as well as the observed statistics changes:

 T1 observer_sid --- start_queue
 T2 observer_sid --- lock queue table in exclusive mode
 T3 main_sid     --- stop_queue
 T4 observer_sid --- check stats
    main_sid got 1 TQ Lock in Mode 6, requests 1 TM in Mode 6
    TM DML: TOTAL_REQ# increased 1, but SUCC_REQ# not changed
    TQ Queue table enqueue DDL contention: TOTAL_REQ# increased 1, SUCC_REQ# increased 1
    LOCKED_TOTAL increased 1, PINNED_TOTAL increased 1
 T5 observer_sid --- rollback to unlock table
    TM DML: TOTAL_REQ# increased 6, SUCC_REQ# increased 7
    LOCKED_TOTAL increased 1, PINNED_TOTAL increased 1


Here is the second experiment on the observer session and job session:

 T6 observer_sid --- start_queue
 T7 observer_sid --- lock queue table in exclusive mode
 T8 job_sid --- dequeue with dequeue_options.wait being 3 seconds
 T9 observer_sid --- check stats
    job_sid requests 1 TM in Mode 3
    TM DML: TOTAL_REQ# increased 1, but SUCC_REQ# not changed
    LOCKED_TOTAL increased 1, PINNED_TOTAL increased 1
 T10 observer_sid --- rollback to unlock table
    TM DML: TOTAL_REQ# increased 2, but SUCC_REQ# increased 3
    LOCKED_TOTAL increased 1, PINNED_TOTAL increased 2


According to the above statistics changes, we can conclude:

 (1). "stop_queue" got 'TQ Queue table enqueue DDL contention' before getting TM DML,
         and again 'library cache pin' / 'library cache lock' after getting TM DML.

 (2). "dequeue" got 'library cache pin'/'library cache lock' before getting TM DML,
         and again 'library cache pin'/'library cache lock' after getting TM DML.

that probably explains:

(a). When the main session runs "stop_queue", it first holds the 'TQ Queue table enqueue DDL contention' (and hence TM) lock
      and then waits for 'library cache pin';

(b). whereas the job session, when running "dequeue", first holds the 'library cache pin'/'library cache lock'
      and then waits for 'enq: TM - contention' with the current SQL statement:

         select ... from queue_table ... for update skip locked;

Because 'library cache pin' and 'enq: TM - contention' are acquired and requested in reverse order by the main session and the job session, this signals a deadlock.

Oracle has 4 different req_reasons for TQ:

 TQ DDL-INI contention
 TQ INI contention
 TQ DDL contention
 TQ TM contention


What the first experiment shows is that "stop_queue" requires "DDL contention", which seems to incur 1 TQ in mode 6 first and subsequently 1 TM in mode 6 on the queue table.

Probably one fix is to ensure that no job sessions are still running before stepping into stop_queue.

Be careful: dbms_job.broken, dbms_job.remove, dbms_scheduler.disable, dbms_scheduler.stop_job and dbms_scheduler.drop_job seem not to terminate the job session immediately.
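A minimal sketch of such a guard, assuming the dequeue work is run via dbms_job; the job number l_dequeue_job and the 60-second cap are hypothetical, and KSUN_QUEUE is the queue used in the experiment above:

declare
  l_dequeue_job number := 42;   -- hypothetical job number of the background dequeue job
  l_running     number;
begin
  -- poll until the dequeue job has left dba_jobs_running (give up after ~60 seconds)
  for i in 1 .. 60 loop
    select count(*) into l_running from dba_jobs_running where job = l_dequeue_job;
    exit when l_running = 0;
    dbms_lock.sleep(1);
  end loop;

  dbms_aqadm.stop_queue(queue_name => 'KSUN_QUEUE');
end;
/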

The dump file contains some more details about the main session:

           time in wait: 1.309131 sec
    timeout after: 14 min 58 sec

          
and the job session:
       
            time in wait: 1.309007 sec
     timeout after: never
  
          
Usually Oracle picks the longer-waiting session to resolve the deadlock. In this case, it is the main session which raises an error, and the job session continues its processing after receiving the lock.
   
The above text confirms again the "library cache pin" 15-minute timeout mentioned in the blog:
Oracle 11gR2 single session "library cache pin"

The 15-minute timeout can also be verified with the following code.

In the first session, execute:

 create or replace procedure proc as
 begin
   execute immediate 'alter procedure proc compile';
 end;
 /

 exec proc;
 /


In the second session, run this query a few times:

 select in_wait_secs, time_remaining_secs, (in_wait_secs + time_remaining_secs) total_secs
   from v$wait_chains;

Two observations can be made:

(I).  total_secs constantly displays 899 seconds (exit when elapsed > 899, that is, one second less than 15 minutes).

(II). both in_wait_secs and time_remaining_secs are updated every 3 seconds (the deadlock detection interval).

After 899 seconds, the first session throws:

 ORA-04021: timeout occurred while waiting to lock object

In fact, the 15 minutes come from the default setting of "_kgl_time_to_wait_for_locks", which controls the time to wait for locks and pins before timing out (time in minutes; modification requires an instance restart).
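A minimal sketch to read the current value (assuming access to the x$ tables, i.e. a SYS session):

    select i.ksppinm name, v.ksppstvl value, i.ksppdesc description
      from x$ksppi i, x$ksppcv v
     where i.indx = v.indx
       and i.ksppinm = '_kgl_time_to_wait_for_locks';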

By the way, the following statement requests a TM lock in mode 3 even when the table is empty.

 select ... from queue_table ... for update skip locked; 

Friday, November 16, 2012

Oracle Java OutOfMemoryError


A long-standing perplexing issue is that a Java stored procedure in Oracle throws a java.lang.OutOfMemoryError very seldom, about once every couple of months, and has done so for a few years.

Analogous to the Java non-standard option -Xmx, the appended Test Code 1 attempts to use:
   OracleRuntime.setMaxMemorySize
to provoke the OutOfMemoryError.

In a Sqlplus window, when one runs:

set serveroutput on size 50000
exec dbms_java.set_output(50000);
exec testp1(500000); 

we get the error:

Exception in thread "Root Thread" java.lang.Throwable:
--- Message ---
null
--- Cause ---
null
--- StackTrace ---
java.lang.OutOfMemoryError
        at JavaOutput.output(JavaOutput:20)
        at JavaOutput.output(JavaOutput:28)

Open a new Sqlplus window (hence a new Oracle session); when one runs:
        
set serveroutput on size 50000
exec dbms_java.set_output(50000);
exec testp1(900000); 

the call succeeds. If one then calls again:
  exec testp1(500000);
there is no more error.

Test Code 2 is the original Java stored procedure. It tries to do some post-processing after the OutOfMemoryError, which causes a further OutOfMemoryError.

There are two deficiencies in this code.

(1). It converts OutOfMemoryError (a subclass of Error) into a RuntimeException (a subclass of Exception). The Java Doc on class Error says:

An Error is a subclass of Throwable that indicates serious problems that a reasonable application should not try to catch.

(2). There is no OutOfMemoryError printed in the output:
         
Exception in thread "Root Thread" JavaOutput$WrapException:
--- Message ---
null
        at JavaOutput$ExceptHelper.setError(JavaOutput:41)
        at JavaOutput.output(JavaOutput:25)

and the root cause is discarded.

The reason is that "message" and "cause" are optional in a Java VM implementation, and it seems that the Oracle JVM does not include them. Here is an excerpt of the Java Doc about class Throwable:

 It can also contain a message string that gives more information about the error.
 Finally, it can contain a cause: another throwable that caused this throwable to get thrown.


Test Code 1

(tested in Oracle 11.2.0.3.0 on AIX and Solaris):

CREATE OR REPLACE AND RESOLVE JAVA SOURCE NAMED "JavaOutput" as 
import java.io.InputStream;
import java.io.OutputStream;
import java.io.FileOutputStream;
import java.io.Writer;
import java.io.StringWriter;
import java.io.PrintWriter;
import oracle.sql.BLOB;
import oracle.aurora.vm.OracleRuntime;
public class JavaOutput {
  public static void output(String file, BLOB blob, long rtMaxMSSet) throws Throwable {
   InputStream  inStream = null;
   OutputStream outStream = null;
    try {
          OracleRuntime.setMaxMemorySize(rtMaxMSSet);
          
          int bufferSize = blob.getBufferSize();
          inStream  = blob.getBinaryStream();
          outStream = new FileOutputStream(file, true);
          
          byte[] buffer = new byte[bufferSize];
          int length = -1;
          while ((length = inStream.read(buffer)) != -1) {
                outStream.write(buffer, 0, length);
          }
    } catch (Throwable t) {
          String msg = t.getMessage();
          String stackTrc = getStackTrace(t);
          throw new Throwable("\n --- Message ---\n"    + t.getMessage() + 
                              "\n --- Cause ---\n"      + t.getCause() + 
                              "\n --- StackTrace ---\n" + stackTrc);
    } finally {
           if (inStream != null) {
                 inStream.close();
           }
           if (outStream != null) {
                 outStream.close();
           }
    }
  }
  static String getStackTrace(Throwable t)
  {
    Writer writer = new StringWriter();
    PrintWriter printWriter = new PrintWriter(writer);
    t.printStackTrace(printWriter);
    return writer.toString();
  }
}
/
create or replace procedure java_output(
  p_file_name               varchar2
 ,p_blob                    blob
 ,p_max_size                number
  ) as language java name 'JavaOutput.output(java.lang.String, oracle.sql.BLOB, long)';
/
create or replace procedure testp1(p_max_size number) as
  l_blob_huge               blob  := empty_blob;
  l_file_name               varchar2(200) ;
  l_raw                     raw(100);
  begin
    select  diagnostic_dest||'/diag/rdbms/' || db_name||'/'|| db_name||'/trace/ksun_output.txt'
    into    l_file_name
    from   ( select
             (select value from v$parameter where name = 'diagnostic_dest') diagnostic_dest
            ,(select value from v$parameter where name = 'db_name') db_name
            from dual);
    
    dbms_lob.createtemporary(l_blob_huge, true);
    l_raw :=  '4b53554e'; --ksun;
    dbms_lob.writeappend(l_blob_huge, lengthb(l_raw)/2, l_raw);
    java_output(l_file_name, l_blob_huge, p_max_size);
end testp1;
/

Test Code 2:


CREATE OR REPLACE AND RESOLVE JAVA SOURCE NAMED "JavaOutput" as 
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.FileOutputStream;
import java.sql.SQLException;
import oracle.sql.BLOB;
import oracle.aurora.vm.OracleRuntime;
public class JavaOutput {
  public static void output(String file, BLOB blob, long rtMaxMSSet) throws IOException, SQLException {
   InputStream  inStream = null;
   OutputStream outStream = null;  
    try {
          OracleRuntime.setMaxMemorySize(rtMaxMSSet);
          
          int bufferSize = blob.getBufferSize();
          inStream  = blob.getBinaryStream();
          outStream = new FileOutputStream(file, true);
          
          byte[] buffer = new byte[bufferSize];
          int length = -1;
          while ((length = inStream.read(buffer)) != -1) {
                outStream.write(buffer, 0, length);
          }
    } catch (Throwable t) {
          ExceptHelper.setError(t);
    } finally {
           if (inStream != null) {
                 inStream.close();
           }
           if (outStream != null) {
                 outStream.close();
           }
    }
  }
  
  static class ExceptHelper {
    static void setError(Throwable t) {
      try {
          byte[] errReport = new byte[400000];  
      } finally {
          throw new WrapException("\n --- Message ---\n" + t.getMessage());
      } 
    }
  }
  
  static class WrapException extends RuntimeException {
    WrapException(String msg) {
      super(msg);
    }
  }
}
/

Monday, July 30, 2012

One Mutex Collision Test

Inspired by the recent Higgs boson-hunting experiments at CERN, this blog tries to run an Oracle mutex collision test on Oracle 11.2.0.3.0 and to watch what new "particles" can be detected.

We start the test by (see appended Test Code):

begin
  ksun_mutex_test_jobs(64, 2);
end;

which launches 64 sessions running ksun_mutex_testp1 and 2 sessions running ksun_mutex_testp2.

ksun_mutex_testp1 runs the same sql statement by varying the NLS settings.

ksun_mutex_testp2 runs a query over v$sql in order to simulate monitoring applications that watch sql statements. v$sql is also chosen because it is derived from x$kglcursor_child, which means any select on it will touch all child cursors.

ksun_mutex_testp2 accelerates the collision, but does not play a decisive role in the test.


Here are the observed "particles":

1. Number of child_cursors


In theory, ksun_mutex_testp1 generates at most 800 (40 nls_language values multiplied by 20 nls_territory values) child cursors for the statement (sql_id: '754r1k9db5u80'):
    select id into l_id from testt where name = :B1;
but
   select count(*) from v$sql where sql_id ='754r1k9db5u80';
   >>> 66'510
 (with certain fluctuations, but more than 10'000)

10Jan2013 Update: Tested again on a fresh Oracle 11.2.0.3.0 instance,

select count(*), min(child_number), max(child_number) from v$sql where sql_id ='754r1k9db5u80';
=> 23'027  0  99

count(*) can no longer reach 66'510.

select count(*), min(child_number), max(child_number) from v$sql where sql_id ='754r1k9db5u80' and is_obsolete = 'N';

=> 78  0  77
It seems that "_cursor_obsolete_threshold" is active.      

 

2. Shared memory

select sharable_mem, persistent_mem, runtime_mem from v$sqlarea where sql_id ='754r1k9db5u80';

    >>> 845'643'351  292'590'880  226'019'744

3. Reason for child_cursors

 

select extract (xmltype ('<kroot>' || reason || '</kroot>'), '//ChildNumber').getStringVal ()   child
      ,extract (xmltype ('<kroot>' || reason || '</kroot>'), '//reason').getStringVal ()        reason
  from v$sql_shared_cursor
 where sql_id ='754r1k9db5u80';

>>> <ChildNumber>x</ChildNumber>    <reason>NLS Settings(0)</reason>

where x is the child_cursor number.

4. New MUTEX_TYPE: "hash table"


select * from v$mutex_sleep where mutex_type = 'hash table' order by sleeps;

    >>> hash table  kkshGetNextChild [KKSHBKLOC1]  1289912  0

I wonder if this is related to the db_block_hash_buckets allocation in the shared pool:

select * from v$sgastat where pool = 'shared pool' and name = 'db_block_hash_buckets';
   >>> shared pool  db_block_hash_buckets  2920448

Update on 18Mar2013:
 db_block_hash_buckets is for database block hash buckets. It is allocated in the shared pool.
 It takes about 1% of db_cache_size for db_block_size = 8192, or about 70 Bytes for each database block hash bucket (chain).
 For example, for a DB with db_cache_size=262GB, db_block_hash_buckets needs 2848MB in the shared pool.


5. System frozen

The most astonishing detection is that, after dozens of hours or even a couple of days, all started sessions are blocked by one single session, which is waiting either for the event:

       "library cache: mutex X"
or
    "library cache lock".
   
The waiters can be either "library cache: mutex X" or "library cache lock".

By dumping the systemstate for the case where the final blocker waits on "library cache lock", we can see:

sid: 446 ser: 2025
    Current Wait Stack:
      Not in wait; last wait ended 1118 min 2 sec ago
    There are 65 sessions blocked by this session.
    Dumping one waiter:
      inst: 1, sid: 42, ser: 1249
      wait event: 'library cache lock'
        p1: 'handle address'=0xde5d11a8
        p2: 'lock address'=0xe019ed68
        p3: '100*mode+namespace'=0x520002

sid: 42 ser: 1249
    Current Wait Stack:
     0: waiting for 'library cache lock'
        handle address=0xde5d11a8, lock address=0xe019ed68, 100*mode+namespace=0x520002
        wait_id=78226 seq_num=20283 snap_id=347
        wait times: snap=51.136124 sec, exc=1118 min 2 sec, total=1118 min 2 sec
        wait times: max=infinite, heur=1118 min 2 sec
        wait counts: calls=22362 os=22362
        in_wait=1 iflags=0x15a2
    There is at least one session blocking this session.
      Dumping 1 direct blocker(s):
        inst: 1, sid: 446, ser: 2025
 
v$lock lists that the final blocker session 446 holds 3 locks of type CU, AE, JQ, and does not request any locks. A further check shows that only this session holds a CU lock. CU means "Recovers cursors in case of death while compiling", so this final blocker is probably caught in an endless loop of the cursor rescuing process.
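The CU check mentioned above can be done with a simple query, for instance (just an illustration of the check, not the original script):

    select sid, type, id1, id2, lmode, request
      from v$lock
     where type = 'CU';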

If one tries to unblock the situation by killing the blocking session, one of the blocked sessions takes over its role, and the original blocking session joins the blocked sessions. So the only fix could be to reboot the DB.

At the UNIX level, we can see this process spending more than 95% in user CPU.

Checked from time to time, the system gets gradually slower and finally hangs completely on one single session.

All the tests were carried out on Solaris (x86-64) and AIX. Similar blocking behaviors were also reported by some real Oracle 11gR2 applications.

Probably these "particles" will not need 45 years to be confirmed by Oracle.

By the way, with Oracle 11.2.0.3.0, a new parameter is introduced:

Name:              _cursor_obsolete_threshold      
Description:     Number of cursors per parent before obsoletion.             
Default value:  100

Addendum (2015.03.25): Looking at Oracle 11.2.0.4.0, the default value is increased to 1024.

It seems Oracle does not strictly follow this threshold, which can eventually cause a shared pool explosion due to the number of child cursor versions.



Test Code (Oracle 11.2.0.3.0):


drop table testt;

create table testt as select 1 id, 'ksun' name from dual;

create table ksun_mutex_sleep as select * from v$mutex_sleep where 1=2;

create or replace procedure ksun_mutex_testp1 as
  l_id       number;
  l_name     varchar2(10) := 'ksun';
  type       tab is table of varchar2(100);
  l_tab_lang tab;
  l_tab_terr tab;
begin
  select value bulk collect into l_tab_lang from v$nls_valid_values where parameter='LANGUAGE';
  select value bulk collect into l_tab_terr from v$nls_valid_values where parameter='TERRITORY';
 
  for j in 1..least(40, l_tab_lang.count) loop
    --execute immediate q'[alter session set nls_language=']'||l_tab_lang(j)||q'[']';
    dbms_session.set_nls('nls_language', ''''||l_tab_lang(j)||'''');
    for k in 1..least(20, l_tab_terr.count) loop
      --execute immediate q'[alter session set nls_territory=']'||l_tab_terr(k)||q'[']';
      dbms_session.set_nls('nls_territory', ''''||l_tab_terr(k)||'''');
      select id into l_id from testt where name = l_name;
    end loop;
  end loop;
end;
/

create or replace procedure ksun_mutex_testp2 as
begin
  for c in (select * from v$sql) loop
    null;
  end loop;
end;
/

create or replace procedure ksun_mutex_test_jobs(p_job_cnt1 number, p_job_cnt2 number)
as
   l_job_id pls_integer;
begin
    for i in 1.. p_job_cnt1 loop
      dbms_job.submit(l_job_id, 'ksun_mutex_testp1;', interval => 'sysdate');
    end loop;
   
    for i in 1.. p_job_cnt2 loop
      dbms_job.submit(l_job_id, 'ksun_mutex_testp2;', interval => 'sysdate');
    end loop;
    commit;
end;   
/


Monday, May 7, 2012

Why shared pool is doubled ?

Problem:


There is an Oracle application which divides the whole work into a front-end and a back-end. Both perform similar tasks. The front-end is for GUI online processing of small and urgent work,
whereas the back-end is for the large and normal cases settled by Oracle background jobs.

The system is set up with the default value of WORKAREA_SIZE_POLICY = AUTO. From time to time, the back-end hits an ORA-04030 error, i.e. operating system process private memory has been exhausted. So the developers decided that they should have better control of the *_AREA_SIZE parameters; for example, by regulating HASH_AREA_SIZE and SORT_AREA_SIZE, they could somehow avoid such errors. Hence, they decided to put a:

    execute immediate 'alter session set workarea_size_policy = manual';

at the beginning of background Jobs (see Test Code).

For a while, the system was stable and no more ORA-04030 occurred.

Following the progress of development and the workload on the system, the application got bigger and bigger (more code too). The consequence is that the DBA was obliged to increase the shared pool memory with each rollout of a new release. For the moment, it is more than 12 GB. However, performance continuously degraded, probably due to the huge cursor cache and library cache, which inflate the managing and seeking time for cursors.

By dumping library_cache with:

    alter session set events 'immediate trace name library_cache level 15';

One can see frequent CursorDiagnosticsNodes and AgedOutCursorDiagnosticNodes marked as

    reason=Optimizer mismatch(12) size=2x216
      workarea_size_policy= manual auto
      _smm_auto_cost_enabled= false true


which caused enormous cursor reloads and invalidations.
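Reloads and invalidations can be tracked at the library cache level, for example with the following query (a simple illustration, not taken from the original analysis):

    select namespace, gets, pins, reloads, invalidations
      from v$librarycache
     where namespace = 'SQL AREA';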

Even worse, there are some SGA memory resize operations, which means the shared pool steals memory from the buffer cache.

One can check this resizing by (or check AWR - Memory Dynamic Components):

select component, current_size, user_specified_size, oper_count, last_oper_type
from v$memory_dynamic_components;


The output looks like:

COMPONENT             CURRENT_SIZE   USER_SPECIFIED_SIZE  OPER_COUNT  LAST_OPER_TYPE
--------------------  -------------  -------------------  ----------  --------------
shared pool           2'785'017'856  2'415'919'104                 5  GROW
DEFAULT buffer cache  2'936'012'800  3'305'111'552                 5  SHRINK


The above shows the same amount (369'098'752 Bytes) stolen by the shared pool from the DEFAULT buffer cache (v$sga_resize_ops and v$memory_resize_ops provide similar information).

During such an operation, the entire system looks jammed. This seems well known to Oracle, so they have introduced a new hidden parameter, "_memory_imm_mode_without_autosga" (Allow immediate mode without sga/memory target), to prevent such resizing.

MOS also made a clear claim:

set the parameter _MEMORY_IMM_MODE_WITHOUT_AUTOSGA=false (default is TRUE) in the instance to disable this feature with the consequence that in future an ORA-4031 error would be raised.

    alter system set "_memory_imm_mode_without_autosga"=FALSE;

That is also true. Since then the system occasionally hits ORA-04031.

Analysis:


With Oracle 11g Release 2 (11.2.0.2), a new column, REASON, is available in v$sql_shared_cursor to reveal the root cause of child cursor proliferation.

select sql_id, optimizer_mismatch
      ,extract (xmltype ('<kroot>' || reason || '</kroot>'), '//workarea_size_policy').getStringVal ()   wasp
      ,extract (xmltype ('<kroot>' || reason || '</kroot>'), '//_smm_auto_cost_enabled').getStringVal () sace
      ,v.*
from v$sql_shared_cursor v
where sql_id in (select sql_id from v$sql_shared_cursor where optimizer_mismatch='Y')
order by v.sql_id, wasp, sace;


It shows that there are paired child cursors for each sql_id due to optimizer_mismatch,
and the reason is workarea_size_policy.

00hxpd7rjygxq       N
  <workarea_size_policy>   auto manual </workarea_size_policy>   
  <_smm_auto_cost_enabled> true false  </_smm_auto_cost_enabled>
00hxpd7rjygxq       Y
  <workarea_size_policy>   manual auto </workarea_size_policy>   
  <_smm_auto_cost_enabled> false  true </_smm_auto_cost_enabled>


Note that a top node <kroot> is specifically appended since the reason column can contain more than one <ChildNode>, and a top node is required to group them before the xpath query can be used (confirming again that new features need a long way to go before they are perfect).

This blog demonstrates that a tiny, unintentional change can provoke a dramatic impact.

Fix:


Remove:
    'alter session set workarea_size_policy = manual' 
and use dynamic code to adapt the back-end jobs instead.

If we check "SQL AREA" again, we find that the memory consumption is only half of the original:

    select bytes from v$sgastat where name = 'SQLA';

Test Code (Oracle 11.2.0.2 or later):


create table testt as
select level x, rpad ('abc', 100, 'k') y from dual connect by level <= 1000;


create or replace procedure tasks as
  y   number;
begin
  for j in 1 .. 500 loop
    for i in 1 .. 1000 loop
      execute immediate 'select /*' || (rpad ('x', i, 'y')) || '*/ x from testt where x = :i'
      into y using i;
    end loop;
  end loop;
end;
/

create or replace procedure auto_run as
begin
  execute immediate 'alter session set workarea_size_policy = auto';
  tasks;
end;
/

create or replace procedure manual_run as
begin
  execute immediate 'alter session set workarea_size_policy = manual';
  tasks;
end;
/

create or replace procedure start_work (p_job_cnt number) as
  l_job_id   pls_integer;
begin
  for i in 1 .. p_job_cnt loop
      dbms_job.submit (l_job_id, 'begin auto_run; end;');
      dbms_job.submit (l_job_id, 'begin manual_run; end;');
  end loop;
  commit;
end;
/

exec start_work(32);