Monday, May 7, 2012

Why shared pool is doubled ?

Problem:


There is an Oracle application which divides the whole work as front-end and back-end. Both are performing the similar tasks. Front-end is for GUI online processing of small and urgent work;
whereas back-end is for large and normal case settled by Oracle background Jobs.

The system is set up with default value of WORKAREA_SIZE_POLICY = AUTO. From time to time, back-end hits error of ORA-04030, i.e. Operating system process private memory has been exhausted. So the developers decided that they should have a better control of *_AREA_SIZE parameters, for example, by regulating HASH_AREA_SIZE, SORT_AREA_SIZE, they can somehow avoid such error. Hence, they decided to put a:

    execute immediate 'alter session set workarea_size_policy = manual';

at the beginning of background Jobs (see Test Code).

For a while, system gets stable and no more ORA-04030 occurs.

Following the progress of development and workload on the system, application gets bigger and bigger (more code too). The consequence is that DBA is obliged to increase shared pool memory with each rollout of new release. For the moment, it is more than 12 GB. However the performance was continuously degraded, probably due to the huge cursor cache and library cache, which overwhelm managing and seeking time for cursors.

By dumping library_cache with:

    alter session set events 'immediate trace name library_cache level 15';

One can see frequent CursorDiagnosticsNodes and AgedOutCursorDiagnosticNodes marked as

    reason=Optimizer mismatch(12) size=2x216
      workarea_size_policy= manual auto
      _smm_auto_cost_enabled= false true


which caused enormous cursor reloads and invalids.

And even worse, there are some SGA memory resizing, that means: Shared Pool steals memory from Buffer Cache.

One can check this resizing by (or check AWR - Memory Dynamic Components):

select component, current_size, user_specified_size, oper_count, last_oper_type
from v$memory_dynamic_components;


The output looks like:

COMPONENT                 CURRENT_SIZE      USER_SPECIFIED_SIZE        OPER_COUNT        LAST_OPER_TYPE
shared pool                        2'785'017'856            2'415'919'104                            5                                GROW
DEFAULT buffer cache    2'936'012'800            3'305'111'552                            5                                SHRINK


The above says same amount(369'098'752 Bytes) stolen by shared pool from DEFAULT buffer cache (v$sga_resize_ops, v$memory_resize_ops also provide the similar information).

During such operation, the entire system looks jammed. It seems well known by Oracle, so they have designated a new hidden parameter: "_memory_imm_mode_without_autosga" (Allow immediate mode without sga/memory target) to prevent such resizing.

MOS also made a clear claim:

set the parameter _MEMORY_IMM_MODE_WITHOUT_AUTOSGA=false (default is TRUE) in the instance to disable this feature with the consequence that in future an ORA-4031 error would be raised.

    alter system set "_memory_imm_mode_without_autosga"=FALSE;

That is also true. Since then the system occasionally hits ORA-04031.

Analysis:


With Oracle 11g Release 2 (11.2.0.2), a new column: REASON is available to reveal the root cause of cursor boost.

select sql_id, optimizer_mismatch
      ,extract (xmltype ('<kroot>' || reason || '</kroot>'), '//workarea_size_policy').getStringVal ()   wasp
      ,extract (xmltype ('<kroot>' || reason || '</kroot>'), '//_smm_auto_cost_enabled').getStringVal () sace
      ,v.*
from v$sql_shared_cursor v
where sql_id in (select sql_id from v$sql_shared_cursor where optimizer_mismatch='Y')
order by v.sql_id, wasp, sace;


It shows that there are paired child cursors for each sql_id due to optimizer_mismatch,
and the reason is workarea_size_policy.

00hxpd7rjygxq       N
  <workarea_size_policy>   auto manual </workarea_size_policy>   
  <_smm_auto_cost_enabled> true false  </_smm_auto_cost_enabled>
00hxpd7rjygxq       Y
  <workarea_size_policy>   manual auto </workarea_size_policy>   
  <_smm_auto_cost_enabled> false  true </_smm_auto_cost_enabled>


Note that a top Node <kroot> is specially appended since reason column can contain more than one <ChildNode> and it requires a top Node to group them before using xpath query (again confirmed that new Features need a long way to go before perfect).

This Blog demonstrates that a tiny unconscious change could provoke a dramatic impact.

Fix:


Remove:
    'alter session set workarea_size_policy = manual' 
and use dynamic code to adjust to back-end jobs.

If we check again "SQL AREA", we find that memory consumption is only half of original:

    select bytes from v$sgastat where name = 'SQLA';

Test Code (Oralce 11.2.0.2 or later):


create table testt as
select level x, rpad ('abc', 100, 'k') y from dual connect by level <= 1000;


create or replace procedure tasks as
  y   number;
begin
  for j in 1 .. 500 loop
    for i in 1 .. 1000 loop
      execute immediate 'select /*' || (rpad ('x', i, 'y')) || '*/ x from testt where x = :i'
      into y using i;
    end loop;
  end loop;
end;
/

create or replace procedure auto_run as
begin
  execute immediate 'alter session set workarea_size_policy = auto';
  tasks;
end;
/

create or replace procedure manual_run as
begin
  execute immediate 'alter session set workarea_size_policy = manual';
  tasks;
end;
/

create or replace procedure start_work (p_job_cnt number) as
  l_job_id   pls_integer;
begin
  for i in 1 .. p_job_cnt loop
      dbms_job.submit (l_job_id, 'begin auto_run; end;');
      dbms_job.submit (l_job_id, 'begin manual_run; end;');
  end loop;
  commit;
end;
/

exec start_work(32);