Problem:
There is an Oracle application which divides the whole work as front-end and back-end. Both are performing the similar tasks. Front-end is for GUI online processing of small and urgent work;
whereas back-end is for large and normal case settled by Oracle background Jobs.
The system is set up with default value of WORKAREA_SIZE_POLICY = AUTO. From time to time, back-end hits error of ORA-04030, i.e. Operating system process private memory has been exhausted. So the developers decided that they should have a better control of *_AREA_SIZE parameters, for example, by regulating HASH_AREA_SIZE, SORT_AREA_SIZE, they can somehow avoid such error. Hence, they decided to put a:
execute immediate 'alter session set workarea_size_policy = manual';
at the beginning of background Jobs (see Test Code).
For a while, system gets stable and no more ORA-04030 occurs.
Following the progress of development and workload on the system, application gets bigger and bigger (more code too). The consequence is that DBA is obliged to increase shared pool memory with each rollout of new release. For the moment, it is more than 12 GB. However the performance was continuously degraded, probably due to the huge cursor cache and library cache, which overwhelm managing and seeking time for cursors.
By dumping library_cache with:
alter session set events 'immediate trace name library_cache level 15';
One can see frequent CursorDiagnosticsNodes and AgedOutCursorDiagnosticNodes marked as
reason=Optimizer mismatch(12) size=2x216
workarea_size_policy= manual auto
_smm_auto_cost_enabled= false true
which caused enormous cursor reloads and invalids.
And even worse, there are some SGA memory resizing, that means: Shared Pool steals memory from Buffer Cache.
One can check this resizing by (or check AWR - Memory Dynamic Components):
select component, current_size, user_specified_size, oper_count, last_oper_type
from v$memory_dynamic_components;
The output looks like:
COMPONENT CURRENT_SIZE USER_SPECIFIED_SIZE OPER_COUNT LAST_OPER_TYPE
shared pool 2'785'017'856 2'415'919'104 5 GROW
DEFAULT buffer cache 2'936'012'800 3'305'111'552 5 SHRINK
The above says same amount(369'098'752 Bytes) stolen by shared pool from DEFAULT buffer cache (v$sga_resize_ops, v$memory_resize_ops also provide the similar information).
During such operation, the entire system looks jammed. It seems well known by Oracle, so they have designated a new hidden parameter: "_memory_imm_mode_without_autosga" (Allow immediate mode without sga/memory target) to prevent such resizing.
MOS also made a clear claim:
set the parameter _MEMORY_IMM_MODE_WITHOUT_AUTOSGA=false (default is TRUE) in the instance to disable this feature with the consequence that in future an ORA-4031 error would be raised.
alter system set "_memory_imm_mode_without_autosga"=FALSE;
That is also true. Since then the system occasionally hits ORA-04031.
Analysis:
With Oracle 11g Release 2 (11.2.0.2), a new column: REASON is available to reveal the root cause of cursor boost.
select sql_id, optimizer_mismatch
,extract (xmltype ('<kroot>' || reason || '</kroot>'), '//workarea_size_policy').getStringVal () wasp
,extract (xmltype ('<kroot>' || reason || '</kroot>'), '//_smm_auto_cost_enabled').getStringVal () sace
,v.*
from v$sql_shared_cursor v
where sql_id in (select sql_id from v$sql_shared_cursor where optimizer_mismatch='Y')
order by v.sql_id, wasp, sace;
It shows that there are paired child cursors for each sql_id due to optimizer_mismatch,
and the reason is workarea_size_policy.
00hxpd7rjygxq N
<workarea_size_policy> auto manual </workarea_size_policy>
<_smm_auto_cost_enabled> true false </_smm_auto_cost_enabled>
00hxpd7rjygxq Y
<workarea_size_policy> manual auto </workarea_size_policy>
<_smm_auto_cost_enabled> false true </_smm_auto_cost_enabled>
Note that a top Node <kroot> is specially appended since reason column can contain more than one <ChildNode> and it requires a top Node to group them before using xpath query (again confirmed that new Features need a long way to go before perfect).
This Blog demonstrates that a tiny unconscious change could provoke a dramatic impact.
Fix:
Remove:
'alter session set workarea_size_policy = manual'
and use dynamic code to adjust to back-end jobs.
If we check again "SQL AREA", we find that memory consumption is only half of original:
select bytes from v$sgastat where name = 'SQLA';
Test Code (Oralce 11.2.0.2 or later):
create table testt as
select level x, rpad ('abc', 100, 'k') y from dual connect by level <= 1000;
create or replace procedure tasks as
y number;
begin
for j in 1 .. 500 loop
for i in 1 .. 1000 loop
execute immediate 'select /*' || (rpad ('x', i, 'y')) || '*/ x from testt where x = :i'
into y using i;
end loop;
end loop;
end;
/
create or replace procedure auto_run as
begin
execute immediate 'alter session set workarea_size_policy = auto';
tasks;
end;
/
create or replace procedure manual_run as
begin
execute immediate 'alter session set workarea_size_policy = manual';
tasks;
end;
/
create or replace procedure start_work (p_job_cnt number) as
l_job_id pls_integer;
begin
for i in 1 .. p_job_cnt loop
dbms_job.submit (l_job_id, 'begin auto_run; end;');
dbms_job.submit (l_job_id, 'begin manual_run; end;');
end loop;
commit;
end;
/
exec start_work(32);