At first exposing some figures (facts), then trying to discuss the unbalanced memory allocation among subpools, eventual the cause of ORA-04031.
Figures
1. Areas (Subheaps)
There are 959 areas (Oracle 11.2.0.3.0) in shared pool dump.
The query:
select count(*) from v$sgastat where pool = 'shared pool';
returns only 881, so not all areas are registered in v$sgastat, for example, "kokcd", "post agent".
(probably you will see 1062 in Oracle 12.1.0.1.0, and 35, 593, 729, 857 in the previous releases enumerated in Oracle Performance Firefighting)
2. Subpools
Subpool size can have more than 20% difference (range from 2'684'354KB to 3'355'443KB, all size in KB), areas in Subpools are also different. There are total 959 areas, but each subpool can have maximum 316 areas.
Name | Subpool_1 | Subpool_2 | Subpool_3 | Subpool_4 | Subpool_5 | Subpool_6 | Subpool_7 | Sum |
Subpool_Size | 2'684'354 | 3'355'443 | 3'355'443 | 2'684'354 | 3'355'443 | 2'684'354 | 3'355'443 | 21'474'836 |
Area_Count | 295 | 303 | 308 | 306 | 306 | 308 | 316 | 959 |
Top 5 Areas Name | Subpool_1 | Subpool_2 | Subpool_3 | Subpool_4 | Subpool_5 | Subpool_6 | Subpool_7 | Sum |
KKSSP | 617'693 | 563'340 | 590'296 | 590'798 | 577'532 | 643'504 | 577'237 | 4'160'404 |
db_block_hash_buckets | 529'039 | 535'429 | 529'039 | 539'525 | 535'429 | 529'044 | 535'429 | 3'732'935 |
KGLH0 | 554'532 | 485'312 | 464'006 | 353'634 | 450'528 | 346'045 | 332'455 | 2'986'514 |
SQLA | 1'110 | 588'943 | 565'317 | 185'664 | 574'008 | 155'498 | 272'452 | 2'342'994 |
free memory | 306'562 | 358'029 | 353'146 | 306'659 | 342'194 | 325'386 | 304'535 | 2'296'513 |
3. Single Area Allocation
There are 649 areas allocated only to one single subpool. 110 areas allocated to two subpools, but often one of two is allocated simply 24 Bytes, and other is allocated the rest, for example, FileOpenBlock is allocated 24 Bytes into Subpool_6, 510,025,424 Bytes into Subpool_7.
Here are the top 10 (all size in Byte):
Name Subpool_1 Subpool_2 Subpool_3 Subpool_4 Subpool_5 Subpool_6 Subpool_7 Sum ALLOC_COUNT
----------------------- ----------- ----------- ----------- ------------ ----------- ----------- ------------ ----------- -----------
FileOpenBlock 24 510,025,424 510,025,448 2
enqueue 39,257,104 24 39,257,128 2
KSK VT POOL 24 19,130,696 19,130,720 2
ktlbk state objects 17,971,200 17,971,200 1
Wait History Segment 15,733,120 15,733,120 1
Global Context 11,017,024 11,017,024 1
call 24 10,057,864 10,057,888 2
keswx:plan en 10,011,896 10,011,896 1
KEWS sesstat values 9,802,944 9,802,944 1
FileIdentificatonBlock 24 7,711,368 7,711,392 2
4. Unbalanced Areas
Following Areas allocations are not well balanced (difference to average > 30%)
The top 10 are (all size in Byte):
Name Subpool_1 Subpool_2 Subpool_3 Subpool_4 Subpool_5 Subpool_6 Subpool_7 Sum ALLOC_COUNT
-------------------- ----------- ----------- ----------- ------------ ----------- ----------- ------------ ------------- -----------
SQLA 1,110,321 588,943,273 565,317,412 185,664,243 574,008,432 155,498,158 272,452,386 2,342,994,419 7
FileOpenBlock 24 510,025,424 510,025,448 2
SQLP 761,808 66,155,040 61,548,864 15,324,936 57,623,952 10,234,544 21,440,800 233,089,944 7
KGLS 737,792 9,169,960 8,730,128 4,925,272 8,779,216 4,559,760 5,985,728 42,887,856 7
KQR M PO 30,720 799,232 512 255,896 30,568,128 138,680 8,590,256 40,383,424 7
enqueue 39,257,104 24 39,257,128 2
PLDIA 0 7,246,864 8,072,928 4,285,000 7,129,088 3,520,016 5,066,408 35,320,304 7
write state object 4,662,208 4,662,184 48 9,324,368 24 4,662,208 4,662,208 27,973,248 7
KQR L PO 924,032 21,262,248 117,768 22,304,048 3
PRTDS 241,720 4,668,976 4,023,352 2,747,344 5,001,776 2,203,384 1,311,744 20,198,296 7
Discussion
Based on the above figures, we can see that each subpool can have maximum 1/3 of total areas, and certain areas are only allocated to 1 or 2 subpools. For example,
- "FileOpenBlock" is in Subpool_6 and Subpool_7, but Subpool_6 has only 24 Bytes.
- "enqueue" is in Subpool_1 and Subpool_7, but Subpool_7 has only 24 Bytes.
- "PLDIA" is in Subpool_1 and Subpool_7, but Subpool_1 has 0 Bytes.
- "Global Context" is only in Subpool_4.
Now we have some more explanation of "Global Context" contention as experienced in the past since it is all only in one single subpool.
Certain areas are extremely unbalanced among subpools. For example,
- Subpool_1 has 1'110 KB for SQLA.
- Subpool_2 has 588'943 KB for SQLA.
That probably results in the error:
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","SELECT MAX(XX) FROM...","SQLA","tmp")
(probably Oracle can improve the error message by including the Subpool number)
Now I wonder if this shared pool architecture is by design hardly to be balanced. For static area allocations, like "db_block_hash_buckets", it could be OK. However dynamic areas with frequent memory fragmentation would put tremendous pressure on memory management.
ORA-04031 is a cumulative outcome after certain time of ineffective usage of memory. Therefore it is hard to predict and reproduce. Quite often it is thrown by the victim session, which are not necessarily the cause of the error.
By what we learned from LISP (CLOS) and Java, which are using automatic memory management (garbage collection), we can also say that Shared Pool memory management puts remarkable challenge on this technique, and hence a long way to be perfect.
For example, Oracle shared_pool durations "instance", "session", "cursor", and "execution" (vs. LISP garbage collector generations), is probably the second parameter of Heap Dump title:
HEAP DUMP heap name="sga heap(1,0)"
where 1 is subpool_number, 0 is duration.
In Oracle, if Redo is claimed as most critical mechanism in Oracle, probably shared pool should be declared as most sophisticated one.
BTW, to inspect number of Subpools, run following query (instead of checking "_kghdsidx_count").
select * from v$latch_children where name = 'shared pool';
It returns 7 rows (maxmum 7 Subpools), but only the active ones are those with higher GETS (or MISSES).