IOStat by Filetype summary
Filetype Name | Reads: Data | Reqs per sec | Data per sec | Writes: Data | Reqs per sec | Data per sec | Small Read | Large Read |
---|---|---|---|---|---|---|---|---|
Temp File | 640M | 2.06 | .177M | 641M | 0.85 | .177M | 309.84ms | 31.37ms |
IOStat by Filetype summary
Filetype Name | Reads: Data | Reqs per sec | Data per sec | Writes: Data | Reqs per sec | Data per sec | Small Read | Large Read |
---|---|---|---|---|---|---|---|---|
Temp File | 725M | 1.71 | .204M | 725M | 1.71 | .204M | 4.12ms | 303.76us |
In this blog, we run a few tests to demonstrate that the Temp File megabytes-read and read-request statistics are counted accurately, but the read-time statistics are not precisely measured.
Note: Tested in Oracle 19.7 on Solaris with the usual parameters below:
db_file_multiblock_read_count   32
_db_file_optimizer_read_count   32     (calculate optimizer query cost at parse time)
_db_file_exec_read_count        32     (control multiblock reads at execution time)
disk_asynch_io                  TRUE
db_block_size                   8192
At first, we create a GTT table of 700,000 rows:
set time on timing on
drop table test_gtt_tab purge;
create global temporary table test_gtt_tab on commit preserve rows as select level x, rpad('ABC', 1000, 'X') y from dual connect by level <= 7*1e5;
The GTT table has 100,096 data blocks and takes about 780 MB:
select s.sid, s.serial#, tablespace, contents, segtype, segfile#, segblk#, extents, blocks, segrfno#, ts#, sql_id_tempseg
from v$session s, v$tempseg_usage t where s.saddr=t.session_addr and sid = 189;
SID SERIAL# TABLESPACE CONTENTS SEGTYPE SEGFILE# SEGBLK# EXTENTS BLOCKS SEGRFNO# TS# SQL_ID_TEMPSE
----- ---------- ---------- --------- --------- --------- -------- -------- ------- --------- --- -------------
189 36349 TEMP TEMPORARY DATA 3073 477312 391 100096 1 3 0m2mbh7cxvfnr
189 36349 TEMP TEMPORARY TEMP_UNDO 3073 373632 1 256 1 3 0m2mbh7cxvfnr
select sid, s.serial#, p.spid, round(pga_used_mem/1024/1024) used_mb, round(pga_alloc_mem/1024/1024) alloc_mb,round(pga_max_mem/1024/1024) max_mb
from v$session s, v$process p where s.paddr=p.addr and sid = 189;
SID SERIAL# SPID USED_MB ALLOC_MB MAX_MB
----- -------- ----- ---------- ---------- ----------
189 36349 27944 9 9 53
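The segment sizing above can be cross-checked with simple arithmetic. This is only a back-of-the-envelope sketch; the rows-per-block figure is an approximation (block header and free-space overheads are not taken from the source):

```python
block_size = 8192       # db_block_size
rows = 700_000
rows_per_block = 7      # each row is ~1 KB (1000-char y plus x), so about 7 rows fit per 8 KB block

blocks = rows / rows_per_block                 # ~100,000 blocks
mb = blocks * block_size / 1024 / 1024         # ~781 MB
print(round(blocks), round(mb))
```

This lands close to the 100,096 blocks (about 780 MB) reported by v$tempseg_usage.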
Now we run a query against the GTT table and watch its read stats:
SQL> select sysdate, large_read_megabytes, large_read_reqs, large_read_servicetime, small_read_megabytes, small_read_reqs, small_read_servicetime
from v$iostat_file v where filetype_name = 'Temp File';
SYSDATE LARGE_READ_MEGABYTES LARGE_READ_REQS LARGE_READ_SERVICETIME SMALL_READ_MEGABYTES SMALL_READ_REQS SMALL_READ_SERVICETIME
-------- -------------------- --------------- ---------------------- -------------------- --------------- ----------------------
06:13:35 278097 1113328 2916 507 63543 817
SQL> select count(*) from test_gtt_tab t;
COUNT(*)
----------
700000
Elapsed: 00:00:00.33
SQL> select sysdate, large_read_megabytes, large_read_reqs, large_read_servicetime, small_read_megabytes, small_read_reqs, small_read_servicetime
from v$iostat_file v where filetype_name = 'Temp File';
SYSDATE LARGE_READ_MEGABYTES LARGE_READ_REQS LARGE_READ_SERVICETIME SMALL_READ_MEGABYTES SMALL_READ_REQS SMALL_READ_SERVICETIME
-------- -------------------- --------------- ---------------------- -------------------- --------------- ----------------------
06:13:36 278878 1116453 2916 507 63545 817
We can see that the full scan of test_gtt_tab increases LARGE_READ_REQS by 3125 (1116453-1113328)
and LARGE_READ_MEGABYTES by 781 (278878-278097), but LARGE_READ_SERVICETIME is not updated and remains the same (2916). Each LARGE_READ_REQS fetches 32 blocks, so 3125*32 makes about 100,000 blocks (781 MB). (There are also 2 (63545-63543) SMALL_READ_REQS; for Temp File reads, we mainly focus on LARGE_READ.)
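A quick sanity check of these deltas, using plain arithmetic on the two snapshots above:

```python
# Deltas between the two v$iostat_file snapshots
large_reqs = 1116453 - 1113328   # 3125 LARGE_READ_REQS
large_mb   = 278878 - 278097     # 781 LARGE_READ_MEGABYTES
small_reqs = 63545 - 63543       # 2 SMALL_READ_REQS

# Each large request fetches 32 blocks of 8192 bytes
blocks = large_reqs * 32                   # 100,000 blocks
mb = blocks * 8192 / 1024 / 1024           # 781.25 MB
assert round(mb) == large_mb               # the MB delta matches the request delta
```

So the megabytes and request counters are mutually consistent; only the service time stands still.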
If we re-run the above query and at the same time truss its process:
$ > truss -c -t pread,read,readv -p 27944
syscall seconds calls errors
read .000 3
pread .184 3127
-------- ------ ----
sys totals: .184 3130 0
usr time: .288
elapsed: 9.680
The summary report shows that there are 3127 pread calls to read test_gtt_tab from the Temp File.
These 3127 pread system calls correspond to 3125 LARGE_READ_REQS and 2 SMALL_READ_REQS.
All 3127 pread took 0.184 seconds; usr time is 0.288 seconds. We can also truss each read system call; the output looks like this (repeated pread lines are removed):
$ > truss -ADE -t pread,read,readv -p 27944
/1: read(17, 0x7FFFBDE7658E, 8208) (sleeping...)
/1: 06:35:26.287987 7.510987 7.510987 read(17, "\0\001 h06\0\0\0\0\011 i".., 8208) = 360
/1: 06:35:26.289297 0.001353 0.000065 pread(268, "10A2\0\080 H G\094 5 ` N".., 8192, 0xE9100000) = 8192
/6: 06:35:26.289979 7.513009 0.000409 pread(268, "06A2\0\081 H G\0DA 0 ` N".., 253952, 0xE9102000) = 253952
/7: 06:35:26.290062 7.513088 0.000433 pread(268, "06A2\0\0A0 H G\0DA 0 ` N".., 262144, 0xE9140000) = 262144
/4: 06:35:26.290445 7.513482 0.000117 pread(268, "06A2\0\0C0 H G\0DA 0 ` N".., 262144, 0xE9180000) = 262144
/5: 06:35:26.290516 7.513563 0.000089 pread(268, "06A2\0\0E0 H G\0DA 0 ` N".., 262144, 0xE91C0000) = 262144
/6: 06:35:26.290725 0.000764 0.000088 pread(268, "06A2\0\0\0 I G\0DA 0 ` N".., 262144, 0xE9200000) = 262144
/7: 06:35:26.290862 0.000822 0.000132 pread(268, "06A2\0\0 I G\0DA 0 ` N".., 262144, 0xE9240000) = 262144
/4: 06:35:26.291015 0.000580 0.000093 pread(268, "06A2\0\0 @ I G\0DA 0 ` N".., 262144, 0xE9280000) = 262144
/5: 06:35:26.291158 0.000643 0.000117 pread(268, "06A2\0\0 ` I G\0DA 0 ` N".., 262144, 0xE92C0000) = 262144
...
/6: 06:35:26.727446 0.000574 0.000096 pread(268, "06A2\0\0\0B5 E\0DA 0 ` N".., 262144, 0xB6A00000) = 262144
/7: 06:35:26.727545 0.000559 0.000086 pread(268, "06A2\0\0 B5 E\0DA 0 ` N".., 8192, 0xB6A40000) = 8192
/1: 06:35:26.728299 0.438996 0.000563 read(17, "\0\0\01506\0\0\0\0\00305".., 8208) = 21
/1: 06:35:26.746981 0.018691 0.018558 read(17, "\0\001F206\0\0\0\0\003 ^".., 8208) = 498
/1: read(17, 0x7FFFBDE9505E, 8208) (sleeping...)
Description:
Column 1: LWP
Column 2: time stamp at the completion of the system call (Option -A)
Column 3: time stamp delta between two successive calls on the same LWP (Option -D), e.g. for /5, 06:35:26.291158 - 06:35:26.290516 => 0.000643
Column 4: time elapsed between the beginning and end of a system call (wall clock time) (Option -E)
Column 5: System call function with arguments and return value
There are 3127 pread system calls from file descriptor 268. Two of them read 8192 bytes (1 data block SMALL_READ),
which are the first and last pread.
All the other 3125 read 262144 bytes (32 blocks), except one that reads 253952 bytes (31 blocks).
There are also three read system calls from file descriptor 17.
(For Oracle physical reads with pread, read and readv system calls,
see the linked book:
Oracle Database Performance Tuning (Studies . Practices . Researches), Chapter 1: Data Access - Physical Read.)
The two file descriptors in the above truss output are:
$ > pfiles 27944
17: S_IFSOCK mode:0666 dev:16377,0 ino:30968 uid:0 gid:0 size:0
O_RDWR FD_CLOEXEC
sockname: AF_INET6 ::ffff:10.116.10.50 port: 1522
peername: AF_INET6 ::ffff:10.55.101.112 port: 52955
SOCK_STREAM
268: S_IFREG mode:0640 dev:285,65562 ino:3052 uid:100 gid:101 size:3925876736 (4GB)
O_RDWR|O_DSYNC|O_LARGEFILE FD_CLOEXEC
/oratestdb/oradata/testdb/temp01.dbf
offset:0
If we make a SQL trace of the same query:
alter session set events='10046 trace name context forever, level 12' tracefile_identifier='read_time_stats_1';
select count(*) from test_gtt_tab t;
alter session set events='10046 trace name context off';
The TKPROF output looks like:
SQL ID: a4qgnmwbw6nxf Plan Hash: 2907084990
select count(*) from test_gtt_tab t
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 1 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.34 4.02 100001 100010 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.34 4.03 100001 100011 0 1
Rows (1st) Row Source Operation
---------- ---------------------------------------------------
1 SORT AGGREGATE (cr=100010 pr=100001 pw=0 time=4029979 us starts=1)
700000 TABLE ACCESS FULL TEST_GTT_TAB (cr=100010 pr=100001 pw=0 time=517483 us starts=1 cost=30 size=0 card=8168)
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
db file sequential read 1 0.07 0.07
direct path read temp 535 0.14 3.62
And the raw trace file contains 535 lines with nam='direct path read temp' (repeated lines removed):
WAIT #140736367740528: nam='direct path read temp' ela= 21302 file number=3073 first dba=477313 block cnt=31 obj#=4062216 tim=2558073933872
WAIT #140736367740528: nam='direct path read temp' ela= 21723 file number=3073 first dba=477440 block cnt=32 obj#=4062216 tim=2558073956542
WAIT #140736367740528: nam='direct path read temp' ela= 863 file number=3073 first dba=475520 block cnt=32 obj#=4062216 tim=2558073957967
...
WAIT #140736367740528: nam='direct path read temp' ela= 115 file number=3073 first dba=375744 block cnt=32 obj#=4062216 tim=2558077865590
WAIT #140736367740528: nam='direct path read temp' ela= 127 file number=3073 first dba=375808 block cnt=32 obj#=4062216 tim=2558077865921
Now we make a further test to select only 7000 rows:
SQL> select sysdate, large_read_megabytes, large_read_reqs, large_read_servicetime, small_read_megabytes, small_read_reqs, small_read_servicetime
from v$iostat_file v where filetype_name = 'Temp File';
SYSDATE LARGE_READ_MEGABYTES LARGE_READ_REQS LARGE_READ_SERVICETIME SMALL_READ_MEGABYTES SMALL_READ_REQS SMALL_READ_SERVICETIME
-------- -------------------- --------------- ---------------------- -------------------- --------------- ----------------------
07:30:35 284347 1138328 9766 507 63559 887
SQL> select count(*) from test_gtt_tab t where rownum <=7*1e3;
COUNT(*)
----------
7000
SQL> select sysdate, large_read_megabytes, large_read_reqs, large_read_servicetime, small_read_megabytes, small_read_reqs, small_read_servicetime
from v$iostat_file v where filetype_name = 'Temp File';
SYSDATE LARGE_READ_MEGABYTES LARGE_READ_REQS LARGE_READ_SERVICETIME SMALL_READ_MEGABYTES SMALL_READ_REQS SMALL_READ_SERVICETIME
-------- -------------------- --------------- ---------------------- -------------------- --------------- ----------------------
07:30:35 284355 1138361 9766 507 63560 887
There are 33 (1138361-1138328) LARGE_READ and 1 (63560-63559) SMALL_READ requests to read 8 (284355-284347) MB of data,
but both LARGE_READ_SERVICETIME and SMALL_READ_SERVICETIME remain the same before and after. truss also shows 34 pread calls:
$ > truss -c -t pread,read,readv -p 27944
syscall seconds calls errors
read .000 3
pread .002 34
-------- ------ ----
sys totals: .002 37 0
usr time: .004
elapsed: 4.770
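The per-execution numbers are again internally consistent. This sketch assumes the 32-block large reads observed earlier (the exact block count of the last read may differ slightly):

```python
large_reqs, small_reqs = 33, 1               # deltas from v$iostat_file for one execution
mb = large_reqs * 32 * 8192 / 1024 / 1024    # 8.25 MB
assert round(mb) == 8                        # matches the 8 MB delta in v$iostat_file
assert large_reqs + small_reqs == 34         # matches the 34 pread calls seen by truss
```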
If we repeat the same query 1000 times:
SQL> select sysdate, large_read_megabytes, large_read_reqs, large_read_servicetime, small_read_megabytes, small_read_reqs, small_read_servicetime
from v$iostat_file v where filetype_name = 'Temp File';
SYSDATE LARGE_READ_MEGABYTES LARGE_READ_REQS LARGE_READ_SERVICETIME SMALL_READ_MEGABYTES SMALL_READ_REQS SMALL_READ_SERVICETIME
-------- -------------------- --------------- ---------------------- -------------------- --------------- ----------------------
07:39:27 383269 1534394 9848 601 75561 887
SQL> begin
for i in 1..1000 loop
for c in (select count(*) from test_gtt_tab t where rownum <=7*1e3) loop
null;
end loop;
end loop;
end;
/
Elapsed: 00:00:04.42
SQL> select sysdate, large_read_megabytes, large_read_reqs, large_read_servicetime, small_read_megabytes, small_read_reqs, small_read_servicetime
from v$iostat_file v where filetype_name = 'Temp File';
SYSDATE LARGE_READ_MEGABYTES LARGE_READ_REQS LARGE_READ_SERVICETIME SMALL_READ_MEGABYTES SMALL_READ_REQS SMALL_READ_SERVICETIME
-------- -------------------- --------------- ---------------------- -------------------- --------------- ----------------------
07:39:31 391512 1567394 9851 609 76561 887
There are 33000 (1567394-1534394) LARGE_READ and 1000 (76561-75561) SMALL_READ requests. LARGE_READ_MEGABYTES increases by 8243 (391512-383269) MB.
They are exactly 1000 times those of one single execution. LARGE_READ_SERVICETIME increases by only 3 ms (9851-9848) and SMALL_READ_SERVICETIME remains the same before and after.
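The scaling can be checked directly. (The megabytes delta is not an exact multiple of 1000 because the single-run value of 8 MB was itself rounded from about 8.24 MB.)

```python
loops = 1000
large_reqs = 1567394 - 1534394   # 33000
small_reqs = 76561 - 75561       # 1000
large_mb   = 391512 - 383269     # 8243, i.e. ~8.24 MB per execution

assert large_reqs == 33 * loops  # 33 large reads per execution, scaled by 1000
assert small_reqs == 1 * loops   # 1 small read per execution, scaled by 1000
```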
truss also shows 34000 pread calls, with sys totals of 1.938 seconds and usr time of 4.093 seconds:
$ > truss -c -t pread,read,readv -p 27944
syscall seconds calls errors
read .000 2
pread 1.938 34000
-------- ------ ----
sys totals: 1.938 34002 0
usr time: 4.093
elapsed: 8.350
If we make a SQL trace of same query:
alter session set events='10046 trace name context forever, level 12' tracefile_identifier='read_time_stats_2';
begin
for i in 1..1000 loop
for c in (select count(*) from test_gtt_tab t where rownum <=7*1e3) loop
null;
end loop;
end loop;
end;
/
alter session set events='10046 trace name context off';
TKPROF shows:
SQL ID: 4nca0gxwjjwkx Plan Hash: 2658366304
SELECT COUNT(*) FROM TEST_GTT_TAB T WHERE ROWNUM <=7*1E3
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 2 0 0
Execute 1000 0.03 0.03 0 0 0 0
Fetch 1000 4.19 4.40 1056000 1003000 0 1000
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2001 4.23 4.44 1056000 1003002 0 1000
Rows (1st) Rows (avg) Rows (max) Row Source Operation
---------- ---------- ---------- ---------------------------------------------------
1 1 1 SORT AGGREGATE (cr=1003 pr=1056 pw=0 time=4961 us starts=1)
7000 7000 7000 COUNT STOPKEY (cr=1003 pr=1056 pw=0 time=5612 us starts=1)
7000 7000 7000 TABLE ACCESS FULL TEST_GTT_TAB (cr=1003 pr=1056 pw=0 time=4484 us starts=1 cost=26 size=0 card=7000)
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
reliable message 1000 0.00 0.07
db file sequential read 1000 0.00 0.01
direct path read temp 2184 0.00 0.14
We can also collect an AWR report by:
exec sys.dbms_workload_repository.create_snapshot('ALL');
begin
for i in 1..1000 loop
for c in (select count(*) from test_gtt_tab t where rownum <=7*1e3) loop
null;
end loop;
end loop;
end;
/
exec sys.dbms_workload_repository.create_snapshot('ALL');
IOStat by Filetype summary
Filetype Name | Reads: Data | Reqs per sec | Data per sec | Writes: Data | Reqs per sec | Data per sec | Small Read | Large Read |
---|---|---|---|---|---|---|---|---|
Temp File | 8.1G | 6013.44 | 1.4G | 0M | 0.00 | 0M | .00ns | 151.52ns |
The whole elapsed time is about 6 seconds (8.1/1.4 or 34000/6013.44), with a total LARGE_READ time of 5 ms (34000*151.52ns). However, the system tool truss shows that the 34000 pread system calls alone take 1.938 seconds of sys time, plus 4.093 seconds of usr time. At the Oracle layer, SQL trace shows a total elapsed time of 4.44 seconds, of which the Total Waited for 2184 "direct path read temp" events accounts for 0.14 seconds. All of these are much bigger than the disproportionately small total LARGE_READ time of 5 ms.
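The mismatch is easy to quantify with the numbers copied from the AWR report, truss summary and TKPROF output above:

```python
reqs = 34000
awr_ns_per_req = 151.52                     # Large Read per-request time from AWR
awr_total_s = reqs * awr_ns_per_req / 1e9   # ~0.00515 s (5 ms) total per AWR

truss_sys_s  = 1.938                        # pread sys time from truss -c
trace_wait_s = 0.14                         # 'direct path read temp' Total Waited from TKPROF

assert awr_total_s < 0.006                  # AWR claims ~5 ms of large-read service time
assert truss_sys_s / awr_total_s > 100      # OS-level read time is >2 orders of magnitude larger
```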
Here is a quick query to list Temp File read time stats. By looking at the fluctuation of the column ms_per_large_req, we can detect the discrepancies.
select round(l_time_d/nullif(l_req_d, 0),8) ms_per_large_req, round(s_time_d/nullif(s_req_d, 0),8) ms_per_small_req, v.*
from (
select h.end_interval_time ts, f.snap_id,
large_read_megabytes, large_read_reqs, large_read_servicetime,
large_read_megabytes-lag(large_read_megabytes) over (order by f.snap_id) l_mb_d,
large_read_reqs-lag(large_read_reqs) over (order by f.snap_id) l_req_d,
large_read_servicetime-lag(large_read_servicetime) over (order by f.snap_id) l_time_d,
small_read_megabytes, small_read_reqs, small_read_servicetime,
small_read_megabytes-lag(small_read_megabytes) over (order by f.snap_id) s_mb_d,
small_read_reqs-lag(small_read_reqs) over (order by f.snap_id) s_req_d,
small_read_servicetime-lag(small_read_servicetime) over (order by f.snap_id) s_time_d
from dba_hist_iostat_filetype f, dba_hist_snapshot h
where f.snap_id = h.snap_id and filetype_name = 'Temp File'
order by f.snap_id) v
-- where l_req_d > 0 or s_req_d > 0
;
For the above AWR report, the output for LARGE_READ shows that MS_PER_LARGE_REQ is .00015152 ms, or 151.52 ns.
MS_PER_LARGE_REQ TS SNAP_ID LARGE_READ_MEGABYTES LARGE_READ_REQS LARGE_READ_SERVICETIME L_MB_D L_REQ_D L_TIME_D
---------------- --------- ------- -------------------- --------------- ---------------------- ------ ------- --------
08:28:19 68202 457449 1831394 9924 0 0 0
.00015152 08:28:25 68203 465691 1864394 9929 8242 33000 5
v$iostat_file (dba_hist_iostat_filetype) is underpinned by the fixed view x$ksfdstfile.
The above tests show that LARGE_READ_SERVICETIME is hardly updated,
and is not coordinated with LARGE_READ_MEGABYTES and LARGE_READ_REQS.
It is not clear how x$ksfdstfile collects Temp File read time stats, nor when it is updated. In the same DB, there is also one snapshot in which MS_PER_LARGE_REQ is an implausible 5592.17391 ms (over 5 seconds per request). It is not yet clear how to reproduce such a case.
MS_PER_LARGE_REQ TS SNAP_ID LARGE_READ_MEGABYTES LARGE_READ_REQS LARGE_READ_SERVICETIME L_MB_D L_REQ_D L_TIME_D
---------------- --------- ------- -------------------- --------------- ---------------------- ------- ------- --------
12:00:23 68148 559 2330 26162
5592.17391 12:27:19 68149 628 2606 1569602 69 276 1543440
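The implausible value is simply the ratio of the two deltas, exactly as the query above computes it:

```python
l_time_d = 1543440                # LARGE_READ_SERVICETIME delta between the two snapshots (ms)
l_req_d  = 276                    # LARGE_READ_REQS delta

ms_per_large_req = l_time_d / l_req_d
assert round(ms_per_large_req, 5) == 5592.17391   # over 5.5 seconds per 32-block read
```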
By the way, for TEST_GTT_TAB "obj#=4062216" in the above raw trace file, we can run the query below
to map it to TEMP_DATA_OBJECT_ID 4671616 in x$ktsso (the underlying fixed view of v$tempseg_usage):
SQL> select o.object_name, o.object_id, ktssoobjd temp_data_object_id, ktssotsnum ts#, ktssofno segfile#,
ktssorfno segrfno#, ktssobno segblk#, ktssoexts extents, ktssoblks blocks
from x$ktsso t, dba_objects o
where t.ktssoobjn=o.object_id and (o.object_name = 'TEST_GTT_TAB' or ktssoobjd=4062216);
OBJECT_NAME OBJECT_ID TEMP_DATA_OBJECT_ID TS# SEGFILE# SEGRFNO# SEGBLK# EXTENTS BLOCKS
-------------- ---------- ------------------- ---- ---------- ---------- ---------- -------- -------
TEST_GTT_TAB 4062216 4671616 3 3073 1 477312 391 100096
SQL> select objd, file#, block#, class#, status, dirty, temp, ts# from v$bh where objd=4671616;
OBJD FILE# BLOCK# CLASS# STATUS D T TS#
-------- ------ ------- ------- ---------- - - ----
4671616 1 477312 4 xcur N Y 3
The temp_data_object_id 4671616 is also referenced in v$bh as OBJD (CLASS# 4 is 'segment header').