Sunday, March 7, 2021

Oracle Temp File Read Time Stats Problem

From time to time, we observe large discrepancies in Temp File Read Time stats between different AWR reports. Here are two examples of the AWR section "IOStat by Filetype summary" for Temp File, both with almost identical workloads. The first one shows that each Large Read takes 31.37ms, whereas the second shows only 303.76us, a difference of about 100 times.

IOStat by Filetype summary

    Filetype Name Reads: Data Reqs per sec Data per sec Writes: Data Reqs per sec Data per sec Small Read Large Read
    Temp File 640M 2.06 .177M 641M 0.85 .177M 309.84ms 31.37ms


    IOStat by Filetype summary

      Filetype Name Reads: Data Reqs per sec Data per sec Writes: Data Reqs per sec Data per sec Small Read Large Read
      Temp File 725M 1.71 .204M 725M 1.71 .204M 4.12ms 303.76us
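The size of the gap can be checked with simple arithmetic on the two averages (a quick sketch using the figures above):

```python
# Average Large Read latency from the two AWR reports above
first_report_ms = 31.37            # 31.37 ms per Large Read
second_report_ms = 303.76 / 1000   # 303.76 us per Large Read, in ms

ratio = first_report_ms / second_report_ms
print(f"ratio: {ratio:.0f}x")      # roughly 100x
```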

      In this blog, we will run tests to demonstrate that the Temp File Megabytes Read and Read Requests stats are accurately counted, but the Read Time stats are not precisely measured.

      Note: Tested in Oracle 19.7 on Solaris with usual parameters below:
         db_file_multiblock_read_count   32
         _db_file_optimizer_read_count   32 (calculate optimizer query cost at parse time)
         _db_file_exec_read_count        32 (control the multiblock reads at execution time)
         disk_asynch_io                  TRUE
         db_block_size                   8192
      
      At first, we create a GTT table of 700,000 rows:
      
      set time on timing on
      drop table test_gtt_tab purge; 
      create global temporary table test_gtt_tab on commit preserve rows as select level x, rpad('ABC', 1000, 'X') y from dual connect by level <= 7*1e5; 
      
      The GTT table has 100,096 data blocks and takes about 780 MB:
      
      select s.sid, s.serial#, tablespace, contents, segtype, segfile#, segblk#, extents, blocks, segrfno#, ts#, sql_id_tempseg
      from v$session s, v$tempseg_usage t where s.saddr=t.session_addr and sid = 189;
      
        SID    SERIAL# TABLESPACE CONTENTS  SEGTYPE    SEGFILE#  SEGBLK#  EXTENTS  BLOCKS  SEGRFNO# TS# SQL_ID_TEMPSE
      ----- ---------- ---------- --------- --------- --------- -------- -------- ------- --------- --- -------------
        189      36349 TEMP       TEMPORARY DATA           3073   477312      391  100096         1   3 0m2mbh7cxvfnr
        189      36349 TEMP       TEMPORARY TEMP_UNDO      3073   373632        1     256         1   3 0m2mbh7cxvfnr
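The BLOCKS figure of the DATA temp segment confirms the quoted segment size; converting with the 8 KB block size from the test setup:

```python
blocks = 100096                # BLOCKS of the DATA temp segment above
block_size = 8192              # db_block_size
size_mb = blocks * block_size / 1024 / 1024
print(f"{size_mb:.0f} MB")     # 782 MB, i.e. about 780 MB
```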
      
      select sid, s.serial#, p.spid, round(pga_used_mem/1024/1024) used_mb, round(pga_alloc_mem/1024/1024) alloc_mb,round(pga_max_mem/1024/1024) max_mb
      from v$session s, v$process p where s.paddr=p.addr and sid = 189;
      
        SID  SERIAL# SPID     USED_MB   ALLOC_MB     MAX_MB
      ----- -------- ----- ---------- ---------- ----------
        189    36349 27944          9          9         53
      
      Now we run a query on the GTT table and watch its read stats:
      
      SQL> select sysdate, large_read_megabytes, large_read_reqs, large_read_servicetime, small_read_megabytes, small_read_reqs, small_read_servicetime
             from v$iostat_file v where filetype_name = 'Temp File';
      
      SYSDATE  LARGE_READ_MEGABYTES LARGE_READ_REQS LARGE_READ_SERVICETIME SMALL_READ_MEGABYTES SMALL_READ_REQS SMALL_READ_SERVICETIME
      -------- -------------------- --------------- ---------------------- -------------------- --------------- ----------------------
      06:13:35               278097         1113328                   2916                  507           63543                    817
      
      SQL> select count(*) from test_gtt_tab t;
        COUNT(*)
      ----------
          700000
      
      Elapsed: 00:00:00.33
      
      SQL> select sysdate, large_read_megabytes, large_read_reqs, large_read_servicetime, small_read_megabytes, small_read_reqs, small_read_servicetime
             from v$iostat_file v where filetype_name = 'Temp File';
      
      SYSDATE  LARGE_READ_MEGABYTES LARGE_READ_REQS LARGE_READ_SERVICETIME SMALL_READ_MEGABYTES SMALL_READ_REQS SMALL_READ_SERVICETIME
      -------- -------------------- --------------- ---------------------- -------------------- --------------- ----------------------
      06:13:36               278878         1116453                   2916                  507           63545                    817
      
      We can see that the full scan of test_gtt_tab increases LARGE_READ_REQS by 3125 (1116453-1113328) and LARGE_READ_MEGABYTES by 781 (278878-278097), but LARGE_READ_SERVICETIME is not updated and still remains the same (2916).

      Each large read request fetches 32 blocks, so 3125*32 makes about 100,000 blocks (781 MB). (There are also 2 (63545-63543) SMALL_READ_REQS. For Temp File reads, we mainly focus on LARGE_READ.)
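The deltas cross-check: multiplying the request delta by the 32-block read size reproduces the megabyte delta (arithmetic on the figures above):

```python
large_req_delta = 1116453 - 1113328   # LARGE_READ_REQS delta = 3125
mb_delta = large_req_delta * 32 * 8192 / 1024 / 1024
print(large_req_delta, f"{mb_delta:.2f} MB")  # 3125 requests -> 781.25 MB
```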

      If we re-run the above query and at the same time truss its process:
      
      $ > truss -c -t pread,read,readv -p 27944
        syscall               seconds   calls  errors
        read                     .000       3
        pread                    .184    3127
                             --------  ------   ----
        sys totals:              .184    3130      0
        usr time:                .288
        elapsed:                9.680
      
      The summary report shows 3127 preads to read test_gtt_tab from the Temp File; these 3127 pread system calls correspond to the 3125 LARGE_READ_REQS and 2 SMALL_READ_REQS. All 3127 preads took 0.184 seconds of system time; usr time is 0.288 seconds.
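From the truss totals we can derive the average system-call cost of one pread (a rough figure; the per-I/O wall-clock service time would be larger):

```python
sys_seconds = 0.184   # total pread system time reported by truss
calls = 3127          # number of pread calls
avg_us = sys_seconds / calls * 1e6
print(f"{avg_us:.1f} us per pread")  # ~58.8 us
```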

      We can also truss each individual system call; the output looks like (repeated pread lines are removed):
      
      $ > truss -ADE -t pread,read,readv -p 27944
      
      /1:     read(17, 0x7FFFBDE7658E, 8208)  (sleeping...)
      /1:     06:35:26.287987     7.510987     7.510987    read(17, "\0\001 h06\0\0\0\0\011 i".., 8208)       = 360
      /1:     06:35:26.289297     0.001353     0.000065    pread(268, "10A2\0\080 H G\094 5 ` N".., 8192, 0xE9100000) = 8192
      /6:     06:35:26.289979     7.513009     0.000409    pread(268, "06A2\0\081 H G\0DA 0 ` N".., 253952, 0xE9102000) = 253952
      /7:     06:35:26.290062     7.513088     0.000433    pread(268, "06A2\0\0A0 H G\0DA 0 ` N".., 262144, 0xE9140000) = 262144
      /4:     06:35:26.290445     7.513482     0.000117    pread(268, "06A2\0\0C0 H G\0DA 0 ` N".., 262144, 0xE9180000) = 262144
      /5:     06:35:26.290516     7.513563     0.000089    pread(268, "06A2\0\0E0 H G\0DA 0 ` N".., 262144, 0xE91C0000) = 262144
      /6:     06:35:26.290725     0.000764     0.000088    pread(268, "06A2\0\0\0 I G\0DA 0 ` N".., 262144, 0xE9200000) = 262144
      /7:     06:35:26.290862     0.000822     0.000132    pread(268, "06A2\0\0   I G\0DA 0 ` N".., 262144, 0xE9240000) = 262144
      /4:     06:35:26.291015     0.000580     0.000093    pread(268, "06A2\0\0 @ I G\0DA 0 ` N".., 262144, 0xE9280000) = 262144
      /5:     06:35:26.291158     0.000643     0.000117    pread(268, "06A2\0\0 ` I G\0DA 0 ` N".., 262144, 0xE92C0000) = 262144
      ...
      /6:     06:35:26.727446     0.000574     0.000096    pread(268, "06A2\0\0\0B5 E\0DA 0 ` N".., 262144, 0xB6A00000) = 262144
      /7:     06:35:26.727545     0.000559     0.000086    pread(268, "06A2\0\0  B5 E\0DA 0 ` N".., 8192, 0xB6A40000) = 8192
      /1:     06:35:26.728299     0.438996     0.000563    read(17, "\0\0\01506\0\0\0\0\00305".., 8208)       = 21
      /1:     06:35:26.746981     0.018691     0.018558    read(17, "\0\001F206\0\0\0\0\003 ^".., 8208)       = 498
      /1:     read(17, 0x7FFFBDE9505E, 8208)  (sleeping...)
      
        Description:
          Column 1: LWP
          Column 2: time stamp at the completion of the system call (option -A)
          Column 3: time stamp delta between two calls on the same LWP (option -D), e.g. for /5, 06:35:26.291158 - 06:35:26.290516 => 0.000643
          Column 4: elapsed time between the beginning and end of a system call (wall clock time) (option -E)
          Column 5: system call with arguments and return value
      
      There are 3127 pread system calls on file descriptor 268. Two of them read 8192 bytes (1 data block, SMALL_READ): the first and the last pread. All other 3125 read 262144 bytes (32 blocks), except one which read 253952 bytes (31 blocks). There are also three read system calls on file descriptor 17. (For Oracle physical reads with pread, read and readv system calls, see the book linked on the right: Oracle Database Performance Tuning (Studies . Practices . Researches), Chapter 1: Data Access - Physical Read.)
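Summing the per-call byte counts from the truss output reproduces the reads: 3,124 full 32-block preads, one 31-block pread, and two single-block preads (arithmetic from the listing above):

```python
total_bytes = 3124 * 262144 + 1 * 253952 + 2 * 8192
total_blocks = total_bytes // 8192
print(total_blocks, f"{total_bytes / 1024 / 1024:.2f} MB")  # 100001 blocks, ~781 MB
```

These 100,001 blocks reappear as pr=100001 in the TKPROF row-source statistics below.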

      The two file descriptors in the above truss output are:
      
      $ > pfiles 27944
      
        17: S_IFSOCK mode:0666 dev:16377,0 ino:30968 uid:0 gid:0 size:0
            O_RDWR FD_CLOEXEC
              sockname: AF_INET6 ::ffff:10.116.10.50  port: 1522
              peername: AF_INET6 ::ffff:10.55.101.112  port: 52955
              SOCK_STREAM
      
       268: S_IFREG mode:0640 dev:285,65562 ino:3052 uid:100 gid:101 size:3925876736 (4GB)
            O_RDWR|O_DSYNC|O_LARGEFILE FD_CLOEXEC
            /oratestdb/oradata/testdb/temp01.dbf
            offset:0
      
      If we make a SQL trace of the same query:
      
      alter session set events='10046 trace name context forever, level 12' tracefile_identifier='read_time_stats_1';
      select count(*) from test_gtt_tab t;
      alter session set events='10046 trace name context off';
      
      The TKPROF output looks like:
      
      SQL ID: a4qgnmwbw6nxf Plan Hash: 2907084990
      select count(*) from test_gtt_tab t
      
      call     count       cpu    elapsed       disk      query    current        rows
      ------- ------  -------- ---------- ---------- ---------- ----------  ----------
      Parse        1      0.00       0.00          0          1          0           0
      Execute      1      0.00       0.00          0          0          0           0
      Fetch        2      0.34       4.02     100001     100010          0           1
      ------- ------  -------- ---------- ---------- ---------- ----------  ----------
      total        4      0.34       4.03     100001     100011          0           1
      
      Rows (1st)  Row Source Operation
      ----------  ---------------------------------------------------
               1  SORT AGGREGATE (cr=100010 pr=100001 pw=0 time=4029979 us starts=1)
          700000   TABLE ACCESS FULL TEST_GTT_TAB (cr=100010 pr=100001 pw=0 time=517483 us starts=1 cost=30 size=0 card=8168)
      
      Elapsed times include waiting on following events:
        Event waited on                             Times   Max. Wait  Total Waited
        ----------------------------------------   Waited  ----------  ------------
        db file sequential read                         1        0.07          0.07
        direct path read temp                         535        0.14          3.62
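With asynchronous I/O, a single 'direct path read temp' wait can reap several outstanding preads, which is why 535 waits cover the 3,125 large preads seen by truss (a rough ratio from the figures above):

```python
preads = 3125   # large preads counted by truss
waits = 535     # 'direct path read temp' waits in the trace
preads_per_wait = preads / waits
avg_wait_ms = 3.62 / waits * 1000
print(f"{preads_per_wait:.1f} preads per wait, {avg_wait_ms:.2f} ms average wait")
```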
      
      And the raw trace file contains 535 lines with nam='direct path read temp' (repeated lines removed):
      
      WAIT #140736367740528: nam='direct path read temp' ela= 21302 file number=3073 first dba=477313 block cnt=31 obj#=4062216 tim=2558073933872
      WAIT #140736367740528: nam='direct path read temp' ela= 21723 file number=3073 first dba=477440 block cnt=32 obj#=4062216 tim=2558073956542
      WAIT #140736367740528: nam='direct path read temp' ela= 863   file number=3073 first dba=475520 block cnt=32 obj#=4062216 tim=2558073957967
      ...
      WAIT #140736367740528: nam='direct path read temp' ela= 115   file number=3073 first dba=375744 block cnt=32 obj#=4062216 tim=2558077865590
      WAIT #140736367740528: nam='direct path read temp' ela= 127   file number=3073 first dba=375808 block cnt=32 obj#=4062216 tim=2558077865921
      
      Now we make a further test to select only 7000 rows:
      
      SQL> select sysdate, large_read_megabytes, large_read_reqs, large_read_servicetime, small_read_megabytes, small_read_reqs, small_read_servicetime
             from v$iostat_file v where filetype_name = 'Temp File';
      
      SYSDATE  LARGE_READ_MEGABYTES LARGE_READ_REQS LARGE_READ_SERVICETIME SMALL_READ_MEGABYTES SMALL_READ_REQS SMALL_READ_SERVICETIME
      -------- -------------------- --------------- ---------------------- -------------------- --------------- ----------------------
      07:30:35               284347         1138328                   9766                  507           63559                    887
      
      SQL> select count(*) from test_gtt_tab t where rownum <=7*1e3;
        COUNT(*)
      ----------
            7000
      
      SQL> select sysdate, large_read_megabytes, large_read_reqs, large_read_servicetime, small_read_megabytes, small_read_reqs, small_read_servicetime
             from v$iostat_file v where filetype_name = 'Temp File';
      
      SYSDATE  LARGE_READ_MEGABYTES LARGE_READ_REQS LARGE_READ_SERVICETIME SMALL_READ_MEGABYTES SMALL_READ_REQS SMALL_READ_SERVICETIME
      -------- -------------------- --------------- ---------------------- -------------------- --------------- ----------------------
      07:30:35               284355         1138361                   9766                  507           63560                    887
      
      There are 33 (1138361-1138328) LARGE_READ and 1 (63560-63559) SMALL_READ requests to read 8 (284355-284347) MB of data, but both LARGE_READ_SERVICETIME and SMALL_READ_SERVICETIME remain the same before and after.

      truss also shows 34 pread.
      
      $ > truss -c -t pread,read,readv -p 27944
        syscall               seconds   calls  errors
        read                     .000       3
        pread                    .002      34
                             --------  ------   ----
        sys totals:              .002      37      0
        usr time:                .004
        elapsed:                4.770
      
      If we repeat the same query 1000 times:
      
      SQL> select sysdate, large_read_megabytes, large_read_reqs, large_read_servicetime, small_read_megabytes, small_read_reqs, small_read_servicetime
             from v$iostat_file v where filetype_name = 'Temp File';
      
      SYSDATE  LARGE_READ_MEGABYTES LARGE_READ_REQS LARGE_READ_SERVICETIME SMALL_READ_MEGABYTES SMALL_READ_REQS SMALL_READ_SERVICETIME
      -------- -------------------- --------------- ---------------------- -------------------- --------------- ----------------------
      07:39:27               383269         1534394                   9848                  601           75561                    887
      
      SQL> begin
             for i in 1..1000 loop
               for c in (select count(*) from test_gtt_tab t where rownum <=7*1e3) loop
                 null;
               end loop;
             end loop;
           end;
           /
      
      Elapsed: 00:00:04.42
      
      SQL> select sysdate, large_read_megabytes, large_read_reqs, large_read_servicetime, small_read_megabytes, small_read_reqs, small_read_servicetime
             from v$iostat_file v where filetype_name = 'Temp File';
      
      SYSDATE  LARGE_READ_MEGABYTES LARGE_READ_REQS LARGE_READ_SERVICETIME SMALL_READ_MEGABYTES SMALL_READ_REQS SMALL_READ_SERVICETIME
      -------- -------------------- --------------- ---------------------- -------------------- --------------- ----------------------
      07:39:31               391512         1567394                   9851                  609           76561                    887
      
      There are 33000 (1567394-1534394) LARGE_READ and 1000 (76561-75561) SMALL_READ requests, and LARGE_READ_MEGABYTES increases by 8243 (391512-383269) MB. These are almost exactly 1000 times the figures of one single execution.

      LARGE_READ_SERVICETIME increases by 3ms (9851-9848) and SMALL_READ_SERVICETIME remains the same before and after.

      truss also shows 34000 preads, with sys totals of 1.938 seconds and usr time of 4.093 seconds.
      
      $ > truss -c -t pread,read,readv -p 27944
        syscall               seconds   calls  errors
        read                     .000       2
        pread                   1.938   34000
                             --------  ------   ----
        sys totals:             1.938   34002      0
        usr time:               4.093
        elapsed:                8.350
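The mismatch is now plain: the per-request time seen by the OS and the per-request time recorded in v$iostat_file differ by roughly three orders of magnitude (a comparison of the figures above, assuming the SERVICETIME columns are in milliseconds as documented):

```python
# truss: 34,000 preads consumed 1.938 s of system time
os_us_per_pread = 1.938 / 34000 * 1e6     # ~57 us per pread
# v$iostat_file: LARGE_READ_SERVICETIME grew only 3 ms over 33,000 requests
iostat_ns_per_req = 3e6 / 33000           # 3 ms = 3e6 ns -> ~91 ns per request
print(f"{os_us_per_pread:.0f} us vs {iostat_ns_per_req:.0f} ns per request")
```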
      
      If we make a SQL trace of the same query:
      
      alter session set events='10046 trace name context forever, level 12' tracefile_identifier='read_time_stats_2';
      begin
        for i in 1..1000 loop
          for c in (select count(*) from test_gtt_tab t where rownum <=7*1e3) loop
            null;
          end loop;
        end loop;
      end;
      /
      alter session set events='10046 trace name context off';
      
      TKPROF shows:
      
      SQL ID: 4nca0gxwjjwkx Plan Hash: 2658366304
      SELECT COUNT(*) FROM TEST_GTT_TAB T WHERE ROWNUM <=7*1E3
      
      call     count       cpu    elapsed       disk      query    current        rows
      ------- ------  -------- ---------- ---------- ---------- ----------  ----------
      Parse        1      0.00       0.00          0          2          0           0
      Execute   1000      0.03       0.03          0          0          0           0
      Fetch     1000      4.19       4.40    1056000    1003000          0        1000
      ------- ------  -------- ---------- ---------- ---------- ----------  ----------
      total     2001      4.23       4.44    1056000    1003002          0        1000
      
      Rows (1st) Rows (avg) Rows (max)  Row Source Operation
      ---------- ---------- ----------  ---------------------------------------------------
               1          1          1  SORT AGGREGATE (cr=1003 pr=1056 pw=0 time=4961 us starts=1)
            7000       7000       7000   COUNT STOPKEY (cr=1003 pr=1056 pw=0 time=5612 us starts=1)
            7000       7000       7000    TABLE ACCESS FULL TEST_GTT_TAB (cr=1003 pr=1056 pw=0 time=4484 us starts=1 cost=26 size=0 card=7000)
      
      Elapsed times include waiting on following events:
        Event waited on                             Times   Max. Wait  Total Waited
        ----------------------------------------   Waited  ----------  ------------
        reliable message                             1000        0.00          0.07
        db file sequential read                      1000        0.00          0.01
        direct path read temp                        2184        0.00          0.14
      
      We can also collect an AWR report by:
      
      exec sys.dbms_workload_repository.create_snapshot('ALL');
      begin
        for i in 1..1000 loop
          for c in (select count(*) from test_gtt_tab t where rownum <=7*1e3) loop
            null;
          end loop;
        end loop;
      end;
      /
      exec sys.dbms_workload_repository.create_snapshot('ALL');
      

      IOStat by Filetype summary

        Filetype Name Reads: Data Reqs per sec Data per sec Writes: Data Reqs per sec Data per sec Small Read Large Read
        Temp File 8.1G 6013.44 1.4G 0M 0.00 0M .00ns 151.52ns

        The whole elapsed time is about 6 seconds (8.1/1.4 or 34000/6013.44), with a total LARGE_READ time of 5 ms (34000*151.52ns). However, the system tool truss shows that the 34000 pread system calls alone take 1.938 seconds, with a usr time of 4.093 seconds; Oracle-level SQL trace shows a total elapsed time of 4.44 seconds, and the Total Waited of the 2184 "direct path read temp" events accounts for 0.14 seconds. All of those are much bigger than the disproportionately small total LARGE_READ time of 5 ms.

        Here is a quick query to list the Temp File Read Time stats. By looking at the fluctuation of the column ms_per_large_req, we can detect the discrepancies.
        
        select round(l_time_d/nullif(l_req_d, 0),8) ms_per_large_req, round(s_time_d/nullif(s_req_d, 0),8) ms_per_small_req, v.* 
         from (
           select h.end_interval_time ts, f.snap_id,  
                  large_read_megabytes, large_read_reqs, large_read_servicetime,
                  large_read_megabytes-lag(large_read_megabytes)     over (order by f.snap_id) l_mb_d,
                  large_read_reqs-lag(large_read_reqs)               over (order by f.snap_id) l_req_d,
                  large_read_servicetime-lag(large_read_servicetime) over (order by f.snap_id) l_time_d,
                  small_read_megabytes, small_read_reqs, small_read_servicetime,
                  small_read_megabytes-lag(small_read_megabytes)     over (order by f.snap_id) s_mb_d,
                  small_read_reqs-lag(small_read_reqs)               over (order by f.snap_id) s_req_d,
                  small_read_servicetime-lag(small_read_servicetime) over (order by f.snap_id) s_time_d
             from dba_hist_iostat_filetype f, dba_hist_snapshot h 
             where f.snap_id = h.snap_id and filetype_name = 'Temp File'
            order by f.snap_id) v
        -- where l_req_d > 0 or s_req_d > 0
        ;
        
        For the above AWR report, the output about LARGE_READ shows that MS_PER_LARGE_REQ is .00015152 ms, i.e. 151.52 ns.
        
        MS_PER_LARGE_REQ  TS       SNAP_ID LARGE_READ_MEGABYTES LARGE_READ_REQS LARGE_READ_SERVICETIME L_MB_D L_REQ_D L_TIME_D
        ---------------- --------- ------- -------------------- --------------- ---------------------- ------ ------- --------
                          08:28:19   68202               457449         1831394                   9924      0       0        0
               .00015152  08:28:25   68203               465691         1864394                   9929   8242   33000        5
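MS_PER_LARGE_REQ is simply the servicetime delta divided by the request delta, which confirms the 151.52 ns figure from the AWR report:

```python
l_time_d = 9929 - 9924       # LARGE_READ_SERVICETIME delta, in ms
l_req_d = 33000              # LARGE_READ_REQS delta
ms_per_large_req = round(l_time_d / l_req_d, 8)
print(ms_per_large_req, f"= {ms_per_large_req * 1e6:.2f} ns")  # 0.00015152 ms = 151.52 ns
```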
        
        v$iostat_file (dba_hist_iostat_filetype) is based on the underlying fixed view x$ksfdstfile. The above test shows that LARGE_READ_SERVICETIME is hardly updated, and is not coordinated with LARGE_READ_MEGABYTES and LARGE_READ_REQS. It is not clear how x$ksfdstfile collects the Temp File Read Time stats and when they are updated.

        In the same DB, there is also one snapshot in which MS_PER_LARGE_REQ is an implausible 5592.17391 ms (more than 5 seconds per request). It is not yet clear how to reproduce such a case.
        
        MS_PER_LARGE_REQ  TS       SNAP_ID LARGE_READ_MEGABYTES LARGE_READ_REQS LARGE_READ_SERVICETIME  L_MB_D L_REQ_D L_TIME_D
        ---------------- --------- ------- -------------------- --------------- ---------------------- ------- ------- --------
                          12:00:23   68148                  559            2330                  26162                 
              5592.17391  12:27:19   68149                  628            2606                1569602      69     276  1543440
        
        By the way, for the TEST_GTT_TAB "obj#=4062216" in the above raw trace file, we can run the query below to map it to TEMP_DATA_OBJECT_ID 4671616 in x$ktsso (the underlying fixed view of v$tempseg_usage):
        
        SQL> select o.object_name, o.object_id, ktssoobjd temp_data_object_id, ktssotsnum ts#, ktssofno segfile#,
                    ktssorfno segrfno#, ktssobno segblk#, ktssoexts extents, ktssoblks blocks
               from x$ktsso t, dba_objects o
              where t.ktssoobjn=o.object_id and (o.object_name = 'TEST_GTT_TAB' or ktssoobjd=4062216);
        
        OBJECT_NAME     OBJECT_ID TEMP_DATA_OBJECT_ID  TS#   SEGFILE#   SEGRFNO#    SEGBLK#  EXTENTS  BLOCKS
        -------------- ---------- ------------------- ---- ---------- ---------- ---------- -------- -------
        TEST_GTT_TAB      4062216             4671616    3       3073          1     477312      391  100096
        
        
        SQL> select objd, file#, block#, class#, status, dirty, temp, ts# from v$bh where objd=4671616;
        
            OBJD  FILE#  BLOCK#  CLASS# STATUS     D T  TS#
        -------- ------ ------- ------- ---------- - - ----
         4671616      1  477312       4 xcur       N Y    3
        
        The temp_data_object_id 4671616 is also referred to in v$bh as OBJD (CLASS# 4 stands for 'segment header').