Sunday, March 7, 2021

Oracle Temp File Read Time Stats Problem

From time to time, we observe large discrepancies in Temp File Read Time stats between different AWR reports. Here are two examples of the AWR section "IOStat by Filetype summary" for Temp File, both taken under an almost identical workload. The first one shows that each Large Read takes 31.37ms, whereas the second shows only 303.76us, a difference of about 100 times.

IOStat by Filetype summary

    Filetype Name  Reads: Data  Reqs per sec  Data per sec  Writes: Data  Reqs per sec  Data per sec  Small Read  Large Read
    Temp File      640M         2.06          .177M         641M          0.85          .177M         309.84ms    31.37ms


    IOStat by Filetype summary

      Filetype Name  Reads: Data  Reqs per sec  Data per sec  Writes: Data  Reqs per sec  Data per sec  Small Read  Large Read
      Temp File      725M         1.71          .204M         725M          1.71          .204M         4.12ms      303.76us

      In this blog post, we will run tests to demonstrate that the Temp File Megabytes Read and Read Requests stats are accurately counted, but the Read Time stats are not precisely measured.

      Note: Tested in Oracle 19.7 on Solaris with the usual parameters below:
         db_file_multiblock_read_count   32
         _db_file_optimizer_read_count   32 (calculate optimizer query cost at parse time)
         _db_file_exec_read_count        32 (control the multiblock reads at execution time)
         disk_asynch_io                  TRUE
         db_block_size                   8192
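
      As a quick sanity check, these values (including the hidden underscore parameters) can be verified with a query like the one below. This is a minimal sketch to be run as SYS, since it touches the X$ fixed tables; it is not part of the original test output.

      select i.ksppinm parameter_name, v.ksppstvl parameter_value
        from x$ksppi i, x$ksppcv v
       where i.indx = v.indx
         and i.ksppinm in ('db_file_multiblock_read_count', '_db_file_optimizer_read_count',
                           '_db_file_exec_read_count', 'disk_asynch_io', 'db_block_size');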
      
      First, we create a GTT (global temporary table) with 700,000 rows:
      
      set time on timing on
      drop table test_gtt_tab purge; 
      create global temporary table test_gtt_tab on commit preserve rows as select level x, rpad('ABC', 1000, 'X') y from dual connect by level <= 7*1e5; 
      
      The GTT table has 100,096 data blocks and takes about 780 MB:
      
      select s.sid, s.serial#, tablespace, contents, segtype, segfile#, segblk#, extents, blocks, segrfno#, ts#, sql_id_tempseg
      from v$session s, v$tempseg_usage t where s.saddr=t.session_addr and sid = 189;
      
        SID    SERIAL# TABLESPACE CONTENTS  SEGTYPE    SEGFILE#  SEGBLK#  EXTENTS  BLOCKS  SEGRFNO# TS# SQL_ID_TEMPSE
      ----- ---------- ---------- --------- --------- --------- -------- -------- ------- --------- --- -------------
        189      36349 TEMP       TEMPORARY DATA           3073   477312      391  100096         1   3 0m2mbh7cxvfnr
        189      36349 TEMP       TEMPORARY TEMP_UNDO      3073   373632        1     256         1   3 0m2mbh7cxvfnr
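
      The block count above matches the stated size by simple arithmetic (a hypothetical check, assuming the 8 KB block size configured above):

      select round(100096 * 8192 / 1024 / 1024) gtt_mb from dual;
      -- returns 782, i.e. the "about 780 MB" mentioned above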
      
      select sid, s.serial#, p.spid, round(pga_used_mem/1024/1024) used_mb, round(pga_alloc_mem/1024/1024) alloc_mb,round(pga_max_mem/1024/1024) max_mb
      from v$session s, v$process p where s.paddr=p.addr and sid = 189;
      
        SID  SERIAL# SPID     USED_MB   ALLOC_MB     MAX_MB
      ----- -------- ----- ---------- ---------- ----------
        189    36349 27944          9          9         53
      
      Now we run a query against the GTT table and watch its read stats:
      
      SQL> select sysdate, large_read_megabytes, large_read_reqs, large_read_servicetime, small_read_megabytes, small_read_reqs, small_read_servicetime
             from v$iostat_file v where filetype_name = 'Temp File';
      
      SYSDATE  LARGE_READ_MEGABYTES LARGE_READ_REQS LARGE_READ_SERVICETIME SMALL_READ_MEGABYTES SMALL_READ_REQS SMALL_READ_SERVICETIME
      -------- -------------------- --------------- ---------------------- -------------------- --------------- ----------------------
      06:13:35               278097         1113328                   2916                  507           63543                    817
      
      SQL> select count(*) from test_gtt_tab t;
        COUNT(*)
      ----------
          700000
      
      Elapsed: 00:00:00.33
      
      SQL> select sysdate, large_read_megabytes, large_read_reqs, large_read_servicetime, small_read_megabytes, small_read_reqs, small_read_servicetime
             from v$iostat_file v where filetype_name = 'Temp File';
      
      SYSDATE  LARGE_READ_MEGABYTES LARGE_READ_REQS LARGE_READ_SERVICETIME SMALL_READ_MEGABYTES SMALL_READ_REQS SMALL_READ_SERVICETIME
      -------- -------------------- --------------- ---------------------- -------------------- --------------- ----------------------
      06:13:36               278878         1116453                   2916                  507           63545                    817
      
      We can see that the full scan of test_gtt_tab increases LARGE_READ_REQS by 3125 (1116453-1113328) and LARGE_READ_MEGABYTES by 781 (278878-278097), but LARGE_READ_SERVICETIME is not updated and remains the same (2916).

      Each large read request fetches 32 blocks, so 3125*32 gives about 100,000 blocks (781 MB). (There are also 2 (63545-63543) SMALL_READ_REQS. For Temp File reads, we mainly focus on LARGE_READ.)
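
      The arithmetic can be written out explicitly (a hypothetical cross-check, not part of the original test, again assuming 8 KB blocks):

      select 3125 * 32                             blocks_read,
             round(3125 * 32 * 8192 / 1024 / 1024) mb_read
        from dual;
      -- blocks_read = 100000, mb_read = 781, matching the deltas above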

      If we re-run the above query and at the same time truss its process:
      
      $ > truss -c -t pread,read,readv -p 27944
        syscall               seconds   calls  errors
        read                     .000       3
        pread                    .184    3127
                             --------  ------   ----
        sys totals:              .184    3130      0
        usr time:                .288
        elapsed:                9.680
      
      The summary report shows that there are 3127 preads to read test_gtt_tab from the Temp File. The 3127 pread system calls correspond to the 3125 LARGE_READ_REQS and 2 SMALL_READ_REQS. All 3127 preads took 0.184 seconds of system time, and usr time is 0.288 seconds.

      We can also truss each individual system call; the output looks like this (repeated pread lines removed):
      
      $ > truss -ADE -t pread,read,readv -p 27944
      
      /1:     read(17, 0x7FFFBDE7658E, 8208)  (sleeping...)
      /1:     06:35:26.287987     7.510987     7.510987    read(17, "\0\001 h06\0\0\0\0\011 i".., 8208)       = 360
      /1:     06:35:26.289297     0.001353     0.000065    pread(268, "10A2\0\080 H G\094 5 ` N".., 8192, 0xE9100000) = 8192
      /6:     06:35:26.289979     7.513009     0.000409    pread(268, "06A2\0\081 H G\0DA 0 ` N".., 253952, 0xE9102000) = 253952
      /7:     06:35:26.290062     7.513088     0.000433    pread(268, "06A2\0\0A0 H G\0DA 0 ` N".., 262144, 0xE9140000) = 262144
      /4:     06:35:26.290445     7.513482     0.000117    pread(268, "06A2\0\0C0 H G\0DA 0 ` N".., 262144, 0xE9180000) = 262144
      /5:     06:35:26.290516     7.513563     0.000089    pread(268, "06A2\0\0E0 H G\0DA 0 ` N".., 262144, 0xE91C0000) = 262144
      /6:     06:35:26.290725     0.000764     0.000088    pread(268, "06A2\0\0\0 I G\0DA 0 ` N".., 262144, 0xE9200000) = 262144
      /7:     06:35:26.290862     0.000822     0.000132    pread(268, "06A2\0\0   I G\0DA 0 ` N".., 262144, 0xE9240000) = 262144
      /4:     06:35:26.291015     0.000580     0.000093    pread(268, "06A2\0\0 @ I G\0DA 0 ` N".., 262144, 0xE9280000) = 262144
      /5:     06:35:26.291158     0.000643     0.000117    pread(268, "06A2\0\0 ` I G\0DA 0 ` N".., 262144, 0xE92C0000) = 262144
      ...
      /6:     06:35:26.727446     0.000574     0.000096    pread(268, "06A2\0\0\0B5 E\0DA 0 ` N".., 262144, 0xB6A00000) = 262144
      /7:     06:35:26.727545     0.000559     0.000086    pread(268, "06A2\0\0  B5 E\0DA 0 ` N".., 8192, 0xB6A40000) = 8192
      /1:     06:35:26.728299     0.438996     0.000563    read(17, "\0\0\01506\0\0\0\0\00305".., 8208)       = 21
      /1:     06:35:26.746981     0.018691     0.018558    read(17, "\0\001F206\0\0\0\0\003 ^".., 8208)       = 498
      /1:     read(17, 0x7FFFBDE9505E, 8208)  (sleeping...)
      
        Description:
          Column 1: LWP
          Column 2: time stamp at the completion of the system call (Option -A)
          Column 3: time stamp delta between two successive calls on the same LWP (Option -D), e.g. for /5, 06:35:26.291158 - 06:35:26.290516 => 0.000643
          Column 4: time elapsed between the beginning and end of a system call (wall clock time) (Option -E)
          Column 5: system call function with arguments and return value
      
      There are 3127 pread system calls from file descriptor 268. Two of them read 8192 bytes (1 data block, a SMALL_READ): the first and the last pread. The other 3125 read 262144 bytes (32 blocks) each, except one which reads 253952 bytes (31 blocks). There are also three read system calls from file descriptor 17. (For Oracle physical reads with pread, read and readv system calls, see the book Oracle Database Performance Tuning (Studies . Practices . Researches), Chapter 1: Data Access - Physical Read.)
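
      The pread sizes themselves add up to the same volume (a hypothetical byte-count check based on the truss output above: 3124 preads of 262144 bytes, 1 of 253952 bytes and 2 of 8192 bytes):

      select round((3124*262144 + 253952 + 2*8192) / 1024 / 1024) total_mb from dual;
      -- total_mb = 781, consistent with the V$IOSTAT_FILE megabyte deltas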

      The two file descriptors in the above truss output are:
      
      $ > pfiles 27944
      
        17: S_IFSOCK mode:0666 dev:16377,0 ino:30968 uid:0 gid:0 size:0
            O_RDWR FD_CLOEXEC
              sockname: AF_INET6 ::ffff:10.116.10.50  port: 1522
              peername: AF_INET6 ::ffff:10.55.101.112  port: 52955
              SOCK_STREAM
      
       268: S_IFREG mode:0640 dev:285,65562 ino:3052 uid:100 gid:101 size:3925876736 (4GB)
            O_RDWR|O_DSYNC|O_LARGEFILE FD_CLOEXEC
            /oratestdb/oradata/testdb/temp01.dbf
            offset:0
      
      If we make a SQL trace of the same query:
      
      alter session set events='10046 trace name context forever, level 12' tracefile_identifier='read_time_stats_1';
      select count(*) from test_gtt_tab t;
      alter session set events='10046 trace name context off';
      
      The TKPROF output looks like:
      
      SQL ID: a4qgnmwbw6nxf Plan Hash: 2907084990
      select count(*) from test_gtt_tab t
      
      call     count       cpu    elapsed       disk      query    current        rows
      ------- ------  -------- ---------- ---------- ---------- ----------  ----------
      Parse        1      0.00       0.00          0          1          0           0
      Execute      1      0.00       0.00          0          0          0           0
      Fetch        2      0.34       4.02     100001     100010          0           1
      ------- ------  -------- ---------- ---------- ---------- ----------  ----------
      total        4      0.34       4.03     100001     100011          0           1
      
      Rows (1st)  Row Source Operation
      ----------  ---------------------------------------------------
               1  SORT AGGREGATE (cr=100010 pr=100001 pw=0 time=4029979 us starts=1)
          700000   TABLE ACCESS FULL TEST_GTT_TAB (cr=100010 pr=100001 pw=0 time=517483 us starts=1 cost=30 size=0 card=8168)
      
      Elapsed times include waiting on following events:
        Event waited on                             Times   Max. Wait  Total Waited
        ----------------------------------------   Waited  ----------  ------------
        db file sequential read                         1        0.07          0.07
        direct path read temp                         535        0.14          3.62
      
      And the raw trace file contains 535 lines with nam='direct path read temp' (repeated lines removed):
      
      WAIT #140736367740528: nam='direct path read temp' ela= 21302 file number=3073 first dba=477313 block cnt=31 obj#=4062216 tim=2558073933872
      WAIT #140736367740528: nam='direct path read temp' ela= 21723 file number=3073 first dba=477440 block cnt=32 obj#=4062216 tim=2558073956542
      WAIT #140736367740528: nam='direct path read temp' ela= 863   file number=3073 first dba=475520 block cnt=32 obj#=4062216 tim=2558073957967
      ...
      WAIT #140736367740528: nam='direct path read temp' ela= 115   file number=3073 first dba=375744 block cnt=32 obj#=4062216 tim=2558077865590
      WAIT #140736367740528: nam='direct path read temp' ela= 127   file number=3073 first dba=375808 block cnt=32 obj#=4062216 tim=2558077865921
      
      Now we make a further test to select only 7000 rows:
      
      SQL> select sysdate, large_read_megabytes, large_read_reqs, large_read_servicetime, small_read_megabytes, small_read_reqs, small_read_servicetime
             from v$iostat_file v where filetype_name = 'Temp File';
      
      SYSDATE  LARGE_READ_MEGABYTES LARGE_READ_REQS LARGE_READ_SERVICETIME SMALL_READ_MEGABYTES SMALL_READ_REQS SMALL_READ_SERVICETIME
      -------- -------------------- --------------- ---------------------- -------------------- --------------- ----------------------
      07:30:35               284347         1138328                   9766                  507           63559                    887
      
      SQL> select count(*) from test_gtt_tab t where rownum <=7*1e3;
        COUNT(*)
      ----------
            7000
      
      SQL> select sysdate, large_read_megabytes, large_read_reqs, large_read_servicetime, small_read_megabytes, small_read_reqs, small_read_servicetime
             from v$iostat_file v where filetype_name = 'Temp File';
      
      SYSDATE  LARGE_READ_MEGABYTES LARGE_READ_REQS LARGE_READ_SERVICETIME SMALL_READ_MEGABYTES SMALL_READ_REQS SMALL_READ_SERVICETIME
      -------- -------------------- --------------- ---------------------- -------------------- --------------- ----------------------
      07:30:35               284355         1138361                   9766                  507           63560                    887
      
      There are 33 (1138361-1138328) LARGE_READs and 1 (63560-63559) SMALL_READ to read 8 (284355-284347) MB of data, but both LARGE_READ_SERVICETIME and SMALL_READ_SERVICETIME remain the same before and after.

      truss also shows 34 preads:
      
      $ > truss -c -t pread,read,readv -p 27944
        syscall               seconds   calls  errors
        read                     .000       3
        pread                    .002      34
                             --------  ------   ----
        sys totals:              .002      37      0
        usr time:                .004
        elapsed:                4.770
      
      If we repeat the same query 1000 times:
      
      SQL> select sysdate, large_read_megabytes, large_read_reqs, large_read_servicetime, small_read_megabytes, small_read_reqs, small_read_servicetime
             from v$iostat_file v where filetype_name = 'Temp File';
      
      SYSDATE  LARGE_READ_MEGABYTES LARGE_READ_REQS LARGE_READ_SERVICETIME SMALL_READ_MEGABYTES SMALL_READ_REQS SMALL_READ_SERVICETIME
      -------- -------------------- --------------- ---------------------- -------------------- --------------- ----------------------
      07:39:27               383269         1534394                   9848                  601           75561                    887
      
      SQL> begin
             for i in 1..1000 loop
               for c in (select count(*) from test_gtt_tab t where rownum <=7*1e3) loop
                 null;
               end loop;
             end loop;
           end;
           /
      
      Elapsed: 00:00:04.42
      
      SQL> select sysdate, large_read_megabytes, large_read_reqs, large_read_servicetime, small_read_megabytes, small_read_reqs, small_read_servicetime
             from v$iostat_file v where filetype_name = 'Temp File';
      
      SYSDATE  LARGE_READ_MEGABYTES LARGE_READ_REQS LARGE_READ_SERVICETIME SMALL_READ_MEGABYTES SMALL_READ_REQS SMALL_READ_SERVICETIME
      -------- -------------------- --------------- ---------------------- -------------------- --------------- ----------------------
      07:39:31               391512         1567394                   9851                  609           76561                    887
      
      There are 33000 (1567394-1534394) LARGE_READs and 1000 (76561-75561) SMALL_READs. LARGE_READ_MEGABYTES increases by 8243 (391512-383269) MB. These are exactly 1000 times the values of a single execution.

      LARGE_READ_SERVICETIME increases by only 3 ms (9851-9848), and SMALL_READ_SERVICETIME remains the same before and after.

      truss also shows 34000 preads, with sys totals of 1.938 seconds and usr time of 4.093 seconds:
      
      $ > truss -c -t pread,read,readv -p 27944
        syscall               seconds   calls  errors
        read                     .000       2
        pread                   1.938   34000
                             --------  ------   ----
        sys totals:             1.938   34002      0
        usr time:               4.093
        elapsed:                8.350
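
      Putting the two measurements side by side (a hypothetical per-request comparison derived from the deltas above, assuming the SERVICETIME columns are in milliseconds, as the later AWR figures suggest):

      select round(3/33000*1e6)     iostat_ns_per_large_read,  -- ~91 ns per large read according to V$IOSTAT_FILE
             round(1.938/34000*1e6) truss_us_per_pread         -- ~57 us of system time per pread according to truss
        from dual;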
      
      If we make a SQL trace of the same workload:
      
      alter session set events='10046 trace name context forever, level 12' tracefile_identifier='read_time_stats_2';
      begin
        for i in 1..1000 loop
          for c in (select count(*) from test_gtt_tab t where rownum <=7*1e3) loop
            null;
          end loop;
        end loop;
      end;
      /
      alter session set events='10046 trace name context off';
      
      TKPROF shows:
      
      SQL ID: 4nca0gxwjjwkx Plan Hash: 2658366304
      SELECT COUNT(*) FROM TEST_GTT_TAB T WHERE ROWNUM <=7*1E3
      
      call     count       cpu    elapsed       disk      query    current        rows
      ------- ------  -------- ---------- ---------- ---------- ----------  ----------
      Parse        1      0.00       0.00          0          2          0           0
      Execute   1000      0.03       0.03          0          0          0           0
      Fetch     1000      4.19       4.40    1056000    1003000          0        1000
      ------- ------  -------- ---------- ---------- ---------- ----------  ----------
      total     2001      4.23       4.44    1056000    1003002          0        1000
      
      Rows (1st) Rows (avg) Rows (max)  Row Source Operation
      ---------- ---------- ----------  ---------------------------------------------------
               1          1          1  SORT AGGREGATE (cr=1003 pr=1056 pw=0 time=4961 us starts=1)
            7000       7000       7000   COUNT STOPKEY (cr=1003 pr=1056 pw=0 time=5612 us starts=1)
            7000       7000       7000    TABLE ACCESS FULL TEST_GTT_TAB (cr=1003 pr=1056 pw=0 time=4484 us starts=1 cost=26 size=0 card=7000)
      
      Elapsed times include waiting on following events:
        Event waited on                             Times   Max. Wait  Total Waited
        ----------------------------------------   Waited  ----------  ------------
        reliable message                             1000        0.00          0.07
        db file sequential read                      1000        0.00          0.01
        direct path read temp                        2184        0.00          0.14
      
      We can also collect an AWR report with:
      
      exec sys.dbms_workload_repository.create_snapshot('ALL');
      begin
        for i in 1..1000 loop
          for c in (select count(*) from test_gtt_tab t where rownum <=7*1e3) loop
            null;
          end loop;
        end loop;
      end;
      /
      exec sys.dbms_workload_repository.create_snapshot('ALL');
      

      IOStat by Filetype summary

        Filetype Name  Reads: Data  Reqs per sec  Data per sec  Writes: Data  Reqs per sec  Data per sec  Small Read  Large Read
        Temp File      8.1G         6013.44       1.4G          0M            0.00          0M            .00ns       151.52ns

        The whole elapsed time is about 6 seconds (8.1/1.4, or 34000/6013.44), with a total LARGE_READ time of 5 ms (33000*151.52ns). However, the system tool truss shows that the 34000 pread system calls alone take 1.938 seconds of system time, with 4.093 seconds of usr time. At the Oracle layer, SQL trace shows a total elapsed time of 4.44 seconds, and the Total Waited of the 2184 "direct path read temp" events accounts for 0.14 seconds. All of these are much larger than the disproportionately small total LARGE_READ time of 5 ms.

        Here is a quick query to list the Temp File Read Time stats. By looking at the fluctuation of column ms_per_large_req, we can detect the discrepancies.
        
        select round(l_time_d/nullif(l_req_d, 0),8) ms_per_large_req, round(s_time_d/nullif(s_req_d, 0),8) ms_per_small_req, v.* 
         from (
           select h.end_interval_time ts, f.snap_id,  
                  large_read_megabytes, large_read_reqs, large_read_servicetime,
                  large_read_megabytes-lag(large_read_megabytes)     over (order by f.snap_id) l_mb_d,
                  large_read_reqs-lag(large_read_reqs)               over (order by f.snap_id) l_req_d,
                  large_read_servicetime-lag(large_read_servicetime) over (order by f.snap_id) l_time_d,
                  small_read_megabytes, small_read_reqs, small_read_servicetime,
                  small_read_megabytes-lag(small_read_megabytes)     over (order by f.snap_id) s_mb_d,
                  small_read_reqs-lag(small_read_reqs)               over (order by f.snap_id) s_req_d,
                  small_read_servicetime-lag(small_read_servicetime) over (order by f.snap_id) s_time_d
             from dba_hist_iostat_filetype f, dba_hist_snapshot h 
             where f.snap_id = h.snap_id and filetype_name = 'Temp File'
            order by f.snap_id) v
        -- where l_req_d > 0 or s_req_d > 0
        ;
        
        For the above AWR report, the output for LARGE_READ shows that MS_PER_LARGE_REQ is .00015152 ms, or 151.52 ns.
        
        MS_PER_LARGE_REQ  TS       SNAP_ID LARGE_READ_MEGABYTES LARGE_READ_REQS LARGE_READ_SERVICETIME L_MB_D L_REQ_D L_TIME_D
        ---------------- --------- ------- -------------------- --------------- ---------------------- ------ ------- --------
                          08:28:19   68202               457449         1831394                   9924      0       0        0
               .00015152  08:28:25   68203               465691         1864394                   9929   8242   33000        5
        
        v$iostat_file (dba_hist_iostat_filetype) is based on the fixed table x$ksfdstfile. The above test shows that LARGE_READ_SERVICETIME is hardly updated and is not consistent with LARGE_READ_MEGABYTES and LARGE_READ_REQS. It is not clear how x$ksfdstfile collects the Temp File Read Time stats and when they are updated.
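
        One way to check which fixed table underlies V$IOSTAT_FILE is to look at its fixed view definition (a small sketch, run as SYS):

        select view_definition
          from v$fixed_view_definition
         where view_name = 'GV$IOSTAT_FILE';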

        In the same DB, there is also one snapshot in which MS_PER_LARGE_REQ is an implausible 5592.17391 ms (about 5.6 seconds per request). It is not yet clear how to reproduce such a case.
        
        MS_PER_LARGE_REQ  TS       SNAP_ID LARGE_READ_MEGABYTES LARGE_READ_REQS LARGE_READ_SERVICETIME  L_MB_D L_REQ_D L_TIME_D
        ---------------- --------- ------- -------------------- --------------- ---------------------- ------- ------- --------
                          12:00:23   68148                  559            2330                  26162                 
              5592.17391  12:27:19   68149                  628            2606                1569602      69     276  1543440
        
        By the way, for the TEST_GTT_TAB "obj#=4062216" in the above raw trace file, we can run the query below to map it to TEMP_DATA_OBJECT_ID 4671616 in x$ktsso (the underlying fixed table of v$tempseg_usage):
        
        SQL> select o.object_name, o.object_id, ktssoobjd temp_data_object_id, ktssotsnum ts#, ktssofno segfile#,
                    ktssorfno segrfno#, ktssobno segblk#, ktssoexts extents, ktssoblks blocks
               from x$ktsso t, dba_objects o
             where t.ktssoobjn=o.object_id and (o.object_name = 'TEST_GTT_TAB' or t.ktssoobjn=4062216);
        
        OBJECT_NAME     OBJECT_ID TEMP_DATA_OBJECT_ID  TS#   SEGFILE#   SEGRFNO#    SEGBLK#  EXTENTS  BLOCKS
        -------------- ---------- ------------------- ---- ---------- ---------- ---------- -------- -------
        TEST_GTT_TAB      4062216             4671616    3       3073          1     477312      391  100096
        
        
        SQL> select objd, file#, block#, class#, status, dirty, temp, ts# from v$bh where objd=4671616;
        
            OBJD  FILE#  BLOCK#  CLASS# STATUS     D T  TS#
        -------- ------ ------- ------- ---------- - - ----
         4671616      1  477312       4 xcur       N Y    3
        
        The temp_data_object_id 4671616 is also referenced in v$bh as OBJD (CLASS# 4 stands for 'segment header').

        Oracle deallocate_unused_clause and its effect

        The purpose of deallocate_unused_clause is to explicitly deallocate unused space at the end of a database object segment. It is documented as:
          Oracle Database frees only unused space above the high water mark (that is, the point beyond which database blocks 
          have not yet been formatted to receive data). Oracle deallocates unused space beginning from the end of the object 
          and moving toward the beginning of the object to the high water mark.
          
          If an extent is completely contained in the deallocation, then the whole extent is freed for reuse. 
          If an extent is partially contained in the deallocation, then the used part up to the high water mark becomes the extent,
          and the remaining unused space is freed for reuse.
        
        In this blog post, we will run deallocation tests and look at their effect in relation to the low high-water mark (LHWM) and high high-water mark (HHWM). Then we dump the LMT file space header and segment header to reveal extent management internals.

        Note: Tested in Oracle 19.8


        1. Deallocation Test


        First, we make a test:
        
        -- drop tablespace test_ts including contents and datafiles;
        -- Locally Managed, AUTOALLOCATE (system managed) Tablespace, Segment Space Management AUTO
        SQL> create BIGFILE tablespace test_ts datafile '/oratestdb/oradata/testdb/test_ts.dbf' size 1000m online
                    extent management local autoallocate segment space management auto;
        
        -- drop table test_tab purge;
        SQL> create table test_tab tablespace test_ts as select level x, rpad('ABC', 1000, 'X') y from dual connect by level <= 2*1e4;
        	Table created.
        
        SQL> select round(bytes/1024/1024) mb, blocks, extents, initial_extent/8192 initial_ext_blocks, next_extent/8192 next_ext_blocks
             from dba_segments where segment_name = 'TEST_TAB';
        
                MB     BLOCKS    EXTENTS INITIAL_EXT_BLOCKS NEXT_EXT_BLOCKS
        ---------- ---------- ---------- ------------------ ---------------
                23       2944         38                  8             128
        
        SQL> select blocks, count(*) cnt, blocks*count(*) blocks#
                   ,min(extent_id) min_ext_id, max(extent_id) max_ext_id, max(extent_id)-min(extent_id)+1 ext_id_span
                   ,min(block_id) min_id_blk, max(block_id)+blocks-1 max_id_blk, max(block_id)+blocks-1-min(block_id)+1 blk_id_span
             from dba_extents where segment_name = 'TEST_TAB' group by blocks order by 1;
        
            BLOCKS        CNT    BLOCKS# MIN_EXT_ID MAX_EXT_ID EXT_ID_SPAN MIN_ID_BLK MAX_ID_BLK BLK_ID_SPAN
        ---------- ---------- ---------- ---------- ---------- ----------- ---------- ---------- -----------
                 8         16        128          0         15          16       8696       8823         128
               128         22       2816         16         37          22       8832      11647        2816
        
        We can see that 16 extents of 8 blocks and 22 extents of 128 blocks are allocated to the test table, in total 38 (16+22) extents and 2944 (128+2816) blocks.

        Now we deallocate unused space:
        
        -- 'keep 0' has no effect since it is the default
        SQL> alter table test_tab deallocate unused keep 0;
        	Table altered.
        
        SQL> select blocks, count(*) cnt, blocks*count(*) blocks#
                   ,min(extent_id) min_ext_id, max(extent_id) max_ext_id, max(extent_id)-min(extent_id)+1 ext_id_span
                   ,min(block_id) min_id_blk, max(block_id)+blocks-1 max_id_blk, max(block_id)+blocks-1-min(block_id)+1 blk_id_span
             from dba_extents where segment_name = 'TEST_TAB' group by blocks order by 1;
        
            BLOCKS        CNT    BLOCKS# MIN_EXT_ID MAX_EXT_ID EXT_ID_SPAN MIN_ID_BLK MAX_ID_BLK BLK_ID_SPAN
        ---------- ---------- ---------- ---------- ---------- ----------- ---------- ---------- -----------
                 8         16        128          0         15          16       8696       8823         128
                96          1         96         37         37           1      11520      11615          96
               128         21       2688         16         36          21       8832      11519        2688
        
        We can see that the last extent (extent_id 37) is partially deallocated: its size shrinks from 128 blocks to 96 blocks, which means that the 32 (128-96) unformatted blocks at the end of that extent are freed. MAX_ID_BLK is also reduced by 32 (11647-11615).

        Oracle has two High-Water Marks:
          -. Low High-Water Mark (LHWM): Like the old High Water Mark. All blocks below LHWM have been formatted for the table.
          -. High High-Water Mark (HHWM): blocks between LHWM and HHWM are formatted only when used, some blocks could be unformatted.
        
        The last extent is located between the LHWM and HHWM, so it can contain blocks that have not yet been formatted and are hence eligible to be deallocated. In the above Oracle documentation excerpt, "high water mark" probably refers to the Low High-Water Mark (LHWM).
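
        As a cross-check of how much space sits above the high water mark and is therefore a candidate for deallocation, we can query dbms_space.unused_space (a minimal sketch, not executed in the original test; the reported figures depend on the actual segment state):

        set serveroutput on
        declare
          l_total_blocks  number;  l_total_bytes  number;
          l_unused_blocks number;  l_unused_bytes number;
          l_lue_file_id   number;  l_lue_block_id number;
          l_last_used_blk number;
        begin
          dbms_space.unused_space(
            segment_owner             => user,
            segment_name              => 'TEST_TAB',
            segment_type              => 'TABLE',
            total_blocks              => l_total_blocks,
            total_bytes               => l_total_bytes,
            unused_blocks             => l_unused_blocks,
            unused_bytes              => l_unused_bytes,
            last_used_extent_file_id  => l_lue_file_id,
            last_used_extent_block_id => l_lue_block_id,
            last_used_block           => l_last_used_blk);
          dbms_output.put_line('total blocks  : ' || l_total_blocks);
          dbms_output.put_line('unused blocks : ' || l_unused_blocks);
        end;
        /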

        After deallocation, the total extent count is still 38 (16+1+21), but the block count is reduced from the previous 2944 to 2912 (128+96+2688). In the later TEST_TAB segment header dump, we will see the following text:
        
          Low HighWater Mark : 
              Highwater::  0x00002d60  ext#: 37     blk#: 96     ext size: 128   
          #blocks in seg. hdr's freelists: 0     
          #blocks below: 2912  
        
        Now we insert one row and then deallocate again:
        
        SQL> insert into test_tab values (987654, rpad('ABC', 1000, 'X'));
        	1 row created.
        
        SQL> commit;
        	Commit complete.
        
        SQL> select blocks, count(*) cnt, blocks*count(*) blocks#
                   ,min(extent_id) min_ext_id, max(extent_id) max_ext_id, max(extent_id)-min(extent_id)+1 ext_id_span
                   ,min(block_id) min_id_blk, max(block_id)+blocks-1 max_id_blk, max(block_id)+blocks-1-min(block_id)+1 blk_id_span
             from dba_extents where segment_name = 'TEST_TAB' group by blocks order by 1;
        
            BLOCKS        CNT    BLOCKS# MIN_EXT_ID MAX_EXT_ID EXT_ID_SPAN MIN_ID_BLK MAX_ID_BLK BLK_ID_SPAN
        ---------- ---------- ---------- ---------- ---------- ----------- ---------- ---------- -----------
                 8         16        128          0         15          16       8696       8823         128
                96          1         96         37         37           1      11520      11615          96
               128         22       2816         16         38          23       8832      11775        2944
        
        SQL> alter table test_tab deallocate unused;
          Table altered.
        
        SQL> select blocks, count(*) cnt, blocks*count(*) blocks#
                   ,min(extent_id) min_ext_id, max(extent_id) max_ext_id, max(extent_id)-min(extent_id)+1 ext_id_span
                   ,min(block_id) min_id_blk, max(block_id)+blocks-1 max_id_blk, max(block_id)+blocks-1-min(block_id)+1 blk_id_span
             from dba_extents where segment_name = 'TEST_TAB' group by blocks order by 1;
        
            BLOCKS        CNT    BLOCKS# MIN_EXT_ID MAX_EXT_ID EXT_ID_SPAN MIN_ID_BLK MAX_ID_BLK BLK_ID_SPAN
        ---------- ---------- ---------- ---------- ---------- ----------- ---------- ---------- -----------
                 8         16        128          0         15          16       8696       8823         128
                96          1         96         37         37           1      11520      11615          96
               128         22       2816         16         38          23       8832      11775        2944
        
        We can see that a new extent (extent_id 38, size 128 blocks) is allocated for this single row, so the number of 128-block extents increases from 21 back to 22. The previously shrunk extent (size 96 blocks) is not expanded and re-used, and this newly produced irregular extent could add complexity to extent management.

        In fact, if we had not performed any deallocation, we could safely insert 200 more rows without adding any new extent:
        
        insert into test_tab select level x, rpad('ABC', 1000, 'X') y from dual connect by level <= 200; 
        
        So the effect is that deallocating 32 blocks incurs a new extent of 128 blocks. (Note: blocks of deleted rows are not eligible to be deallocated because they have already been formatted.)
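
        To illustrate the note, a hypothetical follow-up (not executed in this test) would be to delete rows and deallocate again; since the deleted rows' blocks have already been formatted, the extent layout in dba_extents is expected to stay unchanged:

        delete from test_tab where x <= 10000;
        commit;
        alter table test_tab deallocate unused;
        -- dba_extents should show the same extent layout as before the delete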

        The above test shows that deallocation frees the unformatted blocks of the last extent the first time it runs, and has no further effect on subsequent executions.


        2. File Header and Bitmap Dump


        Now we create 3 tablespaces: one BIGFILE, one SMALLFILE, and one more BIGFILE with "_enable_12g_bft"=FALSE (default is TRUE). In each of them, we also create one table.
        
        drop tablespace test_ts including contents and datafiles;
        create BIGFILE tablespace test_ts datafile '/oratestdb/oradata/testdb/test_ts.dbf' size 1000m online
               extent management local autoallocate segment space management auto;
        drop table test_tab purge;
        create table test_tab tablespace test_ts as select level x, rpad('ABC', 1000, 'X') y from dual connect by level <= 2*1e4; 
        
        drop tablespace test_ts_small including contents and datafiles;
        create SMALLFILE tablespace test_ts_small datafile '/oratestdb/oradata/testdb/test_ts_small.dbf' size 1000m online
               extent management local autoallocate segment space management auto;
        drop table test_tab_small purge;
        create table test_tab_small tablespace test_ts_small as select level x, rpad('ABC', 1000, 'X') y from dual connect by level <= 2*1e4; 
        
        --alter system reset "_enable_12g_bft" scope=memory;     -- (enable 12g bigfile tablespace, default TRUE)
        alter system set "_enable_12g_bft"=FALSE scope=memory;
        
        drop tablespace test_ts_12g_bft_false including contents and datafiles;
        create BIGFILE tablespace test_ts_12g_bft_false datafile '/oratestdb/oradata/testdb/test_ts_12g_bft_false.dbf' size 1000m online
               extent management local autoallocate segment space management auto;
        drop table test_tab_12g_bft_false purge;
        create table test_tab_12g_bft_false tablespace test_ts_12g_bft_false as select level x, rpad('ABC', 1000, 'X') y from dual connect by level <= 2*1e4; 
        
        alter system reset "_enable_12g_bft" scope=memory; 
        
        Now we query the FILE_ID of each tablespace and dump blocks 2 and 3:
        
        SQL > select tablespace_name, file_id from dba_data_files where tablespace_name in ('TEST_TS', 'TEST_TS_SMALL', 'TEST_TS_12G_BFT_FALSE');
        
        TABLESPACE_NAME                   FILE_ID
        ------------------------------ ----------
        TEST_TS                              1968
        TEST_TS_SMALL                        1969
        TEST_TS_12G_BFT_FALSE                1970
        
        alter session set tracefile_identifier = 'TEST_TS';
        alter system dump datafile 1968 block min 2 block max 3;
        
        alter session set tracefile_identifier = 'TEST_TS_SMALL';
        alter system dump datafile 1969 block min 2 block max 3;
        
        alter session set tracefile_identifier = 'TEST_TS_12G_BFT_FALSE';
        alter system dump datafile 1970 block min 2 block max 3;
        
        Here are all 3 dumps (many detail lines removed):
        
        --============ TEST_TS ============--
        Start dump data blocks tsn: 2882 file#:1968 minblk 2 maxblk 3
        BH (0x118fd2048) file#: 1968 rdba: 0x00000002 (1024/2) class: 13 ba: 0x118bbe000
          frmt: 0x02 chkval: 0xff66 type: 0x4c=KTFBN File Space Header
          
        File Space Header Block:
        ------------------------
        Header Control:
        ---------------
          RelFno:1024 Unit:8 Size:128000 Initial:8691 Tail:128000
          AutoExtend:NO Increment:0 MaxSize:0
          Flag:0x00000901 (BASIC/-/-/-/-/-/-/-/BIGF/-/AUTOALLOC)
          Deallocation scn: 0x00000000fb5cd8ea 
        BigFile Header Control:
        -----------------------
          First_FSB:3 FSB_Batch:128 FSB_Chunk:1 FSB_FFBs:16384
          First_FFB:131 FFB_Chunk:16 FFB_AUs:32768
          -----------------
          FSB instance map:
          -----------------
          [  0] 0x00000003 (3)
          METADATA Table:
          ---------------
          [ 0] flag:0x7(I/S/P) first:0x00000093(147) len:96
          
        BH (0x128fd8df8) file#: 1968 rdba: 0x00000003 (1024/3) class: 12 ba: 0x128c5a000
          frmt: 0x02 chkval: 0xa038 type: 0x4d=KTFBN File Space Level 2 Bitmap
        FSB BitMap Control:
        -------------------
          begin:0 unit:16384 flag:0x1
        FFB Bitmap:
        
        
        --============ TEST_TS_SMALL ============--
        Start dump data blocks tsn: 2883 file#:1969 minblk 2 maxblk 3
        BH (0x154f7dbb0) file#: 1969 rdba: 0xec800002 (946/2) class: 13 ba: 0x154440000
          frmt: 0x02 chkval: 0xbc13 type: 0x1d=KTFB Bitmapped File Space Header
        File Space Header Block: 
        Header Control: 
        RelFno: 946, Unit: 8, Size: 128000, Flag: 1(0x1) 
        AutoExtend: NO, Increment: 0, MaxSize: 0 
        Initial Area: 126, Tail: 127999, First: 368, Free: 15616 
        
        BH (0x124fd52e8) file#: 1969 rdba: 0xec800003 (946/3) class: 12 ba: 0x124c06000
          frmt: 0x02 chkval: 0xac2b type: 0x1e=KTFB Bitmapped File Space Bitmap
        File Space Bitmap Block: 
        BitMap Control: 
        RelFno: 946, BeginBlock: 128, Flag: 0, First: 368, Free: 63120 
        
        
        --============ TEST_TS_12G_BFT_FALSE ============--
        Start dump data blocks tsn: 2884 file#:1970 minblk 2 maxblk 3
        BH (0x162fc1248) file#: 1970 rdba: 0x00000002 (1024/2) class: 13 ba: 0x162a3e000
          frmt: 0x02 chkval: 0xdbd1 type: 0x1d=KTFB Bitmapped File Space Header
        File Space Header Block: 
        Header Control: 
        RelFno: 1024, Unit: 8, Size: 128000, Flag: 1(0x1) 
        AutoExtend: NO, Increment: 0, MaxSize: 0 
        Initial Area: 126, Tail: 127999, First: 368, Free: 15616 
        
        BH (0x118fb5fb0) file#: 1970 rdba: 0x00000003 (1024/3) class: 12 ba: 0x118940000
          frmt: 0x02 chkval: 0x4719 type: 0x1e=KTFB Bitmapped File Space Bitmap
        File Space Bitmap Block: 
        BitMap Control: 
        RelFno: 1024, BeginBlock: 128, Flag: 0, First: 368, Free: 63120 
        
        In the above dumps, block 2 is "class: 13" ('file header block'), and block 3 is "class: 12" ('bitmap index block') (See Oracle Internals - Block Classes).

        In the TEST_TS dump, block 2 and 3 types are prefixed with "KTFBN", which are:
          KTFBN File Space Header
          KTFBN File Space Level 2 Bitmap
        
        whereas in the TEST_TS_SMALL dump, block 2 and 3 types are prefixed with "KTFB", which are:
          KTFB Bitmapped File Space Header
          KTFB Bitmapped File Space Bitmap
        
        TEST_TS_12G_BFT_FALSE is a BIGFILE tablespace similar to TEST_TS, but created with "_enable_12g_bft"=FALSE (default is TRUE). Its header and bitmap blocks are then the same as those of the SMALLFILE tablespace.
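
        A quick dictionary check (a hedged sketch; dba_tablespaces exposes the BIGFILE attribute) shows that TEST_TS_12G_BFT_FALSE is still reported as a BIGFILE tablespace even though its header and bitmap blocks use the SMALLFILE-style KTFB format:

        select tablespace_name, bigfile
          from dba_tablespaces
         where tablespace_name in ('TEST_TS', 'TEST_TS_SMALL', 'TEST_TS_12G_BFT_FALSE');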

        The extents inside a datafile of a locally managed tablespace are tracked in KTFBN or KTFB bitmap blocks. It seems that the hidden parameter "_enable_12g_bft" switches the bitmap block type identification from KTFB to KTFBN.

        For KTFB bitmap blocks, the blog post "Deciphering extent management internals inside locally managed tablespaces" provides a deep investigation.


        3. Segment Header and Bitmap Dump


        For the 3 created tables, we first find their HEADER_BLOCK, and then dump the HEADER_BLOCK and its two preceding blocks:
        
        SQL> select segment_name, header_file, header_block from dba_segments where segment_name in ('TEST_TAB', 'TEST_TAB_SMALL', 'TEST_TAB_12G_BFT_FALSE');
        SEGMENT_NAME           HEADER_FILE HEADER_BLOCK
        ---------------------- ----------- ------------
        TEST_TAB                      1968         8698
        TEST_TAB_SMALL                1969          130
        TEST_TAB_12G_BFT_FALSE        1970          130
        
        alter session set tracefile_identifier = 'TEST_TAB';
        alter system dump datafile 1968 block min 8696 block max 8698;
        
        alter session set tracefile_identifier = 'TEST_TAB_SMALL';
        alter system dump datafile 1969 block min 128 block max 130;
        
        alter session set tracefile_identifier = 'TEST_TAB_12G_BFT_FALSE';
        alter system dump datafile 1970 block min 128 block max 130;
        
        The HEADER_BLOCK is "class: 4" ('segment header'), and the two preceding blocks are "class: 8" ('1st level bmb') and "class: 9" ('2nd level bmb'):
        
        --============ TEST_TAB ============--
        Start dump data blocks tsn: 2882 file#:1968 minblk 8696 maxblk 8698
        BH (0x118f9aa58) file#: 1968 rdba: 0x000021f8 (1024/8696) class: 8 ba: 0x1186d2000
          frmt: 0x02 chkval: 0xe677 type: 0x20=FIRST LEVEL BITMAP BLOCK
        Dump of First Level Bitmap Block
         --------------------------------  
           First free datablock : 16     
          --------------------------------------------------------
          DBA Ranges :
          --------------------------------------------------------
           0x000021f8  Length: 8      Offset: 0      
           
        BH (0xfbfa0890) file#: 1968 rdba: 0x000021f9 (1024/8697) class: 9 ba: 0xfb758000
          frmt: 0x02 chkval: 0x74b9 type: 0x21=SECOND LEVEL BITMAP BLOCK
        Dump of Second Level Bitmap Block
           number: 52      nfree: 1       ffree: 51     pdba:     0x000021fa 
           
        BH (0xfaf8d8d0) file#: 1968 rdba: 0x000021fa (1024/8698) class: 4 ba: 0xfa5a8000
          frmt: 0x02 chkval: 0x7293 type: 0x23=PAGETABLE SEGMENT HEADER
          Extent Control Header
          -----------------------------------------------------------------
          Extent Header:: spare1: 0      spare2: 0      #extents: 38     #blocks: 2944  
                          last map  0x00000000  #maps: 0      offset: 2716  
              Highwater::  0x00002d60  ext#: 37     blk#: 96     ext size: 128   
          #blocks in seg. hdr's freelists: 0     
          #blocks below: 2912  
          --------------------------------------------------------
          Low HighWater Mark : 
              Highwater::  0x00002d60  ext#: 37     blk#: 96     ext size: 128   
          #blocks in seg. hdr's freelists: 0     
          #blocks below: 2912  
          mapblk  0x00000000  offset: 37    
          Level 1 BMB for High HWM block: 0x00002d01
          Level 1 BMB for Low HWM block: 0x00002d01
          --------------------------------------------------------
          Segment Type: 1 nl2: 1      blksz: 8192   fbsz: 0      
             Map Header:: next  0x00000000  #extents: 38   obj#: 3998654 flag: 0x10000000
          Extent Map
          -----------------------------------------------------------------
           0x000021f8  length: 8     
           0x00002280  length: 128   
          Auxillary Map
          --------------------------------------------------------
           Extent 0     :  L1 dba:  0x000021f8 Data dba:  0x000021fb
          --------------------------------------------------------
           Second Level Bitmap block DBAs 
           --------------------------------------------------------
           DBA 1:   0x000021f9
        
        In the above dump, we can see a special section starting with "Low HighWater Mark". (The blog post "Low High Water Mark, High High Water Mark, and Parallel Query" has some discussion of LHWM and HHWM.)

        The dumps of TEST_TAB_SMALL and TEST_TAB_12G_BFT_FALSE are very similar to that of TEST_TAB.
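
        As a final cross-check (a hedged sketch, assuming the dumped blocks are still in the buffer cache and using the absolute file number from this test), the block classes can also be listed from v$bh:

        select file#, block#, class#, status
          from v$bh
         where file# = 1968 and block# between 8696 and 8698
         order by block#;
        -- expected class#: 8 (1st level bmb), 9 (2nd level bmb), 4 (segment header)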