Wednesday, July 30, 2025

Reading CLOB Line by Line with SQL and Plsql

This blog will demonstrate different implementations of reading CLOB line by line in SQL and Plsql.
For both SQL and Plsql, we show one slow and one fast version.

Then we run tests, compare and investigate their performance difference.

In next Blog: Reading CLOB Line by Line with SQL and Plsql: ChatGPT and Gemini Implementations
we will give a look of SQL and Plsql implementations provided by ChatGPT and Gemini.

Note: Tested on Oracle 19c.


1. Test Setup


1.1 Create Table and Filling Procedure


drop table test_tab_lob cascade constraints;

create table test_tab_lob(id number, mylob clob, note varchar2(100));

create or replace procedure create_test_data(p_id number, p_cnt number, p_last_line_no_LF number := 0) as 
  LF     constant varchar2(1)  := chr(10);
  l_mylob         clob; 
begin
  insert into test_tab_lob values (p_id, empty_clob(), 'CLOB Line Count = '||p_cnt||', Line_Last_Line_No_LF = '||p_last_line_no_LF) 
         returning mylob into l_mylob;
  
  for i in 1..p_cnt loop
      dbms_lob.append(dest_lob => l_mylob, src_lob => rpad('Line_'||i||'-', least(10+i-1, 1000), 'X')||LF);  
  end loop;
  if p_last_line_no_LF = 1 then 
    dbms_lob.append(dest_lob => l_mylob, src_lob => rpad('Line_Last_Line_No_LF-', 20, 'Y'));  -- add last line without LF
  end if;
  dbms_output.put_line(length(l_mylob));
  commit;
end;
/


1.2. Create Test Data


---- Smoking Test with LF at the end of Last Line
exec create_test_data(1, 5);

---- Smoking Test without LF at the end of Last Line
exec create_test_data(2, 5, 1);

---- CLOB with 1000 Lines
exec create_test_data(3, 1000);

---- CLOB with 10*1000 Lines
exec create_test_data(4, 10*1000);

---- CLOB with 100*1000 Lines
exec create_test_data(5, 100*1000);


1.3. Check Table CLOB Size


select id, length(mylob), note, round(dbms_lob.getlength(mylob)/1024/1024, 2) LOB_MB from test_tab_lob t;

    ID LENGTH(MYLOB) NOTE                                                             LOB_MB
  ---- ------------- ------------------------------------------------------------ ----------
     1            65 CLOB Line Count = 5, Line_Last_Line_No_LF = 0                         0
     2            85 CLOB Line Count = 5, Line_Last_Line_No_LF = 1                         0
     3        510455 CLOB Line Count = 1000, Line_Last_Line_No_LF = 0                    .49
     4       9519455 CLOB Line Count = 10000, Line_Last_Line_No_LF = 0                  9.08
     5      99609455 CLOB Line Count = 100000, Line_Last_Line_No_LF = 0                94.99


with sq as (select /*+ materialize */ segment_name, index_name from dba_lobs where owner='K' and table_name='TEST_TAB_LOB')
select t.segment_name, tablespace_name, header_file, relative_fno, header_block, bytes, blocks, extents, round(bytes/1024/1024) mb from dba_segments t, sq 
where t.segment_name = sq.segment_name or t.segment_name = sq.index_name;

  SEGMENT_NAME                      TABLESPACE_NAME      HEADER_FILE RELATIVE_FNO HEADER_BLOCK      BYTES     BLOCKS    EXTENTS         MB
  --------------------------------- -------------------- ----------- ------------ ------------ ---------- ---------- ---------- ----------
  SYS_IL0005911971C00002$$          U1                            22         1024      2830970   10485760       1280         25         10
  SYS_LOB0005911971C00002$$         U1                            22         1024      2830962 1143996416     139648        201       1091

(see Blog: "One Oracle CLOB Space Usage Test": 
"https://ksun-oracle.blogspot.com/2025/02/one-oracle-clob-space-usage-test.html")


2. SQL Recursive Query


2.1 SQL Slow Recursive


In Post: "Reading clob line by line with pl\sql"
(https://stackoverflow.com/questions/11647041/reading-clob-line-by-line-with-pl-sql),
there is one ANSI Standard Recursive Query to read clob line by line:

create or replace procedure SQL_SLOW (p_id number) as
  --v_tmp clob := 'aaaa'||chr(10)||'bbb'||chr(10)||'ccccc';
  v_tmp clob;
  l_cnt number := 0;
  l_len number := 0;
begin
  select mylob into v_tmp from test_tab_lob where id = p_id;
  for rec in (with clob_table(c) as (select v_tmp c from dual)
            select regexp_substr(c, '.+', 1, level) text,level line
             from clob_table
          connect by level <= regexp_count(c, '.+')) 
  loop
    --dbms_output.put_line(rec.text);
    l_cnt := l_cnt+1;
    l_len := l_len+length(rec.text);
  end loop;
  dbms_output.put_line('CNT = '||l_cnt ||', Length = '||l_len);
end;
/

-- exec SQL_SLOW(1);
This code has Quadratic Performance (O(n^2)). Each line is read from first CLOB line position, hence n*n/2 scans.
Session shows intensive Oracle Wait Event "direct path read".

With perf tool, we observe following top Oracle subroutine calls:

$ > perf top -p 2450579 -d 2 

    PerfTop:     744 irqs/sec  kernel: 0.0%  exact:  0.0% lost: 0/0 drop: 0/0 [4000Hz cpu-clock:uhpppH],  (target_pid: 2450579)
-------------------------------------------------------------------------------------------------------------------------------------

    10.72%  oracle         [.] lxregmatch
     3.88%  oracle         [.] sxorchk
     3.17%  oracle         [.] lxregmatpush
     3.04%  oracle         [.] lxoCntByte
     2.71%  oracle         [.] lxpoCmpStr


2.2 SQL Fast Recursive


create or replace procedure SQL_FAST (p_id number) as
  l_cnt number := 0;
  l_len number := 0;
begin
  for c in (
    with 
      --lob as (select 1 id, length(mylob)+1 lob_len, mylob||chr(10) mylob from 
      --    (select to_clob('abc'||chr(10)||'x1'||chr(10)||chr(10)||'x2'||chr(10)||'x3') mylob from dual))
      lob as (select id, length(mylob)+1 lob_len, mylob||chr(10) mylob from test_tab_lob where id = p_id)
     ,recur_tab(id, lob_len, mylob, line, total_len, lf_pos, lf_pos_prior, lvl) as (
        select id, lob_len, mylob, dbms_lob.substr(mylob, instr(mylob, chr(10), 1, 1)-1, 1) line
              ,0 total_len, instr(mylob, chr(10), 1, 1) lf_pos, 0 lf_pos_prior, 0 lvl 
          from lob
        union all
        select id, lob_len, mylob, dbms_lob.substr(mylob, lf_pos - 1 - lf_pos_prior, lf_pos_prior+1) line
              ,total_len + length(line) + 1 total_len
              ,instr(mylob, chr(10), lf_pos+1, 1) lf_pos, lf_pos lf_pos_prior, lvl+1 lvl
          from recur_tab where total_len <= length(mylob))
    select t.*, length(line) line_len, dbms_lob.substr(line, length(line), 1) line_text 
      from recur_tab t where line is not null and lf_pos_prior > 0)
  loop
    l_cnt := l_cnt+1;
    l_len := l_len+length(c.line);
  end loop;
  dbms_output.put_line('CNT = '||l_cnt ||', Length = '||l_len);
end;
/

-- exec SQL_FAST(1);
The above code has Linear Prformance (O(n)). Each line is read from next CLOB line position, hence n scans.

Post: "Oracle: Read from CLOB column line by line and insert to the table"
(https://dba.stackexchange.com/questions/10893/oracle-read-from-clob-column-line-by-line-and-insert-to-the-table)
also has some similar code, which wrote:
"This is not a full answer because it only works if your clob is less than 4000 chars."


3. Plsql Implementations


3.1 Plsql Fast Version


In Post: "Reading clob line by line with pl\sql"
(https://stackoverflow.com/questions/11647041/reading-clob-line-by-line-with-pl-sql),
there is one Plsql code: "procedure parse_clob" as follows:

create or replace procedure PLSQL_FAST (p_id number) is
  p_clob         clob;
  l_offset       pls_integer:=1;
  l_line         varchar2(32767);
  l_total_length pls_integer := length(p_clob);
  l_line_length  pls_integer;
  l_cnt          number := 0;
  l_len          number := 0;
begin
  select mylob into p_clob from test_tab_lob where id = p_id;
  l_total_length := length(p_clob);
  dbms_output.put_line('l_total_length = '|| l_total_length); 
  while l_offset <= l_total_length loop
    l_line_length := instr(p_clob, chr(10), l_offset) - l_offset;
    if l_line_length < 0 then
      l_line_length := l_total_length + 1 - l_offset;
    end if;
    l_line:=substr(p_clob, l_offset, l_line_length);
    --dbms_output.put_line(l_line); --do line processing
    l_offset:=l_offset + l_line_length + 1;
    
    l_cnt := l_cnt + 1;
    l_len := l_len + l_line_length;
  end loop;
  
  dbms_output.put_line('CNT = '||l_cnt ||', Length = '||l_len);
end;
/

-- exec PLSQL_FAST(1);
Each line is read from next CLOB line offset.


3.2. Plsql Slow Version


create or replace procedure PLSQL_SLOW(p_id number) is
  p_clob         clob;
  l_offset       pls_integer := 1;
  l_line         varchar2(32767);
  l_total_length pls_integer := length(p_clob);
  l_line_length  pls_integer;
  l_lf_pos       pls_integer;
  l_cnt          number := 0;
  l_len          number := 0;
begin
  select mylob into p_clob from test_tab_lob where id = p_id;
  l_total_length := length(p_clob);
  dbms_output.put_line('l_total_length = '|| l_total_length); 
  while l_offset <= l_total_length loop
    l_lf_pos      := instr(p_clob, chr(10));
    l_line_length := l_lf_pos - 1;
    --dbms_output.put_line('l_offset = '||l_offset); 
    --dbms_output.put_line('l_line_length = '||l_line_length); 
    
    -- In the case of CLOB last line without CHR(10), l_line_length < 0.
    -- instr: If substring not found, INSTR will return 0, hence l_line_length = - l_offset. 
    if l_line_length < 0 then
      l_line_length := l_total_length + 1 - l_offset;
    end if;
    l_line := substr(p_clob, 1, l_line_length);
    --dbms_output.put_line('New l_line_length = '||l_line_length); 
    --dbms_output.put_line(l_line); --do line processing
    l_offset := l_offset + l_line_length + 1;
    p_clob := substr(p_clob, l_lf_pos + 1);
    
    l_cnt := l_cnt + 1;
    l_len := l_len + l_line_length;
  end loop;
  
  dbms_output.put_line('CNT = '||l_cnt ||', Length = '||l_len);
end;
/

-- exec PLSQL_SLOW(1);
CLOB loop calling of substr triggers many dynamic LOB Creations/Destructions in session PL/SQL
(Oracle subroutine "__intel_avx_rep_memcpy": UNIX memcpy).

perf tool shows top Oracle calls:

$ > perf top -p 2450579 -d 2 

  PerfTop:    3878 irqs/sec  kernel: 0.0%  exact:  0.0% lost: 0/0 drop: 0/37 [4000Hz cpu-clock:uhpppH],  (target_pid: 2450579)
-------------------------------------------------------------------------------------------------------------------------------------

    26.05%  oracle         [.] __intel_avx_rep_memcpy
     6.36%  oracle         [.] kcbgcur
     4.76%  oracle         [.] kcbgtcr
     4.05%  oracle         [.] kcbchg1_main
     3.55%  oracle         [.] kcbrls
Post "Testing Oracle's Use of Optane Persistent Memory, Part 1 - Low Latency Commits"
(https://tanelpoder.com/posts/testing-oracles-use-of-optane-persistent-memory/)
wrote: "This is the ìmemcpyî (actually __intel_avx_rep_memcpy)"


4. Test Outcome


4.1. SQL Slow


SQL > exec SQL_SLOW(3);
  CNT = 1,000, Length = 509,455
  Elapsed: 00:04:45.59

SQL > exec SQL_SLOW(4);
  CNT = 10,000, Length = 9,509,455
  Elapsed: 09:49:48.05

SQL > exec SQL_SLOW(5);
-- Not finished after 10 hours


4.2. SQL Fast


SQL > exec SQL_FAST(3);
  CNT = 1,000, Length = 509,455
  Elapsed: 00:00:00.67

SQL > exec SQL_FAST(4);
  CNT = 10,000, Length = 9,509,455
  Elapsed: 00:00:02.78

SQL > exec SQL_FAST(5);
  CNT = 100,000, Length = 99,509,455
  Elapsed: 00:00:39.74


4.3. Plsql Fast


SQL > exec PLSQL_FAST(3);
  CNT = 1,000, Length = 509,455
  Elapsed: 00:00:01.20

SQL > exec PLSQL_FAST(4);
  CNT = 10,000, Length = 9,509,455
  Elapsed: 00:00:04.93

SQL > exec PLSQL_FAST(5);
  CNT = 100,000, Length = 99,509,455
  Elapsed: 00:00:45.29


4.4. Plsql Slow


SQL > exec PLSQL_SLOW(3);
  CNT = 1,000, Length = 509,455
  Elapsed: 00:00:01.36

SQL > exec PLSQL_SLOW(4);
  CNT = 10,000, Length = 9,509,455
  Elapsed: 00:03:22.48

SQL > exec PLSQL_SLOW(5);
  CNT = 100,000, Length = 99,509,455
  Elapsed: 06:16:49.58


4.5. Performance Summary


CLOB_Lines = 1,000
   Code          Elapsed
   -----------   -----------
   SQL_SLOW      00:04:45.59
   SQL_FAST      00:00:00.67
   PLSQL_SLOW    00:00:01.36
   PLSQL_FAST    00:00:01.20    

CLOB_Lines = 10,000
   Code          Elapsed
   -----------   -----------
   SQL_SLOW      09:49:48.05    *** 9 hours
   SQL_FAST      00:00:02.78
   PLSQL_SLOW    00:03:22.48
   PLSQL_FAST    00:00:04.93 

CLOB_Lines = 100,000
   Code          Elapsed
   -----------   -----------
   SQL_SLOW      > 10 hours   *** > 10 hours
   SQL_FAST      00:00:39.74
   PLSQL_SLOW    06:16:49.58  *** 6 hours
   PLSQL_FAST    00:00:45.29