Thursday, August 3, 2023

Oracle SYS_GUID Composition

Oracle Docu wrote:
SYS_GUID generates and returns a globally unique identifier (RAW value) made up of 16 bytes. 
On most platforms, the generated identifier consists of a host identifier, a process or thread identifier of the process 
or thread invoking the function, and a nonrepeating value (sequence of bytes) for that process or thread.
The Oracle Database SYS_GUID function does not return a standard UUID since the generated GUID is not a random number.
(see: IETF RFC 4122 version 4 UUID, and Oracle dbms_crypto.randombytes and Enhancement Suggestion).

In this Blog, we will try to look its construction.

With following query, we can decompose it into 5 components.

with sq as (select sys_guid() gid, s.logon_time, p.spid, s.sid, s.serial#, p.pid from v$session s, v$process p 
             where s.paddr=p.addr and (s.sid = (select distinct sid from v$mystat)))
select substr(gid, 1, 12) seq_inc_by_1
      ,substr(gid, 13, 4)||'(='||to_number(substr(gid, 13, 4), 'XXXXXX')||')' proc_spid  --, to_number(substr(gid, 13, 4), 'XXXXXX') proc_spid_number
      ,substr(gid, 17, 4) unknown_1
      ,substr(gid, 21, 8) unix_host_id
      ,substr(gid, 29, 4) unknown_2
      ,sq.*
from sq;

  SEQ_INC_BY_1   PROC_SPID   UNKNOWN_1 UNIX_HOST_ID  UNKNOWN_2 
  -------------- ----------- --------- ------------  ----------
  01EC602EB9D3   259D(=9629) E063      6E08740A      29BD      
   
  GID                              LOGON_TIME           SPID SID SERIAL# PID
  -------------------------------- -------------------- ---- --- ------- ---
  01EC602EB9D3259DE0636E08740A29BD 2023-AUG-02 09:11:33 9629 197    5605  31


== Legend ==

    SEQ_INC_BY_1 : session own Sequence Number, increase 1 per sys_guid call, 
                   initiated by a number related to v$session logon_time
    PROC_SPID    : v$process.spid
    UNKNOWN_1    : ??? (E063 or E064)
    UNIX_HOST_ID : hostid command output (Linux little endian, 4 bytes reverse order)
    UNKNOWN_2    : ???
With each call of sys_guid, SEQ_INC_BY_1 is increased by 1, starting from some v$session.logon_time.

with sq1 as (select /*+ materialize */ level         nr, substr(sys_guid(), 1, 12) guid_12 from dual connect by level <= 1e6)
    ,sq2 as (select /*+ materialize */ level + 1*1e6 nr, substr(sys_guid(), 1, 12) guid_12 from dual connect by level <= 1e6)
    ,sq3 as (select /*+ materialize */ level + 2*1e6 nr, substr(sys_guid(), 1, 12) guid_12 from dual connect by level <= 1e6)
select min(nr) min_nr, max(nr) max_nr
      ,min(guid_12) min_guid_12, max(guid_12) max_guid_12
      ,count(*) nr_count
      ,to_number(max(guid_12), 'xxxxxxxxxxxx') - to_number(min(guid_12), 'xxxxxxxxxxxx') + 1 nr_count
from 
(select * from sq1
   union 
 select * from sq2
   union
 select * from sq3); 
 
  MIN_NR     MAX_NR MIN_GUID_12  MAX_GUID_12    NR_COUNT   NR_COUNT
  ------ ---------- ------------ ------------ ---------- ----------
       1    3000000 00E804347C2E 00E8046242ED    3000000    3000000
So SEQ_INC_BY_1 is a Sequence Number, increasing 1 per sys_guid call, initiated by a number related to epoch time of v$session logon_time (probably cached in each v$process.spid).

SEQ_INC_BY_1 is 12 hex digits, with maximum decimal value:

  select to_number(lpad('F', 12, 'F'), lpad('X', 12, 'X')) from dual; 
    -- 281474976710655
The last 6 digits represents a pure calling sequence number, the rest prefix digits are UNIX epoch seconds. So the maximum seconds is:

   281474976
Since 281474976 seconds is about 3258 days (281474976/86400) or about 9 years, sys_guid is wrapped on overflow about each 9 years. The first 10 reset datetime can be projected as follows:

select level NR#
      ,to_date('1970-01-01 00:00:00', 'yyyy-mm-dd hh24:mi:ss') + level*281474976/86400 datetime
  from dual connect by level <= 10;

    NR# DATETIME
  ----- --------------------
      1 1978*DEC*02 19:29:36
      2 1987*NOV*03 14:59:12
      3 1996*OCT*04 10:28:48
      4 2005*SEP*05 05:58:24
      5 2014*AUG*07 01:28:00
      6 2023*JUL*08 20:57:36
      7 2032*JUN*08 16:27:12
      8 2041*MAY*10 11:56:48
      9 2050*APR*11 07:26:24
     10 2059*MAR*13 02:56:00
16 bytes sys_guid is a 32 long raw hex value. In each interval, 10/16 of them are starting with number 0-9, 6/16 with A-F.

Given a sys_guid, we can estimate its datetime by:

with sq as (select to_number(substr('00E928C3120917BAE063A005740AC0F2', 1, 12), lpad('X', 12, 'X'))/1e6 epoch_reminder from dual)
select epoch_reminder
      ,to_date('2023*JUL*08 20:57:36', 'YYYY*MON*DD hh24:mi:ss') + epoch_reminder/86400 estimated_datetime
from  sq;

  EPOCH_REMINDER   ESTIMATED_DATETIME
  --------------   --------------------
      1001411.25   2023*JUL*20 11:07:47
For each logon_time, we can estimate SEQ_INC_BY_1 as follows:

create or replace function seq_inc_by_1_calc (logon_time date) return varchar2 as
  l_max_seconds   number;
  l_logon_utc     date;
  l_logon_epoch   number; 
  l_inc_by_1_calc varchar2(20);
begin
  l_max_seconds   := trunc(to_number(lpad('F', 12, 'F'), lpad('X', 12, 'X'))/1e6);
  l_logon_utc     := cast(from_tz(cast(logon_time as timestamp), to_char(systimestamp, 'TZH:TZM')) at time zone 'UTC' as date);
  l_logon_epoch   := (l_logon_utc - to_date('1970-01-01 00:00:00','YYYY-MM-DD HH24:MI:SS')) * 86400;
  l_inc_by_1_calc := to_char(mod(l_logon_epoch, l_max_seconds)*1e6, lpad('X', 12, '0'));
  return l_inc_by_1_calc;
end;
/
For example, here is a test to show generated first 12 hex digits and calculated SEQ_INC_BY_1 (seq_inc_by_1_calc):

with sq as (
  select /*+ materialize */ logon_time, sys_guid() guid, seq_inc_by_1_calc(logon_time) seq_inc_by_1_calc
    from v$session 
   where sid = (select sid from v$mystat where rownum=1))
select logon_time
      ,guid
      ,substr(guid, 1, 12) seq_inc_by_1
      ,seq_inc_by_1_calc
      ,trunc(to_number(substr(guid, 1, 12), lpad('X', 12, 'X'))/1e6) seq_inc_by_1_num
      ,trunc(to_number(seq_inc_by_1_calc, lpad('X', 12, 'X'))/1e6)        seq_inc_by_1_calc_num
  from sq;

  LOGON_TIME           GUID                             SEQ_INC_BY_1   SEQ_INC_BY_1_CALC  SEQ_INC_BY_1_NUM SEQ_INC_BY_1_CALC_NUM
  -------------------- -------------------------------- -------------- ------------------ ---------------- ---------------------
  2023-AUG-02 09:11:33 01EC602EB9C2259DE0636E08740A29BD 01EC602EB9C2    01EB5FD90340               2114737               2110437
To verify the above SEQ_INC_BY_1 estimation, we run Oracle Jobs to collect both generated and computed value and then run query to compare the differences.

drop table test_guid_tab;

create sequence test_guid_seq;

create table test_guid_tab as
  select -123456789 job_id, test_guid_seq.nextval seq, sys_guid() guid
        ,logon_time, sid, s.serial#, spid, pid, sysdate datetime
   from v$session s, v$process p 
   where s.paddr=p.addr and sid = (select sid from v$mystat where rownum=1) and 1=2;

create or replace procedure test_tab_guid_proc(p_job_id number, p_sleep_seconds number := 1) as
begin
  insert into test_guid_tab
  select p_job_id, test_guid_seq.nextval seq, sys_guid() guid
        ,logon_time, sid, s.serial#, spid, pid, sysdate datetime
   from v$session s, v$process p 
   where s.paddr=p.addr and sid = (select sid from v$mystat where rownum=1);
  commit;
  dbms_session.sleep(p_sleep_seconds);
end;
/

--exec test_tab_guid_proc(0);

-- select * from test_guid_tab;

-- collect sys_guid by jobs. Each new started job logs on as a new session with a new logon_time.
create or replace procedure test_tab_guid_job(p_job_cnt number, p_dur_seconds number, p_sleep_seconds number := 1) as
  l_job_name varchar2(50);
begin
  for i in 1..p_job_cnt loop
    l_job_name := 'TEST_JOB_GUID_'||i;
    dbms_scheduler.create_job (
      job_name        => l_job_name,
      job_type        => 'PLSQL_BLOCK',
      job_action      => 
        'begin 
           test_tab_guid_proc('||i||', '||p_sleep_seconds||');
        end;',    
      start_date      => systimestamp,
      end_date        => systimestamp + numtodsinterval(p_dur_seconds, 'SECOND'),
      repeat_interval => 'freq=SECONDLY',
      auto_drop       => true,
      enabled         => true);
  end loop;
end;
/

-- 32 Job sessions running 100 seconds, each time a new logon_time for each session.
--exec test_tab_guid_job(32, 100);


-- check the difference of Oracle generated GUID and calculated GUID

select t.*, seq_inc_by_1_calc(logon_time) guid_12_calc
      ,substr(guid, 1, 12) guid_12
      ,seq_inc_by_1_calc(logon_time) guid_12_calc
      ,trunc(to_number(substr(guid, 1, 12), lpad('X', 12, 'X'))/1e6)            guid_12_num
      ,trunc(to_number(seq_inc_by_1_calc(logon_time), lpad('X', 12, 'X'))/1e6)  guid_12_calc_num
      ,trunc(to_number(substr(guid, 1, 12), lpad('X', 12, 'X'))/1e6)
       -
       trunc(to_number(seq_inc_by_1_calc(logon_time), lpad('X', 12, 'X'))/1e6)  delta
from test_guid_tab t
--where spid=32287
order by sid, logon_time;
By the way, sys_guid() is internally used to generate Oracle AQ msgid out parameter in DBMS_AQ.ENQUEUE and DBMS_AQ.DEQUEUE call. (for some example, see Blog: Oracle Multi-Consumer AQ dequeue with dbms_aq.listen Wait Events (I) )

sys_guid() is Oracle internal subroutine kokidgen. In oracle STANDARD package, it is pesguid.