Wednesday, July 8, 2015

Redo Practice

Although "Undo is more complicated than redo" (Oracle Core: Essential Internals for DBAs and Developers), Redo performance is more visible to applications. End user is affected by "log file sync", while DBA is googling Panacea of "log file parallel write". It travels through all layers of computer system, from applications, Oracle, OS (Scheduler, VM, FS), adapter, down to disk (LUNS, RAID) and network.

Following previous Blog: "UNDO Practice", it is time to do the second exercise in sticking to:
"The Beginners' Guide to Becoming an Oracle Expert" (Page 5).

All tests are done in Oracle 11.2.0.4.0 (see appended Test Code).

1. Asynchronous Commit


The default PL/SQL commit behavior for nondistributed transactions is BATCH NOWAIT if the COMMIT_LOGGING and COMMIT_WAIT database initialization parameters have not been set (Database PL/SQL Language Reference).

Run:

 exec create_awr;
 exec update_test_tab(1000, 1);
 exec create_awr;


AWR shows:

Statistic Total / Waits per Second per Trans
redo size 11,369,340 943,435.40 11,245.64
user commits 1,011 83.89 1.00
redo synch writes 2 0.17 0.00
redo writes 971 80.57 0.96
log file sync 2
log file parallel write 972

Table-1

For 1000 updates and 1000 user commits, it requires 971 "redo write" which triggered almost same number of "log file parallel write" (972), very close to the number of user commits. However, only 2 "redo synch write", hence 2 "log file sync". That is the effect of Asynchronous Commit.

Note that "select for update" shows the similar Redo behaviour as "update" even though there is no real update (see Test Code 2).

This Redo optimization is also effective for Oracle Server-Side Internal Driver JVM.

2. Synchronous Commit


Run "script_1 1000", AWR shows:


Statistic Total / Waits per Second per Trans
redo size 11,202,868 3,002,644.87 11,080.98
user commits 1,011 270.97 1.00
redo synch writes 1,001 268.29 0.99
redo writes 1,036 277.67 1.02
log file sync 1,001
log file parallel write 1,034

Table-2

In case of 1000 Synchronous Commits, all 5 statistics (not including "redo size") are almost same. Each user commit leads to one respective event.

"redo synch write" represents the number of times the redo is forced to disk immediately, usually for a transaction commit.

Client-Side JDBC connection has the similar behaviour. But it is possible to switch off default auto-commit by:
   connection.setAutoCommit(false).

3. Piggybacked Commit


Piggybacked Commit reduces the number of redo writes caused by commits by grouping redo records from several sessions together (Oracle Mos WAITEVENT: "log file sync" Reference Note (Doc ID 34592.1)).

Run:

 exec create_awr;
 exec update_test_tab_loop(1000, 10);
 exec dbms_lock.sleep(120); 
 -- wait for job finished
 exec create_awr;


AWR shows:


Statistic Total / Waits per Second per Trans
redo size 18,138,376 136,033.06 1,808.05
user commits 10,032 75.24 1.00
redo synch writes 12 0.09 0.00
redo writes 440 3.30 0.04
log file sync 12
log file parallel write 439

Table-3

Comparing to Table-1, although we make 10,000 updates and user commits, "redo size" is not 10 times that of Table-1 (18,138,376 vs. 11,369,340), and "redo write" (also "log file parallel write") is even less than that of Table-1 (440 vs. 971). That is the role Piggybacked Commit plays.

LGWR truss outputs:

  listio64(0x0000000010000004, 0x000000000FFFFFFF, 0x00000000FFFDB4D0, 0x0000000000000002, ...) = 0x0000000000000000
  aio_nwait64(0x0000000000001000, 0x0000000000000002, 0x0FFFFFFFFFFEB4D0, 0x800000000000D032, ...) = 0x0000000000000002
  thread_post_many(7, 0x0FFFFFFFFFFF3488, 0x0FFFFFFFFFFF3480) = 0
  listio64(0x0000000010000004, 0x000000000FFFFFFF, 0x00000000FFFDB4D0, 0x0000000000000002, ...) = 0x0000000000000000
  aio_nwait64(0x0000000000001000, 0x0000000000000002, 0x0FFFFFFFFFFEB4D0, 0x800000000000D032, ...) = 0x0000000000000002
  thread_post_many(4, 0x0FFFFFFFFFFF3488, 0x0FFFFFFFFFFF3480) = 0
  listio64(0x0000000010000004, 0x000000000FFFFFFF, 0x00000000FFFDB4D0, 0x0000000000000002, ...) = 0x0000000000000000
  aio_nwait64(0x0000000000001000, 0x0000000000000002, 0x0FFFFFFFFFFEB4D0, 0x800000000000D032, ...) = 0x0000000000000002
  thread_post_many(6, 0x0FFFFFFFFFFF3488, 0x0FFFFFFFFFFF3480) = 0

We can see LGWR "thread_post_many" several sessions (nthreads is the first parameter of "thread_post_many") after each "aio_nwait64". It means that redo of multiple threads (oracle sessions) are written by one single "listio64" and are posted at the same time.

4. Distributed Transactions


We will look at Oracle-Controlled Distributed Transactions (using Database Link).

Run "script_2 1000", AWR in dblocal (Commit Point Site) shows:


Statistic Total / Waits per Second per Trans
redo size 12,135,636 394,513.70 12,003.60
user commits 1,011 32.87 1.00
redo synch writes 3,003 97.62 2.97
redo writes 3,038 98.76 3.00
log file sync 3,002
log file parallel write 3,116
transaction branch allocation 4,012

Table-4


AWR in dbremote shows:

Statistic Total / Waits per Second per Trans
redo size 11,315,104 371,681.63 11,191.99
user commits 1,011 33.21 1.00
redo synch writes 2,003 65.80 1.98
redo writes 2,035 66.85 2.01
log file sync 2,002
log file parallel write 2,036
transaction branch allocation 9,013

Table-5

Comparing to Table-2 of 1000 Synchronous Commits in nondistributed transactions, 1000 updates and 1000 user commits in Distributed Transactions demand for 3 times of redo events in Commit Point Site, and 2 times in other node. "transaction branch allocation" is 4 times, respectively, 9 times.

Each "redo synch write" is translated as one UNIX System Call "pwrite", which can be watched by truss or dtrace command in Solaris. Therefore there are 3,003 pwrites in dblocal, and 2,003 pwrites in dbremote.

By dumping Redo Log only for the "commit" command in dblocal and dbremote with:
     ALTER SYSTEM DUMP LOGFILE '<full_path_logfile_name>';

CHANGE vectors in dblocal are:

CLS:117 AFN:3 DBA:0x00c00290 OBJ:4294967295 SCN:0x0853.9628ab8a SEQ:1 OP:5.2 
CLS:118 AFN:3 DBA:0x00c00550 OBJ:4294967295 SCN:0x0853.9628ab8a SEQ:1 OP:5.1 
CLS:118 AFN:3 DBA:0x00c00550 OBJ:4294967295 SCN:0x0853.9628ab8d SEQ:1 OP:5.1 
CLS:117 AFN:3 DBA:0x00c00290 OBJ:4294967295 SCN:0x0853.9628ab8d SEQ:1 OP:5.12 
CLS:117 AFN:3 DBA:0x00c00290 OBJ:4294967295 SCN:0x0853.9628ab8d SEQ:2 OP:5.12 
CLS:117 AFN:3 DBA:0x00c00290 OBJ:4294967295 SCN:0x0853.9628ab8f SEQ:1 OP:5.4 

CHANGE vectors in dbremote are:

CLS:123 AFN:3 DBA:0x00c002b0 OBJ:4294967295 SCN:0x0853.9628ac1d SEQ:1 OP:5.2
CLS:124 AFN:3 DBA:0x00c009ba OBJ:4294967295 SCN:0x0853.9628ac5c SEQ:1 OP:5.1
CLS:124 AFN:3 DBA:0x00c009ba OBJ:4294967295 SCN:0x0853.9628ac6a SEQ:1 OP:5.1
CLS:123 AFN:3 DBA:0x00c002b0 OBJ:4294967295 SCN:0x0853.9628ac6a SEQ:1 OP:5.4

we can see that "commit" in dblocal includes two additional "OP:5.12" CHANGE vectors on UNDO header (CLS:117), apart from OP:5.2 (change for update transaction table in undo segment header), OP:5.1 (Change for update of the undo block), OP:5.4 (change for Commit). Probably "OP:5.12" is particular for 2pc commit.

As documented in Oracle Two-Phase Commit Mechanism, Commit Point Site experiences 3 Phases: Prepare/Commit/Forget Phases, whereas other resource nodes perform only first 2 Phases.

A redo record, also called a redo entry, is made up of a group of change vectors. In AWR report, we can also check "redo entries".

Each committed transaction has an associated system change number (SCN) to uniquely identify the changes made  by the SQL statements within that transaction.

During the prepare phase, the database determines the highest SCN at all nodes involved in the transaction. The transaction then commits with the high SCN at the commit point site. The commit SCN is then sent to all prepared nodes with the commit decision (Distributed Transactions Concepts).

alert log of dblocal contains the following text when first time to make the Distributed Transaction.

 Fri Jul 03 13:48:00 2015
 Advanced SCN by 29394 minutes worth to 0x0824.738bd480, by distributed transaction logon, remote DB: DBREMOTE.COM.
  Client info: DB logon user TESTU, machine dbremote, program oracle@dbremote (TNS V1-V3), and OS user oracle
 Fri Jul 03 13:50:04 2015 

Following two queries can help monitor Distributed Transactions.

 select 'local' db, v.* from v$global_transaction v  union all
 select 'remote' db, v.* from v$global_transaction@dblinkremote v;
 
 select 'local' db, v.* from v$lock v where type in ('TM', 'TX', 'DX')  union all
 select 'remote' db, v.* from v$lock@dblinkremote v where type in ('TM', 'TX', 'DX');

For XA Based Distributed Transactions (X/Open DTP), the similar behaviour can be observed.
Additionally in AWR - Enqueue Activity shows "DX-Distributed Transaction".

5. Distributed Transaction Commit


We can commit the whole distributed transaction in local db:
   update test_redo set name = 'update_local_1' where id = 1;
   update
test_redo@dblinkremote set name = 'update_remote_1' where id = 1;
   commit;

or commit in remote db:
   update test_redo set name = 'update_local_1' where id = 1;
   update
test_redo@dblinkremote set name = 'update_remote_1' where id = 1;
   exec dblinkremote_commit;


6. Distributed Transaction with autonomous_transaction


We can make one more sophisticated test by incorporating one autonomous_transaction in remote DB.
Run "script_3 1000", AWR in dblocal (Commit Point Site) shows:

Statistic Total / Waits per Second per Trans
redo size 12,010,796 483,565.34 11,880.11
user commits 1,011 40.7 1.00
redo synch writes 3,004 120.94 2.97
redo writes 3,041 122.43 3.01
log file sync 3,004
log file parallel write 3,042
transaction branch allocation 6,010

Table-6

AWR in dbremote shows:

Statistic Total / Waits per Second per Trans
redo size 12,156,052 499,673.30 6,047.79
user commits 2,010 82.62 1.00
redo synch writes 3,002 123.4 1.49
redo writes 3,035 124.75 1.51
log file sync 3,004
log file parallel write 3,034
transaction branch allocation 10,019

Table-7

In dblocal, Table-4 and Table-6 are almost identical.

In dbremote, the numbers in Table-7 is higher than Table-5 due to autonomous_transaction. It looks like no Piggybacked Commit in effect.

The above test case reflects the complexity of three-tier architecture, where middle tier and data tier synchronize atomic data operations with Distributed Transaction. In case of Oracle as data tier, it can involve autonomous_transactions.

7. Distributed Transaction: distributed_lock_timeout


In remote db, set distributed_lock_timeout by:
       alter system set distributed_lock_timeout=27 scope=spfile;
then restart remote db:
       startup force;
in remote db, run;
       update test_redo set name = 'update_remote_0' where id = 1;
and then in local db, execute:
   update test_redo set name = 'update_local_1' where id = 1;
   update
test_redo@dblinkremote set name = 'update_remote_1' where id = 1;
after 27 seconds (waiting Event "enq: TX - contention"), it returns:
   ORA-02049: timeout: distributed transaction waiting for lock
   ORA-02063: preceding line from DBLINKREMOTE


ORA-02049 occurs when the DML modified segment is blocked more than distributed_lock_timeout, for example, if the segment is stored in Oracle bigfile tablespace, and pre-allocation takes more than distributed_lock_timeout. (see Blog: Oracle bigfile tablespace pre-allocation and session blocking).

Test Setup


Setup two DBs, one named dblocal, another dbremote.

Run Test Code 1 on both DBs, additionally run Test Code 2 on dblocal.
Note: adapt database name, user_id, password at first.

---------------- Test Code 1 ----------------

drop table test_redo; create table test_redo ( id int primary key using index (create index ind_p on test_redo (id)), name varchar2(300)); insert into test_redo select level, rpad('abc', 100, 'x') y from dual connect by level <= 10000; exec dbms_stats.gather_table_stats(null, 'TEST_REDO', cascade => true); create or replace procedure create_awr as begin sys.dbms_workload_repository.create_snapshot('ALL'); end; / create or replace procedure db_commit as begin commit; end; / create or replace procedure db_commit_autotrx(i number) as pragma autonomous_transaction; begin update test_redo set name = i||'_autonomous_transaction_at_'||localtimestamp where id = (10000 - (i - 1)); commit; end; /

---------------- Test Code 2 ----------------

drop database link dblinkremote; create database link dblinkremote connect to k identified by k using 'dbremote'; create or replace procedure update_test_tab(p_cnt number, p_job number) as begin for i in 1.. p_cnt loop update test_redo set name = rpad('abc', 100, i) where id = (p_job -1) * 1000 + mod(i, 1001); ---- get similar AWR Redo figures for select for update --for c in (select name from test_redo where id = (p_job -1) * 1000 + mod(i, 1001) for update) --loop null; end loop; commit; end loop; end; / create or replace procedure update_test_tab_loop(p_cnt number, p_job_cnt number) as l_job_id pls_integer; begin for i in 1.. p_job_cnt loop dbms_job.submit(l_job_id, 'update_test_tab('||p_cnt||', '|| i||');'); end loop; commit; end; / create or replace procedure dblinkremote_commit as begin db_commit@dblinkremote; end; / create or replace procedure dblinkremote_commit_autotrx(i number) as begin db_commit_autotrx@dblinkremote(i); end; /

---------------- script_1 ----------------

#!/bin/ksh sqlplus -s testu/testp |& print -p "exec create_awr;" print -p "set feedback off" i=0 while (( i < $1 )) do (( i+=1 )) echo $i print -p "update test_redo set name = rpad('abc', 100, $i) where id = mod($i, 1001);" print -p "commit;" done print -p "exec create_awr;" print -p "exit 0"

---------------- script_2 ----------------

#!/bin/ksh sqlplus -s testu/testp |& print -p "exec create_awr;" print -p "exec create_awr@dblinkremote;" print -p "set feedback off" i=0 while (( i < $1 )) do (( i+=1 )) echo $i print -p "update test_redo set name = rpad('abc', 100, $i) where id = mod($i, 1001);" print -p "update test_redo@dblinkremote set name = rpad('abc', 100, $i) where id = mod($i, 1001);" print -p "commit;" done print -p "exec create_awr;" print -p "exec create_awr@dblinkremote;" print -p "exit 0"

---------------- script_3 ----------------

#!/bin/ksh sqlplus -s testu/testp |& print -p "exec create_awr;" print -p "exec create_awr@dblinkremote;" print -p "set feedback off" i=0 while (( i < $1 )) do (( i+=1 )) echo $i print -p "update test_redo set name = rpad('abc', 100, $i) where id = mod($i, 1001);" print -p "update test_redo@dblinkremote set name = rpad('abc', 100, $i) where id = mod($i, 1001);" print -p "exec dblinkremote_commit_autotrx(mod($i, 1001));" print -p "commit;" done print -p "exec create_awr;" print -p "exec create_awr@dblinkremote;" print -p "exit 0"