Nigel Noble's Oracle Blog

12/07/2013

“enq: TX – contention” on SELECT with a large buffer cache and 2PC

Filed under: 11gR2 — Nigel Noble @ 12:08 pm

Note: The following is based on testing with 11.2.0.3 (I believe the same issue exists in other Oracle versions).

I recently worked on an interesting problem relating to the “enq: TX – contention” wait event. There are a number of reasons for the wait but the most common reason I have come across at my site is best described by a forum posting I found by Jonathan Lewis... “select can wait for another transaction if that transaction is between the “prepare” and “commit” of a two-phase commit.” 

My site has a number of databases in the UK and Australia. We have often come across this problem when a SELECT statement tries to read a row involved in a distributed transaction with Australia. The problem is the round-trip latency to Australia: during the communication of the PREPARE and COMMIT phases there can be a 200ms – 300ms gap, and anyone trying to read the rows involved in the transaction can see a corresponding 300ms pause on “enq: TX – contention” waits. There are a number of tricks you can use to reduce the problem by separating the rows the SELECT statement reads (the UK data) from the rows involved in the Australian transaction. The very fact you visit the same block and evaluate the 2PC row can cause the reader to stall (just reading a row to find it’s not actually the one you needed would do it as well). We use tricks like careful usage of indexes to ensure the reader can go directly to the UK data and not even evaluate the row involved in the Australian 2PC (partitioning can also help here too).

A new variation of “enq: TX – contention”

We recently noticed some of these “enq: TX – contention” waits in a very time-critical transactional UK application, with multiple concurrent waits of between 1 and 2 seconds each. It seemed an unusually long duration. Looking at ASH (Active Session History) we could see the SELECT statement was blocked by a session arriving over a database link from another UK-based database. Investigating that session, we found it was a large batch job running between two UK-based databases. Further investigation of the actual SELECT statement running during the “enq: TX – contention” wait and its blocking session identified a common table, but not what caused the long delay.

Our application had the following profile on TWO separate databases:

  • The Batch Job on DB1
    • UK DB 1: Starts a transaction.
    • UK DB 1: Updates a lot of data in local tables.
    • UK DB 1: Updates a lot of data in a remote table over a dblink (this creates a distributed transaction with UK DB 2; DB 1 waits on “SQL*Net message from dblink”).
    • UK DB 2: is now part of the distributed transaction and processes the rows to update.
    • UK DB 2: returns control back to DB 1.
    • UK DB 1: inserts a message into a “queue” message table to say its batch data has been processed.
    • UK DB 1: issues a commit of the 2PC (causing both the DB 1 and DB 2 sides of the distributed transaction to commit).
  • The Transaction Job on DB2
    • The critical transaction job running on DB 2 (a completely separate job running in its own session).
    • UK DB 2: runs a critical transaction system across a number of sessions.
    • UK DB 2: as well as performing its various tasks, also reads the same message “queue” table to action some tasks when told the large batch job has completed.
    • UK DB 2: the SELECT of the message “queue” table normally takes a few ms, but now takes between 1 and 2 seconds when it gets stuck on “enq: TX – contention” waits.

We managed to identify that when the large batch job commits on DB1 (2PC), it can cause the fast transactional job on DB2 to stall. We knew we had a long pause between the PREPARE phase of the commit and the actual COMMIT completing… but why?

How long should a commit take?

Further investigation showed a strong correlation between the size of the BATCH transaction (the amount of data processed) and the duration of the “TX Contention” wait seen by the blocked session. How long SHOULD a commit take to complete? I ran some tests on a large test system and made the following observations…

If I update 1 million rows on a test table, then time the commit, the actual COMMIT gets faster the smaller the buffer cache gets (ignore that the DML gets slower with less cache, as I am only interested in the actual COMMIT time).

Output of the test showing the commit gets quicker with less buffer cache.
This test is run as a local transaction (not a distributed transaction).

100GB   (65GB main pool, 35GB keep)  table in KEEP
==================================

16:54:43 SQL> update source_DB2.large_table set status_id = status_id where rownum < 1000000;

999999 rows updated.

16:55:09 SQL> set timing on
16:55:09 SQL> commit;

Commit complete.

Elapsed: 00:00:00.37
16:55:10 SQL>
16:55:10 SQL> set timing off



-----------

9GB   (6GB main pool, 3GB keep)  table in KEEP
17:08:34 SQL> set timing off
17:08:34 SQL> update source_DB2.large_table set status_id = status_id where rownum < 1000000;

999999 rows updated.

17:21:22 SQL> set timing on
17:21:22 SQL> commit;

Commit complete.

Elapsed: 00:00:00.19

----------

4GB    (1GB main pool, 3GB keep)  table in KEEP

17:30:45 SQL> set timing off
17:30:45 SQL> update source_DB2.large_table set status_id = status_id where rownum < 1000000;

999999 rows updated.

17:42:00 SQL> set timing on
17:42:00 SQL> commit;

Commit complete.

Elapsed: 00:00:00.08
17:42:00 SQL>
17:42:00 SQL> set timing off
17:42:00 SQL> spool off

----

Commit-Cleanout

(See Jonathan Lewis’s page: Clean it up)

Jonathan’s note describes that when a commit is issued, up to 10% of the buffer cache size’s worth of blocks involved in the transaction can be touched for commit-cleanout. When you start running very large buffer caches, 10% can be a lot of blocks: a 100GB cache amounts to over 1 million blocks that could be touched as part of the commit phase.
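To put a number on that, here is a back-of-envelope check (it assumes the common 8KB block size, which the post does not state explicitly):

```python
# Rough arithmetic: how many blocks is 10% of a 100 GB buffer cache?
# The 8 KB block size is an assumption (a common Oracle default).
block_size = 8 * 1024                    # bytes per block (assumed)
cache_bytes = 100 * 1024**3              # 100 GB buffer cache
blocks_in_cache = cache_bytes // block_size
cleanout_limit = blocks_in_cache // 10   # up to 10% touched at commit
print(blocks_in_cache, cleanout_limit)   # 13107200 1310720
```

With these assumptions, over 1.3 million blocks could be visited during a single COMMIT.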

I created a reproducible test case for the distributed transaction problem and started to play around with it.

I discussed my theory about commit-cleanout with an Oracle contact in the Real World Performance Group. He also suggested we look at the commit_point_strength parameter.

Commit_point_strength

Commit_point_strength can be set on a database to control which side of a distributed transaction will manage the transaction.

The following is taken from the OTN document (XA and Oracle Controlled Distributed Transactions):

The database with the highest value of COMMIT_POINT_STRENGTH is nominated as the “commit point site” for the
transaction. For remote transactions, this is and can only be the database that was updated. The “commit point site” is
the single point where the status of the transaction is recorded and it will commit before any of the
other databases involved in the transaction. It is the only database that never enters the “prepare” state
and can therefore never have any ‘in-doubt’ transactions on failure. In effect, the outcome of the
distributed or remote transaction at the commit point site determines whether the transaction is
committed or rolled back. Other nodes if involved will follow the lead of the commit point site. The
global coordinator ensures that all nodes within a distributed transaction complete the transaction in
the same manner as the commit point site.
Note that for distributed transactions, the database selected as commit point site should be the
database that stores the most critical data, as this database never enters the prepared state and
consequently can never become “in-doubt”, even if a failure occurs. In failure situations, failed nodes
remain in a prepared state, holding locks on affected data until the “in-doubt” transactions are
resolved.

My “DB1” already happened to have the higher commit point strength of 100, while the other database was just 1.
When we checked four databases involved in distributed transactions on our production site, the “DB1” database did not have any “enq: TX – contention” waits (a good sign to suggest moving commit_point_strength might help us).

Although changing the commit_point_strength might help, I still wanted to understand the duration of the delay. Changing the commit_point_strength could also change the behavior if we did have any in-doubt transactions on the database.

My Test Case

  • I start the transaction from DB1 (which also has the higher commit strength of 100, as is the case in production).
  • DB1 updates 2 million rows in a local table.
  • DB1 then updates 2 million rows over a dblink to DB2 (commit strength = 1).
  • I then alternate updates between some other local and remote tables. Some of these are 2 – 3 times the size of the initial test tables.
  • Last of all, I update a single row in my “queue” message table.
  • Issue the commit (all from sqlplus, so it should be log-file-sync’d as well).
  • During the above test, I have another session on DB2 polling (SELECT) the queue table every 10ms. This runs with SQL trace level 8 so I can grep for the “enq: TX – contention” waits after the test completes.
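The post-test grep over the level-8 trace can be sketched in a few lines (a Python sketch; trace_lines stands in for reading the real .trc file, and the two WAIT lines are copied from the polling session’s trace later in this post):

```python
import re

# Pull every "enq: TX - contention" WAIT line out of the polling session's
# level-8 SQL trace and sum the ela= values (microseconds).
trace_lines = [
    "WAIT #47750313382640: nam='enq: TX - contention' ela= 1000879 ...",
    "WAIT #47750313382640: nam='enq: TX - contention' ela= 214602 ...",
]
pattern = re.compile(r"nam='enq: TX - contention' ela= (\d+)")
waits_us = [int(m.group(1)) for line in trace_lines
            if (m := pattern.search(line)) is not None]
print(len(waits_us), "waits,", sum(waits_us) / 1e6, "seconds blocked")
```

On the real trace you would read the lines from the .trc file instead of a hard-coded list; the two waits above already add up to about 1.2 seconds.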

I initially managed to scale the “TX contention” delay on the DB2 polling session to approx 300 – 400ms running the above test / transaction sizes.

Our Oracle contact told us of an unsupported way to completely switch off the commit-cleanout operations. (Note: this should only be done on a test system you don’t care about. I am NOT going to document how this was done, as the information is not in the public domain.)

By running various combinations of switching off commit-cleanout on DB2 but not DB1, and vice versa, we found the following.

Commit-cleanout on DB1 was the important factor; DB2 was not an issue. We modified the test so DB2 only needed to update a single row and insert into the message table for the issue to still occur. I then continued to increase the size of the transaction on DB1 and managed to scale the “enq: TX – contention” wait to over 1.2 seconds.

As soon as I switched off commit-cleanout on DB1, I no longer got any TX contention waits within DB2.

Explanation of the delay

Commit-cleanout causes the delay on the DB2 transaction job (this is based on a description from our Oracle contact):

  • Transaction T updates lots of blocks on DB1
  • Transaction T updates a row on DB2 (becoming a distributed transaction, if it isn’t one already)
  • COMMIT is issued for T on DB1
  • DB1 asks DB2 to PREPARE
  • The fast running daemon job blocks at the next query after PREPARE
  • DB1 receives the PREPARE response and performs a local COMMIT, which takes a while because of cleanout (cleanout applied to potentially 10% of the buffer cache).
  • DB1 asks DB2 to COMMIT
  • DB2 performs a COMMIT
  • As soon as the commit record hits disk, the daemon job unblocks
  • If the COMMIT on DB2 takes a while because of cleanout, the transaction on DB1 will be kept waiting a while longer (but we’re probably not measuring this because we’re not too worried about it)
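The scale of the effect can be sketched with a toy model (every number below is hypothetical, chosen only to illustrate the shape of the dependence, not measured on this system):

```python
# Toy model of the effect above: the DB2 reader is blocked for roughly the time
# DB1 spends on its local COMMIT, which is dominated by commit-cleanout of up
# to 10% of the buffer cache. per_block_us is a made-up illustrative cost.
def blocked_time_s(cache_gb, block_kb=8, per_block_us=1.0, touched_fraction=0.10):
    blocks = (cache_gb * 1024**3) // (block_kb * 1024)
    return blocks * touched_fraction * per_block_us / 1e6

# With these assumed numbers, a 100 GB cache gives ~1.3 s, the same order of
# magnitude as the waits seen in the traces below.
print(round(blocked_time_s(100), 2))
```

The point is not the absolute numbers but that the blocked time grows linearly with the buffer cache size, which matches what we observed when shrinking the cache.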

SQL Traces

We traced the 3 sessions involved and you can see the above description in the trace output:

The initial session has done a lot of transaction work, and then we see the commit occur between lines 32 – 45 of the trace. You can see, as part of the two-phase commit, the time spent waiting on the dblink. You can also see the elapsed time of the EXEC of the commit: e=1237529 (1.23 seconds). Note that the actual COMMIT in a trace file is really shown as the “XCTEND” line.


DB1 (as it COMMITS) 


WAIT #48003855878616: nam='db file sequential read' ela= 341 file#=2 block#=1141261 blocks=1 obj#=0 tim=1366907498153354
WAIT #48003855878616: nam='db file sequential read' ela= 361 file#=2 block#=1141262 blocks=1 obj#=0 tim=1366907498156241
WAIT #48003855878616: nam='db file sequential read' ela= 222 file#=2 block#=1141263 blocks=1 obj#=0 tim=1366907498158793
WAIT #48003855878616: nam='db file sequential read' ela= 340 file#=2 block#=1141264 blocks=1 obj#=0 tim=1366907498161786
WAIT #48003855878616: nam='db file sequential read' ela= 325 file#=2 block#=1141265 blocks=1 obj#=0 tim=1366907498164878
WAIT #48003855878616: nam='db file sequential read' ela= 223 file#=2 block#=1141266 blocks=1 obj#=0 tim=1366907498167504
EXEC #48003855878616:c=598471018,e=789131425,p=231884,cr=235832,cu=32696002,mis=0,r=15999992,dep=0,og=1,plh=1102971125,tim=1366907498168116
STAT #48003855878616 id=1 cnt=0 pid=0 pos=1 obj=0 op='UPDATE  SETT_BIG3 (cr=235832 pr=231884 pw=0 time=789131330 us)'
STAT #48003855878616 id=2 cnt=15999992 pid=1 pos=1 obj=12256661 op='TABLE ACCESS FULL SETT_BIG3 (cr=235664 pr=0 pw=0 time=19970841 us cost=40159 size=76988304 card=19247076)'
WAIT #48003855878616: nam='SQL*Net message to client' ela= 4 driver id=1650815232 #bytes=1 p3=0 obj#=0 tim=1366907498168305
WAIT #48003855878616: nam='SQL*Net message from client' ela= 872 driver id=1650815232 #bytes=1 p3=0 obj#=0 tim=1366907498169204
CLOSE #48003855878616:c=0,e=7,dep=0,type=0,tim=1366907498169264
WAIT #48003855878616: nam='SQL*Net message to dblink' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=0 tim=1366907498169488
WAIT #48003855878616: nam='SQL*Net message from dblink' ela= 3367 driver id=1413697536 #bytes=1 p3=0 obj#=0 tim=1366907498172877
WAIT #48003855878616: nam='SQL*Net message to dblink' ela= 7 driver id=1413697536 #bytes=1 p3=0 obj#=0 tim=1366907498173155
WAIT #48003855878616: nam='SQL*Net message from dblink' ela= 711 driver id=1413697536 #bytes=1 p3=0 obj#=0 tim=1366907498173892
=====================
PARSING IN CURSOR #48003855878616 len=65 dep=0 uid=370 oct=6 lid=370 tim=1366907498174148 hv=1477430740 ad='209f99a028' sqlid='40hwrntc0zmfn'
update  nigel.queue@nigel_DB2 set txt = 'dddd' where id  = 1
END OF STMT
PARSE #48003855878616:c=0,e=4842,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=0,tim=1366907498174146
WAIT #48003855878616: nam='SQL*Net message to dblink' ela= 0 driver id=1413697536 #bytes=1 p3=0 obj#=0 tim=1366907498174298
WAIT #48003855878616: nam='SQL*Net message from dblink' ela= 1549 driver id=1413697536 #bytes=1 p3=0 obj#=0 tim=1366907498175867
EXEC #48003855878616:c=0,e=1710,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=1,plh=0,tim=1366907498175928
WAIT #48003855878616: nam='SQL*Net message to client' ela= 3 driver id=1650815232 #bytes=1 p3=0 obj#=0 tim=1366907498175991
WAIT #48003855878616: nam='SQL*Net message from client' ela= 339 driver id=1650815232 #bytes=1 p3=0 obj#=0 tim=1366907498176353
CLOSE #48003855878616:c=0,e=18,dep=0,type=0,tim=1366907498176428
=====================
PARSING IN CURSOR #48003855878616 len=6 dep=0 uid=370 oct=44 lid=370 tim=1366907498176518 hv=3480936638 ad='0' sqlid='23wm3kz7rps5y'
commit
END OF STMT
PARSE #48003855878616:c=0,e=43,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=0,plh=0,tim=1366907498176517
XCTEND rlbk=0, rd_only=0, tim=1366907498176582
WAIT #48003855878616: nam='log file sync' ela= 891 buffer#=49185 sync scn=3077596287 p3=0 obj#=-1 tim=1366907498186812
WAIT #48003855878616: nam='SQL*Net message to dblink' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1366907498186943
WAIT #48003855878616: nam='SQL*Net message from dblink' ela= 938 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1366907498187917
WAIT #48003855878616: nam='log file sync' ela= 670 buffer#=49187 sync scn=3077596291 p3=0 obj#=-1 tim=1366907498188782

*** 2013-04-25 16:31:39.412
WAIT #48003855878616: nam='SQL*Net message to dblink' ela= 9 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1366907499412179
WAIT #48003855878616: nam='SQL*Net message from dblink' ela= 1299 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1366907499413647
EXEC #48003855878616:c=1232812,e=1237529,p=0,cr=0,cu=10,mis=0,r=0,dep=0,og=0,plh=0,tim=1366907499414084
WAIT #48003855878616: nam='SQL*Net message to client' ela= 4 driver id=1650815232 #bytes=1 p3=0 obj#=0 tim=1366907499414163

*** 2013-04-25 16:33:27.444
WAIT #48003855878616: nam='SQL*Net message from client' ela= 108030586 driver id=1650815232 #bytes=1 p3=0 obj#=0 tim=1366907607444780
XCTEND rlbk=0, rd_only=1, tim=1366907607444991
CLOSE #48003855878616:c=0,e=8,dep=0,type=0,tim=1366907607445122
WAIT #0: nam='SQL*Net message to dblink' ela= 3 driver id=1413697536 #bytes=1 p3=0 obj#=0 tim=1366907607445499
WAIT #0: nam='SQL*Net message from dblink' ela= 1107 driver id=1413697536 #bytes=1 p3=0 obj#=0 tim=1366907607446630

NOTE: You can see the commit take 1.2 seconds. I assume that if you break down the tim values you can see where the time actually goes in the chatter between DB1 and DB2.
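Here is the breakdown I get from the tim values copied out of the trace above (the event labels are my own interpretation, not something the trace states):

```python
# Deltas between consecutive tim values (microseconds) from the DB1 trace above.
# tim values are copied verbatim from the trace; labels are my interpretation.
events = [
    ("XCTEND (commit starts)",              1366907498176582),
    ("log file sync",                       1366907498186812),
    ("PREPARE round trip to DB2",           1366907498187917),
    ("log file sync",                       1366907498188782),
    ("gap: local commit / cleanout on DB1", 1366907499412179),
    ("COMMIT round trip to DB2",            1366907499413647),
    ("EXEC of the commit completes",        1366907499414084),
]
for (_, t_prev), (label, t) in zip(events, events[1:]):
    print(f"{label:<38} +{(t - t_prev) / 1e6:.6f}s")
print(f"total {(events[-1][1] - events[0][1]) / 1e6:.6f}s")  # ~= e=1237529
```

Almost all of the 1.24 seconds sits in the single gap between the second log file sync and the next dblink message, which is consistent with the commit-cleanout explanation.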

This is now the session on the other side of the database link (on DB2).
Lines 27 – 33 show the commit from the remote side of the transaction.
You can see two “log file sync” waits with a “SQL*Net message from client” ela= 1224672 between them. It looks like the PREPARE does a “log file sync”, then we wait on the SQL*Net client for 1.22 seconds (which in this case is the original DB1 session), then the second “log file sync” appears to be the completion of the 2PC. Most of this delay is spent waiting for DB1 to perform commit-cleanout on its dirty blocks (we have a 100GB buffer cache).


DB2 DB LINK as it commits

*** 2013-04-25 16:27:43.899
WAIT #47521476037632: nam='SQL*Net message from client' ela= 2066407044 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1366907263899665
CLOSE #47521476037632:c=0,e=15,dep=0,type=1,tim=1366907263899876
=====================
PARSING IN CURSOR #47521476002352 len=47 dep=0 uid=196 oct=3 lid=196 tim=1366907263902418 hv=1355272285 ad='1afff6d3a8' sqlid='7mmfcsx8cgn2x'
SELECT /*+ FULL(P) +*/ * FROM "NIGEL"."QUEUE" P
END OF STMT
PARSE #47521476002352:c=3000,e=2487,p=0,cr=1,cu=0,mis=1,r=0,dep=0,og=1,plh=1157425618,tim=1366907263902417
WAIT #47521476002352: nam='SQL*Net message to client' ela= 4 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1366907263902597
WAIT #47521476002352: nam='SQL*Net message from client' ela= 509 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1366907263903229
WAIT #47521476002352: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1366907263903839
WAIT #47521476002352: nam='SQL*Net message from client' ela= 505 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1366907263904365
CLOSE #47521476002352:c=0,e=6,dep=0,type=0,tim=1366907263904399
=====================
PARSING IN CURSOR #47521476037632 len=64 dep=0 uid=196 oct=6 lid=196 tim=1366907263905383 hv=2400842460 ad='1b1f221af0' sqlid='3jpb77q7jmwqw'
UPDATE "NIGEL"."QUEUE" "A1" SET "TXT" = 'dddd' WHERE "A1"."ID"=1
END OF STMT
PARSE #47521476037632:c=999,e=956,p=0,cr=1,cu=0,mis=1,r=0,dep=0,og=1,plh=860449894,tim=1366907263905381
EXEC #47521476037632:c=0,e=272,p=0,cr=3,cu=2,mis=0,r=1,dep=0,og=1,plh=860449894,tim=1366907263905705
STAT #47521476037632 id=1 cnt=0 pid=0 pos=1 obj=0 op='UPDATE  QUEUE (cr=3 pr=0 pw=0 time=225 us)'
STAT #47521476037632 id=2 cnt=1 pid=1 pos=1 obj=2718475 op='TABLE ACCESS FULL QUEUE (cr=3 pr=0 pw=0 time=61 us cost=2 size=65 card=1)'
WAIT #47521476037632: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1366907263905798
WAIT #47521476037632: nam='SQL*Net message from client' ela= 11279 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1366907263917092
CLOSE #47521476037632:c=0,e=4,dep=0,type=0,tim=1366907263917154
XCTEND rlbk=0, rd_only=0, tim=1366907263917212
WAIT #0: nam='log file sync' ela= 376 buffer#=119687 sync scn=3077596289 p3=0 obj#=-1 tim=1366907263917789
WAIT #0: nam='SQL*Net message to client' ela= 0 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1366907263917834

*** 2013-04-25 16:27:45.142
WAIT #0: nam='SQL*Net message from client' ela= 1224672 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1366907265142517
WAIT #0: nam='log file sync' ela= 370 buffer#=119853 sync scn=3077596449 p3=0 obj#=-1 tim=1366907265143222
WAIT #0: nam='SQL*Net message to client' ela= 3 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1366907265143406

*** 2013-04-25 16:29:33.174
WAIT #0: nam='SQL*Net message from client' ela= 108030822 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1366907373174249
XCTEND rlbk=0, rd_only=1, tim=1366907373174431

NOTE: Above I see two log file sync waits with 1.2 seconds between them – I assume these are the first and second parts of the commit, with the time waiting on DB1 in between.


This is the trace for our fast transactional job. It sleeps for 10ms and then tries to read from a small message table to be told when work has been completed by the large 2PC batch job. It is this message table we stall on when we try to read data during the 2PC of the sessions above.


DB2 reader job as it gets delayed by the TX contention.

WAIT #47750313386464: nam='PL/SQL lock timer' ela= 10899 duration=0 p2=0 p3=0 obj#=-1 tim=1366907263894098
EXEC #47750313382640:c=0,e=16,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=3515042038,tim=1366907263894169
FETCH #47750313382640:c=0,e=19,p=0,cr=3,cu=0,mis=0,r=1,dep=1,og=1,plh=3515042038,tim=1366907263894200
CLOSE #47750313382640:c=0,e=1,dep=1,type=3,tim=1366907263894224
WAIT #47750313386464: nam='PL/SQL lock timer' ela= 10839 duration=0 p2=0 p3=0 obj#=-1 tim=1366907263905075
EXEC #47750313382640:c=0,e=19,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=3515042038,tim=1366907263905145
FETCH #47750313382640:c=0,e=21,p=0,cr=3,cu=0,mis=0,r=1,dep=1,og=1,plh=3515042038,tim=1366907263905178
CLOSE #47750313382640:c=0,e=1,dep=1,type=3,tim=1366907263905202
WAIT #47750313386464: nam='PL/SQL lock timer' ela= 10871 duration=0 p2=0 p3=0 obj#=-1 tim=1366907263916085
EXEC #47750313382640:c=0,e=16,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=3515042038,tim=1366907263916159
FETCH #47750313382640:c=0,e=252,p=0,cr=5,cu=0,mis=0,r=1,dep=1,og=1,plh=3515042038,tim=1366907263916424
CLOSE #47750313382640:c=0,e=2,dep=1,type=3,tim=1366907263916453
WAIT #47750313386464: nam='PL/SQL lock timer' ela= 10622 duration=0 p2=0 p3=0 obj#=-1 tim=1366907263927086
EXEC #47750313382640:c=0,e=16,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=3515042038,tim=1366907263927168

*** 2013-04-25 16:27:44.928
WAIT #47750313382640: nam='enq: TX - contention' ela= 1000879 name|mode=1415053316 usn<<16 | slot=27590684 sequence=168697 obj#=-1 tim=1366907264928133
WAIT #47750313382640: nam='enq: TX - contention' ela= 214602 name|mode=1415053316 usn<<16 | slot=27590684 sequence=168697 obj#=-1 tim=1366907265142859
FETCH #47750313382640:c=0,e=1215779,p=0,cr=5,cu=2,mis=0,r=1,dep=1,og=1,plh=3515042038,tim=1366907265142959
CLOSE #47750313382640:c=0,e=1,dep=1,type=3,tim=1366907265143000
WAIT #47750313386464: nam='PL/SQL lock timer' ela= 11115 duration=0 p2=0 p3=0 obj#=-1 tim=1366907265154134
EXEC #47750313382640:c=0,e=22,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=3515042038,tim=1366907265154216
FETCH #47750313382640:c=0,e=28,p=0,cr=3,cu=0,mis=0,r=1,dep=1,og=1,plh=3515042038,tim=1366907265154257
CLOSE #47750313382640:c=0,e=1,dep=1,type=3,tim=1366907265154281
WAIT #47750313386464: nam='PL/SQL lock timer' ela= 10822 duration=0 p2=0 p3=0 obj#=-1 tim=1366907265165115
EXEC #47750313382640:c=0,e=19,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=3515042038,tim=1366907265165182
FETCH #47750313382640:c=0,e=20,p=0,cr=3,cu=0,mis=0,r=1,dep=1,og=1,plh=3515042038,tim=1366907265165215
CLOSE #47750313382640:c=0,e=1,dep=1,type=3,tim=1366907265165239

Summary

So we know the size of the buffer cache (via commit-cleanout) can directly affect the duration of “enq: TX – contention” waits. We also tested switching around the commit_point_strength on the two databases, and in this case it DID resolve the issue; however, we could not be sure we would not introduce waits in the other database if we moved it. We are currently exploring a small application change to split the update of the small message table at the end of the larger batch job into its own transaction (making sure we handle any side effects of it being a separate transaction).

Oracle Enhancement Request…. Please!

This issue (and one I talked about in the past: Strange statistic, large transactions, dirty buffers and direct path read) highlights that it would be really nice if Oracle made the percentage of commit-cleanout blocks touched during a commit a tunable parameter. I have passed this request to our Oracle contact and we will just have to wait and see if anything appears in the future.

1 Comment »

  1. Very interesting article, well-explained and illustrated – as usual.

    Comment by Dom Brooks — 12/07/2013 @ 2:27 pm

