I recently investigated a performance problem on an Oracle 11.2 OLTP trading system and although we still don’t fully understand the issue (and which versions of Oracle it effects), I thought I would share what we found (and how we found it). We had a hardware failure on the database server, within 30 seconds the database had automatically been restarted on an idle identical member of the cluster and the application continued on the new database host. A few days later I just happened to notice the following change in the LGWR trace file. I noticed that the Log Writer trace was showing more “kcrfw_update_adaptive_sync_mode” messages than normal.
Note: The following is based on testing with 126.96.36.199 (I believe same issue exists within other Oracle versions).
I recently worked on an interesting problem relating to the “enq: TX – contention” wait event. There are a number of reasons for the wait but the most common reason I have come across at my site is best described by a forum posting I found by Jonathan Lewis... “select can wait for another transaction if that transaction is between the “prepare” and “commit” of a two-phase commit.”
My site has a number of databases in the UK and Australia. We have often come across the problem when a SELECT statement tries to read a row which is involved in a distributed transaction to Australia. The problem is with the round trip latency to Australia. It is possible that during the communication of the PREPARE and COMMIT phases you have a 200ms – 300ms latency. Anyone trying to read the rows involved in the transaction can see a 300ms pause on “enq: TX – contention” waits. There are a number of tricks you can use to try and reduce these problems by finding ways to separate the rows the SELECT statement reads on the UK data from the rows involved in the Australian transaction. The very fact you visit the same block and evaluate the 2pc row can cause it to hang (Just reading a row to find it’s not actually the one you needed would do it as well.) We use tricks like careful usage of indexes to ensure the reader can go directly to the UK data and not even evaluate the row involved in the Australian 2pc (Partitioning can also help here too).
A new variation of “enq: TX – contention”
My site uses a 3rd party SQL monitoring tool which collects data based on the Oracle view v$sqlstats. The tool collects data for all sql statements which have been executed since the previous collection using the last_active_time column. A few months ago we noticed (after an upgrade to 11g) we would occasionally be missing some sql statements from the monitor graphs. We investigated the problem, re-produced a test case, raised a bug with Oracle and Support has just released an 11.2 patch.
Before I explain the issue and demonstrate the issue, I will explain what prompted me to post this blog item. Until now I always thought this just effected v$sqlstats (and my 3rd party monitoring tool) and not AWR. If you ever trace (10046) an AWR collection you will not find any references to v$sqlstats. If the data is missing from the 3rd Party tools “I can always rely on AWR?”…. I thought!
Well, I recently discovered that any time a sql statement drops off the 3rd party monitoring tool…… IT STOPS BEING MONITORED IN AWR’s DBA_HIST_SQLSTAT table as well. We have our AWR collection threshold set to collect as many sql statements as possible. This problem has nothing to do with the cost of the SQL statements and you could well find your most expensive sql statement just disappear from AWR for a period of time.
Let’s start at the very beginning….
I recently investigated an IO performance “spike” on a large 188.8.131.52 transactional system and I thought I would cover some interesting issues found. I am going to take the approach of detailing the observations made from our production and test systems and avoid attempting to cover how other versions of Oracle behave. The investigation also uncovers a confusing database statistic which we are currently discussing with Oracle Development so they can decide if this is an Oracle coding bug or a documentation issue.
The initial IO issue
We run a simple home grown database monitor which watches database wait events and sends an email alert if it detects either a single session waiting on a non-idle wait for a long time or the total number of database sessions concurrently waiting goes above a defined threshold. The monitor can give a number of false alerts but can also draw our attention to some more interesting events.
One day the monitor alerted me to two separate databases waiting on “log file sync” waits at exactly the same few seconds and affecting many hundreds of sessions. Both databases share the same storage array so the obvious place to start was to look at the storage statistics. We found a strange period lasting around 10 seconds long when both databases had a large increase in redo write service times and a few seconds when no IO was written at all. The first database we looked at seemed to show increase in disk service times for a very similar work load. The second system showed a large increase in data file writes (500MB/sec) but without any matching increase in redo log writes. It seems the first database was slowed down by the second database flushing 5GB of data.
Where did 5GB of data file writes come from, and what triggered it?
Looking at the database we knew there were no corresponding redo writes, there were no obvious large sql statements reading or writing. We confirmed these writes were real and not something outside the database.
My site recently upgraded one of its databases to the 10.2.0.5 patchset and found a serious performance problem relating to code using dbms_sql. Once we had completed the upgrade, we noticed a number of data feeds to the upgraded database started to fall behind and could no longer keep up. When we stopped and restarted the feeds they appeared to speed up. After some investigation we found two problems, the first relating to a “log file sync” problem in RAC which is still under investigation (one for a future post when we have more detail) but the second issue caused the performance of the data feed to degrade steadily over time.
We use a couple of products to dynamically feed data to our 10.2.0.5 database and it seemed the performance degraded over time (many days). We did a 10046 session trace and found we lost a lot of time between the dbms_sql.parse of some dynamic SQL and the bind phase. We found reference in Metalink to Bug 10269717 (DBMS_SQL.PARSE leaks session heap memory in 10.2.0.5). Although the bug discusses a memory leak, we found that the performance also degrades over time.
We applied the patch for 10269717 and the PGA memory leak was resolved but more importantly the performance remained constant. I just checked the 10.2.0.5 known issues document and this does now list the bug, but it still only references the leak and not the performance implications.
I thought I would post a very short note about a recent PGA memory “leak” issue we found in one of our applications that appears to exist in Oracle versions 10gR2 through to 11gR2. I would not expect the problem to actually affect many sites so I am not going to spend a huge amount of time showing the test case but thought I would make people aware of the potential issue.
My site introduced a “parallel scheduler” which allows us to break some of our business transactions into parallel jobs. It simply manages the running of some time critical business tasks in parallel but takes full control of the business rules and co-ordinates that all the tasks are complete, verified and handles the rules if parts fail to complete. The program runs itself as a continually running dbms_scheduler job and then schedules the worker threads (within various rules) then keeps careful track of the completion status of each part.
One day our scheduler program failed with ORA-4030 (“out of process memory when trying to allocate %s bytes (%s,%s)”). We re-started the job, but further investigation showed that the job process which had previously been running our scheduler (which was still attached) although now showing a low “session pga memory” had at some point had “session pga memory max” of 4GB. I set up a monitor to collect the “session pga memory” of the re-started job and left it collecting for a few days. When I plotted the PGA memory data we could clearly see the PGA memory appeared to grow during busy periods and not at all at off peak times but importantly never reduced. I sent the memory usage graph to a colleague and after a short while, he sent me back a graph which looked 100% the same as mine……except his graph had a totally different scale and was not memory. The graph he sent me was actually the total number of tasks our scheduler processes was asked to run in the same time period. So we knew we appeared to “leak” a fixed amount of PGA every time we ran a scheduled job.
After some further investigation (dumping “session pga memory” after every call on a test system) we found the problem was not the submission of the job, but after the job had completed, we formally make a call to dbms_scheduler.drop_job and this call “leaks” approximately 21k every call. NOTE: I’ve used the term “leak” a lot and strictly speaking this is not a leak. Oracle knows about all about the memory and when your plsql package completes all the PGA memory is returned. The problem is Oracle does not free the memory during the execution of the main plsql procedure.
We have raised a new bug with Oracle: 9957867 – EXECUTING DBMS_SCHEDULER.DROP_JOB REPEATEDLY IN A PLSQL PROCEDURE LEAKS MEMORY which is still under investigation by Oracle Support. I’ll post an update if I get anymore information.
This problem will really only cause problems if you schedule and then explicitly drop (dbms_scheduler.drop_job) a lot of jobs in the same plsql loop without completing the procedure. The dbms scheduler will by default create jobs with auto_drop=true so again this is only a problem if you create jobs with auto_drop=false because you want to take full control. We have a number of options which my site will consider (Change our code and use auto_drop=true, move the drop_job to another scheduled task which does restart from time to time or finally restart our scheduler task from time to time.
ASSM (Automatic Segment Space Management) has an issue when trying to re-use “deleted” space created by another session. There is a very specific set of circumstances which must occur for this issue to show itself, but will result in tables (and I suspect indexes) growing significantly larger than they need to be. I am aware that the problem exists in versions 10.2.0.3 through to the current 11gR2 inclusive although I don’t know which Oracle release first introduced the problem.
The conditions required to cause the issue
My site has a number of daemon style jobs running permanently on the database loading data into a message table. The daemon is managed by dbms_job and is re-started when the database is first started and can run for many days, weeks (or in some cases months) before the job is ever stopped. We only need to keep the messages for a short time, so we have another daemon job whose role is to delete the messages from the table as soon as the expiry time is reached. In one example we only need to retain the data for a few minutes (after which time we no longer need it) and we also wanted to keep the table as small as possible so it remained cached in the buffer cache (helped by a KEEP pool). When we developed the code, we expected the message table to remain at a fairly constant size of 50 – 100MB in size. What we found was the table continued to grow at a consistent rate to many gigabytes in size until we stopped the test. If we copied the table (eg using “create table <new table> as select * from <current table>”) the new table would again be 50MB so most of the space in the table was empty “deleted” space. The INSERT statements were never re-using the space made free by the delete statement (run in another session).
Jonathan Lewis made reference to a 11g bug related to using a KEEP POOL in his note Not KEEPing. Oracle 11g introduced a new feature called adaptive serial direct path reads which allows large “Full Scan” disk reads to be read using “direct path reads” rather than through the buffer cache. In many cases “direct IO” can give significant increase in performance when reading data for first time (from disk), however can be significantly slower if your subsequent queries could have been serviced from the buffer cache. The bug Jonathan references (Bug 8897574) causes problems if you assign any large object to a KEEP POOL because by default, 11g would read large objects using the new direct IO feature and avoid ever placing the object in the KEEP POOL. The whole point of using the KEEP POOL is to identify objects you do want to protect and keep in a cache.
The 10.2.0.5 patchset has also back-ported the same direct read feature which is new to 11g although I don’t know if the rules are the same as 11g. The site where I work makes significant use of KEEP pools and also has spent some time investigating aspects relating to serial direct IO vs. buffer cache IO.
I want to use this blog entry to explore a number of related issues but also demonstrate that the 11g bug Jonathan identified seems to also exists in 10.2.0.5 patchset (and 11gR2). This blog item will cover:
- Brief reference relating to the 11g “adaptive serial direct path read”
- The 10.2.0.5 implementation and how to switch it on
- 10.2.0.5 demonstration showing the relative difference for different types of IO vs read from cache
- 10.2.0.5 demonstration which shows the KEEP pool bug also exists (but not by default)
- Some real life comparison figures of disk reads via “direct path read” and via “buffer cache” to show why the “adaptive serial direct path read” feature is worth exploring in more detail.
A number of years ago, I wrote a SQL Trace File formatter when I needed to process a large amount of trace files for a complex project we were working on. The formatter is not intended to replace the really good tools that are out there, but I like reading the detail which appears in a raw trace file (but with some additional help). I also wanted to structure the trace file so it could be processed by other scripts separately. This is by no means written to a commercial standard, but I thought people may find it useful and also provide interesting insite on how to process trace files. I worked with James Morle in a previous company to build a benchmark based on Oracle trace files using some software he had written in his book (Scaling Oracle 8i, the paper Brewing Benchmarks and the later software simora ). The benchmark software contained an awk script to process traces files and extract the entry point database calls and convert them to a tcl scripting language to then drive the benchmark. I took (with permission) the conversion script and modified it so it generated a Oracle trace file but with extra information.
I’ve just seen a note on Jonathan Lewis’s blog regarding Online Index Rebuilds. It reminds me of some issues which existed in Oracle 9i and 10g but appear to have been resolved in 11gR1 and 11gR2. Oracle 9i introduced a patch to change behaviour regarding online Index Rebuilds. The default behaviour in 9i and 10g is that an Online Index Rebuild would get blocked behind a long active transaction which uses the index (which is still true in 11g) but critically then would also block any new DML wanting to also modify the index (Leading to a hang of the application as well as the index build). They introduced a new database EVENT 10629 (in a 9i patch) which would mean the Online Index Rebuild would keep trying to acquire its locks but would keep backing off to allow other DML to continue. The Event would be set with a level to either try forever (but don’t block other new DML) or fail the Online Index Rebuild after a specific time period (well retries).
Out of interest, you can sometimes look up the text for Oracle Events using the “oerr” command .
$ oerr ora 10629 10629, 00000, "force online index build to backoff and retry DML lock upgrade" // *Cause: // *Action: set this event only under the supervision of Oracle development // *Comment: Change the behaviour of an online index rebuild such that it // will backoff and retry a failed DML lock upgrade. // The event level is the number of retries the online index rebuild // should wait. Level 1 means backoff and retry indefinitely. Any // other value less than 32 will be adjusted automatically to be 32.
There is more information on Meta link (note: 3566511.8). The comments of “This issue is fixed in” mean the new Event feature is included in the release (you still need to set it).
I’ve just had a very quick test of the Event on 11gR1/R2 but it’s not clear if the Event still works. The very important thing (to me) is the 11g versions no longer cause other unrelated DML to become stuck behind a long running active transaction.