Nigel Noble's Oracle Blog

11/01/2011

10.2.0.5 dbms_sql leaks memory AND performance degrades

Filed under: Uncategorized — Nigel Noble @ 11:24 am

My site recently upgraded one of its databases to the 10.2.0.5 patchset and found a serious performance problem relating to code using dbms_sql. Once we had completed the upgrade, we noticed a number of data feeds to the upgraded database started to fall behind and could no longer keep up. When we stopped and restarted the feeds they appeared to speed up. After some investigation we found two problems, the first relating to a “log file sync” problem in RAC which is still under investigation (one for a future post when we have more detail) but the second issue caused the performance of the data feed to degrade steadily over time.  

We use a couple of products to dynamically feed data to our 10.2.0.5 database and it seemed the performance degraded  over time (many days). We did a 10046 session trace and found we lost a lot of time between the dbms_sql.parse of some dynamic SQL and the bind phase. We found reference in Metalink to Bug 10269717 (DBMS_SQL.PARSE leaks session heap memory in 10.2.0.5).  Although the bug discusses a memory leak, we found that the performance also degrades over time.

We applied the patch for 10269717 and the PGA memory leak was resolved but more importantly the performance remained constant. I just checked the 10.2.0.5 known issues document and this does now list the bug, but it still only references the leak and not the performance implications.

Advertisements

25/08/2010

dbms_scheduler.drop_job “leaks” PGA memory

Filed under: Uncategorized — Nigel Noble @ 1:00 pm

I thought I would post a very short note about a recent PGA memory “leak” issue we found in one of our applications that appears to exist in Oracle versions 10gR2 through to 11gR2. I would not expect the problem to actually affect many sites so I am not going to  spend a huge amount of time showing the test case but thought I would make people aware of the potential issue.

My site introduced a “parallel scheduler” which allows us to break some of our business transactions into parallel jobs. It simply manages the running of some time critical  business tasks in parallel but takes full control of the business rules and co-ordinates that all the tasks are complete, verified and handles the rules if parts fail to complete. The program runs itself as a continually running dbms_scheduler job and then schedules the worker threads (within various rules) then keeps careful track of the completion status of each part.

The Problem

One day our scheduler program failed with ORA-4030 (“out of process memory when trying to allocate %s bytes (%s,%s)”). We re-started the job, but further investigation showed that the job process which had previously been running our scheduler (which was still attached)  although now showing a low “session pga memory” had at some point had “session pga memory max” of 4GB. I set up a monitor to collect the “session pga memory” of the re-started job and left it collecting for a few days. When I plotted the PGA memory data we could clearly see the PGA memory appeared to grow during busy periods and not at all at off peak times but importantly never reduced. I sent the memory usage graph to a colleague and after a short while, he sent me back a graph which looked 100% the same as mine……except his graph had a totally different scale and was not memory. The graph he sent me was actually the total number of tasks our scheduler processes was asked to run in the same time period. So we knew we appeared to “leak” a fixed amount of PGA every time we ran a scheduled job.

Investigation

After some further investigation (dumping “session pga memory” after every call on a test system) we found the problem was not the submission of the job, but after the job had completed, we formally make a call to dbms_scheduler.drop_job and this call “leaks” approximately 21k every call. NOTE: I’ve used the term “leak” a lot and strictly speaking this is not a leak. Oracle knows about all about the memory and when your plsql package completes all the PGA memory is returned. The problem is Oracle does not free the memory during the execution of the main plsql procedure.

Bug

We have raised a new bug with Oracle:  9957867 – EXECUTING DBMS_SCHEDULER.DROP_JOB REPEATEDLY IN A PLSQL PROCEDURE LEAKS MEMORY which is still under investigation by Oracle Support. I’ll post an update if I get anymore information.

Work Around

This problem will really only cause problems if you schedule and then explicitly drop (dbms_scheduler.drop_job)  a lot of jobs in the same plsql loop without completing the procedure.  The dbms scheduler will by default create jobs with auto_drop=true so again this is only a problem if you create jobs with auto_drop=false because you want to take full control. We have a number of options which my site will consider (Change our code and use auto_drop=true, move the drop_job to another scheduled task which does restart from time to time or finally restart our scheduler task from time to time.

02/06/2010

Online Index Rebuilds

Filed under: Uncategorized — Nigel Noble @ 12:24 pm

I’ve just seen a note on Jonathan Lewis’s blog regarding Online Index Rebuilds. It reminds me of some issues which existed in Oracle 9i and 10g but appear to have been resolved in 11gR1 and 11gR2. Oracle 9i introduced a patch to change behaviour regarding online Index Rebuilds. The default behaviour in 9i and 10g is that an Online Index Rebuild would get blocked behind a long active transaction which uses the index (which is still true in 11g) but critically then would also block any new DML  wanting to also modify the index (Leading to a hang of the application as well as the index build). They introduced a new database EVENT 10629 (in a 9i patch) which would mean the Online Index Rebuild would keep trying to acquire its locks but would keep backing off to allow other DML to continue. The Event would be set with a level to either try forever (but don’t block other new DML) or fail the Online Index Rebuild after a specific time period (well retries).

Out of interest, you can sometimes look up the text for Oracle Events using the “oerr” command .  

$ oerr ora 10629
10629, 00000, "force online index build to backoff and retry DML lock upgrade"
// *Cause:
// *Action: set this event only under the supervision of Oracle development
// *Comment: Change the behaviour of an online index rebuild such that it
// will backoff and retry a failed DML lock upgrade.
// The event level is the number of retries the online index rebuild
// should wait. Level 1 means backoff and retry indefinitely. Any
//           other value less than 32 will be adjusted automatically to be 32.

 
There is more information on Meta link (note: 3566511.8). The comments of “This issue is fixed in” mean the new Event feature is included in the release (you still need to set it).

I’ve just had a very quick test of the Event on 11gR1/R2 but it’s not clear if the Event still works. The very important thing (to me) is the 11g versions no longer cause other unrelated DML to become stuck behind a long running active transaction.

04/05/2010

Hello world!

Filed under: Uncategorized — Nigel Noble @ 12:19 pm

Welcome to WordPress.com. This is your first post. Edit or delete it and start blogging!

Blog at WordPress.com.