I thought I would post a very short note about a recent PGA memory “leak” issue we found in one of our applications that appears to exist in Oracle versions 10gR2 through to 11gR2. I would not expect the problem to actually affect many sites so I am not going to spend a huge amount of time showing the test case but thought I would make people aware of the potential issue.
My site introduced a “parallel scheduler” which allows us to break some of our business transactions into parallel jobs. It simply manages the running of some time critical business tasks in parallel but takes full control of the business rules and co-ordinates that all the tasks are complete, verified and handles the rules if parts fail to complete. The program runs itself as a continually running dbms_scheduler job and then schedules the worker threads (within various rules) then keeps careful track of the completion status of each part.
One day our scheduler program failed with ORA-4030 (“out of process memory when trying to allocate %s bytes (%s,%s)”). We re-started the job, but further investigation showed that the job process which had previously been running our scheduler (which was still attached) although now showing a low “session pga memory” had at some point had “session pga memory max” of 4GB. I set up a monitor to collect the “session pga memory” of the re-started job and left it collecting for a few days. When I plotted the PGA memory data we could clearly see the PGA memory appeared to grow during busy periods and not at all at off peak times but importantly never reduced. I sent the memory usage graph to a colleague and after a short while, he sent me back a graph which looked 100% the same as mine……except his graph had a totally different scale and was not memory. The graph he sent me was actually the total number of tasks our scheduler processes was asked to run in the same time period. So we knew we appeared to “leak” a fixed amount of PGA every time we ran a scheduled job.
After some further investigation (dumping “session pga memory” after every call on a test system) we found the problem was not the submission of the job, but after the job had completed, we formally make a call to dbms_scheduler.drop_job and this call “leaks” approximately 21k every call. NOTE: I’ve used the term “leak” a lot and strictly speaking this is not a leak. Oracle knows about all about the memory and when your plsql package completes all the PGA memory is returned. The problem is Oracle does not free the memory during the execution of the main plsql procedure.
We have raised a new bug with Oracle: 9957867 – EXECUTING DBMS_SCHEDULER.DROP_JOB REPEATEDLY IN A PLSQL PROCEDURE LEAKS MEMORY which is still under investigation by Oracle Support. I’ll post an update if I get anymore information.
This problem will really only cause problems if you schedule and then explicitly drop (dbms_scheduler.drop_job) a lot of jobs in the same plsql loop without completing the procedure. The dbms scheduler will by default create jobs with auto_drop=true so again this is only a problem if you create jobs with auto_drop=false because you want to take full control. We have a number of options which my site will consider (Change our code and use auto_drop=true, move the drop_job to another scheduled task which does restart from time to time or finally restart our scheduler task from time to time.