Nigel Noble's Oracle Blog

11/01/2011

10.2.0.5 dbms_sql leaks memory AND performance degrades

Filed under: Uncategorized — Nigel Noble @ 11:24 am

My site recently upgraded one of its databases to the 10.2.0.5 patchset and found a serious performance problem relating to code using dbms_sql. Once we had completed the upgrade, we noticed a number of data feeds to the upgraded database started to fall behind and could no longer keep up. When we stopped and restarted the feeds they appeared to speed up. After some investigation we found two problems, the first relating to a “log file sync” problem in RAC which is still under investigation (one for a future post when we have more detail) but the second issue caused the performance of the data feed to degrade steadily over time.  

We use a couple of products to dynamically feed data to our 10.2.0.5 database and it seemed the performance degraded  over time (many days). We did a 10046 session trace and found we lost a lot of time between the dbms_sql.parse of some dynamic SQL and the bind phase. We found reference in Metalink to Bug 10269717 (DBMS_SQL.PARSE leaks session heap memory in 10.2.0.5).  Although the bug discusses a memory leak, we found that the performance also degrades over time.

We applied the patch for 10269717 and the PGA memory leak was resolved but more importantly the performance remained constant. I just checked the 10.2.0.5 known issues document and this does now list the bug, but it still only references the leak and not the performance implications.

25/08/2010

dbms_scheduler.drop_job “leaks” PGA memory

Filed under: Uncategorized — Nigel Noble @ 1:00 pm

I thought I would post a very short note about a recent PGA memory “leak” issue we found in one of our applications that appears to exist in Oracle versions 10gR2 through to 11gR2. I would not expect the problem to actually affect many sites so I am not going to  spend a huge amount of time showing the test case but thought I would make people aware of the potential issue.

My site introduced a “parallel scheduler” which allows us to break some of our business transactions into parallel jobs. It simply manages the running of some time critical  business tasks in parallel but takes full control of the business rules and co-ordinates that all the tasks are complete, verified and handles the rules if parts fail to complete. The program runs itself as a continually running dbms_scheduler job and then schedules the worker threads (within various rules) then keeps careful track of the completion status of each part.

The Problem

One day our scheduler program failed with ORA-4030 (“out of process memory when trying to allocate %s bytes (%s,%s)”). We re-started the job, but further investigation showed that the job process which had previously been running our scheduler (which was still attached)  although now showing a low “session pga memory” had at some point had “session pga memory max” of 4GB. I set up a monitor to collect the “session pga memory” of the re-started job and left it collecting for a few days. When I plotted the PGA memory data we could clearly see the PGA memory appeared to grow during busy periods and not at all at off peak times but importantly never reduced. I sent the memory usage graph to a colleague and after a short while, he sent me back a graph which looked 100% the same as mine……except his graph had a totally different scale and was not memory. The graph he sent me was actually the total number of tasks our scheduler processes was asked to run in the same time period. So we knew we appeared to “leak” a fixed amount of PGA every time we ran a scheduled job.

Investigation

After some further investigation (dumping “session pga memory” after every call on a test system) we found the problem was not the submission of the job, but after the job had completed, we formally make a call to dbms_scheduler.drop_job and this call “leaks” approximately 21k every call. NOTE: I’ve used the term “leak” a lot and strictly speaking this is not a leak. Oracle knows about all about the memory and when your plsql package completes all the PGA memory is returned. The problem is Oracle does not free the memory during the execution of the main plsql procedure.

Bug

We have raised a new bug with Oracle:  9957867 – EXECUTING DBMS_SCHEDULER.DROP_JOB REPEATEDLY IN A PLSQL PROCEDURE LEAKS MEMORY which is still under investigation by Oracle Support. I’ll post an update if I get anymore information.

Work Around

This problem will really only cause problems if you schedule and then explicitly drop (dbms_scheduler.drop_job)  a lot of jobs in the same plsql loop without completing the procedure.  The dbms scheduler will by default create jobs with auto_drop=true so again this is only a problem if you create jobs with auto_drop=false because you want to take full control. We have a number of options which my site will consider (Change our code and use auto_drop=true, move the drop_job to another scheduled task which does restart from time to time or finally restart our scheduler task from time to time.

23/07/2010

10gR2 – 11gR2, ASSM Space Management “Bug”

Filed under: 10gR2, 11gR1, 11gR2, assm — Nigel Noble @ 10:59 am

ASSM (Automatic Segment Space Management) has an issue when trying to re-use ”deleted” space created by another session.  There is a very specific set of circumstances which must occur for this issue to show itself, but will result in tables (and I suspect indexes) growing significantly larger than they need to be. I am aware that the problem exists in versions 10.2.0.3 through to the current 11gR2 inclusive although I don’t know which Oracle release first introduced the problem.

The conditions required to cause the issue

My site has a number of daemon style jobs running permanently on the database loading data into a message table. The daemon is managed by dbms_job and is re-started when the database is first started and can run for many days, weeks (or in some cases months) before the job is ever stopped. We only need to keep the messages for a short time, so we have another daemon job whose role is to delete the messages from the table as soon as the expiry time is reached. In one example we only need to retain the data for a few minutes (after which time we no longer need it) and we also wanted to keep the table as small as possible so it remained cached in the buffer cache (helped by a KEEP pool). When we developed the code, we expected the message table to remain at a fairly constant size of 50 – 100MB in size. What we found was the table continued to grow at a consistent rate to many gigabytes in size until we stopped the test. If we copied the table (eg using “create table <new table> as select * from <current table>”) the new table would again be 50MB so most of the space in the table was empty “deleted” space. The INSERT statements were never re-using the space made free by the delete statement (run in another session).

(more…)

05/07/2010

10.2.0.5, KEEP pool / Serial Direct Read

Filed under: 10.2.0.5, 11gR1, Performance — Nigel Noble @ 11:36 am

Jonathan Lewis made reference to a 11g bug related to using a KEEP POOL in his note Not KEEPing.  Oracle 11g introduced a new feature called adaptive serial direct path reads which allows large “Full Scan” disk reads  to be read using “direct path reads” rather than through the buffer cache. In many cases “direct IO” can give significant increase in performance when reading data for first time (from disk), however can be significantly slower if your subsequent queries could have been serviced from the buffer cache. The bug Jonathan references (Bug 8897574) causes problems if you assign any large object to a KEEP POOL because by default, 11g would read large objects using the new direct IO feature and avoid  ever placing the object in the KEEP POOL. The whole point of using the KEEP POOL is to identify objects you do want to protect and keep in a cache. 

The 10.2.0.5 patchset has also back-ported the same direct read feature which is new to 11g although I don’t know if the rules are the same as 11g. The site where I work makes significant use of KEEP pools and also has spent some time investigating aspects relating to serial direct IO vs. buffer cache IO. 

 I want to use this blog entry to explore a number of related issues but also demonstrate that the 11g bug Jonathan identified seems to also exists in 10.2.0.5 patchset (and 11gR2). This blog item will cover:

  • Brief reference relating to the 11g “adaptive serial direct path read”
  • The 10.2.0.5 implementation and how to switch it on
  • 10.2.0.5 demonstration showing the relative difference for different types of IO vs read from cache
  • 10.2.0.5 demonstration which shows the KEEP pool bug also exists (but not by default)
  • Some real life comparison figures of disk reads via “direct path read” and via “buffer cache” to show why the ”adaptive serial direct path read” feature is worth exploring in more detail.   

 

(more…)

17/06/2010

My 10046 SQL Trace Formatter

Filed under: Monitoring, trace — Nigel Noble @ 10:52 pm

trace_by_hash.awk

A number of years ago, I wrote a SQL Trace File formatter when I needed to process a large amount of trace files for a complex project we were working on. The formatter is not intended to replace the really good tools that are out there, but I like reading the detail which appears in a raw trace file (but with some additional help). I also wanted to structure the trace file so it could be processed by other scripts separately. This is by no means written to a commercial standard, but I thought people may find it useful and also provide interesting insite on how to process trace files.  I worked with James Morle in a previous company to build a benchmark based on Oracle trace files using some software he had written in his book (Scaling Oracle 8i, the  paper Brewing Benchmarks and the later software simora ). The benchmark software contained an awk script to process traces files and extract the entry point database calls and convert them to a tcl scripting language to then drive the benchmark. I took (with permission) the conversion script and modified it so it generated a Oracle trace file but with extra information.

(more…)

02/06/2010

Online Index Rebuilds

Filed under: Uncategorized — Nigel Noble @ 12:24 pm

I’ve just seen a note on Jonathan Lewis’s blog regarding Online Index Rebuilds. It reminds me of some issues which existed in Oracle 9i and 10g but appear to have been resolved in 11gR1 and 11gR2. Oracle 9i introduced a patch to change behaviour regarding online Index Rebuilds. The default behaviour in 9i and 10g is that an Online Index Rebuild would get blocked behind a long active transaction which uses the index (which is still true in 11g) but critically then would also block any new DML  wanting to also modify the index (Leading to a hang of the application as well as the index build). They introduced a new database EVENT 10629 (in a 9i patch) which would mean the Online Index Rebuild would keep trying to acquire its locks but would keep backing off to allow other DML to continue. The Event would be set with a level to either try forever (but don’t block other new DML) or fail the Online Index Rebuild after a specific time period (well retries).

Out of interest, you can sometimes look up the text for Oracle Events using the “oerr” command .  

$ oerr ora 10629
10629, 00000, "force online index build to backoff and retry DML lock upgrade"
// *Cause:
// *Action: set this event only under the supervision of Oracle development
// *Comment: Change the behaviour of an online index rebuild such that it
// will backoff and retry a failed DML lock upgrade.
// The event level is the number of retries the online index rebuild
// should wait. Level 1 means backoff and retry indefinitely. Any
//           other value less than 32 will be adjusted automatically to be 32.

 
There is more information on Meta link (note: 3566511.8). The comments of “This issue is fixed in” mean the new Event feature is included in the release (you still need to set it).

I’ve just had a very quick test of the Event on 11gR1/R2 but it’s not clear if the Event still works. The very important thing (to me) is the 11g versions no longer cause other unrelated DML to become stuck behind a long running active transaction.

17/05/2010

Monitoring Connection Pools

Filed under: Monitoring, Performance — Nigel Noble @ 4:19 pm

The company where I work run a large web infrastructure with many different Java based applications and servers. Most of these application servers connect to the database using a connection pool to manage database connections and reduce the cost of starting/destroying database sessions. Over the years we have spent a lot of time trying to get the right balance to keep session usage as smooth as possible.

  • maximum connections in the pool set too low? - Can lead to the requests queuing to get a connection or running out all together during peak spikes (leading to  application failure).
  • maximum connections set too high? - Can lead to “logon storms”. A sudden surge in concurrent user activity can cause a huge demand on new sessions to be created on the database server. Because the database is slowed by the surge, you often get a “feed back loop” effect. Slow database response means even more connection requests. “Connection Storms” can also be caused by small problems which start in the database or network (unexpected slow network pause or database wait event…. then a sudden logon storm hits the database host as increasing connections slow the database even more (seen this lots in the past)
  • minimum connections in the pool set too low? – Can lead to the added cost of always having to restart new database sessions just at the critical time you need them.
  • idle out time set too low? - Once you have taken the hit of creating a new session in the database, you let this session reach the idle time too soon and the session is destroyed… only to be needed again a few minutes later.
  • idle out time set too high? Lets say you have suffered a minor “logon storm”, if the idle time set too high you could have all these extra sessions hanging around for a very long time (There have been a number of good presentations by the Oracle Real World Performance group talking about benefits of reducing connections on a database server)
  • minimum and maximum the same value? There is a lot to be said for running with a fixed number of sessions supporting your average usage and your peaks (The trick is finding the correct number to support peaks)

A very simple session monitoring script

Before I actually talk about the script itself, I thought I would give  an example of why I wrote it in the first place and how it was used, then I will show the script and why we still use it today.

The problem

When I first joined my company, I could never get my head around the number of sessions on the database compared to the number of “ACTIVE” sessions seen in v$session. We seemed to have many more connections from the application servers then sessions doing work. Every once in a while the company would review the peak connection pool settings (per box) and adjust them. The kind of conversations we would have around the office were:

 ”We are using the maximum 30 connections per box, let’s make it 35 per box for growth”

The next year we would say

“We are using all the 35 per box, let’s make it 40 per box…..  for growth”

All the time I kept thinking maybe v$session status=”ACTIVE” was mis-leading, or maybe a problem with the connection pools (We had 30 minute idle out time but we always ran at the maximum connections but very few were actually seen to be used.

Back to the script

client_info.sql

 

set trimspool on
set pagesize 1000
set linesize 190
column since format a20
column logon format a20
column event format a25 truncate
column machine format a10 truncate
column status format a10 truncate
column username format a10 truncate
column n format 9999.99

break on machine on report  skip 1

compute sum of sess_count on machine
compute sum of sess_count on report

set time on
spool client.txt append

select 'CFROM' tag,
  to_char(sysdate,'hh24:mi:ss') when,
        machine,
        event,
        seconds_in_wait,
        sql_id,
        prev_sql_id,
        count(*) sess_count,
  to_char(sysdate - (SECONDS_IN_WAIT / (60 * 60 * 24)),'dd-mon-yy hh24:mi:ss') since,
--      next two lines useful if trying to predict a concurrent spike.   1200 being 20 minutes
--      Left over from site specific issue but could be useful.
--      next is last active time + 20 minutes
--      n is a count down to next predicted spike
-- to_char(sysdate - ( (seconds_in_wait ) / (60 * 60 * 24)) + (1200 / (60 * 60 * 24)),'dd-mon-yy hh24:mi:ss') NEXT ,
-- ((sysdate - (sysdate - ( (seconds_in_wait ) / (60 * 60 * 24)) + (1200 / (60 * 60 * 24)) ) ) * (60 * 60 * 24) ) / 60 n,
        username,
        status,
        state
from v$session
group by machine,
         event,
         seconds_in_wait,
         sql_id,
         prev_sql_id,
         username,
         status,
         state
order by machine,
         username,
         event,
         seconds_in_wait,
         sql_id,
         prev_sql_id,
         status,
         state
/

One day I wrote the above script and all became very clear. Something sent  a very fast (150 micro seconds) sql statement concurrently from each application server to every session in the connection pool. The request rate could easily have been serviced by a handful of sessions, but because they came concurrently at the exact same time every session was used. 

   

note: This output has been faked, Not able to find real example of the original issue we had (few years back)
note: All host names and user names are also faked...... so I can keep my job!
Columns            Description
-----------------  --------------------------
TAG                tag used so can grep info from log file
WHEN current system time when data collected
MACHINE            host where the session originates
EVENT              Current database event
SECONDS_IN_WAIT    Seconds in Wait
SQL_ID             Current SQL statement
PREV_SQL_ID        Previous SQL statement last run on the session
SESS_COUNT         Total number of sessions within group (sql_id,event,seconds in wait etc)
SINCE What time the session has been idle since (sysdate - seconds wait)
USERNAME           User name of sessions
STATUS             Session status (Active or Inactive)
STATE              Session Wait State

TAG   WHEN     MACHINE    EVENT                     SECONDS_IN_WAIT SQL_ID        PREV_SQL_ID   SESS_COUNT SINCE                USERNAME   STATUS     STATE
----- -------- ---------- ------------------------- --------------- ------------- ------------- ---------- -------------------- ---------- ---------- -------------------
CFROM 13:55:17 abcabc01.i SQL*Net message from clie               0 2j7pff3tfuzzz 2j7pff3tfuzzz          1 14-may-10 13:55:17   WEBUSERABC INACTIVE   WAITING
CFROM 13:55:17            SQL*Net message from clie               0               qw4bv6jwup5ab          1 14-may-10 13:55:17   WEBUSERABC INACTIVE   WAITING
CFROM 13:55:17            SQL*Net message from clie               0               5tarshstnypzv          1 14-may-10 13:55:17   WEBUSERABC INACTIVE   WAITING
CFROM 13:55:17            SQL*Net message from clie               1               8zv7177vuc8dt          3 14-may-10 13:55:16   WEBUSERABC INACTIVE   WAITING
CFROM 13:55:17            SQL*Net message from clie               5               4616pfpak8akh          1 14-may-10 13:55:12   WEBUSERABC INACTIVE   WAITING
CFROM 13:55:17            SQL*Net message from clie              22               amsmuu1pp1w74          1 14-may-10 13:54:55   WEBUSERABC INACTIVE   WAITING
CFROM 13:55:17            SQL*Net message from clie             185               3q4bv6jx8wup5         30 14-may-10 13:52:12   WEBUSERABC INACTIVE   WAITING
CFROM 13:55:17            SQL*Net message from clie             433               f3tg1gz4zdadm          1 14-may-10 13:48:04   WEBUSERABC INACTIVE   WAITING
CFROM 13:55:17            SQL*Net message from clie             873               cmh3vh4pjs7q7          1 14-may-10 13:40:44   WEBUSERABC INACTIVE   WAITING
               **********                                                                       ----------
               sum                                                                                      40
CFROM 13:55:17 abcabc02.i SQL*Net message from clie               0 zzz0ahx447fpr zzz0ahx447fpr          1 14-may-10 13:55:17   WEBUSERABC INACTIVE   WAITING
CFROM 13:55:17            SQL*Net message from clie               0               3q4bv6jx8wup5          1 14-may-10 13:55:17   WEBUSERABC INACTIVE   WAITING
CFROM 13:55:17            SQL*Net message from clie               0               5tarshstnypzv          1 14-may-10 13:55:17   WEBUSERABC INACTIVE   WAITING
CFROM 13:55:17            SQL*Net message from clie               0               f3tg1gz4zdadm          1 14-may-10 13:55:17   WEBUSERABC INACTIVE   WAITING
CFROM 13:55:17            SQL*Net message from clie               6               cmh3vh4pjs7q7          1 14-may-10 13:55:11   WEBUSERABC INACTIVE   WAITING
CFROM 13:55:17            SQL*Net message from clie               8               8zv7177vuc8dt          1 14-may-10 13:55:09   WEBUSERABC INACTIVE   WAITING
CFROM 13:55:17            SQL*Net message from clie              11               cmh3vh4pjs7q7          1 14-may-10 13:55:06   WEBUSERABC INACTIVE   WAITING
CFROM 13:55:17            SQL*Net message from clie              12               f3tg1gz4zdadm          1 14-may-10 13:55:05   WEBUSERABC INACTIVE   WAITING
CFROM 13:55:17            SQL*Net message from clie              80               8zv7177vuc8dt          1 14-may-10 13:53:57   WEBUSERABC INACTIVE   WAITING
CFROM 13:55:17            SQL*Net message from clie              80               f3tg1gz4zdadm          1 14-may-10 13:53:57   WEBUSERABC INACTIVE   WAITING
CFROM 13:55:17            SQL*Net message from clie             138               3q4bv6jx8wup5         28 14-may-10 13:52:59   WEBUSERABC INACTIVE   WAITING
CFROM 13:55:17            SQL*Net message from clie             695               cmh3vh4pjs7q7          1 14-may-10 13:43:42   WEBUSERABC INACTIVE   WAITING
CFROM 13:55:17            SQL*Net message from clie            1394               cmh3vh4pjs7q7          1 14-may-10 13:32:03   WEBUSERABC INACTIVE   WAITING
               **********                                                                       ----------
               sum                                                                                      40
CFROM 13:55:17 abcabc03.i SQL*Net message from clie               0 zzz0ahx447fpr zzz0ahx447fpr          1 14-may-10 13:55:17   WEBUSERABC INACTIVE   WAITING
CFROM 13:55:17            SQL*Net message from clie               0               5tarshstnypzv          1 14-may-10 13:55:17   WEBUSERABC INACTIVE   WAITING
CFROM 13:55:17            SQL*Net message from clie               0               cmh3vh4pjs7q7          1 14-may-10 13:55:17   WEBUSERABC INACTIVE   WAITING
CFROM 13:55:17            SQL*Net message from clie               2               5tarshstnypzv          1 14-may-10 13:55:15   WEBUSERABC INACTIVE   WAITING
CFROM 13:55:17            SQL*Net message from clie               7               8zv7177vuc8dt          2 14-may-10 13:55:10   WEBUSERABC INACTIVE   WAITING
CFROM 13:55:17            SQL*Net message from clie              14               3q4bv6jx8wup5          1 14-may-10 13:55:03   WEBUSERABC INACTIVE   WAITING
CFROM 13:55:17            SQL*Net message from clie              14               4616pfpak8akh          1 14-may-10 13:55:03   WEBUSERABC INACTIVE   WAITING
CFROM 13:55:17            SQL*Net message from clie              61               8zv7177vuc8dt          1 14-may-10 13:54:16   WEBUSERABC INACTIVE   WAITING
CFROM 13:55:17            SQL*Net message from clie             159               3q4bv6jx8wup5         31 14-may-10 13:52:38   WEBUSERABC INACTIVE   WAITING
               **********                                                                       ----------
               sum                                                                                      40
CFROM 13:55:17 abcabc04.i SQL*Net message from clie               0               3q4bv6jx8wup5         39 14-may-10 13:55:17   WEBUSERABC INACTIVE   WAITING
CFROM 13:55:17            SQL*Net message from clie               0               5tarshstnypzv          1 14-may-10 13:55:17   WEBUSERABC INACTIVE   WAITING
               **********                                                                       ----------
               sum                                                                                      40
CFROM 13:55:17 abcabc05.i SQL*Net message from clie               0 zzz0ahx447fpr zzz0ahx447fpr          1 14-may-10 13:55:17   WEBUSERABC INACTIVE   WAITING
CFROM 13:55:17            SQL*Net message from clie               0               33hsynd62ka4k          1 14-may-10 13:55:17   WEBUSERABC INACTIVE   WAITING
CFROM 13:55:17            SQL*Net message from clie               0               cmh3vh4pjs7q7          1 14-may-10 13:55:17   WEBUSERABC INACTIVE   WAITING
CFROM 13:55:17            SQL*Net message from clie               1 zzz0ahx447fpr zzz0ahx447fpr          2 14-may-10 13:55:16   WEBUSERABC INACTIVE   WAITING
CFROM 13:55:17            SQL*Net message from clie              10               cmh3vh4pjs7q7          1 14-may-10 13:55:07   WEBUSERABC INACTIVE   WAITING
CFROM 13:55:17            SQL*Net message from clie              31               8zv7177vuc8dt          1 14-may-10 13:54:46   WEBUSERABC INACTIVE   WAITING
CFROM 13:55:17            SQL*Net message from clie             157               amsmuu1pp1w74          1 14-may-10 13:52:40   WEBUSERABC INACTIVE   WAITING
CFROM 13:55:17            SQL*Net message from clie             318               3q4bv6jx8wup5         29 14-may-10 13:49:59   WEBUSERABC INACTIVE   WAITING
CFROM 13:55:17            SQL*Net message from clie             874               cmh3vh4pjs7q7          2 14-may-10 13:40:43   WEBUSERABC INACTIVE   WAITING
CFROM 13:55:17            SQL*Net message from clie            1040               8zv7177vuc8dt          1 14-may-10 13:37:57   WEBUSERABC INACTIVE   WAITING
               **********                                                                       ----------
               sum                                                                                      40

Whenever I ran the script, I could see that we had spikes of sessions running the same sql statement in the same second on any given host. Because the statement groups by “seconds_in_wait” we might see that say 25 sessions all were active 8 minutes ago with the same statement and have since done no work (waiting on “SQL*Net message from client”). When we looked at all the application hosts, each would spike at a 20 minute interval (although each host had its own time at which it occurred).

My company use a number of different application caching techniques in our middle-tier application, one of these was a concurrent “read- ahead” cache, this would allow a read from cache to detect its data was soon to expire and asynchronously request a load of the caches in parallel. Our “read-ahead” code had been set to allow 50 threads to get data, but we only had 40 connections in the database connection pool (each box). We had a 30 minute keep alive on the connections, but every 20 minutes exactly, we would touch every connection and keep them alive. The solution was to reduce the number of threads on the “read-ahead” and not even change the connection pool configuration. This fix automatically reduced the number of connections on the database by around 1000.

There have been so many examples where the script has helped, we leave it collecting every 5 minutes to a log file on the more sensitive databases. It can help identify cases where:

  • Sudden concurrent requests by one or more severs or applications. We can get clues in what was the last sql statement run, who by and when
  • Statements taking a long time on specific hosts (Look for “ACTIVE” sessions for long time on same sql)
  • Idle connections which never get returned to the connection pool correctly within the application hosts
  • user behaviour driving sudden spikes in concurrent requests from specific hosts
  • sessions which are getting stuck on database waits (eg db links due to networking issues, row locking etc) and not returning to the pool
  • What time an application host last sent any requests to the database

The script attempts to provide a fast summary of the connections on the database, how long they have been idle, what they last did, which current wait events are used.  Although the script is very simple, I have found it a really good way to summarise what our hosts (using connection pools)  are doing.

11/05/2010

11.1.0.7 poor plsql array performance

Filed under: 11gR1, Performance — Nigel Noble @ 2:05 pm

The PLSQL application at my site is dependent on the usage of very large PLSQL arrays (several GB in size). During the testing on 11.1.0.7 we found a severe performance issue when the application first loaded the plsql arrays with data. The application used to load the array data in tens of seconds but was now taking 6 or 7 minutes to do the same work. Further investigation showed that this problem did not exist on any other version of Oracle (We tested 10.2.0.3, 10.2.0.4, 11.1.0.6 and 11.2.0.1).

A colleague wrote a test case to show the issue for Oracle support so they could identify a fix and we could request a patch:

Source of the script:
Package Header (has no body!)

CREATE OR REPLACE PACKAGE tst_pkg_array IS

   TYPE typ_rec_1 IS RECORD(
       attr1  NUMBER(12)
      ,attr2  NUMBER(12)
      ,attr3  NUMBER(12)
      ,attr4  NUMBER(12)
      ,attr5  NUMBER(12)
      ,attr6  VARCHAR2(100)
      ,attr7  VARCHAR2(100)
      ,attr8  VARCHAR2(100)
      ,attr9  VARCHAR2(100)
      ,attr10 VARCHAR2(100)
      ,attr11 DATE
      ,attr12 DATE
      ,attr13 DATE
      ,attr14 DATE
      ,attr15 DATE
      ,attr16 NUMBER(12)
      ,attr17 NUMBER(12)
      ,attr18 NUMBER(12)
      ,attr19 NUMBER(12)
      ,attr20 NUMBER(12)
      ,attr21 VARCHAR2(100)
      ,attr22 VARCHAR2(100)
      ,attr23 VARCHAR2(100)
      ,attr24 VARCHAR2(100)
      ,attr25 VARCHAR2(100));

   TYPE typ_tab_1 IS TABLE OF typ_rec_1 INDEX BY PLS_INTEGER;

   tab_test_simple typ_tab_1;

END tst_pkg_array;
/

Test script to generate timings

set timing on

spool run.log append

set time on

select * from v$version;

set serveroutput on size 100000

DECLARE

   --
   PROCEDURE pr_put_elements_into_array(i_num_elements IN NUMBER) IS

      l_rec_test tst_pkg_array.typ_rec_1;

      tab_test_local tst_pkg_array.typ_tab_1;

   BEGIN
      --
      -- start by clearing down
      tst_pkg_array.tab_test_simple.delete;
      tst_pkg_array.tab_test_simple := tab_test_local;
      --
      l_rec_test.attr1  := 10000000;
      l_rec_test.attr2  := 10000000;
      l_rec_test.attr3  := 10000000;
      l_rec_test.attr4  := 10000000;
      l_rec_test.attr5  := 10000000;
      l_rec_test.attr6  := 'ABCDEFGH';
      l_rec_test.attr7  := 'ABCDEFGH';
      l_rec_test.attr8  := 'ABCDEFGH';
      l_rec_test.attr9  := 'ABCDEFGH';
      l_rec_test.attr10 := 'ABCDEFGH';
      l_rec_test.attr11 := SYSDATE;
      l_rec_test.attr12 := SYSDATE;
      l_rec_test.attr13 := SYSDATE;
      l_rec_test.attr14 := SYSDATE;
      l_rec_test.attr15 := SYSDATE;
      l_rec_test.attr16 := 10000000;
      l_rec_test.attr17 := 10000000;
      l_rec_test.attr18 := 10000000;
      l_rec_test.attr19 := 10000000;
      l_rec_test.attr20 := 10000000;
      l_rec_test.attr21 := 'ABCDEFGH';
      l_rec_test.attr22 := 'ABCDEFGH';
      l_rec_test.attr23 := 'ABCDEFGH';
      l_rec_test.attr24 := 'ABCDEFGH';
      l_rec_test.attr25 := 'ABCDEFGH';
      --
      FOR n IN 1 .. i_num_elements LOOP
         tst_pkg_array.tab_test_simple(n) := l_rec_test;
--       Using the line below instead of the one above (ie using a local variable rather than a package state variable) resolves the issue on 11.1.0.7
--       To test that, comment out the line above, and "comment in" the line below
--         tab_test_local(n) := l_rec_test;
      END LOOP;

   END pr_put_elements_into_array;

   PROCEDURE pr_run_test(i_num_elements IN NUMBER) IS
      l_duration_secs NUMBER;
      l_start_time    NUMBER;

   BEGIN

      l_start_time := DBMS_UTILITY.get_time;
      pr_put_elements_into_array(i_num_elements => i_num_elements);
      l_duration_secs := (DBMS_UTILITY.get_time - l_start_time) / 100;
      dbms_output.put_line(i_num_elements || ' elements took ' || (l_duration_secs) || ' secs or ' ||
                           (l_duration_secs / (i_num_elements / 1000)) || ' secs per thousand');
   END pr_run_test;
BEGIN
   pr_run_test(i_num_elements => 1000);
   pr_run_test(i_num_elements => 10000);
   pr_run_test(i_num_elements => 100000);
   pr_run_test(i_num_elements => 200000);
   pr_run_test(i_num_elements => 300000);
END;
/

exit

Output on 11.1.0.7

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production
PL/SQL Release 11.1.0.7.0 - Production
CORE    11.1.0.7.0      Production
TNS for Linux: Version 11.1.0.7.0 - Production
NLSRTL Version 11.1.0.7.0 - Production

Elapsed: 00:00:00.40
1000 elements took .01 secs or .01 secs per thousand
10000 elements took .09 secs or .009 secs per thousand
100000 elements took 6.1 secs or .061 secs per thousand
200000 elements took 21.65 secs or .10825 secs per thousand
300000 elements took 46.55 secs or .1551666666666666666666666666666666666667
secs per thousand

PL/SQL procedure successfully completed.

Elapsed: 00:01:14.40

The above example shows the script took over 1 minute to complete and you can see the result degreades exponentially (per 1000 requests) as more entries are added to the array.

Output on 10.2.0.4


BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for Linux: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production

Elapsed: 00:00:00.02
1000 elements took .01 secs or .01 secs per thousand
10000 elements took .05 secs or .005 secs per thousand
100000 elements took .51 secs or .0051 secs per thousand
200000 elements took 1.02 secs or .0051 secs per thousand
300000 elements took 1.17 secs or .0039 secs per thousand

PL/SQL procedure successfully completed.

Elapsed: 00:00:02.89

Patch details:

Patch 7671793: EXCESSIVE MEMORY USAGE WHEN USING KGHU
patch is avaiable for Linux x86-64)

Output once patch was applied:

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production
PL/SQL Release 11.1.0.7.0 - Production
CORE    11.1.0.7.0      Production
TNS for Linux: Version 11.1.0.7.0 - Production
NLSRTL Version 11.1.0.7.0 - Production

Elapsed: 00:00:00.06
1000 elements took 0 secs or 0 secs per thousand
10000 elements took .05 secs or .005 secs per thousand
100000 elements took .55 secs or .0055 secs per thousand
200000 elements took .88 secs or .0044 secs per thousand
300000 elements took 1.25 secs or .004166666666666666666666666666666666666667
secs per thousand

PL/SQL procedure successfully completed.

Elapsed: 00:00:02.84

You can now see with the patch applied, the above test takes less than 3 seconds to complete.

Although our application uses plsql arrays in a very extreme way, it may be worth reviewing this patch if your application has some unexplained slow down and is using large plsql arrays AND 11.1.0.7.

I have tested this issue on 11.1.0.7.3 (latest PSU patch 9352179) and the problem still exists. It’s not clear if this is a generic bug or one specific to our platform (Linux x86-64). Anoyingly I have found that the available patch ( 7671793) clashes with all the avaiable  11.1.0.7 psu patches, so a merge patch would be needed or you could try a non 11.1.0.7 version of Oracle.

10/05/2010

11gR2 “alter user enable editions”… What’s going on?

Filed under: 11gR2, Editions, trace — Nigel Noble @ 1:54 pm

Ok, Let me start on my very first blog entry. Firstly I should say I am totally new to blogging so it may take a while before I settle on a format and style.

I want to try and combine two subjects in this entry:

  • How you can try and investigate Oracle commands to see what they do, where Oracle stores it’s dictionary data and how you can get more information about an Oracle error.
  • Trying to be topical, I thought I would use the example of enabling the Oracle 11gR2 Editions feature on a complex schema to allow multi versioning of code and hot releases.

One of the first issues you may find when you try and enable the Oracle 11gr2 Editions feature on a complex schema is the following error:

SQL> alter user test_codea enable editions;
alter user test_codea enable editions
*
ERROR at line 1:
ORA-38819: user TEST_CODEA owns one or more objects whose type is editionable
and that have noneditioned dependent objects

This is a very simple test script to cause the error (run on 11.2.0.1.1)

set echo on

rem set up two demo users called test_codea and test_codeb
connect / as sysdba
drop user test_codea cascade;
drop user test_codeb cascade;

create user test_codea identified by test_codea;
create user test_codeb identified by test_codeb;
grant create session,resource to test_codea;
grant create session,resource to test_codeb;

connect test_codea/test_codea

create or replace package ca as
    procedure a;
end;
/
create or replace package body ca as
    procedure a is
    begin
         null;
    end;
end;
/
grant execute on ca to test_codeb;
connect test_codeb/test_codeb

create or replace package cb as
    procedure b;
end;
/
create or replace package body cb as
    procedure b is
    begin
         test_codea.ca.a;    -- This procedure makes a call back to the procedure test_codeA
    end;
end;
/

rem confirm the code works
execute cb.b;

connect / as sysdba

rem now try and enable the 11gR2 Editions feature on the test_codeA user

alter user test_codea enable editions;

So what does the error mean and what can you do about it:

ORA-38819: user TEST_CODEA owns one or more objects whose type is editionable
and that have noneditioned dependent objects

Well, Oracle is kindly telling you that one or more objects in the schema is dependent on something which is not enabled for Editions, or another non Editioned schema will depend on your objects….. OR on objects which can never support Editions feature.

Well, kind Oracle gives as a nice work around to the issue…… or do they:

SQL> alter user test_codea enable editions force;

User altered.

SQL> connect test_codeb/test_codeb
Connected.
SQL> execute cb.b;
BEGIN cb.b; END;

*
ERROR at line 1:
ORA-04063: package body "TEST_CODEB.CB" has errors
ORA-06508: PL/SQL: could not find program unit being called: "TEST_CODEB.CB"
ORA-06512: at line 1

 

Oracle allows us to FORCE a user to be enabled for Editions however this can be dangerous because it can really break a complete schema in a very big way.

In the case of my simple example, there is a very simple solution, just enable Editions on the test_userB first so that when we enable Editions on test_userA both can now be versioned under the same Edition.


SQL> alter user test_codea enable editions;
alter user test_codea enable editions
*
ERROR at line 1:
ORA-38819: user TEST_CODEA owns one or more objects whose type is editionable
and that have noneditioned dependent objects
SQL>
SQL>
SQL>
SQL> alter user test_codeb enable editions;

User altered.

SQL> alter user test_codea enable editions;

User altered.

SQL>  connect test_codeb/test_codeb
Connected.
SQL> execute cb.b;

PL/SQL procedure successfully completed.

 

BUT, the above example is very simple case, what happens when you have dependency on objects which cannot be Editioned (ever!). There are a number of cases where objects can not be editioned and these objects must be moved out of the current schema to a non editioned schema. Examples of these objects are things like “user defined indexes” which include a user written plsql function, restrictions exist for the objects relating to Advanced Queues.

So, back to the other point of this post, how can we find out which objects are causing us problems so that we can fix or move them to another schema. Back to the good old 10046 trace. Lets trace the command and find out how Oracle is deciding which objects are invalid and need fixing before Editions can be enabled.


SQL> alter session set events '10046 trace name context forever, level 1';

Session altered.

SQL>
SQL> alter user test_codea enable editions;
alter user test_codea enable editions
*
ERROR at line 1:
ORA-38819: user TEST_CODEA owns one or more objects whose type is editionable
and that have noneditioned dependent objects

 

So if we look at the trace file we find the following internal Oracle sql statement along with a bunch of other stuff:


PARSING IN CURSOR #2 len=487 dep=1 uid=0 oct=3 lid=0 tim=1273060065216252 hv=305533609 ad='271fb5cf8' sqlid='btgrjvh93c4p9'
select d.owner#, u.name, d.name, d.namespace, d.stime    from obj$ d, dependency$ dep, obj$ p, user$ u    where d.obj# = dep.d_obj# and p.obj# = dep.p_obj#      and d.remoteowner is null and p.owner# = :1      and d.owner# = u.user#      and p.type# in (4,5,7,8,9,10,11,12,13,14,22,87)      and ((u.type#             != 2        and            bitand(u.spare1, 16) = 0        and            u.user#             != p.owner#) or           (d.type# not in (4,5,7,8,9,10,11,12,13,14,22,87)))
END OF STMT
PARSE #2:c=1000,e=758,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1273060065216251
EXEC #2:c=10999,e=10924,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=1111485757,tim=1273060065227267
FETCH #2:c=35994,e=37823,p=0,cr=872,cu=0,mis=0,r=1,dep=1,og=4,plh=1111485757,tim=1273060065265189

In the above trace output, we can see the SQL statement which Oracle runs to test for invalid objects which prevent the user Editions feature being enabled (In fact in my simple case, I can see “r=1″ on the FETCH meaning my simple case found 1 row matching invalid objects.

So, I can take that SQL statement and run it to find all objects which would cause problems enabling Editions and can fixed the problems in advance of trying to enable the editions feature:

Cleaned up version of the sql statement and output (Note: avoid running statements you extract this way on production systems unless you are really sure they will not have any adverse effects!)

connect / as sysdba
set echo on

rem find the user_id for the schema

select user_id from sys.dba_users where username = '&USERNAME'
/

set linesize 132

rem provide the user_id to the following sql statement

select d.owner#,
       u.name,
       d.name,
       d.namespace,
       d.stime
from   obj$ d,
       dependency$ dep,
       obj$ p, user$ u
where d.obj# = dep.d_obj#
and   p.obj# = dep.p_obj#
and   d.remoteowner is null
and   p.owner# = &user_id
and   d.owner# = u.user#
and p.type# in (4,5,7,8,9,10,11,12,13,14,22,87)
and ((u.type#             != 2
     and            bitand(u.spare1, 16) = 0
     and            u.user#             != p.owner#)
   or
     (d.type# not in (4,5,7,8,9,10,11,12,13,14,22,87)
     )
    )
/

Output from the script

    OWNER# NAME                           NAME                            NAMESPACE STIME
---------- ------------------------------ ------------------------------ ---------- ---------
       258 TEST_CODEB                     CB                                      2 05-MAY-10

When we ran this on one of our complex databases, we had 35 objects listed in just one schema which were preventing us enabling Editions (a test system I must point out… we are not ready for live yet). Most were the simple case of needing to also enable editions on other dependent schemas, but we also had problems with user written functional indexes, many of our Advanced Queues. The AQ and functional indexed had to be moved out to non editionable schemas.

summary

I hope I have shown the value of tracing Oracle commands, how you  might be able to use the information. I also hope you can see you need to be careful using “alter user x enable editions force” and the internal sql statement can help identify objects which cause issues.

04/05/2010

Hello world!

Filed under: Uncategorized — Nigel Noble @ 12:19 pm

Welcome to WordPress.com. This is your first post. Edit or delete it and start blogging!

Theme: Silver is the New Black. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.