Nigel Noble's Oracle Blog

08/01/2015

“log file sync” and the MTTR Advisor

Filed under: 11gR2, Performance — Nigel Noble @ 6:12 pm

I recently investigated a performance problem on an Oracle 11.2 OLTP trading system and although we still don’t fully understand the issue (or which versions of Oracle it affects), I thought I would share what we found and how we found it. We had a hardware failure on the database server; within 30 seconds the database had automatically been restarted on an idle, identical member of the cluster and the application continued on the new database host. A few days later I happened to notice a change in the LGWR trace file: the Log Writer trace was showing more “kcrfw_update_adaptive_sync_mode” messages than normal.

(more…)

12/07/2013

“enq: TX – contention” on SELECT with a large buffer cache and 2PC

Filed under: 11gR2 — Nigel Noble @ 12:08 pm

Note: The following is based on testing with 11.2.0.3 (I believe the same issue exists in other Oracle versions).

I recently worked on an interesting problem relating to the “enq: TX – contention” wait event. There are a number of reasons for the wait, but the most common reason I have come across at my site is best described in a forum posting I found by Jonathan Lewis... “select can wait for another transaction if that transaction is between the “prepare” and “commit” of a two-phase commit.”

My site has a number of databases in the UK and Australia. We have often come across the problem when a SELECT statement tries to read a row which is involved in a distributed transaction to Australia. The problem is the round-trip latency to Australia: between the PREPARE and COMMIT phases of the two-phase commit there can be a 200ms – 300ms window, and anyone trying to read the rows involved in the transaction can see a matching pause on “enq: TX – contention” waits. There are a number of tricks you can use to reduce the problem by separating the rows the SELECT statement reads (the UK data) from the rows involved in the Australian transaction. The very fact that you visit the same block and evaluate the 2PC row can cause the wait (just reading a row to find it is not actually the one you needed is enough). We use tricks like careful use of indexes to ensure the reader can go directly to the UK data and never evaluate the row involved in the Australian 2PC; partitioning can also help here.
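As a sketch of the indexing trick (the table, column and index names here are invented for illustration, not taken from our real schema), the idea is to give the UK reader an access path that never has to evaluate rows belonging to the in-flight Australian transaction:

```sql
-- Hypothetical example: a leading "region" column means a UK range scan
-- never visits the AU rows that may be mid-2PC, so it never queues on
-- "enq: TX - contention" for them.
create index trades_region_id_idx on trades (region, trade_id);

select /*+ index(t trades_region_id_idx) */
       t.trade_id, t.status
from   trades t
where  t.region   = 'UK'
and    t.trade_id = :trade_id;
```

The same separation can be achieved by partitioning on the region column, so UK reads never touch the Australian partition at all.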

A new variation of “enq: TX – contention”

(more…)

28/03/2013

11.2.0.3 v$sqlstats.last_active_time stops changing and breaks AWR

Filed under: 11gR2, Monitoring — Nigel Noble @ 3:18 pm

My site uses a 3rd party SQL monitoring tool which collects data based on the Oracle view v$sqlstats. The tool collects data for all SQL statements which have been executed since the previous collection, using the last_active_time column. A few months ago we noticed (after an upgrade to 11g) that we would occasionally be missing some SQL statements from the monitor graphs. We investigated the problem, reproduced a test case, raised a bug with Oracle, and Support has just released an 11.2 patch.
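I don’t know the 3rd party tool’s exact SQL, but a delta collection based on last_active_time would look something like this (the bind variable name is mine):

```sql
-- Sketch of a last_active_time based collection: pick up every statement
-- executed since the previous collection ran. If last_active_time stops
-- changing for a statement, that statement silently drops off the graphs.
select sql_id, plan_hash_value, executions, elapsed_time, last_active_time
from   v$sqlstats
where  last_active_time > :last_collection_time;
```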

Before I explain and demonstrate the issue, I will explain what prompted me to post this blog item. Until now I always thought this only affected v$sqlstats (and my 3rd party monitoring tool) and not AWR. If you ever trace (10046) an AWR collection you will not find any references to v$sqlstats, so even if the data is missing from the 3rd party tool, “I can always rely on AWR”… or so I thought!

Well, I recently discovered that any time a SQL statement drops off the 3rd party monitoring tool, IT STOPS BEING MONITORED IN AWR’s DBA_HIST_SQLSTAT table as well. We have our AWR collection threshold set to collect as many SQL statements as possible. This problem has nothing to do with the cost of the SQL statements, and you could well find your most expensive SQL statement simply disappear from AWR for a period of time.
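One way to see the gap for yourself is to list the AWR snapshots in which a given statement was captured; missing snap_ids show the periods where the statement “disappeared” (substitute the sql_id you are interested in):

```sql
-- List the snapshots in which this sql_id was captured by AWR.
-- Gaps in the snap_id sequence are the periods the statement vanished.
select s.snap_id, s.executions_delta, s.elapsed_time_delta
from   dba_hist_sqlstat s
where  s.sql_id = '&sql_id'
order  by s.snap_id;
```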

Let’s start at the very beginning….

(more…)

10/01/2013

11.2.0.3 Strange statistic, large transactions, dirty buffers and “direct path read”

Filed under: 11gR2, Performance — Nigel Noble @ 6:27 pm

Summary

I recently investigated an IO performance “spike” on a large 11.2.0.3 transactional system and I thought I would cover some interesting issues found. I am going to take the approach of detailing the observations made from our production and test systems and avoid attempting to cover how other versions of Oracle behave. The investigation also uncovers a confusing database statistic which we are currently discussing with Oracle Development so they can decide if this is an Oracle coding bug or a documentation issue.

The initial IO issue

We run a simple home-grown database monitor which watches database wait events and sends an email alert if it detects either a single session waiting on a non-idle event for a long time, or the total number of database sessions concurrently waiting going above a defined threshold. The monitor can give a number of false alerts but it can also draw our attention to some of the more interesting events.
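The real monitor is home-grown and I won’t reproduce it here, but the core check is no more than something like this (the 10-second threshold is illustrative, not our real setting):

```sql
-- Count sessions currently stuck in a non-idle wait for longer than a
-- threshold; the monitor alerts when this count (or any single session's
-- wait time) goes above its configured limits.
select count(*)
from   v$session s
where  s.wait_class <> 'Idle'
and    s.state = 'WAITING'
and    s.seconds_in_wait > 10;
```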

One day the monitor alerted me to two separate databases waiting on “log file sync” waits during exactly the same few seconds, affecting many hundreds of sessions. Both databases share the same storage array, so the obvious place to start was the storage statistics. We found a strange period lasting around 10 seconds during which both databases had a large increase in redo write service times, and a few seconds when no IO was written at all. The first database we looked at seemed to show an increase in disk service times for a very similar workload. The second showed a large increase in data file writes (500MB/sec) but without any matching increase in redo log writes. It seems the first database was slowed down by the second database flushing 5GB of data.
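To line the two databases up on the same timeline after the event, the AWR wait history is handy (assuming you are licensed for AWR; the values are cumulative since instance startup, so you compare deltas between consecutive snapshots):

```sql
-- Track "log file sync" by AWR snapshot to spot the spike window.
-- Run on each database and compare the same interval on both.
select sn.begin_interval_time, ev.total_waits, ev.time_waited_micro
from   dba_hist_system_event ev
join   dba_hist_snapshot sn
on     sn.snap_id         = ev.snap_id
and    sn.dbid            = ev.dbid
and    sn.instance_number = ev.instance_number
where  ev.event_name = 'log file sync'
order  by sn.begin_interval_time;
```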

Where did 5GB of data file writes come from, and what triggered it?

Looking at the database we knew there were no corresponding redo writes and no obvious large SQL statements reading or writing. We confirmed these writes were real and not something outside the database.

(more…)

23/07/2010

10gR2 – 11gR2, ASSM Space Management “Bug”

Filed under: 10gR2, 11gR1, 11gR2, assm — Nigel Noble @ 10:59 am

ASSM (Automatic Segment Space Management) has an issue when trying to re-use “deleted” space created by another session. A very specific set of circumstances must occur for this issue to show itself, but it can result in tables (and, I suspect, indexes) growing significantly larger than they need to be. I am aware that the problem exists in versions 10.2.0.3 through to the current 11gR2 inclusive, although I don’t know which Oracle release first introduced it.

The conditions required to cause the issue

My site has a number of daemon-style jobs running permanently on the database, loading data into a message table. The daemon is managed by dbms_job, is re-started when the database starts, and can run for many days, weeks (or in some cases months) before the job is ever stopped. We only need to keep the messages for a short time, so we have another daemon job whose role is to delete the messages from the table as soon as the expiry time is reached. In one example we only need to retain the data for a few minutes (after which time we no longer need it), and we also wanted to keep the table as small as possible so it remained cached in the buffer cache (helped by a KEEP pool). When we developed the code, we expected the message table to remain at a fairly constant 50 – 100MB. Instead, the table continued to grow at a consistent rate to many gigabytes until we stopped the test. If we copied the table (e.g. using “create table <new table> as select * from <current table>”), the new table would again be only 50MB, so most of the space in the original table was empty “deleted” space. The INSERT statements were never re-using the space freed by the DELETE statements (run in another session).
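The copy-and-compare check described above can be done in two statements (table names here are invented; our real message table is different):

```sql
-- Hypothetical check: copy the table, then compare segment sizes.
-- A large difference means the original is mostly empty "deleted" space
-- that the inserts were never re-using.
create table messages_copy as select * from messages;

select segment_name, bytes/1024/1024 as mb
from   user_segments
where  segment_name in ('MESSAGES', 'MESSAGES_COPY');
```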

(more…)

10/05/2010

11gR2 “alter user enable editions”… What’s going on?

Filed under: 11gR2, Editions, trace — Nigel Noble @ 1:54 pm

OK, let me start on my very first blog entry. I should say I am totally new to blogging, so it may take a while before I settle on a format and style.

I want to try and combine two subjects in this entry:

  • How you can investigate Oracle commands to see what they do, where Oracle stores its dictionary data, and how you can get more information about an Oracle error.
  • Trying to be topical, I thought I would use the example of enabling the Oracle 11gR2 Editions feature on a complex schema to allow multi-versioning of code and hot releases.

One of the first issues you may find when you try to enable the Oracle 11gR2 Editions feature on a complex schema is the following error:

SQL> alter user test_codea enable editions;
alter user test_codea enable editions
*
ERROR at line 1:
ORA-38819: user TEST_CODEA owns one or more objects whose type is editionable
and that have noneditioned dependent objects

This is a very simple test script to reproduce the error (run on 11.2.0.1.1):

set echo on

rem set up two demo users called test_codea and test_codeb
connect / as sysdba
drop user test_codea cascade;
drop user test_codeb cascade;

create user test_codea identified by test_codea;
create user test_codeb identified by test_codeb;
grant create session,resource to test_codea;
grant create session,resource to test_codeb;

connect test_codea/test_codea

create or replace package ca as
    procedure a;
end;
/
create or replace package body ca as
    procedure a is
    begin
         null;
    end;
end;
/
grant execute on ca to test_codeb;
connect test_codeb/test_codeb

create or replace package cb as
    procedure b;
end;
/
create or replace package body cb as
    procedure b is
    begin
         test_codea.ca.a;    -- this procedure calls back to the package owned by test_codea
    end;
end;
/

rem confirm the code works
execute cb.b;

connect / as sysdba

rem now try and enable the 11gR2 Editions feature on the test_codeA user

alter user test_codea enable editions;

So what does the error mean, and what can you do about it?

ORA-38819: user TEST_CODEA owns one or more objects whose type is editionable
and that have noneditioned dependent objects

Well, Oracle is telling you that one or more objects in the schema depend on something which is not enabled for Editions, or that another non-editioned schema depends on your objects… or that there are dependencies on objects which can never support the Editions feature.

Well, Oracle gives us what looks like a nice workaround to the issue… or does it:

SQL> alter user test_codea enable editions force;

User altered.

SQL> connect test_codeb/test_codeb
Connected.
SQL> execute cb.b;
BEGIN cb.b; END;

*
ERROR at line 1:
ORA-04063: package body "TEST_CODEB.CB" has errors
ORA-06508: PL/SQL: could not find program unit being called: "TEST_CODEB.CB"
ORA-06512: at line 1


Oracle allows us to FORCE a user to be enabled for Editions; however, this can be dangerous because, as the errors above show, it can leave a complete schema broken.

In the case of my simple example there is a very simple solution: enable Editions on test_codeb first, so that when we enable Editions on test_codea both can be versioned under the same Edition.


SQL> alter user test_codea enable editions;
alter user test_codea enable editions
*
ERROR at line 1:
ORA-38819: user TEST_CODEA owns one or more objects whose type is editionable
and that have noneditioned dependent objects
SQL>
SQL> alter user test_codeb enable editions;

User altered.

SQL> alter user test_codea enable editions;

User altered.

SQL>  connect test_codeb/test_codeb
Connected.
SQL> execute cb.b;

PL/SQL procedure successfully completed.


BUT the above example is a very simple case. What happens when you have dependencies on objects which can never be editioned? There are a number of cases where objects cannot be editioned, and these objects must be moved out of the current schema into a non-editioned schema. Examples are function-based indexes which include a user-written PL/SQL function, and restrictions also exist for objects relating to Advanced Queues.

So, back to the other point of this post: how can we find out which objects are causing us problems, so that we can fix them or move them to another schema? Back to the good old 10046 trace. Let’s trace the command and find out how Oracle decides which objects need fixing before Editions can be enabled.


SQL> alter session set events '10046 trace name context forever, level 1';

Session altered.

SQL>
SQL> alter user test_codea enable editions;
alter user test_codea enable editions
*
ERROR at line 1:
ORA-38819: user TEST_CODEA owns one or more objects whose type is editionable
and that have noneditioned dependent objects


So if we look at the trace file, we find the following internal Oracle SQL statement (along with a bunch of other stuff):


PARSING IN CURSOR #2 len=487 dep=1 uid=0 oct=3 lid=0 tim=1273060065216252 hv=305533609 ad='271fb5cf8' sqlid='btgrjvh93c4p9'
select d.owner#, u.name, d.name, d.namespace, d.stime    from obj$ d, dependency$ dep, obj$ p, user$ u    where d.obj# = dep.d_obj# and p.obj# = dep.p_obj#      and d.remoteowner is null and p.owner# = :1      and d.owner# = u.user#      and p.type# in (4,5,7,8,9,10,11,12,13,14,22,87)      and ((u.type#             != 2        and            bitand(u.spare1, 16) = 0        and            u.user#             != p.owner#) or           (d.type# not in (4,5,7,8,9,10,11,12,13,14,22,87)))
END OF STMT
PARSE #2:c=1000,e=758,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1273060065216251
EXEC #2:c=10999,e=10924,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=1111485757,tim=1273060065227267
FETCH #2:c=35994,e=37823,p=0,cr=872,cu=0,mis=0,r=1,dep=1,og=4,plh=1111485757,tim=1273060065265189

In the above trace output we can see the SQL statement which Oracle runs to test for the objects which prevent the user’s Editions feature being enabled. (In fact, I can see “r=1” on the FETCH, meaning my simple case found one matching row.)

So I can take that SQL statement and run it to find all objects which would cause problems enabling Editions, and can fix those problems in advance of trying to enable the Editions feature:

Cleaned-up version of the SQL statement and output. (Note: avoid running statements you extract this way on production systems unless you are really sure they will not have any adverse effects!)

connect / as sysdba
set echo on

rem find the user_id for the schema

select user_id from sys.dba_users where username = '&USERNAME'
/

set linesize 132

rem provide the user_id to the following sql statement

select d.owner#,
       u.name,
       d.name,
       d.namespace,
       d.stime
from   obj$ d,
       dependency$ dep,
       obj$ p, user$ u
where d.obj# = dep.d_obj#
and   p.obj# = dep.p_obj#
and   d.remoteowner is null
and   p.owner# = &user_id
and   d.owner# = u.user#
and p.type# in (4,5,7,8,9,10,11,12,13,14,22,87)
and ((u.type#             != 2
     and            bitand(u.spare1, 16) = 0
     and            u.user#             != p.owner#)
   or
     (d.type# not in (4,5,7,8,9,10,11,12,13,14,22,87)
     )
    )
/

Output from the script

    OWNER# NAME                           NAME                            NAMESPACE STIME
---------- ------------------------------ ------------------------------ ---------- ---------
       258 TEST_CODEB                     CB                                      2 05-MAY-10

When we ran this on one of our complex databases, we had 35 objects listed in just one schema which were preventing us enabling Editions (a test system, I must point out; we are not ready for live yet). Most were the simple case of needing to also enable Editions on other dependent schemas, but we also had problems with user-written function-based indexes and many of our Advanced Queues. The AQ objects and function-based indexes had to be moved out to non-editioned schemas.

Summary

I hope I have shown the value of tracing Oracle commands and how you might be able to use the information. I also hope you can see that you need to be careful using “alter user x enable editions force”, and that the internal SQL statement can help identify the objects which cause issues.

Blog at WordPress.com.