Nigel Noble's Oracle Blog

05/07/2010

10.2.0.5, KEEP pool / Serial Direct Read

Filed under: 10.2.0.5, 11gR1, Performance — Nigel Noble @ 11:36 am

Jonathan Lewis made reference to an 11g bug related to using a KEEP POOL in his note Not KEEPing. Oracle 11g introduced a new feature called adaptive serial direct path reads, which allows large “Full Scan” disk reads to be done using “direct path reads” rather than through the buffer cache. In many cases “direct IO” can give a significant increase in performance when reading data for the first time (from disk), but it can be significantly slower if your subsequent queries could have been serviced from the buffer cache. The bug Jonathan references (Bug 8897574) causes problems if you assign any large object to a KEEP POOL, because by default 11g will read large objects using the new direct IO feature and so never places the object in the KEEP POOL. The whole point of using the KEEP POOL is to identify objects you want to protect and keep in a cache.

The 10.2.0.5 patchset has also back-ported the direct read feature that is new in 11g, although I don’t know whether the rules for when it kicks in are the same as in 11g. The site where I work makes significant use of KEEP pools and has also spent some time investigating serial direct IO vs. buffer cache IO.

I want to use this blog entry to explore a number of related issues, but also to demonstrate that the 11g bug Jonathan identified seems to also exist in the 10.2.0.5 patchset (and 11gR2). This blog item will cover:

  • A brief reference to the 11g “adaptive serial direct path read” feature
  • The 10.2.0.5 implementation and how to switch it on
  • A 10.2.0.5 demonstration showing the relative difference between the various types of disk IO and reads from the cache
  • A 10.2.0.5 demonstration which shows the KEEP pool bug also exists (but not by default)
  • Some real-life comparison figures of disk reads via “direct path read” and via the “buffer cache”, to show why the “adaptive serial direct path read” feature is worth exploring in more detail.

 

11g adaptive direct path reads

There are already some good blog notes out there (follow the links to Jonathan’s and Alex Fatkulin’s posts at the top), but I just wanted to say a couple of things.

  • it’s ON by default, but can be switched off with a database event (Event 10949 at level 1; see the sketch just after this list)
  • A patch for Bug 8897574 is now available for 11.1.0.7 (Linux x86-64) to resolve the issue with KEEP pools. However, there is a clash of objects with the latest 11.1.0.7 PSU, so a merge patch would be required.
  • I have also tested the bug on 11gR2 and the problem still exists, but no 11gR2 patch is available yet
  • **UPDATE** 8/10/2010: Please read the note at the end of this blog item
  • More information in Article ID 1081553.1
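For reference, here is a minimal sketch of switching the 11g feature off and on for a session (the same event can reportedly also be set instance-wide via the spfile, but treat that as something to verify on your own system):

-- 11g: adaptive serial direct path reads are ON by default;
-- switch them off for the current session
alter session set events = '10949 trace name context forever, level 1';

-- and switch them back on again
alter session set events = '10949 trace name context off';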

 

10.2.0.5

I am going to give a simple demo of the serial direct read feature implemented in 10.2.0.5, but also use the example to highlight some of the pros and cons. I also hope to show that the same 11g KEEP pool bug exists in 10.2.0.5, and I may look at raising a back-port request to get this addressed at some stage.

The 10.2.0.5 version of the serial direct path read feature is NOT ENABLED by default, meaning there should not be any issues (including the KEEP pool one) unless someone has explicitly decided to enable it. The feature is enabled by setting Event 10358 to level 4.

NOTE: The event used to disable the feature in 11g is NOT the same event/level used to enable it in 10.2.0.5.
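To make the contrast explicit, here is a sketch of the 10.2.0.5 session-level syntax (this is exactly what the test script below uses; setting it globally via the spfile should also be possible, as the script comments suggest, but I only show the session form here):

-- 10.2.0.5: serial direct path reads are OFF by default; enable with event 10358 level 4
alter session set events = '10358 trace name context forever, level 4';

-- switch the feature back off
alter session set events = '10358 trace name context off';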

Details of my test database

I have installed 10.2.0.5 and created a default database (using dbca) on a test Linux (x86-64) server using NetApp NFS mounted storage. The hardware is not set up for high-end performance, but it is able to demonstrate the relative difference between the various types of IO. I have set filesystemio_options to “setall” to ensure we can use asynchronous IO (when required) and direct IO (which also avoids IO being served from the file system cache, which would skew the results). I have manually set a db_cache_size of 3GB and a db_keep_cache_size of 1GB. I have loaded a table of approximately 775MB which I will use for the tests. The table is larger than 10% of the buffer cache, but can fit in both the default cache and the KEEP pool. FYI, it’s an 8k block size.
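For completeness, a sketch of how the non-default settings above could be put in place (values taken from the “show parameter” output further down; filesystemio_options is a static parameter so it needs an instance restart):

alter system set db_cache_size        = 3008M scope=both;
alter system set db_keep_cache_size   = 1008M scope=both;
alter system set filesystemio_options = 'SETALL' scope=spfile;   -- static parameter, restart required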

There is more information on enabling the feature in Metalink note: 8332368.8. Bug 8332368 – “Enable automatic serial direct read under an event” is actually the code change which implements the new feature in 10.2.0.5.

About the test script

note: This test should only be run on 10.2.0.5 as it uses 10.2.0.5-specific database events (remove the “alter session set events” lines if running on any other version, and note that this will also affect the results). The script also contains a flush of the buffer cache, so watch out!

The script is included as a link to reduce the size of the blog. It is included as a template for doing your own testing.
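Since the script itself is only linked, here is a minimal sketch of its structure, reconstructed from the spooled output further down (the spool file name is illustrative and the optional 10046 trace lines are omitted):

spool direct_read_test.lst

rem confirm parameters
show parameter db_cache_size
show parameter db_keep_cache_size
show parameter filesystemio_options

rem set the buffer pool of the test table (DEFAULT or KEEP)
alter table testuser.TABLE_DATA storage (buffer_pool &ENTER_BUFFER_POOL);

rem WARNING: test instance only
alter system flush buffer_cache;

rem scan twice with 10.2.0.5 serial direct path reads enabled
alter session set events = '10358 trace name context forever, level 4';
set timing on
select /*+FULL(c)*/ count(*) from testuser.TABLE_DATA c;
select /*+FULL(c)*/ count(*) from testuser.TABLE_DATA c;
set timing off

rem scan twice with the feature switched off (reads go via the buffer cache)
alter session set events = '10358 trace name context off';
set timing on
select /*+FULL(c)*/ count(*) from testuser.TABLE_DATA c;
select /*+FULL(c)*/ count(*) from testuser.TABLE_DATA c;
set timing off

rem re-enable the event with the data already cached
alter session set events = '10358 trace name context forever, level 4';
set timing on
select /*+FULL(c)*/ count(*) from testuser.TABLE_DATA c;
set timing off

spool off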

Test against the DEFAULT pool using 10.2.0.5

Let’s run my test script against the 775MB table while it is assigned to the DEFAULT buffer pool and discuss some of the observations we can make. I will include the spool of the output. I have not included the AWR output which confirms the use of buffered or direct IO at the various stages, because I want to concentrate on the relative response times.

SQL>
SQL> rem
SQL> rem Option to trace the test
SQL> rem alter session set events = '10046 trace name context forever, level 12';
SQL>
SQL>
SQL> rem ============================================================
SQL>
SQL> rem confirm parameters
SQL> rem
SQL>
SQL> show parameter db_cache_size

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_cache_size                        big integer 3008M
SQL> show parameter db_keep_cache_size

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_keep_cache_size                   big integer 1008M
SQL> show parameter filesystemio_options

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
filesystemio_options                 string      SETALL
SQL>
SQL> rem
SQL> rem Set the buffer pool of the table  (DEFAULT or KEEP)
SQL> rem
SQL> alter table testuser.TABLE_DATA storage (buffer_pool &ENTER_BUFFER_POOL);
Enter value for enter_buffer_pool: default
old   1: alter table testuser.TABLE_DATA storage (buffer_pool &ENTER_BUFFER_POOL)
new   1: alter table testuser.TABLE_DATA storage (buffer_pool default)

Table altered.

SQL>
SQL>
SQL> rem
SQL> rem show size of the test table and which buffer pool
SQL> rem
SQL> column segment_name format a30 truncate
SQL> column meg format 999999
SQL> select segment_name,
  2         bytes / (1024 * 1024 ) Meg, blocks ,buffer_pool
  3  from sys.dba_segments
  4  where segment_name = 'TABLE_DATA'
  5  /

SEGMENT_NAME                       MEG     BLOCKS BUFFER_
------------------------------ ------- ---------- -------
TABLE_DATA                         775      99200 DEFAULT

SQL>
SQL>
SQL> rem
SQL> rem lets flush the buffer cache to start fresh (or bounce the instance)
SQL> rem
SQL> rem WARNING: Do NOT run this script on any production systems
SQL> rem
SQL>
SQL> alter system flush buffer_cache;

System altered.

SQL>
SQL> rem
SQL> rem check which buffer pool the table is assigned to
SQL> rem
SQL>
SQL> select buffer_pool from dba_tables where table_name = 'TABLE_DATA';

BUFFER_
-------
DEFAULT

So far we have confirmed the setup of key database parameters, ensured our 775MB table is assigned to the DEFAULT buffer pool and also flushed the buffer cache so we can start from an empty and consistent point.
We will now go and enable the 10.2.0.5 direct IO feature and see what happens when we scan our table a couple of times.

SQL>
SQL> rem
SQL> rem lets scan the table using the new serial direct path read option
SQL> rem going to switch it on at session level but can be set global
SQL> rem Will run it twice to show a consistent read time (ie the data never made it into the buffer cache)
SQL> rem
SQL>
SQL> alter session set events = '10358 trace name context forever, level 4';

Session altered.

SQL> set timing on
SQL>
SQL> select /*+FULL(c)*/ count(*) from testuser.TABLE_DATA c;

  COUNT(*)
----------
   1958939

Elapsed: 00:00:18.37
SQL> select /*+FULL(c)*/ count(*) from testuser.TABLE_DATA c;

  COUNT(*)
----------
   1958939

Elapsed: 00:00:19.20
SQL>
SQL> set timing off
SQL>

So you can see that with the direct read feature switched on, both queries take around 20 seconds. The second query has therefore made no use of the buffer cache.

Now… let’s go and switch OFF the new feature and then scan the data again.

SQL>
SQL>
SQL> rem
SQL> rem Lets switch OFF the new 10.2.0.5 serial direct path read option
SQL> rem again, this is done at session level for purpose of this test
SQL>
SQL> alter session set events = '10358 trace name context off';

Session altered.

SQL>
SQL>
SQL> rem scan it twice and see what happens
SQL> set timing on
SQL> select /*+FULL(c)*/ count(*) from testuser.TABLE_DATA c;

  COUNT(*)
----------
   1958939

Elapsed: 00:00:55.44
SQL>
SQL> rem this second read is VERY MUCH FASTER because the data is already in the buffer cache
SQL> select /*+FULL(c)*/ count(*) from testuser.TABLE_DATA c;

  COUNT(*)
----------
   1958939

Elapsed: 00:00:00.54
SQL>

OK, now we see the first query takes 55 seconds via the buffer cache (rather than 20 seconds direct), so I hope you can see that when you genuinely do need to read large volumes of data from disk, direct IO is much faster. BUT hold on… now look at the second query. It takes less than 1 second because it was read from the buffer cache. So if my query would usually have been satisfied from the buffer cache, the direct IO feature may actually be very bad for me.
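As an aside, one way (a sketch only) to confirm that the table really is sitting in the buffer cache at this point is to count its buffers in v$bh; the join on data_object_id is the usual approach, though on a busy system the numbers will move around:

-- roughly how many of the table's blocks are currently in the buffer cache?
select count(*) cached_blocks
from   v$bh        b,
       dba_objects o
where  o.owner       = 'TESTUSER'
and    o.object_name = 'TABLE_DATA'
and    b.objd        = o.data_object_id
and    b.status     <> 'free';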

Now that my data is already in the buffer cache, what happens if I ask for a direct path read?

SQL> set timing off
SQL>
SQL> rem
SQL> rem One last test, lets now switch on the serial direct read event, BUT the data is already in the buffer cache.
SQL> rem
SQL> rem
SQL>
SQL> alter session set events = '10358 trace name context forever, level 4';

Session altered.

SQL>
SQL> set timing on
SQL> select /*+FULL(c)*/ count(*) from testuser.TABLE_DATA c;

  COUNT(*)
----------
   1958939

Elapsed: 00:00:00.56
SQL> set timing  off
SQL>
SQL>
SQL> rem ============================================================
SQL> rem option to switch off trace
SQL> rem alter session set events = '10046 trace name context off';
SQL>
SQL> set timing off
SQL> spool off

The test shows that Oracle will read blocks from the buffer cache before deciding it needs to read the data using the direct path option (assuming your blocks could get into the cache in the first place).
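If you want to confirm for yourself which path was used (rather than relying on the elapsed times or an AWR report), here is a sketch of the session-level checks I would use; the statistic and wait event names below are the common ones, but verify them against v$statname on your own version:

-- buffered full scans show up as 'db file scattered read' waits,
-- direct path full scans as 'direct path read' waits
select event, total_waits
from   v$session_event
where  sid = sys_context('userenv', 'sid')
and    event in ('db file scattered read', 'direct path read');

-- the matching statistics for the current session
select n.name, s.value
from   v$mystat   s,
       v$statname n
where  s.statistic# = n.statistic#
and    n.name in ('physical reads', 'physical reads direct', 'table scans (direct read)');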

Test against the KEEP pool using 10.2.0.5

I hope you will see that, although the table is now assigned to the KEEP POOL, the query timings are exactly the same as in the test run above. Since I have told Oracle this is an important object which should be protected in the KEEP pool, it should really be read via the buffer cache (slowly) once, with all subsequent reads coming out of the cache in sub-second time. Because the data is read directly and never populates the KEEP pool, every scan run while the feature is enabled takes around 17 seconds and not around half a second.
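Before walking through the spool, one quick sanity check you can run alongside the queries (a sketch; check that v$buffer_pool_statistics is available and populated on your system) is to watch the per-pool statistics. With the direct read event enabled you should see the KEEP pool’s numbers barely move even though the table is assigned to it:

select name, physical_reads, consistent_gets
from   v$buffer_pool_statistics
where  name in ('KEEP', 'DEFAULT');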

SQL>
SQL> rem
SQL> rem Option to trace the test
SQL> rem alter session set events = '10046 trace name context forever, level 12';
SQL>
SQL>
SQL> rem ============================================================
SQL>
SQL> rem confirm parameters
SQL> rem
SQL>
SQL> show parameter db_cache_size

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_cache_size                        big integer 3008M
SQL> show parameter db_keep_cache_size

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_keep_cache_size                   big integer 1008M
SQL> show parameter filesystemio_options

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
filesystemio_options                 string      SETALL
SQL>
SQL> rem
SQL> rem Set the buffer pool of the table  (DEFAULT or KEEP)
SQL> rem
SQL> alter table testuser.TABLE_DATA storage (buffer_pool &ENTER_BUFFER_POOL);
Enter value for enter_buffer_pool: keep
old   1: alter table testuser.TABLE_DATA storage (buffer_pool &ENTER_BUFFER_POOL)
new   1: alter table testuser.TABLE_DATA storage (buffer_pool keep)

Table altered.

SQL>
SQL>
SQL> rem
SQL> rem show size of the test table and which buffer pool
SQL> rem
SQL> column segment_name format a30 truncate
SQL> column meg format 999999
SQL> select segment_name,
  2         bytes / (1024 * 1024 ) Meg, blocks ,buffer_pool
  3  from sys.dba_segments
  4  where segment_name = 'TABLE_DATA'
  5  /

SEGMENT_NAME                       MEG     BLOCKS BUFFER_
------------------------------ ------- ---------- -------
TABLE_DATA                         775      99200 KEEP

SQL>
SQL>
SQL> rem
SQL> rem lets flush the buffer cache to start fresh (or bounce the instance)
SQL> rem
SQL> rem WARNING: Do NOT run this script on any production systems
SQL> rem
SQL>
SQL> alter system flush buffer_cache;

System altered.

SQL>
SQL> rem
SQL> rem check which buffer pool the table is assigned to
SQL> rem
SQL>
SQL> select buffer_pool from dba_tables where table_name = 'TABLE_DATA';

BUFFER_
-------
KEEP

SQL>
SQL> rem
SQL> rem lets scan the table using the new serial direct path read option
SQL> rem going to switch it on at session level but can be set global
SQL> rem Will run it twice to show a consistent read time (ie the data never made it into the buffer cache)
SQL> rem
SQL>
SQL> alter session set events = '10358 trace name context forever, level 4';

Session altered.

SQL> set timing on
SQL>
SQL> select /*+FULL(c)*/ count(*) from testuser.TABLE_DATA c;

  COUNT(*)
----------
   1958939

Elapsed: 00:00:17.89
SQL> select /*+FULL(c)*/ count(*) from testuser.TABLE_DATA c;

  COUNT(*)
----------
   1958939

Elapsed: 00:00:17.24
SQL>
SQL> set timing off
SQL>
SQL>
SQL>
SQL> rem
SQL> rem Lets switch OFF the new 10.2.0.5 serial direct path read option
SQL> rem again, this is done at session level for purpose of this test
SQL>
SQL> alter session set events = '10358 trace name context off';

Session altered.

SQL>
SQL>
SQL> rem scan it twice and see what happens
SQL> set timing on
SQL> select /*+FULL(c)*/ count(*) from testuser.TABLE_DATA c;

  COUNT(*)
----------
   1958939

Elapsed: 00:00:56.03
SQL>
SQL> rem this second read is VERY MUCH FASTER because the data is already in the buffer cache
SQL> select /*+FULL(c)*/ count(*) from testuser.TABLE_DATA c;

  COUNT(*)
----------
   1958939

Elapsed: 00:00:00.64
SQL>
SQL> set timing off
SQL>
SQL> rem
SQL> rem One last test, lets now switch on the serial direct read event, BUT the data is already in the buffer cache.
SQL> rem
SQL> rem
SQL>
SQL> alter session set events = '10358 trace name context forever, level 4';

Session altered.

SQL>
SQL> set timing on
SQL> select /*+FULL(c)*/ count(*) from testuser.TABLE_DATA c;

  COUNT(*)
----------
   1958939

Elapsed: 00:00:00.58
SQL> set timing  off
SQL>
SQL>
SQL> rem ============================================================
SQL> rem option to switch off trace
SQL> rem alter session set events = '10046 trace name context off';
SQL>
SQL> set timing off
SQL> spool off

I tested (using a modified version of the script) the patch for Bug 8897574 on 11.1.0.7 and could show that, with the patch applied and my table assigned to the KEEP POOL, the first execution of the query was read into the buffer cache (and slow) but all further executions were sub 1 second. I don’t yet fully understand the rules which Oracle uses in 10.2.0.5 or 11g to make the direct read decision, but the key to success will be how well Oracle manages to understand your data usage… Get it wrong and you may well find “Full Scans” taking very much longer (with higher disk IO), but get it right and it could bring huge improvements in IO response times. The “adaptive serial direct path read” feature needs to come into play for very large object scans that are unlikely to be found in the buffer cache (e.g. very large Data Warehouse objects).
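For what it’s worth, a rough way to see whether a segment is anywhere near the “larger than 10% of the buffer cache” size mentioned earlier (a sketch only; the real 10.2.0.5/11g decision rules are not documented here and certainly involve more than this one ratio, and it assumes db_cache_size is set explicitly rather than left to automatic SGA management):

-- compare the segment size (in blocks) with 10% of db_cache_size (in blocks)
select s.segment_name,
       s.blocks                               segment_blocks,
       round((p.value / t.block_size) * 0.10) ten_pct_of_cache_blocks
from   dba_segments    s,
       dba_tablespaces t,
       v$parameter     p
where  s.segment_name    = 'TABLE_DATA'
and    t.tablespace_name = s.tablespace_name
and    p.name            = 'db_cache_size';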

Real Life Numbers

In this final part I want to give a few real-life examples of why queries using the “direct path” option can be so much faster, but also to reiterate why it can cause issues. The following numbers were taken from a live 10.2.0.3 RAC-based Data Warehouse (running ASM). It is possible to experiment with direct path reads on older versions of Oracle using either the “_serial_direct_read” session parameter or parallel query. The “_serial_direct_read” internal session parameter can be used to run a serial query with direct path reads, but it can be difficult to use because it is affected by already-parsed open cursors (i.e. if someone has already parsed the statement without it, it will not be used; conversely, if someone parses a cursor with it set, other sessions running the same SQL statement will read direct). Still, “_serial_direct_read” can be useful for testing. The other option is to run SQL statements using parallel query. The following numbers are from full table scans running COUNT(*) on a 100GB table (with a 1MB multi-block read size). 100GB is larger than the cache in the storage system and also in the database instance, and we verified that at the time of the query none of the data was cached on any node of the RAC cluster.

query by                                             rate
------------------------------               --------------
via buffer cache                              33 MBytes/sec
_serial_direct_read                          300 MBytes/sec
Parallel degree 2 (direct IO)                339 MBytes/sec

Things we observed.

  • It was not possible to ever make reads via the buffer cache use async IO
  • We lost some time when reading via the buffer cache in RAC verifying that the blocks were not held on any other node, but not enough to account for such a poor rate.
  • We suspect the +300MB/sec is achieved with direct path reads because, firstly, it does seem to use async IO and, secondly, we suspect the IO rate is high enough to drive the read-ahead caching in our storage system (eg: read blocks 10, 11, 12,… so it starts fetching 13, 14, 15 before we ask for them)
  • Since _serial_direct_read is not supported (and difficult to use), we tested using very low parallel degrees. Because parallel query is implemented to use “direct path reads”, we could see huge increases in throughput just by using a parallel degree of 2 (see the sketch after this list).
  • The key point I am making about these numbers is the huge difference we have seen on our system between buffer cache and direct reads.
  • I may well write a longer blog on these results and include some of the supporting tkprof/trace data at a later time.
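As a footnote to the list above, here is a sketch of the two ways we drove direct path reads for these comparisons. The table name big_table is just a placeholder, and “_serial_direct_read” is an undocumented internal parameter, so treat this as a test-system-only sketch rather than a recommendation:

-- option 1: ask for serial direct path reads for this session (test systems only)
alter session set "_serial_direct_read" = true;
select /*+ FULL(t) */ count(*) from big_table t;

-- option 2: let parallel query do the direct path reads for you
select /*+ FULL(t) PARALLEL(t, 2) */ count(*) from big_table t;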

 

One final comment I would like to make, in the hope that someone else may be able to help (someone good at hints, or within Oracle Development). What I would really like to have seen is a HINT which allows individual SQL statements to indicate a preference for “buffer cache” or “direct path” reads for full scans. That way you could code a SQL statement for direct path knowing how your data is expected to be used. It may well be that people will need to get the KEEP POOL patch applied and use that as the means of helping Oracle make the correct decision.

It would be interesting to hear people’s experiences of using the “adaptive serial direct reads” feature in production 11g environments, and how good a job Oracle does (or not) of choosing which blocks are cached vs read direct.

update 8-oct-2010

The latest 11.2.0.2 patchset (10098816) now includes the fix for Bug 8897574, so large objects assigned to the KEEP pool will always do buffered IO and never direct IO (regardless of the size of the object scan).

4 Comments »

  1. Hi Nigel,

    Nice article and a good description of the serial direct IO capabilities of later 10 and 11 versions of Oracle, thank you.

    I’m a little surprised Oracle treats this issue of large objects being read via serial direct read, and thus not going into the KEEP cache, as a bug and not an enhancement.

    As far as I am aware, there is nothing special about the KEEP pool or the RECYCLE pool other than that they are separate pools with names suggesting what you might want to use them for. Blocks will be cycled through the KEEP pool by the same rules as the standard pool if you have more segments set to use the KEEP pool than can physically fit in it. So isn’t the fact that objects scanned by serial direct IO do not go into the KEEP pool just a special case of them not going into the buffer cache at all?

    Comment by mwidlake — 14/07/2010 @ 2:53 pm

    • Hi Martin,

      I see this very much as a bug when looking at the usage of objects assigned to the KEEP POOL. If a DBA/developer has explicitly asked for objects to be maintained in the KEEP POOL to reduce the cost of physical disk reads, I would not expect an upgrade of Oracle to change this. Admittedly the issue only exists for full scans of large objects (>10% of the buffer cache), i.e. random IO would still get into the KEEP POOL. My site has some specific parts of the application which can’t afford to do any disk reads, so we protect some key tables/indexes within the KEEP pool and in fact pre-cache them on database startup by full scanning all of those tables and indexes. Although it is true to say the KEEP POOL and RECYCLE pool act just like any other pool (blocks can still get aged out if they don’t all fit), there are some subtle differences with KEEP and RECYCLE. Oracle plays some tricks, for example, with where it places Consistent Read blocks depending on the existence of a RECYCLE POOL/KEEP POOL (see the very old note on Jonathan’s site, free buffer waits).
      My problem with this bug is not the mechanics of direct IO but rather when Oracle decides to use it. Objects assigned to the KEEP pool should always be read via it, because that is what the DBA/developer asked for.

      Regards

      Nigel

      Comment by Nigel Noble — 14/07/2010 @ 10:20 pm

  2. Hi Nigel

    Nice post, a few good bits and bobs.

    You hit the nail on the head when you looked at the real numbers and took stats from a data warehouse environment: for long-running queries that may access blocks only once a month, why would you want to pollute the cache with that? Why not use your own cache (the PGA) instead of polluting the shared group cache (the buffer cache)? It does not make too much sense for OLTP though.

    I had a look at this a while back to try and make Oracle (10.2, and not in production) run parallel queries without using direct path reads and instead use the buffer cache (a bit of R&D). I knew that all the blocks for the query were in the cache and I wanted to eliminate any IO. Try setting _small_table_threshold very high; even for parallel query it will stop direct path reads being used.

    Rob

    Comment by Rob Dempsey (ex-WebBased Company) — 25/08/2010 @ 4:54 pm

  3. […] will be delighted to see the serial direct reads in effect on the database I’ve borrowed from […]

    Pingback by Conditional joins: Part 2 « Dave Webster's Blog — 30/12/2010 @ 1:10 pm

