Nigel Noble's Oracle Blog

TECHNIQUES FOR FAST DATABASE RE-ORGANISATION

 
 

 

This is a copy of a paper I wrote in 1995 while working at Churchill Insurance. It’s included on this blog just because I’m really proud to have been associated with the paper and the people who worked on the migration at Churchill. 

  

If you want to know about database rebuilds today… this is NOT the place to read… go look up 'Oracle Data Pump' or 'Oracle Transportable Database/Tablespaces'.
 
Nigel Noble, May 2010.   
    
 
 
 

TECHNIQUES FOR FAST DATABASE RE-ORGANISATION (1995)

    

Nigel Noble   

Churchill Insurance, Bromley, Kent, England

Summary   

This paper discusses a number of features and techniques available within the Oracle RDBMS which can dramatically speed up the reorganisation of databases. However well tuned the storage parameters within a database, there will eventually come a time when you need to reorganise all, or part, of the database. This could be to reduce fragmentation, for general house-keeping, or when moving to a new operating system. The current utilities which come with Oracle were never designed to move large volumes of data in a short period of time. We recently moved our 60 gigabyte production database to a new machine. If we had used the standard export / import method we estimated the whole move would have taken at least 12 DAYS, but using the techniques we developed the move was done in 18 HOURS. As the database grows and as business considerations limit maintenance time, fast database maintenance is essential.

Overview of Churchill Insurance   

Churchill Insurance has a single database of over 60 gigabytes of data and runs a 6 by 24 hour operation. Churchill uses a Cray Superserver 6400 as the database server with 600 concurrent OLTP users, having recently moved the database from a Pyramid Nile. The main production database runs under Oracle 7.1 and generates approximately 4 gigabytes of archived redo logs a day. During peak periods we parse 250 new SQL statements per second and execute 350 SQL statements per second. The database grows by 15 gigabytes per year. Most of the ideas in this paper have been developed by Churchill over a number of years and are not widely used elsewhere. Churchill is continually evaluating and exploring methods for building databases quickly, and has developed a number of programs to aid these techniques since most were not commercially available.

Exporting Data   

On UNIX, Oracle links the EXP/IMP programs in two-task mode. This is so that the SGA can be protected from potential corruption by users' own programs. It is possible to relink the EXPort program in single-task mode and gain as much as a 30% improvement. This can only be run from the 'oracle' account. Remember, Oracle does not support this usage on UNIX (although I hear they use it themselves).

cd $ORACLE_HOME/rdbms/lib
make -f oracle.mk expst
make -f oracle.mk impst
# move expst and impst into $ORACLE_HOME/bin
# you can now run 'expst userid=...' just as you would exp

A FULL database export will not make good use of the machine's resources. If your machine has a number of CPUs available, exporting the database will only use one. You need to ask yourself 'How many CPUs, disks, networks etc. do I have?' and then think about how to make maximum use of them. Most UNIX systems have a maximum file size limit of 2 gigabytes so you might have to export to tape or a raw device.
For example, if your machine has 10 CPUs with lots of free memory and you wish to exp/imp a database, using EXP to do a FULL database export gives you only one process writing the whole export to a single file; your remaining 9 CPUs are idle. A much more efficient way is to fire off multiple single-table exports in parallel. We developed a program to schedule this automatically, but some editing of script files could achieve the same effect. You should sequence the biggest jobs first because in theory, if you get the correct number of streams and have a large machine, your total export should only take as long as your biggest single table (i.e. you have exported all other tables by the time the biggest completes). You should also ensure no two streams are processing the same tablespace and disks at the same time, to reduce IO contention.
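As a very rough illustration of the streaming idea (the user, table and file names below are made up, and our real scheduler was a program rather than a script), a shell script could fire off several single-table exports at once, biggest table first and each writing to a different disk:

# minimal sketch: three single-table export streams in parallel,
# using the single-task expst binary built earlier (plain exp works too)
expst userid=scott/tiger tables=BIG_TABLE   file=/u01/exp/big_table.dmp   buffer=1048576 &
expst userid=scott/tiger tables=MED_TABLE   file=/u02/exp/med_table.dmp   buffer=1048576 &
expst userid=scott/tiger tables=SMALL_TABLE file=/u03/exp/small_table.dmp buffer=1048576 &
wait    # returns when every stream has finished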

(Figure: single export versus parallel export streams)

You could also do a full database export without the data (ROWS=N) to get just the database structure in an export file. Because IO can become a major bottleneck, you could also split the export output files across different disks.

Importing Data   

Now that you have your export files, what can be done to speed up your imports? Firstly, you can do the same as with the exports and run multiple streams of import with their IO split. If you wish to keep archive logging enabled, remember to manage the space taken by the archive logs generated. Of course you can also link the imp program in single-task mode and see improvements of around 30 percent. Since you are now exporting single tables, you can start to import the completed export files while other tables are still exporting. If you were copying the data to another machine you could copy the completed files across while waiting for the other exports to complete.

Now that you are importing a large volume of data quickly, you will find that Oracle is able to fill the database very fast. Ensure you have a number of large redo logs during the import, which you can drop once the build is complete. There are two reasons for doing this. Firstly, if you don't, you will find all your on-line redo logs fill very quickly and the database will become very slow (watch for 'Thread x cannot allocate new log... Checkpoint not complete' in the alert log). Secondly, each time Oracle fills an on-line redo log (or reaches LOG_CHECKPOINT_INTERVAL) a checkpoint occurs. Checkpoints can hinder performance because dirty data blocks in the SGA are written back to disk; for a very large SGA this can take minutes. At Churchill we always set LOG_CHECKPOINT_INTERVAL to a value larger than any redo log, which causes checkpointing to occur only when switching redo logs. Checkpoints work on a queuing system and you can find that the next redo log switch causes another checkpoint before the previous one is complete, so it is still possible to fill all the redo logs even if they are large. If we were importing a 1 gigabyte table, we would create two 500 megabyte redo logs and the import would only checkpoint twice.
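As a sketch (the group numbers, file names, sizes and the system/manager login below are made up placeholders), two large log groups could be added before a big import and dropped again afterwards:

# sketch only: add two large on-line redo log groups before the import
# connect as a DBA user (system/manager is just a placeholder)
sqlplus system/manager <<EOF
ALTER DATABASE ADD LOGFILE GROUP 4 ('/u05/oradata/prod/redo04.log') SIZE 500M;
ALTER DATABASE ADD LOGFILE GROUP 5 ('/u06/oradata/prod/redo05.log') SIZE 500M;
EOF
# in the init.ora, set LOG_CHECKPOINT_INTERVAL (measured in operating system
# blocks) larger than any redo log so checkpoints only happen on a log switch:
#   log_checkpoint_interval = 1000000000
# drop the extra groups again (ALTER DATABASE DROP LOGFILE GROUP n) once the
# build is complete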

We have found we can make import twice as fast by specifying COMMIT=N. Although COMMIT=N is the default, most DBAs set COMMIT=Y so that during large imports a small batch of rows is committed at a time and the rollback segments are not exceeded. Since import can be 50% faster if the complete import is done in one transaction, it would be very useful to know how much rollback segment the import will use before you start. With COMMIT=N, each table import is seen by Oracle as one transaction, so you must allow enough space in the rollback segment. I never could understand why, if I used IMP to import a large table, I would blow my rollback segment; I would then have to set COMMIT=Y for the import to work and commit a batch of rows at a time. How could I generate rollback segment entries on an insert when there is no old data to roll back? The answer is that Oracle must still record which rows in the database to roll back.

The space required in the rollback segment is far less than that of the table being created. It is based on the number of rows per block and the number of blocks used to store the data in the new table. The information held in the rollback segment is basically an entry for each block of the new table, consisting of a 58 byte header plus a 2 byte pointer for each row in that block. There is also an 84 byte header on each rollback segment block.

(Figure: calculating the rollback segment size)

I suggest that you allow for the biggest table and then create a number of rollback segments which can all grow to this size. The above formula was used to calculate the rollback requirement of a 3 gigabyte rebuild to the nearest megabyte (only 131 megabytes).
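A rough sketch of the arithmetic, using the per-block figures quoted above (the block size, table size and rows-per-block numbers here are invented and should be replaced with your own):

# estimate the rollback needed for a single-transaction table rebuild
BLOCK_SIZE=4096          # db_block_size
TABLE_BLOCKS=786432      # blocks used by the new table (here, 3GB of 4K blocks)
ROWS_PER_BLOCK=40        # average rows stored per block
awk -v bs=$BLOCK_SIZE -v blocks=$TABLE_BLOCKS -v rpb=$ROWS_PER_BLOCK 'BEGIN {
    entry_bytes = blocks * (58 + 2 * rpb)     # 58 byte entry + 2 bytes per row
    rbs_blocks  = entry_bytes / (bs - 84)     # 84 byte header per rollback block
    printf "estimated rollback: %.0f megabytes\n", rbs_blocks * bs / 1048576
}'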

SQL*Loader with the direct load path option is very fast at loading large volumes of data, and I understand that future versions of EXPort/IMPort may be built on this technology. We have tended to stay away from this technique because of worries about building our own export utility: to use SQL*Loader you need to write your own program which fetches the data out of the database into flat files, and then use SQL*Loader to load it again. With 7.1 it would also be possible to unload and load the data in parallel. The only problem with this method is whether you can trust your own export program. What happens if, a few days after a total rebuild, you find, for example, that you have lost all the times on your date fields?

Copying Data (any version of Oracle)   

There is a better approach to rebuilding tables within the same database which is more than twice as fast as export/import and produces no flat files. We wrote a PL/SQL package to automatically manage the streaming, but this document will only explain the basics of the process. You are able to move very large volumes of data around the same database (tablespace to tablespace) using 'CREATE TABLE ... AS SELECT * FROM ...'; constraints make it a little more complex. We have used this to move gigabytes of data within the same database and seen very fast times. We also tested this method between two databases, but we found that SQL*Net became a major problem and the speed was far too slow for large volumes of data. This technique also gets around the UNIX 2 gigabyte export file size limit since no files are generated. Note that, like importing with COMMIT=N, this is done as a single transaction and you should calculate the rollback segment requirements. Changes to this feature in Oracle 7.2 are discussed later in this paper.

(Figure: copying table data with CREATE TABLE ... AS SELECT)

If you are using constraints, be careful with NOT NULL. If you say:   

                CREATE TABLE new_table   

                (col1 DATE CHECK(col1 IS NOT NULL))   

                AS SELECT * FROM old_table;   

Oracle will create the table and copy the data OK, BUT it will then scan the whole table checking the NOT NULL condition. This is an extra pass which you don't want (made worse by the BLOCK CLEAN OUT delays described later).

If you do this:   

                CREATE TABLE new_table   

                (col1 DATE NOT NULL)   

                AS SELECT * FROM old_table;   

Oracle will also build a CHECK constraint but the testing is done as the rows are inserted and no checks are made after the transfer is complete. In one example I copied a 3 Gigabyte table in 4.5 hours using NOT NULL (and the CHECK constraint generated by Oracle). The same build took over 9 hours with the statement which only had the CHECK constraint.   

Index Building   

What can be done to speed up the index builds? Again you can run multiple index builds in parallel, but note that you may not be able to run a high number of streams. To avoid sorting the index on disk, increase the SORT_AREA_SIZE in the init.ora. If you have a 200 megabyte index to build and enough memory on the machine, you can set the sort_area_size to 200 megabytes; sorts done in memory can save hours on an index build. If you can't sort the whole index in memory then set the sort_area_size as large as you can, but you will need a large temporary tablespace to sort in. Remember to change the SORT_AREA_SIZE back to its original value once the builds are complete.

Oracle 7.1 introduced a number of new parallel features, and index building was one of them. You can now split a single index build into a number of streams. How many streams should you use? We did some tests and found that with our configuration the best number was 4 or 5; beyond that we started to hit an IO bottleneck and, as we increased the workload, we got very little further speed improvement. Using a 370 megabyte table we found we could reduce the index build to less than half the time with 4 streams. We had expected 4 streams to give a quarter of the time, but as more streams were loaded the IO became the bottleneck. By using operating system disk striping we were able to spread the IO over a number of devices, and the original 1 hour index build could then be done in under 10 minutes. We tested with different numbers of disks in the stripe and we now standardise on 4 disk stripes.
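As a sketch of a single parallel index build (the schema, table, index and tablespace names, the sizes, the degree and the system/manager login are all invented, and the exact PARALLEL clause syntax may vary between 7.x releases):

# init.ora, temporarily, so the sort happens in memory (put it back afterwards):
#   sort_area_size = 209715200        # 200 megabytes
sqlplus system/manager <<EOF
CREATE INDEX claims_pol_idx ON claims (policy_no)
    TABLESPACE claims_idx
    STORAGE (INITIAL 200M NEXT 50M PCTINCREASE 0)
    PARALLEL (DEGREE 4);
EOF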

(Figure: parallel index build, comparing disk stripes)

As the graph showed, performance stops improving after about 4 to 5 parallel processes. Since an 8 disk stripe is far harder to manage and carries more risk, we now use 4 disk stripes. You should note, however, that if one disk in a stripe fails, all database files on the whole stripe will go off-line.

During our recent migration we developed a PL/SQL program to automate our index builds in parallel. Because we wanted to build all our indexes in memory, and we had 12 gigabytes of indexes to build and only 4 gigabytes of machine memory, we added logic to limit the number of streams so that they never used more than 3 gigabytes of memory. It is also important to remember that each parallel stream will have two parallel server processes; a 4 stream index build therefore has 8 processes and uses twice as much sort_area_size. From previous tests we knew how large each index would grow, so we allowed any number of streams to run provided that the total index size was less than 1.5 gigabytes. Another change we had to make, which goes against normal parallel processing practice, was to sequence the index builds so that the smallest indexes were built first and the largest last. This was necessary because the parallel processes do not release their memory once an index build completes. For example, if the first index requires 800 megabytes and the second only 300 megabytes, 500 megabytes are wasted during the second build; do this across 30 streams and you will start to see 'malloc' errors from Oracle. By creating small to large, each process only ever grows to its own memory requirement.

Oracle has a feature called BLOCK CLEAN OUT. When Oracle makes changes in data blocks, the blocks are marked as dirty. The first time a block is read from the database (even by a SELECT) it is cleaned and written back. If you import a table, all the blocks in the table are dirty; if you then 'SELECT COUNT(*) FROM table', all the blocks are cleaned, and if you run the same select again you will find it goes dramatically faster than the first time. You can try this yourself, and please note that this is NOT because the table has been cached in the SGA. Blocks are left 'dirty' because, at the time it changes a block, Oracle does not know whether the transaction will finally be committed to the database. When Oracle next reads the block, it may find that it does not know whether the transaction was committed, so it must check the rollback segments, and if it finds nothing outstanding it can mark the block as committed. Once you have imported your tables and started the index build, Oracle has to clean the blocks, which slows the build down. Currently, you can't avoid block clean out. We thought we could speed up index builds by counting the rows of the completed imports while waiting for other imports to complete, so that when the index creation started the tables would already be clean. During our recent migration, however, we found that running counts while loading data would dramatically slow the data loading (block cleanout alone added 6 hours to our index builds). As an example, I loaded a 100 megabyte table and it took 9 minutes to load the data. The first 'SELECT COUNT(*)' on the new table took 14 minutes. Another 'SELECT COUNT(*)' on the same table then took under 2 minutes (Yes, that's TWO MINUTES!).
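A quick way to see the effect for yourself (the table name and login are invented):

sqlplus scott/tiger <<EOF
SET TIMING ON
REM the first count cleans every block - slow
SELECT COUNT(*) FROM freshly_loaded_table;
REM the second count finds the blocks already clean - much faster
SELECT COUNT(*) FROM freshly_loaded_table;
EOF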

Data Transfer using Named Pipes   

When we were trying to find a way to move our 60 gigabyte database we were left with a major problem. We had 30 gigabytes of raw data to move, which meant we would need an extra 30 gigabytes of disk on each machine to export to and import from. The 'CREATE TABLE ... AS SELECT' method of moving data within the same database is very efficient, but what about moving data from one database to another? We tried the create table method over SQL*Net and found very poor performance (waiting days in some cases!). In the past we had successfully moved a 20 gigabyte database by exporting to flat files, copying the data across the network and then importing the data. The whole process was done in parallel and at the time we felt this was the fastest solution, but was it?

The solution we came up with to move data between two machines was much faster than export / import and produced no files; the idea is based on UNIX 'named pipes', and we saw amazing results. A named pipe is a UNIX feature which allows a pipe to be defined which looks like a file: one process can write into the file until the pipe is full, and another process can read from it until there is nothing left to read. Using this method we are able to define a pipe on one machine and EXPort a table into the pipe file, then define another pipe on the other machine and start IMPort running there. At this stage both processes will hang (export because nothing is reading from its pipe, and import because nothing is writing to its pipe). The final step is to fire off a UNIX file copy from the first pipe to the second. The table is now exporting and importing at the same time. As well as good performance, no extra disk space is needed for the export file, and of course you can then run multiple streams of the process. During our production move we were able to move 30 gigabytes of data in under 6 hours. Here are some example results of the pipe method. The test was with a 109 megabyte table, holding 1.7 million rows, moved over TCP/IP to another machine and database.
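A minimal sketch of the plumbing (host, user and table names are invented, and our production runs used a program to drive many streams like this in parallel):

# on the source machine: create a named pipe and export into it
mknod /tmp/exp_pipe p
expst userid=scott/tiger tables=BIG_TABLE file=/tmp/exp_pipe buffer=1048576 &

# on the target machine: create a named pipe and import from it
mknod /tmp/imp_pipe p
impst userid=scott/tiger full=y file=/tmp/imp_pipe commit=n buffer=1048576 &

# back on the source machine: shuttle the bytes from pipe to pipe;
# export and import now run at the same time
cat /tmp/exp_pipe | rsh target_host "cat > /tmp/imp_pipe"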

(Figure: timings using UNIX named pipes)

Our normal way to copy data would be exp/rcp/imp (export to a file, remote copy the file to the other machine, import the flat file). As you can see, the normal method took 32 minutes compared to 20 minutes for the pipe method, an improvement of over 30 percent. If you also set COMMIT=N with the same parameters you will see a 50 percent improvement. Note too that the default BUFFER for our machine is 30K and we see poor results with that value. Overall we have been able to improve from 32 minutes down to 10 minutes (a saving of nearly 70 percent!).

Creating Tablespaces (Oracle 7.x)   

An obvious way to speed the creation of tablespaces within a database would be to run the creates in parallel. Unfortunately a 'feature' of the internal locking within Oracle means that only ONE tablespace can be added to the database at a time; each process waits its turn.

(Figure: parallel CREATE TABLESPACE locking)

As you can see in this example, the single and parallel runs take the same time. The trick you can use to achieve parallel tablespace creation and bypass the Oracle locking problem is to create all your tablespaces in a single stream at a size of 10K bytes, drop them all, and then create them at the correct size in parallel BUT IN THE REVERSE ORDER of the original create statements. We developed a program which does this automatically, but it is possible to do it by hand. The order is very important: if you get the sequence wrong the build starts to lock up again. Using this technique we have been able to generate a 60 gigabyte database shell, starting from nothing, in under one hour; on the same system we would expect it to take well over 15 hours in a single stream. Another thing we did to achieve such good times was to sequence the build so that the IO was spread evenly over all the available disks and disk controllers.

(Figure: the trick)
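A hand-rolled sketch of the trick (the tablespace names, file names, sizes and the system/manager login are invented; our real build was generated by a program which also spread the IO across disks and controllers):

# 1) create every tablespace tiny, one after the other, then drop them
sqlplus system/manager <<EOF
CREATE TABLESPACE ts_a DATAFILE '/u01/oradata/ts_a_01.dbf' SIZE 10K;
CREATE TABLESPACE ts_b DATAFILE '/u02/oradata/ts_b_01.dbf' SIZE 10K;
CREATE TABLESPACE ts_c DATAFILE '/u03/oradata/ts_c_01.dbf' SIZE 10K;
DROP TABLESPACE ts_a;
DROP TABLESPACE ts_b;
DROP TABLESPACE ts_c;
EOF

# 2) recreate them at full size in parallel, in the REVERSE order
echo "CREATE TABLESPACE ts_c DATAFILE '/u03/oradata/ts_c_01.dbf' SIZE 2000M REUSE;" | sqlplus system/manager &
echo "CREATE TABLESPACE ts_b DATAFILE '/u02/oradata/ts_b_01.dbf' SIZE 2000M REUSE;" | sqlplus system/manager &
echo "CREATE TABLESPACE ts_a DATAFILE '/u01/oradata/ts_a_01.dbf' SIZE 2000M REUSE;" | sqlplus system/manager &
wait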

Overview of our Pyramid to Cray move   

  • 60GB parallel database creation (1 hour)   
  • multiple exports / imports in single-task mode   
  • export / import of 30GB using pipes (6 hours)   
  • traffic split over 4 backbone networks   
  • indexes built in memory, in parallel (12 hours)   
  • a program to monitor the data load and predict the completion time of any table, with automatic error checking   
  • row counts checked in both databases   
(Figure: the Pyramid to Cray migration)

    

 Oracle 7.2   

The 'CREATE TABLE ... AS SELECT' statement has been extended in version 7.2 to use the direct load path (as in SQL*Loader) to copy tables within the same database. The speed improvement is dramatic because the data bypasses the internal SQL layer. If you specify the PARALLEL option in the CREATE statement, Oracle will read and write the data in parallel. When I wrote about the streaming of EXPort I stated that your export could take as long as your largest single table; now there is a way of streaming even a single table build. Another major improvement is that, because the data is loaded using the direct loader, no rollback segment data is generated and no blocks are marked dirty. This means index builds on tables built this way will be much faster (clean out alone added 5 hours to our recent migration).
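A sketch of a parallel direct-load copy on 7.2 (the table and tablespace names, the degree and the login are invented, and the exact syntax may differ slightly between releases):

sqlplus system/manager <<EOF
CREATE TABLE claims_new
    TABLESPACE claims_new_ts
    PARALLEL (DEGREE 4)
    AS SELECT * FROM claims;
EOF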

Another feature of 7.2 is the ability to build tables and indexes with the UNRECOVERABLE option. There are two major benefits to this option. Firstly, times for building objects are faster because Oracle no longer has to write the new data into the redo logs (when a block is written to a redo log, Oracle must wait for confirmation that the data got there). Secondly, if you are running your database in archive log mode, all other transactions are still logged. The disadvantage of this option is that once your build is complete you have NO ability to recover the table if the tablespace needs recovery; once the build is complete you MUST take a backup of your database. If the backup fails, you can't recover your database from an earlier backup and roll through the logs (Churchill NEVER switches off archive logging and can therefore roll forward from any backup taken months earlier). There has been an undocumented/unsupported way of turning off redo logging at the database level since Oracle 7.0, but since there was no way to recover after a crash it was not worth the performance increase. Using UNRECOVERABLE we have seen improvements of between 10 and 40 percent; the key is whether your redo logs are on their own disk, and if they are we only see a 10 percent improvement. The UNRECOVERABLE option is useful for duplicating temporary data and for new database builds, but I feel its use would be limited on production databases.
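And a sketch of an UNRECOVERABLE index build (again the names are invented and keyword placement may vary between 7.x releases); remember to take a backup as soon as the build is done:

sqlplus system/manager <<EOF
CREATE INDEX claims_new_idx ON claims_new (policy_no)
    TABLESPACE claims_idx
    PARALLEL (DEGREE 4)
    UNRECOVERABLE;
EOF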

(Table: times for copying a 2 GB table.) As you can see, it was possible to copy a full 2 gigabyte table in fifteen minutes.


   

Conclusions   

As a DBA you will need to consider which of these techniques to make use of, depending on the resources available, and you should also evaluate some of the risks involved. All of these techniques will speed up your re-organisations, but they need more planning, performance and task monitoring, and good decision making if errors occur. Good system administration skills may be needed to monitor the machine's resources. Notes should be taken of any changes made during the build work so parameters can be set back to their original values once complete. Another consideration is that your recovery may be slower if you hit certain problems: for example, you can speed your import with COMMIT=N, but if the import fails you must wait as long as the import had already taken for the rollback to complete. Large redo logs will reduce the delays of checkpointing but could increase database recovery time after a crash. With careful planning and adequate disaster recovery you should be able to dramatically improve your Oracle database re-organisation.
