Nigel Noble's Oracle Blog

08/01/2015

“log file sync” and the MTTR Advisor

Filed under: 11gR2, Performance — Nigel Noble @ 6:12 pm

I recently investigated a performance problem on an Oracle 11.2 OLTP trading system and, although we still don’t fully understand the issue (or which versions of Oracle it affects), I thought I would share what we found and how we found it. We had a hardware failure on the database server; within 30 seconds the database had automatically been restarted on an idle, identical member of the cluster and the application continued on the new database host. A few days later I happened to notice a change in the LGWR trace file: the Log Writer was logging more “kcrfw_update_adaptive_sync_mode” messages than normal.

Adaptive Log Writer

MOS Doc ID 1541136.1 discusses this in more detail, but basically, when “_use_adaptive_log_file_sync” is set to true (the default since 11.2.0.3), Oracle switches between two methods of communication between the LGWR and foreground processes to acknowledge that a commit has completed:

  • Post/wait – traditional method available in previous Oracle releases
    LGWR explicitly posts all processes waiting for the commit to complete.
    The advantage of the post/wait method is that sessions should find out almost immediately when the redo has been flushed to disk.
  • Polling
    Foreground processes sleep and poll to see if the commit is complete.
    The advantage of this new method is that it frees LGWR from having to post the many processes waiting on a commit to complete, thereby reducing LGWR’s CPU usage.

Initially the LGWR uses post/wait and, according to an internal algorithm, evaluates whether polling is better. Under high system load polling may perform better because the post/wait implementation typically does not scale well. If the system load is low, then post/wait performs well and provides better response times than polling. Oracle relies on internal statistics to determine which method should be used. Because switching between post/wait and polling incurs an overhead, safeguards are in place to ensure that switches do not occur too frequently.
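If you want to confirm which value of the hidden parameter is in force, a quick check along the following lines works (run as SYS since it queries the underlying X$ views; this is my own illustration rather than anything taken from the MOS note):

-- Hedged sketch: show the current value of _use_adaptive_log_file_sync (as SYS)
select i.ksppinm   parameter,
       v.ksppstvl  value,
       v.ksppstdf  is_default
from   x$ksppi  i,
       x$ksppcv v
where  i.indx    = v.indx
and    i.ksppinm = '_use_adaptive_log_file_sync';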

All switches are recorded in LGWR’s trace file.
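For reference, the trace file itself is easy to locate; a small illustrative query of my own (the TRACEFILE column of V$PROCESS is available from 11g onwards):

-- Where is LGWR writing its trace?
select spid, tracefile
from   v$process
where  pname = 'LGWR';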

Looking in the Log Writer trace file from before the host fail-over (i.e. on the previous database host), the Log Writer did not switch that often and mostly ran in “post” mode. Since we moved to the new database server it has switched mode much more often and spent much longer in “poll” mode.

*** 2014-11-18 12:48:43.068 kcrfw_update_adaptive_sync_mode: post->poll long#=102 sync#=645 sync=5167 poll=2056 rw=522 rw+=1735 ack=0 min_sleep=2056
*** 2014-11-18 23:01:37.034 kcrfw_update_adaptive_sync_mode: poll->post current_sched_delay=0 switch_sched_delay=1 current_sync_count_delta=320 switch_sync_count_delta=645
*** 2014-11-18 23:16:31.263 kcrfw_update_adaptive_sync_mode: post->poll long#=49 sync#=367 sync=5943 poll=2056 rw=474 rw+=474 ack=0 min_sleep=2056
*** 2014-11-19 01:50:03.342 kcrfw_update_adaptive_sync_mode: poll->post current_sched_delay=0 switch_sched_delay=1 current_sync_count_delta=178 switch_sync_count_delta=367
*** 2014-11-19 02:05:46.387 kcrfw_update_adaptive_sync_mode: post->poll long#=54 sync#=238 sync=9913 poll=3956 rw=1978 rw+=1978 ack=0 min_sleep=2056
*** 2014-11-19 02:48:16.062 kcrfw_update_adaptive_sync_mode: poll->post current_sched_delay=0 switch_sched_delay=1 current_sync_count_delta=102 switch_sync_count_delta=238
*** 2014-11-19 03:32:17.901 kcrfw_update_adaptive_sync_mode: post->poll long#=26 sync#=113 sync=6574 poll=2056 rw=885 rw+=890 ack=0 min_sleep=2056
*** 2014-11-19 05:30:02.634 kcrfw_update_adaptive_sync_mode: poll->post current_sched_delay=0 switch_sched_delay=1 current_sync_count_delta=46 switch_sync_count_delta=113
*** 2014-11-19 11:19:10.203 kcrfw_update_adaptive_sync_mode: post->poll long#=51 sync#=378 sync=6465 poll=2056 rw=464 rw+=464 ack=0 min_sleep=2056

Although I was not aware of any particular problems, I was intrigued, so I decided to run one of my favourite scripts to get a general overview of wait performance.

AWR Wait Histogram data:

I have a script (awr_wait_histogram.sql) which is based on the AWR wait histogram data. I prefer using this script to the AWR reports because it gives a fast way to see database wait event performance over many AWR snapshots (we snap AWR every 15 minutes), and it allows you to spot sudden shifts between the wait time buckets. I looked at “log file sync” and “log file parallel write” times. The DB restarted at 03:00 on 18th November.
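My actual script isn’t reproduced here, but a minimal sketch in the same spirit would look something like the following (my own aliases and formatting; note that WAIT_COUNT in DBA_HIST_EVENT_HISTOGRAM is cumulative since instance startup, so each snapshot is compared with the previous one and the delta goes negative across a restart):

-- Per-snapshot wait counts by histogram bucket for the redo-related events
select s.snap_id,
       to_char(s.end_interval_time, 'DD-MON-YY HH24:MI')  snap_time,
       h.event_name,
       h.wait_time_milli,
       h.wait_count
         - lag(h.wait_count) over
             (partition by h.dbid, h.instance_number, h.event_name, h.wait_time_milli
              order by h.snap_id)                          waits_in_snap
from   dba_hist_event_histogram h,
       dba_hist_snapshot        s
where  h.snap_id         = s.snap_id
and    h.dbid            = s.dbid
and    h.instance_number = s.instance_number
and    h.event_name      in ('log file sync', 'log file parallel write')
order  by s.snap_id, h.event_name, h.wait_time_milli;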

Looking at the actual physical write of the redo log file (log file parallel write) there is no obvious shift in performance.

[Figure mttr_1: AWR wait histogram for “log file parallel write” around the restart]

But when we look at “log file sync”, as the database becomes busy during the day we start to see a large shift in wait time, with lots of waits in the 128 – 256ms range. I can only show a short period of time in these screenshots, but we have the advantage that we can compare many days either side of the db restart.

[Figure mttr_2: AWR wait histogram for “log file sync” around the restart]

When we looked at all the redo-related wait events we found a sudden shift in the performance of the “LGWR wait for redo copy” wait event shortly after the database restarted.

(data filtered hourly in this screenshot; the 03:00 AWR snapshot is missing due to the restart)

[Figure mttr_3: AWR wait histogram for “LGWR wait for redo copy”, hourly]

So we know “LGWR wait for redo copy” changed shortly after the database was restarted on the new host, but the first switch into the Adaptive Log Writer’s poll mode did not occur until many hours later. It is therefore reasonable to assume that the post/poll mode changes were caused by the slower Log Writer, rather than the poll mode causing the change in “LGWR wait for redo copy”.

Are the hosts really the same?

We checked all the obvious things for differences between the two servers.

Same CPUs running at the same clock speed?… YES
Is the database still using Huge Pages?… YES
Same Oracle patches (OPatch inventory)?… YES
Same OS patches etc.?… YES

Have we seen this before?

Our site feeds all its key production raw AWR snapshot data into a central repository. The AWR table data is pulled every 15 minutes (over db links) from the live databases and held for more than a year in a single repository, which allows us to profile trends over a longer period of time. I profiled all the past database restarts (DBA_HIST_DATABASE_INSTANCE) against the “LGWR wait for redo copy” wait event. I was surprised to find we had seen this issue twice before, but on a completely different database. The other odd behaviour is that both times we had seen the problem it resolved itself approximately 20 days later: after a database restart the “LGWR wait for redo copy” degrades, and then 20 days later the wait histogram data shows it returning to normal.
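Past restarts are simple to list from the same AWR data; a hedged example of the kind of query involved (standard DBA_HIST table, my own column selection):

-- One row per instance startup recorded in the AWR history
select db_name, instance_name, host_name, version, startup_time
from   dba_hist_database_instance
order  by startup_time;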

LGWR wait for redo copy

My understanding is as the name suggests… “LGWR wait for redo copy” is the Log Writer process waiting to get hold of the “redo copy” latch. User sessions take hold of the latch before filling the log buffer, and the Log Writer takes the latch when it empties the buffer and writes the contents to disk. We started to take a look at the “redo copy” latch. We checked that the usage of the “redo copy” child latches was evenly spread across all the available child latches (which it was; there is a known bug related to new sessions using just the first latch). We also took some samples of v$latchholder looking for any session holding latches for a long time (we were looking for something holding “redo copy” for approximately 200 – 300ms).
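The child latch distribution check was along these lines (a sketch of my own; since foreground sessions mostly take “redo copy” in no-wait mode, IMMEDIATE_GETS is the column that shows the spread):

-- How evenly is work spread across the redo copy child latches?
select child#, gets, misses, immediate_gets, immediate_misses, sleeps
from   v$latch_children
where  name = 'redo copy'
order  by child#;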

Since we were looking for ~200ms holds, I just wrote a very crude script in SQL*Plus: a select from v$latchholder joined to v$session followed by a 50ms sleep, plus another script which simply keeps calling the first script and logging the output.
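The sampler was roughly along the following lines (an illustrative sketch rather than the exact script; the driver was just another script calling this one in a loop and spooling the results):

-- latch_holder_sample.sql (illustrative name): one sample of current latch
-- holders, then sleep ~50ms before the calling script samples again.
set lines 200 pages 100
select current_timestamp, s.sid, s.sql_id, s.machine,
       h.laddr, h.name, h.gets
from   v$latchholder h,
       v$session     s
where  s.sid = h.sid
order  by s.sid;

exec dbms_lock.sleep(0.05)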

Session SID 3529 looks interesting in this sample: it is holding the “redo copy” latch and the GETS value did not change for at least 220ms. We were able to confirm that the latch was being held like this by lots of different users and also on different child latches (LADDR).

CURRENT_TIMESTAMP                     SID SQL_ID        MACHINE         SQL_ID        LADDR            NAME                  GETS
------------------------------ ---------- ------------- --------------- ------------- ---------------- --------------- ----------
27-NOV-14 05.16.12.241113 PM +       3529 3nxkqz15y126c xxxxxxxxxxxxxxx 3nxkqz15y126c 0000001B8B412880 redo copy         98609934
27-NOV-14 05.16.12.241113 PM +       3529 3nxkqz15y126c xxxxxxxxxxxxxxx 3nxkqz15y126c 0000001B9B03E7A8 sim partition l    1647931
27-NOV-14 05.16.12.241113 PM +        191 47p2818jn0qjy xxxxxxxxxxxxxxx 47p2818jn0qjy 0000001BA64F9D30 cache buffers c  459154930

CURRENT_TIMESTAMP                     SID SQL_ID        MACHINE         SQL_ID        LADDR            NAME                  GETS
------------------------------ ---------- ------------- --------------- ------------- ---------------- --------------- ----------
27-NOV-14 05.16.12.296109 PM +       3529 3nxkqz15y126c xxxxxxxxxxxxxxx 3nxkqz15y126c 0000001B8B412880 redo copy         98609934
27-NOV-14 05.16.12.296109 PM +       3529 3nxkqz15y126c xxxxxxxxxxxxxxx 3nxkqz15y126c 0000001B9B03E7A8 sim partition l    1647931

CURRENT_TIMESTAMP                     SID SQL_ID        MACHINE         SQL_ID        LADDR            NAME                  GETS
------------------------------ ---------- ------------- --------------- ------------- ---------------- --------------- ----------
27-NOV-14 05.16.12.351009 PM +       3529 3nxkqz15y126c xxxxxxxxxxxxxxx 3nxkqz15y126c 0000001B8B412880 redo copy         98609934
27-NOV-14 05.16.12.351009 PM +       3529 3nxkqz15y126c xxxxxxxxxxxxxxx 3nxkqz15y126c 0000001B9B03E7A8 sim partition l    1647931

CURRENT_TIMESTAMP                     SID SQL_ID        MACHINE         SQL_ID        LADDR            NAME                  GETS
------------------------------ ---------- ------------- --------------- ------------- ---------------- --------------- ----------
27-NOV-14 05.16.12.405915 PM +       3529 3nxkqz15y126c xxxxxxxxxxxxxxx 3nxkqz15y126c 0000001B8B412880 redo copy         98609934
27-NOV-14 05.16.12.405915 PM +       3529 3nxkqz15y126c xxxxxxxxxxxxxxx 3nxkqz15y126c 0000001B9B03E7A8 sim partition l    1647931
27-NOV-14 05.16.12.405915 PM +        191 47p2818jn0qjy xxxxxxxxxxxxxxx 47p2818jn0qjy 0000001BA64F9D30 cache buffers c  459155521
27-NOV-14 05.16.12.405915 PM +       6369 4yw5na4anwtbq xxxxxxxxxxxxxxx 4yw5na4anwtbq 0000001B8B410870 redo copy         94583224

CURRENT_TIMESTAMP                     SID SQL_ID        MACHINE         SQL_ID        LADDR            NAME                  GETS
------------------------------ ---------- ------------- --------------- ------------- ---------------- --------------- ----------
27-NOV-14 05.16.12.461621 PM +       3529 3nxkqz15y126c xxxxxxxxxxxxxxx 3nxkqz15y126c 0000001B8B412880 redo copy         98609934
27-NOV-14 05.16.12.461621 PM +       3529 3nxkqz15y126c xxxxxxxxxxxxxxx 3nxkqz15y126c 0000001B9B03E7A8 sim partition l    1647931

Start: 27-NOV-14 05.16.12.241113
End  : 27-NOV-14 05.16.12.461621

220ms example

“sim partition latch”

Looking at the various samples, we found that the slow samples always seemed to also involve the session holding the “sim partition latch”, which likewise had a static GETS value, suggesting the same latch was held throughout this time. I asked an Oracle contact what the “sim partition latch” related to and he said MTTR (Mean Time To Recover).

A few days later I checked my wait histogram data for “LGWR wait for redo copy” and the problem had stopped.

mttr_4

So, just like the problem I had spotted in last year’s AWR data from another database, the problem started after a db restart and a number of days later (11 in this case) it stopped and performance returned to normal. The instance had not been restarted, the workload had not changed, and yet the “LGWR wait for redo copy” / “log file sync” times were normal again.

“MTTR?”

Because our site has a large buffer cache and we want to reduce crash recovery times, we set the fast_start_mttr_target parameter, which causes Oracle to enable the MTTR Advisor. What can we find out about MTTR in AWR? I found that the MTTR Advisory data is recorded in the table DBA_HIST_MTTR_TARGET_ADVICE.

select * from DBA_HIST_MTTR_TARGET_ADVICE order by snap_id;

   SNAP_ID       DBID INSTANCE_NUMBER MTTR_TARGET_FOR_ESTIMATE ADVIC DIRTY_LIMIT ESTD_CACHE_WRITES ESTD_CACHE_WRITE_FACTOR ESTD_TOTAL_WRITES ESTD_TOTAL_WRITE_FACTOR ESTD_TOTAL_IOS ESTD_TOTAL_IO_FACTOR
---------- ---------- --------------- ------------------------ ----- ----------- ----------------- ----------------------- ----------------- ----------------------- -------------- --------------------
    117020 2221677383               1                      300 ON        2889429        1007419633                  1.0113        1014534019                  1.0113     2136389504               1.0053
    117020 2221677383               1                      792 ON        7907807         996114435                       1        1003228821                       1     2125084306                    1
    117020 2221677383               1                      696 ON        6928611         996114435                       1        1003228821                       1     2125084306                    1
    117020 2221677383               1                      600 ON        5949415         996114435                       1        1003228821                       1     2125084306                    1
    117021 2221677383               1                       60 ON         441439        1911332814                  1.9112        1918447766                  1.9048
    117021 2221677383               1                      300 ON        2889429        1011713489                  1.0117        1018828441                  1.0116     2145756732               1.0055
    117021 2221677383               1                      792 ON        7907807        1000052428                       1        1007167380                       1     2134095671                    1
    117021 2221677383               1                      696 ON        6928611        1000052428                       1        1007167380                       1     2134095671                    1
    117021 2221677383               1                      600 ON        5949415        1000052428                       1        1007167380                       1     2134095671                    1
    117022 2221677383               1                       60 ON         441439        1923542883                  1.9185        1930752875                  1.9119
    117022 2221677383               1                      300 ON        2889429        1014254275                  1.0116        1021464267                  1.0115
    117022 2221677383               1                      792 ON        7907807        1002647983                       1        1009857975                       1
    117022 2221677383               1                      696 ON        6928611        1002647983                       1        1009857975                       1
    117022 2221677383               1                      600 ON        5949415        1002647983                       1        1009857975                       1
PROBLEM STOPPED at this SNAP in AWR

    117023 2221677383               1                       60 READY      441439        1923816649                  1.9175        1931026734                  1.9109
    117023 2221677383               1                      300 READY     2889429        1014901737                  1.0116        1022111822                  1.0115
    117023 2221677383               1                      792 READY     7907807        1003300690                       1        1010510775                       1
    117023 2221677383               1                      696 READY     6928611        1003300690                       1        1010510775                       1
    117023 2221677383               1                      600 READY     5949415        1003300690                       1        1010510775                       1

Looking at the AWR data we found the MTTR Advisory status changed from ON to READY at the exact time performance returned to normal, so our problem has something to do with the MTTR Advisory being switched ON. We checked the alert log to confirm nobody had made any changes to any parameters on the database: the MTTR Advisory had switched from ON to READY on its own (possibly a separate Oracle bug/issue).
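A quick way to see exactly when the status flipped (again a sketch of my own against the same AWR table) is to group on the ADVICE_STATUS column, which appears truncated to “ADVIC” in the listing above:

-- When did the MTTR Advisor status change between AWR snapshots?
select snap_id, instance_number, advice_status, count(*)  estimates_in_snap
from   dba_hist_mttr_target_advice
group  by snap_id, instance_number, advice_status
order  by snap_id, instance_number;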

We had not seen any performance problems after the previous start of this database (7 months earlier), so I decided to review the alert.log file from that instance restart and found the following error:
“Error 25532 happened during MTTR advisory startup” (“MTTR specified is too large”: the current FAST_START_MTTR_TARGET setting or a candidate MTTR setting is too large for the MTTR advisory).
None of the parameters had been changed, so it looks like Oracle had a problem starting the Advisor, and it seems the previous restart of the database in May had not suffered any performance problem because the MTTR Advisor had actually failed to start. Since the status was OFF, no advisory data at all was found in the AWR repository during this period. It was only when the hardware crashed and the database was restarted that the MTTR Advisor was switched ON and the Log Writer performance degraded. Something about our application profile means we have some inefficiency around the MTTR Advisory, and this causes the “redo copy” latch to be held longer than needed. We are currently working with Oracle Support to try and understand the issue and to raise a possible bug with Oracle development.

Oracle Support did ask us to SQL trace the LGWR process to see what happens when we switch the MTTR Advisor back on. We produced this graph from the extracted wait data. You can see a clear difference, although in this case the traces were done off peak, so the increase in elapsed time is not as large as we see at peak load.

[Figure mttr_ready_on_compare: LGWR wait times extracted from the traces, MTTR Advisor READY vs ON]
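For anyone wanting to gather a similar trace, one common way to enable wait tracing on a background process such as LGWR is via oradebug as SYS. This is a hedged sketch; I am not claiming it is exactly what Oracle Support asked us to run:

-- Find LGWR's OS process id and trace file
select spid, tracefile from v$process where pname = 'LGWR';

-- Then, from the same SYS SQL*Plus session (oradebug commands, not SQL):
-- oradebug setospid <spid from the query above>
-- oradebug event 10046 trace name context forever, level 8
-- ... let the workload run for a while ...
-- oradebug event 10046 trace name context off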

Summary

I hope you find this note interesting. If you are investigating issues with slow “log file sync” and slow commit times then you may also want to check for this issue. The key points to take away are to follow up strange changes in behaviour, and the value of keeping long-term AWR history. I’ll update this note if we get any more information on why the MTTR Advisor is actually slow and why it affects redo performance.


2 Comments »

  1. Thanks for sharing.

    Comment by Anand — 14/01/2015 @ 6:31 pm

  2. […] file sync” and the MTTR […]

    Pingback by Log Buffer #405, A Carnival of the Vanities for DBAs | InsideMySQL — 20/01/2015 @ 4:20 am

