Tuesday, February 21, 2012

Speed Sometimes Means Changing The Rules


Do like Kirk did

How did James Tiberius Kirk beat the Kobayashi Maru? He changed the rules. As performance analysts, Oracle sometimes allows us to change the rules to beat the performance game as well. While relational heresy to some, the commit write facility is an in-your-face way to improve redo-constrained performance.

In a nutshell, the commit write facility is this: when a commit is issued, the client process quickly receives a "commit complete" even though the commit is not really complete... not yet anyway. So there is a possibility of losing committed data.

Recorded in my book, Oracle Performance Firefighting, are the results of some experiments using Oracle's commit write facility on Oracle 10g. Those experimental results showed massive performance gains. However, in Oracle 11g some syntax changes were made. It was getting a little confusing, so I thought it was time for a new round of experiments, and I thought I would post the results for all to enjoy.

It's OK to lose committed data...sometimes

There exist applications where the loss of committed transactions is acceptable. There are also applications where redo-related logging is the key performance constraint. If you can put these two together, Oracle provides a novel solution. Starting in Oracle 10g, there are ways for a client process to receive a "commit complete" when the commit may not have physically occurred... which means committed data could be lost.

While most DBAs will find this unsavory and near heresy, there are others who have discovered a delayed commit, with the unlikely possibility of a lost committed transaction, is just what they need to meet performance objectives. I know multiple DBAs who have proudly stated they have a system using the commit write facility. But to understand why, you need to understand their application.

Consider a social networking application. While very unlikely, suppose there was an outage during the commit related to adding others to your network. Are there legal issues? I doubt it. Has money been lost? Let's hope not. Has someone been hurt? Perhaps you lost some time, but hurt? No. All you end up doing is re-adding the people to your network. It may be irritating, but when you weigh an unlikely interruption plus the performance gains against constantly slower performance or more hardware, there is a compelling business argument for allowing Oracle to say, "Yep, it's committed." when it's not quite on disk just yet.

How to use the commit write facility

The commit write facility can be influenced within the commit statement, by issuing an alter session or alter system, and also by setting the instance parameters commit_wait and commit_logging. In 11g the instance parameter commit_write exists, but the alert log states it has been deprecated. After I define what the various options mean, I'll demonstrate the various ways to enable commit write.

The first set of options relates to how the log writer background process writes. The options are either wait or nowait. As you can guess, the default is wait, which is a synchronous write; that is, the log writer will effectively wait for the IO call to complete before issuing another IO call. As with accessing Oracle memory structures serially, synchronous writes can become a performance issue.

The other option relates to batching redo entries before the log writer physically issues the write. The options are immediate and batch. As you can guess, the default is immediate, which means as soon as the commit is issued, the log writer issues the write (either in wait or nowait mode).

As you would expect, the default mode is wait and immediate. And we would expect the fastest way to complete a commit would be a nowait (i.e., asynchronous) batch write. This is what I checked in my experiments far below.

In both Oracle 10g and 11g, the command line commit options work. If you run the commands below in either Oracle 10g or 11g, the results will look the same. In my experiments, I used the command line options.
SQL> update op_test set object_name='OraPub training rocks!' where object_id=100;
1 row updated.
SQL> commit write immediate nowait;
Commit complete.
SQL> update op_test set object_name='OraPub training rocks!' where object_id=100;
1 row updated.
SQL> commit write batch nowait;
Commit complete.
If this sounds convoluted, it is a little. Especially in 10g, where both parameters were set using a single entry, like "wait,immediate". So in Oracle 11g, commit_write was replaced with the commit_logging and commit_wait parameters. The good news is, your 10g-based applications will still functionally work after you upgrade to 11g, except the commit write facility will not technically be working! While issuing these statements in 11g still results in a statement processed, if you look at the alert log file, you'll see something like this:
COMMIT_WRITE is deprecated,use COMMIT_LOGGING and COMMIT_WAIT instead.
What makes this really confusing (to me anyway) is that the commit_write instance parameter still exists; you can set it and you can see it, but according to the alert.log file it is deprecated! Bottom line: stop using commit_write, which leads us to the changes in Oracle 11g.

So in 11g, there are two new parameters, commit_logging and commit_wait, instead of the single commit_write parameter. And you can still set them via alter session, alter system, or the instance parameter file. For example:

SQL> show parameter commit_wait

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
commit_wait                          string      WAIT

SQL> show parameter commit_logging

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
commit_logging                       string      IMMEDIATE
SQL> alter system set commit_wait=nowait;

System altered.

SQL> alter system set commit_logging=batch;

System altered.

SQL> show parameter commit_wait

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
commit_wait                          string      NOWAIT
SQL> show parameter commit_logging

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
commit_logging                       string      BATCH
SQL> 
While I issued the above commands, this is what occurred in the alert.log file.

[oracle@fourcore trace]$ pwd
/home/oracle/admin/diag/rdbms/prod18/prod18/trace
[oracle@fourcore trace]$ tail -f alert_prod18.log

Mon Feb 20 09:58:43 2012
ALTER SYSTEM SET commit_wait='NOWAIT' SCOPE=MEMORY;
Mon Feb 20 09:58:54 2012
ALTER SYSTEM SET commit_logging='BATCH' SCOPE=MEMORY;
If you have been using the instance parameter commit_write="batch,nowait" in 10g, that should be replaced with the commit_logging=batch and commit_wait=nowait parameters.
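If you only want this behavior for a specific session rather than the entire instance, both parameters can also be set at the session level. Here is a minimal sketch, assuming a session that should mimic the old "batch,nowait" behavior:

SQL> alter session set commit_logging=batch;

Session altered.

SQL> alter session set commit_wait=nowait;

Session altered.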

Interestingly, on Linux running Oracle 11.2 and tracing the log writer using the Linux strace command, regardless of the command line commit write settings, the log writer always issued the pwrite64 system call. The async IO calls (e.g., io_setup, io_submit, aio) were not used. Also interesting: with the batch/nowait option only, I consistently observed a delay of a second or more before the write was issued. I tested this a few times and it always occurred.

Why use the commit write facility

The log file sync wait event is posted by a server process (the only time I've seen it) after it has issued a commit, is not consuming CPU, and is waiting for the OK to continue processing the commit.

It's important to understand that when a server process commits, both the server process and the log writer engage in a series of very serial and very coordinated/controlled steps to ensure the committed data properly resides in an on-line redo log. Therefore, for every commit, CPU must be consumed, IO writing must occur, and serialization must be controlled. This allows for either a CPU, IO, or process serialization issue (think: single core busy while other cores are not). This is one of the reasons why DBAs have trouble diagnosing and solving log file sync issues.
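One quick way to see how these two players relate on your own system is to compare the server processes' log file sync waits with the log writer's log file parallel write waits. This is just a sketch using the cumulative, since-instance-startup values in v$system_event (the experimental scripts further below work with interval deltas instead):

select event,
       total_waits,
       round(time_waited_micro / 1000 / nullif(total_waits, 0), 2) avg_wait_ms
  from v$system_event
 where event in ('log file sync', 'log file parallel write');

If the average log file sync wait is much larger than the average log file parallel write, the server processes are spending much of their commit time on something other than the physical redo write itself (scheduling, CPU queuing, and the commit handshake).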

I'd say that 99% of log file sync issues are the result of a commit occurring after each DML statement. For an extreme example, think of an insert and a commit together in a plsql loop. As I researched and wrote about in my blog entries (4 in total) about log file sync and commit times, simply batching or grouping the DML before a commit (even a batch of two!) makes a massive performance difference.
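To make the contrast concrete, here is a minimal sketch of the two patterns. The table op_test2 and the row counts are made up purely for illustration; the point is simply where the commit sits relative to the DML:

-- Pattern 1: commit after every insert. Every iteration pays the
-- full commit price, so log file sync time accumulates row by row.
begin
  for i in 1 .. 10000 loop
    insert into op_test2 (object_id) values (i);
    commit;
  end loop;
end;
/

-- Pattern 2: batch the DML and commit every 100 rows. Far fewer
-- commits means far less log file sync time for the same work.
begin
  for i in 1 .. 10000 loop
    insert into op_test2 (object_id) values (i);
    if mod(i, 100) = 0 then
      commit;
    end if;
  end loop;
  commit; -- pick up any remaining uncommitted rows
end;
/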

This is why, when you see a log file sync issue, think application code first. Look for a situation where there is a commit (implicit or explicit) following each DML statement.
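A rough way to spot the pattern without digging through the application code is to compare how many statement executions occur per commit. These v$sysstat values are cumulative since instance startup, so treat the result as a hint rather than a verdict (the experimental harness below works with interval deltas), but a ratio close to 1 usually means a commit is following nearly every statement:

select round(e.value / nullif(c.value, 0), 1) executions_per_commit
  from v$sysstat c, v$sysstat e
 where c.name = 'user commits'
   and e.name = 'execute count';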

There are usually a number of options to resolve this, but one way is to cheat! Let Oracle do the batching of commits instead of the application, or let Oracle ease up on the serialization control a bit. This is what the commit write facility allows us to do! The batch option (vs immediate) allows Oracle to batch redo activity so it doesn't have to issue a write immediately after a commit. (Though it appears, based on the CPU bottleneck experiment below, the application change strategy has a greater effect.) And the nowait option (vs wait) allows for an async write, which immediately returns an "OK, your committed data is safely on an on-line redo log" when it may not quite be on disk yet; if there is a failure, you may lose that data.

But what about performance impact? Read on...

Experiment Introduction

This was a rather involved experiment because I created two loads (CPU and IO write) for each of the four commit write options. The top wait event, log file sync, was easily created by enclosing a DML statement and a commit within a simple plsql loop. And it's this commit statement that contained one of the four commit write options.

Each sample was 60 or 90 seconds, and 30 or 32 samples were collected for each load and commit write option. This means over 240 samples were collected.

You can download the 8.5MB experimental pack HERE, which includes all the scripts, notes, and results in both PDF and Mathematica notebook format. There is much more interesting data, charts, and comments available than I have posted.

Remember, while I mention and link to specific scripts below, they are all included in the experimental pack zip file.

Here's a common situation: According to the users, the entire application is unacceptably slow. The top wait event is log file sync. During daytime processing, there is a clear OS CPU bottleneck and during the evening and nighttime processing there is a clear IO bottleneck.

Interestingly, when the top wait event is log file sync there can be either a CPU or an IO bottleneck. As the experiments below will clearly demonstrate, we need to focus on minimizing time: both CPU consumption time and wait time. Focusing on only one can lead to suboptimal results.

Because with log file sync the system can be constrained by either a CPU or IO bottleneck, I ran the experiment for both loads.

Experimental Setup

Simulating five concurrent sessions, there is a load script (dml1.sql (CPU) or dml2.sql (IO)) which is run in the background five times. These two short plsql scripts are very different and you may want to quickly review them. Here is the actual dml2.sql script:

-- This is the dml2.sql SQL
--
def startv=&1
def endv=&2
def wait=&3
def batch=&4
def opt='write &wait &batch';
begin
  loop
    -- Update a random row within this session's assigned object_id range.
    update op_test
    set object_name='kldfkjldfkjdfkjdfakl;dakldkldkldsfakjldfkjldfadkldakldfksljdfkjldfskjldfsk;l'
    where object_id=trunc(dbms_random.value(&startv,&endv));
    -- Commit using the commit write options passed in, e.g., "write nowait batch".
    commit &opt;
  end loop; -- intentionally infinite; the load sessions are stopped when the sample run is complete
end;
/
Here is how I used the dml2.sql script to place a log file sync /  IO write load on the system using the nowait/batch option.

wait=nowait
batch=batch
sqlplus tester/jester @./dml2.sql     1  5000 $wait $batch &
sqlplus tester/jester @./dml2.sql  5500 10000 $wait $batch &
sqlplus tester/jester @./dml2.sql 10500 15000 $wait $batch &
sqlplus tester/jester @./dml2.sql 15500 20000 $wait $batch &
sqlplus tester/jester @./dml2.sql 20500 25000 $wait $batch &
sqlplus tester/jester @./dml2.sql 25500 30000 $wait $batch &
In another session I collect the performance data. It's pretty straightforward: gather the initial performance data, sleep x seconds, gather the final data, compute the differences, store the differences (in the op_results table), and repeat 30 or more times. The script is a little long to paste, but CLICK HERE and you can view it.
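The real collection script is in the experimental pack, but here is a minimal sketch of the idea. The op_results columns and the specific views used here are my assumptions for illustration (user commits from v$sysstat, DB CPU from v$sys_time_model, and the log file sync time from v$system_event), and the collecting session needs select privileges on those v$ views:

declare
  t0_commits number;  t1_commits number;
  t0_cpu     number;  t1_cpu     number;
  t0_lfs     number;  t1_lfs     number;
begin
  -- Initial snapshot of the cumulative statistics.
  select value into t0_commits from v$sysstat        where name      = 'user commits';
  select value into t0_cpu     from v$sys_time_model where stat_name = 'DB CPU';
  select time_waited_micro into t0_lfs
    from v$system_event where event = 'log file sync';

  dbms_lock.sleep(90);  -- the sample duration

  -- Final snapshot.
  select value into t1_commits from v$sysstat        where name      = 'user commits';
  select value into t1_cpu     from v$sys_time_model where stat_name = 'DB CPU';
  select time_waited_micro into t1_lfs
    from v$system_event where event = 'log file sync';

  -- Store the interval deltas (column names assumed), converting microseconds to seconds.
  insert into op_results (commits, cpu_sec, lfs_sec)
  values (t1_commits - t0_commits,
          (t1_cpu - t0_cpu) / 1000000,
          (t1_lfs - t0_lfs) / 1000000);
  commit;
end;
/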

The experiment was run using Oracle 11.2 on a Dell single CPU 4 core server running Linux version 2.6.18-164.el5PAE (mockbuild@ca-build10.us.oracle.com) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-46)).

Now let's take a look at the actual results!

Experimental Results: CPU Bottleneck

First I need to mention that ALL the results and lots of notes reside in the Mathematica Notebook. You can view the 3.2MB PDF file here. For the CPU bottleneck experiment the load was placed on the system using the dml1.sql script.

From a broad perspective,  here are the results:

There were 32 ninety-second samples gathered for each of the four settings. Here is a short description of the table columns:
Setting is the commit write setting.
Commit/sec is the average workload expressed in commits.
CPU % is the average OS CPU utilization percentage.
log file sync Wait Time % is the average log file sync wait time when compared to all the other non-idle wait time.
log file par write Wait Time % is the average log file parallel write wait time when compared to all the other non-idle wait time.

From a raw "did performance get better" perspective, the results are disastrous. Specifically, the workload rate only slightly increased, and our users would probably not notice (or not tell us about) a difference. But from a performance analysis perspective, there are deep insights to be gleaned!

The above histogram shows the difference between our four sample sets. It's a histogram, so it shows the number of occurrences (vertical axis) vs CPU time plus non-idle wait time per commit. (Because it's smoothed, the vertical axis values are not the actual number of occurrences.) The main separation between the histograms occurs when using the nowait option, that is, allowing the log writer to write asynchronously. But this can be misleading, because while the time looks much better and is statistically significant, the difference is only around 2%!

The good news is we essentially removed the log file sync wait time by using the "nowait" option. While we did very effectively alter the wait time situation (below: Avg Wt), wait time makes up a very small portion of the total time (below: Avg Rt), so nearly eliminating the wait time did not improve the performance situation much. To better understand this, we need to look at the performance situation from a different perspective.

The statistics shown in the table above are in a unit of work time based analysis format. This allows us to unite Oracle performance analysis with Operations Research queuing theory. There are many references about this: my Advanced Analysis class, the last chapter of my Oracle Performance Firefighting book, conference presentations, white papers, and my blog postings. Let me explain the strange looking columns from the above table (a small query sketch follows the definitions).

Avg L is the workload, which in queuing theory is represented by the Greek letter lambda, hence L.
Avg CPUt is the CPU time per commit. This is an approximation, derived as the total instance CPU time divided by the total commits that occurred over the sample period (in this case, 90 seconds). The more singular the bottleneck, the better the approximation. This is also called the service time.
Avg Wt is the non-idle wait time per commit. This is an approximation, derived as the total instance non-idle wait time divided by the total commits that occurred over the sample period (again, 90 seconds). The more singular the bottleneck, the better the approximation. This is also called the queue time.
Avg Rt is the response time, which is simply the sum of CPUt and Wt or, in queuing theory terms, service time plus queue time. Keep in mind, response time is the time to process a single unit of work, which in this case is a commit. Elapsed time, on the other hand, is the time it takes to run a SQL statement, process a group of SQL statements, run a process, etc... many pieces of work.
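Using the assumed op_results columns from the collection sketch earlier (commits, cpu_sec, plus a total non-idle wait column I'll call wait_sec), these averages could be computed with a query along these lines for a set of 90 second samples:

select round(avg(commits / 90), 1)                          avg_l_commits_per_sec,
       round(avg(cpu_sec  * 1000 / commits), 3)             avg_cpu_t_ms_per_commit,
       round(avg(wait_sec * 1000 / commits), 3)             avg_w_t_ms_per_commit,
       round(avg((cpu_sec + wait_sec) * 1000 / commits), 3) avg_r_t_ms_per_commit
  from op_results;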

If you look at the above table closely, you'll notice the CPU time per commit is a massive part of the response time (CPU + wait time) AND the commit write options did not change the CPU time component much. Regardless of the workload intensity, async IO, or batching the commits, it still took pretty much the same amount of CPU. Since CPU consumption is what drives this response time, our small CPU consumption drop didn't do much for performance. Yes, statistically the CPU decreased but not enough to make a real business difference.

We did see wait time percentage improvements (reductions) when comparing immediate to batch. If you look closely at the table above, there is about a 2% wait time drop when changing from the default of wait immediate to wait batch, but the massive percentage drop occurred when going from nowait immediate to nowait batch. Because the wait time is so low, the 85% wait time reduction (1.3 to 0.2) doesn't amount to much. However, if wait time were the majority of the response time, it would have indeed made a difference!

When using the batching option, I expected to see a greater reduction in the CPU consumption per commit. My guess is the CPU load was so intense, the next Oracle issue was also CPU related, and since a commit was still occurring after each DML statement, Oracle just couldn't gain significant CPU efficiencies.

Focusing on the wrong thing...

This is a wonderful example of how focusing only on wait time can get you into trouble! We need to focus on reducing total time and the time per unit of work. So while our very specific strategy of reducing log file sync wait time worked beautifully, because the response time was more dependent on CPU consumption rather than wait time, the net performance gain was minimal.

Going deeper...

While it did not occur in this particular experiment, it's important to understand that even a 5% drop in the service time (i.e., CPU consumption per unit of work) can make a significant performance improvement. This is especially true when the system is operating deep in the "elbow" of the response time curve. A drop in the service time (CPU time per unit of work) shifts the response time curve down and to the right. As a result of this shift, a 5% drop in service time can result in a much larger drop in response time. And since elapsed time (i.e., the time to run a SQL statement/process/etc.) is equal to the amount of work to be processed multiplied by the response time (time/work), a small drop in the service time can result in a much larger proportional drop in the elapsed time! I blogged about this in my Anticipating SQL Elapsed Times posting.
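As a rough illustration, using a simple single-queue, single-server response time model (not the multi-CPU model from the experiment), suppose the service time S is 10 ms and the arrival rate is 95 commits per second, which puts the system deep in the elbow:

   R = S / (1 - lambda*S)
   S = 10.0 ms, lambda = 95/sec  ->  R = 10.0 / (1 - 0.950)  = 200 ms
   S =  9.5 ms (a 5% drop)       ->  R =  9.5 / (1 - 0.9025) ~  97 ms

In this hypothetical case, a 5% service time reduction cuts the response time by more than half.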

While this is out of scope for this posting, if you want to watch the response time curve change (before your eyes) when the service time decreases, download the RT Explore interactive Mathematica tool. It's pretty cool.

What does this mean in a real production system?
  1. Batch DML to reduce the likelihood of accumulating harmful log file sync time in the first place (see blog post here).
  2. Using the commit write facility can dramatically reduce log file sync wait time, which can yield similarly dramatic performance improvements.
  3. If the log file sync time is a small part of the response time, don't expect a large performance improvement by reducing or even eliminating it. This is what occurred in this experiment.
This experiment was a CPU bottleneck with log file sync as the top wait event. It's also common to see an IO write bottleneck when log file sync is the top wait event. Hence the next experiment...

Experimental Results: IO Bottleneck

First I need to mention that ALL the results and lots of notes reside in the Mathematica Notebook. You can view the 3.2MB PDF file here.

From a broad perspective,  here are the results:
From a raw "did performance get better" perspective, the results are amazing! Over a 25X workload improvement is pretty good, eh? And if you read on, you'll notice that when using the "nowait" options, the CPU and non-idle wait time to process a commit dropped by 96%. Wow! Let's take a closer look to glean some deeper performance insights.

The smoothed histogram above, like the one in the CPU bottleneck situation, is based on the CPU time plus the non-idle wait time per commit for each of our samples. While we can't really tell if the difference will impact performance, we can obviously see there is a very significant difference between the wait and nowait options.

By using the "nowait" options, we removed the log file sync wait time and shifted the processing focus (percentage-wise) from IO to CPU. We can see this in the table below. Click on the table so it's humanly readable.
The statistics are in a unit of work time based analysis format, which I detailed in the CPU bottleneck section above.

First, look at the split between the CPU consumed per commit (Avg CPUt) and the non-idle wait time per commit (Avg Wt). In the CPU bottleneck situation, most of the time was related to CPU consumption, so eliminating the wait time had little impact on performance. But in this IO bottleneck situation, the situation is reversed, with the majority of the commit time being wait time... and specifically log file sync! In fact, using the wait/immediate option (the Oracle default), 84% of the commit time was related to log file sync.

Since there are plenty of CPU resources available, all CPU cores are being used (I checked), commit time is heavily wait time dependent, and the wait event is log file sync, we can reduce the wait time and increase performance using the commit write facility! Of course, not committing after each DML statement would also help, but the commit write facility provides a non-application-centric solution, that is, an Oracle-centric solution.

Why did the commit write facility work, and in particular the nowait option? Because the log file sync time occurs when the server process is not consuming CPU but waiting on something to complete. Waiting on what? Predominantly the log writer writing to disk! By releasing the log writer from writing synchronously, the server process receives the "commit complete" much quicker, the log file parallel write is "complete", and the server process can continue processing. As with the CPU bottleneck, log file sync wait time was virtually eliminated. The big difference in this case is that the commit time was highly log file sync dependent, so when this time was eliminated, only CPU time was left to complete the commit.

Oracle was also able to gain some efficiencies, which decreased the CPU consumption per commit by around 48%. The drop in CPU requirements and the drop in wait time allowed the system to absorb more work. In the experiment, the workload increases until the system stabilizes, and in this case it stabilized at a staggering 2400% increase.

It's important to understand that when the commit intensity increases, the time to process each commit may increase. This is another reason why a 2400% workload increase is so amazing: the time to process a commit did not remain the same or even increase... it decreased by around 96%. So we witnessed the commit time decreasing while the number of commits per second increased!
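Those two numbers also hang together: a 96% drop in the time to process a commit means each commit takes roughly 1/25th of the time it used to (1 / (1 - 0.96) = 25), which lines up with the roughly 25X (that is, 2400% increase) workload the system was able to absorb.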

Going deeper...

Understanding the relationship between work and time is fascinating, and it is what unit of work time based analysis is all about. If you look closely at the very bottom of the Mathematica notebook, you'll notice I developed a formal mathematical model relating time and work. I was then able to set the workload for all four commit write combinations to that of the default (wait/immediate) and then fairly compare their response times, that is, the time involved to complete a single commit. Even better would have been to create their respective response time graphs, but I'll save that for another posting!

What does this mean in real production systems?
  1. Batch DML to reduce the likelihood of accumulating harmful log file sync time in the first place. (see blog post here).
  2. Using the commit write facility can dramatically reduce log file sync wait time, which can yield similarly dramatic performance improvements. That is clearly what occurred in this experiment.
  3. If the log file sync time is a large part of response time, expect a large response time decrease, a large workload increase, or a mix of the two. In this experiment we saw both a large response time decrease and a large workload increase.
Summary

While there is lots of detail and analysis in this posting, it can be distilled down to a few important points.

1. Batch DML to reduce the likelihood of accumulating harmful log file sync time.

2. If the top wait event is log file sync and its wait time is a significant part of the total time (CPU + wait time), then using the commit write facility can yield impressive performance gains measured by both a decrease in response time and an increase in workload (i.e., arrival rate).

3. The 10g commit_write instance parameter has been replaced in 11g with both the commit_logging and the commit_wait parameters. The parameters can be altered in the instance parameter file or by issuing an alter session or alter system. And finally, the commit command line options remain the same.

4. Never use the commit write facility where losing committed data is not acceptable. And always get this in writing! (Or make sure your resume is updated.)

Thanks for reading!

Craig.

2 comments:

  1. Sir OraPub!

    When is losing data ever 'ok'?

    I know, of course, you're not advocating data loss in any way/shape/form, but even in silly 'working-out-of-our-garage' businesses, no one will 'fess up to being willing to lose data.

    This, plus the fact that bottlenecking on 'log file sync' is more likely on big data/high usage systems makes 'written' approval unlikely.

    How 'bout you whip out the Shalla-Hammer on this, and give us a *predictive* model for the chances of data loss given the system component MTBFs which we can balance against recovery/redo mechanisms in place, and then run that against whatever business SLA is in force? :-)

    That would be great for your next European tour!

  2. Hi Craig,

    Today I spotted a "production example" which can be found at

    http://oaktable.net/content/commitwait-and-commitlogging

    Keep up the good posts, they are inspiring.
