5

An architect in our company has designed a solution based on 64 bit SQL2005 Standard edition synchronous mirroring between a physical (4 quad core, 32GB RAM) server and a virtual DR server (4 virtual CPUs with 16GB RAM) in two geographically remote data centers with a witness server (1 virtual CPU). Storage is Enterprise class SAN in both data centers.

The front end app is web facing with mixed read/write usage.

As a DBA (who wasn't consulted at the design stage), I'm worried that this configuration was designed with minimizing redundancy as the main criterion and that it won't work as a real-world solution: won't network latency and the performance of the virtual box cause unacceptable response times - and even poorer performance if a failover is invoked?

Does anyone have experience of a similar set up?

SuperCoolMoss
  • 1,252
  • 11
  • 20

4 Answers

5

Microsoft published a really good whitepaper on database mirroring that includes some good examples on how much performance impact you get from synchronous mirroring. You're totally right in that there's going to be a performance hit. Do a ping from the primary box to the database mirror and look at the round trip times in milliseconds: that's going to be the absolute bare minimum overhead that synchronous mirroring will add. The ping doesn't even take into account how long the remote server would take to handle each incoming transaction - it's purely network latency time.
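To put a rough number on that overhead (a back-of-envelope sketch with illustrative figures, not a measurement of any real system), the round-trip time puts a ceiling on how many synchronous commits a single session can push per second:

```python
def max_commits_per_second(rtt_ms, local_commit_ms=1.0):
    """Upper bound on serial commits/sec for one session when every commit
    must wait one network round trip to the synchronous mirror.
    local_commit_ms is an assumed figure for the local commit work."""
    per_commit_ms = local_commit_ms + rtt_ms
    return 1000.0 / per_commit_ms

# e.g. a 20 ms round trip between the two data centers
print(round(max_commits_per_second(20), 1))  # ~47.6 commits/sec per session
```

Concurrent sessions can overlap their waits, so aggregate throughput can be higher, but no single connection can commit faster than the link lets it.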

The more network latency you add, the slower performance goes, and the hardware just sits idle:

(Chart from the whitepaper showing throughput falling off as latency increases: http://i.technet.microsoft.com/Cc917681.dbm_fig09(en-us,TechNet.10).gif)

I'm a really big fan of asynchronous mirroring because it's an easy way to add some protection, but the mirror can fall behind. That's both a good thing and a bad thing: good because it tolerates network latency, bad because you can lose any data that hasn't made it over to the failover site.

Also, when you're designing database mirroring solutions (whether synch or asynch) make sure to think about your index maintenance operations. If you do index rebuilds weekly, those will absolutely kill your mirroring backlogs because they produce so much logged activity that has to go over the wire.

Brent Ozar
  • 4,425
  • 17
  • 21
  • 1
    Thanks for the links Brent. Client wants zero data loss - so can't use asynchronous. Will reorganise indexes with a higher level of fragmentation daily - hopefully at a time of low transactional activity. – SuperCoolMoss Jun 13 '09 at 19:34
5

Although network bandwidth comes into play in a big way, the absolute number one factor to consider is the transaction log generation rate on the principal.

If the app and your maintenance don't generate any transaction log, then network bandwidth is really irrelevant. If they do generate log, then network bandwidth has to be able to handle the amount of log generated.
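A quick sanity check of the kind described above (all numbers hypothetical - measure your own log generation rate and link capacity) is to compare log throughput against usable bandwidth:

```python
def link_can_keep_up(log_gen_mb_per_sec, link_mbit_per_sec, usable_fraction=0.7):
    """True if the mirroring link can carry transaction log as fast as the
    principal generates it. usable_fraction discounts protocol overhead and
    competing traffic (an assumed figure - tune it for your network)."""
    usable_mb_per_sec = (link_mbit_per_sec / 8.0) * usable_fraction
    return usable_mb_per_sec >= log_gen_mb_per_sec

# 5 MB/s of log over a 100 Mbit link with ~70% usable capacity
print(link_can_keep_up(5, 100))  # True: 8.75 MB/s usable >= 5 MB/s generated
```

Remember to check the rate at your peak (index maintenance included), not the average.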

To answer your actual question, your h/w config may work (network issues aside) if there isn't a large OLTP workload on the principal. If there is, and you've got 4x4 processor cores generating transaction log, then it's likely that your mirror server won't be able to keep up with replaying the log, even if your network can cope with the log traffic. On Standard edition, there is a single thread processing REDO of the log on the mirror - so your REDO queue is going to grow pretty big under heavy load.

The REDO queue is the amount of log that has been hardened on the mirror but not yet replayed in the mirror database - the larger it is, the longer it will be before the mirror database comes online as the principal in the event of a failover. This is especially troublesome in Standard Edition, where you don't have features like parallel redo and fast recovery (database comes online after REDO and before UNDO) available.
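To see why the redo queue matters for failover time, a rough estimate (illustrative numbers only - measure both inputs on your own system, e.g. via the perfmon mirroring counters) divides the queue size by the redo rate:

```python
def failover_redo_minutes(redo_queue_mb, redo_rate_mb_per_sec):
    """Approximate time the mirror spends replaying queued log before the
    database can come online as principal. Standard Edition's single redo
    thread caps redo_rate_mb_per_sec; both values here are assumed figures."""
    return redo_queue_mb / redo_rate_mb_per_sec / 60.0

# a 2 GB backlog replayed at 10 MB/s -> roughly 3.4 extra minutes of downtime
print(round(failover_redo_minutes(2048, 10), 1))
```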

And, of course, after a failover from the principal to the mirror, there's no way that the mirror will be able to service the same workload as the principal server - so you'll be up, but potentially running a lot slower.

Hope this helps.

Paul Randal
  • 7,184
  • 1
  • 35
  • 45
  • Thanks, yes this does help. So there's 3 potential issues: 1. The network bandwidth gets saturated with transactions causing a backlog and unacceptable response times for the end users. 2. The duration of a fail-over (if one happens) may be extended at busy times as mirrored transactions are replayed with a single thread (limitation of Standard edition). 3. If failed over to the secondary server, it may not be man enough to handle the work load at busy times. – SuperCoolMoss Jun 13 '09 at 19:20
  • Absolutely. Glad to help. – Paul Randal Jun 13 '09 at 19:32
0

I do not have direct experience, but you should look at the OpenVMS cluster latency documentation. They discuss the issues of distance extensively.

Some things to consider: for the purposes of an active/standby backup, a VM isn't necessarily a bad choice. If the disks for the VM are on a SAN, you should see pretty good performance.

The synchronous mirroring over long distances is what I would be more concerned about. Reads should not be affected, but every write will need to wait for the remote commit to be acknowledged before returning.

I should also add - while the OpenVMS documentation talks a lot about OpenVMS specifically, the latency issues are applicable to any sort of mirroring or clustering application. Doing "the math" about light-speed delay for your link distance can be very illuminating with respect to latency and responsiveness over long distances.
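That "math" is easy to sketch: light in fibre travels at roughly two thirds of c, so a physics-only lower bound on round-trip time for a given link distance (real links add routing, switching, and serialisation delay on top) is:

```python
C_KM_PER_MS = 299_792.458 / 1000  # speed of light in vacuum, km per millisecond
FIBRE_FACTOR = 0.66               # light in fibre travels at ~2/3 of c (approximate)

def min_rtt_ms(one_way_km):
    """Lower bound on round-trip time over a fibre link of the given
    one-way distance. Everything above this is equipment and routing."""
    return (2 * one_way_km) / (C_KM_PER_MS * FIBRE_FACTOR)

# two data centers 500 km apart: at least ~5 ms RTT before any equipment delay,
# and synchronous mirroring pays that on every single commit
print(round(min_rtt_ms(500), 1))
```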

Christopher
  • 304
  • 2
  • 6
0

Your main concern should be the network link. The SANs shouldn't offer too much of a bottleneck, but I have not seen any performance data on them, so I can't really tell you yes or no. You should ask your architect, and yourself, the following questions:

Take a good hard look at the network link

  • Is it stable?
  • How much packet loss is there?
  • How much bandwidth is available?
  • Is this the link that everyone else uses to surf the Internet at work?

Take a good hard look at the SANS

  • How many disks are there?
  • What is the RAID setup like?
  • How many other applications will be sharing resources?
  • What is the current utilization of the SAN?

Then look at your application

  • How many times will you be accessing the data?
  • How large will the database grow? (Ballpark)
  • How often will indices be created?
  • How much load do your queries put on the CPU, Memory, and Disk?
  • How will data be verified on the remote ends of the link?

Your RAM and processor setup sounds good for an enterprise application. These types of questions are very hard to quantify, especially without real-world data.

Virtual machines are usually not, IMO, the reason for bottlenecks. It depends a lot on how they are set up and how resources are distributed. I/O is usually the single largest factor in VM speed, and those SANs should help your speed considerably.

Each application is different, but you and your Architect need to sit down and answer these questions (above) together. And all the others that pop up in the process.

And if all else fails, go buy another server, and delete the VM.

Joseph Kern
  • 9,809
  • 3
  • 31
  • 55