1

I am having severe trouble with a SQL Server - in particular, with transactional replication.

We used to have a Windows 2003 physical machine, with SQL 2000. We call this “WebDB” The server was located in a datacenter, and we connected to it over WAN.

Back in the office, we have a Windows 2008, with SQL 2008. We call this “OfficeDB” This has transactional replication publication with around 20 catalogs. It was replicating to this physical server with no problems.

Recently, we deployed a new server, on a VM (Esxi) hosted environment. This environment is also connected to over WAN.

This server, has SQL 2008 R2 installed. It has 12gb memory.

We took a backup of the original WebDB and restored it on the new vWebDB. I configured it as a subscriber to our publication from OfficeDB.

After initializing, everything seemed to be working ok.

We switched our web app to use the new vWebDB server.

CPU seems to be constantly hovering around 100% Nothing on the web app has changed - the queries (although somewhat inefficient- and i will look at this in due course) have remained exactly the same.

Replication latency is up in the region of 10 mins - 2 hours (varies)

If i disable replication, the vWebDB server seems to calm down, and CPU usage drops to around 50 - 60 %

With replication enabled however, eventually, the DB starts timing out, and users get errors etc...

Things of note:

I did a straight backup / restore of the original database, onto the new, virtual hardware. The thing with this is the MDF / LDF files are split over 2 virtual disks. Is this good practice in a virtual environment?

The distributor was (and still is) on the OfficeDB server. The distribution Agent is the distrubutor (push subscription)

The thing that puzzles me is this setup was working fine on the old machine. I am open to suggestions on fixing this. I will edit this post with answers to questions if relevant.

Chopper3
  • 100,240
  • 9
  • 106
  • 238
alex
  • 1,710
  • 15
  • 43
  • 63
  • What version of ESXi? How many CPUs does the guest virtual machine have assigned to it? Do you have VMware-tools installed in the guest? Have you done any performance monitoring to ensure that CPU usage isn't high as a result of something like being blocked by slow disk IO? Is the CPU usage on the host high, or just on the guest? Is this a dedicated host, or are there other VMs on it? – Jed Daniels Mar 18 '11 at 06:20

1 Answers1

0

Are the subscriptions set up as push or pull? If pull, I'd suggest maybe moving them to push so that you reduce some load off of your subscriber and see if that helps. Also, that's an odd way to initialize a subscriber (from a backup of another subscriber). Are you seeing any errors in the replication monitor? What does the profiler say about the stored procedures responsible for delivering the replicated commands? If there's one that's running hot, what does the query plan for it look like?

Ben Thul
  • 2,969
  • 16
  • 23
  • I must of been mis-understood- the backup / restore was to MOVE the database from one server to another. I'm not sure it is using stored proc replication - how can i enable this? – alex Feb 03 '11 at 19:23
  • also, it is using PUSH subscription. – alex Feb 03 '11 at 19:29
  • I don't think I misunderstood. Initializing a new subscription from a backup of the subscriber (regardless of intent) is still a weird way to do it. When you set up the new subscription, what did you pass in for the @sync_type parameter to sp_addsubscription? – Ben Thul Feb 03 '11 at 22:30
  • you COMPLETELY misunderstood. Backup / restore has NOTHING to do with replication subscription. We moved the ENTIRE database to a different server, replication was removed, then, the new database, on the new server, was configured as a subscriber – alex Feb 04 '11 at 15:49
  • let me re-iterate... the subscription was stopped completely, then a backup taken, then the database restored on NEW HARDWARE. then the new db configured as a subscriber of the publication – alex Feb 04 '11 at 15:50
  • Sounds like you've got it under control then. – Ben Thul Feb 04 '11 at 16:34
  • No, it is still not resolved. – alex Feb 11 '11 at 10:54