
I am quite new to Percona/database servers and I don't know how to approach the following issue.

A couple of days ago I upgraded the servers of a Percona cluster (3 nodes) to much more powerful hardware.

Specs of previous hardware:

Vendor: OEM  
CPU: Intel Core i7-3930K 3.2 GHz (6 cores / 12 threads)  
RAM: 64 GB (8 x 8 GB DIMM DDR3 1334 MHz)  
I/O: software RAID

Specs of new hardware:

Vendor: Dell PowerEdge™ R730 DX291  
CPU: 2 x Intel(R) Xeon(R) CPU E5-2630 v3 2.4 GHz (16 cores / 32 threads)  
RAM: 128 GB (8 x 16 GB DIMM DDR4 1866 MHz)  
I/O: hardware RAID (RAID 10, 1024 MB non-volatile cache, adaptive read-ahead, write-back)

Database workload:

writes (INSERT/UPDATE): 25 per second  
reads (SELECT): 350 per second

The difference between the servers is huge, and I was expecting the performance of Percona (MySQL) to increase significantly. However, after looking at various stats in New Relic, I noticed the opposite.

For example, the response time of some queries has increased from 2 ms to 6 ms, and other SELECT queries have gone from 50 ms to 75 ms.

Any ideas on how to troubleshoot this?

giomanda

1 Answer


First, it is difficult to answer your question without a better understanding of your workload and the hardware at hand. For example:

  1. Is your workload read-centric or write-centric?
  2. Does your hardware RAID have a non-volatile write cache? If so, is it configured in write-back or write-through mode?
  3. What is the speed of your DRAM modules?
  4. And so on...
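For the first question, one option is to sample MySQL's statement counters from `SHOW GLOBAL STATUS` twice and compare read and write rates. The sketch below is a minimal illustration, assuming each snapshot has already been loaded into a dict of counter name to value; `classify_workload` and the 0.8 threshold are hypothetical choices for illustration, not a Percona tool.

```python
"""Classify a MySQL/Percona workload as read- or write-centric.

Minimal sketch: assumes two snapshots of SHOW GLOBAL STATUS taken
`interval_s` seconds apart, each converted to {counter_name: value}.
The threshold is an arbitrary, hypothetical cut-off.
"""

READ_COUNTERS = ("Com_select",)
WRITE_COUNTERS = ("Com_insert", "Com_update", "Com_delete", "Com_replace")


def rate(before: dict, after: dict, names, interval_s: float) -> float:
    """Per-second rate of the summed counters between two snapshots."""
    delta = sum(int(after.get(n, 0)) - int(before.get(n, 0)) for n in names)
    return delta / interval_s


def classify_workload(before: dict, after: dict, interval_s: float,
                      threshold: float = 0.8):
    """Return (reads/s, writes/s, read share, label)."""
    reads = rate(before, after, READ_COUNTERS, interval_s)
    writes = rate(before, after, WRITE_COUNTERS, interval_s)
    total = reads + writes
    read_share = reads / total if total else 0.0
    if read_share >= threshold:
        label = "read-centric"
    elif read_share <= 1 - threshold:
        label = "write-centric"
    else:
        label = "mixed"
    return reads, writes, read_share, label


if __name__ == "__main__":
    # Rates from the question (~350 SELECTs/s, ~25 writes/s) over 60 s;
    # the 15/10 split between INSERT and UPDATE is hypothetical.
    before = {"Com_select": 0, "Com_insert": 0, "Com_update": 0}
    after = {"Com_select": 350 * 60, "Com_insert": 15 * 60, "Com_update": 10 * 60}
    reads, writes, share, label = classify_workload(before, after, 60)
    print(f"{reads:.0f} reads/s, {writes:.0f} writes/s, {share:.1%} reads -> {label}")
```

With the rates quoted in the question, roughly 93% of statements are reads, which is why the guess below treats the workload as read-centric.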

Anyway, let's make an educated guess: I imagine that your workload is read-centric, and that the Xeon's RAM modules are higher-density but lower-speed parts. If so, you are probably being bitten by these factors:

  1. Lower clock: the i7-3930K runs at 3.2/3.8 GHz (base/turbo), while your Xeon E5-2630 v3 runs at a lower 2.4/3.2 GHz.
  2. Slower memory: due to the higher density and the ECC requirement, the Xeon's memory latency should be significantly higher than the i7's.
  3. Dual socket vs. single socket: to extract maximum performance, multi-socket systems need to be carefully tuned to avoid unneeded process migration, cache thrashing, superfluous remote-node memory accesses, and the like.
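To see whether the NUMA point applies, a first step is to check how many NUMA nodes the new server exposes and how expensive remote accesses are. The sketch below assumes a Linux host with the standard sysfs layout, where `/sys/devices/system/node/nodeN/distance` holds one line of relative distances (10 means local); `numactl --hardware` prints the same table. The function name is hypothetical.

```python
"""Print the NUMA node distance matrix on a Linux host.

Minimal sketch: assumes the sysfs layout /sys/devices/system/node/nodeN/distance,
each file holding space-separated relative distances (10 = local access).
"""

import os
import re


def read_numa_distances(root: str = "/sys/devices/system/node") -> dict:
    """Return {node_id: [distance to node 0, node 1, ...]}; empty if no NUMA info."""
    distances = {}
    if not os.path.isdir(root):
        return distances
    for entry in os.listdir(root):
        match = re.fullmatch(r"node(\d+)", entry)
        if not match:
            continue  # skip non-node entries such as "online" or "power"
        with open(os.path.join(root, entry, "distance")) as fh:
            distances[int(match.group(1))] = [int(x) for x in fh.read().split()]
    return dict(sorted(distances.items()))


if __name__ == "__main__":
    table = read_numa_distances()
    if len(table) <= 1:
        print("single NUMA node (or no NUMA info): remote-memory effects unlikely")
    for node, row in table.items():
        local = row[node]
        remote = [d for i, d in enumerate(row) if i != node]
        worst = max(remote) / local if remote else 1.0
        print(f"node{node}: distances={row}, worst remote/local ratio={worst:.1f}")
```

If two nodes show up, InnoDB buffer pool pages may end up on the remote node for many queries, which is one candidate explanation for the higher latency.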

In other words, you need to understand your application's specific needs before changing your cluster infrastructure. Otherwise, you risk buying high-throughput hardware when you need low-latency hardware, and vice versa.
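The throughput vs. latency trade-off can be made concrete with back-of-the-envelope arithmetic on the two CPUs. This sketch assumes that a single MySQL query runs on one thread (so its CPU time scales roughly with per-core clock) and uses physical cores times base clock as a crude throughput proxy; it ignores IPC differences between Sandy Bridge-E and Haswell (which favor the Xeon), turbo behavior, caches and NUMA.

```python
"""Rough per-thread vs aggregate CPU comparison of the two servers.

Back-of-the-envelope sketch. Assumptions: one query ~ one thread, and
physical cores x base clock as a crude proxy for aggregate capacity.
IPC, turbo residency, caches and NUMA effects are ignored.
"""

from dataclasses import dataclass


@dataclass
class Cpu:
    name: str
    sockets: int
    cores_per_socket: int
    base_ghz: float
    turbo_ghz: float

    @property
    def aggregate_ghz(self) -> float:
        """Crude throughput proxy: physical cores x base clock."""
        return self.sockets * self.cores_per_socket * self.base_ghz


old = Cpu("i7-3930K", sockets=1, cores_per_socket=6, base_ghz=3.2, turbo_ghz=3.8)
new = Cpu("2x E5-2630 v3", sockets=2, cores_per_socket=8, base_ghz=2.4, turbo_ghz=3.2)

throughput_ratio = new.aggregate_ghz / old.aggregate_ghz  # about 2x more total capacity
base_ratio = old.base_ghz / new.base_ghz                  # per-thread slowdown at base clock
turbo_ratio = old.turbo_ghz / new.turbo_ghz               # per-thread slowdown at turbo clock

print(f"aggregate capacity: {throughput_ratio:.2f}x")
print(f"single-thread slowdown from clock alone: {turbo_ratio:.2f}x-{base_ratio:.2f}x")
# Observed in the question: 2 ms -> 6 ms (3.0x) and 50 ms -> 75 ms (1.5x).
print(f"2 ms query, clock-only estimate: {2 * base_ratio:.2f} ms (observed 6 ms)")
```

Clock speed alone accounts for at most about a 1.33x slowdown per query, so a 3x jump suggests that memory latency or NUMA placement is also playing a part.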

GregL
shodanshok
Thanks for the info. I have updated the question with some of the information you mentioned. You are right, the workload is read-centric; however, the memory speed is higher than before. I am currently trying to figure out the latency, as you mentioned. – giomanda Dec 17 '15 at 15:02