
I've had a quick look through the other similarly titled questions and none are particularly similar to the issues I'm currently having.

Basically, we've had a multi-node memcached ring running for over two years, and for the most part it's been problem-free. The memcached installation was recently moved onto dedicated servers and the capacity was tripled (from 2x 1GB to 2x 3GB). At first we had trouble with what I believe were issues with how the PHP libraries were talking to the servers: either problems with the ordering of the server list, or the servers being started incorrectly.

The servers 'appeared' to be working correctly, but keys seemed to be stored on multiple servers, and expiring a key wouldn't expire all instances of the value.

Basically, we changed the hashing mechanism from standard to consistent, the problems with key lookups (and expires/gets) went away, and everything seems to have returned to normal.
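For reference, the change amounted to something like the following (a sketch, assuming the pecl/memcache extension; hostnames are illustrative):

```php
// Hashing strategy for pecl/memcache is an ini setting. These MUST match
// on every web front-end, or the same key will hash to different nodes
// depending on which front-end handles the request.
ini_set('memcache.hash_strategy', 'consistent'); // was 'standard'
ini_set('memcache.hash_function', 'crc32');

$memcache = new Memcache();
// The server list must be identical (same hosts, same order, same weights)
// on every front-end for keys to map consistently.
$memcache->addServer('cache1.example.com', 11211);
$memcache->addServer('cache2.example.com', 11211);
```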

However, I've been monitoring things over the last few weeks and noticed that the first server seems to be hit many, many more times than the second (the PHP memcache monitor tool reports one averaging 1,200 hits a second, whilst the second is only at 500).

Can anyone explain:

  • Firstly, any idea what is happening above? Why would one server be getting so many more hits in a 'distributed' environment?
  • Secondly, what are the recommended settings for memcache clients in a distributed situation?
    • Am I doing the right thing using consistent hashing?
    • Should I use failover?
    • Binary storage?
    • Or compression?
  • What is the correct procedure for resetting/moving a live memcache ring?

I've found memcached to be a fantastic tool, perfect for its purpose, but best-practice guides and useful documentation are few and far between (very few describe it in any detail at all). If I can get some measure of insight into what's happening, I'll definitely post it as a tech article for all to see (to help in future), but I'm having trouble right now!

Thanks in advance

kwiksand

2 Answers


Are you sure the front-ends that communicate with Memcached have properly synced configuration entries for your pool?

Can all of the servers make a clean connection to the Memcached node that is having low connectivity problems?

Make sure you have Memcached::OPT_LIBKETAMA_COMPATIBLE turned on as well.
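Turning that on, together with consistent distribution, looks something like this with the pecl/memcached extension (hostnames are illustrative):

```php
$mc = new Memcached();

// Ketama-compatible consistent hashing: keys map to the same nodes as
// other libketama-based clients, and only a fraction of keys move when
// a node is added or removed.
$mc->setOption(Memcached::OPT_DISTRIBUTION, Memcached::DISTRIBUTION_CONSISTENT);
$mc->setOption(Memcached::OPT_LIBKETAMA_COMPATIBLE, true);

$mc->addServers([
    ['cache1.example.com', 11211],
    ['cache2.example.com', 11211],
]);
```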

Regarding configuration: if you are storing large objects, compression and igbinary serialization will speed things up on the network I/O side. Obviously there may be drawbacks, so each case is different; benchmarking is key.
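A sketch of those options with pecl/memcached (the igbinary serializer is only available if the extension was compiled with igbinary support):

```php
$mc = new Memcached();

// Transparently compress values above the client's size threshold;
// trades a little CPU for less network I/O on large objects.
$mc->setOption(Memcached::OPT_COMPRESSION, true);

// Use the igbinary serializer if the extension supports it; it produces
// a smaller, faster binary representation than PHP's default serializer.
if (Memcached::HAVE_IGBINARY) {
    $mc->setOption(Memcached::OPT_SERIALIZER, Memcached::SERIALIZER_IGBINARY);
}
```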

Aleksey Korzun
  • All the servers can definitely contact the second memcached node. I'm currently in the process of writing a test suite which I'll run in a separate environment for testing. The usage on the second server appears to have grown a bit tonight, up to about 700-800 requests a second now, ish. I wonder what the upper limit on requests/sec on these servers is (all part of the benchmark); 1200/s may well be nearing the limit, I'd have thought! – kwiksand Jun 22 '11 at 00:31

If your keys have unequal access patterns you will see unequal traffic to each memcached node. E.g. if you have two keys, one of which (a) is get/set 500 times per second and another (b) which is get/set 250 times per second, then the node which contains a will have twice as much traffic as the node which contains b.

In my case, we had 8 memcached nodes with a few thousand keys. One of those keys was doing about 800 gets/sec at peak traffic and almost every other key was doing less than 1 get/sec. The memcached node which had the busy key exhibited significantly higher traffic than the others.

If you want to balance the traffic equally to each of your memcached nodes then you either need to:

  • Play games with your keying to make sure that your busy keys are spread out properly.
  • Switch to using repcached or Membase to replicate the keys across multiple nodes.
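The first option is commonly done by sharding the hot key itself across the ring. A minimal sketch with pecl/memcached (all function names, key names, and the shard count here are hypothetical, and writes become N times more expensive in exchange for spreading reads):

```php
// Read one random shard of a hot key; shards hash to different nodes,
// so reads are spread across the ring instead of hammering one node.
function getHotValue(Memcached $mc, string $key, int $shards = 8)
{
    $shardKey = $key . ':' . mt_rand(0, $shards - 1);
    return $mc->get($shardKey);
}

// Writes must update every shard so that any read sees the new value.
function setHotValue(Memcached $mc, string $key, $value, int $ttl, int $shards = 8): void
{
    for ($i = 0; $i < $shards; $i++) {
        $mc->set($key . ':' . $i, $value, $ttl);
    }
}
```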
Conor McDermottroe