I've had a quick look through the other similarly titled questions and none are particularly similar to the issues I'm currently having.
Basically, we've had a multi-node memcached ring running for over two years, and for the most part it's been problem-free. The memcache installation was recently moved onto dedicated servers and the capacity was tripled (2 x 1GB to 2 x 3GB). At first we had trouble with what I believe were issues in how the PHP client libraries were talking to the servers: either the ordering of the server list, or the servers being started incorrectly.
The servers 'appeared' to be working correctly, but keys seemed to be stored on multiple servers, and an expire wouldn't expire all instances of a value.
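For context, my working assumption (I haven't dug into the extension source, so treat this as a sketch, and the host names are made up) is that the standard strategy is essentially hash-modulo-server-count, which would make the mapping sensitive to the order and size of the server list:

```php
<?php
// Toy illustration of my assumption about 'standard' hashing: the key
// is hashed (crc32-style) and taken modulo the server count, so the
// resulting server depends on both the count AND the order of the list.
function standardBucket(string $key, array $servers): string
{
    $index = abs(crc32($key)) % count($servers); // abs() guards 32-bit PHP
    return $servers[$index];
}

$orderA = ['cache1:11211', 'cache2:11211'];
$orderB = ['cache2:11211', 'cache1:11211']; // same servers, reordered

foreach (['user:42', 'session:abc', 'page:home'] as $key) {
    printf("%-12s listA=%s  listB=%s\n",
        $key,
        standardBucket($key, $orderA),
        standardBucket($key, $orderB));
}
// If two app servers hold the pool in different orders, the same key can
// end up on different memcached nodes, which would explain duplicate
// values and partial expires.
```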
We changed the hashing mechanism from standard to consistent, the problems with key lookups (and expires/gets) went away, and everything seems to have returned to normal.
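For reference, the change itself was just the client hashing setting; the pecl/memcache extension exposes it as an ini value (set here via ini_set purely for illustration):

```php
<?php
// Equivalent to memcache.hash_strategy = consistent in php.ini.
// 'standard' (the old value) remaps most keys when the pool changes;
// 'consistent' should only remap roughly 1/N of them.
ini_set('memcache.hash_strategy', 'consistent');
ini_set('memcache.hash_function', 'crc32'); // the default hash function
```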
However, I've been monitoring things over the last few weeks and have noticed that the first server seems to be hit far more often than the second (the PHP memcache monitor tool reports the first averaging 1,200 hits a second, while the second is only at 500).
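For what it's worth, here's roughly how I've been cross-checking the monitor tool against memcached's own counters (host names are placeholders; this assumes the pecl/memcache extension):

```php
<?php
// Pull per-server counters straight from each memcached instance.
$memcache = new Memcache();
$memcache->addServer('cache1.example.com', 11211);
$memcache->addServer('cache2.example.com', 11211);

foreach ($memcache->getExtendedStats() as $server => $stats) {
    if ($stats === false) {          // server unreachable
        echo "$server: down\n";
        continue;
    }
    printf("%s  cmd_get=%d  get_hits=%d  curr_items=%d\n",
        $server,
        $stats['cmd_get'],    // total GETs this server has handled
        $stats['get_hits'],   // GETs that actually found the key
        $stats['curr_items']  // keys currently stored
    );
}
```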
Can anyone explain:
- Firstly, any idea what is happening above? Why would one server be getting so many more hits than the other in a 'distributed' environment?
- Secondly, what are the recommended settings for memcache clients in a distributed setup? (I've sketched our current client configuration just below this list.)
- Am I doing the right thing using consistent hashing?
- Should I use failover, binary storage, or compression?
- What is the correct procedure for resetting/moving a live memcache ring?
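To make the settings questions concrete, this is a trimmed-down sketch of how the clients are currently set up (host names, weights, and the sample value are placeholders):

```php
<?php
// Current client setup, trimmed down. Every app server must use an
// identical, identically ordered server list so key->server agrees.
ini_set('memcache.hash_strategy', 'consistent');
ini_set('memcache.allow_failover', '1'); // failover currently on -- wise?

$memcache = new Memcache();
// addServer(host, port, persistent, weight): equal weights, so each box
// should in theory see roughly half the traffic.
$memcache->addServer('cache1.example.com', 11211, true, 1);
$memcache->addServer('cache2.example.com', 11211, true, 1);

// Compression is opt-in per write via the MEMCACHE_COMPRESSED flag.
$bigValue = str_repeat('x', 50000);
$memcache->set('report:weekly', $bigValue, MEMCACHE_COMPRESSED, 3600);
```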
I've found memcached to be a fantastic tool, perfect for its purpose, but best-practice guides and genuinely detailed documentation are few and far between. If I can get some insight into what's happening, I'll definitely write it up as a tech article for all to see (to help in future), but I'm having trouble right now!
Thanks in advance