Seconds_Behind_Master from SHOW SLAVE STATUS is considered an unreliable measure of Slave lag. mk-heartbeat is often offered as a reliable alternative.

According to its documentation, mk-heartbeat does not even need the Slave to be running:

http://www.maatkit.org/doc/mk-heartbeat.html

Excerpt:

mk-heartbeat is a two-part MySQL and PostgreSQL replication delay monitoring system that doesn't require the slave to be working (in other words, it doesn't rely on SHOW SLAVE STATUS on MySQL).

My understanding is that you create a database and table on the Master and run mk-heartbeat with --update, like so:

./mk-heartbeat -D heart --table beat -u heartbeat -p XXXXXXXXX --update -h 192.168.2.80

Then, on the Slave, you point mk-heartbeat at the same DB/table on the Master (i.e. you run a GRANT statement on the Master to give the Slave privileges) and run it with --monitor, like so:

./mk-heartbeat -D heart --table beat -u heartbeat_slave -p XXXXXXXXX --monitor -h 192.168.2.80

I have done just this. Even when repeatedly updating the 2.8M+ rows in the salaries table of the MySQL sample employees database (which creates Slave lag, at least according to the unreliable Seconds_Behind_Master), I never see the mk-heartbeat --monitor output change from:

0s [  0.00s,  0.00s,  0.00s ]

Maybe I haven't produced enough lag and, as per the mk-heartbeat docs, the replication events are propagating in less than half a second, so I should expect to see zero seconds of delay:

mk-heartbeat has a one-second resolution. It depends on the clocks on the master and slave servers being closely synchronized via NTP. --update checks happen on the edge of the second, and --monitor checks happen halfway between seconds. As long as the servers' clocks aren't skewed much and the replication events are propagating in less than half a second, mk-heartbeat will report zero seconds of delay.
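A rough way to picture the half-second rule in that excerpt, as a toy model rather than the tool's actual logic: assume the master stamps a heartbeat on every whole second, the slave sees it some fixed propagation delay later, and the two clocks agree. The function name `reported_delay` and the sample times are made up for illustration.

```python
import math

def reported_delay(monitor_time, propagation_delay):
    """Whole-second delay a monitor check would see in this simplified model.

    Assumptions (not taken from the mk-heartbeat source): the master writes
    a heartbeat at every whole second, the slave sees each one
    propagation_delay seconds later, and both clocks are in sync.
    """
    # Newest heartbeat timestamp that has already arrived on the slave.
    newest_visible = math.floor(monitor_time - propagation_delay)
    # The check compares whole seconds only.
    return math.floor(monitor_time) - newest_visible

# Checks happen halfway between seconds, e.g. at t = 100.5.
fast = reported_delay(100.5, 0.2)    # under half a second of propagation
slow = reported_delay(100.5, 0.7)    # just over half a second
stuck = reported_delay(100.5, 300.0) # hundreds of seconds behind
print(fast, slow, stuck)
```

In this model anything under half a second reads as zero, and a slave that really is hundreds of seconds behind should read as hundreds of seconds, not zero.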

(My servers' clocks are using NTP and are in sync.)

But Seconds_Behind_Master reports hundreds of seconds of lag, so I would think the events are not propagating in less than half a second. I'm still uncertain whether I am getting an accurate view from the mk-heartbeat utility or not.

Would love to hear from anyone that has deployed this tool for monitoring their MySQL replication.

Thanks in advance.

Cheers

HTTP500

2 Answers

You're close, but your problem is you have both instances pointing at the master. What you want is one instance updating the master every second, and the second instance reading the slave every second.

Also note that it does not need to run on the actual database servers at all; it uses a regular MySQL client connection. I run mine from my Cacti server. Here's my sanitized /etc/rc.local as an example:

/usr/bin/mk-heartbeat -D maatkit -u maatkit -paardvark --update -h sql-master.fake.net --daemonize
/usr/bin/mk-heartbeat -D maatkit -u maatkit -paardvark -h sql-slave.fake.net --monitor --file /tmp/sql-slave.heartbeat --daemonize
cagenut
  • @cagenut, thanks for your response but I'm still not 100% clear on this. Are you replicating maatkit.heartbeat to your cacti server in your example? If not, what is being --monitor[ed]? – HTTP500 Oct 14 '09 at 01:13
  • @Jason, I wish I could whiteboard it for you, but let me try here in text. mk-heartbeat is "just a perl script". It uses a mysql connection over a tcp socket to interact with both the slave and the master. If you run it on the mysql servers, it's just connecting over loopback. You could run the instance that updates the master on a webserver and the instance that monitors the slave on a fileserver; it doesn't care or matter. I'm hitting the char limit here, more in the next comment. – cagenut Oct 14 '09 at 17:49
  • I haven't read the code exactly but conceptually just think of it as if you wrote a script that inserts NOW() into the master over and over, then a second script that does a "select (NOW() - ts) from heartbeat" on the slaves (pardon my terrible sql). – cagenut Oct 14 '09 at 17:50
  • @cagenut, thanks again. I understand what you're saying. The biggest gap in my understanding though (and I see I'm not alone in this, as betch seems to have fallen into the trap as well) is whether the heartbeat table is replicated or not. The docs seem to mislead on this point, i.e. "replication delay monitoring system that doesn't require the slave to be working", but I think I've come to understand that as: when the slave isn't working (stop slave), mk-heartbeat continues to --monitor but the reported delay just increments by one second each second. Most important thing to clarify: is the heartbeat table replicated? – HTTP500 Oct 14 '09 at 21:44
  • 1
    Yes the heartbeat table should be replicated, thats the core of how this works. The timestamp gets written to the master, and then replicated to the slave. So when you select the timestamp value at the slave, and calculate the difference between it and the system time, that gives you the exact number of seconds the slave is behind. What the docs mean by the slave-not-working comment is that if replication is off SBM will show "NULL", whereas mk-heartbeat will continue doing the math off the timestamp as it slips farther and farther behind. – cagenut Oct 14 '09 at 22:39
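To make the mechanism from these comments concrete, here is a minimal simulation of it, not mk-heartbeat's real code. The `Server` class and the `update`, `replicate` and `monitor` helpers are hypothetical stand-ins for the two MySQL servers, the --update and --monitor instances, and replication of the heartbeat table. Times are plain integers in seconds.

```python
class Server:
    """Hypothetical stand-in for a MySQL server holding one heartbeat row."""
    def __init__(self):
        self.heartbeat_ts = None

def update(master, now):
    # Conceptually what the --update instance does: stamp "now" on the master.
    master.heartbeat_ts = now

def replicate(master, slave):
    # Stand-in for replication copying the heartbeat row to the slave.
    slave.heartbeat_ts = master.heartbeat_ts

def monitor(server, now):
    # Conceptually what the --monitor instance does: now minus the stored stamp.
    return now - server.heartbeat_ts

master, slave = Server(), Server()
lag_on_slave, lag_on_master = [], []
for now in range(1000, 1006):
    update(master, now)
    if now < 1003:  # replication stops delivering the row from t=1003 onward
        replicate(master, slave)
    lag_on_slave.append(monitor(slave, now))
    lag_on_master.append(monitor(master, now))

print(lag_on_slave)   # [0, 0, 0, 1, 2, 3]
print(lag_on_master)  # [0, 0, 0, 0, 0, 0]
```

Monitoring the master always reads zero, which is what the question's setup was doing. Monitoring a slave whose copy of the row stops changing reads one second more every second, which is the pattern you would expect if the table is not replicated or replication is stopped.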
Here is what I'm doing:

mk-heartbeat -D maatkit -u maatkit -p pass --update -h master
mk-heartbeat -D maatkit -u maatkit -p pass -h slave --monitor

When I run the above, the output looks like this:

1618s [ 53.92s, 10.78s,  3.59s ]
1619s [ 80.90s, 16.18s,  5.39s ]
1620s [ 107.90s, 21.58s,  7.19s ]
1621s [ 134.92s, 26.98s,  8.99s ]
1622s [ 161.95s, 32.39s, 10.80s ]
1623s [ 189.00s, 37.80s, 12.60s ]
1624s [ 216.07s, 43.21s, 14.40s ]
1625s [ 243.15s, 48.63s, 16.21s ]

The numbers just slowly go up.

Does the heartbeat table need to be replicating to the slave? Is that what I'm missing?
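For reading that output: the first column looks like the current delay, and the bracketed figures look like averages over 1, 5 and 15 minute frames. The sketch below is a guess fitted to the posted numbers, not taken from the mk-heartbeat source. It assumes one sample per second, that each frame's sum is divided by the frame length in seconds rather than by the number of samples collected so far, and that one earlier sample of 1617s was taken before the snippet starts. `frame_averages` is a made-up name.

```python
from collections import deque

def frame_averages(samples, frames=(60, 300, 900)):
    """Yield (delay, [average per frame]) after each one-second sample.

    Assumption: each average is the sum of the samples inside the frame
    divided by the frame length in seconds, so a frame that is not yet
    full produces a small figure that ramps up as samples accumulate.
    """
    window = deque(maxlen=max(frames))
    for delay in samples:
        window.append(delay)
        recent = list(window)
        yield delay, [sum(recent[-f:]) / f for f in frames]

# Assumed history: the delay grows by one second per second from 1617s.
rows = list(frame_averages(range(1617, 1626)))
for delay, avgs in rows[1:]:
    print(f"{delay}s [ " + ", ".join(f"{a:.2f}s" for a in avgs) + " ]")
```

Under those assumptions the printed figures agree with the posted ones to two decimals. If that reading is right, the first column climbing by one each second means the timestamp on the slave is not changing at all, and the bracketed figures climb only because the frames are still filling up.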

quanta
  • @betch "Does the heartbeat table need to be replicating to the slave? Is that what I'm missing?" I'm pretty sure the heartbeat table does need to be replicated to the slave and my further testing seems to bear that out. Oddly though, in all the docs and slides I found on the interweb nowhere is that explicitly stated. – HTTP500 Oct 14 '09 at 21:48
  • I put this in: http://code.google.com/p/maatkit/issues/detail?id=648 If you guys have any recommendations for better wording, feel free to comment. – cagenut Oct 15 '09 at 13:58