3

Sometimes my NodeBalancer takes a node out of rotation and I would like to find out about it so I can reboot it. How do I do this?

This is the background:

I have two websites of the same type, both running CPU-intensive applications. One runs on a single VPS, which handles approximately 3000 executions a day (each taking anywhere from 5 to 50 seconds). For the other I've set up a NodeBalancer with 9 nodes underneath; each node executes approximately 40-60% of what the single VPS does.

This all works fine with hardly any disruption, but once a week or so a node stops responding to the NodeBalancer and is taken out of rotation. This usually happens in combination with very high CPU usage. This never happens on the single VPS (which has run without disruption or reboot for a year now).

So, like I said, on the load-balanced nodes I do have disruptions (although 99% of the scripts and software are the same), and I would like to find out when a node is taken out of rotation so I can reboot it and get it up and running again.

Currently my workaround is to act on the mails I get from Linode, which alert me to high CPU usage. In some cases I then do a manual reboot if the node indeed became inactive.

user1914292
  • You're going to have to talk to Linode about this. The best thing we can do is tell you to monitor it and have that alert you. – Jacob Mar 17 '14 at 17:05
  • I did talk to Linode and they didn't offer an API to the NodeBalancer, but of course my dashboard shows the status. They had no solution readily available, but I was hoping that someone figured something smart out themselves - maybe by doing something from the Linode to the NodeBalancer. – user1914292 Mar 17 '14 at 17:58
  • It sounds like the node isn't responding when this is happening? Can you monitor the nodes with a tool like pingdom to check? – Jacob Mar 17 '14 at 18:02
  • I could, and I have several other VPSs running as well as other hosting from where I could ping, but that's not really what I'm after. I'd be pinging the external IP and not the connectivity of the NodeBalancer to the node. And the external IP doesn't necessarily have to do anything, because the nodes are operating over the internal Linode network. But most importantly, I'd like to be able to reboot the node when this happens, so ideally I'd check from the node itself somehow? – user1914292 Mar 17 '14 at 18:53
  • If you're losing connection to the load balancer it seems like Linode should help? I really don't understand the issue? – Jacob Mar 17 '14 at 19:18
  • No, the Node's being taken out of rotation - but I would like to find out on the Node itself (ideally), but any other place where I could trigger a reboot of the node is also fine. Linode's only comments are that the NodeBalancer does not have an API and that I should just check sanity on the Nodes. However, all the checks in the world are not going to help me, unless I do exactly what the NodeBalancer does and at the same time. But of course they don't disclose what exactly they do. – user1914292 Mar 18 '14 at 17:16
  • Can you scrape the status from the page? – Jacob Mar 18 '14 at 17:20
  • not without a huge effort and it being an error-prone solution - it's behind a login and a few clicks – user1914292 Mar 18 '14 at 18:37
  • Then you need to find a new provider. – Jacob Mar 18 '14 at 18:42
  • :) for the rest Linode is awesome so far... Thanks for thinking with me though! Much appreciated! – user1914292 Mar 18 '14 at 19:00

2 Answers

3

A somewhat naive approach would be to have each node serve a page, example.com/node.html, that returns a different result per node (for example, the numbers 1-9). From an outside computer you then request that page constantly (say, once per second). Over a given interval (say, a minute) you should receive a more or less random series of results covering all servers. A script can then check whether all numbers are present and, if a node is missing, call Linode's API to restart it.
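The "are all numbers present" check could be sketched roughly as follows. This is only an illustration: the function name `missing_nodes` and the sample data are made up here, and it assumes the balancer actually spreads requests from one client across nodes, which may not hold if sessions are pinned to a node.

```python
from collections import Counter


def missing_nodes(responses, expected_ids):
    """Return the expected node IDs that never appeared in the sampled responses."""
    seen = Counter(r.strip() for r in responses)
    return sorted(node_id for node_id in expected_ids if seen[node_id] == 0)


# Hypothetical sample: response bodies collected from example.com/node.html
# over one interval. Node 4 never answered in this made-up data.
sample = ["1", "2", "3", "5", "6", "7", "8", "9", "1", "2\n"]
expected = [str(n) for n in range(1, 10)]

print(missing_nodes(sample, expected))
```

Any ID reported as missing would then be a candidate for a restart through whatever mechanism Linode offers.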

Easier than this: can't you just check on the node itself whether it's receiving web requests (via `netstat`, the firewall, logs, etc.; you can even check for the load balancer's hostname as the origin)? If it isn't, the LB has taken it out of rotation.

LinuxDevOps
  • wow - the 2nd part of your remark makes complete sense! The first part is a bit tricky as requests from the same origin get routed to the same node and because it would be elaborate, but the 2nd part is indeed very logical. what would be easiest tool / check I can do to see if traffic from the NodeBalancer comes in? – user1914292 Mar 19 '14 at 17:15
  • If the webserver process name is 'apache' for example, then to get the number of current apache connections you can do: `netstat -tap|grep -v LISTEN|grep -c apache` , and connections from a hostname (loadbalancer) would be `netstat -tap|grep -v LISTEN|grep hostname|grep -c apache` , so if that last line is 0 that means there's no apache connections coming from the load balancer. You can run a cronjob every 5 mins for example checking this and sending you an email (`mail -s node_down myemail@example.com`) if 0 connections detected – LinuxDevOps Mar 19 '14 at 17:41
  • netstat -tap|grep -v LISTEN|grep -c apache gives me a result of either 0 or 2, although the node is up all the time? – user1914292 Mar 19 '14 at 18:28
  • Your script should check several times and only trigger the alert after a few consecutive readings of 0 connections; the node is probably up but sometimes not receiving any requests. `netstat -tap|grep apache` should show all the webserver connections (also try just `netstat -tapn` for all connections). Check the other nodes too: how many connections (as a range) does a healthy node typically have? You may have too many nodes, so that some sit idle. – LinuxDevOps Mar 19 '14 at 18:36
  • Hmm, actually the load balancer will typically make one web request every few seconds to the offline node to see whether it's up again, so checking for 0 connections is not good; you should check for a minimum over a period of time. To complicate things, in your case the issue is CPU. To summarize, I think you should first profile the nodes (# connections, CPU usage, etc.) when they are in and out of rotation, and then use that to detect the state internally. As suggested before, it may be easier to use something like Python/Beautiful Soup to log in automatically and scrape Linode's status. – LinuxDevOps Mar 19 '14 at 18:50
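The connection-count idea from the comments above (count webserver connections in `netstat -tapn` output, and only alert after several low readings in a row) might look something like this. It is a sketch under assumptions: the column layout is the usual Linux `netstat -tapn` one, and the names `count_connections` and `RotationWatch`, the sample text, the `192.168.255.` prefix and the thresholds are all invented for illustration. Real thresholds would have to come from profiling healthy and out-of-rotation nodes.

```python
from collections import deque

# Hypothetical `netstat -tapn` output captured on a node.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address     Foreign Address       State       PID/Program name
tcp        0      0 0.0.0.0:80        0.0.0.0:*             LISTEN      812/apache2
tcp        0      0 192.168.1.5:80    192.168.255.3:51234   ESTABLISHED 901/apache2
tcp        0      0 192.168.1.5:80    192.168.255.3:51240   TIME_WAIT   -
tcp        0      0 192.168.1.5:80    192.168.255.4:40022   ESTABLISHED 902/apache2
tcp        0      0 192.168.1.5:22    203.0.113.9:60111     ESTABLISHED 700/sshd
"""


def count_connections(netstat_output, process="apache", remote_prefix=""):
    """Count non-listening TCP connections owned by `process` whose remote
    address starts with `remote_prefix` (empty prefix matches any remote)."""
    count = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) < 7 or not fields[0].startswith("tcp"):
            continue  # header or unrelated line
        foreign, state, program = fields[4], fields[5], fields[6]
        if state == "LISTEN":
            continue
        if process in program and foreign.startswith(remote_prefix):
            count += 1
    return count


class RotationWatch:
    """Flags a node only when every sample in a full window is below a minimum."""

    def __init__(self, min_connections, window):
        self.min_connections = min_connections
        self.samples = deque(maxlen=window)

    def record(self, count):
        self.samples.append(count)
        window_full = len(self.samples) == self.samples.maxlen
        return window_full and all(c < self.min_connections for c in self.samples)


print(count_connections(SAMPLE, "apache", "192.168.255."))
```

A cron job could feed one reading per run into something like `RotationWatch` (persisting the samples between runs) and send the alert mail only when `record` returns true.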
3

After searching the internet some more, it turns out there's a command line interface to Linode, which allows one to perform all kinds of actions on the Nodes as well as NodeBalancers.

This will help me automatically restart a node that's down by performing a simple command like:

`linode restart My-Linode-Label`

And it will also allow me to list all nodes which are handling traffic on a NodeBalancer by performing:

`linode nodebalancer node-list mynodebalancer 80`

I will check if this indeed gives me a status on the nodes or shows me the active nodes only and update the answer. It seems this is the solution I was looking for as it contains many more actions that I will most likely want in the future, such as starting a new node etc.

The CLI can be found on GitHub at https://github.com/linode/cli

UPDATE: the CLI does indeed report the status of each node under the NodeBalancer, in an easy-to-parse output of name, status and address. I will be able to script against this easily.
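A script around that output might be sketched as below. Everything about the format is an assumption made for illustration: one node per line with whitespace-separated name, status and `ip:port` address, status values like `UP`/`DOWN`, and NodeBalancer node names that match the Linode labels. The real CLI output should be checked before relying on any of this. The sketch only builds the restart commands and prints them; it does not run anything.

```python
# Hypothetical text as it might come back from the node-list command.
# The exact format is an assumption, not taken from the CLI's documentation.
SAMPLE_OUTPUT = """\
node1 UP 192.168.0.11:80
node2 DOWN 192.168.0.12:80
node3 UP 192.168.0.13:80
"""


def parse_node_list(text):
    """Parse assumed 'name status address' rows into dictionaries."""
    nodes = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, headers, and anything whose third column
        # does not look like an ip:port address.
        if len(fields) != 3 or ":" not in fields[2]:
            continue
        name, status, address = fields
        nodes.append({"name": name, "status": status, "address": address})
    return nodes


def restart_commands(nodes, up_status="UP"):
    """Build one `linode restart <label>` argument list per node that is not up.
    Assumes the node name is also the Linode label."""
    return [
        ["linode", "restart", node["name"]]
        for node in nodes
        if node["status"].upper() != up_status
    ]


for command in restart_commands(parse_node_list(SAMPLE_OUTPUT)):
    print(" ".join(command))
```

Each argument list could then be handed to `subprocess.run` from a cron job once the parsing has been verified against real output.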

It seems I only need to add some Perl modules and I'm good to go! Can't believe Linode support was not aware of this one...

user1914292
  • Apparently this is built on the API they launched last January. The documentation is here: https://www.linode.com/api/ and it already has wrappers in most major languages... – user1914292 Mar 23 '14 at 18:02