2

I noticed that my web interface hasn't updated its graphs in hours. Each time I restart the gmond process on my clients, the graphs work again. When I come back an hour or so later, the graphs are blank: just a white graph, with nothing updated. If I restart gmond again, it works fine. I'm not sure what's causing this.

My setup is as follows.

Client -> gmond collector -> gmeta/web host

gmetad.conf

data_source "ENG1" 10.199.1.110
data_source "ENG2" 10.199.19.100
data_source "QA" 10.199.10.200

gmond.conf from 10.199.10.200

globals {
    daemonize = yes
    setuid = yes
    user = nobody
    debug_level = 0
    max_udp_msg_len = 1472
    mute = no
    deaf = no
    allow_extra_data = yes
    host_dmax = 0 /*secs */
    cleanup_threshold = 300 /*secs */
    gexec = no
    send_metadata_interval = 0 /*secs */
}

cluster {
    name = "QA"    
}

udp_send_channel {
    host = 10.199.10.200
    port = 8649
    ttl = 1
}

udp_recv_channel {
    port = 8649
}

The gmond.conf files on my clients are the same as above, except that they don't define the udp_recv_channel block. I forward the stats from my clients to a collector (such as 10.199.10.200), which is then polled by the gmeta server (10.199.1.110). This server also collects data from a group of servers defined as "ENG1".

sdot257
  • Please post the /etc/gmond.conf and /etc/gmetad.conf files for client and server. Also, take a look at the iptables rules on client and server. – dmourati May 17 '11 at 23:40
  • Updated, and no fw between clients and server. – sdot257 May 17 '11 at 23:48
  • Looks like you are not using multicast at all? In any event, the gmetad communication from the server to the store needs a tcp listener on port 8649. Try adding: tcp_accept_channel { port = 8649 } – dmourati May 18 '11 at 00:08
  • What do you mean by "modified my config to use multicast"? What changes did you make to your gmond.conf and gmetad.conf so that it works? –  Mar 02 '12 at 04:19

3 Answers

4

I ran into this problem with Ganglia installed on Ubuntu. According to the documentation, it sounds like gmond lost the metadata and doesn't know what to do with the metric data. Since you're running Ganglia in unicast mode, you need to tell gmond to resend its metadata periodically by changing send_metadata_interval to a non-zero value:

globals {
  daemonize = yes
  setuid = yes
  user = ganglia
  debug_level = 0
  max_udp_msg_len = 1472
  mute = no
  deaf = no
  allow_extra_data = yes
  host_dmax = 0 /*secs */
  cleanup_threshold = 300 /*secs */
  gexec = no
  send_metadata_interval = 30 /*secs */
}

Give it a try!

Read more:

http://sourceforge.net/apps/trac/ganglia/wiki/ganglia_release_notes

3.1 collectors will request a gmond to resend its metric description information if needed and if using multicast, if you are using unicast there is no way to do that yet and so if you restart your collector will be left with partial or no data from the cluster that is being collected through it untill all gmond in that cluster are restarted. To workaround this problem if using unicast setup send_metadata_interval to a reasonable value so that all gmond resent their metadata periodically to the collector in case it gets lost.

http://sourceforge.net/apps/trac/ganglia/wiki/FAQ

In recent versions of gmond (3.1.x), a new global variable was added in gmond.conf called send_metadata_interval, with a default setting of 0. Purpose was to reduce network traffic. In 3.1 metric data is sent separately from metadata e.g. metadata contains detailed description, grouping, other possible setting. A value of zero means that the gmond will send metadata when it starts, and no other time (which is consistent with older versions of ganglia).

If you plan on using unicast mode, please set send_metadata_interval to something other than 0. 30-60 seconds has been found to work reliably in most cases. Setting this variable to a non-zero value will make the gmond processes periodically announce their metrics and the graphs will reappear on the host-view page.
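To check whether the collector still holds metrics for every client, one option is to read the XML dump that gmond writes to anyone connecting to its tcp_accept_channel port. The following is a minimal sketch, not an official Ganglia tool. The function names are hypothetical, and it assumes that a host whose metadata was lost shows up in the dump with few or no METRIC elements, which may not hold for every gmond version.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host, port=8649, timeout=5.0):
    """Read the full XML dump that gmond serves on its tcp_accept_channel."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the connection once the dump is sent
                break
            chunks.append(data)
    return b"".join(chunks)


def metrics_per_host(xml_bytes):
    """Map each HOST name in the dump to its number of METRIC elements."""
    root = ET.fromstring(xml_bytes)
    return {
        host.get("NAME"): len(host.findall("METRIC"))
        for host in root.iter("HOST")
    }


def hosts_without_metrics(xml_bytes):
    """List hosts that are known to the collector but carry no metrics."""
    counts = metrics_per_host(xml_bytes)
    return sorted(name for name, count in counts.items() if count == 0)
```

Running `hosts_without_metrics(fetch_gmond_xml("10.199.10.200"))` against the QA collector before and after changing send_metadata_interval should show whether the clients' metrics come back without restarting gmond.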

quanta
0

Try adding: tcp_accept_channel { port = 8649 }

dmourati
  • On the `gmeta` server or the collector (`10.199.10.200`)? Apologies, I looked at my datastore (10.199.10.200) and I do see the `tcp_accept_channel` block. – sdot257 May 18 '11 at 12:38
0

I modified my config to use multicast and it's working now!

sdot257