We have several (20+) application servers spread across multiple data-centres. We need to centralise the logfiles and monitor them from a single box.

Requirements:

  1. Large logfiles, on the order of 5-10 GB per day, per application, so there could be several thousand lines a second.
  2. Latency is important - we need to be able to react to log events within seconds, if possible.
  3. Performance footprint should be as low as possible, and should scale predictably with logfile size.

I'd like to get opinions on the best approach to centralising these log files.

One approach we thought of was to use Logstash (http://logstash.net/) and Graylog2 (http://graylog2.org/), and send the log events over the network to the monitoring box, either over straight TCP or via a bus like RabbitMQ.

A second approach is to have a "shared" SAN volume that all the application servers will write their logfiles to.

What are the pros/cons of the above approaches? Any caveats we should be wary of? Best practices?

EEAA
victorhooi
  • I'd be wary of using TCP for logging without some sort of local queue/buffer. See [Bitbucket's postmortem](http://blog.bitbucket.org/2012/01/12/follow-up-on-our-downtime-last-week/) of their downtime last week. – EEAA Jan 14 '12 at 02:23
  • I'd be more wary of doing multi-site synchronous replication against a SAN unless you have _very_ fast/low latency connections. Perhaps something using TCP with a buffer or UDP syslog would work? – pauska Jan 14 '12 at 02:43
  • @pauska - Oh, agreed. I don't think the SAN option is good, either. TCP is probably fine, as long as the systems will buffer locally and not block if the syslog server is down or unreachable. – EEAA Jan 14 '12 at 02:55
  • @pauska Hmm, what are the main objections against SAN replication for the logfiles? Is it somehow less reliable than TCP or message buses or something? I would have thought the load on the boxes would be less with SAN as well? Or are there other issues? (I was leaning towards TCP/RabbitMQ anyhow, simply because of Logstash/Graylog2, but I'm curious what the objections to SAN replication are). – victorhooi Jan 14 '12 at 06:06
  • Hmm, I may have got the terminology wrong - it was a colleague who suggested the SAN solution, but I think he might have meant clustered, and not replication - he said shared SAN volume - would that make sense? Does that change anything? Pros/Cons versus TCP/RabbitMQ for log shipping? – victorhooi Jan 14 '12 at 06:29

4 Answers

With the open-source nxlog tool you can centralize your log files from Linux and Windows hosts. It can forward over UDP, TCP, and SSL, and has powerful filtering capabilities, disk-based buffering, and a wealth of other features.

b0ti
  • Wait, centralizing from Windows hosts as well? In the past I've used syslog-ng and installed the necessary plugins to Windows servers so they could communicate with syslog-ng. I need to take a look at this! :) +1 to you sir, for making me google for something. – Janne Pikkarainen Jan 17 '12 at 14:44
  • Most people use snare or eventlog-to-syslog for this to send logs into syslog(-ng/rsyslog/etc). But then you need to parse the logs again to extract usernames and such. [nxlog](http://nxlog.org/why-nxlog) can avoid this. – b0ti Jan 17 '12 at 18:33

Just set up a centralized log server running syslog-ng (or rsyslogd, as the latest trend seems to be) and configure your servers' applications / syslog to log to that syslog server. That approach is clean and field-tested around the world.

5-10 GB per app per day is respectable, but not something that would overload your syslog-ng. No sir, that requires more effort. Several thousand lines per second is something I see at work every day, and the syslog servers are mostly idle.

I personally like syslog-ng because it's so plug 'n play. If you add new servers pointing at your syslog server, syslog-ng will automatically create the necessary directory hierarchy for its log files, no sysadmin needed.
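For the application side, here is a minimal sketch of what "log to your syslog server" could look like, assuming a Python application and a syslog-ng/rsyslog listener reachable over UDP. The function name `build_remote_logger`, the queue size, and the message format are my own placeholders, not anything syslog-ng requires. The in-memory queue is there so the application thread never waits on the network.

```python
import logging
import logging.handlers
import queue
import socket


def build_remote_logger(host, port, name="app", max_buffered=10000):
    """Return (logger, listener) shipping records to a syslog server over UDP.

    Hypothetical helper for illustration; host/port are whatever your
    central syslog server listens on.
    """
    # Standard-library syslog handler; SOCK_DGRAM selects UDP.
    syslog = logging.handlers.SysLogHandler(
        address=(host, port), socktype=socket.SOCK_DGRAM
    )
    syslog.setFormatter(
        logging.Formatter("%(name)s: %(levelname)s %(message)s")
    )

    # Bounded in-memory buffer between the application and the network.
    # QueueHandler enqueues without blocking; if the buffer is full the
    # record is handed to logging's error handling instead of stalling.
    buffer = queue.Queue(maxsize=max_buffered)
    listener = logging.handlers.QueueListener(buffer, syslog)
    listener.start()  # background thread drains the buffer

    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(logging.handlers.QueueHandler(buffer))
    return logger, listener
```

You would call `build_remote_logger("loghost.example.com", 514)` once at startup (the hostname is made up), log through the returned logger as usual, and call `listener.stop()` on shutdown so buffered records are flushed.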

Janne Pikkarainen

I evaluated LogRhythm a year ago and that service was very awesome indeed. Give them a look; they can do a lot more than simply centralizing your logs, too: alerts, normalization, reporting, etc.

Eric C. Singer

Take a look at this document from rsyslog:

http://rsyslog.com/doc/rsyslog_reliable_forwarding.html

With such a setup you can forward messages to a remote syslog server (or to graylog2-server, as it can listen for syslog messages), and if the remote server is down rsyslog will queue them locally on disk. I had problems with forwarding to graylog2 under high load: if graylog2 or elasticsearch (graylog2 uses it for storage) can't keep up with the message rate, it will queue the messages in memory, and once it fills all available memory it will just hang until you kill it (losing all messages).
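To make the "queue locally on disk while the remote is down" idea concrete, here is a rough sketch of the behaviour in Python. This is not how rsyslog implements its queues; the class `SpoolingForwarder`, the `send` callback, and the spool file layout are all hypothetical names chosen for illustration. It assumes `send` raises `OSError` (for example `ConnectionError`) when the remote end is unreachable.

```python
import os
from typing import Callable


class SpoolingForwarder:
    """Forward lines via `send`; spool to a local file while `send` fails."""

    def __init__(self, send: Callable[[str], None], spool_path: str):
        self.send = send              # hypothetical transport callback
        self.spool_path = spool_path  # local file used as the backlog

    def forward(self, line: str) -> None:
        # Keep ordering: if a backlog exists, it must drain before new lines.
        if os.path.exists(self.spool_path) and not self.flush():
            self._spool(line)
            return
        try:
            self.send(line)
        except OSError:
            self._spool(line)

    def _spool(self, line: str) -> None:
        with open(self.spool_path, "a", encoding="utf-8") as f:
            f.write(line.rstrip("\n") + "\n")

    def flush(self) -> bool:
        """Replay spooled lines in order; return True once the spool is empty."""
        if not os.path.exists(self.spool_path):
            return True
        with open(self.spool_path, encoding="utf-8") as f:
            pending = f.read().splitlines()
        for i, line in enumerate(pending):
            try:
                self.send(line)
            except OSError:
                # Remote is still down: keep only the lines not yet sent.
                with open(self.spool_path, "w", encoding="utf-8") as f:
                    f.write("\n".join(pending[i:]) + "\n")
                return False
        os.remove(self.spool_path)
        return True
```

The point of the design is that the backlog lives on disk rather than in memory, so a slow or dead receiver costs disk space on the sender instead of hanging the process. A real deployment would also want a size cap on the spool and protection against a crash in the middle of a rewrite, both of which this sketch leaves out.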

  • BTW: I did some tests with rsyslog->logstash->elasticsearch and it was much more stable for me than graylog2, so if you don't need graylog2's fancy UI features it may be a better solution. – Łukasz Mierzwa Jan 17 '12 at 14:10