0

UPDATE 2: I've answered this via my new question at the link below. The root cause is behaviour by telegraf where by default it disconnects the TCP connection 5 seconds after the last received message. This may be by design, however I have an issue with their documentation which made this difficult for me to spot as a potential fix.

Perhaps this question can now be deleted?


UPDATE 1: rather than edit this question extensively, making the current answers make no sense, I have posed a new question based on new information I received as a result of posting this question.

syslog-ng / telegraf : EOF occurred when idle - incompatible?


I'm using syslog-ng Open-Source Edition (OSE) v3.31.2 in a docker-compose stack.

I have syslog messages arriving over the network from various hosts via UDP (which I'm constrained to because my clients use Boost::Log and this does not support syslog over TCP, only UDP), and I have syslog-ng set to forward these to another service downstream. This happens to be telegraf utilising a inputs.syslog module, but I'm not sure that matters yet.

My config looks like this:

@version: 3.29
@include "scl.conf"

options {
    flush-lines(1);
};
    
source s_network {
    udp(ip(0.0.0.0) port(514));
};

destination d_file {
    file("/var/log/messages");
};
    
destination d_telegraf {
    syslog("telegraf" port(6514) transport(tcp));
};
    
log {
    source(s_network);
    destination(d_telegraf);
    destination(d_file);
};

I have explicitly set the global flush-lines value to 1. I think this is the default, but I want to be sure. I want log messages to be forwarded as soon as they are received.

Most of the time this works - individual "lines" of logs arrive into syslog-ng via UDP 514, and are immediately written to the file /var/log/messages, and in almost all cases they are also immediately forwarded to telegraf on TCP port 6514.

The problem I'm seeing is that quite often syslog-ng is holding back many lines of incoming logs for up to around 30-60 seconds, then delivering them to telegraf in a big chunk. There doesn't seem to be much pattern to this, but it happens a lot. The odd thing is that the /var/log/messages file has the missing log entries written immediately, it's just the network delivery that is delayed. I had thought that flush-lines(1) would avoid this buffering, but it doesn't seem to.

I've used Wireshark to determine where the delay is, and it's in the output of packets from syslog-ng, between syslog-ng and telegraf TCP port 6514.

I did wonder if this might be a TCP Nagle's Algorithm thing - if so, is there a way to turn on the TCP_NO_DELAY socket option for syslog-ng's syslog destination driver?

Ultimately what I'm looking for is a fast, low-latency syslog service that can aggregate and relay logs as quickly as possible for real-time review downstream.

EDIT: I tried switching over to UDP transport between syslog-ng and telemetry and this seems to be much more responsive and the long, occasional delays have disappeared. However this will make it difficult to secure the connection in future.

davidA
  • 353
  • 2
  • 11

2 Answers2

2

What you experience is not normal. The above configuration should forward logs to d_telegraf and d_file at the same time, as soon as possible.

I believe you are having connection issues, that must be the reason for the 60-second delay, which is the default value of the reconnection timer.

You can lower this value using the time-reopen() global option, for example:

options {
  time-reopen(1);
};

You can also start syslog-ng in the foreground (in debug mode) to investigate the connection issues:

$ syslog-ng -Fdev
MrAnno
  • 211
  • 1
  • 5
  • Thank you for the hint about `-Fdev` - from this I've determined that syslog-ng is reporting `EOF on control channel, closing connection;` almost exactly 30 seconds after the last log message (after a client terminates), and then 30 seconds later: `Syslog connection established`. Wireshark monitoring syslog-ng output shows a FIN,ACK followed by nothing for exactly 60 seconds, followed by a SYN which I suspect is the reconnect. I'll try `time-reopen(1)` next. – davidA Nov 17 '21 at 02:27
  • I set `time-reopen(1)` and the connection seems to reconnect very quickly, as expected. Essentially the problem "goes away" with this short retry. However I'd like to work out what is causing the EOF on control channel - is this caused by a misbehaving client shutting down without closing the syslog connection? The client is sending syslog over UDP - does syslog implement a protocol connection state over UDP? – davidA Nov 17 '21 at 02:37
  • The clients are in-house applications written with Boost::Log - since this may be turning into a programmming question, perhaps I need to ask on StackOverflow if there's a way for them to tidily "disconnect" on program termination? – davidA Nov 17 '21 at 02:39
  • I think the UDP client is a red herring. I can reproduce the same behaviour with a single UDP syslog message sent by `logger -d`. From the log, I can see that syslog-ng receives the entry, sends it to both the network and file destinations, then reports "EOF occurred while idle" 5 seconds later, then closes the connection, "EOF on control channel", then 30 - 6o seconds later it reconnects. But I'm also seeing this happen, rarely, when completely idle, so perhaps there's an underlying network issue. I'm using a docker-compose network, btw, it's all on the same host. – davidA Nov 17 '21 at 02:50
  • This is getting a bit unwieldy so I might write this up as a separate question. Thank you again for the hints on where to start looking. – davidA Nov 17 '21 at 02:56
  • 1
    The message about "control channels" can be misleading, this is not about your network connections, it's about syslog-ng's own control channel. The EOF message means that you probably executed `syslog-ng-ctl` to query stats, reload, or to restart syslog-ng. – MrAnno Nov 17 '21 at 16:39
  • Ah, I think there must be an automated `syslog-ng-ctl` process within the official Docker container that is running periodically since I’ve never run that program manually myself. That would explain that message. – davidA Nov 17 '21 at 19:34
1

Try flush-lines(0) by just deleting that line all together.

How does syslog-ng handles flush_lines(0)?

https://github.com/syslog-ng/syslog-ng/issues/1411

dmourati
  • 24,720
  • 2
  • 40
  • 69
  • Indeed, I've tried `flush-lines(0)` and with it omitted as well. It doesn't seem to have any effect on this issue, and did find evidence (that I've since mislaid, the syslog-ng docs are so light on details sadly) that a value of 0 isn't actually valid. – davidA Nov 16 '21 at 02:37