Formal separation marker of syslog events?

Question

I've been looking at RFC5424 to find the formally specified marker that will end a syslog event.

Unfortunately, I couldn't find it. If I wanted to implement a small syslog server that reacts to certain messages, what is the marker that ends a message? (Yes, an event is commonly a single line, but I couldn't find that in the specification.)

Clarification:

I call it an event because I associate a message with a single line. An event could be something like

Type: foo
Source: webservers

whereas a message to me is this:

Type: foo Source: webservers

https://www.rfc-editor.org/rfc/rfc5424#section-6 defines:

SYSLOG-MSG      = HEADER SP STRUCTURED-DATA [SP MSG]

Neither STRUCTURED-DATA nor MSG tells me how these fields end. In particular, MSG is defined as MSG-ANY / MSG-UTF8, which expands to virtually anything. Nothing says that a newline marks the end (or an 8 or an a, for that matter). Given the example messages (section 6.5):

This is one valid message, or two valid messages, depending on whether you say that a HEADER element must never occur in any MSG element:

literal whitespace

<34>1 2003-10-11T22:14:15.003Z mymachine.example.com su - ID47 - <34>1 2003-10-11T22:14:15.003Z mymachine.example.com su - ID47
                                                                |
                                                               is this an end marker?

\t stands for a tab

<34>1 2003-10-11T22:14:15.003Z mymachine.example.com su - ID47 -\t<34>1 2003-10-11T22:14:15.003Z mymachine.example.com su - ID47
                                                                |
                                                               is this an end marker?

\n stands for a newline

<34>1 2003-10-11T22:14:15.003Z mymachine.example.com su - ID47 -\n<34>1 2003-10-11T22:14:15.003Z mymachine.example.com su - ID47
                                                                |
                                                               is this an end marker?

Either I'm misreading the RFC or it simply isn't mentioned. The sizes specified in the RFC only state the minimum message length that a receiver is expected to handle...

ANSWER?: Apparently I was reading the wrong RFC. One needs to go to the specific transport RFCs and follow those; https://www.rfc-editor.org/rfc/rfc5426#section-3.1 says it all for the UDP transport.

@joechip: Since your comments and answer led me to actually read a bit more in the transport RFCs, I'll be happy to accept your answer if you update it a bit in that direction :)

OK structured data ends with a `]` but a `\n` could be inside the `PARAM-VALUE` of an `SD-ELEMENT` — Martin M., Jun 29 '11 at 23:18
Which is what I said: it can contain newlines. I think you cannot assume a message corresponds to a single line. And I agree that the RFC doesn't seem to specify how the message length gets determined, so I think the string should be NULL-terminated. — joechip, Jun 30 '11 at 05:39
OK, now I'm confused. Are you essentially saying "There's no spec about end markers" or "Maybe it's an ASCII NUL"? The spec isn't about any transport; in fact, it's transport agnostic. I could easily come up with an SMTP transport for syslog and be perfectly compatible with this RFC. All it tells me is how to split the message up into fields, just not where the last part ends, so I can't know whether a UDP/TCP/whatever packet contains one or more messages. Am I looking at the wrong RFC for that question? — Martin M., Jun 30 '11 at 06:37
I have edited the answer to clarify this whole string-termination issue. String termination is needed in memory, but not needed over the network. And it turns out that the syslog message can contain something other than a string (i.e., an octet stream) which would not be suitable to null-terminate. — joechip, Jul 01 '11 at 02:07

joechip · Accepted Answer · 2011-07-01T02:04:15.833

Well, what do you mean by "syslog event"? In case you refer to syslog messages, RFC5424 unambiguously defines the syslog message syntax in its section 6, as it is to be transmitted from one syslog application to another.

In case you are referring to how they are stored in the log files by the receiving syslog application, typical syslog implementations simply separate one record from another with newlines, and this is not usually a configurable behavior. Furthermore, a syslog record's text field can also include newlines and this complicates the task of parsing the log file correctly. It can usually be parsed nonetheless because each syslog record starts with the usual sequence of date, time, host and tag while newlines inside a syslog record would not normally be followed by text similar to those.

I think that the ability to change the syslog stored-record separator would be a useful feature, but any occurrence of such a separator inside the record itself would have to be escaped for this to be useful. Adding so much structure to a plain text file is bound to be a compromise. If you care much about this issue, perhaps you should support writing to log files in some well-defined binary format (e.g., sqlite could be useful here).

Edit: A more careful examination of RFC5424 section 6 shows that a syslog message can have two forms:

HEADER SP STRUCTURED-DATA

or

HEADER SP STRUCTURED-DATA SP MSG

By expanding the ABNF specification, we can easily see that the first form ends in either "-" or "]". There could be other "-" and "]" chars before this final char, so it can't be taken for a syslog message terminator.

The second form ending depends on how MSG ends. MSG can be either a UTF-8 string (as specified in RFC 3629, which contains no string termination) or an arbitrary octet stream ending in any value. Evidently, there's no such termination symbol specified for this form either.

But the fact is that there is no need for a syslog message terminator, no matter what form it is in, because the message length is communicated out-of-band by the transport layer. When the UDP packet is sent by the application, the syslog message must already be prepared according to spec and stored in a buffer. This buffer is passed by the application to a function or method in order to send it, and the number of bytes to send is passed too. For example, in C we have:

ssize_t sendto(int sockfd, const void *buf, size_t len, int flags,
               const struct sockaddr *dest_addr, socklen_t addrlen);

In this example, len is the number of bytes that should be taken from the buffer buf and sent to the remote host.
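To make the sending side concrete, here is a minimal sketch, assuming a POSIX system. The helper name send_syslog_datagram is hypothetical (it comes from neither the RFCs nor any syslog implementation); it only wraps sendto() to show that the caller supplies the length and that nothing is appended to mark the end of the message.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (illustration only): send exactly `len` bytes of an
 * already formatted syslog message as a single datagram.
 * Returns 0 on success, -1 on failure.
 */
int send_syslog_datagram(int sockfd, const void *msg, size_t len,
                         const struct sockaddr *dest, socklen_t destlen)
{
    assert(msg != NULL || len == 0);

    /* No terminator is appended: `len` alone tells the transport where
     * the message ends, so the payload may contain newlines or any octet. */
    ssize_t sent = sendto(sockfd, msg, len, 0, dest, destlen);

    /* A datagram goes out whole or not at all; treat anything else as an error. */
    return (sent == (ssize_t)len) ? 0 : -1;
}
```

On an already connected socket, dest can be NULL and destlen 0. Two calls produce two separate datagrams, and therefore two separate syslog messages, regardless of what bytes the payloads contain.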

Likewise, on the syslog server another function or method is called, such as this one:

ssize_t recvfrom(int sockfd, void *buf, size_t len, int flags,
                 struct sockaddr *src_addr, socklen_t *addrlen);

This function returns the length in bytes of the UDP payload received in buffer buf (or -1 on error). If the application reads past this returned length, it gets stale or uninitialized data (or, past the end of the buffer, possibly a segmentation fault). To avoid reading past this limit, it is usual to store a NUL byte ('\0') at position buf[siz] right after the siz=recvfrom(...) call, which requires that the len passed to recvfrom be at least one byte less than the size of buf so that the terminator fits. This way, any later function call that uses buf as a string will work properly. This NUL-termination only applies to strings, of course, and not to octet streams. And this NUL byte is, as I said, usually not transmitted over the network but only added by the receiving application.
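A minimal sketch of that receive-and-terminate pattern, assuming a POSIX system. The helper name recv_syslog_datagram is hypothetical and is not taken from any real syslog server.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (illustration only): receive one datagram into buf and
 * NUL-terminate it so it can be handled as a C string afterwards.
 * Returns the payload length in bytes, or -1 on error.
 */
ssize_t recv_syslog_datagram(int sockfd, char *buf, size_t bufsize)
{
    assert(buf != NULL && bufsize > 0);

    /* Ask for at most bufsize - 1 bytes so one byte stays free
     * for the terminator added below. */
    ssize_t n = recvfrom(sockfd, buf, bufsize - 1, 0, NULL, NULL);
    if (n < 0)
        return -1;

    /* The terminator is added locally; it was never on the wire. */
    buf[n] = '\0';
    return n;
}
```

The return value, not the terminator, is the authoritative message length: a MSG part carrying arbitrary octets may itself contain zero bytes, in which case string functions would stop early.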

In the case of the syslog server as a receiving application, most syslog servers might add this null-termination for their internal handling of the received string (if they treat it as a string at all), but in any case this null value is left out when the string is appended to the logfile so as not to disrupt text processing of the logfile as a whole.

How about local TLS messages that are not transmitted via UDP? — Nils, Jul 03 '11 at 20:37
Just a different RFC to read (of course I don't have the number handy). But basically it's the transport that defines how to determine when a message ends and when a new one starts. UDP says that only one message per datagram is delivered, either in full, or truncated. I don't know what TLS says... — Martin M., Jul 03 '11 at 22:47
@Nils It's the same thing, only the TLS encapsulation is ruled by RFC5425. Its section 4.3 shows how the total syslog message length (its byte count) is used in encapsulating, and it says that the syslog message itself is ruled by RFC5424 (the one we have already discussed). — joechip, Jul 04 '11 at 03:47
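For a stream transport, the byte count has to travel in the stream itself. RFC5425 section 4.3 frames each message as MSG-LEN SP SYSLOG-MSG, where MSG-LEN is the decimal octet count of the message that follows. Below is a minimal sketch of splitting a received byte stream on that framing; the function name next_syslog_frame and the MAX_MSG_LEN cap are assumptions made for illustration, not something the RFC prescribes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical upper bound on accepted message sizes (an assumption). */
#define MAX_MSG_LEN 65536u

/*
 * Hypothetical helper (illustration only): look for one octet-counted frame
 * at the start of buf[0..avail).
 * Returns  1 if a complete frame is present (outputs are filled in),
 *          0 if more bytes are needed,
 *         -1 if the data is malformed or the length exceeds MAX_MSG_LEN.
 */
int next_syslog_frame(const char *buf, size_t avail,
                      const char **msg, size_t *msg_len, size_t *frame_len)
{
    assert(buf != NULL && msg != NULL && msg_len != NULL && frame_len != NULL);

    size_t i = 0;
    size_t len = 0;

    if (avail == 0)
        return 0;

    /* MSG-LEN = NONZERO-DIGIT *DIGIT */
    if (buf[0] < '1' || buf[0] > '9')
        return -1;
    while (i < avail && buf[i] >= '0' && buf[i] <= '9') {
        len = len * 10 + (size_t)(buf[i] - '0');
        if (len > MAX_MSG_LEN)
            return -1;              /* also keeps len from overflowing */
        i++;
    }
    if (i == avail)
        return 0;                   /* separator not received yet */
    if (buf[i] != ' ')
        return -1;                  /* MSG-LEN must be followed by SP */
    i++;

    if (avail - i < len)
        return 0;                   /* message body not complete yet */

    *msg = buf + i;                 /* body may hold newlines or any octet */
    *msg_len = len;
    *frame_len = i + len;           /* bytes to consume from the stream */
    return 1;
}
```

A caller would append incoming bytes to a buffer, call this in a loop, and drop frame_len bytes after each complete frame. The message body is never scanned for a delimiter.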

score 0 · Answer 2 · answered Jun 29 '11 at 21:36

In section 6.1 they define a message length. I would figure that when you get the complete message you'd have the header and data and it would add up to that length.

Beyond that, I see no facility in there for multiple messages. So I'd figure each message is an event. There is no multi-message tracking of any sort and no specified coding for start, middle, and ending messages. Syslog tracks logged messages, it doesn't really have a higher-level event concept.
