I'd like to parse logfiles. Is the logfile format of syslogd the same for all systems? On my system (Debian Lenny), it's:

Mar  7 04:22:40 my-host-name ...

(I'm not much interested in the ... part)

Can I rely on this? And is there maybe some more-or-less official description? The manpage of syslogd describes the config format, but not the logfile format.

Ideally, the description would give the fields official names like (date, time, host, entry) or (datetime, hostname, message), and maybe some regular expressions as well. I'd like to use the names and regexes in my script, to avoid unnecessary deviation from the standard and to make sure that the script runs everywhere.

Thanks

Chris

Chris Lercher

3 Answers

RFC 3164, which Warner pointed you to, describes the network format for UDP syslog messages. You can rely on this being what goes over the wire, but syslogd may write something slightly different to disk when it logs your messages.
That said, you can usually rely on syslog entries resembling what's described in the RFC, roughly in the form:

DATE HOSTNAME TAG: MESSAGE

Date is of the form Jan  1 00:00:01 (a day of the month below 10 is padded with a space rather than a zero, as in your example)
Hostname is usually the short hostname, but may be fully qualified (particularly if you're logging a message from a remote host)
Tag is freeform, but by convention doesn't contain :. It is often of the form procname[PID], and I believe always followed by a literal :
Message is freeform
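
Since you asked for regexes, here is a rough Python sketch of the layout above. The group names (timestamp, hostname, tag, pid, message) and the function name parse_line are my own labels, not official ones, and the pattern assumes the tag is followed by a literal colon, so lines that deviate from that (for example "last message repeated N times") simply don't match.

```python
import re

# Illustrative pattern for "DATE HOSTNAME TAG: MESSAGE".
# Group names are hypothetical labels, not taken from any standard.
SYSLOG_LINE = re.compile(
    r"^(?P<timestamp>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "  # e.g. "Mar  7 04:22:40"
    r"(?P<hostname>\S+) "                                        # short or fully qualified
    r"(?P<tag>[^:\[\s]+)(?:\[(?P<pid>\d+)\])?: ?"                # "procname" or "procname[PID]"
    r"(?P<message>.*)$"                                          # freeform remainder
)


def parse_line(line):
    """Return a dict of the named fields, or None if the line does not match."""
    match = SYSLOG_LINE.match(line.rstrip("\n"))
    return match.groupdict() if match else None
```

Treat a None result as "unknown format" rather than as an error; that keeps the script usable on systems whose syslogd writes something slightly different.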

If you need a better guarantee of consistency in your log format, syslog-ng is worth looking into -- it will let you define your fields & insert markers to ensure you can parse the resulting files. syslog-ng also lets you include metadata like the facility+priority values from the message. Using syslog-ng reduces the definition of "everywhere" to "machines running syslog-ng with a configuration similar to yours" though.

voretaq7

The RFC should answer this question. To my knowledge: yes, that's usually the case.

Warner

The devil is in the RFC that @warner linked:

4.1.3 MSG Part of a syslog Packet

The MSG part will fill the remainder of the syslog packet. This will usually contain some additional information of the process that generated the message, and then the text of the message. There is no ending delimiter to this part. The MSG part of the syslog packet MUST contain visible (printing) characters. The code set traditionally and most often used has also been seven-bit ASCII in an eight-bit field like that used in the PRI and HEADER parts. In this code set, the only allowable characters are the ABNF VCHAR values (%d33-126) and spaces (SP value %d32). However, no indication of the code set used within the MSG is required, nor is it expected. Other code sets MAY be used as long as the characters used in the MSG are exclusively visible characters and spaces similar to those described above. The selection of a code set used in the MSG part SHOULD be made with thoughts of the intended receiver. A message containing characters in a code set that cannot be viewed or understood by a recipient will yield no information of value to an operator or administrator looking at it.

The MSG part has two fields known as the TAG field and the CONTENT field. The value in the TAG field will be the name of the program or process that generated the message. The CONTENT contains the details of the message. This has traditionally been a freeform message that gives some detailed information of the event. The TAG is a string of ABNF alphanumeric characters that MUST NOT exceed 32 characters. Any non-alphanumeric character will terminate the TAG field and will be assumed to be the starting character of the CONTENT field. Most commonly, the first character of the CONTENT field that signifies the

This essentially says the developer can stick whatever they want into CONTENT, so there really is no standard for the actual contents of messages, just for the organization of messages. I might say that this is a flaw but I'm not sure yet.
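
To make the TAG rule from the quoted section concrete, here is a small Python sketch of how I read it: the TAG is the leading run of alphanumeric characters (at most 32), and the first non-alphanumeric character starts CONTENT. The function name split_msg is made up for illustration, and cutting an over-long run at 32 characters is my assumption; the RFC only says the TAG must not exceed that length.

```python
import re

# Leading run of ASCII alphanumerics, capped at 32 characters (assumption:
# a longer run is cut at 32 and the rest is treated as CONTENT).
TAG_RE = re.compile(r"[A-Za-z0-9]{0,32}")


def split_msg(msg):
    """Split an RFC 3164 MSG part into (tag, content)."""
    tag = TAG_RE.match(msg).group(0)  # always matches, possibly the empty string
    return tag, msg[len(tag):]
```

With this reading, "sshd[1234]: Accepted password" splits into the tag "sshd" and the content "[1234]: Accepted password", so the PID and the colon belong to CONTENT, which is part of why CONTENT is so loosely defined.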

Tommy