3

I'm parsing Exim log files and, due to my processing method, I lose the original order of all entries in the file. I rebuild the transactions by their message ID (e.g. 1OfiYX-0000Ev-7k) but still don't have a way to determine the original order.

The original order of the <=, =>, == and ** entries matters, right? Is there a way to rebuild the order without any additional information?

Goodbye

gnucom

4 Answers

2

I think the answer, limited solely to the symbols you ask about, is that <= is always going to come before each of the other symbols you list, and a message's Completed line will come after all of those symbols. However, each of the other symbols ==, =>, **, can appear in any order between <= and Completed.

One thing to keep in mind is that a message can have multiple recipients, and each of those recipients can be deferred (==), so the order of those symbols matters for each recipient of the message.

So, every message should have exactly one <= line, logged when the message is accepted by the local server.

Every message should have exactly one Completed line, indicating that the local server is done with the message.

Between those:

Each message:recipient will have exactly one of ** (failure) or => (delivered). It will be the last entry for that specific message:recipient.

Each message:recipient may have one or more == (deferred) lines. If a message:recipient has a == log line, it will occur before that message:recipient's => or ** line.

The relative order of different recipients within a given message most likely only matters if your own analysis needs it.
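The rules above can be sketched as a sort key. This is only a rough illustration, not a complete Exim log parser: the function names `rank` and `reorder` are made up for this example, and it assumes each line starts with a date field, a time field and the message ID, followed by the flag (`<=`, `=>`, `==`, `**` or `Completed`) and then the address. Lines with any other flag are simply placed between `<=` and `Completed`.

```python
def rank(line):
    """Sort key for one log line of a single message (hypothetical helper).

    Assumed field layout: date, time, message ID, flag, address, ...
    """
    fields = line.split()
    flag = fields[3] if len(fields) > 3 else ""
    address = fields[4] if len(fields) > 4 else ""
    if flag == "<=":
        return (0, "", 0)          # reception always comes first
    if flag == "Completed":
        return (2, "", 0)          # Completed always comes last
    # Per recipient: deferrals (==) sort before the final => or ** line.
    is_final = 1 if flag in ("=>", "**") else 0
    return (1, address, is_final)


def reorder(lines):
    """Return the lines of ONE message in an order consistent with the rules."""
    return sorted(lines, key=rank)
```

Recipients are grouped alphabetically here, which is an arbitrary choice. Several `==` lines for the same recipient keep whatever order they arrived in, because their true order cannot be recovered from the symbols alone.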

jj33
2

Exim is shipped with tools to help with logfile analysis. In particular, exigrep may be of interest, as it can search for a pattern in a line and then show all the log-lines for that message, including those which came before the match-line.

Exim also ships with documentation, "The Exim Specification". At the very least you should have a file called "spec.txt", if not a .pdf or other variant; it is also online at http://www.exim.org/. You might find the chapters "49. Log files", which documents the precise format of the log files, and "50. Exim utilities" useful.

Each log-line has a timestamp; group by exim message-id and then sort by timestamp and you have the original order back.
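A minimal sketch of that group-and-sort step, under some assumptions: the function name `rebuild_transactions` is made up for illustration, timestamps are taken to be in the default `YYYY-MM-DD HH:MM:SS` form with no timezone field, and message IDs are taken to look like `1OfiYX-0000Ev-7k` (6-6-2 characters). Lines that do not match, such as entries without a message ID, are skipped.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumed line prefix: "YYYY-MM-DD HH:MM:SS <message-id> ..."
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<msgid>[0-9A-Za-z]{6}-[0-9A-Za-z]{6}-[0-9A-Za-z]{2}) "
)


def rebuild_transactions(lines):
    """Group log lines by message ID, then sort each group by timestamp."""
    grouped = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # no timestamp + message ID prefix; skip
        ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S")
        grouped[match.group("msgid")].append((ts, line))
    return {
        msgid: [line for _, line in sorted(entries, key=lambda e: e[0])]
        for msgid, entries in grouped.items()
    }
```

Note that these timestamps only have one-second resolution, so lines logged within the same second come out in whatever order they were read. If that matters, the symbols themselves could serve as a tie-breaker.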

Phil P
1

Yes, they matter, as they indicate the direction of the message flow. You need to improve your processing method so that it does not reorder your entries.

topdog
  • *Is there a way to rebuild the order without any additional information?* – gnucom Aug 05 '10 at 06:57
  • Yes, there could be; it depends on how Hadoop has split the lines. Because each line has the ID, you should be able to reassemble the message. – topdog Aug 06 '10 at 10:07
  • So here is my next finding. 1) I sent a message FROM the server. 2) I sent a message TO the server. Both had the same structure. The first log entry *always* contained the `<=` while the second entry contained the others `=>`, `==`, etc. Based on the ID, why can't I just rebuild the message so the `<=` always is the first line? Am I misunderstanding something? Based on this experiment, I want to say order *doesn't* matter. – gnucom Aug 08 '10 at 00:49
0

Have you tried using one of the available log file parsers, e.g. awstats or sawmill?

wolfgangsz