Is there a way to determine what kind of log I'm looking at (so that it can be parsed correctly) if I have no prior information about its type (e.g. syslog, Apache log, IIS log)? I am trying to write a Grok filter for the logs, but I have no idea what the fields represent.

These are the first few lines from the logs:

14;1074585600;147.33.10.112;89ccfad2c4bbc02c91ed66055a235fca;/ls/index.php?      &id=62&view=1,2,3,4,6,9&sort=,13,4&pozice=40;hXXX://YYY.shop4.cz/ls/index.php?&id=62&view=1,2,3,4,6,9&sort=,13,4&pozice=20

12;1074585600;57.66.66.138;17bff4c98f96413dbe748c9cd8822da9;/ct/?c=158;hXXX://YYY.shop3.cz

14;1074585600;194.196.100.86;e9455a109435408eb7b8e170d636d024;/klient/seznam.php;hXXX://YYY.shop4.cz/klient/zpravy.php

11;1074585600;66.77.73.176;88dc79e8eb5968d936a7d563af55bd08;/dt/?id=9354;

10;1074585601;158.196.177.79;cbf84093e4740423436abaf3c1a65ebc;/;
shruti gupta

1 Answer

Sure. It looks like it's a log from the European Conferences on Machine Learning and European Conferences on Principles and Practice of Knowledge Discovery in Databases Discovery Challenge 2005 competition. They've got a page describing the data format and a FAQ about the data on the site.

(I could tell those were some old unix timestamps just by eyeballing them... 2004 vintage, those are.)
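If you'd rather not eyeball it, here is a minimal sketch for checking whether a number is a plausible Unix timestamp. The helper name `epoch_to_utc` is mine, and it assumes the value is in seconds rather than milliseconds.

```python
from datetime import datetime, timezone


def epoch_to_utc(seconds: int) -> str:
    """Interpret an integer as seconds since the Unix epoch, in UTC."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat()


# Second field of the first sample line in the question.
print(epoch_to_utc(1074585600))  # 2004-01-20T08:00:00+00:00
```

A date that lands somewhere sensible (here, January 2004) is a good hint that the field really is a timestamp.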

That's not any standard log format (and, BTW, syslog is a protocol, not a log format.)

In terms of methodology, I started by just looking at the lines. The fields are separated by semicolons. I could tell that the second field was a Unix epoch date just by seeing the size of the numbers. Obviously, the third field is an IPv4 address. The fourth field is 32 hexadecimal digits, so it's very likely an MD5 sum. The fifth field looks like the hierarchical part of a URL and the query. The last field looks like a URL, and I'd tend to surmise that it's a referer.
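The same eyeballing can be roughed out in code. This is only a sketch of the heuristics described above, not a general log-type detector: the function names, the labels, and the 9-to-10-digit rule for epoch values are all my own assumptions.

```python
import ipaddress
import re


def guess_field(value: str) -> str:
    """Return a rough guess at what a single log field contains."""
    if value == "":
        return "empty"
    if re.fullmatch(r"[0-9a-fA-F]{32}", value):
        return "md5-like hash"  # 32 hex digits; could be any 128-bit digest
    if re.fullmatch(r"\d{9,10}", value):
        return "unix epoch"  # assumption: seconds, roughly 1973 to 2286
    try:
        ipaddress.IPv4Address(value)
        return "ipv4"
    except ValueError:
        pass
    if value.startswith("/"):
        return "url path"
    if "://" in value:
        return "url"
    if value.isdigit():
        return "integer"
    return "unknown"


def guess_line(line: str, sep: str = ";") -> list:
    """Split a line on the separator and guess each field in order."""
    return [guess_field(field) for field in line.strip().split(sep)]


sample = "12;1074585600;57.66.66.138;17bff4c98f96413dbe748c9cd8822da9;/ct/?c=158;hXXX://YYY.shop3.cz"
print(guess_line(sample))
# ['integer', 'unix epoch', 'ipv4', 'md5-like hash', 'url path', 'url']
```

Running it over a few hundred lines and counting the guesses per column is a quick way to see whether the format is consistent.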

So, it looks like a web server log.
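Once you have a guess at the layout, a regular expression with named groups is a reasonable way to try it out before committing to a Grok pattern. The sketch below is plain Python, not Grok syntax, and the group names are hypothetical labels of mine. In particular, `field1` is a placeholder because nothing in the lines themselves says what the first number means.

```python
import re

# One named group per semicolon-separated field, in the order seen in the samples.
LINE_RE = re.compile(
    r"^(?P<field1>\d+);"                         # meaning unknown from the data alone
    r"(?P<epoch>\d+);"                           # Unix timestamp, assumed seconds
    r"(?P<client_ip>\d{1,3}(?:\.\d{1,3}){3});"   # IPv4 address (octet range not checked)
    r"(?P<hash>[0-9a-f]{32});"                   # 32 hex digits, likely an MD5 sum
    r"(?P<request>[^;]*);"                       # path plus query string
    r"(?P<referer>.*)$"                          # may be empty
)


def parse_line(line: str):
    """Return a dict of named fields, or None if the line does not fit."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


print(parse_line("10;1074585601;158.196.177.79;cbf84093e4740423436abaf3c1a65ebc;/;"))
```

Lines that come back as `None` are the ones to look at next, since they show where the guessed layout is wrong.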

I searched for the hashes on Google because I was curious to see if this data turned up anywhere else. Sure enough, one of the hashes turns up in the pages I linked to above.

Evan Anderson
  • Thank you so much. I was wondering if there was some sort of list of standard log formats but you answered my question perfectly and saved me a bunch of time. – shruti gupta Jul 15 '14 at 16:09