I'm writing a program that processes server-generated logs. I need to extract the c-referrer and the uri-stem, regardless of which other fields are logged and regardless of which log format is used. I've found the W3C Extended Log Format (list of fields), and I'm looking for documentation of any other formats that include these two fields, or for anyone who can tell me about such a format. Am I correct in assuming that the uri-stem will always begin with a "/" (and that no other field ever will)? I'm less concerned with finding the c-referrer than with parsing posted queries that have relatively particular parameters.

Also, if anyone knows the default log directories of common servers other than IIS (C:\WINDOWS\system32\LogFiles\W3SVC1), I would greatly appreciate it. (Or do different versions of IIS have different defaults?)

Thanks!

nona urbiz

1 Answer

Speaking for IIS...

Am I correct in assuming that the uri-stem will always begin with a "/"

Yes. (Although the field is named **cs-**uri-stem.)

and that no other field ever will?

No. Several other fields might begin with a "/", such as:

  • cs-username (for a user whose name begins with "/", which is odd but perfectly legal)
  • cs(Cookie) (perfectly legal for a cookie to begin with "/")
  • cs-uri-query (although it should be uri-encoded to %2f, that's browser-dependent)
  • cs(User-Agent) (begins with "Mozilla" or "Opera" for browsers, but there are literally thousands of different spiders, robots, etc. that use any kind of random user-agent they feel like)
  • cs(Referer) (most browsers send the full URL, but it would be possible for a client to send something else, like a relative URL).

Again, I can't speak for any other web servers, but in IIS, it would be extremely unwise to assume cs-uri-stem is the only field which begins with "/".

--

PS: Have you seen LogParser? (download or docs)

Portman
  • perhaps you can help me then define a regex or another method by which to safely extract the cs-uri-stem regardless of what other fields are enabled? – nona urbiz Sep 14 '09 at 02:07
  • LogParser is irrelevant as I am trying to write my own specialized parser, as much because I want to write it as because I want the software, but thanks for the idea. – nona urbiz Sep 14 '09 at 02:08
  • The log files have a header which identifies which columns are which. The columns are then space delimited. So there's really no use for a regular expression -- just split the header on spaces to determine which column cs-uri-stem is in, and then split each line on spaces. I suppose you could do the split in a regular expression if you really wanted to, but I can't see what the advantage would be... – Portman Sep 15 '09 at 02:46
  • +1 and +1 and I will accept your answer before the bounty if something better doesn't appear. You answered my stupidly specific question, but better yet, the above comment pointed out the idiotically simpler approach that I have been blind to. Thank you. – nona urbiz Sep 15 '09 at 18:39
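
For anyone landing here later, the header-based approach from the comments can be sketched roughly as follows in Python. This is only a minimal sketch, not a full parser: the function name `iter_field` and the sample log lines are made up for illustration, and it assumes values contain no literal spaces, so quoted strings (which the W3C format allows) would need extra handling.

```python
from typing import Iterable, Iterator, Optional


def iter_field(lines: Iterable[str], field: str = "cs-uri-stem") -> Iterator[str]:
    """Yield the value of `field` from each entry of a W3C-style log.

    Hypothetical helper: locates the column via the "#Fields:" directive
    instead of guessing from the shape of the value.
    """
    index: Optional[int] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # A "#Fields:" directive names the columns of the entries that
            # follow. Re-read it whenever it appears, in case it changes.
            if line.startswith("#Fields:"):
                names = line[len("#Fields:"):].split()
                index = names.index(field) if field in names else None
            continue
        if index is None:
            continue  # no header seen yet, or the field is not being logged
        columns = line.split()
        if index < len(columns):
            yield columns[index]


# Made-up sample data: the referer also begins with "/", so a
# "starts with a slash" test alone could not tell the two fields apart.
sample = [
    "#Version: 1.0",
    "#Fields: date time cs-method cs-uri-stem cs-uri-query cs(Referer)",
    "2009-09-14 02:07:00 POST /search.aspx q=logs /relative/referer",
]
print(list(iter_field(sample)))  # ['/search.aspx']
```

Because the column is looked up by name, the same function works for any other logged field, e.g. `iter_field(sample, "cs(Referer)")`.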