3

EDIT: Thanks for the help

Here is a quick idea of the setup:

webserver X

In apache httpd.conf:

LogFormat "%v %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" vcombined
CustomLog "|/usr/bin/logger -p local6.info -t access " vcombined

In rsyslog.conf:

*.* @logserver

Logserver

syslog-ng.conf:

...
parser p_apache {csv-parser(columns(
"APACHE.VIRTUAL_HOST",
"APACHE.CLIENT_IP",
"APACHE.IDENT_NAME",
"APACHE.USER_NAME",
"APACHE.TIMESTAMP",
"APACHE.REQUEST_URL",
"APACHE.REQUEST_STATUS",
"APACHE.CONTENT_LENGTH",
"APACHE.REFERER",
"APACHE.USER_AGENT",
"APACHE.PROCESS_TIME",
"APACHE.SERVER_NAME")
# flags:
#   escape-none,escape-backslash,escape-double-char,
#   strip-whitespace
flags(escape-double-char,strip-whitespace)
delimiters(" ")
quote-pairs('""[]')
);};
...
source s_net { udp(ip(0.0.0.0) port(514) so_rcvbuf(1048576)); };
destination hosts_acc { file("/var/log/hosts/$HOST/${APACHE.VIRTUAL_HOST}_acc.log"); };
filter f_apacheacc   { facility(local6); };
log { source(s_net); parser(p_apache); filter(f_apacheacc); destination(hosts_acc); };
...

The logs get there just fine, but a LOT of log files are created with names like the following:

-rw------- 1 root root       5726 Apr  6 01:02 xc3\x9d\xc3\x9ed$yA;_acc.log
-rw------- 1 root root      23435 Apr  6 01:06 \xc3\x9ed$yA;_acc.log
-rw------- 1 root root        745 Apr  6 00:57 xc3\x9ed$yA;_acc.log
-rw------- 1 root root       8440 Apr  5 22:50 \xc3\xaf_F\xc3\x95$yA;_acc.log
-rw------- 1 root root       3112 Apr  6 00:58 xe2\x80\x94w\xe2\x80\x98\xc3\x9d\xc3\x9ed$yA;_acc.log
-rw------- 1 root root       4220 Apr  5 22:03 xe2\x80\x98\twd\xc2\xa2\xc2\xb0\xc3\x96$yA;_acc.log
-rw------- 1 root root       1055 Apr  5 22:03 xe2\x80\x98\xc2\x9dw\xc3\x94\xc3\xb4T\xc5\x93$yA;_acc.log
-rw------- 1 root root       1821 Apr  6 00:58 \xe2\x80\x98\xc3\x9d\xc3\x9ed$yA;_acc.log
-rw------- 1 root root       2875 Apr  6 01:02 xe2\x80\x98\xc3\x9d\xc3\x9ed$yA;_acc.log
-rw------- 1 root root       3165 Apr  5 22:48 \xe2\x80\x99-w\xc3\xaf_F\xc3\x95$yA;_acc.log
-rw------- 1 root root       3165 Apr  5 22:40 \xe2\x80\x99\xe2\x80\x9aw\xe2\x82\xac\xc2\xbd\xe2\x80\x9d($yA;_acc.log
-rw------- 1 root root      15825 Apr  5 22:50 xe2\x80\x99\xe2\x80\x9aw\xe2\x82\xac\xc2\xbd\xe2\x80\x9d($yA;_acc.log
-rw------- 1 root root       1055 Apr  5 22:39 \xe2\x80\x9aw\xe2\x82\xac\xc2\xbd\xe2\x80\x9d($yA;_acc.log
-rw------- 1 root root       2110 Apr  5 22:50 xe2\x80\x9aw\xe2\x82\xac\xc2\xbd\xe2\x80\x9d($yA;_acc.log
-rw------- 1 root root       2034 Apr  5 22:50 \xe2\x80\x9d($yA;_acc.log
-rw------- 1 root root       4066 Apr  5 22:45 xe2\x80\x9d($yA;_acc.log
-rw------- 1 root root       7212 Apr  6 13:30 \xe2\x80\xb9>$yA;_acc.log
-rw------- 1 root root       3000 Apr  6 13:25 xe2\x80\xb9>$yA;_acc.log

My question is where and how I can filter these out. I don't want them on the filesystem (although I guess it wouldn't be a bad idea to keep them logged, but in their correct VHost file).

Here is an example VHost

<VirtualHost *:80>
    ServerAdmin xxx@xxx.xx
    ServerName xxx.xx
    DocumentRoot /var/www/vhosts/xxx
    <Directory /var/www/vhosts/xxx>
        AllowOverride All
        Options All
        RewriteEngine on
    </Directory>
</VirtualHost>

And the default "catch-all" vhost at the bottom of the vhosts config file:

<VirtualHost *:80>

    ServerName default
    ServerAlias *
    ServerAlias catchall.xxx.xx

    DocumentRoot /var/www/vhosts/nodomain

    <Directory "/var/www/vhosts/nodomain">
        Options Indexes FollowSymLinks
        AllowOverride none
        Allow from All
    </Directory>
    CustomLog /dev/null combined
    ErrorLog /dev/null
</VirtualHost>

I had posted this in a related question, but it's better in its own question.

Here are some examples from inside the log files

r_acc.log:
Apr  7 11:16:27 xxxxx access: r PC 5.0; eSobiSubscriber 2.0.4.16; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C)"
Apr  7 11:16:28 xxxxx access: r PC 5.0; eSobiSubscriber 2.0.4.16; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C)"

########################

D46-28E2-0FBC95-78798EV\xe2\x80\x94w\xe2\x80\x98\xc3\x9d\xc3\x9ed$yA;_acc.log:
Apr  7 14:54:06 xxxxx access: D46-28E2-0FBC95-78798EV\xe2\x80\x94w\xe2\x80\x98\xc3\x9d\xc3\x9ed$yA; B557000E-F20D-35DD-021A-9824EC-17A4AFV\xe2\x80\x94w\xe2\x80\x98\xc3\x9d\xc3\x9ed$yA; 3BD03D7B-EEFD-83FF-7599-B751AD-6F0A2EV\xe2\x80\x94w\xe2\x80\x98\xc3\x9d\xc3\x9ed$yA; 9CAE0724-D455-0B31-3378-871C11-BBD0A4V\xe2\x80\x94w\xe2\x80\x98\xc3\x9d\xc3\x9ed$yA; C1E24799-3979-2452-81-3BAA0FFD361F5A; 0E701CBC-5832-5AB6-D5-CFBF9BDE863EAA; 464714B1-B3E2-774A-A4-FEA612A46CEE06; 74C817B0-D081-D2CC-6D-C4EF0F1B4F49BB; 1338B1DE-67CD-977C-B35D-1F2C4441DD6A; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 1.1.4322; .NET CLR 3.5.30729; .NET CLR 3.0.30729; OfficeLiveConnector.1.5; OfficeLivePatch.1.3; .NET4.0C; BRI/2)"

########################

V\xe2\x80\x94w\xe2\x80\x98\xc3\x9d\xc3\x9ed$yA;_acc.log:
Apr  7 14:55:04 xxxxx access: V\xe2\x80\x94w\xe2\x80\x98\xc3\x9d\xc3\x9ed$yA; FEEACE4F-092A-1D46-28E2-0FBC95-78798EV\xe2\x80\x94w\xe2\x80\x98\xc3\x9d\xc3\x9ed$yA; B557000E-F20D-35DD-021A-9824EC-17A4AFV\xe2\x80\x94w\xe2\x80\x98\xc3\x9d\xc3\x9ed$yA; 3BD03D7B-EEFD-83FF-7599-B751AD-6F0A2EV\xe2\x80\x94w\xe2\x80\x98\xc3\x9d\xc3\x9ed$yA; 9CAE0724-D455-0B31-3378-871C11-BBD0A4V\xe2\x80\x94w\xe2\x80\x98\xc3\x9d\xc3\x9ed$yA; C1E24799-3979-2452-81-3BAA0FFD361F5A; 0E701CBC-5832-5AB6-D5-CFBF9BDE863EAA; 464714B1-B3E2-774A-A4-FEA612A46CEE06; 74C817B0-D081-D2CC-6D-C4EF0F1B4F49BB; 1338B1DE-67CD-977C-B35D-1F2C4441DD6A; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 1.1.4322; .NET CLR 3.5.30729; .NET CLR 3.0.30729; OfficeLiveConnector.1.5; OfficeLivePatch.1.3; .NET4.0C; BRI/2)"

###################

xc2\x90\xc3\x91\xc3\x94\xc2\xab$yA;_acc.log:
Apr  7 19:48:39 xxxxx access: xc2\x90\xc3\x91\xc3\x94\xc2\xab$yA; 3C12D25C-9D40-91CF-1F40-AC-B1A083426DV-w\xc2\x90\xc3\x91\xc3\x94\xc2\xab$yA; D4713FA8-0142-A0C2-4812-BA-E03221005BV-w\xc2\x90\xc3\x91\xc3\x94\xc2\xab$yA; 199BAF2A-ECD5-39FA-65C3-E8-B107FAFF08V-w\xc2\x90\xc3\x91\xc3\x94\xc2\xab$yA; 384BDA70-9954-7744-05A0-C4-C7D9FEA685V-w\xc2\x90\xc3\x91\xc3\x94\xc2\xab$yA; EE7292A9-333C-AF70-5A7F-55-CAA7D0BA39V-w\xc2\x90\xc3\x91\xc3\x94\xc2\xab$yA; -AD7D48FA3A55-2A33-D10B-B4B66276D8B8; -166A9C6A2E71-24DF-A192-C8258AA4DE14; -00077C6C84E0-A302-4954-3D6D17C54D31; 3F56C318-EC3C-432B-680F-7E4BB2B852C4; SLCC1; .NET CLR 2.0.50727; .NET CLR 3.5.21022; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C)"
Apr  7 19:48:39 xxxxx access: xc2\x90\xc3\x91\xc3\x94\xc2\xab$yA; 3C12D25C-9D40-91CF-1F40-AC-B1A083426DV-w\xc2\x90\xc3\x91\xc3\x94\xc2\xab$yA; D4713FA8-0142-A0C2-4812-BA-E03221005BV-w\xc2\x90\xc3\x91\xc3\x94\xc2\xab$yA; 199BAF2A-ECD5-39FA-65C3-E8-B107FAFF08V-w\xc2\x90\xc3\x91\xc3\x94\xc2\xab$yA; 384BDA70-9954-7744-05A0-C4-C7D9FEA685V-w\xc2\x90\xc3\x91\xc3\x94\xc2\xab$yA; EE7292A9-333C-AF70-5A7F-55-CAA7D0BA39V-w\xc2\x90\xc3\x91\xc3\x94\xc2\xab$yA; -AD7D48FA3A55-2A33-D10B-B4B66276D8B8; -166A9C6A2E71-24DF-A192-C8258AA4DE14; -00077C6C84E0-A302-4954-3D6D17C54D31; 3F56C318-EC3C-432B-680F-7E4BB2B852C4; SLCC1; .NET CLR 2.0.50727; .NET CLR 3.5.21022; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C)"

Thanks

zeyus
  • I've just seen a bunch more come in...as best as I can figure the requests must be so long that they're wrapping or something and that it becomes a "new line" but I don't know how to test this theory, other than that the entries inside the strange files look like half lines... Apr 7 11:11:26 xxxxxx access: D9632C6-FC46-5A-9C5E42-86A23F7C38FB; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; eSobiSubscriber 2.0.4.16; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C)" – zeyus Apr 07 '11 at 10:40
  • Oh, I get it. These are [malformed requests](http://serverfault.com/questions/256245/how-to-block-malformed-requests-to-apache) (There is no GET or POST, etc.) and this is throwing off your syslog-ng parser. Can you post a few more lines from your logfile into the question above (It's hard to read them down here in the comments), so we can get a better sense of how these requests are coming in? – Stefan Lasiewski Apr 07 '11 at 14:47
  • Thanks, I've updated the post to include examples from the log files :) Hopefully that gives a bit more information! There are many many many log files with these kinds of names and the odd requests – zeyus Apr 07 '11 at 18:38

3 Answers

1

If you want to keep your log setup the same and handle these weird files in syslog-ng, you could try defining a 'known hosts' filter and adding it to all your log directives.

Then catch the messages that don't match in a 'fallback' log path whose destination name doesn't depend on information in the log message.

destination hosts_acc { file("/var/log/hosts/$HOST/${APACHE.VIRTUAL_HOST}_acc.log"); };
destination hosts_def { file("/var/log/hosts/unk/unmatched.log"); };
filter f_apacheacc   { facility(local6); };
filter f_known { host("myserver1") or host("myserver2"); };  # add one host() term per known server
log { source(s_net); parser(p_apache); filter(f_known); filter(f_apacheacc); destination(hosts_acc); flags("final"); };
log { source(s_net); parser(p_apache); filter(f_apacheacc); destination(hosts_def); flags("fallback"); }; 

You could do a similar thing for any variable, like APACHE.VIRTUAL_HOST or whatever you like.

ryansstack
  • Thanks, this is a really interesting idea and one that we might use, the only downside with this is that we have 100s of VHosts and new ones are added weekly. It will add another step to every new site added. – zeyus Apr 11 '11 at 11:25
  • if you have a few naming conventions, it will become a "make sure this works" more often than an "add a line to this file" step – ryansstack Apr 12 '11 at 15:31
  • We can't use host() because the logs are all coming from our webservers, but filtering by VIRTUAL_HOST would be great. Is it possible to filter with wildcards? They're all TLDs (for example *.com). If so, then top marks for you! :) – zeyus Apr 13 '11 at 19:03
  • You sir are a gentleman and a scholar, I've looked into it and using filter+match seems to be the way forward. Thank you!! – zeyus Apr 14 '11 at 07:30
  • The final solution ended up being adding 2 filters (the second for the catchall, because the apache self connections were just a waste of log lines), I want to upvote your answer but I don't have enough points yet... ------- `filter f_reallog { match("^([a-zA-Z0-9-]+\.)+[a-zA-Z]{2,5}$" value("APACHE.VIRTUAL_HOST")); }; filter f_noself { not match("\(internal dummy connection\)" value("MESSAGE")); }; ` – zeyus Apr 14 '11 at 11:02
  • I'm glad that worked for you, I recently had the same issue on my own servers :) – ryansstack Apr 20 '11 at 18:41
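
For anyone who wants to try the f_reallog pattern from the comment above before touching syslog-ng, here is a minimal Python sketch. It only approximates the idea: syslog-ng's match() may use a different regex flavour than Python's re module, and the function name destination_for, the host name web1, and the vhost www.example.com are made up for illustration. The two paths are the ones from the hosts_acc and hosts_def destinations in the answer.

```python
import re

# Pattern copied from the f_reallog filter in the comment above.
VHOST_RE = re.compile(r"^([a-zA-Z0-9-]+\.)+[a-zA-Z]{2,5}$")


def destination_for(host: str, virtual_host: str) -> str:
    """Pick a log file the way the two log paths in the answer would.

    Hypothetical helper: names that look like a real FQDN go to the
    per-vhost file, everything else lands in the fallback file.
    """
    if VHOST_RE.match(virtual_host):
        return f"/var/log/hosts/{host}/{virtual_host}_acc.log"
    return "/var/log/hosts/unk/unmatched.log"


if __name__ == "__main__":
    # "web1" and "www.example.com" are assumed sample values.
    samples = ["www.example.com", "r", r"xc3\x9d\xc3\x9ed$yA;", "default"]
    for name in samples:
        print(repr(name), "->", destination_for("web1", name))
```

The garbage names from the question contain characters such as `\`, `$` and `;`, none of which the pattern allows, so they should all fall through to the unmatched file. A name without a dot, such as the catch-all's `default`, is rejected as well.
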
0

Don't have your default vhost log.

(Adding more here so I can use formatting.)

You probably want the logs. You could configure the default vhost like so:

LogFormat "default %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" vcombineddefault
CustomLog "|/usr/bin/logger -p local6.info -t access " vcombineddefault

I've replaced %v with default.

(I'm assuming syslog-ng is correctly parsing the Apache log. I don't know about that part.)

Mark Wagner
  • In the default vhost do you think it's a good idea to just point the logs to /dev/null ? Because before I didn't have a catch-all vhost and the problem was obviously still there, but now I have that catchall I guess it's possible to set that specifically to log somewhere else – zeyus Apr 06 '11 at 19:23
  • This didn't help unfortunately, even making the default vhost log to /dev/null instead I still get the strange files, I think the requests must be really too long or multilined somehow – zeyus Apr 07 '11 at 09:37
0

(I am overwriting my old answer with this new answer based on the new information you are giving. Sadly, I don't have an answer yet.)

These are malformed requests to your webserver. There is no GET or POST method, etc., and this is throwing off your syslog-ng parser. syslog-ng is assuming that things like 'xc3\x9d\xc3\x9ed$yA;' are the 'APACHE.VIRTUAL_HOST', and is building the file names as you instructed it to.
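
To see why the file names come out like this, here is a rough Python sketch (not syslog-ng itself) of what a space-delimited parser with quote-pairs('""[]') does to a line. The function name split_columns, the well-formed sample line, and the host www.example.com are assumptions for illustration; the fragment is the line shown under r_acc.log in the question. The real csv-parser also honours flags such as escape-double-char and strip-whitespace, which this sketch ignores.

```python
def split_columns(line, delimiter=" ", quote_pairs='""[]'):
    """Split on the delimiter, keeping quoted or bracketed spans as one column."""
    # Map each opening character to its closing partner: '"' -> '"', '[' -> ']'.
    closers = {quote_pairs[i]: quote_pairs[i + 1]
               for i in range(0, len(quote_pairs), 2)}
    columns, current, closing = [], [], None
    for ch in line:
        if closing is not None:
            # Inside a quoted span: everything up to the closer is kept.
            if ch == closing:
                closing = None
            else:
                current.append(ch)
        elif ch in closers and not current:
            # A quote only opens a span at the start of a column.
            closing = closers[ch]
        elif ch == delimiter:
            columns.append("".join(current))
            current = []
        else:
            current.append(ch)
    columns.append("".join(current))
    return columns


# Assumed example of a complete "vcombined" line.
good = ('www.example.com 10.0.0.1 - - [07/Apr/2011:14:54:06 +0200] '
        '"GET / HTTP/1.1" 200 512 "-" "Mozilla/4.0 (compatible; MSIE 8.0)"')
# Message text taken from the r_acc.log example in the question.
fragment = ('r PC 5.0; eSobiSubscriber 2.0.4.16; .NET CLR 3.5.30729; '
            '.NET CLR 3.0.30729; .NET4.0C)"')

if __name__ == "__main__":
    for line in (good, fragment):
        first = split_columns(line)[0]  # this column becomes APACHE.VIRTUAL_HOST
        print(f"{first}_acc.log")
```

Whatever happens to be the first space-separated token of the message is used as the virtual host, so a message that starts in the middle of a User-Agent string produces a file named after that token.
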

According to the Apache httpd mod_log_config documentation, this \xhh text represents non-printable characters in the requests:

For security reasons, starting with version 2.0.46, non-printable and other special characters in %r, %i and %o are escaped using \xhh sequence
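
If you want to see which bytes are hiding behind those escapes, a small Python sketch like the following turns the \xhh sequences back into raw bytes and tries to read them as UTF-8. The helper name unescape_xhh is made up, the sample string is one of the file-name prefixes from the question, and treating the bytes as UTF-8 is only a guess about what the client sent.

```python
import re

XHH = re.compile(r"\\x([0-9a-fA-F]{2})")


def unescape_xhh(text: str) -> bytes:
    """Turn Apache-style \\xhh escapes back into raw bytes (hypothetical helper)."""
    out = bytearray()
    pos = 0
    for m in XHH.finditer(text):
        out += text[pos:m.start()].encode("utf-8")  # literal text between escapes
        out.append(int(m.group(1), 16))             # the escaped byte itself
        pos = m.end()
    out += text[pos:].encode("utf-8")
    return bytes(out)


if __name__ == "__main__":
    # Prefix taken from one of the odd file names in the question.
    sample = r"V\xe2\x80\x94w\xe2\x80\x98\xc3\x9d\xc3\x9ed$yA;"
    raw = unescape_xhh(sample)
    print(raw)
    # Assumption: the bytes are UTF-8; undecodable bytes are replaced.
    print(raw.decode("utf-8", errors="replace"))
```
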

I'm confused why these requests are in your access log at all. There is no GET, HEAD, POST, etc. so it's not a valid request from what I can see.

Can you post a few more lines from your logfile into the question above (It's hard to read them down in the comments), so we can get a better sense of how these requests are coming in?

Stefan Lasiewski
  • Basically it's been set up so the only logging directive is within the httpd.conf, and nothing at all about logging in vhosts. Also, there are no vhosts with any names like that (they're all FQDN) :( but it must be matching it against the line somehow because of the requests they're sending! – zeyus Apr 07 '11 at 07:26
  • Thanks for your help so far. I've posted the lines from the logs, but the problem is I can't see the "full" lines because of the weird requests and how it gets mixed up (there may be a GET or POST at the beginning somewhere but it's getting mixed up) – zeyus Apr 07 '11 at 18:48