
I have an Apache/nginx/whatever web server that logs client IP addresses to the access logs. These log files are rotated via logrotate.

I want to keep the IP addresses for a few days; after 7 days, I want to remove them from the log files for privacy reasons (mostly dictated by German law).

Using mod_removeip or something like that doesn't work because I need to filter some requests based on their IP addresses.

Is there any 'standard' way to do it? Maybe even with logrotate?

EDIT

I just found this script, but it depends on piping all logging through the script in real time. I'm not really sure about the performance implications of this approach.

Also, this only works for the 'front-end' server logs, not the application server logs.

Dave M
Michael Siebert

3 Answers


PCRE! (Perl-Compatible Regular Expression)

s/\b(1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])\.(1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])\.(1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])\.(1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])\b/REMOVED IP/g

Use that as a filter in a Perl script or any other suitable language (many languages support PCRE or a close-enough regex dialect) to rewrite your log files once they are 7 days old. Note that the pattern only matches IPv4 addresses; IPv6 addresses are left in place.

$ cat > file_with_ip
some text from 192.168.1.1
^D
$ perl -p -i -e 's/\b(1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])\.(1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])\.(1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])\.(1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])\b/REMOVED IP/g' file_with_ip
$ cat file_with_ip
some text from REMOVED IP
Jeff Ferland

On Ubuntu newer than 12.04 with Apache 2.4 and the default configuration, you could use something like this:

for file in $(find /var/log/apache2 -type f -name "*.gz" ! -name "*.ano.*" -mtime +7)
do
    datestamp=$(date +"%Y%m%d%H%M%S")
    # echo "Process $file"
    zcat "$file" | sed -E "s/([0-9]{1,3}\.[0-9]{1,3})\.[0-9]{1,3}\.[0-9]{1,3}/\1.0.0/g" | gzip > "${file%.*}.ano.${datestamp}.gz"
    # rm -f "$file" # Only call this if you are sure that the command before succeeds, otherwise you will lose data.
done

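For every *.gz file older than 7 days, this creates an anonymized copy with .ano.<timestamp> inserted before the .gz extension, in which the last two octets of the IP addresses are replaced with 0.0. The original file is kept unless you enable the rm line.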

If you don't use compression, or use a different one such as bzip2, you have to adjust the commands accordingly, e.g. zcat -> bzcat.

Finally, you can run this routine from cron once per day or week.
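If you want the original deleted automatically, a hedged sketch of a more defensive variant follows, assuming bash and gzip-compressed logs; the function name anonymize_old_gz and its default directory are hypothetical. With pipefail set, a failure in zcat, sed or gzip makes the whole pipeline fail, so the original is removed only after its anonymized copy was written completely.

```shell
#!/usr/bin/env bash
# Hypothetical helper: anonymize gzip-compressed logs older than 7 days and
# delete each original only after its anonymized copy was written successfully.
anonymize_old_gz() {
    local dir="${1:-/var/log/apache2}"   # assumed default log directory
    local file out
    find "$dir" -type f -name '*.gz' ! -name '*.ano.*' -mtime +7 -print0 |
    while IFS= read -r -d '' file; do
        out="${file%.gz}.ano.$(date +%Y%m%d%H%M%S).gz"
        # pipefail in the subshell: its status reflects zcat, sed and gzip.
        # The sed mask keeps the first two octets of every IPv4 address.
        if (set -o pipefail
            zcat "$file" |
            sed -E 's/([0-9]{1,3}\.[0-9]{1,3})\.[0-9]{1,3}\.[0-9]{1,3}/\1.0.0/g' |
            gzip > "$out"); then
            rm -f -- "$file"
        else
            rm -f -- "$out"   # discard the partial copy, keep the original
        fi
    done
}
```

A cron job would then only need to call `anonymize_old_gz /var/log/apache2`.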

jschnasse

I don't think logrotate can do this on its own; you may need to write a script that decompresses the files, runs them through awk or sed to strip out the IPs, and then recompresses them. You just can't do it on "active" log files.
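The awk route could look like this minimal sketch, assuming the common/combined log format where the client address is the first space-separated field; the function name strip_client_ip is hypothetical. It rewrites one already-rotated .gz file through a temporary file in the same directory and replaces the original with mv only if the whole pipeline succeeded. It is meant for rotated files only, never the log the server is still writing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace the client address (first field) in one
# gzip-compressed, already-rotated access log, in place.
strip_client_ip() {
    local file="$1" tmp
    # Temp file next to the original, so the final mv stays on one filesystem.
    tmp="$(mktemp "${file}.XXXXXX")" || return 1
    # awk only rewrites the leading field; the rest of each line is unchanged.
    if (set -o pipefail
        zcat "$file" |
        awk '{ sub(/^[^ ]+/, "0.0.0.0"); print }' |
        gzip > "$tmp"); then
        chmod --reference="$file" "$tmp"   # mktemp creates 0600; keep the old mode (GNU chmod)
        mv -f -- "$tmp" "$file"
    else
        rm -f -- "$tmp"
        return 1
    fi
}
```

Combined with `find ... -name '*.gz' -mtime +7 -exec bash -c ...` or a loop, this keeps the 7-day window intact.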

Bart Silverstrim
  • I believe logrotate has pre/post hooks that you could use to launch the script you mention; then the OP wouldn't need to manage a separate process. – EEAA Feb 09 '12 at 14:17
  • Maybe you can use logrotate's "postrotate" for this. – Stone Feb 09 '12 at 14:21
  • I even thought of creating a "compress" script that filters the log and then pipes it to gzip. This would save the step of decompressing the logs, but it would 'kill' the 7-day window I want. – Michael Siebert Feb 09 '12 at 14:31