The European General Data Protection Regulation (GDPR) aims to protect end users' privacy. Among many other consequences, system administrators are obliged to configure their systems so that they do not store IP addresses for unnecessarily long periods of time, do not store them without consent, et cetera. This is because IP addresses are considered personal data.

Nevertheless, there are good reasons, in accordance with the GDPR, for not anonymizing IP addresses right from the very beginning. One needs means to protect a system from attacks (e.g. in order to protect the personal data of many users in the database). For example, if your system is currently under attack and the attack originates from one particular IP address, you need to be able to block this IP (probably only temporarily). You may also want to check when the attack started, i.e. when the bad requests from this IP began. Moreover, you often want to keep your log files for a longer period of time so that you can analyze them (which is perfectly okay if they don't contain personal data).

So these are competing interests. One simple compromise is to store the original IP addresses in the log files for only a short period of time, anonymize the IP addresses in the older log files, and, of course, inform your users/visitors about these facts (in your website's privacy notice).

How can I configure NGINX for such a GDPR-compliant setup, which does not anonymize all IP addresses right from the beginning? There are discussions and solutions for instantly/directly anonymizing IPs (e.g. here); but how can I set up anonymization for older log files only?

Caveat: IANAL

Mischa

1 Answer

One can easily set up such a hybrid configuration with non-anonymized short-term logs and anonymized long-term logs. The trick is to let logrotate rotate your NGINX logs and to anonymize them in the course of rotation. This also shifts the (small) performance burden of anonymization from the busy web server to the logrotate process.

First of all, you need a script for anonymizing access log files. One option is anonip.py from the Digitale Gesellschaft (formerly Swiss Privacy Foundation). Compared with a quick-and-dirty self-made script, such a dedicated external tool has the advantage that it can deal with both IPv4 and IPv6 addresses, et cetera. You can, of course, also use your own script in addition, for example to anonymize other parts of the log file (e.g. the URL parameter userId of your web app).
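As a rough illustration of such an additional self-made filter, the following sketch redacts the value of a userId query parameter in log lines read from stdin. The function name redact_user_id and the placeholder REDACTED are made up for this example, and it assumes the parameter appears literally as ?userId=... or &userId=... in the logged request line.

```shell
#!/bin/sh
# Hypothetical helper (not part of anonip): replace the value of the
# "userId" query parameter with a fixed placeholder.
# Reads log lines on stdin and writes the filtered lines to stdout.
redact_user_id() {
    # Match "?userId=" or "&userId=" and drop everything up to the next
    # "&", space, or double quote, which ends the value in a request line.
    sed -E 's/([?&]userId=)[^& "]*/\1REDACTED/g'
}

# Usage sketch (paths are examples only):
#   redact_user_id < access.log.1 > access.log.1.redacted
```

Such a filter could be run on the rotated file before or after anonip.py; it only touches the query string and leaves the IP address field alone.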

So download and install the script:

 cd /usr/local/bin
 wget https://raw.githubusercontent.com/DigitaleGesellschaft/Anonip/master/anonip.py
 chmod 755 anonip.py

Then create or edit your /etc/logrotate.d/nginx file to something similar to this:

/var/log/nginx/*.log {
    weekly
    missingok
    rotate 52
    maxage 365
    compress
    delaycompress
    notifempty
    create 0640 www-data adm
    prerotate
        if [ -d /etc/logrotate.d/httpd-prerotate ]; then \
            run-parts /etc/logrotate.d/httpd-prerotate; \
        fi \
    endscript
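    postrotate
        # Tell nginx to reopen its log files after the rotation
        /usr/sbin/invoke-rc.d nginx rotate >/dev/null 2>&1 ;
        # "$1" is the log file path; anonymize the freshly rotated "$1".1
        /usr/local/bin/anonip.py < "$1".1 --output "$1".1.anon ;
        # Replace the rotated file with its anonymized copy
        /bin/mv "$1".1.anon "$1".1 ;
    endscript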
}

In essence, this keeps an unanonymized access log for one week. Once a week, the file is rotated and anonymized. The assumption is that the rotated file has the suffix .1. With rotate 52 and maxage 365, about one year of anonymized data is kept. Of course, one can tweak this setup, e.g. to rotate daily.
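To check that a rotated file really was anonymized, a small sanity check along the following lines may help. It is only a sketch: the function name count_unmasked_ipv4 is made up, and it assumes that the client IP is the first field of each line (as in NGINX's default combined format) and that the anonymizer zeroes at least the last octet of IPv4 addresses. IPv6 addresses are not checked.

```shell
#!/bin/sh
# Hypothetical check: count the lines of a log file whose leading IPv4
# address does NOT end in ".0", i.e. lines that look un-anonymized.
# Assumes the IP address is the first whitespace-separated field.
count_unmasked_ipv4() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # there are none, so "|| true" keeps the function's exit status at 0.
    grep -E -c '^[0-9]{1,3}(\.[0-9]{1,3}){2}\.[1-9][0-9]{0,2} ' "$1" || true
}

# Usage sketch:
#   count_unmasked_ipv4 /var/log/nginx/access.log.1
```

A result of 0 for access.log.1 suggests the postrotate step ran. A genuine client address that happens to end in .0 would not be counted, so treat this as a quick plausibility check only. Running logrotate -d /etc/logrotate.d/nginx beforehand shows what logrotate would do without changing any files.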

Mischa