The EU's General Data Protection Regulation (GDPR), and the German DSGVO implementation, are very strict when it comes to personal data (such as IP addresses). However, this question is not about the GDPR itself, but about how to comply with it in the nginx HTTP access log while keeping the ability to "identify" the anonymous user within a user journey (that is, to tell one user journey apart from the others).

My current implementation is that I do not record the remote IP and port at all. I purged the environment variables for upstreams/proxies/etc. and simply do not have remote IP and port information in the access logs.

Now I am facing the issue that I need to follow the path of a user journey. I simply do not have any way of "identifying" which requests belong to which user journey. I want to point out that I also do not use cookies, etc.

The legacy approach to "identifying" an "anonymous user" is to look at the remote IP and the date. Within the same day, the same remote IP would most likely be the same user. However, as mentioned above, I do not log remote IP and port information, and I still do not want to.

My current thought is to hash the remote IP address together with the remote port and the date of the request. The logs would contain the date but not the remote port, so I could not recover the remote IP, which is personal data, without heavy brute forcing. This approach would give back some level of user journey identification, which would help me quite a bit.

A general workflow to accomplish this approach would be:

  1. The request is accepted by nginx,
  2. nginx performs a hash operation with remote IP, remote port and current date (e.g. md5_hex("$remote_addr $remote_port $current_date")) and stores the hash in a new variable (e.g. $remote_ip_anonymous),
  3. the log_format includes the $remote_ip_anonymous variable.
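The hash in step 2 can be sketched outside nginx to see how the token behaves. This is only a minimal illustration in Python, not an nginx implementation: the function name `anonymous_token` and the sample addresses are made up for the example, and the input format simply mirrors the `md5_hex("$remote_addr $remote_port $current_date")` pseudocode above.

```python
import hashlib
from datetime import date


def anonymous_token(remote_addr: str, remote_port: int, day: date) -> str:
    """Hypothetical helper: md5 hex digest of "<addr> <port> <YYYY-MM-DD>"."""
    material = f"{remote_addr} {remote_port} {day.isoformat()}"
    return hashlib.md5(material.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    day = date(2018, 7, 11)
    # Same address, port and date: the token is stable.
    print(anonymous_token("203.0.113.7", 51234, day))
    print(anonymous_token("203.0.113.7", 51234, day))
    # A different port or a different date yields a different token.
    print(anonymous_token("203.0.113.7", 51235, day))
    print(anonymous_token("203.0.113.7", 51234, date(2018, 7, 12)))
```

Requests only share a token while address, port and date all match, so in this sketch a "journey" is effectively bounded by one client connection on one day.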

The hash would change even when the remote IP and remote port stay the same, because the current date acts as a salt. It would also change when the remote port changes. So this should be fine under GDPR, or at least fall into the lowest data security category, whereas the actual remote IP would fall into a major data security category under GDPR.
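To put a number on "heavy brute forcing": someone who knows the date from the log has to try every combination of address and port. The sketch below is an illustration under assumptions, not a statement about what is legally sufficient. The helper names (`anonymous_token`, `recover`) and the sample network are hypothetical, and the search is deliberately limited to a tiny range so it runs instantly.

```python
import hashlib
from datetime import date
from ipaddress import ip_network


def anonymous_token(remote_addr: str, remote_port: int, day: date) -> str:
    """Hypothetical helper: md5 hex digest of "<addr> <port> <YYYY-MM-DD>"."""
    material = f"{remote_addr} {remote_port} {day.isoformat()}"
    return hashlib.md5(material.encode("utf-8")).hexdigest()


def recover(token: str, network: str, ports: range, day: date):
    """Try every host/port pair in a candidate range; return the match or None."""
    for addr in ip_network(network).hosts():
        for port in ports:
            if anonymous_token(str(addr), port, day) == token:
                return str(addr), port
    return None


# Full IPv4 search space for one known date: 2**32 addresses * 2**16 ports.
IPV4_SEARCH_SPACE = 2**32 * 2**16  # = 2**48 candidates

if __name__ == "__main__":
    day = date(2018, 7, 11)
    logged = anonymous_token("192.0.2.10", 40001, day)
    # With a narrow guess (one small subnet, ten ports) recovery is trivial.
    print(recover(logged, "192.0.2.0/28", range(40000, 40010), day))
    print(IPV4_SEARCH_SPACE)
```

The point of the sketch is that the effort depends entirely on how far the candidate range can be narrowed: the unrestricted IPv4 case is 2^48 hashes per day, while a known subnet shrinks that considerably.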

Enough with the theory... how would I implement such remote IP anonymization? Do I have to use the nginx Perl module or Lua module, or is there another (faster) way of computing that hash and storing it in an nginx variable?

burnersk

1 Answer

The EU's General Data Protection Regulation (GDPR) is about "protection of natural persons with regard to the processing of personal data and rules relating to the free movement of personal data". It's not about how to sabotage IT systems. The best approach is to calm down and see what's OK and what's not OK regarding personal data protection.

It's technically essential that a web server processes an IP address of a browser/client. Without this ability a web server would be unable to send a response back to the browser/client.

Avoiding the processing of personal data is not an option. (Actually, there are of course options. For example, the Tor browser or an anonymization proxy would be an option. But this must be done by the client.)

Regarding your web server and a GDPR-compliant set-up, you should:

  • make sure that your log files are deleted after 7 days (recommendation of the Data Protection Authority of Bavaria)
  • include the IP address and other gathered private data (e.g. browser identification string) in your website's privacy statement
  • enable HTTPS and redirect all HTTP traffic to HTTPS (or even use HSTS)
  • take care to set up a secure server (see Best practices for hardening new server in 2017)

However, there is a proper way to anonymize IP logging in Nginx. I would not recommend it, but it works.

How to delete log files after 7 days:

With the logrotate service installed, change the logrotate configuration file for Nginx as follows:

vim /etc/logrotate.d/nginx

/var/log/nginx/*.log {
    daily
    missingok
    rotate 7
    compress
    delaycompress
    notifempty
    create 0640 www-data adm
    sharedscripts
    prerotate
            if [ -d /etc/logrotate.d/httpd-prerotate ]; then \
                    run-parts /etc/logrotate.d/httpd-prerotate; \
            fi \
    endscript
    postrotate
            invoke-rc.d nginx rotate >/dev/null 2>&1
    endscript
}
Jens Bradler
  • In my understanding, you are missing some points about GDPR. GDPR is also about storing data securely **and only when strictly required**. I do not need IP information, so I am not allowed to store it. But you are right, GDPR is also about processing, and processing the IP is an essential part of IP communication, so it is allowed under GDPR. Storing personal data without a need is prohibited by GDPR. – burnersk Jul 11 '18 at 13:14
  • As a webmaster you will need it for technical purposes, and you will also need the IP address in case someone attacks or abuses your services. For more details check here: https://www.datenschutz-notizen.de/duerfen-ip-adressen-zu-sicherheitszwecken-gespeichert-werden-1616783/ – Jens Bradler Jul 11 '18 at 13:25
  • "I do not need IP information": well, yes you do. You are trying very hard not to, but it is clearly getting in the way of what you need. Even your hash solution will hit a wall when, for example, you are required (by law, by the user, ...) to look up a specific user's browsing path. Also, I do not know where you are, but you are required by law to store access logs for judicial purposes for around a year. – silmaril Jul 11 '18 at 13:34
  • @silmaril could you please add details on which law you are actually referring to? – st-h Aug 24 '18 at 13:47