The default NGINX access log format (combined) is this:
log_format combined '$remote_addr - $remote_user [$time_local] '
'"$request" $status $body_bytes_sent '
'"$http_referer" "$http_user_agent"';
This is a bit hard to parse. I am concerned that clients could inject a "
character into requests, referrers or user-agents.
I have considered defining my own format instead, one that uses |P-,|
as a delimiter:
log_format parsable '$status |P-,| $time_iso8601 |P-,| $http_host '
'|P-,| $bytes_sent |P-,| $http_user_agent |P-,| $http_referer '
'|P-,| $request_time |P-,| $request';
However, nothing prevents users from injecting |P-,|
into their requests, referrers or user-agents.
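To make the concern concrete, here is a minimal Python sketch of how I would expect to parse the format above, and how an injected delimiter breaks it. The function name parse_line, the field list, and the sample log values are all made up for illustration; they are not taken from real logs.

```python
# Delimiter and field order assumed to match the "parsable" log_format above.
DELIM = " |P-,| "
FIELDS = ["status", "time_iso8601", "http_host", "bytes_sent",
          "http_user_agent", "http_referer", "request_time", "request"]


def parse_line(line):
    """Split one log line on the delimiter.

    Returns a dict of field name -> value, or None when the number of
    fields does not match (for example, because a value contained the
    delimiter itself).
    """
    parts = line.rstrip("\n").split(DELIM)
    if len(parts) != len(FIELDS):
        return None
    return dict(zip(FIELDS, parts))


# Hypothetical well-formed entry.
clean = DELIM.join(["200", "2024-01-01T00:00:00+00:00", "example.com", "512",
                    "curl/8.0", "-", "0.003", "GET / HTTP/1.1"])

# Hypothetical entry where the client put the delimiter in its user-agent.
injected = DELIM.join(["200", "2024-01-01T00:00:00+00:00", "example.com", "512",
                       "evil |P-,| agent", "-", "0.003", "GET / HTTP/1.1"])

print(parse_line(clean)["http_user_agent"])  # curl/8.0
print(parse_line(injected))                  # None: 9 parts instead of 8
```

The parser can detect that the field count is off, but it cannot tell which field the extra delimiter came from, so the line is effectively unparseable.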
I read this article about ASCII delimited text: https://ronaldduncan.wordpress.com/2009/10/31/text-file-formats-ascii-delimited-text-not-csv-or-tab-delimited-text/
I think that could be used to solve this problem, but users would be able to inject ASCII delimiter characters into their data as well.
Is there a best-practice way to solve this problem?