How to retrieve logs from AWS EC2 and Docker containers fail-proof?

Question

Currently, we have the following setup:

Multiple AWS EC2 machines, some of which run a Docker container while others are dockerless. To retrieve syslogs, webserver logs (Apache) and application logs, we have a Fluentd agent (td-agent) running on each EC2 instance. These agents forward error log messages to a centralized Fluentd server (a log aggregator), which in turn sends them to Graylog. For access logs, all td-agents on the instances forward them directly to AWS Kinesis Firehose, which in turn stores them on S3 every 5 minutes (buffered), where they are searchable with AWS Athena. Access logs in Docker containers are written to stdout and error logs to stderr. The Docker logging driver for Fluentd is used to forward them to the td-agent installed on the respective host machine.

Now, there are some problems with this setup:

Access logs can't be viewed in real time (e.g. for debugging purposes by developers)
The caching/buffering of the td-agents might become a problem under high load
The caching/buffering of the td-agents or Docker containers might become a problem when the log aggregator or the td-agents are out of service

We're not using CloudWatch Logs because of the price and other reasons. Working with real log files would also mean that we'd need to rotate them regularly, pay attention to disk space etc. The last point might be tackled by using a RAM disk or a separate drive. But this would not resolve the actual problem of having a fixed-size cache/buffer which might fill up and block incoming logs.
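To make the fixed-size buffer problem concrete, here is a minimal sketch of what such a buffer can do once it is full: either discard the oldest record or refuse the new one. This is a toy model, not Fluentd's actual buffer implementation; the class name `BoundedLogBuffer` and the policy names `drop_oldest` and `reject` are made up for illustration.

```python
from collections import deque


class BoundedLogBuffer:
    """Toy fixed-size log buffer (hypothetical, not Fluentd's implementation)."""

    def __init__(self, capacity, on_full="drop_oldest"):
        if on_full not in ("drop_oldest", "reject"):
            raise ValueError("on_full must be 'drop_oldest' or 'reject'")
        self.capacity = capacity
        self.on_full = on_full
        self.items = deque()
        self.lost = 0  # records that never reach the downstream

    def append(self, record):
        """Store a record; return False if it was refused."""
        if len(self.items) >= self.capacity:
            self.lost += 1
            if self.on_full == "reject":
                # The producer has to retry, block, or give up.
                return False
            # Make room by discarding the oldest buffered record.
            self.items.popleft()
        self.items.append(record)
        return True

    def flush(self):
        """Hand everything buffered to the downstream and empty the buffer."""
        batch = list(self.items)
        self.items.clear()
        return batch


# The downstream is unavailable while three records arrive in a buffer of two.
buf = BoundedLogBuffer(capacity=2)
for line in ["a", "b", "c"]:
    buf.append(line)
print(buf.flush(), buf.lost)
```

Either way, one record is lost as soon as the downstream stays unavailable longer than the buffer can absorb. A larger buffer only postpones that point.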

What's a better approach to this problem? Are there any best practices regarding logging from Docker containers?

Roger Lehmann · Accepted Answer · 2017-12-12T16:19:06.783

In case anyone else stumbles upon this and wonders how to do it, here's what our options were and what we finally committed to:

Docker's fluentd log driver: Would've been great but you can't easily limit the size of the file output plugin. Hacks with only having the buffer viewable weren't successful either.
Docker's json-file log driver: You can limit the number of log files as well as their size. They're easily viewable with docker logs, but they can be tailed by the root user only. They are not meant to be tailed by an automated system. Trust me, I tried it and it wasn't worth it. There are several drawbacks, including no support for the tag directive, td-agent taking at least 30 seconds to pick up new containers' log files, and so on.
Docker's Syslog log driver: This is what we're using now. It supports tag, can be viewed and grepped by other users for a live overview and works quite nicely with td-agent. Drawbacks: Adds another service and overhead. Doesn't support docker logs. Also, there might be a problem with rate limiting etc.
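For reference, a minimal sketch of how these drivers are selected per container with `docker run`. The helper `log_flags`, the image name `my-app-image` and the concrete values (`10m`, `3`) are placeholders for illustration, not the settings used in the setup above. `--log-driver`, `--log-opt`, `max-size`, `max-file` and `tag` are standard Docker options.

```python
def log_flags(driver, **opts):
    """Build the logging arguments for `docker run` (hypothetical helper)."""
    flags = ["--log-driver", driver]
    for key, value in sorted(opts.items()):
        # Python keyword max_size becomes Docker's log option max-size.
        flags += ["--log-opt", f"{key.replace('_', '-')}={value}"]
    return flags


# json-file: cap each log file at 10 MB and keep at most 3 files.
json_file = log_flags("json-file", max_size="10m", max_file="3")

# syslog: tag every message with the container name via Docker's tag template.
syslog = log_flags("syslog", tag="{{.Name}}")

for flags in (json_file, syslog):
    print(" ".join(["docker", "run", "-d"] + flags + ["my-app-image"]))
```

With the syslog driver, the tag lets the local syslog daemon or td-agent tell containers apart, which is the feature missing from the json-file approach.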
