
I use a combination of Logstash and the AWS Elasticsearch service to index S3 access logs.

The logs are collected in an S3 bucket, processed with the Logstash S3 input plugin, renamed once processed, and then archived in another bucket. I use this method so that the number of access log files Logstash has to process in each rotation is as small as possible.

However, the logs are not being processed in real time. When I look at Kibana or query Elasticsearch, the most recent log entry I see is the latest entry from the previous hour. I never see log entries that are < 1 hour old.

I can't see anything in the S3 input configuration options to control this behaviour. There is an interval option, which I have set to 120 secs. This is supposed to instruct Logstash to poll the S3 bucket which contains the logs every 2 mins.
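For reference, the pipeline described above can be sketched roughly as follows. The bucket names and prefix here are hypothetical placeholders; `interval`, `backup_to_bucket` and `delete` are real options of the Logstash S3 input plugin, assuming a reasonably recent plugin version:

```conf
input {
  s3 {
    bucket           => "my-access-logs"          # hypothetical source bucket
    prefix           => "logs/"                   # hypothetical key prefix
    region           => "us-east-1"
    interval         => 120                       # poll the bucket every 2 minutes
    backup_to_bucket => "my-access-logs-archive"  # hypothetical archive bucket
    delete           => true                      # remove originals after archiving
  }
}
```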

I also use this Logstash system to process syslog input from a variety of servers, which does process logs in next to real time.

Is this something peculiar to the S3 input plugin in Logstash?

Garreth McDaid

1 Answer

This issue seems to arise from the way S3 generates access logs rather than anything to do with Elasticsearch or Logstash.

According to:

http://docs.aws.amazon.com/AmazonS3/latest/dev/ServerLogs.html

Server access log records are delivered on a best effort basis. Most requests for a bucket that is properly configured for logging will result in a delivered log record, and most log records will be delivered within a few hours of the time that they were recorded.

From looking at the actual files that contain the logs in the target S3 bucket, you will never see a log entry that is < 1 hour old.

You will see log entries that are precisely 1 hour old, which explains why you see entries right up to the hour mark.
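One way to verify this delay yourself is to compare the delivery timestamp that S3 embeds in each access-log object key (the documented `YYYY-mm-DD-HH-MM-SS-UniqueString` key format) against the current time. This is only a sketch; the `logs/` prefix is a hypothetical placeholder:

```python
from datetime import datetime, timezone

def log_object_time(key, prefix="logs/"):
    """Extract the delivery timestamp S3 embeds in an access-log object key,
    e.g. logs/2016-02-10-14-05-31-ABCDEF0123456789 (prefix is hypothetical)."""
    stamp = key[len(prefix):][:19]  # the YYYY-mm-DD-HH-MM-SS portion
    return datetime.strptime(stamp, "%Y-%m-%d-%H-%M-%S").replace(tzinfo=timezone.utc)

def age_hours(key, prefix="logs/", now=None):
    """Age of a delivered log object in hours, relative to `now` (UTC)."""
    now = now or datetime.now(timezone.utc)
    return (now - log_object_time(key, prefix)).total_seconds() / 3600
```

Running this over the keys in the target bucket should show that the newest delivered object is consistently around an hour old, matching the best-effort delivery behaviour described above.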

As such, both Elasticsearch and Logstash are performing as designed, and the issue is with AWS S3.

Garreth McDaid