
The Setup

I've set up a simple Nginx server that logs in JSON format; the logs are then piped to an S3 bucket with Apache Flume. All the Nginx server does is respond with a web beacon tracking pixel and write to the log file. Everything's cool so far.

The Problem

However, it would be nice to handle a couple other processing steps at this level of the pipeline:

  1. Convert query string parameters into actual JSON in the event records
  2. Set a UUID cookie for tracking purposes
  3. Increment some counters in a local database (eventually all the data will be processed with a MapReduce job)

It seems like I'll need a custom Flume sink to convert the query string parameters, plus a backend behind Nginx to set the cookie and update the database. Having several systems in play feels very inefficient, especially when I'm trying to optimize throughput for hundreds of requests per second.
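For reference, here's a rough sketch of what I imagine handling step 1 at the agent could look like, using a custom Flume interceptor rather than a sink (interceptors run on the agent before events reach the sink). Everything specific in it is a placeholder I made up: the package/class names, the assumption that the raw query string is available in an event header called "args", and the "qs." prefix on the generated headers. A sink or serializer would still need to fold those headers back into the JSON record.

    package com.example.flume;  // placeholder package name

    import java.io.UnsupportedEncodingException;
    import java.net.URLDecoder;
    import java.util.List;
    import java.util.Map;

    import org.apache.flume.Context;
    import org.apache.flume.Event;
    import org.apache.flume.interceptor.Interceptor;

    public class QueryStringInterceptor implements Interceptor {

      @Override
      public void initialize() { /* no state to set up */ }

      @Override
      public Event intercept(Event event) {
        Map<String, String> headers = event.getHeaders();
        String query = headers.get("args");        // placeholder header name
        if (query == null || query.isEmpty()) {
          return event;
        }
        // Split the query string and add each parameter as its own header.
        for (String pair : query.split("&")) {
          String[] kv = pair.split("=", 2);
          String key = decode(kv[0]);
          String value = kv.length > 1 ? decode(kv[1]) : "";
          headers.put("qs." + key, value);         // prefix to avoid collisions
        }
        return event;
      }

      @Override
      public List<Event> intercept(List<Event> events) {
        for (Event event : events) {
          intercept(event);
        }
        return events;
      }

      @Override
      public void close() { /* nothing to clean up */ }

      private static String decode(String s) {
        try {
          return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
          return s;                                // UTF-8 is always available
        }
      }

      /** Builder referenced from the Flume agent configuration. */
      public static class Builder implements Interceptor.Builder {
        @Override
        public Interceptor build() {
          return new QueryStringInterceptor();
        }

        @Override
        public void configure(Context context) { /* no options in this sketch */ }
      }
    }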

Possible Solutions

My first thought was to use NodeJS, which could handle all these tasks (and even replace nginx?), but I don't like that it's single-threaded (maybe spawn child workers?).

Then I thought maybe the processing should happen at the Flume agent level, where a Java program could handle everything (is there any performance advantage here, since Flume itself is written in Java?).
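For what it's worth, if the interceptor approach sketched above pans out, wiring it into the agent looks like only a few lines of flume.conf (the agent and source names below are placeholders for whatever my actual config uses):

    # Placeholder agent/source names; only the interceptor lines matter here.
    a1.sources = r1
    a1.sources.r1.interceptors = qs
    a1.sources.r1.interceptors.qs.type = com.example.flume.QueryStringInterceptor$Builder

That would at least keep the query string conversion inside the existing Flume process instead of adding another moving part.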

Questions

  1. Am I going about this the right way, or overthinking it?
  2. How would you recommend consolidating everything into one or two processes?