I've been looking at the ELK stack or RabbitMQ to replace a homegrown system that ingests a large number of files (200-300 million per hour), operates on them, then sends them to various locations based on name and content, storing a copy locally. (Basically, ELK)

However, I'm more than a little put off by the complexity and hardware footprint required.

What are the advantages of RabbitMQ versus an ELK stack for this kind of task? It seems to me that ELK may be overkill, but I'm not familiar enough with RabbitMQ to say definitively.

Oblivious12
  • 200-300 million files per hour is going to require a hardware footprint. Seriously, like ... wow. RabbitMQ and ELK are not what you're looking for IMHO. You should be looking into an integration library, such as Spring Integration, Apache Camel, or the newer Apache Nifi (which has clustering). – thinice Mar 23 '17 at 15:32
  • Suffice to say an integration library is out of the question - so say the devs. I'm OK with having hardware, but I've got an upper limit on cost I have to satisfy. – Oblivious12 Mar 24 '17 at 14:54
  • Apache Kafka - https://kafka.apache.org/ – alexus Mar 24 '17 at 17:13
  • I think you should consider Apache Kafka over RabbitMQ and ELK. I have worked with both RabbitMQ and ELK, and neither is an ideal solution for your problem. Kafka is insanely fast, though not as flexible as RabbitMQ. It is mainly for processing and streaming, whereas RabbitMQ is mainly used as a message broker, but Kafka can do many things wickedly fast, as they claim. :) – blackOcean Nov 24 '17 at 10:17

2 Answers

My math says that's on the order of 55K to 83K events a second. That's a lot.
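
As a quick check of that arithmetic, here is a minimal Python sketch that converts the hourly counts from the question into per-second rates. The function name is only illustrative.

```python
# Convert the question's hourly file counts into per-second event rates.
SECONDS_PER_HOUR = 3600


def events_per_second(events_per_hour: int) -> float:
    """Average event rate, assuming arrivals are spread evenly over the hour."""
    return events_per_hour / SECONDS_PER_HOUR


low = events_per_second(200_000_000)
high = events_per_second(300_000_000)
print(f"{low:,.0f} to {high:,.0f} events/second")
# prints: 55,556 to 83,333 events/second
```

These are averages. Real traffic is rarely flat, so the peak rate the queue has to absorb is likely higher.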

I'm not quite sure why you're separating RMQ and ELK, as RMQ can be a component of ELK. In fact, the very large deployments I know of definitely use either an AMQP solution like Rabbit, or something like Kafka, to provide the buffer between event generation and the parsing tier, as well as feeding multiple consumers.

The general high-scale pipeline that can handle an event-stream like you're considering:

[Figure: Logstash architecture, large distributed deployment]

  1. Shippers send the logs to a central queue. Shipping can be FileBeat, Logstash itself, or something else entirely.
  2. The queue system, whatever it is. Could be Redis, RabbitMQ, Kafka, or something else.
  3. The parsing tier. A group of Logstash nodes that pulls events off the queue, massages them, and ships them on to the next stage.
    • This scales horizontally. So long as your queue system can keep up, you can keep adding parsers here. In our system, with our filter rules, we can do 2K events/second per core. Yours will be different.
    • If you leverage channels in your queue, you can even have multiple parsing tiers depending on how your workload splits out.
    • This group is high CPU. How much RAM it needs depends on how gnarly your filters end up being.
  4. The Storage tier. In classic ELK, this is ElasticSearch. It doesn't have to be, though.
    • An ElasticSearch cluster handling 300M events an hour is going to be big. No getting around that. How big depends on how long you want to keep the data.
    • It sounds like your consumers are expecting files. This can be done too. So can sending processed events (which are just JSON) into yet another queuing system for consumption by other systems.
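
To get a rough feel for the size of the parsing tier, the 2K events/second per core figure above can be combined with the 300M events/hour upper bound from the question. This is only a back-of-the-envelope sketch: the per-core rate is specific to one system and one set of filter rules, and the `headroom` multiplier is an assumption added here for illustration.

```python
import math

PEAK_EVENTS_PER_HOUR = 300_000_000  # upper bound from the question
EVENTS_PER_SEC_PER_CORE = 2_000     # one deployment's figure; yours will differ


def parser_cores_needed(events_per_hour: int,
                        per_core_rate: int,
                        headroom: float = 1.0) -> int:
    """Cores needed to keep up with the average rate.

    `headroom` is a hypothetical safety multiplier for bursts (1.5 = 50% spare).
    """
    average_rate = events_per_hour / 3600
    return math.ceil(average_rate * headroom / per_core_rate)


print(parser_cores_needed(PEAK_EVENTS_PER_HOUR, EVENTS_PER_SEC_PER_CORE))       # 42
print(parser_cores_needed(PEAK_EVENTS_PER_HOUR, EVENTS_PER_SEC_PER_CORE, 1.5))  # 63
```

So the parsing tier alone lands somewhere in the range of a few dozen cores before any allowance for heavier filters.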

The advantage to this architecture is that you're not putting your filtering logic on the resources that are doing production work. It also reduces the ingestion problem for ElasticSearch into mostly bulk requests coming from the Logstash parsing tier, rather than smaller ingestion batches coming from the production resources.
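
The bulk-request point can be illustrated with a small Python sketch that groups a stream of events into fixed-size batches before they are handed to storage. This is not how Logstash implements it internally; the batch size of 1000 and the event shape are arbitrary assumptions.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batches(events: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield lists of up to `size` events, so storage sees few large writes."""
    it = iter(events)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


# 2500 small events become three bulk writes instead of 2500 single ones.
events = ({"id": i} for i in range(2500))
sizes = [len(batch) for batch in batches(events, 1000)]
print(sizes)  # [1000, 1000, 500]
```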

This also provides some security separation between your log archive (ElasticSearch) and your production resources. If one of those gets an evil person on them, having the queue buffer means they can't directly scrub ElasticSearch of their presence. That matters when you have a Security organization in your company, and at 300M events per hour you're probably large enough to have one.


Points against the Rabbit system are more around missing features. Rabbit is a queuing system, and doesn't by itself provide any way to transform the data (L in ELK) or store it for display (E and K in ELK). A log-archive system that can't display what it's storing is not a good log-archive system.

sysadmin1138

Going along with @sysadmin1138 for the most part, but I want to warn you against comparing two distinct things. ELK is an implementation of three separate apps that provide a log-aggregation solution (F/OSS competitor to Splunk), whereas RabbitMQ is a message queue.

It sounds like you have a very specific application workflow, and anything that you do is going to require significant engineering. You could probably bend ELK to apply to your needs, but as sysadmin1138 said, any system with that workload is going to require proper scaling to meet capacity and HA requirements.

ELK and RabbitMQ do scale very well, but do require some expertise to be able to manage effectively. With that said, if you're not using and managing an ELK cluster at any kind of scale (say at least 5 ES nodes), you might want to avoid using ELK as a hacked-together system for an unintended purpose.

Depending on the types of messages coming in, and where the results are being sent, to me it sounds like you're probably better off with an MQ cluster, and app pools that pull jobs off and act on them. But at this point, you're talking about a significant project that probably requires the involvement of a software architect and an ops architect familiar with your business requirements and workflow. (i.e. more complex than a ServerFault question.)
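
To make the "app pools that pull jobs off a queue" idea concrete, here is a minimal Python sketch. It uses the standard library's in-process `queue.Queue` as a stand-in for a real MQ cluster, and the routing rules in `route` are invented for illustration; the real ones would come from your naming and content conventions.

```python
import queue
import threading
from collections import defaultdict


def route(name: str, content: str) -> str:
    """Hypothetical routing rule based on file name and content."""
    if name.endswith(".csv"):
        return "reports"
    if "ERROR" in content:
        return "alerts"
    return "archive"


def worker(jobs: queue.Queue, delivered: dict, lock: threading.Lock) -> None:
    """Pull jobs until a None sentinel arrives, recording where each file goes."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel: no more work for this worker
            jobs.task_done()
            return
        name, content = job
        destination = route(name, content)
        with lock:
            delivered[destination].append(name)
        jobs.task_done()


def run_pool(files, workers: int = 4) -> dict:
    """Feed files to a pool of workers and return {destination: [file names]}."""
    jobs: queue.Queue = queue.Queue()
    delivered = defaultdict(list)
    lock = threading.Lock()
    threads = [threading.Thread(target=worker, args=(jobs, delivered, lock))
               for _ in range(workers)]
    for t in threads:
        t.start()
    for f in files:
        jobs.put(f)
    for _ in threads:
        jobs.put(None)  # one sentinel per worker
    jobs.join()
    for t in threads:
        t.join()
    return {dest: sorted(names) for dest, names in delivered.items()}


result = run_pool([("a.csv", "x,y"), ("b.log", "ERROR disk full"), ("c.log", "ok")])
print(result)
```

With a real broker the workers would be separate processes or hosts, which is what lets the pool scale independently of whatever produces the files.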

gWaldo