
Our Elasticsearch cluster provides search results for a frontend. Most of the traffic is light, and the cluster handles that load just fine. Once a week at a scheduled time, however, several hundred thousand newsletters are generated, each containing user-specific content, which results in one ES query per newsletter.

During that time the overall response time of our cluster degrades significantly. We are looking for ways to mitigate this and came up with the idea of having separate ES nodes for separate query workloads: node A would serve normal traffic, while node B would serve newsletter queries exclusively. That way the load on node B would only slow down newsletter queries, which is acceptable.

Is a cluster setup like this possible/viable/advisable? Are there better alternatives?

1 Answer


Is the data used by the frontend the same as the data used by the weekly newsletter job? If it is split into different indices, you could use Shard Allocation Filtering to make sure certain indices end up on specific hosts.
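A minimal sketch of the allocation-filtering idea, assuming the frontend and the newsletter job read from separate indices. The node attribute name `workload`, its values, and the index names `frontend-content` / `newsletter-content` are hypothetical; each node would declare its group in `elasticsearch.yml` (e.g. `node.attr.workload: newsletter`). The script only builds the `PUT /<index>/_settings` requests and does not contact a cluster.

```python
import json

# Hypothetical node attribute, set per node in elasticsearch.yml, e.g.
#   node.attr.workload: frontend     (frontend nodes)
#   node.attr.workload: newsletter   (newsletter nodes)
# Elasticsearch 5+ uses the node.attr.* prefix; older versions used node.<name>.
ATTRIBUTE = "workload"


def allocation_filter_request(index: str, group: str) -> tuple[str, str, dict]:
    """Build a PUT /<index>/_settings request that requires every shard of
    `index` to live on nodes whose `workload` attribute equals `group`."""
    body = {f"index.routing.allocation.require.{ATTRIBUTE}": group}
    return "PUT", f"/{index}/_settings", body


# Hypothetical index names, one per workload.
requests = [
    allocation_filter_request("frontend-content", "frontend"),
    allocation_filter_request("newsletter-content", "newsletter"),
]

for method, path, body in requests:
    print(method, path)
    print(json.dumps(body, indent=2))
```

If both workloads read the same data, this approach implies maintaining a second copy of it in its own index, which is why the question of whether the data is already split matters.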

Alternatively, you could dedicate a number of nodes to the frontend and a number to the weekly job. You'd use the "rack_id" trick (shard allocation awareness) to make sure the primary and replica shards are split properly between the two groups, so that each group holds a full copy of the data.
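A sketch of the "rack_id" approach, under stated assumptions. Each node is assumed to declare `node.attr.rack_id: frontend` or `node.attr.rack_id: newsletter` in `elasticsearch.yml`, and the index name `content` is hypothetical. Awareness only controls where shard copies are placed, so the sketch also builds search URLs that use the `preference=_only_nodes:<attr>:<value>` search parameter to keep each workload's queries on its own group. Please verify that your Elasticsearch version accepts an attribute filter in `_only_nodes`. The code only builds requests and does not contact a cluster.

```python
from urllib.parse import urlencode

# Assumption: every node sets one of these in elasticsearch.yml:
#   node.attr.rack_id: frontend     (frontend nodes)
#   node.attr.rack_id: newsletter   (newsletter nodes)
AWARENESS_ATTRIBUTE = "rack_id"


def awareness_settings_request() -> tuple[str, str, dict]:
    """PUT /_cluster/settings request that enables shard allocation awareness,
    so copies of each shard are spread across the rack_id values."""
    body = {
        "persistent": {
            "cluster.routing.allocation.awareness.attributes": AWARENESS_ATTRIBUTE
        }
    }
    return "PUT", "/_cluster/settings", body


def search_path(index: str, group: str) -> str:
    """Search URL that asks Elasticsearch to use only shard copies on nodes
    whose rack_id equals `group` (preference=_only_nodes:rack_id:<group>)."""
    query = urlencode({"preference": f"_only_nodes:{AWARENESS_ATTRIBUTE}:{group}"})
    return f"/{index}/_search?{query}"


print(awareness_settings_request())
print(search_path("content", "frontend"))    # frontend queries
print(search_path("content", "newsletter"))  # weekly newsletter job
```

With two `rack_id` values and `number_of_replicas` set to at least 1, each group should hold a copy of every shard. This isolates only the read load: indexing still writes to every copy, so both groups see write traffic.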

Nils