I set up an Elasticsearch cluster with one dedicated master node, two master-eligible data nodes and one coordinating node. The number of replicas is set to one.
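
In case it matters, the node roles are set roughly like this (simplified, using the classic node.master/node.data/node.ingest flags rather than my exact elasticsearch.yml):

    # dedicated master node
    node.master: true
    node.data:   false
    node.ingest: false

    # the two master-eligible data nodes
    node.master: true
    node.data:   true

    # coordinating-only node
    node.master: false
    node.data:   false
    node.ingest: false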

There are two pipelines in Logstash, each receiving syslog messages from a firewall, converting them to JSON, and feeding them into one of the two data nodes. I don't explicitly generate a UUID for the documents.

Grafana is connected to the coordinating node to pull data from the cluster.

So far so good. But I noticed that in Grafana I see every document twice. I assume that this is not correct, but I have no idea what might be the issue.

I checked the output from Logstash and found no duplicates, so I guess the duplication happens in the cluster. Can anybody give me a hint here? Do I have to add an ID to the documents prior to indexing?

Thanks, Henry

  • Why two pipelines in Logstash? Both connected to the same firewall? – Swisstone Aug 16 '19 at 12:43
  • Hi Swisstone: Thanks for your reply. There is one pipeline for each firewall. I solved the issue (or worked around it?) by adding a fingerprint filter in Logstash to generate a document ID; a rough sketch follows below. But I'd still be interested in why the duplication happened in the first place. https://www.elastic.co/de/blog/logstash-lessons-handling-duplicates – Henry S. Aug 16 '19 at 13:13
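
For reference, the workaround looks roughly like this, following the linked blog post (the field name, hash method, and host name are simplified placeholders, not the exact config):

    filter {
      fingerprint {
        source => "message"                      # raw syslog line
        target => "[@metadata][fingerprint]"
        method => "MURMUR3"
      }
    }

    output {
      elasticsearch {
        hosts       => ["datanode1:9200"]            # placeholder host
        document_id => "%{[@metadata][fingerprint]}" # same event => same ID => overwrite instead of duplicate
      }
    }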

1 Answer

It turned out I had misunderstood how Logstash works.

Putting two config files into the Logstash "conf.d" directory [1], each containing its own "input {}", "filter {}", and "output {}" sections, does not mean you run two pipelines.

Instead, Logstash merges all the files in that directory into a single pipeline, so it ended up with two separate elasticsearch output plugins. That's why it wrote every document to both of my ES nodes. When I tested earlier, I only looked at one of the outputs, which is why I saw no duplicates.
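
To illustrate what I had (file names, ports, and host names are simplified placeholders; filters omitted):

    # /etc/logstash/conf.d/firewall1.conf
    input {
      syslog { port => 5141 }                        # firewall 1 sends here
    }
    output {
      elasticsearch { hosts => ["datanode1:9200"] }
    }

    # /etc/logstash/conf.d/firewall2.conf
    input {
      syslog { port => 5142 }                        # firewall 2 sends here
    }
    output {
      elasticsearch { hosts => ["datanode2:9200"] }
    }

Since both files are concatenated into one pipeline, an event arriving on either input passes through both outputs, so every document ends up on both nodes.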

Just adding an explicit document ID was not a solution but only a very bad workaround: every document was still indexed twice, first stored and then immediately overwritten by the second write, which wasted a lot of resources.

I now use only one config file with two inputs, one filter, and one Elasticsearch output that has both data nodes in its hosts parameter.
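
The combined file now looks roughly like this (ports, host names, and filter details simplified):

    input {
      syslog { port => 5141 type => "firewall1" }
      syslog { port => 5142 type => "firewall2" }
    }

    filter {
      # shared parsing / JSON conversion for both firewalls
    }

    output {
      elasticsearch {
        hosts => ["datanode1:9200", "datanode2:9200"]  # both data nodes, but only one output
      }
    }

With a single output, each event is indexed exactly once; the hosts array only load-balances the requests across the two nodes.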

Hope this helps others who have similar issues.

Regards, Henry

[1] Using the .deb package.

  • If you want to run multiple pipelines in one instance of Logstash, you need to define them in "pipelines.yml" (minimal example below): https://www.elastic.co/guide/en/logstash/current/multiple-pipelines.html – Henry S. Aug 16 '19 at 16:50
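
For example (pipeline IDs and paths are placeholders):

    # /etc/logstash/pipelines.yml
    - pipeline.id: firewall1
      path.config: "/etc/logstash/conf.d/firewall1.conf"
    - pipeline.id: firewall2
      path.config: "/etc/logstash/conf.d/firewall2.conf"

Each pipeline then has its own inputs, filters, and outputs, so events from one firewall are no longer copied into the other pipeline's output.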