
We have two instances of an application (same application, different databases); let's call them app1 and app2. The route the logs take is:

  1. appX runs filebeat, which picks up the application's log file, tags it with appX, and sends each entry to a logstash instance running on a rabbitmq cluster, where it is added into rabbitmq.
  2. The message traverses the rabbitmq cluster, and is consumed by another logstash instance, which runs various filters depending on the tag.
  3. The same logstash instance submits the resulting log line to elasticsearch.
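
For reference, the consuming logstash instance is just the stock rabbitmq input feeding the elasticsearch output, along these lines (the hostname, queue name and index pattern here are placeholders, not our real values):

input {
    rabbitmq {
        host    => "rabbit01.example.local"   # placeholder host
        queue   => "logs"                      # placeholder queue
        durable => true
        codec   => "json"
    }
}
output {
    elasticsearch {
        hosts => ["localhost:9200"]            # placeholder
        index => "logstash-%{+YYYY.MM.dd}"     # placeholder index pattern
    }
}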

All good. In the case of app1 and app2, the filter is a grok, which splits the message out into fields we can then search on.

app1 works, the logs are processed by the filter, and appear in the correct index in elasticsearch as expected.

app2 does not work. The problem is the grok rules; I know this because if I remove the grok rules which are run for app2, the logs appear as expected. The moment I uncomment the grok rules, the logs for app2 stop appearing anywhere in elasticsearch.

The filters for app1 and app2 are identical:

filter {
    if "app2" in [tags] {
        mutate {
            gsub => ["message", "\n", "LINE_BREAK"]
        }
        grok {
            match => [ "message", "%{TIMESTAMP_ISO8601:time}\s+%{WORD:level}\s+\[%{DATA:thread_name}\]\s+%{DATA:class}\:%{DATA:line} %{DATA:method}\s*-\s*%{DATA:message}$" ]
            overwrite => [ "message" ]
        }
        date {
            match => [ "time", "YYYY-MM-dd HH:mm:ss,SSS" ]
            target => "@timestamp"
        }
        mutate {
            remove_field => [ "time" ]  # removes the 'time' field
        }
    }
}
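
To rule out the pattern itself, the same grok can be exercised in isolation with a throwaway pipeline (the file name and the stdin/stdout plumbing below are just for illustration):

# test-app2.conf -- run with: bin/logstash -f test-app2.conf, then paste a log line
input {
    stdin {
        tags => ["app2"]   # fake the tag filebeat would normally add
    }
}
filter {
    if "app2" in [tags] {
        grok {
            match => [ "message", "%{TIMESTAMP_ISO8601:time}\s+%{WORD:level}\s+\[%{DATA:thread_name}\]\s+%{DATA:class}\:%{DATA:line} %{DATA:method}\s*-\s*%{DATA:message}$" ]
            overwrite => [ "message" ]
        }
    }
}
output {
    stdout { codec => rubydebug }   # prints the parsed event, or one tagged _grokparsefailure
}

A line that parses cleanly here but still vanishes in the real pipeline points the finger away from the grok; a _grokparsefailure points straight at it.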

I suspect that elasticsearch is refusing to index the logs from app2. Naturally I have checked the logs of both elasticsearch and logstash, but no problems are reported. This has led me to investigate how to "turn up" logging in elasticsearch.

Does anyone know how to get elasticsearch to report any errors related to ingesting these logs, or have an idea of how I can find out what is happening to app2's logs when the grok is enabled?

I have tried this:

# curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "logger._root": "trace"
  }
}
'

which predictably gives me a "fire hose to the mouth" but at least something that is greppable. The tag is only mentioned in terms of processing a specific log line.
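
For what it's worth, the firehose can be switched off again afterwards by nulling the transient setting:

# curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "logger._root": null
  }
}
'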

Thanks in advance.

A little more on this: I have just run logstash with the app2 grok rules in place, and with filter logging enabled like so:

# curl -XPUT 'localhost:9600/_node/logging?pretty' -H 'Content-Type: application/json' -d'
{
  "logger.logstash.filters.grok" : "ALL"
}
'

No errors appear, which further backs up my theory that the grok rules themselves are correct and that it's elasticsearch that is refusing to index the messages. To verify there are no indexing errors, I've also done this:

# curl -XPUT 'localhost:9600/_node/logging?pretty' -H 'Content-Type: application/json' -d'
{
  "logger.logstash.outputs.elasticsearch" : "ALL"
}
'

While no errors are reported, somewhat worryingly, NOTHING is reported, which makes me wonder if I'm barking up the wrong tree.
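
One more sanity check I intend to run is simply counting what lands in elasticsearch per tag; the index pattern below is a guess, adjust it to whatever the output actually writes to:

# curl 'localhost:9200/logstash-*/_count?q=tags:app1&pretty'
# curl 'localhost:9200/logstash-*/_count?q=tags:app2&pretty'

If the app2 count stays at zero while app1 climbs, the documents really are being lost somewhere between the output and the index.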

GeoSword

1 Answer


In my experience, when ES pukes on an insert, it'll puke it right up in the logs for the poor human to decode and figure out. I usually see it when there is a mapping problem. Either a type collision or a malformed document.

(This is reason number one I'm not logstashing my ES logs: that bad ingestion will fail ingestion again when it's in the ES logs, and around the error-loop it runs. Also, it's a PII-bomb in the right cases.)

Another troubleshooting step to try is to set a conditional output right next to the ES output to drop the app1 and app2 events to a file. It'll give you the chance to look at them side-by-side in the state they would be sent to ES. That may reveal some clues as to what's happening. Perhaps some other filter stage is manipulating one. Or maybe app2 never gets that far, so the problem is between the filter and the output {} stages.
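
A sketch of what that could look like (the paths are only examples):

output {
    # keep the existing elasticsearch output as-is; these just tee the events to disk
    if "app1" in [tags] {
        file { path => "/var/log/logstash/app1-events.log" }   # example path
    }
    if "app2" in [tags] {
        file { path => "/var/log/logstash/app2-events.log" }   # example path
    }
}

The file output writes each event as JSON by default, which makes the side-by-side comparison easy.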

sysadmin1138