
I am trying to apply an index template to my Elasticsearch cluster to deal with fields whose content is longer than 32 kB. I am using version 2.4.4, as this is the highest version supported by Graylog.

See: https://github.com/Graylog2/graylog2-server/issues/873

Specifically the solution here: https://github.com/Graylog2/graylog2-server/issues/873#issuecomment-199898314

I am also running into another problem that I am trying to fix with an index template. One of the fields can contain either a number or a string, but because Elasticsearch maps a field based on the first occurrence of a value in it, I sometimes get a MapperParsingException on the active index.
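As a rough illustration of the problem (a toy sketch, not Elasticsearch internals): dynamic mapping behaves as if the first value seen for a field fixes its type, and later documents with a different type for that field are rejected.

```python
# Toy model of dynamic mapping: the first value seen for a field fixes
# its type; later values of a different type trigger a mapping conflict.
def index_documents(docs):
    field_types = {}          # field name -> type fixed by first occurrence
    indexed, errors = [], []
    for doc in docs:
        ok = True
        for field, value in doc.items():
            fixed = field_types.setdefault(field, type(value))
            if not isinstance(value, fixed):
                errors.append(f"MapperParsingException-like conflict on {field!r}")
                ok = False
        if ok:
            indexed.append(doc)
    return indexed, errors

indexed, errors = index_documents([
    {"EntityId": 42},          # first value is numeric -> field mapped as a number
    {"EntityId": "user-7"},    # later string value now conflicts and is rejected
])
```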

Based on the solution suggested in the linked GitHub issue, I made my own index template and, with support from the Elasticsearch documentation, added a dynamic template.

This is the result:

{
    "template": "graylog*",
    "mappings": {
        "_default_": {
            "_all": {
                "enabled": false
            },
            "dynamic_templates": [{
                "entityid_as_string": {
                    "match": "EntityId",
                    "mapping": {
                        "type": "string"
                    }
                }
            },
            {
                "notanalyzed_string": {
                    "match_mapping_type": "string",
                    "mapping": {
                        "ignore_above": 32766,
                        "type": "string",
                        "doc_values": true
                    }
                }
            }]
        }
    }
}
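As a quick sanity check (this only validates the shape of the JSON body, not how Elasticsearch applies it), the template can be parsed and its dynamic templates inspected before PUTting it:

```python
import json

# Confirm the template body parses as JSON and the dynamic templates
# appear in the intended order (order matters for dynamic_templates).
template_body = """
{
    "template": "graylog*",
    "mappings": {
        "_default_": {
            "_all": {"enabled": false},
            "dynamic_templates": [
                {"entityid_as_string": {
                    "match": "EntityId",
                    "mapping": {"type": "string"}}},
                {"notanalyzed_string": {
                    "match_mapping_type": "string",
                    "mapping": {"ignore_above": 32766,
                                "type": "string",
                                "doc_values": true}}}
            ]
        }
    }
}
"""
tpl = json.loads(template_body)
dynamic_templates = tpl["mappings"]["_default_"]["dynamic_templates"]
names = [next(iter(d)) for d in dynamic_templates]
```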

The behaviour I expect is that the field EntityId will always be mapped as a string, and that any string field in a document with content longer than 32 kB will not be indexed.

But that does not seem to be the case. Even after manually rotating the active write index, I am still getting the same errors. I have even rebooted the VM and rotated the active write index again, with no effect.

Can anyone see an obvious mistake in my template? In particular, I am unsure whether the _all section should be there.

I used this command to add it:

curl -XPUT 'localhost:9200/_template/loggingtemplate?pretty' -H 'Content-Type: application/json' -d '<template here>'

And this command to verify that it has been added:

curl -XGET localhost:9200/_template/loggingtemplate

1 Answer


For some reason my dynamic mapping was not honoured.

Instead, to solve the issue I had to create a custom index mapping for each of my index sets. This is a dirty solution in my opinion, as I now have to copy and paste the configuration for every index set; forgetting one will result in indexing errors and loss of messages if the structure of our log messages changes in the future.

Details are here: http://docs.graylog.org/en/2.2/pages/configuration/elasticsearch.html#custom-index-mappings

This is the mapping I created for our index sets. In this specific example, I am applying it to an index set called "application_logs".

{
    "template": "application_logs_*",
    "mappings": {
        "message": {
            "properties": {
                "Message": {
                    "type": "string",
                    "ignore_above": 32766
                },
                "EventEntities": {
                    "type": "string",
                    "ignore_above": 32766
                },
                "Severity": {
                    "type": "string"
                },
                "EntityId": {
                    "type": "string"
                }
            }
        }
    }
}

To add it to Elasticsearch, I would then use the following command:

curl -XPUT 'localhost:9200/_template/logs_fields_as_strings?pretty' -H 'Content-Type: application/json' -d'{"template": "application_logs_*","mappings" : {"message" : {"properties" : {"Message" : {"type" : "string","ignore_above" : 32766},"EventEntities" : {"type" : "string","ignore_above": 32766},"Severity" : {"type" : "string"},"EntityId" : {"type" : "string"}}}}}'

This will create a template called "logs_fields_as_strings".

For each index set we have, I would then need to change the template name and the index pattern it targets.
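To reduce the copy-and-paste risk, the template bodies could be generated from a list of index set prefixes. This is only a sketch; the index set names below are placeholders, and the resulting bodies would still need to be PUT to Elasticsearch (e.g. via curl, as above).

```python
# Fields that should always be mapped as strings, with the Lucene
# term byte limit applied to the potentially long ones.
STRING_FIELDS = {
    "Message": {"type": "string", "ignore_above": 32766},
    "EventEntities": {"type": "string", "ignore_above": 32766},
    "Severity": {"type": "string"},
    "EntityId": {"type": "string"},
}

def build_template(index_set_prefix):
    """Build the index template body for one Graylog index set."""
    return {
        "template": index_set_prefix + "_*",
        "mappings": {"message": {"properties": dict(STRING_FIELDS)}},
    }

# One template body per index set; these prefixes are examples only.
templates = {
    prefix: build_template(prefix)
    for prefix in ["application_logs", "system_logs"]
}
```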

The number 32766 is the maximum number of bytes a field value can contain if it is to be indexed. Keep in mind that some UTF-8 characters take 3 bytes, so if you expect those in your messages, divide 32766 by 3 when choosing the ignore_above value, to make sure you do not lose any messages.
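The byte arithmetic can be checked directly in Python: `len` of the UTF-8 encoding gives bytes rather than characters, which is why the character-based limit has to be divided by the worst-case bytes per character.

```python
# Lucene rejects terms longer than 32766 bytes, while ignore_above
# counts characters, so multi-byte UTF-8 characters can overshoot.
euro = "\u20ac"                    # the euro sign encodes to 3 bytes in UTF-8
MAX_TERM_BYTES = 32766
safe_chars = MAX_TERM_BYTES // 3   # worst case for 3-byte characters

worst_case = euro * safe_chars     # a field made entirely of 3-byte chars
byte_length = len(worst_case.encode("utf-8"))
```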