We're using Elasticsearch to store and inspect logs from our infrastructure. Some of those logs must be retained by law, and we can't afford to lose any.
We've been indexing logs for quite some time without any explicit mapping. That makes them mostly unusable for searching and graphing. For example, some integer fields have been automatically mapped as text, so we can't aggregate them in histograms.
We want to introduce index templates and mappings, which would solve the issue for new indices.
However, we've noticed that having a mapping also opens the door for parsing failures. If a field is defined as an integer but suddenly gets a non-integer value, then the parsing will fail, and the document will be rejected.
Is there any place those documents go and/or any way to save them for inspection later?
The Python script below reproduces the problem against a local ES instance.
#!/usr/bin/env python3
from typing import Any, Dict

import requests

ES_HOST = "http://localhost:9200"


def es_request(method: str, path: str, data: Dict[str, Any]) -> None:
    """Send a JSON request to ES and print the response body on failure."""
    response = requests.request(method, f"{ES_HOST}{path}", json=data)
    # Creating a document returns 201, so only treat 4xx/5xx as failures.
    if not response.ok:
        print(response.content)
es_request('put', '/_template/my_template', {
    "index_patterns": ["my_index"],
    "mappings": {
        "properties": {
            "some_integer": {"type": "integer"}
        }
    }
})

# This is fine
es_request('put', '/my_index/_doc/1', {
    'some_integer': 42
})

# This will be rejected by ES, as it doesn't match the mapping.
# But how can I save it?
es_request('put', '/my_index/_doc/2', {
    'some_integer': 'hello world'
})
Running the script gives the following error:
{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "failed to parse field [some_integer] of type [integer] in document with id '2'. Preview of field's value: 'hello world'"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "failed to parse field [some_integer] of type [integer] in document with id '2'. Preview of field's value: 'hello world'",
    "caused_by": {
      "type": "number_format_exception",
      "reason": "For input string: \"hello world\""
    }
  },
  "status": 400
}
And then the document is lost, or so it seems. Is there an option I can set that would automatically save such documents somewhere else, as a sort of dead letter queue?
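To make the idea concrete, here is a minimal client-side sketch of the dead letter behavior I have in mind. It is only an illustration, not something ES does for me: the index name `my_index_dead_letter`, the `Sender` type, and the helpers `is_mapping_rejection` and `index_with_fallback` are all hypothetical, and the check assumes the error type is `mapper_parsing_exception` as in the output above (other ES versions may report a different type).

```python
import json
from typing import Any, Callable, Dict, Tuple

# Hypothetical name for the fallback index; not something ES provides.
DEAD_LETTER_INDEX = "my_index_dead_letter"

# A sender takes (method, path, body) and returns (status_code, parsed_json).
Sender = Callable[[str, str, Dict[str, Any]], Tuple[int, Dict[str, Any]]]


def is_mapping_rejection(status: int, body: Dict[str, Any]) -> bool:
    """True if the response looks like the mapper_parsing_exception above."""
    error = body.get("error")
    return (
        status == 400
        and isinstance(error, dict)
        and error.get("type") == "mapper_parsing_exception"
    )


def index_with_fallback(
    send: Sender, index: str, doc_id: str, doc: Dict[str, Any]
) -> str:
    """Index doc and return the name of the index it ended up in.

    On a mapping rejection, the document is serialized to a JSON string and
    stored in the dead-letter index, so its fields cannot clash with a mapping.
    """
    status, body = send("put", f"/{index}/_doc/{doc_id}", doc)
    if is_mapping_rejection(status, body):
        wrapped = {
            "original_index": index,
            "original_id": doc_id,
            "reason": body["error"].get("reason", ""),
            "raw": json.dumps(doc),
        }
        send("post", f"/{DEAD_LETTER_INDEX}/_doc", wrapped)
        return DEAD_LETTER_INDEX
    if status >= 400:
        # Any other failure is not handled by this sketch.
        raise RuntimeError(f"indexing failed with status {status}: {body}")
    return index
```

With `requests`, the `send` argument could be a small wrapper that calls `requests.request(method, f"{ES_HOST}{path}", json=data)` and returns `(response.status_code, response.json())`. This works, but it moves the responsibility into every client that writes logs, which is why I'd prefer a server-side option if one exists.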
tl;dr: We need mappings, but can't afford to lose log lines due to parsing errors. Can we automatically save the documents that don't fit the mapping somewhere else?