My real patterns are more complex, but I have tried to boil the problem down to the core issue, which is something I don't understand. You can try this out at http://grokconstructor.appspot.com/do/match

I'm trying to match the following lines:

Start-Date: 2017-08-07  06:48:12
End-Date: 2017-08-07  06:48:12

Start-Date: 2017-08-07  12:55:16
End-Date: 2017-08-07  12:56:01

Using the additional patterns:

DATE_EU2 %{YEAR}-%{MONTHNUM}-%{MONTHDAY}[\s]+?%{HOUR}:?%{MINUTE}(?::?%{SECOND})?%{ISO8601_TIMEZONE}?
DATE_COMB %{DATE_EU2}?%{DATE_EU}?%{DATE_US}?

And the following main pattern:

Start-Date: %{DATE_COMB:starttime}\nEnd-Date: %{DATE_COMB:endtime}

With the multiline filter:

^\n (negated)

Run that and you should (hopefully!) get:

Start-Date: 2017-08-07 06:48:12 End-Date: 2017-08-07 06:48:12 Start-Date: 2017-08-07 12:55:16 End-Date: 2017-08-07 12:56:01
MATCHED
starttime   2017-08-07··06:48:12
endtime 2017-08-07··06:48:12
after match:    Start-Date: 2017-08-07 12:55:16 End-Date: 2017-08-07 12:56:01

So it has matched the first record but not the second. If I add a '\z' to the end of the main pattern, it matches the second record but not the first. So it's clearly treating the whole thing as one line. But why? My multiline filter states that if a line does not start with a newline, it's part of the previous record, right? That should leave a blank line in the middle, which clearly does start with a newline and should therefore form a separate event, no?

Any pointers gratefully accepted.

spoovy
  • It looks like it's treating the entire string (both sets of dates) as a single entry. A quick look up for multiline with logstash brings up the multiline codec, which seems to have options for choosing how and when lines should be merged into one. – USD Matt Aug 08 '17 at 09:38
  • Sorry, I've just seen that you're aware it's all processed as a single entry, but your multiline filter seems wrong: lines won't start with a \n. You're better off matching "^End-Date" and merging that to previous. – USD Matt Aug 08 '17 at 09:53
  • Thanks, I am experimenting with that method (a positive multiline matcher) at the moment, but I am curious why the negative multiline matcher I'm using above is not working? It seems to me that the 'empty line' in the middle is a line that starts with a newline isn't it? – spoovy Aug 08 '17 at 09:59
  • Each line is processed individually. I doubt `\n` is part of the line during matching. (Not that it's a true indication of logstash but the appspot page won't match `\n`) – USD Matt Aug 08 '17 at 10:06
  • By the way I've just tested on appspot with `^\Z`, which matches the end of input, and appears to split the events as you require. `\z` also works which suggests that blank line really is completely empty when matched. – USD Matt Aug 08 '17 at 10:10

1 Answer

Input

Start-Date: 2017-08-07  06:48:12
End-Date: 2017-08-07  06:48:12

Start-Date: 2017-08-07  12:55:16
End-Date: 2017-08-07  12:56:01

Multiline filter = ^\n (negated)

The multiline filter looks at each line in turn to decide what should be merged. With `^\n` negated, every line that does not match the pattern is merged into the previous one:

First line starts with `^Start-Date` (merged)
Second line starts with `^End-Date` (merged)
Third line is blank (merged, unless logstash skips blank lines)
Fourth line starts with `^Start-Date` (merged)
Fifth line starts with `^End-Date` (merged)

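Trying to match a \n, especially at the start of a line, makes no sense. Each line is processed individually, so the \n is most likely not part of the line when the pattern is applied.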
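To see why nothing ever matches, here is a small Python sketch of the idea. It is not Logstash itself: it assumes, as the Grok constructor results suggest, that each line reaches the pattern with its terminator already removed. The names `text`, `lines` and `newline_hits` are just for illustration.

```python
import re

# The sample input from the question, including the blank separator line.
text = (
    "Start-Date: 2017-08-07  06:48:12\n"
    "End-Date: 2017-08-07  06:48:12\n"
    "\n"
    "Start-Date: 2017-08-07  12:55:16\n"
    "End-Date: 2017-08-07  12:56:01\n"
)

# Assumption: lines are handed to the pattern without their terminators.
lines = text.splitlines()

# One flag per line: does the question's pattern ^\n match it?
newline_hits = [bool(re.search(r"^\n", line)) for line in lines]

print(lines[2] == "")  # the "blank" line is an empty string
print(newline_hits)    # no line starts with a newline
```

Under that assumption the blank line is just an empty string, so a negated `^\n` treats it like every other line.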

You're better off matching ^End-Date: and merging that with the previous line. (Or, if an event has more lines and always starts with Start-Date:, match that and negate.)
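A rough Python model of the merging may help compare the options. This is only a sketch of the "merge into previous" behaviour as I understand it, not the actual Logstash implementation; `group_events` is a made-up helper, and `negate` simply flips the match result.

```python
import re

def group_events(lines, pattern, negate=False):
    """Group lines into events; a line that 'hits' is merged into the previous event."""
    events = []
    for line in lines:
        matched = re.search(pattern, line) is not None
        merge = matched != negate  # negate inverts the decision
        if merge and events:
            events[-1].append(line)
        else:
            events.append([line])
    return events

lines = [
    "Start-Date: 2017-08-07  06:48:12",
    "End-Date: 2017-08-07  06:48:12",
    "",
    "Start-Date: 2017-08-07  12:55:16",
    "End-Date: 2017-08-07  12:56:01",
]

# The question's filter: nothing matches ^\n, so everything is merged together.
one_big_event = group_events(lines, r"^\n", negate=True)

# Start a new event at every Start-Date: line and merge the rest into it.
per_record = group_events(lines, r"^Start-Date:", negate=True)

print(len(one_big_event), len(per_record))
```

In this model the negated `^Start-Date:` pattern yields two events, with the blank line attached to the end of the first one, while the negated `^\n` pattern yields a single event, which is what the question observed.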

Edit, based on comments and testing with the Grok constructor.

If it makes more sense to use the blank line as the record separator, ^\z or ^\Z appears to work. \Z ignores any final terminator, but since \z also worked in my tests, the line passed to the filter appears to be a completely empty string (no newline or any other termination characters).
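The reasoning can be checked with a short Python sketch. Note the anchors are spelled differently there: as far as I know, Python's `\Z` is the strict "very end of the string" anchor (like `\z` in Ruby-style regexes), while `$` without `re.MULTILINE` also matches just before one trailing newline (like Ruby-style `\Z`). The two helper names are invented for this example.

```python
import re

def is_blank_strict(line):
    # Matches only a truly empty string: \Z here means the absolute end.
    return re.search(r"^\Z", line) is not None

def is_blank_lenient(line):
    # Also accepts a lone trailing newline, because $ tolerates it.
    return re.search(r"^$", line) is not None

for candidate in ["", "\n", "End-Date: 2017-08-07  12:56:01"]:
    print(repr(candidate), is_blank_strict(candidate), is_blank_lenient(candidate))
```

Only the empty string passes the strict check, so a strict end-of-string anchor matching the blank line is good evidence that no terminator is attached to it.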

USD Matt
  • Thanks, I can get the match to work (in the appspot page) by using either of the alternate multiline patterns you suggest. The problem is they don't work in reality, as far as I can tell because the actual application that writes the log file (apt) starts each new entry with a blank line. What I actually see then is the first entry of each day having a grokparsefailure, and subsequent ones succeeding. This is why I'm trying alternatives and trying to understand more about blank line matching. – spoovy Aug 08 '17 at 10:20
  • Ah I just read your comment re \Z. I never thought to try that. I shall experiment! Thanks. – spoovy Aug 08 '17 at 10:22
  • OK I think I've sorted my issues. It was actually rather esoteric in the end and revolved around the difference between aptitude and apt-get in what they do to the logfile while waiting for user input. In short aptitude does something (I don't know what, it keeps the file handle open) which breaks any grok/multiline combination I can think of. Apt-get apparently does not do this. Thanks USD Matt you helped kick me along the path a bit! – spoovy Aug 08 '17 at 12:04