How can I parse a human-readable byte count in Logstash?

Question

I'm dealing with log files containing parts such as:

538,486K of 1,048,576K

These represent memory use (Java heap space) rendered in a human-readable format. I would like to track those numbers in charts in Kibana. To do this I would like to somehow use Logstash's grok filter to parse these numbers, but I don't know how to handle (i.e. ignore) the thousands separator.

Ideally I would have something that can also handle the "K" and multiply by one thousand. At this point I am not aware of any system that logs in a unit other than kilobytes, but I'd prefer not to make that assumption.

What about trying to make your app's log format more "machine readable"? That would be more reliable than a regex. — , Mar 05 '15 at 18:01
@AndréDaniel: I am trying to get the production developers to treat logs more like data, but it's not up to me to change the code myself. And it would take many months until a change is rolled out across all customers. — Peter Becker, Mar 05 '15 at 23:51

score 1 · Accepted Answer · answered Mar 05 '15 at 17:58

The mutate filter allows text replacement with the gsub option.

gsub takes an array, where every triplet of values indicates:

Target field name
Search pattern
Replace pattern

The search pattern is treated as a regular expression, but plain literal strings are all we need in this case.

First, we strip commas. Simple enough.

Second, we multiply. Should K multiply by 1000? If so, it seems to me that we can simply replace K with 000.

Putting those together:

filter {
    mutate {
        # each triplet is: field name, search pattern, replacement
        gsub => [
            "some_field", ",", "",
            "some_field", "K", "000"
        ]
    }
}

You can add other replacement options as needed.
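To see what those two substitutions do to the sample value from the question, here is the same logic in plain Ruby, outside Logstash. It is only an illustration; `raw` is a hypothetical variable standing in for the field's contents.

```ruby
# Plain-Ruby illustration of the two gsub steps (not Logstash config).
# `raw` is a hypothetical stand-in for the contents of some_field.
raw = "538,486K"

without_commas = raw.gsub(",", "")               # strip thousands separators
as_bytes_text  = without_commas.gsub("K", "000") # treat K as a factor of 1000

puts as_bytes_text
```

Note that the result is still a string. For charting it will probably need converting to a number, for instance with mutate's convert option; as far as I know convert runs before gsub within a single mutate block, so that would have to go in a second mutate block.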

Depending on your circumstances, K might multiply by 1024, which is going to be a bit more complicated. I don't see any solution right out of the box, but you can use the ruby filter to run some arithmetic.
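As a rough sketch of that arithmetic, the plain-Ruby helper below converts a value such as "1,048,576K" into a byte count, with the base (1024 or 1000) as a parameter. The names `parse_size` and `UNIT_EXPONENTS` are made up for illustration, and the M and G units are an assumption, since the logs in the question only show K.

```ruby
# Hypothetical helper: convert a string like "1,048,576K" to a byte count.
# UNIT_EXPONENTS and parse_size are illustrative names, not Logstash APIs.
# The M and G entries are assumptions; the question only shows K.
UNIT_EXPONENTS = { "" => 0, "K" => 1, "M" => 2, "G" => 3 }.freeze

def parse_size(text, base = 1024)
  match = /\A([0-9][0-9,]*)([KMG]?)\z/.match(text.strip)
  return nil unless match # unrecognised format: let the caller decide

  number = Integer(match[1].delete(","), 10) # drop thousands separators
  number * base**UNIT_EXPONENTS[match[2]]
end

puts parse_size("538,486K")       # K read as 1024
puts parse_size("538,486K", 1000) # K read as 1000
```

Inside a ruby filter, the same logic would read the field from the event and write the result back. How the event is accessed depends on the Logstash version: older releases use `event['field']`, while Logstash 5 and later use `event.get` and `event.set`.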

I ended up down the ruby path, but I think your answer is correct. I'll post mine as alternative. — Peter Becker, Mar 05 '15 at 23:52

score 1 · Answer 2 · answered Mar 05 '15 at 23:57

I think rutter's answer should work in my case. Here's what I ended up doing prior to reading it:

filter {
  grep {
    match => { "message" => "...something identifying the message..." }
    drop => false
    add_tag => [ "MyMarker" ] 
  }
  if "MyMarker" in [tags] {
    grok {
      match => [ "message", "...(?<rawCurValue>[0-9,]+)K of (?<rawMaxValue>[0-9,]+)K..." ]
      break_on_match => false
    }
    if "_grokparsefailure" not in [tags] {
      ruby {
        code => "
           if(event['rawCurValue'])
             event['curValue'] = Integer(event['rawCurValue'].gsub(',','')) * 1000
           end
           if(event['rawMaxValue'])
             event['maxValue'] = Integer(event['rawMaxValue'].gsub(',','')) * 1000
           end
        "
      }
    }
  }
}

I suspect it could be made more concise, but it seems to work.
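One way the conversion part might be tightened is to handle both captures with the same few lines. The sketch below is plain Ruby, outside Logstash, so it can be tried on its own; `HEAP_PATTERN` and `heap_usage` are hypothetical names, and the multiplier of 1000 is the same assumption as in the config above.

```ruby
# Plain-Ruby sketch (outside Logstash) of the same extraction in one step.
# HEAP_PATTERN and heap_usage are hypothetical names for illustration.
HEAP_PATTERN = /(?<cur>[0-9][0-9,]*)K of (?<max>[0-9][0-9,]*)K/

def heap_usage(message, multiplier = 1000)
  m = HEAP_PATTERN.match(message)
  return nil unless m # line does not contain the expected fragment

  # Same conversion for both captures: drop commas, parse, scale.
  %w[cur max].map { |name| Integer(m[name].delete(","), 10) * multiplier }
end

cur, max = heap_usage("538,486K of 1,048,576K")
puts "#{cur} of #{max}"
```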
