
On a single-node Elasticsearch with Logstash, we tested parsing a 20 MB and a 200 MB log file into Elasticsearch on different AWS instance types, i.e. medium, large and xlarge.

Logstash conf:

input {
  file {
    # (file path omitted in the original question)
  }
}

filter {
  # Normalize whitespace in the raw message
  mutate {
    gsub => ["message", "\n", " "]
  }
  mutate {
    gsub => ["message", "\t", " "]
  }

  # Join continuation lines (starting with a space) onto the previous event
  multiline {
    pattern => "^ "
    what    => "previous"
  }

  # Parse the log line itself, then pull location/machine/date out of the file path
  grok {
    match          => ["message", "%{TIME:log_time}\|%{WORD:Message_type}\|%{GREEDYDATA:Component}\|%{NUMBER:line_number}\| %{GREEDYDATA:log_message}"]
    match          => ["path", "%{GREEDYDATA}/%{GREEDYDATA:loccode}/%{GREEDYDATA:_machine}\:%{DATE:logdate}.log"]
    break_on_match => false
  }

  # Check whether the location is S or L
  if [loccode] == "S" or [loccode] == "L" {
    ruby {
      # Keep only the part of the machine name before the first underscore
      code => "temp = event['_machine'].split('_')
               event['_machine'] = temp[0] unless temp.nil? || temp.empty?"
    }
  }

  mutate {
    add_field => ["event_timestamp", "%{@timestamp}"]
    replace   => ["log_time", "%{logdate} %{log_time}"]
    lowercase => ["loccode"]
    # Remove the 'logdate' field since we don't need it anymore
    remove    => "logdate"
  }

  # Get all site details (site name, city and coordinates) via a custom plugin
  sitelocator {
    sitename => "loccode"
    datafile => "vendor/sitelocator/SiteDetails.csv"
  }

  date {
    locale => "en"
    match  => ["log_time", "yyyy-MM-dd HH:mm:ss", "MM-dd-yyyy HH:mm:ss.SSS", "ISO8601"]
  }
}

output {
  elasticsearch {
  }
}
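For reference, on Logstash 1.4 a config like this is started with something like the line below (the config file name is a placeholder). The -w flag sets the number of filter workers; note that the multiline filter used above is not thread-safe, so this config should stay at the default of one worker:

    bin/logstash agent -f logstash.conf -w 1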

Environment details: medium instance, 3.75 GB RAM, 1 core, storage: 4 GB SSD, 64-bit, network performance: moderate. Instance running Logstash and Elasticsearch.

Scenario 1

**With default settings**
Result:
  20 MB logfile: 23 mins, 175 events/second
  200 MB logfile: 3 hrs 3 mins, 175 events/second


Added the following to the settings:

Java heap size: 2 GB

bootstrap.mlockall: true
indices.fielddata.cache.size: "30%"
indices.cache.filter.size: "30%"
index.translog.flush_threshold_ops: 50000
indices.memory.index_buffer_size: 50%

# Search thread pool
threadpool.search.type: fixed
threadpool.search.size: 20
threadpool.search.queue_size: 100
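For reference, in Elasticsearch 1.x the heap size is normally set through the ES_HEAP_SIZE environment variable rather than in elasticsearch.yml, which is where the other options above belong. A minimal sketch (the file location depends on how Elasticsearch was installed):

    # e.g. /etc/default/elasticsearch, or export before running bin/elasticsearch
    ES_HEAP_SIZE=2g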

**With added settings**
Result:
  20 MB logfile: 22 mins, 180 events/second
  200 MB logfile: 3 hrs 7 mins, 180 events/second

Scenario 2

Environment details: r3.large instance, 15.25 GB RAM, 2 cores, storage: 32 GB SSD, 64-bit, network performance: moderate. Instance running Logstash and Elasticsearch.

**With default settings**
Result:
  20 MB logfile: 7 mins, 750 events/second
  200 MB logfile: 65 mins, 800 events/second

Added the following to the settings:

Java heap size: 7 GB
other parameters same as above

**With added settings**
Result:
  20 MB logfile: 7 mins, 800 events/second
  200 MB logfile: 55 mins, 800 events/second

Scenario 3

Environment details: r3.xlarge (high-memory extra large) instance, 30.5 GB RAM, 4 cores, storage: 32 GB SSD, 64-bit, network performance: moderate. Instance running Logstash and Elasticsearch.

**With default settings**
Result:
  20 MB logfile: 7 mins, 1200 events/second
  200 MB logfile: 34 mins, 1200 events/second

Added the following to the settings:

Java heap size: 15 GB
other parameters same as above

**With added settings**
Result:
  20 MB logfile: 7 mins, 1200 events/second
  200 MB logfile: 34 mins, 1200 events/second

I wanted to know:

  1. What is the benchmark for this kind of performance?
  2. Does my performance meet the benchmark, or is it below it?
  3. Why don't I see any difference even after increasing the Elasticsearch JVM heap?
  4. How do I monitor Logstash and improve its performance?

I'd appreciate any help on this, as I'm new to Logstash and Elasticsearch.

Devaraj

2 Answers


1- If you want comments on your performance, we need to see your Logstash filter config.

Logstash performance is a mix of filter, output and worker setup.

More filters = fewer events per second.

A good idea is to scale wide if you have Logstash performance problems: more workers and more instances can increase events/second. A common approach is to have the senders write to a RabbitMQ queue and scale Logstash nodes behind it, as sketched below.
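A minimal sketch of that pattern, assuming the stock rabbitmq input/output plugins of Logstash 1.x (the broker host, file path and exchange/queue names are placeholders). Lightweight shippers read the files and publish raw events; one or more indexers consume the queue, run the heavy filters and write to Elasticsearch:

    # shipper.conf - one per source host
    input {
      file {
        path => "/var/log/app/*.log"   # placeholder
      }
    }
    output {
      rabbitmq {
        host          => "mq.example.com"   # placeholder broker
        exchange      => "logstash"
        exchange_type => "direct"
        key           => "logstash"
      }
    }

    # indexer.conf - run as many instances as throughput requires
    input {
      rabbitmq {
        host     => "mq.example.com"
        queue    => "logstash"
        exchange => "logstash"
        key      => "logstash"
      }
    }
    output {
      elasticsearch { }
    }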

2- See 1.

3- There are I/O limits, and sometimes it's better to have more nodes; Elasticsearch is designed to scale out across shards and nodes.

4- Logstash monitoring is only process monitoring for the moment. There are some hints about doing more with a Java debugger, but you will have to look for the details in the Logstash user group. For Elasticsearch, there is Marvel to monitor your cluster.

YuKYuK
  • Attached the Logstash conf to my question. – Devaraj Feb 18 '15 at 12:49
  • In my case I'm using the multiline filter, which isn't thread-safe with filter workers, so I haven't used them. – Devaraj Feb 18 '15 at 12:51
  • OK, you do more than the classic job. To get more speed from Logstash I can give you some tips: try the Oracle JVM instead of OpenJDK; check your conf step by step to see if one of the filters is a bottleneck; don't forget to use the latest Logstash; and try to split Logstash and Elasticsearch onto different nodes. I also found a benchmark you can read to see performance with a "classic" job: https://github.com/matejzero/logstash-benchmark – YuKYuK Feb 18 '15 at 13:13
  • I'm already using the Oracle JVM; I will check each filter and let you know the results. – Devaraj Feb 18 '15 at 13:35
  • Hi, I have checked step by step to find the bottleneck filter. The filter below took the most time. Can you guide me on how to tune it to make it faster? date { locale=>"en" match => [ "log_time", "yyyy-MM-dd HH:mm:ss", "MM-dd-yyyy HH:mm:ss.SSS","ISO8601" ] } – Devaraj Feb 19 '15 at 09:54
  • Do you really need date processing? Logstash sets the date at the beginning of processing an event. If this is for real-time logs, do you need to match the date at all? – YuKYuK Feb 19 '15 at 10:23
  • Yes, I need date processing; we are trying to parse historical data. – Devaraj Feb 19 '15 at 12:28
  • OK, so you need to work with more workers, more cache, more config, etc. :/ Look at this blog; he segregates input, filter and output: http://everythingshouldbevirtual.com/highly-available-elk-elasticsearch-logstash-kibana-setup – YuKYuK Feb 19 '15 at 13:02

The way we monitor Logstash:

1) Monitor Elasticsearch directly: make a simple about/info call to the ES API (if ES goes down, you are down).

2) Monitor Elasticsearch stats. What to watch depends on how you use it: you can look for activity (number of docs, index size, etc.) or any other stat that is meaningful in your environment. If you see the stat moving, you know Logstash is successfully getting messages into ES.

3) Logstash itself: just hit the port it is listening on. If the port goes dark... Logstash died or isn't running.
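Concretely, those three checks can be as simple as the following sketch (localhost and port 9200 are the Elasticsearch defaults; the Logstash port is a placeholder and only applies if you run a network input rather than a file input as in the question):

    # 1) Is Elasticsearch responding at all?
    curl -s http://localhost:9200/

    # 2) Per-index document counts; if they keep growing, Logstash is indexing
    curl -s http://localhost:9200/_stats/docs

    # 3) If Logstash runs a network input (e.g. tcp on 5000), probe the port
    nc -z localhost 5000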