
I'm running a CentOS 6.2 server with 1 GB of memory and a 98 GB disk.

Elasticsearch version 1.4.2 holds around 9,000 documents, which I index from a MySQL database using the JDBC river plugin.

Over the past few weeks it has crashed on average twice a day. When this happens the server isn't reachable until it is rebooted (no SSH access, a 500 error when trying to reach the website, and so on).

Restarting Elasticsearch frequently seems to prevent the server from going down, but I would much rather find and fix the source of the problem. These are the logs of a crash that happened a few days ago:

Elasticsearch.log

[2015-10-04 10:20:58,247][WARN ][monitor.jvm              ] [Speedball] [gc][young][152940][2627] duration [2.2s], collections [1]/[3.9s], total [2.2s]/[12.5m], memory [89.8mb]->[35.9mb]/[197.2mb], all_pools {[young] [54.1mb]->[110.1kb]/[54.5mb]}{[survivor] [472.9kb]->[656.6kb]/[6.7mb]}{[old] [35.2mb]->[35.2mb]/[136mb]}
[2015-10-04 10:21:05,538][INFO ][index.shard.service      ] [Speedball] [rondreis][0] updating refresh_interval from [1s] to [-1000]
[2015-10-04 10:21:06,795][INFO ][index.shard.service      ] [Speedball] [tblproductoffer][0] updating refresh_interval from [1s] to [-1000]
[2015-10-04 10:22:54,653][WARN ][monitor.jvm              ] [Speedball] [gc][young][153002][2628] duration [1.3s], collections [1]/[1.9s], total [1.3s]/[12.5m], memory [90.3mb]->[36.8mb]/[197.2mb], all_pools {[young] [54.5mb]->[24kb]/[54.5mb]}{[survivor] [656.6kb]->[1.5mb]/[6.7mb]}{[old] [35.2mb]->[35.2mb]/[136mb]}
[2015-10-04 10:22:57,406][INFO ][index.shard.service      ] [Speedball] [rondreis][0] updating refresh_interval from [-1000] to [1s]
[2015-10-04 10:22:59,971][WARN ][monitor.jvm              ] [Speedball] [gc][young][153006][2630] duration [1.7s], collections [1]/[2.2s], total [1.7s]/[12.5m], memory [83.1mb]->[36.8mb]/[197.2mb], all_pools {[young] [42.8mb]->[45.3kb]/[54.5mb]}{[survivor] [5mb]->[835.8kb]/[6.7mb]}{[old] [35.2mb]->[35.9mb]/[136mb]}
[2015-10-04 10:23:08,215][INFO ][river.jdbc.RiverMetrics  ] pipeline org.xbib.elasticsearch.plugin.jdbc.RiverPipeline@f106d7f complete: river jdbc/rondreis metrics: 3304 rows, 0.1295726757143051 mean, (6.604331351617896 1.365257914334395 0.45761655479553015), ingest metrics: elapsed 2 minutes 2 seconds, 915.27 KB bytes, 283.0 bytes avg, 0.007 MB/s
[2015-10-04 10:25:33,021][WARN ][monitor.jvm              ] [Speedball] [gc][young][153151][2632] duration [1.5s], collections [1]/[1s], total [1.5s]/[12.5m], memory [64.7mb]->[91.3mb]/[197.2mb], all_pools {[young] [27.8mb]->[54.5mb]/[54.5mb]}{[survivor] [917.6kb]->[2.1mb]/[6.7mb]}{[old] [35.9mb]->[35.9mb]/[136mb]}
[2015-10-04 10:25:43,724][INFO ][monitor.jvm              ] [Speedball] [gc][young][153154][2639] duration [1.4s], collections [2]/[2.2s], total [1.4s]/[12.6m], memory [91.7mb]->[45.3mb]/[197.2mb], all_pools {[young] [48.4mb]->[168.9kb]/[54.5mb]}{[survivor] [4.3mb]->[4.2mb]/[6.7mb]}{[old] [39mb]->[40.9mb]/[136mb]}
[2015-10-04 10:25:57,116][WARN ][monitor.jvm              ] [Speedball] [gc][young][153165][2641] duration [3s], collections [1]/[3.3s], total [3s]/[12.7m], memory [90.7mb]->[52.1mb]/[197.2mb], all_pools {[young] [44.5mb]->[2kb]/[54.5mb]}{[survivor] [4.2mb]->[6.7mb]/[6.7mb]}{[old] [41.8mb]->[45.4mb]/[136mb]}
[2015-10-04 10:26:06,283][WARN ][monitor.jvm              ] [Speedball] [gc][young][153171][2642] duration [3.6s], collections [1]/[3.9s], total [3.6s]/[12.7m], memory [89.4mb]->[54.8mb]/[197.2mb], all_pools {[young] [37.2mb]->[4.3kb]/[54.5mb]}{[survivor] [6.7mb]->[5mb]/[6.7mb]}{[old] [45.4mb]->[49.8mb]/[136mb]}
[2015-10-04 10:26:12,194][WARN ][monitor.jvm              ] [Speedball] [gc][young][153172][2643] duration [4.9s], collections [1]/[5.9s], total [4.9s]/[12.8m], memory [54.8mb]->[57.9mb]/[197.2mb], all_pools {[young] [4.3kb]->[4.1kb]/[54.5mb]}{[survivor] [5mb]->[3.7mb]/[6.7mb]}{[old] [49.8mb]->[54.2mb]/[136mb]}
[2015-10-04 10:26:38,171][WARN ][monitor.jvm              ] [Speedball] [gc][young][153176][2644] duration [3s], collections [1]/[22.2s], total [3s]/[12.9m], memory [101.9mb]->[42.2mb]/[197.2mb], all_pools {[young] [43.9mb]->[4.1kb]/[54.5mb]}{[survivor] [3.7mb]->[0b]/[6.7mb]}{[old] [54.2mb]->[42.2mb]/[136mb]}
[2015-10-04 10:26:39,186][WARN ][monitor.jvm              ] [Speedball] [gc][old][153176][209] duration [18.3s], collections [1]/[22.2s], total [18.3s]/[4.4m], memory [101.9mb]->[42.2mb]/[197.2mb], all_pools {[young] [43.9mb]->[4.1kb]/[54.5mb]}{[survivor] [3.7mb]->[0b]/[6.7mb]}{[old] [54.2mb]->[42.2mb]/[136mb]}
[2015-10-04 10:26:47,155][WARN ][monitor.jvm              ] [Speedball] [gc][young][153182][2646] duration [2.6s], collections [1]/[2.9s], total [2.6s]/[12.9m], memory [102.9mb]->[67.5mb]/[197.2mb], all_pools {[young] [47mb]->[6.1kb]/[54.5mb]}{[survivor] [6.7mb]->[6.7mb]/[6.7mb]}{[old] [49.1mb]->[60.8mb]/[136mb]}
[2015-10-04 10:26:54,247][WARN ][monitor.jvm              ] [Speedball] [gc][young][153186][2647] duration [1.4s], collections [1]/[3.1s], total [1.4s]/[12.9m], memory [106mb]->[77.7mb]/[197.2mb], all_pools {[young] [38.4mb]->[842.4kb]/[54.5mb]}{[survivor] [6.7mb]->[6.7mb]/[6.7mb]}{[old] [60.8mb]->[70.2mb]/[136mb]}
[2015-10-04 10:27:03,745][WARN ][monitor.jvm              ] [Speedball] [gc][young][153189][2648] duration [1.6s], collections [1]/[7s], total [1.6s]/[13m], memory [118.3mb]->[51.7mb]/[197.2mb], all_pools {[young] [41.3mb]->[8.1kb]/[54.5mb]}{[survivor] [6.7mb]->[0b]/[6.7mb]}{[old] [70.2mb]->[51.7mb]/[136mb]}
[2015-10-04 10:27:58,542][INFO ][index.shard.service      ] [Speedball] [tblproductoffer][0] updating refresh_interval from [-1000] to [1s]
[2015-10-04 10:28:27,313][INFO ][river.jdbc.RiverMetrics  ] pipeline org.xbib.elasticsearch.plugin.jdbc.RiverPipeline@72a191c1 complete: river jdbc/tblproductoffer metrics: 76480 rows, 2.962182052284631 mean, (98.523212205934 28.85622315906601 10.274778138333478), ingest metrics: elapsed 7 minutes 21 seconds, 16.72 MB bytes, 229.0 bytes avg, 0.038 MB/s

Messages

    Oct  4 10:47:33   kernel: ps invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
    Oct  4 10:47:35   kernel: ps cpuset=/ mems_allowed=0
    Oct  4 10:47:35   kernel: Pid: 31557, comm: ps Not tainted 2.6.32-504.3.3.el6.x86_64 #1
    Oct  4 10:47:35   kernel: Call Trace:
    Oct  4 10:47:35   kernel: [<ffffffff810d40c1>] ? cpuset_print_task_mems_allowed+0x91/0xb0
    Oct  4 10:47:35   kernel: [<ffffffff81127300>] ? dump_header+0x90/0x1b0
    Oct  4 10:47:35   kernel: [<ffffffff8122eabc>] ? security_real_capable_noaudit+0x3c/0x70
    Oct  4 10:47:35   kernel: [<ffffffff81127782>] ? oom_kill_process+0x82/0x2a0
    Oct  4 10:47:35   kernel: [<ffffffff811276c1>] ? select_bad_process+0xe1/0x120
    Oct  4 10:47:35   kernel: [<ffffffff81127bc0>] ? out_of_memory+0x220/0x3c0
    Oct  4 10:47:35   kernel: [<ffffffff811344df>] ? __alloc_pages_nodemask+0x89f/0x8d0
    Oct  4 10:47:35   kernel: [<ffffffff8116c69a>] ? alloc_pages_current+0xaa/0x110
    Oct  4 10:47:35   kernel: [<ffffffff811246f7>] ? __page_cache_alloc+0x87/0x90
    Oct  4 10:47:35   kernel: [<ffffffff811240de>] ? find_get_page+0x1e/0xa0
    Oct  4 10:47:35   kernel: [<ffffffff81125697>] ? filemap_fault+0x1a7/0x500
    Oct  4 10:47:35   kernel: [<ffffffff8114eae4>] ? __do_fault+0x54/0x530
    Oct  4 10:47:35   kernel: [<ffffffff8114f0b7>] ? handle_pte_fault+0xf7/0xb00
    Oct  4 10:47:35   kernel: [<ffffffff81198b9b>] ? pipe_wait+0x5b/0x80
    Oct  4 10:47:35   kernel: [<ffffffff8114fcea>] ? handle_mm_fault+0x22a/0x300
    Oct  4 10:47:35   kernel: [<ffffffff8104d0d8>] ? __do_page_fault+0x138/0x480
    Oct  4 10:47:35   kernel: [<ffffffff8128c245>] ? _atomic_dec_and_lock+0x55/0x80
    Oct  4 10:47:35   kernel: [<ffffffff811b07d0>] ? mntput_no_expire+0x30/0x110
    Oct  4 10:47:35   kernel: [<ffffffff8118f981>] ? __fput+0x1a1/0x210
    Oct  4 10:47:35   kernel: [<ffffffff8152ff7e>] ? do_page_fault+0x3e/0xa0
    Oct  4 10:47:35   kernel: [<ffffffff8152d335>] ? page_fault+0x25/0x30
    Oct  4 10:47:35   kernel: Mem-Info:
    Oct  4 10:47:35   kernel: Node 0 DMA per-cpu:
    Oct  4 10:47:35   kernel: CPU    0: hi:    0, btch:   1 usd:   0
    Oct  4 10:47:35   kernel: Node 0 DMA32 per-cpu:
    Oct  4 10:47:35   kernel: CPU    0: hi:  186, btch:  31 usd: 173
    Oct  4 10:47:35   kernel: active_anon:87918 inactive_anon:88009 isolated_anon:5376
    Oct  4 10:47:35   kernel: active_file:136 inactive_file:122 isolated_file:332
    Oct  4 10:47:35   kernel: unevictable:0 dirty:0 writeback:284 unstable:0
    Oct  4 10:47:35   kernel: free:12258 slab_reclaimable:3027 slab_unreclaimable:14974
    Oct  4 10:47:35   kernel: mapped:478 shmem:22 pagetables:36389 bounce:0
    Oct  4 10:47:35   kernel: Node 0 DMA free:4648kB min:668kB low:832kB high:1000kB active_anon:4400kB inactive_anon:4648kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):768kB isolated(file):48kB present:15368kB mlocked:0kB dirty:0kB writeback:20kB mapped:44kB shmem:0kB slab_reclaimable:36kB slab_unreclaimable:244kB kernel_stack:16kB pagetables:864kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:12 all_unreclaimable? no
    Oct  4 10:47:35   kernel: lowmem_reserve[]: 0 994 994 994
    Oct  4 10:47:35   kernel: Node 0 DMA32 free:44384kB min:44384kB low:55480kB high:66576kB active_anon:347272kB inactive_anon:347388kB active_file:544kB inactive_file:488kB unevictable:0kB isolated(anon):20736kB isolated(file):1280kB present:1018016kB mlocked:0kB dirty:0kB writeback:1116kB mapped:1868kB shmem:88kB slab_reclaimable:12072kB slab_unreclaimable:59652kB kernel_stack:3440kB pagetables:144692kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:3034 all_unreclaimable? yes
    Oct  4 10:47:35   kernel: lowmem_reserve[]: 0 0 0 0
    Oct  4 10:47:35   kernel: Node 0 DMA: 34*4kB 64*8kB 22*16kB 2*32kB 8*64kB 8*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 4648kB
    Oct  4 10:47:35   kernel: Node 0 DMA32: 2240*4kB 2782*8kB 373*16kB 81*32kB 34*64kB 3*128kB 2*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 44384kB
    Oct  4 10:47:35   kernel: 17908 total pagecache pages
    Oct  4 10:47:35   kernel: 17292 pages in swap cache
    Oct  4 10:47:35   kernel: Swap cache stats: add 2986384, delete 2969092, find 1357308/1573932
    Oct  4 10:47:35   kernel: Free swap  = 0kB
    Oct  4 10:47:35   kernel: Total swap = 2097148kB
    Oct  4 10:47:35   kernel: 262127 pages RAM
    Oct  4 10:47:35   kernel: 7514 pages reserved
    Oct  4 10:47:35   kernel: 132906 pages shared
    Oct  4 10:47:35   kernel: 231782 pages non-shared
    Oct  4 10:47:35   kernel: [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
    Oct  4 10:47:35   kernel: [  294]     0   294     2725       36   0     -17         -1000 udevd
    Oct  4 10:47:35   kernel: [  812]     0   812     6905       74   0     -17         -1000 auditd
    Oct  4 10:47:35   kernel: [  842]     0   842    62271       98   0       0             0 rsyslogd
    Oct  4 10:47:35   kernel: [  863]    25   863    40013       68   0       0             0 named
    Oct  4 10:47:35   kernel: [  878]    81   878     5358       38   0       0             0 dbus-daemon
    Oct  4 10:47:35   kernel: [  905]     0   905     1020       37   0       0             0 acpid
    Oct  4 10:47:35   kernel: [  925]     0   925    16672       39   0     -17         -1000 sshd
    Oct  4 10:47:35   kernel: [  937]     0   937    13145       14   0       0             0 vsftpd
    Oct  4 10:47:35   kernel: [  974]     0   974    27075       41   0       0             0 mysqld_safe
    Oct  4 10:47:35   kernel: [ 1289]    27  1289   237303     9646   0       0             0 mysqld
    Oct  4 10:47:35   kernel: [ 1369]     0  1369     4816       35   0       0             0 dovecot
    Oct  4 10:47:35   kernel: [ 1375]    97  1375     3246       43   0       0             0 anvil
    Oct  4 10:47:35   kernel: [ 1376]     0  1376     3278       45   0       0             0 log
    Oct  4 10:47:35   kernel: [ 1378]     0  1378     3589       44   0       0             0 config
    Oct  4 10:47:35   kernel: [ 1394]   498  1394   271775     3584   0       0             0 java
    Oct  4 10:47:35   kernel: [ 1408]    93  1408    23430       42   0       0             0 exim
    Oct  4 10:47:35   kernel: [ 1420]     0  1420   112748      522   0       0             0 httpd
    Oct  4 10:47:35   kernel: [ 1437]     0  1437    16578       52   0       0             0 nginx
    Oct  4 10:47:35   kernel: [ 1439]   496  1439    16713      139   0       0             0 nginx
    Oct  4 10:47:35   kernel: [ 1441]   496  1441    16762      137   0       0             0 nginx
    Oct  4 10:47:35   kernel: [ 1442]   496  1442    16579       85   0       0             0 nginx
    Oct  4 10:47:35   kernel: [ 1450]     0  1450    13814       11   0       0             0 vesta-nginx
    Oct  4 10:47:35   kernel: [ 1452]   502  1452    13814       29   0       0             0 vesta-nginx
    Oct  4 10:47:35   kernel: [ 1462]     0  1462    34253       56   0       0             0 vesta-php
    Oct  4 10:47:35   kernel: [ 1471]   502  1471    34253       31   0       0             0 vesta-php
    Oct  4 10:47:35   kernel: [ 1472]   502  1472    34253       31   0       0             0 vesta-php
    Oct  4 10:47:35   kernel: [ 1482]     0  1482    29333       68   0       0             0 crond
    Oct  4 10:47:35   kernel: [ 1499]     0  1499   204119      485   0       0             0 fail2ban-server
    Oct  4 10:47:35   kernel: [ 1539]     0  1539     1016       34   0       0             0 mingetty
    Oct  4 10:47:35   kernel: [ 1541]     0  1541     1016       34   0       0             0 mingetty
    Oct  4 10:47:35   kernel: [ 1543]     0  1543     1016       34   0       0             0 mingetty
    Oct  4 10:47:35   kernel: [ 1545]     0  1545     1016       34   0       0             0 mingetty
    Oct  4 10:47:35   kernel: [ 1547]     0  1547     1016       34   0       0             0 mingetty
    Oct  4 10:47:35   kernel: [ 1549]     0  1549     1016       34   0       0             0 mingetty
    Oct  4 10:47:35   kernel: [ 1560]     0  1560     2724       31   0     -17         -1000 udevd
    Oct  4 10:47:35   kernel: [ 1561]     0  1561     2724       31   0     -17         -1000 udevd
    Oct  4 10:47:35   kernel: [10565]     0 10565    62866      239   0       0             0 httpd
    Oct  4 10:47:35   kernel: [10567]   503 10567   141456     1109   0       0             0 httpd
    Oct  4 10:47:35   kernel: [10568]   503 10568   141469     1260   0       0             0 httpd
    Oct  4 10:47:35   kernel: [10569]   503 10569   142692     1186   0       0             0 httpd
    Oct  4 10:47:35   kernel: [10570]   503 10570   141602     1183   0       0             0 httpd
    Oct  4 10:47:35   kernel: [10571]   503 10571   141610      589   0       0             0 httpd
    Oct  4 10:47:35   kernel: [10572]   503 10572   141756     1256   0       0             0 httpd
    Oct  4 10:47:35   kernel: [10573]   503 10573   141157     1020   0       0             0 httpd
    Oct  4 10:47:35   kernel: [10574]   503 10574   142386     1227   0       0             0 httpd
    Oct  4 10:47:35   kernel: [12531]   503 12531   141259      606   0       0             0 httpd
    Oct  4 10:47:35   kernel: [12532]   503 12532   141667      945   0       0             0 httpd
    Oct  4 10:47:35   kernel: [12533]   503 12533   141454      609   0       0             0 httpd
    Oct  4 10:47:35   kernel: [29641]   502 29641    35055       52   0       0             0 crond
    Oct  4 10:47:35   kernel: [29645]     0 29645    43896       51   0       0             0 sudo
    Oct  4 10:47:35   kernel: [29648]     0 29648    26582       46   0       0             0 v-backup-users
    Oct  4 10:47:35   kernel: [29958]     0 29958    26681       46   0       0             0 v-backup-user
    Oct  4 10:47:35   kernel: [30067]     0 30067    28084       67   0       0             0 find
    Oct  4 10:47:35   kernel: [30068]     0 30068    30701       45   0       0             0 tar
    Oct  4 10:47:35   kernel: [30826]   503 30826   116293     1098   0       0             0 httpd
    Oct  4 10:47:35   kernel: [30834]   503 30834   116141      630   0       0             0 httpd
    Oct  4 10:47:35   kernel: [30835]   503 30835   116329     1236   0       0             0 httpd
    Oct  4 10:47:35   kernel: [30837]   503 30837   116329      997   0       0             0 httpd
    Oct  4 10:47:35   kernel: [30838]   503 30838   116397     1242   0       0             0 httpd
    Oct  4 10:47:35   kernel: [30839]   503 30839   116329     1057   0       0             0 httpd
    Oct  4 10:47:35   kernel: [30844]   503 30844   116236     1131   0       0             0 httpd
    Oct  4 10:47:35   kernel: [30845]   503 30845   116108      673   0       0             0 httpd
    Oct  4 10:47:35   kernel: [30848]   503 30848   140602      988   0       0             0 httpd
    Oct  4 10:47:35   kernel: [30849]   503 30849   116104      614   0       0             0 httpd
    —continues for a while—
    Oct  4 10:47:35   kernel: [31576]   503 31576   112875      833   0       0             0 httpd
    Oct  4 10:47:35   kernel: [31577]   503 31577   112879      978   0       0             0 httpd
    Oct  4 10:47:35   kernel: [31578]   503 31578   112785      797   0       0             0 httpd
    Oct  4 10:47:35   kernel: [31579]   503 31579   112876      841   0       0             0 httpd
    Oct  4 10:47:35   kernel: [31580]   503 31580   112781      496   0       0             0 httpd
    Oct  4 10:47:35   kernel: [31581]   503 31581   112782      726   0       0             0 httpd
    Oct  4 10:47:35   kernel: [31582]   503 31582   112875      845   0       0             0 httpd
    Oct  4 10:47:35   kernel: [31583]    48 31583   112748      480   0       0             0 httpd
    Oct  4 10:47:35   kernel: [31584]    48 31584   112748      480   0       0             0 httpd
    Oct  4 10:47:35   kernel: [31585]    48 31585   112748      480   0       0             0 httpd
    Oct  4 10:47:35   kernel: [31586]     0 31586   112748      445   0       0             0 httpd
    Oct  4 10:47:35   kernel: [31587]    48 31587   112748      441   0       0             0 httpd
    Oct  4 10:47:35   kernel: [31588]     0 31588   112748      414   0       0             0 httpd
    Oct  4 10:47:35   kernel: [31589]    48 31589   112748      434   0       0             0 httpd
    Oct  4 10:47:35   kernel: [31590]    48 31590   112748      441   0       0             0 httpd
    Oct  4 10:47:35   kernel: [31591]     0 31591   112748      423   0       0             0 httpd
    Oct  4 10:47:35   kernel: [31592]     0 31592   112748      404   0       0             0 httpd
    Oct  4 10:47:35   kernel: [31593]     0 31593   112748      445   0       0             0 httpd
    Oct  4 10:47:35   kernel: Out of memory: Kill process 1394 (java) score 107 or sacrifice child
    Oct  4 10:47:35   kernel: Killed process 1394, UID 498, (java) total-vm:1087100kB, anon-rss:14132kB, file-rss:204kB

And the Elasticsearch log from when it went down today (I was able to revive it over SSH this time, without rebooting, by running ‘service elasticsearch restart’).

[2015-10-09 10:21:08,062][INFO ][index.shard.service      ] [Black Panther] [tblproductoffer][0] updating refresh_interval from [1s] to [-1000]
[2015-10-09 10:22:22,647][DEBUG][action.admin.indices.settings.put] [Black Panther] failed to update settings on indices [rondreis]
org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException: failed to process cluster event (update-settings) within 30s
    at org.elasticsearch.cluster.service.InternalClusterService$2$1.run(InternalClusterService.java:263)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
[2015-10-09 10:22:23,002][WARN ][monitor.jvm              ] [Black Panther] [gc][young][83250][1221] duration [5.5s], collections [1]/[14.2s], total [5.5s]/[3.8m], memory [125.8mb]->[74.4mb]/[197.2mb], all_pools {[young] [53.1mb]->[1.5mb]/[54.5mb]}{[survivor] [1018kb]->[1.1mb]/[6.7mb]}{[old] [71.7mb]->[71.7mb]/[136mb]}
[2015-10-09 10:22:23,101][ERROR][river.jdbc.SimpleRiverFlow] failed to process cluster event (update-settings) within 30s
org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException: failed to process cluster event (update-settings) within 30s
    at org.elasticsearch.cluster.service.InternalClusterService$2$1.run(InternalClusterService.java:263)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
[2015-10-09 10:25:02,504][WARN ][monitor.jvm              ] [Black Panther] [gc][young][83345][1222] duration [3.4s], collections [1]/[4.2s], total [3.4s]/[3.9m], memory [127.3mb]->[76mb]/[197.2mb], all_pools {[young] [54.5mb]->[1.1mb]/[54.5mb]}{[survivor] [1.1mb]->[3.1mb]/[6.7mb]}{[old] [71.7mb]->[71.7mb]/[136mb]}
[2015-10-09 10:25:42,385][WARN ][monitor.jvm              ] [Black Panther] [gc][young][83361][1223] duration [8.8s], collections [1]/[9.7s], total [8.8s]/[4m], memory [128.8mb]->[75.4mb]/[197.2mb], all_pools {[young] [53.9mb]->[66kb]/[54.5mb]}{[survivor] [3.1mb]->[2.8mb]/[6.7mb]}{[old] [71.7mb]->[72.5mb]/[136mb]}
[2015-10-09 10:25:45,132][INFO ][river.jdbc.RiverMetrics  ] pipeline org.xbib.elasticsearch.plugin.jdbc.RiverPipeline@322e6dcf complete: river jdbc/rondreis metrics: 3288 rows, 0.12815869356635873 mean, (6.572349117469625 1.358646495863042 0.45540049399748883), ingest metrics: elapsed 4 minutes 38 seconds, 908.23 KB bytes, 282.0 bytes avg, 0.003 MB/s
[2015-10-09 10:27:44,274][INFO ][node                     ] [Black Panther] stopping ...
[2015-10-09 10:28:01,713][WARN ][river.jdbc.RiverThread   ] interrupted
[2015-10-09 10:29:45,413][WARN ][common.jna               ] Unable to lock JVM memory (ENOMEM). This can result in part of the JVM being swapped out. Increase RLIMIT_MEMLOCK (ulimit).
[2015-10-09 10:30:01,866][INFO ][node                     ] [Vance Astrovik] version[1.4.2], pid[11796], build[927caff/2014-12-16T14:11:12Z]
[2015-10-09 10:30:01,867][INFO ][node                     ] [Vance Astrovik] initializing ...
[2015-10-09 10:30:04,408][INFO ][plugins                  ] [Vance Astrovik] loaded [jdbc-1.4.0.6-664f819], sites []
[2015-10-09 10:32:14,502][INFO ][node                     ] [Vance Astrovik] initialized
[2015-10-09 10:32:14,502][INFO ][node                     ] [Vance Astrovik] starting ...
[2015-10-09 10:32:17,555][INFO ][transport                ] [Vance Astrovik] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/185.13.224.246:9300]}
[2015-10-09 10:32:18,209][INFO ][discovery                ] [Vance Astrovik] elasticsearch/3s3H31MrSCmyEze2t2L_jA
[2015-10-09 10:32:22,383][INFO ][cluster.service          ] [Vance Astrovik] new_master [Vance Astrovik][3s3H31MrSCmyEze2t2L_jA][sensead.localdomain][inet[/185.13.224.246:9300]], reason: zen-disco-join (elected_as_master)
[2015-10-09 10:32:22,864][INFO ][http                     ] [Vance Astrovik] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/185.13.224.246:9200]}
[2015-10-09 10:32:22,864][INFO ][node                     ] [Vance Astrovik] started
[2015-10-09 10:32:34,032][INFO ][gateway                  ] [Vance Astrovik] recovered [3] indices into cluster_state
[2015-10-09 10:32:37,313][DEBUG][action.search.type       ] [Vance Astrovik] All shards failed for phase: [query]
[2015-10-09 10:32:40,213][INFO ][river.jdbc.JDBCRiver     ] scheduled river instance with cron expressions [0 21 * * * ?]
[2015-10-09 10:32:42,423][INFO ][river.jdbc.JDBCRiver     ] scheduled river instance with cron expressions [0 21 * * * ?]
[2015-10-09 10:33:14,578][WARN ][indices.ttl              ] [Vance Astrovik] failed to execute ttl purge

The last line is printed to the log every 60 seconds.

I've been trying to solve this issue for a couple of weeks now. Am I overlooking something simple, or is something bigger going on?
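For reference, some quick checks that show the memory situation on a box like this (standard tools only; the curl call assumes Elasticsearch is listening on its default port 9200):

    # overall memory and swap usage
    free -m

    # the biggest memory consumers right now
    ps aux --sort=-rss | head -n 15

    # heap usage as the Elasticsearch JVM reports it (default port assumed)
    curl 'localhost:9200/_nodes/stats/jvm?pretty'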

  • Without even looking at the logs I can tell you that running ES with 1GB of RAM total on your system is a bad idea. But this `kernel: Out of memory: Kill process 1394 (java)` confirms it. Add more RAM to the system and I'm guessing your problems will go away. – GregL Oct 09 '15 at 14:55
  • "Out of memory" is a pretty big thing to miss. Aside from adding more RAM, you should bring the system up to date. – Michael Hampton Oct 09 '15 at 15:02

2 Answers


Based on this log entry:

Unable to lock JVM memory (ENOMEM). This can result in part of the JVM being swapped out. Increase RLIMIT_MEMLOCK (ulimit).

Apart from upgrading your RAM, you will need to make changes to your Elasticsearch configuration, as mentioned here.
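A minimal sketch of the settings involved, assuming the standard CentOS RPM layout of Elasticsearch 1.x (the file paths and the 256m heap are examples, not universal recommendations):

    # /etc/security/limits.conf – allow the elasticsearch user to lock memory
    elasticsearch soft memlock unlimited
    elasticsearch hard memlock unlimited

    # /etc/sysconfig/elasticsearch – cap the heap well below 1 GB and allow locking
    ES_HEAP_SIZE=256m
    MAX_LOCKED_MEMORY=unlimited

    # /etc/elasticsearch/elasticsearch.yml – lock the heap so it cannot be swapped out
    bootstrap.mlockall: true

    # after a restart, verify that mlockall is actually enabled
    curl 'localhost:9200/_nodes/process?pretty'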

Ajov Crowe

Well, this message makes it crystal clear that the issue you are having is low memory:

         Oct  4 10:47:35   kernel: Out of memory: Kill process 1394 (java) score 107 or sacrifice child

Either add more RAM, or limit the amount of memory your java/httpd instances are allowed to use.
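As a rough sketch of what "limit memory" could look like on this box (values are illustrative, not tuned recommendations):

    # /etc/sysconfig/elasticsearch – give the Elasticsearch JVM a smaller, fixed heap
    ES_HEAP_SIZE=256m

    # /etc/httpd/conf/httpd.conf – Apache 2.2 prefork MPM on CentOS 6;
    # lower the worker cap so dozens of httpd children cannot exhaust 1 GB of RAM
    <IfModule prefork.c>
        StartServers          2
        MinSpareServers       2
        MaxSpareServers       5
        MaxClients           10
        MaxRequestsPerChild 500
    </IfModule>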

ostendali