Elasticsearch and couchdb river indexing slows down after a few hours

Question

I am trying to import ~400M docs into Elasticsearch from CouchDB using the CouchDB river plugin. Everything starts out great, with an indexing rate of around 5k/s, but when I come back after a few hours it has hit the floor at around 20/s. We have the system on a beefy box, a x1.xlarge, and all it is doing is running Elasticsearch. We have a 20-shard index with no replication to help with indexing, and index refreshing is disabled. The heap is set to use 65% of memory, and we are using the latest Java 7 from Oracle.

What settings do I need to tune to help the initial data import? I have played with the bulk timeout and size but still can't find the sweet spot.

Any help would be great. Zuhaib
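
For reference, the setup described above could be written down roughly as follows. This is only a sketch under assumptions: the index name `docs`, the database name `my_db`, the host/port, and the `bulk_size` / `bulk_timeout` values are hypothetical placeholders, not the values actually in use and not a recommendation. The shard count, replica count, and disabled refresh are the ones stated in the question.

```python
import json


def index_settings(shards=20, replicas=0, refresh_interval="-1"):
    """Index-creation body: 20 shards, no replicas, refresh disabled ("-1")."""
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
                "refresh_interval": refresh_interval,
            }
        }
    }


def river_meta(db, index, bulk_size, bulk_timeout,
               host="localhost", port=5984):
    """Body for the river's _meta document.

    db, index, host, port, bulk_size and bulk_timeout are placeholders
    chosen for illustration only.
    """
    return {
        "type": "couchdb",
        "couchdb": {"host": host, "port": port, "db": db},
        "index": {
            "index": index,
            "type": index,
            "bulk_size": str(bulk_size),
            "bulk_timeout": bulk_timeout,
        },
    }


# Hypothetical names and bulk values, for illustration only.
settings_body = index_settings()
river_body = river_meta(db="my_db", index="docs",
                        bulk_size=1000, bulk_timeout="100ms")

print(json.dumps(settings_body, indent=2))
print(json.dumps(river_body, indent=2))
```

The first body would be sent when creating the index and the second when registering the river; the bulk size and timeout in the second are the two knobs I have been varying.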

What is the memory situation? Is the rig swapping to disk? In cases like these I try to find the algorithmic bottleneck somewhere: an exponentially growing data structure or an inefficient indexing/sort/search routine. - Deer Hunter, Mar 08 '13 at 07:49
Memory is good. The box has 15GB of RAM, we set 65% of it as the Java heap, and usage is not even coming close to that max. The box is configured with no swap, and from watching our monitoring graphs it has lots of freeable memory, mostly in buffers or cache. - Zuhaib, Mar 08 '13 at 16:56

0 Answers