
I am using ManifoldCF to index documents into Solr. Solr is configured in cloud mode, with one node/core and an external ZooKeeper (on the same machine as the one running Solr). ManifoldCF reads the files (about 2300, total size 2.4 GB) from a local HDD. Both systems run in VMs, using SUSE Enterprise and the HotSpot JVM. The machine running Solr has been given 2.5 GB of RAM, of which Solr is allowed to use up to 2 GB. The other machine, running ManifoldCF, is currently set to 8 GB. I am invoking ManifoldCF with the following command (as root):

java -Xmx7168m -jar manifoldcf/example/start.jar

The indexing process works flawlessly until it comes to an abrupt halt: ManifoldCF runs out of memory, starts throwing OutOfMemoryErrors, and crashes.

I haven't changed anything in the ManifoldCF configuration. The only change I made was setting the Tika parser used in Solr to ignore exceptions, as those would interrupt the indexing process whenever documents with unknown or unexpected formatting were scanned.

I have already tried using OpenJDK as well as switching to Ubuntu, neither of which changed anything. Giving the VM more or less memory (and adjusting the Java memory parameter accordingly) has also led to the same problem. I also had a look at the garbage collection (using -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+HeapDumpOnOutOfMemoryError -Xloggc:/root/Documents/gc.log); the results are available here. With far fewer documents, the indexing process completes, but memory usage stays high and climbs even further when indexing a second batch of documents, resulting, again, in a memory-related crash (console output shows this error message before the application exits a few moments later).
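
To make the "memory usage stays high" observation more concrete, here is a minimal sketch of how heap usage can be sampled from inside a JVM with the standard `java.lang.management` API. It is only an illustration: the class `HeapProbe` and its method names are hypothetical and not part of ManifoldCF, and it reports on the JVM it runs in, so it would have to be loaded into the ManifoldCF process to say anything about that process. Note also that `System.gc()` is only a request to the JVM, so the "after GC" figure is an approximation of the retained heap.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Hypothetical helper (not part of ManifoldCF): samples heap usage of the current JVM. */
class HeapProbe {

    private static final long MB = 1024L * 1024L;

    /** Heap currently in use by this JVM, in megabytes. */
    static long usedHeapMb() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed() / MB;
    }

    /** Maximum heap (roughly the -Xmx value) in megabytes, or -1 if undefined. */
    static long maxHeapMb() {
        long max = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getMax();
        return max < 0 ? -1 : max / MB;
    }

    /** Formats one sample as a log line. */
    static String formatSample(long usedMb, long maxMb) {
        return String.format("heap used: %d MB of %d MB", usedMb, maxMb);
    }

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 3; i++) {
            // Request a collection first so the sample approximates retained
            // (live) heap rather than garbage awaiting collection.
            System.gc();
            System.out.println(formatSample(usedHeapMb(), maxHeapMb()));
            Thread.sleep(200);
        }
    }
}
```

If the value sampled after a collection keeps rising from one batch to the next, that would point to objects being retained rather than to a heap that is simply sized too small.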

The VMs are running on a host with 16 GB RAM and a 3.6 GHz quad-core CPU with Hyper-Threading (i7-4790). Both VMs are allowed to use all 4 cores; CPU load ranges from rather low to medium.

Now my question: Is this a bug in ManifoldCF or Solr, or is it related to a certain aspect of the setup or configuration? If it is my fault, what would be the appropriate way to fix this? (If this is in fact a bug or a problem I can't fix, suggestions for alternatives to ManifoldCF, apart from the Simple Post Tool of course, are also appreciated.)

lgoenner
