I have a few machines that run tesseract-ocr 4.0 for different applications. The machines have similar configurations (4 cores, 16 GB memory), and all of them run Ubuntu 16.04.5 LTS.
However, in the course of work, at least one of these machines has diverged and is running something that significantly improves tesseract's performance. For one particular page, tesseract takes 7-7.5 s on the other instances but just 3.5-4 s on this one.
Naturally, I want to isolate the reason for this and apply it to all the other instances.
Here is all I've found till now.
1. The storage is the same for all of them, so no SSD/magnetic HDD performance differences.
2. The CPU cores are the same: i5-7400, 3 GHz.
3. The OS version (16.04.5) and kernel version (Linux 4.15.0-47-generic) are the same.
4. These are the tesseract-ocr version and dependent library details:
tesseract 4.1.0-rc1-255-g332a1
leptonica-1.78.0
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.4 : libopenjp2 2.3.0
Short of comparing every package ever installed on every one of these systems, how do I find what is causing the improvement?
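For reference, the brute-force comparison I would like to avoid doing by hand could be scripted roughly as below. This is only a sketch under my own assumptions: each machine's package list has been dumped to a text file with `dpkg-query -W -f='${Package} ${Version}\n'`, and the file names `fast.txt` and `slow.txt` as well as the helper names `parse_package_list` and `diff_packages` are hypothetical. It only reports packages that exist on one machine or differ in version; it does not say which of them matters for tesseract.

```python
import sys


def parse_package_list(text):
    """Parse lines of the form 'name version' into a {name: version} dict."""
    packages = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:  # skip blank or malformed lines
            packages[parts[0]] = parts[1]
    return packages


def diff_packages(fast, slow):
    """Return (only on fast, only on slow, version mismatches)."""
    only_fast = sorted(set(fast) - set(slow))
    only_slow = sorted(set(slow) - set(fast))
    changed = sorted(
        (name, fast[name], slow[name])
        for name in set(fast) & set(slow)
        if fast[name] != slow[name]
    )
    return only_fast, only_slow, changed


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python3 diff_packages.py fast.txt slow.txt
    with open(sys.argv[1]) as f:
        fast_pkgs = parse_package_list(f.read())
    with open(sys.argv[2]) as f:
        slow_pkgs = parse_package_list(f.read())
    only_fast, only_slow, changed = diff_packages(fast_pkgs, slow_pkgs)
    print("Only on fast machine:", only_fast)
    print("Only on slow machine:", only_slow)
    for name, fast_version, slow_version in changed:
        print(f"{name}: fast={fast_version} slow={slow_version}")
```

Even with such a diff, the list of differences could be long, which is why I am hoping for a more targeted approach.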
Any chance there's a difference in the other apps running in the background? – fixer1234 – 2019-04-16T07:34:13.020
Adding to what fixer1234 said, could you run top/htop on each machine and see whether something extra is running that is causing the performance hit? – Randomhero – 2019-04-16T08:54:34.493