Can Acrobat 11 be made to do OCR using multiple CPU cores?

8

4

OCR processing takes time. Using multiple CPU cores would speed up processing. Acrobat 10 was not a multithreaded application. How about Acrobat 11? Does 11 by default do OCR using multiple CPU cores (if available)? If not, are there any workarounds, e.g. scripting, to help make Acrobat 11 do OCR using multiple CPU cores? This could be either through Acrobat's built-in scripting language or through external scripts that launch and direct multiple single-threaded instances of Acrobat to process parts of the job in parallel.

Note: This question is not too localized (not limited to a specific moment in time) because (1) Adobe does not release new major Acrobat versions very often (Acrobat 10 was released two years ago) and (2) Adobe Acrobat is a widely used application.

tarcman.

Posted 2012-10-26T23:38:59.543

Reputation: 141

Answers

6

I have installed the Acrobat 11 (XI) trial in VirtualBox. Acrobat 11 is single threaded.

I have also made an external script that starts multiple Acrobat instances (one per CPU core), processes the OCR job in parallel and merges the results. A crucial step is to turn on error logging in Acrobat's preferences, parse all the .log files and reprocess any files that reported errors. Even with that overhead, the script (when using 4 cores) does OCR more than twice as fast as Acrobat 11 does by default.
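The split, run in parallel, then reprocess failures approach can be sketched roughly as follows. This is not the script described above, only a minimal illustration of the control flow. The names `split_pages`, `run_parallel` and `ocr_chunk` are hypothetical, and `ocr_chunk` stands in for whatever actually launches one Acrobat instance on a page range and checks its error log, which is not shown here because the exact invocation is not given in this thread.

```python
from concurrent.futures import ThreadPoolExecutor


def split_pages(page_count, workers):
    """Split pages 1..page_count into at most `workers` contiguous (first, last) ranges."""
    if page_count <= 0:
        return []
    workers = max(1, min(workers, page_count))
    base, extra = divmod(page_count, workers)
    ranges, start = [], 1
    for i in range(workers):
        # The first `extra` ranges get one additional page each.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def run_parallel(ranges, ocr_chunk, workers, max_retries=2):
    """Run ocr_chunk(first, last) on every range concurrently and re-run failures.

    `ocr_chunk` is a hypothetical callable that returns True on success.
    Returns (completed_ranges, still_failed_ranges).
    """
    pending, done = list(ranges), []
    for _ in range(max_retries + 1):
        if not pending:
            break
        # Threads are enough here on the assumption that each call mostly
        # waits on an external process rather than computing in Python.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            results = list(pool.map(lambda r: ocr_chunk(*r), pending))
        done += [r for r, ok in zip(pending, results) if ok]
        pending = [r for r, ok in zip(pending, results) if not ok]
    return sorted(done), pending
```

With `os.cpu_count()` as the worker count, `run_parallel(split_pages(n, os.cpu_count()), ocr_chunk, os.cpu_count())` would give one range per core. Merging the per-range output files back into one PDF is a separate step that is left out of this sketch.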

tarcman.

Posted 2012-10-26T23:38:59.543

Reputation: 141

1@tarcman Any possibility of posting your script? I am sure there are a lot of people who would be interested in using it – Jason – 2015-05-21T17:04:39.273

4You can just give the source if you want. If they dare remove it, it can be restored easily. – Joey – 2012-10-29T08:07:55.587

Also, if you happen to be the same person who posted the question, consider merging both of your current unregistered accounts to a new, registered one. You can start here, and also read this for more information. After that you'll be able to amend your question as you see fit. – Indrek – 2012-10-29T09:41:08.890

I'm not trying to obstruct anything. Because you keep switching user names, it appeared that your edits were by a third party who didn't seem to understand the original question. Also, if you want to answer your own question, you should write the question and answer all at once. – Isaac Rabinovitch – 2012-10-29T22:12:35.277

I've merged your (unregistered) accounts for now. We would however ask you not to use a disposable e-mail address, but register on our site so you can stay logged in, comment on your questions, et cetera. Also, nothing will be removed, nothing to worry about. Just note that anonymous edits are always reviewed more strictly. – slhck – 2012-10-29T22:13:31.843

@IsaacRabinovitch, maybe the OP wasn't aware of an answer right after posting? Also, new users cannot answer their questions within 8 hours after posting unless they have a certain level of reputation. – slhck – 2012-10-29T22:15:23.557

1

Multithreading needs to be built into an application. The developer has to write code that creates threads and breaks the task down into subtasks that can be allocated to each thread. If the developers of Acrobat did not do this for their OCR code, there's no way for the user to add the extra logic needed.

Isaac Rabinovitch

Posted 2012-10-26T23:38:59.543

Reputation: 2 645

2If it can be applied to ranges of pages you could probably try to divide the work into multiple processes, each OCRing just a few pages and afterwards merging the results back together. – Joey – 2012-10-28T12:11:11.127

0

To use all cores for OCR you may want to look at PDF-XChange Editor. Its OCR engine appears to use all cores on my system. Once you get to this level of performance, though, it makes sense to use an SSD.

There must be a Windows tweak that will cause it to dedicate more CPU time to a single-threaded application that is not I/O bound. On my system Acrobat is not being slowed by disk performance, but the most CPU time I get while building an index is about 30%.

Let's face it, Acrobat is a widely used but poorly written application. Acrobat Pro has some features you still can't get anywhere else (yet).

Len

Posted 2012-10-26T23:38:59.543

Reputation: 1