How can I extract contents from scanned files?

I've used Preview and Automator before to extract text from PDF documents, but they do not work on scanned ones. How can I extract the contents from scanned files with the formatting preserved? I do not want to pay for Adobe.

Rosa Reyes

Posted 2017-02-27T02:17:07.210

Reputation: 11

Answers

By "scanned", I presume you mean the document contains only images of text, rather than the text characters. In that case, use optical character recognition (OCR) software.

For Windows, there are FreeOCR, a9t9 and others. OCR software is also available for Android, Linux and Mac, and there are browser-based online services as well.
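
As an illustration of what running OCR from a script can look like, here is a minimal Python sketch. It assumes the open-source Tesseract command-line tool, which is not one of the products named above, and the helper names `build_tesseract_command` and `ocr_image` are made up for this example. Tesseract reads image files, so a scanned PDF would first have to be exported to images such as PNG or TIFF. The `pdf` output mode writes a searchable PDF that keeps the page image, and `hocr` writes HTML with position data; how well the layout survives depends on the scan.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="eng", output_format="pdf"):
    """Return the argument list for a Tesseract CLI call (hypothetical helper).

    output_format is a Tesseract config name: "pdf" for a searchable PDF,
    "hocr" for HTML with position data, "txt" for plain text.
    """
    return ["tesseract", str(image_path), str(output_base), "-l", lang, output_format]


def ocr_image(image_path, output_base, lang="eng", output_format="pdf"):
    """Run Tesseract if it is installed; return the expected output path, else None."""
    if shutil.which("tesseract") is None:
        return None  # Tesseract is not on PATH, so there is nothing to run
    cmd = build_tesseract_command(image_path, output_base, lang, output_format)
    subprocess.run(cmd, check=True)  # raises CalledProcessError if OCR fails
    # Tesseract appends the format name as the file extension.
    return Path(f"{output_base}.{output_format}")
```

For example, `ocr_image("page1.png", "page1")` would be expected to produce `page1.pdf` in the current directory, provided Tesseract and its English language data are installed.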

DrMoishe Pippik

Posted 2017-02-27T02:17:07.210

Reputation: 13 291

Yes, I tried this one, but I'm not that satisfied with the results: the formatting is a little messy, with too many gaps between words and sentences. – Rosa Reyes – 2017-03-01T09:13:45.227
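
Extra gaps of this kind can often be cleaned up after the fact. The following is a minimal Python sketch, not tied to any particular OCR product; the function name `tidy_ocr_text` is made up for this example. It assumes the OCR result is plain text and that the spacing carries no meaning, so it would not suit tables or multi-column layouts.

```python
import re


def tidy_ocr_text(text):
    """Collapse the extra gaps that OCR output often contains."""
    lines = []
    for line in text.splitlines():
        # Squeeze runs of spaces and tabs into one space and trim the ends.
        line = re.sub(r"[ \t]+", " ", line).strip()
        # Remove a stray space before punctuation, as in "word ."
        line = re.sub(r" ([,.;:!?])", r"\1", line)
        lines.append(line)
    cleaned = "\n".join(lines)
    # Keep at most one blank line between paragraphs.
    return re.sub(r"\n{3,}", "\n\n", cleaned).strip()
```

Paragraph breaks are kept as single blank lines, so the overall structure of the text is preserved while the stray spacing is removed.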

I found that Google OCR solved my problem well. – Rosa Reyes – 2017-07-19T01:58:45.177

As has already been said, your scanned documents are images (of text). To recover the text, you will need to run OCR (optical character recognition) on the document.

There are several OCR products available for Mac, and your scanner may have come with one. However, preserving formatting is a fairly sophisticated feature that basic products do not offer, so you should expect to pay for OCR software that does it. From that point of view, you might reconsider Acrobat.

Max Wyss

Posted 2017-02-27T02:17:07.210

Reputation: 1 481

Adobe costs money. Are there any other alternatives? – Rosa Reyes – 2017-03-01T09:12:11.577

@RosaReyes: … and? You want quite sophisticated functionality. Keep in mind, you get what you pay… – Max Wyss – 2017-03-02T09:56:38.700