How to reduce error rate when scanning pages using sheet feeder?

1

I would like to scan old text documents, and then destroy some of the originals.

Apart from spot-checking, what can I do to get an acceptably low scan failure rate? I would like to get a failure rate below perhaps 0.25% (after spot checks). I count as failures pages that are missed or are not legible.

This seems a difficult target to achieve. What can I do to reduce the rate of failures in the first place, so that I have less checking to do?

Related question (this question is about "QA" i.e. preventing failures, the linked question is about "QC" i.e. detecting failures): How to verify scanned page count and quality when using sheet feeder?

Croad Langshan

Posted 2015-03-29T17:18:23.353

Reputation: 818

Use high quality equipment or scan all documents multiple times (in several batches) – Nifle – 2015-03-29T19:03:16.910

Call me silly but I hadn't thought of scanning multiple times. Why not make it an answer? "Use high quality equipment", on the other hand, is too vague to be useful. – Croad Langshan – 2015-03-29T19:39:31.597

Scanning is very consistent, unless you are talking about things like misfeeds. If a page is successfully fed, the result will be the same each time. You need to deal with the potential causes of illegible results. They won't be an issue for high quality originals. The sources are things like originals that aren't in good physical condition, or where the content is hard to capture due to fading, discoloration, background noise on the paper, content color that doesn't scan well (light colors, but especially blue), etc. Can you describe the originals? Solutions are specific to the problem. – fixer1234 – 2015-03-29T20:41:00.253

In your parallel question "What features are important in a scanner + sheet feeder for old personal documents" you stated "I'm very unlikely to spend more than 500 UK pounds on hardware (and likely significantly less than this)". You might achieve your goals (price and quality-wise) if you have documents that vary only slightly. Otherwise your expectations at less than 500 GBP are overblown. The quality of a scanner (and the outcome of your scanning) is not only determined by the scanner's hardware but very much also by the quality of bundled drivers and software. – user291737 – 2015-03-30T12:33:58.013

@user291737: why is that overblown, when you consider that that error rate is after manual checks (and repeat scans)? This question is about how to reduce the pre-check error rate so as to make the manual checking less onerous, not to achieve 0.25% with no checking. – Croad Langshan – 2015-04-02T00:20:50.287

Because you wrote spot checks, i.e. checks on a random sample of the scanned documents. You don't mention your envisaged sample size, but let's assume that a spot check covers <10% of the total of thousands of pages (as you wrote in your other 2 questions). As you explained in your other question (http://superuser.com/questions/895454/what-features-are-important-in-a-scanner-sheet-feeder-for-old-personal-documen), your documents vary A LOT. On a budget of 500 GBP you get consumer-grade or very basic professional-grade scan equipment (hardware + software) that does not offer the amount of automation you are looking for. – user291737 – 2015-04-03T11:17:40.793

Right. Well, I'm open-minded, and somebody suggested giving each page a quick look by eye: seems quite workable. The documents are mostly fairly boring A4, or close to that -- modified the other question to make that clear (though really I think it would be better if the question were a little less explicit in the details, and answerers used their own expectations about the contents of typical home filing cabinets, since I suspect useful answers can still be given about scanners that handle a relatively broad workload and I'd rather the answers are useful to other people than just me...). – Croad Langshan – 2015-04-03T15:57:55.883

Answers

1

To reduce your error rate with very diverse documents (as you stated in What features are important in a scanner + sheet feeder for old personal documents):

(A) The "simple" answer:

1. Sort your documents into batches of documents with similar characteristics.
2. For each batch, do test scans with varying scanner driver settings. Repeat until you find a set of driver settings that produces scans with your intended failure rate of "below perhaps 0.25%" within the test sample.
3. Use these driver settings to scan the rest of the batch.
4. Do spot checks to verify that your scan results are within your intended failure rate.
5. If you get a higher failure rate, either go back to step 2 and fine-tune your driver settings with a new test sample, or go back to step 1 and divide the batch into smaller batches, each with its own scanner driver settings.
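For step 4, it helps to know how large a spot check has to be before it says anything about a 0.25% target. The following is only a rough sketch, not part of the original answer: it assumes that pages fail independently, that the checked pages are picked at random, and that the check finds no failures at all. The function names `pages_to_check` and `upper_bound_if_clean` are made up for this illustration.

```python
import math


def pages_to_check(max_failure_rate: float, confidence: float = 0.95) -> int:
    """Smallest random sample that, if it contains zero failures, supports
    the claim 'true failure rate <= max_failure_rate' at the given confidence.

    Assumption: pages fail independently and the sample is drawn at random.
    """
    if not 0 < max_failure_rate < 1 or not 0 < confidence < 1:
        raise ValueError("rates must be strictly between 0 and 1")
    # P(no failures in n pages) = (1 - p)^n; require this to be <= 1 - confidence.
    return math.ceil(math.log(1 - confidence) / math.log(1 - max_failure_rate))


def upper_bound_if_clean(sample_size: int, confidence: float = 0.95) -> float:
    """Upper bound on the failure rate after checking sample_size pages
    and finding no failures among them."""
    return 1 - (1 - confidence) ** (1 / sample_size)


if __name__ == "__main__":
    target = 0.0025  # the 0.25% target from the question
    n = pages_to_check(target)
    print(f"Check about {n} randomly chosen pages with zero failures "
          f"to support a failure rate below {target:.2%} at 95% confidence.")
    print(f"A clean check of 100 pages only bounds the rate at "
          f"{upper_bound_if_clean(100):.2%}.")
```

Under these assumptions the required sample is on the order of 1,200 clean pages, while a clean check of 100 pages only bounds the failure rate at about 3%. If a batch is smaller than the required sample, a spot check alone cannot confirm the target for that batch, and checking every page is the fallback.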

(B) With (A) you should be able to reach your intended failure rate for very simple documents, i.e. plain black one-sided print on white, non-folded, non-wrinkled standard-quality paper. If you have many such documents, your batch size can be quite large. But the more attributes a document has (e.g. colored paper, colored print, screen-printed images/graphics, bleed-through on thin paper, low contrast, yellowing, fading on sales slips, damaged paper, …), the more time-consuming your scanning will get on a budget of 500 GBP. You will need to keep the variation in document attributes within a batch as low as possible to reach your failure rate, so your batch size will decrease. Depending on your documents, you might end up checking more or less every other document to stay within your failure rate. If you want OCR for easier document retrieval and you have documents in different languages, this adds another dimension of complexity.

(C) Buy professional software that claims to be capable of processing whatever you throw at your scanner, with no need to sort documents beforehand. But:

1. Such software alone would blow your budget.
2. Such software works only with certified scanners, which eat up your entire budget and are still "hungry" for additional software.

user291737

Posted 2015-03-29T17:18:23.353

Reputation: 155

0

You might have a chance to reach your failure rate of "below perhaps 0.25%" with less time and effort than in my answer above, and within the budget of 500 GBP that you mentioned in your parallel question, as follows:

There are companies that rent out professional scanners, sometimes including a computer with additional professional scan and/or post-processing software. Ask such a company what equipment (scanner and software) it can offer within your budget for a day or two, including an introduction to its use and support on standby. The equipment should allow maximum automation in image processing with a minimum of prior sorting into batches of documents with similar characteristics.

With some luck you might get equipment within your budget that allows you to scan most of your documents in one run, with some additional reruns for special cases, provided you are able to handle such equipment and/or have quick help on standby.

The benefit of this approach: you will see what is possible with scanners and software at a certain price level, and you will be better able to adjust your expectations when you later buy your own document scanner, perhaps on a budget that you revise to above 500 GBP after this experience.

user291737

Posted 2015-03-29T17:18:23.353

Reputation: 155