What features are important in a scanner + sheet feeder for old personal documents

0

I would like to scan some old text documents. My purpose is twofold: disaster recovery (e.g. fire), and to save space on bulky documents I rarely refer to (e.g. old phone bills).

After scanning I intend to destroy some of the originals, where I rarely refer to them and they are bulky. The rest I will keep and continue referring to. I do not intend to OCR the documents.

I estimate there are a few thousand sides of A4 to scan, and I am aiming for only a few failures (missed or illegible sides) per 1000 sides scanned. By illegible I mean text that a human cannot read reliably.

I would like to do this myself rather than using a commercial service.

I believe the documents are fairly typical of what home users will have collected in their filing cabinets over the past say 10 or 20 years:

  • Mostly (perhaps 80%) standard paper size or close to standard size (A4, would be US letter elsewhere presumably)
  • Some bills that are longer than A4 (less than 10%)
  • A small number of "very miscellaneous" pages (less than 10%)
  • Mostly relatively flat good quality paper
  • The documents are printed on various papers since they include bills, receipts, letters, etc.
  • Many but not all documents are printed on both sides
  • A mixture of colour and black-and-white-only pages. Most of the documents do not use colour in an important way
  • A minority of pages with some graphics and pictures, etc. (perhaps 5 or 10%)
  • A minority of yellowed pages (less than 5%)

I would like to scan in colour because I do not want to verify that all of the colour information is unimportant. I will exclude large format documents (e.g. A3), but I would ideally like to scan bills that are longer than A4.

I don't mind scanning the "awkward cases" sheet-by-sheet but would like to save time using a sheet feeder where possible. However I anticipate that a high-end professional scanner isn't really called for. Also, as long as documents are still human-legible, damage to the paper is not very important.

Aside from dpi, what features in a scanner and sheet feeder are important for a job like this? By "features" I mean specific technical features (or performance characteristics) of the design, rather than broad categories like "reliability".

I am not looking for product recommendations. I would like to know what features are relevant for this scale of application.

Croad Langshan

Posted 2015-03-29T17:49:33.797

Reputation: 818

Question was closed 2015-04-12T20:01:48.137

You mean a scanner device? – TechLife – 2015-03-29T17:57:21.133

@fixer1234 I'm not looking for product recommendations (apart from off-topic this would be impractical since there are too many models and too varied availability). How is it an odd question? I'm completely unfamiliar with scanners and sheet feeders, I know that mechanical designs etc. vary, and would like to know what features are relevant for this scale of application. I don't consider price to be a feature exactly, but of course that constrains the relevant set of devices. – Croad Langshan – 2015-03-29T18:06:50.420

@TechLife: yes, a scanner to me is a kind of device (software would be "scanning software"). – Croad Langshan – 2015-03-29T18:08:39.773

There are relatively inexpensive, consumer-grade sheet-fed scanners and commercial-grade scanners for high-volume work. Huge difference in cost and size. Will you need these requirements after the job is done? How much is your time worth and how much do you have? The main difference you would see would be speed and better paper feeding. The output quality would be comparable (you wouldn't know after-the-fact which machine they were scanned on). It's really a question of comparing features of what is available, weighing what's important to you, and investigating owner satisfaction. – fixer1234 – 2015-03-29T18:30:05.477

The scope is limited to what's in the question: I won't be doing other big jobs after this one. To give an idea, I'm very unlikely to spend more than 500 UK pounds on hardware (and likely significantly less than this). I did expect speed and paper feeding to be the areas that separate different devices, and I am definitely interested in those, since they will likely determine whether the project is practical. – Croad Langshan – 2015-03-29T18:43:01.420

1000 sides is two reams of paper. There are commercial scanners, probably over your price range, that would handle that in a few batches and scanning would be completed in under 20 minutes. They would also be better at handling non-pristine pages. Inexpensive consumer scanners might require on the order of 100 batches, plus more re-feeding if the originals are not in good shape. Even so, scanning might take only a few hours in total, although you might need to stretch it out to not exceed the scanner's duty cycle, and do some feed roller cleaning during the job. – fixer1234 – 2015-03-29T19:05:52.227

For once, I'll give in to the urge to recommend a specific product. I know this is (for very good reasons) off topic, but still I'd like to mention that I used Fuji's ScanSnap iX500 to scan 1000's of pages for similar goals. Price, quality, speed, size are well balanced. I'd happily recommend it. Ps: besides that I own one Fuji product, I have no interest, intentions or gain with this recommendation. Just wanted to share my positive experience. – agtoever – 2015-03-29T21:06:35.520

Your question is very broad. There are many aspects to consider. To get more specific answers, you will need to be much more specific about your documents, e.g. are they on standard printer paper or on very thin paper? Are they printed on both sides or only on one side? Are they in color or black and white only? Do they contain graphics or pictures, etc.? Is the paper yellowed? Are there some smaller formats in between? Do some pages have other paper glued on, e.g. as accountants do with sales receipts? – user291737 – 2015-03-30T12:20:34.923

What do you mean by "illegible sides"? Do you want to read them with your eyes or with optical character recognition? That's quite a difference. Our eyes are capable of reading low-quality scans where OCR fails completely. – user291737 – 2015-03-30T12:48:47.680

If the quality of your documents varies a lot and you want a "failure rate below perhaps 0.25%", you need a professional scanner (hardware + driver + software). To get all this for 500 GBP you need to consider buying a used professional scanner. – user291737 – 2015-03-30T13:24:07.317

It is false that answers to this question will tend to be "almost entirely based on opinions": see user291737's answer. I have responded to that user's comments by editing the question. – Croad Langshan – 2015-04-02T00:15:46.447

Don't forget that you lose information when you switch to an electronic format! Personal documents have characteristics that help to find them later, e.g. storage type (folders, boxes, file racks, drawers in all kinds of materials/colors), location (shelves, cabinets in different rooms or even outside your house/flat), format size, and so on. Usually you know more or less where to look. In electronic form all these visual cues are lost! All folders have the same color and shape, and all documents share the same few icons. You should not underestimate this. – user291737 – 2015-04-03T14:18:52.617

@user291737 Thanks, I do appreciate that. My intent is 1. to get rid of old boring documents like old phone bills and 2. to help with disaster recovery (fire etc.). I don't intend to get rid of all of the old documents, mostly for the reasons you cite. – Croad Langshan – 2015-04-03T14:24:57.537

What I wanted to indicate with my remark above: you might need to consider beforehand how to replace the missing cues with other ones (e.g. OCR) in order to find documents later among your thousands of scans. And the decision about OCR has an impact on your scanning equipment. Aiming for human-readable scans only will cost you a lot of time finding your documents later. – user291737 – 2015-04-03T14:36:14.427

@user291737: Edited question to emphasize disaster recovery and getting rid of boring bulky documents (I should have done that to start with...) – Croad Langshan – 2015-04-03T15:39:42.747

I know that you are not looking for product recommendations but in the end information that shortens your decision making process might be helpful. PC Magazine reviews scanners on a regular basis (www.pcmag.com/reviews/scanners). It offers a relatively wide overview of scanners and their pros and cons in comparison. (I am not linked to PC Mag) – user291737 – 2015-04-03T16:29:51.957

There's a system occasionally advertised on TV designed for this kind of application (http://www.tryneat.com/site/tryneat/home.html; probably available from places like Amazon). It's a sheetfed scanner with feeders for different sized documents. It's optimized for these kinds of documents (and does double-sided scanning). However, it also does OCR as part of the process and the software does automated filing of the results. If you simply scan a couple of thousand sheets, you will never find a specific one if you actually need it. I've never used it, but it looks great on TV. – fixer1234 – 2015-04-03T18:12:30.223

One other thought: you might be scanning way more than you need to. At least in the US, most things like bills and receipts have no purpose after varying times, but generally 5 yrs is the upper limit on those. Various legal documents should be retained longer, you might want to keep things like medical records, etc. Research document retention standards where you live. If your documents are 10-20 years old, it could be that a shredder would be more useful than a scanner. – fixer1234 – 2015-04-03T18:19:23.590

+1 to fixer1234. Your shredder and paper bin are your best friends in saving a lot of time! – user291737 – 2015-04-03T22:12:07.417

@fixer1234: thought takes time too :-) – Croad Langshan – 2015-04-05T19:11:03.507

Answers

1

If your pages (or some of them) were folded or are wrinkled (e.g. paper that dried after exposure to water or high humidity), it is better to choose a scanner with a CCD sensor instead of CIS. CCD elements have a much greater depth of field than CIS. Scanning such paper with a CIS scanner will result in blurred areas on your scan. OCR often fails in blurred areas. You might sharpen such areas with settings in the driver or with software, but this might still not be enough to get reliable OCR. With a CCD scanner you avoid the problem in the first place.

Regarding pages longer than A4: probably all sheet-fed scanners at your price point support that. It's usually a setting in the scanner driver that switches off multi-page feed detection by length.
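A minimal sketch of what length-based multi-feed detection might look like, to show why a long bill trips it unless the check is switched off. The function name, the 297 mm default and the 10 mm tolerance are all assumptions made up for illustration; real drivers do not expose their logic like this.

```python
def looks_like_double_feed(measured_mm, expected_mm=297.0, tolerance_mm=10.0):
    """Flag a sheet as a suspected double feed if it is longer than expected.

    Hypothetical logic: two overlapping sheets pulled through together
    look like one over-long sheet to a length sensor.
    """
    return measured_mm > expected_mm + tolerance_mm


# A normal A4 sheet passes; a 450 mm bill looks like two overlapping sheets.
print(looks_like_double_feed(297.0))
print(looks_like_double_feed(450.0))
```

With a check like this enabled, every over-length bill would stop the feeder, which is why the setting has to be turned off for those pages.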

Comparing scanners by advertised speed (pages/images per minute) can be very misleading. Some manufacturers state it at 150 dpi, others at 200 or 300 dpi. Speed depends very much on the scanner driver settings you choose. Example: if you scan a newspaper/magazine article with (screen-printed) pictures/graphics at 300 dpi and aim for a small document size, you need to choose the descreen function in the driver. This will cause your scanner to slow down considerably. Although you set 300 dpi for such a scan, the speed will be comparable to a scan at about 600 dpi (remember that we are talking about rather inexpensive document scanners for only 500 GBP).
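To get a feel for why the dpi behind an advertised speed matters, here is a rough sketch of how much raw image data one A4 side produces at different resolutions. It assumes A4 at 210 × 297 mm and uncompressed 24-bit colour; the function name is made up for illustration, and raw data volume is only one of several factors that affect real scan speed.

```python
MM_PER_INCH = 25.4


def raw_page_megabytes(dpi, width_mm=210.0, height_mm=297.0, bytes_per_pixel=3):
    """Approximate uncompressed size of one scanned side in megabytes."""
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px * height_px * bytes_per_pixel / 1_000_000


for dpi in (150, 200, 300, 600):
    print(f"{dpi} dpi: about {raw_page_megabytes(dpi):.0f} MB per side")
```

Doubling the resolution roughly quadruples the pixel count, so a speed quoted at 150 dpi says little about throughput at 300 dpi.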

Choose a scanner with LEDs as the light source instead of cold cathode discharge lamps, which are an older kind of lighting. LEDs have a longer lifespan and do not need warm-up time.

user291737

Posted 2015-03-29T17:49:33.797

Reputation: 155

0

As for any job of that importance, I would say that the reliability of the product / company is of importance. (The specs don't matter if the quality of the scan will be low, or the feeder breaks.) Also, I assume (although I might be wrong, of course) that all scanners today will have high enough dpi and will be able to output to the usual file types (jpeg for lower file size, png for higher quality, etc.)

However, I'd recommend taking a moment to consider whether digital preservation is reliable enough. E.g.

  • Are we sure that a DVD, HDD, or flash drive will retain its data for many years (assuming you want this for many years)?
  • Are we sure that we'll be able to read the files a decade from now? (Think file type, and hardware type. - how would you read information from a floppy disk today?!)

See Digital Preservation on Wikipedia. And this answer on this site.

ispiro

Posted 2015-03-29T17:49:33.797

Reputation: 1 259

Though it is good advice, I think the part of this answer that deals with preservation reliability doesn't belong here, except maybe in a comment. – Croad Langshan – 2015-03-29T18:44:54.790

@CroadLangshan The answer (however good or not, that's a separate issue) is in the first paragraph. However, if I'm already answering, I think some advice which I see as important (though admittedly, not worthy of an answer by itself) can be added after it. But as I said - I agree, the "answer" itself is only in the first paragraph. – ispiro – 2015-03-29T18:49:59.123

Thanks for the answer. I don't consider "reliability" to be a feature, because it is too broad an answer to be useful for my purposes. What I'm looking for is something more specific, such as technical features. For example, "consider getting one with thunking sprockets" (I made up that feature :-). I've edited the question to explicitly exclude dpi, since everybody seems to agree that feature is not interesting because high dpi is so widespread. – Croad Langshan – 2015-03-29T19:28:33.977

-1

Assuming that you intend to continue scanning incoming documents on a regular basis (if you only plan to scan old ones, you would be better off having it done by a scanning service anyway):

Scan profiles, which some scanner manufacturers call scan presets, will make your work much easier and faster. With a profile/preset you save a combination of scanner driver settings for later reuse. Example: Profile A for plain black print on standard white paper, B for colored magazine articles, C for sales slips of different sizes (e.g. auto-crop to original size instead of scanning small slips at a standardized page size), D for thin paper with print on both sides (driver settings such as see-through or bleed-through prevention), E for documents with extra length, etc.
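As a rough illustration of what a profile stores, profiles A to E above could be written down as plain data. This is only a sketch: the `ScanProfile` class and every field name are hypothetical and do not correspond to any particular driver or scan utility.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanProfile:
    """A named bundle of driver settings (all field names are hypothetical)."""
    name: str
    colour: bool = True
    dpi: int = 300
    duplex: bool = False
    auto_crop: bool = False
    bleed_through_reduction: bool = False
    long_page: bool = False


PROFILES = {
    "A": ScanProfile("plain black print on white paper", colour=False),
    "B": ScanProfile("coloured magazine articles"),
    "C": ScanProfile("sales slips of different sizes", auto_crop=True),
    "D": ScanProfile("thin double-sided paper", duplex=True,
                     bleed_through_reduction=True),
    "E": ScanProfile("extra-long documents", long_page=True),
}

for key, profile in PROFILES.items():
    print(key, profile)
```

Writing the profiles down like this, even on paper, also helps you remember later what each numbered slot on the scanner actually does.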

Considering the documents you mentioned, you will probably get to the point where you need more than 9 scan profiles. Many ADF scanners offer just 9 profiles, some even fewer. Some manufacturers implement scan profiles in the driver, others in "scan utility" software. Some offer hardware buttons to choose among profiles. Many models with hardware buttons and a display just show the profile number without additional text. Will you later remember what profile 3 does? A few scanners have a display that shows text as well, so you can give your profiles descriptive names. And more than 9 profiles? That is often implemented in software, but such demands quickly take you beyond consumer-grade hardware/software.

I recommend buying a scanner where auto-crop is already supported in the driver. If you have to crop your scans with additional software, you have to live with a lot of compromises, so do not count on adding this feature with additional software at a later stage. Reliable auto-crop is very hard to implement at the software level alone (and requires quite some CPU power). Even if consumer-level third-party software claims to support auto-crop, you will get a lot of wrong results (from cropping too little, to cropping too much, to even cropping completely at random: there is consumer and semi-professional software for around 200 USD that cropped completely at random in my tests).
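To illustrate one way a purely software crop can go wrong, here is a deliberately naive sketch. It assumes a greyscale image stored as a list of rows, with a dark scanner background (0) and light paper (255); the function names are made up, and real auto-crop implementations are far more sophisticated than this.

```python
def paper_bounds(pixels, threshold=128):
    """Return (top, left, bottom, right) of all bright pixels, or None."""
    rows = [r for r, row in enumerate(pixels)
            if any(value >= threshold for value in row)]
    cols = [c for c in range(len(pixels[0]))
            if any(row[c] >= threshold for row in pixels)]
    if not rows:
        return None
    return rows[0], cols[0], rows[-1], cols[-1]


def make_scan(height, width, paper):
    """Build a dark image with one bright rectangle standing in for the paper."""
    top, left, bottom, right = paper
    return [[255 if top <= r <= bottom and left <= c <= right else 0
             for c in range(width)]
            for r in range(height)]


clean = make_scan(6, 8, (1, 2, 3, 5))
dusty = [row[:] for row in clean]
dusty[5][7] = 255  # a single bright speck, e.g. dust on the background

print(paper_bounds(clean))
print(paper_bounds(dusty))
```

A single stray bright pixel is enough to stretch the detected area well past the paper, which is the "cropped too little" failure; guarding against that without cutting into real content is where the difficulty lies.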

Why did I not limit my answer to hardware? Because buying a scanner is not like buying a printer, as those who have not used a document scanner before might think. The print dialogue is more or less standardized, and variations are quite limited across the many printer manufacturers and models we use for our general printing needs. WIA drivers (Windows) for scanners are similarly standardized, but they give you only a fraction of your scanner's capabilities. TWAIN drivers are a completely different story. If you have no prior experience with scanner drivers and image processing, the time necessary for understanding and using your scanner's driver and scan utility software to its full potential can vary a lot depending on the manufacturer and even on the specific model. And even after you have understood one model, you might be so lost with another one that you want to throw it out of your window.

Once you have bought your scanner, you are stuck with its driver(s) and scan utility software, assuming you are not prepared to go beyond your budget with additional third-party software and are not willing or able to patch your workflow with scripts or to go through process steps manually with a number of free or open source tools. If you are willing to pay extra for additional image processing capabilities, more scan profiles, or more automation (file naming, distributing files to specific folders, etc.), it gets expensive quickly, because you enter a market that is focused on larger companies and is only slowly moving towards small companies with limited IT resources. Your scanning needs overlap with the needs of many small companies or SOHOs.
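If you do go the scripting route, the file naming and folder distribution mentioned above can be quite simple. The following is only a sketch under assumed conventions: the `archive_path` function, the `scans` root folder and the year/category layout are all made up for illustration, and it only computes a target path without touching any files.

```python
from datetime import date
from pathlib import PurePosixPath


def archive_path(category, doc_date, sequence, root="scans"):
    """Build a target path like scans/<year>/<category>/<date>_<category>_<nnnn>.pdf."""
    safe_category = "-".join(category.lower().split())
    filename = f"{doc_date.isoformat()}_{safe_category}_{sequence:04d}.pdf"
    return PurePosixPath(root) / str(doc_date.year) / safe_category / filename


print(archive_path("Phone Bill", date(2009, 3, 14), 7))
```

Putting the date and category into both the folder and the file name keeps scans findable by browsing alone, even without OCR.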

user291737

Posted 2015-03-29T17:49:33.797

Reputation: 155

Why the downgrade of the answer? – user291737 – 2015-04-04T17:26:12.740

To all readers: Any feedback about the (possible) reason of the downgrade could be helpful to improve the answer. – user291737 – 2015-04-05T21:46:11.947