I was over at my grandparents' place this past weekend. My grandmother pulled out this giant (~1400 page) book of her family history going back to 1630 or so. Giant nerd that I am, I thought it would be slick to have all the information stored in a database and available from the web. I can handle all the web programming and regular expressions and whatnot, but what I don't know is the best way to get the text from book to computer.
I know some kind of OCR will be necessary. From the little research I've done, it seems like my options are:
- take a picture of every page with a camera then process the pictures with OCR software
- use a scanner to scan each page, then process with OCR software
- use some kind of handheld device, like this.
Does anyone have any ideas about the best way to tackle this problem? I don't want to destroy the book, because as far as I know, it can't be replaced. This is probably the only time I'm ever going to scan a large book, so I don't think I want to spend more than $250 on any kind of device. I don't mind some manual effort here (I realize this will most likely take months), but I'd like to find the most efficient method possible.
Note about the book: It's only about 20 years old, so it's in pretty good shape. It's monochrome and the pages haven't begun to yellow. Since it is so large, though, I worry about shadows where the text runs close to the binding.
On a side note, if the book is only 20 years old and the information goes back to the 1600s, where is the original source material? That might be nice to capture as well! – Craig – 2009-09-15T17:34:28.740
Yeah, that would be cool too. I'm going to see if I can track down the original author. – None – 2009-09-15T22:32:19.153