Google Mini Search Optimization for PDFs

1

I have it working, per se, but maybe I have chosen the wrong tool for the job.

Basically we have electronic copies of numerous books related to our industry.

What I wanted to do was create a searchable index of those books.

Unfortunately, many of the books are larger than the 30MB file size indexing limit, so they don't even get indexed. (I think there is a configuration to change this?)

Those that do get indexed I can search for and find, and the results link directly to them. But upon clicking a link, the entire PDF is downloaded and displayed starting at page one instead of at the page where the search terms were found.

Any suggestions or advice on how to approach this project? I am completely open...

I think the first question is "should I even bother trying to adapt the materials / Google Mini to work in this scenario?" and if so, "which approach is best?"

Earls

Posted 2011-06-16T17:35:57.553

Reputation: 218

Answers

0

My solution was to split the PDFs into individual pages. This works for me because I'm searching and serving reference materials, for example, a dictionary.

If the user wants to know the definition of "apple", then searching for "apple" returns the single PDF page on which the word and its definition appear. That's all the user wants to know.

This wouldn't work so well if a passage spanned multiple pages - though as long as you keep your PDFs under 2.5MB, you can "package" any number of pages into a single PDF.
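The "packaging" idea can be sketched as a small planning step, shown here in Python. This is only an illustration under stated assumptions: `plan_bundles` and `LIMIT_BYTES` are hypothetical names, the per-page sizes are assumed to have been measured already (for example, after splitting each page into its own file), and the sum of page sizes is only an estimate of the size of the merged PDF. The actual splitting and merging would be done with whatever PDF tool you prefer.

```python
from typing import List

# Hypothetical constant: the 2.5MB ceiling mentioned above, in bytes.
LIMIT_BYTES = int(2.5 * 1024 * 1024)


def plan_bundles(page_sizes: List[int], limit: int = LIMIT_BYTES) -> List[List[int]]:
    """Group consecutive 1-based page numbers so that the estimated
    size of each group stays within `limit`.

    A single page that is larger than `limit` still gets a bundle of
    its own, so that bundle will exceed the limit.
    """
    bundles: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for page_number, size in enumerate(page_sizes, start=1):
        # Close the current bundle if adding this page would overflow it.
        if current and current_size + size > limit:
            bundles.append(current)
            current, current_size = [], 0
        current.append(page_number)
        current_size += size
    if current:
        bundles.append(current)
    return bundles
```

For example, `plan_bundles([1, 2, 1], limit=3)` groups pages 1 and 2 together and leaves page 3 in a bundle of its own.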

Earls

Posted 2011-06-16T17:35:57.553

Reputation: 218

1

I would probably stay away from the Google Mini approach to this and look into getting some content management software online. I'm personally fond of the Alfresco community edition. It might be a little too high end for your application though.

ErnieTheGeek

Posted 2011-06-16T17:35:57.553

Reputation: 429

0

Not sure if this answer helps you, but here goes:

Acrobat Pro as well as Acrobat Reader (even on Linux) offer some "PDF open command-line parameters". These control exactly how the document is opened (which page, which zoom level, etc).

One of the things supported is to open a PDF with the search dialog open and the matching search words already clickable. Examples:

Acrobat and Acrobat Professional on Windows:

 acrobat.exe ^
   /a #search="superuser basketball supermodels" ^
   "d:\path\to\example.pdf"

Acrobat Reader on Windows:

 acrord32.exe ^
   /a #search="PDF computing searching" ^
   "d:\path\to\example.pdf"

Acrobat Reader on Linux:

 acroread \
   /a "#search=stackexchange football girls" \
   "/path/to/example.pdf"

On the Adobe website, search for "PDF Open Parameters" to locate the PDF manual describing all the details of this functionality.
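The same open parameters can also be appended to a PDF's URL as a fragment, which may be closer to what a search results page needs. Below is a minimal Python sketch of building such a link. `pdf_link` is a hypothetical helper, the example URL is made up, and whether the fragment is honored (and whether the percent-encoded search phrase is decoded) depends on the PDF viewer or browser plug-in, so treat this as something to test rather than a guarantee.

```python
from typing import Optional
from urllib.parse import quote


def pdf_link(url: str, page: Optional[int] = None, search: Optional[str] = None) -> str:
    """Append PDF open parameters to `url` as a fragment.

    `page` is the 1-based page to open at; `search` is a space-separated
    word list, wrapped in quotation marks and percent-encoded here.
    """
    params = []
    if page is not None:
        params.append(f"page={page}")
    if search:
        # Assumption: the viewer decodes %22 and %20 back to quotes and spaces.
        params.append("search=" + quote(f'"{search}"'))
    if not params:
        return url
    return url + "#" + "&".join(params)
```

With the hypothetical URL `http://example.com/dictionary.pdf`, calling `pdf_link(url, page=12)` yields `http://example.com/dictionary.pdf#page=12`.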

Kurt Pfeifle

Posted 2011-06-16T17:35:57.553

Reputation: 10 024