How can I extract text (text only) from a PDF of sheet music?

0

I have a PDF book of sheet music that I need to extract the text from. I don't need to extract the musical notes or anything, just the verses of the text.

I can't select one line of text by itself – it always selects other parts of the page. Copying the whole page together puts everything out of order. There are also hyphens between syllables that I'd like to remove.

This is the first song in the PDF: http://bradshawfamily.net/~samuel/zzz/34832_kek_h1.pdf

Samuel Bradshaw

Posted 2013-01-22T06:20:35.213

Reputation: 146

Answers

1

  1. My first thought was to copy and paste the whole text into Notepad++ and use a few regex replacements to keep only the valid characters. That failed because the lines come out scrambled after pasting.

  2. Second thought: use an online OCR service such as onlineocr.net or ocrconvert.com. That wasn't as bad as I expected, but you still have to delete some misinterpretations.

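
Once the OCR output is saved as plain text, the syllable hyphens from the question could be removed with a small script. This is only a rough sketch: the function name `clean_lyrics` and the sample input are made up for illustration, and it assumes the OCR leaves syllables separated by a hyphen with optional spaces around it (for example "Hal - le - lu - jah"). Note that it will also join genuinely hyphenated words, so the result still needs a quick read-through.

```python
import re


def clean_lyrics(ocr_text: str) -> str:
    """Join hyphen-separated syllables and tidy whitespace (hypothetical helper)."""
    # Remove hyphens that sit between two word characters, with or without
    # spaces around them: "Hal - le" and "Hal-le" both become "Halle".
    joined = re.sub(r"(?<=\w)[ \t]*-+[ \t]*(?=\w)", "", ocr_text)

    cleaned_lines = []
    for line in joined.splitlines():
        # Collapse runs of spaces or tabs left over from the music layout.
        line = re.sub(r"[ \t]{2,}", " ", line).strip()
        if line:  # drop lines that are empty after cleanup
            cleaned_lines.append(line)
    return "\n".join(cleaned_lines)


if __name__ == "__main__":
    sample = "Hal - le - lu - jah,  sing\n\nglo-ry"
    print(clean_lyrics(sample))
```

In Python 3, `\w` also matches accented letters, so this should work for non-English lyrics as well.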

nixda

Posted 2013-01-22T06:20:35.213

Reputation: 23 233