Finding and deleting soft hyphens in a piece of text

2

I'm trying to clean up a piece of text I've copied from an OCR'd PDF. The text contains a lot of what I assume are soft hyphens. I'm finding it difficult to 'Find' them in Notepad or Microsoft Word, which in turn makes it difficult to delete them from the text.

76987

Posted 2011-10-30T07:55:47.280

Reputation: 168

Answers

1

Do you have an example of where that would not be possible?

If they are just in plain text, you can simply search for them and, if they exist, replace them with nothing. Note that a different encoding can give you different results, so try Emacs or Notepad++...
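If an editor is not handy, the same search-and-replace can be scripted. Here is a minimal Python sketch, assuming the characters really are Unicode soft hyphens (U+00AD) and that the text has already been decoded into a Python string. The name strip_soft_hyphens is made up for this illustration.

```python
# Assumption: the unwanted characters are Unicode soft hyphens (U+00AD).
SOFT_HYPHEN = "\u00ad"


def strip_soft_hyphens(text: str) -> str:
    """Return text with every soft hyphen removed (replaced by nothing)."""
    return text.replace(SOFT_HYPHEN, "")


# A short sample with one soft hyphen written as an explicit escape,
# so it stays visible in the source code.
sample = "ja\u00adkákoli"

# Count first, to confirm that soft hyphens are actually present.
print(sample.count(SOFT_HYPHEN), "soft hyphen(s) found")

cleaned = strip_soft_hyphens(sample)
print(cleaned)
```

If the count comes back as zero, the characters are probably something other than U+00AD (or the text was decoded with the wrong encoding), and the constant would need adjusting.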

jjmerelo

Posted 2011-10-30T07:55:47.280

Reputation: 136

Well, this may be difficult. Here is a small chunk of the text: "než ja­kákoli řeč". If you copy that into Notepad, you should see a soft hyphen appearing between the 'a' and 'k' of 'jakákoli'. But if you don't see it, I'm not sure what to do. – None – 2011-10-30T08:40:27.477

I copied it into Emacs, copied the hyphen into the buffer, pasted it into the "search" slot, and then searched for it and deleted it just fine. You can download Emacs for Windows, or maybe use another programmer's editor. – None – 2011-10-30T08:51:10.350

All right, I copied it into Notepad++ and it worked: I was able to do a Find and Replace. Thanks a lot. – None – 2011-10-30T08:59:34.120