I have a PDF file in Persian script, which is written right-to-left. Since Persian text is encoded in UTF-8, I can't convert it into plain text in Microsoft Word, and copying and pasting the text produces unreadable characters. I have tried several tools, such as unipdf and e-Pdf Converter, but after conversion the characters are still not displayed properly. I even tried OCR, but the same problem appeared. The PDF doesn't have a password or any restrictions.
Does anyone have any other ideas?
Edit: I also tried creating a file in MS Word and converting it to a PDF, and I had the same problem with that PDF file (even though the encoding was known).
Microsoft Word supports UTF-8 format. It also supports right to left languages. So why exactly can't you convert it to a Word document? – Ramhound – 2015-05-06T13:14:47.827
Hey, thanks for your consideration. The source of my file is a PDF, so I don't know what exactly happens when I copy and paste it into Microsoft Word, but it doesn't show the proper characters. The same thing happens when I try to convert it using third-party tools. – Mehdi – 2015-05-06T13:21:40.853
possible duplicate of Cutting & Pasting Vietnamese characters from a PDF – RedGrittyBrick – 2015-05-06T14:27:32.590
@RedGrittyBrick I read your answer, but in my case I actually tried creating a file in MS Word and converting it to a PDF, and I had the same problem with that PDF file (even though the encoding was known). Thanks – Mehdi – 2015-05-06T14:59:51.293
How was the PDF created? Electronically or scanned and you are hoping for OCR to take over? – Austin T French – 2015-05-06T15:39:51.427
Can you create an example PDF and post it somewhere public so that people can download it from there using a URL? – RedGrittyBrick – 2015-05-06T16:04:08.173
@AthomSfere The PDF was created automatically by converting a MS Word file into a pdf. Thanks – Mehdi – 2015-05-09T12:13:47.187
@RedGrittyBrick Here is an example of PDF https://drive.google.com/open?id=0BzLHaKpzBvMNZXZrd1NURWhIS0F4OGkzVldSRm1ZYXJXbHNF&authuser=0 – Mehdi – 2015-05-09T12:14:04.700
I can cut and paste text from that using Chrome's built-in PDF viewer - there is no obvious garbling of the characters but the direction of text is mostly reversed. I don't read Persian so can't tell whether the actual characters are all OK - but they look superficially OK. With a different PDF viewer, eVince, the main problem is selecting contiguous text. Unfortunately I don't think I can help with your problem. – RedGrittyBrick – 2015-05-09T22:39:48.747
@RedGrittyBrick Thank you very much for your consideration. This problem exists with non-English PDFs and I don't know the reason! However, you have already helped me: I can copy-paste portion by portion! The long way, but the only way! – Mehdi – 2015-05-10T13:25:20.637
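For anyone who hits the reversed text direction described in the comments above, here is a minimal Python sketch of one possible cleanup. It assumes the pasted text arrives in visual (left-to-right display) order, which may not be true for every PDF or viewer. The function names `visual_to_logical` and `fix_copied_text` are made up for this illustration. It reverses each line, puts runs of ASCII letters and digits back in their original order, and then applies Unicode NFKC normalization so that Arabic presentation forms (the positional glyph variants some PDFs embed) fold back to ordinary letters.

```python
import re
import unicodedata

# Runs of ASCII letters/digits are left-to-right even inside Persian text,
# so they must be un-reversed after the whole line has been flipped.
LTR_RUN = re.compile(r"[A-Za-z0-9]+")


def visual_to_logical(line: str) -> str:
    """Convert one line from visual order to logical order (assumption:
    the input really is in visual order, as some PDF viewers produce)."""
    flipped = line[::-1]
    # Restore the internal order of embedded numbers and Latin words.
    flipped = LTR_RUN.sub(lambda m: m.group(0)[::-1], flipped)
    # Fold presentation-form glyphs to their base letters. This is done
    # after flipping so that ligatures expand in logical order.
    return unicodedata.normalize("NFKC", flipped)


def fix_copied_text(text: str) -> str:
    """Apply the per-line fix to a multi-line block of pasted text."""
    return "\n".join(visual_to_logical(line) for line in text.splitlines())
```

Paste the copied text into a string and pass it to `fix_copied_text`. This only handles the simple case; lines that mix punctuation, brackets, or several scripts would need a proper bidirectional-text library, and nothing here helps if the PDF's fonts lack a usable character mapping.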