Some websites offer conversion of PDFs into DOCX or ODT files, and I think Adobe Acrobat (at least the full version) offers export to all sorts of formats. But in LibreOffice, if I open a PDF file, it opens in Draw. Now, Draw is fine sometimes, but not always.
So, can I somehow open PDF files into an LO Writer document?
Note: I'm obviously interested in PDFs which can be legitimately perceived as Writer documents, e.g. having been exported from a word processor. Thus opening them as dozens of frames scattered across the page is not what I'm after. That can already be achieved by opening in Draw, copying everything and pasting into Writer. I want the text in nice consecutive paragraphs, hopefully with consistent styles (even if synthesized) etc.
Apparently, there is no direct way to open a PDF document in Writer, nor to save it in ODT format from Draw. However, there are numerous tools for conversion of PDF to ODT documents, both online and as discrete applications. That said, conversion is always "iffy" because PDF is a page description format, losing the line breaks of the original document. – DrMoishe Pippik – 2017-10-16T22:24:37.717
@DrMoishePippik: But often the PDF is the output of a conversion/print-out of a document, which you then want to work on. See my edit of the question. Also, do you suggest I ask on SR.SX? – einpoklum – 2017-10-16T22:38:06.550
The fact you're using LibreOffice may make this moot or awkward, but Word 2016 can both open and convert PDF files AND save files to ODT. – music2myear – 2017-10-16T22:51:22.363
In the conversion from ODT to PDF much is (intentionally) lost. The PDF file, for example, may lose all the original CR/LF (paragraph symbols), and add its own line breaks at the end of each line of text in the PDF document as displayed, rather than at the end of a paragraph. – DrMoishe Pippik – 2017-10-16T22:54:53.693
@DrMoishePippik: Most of that can rather easily be recovered, and online tools do this. Also, PDFs can include meta-data so practically none of this stuff is lost (but I'm not sure what LibreOffice saves). – einpoklum – 2017-10-16T23:15:11.537
Actually, it's often not recovered, but synthesized through optical character recognition (OCR) that recreates the actual paragraph format based on the page layout. The extreme case is a PDF document that contains no text, only the images of text. OCR is the only way to recover text from such a file. – DrMoishe Pippik – 2017-10-16T23:21:02.070
@DrMoishePippik: I accept the distinction between recovery and synthesis. However, I'm not talking about scanned images which require OCR, I'm talking about PDFs generated on a computer, often originally being MS-Word or LO Writer documents. – einpoklum – 2017-10-17T07:39:01.980
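To illustrate the line-break issue raised in the comments: when text is pulled out of a computer-generated PDF, each displayed line typically ends with its own break, and paragraphs have to be synthesized from layout cues. Below is a minimal Python sketch of one such heuristic. It assumes the text has already been extracted by some other tool and that blank lines mark paragraph boundaries, which will not hold for every PDF. The function name `reflow_paragraphs` is hypothetical and not part of LibreOffice or any converter.

```python
def reflow_paragraphs(text: str) -> str:
    """Join hard-wrapped lines into paragraphs (heuristic sketch).

    Assumption: a blank line separates paragraphs in the extracted text.
    Real PDFs may need other cues, such as indentation or short last lines.
    """
    paragraphs = []
    current = []  # pieces of the paragraph being assembled
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # Blank line: close the current paragraph, if any.
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        if current and current[-1].endswith("-") and line[:1].islower():
            # Likely a word hyphenated across a line break: glue the halves.
            # This can wrongly merge genuinely hyphenated compounds.
            current[-1] = current[-1][:-1] + line
        else:
            current.append(line)
    if current:
        paragraphs.append(" ".join(current))
    return "\n\n".join(paragraphs)


if __name__ == "__main__":
    sample = "This is a para-\ngraph that was\nwrapped.\n\nSecond one."
    print(reflow_paragraphs(sample))
```

This only rebuilds paragraph text. Styles, headings, and tables would need further synthesis, which is why converted results tend to vary in quality.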