odt2txt handling line breaks incorrectly

I'm not sure whether I'm doing something wrong or it's a bug.

I want to use the command-line tool odt2txt to convert an .odt file, made with LibreOffice Writer, to a text file. However, line breaks don't seem to be handled correctly: every single line break is converted to two line breaks, and runs of multiple line breaks are also collapsed to two.

If I, for example, save this

This is a test
one line break before this

two line breaks before this 


and three line breaks before this

into test.odt with LO Writer, and then do

odt2txt test.odt

I get

This is a test

one line break before this

two line breaks before this

and three line breaks before this

Using any of the options hasn't helped me either.

I can't find anything about this on Google, so I wonder whether I'm the only one who has this problem.

Update: output from cat -vet output.txt, as requested in a comment:

$
This is a test$
$
one line break before this$
$
two line breaks before this$
$
and three line breaks before this$
$

Lu Kas

Posted 2017-01-13T16:44:50.553

Reputation: 1

Test your output with cat -vet output.txt. If you see ^M$ at the end of each line, either use dos2unix output.txt or look more closely at the documentation for odt2txt to see if there is an option to create Unix/Linux line endings or to turn off Windows processing. It might be good to check your original file for ^M$ too, and then you'll know the source. – shellter – 2017-01-13T22:39:40.573
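
For readers unfamiliar with the check suggested in this comment, here is a small illustration of what cat -vet shows for the two kinds of line endings. The sample text is made up for the demonstration and does not come from odt2txt itself; tr is shown only as a possible substitute when dos2unix is not installed.

```shell
# A line with Windows (CRLF) endings shows up as ^M$ under cat -vet.
printf 'This is a test\r\n' | cat -vet

# A line with Unix (LF) endings shows only the trailing $.
printf 'This is a test\n' | cat -vet

# Stripping carriage returns with tr removes the ^M marker.
printf 'This is a test\r\n' | tr -d '\r' | cat -vet
```

Lines that end in a bare $ have plain Unix line endings, so Windows line endings would not explain doubled blank lines in that case.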

I have added the output of cat -vet output.txt to the question. I don't think it is Windows processing, though, since it's not just a doubling of every line ending. Everything gets turned into two (i.e. three or more line breaks are also turned into two). (Sorry for my late reaction, btw. I was away without internet for the weekend.) – None – 2017-01-16T11:44:37.233

If you use Save As and select Text Document from within Writer itself, the formatting appears to be as you would want, as does selecting all, copying and pasting into a text editor. – AFH – 2017-01-16T12:50:09.180

@AFH, yes I know. But I need to do it from the command line. I want to run a script of commands on the text I'm writing, while still being able to edit the text in LO. I want to be able to keep using the markup tools (not needed in the final text, but they help me with structuring). And always saving as text or copying into a text file, as opposed to just saving, is inefficient. – Lu Kas – 2017-01-16T13:10:56.630

I have found a work-around by now, though. Now I just save to .docx and use docx2txt. docx2txt seems to give the correct behaviour. So it is not really a problem for me anymore, but it still seems to me that odt2txt has a bug. Or I'm doing something wrong ... – Lu Kas – 2017-01-16T13:15:04.733
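
A rough sketch of how this workaround might be scripted so that the .docx step does not have to be done by hand in Writer. The helper name odt_to_txt is hypothetical. It assumes that soffice (LibreOffice) and docx2txt are both on the PATH, and that docx2txt, when given only an input file, writes a .txt file next to it; check your installed version, since packaging differs.

```shell
# Hypothetical helper: convert an .odt file to .txt via the .docx workaround.
# Assumes soffice (LibreOffice) and docx2txt are installed and on PATH.
odt_to_txt() {
    odt=$1
    dir=$(dirname -- "$odt")
    base=$(basename -- "$odt" .odt)

    # Headless conversion; expected to write "$dir/$base.docx".
    soffice --headless --convert-to docx --outdir "$dir" "$odt" >/dev/null || return 1

    # Assumption: with a single argument, docx2txt writes "$dir/$base.txt".
    docx2txt "$dir/$base.docx"
}
```

Calling odt_to_txt test.odt should then leave a test.txt beside the original, which stays editable in LO.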

After saving as text, you can still revert to editing the original ODF file. But I agree that you have probably found a bug, unless some of the conversion options will change the handling for repeated new-lines. Glad you found a solution. You should submit it as an answer, for the benefit of others. – AFH – 2017-01-16T13:29:48.947

So should I report this somewhere? I don't really know how to. Or is nobody really interested? – Lu Kas – 2017-01-16T14:35:08.937

No answers