"file" command yields "ASCII text, with no line terminators", unless I first edit the file in vim

6

I am experiencing strange behaviour that I don't know how to solve. Here is the scenario:

  • From a Python script I fetch JSON from a simple application hosted on Parse.
  • From that text I extract a sentence and save it to a local "txt" file, encoded as ISO-8859-15.
  • Finally I send it to a text-to-speech processor, which expects to receive it in ISO-8859-15.

The weird thing is that once the python script runs, if I run

file my_file.txt

The output is:

my_file.txt: ASCII text, with no line terminators

But if I open my_file.txt with vim, remove the last dot of the sentence, type it again, and save the file, then run this again:

file my_file.txt

now the output is:

my_file.txt: ASCII text

This fixes some problems when the voice synthesizer processes the file. So how can I get this result automatically, without the vim step? I have also made many attempts with iconv, with no success.

Any help would be much appreciated.

Edit:

pi@raspberrypi ~/main $ hexdump -C my_file.txt

00000000  73 61 6d 70 6c 65 20 61  6e 73 77 65 72 2e 2e     |sample answer..|
0000000f

pi@raspberrypi ~/main $ file my_file.txt
my_file.txt: ASCII text, with no line terminators
pi@raspberrypi ~/main $ vim my_file.txt
pi@raspberrypi ~/main $ file my_file.txt
my_file.txt: ASCII text
pi@raspberrypi ~/main $ hexdump -C my_file.txt

00000000  73 61 6d 70 6c 65 20 61  6e 73 77 65 72 2e 2e 0a  |sample answer...|
00000010

Sample file

Python code:

import json,httplib
from random import randint
import codecs

connection = httplib.HTTPSConnection('api.parse.com', 443)
connection.connect()
connection.request('GET', '/1/classes/XXXX', '', {
       "X-Parse-Application-Id": "xxxx",
       "X-Parse-REST-API-Key": "xxxx"
     })
result = json.loads(connection.getresponse().read())

pos = randint(0,len(result['results'])-1)
sentence = result['results'][pos]['sentence'].encode('iso-8859-15')
response = result['results'][pos]['response'].encode('iso-8859-15')

text_file = codecs.open("sentence.txt", "w","ISO-8859-15")
text_file.write("%s" % sentence)
text_file.close()

text_file = open("response.txt","w")
text_file.write("%s" % response)
text_file.close()

cor

Posted 2015-10-17T09:09:08.607

Reputation: 163

Can you upload the file with no line terminators? I would like to have a look at it. – Nidhoegger – 2015-10-17T09:23:55.673

Is it removing the 'dot', or does any edit fix it? It might be that editing the file adds the end of line marker, rather than the dot causing the problem. – Paul – 2015-10-17T09:24:29.507

So it's a single line in that text file? And does it have a line terminator? And are you sure you're only removing the dot? You can validate using hexdump -C. When typing in vim, lines always seem to end with 0x0a, even though you cannot move the cursor to the next empty line. So I guess vim is indeed adding it when you remove the dot, or make any edit. – Arjan – 2015-10-17T09:24:58.623

many thanks! yes, you are all right, just opening and saving the file with vim is enough – cor – 2015-10-17T09:27:13.567

thank you @Arjan I edited the post with the command results – cor – 2015-10-17T09:40:40.110

@Nidhoegger I uploaded a file. It is in the edited question. Many thanks – cor – 2015-10-17T10:33:18.673

Please show the Python code that gets the line and writes it. I suspect that the newline is stripped when looping over the input and all you need to do is append it when writing the output file. Please also specify whether you are using Python 2 or 3, since unicode handling changed a lot between those two versions. – Bram – 2015-10-17T10:42:33.973

Thanks @Bram, there it is. Using python 2.7.3. Writing to a file in two different ways, with same result. – cor – 2015-10-17T10:52:22.513

So that specific example even has two dots, right? 0x2e is a dot, and that's in the example twice. But indeed, the 0x0a is added by vim, even when you don't remove anything, as you have already seen. – Arjan – 2015-10-17T11:01:14.357

Answers

6

The standard /bin/echo can be used to add that newline to the end of the file for you:

$ echo -n 'ssss'>test
$ file test
test: ASCII text, with no line terminators
$ hexdump -C test 
00000000  73 73 73 73                                       |ssss|
00000004
$ echo >> test
$ file test
test: ASCII text
$ hexdump -C test 
00000000  73 73 73 73 0a                                    |ssss.|
00000005
$ 

Another option would be to add it in your Python code:

text_file = open("response.txt","w")
text_file.write("%s" % response)
text_file.write("\n")  # <-- newline added here
text_file.close()
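
If the file has already been written and you only want to patch it afterwards (the Python counterpart of the echo >> approach), something along these lines should work. This is only a sketch: ensure_trailing_newline is a helper name made up for this example, and it assumes the file is small and that a single 0x0a byte is the terminator you want.

```python
import os


def ensure_trailing_newline(path):
    # Hypothetical helper (not part of any library): append one 0x0a byte
    # to the file at `path` unless its last byte already is 0x0a.
    # Returns True if a byte was appended, False otherwise.
    with open(path, "rb+") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() == 0:
            return False  # empty file: leave it untouched
        f.seek(-1, os.SEEK_END)
        if f.read(1) == b"\n":
            return False  # already terminated
        f.seek(0, os.SEEK_END)  # reposition explicitly before writing
        f.write(b"\n")
        return True
```

Calling ensure_trailing_newline("response.txt") after the script has closed the file would then have the same effect as the echo >> line above, and calling it a second time changes nothing.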

Scott Johnson

Posted 2015-10-17T09:09:08.607

Reputation: 176

Or: text_file.write("%s\n" % response) ;-) – Arjan – 2015-10-17T11:01:43.583

@Arjan that's probably how I would do it because I like things to be ultra-concise, but I wanted the extra verbosity for illustrative purposes here. :) – Scott Johnson – 2015-10-17T11:03:06.840

3

The simplest solution is to append the newline in the write command:

text_file.write("%s\n" % sentence)

My sample program to demonstrate:

import codecs
sentence = 'something'
text_file = codecs.open("sentence.txt", "w","ISO-8859-15")
text_file.write("%s" % sentence)
text_file.close()
text_file = codecs.open("sentence2.txt", "w","ISO-8859-15")
text_file.write("%s\n" % sentence)
text_file.close()

And the result:

$ file sentence.txt 
sentence.txt: ASCII text, with no line terminators
$ file sentence2.txt 
sentence2.txt: ASCII text

The explanation is that the variable you are writing does not contain the newline, and write() writes exactly what you give it.
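
If you want this wrapped up in one place, a small helper could look like the sketch below. It is my own illustration, not the only way to do it: write_line is a made-up name, and it assumes you pass the decoded (unicode) text rather than a string you have already run through .encode(), so that the file object does the encoding exactly once. io.open is used because it takes the same arguments on Python 2.7 and 3.

```python
import io


def write_line(path, text, encoding="iso-8859-15"):
    # Hypothetical helper: write `text` (a unicode string, NOT pre-encoded
    # bytes) to `path` in the given encoding, terminated by one "\n".
    if not text.endswith(u"\n"):
        text += u"\n"
    # newline="\n" disables platform newline translation on write,
    # so the file ends in a plain 0x0a byte everywhere.
    with io.open(path, "w", encoding=encoding, newline="\n") as f:
        f.write(text)
```

With the code from the question, that would mean passing result['results'][pos]['sentence'] directly, without the earlier .encode('iso-8859-15') call.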

Bram

Posted 2015-10-17T09:09:08.607

Reputation: 582

Thank you, it works! Your answer could perfectly well be the accepted one, but Scott was quicker. – cor – 2015-10-17T11:13:16.020