Line ending conversion program

6

1

Write a program or function that alters the file at a user-specified filename via replacing all instances of UNIX line endings with Windows line endings and vice versa.

Clarifications:

  • For the purposes of this problem, a UNIX line ending consists of the octet 0A in hexadecimal, except when it is immediately preceded by the octet 0D. A Windows line ending consists of the two octets 0D 0A.

  • As usual, your program only has to run on one (pre-existing) interpreter on one (pre-existing) operating system; this is particularly important for this question, where I imagine many submissions will succeed on UNIX but fail on Windows, or vice versa, due to line ending conversions built into the language.

  • To alter a file, you can either edit the file "in-place", or read the file, then save the contents of the file back over the original. However, the program must alter a file on disk; just taking a string as an argument and returning a string specifying the result is not acceptable for this challenge, because the challenge is partially about testing the ability to handle files.

  • The program's user must have some method of specifying the filename to change without needing to alter the program itself (i.e. you can't use hardcoded filenames). For example, you could read the name of the file to modify from a command-line argument, or from standard input. If you're submitting a function rather than a full program, making it take a filename as argument would be acceptable.

  • The input to the program must be specified as a filename specifically; you can't take input other than strings (assuming that you're using an operating system in which filenames are strings; in most, they are), and you can't place restrictions on what filenames you accept (you must accept all filenames that the operating system would accept, and that refer to a file that the user running the program has permission to read and write). For example, you can't write a function that takes a UNIX file descriptor as an argument and requires the file to already be open, and you can't write an editor macro that assumes the file has already been opened in the current buffer.

  • If the file is missing a trailing newline before you alter it, it should continue to be missing a trailing newline after you alter it. Likewise, if the file does have a trailing newline, you should preserve that (leaving the file with the other sort of trailing newline).

  • If the input file has mixed line ending conventions, you should nonetheless translate each of the line endings to the opposite sort of line ending (so the resulting file will still have mixed line ending conventions, but each individual line will have been converted).

  • The program is intended for use converting text files, and as such, you don't have to be able to handle nonprintable control codes in the input (although you can if you wish). The nonprintable control codes are the octets from 00 to 1F (hexadecimal) inclusive except for 09, 0A, 0B, 0C, 0D. It's also acceptable for your program to fail on input that cannot be decoded as UTF-8 (however, there is no requirement to place a UTF-8 interpretation on the input; programs which handle all the high-bit-set octets are also acceptable).

  • As an exception from the specification, it's acceptable for the program to do anything upon seeing the octet sequence 0D 0D 0A in the input (because it's impossible to comply with the spec when this sequence appears in the input: you'd have to generate 0D followed by an 0A which is not immediately preceded by 0D, which is a contradiction).
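For readers who want an ungolfed baseline to compare submissions against, the clarifications above can be sketched roughly as follows. This is not taken from any answer; the function name `swap_line_endings` is hypothetical, and the sketch assumes the file fits in memory. What it does on the sequence 0D 0D 0A is arbitrary, which the last clarification permits.

```python
import re


def swap_line_endings(path: str) -> None:
    """Swap LF and CRLF line endings in the file at `path` (hypothetical helper)."""
    # Binary mode stops Python from translating newlines on any platform.
    with open(path, "rb") as f:
        data = f.read()

    # Each match is either CRLF or a lone LF; emit the opposite ending.
    swapped = re.sub(
        rb"\r?\n",
        lambda m: b"\n" if m.group() == b"\r\n" else b"\r\n",
        data,
    )

    # Save the result back over the original file.
    with open(path, "wb") as f:
        f.write(swapped)
```

Since it takes a filename as its argument and rewrites the file on disk, a function of this shape should fit the "function rather than a full program" option described above.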

The shortest program wins, under usual rules (i.e. fewest bytes in the program you submit). Good luck!

Test cases

Here are some hex dumps of files you can use to test your program. All these test cases are reversible; swapping the input and output gives another valid test case. Note that your program should work for cases other than the given test-cases; these are just some examples to help testing:


Input

30 31 32 0A 33 34 35 0A 36 37 38 0A

Output

30 31 32 0D 0A 33 34 35 0D 0A 36 37 38 0D 0A

Input

30 31 0A 0D 0A 32 0A 0A 0A 33 34

Output

30 31 0D 0A 0A 32 0D 0A 0D 0A 0D 0A 33 34

Input and Output both a zero-byte file.
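The hex dumps above can be checked mechanically, including the claim that each case is reversible. The sketch below is only an illustration: `swap` is a hypothetical in-memory reference transform, so it tests the algorithm but not the file handling that an actual submission must also do.

```python
import re

# (input, output) hex dumps copied from the test cases above;
# the last pair is the zero-byte file.
CASES = [
    ("30 31 32 0A 33 34 35 0A 36 37 38 0A",
     "30 31 32 0D 0A 33 34 35 0D 0A 36 37 38 0D 0A"),
    ("30 31 0A 0D 0A 32 0A 0A 0A 33 34",
     "30 31 0D 0A 0A 32 0D 0A 0D 0A 0D 0A 33 34"),
    ("", ""),
]


def swap(data: bytes) -> bytes:
    """Hypothetical reference transform: CRLF -> LF and lone LF -> CRLF."""
    return re.sub(
        rb"\r?\n",
        lambda m: b"\n" if len(m.group()) == 2 else b"\r\n",
        data,
    )


def check_cases() -> bool:
    for src_hex, dst_hex in CASES:
        src, dst = bytes.fromhex(src_hex), bytes.fromhex(dst_hex)
        # Check the forward direction and the swapped (reversed) case.
        if swap(src) != dst or swap(dst) != src:
            return False
    return True


print(check_cases())
```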

user62131

Posted 2016-11-28T17:51:29.533

Reputation:

Could you give some examples? Like normal and edge cases? – Karl Napf – 2016-11-28T17:54:24.443

Sure, I've added a few. Note that there's not much variety possible; the actual algorithmic part of the challenge is fairly simple. – None – 2016-11-28T18:02:25.477

the input/output has to be files? or could be string (representing the content)? – Rod – 2016-11-28T18:24:16.857

@Rod: has to be files. That's already stated in the question: "However, the program must alter a file on disk; just taking a string as an argument and returning a string specifying the result is not acceptable for this challenge, because the challenge is partially about testing the ability to handle files." – None – 2016-11-28T18:27:57.053

So just for understanding your second example in 0A 0D 0A, the transform is 0A -> 0D 0A and 0D 0A -> 0A resulting in 0D 0A 0A? – Karl Napf – 2016-11-28T18:28:08.530

@KarlNapf: that's right; the problem's basically about replacing 0D 0A with 0A and vice versa. – None – 2016-11-28T18:32:03.910

-1 for requiring file I/O. Arbitrarily overriding the defaults is one of the things to avoid when writing challenges. – Dennis – 2016-11-28T19:08:42.297

The override isn't arbitrary. The "read the file, then write it back at the same location" task is a) the way this particular task is normally most useful practically, and b) not actually a method of I/O that's acceptable under default rules. So it's not overriding the defaults any more than "print hello world" is overriding the defaults; it's part of the task, not part of the I/O method (and in fact, it wouldn't be allowed if it were an I/O method!). I didn't override the defaults with respect to the actual input of the challenge (which is the filename). – None – 2016-11-28T19:12:33.247

Can we assume the file won't include bytes equal to 0 (null) or maybe 4 (end of transmission)? Just in case... – Luis Mendo – 2016-11-28T20:51:57.397

I don't know what PPCG's standard rules are on assumptions about what the input file can contain, and I'll be shouted at if I answer the question in the comments :-(. This problem's mostly useful in the context of text files, though, which rarely contain characters like that, so I guess I'll edit the question to specify a character repertoire you have to handle. – None – 2016-11-28T21:29:31.743

That's very reasonable. And helped me save 2 bytes :-D – Luis Mendo – 2016-11-28T22:25:03.687

The challenge is interesting, I'm ok with IO manipulation and I've dealt with the EOL issue before, but why mixed EOL conventions? – adrianmp – 2016-11-29T20:29:23.180

@adrianmp: basically because this site requires you to define the puzzle explicitly, including cases like that. I'd be surprised if it makes much of a difference in the long run. – None – 2016-11-29T21:09:17.950

Answers

4

Sed (GNU), 27, 37, 24 bytes (21 bytes + "-zi" flags)

(21 bytes for the code + 3 for the "-zi" flags)

EDIT: I've just spotted an issue with an extra carriage return being appended to files with no trailing newline, when going from Unix to DOS.

This is now fixed, and I've also re-profiled my answer to pure Sed instead of Bash, and added some tests.
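To illustrate the kind of problem described in the EDIT, here is a rough Python emulation of a per-line "append CR at end of line" pass. It assumes the earlier version worked line by line, like the `s/$/\r/` form quoted in the comments; `linewise_append_cr` is a hypothetical name, not part of the answer.

```python
def linewise_append_cr(data: bytes) -> bytes:
    """Emulate appending CR to the end of every line, one line at a time."""
    pieces = data.split(b"\n")
    # Every piece except the last was terminated by LF in the input.
    out = b"".join(line + b"\r\n" for line in pieces[:-1])
    tail = pieces[-1]
    if tail:
        # Unterminated last line: the per-line substitution still appends
        # a CR, although there is no LF for it to pair with.
        out += tail + b"\r"
    return out


print(linewise_append_cr(b"a\nb\n"))  # every line ends in CR LF
print(linewise_append_cr(b"a\nb"))    # the last line gains a stray CR
```

Treating the whole file as a single record, as the `-z` flag does for files without NUL bytes, means the substitution only fires where a newline is actually present.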

Golfed

s|\n|\r\n|g;s|\r\r||g

Usage

sed -zi -f newline.sed myfile.txt
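The two substitutions can be followed step by step in a small Python sketch. It mirrors the sed script on an in-memory byte string and assumes, as `-z` effectively does for text files without NUL bytes, that the whole file is processed as one record; `sed_two_pass` is a hypothetical name.

```python
def sed_two_pass(data: bytes) -> bytes:
    # Pass 1, like s|\n|\r\n|g: every LF gains a CR in front of it,
    # so an existing CR LF turns into CR CR LF.
    data = data.replace(b"\n", b"\r\n")
    # Pass 2, like s|\r\r||g: a doubled CR marks a former CR LF;
    # deleting both CRs leaves a bare LF.
    return data.replace(b"\r\r", b"")


unix = b"012\n345\n678\n"
dos = sed_two_pass(unix)
print(dos)                        # CRLF version
print(sed_two_pass(dos) == unix)  # running it again restores the original
```

The second pass is also why the behaviour on 0D 0D 0A input is unspecified: a CR CR pair already present in the file would be deleted as well.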

Tests

An old pond!
A frog jumps in —
the sound of water.

(Matsuo Basho)

Test Case #1, Unix newlines + trailing newline

>xxd haiku.txt
0000000: 416e 206f 6c64 2070 6f6e 6420 210a 4120  An old pond !.A 
0000010: 6672 6f67 206a 756d 7073 2069 6e20 e280  frog jumps in ..
0000020: 940a 7468 6520 736f 756e 6420 6f66 2077  ..the sound of w
0000030: 6174 6572 2e0a                           ater..

>sed -zi -f newline.sed haiku.txt 
0000000: 416e 206f 6c64 2070 6f6e 6420 210d 0a41  An old pond !..A
0000010: 2066 726f 6720 6a75 6d70 7320 696e 20e2   frog jumps in .
0000020: 8094 0d0a 7468 6520 736f 756e 6420 6f66  ....the sound of
0000030: 2077 6174 6572 2e0d 0a                    water...

>sed -zi -f newline.sed haiku.txt 
0000000: 416e 206f 6c64 2070 6f6e 6420 210a 4120  An old pond !.A 
0000010: 6672 6f67 206a 756d 7073 2069 6e20 e280  frog jumps in ..
0000020: 940a 7468 6520 736f 756e 6420 6f66 2077  ..the sound of w
0000030: 6174 6572 2e0a                           ater..

Test Case #2, Unix newlines, no trailing newline

>xxd haiku.no-trailing-newline.txt
0000000: 416e 206f 6c64 2070 6f6e 6420 210a 4120  An old pond !.A 
0000010: 6672 6f67 206a 756d 7073 2069 6e20 e280  frog jumps in ..
0000020: 940a 7468 6520 736f 756e 6420 6f66 2077  ..the sound of w
0000030: 6174 6572 2e                             ater.

>sed -zi -f newline.sed haiku.no-trailing-newline.txt
0000000: 416e 206f 6c64 2070 6f6e 6420 210d 0a41  An old pond !..A
0000010: 2066 726f 6720 6a75 6d70 7320 696e 20e2   frog jumps in .
0000020: 8094 0d0a 7468 6520 736f 756e 6420 6f66  ....the sound of
0000030: 2077 6174 6572 2e                         water.

>sed -zi -f newline.sed haiku.no-trailing-newline.txt
0000000: 416e 206f 6c64 2070 6f6e 6420 210a 4120  An old pond !.A 
0000010: 6672 6f67 206a 756d 7073 2069 6e20 e280  frog jumps in ..
0000020: 940a 7468 6520 736f 756e 6420 6f66 2077  ..the sound of w
0000030: 6174 6572 2e                             ater.

zeppelin

Posted 2016-11-28T17:51:29.533

Reputation: 7 884

Both are true. I've corrected it to handle conversion in both directions, and added an explicit $1. Thank you! – zeppelin – 2016-11-28T19:21:07.130

I think it works now, although you can probably save several bytes via the use of a ; in the sed program rather than using -e twice. – None – 2016-11-28T19:23:01.740

You can combine the -i and -e, and combine the two sed expressions into one with a ;: sed -ie "s/$/\r/;s/\r\r$//" $1 – Riley – 2016-11-28T19:26:16.553

@Riley, true, actually I did this in the original version (which was shown to have a flaw by @ais523). Will restore that now. – zeppelin – 2016-11-28T19:29:30.550

1

MATL, 28 bytes

&Z$[13X]OZt5E6MZtO10ZtG3$FZ#

This works in the current release (19.5.1), which predates the challenge.

I have only tested it on Windows, but it should work on Linux too.

Because of the file input/output, this can't be tested online directly. But here is an online version that accepts and produces hex-dump strings with the format shown in the challenge.

Explanation

&Z$      % Implicitly input filename, and read as chars
[13X]    % Push array [13 10]
O        % Push 0
Zt       % Replace [13 10] by 0
5E       % Push 10
6M       % Push [13 10] again
Zt       % Replace 10 by [13 10]
O        % Push 0
10       % Push 10
Zt       % Replace 0 by 10
G        % Push filename again
3$FZ#    % Overwrite the file with the modified char array
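The same three-step replacement can be written out in Python for comparison. This is a sketch of the technique rather than a translation of the MATL code: it uses byte 0 as the temporary placeholder, as the explanation above does, and therefore relies on the input containing no 00 octets (which the challenge allows a submission to assume). The name `swap_with_placeholder` is hypothetical.

```python
def swap_with_placeholder(data: bytes) -> bytes:
    placeholder = b"\x00"
    # The placeholder must not occur in the input, or step 3 would
    # wrongly turn genuine NUL bytes into line feeds.
    if placeholder in data:
        raise ValueError("input contains the placeholder byte 00")
    data = data.replace(b"\r\n", placeholder)  # step 1: CR LF -> 0
    data = data.replace(b"\n", b"\r\n")        # step 2: remaining LF -> CR LF
    return data.replace(placeholder, b"\n")    # step 3: 0 -> LF


print(swap_with_placeholder(b"01\n\r\n2\n\n\n34"))
```

Parking the CR LF pairs under a placeholder first is what keeps step 2 from touching the line feeds that belong to Windows line endings.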

Luis Mendo

Posted 2016-11-28T17:51:29.533

Reputation: 87 464

0

C, 240 219 211 227 bytes

Saving a little by #define...fputc and a lot by converting to a program. -8 bytes for converting some logic. +16 bytes for in-place file replacement using the temporary file T

Takes filenames via command line parameters.

#import<stdio.h>
#define P(c) fputc(c,o)
d;int main(int c,char**v){FILE*i=fopen(v[1],"r"),*o=fopen("T","w");while((c=fgetc(i))!=-1){if(c==10)P(13),P(c);else if(c==13){if((d=fgetc(i))!=10)P(c);P(d);}else P(c);}rename("T",v[1]);}

Ungolfed:

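#import<stdio.h>
#define P(c) fputc(c,o)
d;
int main(int c,char**v){
  /* read the named file; write the converted bytes to the temporary file T */
  FILE*i=fopen(v[1],"r"),*o=fopen("T","w");
  while((c=fgetc(i))!=-1){
    if(c==10)
      fputc(13,o),fputc(c,o);   /* lone LF -> CR LF */
    else if(c==13){
      if((d=fgetc(i))!=10)
        fputc(c,o);             /* CR not followed by LF: keep the CR */
      fputc(d,o);               /* CR LF -> LF, or copy the byte after a stray CR */
    }
    else fputc(c,o);            /* any other byte is copied unchanged */
  }
  rename("T",v[1]);             /* move the temporary file over the original */
}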

Usage:

./a.out test_in.txt
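For comparison, the same byte-at-a-time state machine with a temporary file can be sketched in Python. This is an illustration, not a port: the name `swap_streaming` is hypothetical, and the sketch departs from the C version in using a uniquely named temporary file in the target's directory instead of the fixed name T.

```python
import os
import tempfile


def swap_streaming(path: str) -> None:
    """Swap line endings one byte at a time via a temporary file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Assumption of this sketch: a unique temp file next to the target,
    # so the final rename stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=directory)
    with open(path, "rb") as src, os.fdopen(fd, "wb") as dst:
        while True:
            c = src.read(1)
            if not c:
                break                    # end of file
            if c == b"\n":
                dst.write(b"\r\n")       # lone LF -> CR LF
            elif c == b"\r":
                d = src.read(1)          # look at the byte after the CR
                if d == b"\n":
                    dst.write(b"\n")     # CR LF -> LF
                else:
                    dst.write(c + d)     # stray CR: copy it and the next byte
            else:
                dst.write(c)             # any other byte is copied unchanged
    # Replace the original with the converted copy.
    os.replace(tmp, path)
```

Reading with `read(1)` returns an empty byte string at end of file, so a CR that happens to be the last byte is copied on its own without writing anything extra after it.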

Karl Napf

Posted 2016-11-28T17:51:29.533

Reputation: 4 131

This isn't quite complete by itself; it's writing to a new file, rather than overwriting the original. I think you can most easily fix this via using a hardcoded filename for O and doing a rename(O,I) at the end of the function; there might be a way that requires fewer bytes though. – None – 2016-11-28T19:17:57.570

@ais523 Ah okay, I misinterpreted your third point. I guess rename is fine for this stage of the program, which might not be final. – Karl Napf – 2016-11-28T19:22:16.120