Document saved with Notepad has linebreaks deleted

A very odd situation is occurring:

(Running Windows 7 Enterprise, SP1, 64-bit)

A GUI I use at work (built specifically for this purpose and intended to run in a Windows environment) generates .dat files. I need to edit one slightly before I use it. I open it in plain old Notepad, make my edit, and save it (everything looks fine). When I reopen it with any program (Notepad, Notepad++, etc.), all newlines/line breaks have been removed: everything is jumbled together as one long line.

If I simply open it and close it in Notepad without saving, nothing changes and the line breaks are where they should be. Opening the file in Notepad++ or another program and saving it does not affect the line breaks. Copying the contents to Notepad++ and back to Notepad also fixes the problem: subsequent saves in Notepad do not break the line breaks.

What makes this an even more awkward problem is that this behavior does not apply to ALL the .dat files my GUI produces. Just some of them.

Any good ideas on what is happening and how to fix it?

If the solution to this is to modify the way my GUI produces the files, that is an acceptable answer as the GUI is something I can submit a bug-report for.

However... it seems unlikely, as I don't think my boss has ever had this problem and he uses the same version of the GUI and Notepad to make slight modifications to the files. It has also only happened to me recently and inconsistently: a file that had this problem previously does NOT lose its line breaks when saved with Notepad in the current iteration of files.

Edit: more info: I sent the file to my boss and had him open and save with Notepad on his computer and nothing funny happened - all linebreaks remained after saving, closing and reopening. Either the process of sending it fixed something in the file, or it is something funny with my computer.

Looking at the hex of the saved and unsaved files: As far as I can tell, the unsaved version has 0D0A between lines and the saved version is missing all instances of 0D0A except for a single instance at the very end (I wonder if this was added by Notepad++ as the file was opened/converted).

Edit again: the hex version of the "unsaved" file after editing out sensitive info:

2320504C4541534520434845434B3A20
544845524D5F43617020616E64204465
70436170206265666F72652072756E6E
696E67202121210D0D0A0D0D0A706172
616D20696E697469616C203A3D20313B
0D0D0A706172616D2054203A3D313735
32303B0D0D0A706172616D206474203A
3D20333630303B0D0D0A0D0D0A706172
616D204950505F4F524F203A3D20302E
313233343B0D0D0A706172616D205448
45524D5F4F524F203A3D20302E313233
343B0D0D0A706172616D20544845524D
5F436170203A3D2031323334353B0D0D
0A706172616D20446570436170203A3D
31323334353B0D0D0A234D572C204465
70656E6461626C652043617061636974
79206F662073747566660D0D0A090909
234E756D6265727320666F7220726566
6572656E63653A207468696E67732E0D
0D0A09090923446570656E6461626C65
204361703A2073747566660D0D0A7061
72616D20425546464552203A3D20303B
0D0D0A706172616D20636F6E76657274
203A3D20312E303B0D0D0A0D0D0A7061
72616D09525245534E504F494E545309
3A3D20353B0D0D0A706172616D095252
4553424B50093A3D0D0D0A31092D3132
33343530200D0D0A32092D3132333435
3030200D0D0A330930200D0D0A340931
32333435200D0D0A3509313233343520
0D0D0A3B0D0D0A0D0D0A706172616D09
525245534C4F5045093A3D0D0D0A3109
2D3132333435452D30350D0D0A32092D
3132333435452D30350D0D0A33092D31
32333435452D30350D0D0A3409313233
34350D0D0A3B0D0D0A0D0D0A2357696E
64792073747566660D0D0A706172616D
2057494E445F49433A3D0D0D0A706C61
63650931323334350D0D0A3B0D0D0A0D
0D0A706172616D207468696E67793A3D
0D0D0A706C6163650931323334350D0D
0A3B0D0D0A0D0D0A706172616D204F50
545F5265733A3D0D0D0A7468696E6709
300D0D0A3B0D0D0A0D0A

This is the hex code after saving in notepad and reopening:

2320504C4541534520434845434B3A20
544845524D5F43617020616E64204465
70436170206265666F72652072756E6E
696E6720212121706172616D20696E69
7469616C203A3D20313B706172616D20
54203A3D31373532303B706172616D20
6474203A3D20333630303B706172616D
204950505F4F524F203A3D20302E3132
33343B706172616D20544845524D5F4F
524F203A3D20302E313233343B706172
616D20544845524D5F436170203A3D20
31323334353B09706172616D20446570
436170203A3D383632362E313B234D57
2C20446570656E6461626C6520436170
6163697479206F662073747566660909
09234E756D6265727320666F72207265
666572656E63653A207468696E677309
090923446570656E6461626C65204361
703A207374756666706172616D204255
46464552203A3D20303B706172616D20
636F6E76657274203A3D20312E303B70
6172616D09525245534E504F494E5453
093A3D20353B706172616D0952524553
424B50093A3D31092D31323334353020
32092D31323334353030203309302034
0931323334352035093132333435203B
706172616D09525245534C4F5045093A
3D31092D3132333435452D303532092D
3132333435452D303533092D31323334
35452D303534092D3132333435452D30
353B2357696E64792073747566667061
72616D2057494E445F49433A3D706C61
63650931323334353B706172616D2074
68696E67793A3D706C61636509313233
34353B706172616D204F50545F526573
3A3D7468696E6709303B0D0A
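To read dumps like these without a hex editor, a few lines of Python can turn a row of hex back into text with the control characters made visible. This is only a sketch: the two rows are copied from the dumps above, and the helper name `decode_row` is made up for illustration.

```python
# Two 16-byte rows copied from the hex dumps above.
UNSAVED_ROW = "696E67202121210D0D0A0D0D0A706172"
SAVED_ROW = "696E6720212121706172616D20696E69"


def decode_row(hex_row: str) -> str:
    """Decode a hex string and return it with control characters escaped."""
    raw = bytes.fromhex(hex_row)
    # repr() shows CR as \r and LF as \n instead of acting on them.
    return repr(raw.decode("ascii"))


print(decode_row(UNSAVED_ROW))  # line ends with CR CR LF, then an empty line
print(decode_row(SAVED_ROW))    # same text with no line-ending bytes at all
```

The unsaved row shows two carriage returns before each line feed, while the saved row runs "running !!!" straight into "param" with nothing in between.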

BunnyKnitter

Posted 2017-03-22T17:54:31.210

Reputation: 103

Answers

Some background might help. Notepad requires a line to end with both <CR> and <LF>, in that order, before it treats the sequence as a line ending and breaks the line. If either character is missing, it skips them and displays everything on a single line. <CR><LF> is the standard line-break sequence on DOS/Windows machines, whereas Unix/Linux and derivatives use <LF> alone. The file you are opening most likely contains only <LF>, so Notepad can't display it correctly.

Most likely these files were created by a program that isn't following Windows/DOS conventions for text formatting. If it is only intended to run on Windows, I think a bug report is in order, especially if the files are supposed to be edited in Notepad.

In the meantime, I recommend only opening and editing the files in Notepad++.

If you want to debug the problem, download a hex editor to see what is contained at the end of a line. For proper Windows formatting it should contain the hex codes 0d 0a. If either is missing or they are in a different order, that will create a problem. Try this on a new file that has never been saved from Notepad and on one that has.
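If installing a hex editor is not an option, the same check can be sketched in Python by tallying every line-ending byte sequence in the file. The function name `count_line_endings` and the sample bytes are illustrative assumptions, not part of any tool mentioned here.

```python
import re
from collections import Counter

# Try the longest form first: one or more CRs followed by LF, then a lone CR or LF.
_EOL = re.compile(rb"\r+\n|\r|\n")


def count_line_endings(data: bytes) -> Counter:
    """Tally each distinct line-ending byte sequence, keyed by its hex form."""
    return Counter(m.group().hex().upper() for m in _EOL.finditer(data))


# Made-up sample containing four different kinds of line ending.
sample = b"a\r\r\nb\r\nc\nd\r"
print(count_line_endings(sample))

# For a real file, read it in binary mode so nothing is translated, e.g.:
#   from pathlib import Path
#   print(count_line_endings(Path("File2.dat").read_bytes()))
```

A well-formed Windows text file should report only `0D0A`. Any other key, such as `0D0D0A` or a bare `0A`, points at the lines worth inspecting.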

RayG

Posted 2017-03-22T17:54:31.210

Reputation: 21

Windows is the only platform it has ever run on. I considered bad encoding but the inconsistency is bothering me. I generated the same set of files several weeks ago and "File1.dat" had this problem. Now I generated more files today, and "File2.dat" has this problem, but "File1.dat" does not. No update to my GUI has occurred in this time. I am using the same computer. – BunnyKnitter – 2017-03-22T18:24:24.570

Also, if there is a problem with the encoding, wouldn't it show the lack of linebreaks from the beginning? Opening the file initially works fine, it is the act of saving it in Notepad that seems to mess up the linebreaks. – BunnyKnitter – 2017-03-22T18:26:53.587

Added some debug info to the original post – RayG – 2017-03-22T18:38:23.877

As far as I can tell, the unsaved version has 0d 0a between lines and the saved version is missing both 0d and 0a. (Using some online hex editor and pasting my text in; I can't install random programs on my work computer.) – BunnyKnitter – 2017-03-22T18:43:29.970

Without actually viewing the file in a hex editor you can't be sure. The entire file needs to be properly formatted; if even one line is wrong, Notepad may do something strange. If you can get the file to another computer where you can view it in hex, that would help. – RayG – 2017-03-22T18:45:47.597

Oh. Notepad++ can convert to hex. Checked again:

The saved version is missing all instances of "0D0A" except for a single instance at the very end of the file. The unsaved version has many instances of 0D0A presumably corresponding to each linebreak. – BunnyKnitter – 2017-03-22T18:49:46.703

How big is the original file? – RayG – 2017-03-22T18:51:37.377

Just a few KB. It's a small text file. I could probably edit all sensitive info out but... I don't know how I would then save it to send an "unsaved" version. – BunnyKnitter – 2017-03-22T18:52:45.143

@SnyperBunny If the file contents fits on one screen in hex view, maybe you can just edit the sensitive info, then take a screen capture of hex view without saving. Do this on a freshly created .dat file so it's known to be unaltered other than the edits you make. – RayG – 2017-03-22T19:00:12.027

I added the hex code to the main question post. – BunnyKnitter – 2017-03-22T21:16:29.803

This pattern may be causing a problem: 0D0D0A – RayG – 2017-03-22T21:36:08.823

I believe that 0d0d0a pattern is the problem and that Notepad is stripping it out. Two <CR> in a row doesn't make sense: the second is redundant and does not conform to the Windows standard. Notice that there is a proper <CR><LF> at the end after the file is saved, which suggests that Notepad sees all of the original text as one long line, skipping and removing the control characters just as you described in your original post. You need to fix the program that is generating text with two CR in a row. – RayG – 2017-03-22T22:02:44.580
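Until the generating program is fixed, an affected file could be repaired before it is opened in Notepad by collapsing each doubled carriage return. The following is a minimal sketch that assumes 0D0D0A is the only malformed sequence; `collapse_extra_cr` is a hypothetical helper name.

```python
import re


def collapse_extra_cr(data: bytes) -> bytes:
    """Replace every run of two or more CRs before an LF with a single CR LF."""
    return re.sub(rb"\r{2,}\n", b"\r\n", data)


# The end of the unsaved dump, decoded: "thing<TAB>0", ";" and an empty line.
broken = b"thing\t0\r\r\n;\r\r\n\r\n"
print(collapse_extra_cr(broken))
```

Read and write the file in binary mode (`open(path, "rb")` and `open(path, "wb")`) so that no further newline translation happens, and work on a copy until the result has been checked.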

@SnyperBunny If you have access to the code, look for strings that look like this: "This is a line of text\r" The backslash r would not be needed. – RayG – 2017-03-22T22:06:17.960
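One common way such a stray \r ends up as 0D0D0A is a text-mode writer that already converts each line feed to CR LF, so an explicit carriage return in the string gets doubled. Whether the GUI does this is an assumption, since its source is not available. The sketch below uses Python only as a stand-in for whatever language the GUI is written in, and `write_text_mode` is a made-up name.

```python
import io


def write_text_mode(lines):
    """Write lines through a text layer that expands every LF to CR LF."""
    raw = io.BytesIO()
    # newline="\r\n" mimics Windows text-mode output on any platform.
    text = io.TextIOWrapper(raw, encoding="ascii", newline="\r\n")
    for line in lines:
        text.write(line)
    text.flush()
    return raw.getvalue()


# A string that already carries its own CR gets a second one added.
print(write_text_mode(["param T :=17520;\r\n"]))
# Leaving the line ending to the text layer produces a clean CR LF.
print(write_text_mode(["param T :=17520;\n"]))
```

If only some code paths in the generator append their own \r, only the files written through those paths would be affected.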

I don't have access to the code that generates the files, unfortunately. What I can't understand is why this problem would be inconsistent. Why would this file be broken this time, while a different file was broken previously, when "normally" this is not an issue with ANY of the generated files? – BunnyKnitter – 2017-03-22T22:09:24.587

@SnyperBunny I can't answer that question without seeing the code that generates the files. Some programs may be inconsistent in how they add lines to a file. Notepad is unforgiving of badly formatted lines, while the other editors can deal with them. Unfortunately, that's all I have to offer on this topic. – RayG – 2017-03-22T22:13:59.100