
I created a text file in Notepad and computed its SHA-1 hash:

SHA1: 701B6FAD6530C61528F9C11F024A9434B3C42D65

Then I edited the file and hashed it again:

SHA1: 97A1D0B1A8BBEE639BADF4A54CEC1C83284ED1CF

Then I reverted the change:

SHA1: 701B6FAD6530C61528F9C11F024A9434B3C42D65

Note that the hashes of the original and the reverted file are the same, which makes sense because the content is the same.
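The behaviour described above can be reproduced in a few lines. This is only a minimal sketch: the byte strings are made-up stand-ins for the text file (not its real contents), and `sha1_hex` is a hypothetical helper name.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of data as uppercase hex."""
    return hashlib.sha1(data).hexdigest().upper()


# Made-up contents standing in for the three states of the text file.
original = b"hello world\n"
edited = b"hello world!\n"
reverted = b"hello world\n"

# Identical bytes always give the identical digest.
print(sha1_hex(original) == sha1_hex(reverted))
# A single changed byte gives a different digest.
print(sha1_hex(original) == sha1_hex(edited))
```

The digest depends only on the bytes passed in, so reverting an edit restores the hash only if the saved bytes are exactly the same as before.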

Next I hashed a PDF file:


SHA1: CB498FAEF0CD2886A12A4128E168CD30CF97B537

Then I appended a character to the last line and saved the file:


SHA1: 15DFC97EAD337537931BAD381A8EB7DBC7E7C050

Then I reverted the change with Ctrl+Z and saved the file:

SHA1: 0D5A19A1DAEBC47F75E759C279B4D1849BD5A9E8

Note that the hashes of the original and the reverted file are NOT the same. What exactly changed in the content to change the hash?

Here are the original and the reverted file side by side, along with the two hashes (screenshot).

Aibek
  • UPD: I just noticed that I didn't even have to add characters. If you just open the file in Notepad and hit Ctrl+S (to save), the hash changes. But I am still unclear: does Notepad somehow alter the characters, or reinterpret them? – Aibek Sep 26 '18 at 04:15
  • Yea, there are a bunch of things that are different, the question is why notepad does that – Aibek Sep 26 '18 at 05:07
  • https://stackoverflow.com/questions/8432584/how-to-make-notepad-to-save-text-in-utf-8-without-bom may provide a hint. – vidarlo Sep 26 '18 at 05:49
  • `why notepad does that` you should use a better text editor like notepad++, sublime, or just use hex editor for binary file. – mootmoot Sep 26 '18 at 07:09
  • *"Yea, there are a bunch of things that are different, the question is why notepad does that"* - this is not a security question. – Steffen Ullrich Sep 26 '18 at 11:29
  • it's probably a different encoding, notepad likes ascii – dandavis Sep 26 '18 at 16:04

1 Answer


The hash is computed over the file's raw bytes, not over the characters you see on screen.

Most likely there were invisible changes, such as adding a missing end-of-line character or replacing every UNIX-style line ending (\n) with a Windows-style one (\r\n). Since you opened binary data in a text editor, the editor may also have removed or replaced bytes that make no sense as text (such as invalid UTF-8 sequences).
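A rough sketch of both effects follows. It does not claim to reproduce what Notepad actually does. The `binary` value is a made-up stand-in for PDF bytes, and the decode/re-encode round trip is only an assumed model of a text editor loading and saving a file.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of data as uppercase hex."""
    return hashlib.sha1(data).hexdigest().upper()


# Made-up bytes: a text-like header followed by bytes that are not valid UTF-8.
binary = b"%PDF-1.4\n\xff\xfe\x80 stream data\n"

# Assumed editor behaviour 1: decode as text, replacing invalid sequences
# with U+FFFD, then encode again on save.
round_tripped = binary.decode("utf-8", errors="replace").encode("utf-8")

# Assumed editor behaviour 2: convert UNIX line endings to Windows ones.
crlf = binary.replace(b"\n", b"\r\n")

for label, data in [("original", binary),
                    ("round-tripped", round_tripped),
                    ("CRLF", crlf)]:
    print(f"{label:14} {len(data):3d} bytes  {sha1_hex(data)}")
```

In this model both transformations also change the file size. An editor that substitutes one byte for another would keep the size unchanged and still change the hash.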

To see exactly what happened, check whether the file size changed, or compare the binary data directly, for example with a hexdump of each version.
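The comparison suggested above could be sketched as follows. `first_difference` and `describe` are hypothetical helper names, and the two byte strings are made-up examples. For real files you would read the bytes yourself, for instance with `pathlib.Path(...).read_bytes()`.

```python
from typing import Optional


def first_difference(a: bytes, b: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if a == b."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        # One input is a prefix of the other.
        return min(len(a), len(b))
    return None


def describe(a: bytes, b: bytes, context: int = 8) -> str:
    """Summarise the sizes and show hex around the first difference."""
    offset = first_difference(a, b)
    if offset is None:
        return f"identical ({len(a)} bytes)"
    start = max(0, offset - context)
    end = offset + context
    return (f"sizes: {len(a)} vs {len(b)} bytes\n"
            f"first difference at offset {offset}\n"
            f"a: {a[start:end].hex()}\n"
            f"b: {b[start:end].hex()}")


# Made-up example: the same text with UNIX and with Windows line endings.
before = b"line one\nline two\n"
after = b"line one\r\nline two\r\n"
print(describe(before, after))
```

Equal file sizes do not imply equal content, so the byte-level comparison is the more reliable check of the two.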

Steffen Ullrich
  • Added screenshots of files, still the same :/ – Aibek Sep 26 '18 at 04:44
  • @АйбекЖылкайдаров: you've added a screenshot which shows that the file size is the same. This says nothing about any other possible changes I've mentioned, specifically the sanitizing of binary data as displayable text which might happen if you handle **binary** data with a **text** editor not designed for binary data. – Steffen Ullrich Sep 26 '18 at 05:07