SYSV checksum

The SYSV checksum algorithm is a commonly used legacy checksum algorithm. It was implemented in UNIX System V and is also available through the GNU sum command-line utility (with the -s or --sysv option).

Newer checksum algorithms

The manual page of the GNU sum utility (which uses the BSD checksum algorithm by default) states: "sum is provided for compatibility; the cksum program is preferable in new applications."

Description of the algorithm

The main part of this algorithm simply adds up all bytes of the input into a 32-bit sum. As a result, the algorithm has the characteristics (advantages and disadvantages) of a simple sum:

  • Rearranging the same bytes in a different order (e.g. moving a block of text from one place to another) does not change the checksum.
  • Increasing one byte and decreasing another byte by the same amount does not change the checksum.
  • Adding or removing zero bytes does not change the checksum.

As a result, many common changes to text data are not detected by this method.
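The three blind spots listed above can be illustrated with a minimal Python sketch. The helper name byte_sum and the sample strings are illustrative assumptions, not part of any real utility; the function models only the plain 32-bit byte sum, without the final 16-bit reduction.

```python
def byte_sum(data: bytes) -> int:
    """Plain sum of all bytes, kept to 32 bits (hypothetical helper)."""
    return sum(data) & 0xFFFFFFFF


original = b"hello world"
reordered = b"world hello"                 # same bytes, different order
shifted = b"iello worlc"                   # 'h' + 1 and 'd' - 1
padded = b"hello\x00 world\x00\x00"        # zero bytes inserted

# Each comparison prints True: none of these changes is detected.
print(byte_sum(reordered) == byte_sum(original))
print(byte_sum(shifted) == byte_sum(original))
print(byte_sum(padded) == byte_sum(original))
```

Because the sum depends only on which byte values occur and how often, any edit that preserves the total goes unnoticed.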

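The final step of the algorithm reduces the total sum to a 16-bit number by folding: the upper 16 bits of the sum are added to the lower 16 bits, and any carry out of that addition is folded back in once more.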
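The whole procedure, summation followed by the reduction to 16 bits, might be sketched in Python as follows. This is a sketch modeled on the folding used in the GNU sum implementation rather than a verbatim port; the function name sysv_checksum is an assumption, and the block count that the sum utility prints alongside the checksum is not computed.

```python
def sysv_checksum(data: bytes) -> int:
    """Return a 16-bit SYSV-style checksum of data (illustrative sketch)."""
    # Main part: add up all bytes, keeping the total to 32 bits.
    s = sum(data) & 0xFFFFFFFF
    # Reduction: fold the upper 16 bits into the lower 16 bits...
    r = (s & 0xFFFF) + (s >> 16)
    # ...and fold once more to absorb a possible carry.
    return (r & 0xFFFF) + (r >> 16)


print(sysv_checksum(b"abc"))  # 97 + 98 + 99 = 294
```

For inputs whose byte total stays below 65536, the folding has no effect and the checksum equals the plain sum.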

Sources

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.