How do I make a program that calculates the checksum of a file? What is the exact process of calculating a checksum?

2 Answers

Checksums are simply numbers that help verify that a particular set of data (a file, for example) is unchanged from the last time it was checked. I can create a message here, compute that its checksum is 42, then send you the message along with that number. You compute the checksum of the message you received, and if you get the same number I told you (42), you have increased confidence that the two messages are identical. We say that the message's integrity was preserved.

But checksums are math operations, so how do they work on things like Word documents and letters and pictures? Keep in mind that a file (or any set of data) is just a collection of bytes to a computer. It doesn't matter if it's a text file, a spreadsheet, or a digital picture; it's just a series of bytes. And all bytes are just numbers. They only mean things like letters and pictures when we use programs to display them that way. (Some common translations of text to numbers you may have heard of include ASCII and Unicode.)

The important thing is that to a computer, any file is just a series of numbers. And that's helpful for making a checksum routine work.
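To see this concretely, here is a tiny Python sketch that prints the numbers behind a piece of text. The string HELLO and the variable names are just illustrative choices.

```python
text = "HELLO"
data = text.encode("ascii")   # the bytes a file containing HELLO would hold
byte_values = list(data)      # each byte viewed as a plain integer
print(byte_values)            # [72, 69, 76, 76, 79]
```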

Creating the simplest kind of checksum can be done by literally adding up each byte of a file, and printing the sum total of all the bytes. If I have a file containing the five bytes that represent HELLO using ASCII, the checksum using this simple approach would be 72+69+76+76+79=372.
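A minimal Python sketch of that byte-adding approach might look like the following. The function name `simple_sum_checksum` is made up for illustration, and unlike most real sum-based checksums it does not truncate the total to a fixed width.

```python
def simple_sum_checksum(path):
    """Add up every byte of the file at `path` (hypothetical helper)."""
    total = 0
    with open(path, "rb") as f:          # binary mode: we want the raw bytes
        while chunk := f.read(65536):    # read in pieces so big files are fine
            total += sum(chunk)          # iterating over bytes yields integers
    return total
```

For a file containing exactly the five ASCII bytes HELLO, this should return 372.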

However, that's not a very good checksum. Remember, we want to make sure the file isn't changed. But if I rearrange those same letters into a different word, like HOLLE or OHELL, the checksum is still 372. I could even replace letters with other letters that add up to the same sum. So most checksum algorithms do something to take into account the order of the bytes. A simple checksum algorithm in very common use today is called the Luhn algorithm. It uses a multiplier that doubles every other digit. It is used on virtually every credit card account number to catch accidental typos when someone enters the credit card number into a computer.
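The rearranged-letters weakness, and one way around it, can be sketched in Python. I'm using Fletcher-16 here only as a small, well-known example of an order-sensitive checksum; it is a different algorithm from Luhn, and the helper names are my own.

```python
def byte_sum(data: bytes) -> int:
    """Plain sum of all bytes: ignores the order of the bytes."""
    return sum(data)

def fletcher16(data: bytes) -> int:
    """Fletcher-16: the second running sum makes byte position matter."""
    sum1 = 0
    sum2 = 0
    for b in data:
        sum1 = (sum1 + b) % 255      # running sum of the bytes
        sum2 = (sum2 + sum1) % 255   # running sum of the running sums
    return (sum2 << 8) | sum1

# Same letters, different order: the plain sums collide.
print(byte_sum(b"HELLO"), byte_sum(b"HOLLE"))          # 372 372
# The order-sensitive checksum tells them apart.
print(fletcher16(b"HELLO") != fletcher16(b"HOLLE"))    # True
```

Even an order-sensitive checksum like this only guards against accidental changes; two different inputs can still produce the same 16-bit value.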

(In the case of the Luhn algorithm, instead of telling you the checksum in a separate message, an extra check digit is appended to the account number so that the weighted digit sum comes out to a multiple of 10. Anyone who wants to validate that the credit card number was entered correctly just adds up all the digits in the appropriate way, takes the sum modulo 10, and ensures the result is zero.)
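As a rough sketch of that validation step, here is one way to write a Luhn check in Python. The function name `luhn_is_valid` is hypothetical, and treating an input with no digits as invalid is my own assumption.

```python
def luhn_is_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn check."""
    digits = [int(c) for c in number if c.isdigit()]  # ignore spaces/dashes
    if not digits:
        return False                                  # assumption: empty is invalid
    total = 0
    # Walk from the rightmost digit (the check digit) to the left.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:        # every second digit, counting from the right
            d *= 2
            if d > 9:         # same as adding the two digits of the product
                d -= 9
        total += d
    return total % 10 == 0

print(luhn_is_valid("79927398713"))   # True  (a commonly used test number)
print(luhn_is_valid("79927398710"))   # False (last digit changed)
```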

So that's how you'll have to make your program. Determine an appropriate checksum algorithm, read a file, compute the checksum as you go, then when you've reached the end of the file, print the checksum and exit.
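Those steps could be sketched in Python roughly as follows. I've picked CRC-32 from the standard `zlib` module as the checksum algorithm purely as an example; the function name `file_crc32` and the chunk size are assumptions of mine.

```python
import zlib

def file_crc32(path, chunk_size=65536):
    """Compute a CRC-32 checksum of a file, reading it piece by piece."""
    checksum = 0                                      # starting value for CRC-32
    with open(path, "rb") as f:                       # raw bytes, not text
        while chunk := f.read(chunk_size):            # stops at end of file
            checksum = zlib.crc32(chunk, checksum)    # fold this chunk in
    return checksum
```

Something like `print(f"{file_crc32('myfile'):08x}")` would then print the result as eight hex digits.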

John Deters

By checksum you are probably referring to hashing.

Hashing libraries are available in practically every programming language. Depending on how you want to use it, you could also do it in a shell script (bash or PowerShell, for example).

All of them do the same thing: take your file as input and spit out the hash. For example, in a shell script:

md5sum myfile
sha256sum myfile

In Python (there is a similar question here on Stack Overflow; please look at the answer, not the example in the question!):

import hashlib
# Open in binary mode; the with-block closes the file when done.
with open('path_to_myfile', 'rb') as f:
    myhash = hashlib.md5(f.read()).hexdigest()
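
The snippet above reads the whole file into memory at once, which may be a problem for very large files. A hedged alternative is to feed the hash in chunks. The function name `file_sha256` and the chunk size are illustrative, and I've used SHA-256 here rather than MD5.

```python
import hashlib

def file_sha256(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):   # empty bytes means end of file
            h.update(chunk)                  # same result as hashing all at once
    return h.hexdigest()
```

The result should match what `sha256sum myfile` prints for the same file.
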
Sas3