Lossless Compression
Lossless compression is where no data is lost. Everything that is entered can be retrieved perfectly. This works well for text or binary files where the smallest error will be noticed.
File compression works by taking the file and scanning for patterns, and translating those patterns into something else which takes up less space.
For example "AAAAAAAA" could be turned into "8A".
Granted that's not how it works exactly because then you have the problem what if "8A" was in the plaintext. You would uncompress the file and it would be wrong. A good place to start is either Wikipedia or the LZW Data Compression Algorithm.
There is some simply psuedo-code for this copied below:
STRING = get input character
WHILE there are still input characters DO
CHARACTER = get input character
IF STRING+CHARACTER is in the string table then
STRING = STRING+character
ELSE
output the code for STRING
add STRING+CHARACTER to the string table
STRING = CHARACTER
END of IF
END of WHILE
output the code for STRING
All compression uses a lookup dictionary which is used to compress and decompress the file. The bigger the dictionary, the more you can compress it, although you do run into the Law of Diminishing Returns.
It's also worth noting that compression doesn't always yield a smaller file. There are situations (with small files, or when compressing random data) that you will not get a smaller file after compression. There have been some fun challenges pertaining to the ability to compress random data.
"Lossy" Compression
The above mostly pertains to lossless compression. Other types of compression used in video / audio applications such as MP3, JPG, and h.264 are examples lossy compression.
Lossy compression works by discarding data that is least likely to be noticed. In audio this is sounds about 30,000 Hrz and below 100 Hrz, along with other various things. In picture (static) it removes various things and merges pixles together, along with discarding data.
Lossy compression is a form of transform coding. It averages out data to reduce the overall size. For example a block of 10 pixels in an image, all of slightly different colors may be merged together to one color and thus compressed.
In video compression often instructions will be placed to only redraw pixels that have changed since the last frame, or keyframe.
1
The Wikipedia article is a good start, but it would be nice to have more specific explanations. Good question (though I was sure we had such question already, but seems that not).
– Gnoupi – 2010-04-18T14:34:30.7902@Gnoupi: Indeed, first thing I did was search, as I was sure there was one here. Apparently not, so I tried to rectify that :P – Phoshi – 2010-04-18T14:47:13.250
2we've got a "what-is" tag for when you post pictures and go "wot izzit??"; i've been noticing a need for a "how-does-it-work" tag, but that's too long and "how-work" sounds dumb. "explain" might do it tho. – quack quixote – 2010-04-18T15:01:24.127
@quack quixote: Ah, thanks. I was looking in the autocomplete for a "plz-send-the-explanation" type tag, but couldn't find one. – Phoshi – 2010-04-18T15:04:45.527
2i've come close to just creating a "how" tag a couple of times... but "explain" is probably better. "tutorial" and "howto" and "beginner" are all semi-applicable but don't quite fit. – quack quixote – 2010-04-18T15:23:58.030