What you've calculated is the bitrate for a raw, uncompressed video. You typically won't find these except in research or other specialized applications. Even broadcasters use compressed video, albeit at a much higher bitrate than your typical YouTube video.
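As a quick sanity check, here is the raw-bitrate calculation as a small sketch. The resolution, bit depth and frame rate below are assumed example values, not numbers taken from your question:

```python
def raw_bitrate_bps(width, height, bits_per_pixel, fps):
    """Bits per second of uncompressed video: every pixel of every frame is stored."""
    return width * height * bits_per_pixel * fps

# Assumed example: 1920x1080, 24 bits per pixel (8-bit RGB), 30 frames per second.
bps = raw_bitrate_bps(1920, 1080, 24, 30)
print(f"{bps / 1e6:.1f} Mbit/s")  # roughly 1.5 Gbit/s
```

That figure is far beyond what a typical compressed video of the same size uses, which is the gap the rest of this answer explains.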
So, video quality has a lot to do with how the video was compressed. The more you compress it, the fewer bits it takes per frame, but the worse the quality gets. Now, some videos are much easier to compress than others; in essence, this is why they have a lower bitrate even though they have the same resolution and framerate.
In order to understand why this is, you need to be aware of the two main principles video compression uses. These are called "spatial" and "temporal redundancy".
Spatial redundancy
Spatial redundancy exists in images that show natural content, because neighboring pixels tend to be similar. This is the reason JPEG works so well: it compresses image data by coding blocks of pixels together. In JPEG these blocks are 8 × 8 pixels. Video codecs work on similar blocks, typically 16 × 16 pixels in codecs such as H.264, and call them "macroblocks".
Modern video codecs do the same: they use algorithms similar to JPEG's to compress a frame, block by block. So you don't store bits per pixel anymore, but bits per macroblock, because you "summarize" pixels into larger groups. By summarizing them, the algorithm also discards information that is not visible to the human eye, and this is where most of the bitrate reduction comes from. It works by quantizing the data: the block is transformed into frequency components, and quantization retains the frequencies that are more perceivable and "throws away" those we can't see. The quantization parameter is expressed as "QP" in most codecs, and it's the main control knob for quality: a higher QP means coarser quantization, fewer bits and lower quality.
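To make the quantization step a bit more concrete, here is a minimal sketch on a single row of 8 pixels. It uses a 1-D DCT and a plain step size as a stand-in for QP; real codecs work on 2-D blocks and map QP to step sizes in a codec-specific way, and the pixel values are made up:

```python
import math

def dct_1d(samples):
    """Orthonormal DCT-II of a list of samples (the transform family JPEG uses)."""
    n = len(samples)
    coeffs = []
    for k in range(n):
        total = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, x in enumerate(samples))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        coeffs.append(scale * total)
    return coeffs

def quantize(coeffs, step):
    """Uniform quantization: divide by the step size and round to an integer level."""
    return [round(c / step) for c in coeffs]

def count_nonzero(levels):
    """Number of levels that still need to be stored as something other than zero."""
    return sum(1 for v in levels if v != 0)

row = [52, 55, 61, 66, 70, 61, 64, 73]  # made-up pixel values
coeffs = dct_1d(row)
for step in (1, 10, 40):
    levels = quantize(coeffs, step)
    print(f"step={step:>2}  levels={levels}  non-zero={count_nonzero(levels)}")
```

A larger step never leaves more non-zero levels than a smaller one, and zeros are very cheap to store. That is the trade-off the quality knob controls.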
You can now even go ahead and predict macroblocks from macroblocks that have previously been encoded in the same image. This is called intra prediction. For example, if part of a grey wall was already encoded in the upper left corner of the frame, we can reuse that macroblock within the same frame, for instance for the macroblock right next to it. We just store its difference from the previous one, which saves data. This way, we don't have to fully encode two macroblocks that are very similar to each other.
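Here is a rough sketch of that idea with two tiny 4 × 4 blocks of a grey wall. Note the simplification: real codecs predict from the border pixels of neighboring blocks in several directions, whereas this example just copies the whole left neighbor, following the description above. The block contents and function names are made up for illustration:

```python
def residual(block, prediction):
    """Element-wise difference between the actual block and its prediction."""
    return [[b - p for b, p in zip(b_row, p_row)]
            for b_row, p_row in zip(block, prediction)]

def reconstruct(prediction, resid):
    """What the decoder does: prediction plus stored residual gives the block back."""
    return [[p + r for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(prediction, resid)]

# Already-encoded neighbor (made-up grey wall values).
left_block = [[128, 128, 129, 128],
              [128, 129, 128, 128],
              [129, 128, 128, 128],
              [128, 128, 128, 129]]

# Block we want to encode next; it looks almost the same.
current_block = [[128, 129, 129, 128],
                 [128, 129, 128, 127],
                 [129, 128, 128, 128],
                 [128, 128, 129, 129]]

resid = residual(current_block, left_block)
for row in resid:
    print(row)  # mostly zeros, a few +/-1 values
```

The residual contains only tiny numbers, which take far fewer bits to store than sixteen values around 128.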
Why does the bitrate change for the same image size? Well, some images are easier to encode than others. The higher the spatial activity, the more you actually have to encode. Smooth textures take up fewer bits than detailed ones. The same goes for intra prediction: a frame of a grey wall will allow you to use one macroblock to predict all the others, whereas a frame of flowing water might not work that well.
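You can see the same effect with any general-purpose compressor. The sketch below uses Python's `zlib` purely as a stand-in (it is lossless and not a video codec) on two made-up 64 × 64 greyscale "images": a flat grey wall and pure noise.

```python
import random
import zlib

WIDTH, HEIGHT = 64, 64

# A perfectly flat grey "wall": every pixel has the same value.
flat = bytes([128] * (WIDTH * HEIGHT))

# A maximally "busy" image: random pixel values (fixed seed for repeatability).
rng = random.Random(0)
noisy = bytes(rng.randrange(256) for _ in range(WIDTH * HEIGHT))

flat_size = len(zlib.compress(flat))
noisy_size = len(zlib.compress(noisy))

print(f"raw size:   {WIDTH * HEIGHT} bytes")
print(f"flat image: {flat_size} bytes compressed")
print(f"noisy image: {noisy_size} bytes compressed")
```

Both inputs have the same dimensions, yet the flat one shrinks to a tiny fraction of the noisy one. Video encoders see the same kind of difference between a grey wall and flowing water.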
Temporal redundancy
This exists because a frame following another frame is probably very similar to its predecessor. Mostly, just a tiny bit changes, and it wouldn't make sense to fully encode it. What video encoders do is just encode the difference between two subsequent frames, just like they can do for macroblocks.
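A minimal sketch of that frame difference, using two made-up 6 × 6 frames where a small bright square moves one pixel to the right. Real encoders use motion-compensated prediction rather than a plain subtraction, so treat this as an illustration only:

```python
def make_frame(square_x, size=6):
    """Black frame with a 2x2 bright square whose left edge is at column square_x."""
    frame = [[0] * size for _ in range(size)]
    for y in (2, 3):
        for x in (square_x, square_x + 1):
            frame[y][x] = 200
    return frame

def frame_difference(prev, curr):
    """Per-pixel difference between two frames of the same size."""
    return [[c - p for p, c in zip(p_row, c_row)]
            for p_row, c_row in zip(prev, curr)]

def changed_pixels(diff):
    """Number of pixels whose value differs between the two frames."""
    return sum(1 for row in diff for v in row if v != 0)

frame0 = make_frame(1)
frame1 = make_frame(2)  # the square moved one pixel to the right

diff = frame_difference(frame0, frame1)
print(f"{changed_pixels(diff)} of {len(diff) * len(diff[0])} pixels changed")
```

Only the pixels at the edges of the moving square differ; everything else in the difference is zero and costs almost nothing to store.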
Taking an example from Wikipedia's article on motion compensation, let's say this is your original frame:
[image: original frame]
Then the difference to the next frame is just this:
[image: difference between the original frame and the next frame]
The encoder now only stores the actual differences, not the pixel-by-pixel values. This is why the bits used for each frame are not the same every time. These "difference" frames depend on a fully encoded frame, and this is why there are at least two types of frames for modern codecs:
- I-frames (aka keyframes) — these are the fully encoded ones
- P-frames — these are the ones that just store the difference
You occasionally need to insert I-frames into a video, for example so that a player can start decoding in the middle of the stream or after seeking. Since I-frames are much larger than P-frames, the actual bitrate also depends on the number of I-frames used. Moreover, the more difference in motion there is between two subsequent frames, the more the encoder has to store. A video of "nothing" moving will be easier to encode than a sports video, and will use fewer bits per frame.
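To get a feeling for how much the I-frame spacing matters, here is a back-of-the-envelope sketch. The frame sizes and the 25 fps frame rate are hypothetical round numbers, and it assumes every group of frames is one I-frame followed only by P-frames of equal size:

```python
def average_bits_per_frame(i_frame_bits, p_frame_bits, frames_per_group):
    """Average frame size when each group is one I-frame plus P-frames."""
    p_frames = frames_per_group - 1
    return (i_frame_bits + p_frames * p_frame_bits) / frames_per_group

# Hypothetical sizes: an I-frame of 400 kbit, a P-frame of 50 kbit.
I_BITS, P_BITS, FPS = 400_000, 50_000, 25

for group in (10, 100):
    avg = average_bits_per_frame(I_BITS, P_BITS, group)
    print(f"I-frame every {group:>3} frames: "
          f"{avg:.0f} bits/frame, {avg * FPS / 1e6:.2f} Mbit/s")
```

With these numbers, an I-frame every 10 frames averages 85,000 bits per frame, while one every 100 frames averages 53,500, so the same content ends up at a noticeably different bitrate.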
If video were a sequence of bitmap images. Your math is already off for png/jpg image files. – Daniel Beck – 2012-05-06T18:24:35.350
The two existing answers don't emphasize the salient attribute about video compression: most (if not all) video codecs employ lossy compression. That is, some picture information is discarded when the raw video is compressed and encoded. The amount or degree of discarded and lost image information/detail is determined by a quality factor. As for audio compression, there are both lossy and lossless compression techniques. – sawdust – 2012-05-06T23:19:56.037
@sawdust: They don't? I thought my third paragraph made that fairly clear. Anyway, giving too much information is sometimes not so good; I believe in giving enough to allow the asker to learn more, if desired. Otherwise, I could say your post doesn't emphasize why someone would pick one compressor over another, or why there are so many different methods, etc, etc. – Marty Fried – 2012-05-06T23:48:30.947
@sawdust You're correct, this was somewhat buried in the JPEG part. I added a few more details. – slhck – 2012-05-07T09:55:27.467