Raw Bitstream vs Container?


I've been researching multimedia formats (although recently, I've been told not to use the word "format", as it's ambiguous).

I learned that a video file is composed of the Raw Bitstream encoded according to some sort of standard, e.g. H.264, then that bitstream is packaged in a Container, e.g. .mp4.

i.e. Raw Bitstream (encoded to a standard protocol) + Container = My Video File

I learned this from another Super User question: What is a Codec (e.g. DivX?), and how does it differ from a File Format (e.g. MPG)?


The answer there also says this:

Until now we've only explained the raw "bitstream", which is basically just really raw video data. You could actually go ahead and watch the video using such a raw bitstream. But in most cases that's just not enough or not practical.

Therefore, you need to wrap the video in a container. There are several reasons why:

-Maybe you want some audio along with the video.

-Maybe you want to skip to a certain part in the video (like, "go to 1:32:20.12").

-Both audio and video should be perfectly synchronized.

-The video might need to be transmitted over a reliable network and split into packets before.

-The video might even be sent over a lossy network (like 3G) and split into packets before.

I just don't understand why a raw bitstream can't be used on its own, or how a container enables all those things. The answer says that containers do enable them, but it doesn't explain how, and that's what I'm getting at.

This is probably because I've never dealt with raw bitstreams in my life. I've always clicked on an .mp4 container file, and it just worked.

Can someone explain the magic of containers and how they augment raw bitstreams?

Anton Paras

Posted 2015-11-21T22:34:27.273

Reputation: 135

Answers


Containers add metadata to one or many “raw bitstreams”. One can imagine the latter as a traditional film roll: a series of images, nothing more. The container would act as the box the film roll is stored in: it adds the title, index positions (scene 2 starts at 03:45), total length and so on.
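To make the "box around the film roll" idea concrete, here is a toy sketch in Python. It is not any real container format; the field names and the layout are invented for illustration. The point is only that a container is structured metadata (title, tracks, a seek index) wrapped around the raw payload:

```python
# Toy illustration (NOT a real container format): the "box" around a raw
# bitstream is just structured metadata plus the payload itself.
container = {
    "title": "My Video",
    "duration_seconds": 5432.5,
    "tracks": ["video/h264", "audio/aac"],
    # Seek index: presentation time (seconds) -> byte offset into the payload.
    "index": {0.0: 0, 225.0: 1_048_576, 450.0: 2_211_840},
    "payload": b"...raw bitstream bytes...",
}

def byte_offset_for(container, seconds):
    """Find the indexed position at or before the requested time."""
    best = max(t for t in container["index"] if t <= seconds)
    return container["index"][best]

# "Go to 05:00" becomes a dictionary lookup, no decoding required.
print(byte_offset_for(container, 300.0))  # 1048576
```

Real containers (MP4's `moov`/`stbl` boxes, Matroska's Cues) store essentially this kind of time-to-offset table, just in a binary layout.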

Pure video without a container can work; durations can obviously be calculated without an index, but it quickly gets impractical — the whole video needs to be decoded to get its total length, as the amount of data needed to store a second of film is not necessarily constant (some codecs even allow for variable frame rates). To skip forward ten seconds, decoding those ten seconds of video would be needed; to skip back ten seconds, re-decoding from the start would be involved, or a running index of what was already seen would need to be kept. Not pretty and not efficient.
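The cost difference can be sketched in a few lines of Python. The frame durations below are made up (simulating a variable frame rate); the contrast is between walking every frame to find a timestamp versus looking it up in a precomputed index, which is exactly the kind of table a container stores:

```python
import bisect

# Hypothetical variable-duration frames: (frame id, duration in ms).
frames = [(f"f{i}", 33 + (i % 7) * 5) for i in range(1000)]

def seek_without_index(frames, target_ms):
    """No index: every earlier frame must be visited (i.e. decoded)."""
    elapsed = 0
    for i, (_, dur) in enumerate(frames):
        if elapsed + dur > target_ms:
            return i
        elapsed += dur
    return len(frames) - 1

# A container precomputes this index once, at muxing time.
cumulative, total = [], 0
for _, dur in frames:
    cumulative.append(total)  # start time of each frame
    total += dur

def seek_with_index(cumulative, target_ms):
    """With an index: seeking is a binary search, O(log n)."""
    return bisect.bisect_right(cumulative, target_ms) - 1

# Both agree on the answer; only the amount of work differs.
assert seek_without_index(frames, 10_000) == seek_with_index(cumulative, 10_000)
```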

So, in the case of streaming, containers streamline operations like seeking or getting total length; no need to pre-download more data than really needed. A watching session can start at half-point, without downloading then decoding the first half.

The same limitations apply to pure audio.

Now, for synchronized audio and video, two distinct data streams are needed. Switching back and forth between reading two files (even if each had its own container) would involve a needless performance hit, and on a loaded computer, might mean that video is ready to be played while audio still waits on the disk. Containers instead chunk data into manageable clusters of short length (a few seconds at most) where video and audio are located next to each other on the storage medium (or on the network). Should transmission happen over a lossy network and packets get lost, the player can easily resume playing at the next cluster, without needing to estimate how much data was lost to keep video and audio in sync.
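Here is a minimal sketch of that interleaving, with invented one-chunk-per-second streams. Each cluster holds both media for a short time window, so the player reads everything it needs for the next couple of seconds from one contiguous region, and a lost cluster costs only that window:

```python
# Two separate streams: ("V", timestamp) for video, ("A", timestamp) for audio,
# one chunk per second here for simplicity.
video = [("V", t) for t in range(10)]
audio = [("A", t) for t in range(10)]

CLUSTER_SECONDS = 2  # how much of each stream one cluster covers

# Interleave: group both streams into short, time-ordered clusters, the way
# containers like Matroska or MP4 (with interleaved samples) lay out data.
clusters = []
for start in range(0, 10, CLUSTER_SECONDS):
    window = lambda c: start <= c[1] < start + CLUSTER_SECONDS
    clusters.append(
        [c for c in video if window(c)] + [c for c in audio if window(c)]
    )

# The first cluster carries both media for seconds 0-1:
print(clusters[0])  # [('V', 0), ('V', 1), ('A', 0), ('A', 1)]
```

If cluster 2 is lost in transit, the player simply starts again at cluster 3, with audio and video still aligned, because each cluster begins at a known timestamp.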

So, containers mostly store redundant information that could be gathered from the "raw bitstreams" themselves, but having it precomputed makes a lot of operations more efficient and adds reliability.

Patrice Levesque

Posted 2015-11-21T22:34:27.273

Reputation: 780