It's all a bit more complicated than the simple answers given so far.
There are two aspects: the machine, and mass storage.
On the Machine:
It depends on the hardware architecture.
On a PC, addressing is by byte, and you can access a byte (8 bits), a word (16 bits), a double word (32 bits), and a quadword (64 bits).
On other architectures you might only have access to some other sized "blob" for the machine data type. For example, on the TMS320C40 you can access 32 bit words, and 8 bit bytes are packed into these words. You can pack the bytes in and out, but it's quite a slow process requiring several machine instructions.
So on that TMS320C40 the C compiler has a native char type that is 32 bits!
(when programming in C, never ASSUME that a char is 8 bits. Read your compiler manual, especially if doing embedded programming).
Things get even more complicated when endian-ness comes into play. There are two common arrangements, little endian and big endian, which describe how bytes are arranged to fit into a larger quantity (normally that machine's native word size). So for example, on a 32 bit machine you might find the bytes arranged like this:
Address X: Byte 0, Byte 1, Byte 2, Byte 3
Address X+4: Byte 4, Byte 5, Byte 6, Byte 7
OR
Address X: Byte 3, Byte 2, Byte 1, Byte 0
Address X+4: Byte 7, Byte 6, Byte 5, Byte 4
(And it gets even more complex because the bits in a byte have endian-ness as well.)
MOSTLY this kind of thing only comes up as a worry for the hardware designers. But if you have to write device drivers and other things that talk to hardware through memory mapped registers, it becomes a big deal.
A simple example can suffice:
Dumping a block of memory at address X might present a stream of bytes:
01 02 03 04 05 06 07 08
BUT dumping that same block from the same address and presenting as 16 bit (hex) integers might present as:
0201 0403 0605 0807
And dumping again from the same address as 32 bit integers in hex might present as:
04030201 08070605
This causes vast amounts of confusion to the uninitiated, because it all depends on the endian-ness, and the method (byte order) used to make bigger quantities out of smaller ones.
Generally high level languages hide this level of gruesomeness, but it can be important for things like overlay data structures, and, again, memory mapped device control registers.
Mass Storage.
Fortunately here, life gets easier.
Just think of your mass storage as a great big bunch of bytes that can be accessed, and the machine will magically take care of it all. The common approach is to think of files as a "stream", where you start at the start and the stream comes rolling in. (This conveniently ignores random access.) The smallest part you can break the file's stream into is a byte.
If a machine wants to store larger quantities (16 bit words, etc), then it may or may not do some level of transform to get that into the bytes that go to the storage.
Caveats.
All of the above is in relation to underlying low level stuff - bytes, words, and so on.
Programs make use of this in all kinds of ways. So for example you will get CHARACTERS represented by bytes if they fit happily into plain ASCII (or even EBCDIC for those with long memories). The modern Unicode character systems may use Wide Characters (generally these are 16 bits), but there are many encoding systems for Unicode. The Wikipedia page on Unicode is pretty instructive.
The convention in C of assuming CHARACTER = BYTE is, these days, misleading and misguided. It's best to think of "char" as a synonym for "byte" - unless your machine / compiler tells you otherwise (see above). GOOD C programs generally define a set of preferred types such as "UINT8" - unsigned 8 bit integer, "SINT8" - signed 8 bit integer, and so on, so that the program becomes as independent as is sensibly possible from the peculiarities of the specific compiler and underlying hardware.
To the specific question: How are characters stored? The answer is: it depends. Frequently, ASCII characters that fit in a byte are stored as a byte. Wide characters are frequently stored as 16 bit words. But Unicode might implement wide characters or any number of other coding systems, in which case characters might occupy anywhere from 1 to about 4 bytes, depending on the character.
This does not answer your question, but based on some of the stuff you are asking you (and several of the people who have provided you answers) need to read this: http://www.joelonsoftware.com/articles/Unicode.html
– ubiquibacon – 2010-10-21T07:24:56.873