Why are hard drives never as large as advertised?

18

5

From all the hard drives I have bought, they never seem to be as large as the advertised size; from 320 GB down to 290 GB, from 500 GB down to 450 GB, etc. Is there a technical reason for this?

Sam152

Posted 2009-07-15T09:06:15.563

Reputation: 2 052

@DanielRHicks Congratulations! You got more than you bargained for! ;-) – Samir – 2015-07-06T11:47:38.683

5 Your drives are as large as advertised. The operating system just measures them wrong. – endolith – 2010-09-04T20:03:34.123

3 The nontechnical reason, of course, is that the manufacturers will put as large a number as they can possibly justify on the box, to drive sales. It's similar to ads with small print "up to"s on the capabilities and "starting at"s for the prices. – David Thornley – 2009-12-10T15:13:07.297

2 Don't forget that a drive is always specified as unformatted size, and, when formatted, there will be less space available due to format tables, page substitution tables, and the like. – Daniel R Hicks – 2014-05-24T11:41:36.417

(But the 16G stick I have plugged in right now has 16,000,761,856 bytes total, according to Properties.) – Daniel R Hicks – 2014-05-24T11:44:35.143

Answers

29

The technical reason is that the hard drive manufacturers sell you capacities in metric units. So a GB = 1,000,000,000 bytes by the metric system. However, computers measure the drive size in powers of 2. So 1GiB = 1,024MiB, 1MiB = 1,024KiB, etc. What this means is that 1GiB = 1,073,741,824 bytes, a difference of 73,741,824.

So when you install your 1GB (for the sake of example) drive, the OS only sees 0.93GiB, and this is the cause of the discrepancy.

(If you've never seen the abbreviation GiB before, it's a new notation adopted to denote powers of 1024 as opposed to 1000. However, most operating systems will report GiB as GB, confusing this issue even further)
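
The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `advertised_gb_to_gib` is made up for this example, and it assumes the label uses decimal gigabytes (10^9 bytes) while the OS divides by 2^30.

```python
GB = 10**9    # decimal gigabyte, as printed on the box
GIB = 2**30   # binary gibibyte, as many operating systems count

def advertised_gb_to_gib(advertised_gb):
    """Convert a capacity in decimal GB to binary GiB."""
    return advertised_gb * GB / GIB

# Sizes from the example above and from the question.
for size in (1, 320, 500):
    print(f"{size} GB = {advertised_gb_to_gib(size):.2f} GiB")
```

Under these assumptions a 500 GB drive comes out at roughly 465.66 GiB before any file system overhead.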

Kyle Cronin

Posted 2009-07-15T09:06:15.563

Reputation: 7 169

6 Gi => Gibi rather than G => Giga – ChrisF – 2009-07-15T09:12:53.987

@ChrisF: yep, I added an addendum to my post explaining that – Kyle Cronin – 2009-07-15T09:13:32.020

11 And don't get me started on the old "1.44MB" floppy disks. These actually held 1440 * 1024 bytes, using both the 1000 and the 1024 measure simultaneously. It was neither MiB nor MB – R. Martinho Fernandes – 2009-07-15T09:35:09.720

1 Wikipedia has a writeup and chart showing the differences: http://en.wikipedia.org/wiki/Hard_disk_drive#Capacity_measurements – Chris Nava – 2009-12-10T15:39:39.083

1 Apple recently changed the display of disk sizes within MacOSX to use metric values. – Chris Nava – 2009-12-10T15:44:25.597

9

Originally this was the answer to this question (merged) about a 4GB pen drive.

Let's start from the statement: "The human system is based on powers of 10, the binary one on powers of 2."
What follows can give a first answer to your question.

The metric prefixes are powers of 10: 10^3 (1000) is k, 10^6 is M, 10^9 is G...
The binary prefixes are powers of 2 (2^10 = 1024, which is close to 1000 but 2.4% larger).

4000000000/1024/1024/1024  Your 4GB are 4 000 000 000 Bytes
3.72529029846191406250     That becomes around 3.73 GiB
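
The 2.4% figure holds for the kilo prefix only; the gap compounds with every step up. A minimal sketch of that arithmetic (the helper name `prefix_gap_percent` is just for illustration):

```python
# Relative difference between each binary prefix and its decimal counterpart.
PREFIXES = ["k/Ki", "M/Mi", "G/Gi", "T/Ti"]

def prefix_gap_percent(step):
    """Percent by which 1024**step exceeds 1000**step."""
    return (1024**step / 1000**step - 1) * 100

for step, name in enumerate(PREFIXES, start=1):
    print(f"{name}: {prefix_gap_percent(step):.1f}%")
```

At the giga level the difference is already about 7.4%, which is why a 4 GB stick shows up as roughly 3.73 GiB.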

Vendors and Law: Vendors follow the market's rules when laws do not force them to do otherwise. 4 sells better than 3.73. For the same reason, internet providers often speak about bps and let you understand Bps. There is a factor of 8: a Byte (B) is 8 bits (b).

The problem is that the laws exist, but they are not the same in all nations.

The International System, or SI, is the most widely used in the world for commerce and science (it was published in 1960, and at present only the USA, which is adopting it partially, Burma and Liberia have not fully adopted it).
It establishes not only the units of measurement but also the prefixes.

Since the computer world naturally uses a numeric base that is a power of 2 (and not 10, as in the human world), a system of binary prefixes was introduced in 1998. Here is the table. Nowadays we find ourselves in the situation that

the International Electrotechnical Commission (IEC) and several other standards
(NIST...) and trade organizations approved standards and recommendations 
for a new set of binary prefixes that refer unambiguously to powers of 1024

When you read 1GB it should be 1 000 000 000 Bytes;
when you read 1GiB it should be 1 073 741 824 Bytes.

Why "should be" and not "is"? Because it depends on how the legislators of the nation in which the item is produced, and of the nation into which it is imported, adopt the directives of the international commissions and turn them into law.

So keep your eyes well open.

(Also because in several nations the information required by law must be written on an adhesive label. Usually it is so small that you really need to keep your eyes wide open to read it.)



Hastur

Posted 2009-07-15T09:06:15.563

Reputation: 15 043

7

When a drive manufacturer creates a 500 GB capacity drive, it does have a capacity of 500,000,000,000 bytes, and they are sure going to advertise it as such. Computers, being binary devices, prefer powers of two, with a different set of prefixes, so that is what they use for storage space measurement:

1 kibibyte = 2^10 bytes, 1 mebibyte = 2^20 bytes, 1 gibibyte = 2^30 bytes, etc.

For instance, I have a 300 GB drive attached to this machine and Windows displays the following for the capacity:

Capacity:          300,082,855,936     279 GB

300,082,855,936 / 2^30 = ~279. What it is actually showing you is the drive's size in gibibytes, not gigabytes. So, it should read:

Capacity:          300,082,855,936     279 GiB

One might say this is a flaw in Windows, but apparently there is no definitive standard for storage capacity prefix meanings. Lots more good info, including a section on "Consumer confusion", in this Wikipedia article.
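
To make the two readings of the same byte count concrete, here is a small sketch of a formatter that can print either decimal (SI) or binary (IEC) prefixes. The function `format_size` is a hypothetical helper written for this illustration, not something Windows or any library provides.

```python
def format_size(num_bytes, binary=True):
    """Format a byte count with IEC (KiB, MiB, ...) or SI (kB, MB, ...) prefixes."""
    base = 1024 if binary else 1000
    units = (["B", "KiB", "MiB", "GiB", "TiB"] if binary
             else ["B", "kB", "MB", "GB", "TB"])
    value = float(num_bytes)
    for unit in units:
        # Stop at the first unit where the value fits, or at the largest unit.
        if value < base or unit == units[-1]:
            return f"{value:.2f} {unit}"
        value /= base

capacity = 300_082_855_936  # byte count quoted above
print(format_size(capacity, binary=False))  # decimal prefixes
print(format_size(capacity, binary=True))   # binary prefixes
```

The same number of bytes reads as about 300 GB with decimal prefixes and about 279 GiB with binary ones.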

raven

Posted 2009-07-15T09:06:15.563

Reputation: 5 135

4

See this article for an explanation.

Basically, there are two definitions of a "gigabyte". One definition is that 1GB = 1024^3 bytes. This is the definition that the computer reports (for technical reasons).

The other definition (from SI units) is that 1GB = 1000^3 bytes. This is the same as every other metric unit (1 gigameter = 1000^3 meters).

Since the metric definition of a gigabyte is less than what the computer considers a gigabyte, hard drive manufacturers use the metric definition because they can print a larger capacity on the box.

A small amount of space is also used by the file system itself, but most of the missing capacity is from the definition of a gigabyte.

Stephen Jennings

Posted 2009-07-15T09:06:15.563

Reputation: 21 788

4

If you want to be sure about how big it really is, find out what sector size it uses and the total number of sectors. Then multiply these two numbers to get the total size in bytes. This is the true size! In any operating system! It is also referred to as disk capacity.

T = b x S

Where T is the total disk size in bytes,
b is the sector size in bytes,
and S is the total number of sectors.

Number of sectors

You will often find the number of sectors printed on a label on the device itself. If not, then look at the data sheet for your model. This is a document specifying all kinds of technical details about your model. In an Internet connected world, you will find this on the manufacturer's website, either in some kind of table on a web page or as a file you can download (commonly PDF) for study and reference. In the old age (before there was a web), you might have received a printed copy when you purchased the hard disk drive.

Sector sizes

There are two kinds of sectors: physical and logical. Most commonly, the physical sector size is 512 bytes on a standard disk. The sector size is not listed on the label of a modern hard disk drive. To understand why this is, you need to understand the difference between logical and physical sectors. I will try to explain this briefly.

LBA disk

Modern hard disk drives use logical sectors. You will see this referred to as LBA (Logical Block Addressing). In fact, when looking for the total number of sectors on the label, you will see the number of sectors referred to as LBA, so it will say something like LBA: 123456789. This is your total number of sectors. These are the logical sectors on the disk, and they are written to and read from using the LBA addressing method. This method allows the operating system to use a file system formatting (e.g. NTFS, FAT32) with an allocation unit that is bigger than the physical sector size.

[Pictures: WD and Maxtor hard disk drives]

Allocation unit

The allocation unit is similar in concept to a sector size, but it adds some level of flexibility in that you can change its size without changing the size of the physical sector. If you have purchased and installed, and then formatted more than one hard disk drive in your life, then you have undoubtedly come across this term. The most common allocation unit sizes for an NTFS formatted hard disk drive today are 4K, 8K, and 16K. I say "today" because of the disk sizes that hard disk drives are available in these days.

Namely, what allocation unit size is appropriate for one hard disk drive may not be appropriate for another. It depends on how big it is. Smaller ones are better off with smaller allocation unit sizes, and the bigger ones are better off with bigger allocation unit sizes. However, that does not stop you from using a big allocation unit size on a small hard disk drive. On the contrary! Thanks to the logical nature of the allocation unit, it can be set during the formatting process, and it can be set to be bigger than the physical sector. On a small hard disk drive, a big allocation unit tends to give a slight performance increase, at the expense of disk space though.

This is why Microsoft has changed the terminology, from sector size, to allocation unit. This happened several Windows versions back. If I recall correctly, it was with one of the 9x family of Windows that they started using this term.

The allocation unit is then translated and mapped internally to one or several physical sectors on the disk. This task is performed by the drive controller. The controller is the PCB board on the back of the hard disk drive. On the early ATA hard disk drives (now known as Parallel ATA or PATA), the controller board was known as IDE (Integrated Drive Electronics). Historically, the hard disk drives did not always have the controller built into them. Instead, this was a separate interface.

The most common physical sector size on an LBA addressed hard disk drive is 512 bytes. But since around 2010, many new hard disk drives are of the Advanced Format type. This simply means that they use sectors that are bigger than 512 bytes. Currently, the biggest sector size is 4K, or 4096 bytes.

The main point is: the physical sector size on a modern hard disk drive has little to no relevance for the user. The physical sectors are organized into logical sectors and allocation units, and abstracted away from the user. There is even one more layer of abstraction with the Advanced Format disks, because those disks can emulate 512-byte sectors but use 4096-byte physical sectors. For this reason, the sector size is usually not printed on the label of an LBA addressed hard disk drive, and even more so for Advanced Format disks. But they do have physical sector sizes, nevertheless. You will find this detail in the data sheet for each model, or by using a utility program on a running system.
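
As one example of querying a running system, the sketch below reads the sector sizes that Linux exposes under sysfs. It assumes a Linux machine; the function name `read_block_sizes` and the device name "sda" are placeholders for illustration, and other operating systems need different tools.

```python
from pathlib import Path

def read_block_sizes(device, sysfs_root="/sys/block"):
    """Return (logical, physical) sector sizes in bytes for a block device.

    Assumes a Linux system that exposes these values under sysfs.
    """
    queue = Path(sysfs_root) / device / "queue"
    logical = int((queue / "logical_block_size").read_text())
    physical = int((queue / "physical_block_size").read_text())
    return logical, physical

if __name__ == "__main__":
    # "sda" is only an example device name; adjust it for your system.
    try:
        print(read_block_sizes("sda"))
    except OSError as exc:
        print(f"could not read sector sizes: {exc}")
```

On an Advanced Format disk with 512-byte emulation, one would expect the two values to differ (512 logical, 4096 physical).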

CHS disk

This type of disk predates the LBA addressed disk drives. These disks use a method called CHS (Cylinder Head Sector) addressing for reading and writing. The user has direct access to physical sectors. Unlike LBA, there is no sector abstraction layer. The sector size on these disks is almost guaranteed to be 512 bytes. But it could be changed by the user.

Have you ever heard of "low level formatting"? This is where that term stems from. As a result of direct access to physical sectors, it is possible to change the size of the sector. This allows the user to "low level" format the disk, which means re-writing the sectors physically on the disk. This was sometimes useful when there was a problem with the disk. It was a means of refreshing the disk. True low level formatting is no longer possible with modern hard disk drives. This is not to be confused with file system formatting.

[Pictures: Quantum and IBM hard disk drives]

The CHS disks always had the number of Sectors Per Track (SPT) printed on the label, among other details. If there was no mention of sector size, it was assumed to be 512 byte. The other details being number of cylinders and number of heads. Those were the main three. Hence the name, Cylinder Head Sector. There was a good reason for this too. Because on the really early hard disk drives that used CHS addressing, all of these parameters had to be set manually in the system's BIOS setup program. This was part of the installation process! So this was a key piece of information in order to properly install it. As the PC platform evolved, including BIOS enhancements, disk drive and interface innovations, it was possible to just plug in the hard disk drive and the system would detect it and configure it automatically.
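
The three CHS parameters are enough to work out the capacity. A minimal sketch, assuming 512-byte sectors when nothing else is stated; the function name `chs_capacity` and the geometry used in the example are hypothetical and not taken from any label in this post.

```python
def chs_capacity(cylinders, heads, sectors_per_track, sector_size=512):
    """Total capacity in bytes of a disk described by CHS geometry."""
    return cylinders * heads * sectors_per_track * sector_size

# Hypothetical geometry, chosen only to show the calculation.
total = chs_capacity(cylinders=1024, heads=16, sectors_per_track=63)
print(total, "bytes =", total / 1024**2, "MiB")
```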

You may have noticed that I write about these disks in past tense. This is because they are obsolete, and they are (almost) nowhere to be found. Except for technical museums perhaps.

Prefixing byte sizes

Some basics first on measurements:

  • A binary digit (bit) is the smallest unit of measurement in a binary computer. It is either a 1 or a 0. (Or both in a quantum computer.)
  • A bit is abbreviated with a lower case b, or spelled out as bit.
  • The next unit is a byte.
  • A byte is abbreviated with an upper case B, or spelled out as byte.
  • A byte is exactly 8 bits.
  • The next unit is a word, and it is usually just spelled out as word.
  • Word length depends on the processor architecture. It is commonly 8 bit, 16 bit, or 32 bit, or 64 bit.
  • The next unit after that is a multiple of a word, such as a double word or quad word.
  • A double word is abbreviated as Dword or Dw, and a quad word is abbreviated as Qword or Qw.

Those are the basic measurements, but you will not encounter words unless you are a programmer. Disk sizes, partitions and files are measured in bytes. A byte is the most practical measurement to work with. A sector on a disk is a block of bytes. By convention, this is most commonly 512 bytes, which is a power of 2.

2^0 = 1 byte
2^1 = 2 byte
2^2 = 4 byte
2^3 = 8 byte
2^4 = 16 byte
2^5 = 32 byte
2^6 = 64 byte
2^7 = 128 byte
2^8 = 256 byte
2^9 = 512 byte

These smallest byte sizes can be easily expressed with numbers only. But the 20th power of 2 is 1048576, and the 30th power is 1073741824. If this represents bytes, we can use a prefix to express the same value more simply. This is why we have prefixes like kilo, mega and giga. But the problem is that these are the SI (Système International) prefixes that are used in the metric decimal measurement system. Each prefix in this system represents a value that is a power of 10, while a binary computer uses a base of 2 to measure information.

unit 10^0 = 1
kilo 10^3 = 1000
mega 10^6 = 1000000
giga 10^9 = 1000000000

It is for this reason that IEC, an international standards body, has introduced the concept of binary prefixes. The names kilo, mega, giga, and so on, have been slightly changed in this system to reflect that they are to be used with binary measurements.

kibi 2^10 = 1024 = 1024^1
mebi 2^20 = 1048576 = 1024^2
gibi 2^30 = 1073741824 = 1024^3

The names are concatenations of their respective name in the SI system, and the word binary. For instance, kibi, is formed from kilo and binary.

If I say that an object has a mass of 5000 grams, I can express that value with a prefix as 5 kg (kilogram). I am dividing it by a thousand to remove the trailing zeros. Because the value of the prefix is known, a second person doesn't need to ask me how many grams I measured the first time. He simply reverses the process, by taking my notation of 5 kg and multiplying it by a thousand to convert it to grams. Kilo means thousand, so 5 x 1000 = 5000.

The first 30 sectors on a disk are 15360 bytes, if each sector is 512 bytes. To express this more simply, I could divide it by 1000. The result is 15.36 kilobytes, or 15.36 kB. If I were to round it to the nearest whole number, it would be 15 kB. If another person looked at this number, he would assume that 15 kB was the exact measure, and multiply it by 1000 to convert it to bytes. So that would be 15000 bytes, which is not right, because the original measurement was 15360 bytes. On the other hand, if I were to divide 15360 bytes by 1024, I would get exactly 15 KiB! That's kibibytes. No decimal expansion! Since it says "KiB" and not "KB", another person would know to multiply by 1024, and not 1000, to get the original value.
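
The round trip described here can be checked with a few lines of Python. This is only a sketch of the arithmetic in the paragraph above; the variable names are arbitrary.

```python
size = 30 * 512            # 30 sectors of 512 bytes each

kb = size / 1000           # decimal kilobytes
kib = size / 1024          # binary kibibytes

# Round to whole units, then convert back to bytes.
back_from_kb = round(kb) * 1000
back_from_kib = round(kib) * 1024

print(kb, "kB ->", back_from_kb, "bytes")
print(kib, "KiB ->", back_from_kib, "bytes")
```

Only the KiB figure converts back to the original byte count, because the sector size is itself a power of 2.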

Similarly, when a manufacturer prints 8 GB on a device, they are using decimal prefixes. The ones with trailing zero values! So 8 GB is not 8 GiB (gibibyte) or 8 x 2^30, but 8 x 10^9 = 8 000 000 000 bytes. However, Windows is using binary size calculations (powers of 2) with what looks like decimal prefixes (i.e. "GB"). So in Windows, these 8 000 000 000 bytes are divided by 2^30 (or 1024^3) to get 7.450580597 "GB" (in reality GiB). This is rounded to the nearest hundredth, so it will show as 7.45 "GB" in Windows. I keep quoting "GB" because Microsoft should be using GiB for this meaning, not GB. This only adds to an already confusing topic.

Working examples

I will now run through some examples, using the label information from the hard disk drives in the pictures. Let's have a look at the 500 GB disk first.

Capacity: 500 GB
LBA: 976773168
976773168 x 512 = 500107862016 bytes
500107862016 / 1024^3 = 465.761741638 ≈ 466 GiB

So this is 466 GiB, or 466 GB in Microsoft terms (and JEDEC). Note that the result of the division is not a whole number. I believe this is because there are more sectors than the user can use to store data. Some sectors are protected and some are used for re-mapping. Some sectors become bad over time, so this is when the other sectors are used as a reserve. The hard disk drive marks and keeps track of the bad sectors and stops using them.

If you take only the capacity number and convert it to GiB it will look something like this.

500 GB = 500 x 10^9 = 500000000000 byte
500000000000 byte = 500000000000 / 1024^3 = 465.661287308 ≈ 466 GiB

You can see that it's a somewhat smaller number, but it still rounds to 466 GiB. In exact bytes, though, this is closer to how much you can actually use. This way, you don't need to know the sector size. Exact capacity is still calculated using the LBA number and sector size. That's what I will be using in the rest of the examples.

Capacity: 320 GB
LBA: 632672208
632672208 x 512 = 323928170496 bytes
323928170496 / 1024^3 ≈ 302 GiB
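
The same calculation can be written as a short script. This is a sketch that assumes 512-byte logical sectors, as the examples above do; `lba_capacity` and `to_gib` are illustrative names.

```python
def lba_capacity(total_sectors, sector_size=512):
    """T = b x S: total bytes from the sector count and the sector size."""
    return sector_size * total_sectors

def to_gib(num_bytes):
    """Convert a byte count to gibibytes (2**30 bytes each)."""
    return num_bytes / 1024**3

# LBA counts as quoted in the examples above.
for label, lba in (("500 GB", 976773168), ("320 GB", 632672208)):
    total = lba_capacity(lba)
    print(f"{label}: {total} bytes = {to_gib(total):.2f} GiB")
```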

Lastly, here is one of the CHS disks. The basic idea is very similar. The sector size is assumed to be 512 bytes if it's not otherwise indicated. I will look at the Quantum disk. You can do the IBM yourself. The Quantum disk does not say anything about its capacity.

C: 2484
H: 16
S: 63
2484 x 16 x 63 x 512 = 1281982464 bytes
1281982464 bytes = 1281982464 / 1024^2 = 1222.59375 MiB
1281982464 bytes = 1281982464 / 1024^3 = 1.193939209 ≈ 1.19 GiB

There you go! A whopping 1.19 GB! Pardon me! I meant 1.19 GiB! ;-)

Marketing

There is something called "guaranteed sectors". You will find this printed on the label of some hard disk drives, or in their data sheet. This is the result of the ongoing dispute between users/consumers and the storage device vendors. This confusion is still present today, in the age of cloud computing and in a world where solid state disks have become a mainstream technology and are gradually replacing old hard disk drives.

I would say marketing has very little, if anything, to do with this. It is purely a math problem, and it's not a problem with the math itself, but with people. It is all just a big confusion that has been allowed to go on. At the very least, Microsoft should be denoting binary prefixes as KiB, MiB and GiB. Windows is still the main operating system on PCs today.

Samir

Posted 2009-07-15T09:06:15.563

Reputation: 17 919

3

They actually usually are as large as they are advertised, but:

  1. They always (as far as I know) use 1000 instead of 1024 when doing B to KB and so on.
  2. Some small amount of space is used by the file system to keep track of everything.

May be other reasons too, but those are the major ones I know about

Svish

Posted 2009-07-15T09:06:15.563

Reputation: 27 731

3

In the old days of computers every calculation was expensive (in the performance sense). Programmers used all kinds of shortcuts to do as few calculations as possible. One of those tricks was to store the year part of a date as only two digits, which ultimately led to the y2k problem. Another trick was that they defined 1k (kilo) to not mean 1000 as everyone else in the civilised world did, but to mean 1024 instead. This allowed them to cut a few corners when doing size calculations. That habit stuck and is still being used today although computer calculations have become so much cheaper.

The hardware manufacturer is giving you the proper size where K=1000, M=1000000 and G=1000000000. It's the software that's giving you false values.

Software manufacturers are changing their habits nowadays. OSX for example shows the proper size.

Dennis Janssen

Posted 2009-07-15T09:06:15.563

Reputation: 47

@arne.b: A "4GB" flash drive will typically contain a chip with 4,429,185,024 bytes of storage, which is to say 4.125GiB. Because the performance of flash drives is strongly correlated with the amount of slack space, a drive which tried to make 3.999GiB or more of storage available to the user would probably perform much worse than one that tried to make 3.73GiB available, – supercat – 2016-05-26T17:59:11.733

Good to know that they are starting to change. – 09stephenb – 2014-05-24T11:39:24.613

6 I do not think it is correct to attribute the power-of-two-habit to cutting corners. For example, the MBR HD size limit of 2.2 TB (2 TiB) is not at 2.2 TB because someone today (or in the past) cut corners, but because it still nowadays makes sense to use binary format for addresses, and 2^32 512 byte blocks mean 2.2*10^12 bytes. (This also means that it is completely pointless to sell flash drives in sizes that look like powers of two - 4GB, 512GB - because the actual number of bytes is not really near a power of two.) – arne.b – 2014-05-24T12:16:49.400

1 I think you've got the wrong end of the stick... Using SI magnitude units allows the manufacturers to reach what they call "2GB" more cheaply with less hardware... – Basic – 2014-05-24T12:24:18.460

Relevant - http://superuser.com/q/287375/8972 – paradroid – 2014-05-24T12:29:29.350

2 Hard disks and networking tend to use the decimal units and memory related values use binary. – paradroid – 2014-05-24T12:30:32.743

-1

This should clear up others' comments that assume there is a standard and a metric equivalent when referring to hard drive size.

No, we do not use the metric system for data, exactly. I would think of it as “meta-metric” — units that are “next to” actual metric units.

Metric prefixes WERE borrowed to express data sizes: kilo-, mega-, giga-, tera-, peta- etc.

However, SI has no unit for “bit” or “byte”.

And, smaller units, milli-, micro-, and nano- were also borrowed, though not applied to data, but to “processors”. (“Minicomputers” were smaller computers, compared to main-frames. “Microprocessors” and “microcomputers” were much smaller than minicomputers. In neither case was the 1000:1 ratio implied.)

James L

Posted 2009-07-15T09:06:15.563

Reputation: 1