My SanDisk USB flash drive shows that 43GB is used when I just copied a 10GB folder after formatting

17

5

I recently bought a SanDisk 128GB USB flash drive.

After formatting the USB flash drive as exFAT, I copied a folder that is around 10GB in size. It contains lots of small files, so the copy took some time.

However, when I look in Windows Explorer after copying the folder, it says that around 43GB of the storage is occupied and only 70GB is now free to use.

What is happening and how should I deal with it? Is my USB flash drive physically broken?

It is still weird, because when I copied a single 7 GB file, it showed the remaining capacity correctly, at around 110 GB available.

Felix Lee

Posted 2018-10-28T20:16:27.380

Reputation: 199

Question was closed 2018-10-30T02:07:39.050

7If you right click on a small file and go to properties what does it display for "size" and "size on disk" – Scott Chamberlain – 2018-10-28T20:24:59.503

20

You said a 10GB file in the title but actually copied a 10GB folder of small files. They're completely different. If your cluster size is 4KB and your files are 1KB on average, then it'll obviously take 40GB on disk. By default, the allocation unit size of exFAT is much larger than that of other file systems.

– phuclv – 2018-10-29T01:58:25.783

Why did you do that? Do you have to use the USB stick with something other than laptops, desktops and similar hardware? AFAIK only some car radios & such do not support NTFS or a similar alternative... – Bakuriu – 2018-10-29T21:45:56.447

I don't think folders have a "capacity" (some maximum they can hold), except maybe a number of files. What do you mean? – jpmc26 – 2018-10-30T02:03:42.283

Answers

56

You already answered your own question: There are lots of small files in it

Every file on an exFAT volume occupies at least one allocation block. So a file a single byte in size takes at least 4K - a size amplification of 4096:1. You are seeing a size amplification of 4.3, which is very plausible with lots of small files.
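As a rough sketch of that rounding (Python; the 4 KB cluster size and ~1 KB average file size are assumptions chosen for illustration, roughly in line with the observed amplification, not measured values):

```python
import math

def size_on_disk(file_size: int, cluster_size: int) -> int:
    """A non-empty file occupies a whole number of clusters, so round up."""
    return math.ceil(file_size / cluster_size) * cluster_size

cluster = 4 * 1024                       # assumed 4 KB allocation unit
for file_size in (1, 1024, 4096, 4097):  # a few example file sizes in bytes
    print(file_size, "bytes ->", size_on_disk(file_size, cluster), "bytes on disk")

# ~10 GB of data in ~1 KB files (assumption): every file still costs a full cluster
avg_file, total_data = 1024, 10 * 1024**3
n_files = total_data // avg_file
print("on-disk total:", n_files * size_on_disk(avg_file, cluster) / 1024**3, "GB")
```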

You can check this hypothesis by packing the files with WinRAR using its zero-compression ("store") setting, then copying the resulting single file to the USB stick.
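If WinRAR isn't at hand, the same test can be done with Python's standard zipfile module; ZIP_STORED packs the files into one archive without compressing them (the folder path below is a placeholder):

```python
import zipfile
from pathlib import Path

src = Path(r"C:\path\to\small-files")        # placeholder: the folder with the small files
with zipfile.ZipFile("packed.zip", "w", compression=zipfile.ZIP_STORED) as zf:
    for f in src.rglob("*"):
        if f.is_file():
            zf.write(f, f.relative_to(src))  # store only, no compression
# Copy packed.zip to the USB stick: one big file, so almost no per-file cluster waste.
```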

Eugen Rieck

Posted 2018-10-28T20:16:27.380

Reputation: 15 128

I actually don't understand what you mean. Do you mean this is normal? If I format the drive as NTFS, will the folder then take up a more appropriate amount of space? (I actually have no idea how exFAT works) – Felix Lee – 2018-10-29T01:17:57.557

14It means exactly what it means. Disk space is allocated in increments of approximately 4kb. A one byte file takes up 4kb of disk space. A two byte file takes up the same 4kb of disk space. Ditto for 3 bytes, and up to 4096 bytes. A 4097 byte file takes up 8192 bytes of disk space, and so on (this is ignoring the overhead of creating directory entries). The average size of your files seems to be about 1kb, so you end up using four times as much as the sum total of your data. All filesystems work this way, FAT or NTFS, differing only in block sizes, but some optimizations are occasionally possible. – Sam Varshavchik – 2018-10-29T01:28:25.527

7NTFS is substantially more efficient than any version of FAT at handling lots of small files. If you're only ever going to use this USB drive with full-size computers running Windows, formatting it as NTFS is a perfectly reasonable thing to do. If you planned to plug it into a camera, on the other hand, or an Apple product, they wouldn't be able to read it. – zwol – 2018-10-29T01:53:01.460

Pretty much any file system works like that: dividing the drive into blocks instead of bytes. @zwol NTFS can store files directly inside the MFT entry, but since the entry is only 1KB long, only files that are a few hundred bytes long can be made resident

– phuclv – 2018-10-29T01:55:07.277

3Is it possible the exFAT was gratuitously formatted with a block size much larger than 4k? That could be fixed by reformatting as exFAT with sane options, with no loss in compatibility. – R.. GitHub STOP HELPING ICE – 2018-10-29T03:07:05.570

Zero compression may be the logical choice, but using some compression might be even faster, because the CPU usually compresses data faster than the full-size data can be written to the disk/card. – JIV – 2018-10-29T08:32:58.913

@JIV Of course it would be faster - but it wouldn't be a valid test against the allocation size hypothesis, as the amount of data written would differ very significantly. – Eugen Rieck – 2018-10-29T08:52:51.393

3@zwol Apple products can read NTFS drives. They just can't write to them by default – awksp – 2018-10-29T13:36:40.747

4k blocks is seriously optimistic for exFAT. I recently formatted a 64GB micro-SD for my phone under Windows and only noticed at the very last instant... wtf? 32k block size? – Damon – 2018-10-29T14:25:47.867

Just to make sure, I edited my answer to point out it is at least 4K. – Eugen Rieck – 2018-10-29T14:48:17.330

1After researching more (and writing my own answer) I think this answer is incomplete/misleading without mentioning that exFAT often (esp. by default) has utterly huge block size (allocation unit), not 4k. – R.. GitHub STOP HELPING ICE – 2018-10-29T15:21:13.783

4MSDN says that the default cluster size for an exFAT partition with 128 GB is 128 kB. That will behave very badly with small files. Zipping is your friend here. – Peter - Reinstate Monica – 2018-10-29T15:37:41.403

@awksp Huh, I recall MacOS not having any support at all for NTFS, but the last time I tried it was more than ten years ago, so probably I'm just out of date. Thanks. – zwol – 2018-10-29T15:50:04.517

1@SamVarshavchik "It means exactly what it means" There's no need to be a dick about it. Obviously the OP didn't understand what was actually a pretty poor explanation. It might mean what it means (which is a completely pointless thing to say) but that doesn't mean it's easy to comprehend what it means. – Clonkex – 2018-10-29T22:15:43.753

@zwol Macs are greatly improved, they now have Unix under the hood, the same OS used by that computer in Jurassic Park. I fully expect OS 11 to be LCARS. Or ACARS. I really wouldn't put anything past Apple. – Harper - Reinstate Monica – 2018-10-29T22:27:23.447

Why WinRAR, which asks you to buy a license and requires jumping through hoops to even get a trial, over the open source 7-zip? (7-zip's 7z with LZMA compression format is also more commonly supported, since it's available under LGPL.) – jpmc26 – 2018-10-30T02:05:31.787

@jpmc26 The reason I mentioned WinRAR is that it has an easily GUI-accessible "no compression" setting, which 7z lacks. This way it is possible to test the allocation block hypothesis (one big file vs. many small files) against reality. This is evaluation use of WinRAR - I did not (and do not) endorse it as a normal compression tool. – Eugen Rieck – 2018-10-30T08:12:08.513

@EugenRieck That is simply not true. 7-zip's GUI has an immediately available compression level setting with a "Store" option. tar is also readily available in the formats. – jpmc26 – 2018-10-30T14:42:42.293

As I said - there is no setting called "no compression". Calling it "store" is counterintuitive. But this complete discussion misses the point: To test the hypothesis, try to put all those small files into one big file, but without compression. – Eugen Rieck – 2018-10-30T15:11:51.743

14

When formatting as exFAT, you almost surely chose some large allocation unit (block size) like 128k or 512k. Reformat with the standard 4k allocation units and the problem should go away.
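To check what allocation unit a volume actually ended up with, one option is a quick Python sketch around the Windows GetDiskFreeSpaceW call (the drive letter below is a placeholder):

```python
import ctypes

def cluster_size(root: str) -> int:
    """Return the allocation unit (cluster) size in bytes for a Windows volume."""
    sectors_per_cluster = ctypes.c_ulong()
    bytes_per_sector = ctypes.c_ulong()
    free_clusters = ctypes.c_ulong()
    total_clusters = ctypes.c_ulong()
    if not ctypes.windll.kernel32.GetDiskFreeSpaceW(
        ctypes.c_wchar_p(root),
        ctypes.byref(sectors_per_cluster),
        ctypes.byref(bytes_per_sector),
        ctypes.byref(free_clusters),
        ctypes.byref(total_clusters),
    ):
        raise ctypes.WinError()
    return sectors_per_cluster.value * bytes_per_sector.value

print(cluster_size("E:\\"))  # placeholder drive letter; e.g. 524288 means 512 KB clusters
```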

R.. GitHub STOP HELPING ICE

Posted 2018-10-28T20:16:27.380

Reputation: 1 783

4the default allocation size for a 128GB partition is 128KB – phuclv – 2018-10-29T03:56:26.737

2Yeah, that's a huge problem. Reformat with 4k. – R.. GitHub STOP HELPING ICE – 2018-10-29T05:21:10.897

1For me, the default allocation size was 512KB.. – Felix Lee – 2018-10-29T06:46:55.777

@FelixLee that means even a 1 byte file will take 512 KB. – Captain Man – 2018-10-29T13:56:01.970

So is it the best way for me to format it in 4k? – Felix Lee – 2018-10-29T13:57:02.647

There are three conceivable reasons why MS would choose such large cluster sizes: (1) Prevent fragmentation (the reason they give on their support site); (2) Prevent performance degradation; (3) Prevent administrative information overhead. I'm pretty sure that one of these will bite you when you go to 4 kB; but of course if it's a delete-rarely use case then fragmentation is not an issue, and if it's mostly a backup device then performance is not paramount. – Peter - Reinstate Monica – 2018-10-29T15:42:13.940

@PeterA.Schneider: (1) Preventing fragmentation is possible just by moving files when they change size at the filesystem implementation level; there's no reason to waste huge amounts of space to do it. You can explicitly defrag when needed if the OS doesn't know how to do that. (2) Is only a consequence of (1) and solved the same way. (3) This is likely the reason - they don't want people who just store huge photo and movie files to experience something like 5% overhead and only get 95GB out of their 100GB device. But 5% max is a non-issue compared to 10000% overhead when you store small files – R.. GitHub STOP HELPING ICE – 2018-10-29T15:51:51.073

@R.. Well, fragmentation is mentioned on the MS support page. But fragmentation is not as important for flash memory, and there should be little performance penalty for fragmentation. There may be a penalty for looking up many little clusters though, and USB communication overhead. – Peter - Reinstate Monica – 2018-10-29T16:00:25.110

@FelixLee That will help, but the best way to handle this is to compress them into a single file (zip) since the drive will then only need to allocate enough blocks to contain that file (with very little wasted) rather than the many files, individually, stored within it. – GalacticCowboy – 2018-10-29T17:06:40.033

@PeterA.Schneider: Fragmentation is important if large files are being modified, because the flash read-modify-write units are probably large. – R.. GitHub STOP HELPING ICE – 2018-10-29T17:15:33.440

@GalacticCowboy: That's ridiculous, using another layer of structure on top of the filesystem that prevents individual file access in applications to work around misbehavior of the filesystem. Just tune the settings of the filesystem to behave properly. – R.. GitHub STOP HELPING ICE – 2018-10-29T17:17:37.247

@R.. It's unclear from OP whether that's even a requirement. – GalacticCowboy – 2018-10-29T18:15:55.863

1@GalacticCowboy: It may not presently be a requirement, but it's just general best practices. It may especially become an issue in the event of needing to access the files from a different system or if there's corruption on the drive and OP is trying to recover as much data as possible (good luck if it's nested in an archive file, especially a compressed one). – R.. GitHub STOP HELPING ICE – 2018-10-29T18:21:50.657

@R.. That is an excellent point. – GalacticCowboy – 2018-10-29T18:24:04.890

But I don't understand why Windows sets 512KB as the default for a 128GB drive. – Felix Lee – 2018-10-30T00:48:17.010

@FelixLee: Because Windows does lots of really stupid things, especially when they naively look better to a significant number of people (like the ones who are upset about only getting 95GB of movie files on a 100GB drive). – R.. GitHub STOP HELPING ICE – 2018-10-30T00:56:41.933

Then which size is the best recommended? – Felix Lee – 2018-10-30T04:38:24.373

As I've said several times, 4k. – R.. GitHub STOP HELPING ICE – 2018-10-30T09:26:49.027

7

Why is this happening?

Because you're storing a lot of tiny files.

Filesystems have a minimum amount of space that any file takes up on disk. For NTFS filesystems, it's usually 4KB. For exFAT, it can be much larger. That's called the block or cluster size. Files that are smaller than this size will still use up the minimum size, so a 1KB file might use 4KB of disk space. A 3KB file would also use 4KB of disk space. If you have a 5KB file, it'll use 8KB of disk space.

You can imagine it like a grid of holes. Each hole can hold a certain amount of data. Files are spread across as many holes as necessary to hold all the file's data, but holes can't have data from more than one file. So, if a file's data doesn't completely fill a hole, some of that space is wasted. No other file can use that hole, so the unused space is unavailable.
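To see how much of a drive a folder will actually claim, here is a minimal sketch that walks the folder and rounds every file up to a given cluster size (the path and the 512 KB cluster size are placeholders/assumptions):

```python
import math
import os

def folder_footprint(folder: str, cluster_size: int) -> tuple[int, int]:
    """Return (sum of file sizes, space those files occupy at this cluster size)."""
    logical = on_disk = 0
    for root, _dirs, files in os.walk(folder):
        for name in files:
            size = os.path.getsize(os.path.join(root, name))
            logical += size
            on_disk += math.ceil(size / cluster_size) * cluster_size if size else 0
    return logical, on_disk

# placeholder folder path; 512 * 1024 models a 512 KB exFAT allocation unit
logical, on_disk = folder_footprint(r"C:\path\to\folder", 512 * 1024)
print(f"data: {logical / 1024**3:.1f} GB, occupies: {on_disk / 1024**3:.1f} GB")
```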

What can you do about it?

In your case, you have a lot of files that don't fill the holes, so there's lots of wasted space. If you were to put all the files into a ZIP file, then all that data would be contained in a single file and it would use a lot less space on the drive.

Some USB drives are formatted as exFAT by default, so alternatively, if you're just using this drive to copy files between Windows computers (or just for storage), you could try reformatting the drive as NTFS (but copy all the files off first, obviously!) to try to get a smaller cluster size.

Clonkex

Posted 2018-10-28T20:16:27.380

Reputation: 780

2

As the other answer suggested, use an archiver, but I'll recommend using 7z instead of WinRAR because it's free. You can also avoid installing any third-party archivers if you use Windows' built-in "Send to > Compressed (zipped) folder" option when you right-click files and folders. That built-in option is quicker to reach, but it archives slightly slower than 7z.

In case you need to store mostly JPEG images or something else that doesn't compress at all, you should benefit from using 7z and picking the "no compression" option explicitly.

Using the .zip archive format over .rar or .7z is important because Windows supports browsing it as if it were just any other folder (albeit with some limitations).

If you are okay with not being able to browse files like that on the flash drive, you can use another format, but the important part about the files not taking so much space is having a single archive file instead of all the original files separately.

Wildcard licensee

Posted 2018-10-28T20:16:27.380

Reputation: 49

3If the size grew 4x over nominal size, the vast majority of the files are 1k or smaller. These almost surely aren't jpeg files. – R.. GitHub STOP HELPING ICE – 2018-10-29T03:06:20.943

The other thing is, why would you ever select "no compression"? It'll still be slightly smaller, even if the files are not overly compressible. – Clonkex – 2018-10-29T04:08:08.447

1@clonkex because speed and latency are a thing – PlasmaHH – 2018-10-29T08:18:00.933

2@Clonkex Because compression algorithms are relatively slow and resource intensive by their very nature, if you know it's not going to get a meaningful gain from being compressed with the extra time to compress/decompress the files, why not tell the zipper to skip that step? – Trotski94 – 2018-10-29T09:34:45.270

1@JamesTrotter for ordinary (non-LZMA) compression algorithms, on ordinary machines, the compression code runs faster than the disk can write, so there is no "extra time" — writing the compressed archive is faster because it writes fewer bytes and the disk is the bottleneck. – hobbs – 2018-10-29T15:11:19.980

It hardly needs to be a single ZIP file. 5, 256 or 1000 ZIP files will also substantially solve the problem. – Harper - Reinstate Monica – 2018-10-29T22:16:50.190

The point of using WinRAR was exactly the "no compression" setting - the suggestion was explicitly meant not as compression, but as a way to put a lot of small files into one big file. Compression may or may not help with the type of data in the OQ – Eugen Rieck – 2018-10-29T23:46:29.620