18

I soon will have a folder with thousands of files, each file on the order of a few KB. I will need to transfer these across a Windows network from one UNC share to another. In general, is it faster to simply copy the files over en masse, or would it be faster to zip them up (e.g., using 7zip in fastest mode) and send one or a few large files? Or is there no difference in practice?

Dave Cheney
  • 18,307
  • 7
  • 48
  • 56
kestes
  • 183
  • 1
  • 2
  • 5

6 Answers

41

It is faster to transfer a single large file than lots of little files because of the overhead of negotiating each transfer. The negotiation is done per file, so for a single file it only has to happen once, while for n files it has to happen n times.

You will save yourself a lot of time if you zip the files up before the transfer.
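As a rough sketch of that workflow (the share paths and staging folder below are made up, and Python's standard zipfile/shutil modules stand in for 7-Zip):

```python
import shutil
import zipfile
from pathlib import Path

# Hypothetical paths -- substitute your own shares and staging folder.
SOURCE_DIR = Path(r"\\server1\share\many_small_files")
LOCAL_ARCHIVE = Path(r"C:\temp\bundle.zip")
DEST_SHARE = Path(r"\\server2\share")

# 1. Bundle everything into one archive (compresslevel=1 roughly matches a
#    "fastest" mode; the compresslevel argument needs Python 3.7+).
with zipfile.ZipFile(LOCAL_ARCHIVE, "w", zipfile.ZIP_DEFLATED, compresslevel=1) as zf:
    for f in SOURCE_DIR.rglob("*"):
        if f.is_file():
            zf.write(f, f.relative_to(SOURCE_DIR))

# 2. One large copy across the network instead of thousands of small ones.
remote_archive = DEST_SHARE / LOCAL_ARCHIVE.name
shutil.copy2(LOCAL_ARCHIVE, remote_archive)

# 3. Extract the archive (ideally run this step on the destination machine,
#    so the many small writes happen on a local disk rather than over SMB).
with zipfile.ZipFile(remote_archive) as zf:
    zf.extractall(DEST_SHARE / "many_small_files")
```

The same three steps apply if you drive 7-Zip or another archiver instead; the point is that step 2 is the only part that crosses the network.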

Jon Cahill
  • 646
  • 9
  • 7
  • 1
    http://en.wikipedia.org/wiki/Slow-start also favours large files. – Commander Keen May 19 '09 at 05:34
  • 5
    Consider that compression will take time, too. If your data cannot be compressed any further (e.g. JPEGs, ZIPs, JARs and other already-compressed formats) you should only TAR them (or ZIP without compression); see the sketch after these comments. That saves the CPU time otherwise wasted on a pointless attempt to compress your data further. – Daniel Schneller May 19 '09 at 10:19
  • 1
    That many small files will cause you a lot of pain - in between tiny packets and doing an SMB handshake for each one, zipping will probably shave a good 60% off your copy time. – user2278 May 19 '09 at 13:44
  • 1
    +1 for TAR since you can copy/extract partial archive. – Cristian Vat May 30 '09 at 04:42
  • This answer is correct, but on Windows 7 (at least) there is a known bug where copying the exact same set of files on XP is **much** faster than on Windows 7: http://social.technet.microsoft.com/Forums/en-US/w7itproperf/thread/5fa159e3-2053-4d8d-b0a9-11424c7eb196 – tbone Apr 25 '12 at 17:29
  • Note that if the files are really small, a pk zip archive might be bigger than the raw files, since zip stores two copies of file metadata per file, which can add up to between perhaps 80 and 140 overhead bytes per file depending on what "extra" timestamps, uids and other metadata are included. So another archive format might be slightly more efficient. But overall, the networking overheads are probably the biggest issues, so any archive will help. – nealmcb Jul 11 '12 at 23:42
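Picking up Daniel Schneller's point above, a minimal sketch of a store-only archive using Python's zipfile (the paths are placeholders): ZIP_STORED still gives you the single-transfer benefit without spending CPU on data that won't shrink.

```python
import zipfile
from pathlib import Path

SOURCE_DIR = Path(r"\\server1\share\jpegs")  # hypothetical path

# ZIP_STORED bundles the files without compressing them -- useful when the
# data is already compressed (JPEG, ZIP, JAR, ...) and deflate would only
# waste CPU time for little or no size reduction.
with zipfile.ZipFile(r"C:\temp\bundle_stored.zip", "w", zipfile.ZIP_STORED) as zf:
    for f in SOURCE_DIR.rglob("*"):
        if f.is_file():
            zf.write(f, f.relative_to(SOURCE_DIR))
```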
7

Jon Cahill is correct that a single file will be faster. However, it's worth keeping in mind that if there is any instability in the connection, individual files (or medium-sized groups of files in separate zip archives) may be better: if a single large transfer fails you'll have to start all over again, whereas with multiple files you only have to redo the file that was in flight when the transfer failed.
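One hedged sketch of that middle ground, batching the small files into several moderately sized zip archives so a failed transfer only costs one batch (the paths and the ~100 MB cap are arbitrary choices, and Python's zipfile stands in for whatever archiver you prefer):

```python
import zipfile
from pathlib import Path

SOURCE_DIR = Path(r"\\server1\share\many_small_files")  # hypothetical path
STAGING = Path(r"C:\temp\batches")
MAX_BATCH_BYTES = 100 * 1024 * 1024  # arbitrary ~100 MB (uncompressed) per archive

STAGING.mkdir(parents=True, exist_ok=True)

batch_index, batch_bytes = 0, 0
zf = zipfile.ZipFile(STAGING / f"batch_{batch_index:03}.zip", "w", zipfile.ZIP_DEFLATED)
for f in sorted(SOURCE_DIR.rglob("*")):
    if not f.is_file():
        continue
    size = f.stat().st_size
    # Start a new archive once the current one is "big enough".
    if batch_bytes and batch_bytes + size > MAX_BATCH_BYTES:
        zf.close()
        batch_index += 1
        batch_bytes = 0
        zf = zipfile.ZipFile(STAGING / f"batch_{batch_index:03}.zip", "w", zipfile.ZIP_DEFLATED)
    zf.write(f, f.relative_to(SOURCE_DIR))
    batch_bytes += size
zf.close()
# Copy the resulting batch_*.zip files across; if one transfer fails,
# only that batch needs to be re-sent.
```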

Glenn Slaven
  • 2,330
  • 2
  • 29
  • 41
2

Lots of little files will also be more expensive to write to the file system than a single large file. For each file, the file system needs to do things like:

  • Check the file name is unique
  • Write out the file table entry

As you get more and more files in a directory this can become quite costly. And each of these steps can add latency to the copy process and slow the whole thing down.
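If you want to see that cost on your own hardware, here is a small, self-contained timing sketch (the file count, sizes and paths are arbitrary, and the absolute numbers will vary a lot by filesystem and disk):

```python
import os
import time
from pathlib import Path

N_FILES = 5000          # arbitrary count of small files
FILE_SIZE = 4 * 1024    # ~4 KB each, roughly matching the question
payload = os.urandom(FILE_SIZE)

def time_many_small(dest: Path) -> float:
    dest.mkdir(parents=True, exist_ok=True)
    start = time.perf_counter()
    for i in range(N_FILES):
        (dest / f"file_{i:05}.bin").write_bytes(payload)
    return time.perf_counter() - start

def time_one_large(dest: Path) -> float:
    dest.mkdir(parents=True, exist_ok=True)
    start = time.perf_counter()
    (dest / "one_large.bin").write_bytes(payload * N_FILES)
    return time.perf_counter() - start

# Each small file needs its own name check and directory/file-table update,
# so the first number is usually noticeably larger than the second.
print("many small files:", time_many_small(Path(r"C:\temp\small_test")))
print("one large file:  ", time_one_large(Path(r"C:\temp\large_test")))
```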

Luke Quinane
  • 717
  • 1
  • 9
  • 20
  • 2
    I guess he's still going to need all the small files in the target system, so he'll probably have to extract the zip later on, i.e. the filesystem will still have to do the work. Sending the large file and unzipping will still be much faster than transferring all the small files over net, though. – BlaM May 19 '09 at 08:08
  • @BlaM, as I said in the answer it all comes down to latency. If network latency is added onto each CreateFile operation the total time could be much longer. If the copy is smart enough to concurrently create files perhaps it wouldn't impact the operation. – Luke Quinane May 19 '09 at 23:51
0

The average packet size relative to average file size is probably critical here. With lots of small files you may find yourself sending out many tiny packets. Tiny packets still incur TCP overhead; you could wind up doubling the amount of traffic as a result.

Modern systems and even relatively ancient ones can send multiple files over a single TCP connection, avoiding the costs of that handshake.
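As a back-of-envelope illustration only (every constant below is an assumed round figure, not a measured value), the fixed per-file packet exchange is a much bigger share of the traffic for a few-KB file than for one large archive:

```python
# Back-of-envelope only: every constant here is an assumed round figure.
HEADER_BYTES = 40      # assumed TCP+IP header bytes per packet
MSS = 1460             # assumed payload bytes per full-size packet
PER_FILE_PACKETS = 6   # assumed extra request/response packets per file
                       # (open, write acknowledgements, close, ...)

def overhead_ratio(payload_bytes: int, n_files: int) -> float:
    """Rough ratio of header/control bytes to useful payload bytes."""
    data_packets = -(-payload_bytes // MSS)  # ceiling division
    control_packets = n_files * PER_FILE_PACKETS
    return (data_packets + control_packets) * HEADER_BYTES / payload_bytes

# A ~3 KB file copied on its own vs. the same data bundled into one archive:
print(f"file-by-file  : {overhead_ratio(3 * 1024, 1):.1%} overhead per file")
print(f"single archive: {overhead_ratio(5000 * 3 * 1024, 1):.1%} overhead")
```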

jldugger
  • 14,122
  • 19
  • 73
  • 129
0

This is just what I've found, but if you want a faster transfer, initiate it from the local computer and copy to a local drive.

I.e. copy \\computer1\myshare to c:\files\myshare; don't use a third computer to copy from \\computer1\myshare to \\computer2\mynewshare.
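A minimal sketch of the "pull to a local drive" version, reusing the example share names above (which are of course placeholders):

```python
import shutil

# Run this on the local machine: pull straight from the remote share to a
# local drive instead of relaying between two remote shares.
shutil.copytree(r"\\computer1\myshare", r"c:\files\myshare")
```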

Tubs
  • 1,194
  • 3
  • 11
  • 19
0

It's also worth remembering that the choice of protocol affects the overall time to complete. For example, using FTP to move files from one host to another can be noticeably faster than using Windows file sharing (of course, things like domain permissions are lost, but in some situations that can be an acceptable trade-off; after all, they would also be lost by zipping/unzipping).
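For instance, a hedged sketch of pushing an already-zipped bundle over FTP with Python's standard ftplib (the host, credentials and archive name are placeholders):

```python
from ftplib import FTP

# Placeholder host, credentials and archive name.
with FTP("fileserver.example.com") as ftp:
    ftp.login("username", "password")
    with open(r"C:\temp\bundle.zip", "rb") as fh:
        # One STOR of one large archive instead of thousands of per-file
        # SMB negotiations.
        ftp.storbinary("STOR bundle.zip", fh)
```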

Rowland Shaw
  • 494
  • 1
  • 9
  • 19