
System: Windows 7 32-bit. I am using WinRAR to extract a .rar file, but I think it is the same for any compressed archive. I have:

1) SomeSourceCodeFolder.rar -> extract -> SomeDestCodeFolder

2) SomeSourceCodeFolder -> copy -> SomeDestCodeFolder

The 1st way is in general much faster than the 2nd one. In my case, when copying a 300 MB code folder, Windows estimated 45 minutes (I only waited 5 minutes and didn't bother to let it finish), but extracting the same folder with WinRAR takes only about 45 seconds.

Why is that? Don't compressed files go through decompression plus copying to the file system? Shouldn't that always be slower than copying alone?

Tom

1 Answer


This makes perfect sense for certain hardware: a fast CPU, a slow disk (HDD, not SSD), and only one disk.

The data has to be read and written. The amount of written data is the same in both cases, but reading a compressed file means that less data has to be read. Furthermore, it is usually much faster to read a single big file than to read the same data spread over a directory of many files. The more small files there are, the bigger this effect. You can reduce it by reading the directory structure into the cache first, so that the disk does not have to jump back and forth between the inodes (file metadata) and the data blocks:

# Edit: This works on Unix only (-printf is a GNU find option)
find /dir/to/be/copied -printf "" # just read the names
find /dir/to/be/copied -perm 777 -printf "" # -perm forces a stat() of every entry, i.e. reads the inodes
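To put rough numbers on the many-small-files effect, here is a back-of-envelope sketch in shell, with awk doing the arithmetic. It is a simplified model, not a benchmark: the function name `copy_estimate`, the 100 MB/s sequential rate, the 10 ms per head movement, and the assumption of two head movements per source file (metadata, then data) are all hypothetical, and the model ignores caching, fragmentation, and extra seeks on the write side.

```shell
#!/bin/sh
# Rough model of copying a directory tree from one HDD to the same HDD.
# Every parameter is a hypothetical assumption, not a measurement.
#
# copy_estimate TOTAL_MB FILES DISK_MBPS SEEK_MS
#   prints the estimated total seconds and the share spent on head movements
copy_estimate() {
    awk -v mb="$1" -v n="$2" -v rate="$3" -v seek="$4" 'BEGIN {
        # every byte is read once and written once at the sequential rate
        transfer = 2 * mb / rate
        # assumed: two head movements per source file,
        # one to the metadata (inode or MFT record), one to the data blocks
        seeks = n * 2 * seek / 1000
        total = transfer + seeks
        printf "%.0f s total, %.0f%% seeking\n", total, 100 * seeks / total
    }'
}

# The same 300 MB split into more and more files (100 MB/s, 10 ms per seek)
for files in 1 300 30000 300000; do
    printf '%6s files: ' "$files"
    copy_estimate 300 "$files" 100 10
done
```

With these assumed numbers, the same 300 MB goes from a few seconds as one file to roughly ten minutes as 30,000 files, and almost all of that time is head movement rather than data transfer.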

If decompression on the CPU does not slow down the data input (i.e. inflating the data is not slower than reading it from the disk), then extracting is faster than copying.

If you instead copy from an SSD to another device and your CPU is from the stone age, then copying will be faster.
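The condition in the last two paragraphs can be sketched as a small cost model. Reading the compressed archive and inflating it overlap, so the slower of the two sets the pace, while the amount written is the same as for a plain copy. The function name `extract_estimate`, the 5:1 compression ratio (300 MB to 60 MB), and the disk and inflate rates are made-up illustration values, not measurements of WinRAR.

```shell
#!/bin/sh
# Rough model of extracting an archive stored on the same HDD.
# Hypothetical parameters, not measurements. The archive is one file, so it
# is read sequentially; inflating runs on the CPU while the next chunk is
# read, so the slower of reading and inflating determines the input time.
#
# extract_estimate TOTAL_MB COMPRESSED_MB DISK_MBPS INFLATE_MBPS
extract_estimate() {
    awk -v mb="$1" -v cmb="$2" -v rate="$3" -v inflate="$4" 'BEGIN {
        read_s    = cmb / rate     # sequential read of the compressed file
        inflate_s = mb / inflate   # CPU time to produce the uncompressed output
        write_s   = mb / rate      # same amount written as for a plain copy
        bound  = (inflate_s > read_s) ? "CPU" : "disk"
        slower = (inflate_s > read_s) ? inflate_s : read_s
        printf "%.1f s (%s-bound input)\n", slower + write_s, bound
    }'
}

# 300 MB of source code compressed to 60 MB, 100 MB/s disk
extract_estimate 300 60 100 600   # fast CPU: inflates 600 MB/s
extract_estimate 300 60 100 5     # "stone age" CPU: inflates 5 MB/s
```

On an SSD the per-file seek cost of a plain copy nearly vanishes, so the copy costs roughly its transfer time alone, and a slow enough CPU can then make extraction the slower option.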

Hauke Laging
  • Alas, trying to run the Unix version of find on a stock Windows 7 install is probably not going to warm up the file cache. – Doon May 06 '13 at 00:24
  • @Doon Oops, sorry. Too much fast switching between the sites. But there should be a similar effect on NTFS. – Hauke Laging May 06 '13 at 00:56
  • That is 45 sec vs 45 min. You mainly talked about the HDD head jumping between inodes and searching for data. Does that mean the computer spends 95% of the time doing that? And writing everything back to the destination folder on the HDD (which I assume is equal work for extracting or copying) only takes 5% of the time in that case? – Tom May 06 '13 at 01:09
  • @Tom I don't really understand your remark. Reading the file and directory metadata is much faster if it is done for all files at once. This reduces the number of head movements (which kill disk performance). If you have just one huge file in the archive, then this effect disappears completely. The other extreme: if all your files are just 100 bytes in size, then the sequential read speed of the disk probably doesn't matter at all, because the head positioning nightmare determines the speed completely. – Hauke Laging May 06 '13 at 01:15