tar performance writing to disk


When writing to disk (SATA), I've noticed that tar seems to take a large performance hit. I'm trying to copy a relatively large .dmg file (556MB) from my client (OSX) across my local network to my server (Debian) as part of a backup. With the typical methods, the results were pretty bad, both in transfer throughput from the client and in I/O on the server.

For I/O monitoring, both iostat -Ndx 1 and iotop -oa were used on the server.

scp: ~18 minutes, throughput on client ~500KB/s-540KB/s, I/O on server ~800kB/s-1100kB/s (time scp <my_file> user@host:/path/to/dir/)

sftp: ~50% faster at ~9 minutes, throughput on client ~1MB/s, I/O on server ~1500kB/s-2000kB/s, but it can't be scripted as I used the Cyberduck GUI

More research yielded this post and so I tried the following:

On the Client:
(tar -cf - <my_file> | pv -s $(du -sb <my_file> | awk '{print $1}') | nc -l 8888)

On the Server:
(nc <source_ip> 8888 | tar xf -)

NOTE: I dropped pigz because it seemed to cause throughput from the client to drop to 0 Kb/s frequently during transmission.

This yielded the worst results: ~33 minutes, with throughput on the client at ~300-400KB/s and I/O on the server at ~800-1200KB/s. In the last 5 minutes these dropped to about ~200KB/s and ~800KB/s respectively.

To ensure it was not the network, I changed the server command to (nc <source_ip> 8888 > /dev/null) and the transfer time dropped to ~2 minutes, with client throughput at ~6-7MB/s.
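As a rough cross-check of the timings quoted in this question, average throughput is just file size divided by elapsed time. The sketch below uses a hypothetical helper, throughput_kb_s, and assumes 1MB = 1024KB; it is only arithmetic on the reported figures, not a new measurement.

```shell
# Hypothetical helper: average throughput in KB/s from size (MB) and time (s).
# Assumption: 1MB = 1024KB.
throughput_kb_s() {
    awk -v mb="$1" -v s="$2" 'BEGIN { printf "%.0f\n", mb * 1024 / s }'
}

throughput_kb_s 556 1080   # scp, ~18 minutes       -> 527
throughput_kb_s 556 1980   # tar over nc, ~33 min   -> 288
throughput_kb_s 556 42     # tar -b1024, ~42 s      -> 13556
```

The scp figure lands inside the ~500KB/s-540KB/s range observed on the client, which suggests the reported times and rates are at least consistent with each other.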

After more searching in the man page, I decided to raise the blocking factor (-b, --blocking-factor) to higher values, i.e. 128, 512, 1024, etc., which yielded much better write performance, with -b1024 being comparable to the redirect test to /dev/null. The man page seems rather dated and only refers to changing this option in relation to writing to tape; it makes no mention of modern media. Is it possible there are negative implications for data integrity in doing this? According to the man page, I assume tar writes data in blocks of 512 bytes, so with -b1024 it would write 512*1024 bytes, i.e. 512KB at a time, and I don't know whether that would be an issue for the OS to write.
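To make the arithmetic concrete: tar's block is 512 bytes, and the blocking factor is the number of blocks per record, so the record size is the factor times 512. A minimal sketch, using a hypothetical helper named record_bytes:

```shell
# Hypothetical helper: record size in bytes for a given tar blocking factor.
# tar blocks are 512 bytes; a record is <factor> blocks.
record_bytes() {
    echo $(( $1 * 512 ))
}

for b in 20 128 512 1024; do
    echo "-b$b -> $(record_bytes "$b") bytes"
done
# -b20 -> 10240 bytes     (tar's default, 10KB)
# -b128 -> 65536 bytes
# -b512 -> 262144 bytes
# -b1024 -> 524288 bytes  (512KB)
```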


I originally posted this away from my computer, so I have updated the actual commands used, provided more accurate times, and fixed typos. I also tried scp compression (-C) as suggested below and included the results.

scp: ~17.5 minutes throughput on client ~500KB/s-540KB/s I/O on server ~1100kB/s-1500kB/s
(scp -C <my_file> user@host:/path/to/dir/)

With modified block size: ~42 seconds, throughput on client and I/O on server ~15MB/s

client: (tar --disable-copyfile -cf - <my_file> | pv -s $(du <my_file> | awk '{size = $1 * 512} END {print size}') | nc -l 8888)

server: (nc <source_ip> 8888 | tar -b1024 -xf -)
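A note on the pv size estimate: du reports allocated disk blocks (512-byte units by default on OSX), which can differ from the file's actual length. A possible alternative, sketched here with a hypothetical helper named file_bytes, is to count the bytes directly with wc -c, which behaves the same on OSX and Debian:

```shell
# Hypothetical helper: exact length of a file in bytes.
# tr strips the leading spaces that BSD wc prints.
file_bytes() {
    wc -c < "$1" | tr -d ' '
}

# Small demonstration on a throwaway file.
tmp=$(mktemp)
printf 'abcde' > "$tmp"
file_bytes "$tmp"    # prints 5
rm -f "$tmp"

# Possible use on the client (placeholders as in the commands above):
#   tar -cf - <my_file> | pv -s "$(file_bytes <my_file>)" | nc -l 8888
```

The tar stream is slightly larger than the file itself (header plus padding), so pv's percentage would still be an approximation.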


Posted 2015-07-07T17:32:04.237

Reputation: 23



Depending on the content of the file you may get a shorter time in the scp approach if you enable compression by adding the -C option.

For the nc approach you can drop tar from the picture since you only transfer a single file (tar's primary function is to mux/demux multiple dirs and files into/from a single data stream):

nc -l 8888 < <my_file>
nc <source_ip> 8888 > <my_file_copy>

You could try compression with the nc approach as well:

cat <my_file> | gzip - | nc -l 8888
nc <source_ip> 8888 | zcat - > <my_file_copy>

Overall plain nc could be faster than scp since encryption/decryption is out of the picture.

If you still want to use tar then yes, the blocking factor matters a lot. See the docs and this Q&A for example. BTW, tar's block size is 512 bytes, not KB.
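On the data integrity question, one way to gain some confidence is a local round trip: write an archive with the default blocking, read it back with -b1024, and compare the result byte for byte. This is only a sketch under assumptions: a local pipe stands in for nc, 1MiB of random data stands in for the .dmg, and the file names are made up.

```shell
# Scratch directory and sample payload (assumed stand-in for the real file).
work=$(mktemp -d)
head -c 1048576 /dev/urandom > "$work/sample.bin"
mkdir "$work/out"

# Write with tar's default blocking, read with -b1024 as in the question.
tar -cf - -C "$work" sample.bin | tar -b1024 -xf - -C "$work/out"

# Compare the extracted copy with the original.
if cmp -s "$work/sample.bin" "$work/out/sample.bin"; then
    result="identical"
else
    result="different"
fi
echo "$result"
rm -rf "$work"
```

For the real transfer, comparing a checksum of the file on the client and on the server would serve the same purpose.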

Dan Cornilescu

Posted 2015-07-07T17:32:04.237

Reputation: 802

Trying your first approach of removing tar had an ETA of about 2.5 hours, so I cancelled it about 10 minutes into the transfer. The compression over nc approach had basically the same results as the sftp method I tried using Cyberduck, so not a bad solution at all. I read both the doc and the Q&A you mentioned before posting my question, but was unsure whether having tar write 512KB blocks would be an issue, since I don't know the maximum my server OS/disk would handle (Debian 7, ext4) – johnsoga – 2015-07-08T19:24:58.017

I don't think that tar's block size (default 20x512=10K) has any relationship with the filesystem's block size (default 4K for ext4). Tar's block size matters when writing to or reading from the archive (the nc stream in your case) - the other end of the pipe (from tar's perspective). – Dan Cornilescu – 2015-07-09T03:25:48.230