When writing to disk (SATA) I've noticed tar seems to take a large performance hit. I'm trying to copy a relatively large .dmg file (556 MB) from my client (OS X) across my local network to my server (Debian) as part of a backup. With the typical methods the results were pretty bad, both in transfer throughput from the client and in I/O on the server.
For I/O monitoring on the server, both `iostat -Ndx 1` and `iotop -oa` were used.
scp: ~18 minutes
throughput on client ~500-540 KB/s
I/O on server ~800-1100 KB/s

    time scp <my_file> user@host:/path/to/dir/
sftp: ~9 minutes (~50% faster)
throughput on client ~1 MB/s
I/O on server ~1500-2000 KB/s
but this can't be scripted, as I used the Cyberduck GUI
More research yielded this post and so I tried the following:
On the Client:
    tar -cf - <my_file> | pv -s $(du -sb <my_file> | awk '{print $1}') | nc -l 8888
On the Server:
    nc <source_ip> 8888 | tar xf -
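As an aside, `du -sb` is a GNU option and isn't available in OS X's BSD `du`. A portable way to get the byte count for `pv -s` is `stat`; a minimal sketch, where the helper name `file_bytes` is my own:

```shell
#!/bin/sh
# Hypothetical helper: print a file's size in bytes on both GNU and BSD
# systems, for use as the pv -s argument. GNU stat uses -c%s; BSD/OS X
# stat uses -f%z, so try one and fall back to the other.
file_bytes() {
  stat -c%s "$1" 2>/dev/null || stat -f%z "$1"
}

# Example client-side usage (same pipeline as above):
#   tar -cf - <my_file> | pv -s "$(file_bytes <my_file>)" | nc -l 8888
```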
NOTE: I dropped pigz from the pipeline, as it seemed to cause throughput from the client to drop to 0 KB/s frequently during transmission.
This yielded the worst results of all: around ~33 minutes, with throughput on the client at ~300-400 KB/s and I/O on the server at ~800-1200 KB/s, which in the last 5 minutes dropped to about ~200 KB/s and ~800 KB/s respectively.
To ensure it was not the network, I modified the server side to

    nc <source_ip> 8888 > /dev/null

and transfer time dropped to ~2 minutes, with client throughput at ~6-7 MB/s.
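The mirror-image sanity check (my addition, not something tried in the thread) is to take the network out of the picture instead and measure raw sequential write speed on the server's disk with `dd`; the sizes below are illustrative:

```shell
# Write a test file in large sequential blocks, fsyncing at the end so the
# timing reflects actual disk writes rather than just the page cache.
# bs/count are illustrative (here 512 KB x 64 = 32 MB); conv=fsync is GNU dd.
dd if=/dev/zero of=/tmp/ddtest bs=512k count=64 conv=fsync
rm -f /tmp/ddtest
```

If this number is far above the ~800-1200 KB/s seen during the transfer, the disk itself is not the bottleneck.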
Through more searching in the man page I decided to raise the blocking factor (`-b`, `--blocking-factor`) to higher values, i.e. 128, 512, 1024, etc., which yielded much better write performance, with `-b1024` being comparable to the redirect test to /dev/null. The man page seems rather dated: it only discusses altering this option in relation to writing to tape and makes no mention of modern media. Is it possible there are negative implications for data integrity in doing this? According to the man page, I assume tar writes the data in blocks of 512 bytes, so modifying it to 1024 × 512, i.e. records of 512 KB, makes me wonder whether the OS would have any issue writing this.
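To make the arithmetic explicit (per the GNU tar manual, a record is blocking factor × 512 bytes):

```shell
# tar's record size is (blocking factor) x 512 bytes.
default_record=$((20 * 512))    # default blocking factor 20 -> 10240 bytes (10 KiB)
big_record=$((1024 * 512))      # -b1024 -> 524288 bytes (512 KiB)
echo "$default_record $big_record"   # -> 10240 524288
```

So `-b1024` makes tar read and write the archive stream in 512 KiB chunks instead of the default 10 KiB.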
EDITED:
originally posted away from my computer, so I have updated the actual commands used, provided more accurate times, and fixed typos. Also tried scp compression as suggested below and included the results.
scp -C: ~17.5 minutes
throughput on client ~500-540 KB/s
I/O on server ~1100-1500 KB/s

    scp -C <my_file> user@host:/path/to/dir/
with modified block size: ~42 seconds
throughput on client and I/O on server ~15 MB/s

client:

    tar --disable-copyfile -cf - <my_file> | pv -s $(du <my_file> | awk '{size = $1 * 512} END {print size}') | nc -l 8888

server:

    nc 10.0.1.28 8888 | tar -b1024 -xf -
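On the data-integrity question: a quick way to convince yourself (a sketch I added; the file names are examples) is to round-trip a file through the same `-b1024` pipe locally and compare checksums:

```shell
# Round-trip a random file through tar with the large blocking factor and
# verify the extracted copy is byte-identical to the original.
mkdir -p /tmp/gr_roundtrip/out
dd if=/dev/urandom of=/tmp/gr_roundtrip/sample.bin bs=64k count=16 2>/dev/null
tar -b1024 -cf - -C /tmp/gr_roundtrip sample.bin \
  | tar -b1024 -xf - -C /tmp/gr_roundtrip/out
cksum /tmp/gr_roundtrip/sample.bin /tmp/gr_roundtrip/out/sample.bin
```

The two checksums should match: the blocking factor only changes how the archive stream is chunked, and tar strips the record padding on extraction.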
Trying your first approach of removing tar had an ETA of about 2.5 hours, which I cancelled about 10 minutes into the transfer. The compression-over-nc approach had basically the same results as the sftp method I tried using Cyberduck, so not a bad solution at all. I read both the doc and Q&A you advised before actually posting my question, but was unsure whether having tar write 512 KB blocks would be an issue, since I don't know the max my server OS/disk would handle (Debian 7, ext4). – johnsoga – 2015-07-08T19:24:58.017

I don't think that tar's block size (default 20 × 512 = 10K) has any relationship with the filesystem's block size (default 4K for ext4). Tar's block size matters when writing to or reading from the archive (the nc stream in your case), i.e. the other end of the pipe from tar's perspective. – Dan Cornilescu – 2015-07-09T03:25:48.230
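Dan's point can be seen directly: the blocking factor changes the archive stream (GNU tar pads it out to a whole number of records), not how ext4 stores the extracted file. A small demo, assuming GNU tar; the file name is my own:

```shell
# A 5-byte file still produces an archive of at least one full record;
# with -b1024 the stream size should be a multiple of 1024 x 512 = 524288
# bytes, because GNU tar pads the final record to the full record size.
printf 'hello' > /tmp/gr_blk_demo
tar -b1024 -cf - -C /tmp gr_blk_demo | wc -c
rm -f /tmp/gr_blk_demo
```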