81

I'm archiving data from one server to another. Initially I started an rsync job. It took two weeks just to build the file list for 5 TB of data, and another week to transfer 1 TB of it.

Then I had to kill the job because we needed some downtime on the new server.

It's been agreed that we will tar it up, since we probably won't need to access it again. I was thinking of breaking it into 500 GB chunks, tarring them, and then copying them across over ssh. I was using tar and pigz, but it is still too slow.
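For reference, a rough sketch of the chunked approach I had in mind (the paths, chunk size and host below are just placeholders):

# tar + pigz the tree, cutting the compressed stream into 500 GB parts
tar -cf - -C /data/images . | pigz | split -b 500G - /staging/archive.tar.gz.part-
# then copy the resulting parts across
scp /staging/archive.tar.gz.part-* user@newserver:/archive/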

Is there a better way to do it? I think both servers are on Red Hat. The old server uses ext4 and the new one XFS.

File sizes range from a few KB to a few MB, and there are 24 million JPEGs in 5 TB, so I'm guessing around 60-80 million files for the full 15 TB.

Edit: After playing with rsync, nc, tar, mbuffer and pigz for a couple of days, the bottleneck is going to be the disk IO, as the data is striped across 500 SAS disks and there are around 250 million JPEGs. However, I have now learnt about all these nice tools that I can use in the future.

lbanz
  • 1
    possible duplicate of [linux to linux, 10TB transfer?](http://serverfault.com/questions/149045/linux-to-linux-10tb-transfer) – D34DM347 Sep 09 '15 at 15:40
  • 2
    One option is creating the compressed tar files on an external drive and moving that to the new system. The extra disk will speed up creating the tar files (won't be writing to existing disks in the system, possibly while trying to read 15TB from them) and doesn't tie up the new server. – Brian Sep 09 '15 at 15:47
  • 4
    *Is there a better way to do it?* - Yeah, Windows Server 2012 R2 DFS replication [would prepare that in about 10 hours](http://blogs.technet.com/b/filecab/archive/2013/11/15/using-dfs-replication-clone-feature-to-prepare-100tb-of-data-in-3-days-a-test-perspective.aspx). And it would sync changes, and pick up where it left off after reboots. – TessellatingHeckler Sep 09 '15 at 17:15
  • 27
    @TessellatingHeckler: so you suggest OP migrates from Redhat to Windows before archiving? – Thomas Weller Sep 09 '15 at 22:22
  • 12
    @ThomasWeller They asked "is there a better way?", and there is. I make no recommendation that they use the better way. They're free to use commands in a pipe which can't recover from interruption, won't verify the file content, can't report copy status, can't use previously copied blocks to avoid copying parts of files, has no implicit support for low-priority copying, can't be paused, has no mention of copying ACLs, and needs someone to stay logged in to run it. Anyone else following along, however, might be interested - or prompted to say "x does that on Linux". – TessellatingHeckler Sep 09 '15 at 23:52
  • 1
    @TessellatingHeckler: That sounds a bit like BTRFS send/receive. https://en.wikipedia.org/wiki/Btrfs#Send.2Freceive. I think that can work as a dump/restore but with incremental capability. Some other Linux filesystems also have dump/restore tools that read the data in disk order, not logical directory order (e.g. `xfsdump`). The problem here is that the OP is going from ext4 to XFS, so this isn't an option. (BTW, OP, I'd suggest evaluating BTRFS for use on your server. XFS can handle being used as an object store for zillions of small files, but BTRFS may be better at it.) – Peter Cordes Sep 10 '15 at 03:51
  • It's a little offtopic, but: @PeterCordes I'd be very careful recommending btrfs for production use, yet. Lately I had some data corruption issues related to btrfs and bcache on Ubuntu 14.04. – Fox Sep 10 '15 at 09:28
  • @TessellatingHeckler It is true that these are free commands and don't have any status reporting if there is corruption. Now that you mention it, I think I'm going back to rsync, because I know there might be corruption in our old system from when the temperature threshold was breached. – lbanz Sep 10 '15 at 09:43
  • 1
    @lbanz: ssh encryption, or rsync's gzip compression, might be bottlenecking you. Discussion in comments on http://unix.stackexchange.com/a/228048/79808 has some numbers for compression. – Peter Cordes Sep 10 '15 at 09:49
  • @Fox: From what I've read, if you use BTRFS, it's a good idea to use the latest kernel. They usually fix more bugs than they introduce, and it's still new and improving, so a years-old stable-distro kernel version of BTRFS is not ideal. – Peter Cordes Sep 10 '15 at 10:16
  • @PeterCordes that is why I recommend being careful. Myself being rather a fan of the bleeding edge, I quite understand why some people like long-term-support distros, which tend to stick to an older kernel. So sure, btrfs is maturing at a pretty good pace, but it's not a universal answer, and certainly not without caveats. – Fox Sep 10 '15 at 11:30
  • @Fox: I haven't used BTRFS myself, since XFS is good at what I mostly do. Any comment on whether it's good for a workload like the OP's, where it's *all* small to medium-size files? I know the XFS devs sometimes say on the mailing list that XFS isn't designed to be an object-store, and my impression was BTRFS was designed with that workload as a potential use-case. (And in practice may handle it better than XFS.) Reiserfs is a bad choice for a new FS these day, but it was explicitly designed for using the filesystem as a database. – Peter Cordes Sep 10 '15 at 12:42
  • Speaking of FS-as-object-store, I did some digging when this came up recently, since I was curious. http://unix.stackexchange.com/a/222640/79808 has most of what I found. Traditional-filesystem on RAID5 is a bad choice. One object-store system I looked at did redundancy at an object level, and wanted a separate XFS filesystem on each disk. The difference is subtle but huge. Metadata ops improve, because each CPU can be searching a separate small free-inode map, instead of one giant one, for example. Taking RAID5 out of the picture for small object writes is also huge. – Peter Cordes Sep 10 '15 at 12:49
  • Sounds like a great little use case for BitTorrent Sync to me. https://www.getsync.com/ –  Sep 10 '15 at 20:19
  • 1
    *If you're not going to access it again*, what if you simply removed the drive itself and stored it in an airtight container (Lock&Lock) together with a packet of desiccant and maybe a bit of bubble wrap or padding? If you needed to transfer it, use snail mail or other physical methods. It's usually faster than 17 weeks. I am assuming that the files are on a different drive than the OS. – Aloha Sep 11 '15 at 12:36
  • @TessellatingHeckler LOL, the OP asked for a *better* way, not Windows... nobody genuinely *wants* Windows. – SnakeDoc Sep 11 '15 at 18:27
  • I'm so glad this is tagged [tag:Linux] and not [tag:windows]. I would probably die. – corsiKa Sep 11 '15 at 19:24
  • You might want to stick ice packs on the drives during the transfer as well, to help prevent heat degradation. – alexw Sep 11 '15 at 21:14
  • 1
    @TessellatingHeckler The example (and the link) you reported clearly state that the preseed phase (read: file upload to the new server) is done by DFSR via robocopy. While robocopy **is** very useful, rsync is a better alternative from almost any point of view. – shodanshok Sep 12 '15 at 21:20
  • @lbanz so how much time it took ? – Rahul Patil Sep 16 '15 at 03:53
  • @RahulPatil small files transfer at around 6 MB/s and large files at 150 MB/s. I'm expecting 1-2 months to transfer 15 TB of small files. – lbanz Sep 16 '15 at 08:27

11 Answers

66

I have had very good results using tar, pigz (parallel gzip) and nc.

Source machine:

tar -cf - -C /path/of/small/files . | pigz | nc -l 9876

Destination machine:

To extract:

nc source_machine_ip 9876 | pigz -d | tar -xf - -C /put/stuff/here

To keep archive:

nc source_machine_ip 9876 > smallstuff.tar.gz

If you want to see the transfer rate just pipe through pv after pigz -d!
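For example (this shows decompressed throughput; placing `pv` before `pigz -d` would show the network rate instead):

nc source_machine_ip 9876 | pigz -d | pv | tar -xf - -C /put/stuff/here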

GregL
h0tw1r3
  • 3
    FYI, you can replace `pigz` with `gzip` or remove it altogether, but the speed will be significantly slower. – h0tw1r3 Sep 09 '15 at 17:25
  • 10
    How can this be accepted if OP has already tried `tar` and `pigz`? I don't understand... – Thomas Weller Sep 09 '15 at 22:26
  • 5
    @ThomasWeller where did you get that he's tried `pigz`? From the question it looks like he's only tried `rsync` so far, and was *considering* using `tar` to split and bundle the data. Especially if he hasn't used the `-z`/`--compress` option on rsync, `pigz` could theoretically help significantly. – Doktor J Sep 09 '15 at 22:38
  • 2
    @ThomasWeller yes indeed I already tried tar and pigz but not nc. I was using ssh so it added a lot more overhead. – lbanz Sep 10 '15 at 08:45
  • http://intermediatesql.com/linux/scrap-the-scp-how-to-copy-data-fast-using-pigz-and-nc/ Using nc/pigz seems to score the highest on benchmark too. I was piping it through ssh so it was incredibly slow. – lbanz Sep 10 '15 at 09:07
  • @h0tw1r3 Just to let you know that it is insanely fast. After pressing enter and doing ls, it has already done 1 GB. With rsync, or piping it over ssh, it usually takes 20-30 mins just for 1 GB. The bit I'm worried about is how to verify the data once the transfer has completed. – lbanz Sep 10 '15 at 09:59
  • To verify the data, run the compression step on both sides and compare the result on one side or the other. You'd need to make sure that the files are all in the archive in the same order, which might not be possible. In that case you could (assuming enough space is available) repeat the transfer in reverse to a different location and compare the result using a standard file compare utility. Or, if space is short, transfer to a third (spacious) location from both source and target servers and do the compare there. – David Spillett Sep 10 '15 at 11:30
  • I would propose using [`mbuffer`](http://www.maier-komor.de/mbuffer.html) instead of `nc`. The advantage is the ability to define a local buffer for the transfer. Plus, you have some additional stats. It is being widely used in [zfs dataset transfers](http://everycity.co.uk/alasdair/2010/07/using-mbuffer-to-speed-up-slow-zfs-send-zfs-receive/) for years. – the-wabbit Sep 10 '15 at 12:25
  • @h0tw1r3 looks like this doesn't help either when there are so many tiny jpegs. It's around 24 million jpegs in the folder and pigz is just using 1 core. Whereas if the files are larger, it uses the default 8 cores and is insanely fast. – lbanz Sep 10 '15 at 12:36
  • 1
    For checking you could use `tee` to divert a copy of the tar stream to `sha256sum` (or another checksum/CRC tool) on both source and destination, and then compare the resulting checksum values (see the sketch after these comments). – David Balažic Sep 10 '15 at 13:56
  • @lbanz the speed that `tar` is able to collect small files is likely a disk or filesystem IO problem. The size of files should not make a difference to `pigz` because the data it receives is a `tar` stream, not the individual files. – h0tw1r3 Sep 10 '15 at 15:02
  • 2
    @lbanz that simply means that `tar` isn't producing data fast enough for `pigz` to use much CPU for compression. Reading lots of small files involves many more syscalls, many more disk seeks, and a lot more kernel overhead than reading the same number of bytes of larger files, and it looks like you're simply bottlenecking at a fundamental level. – hobbs Sep 11 '15 at 03:54
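A minimal sketch of the `tee`/`sha256sum` verification suggested in the comments above, assuming bash on both ends for the >() process substitution (the port and checksum file paths are placeholders):

# source: checksum the raw tar stream while compressing and serving it
tar -cf - -C /path/of/small/files . | tee >(sha256sum > /tmp/src.sha256) | pigz | nc -l 9876

# destination: checksum the decompressed stream while extracting
nc source_machine_ip 9876 | pigz -d | tee >(sha256sum > /tmp/dest.sha256) | tar -xf - -C /put/stuff/here

# afterwards, compare /tmp/src.sha256 and /tmp/dest.sha256 on either machine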
21

I'd stick with the rsync solution. Modern (3.0.0+) rsync uses an incremental file list, so it does not have to build the full list before the transfer starts, and restarting it after trouble won't require you to redo the whole transfer. Splitting the transfer per top-level or second-level directory will optimize this even further. (I'd use rsync -a -P and add --compress if your network is slower than your drives.)
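A minimal sketch of the per-directory approach, with the source path and destination host as placeholders:

# one top-level directory at a time, restartable after interruption
rsync -a -P /path/of/small/files/somedir/ user@newserver:/archive/somedir/
# add --compress only if the network, not the disks, is the bottleneck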

Fox
  • I'm using rsync 2.6.8 on the old server, as it is one of those boxes where we're not allowed to install/update anything; the vendor states it voids the warranty. I might update it and see if it is any quicker. – lbanz Sep 10 '15 at 08:43
  • 18
    Find (or build) a statically-linked rsync binary and just run it from your home directory. Hopefully that won't void any warranty. – Fox Sep 10 '15 at 09:09
  • How about `unison`? How does it compare to `rsync`? – Gwyneth Llewelyn Dec 09 '18 at 18:28
15

Set up a VPN (if it's over the internet), create a virtual drive of some format on the remote server (make it ext4), mount it on the remote server, then mount that on the local server (using a block-level protocol like iSCSI), and use dd or another block-level tool to do the transfer. You can then copy the files off the virtual drive to the real (XFS) drive at your own convenience.

Two reasons:

  1. No filesystem overhead, which is the main performance culprit
  2. No seeking, you're looking at sequential read/write on both sides
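A minimal sketch of the copy step, assuming the ext4 source volume is /dev/vg0/images and the iSCSI-attached virtual drive appears locally as /dev/sdz (both device names are placeholders); the source should be unmounted, or be an LVM snapshot, before copying:

# raw block-level copy, with pv in the middle to show throughput
dd if=/dev/vg0/images bs=64M | pv | dd of=/dev/sdz bs=64M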
Giacomo1968
Arthur Kay
  • 3
    Bypassing the filesystem is good. Copying a read-write-mounted filesystem at the block level is a really bad idea. Unmount or mount read-only first. – JB. Sep 10 '15 at 12:38
  • Having a 15TB copy sucks, too. It means the new server needs a minimum of 30. – Arthur Kay Sep 10 '15 at 12:41
  • 4
    If the server is using LVM, one could do a read-only snapshot of the filesystem and copy it instead. Space overhead only for the changes in the filesystem that happen while the snapshot is read. – liori Sep 10 '15 at 17:51
10

If the old server is being decommissioned and the files can be offline for a few minutes, then it is often fastest to just pull the drives out of the old box, cable them into the new server, mount them (back online now) and copy the files to the new server's native disks.

3

Use mbuffer and, if it is on a secure network, you can avoid the encryption step.
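A minimal sketch of what that might look like, with the port, buffer size and host names as placeholders:

# receiver: listen with a 1 GB in-memory buffer to smooth out bursts
mbuffer -I 9876 -m 1G | tar -xf - -C /put/stuff/here

# sender: stream the tar output through mbuffer straight over TCP, no ssh
tar -cf - -C /path/of/small/files . | mbuffer -m 1G -O destination_host:9876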

JamesRyan
3

(Many different answers can work. Here is another one.)

Generate the file list with find -type f (this should finish in a couple of hours), split it into small chunks, and transfer each chunk using rsync --files-from=....
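A rough sketch under assumed paths (the chunk size, list location and destination are placeholders; --files-from implies --relative, so the full source paths are recreated under the destination):

# build the list once, then cut it into manageable pieces
find /path/of/small/files -type f > /tmp/filelist
split -l 1000000 /tmp/filelist /tmp/chunk.

# transfer one chunk at a time; absolute paths in the list are taken relative to /
for c in /tmp/chunk.*; do
    rsync -a --files-from="$c" / user@newserver:/archive/
done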

pts
3

Have you considered sneakernet? With that, I mean transferring everything onto the same drive, then physically moving that drive over.

About a month ago, Samsung unveiled a 16 TB drive (technically, it's 15.36 TB), which is also an SSD: http://www.theverge.com/2015/8/14/9153083/samsung-worlds-largest-hard-drive-16tb

I think this drive would just about do for this. You'd still have to copy all the files, but since you don't have network latency and probably can use SATA or a similarly fast technique, it should be quite a lot faster.

Nzall
2

If there is any chance of getting a high deduplication ratio, I would use something like borgbackup or Attic.

If not, check the netcat+tar+pbzip2 solution and adapt the compression options according to your hardware - check what the bottleneck is (CPU? network? IO?). pbzip2 will nicely span across all CPUs, giving better performance.
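A minimal sketch of the pbzip2 variant of the netcat pipeline, mirroring the accepted answer (port and paths are placeholders):

# source machine: serve the compressed tar stream
tar -cf - -C /path/of/small/files . | pbzip2 -c | nc -l 9876

# destination machine: pull, decompress in parallel, extract
nc source_machine_ip 9876 | pbzip2 -dc | tar -xf - -C /put/stuff/here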

neutrinus
  • lzma (`xz`) decompresses faster than bzip2, and does well on most input. Unfortunately, `xz`'s multithread option isn't implemented yet. – Peter Cordes Sep 10 '15 at 03:40
  • Usually the compression stage needs more horsepower than decompression, so if the CPU is the limiting factor, pbzip2 would result in better overall performance. Decompression shouldn't affect the process, if both machines are similar. – neutrinus Sep 10 '15 at 07:45
  • Yes, my point was it's a shame that there isn't a single-stream multi-thread lzma. Although for this use-case, of transferring whole filesystems of data, `pigz` would prob. be the slowest compressor you'd want to use. Or even `lz4`. (There's a `lz4mt` multi-threaded-for-a-single-stream available. It doesn't thread very efficiently (spawns new threads extremely often), but it does get a solid speedup) – Peter Cordes Sep 10 '15 at 09:14
2

You are using RedHat Linux, so this wouldn't apply, but as another option:

I've had great success using ZFS to hold millions of files as inodes aren't an issue.

If that was an option for you, you could then take snapshots and use zfs to send incremental updates. I've had a lot of success using this method to transfer as well as archive data.
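A minimal sketch of the snapshot/send workflow, with pool and dataset names as placeholders:

# initial full copy
zfs snapshot tank/archive@base
zfs send tank/archive@base | ssh newserver zfs receive backup/archive

# later: send only what changed since the last snapshot
zfs snapshot tank/archive@weekly1
zfs send -i tank/archive@base tank/archive@weekly1 | ssh newserver zfs receive -F backup/archive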

ZFS is primarily a Solaris filesystem, but can be found in illumos (the open source fork of Sun's OpenSolaris). I know there has also been some luck using ZFS under BSD and Linux (using FUSE?), but I have no experience trying that.

  • 3
    There has been a non-FUSE native Linux port of ZFS for quite a while now: http://zfsonlinux.org/ – EEAA Sep 10 '15 at 18:52
1

Start an rsync daemon on the target machine. This will speed up the transfer process a lot.
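A minimal sketch of how that could look (the module name "archive" and the paths are assumptions):

# /etc/rsyncd.conf on the target machine
[archive]
    path = /put/stuff/here
    read only = false

# start the daemon on the target
rsync --daemon

# from the source: the rsync:// syntax talks to the daemon directly, skipping ssh encryption
rsync -a -P /path/of/small/files/ rsync://target_host/archive/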

MadHatter
-1

You can do this with just tar and ssh, like this:

tar zcf - <your files> | ssh <destination host> "cat > <your_file>.tar.gz"

Or, if you want to keep individual files:

tar zcf - <your files> | ssh <destination host> "tar zxf -"
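If the single-threaded gzip behind tar's z flag turns out to be the bottleneck, a variant (assuming pigz is installed on both ends) would be:

tar cf - <your files> | pigz | ssh <destination host> "pigz -d | tar xf -"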