
I'm looking to transfer data between two logical volumes (LVs) on an HP-UX server. I have several of these transfers to do, some involving mostly binary data (Oracle tablespaces...) and some involving mostly text files (logs...). The used data on each volume is between 100 GB and 1 TB. I will also be changing the filesystem block size from 1 KB to 8 KB on some of these partitions...

Things I'm looking for:

  • Guarantees data integrity
  • Fastest data transfer speed
  • Keeps file ownership and permissions

Right now, I've thought about dd, cp and rsync, but I'm not sure which one is best for this, or how best to use it...

skinp

4 Answers


You don't want to use dd. That is for working on a single file or stream, not on a whole filesystem.

rsync is designed to do what you want, but as the previous poster stated, and as my own tests have shown, it's not the fastest. That's because it works file by file: "OK, I'm looking at file A. Is file A on the destination? If so, is it newer, older, or the same?" and so on. rsync is a bit complicated because it is meant to be run more than once; as the name says, it's for synchronizing two locations.

For the sort of thing you want, I have found a tar copy to be quick, easy, and reliable. tar knows about hard links. tar knows about devices. tar handles almost any situation you'll run into in your filesystem (except really long paths; also, if you're not using GNU tar, be wary of putting a leading / in your pathnames).

Anyway, I've had a 99.98% success rate over the last 20 years doing this:

cd /my/source; tar cf - subdirectory | (cd /destination/path; tar xf -)

...The subdirectory you want to copy will show up in /destination/path.

If you'd like to watch the progress, use "xvf" instead of "xf" in the second half of that pipeline.
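That is, the same pipeline with verbose extraction:

cd /my/source; tar cf - subdirectory | (cd /destination/path; tar xvf -)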

...my 0.02% of failures have come from really long file paths... :-(

tar will not guarantee file integrity. That said, as long as you don't see any error messages, I've found it to be very reliable, and it keeps permissions and ownership properly.
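If you want an explicit integrity check once the copy finishes, one approach (a minimal sketch using the POSIX find, cksum and diff utilities, with the hypothetical paths from the example above) is to checksum both trees and compare:

cd /my/source && find subdirectory -type f -exec cksum {} \; | sort > /tmp/src.sums
cd /destination/path && find subdirectory -type f -exec cksum {} \; | sort > /tmp/dst.sums
diff /tmp/src.sums /tmp/dst.sums && echo "trees match"

An empty diff means every regular file has the same checksum, size, and relative path on both sides.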

Mike S
  • I recommend using archive mode and also enabling all the special options like the xattr ones so that you really get to keep as much metadata as possible. Also: If you're looking at a btrfs subvolume, you're probably fastest just using `btrfs send` piped to `btrfs receive` – xdevs23 Sep 06 '22 at 20:46

Have a look at this post. Some answers suggest using tar; others suggest rsync. They are talking about copying data between two machines. Your problem is similar, but you need to copy the files locally instead of over the network.

Khaled

I would recommend using rsync, as it has features that specifically address most of your concerns. If you use appropriate options (e.g. the -a option), then all file ownerships, permissions, and times will be preserved. Furthermore, rsync automatically uses checksums to ensure that all transferred files arrive at the destination intact, so data integrity is assured (presuming a successful run).

The only point where rsync may not be optimal is speed, especially when compared to a lighter-weight alternative like cp, but I doubt that you would notice much difference, unless your processing power is very low.
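A minimal sketch of such a run, with hypothetical paths (-a preserves permissions, ownership, and times; -H additionally preserves hard links; check your rsync's man page for platform-specific options):

rsync -aH /my/source/ /destination/path/

Note the trailing slash on the source: it tells rsync to copy the directory's contents rather than the directory itself.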

Steven Monday
  • one issue with rsync is memory: it tries to load ALL file paths from the source into memory before scanning the destination for what needs to be copied. If you are making a disk replica to an empty filesystem, then cp can do the job – Skaperen Feb 14 '15 at 11:53

You basically have three options:

  1. Copy the entire partition/block device
  2. Dump the entire filesystem
  3. Copy the data inside the filesystem

Select one of the three options depending on what you have to back up and the results you want. For your specific case, I think that option 1 (block device copy) coupled with ddrescue is the way to go. Anyway, let's look at the available options.

Case 1: partition copy
PRO: by copying an entire block device, you are sure that nothing was left behind.
CON: working with block devices is less convenient than working with files, and selecting the wrong block device or options can destroy your data.

If you want a binary copy of an entire block device, you have to use dd or a similar tool. Other very useful tools are dcfldd (a hash-capable dd fork) and ddrescue (an even more advanced dd-like tool).
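For example, a raw LV-to-LV copy might look like this (a minimal sketch; the volume group and LV names are hypothetical, the destination must be at least as large as the source, and both filesystems should be unmounted while copying):

dd if=/dev/vg01/lvol1 of=/dev/vg02/lvol1 bs=1024k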

Case 2: filesystem dump
PRO: by copying an entire filesystem, you are sure that all the data and metadata inside it are backed up.
CON: if you have multiple filesystems to back up, you have to do multiple passes (one per filesystem).
A useful tool for dealing with filesystems is FSArchiver. Moreover, many filesystems have integrated utilities for dumping their contents efficiently (e.g. XFS has xfsdump, Ext2/3/4 have dump/restore, and so on).
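As an illustration, a dump can be piped straight into a restore (a sketch assuming the XFS tools and hypothetical mount points; on HP-UX, VxFS ships the analogous vxdump/vxrestore pair):

xfsdump -l 0 - /my/source | xfsrestore - /destination/path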

Case 3: copy the data inside the filesystem
PRO: copying data from inside the filesystem lets you select very specifically what to back up. This ensures fast backup/restore times and small backup images.
CON: you have to know exactly what to back up, and how. Special care is needed for important metadata (e.g. owner, permissions, ACLs, EAs...).
rsync is your best friend here; see the sketch below. Rsnapshot and rdiff-backup are wonderful tools built on top of rsync/librsync, and tar is the swiss-army knife of any Unix sysadmin.
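A minimal metadata-preserving invocation, with hypothetical paths (-A carries ACLs and -X carries extended attributes; both flags require a reasonably modern rsync build, so verify them on your platform):

rsync -aHAX /my/source/ /destination/path/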

shodanshok