How can I move a bunch of hardlinked files from a Linux box onto my OS X Mac and preserve the hardlinks?

1

I have an rsnapshot backup that I need to move off of a corrupt Linux file system, and I need to preserve the internal hardlinks. I've tried rsync -H, and also a newer rsync build, and neither preserves the hardlinks on OS X.

I tried to get rsync -H working and I've isolated the problem to the mounted file system. I can preserve hard links copying locally (HFS to HFS), but they aren't preserved when I rsync off of an SMB or AFP file system mount. Is there some mount option that would get OS X's rsync to obey -H?
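A quick way to check whether a copy preserved a hard link (a generic sketch, not from the original post) is to compare inode numbers; paths that share an inode are hard links to the same file:

```shell
# Sketch: verify two paths are hard links by comparing inode numbers.
# Uses GNU coreutils `stat -c %i` (Linux); on OS X the equivalent is `stat -f %i`.
tmp=$(mktemp -d)
echo "data" > "$tmp/original"
ln "$tmp/original" "$tmp/link"   # create a hard link

if [ "$(stat -c %i "$tmp/original")" = "$(stat -c %i "$tmp/link")" ]; then
    echo "hard link preserved"
else
    echo "hard link broken"
fi
rm -rf "$tmp"
```

Running this same check on a pair of supposedly-linked files after the copy shows immediately whether -H took effect.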

Any advice would be greatly appreciated.

eatloaf

Posted 2010-12-08T06:37:37.673

Reputation: 224

As an aside: maybe a file system guru that runs into this question needs to know what exact file system that other disk is using? (I can imagine that sharing it through Samba might not expose any hard links, but I'm no expert.) – Arjan – 2010-12-09T10:56:46.817

Answers

1

Since the problem seemed to be OS X's rsync not identifying and preserving hard links from a mounted EXT2 source, I instead succeeded by running an rsync daemon on the source Linux box and using rsync on my Mac to connect to that daemon. It seems to correctly preserve internal hard links this way.

  • To accomplish this you need to have rsync installed on both machines, with one of them running in daemon mode. In my case it was the source.

  • You'll also need to edit rsyncd.conf on the daemon side to define the 'module' (a fancy name for a shared path) that will be the source or target.

  • Finally, you use a modified syntax from the non-daemon side to reference the daemon: user@host::module. So copying from the daemon could be: rsync -rH user@host::module ~/foo (note that -H is still needed on the client side to preserve hard links).
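As a sketch of the setup above (the module name 'backup' and all paths here are examples, not from the original post):

```shell
# On the Linux source box, a minimal /etc/rsyncd.conf might look like:
#
#   [backup]                    # the 'module' name clients reference
#       path = /mnt/rsnapshot   # directory this module exports
#       read only = yes
#
# Start the daemon on the source:
rsync --daemon

# From the Mac, pull the module; -a keeps permissions and timestamps,
# and -H preserves hard links:
rsync -aH user@host::backup /Volumes/Destination/
```

The key point is that the daemon-side rsync walks the EXT2 file system natively, so it can see the shared inodes that an SMB or AFP mount hides from the Mac.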

For more detail, google 'rsync' and/or 'rsync daemon'

eatloaf

Posted 2010-12-08T06:37:37.673

Reputation: 224

Good it's solved! Is running such a daemon at all difficult, or does it just use some out-of-the-box settings? – Arjan – 2010-12-09T17:59:20.990

It is not difficult but does require obtaining a minimal understanding of the .conf file settings. – eatloaf – 2010-12-09T21:36:59.330

If you think others can benefit then please add those details to your answer? Thanks! – Arjan – 2010-12-10T11:29:22.607

I'll add the fundamentals, but it's heavily dependent on the specifics. For example, my file source is a DroboFS, on which I can install the rsync daemon and tweak the config to suit my needs in this case, but that's not information generally relevant to the question I posed. The most useful general advice I can give is to search for "rsync daemon". The manual might be enough for some; others will want to search for a step-by-step guide. – eatloaf – 2010-12-12T03:07:51.800

0

I surely hope there's an easier way. Still, if all else fails:

I've never used it, but the timecopy Python script (meant for faulty Time Machine backups) might help. It's a long script, but its length doesn't seem to be due only to Time Machine specifics, and its support for faulty disks in particular could be useful for your corrupted file system too. From its website:

Using a tool that performs a block-for-block copy will in fact copy the file system error to the new disk, which is of no use at all. What's needed is a way to copy the file system to a new location using traditional file copy. The only problem with that is the Time Machine backups are full of hard links, which will appear as normal files and directories, and performing a simple file copy will result in an enormous waste of disk space.

It supports a --dry-run option, and --verbose outputs nice mkdir, cp, ln and ln -s commands.

The script enforces using the Time Machine Backups.backupdb file structure. It seems to me that changing srcdb = os.path.join(srcbase, 'Backups.backupdb') into srcdb = srcbase, and also changing dstdb = os.path.join(dstbase, 'Backups.backupdb') into dstdb = dstbase, might make this usable for non-TM sources.

It then processes each sub folder of the source folder, expecting each to be a machine name, being the root of all backups for that machine (typically one, unless the disk is used for multiple computers). Within each sub folder, it processes everything except for files named .DS_Store, Latest or ending with .inProgress. But: it does not expect the sub folders of the source folder to be hard links themselves. If you do have hard links in the source folder, then maybe you can mount the disk with an extra folder name. Like: use /Volumes/my/mount rather than /Volumes/mount, and then run timecopy for the source folder /Volumes/my.
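The traversal described above might be sketched like this (hypothetical Python, based on this description rather than timecopy's actual code):

```python
import os

# Names the script is described as skipping within each machine folder.
SKIP_NAMES = {".DS_Store", "Latest"}

def machine_dirs(src):
    """Each subfolder of the source is treated as a machine name."""
    return sorted(d for d in os.listdir(src)
                  if os.path.isdir(os.path.join(src, d)))

def backup_entries(machine_dir):
    """Within a machine folder, process everything except .DS_Store,
    Latest, and anything ending in .inProgress."""
    return sorted(e for e in os.listdir(machine_dir)
                  if e not in SKIP_NAMES and not e.endswith(".inProgress"))
```

This also makes the extra-folder trick concrete: with the disk at /Volumes/my/mount, passing /Volumes/my as the source means "mount" is treated as the machine name, so any hard links one level down are handled by the per-entry loop rather than the top-level directory scan.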

Finally, it will also create a symbolic link named Latest, just as a Time Machine disk would, for the most recent sub folder. You can of course delete that afterwards.

You can then still do a --dry-run first, or maybe the output of --verbose --dry-run can help you build a script that you can use in some other way?

Arjan

Posted 2010-12-08T06:37:37.673

Reputation: 29 084

Thanks for the tip. I tried it, and it requires an actual Time Machine backup as the source, and I'm not confident enough to hack the code and trust it with backups. Though if someone else could modify it into a general-purpose copier, that would be very much appreciated. – eatloaf – 2010-12-08T07:21:26.120

Thanks for the suggestion. I tried those replacements and it got further than before but I think it's expecting some explicit Time Machine structure because I don't see it actually linking anything; it's all cp and mkdir commands. It's not done yet but even if it correctly links at the end, I don't have room for it to duplicate everything and then clean up redundant files. – eatloaf – 2010-12-09T04:57:11.657

I tried to get rsync -H working and I've isolated it to the fs mounted. I can preserve hard links copying locally ( HFS to HFS ) but it doesn't preserve when I try to rsync off of a smbfs or afpfs mount. Is there some mount option solution to getting osx rsync to obey -H? – eatloaf – 2010-12-09T05:20:30.177

@eatloaf, I investigated some more. The timecopy script does not expect the sub folders of the source folder to be hard links themselves. If you do have hard links in the source folder, then maybe you can mount the disk with an extra folder name. Like: use /Volumes/my/mount rather than /Volumes/mount. See my edits. (Still not the best answer to your simple question, I hope. I edited your comment into your question, which also makes your question bump back on the front page.) – Arjan – 2010-12-09T07:32:44.637

That sounds very much like rsnapshot's folder structure: Computer/daily.N/... However, when I dry-ran it, I only saw cp and mkdir commands, so I don't see how it would preserve my hard links. I appreciate your help, and will certainly make note of timecopy since I'll be switching from rsnapshot to Time Machine; however, I have found a simpler solution to my current issue, which is to use an rsync daemon on the EXT2 source end rather than mounting it locally. – eatloaf – 2010-12-09T17:15:20.780