
I have a folder containing a number of files that have hard links (in the same folder or somewhere else), and I want to de-hardlink these files so that they become independent and changes to their contents won't affect any other file (their link count drops to 1).

Below, I give a solution which basically copies each hard-linked file to another location, then moves it back into place.

However, this method seems rather crude and error-prone, so I'd like to know whether there is a command that will de-hardlink a file for me.

Crude answer:

Find files which have hard links (Edit: to also find sockets etc. that have hard links, use find -not -type d -links +1):

find      -type f -links +1 # files only
find -not -type d -links +1 # files, sockets etc.
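
For reference, to see which other paths point at the same inode as a given file (assuming GNU find; ./some/file is just an example path), -samefile can be used:

# hypothetical example: list every directory entry sharing ./some/file's inode
find / -xdev -samefile ./some/file 2>/dev/null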

A crude method to de-hardlink a file (copy it to another location, then move it back). Edit: as Celada said, it's best to use cp -p below, to avoid losing timestamps and permissions. Edit: create a temporary directory and copy to a file under it, instead of overwriting a temp file; this minimizes the risk of overwriting some data, though the mv command is still risky (thanks @Tobu). Edit: try to create the temporary directory in the same filesystem (@MikkoRantalainen).

#!/bin/sh
# This is unhardlink.sh
set -e
for i in "$@"; do
  # Create the temp directory next to the file, so the mv stays on the same filesystem.
  temp="$(mktemp -d -- "${i%/*}/hardlnk-XXXXXXXX")"
  # Copy the file into the temp dir, move the copy back over the original name, then clean up.
  [ -e "$temp" ] && cp -ip "$i" "$temp/tempcopy" && mv "$temp/tempcopy" "$i" && rmdir "$temp"
done

So, to un-hardlink all hard links (Edit: changed -type f to -not -type d, see above):

find -not -type d -links +1 -print0 | xargs -0 unhardlink.sh
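
For completeness, assuming unhardlink.sh sits in the current directory rather than on your PATH, it also needs to be executable and called with an explicit path:

chmod +x unhardlink.sh
find -not -type d -links +1 -print0 | xargs -0 ./unhardlink.sh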
Suzanne Soy
  • I wouldn't consider that 'crude'. The only way to get that faster is probably doing some trick with the sendfile() system call and unlinking the open source file and rewriting the target in-place. Frankly it's not worth the effort though. – Matthew Ife May 06 '12 at 19:10
  • By 'crude', I mean that, for example, when I ran this command using the `cp -i` switch, it spat out a few messages asking whether it should overwrite `./fileXXXXXX` (the `$temp` file), even though tempfile should give unique file names, so there *must* be some kind of race condition or whatever, and with it the risk of losing some data. – Suzanne Soy May 08 '12 at 13:59
  • It's normal that the file exists; you just created it with tempfile (nb: deprecated in favour of mktemp, but that's not what caused your problem). – Tobu Oct 31 '12 at 21:26
  • @Tobu Thanks, I modified my code to use `mktemp -d` in order to create a temp dir, in which I copy the file using `cp -i`, to avoid accidentally overwriting anything. There's still a possible race condition if we start copying the original file, then something removes it and replaces it with a new file, and we mv the copy over that new file. So this script isn't safe when run on, say, a network share, but it should be OK for de-hardlinking files on a local disk, where we make sure no process is performing modifications while the script works. – Suzanne Soy Nov 01 '12 at 09:42
  • Your `unhardlink.sh` should create the temporary directory inside the same directory that contains the file that needs to be unhardlinked. Otherwise your recursive call may recurse into another filesystem, and you end up moving stuff over filesystem boundaries because your temporary directory is in the current working directory. I guess you could pass `"$(dirname "$i")/hardlink-XXXXXX"` as the argument to mktemp instead. – Mikko Rantalainen Jan 06 '19 at 10:31
  • @MikkoRantalainen Thanks a lot, updated! Note that if the filesystem is some sort of unionfs or a `fuse` filesystem, it might actually dispatch `path/to/hardlink-XXX` to a different physical storage medium than `path/to/original-file`, but there's not much that can be done about that. – Suzanne Soy Feb 08 '19 at 11:12

2 Answers


There is room for improvement in your script. For example, you could add the -p option to the cp command so that permissions and timestamps are preserved across the unhardlink operation, and you could add some error handling so that the temp file is deleted in case of an error. But the basic idea of your solution is the only one that will work: to unhardlink a file, you have to copy it and then move the copy back over the original name. There is no "less crude" solution, and this solution has race conditions if another process is accessing the file at the same time.
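
A minimal sketch of those two tweaks, building on the script from the question (untested, so treat it as an outline rather than a drop-in replacement):

#!/bin/sh
# Sketch only: same loop as in the question, with cp -p and cleanup on error.
set -e
for i in "$@"; do
  temp="$(mktemp -d -- "${i%/*}/hardlnk-XXXXXXXX")"
  trap 'rm -rf -- "$temp"' EXIT      # if anything below fails, remove the temp dir on exit
  cp -p -- "$i" "$temp/tempcopy"     # -p preserves mode, ownership and timestamps
  mv -- "$temp/tempcopy" "$i"        # replace the original name with the independent copy
  rmdir -- "$temp"
  trap - EXIT                        # this file is done; clear the trap
done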

Celada
  • Indeed, I always use cp -a when copying stuff, to preserve everything, recurse and copy symlinks as symlinks. Don't know why I forgot it this time, but after seeing your answer, I understood I had screwed up all my timestamps, and had to (rather painfully) recover them from a backup. – Suzanne Soy May 07 '12 at 19:40

If you want to burn up disk space, and you have a relatively modern version of tar (e.g., what's on Ubuntu 10.04 and CentOS 6), you can play with the --hard-dereference option.

Something like:

$ cd /path/to/directory
$ ls -l *
bar:
total 12
-rw-rw-r-- 2 cjc cjc 2 May  6 19:07 1
-rw-rw-r-- 2 cjc cjc 2 May  6 19:07 2
-rw-rw-r-- 1 cjc cjc 2 May  6 19:07 3

foo:
total 12
-rw-rw-r-- 2 cjc cjc 3 May  6 19:07 1
-rw-rw-r-- 2 cjc cjc 2 May  6 19:07 2
-rw-rw-r-- 1 cjc cjc 2 May  6 19:07 4

(where I had run ln foo/[12] bar)

$ tar cvf /tmp/dereferencing.tar --hard-dereference .
$ tar xvf /tmp/dereferencing.tar
$ ls -l *
bar:
total 12
-rw-rw-r-- 1 cjc cjc 2 May  6 19:07 1
-rw-rw-r-- 1 cjc cjc 2 May  6 19:07 2
-rw-rw-r-- 1 cjc cjc 2 May  6 19:07 3

foo:
total 12
-rw-rw-r-- 1 cjc cjc 3 May  6 19:07 1
-rw-rw-r-- 1 cjc cjc 2 May  6 19:07 2
-rw-rw-r-- 1 cjc cjc 2 May  6 19:07 4

From the man page:

   --hard-dereference
          follow hard links; archive and dump the files they refer to
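
A quick way to verify the result is to re-run the link-count search from the question; after extracting, it should print nothing:

$ find . -not -type d -links +1   # no output means nothing is hard-linked any more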
cjc
  • I suspect there is little tar cannot do. Nice fix. – Joseph Kern May 07 '12 at 00:30
  • I forgot to mention that I didn't have enough disk space to copy everything. Basically, your method is the same as `cp -a --no-preserve=links /path/to/folder /path/to/copy && rm -rf /path/to/folder && mv /path/to/copy /path/to/folder`, if I'm not mistaken. I guess your method would be more efficient, though, because tar would involve fewer disk seeks and hence less thrashing. One could achieve the same with rsync, with even lower performance than the cp method :). – Suzanne Soy May 07 '12 at 19:34
  • To avoid using much extra disk, it might be possible to run something like `tar cvf - --hard-dereference . | tar xf -`, but there might be a race condition that will cause things to explode. I have not tried it, and I'm sort of disinclined to do so at the moment. – cjc May 08 '12 at 14:35
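
A variant of that idea (just a sketch, untested; ../dereferenced is an example path on the same filesystem) is to stream the archive into a separate scratch directory, so the reading and writing tar processes never touch the same files:

$ mkdir ../dereferenced
$ tar cf - --hard-dereference . | tar xf - -C ../dereferenced

This still needs enough free space for a second copy of the data, and you have to verify the copy and swap the directories yourself, but it avoids both the intermediate archive file and the in-place race.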