linux merge folders: rsync?

13

7

I have two copies of a folder

src/
dest/

I want to merge them, doing the following:

If a file is only in src, I want it to be moved to dest

If a file is only in dest, I want it ignored, i.e. left alone.

If a file is in both and has identical contents (i.e. same size and date), delete it from src

If a file is in both and does not have identical contents, leave behind in src so I can manually merge them.

Only a very small number of files (between 0% and 5% of the total) should be in this last category, but I don't know how to separate the files that are in both and the same from those that are in both but different.

I've tried to figure out how to do this with rsync but to no avail so far.

David Oneill

Posted 2010-11-23T18:31:10.687

Reputation: 2 381

Answers

17

I've only done limited testing, so please be careful with this command (try it with --dry-run first):

rsync -avPr --ignore-existing --remove-source-files src/ dest

Please note the trailing / on src/: it makes rsync recurse into src instead of copying src itself, which should maintain your existing paths.

By combining the --ignore-existing flag with the --remove-source-files flag, you delete from src only the files that were actually synced from src to dest, that is, files that did not previously exist in dest.

To delete the files that were not synced, that is, those that already existed in dest/ as well as in src/ with identical contents, you can use:
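One cautious way to try the command above is to preview it on a throwaway tree first. The following is only a sketch: it assumes rsync is installed, and the demo directory and file names are made up for illustration. The first call uses --dry-run so nothing changes; the second performs the move.

```shell
#!/bin/sh
# Sketch: preview with --dry-run, then run for real.
# The demo tree is hypothetical; substitute your own src/ and dest/.
demo=$(mktemp -d)
mkdir -p "$demo/src/sub" "$demo/dest"
echo new  > "$demo/src/sub/only-in-src.txt"
echo same > "$demo/src/both.txt"
echo same > "$demo/dest/both.txt"

if command -v rsync >/dev/null 2>&1; then
    # Dry run: lists what would be transferred, changes nothing.
    rsync -av --dry-run --ignore-existing --remove-source-files "$demo/src/" "$demo/dest"
    # Real run: files missing from dest are copied, then removed from src.
    rsync -a --ignore-existing --remove-source-files "$demo/src/" "$demo/dest"
fi
```

After the real run, only-in-src.txt should live under dest/sub, while both.txt is still present in src because --ignore-existing skipped it.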

for file in `find src/ -type f`; do diff $file `echo $file | sed 's/src/dest/'` && rm $file || echo $file; done

or

find src -type f -exec bash -c 'cmp -s "$0" "${0/#src/dest}" && rm "$0"' {} \;

if filenames could contain whitespace, newlines, …

Regarding Gilles' comment concerning special characters: that is certainly something to be mindful of, and there are many solutions. The simplest would be to pass -i to rm, which prompts before every deletion. Provided that src/, or its parent path, is provided to find, however, the fully qualified path should result in all file names being handled properly by both the diff and rm commands without quoting.
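A variant of the cleanup step that never word-splits file names can be sketched in POSIX sh by letting find hand the names straight to a small inline script. The function name remove_identical is hypothetical, and the sketch assumes both arguments are plain directory paths that do not start with a dash. Files that exist only in src are left untouched.

```shell
#!/bin/sh
# remove_identical SRC DEST  (hypothetical helper name)
# Deletes each regular file under SRC whose counterpart under DEST is
# byte-for-byte identical; prints the path of every file that differs.
remove_identical() {
    src=${1%/} dest=${2%/}
    find "$src" -type f -exec sh -c '
        src=$1 dest=$2
        shift 2
        for f do
            rel=${f#"$src"/}            # path relative to SRC
            if [ -f "$dest/$rel" ]; then
                if cmp -s -- "$f" "$dest/$rel"; then
                    rm -- "$f"          # identical: drop the src copy
                else
                    printf "differs: %s\n" "$f"
                fi
            fi
        done
    ' sh "$src" "$dest" {} +
}
```

Called as remove_identical src dest, it prints one "differs:" line per file that still needs a manual merge.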

Tok

Posted 2010-11-23T18:31:10.687

Reputation: 499

correction: that command will not remove files from src if an identical copy already exists in dest – Tok – 2010-11-23T18:52:48.927

Yeah :(. That's the part that I'm finding hard to figure out. – David Oneill – 2010-11-23T19:07:46.450

Well, the good news is that you can solve it independently without much hassle: for file in `find src/ -type f`; do diff $file `echo $file | sed 's/src/dest/'` && rm $file || echo $file; done (you can skip the || echo $file if you like, it is included for completeness) – Tok – 2010-11-23T19:16:27.760

Nifty: that's what I needed. Edit that into your answer, and I'll accept it! – David Oneill – 2010-11-24T00:33:42.517

@Tok: Your command will choke on file names that contain special characters (whitespace, \?*[, initial -). You need to use double quotes around variable substitutions, pass -- to utilities before file names, use find … -exec … instead of parsing the output of find. With an rm command in the mix, this is a recipe for disaster. – Gilles 'SO- stop being evil' – 2010-11-24T01:06:09.193

@Tok: No, passing -i not only won't help getting the right files deleted (obviously), but it won't even help avoiding getting the wrong files deleted. Try (on Linux or Cygwin) touch 'foo -f bar'; rm -i $(echo foo*). – Gilles 'SO- stop being evil' – 2010-11-24T19:27:09.577

--prune-empty-dirs might also be a good option, in my case I have x = millions of files, y = x * 1.33 directories. – Alix Axel – 2013-06-04T12:47:32.657

6

unison is the tool you're looking for. Try unison-gtk if you prefer a GUI. But I don't think it will delete similar files: unison tries to make both directories identical. Nevertheless, it will easily identify 1) which files are to be copied and 2) which ones need a manual merge.

simonp

Posted 2010-11-23T18:31:10.687

Reputation: 616

It doesn't do exactly what the OP asks for, but it sounds like it accomplishes the OP's ultimate goal. +1 – Ryan C. Thompson – 2010-11-24T17:27:35.513

+1 Sadly, the server I'm running this on does not have unison installed, nor do I have the permissions to install it. But this might be a good answer to someone else. – David Oneill – 2010-11-24T21:37:30.167

You can download the unison executable from http://www.seas.upenn.edu/~bcpierce/unison//download/unison-contributed-binaries/linux/. Install it somewhere in your home directory; it's just one file. – JooMing – 2010-11-29T15:01:36.537

2

The following script should do things reasonably. It moves files from the source to the destination, never overwriting a file and creating directories as necessary. Source files that have a corresponding different file in the destination are left alone, as are files that are not regular files or directories (e.g. symbolic links). The files left over in the source are those for which there is a conflict. Beware, I haven't tested it at all.

cd src
find . -exec sh -c '
    set -- "/path/to/dest/$0"
    if [ -d "$0" ]; then #  the source is a directory 
      if ! [ -e "$1" ]; then
        mv -- "$0" "$1"  # move whole directory in one go
      fi
    elif ! [ -e "$0" ]; then  # the source doesn't exist after all
      :  # might happen if a whole directory was moved
    elif ! [ -e "$1" ]; then  # the destination doesn't exist
      mv -- "$0" "$1"
    elif [ -f "$1" ] && cmp -s -- "$0" "$1"; then  # identical files
      rm -- "$0"
    fi
  ' {} \;
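Since the script above is untested, it may help to preview the outcome before moving anything. The following read-only sketch prints, for each regular file under the source, which of the three cases it falls into. The function name preview_merge is hypothetical, and DEST is assumed to be an absolute path, as /path/to/dest is in the script. Unlike the script, it reports per file and does not treat whole directories as a unit.

```shell
#!/bin/sh
# preview_merge SRC DEST  (hypothetical helper; read-only, changes nothing)
# Prints one line per regular file under SRC:
#   move      no counterpart in DEST
#   identical counterpart has the same bytes
#   conflict  counterpart exists but differs
preview_merge() {
    ( cd "$1" && find . -type f -exec sh -c '
        dest=$1
        shift
        for f do
            if [ ! -e "$dest/$f" ]; then
                printf "move      %s\n" "$f"
            elif [ -f "$dest/$f" ] && cmp -s -- "$f" "$dest/$f"; then
                printf "identical %s\n" "$f"
            else
                printf "conflict  %s\n" "$f"
            fi
        done
    ' sh "$2" {} + )
}
```

Piping the output through grep '^conflict' gives the list of files that would be left behind for manual merging.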

Another approach would be to do a union mount one directory above the other, for example with funionfs or unionfs-fuse.

Gilles 'SO- stop being evil'

Posted 2010-11-23T18:31:10.687

Reputation: 58 319