rsync using regex to include only some files

11

1

I am trying to run rsync to copy some files recursively down a path based on their file name pattern, case insensitive. This is what I have done to run rsync:

$ rsync -avvz --include ='*/' --include='.*[Nn][Aa][Mm][E].*' --exclude='*' ./a/ ./b/

Nothing gets copied, the debug output shows:

[sender] hiding file 1Name.txt because of pattern *
[sender] hiding file 1.txt because of pattern *
[sender] hiding file 2.txt because of pattern *
[sender] hiding file Name1.txt because of pattern *
[sender] hiding directory test1 because of pattern *
[sender] hiding file NaMe.txt because of pattern *

I have tried using --include='*[Nn][Aa][Mm][E]*' and other combinations, but it still doesn't work.

Any ideas on how to use regex to include some files?

user1957413

Posted 2013-01-14T10:22:28.590

Reputation: 123

4Why are you using the --exclude='*'? – None – 2013-01-14T10:50:39.220

2so it excludes everything that is not part of the include. – None – 2013-01-14T14:40:15.247

'hiding file 1Name.txt because of pattern *' raises a question: does that --exclude rule need to be in the command at all? And if you only want to exclude some files, why use "*"? – Akshay Patil – 2013-01-14T19:29:18.020

Answers

5

rsync doesn't speak regex. You can enlist find and grep, though it gets a little arcane. To find the target files:

find a/ |
grep -i 'name'

But they're all prefixed with "a/" - which makes sense, but what we want to end up with is a list of include patterns acceptable to rsync, and as the "a/" prefix doesn't work for rsync I'll remove it with cut:

find a/ |
grep -i 'name' |
cut -d / -f 2-

There's still a problem: we'll miss files in subdirectories, because rsync doesn't descend into directories that are excluded. I'm going to use awk to add the parent directories of every matching file to the list of include patterns:

find a/ |
grep -i 'name' |
cut -d / -f 2- |
awk -F/ '{print; while(/\//) {sub("/[^/]*$", ""); print}}'
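To see what the awk stage contributes, the sketch below feeds it two sample paths (made-up names, not taken from the question's tree). The wrapper name with_parents and the trailing sort -u are additions for illustration only; sort -u just drops the duplicate parent entries that appear when several matches share a directory.

```shell
# Print each input path, followed by every parent directory above it,
# e.g. sub/deep/Name.txt -> sub/deep/Name.txt, sub/deep, sub.
with_parents() {
  awk '{ print; while (/\//) { sub("/[^/]*$", ""); print } }'
}

# Sample input is illustrative only.
printf '%s\n' 'sub/deep/Name.txt' 'sub/rename.log' | with_parents | sort -u
```

Each matching file thus brings along an include pattern for every directory rsync has to enter in order to reach it.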

All that's left is to send the list to rsync - we can use the argument --include-from=- to provide a list of patterns to rsync on standard input. So, altogether:

find a/ |
grep -i 'name' |
cut -d / -f 2- |
awk -F/ '{print; while(/\//) {sub("/[^/]*$", ""); print}}' |
rsync -avvz --include-from=- --exclude='*' ./a/ ./b/

Note that the source directory 'a' is referred to via two different paths - "a/" and "./a/". This is subtle but important. To make things more consistent I'm going to make one final change, and always refer to the source directory as "./a/". However, this means the cut command has to change as there will be an extra "./" on the front of the results from find:

find ./a/ |
grep -i 'name' |
cut -d / -f 3- |
awk -F/ '{print; while(/\//) {sub("/[^/]*$", ""); print}}' |
rsync -avvz --include-from=- --exclude='*' ./a/ ./b/

sqweek

Posted 2013-01-14T10:22:28.590

Reputation: 356

That's fine for a “push” but can't be applied for a “pull”. – Hibou57 – 2014-07-11T16:37:23.957

Tried to run it, ran into issues with the cut command. Seems that -t is that a valid switch. – None – 2013-01-15T06:42:39.797

edit: i meant -t is not a valid switch – None – 2013-01-15T11:18:41.717

sorry, should be -d. i started off using sed and then changed to cut because i thought it was clearer, but forgot to edit my commands :S – None – 2013-01-15T13:31:21.333

Follow up: I tried to edit the script to take arguments ($1 = path to search, $2 = the pattern for egrep), as I am matching a file name plus a mix of extensions. That part works fine and I get the expected list, but rsync fails to copy. It seems to only work with a single-character directory name as in the example (a). My guess is that the cut command has to be modified to cut based on the parent/source dir? Kinda lost on how to do that. – user1957413 – 2013-01-18T10:15:30.133

Ah yeah, you are quite right. It should work on a directory name of any length, but will fail as soon as you refer to a directory outside the current directory (because there will be a different number of slashes in the prefix portion).

To fix that, probably easiest to use sed instead of cut, like: sed "s#^$1/*##"

buuuut that will break on paths that contain a #. To fix that we have to quote the incoming directory name: prefix=$(echo "$1" | sed 's#/#\\/#g') and then sed "s/^$prefix\\/*//"

The subtleties of bash quoting are a bit of a nightmare ;) – sqweek – 2013-01-19T07:49:17.093
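One way to sidestep the sed quoting problem described in this comment is to let the shell strip the prefix with parameter expansion, which matches a quoted prefix literally. A minimal sketch, assuming a POSIX-style sh; the function name strip_prefix is made up for illustration and is not part of the original answer:

```shell
# Read paths on stdin and print them relative to the directory given as $1.
# Quoting "$src" inside the case pattern and inside ${path#...} makes the
# shell match it literally, so characters such as # or . need no escaping.
strip_prefix() {
  src=${1%/}                      # tolerate one trailing slash on the argument
  while IFS= read -r path; do
    case $path in
      "$src"/?*) printf '%s\n' "${path#"$src"/}" ;;   # entries below $src only
    esac
  done
}

# Hypothetical usage in place of the cut stage:
#   find "$1" | grep -iE "$2" | strip_prefix "$1" | ...
```

Because nothing is interpolated into a sed expression, the number of slashes in the source path no longer matters either.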

7

I would suggest using the filter option of rsync. For your example just type:

rsync -vam -f'+ *[Nn][Aa][Mm][Ee]*' -f'+ */' -f'- *' a b

The first filter rule tells rsync which patterns to include. The second rule is needed so that rsync descends into every directory during its traversal. The -m option (--prune-empty-dirs) keeps directories that end up empty out of the transfer. The last filter rule excludes everything that has not matched an earlier rule.
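The rule order can be tried out safely on a throwaway tree. The sketch below is only an illustration: the directory and file names are made up to resemble the question's layout, and it assumes rsync and mktemp are installed.

```shell
#!/bin/sh
# Build a scratch source tree and an empty destination (names are illustrative).
demo=$(mktemp -d)
mkdir -p "$demo/a/test1" "$demo/a/empty" "$demo/b"
touch "$demo/a/1Name.txt" "$demo/a/1.txt" \
      "$demo/a/test1/NaMe.txt" "$demo/a/test1/test.txt"

# Filter rules are checked in order and the first match wins:
#   1. include anything whose name contains "name" in any case
#   2. include every directory so that rsync keeps descending
#   3. exclude whatever is left
# -m prunes directories that would end up empty at the destination.
rsync -am \
  -f'+ *[Nn][Aa][Mm][Ee]*' \
  -f'+ */' \
  -f'- *' \
  "$demo/a/" "$demo/b/"

# List what actually arrived.
copied=$(cd "$demo/b" && find . -type f | sort)
printf '%s\n' "$copied"
```

With this tree, only 1Name.txt and test1/NaMe.txt should arrive in b; the empty directory is pruned by -m.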

sparkie

Posted 2013-01-14T10:22:28.590

Reputation: 2 110

Use -f'+ *[Nn][Aa][Mm][Ee]**' (two stars at the end) to include the contents of all directories with a specific name. – phobic – 2016-11-11T13:51:20.220

Sweet. This worked as well. I was getting the folder a inside of b, that got fixed by using a/ b/ as the source and destination. Thanks! – user1957413 – 2013-01-16T06:04:43.733

2

If you use ZSH then you can use the (#i) flag to turn off case sensitivity (this needs the EXTENDED_GLOB option, i.e. setopt extendedglob). Example:

$ touch NAME
$ ls (#i)*name*
NAME

ZSH also supports exclusions (again with EXTENDED_GLOB set), written as a regular pattern followed by ~ and the pattern to exclude:

$ touch aa ab ac
$ ls *~*c
aa ab

You can chain exclusions:

$ ls *~*c~*b
aa

Finally you can specify what kind of file you want returned (directory, file, etc). This is done with (/) for directory and (.) for file.

$ touch file
$ mkdir dir
$ ls *(.)
file

Based on all this, I would write the command as:

rsync -avvz *(/) (#i)*name* ./a/ ./b/

(I don't see a need for an exclusion with these selectors)

Matthew Franglen

Posted 2013-01-14T10:22:28.590

Reputation: 311

1

@sqweek's answer above is awesome, though I suspect he has a bug in his awk script for generating parent directories, as it gives me e.g.:

$ echo a/b/c/d | awk -F/ '{print; while(/\//) {sub("/[^/]*", ""); print}}'
a/b/c/d
a/c/d
a/d
a

I was able to fix it by using gensub (a GNU awk extension) instead:

$ echo a/b/c/d | awk -F/ '{print; while(/\//) { $0=gensub("(.*)/[^/]*", "\\1", "g"); print}}'
a/b/c/d
a/b/c
a/b
a

So, his full solution, with the awk bit changed, would be:

find ./a/ |
grep -i 'name' |
cut -d / -f 3- |
awk -F/ '{print; while(/\//) { $0=gensub("(.*)/[^/]*", "\\1", "g"); print}}' |
rsync -avvz --include-from=- --exclude='*' ./a/ ./b/

Ryan Williams

Posted 2013-01-14T10:22:28.590

Reputation: 11

Thanks. Edited my answer with the equivalent fix of anchoring the regex to the end of line (sub("/[^/]*$")). – sqweek – 2017-05-21T00:05:04.680

0

[EDIT] This only works locally. For remote paths, the directory structure has to be created first.

Simpler than the accepted answer: use --files-from, which includes parent directories automatically, and have find print each path relative to the source directory with -printf '%P\n':

find /tmp/source -wholename '*[Nn][Aa][Mm][Ee]*' -printf '%P\n' | rsync -vzrm --exclude='*/' --files-from=- /tmp/source/ /tmp/target/

So you only have to use find and rsync.
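The list-building half can be checked on its own before handing it to rsync. A rough sketch, with two deliberate differences from the command above that are assumptions of this example: it uses -iname (supported by GNU and BSD find, but not POSIX) instead of spelling out character classes, which matches on the file name only rather than the whole path, and it strips the leading ./ with sed instead of relying on the GNU-only -printf. The function name list_matches is hypothetical.

```shell
# Print regular files under $1 whose base name matches the glob $2,
# ignoring case, as paths relative to $1 (the form --files-from expects).
list_matches() {
  ( cd "$1" && find . -type f -iname "$2" | sed 's|^\./||' )
}

# Hypothetical usage, with the rsync side unchanged from the command above:
#   list_matches /tmp/source '*name*' |
#     rsync -vzrm --exclude='*/' --files-from=- /tmp/source/ /tmp/target/
```

Running list_matches by itself first makes it easy to confirm the selection before anything is copied.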

phobic

Posted 2013-01-14T10:22:28.590

Reputation: 191

0

I tried a C# script, since that is the language I have the most experience with. I am able to create the list of files that I want to include, but somehow rsync is still telling me to take a hike. It creates the folders, but it ignores the files. Here is what I got:

First the content of the directory:

~/mono$ ls -l
total 24
drwxr-xr-x 5 me me 4096 Jan 15 00:36 a
drwxr-xr-x 2 me me 4096 Jan 15 00:36 b
drwxr-xr-x 3 me me 4096 Jan 14 00:31 bin
-rw-r--r-- 1 me me 3566 Jan 15 00:31 test.cs
-rwxr-xr-x 1 me me 4096 Jan 15 00:31 test.exe
-rwxr--r-- 1 me me  114 Jan 14 22:40 test.sh

Then the output of the C# script:

~/mono$ mono test.exe

/a/myfile/myfileseries.pdf
/a/myfile2/testfile.pdf

And the debug output:

~/mono$ mono test.exe | rsync -avvvz --include='*/' --include-from=- --exclude='*' ./a/ ./b/
[client] add_rule(+ */)
[client] parse_filter_file(-,20,3)
[client] add_rule(+ /a/myfile/myfileseries.pdf)
[client] add_rule(+ /a/myfile2/testfile.pdf)
[client] add_rule(- *)
sending incremental file list
[sender] make_file(.,*,0)
[sender] hiding file 1Name.txt because of pattern *
[sender] showing directory myfile2 because of pattern */
[sender] make_file(myfile2,*,2)
[sender] hiding file 1.txt because of pattern *
[sender] hiding file 2.txt because of pattern *
[sender] hiding file Name1.txt because of pattern *
[sender] showing directory test1 because of pattern */
[sender] make_file(test1,*,2)
[sender] hiding file NaMe.txt because of pattern *
[sender] showing directory myfile because of pattern */
[sender] make_file(myfile,*,2)
send_file_list done
send_files starting
[sender] hiding file myfile/myfileseries.pdf because of pattern *
[sender] hiding file myfile2/testfile.pdf because of pattern *
[sender] hiding file test1/test.txt because of pattern *

user1957413

Posted 2013-01-14T10:22:28.590

Reputation: 123