Using rsync to back up a Windows-hosted share over Samba in FreeBSD

1

I am trying to back up a Windows-hosted share, using rsync, from a FreeBSD 8.2 box:

bash$ sudo -i
bash#  uname -a
FreeBSD zeus.companyname.gr 8.2-RELEASE ...amd64
bash# cat /root/.nsmbrc
...
[MACHINENAME:ADMINISTRATOR]
password=mysuperuncrackablepassword
bash# mount_smbfs -N -E utf-8:cp737 -I 192.168.0.2 //Administrator@machinename/f$ /iso1/
bash# ls -l -raw /iso1/prj/
ΠΡΟΕΤΟΙΜΑΣΙΑ ΔΕΔΟΜΕΝΩΝ ΠΑΡΕΛΘΟΝΤΩΝ ΕΤΩΝ
Πανεπιστήμιο - Προβολή, Δημοσιότητα

In plain words, I correctly see folders with Greek characters (locale-specific). The listing above is from a PuTTY session (i.e. an SSH session), and PuTTY was configured to translate UTF-8 by default.

Note I have not touched the locale:

bash# locale
LANG=
LC_CTYPE="C"
LC_COLLATE="C"
LC_TIME="C"
LC_NUMERIC="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_ALL=

So all seems to be OK.

However, when I tried rsyncing from the mounted folder, some files appear to vanish...

bash# rsync --inplace -rltvxp /iso1/ /backups/backup-machinename/
sending incremental file list
file has vanished: "/iso1/prj/..."

The message "file has vanished" means that rsync made the system calls that read the contents of a folder (the dir/dirent interface, I believe), and when it later tried to read one of the files listed there, it did not find it, i.e. "open(2)" failed.
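A minimal sketch of how one might enumerate every entry affected by this, assuming a POSIX shell: it lists names via globbing (which only reads the directory) and then tries to stat each one, printing those that are listed but cannot be found. The function name list_vanishing is hypothetical, and it checks with stat rather than open(2), so it only approximates what rsync does.

```shell
#!/bin/sh
# Hypothetical helper: print entries that a directory listing returns
# but that cannot be stat'ed afterwards ("listed, then vanished").
list_vanishing() {
    dir=$1
    for entry in "$dir"/* "$dir"/.[!.]* "$dir"/..?*; do
        # An unmatched glob is left as the literal pattern; skip it.
        case $entry in
            "$dir/*"|"$dir/.[!.]*"|"$dir/..?*") [ -e "$entry" ] || continue ;;
        esac
        if [ ! -e "$entry" ] && [ ! -L "$entry" ]; then
            # Listed by the directory read, but neither a file nor a symlink.
            printf '%s\n' "$entry"
        elif [ -d "$entry" ] && [ ! -L "$entry" ]; then
            # Recurse in a subshell so this level's variables are not clobbered.
            ( list_vanishing "$entry" )
        fi
    done
}
```

Running list_vanishing /iso1 on the cp737 mount should then print the same paths that rsync reports as vanished, if the diagnosis above is right.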

I checked the reported file: (a) it exists, (b) it has world-readable permissions.

I then assumed that the cp737 (Greek codepage) is the problem, so I mounted again with...

bash# mount_smbfs -N -E utf-8:utf-8 -I 192.168.0.2 //Administrator@machinename/f$ /iso1/

...that is, I used utf-8 for the Windows side, too. When I tried rsync again, however, it got stuck (!) with 100% CPU utilization... Attaching with GDB showed:

bash# gdb /usr/local/bin/rsync 3109
GNU gdb 6.1.1 [FreeBSD]
Attaching to program: /usr/local/bin/rsync, process 3109
Reading symbols from /lib/libc.so.7...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.7
Reading symbols from /libexec/ld-elf.so.1...(no debugging symbols found)...done.
Loaded symbols for /libexec/ld-elf.so.1
0x0000000800709c0a in getdirentries () from /lib/libc.so.7

...so it appears rsync is stuck waiting for getdirentries to return, or each call to getdirentries takes an impossible amount of time...

Any ideas?

Has anybody managed to do what I am doing? That is, use FreeBSD to rsync files with locale-specific characters in their filenames, from a Windows share that is mounted via mount_smbfs?

P.S. In case anyone wonders why I try to do this, the answer is simple: ZFS.

ttsiodras

Posted 2011-03-24T13:34:09.807

Reputation: 588

How about running truss on the process? truss -d -s 255 -p <pid> That will show you exactly what it's doing when it gets stuck - which system calls it's running and how long they take. – Majenko – 2011-03-24T13:40:32.693

truss revealed that rsync is probably stuck asking for the same directory over and over: 200.223900400 getdirentries(0x5,0x800a2e000,0x1000,0x800a2d068,0x0,0x0) = 196 (0xc4) 200.224160040 getdirentries(0x5,0x800a2e000,0x1000,0x800a2d068,0x0,0x0) = 196 (0xc4) 200.224394880 getdirentries(0x5,0x800a2e000,0x1000,0x800a2d068,0x0,0x0) = 196 (0xc4) – ttsiodras – 2011-03-24T14:38:17.017

Verified: not just rsync, but "find /iso1" gets stuck as well. truss again shows getdirentries() being called over and over. – ttsiodras – 2011-03-24T16:05:37.800

Ok, so we know that the UTF mapping is broken somewhere fundamental - so we can't use it. Now, can you identify what the files that vanish have in common? Is it a particular character or something? – Majenko – 2011-03-24T16:21:55.867

There is a common thing: the files that rsync reports as vanished are those that ls -l -raw shows as having at least one '?' in their names... Apparently some characters have no "cp737-equivalent" representation, hence the failure. The problem is that the utf-8 mapping, which can handle anything, leads to an infinite loop in folder navigation... – ttsiodras – 2011-03-24T16:40:39.067

Could you use cp869 instead? – Majenko – 2011-03-24T16:56:58.253

Tried it - shows garbage instead of the filenames. With cp737 I see valid filenames - it's just that some of them have '?' in them, and suffer the 'vanish' fate (i.e. they are reported in the dir/dirent syscalls, but can't be opened) – ttsiodras – 2011-03-24T16:57:59.283

Additional info: it turns out that some filenames use characters that look Greek, but ARE NOT! Look (I hope HTML accommodates them): ΒΑΣΕΩΝ ∆Ε∆ΟΜΕΝΩΩΝ.htm. Both the "Delta" and the "Omega" are bad. I copied a "normal" UTF-8 Omega next to the one in the original filename: that character looks like the Greek omega, but is NOT the Greek omega! The file was saved from the Web, so apparently some web page was written using these "fake Greek" characters, which is why cp737 can't map them. Which leads us back to UTF-8: only a full Unicode set can safely accommodate everything... but it doesn't work! – ttsiodras – 2011-03-24T17:02:50.450
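A rough sketch of how such names could be found in bulk, assuming a list of filenames is available in UTF-8 (one per line) and that the installed iconv knows the CP737 charset: any name that iconv refuses to convert contains a character with no cp737 equivalent. The function name cp737_unmappable is hypothetical. As an illustration of lookalikes, Greek capital Delta is U+0394 (bytes ce 94), while the mathematical INCREMENT sign U+2206 (bytes e2 88 86) looks almost identical; whether these are exactly the characters in the files above is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: read UTF-8 filenames from stdin, one per line,
# and print those that cannot be represented in cp737.
# Assumes the local iconv supports the CP737 charset name.
cp737_unmappable() {
    while IFS= read -r name; do
        # iconv exits non-zero when a character has no target mapping.
        if ! printf '%s' "$name" | iconv -f UTF-8 -t CP737 >/dev/null 2>&1; then
            printf '%s\n' "$name"
        fi
    done
}
```

A name list could be produced on a machine that sees the names intact and then piped in, for example: cp737_unmappable < names.txt (names.txt being a placeholder).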

Answers

1

I'm not sure that running rsync over Samba is a good idea. I don't know what rsync does internally, but it may generate a lot of network traffic in order to check which files have changed.

There's also a Windows version of rsync, which can be run as a system service (http://www.brentnorris.net/rsyncntdoc.html). This way you avoid the network load, and because it only uses local calls on the machine being backed up, it may fix the above problems as well. I used this method to back up users' laptops.
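A sketch of what the pull from the FreeBSD side might look like once the Windows machine runs rsync as a daemon. The module name "fdrive", the cygwin-style path in the commented rsyncd.conf fragment, and the pull_backup helper are all assumptions for illustration, not taken from the linked instructions; the IP address, flags, and destination are the ones from the question.

```shell
#!/bin/sh
# Assumed rsyncd.conf module on the Windows side (hypothetical name and path):
#   [fdrive]
#       path = /cygdrive/f
#       read only = yes
#       use chroot = no

# RSYNC can be overridden, e.g. RSYNC=echo to preview the command.
RSYNC="${RSYNC:-rsync}"

# Hypothetical helper: pull a daemon module into a local backup directory.
pull_backup() {
    src_host=$1
    module=$2
    dest=$3
    "$RSYNC" --inplace -rltvxp "rsync://${src_host}/${module}/" "$dest"
}
```

Usage would be something like: pull_backup 192.168.0.2 fdrive /backups/backup-machinename/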

Geeklab

Posted 2011-03-24T13:34:09.807

Reputation: 327

You are right - I followed the instructions in the link you provided, and everything went fine. In theory, at least, your solution is also faster and (much) more network-friendly. Thank you! – ttsiodras – 2011-03-24T19:35:19.010