
Summary:

I use the following setup to back up data on a Synology NAS to a remote disk via rsync.

  • First backup, local: rsync initialised on the Synology NAS, writing to the disk (mounted on the NAS).
  • Subsequent backups, remote: rsync initialised on a Mac, writing to the same disk, now mounted on the Mac at a remote location.

Problem: I end up with two copies of every file and folder whose name contains special characters.

Question: Is there a way to use the same basic process (first backup locally via NAS, rest via Mac, using rsync) without the above problem?

State of play: What follows is rather long and includes two edits; the problem is analysed further there, but not yet solved.

Full Description:

I have had this setup running for ages, ever since I managed to solve an initial special character problem (detailed here); since then I have used the "--iconv=utf-8-mac,utf-8" option for the rsync job on the Mac. The setup is this:

Location 1: Synology NAS

Location 2 (in a galaxy far far away): Mac with external disk (Mac OS Journaled)

Task: rsync job on the Mac pulling folders to the external disk (location 2) from the NAS (location 1).
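For reference, the Mac-side pull looks roughly like the sketch below. The host name and paths are made-up placeholders, not my real setup; the --iconv option is the real detail:

```shell
# Hedged sketch; "nas" and both paths are placeholders.
# --iconv=utf-8-mac,utf-8 tells rsync to convert file names between the
# Mac's decomposed UTF-8 (utf-8-mac) and plain UTF-8 on the remote side.
rsync -av --iconv=utf-8-mac,utf-8 \
    admin@nas:/volume1/media/ /Volumes/BackupDisk/media/
```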

I now plan to set up a new disk (also Mac OS Journaled) on Location 2. Since there is around 2TB of data to transfer, I did the following:

  1. Location 1: Plugged the new disk into the NAS thanks to the wonders of USB.

  2. Location 1: Pushed the data to the new disk with an rsync job on the NAS.

  3. Travelled to the far away galaxy that we here call Location 2

  4. Location 2: initialised a limited rsync pull job from the Mac, now with the new disk plugged in.

Problem: For some reason, step (4) did not finish in 2 seconds with no changes at all, but started to complain that “file has vanished: …[file location specified]” for a bunch-load of files. Then it started to copy folders and files to the disk — even though they were already there! 70 GB later, from what I could tell, it had made a completely redundant copy of all folders which had special characters in their name (and a redundant copy of all files with special characters in their name in folders which did not have special characters in their name). For example:

drwxrwxrwx   5 _unknown  _unknown      170 Aug  7  2013 Pippi Långstrump-Pippi i Söderhavet
drwxrwxrwx   5 _unknown  _unknown      170 Aug  7  2013 Pippi Långstrump-Pippi i Söderhavet

These two folders seem completely identical, yet they are listed alongside each other as two distinct folders. If I use the Mac GUI, I can enter each of them and see that they contain the same (qualitatively identical) three tracks (I do not even know how to separate them using the command line, but with the GUI I can visually see that I ‘enter’ different folders). And they are not merely virtual, since the total size of the subset of the data went from 64 GB to 82 GB.

What has happened? To my untrained eye, it seems as if the rsync process initialised on the Mac cannot ‘see’ that the source files on the NAS are already present on the target disk, so it puts them there again. When the Mac terminal displays the file and folder names, it evidently uses the same symbols, but it must still interpret them as different ‘underneath’, since otherwise the file system would not allow two entries with the same name.
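Two names can indeed be byte-for-byte different while printing identically. A quick shell demonstration (made-up strings in the spirit of my folder names, not my actual files):

```shell
# The same visible name can be stored two ways in UTF-8:
nfd="$(printf 'Ro\314\210vardotter')"  # 'o' + combining diaeresis (bytes cc 88)
nfc="$(printf 'R\303\266vardotter')"   # precomposed 'ö' (bytes c3 b6)
printf '%s\n%s\n' "$nfd" "$nfc"        # both lines render as "Rövardotter"
if [ "$nfd" = "$nfc" ]; then
  echo "byte-identical"
else
  echo "byte-different"
fi
```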

Now, this is not all. When I try to get the system to keep only one of the special-character folders/files with the --delete option, everything just happens all over again: folders are indeed deleted, but new ones are copied, and in the end I am still left with duplicates, with 82 GB in the subset instead of 64 GB.

What is going on and what can I do about it?

EDIT 11 sept: The wise Tomáš Pospíšek (well acquainted with special characters, I presume ;) advised me to go “under the hood”, and so I used his command (on Ronja instead of Pippi, since I had too many different Pippi folders). A simple “ls -l” gave me:

drwxrwxrwx   2 _unknown  _unknown        68 Aug  7  2013 Ronja Rövardotter
drwxrwxrwx   2 _unknown  _unknown        68 Aug  7  2013 Ronja Rövardotter

whereas

sh-3.2# ls -l Ronja* | hexdump -C

resulted in:

00000000  52 6f 6e 6a 61 20 52 6f  cc 88 76 61 72 64 6f 74  |Ronja Ro..vardot|
00000010  74 65 72 3a 0a 0a 52 6f  6e 6a 61 20 52 c3 b6 76  |ter:..Ronja R..v|
00000020  61 72 64 6f 74 74 65 72  3a 0a                    |ardotter:.|

or, if I sort it out a bit:

52 6f 6e 6a 61 20 52 6f cc 88 76 61 72 64 6f 74 74 65 72 3a 0a 0a |Ronja Ro..vardotter:..
52 6f 6e 6a 61 20 52 c3 b6 76 61 72 64 6f 74 74 65 72 3a 0a       |Ronja R..vardotter:.

In other words, they are not identical, only superficially displayed as such: the first name stores the ö as “6f cc 88”, an ‘o’ followed by a combining diaeresis (the decomposed form that HFS+ uses), while the second stores it as “c3 b6”, the precomposed ‘ö’.

Thanks for that. But what should I do about it? Is there a way to format the disk (e.g. case-sensitive Journaled) so that both the NAS and my Mac can write to the disk properly? Or is there no way around biting the bullet, i.e. connecting the disk to a local Mac (location 1) and doing the first backup via Ethernet? That would take forever compared to the USB 3 connection, but at least the backup would be "Mac-interpreted" both locally and remotely. What do you suggest?

EDIT 14 sept: The helpful Tomáš further suggested (via comment below) that I should try to rsync a single file with special characters in the name, to see what happens then (and he suggests a workaround). Unfortunately, what happens is that I am left with two files on the destination disk with seemingly identical names which, when hexdumped, turn out to be encoded differently. My problem then was that I could not seem to delete both files properly. That is, when I “rm”-deleted them so that no files were visible (“ls -l” did not list them), I could still see the files (or folders; same there) in Mac Finder. This happened even after I rebooted the system, so somehow the file information was still there for Mac Finder to display, even though it did not turn up in a command-line listing.
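The duplicate pair itself is easy to reproduce on any filesystem that does not normalize names, such as the ext4 volumes a Synology uses; on HFS+ the two creates should collapse into one entry, since HFS+ decomposes names on storage. A sketch:

```shell
# Create the "same" name twice, once precomposed (NFC), once decomposed (NFD).
dir="$(mktemp -d)"
touch "$dir/$(printf 'R\303\266')"   # NFC: ö stored as c3 b6
touch "$dir/$(printf 'Ro\314\210')"  # NFD: 'o' followed by cc 88
count="$(ls "$dir" | wc -l | tr -d ' ')"
echo "$count entries"                # 2 on ext4, 1 on HFS+
rm -rf "$dir"
```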

At this point, I sort of threw in the towel and went for the cowardly solution of simply erasing the disk and going back to pulling the data from the initial site (location 1) through a Mac and the same rsync command. That took much longer, transfer-wise, but directly ‘solved’ the problem. I now have it all set up, working like clockwork.

Still, the problem as such is not yet solved. That is, I would like to know how to:

  • push data to an external disk (Mac OS Journaled, mounted on the NAS) from a Synology NAS with an rsync process initialised on the NAS, and
  • back up that data from an external site using an rsync command on a Mac
    to which the disk is mounted.
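My guess is that the NAS-side push would need the mirror-image conversion of the Mac-side command, something like the sketch below. Two loud caveats: this assumes the Synology rsync build has iconv support at all, which I have not verified, and the rsync documentation notes that no name conversion is done on a purely local transfer, so the copy is routed through localhost to force a client/server pair. Paths are placeholders:

```shell
# Hypothetical, untested on a real Synology: push to the USB disk while
# converting names from the NAS's plain UTF-8 to the Mac's decomposed
# form (utf-8-mac), via localhost so that --iconv takes effect.
rsync -av --iconv=utf-8,utf-8-mac \
    /volume1/media/ localhost:/volumeUSB1/usbshare/media/
```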

If anyone knows, answer the question and I will mark you as the problem-solver (and a hero) right away!

NOTE: This question is now like a little “what I did last summer”-tale of its own, so I have re-written the summary above for potential problem solvers to have a chance of knowing what the core question is.

Nick The Swede
    can you do a `ls -l Pippi* | hexdump -C` and compare whether the codes used for the special characters are identical (which I suspect are not), which would point to some character encoding/character set usage problem. – Tomáš Pospíšek Sep 10 '18 at 18:45
  • Maybe it is just some "funny" unicode encoding differences, in which case renaming one of them will make them easier to distinguish. Maybe it is file system corruption of some sorts, in which case renaming might still help but it would carry some risk as will any write operation on a corrupted file system. – kasperd Sep 10 '18 at 22:03
  • Try https://askubuntu.com/questions/533690/rsync-with-special-character-files-not-working-between-mac-and-linux – roaima Sep 10 '18 at 22:39
  • @roaima: Thanks, but that is actually my own post from a few years ago ;-) and I am using the correct --iconv syntax now (or at least, the one that has worked for several years). – Nick The Swede Sep 11 '18 at 08:46
  • Ah ok. I have seen this too, when backing up Cyrillic filenames. I'm trying to find the change I made. In terms of "what's different", it's "o-with-umlaut" or "o" followed by "overstrike-previous-with-double-quotes". Unicode allows ö to be constructed either way. – roaima Sep 11 '18 at 09:08
  • Does this answer (the one starting "You can use rsync's --iconv option...") help you at all? https://serverfault.com/a/427200/267016 – roaima Sep 11 '18 at 09:41
  • I'm afraid not. That is the very option I already use. But thanks for the search! – Nick The Swede Sep 11 '18 at 10:32
  • So I suggest that you reduce your problem. Only rsync *one single file* that displays the behavior. Will that also create two files? If it only creates a single file then what you can do as a _workaround_ is to clean up, i.e. delete all the "wrong" files, and go on with life as before. If you want to go deeper, then I suggest to apply the `ls -l | hexdump` trick to every step of the procedure. I.e. how is the name encoded on the source side? If you run `rsync -v` how is the name encoded that rsync is telling you it is syncing on the source side? ... – Tomáš Pospíšek Sep 13 '18 at 06:38
  • ... Then I guess you should be able to switch on logging on the destination side as well (there's also a rsync running there). How's the name encoded there? That is, you should be able to see at which point the file name gets "mangled". Also, it'd be best to use UTF-8 at every point. That is: the source file system should use UTF-8, rsync should use UTF-8 on the source side, when transfering, and on the destination side. The FS on the destination side should use UTF-8. Is it possible to ensure that's the case? – Tomáš Pospíšek Sep 13 '18 at 06:40
  • Thanks for the additional tips, Tomáš. I discuss what I did do in “EDIT 14 sept” above. For clarification, however, how does rsync on the ‘other side’ [it is actually the source side, since I am pulling the data from the destination side] work? Should not that rsync command, i.e. the one on the NAS, mirror the one I have given on the mac side, i..e include the same --iconv option of the initial rsync command? Or does it not work that way? – Nick The Swede Sep 14 '18 at 10:27
