
Note to Server Fault users: this closed question is answered below in four comments by hynekcer, and other comments accept it as a legitimately answerable question. It is now being voted on for reopening.


When both source and destination are remote, rsync complains:

The source and destination cannot both be remote. rsync error: syntax or usage error (code 1) at main.c(1156) [Receiver=3.0.7]
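
For reference, an invocation of this shape triggers it (hosts and paths here are placeholders):

rsync -av userA@hostA:/src/ userB@hostB:/dest/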

Is there an insurmountable technical obstacle to making rsync do this? Or is it simply a case of not-yet-implemented? It seems relatively easy to create a local buffer in memory that mediates the transfer between two remotes, holding both hashes and data.

loopbackbee
  • It would involve a remote src rsyncd sending data to a remote dest rsyncd. You can work around it by ssh'ing to the src system and invoking rsync. – Alex Holst Nov 17 '12 at 14:18
  • @AlexHolst I don't think that would work in my particular case. see edit – loopbackbee Nov 18 '12 at 01:35
  • Sorry, [SF] doesn't deal with theoretical questions; only Answerable questions about problems you actually face. See the [FAQ] for more details. – Chris S Nov 20 '12 at 15:17
  • Sorry (moderators), this is an answerable, interesting question: the reason is that the rdiff algorithm cannot be symmetric. The bigger CPU and memory overhead falls on the "active" side. The "passive" side only needs to compute checksums of all blocks (see the --block-size parameter) of all modified files and resend them. That is done with very small memory requirements, and most of the operations can be done in the first-level CPU cache. The "active" side needs to search by checksum where the same data block is now located... – hynekcer Nov 13 '16 at 20:27
  • That requires holding 24 bytes of memory per block and frequent random access to that big temporary in-memory database, searching a 4-byte checksum for every byte of misplaced data, computed on the fly by the `rolling hash` algorithm over all bytes of the block. It is clear that this big range must use slow, uncached memory. It would be a controversial idea to implement the "active" part on the remote side if you imagine that it could be a file server that must execute many similar concurrent requests, or even a cheap NAS server... – hynekcer Nov 13 '16 at 20:29
  • (I don't know the current state of the art of the algorithm and its system requirements, but for the purposes of this question this is sufficient.) [rsync wiki](https://en.wikipedia.org/wiki/Rsync#Algorithm) – hynekcer Nov 13 '16 at 20:29
  • I think this is a *great* question (I had exactly this question, that's why I'm here) and the closing reason is bogus. I still want to know the answer! – reinierpost Mar 16 '17 at 09:49
  • Perhaps the question should be more specific, e.g. *Is there any technical reason why `rsync` doesn't have a `-3` option like `scp`? Couldn't the `rsync` command just pass the traffic between the two remote hosts through without loss of efficiency?* And hynekcer gives a good answer. – reinierpost Mar 16 '17 at 09:53
  • @hynekcer: you should post this as an answer, because none of the current answers actually answer the question, but your comment does. – Benoit Duffez Feb 19 '21 at 16:04
  • @BenoitDuffez A closed question can't be answered, and I don't have enough reputation to ask for reopening. I'll add the most important reason to my "answer": the `rolling hash` algorithm requires that the full data of one side be accessible: "The sender computes the checksum for each rolling section in its version of the file" [rsync wiki - Algorithm](https://en.wikipedia.org/wiki/Rsync#Algorithm). There is nothing to be cached by the third side. It can only authorize the transfer (prepare a restricted temporary SSH authorized key) and create a network tunnel, not control the transfer. – hynekcer Feb 20 '21 at 14:27
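
For illustration, here is a minimal sketch of the tunnel-only role hynekcer describes: the local machine authorizes and relays the traffic, but one of the remotes still runs the "active" side of the algorithm. All hostnames, users, ports and paths below are placeholders, not from the original question.

# Reverse tunnel: while this session is open, port 2222 on hostA forwards
# through the local machine to hostB:22.
ssh -R 2222:hostB:22 userA@hostA

# Then, in the shell that opens on hostA, point rsync at the tunnel endpoint;
# the payload flows hostA -> local machine -> hostB.
# (You may need to accept hostB's host key under the name 'localhost'.)
rsync -avz -e 'ssh -p 2222' /data/ userB@localhost:/backup/data/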

3 Answers


Why not connect to the remote machine and start the transfer from there? If you are using SSH keys, you can use agent forwarding to handle authentication for you.

ssh -A remoteHostA rsync /remote/file/on/host/a remoteHostB:/destination/

This command logs you in to remoteHostA and runs rsync from there.
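
A slightly fuller sketch of the same idea, assuming key-based authentication and a running local agent (host names and paths are placeholders carried over from the example above):

ssh-add                        # make sure the local agent holds your key
ssh -A user@remoteHostA \
  'rsync -avz /remote/file/on/host/a user@remoteHostB:/destination/'

With -A, remoteHostA authenticates to remoteHostB through the forwarded local agent, so no private key needs to be stored on either remote.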

Kaplaa
  • There are some security considerations involved. See edit – loopbackbee Nov 18 '12 at 01:38
  • Also, that won't work if you don't have direct access between the two systems... And if getting direct access involves waiting two weeks for a security section to maybe correctly implement firewall rules... – Gert van den Berg Nov 25 '14 at 12:38
  • Scenario: Server A has key-based root access to servers B and C. You want to sync from B to C using root access on both. But you don't want B or C to have root access to each other. – thomasrutter Nov 12 '17 at 23:05
  • this does not answer the question – Julius Aug 24 '22 at 19:19

scp -3r <remote src> <remote dest>

has no trouble doing this.
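
For example, with placeholder hosts and paths, -3 routes the data through the local machine (instead of letting the source host connect to the destination directly) and -r recurses into directories:

scp -3 -r userA@hostA:/var/www/ userB@hostB:/var/www/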

user1133275
Karma Fusebox
  • scp doesn't do deltas though, AFAIK, and that's badly needed in this case. More details in my edit – loopbackbee Nov 18 '12 at 01:42
  • Btw, scp has to be run with the -3 option if you'd like it to act as a proxy between the hosts – alterpub Dec 03 '14 at 15:18
  • Unfortunately scp lacks important features of rsync, e.g. the option not to cross filesystem boundaries (e.g. a network drive mounted in a user folder), archive mode (in which "symbolic links, devices, attributes, permissions, ownerships, etc. are preserved"), etc... – Bastion Sep 12 '19 at 04:08
  • I wish it wasn't this slow ( – Nakilon Jun 29 '21 at 21:17

You can work around this by mounting one (or both) of the remote filesystems with sshfs. Then, rsync will treat it as if it were local.
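
A minimal sketch of that workaround (hosts, users, mount points and paths are placeholders):

mkdir -p /mnt/hostB
sshfs userB@hostB:/backup /mnt/hostB            # mount the remote target locally
rsync -av userA@hostA:/data/ /mnt/hostB/data/   # rsync treats the mount as local
fusermount -u /mnt/hostB                        # unmount when finished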

Unfortunately, this results in a lot of bandwidth usage between you and the machine whose filesystem is mounted with sshfs, so I would recommend doing this only with the machine that has plenty of bandwidth between you and it.

Of course, the ideal solution is for the machines to talk directly to each other. I can't think of any good reason why they should not.

Michael Hampton
  • See edit. The reason you're referring to is security: a (root) compromise of either one of the two machines must not lead to filesystem access on the other. But maybe I'm attacking this from the wrong angle and there's a solution that doesn't involve a third machine... – loopbackbee Nov 18 '12 at 01:41
  • Hmm. I think those details really should be added to this question. As for a root compromise, you really should be using SELinux. – Michael Hampton Nov 18 '12 at 01:47
  • I left this one because someone else may be interested in this behaviour on rsync in particular. AFAIK, SELinux on any of the machines cannot tell if the other has been compromised if the locally executing rsync has the exact same behaviour in both cases (filesystem access to a particular directory). – loopbackbee Nov 18 '12 at 01:52
  • I'm not sure if you understand how SELinux works. Its whole point is that it prevents a service (even a compromised service) from accessing things it isn't explicitly allowed to access, even if that service runs as root. – Michael Hampton Nov 18 '12 at 01:59
  • I have only a basic understanding of SELinux policies, but rsync **is** supposed to access the filesystem, isn't it? The exact same kind of (local) access patterns are undesirable if the **remote** machine is compromised. – loopbackbee Nov 18 '12 at 02:19
  • Eh? Do you envision _rsync_ itself being compromised? It's not a continuously running service, after all. – Michael Hampton Nov 18 '12 at 02:20
  • I'm assuming you've read my other question, but maybe I'm not explaining myself clearly. Say there's machine1 and machine2. I envision machine1 being rooted. At this point, evil root deletes all documents on machine1, and happily rsyncs to machine2, effectively destroying two filesystems instead of only one. This can be (somewhat) prevented with a third machine acting as intermediary. – loopbackbee Nov 18 '12 at 04:14
  • Yes, that's what SELinux is for. It will protect you quite well from this circumstance when it involves a compromise of an outside facing service. Nothing can prevent you from an insider with the root password, though. – Michael Hampton Nov 18 '12 at 04:18
  • I was going to ask "and how is the SELinux policy going to tell apart a licit delete-all-files operation from one made by a rooted machine?", but then I realized a third machine also can't, really. So I guess your point is "whatever policies can be implemented in that third machine can also be implemented locally in SELinux", right? Though SELinux can't really probe the remote machine, or monitor a remote IDS... Also, assuming said policies are impossible to write, having the third machine initiate the connections buys time to detect the rooting. – loopbackbee Nov 18 '12 at 04:29