
I'm currently writing a script to sync files in S3 buckets with s3cmd.

I checked the documentation, which says:

s3cmd sync LOCAL_DIR s3://BUCKET[/PREFIX] or s3://BUCKET[/PREFIX] LOCAL_DIR

I also found a useful option:

--delete-removed
         Delete remote objects with no corresponding local file [sync]

I tested the first form of s3cmd sync with --delete-removed:

s3cmd sync -r --delete-removed LOCAL_DIR s3://BUCKET[/PREFIX]

It works like a charm: any objects in the S3 bucket that are not in my LOCAL_DIR are deleted.

However, when I try the second form:

s3cmd sync -r --delete-removed s3://BUCKET[/PREFIX] LOCAL_DIR

s3cmd seems to first delete all the files under LOCAL_DIR and then download the files from the S3 bucket into LOCAL_DIR.

This is clearly a waste of time. Is there a better way to sync without deleting all my local files first? That is, I want LOCAL_DIR to end up as an exact copy of the files in the S3 bucket.

lazyka
  • It could be the use (or lack) of a trailing slash that is causing issues. It's hard to say without looking at your filesystem and the output of a dry run. Also, have you tried a more basic example (one or two files in the folder) to prove this happens all the time, not just in your case? – Drew Khoury Jun 22 '13 at 05:10
  • @DrewKhoury Thanks!! Your comment saved me time; the trailing slash really was the problem – lazyka Jun 24 '13 at 02:50
  • That's one that has got me before. I've added a proper answer with reference to the docs. – Drew Khoury Jun 24 '13 at 07:36
  • Please let me know if I've suitably answered your question. If so you can mark it as the best answer. – Drew Khoury Jun 25 '13 at 01:20

1 Answer


Take care with your trailing slash (or lack of slash) in path names. It makes a difference.

http://s3tools.org/s3cmd-sync

Important — in both cases just the last part of the path name is taken into account. In the case of dir1 without trailing slash (which would be the same as, say, ~/demo/dir1 in our case) the last part of the path is dir1 and that’s what’s used on the remote side, appended after s3://s3…/path/ to make s3://s3…/path/dir1/….

On the other hand in the case of dir1/ (note the trailing slash), which would be the same as ~/demo/dir1/ (trailing slash again) is actually similar to saying dir1/* – ie expand to the list of the files in dir1. In that case the last part(s) of the path name are the filenames (file1-1.txt and file1-2.txt) without the dir1/ directory name. So the final S3 paths are s3://s3…/path/file1-1.txt and s3://s3…/path/file1-2.txt respectively, both without the dir1/ member in them. I hope it’s clear enough, if not ask in the mailing list or send me a better wording ;-)
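
To make the quoted rule concrete, here is a minimal shell sketch of it. It does not call s3cmd at all: map_key is a hypothetical helper written only for illustration, and s3://BUCKET/path/ is a placeholder destination. It only models the "last part of the path name" behaviour described above, assuming that description is accurate for your s3cmd version.

```shell
#!/bin/sh
# Hypothetical helper (NOT part of s3cmd): models the rule quoted above.
# Usage: map_key SRC_DIR FILE DEST_PREFIX
map_key() {
    src=$1
    file=$2
    dest=$3
    case "$src" in
        */)
            # Trailing slash: behaves like SRC_DIR/*, so only the file
            # name is appended to the destination prefix.
            printf '%s%s\n' "$dest" "$file"
            ;;
        *)
            # No trailing slash: the last path component (the directory
            # name) is kept and appended before the file name.
            printf '%s%s/%s\n' "$dest" "$(basename "$src")" "$file"
            ;;
    esac
}

# Placeholder paths for illustration only.
map_key demo/dir1  file1-1.txt s3://BUCKET/path/
map_key demo/dir1/ file1-1.txt s3://BUCKET/path/
```

The first call prints a key that contains dir1/, the second one does not. By the same reasoning, a sync in the download direction is probably safer with trailing slashes on both sides, for example s3cmd sync --dry-run --delete-removed s3://BUCKET/PREFIX/ LOCAL_DIR/, where --dry-run lets you check what would be deleted or downloaded before running it for real.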

Drew Khoury