I'm trying to rsync between Google Cloud Storage and Amazon S3. On my local machine (macOS) this command works:

gsutil rsync -d -r gs://mira-internal/data/exports/20191108 s3://mira-temp/raw/20191108

However, when I run it on an Ubuntu machine running on Kubernetes, it fails with the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

Per the documentation, this could be caused by a locale configuration issue, but when I run locale in the Ubuntu shell the encoding looks right:

LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=en_US.UTF-8

I'm not sure what the issue is here.

UPDATE: As a workaround, I can use gsutil to rsync to local disk without any issues, then use the AWS CLI to write to S3.
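
For reference, the two-step workaround can be sketched roughly as below. This is only an illustration: the staging directory /tmp/export-20191108 and the helper names run and sync_export are hypothetical, and using aws s3 sync with --delete as the counterpart of gsutil's -d is an assumption rather than the exact command from my setup. By default the script only prints the commands; set DRY_RUN=0 to actually execute them.

```shell
#!/usr/bin/env bash
# Sketch of the two-step workaround: stage on local disk, then push to S3.
set -eu

SRC="gs://mira-internal/data/exports/20191108"
DST="s3://mira-temp/raw/20191108"
# Hypothetical staging directory; any path with enough free space would do.
STAGE="/tmp/export-20191108"

# Print the command when DRY_RUN is 1 (the default), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

sync_export() {
  run mkdir -p "$STAGE"
  # Step 1: GCS -> local disk (-d removes local files missing from the source).
  run gsutil rsync -d -r "$SRC" "$STAGE"
  # Step 2: local disk -> S3 (--delete is assumed to mirror -d on the S3 side).
  run aws s3 sync "$STAGE" "$DST" --delete
}

sync_export
```

Running it once in the default dry-run mode is a cheap way to check the paths before letting --delete touch the destination bucket.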

  • Do you know if any of the files in the Google Cloud Storage bucket were uploaded from your macOS machine or a Windows machine? I suspect the issue might be related to some [Cross-Platform Encoding Problems](https://cloud.google.com/storage/docs/gsutil/addlhelp/Filenameencodingandinteroperabilityproblems#cross-platform-encoding-problems-of-which-to-be-aware) as explained in the official documentation. – pessolato Nov 12 '19 at 08:47
  • The GCS files were created using BigQuery – kellanburket Nov 12 '19 at 14:22
