I'm trying to rsync between Google Cloud Storage and Amazon S3. On my local machine (macOS), this command works:
gsutil rsync -d -r gs://mira-internal/data/exports/20191108 s3://mira-temp/raw/20191108
However, when I try to run it on an Ubuntu machine running on Kubernetes, it fails with the following error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
Per the gsutil documentation, this could be caused by a locale configuration issue, but when I run locale in the Ubuntu shell, the encoding looks right:
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=en_US.UTF-8
Not sure what the issue is here.
UPDATE: As a workaround, I can use gsutil to rsync to local disk without any issues, then use the AWS CLI to write to S3.
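For anyone who wants to reproduce the two-step workaround, here is a rough sketch of how it might look as a shell function. The function name `sync_via_local` and the staging directory are hypothetical, not something from my actual setup, and I'm assuming `aws s3 sync --delete` is an acceptable stand-in for the `-d` flag of `gsutil rsync`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Copy a GCS prefix to S3 in two steps, staging the files on local disk.
# Arguments: <gs:// source> <local staging dir (hypothetical)> <s3:// destination>
sync_via_local() {
  local src="$1" staging="$2" dst="$3"

  mkdir -p "$staging"

  # Step 1: mirror the GCS prefix to local disk.
  gsutil rsync -d -r "$src" "$staging"

  # Step 2: upload with the AWS CLI; --delete removes destination objects
  # that are absent locally, mirroring what -d does for gsutil rsync.
  aws s3 sync "$staging" "$dst" --delete
}
```

With the paths from the question, it would be called as `sync_via_local gs://mira-internal/data/exports/20191108 /tmp/20191108 s3://mira-temp/raw/20191108`, where `/tmp/20191108` is just an example staging location. This needs enough local disk to hold the whole export, which is the main downside compared to a direct bucket-to-bucket rsync.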