As described in this article[0], the company I work for uses a bastion AWS account to hold IAM users, plus separate AWS accounts for each running environment (prod, dev, etc.). This matters because, in a few cases, several of these accounts need access to a single S3 bucket.
One way to enable this is to set a bucket policy that allows access to the bucket through the S3 VPC endpoint of a particular AWS account's VPC.
Bucket policy for data-warehouse:
{
  "Sid": "access-from-dev-VPCE",
  "Effect": "Allow",
  "Principal": "*",
  "Action": "s3:*",
  "Resource": [
    "arn:aws:s3:::data-warehouse",
    "arn:aws:s3:::data-warehouse/*"
  ],
  "Condition": {
    "StringEquals": {
      "aws:sourceVpce": "vpce-d95b05b0"
    }
  }
}
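To make the multi-account case concrete, here is a minimal Python sketch that builds the same kind of statement for more than one VPC endpoint. It is only an illustration, not what we actually deploy: the helper name `vpce_statement`, the `Sid` value, and the second endpoint ID `vpce-00000000` are hypothetical placeholders. It relies on the fact that a list of values under `StringEquals` matches if any one of them matches.

```python
import json


def vpce_statement(bucket, vpce_ids, sid="access-from-VPCEs"):
    """Build one bucket-policy statement allowing access through the given VPC endpoints.

    The helper name and the default Sid are hypothetical, chosen for illustration.
    """
    return {
        "Sid": sid,
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": [
            f"arn:aws:s3:::{bucket}",
            f"arn:aws:s3:::{bucket}/*",
        ],
        # A list of values is evaluated as a logical OR across the endpoints.
        "Condition": {"StringEquals": {"aws:sourceVpce": list(vpce_ids)}},
    }


# "vpce-d95b05b0" is the dev endpoint from the policy above;
# "vpce-00000000" is a made-up placeholder for a second account's endpoint.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        vpce_statement("data-warehouse", ["vpce-d95b05b0", "vpce-00000000"]),
    ],
}

print(json.dumps(policy, indent=2))
```

This only covers the bucket-policy side; it does not change who owns the objects or what their ACLs contain.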
Role policy for role EMRRole:
{
  "Sid": "AllowRoleToListBucket",
  "Effect": "Allow",
  "Action": "s3:ListBucket",
  "Resource": [
    "arn:aws:s3:::data-warehouse"
  ]
},
{
  "Sid": "AllowRoleToGetBucketObjects",
  "Effect": "Allow",
  "Action": [
    "s3:GetObject",
    "s3:GetObjectVersion"
  ],
  "Resource": "arn:aws:s3:::data-warehouse/*"
}
Unfortunately, this doesn't work until I explicitly set the ACL on each object to grant full control to the owner of the AWS account I'm accessing from. If I don't, I get:
fatal error: An error occurred (403) when calling the HeadObject operation: Forbidden
The EMR instance I'm running this on has the correct role:
[hadoop@ip-10-137-221-91 tmp]$ aws sts get-caller-identity
{
    "Account": "1234567890",
    "UserId": "AROAIGVIL6ZDI6SR87KXO:i-0eaf8a5ca52876835",
    "Arn": "arn:aws:sts::1234567890:assumed-role/EMRRole/i-0eaf8a5ca52876835"
}
The ACL for an object in the data-warehouse bucket looks like this:
aws s3api get-object-acl --bucket=data-warehouse --key=content_category/build=2017-11-23/part0000.gz.parquet
{
    "Owner": {
        "DisplayName": "aws+dev",
        "ID": "YXJzdGFyc3RhcnRzadc6frYXJzdGFyc3RhcnN0"
    },
    "Grants": [
        {
            "Grantee": {
                "Type": "CanonicalUser",
                "DisplayName": "aws+dev",
                "ID": "YXJzdGFyc3RhcnRzadc6frYXJzdGFyc3RhcnN0"
            },
            "Permission": "FULL_CONTROL"
        }
    ]
}
In the above ACL, the dev AWS account can read the object, but another AWS account, say prod, cannot read it until it has been added as a "Grantee".
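To spell out that reasoning, here is a rough Python sketch that inspects the `get-object-acl` output and reports whether a given canonical user has a read-capable grant. The function name `can_read_via_acl` and the prod canonical ID are hypothetical. The check is deliberately simplified: it looks only at `CanonicalUser` grants and ignores group grants, bucket policies, and IAM policies.

```python
import json

# ACL permissions that include the ability to read the object.
READ_PERMISSIONS = {"READ", "FULL_CONTROL"}


def can_read_via_acl(acl, canonical_id):
    """Return True if the ACL grants READ or FULL_CONTROL to this canonical user.

    Hypothetical helper: considers only CanonicalUser grants in the object ACL.
    """
    for grant in acl.get("Grants", []):
        grantee = grant.get("Grantee", {})
        if (
            grantee.get("Type") == "CanonicalUser"
            and grantee.get("ID") == canonical_id
            and grant.get("Permission") in READ_PERMISSIONS
        ):
            return True
    return False


# The ACL shown above, as returned by `aws s3api get-object-acl`.
acl = json.loads("""
{
    "Owner": {"DisplayName": "aws+dev", "ID": "YXJzdGFyc3RhcnRzadc6frYXJzdGFyc3RhcnN0"},
    "Grants": [
        {
            "Grantee": {
                "Type": "CanonicalUser",
                "DisplayName": "aws+dev",
                "ID": "YXJzdGFyc3RhcnRzadc6frYXJzdGFyc3RhcnN0"
            },
            "Permission": "FULL_CONTROL"
        }
    ]
}
""")

DEV_ID = "YXJzdGFyc3RhcnRzadc6frYXJzdGFyc3RhcnN0"
PROD_ID = "hypothetical-prod-canonical-id"  # placeholder, not a real ID

print(can_read_via_acl(acl, DEV_ID))   # True
print(can_read_via_acl(acl, PROD_ID))  # False
```

Running a check like this over every key is exactly the per-object bookkeeping I'd like to avoid.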
My question: Is there a way to read/write objects to an S3 bucket from multiple AWS accounts without having to set ACLs on each individual object?
Note: we use Spark to write to S3 via s3a.