S3 to Redshift : Copy with Access Denied

Published 2020-04-17 07:12

Question:

We used to copy files from S3 to Redshift every day with the COPY command, from a bucket that had no specific policy.

COPY schema.table_staging     
FROM 's3://our-bucket/X/YYYY/MM/DD/'     
CREDENTIALS 'aws_access_key_id=xxxxxx;aws_secret_access_key=xxxxxx'     
CSV     
GZIP     
DELIMITER AS '|'     
TIMEFORMAT 'YYYY-MM-DD HH24:MI:SS';  

To improve the security of our S3 bucket, we added a policy that authorizes connections only from our VPC (the one used by our Redshift cluster) or from a specific IP address.

{
    "Version": "2012-10-17",
    "Id": "S3PolicyId1",
    "Statement": [
        {
            "Sid": "DenyAllExcept",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::our-bucket/*",
                "arn:aws:s3:::our-bucket"
            ],
            "Condition": {
                "StringNotEqualsIfExists": {
                    "aws:SourceVpc": "vpc-123456789"
                },
                "NotIpAddressIfExists": {
                    "aws:SourceIp": [
                        "12.35.56.78/32"
                    ]
                }
            }
        }
    ]
}

This policy works well for accessing files from EC2, EMR, or our specific IP address using the AWS CLI or the boto Python library.
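To make the logic of the Deny statement concrete, here is a rough Python model of how its two conditions combine. It is a simplification and not the real IAM evaluation engine: it assumes only that the two condition operators are ANDed together, and that an `...IfExists` operator evaluates to true when its key is absent from the request. The function name `is_denied` and the sample address `52.0.0.1` are hypothetical, chosen for illustration.

```python
import ipaddress

# Values taken from the bucket policy above.
ALLOWED_VPC = "vpc-123456789"
ALLOWED_CIDRS = ["12.35.56.78/32"]


def is_denied(source_vpc=None, source_ip=None):
    """Simplified model of the DenyAllExcept statement (hypothetical helper).

    None means the condition key is absent from the request context.
    Returns True when the Deny statement would match the request.
    """
    # StringNotEqualsIfExists: true if the key is absent or the value differs.
    vpc_condition = source_vpc is None or source_vpc != ALLOWED_VPC

    # NotIpAddressIfExists: true if the key is absent or the address is
    # outside every allowed CIDR block.
    ip_condition = source_ip is None or not any(
        ipaddress.ip_address(source_ip) in ipaddress.ip_network(cidr)
        for cidr in ALLOWED_CIDRS
    )

    # Both condition operators must be true for the Deny to apply.
    return vpc_condition and ip_condition


if __name__ == "__main__":
    # Request through a VPC endpoint of the allowed VPC (no aws:SourceIp key).
    print(is_denied(source_vpc="vpc-123456789"))
    # Request over the Internet from the allowed address (no aws:SourceVpc key).
    print(is_denied(source_ip="12.35.56.78"))
    # Request over the Internet from any other address (hypothetical IP).
    print(is_denied(source_ip="52.0.0.1"))
```

In this model, a request that reaches S3 over the public Internet from an address outside the allowed range carries no `aws:SourceVpc` key, so both conditions hold and the Deny applies. Note that the statement only denies; the caller still needs a separate Allow (for example from its IAM credentials) to read the bucket.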

Here is the error we get from Redshift:

ERROR: S3ServiceException:Access Denied,Status 403,Error AccessDenied,Rid xxxxxx,CanRetry 1
Détail : 
-----------------------------------------------
error:  S3ServiceException:Access Denied,Status 403,Error AccessDenied,Rid xxxxxx,CanRetry 1
code:      8001
context:   Listing bucket=our-bucket prefix=X/YYYY/MM/DD/
query:     1587954
location:  s3_utility.cpp:552
process:   padbmaster [pid=21214]
-----------------------------------------------

Many thanks in advance for any help with this,

Damien

PS: this question is quite similar to this one: Copying data from S3 to Redshift - Access denied

Answer 1:

You need to use the 'Enhanced VPC Routing' feature of Redshift. From the documentation:

  1. When you use Amazon Redshift Enhanced VPC Routing, Amazon Redshift forces all COPY and UNLOAD traffic between your cluster and your data repositories through your Amazon VPC.

  2. If Enhanced VPC Routing is not enabled, Amazon Redshift routes traffic through the Internet, including traffic to other services within the AWS network.

  3. For traffic to an Amazon S3 bucket in the same region as your cluster, you can create a VPC endpoint to direct traffic directly to the bucket.
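As a minimal sketch of what this could look like with boto3 (the successor to the boto library mentioned in the question), the snippet below builds the two requests involved: turning on Enhanced VPC Routing for the cluster, and creating an S3 gateway endpoint in the VPC. The cluster identifier, region, and route table ID are hypothetical placeholders, and the helper function names are made up for illustration. The `apply_changes` function needs boto3 installed and valid AWS credentials; treat it as a starting point under those assumptions rather than a tested deployment script.

```python
def enhanced_routing_request(cluster_id):
    """Parameters for redshift.modify_cluster to enable Enhanced VPC Routing."""
    return {"ClusterIdentifier": cluster_id, "EnhancedVpcRouting": True}


def s3_gateway_endpoint_request(vpc_id, region, route_table_ids):
    """Parameters for ec2.create_vpc_endpoint for an S3 gateway endpoint."""
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        # S3 gateway endpoint service name for the given region.
        "ServiceName": f"com.amazonaws.{region}.s3",
        # Route tables of the subnets the Redshift cluster runs in.
        "RouteTableIds": list(route_table_ids),
    }


def apply_changes(cluster_id, vpc_id, region, route_table_ids):
    """Send both requests to AWS (requires boto3 and AWS credentials)."""
    import boto3  # third-party; imported here so the helpers work without it

    redshift = boto3.client("redshift", region_name=region)
    redshift.modify_cluster(**enhanced_routing_request(cluster_id))

    ec2 = boto3.client("ec2", region_name=region)
    ec2.create_vpc_endpoint(
        **s3_gateway_endpoint_request(vpc_id, region, route_table_ids)
    )


if __name__ == "__main__":
    # Hypothetical identifiers; replace with your own before calling
    # apply_changes(...). Only the VPC ID comes from the question.
    print(enhanced_routing_request("my-cluster"))
    print(s3_gateway_endpoint_request("vpc-123456789", "eu-west-1", ["rtb-0example"]))
```

With the endpoint in place and Enhanced VPC Routing enabled, COPY requests to a bucket in the same region should reach S3 through the VPC, so the `aws:SourceVpc` condition in the bucket policy can match.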