I have an ECS managed EC2 instance running in a VPC (in one of the private subnets). When trying to run a task on this instance it doesn't seem to be able to pull the image. As far as I can make out from the documentation there is no special configuration needed for the ECS agent to pull the image from the repo.
Looking at the Docker logs I repeatedly see the following:
level=error msg="Download failed, retrying: dial tcp 54.231.17.81:443: i/o timeout"
The ecs-agent logs repeatedly show me that the image is not downloading:
Pulling image module="TaskEngine" image="REDACTED.dkr.ecr.us-east-1.amazonaws.com/REDACTED:latest" status="Retrying in 19 seconds"
It eventually tries to run the image, but obviously fails and exits, giving me the message below in the Cluster Tasks tab:
STOPPED (Essential container in task exited)
This error has been occurring with both the amzn-ami-2016.03.e and amzn-ami-2016.03.d AMIs.
Are there any specific configuration or networking rules that I need to apply to be able to pull from ECR?
Any help here would be greatly appreciated.
As a side note, the instance does have access to the internet (pinging google.com works fine), and when I try to pull an image from Docker Hub, it also works fine.
To download an image from ECR, the container instance needs access to both the ECR and S3 endpoints (ECR stores image layers in S3).
If your subnet is private, you have to use either the PrivateLink feature (VPC endpoints) or a NAT gateway to reach those endpoints.
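Before changing anything, it may help to confirm which endpoints the instance can actually reach over HTTPS, since a successful ping does not prove that TCP port 443 is open. Below is a minimal Python sketch of such a check; the function name `can_connect` is my own, and the hostnames in the note after it are only examples to be replaced with your own region and registry.

```python
import socket


def can_connect(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and DNS failures.
        return False
```

Run from the container instance, something like `can_connect("api.ecr.us-east-1.amazonaws.com")` and `can_connect("s3.us-east-1.amazonaws.com")` should both return True once the networking is set up correctly. If the S3 check times out, that would match the "i/o timeout" in the Docker log.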
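If you choose to use PrivateLink, this includes creating interface VPC endpoints for com.amazonaws.<region>.ecr.api and com.amazonaws.<region>.ecr.dkr, plus a gateway endpoint for S3 (com.amazonaws.<region>.s3), as described in the reference link below.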
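As a rough sketch of the PrivateLink route, the parameters for the two ECR interface endpoints and the S3 gateway endpoint described in the linked guide could be built in Python as shown here. The helper names are my own, all resource IDs are placeholders you would supply, and `create_endpoints` assumes boto3 is installed and credentials are configured.

```python
def endpoint_requests(region, vpc_id, subnet_ids, security_group_ids, route_table_ids):
    """Build keyword arguments for three EC2 create_vpc_endpoint calls."""
    # Interface endpoints for the ECR API and the Docker registry.
    requests = [
        {
            "VpcEndpointType": "Interface",
            "VpcId": vpc_id,
            "ServiceName": f"com.amazonaws.{region}.{name}",
            "SubnetIds": list(subnet_ids),
            "SecurityGroupIds": list(security_group_ids),
            "PrivateDnsEnabled": True,
        }
        for name in ("ecr.api", "ecr.dkr")
    ]
    # Gateway endpoint for S3, attached to the private subnet's route tables.
    requests.append(
        {
            "VpcEndpointType": "Gateway",
            "VpcId": vpc_id,
            "ServiceName": f"com.amazonaws.{region}.s3",
            "RouteTableIds": list(route_table_ids),
        }
    )
    return requests


def create_endpoints(requests, region):
    """Send the requests to EC2 (makes real API calls; not run on import)."""
    import boto3  # third-party dependency, imported only when needed

    ec2 = boto3.client("ec2", region_name=region)
    return [ec2.create_vpc_endpoint(**kwargs) for kwargs in requests]
```

The security group given to the interface endpoints needs to allow inbound HTTPS (port 443) from the container instances, otherwise the pull will still time out.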
If you choose to use a NAT gateway, route all outbound traffic from the private subnet through the NAT gateway and whitelist the AWS IP ranges.
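For the whitelisting part, AWS publishes its address ranges as JSON at https://ip-ranges.amazonaws.com/ip-ranges.json. Here is a small Python sketch, using only the standard library, that picks out the prefixes for one service and region and checks whether a given address (for example the one in the Docker log) falls inside them. The function names are my own, and which services you need to allow is an assumption to verify against your setup.

```python
import ipaddress
import json
import urllib.request

IP_RANGES_URL = "https://ip-ranges.amazonaws.com/ip-ranges.json"


def fetch_ip_ranges(url=IP_RANGES_URL):
    """Download and parse the published AWS IP ranges (needs outbound HTTPS)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def prefixes_for(data, service, region):
    """Return the sorted IPv4 CIDR blocks listed for a service in a region."""
    return sorted(
        {
            p["ip_prefix"]
            for p in data.get("prefixes", [])
            if p["service"] == service and p["region"] == region
        }
    )


def covered(ip, prefixes):
    """Return True if `ip` falls inside any of the CIDR blocks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(p) for p in prefixes)
```

For example, `covered("54.231.17.81", prefixes_for(fetch_ip_ranges(), "S3", "us-east-1"))` tells you whether the address that timed out belongs to S3 in that region, which would point at the S3 ranges being the ones missing from your rules.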
Reference Link: https://docs.aws.amazon.com/AmazonECR/latest/userguide/vpc-endpoints.html