I have two AWS CloudFormation stacks: one for IAM roles, and a second that creates an AWS service and imports the respective roles into it.
When 10+ services are deployed, the following error appears randomly on 1 or 2 of the services:
AWS::ECS::Service service Unable to assume role and validate the listeners configured on your load balancer. Please verify that the ECS service role being passed has the proper permissions.
If all the services are torn down and redeployed to the ECS cluster, the error appears again, but for different services.
The AWS fix for this can be seen here
If the 1 or 2 broken services are torn down and redeployed, they deploy without issue. So the problem appears to occur only when many services are deployed at the same time, which suggests it may be an IAM propagation timing issue within CloudFormation.
I've tried adding DependsOn to the service definition:
"service" : {
  "Type" : "AWS::ECS::Service",
  "DependsOn" : [
    "taskdefinition",
    "ECSServiceRole"
  ],
  "Properties" : {
    "Cluster" : { "Ref": "ECSCluster"},
    "Role" : {"Ref" : "ECSServiceRole"},
    etc...
  }
}
But this doesn't work.
As you can see, I've also removed the imported value for the ECSServiceRole and replaced it with a role defined in the same template, with an inline policy:
"ECSServiceRole" : {
  "Type" : "AWS::IAM::Role",
  "Properties" : {
    "AssumeRolePolicyDocument" : {
      "Statement" : [
        {
          "Sid": "",
          "Effect" : "Allow",
          "Principal" : {
            "Service" : [
              "ecs.amazonaws.com"
            ]
          },
          "Action" : [
            "sts:AssumeRole"
          ]
        }
      ]
    },
    "Path" : "/",
    "Policies" : [
      {
        "PolicyName" : "ecs-service",
        "PolicyDocument" : {
          "Statement" : [
            {
              "Effect" : "Allow",
              "Action" : [
                "ec2:Describe*",
                "ec2:AuthorizeSecurityGroupIngress",
                "elasticloadbalancing:DeregisterInstancesFromLoadBalancer",
                "elasticloadbalancing:DeregisterTargets",
                "elasticloadbalancing:Describe*",
                "elasticloadbalancing:RegisterInstancesWithLoadBalancer",
                "elasticloadbalancing:RegisterTargets",
                "sns:*"
              ],
              "Resource" : "*"
            }
          ]
        }
      }
    ]
  }
}
But again - the inline policy doesn't fix the issue either.
Any ideas or pointers would be much appreciated!
In reply to answer 1.
Thank you - I wasn't aware of this improvement.
Is this the correct way to associate the service-linked role for ECS?
"ECSServiceRole": {
  "Type": "AWS::IAM::Role",
  "Properties": {
    "AssumeRolePolicyDocument": {
      "Statement": [
        {
          "Sid": "",
          "Effect": "Allow",
          "Principal": {
            "Service": [
              "ecs.amazonaws.com"
            ]
          },
          "Action": [
            "sts:AssumeRole"
          ]
        }
      ]
    },
    "Path": "/",
    "Policies": [
      {
        "PolicyName": "CreateServiceLinkedRoleForECS",
        "PolicyDocument": {
          "Statement": [
            {
              "Effect": "Allow",
              "Action": [
                "iam:CreateServiceLinkedRole",
                "iam:PutRolePolicy",
                "iam:UpdateRoleDescription",
                "iam:DeleteServiceLinkedRole",
                "iam:GetServiceLinkedRoleDeletionStatus"
              ],
              "Resource": "arn:aws:iam::*:role/aws-service-role/ecs.amazonaws.com/AWSServiceRoleForECS*",
              "Condition": {
                "StringLike": {
                  "iam:AWSServiceName": "ecs.amazonaws.com"
                }
              }
            }
          ]
        }
      }
    ]
  }
}
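For comparison, here is a minimal sketch of declaring the ECS service-linked role directly as its own resource, rather than granting a custom role permission to create it. It assumes the AWSServiceRoleForECS role does not already exist in the account (creating it a second time fails). The logical name ECSServiceLinkedRole and the description text are illustrative and do not come from the original templates.

```json
{
  "Resources": {
    "ECSServiceLinkedRole": {
      "Type": "AWS::IAM::ServiceLinkedRole",
      "Metadata": {
        "Comment": "Hypothetical logical name. Assumes AWSServiceRoleForECS does not yet exist in this account."
      },
      "Properties": {
        "AWSServiceName": "ecs.amazonaws.com",
        "Description": "Service-linked role that lets ECS manage load balancer registration."
      }
    }
  }
}
```

If this approach is used, the service definition would presumably drop its "Role" property, so that ECS falls back to the service-linked role for load balancer calls.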
Final Answer
After months of intermittent, ongoing issues and back-and-forth with AWS on this matter, AWS came back to say they were throttling us in the background on the ELB. This is why random and varied failures appeared when deploying 3+ Docker services via CloudFormation at the same time. The solution had nothing to do with IAM permissions; rather, it was to have the rate limit on the ELB increased via the "AWS Service Team".
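Since the root cause was throttling under concurrent deployments, one possible mitigation while waiting for a limit increase is to make CloudFormation create the services one after another by chaining DependsOn. This is only a sketch and not what AWS recommended in this case. It assumes the services live in the same template; serviceA, serviceB, taskdefinitionA and taskdefinitionB are hypothetical logical names, ECSCluster is defined elsewhere in the template, and the load balancer and role properties are left out for brevity.

```json
{
  "Resources": {
    "serviceA": {
      "Type": "AWS::ECS::Service",
      "Properties": {
        "Cluster": { "Ref": "ECSCluster" },
        "TaskDefinition": { "Ref": "taskdefinitionA" },
        "DesiredCount": 1
      }
    },
    "serviceB": {
      "Type": "AWS::ECS::Service",
      "Metadata": {
        "Comment": "Hypothetical: DependsOn forces serviceB to wait until serviceA is created, so the two never call the ELB APIs at the same time."
      },
      "DependsOn": ["serviceA"],
      "Properties": {
        "Cluster": { "Ref": "ECSCluster" },
        "TaskDefinition": { "Ref": "taskdefinitionB" },
        "DesiredCount": 1
      }
    }
  }
}
```

The trade-off is a longer total deployment time, since each service must stabilize before the next one starts.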