On running terraform apply, it creates a cluster, a service, and an EC2 instance. But the registered container instances count is 0, and the running tasks count is 0.
I tried changing ecs.amazonaws.com to ec2.amazonaws.com, but it throws an error:
aws_ecs_service.nginx: InvalidParameterException: Unable to assume role and validate the listeners configured on your load balancer. Please verify that the ECS service role being passed has the proper permissions.
provider "aws" {
  region = "us-east-1"
}

resource "aws_ecs_cluster" "demo" {
  name = "demo"
}

resource "aws_iam_role" "ecs_elb" {
  name = "ecs-elb"

  assume_role_policy = <<EOF
{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "ecs.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

resource "aws_iam_policy_attachment" "ecs_elb" {
  name       = "ecs_elb"
  roles      = ["${aws_iam_role.ecs_elb.id}"]
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceRole"
}

resource "aws_launch_configuration" "ecs_instance" {
  name_prefix   = "ecs-instance-"
  instance_type = "t2.micro"
  image_id      = "ami-4fffc834"
}

resource "aws_autoscaling_group" "ecs_cluster_instances" {
  availability_zones   = ["us-east-1a"]
  name                 = "ecs-cluster-instances"
  min_size             = 1
  max_size             = 1
  launch_configuration = "${aws_launch_configuration.ecs_instance.name}"
}

resource "aws_ecs_task_definition" "nginx" {
  family = "nginx"

  container_definitions = <<EOF
[{
  "name": "nginx",
  "image": "nginx",
  "cpu": 1024,
  "memory": 768,
  "essential": true,
  "portMappings": [{"containerPort": 80, "hostPort": 80}]
}]
EOF
}

resource "aws_ecs_service" "nginx" {
  name            = "nginx"
  cluster         = "${aws_ecs_cluster.demo.id}"
  task_definition = "${aws_ecs_task_definition.nginx.arn}"
  desired_count   = 1
  iam_role        = "${aws_iam_role.ecs_elb.arn}"

  load_balancer {
    elb_name       = "${aws_elb.nginx.id}"
    container_name = "nginx"
    container_port = 80
  }
}

resource "aws_elb" "nginx" {
  availability_zones = ["us-east-1a"]
  name               = "nginx"

  listener {
    lb_port           = 80
    lb_protocol       = "http"
    instance_port     = 80
    instance_protocol = "http"
  }
}
To troubleshoot ECS problems, you can follow the steps below.
- Click the service name nginx and check whether any tasks are in pending status. If so, there are normally a lot of stopped tasks as well. That means the containers are not healthy.
- Click the service name, then Events, and check whether there are any error events to help you troubleshoot.
- Click ECS Instances and check whether there are any instances in the list. If not, no EC2 instance has successfully registered itself with the ECS cluster.
If you use the AWS ECS AMI, it should be fine. But if you use your own AMI, you need to add the userdata script below.
ecs-userdata.tpl
#!/bin/bash
echo "ECS_CLUSTER=${ecs_cluster_name}" >> /etc/ecs/ecs.config
Then update the Terraform code:
data "template_file" "ecs_user_data" {
  template = "${file("ecs-userdata.tpl")}"

  vars {
    ecs_cluster_name = "${var.ecs_cluster_name}"
  }
}

resource "aws_launch_configuration" "demo" {
  ...
  user_data = "${data.template_file.ecs_user_data.rendered}"
  ...
}
- Enable Docker container logs. The easiest way is to send the logs to AWS CloudWatch.
Add the resource below first.
resource "aws_cloudwatch_log_group" "app_logs" {
  name              = "demo"
  retention_in_days = 14
}
Then add the code below to the container definition inside the task definition.
"logConfiguration": {
  "logDriver": "awslogs",
  "options": {
    "awslogs-group": "${aws_cloudwatch_log_group.app_logs.name}",
    "awslogs-region": "${var.region}"
  }
},
After you apply the change, go to CloudWatch, Logs and check whether there are any error logs.
- Change the trust policy of the IAM role so that both services, ["ecs.amazonaws.com", "ec2.amazonaws.com"], can assume it:
"Principal": {
  "Service": ["ecs.amazonaws.com", "ec2.amazonaws.com"]
},
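Applied to the role from the question, that change might look like the sketch below. It reuses the question's aws_iam_role.ecs_elb resource unchanged apart from the list of service principals. Whether one shared role or two separate roles fits your setup better is a judgment call this sketch does not settle.

```hcl
# Sketch only: the question's role, with both services allowed to assume it.
resource "aws_iam_role" "ecs_elb" {
  name = "ecs-elb"

  assume_role_policy = <<EOF
{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": ["ecs.amazonaws.com", "ec2.amazonaws.com"]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}
```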
Hope these steps are helpful for you.
Further reading:
Launching an Amazon ECS Container Instance
Here are a few suggestions for what to check in the AWS Console:
Make sure that you are using Amazon ECS-optimized AMIs.
Basically, once you log in to these instances as root, they should have the start ecs command.
Terraform example:
data "aws_ami" "ecs_ami" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn-ami-*-amazon-ecs-optimized"]
  }
}
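To actually use that lookup, the launch configuration can reference the data source instead of a hard-coded AMI ID. A minimal sketch, assuming the data source above and reusing the name_prefix and instance_type values from the question:

```hcl
resource "aws_launch_configuration" "ecs_instance" {
  name_prefix   = "ecs-instance-"
  instance_type = "t2.micro"

  # Take the AMI ID from the data source lookup rather than hard-coding it.
  image_id = "${data.aws_ami.ecs_ami.id}"
}
```

Because most_recent is set, the resolved AMI can change between runs, which would cause Terraform to plan a replacement of the launch configuration.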
Check whether the EC2 instances have spun up.
- Check your Load Balancing Target Group (e.g. find out why the instances are not registered by checking their Health status in the Targets tab, the Attributes in the Description tab, and the Health checks tab).
Check whether the ECS agent is running on the EC2 instances.
- Log in to the EC2 instance as root.
- Run docker ps and check whether the ecs-agent container is running.
- Otherwise, start it manually with start ecs or restart ecs.
Note: If you don't have the docker, start, or restart commands, you're not using an ECS-optimized AMI.
When the instances get terminated:
- Verify that the ECS agent is still running (see above).
- When using Launch Configurations, check your user data script for errors. Also check that it adds the right cluster to the /etc/ecs/ecs.config ECS config file and that it starts the ECS agent (start ecs).
- Check the system logs of terminated instances: navigate to the EC2 Running Instances dashboard, select the terminated instance, choose Get System Log in the Instance Settings menu, then scroll to the bottom to look for any obvious issues. The logs are kept for a while after the instance is terminated.
- Check the ECS logs (tail -f /var/log/ecs/*).
- See: Why is my Amazon ECS agent listed as disconnected?
- Check: How do I find the cause of an EC2 autoscaling group "health check" failure? (no load balancer involved)
Once the instances have the ECS agent running, make sure you assigned them to the right cluster. E.g.
root# cat /etc/ecs/ecs.config
ECS_CLUSTER=demo
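One way to get that line into place is to write it from the launch configuration's user data. The sketch below assumes the question's cluster resource (aws_ecs_cluster.demo), the AMI lookup from the earlier example (data.aws_ami.ecs_ami), and an ECS-optimized AMI on which the agent reads /etc/ecs/ecs.config when it starts.

```hcl
resource "aws_launch_configuration" "ecs_instance" {
  name_prefix   = "ecs-instance-"
  instance_type = "t2.micro"
  image_id      = "${data.aws_ami.ecs_ami.id}"

  # Terraform fills in the cluster name before the script reaches the instance.
  user_data = <<EOF
#!/bin/bash
echo "ECS_CLUSTER=${aws_ecs_cluster.demo.name}" >> /etc/ecs/ecs.config
EOF

  # Create the replacement first, since the autoscaling group still
  # references the old launch configuration.
  lifecycle {
    create_before_destroy = true
  }
}
```

Referencing the cluster resource directly keeps the name in ecs.config in sync with the cluster Terraform creates, so the two cannot drift apart through a typo.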
Note the IAM role of the running EC2 instance, then make sure that the AmazonEC2ContainerServiceforEC2Role policy is attached to that role.
In the Trust relationships tab of that role, make sure the EC2 service is allowed to assume the role. Example role trust policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
Terraform example:
data "aws_iam_policy_document" "instance" {
  provider = "aws.auto-scale-group"

  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["ec2.amazonaws.com"]
    }
  }
}
See: What is the purpose of AssumeRolePolicyDocument in IAM?.
You also need aws_iam_instance_profile and aws_iam_role, e.g.
resource "aws_iam_instance_profile" "instance" {
  provider = "aws.auto-scale-group"
  name     = "myproject-profile-instance"
  role     = "${aws_iam_role.instance.name}"

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_iam_role" "instance" {
  provider           = "aws.auto-scale-group"
  name               = "myproject-role"
  path               = "/"
  assume_role_policy = "${data.aws_iam_policy_document.instance.json}"

  lifecycle {
    create_before_destroy = true
  }
}
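To finish the wiring, the managed policy mentioned earlier still has to be attached to the role, and the launch configuration has to reference the instance profile. A minimal sketch, assuming the resource names from the examples above. The launch configuration values are copied from the question, and the provider alias is left out for brevity.

```hcl
# Attach the AWS-managed ECS instance policy to the instance role.
resource "aws_iam_role_policy_attachment" "instance" {
  role       = "${aws_iam_role.instance.name}"
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role"
}

# Launch instances with the profile so the ECS agent has credentials to register.
resource "aws_launch_configuration" "ecs_instance" {
  name_prefix          = "ecs-instance-"
  instance_type        = "t2.micro"
  image_id             = "ami-4fffc834"
  iam_instance_profile = "${aws_iam_instance_profile.instance.name}"
}
```

Without an instance profile, the agent on the instance has no credentials to call the ECS API, which is one common reason for a cluster showing 0 registered container instances.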
Now, your cluster should be ready to go.
Related:
- AWS ECS Error when running task: No Container Instances were found in your cluster