terraform-ecs: Registered container instances is showing 0

Published 2020-02-06 17:24

Question:

On running terraform apply, it creates the cluster, service, and EC2 instance. But the cluster shows 0 registered container instances and a running tasks count of 0.

I tried changing ecs.amazonaws.com to ec2.amazonaws.com, but that throws an error:

aws_ecs_service.nginx: InvalidParameterException: Unable to assume role and validate the listeners configured on your load balancer. Please verify that the ECS service role being passed has the proper permissions.

    provider "aws" {
        region = "us-east-1"
    }

    resource "aws_ecs_cluster" "demo" {
      name = "demo"
    }

    resource "aws_iam_role" "ecs_elb" {
        name = "ecs-elb"
        assume_role_policy = <<EOF
    {
      "Version": "2008-10-17",
      "Statement": [
        {
          "Sid": "",
          "Effect": "Allow",
          "Principal": {
            "Service": "ecs.amazonaws.com"
          },
          "Action": "sts:AssumeRole"
        }
      ]
    }
    EOF
    }

    resource "aws_iam_policy_attachment" "ecs_elb" {
        name = "ecs_elb"
        roles = ["${aws_iam_role.ecs_elb.id}"]
        policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceRole"
    }

    resource "aws_launch_configuration" "ecs_instance"{
        name_prefix = "ecs-instance-"
        instance_type = "t2.micro"
        image_id = "ami-4fffc834"
    }

    resource "aws_autoscaling_group" "ecs_cluster_instances"{
        availability_zones = ["us-east-1a"]
        name = "ecs-cluster-instances"
        min_size = 1
        max_size = 1
        launch_configuration = "${aws_launch_configuration.ecs_instance.name}"
    }

    resource "aws_ecs_task_definition" "nginx" {
      family = "nginx"
      container_definitions = <<EOF
      [{
        "name": "nginx",
        "image": "nginx",
        "cpu": 1024,
        "memory": 768,
        "essential": true,
        "portMappings": [{"containerPort":80, "hostPort":80}]
      }]
      EOF
    }

    resource "aws_ecs_service" "nginx" {
        name = "nginx"
        cluster = "${aws_ecs_cluster.demo.id}"
        task_definition = "${aws_ecs_task_definition.nginx.arn}"
        desired_count = 1
        iam_role = "${aws_iam_role.ecs_elb.arn}"
        load_balancer {
            elb_name = "${aws_elb.nginx.id}"
            container_name = "nginx"
            container_port = 80
        }
    }
    resource "aws_elb" "nginx" {
        availability_zones = ["us-east-1a"]
        name = "nginx"
        listener {
            lb_port = 80
            lb_protocol = "http"
            instance_port = 80
            instance_protocol = "http"
        }
    }

Answer 1:

To troubleshoot ECS problems, you can follow the steps below.

  1. Click the service name nginx and check whether any tasks are stuck in PENDING status. If so, you will usually also see a lot of stopped tasks.

That means the containers are not healthy.

  2. Click the service name, then Events, and check whether there are any error events that help with troubleshooting.

  3. Click ECS Instances and check whether any instances appear in the list. If not, no EC2 instance has successfully registered itself to the ECS cluster.

If you use the AWS ECS-optimized AMI, this should work out of the box. But if you use your own AMI, you need to add the user data script below:

ecs-userdata.tpl

#!/bin/bash
echo "ECS_CLUSTER=${ecs_cluster_name}" >> /etc/ecs/ecs.config

Update the Terraform code:

data "template_file" "ecs_user_data" {
  template = "${file("ecs-userdata.tpl")}"

  vars {
    ecs_cluster_name = "${var.ecs_cluster_name}"
  }
}


resource "aws_launch_configuration" "demo" {
  ...
  user_data = "${data.template_file.ecs_user_data.rendered}"
  ...
}
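As a side note, on Terraform 0.12 and later the template_file data source is deprecated in favor of the built-in templatefile() function, so the same user data can be rendered without an extra data source. A minimal sketch, assuming the same ecs-userdata.tpl file and ecs_cluster_name variable as above (the AMI ID variable is hypothetical):

```hcl
resource "aws_launch_configuration" "demo" {
  name_prefix   = "ecs-instance-"
  image_id      = var.ecs_ami_id # assumed variable holding an ECS-optimized AMI ID
  instance_type = "t2.micro"

  # templatefile() replaces the template_file data source on Terraform 0.12+
  user_data = templatefile("${path.module}/ecs-userdata.tpl", {
    ecs_cluster_name = var.ecs_cluster_name
  })
}
```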
  4. Enable docker container logs; the easiest way is to send them to AWS CloudWatch.

Add below resource first.

resource "aws_cloudwatch_log_group" "app_logs" {
  name              = "demo"
  retention_in_days = 14
}

Then add the snippet below to the container definition inside the task definition.

"logConfiguration": {
  "logDriver": "awslogs",
  "options": {
    "awslogs-group": "${aws_cloudwatch_log_group.app_logs.name}",
    "awslogs-region": "${var.region}"
  }
},

After you apply the change, go to CloudWatch, then Logs, and check whether there are any error logs.

  5. Change the trust policy of the IAM role to allow both service principals:

    "Principal": {
      "Service": ["ecs.amazonaws.com", "ec2.amazonaws.com"]
    },

Hope these steps are helpful for you.
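In Terraform, that trust-policy change would look roughly like this applied to the questioner's aws_iam_role (a sketch; only the Principal block differs from the original):

```hcl
resource "aws_iam_role" "ecs_elb" {
  name = "ecs-elb"

  # Allow both the ECS service and EC2 instances to assume this role
  assume_role_policy = <<EOF
{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": ["ecs.amazonaws.com", "ec2.amazonaws.com"]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}
```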

Further reading:

Launching an Amazon ECS Container Instance



Answer 2:

Here are a few suggestions of things to check in the AWS Console:

  • Make sure that you are using Amazon ECS-optimized AMIs.

    Once you log in as root, these instances should have the start ecs command available.

    Terraform example:

    data "aws_ami" "ecs_ami" {
      most_recent = true
      owners      = ["amazon"]
    
      filter {
        name   = "name"
        values = ["amzn-ami-*-amazon-ecs-optimized"]
      }
    }
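The AMI data source can then feed the launch configuration, so the cluster always boots from the latest ECS-optimized image. A sketch based on the questioner's aws_launch_configuration:

```hcl
resource "aws_launch_configuration" "ecs_instance" {
  name_prefix   = "ecs-instance-"
  instance_type = "t2.micro"

  # Use the latest ECS-optimized AMI instead of a hard-coded image ID
  image_id = "${data.aws_ami.ecs_ami.id}"
}
```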
    
  • Check whether the EC2 instances have spun up.

  • Check your Load Balancing Target Group (e.g. why instances are not registered, by checking the Health status in the Targets tab, the Attributes in the Description tab, and the Health checks tab).
  • Check whether ECS agent is running on the EC2 instances.

    1. Log in to the EC2 instance as root.
    2. Run docker ps and check whether the ecs-agent container is running.
    3. If not, start it manually with start ecs or restart ecs.

    Note: If you don't have the docker, start or restart commands, you're not using an ECS-optimized AMI.

  • If the instances get terminated:

    • Verify that ECS agent is still running (check above).
    • When using Launch Configurations, check your user data script for errors: verify that it adds the right cluster name to the /etc/ecs/ecs.config ECS config file and that it starts the ECS agent (start ecs).
    • Check the system logs of terminated instances: in the EC2 Running Instances dashboard, select the terminated instance, choose Get System Log in the Instance Settings menu, then scroll to the bottom to look for any obvious issues. The logs are kept for a while after an instance is terminated.
    • Check the ECS logs (tail -f /var/log/ecs/*).
    • See: Why is my Amazon ECS agent listed as disconnected?.
    • Check: How do I find the cause of an EC2 autoscaling group "health check" failure? (no load balancer involved)
  • Once the instances have the ECS agent running, make sure they are assigned to the right cluster, e.g.

    root# cat /etc/ecs/ecs.config
    ECS_CLUSTER=demo
    
  • Note the IAM role of the running EC2 instance, then make sure that the AmazonEC2ContainerServiceforEC2Role policy is attached to that role.
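In Terraform, attaching that AWS-managed policy to the instance role could be sketched as below (the role reference assumes the aws_iam_role.instance resource shown further down in this answer):

```hcl
resource "aws_iam_role_policy_attachment" "ecs_instance" {
  role       = "${aws_iam_role.instance.name}"
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role"
}
```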

  • In the Trust relationships tab of that role, make sure the EC2 service principal is allowed to assume it. Example role trust policy:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "",
          "Effect": "Allow",
          "Principal": {
            "Service": "ec2.amazonaws.com"
          },
          "Action": "sts:AssumeRole"
        }
      ]
    }
    

    Terraform example:

    data "aws_iam_policy_document" "instance" {
      provider = "aws.auto-scale-group"
    
      statement {
        effect  = "Allow"
        actions = ["sts:AssumeRole"]
    
        principals {
          type        = "Service"
          identifiers = ["ec2.amazonaws.com"]
        }
      }
    }
    

    See: What is the purpose of AssumeRolePolicyDocument in IAM?.

    You also need aws_iam_instance_profile and aws_iam_role, e.g.

    resource "aws_iam_instance_profile" "instance" {
      provider = "aws.auto-scale-group"
      name     = "myproject-profile-instance"
      role     = "${aws_iam_role.instance.name}"
    
      lifecycle {
        create_before_destroy = true
      }
    }
    
    resource "aws_iam_role" "instance" {
      provider           = "aws.auto-scale-group"
      name               = "myproject-role"
      path               = "/"
      assume_role_policy = "${data.aws_iam_policy_document.instance.json}"
    
      lifecycle {
        create_before_destroy = true
      }
    }
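Finally, the instance profile has to be referenced from the launch configuration, otherwise the EC2 instances still boot without the role. A sketch reusing the questioner's resource; the other launch configuration arguments are elided:

```hcl
resource "aws_launch_configuration" "ecs_instance" {
  # ... name_prefix, image_id, instance_type, user_data as before ...

  # Boot instances with the IAM role via the instance profile defined above
  iam_instance_profile = "${aws_iam_instance_profile.instance.name}"

  lifecycle {
    create_before_destroy = true
  }
}
```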
    
  • Now, your cluster should be ready to go.


Related:

  • AWS ECS Error when running task: No Container Instances were found in your cluster