I recently started using ECS. I was able to deploy a container image to ECR and create a task definition for my container with CPU/memory limits. My use case is that each container will be a long-running app (no web server, no port mapping needed). The containers will be spawned on demand, one at a time, and deleted on demand, one at a time.
I am able to create a cluster with N server instances, but I'd like the server instances to scale up and down automatically. For example, if there isn't enough CPU/memory in the cluster, I'd like a new instance to be created.
And if there is an instance with no containers running on it, I'd like that specific instance to be scaled down / deleted. The point is to keep an automatic scale-down from terminating a server instance that still has running tasks on it.
What steps are needed to be able to achieve this?
Considering that you already have an ECS Cluster created, AWS provides instructions on Scaling cluster instances with CloudWatch Alarms.
Assuming that you want to scale the cluster based on the memory reservation, at a high level, you would need to do the following:
Because it's more of my specialty I wrote up an example CloudFormation template that should get you started for most of this:
This creates an ECS cluster, a Launch Configuration, an Auto Scaling group, as well as the alarms based on the ECS Memory Reservation.
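For readers who would rather see the alarm piece outside CloudFormation, here is a minimal Python sketch of the two memory reservation alarms. It only builds the keyword arguments for CloudWatch's `PutMetricAlarm` call. The cluster name, the 75/40 thresholds, the period settings, and the policy ARN placeholders are all assumptions for illustration, not values from the template.

```python
def memory_reservation_alarm(cluster_name, threshold, comparison, policy_arn):
    """Build keyword arguments for a CloudWatch alarm on the cluster-wide
    ECS MemoryReservation metric (a percentage between 0 and 100)."""
    return {
        "AlarmName": f"{cluster_name}-memory-{comparison}",
        "Namespace": "AWS/ECS",
        "MetricName": "MemoryReservation",
        "Dimensions": [{"Name": "ClusterName", "Value": cluster_name}],
        "Statistic": "Average",
        "Period": 60,              # seconds per datapoint (assumed)
        "EvaluationPeriods": 3,    # datapoints that must breach (assumed)
        "Threshold": float(threshold),
        "ComparisonOperator": comparison,
        "AlarmActions": [policy_arn],  # ARN of an Auto Scaling group policy
    }


# Hypothetical names and thresholds; substitute your own.
scale_up = memory_reservation_alarm(
    "demo-cluster", 75, "GreaterThanThreshold", "SCALE_UP_POLICY_ARN")
scale_down = memory_reservation_alarm(
    "demo-cluster", 40, "LessThanThreshold", "SCALE_DOWN_POLICY_ARN")
```

With boto3 installed and credentials configured, each dictionary could be passed along as `boto3.client("cloudwatch").put_metric_alarm(**scale_up)`.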
Now we can get to the interesting discussions.
Why can't we scale up based on the CPU Utilization and Memory Reservation?
The short answer is that you totally can, but you're likely to pay a lot for it. At the time of writing, EC2 charges partial instance hours as full hours, so every instance you launch costs at least one hour. That matters once you have multiple alarms. Say you have a bunch of services that are running idle, and together they fill the cluster's memory reservation. The CPU alarm wants to scale the cluster down, while the memory alarm wants to scale it up. One of them will act first and scale the cluster to the point where its own alarm is no longer triggered. After the cooldown period, the other alarm will undo that action, and after the next cooldown the action will likely be redone. Instances are therefore created and destroyed repeatedly, on every other cooldown.
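To make the flapping concrete, here is a toy simulation in Python. It is a simplified model, not how Auto Scaling actually arbitrates between policies: it assumes one scaling action per cooldown, gives the memory alarm priority when both alarms fire, and uses made-up numbers (8 GiB instances, 26 GiB reserved by idle tasks, 10% CPU).

```python
def simulate(instances, reserved_gib, instance_gib, cpu_percent, cooldowns,
             mem_high=75.0, cpu_low=30.0):
    """Apply one scaling action per cooldown and record the cluster size.

    The memory alarm scales up when reservation exceeds mem_high; otherwise
    the CPU alarm scales down when utilization is below cpu_low.
    """
    history = [instances]
    launches = 0
    for _ in range(cooldowns):
        mem_percent = 100.0 * reserved_gib / (instances * instance_gib)
        if mem_percent > mem_high:
            instances += 1      # memory alarm: add an instance
            launches += 1
        elif cpu_percent < cpu_low and instances > 1:
            instances -= 1      # CPU alarm: remove an instance
        history.append(instances)
    return history, launches


history, launches = simulate(instances=4, reserved_gib=26, instance_gib=8,
                             cpu_percent=10.0, cooldowns=6)
print(history)   # [4, 5, 4, 5, 4, 5, 4]
print(launches)  # 3
```

In this model the cluster bounces between 4 and 5 instances forever, and each of the three launches is billed as a fresh instance even though the workload never changed.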
After giving this a bunch of thought, the strategy I came up with was to use Application Auto Scaling for the ECS services based on CPU Utilization, and to scale the cluster itself based on Memory Reservation. If one service is running hot, an extra task is added to share the load. This slowly fills the cluster's memory reservation capacity, and when the memory gets full, the cluster scales up. When a service cools down, it starts shutting down tasks, and as the memory reservation on the cluster drops, the cluster is scaled down.
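The service half of that strategy could look roughly like the Python sketch below. It builds the arguments for Application Auto Scaling's `RegisterScalableTarget` and `PutScalingPolicy` calls. Note the assumptions: it uses a target tracking policy on the predefined service CPU metric, which is one way to do it and not necessarily the setup described above, and the cluster name, service name, task limits, and 60% target are hypothetical.

```python
def service_cpu_scaling(cluster, service, min_tasks, max_tasks, target_cpu):
    """Return (scalable_target_kwargs, scaling_policy_kwargs) for an ECS
    service whose task count should follow average CPU utilization."""
    common = {
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
    }
    target = dict(common, MinCapacity=min_tasks, MaxCapacity=max_tasks)
    policy = dict(
        common,
        PolicyName=f"{service}-cpu-target",
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": float(target_cpu),  # desired average CPU percent
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ECSServiceAverageCPUUtilization",
            },
        },
    )
    return target, policy


# Hypothetical cluster and service names.
target, policy = service_cpu_scaling("demo-cluster", "worker", 1, 10, 60)
```

With boto3, these would be applied through `boto3.client("application-autoscaling")` as `register_scalable_target(**target)` followed by `put_scaling_policy(**policy)`. The cluster-level memory reservation alarms then react to the tasks this policy adds or removes.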
The thresholds for the CloudWatch alarms may take some experimenting, depending on your task definitions. If you set the scale-up threshold too high, the cluster may not scale up as memory gets consumed. Then, when autoscaling goes to place another task, it finds that no instance in the cluster has enough memory available, and the task cannot be placed.
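As a starting point for that experimentation, here is a rough upper bound in Python. It assumes every instance in the cluster registers the same amount of memory with ECS, and the function name and sample sizes are made up for illustration. The reasoning: if cluster-wide reservation is at or below `100 * (1 - task / instance)`, the average free memory per instance is at least one task's worth, so at least one instance must be able to fit the task.

```python
def max_scale_up_threshold(instance_memory_mib, largest_task_memory_mib):
    """Highest MemoryReservation percentage at which a cluster of identical
    instances is still guaranteed to have room for the largest task."""
    if largest_task_memory_mib > instance_memory_mib:
        raise ValueError("task does not fit on an empty instance")
    return 100.0 * (1 - largest_task_memory_mib / instance_memory_mib)


# Example: instances registering 8192 MiB, largest task reserving 2048 MiB.
print(max_scale_up_threshold(8192, 2048))  # 75.0
```

Treat the result as a ceiling and not as a recommendation. A new instance takes time to launch and join the cluster, so a scale-up threshold somewhat below this bound leaves room for tasks that arrive in the meantime.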