I'm working with a CloudFormation template that brings up as many instances as I request, and want to wait for them to finish initialising (via User Data) before the stack creation/update is considered complete.
The Expectation
Creating or updating the stack should wait for signals from all newly created instances, such to ensure that their initialisation is complete.
I don't want the stack creation or update to be considered successful if any of the created instances fail to initialise.
The Reality
CloudFormation only seems to wait for signals from instances when the stack is first created. Updating the stack and increasing the number of instances seems to disregard signalling. The update operation finishes successfully very quickly, whilst instances are still being initialised.
Instances created as a result of updating the stack can fail to initialise, but the update action would've already been considered a success.
The Question
Using CloudFormation, how can I make the reality meet the expectation?
I want the same behaviour that applies when the stack is created, to when the stack is updated.
Similar Questions
I have found only the following question that matches my problem: UpdatePolicy in Autoscaling group not working correctly for AWS CloudFormation update
It's been open for a year and has not received an answer.
I'm creating another question as I've more information to add, and I'm not sure if these particulars will match those of the author in that question.
Reproducing
To demonstrate the problem, I've created a template based off of the example beneath the Auto Scaling Group header on this AWS documentation page, which includes signalling.
The created template has been adapted as so:
- It uses an Ubuntu AMI (in region
ap-northeast-1
). Thecfn-signal
command has been bootstrapped and called as necessary considering this change. - A new parameter dictates how many instances to launch in the auto scaling group.
- A sleep time of 2 minutes has been added before signalling, to simulate the time spent whilst initialising.
Here's the template, saved to template.yml
:
Parameters:
DesiredCapacity:
Type: Number
Description: How many instances would you like in the Auto Scaling Group?
Resources:
AutoScalingGroup:
Type: AWS::AutoScaling::AutoScalingGroup
Properties:
AvailabilityZones: !GetAZs ''
LaunchConfigurationName: !Ref LaunchConfig
MinSize: !Ref DesiredCapacity
MaxSize: !Ref DesiredCapacity
CreationPolicy:
ResourceSignal:
Count: !Ref DesiredCapacity
Timeout: PT5M
UpdatePolicy:
AutoScalingScheduledAction:
IgnoreUnmodifiedGroupSizeProperties: true
AutoScalingRollingUpdate:
MinInstancesInService: 1
MaxBatchSize: 2
PauseTime: PT5M
WaitOnResourceSignals: true
LaunchConfig:
Type: AWS::AutoScaling::LaunchConfiguration
Properties:
ImageId: ami-b7d829d6
InstanceType: t2.micro
UserData:
'Fn::Base64':
!Sub |
#!/bin/bash -xe
sleep 120
apt-get -y install python-setuptools
TMP=`mktemp -d`
curl https://s3.amazonaws.com/cloudformation-examples/aws-cfn-bootstrap-latest.tar.gz | \
tar xz -C $TMP --strip-components 1
easy_install $TMP
/usr/local/bin/cfn-signal -e $? \
--stack ${AWS::StackName} \
--resource AutoScalingGroup \
--region ${AWS::Region}
Now I create the stack with a single instance, via:
$ aws cloudformation create-stack \
--region=ap-northeast-1 \
--stack-name=asg-test \
--template-body=file://template.yml \
--parameters ParameterKey=DesiredCapacity,ParameterValue=1
After waiting a few minutes for the creation to complete, let's look some key stack events:
$ aws cloudformation describe-stack-events \
--region=ap-northeast-1 \
--stack-name=asg-test
...
{
"Timestamp": "2017-02-03T05:36:45.445Z",
...
"LogicalResourceId": "AutoScalingGroup",
...
"ResourceStatus": "CREATE_COMPLETE",
...
},
{
"Timestamp": "2017-02-03T05:36:42.487Z",
...
"LogicalResourceId": "AutoScalingGroup",
...
"ResourceStatusReason": "Received SUCCESS signal with UniqueId ...",
"ResourceStatus": "CREATE_IN_PROGRESS"
},
{
"Timestamp": "2017-02-03T05:33:33.274Z",
...
"LogicalResourceId": "AutoScalingGroup",
...
"ResourceStatusReason": "Resource creation Initiated",
"ResourceStatus": "CREATE_IN_PROGRESS",
...
}
...
You can see that the auto scaling group started initiating at 05:33:33. At 05:36:42 (3 minutes after initiation), it received a success signal. This allowed the auto scaling group to reach its own success status only moments after, at 05:36:45.
That's awesome - working like a charm.
Now let's try increasing the number of instances in this auto scaling group to 2 by updating the stack:
$ aws cloudformation update-stack \
--region=ap-northeast-1 \
--stack-name=asg-test \
--template-body=file://template.yml \
--parameters ParameterKey=DesiredCapacity,ParameterValue=2
After waiting a much shorter time for the update to complete, let's look at some of the new stack events:
$ aws cloudformation describe-stack-events \
--region=ap-northeast-1 \
--stack-name=asg-test
{
"ResourceStatus": "UPDATE_COMPLETE",
...
"ResourceType": "AWS::CloudFormation::Stack",
...
"Timestamp": "2017-02-03T05:45:47.063Z"
},
...
{
"ResourceStatus": "UPDATE_COMPLETE",
...
"LogicalResourceId": "AutoScalingGroup",
"Timestamp": "2017-02-03T05:45:43.047Z"
},
{
"ResourceStatus": "UPDATE_IN_PROGRESS",
...,
"LogicalResourceId": "AutoScalingGroup",
"Timestamp": "2017-02-03T05:44:20.845Z"
},
{
"ResourceStatus": "UPDATE_IN_PROGRESS",
...
"ResourceType": "AWS::CloudFormation::Stack",
...
"Timestamp": "2017-02-03T05:44:15.671Z",
"ResourceStatusReason": "User Initiated"
},
....
Now you can see that whilst the auto scaling group started updating at 05:44:20, it completed at 05:45:43 - that's less than one and a half minutes to completion, which shouldn't be possible considering a sleep time of 120 seconds in the user data.
The stack update then proceeds to completion without the auto scaling group ever having received any signals.
The new instance does indeed exist.
In my real use case I've SSHed into one of these new instances to find that it was still in the process of initialising even after the stack update completed.
What I've Tried
I've read and re-read the documentation surrounding CreationPolicy
and UpdatePolicy
, but have failed to identify what I'm missing.
Taking a look at the update policy in use above, I don't understand what it's actually doing. Why is WaitOnResourceSignals
true, but it's not waiting? Is it serving some other purpose?
Or are these new instances not falling under the "rolling update" policy? If they don't belong there, then I'd expect them to fall under the creation policy, but that doesn't seem to apply either.
As such, I don't really know what else to try.
I have a sneaking feeling that it's functioning as designed/expected, but if it is then what's the point of that WaitOnResourceSignals
property and how can I meet the expectation set above?