CloudFormation AutoScalingGroup not waiting for si

Posted 2019-03-20 15:45

Question:

I'm working with a CloudFormation template that brings up as many instances as I request, and want to wait for them to finish initialising (via User Data) before the stack creation/update is considered complete.

The Expectation

Creating or updating the stack should wait for signals from all newly created instances, so as to ensure that their initialisation is complete.

I don't want the stack creation or update to be considered successful if any of the created instances fail to initialise.

The Reality

CloudFormation only seems to wait for signals from instances when the stack is first created. Updating the stack and increasing the number of instances seems to disregard signalling. The update operation finishes successfully very quickly, whilst instances are still being initialised.

Instances created as a result of updating the stack can fail to initialise, but the update action would've already been considered a success.

The Question

Using CloudFormation, how can I make the reality meet the expectation?

I want the behaviour that applies when the stack is created to also apply when the stack is updated.

Similar Questions

I have found only the following question that matches my problem: UpdatePolicy in Autoscaling group not working correctly for AWS CloudFormation update

It's been open for a year and has not received an answer.

I'm creating another question as I've more information to add, and I'm not sure if these particulars will match those of the author in that question.

Reproducing

To demonstrate the problem, I've created a template based off of the example beneath the Auto Scaling Group header on this AWS documentation page, which includes signalling.

The created template has been adapted as follows:

  • It uses an Ubuntu AMI (in region ap-northeast-1). The cfn-signal command has been bootstrapped and called as necessary considering this change.
  • A new parameter dictates how many instances to launch in the auto scaling group.
  • A sleep time of 2 minutes has been added before signalling, to simulate the time spent whilst initialising.

Here's the template, saved to template.yml:

Parameters:
  DesiredCapacity:
    Type: Number
    Description: How many instances would you like in the Auto Scaling Group?

Resources:
  AutoScalingGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      AvailabilityZones: !GetAZs ''
      LaunchConfigurationName: !Ref LaunchConfig
      MinSize: !Ref DesiredCapacity
      MaxSize: !Ref DesiredCapacity
    CreationPolicy:
      ResourceSignal:
        Count: !Ref DesiredCapacity
        Timeout: PT5M
    UpdatePolicy:
      AutoScalingScheduledAction:
        IgnoreUnmodifiedGroupSizeProperties: true
      AutoScalingRollingUpdate:
        MinInstancesInService: 1
        MaxBatchSize: 2
        PauseTime: PT5M
        WaitOnResourceSignals: true

  LaunchConfig:
    Type: AWS::AutoScaling::LaunchConfiguration
    Properties:
      ImageId: ami-b7d829d6
      InstanceType: t2.micro
      UserData:
        'Fn::Base64':
          !Sub |
            #!/bin/bash -xe
            sleep 120

            apt-get -y install python-setuptools
            TMP=`mktemp -d`
            curl https://s3.amazonaws.com/cloudformation-examples/aws-cfn-bootstrap-latest.tar.gz | \
              tar xz -C $TMP --strip-components 1
            easy_install $TMP

            /usr/local/bin/cfn-signal -e $? \
              --stack ${AWS::StackName} \
              --resource AutoScalingGroup \
              --region ${AWS::Region}

Now I create the stack with a single instance, via:

$ aws cloudformation create-stack \
  --region=ap-northeast-1 \
  --stack-name=asg-test \
  --template-body=file://template.yml \
  --parameters ParameterKey=DesiredCapacity,ParameterValue=1

After waiting a few minutes for the creation to complete, let's look at some key stack events:

$ aws cloudformation describe-stack-events \
  --region=ap-northeast-1 \
  --stack-name=asg-test

 

    ...
    {
        "Timestamp": "2017-02-03T05:36:45.445Z",
        ...
        "LogicalResourceId": "AutoScalingGroup",
        ...
        "ResourceStatus": "CREATE_COMPLETE",
        ...
    },
    {
        "Timestamp": "2017-02-03T05:36:42.487Z",
        ...
        "LogicalResourceId": "AutoScalingGroup",
        ...
        "ResourceStatusReason": "Received SUCCESS signal with UniqueId ...",
        "ResourceStatus": "CREATE_IN_PROGRESS"
    },
    {
        "Timestamp": "2017-02-03T05:33:33.274Z",
        ...
        "LogicalResourceId": "AutoScalingGroup",
        ...
        "ResourceStatusReason": "Resource creation Initiated",
        "ResourceStatus": "CREATE_IN_PROGRESS",
        ...
    }
    ...

You can see that the auto scaling group started initiating at 05:33:33. At 05:36:42 (3 minutes after initiation), it received a success signal. This allowed the auto scaling group to reach its own success status only moments after, at 05:36:45.

That's awesome - working like a charm.

Now let's try increasing the number of instances in this auto scaling group to 2 by updating the stack:

$ aws cloudformation update-stack \
  --region=ap-northeast-1 \
  --stack-name=asg-test \
  --template-body=file://template.yml \
  --parameters ParameterKey=DesiredCapacity,ParameterValue=2

After waiting a much shorter time for the update to complete, let's look at some of the new stack events:

$ aws cloudformation describe-stack-events \
  --region=ap-northeast-1 \
  --stack-name=asg-test

 

    {
        "ResourceStatus": "UPDATE_COMPLETE",
        ...
        "ResourceType": "AWS::CloudFormation::Stack",
        ...
        "Timestamp": "2017-02-03T05:45:47.063Z"
    },
    ...
    {
        "ResourceStatus": "UPDATE_COMPLETE",
        ...
        "LogicalResourceId": "AutoScalingGroup",
        "Timestamp": "2017-02-03T05:45:43.047Z"
    },
    {
        "ResourceStatus": "UPDATE_IN_PROGRESS",
        ...,
        "LogicalResourceId": "AutoScalingGroup",
        "Timestamp": "2017-02-03T05:44:20.845Z"
    },
    {
        "ResourceStatus": "UPDATE_IN_PROGRESS",
        ...
        "ResourceType": "AWS::CloudFormation::Stack",
        ...
        "Timestamp": "2017-02-03T05:44:15.671Z",
        "ResourceStatusReason": "User Initiated"
    },
    ....

Now you can see that whilst the auto scaling group started updating at 05:44:20, it completed at 05:45:43 - that's less than one and a half minutes to completion, which shouldn't be possible considering a sleep time of 120 seconds in the user data.
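The two intervals can be checked with a little shell arithmetic. This is only a sketch: `to_seconds` and `elapsed` are hypothetical helper names, and it assumes two-digit HH:MM:SS values taken from the event timestamps above, both on the same day.

```shell
# Convert a two-digit HH:MM:SS time of day to seconds since midnight.
to_seconds() {
  h=${1%%:*}; rest=${1#*:}; m=${rest%%:*}; s=${rest#*:}
  # Strip one leading zero so values such as 08 are not read as octal.
  echo $(( ${h#0} * 3600 + ${m#0} * 60 + ${s#0} ))
}

# Seconds elapsed between two times on the same day ($1 = start, $2 = end).
elapsed() {
  echo $(( $(to_seconds "$2") - $(to_seconds "$1") ))
}

elapsed 05:33:33 05:36:42   # create: initiated -> SUCCESS signal
elapsed 05:44:20 05:45:43   # update: UPDATE_IN_PROGRESS -> UPDATE_COMPLETE
```

The create path took 189 seconds, comfortably longer than the 120-second sleep, while the update took 83 seconds, which is shorter than the sleep alone.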

The stack update then proceeds to completion without the auto scaling group ever having received any signals.

The new instance does indeed exist.

In my real use case I've SSHed into one of these new instances to find that it was still in the process of initialising even after the stack update completed.

What I've Tried

I've read and re-read the documentation surrounding CreationPolicy and UpdatePolicy, but have failed to identify what I'm missing.

Taking a look at the update policy in use above, I don't understand what it's actually doing. Why is WaitOnResourceSignals true, but it's not waiting? Is it serving some other purpose?

Or are these new instances not falling under the "rolling update" policy? If they don't belong there, then I'd expect them to fall under the creation policy, but that doesn't seem to apply either.

As such, I don't really know what else to try.

I have a sneaking feeling that it's functioning as designed/expected, but if it is then what's the point of that WaitOnResourceSignals property and how can I meet the expectation set above?

Answer 1:

The AutoScalingRollingUpdate policy handles rotating out an entire set of instances in an Auto Scaling group in response to changes to the underlying LaunchConfiguration. It doesn't apply to individual changes to the number of instances in the existing group. According to the UpdatePolicy Attribute documentation,

The AutoScalingReplacingUpdate and AutoScalingRollingUpdate policies apply only when you do one or more of the following:

  • Change the Auto Scaling group's AWS::AutoScaling::LaunchConfiguration.
  • Change the Auto Scaling group's VPCZoneIdentifier property.
  • Update an Auto Scaling group that contains instances that don't match the current LaunchConfiguration.

Changing the size of the Auto Scaling group (in your template, MinSize and MaxSize via the DesiredCapacity parameter) is not in this list, so the AutoScalingRollingUpdate policy does not apply to this type of change.

As far as I know, it is not possible (using standard AWS CloudFormation resources) to delay the completion of a Stack Update modifying DesiredCapacity until any new instances added to the Auto Scaling Group are fully provisioned.

Here are some alternative options:

  1. Instead of modifying only DesiredCapacity, modify a LaunchConfiguration property at the same time. This will trigger an AutoScalingRollingUpdate to the desired capacity (the downside is that it will also update existing instances, which may not actually need to be modified).
  2. Add an AWS::AutoScaling::LifecycleHook resource to your Auto Scaling Group, and call aws autoscaling complete-lifecycle-action in addition to cfn-signal, to signal lifecycle-hook completion. This won't delay your CloudFormation stack update as desired, but it will delay the individual auto-scaled instances from entering the InService state until the lifecycle signal is received. (See Lifecycle Hooks documentation for more info.)
  3. As an extension to #2, it should be possible to add a Lifecycle Hook to your Auto Scaling group, as well as a Custom Resource that polls your Auto Scaling Group and only completes when the Auto Scaling group contains the DesiredCapacity number of instances all in the InService state.
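The check described in option 3 can be sketched from the deploying machine instead of inside a Custom Resource. To be clear, this is client-side polling, not the Custom Resource itself, and it is only a minimal sketch. The function names, `POLL_INTERVAL`, `MAX_ATTEMPTS` and `ASG_NAME` are hypothetical. It is also probably only meaningful together with the lifecycle hook from option 2, since otherwise instances can report InService while User Data is still running.

```shell
# Print how many instances in an Auto Scaling group are InService.
# $1 = Auto Scaling group name, $2 = region
count_in_service() {
  aws autoscaling describe-auto-scaling-groups \
    --region "$2" \
    --auto-scaling-group-names "$1" \
    --query 'AutoScalingGroups[0].Instances[].LifecycleState' \
    --output text |
    tr '\t' '\n' | grep -c '^InService$' || true   # grep exits 1 on a zero count
}

# Poll until at least $3 instances are InService, or give up.
# $1 = group name, $2 = region, $3 = expected instance count
# POLL_INTERVAL (seconds) and MAX_ATTEMPTS are hypothetical tuning knobs.
wait_for_in_service() {
  attempts=0
  while [ "$attempts" -lt "${MAX_ATTEMPTS:-60}" ]; do
    n=$(count_in_service "$1" "$2")
    if [ "$n" -ge "$3" ]; then
      echo "ready: $n instance(s) InService"
      return 0
    fi
    attempts=$((attempts + 1))
    sleep "${POLL_INTERVAL:-10}"
  done
  echo "timed out waiting for $3 instance(s) to be InService" >&2
  return 1
}

# Usage (not run here, needs AWS credentials):
#   ASG_NAME=$(aws cloudformation describe-stack-resource \
#     --region ap-northeast-1 --stack-name asg-test \
#     --logical-resource-id AutoScalingGroup \
#     --query 'StackResourceDetail.PhysicalResourceId' --output text)
#   aws cloudformation wait stack-update-complete \
#     --region ap-northeast-1 --stack-name asg-test
#   wait_for_in_service "$ASG_NAME" ap-northeast-1 2
```

Running this after the stack update lets a deployment script fail when new instances never reach InService, even though CloudFormation itself has already reported UPDATE_COMPLETE.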


Answer 2:

The rolling update only works for existing instances. The documentation says:

Rolling updates enable you to specify whether AWS CloudFormation updates instances that are in an Auto Scaling group in batches or all at once.

So to test this, create a stack based on your template. Then make a small modification to the launch config (e.g. change sleep 120 to sleep 121) and update the stack. Now you should see a rolling update.
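One way to run that test from the command line is sketched below, assuming the template is saved as template.yml as in the question. `bump_sleep` is a hypothetical helper, and the AWS commands are shown as comments because they need credentials and an existing stack.

```shell
# Rewrite "sleep 120" to "sleep 121" in the template so that the
# LaunchConfiguration changes. A temporary file is used instead of
# "sed -i" to stay portable across sed variants.
bump_sleep() {
  sed 's/sleep 120/sleep 121/' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Usage (not run here, needs AWS credentials):
#   bump_sleep template.yml
#   aws cloudformation update-stack \
#     --region=ap-northeast-1 \
#     --stack-name=asg-test \
#     --template-body=file://template.yml \
#     --parameters ParameterKey=DesiredCapacity,UsePreviousValue=true
#   aws cloudformation wait stack-update-complete \
#     --region=ap-northeast-1 \
#     --stack-name=asg-test
```

Because the launch configuration changes, the update should take noticeably longer than before, and the stack events should show the group waiting for signals from the replaced instances.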