How to automate OS/ECS-agent updates on an EC2 instance in an ECS Auto Scaling environment?

Question

First off: I feel like I still don't understand some of the fundamental concepts of AWS, so please bear with me if this question is noobish.

I have the following set-up in AWS:

1 ECS-Cluster with 1 single service
The Cluster is configured to use 1 single EC2 instance
This EC2 instance is part of an Auto Scaling group that is based on a specific Launch Configuration. (The cluster setup configured it like this, which makes sense, I guess.)

I've developed a few preconceptions/conditions:

I don't care about the EC2 instance, because my service runs machine-agnostic
My service only ever needs to be run on 1 instance at a time. I only use ECS to have a simple way to run a dockerized application.
I don't care about downtime at specific times.
There is a predefined Elastic IP that has to be used with the service.
I want this service to be as automated as possible. When something goes wrong, we can fix things (uptime is not as critical), but I never want to SSH to the EC2 instance or anything like that.

With the help of CloudWatch and Lambda, I have set up the following tasks:

The instance is identified by the cluster name, which is automatically added to its Name tag.

Once a week, the cluster instance reboots. This renews certificates and configuration, because the service does that on startup. (I probably could also have scheduled the service to be killed and restarted inside the cluster somehow...)
Every time a new EC2 instance of the cluster starts, it is assigned the predefined Elastic IP.
Once a month, the EC2 instance gets terminated to be automatically replaced by a new one, started by the Auto Scaling Group.

My hope was that, once a new instance is created by the Auto Scaling group, it would have the latest AMI, including the latest ECS agent.

Correct me if I'm wrong, but when I looked at the Launch Configuration for this Auto Scaling group, I realized this isn't the case, because it always uses the configured AMI.

My general question is: What use is this setup if I need to check in manually every once in a while (when exactly?) to update the AMI in the Launch Configuration and then terminate the instance so that a new one replaces it?

I understand that many people probably don't want to automate OS updates in a production cluster, because they want to test them first. But still, one might want to have a staging environment where OS updates are applied automatically. Why do I use a highly automatable platform when I still need to roll out OS updates manually? Is this a conceptual misunderstanding on my side?

Well, automatically applying OS/AMI updates can cause more damage than good. However, if you have a good enough pipeline with full covering tests I see no issue in doing it. Could you maybe create another Lambda that checks if there is a new AMI available, and if so, update the launch configuration with the latest AMI? Or rather, updates the AMI parameter in your CloudFormation stack which I’m hoping you have in place for this. — Bazze, Oct 11 '17 at 21:15
Thanks for your comment. I ended up subscribing via SNS to the update feed for ECS AMIs http://docs.aws.amazon.com/AmazonECS/latest/developerguide/ECS-AMI-SubscribeTopic.html. This way updates can be applied manually in time when they are released. I think, the only way to automate this, despite the possible hassles of automatic OS updates, would indeed be a custom Lambda function that updates the Launch Configuration. There also is a CloudFormation stack for this app, but I don't really understand why there are two places I can configure the AMI: Launch Configuration & CF stack? — Thomas Ebert, Oct 16 '17 at 08:15
That sounds like a sane first step. The CFN stack is just a way to manage your infrastructure, so if you have a parameter in your CFN stack for the AMI id you are most likely referring to this parameter in the Launch Configuration resource. So if you have the launch config resource in a CFN stack, you should only update the stack and not the launch config directly (since they would become out of sync). Preferably all AWS resources should be created/updated through CFN since this helps you follow the "infrastructure as code" concept. — Bazze, Oct 16 '17 at 08:58

score 1 · Accepted Answer · answered Mar 11 '20 at 00:01

I created a Lambda function that updates the container agent on every instance in all my ECS clusters:

const AWS = require('aws-sdk');
AWS.config.update({ region: 'sa-east-1' });

exports.handler = async (event, context) => {
    const ecs = new AWS.ECS();

    // One entry per container instance: either the API response or the error.
    const responseArray = [];

    // Note: both list calls are paginated; only the first page of each is processed here.
    const clusters = await ecs.listClusters({}).promise();

    for (const clusterArn of clusters.clusterArns) {
        const clusterInstances = await ecs.listContainerInstances({
            cluster: clusterArn
        }).promise();

        for (const containerInstanceArn of clusterInstances.containerInstanceArns) {
            try {
                // Ask the ECS agent on this container instance to update itself.
                const response = await ecs.updateContainerAgent({
                    containerInstance: containerInstanceArn,
                    cluster: clusterArn
                }).promise();

                responseArray.push({
                    cluster: clusterArn,
                    containerInstance: containerInstanceArn,
                    response: response
                });
            }
            catch (e) {
                // Record the failure (for example, no update being available)
                // so that one instance does not abort the whole run.
                responseArray.push({
                    cluster: clusterArn,
                    containerInstance: containerInstanceArn,
                    response: e
                });
            }
        }
    }

    return responseArray;
};

Then I created a CloudWatch Events rule to execute the Lambda function daily. This works well for me.
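For completeness, the daily schedule can also be created through the SDK instead of the console. The following is a minimal sketch, not the setup used above: the rule name, target ID, and function ARN are placeholders, and the clients are passed in so the wiring can be tried with stubs.

```javascript
// Sketch only: create a daily schedule rule that invokes a Lambda function.
// ruleName and functionArn are supplied by the caller (hypothetical values).
async function scheduleDaily(events, lambda, ruleName, functionArn) {
    // Create (or update) the schedule rule.
    const rule = await events.putRule({
        Name: ruleName,
        ScheduleExpression: 'rate(1 day)',
        State: 'ENABLED'
    }).promise();

    // Allow the rule to invoke the function.
    await lambda.addPermission({
        FunctionName: functionArn,
        StatementId: `${ruleName}-invoke`,
        Action: 'lambda:InvokeFunction',
        Principal: 'events.amazonaws.com',
        SourceArn: rule.RuleArn
    }).promise();

    // Attach the function as the rule's target.
    await events.putTargets({
        Rule: ruleName,
        Targets: [{ Id: 'ecs-agent-updater', Arn: functionArn }]
    }).promise();

    return rule.RuleArn;
}

// Example wiring with the AWS SDK for JavaScript v2 (not executed here):
//   const AWS = require('aws-sdk');
//   scheduleDaily(new AWS.CloudWatchEvents(), new AWS.Lambda(),
//       'ecs-agent-update-daily', '<function ARN>');
```

Running this a second time with the same rule name would fail at `addPermission`, because the statement ID already exists on the function; a real script would need to handle that case.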

Thanks a lot for your response. I'm sorry I can't actually check anymore if your setup works, because I've moved on from this project and someone else is taking care of it now. Lately AWS has also introduced Fargate, which is a managed version of ECS that takes some of this pain away. That said, if someone confirms that your response works, I'd be happy to accept your response as the correct answer! — Thomas Ebert, Mar 11 '20 at 09:38
