Serverless Infrastructure Autoscaling

Large-scale compute clusters are expensive, so it is important to use them well. Utilization and efficiency can be increased by running a mix of workloads on the same machines: CPU- and memory-intensive jobs, small and large ones, and a mix of offline and low-latency jobs – ones that serve end-user requests or provide infrastructure services such as storage, naming or locking.

The challenge of scaling containers

The ECS topology is built on Clusters; each Cluster contains Services (which can be thought of as applications), and each Service runs Tasks. Each Task has a Task definition that tells the scheduler what resources the Task requires.
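A Task definition declares these requirements in CPU units (1,024 units = 1 vCPU) and MiB of memory. A minimal sketch using standard ECS task definition fields (the family, image, and resource values are illustrative placeholders):

```json
{
  "family": "web-api",
  "cpu": "512",
  "memory": "1024",
  "containerDefinitions": [
    {
      "name": "web-api",
      "image": "nginx:latest",
      "cpu": 512,
      "memory": 1024,
      "essential": true
    }
  ]
}
```

The scheduler compares these values against the free capacity of each container instance when deciding where to place the Task.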

For example, if a cluster runs 10 c3.large machines (2 vCPUs and 3.8 GiB of RAM each) and 10 c4.xlarge machines (4 vCPUs and 7.5 GiB of RAM each), the cluster totals 60 vCPUs (60 × 1,024 = 61,440 CPU units) and 113 GiB of RAM.

The issue here is that if a single Task requires more RAM than any individual instance provides, it can’t be scheduled. In the example above, a Task that requires 16 GiB of RAM won’t start, even though 113 GiB is available across the cluster. Elastigroup matches each Task with an appropriate instance type and size, with zero overhead or management.

Ocean dynamically scales the cluster up and down to ensure there are always sufficient resources to run all tasks, while maximizing resource utilization in the cluster. It does so by optimizing task placement across the cluster, in a process we call Tetris Scaling, and by automatically managing Headroom – a buffer of spare capacity (in terms of both memory and CPU) that ensures that when you want to quickly scale up more containers, you don’t have to wait for new VMs (instances) to be provisioned.

Scale Down Behavior

Ocean monitors the cluster and runs bin-packing algorithms that simulate different permutations of task placement across the available container instances. A container instance is considered for scale down when:

  • All of the tasks running on that instance can be scheduled on other instances.
  • Removing the instance won’t reduce the headroom below its target.

Among eligible candidates, Ocean prefers to scale down the least utilized instances first.

When an instance is chosen for scale-down, it is drained: its running tasks are rescheduled on other instances, and the instance is then terminated.

Note: Scale-Down actions are limited by default to 10% of the cluster size at a time. This parameter is configurable.
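This limit can be adjusted in the Ocean cluster configuration. The following fragment is a sketch: the field path (`autoScaler.down.maxScaleDownPercentage`) is an assumption modeled on the Ocean API and should be verified against its reference documentation.

```json
{
  "cluster": {
    "autoScaler": {
      "down": {
        "maxScaleDownPercentage": 20
      }
    }
  }
}
```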

Scale Down Prevention

It is possible to mark a resource so that it will not be scaled down. Adding the scale-down prevention tag to a task or a service prevents Ocean from scaling it down.

The following is an example of the tag to add to the task definition or the service:

  "tags": [
      {
          "key": "",
          "value": "true"
      }
  ]

Note: When you add a new tag to a task definition, you must create a new revision. Otherwise, the new tag will not be visible.

For more information about tagging in ECS, see Tagging Your Amazon ECS Resources.


Headroom

Ocean provides the option to include a buffer of spare capacity (vCPU and memory resources) known as headroom. Headroom ensures that the cluster has the capacity to quickly scale up more tasks without waiting for new container instances to be provisioned. You can configure headroom as specific amounts of vCPU and memory, or as a percentage of the cluster’s total CPU and memory.
Ocean manages the headroom automatically to provide the best possible cost/performance balance. However, headroom may also be configured manually to support any use case.
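As a sketch, a manually configured headroom block in the Ocean cluster configuration might look like the following. The field names (`isAutoConfig`, `cpuPerUnit`, `memoryPerUnit`, `numOfUnits`) are assumptions modeled on the Ocean API and should be checked against its reference documentation:

```json
{
  "cluster": {
    "autoScaler": {
      "isAutoConfig": false,
      "headroom": {
        "cpuPerUnit": 1024,
        "memoryPerUnit": 2048,
        "numOfUnits": 2
      }
    }
  }
}
```

In this example, Ocean would keep two spare units of 1 vCPU and 2 GiB of memory each available at all times.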

Labels & Constraints

Ocean supports built-in and custom Task placement constraints within the scaling logic. Task placement constraints give you the ability to control where tasks are scheduled, such as in a specific Availability Zone or on instances of a specific type. You can use the built-in ECS container instance attributes or create your own custom key-value attributes, and add a constraint to place your tasks based on the desired attribute. To configure task placement constraints on Ocean, see Launch Specifications.
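For example, a task definition can use the built-in `ecs.instance-type` and `ecs.availability-zone` container instance attributes in its `placementConstraints` (the specific instance type and zones below are illustrative):

```json
{
  "placementConstraints": [
    {
      "type": "memberOf",
      "expression": "attribute:ecs.instance-type == c4.xlarge"
    },
    {
      "type": "memberOf",
      "expression": "attribute:ecs.availability-zone in [us-east-1a, us-east-1b]"
    }
  ]
}
```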

Daemon Tasks

Daemon tasks run on each instance or on a selected set of instances in an Amazon ECS cluster and can be used to provide common functionality, such as logging and monitoring. Ocean automatically identifies and accounts for Daemon Tasks when optimizing capacity allocation to make sure the launched instances have enough capacity for both the daemon services and the pending tasks. It also monitors for new container instances in the cluster and adds the Daemon Tasks to them. Ocean supports and considers Daemon services and tasks, both for scale down and scale up behavior.

Scale down: A Daemon task that was running on a scaled-down instance won’t trigger the launch of a new instance and will not be rescheduled on a different container instance.

Scale up: If one of the cluster’s services is configured with the Daemon scheduling strategy, Elastigroup ensures that all newly launched instances have enough spare capacity to run the Daemon task properly, in addition to the other pending tasks.
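For reference, a Daemon service is created in ECS by setting the service’s scheduling strategy to DAEMON (the cluster, service, and task definition names below are placeholders). Note that Daemon services take no desired count, since ECS runs one task per eligible instance:

```json
{
  "cluster": "my-ocean-cluster",
  "serviceName": "log-router",
  "taskDefinition": "log-router:1",
  "schedulingStrategy": "DAEMON"
}
```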

Customizing scaling configuration

Ocean manages the cluster capacity to ensure all tasks are running and that resources are efficiently utilized.
If you wish to override the default configuration, you can customize the scaling configuration.
To customize the scaling configuration:

  1. Navigate to your Ocean cluster.
  2. Click on the ‘Actions’ button on the top-right side of the screen to open the actions menu.
  3. Choose ‘Customize Scaling’.