
In AWS, for example, when I spin up a new EC2 instance, it loads up a new VM, then populates the VM with a machine image. This is why spinning up a new EC2 instance takes 60-90 seconds.
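Concretely, here's roughly the flow I mean, sketched with boto3 (the AMI ID and instance type below are just placeholders):

```python
# Rough sketch of the launch flow above; assumes boto3 is configured
# with credentials, and the AMI ID / instance type are placeholders.
import time

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

start = time.monotonic()
resp = ec2.run_instances(
    ImageId="ami-xxxxxxxx",  # placeholder machine image (AMI)
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
)
instance_id = resp["Instances"][0]["InstanceId"]

# Waiting for the VM to reach "running" is where most of the
# 60-90 seconds goes.
ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
print(f"{instance_id} running after {time.monotonic() - start:.0f}s")
```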

Out of curiosity, what are the disadvantages of having AWS run the host machine as-is, so that when a user wants to "spin up an EC2 instance", AWS just spins up a container with restricted permissions and gives the user access only to that container?

The upside would be that the compute instance would spin up very quickly. I'm still learning about cloud technologies, so I was just wondering what the downsides are.

Perhaps it is harder to allocate CPU resources without using VMs? And as a result, users would fight each other over the available CPU? Or perhaps there's some security concern? Would love to learn about this.

user3667125

4 Answers


Containers typically run only a single application and are typically immutable, i.e. changes are not preserved across restarts. Containers also don't have their own kernel.

VMs, on the other hand, run a whole operating system, including the kernel, init scripts, system daemons, etc. And the storage is typically preserved across restarts.

VMs and containers serve different purposes - google something like "VMs vs Containers"; there's plenty on the internet.
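To make the "changes are not preserved" point concrete, here's a rough sketch with the docker-py SDK (assumes Docker is running and the alpine image is available): when a container is replaced rather than merely restarted, anything not stored in a volume is gone:

```python
# A file written in one container does not exist in a fresh container
# started from the same image - the image itself never changed.
import docker

client = docker.from_env()

# Write a file inside a throwaway container.
client.containers.run("alpine", "sh -c 'echo hi > /data.txt'", remove=True)

# A fresh container starts from the unchanged image again.
out = client.containers.run(
    "alpine", "sh -c 'ls /data.txt 2>&1 || true'", remove=True
)
print(out.decode().strip())  # ls: /data.txt: No such file or directory
```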

If you want to run containers as a service in AWS without having to worry about the underlying VMs, look at AWS Fargate - that does exactly what you want.
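Launching a container on Fargate through boto3 looks roughly like this (the cluster, task definition, and subnet are placeholders you'd have created beforehand):

```python
# Hedged sketch: run a container on Fargate without managing any VMs.
import boto3

ecs = boto3.client("ecs", region_name="us-east-1")

resp = ecs.run_task(
    cluster="my-cluster",          # placeholder cluster name
    launchType="FARGATE",
    taskDefinition="my-task:1",    # placeholder task definition
    networkConfiguration={
        "awsvpcConfiguration": {
            "subnets": ["subnet-xxxxxxxx"],  # placeholder subnet
            "assignPublicIp": "ENABLED",
        }
    },
)
print(resp["tasks"][0]["taskArn"])
```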

Hope that helps :)

MLu
  • Thanks! I'll look into those resources further. At first glance, it seems like AWS Fargate startup time is still pretty long at ~20 seconds. When I load a Docker container, it only takes 1 second to start. What else is Fargate doing that adds to the startup time? I thought startup time was one advantage containers had over VMs. – user3667125 Jul 01 '20 at 07:41
  • Fargate has to download the container image from the repository first - that takes some time. The boot itself, once the image is available, is fast. – MLu Jul 01 '20 at 07:43
  • If you want a really, really quick start, look at [AWS Lambda](https://aws.amazon.com/lambda) - that’s even more restricted than a container but has a much faster start time. – MLu Jul 01 '20 at 07:45
  • @MLu: Containers are not necessarily immutable; it's a matter of choice. You mention "typically", which makes a difference, but it should be clarified. You're referring mainly to Docker, I suppose - which of course can be persistent if you like. – Krackout Jul 01 '20 at 07:46
  • @Krackout indeed, hence the word “typically” in my answer ;) – MLu Jul 01 '20 at 07:48
  • @MLu I'd place a second "typically" before "immutable" (or before "are") to clarify – Bergi Jul 01 '20 at 20:59
  • [Heroku Dynos](https://www.heroku.com/dynos) are another example of containers on AWS. Their [filesystem is ephemeral](https://devcenter.heroku.com/articles/dynos#ephemeral-filesystem), anything you want to persist must be somewhere else. Dynos do [rolling restarts](https://devcenter.heroku.com/articles/dynos#restarting) to ensure the system as a whole remains operational, startup time is not so important. – Schwern Jul 02 '20 at 05:19
  • AWS VMs are also typically immutable – user253751 Jul 02 '20 at 09:24
  • -1 for no mention of security implications. – trognanders Jul 03 '20 at 01:23
  • @trognanders and another -1 for not going into every detail of (non-)immutability, networking, resource efficiency, deployment strategies, and so on ;) – MLu Jul 03 '20 at 01:27
  • @MLu I apologize for being curt and honestly think your answer is good. In the context of AWS, security is one of the biggest Docker-vs-VM considerations and above all else deserves to be mentioned. The other stuff is just *do I want Docker in my VM?* I'll gladly change my vote with a small update! – trognanders Jul 03 '20 at 01:40

Your question is, to some extent, looking at things backwards: EC2 isn't a general-purpose hosting solution that happens to use VMs; it is a service for hosting VMs. As such, there's a few ways to interpret your question.

Why wasn't EC2 designed to use containers?

The answer to this can be deduced from the timeline: EC2 was launched in beta in 2006, and full production in 2008; Docker wasn't publicly released until 2013, and Kubernetes was 2015.

Container technology was being developed at the time EC2 launched - BSD already had "jails", and Linux had some forms of namespace isolation - but it wasn't the mature ecosystem we're familiar with today. Virtual Private Servers, on the other hand, were a well-established concept - VMware explicitly marketed ESX for hosting services in 2002, the Xen hypervisor followed in 2003, and Linode was launched that same year. EC2's innovation was a system for launching virtual servers on demand using this established technology.

Why hasn't EC2 moved from VMs to containers?

Although containers can be thought of in some ways as "a light-weight VM", this is not a full description, and the two are not inter-changeable. A VM is designed to give the user the illusion that they are accessing a physical server, with full control of the entire system; resources such as networking are presented as virtual hardware with which the user can directly interact if they wish. Containers present a more limited abstraction, and the application is generally much more closely bound to the configuration of the container itself, such as only forwarding specific network ports.
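As a rough illustration of that narrower abstraction (a docker-py sketch; the image and port numbers are arbitrary): instead of configuring virtual network hardware, you declare a forwarding rule and nothing more:

```python
# Illustrative only: the container doesn't see a virtual NIC to
# manage; the runtime just forwards host port 8080 to container port 80.
import docker

client = docker.from_env()

web = client.containers.run(
    "nginx",
    detach=True,
    ports={"80/tcp": 8080},  # the whole "network configuration"
)
web.reload()
print(web.status)  # "running"
```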

Amazon has added many services over the years, but is very conservative about retiring old ones that customers rely on. So it does offer many services based around containers rather than VMs, such as ECS (Elastic Container Service, launched 2014), Fargate (launched 2017), and EKS (Elastic Kubernetes Service, launched 2018); but it is unlikely to retire EC2 while users are still using it.

Why haven't users moved to container services?

Given that container-based cloud hosting is available, why do people still opt to use VM-based services like EC2?

I think there are several reasons; a few that come to mind:

  • Familiarity: People understand how to configure a VM, and can learn the differences between a local VM and an EC2 instance relatively quickly. Understanding container technology requires more re-training.
  • Migration cost: Existing systems can often be run un-modified on an EC2 instance, including entire operating systems and graphical interfaces. Containerising an application is generally more complex.
  • Security: The less of the system is shared, the lower the risk of data leaking to other customers. Container hosting services will often try to mitigate this by orchestrating separate VMs for each customer, but this has an obvious cost for some of the metrics you mention like startup speed.

So, although containers continue to grow in popularity, they have not yet completely replaced virtual servers, and probably never will. As such, EC2, and similar VM-based cloud hosting services, are here to stay.

IMSoP
  • "Containerization" support in operating systems (especially in Linux, but also FreeBSD Jails, and probably in Solaris) also requires a huge amount of incredibly complex extra code in the host kernel; and of course most container creation and management tools are incredibly complex pieces of software. Complexity breeds insecurity. – Greg A. Woods Jul 04 '20 at 21:14
  • What are some security vulnerabilities that could arise from hosting a single VM that allows different customers to "spin up" containers on demand? Would it be something like gaining read/write access to some other customer's application/memory/disk? – user3667125 Jul 05 '20 at 08:11
  • @user3667125 Yes, that general kind of thing. Obviously that can theoretically happen in a full VM environment as well, if there's a bug in the hypervisor, or the hardware itself. My general understanding is that in a VM, most resources are completely simulated (e.g. access to the network is via virtual hardware), whereas in a container they might be shared but restricted (e.g. you can only open certain network ports); that leaves more scope for evading those restrictions without having to completely "break out" of the container. – IMSoP Jul 05 '20 at 11:50

Security is definitely a reason. Containers share the same kernel with each other and with the host, so they are not considered 100% isolated.
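You can see the shared kernel directly; a quick docker-py sketch (assumes Docker and the alpine image):

```python
# The kernel release reported inside a container matches the host's,
# because there is only one kernel.
import platform

import docker

client = docker.from_env()
inside = client.containers.run("alpine", "uname -r", remove=True)

print("host kernel:     ", platform.release())
print("container kernel:", inside.decode().strip())  # same value
```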

Yet cloud providers do offer containers as well - AWS included. I suppose containers are cheaper than VMs, but I haven't checked.

In essence what you ask is a more general topic, VMs vs. containers; regardless of platform, the same pros and cons apply.

Glorfindel
Krackout
  • I wouldn't consider virtual machines 100% isolated either, especially with all the CPU vulnerabilities that are being discovered. Still far better than running under the same kernel though, and virtualization is popular enough that CPUs have hardware to (attempt to) make them less likely to interact with each other or the host. – user Jul 01 '20 at 16:41
  • Yes, you have a point. Actually these vulnerabilities may give a boost to ARM adoption, instead of Intel, even on servers. We'll see what the future brings. – Krackout Jul 01 '20 at 16:50
  • @Krackout: Actually, the scary part about Spectre, Meltdown and their friends, relatives, and successors, is that they have demonstrated that *the entire industry as a whole* has systematically sacrificed security for performance, and has developed technologies without evaluating or understanding their security impact. Speculative and out-of-order execution are *fundamental* performance tricks that are used by *almost every vendor*. For example, Apple A11, several Nvidia Tegra models, some Qualcomm Snapdragons, MIPS, Sparc, POWER, PowerPC, are also vulnerable. – Jörg W Mittag Jul 01 '20 at 20:05
  • At the time Spectre and Meltdown were published, *all* Apple products except the Watch were vulnerable: all iPhones, all iPads, all Apple TVs, and all Macs. – Jörg W Mittag Jul 01 '20 at 20:07
  • Other CPUs are affected too, indeed. I have some AIX & Linux on Power and due to them I had to check mitigations on these platforms too. But I'm under the impression that other CPUs are less affected than Intel's. Didn't search in depth though, can't be certain. – Krackout Jul 01 '20 at 20:09
  • @Krackout Intel had some particularly glaring bugs (i.e. "Meltdown"). But Spectre applies to every fast CPU. If it doesn't apply to your CPU, that's because your CPU is slow. End of story. – user253751 Jul 02 '20 at 09:25
  • AWS guarantees VM level isolation between different customers, so whatever containers they provide are using a VM as the host. – trognanders Jul 03 '20 at 01:22
  • @JörgWMittag Spectre etc. are only an issue on machines that run more than one workload, and it is only recently that it has become common for machines to run untrusted workloads. The industry solved the problem customers had at the time these optimizations were first being designed. – Ian Ringrose Jul 04 '20 at 11:41

There are several different approaches to containers, and the currently accepted answer only seems to account for OCI-style (Docker-like) containers. There are many other types, such as LXC and BSD jails, which take different approaches.

LXC for example can easily contain several applications, and is mutable by default. It also has init scripts and system daemons (systemd etc).


Perhaps it is harder to allocate CPU resources without using VMs? And as a result, users would fight each other over the available CPU?

Allocating CPU, RAM, and disk space can be done just as easily with containers.
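For example, with the docker-py SDK you can cap CPU and memory per container (the limit values here are arbitrary); the host kernel enforces them via cgroups:

```python
# Sketch: per-container resource limits, no hypervisor involved.
import docker

client = docker.from_env()

limited = client.containers.run(
    "alpine",
    "sleep 60",
    detach=True,
    mem_limit="512m",     # hard memory cap
    cpu_period=100_000,
    cpu_quota=50_000,     # i.e. at most 50% of one CPU core
)
print(limited.id)
```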

The upside would be that the compute instance would spin up very quickly.

Provisioning containers is not an instant task (though it can be faster than "60-90 seconds"), as you still have to fetch an image, extract it, and start it up.
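A rough way to see where that time goes (docker-py sketch; assumes the alpine image isn't cached locally yet, and pull time depends on image size and network):

```python
# The image pull usually dominates; the start itself is quick.
import time

import docker

client = docker.from_env()

t0 = time.monotonic()
client.images.pull("alpine")               # download + extract layers
t1 = time.monotonic()

c = client.containers.run("alpine", "true", detach=True)
c.wait()
t2 = time.monotonic()

print(f"pull:  {t1 - t0:.2f}s")  # network-dependent
print(f"start: {t2 - t1:.2f}s")  # typically well under a second
c.remove()
```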

Or perhaps there's some security concern?

Security is a major concern with all of the container solutions I mentioned, as they all share a kernel. While there are many security measures in place, vulnerabilities are still found occasionally. If you had a shared server with your friends and you all had containers on it, you'd probably be mostly safe, but at the scale of large providers such as Amazon (where there are tons of businesses using their services), it can be a significant security concern.
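Some of those measures are visible as ordinary runtime options - a docker-py sketch of two of them (illustrative only; the shared kernel remains the residual risk):

```python
# Hardening knobs: drop all Linux capabilities and forbid privilege
# escalation. Useful, but the kernel underneath is still shared.
import docker

client = docker.from_env()

out = client.containers.run(
    "alpine",
    "id",
    remove=True,
    cap_drop=["ALL"],                    # no Linux capabilities at all
    security_opt=["no-new-privileges"],  # block setuid-style escalation
)
print(out.decode().strip())
```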

If you check the AWS Fargate website, for example, it states that many resources for their containers aren't shared, and in that respect it is much closer to a VM than a traditional self-hosted container:

Individual ECS tasks or EKS pods each run in their own dedicated kernel runtime environment and do not share CPU, memory, storage, or network resources with other tasks and pods. This ensures workload isolation and improved security for each task or pod.

One final concern I'd like to note is compatibility. As your access to the kernel (and potentially your syscalls) is limited, you can't do certain tasks like loading dkms modules or applying sysctl configs. Not all applications will run under these restrictions, but those tend to be the exception rather than the norm.
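To illustrate: a plain container can't write kernel parameters itself; the runtime instead whitelists certain namespaced sysctls at creation time (docker-py sketch; net.ipv4.ip_forward is one of the ones Docker allows):

```python
# Setting a namespaced sysctl at container creation - writing it from
# inside an unprivileged container would simply fail.
import docker

client = docker.from_env()

out = client.containers.run(
    "alpine",
    "sysctl net.ipv4.ip_forward",
    remove=True,
    sysctls={"net.ipv4.ip_forward": "1"},  # applied by the runtime
)
print(out.decode().strip())  # net.ipv4.ip_forward = 1
```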


There are many valid use cases for containers (both OCI-like and LXC-like), and it's definitely not a "one solution fits all" thing. Not having to run a whole kernel and other types of virtualization (graphics, audio, network, etc.) does result in a lot less overhead, but there are also considerations to be made about the cons of using containers, some of which I've mentioned in my answer.

ave
  • Thank you! This post is extremely helpful. Why do people typically modify the kernel with dkms modules or sysctl configs? Is it generally to optimize speed of the application? – user3667125 Jul 02 '20 at 23:16
  • @user3667125 dkms modules are often used to add kernel-space functionality. Two relevant examples from my laptop are wireguard and vboxhost. Wireguard is merged into kernel 5.6 now so dkms modules aren't needed for that if you're running 5.6+ with the relevant build flags, but if you want virtualbox for example, you'll need vboxhost. [Further reading](https://wiki.archlinux.org/index.php/Dynamic_Kernel_Module_Support) – ave Jul 03 '20 at 01:06
  • @user3667125 sysctl can be used to configure a bunch of kernel parameters at runtime, [here's some further reading](https://wiki.archlinux.org/index.php/sysctl); one common example is "net.ipv4.ip_forward", which enables IP forwarding on IPv4. Enabling that specific one is required for certain network-related tasks ([further reading](https://unix.stackexchange.com/q/14056/167345)). I'm sure there's stuff that can help speed up applications, but from what I've seen, they're usually used to make things actually start to work. It's more of a compatibility thing. – ave Jul 03 '20 at 01:09
  • Do note, though, that for both dkms modules and kernel parameters, the host kernel's options will be in place. So, if you control the environment and need the wireguard dkms module, for example, you can install it on the host and it'll be available in the containers. The same goes for kernel parameters: set them on the host, and they'll apply to the containers too. These are mainly concerns when you lack access to the host. – ave Jul 03 '20 at 01:10
  • Great answer! Docker containers are also mutable by default and can *easily* contain multiple applications, init scripts, and work like a virtual machine with a full desktop distribution. However, this usage is frowned upon. – Aleksandr Dubinsky Jul 04 '20 at 11:48
  • Why are mutable Docker containers frowned upon? I did a quick search, and couldn't find any technical reasons. The response was typically like "mutable containers are treating containers like VMs, but you shouldn't do that" or "it makes them portable to help with CI workflows". But if one wants to use a docker container like a VM, and it's not used in a CI workflow, is there any reason it shouldn't be mutable? – user3667125 Jul 05 '20 at 07:53
  • @user3667125 That sounds like a new question, if you can word it to meet the site's guidelines. Remember that this site is strictly _not_ a discussion forum, but a Q&A site, as explained on the [tour]. – IMSoP Jul 05 '20 at 18:29