We are currently in the process of designing the architecture of our new Apache Mesos cloud setup. The goal is to unify our systems by moving different stacks onto the same architecture. The main workloads are big data analytics using Apache Spark and our corporate infrastructure including web servers, mail servers, etc.
The idea is to run our web services in Docker containers running on top of one of the available schedulers for Mesos (Marathon/Chronos, Aurora or Singularity). This would thus be the first Mesos framework group. Next to it, we would have the Apache Spark framework and several database frameworks for data storage. This would be the second group of Mesos frameworks. We will choose the specifics after running them all in parallel for testing.
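To make the first group concrete, here is a rough sketch of how one of our web services might be described to Marathon as a Docker app. The app id, image name and resource values are placeholders I made up for illustration, and the field layout follows Marathon's JSON app definition as I understand it, so treat it as an assumption rather than a tested setup.

```python
import json


def marathon_docker_app(app_id, image, container_port,
                        instances=2, cpus=0.25, mem=128):
    """Build a Marathon app definition (as a dict) for one Docker web service.

    All argument values used below are hypothetical placeholders.
    """
    return {
        "id": app_id,
        "instances": instances,  # number of copies Marathon keeps running
        "cpus": cpus,            # CPU shares requested per instance
        "mem": mem,              # memory in MB requested per instance
        "container": {
            "type": "DOCKER",
            "docker": {
                "image": image,
                "network": "BRIDGE",
                # hostPort 0 asks for a port from the agent's offered range
                "portMappings": [
                    {"containerPort": container_port,
                     "hostPort": 0,
                     "protocol": "tcp"}
                ],
            },
        },
    }


# Hypothetical example service: a single nginx front end.
app = marathon_docker_app("/web/frontend", "nginx:latest", 80)
print(json.dumps(app, indent=2))
```

As far as I know, the resulting JSON would be submitted to Marathon's REST API (POST to /v2/apps); Aurora and Singularity use their own job formats, so this only covers the Marathon variant.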
However, we are having trouble deciding what to run Mesos itself on. Ideally, we want to run it as close to the metal as possible. We also want an orchestration solution that makes sure the Mesos and framework daemons are always running and are restarted on failure. The options we are considering are as follows:
1) Running Mesos & the frameworks as Docker containers on a minimal OS. For this option, we are currently leaning towards CoreOS and fleet.
2) Running Mesos & the frameworks directly on Ubuntu/Debian servers. For this option, we are leaning towards Foreman and Puppet.
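To illustrate what we picture for option 1, here is a rough sketch that renders a fleet unit keeping a containerized Mesos agent running on every CoreOS machine. The image name `example/mesos-agent`, the container name and the ZooKeeper URL are hypothetical, and I am assuming the image's entrypoint is the Mesos agent binary so that the trailing flags are passed to it. Under option 2, Puppet would manage an equivalent native service instead.

```python
def fleet_unit_for_mesos_agent(image, zk_url, name="mesos-agent"):
    """Render a fleet/systemd unit that runs a Mesos agent in Docker.

    Assumes (hypothetically) that `image` has the agent as its entrypoint.
    """
    docker_run = (
        f"/usr/bin/docker run --name {name} --net=host --privileged "
        # Sharing the host's Docker socket is an assumption of this sketch.
        f"-v /var/run/docker.sock:/var/run/docker.sock "
        f"{image} --master={zk_url} --containerizers=docker,mesos"
    )
    lines = [
        "[Unit]",
        "Description=Mesos agent (containerized)",
        "After=docker.service",
        "Requires=docker.service",
        "",
        "[Service]",
        # systemd restarts the agent container whenever it exits.
        "Restart=always",
        # The leading '-' lets startup continue if no old container exists.
        f"ExecStartPre=-/usr/bin/docker rm -f {name}",
        f"ExecStart={docker_run}",
        f"ExecStop=/usr/bin/docker stop {name}",
        "",
        "[X-Fleet]",
        # Ask fleet to schedule this unit on every machine in the cluster.
        "Global=true",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical values for illustration only.
unit = fleet_unit_for_mesos_agent("example/mesos-agent",
                                  "zk://zk1:2181/mesos")
print(unit)
```

The restart-on-failure requirement is covered here by systemd's `Restart=always`, with fleet only deciding where the unit runs.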
As for the question, we are looking to identify the solution which, in order of importance:
- is the least complex to configure
- is the easiest to maintain & keep updated
- has the least overhead
We have not worked with CoreOS before, but it is the option we seem to be heading towards. One big (subjective) issue I have with it is that we would run Mesos in Docker containers and then run Docker containers on Mesos. This seems "unclean" and wrong to me. Is this concern without merit?
A similar thought concerns the redundancy between layers. To explain where I'm coming from: I would prefer it if Mesos were an actual OS that runs directly on the metal. It seems that no matter what base you use, you end up with the same functionality on more than one layer of the architecture (i.e. CoreOS + fleet + systemd do much the same job as Mesos + Marathon + Chronos). Is this unavoidable?
Are there other good options for the layer below Mesos that we have failed to consider, keeping our criteria in mind?