Running Apache Mesos in Docker on CoreOS: A Beginners Guide

TOC:

- CoreOS Introduction
- Expanding to a multiple-host cluster
- DevOps Engineer Final Thoughts

CoreOS Introduction

If you’ve heard of Docker containers, you’ve probably also heard of CoreOS. If not, CoreOS is a lightweight, minimal Linux distribution designed for running containers in a clustered environment. This sounds ideal for running Apache Mesos, except for one caveat. CoreOS doesn’t have packages, so in order to run Mesos, we’re going to have to create containerised versions of all Mesos’ services.

Let’s quickly go over the key components of Apache Mesos:

- Zookeeper: coordinates the cluster and handles Mesos master leader election
- Mesos master: collects resource reports from the slaves and offers those resources to frameworks
- Mesos slave: runs on every worker node and executes tasks, in our case inside Docker containers
- A framework, such as Marathon: accepts resource offers and schedules tasks on the cluster

Neither Zookeeper nor the Mesos master have any issues running in a container. However, the Mesos slave is a little more complex, as it expects to have access to the docker daemon. In order to accomplish this, we’re going to have to mount the host’s docker socket, executable, and related libraries into the container.

For a quick-and-dirty single node setup, fire up a fresh CoreOS installation and run the following commands:

# Grab our IP
export HOST_IP=`ip -o -4 addr list eth0 | grep global | awk '{print $4}' | cut -d/ -f1`
# Start Zookeeper
docker run -d \
  --name=zookeeper --net=host jplock/zookeeper
# Start Mesos master
docker run -d \
  --name=mesos_master --net=host mesosphere/mesos-master:0.20.1 \
  --ip=$HOST_IP --zk=zk://$HOST_IP:2181/mesos --work_dir=/var/lib/mesos/master --quorum=1

# Start Mesos slave
docker run -d \
  --name=mesos_slave --privileged --net=host \
  -v /sys:/sys -v /usr/bin/docker:/usr/bin/docker:ro \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /lib64/libdevmapper.so.1.02:/lib/libdevmapper.so.1.02:ro \
  -v /lib64/libpthread.so.0:/lib/libpthread.so.0:ro \
  -v /lib64/libsqlite3.so.0:/lib/libsqlite3.so.0:ro \
  -v /lib64/libudev.so.1:/lib/libudev.so.1:ro \
  mesosphere/mesos-slave:0.20.1 \
  --ip=$HOST_IP --containerizers=docker \
  --master=zk://$HOST_IP:2181/mesos \
  --work_dir=/var/lib/mesos/slave \
  --log_dir=/var/log/mesos/slave

# Start framework, for example Marathon:
docker run -d \
  --name marathon -e LIBPROCESS_PORT=9090 -p 8080:8080 -p 9090:9090 \
  mesosphere/marathon:v0.7.6 \
  --master zk://$HOST_IP:2181/mesos --zk zk://$HOST_IP:2181/marathon \
  --checkpoint --task_launch_timeout 300000

Just like that, we have a single-host Mesos “cluster.”
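To sanity-check the setup, query the master’s state endpoint and launch a throwaway app through Marathon’s REST API. (The ports below are the defaults from the commands above; the “hello” app definition is just an illustrative example.)

```shell
# All four containers should be listed as running
docker ps

# The master's state endpoint should report one activated slave
curl -s http://$HOST_IP:5050/master/state.json | grep -o '"activated_slaves":[0-9]*'

# Launch a trivial test app through Marathon's REST API
curl -s -X POST http://$HOST_IP:8080/v2/apps \
  -H 'Content-Type: application/json' \
  -d '{"id":"hello","cmd":"while true; do echo hello; sleep 5; done","cpus":0.1,"mem":32,"instances":1}'
```

You can also browse the Mesos UI on port 5050 and the Marathon UI on port 8080 to watch the task come up.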

Expanding to a multiple-host cluster

If you’re unfamiliar with Mesos’ architecture, I covered it briefly in the Running Docker containers on Apache Mesos post, or you can read the official Mesos documentation. The key point is that Mesos needs to know the address(es) of a running Zookeeper quorum in order for nodes to register themselves in the cluster. In the above (single host) example, this was easily achieved by grabbing our own IP address. But now we’ll need to use some sort of service discovery.

The easy way would be to set up Zookeeper on a dedicated host (or 3) and use DNS. However, seeing as we talked about automated service discovery in the Service discovery for Docker containers using Consul blog post, let’s roll a completely automated solution. The only thing we need to know is each other’s IP addresses, which we can glean using CoreOS’s built-in etcd discovery.

Start some new CoreOS hosts with the following cloud-config file:

#cloud-config
coreos:
  etcd:
    # generate a new token for each unique
    # cluster from https://discovery.etcd.io/new
    discovery: <paste the discovery URL generated at https://discovery.etcd.io/new>
    # use $public_ipv4 if your datacenter of choice
    # does not support private networking
    addr: $private_ipv4:4001
    peer-addr: $private_ipv4:7001
  fleet:
    # used for fleetctl ssh command
    public-ip: $private_ipv4
  units:
    - name: etcd.service
      command: start
    - name: fleet.service
      command: start
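Before booting the hosts, generate a fresh discovery token and paste the returned URL into the `discovery:` field above. Once the machines are up, you can verify that the cluster formed. (The `size=3` parameter is an assumption; set it to your actual node count.)

```shell
# Generate a discovery URL for a 3-node cluster
curl -s 'https://discovery.etcd.io/new?size=3'

# After the hosts boot, verify membership from any node
etcdctl ls / --recursive
fleetctl list-machines
```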

Here are some systemd unit files for launching Consul and Registrator, and for bootstrapping the Consul cluster. Launch them on each node using systemctl.

consul.service:

[Unit]
Description=Consul
After=docker.service
Requires=docker.service

[Service]
Restart=on-failure
TimeoutStartSec=0
EnvironmentFile=/etc/environment
ExecStartPre=-/usr/bin/docker kill consul
ExecStartPre=-/usr/bin/docker rm consul
ExecStartPre=/usr/bin/docker pull progrium/consul
ExecStartPre=-/usr/bin/etcdctl mk /consul $COREOS_PUBLIC_IPV4
ExecStart=/usr/bin/sh -c "/usr/bin/docker run --rm --name consul -h $(/usr/bin/cat /etc/hostname) -p 8300:8300 -p 8301:8301 -p 8301:8301/udp -p 8302:8302 -p 8302:8302/udp -p 8400:8400 -p 8500:8500 -p 53:53/udp progrium/consul -server -bootstrap-expect 3 -advertise $(/usr/bin/ip -o -4 addr list eth0 | /usr/bin/grep global | /usr/bin/awk '{print $4}' | /usr/bin/cut -d/ -f1)"
ExecStop=/usr/bin/docker stop consul

[Install]
WantedBy=multi-user.target

consul-discovery.service:

[Unit]
Description=Consul Discovery
BindsTo=consul.service
After=consul.service

[Service]
Restart=on-failure
EnvironmentFile=/etc/environment
ExecStart=/bin/sh -c "while true; do etcdctl mk /services/consul $COREOS_PUBLIC_IPV4 --ttl 60; /usr/bin/docker exec consul consul join $(etcdctl get /services/consul); sleep 45; done"
ExecStop=/usr/bin/etcdctl rm /services/consul --with-value %H

[Install]
WantedBy=multi-user.target

registrator.service:

[Unit]
Description=Registrator
After=docker.service
Requires=docker.service

[Service]
Restart=on-failure
TimeoutStartSec=0
EnvironmentFile=/etc/environment
ExecStartPre=-/usr/bin/docker kill registrator
ExecStartPre=-/usr/bin/docker rm registrator
ExecStartPre=/usr/bin/docker pull progrium/registrator
ExecStart=/usr/bin/sh -c "/usr/bin/docker run --rm --name registrator -h $(/usr/bin/cat /etc/hostname) -v /var/run/docker.sock:/tmp/docker.sock progrium/registrator consul://$(/usr/bin/ip -o -4 addr list eth0 | grep global | awk '{print $4}' | cut -d/ -f1):8500"
ExecStop=/usr/bin/docker stop registrator

[Install]
WantedBy=multi-user.target
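With the three unit files written out, install and start them on each node. (The destination path below is an assumption; any directory in systemd’s unit search path will do.)

```shell
# Copy the units into systemd's search path and start them
sudo cp consul.service consul-discovery.service registrator.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl start consul.service consul-discovery.service registrator.service

# Check that everything came up
systemctl status consul.service consul-discovery.service registrator.service
docker ps
```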

DevOps Engineer Final Thoughts

Alternatively, grab our example archive-mesos-coreos-cluster-example repository, which already contains all the appropriate unit files embedded inside cloud-config files, and give it a whirl!