CoreOS Introduction
If you’ve heard of Docker containers, you’ve probably also heard of CoreOS. If not: CoreOS is a lightweight, minimal Linux distribution designed for running containers in a clustered environment. This sounds ideal for running Apache Mesos, except for one caveat: CoreOS doesn’t have a package manager, so in order to run Mesos we’re going to have to create containerised versions of all of Mesos’ services.
Let’s quickly go over the key components of Apache Mesos:
- Zookeeper
- Mesos master
- Mesos slave
- Framework
Neither Zookeeper nor the Mesos master has any issues running in a container. However, the Mesos slave is a little more complex, as it expects access to the Docker daemon. To accomplish this, we’re going to mount the host’s Docker socket, executable, and related libraries into the container.
For a quick-and-dirty single node setup, fire up a fresh CoreOS installation and run the following commands:
# Grab our IP
export HOST_IP=`ip -o -4 addr list eth0 | grep global | awk '{print $4}' | cut -d/ -f1`
# Start Zookeeper
docker run -d \
--name=zookeeper --net=host jplock/zookeeper
# Start Mesos master
docker run -d \
--name=mesos_master --net=host mesosphere/mesos-master:0.20.1 \
--ip=$HOST_IP --zk=zk://$HOST_IP:2181/mesos --work_dir=/var/lib/mesos/master --quorum=1
# Start Mesos slave
docker run -d \
--name=mesos_slave --privileged --net=host \
-v /sys:/sys -v /usr/bin/docker:/usr/bin/docker:ro \
-v /var/run/docker.sock:/var/run/docker.sock \
-v /lib64/libdevmapper.so.1.02:/lib/libdevmapper.so.1.02:ro \
-v /lib64/libpthread.so.0:/lib/libpthread.so.0:ro \
-v /lib64/libsqlite3.so.0:/lib/libsqlite3.so.0:ro \
-v /lib64/libudev.so.1:/lib/libudev.so.1:ro \
mesosphere/mesos-slave:0.20.1 \
--ip=$HOST_IP --containerizers=docker \
--master=zk://$HOST_IP:2181/mesos \
--work_dir=/var/lib/mesos/slave \
--log_dir=/var/log/mesos/slave
# Start framework, for example Marathon:
docker run -d \
--name marathon -e LIBPROCESS_PORT=9090 -p 8080:8080 -p 9090:9090 \
mesosphere/marathon:v0.7.6 \
--master zk://$HOST_IP:2181/mesos --zk zk://$HOST_IP:2181/marathon \
--checkpoint --task_launch_timeout 300000
Just like that, we have a single-host Mesos “cluster.”
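To confirm the slave actually registered, you can query the master’s HTTP endpoint (this assumes the default master port 5050; the JSON below is a hypothetical sample of the response, parsed the same way you would parse the real one):

```shell
# Fetch the cluster state from the master:
#   curl -s http://$HOST_IP:5050/master/state.json
# The response should report one activated slave. Parsing a sample response:
state='{"activated_slaves":1,"hostname":"10.0.0.5"}'   # hypothetical sample
echo "$state" | grep -o '"activated_slaves":[0-9]*' | cut -d: -f2   # prints 1
```

If that prints 0, check the slave container’s logs with `docker logs mesos_slave`.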
Expanding to a multiple-host cluster
If you’re unfamiliar with Mesos’ architecture, I covered it briefly in the Running Docker containers on Apache Mesos post, or you can read the official Mesos documentation. The key point is that Mesos needs to know the address(es) of a running Zookeeper quorum in order for nodes to register themselves in the cluster. In the above (single host) example, this was easily achieved by grabbing our own IP address. But now we’ll need to use some sort of service discovery.
The easy way would be to set up Zookeeper on a dedicated host (or 3) and use DNS. However, seeing as we talked about automated service discovery in the Service discovery for Docker containers using Consul blog post, let’s roll a completely automated solution. The only thing we need to know is each other’s IP addresses, which we can glean using CoreOS’s built-in etcd discovery.
Start some new CoreOS hosts with the following cloud-config file:
#cloud-config
coreos:
  etcd:
    # generate a new token for each unique
    # cluster from https://discovery.etcd.io/new
    discovery: <generate your own token here>
    # use $public_ipv4 if your datacenter of choice
    # does not support private networking
    addr: $private_ipv4:4001
    peer-addr: $private_ipv4:7001
  fleet:
    # used for fleetctl ssh command
    public-ip: $private_ipv4
  units:
    - name: etcd.service
      command: start
    - name: fleet.service
      command: start
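Before layering Consul on top, it’s worth checking that etcd and fleet actually formed a cluster. These are one-off ops checks to run on any node of the live cluster (so there is nothing to script or assert here):

```shell
# Verify etcd is up and answering:
etcdctl ls / --recursive
# Verify fleet can see every machine in the cluster:
fleetctl list-machines
```

`fleetctl list-machines` should list one entry per CoreOS host; if it doesn’t, revisit the discovery token in your cloud-config.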
Here are some systemd unit files for launching Consul and Registrator, and for bootstrapping the Consul cluster. Launch them on each node using systemctl.
consul.service:
[Unit]
Description=Consul
After=docker.service
Requires=docker.service
[Service]
Restart=on-failure
TimeoutStartSec=0
EnvironmentFile=/etc/environment
ExecStartPre=-/usr/bin/docker kill consul
ExecStartPre=-/usr/bin/docker rm consul
ExecStartPre=/usr/bin/docker pull progrium/consul
ExecStartPre=-/usr/bin/etcdctl mk /consul $COREOS_PUBLIC_IPV4
ExecStart=/usr/bin/sh -c "/usr/bin/docker run --rm --name consul -h $(/usr/bin/cat /etc/hostname) -p 8300:8300 -p 8301:8301 -p 8301:8301/udp -p 8302:8302 -p 8302:8302/udp -p 8400:8400 -p 8500:8500 -p 53:53/udp progrium/consul -server -bootstrap-expect 3 -advertise $(/usr/bin/ip -o -4 addr list eth0 | /usr/bin/grep global | /usr/bin/awk '{print $4}' | /usr/bin/cut -d/ -f1)"
ExecStop=/usr/bin/docker stop consul
[Install]
WantedBy=multi-user.target
consul-discovery.service:
[Unit]
Description=Consul Discovery
BindsTo=consul.service
After=consul.service
[Service]
Restart=on-failure
EnvironmentFile=/etc/environment
ExecStart=/bin/sh -c "while true; do etcdctl mk /services/consul $COREOS_PUBLIC_IPV4 --ttl 60; /usr/bin/docker exec consul consul join $(etcdctl get /services/consul); sleep 45; done"
ExecStop=/usr/bin/etcdctl rm /services/consul --with-value $COREOS_PUBLIC_IPV4
[Install]
WantedBy=multi-user.target
registrator.service:
[Unit]
Description=Registrator
After=docker.service
Requires=docker.service
[Service]
Restart=on-failure
TimeoutStartSec=0
EnvironmentFile=/etc/environment
ExecStartPre=-/usr/bin/docker kill registrator
ExecStartPre=-/usr/bin/docker rm registrator
ExecStartPre=/usr/bin/docker pull progrium/registrator
ExecStart=/usr/bin/sh -c "/usr/bin/docker run --rm --name registrator -h $(/usr/bin/cat /etc/hostname) -v /var/run/docker.sock:/tmp/docker.sock progrium/registrator consul://$(/usr/bin/ip -o -4 addr list eth0 | grep global | awk '{print $4}' | cut -d/ -f1):8500"
ExecStop=/usr/bin/docker stop registrator
[Install]
WantedBy=multi-user.target
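One way to wire this up on each node (the file locations are just an assumption; any path systemd reads units from works), followed by a sanity check against Consul’s catalog. The service names in the sample response are hypothetical; Registrator derives them from image names and published ports:

```shell
# Install and launch the units on each node (hypothetical file locations):
#   sudo cp consul.service consul-discovery.service registrator.service /etc/systemd/system/
#   sudo systemctl daemon-reload
#   sudo systemctl start consul.service consul-discovery.service registrator.service
# Once Registrator is up, every container that publishes a port appears in
# Consul's catalog:
#   curl -s http://127.0.0.1:8500/v1/catalog/services
# Parsing a hypothetical sample of that response into plain service names:
catalog='{"consul":[],"zookeeper-2181":[],"mesos-master-5050":[]}'   # sample response
echo "$catalog" | grep -o '"[a-z0-9-]*"' | tr -d '"'
```

With Consul answering DNS on port 53, the Mesos containers can then find Zookeeper by name instead of by a hard-coded IP.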
Final Thoughts
Alternatively, grab our example archive-mesos-coreos-cluster-example repository, which already contains all the appropriate unit files embedded inside cloud-config files, and give it a whirl!