Deploy Kubernetes on DigitalOcean: A Beginner-Friendly Guide

Why DigitalOcean?

If you’ve experimented with Kubernetes, you’ve probably used the gcloud tool to create a 1-click cluster on Google Compute Engine. However, if you want to deploy Kubernetes on another cloud provider, or on bare metal, the process is a little more involved. There are a few tutorials out there, but they are a little out of date and involve manually adding the addresses of your cluster nodes to the Kubernetes master. In this post, I’ll show you how to set up an auto-configured Kubernetes cluster on DigitalOcean.

CoreOS

If you are familiar with Kubernetes, or read my introduction, you’ll recall Kubernetes requires a container runtime, and some method of routing traffic between containers running across multiple hosts. CoreOS is the perfect OS for this task, as it ships with Docker (the container runtime) and flannel (a tool that assigns a subnet to each host and routes traffic between containers across hosts). CoreOS also allows us to specify its configuration through a cloud-config file provided on first boot, so once we’ve generated a working configuration it can be reproduced at any time by starting a new machine with the same file.

Installing Kubernetes

CoreOS uses systemd as its init system, so we can provide it with systemd unit files to install and run Kubernetes. For example, we can drop in the following unit to install Kubernetes:

[Unit]
After=network-online.target
Description=Download Kubernetes Binaries
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
Requires=network-online.target

[Service]
Environment=KUBE_RELEASE_TARBALL=https://github.com/GoogleCloudPlatform/kubernetes/releases/download/v0.17.0/kubernetes.tar.gz
ExecStartPre=/bin/mkdir -p /opt/bin/
ExecStart=/bin/bash -c "curl -s -L $KUBE_RELEASE_TARBALL | tar xzv -C /tmp/"
ExecStart=/bin/tar xzvf /tmp/kubernetes/server/kubernetes-server-linux-amd64.tar.gz -C /opt
ExecStartPost=/bin/ln -s /opt/kubernetes/server/bin/kubectl /opt/bin/
ExecStartPost=/bin/mv /tmp/kubernetes/examples/guestbook /home/core/guestbook-example
ExecStartPost=/bin/rm -rf /tmp/kubernetes
ExecStartPost=/usr/bin/chmod -R a+r /opt/kubernetes
ExecStartPost=/usr/bin/chmod -R a+x /opt/kubernetes/server
RemainAfterExit=yes
Type=oneshot

In order to provide this unit to CoreOS, we will add it to a cloud-config file (note that the file must begin with the line #cloud-config) like so:

coreos:
  units:
  - name: download-kubernetes.service
    enable: true
    command: start
    content: |
      [Unit]
      After=network-online.target
      Description=Download Kubernetes Binaries
      Documentation=https://github.com/GoogleCloudPlatform/kubernetes
      Requires=network-online.target
      [Service]
      Environment=KUBE_RELEASE_TARBALL=http://dl.lwy.io/3rdparty/kubernetes.tar.gz
      ExecStartPre=/bin/mkdir -p /opt/bin/
      ExecStart=/bin/bash -c "curl -s -L $KUBE_RELEASE_TARBALL | tar xzv -C /tmp/"
      ExecStart=/bin/tar xzvf /tmp/kubernetes/server/kubernetes-server-linux-amd64.tar.gz -C /opt
      ExecStartPost=/bin/ln -s /opt/kubernetes/server/bin/kubectl /opt/bin/
      ExecStartPost=/bin/mv /tmp/kubernetes/examples/guestbook /home/core/guestbook-example
      ExecStartPost=/bin/rm -rf /tmp/kubernetes
      ExecStartPost=/usr/bin/chmod -R a+r /opt/kubernetes
      ExecStartPost=/usr/bin/chmod -R a+x /opt/kubernetes
      RemainAfterExit=yes
      Type=oneshot      

If you create a new CoreOS droplet and paste the above cloud-config into the “user data” section, your new CoreOS machine will boot with a fresh installation of Kubernetes. However, we’ll need to define some more units if we actually want it to run!
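
Before going any further, it’s worth checking the download actually worked. A quick sanity check over SSH might look like this (the droplet IP is a placeholder, and kubectl will only report its client version until the API server is running):

ssh core@<droplet-ip>
systemctl status download-kubernetes.service --no-pager   # should be active (exited)
/opt/bin/kubectl version                                  # client version v0.17.0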

More Units

The first thing Kubernetes needs to run is a working installation of etcd. We want to run all the “minion” nodes in proxy mode, passing all requests to the master etcd node, which means we require etcd >= 2.0.0. Note that this only shipped in CoreOS >= 653 (currently the alpha release is the only one new enough to contain it). The unit file looks like this:

- name: etcd2.service
  enable: true
  command: start
  content: |
    [Unit]
    Description=etcd2
    Conflicts=etcd.service
    [Service]
    User=etcd
    Environment=ETCD_DATA_DIR=/var/lib/etcd2
    Environment=ETCD_NAME=%H
    ExecStart=/usr/bin/etcd2
    Restart=always
    RestartSec=10s
    LimitNOFILE=40000    

The only thing of note is %H, which is replaced by systemd with the hostname of the machine, making it easier to decipher the logs if something goes wrong.
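
Once etcd is up (we’ll add its cluster configuration in a moment), you can sanity-check it from any node:

etcdctl cluster-health
etcdctl member list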

The Kubernetes master node requires three components: the API server, the controller manager and the scheduler. Here are the units:

- name: apiserver.service
  enable: true
  content: |
    [Unit]
    After=etcd2.service
    After=download-kubernetes.service
    Before=controller-manager.service
    Before=scheduler.service
    ConditionFileIsExecutable=/opt/kubernetes/server/bin/kube-apiserver
    Description=Kubernetes API Server
    Documentation=https://github.com/GoogleCloudPlatform/kubernetes
    Wants=etcd2.service
    Wants=download-kubernetes.service
    ConditionHost=master
    Requires=bootstrap-address-information.service
    [Service]
    ExecStart=/opt/kubernetes/server/bin/kube-apiserver \
      --insecure_bind_address=0.0.0.0 \
      --insecure_port=8080 \
      --etcd_servers=http://127.0.0.1:4001 \
      --portal_net=10.0.0.0/16 \
      --cloud_provider=vagrant \
      --allow_privileged=true \
      --logtostderr=true --v=3
    Restart=always
    RestartSec=10
    [Install]
    WantedBy=kubernetes-master.target    

- name: scheduler.service
  enable: true
  content: |
    [Unit]
    After=apiserver.service
    After=download-kubernetes.service
    ConditionFileIsExecutable=/opt/kubernetes/server/bin/kube-scheduler
    Description=Kubernetes Scheduler
    Documentation=https://github.com/GoogleCloudPlatform/kubernetes
    Wants=apiserver.service
    ConditionHost=master
    Requires=bootstrap-address-information.service
    [Service]
    ExecStart=/opt/kubernetes/server/bin/kube-scheduler \
      --logtostderr=true \
      --master=127.0.0.1:8080
    Restart=always
    RestartSec=10
    [Install]
    WantedBy=kubernetes-master.target    

- name: controller-manager.service
  enable: true
  content: |
    [Unit]
    After=etcd2.service
    After=download-kubernetes.service
    After=apiserver.service
    ConditionFileIsExecutable=/opt/kubernetes/server/bin/kube-controller-manager
    Description=Kubernetes Controller Manager
    Documentation=https://github.com/GoogleCloudPlatform/kubernetes
    Wants=apiserver.service
    Wants=etcd2.service
    Wants=download-kubernetes.service
    ConditionHost=master
    Requires=bootstrap-address-information.service
    [Service]
    ExecStartPre=/bin/bash -x -c 'result=`wget \
      --retry-connrefused --tries=5 127.0.0.1:8080/healthz \
      -O -` && test -n "$${result}" && test "$${result}" = ok'
    ExecStart=/opt/kubernetes/server/bin/kube-controller-manager \
      --machines=(comma separated list of minions) \
      --cloud_provider=vagrant \
      --master=127.0.0.1:8080 \
      --logtostderr=true
    Restart=always
    RestartSec=10
    [Install]
    WantedBy=kubernetes-master.target

Things of note: we’re now using After, Before, Requires, Wants and ConditionHost to ensure the units start in the order we would like. These should be fairly self-explanatory, but refer to the systemd documentation if there’s anything you don’t understand!
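
Once the master units are up you can sanity-check them directly on the droplet, for example (using the insecure port configured above):

systemctl status apiserver.service scheduler.service controller-manager.service --no-pager
curl http://127.0.0.1:8080/healthz   # should print "ok"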

Finally, the “minion” nodes all need to be running the kubelet, kube-proxy and Docker, like so:

- name: kubelet.service
  content: |
    [Unit]
    After=etcd2.service
    After=download-kubernetes.service
    ConditionFileIsExecutable=/opt/kubernetes/server/bin/kubelet
    Description=Kubernetes Kubelet
    Documentation=https://github.com/GoogleCloudPlatform/kubernetes
    Wants=etcd2.service
    Wants=download-kubernetes.service
    [Service]
    ExecStart=/opt/kubernetes/server/bin/kubelet \
      --address=0.0.0.0 \
      --port=10250 \
      --api_servers=172.17.8.101:8080 \
      --hostname_override=$public_ipv4 \
      --host-network-sources=* \
      --logtostderr=true
    Restart=always
    RestartSec=10
    [Install]
    WantedBy=kubernetes-minion.target    

- name: proxy.service
  content: |
    [Unit]
    After=etcd2.service
    After=download-kubernetes.service
    ConditionFileIsExecutable=/opt/kubernetes/server/bin/kube-proxy
    Description=Kubernetes Proxy
    Documentation=https://github.com/GoogleCloudPlatform/kubernetes
    Wants=etcd2.service
    Wants=download-kubernetes.service
    [Service]
    ExecStart=/opt/kubernetes/server/bin/kube-proxy \
      --master=http://172.17.8.101:7080 \
      --logtostderr=true
    Restart=always
    RestartSec=10
    [Install]
    WantedBy=kubernetes-minion.target    

Note that there are some hardcoded IPs in the above units; we’ll get to those later.
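
Once a minion’s kubelet registers with the API server, you should be able to see it from the master. On this release nodes are still called “minions”, so the check looks something like this (later releases renamed this to kubectl get nodes):

/opt/bin/kubectl get minions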

Configuring Everything

We’re almost there! We now need to configure etcd, Docker and flannel. We can do this using extra sections in the cloud-config file. On the master:

etcd2:
  initial-cluster: "master=http://(IP of master):2380"
  advertise-client-urls: http://(IP of master):2379
  initial-advertise-peer-urls: http://(IP of master):2380
  listen-client-urls: http://0.0.0.0:2379,http://0.0.0.0:4001
  listen-peer-urls: http://$private_ipv4:2380,http://$private_ipv4:7001
flannel:
  interface: $private_ipv4

units:
- name: etcd2.service
  enable: true
  command: start
  content: |
    [Unit]
    Description=etcd2
    Conflicts=etcd.service
    [Service]
    User=etcd
    Environment=ETCD_DATA_DIR=/var/lib/etcd2
    Environment=ETCD_NAME=%H
    ExecStart=/usr/bin/etcd2
    Restart=always
    RestartSec=10s
    LimitNOFILE=40000    

- name: rpc-statd.service
  command: start

- name: flanneld.service
  command: start
  drop-ins:
  - name: 50-network-config.conf
    content: |
      [Service]
      ExecStartPre=/usr/bin/etcdctl set /coreos.com/network/config '{ "Network": "192.168.0.0/16" }'
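
After flanneld starts, you can confirm it picked up the network configuration and carved out a subnet for the host; roughly:

etcdctl get /coreos.com/network/config   # the overlay network set by the drop-in above
cat /run/flannel/subnet.env              # the subnet flannel allocated to this host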

…and on the minions replace etcd2 with:

etcd2:
  proxy: "on"
  listen-client-urls: "http://0.0.0.0:2379,http://0.0.0.0:4001"
  listen-peer-urls: http://$private_ipv4:2380,http://$private_ipv4:7001
  initial-cluster: "master=http://(IP of master):2380"
  initial-cluster-state: "existing"
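
With the proxy running, etcd commands issued on a minion are transparently forwarded to the master, which is easy to verify:

etcdctl set /proxy-test hello   # written through the local proxy
etcdctl get /proxy-test         # readable from any node, including the master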

Finally, add some systemd targets to ensure all units are started:

- name: kubernetes-master.target
  enable: true
  content: |
    [Unit]
    Description=Kubernetes Cluster Master
    Documentation=http://kubernetes.io/
    RefuseManualStart=no
    After=flanneld.service
    Requires=flanneld.service
    After=bootstrap-address-information.service
    ConditionHost=master
    Wants=apiserver.service
    Wants=scheduler.service
    Wants=controller-manager.service
    Requires=bootstrap-address-information.service
    [Install]
    WantedBy=multi-user.target    

- name: kubernetes-minion.target
  enable: true
  content: |
    [Unit]
    Description=Kubernetes Cluster Minion
    Documentation=http://kubernetes.io/
    RefuseManualStart=no
    ConditionHost=!master
    After=bootstrap-address-information.service
    After=flanneld.service
    Requires=flanneld.service
    Wants=proxy.service
    Wants=kubelet.service
    Requires=bootstrap-address-information.service
    [Install]
    WantedBy=multi-user.target    

That’s a lot of configuring, but we’re not done yet! As you may have noticed, there are still some hardcoded IP addresses in those unit files. We don’t want to have to input these manually every time, so we’re going to have to implement some kind of node discovery.

Enter confd

Seeing as every node already runs etcd, we can store node information in etcd itself and use it to configure Kubernetes. To generate the configuration, we’re going to use a tool called confd, which watches etcd and populates Go templates with its values in real time, then executes a custom command (in our case, regenerating the systemd unit file, then reloading it).
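
This is also where the bootstrap-address-information.service referenced by the units above comes in: it publishes each node’s address into etcd so confd has something to watch. The full unit is in the repository linked below, but at its core it is just an etcdctl write. A minimal sketch (the /bootstrap/node-01 key is a placeholder; each node writes its own key, and the address comes from CoreOS’s /etc/environment):

# publish this node's private address under the key confd watches
source /etc/environment
etcdctl set /bootstrap/node-01 "$COREOS_PRIVATE_IPV4"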

confd is configured by dropping TOML files into /etc/confd/conf.d. Here’s an example of how to drop in files using cloud-config:

write_files:
- path: /etc/confd/conf.d/controller-manager.toml
  permissions: 0644
  owner: root
  content: |
    [template]
    src = "controller-manager.template"
    dest = "/etc/systemd/system/controller-manager.service"
    keys = [
      "/bootstrap/node-01",
      "/bootstrap/node-02",
      "/bootstrap/node-03"
    ]
    reload_cmd = "systemctl daemon-reload && systemctl restart controller-manager"    

This tells confd to watch those three keys, regenerate the controller-manager unit from the template whenever any of them change, then reload systemd and restart the service. Here’s what /etc/confd/templates/controller-manager.template looks like:

[Unit]
After=etcd2.service
After=download-kubernetes.service
After=apiserver.service
ConditionFileIsExecutable=/opt/kubernetes/server/bin/kube-controller-manager
Description=Kubernetes Controller Manager
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
Wants=apiserver.service
Wants=etcd2.service
Wants=download-kubernetes.service
ConditionHost=master
Requires=bootstrap-address-information.service

[Service]
ExecStartPre=/bin/bash -x -c 'result=`wget --retry-connrefused \
  --tries=5 127.0.0.1:8080/healthz -O -` && test -n "$${result}" && test "$${result}" = ok'
ExecStart=/opt/kubernetes/server/bin/kube-controller-manager \
  --machines={{getv "/bootstrap/node-01"}},{{getv "/bootstrap/node-02"}},{{getv "/bootstrap/node-03"}} \
  --cloud_provider=vagrant \
  --master=127.0.0.1:8080 \
  --logtostderr=true
Restart=always
RestartSec=10

[Install]
WantedBy=kubernetes-master.target

Note the {{ }} sections, which are replaced by values extracted from etcd. We’ll configure all the other units in a similar way.
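
confd itself also needs to be running on every node that uses these templates (the repository below handles this with another unit). In essence it just polls the local etcd endpoint and re-renders any templates whose keys have changed; invoked by hand it would look something like this (flag names are from confd 0.x, so check confd -h for your version):

confd -interval 10 -backend etcd -node http://127.0.0.1:4001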

Even if you’ve followed everything up to this point, I don’t expect you to have pieced together a working cloud-config file, so I’ve added a couple of files to this GitHub repository you can play with. The only thing you will need to change is the discovery mechanism for the etcd2 cluster. I bootstrap it using DNS, but if you would rather use the etcd discovery service you can!
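
If you’d rather use the hosted discovery service, generate a token up front and reference it from the etcd2 section of each cloud-config (a discovery: entry replaces the initial-cluster settings shown earlier):

curl -w '\n' 'https://discovery.etcd.io/new?size=1'   # size=1: only the master is a full etcd member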

Deploying on DigitalOcean

Finally, you can deploy your cloud-configs onto DigitalOcean. The mundane way is to manually spin up a master and three minions by pasting the contents into the user data section on the “create droplet” page. However, that’s boring and slow, so we’re going to use Terraform, by HashiCorp, to do a one-command deploy.

Generate a DigitalOcean personal access token on the API page, then run the following commands to deploy! (If your ssh-keygen defaults to SHA256 fingerprints, add -E md5 so the fingerprint matches the MD5 format DigitalOcean expects.)

export SSH_FINGERPRINT=$(ssh-keygen -lf ~/.ssh/id_rsa.pub | awk '{print $2}')
export DO_PAT="your key here"
export PUB_KEY="/home/username/.ssh/id_rsa.pub"
export PVT_KEY="/home/username/.ssh/id_rsa"

terraform apply \
  -var "do_token=$DO_PAT" \
  -var "pub_key=$PUB_KEY" \
  -var "pvt_key=$PVT_KEY" \
  -var "ssh_fingerprint=$SSH_FINGERPRINT"

Final Thoughts

Deploying Kubernetes yourself, whether on bare metal or on a provider like DigitalOcean, is a bit of an exercise, but knowing how to do it means you’re no longer limited to GCE. If you try this and have any issues, feel free to leave a comment on this post!