Simplify Distributed Storage with GlusterFS: A Quick Guide


Persistent Storage Requirement

A common issue with cloud applications is the need for persistent storage. If you're spinning up instances all across your datacentre, you have no idea which physical host will end up serving your applications, so the data has to live somewhere every host can reach. This points to the need for some sort of networked file system. If you've ever used a Linux-based network, you've probably heard of NFS. But a typical NFS export is served by a single server, which becomes both a bottleneck and a single point of failure. That makes NFS a poor choice for distributed, concurrent systems, and as that is what cloud computing is all about, we will need a better solution.

GlusterFS Overview

GlusterFS is an “open source, distributed file system designed for massive scale.” However, this is slightly misleading. Gluster isn't really a filesystem in its own right; rather, it aggregates several existing filesystems so that data is distributed across multiple hosts. The underlying filesystem is usually XFS (recommended by the GlusterFS developers), but ZFS or plain ext4 are also often used.

So, why use GlusterFS for your storage needs? I’ll outline a few of the features and you can decide for yourself.

GlusterFS is:

GlusterFS Guide

Note: before starting, ensure that every node's hostname resolves to that node's address from each of the hosts, and that the clocks on all hosts are synchronised using NTP.
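A minimal pre-flight sketch for the hostname part of that note, assuming `getent` (shipped with glibc on Ubuntu) is available on each host. `check_resolves` is a hypothetical helper name, not part of GlusterFS; clock synchronisation is left to NTP as described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if every hostname given resolves.
# getent consults /etc/hosts and DNS through NSS, as most tools do.
check_resolves() {
  local host addr status=0
  for host in "$@"; do
    if addr=$(getent hosts "$host"); then
      # Print the first address returned for this name.
      printf '%s -> %s\n' "$host" "${addr%% *}"
    else
      printf '%s does not resolve\n' "$host" >&2
      status=1
    fi
  done
  return "$status"
}
```

Run it on every host with the names of all nodes, e.g. `check_resolves [address.of.host1] [address.of.host2] || exit 1`, so a setup script stops before probing peers that cannot be found.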

For a quick replicated GlusterFS setup on two Ubuntu hosts, run the following commands on each host. They create a 15 GB file, format it as XFS and mount it as the brick:

sudo fallocate -l 15G /brick

sudo add-apt-repository ppa:semiosis/ubuntu-glusterfs-3.5
sudo apt-get update
sudo apt-get -y install glusterfs-server xfsprogs

sudo mkfs.xfs -i size=512 /brick
sudo mkdir -p /export/brick
sudo mount -o loop /brick /export/brick
sudo mkdir -p /export/brick/volume
sudo sh -c "echo \"/brick /export/brick xfs defaults,loop 0 0\" \
  >> /etc/fstab"
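Appending to `/etc/fstab` with `>>` adds a duplicate entry every time the setup is re-run. A minimal sketch of an idempotent alternative; `append_once` is a hypothetical helper, not part of GlusterFS or Ubuntu.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append LINE to FILE only if an identical line
# is not already there (-q quiet, -x whole line, -F fixed string).
# A missing FILE makes grep fail, so the line is appended and FILE created.
append_once() {
  local line=$1 file=$2
  if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
    printf '%s\n' "$line" >> "$file"
  fi
}
```

Run as root from a setup script, e.g. `append_once "[fstab line for the brick]" /etc/fstab`. Re-running the script then leaves a single entry for the brick.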

Then, on one of the hosts, run the following:

sudo gluster peer probe [address.of.host2]
sudo gluster volume create gv0 replica 2 \
  [address.of.host1]:/export/brick/volume \
  [address.of.host2]:/export/brick/volume
sudo gluster volume info
sudo gluster volume set gv0 auth.allow \
  [client1.address,client2.address,...,clientn.address]
sudo gluster volume start gv0
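The same `volume create` call extends to more hosts: with one brick per host and `replica` set to the number of hosts, every host holds a full copy of the data. A minimal dry-run sketch, assuming the brick path used above. `print_create_cmd` and `join_commas` are hypothetical helpers: the first only prints the command so it can be reviewed before running it, and the second builds the comma-separated list that `auth.allow` expects.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: print (not run) a gluster volume create
# command replicating VOLUME across every host given, one brick each.
print_create_cmd() {
  local volume=$1; shift
  local brick_path=/export/brick/volume   # matches the setup above
  local cmd="gluster volume create $volume replica $#"
  local host
  for host in "$@"; do
    cmd+=" $host:$brick_path"
  done
  printf '%s\n' "$cmd"
}

# Join all arguments with commas, e.g. client addresses for auth.allow.
join_commas() {
  local IFS=,
  printf '%s\n' "$*"
}
```

For example, `print_create_cmd gv0 host1 host2 host3` prints a three-way replica command, and `sudo gluster volume set gv0 auth.allow "$(join_commas [client1.address] [client2.address])"` restricts access to the listed clients.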

Then on a client:

sudo add-apt-repository ppa:semiosis/ubuntu-glusterfs-3.5
sudo apt-get update
sudo apt-get -y install glusterfs-client
sudo mkdir -p /mnt/volume
sudo mount -t glusterfs [address.of.host]:/gv0 /mnt/volume
sudo chmod a+w /mnt/volume
echo "Hello from $HOSTNAME" > /mnt/volume/test.txt
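The client mount above does not survive a reboot. A sketch of a matching `/etc/fstab` entry, assuming the volume name `gv0` and mount point used above. The `_netdev` option marks the mount as needing the network, so it is not attempted before networking is up. `gluster_fstab_line` is a hypothetical helper that only prints the line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an fstab line that mounts a GlusterFS
# volume at boot; _netdev defers the mount until the network is up.
gluster_fstab_line() {
  local server=$1 volume=$2 mountpoint=$3
  printf '%s:/%s %s glusterfs defaults,_netdev 0 0\n' \
    "$server" "$volume" "$mountpoint"
}
```

On a client, `gluster_fstab_line [address.of.host] gv0 /mnt/volume | sudo tee -a /etc/fstab` adds the entry.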

GlusterFS Architecture Overview

A quick overview of GlusterFS concepts:

[Figure: GlusterFS architecture]

GlusterFS Engineer Summary

GlusterFS takes your bricks and passes them through several translators before exposing them as volumes. In a multi-host setup, the cluster translators (such as distribute and replicate) are responsible for distributing and replicating data across the hosts.
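The translator stack for each volume is described in volfiles that glusterd generates, by default under `/var/lib/glusterd/vols/<volume>/`. The sketch below uses a simplified, hand-written volfile for a two-brick replicated volume; real generated files contain many more translators and options. The hypothetical `list_translators` helper prints each translator's name and type in file order, showing one `protocol/client` translator per brick feeding a `cluster/replicate` translator.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a volfile on stdin and print "name type"
# for each translator definition, in the order they appear.
list_translators() {
  awk '$1 == "volume" { name = $2 }
       $1 == "type"   { print name, $2 }'
}

# Simplified, illustrative volfile (not a verbatim generated file):
# two protocol/client translators, one per brick, are the subvolumes
# of the cluster/replicate translator that clients see as gv0.
sample_volfile() {
  cat <<'EOF'
volume gv0-client-0
    type protocol/client
    option remote-host host1
    option remote-subvolume /export/brick/volume
end-volume

volume gv0-client-1
    type protocol/client
    option remote-host host2
    option remote-subvolume /export/brick/volume
end-volume

volume gv0-replicate-0
    type cluster/replicate
    subvolumes gv0-client-0 gv0-client-1
end-volume
EOF
}
```

`sample_volfile | list_translators` lists the two client translators followed by the replicate translator. On a server, piping one of the generated `*.vol` files from that directory through `list_translators` gives a quick view of the real, longer stack.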