Unleash the Data Beast: A Fun Guide to Building a Robust Galera Cluster with LXC/LXD Containers
Author: Nino Stephen (@stephenachayan)
The Challenge
- Set up a MariaDB/Galera cluster
- Set up HAProxy for load balancing
Quick Introduction
For a site with relatively low traffic, a database hosted on the same server as the site files is usually sufficient. But as traffic grows, so do the problems of handling the load. And what happens if the database gets corrupted? This is where database replication comes into play.
With database replication, we keep several copies of the database and update them in real time, so that if the database on one server gets corrupted we still have a copy on the other servers. This can be accomplished easily with Galera, a synchronous multi-master clustering technology that ships with MariaDB and is actively maintained alongside it.
Everything is good so far. But assuming the site/application connects only to the main database server and not to the replicas, what happens if the main database server itself goes down? This is where the load balancer comes in. We will be using HAProxy (High Availability Proxy) to set up the load balancer. Compared to Nginx, HAProxy comes with more functionality geared towards high availability.
Setting up the Cluster
For this challenge, we will be setting up three containers to host the MariaDB servers and one for the load balancer. The three database containers are named galera1, galera2 and galera3 respectively, and the load balancer container is named haproxy.
The first thing to do is launch an Ubuntu container. We will build a base image on this container which can be used later for spinning up the other containers. This might not seem like much in our current scenario, as we only have two more containers to spin up for the database cluster, but for larger numbers the whole situation changes. It's always better to construct and publish a container that serves as a base image for the others.
$ lxc launch ubuntu galera1 -c security.privileged=true
Creating galera1
Starting galera1
$ lxc list
+---------+---------+-----------------------+-----------------------------------------------+-----------+-----------+----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS | LOCATION |
+---------+---------+-----------------------+-----------------------------------------------+-----------+-----------+----------+
| galera1 | RUNNING | 10.156.178.190 (eth0) | fd42:93f2:521c:69a4:216:3eff:fe97:16c1 (eth0) | CONTAINER | 0 | cicada |
+---------+---------+-----------------------+-----------------------------------------------+-----------+-----------+----------+
As you may have noticed, I've included a security flag for the containers. This is because of certain permission issues that will arise within the container while spinning up the cluster. After the container is launched, get a shell on the container and install MariaDB on it.
To do so, first, add the keys for the repo so that the packages downloaded from the external mirror can be verified. Once that's done, add the mirror link to the repo list and update the apt cache using apt update.
$ lxc exec galera1 -- bash
root@galera1:~# cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04.4 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.4 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
root@galera1:~# apt update
root@galera1:~# sudo apt-key adv --recv-keys --keyserver hkp://keyserver.ubuntu.com:80 0xF1656F24C74CD1D8
Executing: /tmp/apt-key-gpghome.QEnLsU7KeJ/gpg.1.sh --recv-keys --keyserver hkp://keyserver.ubuntu.com:80 0xF1656F24C74CD1D8
gpg: key F1656F24C74CD1D8: 7 signatures not checked due to missing keys
gpg: key F1656F24C74CD1D8: public key "MariaDB Signing Key <signing-key@mariadb.org>" imported
gpg: Total number processed: 1
gpg: imported: 1
root@galera1:~# echo 'deb [arch=amd64] http://nyc2.mirrors.digitalocean.com/mariadb/repo/10.4/ubuntu bionic main' >> /etc/apt/sources.list
root@galera1:~# sudo apt update
Hit:1 http://archive.ubuntu.com/ubuntu bionic InRelease
Hit:2 http://security.ubuntu.com/ubuntu bionic-security InRelease
Get:3 http://nyc2.mirrors.digitalocean.com/mariadb/repo/10.4/ubuntu bionic InRelease [6265 B]
Hit:4 http://archive.ubuntu.com/ubuntu bionic-updates InRelease
Get:5 http://nyc2.mirrors.digitalocean.com/mariadb/repo/10.4/ubuntu bionic/main amd64 Packages [16.2 kB]
Fetched 22.5 kB in 1s (18.3 kB/s)
Reading package lists... Done
Building dependency tree
Reading state information... Done
6 packages can be upgraded. Run 'apt list --upgradable' to see them.
root@galera1:~# apt install mariadb-server -y
That should be enough to install and start MariaDB. Check MariaDB status and make sure it is working.
root@galera1:~# mysql -V
mysql Ver 15.1 Distrib 10.1.44-MariaDB, for debian-linux-gnu (x86_64) using readline 5.2
root@galera1:~# service mysql status
● mariadb.service - MariaDB 10.4.13 database server
Loaded: loaded (/lib/systemd/system/mariadb.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/mariadb.service.d
└─migrated-from-my.cnf-settings.conf
Active: active (running) since Fri 2020-06-19 07:16:07 UTC; 1min 42s ago
Docs: man:mysqld(8)
https://mariadb.com/kb/en/library/systemd/
Main PID: 2081 (mysqld)
Status: "Taking your SQL requests now..."
Tasks: 32 (limit: 14199)
CGroup: /system.slice/mariadb.service
└─2081 /usr/sbin/mysqld
Jun 19 07:16:12 galera1 /etc/mysql/debian-start[2121]: Processing databases
Jun 19 07:16:12 galera1 /etc/mysql/debian-start[2121]: information_schema
Jun 19 07:16:12 galera1 /etc/mysql/debian-start[2121]: mysql
Jun 19 07:16:12 galera1 /etc/mysql/debian-start[2121]: performance_schema
Jun 19 07:16:12 galera1 /etc/mysql/debian-start[2121]: Phase 6/7: Checking and upgrading tables
Jun 19 07:16:12 galera1 /etc/mysql/debian-start[2121]: Processing databases
Jun 19 07:16:12 galera1 /etc/mysql/debian-start[2121]: information_schema
Jun 19 07:16:12 galera1 /etc/mysql/debian-start[2121]: performance_schema
Jun 19 07:16:12 galera1 /etc/mysql/debian-start[2121]: Phase 7/7: Running 'FLUSH PRIVILEGES'
Jun 19 07:16:12 galera1 /etc/mysql/debian-start[2121]: OK
Since these are (hypothetically) to be used in a production environment at some point, we should secure the installed MariaDB server using the mysql_secure_installation
command. Make sure a root password is set, and remove the anonymous login as well as the test database and its access. It is good practice to deny remote root login; if remote access is ever needed, we can create another user with root privileges, which we will do later in the setup. Make sure the privilege tables are reloaded so that the changes take effect.
root@galera1:~# mysql_secure_installation
NOTE: RUNNING ALL PARTS OF THIS SCRIPT IS RECOMMENDED FOR ALL MariaDB
SERVERS IN PRODUCTION USE! PLEASE READ EACH STEP CAREFULLY!
In order to log into MariaDB to secure it, we'll need the current
password for the root user. If you've just installed MariaDB, and
haven't set the root password yet, you should just press enter here.
Enter current password for root (enter for none):
OK, successfully used password, moving on...
Setting the root password or using the unix_socket ensures that nobody
can log into the MariaDB root user without the proper authorisation.
You already have your root account protected, so you can safely answer 'n'.
Switch to unix_socket authentication [Y/n] y
Enabled successfully!
Reloading privilege tables..
... Success!
You already have your root account protected, so you can safely answer 'n'.
Change the root password? [Y/n] y
New password:
Re-enter new password:
Password updated successfully!
Reloading privilege tables..
... Success!
By default, a MariaDB installation has an anonymous user, allowing anyone
to log into MariaDB without having to have a user account created for
them. This is intended only for testing, and to make the installation
go a bit smoother. You should remove them before moving into a
production environment.
Remove anonymous users? [Y/n] y
... Success!
Normally, root should only be allowed to connect from 'localhost'. This
ensures that someone cannot guess at the root password from the network.
Disallow root login remotely? [Y/n] y
... Success!
By default, MariaDB comes with a database named 'test' that anyone can
access. This is also intended only for testing, and should be removed
before moving into a production environment.
Remove test database and access to it? [Y/n] y
- Dropping test database...
... Success!
- Removing privileges on test database...
... Success!
Reloading the privilege tables will ensure that all changes made so far
will take effect immediately.
Reload privilege tables now? [Y/n] y
... Success!
Cleaning up...
All done! If you've completed all of the above steps, your MariaDB
installation should now be secure.
Thanks for using MariaDB!
With that, the base image can be generated from this container. To publish it, stop the container and then run the lxc publish
command against the galera1 container. In my case, I've given the published image the alias galera.
$ lxc stop galera1
$ lxc publish galera1 --alias galera
Instance published with fingerprint: a939e96df6ec81403a44576347e1b794a74874522c77e820c378c605ec70c8dc
We can now spin up replicas of the database server using the published image.
$ lxc launch galera galera2
Creating galera2
Starting galera2
$ lxc launch galera galera3
Creating galera3
Starting galera3
$ lxc start galera1
$ lxc list
+---------+---------+-----------------------+-----------------------------------------------+-----------+-----------+----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS | LOCATION |
+---------+---------+-----------------------+-----------------------------------------------+-----------+-----------+----------+
| galera1 | RUNNING | 10.156.178.190 (eth0) | fd42:93f2:521c:69a4:216:3eff:fe97:16c1 (eth0) | CONTAINER | 0 | cicada |
+---------+---------+-----------------------+-----------------------------------------------+-----------+-----------+----------+
| galera2 | RUNNING | 10.156.178.130 (eth0) | fd42:93f2:521c:69a4:216:3eff:fe0f:f150 (eth0) | CONTAINER | 0 | cicada |
+---------+---------+-----------------------+-----------------------------------------------+-----------+-----------+----------+
| galera3 | RUNNING | 10.156.178.25 (eth0) | fd42:93f2:521c:69a4:216:3eff:feb2:3bfb (eth0) | CONTAINER | 0 | cicada |
+---------+---------+-----------------------+-----------------------------------------------+-----------+-----------+----------+
The base setup is now complete, so we can proceed with setting up the cluster. MariaDB/MySQL reads extra configuration from separate files with the extension .cnf.
Create a common configuration file named galera.cnf on the first container and then pull it to the host machine. The contents of the file are shown below.
$ lxc exec galera1 -- touch /etc/mysql/conf.d/galera.cnf
$ lxc file pull galera1/etc/mysql/conf.d/galera.cnf .
$ cat galera.cnf
[mysqld]
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
bind-address=0.0.0.0
# Galera Provider Configuration
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so
# Galera Cluster Configuration
wsrep_cluster_name="test_cluster"
wsrep_cluster_address="gcomm://First_Node_IP,Second_Node_IP,Third_Node_IP"
# Galera Synchronization Configuration
wsrep_sst_method=rsync
# Galera Node Configuration
wsrep_node_address="This_Node_IP"
wsrep_node_name="This_Node_Name"
The first section modifies or re-asserts MariaDB/MySQL settings that will allow the cluster to function correctly. For example, Galera won’t work with MyISAM or similar non-transactional storage engines, and MySQL must not be bound to the IP address for localhost. You can learn about the settings in more detail on the Galera Cluster system configuration page.
The “Galera Provider Configuration” section configures the MariaDB components that provide a WriteSet replication API. This means Galera in our case, since Galera is a wsrep (WriteSet Replication) provider. Here you specify the general parameters to configure the initial replication environment. This doesn’t require any customization and is usually the same for every Galera cluster setup.
The “Galera Cluster Configuration” section defines the cluster, identifying the cluster members by IP address or resolvable domain name and creating a name for the cluster to ensure that members join the correct group. You can change the wsrep_cluster_name to something more meaningful than test_cluster or leave it as-is, but you must update wsrep_cluster_address with the private IP addresses of your three servers.
The “Galera Synchronization Configuration” section defines how the cluster will communicate and synchronize data between members. This is used only for the state transfer that happens when a node comes online. For our initial setup we are using rsync, because it’s commonly available and does what we need for now.
The “Galera Node Configuration” section clarifies the IP address and the name of the current server. This is helpful when trying to diagnose problems in logs and for referencing each server in multiple ways. The wsrep_node_address must match the address of the machine you’re on, but you can choose any name you want to help you identify the node in log files. For our setup, these values will be as follows.
wsrep_cluster_address="gcomm://10.156.178.190,10.156.178.130,10.156.178.25"
wsrep_node_address="10.156.178.XX"
wsrep_node_name="galeraX"
Make the necessary per-server edits and push the file to each container.
$ lxc file push galera.cnf galera1/etc/mysql/conf.d/
$ lxc file push galera.cnf galera2/etc/mysql/conf.d/
$ lxc file push galera.cnf galera3/etc/mysql/conf.d/
$ lxc file edit galera2/etc/mysql/conf.d/galera.cnf
$ lxc file edit galera3/etc/mysql/conf.d/galera.cnf
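For a larger cluster, editing each copy by hand would get tedious. Here is a minimal sketch of how the substitution and push could be scripted instead, assuming the container names and IP addresses from the lxc list output above:
#!/usr/bin/env bash
# Sketch: fill in the per-node placeholders in galera.cnf and push the result
# to each container. Names and IPs are the ones shown by `lxc list` above.
set -euo pipefail
nodes=(galera1 galera2 galera3)
ips=(10.156.178.190 10.156.178.130 10.156.178.25)
cluster="gcomm://10.156.178.190,10.156.178.130,10.156.178.25"
for i in "${!nodes[@]}"; do
  # Substitute the cluster address and this node's IP/name in the template.
  sed -e "s|gcomm://First_Node_IP,Second_Node_IP,Third_Node_IP|${cluster}|" \
      -e "s|This_Node_IP|${ips[$i]}|" \
      -e "s|This_Node_Name|${nodes[$i]}|" \
      galera.cnf > "galera.cnf.${nodes[$i]}"
  lxc file push "galera.cnf.${nodes[$i]}" "${nodes[$i]}/etc/mysql/conf.d/galera.cnf"
done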
The crucial part starts here. Before the cluster can be brought up, MySQL must first be stopped on all servers. This is very important; follow every step exactly as described below.
$ lxc exec galera1 -- bash
root@galera1:~# systemctl stop
Too few arguments.
root@galera1:~# systemctl stop mysql
root@galera1:~# systemctl status mysql
● mariadb.service - MariaDB 10.4.13 database server
Loaded: loaded (/lib/systemd/system/mariadb.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/mariadb.service.d
└─migrated-from-my.cnf-settings.conf
Active: inactive (dead) since Fri 2020-06-19 07:35:42 UTC; 5s ago
Docs: man:mysqld(8)
https://mariadb.com/kb/en/library/systemd/
Process: 275 ExecStartPost=/etc/mysql/debian-start (code=exited, status=0/SUCCESS)
Process: 273 ExecStartPost=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
Process: 234 ExecStart=/usr/sbin/mysqld $MYSQLD_OPTS $_WSREP_NEW_CLUSTER $_WSREP_START_POSITION (code=exited, status=0/SUCCESS)
Process: 94 ExecStartPre=/bin/sh -c [ ! -e /usr/bin/galera_recovery ] && VAR= || VAR=`cd /usr/bin/..; /usr/bin/galera_recovery`; [ $? -eq 0 ] && systemc
Process: 92 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
Process: 89 ExecStartPre=/usr/bin/install -m 755 -o mysql -g root -d /var/run/mysqld (code=exited, status=0/SUCCESS)
Main PID: 234 (code=exited, status=0/SUCCESS)
Status: "MariaDB server is down"
Jun 19 07:35:40 galera1 mysqld[234]: 2020-06-19 7:35:40 0 [Note] /usr/sbin/mysqld (initiated by: unknown): Normal shutdown
Jun 19 07:35:40 galera1 mysqld[234]: 2020-06-19 7:35:40 0 [Note] Event Scheduler: Purging the queue. 0 events
Jun 19 07:35:40 galera1 mysqld[234]: 2020-06-19 7:35:40 0 [Note] InnoDB: FTS optimize thread exiting.
Jun 19 07:35:40 galera1 mysqld[234]: 2020-06-19 7:35:40 0 [Note] InnoDB: Starting shutdown...
Jun 19 07:35:40 galera1 mysqld[234]: 2020-06-19 7:35:40 0 [Note] InnoDB: Dumping buffer pool(s) to /var/lib/mysql/ib_buffer_pool
Jun 19 07:35:40 galera1 mysqld[234]: 2020-06-19 7:35:40 0 [Note] InnoDB: Buffer pool(s) dump completed at 200619 7:35:40
Jun 19 07:35:42 galera1 mysqld[234]: 2020-06-19 7:35:42 0 [Note] InnoDB: Shutdown completed; log sequence number 60990; transaction id 22
Jun 19 07:35:42 galera1 mysqld[234]: 2020-06-19 7:35:42 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
Jun 19 07:35:42 galera1 mysqld[234]: 2020-06-19 7:35:42 0 [Note] /usr/sbin/mysqld: Shutdown complete
Jun 19 07:35:42 galera1 systemd[1]: Stopped MariaDB 10.4.13 database server.
$ lxc exec galera2 -- systemctl stop mysql
$ lxc exec galera3 -- systemctl stop mysql
Since these servers will run only as a database cluster, all other ports should be closed and only the ones required by the cluster should be opened. This can be done easily using ufw. Uncomplicated Firewall (ufw)
is a program for managing a Netfilter firewall that is designed to be easy to use: it provides a command-line interface with a small number of simple commands and uses iptables under the hood. Install and enable ufw on all containers.
root@galera1:~# sudo apt install ufw
root@galera1:~# ufw status
Status: inactive
We can see that ufw is currently inactive. Enable it and add rules opening the ports needed by Galera and SSH.
Galera can make use of four ports:
- 3306 For MySQL client connections and State Snapshot Transfers that use the mysqldump method.
- 4567 For Galera Cluster replication traffic. Multicast replication uses both UDP transport and TCP on this port.
- 4568 For Incremental State Transfer.
- 4444 For all other State Snapshot Transfer.
root@galera1:~# ufw enable
root@galera1:~# ufw allow 22,3306,4567,4568,4444/tcp
Rule added
Rule added (v6)
root@galera1:~# ufw allow 4567/udp
Rule added
Rule added (v6)
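The same firewall rules are needed on galera2 and galera3 as well. Rather than repeating the commands in each container, a short loop from the host does the job; a sketch, assuming the container names used above:
# Sketch: apply the Galera/SSH firewall rules on every node from the host.
for node in galera1 galera2 galera3; do
  lxc exec "$node" -- bash -c "
    apt install -y ufw &&
    ufw --force enable &&
    ufw allow 22,3306,4567,4568,4444/tcp &&
    ufw allow 4567/udp"
done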
The cluster is now configured and we can start the nodes one by one. The first node must be brought up with the special Galera script (galera_new_cluster);
otherwise, the other nodes will not know which cluster to connect to and will fail to start. Once the galera_new_cluster
command has run successfully, we can check the number of nodes, that is, the size of the cluster.
root@galera1:~# galera_new_cluster
root@galera1:~# mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_size'"
Enter password:
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 1 |
+--------------------+-------+
In our case, the cluster now reports a size of 1. This indicates that the first node started successfully and the other nodes can now be launched to connect to it. We can then restart the service on the other nodes.
$ lxc exec galera2 -- systemctl start mysql
$ lxc exec galera3 -- bash
root@galera3:~# systemctl start mysql
root@galera3:~# systemctl status mysql
● mariadb.service - MariaDB 10.4.13 database server
Loaded: loaded (/lib/systemd/system/mariadb.service; enabled; vendor preset:
Drop-In: /etc/systemd/system/mariadb.service.d
└─migrated-from-my.cnf-settings.conf
Active: active (running) since Fri 2020-06-19 08:16:47 UTC; 2s ago
Docs: man:mysqld(8)
https://mariadb.com/kb/en/library/systemd/
Process: 711 ExecStartPost=/etc/mysql/debian-start (code=exited, status=0/SUCC
Process: 709 ExecStartPost=/bin/sh -c systemctl unset-environment _WSREP_START
Process: 340 ExecStartPre=/bin/sh -c [ ! -e /usr/bin/galera_recovery ] && VAR=
Process: 338 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_
Process: 337 ExecStartPre=/usr/bin/install -m 755 -o mysql -g root -d /var/run
Main PID: 552 (mysqld)
Status: "Taking your SQL requests now..."
Tasks: 37 (limit: 14199)
CGroup: /system.slice/mariadb.service
└─552 /usr/sbin/mysqld --wsrep_start_position=00000000-0000-0000-0000
Jun 19 08:16:47 galera3 mysqld[552]: 2020-06-19 8:16:47 0 [Note] WSREP: Shiftin
Jun 19 08:16:47 galera3 mysqld[552]: 2020-06-19 8:16:47 2 [Note] WSREP: Server
Jun 19 08:16:47 galera3 mysqld[552]: 2020-06-19 8:16:47 2 [Note] WSREP: Server
Jun 19 08:16:47 galera3 mysqld[552]: 2020-06-19 8:16:47 2 [Note] WSREP: Synchro
Jun 19 08:16:47 galera3 mysqld[552]: 2020-06-19 8:16:47 2 [Note] WSREP: wsrep_n
root@galera3:~# mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_size'"
Enter password:
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 3 |
+--------------------+-------+
All three nodes are now up and connected. To check whether they are in sync, create a test database and add some content to it from one container. If replication is working, the database and its contents will automatically appear on the other nodes as well.
$ lxc exec galera1 -- bash
root@galera1:~# mysql -u root -p -e 'CREATE DATABASE playground;
> CREATE TABLE playground.equipment ( id INT NOT NULL AUTO_INCREMENT, type VARCHAR(50), quant INT, color VARCHAR(25), PRIMARY KEY(id));
> INSERT INTO playground.equipment (type, quant, color) VALUES ("slide", 2, "blue");'
Enter password:
root@galera1:~# mysql -u root -p -e 'SELECT * FROM playground.equipment;'
Enter password:
+----+-------+-------+-------+
| id | type | quant | color |
+----+-------+-------+-------+
| 3 | slide | 2 | blue |
+----+-------+-------+-------+
root@galera1:~# exit
exit
$ lxc exec galera2 -- bash
root@galera2:~# mysql -u root -p -e 'SELECT * FROM playground.equipment;'
Enter password:
+----+-------+-------+-------+
| id | type | quant | color |
+----+-------+-------+-------+
| 3 | slide | 2 | blue |
+----+-------+-------+-------+
root@galera2:~# exit
exit
$ lxc exec galera3 -- bash
root@galera3:~# mysql -u root -p -e 'SELECT * FROM playground.equipment;'
Enter password:
+----+-------+-------+-------+
| id | type | quant | color |
+----+-------+-------+-------+
| 3 | slide | 2 | blue |
+----+-------+-------+-------+
root@galera3:~#
From the above snippet, we can see that the setup is working as expected and the cluster is now up and running.
Setting up Load Balancer using HAProxy
This would be enough if the application lived on the same server. But since we intend to use this as a highly available DB cluster, the settings on every database server need to be changed so that connections are routed only through the haproxy container and all other connections are dropped. For the haproxy container to be able to talk to the Galera cluster, a separate user with root privileges needs to be created. We will also create another user just for checking node health. The user haproxy_root
will have root privileges on the cluster, and haproxy_check
will be used for health checks. In the snippet shown below, we create the user haproxy_root with all privileges on all databases and tables and set its password to secret.
Then the privilege table is reloaded using the FLUSH PRIVILEGES
command. In the second part, I've commented out the bind-address directive in the MySQL config file so that remote connections to the server are possible. For this to take effect, the MySQL service is reloaded.
$ lxc exec galera1 -- bash
root@galera1:~# mysql -u root -p
Enter password:
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 41
Server version: 10.4.13-MariaDB-1:10.4.13+maria~bionic-log mariadb.org binary distribution
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]> GRANT ALL PRIVILEGES ON *.* TO 'haproxy_root'@'%' IDENTIFIED BY 'secret' WITH GRANT OPTION;
MariaDB [(none)]> FLUSH PRIVILEGES;
MariaDB [(none)]> exit
Bye
root@galera1:~# vi /etc/mysql/my.cnf
...
# Instead of skip-networking the default is now to listen only on
# localhost which is more compatible and is not less secure.
#bind-address = 127.0.0.1
...
root@galera1:~# systemctl reload mysql
We do not need to do this on the other nodes: user and privilege changes are replicated across the cluster automatically, and bind-address is already set to 0.0.0.0 in the galera.cnf pushed to every node. Now create the MySQL user haproxy_check,
which will be used to check server health. Again, this only needs to be done on one server, since the others will sync automatically (it is a database cluster, after all). Note that we are not assigning a password to the haproxy_check user: the load balancer needs to connect to each node to check its health, and setting a password would hinder that ability.
MariaDB [mysql]> create user 'haproxy_check'@'10.156.178.56';
Query OK, 0 rows affected (0.034 sec)
MariaDB [mysql]> FLUSH PRIVILEGES;
Query OK, 0 rows affected (0.027 sec)
MariaDB [mysql]> select User from user where Host='10.156.178.56';
+---------------+
| User |
+---------------+
| haproxy_check |
+---------------+
1 row in set (0.001 sec)
MariaDB [(none)]> GRANT ALL PRIVILEGES ON *.* TO 'haproxy_root'@'10.156.178.56' IDENTIFIED BY 'secret' WITH GRANT OPTION;
Query OK, 0 rows affected (0.036 sec)
We can now spin up another Ubuntu container and set up HAProxy on it. To check whether the service was installed properly, simply run the command service haproxy;
if it is, we will get its usage message, as shown at the end of the snippet.
$ lxc launch ubuntu haproxy
Creating haproxy
Starting haproxy
$ lxc exec haproxy -- bash
root@haproxy:~# apt update
root@haproxy:~# apt install haproxy -y
root@haproxy:~# haproxy -v
HA-Proxy version 1.8.8-1ubuntu0.10 2020/04/03
Copyright 2000-2018 Willy Tarreau <willy@haproxy.org>
root@haproxy:~# service haproxy
Usage: /etc/init.d/haproxy {start|stop|reload|restart|status}
We must be able to connect to the database nodes directly from the haproxy container. To verify this, install mariadb-client
in the haproxy container and try connecting to one of the nodes. In our case, the connection succeeds.
root@haproxy:~# apt install mariadb-client -y
root@haproxy:~# mysql -h 10.156.178.190 -u haproxy_root -p -e "show databases;"
Enter password:
+--------------------+
| Database |
+--------------------+
| information_schema |
| mysql |
| performance_schema |
| playground |
+--------------------+
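To be sure every node, and not just the first, is reachable from the load balancer, the same check can be run against each node in turn. A quick sketch, reusing the haproxy_root credentials and node IPs from above:
# Run inside the haproxy container: query each Galera node in turn
# (prompts for the haproxy_root password on each iteration).
for ip in 10.156.178.190 10.156.178.130 10.156.178.25; do
  echo "== $ip =="
  mysql -h "$ip" -u haproxy_root -p -e "SHOW STATUS LIKE 'wsrep_cluster_size';"
done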
We can now set up the configuration file that makes HAProxy work as a load balancer. The defaults file is /etc/default/haproxy.
In that file, make sure the CONFIG line points to the configuration file we intend to use for the load balancer (uncomment it if necessary). It's good practice to make a copy of the original configuration file and work on the copy, so that we can always revert to the original in case some configuration causes issues with haproxy.
root@haproxy:~# vi /etc/default/haproxy
# Defaults file for HAProxy
#
# This is sourced by both, the initscript and the systemd unit file, so do not
# treat it as a shell script fragment.
# Change the config file location if needed
CONFIG="/etc/haproxy/haproxy.cfg"
# Add extra flags here, see haproxy(1) for a few options
#EXTRAOPTS="-de -m 16"
root@haproxy:~# cp /etc/haproxy/haproxy.cfg{,.orginal}
root@haproxy:~# ls /etc/haproxy/
errors haproxy.cfg haproxy.cfg.orginal
In the load balancer config file (/etc/haproxy/haproxy.cfg), we describe how the load balancer should behave. The listen
keyword creates a proxy and gives it a name; in our case it is galera. The bind
parameter specifies the address and port on which it accepts connections; we use 0.0.0.0
to allow connections from anywhere, but it could instead be a private IP to restrict where connections may come from. balance
defines the policy HAProxy uses to choose which server receives each incoming connection. mode tcp
defines the type of connections it should route; Galera Cluster uses TCP connections. option tcpka
enables TCP keepalives to maintain the connections. option mysql-check user <username>
enables a database-level health check to determine whether a node is currently operational. Each server <server-name> <IP_address> check
line defines a node that HAProxy may route connections to.
The second part has a similar structure, this time for the stats page we can use to check the current state of each node in the cluster. stats enable turns on the stats view, stats uri /<path>
defines the URI used to access the stats page, and stats auth Username:Password
enables basic authentication for it.
root@haproxy:~# cat /etc/haproxy/haproxy.cfg
...
listen galera
bind 0.0.0.0:3306
mode tcp
option tcpka
option mysql-check user haproxy_check
balance roundrobin
server galera1 10.156.178.190:3306 check
server galera2 10.156.178.130:3306 check
server galera3 10.156.178.25:3306 check
listen stats
bind 0.0.0.0:8080
mode http
stats enable
stats uri /stats
stats realm Strictly\ Private
stats auth admin:admin
root@haproxy:~# haproxy -c -V -f /etc/haproxy/haproxy.cfg
Configuration file is valid
root@haproxy:~# service haproxy reload
root@haproxy:~# systemctl status haproxy
● haproxy.service - HAProxy Load Balancer
Loaded: loaded (/lib/systemd/system/haproxy.service; enabled; vendor preset: enabled)
Active: active (running) (Result: exit-code) since Fri 2020-06-19 15:09:03 UTC; 1h 53min ago
Docs: man:haproxy(1)
file:/usr/share/doc/haproxy/configuration.txt.gz
Process: 352 ExecReload=/bin/kill -USR2 $MAINPID (code=exited, status=0/SUCCESS)
Process: 351 ExecReload=/usr/sbin/haproxy -f $CONFIG -c -q $EXTRAOPTS (code=exited, status=0/SUCCESS)
Process: 181 ExecStartPre=/usr/sbin/haproxy -f $CONFIG -c -q $EXTRAOPTS (code=exited, status=0/SUCCESS)
Main PID: 182 (haproxy)
Tasks: 2 (limit: 14199)
CGroup: /system.slice/haproxy.service
├─182 /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -sf 277 -x /run/haproxy/admin.sock
└─354 /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -sf 277 -x /run/haproxy/admin.sock
haproxy -c -V -f /etc/haproxy/haproxy.cfg
can be used to verify that the configuration file is valid and does not contain any errors. Once the check passes, the haproxy service can be reloaded.
Testing
Testing the setup is rather straightforward. We should be able to query the database through haproxy even when one or more nodes are not working properly. As long as at least one node is still up, queries should keep returning results; that is our indicator of a properly working Galera cluster. The expected outcome of the demonstration should look similar to the following snippet.
$ lxc list
+---------+---------+-----------------------+-----------------------------------------------+-----------+-----------+----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS | LOCATION |
+---------+---------+-----------------------+-----------------------------------------------+-----------+-----------+----------+
| galera1 | RUNNING | 10.156.178.190 (eth0) | fd42:93f2:521c:69a4:216:3eff:fe97:16c1 (eth0) | CONTAINER | 0 | cicada |
+---------+---------+-----------------------+-----------------------------------------------+-----------+-----------+----------+
| galera2 | RUNNING | 10.156.178.130 (eth0) | fd42:93f2:521c:69a4:216:3eff:fe0f:f150 (eth0) | CONTAINER | 0 | cicada |
+---------+---------+-----------------------+-----------------------------------------------+-----------+-----------+----------+
| galera3 | RUNNING | 10.156.178.25 (eth0) | fd42:93f2:521c:69a4:216:3eff:feb2:3bfb (eth0) | CONTAINER | 0 | cicada |
+---------+---------+-----------------------+-----------------------------------------------+-----------+-----------+----------+
| haproxy | RUNNING | 10.156.178.56 (eth0) | fd42:93f2:521c:69a4:216:3eff:fea5:a203 (eth0) | CONTAINER | 0 | cicada |
+---------+---------+-----------------------+-----------------------------------------------+-----------+-----------+----------+
$ mysql -h 10.156.178.56 -u haproxy_root -p -e "SELECT * FROM playground.equipment;"
Enter password:
+----+-------+-------+-------+
| id | type | quant | color |
+----+-------+-------+-------+
| 3 | slide | 2 | blue |
+----+-------+-------+-------+
$ lxc stop galera1
$ mysql -h 10.156.178.56 -u haproxy_root -p -e "SELECT * FROM playground.equipment;"
Enter password:
+----+-------+-------+-------+
| id | type | quant | color |
+----+-------+-------+-------+
| 3 | slide | 2 | blue |
+----+-------+-------+-------+
$ lxc stop galera2
$ mysql -h 10.156.178.56 -u haproxy_root -p -e "SELECT * FROM playground.equipment;"
Enter password:
+----+-------+-------+-------+
| id | type | quant | color |
+----+-------+-------+-------+
| 3 | slide | 2 | blue |
+----+-------+-------+-------+
The status of the cluster can be checked from the following link: http://10.156.178.56:8080/stats
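The same stats can also be pulled from the command line, for example as CSV (assuming the admin:admin credentials set in haproxy.cfg above):
# Fetch the HAProxy stats page in CSV form from the host.
$ curl -u admin:admin "http://10.156.178.56:8080/stats;csv"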
Troubleshooting
Stopping the containers like this leaves the MariaDB instances in a state where they refuse to start once the containers are brought back up, as can be seen in the snippet below. The correct way to test the load balancer would have been to stop the MySQL service rather than killing the entire container.
$ lxc exec galera1 -- systemctl reload mysql
Failed to reload mysql.service: Job type reload is not applicable for unit mariadb.service.
See system logs and 'systemctl status mysql.service' for details.
$ lxc exec galera1 -- journalctl -xe
--
-- Unit mariadb.service has begun starting up
....
Jun 19 17:36:37 galera1 mysqld[870]: 2020-06-19 17:36:37 0 [Note] WSREP: ####### Assign initial position for certification: dc4131b7-b202-11ea-a09b-276abc28c5
Jun 19 17:36:37 galera1 mysqld[870]: 2020-06-19 17:36:37 0 [Note] WSREP: Start replication
Jun 19 17:36:37 galera1 mysqld[870]: 2020-06-19 17:36:37 0 [Note] WSREP: Connecting with bootstrap option: 1
Jun 19 17:36:37 galera1 mysqld[870]: 2020-06-19 17:36:37 0 [Note] WSREP: Setting GCS initial position to dc4131b7-b202-11ea-a09b-276abc28c565:54
Jun 19 17:36:37 galera1 mysqld[870]: 2020-06-19 17:36:37 0 [ERROR] WSREP: It may not be safe to bootstrap the cluster from this node. It was not the last one
Jun 19 17:36:37 galera1 mysqld[870]: 2020-06-19 17:36:37 0 [ERROR] WSREP: wsrep::connect(gcomm://10.156.178.190,10.156.178.130,10.156.178.25) failed: 7
Jun 19 17:36:37 galera1 mysqld[870]: 2020-06-19 17:36:37 0 [ERROR] Aborting
Jun 19 17:36:37 galera1 systemd[1]: mariadb.service: Main process exited, code=exited, status=1/FAILURE
Jun 19 17:36:37 galera1 systemd[1]: mariadb.service: Failed with result 'exit-code'.
Jun 19 17:36:37 galera1 systemd[1]: Failed to start MariaDB 10.4.13 database server.
-- Subject: Unit mariadb.service has failed
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- Unit mariadb.service has failed.
--
The issue can be quickly identified from the following error log entry.
Jun 19 17:36:37 galera1 mysqld[870]: 2020-06-19 17:36:37 0 [ERROR] WSREP: It may not be safe to bootstrap the cluster from this node. It was not the last one
Bootstrapping from that server was marked as not safe. To resolve the issue, open the grastate.dat file and change safe_to_bootstrap to 1.
root@galera1:~# vi /var/lib/mysql/grastate.dat
# GALERA saved state
version: 2.1
uuid: dc4131b7-b202-11ea-a09b-276abc28c565
seqno: -1
safe_to_bootstrap: 1
Now recover the previous state manually and then restart the Galera cluster.
root@galera1:~# mysqld --wsrep-recover
2020-06-19 17:48:30 0 [Note] mysqld (mysqld 10.4.13-MariaDB-1:10.4.13+maria~bionic-log) starting as process 1612 ...
2020-06-19 17:48:30 0 [Note] InnoDB: Using Linux native AIO
2020-06-19 17:48:30 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2020-06-19 17:48:30 0 [Note] InnoDB: Uses event mutexes
2020-06-19 17:48:30 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2020-06-19 17:48:30 0 [Note] InnoDB: Number of pools: 1
2020-06-19 17:48:30 0 [Note] InnoDB: Using SSE2 crc32 instructions
2020-06-19 17:48:30 0 [Note] mysqld: O_TMPFILE is not supported on /tmp (disabling future attempts)
2020-06-19 17:48:30 0 [Note] InnoDB: Initializing buffer pool, total size = 256M, instances = 1, chunk size = 128M
2020-06-19 17:48:30 0 [Note] InnoDB: Completed initialization of buffer pool
2020-06-19 17:48:30 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2020-06-19 17:48:30 0 [Note] InnoDB: 128 out of 128 rollback segments are active.
2020-06-19 17:48:30 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2020-06-19 17:48:30 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2020-06-19 17:48:30 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2020-06-19 17:48:30 0 [Note] InnoDB: Waiting for purge to start
2020-06-19 17:48:30 0 [Note] InnoDB: 10.4.13 started; log sequence number 313959; transaction id 463
2020-06-19 17:48:30 0 [Warning] InnoDB: Skipping buffer pool dump/restore during wsrep recovery.
2020-06-19 17:48:30 0 [Note] Plugin 'FEEDBACK' is disabled.
2020-06-19 17:48:30 0 [Note] Server socket created on IP: '0.0.0.0'.
2020-06-19 17:48:30 0 [Note] WSREP: Recovered position: dc4131b7-b202-11ea-a09b-276abc28c565:54
root@galera1:~# galera_new_cluster
root@galera1:~# service mysql status
● mariadb.service - MariaDB 10.4.13 database server
Loaded: loaded (/lib/systemd/system/mariadb.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/mariadb.service.d
└─migrated-from-my.cnf-settings.conf
Active: active (running) since Fri 2020-06-19 17:48:39 UTC; 8min ago
Docs: man:mysqld(8)
https://mariadb.com/kb/en/library/systemd/
Process: 1893 ExecStartPost=/etc/mysql/debian-start (code=exited, status=0/SUCCESS)
Process: 1891 ExecStartPost=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
Process: 1648 ExecStartPre=/bin/sh -c [ ! -e /usr/bin/galera_recovery ] && VAR= || VAR=`cd /usr/bin/..; /usr/bin/galera_recovery`; [ $? -eq 0 ] && syste
Process: 1646 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
Process: 1645 ExecStartPre=/usr/bin/install -m 755 -o mysql -g root -d /var/run/mysqld (code=exited, status=0/SUCCESS)
Main PID: 1854 (mysqld)
Status: "Taking your SQL requests now..."
Tasks: 37 (limit: 14199)
CGroup: /system.slice/mariadb.service
└─1854 /usr/sbin/mysqld --wsrep-new-cluster --wsrep_start_position=dc4131b7-b202-11ea-a09b-276abc28c565:54
Start MariaDB on the other stopped containers as well.
NB: There is no need to repeat the recovery and bootstrap steps on the other servers; they will automatically sync with the first one.
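Once the remaining node is started and its MariaDB service is running, the cluster size should climb back to 3. A quick check, reusing commands from earlier in the setup (galera2 was the other container stopped during testing):
$ lxc start galera2
$ lxc exec galera2 -- systemctl start mysql
$ lxc exec galera1 -- mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_size'"
For future failover tests, stopping just the MariaDB service (lxc exec galeraX -- systemctl stop mysql) instead of the whole container avoids running into this unsafe-to-bootstrap situation in the first place.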
References
- https://www.digitalocean.com/community/tutorials/how-to-configure-a-galera-cluster-with-mariadb-on-ubuntu-18-04-servers
- https://www.digitalocean.com/community/tutorials/how-to-use-haproxy-to-set-up-mysql-load-balancing--3
- https://galeracluster.com/library/documentation/ha-proxy.html
- https://linuxize.com/post/how-to-change-mysql-user-password/
- https://stackoverflow.com/questions/57786879/haproxy-with-mysql-check-option-not-working
- https://www.haproxy.com/documentation/aloha/10-0/traffic-management/lb-layer7/health-checks
- https://www.percona.com/doc/percona-xtradb-cluster/LATEST/howtos/haproxy.html
- https://severalnines.com/blog/updated-how-bootstrap-mysql-or-mariadb-galera-cluster
- https://stackoverflow.com/questions/24127115/mariadb-galera-cluster-set-up-problems
- https://microdevsys.com/wp/error-wsrep-it-may-not-be-safe-to-bootstrap-the-cluster-from-this-node-it-was-not-the-last-oneave-the-cluster-and-may-not-contain-all-the-updates/
- https://galeracluster.com/2016/11/introducing-the-safe-to-bootstrap-feature-in-galera-cluster/
- https://www.digitalocean.com/community/tutorials/how-to-set-up-a-firewall-with-ufw-on-ubuntu-18-04