With HAST (Highly Available Storage), we can set up transparent storage replicated between two remote machines connected over a TCP/IP network. HAST can be thought of as a network RAID1 (mirror).
The article is composed of several sections:
Introduction:
To demonstrate how HAST works, we will use two FreeBSD 13.1 servers, each with its own IP address, plus a CARP VIP:
VIP: 192.168.69.40
PeanutBrain01: 192.168.69.41
PeanutBrain02: 192.168.69.42
When working with HAST, we must take several aspects into account. HAST provides synchronous block-level replication, making it transparent to file systems and applications. There is no difference between using HAST devices and raw disks, partitions, etc.; all of them are just regular GEOM providers. HAST works in primary-secondary mode, so only one of the nodes can be active at any given time. The primary node handles I/O requests, and the secondary node is automatically synchronized from the primary. Write/delete/flush operations are sent to the primary and then replicated to the secondary. Read operations are served from the primary unless there is an I/O error, in which case the read is sent to the secondary.
HAST implements several synchronization modes:
- memsync: This mode reports a write operation as completed when the primary has written the data to its local disk and the secondary has acknowledged the start of data reception; only the start of reception, the data has not necessarily been written to the secondary's disk yet. This mode reduces latency while still providing reasonable reliability and is the default.
- fullsync: This mode reports a write operation as completed when both nodes have written the data to disk. It is the safest mode but also the slowest.
- async: This mode reports a write operation as completed when the primary has written the data to disk. It is the fastest mode but also the most dangerous and should only be used when the latency to the secondary is too high to use either of the other two modes.
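If we want a mode other than the default, it can be set per resource in the HAST configuration through the replication keyword. A minimal sketch, assuming the syntax described in hast.conf(5):
resource MySQLData {
    replication fullsync
    on PeanutBrain01 {
        local /dev/ada1
        remote 192.168.69.42
    }
    on PeanutBrain02 {
        local /dev/ada1
        remote 192.168.69.41
    }
}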
If we are using a custom kernel, we will have to include the following option in its configuration:
options GEOM_GATE
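With the stock GENERIC kernel it should be enough to have the geom_gate(4) module available; it can be loaded at boot, for example:
sysrc -f /boot/loader.conf geom_gate_load=YES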
HAST Configuration:
On both servers we have two disks in addition to the system disk, so we can test HAST with both UFS and ZFS:
- UFS: /dev/ada1
- ZFS: /dev/ada2
root@PeanutBrain01:~ # camcontrol devlist
<VBOX HARDDISK 1.0> at scbus0 target 0 lun 0 (pass0,ada0)
<VBOX HARDDISK 1.0> at scbus0 target 1 lun 0 (pass1,ada1)
<VBOX HARDDISK 1.0> at scbus1 target 0 lun 0 (pass2,ada2)
root@PeanutBrain02:~ # camcontrol devlist
<VBOX HARDDISK 1.0> at scbus0 target 0 lun 0 (pass0,ada0)
<VBOX HARDDISK 1.0> at scbus0 target 1 lun 0 (pass1,ada1)
<VBOX HARDDISK 1.0> at scbus1 target 0 lun 0 (pass2,ada2)
The disks must be clean:
It is not possible to use GEOM providers that already contain a file system, or to convert existing storage into a HAST-managed pool.
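One quick way to wipe any old metadata from the beginning of the disks (a sketch; double-check the device names before running it, as this is destructive):
dd if=/dev/zero of=/dev/ada1 bs=1m count=1
dd if=/dev/zero of=/dev/ada2 bs=1m count=1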
The HAST configuration, /etc/hast.conf, would be the following on both nodes:
resource MySQLData {
    on PeanutBrain01 {
        local /dev/ada1
        remote 192.168.69.42
    }
    on PeanutBrain02 {
        local /dev/ada1
        remote 192.168.69.41
    }
}

resource FilesData {
    on PeanutBrain01 {
        local /dev/ada2
        remote 192.168.69.42
    }
    on PeanutBrain02 {
        local /dev/ada2
        remote 192.168.69.41
    }
}
We create the HAST pools on both nodes:
hastctl create MySQLData
hastctl create FilesData
We enable and start the service:
sysrc hastd_enable=YES
service hastd start
We set one of the nodes to primary (PeanutBrain01):
hastctl role primary MySQLData
hastctl role primary FilesData
The other node to secondary (PeanutBrain02):
hastctl role secondary MySQLData
hastctl role secondary FilesData
We check the status on both:
root@PeanutBrain01:~ # hastctl status MySQLData
Name Status Role Components
MySQLData complete primary /dev/ada1 192.168.69.42
root@PeanutBrain01:~ # hastctl status FilesData
Name Status Role Components
FilesData complete primary /dev/ada2 192.168.69.42
root@PeanutBrain02:~ # hastctl status MySQLData
Name Status Role Components
MySQLData complete secondary /dev/ada1 192.168.69.41
root@PeanutBrain02:~ # hastctl status FilesData
Name Status Role Components
FilesData complete secondary /dev/ada2 192.168.69.41
We can see how the HAST device has only been generated on the primary:
root@PeanutBrain01:~ # ls -la /dev/hast/
total 1
dr-xr-xr-x 2 root wheel 512 Oct 30 18:11 .
dr-xr-xr-x 10 root wheel 512 Oct 30 18:04 ..
crw-r----- 1 root operator 0x61 Oct 30 18:11 FilesData
crw-r----- 1 root operator 0x5f Oct 30 18:11 MySQLData
root@PeanutBrain02:~ # ls -la /dev/hast/
ls: /dev/hast/: No such file or directory
Now we need to decide which file system we want to use on the HAST device. In my case, I will use both UFS and ZFS. The following commands should only be executed on the primary:
newfs -U /dev/hast/MySQLData
mkdir /var/db/mysql
mount /dev/hast/MySQLData /var/db/mysql
zpool create FilesData /dev/hast/FilesData
Now we check that it is mounted in the primary node:
df -Th /var/db/mysql
Filesystem Type Size Used Avail Capacity Mounted on
/dev/hast/MySQLData ufs 15G 8.0K 14G 0% /var/db/mysql
zpool status FilesData
pool: FilesData
state: ONLINE
config:
NAME STATE READ WRITE CKSUM
FilesData ONLINE 0 0 0
hast/FilesData ONLINE 0 0 0
errors: No known data errors
df -Th /FilesData
Filesystem Type Size Used Avail Capacity Mounted on
FilesData zfs 15G 96K 15G 0% /FilesData
The idea is to offer two services, MySQL and NFS, both served by the primary node, which will be whichever node the VIP-CARP is configured on at any given time.
Install the mysql-server package on both nodes:
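For example, using the MySQL 8.0 package (the exact package name depends on the version we want to run):
pkg install -y mysql80-server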
Enable the service on both nodes:
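sysrc mysql_enable=YES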
Bind the service to all server IPs (for example in /usr/local/etc/mysql/my.cnf):
[mysqld]
bind-address = 0.0.0.0
Start the service on both nodes:
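service mysql-server start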
Now enable NFS for the /FilesData directory on the primary node:
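Since FilesData is a ZFS pool, the easiest way is the sharenfs property, which is also what the failover script will toggle later on:
zfs set sharenfs="maproot=root:wheel" FilesData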
Check if it is shared:
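zfs get sharenfs FilesData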
NAME PROPERTY VALUE SOURCE
FilesData sharenfs maproot=root:wheel local
Enable NFS services on both nodes so that everything is ready when the VIP-CARP migrates:
sysrc nfs_server_enable=YES
sysrc mountd_enable=YES
Start the NFS service on both nodes:
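service mountd start
service nfsd start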
Create a test file on the primary node:
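For example, this is the file we will later read from the client over NFS:
echo kr0m > /FilesData/AA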
From our PC, make sure we can access the content via NFS:
Garrus # ~> showmount -e 192.168.69.40
Exports list on 192.168.69.40:
/FilesData Everyone
Garrus # ~> mount 192.168.69.40:/FilesData /mnt/nfs/
Garrus # ~> df -Th /mnt/nfs/
Filesystem Type Size Used Avail Capacity Mounted on
192.168.69.40:/FilesData nfs 15G 100K 15G 0% /mnt/nfs
Garrus # ~> cat /mnt/nfs/AA
kr0m
The role change will be controlled through devd events. When the VIP-CARP migrates to a node, it will be configured as the primary, and when the VIP-CARP is lost, it will be configured as the secondary.
The necessary parameters for configuring devd rules are as follows:
System | Subsystem | Type | Description |
---|---|---|---|
CARP | | | Events related to the carp(4) protocol. |
CARP | vhid@inet | | The "subsystem" contains the actual CARP vhid and the name of the network interface on which the event took place. |
CARP | vhid@inet | MASTER | The node becomes the master for a virtual host. |
CARP | vhid@inet | BACKUP | The node becomes the backup for a virtual host. |
Add the following configuration to devd (for example in /etc/devd.conf) to execute an HAST reconfiguration script when a VIP-CARP migration is detected:
notify 30 {
    match "system" "CARP";
    match "subsystem" "1@em0";
    match "type" "MASTER";
    action "/usr/local/sbin/carp-hast-switch primary";
};

notify 30 {
    match "system" "CARP";
    match "subsystem" "1@em0";
    match "type" "BACKUP";
    action "/usr/local/sbin/carp-hast-switch secondary";
};
Restart devd:
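service devd restart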
Depending on the detected change, the failover script will act in a certain way:
- Primary: Waits for the HAST-secondary processes to die, changes the role, mounts/imports the file system, and starts the services.
- Secondary: Stops the services, unmounts/exports the file system, and changes the role.
All HAST resources are linked to a directory where they will be mounted, a file system type, and a service:
Resource | Directory | File System | Service |
---|---|---|---|
MySQLData | /var/db/mysql | UFS | mysql |
FilesData | /FilesData | ZFS | nfs |
The script in question would be as follows:
#!/bin/sh
# The names of the HAST resources, as listed in /etc/hast.conf
resources="MySQLData FilesData"
# Resource mountpoints
resource_mountpoints="/var/db/mysql /FilesData"
# Supported file system types: UFS, ZFS
resource_filesystems="UFS ZFS"
# Service types: mysql nfs
resource_services="mysql nfs"
# Delay in mounting HAST resource after becoming primary
delay=3
# logging
log="local0.debug"
name="carp-hast"
# end of user configurable stuff
case "$1" in
primary)
logger -p $log -t $name "Switching to primary provider for - ${resources} -."
sleep ${delay}
# -- SERVICE MANAGEMENT --
logger -p $log -t $name ">> Stopping services."
resource_counter=1
for resource in ${resources}; do
resource_service=`echo $resource_services | cut -d\ -f$resource_counter`
case "${resource_service}" in
mysql)
logger -p $log -t $name "Service MySQL detected for resource - ${resource} -."
logger -p $log -t $name "Stoping MySQL service for resource - ${resource} -."
service mysql-server stop
logger -p $log -t $name "Done for resource ${resource}"
;;
nfs)
logger -p $log -t $name "Service NFS detected for resource - ${resource} -."
logger -p $log -t $name "Disabling NFS-ZFS share for resource - ${resource} -."
zfs set sharenfs="off" FilesData
logger -p $log -t $name "Done for resource ${resource}"
logger -p $log -t $name "Stopping NFS service for resource - ${resource} -."
service nfsd stop
logger -p $log -t $name "Done for resource ${resource}"
;;
*)
logger -p local0.error -t $name "ERROR: Unknown service: ${resource_service}, exiting."
exit 1
;;
esac
resource_counter=$((resource_counter+1))
done
# -- HAST ROLE MANAGEMENT --
logger -p $log -t $name ">> Managing disks."
for resource in ${resources}; do
# When the primary HAST node is inaccessible, the secondary node stops its hastd secondary processes automatically
# Wait 30s for any "hastd secondary" processes to stop
num=0
logger -p $log -t $name "Waiting for secondary process of resource - ${resource} - to die."
while pgrep -lf "hastd: ${resource} \(secondary\)" > /dev/null 2>&1; do
num=$((num+1))
sleep 1
if [ $num -gt 29 ]; then
logger -p $log -t $name "ERROR: Secondary process for resource - ${resource} - is still running after 30 seconds, exiting."
exit 1
fi
done
logger -p $log -t $name "Secondary process for resource - ${resource} - died successfully."
# Switch role for resource
logger -p $log -t $name "Switching resource - ${resource} - to primary."
hastctl role primary ${resource}
if [ $? -ne 0 ]; then
logger -p $log -t $name "ERROR: Unable to change role to primary for resource - ${resource} -."
exit 1
fi
logger -p $log -t $name "Role for HAST resource - ${resource} - switched to primary."
done
# -- WAIT FOR HAST DEVICE CREATION --
logger -p $log -t $name ">> Waitting for hast devices."
for resource in ${resources}; do
num=0
logger -p $log -t $name "Waitting for hast device of resource - ${resource} -."
while [ ! -c "/dev/hast/${resource}" ]; do
let num=$num+1
sleep 1
if [ $num -gt 29 ]; then
logger -p $log -t $name "ERROR: GEOM provider /dev/hast/${resource} did not appear, exiting."
exit
fi
done
logger -p $log -t $name "Device /dev/hast/${resource} appeared for resource - ${resource} -."
done
# -- FILESYSTEM MANAGEMENT --
logger -p $log -t $name ">> Managing filesystems."
resource_counter=1
for resource in ${resources}; do
resource_mountpoint=`echo $resource_mountpoints | cut -d\ -f$resource_counter`
resource_filesystem=`echo $resource_filesystems | cut -d\ -f$resource_counter`
case "${resource_filesystem}" in
UFS)
logger -p $log -t $name "UFS filesystem detected in resource - ${resource} -."
mkdir -p ${resource_mountpoint} 2>/dev/null
logger -p $log -t $name "Checking /dev/hast/${resource} of resource - ${resource} -."
fsck -p -y -t ufs /dev/hast/${resource}
logger -p $log -t $name "Mounting /dev/hast/${resource} in ${resource_mountpoint}."
out=`mount /dev/hast/${resource} ${resource_mountpoint} 2>&1`
if [ $? -ne 0 ]; then
logger -p local0.error -t $name "ERROR: UFS mount - ${resource} - failed: ${out}."
exit 1
fi
logger -p local0.debug -t $name "UFS mount - ${resource} - mounted successfully."
;;
ZFS)
logger -p $log -t $name "ZFS filesystem detected in resource - ${resource} -."
logger -p $log -t $name "Importing ZFS pool of resource - ${resource} -."
out=`zpool import -f "${resource}" 2>&1`
if [ $? -ne 0 ]; then
logger -p local0.error -t $name "ERROR: ZFS pool import for resource - ${resource} - failed: ${out}."
exit 1
fi
logger -p local0.debug -t $name "ZFS pool for resource - ${resource} - imported successfully."
;;
*)
logger -p local0.error -t $name "ERROR: Unknown filesystem: ${resource_filesystem}, exiting."
exit 1
;;
esac
resource_counter=$((resource_counter+1))
done
# -- SERVICE MANAGEMENT --
logger -p $log -t $name ">> Starting services."
resource_counter=1
for resource in ${resources}; do
logger -p $log -t $name "Starting service for resource - ${resource} -."
resource_service=`echo $resource_services | cut -d\ -f$resource_counter`
case "${resource_service}" in
mysql)
logger -p $log -t $name "Service MySQL detected for resource - ${resource} -."
logger -p $log -t $name "Starting MySQL service for resource - ${resource} -."
service mysql-server start
logger -p $log -t $name "Done for resource . ${resource} -."
;;
nfs)
logger -p $log -t $name "Service NFS detected for resource - ${resource} -."
logger -p $log -t $name "Starting NFS service for resource - ${resource} -."
service nfsd start
logger -p $log -t $name "Done for resource - ${resource} -."
logger -p $log -t $name "Enabling NFS-ZFS share for resource - ${resource} -."
zfs set sharenfs="maproot=root:wheel" FilesData
logger -p $log -t $name "Done for resource - ${resource} -."
;;
*)
logger -p local0.error -t $name "ERROR: Unknown service: ${resource_service}, exiting."
exit 1
;;
esac
resource_counter=$((resource_counter+1))
done
;;
secondary)
logger -p $log -t $name "Switching to secondary provider for - ${resources} -."
# -- SERVICE MANAGEMENT --
logger -p $log -t $name ">> Stopping services."
resource_counter=1
for resource in ${resources}; do
resource_service=`echo $resource_services | cut -d\ -f$resource_counter`
logger -p $log -t $name "Stopping services for resource - ${resource} -."
case "${resource_service}" in
mysql)
logger -p $log -t $name "Service MySQL detected for resource - ${resource} -."
logger -p $log -t $name "Stopping MySQL service for resource - ${resource} -."
service mysql-server stop
logger -p $log -t $name "Done for resource - ${resource} -."
;;
nfs)
logger -p $log -t $name "Service NFS detected for resource - ${resource} -."
logger -p $log -t $name "Disabling NFS-ZFS share for resource - ${resource} -."
zfs set sharenfs="off" FilesData
logger -p $log -t $name "Done for resource - ${resource} -."
logger -p $log -t $name "Restarting NFS service for resource - ${resource} -."
service nfsd restart
logger -p $log -t $name "Done for resource - ${resource} -."
;;
*)
logger -p local0.error -t $name "ERROR: Unknown service: ${resource_service}, exiting."
exit 1
;;
esac
resource_counter=$((resource_counter+1))
done
# -- FILESYSTEM MANAGEMENT --
logger -p $log -t $name ">> Managing filesystems."
resource_counter=1
for resource in ${resources}; do
resource_mountpoint=`echo $resource_mountpoints | cut -d\ -f$resource_counter`
resource_filesystem=`echo $resource_filesystems | cut -d\ -f$resource_counter`
case "${resource_filesystem}" in
UFS)
logger -p $log -t $name "UFS filesystem detected in resource - ${resource} -."
if mount | grep -q "^/dev/hast/${resource} on "
then
logger -p $log -t $name "Unmounting - ${resource} -."
umount -f ${resource_mountpoint}
logger -p $log -t $name "Done."
fi
sleep $delay
;;
ZFS)
logger -p $log -t $name "ZFS filesystem detected in resources - ${resource} -."
if ! mount | grep -q "^${resource} on ${resource_mountpoint}"
then
else
logger -p $log -t $name "Umounting - ${resource} -."
zfs umount ${resource}
logger -p $log -t $name "Done."
logger -p $log -t $name "Exporting ZFS pool of resources - ${resource} -."
out=`zpool export -f "${resource}" 2>&1`
if [ $? -ne 0 ]; then
logger -p local0.error -t $name "ERROR: ZFS pool export for resource - ${resource} - failed: ${out}."
exit 1
fi
logger -p local0.error -t $name "ZFS pool for resource - ${resource} - exported successfully."
fi
sleep $delay
;;
*)
logger -p local0.error -t $name "ERROR: Unknown filesystem: ${resource_filesystem}, exiting."
exit 1
;;
esac
resource_counter=$((resource_counter+1))
done
# -- HAST ROLE MANAGEMENT --
logger -p $log -t $name ">> Managing resources."
resource_counter=1
for resource in ${resources}; do
logger -p $log -t $name "Switching resource - ${resource} - to secondary."
hastctl role secondary ${resource} 2>&1
if [ $? -ne 0 ]; then
logger -p $log -t $name "ERROR: Unable to switch resource - ${resource} - to secondary role."
exit 1
fi
logger -p $log -t $name "Role for resource - ${resource} - switched to secondary successfully."
resource_counter=$((resource_counter+1))
done
;;
esac
We assign the necessary permissions:
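chmod +x /usr/local/sbin/carp-hast-switch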
Testing:
MySQL:
To generate MySQL traffic, we will use sysbench. To do this, we create the test database on the primary node:
root@localhost [(none)]> create database kr0m;
Query OK, 1 row affected (0.00 sec)
root@localhost [(none)]> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| kr0m |
| mysql |
| performance_schema |
| sys |
+--------------------+
5 rows in set (0.00 sec)
We create the access user with the necessary grants for the database:
root@localhost [(none)]> create user sbtest_user identified by 'password';
Query OK, 0 rows affected (0.01 sec)
root@localhost [(none)]> grant all on kr0m.* to `sbtest_user`@`%`;
Query OK, 0 rows affected (0.01 sec)
root@localhost [(none)]> show grants for sbtest_user;
+-------------------------------------------------------+
| Grants for sbtest_user@% |
+-------------------------------------------------------+
| GRANT USAGE ON *.* TO `sbtest_user`@`%` |
| GRANT ALL PRIVILEGES ON `kr0m`.* TO `sbtest_user`@`%` |
+-------------------------------------------------------+
2 rows in set (0.01 sec)
On our PC, we install sysbench:
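Assuming the client is also running FreeBSD, it is available as a package:
pkg install -y sysbench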
We create some tables and insert data into them:
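A prepare run along these lines creates the test tables in the kr0m database through the VIP (the number and size of tables are arbitrary values):
sysbench oltp_read_write --db-driver=mysql --mysql-host=192.168.69.40 --mysql-user=sbtest_user --mysql-password=password --mysql-db=kr0m --tables=4 --table-size=100000 prepare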
Now that we have data in the database, let’s have sysbench perform a read-write test:
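For example, a few minutes of read-write load with several threads (duration and thread count are just examples):
sysbench oltp_read_write --db-driver=mysql --mysql-host=192.168.69.40 --mysql-user=sbtest_user --mysql-password=password --mysql-db=kr0m --tables=4 --table-size=100000 --threads=4 --time=600 run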
NFS:
To test NFS access, we will use bonnie++. We install it on our PC where we have the NFS mounted:
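Again assuming a FreeBSD client:
pkg install -y bonnie++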
Let bonnie++ run while we perform the failover tests.
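A minimal invocation pointing it at the NFS mount could look like this (bonnie++ requires -u when run as root, and -x repeats the test so it keeps running during the failover):
bonnie++ -d /mnt/nfs -u root -x 100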
Failover:
To migrate the service between nodes, we have three possibilities:
- Migrate the VIP by running the following command on the primary node:
ifconfig em0 vhid 1 state backup
- Gracefully shut down the primary node:
service mysql-server stop
umount -f /var/db/mysql
service nfsd stop
zpool export -f FilesData
service hastd stop
shutdown -p now
- Abruptly shut down the primary node: pull the power cable.
In case of migrating the VIP, the node that has lost the floating IP will automatically reconfigure itself as secondary. In the other two cases, when the node comes back to life, the HAST resources will be in the init state:
Name Status Role Components
MySQLData - init /dev/ada1 192.168.69.42
Name Status Role Components
FilesData - init /dev/ada2 192.168.69.42
We need to switch it to secondary:
hastctl role secondary MySQLData
hastctl role secondary FilesData
service mysql-server stop
service nfsd stop
Resulting in the following state:
Name Status Role Components
MySQLData complete secondary /dev/ada1 192.168.69.42
Name Status Role Components
FilesData complete secondary /dev/ada2 192.168.69.42
Troubleshooting:
A very important point is that both nodes keep the same time, so using NTP is a good idea.
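For example, with the ntpd included in the base system:
sysrc ntpd_enable=YES
service ntpd start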
In case of split-brain, where data has been written on both nodes, the administrator must decide which node's data is more important, and on the other node discard its data and reconfigure it as secondary, repeating for each affected resource:
hastctl create FilesData
hastctl role secondary FilesData
There are occasions when the migration script fails. We can always run it manually to check if the script has any issues:
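/usr/local/sbin/carp-hast-switch primary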
We can see the steps performed by the failover script in the logs:
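The script logs through the local0 syslog facility, so where the messages end up depends on syslogd's configuration. One option (an assumption, adjust to taste) is to route the whole facility to its own file by adding local0.* /var/log/carp-hast.log to /etc/syslog.conf and then:
touch /var/log/carp-hast.log
service syslogd restart
tail -f /var/log/carp-hast.log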
We can also check the status of HAST:
hastctl status FilesData
Check that the ZFS pool is imported and that the NFS share is enabled:
zfs get sharenfs FilesData
Check that the test database exists:
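For example, using the sysbench user through the VIP:
mysql -h 192.168.69.40 -u sbtest_user -ppassword -e 'SHOW DATABASES;'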
Check that NFS access is available from the client:
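showmount -e 192.168.69.40
cat /mnt/nfs/AA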
We must be very clear that HAST is not a backup. If data is deleted, it will be replicated over the network to the secondary, resulting in data loss. Additionally, in certain scenarios such as its use as a backend for databases, it can cause problems. An incorrect shutdown of MySQL could leave the data corrupted, and it would be replicated at the block level to the secondary, leaving us with a corrupted database. In such cases, we can only restore a backup on the current primary.