High Availability

As more and more critical commercial applications move onto the Internet, providing highly available servers becomes increasingly important. One advantage of a clustered system is its hardware and software redundancy. High availability can be provided by detecting node or daemon failures and reconfiguring the system so that the remaining nodes in the cluster take over the workload.

The high availability of the virtual server is currently provided by the "mon", "heartbeat", "fake" and "coda" software. "mon" is a general-purpose resource monitoring system that can be used to monitor network service availability and server nodes. The "heartbeat" code currently provides round-robin (ring) heartbeats among computers through serial lines. "fake" is IP take-over software that works by ARP spoofing. The high availability of Linux Virtual Server is illustrated in the following figure.

Server failover is handled as follows. The "mon" daemon runs on the load balancer to monitor service daemons and server nodes in the cluster. The fping.monitor is configured to check whether the server nodes are alive every t seconds, and the relevant service monitor is configured to check the service daemons on all the nodes every m minutes. For example, http.monitor can be used to check the http services, ftp.monitor is for the ftp services, and so on. An alert is written to remove a rule from the virtual server table when a server node or daemon is detected to be down, and to add the rule back when it is up again. The load balancer can therefore automatically mask the failure of service daemons or servers and put them back into service when they recover.
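
The behaviour described above can be read as a small state machine: an action is needed only when a monitored server or daemon changes state. The following is a minimal sketch of that idea, not mon's actual implementation. The function name monitor_transition and the state names "up" and "down" are hypothetical; "alert" and "upalert" are the mon.cf keywords used in the example later in this document.

```shell
#!/bin/sh
# Hypothetical helper: decide which kind of alert a state change calls for.
# $1 = previous state, $2 = current state (each "up" or "down")
# Prints "alert" on up -> down, "upalert" on down -> up, "none" otherwise.
monitor_transition() {
    if [ "$1" = "$2" ]; then
        echo none        # no change, the virtual server table stays as it is
    elif [ "$2" = "down" ]; then
        echo alert       # failure detected: the rule should be removed
    else
        echo upalert     # recovery detected: the rule should be added back
    fi
}

monitor_transition up down
monitor_transition down up
monitor_transition up up
```

Acting only on transitions keeps the virtual server table from being rewritten on every probe while a server stays in the same state.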

Now the load balancer becomes a single point of failure for the whole system. To guard against failure of the load balancer, we need to set up a backup server for it. The "fake" software is used by the backup to take over the IP addresses of the load balancer when the load balancer fails, and the "heartbeat" code is used to detect the status of the load balancer and to activate or deactivate "fake" on the backup server. Two heartbeat daemons run, one on the primary and one on the backup, and they periodically send each other a message like "I'm alive" over the serial line. When the heartbeat daemon on the backup does not hear the "I'm alive" message from the primary within the defined time, it activates fake to take over the virtual IP address and provide the load-balancing service. When it later receives the "I'm alive" message from the primary again, it deactivates fake to release the virtual IP address, and the primary resumes work.
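
The decision the backup makes can be sketched as a function of how long the primary has been silent. This is an illustration under stated assumptions, not the heartbeat code itself: the function name backup_action and its three arguments are hypothetical, and the sketch assumes that reaching the defined time exactly already counts as a failure.

```shell
#!/bin/sh
# Hypothetical helper: what should the backup do right now?
# $1 = seconds since the last "I'm alive" message from the primary
# $2 = the defined time (in seconds) after which the primary is considered dead
# $3 = "yes" if the backup currently holds the virtual IP, otherwise "no"
backup_action() {
    if [ "$1" -ge "$2" ] && [ "$3" = "no" ]; then
        echo "activate fake"      # primary silent too long: take over the virtual IP
    elif [ "$1" -lt "$2" ] && [ "$3" = "yes" ]; then
        echo "deactivate fake"    # primary heard again: release the virtual IP
    else
        echo "no change"          # state already matches what was heard
    fi
}

backup_action 5 4 no
backup_action 0 4 yes
backup_action 1 4 no
```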

However, in the current implementation, a failover or takeover of the primary load balancer loses the established connections recorded in the hash table, so clients have to send their requests again.

Coda is a fault-tolerant distributed file system, a descendant of the Andrew File System. The contents of the servers can be stored in Coda, so that files are highly available and easy to manage.

An Example

The following is an example of setting up a highly available virtual web server via tunneling.

The failover of real servers

"mon" is used to monitor service daemons and server nodes in the cluster. For example, fping.monitor can be used to monitor the server nodes, http.monitor can be used to check the http services, ftp.monitor is for the ftp services, and so on. So we just need to write an alert that removes a rule from the virtual server table when a server node or daemon is detected to be down, and adds it back when the node or daemon is up. Here is an example called virtualserver.alert, which takes the virtual service (IP:Port) and the real server (with its service port when NAT is used) as parameters.
 
#!/usr/bin/perl
#
# virtualserver.alert - Virtual server alert for mon
#
# It can be activated by mon to remove a virtual server rule when the
# service is down, and add a virtual server rule when the service is up.
#
# $Id: HighAvailability.html,v 1.1 1999/09/09 13:36:13 wanger Exp $
#
# Copyright (C) 1998, Wensong Zhang
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
#
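use Getopt::Std;

# mon invokes the script with -u when it is run as an upalert;
# -V (virtual service) and -R (real server) come from mon.cf.
getopts ("s:g:h:t:l:V:R:u");

$virtual_service = $opt_V;
$remote = $opt_R;

if ($opt_u) {
    # upalert: the service is back, so add the real server rule
    system("ippfvsadm -A -t $virtual_service -R $remote");
}
else {
    # alert: the service is down, so delete the real server rule
    system("ippfvsadm -D -t $virtual_service -R $remote");
}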

The virtualserver.alert script is placed in the /usr/lib/mon/alert.d directory. The /usr/lib/mon/mon.cf file can then be configured to monitor the http services and servers in the cluster as follows.
 
#
# Example "mon.cf" configuration for "mon".
#
# $Id: HighAvailability.html,v 1.1 1999/09/09 13:36:13 wanger Exp $
#

#
# global options
#
alertdir = /usr/lib/mon/alert.d
mondir = /usr/lib/mon/mon.d
#maxprocs = 20
histlength = 100
randstart = 5s

#
# NB: hostgroup and watch entries are terminated with a blank line (or
# end of file). Don't forget the blank lines between them or you lose.
#

#
# group definitions (hostnames or IP addresses)
#

hostgroup www1 www1.domain-name.com

hostgroup www2 www2.domain-name.com
 

# Web server
#
watch www1
service http
interval 11s
monitor http.monitor
period wd {Sun-Sat}
alert mail.alert wensong
upalert mail.alert wensong
alert virtualserver.alert -V 192.168.0.5:80 -R 192.168.0.6
upalert virtualserver.alert -V 192.168.0.5:80 -R 192.168.0.6

# Web server
#
watch www2
service http
interval 13s
monitor http.monitor
period wd {Sun-Sat}
alert mail.alert wensong
upalert mail.alert wensong
alert virtualserver.alert -V 192.168.0.5:80 -R 192.168.0.7
upalert virtualserver.alert -V 192.168.0.5:80 -R 192.168.0.7

Note that if the virtual server via NAT is used, the parameters of virtualserver.alert must include the port of the real server, for example "virtualserver.alert -V 192.168.0.5:80 -R 192.168.0.7:80".
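
The difference between the two forms of the -R parameter can be captured in a small helper. This is only a sketch: the function name real_server_arg and the method names "tunnel" and "nat" are hypothetical labels chosen for illustration, and the helper just formats the argument without touching the virtual server table.

```shell
#!/bin/sh
# Hypothetical helper: format the -R argument for virtualserver.alert.
# $1 = forwarding method ("tunnel" or "nat"), $2 = real server IP, $3 = service port
real_server_arg() {
    case "$1" in
        tunnel) echo "$2" ;;        # tunneling: the real server address only
        nat)    echo "$2:$3" ;;     # NAT: address and port of the real server
        *)      echo "unknown method: $1" >&2; return 1 ;;
    esac
}

# Print the alert line for mon.cf in the NAT case.
echo "alert virtualserver.alert -V 192.168.0.5:80 -R $(real_server_arg nat 192.168.0.7 80)"
```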

Finally, put the S99mon script under the /etc/rc.d/rc3.d directory (for RedHat). Now the load balancer can automatically mask the failure of service daemons or servers and put them back into service when they recover. Set up the same mon daemon on both the primary and the backup, so that both keep an up-to-date status of the real servers.

The failover of the load balancer

Note: the following instructions for using the heartbeat code are out of date; they apply only to very early versions of the heartbeat code. Instructions on how to use the latest heartbeat code in Linux Virtual Server will come soon.

To prevent the load balancer from becoming a single point of failure for the whole system, we need to set up a backup of the load balancer. The network interfaces of the primary and the backup are configured as follows:
 
The primary:
ifconfig eth0 192.168.0.3 netmask 255.255.255.0 broadcast 192.168.0.255 up
route add -net 192.168.0.0 netmask 255.255.255.0 dev eth0
ifconfig eth0:0 192.168.0.5 netmask 255.255.255.255 broadcast 192.168.0.5 up
route add -host 192.168.0.5 dev eth0:0

The backup:
ifconfig eth0 192.168.0.4 netmask 255.255.255.0 broadcast 192.168.0.255 up
route add -net 192.168.0.0 netmask 255.255.255.0 dev eth0

The "fake" software is installed on the backup to take over the IP address of the load balancer when the load balancer fails. The /etc/fake/instance_config/192.168.0.5.cfg file is configured as follows:
 
SPOOF_IP=192.168.0.5
SPOOF_NETMASK=255.255.255.0
TARGET_INTERFACE=eth0:1

The primary and the backup are linked through a serial line. The "heartbeat" code is installed on both the primary and the backup. The /etc/ha.d/ha.cf file is configured identically on both machines as follows:
 
#
# serial serialportname ...
# Must have one. Two provides redundancy
#
# keepalive seconds-between-heartbeats
# deadtime seconds-to-declare-host-dead
# hopfudge maximum hop count minus number of nodes in config
#
#
# node nodename ... -- must match uname -n
#
serial /dev/ttyS0
keepalive 2
deadtime 4
hopfudge 1
#
node vs.domain-name.com
node backup.domain-name.com

The /etc/ha.d/harc file on the backup needs to be changed as follows:
 
#!/bin/sh

#
#
# This script is patterned after the Red Hat SysV init script system
#
# It doesn't know how to do anything except to run other scripts...
#
# Basically, it notifies the world of something that was sent around
# via the heartbeat cluster network...
#
#

RCDIR=$HA_DIR/rc.d

hadate() {
    date "+${HA_DATEFMT}"
}

exec >> $HA_DEBUGLOG 2>&1

echo "`hadate`INFO: Running $0: argv[1] == $1";
 

for j in "$@"
do
    echo "`hadate`ARG: [$j]"
done
if
    [ ! -d $RCDIR ]
then
    echo "`hadate`ERROR: $0: $RCDIR does not exist" >>$HA_LOGFILE
    exit 1
else
    if
        [ ! -x $RCDIR/$1 ]
    then
        echo "`hadate`ERROR: $0: $RCDIR/$1: cannot execute " >>$HA_LOGFILE
        exit 1
    fi
fi

echo "Enter Fake Section... STATUS $6 "
if
    [ "$6" = "dead" ]
then
    echo "The primary site is dead, running fake" >>$HA_LOGFILE
    /usr/bin/fake 192.168.0.5 &
else
    echo "The primary site is alive, removing fake" >> $HA_LOGFILE
    /usr/bin/fake remove 192.168.0.5
fi

echo "`hadate`INFO: Running $RCDIR/$1 $*" >>$HA_LOGFILE
exec $RCDIR/$1 "$@"
echo "`hadate`ERROR: $0: $RCDIR/$1: cannot execute " >>$HA_LOGFILE

Then, when the heartbeat daemon on the backup does not receive a heartbeat message within 4 seconds, it activates fake to take over the 192.168.0.5 address and provide the service. When the primary comes back, the backup deactivates fake to release the 192.168.0.5 address.
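
The 4 second figure comes from the deadtime setting in ha.cf. As a rough check of how the two timing settings relate, the sketch below reads "keepalive" and "deadtime" lines in the ha.cf format shown earlier and prints how many heartbeat intervals fit into the deadtime. The function name missed_beats_allowed is hypothetical, and treating the integer quotient as the number of tolerated missed heartbeats is a simplifying assumption rather than a statement about how the heartbeat daemon counts.

```shell
#!/bin/sh
# Hypothetical helper: read ha.cf-style lines on stdin and print
# deadtime / keepalive, rounded down. Exits non-zero if either value is missing.
missed_beats_allowed() {
    awk '
        $1 == "keepalive" { k = $2 }   # seconds between heartbeats
        $1 == "deadtime"  { d = $2 }   # seconds before a host is declared dead
        END {
            if (k <= 0 || d <= 0) exit 1
            print int(d / k)
        }'
}

# The values from the example configuration in this document.
printf 'serial /dev/ttyS0\nkeepalive 2\ndeadtime 4\nhopfudge 1\n' | missed_beats_allowed
```

With the example values the deadtime spans two heartbeat intervals, so a single delayed message is not enough to trigger a takeover.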


Last updated: 1999/8/7

Created on: 1998/12/5