How to Install Ultra Monkey LVS in a 2-Node HA/LB Setup on CentOS/RHEL 4

I never really got this setup working quite right. Use the information here however you like, but don't bother asking me questions about it. I can't answer them.

I'm the resident Linux guru at my job, a mid-sized local company with a decent-sized IT department. We like to install servers in clusters to improve our fault tolerance. Being the Linux guy in a shop where Windows servers outnumber Unix servers about 8:1, I wanted to go one better than Windows' active-passive (high availability, or HA) cluster setup by building a 2-node active-active (load balanced, or LB) cluster using the Linux Virtual Server (LVS) system. Our Linux distribution of choice is Red Hat Enterprise Linux 4 (RHEL 4), and CentOS is the most compatible free clone thereof. Version 4 of these distros uses the Linux 2.6 kernel.

I was able to find a number of good tutorials on the web for configuring similar platforms, but nothing that quite matched what we wanted to do. Hence, I'm writing one now.

For these examples, let's assume that you have two physical web servers named lvs1 (192.168.0.1) and lvs2 (192.168.0.2) that you want to cluster together. They sit on a class C (/24) network with a gateway router at 192.168.0.254. These machines are known as the "real servers," since they are the ones that do the real work of serving up web pages. The outside world will reference them using a single hostname, vip1 (192.168.0.100), the virtual IP or VIP. Either or both real servers will answer requests made to vip1. Whichever machine currently holds the VIP acts as the "director": its kernel's IP Virtual Server (IPVS) code decides which real server answers each request, and the "ldirectord" package monitors the real servers and keeps that IPVS table up to date. In a larger setup, the director (and ldirectord with it) would run on its own HA pair of servers, but in our 2-node setup, the director role jumps back and forth between the two real servers. That failover (in case the active director dies completely) is handled by the "heartbeat" package.

The first step is to download all the necessary packages. All of them could be built from source, but I prefer to use RPM packages when available because they make it much easier to manage versions and dependencies. Since LVS doesn't officially ship with RHEL, the best places to get recent packages seem to be the CentOS repository at ftp://ftp.osuosl.org/pub/centos/4.4/extras/i386/RPMS/ or the Linux-HA web site at http://linux-ha.org/download/index.html. There is a bug in the IPaddr2 script in all 2.x versions prior to 2.0.8, so until 2.0.8 makes it into the repositories, you'll have to apply this patch (relative to v2.0.7) to /usr/lib/ocf/resource.d/heartbeat/IPaddr2.
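Once heartbeat is installed, you can check whether a given box still needs that patch by asking rpm for the installed version. The function below is a minimal sketch of that check, not part of any package: the name needs_ipaddr2_patch is made up, and it assumes heartbeat came from an RPM named "heartbeat" with a plain x.y.z version number.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 if the given heartbeat version (default: the
# installed RPM's version) is a 2.x release older than 2.0.8, i.e. one whose
# IPaddr2 script still needs the patch; exit 1 otherwise.
needs_ipaddr2_patch() {
    ver=${1:-$(rpm -q --qf '%{VERSION}' heartbeat 2>/dev/null)}
    echo "$ver" | awk -F. '{
        if ($1 != 2) exit 1    # only 2.x releases are affected
        if ($2 > 0) exit 1     # 2.1 and later already include the fix
        if ($3 < 8) exit 0     # 2.0.0 through 2.0.7 need the patch
        exit 1                 # 2.0.8 or later
    }'
}

# Example:
#   needs_ipaddr2_patch && echo "apply the IPaddr2 patch"
```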

The exact package list required will vary depending on what's already installed on your system. At a minimum, you will need the following packages. The indenting indicates the package dependencies; i.e., most packages exist to support heartbeat and heartbeat-ldirectord.

  • heartbeat
    • heartbeat-pils
    • heartbeat-stonith
  • heartbeat-ldirectord
    • ipvsadm
    • perl-MailTools
      • perl-TimeDate
    • perl-Net-IMAP-Simple
    • perl-Net-IMAP-Simple-SSL
      • perl-IO-Socket-SSL
        • perl-Net-SSLeay
    • perl-Mail-POP3Client
    • perl-Mail-IMAPClient
    • perl-Authen-Radius
      • perl-Data-HexDump
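Rather than eyeballing rpm -qa output, a small loop over rpm -q can list whichever of these are still missing. This is a rough sketch; missing_pkgs is a hypothetical name, and the list is simply the one above, so trim or extend it to match your system.

```shell
#!/bin/sh
# Hypothetical helper: print every package from the list above that
# "rpm -q" does not report as installed (one name per line).
missing_pkgs() {
    for pkg in heartbeat heartbeat-pils heartbeat-stonith \
               heartbeat-ldirectord ipvsadm perl-MailTools perl-TimeDate \
               perl-Net-IMAP-Simple perl-Net-IMAP-Simple-SSL \
               perl-IO-Socket-SSL perl-Net-SSLeay perl-Mail-POP3Client \
               perl-Mail-IMAPClient perl-Authen-Radius perl-Data-HexDump
    do
        # rpm -q exits non-zero when the package is not installed
        rpm -q "$pkg" >/dev/null 2>&1 || echo "$pkg"
    done
}

# Example, assuming a configured yum repository carries the packages:
#   yum install $(missing_pkgs)
```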

Once the necessary packages are installed, you can start the configuration process. There's a pretty good writeup for installing Ultra Monkey in a 2-node HA/LB setup on RHEL3 or Debian here. I had a couple of problems with it on RHEL4, though, which is why I'm writing my own tutorial.

First, you need to change a few kernel parameters by editing /etc/sysctl.conf. Make sure each of the variables below is set to the value shown. Beware that some of them may already be set to other values elsewhere in the file (when a variable appears more than once, the last occurrence wins), while others won't exist yet at all. These settings keep the servers from advertising, via ARP, the VIP address that will later be assigned to each loopback interface. They also allow the machine acting as the director to forward packets to the other real server when necessary.

#========================================================================
# UltraMonkey requirements below
#

# Enable configuration of arp_ignore option
net.ipv4.conf.all.arp_ignore = 1

# When an arp request is received on eth0, only respond if that address is
# configured on eth0. In particular, do not respond if the address is
# configured on lo
net.ipv4.conf.eth0.arp_ignore = 1

# Ditto for eth1, add for all ARPing interfaces
#net.ipv4.conf.eth1.arp_ignore = 1

# Enable configuration of arp_announce option
net.ipv4.conf.all.arp_announce = 2

# When making an ARP request sent through eth0, always use an address that
# is configured on eth0 as the source address of the ARP request.  If this
# is not set, and packets are being sent out eth0 for an address that is on
# lo, and an arp request is required, then the address on lo will be used.
# As the source IP address of arp requests is entered into the ARP cache on
# the destination, it has the effect of announcing this address.  This is
# not desirable in this case as addresses on lo on the real-servers should
# be announced only by the linux-director.
net.ipv4.conf.eth0.arp_announce = 2

# Ditto for eth1, add for all ARPing interfaces
#net.ipv4.conf.eth1.arp_announce = 2

# Enables packet forwarding
net.ipv4.ip_forward = 1

#
# UltraMonkey requirements above
#========================================================================
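Because a variable may already appear elsewhere in the file, it's worth checking what sysctl -p will actually end up applying. The sketch below reads the file top to bottom and keeps the last assignment of each variable, which is how sysctl -p behaves, then reports anything missing or wrong. The function name is hypothetical, and it only knows about the eth0 entries shown here; add eth1 lines to its want list if you use them.

```shell
#!/bin/sh
# Hypothetical helper: check a sysctl.conf-style file for the settings above.
# Lines are read top to bottom and the last assignment of a key wins, which
# mirrors how "sysctl -p" applies the file. Prints MISSING/WRONG lines and
# exits non-zero if anything needs fixing.
check_sysctl_conf() {
    awk '
        BEGIN {
            want["net.ipv4.conf.all.arp_ignore"] = "1"
            want["net.ipv4.conf.eth0.arp_ignore"] = "1"
            want["net.ipv4.conf.all.arp_announce"] = "2"
            want["net.ipv4.conf.eth0.arp_announce"] = "2"
            want["net.ipv4.ip_forward"] = "1"
        }
        /^[ \t]*[#;]/ { next }                  # skip comment lines
        index($0, "=") > 0 {
            key = substr($0, 1, index($0, "=") - 1)
            val = substr($0, index($0, "=") + 1)
            gsub(/[ \t]/, "", key)
            gsub(/[ \t]/, "", val)
            last[key] = val                     # later lines override earlier ones
        }
        END {
            bad = 0
            for (k in want) {
                if (!(k in last)) {
                    print "MISSING " k; bad = 1
                } else if (last[k] != want[k]) {
                    print "WRONG " k " = " last[k]; bad = 1
                }
            }
            exit bad
        }
    ' "${1:-/etc/sysctl.conf}"
}
```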

To make these changes take effect, either reboot the system or run:

# /sbin/sysctl -p
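To confirm that the running kernel really picked up the new values, you can read them back from /proc/sys, where each dotted variable name becomes a path (net.ipv4.ip_forward is /proc/sys/net/ipv4/ip_forward). This is a minimal sketch; the function name and the PROC_SYS override, which lets you point it at a scratch directory for testing, are assumptions of this example.

```shell
#!/bin/sh
# Hypothetical helper: compare the running kernel's values, read from
# /proc/sys (or from $PROC_SYS if set), with the ones required above.
check_live_sysctl() {
    root=${PROC_SYS:-/proc/sys}
    rc=0
    for pair in net.ipv4.conf.all.arp_ignore=1 net.ipv4.conf.eth0.arp_ignore=1 \
                net.ipv4.conf.all.arp_announce=2 net.ipv4.conf.eth0.arp_announce=2 \
                net.ipv4.ip_forward=1
    do
        key=${pair%=*}      # text before the "="
        want=${pair#*=}     # text after the "="
        # sysctl names map to /proc/sys paths by turning dots into slashes
        path="$root/$(echo "$key" | tr . /)"
        have=$(cat "$path" 2>/dev/null)
        if [ "$have" != "$want" ]; then
            echo "$key: want $want, have ${have:-<unreadable>}"
            rc=1
        fi
    done
    return $rc
}
```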

Next, you need to configure the loopback interface to have an alias for the VIP address so that the real servers will know to answer connections on that IP even when they're not acting as the director. Create a file named "/etc/sysconfig/network-scripts/ifcfg-lo:0" that contains IP information for the VIP and its network:

DEVICE=lo:0
IPADDR=192.168.0.100
NETMASK=255.255.255.255
NETWORK=192.168.0.0
BROADCAST=192.168.0.255
ONBOOT=yes
NAME=loopback

To turn on this new alias, run:

# /sbin/ifup lo

or

# service network start

This alias won't show up when running "ifconfig", a fact that caused me to waste several hours tracking down a problem that didn't even exist. Instead, you can verify its existence by running:

# ip addr sh lo
1: lo: <LOOPBACK,UP> mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
    inet 192.168.0.100/32 brd 192.168.0.255 scope global lo:0
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever

Because we're using what's known as LVS-DR (direct routing), you need to make sure that the default gateway for the servers' primary network interface points to the proper gateway router rather than to the director. To do this, check for the GATEWAY entry in either "/etc/sysconfig/network" or "/etc/sysconfig/network-scripts/ifcfg-eth0" and ensure that it lists the proper IP:

# cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=lvs1.mydomain.com
GATEWAY=192.168.0.254

or

# cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
BOOTPROTO=static
ONBOOT=yes
TYPE=Ethernet
IPADDR=192.168.0.1
NETMASK=255.255.255.0
GATEWAY=192.168.0.254

You can verify this by running:

# ip route show 0/0
default via 192.168.0.254 dev eth0
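If you'd rather have a yes/no answer than read the route table yourself, this sketch parses the same "ip route show 0/0" output. The function name is hypothetical, and the expected gateway defaults to this example's 192.168.0.254.

```shell
#!/bin/sh
# Hypothetical helper: warn if the default route does not go through the
# expected gateway router (first argument, default 192.168.0.254).
check_gateway() {
    want=${1:-192.168.0.254}
    have=$(ip route show 0/0 | awk '$1 == "default" && $2 == "via" { print $3; exit }')
    if [ "$have" = "$want" ]; then
        echo "OK: default route via $have"
    else
        echo "WARNING: default route via ${have:-nothing}, expected $want"
        return 1
    fi
}
```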

Now it's time to configure the heartbeat package to handle failover of the VIP and the ldirectord service. Four files in "/etc/ha.d" must be configured to make things work, and each of them should be identical on the two real servers. The packages install default config files full of comments, but here is a reasonable set of configuration parameters. Everywhere you see a hostname listed, it must match the output of "uname -n" on the appropriate server. The "authkeys" file must be readable only by root for security purposes.

# cat /etc/ha.d/ha.cf
logfacility   local0
keepalive     1
deadtime      10
warntime      5
initdead      120
udpport       694
mcast eth0 225.0.0.1 694 1 0
auto_failback off
node          lvs1.mydomain.com
node          lvs2.mydomain.com
ping          192.168.0.254
respawn hacluster /usr/lib/heartbeat/ipfail
crm off
# cat /etc/ha.d/authkeys
auth 2
2 sha1 ThisIsMyPassword
# cat /etc/ha.d/haresources
lvs1.mydomain.com       \
        ldirectord::ldirectord.cf \
        LVSSyncDaemonSwap::master \
        IPaddr2::192.168.0.100/24/eth0/192.168.0.255
# cat /etc/ha.d/ldirectord.cf
checktimeout=15
checkinterval=5
autoreload=no
logfile="/var/log/ldirectord.log"
quiescent=no
virtual=192.168.0.100:80
        fallback=127.0.0.1:80
        real=192.168.0.1:80 gate
        real=192.168.0.2:80 gate
        service=http
        request="ldirectord.html"
        receive="It worked"
        scheduler=rr
        persistent=600
        protocol=tcp
        checktype=negotiate
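Two of the rules mentioned before these listings are easy to get wrong: every hostname must match "uname -n", and authkeys must not be readable by anyone but root. The hypothetical check_ha_files function below is a rough pre-flight sketch you could run on each server before starting heartbeat. It checks that this host and the first name in haresources both appear on node lines in ha.cf, and that authkeys has owner-only permissions; it does not validate anything else in the files.

```shell
#!/bin/sh
# Hypothetical pre-flight check for the heartbeat config directory.
# Usage: check_ha_files [dir] [hostname]   (defaults: /etc/ha.d, uname -n)
check_ha_files() {
    dir=${1:-/etc/ha.d}
    me=${2:-$(uname -n)}
    rc=0

    # 1. This host must appear on a "node" line in ha.cf
    if ! awk -v n="$me" '$1 == "node" && $2 == n { f = 1 } END { exit (f ? 0 : 1) }' "$dir/ha.cf"; then
        echo "ha.cf has no 'node $me' line"
        rc=1
    fi

    # 2. The first word of haresources (the preferred owner) must be a node too
    owner=$(awk 'NF && $1 !~ /^#/ { print $1; exit }' "$dir/haresources")
    if ! awk -v n="$owner" '$1 == "node" && $2 == n { f = 1 } END { exit (f ? 0 : 1) }' "$dir/ha.cf"; then
        echo "haresources owner '$owner' is not a node in ha.cf"
        rc=1
    fi

    # 3. authkeys must not be readable by group or others
    perms=$(ls -l "$dir/authkeys" | cut -c1-10)
    case $perms in
        -rw-------|-r--------) ;;
        *) echo "authkeys permissions are $perms, expected -rw-------"; rc=1 ;;
    esac

    return $rc
}
```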

The above files should be the same on both hosts. The ldirectord.cf above is configured to check for a web server on port 80 whose document root contains a file named ldirectord.html holding only the string "It worked". ldirectord checks the health of each real server by requesting that file from each web server. If the response contains the receive string, it considers the server willing and able to receive public requests. There are built-in check mechanisms for several other popular services, too.
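Creating the check page and probing it by hand can save some head-scratching when ldirectord marks a server down. The sketch below assumes Apache's stock RHEL document root (/var/www/html) and that curl is installed; both function names are hypothetical, and probe_real_server only approximates ldirectord's negotiate check by looking for the receive string in the response.

```shell
#!/bin/sh
# Hypothetical helpers for the ldirectord health check configured above.

# Write the page ldirectord requests (request="ldirectord.html") containing
# the receive string. The default docroot is Apache's stock RHEL location.
make_check_page() {
    docroot=${1:-/var/www/html}
    printf 'It worked\n' > "$docroot/ldirectord.html"
}

# Rough, by-hand approximation of the "negotiate" check: fetch the page from
# one real server's own IP and look for the receive string.
probe_real_server() {
    if curl -s --max-time 5 "http://$1/ldirectord.html" | grep -q 'It worked'; then
        echo "$1: up"
    else
        echo "$1: DOWN"
        return 1
    fi
}

# Example (make_check_page on each real server, then probe both):
#   make_check_page
#   probe_real_server 192.168.0.1; probe_real_server 192.168.0.2
```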

Now you need to make sure that heartbeat is started at boot time and that ldirectord is NOT started at boot by running this on both servers:

# /sbin/chkconfig heartbeat on
# /sbin/chkconfig ldirectord off
# /sbin/service ldirectord stop
# /sbin/service heartbeat start

Your user services (httpd, mysql, etc.) should already be running when heartbeat starts, so start them before running the last command above. Give heartbeat a minute to start up and stabilize, then check that things are running by typing:

lvs1# ip addr sh
1: lo: <LOOPBACK,UP> mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:50:56:8a:01:10 brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.1/24 brd 192.168.0.255 scope global eth0
    inet 192.168.0.100/24 brd 192.168.0.255 scope global secondary eth0
    inet6 fe80::250:56ff:fe8a:110/64 scope link
       valid_lft forever preferred_lft forever

lvs2# ip addr sh
1: lo: <LOOPBACK,UP> mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
    inet 192.168.0.100/32 brd 192.168.0.255 scope global lo:0
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:50:56:8a:1f:39 brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.2/24 brd 192.168.0.255 scope global eth0
    inet6 fe80::250:56ff:fe8a:1f39/64 scope link
       valid_lft forever preferred_lft forever

The first node you started up (the active director, lvs1 in this example) should have the VIP on eth0, while the second node you started should have it on lo. You can now run ipvsadm to check the status of the nodes and any incoming connections. Only the machine currently acting as director will list any useful info:

lvs2# ipvsadm -L -n
IP Virtual Server version 1.2.0 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn

lvs1# ipvsadm -L -n
IP Virtual Server version 1.2.0 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  192.168.0.100:80 rr persistent 600
  -> 192.168.0.2:80              Route   1      0          0
  -> 192.168.0.1:80              Local   1      0          0
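To tell at a glance which role a node is playing, you can look for the VIP in the one-line output of "ip -o -4 addr show": on eth0 means this node is the active director, on lo only means it's a standby real server. This is a rough sketch; vip_role is a hypothetical name, and the eth0 interface and the default VIP are taken from this example.

```shell
#!/bin/sh
# Hypothetical helper: print "director" if the VIP is on eth0, "standby" if
# it is only on lo, or "absent" if this node does not have it at all.
vip_role() {
    ip -o -4 addr show | awk -v vip="${1:-192.168.0.100}" '
        { split($4, a, "/") }                  # $4 is address/prefix
        a[1] == vip && $2 == "eth0" { role = "director" }
        a[1] == vip && $2 == "lo" && role == "" { role = "standby" }
        END { print (role == "" ? "absent" : role) }'
}
```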

You can see above in the "Weight" column that incoming requests will be split equally between the two real servers. If you stop the HTTP daemon on one of the servers, within a few seconds ldirectord's health check will notice and no more new requests will be directed to that server. (With "quiescent=no", as configured above, ldirectord removes the failed server from the table; with "quiescent=yes", it sets that server's weight to zero instead.) To allow existing connections to finish politely while sending all new connections to the other box (if you're about to do some planned maintenance, for example), set the weight of the departing server to zero with the first command below. For new connections from persistent clients to make the transition as well, you must set "quiescent=no" in ldirectord.cf. With "quiescent=yes", persistent clients will continue trying to hit the dying server even after it dies, on the assumption that it will eventually come back.

# /sbin/ipvsadm -e -t 192.168.0.100:80 -r 192.168.0.2:80 -w 0
# /sbin/ipvsadm -L -n
IP Virtual Server version 1.2.0 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  192.168.0.100:80 rr persistent 600
  -> 192.168.0.2:80              Route   0      0          0
  -> 192.168.0.1:80              Local   1      0          0

If you don't want to remember that first ipvsadm command, you can (de)activate individual real services using this init script. Run "service cluster stop lvs2" to set the weight for lvs2 to zero. Determining the other functionality is left as an exercise for the reader.
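For a feel of what such a script might do, here is a rough sketch in the same spirit. It is not that script, and everything in it is hypothetical: the lvs1/lvs2-to-IP mapping comes from this example's network, "start" assumes the normal weight is 1 (as in the ipvsadm output above), and it has to run on the active director, since that's the only node with a populated IPVS table.

```shell
#!/bin/sh
# Hypothetical sketch of a "cluster"-style control script (not the original).
# Usage: cluster {start|stop} {lvs1|lvs2}   or   cluster status
VIP=192.168.0.100:80

# Map this example's hostnames to their real-server IPs
real_ip() {
    case $1 in
        lvs1) echo 192.168.0.1 ;;
        lvs2) echo 192.168.0.2 ;;
        *)    return 1 ;;
    esac
}

cluster() {
    case $1 in
        start|stop)
            rip=$(real_ip "$2") || { echo "unknown host: $2" >&2; return 2; }
            # stop: weight 0, so existing connections finish but no new ones arrive
            # start: weight 1, back into the round-robin rotation
            if [ "$1" = stop ]; then w=0; else w=1; fi
            ipvsadm -e -t "$VIP" -r "$rip:80" -w "$w"
            ;;
        status)
            ipvsadm -L -n
            ;;
        *)
            echo "usage: cluster {start|stop} {lvs1|lvs2} | cluster status" >&2
            return 2
            ;;
    esac
}

# As a standalone script you would finish with:  cluster "$@"
```

For example, "cluster stop lvs2" runs the same ipvsadm -e command shown earlier for draining lvs2.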


last updated 12 March 2007
Obi-Wan (obiwan@jedi.com)
