Ajitabh Pandey's Soul & Syntax

Exploring systems, souls, and stories – one post at a time

Tag: HA

  • Balancing Traffic Across Data Centres Using LVS

    The LVS (Linux Virtual Server) project was launched in 1998 and is meant to eliminate single points of failure (SPOF). According to the linuxvirtualserver.org website: “LVS is a highly scalable and available server built on a cluster of real servers, with the load balancer running on Linux. The architecture of the server cluster is fully transparent to the end user, and the users interact as if it were a single high-performance virtual server. The real servers and the load balancers may be interconnected by either a high speed LAN or by a geographically dispersed WAN.”
    (more…)

  • Building A Highly Available Nginx Reverse-Proxy Using Heartbeat

    A cluster in computing is a term used to describe a group of closely linked computers, often appearing as a single entity to the outside world. There are various types of clusters: high-availability (HA) clusters, load-balancing clusters, compute or high-performance computing (HPC) clusters, and grids.

    (more…)

  • Building A Highly-Available Web Server Cluster

    nginx (pronounced ‘engine x’) is a powerful HTTP web server/reverse proxy and IMAP/POP3 mail proxy. According to a survey conducted in December 2008 by Netcraft, nginx has grown significantly and has surpassed lighttpd (also known as Lighty). Because of its small memory footprint and high scalability, nginx has found tremendous usage on virtual private servers (VPS).

    (more…)

  • Using SNAT for Highly Available Services

    Problem

    Network-based services are often restricted to a particular source IP address. A common example is SNMP: a good system/network administrator will restrict access to the SNMP daemon to a particular host, usually a central management server. Sometimes these central management servers are an HA pair. In that case, a service address can be used for the active node, and it is this service address that is allowed to reach the desired network resource. Heartbeat will usually start this service IP address as a resource on the active node, so the active node takes over the address and can listen on it for incoming requests. But this still does not solve the problem of the active node reaching out to a network resource, because all packets originating from the node will bear its primary IP address and not the secondary or aliased address(es).
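
    As an illustration of the problem, the monitored host will often accept SNMP queries only from the management service address. A minimal sketch of such a restriction, assuming a hypothetical management service address of 192.168.1.1 and a plain iptables-based filter on the monitored host:

    # On the monitored host: accept SNMP (UDP port 161) only from the
    # management service address and drop it from everywhere else.
    # (192.168.1.1 is a hypothetical address; adjust to your environment.)
    iptables -A INPUT -p udp --dport 161 -s 192.168.1.1 -j ACCEPT
    iptables -A INPUT -p udp --dport 161 -j DROP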

    Solution

    For such cases, SNAT (Source Network Address Translation) can be useful. Using SNAT we can ask the kernel to change the source IP address on all outgoing packets. However, the IP address we want on our packets must already be present on the interface as a primary, secondary or aliased address. This can be checked as:

    # ip addr show bond0
    6: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue
        link/ether 00:18:fe:89:df:d8 brd ff:ff:ff:ff:ff:ff
        inet 192.168.1.3/16 brd 192.168.255.255 scope global bond0
        inet 192.168.1.2/16 brd 192.168.255.255 scope global secondary bond0:0
        inet 192.168.1.1/16 brd 192.168.255.255 scope global secondary bond0:1
        inet6 fe80::218:feff:fe89:dfd8/64 scope link
           valid_lft forever preferred_lft forever

    or

    # ifconfig bond0
    bond0     Link encap:Ethernet  HWaddr 00:18:FE:89:DF:D8
              inet addr:192.168.1.3  Bcast:192.168.255.255  Mask:255.255.0.0
              inet6 addr: fe80::218:feff:fe89:dfd8/64 Scope:Link
              UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
              RX packets:53589964 errors:0 dropped:0 overruns:0 frame:0
              TX packets:25857501 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0
              RX bytes:40502210697 (37.7 GiB)  TX bytes:4148482317 (3.8 GiB)

    Instead of specifying an interface, all interfaces can also be viewed using:

    # ip addr show
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever
    2: eth0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 qlen 1000
        link/ether 00:18:fe:89:df:d8 brd ff:ff:ff:ff:ff:ff
        inet6 fe80::218:feff:fe89:dfd8/64 scope link
           valid_lft forever preferred_lft forever
    3: eth1: <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> mtu 1500 qdisc pfifo_fast master bond0 qlen 1000
        link/ether 00:18:fe:89:df:d8 brd ff:ff:ff:ff:ff:ff
    4: sit0: <NOARP> mtu 1480 qdisc noop
        link/sit 0.0.0.0 brd 0.0.0.0
    6: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue
        link/ether 00:18:fe:89:df:d8 brd ff:ff:ff:ff:ff:ff
        inet 192.168.1.3/16 brd 192.168.255.255 scope global bond0
        inet 192.168.1.2/16 brd 192.168.255.255 scope global secondary bond0:0
        inet 192.168.1.1/16 brd 192.168.255.255 scope global secondary bond0:1
        inet6 fe80::218:feff:fe89:dfd8/64 scope link
           valid_lft forever preferred_lft forever

    or

    # ifconfig
    bond0     Link encap:Ethernet  HWaddr 00:18:FE:89:DF:D8
              inet addr:192.168.1.3  Bcast:192.168.255.255  Mask:255.255.0.0
              inet6 addr: fe80::218:feff:fe89:dfd8/64 Scope:Link
              UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
              RX packets:53587551 errors:0 dropped:0 overruns:0 frame:0
              TX packets:25855600 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0
              RX bytes:40501872867 (37.7 GiB)  TX bytes:4148267377 (3.8 GiB)
    
    bond0:0   Link encap:Ethernet  HWaddr 00:18:FE:89:DF:D8
              inet addr:192.168.1.2  Bcast:192.168.255.255  Mask:255.255.0.0
              UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
    
    bond0:1   Link encap:Ethernet  HWaddr 00:18:FE:89:DF:D8
              inet addr:192.168.1.1  Bcast:192.168.255.255  Mask:255.255.0.0
              UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
    
    eth0      Link encap:Ethernet  HWaddr 00:18:FE:89:DF:D8
              inet6 addr: fe80::218:feff:fe89:dfd8/64 Scope:Link
              UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
              RX packets:53587551 errors:0 dropped:0 overruns:0 frame:0
              TX packets:25855600 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:40501872867 (37.7 GiB)  TX bytes:4148267377 (3.8 GiB)
              Interrupt:185
    
    eth1      Link encap:Ethernet  HWaddr 00:18:FE:89:DF:D8
              UP BROADCAST SLAVE MULTICAST  MTU:1500  Metric:1
              RX packets:0 errors:0 dropped:0 overruns:0 frame:0
              TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
              Interrupt:193
    
    lo        Link encap:Local Loopback
              inet addr:127.0.0.1  Mask:255.0.0.0
              inet6 addr: ::1/128 Scope:Host
              UP LOOPBACK RUNNING  MTU:16436  Metric:1
              RX packets:536101 errors:0 dropped:0 overruns:0 frame:0
              TX packets:536101 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0
              RX bytes:59243777 (56.4 MiB)  TX bytes:59243777 (56.4 MiB)

    My NICs are bonded and hence bond0 is the interface I use.
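
    If the address you want to appear on outgoing packets is not present yet, it can be added by hand as a labelled secondary (aliased) address. A minimal sketch, assuming bond0 and the 192.168.1.1 address used throughout this post (heartbeat normally brings this address up for you when the IP resource starts, so this is mainly useful for testing):

    # Add 192.168.1.1 as a secondary address on bond0 and verify
    ip addr add 192.168.1.1/16 brd + dev bond0 label bond0:1
    ip addr show bond0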

    Setting Up SNAT

    In Linux, iptables can be used to set up SNAT.
    To change the source IP address on all packets leaving the box, regardless of destination, the following rule can be used:

    $ sudo /sbin/iptables -t nat -A POSTROUTING -o bond0 -j SNAT --to-source 192.168.1.1

    The result can be seen as follows:

    $ sudo /sbin/iptables -t nat -L
    Chain PREROUTING (policy ACCEPT)
    target     prot opt source               destination
    
    Chain POSTROUTING (policy ACCEPT)
    target     prot opt source               destination
    SNAT       all  --  anywhere             anywhere            to:192.168.1.1
    
    Chain OUTPUT (policy ACCEPT)
    target     prot opt source               destination

    I normally restrict SNAT to selected services and destination IP addresses only. The following three iptables commands respectively translate the source address to 192.168.1.1 for all packets destined for 10.199.65.191, for ICMP packets destined for 192.168.2.4 only, and for all packets destined for the network 192.168.1.0/24:

    $ sudo /sbin/iptables -t nat -A POSTROUTING -d 10.199.65.191 -o bond0 -j SNAT --to-source 192.168.1.1
    $ sudo /sbin/iptables -t nat -A POSTROUTING -d 192.168.2.4 -p ICMP -o bond0 -j SNAT --to-source 192.168.1.1
    $ sudo /sbin/iptables -t nat -A POSTROUTING -d 192.168.1.0/24 -o bond0 -j SNAT --to-source 192.168.1.1

    The result of all these commands can be seen as:

    $ sudo /sbin/iptables -t nat -L
    Chain PREROUTING (policy ACCEPT)
    target     prot opt source               destination
    
    Chain POSTROUTING (policy ACCEPT)
    target     prot opt source               destination
    SNAT       all  --  anywhere             anywhere            to:192.168.1.1
    SNAT       all  --  anywhere             10.199.65.191       to:192.168.1.1
    SNAT       icmp --  anywhere             192.168.2.4         to:192.168.1.1
    SNAT       all  --  anywhere             192.168.1.0/24      to:192.168.1.1
    
    Chain OUTPUT (policy ACCEPT)
    target     prot opt source               destination
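
    A quick way to confirm that the translation is actually taking effect is to watch outgoing traffic and check that packets leave with the service address as their source. A hedged example, reusing the hypothetical destination 10.199.65.191 from above:

    # Packets to the management server should now show 192.168.1.1 as source
    tcpdump -ni bond0 host 10.199.65.191

    # Per-rule packet and byte counters are visible with:
    iptables -t nat -L POSTROUTING -n -v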

    Setting Heartbeat and IPTables for SNAT

    In heartbeat, the /etc/ha.d/haresources file can list the desired IP address as a resource and associate it with a script that starts/stops/restarts these iptables rules.

    $ sudo vi /etc/ha.d/haresources
    node01 192.168.1.1 iptables

    Red Hat and Fedora ship such a script at /etc/init.d/iptables. It reads the file /etc/sysconfig/iptables, which contains the rules in iptables-save format. I created a similar script for Debian and derivative distributions, which reads the rules from the /etc/iptables file. The script is given below:

    #! /bin/sh
    # Script      - iptables
    # Description - Read iptables rules from a file in iptables-save format.
    # Author      - Ajitabh Pandey 
    #
    PATH=/usr/sbin:/usr/bin:/sbin:/bin
    DESC="IPTables Configuration Script"
    NAME=iptables
    DAEMON=/sbin/$NAME
    SCRIPTNAME=/etc/init.d/$NAME
    
    # Exit if the package is not installed
    [ -x "$DAEMON" ] || exit 0
    
    # Load the VERBOSE setting and other rcS variables
    [ -f /etc/default/rcS ] && . /etc/default/rcS
    
    if [ ! -e /etc/iptables ]
    then
            echo "no valid iptables config file found!"
            exit 1
    fi
    
    case "$1" in
      start)
            echo "Starting $DESC:" "$NAME"
            /sbin/iptables-restore /etc/iptables
            ;;
      stop)
            echo "Stopping $DESC:" "$NAME"
            $DAEMON -F -t nat
            $DAEMON -F
            ;;
      restart|force-reload)
            echo "Restarting $DESC:" "$NAME"
            $DAEMON -F -t nat
            $DAEMON -F
            /sbin/iptables-restore /etc/iptables
            ;;
      *)
            echo "Usage: $SCRIPTNAME {start|stop|restart|force-reload}" >&2
            exit 3
            ;;
    esac
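
    Assuming the script above is saved as /etc/init.d/iptables, it can be made executable and exercised by hand before handing it over to heartbeat (a quick sketch; paths as used in the script):

    chmod +x /etc/init.d/iptables

    # Load the rules from /etc/iptables and verify
    /etc/init.d/iptables start
    iptables -t nat -L -n

    # Flush them again
    /etc/init.d/iptables stop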

    The following is a sample iptables rules file in iptables-save format:

    *nat
    :PREROUTING ACCEPT [53:8294]
    :POSTROUTING ACCEPT [55:11107]
    :OUTPUT ACCEPT [55:11107]
    
    # Allow all ICMP packets to be SNATed
    -A POSTROUTING  -p ICMP -o bond0 -j SNAT --to-source 192.168.0.1
    
    # Allow packets destined for SNMP port (161) on local network to be SNATed
    -A POSTROUTING -d 192.168.0.0/24 -p tcp -m tcp --dport snmp -o bond0 -j SNAT --to-source 192.168.0.1
    -A POSTROUTING -d 192.168.0.0/24 -p udp -m udp --dport snmp -o bond0 -j SNAT --to-source 192.168.0.1
    
    # These are for the time servers on the internet (NTP uses UDP port 123)
    -A POSTROUTING -p udp -m udp --dport ntp -o bond0 -j SNAT --to-source 192.168.0.1
    COMMIT
    
    *filter
    :INPUT ACCEPT [0:0]
    :FORWARD ACCEPT [0:0]
    :OUTPUT ACCEPT [144:12748]
    COMMIT
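
    One way to produce such a file is to build the rules interactively with iptables and then dump them into the file that the script reads; the init script then reloads them at start-up. A minimal sketch:

    # Dump the currently loaded rules (all tables) into the file read by the script
    iptables-save > /etc/iptables

    # Reload them later, exactly as the script's start action does
    iptables-restore < /etc/iptables
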
  • Highly Available Apache with mod_jk load balanced Tomcat

    This how-to describes the procedure for setting up a highly available configuration in which a pair of Apache servers load-balances a pair of Tomcat servers. I have tested this configuration with Red Hat Enterprise Linux 4, Fedora Core 5, Debian GNU/Linux 3.1 and CentOS 4. Technically this should work on any Unix capable of running Apache with mod_jk, Tomcat and heartbeat.

    Pre-requisites

    A pair of Linux machines with the Apache web server running in its default configuration.

    Assumptions

    Hostnames – webhst01, webhst02
    Domain – unixclinic.net
    IP addresses – 192.168.1.1, 192.168.1.2

    The Tomcat and Apache set-up itself is described in another post – Setting Up Tomcat and Apache.

    Setting up highly available Apache

    We need to install heartbeat on both machines to make Apache highly available. Heartbeat is in the package set of Fedora, Debian and CentOS, but not of Red Hat Enterprise Linux; the CentOS packages work on Red Hat Enterprise Linux.

    On Fedora and CentOS

    yum install heartbeat

    On Debian

    apt-get install heartbeat

    Configuring heartbeat

    In a high-availability set-up, it is recommended that the heartbeat travel over a separate link. In my setup, however, the servers were in geographically separate locations (two different data centres), so I had to send the heartbeat over the standard Ethernet link. This does not make any difference to the working of the servers.

    Heartbeat can be configured in active-standby mode or active-active mode. In active-standby mode, one host (the primary) remains active for all the HA services while the other remains on standby. In active-active mode, service-1 is primary on one node and service-2 is primary on the second node, so both nodes are active at the same time but offer different services. If either node fails, heartbeat transfers its services to the surviving host.
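
    As an illustration, the two modes differ only in how /etc/ha.d/haresources (discussed below) is written. A hypothetical sketch; the second service IP address and the mysql resource are invented for illustration and are not part of this how-to:

    # Active-standby: webhst01 is primary for everything
    webhst01 192.168.1.3 httpd

    # Active-active: each node is primary for one service
    webhst01 192.168.1.3 httpd
    webhst02 192.168.1.4 mysql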

    /etc/hosts file

    In a highly available environment, all nodes should be able to resolve each other irrespective of whether the DNS server is available. The /etc/hosts file is typically used for this. Add the following two lines to the /etc/hosts file on both hosts.

    192.168.1.1 webhst01.unixclinic.net webhst01
    192.168.1.2 webhst02.unixclinic.net webhst02

    Each server should now be able to ping the other by name.

    /etc/ha.d/ha.cf file

    This is the heartbeat configuration file and it has to be identical on all nodes of the set-up. It is quite possible that you never want more than two nodes in the highly available set-up. If you are sure about that, there is no need to use multicast (or broadcast) for the heartbeat; unicast can be used, in which case the only difference between this file on the two nodes is the IP address of the host to unicast to (see the snippet after the configuration file below).

    # File          - ha.cf
    # Purpose       - Heartbeat configuration file
    #       ATTENTION: As the configuration file is read line by line,
    #               THE ORDER OF DIRECTIVE MATTERS!
    # Author        - Ajitabh Pandey
    # History       -
    #       ver 0.1 -  Created the file on 31st Oct 2006 - AP
    #
    debugfile /var/log/ha-debug # Debug messages are logged here
    logfile /var/log/ha-log     # all heart-beat log messages are stored here
    logfacility     local0      # This is the syslog log facility
    keepalive 2                 # no of secs between two heartbeats
    deadtime 30                 # no of secs after which a host is considered dead if not responding
    warntime 10                 # no of secs after which late heartbeat warning will be issued
    initdead 120                # After starting up heartbeat no of secs to wait for other
                                # host to respond to heartbeat before considering it dead
    udpport 695                 # This is the UDP port for the bcast/ucast communication
    bcast   bond0               # this is the interface to broadcast
    auto_failback on            # resources automatically fail back to their primary node
    node webhst01               # first node in the setup
    node webhst02               # second node in the setup
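
    If unicast is preferred, as mentioned above, the bcast line can be replaced by a ucast directive pointing at the peer node; this is the only line that differs between the two hosts (addresses as per the assumptions above):

    # On webhst01 (peer is webhst02):
    ucast bond0 192.168.1.2

    # On webhst02 (peer is webhst01):
    ucast bond0 192.168.1.1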

    /etc/ha.d/haresources

    This file contains a list of all resources being managed by heartbeat and their primary nodes.

    I will discuss the active-standby mode in this post.

    Here webhst01 will host the IP address 192.168.1.3 and httpd, so any services behind this IP address will also be hosted on it. If this host fails, the IP address will be taken over by webhst02.

    webhst01 192.168.1.3 httpd

    /etc/ha.d/authkeys

    This file must be owned by root and be readable and writeable by root only, otherwise heartbeat will refuse to start. It must also be the same on all nodes.

    auth 1
    1 crc
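
    The crc method performs no real authentication and is only appropriate when the heartbeat link is fully trusted. If the heartbeat travels over a shared network, a keyed method can be used instead; a sketch, where the secret string is just a placeholder:

    auth 1
    1 sha1 SomeSharedSecret

    In either case the file must be mode 0600 (for example, chmod 600 /etc/ha.d/authkeys), otherwise heartbeat will refuse to start.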

    Configuring Apache

    httpd.conf

    Depending on the distribution being used, this file can be present at different locations. Red Hat, Fedora and CentOS have it at /etc/httpd/conf/httpd.conf, while Debian and derivatives have it at /etc/apache2/apache2.conf.

    We will make apache listen only on the service IP address.

    Listen 192.168.1.3:80

    I normally use name-based virtual hosts, even if there is only a single website to be hosted, so I will create a virtual host:

    NameVirtualHost *:80
    <VirtualHost *:80>
        ServerName   webhst.unixclinic.net
        ServerAlias  webhst
        ServerAdmin  webmaster@unixclinic.net
        DocumentRoot /var/www/webhst.unixclinic.net
    </VirtualHost>
    
    Now, since this IP address can be taken over by any machine in the cluster, we have to make sure that the Apache web server does not start automatically at boot; heartbeat will start it on the active node.

    On Red Hat and derivatives this is easy:

    chkconfig httpd off

    On Debian and derivatives:

    update-rc.d -f apache2 remove && update-rc.d apache2 stop 45 S .

    Start heartbeat and do fail-over testing

    On Red Hat and derivatives

    service heartbeat start

    On Debian and derivatives

    invoke-rc.d heartbeat start

    The activity can be seen in the /var/log/ha-log file on both nodes. Fail-over testing can be done as below:

    1. Shut down the primary node while keeping the secondary running. The secondary should take over the services when the primary goes down.
    2. Bring the primary node back up while the secondary is running the services; because auto_failback is on, the services should automatically fail back to the primary.
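
    A simple way to exercise the first test without powering the machine off is to stop heartbeat on the primary and watch the service address move. A sketch, using the addresses assumed above (use invoke-rc.d on Debian instead of service):

    # On the primary (webhst01): stop heartbeat to simulate a failure
    service heartbeat stop

    # On the secondary (webhst02): the service IP should appear and Apache should answer
    ip addr show
    tail -f /var/log/ha-log
    curl -I http://192.168.1.3/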