In computing, a cluster is a group of closely linked computers that often appears as a single system to the outside world. There are various types of clusters: high-availability (HA) clusters, load-balancing clusters, compute or high-performance computing (HPC) clusters, and grids.
An HA cluster is also known as a failover cluster and is typically meant to improve service availability rather than performance, by using redundant nodes. There are many models of HA cluster configuration such as active-passive, active-active, N+1, N+M, N-to-1 and N-to-N.
Load-balancing clusters distribute the workload evenly among redundant nodes. The load balancer can use any of several algorithms to distribute the load among the member servers.
HPC clusters are used for highly CPU-intensive compute jobs. There are various types of compute clusters; the common distinguishing factor is how tightly the compute nodes are coupled. Typically, specialised scientific applications run on these clusters, built using libraries that support parallel processing. A popular example of a compute cluster is the Beowulf cluster, which is built from commodity hardware and runs on FOSS systems like FreeBSD or GNU/Linux. Typically, a Beowulf cluster uses either MPI (Message Passing Interface) or PVM (Parallel Virtual Machine) libraries, which allow a programmer to divide a task among the nodes of the cluster and then collect and assemble the results later.
A grid is a special class of compute cluster with possibly heterogeneous nodes that are not as tightly coupled with each other. All nodes in the grid work towards a single problem that requires a great number of CPU cycles and a large amount of data. A grid typically divides the computation into jobs that are independent and do not share data, so the intermediate results of one job do not affect the jobs running on other nodes.
Heartbeat is a piece of software from ‘The High Availability Linux’ project, which provides high-availability clustering solutions for a wide range of *nix operating systems, including (though not limited to) GNU/Linux, FreeBSD and OpenBSD.
The architecture: active-passive HA cluster
The two primary modes of an HA cluster are:
- Active-passive: In active-passive HA clusters, the primary node is active and serves requests. If it fails, the services are transferred to the secondary node through automatic or manual failover.
- Active-active: In active-active HA clusters, both nodes remain active all the time and serve their respective requests. If one of the nodes goes down, the services running on that node are failed over to the other node in the cluster. An active-active HA cluster is therefore used when you have multiple services with high-availability requirements.
The service being served by the HA cluster depends on the IP address, so we first need to distinguish between the ‘administrative address’ and the ‘service address’. Each interface on the cluster nodes should have an administrative address and can optionally have one or more service addresses, depending on the cluster configuration (active-active or active-passive) and state (active or standby).
An administrative address is one that is in control of the operating system and is brought up and down with the OS. A service address is one that is under the control of the Heartbeat software, which then controls its allocation to one of the cluster nodes. The node where this service address should reside, by default, is known as the primary or the active node, and the other node in the cluster is known as a secondary, failover or passive node.
In failover clustering, when a failover happens, the secondary node takes over the service address and becomes active. This is how we will be configuring our cluster.
In our case, the service being offered by the cluster, a reverse-proxy server, depends on the IP address. So we need to take care of the following points:
- Make sure that the nginx server is not started automatically on any node, but is under the control of Heartbeat.
- Nginx depends on the availability of the service IP address; hence, in case of a failover, we need to make sure that nginx is started only after the service IP address has been taken over by the secondary node.
Refer to Table 1 for setting up the networking; for a test environment, a flat network will do. A sample interface configuration is sketched after the table.
Table 1: Primary/secondary node IP addresses

| Parameters/Node | rproxy1.unixclinic.net | rproxy2.unixclinic.net |
|---|---|---|
| eth0 (192.168.1.x is the private subnet for Heartbeat) | 192.168.1.1 | 192.168.1.2 |
| eth1 (administrative address) | 172.202.2.1 | 172.202.2.2 |
| Service address 10.8.0.1 | Primary node | Secondary node |
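To make the table concrete, here is a rough sketch of what the static addressing might look like on rproxy1, assuming a Debian-style /etc/network/interfaces; the gateway value is an assumption, and the service address 10.8.0.1 is deliberately left out because Heartbeat, not the OS, will manage it:

```
# /etc/network/interfaces on rproxy1 (sketch; adapt the addresses for rproxy2)
auto eth0
iface eth0 inet static
    address 192.168.1.1        # private Heartbeat link
    netmask 255.255.255.0

auto eth1
iface eth1 inet static
    address 172.202.2.1        # administrative address
    netmask 255.255.255.0
    gateway 172.202.2.254      # assumed default gateway; adjust to your network
```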
Installation and configuration of Heartbeat
On Debian-based systems, Heartbeat can be installed as follows:
# apt-get install heartbeat-2
On CentOS 5.2, after subscribing to the ‘extras’ repository, execute the following command:
# yum install heartbeat
Typically, in an active-passive HA environment, the nodes of the cluster have an identical set-up, so unless otherwise noted, all the configuration has to be done identically on both nodes. It is not mandatory for the nodes to have identical hardware, but this is recommended for a production environment. Also, running the same OS on both nodes helps from a maintenance and troubleshooting point of view.
In a high-availability environment, all nodes should be able to see one another irrespective of the availability of DNS, so we need to make sure that there are relevant entries in the /etc/hosts file on each node. My hosts file for this set-up looks like what is shown below:
```
# cat /etc/hosts
.......
192.168.1.1    rproxy1
192.168.1.2    rproxy2
```
Now you should be able to ping each node from the other. This makes sure that the Heartbeats from both nodes will see each other, irrespective of DNS. You can build in as much redundancy as you want; for example, you can use bonding for the interface that carries the Heartbeat, trunking on the switch ports, and so on. Whatever you do, just make sure it is not overkill and the set-up is not unnecessarily expensive.
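For example, a quick reachability check over the Heartbeat subnet, run from each node in turn:

```
# ping -c 3 rproxy2    # run on rproxy1
# ping -c 3 rproxy1    # run on rproxy2
```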
The ha.cf file
The main configuration file for Heartbeat is ha.cf, which lists the nodes of the cluster, the communications topology and all the features that are enabled. The order of directives in ha.cf matters, so make sure you take note of that. The minimum ha.cf file will look like the following, on both nodes:
```
# cat /etc/ha.d/ha.cf
use_logd on
keepalive 2
deadtime 30
warntime 10
initdead 120
udpport 695
ucast eth0 192.168.1.1
ucast eth0 192.168.1.2
# bcast eth0
auto_failback on
node rproxy1 rproxy2
```
These options are explained below:

- use_logd specifies the use of the logging daemon to log all messages. This option deprecates the debugfile/logfile/logfacility options, and using it is recommended.
- keepalive specifies the number of seconds between two heartbeats.
- deadtime specifies the number of seconds after which a host is considered dead if it is not responding.
- warntime specifies the number of seconds after which a late-Heartbeat warning is issued.
- initdead is the number of seconds to wait for the other host after starting Heartbeat, before it is considered dead.
- udpport is the port number used for bcast/ucast communication. The default value is 694, but I have used 695 because I had a pair of HA LM-1500 load-balancing appliances from Kemp Technologies on the same subnet for some Citrix servers. The appliance is based on Linux and appeared to be using Heartbeat for HA on the default UDP port 694.
- bcast/ucast specifies the interface on which to broadcast/unicast. If you are planning to make this only a two-node cluster, there is no need to send broadcasts; use unicast instead. You will notice that one of the IP addresses to which unicast is sent is the local machine itself. I have done this to keep the ha.cf file identical on both cluster nodes; unicast directives addressed to the local machine are effectively ignored.
Note: If you have changed the default UDP port (as I have done above), make sure that the 'ucast' or 'bcast' line comes after the 'udpport' line; otherwise, the default port 694 will be used. Remember that the order of directives in ha.cf matters.
- If auto_failback is set to 'on', resources are automatically failed back to their primary node.
- The node directive specifies the nodes in the HA set-up. The names specified here must match the output of uname -n on each cluster node.
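A quick way to confirm the node names is to run uname -n on each machine; with the hostnames used in this set-up, it should print rproxy1 on the primary node and rproxy2 on the secondary:

```
# uname -n
rproxy1
```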
The haresources file
Now we need to tell Heartbeat about the resources the cluster will be managing. There are two ways of doing this: one, by using the haresources file; and two, by enabling the Cluster Resource Manager (CRM) and using cib.xml. In Heartbeat 2.x versions, if CRM is enabled, haresources is not used.
Setting up clustering using CRM is unnecessarily complicated for our simple set-up, although the Linux-HA project provides command line and GUI tools to manage it. In our set-up, we will use Heartbeat R1 style clustering (named for its compatibility with the older 1.x releases of Heartbeat). Once you get this working, you can move towards setting up Heartbeat R2 style clustering.
If you are using the haresources file for your set-up, you need to make sure this file is identical on both machines. The general syntax of this file is the preferred node followed by the list of resources that will run on it. All the resources specified on a single line are called a resource group; to continue on the next line, a '\' can be used. The first resource in each resource group (in case you are specifying multiple resource groups) needs to be unique, because it is used as the resource group name. The preferred node is the one where the listed resources will run by default when both (or all) nodes of the cluster are available and the auto_failback option is set to 'on' in the ha.cf file.
Shown below is my haresources file:
```
# cat /etc/ha.d/haresources
rproxy1 10.8.0.1 nginx
```
While taking over or acquiring resources, Heartbeat uses left-to-right ordering; while releasing them, it uses right-to-left ordering. All resources under cluster control must have a resource control script, typically located in either the /etc/init.d or the /etc/ha.d/resource.d directory. Any script that accepts at least the two parameters start and stop, to start and stop the resource respectively, can be used as a resource control script. The IP address used in our haresources file is the service IP address, where our nginx reverse proxy server will serve requests. In DNS, this IP is mapped to www.unixclinic.net, which is what customers will access. You can see that we have not named any resource control script to acquire the IP address. The reason is that a service IP address is typically a requirement for every HA cluster we set up, so Heartbeat, by default, uses a resource control script called IPaddr to acquire it. The haresources file could therefore also have been written as:
rproxy1 IPaddr::10.8.0.1 nginx
…or if we want to specify netmask, interface or broadcast values for the service IP, then we can use the following syntax:
rproxy1 IPaddr::10.8.0.1/255.255.255.0/eth1/255.255.255.255
In our haresources file, we have only specified the service IP address for the cluster; we have not specified which interface will acquire this IP address, nor the netmask and broadcast values. In such cases, these values are set automatically by Heartbeat by looking at the routing table. Heartbeat attempts to find the lowest-cost route to the service IP address, and if multiple interfaces provide the lowest-cost route, the first such route is used. This basically means that the default route of the system is the least preferred. For the broadcast address, the largest available address is used. For details on this, see the box titled 'IPaddr versus IPaddr2 Cluster Resource Manager'.
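If you are curious which interface Heartbeat is likely to select for the service address, you can preview the kernel's routing decision yourself; this is just a sanity check, not part of the Heartbeat configuration:

```
# ip route get 10.8.0.1    # shows the route (and hence the interface) used to reach the service IP
# ip route show            # the routing table that Heartbeat consults
```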
The second resource, nginx, is the name of the start-up script used to start and stop the nginx resource; by default, Heartbeat looks for it in either /etc/init.d or /etc/ha.d/resource.d. The line in haresources is translated as follows:

- On Heartbeat start-up, acquire the IP address 10.8.0.1 and then start nginx on node rproxy1
- When Heartbeat is stopped, stop nginx and then release the IP address 10.8.0.1 on node rproxy1
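Conceptually, this resource-group line makes Heartbeat call the resource control scripts roughly as follows; this is a simplified sketch of the ordering, not the exact commands Heartbeat runs:

```
# on takeover (left to right)
/etc/ha.d/resource.d/IPaddr 10.8.0.1 start
/etc/init.d/nginx start

# on release (right to left)
/etc/init.d/nginx stop
/etc/ha.d/resource.d/IPaddr 10.8.0.1 stop
```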
IPaddr versus IPaddr2 Cluster Resource Manager

Heartbeat uses either the IPaddr or the IPaddr2 resource script to configure IPv4 service addresses; by default, the IPaddr script is used. The cluster resource manager scripts are located in the /etc/ha.d/resource.d directory. The basic syntax for both scripts is:

IPaddr::ip-address[/netmask][/interface][/broadcast]

IPaddr2::ip-address[/netmask][/interface][/broadcast]

The difference between the two is that the IPaddr script uses the old method of IP aliasing, whereas IPaddr2 uses the newer iproute2 method of setting a secondary IP address. There is a limit of 100 aliases with the IPaddr script, while IPaddr2 has no such limit. If you use IPaddr2 (which is preferable, and now the default cluster resource manager script in Heartbeat), you will not be able to see the acquired service address using the ifconfig command; use the ip command instead.
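For instance, on the active node you can check the service address like this (eth1 is assumed from Table 1):

```
# ip addr show eth1    # lists the service address added by IPaddr2
# ifconfig eth1        # does not list addresses added with the ip command
```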
The authkeys file
The authkeys file is very important in maintaining the security of the cluster, as it authenticates the cluster nodes. This file should be owned by root with permissions set to 600 (that is, readable and writable only by root); otherwise, Heartbeat will refuse to start. All the nodes of the cluster should also have an identical authkeys file. Heartbeat supports three authentication methods: crc, md5 and sha1. If you use a dedicated serial or crossover connection for the Heartbeat, crc is a good choice; for set-ups where the Heartbeat travels over the network, sha1 is better. We will use sha1 in our set-up. The following is the authkeys file in our set-up:
```
# cat /etc/ha.d/authkeys
auth 1
1 sha1 ThisIsMySecretKeyAndICanChooseAnyStringHere
```
Typically, the key number used is 1, but you can use any number from 1 to 15. Just make sure that whatever number you use on the auth line is present in one of the keys listed on the following lines. To generate a truly random secret key for sha1, use the following command, as suggested on the Linux-HA website and elsewhere on the Web, and replace the string "ThisIsMySecretKeyAndICanChooseAnyStringHere" with its output:
# dd if=/dev/urandom count=4 2>/dev/null | openssl dgst -sha1
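Putting it together, one way to generate the key and create the file with the required ownership and permissions might look like this; the awk step simply strips any prefix openssl prints before the digest:

```
# key=$(dd if=/dev/urandom count=4 2>/dev/null | openssl dgst -sha1 | awk '{print $NF}')
# printf 'auth 1\n1 sha1 %s\n' "$key" > /etc/ha.d/authkeys
# chown root:root /etc/ha.d/authkeys
# chmod 600 /etc/ha.d/authkeys
```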
Controlling the start-up of cluster resources
Since our proxy server functionality is dependent on the IP address, we need to make sure that the nginx server does not get started automatically by the system start-up scripts during system boot. On Debian and its derivatives, this can be done as follows:
```
# invoke-rc.d nginx stop
# update-rc.d -f nginx remove && update-rc.d nginx stop 45 4 5 .
```
Note how the update-rc.d script is used above. If we had simply removed nginx from the start-up sequence with the remove command, a later update of nginx through apt-get would have recreated the missing symbolic links in the rc?.d directories. That is not what we want, and it would bring our cluster down. To stop the post-installation/update scripts from creating or updating these symlinks, and to leave nginx disabled by default, we have created stop symlinks in run-levels 4 and 5 (/etc/rc4.d/K??nginx and /etc/rc5.d/K??nginx). update-rc.d is designed to ignore creating or updating any symlinks if something matching /etc/rc?.d/[SK]??name (where 'name' is nginx, in our case) already exists; this ensures that a package update never changes an existing configuration. For details, read the update-rc.d man page.
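You can verify that only stop (K) links remain for nginx with a quick listing:

```
# ls -l /etc/rc?.d/*nginx*    # only K??nginx entries should remain
```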
On RHEL and derivatives:
```
# /sbin/service nginx stop
# /sbin/chkconfig --del nginx
```
Starting the cluster
Now that we are all set to start the cluster, execute the following commands as per your distribution:
# invoke-rc.d heartbeat start ## on Debian
or:
# /sbin/service heartbeat start ## on RHEL
Our cluster will be up and running in a short time. You can check the availability of the service address and the running nginx instance. To test cluster failover and failback, you can start playing with various options, like unplugging the network cable from the back of the primary node, switching it off physically, and so on.
I will leave testing in your capable hands.
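A few quick checks you can run on the node you expect to be active, using the addresses from Table 1:

```
# ip addr show eth1          # the service address 10.8.0.1 should be present on the active node
# ps -C nginx -o pid,cmd     # nginx should be running on the active node only
# curl -I http://10.8.0.1/   # the reverse proxy should answer on the service address
```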
Further tuning
I would strongly recommend that you set up the cluster to use the ipfail plug-in to ensure proper failover if a network problem occurs. Sometimes, in a large environment where the nodes of the cluster are far apart, the Heartbeat link stays alive and the cluster thinks it is functioning properly, yet because a switch or router to which the primary node is connected has failed, the resources of the cluster as a whole are not available to users. If you have configured ipfail in your cluster, Heartbeat on each node can continuously monitor whether it can reach a resource on the network (typically a switch or a router); this resource should not be a member of the cluster. If Heartbeat detects ping failures, it directly queries the other node to find out whether it has also detected them. If the other node reports that its connectivity is okay, the services are failed over to that node.
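A minimal ipfail configuration adds two directives to ha.cf on both nodes. The ping target below (the default gateway of the administrative network) and the plug-in path are assumptions; on some 64-bit distributions the plug-in lives under /usr/lib64/heartbeat instead:

```
# a ping node outside the cluster, e.g., the default gateway
ping 172.202.2.254
# run ipfail as the cluster user and restart it if it exits
respawn hacluster /usr/lib/heartbeat/ipfail
```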
Path to better redundancy
So over the last two articles, we have configured a load-balanced, highly available cluster of Web servers. There are still many shortcomings in achieving redundancy for a Web infrastructure; one of them is data centre redundancy. In the next article in this series, we will look at setting up a redundant data centre for this environment, as well as at networking set-ups like DNS and firewalls, and the positioning of various components.
References
- Nginx Home Page
- The HA Linux Project
- Beowulf Clusters
- Sun Grid Engine
- Wikipedia—Computer Cluster
- Wikipedia—High-Availability Cluster
- Wikipedia—Load Balancing (Computing)
- Wikipedia—Beowulf (Computing)
- Wikipedia—Grid Computing
- Wikipedia—Sun Grid Engine
- Loadmaster LM-1500
- Wikipedia- Multicast
- Wikipedia—Unicast
- Microsoft KB—Differences between Unicast and Multicast
- Linux Advanced Routing and Traffic Control