HeartBeat

HeartBeat is high-availability cluster software for Linux that lets you make a service highly available without needing any kind of shared storage. The concept: one cluster node offers a service on a virtual IP address. When that node breaks down, another cluster node takes over the virtual IP address and continues serving.
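Taking over the virtual IP is, at its core, two commands that heartbeat's IP address resource agent runs for you. Done by hand on the surviving node it would look roughly like this (192.168.0.100 and eth0 are assumed example values, not from this setup):

```
# Bring the virtual IP up on this node (assumed address and interface).
ip addr add 192.168.0.100/24 dev eth0
# Send gratuitous ARP replies so switches and peers learn the new location.
arping -U -c 3 -I eth0 192.168.0.100
```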

= Installation =

Process
 * On heartbeat-*, install SLES 10 as per default, but I disable ZMD and the firewall.
 * On tweedleburg, install an iSCSI storage.
 * Establish passwordless login between heartbeat-1 and heartbeat-2.
 * Adapt /etc/hosts on both nodes so that it contains:
   192.168.0.21   heartbeat-1.site heartbeat-1
   192.168.0.22   heartbeat-2.site heartbeat-2
 * Configure the cluster on node 1. Choose MD5 as the auth method and myauth as the password:
   heartbeat-1:~ # yast2 heartbeat
 * Install heartbeat on node 2:
   heartbeat-2:~ # yast -i heartbeat
 * Propagate the heartbeat configuration:
   heartbeat-1:~ # scp /etc/ha.d/ha.cf root@heartbeat-2:/etc/ha.d
   heartbeat-1:~ # scp /etc/ha.d/authkeys root@heartbeat-2:/etc/ha.d
 * Start the cluster:
   heartbeat-1:~ # /etc/init.d/heartbeat start
   heartbeat-2:~ # /etc/init.d/heartbeat start
 * Set a password for the hacluster admin:
   heartbeat-1:~ # passwd hacluster
 * Configure the cluster. Log in to hb_gui as the hacluster user on heartbeat-1:
   heartbeat-1:~ # hb_gui
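The yast2 module writes /etc/ha.d/ha.cf and /etc/ha.d/authkeys for you. A minimal hand-written pair looks roughly like this (node names match the setup above; eth0 and the timing values are assumptions, and the exact directives yast2 emits may differ):

```
# /etc/ha.d/ha.cf -- minimal sketch
logfacility local0
keepalive 2          # heartbeat interval in seconds
deadtime 30          # declare a node dead after 30s of silence
bcast eth0           # send heartbeats over eth0 (assumed interface)
node heartbeat-1
node heartbeat-2
crm yes              # enable the v2 CRM so hb_gui can manage resources

# /etc/ha.d/authkeys -- must be mode 0600 and identical on both nodes
auth 1
1 md5 myauth
```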

Shared storage
 * Build sfex from http://www.linux-ha.org/sfex on both nodes.
 * Set up an iSCSI initiator on both nodes; we assume it makes the device /dev/sdb.
 * Initialize the lock metadata, then test sfex by locking /dev/sdb:
   heartbeat-1:~ # /usr/lib64/heartbeat/sfex_init /dev/sdb
   heartbeat-1:~ # /usr/lib64/heartbeat/sfex_lock /dev/sdb
   acquired lock successfully.
   heartbeat-1:~ # /usr/lib64/heartbeat/sfex_stat /dev/sdb
   control data:
     magic: 0x01, 0x1f, 0x71, 0x7f
     version: 1
     revision: 3
     blocksize: 512
     numlocks: 1
   lock data #1:
     status: lock
     count: 2
     nodename: heartbeat-1
   status is LOCKED.
 * Test that the other cluster node realizes the lock:
   heartbeat-2:~/sfex-1.3 # /usr/lib64/heartbeat/sfex_stat /dev/sdb
   control data:
     magic: 0x01, 0x1f, 0x71, 0x7f
     version: 1
     revision: 3
     blocksize: 512
     numlocks: 1
   lock data #1:
     status: lock
     count: 2
     nodename: heartbeat-1
   status is UNLOCKED.
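sfex implements a simple exclusive lock living on the shared disk. The behaviour tested above can be sketched locally with flock(1) on an ordinary file standing in for /dev/sdb (an analogy to the locking semantics only, not the sfex on-disk format):

```shell
#!/bin/sh
# "Node 1" acquires the lock and holds it, like sfex_lock on /dev/sdb.
# "Node 2" then probes non-blockingly, like sfex_stat, and sees LOCKED.
LOCK=/tmp/sfex-demo.lock

flock "$LOCK" -c 'sleep 2' &        # node 1: take the lock, hold it briefly
sleep 0.5                           # let node 1 acquire it first

if flock -n "$LOCK" -c true; then   # node 2: non-blocking probe
  status="status is UNLOCKED"
else
  status="status is LOCKED"
fi
echo "$status"                      # prints: status is LOCKED
wait                                # reap node 1
```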

Configure cluster resources
In hb_gui, add a new resource group named resource_group. Add the sfex device as the group's first resource, and add start, stop and monitor operations to it.
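Behind the scenes, hb_gui edits the cluster information base (CIB). The XML it creates for this group looks roughly like the following sketch (the ids, timeouts and interval are illustrative, and the sfex agent's `device` parameter name is an assumption to be checked against the agent's metadata):

```
<group id="resource_group">
  <primitive id="sfex_1" class="ocf" provider="heartbeat" type="sfex">
    <instance_attributes id="sfex_1_attrs">
      <attributes>
        <nvpair id="sfex_1_device" name="device" value="/dev/sdb"/>
      </attributes>
    </instance_attributes>
    <operations>
      <op id="sfex_1_start" name="start" timeout="120s"/>
      <op id="sfex_1_stop" name="stop" timeout="120s"/>
      <op id="sfex_1_monitor" name="monitor" interval="10s" timeout="60s"/>
    </operations>
  </primitive>
</group>
```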

= Commands =

To see whether your cluster is up and running, use the cluster resource manager's monitor, e.g. refreshing every five seconds:
crm_mon -i5