[SOLVED] LVS setup : Pulse process not discovering the backup server via heartbeat


Post by bchand » 2012/01/19 00:28:11

All

I have been evaluating whether Linux Virtual Server (bundled with CentOS 6.2) would meet my needs for a highly available web cluster.

In the process I have hit a snag: I cannot get the master pulse process on my lb1 server to detect the backup pulse process on the lb2 server.

As a result, each node thinks the other is not running and both start the LVS services, which the backup node should not do unless the master is really down.

I am sharing my IPs in this post, since they are private 192.168.1.* addresses and not reachable from the internet anyway.

Below is the output of everything I could think of as relevant. Sorry for the long post; if you need additional information, please let me know and I will get it.

Thank you in advance for any assistance in getting this "heartbeat" issue figured out.
Brian

IP layout

192.168.1.40 -- the VIP for the LVS cluster
192.168.1.41 (hostname c1) -- real web server (running Apache, confirmed fully functional)
192.168.1.42 (hostname c2) -- real web server (running Apache, confirmed fully functional)
192.168.1.44 (hostname lb1) -- intended primary LVS server
192.168.1.45 (hostname lb2) -- intended backup LVS server


Contents of /etc/sysconfig/selinux on both lb1 and lb2 (I deliberately switched to permissive on a hunch that SELinux was causing the issue; that assumption appears to be wrong):

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
# enforcing - SELinux security policy is enforced.
# permissive - SELinux prints warnings instead of enforcing.
# disabled - No SELinux policy is loaded.
SELINUX=permissive
# SELINUXTYPE= can take one of these two values:
# targeted - Targeted processes are protected,
# mls - Multi Level Security protection.
SELINUXTYPE=targeted


ifconfig output on lb1 and lb2:

[root@lb1 ha]# ifconfig
eth0 Link encap:Ethernet HWaddr 00:0C:29:EA:5A:CF
inet addr:192.168.1.44 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::20c:29ff:feea:5acf/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:32460 errors:0 dropped:0 overruns:0 frame:0
TX packets:7454 errors:0 dropped:127 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:9502529 (9.0 MiB) TX bytes:995136 (971.8 KiB)

[root@lb2 ~]# ifconfig
eth0 Link encap:Ethernet HWaddr 00:0C:29:51:49:AC
inet addr:192.168.1.45 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::20c:29ff:fe51:49ac/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:33253 errors:0 dropped:0 overruns:0 frame:0
TX packets:6984 errors:0 dropped:90 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:9742659 (9.2 MiB) TX bytes:886201 (865.4 KiB)


Contents of /etc/sysconfig/ha/lvs.cf (confirmed identical on lb1 and lb2):

serial_no = 42
primary = 192.168.1.44
service = lvs
backup_active = 1
backup = 192.168.1.45
heartbeat = 1
heartbeat_port = 539
keepalive = 10
deadtime = 30
network = direct
debug_level = NONE
monitor_links = 0
syncdaemon = 0
virtual web {
    active = 1
    address = 192.168.1.40 eth0:1
    vip_nmask = 255.255.255.0
    port = 80
    expect = "OK"
    use_regex = 1
    send_program = "/usr/bin/http_check %h"
    load_monitor = none
    scheduler = wlc
    protocol = tcp
    timeout = 6
    reentry = 15
    quiesce_server = 1
    server c1 {
        address = 192.168.1.41
        active = 1
        weight = 1
    }
    server c2 {
        address = 192.168.1.42
        active = 1
        weight = 1
    }
}
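Since failover depends entirely on the global heartbeat settings at the top of this file, here is a minimal Python sketch for sanity-checking them. `parse_global_settings` and `heartbeat_problems` are hypothetical helpers (not part of piranha); they read only the top-level `key = value` lines, ignore everything inside the `virtual`/`server` blocks, and apply my own assumed rules of thumb (heartbeat enabled, distinct primary/backup addresses, backup_active = 1, deadtime larger than keepalive).

```python
import re

# Global section of the lvs.cf above, plus a trimmed virtual block.
SAMPLE = """\
serial_no = 42
primary = 192.168.1.44
service = lvs
backup_active = 1
backup = 192.168.1.45
heartbeat = 1
heartbeat_port = 539
keepalive = 10
deadtime = 30
network = direct
virtual web {
    address = 192.168.1.40 eth0:1
    server c1 {
        address = 192.168.1.41
    }
}
"""

KEY_VALUE = re.compile(r"^(\w+)\s*=\s*(.*)$")


def parse_global_settings(text):
    """Return the top-level key = value pairs, skipping nested blocks."""
    settings = {}
    depth = 0  # current brace nesting level
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if depth == 0:
            match = KEY_VALUE.match(line)
            if match:
                settings[match.group(1)] = match.group(2).strip()
        depth += line.count("{") - line.count("}")
    return settings


def heartbeat_problems(settings):
    """List heartbeat-related settings that look wrong (assumed rules)."""
    problems = []
    if settings.get("heartbeat") != "1":
        problems.append("heartbeat is not enabled")
    primary, backup = settings.get("primary"), settings.get("backup")
    if not primary or not backup:
        problems.append("primary or backup address missing")
    elif primary == backup:
        problems.append("primary and backup addresses are identical")
    if settings.get("backup_active") != "1":
        problems.append("backup_active is not 1")
    keepalive = int(settings.get("keepalive", "0"))
    deadtime = int(settings.get("deadtime", "0"))
    if keepalive <= 0 or deadtime <= keepalive:
        problems.append("deadtime should be larger than keepalive")
    return problems


print(heartbeat_problems(parse_global_settings(SAMPLE)))  # → []
```

With the values posted here the sketch reports no problems; running it against each node's own copy is a quick complement to diffing the two files.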



Contents of my iptables rules on lb1 (NOTE: I get the same issue if I turn off iptables completely on both lb1 and lb2):

# Firewall configuration written by system-config-firewall
# Manual customization of this file is not recommended.
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 80 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 3636 -j ACCEPT
-A INPUT -m state --state NEW -m udp -p udp --dport 513 -j ACCEPT
-A INPUT -m state --state NEW -m udp -p udp --dport 539 -j ACCEPT
-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -j REJECT --reject-with icmp-host-prohibited
COMMIT


Contents of my iptables rules on lb2:

# Firewall configuration written by system-config-firewall
# Manual customization of this file is not recommended.
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 80 -j ACCEPT
-A INPUT -m state --state NEW -m udp -p udp --dport 513 -j ACCEPT
-A INPUT -m state --state NEW -m udp -p udp --dport 539 -j ACCEPT
-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -j REJECT --reject-with icmp-host-prohibited
COMMIT


Here is the output of pulse -n -v on the master node (lb1).
NOTE the "partner dead" message and that it starts the LVS services:

pulse: STARTING PULSE AS MASTER
pulse: DEBUG -- setting SEND_heartbeat timer
pulse: DEBUG -- Sending heartbeat...
pulse: DEBUG -- setting SEND_heartbeat timer
pulse: DEBUG -- Sending heartbeat...
pulse: DEBUG -- setting NEED_heartbeat timer
pulse: partner dead: activating lvs
pulse: DEBUG -- setting SEND_heartbeat timer
lvs: starting virtual service web active: 80
lvs: create_monitor for web/c1 running as pid 4499
lvs: create_monitor for web/c2 running as pid 4500
.....


Here is the output of pulse -n -v on the backup node (lb2).
NOTE the "partner dead" message: it starts the LVS services too, which it should not do unless lb1 is truly down.

pulse: STARTING PULSE AS BACKUP
pulse: DEBUG -- setting SEND_heartbeat timer
pulse: DEBUG -- Sending heartbeat...
pulse: DEBUG -- setting SEND_heartbeat timer
pulse: DEBUG -- Sending heartbeat...
pulse: DEBUG -- setting NEED_heartbeat timer
pulse: partner dead: activating lvs
pulse: DEBUG -- setting SEND_heartbeat timer
lvs: starting virtual service web active: 80
pulse: DEBUG -- Executing '/sbin/ifconfig eth0:1 192.168.1.40 netmask 255.255.255.0 up'
lvs: create_monitor for web/c1 running as pid 3443
lvs: create_monitor for web/c2 running as pid 3444
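Both logs follow the same pattern: each node keeps sending heartbeats, never records one from its partner, and then declares the partner dead. The Python sketch below is a deliberately simplified model of the backup's side of that decision, not pulse's actual code. It assumes the usual meaning of the lvs.cf settings (keepalive = seconds between heartbeats, deadtime = seconds of silence before the partner is declared dead), and `delivered` is a hypothetical stand-in for the network.

```python
from typing import Callable, Optional


def backup_takeover_time(keepalive: int, deadtime: int,
                         delivered: Callable[[int], bool],
                         duration: int) -> Optional[int]:
    """Second at which a simplified backup node would activate LVS, or None.

    The master sends a heartbeat every `keepalive` seconds; `delivered(t)`
    says whether the heartbeat sent at second t reaches the backup.
    """
    last_heard = 0  # treat start-up as the last contact with the master
    for t in range(duration + 1):
        if t % keepalive == 0 and delivered(t):
            last_heard = t
        if t - last_heard > deadtime:
            return t  # the point where pulse would log "partner dead"
    return None


# keepalive = 10 and deadtime = 30, as in the lvs.cf above.
print(backup_takeover_time(10, 30, lambda t: True, 300))   # → None
print(backup_takeover_time(10, 30, lambda t: False, 300))  # → 31
```

In the second case the backup takes over even though the master never stopped, which is the split-brain behaviour in the logs above; the master's log shows the mirror image of the same decision.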


Re: LVS setup : Pulse process not discovering the backup server via heartbeat

Post by TrevorH » 2012/01/19 00:40:07

I suspect you don't have all the necessary ports open on your firewall. I don't know which ports it needs, so I would add a new rule at the end of the INPUT chain, just before your existing final REJECT line, to log all packets before they are rejected.

[code]
-A INPUT -j LOG
-A INPUT -j REJECT --reject-with icmp-host-prohibited
[/code]

This will write lines to syslog for any packets that have not already been matched by an ACCEPT rule. If there are packets logged then you can look at those to see which port(s) they are for and add corresponding firewall rules for them. If there aren't any logged then you know it isn't that :-)
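If the LOG rule does catch something, the kernel writes the packet fields as KEY=VALUE pairs (SRC=, DST=, PROTO=, SPT=, DPT= and so on), which on CentOS normally end up in /var/log/messages. Here is a small Python sketch to tally them by source, protocol and destination port; `summarize_log_lines` is a hypothetical helper, and the sample line is made up to match the LOG format rather than taken from this thread.

```python
import re
from collections import Counter

# The iptables LOG target prints fields like "SRC=... PROTO=UDP SPT=539 DPT=539".
FIELD = re.compile(r"\b(SRC|PROTO|DPT)=(\S+)")


def summarize_log_lines(lines):
    """Count (source, protocol, destination port) across iptables LOG lines."""
    counts = Counter()
    for line in lines:
        fields = dict(FIELD.findall(line))
        if "PROTO" in fields:  # skip lines that are not LOG output
            key = (fields.get("SRC"), fields["PROTO"], fields.get("DPT"))
            counts[key] += 1
    return counts


# Made-up example line in the LOG format (not from this thread's logs).
sample = ("Jan 19 00:45:01 lb1 kernel: IN=eth0 OUT= SRC=192.168.1.45 "
          "DST=192.168.1.44 LEN=48 PROTO=UDP SPT=539 DPT=539 LEN=28")
for (src, proto, dport), n in summarize_log_lines([sample]).most_common():
    print(n, src, proto, dport)
```

On a real system you would pass the open log file instead, e.g. `summarize_log_lines(open("/var/log/messages"))`.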


Re: LVS setup : Pulse process not discovering the backup server via heartbeat

Post by bchand » 2012/01/19 01:02:52

Thanks for the quick reply.

I gave it a go and nothing was logged for the port, which I take to be a good thing.

FYI, the heartbeat port is UDP 539 (heartbeat_port in lvs.cf).

Just for the fun of it, I also shut down iptables completely with /etc/init.d/iptables stop and verified that iptables --list was empty. The output of pulse -n -v is the same as in my initial post.

I don't think this is firewall-related, but I am open to other suggestions and opinions.

Thanks
Brian


Re: LVS setup : Pulse process not discovering the backup server via heartbeat

Post by TrevorH » 2012/01/19 01:12:13

That eliminates that theory then!

I reread your post and don't see anything about verifying connectivity, though I suspect you have done so... Can you ping the other server and get a response?


Re: LVS setup : Pulse process not discovering the backup server via heartbeat

Post by bchand » 2012/01/19 01:20:57

Pings look good; here is what I checked:

ping from lb1 to lb2 is good
ping from lb2 to lb1 is good
ping from lb1 to c1 and c2 is good
ping from lb2 to c1 and c2 is good

FWIW, I also ran arp on lb1 and lb2; the entries look correct and arping works too.
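ping and arping only prove that ICMP and ARP get through, while the heartbeat travels over UDP (port 539 here). Below is a minimal Python sketch of a UDP round-trip check, assuming pulse is stopped on both nodes so the port is free, and that it runs as root since 539 is below 1024. `serve_echo` and `probe` are hypothetical helpers, not part of piranha.

```python
import socket


def serve_echo(port, host="0.0.0.0", count=1):
    """Echo back `count` UDP datagrams received on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        for _ in range(count):
            data, peer = sock.recvfrom(2048)
            sock.sendto(data, peer)


def probe(peer_ip, port, timeout=2.0):
    """Send one UDP datagram to peer_ip:port; True if it comes back intact."""
    payload = b"heartbeat-probe"
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(payload, (peer_ip, port))
            data, _ = sock.recvfrom(2048)
        except OSError:  # includes socket.timeout when nothing is echoed
            return False
        return data == payload
```

For example, run `serve_echo(539)` on lb1, then `probe("192.168.1.44", 539)` on lb2, and swap roles to test the other direction. A False result while pings work would suggest that something between the two VMs is dropping UDP.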

LOL, looks like I've got a real head-scratcher here... I'm wondering whether the pulse package has a "feature", but I don't have a RH subscription to confirm it, and a search of the CentOS bug tracker turned up nothing.

I am running lb1 and lb2 on VMware Player. I don't see how that would pose a problem, but I'm mentioning it in case I am mistaken.

Standing by for any further ideas. Thanks for keeping them coming.

Brian


Re: LVS setup : Pulse process not discovering the backup server via heartbeat

Post by bchand » 2012/01/19 02:42:17

I think I "may" be on to something. I found this bug (filed against RHEL 5), and I have seen evidence of the same situation in 6.2.

https://bugzilla.redhat.com/show_bug.cgi?id=725367

The current working theory is that, when pulse is started without command arguments on the backup node, it immediately falls back into master mode.

I am going to get the piranha SRPM for CentOS 6, see whether the patch from this bug applies, and see how it goes.

If anyone in the meantime has a better idea, please pass it along.

Thanks
Brian


Re: LVS setup : Pulse process not discovering the backup server via heartbeat

Post by bchand » 2012/01/19 03:53:21

Well, so much for the Bugzilla theory. The patch doesn't seem to have any positive effect.

Back to the drawing board.....

Anyone got any other ideas? I am fresh out.

Thanks
Brian


Re: LVS setup : Pulse process not discovering the backup server via heartbeat

Post by TrevorH » 2012/01/19 09:30:38

I'd double-check that the packets are actually reaching the other node. On 192.168.1.44 I'd run:

[code]
tshark udp and port 539 and src host 192.168.1.45
[/code]

This should confirm that UDP packets from 192.168.1.45 are arriving successfully on the other machine. The tshark command is part of wireshark.


[SOLVED] LVS setup : Pulse process not discovering the backu

Post by miko » 2012/01/19 10:58:57

Hi,

Are you using some kind of virtualisation platform?


Re: LVS setup : Pulse process not discovering the backup server via heartbeat

Post by bchand » 2012/01/19 23:23:33

I will give this a go tonight and report back with the results.
