DHCPD failover failing

johlsson
Posts: 5
Joined: 2015/02/11 21:01:29

DHCPD failover failing

Post by johlsson » 2015/09/25 22:38:39

Hello,

There is precious little information available about troubleshooting ISC DHCPD failover when it fails. Of course, I am writing about the famous "dhcpd: failover peer failover-partner: unexpected error" error.

I'm running the latest CentOS, the latest DHCPD, all updated.

Here's the relevant section of dhcpd.conf from my primary DHCP server:
  • # This server is the primary member of a DHCP cluster.
    # Here follows the declaration block that defines this server as the primary
    # failover peer for this cluster:

    failover peer "failover-partner" {
    primary;
    address 192.168.90.20;
    port 519;
    peer address 192.168.90.21;
    peer port 520;
    max-response-delay 60;
    max-unacked-updates 10;
    mclt 3600;
    split 128;
    load balance max seconds 3;
    }

    # The following section sets up OMAPI (Object Management API) between the two
    # DHCP failover cluster members:

    omapi-port 7911;
    omapi-key omapi_key;
    key omapi_key {
    algorithm hmac-md5;
    secret aS/OEvPT22rsCGFivtYqIXwJSoXrSueflS5tAXMVcY2LJA4YQSmUMXnP3opfSGnZkO6v/yQxvcX0Kd8So09OdA==;
    }
Here is the relevant section of the secondary DHCP server:
  • # This server is the secondary member of a DHCP cluster.
    # Here follows the declaration block that defines this server as the secondary
    # failover peer for this cluster:

    failover peer "failover-partner" {
    secondary;
    address 192.168.90.21;
    port 520;
    peer address 192.168.90.20;
    port 519;
    max-response-delay 60;
    max-unacked-updates 10;
    load balance max seconds 3;
    }

    # The following section sets up OMAPI (Object Management API) between the two
    # DHCP failover cluster members:

    omapi-port 7911;
    omapi-key omapi_key;
    key omapi_key {
    algorithm hmac-md5;
    secret aS/OEvPT22rsCGFivtYqIXwJSoXrSueflS5tAXMVcY2LJA4YQSmUMXnP3opfSGnZkO6v/yQxvcX0Kd8So09OdA==;
    }
When I start the DHCP daemon, here's the /var/log/messages output from the primary server:
  • tail -f /var/log/messages
    Sep 25 15:29:59 cesrvPubWiFiDHCP01 dhcpd: ** Ignoring requests on eno33557248. If this is not what
    Sep 25 15:29:59 cesrvPubWiFiDHCP01 dhcpd: you want, please write a subnet declaration
    Sep 25 15:29:59 cesrvPubWiFiDHCP01 dhcpd: in your dhcpd.conf file for the network segment
    Sep 25 15:29:59 cesrvPubWiFiDHCP01 dhcpd: to which interface eno33557248 is attached. **
    Sep 25 15:29:59 cesrvPubWiFiDHCP01 dhcpd:
    Sep 25 15:29:59 cesrvPubWiFiDHCP01 dhcpd: Listening on LPF/eno16777984/00:50:56:9a:7c:89/192.168.90.0/24
    Sep 25 15:29:59 cesrvPubWiFiDHCP01 dhcpd: Sending on LPF/eno16777984/00:50:56:9a:7c:89/192.168.90.0/24
    Sep 25 15:29:59 cesrvPubWiFiDHCP01 dhcpd: Sending on Socket/fallback/fallback-net
    Sep 25 15:29:59 cesrvPubWiFiDHCP01 dhcpd: failover peer failover-partner: I move from recover to startup
    Sep 25 15:29:59 cesrvPubWiFiDHCP01 systemd: Started DHCPv4 Server Daemon.
    Sep 25 15:30:14 cesrvPubWiFiDHCP01 dhcpd: failover peer failover-partner: I move from startup to recover
    Sep 25 15:31:29 cesrvPubWiFiDHCP01 dhcpd: failover peer failover-partner: unexpected error
And again, the same from the secondary server:
  • tail -f /var/log/messages
    Sep 25 15:29:38 cesrvPubWiFiDHCP02 dhcpd: ** Ignoring requests on eno33557248. If this is not what
    Sep 25 15:29:38 cesrvPubWiFiDHCP02 dhcpd: you want, please write a subnet declaration
    Sep 25 15:29:38 cesrvPubWiFiDHCP02 dhcpd: in your dhcpd.conf file for the network segment
    Sep 25 15:29:38 cesrvPubWiFiDHCP02 dhcpd: to which interface eno33557248 is attached. **
    Sep 25 15:29:38 cesrvPubWiFiDHCP02 dhcpd:
    Sep 25 15:29:38 cesrvPubWiFiDHCP02 dhcpd: Listening on LPF/eno16777984/00:50:56:9a:33:e0/192.168.90.0/24
    Sep 25 15:29:38 cesrvPubWiFiDHCP02 dhcpd: Sending on LPF/eno16777984/00:50:56:9a:33:e0/192.168.90.0/24
    Sep 25 15:29:38 cesrvPubWiFiDHCP02 dhcpd: Sending on Socket/fallback/fallback-net
    Sep 25 15:29:38 cesrvPubWiFiDHCP02 dhcpd: failover peer failover-partner: I move from recover to startup
    Sep 25 15:29:38 cesrvPubWiFiDHCP02 systemd: Started DHCPv4 Server Daemon.
    Sep 25 15:29:53 cesrvPubWiFiDHCP02 dhcpd: failover peer failover-partner: I move from startup to recover
    Sep 25 15:31:08 cesrvPubWiFiDHCP02 dhcpd: failover peer failover-partner: unexpected error
    Sep 25 15:32:38 cesrvPubWiFiDHCP02 dhcpd: failover peer failover-partner: unexpected error
And, of course, that last line repeats every ninety seconds on both servers.

I have stopped firewalld (systemctl stop firewalld) to ensure it is not contributing to this problem. NTPD is running correctly on both DHCP servers, and their clocks remain within 150 ms of each other.

At this point, I am stuck. I don't know what steps to take next.

Any ideas, anyone?

brigame
Posts: 1
Joined: 2015/09/28 08:41:00

Re: DHCPD failover failing

Post by brigame » 2015/09/28 09:44:09

You should use a separate failover peer name in dhcpd.conf on each server.
Example:
For failover-partner1
failover peer "failover-partner1"{
# your configuration
}
For failover-partner2
failover peer "failover-partner2"{
# your configuration
}

johlsson
Posts: 5
Joined: 2015/02/11 21:01:29

Re: DHCPD failover failing

Post by johlsson » 2015/09/29 21:58:20

Thanks, man. I'll try that out right away.

johlsson
Posts: 5
Joined: 2015/02/11 21:01:29

Re: DHCPD failover failing

Post by johlsson » 2015/09/29 23:31:32

brigame,

This did nothing for me. I got the same unexpected errors.

I'm curious, though: the example given by ISC doesn't use different names for the primary and secondary servers: https://kb.isc.org/article/AA-00502/0/A ... lover.html:
  • 5) Add declaration blocks for the failover peers to the configuration files on the primary:

    failover peer "failover-partner" {
    primary;
    address dhcp-primary.example.com;
    port 519;
    peer address dhcp-secondary.example.com;
    peer port 520;
    max-response-delay 60;
    max-unacked-updates 10;
    mclt 3600;
    split 128;
    load balance max seconds 3;
    }

    ..and secondary:

    failover peer "failover-partner" {
    secondary;
    address dhcp-secondary.example.com;
    port 520;
    peer address dhcp-primary.example.com;
    peer port 519;
    max-response-delay 60;
    max-unacked-updates 10;
    load balance max seconds 3;
    }

johlsson
Posts: 5
Joined: 2015/02/11 21:01:29

Re: DHCPD failover failing

Post by johlsson » 2015/10/02 15:40:23

brigame, I can confirm that the cluster name doesn't have to be different on each cluster member, contrary to what you wrote here. I have solved the problem with the cluster failing to form.

johlsson
Posts: 5
Joined: 2015/02/11 21:01:29

Re: DHCPD failover failing

Post by johlsson » 2015/10/02 18:34:30

SELinux. That is what was stopping the DHCP failover cluster from forming.

The guide to configuring DHCP failover in the ISC knowledge base suggests using TCP ports 519 and 520 for failover protocol communication between the two cluster members. As my example configurations above show, I dutifully did that. I also made sure to allow those ports through firewalld (although I got the same result with firewalld stopped). Yet the cluster still would not form: "unexpected error".

In the /var/log/audit/audit.log file, I found the answer (repeated many times):

type=AVC msg=audit(1443033016.881:3911): avc: denied { name_bind } for pid=30742 comm="dhcpd" src=519 scontext=system_u:system_r:dhcpd_t:s0 tcontext=system_u:object_r:hi_reserved_port_t:s0 tclass=tcp_socket

SELinux was refusing to let dhcpd bind to TCP port 519 (the name_bind denial above). With no listener on the failover port, the cluster members could not communicate with each other, so of course the cluster would not form.
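
To check quickly whether dhcpd is hitting this kind of denial, the audit log can be filtered for name_bind records. The following is a minimal sketch, not a definitive tool: the function name denied_dhcpd_ports is made up for illustration, and the log path is passed as an argument because it depends on the system (on CentOS 7, auditd normally writes to /var/log/audit/audit.log).

```shell
# Hypothetical helper: list the distinct ports that SELinux refused to let
# dhcpd bind to, based on the AVC records in an audit log file.
# Usage: denied_dhcpd_ports /path/to/audit.log
denied_dhcpd_ports() {
    grep 'avc: *denied' "$1" \
        | grep 'name_bind' \
        | grep 'comm="dhcpd"' \
        | sed -n 's/.* src=\([0-9][0-9]*\).*/\1/p' \
        | sort -un
}
```

Run against a log containing the record quoted above, this should print 519. Reading the audit log usually requires root.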

I had two choices here:
  1. figure out how to configure SELinux to allow port 519 (520 on the other cluster member)
  2. configure dhcpd to use a known port
This protocol has been defined for some years now, and has TCP and UDP ports assigned to it by IANA: 647 and 847. These are listed in the /etc/services file.

To make my cluster form, I removed the port statements from the failover declaration sections of /etc/dhcp/dhcpd.conf, which has the effect of making DHCPD listen on the default ports as listed in /etc/services.
  • # This server is the primary member of a DHCP cluster.
    # Here follows the declaration block that defines this server as the primary
    # failover peer for this cluster:

    failover peer "dhcp-cluster" {
    primary;
    address 192.168.90.20;
    peer address 192.168.90.21;
    max-response-delay 60;
    max-unacked-updates 10;
    mclt 3600;
    split 128;
    load balance max seconds 3;
    }
I edited the firewalld service file I had created for the DHCP failover protocol to reflect the default port 647:
  • <?xml version="1.0" encoding="utf-8"?>
    <service>
    <short>DHCP Failover Service</short>
    <description>ISC DHCPD failover protocol messages from peer.</description>
    <port protocol="tcp" port="647"/>
    </service>
Then, after starting the firewalld service, I started the dhcpd service. The cluster formed immediately. Here's the output from /var/log/messages:
  • Oct 1 16:25:44 cesrvPubWiFiDHCP01 dhcpd: Internet Systems Consortium DHCP Server 4.2.5
    Oct 1 16:25:44 cesrvPubWiFiDHCP01 dhcpd: Copyright 2004-2013 Internet Systems Consortium.
    Oct 1 16:25:44 cesrvPubWiFiDHCP01 dhcpd: All rights reserved.
    Oct 1 16:25:44 cesrvPubWiFiDHCP01 dhcpd: For info, please visit https://www.isc.org/software/dhcp/
    Oct 1 16:25:44 cesrvPubWiFiDHCP01 dhcpd: Not searching LDAP since ldap-server, ldap-port and ldap-base-dn were not specified in the config file
    Oct 1 16:25:44 cesrvPubWiFiDHCP01 dhcpd: Internet Systems Consortium DHCP Server 4.2.5
    Oct 1 16:25:44 cesrvPubWiFiDHCP01 dhcpd: Copyright 2004-2013 Internet Systems Consortium.
    Oct 1 16:25:44 cesrvPubWiFiDHCP01 dhcpd: All rights reserved.
    Oct 1 16:25:44 cesrvPubWiFiDHCP01 dhcpd: For info, please visit https://www.isc.org/software/dhcp/
    Oct 1 16:25:44 cesrvPubWiFiDHCP01 dhcpd: Wrote 8 leases to leases file.
    Oct 1 16:25:44 cesrvPubWiFiDHCP01 dhcpd:
    Oct 1 16:25:44 cesrvPubWiFiDHCP01 dhcpd: No subnet declaration for eno33557248 (10.10.35.69).
    Oct 1 16:25:44 cesrvPubWiFiDHCP01 dhcpd: ** Ignoring requests on eno33557248. If this is not what
    Oct 1 16:25:44 cesrvPubWiFiDHCP01 dhcpd: you want, please write a subnet declaration
    Oct 1 16:25:44 cesrvPubWiFiDHCP01 dhcpd: in your dhcpd.conf file for the network segment
    Oct 1 16:25:44 cesrvPubWiFiDHCP01 dhcpd: to which interface eno33557248 is attached. **
    Oct 1 16:25:44 cesrvPubWiFiDHCP01 dhcpd:
    Oct 1 16:25:44 cesrvPubWiFiDHCP01 dhcpd: Listening on LPF/eno16777984/00:50:56:9a:7c:89/192.168.90.0/24
    Oct 1 16:25:44 cesrvPubWiFiDHCP01 dhcpd: Sending on LPF/eno16777984/00:50:56:9a:7c:89/192.168.90.0/24
    Oct 1 16:25:44 cesrvPubWiFiDHCP01 dhcpd: Sending on Socket/fallback/fallback-net
    Oct 1 16:25:44 cesrvPubWiFiDHCP01 dhcpd: failover peer dhcp-cluster: I move from recover to startup
    Oct 1 16:25:44 cesrvPubWiFiDHCP01 dhcpd: failover peer dhcp-cluster: peer moves from unknown-state to recover
    Oct 1 16:25:44 cesrvPubWiFiDHCP01 dhcpd: failover peer dhcp-cluster: requesting full update from peer
    Oct 1 16:25:44 cesrvPubWiFiDHCP01 dhcpd: failover peer dhcp-cluster: I move from startup to recover
    Oct 1 16:25:44 cesrvPubWiFiDHCP01 dhcpd: Sent update request all message to dhcp-cluster
    Oct 1 16:25:44 cesrvPubWiFiDHCP01 dhcpd: failover peer dhcp-cluster: peer moves from recover to recover
    Oct 1 16:25:44 cesrvPubWiFiDHCP01 dhcpd: failover peer dhcp-cluster: requesting full update from peer
    Oct 1 16:25:44 cesrvPubWiFiDHCP01 dhcpd: Update request all from dhcp-cluster: sending update
    Oct 1 16:25:44 cesrvPubWiFiDHCP01 dhcpd: Sent update done message to dhcp-cluster
    Oct 1 16:25:44 cesrvPubWiFiDHCP01 dhcpd: failover peer dhcp-cluster: peer update completed.
    Oct 1 16:25:44 cesrvPubWiFiDHCP01 dhcpd: failover peer dhcp-cluster: I move from recover to recover-done
    Oct 1 16:25:44 cesrvPubWiFiDHCP01 dhcpd: failover peer dhcp-cluster: peer moves from recover to recover-done
    Oct 1 16:25:44 cesrvPubWiFiDHCP01 dhcpd: Both servers have entered recover-done!
    Oct 1 16:25:44 cesrvPubWiFiDHCP01 dhcpd: failover peer dhcp-cluster: I move from recover-done to normal
    Oct 1 16:25:44 cesrvPubWiFiDHCP01 dhcpd: balancing pool 7fa8194e9960 192.168.40.0/24 total 245 free 245 backup 0 lts 122 max-own (+/-)25
    Oct 1 16:25:44 cesrvPubWiFiDHCP01 dhcpd: balanced pool 7fa8194e9960 192.168.40.0/24 total 245 free 123 backup 122 lts 0 max-misbal 37
    Oct 1 16:25:44 cesrvPubWiFiDHCP01 dhcpd: Sending updates to dhcp-cluster.
    Oct 1 16:25:44 cesrvPubWiFiDHCP01 dhcpd: failover peer dhcp-cluster: peer moves from recover-done to normal
    Oct 1 17:25:44 cesrvPubWiFiDHCP01 dhcpd: balancing pool 7fa8194e9960 192.168.40.0/24 total 245 free 123 backup 122 lts 0 max-own (+/-)25
    Oct 1 17:25:44 cesrvPubWiFiDHCP01 dhcpd: balanced pool 7fa8194e9960 192.168.40.0/24 total 245 free 123 backup 122 lts 0 max-misbal 37
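
The state transitions in a log like the one above can be pulled out with a short filter, which makes it easy to see whether a peer ever reaches "normal" or stays stuck in "recover". This is only a sketch under assumptions: the function name last_failover_state is hypothetical, and the pattern assumes the "I move from X to Y" wording that dhcpd 4.2.5 logs here.

```shell
# Hypothetical helper: print the most recent local failover state recorded
# in a syslog file, taken from lines such as
#   dhcpd: failover peer dhcp-cluster: I move from recover-done to normal
# Usage: last_failover_state /var/log/messages
last_failover_state() {
    sed -n 's/.*dhcpd[^:]*: failover peer [^:]*: I move from [a-z-]* to \([a-z-]*\).*/\1/p' "$1" \
        | tail -n 1
}
```

On the failing servers at the start of this thread the result would be "recover"; on the working cluster it is "normal".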
I didn't investigate how to make SELinux behave with the ports that ISC recommended in its DHCP configuration guide. It could be as simple as adding those ports to /etc/services. I have not confirmed that, though.

TrevorH
Site Admin
Posts: 33202
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: DHCPD failover failing

Post by TrevorH » 2015/10/02 19:22:27

Port 647 is already in the SELinux rules, so it's probably the correct port to use. No idea why the upstream manual recommends otherwise.

[root@centos7 ~]# semanage port -l | grep 647
dhcpd_port_t                   tcp      547, 548, 647, 847, 7911
dhcpd_port_t                   udp      67, 547, 548, 647, 847
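
For anyone who wants to keep non-default ports such as 519 and 520, the usual way to extend an SELinux port type is semanage port -a. The sketch below is untested on this cluster and rests on assumptions: the function name port_is_dhcpd_labeled is hypothetical, it reads "semanage port -l" output on standard input, and it only matches individually listed ports, not ranges.

```shell
# Hypothetical helper: succeed if PORT is listed for dhcpd_port_t under PROTO.
# Reads `semanage port -l` output on stdin; port ranges (e.g. 5000-5010)
# are not expanded.
# Usage: semanage port -l | port_is_dhcpd_labeled tcp 519
port_is_dhcpd_labeled() {
    awk -v proto="$1" -v port="$2" '
        $1 == "dhcpd_port_t" && $2 == proto {
            for (i = 3; i <= NF; i++) {
                p = $i
                sub(/,$/, "", p)      # strip the trailing comma
                if (p == port) found = 1
            }
        }
        END { exit !found }'
}

# Example (needs root; on CentOS 7 semanage comes from policycoreutils-python):
# semanage port -l | port_is_dhcpd_labeled tcp 519 \
#     || semanage port -a -t dhcpd_port_t -p tcp 519
```

With the listing above, tcp 647 is reported as labeled and tcp 519 is not, which matches the denial seen in the audit log. The same would be needed for port 520 on the other cluster member.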
