There is precious little information available about troubleshooting ISC DHCPD failover when it fails. Of course, I am writing about the infamous "dhcpd: failover peer failover-partner: unexpected error" message.
I'm running the latest CentOS and the latest DHCPD, both fully updated.
Here's the relevant section of dhcpd.conf from my primary DHCP server:
# This server is the primary member of a DHCP cluster.
# Here follows the declaration block that defines this server as the primary
# failover peer for this cluster:
failover peer "failover-partner" {
primary;
address 192.168.90.20;
port 519;
peer address 192.168.90.21;
peer port 520;
max-response-delay 60;
max-unacked-updates 10;
mclt 3600;
split 128;
load balance max seconds 3;
}
# The following section sets up OMAPI (Object Management API) between the two
# DHCP failover cluster members:
omapi-port 7911;
omapi-key omapi_key;
key omapi_key {
algorithm hmac-md5;
secret aS/OEvPT22rsCGFivtYqIXwJSoXrSueflS5tAXMVcY2LJA4YQSmUMXnP 3opfSGnZkO6v/yQxvcX0Kd8So09OdA==;
}
And here's the corresponding section from my secondary DHCP server:
# This server is the secondary member of a DHCP cluster.
# Here follows the declaration block that defines this server as the primary
# failover peer for this cluster:
failover peer "failover-partner" {
secondary;
address 192.168.90.21;
port 520;
peer address 192.168.90.20;
port 519;
max-response-delay 60;
max-unacked-updates 10;
load balance max seconds 3;
}
# The following section sets up OMAPI (Object Management API) between the two
# DHCP failover cluster members:
omapi-port 7911;
omapi-key omapi_key;
key omapi_key {
algorithm hmac-md5;
secret aS/OEvPT22rsCGFivtYqIXwJSoXrSueflS5tAXMVcY2LJA4YQSmUMXnP 3opfSGnZkO6v/yQxvcX0Kd8So09OdA==;
}
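Since the two failover declarations are meant to mirror each other (each server's address and port should be the other's peer address and peer port), here is a minimal Python sketch of how that could be checked mechanically. The names parse_failover and mirror_problems are my own hypothetical helpers, not part of ISC DHCP, and the sketch assumes all four statements are written out explicitly in each block rather than left to any defaults.

```python
import re

# Matches only the four addressing statements inside a failover peer block.
STATEMENT = re.compile(r"^(peer address|peer port|address|port)\s+(\S+)\s*;$")


def parse_failover(block):
    """Collect addressing statements from one failover peer block.

    Values are kept in lists so that a repeated statement is visible.
    """
    found = {}
    for raw in block.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        match = STATEMENT.match(line)
        if match:
            found.setdefault(match.group(1), []).append(match.group(2))
    return found


def mirror_problems(primary_block, secondary_block):
    """Return a list of problems; an empty list means the blocks mirror."""
    sides = {
        "primary": parse_failover(primary_block),
        "secondary": parse_failover(secondary_block),
    }
    problems = []
    # Assumption: each statement should appear exactly once per block.
    for side, stmts in sides.items():
        for name in ("address", "port", "peer address", "peer port"):
            count = len(stmts.get(name, []))
            if count != 1:
                problems.append(
                    f"{side}: '{name}' appears {count} times, expected 1"
                )
    if problems:
        return problems
    pri, sec = sides["primary"], sides["secondary"]
    for name in ("address", "port"):
        peer = "peer " + name
        if pri[name] != sec[peer]:
            problems.append(
                f"primary '{name}' {pri[name][0]} != "
                f"secondary '{peer}' {sec[peer][0]}"
            )
        if sec[name] != pri[peer]:
            problems.append(
                f"secondary '{name}' {sec[name][0]} != "
                f"primary '{peer}' {pri[peer][0]}"
            )
    return problems
```

Passing the text of each server's failover peer block to mirror_problems(primary_text, secondary_text) returns a list of mismatches, or an empty list if the two blocks mirror each other.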
Here's the log from the first server, cesrvPubWiFiDHCP01:
tail -f /var/log/messages
Sep 25 15:29:59 cesrvPubWiFiDHCP01 dhcpd: ** Ignoring requests on eno33557248. If this is not what
Sep 25 15:29:59 cesrvPubWiFiDHCP01 dhcpd: you want, please write a subnet declaration
Sep 25 15:29:59 cesrvPubWiFiDHCP01 dhcpd: in your dhcpd.conf file for the network segment
Sep 25 15:29:59 cesrvPubWiFiDHCP01 dhcpd: to which interface eno33557248 is attached. **
Sep 25 15:29:59 cesrvPubWiFiDHCP01 dhcpd:
Sep 25 15:29:59 cesrvPubWiFiDHCP01 dhcpd: Listening on LPF/eno16777984/00:50:56:9a:7c:89/192.168.90.0/24
Sep 25 15:29:59 cesrvPubWiFiDHCP01 dhcpd: Sending on LPF/eno16777984/00:50:56:9a:7c:89/192.168.90.0/24
Sep 25 15:29:59 cesrvPubWiFiDHCP01 dhcpd: Sending on Socket/fallback/fallback-net
Sep 25 15:29:59 cesrvPubWiFiDHCP01 dhcpd: failover peer failover-partner: I move from recover to startup
Sep 25 15:29:59 cesrvPubWiFiDHCP01 systemd: Started DHCPv4 Server Daemon.
Sep 25 15:30:14 cesrvPubWiFiDHCP01 dhcpd: failover peer failover-partner: I move from startup to recover
Sep 25 15:31:29 cesrvPubWiFiDHCP01 dhcpd: failover peer failover-partner: unexpected error
And from the second server, cesrvPubWiFiDHCP02:
tail -f /var/log/messages
Sep 25 15:29:38 cesrvPubWiFiDHCP02 dhcpd: ** Ignoring requests on eno33557248. If this is not what
Sep 25 15:29:38 cesrvPubWiFiDHCP02 dhcpd: you want, please write a subnet declaration
Sep 25 15:29:38 cesrvPubWiFiDHCP02 dhcpd: in your dhcpd.conf file for the network segment
Sep 25 15:29:38 cesrvPubWiFiDHCP02 dhcpd: to which interface eno33557248 is attached. **
Sep 25 15:29:38 cesrvPubWiFiDHCP02 dhcpd:
Sep 25 15:29:38 cesrvPubWiFiDHCP02 dhcpd: Listening on LPF/eno16777984/00:50:56:9a:33:e0/192.168.90.0/24
Sep 25 15:29:38 cesrvPubWiFiDHCP02 dhcpd: Sending on LPF/eno16777984/00:50:56:9a:33:e0/192.168.90.0/24
Sep 25 15:29:38 cesrvPubWiFiDHCP02 dhcpd: Sending on Socket/fallback/fallback-net
Sep 25 15:29:38 cesrvPubWiFiDHCP02 dhcpd: failover peer failover-partner: I move from recover to startup
Sep 25 15:29:38 cesrvPubWiFiDHCP02 systemd: Started DHCPv4 Server Daemon.
Sep 25 15:29:53 cesrvPubWiFiDHCP02 dhcpd: failover peer failover-partner: I move from startup to recover
Sep 25 15:31:08 cesrvPubWiFiDHCP02 dhcpd: failover peer failover-partner: unexpected error
Sep 25 15:32:38 cesrvPubWiFiDHCP02 dhcpd: failover peer failover-partner: unexpected error
I have stopped firewalld (systemctl stop firewalld) to ensure it is not contributing to this problem. NTPD is running correctly on both DHCP servers, and their clocks remain within 150 ms of each other.
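The failover channel runs over TCP, so beyond stopping firewalld, reachability of each peer's failover port can be probed directly. Below is a minimal Python sketch of such a probe; tcp_port_open is a hypothetical helper name of my own, and it only tests whether a TCP connection is accepted, not whether the failover handshake itself succeeds.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port is accepted."""
    try:
        # create_connection raises OSError on refusal, timeout, or no route.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run on the primary, tcp_port_open("192.168.90.21", 520) probes the port the secondary is meant to listen on; run on the secondary, tcp_port_open("192.168.90.20", 519) does the same in the other direction.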
At this point, I am stuck. I don't know what steps to take next.
Any ideas, anyone?