Servers Unable to Reconnect to the Internet (via Ethernet) after Router Loses Internet Connection
Hi all,
I'm not so sure how to properly word my issue. My apologies if a similar thread exists, as I couldn't find one.
I have a couple of Dell PowerEdge servers connected via Ethernet to a router provided by my business ISP. One set of servers uses strictly DHCP Ethernet connections and another set uses static IPs. None are in "mixed mode", as there's only one Ethernet connection per server to the router, and all are set to "Connect automatically" under the Network settings. The router connects directly to the internet via RF. I'm not sure how to confirm they're being managed by NetworkManager; however, the connections do appear when I run "nmtui" in a terminal.
Every now and then, the router may lose & regain connection to the internet. However, none of my servers are able to reconnect unless I run "systemctl restart network" under Terminal. I can manually replicate this problem by unplugging my router from a power source and re-connecting. My servers display the same behavior, where they'll never regain internet connectivity unless I restart the network interfaces.
Is there a way of automatically reconnecting or attempting 'x' number of times if the router loses internet connectivity? Perhaps the servers continue seeing an active ethernet connection to the router and are unable to identify a loss in internet connectivity.
Any help would be greatly appreciated, thanks. Let me know if you need to know any additional configurations, as all networking changes were made under the Network tab in "Settings."
Nick
Last edited by keennay on 2020/05/12 16:40:18, edited 1 time in total.
Re: Servers Unable to Reconnect to the Internet (via Ethernet) after Router Loses Internet Connection
These commands show some details:
Code: Select all
nmcli d s
nmcli c s
ip ro
nmcli
You say that you unplug the router from power. Turn it off. That is quite drastic.
How about unplugging the router from RF? The router is still powered and linked to the server, but lacks the WAN connection.
Third, unplug the server from the router. Now the physical link to the router is definitely off.
Code: Select all
mix of DHCP & static IPs
That is scary; easy to misconfigure.
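On the question of whether NetworkManager manages the interfaces, the STATE column of `nmcli d s` is the thing to look at: a device listed as "unmanaged" is not handled by NetworkManager. A minimal sketch for filtering that out, assuming the terse `DEVICE:STATE` format that `nmcli -t -f DEVICE,STATE device status` prints; the function name `unmanaged_devices` is made up for illustration.

```shell
#!/bin/sh
# Print devices that NetworkManager reports as "unmanaged".
# Reads "DEVICE:STATE" lines on stdin, e.g. from:
#   nmcli -t -f DEVICE,STATE device status
unmanaged_devices() {
    awk -F: '$2 == "unmanaged" { print $1 }'
}

# Sample input (hypothetical, shaped like the terse nmcli format).
printf '%s\n' 'p3p1:connected' 'em1:connected' 'em2:unavailable' 'lo:unmanaged' |
    unmanaged_devices
```

On a live system the function would be fed from nmcli directly: `nmcli -t -f DEVICE,STATE device status | unmanaged_devices`.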
Re: Servers Unable to Reconnect to the Internet (via Ethernet) after Router Loses Internet Connection
I meant to say: for me to recreate this issue, I can unplug my router, plug it back in, and my servers would react the same as when my router randomly loses internet connection. In both cases, my servers are still unable to reconnect to the internet, despite the fact my router regained internet connectivity, unless I run "systemctl restart network" or restart the servers.
I can run those commands once I get home and post the results. When I'd mentioned a mix of static and DHCP, I meant 2 servers have static IPs and 2 have a DHCP connection, one Ethernet connection each, not a multi-config per server.
Re: Servers Unable to Reconnect to the Internet (via Ethernet) after Router Loses Internet Connection
Here are the results from the offline server:
Code: Select all
nmcli d s
DEVICE      TYPE      STATE        CONNECTION
p3p1        ethernet  connected    p3p1
em1         ethernet  connected    em1
virbr0      bridge    connected    virbr0
em2         ethernet  unavailable  --
em3         ethernet  unavailable  --
em4         ethernet  unavailable  --
p2p1        ethernet  unavailable  --
p2p2        ethernet  unavailable  --
p3p2        ethernet  unavailable  --
lo          loopback  unmanaged    --
virbr0-nic  tun       unmanaged    --
nmcli c s
NAME    UUID                                  TYPE      DEVICE
p3p1    59fc7fa7-488f-44de-bc84-9bcb63329fef  ethernet  p3p1
em1     7bdfb634-f707-4950-8ca9-ef7b4d3aba87  ethernet  em1
virbr0  f87797ba-83fe-486b-b63f-54b1fef5e42c  bridge    virbr0
em2     5d758ddd-7f80-4d62-a176-ad3b6bc5e84d  ethernet  --
em3     cff6e382-f2a4-44ce-9829-246d212ab13b  ethernet  --
em4     09271d7f-53a7-47de-8787-3a2b6d4d1045  ethernet  --
p2p1    1fb2a3d9-f2ba-4311-931e-287b1a2b3d51  ethernet  --
p2p2    6583b649-8602-4057-bad3-fcc292ba9641  ethernet  --
p3p2    f2bfed52-dd58-4c2d-97a7-b96890dffc16  ethernet  --
ip ro
default via 192.168.1.1 dev p3p1 proto static metric 101
default via 10.1.10.1 dev em1 proto dhcp metric 102
10.1.10.0/24 dev em1 proto kernel scope link src 10.1.10.110 metric 102
192.168.1.0/27 dev p3p1 proto kernel scope link src 192.168.1.3 metric 101
192.168.1.0/27 via 192.168.1.1 dev p3p1 proto static metric 101
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1
nmcli
p3p1: connected to p3p1
"MYRICOM Myri-10G"
ethernet (myri10ge), 00:60:DD:44:11:A4, hw, mtu 1500
ip4 default
inet4 192.168.1.3/27
route4 192.168.1.0/27
route4 0.0.0.0/0
route4 192.168.1.0/27
inet6 fe80::4416:138d:10c3:85fe/64
route6 fe80::/64
route6 ff00::/8
em1: connected to em1
"Broadcom and subsidiaries NetXtreme BCM5720 2-port"
ethernet (tg3), F8:BC:12:4E:E0:D0, hw, mtu 1500
ip6 default
inet4 10.1.10.110/24
route4 0.0.0.0/0
route4 10.1.10.0/24
inet6 2603:3001:292a:8000:414d:93ea:aaf1:9014/64
inet6 2603:3001:292a:8000::7caf/128
inet6 fe80::1a1a:c9ae:b0e3:1f62/64
route6 2603:3001:292a:8000::/64
route6 ::/0
route6 ff00::/8
route6 fe80::/64
route6 2603:3001:292a:8000::7caf/128
virbr0: connected to virbr0
"virbr0"
bridge, 52:54:00:57:4F:18, sw, mtu 1500
inet4 192.168.122.1/24
route4 192.168.122.0/24
em2: unavailable
"Broadcom and subsidiaries NetXtreme BCM5720 2-port"
ethernet (tg3), F8:BC:12:4E:E0:D1, hw, mtu 1500
em3: unavailable
"Broadcom and subsidiaries NetXtreme BCM5720 2-port"
ethernet (tg3), F8:BC:12:4E:E0:D2, hw, mtu 1500
em4: unavailable
"Broadcom and subsidiaries NetXtreme BCM5720 2-port"
ethernet (tg3), F8:BC:12:4E:E0:D3, hw, mtu 1500
p2p1: unavailable
"MYRICOM Myri-10G"
ethernet (myri10ge), 00:60:DD:44:11:B0, hw, mtu 9000
p2p2: unavailable
"MYRICOM Myri-10G"
ethernet (myri10ge), 00:60:DD:44:11:B1, hw, mtu 9000
p3p2: unavailable
"MYRICOM Myri-10G"
ethernet (myri10ge), 00:60:DD:44:11:A5, hw, mtu 9000
lo: unmanaged
"lo"
loopback (unknown), 00:00:00:00:00:00, sw, mtu 65536
virbr0-nic: unmanaged
"virbr0-nic"
tun, 52:54:00:57:4F:18, sw, mtu 1500
DNS configuration:
servers: 2603:3001:292a:8000:1698:7dff:fe39:8997 2001:558:feed::1 2001:558:feed::2
interface: em1
servers: <removed>
domains:<removed>
interface: em1
Use "nmcli device show" to get complete information about known devices and
"nmcli connection show" to get an overview on active connection profiles.
Consult nmcli(1) and nmcli-examples(7) manual pages for complete usage details.
Code: Select all
nmcli d s
DEVICE      TYPE      STATE        CONNECTION
em1         ethernet  connected    em1
p3p1        ethernet  connected    p3p1
virbr0      bridge    connected    virbr0
em2         ethernet  unavailable  --
em3         ethernet  unavailable  --
em4         ethernet  unavailable  --
p2p1        ethernet  unavailable  --
p2p2        ethernet  unavailable  --
p3p2        ethernet  unavailable  --
lo          loopback  unmanaged    --
virbr0-nic  tun       unmanaged    --
nmcli c s
NAME    UUID                                  TYPE      DEVICE
em1     7bdfb634-f707-4950-8ca9-ef7b4d3aba87  ethernet  em1
p3p1    59fc7fa7-488f-44de-bc84-9bcb63329fef  ethernet  p3p1
virbr0  f87797ba-83fe-486b-b63f-54b1fef5e42c  bridge    virbr0
em2     5d758ddd-7f80-4d62-a176-ad3b6bc5e84d  ethernet  --
em3     cff6e382-f2a4-44ce-9829-246d212ab13b  ethernet  --
em4     09271d7f-53a7-47de-8787-3a2b6d4d1045  ethernet  --
p2p1    1fb2a3d9-f2ba-4311-931e-287b1a2b3d51  ethernet  --
p2p2    6583b649-8602-4057-bad3-fcc292ba9641  ethernet  --
p3p2    f2bfed52-dd58-4c2d-97a7-b96890dffc16  ethernet  --
ip ro
default via 10.1.10.1 dev em1 proto dhcp metric 100
default via 192.168.1.1 dev p3p1 proto static metric 101
10.1.10.0/24 dev em1 proto kernel scope link src 10.1.10.110 metric 100
192.168.1.0/27 dev p3p1 proto kernel scope link src 192.168.1.3 metric 101
192.168.1.0/27 via 192.168.1.1 dev p3p1 proto static metric 101
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1
nmcli
em1: connected to em1
"Broadcom and subsidiaries NetXtreme BCM5720 2-port"
ethernet (tg3), F8:BC:12:4E:E0:D0, hw, mtu 1500
ip4 default, ip6 default
inet4 10.1.10.110/24
route4 0.0.0.0/0
route4 10.1.10.0/24
inet6 2603:3001:292a:8000:414d:93ea:aaf1:9014/64
inet6 2603:3001:292a:8000::7caf/128
inet6 fe80::1a1a:c9ae:b0e3:1f62/64
route6 2603:3001:292a:8000::/64
route6 ::/0
route6 fe80::/64
route6 ff00::/8
route6 2603:3001:292a:8000::7caf/128
p3p1: connected to p3p1
"MYRICOM Myri-10G"
ethernet (myri10ge), 00:60:DD:44:11:A4, hw, mtu 1500
inet4 192.168.1.3/27
route4 192.168.1.0/27
route4 0.0.0.0/0
route4 192.168.1.0/27
inet6 fe80::4416:138d:10c3:85fe/64
route6 fe80::/64
route6 ff00::/8
virbr0: connected to virbr0
"virbr0"
bridge, 52:54:00:57:4F:18, sw, mtu 1500
inet4 192.168.122.1/24
route4 192.168.122.0/24
em2: unavailable
"Broadcom and subsidiaries NetXtreme BCM5720 2-port"
ethernet (tg3), F8:BC:12:4E:E0:D1, hw, mtu 1500
em3: unavailable
"Broadcom and subsidiaries NetXtreme BCM5720 2-port"
ethernet (tg3), F8:BC:12:4E:E0:D2, hw, mtu 1500
em4: unavailable
"Broadcom and subsidiaries NetXtreme BCM5720 2-port"
ethernet (tg3), F8:BC:12:4E:E0:D3, hw, mtu 1500
p2p1: unavailable
"MYRICOM Myri-10G"
ethernet (myri10ge), 00:60:DD:44:11:B0, hw, mtu 9000
p2p2: unavailable
"MYRICOM Myri-10G"
ethernet (myri10ge), 00:60:DD:44:11:B1, hw, mtu 9000
p3p2: unavailable
"MYRICOM Myri-10G"
ethernet (myri10ge), 00:60:DD:44:11:A5, hw, mtu 9000
lo: unmanaged
"lo"
loopback (unknown), 00:00:00:00:00:00, sw, mtu 65536
virbr0-nic: unmanaged
"virbr0-nic"
tun, 52:54:00:57:4F:18, sw, mtu 1500
DNS configuration:
servers: <removed>
domains: <removed>
interface: em1
servers: 2603:3001:292a:8000:1698:7dff:fe39:8997 2001:558:feed::1 2001:558:feed::2
interface: em1
Use "nmcli device show" to get complete information about known devices and
"nmcli connection show" to get an overview on active connection profiles.
Consult nmcli(1) and nmcli-examples(7) manual pages for complete usage details.
To re-iterate from my previous post, I was only unplugging my router from the outlet to replicate this issue for this forum: a scenario where my router randomly loses and regains internet connectivity. I'm not sure why my router loses internet in the first place, but it's able to eventually regain it seconds later... whereas my servers never do.
I can unplug the RF; it was just harder to do, as the adapter was screwed on very tightly.
I'm seeing the same results unplugging & re-plugging my RF connection as I do unplugging & re-plugging my router's power: the Ethernet interfaces of my servers appear "connected" under nmcli, yet lack internet connectivity until I run "systemctl restart network".
As for unplugging the server from the router: the link does go down, as "nmcli d s" below shows. However, what's interesting is that even after reconnecting my Ethernet cables to the router, I still have no internet connectivity and must run "systemctl restart network" to regain access.
Code: Select all
nmcli d s
DEVICE      TYPE      STATE        CONNECTION
p3p1        ethernet  connected    p3p1
virbr0      bridge    connected    virbr0
em1         ethernet  unavailable  --
em2         ethernet  unavailable  --
em3         ethernet  unavailable  --
em4         ethernet  unavailable  --
p2p1        ethernet  unavailable  --
p2p2        ethernet  unavailable  --
p3p2        ethernet  unavailable  --
lo          loopback  unmanaged    --
virbr0-nic  tun       unmanaged    --
Regarding the "mix of DHCP & static IPs" being scary and easy to misconfigure: again, I meant that on some of the servers the primary eth1 / eno1 interface has a static IP configuration (these are web servers), whereas on the other servers it has an "auto" DHCP configuration, with only one Ethernet interface per server connected to the router. None of the servers have a mixed static / DHCP configuration, yet all of them are unable to regain internet connectivity when my router has its random hiccups unless I physically access each server and run "systemctl restart network".
From what we saw above, even after disconnecting my Ethernet cables from the router and reconnecting them, my servers are still unable to auto-reconnect to the internet.
The problem here is this prevents a fully remote scenario where I won't have to be at the datacenter to restart these services. I'm looking for a way (or possible config) that allows an auto-reconnection / restart of services once internet is restored.
Thanks
Re: Servers Unable to Reconnect to the Internet (via Ethernet) after Router Loses Internet Connection
Routing 101
Code: Select all
$ ip ro
default via 192.168.1.1 dev p3p1 proto static metric 101
default via 10.1.10.1 dev em1 proto dhcp metric 102
10.1.10.0/24 dev em1 proto kernel scope link src 10.1.10.110 metric 102
192.168.1.0/27 dev p3p1 proto kernel scope link src 192.168.1.3 metric 101
192.168.1.0/27 via 192.168.1.1 dev p3p1 proto static metric 101
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1
A process in this machine creates a new packet. Packet has destination X. X is not this machine.
What should we do with the packet?
This machine is connected to three networks.
* IF X is in 10.1.10.0/24 THEN toss it out from interface em1
* IF X is in 192.168.1.0/27 THEN toss it out from interface p3p1
* IF X is in 192.168.122.0/24 THEN toss it out from interface virbr0
Those are specific rules. Nice and clear.
Wait,
* IF X is in 192.168.1.0/27 THEN give it to 192.168.1.1. That router knows a way to 192.168.1.0/27
What? We are already a member of 192.168.1.0/27. We should not use router. Bad rule.
Let's ignore that. Do we know any other networks? No.
IF X is not in any of those three subnets THEN use the default route
Default route says what to do for everything else.
What is our default route?
Code: Select all
default via 192.168.1.1 dev p3p1 proto static metric 101
default via 10.1.10.1 dev em1 proto dhcp metric 102
Why do I see two, if "default" means "one rule for the rest"?
Ok, in principle one could have more than one route for high availability or load balancing.
However, is that really true here?
Does 192.168.1.1 act as the router between this server and ISP?
Does 10.1.10.1 act as the router between this server and ISP?
I bet not.
Looks like the system starts with 10.1.10.1 as the router and traffic flows.
Code: Select all
# "good"
default via 10.1.10.1 dev em1 proto dhcp metric 100
default via 192.168.1.1 dev p3p1 proto static metric 101
# "bad"
default via 192.168.1.1 dev p3p1 proto static metric 101
default via 10.1.10.1 dev em1 proto dhcp metric 102
When 10.1.10.1 fails, it is demoted to metric 102 (a lower priority), and 192.168.1.1 takes over.
192.168.1.1 does not route, but looks so much alive that it stays as number one option (until you restart).
Thou shalt not have more than one default route.
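A quick way to check a server for this is to pull the default routes out of the `ip ro` output and order them by metric; among otherwise equal routes the lowest metric wins. A minimal sketch, assuming `ip route` style text on stdin; the function name `rank_default_routes` is made up for illustration, and the sample input is the "bad" state from this thread.

```shell
#!/bin/sh
# List default routes as "metric gateway device", lowest metric first.
# Reads `ip route` style output on stdin; a route with no metric counts as 0.
rank_default_routes() {
    awk '$1 == "default" {
        metric = 0; gw = "-"; dev = "-"
        for (i = 2; i < NF; i++) {
            if ($i == "via")    gw = $(i + 1)
            if ($i == "dev")    dev = $(i + 1)
            if ($i == "metric") metric = $(i + 1)
        }
        print metric, gw, dev
    }' | sort -k1,1n
}

# Sample input: the "bad" routing table quoted in this thread.
rank_default_routes <<'EOF'
default via 192.168.1.1 dev p3p1 proto static metric 101
default via 10.1.10.1 dev em1 proto dhcp metric 102
10.1.10.0/24 dev em1 proto kernel scope link src 10.1.10.110 metric 102
EOF
```

More than one line of output means more than one default route. On a live system, `ip -4 route show | rank_default_routes` feeds it real data, and `ip route get 8.8.8.8` asks the kernel directly which route it would use for an outside address.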
Remove the "default via 192.168.1.1 dev p3p1".
Remove the "192.168.1.0/27 via 192.168.1.1" too. (I have no idea how you made it.)
You can see configuration and state of connections with:
Code: Select all
nmcli con show em1
nmcli con show p3p1
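The two removals suggested above can probably be done with nmcli as well as through the Settings dialog. A minimal sketch, assuming the profile is named p3p1 as in this thread and that clearing `ipv4.gateway` and `ipv4.routes` and setting `ipv4.never-default` is what the profile needs; the helper name `strip_default_route_cmds` is made up for illustration. It only prints the commands, so they can be reviewed before anything is changed.

```shell
#!/bin/sh
# Print the nmcli commands that would drop the gateway and the static
# routes from a connection profile and stop it from ever supplying a
# default route. Nothing is modified; the commands are only printed.
strip_default_route_cmds() {
    profile=$1
    printf 'nmcli connection modify %s ipv4.gateway "" ipv4.routes "" ipv4.never-default yes\n' "$profile"
    # Re-activate the profile so the new settings take effect.
    printf 'nmcli connection up %s\n' "$profile"
}

# "p3p1" is the profile name taken from this thread; adjust as needed.
strip_default_route_cmds p3p1
```

Once the printed commands look right, they can be run by hand as root, or piped to a shell with `strip_default_route_cmds p3p1 | sh`. Afterwards `ip ro` should show a single default route via em1.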
Re: Servers Unable to Reconnect to the Internet (via Ethernet) after Router Loses Internet Connection
Thanks, I can follow up on the rest in a bit (when I'm at the servers).
In summary, 192.168.1.1/27 is the gateway for a 10GbE Dell switch (192.168.1.30) with 192.168.1.27 as the 10GbE fiber interface on this particular server to that switch. The 3 other servers are 192.168.1.3, xxx.7, and xxx.11. These were all created under the "Network" tab under Settings.
The 1GbE Ethernet ports are separate and connect directly to the ISP router.
Re: Servers Unable to Reconnect to the Internet (via Ethernet) after Router Loses Internet Connection
That makes no sense.
Does the network 192.168.1.0/27 have these members:
Code: Select all
192.168.1.3 serverA
192.168.1.7 serverB
192.168.1.11 serverC
192.168.1.27 serverD
192.168.1.30 switch
Is the 192.168.1.1 a sixth member of this network?
If servers have direct link to router, then why do they (try to) talk through machine 192.168.1.1?
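For reference, a /27 covers 32 addresses, so 192.168.1.0/27 spans 192.168.1.0 through 192.168.1.31. Every address listed above, and 192.168.1.1 too, is therefore on-link for each server and needs no gateway. A small sketch of that arithmetic in plain sh; the function names `ip_to_int` and `in_subnet` are made up for illustration, and 192.168.1.33 is included only as an example of an address outside the range.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed if address $1 lies inside network $2 (CIDR form, e.g. 192.168.1.0/27).
in_subnet() {
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    bits=${2#*/}
    mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    [ $(( addr & mask )) -eq $(( net & mask )) ]
}

for a in 192.168.1.1 192.168.1.3 192.168.1.30 192.168.1.33; do
    if in_subnet "$a" 192.168.1.0/27; then
        echo "$a: inside 192.168.1.0/27"
    else
        echo "$a: outside 192.168.1.0/27"
    fi
done
```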
Re: Servers Unable to Reconnect to the Internet (via Ethernet) after Router Loses Internet Connection
Those IPs are correct. 192.168.1.1 doesn't exist as a separate device; I used that as the gateway IP with /27 as the subnet. I prolly messed that up, as my networking experience is limited lol.
Re: Servers Unable to Reconnect to the Internet (via Ethernet) after Router Loses Internet Connection
Fix that first then.
If the switch absolutely has to have a default route (aka gateway) set, then pick one of the four servers.
If the switch really needs to talk to someone other than the members of 192.168.1.0/27, then that server has to actually route too.
Re: Servers Unable to Reconnect to the Internet (via Ethernet) after Router Loses Internet Connection
Hey jlehtone, looks like removing those options (via Network settings) resolved this issue! I originally had the 192.168.1.xxx IP setup in the images below.
Afterwards, I'd changed the gateway IP to 0.0.0.0 and removed the routes, and I was still able to access each server via the 10GbE switch (my little understanding of networking had me thinking this was the correct procedure).
I went ahead and reset my router, and each connection automatically connected to the internet as desired.
Thanks again!