Bonding 802.3ad with Mellanox ConnectX-4 Lx 10/25Gb

Posted: 2020/11/18 10:31:54
by NordalLund
Hi all,

I am trying to set up an LACP bond against a Cisco ACI box and am facing some difficulties.
We are doing clean installs of CentOS 7.8 minimal and setting up the bond and VLAN interfaces during installation.
After booting, the interface keeps going up and down at irregular intervals.

We have tried several options and solutions found in various forums, but we are unable to get a stable connection.
Our network guy sees errors about missing LACPDUs, after which the switch suspends both ports.
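Since the switch reports missing LACPDUs, it can help to check what the Linux side believes about the aggregation. Below is a minimal sketch, assuming the usual text layout of `/proc/net/bonding/bond0` as printed by the Linux bonding driver (labels vary slightly between kernel versions). It flags slaves that are down or outside the active aggregator, plus an all-zero partner MAC, which typically means no LACPDUs have been received from the switch. The interface names and the sample text are hypothetical.

```python
"""Sketch: sanity-check an 802.3ad bond from the text of /proc/net/bonding/<bond>.

Assumption: the field labels ("Slave Interface", "MII Status", "Aggregator ID",
"Partner Mac Address") match what the Linux bonding driver prints; the exact
output differs between kernel versions, so treat this as illustrative only.
"""


def parse_bond_status(text):
    """Split the status text into a bond-level dict and a list of per-slave dicts."""
    header, slaves = {}, []
    current = header
    for raw in text.splitlines():
        line = raw.strip()
        if ":" not in line:
            continue  # e.g. the "802.3ad info" section title
        key, _, value = line.partition(":")  # split on the first colon only
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = {"name": value}
            slaves.append(current)
        elif key not in current:
            # Keep the first occurrence; in the bond-level section this is the
            # "Aggregator ID" listed under "Active Aggregator Info".
            current[key] = value
    return header, slaves


def check_lacp_bond(text):
    """Return a list of human-readable problems (an empty list means it looks OK)."""
    header, slaves = parse_bond_status(text)
    problems = []
    if "802.3ad" not in header.get("Bonding Mode", ""):
        problems.append("bond is not in 802.3ad mode")
    partner = header.get("Partner Mac Address", "")
    if partner in ("", "00:00:00:00:00:00"):
        # An all-zero partner MAC usually means no LACPDUs came back from the switch.
        problems.append("no LACP partner detected")
    active_agg = header.get("Aggregator ID")
    for slave in slaves:
        if slave.get("MII Status") != "up":
            problems.append(f"{slave['name']}: link is {slave.get('MII Status')}")
        if slave.get("Aggregator ID") != active_agg:
            problems.append(f"{slave['name']}: not in active aggregator {active_agg}")
    return problems


# Hypothetical sample, trimmed to the fields the checker looks at.
SAMPLE = """\
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
MII Status: up

802.3ad info
LACP rate: fast
Active Aggregator Info:
\tAggregator ID: 1
\tNumber of ports: 1
\tPartner Mac Address: 00:00:00:00:00:00

Slave Interface: ens1f0
MII Status: up
Aggregator ID: 1

Slave Interface: ens1f1
MII Status: up
Aggregator ID: 2
"""

if __name__ == "__main__":
    for problem in check_lacp_bond(SAMPLE):
        print(problem)
```

On a live system the input would come from reading `/proc/net/bonding/bond0` instead of the embedded sample; for this sample the checker reports the missing partner and `ens1f1` sitting in its own aggregator.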

We have tried both fast and slow LACP rates, with no change.
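For reference, the Linux bonding documentation describes `lacp_rate` as the rate at which the bond asks its link partner to send LACPDUs: every 30 seconds for `slow`, every second for `fast`. The sketch below turns that into timeouts, assuming the IEEE 802.3ad rule that partner information expires after three missed intervals (90 s or 3 s), and builds a `BONDING_OPTS` value for an ifcfg file; `miimon=100` is an assumed, commonly used value. Since the rate only changes how quickly missing LACPDUs are noticed, not whether they are sent, it is plausible that changing it made no difference here.

```python
"""Sketch: what lacp_rate=slow vs lacp_rate=fast means in seconds.

Assumptions: 30 s / 1 s LACPDU intervals as described in the Linux bonding
docs, and a timeout of three missed intervals (IEEE 802.3ad long/short timeout).
"""

LACPDU_INTERVAL_S = {"slow": 30, "fast": 1}
MISSED_PDUS_BEFORE_TIMEOUT = 3


def lacp_timeout(rate):
    """Seconds without a LACPDU before the partner information is considered expired."""
    return LACPDU_INTERVAL_S[rate] * MISSED_PDUS_BEFORE_TIMEOUT


def bonding_opts(rate, miimon_ms=100):
    """Build a BONDING_OPTS value for an ifcfg file."""
    if rate not in LACPDU_INTERVAL_S:
        raise ValueError(f"unknown lacp_rate: {rate!r}")
    return f"mode=802.3ad miimon={miimon_ms} lacp_rate={rate}"


if __name__ == "__main__":
    for rate in ("slow", "fast"):
        print(rate, lacp_timeout(rate), bonding_opts(rate))
```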

We have tried disabling NetworkManager, as suggested in several posts, but with little success.
If we stop and disable NetworkManager and then start the network service, we sometimes get a stable ping without the interface going down. But this does not survive a reboot.
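One possible reason (a guess, not confirmed in this thread) that the workaround does not survive a reboot is that the legacy `network` service was started but never enabled. A minimal sketch of the full switch-over on CentOS 7, assuming the initscripts `network` service is installed and the ifcfg files for the bond and VLANs already exist; it only prints the commands unless `dry_run=False`. Setting `NM_CONTROLLED=no` in those ifcfg files is a further way to keep NetworkManager away from them if it is ever started again.

```python
"""Sketch: hand the bond over from NetworkManager to the legacy 'network' service
on CentOS 7 so that the change persists across reboots.

Assumption: the initscripts 'network' service is installed and the ifcfg files
for the bond and VLANs already exist in /etc/sysconfig/network-scripts.
"""
import subprocess

COMMANDS = [
    ["systemctl", "stop", "NetworkManager"],
    ["systemctl", "disable", "NetworkManager"],
    # Starting 'network' only affects the running system; enabling it is what
    # makes it run again at the next boot.
    ["systemctl", "enable", "network"],
    ["systemctl", "restart", "network"],
]


def run(commands=COMMANDS, dry_run=True):
    """Print the commands (dry run) or execute them, stopping at the first failure."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run()
```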

We have also tried installing CentOS 8.2, with the same result.

I would like to try installing the non-minimal version of 7.8, but it seems the ISOs were removed from the official repos yesterday?

I hope someone is able to help.

The NICs are Mellanox ConnectX-4 Lx 10/25Gb (HP-branded).

BR Kasper

Re: Bonding 802.3ad with Mellanox ConnectX-4 Lx 10/25Gb

Posted: 2020/11/18 13:49:04
by tunk
You could try running yum update to get 7.9.
It was released last week, which is why you can't find 7.8.
It is also possible to install a GUI (desktop environment/window manager) afterwards with some yum commands (I don't remember which).

Re: Bonding 802.3ad with Mellanox ConnectX-4 Lx 10/25Gb

Posted: 2020/11/18 14:27:40
by NordalLund
Unfortunately my software does not support 7.9, and using 8.2 (as a test) did not solve anything.

I have just tried a new installation using a team instead of a bond, with the same result.
The network seems to run fine on both VLANs for around 2 minutes, and then it is taken down by NetworkManager (or so it seems from the log files).

It seems like NetworkManager is killing the connection.

Just to outline what we are doing:
2 NICs - 1 bond - 2 VLANs
Trying to configure LACP, but we have also tried active-backup; still no luck.

We also tried updating the driver to a newer one from Mellanox, with no success either :(
BR Kasper

Re: Bonding 802.3ad with Mellanox ConnectX-4 Lx 10/25Gb

Posted: 2020/11/21 17:37:09
by NordalLund
I got this one figured out.
It turned out that setting IPv6 to "ignore" on the NICs/bond/VLANs resolved my issue.
What helped us was a clean install of Red Hat, which had these settings out of the box when we configured the network during installation.
The Red Hat installation worked on the first try (we had tried 10+ times with CentOS in different combinations), so we could compare the network settings.
I would strongly recommend that CentOS configure the underlying NICs/bond correctly when this is turned on during installation. We really spent a lot of hours debugging this setup.
BR Kasper
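Applied by hand to an existing installation, the IPv6 "ignore" setting corresponds to `IPV6INIT=no` in the ifcfg files that CentOS 7 uses (for the bond and VLAN connections, `nmcli connection modify <name> ipv6.method ignore` should have the same effect). A minimal sketch, assuming the default `/etc/sysconfig/network-scripts` directory and hypothetical file names, that sets this key in every file of the NIC/bond/VLAN stack; it only prints the result unless `dry_run=False`.

```python
"""Sketch: mark every ifcfg file of the bond stack with IPV6INIT=no.

Assumptions: the files live in /etc/sysconfig/network-scripts, the names below
are hypothetical, and IPV6INIT=no is how the ifcfg format records IPv6 "ignore"
(nmcli: ipv6.method ignore). Other IPV6_* keys are left untouched.
"""
from pathlib import Path

SCRIPTS_DIR = Path("/etc/sysconfig/network-scripts")
IFCFG_NAMES = ["ifcfg-ens1f0", "ifcfg-ens1f1", "ifcfg-bond0",
               "ifcfg-bond0.100", "ifcfg-bond0.200"]


def ignore_ipv6(text):
    """Return ifcfg text with IPV6INIT=no, replacing any existing IPV6INIT line."""
    kept = [line for line in text.splitlines() if not line.startswith("IPV6INIT=")]
    kept.append("IPV6INIT=no")
    return "\n".join(kept) + "\n"


def update_files(directory=SCRIPTS_DIR, names=IFCFG_NAMES, dry_run=True):
    """Print (dry run) or write the updated contents of each ifcfg file."""
    for name in names:
        path = directory / name
        new_text = ignore_ipv6(path.read_text())
        if dry_run:
            print(f"--- {path}\n{new_text}")
        else:
            path.write_text(new_text)
```

After editing ifcfg files directly, `nmcli connection reload` makes NetworkManager re-read them; the connections then still need to be re-activated (or the host rebooted) for the change to take effect.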