CentOS 7.9.2009 Infiniband drivers causing kernel panic on Ryzen 9 5950X

alpha754293
Posts: 69
Joined: 2019/07/29 16:15:14

Post by alpha754293 » 2021/12/24 05:08:37

I am trying to set up a system with an AMD Ryzen 9 5950X on an Asus X570 TUF Gaming Pro (WiFi) motherboard. After I install CentOS 7.9.2009, update the kernel to 5.15.11, and try to get Infiniband up and running, the system kernel panics.

Image

Here is the background:

I recently got the processor, the motherboard, and RAM (2x 32 GB Crucial DDR4-3200 unbuffered, non-ECC). I am using my old EVGA GTX 980 just to get the system going, along with a Mellanox ConnectX-4 100 Gbps Infiniband NIC (MCX456A-ECAT). CentOS 7.9.2009 is installed to a 1 TB HGST SATA 6 Gbps 7200 rpm HDD.

If I try to install directly from the ISO image (written to a USB drive using Rufus 3.8), it immediately produces a kernel panic.

So instead, I use my old Intel Core i7-3930K system to "bootstrap/jumpstart" the installation: I install CentOS on that system and then update the kernel via elrepo, from the default kernel that ships with CentOS 7.9.2009 to the 5.15.11 kernel. Once the kernel is updated, I move the hard drive back over to the Ryzen 9 5950X system, where it initially boots without any problems.

However, the moment I install and/or update a package (glibc, for example), yum completes the installation, but the next reboot produces the kernel panic shown above.

If I try to reset the system, it will also result in a kernel panic.

Has anybody else encountered this issue before on said Ryzen 9 5950X?

If so, how did you resolve this kernel panic issue?

Your help is greatly appreciated.

Thank you.

TrevorH
Site Admin
Posts: 33216
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Post by TrevorH » 2021/12/24 11:54:31

That is not a CentOS kernel, so there is no point in reporting problems with it here. Since it appears to be from ELRepo, you should take the problem report to the Linux kernel mailing list.

alpha754293
Posts: 69
Joined: 2019/07/29 16:15:14

Post by alpha754293 » 2021/12/24 18:58:31

If I don't update the kernel, then the 3.10.0-1160 kernel that ships with CentOS 7.9.2009 also panics on a Ryzen 5950X (by default, anyway); i.e., I can't even get it to install after writing the 7.9.2009 ISO onto a USB drive using Rufus 3.17 (which is the latest version as of this writing).

Either way, the kernel panics regardless of which kernel I am using.

Mellanox's community forum states that if there is an issue with the Mellanox "inbox" driver that ships with the OS, which IS under the purview of CentOS 7.9.2009, then I should be reaching out here.

So with either kernel, CentOS panics on a system that has both the Ryzen 5950X and the Mellanox ConnectX-4 100 Gbps Infiniband NIC.

Thanks.

jlehtone
Posts: 4530
Joined: 2007/12/11 08:17:33
Location: Finland

Post by jlehtone » 2021/12/25 12:05:28

alpha754293 wrote:
2021/12/24 18:58:31
I can't even get it to install after writing the 7.9.2009 ISO onto a USB drive using Rufus 3.17 (which is the latest version as of this writing).
Does that thumb drive function properly on other machines?

If you have a kernel that would boot if there were no IB card, then installing without the card would be the logical first step.
Could one blacklist the offending modules with kernel command-line options?
Once the system is up and running, and "better" drivers are at hand and load at runtime, then make them load on boot too.
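
A minimal sketch of the blacklisting idea, for illustration only. It assumes the card is driven by the `mlx5_core` and `mlx5_ib` modules (the usual drivers for ConnectX-4, but not confirmed anywhere in this thread), and `blacklist_args` is a hypothetical helper name. It only prints the options; nothing is changed on the system.

```shell
#!/bin/sh
# Build kernel command-line options that keep the named modules from loading.
# Assumption: the offending modules are mlx5_core and mlx5_ib; substitute
# whatever module names the panic trace actually points at.
blacklist_args() {
    # Join all arguments with commas ("$*" uses the first character of IFS).
    list=$(IFS=,; printf '%s' "$*")
    # modprobe.blacklist= is read by modprobe on the running system,
    # rd.driver.blacklist= is read by dracut inside the initramfs.
    printf 'modprobe.blacklist=%s rd.driver.blacklist=%s\n' "$list" "$list"
}

blacklist_args mlx5_core mlx5_ib
```

The printed options would be appended to the kernel line at the boot menu (Tab on the installer's boot entry for a BIOS boot, or `e` in GRUB). If the panic still occurs with the modules blacklisted, the IB driver is probably not the cause.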

alpha754293
Posts: 69
Joined: 2019/07/29 16:15:14

Post by alpha754293 » 2021/12/25 15:42:05

jlehtone wrote:
2021/12/25 12:05:28
Does that thumb drive function properly on other machines?
Yes, it does.

I am able to use the same thumb drive to perform the installation on an Intel Core i7-3930K on an Asus X79 Sabertooth motherboard, and that system also has an IB card.

Both the Ryzen 5950X system and the 3930K system have an IB card.

On the Ryzen 5950X system, booting directly from the thumb drive produces this kernel panic:

Image

Therefore, as a workaround, I am using my 3930K system (which also has an IB card in it) to "jumpstart" my 5950X system.

Per the other thread, CentOS 7.9.2009 is the only CentOS version supported here, so that's what I'm using. If I try to use it directly on the 5950X system without first "jumpstarting" that system with my 3930K system, it results in a kernel panic right away.

To get the 5950X system to work, I need to install CentOS 7.9.2009 using my 3930K system first and then update the kernel (to get around the kernel panic that occurs 2.something seconds into the boot sequence). After that, I can at least get into the GNOME desktop environment. However, the moment I update, for example, glibc to the latest version, the next reboot produces the kernel panic shown above.

Your help is greatly appreciated.

Thank you.

P.S. Regarding the "better drivers" comment in your reply above: I tried using the Mellanox MLNX_OFED_LINUX driver instead of the "inbox" Infiniband drivers that ship with the OS.

The problem is that with that, I would have to install the MLNX_OFED_LINUX drivers BEFORE I update the kernel using my 3930K system. And that works fine.

However, once I update the kernel, even the Mellanox drivers no longer work, because the kernel modules haven't been rebuilt for the new kernel version, and as a result `/etc/init.d/openibd restart` fails.

And if I try to execute `./mlnxofedinstall --add-kernel-support`, it fails to find the appropriate source for the new kernel, even if I install the matching devel package when I update the kernel (for example: `sudo yum install --enablerepo=elrepo-kernel kernel-lt kernel-lt-devel`). The Mellanox installer doesn't know how or where to find the new kernel source, so the rebuild of the kernel modules fails, and therefore `/etc/init.d/openibd restart` fails.
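
One thing that may be worth checking here is whether a build tree actually exists for the kernel in question. The sketch below looks for the conventional `/lib/modules/<version>/build` link that the `-devel` packages provide. `kernel_build_dir` and its optional `root` argument are hypothetical names introduced for illustration. The `--kernel` and `--kernel-sources` options in the printed command are my understanding of the installer's options and should be verified against `./mlnxofedinstall --help` before use; the script only prints the command and does not run it.

```shell
#!/bin/sh
# Report whether kernel build files exist for a given kernel version.
# $1 = kernel version string (as printed by `uname -r`)
# $2 = optional alternate root (hypothetical, only so the check can be
#      pointed at a test tree instead of the live system)
kernel_build_dir() {
    kver=$1
    root=${2:-}
    dir="$root/lib/modules/$kver/build"
    # -d follows symlinks, so a dangling "build" link counts as missing.
    if [ -d "$dir" ]; then
        printf '%s\n' "$dir"
    else
        echo "no build tree for $kver (is the matching -devel package installed?)" >&2
        return 1
    fi
}

# On the live system: print (do not run) a candidate installer command.
# Assumption: --kernel and --kernel-sources are accepted by your installer version.
if dir=$(kernel_build_dir "$(uname -r)"); then
    echo "would run: ./mlnxofedinstall --add-kernel-support --kernel $(uname -r) --kernel-sources $dir"
fi
```

Note that the check uses the running kernel, so it is only meaningful after booting into the updated kernel; otherwise pass the new kernel's version string explicitly.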

Therefore, either way:

1) Using the default 3.10.0-1160 kernel on 5950X results in a kernel panic which renders the entire system inoperative.

2) Updating the kernel using elrepo-kernel results in a kernel that's not supported here. On that kernel, the "inbox" Infiniband drivers cause a kernel panic, and the Mellanox drivers can't rebuild the kernel modules, which leaves IB inoperative in the system. That is a no-go, full-stop barrier for me.

(This is being tested as a new compute node in a cluster; therefore, getting the system running with IB operational is imperative.)

Thank you.
