Infiniband during PXE Boot

levidd
Posts: 2
Joined: 2020/01/02 18:18:01

Post by levidd » 2020/01/02 18:50:39

Hello,

I have been trying to boot into CentOS-8 via PXE over an InfiniBand connection. From all the forums I have been reading, I am starting to believe that my issue lies with the firmware, but I am new to this, so I could be wrong.
The device fetches the vmlinuz and initrd.img files via TFTP correctly, but once they are loaded it runs into a dracut initqueue timeout and reports a failure to start switch root.

The InfiniBand interface gets its IP address via DHCP. My PXE default config (kept super simple for debugging) looks like this:

default CentOS8
# A nonzero PROMPT value only turns the boot: prompt on; the wait time is set by TIMEOUT (tenths of a second).
prompt 600

label CentOS8
  kernel images/CentOS-8/vmlinuz
  append initrd=images/CentOS-8/initrd.img ksdevice=bootif inst.ks=nfs:10.5.5.5:/local/path/to/ks.cfg ip=dhcp inst.repo=nfs:10.5.5.5:/local/path/to/repo

label CentOS7
  kernel images/CentOS-7/vmlinuz
  append initrd=images/CentOS-7/initrd.img ksdevice=bootif ip=dhcp inst.repo=nfs:10.5.5.5:/local/path/to/source


I thought adding a kickstart file might help the installer find the source tree, but it does not seem to help.
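The append line is just a space-separated list of name=value kernel parameters, and both inst.ks and inst.repo point at the same NFS server, 10.5.5.5. Neither can be fetched until dracut has brought up an interface on 10.5.5.0/24, which may be why adding the kickstart file did not change anything. A minimal Python sketch that pulls the server and path out of each nfs: parameter, assuming the nfs:[options:]server:/path form with an IPv4 address or hostname (an IPv6 literal would add extra colons); the helper names parse_append and parse_nfs_spec are hypothetical:

```python
import shlex


def parse_append(append_line):
    """Split a pxelinux 'append' line into {name: value}; bare flags map to None."""
    params = {}
    for token in shlex.split(append_line):
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def parse_nfs_spec(spec):
    """Split 'nfs:[options:]server:/path' into (options, server, path)."""
    if not spec.startswith("nfs:"):
        raise ValueError("not an nfs: spec: %r" % spec)
    rest = spec[len("nfs:"):]
    # Assumption: the export path starts at the first ':/' (IPv4 or hostname server).
    split_at = rest.find(":/")
    if split_at < 0:
        raise ValueError("no ':/path' part in %r" % spec)
    head, path = rest[:split_at], rest[split_at + 1:]
    # Anything before the last ':' in head is treated as NFS mount options.
    options, _, server = head.rpartition(":")
    return options, server, path


append = ("initrd=images/CentOS-8/initrd.img ksdevice=bootif "
          "inst.ks=nfs:10.5.5.5:/local/path/to/ks.cfg ip=dhcp "
          "inst.repo=nfs:10.5.5.5:/local/path/to/repo")
params = parse_append(append)
for key in ("inst.ks", "inst.repo"):
    print(key, parse_nfs_spec(params[key]))
```

Both parameters resolve to the same server, so a missing network interface breaks the kickstart and the repo lookup alike.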

The reason I think it is a firmware issue is that after dropping into emergency mode, the InfiniBand ports do not show up, but the Ethernet ports do. So when the installer tries to find the installation source, it cannot, and it fails with that error. The card is a ConnectX-3 Pro, which uses the mlx4_core module. I just updated to the latest driver via MLNX_OFED (which let me see the InfiniBand ports after booting CentOS-8 from a USB drive).
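One way to confirm which links the kernel actually created is to read the link-layer type from sysfs: /sys/class/net/<iface>/type holds the ARPHRD code, 1 for Ethernet and 32 for InfiniBand (IPoIB). The dracut emergency shell normally has no Python, so reading those same files with cat is the way to check there; the Python sketch below (function name hypothetical) is meant for comparison on the USB-booted system. If nothing reports 32 in emergency mode, one possibility is that the IPoIB modules (ib_ipoib and mlx4_ib on top of mlx4_core, for a ConnectX-3 Pro) were never loaded from the initramfs.

```python
import os

# ARPHRD_* link-layer type codes from <linux/if_arp.h>
ARPHRD_NAMES = {1: "ethernet", 32: "infiniband", 772: "loopback"}


def list_links(sys_net="/sys/class/net"):
    """Return {interface name: link type} using the sysfs 'type' files."""
    links = {}
    if not os.path.isdir(sys_net):
        return links  # not a Linux system, or sysfs is not mounted
    for name in sorted(os.listdir(sys_net)):
        try:
            with open(os.path.join(sys_net, name, "type")) as f:
                code = int(f.read().strip())
        except (OSError, ValueError):
            continue  # e.g. bonding_masters is a plain file, not an interface
        links[name] = ARPHRD_NAMES.get(code, "other (%d)" % code)
    return links


print(list_links())
```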

I tried booting into CentOS-7 as well and the same issue comes up.

Any help is appreciated; this has been driving me insane.

Re: Infiniband during PXE Boot

Post by levidd » 2020/01/03 20:36:38

Update:
I ended up upgrading the firmware and drivers, and that did not seem to be the problem. I have noticed that with CentOS-8, my DHCP server does not hand out IP addresses when I match hosts on dhcp-client-identifier. During boot, FlexBoot gets its address correctly, so could the identifier have changed once CentOS-8 is running? Has anyone else come across this issue?
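The identifier in the nodeA host entry below has the shape commonly used for IPoIB clients: a fixed 12-byte prefix followed by the 8-byte port GUID. The port GUID is the last 8 bytes of the 20-byte IPoIB hardware address (4 bytes of flags and queue pair number, then the 16-byte port GID). A minimal Python sketch that rebuilds that value from the address shown in /sys/class/net/ib0/address, assuming the interface is called ib0 and that the prefix is the one from the host entry; the helper name ipoib_client_id is hypothetical. Comparing its result with the client identifier (DHCP option 61) that CentOS-8 actually sends, for example as decoded by tcpdump -v on the server, would show whether the identifier changed.

```python
# Prefix copied from the nodeA host entry; assumed to be the same for every port.
CLIENT_ID_PREFIX = "ff:00:00:00:00:00:02:00:00:02:c9:00"


def ipoib_client_id(hw_addr):
    """Build a host-entry style client identifier from a 20-byte IPoIB address."""
    octets = hw_addr.strip().lower().split(":")
    if len(octets) != 20:
        raise ValueError("expected 20 octets, got %d" % len(octets))
    # Octets 0-3: flags + queue pair number; octets 4-19: port GID.
    # The last 8 octets of the GID are the port GUID.
    port_guid = octets[12:]
    return CLIENT_ID_PREFIX + ":" + ":".join(port_guid)


# Hypothetical address: only the last 8 octets are taken from the host entry.
example = "80:00:02:08:fe:80:00:00:00:00:00:00:ac:1f:6b:ff:ff:1f:0e:c1"
print(ipoib_client_id(example))
```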

Here is a snippet of my DHCP server configuration:

# 
# DHCP Server Configuration file.
#       see /usr/share/doc/dhcp*/dhcpd.conf.sample
#       see 'man 5 dhcpd.conf'


default-lease-time 28800;
max-lease-time 28800;
option subnet-mask 255.255.255.0;
option domain-name "myserver.org";
ddns-update-style interim;
server-name MYserver;
ignore client-updates;
option client-architecture code 93 = unsigned integer 16;
option broadcast-address 255.255.255.255;
allow booting;
allow bootp;

option space pxelinux;
option pxelinux.magic code 208 = string;
option pxelinux.configfile code 209 = text;
option pxelinux.pathprefix code 210 = text;
option pxelinux.reboottime code 211 = unsigned integer 32;
option architecture-type code 93 = unsigned integer 16;

subnet 10.5.5.0 netmask 255.255.255.0 {
        option subnet-mask 255.255.255.0;
        option routers 10.5.5.5;
        #range 10.5.5.25 10.5.5.30;
        always-broadcast on;

        class "pxeclients" {
                match if substring(option vendor-class-identifier, 0, 9) = "PXEClient";
                next-server 10.5.5.5;

                if option architecture-type = 00:07 {
                        filename "shim.efi";
                } else {
                        filename "pxelinux/pxelinux.0";
                }
        }

        host MY.nodeA {
                option host-name "nodeA";
                option dhcp-client-identifier=ff:00:00:00:00:00:02:00:00:02:c9:00:ac:1f:6b:ff:ff:1f:0e:c1;
                fixed-address 10.5.5.21;
        }
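The range line in this subnet is commented out, so the only address the server can hand out on 10.5.5.0/24 is a fixed-address from a host entry, and dhcpd matches a host entry by comparing the dhcp-client-identifier in the declaration with the one the client sends. If CentOS-8 sends a different identifier than FlexBoot does, the request matches nothing and there is no pool to fall back on, which would fit the behaviour described above. A deliberately simplified Python model of that decision (not dhcpd's real logic; all names are hypothetical):

```python
# Simplified model of the 10.5.5.0/24 subnet above (not dhcpd's real logic).
HOSTS_BY_CLIENT_ID = {
    "ff:00:00:00:00:00:02:00:00:02:c9:00:ac:1f:6b:ff:ff:1f:0e:c1": ("nodeA", "10.5.5.21"),
}
DYNAMIC_POOL = []  # empty because "range 10.5.5.25 10.5.5.30;" is commented out


def choose_lease(client_id):
    """Return (host name, address) for a request, or None if nothing can be offered."""
    if client_id is not None:
        host = HOSTS_BY_CLIENT_ID.get(client_id.lower())
        if host is not None:
            return host  # fixed-address from the matching host entry
    if DYNAMIC_POOL:
        return (None, DYNAMIC_POOL[0])  # would come from the range, if it were enabled
    return None  # no host match and no range: the client gets no lease


print(choose_lease("ff:00:00:00:00:00:02:00:00:02:c9:00:ac:1f:6b:ff:ff:1f:0e:c1"))
print(choose_lease("ff:12:34:56:78"))  # hypothetical different identifier
```

Uncommenting the range, or adding a second host entry for whatever identifier CentOS-8 really sends, would be one way to test this.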
