
RAID1 + LUKS + LVM not booting on remote server

Posted: 2021/01/20 03:17:25
by sandtler
I have a dedicated server with two identical drives and would like to install CentOS Stream 8 with a RAID1 -> LUKS -> LVM -> rootfs setup. However, there is a problem: I don't have physical access to the server and therefore cannot enter the LUKS passphrase at boot. I am working around this with the dracut-sshd module, which basically spawns an OpenSSH server in the initramfs so that I can enter the passphrase using systemd-tty-ask-password-agent. This was my entire installation procedure:
  • Using VNC, I booted into the graphical CentOS 8 installer (my hosting provider doesn't have a CentOS Stream image yet), configured my drives, and installed the OS without any problems. This resulted in exactly the disk layout described above. The system uses UEFI.
  • I booted into the (Debian-based, also UEFI) rescue image my hosting provider is, well, providing, mounted /, /home, /boot, /boot/efi, /proc, /dev, and /sys, and chrooted into the system root.
  • I upgraded CentOS 8 to CentOS Stream as described on the CentOS website, then exited and re-entered the chroot to make sure the entire environment was updated.
  • I installed the dracut-sshd package from its Copr repository; the dracut module is enabled by default.
  • I created /etc/dracut.conf.d/10-custom.conf with the following content and then re-generated the initramfs:

    Code:

    add_dracutmodules+=" crypt lvm network systemd "
  • I added the following kernel parameters and then re-generated my grub config:

    Code:

    rd.neednet=1 ip=dhcp
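For anyone reproducing the kernel-parameter step above: one way is to append the parameters to `GRUB_CMDLINE_LINUX` in `/etc/default/grub` and then regenerate the GRUB config. This is a minimal sketch assuming that file layout; `add_grub_args` is a hypothetical helper name, and the `grub2-mkconfig` output path in the comment is the usual one for a UEFI CentOS 8 install but may differ on your system.

```shell
#!/bin/sh
# add_grub_args FILE ARG...
# Hypothetical helper: append each ARG to the GRUB_CMDLINE_LINUX="..." line
# of FILE (normally /etc/default/grub) unless it is already there. ARGs are
# used unescaped in grep/sed patterns, so keep them to plain key=value words.
add_grub_args() {
  file=$1
  shift
  for arg in "$@"; do
    # Already present if surrounded by quotes/spaces on that line
    if grep -q "^GRUB_CMDLINE_LINUX=.*[\" ]$arg[\" ]" "$file"; then
      continue
    fi
    # Insert the argument just before the closing quote
    sed -i "s|^\(GRUB_CMDLINE_LINUX=\".*\)\"\$|\1 $arg\"|" "$file"
  done
}

# Usage on the real system, as root inside the chroot:
#   add_grub_args /etc/default/grub rd.neednet=1 ip=dhcp
#   grub2-mkconfig -o /boot/efi/EFI/centos/grub.cfg
```

Running it twice is harmless because already-present arguments are skipped. Alternatively, `grubby --update-kernel=ALL --args="rd.neednet=1 ip=dhcp"` edits the boot entries directly instead of going through `/etc/default/grub`.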
The system actually boots and I am able to connect to the SSH daemon running in the initramfs; however, systemd-tty-ask-password-agent doesn't ask for a passphrase but exits immediately without any output. Even worse, the reboot command doesn't do anything either. Not even sending Ctrl+Alt+Delete through my hosting provider's web interface has any effect; to get back into the rescue system at all, I have to trigger a hardware reset.

So here is the question: why is there no password prompt in the initramfs, and what do I need to change to get one? Thanks in advance. You are my last hope, as I'm slowly starting to lose my sanity after an entire day of troubleshooting and even redoing the entire installation from scratch. The rest of this post is just more detail about my setup in case it helps:

Full kernel cmdline:

Code:

crashkernel=auto resume=/dev/mapper/astolfovg-swap rd.lvm.lv=astolfovg/root rd.luks.uuid=luks-672fcdcc-c8ba-469f-aaec-f634f18e6ba1 rd.md.uuid=a555b305:f92e29f8:03bc3657:702c3b2d rd.lvm.lv=astolfovg/swap rd.neednet=1 ip=dhcp nomodeset
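To see at a glance which of those parameters dracut acts on, the rd.* entries can be pulled out one per line: `rd.md.uuid` selects the RAID set to assemble, `rd.luks.uuid` the LUKS volume to unlock, `rd.lvm.lv` the logical volumes to activate, and `rd.neednet=1` makes dracut bring up the network that the SSH server in the initramfs depends on. A small sketch; `rd_params` is a hypothetical helper, not part of dracut.

```shell
#!/bin/sh
# rd_params [CMDLINE]
# Hypothetical helper: print the dracut rd.* parameters of a kernel command
# line, one per line. Without an argument it reads /proc/cmdline.
rd_params() {
  cmdline=${1:-$(cat /proc/cmdline)}
  set -f                      # disable globbing while word splitting
  # $cmdline is unquoted on purpose so it splits on whitespace
  printf '%s\n' $cmdline | grep '^rd\.'
  set +f
}
```

Running `rd_params` with no argument on the booted system (or in the initramfs shell) shows what the running kernel actually received, which is handy for ruling out a stale GRUB config.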
/etc/crypttab:

Code:

luks-672fcdcc-c8ba-469f-aaec-f634f18e6ba1 UUID=672fcdcc-c8ba-469f-aaec-f634f18e6ba1 none discard
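The third field of that crypttab line is the key file; `none` means no key file is configured, so the passphrase has to be entered during boot, which is the prompt systemd-tty-ask-password-agent is supposed to answer. A sketch that lists which crypttab entries would prompt this way; `prompted_volumes` is a hypothetical name, and the empty/`none`/`-` rule follows crypttab(5).

```shell
#!/bin/sh
# prompted_volumes FILE
# Hypothetical helper: print the volume names from a crypttab-style FILE whose
# key-file field (3rd column) is missing, "none" or "-", i.e. the volumes that
# need a passphrase typed in at boot.
prompted_volumes() {
  awk '
    /^[ \t]*(#|$)/ { next }                 # skip comments and blank lines
    $3 == "" || $3 == "none" || $3 == "-" { print $1 }
  ' "$1"
}
```

For the crypttab above, `prompted_volumes /etc/crypttab` should print the single `luks-672fcdcc-...` entry.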
Partition layout of my two drives:

Code:

# fdisk -l /dev/sda
Disk /dev/sda: 3.7 TiB, 4000787030016 bytes, 7814037168 sectors
Disk model: WDC WD4002FYYZ-0
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 2E63758B-9C95-4B8F-8AB4-3037177C5502

Device       Start        End    Sectors  Size Type
/dev/sda1     2048    1230847    1228800  600M EFI System
/dev/sda2  1230848    3327999    2097152    1G Linux filesystem
/dev/sda3  3328000 7814035455 7810707456  3.7T Linux RAID

# fdisk -l /dev/sdb
Disk /dev/sdb: 3.7 TiB, 4000787030016 bytes, 7814037168 sectors
Disk model: WDC WD4002FYYZ-0
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: B203032C-39AE-4C5E-8302-598E14A94572

Device     Start        End    Sectors  Size Type
/dev/sdb1   2048 7810709503 7810707456  3.7T Linux RAID
blkid output:

Code:

/dev/sda1: UUID="B793-733D" TYPE="vfat" PARTLABEL="EFI System Partition" PARTUUID="8b4f7ffa-2a13-427f-ae75-a3e67c1ea71c"
/dev/sda2: UUID="9acb213f-fdeb-4530-923f-39b935c703d0" TYPE="xfs" PARTUUID="239b89e8-6795-438b-8cd4-b5d05f67f202"
/dev/sda3: UUID="a555b305-f92e-29f8-03bc-3657702c3b2d" UUID_SUB="03b7d63c-e63c-7d01-307b-5cba7c466d0c" LABEL="pv00" TYPE="linux_raid_member" PARTUUID="519bbdbb-5069-4203-8748-089a29b084a3"

/dev/sdb1: UUID="a555b305-f92e-29f8-03bc-3657702c3b2d" UUID_SUB="a99a97be-3e44-2666-5b15-2c84636fc01e" LABEL="pv00" TYPE="linux_raid_member" PARTUUID="cc6a7153-14e2-496f-9c1a-384d7016c2c1"

/dev/md127: UUID="672fcdcc-c8ba-469f-aaec-f634f18e6ba1" TYPE="crypto_LUKS"
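The `rd.md.uuid=` value on the kernel command line and the `UUID=` that blkid reports for the RAID members are the same array UUID, printed in mdadm's colon-separated form versus blkid's dashed form (likewise, `rd.luks.uuid` refers to the UUID blkid shows for /dev/md127). A small sketch for checking that the cmdline points at the right array; `md_uuid_to_blkid` is a hypothetical helper name.

```shell
#!/bin/sh
# md_uuid_to_blkid UUID
# Hypothetical helper: turn an mdadm-style array UUID (as used by rd.md.uuid=)
# into the 8-4-4-4-12 dashed form that blkid prints for the RAID members.
md_uuid_to_blkid() {
  hex=$(printf '%s' "$1" | tr -d ':')     # drop mdadm's colons
  printf '%s-%s-%s-%s-%s\n' \
    "$(printf '%s' "$hex" | cut -c1-8)" \
    "$(printf '%s' "$hex" | cut -c9-12)" \
    "$(printf '%s' "$hex" | cut -c13-16)" \
    "$(printf '%s' "$hex" | cut -c17-20)" \
    "$(printf '%s' "$hex" | cut -c21-32)"
}
```

`md_uuid_to_blkid a555b305:f92e29f8:03bc3657:702c3b2d` prints `a555b305-f92e-29f8-03bc-3657702c3b2d`, which matches /dev/sda3 and /dev/sdb1 above.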
/proc/mdstat:

Code:

Personalities : [raid1] 
md127 : active raid1 sda3[0] sdb1[1]
      3905221632 blocks super 1.2 [2/2] [UU]
      [=============>.......]  resync = 67.7% (2644339904/3905221632) finish=144.6min speed=145268K/sec
      bitmap: 11/30 pages [44KB], 65536KB chunk

unused devices: <none>
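The `Personalities` line of /proc/mdstat lists the md RAID personalities the running kernel has loaded, and an array can only be assembled if its level is among them. A sketch that could be run from the SSH shell in the initramfs to check whether raid1 is available there; `has_md_personality` is a hypothetical helper name.

```shell
#!/bin/sh
# has_md_personality LEVEL [MDSTAT]
# Hypothetical helper: succeed if LEVEL (e.g. raid1) appears in the
# "Personalities :" line of MDSTAT (default /proc/mdstat).
has_md_personality() {
  level=$1
  mdstat=${2:-/proc/mdstat}
  grep '^Personalities' "$mdstat" | grep -q "\[$level\]"
}
```

Usage: `has_md_personality raid1 || echo "raid1 personality not loaded"`.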

Re: RAID1 + LUKS + LVM not booting on remote server

Posted: 2021/01/20 12:09:12
by sandtler
Upon further investigation, I found the problem myself: dracut wasn't including the raid1 kernel module in the initramfs, so the md array couldn't be assembled and the kernel couldn't find the LUKS device on top of it. All I had to do was put

Code:

add_drivers+=" raid1 "
into my /etc/dracut.conf.d/10-custom.conf and everything worked as expected.
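Putting both posts together, the drop-in ends up with two lines. A sketch that writes it; `write_dropin` is a hypothetical helper, and the commands in the comment are the usual dracut tools. Note that inside a chroot from the rescue system, `uname -r` still reports the rescue kernel, which is why the comment uses `--regenerate-all` rather than a plain `dracut -f`, and why `lsinitrd` is given an explicit image path (`<version>` is a placeholder for the installed kernel version).

```shell
#!/bin/sh
# write_dropin DIR
# Hypothetical helper: write 10-custom.conf into DIR (normally
# /etc/dracut.conf.d) with the modules from the first post plus the raid1
# driver that dracut had not been including.
write_dropin() {
  mkdir -p "$1"
  cat > "$1/10-custom.conf" <<'EOF'
add_dracutmodules+=" crypt lvm network systemd "
add_drivers+=" raid1 "
EOF
}

# On the real system, as root inside the chroot:
#   write_dropin /etc/dracut.conf.d
#   dracut -f --regenerate-all        # rebuild the initramfs for all kernels
#   lsinitrd /boot/initramfs-<version>.img | grep raid1
```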

Just in case anyone is replicating this exact setup: if connecting to the SSH daemon of the actual host system (after successfully entering the passphrase, getting disconnected, and reconnecting once the system is fully up) fails with

Code:

client_loop: send disconnect: Broken pipe
this is most likely related to SELinux, probably because the system needed relabeling. Temporarily setting it to permissive from within the rescue shell fixes it; then just touch /.autorelabel and reboot again to relabel the system.
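The SELinux steps can be sketched as follows, assuming the standard `/etc/selinux/config` file with a `SELINUX=` line; `set_selinux_mode` is a hypothetical helper, and switching back to enforcing at the end is an assumption based on "temporarily".

```shell
#!/bin/sh
# set_selinux_mode FILE MODE
# Hypothetical helper: set SELINUX=MODE (enforcing, permissive or disabled)
# in an SELinux config file such as /etc/selinux/config. SELINUXTYPE= is
# left untouched because the pattern requires "SELINUX=" exactly.
set_selinux_mode() {
  case $2 in
    enforcing|permissive|disabled) ;;
    *) echo "invalid mode: $2" >&2; return 1 ;;
  esac
  sed -i "s/^SELINUX=.*/SELINUX=$2/" "$1"
}

# From the rescue shell, inside the chroot:
#   set_selinux_mode /etc/selinux/config permissive
# Then, on the booted system:
#   touch /.autorelabel && reboot
# Presumably, once the relabel has run, switch back:
#   set_selinux_mode /etc/selinux/config enforcing
```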