[CLOSED] - Reboot after kernel grub update fails

warron.french
Posts: 616
Joined: 2014/03/27 20:21:58

Post by warron.french » 2021/01/12 03:25:10

A coworker and I just upgraded our system, and the upgrade included a new kernel, 3.10.0-1160.13.1.

We then rebooted the machine, and after selecting any entry from the GRUB menu, the server simply hangs.
The rescue menu entry does not work either.

This is on a Red Hat, not CentOS, server.

Any ideas what could have caused this? It happened on a physical server so we cannot recover with a VM snapshot, and of course, there is no backup of the server either.
Last edited by warron.french on 2021/02/01 19:56:33, edited 1 time in total.
Thanks,
War

BShT
Posts: 585
Joined: 2019/10/09 12:31:40

Post by BShT » 2021/01/12 12:35:14

try to boot verbose

Post by warron.french » 2021/01/12 16:09:40

BShT wrote: 2021/01/12 12:35:14
try to boot verbose

Sorry, can you provide a little more context and detail, please?
Thanks,
War


Post by warron.french » 2021/01/13 19:25:39

BShT, I read the link, but it doesn't really help.

However, I was able to recover the system with an old copy of grub.cfg. Before initiating the update of our system, my coworker backed up /etc/grub2.cfg to /etc/grub2.cfg.bak.

With the system in Rescue Mode, I was able to mount the root (/) and /boot partitions and copy the backup file into place. I then rebooted the machine from disk and chose the newest kernel listed in the backup copy of grub.cfg, which was not the newest kernel installed. The reboot from the HDD worked perfectly, without any modification to the linux16 line.

This is the second time I have had to recover my system this way after a kernel update. Ironically, I didn't know that I needed the prior grub.cfg file to recover the system and then build a new one from the running system. Why is this happening? Could the fact that the server has no swap partition at all be part of this?
Thanks,
War

Post by BShT » 2021/01/13 19:54:47

I don't know why, but I do know that it is not good to run without swap.

Make a 1 GB swap file; it does not hurt.

The Linux kernel benefits from having some swap, although it does not strictly require it.

Post by warron.french » 2021/01/14 01:40:01

@BShT, thanks. That confirms my suspicion. I know that Solaris makes it a minimum requirement along with the root (/) and /var partitions.

I think what I am going to do is uninstall and reinstall the most recently applied kernel packages. If I see a problem at the next reboot, I will create a swap file, enable it, and then uninstall and reinstall the same kernel packages once more.

I am wondering if the lack of swap is the reason for the bad kernel package installations, specifically the initramfs builds and the associated files such as vmlinuz. I don't yet know anything about the relationship between System.map, initramfs, vmlinuz, and the other files. Once I learn that, I think I can understand why this particular server crashes after every kernel package update.
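
As a rough illustration of how those files relate: on a RHEL 7 style system, each installed kernel normally has a vmlinuz-<version>, an initramfs-<version>.img, and a System.map-<version> in /boot, all sharing the same version-release string. The sketch below is only one possible check, not a vendor tool. It groups a list of file names by that string and reports kernels that lack one of the three files. The function names and the choice to skip kdump and rescue images are assumptions made for this example.

```python
import re

# Assumed RHEL 7 style naming in /boot; adjust if your layout differs.
PATTERNS = {
    "vmlinuz": re.compile(r"^vmlinuz-(.+)$"),
    "initramfs": re.compile(r"^initramfs-(.+)\.img$"),
    "System.map": re.compile(r"^System\.map-(.+)$"),
}


def group_boot_files(filenames):
    """Map each kernel version-release string to the kinds of files found for it."""
    found = {}
    for name in filenames:
        # kdump initramfs images are extra files, not a separate kernel.
        if name.endswith("kdump.img"):
            continue
        for kind, pattern in PATTERNS.items():
            match = pattern.match(name)
            if not match:
                continue
            version = match.group(1)
            # Rescue images are skipped because they have no System.map.
            if version.startswith("0-rescue-"):
                continue
            found.setdefault(version, set()).add(kind)
    return found


def incomplete_kernels(filenames):
    """Return {version: [missing kinds]} for kernels lacking any of the three files."""
    missing = {}
    for version, kinds in group_boot_files(filenames).items():
        absent = sorted(set(PATTERNS) - kinds)
        if absent:
            missing[version] = absent
    return missing
```

On a running system this could be called as incomplete_kernels(os.listdir("/boot")) after importing os. A kernel that shows up with a missing initramfs would be a candidate for the uninstall and reinstall described above. The check says nothing about whether swap is involved.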
Thanks,
War

Post by warron.french » 2021/01/15 14:51:05

I managed to boot from the HDD using Rescue Mode and put a backup grub.cfg back into place, one with a menuentry matching a kernel that was still on my system.

I was then able to escalate privileges to root and remove the last kernel package installed, which also removed all of its associated (child) dependencies. I then rebooted from the HDD and chose a GRUB menuentry that worked.

I reinstalled the kernel package (kernel-3.10.0-1160.6.1), rebooted, and chose the GRUB menuentry for the newly installed kernel without issue.

One interesting thing I have noticed across all of the backed-up broken and working copies:
In the working copy (yes, the working one) of /boot/grub2/grub.cfg, every menuentry line contains a reference to gnulinux-3.10.0-1062. In each of them, the kernel version-release in the entry does not match the fixed value gnulinux-3.10.0-1062.7.1.el7.x86_64-advanced-someUUID. The one exception is the Rescue entry, which does not carry the same gnulinux string. I cannot find an explanation of what this string does or where the script that builds that line gets the version numbers from. Any ideas?

In the broken copies, the gnulinux-version-release and the menuentry version-release details match up perfectly. I am confused.

Shouldn't they actually match and not be different version-release strings?

The version-release strings in the menuentry items also match those of the vmlinuz and initramfs files referenced on the linux16 and initrd16 lines.
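
The comparison described above can be sketched in a few lines. This is a minimal example, assuming the RHEL 7 style layout where the menuentry title carries the version in parentheses and the gnulinux-...-advanced-... string is the entry identifier. The sample text, the regular expressions, and the function names are made up for illustration and are not taken from the real file.

```python
import re

# Patterns are assumptions based on the grub.cfg layout described in this thread.
TITLE_RE = re.compile(r"^\s*menuentry\s+'([^']*)'")
VERSION_IN_TITLE_RE = re.compile(r"\(([^)]+)\)")
ID_RE = re.compile(r"'gnulinux-(.+?)-advanced-[^']*'")
KERNEL_RE = re.compile(r"^\s*linux(?:16|efi)?\s+\S*vmlinuz-(\S+)")
INITRD_RE = re.compile(r"^\s*initrd(?:16|efi)?\s+\S*initramfs-(\S+)\.img")

# Hypothetical sample: the first entry mimics the mismatch seen in the working copy.
SAMPLE_CFG = """\
menuentry 'Red Hat Enterprise Linux Server (3.10.0-1160.6.1.el7.x86_64) 7.9 (Maipo)' --class red $menuentry_id_option 'gnulinux-3.10.0-1062.7.1.el7.x86_64-advanced-someUUID' {
        linux16 /vmlinuz-3.10.0-1160.6.1.el7.x86_64 root=/dev/mapper/rhel-root ro
        initrd16 /initramfs-3.10.0-1160.6.1.el7.x86_64.img
}
menuentry 'Red Hat Enterprise Linux Server (3.10.0-1062.7.1.el7.x86_64) 7.9 (Maipo)' --class red $menuentry_id_option 'gnulinux-3.10.0-1062.7.1.el7.x86_64-advanced-someUUID' {
        linux16 /vmlinuz-3.10.0-1062.7.1.el7.x86_64 root=/dev/mapper/rhel-root ro
        initrd16 /initramfs-3.10.0-1062.7.1.el7.x86_64.img
}
"""


def parse_entries(cfg_text):
    """Collect the version strings found in each menuentry block."""
    entries = []
    current = None
    for line in cfg_text.splitlines():
        title_match = TITLE_RE.match(line)
        if title_match:
            title = title_match.group(1)
            in_title = VERSION_IN_TITLE_RE.search(title)
            entry_id = ID_RE.search(line)
            current = {
                "title": title,
                "title_version": in_title.group(1) if in_title else None,
                "id_version": entry_id.group(1) if entry_id else None,
                "kernel_version": None,
                "initrd_version": None,
            }
            entries.append(current)
            continue
        if current is None:
            continue
        kernel = KERNEL_RE.match(line)
        if kernel:
            current["kernel_version"] = kernel.group(1)
        initrd = INITRD_RE.match(line)
        if initrd:
            current["initrd_version"] = initrd.group(1)
    return entries


def find_mismatches(entries):
    """Return entries whose title, id, kernel, and initrd versions are not all equal."""
    keys = ("title_version", "id_version", "kernel_version", "initrd_version")
    return [e for e in entries if len({e[k] for k in keys}) > 1]


for entry in find_mismatches(parse_entries(SAMPLE_CFG)):
    print(entry["title"], "->", entry["id_version"])
```

Running the same two functions over the text of a working and a broken grub.cfg would show which entries differ only in the gnulinux identifier and which differ in the kernel or initramfs paths.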
Thanks,
War

Post by warron.french » 2021/02/01 19:58:10

A total rebuild of the server became necessary. The vendor could not even explain why the GRUB files got so messed up, but their scripts made silly changes. I don't think they even tested their scripts across several upgrades, which is how the problem was reproduced.

Closing this matter.
Thanks,
War
