Centos 9-stream hypervisor randomly reboots

Post by **user9452** » 2022/06/15 11:30:59

I'm running CentOS Stream 9 as a hypervisor for four VMs. The system is very unstable and reboots at random. All the VMs were migrated from a similar Fedora 34 hypervisor installation, which worked fine for over a year without similar issues.

The only difference between the two hosts is that the Fedora host uses systemd-networkd, while the Stream 9 host uses NetworkManager to bridge to a firewall VM running OpenBSD, which handles networking.

This is the journalctl output from the previous session, just before the system rebooted itself.
/dev/sdb is a media disk used by a VM, so it shouldn't cause anything even if it has bad sectors:

Jun 15 13:55:43 hypervisor smartd[994]: Device: /dev/sdb [SAT], 2 Currently unreadable (pending) sectors
Jun 15 13:56:11 hypervisor kernel: perf: interrupt took too long (3131 > 3130), lowering kernel.perf_event_max_sample_rate to 63000
Jun 15 13:57:41 hypervisor kernel: perf: interrupt took too long (3916 > 3913), lowering kernel.perf_event_max_sample_rate to 51000
Jun 15 14:01:01 hypervisor CROND[3773]: (root) CMD (run-parts /etc/cron.hourly)
Jun 15 14:01:01 hypervisor run-parts[3776]: (/etc/cron.hourly) starting 0anacron
Jun 15 14:01:01 hypervisor run-parts[3782]: (/etc/cron.hourly) finished 0anacron
Jun 15 14:01:01 hypervisor CROND[3772]: (root) CMDEND (run-parts /etc/cron.hourly)
Jun 15 14:01:16 hypervisor virtqemud[1104]: internal error: connection closed due to keepalive timeout
Jun 15 14:03:16 hypervisor systemd[1]: virtnetworkd.service: Deactivated successfully.
Jun 15 14:03:16 hypervisor systemd[1]: virtnetworkd.service: Unit process 2372 (dnsmasq) remains running after unit stopped.
Jun 15 14:03:16 hypervisor systemd[1]: virtnetworkd.service: Unit process 2373 (dnsmasq) remains running after unit stopped.
Jun 15 14:03:16 hypervisor systemd[1]: virtnodedevd.service: Deactivated successfully.
Jun 15 14:03:16 hypervisor systemd[1]: virtstoraged.service: Deactivated successfully.
Jun 15 14:05:52 hypervisor NetworkManager[1079]: <info> [1655291152.9085] dhcp6 (br0): state changed new lease, address=2001ab00:c218::1
Jun 15 14:16:48 hypervisor sshd[2218]: pam_unix(sshd:session): session closed for user root
Jun 15 14:16:48 hypervisor systemd[1]: session-4.scope: Deactivated successfully.

Post by **TrevorH** » 2022/06/15 14:59:03

Do you get kernel panic output on the console? There are various settings controlling how long the kernel waits after a panic before it reboots, so you may want to check the current setting in /proc/sys/kernel/panic and consult https://www.kernel.org/doc/Documentatio ... kernel.txt for its meaning. I believe a setting of 0 (the default on my RHEL 9) means wait forever.
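As a rough illustration of how the value in /proc/sys/kernel/panic is read, here is a minimal sketch. The function name describe_panic is hypothetical and not part of any tool. It assumes the semantics described in the kernel sysctl documentation: zero means no automatic reboot, a negative value means reboot immediately, and a positive value is a delay in seconds.

```shell
# Hypothetical helper: explain what a kernel.panic value means.
describe_panic() {
    value=$1
    if ! printf '%s' "$value" | grep -Eq '^-?[0-9]+$'; then
        # Covers an empty or otherwise unexpected reading.
        echo "not a number: '$value'"
    elif [ "$value" -eq 0 ]; then
        echo "no automatic reboot (wait forever)"
    elif [ "$value" -lt 0 ]; then
        echo "reboot immediately"
    else
        echo "reboot after $value seconds"
    fi
}

# On a Linux host, describe the value currently in effect.
if [ -r /proc/sys/kernel/panic ]; then
    describe_panic "$(cat /proc/sys/kernel/panic)"
fi
```

If the file reads back as empty, the first branch reports that, which is a hint to query the value another way, for example with sysctl -n kernel.panic.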

Post by **user9452** » 2022/06/15 16:15:50

I don't know about console output, since I haven't been in front of the machine when this has happened and the system reboots immediately.

The panic setting seemed to be empty, so I don't know whether it reboots immediately or waits forever. Anyway, I ran sysctl -w kernel.panic="0", and I guess now I play the waiting game and see what I get.
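A value set with sysctl -w lasts only until the next boot, which matters on a host that reboots by itself. The sketch below shows one way to keep it across reboots with a drop-in file. The function name persist_panic and the file name 90-panic.conf are assumptions chosen for illustration. The directory argument exists only so the helper can be tried against a scratch directory first.

```shell
# Hypothetical helper: write a sysctl drop-in so kernel.panic survives a reboot.
# The file name 90-panic.conf is an assumption, not a required name.
persist_panic() {
    value=$1
    dir=${2:-/etc/sysctl.d}   # default to the system drop-in directory
    printf 'kernel.panic = %s\n' "$value" > "$dir/90-panic.conf"
}

# On the hypervisor, as root:
#   persist_panic 0
#   sysctl --system          # reload all sysctl configuration files
#   sysctl -n kernel.panic   # print the value now in effect
```

Checking with sysctl -n after the next unplanned reboot confirms whether the setting was still in place when the problem occurred.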

Post by **TrevorH** » 2022/06/15 17:19:25

Code:

[root@rhel9 ~]# cat /proc/sys/kernel/panic
0
