CentOS 6.9 server keeps fatally hanging every few days with a completely empty screen and no crash file or log errors. W

jiwkel
Posts: 1
Joined: 2021/12/05 11:35:07

Post by jiwkel » 2021/12/05 11:38:05

Hi all, we have a CentOS 6.9 server that's fairly important for us as a gateway machine, but it keeps going down every few days. I cannot for the life of me understand why. /var/log/messages shows a lot of repeating LDAP error messages, but I don't think that would crash the system. We also get a lot of SSH attempts from weird places, but the system's beefy enough for that (it's got 256 GB of RAM and 2 x Xeon E5-2670s).

dmesg doesn't show anything suspicious or notable.

At boot, we do get a weird message about the on-board MegaRAID SAS 2208. It says "your VDs that are configured for write-back are temporarily running in write-through mode. This is caused by the battery being charged, missing or bad. The following VDs are affected 00".

I've tried leaving top running so I can see whether some process is causing the freeze. The last time it froze, top showed a load average of 0.00 with no weird processes running. I'm not sure what to do at this point. Where would y'all look next?

TrevorH
Site Admin
Posts: 33202
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: CentOS 6.9 server keeps fatally hanging every few days with a completely empty screen and no crash file or log error

Post by TrevorH » 2021/12/05 15:00:39

Your first step should be to change the RAID controller battery, as it sounds like it is dead or dying. In theory that shouldn't affect anything, but with the battery out of action the controller is running in write-through mode, which means using code that doesn't get used very often and could be buggy.
"/var/log/messages shows a lot of repeating LDAP error messages"
Check the logs for network card errors just before those LDAP messages start.
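One rough way to do that check is a small script that finds the first LDAP line in the log and lists any NIC-looking kernel messages shortly before it. This is only a sketch: the patterns below are assumptions about what the LDAP client and the network driver log on this box, and `nic_errors_before_ldap` and `scan_file` are made-up helper names, so adjust them to match what is actually in /var/log/messages.

```python
import re

# Assumed patterns: tune these to the real LDAP client and NIC driver messages.
LDAP_PATTERN = re.compile(r"ldap", re.IGNORECASE)
NIC_PATTERN = re.compile(
    r"link is down|link down|tx timeout|transmit queue.*timed out|netdev watchdog",
    re.IGNORECASE,
)


def nic_errors_before_ldap(lines, window=200):
    """Return NIC-looking lines found in the `window` lines before the first LDAP line."""
    lines = list(lines)
    for index, line in enumerate(lines):
        if LDAP_PATTERN.search(line):
            start = max(0, index - window)
            return [l for l in lines[start:index] if NIC_PATTERN.search(l)]
    # No LDAP line at all, so there is nothing to look back from.
    return []


def scan_file(path, window=200):
    """Convenience wrapper: run the check against a log file on disk."""
    with open(path, errors="replace") as handle:
        return nic_errors_before_ldap((line.rstrip("\n") for line in handle), window)
```

Calling `scan_file("/var/log/messages")` as root would print nothing useful by itself; loop over the returned list and print each line. It only looks at the first burst of LDAP errors in the file, so rotated logs need to be checked separately.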
"we have a CentOS 6.9 server that's fairly important for us as a gateway machine"
And now the elephant in the room: CentOS 6 is dead and should no longer be used. In addition, 6.9 (from 2017) is not even the most recent point release. That was 6.10 (2018), which kept receiving patches right up until CentOS 6 went EOL in 2020. At the very least you should already be on 6.10 plus all the assorted patches that came out after it. Get there by running yum update, though you may need to disable the base/updates/extras repos and use the Vault copies of those (or maybe even edit the .repo files to point them to vault.centos.org). Once that's done, you need to go away and start thinking about how to get off 6 altogether. It's not safe to run, and every new security vulnerability that comes out makes it less so. CentOS 7 has about another 2.5 years of life left. CentOS 8 has less and goes EOL at the end of 2021, so look at Rocky/Alma/OEL or even RHEL for a replacement.
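For the .repo edit, something along these lines would do the rewrite. Treat it as a sketch under assumptions: it assumes the stock CentOS-Base.repo layout (an active mirrorlist= line and a commented-out #baseurl= line pointing at mirror.centos.org), it assumes the 6.10 tree under vault.centos.org is the one you want, and `point_repo_at_vault` is just an illustrative name. Back up the original file first and check the result by eye before running yum update.

```python
import re

# Assumption: the final 6.x tree in the Vault is the target.
VAULT_BASE = "http://vault.centos.org/6.10"
# Assumption: this is the prefix used in the stock, commented-out baseurl lines.
STOCK_PREFIX = "http://mirror.centos.org/centos/$releasever"


def point_repo_at_vault(repo_text, vault_base=VAULT_BASE):
    """Return .repo text with mirrorlist disabled and baseurl pointed at the Vault."""
    out = []
    for line in repo_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("mirrorlist="):
            # The mirrorlist no longer serves EOL releases, so comment it out.
            out.append("#" + stripped)
        elif re.match(r"#?\s*baseurl=", stripped):
            # Uncomment baseurl and swap the mirror prefix for the Vault one.
            url_line = stripped.lstrip("#").strip()
            out.append(url_line.replace(STOCK_PREFIX, vault_base))
        else:
            out.append(line)
    return "\n".join(out) + "\n"
```

To use it, read /etc/yum.repos.d/CentOS-Base.repo, pass the text through `point_repo_at_vault`, and write the result back. Hard-coding 6.10 rather than keeping $releasever is deliberate here, since the Vault is organised by point release.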
The future appears to be RHEL or Debian. I think I'm going Debian.
Info for USB installs on http://wiki.centos.org/HowTos/InstallFromUSBkey
CentOS 5 and 6 are deadest, do not use them.
Use the FAQ Luke