"Random" reboot problems

General support questions
WizardStan
Posts: 2
Joined: 2019/12/19 16:46:11

"Random" reboot problems

Post by WizardStan » 2019/12/19 20:02:07

I just joined today because I have an interesting problem that I have never encountered in all my years, and I'm hoping someone can shed some light on it.
tl;dr: power management shuts off my screen after 10 minutes, then my computer reboots, and I don't know why. This post is long because I've done SCIENCE! I apologize.

I have a minimal install of CentOS 7; this is meant to be a headless server for throwing up a few prototype web services. It was set up about a month ago and then put aside until I picked it up again this morning.
So it boots, I ssh into it and start preparing it for what I need it to do, and after 10 minutes of uptime it reboots. Consistently. Exactly 10 minutes, maybe 10 minutes and a couple of seconds. I ssh in, work for 10 minutes, and it reboots.
Here's a snippet of my "last reboot" output:

Code:

reboot   system boot  3.10.0-1062.el7. Thu Dec 19 06:43 - 11:50  (05:06)    
reboot   system boot  3.10.0-1062.el7. Thu Dec 19 06:32 - 11:50  (05:18)    
reboot   system boot  3.10.0-1062.el7. Thu Dec 19 06:08 - 11:50  (05:42)    
reboot   system boot  3.10.0-1062.el7. Thu Dec 19 05:58 - 11:50  (05:52)    
reboot   system boot  3.10.0-1062.el7. Thu Dec 19 05:48 - 11:50  (06:02)    
reboot   system boot  3.10.0-1062.el7. Thu Dec 19 05:38 - 10:47  (05:08)    
reboot   system boot  3.10.0-1062.el7. Thu Dec 19 05:28 - 10:47  (05:19)    
reboot   system boot  3.10.0-1062.el7. Thu Dec 19 05:18 - 10:47  (05:29)    
reboot   system boot  3.10.0-1062.el7. Thu Dec 19 05:08 - 10:47  (05:39)
Something is misconfigured with my hardware clock, I assume, overcompensating for the timezone; I'll figure that out later. The important thing is that 05:08 => 10:08, and each boot lasts 10 minutes.
One thing to note here: on the fourth line from the bottom (05:38 - 10:47), I actually forced a clean reboot after waiting 9 minutes 50 seconds, just to see if that made a difference. It did not; the next loop was exactly 10 minutes again.
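The 10-minute pattern can be read straight off the boot start times. A minimal Python 3 sketch (the helper name boot_gaps_minutes is hypothetical, and it assumes every boot falls on the same day, as in the excerpt above) that computes the gap between consecutive boots from "last reboot" output:

```python
import re
from typing import List

# Matches the boot start time, e.g. "Thu Dec 19 06:43 ", in `last reboot` lines.
BOOT_TIME = re.compile(r"\b\w{3} \w{3} +\d+ (\d{2}):(\d{2}) ")


def boot_gaps_minutes(last_output: str) -> List[int]:
    """Return the minutes between consecutive boots, oldest first.

    Assumes all boots fall on the same calendar day (no midnight rollover).
    """
    starts = []
    for line in last_output.splitlines():
        if not line.startswith("reboot"):
            continue
        match = BOOT_TIME.search(line)
        if match:
            hours, minutes = int(match.group(1)), int(match.group(2))
            starts.append(hours * 60 + minutes)
    starts.sort()
    return [later - earlier for earlier, later in zip(starts, starts[1:])]


SAMPLE = """\
reboot   system boot  3.10.0-1062.el7. Thu Dec 19 06:43 - 11:50  (05:06)
reboot   system boot  3.10.0-1062.el7. Thu Dec 19 06:32 - 11:50  (05:18)
reboot   system boot  3.10.0-1062.el7. Thu Dec 19 06:08 - 11:50  (05:42)
reboot   system boot  3.10.0-1062.el7. Thu Dec 19 05:58 - 11:50  (05:52)
reboot   system boot  3.10.0-1062.el7. Thu Dec 19 05:48 - 11:50  (06:02)
reboot   system boot  3.10.0-1062.el7. Thu Dec 19 05:38 - 10:47  (05:08)
reboot   system boot  3.10.0-1062.el7. Thu Dec 19 05:28 - 10:47  (05:19)
reboot   system boot  3.10.0-1062.el7. Thu Dec 19 05:18 - 10:47  (05:29)
reboot   system boot  3.10.0-1062.el7. Thu Dec 19 05:08 - 10:47  (05:39)
"""

if __name__ == "__main__":
    print(boot_gaps_minutes(SAMPLE))  # prints [10, 10, 10, 10, 10, 10, 24, 11]
```

The two longer gaps (24 and 11 minutes) line up with the 06:08 and 06:32 sessions described below, where a watch process and keyboard activity kept the machine up for longer.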

5:48, 5:58, 6:08, all ten minutes apart, right? Then during the 6:08 run I wanted to see exactly how long it runs, to the second, so I started running "watch -n 1 uptime". And I waited 10 minutes... and it kept going: 10 seconds, 20 seconds, one minute, two minutes. So long as I'm doing "something" (even a simple watch), it doesn't reboot. I expected it to reboot at 06:18, but it didn't. At 6:20 I plugged in a monitor... odd, the screen is black. I plug in the keyboard, hit a key, and EH, there's my login prompt. I stop the watch process in my ssh terminal, log in on the real terminal, make sure everything is fine, and log out again. Then I wait; this time I set a timer on my phone, because I have an idea. It is 6:22 now.
10 minutes later, at 6:32, the screen goes black. Two seconds later it reboots.

In the 6:32 session I started "watch -n 1 uptime" and waited, this time with the monitor plugged in. 10 minutes later the screen went black. My watch kept going: 10 seconds, 20 seconds, 30 seconds. I stopped it. 5 seconds later, reboot.

Since then I have discovered the following:
1) The screen shuts off after 10 minutes of inactivity at the main terminal. If I log in through ssh, it still shuts the screen off. If I plug in a keyboard and log in, the 10-minute timer restarts from the last keystroke. Conclusion: power management, even in a minimal, non-GUI install, is shutting off my screen.
2) This causes my computer to reboot for some reason. It may be directly related, or it may be a side effect of something tangentially related; I don't know.
3) The reboot can be postponed by about 3 minutes (2 minutes 55 seconds, consistently) by running a process in an ssh session: instead of rebooting at the 10-minute mark when the screen turns off, it reboots at 13 minutes, with the screen off for the last 3 minutes. If I stop this process after the first 10 minutes, once the screen has gone black, it reboots within 10 seconds.
4) I can combine these two facts: if I run a process in an ssh session and hit a key on the "real" keyboard within 13 minutes (the screen goes black after 10 minutes and wakes up when the key is pressed), the apparent 10 (13) minute countdown starts again.
5) I can log in on the main terminal and run "setterm -blank 0", and this solves everything: the screen no longer goes black and the PC no longer reboots. It doesn't matter whether there's an ssh session running a watch process or not; 10 minutes, 13 minutes, 20 minutes, it just keeps going.
6) I can run "setterm -blank 1" to trigger the blank/reboot in 1 (or 4) minutes.

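Findings 1, 5 and 6 point at the kernel's own virtual-console blanking timer rather than anything GUI-related. On a 3.10 kernel like this one, that timer is the consoleblank setting, which defaults to 600 seconds (exactly 10 minutes), and as far as I can tell "setterm -blank N" (N in minutes) changes the same value at runtime. A minimal Python 3 sketch that reads it, assuming the machine exposes it at /sys/module/kernel/parameters/consoleblank (the helper names are hypothetical):

```python
from pathlib import Path
from typing import Optional

# Kernel console blanking timeout in seconds (0 means blanking is disabled).
CONSOLEBLANK = Path("/sys/module/kernel/parameters/consoleblank")


def console_blank_seconds(path: Path = CONSOLEBLANK) -> Optional[int]:
    """Return the console blanking timeout in seconds, or None if unreadable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def describe_blanking(seconds: Optional[int]) -> str:
    """Turn the raw timeout into a short human-readable summary."""
    if seconds is None:
        return "consoleblank not available on this system"
    if seconds == 0:
        return "console blanking is disabled"
    return f"console blanks after {seconds // 60} min {seconds % 60} s of inactivity"


if __name__ == "__main__":
    print(describe_blanking(console_blank_seconds()))
```

On an untouched CentOS 7 box this should report 10 minutes; after "setterm -blank 0" it should report that blanking is disabled.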
Questions:
1) Why does it blank after 10 minutes? And how can I properly disable this? I'm inundated with advice on how to change it in a graphical environment, but as mentioned, this is meant to be headless; no GUI is installed and I don't intend to install one. One suggestion I found was to add "setterm -blank 0" to the rc.local file, but that doesn't seem to work (even after fixing everything so that it actually runs).
2) What magic is going on that a watch process in a separate session can keep it going for an extra 2 minutes 55 seconds?
3) Why does it suddenly decide to reboot within seconds of the watch process stopping?

I feel like I should just wipe the drive and start over, but this feels like too interesting a puzzle to leave unsolved; if I wipe it, I may never know!
I don't even know how to go about debugging this further. Here are all the things it does in these controlled environments... WHY?
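A likely reason the rc.local approach fails: setterm writes its escape codes to its own standard output, and under systemd, rc.local's output most likely goes to the journal rather than to the physical console. A minimal Python 3 sketch (hypothetical helper names, assuming the physical console is /dev/tty1 and the script runs as root) that sends the console codes documented in console_codes(4) straight to the console device instead:

```python
# Linux console escape codes, as documented in console_codes(4):
#   ESC [ 9 ; n ]   set the screen blank timeout to n minutes (0 disables it)
#   ESC [ 14 ; n ]  set the VESA powerdown interval to n minutes
def console_blank_codes(blank_minutes: int = 0, powerdown_minutes: int = 0) -> str:
    """Build the escape sequence for the given blank/powerdown intervals."""
    if blank_minutes < 0 or powerdown_minutes < 0:
        raise ValueError("intervals must be non-negative")
    return f"\x1b[9;{blank_minutes}]\x1b[14;{powerdown_minutes}]"


def apply_to_console(codes: str, tty: str = "/dev/tty1") -> None:
    """Write the codes to a virtual console device (normally requires root).

    Writing to the tty device itself, rather than to stdout, is the point:
    a script started from rc.local does not have the console on its stdout.
    """
    with open(tty, "w") as console:
        console.write(codes)


if __name__ == "__main__":
    # Only show the sequence here; call apply_to_console(...) as root to use it.
    print(repr(console_blank_codes()))
```

Run as root, apply_to_console(console_blank_codes()) should have the same effect as typing "setterm -blank 0" at the console (finding 5), but only until the next reboot; a kernel parameter is the persistent route.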

TrevorH
Site Admin
Posts: 33215
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: "Random" reboot problems

Post by TrevorH » 2019/12/19 20:13:45

I'd look at BIOS updates and BIOS settings, especially ones about power management and/or watchdogs.

You should also look at running yum update as your system is behind. But make sure that you do whatever is required to stop it rebooting while that's running or it'll add another set of problems to the pile.
The future appears to be RHEL or Debian. I think I'm going Debian.
Info for USB installs on http://wiki.centos.org/HowTos/InstallFromUSBkey
CentOS 5 and 6 are deadest, do not use them.
Use the FAQ Luke

desertcat
Posts: 843
Joined: 2014/08/07 02:17:29
Location: Tucson, AZ

Re: "Random" reboot problems

Post by desertcat » 2019/12/20 02:10:13

While this is *probably* NOT the problem, there is indeed *some* similarity with a screen saver, i.e. after exactly 10 minutes without detected activity it kicks in; in your case, though, it reboots the machine instead. Maybe it is expecting to have a monitor plugged in?!? Some machines auto-reboot once every 24 hours. In your case the auto-reboot happens every 10 minutes... unless it detects activity, at which point it stops the clock until it no longer detects activity, and then the countdown starts again. That is what it *sounds* like to me, but I am probably wrong. The solution is probably in some config file somewhere.

Merry Christmas and a Happy New Year

WizardStan
Posts: 2
Joined: 2019/12/19 16:46:11

Re: "Random" reboot problems

Post by WizardStan » 2019/12/20 18:09:41

And the solution, I guess:

I added "consoleblank=0 nomodeset text" to my kernel parameters.
I added "consoleblank=0" first, to keep it from shutting off the screen and rebooting after 10 minutes, but then it started rebooting after 30 seconds when no monitor was connected at all. I don't know how I figured out that "nomodeset text" was required, but it works; it probably only needs "nomodeset", but I'm going to dust off my hands and not touch it now that it's working.
So the lesson I'm taking away: if something weird is going on, check your kernel parameters.
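On CentOS 7 the parameters can be added to every installed kernel with grubby --update-kernel=ALL --args="consoleblank=0 nomodeset text". To confirm they actually took effect after a reboot, compare /proc/cmdline against what you expect. A minimal Python 3 sketch (helper names hypothetical; the simple whitespace split does not handle quoted values containing spaces):

```python
from pathlib import Path
from typing import Dict, Optional


def parse_cmdline(cmdline: str) -> Dict[str, Optional[str]]:
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params: Dict[str, Optional[str]] = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def missing_params(cmdline: str, wanted: Dict[str, Optional[str]]) -> Dict[str, Optional[str]]:
    """Return the wanted parameters that are absent or set to a different value."""
    current = parse_cmdline(cmdline)
    return {name: value for name, value in wanted.items()
            if name not in current or current[name] != value}


if __name__ == "__main__":
    proc = Path("/proc/cmdline")
    running = proc.read_text() if proc.exists() else ""
    # An empty dict means everything we asked for is active on this boot.
    print(missing_params(running, {"consoleblank": "0", "nomodeset": None}))
```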

KernelOops
Posts: 428
Joined: 2013/12/18 15:04:03
Location: xfs file system

Re: "Random" reboot problems

Post by KernelOops » 2019/12/21 18:40:44

The nomodeset parameter tells the kernel not to use kernel mode setting, which in practice keeps the KMS graphics drivers (i915, radeon, nouveau and the like) from taking over the display.

If that fixed your reboot problem, then maybe there is an incompatibility between your hardware (video card, chipset) and a specific kernel driver.

Have you checked journalctl for related errors? You could run "journalctl -f" (follow) and watch the logs go by when the reboot occurs.
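One catch: by the time you log back in, the interesting messages belong to the previous boot, and on a default CentOS 7 install journald only keeps them in memory unless /var/log/journal exists (create it, then restart systemd-journald). A minimal Python 3 sketch (helper names hypothetical) that checks for a persistent journal and builds the journalctl call for the previous boot's warnings and errors:

```python
import os
from typing import List

# With journald's default Storage=auto, logs are written to disk (and so
# survive a reboot) only if this directory exists; otherwise they are kept
# under /run/log/journal and vanish with the reboot.
PERSISTENT_JOURNAL_DIR = "/var/log/journal"


def journal_is_persistent(path: str = PERSISTENT_JOURNAL_DIR) -> bool:
    """True if the on-disk journal directory is present."""
    return os.path.isdir(path)


def previous_boot_command(priority: str = "warning") -> List[str]:
    """journalctl arguments: previous boot (-b -1), messages at `priority` or worse."""
    return ["journalctl", "-b", "-1", "-p", priority, "--no-pager"]


if __name__ == "__main__":
    if journal_is_persistent():
        print("run:", " ".join(previous_boot_command()))
    else:
        print("journal is volatile; create", PERSISTENT_JOURNAL_DIR,
              "and restart systemd-journald to keep logs across reboots")
```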
--
R.I.P. CentOS :cry:
--

desertcat
Posts: 843
Joined: 2014/08/07 02:17:29
Location: Tucson, AZ

Re: "Random" reboot problems

Post by desertcat » 2020/03/10 09:04:24

WizardStan wrote:
2019/12/19 20:02:07
tl;dr; power management shuts off my screen after 10 minutes and then my computer reboots and I don't know why. This post is long because I've done SCIENCE! I apologize.

I am in the midst of rebuilding an OLD computer which "blew up" last week: it was making noise, so I powered it down and then powered it back up, BUT.... NOTHING!!! The PSU is the "Chief Suspect" in this case: the motherboard was getting power and that was about it. Since I had to take 75% of the computer apart just to remove the PSU, after battling my way through the maze of cables etc. I said "ENOUGH" and decided to junk the case and buy a better one (Corsair Carbide 200R). So the old case is history, the new case is in, and I have been rebuilding the computer part by part as small parts keep rolling in. I'm just waiting on some case fans and the PSU, which is the EXACT same 550W power supply I had in there. While the PSU's specs say it has -5V power, which my motherboard seems to need just to turn on, the one that arrived came without the WHITE -5V power wire (pin 18 on a 20-pin connector), so I sent it back to have them replace it with one that DOES have the -5V power. But I was "CURIOUS" (it will be the death of me yet!!) to see what that -5V actually does, and came across this article:

http://www.technologyuk.net/computing/c ... unit.shtml

I found out what the -5V rail is used for:

"-5V Used on some early personal computers for floppy disk controllers and some ISA add-on cards. Generally unused on newer systems. Current is usually limited to 1A."

Well, that makes sense: my motherboard was built in 2004 and YES, it still has a built-in floppy port.

... BUT then I came across the following statement:

"Power supply failure will invariably require the replacement of the PSU, since the computer will obviously not function without it. Such failures often result from overheating due to the breakdown of the cooling fan. The system subsequently powers itself off and cannot be rebooted or, as sometimes happens, repeatedly reboots itself at apparently random intervals."


I remembered we tried to puzzle this one out but could not figure it out. Now, by chance, I think I know the ANSWER: your POWER SUPPLY was/is on its way out to visit Thor!! (the God of Thunder and LIGHTNING, aka ELECTRICITY, i.e. it's about to die!!).

Thought I'd pass on this little bit of information. Me?!? I'd have never thought of that.

D'Cat

afewgoodman
Posts: 98
Joined: 2019/12/11 03:51:58

Re: "Random" reboot problems

Post by afewgoodman » 2020/03/11 01:40:45

Hi,

If you have also had the AC power turned off for a long time, the coin battery on the motherboard that maintains your hardware clock may be at the end of its life.

Try changing the time in the BIOS and then rebooting; if the clock is wrong again, you should replace the coin battery on the motherboard.
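Before swapping the battery, it can help to see how far the hardware clock is from the system clock. A minimal Python 3 sketch, assuming the RTC is exposed at /sys/class/rtc/rtc0/since_epoch (the helper name is hypothetical). An offset of a whole number of hours usually means the RTC is kept in local time rather than UTC ("timedatectl" reports this as "RTC in local TZ"), which would fit the shifted timestamps in the "last reboot" output earlier in the thread; an arbitrary offset that keeps growing after the machine has been unplugged points more toward a weak coin battery.

```python
import time
from pathlib import Path
from typing import Optional

# The RTC driver exposes the hardware clock as seconds since the epoch,
# interpreted as if the RTC were kept in UTC.
RTC_SINCE_EPOCH = Path("/sys/class/rtc/rtc0/since_epoch")


def rtc_offset_seconds(path: Path = RTC_SINCE_EPOCH, now: Optional[float] = None) -> Optional[int]:
    """Hardware clock minus system clock, in seconds; None if the RTC is unreadable."""
    try:
        rtc = int(path.read_text().strip())
    except (OSError, ValueError):
        return None
    system = time.time() if now is None else now
    return rtc - int(system)


if __name__ == "__main__":
    offset = rtc_offset_seconds()
    if offset is None:
        print("no readable RTC at", RTC_SINCE_EPOCH)
    else:
        print(f"hardware clock is {offset / 3600:+.2f} h from system time")
```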

BR.

desertcat
Posts: 843
Joined: 2014/08/07 02:17:29
Location: Tucson, AZ

Re: "Random" reboot problems

Post by desertcat » 2020/03/11 10:09:05

afewgoodman wrote:
2020/03/11 01:40:45
Try to change time in the BIOS, then reboot, if you go through it again, you should change coin battery in the main board.

Given that I decided to go the whole enchilada and "upgrade" everything from drives to cables, case, case fans, etc., and since I had everything apart and lying everywhere and could not remember the last time I changed the battery, I decided that for a "Few Dollars More" (sounds like a movie title) I'd pop in a new coin battery as well. Great minds must think alike. Back in the days of the IBM AT (this slightly dates me) I remember that if the time and date were not set correctly the computer would refuse to turn on, so... just in case that strange quirk still exists, I decided to change the battery and eliminate any potential problem.

I'm waiting on a pair of case fans due to arrive tomorrow (Thurs.), but what I'm really waiting for is the RMA PSU, hopefully by early next week. For the most part I've done a nice rebuild, and it will be a piece of cake to service now. No more hernias moving it around, no more pulling some wire loose while trying to snake my hand in to disconnect something else, no more bloodied knuckles from trying to reach a drive; maintenance will be so much easier.

Once the PSU is installed, all that will remain is to test it and make sure the computer comes back to life, reset the time and date, etc. The final thing I'll need to do after that is cable management, which at this point is very loosely held in place, if it is held in place at all. Cats have Nine Lives; well, bobcat will now be on its third one, and each time it changes in some small way, with the addition or subtraction of some legacy hardware. This time I've added a USB multi-card reader/writer which I salvaged from the old case before it was scrapped. Without a doubt this has got to be one of the sleekest rebuilds I've ever done: everything is BLACK!! Except for the 3.5" floppy drive, which gives the game away, this computer looks like it is of recent vintage. At least now it won't stand out like a sore thumb.

Thanks for the info.

D'Cat
