
Kernel panic, APIC error, /dev/null error and other bad mojo

Posted: 2009/12/23 20:57:02
by gsmithe
Hi all,

I got a kernel panic the other day on a machine that has been running fine for more than six months. I can't be more descriptive, because that's all I was told over the phone before someone rebooted it.

Since then there have been no panics, but I am seeing things that trouble me, and I'm just looking for some guidance on any action I may need to take.

I get messages like this at the console:

[quote]VMHost2 procmail[5957]: Error while writing to "/dev/null" Message from syslogd@ at Tue Dec 22 01:31:26 2009[/quote]

and this, 21 times, in the dmesg output:
[quote]APIC error on CPU1: 40(40)[/quote]

followed by one different entry after those 21:
[quote]APIC error on CPU0: 00(40)[/quote]


I'm running CentOS 5.3.


lshw:
Asus M3N78-VM
AMD Athlon(tm) Dual Core Processor 5400B
8 GB RAM (4 x 2 GB)
DIMM DDR2 Synchronous 800 MHz (1.2 ns)

uname -a
Linux VMHost2 2.6.18-128.1.10.el5 #1 SMP Thu May 7 10:35:59 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

I am also running:
VMware Server 2.0.2 build-203138

Also, nothing on the console seems to get logged anywhere. Can I capture it somewhere to help track down this problem?

Thanks,

GS

Kernel panic, APIC error, /dev/null error and other bad mojo

Posted: 2009/12/23 21:52:35
by michaelnel
What's the output of "ls -l /dev/null"?

Re: Kernel panic, APIC error, /dev/null error and other bad mojo

Posted: 2009/12/24 14:55:23
by gsmithe
Thanks for the response.
Here you go:


[root@VMHost2 ~]# ls -l /dev/null
-rw-r--r-- 1 root root 0 Dec 24 08:51 /dev/null

Re: Kernel panic, APIC error, /dev/null error and other bad mojo

Posted: 2009/12/24 15:30:11
by AlanBartlett
[quote]
-rw-r--r-- 1 root root 0 Dec 24 08:51 /dev/null
[/quote]
You have a regular file of size zero rather than the required character special device node. :-o

Look --

[code]
$ ls -lZ /dev/null
crw-rw-rw- root root system_u:object_r:null_device_t /dev/null
[/code]
With [i]root[/i]'s powers, try the following --

[code]
[b]rm /dev/null
mknod -m 666 -Z "system_u:object_r:null_device_t" /dev/null c 1 3[/b]
[/code]
If unsure --

[code]
[b]man mknod[/b]
[/code]
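A minimal way to confirm the repair is to check that /dev/null is a character device with the major and minor numbers 1 and 3 passed to mknod above, and mode 666. The sketch below assumes GNU coreutils `stat` (its `%t`, `%T` and `%a` format sequences); `check_null_device` is a hypothetical helper name, and the SELinux context is not checked.

```shell
#!/bin/sh
# Sketch: report whether a path looks like a healthy null device.
# Assumes GNU coreutils stat; check_null_device is a hypothetical helper name.
check_null_device() {
    dev=${1:-/dev/null}
    # A regular file (as found on the affected boxes) fails this test.
    if [ ! -c "$dev" ]; then
        echo "broken: $dev is not a character device"
        return 1
    fi
    # %t and %T print the major and minor device numbers in hex; null is 1,3.
    majmin=$(stat -c '%t %T' "$dev")
    if [ "$majmin" != "1 3" ]; then
        echo "broken: $dev has device numbers $majmin, expected 1 3"
        return 1
    fi
    # %a prints the permission bits in octal; everyone must be able to write.
    mode=$(stat -c '%a' "$dev")
    if [ "$mode" != "666" ]; then
        echo "broken: $dev has mode $mode, expected 666"
        return 1
    fi
    echo "ok"
}
```

Calling `check_null_device /dev/null` should print `ok` on a healthy box; against the zero-length regular file found here, it would instead report that the path is not a character device and return a non-zero status.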

Re: Kernel panic, APIC error, /dev/null error and other bad mojo

Posted: 2009/12/24 16:53:51
by gsmithe
Wow, thanks.

Interesting thing, I have 2 boxes that I do the same operations on, and they were identical in their output of ls -l /dev/null.

I checked a third CentOS box, and it had the right attributes.

I ran your commands, and things look OK now.

Is there some way that I may have caused this to happen (I always thought that /dev/null and other special items were immutable)?

Also, I guess I should track down the APIC errors using the normal route of "noapic" as a boot option?

Is there any way to get the console messages to log somewhere? My searches mainly turned up references to iptables...


Thanks again for the help and expert response.

Re: Kernel panic, APIC error, /dev/null error and other bad mojo

Posted: 2009/12/24 17:33:42
by AlanBartlett
[quote]
Interesting thing, I have 2 boxes that I do the same operations on, and they were identical in their output of ls -l /dev/null.
[/quote]
One way that could happen is if a process, running with the super-user's (i.e. [i]root[/i]'s) powers, has gone haywire (or is defective) and is deleting the device node.
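One plausible sequence that fits the listing gsmithe posted (`-rw-r--r-- 1 root root 0`): once the device node has been deleted, the next command run as root that redirects output to /dev/null makes the shell create an ordinary file at that path, owned by root and, with root's usual 022 umask, mode 644. The sketch below reproduces that step in a throwaway directory rather than in /dev; it assumes GNU coreutils `stat`, and `simulate_missing_null` is a hypothetical name.

```shell
#!/bin/sh
# Sketch: what a redirection does when its target path does not exist.
# A throwaway directory stands in for /dev; nothing under /dev is touched.
# simulate_missing_null is a hypothetical helper name.
simulate_missing_null() (
    # Subshell body: the umask set below does not leak into the caller.
    demo_dir=$(mktemp -d) || exit 1
    umask 022                    # the usual root umask on CentOS 5
    # A command that writes nothing, redirected to a missing path: the shell
    # creates an empty regular file there instead of failing.
    true > "$demo_dir/null"
    # %A prints the mode the way ls -l does; %s prints the size in bytes.
    stat -c '%A %s' "$demo_dir/null"
    rm -rf "$demo_dir"
)
```

It should print `-rw-r--r-- 0`: a zero-length regular file with exactly the mode seen on both affected boxes.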

[quote]
I checked a third CentOS box, and it had the right attributes.

I ran your commands, and things look OK now.
[/quote]
Good and good. :-)

[quote]
Is there some way that I may have caused this to happen . . . ?
[/quote]
Do you regularly perform tasks as the super-user? Have you recently performed a task as the super-user? Executed a shell-script, for example?

If the answer to any of the above questions is yes, then that might be the cause, and you really should get into the habit of performing tasks as an ordinary user and only assuming super-user powers when absolutely necessary.

[quote]
I always thought that /dev/null and other special items were immutable
[/quote]
Not to the super-user, or to any process executing with the super-user's powers.

[quote]
Is there any way to get the console messages to log somewhere?
[/quote]
If messages originate from the kernel, they should be accessible via dmesg and also logged to the [i]/var/log/messages[/i] file. Messages from other subsystems that use syslog should be logged there as well. (Hint: Try a [b]tail /var/log/messages[/b] command.)

I see you are currently running a deprecated point release ([i]5.3[/i]) with a very old kernel ([i]2.6.18-128.1.10.el5[/i]). I would advise that you have a read of the [url=http://wiki.centos.org/Manuals/ReleaseNotes/CentOS5.4]CentOS 5.4 release notes[/url] and upgrade your system by following the procedure shown in Section #4.

Once you have rebooted and are running the current kernel (2.6.18-164.9.1.el5 -- as of the date of this posting), check to see if you still have any APIC errors flagged.
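To compare the APIC error count before and after the upgrade, the messages can be tallied per CPU from dmesg or /var/log/messages. A sketch, assuming a grep that supports `-o` (GNU and BSD grep both do); `count_apic_errors` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: tally "APIC error on CPUn" lines per CPU from log text on stdin,
# e.g.  dmesg | count_apic_errors   or   count_apic_errors < /var/log/messages
# count_apic_errors is a hypothetical helper name.
count_apic_errors() {
    # Keep only the "APIC error on CPUn" part of each matching line,
    # then print "CPUn count" for each CPU, sorted by CPU name.
    grep -o 'APIC error on CPU[0-9]*' |
        awk '{ print $4 }' |
        sort |
        uniq -c |
        awk '{ print $2, $1 }'
}
```

For example, `dmesg | count_apic_errors`. Given the 21 CPU1 entries and one CPU0 entry reported in the first post, the output would be `CPU0 1` and `CPU1 21`. The `sort` is lexical, which is fine for a dual-core machine but would put CPU10 before CPU2 on larger systems.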

If you are interested to see what has been changed between the 2.6.18-128.1.10.el5 and the current kernel, I have the deltas of each successive kernel change-log [url=http://www.centos.toracat.org/ajb/kernel-clog-diff/]available[/url] for viewing. In your case, you are missing a considerable number of bug and security fixes -- files 28- to 36- will tell all.

Re: Kernel panic, APIC error, /dev/null error and other bad mojo

Posted: 2009/12/24 18:36:16
by gsmithe
Alan,

I have searched this forum numerous times, and I must say that your posts are among the best. Thanks for your time in giving such great answers and support.

I have delayed updating to 5.4 because I haven't yet tested my running applications to confirm they won't blow up in production. I'll try updating to the newest 5.4 release and kernel before trying "noapic" or other boot options.

Also, the "error writing to /dev/null" message isn't in /var/log/messages or in dmesg either. Those messages were written to the console, and that's it.

Thanks again for your help.

GS

Re: Kernel panic, APIC error, /dev/null error and other bad mojo

Posted: 2009/12/24 18:44:48
by toracat
[quote]
gsmithe wrote:

I am also running:
VMware Server 2.0.2 build-203138
[/quote]
One caution for when you update to CentOS 5.4. There is a known issue with VMware Server 2.0.x and the glibc in CentOS 5.4:

http://bugs.centos.org/view.php?id=3884

It is best to follow the workaround in that bug report [i]before[/i] updating to 5.4.

Re: Kernel panic, APIC error, /dev/null error and other bad mojo

Posted: 2009/12/25 13:32:58
by AlanBartlett
[quote]
I have searched this forum numerous times, and I must say that your posts are among the best. Thanks for your time in giving such great answers and support.
[/quote]
Why, thank you. ([i]Blushes[/i]. :-) )

A lot of the help that we, the forum regulars, try to give is the result of teamwork. Take, for example, [b]toracat[/b]'s advice regarding [i]VMware[/i]. As I don't use it, that issue has not affected me. However, [b]toracat[/b] knows about it and, having been watching "over my shoulder", has posted the advice regarding the workaround.

[quote]
Also, the "error writing to /dev/null" isn't in /var/log/messages or in dmesg either. They got written to the console, and that's it.
[/quote]
Having seen your description, above, I now understand. That "error writing . . . " message will not get appended to any system log file because it is just an error message resulting from the attempted redirection of the output of an application process. Such errors will only be displayed on the (virtual) terminal controlling the process -- and in your case, that terminal coincides with the system console.

Explaining things a little more . . .

When a process is run and its standard output, its error output, or both are not needed, those outputs are typically redirected to the null device:

[i]application > /dev/null 2>&1[/i], for example.

Looking back, we see you had a [i]regular file[/i] called null in the /dev/ directory. That file was owned by [i]root[/i] with mode 644, so only [i]root[/i] could write to it. So what would be the result of a process, executed by anyone other than [i]root[/i], that attempts to write output to that file? Opening the file for writing is refused with a permission error, and the visible result is a terse "error writing to /dev/null" message displayed on the controlling terminal. Q.E.D. :-D
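The redirection quoted above, `application > /dev/null 2>&1`, is processed left to right: stdout is pointed at the target first, then stderr is made a copy of it, so both streams end up in the same place. The sketch below shows this with a temporary regular file standing in for /dev/null so the result can be read back; `demo_redirect` is a hypothetical name.

```shell
#!/bin/sh
# Sketch: where "cmd > target 2>&1" sends each stream.
# A temporary regular file stands in for /dev/null so the result can be read
# back; demo_redirect is a hypothetical helper name.
demo_redirect() {
    target=$(mktemp) || return 1
    # "> $target" points stdout at the file first; "2>&1" then makes stderr
    # a copy of that already-redirected stdout, so both land in the file.
    { echo "to stdout"; echo "to stderr" >&2; } > "$target" 2>&1
    cat "$target"
    rm -f "$target"
}
```

It prints `to stdout` followed by `to stderr`. Reversing the order, as in `2>&1 > target`, would leave stderr on the terminal, because stderr is duplicated before stdout is moved.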