
kernel update- nouveau freezes gpu - can't use?

Posted: 2015/10/15 02:56:18
by CorvusB
I'm running CentOS 7. The last kernel upgrade somehow broke nouveau. Now, when I let it boot with this kernel, startup stalls and the desktop freezes when gdm starts. Nouveau keeps repeating the message "failed to idle channel" along with a bunch of other output that is meaningless to me, including what looks like a memory address.

Without intervention, the keyboard and screen stay locked, although the mouse still responds. I cannot switch sessions, and the keyboard gets no response to Ctrl-Alt-Del or Ctrl-Alt-Backspace. The monitor just shows a blinking cursor on a black screen.

I can ssh in, and top shows no unusual activity. If I kill several processes associated with gdm, the monitor and keyboard become responsive again, but nouveau then keeps printing the same "failed to idle channel" message on a console screen. Samba is working, too. I just have no machine-local I/O!

I don't care a bit about nouveau. I don't care about graphics at all on this machine: it is a file server, pure and simple (well, that and a database server testbed). So graphics mean nothing here.

So what do I do? Disable nouveau? I hate nouveau anyway; it has been the source of seemingly endless problems whenever I've dealt with it. On other machines, where I care about the GPU, I go with the NVIDIA drivers, and I've always been happy with them.

This machine has to be able to reboot remotely without problems. As it stands, I have to be at the console to pick the older kernel in the GRUB menu to get it to boot. It boots the older kernel just fine.

So tell me what you want to see: Xorg log? messages log? boot log?

Should I just make the old kernel the GRUB default somehow? (Tell me how, or point me in the right direction, please!)
Should I just disable nouveau (and replace it with what?)?

Appreciate your help!

Re: kernel update- nouveau freezes gpu - can't use?

Posted: 2015/10/15 08:07:27
by giulix63
Change the default target to multi-user (the equivalent of setting the default runlevel to 3 in /etc/inittab on older CentOS releases):

Code:

su - -c "systemctl set-default multi-user.target"
This way it won't start a graphical login or the X Window System at all.
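A minimal sketch of the same idea wrapped in a reusable shell helper, assuming CentOS 7 with systemd. The function name set_boot_mode and the DRY_RUN switch are hypothetical conveniences, not anything from the post; only systemctl set-default itself is the real command shown above.

```shell
#!/bin/sh
# Sketch only: switch the default systemd boot target on CentOS 7.
# set_boot_mode and DRY_RUN are hypothetical names; systemctl set-default is real.
# With DRY_RUN=1 the command is printed instead of executed.

set_boot_mode() {
    case "$1" in
        text) target=multi-user.target ;;  # console only: no gdm, no X (old runlevel 3)
        gui)  target=graphical.target ;;   # gdm login screen (old runlevel 5)
        *)    echo "usage: set_boot_mode text|gui" >&2; return 2 ;;
    esac
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "systemctl set-default $target"
    else
        systemctl set-default "$target"    # must run as root
    fi
}
```

Run set_boot_mode text as root, then systemctl get-default to confirm the change. For a one-off GUI session without changing the default, systemctl isolate graphical.target starts the graphical target immediately; on the affected kernel that will presumably fail the same way gdm does, since the nouveau problem itself is untouched.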

Re: kernel update- nouveau freezes gpu - can't use?

Posted: 2015/10/17 00:19:10
by CorvusB
giulix63 wrote: Change the default target to multi-user (same as setting default runlevel to 3 in /etc/inittab on old CentOS):

Code:

su - -c "systemctl set-default multi-user.target"
This way it won't even bother to start a graphical interface, nor the X window system.

OK, I checked that idea out, and it definitely works. Thanks for that! But it is a limited workaround, since I also sometimes log in directly at the server for maintenance tasks, and for those I mostly use a GUI. Logging in and then running startx hangs.

Over the next couple of hours I'll try to collect the log errors and warnings that look significant, and I'll add them to the thread.

By the way, I thought of just pulling the graphics card, but this motherboard has no onboard GPU, and I have no replacement card on hand that is likely to be any better.

Re: kernel update- nouveau freezes gpu - can't use?

Posted: 2015/10/17 07:48:32
by giulix63
CorvusB wrote: Over the next couple of hours I will try to get some of the log errors and warnings that look significant, and I'll add them to the convo.

When you do that, please post the make and model of your system as well.

Re: kernel update- nouveau freezes gpu - can't use?

Posted: 2015/10/20 20:53:59
by CorvusB
OK, some details.
To get this out of the way first: I set up a second workaround by editing the GRUB2 configuration so that the working kernel is the default boot entry. More on that below.

Specs:
uname -r
3.10.0-123.el7.x86_64

cat /proc/cpuinfo

Code:

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 15
model		: 4
model name	: Intel(R) Pentium(R) 4 CPU 3.40GHz
stepping	: 10
microcode	: 0x4
cpu MHz		: 2400.000
cache size	: 2048 KB
physical id	: 0
siblings	: 2
core id		: 0
cpu cores	: 1
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 5
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc pebs bts nopl pni dtes64 monitor ds_cpl est cid cx16 xtpr lahf_lm
bogomips	: 6800.38
clflush size	: 64
cache_alignment	: 128
address sizes	: 36 bits physical, 48 bits virtual
power management:

processor	: 1
vendor_id	: GenuineIntel
cpu family	: 15
model		: 4
model name	: Intel(R) Pentium(R) 4 CPU 3.40GHz
stepping	: 10
microcode	: 0x4
cpu MHz		: 2400.000
cache size	: 2048 KB
physical id	: 0
siblings	: 2
core id		: 0
cpu cores	: 1
apicid		: 1
initial apicid	: 1
fpu		: yes
fpu_exception	: yes
cpuid level	: 5
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc pebs bts nopl pni dtes64 monitor ds_cpl est cid cx16 xtpr lahf_lm
bogomips	: 6800.38
clflush size	: 64
cache_alignment	: 128
address sizes	: 36 bits physical, 48 bits virtual
power management:

lscpu

Code:

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                2
On-line CPU(s) list:   0,1
Thread(s) per core:    2
Core(s) per socket:    1
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            15
Model:                 4
Model name:            Intel(R) Pentium(R) 4 CPU 3.40GHz
Stepping:              10
CPU MHz:               2400.000
BogoMIPS:              6800.38
L1d cache:             16K
L2 cache:              2048K
NUMA node0 CPU(s):     0,1
The GPU is an NVIDIA card from the same era as the CPU (the Xorg log below identifies the chipset as NV46). See the Xorg logs below for more detail on the drivers.

Back to the workaround I switched to: fixing GRUB's default boot entry. (If you want to skip to the current problem, scroll down to the section I've marked ~~~~~LOG OUTPUTS~~~~~~.)
This workaround actually helped me find some relevant errors, because it lets me boot into a GUI environment again.
I edited /etc/default/grub to change the GRUB_DEFAULT= line. A value of 0 means the top entry of the menu. You can set it to a numeric menu-entry index, but the index shifts with every kernel update, because each newly installed kernel is added at the top of the list. A better option is to use the exact title that GRUB2 shows in the menu, e.g. GRUB_DEFAULT="Advanced options for Ubuntu>Ubuntu, with Linux 3.13.0-53-generic" (including the quotes). (1)
Changes to /etc/default/grub only take effect once grub.cfg is regenerated. On BIOS-based machines, run the following command as root (2):

grub2-mkconfig -o /boot/grub2/grub.cfg
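To find the exact title to paste into GRUB_DEFAULT, the generated grub.cfg can be read directly. Below is a small sketch; grub_entries is a hypothetical helper, and it assumes a flat menu with no submenus (I believe CentOS 7 ships with GRUB_DISABLE_SUBMENU=true, so all of its entries are top level). The printed index is the entry's position in that flat menu, i.e. the number GRUB_DEFAULT would take.

```shell
#!/bin/sh
# Sketch: list top-level GRUB2 menu entries with their index.
# grub_entries is a hypothetical helper; pass it the path to grub.cfg
# (/boot/grub2/grub.cfg on a BIOS-booted CentOS 7 machine).
# Assumes a flat menu: indented entries inside a "submenu" block are skipped,
# and the submenu's own slot is not counted, so indexes would be off.

grub_entries() {
    # Each title sits between the first pair of single quotes on a
    # "menuentry '...'" line, so split fields on the quote character.
    awk -F"'" '/^menuentry /{ printf "%d: %s\n", n++, $2 }' "$1"
}
```

For example, run grub_entries /boot/grub2/grub.cfg as root, copy the working kernel's title into GRUB_DEFAULT="..." and regenerate grub.cfg as above. If /etc/default/grub instead says GRUB_DEFAULT=saved (which I believe is the stock CentOS 7 setting), grub2-set-default 'exact title' records the choice in the GRUB environment block without editing the file, and grub2-editenv list shows the saved entry.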


~~~~~LOG OUTPUTS~~~~~~
Once I was back to a graphical.target boot, I compared the Xorg log files from a good boot and a bad boot and found this difference:
Good boot:

Code:

[    26.962] (II) [drm] nouveau interface version: 1.1.1
[    26.962] (WW) Falling back to old probe method for modesetting
[    26.962] (WW) Falling back to old probe method for fbdev
[    26.962] (II) Loading sub module "fbdevhw"
[    26.962] (II) LoadModule: "fbdevhw"
[    26.963] (II) Loading /usr/lib64/xorg/modules/libfbdevhw.so
[    26.963] (II) Module fbdevhw: vendor="X.Org Foundation"
[    26.963]    compiled for 1.15.0, module version = 0.0.2
[    26.963]    ABI class: X.Org Video Driver, version 15.0
[    26.963] (WW) Falling back to old probe method for vesa
[    26.963] (II) Loading sub module "dri2"
[    26.963] (II) LoadModule: "dri2"
[    26.963] (II) Module "dri2" already built-in
[    26.963] (--) NOUVEAU(0): Chipset: "NVIDIA NV46"
[    26.963] (II) NOUVEAU(0): Creating default Display subsection in Screen section
        "Default Screen Section" for depth/fbbpp 24/32
[    26.963] (==) NOUVEAU(0): Depth 24, (--) framebuffer bpp 32
[    26.963] (==) NOUVEAU(0): RGB weight 888
[    26.963] (==) NOUVEAU(0): Default visual is TrueColor
[    26.963] (==) NOUVEAU(0): Using HW cursor
[    26.963] (==) NOUVEAU(0): GLX sync to VBlank disabled.
[    26.963] (==) NOUVEAU(0): Page flipping enabled
[    26.963] (==) NOUVEAU(0): Swap limit set to 2 [Max allowed 2]
Bad boot:

Code:

[  1626.245] (II) [drm] nouveau interface version: 1.1.2
[  1626.245] (WW) Falling back to old probe method for modesetting
[  1626.246] (WW) Falling back to old probe method for fbdev
[  1626.246] (II) Loading sub module "fbdevhw"
[  1626.246] (II) LoadModule: "fbdevhw"
[  1626.246] (II) Loading /usr/lib64/xorg/modules/libfbdevhw.so
[  1626.246] (II) Module fbdevhw: vendor="X.Org Foundation"
[  1626.246]    compiled for 1.15.0, module version = 0.0.2
[  1626.246]    ABI class: X.Org Video Driver, version 15.0
[  1626.246] (WW) Falling back to old probe method for vesa
[  1626.247] (II) Loading sub module "dri2"
[  1626.247] (II) LoadModule: "dri2"
[  1626.247] (II) Module "dri2" already built-in
[  1626.247] (EE) NOUVEAU(0): [drm] failed to set drm interface version.
[  1626.247] (EE) NOUVEAU(0): [drm] error opening the drm
[  1626.247] (EE) NOUVEAU(0): 836:
[  1626.247] (II) UnloadModule: "nouveau"
[  1626.247] (EE) Screen(s) found, but none have a usable configuration.
[  1626.247] (EE)
Fatal server error:
[  1626.247] (EE) no screens found(EE)
[  1626.247] (EE)
The item that strikes me is the first line I copied:
[ 1626.245] (II) [drm] nouveau interface version: 1.1.2
vs
[ 26.962] (II) [drm] nouveau interface version: 1.1.1

I don't know whether it is meaningful, but obviously something in nouveau changed with the kernel upgrade. Is there anything I can do about this other than keep booting the older kernel until the developers fix nouveau? If not, I should probably file a bug report, yes?
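The comparison above can be made mechanical. Every Xorg log line starts with an uptime stamp such as [  1626.245], so a plain diff of two logs reports nearly every line as changed. A minimal sketch that strips the stamps first; xorg_log_diff and strip_stamps are hypothetical helper names, and it assumes you still have both logs (Xorg normally keeps the previous run's log as /var/log/Xorg.0.log.old).

```shell
#!/bin/sh
# Sketch: diff two Xorg logs after stripping the leading "[  1626.245] " stamps.
# strip_stamps and xorg_log_diff are hypothetical helper names.
# Exit status follows diff: 0 = same, 1 = differences found.

strip_stamps() {
    # Drop "[", optional padding, seconds.fraction, "]" and one space at line start.
    sed -e 's/^\[ *[0-9][0-9]*\.[0-9]*] //' "$1"
}

xorg_log_diff() {
    good=$(mktemp) bad=$(mktemp)
    strip_stamps "$1" > "$good"
    strip_stamps "$2" > "$bad"
    diff "$good" "$bad"
    status=$?
    rm -f "$good" "$bad"
    return "$status"
}
```

For example, xorg_log_diff good-boot.log /var/log/Xorg.0.log, where good-boot.log is a hypothetical copy saved from a boot on the older kernel. In this thread the lines worth quoting in a bug report would be the (EE) ones, such as "[drm] failed to set drm interface version."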

Footnotes:
(1) http://askubuntu.com/questions/216398/s ... grub-entry
(2) https://access.redhat.com/documentation ... oader.html

Re: kernel update- nouveau freezes gpu - can't use?

Posted: 2015/10/21 07:12:27
by giulix63
Yes, I think that's a good idea. In the meantime, if you don't want to stay on the old kernel, you can always switch to the NVIDIA drivers until nouveau gets fixed.

Re: kernel update- nouveau freezes gpu - can't use?

Posted: 2015/10/22 22:20:35
by Walkin.Blue
If you have an NVIDIA VGA card, you're better off using the official NVIDIA driver.
You can download it directly from NVIDIA and pick the right driver for your card.

Re: kernel update- nouveau freezes gpu - can't use?

Posted: 2015/10/23 07:09:56
by giulix63
Install nvidia-detect, run it, and install the driver it recommends from ELRepo. I think this is the endorsed method here (TrevorH et al., please correct me if I'm wrong). Otherwise you'll have to rely on NVIDIA for support.

Re: kernel update- nouveau freezes gpu - can't use?

Posted: 2015/10/23 16:10:06
by AlanBartlett
giulix63 wrote: Install nvidia-detect, run it and install the resulting drivers from ELRepo. I think this is the endorsed method here (TrevorH et al. correct me if I'm wrong, please). Else you'll have to rely on Nvidia for support.

Correct.