kernel-4.18.0-193.6.3.el8_2.x86_64 update broke kmod-nvidia driver functioning

Post by mprater » 2020/06/23 00:55:33

Actually, here's something from /var/log/messages:

Jun 22 17:35:42 localhost /usr/libexec/gdm-x-session[2715]: (EE) NVIDIA(0): Failed to initialize the GLX module; please check in your X
Jun 22 17:35:42 localhost /usr/libexec/gdm-x-session[2715]: (EE) NVIDIA(0):     log file that the GLX module has been loaded in your X
Jun 22 17:35:42 localhost /usr/libexec/gdm-x-session[2715]: (EE) NVIDIA(0):     server, and that the module is the NVIDIA GLX module.  If
Jun 22 17:35:42 localhost /usr/libexec/gdm-x-session[2715]: (EE) NVIDIA(0):     you continue to encounter problems, Please try
Jun 22 17:35:42 localhost /usr/libexec/gdm-x-session[2715]: (EE) NVIDIA(0):     reinstalling the NVIDIA driver.

So... it would seem that using kmod-nvidia doesn't actually work when updating the kernel, and manual installation of the nvidia driver is required. Seems like kmod-nvidia ought to be removed from the repo if that's the case. I don't know why updating the kernel would have caused so much to fail otherwise. Does that sound about right?

More from the /var/log/Xorg.0.log file:

...
[    30.979] (II) LoadModule: "glx"
[    30.981] (II) Loading /usr/lib64/xorg/modules/extensions/libglx.so
[    30.992] (II) Module glx: vendor="X.Org Foundation"
[    30.992] 	compiled for 1.20.6, module version = 1.0.0
[    30.992] 	ABI class: X.Org Server Extension, version 10.0
[    30.992] (II) LoadModule: "nvidia"
[    30.992] (II) Loading /usr/lib64/xorg/modules/drivers/nvidia_drv.so
[    31.006] (II) Module nvidia: vendor="NVIDIA Corporation"
[    31.006] 	compiled for 4.0.2, module version = 1.0.0
[    31.006] 	Module class: X.Org Video Driver
[    31.006] (II) NVIDIA dlloader X Driver  390.132  Fri Nov  1 03:36:28 PDT 2019
[    31.006] (II) NVIDIA Unified Driver for all Supported NVIDIA GPUs
[    31.006] (II) systemd-logind: releasing fd for 226:0
[    31.007] (II) Loading sub module "fb"
[    31.007] (II) LoadModule: "fb"
[    31.008] (II) Loading /usr/lib64/xorg/modules/libfb.so
[    31.013] (II) Module fb: vendor="X.Org Foundation"
[    31.013] 	compiled for 1.20.6, module version = 1.0.0
[    31.013] 	ABI class: X.Org ANSI C Emulation, version 0.4
[    31.013] (II) Loading sub module "wfb"
[    31.013] (II) LoadModule: "wfb"
[    31.014] (II) Loading /usr/lib64/xorg/modules/libwfb.so
[    31.015] (II) Module wfb: vendor="X.Org Foundation"
[    31.015] 	compiled for 1.20.6, module version = 1.0.0
[    31.015] 	ABI class: X.Org ANSI C Emulation, version 0.4
[    31.015] (II) Loading sub module "ramdac"
[    31.015] (II) LoadModule: "ramdac"
[    31.015] (II) Module "ramdac" already built-in
[    31.016] (**) NVIDIA(0): Depth 24, (--) framebuffer bpp 32
[    31.016] (==) NVIDIA(0): RGB weight 888
[    31.016] (==) NVIDIA(0): Default visual is TrueColor
[    31.016] (==) NVIDIA(0): Using gamma correction (1.0, 1.0, 1.0)
[    31.016] (**) NVIDIA(0): Enabling 2D acceleration
[    31.016] (EE) NVIDIA(0): Failed to initialize the GLX module; please check in your X
[    31.016] (EE) NVIDIA(0):     log file that the GLX module has been loaded in your X
[    31.016] (EE) NVIDIA(0):     server, and that the module is the NVIDIA GLX module.  If
[    31.016] (EE) NVIDIA(0):     you continue to encounter problems, Please try
[    31.016] (EE) NVIDIA(0):     reinstalling the NVIDIA driver.
[    31.619] (--) NVIDIA(0): Valid display device(s) on GPU-0 at PCI:5:0:0
...

Unfortunately, this doesn't tell me anything I didn't already know. What I don't know is how to fix it, other than removing everything to do with kmod-nvidia and installing the driver manually. I'm still not sure that would repair the damage and get my system working again.
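
When a log like this runs to hundreds of lines, it can help to pull out only the error entries. A minimal sketch, assuming the timestamped log format shown above; the function name is made up for illustration:

```shell
#!/bin/sh
# Print only the timestamped error lines from an Xorg log.
# Anchoring on the leading "[ time ]" skips the marker legend near the
# top of the log, which also contains the text "(EE)".
xorg_errors() {
    grep -E '^\[ *[0-9]+\.[0-9]+\] \(EE\)' "$1"
}

# Usage: xorg_errors /var/log/Xorg.0.log
```

For the excerpt above this would print just the five "Failed to initialize the GLX module" lines.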

Post by chemal » 2020/06/23 01:59:37

There should be /usr/lib64/xorg/modules/extensions/libglx.so from xorg-x11-server-Xorg and /usr/lib64/xorg/modules/extensions/nvidia/libglx.so from nvidia-x11-drv-390xx. If the proprietary driver is configured, the X server should be made to load the latter. This should be done by the elrepo packages.

I don't have a CentOS 8 installation with nvidia, but one of my CentOS 7 computers with nvidia-x11-drv-340xx (another legacy version) has this in /etc/X11/xorg.conf.d/99-nvidia.conf to make it actually happen:

Section "Files"
        ModulePath   "/usr/lib64/xorg/modules/extensions/nvidia"
        ModulePath   "/usr/lib64/xorg/modules"
EndSection
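
If that file is missing, something along these lines could create it. This is only a sketch: the function name is hypothetical, and reusing the CentOS 7 file name 99-nvidia.conf and its contents on CentOS 8 is an assumption, not something the elrepo packages are known to ship.

```shell
#!/bin/sh
# Write a drop-in that puts the NVIDIA extension directory ahead of the
# stock module path, so the NVIDIA libglx.so is found first.
# Assumption: file name and contents are copied from the CentOS 7
# example above. The target directory can be overridden for testing.
write_nvidia_modulepath() {
    conf_dir="${1:-/etc/X11/xorg.conf.d}"
    mkdir -p "$conf_dir" || return 1
    cat > "$conf_dir/99-nvidia.conf" <<'EOF'
Section "Files"
        ModulePath   "/usr/lib64/xorg/modules/extensions/nvidia"
        ModulePath   "/usr/lib64/xorg/modules"
EndSection
EOF
}

# Usage (as root): write_nvidia_modulepath
```

The X server has to be restarted before the new ModulePath takes effect.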

Post by mprater » 2020/06/23 15:03:28

Well, here's what I've got:

ls -lR /usr/lib64/xorg/modules/extensions
/usr/lib64/xorg/modules/extensions:
total 14932
-rwxr-xr-x. 1 root root   308168 Apr 23 19:51 libglx.so*
-rwxr-xr-x. 1 root root 14976296 Apr  7 11:16 libglx.so.390.132*
drwxr-xr-x. 2 root root       48 Jun 22 14:34 nvidia/

/usr/lib64/xorg/modules/extensions/nvidia:
total 14628
lrwxrwxrwx. 1 root root       17 May  2 04:27 libglx.so -> libglx.so.390.132*
-rwxr-xr-x. 1 root root 14976296 May  2 04:27 libglx.so.390.132*

I don't seem to have a /etc/X11/xorg.conf.d/99-nvidia.conf file, whatever that is. The 00-keyboard.conf file's comments say it's "Written by systemd-localed(8)" and "It's probably wise not to edit this file manually." I assume the same would go for an nvidia.conf file I might want to have in /etc/X11/xorg.conf.d, but I don't know anything about how to go about generating one.

In case it isn't obvious by now, I'm not a sysadmin. Once every few years I'm forced to install a new operating system, both to get the newer library versions I need to build the software I develop and to keep pace with the OS used at the company I work for. My CentOS 8 system had been working fine for some time with kernel-4.18.0-147.8.1.el8_1.x86_64 and the kmod-nvidia packages. With earlier kernels I had installed the nvidia driver manually, but I switched to the kmod packages at the kernel-4.18.0-147.8.1.el8_1.x86_64 update, since I was told that would let my system be upgraded without repeating the manual driver installation every time (viewtopic.php?f=54&t=74060&p=311958#p311958). Unfortunately, that doesn't seem to have been the case, and now I can't boot using kernel-4.18.0-147.8.1.el8_1.x86_64 any more.

Is there any hope of getting this configured properly, or should I remove the kmod-nvidia packages and try a manual installation? My fear with that route is that the system's configuration is now too mucked up, and I'll only be digging a deeper hole by shifting gears.

Thanks, mitch

Post by mprater » 2020/06/23 17:27:40

Not wanting to wait, I went ahead and manually edited these lines into /etc/X11/xorg.conf:

Section "Files"
	ModulePath	"/usr/lib64/xorg/modules/extensions/nvidia"
	ModulePath	"/usr/lib64/xorg/modules"
EndSection

My assumption is that /usr/lib64/xorg/modules/extensions/libglx.so was being found instead of /usr/lib64/xorg/modules/extensions/nvidia/libglx.so (which links to libglx.so.390.132).
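
One way to test that assumption is to ask the X log which GLX module it actually loaded. A rough sketch, assuming the "Loading <path>" log lines shown earlier in this thread; both function names are made up for illustration, and the /extensions/nvidia/ check only applies to the 390xx layout discussed here:

```shell
#!/bin/sh
# Print the path of every GLX server module the X server reported loading.
glx_modules_loaded() {
    grep -E 'Loading .*/libglx[^/]*\.so' "$1" | sed 's/.*Loading //'
}

# Succeed only if a GLX module was loaded from the NVIDIA subdirectory.
uses_nvidia_glx() {
    glx_modules_loaded "$1" | grep -q '/extensions/nvidia/'
}

# Usage:
#   glx_modules_loaded /var/log/Xorg.0.log
#   uses_nvidia_glx /var/log/Xorg.0.log && echo "NVIDIA GLX module in use"
```

Against the log posted before the fix, the first function prints /usr/lib64/xorg/modules/extensions/libglx.so, which is the X.Org module rather than the NVIDIA one.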

After rebooting, I have

glxinfo | grep vendor
server glx vendor string: NVIDIA Corporation
client glx vendor string: NVIDIA Corporation
OpenGL vendor string: NVIDIA Corporation

and OpenGL is now working. Yay!

I'm troubled that I had to make these modifications to get it to function again after the update. Shouldn't this all be more "automatic"? Or is this simply how the CentOS world is? My previous Linux installations were Fedora 21, 23 and then 26. I somewhat expected issues like this from Fedora, but I hoped CentOS would be a bit less sketchy.

Post by jlehtone » 2020/06/23 19:09:29

I have an up-to-date and functional system. I keep elrepo disabled and enable it explicitly when needed.
This lets me run yum update before I update the NVIDIA drivers, just to ensure that the base packages do not interfere with what the elrepo packages do.

$ uname -r
4.18.0-193.6.3.el8_2.x86_64

$ sudo yum --enablerepo=elrepo list kmod-nvidia\*
ELRepo.org Community Enterprise Linux Repository - el8                                  273 kB/s | 233 kB     00:00    
Installed Packages
kmod-nvidia.x86_64                                       440.82-2.el8_2.elrepo                                   @elrepo
Available Packages
kmod-nvidia-390xx.x86_64                                 390.132-2.el8_2.elrepo                                  elrepo 

$ sudo yum --enablerepo=elrepo list \*nvidia\*
Last metadata expiration check: 0:00:18 ago on Tue 23 Jun 2020 07:26:56 PM EEST.
Installed Packages
kmod-nvidia.x86_64                                       440.82-2.el8_2.elrepo                           @elrepo        
nvidia-detect.x86_64                                     440.64-1.el8                                    @elrepo-testing
nvidia-x11-drv.x86_64                                    440.82-2.el8_2.elrepo                           @elrepo        
nvidia-x11-drv-libs.x86_64                               440.82-2.el8_2.elrepo                           @elrepo        
Available Packages
kmod-nvidia-390xx.x86_64                                 390.132-2.el8_2.elrepo                          elrepo         
nvidia-x11-drv-390xx.x86_64                              390.132-2.el8_2.elrepo                          elrepo         
nvidia-x11-drv-390xx-libs.i686                           390.132-2.el8_2.elrepo                          elrepo         
nvidia-x11-drv-390xx-libs.x86_64                         390.132-2.el8_2.elrepo                          elrepo         
nvidia-x11-drv-libs.i686                                 440.82-2.el8_2.elrepo                           elrepo         
pcp-pmda-nvidia-gpu.x86_64                               5.0.2-5.el8                                     AppStream

ELRepo seems to have "el8_2" for the *-390xx*. Good.
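
The routine described at the top of this post (base packages first, ELRepo's NVIDIA packages second) could be sketched roughly as below. This is an illustration only: the function name and the RUN dry-run switch are made up for the example, and the package globs are an assumption that may need adjusting for a particular driver series.

```shell
#!/bin/sh
# RUN defaults to "echo", so the function only prints the commands.
# Set RUN=sudo to actually execute them.
RUN="${RUN:-echo}"

# Hypothetical helper: update everything except ELRepo first, then let
# ELRepo bring the NVIDIA packages up to date against the new base.
update_base_then_elrepo() {
    $RUN yum --disablerepo=elrepo update &&
    $RUN yum --enablerepo=elrepo update 'kmod-nvidia*' 'nvidia-x11-drv*'
}

# Usage: update_base_then_elrepo
```

Keeping the two steps separate makes it easier to see which of them a scriptlet error belongs to.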

ELRepo's NVIDIA packages do not seem to have many config files:

$ rpm -qc kmod-nvidia
/etc/depmod.d/kmod-nvidia.conf
/etc/dracut.conf.d/dracut-nvidia.conf
/usr/lib/modprobe.d/blacklist-nouveau.conf

$ rpm -qc nvidia-x11-drv
/etc/X11/nvidia-xorg.conf

Nothing in /etc/X11/xorg.conf.d/ :

$ ls /etc/X11/xorg.conf.d/
00-keyboard.conf
$ rpm -qf /etc/X11/xorg.conf.d/*
systemd-239-30.el8_2.x86_64

The packages apparently copy nvidia-xorg.conf to xorg.conf:

$ ls -l /etc/X11/*.conf
-rw-r--r--. 1 root root 135 Mar 30 21:18 /etc/X11/nvidia-xorg.conf
-rw-r--r--. 1 root root 135 Mar 30 21:18 /etc/X11/xorg.conf

$ cat /etc/X11/nvidia-xorg.conf
# /etc/X11/nvidia-xorg.conf provided by http://elrepo.org

Section "Device"
	Identifier  "Videocard0"
	Driver      "nvidia"
EndSection

$ cat /etc/X11/xorg.conf
# /etc/X11/nvidia-xorg.conf provided by http://elrepo.org

Section "Device"
	Identifier  "Videocard0"
	Driver      "nvidia"
EndSection

$ rpm -qf /etc/X11/xorg.conf
file /etc/X11/xorg.conf is not owned by any package

There are no ModulePath modifications.

$ grep -i path /var/log/Xorg.0.log
[    15.881] (==) FontPath set to:
	catalogue:/etc/X11/fontpath.d,
[    15.881] (==) ModulePath set to "/usr/lib64/xorg/modules"
[    16.958] (II) NVIDIA(0):     may not be running or the "AcpidSocketPath" X
[    16.958] (II) NVIDIA(0):     "AcpidSocketPath" X configuration options in Appendix B: X

The X server nevertheless has ModulePath set; the (==) marker means it is the default value.

$ rpm -ql nvidia-x11-drv | grep modules
/usr/lib64/xorg/modules/drivers/nvidia_drv.so
/usr/lib64/xorg/modules/extensions/libglxserver_nvidia.so
/usr/lib64/xorg/modules/extensions/libglxserver_nvidia.so.440.82
$ rpm -ql nvidia-x11-drv-libs | grep modules
$ rpm -ql kmod-nvidia | grep modules
/lib/modules/4.18.0-193.el8.x86_64
/lib/modules/4.18.0-193.el8.x86_64/extra
/lib/modules/4.18.0-193.el8.x86_64/extra/nvidia
/lib/modules/4.18.0-193.el8.x86_64/extra/nvidia/nvidia-drm.ko
/lib/modules/4.18.0-193.el8.x86_64/extra/nvidia/nvidia-modeset.ko
/lib/modules/4.18.0-193.el8.x86_64/extra/nvidia/nvidia-uvm.ko
/lib/modules/4.18.0-193.el8.x86_64/extra/nvidia/nvidia.ko

One can repeat a package installation:

yum reinstall kmod-nvidia-390xx nvidia-x11-drv-390xx nvidia-x11-drv-390xx-libs

That should rerun the install scripts, in case they failed the previous time.

Did it?

$ sudo yum history list kmod-nvidia
ID     | Command line             | Date and time    | Action(s)      | Altered
-------------------------------------------------------------------------------
    87 | --enablerepo=elrepo rein | 2020-06-16 12:28 | R              |    2   
    86 | --enablerepo=elrepo upda | 2020-06-16 12:20 | Upgrade        |    4 E<
    64 | --enablerepo=elrepo inst | 2020-05-15 08:47 | Install        |    1 >

E? What was that error on transaction 86? Why did I reinstall?

$ sudo yum history info 86
...
Scriptlet output:
   1 warning: /etc/yum.repos.d/elrepo.repo created as /etc/yum.repos.d/elrepo.repo.rpmnew
   2 dracut-install: Failed to find module 'nvidia'
   3 dracut: FAILED:  /usr/lib/dracut/dracut-install -D /var/tmp/dracut.Xm77d7/initramfs -N nouveau --kerneldir /lib/modules/4.18.0-147.8.1.el8_1.x86_64/ -m nvidia

Okay, I too had a minor hiccup.
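
The dracut message names the kernel directory it was building for, so a quick follow-up is to check whether an nvidia module is visible under a given kernel's module tree at all. A minimal sketch; the function name is hypothetical, and the idea that kmod packages appear under newer kernels via a weak-updates directory is an assumption about how these packages are laid out, which is why the whole tree is searched rather than one fixed path:

```shell
#!/bin/sh
# Succeed if any nvidia.ko (plain or compressed) is found under the
# module tree of the given kernel version.
# $1: kernel version, defaults to the running kernel.
# $2: module root, defaults to /lib/modules (overridable for testing).
has_nvidia_module() {
    kver="${1:-$(uname -r)}"
    root="${2:-/lib/modules}"
    [ -n "$(find -L "$root/$kver" -name 'nvidia.ko*' 2>/dev/null | head -n 1)" ]
}

# Usage:
#   has_nvidia_module 4.18.0-147.8.1.el8_1.x86_64 || echo "no nvidia module for that kernel"
```

If the check fails for a kernel that is still listed in the boot menu, an initramfs rebuild for that kernel would be expected to fail the same way.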

Post by mprater » 2020/06/23 19:25:53

Thank you for the detailed evaluation! I think it means I'm not crazy.

In particular, this helps to expand my understanding:

jlehtone wrote:
I keep elrepo disabled and enable it explicitly when needed. This lets me run yum update before I update the NVidia drivers. Just to ensure that base packages do not mess what elrepo packages do.

Post by chemal » 2020/06/23 19:29:54

@mprater: You should discuss this with the elrepo people. They have their own bugtracker and mailing list. The nvidia driver packages are not provided by CentOS.

@jlehtone: You are using the current driver which does not want to replace the system's libglx.so.

Post by mprater » 2020/06/23 19:38:14

Okay, thanks @chemal - good to know. I didn't understand how to determine where the problem originated; but now I'll know better how to do a bit more thorough investigation before posting here.

Thanks! mitch

Post by jlehtone » 2020/06/24 10:42:44

chemal wrote:
2020/06/23 19:29:54
@jlehtone: You are using the current driver which does not want to replace the system's libglx.so.

It is true that different versions can behave differently.

I do have NVIDIA's current, 390xx, and 340xx drivers from ELRepo in use on CentOS 7 systems, and my procedure (unnecessarily paranoid or not) "behaves".
