[SOLVED] Walkthrough: Nvidia Card



Post by jmacdougca » 2020/08/11 23:10:50

My video card randomly freezes my machine. I have been observing this for a few years now, trying to pinpoint the issue.

Last night I upgraded to kernel 5.8.0-1.el7.elrepo.x86_64. When I first booted up, I opened up a terminal and within two minutes the machine froze. So the driver I'm using isn't cutting it.

Code: Select all

 
sudo lshw -c video

  *-display                 
       description: VGA compatible controller
       product: GK106 [GeForce GTX 650 Ti]
       vendor: NVIDIA Corporation
       physical id: 0
       bus info: pci@0000:02:00.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress vga_controller bus_master cap_list rom
       configuration: driver=nouveau latency=0
       resources: irq:56 memory:fa000000-faffffff memory:f0000000-f7ffffff memory:f8000000-f9ffffff ioport:e000(size=128) memory:c0000-dffff
I see I am running the nouveau driver.
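To re-check which driver is bound after each change, the driver name can be pulled straight out of the lshw output. This is a minimal sketch, assuming the lshw output format shown above; the function name current_driver is hypothetical and not part of any package. If you prefer lspci, `lspci -k -s 02:00.0` also prints a "Kernel driver in use:" line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the driver lshw reports for the display device.
# Reads `lshw -c video` output on stdin so it also works on saved output.
current_driver() {
    # The "configuration:" line holds key=value pairs; keep the driver= value.
    sed -n 's/.*configuration:.*driver=\([^ ]*\).*/\1/p'
}

# On the live system (run as root so lshw reports everything):
#   sudo lshw -c video | current_driver
```

Saving the output to a file before and after a change makes it easy to compare the two runs.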

Clearly that driver is not the one I want, so let's see if ELRepo has drivers compatible with my device. lspci lists the card as:
02:00.0 VGA compatible controller: NVIDIA Corporation GK106 [GeForce GTX 650 Ti] (rev a1)

Code: Select all

/sbin/lspci -n | grep '02:00.0'
02:00.0 0300: 10de:11c6 (rev a1)
The device ID is 10de:11c6, so let's search ELRepo for that ID. Negative: no match found.
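The vendor:device ID lookup above can be made a little more exact than `grep '02:00.0'`, where the dots are regex wildcards. This is a minimal sketch, assuming the `lspci -n` line format shown above; pci_id is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the vendor:device ID for one PCI slot.
# $1 is the slot (e.g. 02:00.0); `lspci -n` output is read from stdin.
pci_id() {
    # lspci -n lines look like: "02:00.0 0300: 10de:11c6 (rev a1)"
    # Compare the whole first field so no partial or wildcard match slips in.
    awk -v slot="$1" '$1 == slot { print $3 }'
}

# Vendor ID 10de is NVIDIA; the part after the colon is the device ID.
# Live use:
#   /sbin/lspci -n | pci_id 02:00.0
# lspci can also filter by slot itself:
#   /sbin/lspci -n -s 02:00.0
```

The printed ID is the string to search for on ELRepo.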

So logically, my next thought was to try Nvidia's own driver for my card.
I downloaded NVIDIA-Linux-x86_64-450.57.run. It was the first .run file I had ever seen and I didn't know how to install one, so I read this article on how to install Nvidia drivers on CentOS. It looked too "hacky" for me. There are multiple ways of doing things in Linux, so I asked my skilled friend, and he did not recommend this method.

I then found this article published by CentOS on how to install Nvidia drivers and got pumped, because the penguin in me was closer to getting the problem solved. Unfortunately, the author says not to use the article yet because it is not finished. :cry:

Here are the legacy Nvidia drivers


Then, during my research before posting this thread, I saw that tomcat had posted:
kmod-nvidia-346.35 has been released as a long lived driver. THIS DRIVER DROPS SUPPORT FOR OLDER G8xxx, G9xxx, and GT2xx GPUs. Users with older unsupported cards should migrate to the legacy kmod-nvidia-340xx package.
My card is a GTX 650 Ti, which is not one of those older GPUs, so the legacy driver should not be used. Great, keep going. :arrow:

So now my eagle eye is on kmod-nvidia, thanks to tomcat's post. lshw shows I am using the nouveau driver, so if I can get it to say driver=nvidia (the kernel module that the kmod-nvidia package provides), I bet the random freezes will go away and there will be peace in the land.

I can see there are multiple kmod-nvidia packages.
kmod-nvidia
kmod-nvidia-96xx
kmod-nvidia-173xx
kmod-nvidia-304xx
kmod-nvidia-340xx

I believe the kmod-nvidia package is the one. Then I found nvidia-detect, so let's confirm which package we need.

Code: Select all

nvidia-detect
kmod-nvidia
Boom! We are on target. Installing...

Code: Select all

 sudo yum install $(nvidia-detect)
Loaded plugins: copr, fastestmirror, langpacks
Loading mirror speeds from cached hostfile
 * base: mirror.its.sfu.ca
 * elrepo: ftp.osuosl.org
 * epel: d2lzkl7pfhq30w.cloudfront.net
 * extras: mirror.its.sfu.ca
 * updates: mirror.its.sfu.ca
Resolving Dependencies
--> Running transaction check
---> Package kmod-nvidia.x86_64 0:450.57-1.el7_8.elrepo will be installed
--> Processing Dependency: nvidia-x11-drv = 450.57 for package: kmod-nvidia-450.57-1.el7_8.elrepo.x86_64
--> Running transaction check
---> Package nvidia-x11-drv.x86_64 0:450.57-1.el7_8.elrepo will be installed
--> Processing Dependency: nvidia-x11-drv-libs(x86-64) = 450.57-1.el7_8.elrepo for package: nvidia-x11-drv-450.57-1.el7_8.elrepo.x86_64
--> Processing Dependency: yum-plugin-nvidia >= 1.0.2 for package: nvidia-x11-drv-450.57-1.el7_8.elrepo.x86_64
--> Processing Dependency: libnvidia-tls.so.450.57()(64bit) for package: nvidia-x11-drv-450.57-1.el7_8.elrepo.x86_64
--> Processing Dependency: libnvidia-ml.so.1()(64bit) for package: nvidia-x11-drv-450.57-1.el7_8.elrepo.x86_64
--> Processing Dependency: libnvidia-glcore.so.450.57()(64bit) for package: nvidia-x11-drv-450.57-1.el7_8.elrepo.x86_64
--> Running transaction check
---> Package nvidia-x11-drv-libs.x86_64 0:450.57-1.el7_8.elrepo will be installed
---> Package yum-plugin-nvidia.noarch 0:1.0.2-1.el7.elrepo will be installed
--> Processing Conflict: nvidia-x11-drv-450.57-1.el7_8.elrepo.x86_64 conflicts ocl-icd
--> Finished Dependency Resolution
Error: nvidia-x11-drv conflicts with ocl-icd-2.2.12-1.el7.x86_64
 You could try using --skip-broken to work around the problem
 You could try running: rpm -Va --nofiles --nodigest
So let's query the conflicting package:

Code: Select all

rpm -qi olc-cid
results in
package olc-cid is not installed
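So there is a conflict with something that isn't installed? :? Then I noticed I had mistyped the name: olc-cid instead of ocl-icd. With the correct name, the package is installed: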

Code: Select all

[orca@orcacomputers ~]$ rpm -qi ocl-icd
Name        : ocl-icd
Version     : 2.2.12
Release     : 1.el7
Architecture: x86_64
Install Date: Sat 02 Nov 2019 03:44:34 PM PDT
Group       : Unspecified
Size        : 143405
License     : BSD
Signature   : RSA/SHA256, Sun 25 Mar 2018 01:40:35 AM PDT, Key ID 6a2faea2352c64e5
Source RPM  : ocl-icd-2.2.12-1.el7.src.rpm
Build Date  : Sun 25 Mar 2018 01:33:38 AM PDT
Build Host  : buildvm-29.phx2.fedoraproject.org
Relocations : (not relocatable)
Packager    : Fedora Project
Vendor      : Fedora Project
URL         : https://forge.imag.fr/projects/ocl-icd/
Bug URL     : https://bugz.fedoraproject.org/ocl-icd
Summary     : OpenCL ICD Bindings
Description :
OpenCL ICD Bindings.
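Retyping package names is how I ended up querying olc-cid. This is a minimal sketch of copying the name out of yum's error message instead, assuming the "Error: ... conflicts with ..." wording shown above; conflicting_pkg is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pull the conflicting package out of yum's output.
# Matches lines like:
#   Error: nvidia-x11-drv conflicts with ocl-icd-2.2.12-1.el7.x86_64
conflicting_pkg() {
    sed -n 's/^Error: .* conflicts with \([^ ]*\).*/\1/p'
}

# Live use: --assumeno makes yum answer "no" instead of installing anything.
#   pkg=$(sudo yum --assumeno install kmod-nvidia 2>&1 | conflicting_pkg)
#   rpm -qi "$pkg"
```

rpm -qi accepts the full name-version-release.arch string, so the extracted name can be passed as-is.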

Code: Select all

sudo yum remove ocl-icd-2.2.12-1.el7.x86_64
(Thanks to TrevorH for confirming I can remove it.)
The packages that depended on it were removed too. I hope simplescreenrecorder will work in the future; SSR is awesome!

The install takes a while and uses 100% of one of the 12 cores on this machine.
Install complete; now let's verify the proper driver is in place. Shoot, driver=nouveau is still being used. That's because I need to reboot.

No dice: kmod-nvidia does not work with kernel 5.8.0-1.el7.elrepo.x86_64, because ELRepo's kmod packages are built for the stock CentOS 7 kernel, not for the mainline kernel. So I rebooted into 3.10.0-1127.18.2.el7.x86_64, and now driver=nvidia.
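Since the kernel choice turned out to be the whole problem, here is a minimal sketch of checking it before blaming the driver. It assumes kernel release strings like the two above; is_stock_el7_kernel is a hypothetical name, while uname, modinfo -k, and grubby are standard CentOS 7 tools. The vmlinuz path in the last comment assumes the kernel version from this post.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only for a stock CentOS 7 kernel release.
# ELRepo's kmod-nvidia is built for the stock 3.10.0 el7 kernel series,
# so it is not expected to load on ELRepo's mainline (kernel-ml) kernels.
is_stock_el7_kernel() {
    local rel=${1:-$(uname -r)}   # default: the running kernel
    case "$rel" in
        *elrepo*)       return 1 ;;   # kernel-ml / kernel-lt from ELRepo
        3.10.0-*.el7.*) return 0 ;;   # stock CentOS 7 kernel
        *)              return 1 ;;
    esac
}

# Live checks:
#   is_stock_el7_kernel || echo "not a stock kernel: kmod-nvidia may not load"
#   modinfo -k "$(uname -r)" nvidia   # does this kernel have an nvidia module?
#   grubby --default-kernel           # which kernel boots by default
# Make the stock kernel the default (path assumed from this post):
#   sudo grubby --set-default /boot/vmlinuz-3.10.0-1127.18.2.el7.x86_64
```

Setting the default kernel keeps the machine from booting back into 5.8.0 and falling back to nouveau after the next reboot.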

Code: Select all

 lshw -c video 
WARNING: you should run this program as super-user.
  *-display                 
       description: VGA compatible controller
       product: GK106 [GeForce GTX 650 Ti]
       vendor: NVIDIA Corporation
       physical id: 0
       bus info: pci@0000:02:00.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: vga_controller bus_master cap_list rom
       configuration: driver=nvidia latency=0
       resources: irq:60 memory:fa000000-faffffff memory:f0000000-f7ffffff memory:f8000000-f9ffffff ioport:e000(size=128) memory:fb000000-fb07ffff
So, victory on the video card front. Unfortunately, the webcam no longer works. I will stream and host meetings to see whether the machine still freezes.

Conclusion: I just had to install kmod-nvidia (and boot the stock CentOS 7 kernel).
Great learning experience.
Start less finish more

