nvidia detect results nothing

mikekim
Posts: 5
Joined: 2024/01/16 05:54:22

nvidia detect results nothing

Post by mikekim » 2024/01/16 06:02:16

Dear all,

I installed AlmaLinux 9.3 (which I guess is very similar to CentOS),
and now I have a problem with NVIDIA.

I tried installing epel-release and nvidia-detect,
but when I run nvidia-detect it shows nothing.

Thanks for your help.

ATB

jlehtone
Posts: 4532
Joined: 2007/12/11 08:17:33
Location: Finland

Re: nvidia detect results nothing

Post by jlehtone » 2024/01/17 09:40:24

The nvidia-detect command comes from the nvidia-detect package, which is in the ELRepo repository.
On my system (EL9, not CentOS Stream 9) it says:

$ nvidia-detect

$ nvidia-detect -v
Probing for supported NVIDIA devices...
[10de:1cb6] NVIDIA Corporation GP107GL [Quadro P620]
This device requires the current 525.85.05 NVIDIA driver
$ nvidia-detect -l | grep -E "Devices|10de:1cb6"
*** Devices supported by the current 525.85.05 NVIDIA driver  ***
[10de:1cb6] NVIDIA Corporation GP107GL [Quadro P620]
*** Devices supported by the legacy 470.xx NVIDIA driver  ***
*** Devices supported by the legacy 390.xx NVIDIA driver  ***
*** Devices supported by the legacy 367.xx NVIDIA driver  ***
*** Devices supported by the legacy 340.xx NVIDIA driver  ***
*** Devices supported by the legacy 304.xx NVIDIA driver  ***
*** Devices supported by the legacy 173.xx NVIDIA driver  ***
*** Devices supported by the legacy 96.xx NVIDIA driver  ***

My system actually has a slightly newer build of the 525 driver, from NVIDIA's own repository:

$ nvidia-smi | grep NVIDIA-SMI
| NVIDIA-SMI 525.147.05   Driver Version: 525.147.05   CUDA Version: 12.0     |

One issue is that the ELRepo repository for EL9 has no NVIDIA packages other than nvidia-detect.
They chose not to build NVIDIA driver packages, because at least the NVIDIA and RPM Fusion repositories already provide them.

One can enable NVIDIA's EL9 repository with:

sudo dnf config-manager --add-repo=https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/cuda-rhel9.repo

PS. AlmaLinux OS has its own support channels. See https://almalinux.org/

mikekim
Posts: 5
Joined: 2024/01/16 05:54:22

Re: nvidia detect results nothing

Post by mikekim » 2024/01/18 04:50:48

Thanks @jlehtone for your help.

I think my problem is with nvidia-smi. Here is the log of the commands:
$ nvidia-detect

$ nvidia-detect -v
Probing for supported NVIDIA devices...
[10de:220a] NVIDIA Corporation GA102 [GeForce RTX 3080 12GB]
This device requires the current 525.85.05 NVIDIA driver

$ nvidia-detect -l | grep -E "Devices|10de:220a"
*** Devices supported by the current 525.85.05 NVIDIA driver ***
[10de:220a] NVIDIA Corporation GA102 [GeForce RTX 3080 12GB]
*** Devices supported by the legacy 470.xx NVIDIA driver ***
*** Devices supported by the legacy 390.xx NVIDIA driver ***
*** Devices supported by the legacy 367.xx NVIDIA driver ***
*** Devices supported by the legacy 340.xx NVIDIA driver ***
*** Devices supported by the legacy 304.xx NVIDIA driver ***
*** Devices supported by the legacy 173.xx NVIDIA driver ***
*** Devices supported by the legacy 96.xx NVIDIA driver ***

$ nvidia-smi | grep NVIDIA-SMI
bash: nvidia-smi: command not found...
Install package 'nvidia-driver-cuda' to provide command 'nvidia-smi'? [N/y] y

Proceed with changes? [N/y] y

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

$ sudo dnf config-manager --add-repo=https://developer.download.nvidia.com/c ... rhel9.repo
Adding repo from: https://developer.download.nvidia.com/c ... rhel9.repo

$ nvidia-smi | grep NVIDIA-SMI
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

jlehtone
Posts: 4532
Joined: 2007/12/11 08:17:33
Location: Finland

Re: nvidia detect results nothing

Post by jlehtone » 2024/01/18 09:59:41

nvidia-smi tries to talk to the driver, that is, NVIDIA's kernel module. It fails, but offers a suggestion:
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
BTW, you could not have installed the nvidia-driver-cuda package if you had not already had the cuda-rhel9-x86_64 repository configured.
Hence your "dnf config-manager --add-repo ..." was redundant.

If you don't have the driver, install it.
If you have the driver, load it. If that fails, figure out why.

Are any packages with "nvidia" in the name installed? rpm -qa \*nvidia\*
If there is no "kmod-nvidia*" among them, one way to install the driver is:
sudo dnf module install nvidia-driver:latest-dkms

If the driver is installed, there should be a kernel module that one can inspect: modinfo nvidia
If there is no such module, the installation is incomplete.

If the module exists, is it loaded? lsmod | grep nvidia
If it is not loaded, can one load it? sudo modprobe nvidia
Loading can fail if Secure Boot is on in UEFI and the certificate that was used to sign the module is not enrolled in UEFI.
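
The checklist above could be sketched as a small script. This is only an illustration: next_step is a hypothetical helper name, and the script merely prints which step to try next rather than changing anything on the system.

```shell
#!/usr/bin/env bash
# Sketch of the checklist: package present? module present? module loaded?
# next_step is a hypothetical helper; arguments are 1 (yes) or 0 (no).
next_step() {
    local has_pkg=$1 has_module=$2 is_loaded=$3
    if [ "$has_pkg" -eq 0 ]; then
        echo "install: sudo dnf module install nvidia-driver:latest-dkms"
    elif [ "$has_module" -eq 0 ]; then
        echo "incomplete: package found, but 'modinfo nvidia' finds no module"
    elif [ "$is_loaded" -eq 0 ]; then
        echo "load: sudo modprobe nvidia (check Secure Boot if this fails)"
    else
        echo "ok: the nvidia module is loaded"
    fi
}

# Gather the three facts with the same commands as in the post.
has_pkg=0; has_module=0; is_loaded=0
rpm -qa 'kmod-nvidia*' 2>/dev/null | grep -q . && has_pkg=1
modinfo nvidia >/dev/null 2>&1 && has_module=1
lsmod 2>/dev/null | grep -q '^nvidia ' && is_loaded=1

next_step "$has_pkg" "$has_module" "$is_loaded"
```

The decision logic is kept apart from the commands that gather the facts, so each branch can be tried by hand, e.g. next_step 1 1 0.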

mikekim
Posts: 5
Joined: 2024/01/16 05:54:22

Re: nvidia detect results nothing

Post by mikekim » 2024/01/19 01:47:28

Thanks @jlehtone for your support and help.

I think my problem was Secure Boot. (My boot menu had no Security -> Secure Boot entry, only a UEFI/CSM option.)
I selected CSM and it works now (previously "lsmod | grep nvidia" printed nothing):

$ rpm -qa \*nvidia\*
nvidia-detect-525.85.05-1.el9.x86_64
nvidia-driver-cuda-libs-545.23.08-1.el9.x86_64
nvidia-libXNVCtrl-545.23.08-1.el9.x86_64
nvidia-driver-NVML-545.23.08-1.el9.x86_64
nvidia-driver-NvFBCOpenGL-545.23.08-1.el9.x86_64
nvidia-libXNVCtrl-devel-545.23.08-1.el9.x86_64
dnf-plugin-nvidia-2.0-1.el9.noarch
nvidia-driver-libs-545.23.08-1.el9.x86_64
kmod-nvidia-latest-dkms-545.23.08-1.el9.x86_64
nvidia-kmod-common-545.23.08-1.el9.noarch
nvidia-driver-545.23.08-1.el9.x86_64
nvidia-modprobe-545.23.08-1.el9.x86_64
nvidia-settings-545.23.08-1.el9.x86_64
nvidia-xconfig-545.23.08-1.el9.x86_64
nvidia-driver-devel-545.23.08-1.el9.x86_64
nvidia-persistenced-545.23.08-1.el9.x86_64
nvidia-driver-cuda-545.23.08-1.el9.x86_64

$ lsmod | grep nvidia
nvidia_drm 126976 8
nvidia_modeset 1343488 10 nvidia_drm
nvidia_uvm 3575808 0
nvidia 56295424 142 nvidia_uvm,nvidia_modeset
drm_kms_helper 245760 1 nvidia_drm
drm 704512 12 drm_kms_helper,nvidia,nvidia_drm
video 73728 1 nvidia_modeset

$ nvidia-smi | grep NVIDIA-SMI
| NVIDIA-SMI 545.23.08 Driver Version: 545.23.08 CUDA Version: 12.3 |

BTW, do I need to switch back to UEFI mode at some point?

jlehtone
Posts: 4532
Joined: 2007/12/11 08:17:33
Location: Finland

Re: nvidia detect results nothing

Post by jlehtone » 2024/01/19 08:44:21

Motherboards used to have a BIOS. Now they have UEFI. Both do low-level tasks, such as starting the load of the OS at power-on.
The BIOS loaded the OS (the bootloader, actually) differently from how UEFI does it. UEFI firmware offers two "modes": legacy, which loads the OS like the BIOS did, and UEFI.

One feature that exists only in UEFI mode is Secure Boot. There are thus three ways that the OS can be loaded:
(1) UEFI with Secure Boot, (2) UEFI without Secure Boot, and (3) legacy (which can never have Secure Boot).
If the OS installer boots in legacy mode, the installed system has to boot in legacy mode.
If the OS installer boots in UEFI mode, the installed system has to boot in UEFI mode.

mokutil --sb-state shows whether Secure Boot is on or off.
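
To tell which of the three cases a running system is in, something like the following could be used. It is a rough sketch: boot_mode is a hypothetical helper name, and it relies on the kernel exposing /sys/firmware/efi only when the system was booted in UEFI mode. The full spelling of the mokutil option is --sb-state.

```shell
#!/usr/bin/env bash
# boot_mode is a hypothetical helper: prints "UEFI" if the given directory
# exists (default /sys/firmware/efi), otherwise "legacy".
boot_mode() {
    local efi_dir=${1:-/sys/firmware/efi}
    if [ -d "$efi_dir" ]; then
        echo "UEFI"
    else
        echo "legacy"
    fi
}

echo "Boot mode: $(boot_mode)"

# Secure Boot state is only meaningful in UEFI mode; mokutil may be absent.
if command -v mokutil >/dev/null 2>&1; then
    mokutil --sb-state || true
fi
```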

What Secure Boot does is verify that the loaded executables (bootloader, kernel, and kernel modules) are intact:
the motherboard holds certificates, and an executable may have been signed with one of them. If an executable is not signed by
any of those certificates, it is not loaded. Motherboards tend to have only the vendor's and Microsoft's certificates by default.

The point of Secure Boot is to ensure that one does not load malware by accident; that the OS has only trusted kernel components.


The NVIDIA kernel module from NVIDIA's repository is signed with NVIDIA's certificate (or not signed at all).
mokutil can list the known certificates and import new ones into the motherboard (if one has the certificate file).
I have not dug up which certificate the NVIDIA drivers use, because I don't need Secure Boot (yet).
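
One way to start digging might be to look at the "signer" field that modinfo reports for signed modules. A minimal sketch, assuming the module is named nvidia; signer_from_modinfo is a hypothetical helper name and the certificate path in the comments is a placeholder, not a real location.

```shell
#!/usr/bin/env bash
# signer_from_modinfo is a hypothetical helper: reads `modinfo` output on
# stdin and prints the value of the "signer:" field (nothing if absent).
signer_from_modinfo() {
    sed -n 's/^signer:[[:space:]]*//p'
}

signer=$(modinfo nvidia 2>/dev/null | signer_from_modinfo)
if [ -n "$signer" ]; then
    echo "nvidia module is signed by: $signer"
else
    echo "nvidia module is unsigned or not installed"
fi

# Certificates known to the firmware/shim (needs UEFI boot):
#   mokutil --list-enrolled
# Request enrollment of a certificate file (placeholder path):
#   sudo mokutil --import /path/to/cert.der
```

If the signer does not match anything in the enrolled list, that would be consistent with modprobe failing while Secure Boot is on.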

(Microsoft Windows 11 supposedly requires Secure Boot, so dual boot setup with it would require SB. I don't have that "OS".)

mikekim
Posts: 5
Joined: 2024/01/16 05:54:22

Re: nvidia detect results nothing

Post by mikekim » 2024/01/22 04:05:34

Thanks @jlehtone for the complete explanation of Secure Boot.
And sorry for the late reply.

I think Linux works in legacy mode but does not work correctly in UEFI mode.
(I will try to recheck with UEFI mode.)

Cheers

jlehtone
Posts: 4532
Joined: 2007/12/11 08:17:33
Location: Finland

Re: nvidia detect results nothing

Post by jlehtone » 2024/01/22 09:03:26

All my AlmaLinux 8 and 9 systems boot in UEFI mode, except the ancient ones whose firmware does not support UEFI.
I do have Secure Boot off on the systems that have the proprietary NVIDIA driver, although I had some with SB on for a while.

mikekim
Posts: 5
Joined: 2024/01/16 05:54:22

Re: nvidia detect results nothing

Post by mikekim » 2024/01/25 01:59:53

Thanks @jlehtone for the note regarding UEFI mode.
I will try to switch to UEFI mode and turn off Secure Boot.
Cheers
