amdgpu strangeness

gostal
Posts: 71
Joined: 2019/09/23 15:26:45

amdgpu strangeness

Post by gostal » 2019/09/24 14:30:36

My system gives inconsistent information about the graphics card driver:

Code: Select all

lspci -nnk|grep -A3 VGA
03:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon Pro WX 7100] [1002:67c4]
        Subsystem: Dell Device [1028:0b0d]
        Kernel driver in use: amdgpu
        Kernel modules: amdgpu
This is not what X reports:

Code: Select all

cat /var/log/Xorg.0.log | grep amd
[    27.933] (II) LoadModule: "amdgpu"
[    27.933] (WW) Warning, couldn't open module amdgpu
[    27.933] (EE) Failed to load module "amdgpu" (module does not exist, 0)
Indeed, there is no amdgpu driver to be found under /usr/lib64/xorg/modules/drivers.
The only amdgpu-related thing that yum finds is umr.x86_64 : AMDGPU Userspace Register Debugger.

What's going on here?
Why does part of the system believe that my graphics card driver is amdgpu when X doesn't think so?
amdgpu is open source so why can't I find the driver in the standard repos?

AMD provides the amdgpu-pro driver for CentOS, but the latest release is for CentOS 7.6 and I have 7.7. Is there any chance that it would work with the standard kernel 3.10.0-1062.1.1.el7.x86_64?

Thanks for any attempt to rid me of this bewilderment or help in any other way.

Cheers,
gostal
Last edited by gostal on 2019/09/25 23:00:27, edited 1 time in total.
Desktop Dell T5810 Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.60GHz, 72 GB RAM, Radeon Pro WX 7100
CentOS 7.9.2009

Re: amdgpu strangeness

Post by gostal » 2019/09/25 22:58:45

Hi,

I got wiser, so the bewilderment disappeared. I now realise that there is indeed a kernel module called amdgpu, but that there is no corresponding xorg-x11 driver. At first I did not realise that these were two different things. I guess X tries to load the amdgpu driver because the kernel module is loaded. By default X autoconfigures a number of drivers. I believed the first one was ati, followed by fbdev, vesa and modesetting, but I could not check at the time. I also thought that amdgpu was not autoconfigured, but my memory is unreliable. Edit: Just checked. The order is modesetting, then fbdev, with vesa last. amdgpu was not autoconfigured in the beginning.
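The autoconfiguration order can be read straight from the X log instead of from memory. What follows is only a minimal sketch, assuming the log contains lines of the form "(==) Matched NAME as autoconfigured driver N". The function name autoconfigured_drivers is made up for this example, and the default log path is an assumption that may differ on other setups.

```shell
# Print the drivers X autoconfigured, one per line as "INDEX NAME",
# in the order they appear in the log.
# Usage: autoconfigured_drivers [LOGFILE]
# (hypothetical helper; the default log path is an assumption)
autoconfigured_drivers() {
    log=${1:-/var/log/Xorg.0.log}
    # Matches lines such as:
    #   [    34.349] (==) Matched amdgpu as autoconfigured driver 0
    sed -n 's/.*(==) Matched \([^ ]*\) as autoconfigured driver \([0-9][0-9]*\).*/\2 \1/p' "$log"
}

# Example call (not run here):
#   autoconfigured_drivers /var/log/Xorg.0.log
```

Lines that do not match the pattern are ignored, so the whole log can be passed in without filtering it first.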

Anyway, since I could not find any xorg-x11-drv-amdgpu package except for Fedora, I downloaded the latest sources from https://www.x.org/archive/individual/driver/, compiled the driver and installed it under /usr/local/x86-video-amdgpu. I put a symlink to the driver in the standard search path and rebooted. After that, amdgpu was also autoconfigured (and loaded), though second after ati rather than first. The build process also produced a 10-amdgpu.conf file intended for /etc/X11/xorg.conf.d. Once I put a symlink to that file there, amdgpu was autoconfigured first and ati second.
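To confirm that X can actually find a driver through the standard search path, the module file can be checked directly. This is a small sketch, assuming driver modules are named NAME_drv.so and live under /usr/lib64/xorg/modules/drivers as on this system. has_xorg_driver is a hypothetical helper, not an existing tool.

```shell
# Report whether an Xorg driver module exists in a driver directory.
# A dangling symlink counts as missing, because [ -e ] follows symlinks.
# Usage: has_xorg_driver NAME [DIR]
# (hypothetical helper; the default directory is the one used on this system)
has_xorg_driver() {
    name=$1
    dir=${2:-/usr/lib64/xorg/modules/drivers}
    if [ -e "$dir/${name}_drv.so" ]; then
        echo "present: $dir/${name}_drv.so"
    else
        echo "missing: $dir/${name}_drv.so"
        return 1
    fi
}

# Example calls (not run here):
#   lsmod | grep '^amdgpu'     # is the kernel module loaded?
#   has_xorg_driver amdgpu     # is the X driver installed?
```

The two example calls answer different questions, which is exactly the kernel module versus X driver distinction described above.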

Since the ati driver is no longer needed, and the radeon driver never was, I removed the package, but the ati driver still gets autoconfigured. So now the roles of ati and amdgpu are switched. I first tried to disable ati, but there does not seem to be any ati kernel module, so I could not blacklist it. Perhaps I did not look hard enough.

Now I wonder if there is a way to prevent X from autoconfiguring the ati driver. If somebody could tell me, I'd be grateful.
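One approach that may be worth trying is to name the driver explicitly in a Device section, on the assumption that X then uses that driver instead of building its autoconfigured list. The sketch below only prints such a fragment. The function name and the Identifier string are arbitrary choices for this example, and whether this really stops ati from being matched on this setup is untested.

```shell
# Print an xorg.conf.d fragment that names the X driver explicitly.
# The Identifier is an arbitrary label chosen for this example.
# (hypothetical helper; untested assumption that this bypasses autoconfiguration)
print_amdgpu_device_section() {
    printf '%s\n' \
        'Section "Device"' \
        '    Identifier "Radeon Pro WX 7100"' \
        '    Driver     "amdgpu"' \
        'EndSection'
}

# Example call (not run here); the target file name is hypothetical:
#   print_amdgpu_device_section > /etc/X11/xorg.conf.d/20-amdgpu-device.conf
```

After a restart of X, the "Matched ... as autoconfigured driver" lines in the log would show whether the fragment had the intended effect.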

I would also still like to know whether there is any point in trying to build the latest amdgpu-pro driver, which claims CentOS 7.6 support. The case for it has grown weaker, since on my machine Matlab R2017b accepts the radeonsi Gallium 3D acceleration provided by Mesa. However, the latest release, R2019b, crashes unless I force software OpenGL emulation, so if I want 3D acceleration in the latest release, amdgpu-pro is still of interest. The following page, https://bugs.centos.org/view.php?id=14329, makes me wonder whether it is possible at all, since according to the page

Code: Select all

# CONFIG_DRM_AMDGPU_SI is not set
in the standard kernel. It's quite easy to check, BTW, just do

Code: Select all

cat /boot/config...|grep AMDGPU
.
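The same check can be narrowed to a single option. This is a minimal sketch, assuming the usual kernel config layout in which an enabled option appears as OPTION=value and a disabled one as "# OPTION is not set". kconfig_state is a hypothetical helper, and the config file path has to be supplied by the caller.

```shell
# Report how one kernel config option is set in a given config file.
# Prints the value after "=" (for example y or m), or "not set" when the
# option is commented out or absent.
# Usage: kconfig_state OPTION CONFIG_FILE
# (hypothetical helper)
kconfig_state() {
    opt=$1
    file=$2
    # The trailing "=" keeps CONFIG_FOO from matching CONFIG_FOO_BAR.
    val=$(sed -n "s/^${opt}=//p" "$file" | head -n 1)
    if [ -n "$val" ]; then
        echo "$val"
    else
        echo "not set"
    fi
}

# Example call (not run here), with the config file path left to the reader:
#   kconfig_state CONFIG_DRM_AMDGPU_SI <path to kernel config>
```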

Cheers,
gostal

Re: amdgpu strangeness

Post by gostal » 2019/10/12 09:50:50

After removing xorg-x11-drv-ati-19.0.1-2.el7.x86_64 I noticed that X crashed on practically every reboot, so I reinstalled the package, thinking that the removal might be the cause. X keeps crashing on reboot, however, so that was not the reason. It is a rather mild crash: X restarts itself and you are not even aware of it, but it is annoying once you know about it. Only once did X start without crashing. The other day I noticed something odd, though: X unloads all the other initially loaded drivers, but not the ati driver. From Xorg.0.log:

Code: Select all

[    34.349] (==) Matched amdgpu as autoconfigured driver 0
[    34.349] (==) Matched ati as autoconfigured driver 1
[    34.349] (==) Matched modesetting as autoconfigured driver 2
[    34.349] (==) Matched fbdev as autoconfigured driver 3
[    34.349] (==) Matched vesa as autoconfigured driver 4

[    34.350] (II) Loading /usr/lib64/xorg/modules/drivers/amdgpu_drv.so
[    34.351] (II) Loading /usr/lib64/xorg/modules/drivers/ati_drv.so
[    34.353] (II) Loading /usr/lib64/xorg/modules/drivers/modesetting_drv.so
[    34.353] (II) Loading /usr/lib64/xorg/modules/drivers/fbdev_drv.so
[    34.353] (II) Loading /usr/lib64/xorg/modules/drivers/vesa_drv.so

[    34.536] (II) UnloadModule: "modesetting"
[    34.536] (II) Unloading modesetting
[    34.536] (II) UnloadModule: "fbdev"
[    34.536] (II) Unloading fbdev
[    34.536] (II) UnloadSubModule: "fbdevhw"
[    34.536] (II) Unloading fbdevhw
[    34.536] (II) UnloadModule: "vesa"
[    34.536] (II) Unloading vesa
So, for some reason beyond me, X runs both amdgpu and ati in parallel. Why? I can hardly see any advantage in having both running. Rather, there is a potential conflict between the two, and it seems only a matter of time before something bad happens. So I am thinking about removing the ati package again to prevent that situation. I'd appreciate any advice in the matter.
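The comparison between loaded and unloaded drivers can be automated instead of being done by eye. This is a rough sketch, assuming the log uses the "(II) Loading .../drivers/NAME_drv.so" and "(II) UnloadModule: "NAME"" line formats shown above. drivers_still_loaded is a hypothetical helper, and its default log path is an assumption.

```shell
# List video driver modules that were loaded but never unloaded,
# according to an Xorg log, in the order they were loaded.
# Usage: drivers_still_loaded [LOGFILE]
# (hypothetical helper; the default log path is an assumption)
drivers_still_loaded() {
    log=${1:-/var/log/Xorg.0.log}
    awk '
        # A driver being loaded: the last field is the module path.
        /\(II\) Loading .*\/drivers\/.*_drv\.so/ {
            name = $NF
            sub(/.*\//, "", name)        # strip the directory part
            sub(/_drv\.so$/, "", name)   # strip the module suffix
            if (!(name in seen)) { seen[name] = 1; order[++n] = name }
        }
        # A driver being unloaded: the last field is the quoted name.
        /\(II\) UnloadModule: / {
            name = $NF
            gsub(/"/, "", name)
            unloaded[name] = 1
        }
        END {
            for (i = 1; i <= n; i++)
                if (!(order[i] in unloaded)) print order[i]
        }
    ' "$log"
}

# Example call (not run here):
#   drivers_still_loaded /var/log/Xorg.0.log
```

Only modules under a drivers directory are considered, so input drivers and extensions in the same log do not show up in the result.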

I also did some performance investigation of the 3D hardware acceleration using Matlab's bench function. It turns out that Matlab starts with hardware OpenGL for two of my installed releases, namely R2016b and R2017b. The later ones won't accept it. According to bench, amdgpu does better on everything except 2D, and in 3D the performance increase is considerable, just over 20%. It is about 8% worse in 2D, which is also considerable, but since it does better on all the math benchmarks and in 3D, I am going to stay with the open source amdgpu driver, at least for the time being.

I would also still appreciate any thoughts on whether there is any point at all in trying to build the next amdgpu-pro release, which I hope will support CentOS 7.7. As I have already mentioned, the current release only supports up to 7.6. As far as I can make out from various sources on the Internet, about the only advantages are that Matlab will not have any hardware driver issues with any of my installed releases and that a later version of OpenGL will be supported. Otherwise it seems there will generally be a performance decrease compared to the open source driver.

Cheers,
gostal

zerofire
Posts: 16
Joined: 2019/10/11 01:41:41

Re: amdgpu strangeness

Post by zerofire » 2019/10/13 00:58:27

Generally, point releases update the kernel and a few supporting packages. They do not usually change how drivers couple to the kernel. In fact, I am fairly certain that the most common reason for a driver to error out is a package dependency that has been moved around. You should be able to take the amdgpu-pro driver for CentOS 7.6 and load it on CentOS 7.7.
