Interrupts generated by mlx5_core are not load balanced

nkatiyar
Posts: 3
Joined: 2023/02/09 04:22:09

Post by nkatiyar » 2023/02/13 11:40:45

Hi,
I am running a CentOS 7.7 VM in Azure with the Mellanox (mlx5_core) driver for the NIC. It is running a customized 3.10.0-1062.18.1.el7 kernel image with some minor changes in the net directory.

The driver has created as many queues and IRQs as there are CPUs in the VM, but all of the interrupts are being processed by CPU0 only. The irqbalance service is running, and smp_affinity is set differently for different IRQs. I also tried setting it manually after stopping the irqbalance service, but all the interrupts were still delivered to CPU0, as can be seen from the output below.

> cat /proc/interrupts
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
0: 9881 0 0 0 0 0 0 0 IO-APIC-edge timer
1: 0 0 0 0 0 0 0 9 IO-APIC-edge i8042
3: 21 25 13 19 2 2 3 856 IO-APIC-edge
4: 68 6 25 22 21 10 19 360 IO-APIC-edge serial
8: 0 0 0 0 0 0 0 0 IO-APIC-edge rtc0
9: 0 0 0 0 0 0 0 0 IO-APIC-fasteoi acpi
12: 0 0 0 0 0 0 0 5 IO-APIC-edge i8042
14: 602 318 226 232 278 205 69 8917 IO-APIC-edge ata_piix
15: 0 0 0 0 0 0 0 0 IO-APIC-edge ata_piix
24: 0 0 0 0 0 0 0 0 PCI-MSI-edge mlx5_pages_eq@pci:8b76:00:02.0
25: 19694 0 0 0 0 0 0 0 PCI-MSI-edge mlx5_cmd_eq@pci:8b76:00:02.0
26: 0 0 0 0 0 0 0 0 PCI-MSI-edge mlx5_async_eq@pci:8b76:00:02.0
28: 123648 0 0 0 0 0 0 0 PCI-MSI-edge mlx5_comp0@pci:8b76:00:02.0
29: 152455 0 0 0 0 0 0 0 PCI-MSI-edge mlx5_comp1@pci:8b76:00:02.0
30: 102308 0 0 0 0 0 0 0 PCI-MSI-edge mlx5_comp2@pci:8b76:00:02.0
31: 89403 0 0 0 0 0 0 0 PCI-MSI-edge mlx5_comp3@pci:8b76:00:02.0
32: 86793 0 0 0 0 0 0 0 PCI-MSI-edge mlx5_comp4@pci:8b76:00:02.0
33: 107817 0 0 0 0 0 0 0 PCI-MSI-edge mlx5_comp5@pci:8b76:00:02.0
34: 117091 0 0 0 0 0 0 0 PCI-MSI-edge mlx5_comp6@pci:8b76:00:02.0
35: 59714 0 0 0 0 0 0 0 PCI-MSI-edge mlx5_comp7@pci:8b76:00:02.0
36: 0 0 0 0 0 0 0 0 PCI-MSI-edge mlx5_pages_eq@pci:83a4:00:02.0
37: 12427 0 0 0 0 0 0 0 PCI-MSI-edge mlx5_cmd_eq@pci:83a4:00:02.0
38: 0 0 0 0 0 0 0 0 PCI-MSI-edge mlx5_async_eq@pci:83a4:00:02.0
40: 35520 0 0 0 0 0 0 0 PCI-MSI-edge mlx5_comp0@pci:83a4:00:02.0
41: 576 0 0 0 0 0 0 0 PCI-MSI-edge mlx5_comp1@pci:83a4:00:02.0
42: 34139 0 0 0 0 0 0 0 PCI-MSI-edge mlx5_comp2@pci:83a4:00:02.0
43: 19951 0 0 0 0 0 0 0 PCI-MSI-edge mlx5_comp3@pci:83a4:00:02.0
44: 41038 0 0 0 0 0 0 0 PCI-MSI-edge mlx5_comp4@pci:83a4:00:02.0
45: 36569 0 0 0 0 0 0 0 PCI-MSI-edge mlx5_comp5@pci:83a4:00:02.0
46: 42023 0 0 0 0 0 0 0 PCI-MSI-edge mlx5_comp6@pci:83a4:00:02.0
47: 12610 0 0 0 0 0 0 0 PCI-MSI-edge mlx5_comp7@pci:83a4:00:02.0
NMI: 0 0 0 0 0 0 0 0 Non-maskable interrupts
LOC: 1536 1224 1240 1107 1299 1379 1171 2152 Local timer interrupts
SPU: 0 0 0 0 0 0 0 0 Spurious interrupts
PMI: 0 0 0 0 0 0 0 0 Performance monitoring interrupts
IWI: 726 143 776 309 780 370 748 1047 IRQ work interrupts
RTR: 0 0 0 0 0 0 0 0 APIC ICR read retries
RES: 59746 34162 150579 45146 149421 87954 149095 47137 Rescheduling interrupts
CAL: 2562 2717 2601 2590 2577 2649 2572 2557 Function call interrupts
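To make the imbalance in output like the above easier to quantify, here is a minimal Python sketch that sums the per-CPU counts for matching IRQ lines. The function name `per_cpu_totals` and its `name_filter` parameter are made up for illustration, and the sample text is just a few lines copied from the output above.

```python
def per_cpu_totals(text, name_filter="mlx5_comp"):
    """Sum interrupt counts per CPU for lines containing name_filter."""
    lines = text.strip().splitlines()
    cpus = lines[0].split()              # header row: CPU0 CPU1 ...
    totals = [0] * len(cpus)
    for line in lines[1:]:
        if name_filter not in line:
            continue
        fields = line.split()
        counts = fields[1:1 + len(cpus)]  # skip the "NN:" IRQ label
        # Skip short summary lines (e.g. "ERR: 0") and anything non-numeric.
        if len(counts) < len(cpus) or not all(c.isdigit() for c in counts):
            continue
        for i, c in enumerate(counts):
            totals[i] += int(c)
    return dict(zip(cpus, totals))


# A few lines copied from the 8-CPU output in this post.
SAMPLE = """\
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
0: 9881 0 0 0 0 0 0 0 IO-APIC-edge timer
28: 123648 0 0 0 0 0 0 0 PCI-MSI-edge mlx5_comp0@pci:8b76:00:02.0
29: 152455 0 0 0 0 0 0 0 PCI-MSI-edge mlx5_comp1@pci:8b76:00:02.0
"""

for cpu, total in per_cpu_totals(SAMPLE).items():
    print(cpu, total)
```

On a live system the same function could be fed the contents of /proc/interrupts (for example `open("/proc/interrupts").read()`); a healthy spread would show non-zero totals on more than one CPU.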

The Mellanox driver version is:
version: 5.0-0
license: Dual BSD/GPL
description: Mellanox 5th generation network adapters (ConnectX series) core driver
author: Eli Cohen <eli@mellanox.com>
rhelversion: 7.7
srcversion: 7D9FFD656B0EB1000804CB2

The same kernel works fine with a different NIC driver (in AWS) and with the igb driver on a physical server.
I also tried the CentOS 7.9 image (3.10.0-1160.76.1.el7) available in the Azure Marketplace, and I don't see the issue there.

Please suggest anything that could help in resolving or debugging this issue.

Please CC to [email address removed by moderator] while replying.
regards,
Nitin

TrevorH
Site Admin
Posts: 33219
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Post by TrevorH » 2023/02/13 12:10:23

Reduce the number of cores in your VM to < 8.
The future appears to be RHEL or Debian. I think I'm going Debian.
Info for USB installs on http://wiki.centos.org/HowTos/InstallFromUSBkey
CentOS 5 and 6 are deadest, do not use them.
Use the FAQ Luke

nkatiyar

Post by nkatiyar » 2023/02/13 12:39:08

Hi,
I had already tried with 2, 4 and 8 vCPUs, but the result is the same.

regards,
Nitin

TrevorH

Post by TrevorH » 2023/02/13 12:56:29

There is a known problem, which will never be fixed, where a VM with >= 8 vCPUs takes all interrupts on core 0. To fix that you would need to change the number of vCPUs the VM is set to use and then virtually "power cycle" the VM for the change to take effect. Your /proc/interrupts output above shows that it has 8 cores now.

nkatiyar

Post by nkatiyar » 2023/02/13 13:11:53

Thank you Trevor for responding.

The following is the output from a 4-CPU VM.

[root@anap:default ~]$ date
Mon Feb 13 13:09:10 UTC 2023
[root@anap:default ~]$ ps -eaf | grep irqbalance
root 664 1 0 13:01 ? 00:00:00 /usr/sbin/irqbalance --foreground --debug
root 23749 30667 0 13:09 pts/0 00:00:00 grep --line-buffered irqbalance
[root@anap:default ~]$ cat /proc/irq/31/smp_affinity
8
[root@anap:default ~]$ cat /proc/irq/30/smp_affinity
2
[root@anap:default ~]$ cat /proc/irq/29/smp_affinity
4
[root@anap:default ~]$ cat /proc/interrupts
CPU0 CPU1 CPU2 CPU3
0: 9672 0 0 0 IO-APIC-edge timer
1: 0 3 1 5 IO-APIC-edge i8042
3: 98 35 38 826 IO-APIC-edge
4: 8 0 5 16 IO-APIC-edge serial
8: 0 0 0 0 IO-APIC-edge rtc0
9: 0 0 0 0 IO-APIC-fasteoi acpi
12: 2 0 0 1 IO-APIC-edge i8042
14: 647 389 351 7284 IO-APIC-edge ata_piix
15: 0 0 0 0 IO-APIC-edge ata_piix
24: 0 0 0 0 PCI-MSI-edge mlx5_pages_eq@pci:3f0d:00:02.0
25: 9830 0 0 0 PCI-MSI-edge mlx5_cmd_eq@pci:3f0d:00:02.0
26: 0 0 0 0 PCI-MSI-edge mlx5_async_eq@pci:3f0d:00:02.0
28: 875 0 0 0 PCI-MSI-edge mlx5_comp0@pci:3f0d:00:02.0
29: 285 0 0 0 PCI-MSI-edge mlx5_comp1@pci:3f0d:00:02.0
30: 233 0 0 0 PCI-MSI-edge mlx5_comp2@pci:3f0d:00:02.0
31: 1278 0 0 0 PCI-MSI-edge mlx5_comp3@pci:3f0d:00:02.0
NMI: 0 0 0 0 Non-maskable interrupts
LOC: 2406 2001 2125 2942 Local timer interrupts
SPU: 0 0 0 0 Spurious interrupts
PMI: 0 0 0 0 Performance monitoring interrupts
IWI: 2642 1297 2855 1394 IRQ work interrupts
RTR: 0 0 0 0 APIC ICR read retries
RES: 102491 90075 105613 87760 Rescheduling interrupts
CAL: 2059 2251 2096 2207 Function call interrupts
TLB: 0 0 0 0 TLB shootdowns
TRM: 0 0 0 0 Thermal event interrupts
THR: 0 0 0 0 Threshold APIC interrupts
DFR: 0 0 0 0 Deferred Error APIC interrupts
MCE: 0 0 0 0 Machine check exceptions
MCP: 2 2 2 2 Machine check polls
HYP: 382 22 0 47 Hypervisor callback interrupts
HVS: 779346 380639 742171 365452 Hyper-V stimer0 interrupts
ERR: 0
MIS: 0
PIN: 0 0 0 0 Posted-interrupt notification event
NPI: 0 0 0 0 Nested posted-interrupt event
PIW: 0 0 0 0 Posted-interrupt wakeup event
[root@anap:default ~]$
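For anyone reading the smp_affinity values above: each one is a hexadecimal CPU bitmask, so 2, 4 and 8 select CPU1, CPU2 and CPU3 respectively. The following is a minimal Python sketch that decodes such masks and compares them with the CPUs that actually received interrupts. The helper name `cpus_in_mask` is made up for illustration, and the dictionaries simply restate the values for IRQs 29 to 31 from the transcript.

```python
def cpus_in_mask(mask_hex):
    """Return the CPU indices whose bits are set in an smp_affinity hex mask."""
    # Larger systems print comma-separated 32-bit groups, most significant
    # first (e.g. "00000001,00000000"), so the commas can simply be dropped.
    value = int(mask_hex.strip().replace(",", ""), 16)
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]


# Values restated from the 4-CPU transcript (IRQs 29, 30 and 31).
affinity = {29: "4", 30: "2", 31: "8"}
observed = {29: [285, 0, 0, 0], 30: [233, 0, 0, 0], 31: [1278, 0, 0, 0]}

for irq, mask in sorted(affinity.items()):
    allowed = cpus_in_mask(mask)
    busy = [cpu for cpu, count in enumerate(observed[irq]) if count > 0]
    honoured = set(busy) <= set(allowed)
    print("IRQ", irq, "allowed CPUs", allowed, "observed CPUs", busy,
          "affinity honoured:", honoured)
```

In this data every mask points away from CPU0 while every count sits on CPU0, which is the mismatch being reported.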

regards,
Nitin
