
INTEL NIC ERROR

Posted: 2020/02/16 02:52:50
by songfeiyu1
Hello.

I've discovered a problem on a CentOS 6.10 system. The kernel is 2.6.32-754.27.1.el6.x86_64 and the NIC is an Intel Corporation I350 Gigabit Network Connection (rev 01).
When I first connect to my server it works perfectly, but after my app services have been running for a few hours I get the error "igb 0000:01:00.0: Detected Tx Unit Hang":
Feb 15 10:10:14 localhost kernel: igb 0000:01:00.0: Detected Tx Unit Hang
Feb 15 10:10:14 localhost kernel: Tx Queue <0>
Feb 15 10:10:14 localhost kernel: TDH <0>
Feb 15 10:10:14 localhost kernel: TDT <34>
Feb 15 10:10:14 localhost kernel: next_to_use <34>
Feb 15 10:10:14 localhost kernel: next_to_clean <12>
Feb 15 10:10:14 localhost kernel: buffer_info[next_to_clean]
Feb 15 10:10:14 localhost kernel: time_stamp <110f5efb1>
Feb 15 10:10:14 localhost kernel: next_to_watch <ffff8804175e9140>
Feb 15 10:10:14 localhost kernel: jiffies <110f5f9b9>
Feb 15 10:10:14 localhost kernel: desc.status <0>
Feb 15 10:10:15 localhost kernel: igb 0000:01:00.1: eth1: igb: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
Feb 15 10:10:16 localhost kernel: igb 0000:01:00.0: Detected Tx Unit Hang
Feb 15 10:10:16 localhost kernel: Tx Queue <0>
Feb 15 10:10:16 localhost kernel: TDH <0>
Feb 15 10:10:16 localhost kernel: TDT <34>
Feb 15 10:10:16 localhost kernel: next_to_use <34>
Feb 15 10:10:16 localhost kernel: next_to_clean <12>
Feb 15 10:10:16 localhost kernel: buffer_info[next_to_clean]
Feb 15 10:10:16 localhost kernel: time_stamp <110f5efb1>
Feb 15 10:10:16 localhost kernel: next_to_watch <ffff8804175e9140>
Feb 15 10:10:16 localhost kernel: jiffies <110f60189>
Feb 15 10:10:16 localhost kernel: desc.status <0>
Feb 15 10:10:18 localhost kernel: igb 0000:01:00.0: Detected Tx Unit Hang
Feb 15 10:10:18 localhost kernel: Tx Queue <0>
Feb 15 10:10:18 localhost kernel: TDH <0>
Feb 15 10:10:18 localhost kernel: TDT <34>
Feb 15 10:10:18 localhost kernel: next_to_use <34>
Feb 15 10:10:18 localhost kernel: next_to_clean <12>
Feb 15 10:10:18 localhost kernel: buffer_info[next_to_clean]
Feb 15 10:10:18 localhost kernel: time_stamp <110f5efb1>
Feb 15 10:10:18 localhost kernel: next_to_watch <ffff8804175e9140>
Feb 15 10:10:18 localhost kernel: jiffies <110f60959>
Feb 15 10:10:18 localhost kernel: desc.status <0>
Feb 15 10:10:23 localhost kernel: igb 0000:01:00.0: eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None

So, what should I do? How can I solve this issue? I have tried updating the NIC driver to the latest version, but it made no difference.
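A rough way to read the repeated hang reports: igb prints the time_stamp of the buffer at next_to_clean together with the current jiffies, so their difference is how long that buffer has been waiting for the hardware to complete it. The minimal Python sketch below (kept compatible with the Python 2.6 that CentOS 6 ships) extracts those two fields from the kernel log. It assumes each report prints time_stamp before jiffies, as in the log above, and that the kernel runs at HZ=1000; the 2000-jiffy steps between reports logged 2 seconds apart are consistent with that. stall_jiffies is a hypothetical helper name.

```python
import re

# Assumption: the EL6 kernel runs with CONFIG_HZ=1000 (1 jiffy = 1 ms).
HZ = 1000

# Matches the "time_stamp <...>" and "jiffies <...>" fields of an igb hang report.
FIELD_RE = re.compile(r"\b(time_stamp|jiffies)\s+<([0-9a-fA-F]+)>")


def stall_jiffies(log_text):
    """For each Tx Unit Hang report, return jiffies - time_stamp.

    time_stamp: tick count recorded when the buffer at next_to_clean was queued.
    jiffies: tick count when the hang report was printed.
    """
    deltas = []
    stamp = None
    for line in log_text.splitlines():
        match = FIELD_RE.search(line)
        if match is None:
            continue  # TDH, TDT, next_to_watch, ... are ignored
        value = int(match.group(2), 16)
        if match.group(1) == "time_stamp":
            stamp = value
        elif stamp is not None:
            # A jiffies line closes the current report.
            deltas.append(value - stamp)
            stamp = None
    return deltas


if __name__ == "__main__":
    # The time_stamp/jiffies pairs from the three reports in the post.
    sample = "\n".join([
        "kernel: time_stamp <110f5efb1>",
        "kernel: jiffies <110f5f9b9>",
        "kernel: time_stamp <110f5efb1>",
        "kernel: jiffies <110f60189>",
        "kernel: time_stamp <110f5efb1>",
        "kernel: jiffies <110f60959>",
    ])
    for delta in stall_jiffies(sample):
        print("pending for %d jiffies (~%.3f s at HZ=%d)" % (delta, delta / float(HZ), HZ))
```

For the three reports in the post this yields 2568, 4568 and 6568 jiffies: time_stamp stays at <110f5efb1> while jiffies advances, so the same descriptor has been outstanding for roughly 2.6, 4.6 and 6.6 seconds (assuming HZ=1000). That is a reading of the log, not a diagnosis of the cause.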

Re: INTEL NIC ERROR

Posted: 2020/02/16 11:41:18
by TrevorH
You could try turning off offloading with e.g. ethtool -K ethx tso off gso off gro off
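The suggestion as a small Python sketch, in case the command has to be applied to several interfaces or servers. Only the ethtool invocation comes from the post; the helper names are hypothetical, and the default dry_run=True just returns the command without running it. It uses subprocess.check_call so it also works on CentOS 6's Python 2.6.

```python
import subprocess

# The three offload features named in the post. "ethx" in the post is a
# placeholder; the hang is reported on 0000:01:00.0, which the log calls eth0.
OFFLOADS = ("tso", "gso", "gro")


def offload_off_argv(iface):
    """Build argv for: ethtool -K <iface> tso off gso off gro off"""
    argv = ["ethtool", "-K", iface]
    for feature in OFFLOADS:
        argv.extend([feature, "off"])
    return argv


def disable_offloads(iface, dry_run=True):
    """Run the ethtool command, or with dry_run=True only return it."""
    argv = offload_off_argv(iface)
    if not dry_run:
        # Needs root; check_call raises CalledProcessError on a non-zero exit.
        subprocess.check_call(argv)
    return argv


if __name__ == "__main__":
    print(" ".join(disable_offloads("eth0")))
```

Settings made with ethtool -K are runtime-only and are lost on reboot or driver reload, so they would need to be reapplied at boot if they turn out to help.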

Re: INTEL NIC ERROR

Posted: 2020/02/16 14:27:52
by songfeiyu1
TrevorH wrote:
2020/02/16 11:41:18
You could try turning off offloading with e.g. ethtool -K ethx tso off gso off gro off
I turned off TSO, but I'm still facing the same issue. My other servers with the same hardware and system are working well.

Re: INTEL NIC ERROR

Posted: 2020/02/16 15:48:30
by TrevorH
my other servers with the same hardware and system are working well.
If everything else is the same then I suspect a hardware problem.

Re: INTEL NIC ERROR

Posted: 2020/02/18 13:21:36
by songfeiyu1
TrevorH wrote:
2020/02/16 15:48:30
my other servers with the same hardware and system are working well.
If everything else is the same then I suspect a hardware problem.
My other servers are also facing the same issue. I tried other operating systems, such as Ubuntu and Windows Server, and those work well, so maybe this is a bug in CentOS 6.10. I will try CentOS 7.

Re: INTEL NIC ERROR

Posted: 2020/02/18 14:23:28
by TrevorH
Please note that the report I took ethtool -K ethx tso off gso off gro off from explicitly says that turning off only TSO is not enough: all 3 must be off.
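Since turning off only TSO is reported as not enough, it helps to read the settings back after running the command. A minimal Python sketch that parses the output of ethtool -k ethX and lists which of the three features are not off. The feature labels, and the assumption that older ethtool builds print some of them with spaces instead of hyphens, are assumptions about ethtool's output format; the function names are hypothetical.

```python
import subprocess

# Labels as printed by `ethtool -k`. Assumption: older ethtool builds print
# some of them with spaces ("tcp segmentation offload"), so spaces are
# normalised to hyphens before comparing.
REQUIRED_OFF = (
    "tcp-segmentation-offload",
    "generic-segmentation-offload",
    "generic-receive-offload",
)


def parse_features(text):
    """Map each 'name: value' line of `ethtool -k` output to its value."""
    features = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            features[name.strip().replace(" ", "-")] = value.strip()
    return features


def offloads_not_off(text):
    """Return required features not reported as off ('off [fixed]' counts as off)."""
    features = parse_features(text)
    return [f for f in REQUIRED_OFF if not features.get(f, "").startswith("off")]


def check_iface(iface):
    """Run `ethtool -k <iface>` and check it (Popen, since Python 2.6 lacks check_output)."""
    out = subprocess.Popen(["ethtool", "-k", iface],
                           stdout=subprocess.PIPE).communicate()[0]
    return offloads_not_off(out.decode("ascii", "replace"))


if __name__ == "__main__":
    # Example output where only TSO was turned off, as in this thread.
    sample = "\n".join([
        "Features for eth0:",
        "tcp-segmentation-offload: off",
        "generic-segmentation-offload: on",
        "generic-receive-offload: on",
    ])
    remaining = offloads_not_off(sample)
    if remaining:
        print("still on: %s" % ", ".join(remaining))
    else:
        print("tso/gso/gro all off")
```

On the affected server, check_iface("eth0") runs ethtool -k itself and returns the same kind of list; an empty list means all three offloads are off.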