I built a custom storage system, basically a NAS.
I put a SATA-3 controller with a Marvell 88SE9230 chip on my E450 board and attached 4x 4TB Seagate drives.
Setup with mdadm went fine: I created a software RAID-5 over the four disks, with no LVM and only one big ext4 partition.
At first everything was fine, but then some strange error messages appeared in /var/log/messages. I checked a few forums, and a few people pointed to possible firmware/driver problems, as I was (still) running CentOS 5.6 (with the 2.6 kernel).
So I upgraded to CentOS 7.0.
Again everything ran fine: for three weeks there were no errors at all. Then this appeared:
Oct 5 03:29:03 data-server kernel: ata7.00: exception Emask 0x0 SAct 0xc000000 SErr 0x0 action 0x6
Oct 5 03:29:03 data-server kernel: ata7.00: irq_stat 0x40000008
Oct 5 03:29:03 data-server kernel: ata7.00: failed command: READ FPDMA QUEUED
Oct 5 03:29:03 data-server kernel: ata7.00: cmd 60/00:d8:90:f4:2e/04:00:4d:00:00/40 tag 27 ncq 524288 in
res 41/84:00:10:8e:21/00:04:4d:00:00/00 Emask 0x410 (ATA bus error) <F>
Oct 5 03:29:03 data-server kernel: ata7.00: status: { DRDY ERR }
Oct 5 03:29:03 data-server kernel: ata7.00: error: { ICRC ABRT }
Oct 5 03:29:03 data-server kernel: ata7: hard resetting link
Oct 5 03:29:03 data-server kernel: ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Oct 5 03:29:03 data-server kernel: ata7.00: configured for UDMA/133
Oct 5 03:29:03 data-server kernel: ata7: EH complete
Oct 5 08:43:03 data-server kernel: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Oct 5 08:43:03 data-server kernel: sd 6:0:0:0: [sdf] CDB:
Oct 5 08:43:03 data-server kernel: Read(16): 88 00 00 00 00 00 f3 0b a2 00 00 00 00 08 00 00
Oct 5 08:43:03 data-server kernel: md/raid:md1: Too many read errors, failing device sdf1.
Oct 5 08:43:03 data-server kernel: md/raid:md1: Disk failure on sdf1, disabling device.
md/raid:md1: Operation continuing on 3 devices.
Oct 5 08:43:03 data-server kernel: md/raid:md1: read error not correctable (sector 4077622784 on sdf1).
Oct 5 08:43:03 data-server kernel: md/raid:md1: read error not correctable (sector 4077622792 on sdf1).
Oct 5 08:43:03 data-server kernel: md/raid:md1: read error not correctable (sector 4077622800 on sdf1).
Oct 5 08:43:03 data-server kernel: md/raid:md1: read error not correctable (sector 4077622808 on sdf1).
Oct 5 08:43:03 data-server kernel: md/raid:md1: read error not correctable (sector 4077622816 on sdf1).
Oct 5 08:43:03 data-server kernel: md/raid:md1: read error not correctable (sector 4077622824 on sdf1).
Oct 5 08:43:03 data-server kernel: md/raid:md1: read error not correctable (sector 4077622832 on sdf1).
Oct 5 08:43:03 data-server kernel: md/raid:md1: read error not correctable (sector 4077622840 on sdf1).
Oct 5 08:43:03 data-server kernel: md/raid:md1: read error not correctable (sector 4077622848 on sdf1).
Oct 5 08:43:03 data-server kernel: md/raid:md1: read error not correctable (sector 4077622856 on sdf1).
Oct 5 08:43:03 data-server kernel: sd 6:0:0:0: [sdf] Unhandled error code
Oct 5 08:43:03 data-server kernel: sd 6:0:0:0: [sdf]
Oct 5 08:43:03 data-server kernel: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Oct 5 08:43:03 data-server kernel: sd 6:0:0:0: [sdf] CDB:
Oct 5 08:43:03 data-server kernel: Write(16): 8a 00 00 00 00 00 f3 0b a1 f8 00 00 04 00 00 00
Oct 5 08:43:34 data-server kernel: ata10.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Oct 5 08:43:34 data-server kernel: ata10.00: failed command: FLUSH CACHE EXT
Oct 5 08:43:34 data-server kernel: ata10.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 26
res 40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
Oct 5 08:43:34 data-server kernel: ata10.00: status: { DRDY }
Oct 5 08:43:34 data-server kernel: ata10: hard resetting link
Oct 5 08:43:34 data-server kernel: ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Oct 5 08:43:34 data-server kernel: ata9.00: failed command: FLUSH CACHE EXT
Oct 5 08:43:34 data-server kernel: ata9.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 16
res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
Oct 5 08:43:34 data-server kernel: ata9.00: status: { DRDY }
Oct 5 08:43:34 data-server kernel: ata9: hard resetting link
Oct 5 08:43:34 data-server kernel: ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Oct 5 08:43:34 data-server kernel: ata8.00: failed command: FLUSH CACHE EXT
Oct 5 08:43:34 data-server kernel: ata8.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 28
res 40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
Oct 5 08:43:34 data-server kernel: ata8.00: status: { DRDY }
Oct 5 08:43:34 data-server kernel: ata8: hard resetting link
Oct 5 08:43:35 data-server kernel: ata10: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Oct 5 08:43:35 data-server kernel: ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Oct 5 08:43:35 data-server kernel: ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Oct 5 08:43:40 data-server kernel: ata10.00: qc timeout (cmd 0xec)
Oct 5 08:43:40 data-server kernel: ata9.00: qc timeout (cmd 0xec)
Oct 5 08:43:40 data-server kernel: ata8.00: qc timeout (cmd 0xec)
Oct 5 08:43:40 data-server kernel: ata10.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Oct 5 08:43:40 data-server kernel: ata10.00: revalidation failed (errno=-5)
Oct 5 08:43:40 data-server kernel: ata10: hard resetting link
Oct 5 08:43:40 data-server kernel: ata9.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Oct 5 08:43:40 data-server kernel: ata9.00: revalidation failed (errno=-5)
Oct 5 08:43:40 data-server kernel: ata9: hard resetting link
Oct 5 08:43:40 data-server kernel: ata8.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Oct 5 08:43:40 data-server kernel: ata8.00: revalidation failed (errno=-5)
Oct 5 08:43:40 data-server kernel: ata8: hard resetting link
Oct 5 08:43:41 data-server kernel: ata10: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Oct 5 08:43:41 data-server kernel: ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Oct 5 08:43:41 data-server kernel: ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Oct 5 08:43:51 data-server kernel: ata10.00: qc timeout (cmd 0xec)
Oct 5 08:43:51 data-server kernel: ata9.00: qc timeout (cmd 0xec)
Oct 5 08:43:51 data-server kernel: ata8.00: qc timeout (cmd 0xec)
Oct 5 08:43:51 data-server kernel: ata10.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Oct 5 08:43:51 data-server kernel: ata10.00: revalidation failed (errno=-5)
Oct 5 08:43:51 data-server kernel: ata10: limiting SATA link speed to 3.0 Gbps
Oct 5 08:43:51 data-server kernel: ata10: hard resetting link
Oct 5 08:43:51 data-server kernel: ata9.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Oct 5 08:43:51 data-server kernel: ata9.00: revalidation failed (errno=-5)
Oct 5 08:43:51 data-server kernel: ata9: limiting SATA link speed to 3.0 Gbps
Oct 5 08:43:51 data-server kernel: ata9: hard resetting link
Oct 5 08:43:51 data-server kernel: ata8.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Oct 5 08:43:51 data-server kernel: ata8.00: revalidation failed (errno=-5)
Oct 5 08:43:51 data-server kernel: ata8: limiting SATA link speed to 3.0 Gbps
Oct 5 08:43:51 data-server kernel: ata8: hard resetting link
Oct 5 08:43:52 data-server kernel: ata10: SATA link up 6.0 Gbps (SStatus 133 SControl 320)
Oct 5 08:43:52 data-server kernel: ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 320)
Oct 5 08:43:52 data-server kernel: ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 320)
Oct 5 08:44:22 data-server kernel: ata10.00: qc timeout (cmd 0xec)
Oct 5 08:44:22 data-server kernel: ata9.00: qc timeout (cmd 0xec)
Oct 5 08:44:22 data-server kernel: ata8.00: qc timeout (cmd 0xec)
Oct 5 08:44:23 data-server kernel: ata10.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Oct 5 08:44:23 data-server kernel: ata10.00: revalidation failed (errno=-5)
Oct 5 08:44:23 data-server kernel: ata10.00: disabled
Oct 5 08:44:23 data-server kernel: ata10.00: device reported invalid CHS sector 0
Oct 5 08:44:23 data-server kernel: ata9.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Oct 5 08:44:23 data-server kernel: ata9.00: revalidation failed (errno=-5)
Oct 5 08:44:23 data-server kernel: ata9.00: disabled
Oct 5 08:44:23 data-server kernel: ata9.00: device reported invalid CHS sector 0
Oct 5 08:44:23 data-server kernel: ata8.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Oct 5 08:44:23 data-server kernel: ata8.00: revalidation failed (errno=-5)
Oct 5 08:44:23 data-server kernel: ata8.00: disabled
Oct 5 08:44:23 data-server kernel: ata8.00: device reported invalid CHS sector 0
Oct 5 08:44:23 data-server kernel: ata10: hard resetting link
Oct 5 08:44:23 data-server kernel: ata9: hard resetting link
Oct 5 08:44:23 data-server kernel: ata8: hard resetting link
Oct 5 08:44:24 data-server kernel: ata10: SATA link up 6.0 Gbps (SStatus 133 SControl 320)
Oct 5 08:44:24 data-server kernel: ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 320)
Oct 5 08:44:24 data-server kernel: ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 320)
Oct 5 08:44:25 data-server kernel: ata10: EH complete
Oct 5 08:44:25 data-server kernel: sd 9:0:0:0: [sdi] Unhandled error code
Oct 5 08:44:25 data-server kernel: sd 9:0:0:0: [sdi]
Oct 5 08:44:25 data-server kernel: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Oct 5 08:44:25 data-server kernel: sd 9:0:0:0: [sdi] CDB:
Oct 5 08:44:25 data-server kernel: Write(16): 8a 00 00 00 00 00 00 00 08 08 00 00 00 01 00 00
Oct 5 08:44:25 data-server kernel: blk_update_request: 249 callbacks suppressed
Oct 5 08:44:25 data-server kernel: end_request: I/O error, dev sdi, sector 2056
Oct 5 08:44:25 data-server kernel: end_request: I/O error, dev sdi, sector 2056
Oct 5 08:44:25 data-server kernel: md: super_written gets error=-5, uptodate=0
Oct 5 08:44:25 data-server kernel: md/raid:md1: Disk failure on sdi1, disabling device.
md/raid:md1: Operation continuing on 2 devices.
Oct 5 08:44:25 data-server kernel: ata9: EH complete
Oct 5 08:44:25 data-server kernel: sd 8:0:0:0: [sdh] Unhandled error code
Oct 5 08:44:25 data-server kernel: sd 8:0:0:0: [sdh]
Oct 5 08:44:25 data-server kernel: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Oct 5 08:44:25 data-server kernel: sd 8:0:0:0: [sdh] CDB:
Oct 5 08:44:25 data-server kernel: Write(16): 8a 00 00 00 00 00 00 00 08 08 00 00 00 01 00 00
Oct 5 08:44:25 data-server kernel: end_request: I/O error, dev sdh, sector 2056
Oct 5 08:44:25 data-server kernel: end_request: I/O error, dev sdh, sector 2056
Oct 5 08:44:25 data-server kernel: md: super_written gets error=-5, uptodate=0
Oct 5 08:44:25 data-server kernel: md/raid:md1: Disk failure on sdh1, disabling device.
md/raid:md1: Operation continuing on 1 devices.
Oct 5 08:44:25 data-server kernel: ata8: EH complete
Oct 5 08:44:25 data-server kernel: sd 7:0:0:0: [sdg] Unhandled error code
Oct 5 08:44:25 data-server kernel: sd 7:0:0:0: [sdg]
Oct 5 08:44:25 data-server kernel: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Oct 5 08:44:25 data-server kernel: sd 7:0:0:0: [sdg] CDB:
Oct 5 08:44:25 data-server kernel: Write(16): 8a 00 00 00 00 00 00 00 08 08 00 00 00 01 00 00
Oct 5 08:44:25 data-server kernel: end_request: I/O error, dev sdg, sector 2056
Oct 5 08:44:25 data-server kernel: end_request: I/O error, dev sdg, sector 2056
Oct 5 08:44:25 data-server kernel: md: super_written gets error=-5, uptodate=0
Oct 5 08:44:25 data-server kernel: md/raid:md1: Disk failure on sdg1, disabling device.
md/raid:md1: Operation continuing on 0 devices.
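
To boil a log like the one above down to the essentials, here is a rough Python sketch that tallies failed ATA commands per port and lists the members md dropped from the array. It assumes the message formats exactly as they appear above; the function name `summarize` and both regular expressions are illustrative and not part of any existing tool.

```python
import re
from collections import Counter, defaultdict

# Assumed format, taken from the log above, e.g.
#   "... kernel: ata7.00: failed command: READ FPDMA QUEUED"
FAILED_CMD = re.compile(r"kernel: (ata\d+)\.\d+: failed command: (.+)$")
# Assumed format, taken from the log above, e.g.
#   "... kernel: md/raid:md1: Disk failure on sdf1, disabling device."
MD_FAILURE = re.compile(r"md/raid:(\w+): Disk failure on (\w+), disabling device")


def summarize(lines):
    """Return (failed commands per ATA port, list of (array, member) drops)."""
    failed = defaultdict(Counter)
    dropped = []
    for line in lines:
        line = line.rstrip()
        match = FAILED_CMD.search(line)
        if match:
            port, command = match.groups()
            failed[port][command] += 1
            continue
        match = MD_FAILURE.search(line)
        if match:
            dropped.append(match.groups())
    return failed, dropped


# A few lines copied from the log above, used as sample input.
sample = [
    "Oct 5 03:29:03 data-server kernel: ata7.00: failed command: READ FPDMA QUEUED",
    "Oct 5 08:43:03 data-server kernel: md/raid:md1: Disk failure on sdf1, disabling device.",
    "Oct 5 08:43:34 data-server kernel: ata10.00: failed command: FLUSH CACHE EXT",
    "Oct 5 08:43:34 data-server kernel: ata9.00: failed command: FLUSH CACHE EXT",
]

failed, dropped = summarize(sample)
for port, commands in sorted(failed.items()):
    print(port, dict(commands))
for array, member in dropped:
    print(array, "dropped", member)
```

Feeding it the full log instead of `sample` shows at a glance whether the errors stay on one port or hit several ports at the same moment, as they do at 08:43:34.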

I checked the vendor's website, but there are no Linux drivers or firmware updates.
Has anyone else run into this issue? If so, how did you solve it?
I'm thinking of replacing the controller, but that does not seem to be such an easy task.
Which hardware SATA-3 controller (4 internal ports!) with good Linux support would you suggest?
As this is a home NAS, I don't need (and can't afford) a high-end RAID controller.
Any help would be highly appreciated.
