Degraded raid 1 in Centos7

Post by **TrevorH** » 2014/07/20 00:36:15

I'm not sure he is mistaken. The Hitachi docs clearly state that there are several models: some emulate 512-byte sectors and some have native 512-byte sectors. They call them 512e and 512n. The model numbers have an E in them for the emulated ones and an A for the non-emulated ones.
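
For anyone who wants to check what the kernel itself reports rather than rely on the model number, here is a minimal sketch. It reads the logical and physical block sizes from sysfs and maps them to the 512n/512e naming used above. The function name `sector_format` and the `SYSFS_BLOCK` override are my own inventions for illustration; only the two `queue/*_block_size` files are standard kernel interfaces.

```shell
#!/bin/sh
# Classify a disk as 512n, 512e or 4Kn from the sizes the kernel reports.
# SYSFS_BLOCK is a hypothetical override so the function can be tried
# against a fake directory tree; it defaults to the real /sys/block.
sector_format() {
    dev="$1"
    root="${SYSFS_BLOCK:-/sys/block}"
    logical=$(cat "$root/$dev/queue/logical_block_size") || return 1
    physical=$(cat "$root/$dev/queue/physical_block_size") || return 1
    if [ "$logical" = 512 ] && [ "$physical" = 512 ]; then
        echo "512n"    # native 512-byte sectors
    elif [ "$logical" = 512 ] && [ "$physical" = 4096 ]; then
        echo "512e"    # 4096-byte sectors, 512-byte emulation
    elif [ "$logical" = 4096 ] && [ "$physical" = 4096 ]; then
        echo "4Kn"     # native 4096-byte sectors
    else
        echo "unknown ($logical/$physical)"
    fi
}
```

Calling `sector_format sdc` on the affected machine should print one of the labels above for that disk.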

Post by **obla4ko** » 2014/07/20 03:56:43

I finally captured a log of the RAID array failing.

Log file: http://goo.gl/suxQeC (800 KB; sorry, but pastebin has a 500 KB limit)

gerald_clark wrote:You are mistaken.
Those are advanced format drives with 4096 byte sectors.
They only emulate 512 byte sectoring, and if blocks are misaligned they do it with a massive performance hit.

My 2 TB HDD has 512-byte sectors (HUS724020ALA640 / 0F14690).

I specifically chose a 2 TB disk that does not have 4K sectors.

Now I am changing the SATA ports on the motherboard and will try to re-add the disk to the RAID.

Post by **obla4ko** » 2014/07/21 06:37:11

After the weekend, the old RAID from CentOS 6.5, mounted under CentOS 7, was degraded again.
I think C7 doesn't work properly with my motherboard.
For now I have downgraded to C6.5 and will wait for the first point release of C7.

Log file:

Jul 21 10:31:26 localhost kernel: ata5: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1
Jul 21 10:31:26 localhost kernel: ata5: SWNCQ:qc_active 0x1 defer_bits 0x0 last_issue_tag 0x0
  dhfis 0x0 dmafis 0x0 sdbfis 0x0
Jul 21 10:31:26 localhost kernel: ata5: ATA_REG 0x50 ERR_REG 0x0
Jul 21 10:31:26 localhost kernel: ata5: tag : dhfis dmafis sdbfis sactive
Jul 21 10:31:26 localhost kernel: ata5: tag 0x0: 0 0 0 1  
Jul 21 10:31:26 localhost kernel: ata5.00: NCQ disabled due to excessive errors
Jul 21 10:31:26 localhost kernel: ata5.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
Jul 21 10:31:26 localhost kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Jul 21 10:31:26 localhost kernel: ata5.00: cmd 61/01:00:08:a8:0f/00:00:01:00:00/40 tag 0 ncq 512 out
         res 40/00:01:09:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 21 10:31:26 localhost kernel: ata5.00: status: { DRDY }
Jul 21 10:31:26 localhost kernel: ata5: hard resetting link
Jul 21 10:31:26 localhost kernel: ata5: nv: skipping hardreset on occupied port
Jul 21 10:31:27 localhost kernel: ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jul 21 10:31:27 localhost kernel: ata5.00: configured for UDMA/133
Jul 21 10:31:27 localhost kernel: ata5: EH complete
Jul 21 10:31:48 localhost dbus-daemon: dbus[977]: [system] Activating via systemd: service name='net.reactivated.Fprint' unit='fprintd.service'
Jul 21 10:31:48 localhost dbus[977]: [system] Activating via systemd: service name='net.reactivated.Fprint' unit='fprintd.service'
Jul 21 10:31:48 localhost systemd: Starting Fingerprint Authentication Daemon...
Jul 21 10:31:48 localhost dbus-daemon: dbus[977]: [system] Successfully activated service 'net.reactivated.Fprint'
Jul 21 10:31:48 localhost dbus[977]: [system] Successfully activated service 'net.reactivated.Fprint'
Jul 21 10:31:48 localhost systemd: Started Fingerprint Authentication Daemon.
Jul 21 10:31:48 localhost fprintd: Launching FprintObject
Jul 21 10:31:48 localhost fprintd: ** Message: D-Bus service launched with name: net.reactivated.Fprint
Jul 21 10:31:48 localhost fprintd: ** Message: entering main loop
Jul 21 10:31:53 localhost su: (to root) L0gRuS on pts/1
Jul 21 10:31:57 localhost kernel: ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jul 21 10:31:57 localhost kernel: ata5.00: failed command: WRITE DMA
Jul 21 10:31:57 localhost kernel: ata5.00: cmd ca/00:01:08:a8:0f/00:00:00:00:00/e1 tag 0 dma 512 out
         res 40/00:01:09:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 21 10:31:57 localhost kernel: ata5.00: status: { DRDY }
Jul 21 10:31:57 localhost kernel: ata5: hard resetting link
Jul 21 10:31:57 localhost kernel: ata5: nv: skipping hardreset on occupied port
Jul 21 10:31:58 localhost kernel: ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jul 21 10:31:58 localhost kernel: ata5.00: configured for UDMA/133
Jul 21 10:31:58 localhost kernel: ata5: EH complete
Jul 21 10:32:19 localhost fprintd: ** Message: No devices in use, exit
Jul 21 10:32:28 localhost kernel: ata5: limiting SATA link speed to 1.5 Gbps
Jul 21 10:32:28 localhost kernel: ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jul 21 10:32:28 localhost kernel: ata5.00: failed command: WRITE DMA
Jul 21 10:32:28 localhost kernel: ata5.00: cmd ca/00:01:08:a8:0f/00:00:00:00:00/e1 tag 0 dma 512 out
         res 40/00:01:09:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 21 10:32:28 localhost kernel: ata5.00: status: { DRDY }
Jul 21 10:32:28 localhost kernel: ata5: hard resetting link
Jul 21 10:32:28 localhost kernel: ata5: nv: skipping hardreset on occupied port
Jul 21 10:32:29 localhost kernel: ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jul 21 10:32:29 localhost kernel: ata5.00: configured for UDMA/133
Jul 21 10:32:29 localhost kernel: sd 4:0:0:0: [sdc]  
Jul 21 10:32:29 localhost kernel: Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jul 21 10:32:29 localhost kernel: sd 4:0:0:0: [sdc]  
Jul 21 10:32:29 localhost kernel: Sense Key : Aborted Command [current] [descriptor]
Jul 21 10:32:29 localhost kernel: Descriptor sense data with sense descriptors (in hex):
Jul 21 10:32:29 localhost kernel:        72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
Jul 21 10:32:29 localhost kernel:        00 00 00 08
Jul 21 10:32:29 localhost kernel: sd 4:0:0:0: [sdc]  
Jul 21 10:32:29 localhost kernel: Add. Sense: No additional sense information
Jul 21 10:32:29 localhost kernel: sd 4:0:0:0: [sdc] CDB:
Jul 21 10:32:29 localhost kernel: Write(10): 2a 00 01 0f a8 08 00 00 01 00
Jul 21 10:32:29 localhost kernel: end_request: I/O error, dev sdc, sector 17803272
Jul 21 10:32:29 localhost kernel: end_request: I/O error, dev sdc, sector 17803272
Jul 21 10:32:29 localhost kernel: md: super_written gets error=-5, uptodate=0
Jul 21 10:32:29 localhost kernel: md/raid1:md2: Disk failure on sdc3, disabling device.
md/raid1:md2: Operation continuing on 1 devices.
Jul 21 10:32:29 localhost kernel: ata5: EH complete
Jul 21 10:32:49 localhost kernel: ata5.00: limiting speed to UDMA/100:PIO4
Jul 21 10:32:49 localhost kernel: ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jul 21 10:32:49 localhost kernel: ata5.00: failed command: IDENTIFY DEVICE
Jul 21 10:32:49 localhost kernel: ata5.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in
         res 40/00:01:09:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 21 10:32:49 localhost kernel: ata5.00: status: { DRDY }
Jul 21 10:32:49 localhost kernel: ata5: hard resetting link
Jul 21 10:32:49 localhost kernel: ata5: nv: skipping hardreset on occupied port
Jul 21 10:32:50 localhost kernel: ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jul 21 10:32:50 localhost kernel: ata5.00: configured for UDMA/100
Jul 21 10:32:50 localhost kernel: ata5: EH complete

===================================update========================
I found a workaround that may solve my problem:
disable NCQ

echo 1 > /sys/block/sda/device/queue_depth

Now I just have to wait.

================update 26/07/2014================================
After a few days I have not had any degraded RAID.
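
A slightly more careful version of the workaround above, as a rough sketch: it prints the current queue depth before and after writing 1, so the change can be confirmed. The function name `disable_ncq` and the `SYSFS_BLOCK` override are made up for this example; the `device/queue_depth` file is the same one written by the `echo` command above. Note that the log shows the errors on sdc (ata5), so the setting probably has to be applied to every disk in the array, not only sda. A value written to sysfs does not survive a reboot; as far as I know the `libata.force=noncq` kernel parameter is the persistent alternative, but treat that as an assumption to check against the kernel documentation.

```shell
#!/bin/sh
# Show the current NCQ queue depth of a disk, then set it to 1, which
# effectively disables NCQ for that device. Needs root on a real system.
# SYSFS_BLOCK is a hypothetical override used only for dry runs.
disable_ncq() {
    dev="$1"
    file="${SYSFS_BLOCK:-/sys/block}/$dev/device/queue_depth"
    [ -w "$file" ] || { echo "cannot write $file" >&2; return 1; }
    echo "$dev: queue_depth was $(cat "$file")"
    echo 1 > "$file"
    echo "$dev: queue_depth now $(cat "$file")"
}
```

Example use as root: `for d in sda sdc; do disable_ncq "$d"; done`, with the device names replaced by the actual array members.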

Post by **geekbeast** » 2014/12/16 02:57:44

We are seeing this across two boxes. It is really annoying, because our RAID array is always rebuilding and frequently the entire filesystem goes read-only.

This affects drives that aren't even partitioned or set up for use, so it's happening below the filesystem layers.

We were getting FPDMA QUEUED errors in dmesg, but now we're mainly getting read errors and such.

We used to have this problem on CentOS 6, but when we upgraded to 6.5 it went away, even though we are thrashing it pretty hard. On our brand spanking new prod box, we go down every time we try to do even a moderate number of writes.

We're still troubleshooting and will report back if we can find a workaround.

Post by **TrevorH** » 2014/12/16 11:33:53

Without seeing the error messages (the whole error, not just part of it), it's impossible to guess at the cause, but FPDMA errors are almost always hardware related: the cable, the drive, or possibly even the PSU.
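
One way to gather the complete set of messages for a single port is sketched below. It filters log text for every line that mentions a given ATA port, including the per-device form such as `ata5.00:`. The function name `ata_port_lines` is hypothetical, and the pattern assumes the line layout shown in the log earlier in this thread. Wrapped continuation lines (the indented `res ...` lines) carry no port prefix and are not matched, so check the surrounding lines by hand.

```shell
#!/bin/sh
# Print every log line for one ATA port (e.g. ata5), including
# per-device lines such as "ata5.00:". Reads log text on stdin.
# The port name must be followed by ": " or ".NN: ", so "ata5"
# does not match lines for "ata50".
ata_port_lines() {
    port="$1"
    grep -E "(^|[[:space:]])${port}(\.[0-9]+)?: "
}
```

Typical use would be `ata_port_lines ata5 < /var/log/messages` or `dmesg | ata_port_lines ata5`, then posting the full output.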
