Centos 7.6 RAID5 physical system will not boot after single drive failure

Clovis_Sangrail
Posts: 25
Joined: 2016/04/19 23:19:36

Centos 7.6 RAID5 physical system will not boot after single drive failure

Post by Clovis_Sangrail » 2019/01/03 19:11:36

Hello,
I recently built a Centos 7.6 server with 4 identical 500GB drives, using the Anaconda installer from a local Centos 7.6 DVD. The BIOS settings give me UEFI rather than legacy BIOS boot.

Using "manual partitioning" in the disk sub-menu, I generated a small four-way RAID1 mirror holding the "/boot" and "/boot/efi" mount points. As best as I can tell this mirror uses software RAID via 'md', and does not use 'lvm'. I created "/", "/var", "/home", and "swap" as LVM mount points within a maximally-sized RAID5 array. On completing the install I see the following drive layout:

Code: Select all

$ lsblk
NAME                       MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda                          8:0    0 465.8G  0 disk  
├─sda1                       8:1    0     1G  0 part  
│ └─md127                    9:127  0  1023M  0 raid1 /boot
├─sda2                       8:2    0     1G  0 part  
│ └─md125                    9:125  0     1G  0 raid1 /boot/efi
└─sda3                       8:3    0 463.8G  0 part  
  └─md126                    9:126  0   1.4T  0 raid5 
    ├─centos_reports2-root 253:0    0   100G  0 lvm   /
    ├─centos_reports2-swap 253:1    0  19.9G  0 lvm   [SWAP]
    ├─centos_reports2-var  253:2    0 299.9G  0 lvm   /var
    └─centos_reports2-home 253:3    0  49.9G  0 lvm   /home
sdb                          8:16   0 465.8G  0 disk  
├─sdb1                       8:17   0     1G  0 part  
│ └─md127                    9:127  0  1023M  0 raid1 /boot
├─sdb2                       8:18   0     1G  0 part  
│ └─md125                    9:125  0     1G  0 raid1 /boot/efi
└─sdb3                       8:19   0 463.8G  0 part  
  └─md126                    9:126  0   1.4T  0 raid5 
    ├─centos_reports2-root 253:0    0   100G  0 lvm   /
    ├─centos_reports2-swap 253:1    0  19.9G  0 lvm   [SWAP]
    ├─centos_reports2-var  253:2    0 299.9G  0 lvm   /var
    └─centos_reports2-home 253:3    0  49.9G  0 lvm   /home
sdc                          8:32   0 465.8G  0 disk  
├─sdc1                       8:33   0     1G  0 part  
│ └─md127                    9:127  0  1023M  0 raid1 /boot
├─sdc2                       8:34   0     1G  0 part  
│ └─md125                    9:125  0     1G  0 raid1 /boot/efi
└─sdc3                       8:35   0 463.8G  0 part  
  └─md126                    9:126  0   1.4T  0 raid5 
    ├─centos_reports2-root 253:0    0   100G  0 lvm   /
    ├─centos_reports2-swap 253:1    0  19.9G  0 lvm   [SWAP]
    ├─centos_reports2-var  253:2    0 299.9G  0 lvm   /var
    └─centos_reports2-home 253:3    0  49.9G  0 lvm   /home
sdd                          8:48   0 465.8G  0 disk  
├─sdd1                       8:49   0     1G  0 part  
│ └─md127                    9:127  0  1023M  0 raid1 /boot
├─sdd2                       8:50   0     1G  0 part  
│ └─md125                    9:125  0     1G  0 raid1 /boot/efi
└─sdd3                       8:51   0 463.8G  0 part  
  └─md126                    9:126  0   1.4T  0 raid5 
    ├─centos_reports2-root 253:0    0   100G  0 lvm   /
    ├─centos_reports2-swap 253:1    0  19.9G  0 lvm   [SWAP]
    ├─centos_reports2-var  253:2    0 299.9G  0 lvm   /var
    └─centos_reports2-home 253:3    0  49.9G  0 lvm   /home
$ cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4] 
md125 : active raid1 sdb2[0] sdd2[2] sdc2[1] sda2[3]
      1049536 blocks super 1.0 [4/4] [UUUU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md126 : active raid5 sdb3[1] sdc3[2] sdd3[4] sda3[0]
      1458462720 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
      bitmap: 2/4 pages [8KB], 65536KB chunk

md127 : active raid1 sdd1[2] sdb1[0] sdc1[1] sda1[3]
      1047552 blocks super 1.2 [4/4] [UUUU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

unused devices: <none>
$
All four drives are partitioned identically, as follows:

Code: Select all

# fdisk /dev/sdd

Command (m for help): p

Disk /dev/sdd: 500.1 GB, 500107862016 bytes, 976773168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk label type: dos
Disk identifier: 0x000f06c8

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1   *        2048     2101247     1049600   fd  Linux raid autodetect
/dev/sdd2         2101248     4200447     1049600   fd  Linux raid autodetect
/dev/sdd3         4200448   976773119   486286336   fd  Linux raid autodetect

Command (m for help): q
"
I want to be able to survive a single drive failure, and as best as I understand RAID1 and RAID5 I should be able to do so. Yet, when I shut down the server and simulate a drive failure by pulling a drive, the server will not come up.

I do see the grub menu, and I do see the attempt to boot into Centos 7. I watch the various different-colored lines crawl all the way across the bottom of the console. But then the process falls to the emergency boot prompt, with errors saying that /dev/centos_reports2-root and /dev/centos_reports2-swap do not exist, and a suggestion that I save "/run/initramfs/rdsosreport.txt" to a USB stick and submit a bug report. Exiting the shell results in an identical failure. If I halt the system, power down, re-insert the drive, and power back up, then it boots normally.

I hope that there is a way to get this to work as it ought to. Centos 7 should be able to construct the RAID5 array and run with it in a degraded state, and it would be on me to replace the failed drive before another one dies.

Now, perhaps if I had pulled the sda drive, I may not have even gotten this far. I have not manually written the boot loader to every drive; maybe if I pulled sda I wouldn't even get to the Centos boot attempt. It seems to me that the Anaconda installer should put the boot loader on every drive that is part of a metadevice containing "/" and/or "/boot/efi", but I do not know if it does so. (I'm pretty sure earlier versions did not.) In any case, that is a separate issue.

Should I be using RAID-somethingelse where I used RAID5? I've encountered several sources that recommend RAID10 instead of RAID5. Would my system boot if I had done so? Or is there some other thing I am doing wrong?

hunter86_bg
Posts: 2019
Joined: 2015/02/17 15:14:33
Location: Bulgaria

Re: Centos 7.6 RAID5 physical system will not boot after single drive failure

Post by hunter86_bg » 2019/01/04 11:52:12

Try to boot again, but edit your grub entry and remove the 'rhgb quiet' stuff from the menu, then boot from it (Ctrl+x or something like that).
Once it breaks, scroll back a little via Shift+PgUp and check for errors.
Also, once you get prompted for the sosreport, you can check the storage layout via 'lvs; vgs; pvs; lsblk'.
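Something like this (the kernel version and options below are just an example - your entry will differ):

Code: Select all

At the GRUB menu highlight the CentOS entry and press 'e'.
Find the line starting with 'linuxefi' (or 'linux16' on a legacy BIOS boot), e.g.:

   linuxefi /vmlinuz-3.10.0-957.el7.x86_64 root=/dev/mapper/centos_reports2-root ro crashkernel=auto rhgb quiet

Delete 'rhgb quiet' from the end of that line, then press Ctrl+x to boot the
edited entry once (the change is not saved permanently).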

Clovis_Sangrail
Posts: 25
Joined: 2016/04/19 23:19:36

Re: Centos 7.6 RAID5 physical system will not boot after single drive failure

Post by Clovis_Sangrail » 2019/01/08 17:42:39

When I remove "rhgb quiet" from the grub entry, I see a lot of extra verbiage but end up in the same place. I cannot scroll back (I'm on an old KVM switch), but it seems like everything on the screen is a subset of what I can see by running "journalctl".

The line that fails is very near the end. As best as I can tell everything was normal up to:

. . . . . . . .
sd 1:0:0:0: [sda] Attached SCSI disk
random: fast init done
mgag200 0000:04:03.0: fb0: mgadrmfb frame buffer device
[drm] Initialized mgag200 1.0.0 20110418 for 0000:04:03.0 on minor 0
Job dev-mapper-centos_myhost\x2droot.device/start timed out
[TIME] timed out waiting for device dev-mapper-centos_myhost\x2droot.device
Dependency failed ...
Dependency failed ...
. . . . . . . .

This happens even after I made some changes and tried re-installing. I changed my BIOS firmware boot options from "UEFI and Legacy" to "UEFI Only", and reinstalled Centos with just two drives configured in a mirror. The /boot and /boot/efi partitions are the same as before (except that they are on a two-way rather than a four-way mirror), and the other partitions again are LVM on a maximally-sized RAID array, this time RAID1 across the two drives rather than RAID5 across four.

There were noticeable effects of these changes. Anaconda wrote GPT rather than DOS labels on the two drives, and the install included the "efibootmgr" package by default. "efibootmgr -v" now works, instead of failing with a message about EFI variables not being supported.

But the system still cannot boot when I take away one of the drives, and again, the 'md' driver should be able to run the mirror in a degraded state. Are there perhaps configuration options I need to supply to the md software?
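The only md-related setting I can think to double-check is whether the initramfs was built with current array information. A rough sketch of what I mean (I have not verified that this changes the boot behavior):

Code: Select all

# Record the currently assembled arrays in mdadm.conf (back up any existing
# file first), then rebuild the initramfs for the running kernel:
mdadm --detail --scan                    # prints one ARRAY line per array
mdadm --detail --scan > /etc/mdadm.conf
dracut -f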

The lvs, pvs, lsblk, and vgs utilities are not available in the emergency shell, or at any rate I don't see them in /bin or /sbin. I tried copying the report over to /boot, though I suspect it'll be mounted over when I reboot with both drives (assuming that works). I will try to submit this as a bug report, though again any advice is welcome.

Clovis_Sangrail
Posts: 25
Joined: 2016/04/19 23:19:36

Re: Centos 7.6 RAID5 physical system will not boot after single drive failure

Post by Clovis_Sangrail » 2019/01/09 18:33:15

This is less an actual solution to my problem, and more a realization that my expectations of "md+lvm" software RAID were perhaps unrealistic. I was probably asking too much to expect "md+lvm" to boot with a hard disk suddenly and mysteriously vanished. When a drive (or a partition of a drive) in a RAID configuration actually becomes faulty during use, it will generate various errors in log files over time, and the 'md' RAID software will experience failures when trying to use that drive and/or partition.
Eventually the 'md' software will 'fail' that component, or mark it as faulty. You can see this via "cat /proc/mdstat", in this case showing that the /dev/sdb1 component of the md125 RAID1 mirror is faulty:

Code: Select all

[root@host ~]# cat /proc/mdstat
Personalities : [raid1] 
md125 : active raid1 sda1[0] sdb1[1](F)
      1049536 blocks super 1.0 [2/1] [U_]
      bitmap: 0/1 pages [0KB], 65536KB chunk
.  .  .  .  .
[root@host ~]#

Using the "mdadm" administrative utility, you can simulate such a failure via:

Code: Select all

mdadm --manage /dev/md125 --fail /dev/sdb1
(That's how I produced the above output.) The command:

Code: Select all

mdadm --manage /dev/md125 --remove /dev/sdb1
then removes the failed component from the metadevice configuration, and the metadevice runs on the remaining components. When a partition or drive really does fail, then before you can pull the drive it is necessary to 'fail' and 'remove' every partition of that drive from the metadevices of which it is a component (the full sequence for this layout is spelled out after the output below). After simulating a drive failure and the response to it by doing all this, I was able to shut down, pull the drive, and reboot, and the unit came back up into Centos successfully. All the metadevices (RAID1 mirrors) ran on single sub-mirrors:

Code: Select all

[root@reports2 ~]# cat /proc/mdstat
Personalities : [raid1] 
md125 : active raid1 sda1[0]
      1049536 blocks super 1.0 [2/1] [U_]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md126 : active raid1 sda2[0]
      1047552 blocks super 1.2 [2/1] [U_]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md127 : active raid1 sda3[0]
      974529536 blocks super 1.2 [2/1] [U_]
      bitmap: 3/8 pages [12KB], 65536KB chunk

unused devices: <none>
[root@reports2 ~]#
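In full, "doing all this" for a dying /dev/sdb on this layout amounts to something like the following (device names reconstructed from the output above; --fail and --remove can be combined in one invocation):

Code: Select all

mdadm --manage /dev/md125 --fail /dev/sdb1 --remove /dev/sdb1
mdadm --manage /dev/md126 --fail /dev/sdb2 --remove /dev/sdb2
mdadm --manage /dev/md127 --fail /dev/sdb3 --remove /dev/sdb3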
I did not really have a bad drive, so I just shut down, put the drive back in, rebooted, and added the /dev/sdb components back into their respective metadevices via commands like:

Code: Select all

mdadm --manage /dev/md125 --add /dev/sdb1
After doing this with all the mirrors (the full sequence is shown after the output below), they will re-sync. The larger the mirror, the longer it takes:

Code: Select all

[root@reports2 ~]# !cat
cat /proc/mdstat
Personalities : [raid1] 
md125 : active raid1 sdb1[1] sda1[0]
      1049536 blocks super 1.0 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md126 : active raid1 sdb2[2] sda2[0]
      1047552 blocks super 1.2 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md127 : active raid1 sdb3[2] sda3[0]
      974529536 blocks super 1.2 [2/1] [U_]
      [>....................]  recovery =  0.0% (883968/974529536) finish=91.7min speed=176793K/sec
      bitmap: 3/8 pages [12KB], 65536KB chunk

unused devices: <none>
[root@reports2 ~]#
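For completeness, "doing this with all the mirrors" was just the same --add repeated for each partition (again using the device names from the output above):

Code: Select all

mdadm --manage /dev/md125 --add /dev/sdb1
mdadm --manage /dev/md126 --add /dev/sdb2
mdadm --manage /dev/md127 --add /dev/sdb3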
With a real drive failure, it is of course necessary to first partition the replacement drive identically to the remaining one. Though I did not really need to, I followed the instructions at:

https://www.howtoforge.com/tutorial/lin ... -harddisk/

And used the "sgdisk" utility to conveniently duplicate the /dev/sda GPT partition table onto /dev/sdb via:

Code: Select all

sgdisk -R /dev/sdb /dev/sda
sgdisk -G /dev/sdb
The above link recommends "gdisk" as reliable for GPT disklabels/partition-tables. The first command does the actual copy (from "/dev/sda" to "/dev/sdb", with the argument order perhaps a little counter-intuitive), and the second generates unique UUIDs for /dev/sdb and its partitions.
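As a quick sanity check after the copy, printing both partition tables and comparing them is enough ("sgdisk -p" just prints a disk's GPT table):

Code: Select all

sgdisk -p /dev/sda
sgdisk -p /dev/sdb
lsblk /dev/sda /dev/sdb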

lightman47
Posts: 1522
Joined: 2014/05/21 20:16:00
Location: Central New York, USA

Re: Centos 7.6 RAID5 physical system will not boot after single drive failure

Post by lightman47 » 2019/01/09 19:08:04

OK - I've twice suffered a bad drive, but I run RAID 10 (1 + 0). I've also since learned mine's likely fake-raid.

Each time, I powered down, removed the bad drive, plugged in the new drive, and the RAID drivers did the rest on the subsequent boot-up. It took a while, depending on drive size, while the array got rebuilt (patience paid off).

- probably no help, but my 2 (USD) cents.
