
Corrupted /boot, trying to copy with xfs_metadump but no progress

Posted: 2020/12/28 00:32:06
by chrisB

I have been having issues with my CentOS 7 system for a (long) while that I could not diagnose, since kdump was not producing crash dumps. Last night I finally found out that the crash dump directory I had defined in the kdump configuration did not exist. Pointing it back to /var/crash produced my long-awaited dump, and I thought the problem would soon be sorted out.

Today I got a crash, and on multiple attempts to boot I would just reach the grub menu, choose a kernel, and then the screen would go blank forever. It responds to SysRq commands, so I also tried ssh'ing to it from another computer, but that didn't work.

I went to the grub console and typed

Code: Select all

ls (hd0,3)
which printed the list of directories but also the message

Code: Select all

error: not a correct XFS node
after the directories /boot, /run and /srv/.

I rebooted from a USB stick with CentOS 7 and went to the troubleshooting section. I checked the crash dump and saw many instances of a block of output of the following form (each X stands for a hex digit)

Code: Select all

XFS (sda3): Metadata I/O error in xfs_trans_read_buf_map at daddr 0xXXXXXXXX len 232 error 117
XFS (sda3): xfs_imap_to_bp: xfs_trans_read_buf() returned error -117.
XFS (sda3): Metadata CRC error detected at xfs_arg_read_verify+0xXX/0xXX [xfs], xfs_age block 0xXXXXXXXX
XFS (sda3): Unmount and run xfs_repair
XFS (sda3): First 128 bytes of corrupted metadata buffer:
000000X0 < hex values>
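
When the same error block repeats many times, it can help to see how many distinct disk addresses are involved. The following is only a sketch: `summarize_xfs_errors` is a hypothetical helper name, and it assumes the kernel messages have been saved to a plain-text file in the format shown above.

```shell
# Hypothetical helper: list each distinct daddr that appears in
# "Metadata I/O error" lines, with the number of times it occurs.
# $1 is a text file containing the kernel messages.
summarize_xfs_errors() {
    grep 'Metadata I/O error' "$1" \
        | sed -n 's/.* daddr \(0x[0-9a-fA-F]*\) .*/\1/p' \
        | sort | uniq -c | sort -rn \
        | awk '{ print $2, $1 }'
}
```

Calling it as `summarize_xfs_errors <file>` prints one line per address, most frequent first. A handful of addresses repeated many times suggests a few damaged metadata blocks being re-read, rather than damage spread across the whole partition.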
Alright, so with the partition unmounted I tried

Code: Select all

xfs_repair -n /dev/sda3
and got hundreds of lines printed too quickly to read the content or take a photo. Sounded pretty bad.
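
Output that scrolls past too quickly can be saved to a file and read afterwards. A minimal sketch, assuming a writable location exists in the rescue environment; `capture_output` is a hypothetical helper name and the log path in the comment is only an example.

```shell
# Hypothetical helper: run a command, show its output on screen, and
# keep a copy of both stdout and stderr in a log file.
# $1 is the log file; the remaining arguments are the command to run.
capture_output() {
    log_file=$1
    shift
    "$@" 2>&1 | tee "$log_file"
}

# Example call (the log path is an assumption):
#   capture_output /tmp/xfs_repair_dryrun.log xfs_repair -n /dev/sda3
```

The saved file can then be paged through with `less`. With `-n`, xfs_repair only reports what it finds and does not modify the filesystem.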

I then googled and came across this question ... ilesystems, where the accepted answer suggests making an image and trying to repair that image instead of the real filesystem, to see whether the repair breaks anything. So I went on with

Code: Select all

xfs_metadump /dev/sda3 /path/to/ext/drive 
and got

Code: Select all

Metadata CRC error detected at xfs_agi block 0x19000001/0x200
xfs_metadump: cannot init perag data (-74). Continuing anyway.
bad magic number
Metadata CRC error detected at xfs_agi block 0x19000002/0x200
Metadata CRC error detected at xfs_agi block 0x19000003/0x200
and it has been like that for some 5 hours already. I just killed the task and shut down the computer.
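
For reference, the image-based approach from that answer can be sketched roughly as below. This is an outline under assumptions, not a tested recipe for this machine: `trial_repair` is a hypothetical helper name, the file names `fs.metadump` and `fs.img` are made up, and the work directory is assumed to be a mounted directory on the external drive. Note that xfs_metadump takes a target file rather than a directory, and that the dump holds metadata only, not file contents.

```shell
# Hypothetical helper: dump the metadata of an unmounted XFS partition,
# restore it into an image file, and run the repair on that image only.
# $1 = source device (must be unmounted), $2 = work directory.
trial_repair() {
    device=$1
    workdir=$2

    dump="$workdir/fs.metadump"   # assumed file name
    image="$workdir/fs.img"       # assumed file name

    # -g shows progress, so a stalled dump can be told from a slow one
    xfs_metadump -g "$device" "$dump" || return 1

    # Rebuild a filesystem image (metadata only) from the dump
    xfs_mdrestore "$dump" "$image" || return 1

    # -f: the target is a regular file, not a block device.
    # This modifies only the image, never the original partition.
    xfs_repair -f "$image"
}
```

A call would look like `trial_repair /dev/sda3 <work directory>`. By default xfs_metadump obfuscates most file names in the dump; its `-o` option keeps them readable, which helps when inspecting the repaired image.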

Do you think there is anything I could do to bring it back to life, or should I just reinstall?

Thanks! Sorry about the long post.

Re: Corrupted /boot, trying to copy with xfs_metadump but no progress

Posted: 2021/01/01 06:38:37
by chrisB
Just to update the thread in case somebody happens to open it: I ended up running xfs_repair directly on my corrupt partition, and now no problems are reported. However, the computer still doesn't boot, and so far I'm clueless as to why. I just see a black screen even when I add

Code: Select all

rd.debug
to the grub line.