Corrupted /boot, trying to copy with xfs_metadump but no progress

Posted: 2020/12/28 00:32:06
by chrisB
Hi,

I have been having issues with my CentOS 7 machine for a (long) while, and I could not diagnose them because kdump was not producing crash dumps. Last night I finally found out that the crash dump directory I had defined in the kdump configuration did not exist. Pointing it back to /var/crash produced my long-awaited dump, and I thought the problem would soon be sorted out.
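
For anyone checking the same thing, here is a minimal sketch of how the configured dump directory could be verified. It assumes the directory is set with the `path` directive in /etc/kdump.conf and that /var/crash is used when the directive is absent; the function names `kdump_path` and `check_kdump_path` are made up for illustration.

```shell
# Print the dump directory set by the "path" directive in a kdump config
# file (default: /etc/kdump.conf). Commented-out lines are ignored.
# Falls back to /var/crash when no "path" directive is found (assumption).
kdump_path() {
    conf="${1:-/etc/kdump.conf}"
    p=$(awk '$1 == "path" { print $2 }' "$conf" 2>/dev/null | tail -n 1)
    echo "${p:-/var/crash}"
}

# Report whether the configured dump directory actually exists.
check_kdump_path() {
    dir=$(kdump_path "$1")
    if [ -d "$dir" ]; then
        echo "ok: $dir exists"
    else
        echo "missing: $dir"
        return 1
    fi
}
```

Running `check_kdump_path` with no argument checks the system configuration; a "missing" result would match the situation described above.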

Today I got a crash, and on multiple attempts to boot I would reach the grub menu and choose a kernel, and then the screen would go blank forever. The machine still responds to SysRq commands, so I also tried ssh'ing into it from another computer, but that didn't work.

I went to the grub console and typed

Code:

ls (hd0,3)
which printed the list of directories but also the message

Code:

error: not a correct XFS node
after the directories /boot, /run and /srv/.

I rebooted from a USB stick with CentOS 7 and went to the troubleshooting section. I checked the crash dump and saw many instances of a block of output of the following form (X stands for a hex digit)

Code:

XFS (sda3): Metadata I/O error in xfs_trans_read_buf_map at daddr 0xXXXXXXXX len 232 error 117
XFS (sda3): xfs_imap_to_bp: xfs_trans_read_buf() returned error -117.
XFS (sda3): Metadata CRC error detected at xfs_agi_read_verify+0xXX/0xXX [xfs], xfs_agi block 0xXXXXXXXX
XFS (sda3): Unmount and run xfs_repair
XFS (sda3): First 128 bytes of corrupted metadata buffer:
000000X0 < hex values>
Alright, so with the partition unmounted I tried

Code:

xfs_repair -n /dev/sda3
and got hundreds of lines printed too quickly to read or photograph. It looked pretty bad.
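
Output that scrolls past like this can be kept in a file for later reading. The following is a rough sketch, not something that was run in this thread: `capture` is a hypothetical helper name, and the log path in the usage comment is an assumption.

```shell
# Hypothetical helper: run a command, show its stdout and stderr on the
# terminal, and keep a copy of both in a log file.
capture() {
    log="$1"
    shift
    "$@" 2>&1 | tee "$log"
}

# Possible use on the rescue system (device name from the post above;
# the log location is an assumption):
#   capture /tmp/xfs_repair_dryrun.log xfs_repair -n /dev/sda3
#   less /tmp/xfs_repair_dryrun.log
```

The `-n` flag keeps xfs_repair in no-modify mode, so the log only records what would be changed.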

I then googled and came across this question https://serverfault.com/questions/77729 ... ilesystems where the accepted answer suggests making a metadata image and running the repair on that image instead of the real filesystem, to see whether the repair breaks anything. So I ran

Code:

xfs_metadump /dev/sda3 /path/to/ext/drive 
and got

Code:

Metadata CRC error detected at xfs_agi block 0x19000001/0x200
xfs_metadump: cannot init perag data (-74). Continuing anyway.
bad magic number
Metadata CRC error detected at xfs_agi block 0x19000002/0x200
Metadata CRC error detected at xfs_agi block 0x19000003/0x200
and it stayed like that, with no further output, for some 5 hours. I just killed the task and shut down the computer.

Do you think there is anything I could do to bring it back to life, or should I just reinstall?

Thanks! Sorry about the long post.

Re: Corrupted /boot, trying to copy with xfs_metadump but no progress

Posted: 2021/01/01 06:38:37
by chrisB
Just to update the thread in case somebody happens to open it: I ended up running xfs_repair directly on the corrupt partition, and it now reports no problems. However, the computer still doesn't boot, and so far I have no clue why. I just see a black screen, even when I add

Code:

rd.systemd.unit=multi-user.target rd.info rd.debug rd.shell
to the kernel line in grub.