(re)discover SCSI device removed by kernel

Post by geoweaser » 2022/07/26 12:27:03

I have a PowerVault with a RAID 5 array attached to a PowerEdge server running CentOS 7.9.2009.

One of the disks in the array failed, and the kernel then apparently removed the array's device. Here are some relevant lines from /var/log/messages (each "..." is my insertion, replacing repeated lines or blocks of lines):
Jul 22 17:54:10 kssmn1 rc.local: UTC_Fri Jul 22 21:54:10 2022 kssmn1.ad.uky.edu/statmgr msg from unknown module (statmgr doesn't have a .desc file for this one) inst:66 mod:208 typ:3
Jul 22 17:54:17 kssmn1 kernel: megaraid_sas 0000:0d:0e.0: MR_DCMD_PD_LIST_QUERY not supported by firmware
Jul 22 17:54:17 kssmn1 kernel: ses 1:0:33:0: SCSI device is removed
Jul 22 17:54:40 kssmn1 rc.local: UTC_Fri Jul 22 21:54:40 2022 kssmn1.ad.uky.edu/statmgr msg from unknown module (statmgr doesn't have a .desc file for this one) inst:66 mod:208 typ:3
Jul 22 17:55:03 kssmn1 kernel: sd 1:2:0:0: [sdc] tag#2 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=26s
Jul 22 17:55:03 kssmn1 kernel: sd 1:2:0:0: [sdc] tag#2 CDB: Write(16) 8a 00 00 00 00 00 00 00 14 08 00 00 00 10 00 00
Jul 22 17:55:03 kssmn1 kernel: blk_update_request: I/O error, dev sdc, sector 5128
Jul 22 17:55:03 kssmn1 kernel: Buffer I/O error on dev sdc, logical block 641, lost async page write
Jul 22 17:55:03 kssmn1 kernel: Buffer I/O error on dev sdc, logical block 642, lost async page write

Jul 22 17:55:03 kssmn1 kernel: sd 1:2:0:0: [sdc] tag#10 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=26s
Jul 22 17:55:03 kssmn1 kernel: sd 1:2:0:0: [sdc] tag#10 CDB: Write(16) 8a 00 00 00 00 00 00 00 1c b0 00 00 00 08 00 00
Jul 22 17:55:03 kssmn1 kernel: blk_update_request: I/O error, dev sdc, sector 7344
Jul 22 17:55:03 kssmn1 kernel: sd 1:2:0:0: [sdc] tag#11 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=26s
Jul 22 17:55:03 kssmn1 kernel: sd 1:2:0:0: [sdc] tag#11 CDB: Write(16) 8a 00 00 00 00 00 00 00 1d 18 00 00 00 08 00 00
Jul 22 17:55:03 kssmn1 kernel: blk_update_request: I/O error, dev sdc, sector 7448
Jul 22 17:55:03 kssmn1 kernel: Aborting journal on device sdc-8.
Jul 22 17:55:03 kssmn1 kernel: JBD2: Error -5 detected when updating journal superblock for sdc-8.
Jul 22 17:55:03 kssmn1 kernel: megaraid_sas 0000:0d:0e.0: 45625 (711838981s/0x0004/CRIT) - Enclosure PD 21(e1/s255) communication lost
Jul 22 17:55:03 kssmn1 kernel: megaraid_sas 0000:0d:0e.0: MR_DCMD_PD_LIST_QUERY not supported by firmware
Jul 22 17:55:03 kssmn1 kernel: megaraid_sas 0000:0d:0e.0: scanning for scsi1...
Jul 22 17:55:03 kssmn1 kernel: megaraid_sas 0000:0d:0e.0: MR_DCMD_PD_LIST_QUERY not supported by firmware
Jul 22 17:55:03 kssmn1 kernel: megaraid_sas 0000:0d:0e.0: DCMD not supported by firmware - megasas_ld_list_query 4762
Jul 22 17:55:03 kssmn1 kernel: EXT4-fs error (device sdc): ext4_journal_check_start:56: Detected aborted journal
Jul 22 17:55:03 kssmn1 kernel: EXT4-fs (sdc): Remounting filesystem read-only
Jul 22 17:55:03 kssmn1 kernel: EXT4-fs (sdc): previous I/O error to superblock detected
Jul 22 17:55:03 kssmn1 kernel: sd 1:2:0:0: SCSI device is removed
Jul 22 17:55:04 kssmn1 kernel: EXT4-fs warning (device sdc): __ext4_read_dirblock:676: error reading directory block (ino 166068866, block 0)
Jul 22 17:55:04 kssmn1 kernel: EXT4-fs warning (device sdc): __ext4_read_dirblock:903: error reading directory block (ino 165939015, block 0)
Jul 22 17:55:04 kssmn1 kernel: megaraid_sas 0000:0d:0e.0: 45629 (711838985s/0x0001/CRIT) - VD 00/0 is now DEGRADED
Jul 22 17:55:04 kssmn1 kernel: megaraid_sas 0000:0d:0e.0: MR_DCMD_PD_LIST_QUERY not supported by firmware
Jul 22 17:55:04 kssmn1 kernel: megaraid_sas 0000:0d:0e.0: scanning for scsi1...
Jul 22 17:55:04 kssmn1 kernel: megaraid_sas 0000:0d:0e.0: MR_DCMD_PD_LIST_QUERY not supported by firmware
Jul 22 17:55:04 kssmn1 kernel: megaraid_sas 0000:0d:0e.0: DCMD not supported by firmware - megasas_ld_list_query 4762
Jul 22 17:55:04 kssmn1 kernel: megaraid_sas 0000:0d:0e.0: 45635 (711838985s/0x0001/FATAL) - VD 00/0 is now OFFLINE
Jul 22 17:55:04 kssmn1 kernel: megaraid_sas 0000:0d:0e.0: DCMD not supported by firmware - megasas_ld_list_query 4762
Jul 22 17:55:04 kssmn1 kernel: megaraid_sas 0000:0d:0e.0: MR_DCMD_PD_LIST_QUERY not supported by firmware
Jul 22 17:55:04 kssmn1 kernel: megaraid_sas 0000:0d:0e.0: MR_DCMD_PD_LIST_QUERY not supported by firmware
Jul 22 17:55:04 kssmn1 kernel: megaraid_sas 0000:0d:0e.0: MR_DCMD_PD_LIST_QUERY not supported by firmware

Jul 22 17:55:04 kssmn1 kernel: megaraid_sas 0000:0d:0e.0: MR_DCMD_PD_LIST_QUERY not supported by firmware
Jul 22 17:55:04 kssmn1 kernel: megaraid_sas 0000:0d:0e.0: MR_DCMD_PD_LIST_QUERY not supported by firmware
Jul 22 17:55:04 kssmn1 systemd: Stopped target Local File Systems.
Jul 22 17:55:04 kssmn1 systemd: Unmounting /array1...
Jul 22 17:55:04 kssmn1 umount: umount: /array1: target is busy.
Jul 22 17:55:04 kssmn1 umount: (In some cases useful info about processes that use
Jul 22 17:55:04 kssmn1 umount: the device is found by lsof(8) or fuser(1))
Jul 22 17:55:04 kssmn1 systemd: array1.mount mount process exited, code=exited status=32
Jul 22 17:55:04 kssmn1 systemd: Failed unmounting /array1.
Jul 22 17:55:04 kssmn1 systemd: Unit array1.mount is bound to inactive unit dev-sdc.device. Stopping, too.
Jul 22 17:55:04 kssmn1 systemd: Unmounting /array1...
Jul 22 17:55:04 kssmn1 umount: umount: /array1: target is busy.
Jul 22 17:55:04 kssmn1 umount: (In some cases useful info about processes that use
Jul 22 17:55:04 kssmn1 umount: the device is found by lsof(8) or fuser(1))
Jul 22 17:55:04 kssmn1 systemd: array1.mount mount process exited, code=exited status=32
Jul 22 17:55:04 kssmn1 systemd: Failed unmounting /array1.
Jul 22 17:55:04 kssmn1 systemd: Unit array1.mount is bound to inactive unit dev-sdc.device. Stopping, too.
Jul 22 17:55:04 kssmn1 systemd: Unmounting /array1...
Jul 22 17:55:04 kssmn1 umount: umount: /array1: target is busy.
Jul 22 17:55:04 kssmn1 umount: (In some cases useful info about processes that use
Jul 22 17:55:04 kssmn1 umount: the device is found by lsof(8) or fuser(1))
Jul 22 17:55:04 kssmn1 systemd: array1.mount mount process exited, code=exited status=32
Jul 22 17:55:04 kssmn1 kernel: megaraid_sas 0000:0d:0e.0: MR_DCMD_PD_LIST_QUERY not supported by firmware
Jul 22 17:55:04 kssmn1 kernel: megaraid_sas 0000:0d:0e.0: MR_DCMD_PD_LIST_QUERY not supported by firmware

Jul 22 18:12:02 kssmn1 kernel: EXT4-fs error: 14 callbacks suppressed
Jul 22 18:12:02 kssmn1 kernel: EXT4-fs error (device sdc): __ext4_get_inode_loc:4245: inode #165938991: block 1327497362: comm rsync: unable to read itable block
Jul 22 18:12:02 kssmn1 kernel: EXT4-fs error (device sdc): __ext4_get_inode_loc:4245: inode #165938991: block 1327497362: comm rsync: unable to read itable block
Jul 22 18:12:02 kssmn1 kernel: EXT4-fs error (device sdc): __ext4_get_inode_loc:4245: inode #165938991: block 1327497362: comm rsync: unable to read itable block
Jul 22 18:12:02 kssmn1 kernel: EXT4-fs error (device sdc): __ext4_get_inode_loc:4245: inode #165938991: block 1327497362: comm rsync: unable to read itable block
Jul 22 18:12:02 kssmn1 kernel: EXT4-fs error (device sdc): __ext4_get_inode_loc:4245: inode #165938991: block 1327497362: comm rsync: unable to read itable block
Jul 22 18:12:02 kssmn1 kernel: EXT4-fs warning: 852 callbacks suppressed
Jul 22 18:12:02 kssmn1 kernel: EXT4-fs warning (device sdc): __ext4_read_dirblock:676: error reading directory block (ino 165938991, block 0)
Jul 22 18:12:02 kssmn1 kernel: EXT4-fs warning (device sdc): __ext4_read_dirblock:676: error reading directory block (ino 165938991, block 0)
Jul 22 18:12:02 kssmn1 kernel: EXT4-fs warning (device sdc): __ext4_read_dirblock:676: error reading directory block (ino 165938991, block 0)
Jul 22 18:12:02 kssmn1 kernel: EXT4-fs warning (device sdc): __ext4_read_dirblock:676: error reading directory block (ino 165938991, block 0)
Jul 22 18:12:02 kssmn1 kernel: EXT4-fs warning (device sdc): __ext4_read_dirblock:676: error reading directory block (ino 165938991, block 0)
Jul 22 18:12:02 kssmn1 kernel: EXT4-fs warning (device sdc): __ext4_read_dirblock:676: error reading directory block (ino 165938991, block 0)
Jul 22 18:12:02 kssmn1 kernel: EXT4-fs warning (device sdc): __ext4_read_dirblock:676: error reading directory block (ino 165938991, block 0)
Jul 22 18:12:02 kssmn1 kernel: EXT4-fs warning (device sdc): __ext4_read_dirblock:676: error reading directory block (ino 165938991, block 0)
Jul 22 18:12:02 kssmn1 kernel: EXT4-fs warning (device sdc): __ext4_read_dirblock:676: error reading directory block (ino 165938991, block 0)
Jul 22 18:12:02 kssmn1 kernel: EXT4-fs warning (device sdc): __ext4_read_dirblock:676: error reading directory block (ino 165938991, block 0)
Jul 22 18:12:02 kssmn1 kernel: EXT4-fs error (device sdc): __ext4_get_inode_loc:4245: inode #165938991: block 1327497362: comm rsync: unable to read itable block
Jul 22 18:12:02 kssmn1 kernel: EXT4-fs error (device sdc): __ext4_get_inode_loc:4245: inode #165938991: block 1327497362: comm rsync: unable to read itable block
Jul 22 18:12:02 kssmn1 umount: umount: /array1: target is busy.
Jul 22 18:12:02 kssmn1 umount: (In some cases useful info about processes that use
Jul 22 18:12:02 kssmn1 umount: the device is found by lsof(8) or fuser(1))
Jul 22 18:12:02 kssmn1 kernel: EXT4-fs error (device sdc): __ext4_get_inode_loc:4245: inode #165938991: block 1327497362: comm sftp: unable to read itable block
Jul 22 18:12:02 kssmn1 kernel: EXT4-fs error (device sdc): __ext4_get_inode_loc:4245: inode #165938991: block 1327497362: comm sftp: unable to read itable block
Jul 22 18:12:02 kssmn1 kernel: EXT4-fs error (device sdc): __ext4_get_inode_loc:4245: inode #165938991: block 1327497362: comm put_nehrp_gifs.: unable to read itable block
Jul 22 18:12:04 kssmn1 journal: Suppressed 17031 messages from /
Jul 22 18:12:04 kssmn1 umount: the device is found by lsof(8) or fuser(1))
Jul 22 18:12:04 kssmn1 systemd: array1.mount mount process exited, code=exited status=32
Jul 22 18:12:04 kssmn1 systemd: Unit array1.mount is bound to inactive unit dev-sdc.device. Stopping, too.
Jul 22 18:12:04 kssmn1 systemd: Unmounting /array1...
Jul 22 18:12:04 kssmn1 systemd: array1.mount mount process exited, code=exited status=32
Jul 22 18:12:04 kssmn1 systemd: Unit array1.mount is bound to inactive unit dev-sdc.device. Stopping, too.
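The log above is dominated by repeated "not supported by firmware" lines, which makes the controller's actual state changes hard to spot. A minimal sketch of a filter that keeps only the megaraid_sas events tagged CRIT or FATAL follows. The function name `megaraid_events` is hypothetical, and the pattern is based only on the message format seen in this log, so it may miss events that other firmware versions format differently.

```shell
#!/bin/sh
# Print only megaraid_sas controller events tagged CRIT or FATAL,
# e.g. "... (711838985s/0x0001/FATAL) - VD 00/0 is now OFFLINE".
# The pattern is an assumption derived from the log lines in this post.
# Usage: megaraid_events [logfile]   (defaults to /var/log/messages)
megaraid_events() {
    grep -E 'megaraid_sas .*/(CRIT|FATAL)\) - ' "${1:-/var/log/messages}"
}
```

Run against the excerpt above, `megaraid_events /var/log/messages` would be expected to leave the "communication lost", "DEGRADED" and "OFFLINE" lines and drop the MR_DCMD_PD_LIST_QUERY noise.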
I replaced the failed disk, and I hope the array is rebuilding (or has already finished), but I can't verify this because the array no longer appears under /dev. Apparently there is a known issue with the megaraid_sas driver sending commands that the controller firmware does not support (the "MR_DCMD_PD_LIST_QUERY not supported by firmware" messages above): https://lkml.org/lkml/2016/8/18/553. The result is that the OS no longer recognizes my array, even though I have tried rescanning the SCSI host:

echo "- - -" > /sys/class/scsi_host/host1/scan
How can I get this array to show up in /dev again? I would prefer to do it without a reboot.
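For reference, the rescan above targets host1 only. The following is a minimal sketch of how one might first confirm which host number belongs to megaraid_sas and then rescan every host. The function names and the `SYSFS_ROOT` variable are hypothetical (the variable exists only so the functions can be tried against a scratch directory), while the `proc_name` and `scan` attributes are standard sysfs entries under /sys/class/scsi_host. This only asks the kernel to look again; it does not address whether the controller still reports VD 00/0 as OFFLINE.

```shell
#!/bin/sh
# SYSFS_ROOT is a hypothetical override for dry runs; leave it unset on a
# real system so the functions use /sys.
scsi_host_dir() {
    printf '%s\n' "${SYSFS_ROOT:-/sys}/class/scsi_host"
}

# Print each SCSI host together with the driver name from proc_name,
# e.g. "host1 megaraid_sas".
list_scsi_hosts() {
    for h in "$(scsi_host_dir)"/host*; do
        [ -d "$h" ] || continue
        printf '%s %s\n' "${h##*/}" "$(cat "$h/proc_name" 2>/dev/null)"
    done
}

# Ask every host to rescan all channels, targets and LUNs ("- - -").
# Requires root on a real system.
rescan_scsi_hosts() {
    for h in "$(scsi_host_dir)"/host*; do
        [ -w "$h/scan" ] || continue
        echo "- - -" > "$h/scan"
    done
}
```

Calling `list_scsi_hosts` and then `rescan_scsi_hosts` as root, followed by `lsblk`, would show whether a block device for the array came back.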
