We have an HP DL380 Gen10 server running CentOS 7.6, and after about 30 minutes of uptime we lost our logical drive (with all that this implies).
Searching for known bugs, we found a similar problem reported for CentOS 7.4 (https://bugs.centos.org/view.php?id=15801), which was solved by updating the kernel. There is also a similar report for RHEL 7.6 (https://bugzilla.redhat.com/show_bug.cgi?id=1666912), as well as bugs related to the smartpqi driver (https://support.hpe.com/hpesc/public/do ... 71158en_us). Lastly, it could be a physical problem with the disks. (FYI, we have several other servers with the same setup that are working well.)
Running dmesg, we see:
[ 19.593914] warning: `BackgrProcPool' uses 32-bit capabilities (legacy support in use)
[ 363.182434] smartpqi 0000:5c:00.0: resetting scsi 1:1:0:1
[ 363.187484] smartpqi 0000:5c:00.0: reset of scsi 1:1:0:1: SUCCESS
[ 409.445866] smartpqi 0000:5c:00.0: resetting scsi 1:1:0:1
[ 409.449850] smartpqi 0000:5c:00.0: reset of scsi 1:1:0:1: SUCCESS
[ 569.358969] usb 1-1: USB disconnect, device number 2
[ 780.619211] smartpqi 0000:5c:00.0: resetting scsi 1:1:0:1
[ 780.623117] smartpqi 0000:5c:00.0: reset of scsi 1:1:0:1: SUCCESS
[ 1618.332963] smartpqi 0000:5c:00.0: resetting scsi 1:1:0:1
[ 1651.417533] smartpqi 0000:5c:00.0: reset of scsi 1:1:0:1: SUCCESS
[ 1651.417633] sd 1:1:0:1: [sdc] Medium access timeout failure. Offlining disk!
[ 1651.417700] sd 1:1:0:1: Device offlined - not ready after error recovery
[ 1651.417715] sd 1:1:0:1: [sdc] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
[ 1651.417722] sd 1:1:0:1: [sdc] CDB: Read(16) 88 00 00 00 00 06 47 6e 0e 00 00 00 01 d8 00 00
[ 1651.417727] blk_update_request: I/O error, dev sdc, sector 26968198656
[ 1651.417792] sd 1:1:0:1: rejecting I/O to offline device
[ 1651.417808] sd 1:1:0:1: [sdc] killing request
[ 1651.417833] sd 1:1:0:1: rejecting I/O to offline device
[ 1651.417859] sd 1:1:0:1: rejecting I/O to offline device
[ 1651.417879] sd 1:1:0:1: rejecting I/O to offline device
[ 1651.417900] sd 1:1:0:1: rejecting I/O to offline device
[ 1651.417916] sd 1:1:0:1: [sdc] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 1651.417923] sd 1:1:0:1: [sdc] CDB: Read(16) 88 00 00 00 00 04 a3 a7 7b f0 00 00 00 08 00 00
[ 1651.417930] sd 1:1:0:1: rejecting I/O to offline device
[ 1651.417935] sd 1:1:0:1: rejecting I/O to offline device
[ 1651.417949] sd 1:1:0:1: [sdc] FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[ 1651.417950] sd 1:1:0:1: rejecting I/O to offline device
[ 1651.417951] sd 1:1:0:1: [sdc] CDB: Read(16) 88 00 00 00 00 03 8d 22 6e 68 00 00 01 50 00 00
[ 1651.417952] blk_update_request: I/O error, dev sdc, sector 15252745832
[ 1651.417959] sd 1:1:0:1: rejecting I/O to offline device
[ 1651.417984] sd 1:1:0:1: rejecting I/O to offline device
[ 1651.418002] sd 1:1:0:1: [sdc] FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[ 1651.418003] sd 1:1:0:1: [sdc] CDB: Read(16) 88 00 00 00 00 07 1b ae 6e 00 00 00 00 38 00 00
[ 1651.418004] blk_update_request: I/O error, dev sdc, sector 30529187328
[ 1651.418016] sd 1:1:0:1: rejecting I/O to offline device
[ 1651.418034] sd 1:1:0:1: [sdc] FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[ 1651.418035] sd 1:1:0:1: [sdc] CDB: Read(16) 88 00 00 00 00 04 ea 12 af c8 00 00 00 08 00 00
[ 1651.418036] blk_update_request: I/O error, dev sdc, sector 21106962376
[ 1651.418053] sd 1:1:0:1: rejecting I/O to offline device
[ 1651.418069] sd 1:1:0:1: [sdc] FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[ 1651.418072] sd 1:1:0:1: [sdc] CDB: Read(16) 88 00 00 00 00 05 75 c6 2e 00 00 00 02 00 00 00
[ 1651.418077] blk_update_request: I/O error, dev sdc, sector 23450758656
[ 1651.418096] sd 1:1:0:1: [sdc] FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[ 1651.418097] sd 1:1:0:1: [sdc] CDB: Read(16) 88 00 00 00 00 00 45 ef 6f f8 00 00 00 08 00 00
[ 1651.418098] blk_update_request: I/O error, dev sdc, sector 1173319672
[ 1651.418104] sd 1:1:0:1: rejecting I/O to offline device
[ 1651.418123] sd 1:1:0:1: [sdc] FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[ 1651.418127] sd 1:1:0:1: [sdc] CDB: Read(16) 88 00 00 00 00 03 46 77 0f a0 00 00 00 08 00 00
[ 1651.418128] blk_update_request: I/O error, dev sdc, sector 14067109792
[ 1651.418133] sd 1:1:0:1: rejecting I/O to offline device
[ 1651.418153] sd 1:1:0:1: [sdc] FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[ 1651.418155] sd 1:1:0:1: [sdc] CDB: Read(16) 88 00 00 00 00 05 2f 8f 0d d0 00 00 00 68 00 00
[ 1651.418156] sd 1:1:0:1: rejecting I/O to offline device
[ 1651.418156] blk_update_request: I/O error, dev sdc, sector 22272740816
[ 1651.418169] sd 1:1:0:1: [sdc] FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[ 1651.418170] sd 1:1:0:1: [sdc] CDB: Read(16) 88 00 00 00 00 04 e9 b8 cf 68 00 00 00 08 00 00
[ 1651.418170] blk_update_request: I/O error, dev sdc, sector 21101072232
[ 1651.418176] blk_update_request: I/O error, dev sdc, sector 21121355264
[ 1651.418190] sd 1:1:0:1: rejecting I/O to offline device
[ 1651.418204] sd 1:1:0:1: rejecting I/O to offline device
[ 1651.418243] sd 1:1:0:1: rejecting I/O to offline device
[ 1651.418266] XFS (dm-2): metadata I/O error: block 0x45ddb4400 ("xlog_iodone") error 5 numblks 512
[ 1651.418268] sd 1:1:0:1: rejecting I/O to offline device
[ 1651.418271] XFS (dm-2): xfs_do_force_shutdown(0x2) called from line 1221 of file fs/xfs/xfs_log.c. Return address = 0xffffffffc0962c30
[ 1651.418286] sd 1:1:0:1: rejecting I/O to offline device
[ 1651.418585] XFS (dm-2): Log I/O Error Detected. Shutting down filesystem
[ 1651.418586] XFS (dm-2): Please umount the filesystem and rectify the problem(s)
[ 1651.418588] XFS (dm-2): metadata I/O error: block 0x45ddb4600 ("xlog_iodone") error 5 numblks 512
[ 1651.418589] XFS (dm-2): xfs_do_force_shutdown(0x2) called from line 1221 of file fs/xfs/xfs_log.c. Return address = 0xffffffffc0962c30
[ 1651.418619] XFS (dm-2): metadata I/O error: block 0x6d288c160 ("xfs_trans_read_buf_map") error 5 numblks 8
[ 1651.418645] XFS (dm-2): xfs_do_force_shutdown(0x1) called from line 236 of file fs/xfs/libxfs/xfs_defer.c. Return address = 0xffffffffc092182b
[ 1651.418664] XFS (dm-2): metadata I/O error: block 0x41c20 ("xfs_trans_read_buf_map") error 5 numblks 32
[ 1651.418687] XFS (dm-2): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
[ 1651.418689] XFS (dm-2): xfs_do_force_shutdown(0x8) called from line 3416 of file fs/xfs/xfs_inode.c. Return address = 0xffffffffc0956ea6
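One detail in the output above is easy to miss: the first three resets of scsi 1:1:0:1 complete within a few milliseconds, while the last one takes about 33 seconds and ends with the disk being offlined. Below is a minimal Python sketch for pulling those durations out of a saved dmesg dump. The function name `reset_durations` and the regular expressions are our own and assume the smartpqi message format exactly as it appears in this log; this is not part of any HPE or CentOS tooling.

```python
import re

# Patterns assume the message layout seen in this particular log.
RESET_START = re.compile(r"\[\s*(\d+\.\d+)\]\s+smartpqi \S+ resetting scsi (\S+)")
RESET_DONE = re.compile(r"\[\s*(\d+\.\d+)\]\s+smartpqi \S+ reset of scsi (\S+): (\w+)")


def reset_durations(log_text):
    """Return (device, start_time, duration_seconds, status) for each reset pair."""
    pending = {}   # device -> timestamp of the most recent "resetting" line
    results = []
    for line in log_text.splitlines():
        started = RESET_START.search(line)
        if started:
            pending[started.group(2)] = float(started.group(1))
            continue
        done = RESET_DONE.search(line)
        if done and done.group(2) in pending:
            start = pending.pop(done.group(2))
            results.append(
                (done.group(2), start, float(done.group(1)) - start, done.group(3))
            )
    return results


# The eight reset lines copied from the dmesg output above.
SAMPLE = """\
[ 363.182434] smartpqi 0000:5c:00.0: resetting scsi 1:1:0:1
[ 363.187484] smartpqi 0000:5c:00.0: reset of scsi 1:1:0:1: SUCCESS
[ 409.445866] smartpqi 0000:5c:00.0: resetting scsi 1:1:0:1
[ 409.449850] smartpqi 0000:5c:00.0: reset of scsi 1:1:0:1: SUCCESS
[ 780.619211] smartpqi 0000:5c:00.0: resetting scsi 1:1:0:1
[ 780.623117] smartpqi 0000:5c:00.0: reset of scsi 1:1:0:1: SUCCESS
[ 1618.332963] smartpqi 0000:5c:00.0: resetting scsi 1:1:0:1
[ 1651.417533] smartpqi 0000:5c:00.0: reset of scsi 1:1:0:1: SUCCESS
"""

if __name__ == "__main__":
    for device, start, duration, status in reset_durations(SAMPLE):
        print(f"{device} reset at {start:.3f}s took {duration:.3f}s ({status})")
```

To run it against a full log instead of the sample, save the output with `dmesg > dmesg.txt` and pass the file contents to `reset_durations`.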
Kernel and smartpqi driver versions:
-bash-4.2$ hostnamectl
Operating System: CentOS Linux 7 (Core)
CPE OS Name: cpe:/o:centos:centos:7
Kernel: Linux 3.10.0-957.5.1.el7.x86_64
Architecture: x86-64
-bash-4.2$ modinfo smartpqi
filename: /lib/modules/3.10.0-957.5.1.el7.x86_64/kernel/drivers/scsi/smartpqi/smartpqi.ko.xz
license: GPL
version: 1.1.4-115
description: Driver for Microsemi Smart Family Controller version 1.1.4-115
author: Microsemi
retpoline: Y
rhelversion: 7.6
We need help isolating the cause: the kernel version, the smartpqi driver version, a hardware failure, or something else. Can anyone give us some advice?
Thanks in advance
Fabrizio