clvmd hangs on startup of a CentOS 6 cluster with iSCSI
Posted: 2020/11/20 17:41:42
Dear community,
I know that this version of CentOS is at its end of life, but I'd like to get this system working again. I hope you can point me in the right direction; I'm happy to provide whatever information is needed.
I've been digging into the following problem for several days now, but have not found the root cause.
There was a short power surge on the following older redundant production system:
Hardware configuration
Cluster of 5 servers (1 Dell PowerEdge M520, 4 Dell PowerEdge M620).
Attached SAN (iSCSI storage): Dell PowerVault MD32xxi
RAID-6
Operating system
Code:
# cat /etc/redhat-release
CentOS release 6.10 (Final)
# uname -a
Linux srv01.local 2.6.32-754.29.2.el6.x86_64 #1 SMP Tue May 12 17:39:04 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
- One of the four switches failed, but the rest are still working. I reconfigured the servers so that they no longer use the failed route over em1 and use em2 only.
- I started the servers in runlevel 1 and managed to bring up their networking and then the cluster.
Code:
# service cman start
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... [ OK ]
Waiting for quorum... [ OK ]
Starting fenced... [ OK ]
Starting dlm_controld... [ OK ]
Tuning DLM kernel config... [ OK ]
Starting gfs_controld... [ OK ]
Unfencing self... [ OK ]
Joining fence domain... [ OK ]
#
Code:
# clustat
Cluster Status for My-Cluster @ Tue Nov 17 14:37:23 2020
Member Status: Quorate
Member Name ID Status
------ ---- ---- ------
srv01.local 1 Online, Local
srv02.local 2 Offline
srv03.local 3 Online
srv04.local 4 Online
srv05.local 5 Online
#
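One thing that stands out above is that srv02 is Offline. On CentOS 6, clvmd sits on top of DLM, and DLM can block until every failed member has been successfully fenced, so a pending or failed fence of srv02 could produce exactly this hang. As a small runnable sketch (the sample data is the clustat output above; on a live node you would pipe `clustat` itself into the filter), this pulls out the offline members:

```shell
#!/bin/sh
# Sample data: the clustat member table shown above. On a live node,
# replace the variable with the real command output: clustat | awk ...
clustat_output='srv01.local 1 Online, Local
srv02.local 2 Offline
srv03.local 3 Online
srv04.local 4 Online
srv05.local 5 Online'

# Print every member whose status column reads "Offline".
printf '%s\n' "$clustat_output" | awk '$3 == "Offline" { print $1 }'
# prints: srv02.local
```

If srv02 really is down, it may be worth checking `fence_tool ls` and /var/log/messages on the surviving nodes for a stuck or failed fence operation against it; clvmd will not make progress while fencing is pending.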
- But clvmd is hanging indefinitely on startup.
- iSCSI is working correctly.
- Multipath also seems to be correct.
Code:
# multipath -ll
projekte (3690b11c00053b79f0000036a524b1980) dm-4 DELL,MD32xxi
size=1.8T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw
`-+- policy='round-robin 0' prio=9 status=active
`- 15:0:0:1 sdc 8:32 active ready running
grundlage (3690b11c00053b858000001f43a105417) dm-3 DELL,MD32xxi
size=350G features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw
`-+- policy='round-robin 0' prio=14 status=active
`- 15:0:0:0 sdb 8:16 active ready running
#
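A detail worth noting in the `multipath -ll` output above: each LUN currently has exactly one active path, and the `queue_if_no_path` feature means I/O queues forever (rather than failing) if that last path drops, which could itself make LVM scans and clvmd appear to hang. A small runnable sketch that counts usable paths per LUN (the sample data is the output shown above; on a live node you would pipe `multipath -ll` in instead):

```shell
#!/bin/sh
# Sample data: the multipath -ll output shown above.
mp_output=$(cat <<'EOF'
projekte (3690b11c00053b79f0000036a524b1980) dm-4 DELL,MD32xxi
size=1.8T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw
`-+- policy='round-robin 0' prio=9 status=active
  `- 15:0:0:1 sdc 8:32 active ready running
grundlage (3690b11c00053b858000001f43a105417) dm-3 DELL,MD32xxi
size=350G features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw
`-+- policy='round-robin 0' prio=14 status=active
  `- 15:0:0:0 sdb 8:16 active ready running
EOF
)

# Header lines carry the dm-N name; each "active ready running" line is one path.
printf '%s\n' "$mp_output" | awk '
  / dm-[0-9]/            { lun = $1 }    # remember current LUN
  /active ready running/ { n[lun]++ }    # count its usable paths
  END { for (l in n) print l, n[l] }'
```

Both LUNs report a single path here, consistent with the failed switch; a transient drop of that one path would stall all I/O to the SAN until it returns.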
Code:
# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 278,9G 0 disk
+-sda1 8:1 0 500M 0 part /boot
+-sda2 8:2 0 278,4G 0 part
+-vg_srv01-lv_root (dm-0) 253:0 0 50G 0 lvm /
+-vg_srv01-lv_swap (dm-1) 253:1 0 15,7G 0 lvm [SWAP]
+-vg_srv01-lv_home (dm-2) 253:2 0 212,7G 0 lvm /old_home
sdb 8:16 0 350G 0 disk
+-grundlage (dm-3) 253:3 0 350G 0 mpath
+-grundlagep1 (dm-5) 253:5 0 350G 0 part
sdc 8:32 0 1,9T 0 disk
+-projekte (dm-4) 253:4 0 1,9T 0 mpath
+-projektep1 (dm-6) 253:6 0 1,3T 0 part
#
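Since the volumes on grundlage/projekte sit behind both clvmd locking and multipath, two settings in /etc/lvm/lvm.conf may be worth double-checking. The excerpt below is a hypothetical sketch of the usual CentOS 6 cluster values, not a confirmed fix for this hang: clvmd needs cluster-wide locking (locking_type = 3), and a filter should keep LVM from scanning the raw iSCSI paths (sdb, sdc) underneath the dm-3/dm-4 multipath maps.

```
# /etc/lvm/lvm.conf (excerpt; hypothetical values to compare against)
locking_type = 3                 # built-in clustered locking via DLM, required by clvmd
fallback_to_local_locking = 0    # fail loudly instead of silently going local
# Accept multipath maps and the local sda, reject everything else
# (sdb/sdc are the single-path members of the maps above):
filter = [ "a|^/dev/mapper/|", "a|^/dev/sda|", "r|.*|" ]
```

Independent of the config, running clvmd once in the foreground with debugging enabled (`clvmd -d 1`) usually shows exactly which cluster or DLM step it is blocked on.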