I've been digging into the following problem for several days now, but have not found the root cause.
There was a short power surge on the following older redundant production system:
Hardware configuration
Cluster of 5 servers (1 Dell PowerEdge M520, 4 Dell PowerEdge M620).
Attached SAN (iSCSI storage): Dell PowerVault MD32xxi
RAID-6
Operating system
Code:
# cat /etc/redhat-release
CentOS release 6.10 (Final)
# uname -a
Linux srv01.local 2.6.32-754.29.2.el6.x86_64 #1 SMP Tue May 12 17:39:04 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
- One of the four switches failed, but the others are still working. I reconfigured the servers so that they no longer use the failed route over em1 and use only em2.
- I booted the servers into runlevel 1 and managed to bring up their network and the cluster as well.
Code:
# service cman start
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... [ OK ]
Waiting for quorum... [ OK ]
Starting fenced... [ OK ]
Starting dlm_controld... [ OK ]
Tuning DLM kernel config... [ OK ]
Starting gfs_controld... [ OK ]
Unfencing self... [ OK ]
Joining fence domain... [ OK ]
#
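For reference, the manual bring-up from runlevel 1 described above could be scripted roughly as follows. This is only a sketch: the helper name `start_stack`, the service list and its order are assumptions, not the exact steps that were run. By default it only prints the commands.

```shell
# Hypothetical helper: start the storage and cluster stack by hand, in order.
# The service list and the order are assumptions, not taken from the post.
# With DRY_RUN=1 (the default) the commands are printed instead of executed.
start_stack() {
    for svc in network iscsi multipathd cman clvmd; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "service $svc start"
        else
            # Stop at the first failure so later services do not start
            # on top of a broken layer.
            service "$svc" start || { echo "failed: $svc" >&2; return 1; }
        fi
    done
}
```

Calling `start_stack` prints the five `service ... start` lines; `DRY_RUN=0 start_stack` would actually run them. The last step is the one that blocks on this system.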
Code:
# clustat
Cluster Status for My-Cluster @ Tue Nov 17 14:37:23 2020
Member Status: Quorate
 Member Name        ID   Status
 ------ ----        ---- ------
 srv01.local           1 Online, Local
 srv02.local           2 Offline
 srv03.local           3 Online
 srv04.local           4 Online
 srv05.local           5 Online
#
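The clustat output above shows srv02.local as Offline. An offline member that has not been fenced successfully is one thing worth ruling out when DLM-based daemons block; that is an assumption here, not a diagnosis. A minimal sketch for pulling the offline members out of the clustat output (the function name `offline_members` is made up for illustration):

```shell
# Print the names of cluster members that clustat reports as Offline.
# Reads clustat output on stdin and assumes the member table layout
# shown above: name, numeric ID, status.
offline_members() {
    awk '$2 ~ /^[0-9]+$/ && $3 == "Offline" { print $1 }'
}
```

On a live node this would be used as `clustat | offline_members`.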
- However, clvmd hangs and waits indefinitely.
- iSCSI is working correctly.
- Multipath also seems to be correct.
Code:
# multipath -ll
projekte (3690b11c00053b79f0000036a524b1980) dm-4 DELL,MD32xxi
size=1.8T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw
`-+- policy='round-robin 0' prio=9 status=active
  `- 15:0:0:1 sdc 8:32 active ready running
grundlage (3690b11c00053b858000001f43a105417) dm-3 DELL,MD32xxi
size=350G features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw
`-+- policy='round-robin 0' prio=14 status=active
  `- 15:0:0:0 sdb 8:16 active ready running
#
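Each map above has a single active path, which fits em1 being taken out of use. To check this quickly on every node, a small sketch that counts active paths per multipath map from the `multipath -ll` output (the function name `paths_per_map` is hypothetical, and the parsing assumes the output layout shown above):

```shell
# Count active paths per multipath map.
# Reads "multipath -ll" output on stdin; prints "<map> <count>" sorted by map.
paths_per_map() {
    awk '
        # Map header line, e.g. "projekte (wwid) dm-4 DELL,MD32xxi"
        / dm-[0-9]+ / { map = $1 }
        # Path line, e.g. "15:0:0:1 sdc 8:32 active ready running"
        /[0-9]+:[0-9]+:[0-9]+:[0-9]+ sd[a-z]+ / && / active / { count[map]++ }
        END { for (m in count) print m, count[m] }
    ' | sort
}
```

Usage on a node: `multipath -ll | paths_per_map`.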
Code:
# lsblk
NAME                          MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda                             8:0    0 278,9G  0 disk
+-sda1                          8:1    0   500M  0 part  /boot
+-sda2                          8:2    0 278,4G  0 part
  +-vg_srv01-lv_root (dm-0)   253:0    0    50G  0 lvm   /
  +-vg_srv01-lv_swap (dm-1)   253:1    0  15,7G  0 lvm   [SWAP]
  +-vg_srv01-lv_home (dm-2)   253:2    0 212,7G  0 lvm   /old_home
sdb                             8:16   0   350G  0 disk
+-grundlage (dm-3)            253:3    0   350G  0 mpath
  +-grundlagep1 (dm-5)        253:5    0   350G  0 part
sdc                             8:32   0   1,9T  0 disk
+-projekte (dm-4)             253:4    0   1,9T  0 mpath
  +-projektep1 (dm-6)         253:6    0   1,3T  0 part
#
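Since clvmd is the part that hangs, it may help to capture the state of the layers underneath it in one go. The following is a sketch only: the helper names `run_if_present` and `collect_cluster_state` are made up, and the choice of commands is an assumption about what is useful here. The commands themselves (`cman_tool`, `fence_tool`, `dlm_tool`, `iscsiadm`) are the standard tools on this kind of setup.

```shell
# Run a command if it is installed, with a header line so the
# combined output stays readable; otherwise note that it is missing.
run_if_present() {
    echo "== $* =="
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1
    else
        echo "(not installed: $1)"
    fi
}

# Collect membership, fence domain, DLM lockspace and iSCSI session state.
collect_cluster_state() {
    run_if_present cman_tool status
    run_if_present cman_tool nodes
    run_if_present fence_tool ls
    run_if_present dlm_tool ls
    run_if_present iscsiadm -m session
}
```

Running `collect_cluster_state > /tmp/cluster-state.txt 2>&1` on each node gives output that can be compared between nodes.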