Cluster Issue

Jath
Posts: 5
Joined: 2011/05/13 02:37:08

Cluster Issue

Post by Jath » 2011/05/25 03:20:52

Hello,

We are currently getting a few errors from the cluster service on the CentOS cluster for my group's Senior Project.

One of the problems is that we sometimes get an error when trying to stop the cman service with 'service cman stop': it just says "Device in use" or something along those lines. We can't actually stop the service without pulling the plug. Unfortunately, I don't have pictures of that, as I do for the following issue.

The next issue occurs while shutting down. It doesn't happen on all machines, nor does it happen all the time, but we sometimes see this:

[img]http://img204.imageshack.us/img204/1758/img20110524214155.jpg[/img]
(This picture is very hard to read. Sorry.)

This hangs at "Shutting down Cluster Manager" and doesn't move further for quite a bit. Once it finally does get past that, it reaches this error message:

[img]http://img546.imageshack.us/img546/5489/img20110524214351.jpg[/img]

At this second message, it doesn't go any further. We then have to shut down any computer showing this message manually, either by pulling the plug or by holding the power button in.

Any help is appreciated, and thank you.

teleport
Posts: 114
Joined: 2005/09/09 05:30:18

Cluster Issue

Post by teleport » 2011/05/25 07:29:21

Do you have fence devices configured?

http://sources.redhat.com/cluster/wiki/FAQ/Fencing

Jath
Posts: 5
Joined: 2011/05/13 02:37:08

Re: Cluster Issue

Post by Jath » 2011/05/25 12:16:10

Thank you for your quick reply, teleport.

Yes, I do have fencing configured on this cluster. What confuses me is that this never happened before, especially since fencing is just set to manual fencing (no parameters set) and all we are doing is shutting the node down. Shutdown used to go through the regular "Sending kill TERM to:" for all services, and through "Stopping Fencing..... Done" along with all the other cluster services in cman. We never had this issue.

I would suppose this is normal, then?

Thank you again for your help.

teleport
Posts: 114
Joined: 2005/09/09 05:30:18

Re: Cluster Issue

Post by teleport » 2011/05/25 17:46:07

I asked whether you have fence [b]devices[/b] configured.

This is from the link I posted:

[quote][b]Can't I just use my own watchdog or manual fencing?[/b]

No. Fencing is absolutely required in all production environments. That's right. We do not support people using only watchdog timers anymore.
Manual fencing is absolutely not supported in any production environment, ever, under any circumstances.

[b]Is manual fencing supported?[/b]

No. No. A thousand times no. Oh sure. You can use it. But don't complain when a node needs to be fenced and the cluster locks up, and services don't fail over. [/quote]

Exactly the same thing happens regularly on all the test clusters I set up, whether on real servers or virtual machines.

On my production clusters I use the IPMI integrated on the motherboards. This works fine and is the cheapest solution, as IPMI is either integrated or available as a cheap add-on module for almost all server motherboards.
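
For illustration, here is a minimal sketch of what an IPMI fence device might look like in /etc/cluster/cluster.conf, assuming the fence_ipmilan agent. The node name, device name, IP address, login and password are all placeholders I made up for the example, not values from any real cluster, so adapt them and check the result against the documentation for your release.

```xml
<!-- Hypothetical fragment: every name, address and credential below is a placeholder -->
<clusternodes>
  <clusternode name="node1.example.com" nodeid="1" votes="1">
    <fence>
      <method name="1">
        <!-- points at the fencedevice entry with the same name -->
        <device name="ipmi-node1"/>
      </method>
    </fence>
  </clusternode>
  <!-- repeat for each node, each with its own fence device -->
</clusternodes>
<fencedevices>
  <!-- ipaddr is the address of the node's IPMI/BMC interface, not of the node itself -->
  <fencedevice agent="fence_ipmilan" name="ipmi-node1" ipaddr="192.0.2.11" login="admin" passwd="changeme"/>
</fencedevices>
```

Before relying on it, it is worth checking from another node that the IPMI interface answers at all, for example with something like 'fence_ipmilan -a 192.0.2.11 -l admin -p changeme -o status' (same placeholder values as above; see the fence_ipmilan man page for the options your version supports).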

PS: Manual or no fencing works only on two-node clusters with a quorum partition. As far as I can see, you have at least a 7-node cluster, so a hardware fencing device is required. If IPMI is not possible, look for a network-connected power switch/distributor.
