Network fails = manually restart resource groups always?
- From: chrismichaelgardner@xxxxxxxxxxx
- Date: 16 Mar 2006 13:55:31 -0800
I've been pulling my hair out over this all day.
We have a 4-node cluster we're building for file sharing. Each node is
connected to four switches. Two of the switches are used for the private
heartbeat; the other two are used for serving files to clients.
We're testing a variety of hypothetical network failures: a bad network
cable on one of the nodes, individual nodes losing all network
connectivity, one of the public switches going down, etc. In all cases
the cluster recovers and is able to serve files -- except one scenario.
If both public switches go down simultaneously, the nodes try
desperately to shuffle the resource groups around. All the resource
groups then fail, and can only be brought up manually. As a real-world
example, a person on our network team needed to reboot both switches
last night. This morning, all of the resource groups were down (no one
could connect via Cluster Administrator, no file shares were up, etc.).
The only fix is to bring up the resource groups manually.
I've spent the day reading newsgroups and sites on the web. A number
of people seem to have this problem but I don't see any real solutions.
A few people say it's "by design" (if the cluster can't get any
network connection for a while, it should fail). Considering our
regular file servers handle a couple switches rebooting just fine, this
new setup's "design" is less than acceptable.
My questions:
1.) Is there any way to NOT have the resource groups fail to the
point where they need to be manually restarted if all nodes lose connectivity
for a few minutes? Is there a setting I can change somewhere (say,
"check every five minutes to see if network connectivity is back; if
it's down for 6 hours, fail the resources totally")?
2.) If they have to fail, is there a way to have the cluster try to
start up the failed resources when network connectivity returns?
Basically, how have other people with clusters gotten around this? Do
they just manually restart the groups every time it happens?
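To make the two questions concrete, here is roughly the behaviour I'm
after, written as a small Python sketch. This is only an illustration of
the policy, not an actual cluster setting or API: watchdog,
network_is_up and bring_groups_online are made-up names, and the
five-minute and six-hour values are just the numbers from question 1.

```python
import time

CHECK_INTERVAL_S = 5 * 60        # "check every five minutes"
GIVE_UP_AFTER_S = 6 * 60 * 60    # "if it's down for 6 hours, fail totally"


def watchdog(network_is_up, bring_groups_online, sleep=time.sleep,
             check_interval_s=CHECK_INTERVAL_S,
             give_up_after_s=GIVE_UP_AFTER_S):
    """Poll until the public network is back, then restart the groups.

    network_is_up and bring_groups_online are hypothetical callables
    supplied by the caller; this sketch does not talk to a real cluster.
    Returns True if the groups were restarted, False if we gave up.
    """
    waited = 0
    while waited < give_up_after_s:
        if network_is_up():
            # Connectivity has returned: bring everything back online.
            bring_groups_online()
            return True
        # Still down: wait one interval and check again.
        sleep(check_interval_s)
        waited += check_interval_s
    # Outage exceeded the deadline: leave the groups failed.
    return False
```

In other words: keep retrying quietly during a short outage, and only
leave the groups failed once the outage has lasted longer than the
deadline.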
Thanks in advance for any help.