Re: 99.999% Uptime For A Cluster - Real World Comments
- From: "Don Wilwol" <donWilwol@(EMAIL)yahoo.com>
- Date: Sat, 13 May 2006 09:35:28 -0400
Simon
It does sound like your problems are planning related, not technical.
You need to figure out what your clusters are doing. What are the CPU and
memory utilization, access trends, etc.? Were the blue screens simple
overloads?
40 resources? Are we supposed to know what that means? Do you know what that
means? If the 40 are file shares being accessed once a day, you're good. If
the 40 are SQL databases getting hammered with thousands of transactions per
second, you're beat.
Between the Microsoft sites and the sites of the guys in this newsgroup
(Rodney's for one), you should be able to become a cluster expert. I strongly
recommend you set up a cluster in a lab so you can test and learn. I've put
some info on my site on how to do that in VMware (it's free), so all you need
is a workstation and you're learning clustering. Don't try things on your
production network!!
A file share cluster should be able to cruise through a year with less than
32 minutes of downtime, without worry. If not, go back to the drawing board.
Remember, if you're going to do an active/active cluster, you need **less
than** 40% utilization on each node, so the surviving node can absorb both
workloads after a failover.
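The 40% figure can be read as a failover headroom rule: when one node of an
active/active pair fails, the survivor has to carry both workloads. Below is a
minimal sketch of that arithmetic. It assumes every node starts at the same
utilization, that load adds linearly, and that failed work is spread evenly
across survivors. The function name is illustrative and does not come from any
cluster tool.

```python
def load_after_failover(per_node_pct: float, nodes: int = 2, failed: int = 1) -> float:
    """Return the utilization (%) of each surviving node after a failover.

    Simplifying assumptions: all nodes start at the same utilization, load
    adds linearly, and the failed nodes' work is spread evenly.
    """
    survivors = nodes - failed
    if survivors < 1:
        raise ValueError("at least one node must survive")
    # Total work is per_node_pct * nodes; it is now shared by the survivors.
    return per_node_pct * nodes / survivors


if __name__ == "__main__":
    for pct in (40, 50, 60):
        print(f"{pct}% per node -> {load_after_failover(pct):.0f}% on the survivor")
```

Under these assumptions, 40% per node leaves the survivor at 80% with some
margin, while 50% per node already saturates it. Real workloads rarely add up
this cleanly, so treat the result as a rough planning estimate.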
--
--------
Hope It Helps!
dw
_______________________________
Don Wilwol
Distributed Application Technologies.
dwilwol(DELETE)@datbusiness.com
www.AtTheDataCenter.com (personal website)
www.skysphere.com (hosting available)
"Simon" <Simon@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:EAF35876-7204-483E-88A6-5A89545872ED@xxxxxxxxxxxxxxxx
John
But this is what is happening. One node has over 40 resources, and I don't
know whether that is good or bad. Finding any really good working
documentation outside of Microsoft is extremely hard. Even a book is hard to
come by here in the UK.
Is there a limit to the number of resources a node can host? I do plan to
add another node, but was concerned I may be ADDING to the problem rather
than relieving it.
And 2 nodes have failed with a BSOD, which I did post about on here before.
I never really received a satisfactory response, as the message is
extremely vague.
Also John, when a failure occurs, not all resources come back online. Worse
still, the resources move onto another node, then all go offline and attempt
to go back to their host node, only to go down again and revert to the node
they initially attempted to move to! In other words, it's playing ping-pong!
Simon
"John Toner [MVP]" wrote:
Simon,
If you plan out your clusters correctly, you should never get into most of
the situations you describe. You should rarely experience a hardware failure
on both nodes at the same time, though it can surely happen. Cluster has the
solution for this, though...add another node :)
One node should never be in the situation where it dies because "it cannot
take on all the resources." This is clearly a planning issue rather than a
cluster issue.
Regards,
John
"Simon" <Simon@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:4EA4153E-16A9-4056-BF90-E4908D7C0400@xxxxxxxxxxxxxxxx
Thanks Rodney
The cluster in question is a file server on 2 nodes. We don't have an SLA for
99.999% uptime - someone said that in a year, you could achieve 32 mins
downtime.
I sort of disagree - because if you have a hardware problem, or both nodes
fail and it's a complete disaster, you're stuffed really. And if one of your
nodes dies because it cannot take on all resources, you're also stuffed.
Again, it all depends on the setup as well, I guess. I wish there was a
better book out there!!!!
But I wanted to get a feel from other people who work day in and day out
with clusters. Surely you have come across sites where the cluster is in a
mess, and you were able to sort things out?
I think that is part of my problem - fully understanding the cluster, the
hardware config, and why there are SO many resources on each node!
Simon
"Rodney R. Fournier [MVP]" wrote:
Short answer - Yes
Medium answer - Depends. How do you define uptime? Application uptime? Per
server? If you define it on a per-server basis at the hardware layer, then
probably not. Patches, BIOS/firmware updates, etc. will kill a few 9s.
Long answer - I sure hope so.
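To put numbers on the "9s" in the answers above, here is a small sketch that
converts an availability percentage into a yearly downtime budget and back.
It assumes a 365-day year and counts all downtime, planned or not. The
function names are illustrative only.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_budget_minutes(availability_pct: float) -> float:
    """Minutes of downtime per year allowed at the given availability (%)."""
    return MINUTES_PER_YEAR * (100 - availability_pct) / 100


def availability_for_downtime(minutes_per_year: float) -> float:
    """Availability (%) implied by a given number of downtime minutes per year."""
    return 100 * (1 - minutes_per_year / MINUTES_PER_YEAR)


if __name__ == "__main__":
    for pct in (99.9, 99.99, 99.999):
        print(f"{pct}% -> {downtime_budget_minutes(pct):.2f} min/year")
    print(f"32 min/year -> {availability_for_downtime(32):.4f}%")
```

Under these assumptions, five nines allows roughly 5.3 minutes of downtime a
year, so a single reboot for a patch can use up the whole budget. The 32
minutes quoted elsewhere in this thread works out to roughly 99.994%, which
sits between four and five nines.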
What does your SLA define for uptime? You have an SLA right?
Hopefully you monitor it with user application availability.
Hopefully you have a monitoring system in place that can send out alerts.
Both proactive and reactive.
Hopefully you have standard, well-defined maintenance windows, patch
management, virus protection, firewalls, policies, etc.
Hopefully you have a fully trained staff on hand 24x7.
Hopefully you have vendor support and a good working relationship with them
already.
Hopefully you have hardware from the Clustering HCL.
Hopefully your organization from the top down understands and wants to
maintain an HA environment.
Hopefully you have configuration and change management.
Hopefully you have complete and accurate documentation for every
component. Documentation is very important.
If everything you do throughout the entire organization is with HA in mind,
you can indeed achieve 5 9's. I know we do on most of our clusters here (we
have 30 clusters).
Cheers,
Rodney R. Fournier
MVP - Windows Server - Clustering
http://www.nw-america.com - Clustering Website
http://msmvps.com/clustering - Blog
http://www.clusterhelp.com - Cluster Training
ClusterHelp.com is a Microsoft Certified Gold Partner
"Simon" <Simon@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:382FFAA7-86D7-4B80-A2D6-08D4CCBDD2FA@xxxxxxxxxxxxxxxx
Hi guys
I just wanted to get an idea if anyone really believes that it's possible to
have a 99.999% uptime for a Win2k3 cluster.
Our cluster has been quite unreliable, and in fact our stand-alone servers
are well behaved compared to the cluster!
Any comments would be appreciated - it's just to get an overall picture :-)
Greetings
Simon