Re: 99.999% Uptime For A Cluster - Real World Comments



John
But this is what is happening. One node has over 40 resources, and I don't
know whether that is good or bad. Finding good working documentation outside
of Microsoft is extremely hard. Even a book is hard to come by here in the UK.

Is there a limit to the number of resources a node can host? I do plan to
add another node, but was concerned I may be ADDING to the problem rather
than relieving it.

Also, 2 nodes have failed with a BSOD - which I did post about on here
before, but I never received a satisfactory response, as the message is
extremely vague.

Also John, when a failure occurs, not all resources come back online. Worse
still, the resources move onto another node, then all go offline and
attempt to go back to their host node, only to go down again and revert to
the node they initially attempted to move to! In other words, it's playing
ping-pong!

Simon

"John Toner [MVP]" wrote:

Simon,

If you plan out your clusters correctly, you should never get into most of
the situations you describe. You should rarely experience a hardware failure
on both nodes at the same time, though it can surely happen. Cluster has the
solution for this, though...add another node :)

One node should never be in the situation where it dies because "it cannot
take on all the resources." This is clearly a planning issue rather than a
cluster issue.
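
The planning point can be checked with rough numbers. A minimal sketch, assuming each node can be given a single capacity figure in the same units as its load (for example, a count of resources) and that the surviving nodes share the load freely; the node names and the capacity of 50 are hypothetical, not taken from this thread:

```python
def survives_single_failure(capacity, load):
    """Return True if the cluster can lose any one node and still host all load.

    capacity: dict of node name -> how much that node can host (hypothetical units)
    load:     dict of node name -> how much that node hosts today
    """
    total_load = sum(load.values())
    for failed in capacity:
        # Capacity left once this node is gone.
        remaining = sum(c for name, c in capacity.items() if name != failed)
        if total_load > remaining:
            return False
    return True


# Hypothetical figures: two nodes sized for 50 resources each.
two_nodes = {"node1": 50, "node2": 50}
load = {"node1": 40, "node2": 30}
print(survives_single_failure(two_nodes, load))    # False: 70 will not fit on one node

# Adding a third, initially empty node of the same size.
three_nodes = {**two_nodes, "node3": 50}
print(survives_single_failure(three_nodes, load))  # True: 70 fits on the two survivors
```

Real sizing depends on memory, CPU and I/O rather than a simple resource count, so this only illustrates the kind of check to make before relying on failover.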

Regards,
John

"Simon" <Simon@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:4EA4153E-16A9-4056-BF90-E4908D7C0400@xxxxxxxxxxxxxxxx
Thanks Rodney
The cluster in question is a file server on 2 nodes. We don't have an SLA for
99.999% uptime - someone said that in a year, you could achieve 32 mins
downtime.
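
For reference, the downtime budget behind an availability figure is simple arithmetic. A minimal sketch, assuming a 365-day year; the function names are illustrative only:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability_pct):
    """Yearly downtime allowed, in minutes, at the given availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


def availability_for_downtime(downtime_minutes):
    """Availability percentage implied by a given number of downtime minutes per year."""
    return 100 * (1 - downtime_minutes / MINUTES_PER_YEAR)


for target in (99.9, 99.99, 99.999):
    print(f"{target}% -> {downtime_minutes_per_year(target):.2f} min/year")

print(f"32 min/year -> {availability_for_downtime(32):.3f}%")
```

On that arithmetic, five nines leaves roughly 5.26 minutes a year, and 32 minutes a year corresponds to about 99.994% availability.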

I sort of disagree - because if you have a hardware problem, or both nodes
fail and it's a complete disaster, you're stuffed really. And if one of your
nodes dies because it cannot take on all the resources, you're also stuffed.

Again, it all depends on setup as well I guess. I wish there was a better
book out there!!!!

But I wanted to get the feel from other people who work day in and day out
with clusters. Surely, you have come across sites where the cluster is in a
mess, and you are able to sort things out?

I think that is part of my problem - fully understanding the cluster, the
hardware config, why there are SO many resources on each node!

Simon


"Rodney R. Fournier [MVP]" wrote:

Short answer - Yes

Medium answer - Depends. How do you define uptime? Application uptime? Per
server? If you define it per server at the hardware layer, then
probably not. Patches, BIOS/firmware updates, etc. will kill a few 9s.

Long answer - I sure hope so.
What does your SLA define for uptime? You have an SLA right?
Hopefully you monitor it with user application availability.
Hopefully you have a monitoring system in place that can send out alerts,
both proactive and reactive.
Hopefully you have standard, well-defined maintenance windows, patch
management, virus protection, firewalls, policies, etc.
Hopefully you have a fully trained staff on hand 24x7.
Hopefully you have vendor support and a good working relationship with them
already.
Hopefully you have hardware from the Clustering HCL.
Hopefully your organization from the top down understands and wants to
maintain an HA environment.
Hopefully you have configuration and change management.
Hopefully you have complete and accurate documentation for every
component. Documentation is very important.

If everything you do throughout the entire organization is with HA in mind,
you can indeed achieve 5 9's. I know we do on most of our clusters here (we
have 30 clusters).

Cheers,

Rodney R. Fournier

MVP - Windows Server - Clustering
http://www.nw-america.com - Clustering Website
http://msmvps.com/clustering - Blog
http://www.clusterhelp.com - Cluster Training
ClusterHelp.com is a Microsoft Certified Gold Partner


"Simon" <Simon@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:382FFAA7-86D7-4B80-A2D6-08D4CCBDD2FA@xxxxxxxxxxxxxxxx
Hi guys
I just wanted to get an idea if anyone really believes that it's possible to
have 99.999% uptime for a Win2k3 cluster.

Our cluster has been quite unreliable, and in fact our standalone servers
are well behaved compared to the cluster!

Any comments would be appreciated - it's just to get an overall picture :-)
Greetings
Simon





