Re: Cluster Freezes



Edwin,

Thanks for your reply.

I apologize if I wasn't clear earlier. I just inherited the cluster and
didn't have all the detail that I should have.

When testing the offline times, I notice that the file shares go offline in a
matter of seconds, which is expected, and the same goes for the IP and Network
Name resources. However, the Physical Disk resources seem to take at least
30-45 seconds to go offline.

The failover time for all the resources is really good, maybe 20 seconds tops
for the whole group.

But the real problem is coming back online. The IP and Network Name resources
are fast, and then I wait about 30 minutes for the Physical Disk to come
online. Either the file shares come online quickly after that, or the disk
never finishes coming online before the cluster freezes.

At this point you have to shut down one node and reboot the other in order to
get the cluster back, and then bring the second node back up.

The only configuration that seems out of place to me is that the Physical
Disk resources have a dependency on the Network Name. I have always made it a
practice to have no dependencies on the physical disk and just make the disk
and network name dependencies of the file share. My manager's idea was that
by making the network name a dependency of the disk, it would always be
available.
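
To make the two dependency layouts concrete, here is a minimal sketch in
Python. It only models the general rule that a resource comes online after
everything it depends on (and goes offline in the reverse order); it is not
the cluster service's actual algorithm. The resource names ("Disk F:",
"Share1") are hypothetical, and the layouts are my reading of the
description above.

```python
from graphlib import TopologicalSorter  # Python 3.9+

def online_order(dependencies):
    """Return one valid online order: each resource after its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())

def offline_order(dependencies):
    """Offline is modelled as the reverse of the online order."""
    return list(reversed(online_order(dependencies)))

# Layout described in this post: the disk depends on the Network Name.
# Each key maps a resource to the set of resources it depends on.
CURRENT_LAYOUT = {
    "IP Address": set(),
    "Network Name": {"IP Address"},
    "Disk F:": {"Network Name"},       # hypothetical resource name
    "Share1": {"Disk F:"},             # hypothetical resource name
}

# The practice mentioned above: no dependencies on the disk; the share
# depends on both the disk and the Network Name.
USUAL_LAYOUT = {
    "IP Address": set(),
    "Network Name": {"IP Address"},
    "Disk F:": set(),
    "Share1": {"Disk F:", "Network Name"},
}

if __name__ == "__main__":
    print("current:", online_order(CURRENT_LAYOUT))
    print("usual:  ", online_order(USUAL_LAYOUT))
```

In the first layout the disk cannot start coming online until the Network
Name is online; in the second it has nothing to wait for.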

I haven't changed this yet simply because there are so many file shares that
the dependency would have to be added to if I changed it. However, I found an
article that says you should never add a dependency to a physical disk. Do
you think this could be my problem?

I can't find any article that tells me why I have it set like this. Can you
explain?

Thanks again!

"Edwin vMierlo [MVP]" wrote:

I think you need to time it more carefully, so you know where the actual
time is being spent.

There are 3 times involved:

1) the offline time of the group
2) the move of ownership of the group to the other node
3) the online time of the group

In regard to 1), the offline time of the group:

You need to determine the offline time of each resource, following their
dependency structure. E.g. how long does it take to take *just* the File
Share resources offline while leaving IP / Name and Disk online? That shows
whether offlining the shares is taking the time or whether it is something
else.
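
One way to collect these per-resource timings is a small wrapper script. The
Python below is only a sketch: `time_action` and `time_steps` are
illustrative helper names, and the `cluster resource "<name>" /offline` and
`/online` command form built by `cluster_command` is an assumption that
should be checked against `cluster resource /?` on the node before use.

```python
import time

def cluster_command(resource, action):
    """Build a cluster.exe command line (assumed syntax, verify locally)."""
    if action not in ("offline", "online"):
        raise ValueError("action must be 'offline' or 'online'")
    return ["cluster", "resource", resource, "/" + action]

def time_action(label, action):
    """Run one callable and return (label, elapsed seconds)."""
    start = time.perf_counter()
    action()
    return label, time.perf_counter() - start

def time_steps(steps):
    """Time each (label, callable) pair in order and return the results."""
    return [time_action(label, action) for label, action in steps]

# Example use on a cluster node ("Share1" is a hypothetical resource name):
#
#   import subprocess
#   steps = [
#       ("Share1 offline",
#        lambda: subprocess.run(cluster_command("Share1", "offline"), check=True)),
#   ]
#   for label, seconds in time_steps(steps):
#       print(f"{label}: {seconds:.1f} s")
```

The timing is only meaningful if the command blocks until the resource has
actually changed state, so confirm that behaviour before trusting the numbers.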

Once you know the offline timings, do the same for 3), the online timings.

Usually 2), the move of ownership, is reasonably fast and does not take a
long time.

In other words, you need to be more detailed than "group move is slow" and
determine which resource or resources are slow, and during which action
(off/on/move).
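
Once the timings are written down, picking out the slow step is simple
arithmetic. A minimal Python sketch follows; the records are made-up
placeholder values for illustration, not measurements.

```python
def slowest(timings):
    """Return the (phase, resource, seconds) record with the largest duration."""
    return max(timings, key=lambda record: record[2])

def total_by_phase(timings):
    """Sum the recorded seconds for each phase (offline / move / online)."""
    totals = {}
    for phase, _resource, seconds in timings:
        totals[phase] = totals.get(phase, 0) + seconds
    return totals

# Placeholder records: (phase, resource, seconds). Replace with real timings.
EXAMPLE_TIMINGS = [
    ("offline", "File Share", 5),
    ("offline", "Physical Disk", 45),
    ("move", "Group", 20),
    ("online", "Physical Disk", 1800),
]

if __name__ == "__main__":
    print("slowest step:", slowest(EXAMPLE_TIMINGS))
    print("per phase:   ", total_by_phase(EXAMPLE_TIMINGS))
```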

Rgds,
edwin.



"Rog" <Rog@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:BA9BA565-5E11-4C48-B6D5-FFCDE3584FD3@xxxxxxxxxxxxxxxx
All,

I have a W2k3 R2 cluster that has about 10 disk and 150 file share
resources. When I attempt to fail over the group, it takes about 15 minutes
minimum to fail over, if it succeeds at all. But most recently it completely
locks up both servers, and you have to shut them both down and bring one up
at a time to get the cluster back online.

If I take about a third of the file share resources offline, then I can
fail over in about 10 minutes. I was first thinking that I might be dealing
with a timeout issue, but I didn't think it would cause the server to freeze.

Is there a limit to the number of disk and file share resources you should
have in a single cluster group?

Anyone have any suggestions?

Thanks!


