RE: 2003 Enterprise Storage Edition Cluster

From: ScottH (anonymous_at_discussions.microsoft.com)
Date: 09/21/04


Date: Tue, 21 Sep 2004 12:02:05 -0700

Are you using Dells? I am having a similar problem, but not
for all clients and not for all resources. It happened
after Dell upgraded the PERCs. Not sure if that had
anything to do with it...

>-----Original Message-----
>I should briefly mention that as soon as I move the
>resource group back to the original node I can immediately
>ping the clustername and the clients' connections work
>fine.
>
>This really makes it seem like a directory issue to me --
>either DNS, Active Directory, or the primary domain
>controller. I'm not sure if we've moved as a corporation
>to full AD yet; I think we might still be a Windows
>2000/2003 AD hybrid.
>
>But why would it work for ~500 sec and then fail? Why not
>fail right away? If whichever directory server is involved
>were caching a stale entry, wouldn't it point clients to
>the old (i.e. failed) node? Wouldn't their connections
>then fail immediately?
>
>>-----Original Message-----
>>Hi Greg,
>>
>>Thanks for your response. No, through the cluster
>>administration tool NONE of the cluster resources (quorum,
>>name, IP, or any of the other resources: disks, DFS, file
>>shares) are failed. When I initiate a move or have a real
>>failure, the resources (as viewed through the cluster
>>administration tool) move back and forth from node to node
>>in ~15-30 sec without any issues. In fact, if I remote
>>desktop to either server, BOTH can access the shares via
>>Network Neighborhood, regardless of which node is the
>>active partner, even though other domain member clients
>>cannot. Similarly, either node can successfully ping the
>>cluster name (nwfileshare) -- but clients cannot.
>>
>>Per Ramon's request I started fresh logs and initiated a
>>Move Group, and sure enough after ~500 seconds I was no
>>longer able to ping the cluster name. However, between
>>time zero, when I initiated the move, and the 500-second
>>mark I could ping the cluster name.
>>
>>Something happens around 500 seconds that disrupts the
>>ability of clients to ping the cluster name. If they
>>cannot ping the cluster name, then my DFS/CIFS clients
>>cannot connect.
>>
>>When investigating the Application and System logs I saw
>>no red errors on either node. I saw one yellow warning on
>>the node receiving the resource group:
>>
>>"The registration of DNS name nwfileshare.merck.com for
>>resource 'Cluster Name' over adapter 'AdapterTeam' failed
>>for the following reason: DNS operation refused."
>>
>>Would this be a problem? All three of my names (nodea,
>>nodeb, clustername) have static, DNS-registered FQDNs.
>>
>>I also saw one informational message, on both nodes, which
>>occurred a few minutes after the 500-second mark, from
>>W32Time:
>>The time provider NtpClient cannot reach or is currently
>>receiving invalid time data from uswsdc1100.merck.com
>>(ntp.d|54.14.62.34:123->54.23.184.189:123).
>>
>>Hopefully this helps! Thanks so much for all of your
>>effort!
>>
>>Jonathan E. Schneeweis, Research Biochemist
>>Sun Certified System & Network Administrator
>>
>>Department of Automated Biotechnology
>>Merck Research Laboratories
>>503 Louise Lane, NW-2
>>North Wales, PA 19454
>>
>>Jonathan_Schneeweis@merck.com
>>
>>
>>>-----Original Message-----
>>>do the IP address or network name resources fail? do any
>>>resources in the cluster fail?
>>>
>>
>>.
>>
>.
>
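
One way to pin down what changes at the ~500 second mark
described in the quoted message would be to poll the cluster
name from an affected client and log when the answer changes.
The sketch below is only an illustration, not something from
the thread: the function names (resolve, watch_name) and the
default interval and duration are made up, and it checks name
resolution through the client's own resolver only, not ICMP
reachability, so it can show whether the name stops resolving
(or starts resolving to a different address) but not why.

```python
import socket
import time
from typing import Callable, List, Optional, Tuple


def resolve(name: str) -> Optional[str]:
    """Return the IPv4 address the local resolver gives for name, or None."""
    try:
        return socket.gethostbyname(name)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


def watch_name(
    name: str,
    resolver: Callable[[str], Optional[str]] = resolve,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
    interval: float = 10.0,   # seconds between lookups (arbitrary choice)
    duration: float = 900.0,  # total watch time; long enough to pass ~500 s
) -> List[Tuple[int, Optional[str]]]:
    """Poll name and return (elapsed_seconds, address) for every change."""
    start = clock()
    changes: List[Tuple[int, Optional[str]]] = []
    last: object = object()  # sentinel so the first lookup is always recorded
    while True:
        elapsed = clock() - start
        if elapsed > duration:
            break
        address = resolver(name)
        if address != last:
            changes.append((round(elapsed), address))
            last = address
        sleep(interval)
    return changes
```

Started on a client right after the Move Group, for example
as watch_name("nwfileshare"), it returns a list of
(seconds since start, address) pairs, with None wherever the
lookup failed. If the name keeps resolving to the same address
past 500 seconds while ping still fails, the problem is
probably not name resolution.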


