Re: Help Again! Cluster Resources Moving Takes Ages & unreliable


John
Thanks - can I just clarify about the managed pagefile? I was always led
to believe it's essential that a pagefile exists on C: and is given
explicit min and max size settings.

The Veritas Disk Administrator is the tool behind the scenes that seems to
have replaced the built-in tool that comes under Manage My Computer. I do not
fully know what it is doing.

It hasn't blue screened for a while now - just one node complaining of lack
of memory - which is why I felt a system-managed pagefile may not be good
for the number of resources that the node hosts.

Any clarification on the pagefile will be good.

Thanks, Simon

"John Toner [MVP]" wrote:

I'm not at all shocked to hear that the Veritas resources are causing
issues. In my experience, Veritas + MSCS = headaches. Unfortunately, there's
not much that anyone here can do to help resolve why your Veritas disk
groups are taking an excessive amount of time to go offline and online. Your
best bet would be to contact Veritas and have them explain why this process
is delayed.

Regards,
John


"Simon" <Simon@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:9F664E70-7BC5-4DB1-9AE7-FBA2873CDDF0@xxxxxxxxxxxxxxxx
Hi John
Thank you for the reply! The resources that hang are mainly the file
resources, but the one that most often sits in offline pending or online
pending is the main Volume Manager Disk Group (we call it groupcluster).

Incidentally, one of our nodes is set to SYSTEM MANAGED SIZE for the
pagefile - I was always under the impression you had to specify a size, i.e.
if you have 2GB RAM, it should be 4096 min and 4096 max!

Any ideas?
Simon



"John Toner [MVP]" wrote:

Simon,

Specifically which RESOURCES are taking a long time to go offline or online
pending? Resources will go offline and online in a specific order based on
your dependencies. For example, a correctly configured file share resource
will not even attempt to go online until the NetName and Physical Disk
resources are online. Use the Cluster Administrator GUI and watch the way
the resources go offline and online.
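
To picture the dependency ordering described above, here is a minimal Python
sketch. It is only an illustration, not anything the cluster service actually
runs. The names DEPENDENCIES, online_order and offline_order are hypothetical,
and the NetName -> IP Address dependency is an assumption added to make the
example a little fuller; only the File Share -> NetName / Physical Disk
dependencies come from the example above.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map: resource -> set of resources it depends on.
# The NetName -> IP Address edge is an assumption made for illustration.
DEPENDENCIES = {
    "File Share": {"NetName", "Physical Disk"},
    "NetName": {"IP Address"},
    "IP Address": set(),
    "Physical Disk": set(),
}


def online_order(deps):
    """Return one valid bring-online order: dependencies come first."""
    return list(TopologicalSorter(deps).static_order())


def offline_order(deps):
    """Return one valid take-offline order: dependents go first."""
    return list(reversed(online_order(deps)))


if __name__ == "__main__":
    print("Online: ", " -> ".join(online_order(DEPENDENCIES)))
    print("Offline:", " -> ".join(offline_order(DEPENDENCIES)))
```

In this sketch the File Share always comes online last and goes offline
first, so a resource lower in the chain that hangs in a pending state holds
up everything that depends on it.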

Which resources are failing? Is it the File Share resources, disk resources,
all of the above?

Getting a memory dump analysis probably isn't going to happen in the
newsgroups. We can give you suggestions as to what might cause a blue
screen, but you'll likely want to open a case with MS at this point and have
them pinpoint what is causing the host to BSOD.

Regards,
John


"Simon" <Simon@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:745F5E4B-3530-4A22-8D83-6DA3BA555FCD@xxxxxxxxxxxxxxxx
John
The 2 nodes are file servers - that's all! I have another cluster at another
site, same setup, hardware, software as mentioned above, yet it takes 90
seconds for their resources to move over.

As you can see, much better compared to ours. On our cluster the resources
go offline, but several stay in offline pending - mainly the groups stay in
this state.

Also, when attempting to go online, they say online pending for ages - 3-4
mins at least - and then attempt to come up. Most resources fail to come up!

The blue screens are inconclusive - I posted them here before and no one has
yet come to a real conclusion as to why it failed. I must point out that it
hasn't blue screened for 3 weeks!

Simon

"John Toner [MVP]" wrote:

Simon,

Specifically, which resources are taking the most time to go offline/online?
You might start by explaining where the delay is in the failover
process...are they spending most of the time going offline or coming online?

Another item that I'd look at is why you're experiencing downtime. What is
crashing and why is it crashing?

BTW, 5 - 7 minutes to move a cluster group is not really an excessive amount
of time. Some applications can take a while to stop and start their services
(like Exchange) so this is not an unreasonable failover time. If it's taking
5 minutes for your disk resources or IP resources to go offline and back
online, this would certainly be excessive.

Regards,
John


"Simon" <Simon@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:15035EC2-899E-4CED-ABF5-C0C8D6A2322A@xxxxxxxxxxxxxxxx
Hi again everyone :-)
I am in need of assistance - happy to post on here, or directly if anyone is
willing to provide some assistance. I am more concerned about our cluster
misbehaving, and not having any good resources in the United Kingdom (1 book
I found which I am still waiting on!!), and the White Papers are not fully
helping me understand what is going on.

Spec is as follows:
2 Servers x HP ProLiant DL380 G3 Dual Processor 3GHz and 2GB RAM
1 x HP EVA SAN Hardware Solution
OS installed on C:
The Quorum is on a separate disk, but from my understanding, it's configured
as Shared. In other words, it can move to another NODE.

All hardware is fibre channel using HBAs; all drivers and firmware are up to
date. Secure Path has been installed and configured and we also use Veritas
Enterprise Administrator as the disk administration application (this
replaces the Microsoft Disk Admin tool which is part of the OS).

1st Node hosts 9 VDisks in sizes of 250GB. On this node, there are 65
Resources configured in Cluster Administrator.

2nd Node hosts 8 VDisks in sizes of 250GB. On this node, there are 31
Resources configured in Cluster Administrator.

The main problem I have is the uptime, which currently stands at 14 days
(max has only been 22 so far!!) and the resources taking a LONG time to
move, 5-7 mins in most cases.

I have monitored this and noticed that some resources stay in a pending
state for some time. When they move over, it takes a long time for the
resources to move onto another node.

Also, if the resources have not come back online CORRECTLY, it takes the
entire group down and moves them back. I think I may have resolved this by
disabling the option "Affect the Group", which was ticked. A lesson learned
here was that when a Shared Resource was removed, the cluster tool could not
find it and took the ENTIRE group down!

I am not too sure where to start - it's in production so taking it down is
not easy. But I want to help the company with the limited skills I have. I'm
not sure if it's a permissions problem with resources that is causing the
issue, or hardware, but if anyone is able to share any additional info, this
would truly help.

I'm the first to admit, I am NOT a cluster expert - but I want to be and I
want to know what is happening, so I can understand and correct it.

I also appreciate everyone is busy, but in times like this, I am willing to
do almost anything to help sort this out.

Thank you
Simon








