Re: SAN Reservation Contention problem



2003 for the drivers? Dang Bill, I have washed my car 4-5 times since then! The
Broadcom drivers from Dell's Server Update Utility (SUU 5.0) are solid for
teaming. As for the Intel settings, you are correct, you can't set it at
1000 FULL, I just don't want to use AUTO - ever.

Dell OpenManage is fine and would be a best practice from Dell - and your
monitoring software might work better with it loaded. Go ahead and load it.
5.0 is pretty solid, there is just no separate Array Manager; you have to use
the web page one now and it simply sucks!

Call Microsoft support on the SAN issues, that is if the vendor is stumped.

Cheers,

Rodney R. Fournier

MVP - Windows Server - Clustering
http://www.nw-america.com - Clustering Website
http://www.msmvps.com/clustering - Blog
http://www.clusterhelp.com - Cluster Training
ClusterHelp.com is a Microsoft Certified Gold Partner


"Bill Bradley" <wdbradley3@xxxxxxxxxxx> wrote in message
news:%230bqog%233GHA.4900@xxxxxxxxxxxxxxxxxxxxxxx
WOW! Thanks for the fast response.

I have verified that the firmware, driver, settings, and Registry settings
for the QLogic 2310F cards are the ones required on the XioTech support
matrix, and, the latest driver and firmware listed on the QLogic site, as
well.

I have two embedded Broadcom NICs (the server is a Dell PE 2650), so,
could use one of them for the heartbeat, thus separating the NICs (we
didn't use the Broadcoms, initially, as there were issues with their
drivers back in 2003). Is there a procedure to change to a different set
of heartbeat NICs safely?

On the 1 Gb Speed/Duplex settings, I'm at Intel driver 10.3, and, see that
they have an 11.1 out, so, will try that, but, I don't know that I can
hard-code our Cisco equipment to any more than 1Gb/Auto (I'm NOT at
Auto/Auto, but at 1Gb/Auto). AFAIK, I can't do 1Gb/Full, but will check.

The other thing I am looking at is that I have Dell OpenManage loaded on
the servers...could THAT be causing any of this?

Here is a snippet of the XioTech Capture, showing what they say is the
contention:

Sorry, I missed your second question. The log files are encrypted and
only viewable by Xiotech. I can send snippets but they would be mostly
meaningless as it's all engineering codes.

When a cluster has reservation conflicts it looks like this:

CN0 258475 ! 257237 01:14:58pm 09/22/2006 10cc Debug
HOST-210000e08b082366 VID 00/14 ERROR
> ec 0x18, VID 14, port 0, WWN 210000e08b082366
CN0 258476 ! 257238 01:14:58pm 09/22/2006 10cc Debug
HOST-210000e08b082366 VID 00/16 ERROR
> ec 0x18, VID 16, port 0, WWN 210000e08b082366
CN0 258477 ! 257239 01:14:58pm 09/22/2006 10cc Debug
HOST-210000e08b082366 VID 00/16 ERROR
> ec 0x18, VID 16, port 0, WWN 210000e08b082366
CN0 258478 ! 257240 01:14:58pm 09/22/2006 10cc Debug
HOST-210000e08b082366 VID 00/16 ERROR
> ec 0x18, VID 16, port 0, WWN 210000e08b082366
CN0 258479 ! 257241 01:14:58pm 09/22/2006 10cc Debug
HOST-210000e08b082366 VID 00/16 ERROR
> ec 0x18, VID 16, port 0, WWN 210000e08b082366
CN0 258480 ! 257242 01:14:58pm 09/22/2006 10cc Debug
HOST-210000e08b082366 VID 00/16 ERROR
> ec 0x18, VID 16, port 0, WWN 210000e08b082366
CN0 258481 ! 257243 01:14:58pm 09/22/2006 10cc Debug
HOST-210000e08b082366 VID 00/16 ERROR


There are literally hundreds per second. I am looking only at the past
65000 or so lines, about 3 days worth in this case, and it's almost
nothing but reservation conflicts. Quite literally 99% of the log is
filled with these conflicts.
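
For anyone who gets a plain-text export of entries like the ones above, a rough tally per host WWN and VID, and per second, can help compare the SAN-side counts with what Windows reports. This is only a sketch: the line layout is inferred from the snippet in this post, not from any Xiotech documentation, and reading "ec 0x18" as the SCSI RESERVATION CONFLICT status (0x18) is an assumption. The names HEADER, DETAIL, and summarize are made up for illustration.

```python
import re
from collections import Counter

# Header line of an entry, e.g.
# "CN0 258475 ! 257237 01:14:58pm 09/22/2006 10cc Debug"
# (layout assumed from the snippet in this thread).
HEADER = re.compile(
    r"^CN\d+\s+\d+\s+!\s+\d+\s+(\d{2}:\d{2}:\d{2}[ap]m)\s+(\d{2}/\d{2}/\d{4})"
)

# Detail line of an entry, e.g.
# "> ec 0x18, VID 14, port 0, WWN 210000e08b082366"
DETAIL = re.compile(
    r"^>\s*ec\s+(0x[0-9a-fA-F]+),\s*VID\s+(\d+),\s*port\s+(\d+),"
    r"\s*WWN\s+([0-9a-fA-F]{16})"
)

# Assumption: "ec 0x18" carries the SCSI status RESERVATION CONFLICT (0x18).
RESERVATION_CONFLICT = 0x18


def summarize(lines):
    """Count assumed reservation conflicts per (WWN, VID) and per timestamp."""
    per_vid = Counter()
    per_second = Counter()
    stamp = None  # (date, time) of the most recent header line seen
    for raw in lines:
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            stamp = (header.group(2), header.group(1))
            continue
        detail = DETAIL.match(line)
        if detail and int(detail.group(1), 16) == RESERVATION_CONFLICT:
            per_vid[(detail.group(4), detail.group(2))] += 1
            per_second[stamp] += 1
    return per_vid, per_second


# The first three entries from the capture quoted above.
SAMPLE = """\
CN0 258475 ! 257237 01:14:58pm 09/22/2006 10cc Debug
HOST-210000e08b082366 VID 00/14 ERROR
> ec 0x18, VID 14, port 0, WWN 210000e08b082366
CN0 258476 ! 257238 01:14:58pm 09/22/2006 10cc Debug
HOST-210000e08b082366 VID 00/16 ERROR
> ec 0x18, VID 16, port 0, WWN 210000e08b082366
CN0 258477 ! 257239 01:14:58pm 09/22/2006 10cc Debug
HOST-210000e08b082366 VID 00/16 ERROR
> ec 0x18, VID 16, port 0, WWN 210000e08b082366
"""

per_vid, per_second = summarize(SAMPLE.splitlines())
for (wwn, vid), count in sorted(per_vid.items()):
    print(f"WWN {wwn} VID {vid}: {count} conflict(s)")
for (date, time), count in per_second.items():
    print(f"{date} {time}: {count} conflict(s) in that second")
```

Grouping by WWN shows which node's HBA is being refused, and grouping by VID shows which shared disks are involved.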

I suggest contacting Microsoft to resolve this if the steps we outlined
haven't helped. Zoning and storage CANNOT cause reservation conflicts.
Only the Microsoft cluster can. Reservation conflicts occur when the
cluster nodes fight for control of the storage. When there is this disk
contention it is usually heartbeat related. The storage and zoning do
not control which server accesses a shared disk at a given time. In all
cases with all vendors this is handled by the OS.
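
To make the "nodes fight for control" point concrete, here is a toy model of a single-holder disk reservation. It is a simplified sketch, not the actual cluster disk driver or array firmware: the SharedDisk class and the node names are hypothetical, and the only real-world detail relied on is that SCSI defines status 0x18 as RESERVATION CONFLICT.

```python
GOOD = 0x00                  # SCSI status GOOD
RESERVATION_CONFLICT = 0x18  # SCSI status RESERVATION CONFLICT


class SharedDisk:
    """Toy model of one shared LUN that at most one host can reserve."""

    def __init__(self):
        self.holder = None  # name of the host holding the reservation

    def reserve(self, host):
        # The reserve succeeds if the disk is free or already held by
        # the same host; any other host is refused.
        if self.holder in (None, host):
            self.holder = host
            return GOOD
        return RESERVATION_CONFLICT

    def write(self, host):
        # I/O from a host that does not hold the reservation is refused.
        if self.holder not in (None, host):
            return RESERVATION_CONFLICT
        return GOOD

    def bus_reset(self):
        # A reset clears the reservation, so the next reserve wins.
        self.holder = None


disk = SharedDisk()
results = [
    ("NODE-A reserve", disk.reserve("NODE-A")),
    ("NODE-B reserve", disk.reserve("NODE-B")),  # refused, A holds it
    ("NODE-B write", disk.write("NODE-B")),      # refused as well
]
disk.bus_reset()
results.append(("NODE-B reserve after reset", disk.reserve("NODE-B")))
results.append(("NODE-A write after takeover", disk.write("NODE-A")))

for label, status in results:
    print(f"{label}: status 0x{status:02x}")
```

In this model the array only answers the requests it receives. If the nodes lose track of each other and both keep trying to own the same disk, every refused request shows up as another 0x18 on the storage side.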

Thanks!


"Rodney R. Fournier [MVP]" <rod@xxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote in
message news:e02$nA%233GHA.1588@xxxxxxxxxxxxxxxxxxxxxxx
Bill, I see a few issues here:

1) Single point of failure with your Intel Pro MT. Public and private on
the same NIC? That is not a best practice. Buy at least one more NIC.

2) After you buy a second NIC for the private, team the Public (on two
different NICs if possible) using Fault Tolerance and not Load
Balancing. Teaming the Public is another best practice.

3) Hardcode the Public to 1000 and not auto. If you can't do this, it's
because you are not running the correct/latest Intel drivers. Go to Intel
and download them.

4) None of the above has anything to do with your SAN issue. Check out
the firmware, BIOS, and drivers for the HBAs; I am betting they are not
the latest.

5) As for your suggestions below, don't do any of them, since they won't
help, only my number 4 will. My 1-3 are best practices, but they won't
help your SAN.

Cheers,

Rodney R. Fournier

MVP - Windows Server - Clustering
http://www.nw-america.com - Clustering Website
http://www.msmvps.com/clustering - Blog
http://www.clusterhelp.com - Cluster Training
ClusterHelp.com is a Microsoft Certified Gold Partner


"Bill Bradley" <wdbradley3@xxxxxxxxxxx> wrote in message
news:ObpOyl63GHA.512@xxxxxxxxxxxxxxxxxxxxxxx
I have two, two-node, Active/Active, Server 2003 w/SP1 clusters, with
shared disks on a XioTech Magnitude 3D 3000e connected via a pair of
McData 4400 fibre switches. The Public and Private networks are the two
ports on an Intel Pro/1000 MT dual-port card, with the Public at 1 Gb
Auto, and the Private using crossover cables at 100/Full. The HBAs are
QLogic 2310F.

I had been having some problems with the backup system (CommVault),
where tape drives or the library were dropping offline, so, had
incidents open with both XioTech and CommVault.

XioTech tells me that I have a serious problem with reservation
contentions on the SAN's disks, and, that this is NOT a SAN or zoning
problem, but a Microsoft Cluster problem.

I don't know what to adjust on my clusters. I've verified all the
settings, and, they all seem correct with the best tips and tricks. I
also don't see a LOT of errors on the Windows side, like they say they
are seeing on the SAN side (hundreds per minute).

They say this is likely a communications problem, and, I'm looking at
changing four things:

1. Private to 10 Mb/Half.
2. Under the Network Name parameters, remove Enable Kerberos
Authentication.
3. Under the Network Name parameters, remove DNS registration must
succeed.
4. Disable multicast on the cluster (even though it's only two
nodes).

I don't particularly see that I NEED to do this, but...can't see
anything else to do.

Any ideas on this?

Thanks!






