Re: Unable to Apply SP4 to SQL 2000 Cluster (new Node)



There are a couple of issues here; so, let me see if I have them straight,
and then tackle them one at a time.

Yes, although you can run heterogeneous clusters, it is not recommended, and
yes, not only does the hardware need to be on the HCL, the entire
configuration must be as well in order to be supported. With that said, MS
PSS will still try to help out in most cases.

As for the volumes, I'm not quite sure what you mean by "on board." If that
were the case, you would be using a RAID controller, not HBAs. So, I'll
assume you mean either an externally connected array or a SAN.

Now, for the first problem, adding the 3rd node as a member of the cluster.
Yes, when you have a shared system and shared bus configuration, the
Advanced Configuration for the Cluster Add Node function is required. You
may still want to add the registry key indicated in the KB article. I will
note, however, that the location switched between Win2K3 and Win2K3 SP1. It
used to be in the ClusSvc Key but moved to the ClusDisk Key so it would not
be removed if the node was evicted, since the ClusSvc Key is regenerated in
such cases but the ClusDisk Key is always present.
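
For what it is worth, here is a rough Python sketch of that key move. Big caveat: the value name ManageDisksOnSystemBuses and the exact Parameters subkey paths are my recollection of the KB article, not something spelled out in this thread, so check them against the article before changing anything. The helper names are made up for illustration, and the script only builds a command line for review; it does not touch the registry.

```python
# Sketch only: choose the registry key that holds the SAN-boot setting,
# depending on the Windows Server 2003 service pack level.
# ASSUMPTION: the value name and key paths are recalled from the KB article
# and must be verified there before use.

SERVICES = r"SYSTEM\CurrentControlSet\Services"
VALUE_NAME = "ManageDisksOnSystemBuses"  # assumed name; DWORD set to 1


def san_boot_key(service_pack: int) -> str:
    """Return the HKLM subkey for the given Win2K3 service pack level."""
    if service_pack < 0:
        raise ValueError("service pack level cannot be negative")
    # Pre-SP1: under ClusSvc, which is regenerated when a node is evicted.
    # SP1 and later: under ClusDisk, which survives an eviction.
    service = "ClusSvc" if service_pack == 0 else "ClusDisk"
    return "\\".join((SERVICES, service, "Parameters"))


def reg_add_command(service_pack: int) -> str:
    """Build (but do not run) a reg.exe command line for review."""
    key = san_boot_key(service_pack)
    return f'reg add "HKLM\\{key}" /v {VALUE_NAME} /t REG_DWORD /d 1 /f'


if __name__ == "__main__":
    # Print the command for an SP1 node so it can be checked by eye.
    print(reg_add_command(1))
```

Printing the command rather than running it is deliberate: on a cluster node you want to read the line before it goes anywhere near the registry.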

From what you said, you were able to make node 3 a member of the cluster,
and from prior messages, you were eventually able to move both the Cluster
Resource Group and DTC Resource Groups between all three cluster nodes.

You then installed a 2nd instance of SQL Server. May I ask which member
nodes were included in the SQL Server Setup? All 3 nodes? I have had
problems installing SP4 prior to SP3, even on a new installation, but if you
were able to, good. Can all member nodes take ownership of this second
instance?

Now, you are trying to tackle including the new 3rd node as a potential
owner of the 1st SQL Server instance, but are running into disk ownership
messages. What is the exact error message? When do you receive it? Where
did you locate the message? What method did you use to place the RTM bits
on the 3rd node? You will not be able to host disks or services on node 3
for SQL Server instance one until the bits are at the same patch level. The
cluster service will not allow you to move the group to a node that is not
a designated possible owner of every resource in it. If node 3 is a
possible owner of all of them, then you could offline the SQL Server
resources, and then try to take ownership and bring only the disks, IP, and
Network Name online. Is this when you get your error messages? If so, then
it is probably a disk signature collision on node 3. If this is the case,
examine HKLM\SYSTEM\MountedDevices. All disclaimers apply, but delete the
entries for any shared disks, both the GUID and the drive-letter entries.
Reboot the server, and then try to take ownership again without launching
the LDM Administrator (remember, the SQL Server resources will not come
online until the bits are at the same patch level).
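
Before deleting anything, it may help to list what is actually under MountedDevices. The following is a read-only Python sketch, not a fix. It assumes the value names follow the usual \DosDevices\X: and \??\Volume{GUID} patterns; confirm that on your own node. The function names are mine, and the script cannot tell you which entries belong to shared disks. You still have to match them against the cluster disks yourself.

```python
# Sketch only: sort MountedDevices value names into drive-letter and
# volume-GUID entries so they can be reviewed before anything is deleted.
import re
import sys

MOUNTED_DEVICES = r"SYSTEM\MountedDevices"
# ASSUMPTION: these are the usual value-name formats under MountedDevices.
DRIVE_RE = re.compile(r"^\\DosDevices\\([A-Za-z]):$")
VOLUME_RE = re.compile(r"^\\\?\?\\Volume\{[0-9A-Fa-f-]{36}\}$")


def classify(names):
    """Split value names into drive letters, volume GUIDs, and the rest."""
    result = {"drive": [], "volume": [], "other": []}
    for name in names:
        if DRIVE_RE.match(name):
            result["drive"].append(name)
        elif VOLUME_RE.match(name):
            result["volume"].append(name)
        else:
            result["other"].append(name)
    return result


def read_value_names():
    """Read-only listing of the value names; works only on Windows."""
    import winreg  # standard library, Windows only

    names = []
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, MOUNTED_DEVICES) as key:
        value_count = winreg.QueryInfoKey(key)[1]
        for index in range(value_count):
            names.append(winreg.EnumValue(key, index)[0])
    return names


if __name__ == "__main__" and sys.platform == "win32":
    for kind, entries in classify(read_value_names()).items():
        print(kind, entries)
```

Export the key first (regedit, File, Export) so the entries can be restored if the wrong ones are removed.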

If this is not the case, then I will need the answers to the above
questions.

Now, when the 4th cluster node was added, were you again successful in
adding it as a cluster node member? When you installed the 3rd SQL Server
instance, which cluster nodes did you add as potential owners? Are you
again attempting to add node 4 as a potential owner of SQL Server instance
1? And you are receiving the same error messages as on node 3?

Is it only the 1st SQL Server instance that you are having trouble with?

We will probably have to swap messages back and forth for a bit, but the
more explicit you can be, and any supplementary documents you can provide
(error logs, methods, etc.), the quicker this will go.

Sincerely,


Anthony Thomas




--

"hmscott" <hmscott@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:9547B1C4-9952-4B40-BE0C-E0AD864505F7@xxxxxxxxxxxxxxxx
I read through the KB article, but I'm not 100% certain it's going to apply
in this situation, unless it addresses broader issues that I'm not
catching.

Here is our complete set up (and a bit more history below):
Node 1: IBM x346, QLogic dual port HBA, 4 on board drives configured as RAID 10
Node 2: IBM x346, QLogic dual port HBA, 4 on board drives configured as RAID 10
Node 3: Dell 2650, QLogic dual port HBA, 5 on board drives; RAID 1 and RAID 5
Node 4: Dell 2650, 2 QLogic single port HBAs, 5 on board drives; RAID 1 and RAID 5

In the cluster administrator, there are 5 Groups:
Cluster Group (Disk, IP and Cluster Name)
MS DTC Group (Disk, IP, Network Name and DTC Application)
DBC01 (Disks, SQL Server resources)
DBC02 (Disks, SQL Server resources)
DBC03 (Disks, SQL Server resources)

Originally, this was a two node Active/Passive cluster using just the IBMs.
We had a second Active/Passive cluster running on some older machines and
the idea was to consolidate things into a three-node Active/Active/Passive
cluster. Problems cropped up as soon as I added the third node. The Cluster
Wizard couldn't find a shared (Quorum capable) disk on node three. I found a
KB article that indicated that this sometimes happens in Windows 2003 (pre
SP1, which we were at the time). The solution was to hit the back button,
and use the advanced options to specify minimal checks during the analysis
phase. The warning was to be certain that the drives were configured
correctly.

I was able to proceed after that with no trouble at all, loading a second
instance of SQL 2000 and then patching it to SP4. I then started to
configure the newly added third node to be "aware of" the first SQL
Instance (since it was newly added, it had no knowledge of the first
instance). I was able to load the RTM binaries, but when it came time to
load SP4, I came across this same error (Virtual disks owned by another
node). I get the same error with SP3, by the way.

I've been living with it for some time now, but recently we took our three
node Active/Active/Passive cluster and made it a four node
Active/Active/Active/Passive with a third instance of SQL Server 2000.
That's where I am now.

Sorry, I realize it's a bit of a long tale.

Also, I have pretty much the exact same configuration in our test
environment. The only exception is that all of our hardware in test is Dell
(all Dell 2650s). Everything that I have done in our production
environment, I have done in the test environment first (with no troubles at
all).

I think I'm going to open a ticket with MS, but I'm a bit reluctant since
it's not a supported configuration for clustering. I'm afraid all they're
going to say is, "nice try, do it all over again."

Thanks for your kind responses.

Regards,

hmscott
"Anthony Thomas" wrote:

How to add a registry value to a Windows Server 2003-based computer that you
start from a SAN so that the startup disk, the pagefile disks, and the
cluster disks are all on the same SAN fabric

http://support.microsoft.com/kb/886569


Anthony Thomas


--

"hmscott" <hmscott@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:501F7E4D-6D5F-45C2-9D31-EBD1638F2361@xxxxxxxxxxxxxxxx
Scratch this post (I really am stretched thin). I found your reply to my
other post.

Yes! I am using multi-pathing software. I am currently using IBM SDD Driver
(we're mounted on a DS 8100 through an IBM SVC).

Can you tell me what the reg fix is for multi-pathing? Or just point me at
the MS KB?

Regards,

hmscott

"hmscott" wrote:

Anthony,

I'm sorry, I searched through the forum and did not find your previous post.
I looked at all of my posts and the oldest one I could find was from 10/27
(which is maybe why I reposted on Saturday).

You mention that the shared drives are not properly configured. I won't
dispute the point, but could you please elaborate?

I recognize that something is wrong in the MSCS configuration because when I
went to add another node in the cluster, the cluster configuration utility
gave me a warning that the fourth node in the cluster was unable to detect
the Quorum disk. I was able to "get around" that error message by using the
advanced configuration which bypasses some of the checks. Once the node was
added to the cluster, I was able to use the "Move Group" to pass the
"Cluster Group" (containing the Quorum drive) around to all four nodes in
the cluster with no trouble. Likewise, I was able to pass the MSDTC group
around to all nodes in the cluster.

I assumed that the error was because the local (on board) SCSI drives are
different among the four nodes. On two nodes, there is only one virtual disk
while on the other two nodes there are two virtual disks.

I again apologize if I did not pick up on your response to my first post. I
realize it's very bad etiquette, but I am stretched fairly thin at this
point. I would be very grateful for any added assistance you can provide.

Regards,

hmscott
hugh.scott3.at.verizon.net

"Anthony Thomas" wrote:

I think you had an additional post, which I responded to, but you had some
things here that concern me as well.

First, you are installing SS2K on a Win2K3 SP1 machine. SP3 will be
required. Moreover, SP1 security enhancements disable TCP until after you
apply SP3.

Specifically, items 5 and 6 below are unnecessary because item 2 already
deployed the RTM binaries.

The sequence is:

1. Add new node to cluster.
2. Install RTM SS2K and designate the node membership.
3. Create Named Pipe aliases on each member node for the Named Instance you
are managing.
4. Install SS2K SP3 from the active node.
5. Install SS2K SP4 from the active node.
6. Install at least SP4 HF2040, although it is highly recommended that you
apply SP4 HF2187.
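
To see where a given node sits in that sequence, a short Python sketch along these lines may help. It maps a SQL Server 2000 version string to the patch levels named above. The build numbers for SP3 (760) and SP4 (2039) are from my own notes rather than from this thread, only the levels mentioned here are listed (SP1 and SP2 are left out), and the function names are just for illustration.

```python
# Sketch only: map SQL Server 2000 build numbers to the patch levels in the
# sequence above and report which levels are still ahead of a given build.
# ASSUMPTION: 760 = SP3 and 2039 = SP4 come from my notes, not this thread.
LEVELS = [
    (194, "RTM"),
    (760, "SP3"),
    (2039, "SP4"),
    (2040, "SP4 HF2040"),
    (2187, "SP4 HF2187"),
]


def build_number(version: str) -> int:
    """Turn '8.00.194' into 194, rejecting anything that is not 8.00.x."""
    major, minor, build = version.strip().split(".")
    if (int(major), int(minor)) != (8, 0):
        raise ValueError(f"not a SQL Server 2000 version: {version}")
    return int(build)


def patch_level(version: str) -> str:
    """Return the highest listed level at or below the given build."""
    build = build_number(version)
    label = None
    for floor, name in LEVELS:
        if build >= floor:
            label = name
    if label is None:
        raise ValueError(f"build {build} is older than RTM")
    return label


def levels_ahead(version: str) -> list:
    """List the levels in the table that are above the given build."""
    build = build_number(version)
    return [name for floor, name in LEVELS if floor > build]


def same_level(versions) -> bool:
    """True when every node reports the same patch level."""
    return len({patch_level(v) for v in versions}) <= 1


if __name__ == "__main__":
    print(patch_level("8.00.194"), levels_ahead("8.00.194"))
```

Feeding it the version each node reports for an instance, for example same_level(["8.00.2040", "8.00.194"]), is a quick way to spot a node that is still at RTM.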

Nevertheless, from your other post, it appears that you still have not
gotten your shared volumes configured correctly first. This is a
prerequisite.

Sincerely,


Anthony Thomas


--

"hmscott" <hmscott@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:9D800B12-9E4A-4A59-9E61-96CD6BDC1A7A@xxxxxxxxxxxxxxxx
O/S: Windows 2003 Enterprise SP1
SQL Server 2000 EE (32 bit)

I had a three-node A/A/P cluster.

This weekend, I added a fourth node and a third active instance
(A/A/A/P).
The installation process went smoothly:

1. Add new node to cluster
2. Install SQL 2000 (RTM) (new named instance)
3. Install SQL 2000 (SP4)
4. Install SQL 2000 (hf 2040)
5. Install SQL 2000 binaries on newly added node for Instance 1
6. Install SQL 2000 binaries on newly added node for Instance 2

The problem came when I tried to apply SP4 on the new node for Instances 1
and 2. I keep getting the following error:

"All cluster disks available to this virtual server are owned by other
nodes."

It comes right after I type in the name of the Virtual Server to be
maintained. I am outside the maintenance window now, so I can't run SP4 from
the instance that "owns" the resources. I read through the installation
guide a couple of times, and I've searched a fair amount on the boards. I
see where other people have come across a different error (See
http://support.microsoft.com/?kbid=905286), but I've never seen anyone with
this particular error.

I have another three-node cluster in the test environment and when I added
the third node for that cluster in, things went just fine.

I'd be grateful for any thoughts and/or suggestions.

Here is the tail end of the sqlsp.log file:

13:07:47 Previous Install Version: 8.00.194
13:07:47 ReleaseSetupTopology
13:07:47 End Action DialogShowSdMaintain
13:07:47 begin ShowDialogsUpdateMask
13:07:47 nFullMask = 0xb034603, nCurrent = 0x400, nDirection = 1
13:07:47 Updated Dialog Mask: 0xb03e607, Disable Back = 0x1
13:07:47 Dialog 0x400 returned: 1
13:07:47 End Action ShowDialogsHlpr
13:07:47 ShowDialogsGetDialog returned: nCurrent=0x2000,index=13
13:07:47 Begin Action ShowDialogsHlpr: 0x2000
13:07:47 Begin Action: DialogShowSdUpgrade
13:07:47 ShowDlgUpgrade returned : 1
13:07:47 Checking databases on instance 'MSSQLSERVER'
13:07:47 Begin Action: Check for VS Node
13:07:50 All cluster disks available to this virtual server are owned by other node(s).
13:07:51 Setup was unable to verify the state of the server for an upgrade. Verify the server is able to start and that you provided a valid sa password and restart setup.
13:07:51 End Action DialogShowSdUpgrade
13:07:51 End: ShowDialogs()
13:07:51 Action CleanUpInstall:
13:07:51 Installation Failed.
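
For anyone comparing several of these logs, here is a small Python sketch that pulls out the lines that matter from a sqlsp.log tail like the one above: the previous install version, the disk-ownership message, and the final failure. It is an illustration only. The function names are mine, it assumes every entry starts with an HH:MM:SS stamp as in this excerpt, and it re-attaches any continuation lines in case the log was wrapped in transit.

```python
# Sketch only: summarize a sqlsp.log tail.
# ASSUMPTION: each log entry begins with an HH:MM:SS timestamp; lines
# without one are treated as wrapped continuations of the entry above.
import re

STAMP_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}) (.*)$")
VERSION_RE = re.compile(r"Previous Install Version: (\d+\.\d+\.\d+)")


def join_wrapped(text):
    """Re-attach continuation lines to the timestamped entry above them."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = STAMP_RE.match(line)
        if match:
            entries.append([match.group(1), match.group(2)])
        elif entries:
            entries[-1][1] += " " + line
    return [(stamp, message) for stamp, message in entries]


def summarize(text):
    """Report the previous version and the two failure markers, if present."""
    entries = join_wrapped(text)
    version = None
    for _, message in entries:
        found = VERSION_RE.search(message)
        if found:
            version = found.group(1)
    return {
        "previous_version": version,
        "disks_owned_elsewhere": any(
            "owned by other node" in message for _, message in entries
        ),
        "failed": any("Installation Failed" in message for _, message in entries),
    }
```

Run against the excerpt above, the interesting combination is a previous version of 8.00.194 together with the disk-ownership message, which fits a node that is still at RTM and does not own the disks at the time setup runs.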


Regards,

hmscott







