Cluster will not fail over.
- From: "Admiral" <admiral@xxxxxxxxxxxxxxxxxxx>
- Date: Tue, 20 Dec 2005 14:34:47 -0800
We had an error of massive proportions over the weekend (Sunday 3pm PST).
Long story short: the model database was detached and SQL Server was
stopped while it was still detached. This happened on our primary
production clustered database server, which is the bread and butter of
the company. (OUCH!)
It was time for some fast action. We started
to re-install SQL Server. In order to do so, the previous install had to
be uninstalled. This seemed to go smoothly enough, but when re-applying
SP3a, we encountered an error. After researching the error, it appears that
in a clustered environment this will occur because the SP3a files still
reside on the node(s). Microsoft states that if a particular log file
reports 'Installation was Successful', the error can be disregarded.
I double-checked the log file and, sure enough, it reported success, so we
disregarded the error.
We moved along with the installation. We were
able to restore all the user databases and all system databases with the
exception of the master database. Unfortunately, even with SQL Server
started in single-user mode, the restore of the master database would not
take. So master was not restored, but all other databases were.
Fortunately, I ran a quick script to recover all the user logins from before
the disaster, which I then reapplied to the new installation of SQL Server.
Everything came back up and the QA Team
successfully tested the production Application (Monday 4am PST).
(Phew!)
After the successful testing of the production
environment, we tested the fail-over, which resulted in SQL Server not
starting on the secondary node. All the other resources came right up on it,
but not SQL Server. The only error was that it was not able to locate the
file 'O\logs\mastlog.ldf'. This error did not make sense, since SQL
Server uses the same file on the primary node. We were pressed for
time since it was close to the start of business East Coast time, so we left
the server as is.
Throughout the day other issues arose. One in particular was that certain
systems were not able to connect to the server via TCP/IP. In order to
connect, they needed to create an alias for the server and use Named Pipes.
This is a growing concern because there are users who need to connect via
ODBC from a widely used Access application, which seems to work only over
TCP/IP. I am fairly sure this is related to the cluster
failure.
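
One quick way to narrow down the TCP/IP symptom is to check, from an affected
client, whether a plain TCP connection to the SQL Server port can be opened at
all. The following is only a minimal sketch: the function name can_connect_tcp
is made up for illustration, and port 1433 is an assumption (it is the default
for a default instance; a clustered or named instance may be listening on a
different port).

```python
import socket


def can_connect_tcp(host: str, port: int = 1433, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Assumption: 1433 is the port SQL Server listens on. Adjust it if the
    instance was configured with a different port.
    """
    try:
        # create_connection resolves the name and attempts the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, can_connect_tcp("SQLVIRTUAL01") could be run from a client that
only works over Named Pipes, where SQLVIRTUAL01 is a placeholder for the
cluster's virtual server name. A False result would suggest the server is not
listening on that port (or something in between is blocking it), rather than a
problem in the ODBC layer itself.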
Anyway, this is the first chance I've had to take a
breath and revisit the problem at hand. We have been dealing with another
server that crashed on the same day, resulting in a brand new build of a SQL
Server cluster environment (completely unrelated to the issue at
hand).
I'm sorry for the long-winded story. Would
you have any idea why SQL Server would fail to start on failover, and what
might be causing the TCP/IP issue?
Thanks in advance.