Re: Max SAN time out
- From: "Michael Epprecht [MSFT]" <michael.epprecht@xxxxxxxxxxxxxxxxxxxx>
- Date: Wed, 20 Jun 2007 21:31:27 +0200
Hello
It depends on whether a hard error is raised by the IO driver or the OS, or by neither. A soft failure is more tolerable.
If a hard error comes up the driver stack to SQL Server, it will “immediately” fail the affected DB. Losing the physical connection, say by pulling the IDE/SCSI interface out of the local drive or pulling out the HBA fiber cable, is reported immediately: the bus dies, the hardware and IO driver detect it, and the volume goes offline. That is a hard failure, similar to disconnecting the network cable.
If, say, a SAN fiber switch keeps the port to the SQL Server up, but the SAN drops completely on another port and the switch does not return an error code, you get into timeouts defined by the IO driver and the OS (normally this depends on the drivers). That is a soft failure followed by a hard failure.
If you are trunking the HBA fiber traffic over ATM or a MAN infrastructure, you generally wait for timeouts before SQL Server notices it. Soft failure followed by a hard failure.
One thing is for sure: do not rely on any type of “grace period”, as it may differ from one IO driver version, OS SP version, or even SQL Server SP level to the next. If the connectivity is not reliable, don’t put SQL Server near it unless you like data loss.
As far as “how long SQL can survive” once a file/drive starts failing or a soft failure occurs, the answer is “it depends”. There is no explicit timeout. SQL doesn’t proactively do anything about an IO error other than a few retries. If those fail it just reports the OS error to whatever functionality is asking for the read or write.
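To make the "a few retries, then report the OS error" behaviour concrete, here is a minimal Python sketch. It is a toy model only, not SQL Server's actual code: the function name, the `FlakyDisk` class, and the attempt count of 4 are all assumptions made for illustration (the real number of retries is not stated here).

```python
import errno


def read_with_retries(read_page, page_id, max_attempts=4):
    """Try an IO a few times; if every attempt fails, re-raise the last OS error.

    max_attempts=4 is an arbitrary illustrative value, not a documented setting.
    """
    last_error = None
    for _ in range(max_attempts):
        try:
            return read_page(page_id)
        except OSError as exc:
            last_error = exc  # remember the OS error and try again
    # No timeout and no other recovery: the caller simply gets the OS error.
    raise last_error


class FlakyDisk:
    """Hypothetical device that fails a fixed number of times, then works."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def read(self, page_id):
        self.calls += 1
        if self.calls <= self.failures:
            raise OSError(errno.EIO, "simulated I/O error")
        return f"page-{page_id}"
```

A device that recovers within the retry budget is invisible to the caller; one that does not recover surfaces the raw `OSError` to whatever asked for the read or write.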
If you are just doing SELECTs against the data, then a query that has to read from the failing file will fail with a severe error, but other queries may find their data already in the buffer pool and work just fine. Even UPDATEs can succeed if they don’t need the data on the bad file, or if it is already in cache, as long as the disk the log is on is working.
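The effect of the buffer pool can be illustrated with a small Python sketch. This is a toy cache under stated assumptions, not how SQL Server's buffer pool is implemented; `ToyBufferPool`, `ToyDisk`, and the `healthy` flag are hypothetical names used only for the example.

```python
import errno


class ToyDisk:
    """Hypothetical disk whose reads fail once `healthy` is set to False."""

    def __init__(self):
        self.healthy = True

    def read(self, page_id):
        if not self.healthy:
            raise OSError(errno.EIO, "simulated I/O error")
        return f"data-for-page-{page_id}"


class ToyBufferPool:
    """Serve pages from memory when possible; touch the disk only on a miss."""

    def __init__(self, disk):
        self.disk = disk
        self.cache = {}

    def get_page(self, page_id):
        if page_id in self.cache:
            return self.cache[page_id]  # cached: the disk is never touched
        data = self.disk.read(page_id)  # miss: may raise OSError
        self.cache[page_id] = data
        return data


disk = ToyDisk()
pool = ToyBufferPool(disk)
pool.get_page(1)          # read while the disk is still healthy, now cached
disk.healthy = False      # the file/drive starts failing
cached = pool.get_page(1)  # still succeeds, served from memory
```

After the failure, only a request for a page that is not yet cached (for example `pool.get_page(2)`) raises the error, which is why some queries keep working while others fail.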
Failure of the DB can occur if a transaction needs to ROLLBACK and finds it can’t do the reads or writes that it needs to do. A rollback failure will cause a database restart. If the database happens to be tempdb or master, it will cause an instance restart.
If the restart runs into the same error, then the database can be stuck in a failed startup state until someone fixes the problem or chooses to restore the DB.
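The rollback failure rules above can be summarised as a small decision function. This Python sketch only encodes the description in this post; the function name, the returned strings, and the "online" outcome for a restart that does not hit the error again are assumptions for illustration, not SQL Server terminology.

```python
SYSTEM_DATABASES = {"tempdb", "master"}


def after_rollback_failure(db_name, restart_hits_same_error):
    """Return (restart_scope, resulting_state) for a failed ROLLBACK.

    restart_scope: "instance" for tempdb/master, otherwise "database".
    resulting_state: "failed startup" if the restart hits the same IO error,
    otherwise "online" (assumed: the restart got past the error).
    """
    scope = "instance" if db_name.lower() in SYSTEM_DATABASES else "database"
    state = "failed startup" if restart_hits_same_error else "online"
    return scope, state


print(after_rollback_failure("Sales", restart_hits_same_error=False))
print(after_rollback_failure("tempdb", restart_hits_same_error=True))
```

Here `Sales` is just a made-up user database name. A "failed startup" result stays that way until someone fixes the underlying problem or restores the DB.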
--
Regards
Michael Epprecht [MSFT]
This posting is provided "AS IS" with no warranties, and confers no rights.
"Zekske" <Zekske@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message news:705A33BA-2AF8-4CCB-9CB2-130F51EA2B33@xxxxxxxxxxxxxxxx
What is the maximum disk or SAN time-out a SQL-server cluster can handle?
Is this documented anywhere?
Regards