Re: Max SAN time out



Hello

It depends on whether a hard error was raised by the IO driver or the OS, or by neither. A soft failure is more tolerable.

If a hard error comes up the driver stack to SQL Server, it will “immediately” fail the affected DB. Losing the physical connection, say by pulling the IDE/SCSI cable out of a local drive or pulling out the HA fiber cable, is reported immediately: the bus dies, the hardware and IO driver detect it, and the volume goes offline. That is a hard failure, similar to disconnecting a network cable.

If, say, a SAN fiber switch keeps the port to the SQL Server up but the SAN drops completely on another port, and the switch does not return an error code, you are then into timeouts defined by the IO driver and the OS (normally this depends on the drivers). That is a soft failure followed by a hard failure.

If you are trunking the HBA fiber traffic over ATM or a MAN infrastructure, you generally have to wait for timeouts to expire before SQL Server notices the problem. Again, a soft failure followed by a hard failure.

One thing is for sure: do not rely on any kind of “grace period”, as it may differ from one IO driver version, OS SP version, or even SQL Server SP level to the next. If the connectivity is not reliable, don’t put SQL Server near it unless you like data loss.


As far as “how long SQL can survive” once a file/drive starts failing or a soft failure occurs, the answer is “it depends”. There is no explicit timeout. SQL Server doesn’t proactively do anything about an IO error beyond a few retries. If those fail, it just reports the OS error to whatever functionality asked for the read or write.
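To make the "few retries, then report the OS error" behaviour concrete, here is a minimal Python sketch. It is only an illustration, not how SQL Server is implemented: the function names, the storage stub, and the attempt count of 4 are all assumptions made up for the example. The point it shows is that there is no timer anywhere; the caller only sees a failure once the lower layer has actually returned an error on every attempt.

```python
import errno


def read_with_retries(read_page, page_id, max_attempts=4):
    """Call read_page(page_id) up to max_attempts times.

    No timeout is involved: we give up only after the lower layer has
    raised an error on every attempt, and then the last OS error is
    passed up unchanged to whoever asked for the read.
    max_attempts=4 is an arbitrary value chosen for this sketch.
    """
    last_error = None
    for _ in range(max_attempts):
        try:
            return read_page(page_id)
        except OSError as exc:
            last_error = exc  # remember the error and try again
    raise last_error


def make_flaky_reader(failures_before_success):
    """Hypothetical storage stub: fails N times with EIO, then succeeds."""
    state = {"calls": 0}

    def read_page(page_id):
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise OSError(errno.EIO, "simulated I/O error")
        return f"page-{page_id}"

    return read_page
```

With `make_flaky_reader(2)` the read succeeds on the third attempt and the caller never notices; with `make_flaky_reader(10)` the caller gets the `OSError` after the fourth attempt. How long each attempt takes to fail is decided entirely by the layers underneath, which is why no single timeout figure can be quoted.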

If you are just doing SELECTs against data on the failing file, the query will fail with a severe error, but other queries may find their data already in the buffer pool and work just fine. Even UPDATEs can succeed if they don’t need data from the bad file, or if that data is already in cache, as long as the disk holding the log is working.
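The cache effect can be shown with a toy model in Python. This is a deliberately simplified sketch, not SQL Server's buffer manager: `ToyBufferPool` and `ToyDisk` are hypothetical names invented for the example, and logging, writes, and eviction are left out entirely.

```python
import errno


class ToyDisk:
    """Hypothetical disk that can be switched into a failing state."""

    def __init__(self):
        self.healthy = True

    def read(self, page_id):
        if not self.healthy:
            raise OSError(errno.EIO, "simulated I/O error")
        return f"page-{page_id}"


class ToyBufferPool:
    """Serve pages from memory when possible, otherwise read from disk."""

    def __init__(self, read_from_disk):
        self._read_from_disk = read_from_disk
        self._cache = {}

    def get_page(self, page_id):
        if page_id in self._cache:
            return self._cache[page_id]       # cache hit: disk not touched
        data = self._read_from_disk(page_id)  # cache miss: may raise OSError
        self._cache[page_id] = data
        return data
```

If page 1 is read while the disk is healthy and `disk.healthy` is then set to `False`, a later `get_page(1)` still succeeds from memory, while `get_page(2)` raises the `OSError`. That is the same pattern as above: some queries keep working and others fail, depending only on which pages they need.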

Failure of the DB can occur if a transaction needs to ROLLBACK and finds it can’t do the reads or writes that it needs to do. A rollback failure will cause a database restart. If the database happens to be tempdb or master, it will cause an instance restart.

If the restart runs into the same error, then the database can be stuck in a failed startup state until someone fixes the problem or chooses to restore the DB.

--
Regards
Michel Epprecht [MSFT]

This posting is provided "AS IS" with no warranties, and confers no rights.
"Zekske" <Zekske@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message news:705A33BA-2AF8-4CCB-9CB2-130F51EA2B33@xxxxxxxxxxxxxxxx
What is the maximum disk or SAN time-out a SQL-server cluster can handle?
Is this documented anywhere?

Regards
