Re: Max SAN time out



Thanks

You answered our question. Everything on our SAN should be redundant, but you
never know...

"Michael Epprecht [MSFT]" wrote:

Hello

It depends on whether a hard error was raised by the IO driver or the OS, or
by neither. A soft failure is more tolerable.

If a hard error comes up the driver stack to SQL Server, it will
“immediately” fail the affected DB. Losing the physical connection, say by
pulling the IDE/SCSI interface out of a local drive or pulling out the HA
Fiber cable, is reported immediately: the bus dies, the hardware and IO
driver detect it, and the volume goes offline. That is a hard failure,
similar to disconnecting the network cable.

If, say, a SAN Fiber Switch keeps the port up to the SQL Server, but the SAN
drops totally on another port and the switch does not return an error code,
you then get into timeouts defined by the IO driver and the OS (normally it
depends on the drivers). That is a soft failure followed by a hard failure.

If you are trunking the HBA fiber traffic over ATM or a MAN infrastructure,
SQL Server generally does not notice an outage until the timeouts expire.
Again, a soft failure followed by a hard failure.

One thing is for sure: do not rely on any type of “grace period”, as it may
differ from one IO driver version, OS SP version, or even SQL Server SP
level to the next. If the connectivity is not reliable, don’t put SQL Server
near it unless you like data loss.


As for “how long SQL Server can survive” once a file/drive starts failing or
a soft failure occurs, the answer is “it depends”. There is no explicit
timeout. SQL Server doesn’t proactively do anything about an IO error other
than a few retries. If those fail, it just reports the OS error to whatever
functionality is asking for the read or write.
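
To make the "few retries, then report the OS error" behaviour concrete, here is a minimal Python sketch. It is only an illustration and not SQL Server's actual code: the function name read_with_retries, the FlakyDisk class and the retry count of 4 are all made up for the example.

```python
import errno


def read_with_retries(read_page, page_id, max_retries=4):
    """Attempt a read a few times; if every attempt fails, hand the
    last OS error back to the caller unchanged (no timeout logic)."""
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    last_error = None
    for _ in range(max_retries):
        try:
            return read_page(page_id)
        except OSError as exc:
            last_error = exc  # remember the error and try again
    # All retries failed: report the OS error to whoever asked for the read.
    raise last_error


class FlakyDisk:
    """Hypothetical disk that fails a fixed number of reads, then works."""

    def __init__(self, failures_before_success):
        self.remaining_failures = failures_before_success
        self.calls = 0

    def read(self, page_id):
        self.calls += 1
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise OSError(errno.EIO, "simulated I/O error")
        return f"page-{page_id}"
```

A disk that recovers within the retry budget looks like a brief hiccup to the caller, while one that stays down surfaces as an OSError on that particular request only. Nothing in the sketch decides to take anything offline by itself.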

If you are just doing SELECTs against the data, then a query that has to
read from the failed disk will fail with a severe error, but other queries
may find their data already in the buffer pool and work just fine. Even
UPDATEs can succeed if they don’t need the data on the bad file, or if it is
already in cache, as long as the disk the log is on is working.
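
The cache effect can be shown with a toy model in Python. This is a rough sketch under simple assumptions, not how the real buffer pool is implemented; ToyDisk and ToyBufferPool are hypothetical names used only here.

```python
class ToyDisk:
    """Hypothetical data volume that can be switched offline."""

    def __init__(self, pages):
        self.pages = dict(pages)
        self.online = True

    def read(self, page_id):
        if not self.online:
            raise OSError("simulated: data volume is offline")
        return self.pages[page_id]


class ToyBufferPool:
    """Serve pages from memory when possible; go to disk only on a miss."""

    def __init__(self, disk):
        self.disk = disk
        self.cache = {}

    def get_page(self, page_id):
        if page_id in self.cache:
            return self.cache[page_id]      # cache hit: the disk is never touched
        data = self.disk.read(page_id)      # cache miss: may raise OSError
        self.cache[page_id] = data
        return data


disk = ToyDisk({1: "customers", 2: "orders"})
pool = ToyBufferPool(disk)
pool.get_page(1)        # page 1 is read while the disk is healthy and gets cached
disk.online = False     # the data volume now fails

cached_result = pool.get_page(1)  # still works, served from cache
try:
    pool.get_page(2)              # needs the disk, so this request fails
    uncached_failed = False
except OSError:
    uncached_failed = True
```

Only the request for page 2 sees the failure. The toy model leaves out the log entirely, which is why the point above about the log disk still matters for UPDATEs.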

Failure of the DB can occur if a transaction needs to ROLLBACK and finds it
can’t do the reads or writes that it needs to do. A rollback failure will
cause a database restart. If the database happens to be tempdb or master,
it will cause an instance restart.

If the restart runs into the same error, then the database can be stuck in a
failed startup state until someone fixes the problem or chooses to restore
the DB.

--
Regards
Michael Epprecht [MSFT]

This posting is provided "AS IS" with no warranties, and confers no rights.

"Zekske" <Zekske@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:705A33BA-2AF8-4CCB-9CB2-130F51EA2B33@xxxxxxxxxxxxxxxx
What is the maximum disk or SAN time-out a SQL Server cluster can handle?
Is this documented anywhere?

Regards

