Re: Store File in SQLServer via ADO

From: mcginty (mcginty_at_discussions.microsoft.com)
Date: 07/20/04


Date: Tue, 20 Jul 2004 16:09:03 -0700


"William (Bill) Vaughn" wrote:

> By "far more efficient" I mean 6x slower. Even with SQL Server 2000, we've
> run our own test and seen other tests that show that BLOB retrieval is far
> slower than fetching a path and opening the path to the target file--even
> with a convoluted file structure.

What was the test environment like? Single/few users? Single machine/one-hop LAN or multiple segments? Network speed? Number of files in the set? Concurrent access?

One implementation I was working with had a file set of ~500,000 files, consuming ~55GB. Ever had to move, copy or delete 500K files? Right off the top, set management jobs enjoy a huge benefit.

Our tests showed that performance over a range of load conditions was close enough that it was well worth the benefits, like the ability to easily retain the last n revisions, keep track of who's in the file, setup specific levels of auditing and run complex queries against the data, point in time restore, the list goes on and on.

The first implementation was a regular old LANMan file server but it was a travesty, SMB and NBT overhead incurring network load that escalated exponentially, heinously time-consuming glitches caused by concurrency, master browsers taking dives into the file system that lasted a minute or more... utter hell.

> Since the BLOB type is not directly usable by the target application (as when
> fetching a Word document or Excel spread***, you often have to write the
> BLOB to a local temp file which slows the process even further.

Not if you set the content type and write a binary stream to the Response context. The browser may do it behind the scenes, but it's seamless and fast. (I know for sure it's possible to do HTTP transfers without creating a file, but whether IE ever does that I won't guess.)

Of course, sending binary data via XML/HTTP means doubling the size of the blob for transfer (binary data is sent as a text representation of hex.) On the surface that seems excessive, but I figure that there are many, many things that don't need to be unicode, but are anyway, so I've reasoned inductively that a bloat factor of 2x is inconsequential, per defacto standards set by common O/S vendor practices.

> In many cases, these BLOBs are RO. However, unless you're ready to setup
> your backups very carefully you'll end up backing up these files over
> and over again along with the other data that does change.

So are you saying that is *not* the case when you do a full backup of a hard drive? :-) Full vs incremental still applies to either store. Also that it gets backed up at all is a plus in my book.

When you say "setup backups very carefully" I assume you're talking about planning a file group strategy to reduce handling of effectively r/o data -- I've found it can be damn handy to have that option.

> When you fetch a BLOB, you're also pushing out other "real" data from
> the data cache or putting pressure on the procedure cache.

That I won't argue with...

> And who "owns" the file? What if it needs to be accessed
> directly? Does one need a SQL Server license to get to it?

Does not one need a CAL to access a file server? Isn't SQL Server Lite freely distributable? One would have to imagine that license issues have been dealt with prior to implementation, as they exist no matter which way you turn.

> No, I have not been a fan of BLOBs in the database for a decade. While
> (as I said) SQL Server handles BLOBs more efficiently than ever but I
> still does not match fetching the path and accessing the path
> (even through protected file structures).

Well... :-) I hold onto opinions like that for way too long also. But if there's anything that can be definitively said about IT, it's that there are no absolutes. It can never be absolutely said that this is better than that -- personally I never use the word 'impossible' when talking to clients ('extremely unlikely', 'in defiance of all convention, logic and knowledge' but not 'impossible'). There are just too many different possible scenarios, not to mention perspectives.

Add to that the curiously fine line between broken and fixed, crawling and flying, useless and indespensible. I once improved database throughput by at least a factor of 5, just by changing the SQL's packet size. Client had a cheesy router that had 25%-40% loss forwarding 4K packets, and 0% loss with ~1400 byte packets. Sh*t to gold in one config.

Three weeks later I find out that router was just a loaner -- of course no one even mentioned that there had been a network failure prior to my visit. Now they wanted to know what I had done to reduce efficiency of some DTS job that ran at night. From genius and savior, to idiot and menace, just that fast. :-)

What can be learned from this deluge of verbiage? How about, keep an open mind, roll with the changes and the things in the industry that are destined to punish you will usually be a bit more merciful. :-)

-Mark McGinty

 
> ____________________________________
> William (Bill) Vaughn
> Author, Mentor, Consultant
> Microsoft MVP
> www.betav.com
> Please reply only to the newsgroup so that others can benefit.
> This posting is provided "AS IS" with no warranties, and confers no rights.


Loading