Re: meta data & full text search

Tech-Archive recommends: Fix windows errors by optimizing your registry

From: Sharon (talsharon_at_hotmail.com)
Date: 03/10/04


Date: Wed, 10 Mar 2004 23:39:54 +0200

Hi john.
Great, i didn't know i can call OLE-DB Provider from sp.
Cheers!
But i'm still not sure where to put those files.
file system is better streaming.
Even longhorn with its SQL based file system still uses file system to store
the files.
But SQL is easier to implement and to admin.
The files are not going to be larger than 300k
and are not expected to be more than 50000.
According to the "rule of thumb" i should store them in the file system.
If you have more stuff i can read please send the URL.
10x.
Sharon.

"John Kane" <jt-kane@comcast.net> wrote in message
news:e%23P2XWhBEHA.3064@tk2msftngp13.phx.gbl...
> There are more issues (transaction control, change control, audit of
> changes, etc) than just efficiency and scalability, although, those are
> important points... I've seen the ASPFAQ before and it has a number of
> references and
> links to KB articles (I didn't check them all), but one major reason for
> storing the files in a SQL Server table's column defined with an IMAGE
> datatype, is that in SQL Server 2000 you can take advantage of the new
> Full-Text Search (FTS) feature to FT Index the contents of supported file
> types, primarily MS Office, HTML and 3rd party IFilter's like Adobe's PDF
> files.
> As this is one of those FAQ type questions that have been known to start
> religious wars or at the very least a flame war, and while Sharon didn't
ask
> about other application issues that this question often generates as it is
> often an open-ended question, i.e., one that never seems to be
> answered to everyone's satisfactions as it usually "depends" upon the
> application or upon how one defines the word "best"...
>
> As for the Terra Server, checkout the "about page"
> (http://terraserver-usa.com/about.aspx?n=AboutBody) on the site that
> explains how MS and others did it and yes, they used the IMAGE datatype,
> specifically http://terraserver-usa.com/about.aspx?n=AboutTechDbschema
(The
> Imagery table contains the "blob" [image] field where the imagery data is
> stored.
>
> Sharon , depending upon how many files you have, how frequently they
change,
> how large they are, what the app is, etc. you may want to review the web
> site and the following rule of thumb that they used:
>
> < 1 million images or big images (> 1MB) put them in the file system.
> > 1 million and < 1 MB images, put them in SQL Server.
> Note, you can also use "Text-in-Row" if the files are >7000 bytes on avg.
>
> For everything in between, either way will work, depending upon if you
need
> transactional control over your files. Additionally, if you store the
files
> in a TEXT or IMAGE column, you can also store related metadata about that
> file in SQL Server as well for increased searching capabilities. Also, and
> obviously with SQL Server you get built-in support for validating the
> consistency of the database, indices, backup, restore, etc.
>
> As for loading &/or extracting the files from SQL Server there are now
many
> methods of doing this via BCP, BULK INSERT as well as ADO Stream DTS too,
if
> you need to transform your files in some manner.
>
> Hope this helps, as the primary consideration should always be what is
best
> for your application....
> Regards,
> John
>
>
>
> "Hilary Cotter" <hilaryk@att.net> wrote in message
> news:eZrhE1gBEHA.3928@TK2MSFTNGP09.phx.gbl...
> > My point was not that you can or can't store data larger than 8k in a
> table,
> > but I have found that it is more effecient and scalable to store data in
> the
> > char or varchar fields rather than text for retrieval purposed, and
better
> > yet, store it in the file system.
> >
> > I was recommending to Sharon, that she store it in both the file system
> and
> > the database, and if she does store it in the database that she stores
it
> in
> > a child table.
> >
> > I was speculating that the images stored in terraserver have been broken
> > into 8k chunks to achieve the performance benefit of using varchar or
char
> > as opposed to text or image - note you can store binary data in the
char,
> > varchar, or text columns. My specualtion about this is based on what I
> have
> > observed when you retrieve data from a database vs in the file system,
and
> > in a database where the data is at the 8k threshold.
> >
> > I suggest you try http://www.aspfaq.com/show.asp?id=2149
> >
> > This doesn't really address the performance hit of text vs varchar, but
> try
> > it for yourself.
> >
> >
> > "John Kane" <jt-kane@comcast.net> wrote in message
> > news:e1bemKfBEHA.1128@TK2MSFTNGP11.phx.gbl...
> > > Sharon,
> > > You can use the Indexing Service OLE-DB provider MSIDXS and setup a
> Linked
> > > Server via sp_addlinkedserver to query the document stored on disk and
> > > indexed by the Indexing Service. You can then write a SQL Server query
> > that
> > > can both query the IS via OpenQuery as well as the metadata that is
> stored
> > > in your SQL Server table, for example:
> > >
> > > use pubs
> > > go
> > > EXEC sp_addlinkedserver
> > > @server = 'JTKWin2003_IS', -- Unique Server name
> > > @srvproduct = 'Index Server',
> > > @provider = 'MSIDXS', -- OLE-DB Provider
> > > @datasrc = 'IS_DDrive' -- Indexing Service Catalog name
> > > go
> > >
> > > -- your document with a serial number 123 in the where clauase here.
> > > select pub_id from pub_info where pub_id = '0736'
> > > union
> > > SELECT *
> > > FROM OPENQUERY(JTKWin2003_IS,
> > > 'SELECT Filename FROM IS_DDrive..SCOPE()
> > > WHERE CONTAINS(''john'') AND CONTAINS(''kane'')' ) AS me
> > > go
> > >
> > > Hilary, I'm surprised! Of course, you can store >8Kb files in SQL
Server
> > via
> > > storing the files in columns defined with the IMAGE datatype, and
while
> > the
> > > debate on "storing files on disk" vs. "storing files in SQL Server" is
a
> > > long discussion and still actively debated (at near flame-war levels),
> > there
> > > are advantages to storing files in SQL Server, one of which is that
your
> > > able to Full text search the contents of these files, if they are
stored
> > in
> > > SQL Server, and of course that is the subject of this newsgroup! :-))
> > >
> > > John
> > > PS: the 8K chunks stored in the terraserver are satellite images (jpg
> > > files) and not MS Word documents...
> > >
> > >
> > > "Hilary Cotter" <hilaryk@att.net> wrote in message
> > > news:esLewkdBEHA.3024@tk2msftngp13.phx.gbl...
> > > > To query the metadata and the docuement you should do a Boolean AND
> > > search,
> > > > ie
> > > >
> > > > select * from TableName where contains(*,'Searchterm') and
> > > > MetaDataPropertySerialNumber=123
> > > >
> > > > For large numbers of matches this will be expensive.
> > > >
> > > > if your files are over 8k in size they should be stored in the file
> > system
> > > > and the database for performance reasons. Even
> terraserver.microsoft.com
> > > > uses a composite of 8k chunks of images to achieve its high
> performance
> > > > rather than text or image columns with a single image in.
> > > >
> > > >
> > > > "Sharon" <talsharon@hotmail.com> wrote in message
> > > > news:uhsN2ccBEHA.2768@tk2msftngp13.phx.gbl...
> > > > >
> > > > > Hi all.
> > > > > I have an application that stores document meta data in SQL
database
> > > > > and the document itself on the disk.
> > > > > The problem is that i need to search both meta data and the
> documents
> > > > > and display the result in one table.
> > > > > So lets say i need to find a document with a serial number 123
(meta
> > > data
> > > > in
> > > > > SQL)
> > > > > and that contains the word 'banana' (document body in file).
> > > > > How do i do that?
> > > > > The only solution i can come up with is to store the files in the
> > > database
> > > > > and preform the full text search in SQL.
> > > > > But the files are stored on the disk for streaming performance
> > reasons.
> > > > > If you have a better solution please let me know.
> > > > > Thanks.
> > > > > Sharon.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > >
> > >
> >
> >
>
>



Relevant Pages

  • Re: why>?
    ... of ADP and OLAP just as you'd use them with SQL Server? ... Database as file system ain't new. ... talk of WinFS by years. ... Yup, database as storage subsystem. ...
    (microsoft.public.excel)
  • Re: drawbacks of Saving documents to Sql server
    ... Retrieving the files from the file system is always faster but not necessarily better. ... In the database I used a table to store a "header" record with one row per file containing the filename, filesize, number of 64K chunks, etc... ... SQL Server 2005 performs better than SQL Server 2000 hands down. ... that microsft is gong more towars saving everthing in sql server. ...
    (microsoft.public.dotnet.framework)
  • Re: Long text fields
    ... > nature of the data, not some arbitary string length consideration. ... documents using XML at the file system and simply store a URL. ... which is why I would like to store it in the database. ...
    (comp.databases.pick)
  • Re: file system permissions and cross-database certificates?
    ... one database to another within the same instance? ... is the use of the file system to BACKUP CERTIFICATE TO ... Erland Sommarskog, SQL Server MVP, esquel@xxxxxxxxxxxxx ...
    (microsoft.public.sqlserver.security)
  • Re: drawbacks of Saving documents to Sql server
    ... by anychance know or have the code to retrive the documents from the database ... retriving documents, do you think u might be able to point me to online ... Writing random documents to SQL was slower than the file system. ... SQL Server 2005 performs better than SQL Server 2000 hands down. ...
    (microsoft.public.dotnet.framework)