Re: Identify/parse content within documents?
- From: noel.whelan@xxxxxxxxx
- Date: 14 Jan 2006 16:17:00 -0800
Yeah, I was thinking of that. I can extract the text of Word docs and
PDFs, but ideally I'd like to query the index itself. If I can't, I
basically have to store both the blob itself and the extracted CLOB text
for each document. ;/
Hilary Cotter wrote:
> You have to have some mechanism to convert the binary data to text.
> Once you have done that, it's a simple matter of running a regex or
> writing a parser.
>
> I don't recommend a T-SQL solution for this.
>
> --
> Hilary Cotter
> Looking for a SQL Server replication book?
> http://www.nwsu.com/0974973602.html
>
> Looking for a FAQ on Indexing Services/SQL FTS
> http://www.indexserverfaq.com
>
> <noel.whelan@xxxxxxxxx> wrote in message
> news:1137089226.559095.313470@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
> > I've got a full-text index on a table in which I store Word docs and
> > PDFs. I want to parse certain types of content out of these
> > documents and include the parsed content in the info I return to
> > the client.
> >
> > In this case, I want to collect the e-mail address(es) from each
> > document returned by a query. In other words, the client enters a
> > term like 'indexing', I query the database for the docs that
> > contain this term, and then for each doc I parse out the e-mail
> > addresses it contains and return them to the client. I imagine I
> > could identify these with a POSIX regular expression, but I can't
> > find an example of this type of query. Is there a way to do this?
> >
> > The currently installed evaluation version is SQL Server 2000
> > (8.00.194) on Windows XP. Thank you for any input.
> >
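Following Hilary's advice (convert the documents to text outside the database, then run a regex), here is a minimal sketch of the regex step in Python. It assumes the Word/PDF content has already been converted to plain text by some other tool and that the full-text query returns (doc_id, text) pairs; the function names and the e-mail pattern are hypothetical and illustrative. The pattern is deliberately simple and is not a full RFC 5322 address validator.

```python
import re

# Hypothetical, deliberately simple e-mail pattern: local part, "@",
# domain labels, and a top-level domain of at least two letters.
# It is a sketch, not a complete RFC 5322 validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text):
    """Return the distinct e-mail addresses in text, in order of first appearance."""
    found = []
    for address in EMAIL_RE.findall(text):  # no groups, so findall returns whole matches
        if address not in found:
            found.append(address)
    return found


def emails_per_document(rows):
    """Map each doc_id to the addresses found in its text.

    rows is assumed to be an iterable of (doc_id, plain_text) pairs, e.g. the
    hits of a full-text query after the binary documents were converted to text.
    """
    return {doc_id: extract_emails(text) for doc_id, text in rows}


if __name__ == "__main__":
    rows = [
        (1, "Contact jane.doe@example.com or sales@example.org. Again: jane.doe@example.com"),
        (2, "No addresses here."),
    ]
    print(emails_per_document(rows))
```

Running it prints `{1: ['jane.doe@example.com', 'sales@example.org'], 2: []}`. Keeping the extraction in application code rather than T-SQL also makes it easy to swap in a stricter pattern or a real parser later.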