Re: Identify/parse content within documents?



Yeah, I was thinking of that. I can extract the text of Word docs and
PDFs, but ideally I was hoping to query the full-text index itself. If
that isn't possible, I basically have to store both the blob itself and
the extracted clob text for each document.. ;/
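
For the regex step suggested in the quoted reply, here is a rough sketch
of what I have in mind once the text has been extracted. It assumes the
document text is already available as a plain string; the function name
extract_emails and the pattern are my own placeholders, and the pattern
is deliberately simple rather than a complete address matcher.

```python
import re

# Deliberately simple pattern: a local part, "@", then a dotted domain.
# It is an approximation, not a full RFC 5322 address matcher.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def extract_emails(text):
    """Return unique e-mail-like strings in order of first appearance."""
    seen = set()
    found = []
    for match in EMAIL_RE.finditer(text):
        addr = match.group(0)
        key = addr.lower()  # treat addresses as case-insensitive for de-duplication
        if key not in seen:
            seen.add(key)
            found.append(addr)
    return found


if __name__ == "__main__":
    sample = "Contact noel@example.com or Sales@Example.org. Again: noel@example.com."
    print(extract_emails(sample))
```

This would run on the client or middle tier over the clob text of each
document the full-text query yields, rather than inside T-SQL.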

Hilary Cotter wrote:
> You have to have some mechanism to convert the binary data to text data.
> After you have done this, it's a simple matter of running a regex or writing a
> parser.
>
> I don't recommend a tsql solution to this.
>
> --
> Hilary Cotter
> Looking for a SQL Server replication book?
> http://www.nwsu.com/0974973602.html
>
> Looking for a FAQ on Indexing Services/SQL FTS
> http://www.indexserverfaq.com
>
> <noel.whelan@xxxxxxxxx> wrote in message
> news:1137089226.559095.313470@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
> > I've got a full-text index on a table in which I've got word docs and
> > pdfs. I want to parse out certain types of content from these
> > documents, and include the parsed content in the info I give back to
> > the client.
> >
> > In this case, I want to collect the e-mail address(es) from each
> > document yielded by a query. In other words, the client would input a
> > term like 'indexing', I query the database to yield those docs which
> > contain this term, then for each doc parse out the e-mail addresses it
> > contains and provide them back to the client. I imagine I could
> > identify these with a posix expression; but I can't find an example of
> > this type of query. Is there a way to do this?
> >
> > Currently installed evaluation version is 2000 - 8.00.194 on Windows
> > XP. Thank you for any input..
> >
