Re: strange output from pdf extracts



Is this a problem with all PDFs or just a few of them? It could be that the
problem PDFs are largely binary. Use filtdump to determine if they are.
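
If filtdump is not to hand, a rough first pass is to measure how much of each file is non-text bytes. The following is only a sketch of that idea, not a replacement for filtdump: the function name binary_ratio and the choice of "printable ASCII plus tab/CR/LF" as the definition of text are my own assumptions, and a high ratio only suggests that most of the content sits in compressed or encoded streams.

```python
# Rough heuristic (an assumption, not part of filtdump): treat printable
# ASCII plus tab, line feed, and carriage return as "text" bytes.
TEXT_BYTES = frozenset(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}


def binary_ratio(data: bytes) -> float:
    """Return the fraction of bytes in data that are not plain text."""
    if not data:
        return 0.0
    non_text = sum(1 for b in data if b not in TEXT_BYTES)
    return non_text / len(data)
```

Reading a suspect file with open(path, "rb") and passing its contents to binary_ratio gives a number between 0 and 1; comparing a PDF that indexes well against one that does not may show whether the problem files stand out.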

--
Hilary Cotter
Director of Text Mining and Database Strategy
RelevantNOISE.Com - Dedicated to mining blogs for business intelligence.

This posting is my own and doesn't necessarily represent RelevantNoise's
positions, strategies or opinions.

Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com



"s_m_b" <smb20002ns@xxxxxxxxxxx> wrote in message
news:Xns98616E726811Dsmb2000nshotrmailcom@xxxxxxxxxxxxxxxx
Whilst our ifilter appears to be working ok - it's pulling documents out -
the extract (characterization) appears to be mangled, or perhaps just not
getting converted:

PDF-1.3 ???? 44 0 obj Linearized 1 O 46 H [ 1354 398 ] L 113568 E 82643 N 4
T 112570 endobj xref 44 46 0000000016 00000 n 0000001267 00000 n 0000001752
00000 n 0000001974 00000 n 0000002225 00000 n 0000002643 00000 n 0000003226
00000 n 0000003450 00000 n 0000003850 00000 n 0000004066 00000 n 00000

is a sample from a pdf doc.
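
To find which documents are affected, one option is to scan the returned characterizations for raw PDF syntax such as the header, object markers, and cross-reference entries seen in the sample above. This is a minimal sketch under my own assumptions: the name looks_like_raw_pdf, the pattern list, and the threshold of two matches are all hypothetical, and how the characterization strings are fetched from the query results is left out.

```python
import re

# Patterns for raw PDF file structure (hypothetical list, tune as needed).
RAW_PDF_PATTERNS = [
    re.compile(r"PDF-1\.\d"),              # file header, e.g. PDF-1.3
    re.compile(r"\b\d+ \d+ obj\b"),        # object start, e.g. 44 0 obj
    re.compile(r"\bendobj\b"),             # object end
    re.compile(r"\bxref\b"),               # cross-reference table keyword
    re.compile(r"\b\d{10} \d{5} [nf]\b"),  # xref entry, e.g. 0000000016 00000 n
]


def looks_like_raw_pdf(characterization: str, min_hits: int = 2) -> bool:
    """Return True if the text matches at least min_hits raw-PDF patterns."""
    hits = sum(1 for p in RAW_PDF_PATTERNS if p.search(characterization))
    return hits >= min_hits
```

Requiring two or more distinct patterns is meant to avoid flagging a document that merely mentions a word like "xref" in its real text.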

Office docs seem unaffected by this, and until a while ago, the pdfs were
ok too.

I've installed ifilter 6 recently - would that be having any effect?

The system is W2K/IIS5, with content in Acrobat v3 to v6 plus Word, Excel,
etc., using ixsso.query/ixsso.util for the engine.




