Re: strange output from pdf extracts
- From: "Hilary Cotter" <hilary.cotter@xxxxxxxxx>
- Date: Thu, 19 Oct 2006 13:58:15 -0400
Is this a problem with all PDFs or just a few of them? It could be that the
problem PDFs are largely binary. Use filtdump to determine if they are.
--
Hilary Cotter
Director of Text Mining and Database Strategy
RelevantNOISE.Com - Dedicated to mining blogs for business intelligence.
This posting is my own and doesn't necessarily represent RelevantNoise's
positions, strategies or opinions.
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
"s_m_b" <smb20002ns@xxxxxxxxxxx> wrote in message
news:Xns98616E726811Dsmb2000nshotrmailcom@xxxxxxxxxxxxxxxx
Whilst our ifilter appears to be working ok - it's pulling documents out -
the extract (characterization) appears to be mangled, or perhaps just not
getting converted:
PDF-1.3 ???? 44 0 obj Linearized 1 O 46 H [ 1354 398 ] L 113568 E 82643 N 4
T 112570 endobj xref 44 46 0000000016 00000 n 0000001267 00000 n
0000001752 00000 n 0000001974 00000 n 0000002225 00000 n 0000002643 00000 n
0000003226 00000 n 0000003450 00000 n 0000003850 00000 n 0000004066 00000 n
00000
is a sample from a pdf doc.
Office docs seem unaffected by this, and until a while ago, the pdfs were
ok too.
I've installed ifilter 6 recently - would that be having any effect?
The system is W2K/IIS5, with content in Acrobat v3 to v6 plus Word, Excel,
etc., using ixsso.query/ixsso.util for the engine.
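
The sample above looks like raw PDF file structure (version header, object
markers, cross-reference entries) rather than document text. As a rough way
to find which documents are affected, the characterizations could be run
through a heuristic check like the sketch below. It is only an illustration:
the function name, marker list and thresholds are assumptions, not part of
Indexing Service or any ifilter, and it does not replace checking the files
with filtdump.

```python
import re

# Tokens typical of raw PDF file structure rather than extracted text.
# This list and the thresholds below are illustrative assumptions.
RAW_PDF_MARKERS = ("obj", "endobj", "xref", "Linearized")

# A cross-reference entry: 10-digit offset, 5-digit generation, then n or f.
XREF_ENTRY = re.compile(r"\b\d{10} \d{5} [nf]\b")


def looks_like_raw_pdf(extract: str) -> bool:
    """Heuristic: True if the extract resembles unfiltered PDF syntax."""
    has_header = re.search(r"PDF-1\.\d", extract) is not None
    words = set(extract.split())
    marker_hits = sum(1 for marker in RAW_PDF_MARKERS if marker in words)
    xref_hits = len(XREF_ENTRY.findall(extract))
    # A version header plus at least one structural marker, or several
    # cross-reference entries, suggests the text was never converted.
    return (has_header and marker_hits >= 1) or xref_hits >= 3


if __name__ == "__main__":
    sample = ("PDF-1.3 ???? 44 0 obj Linearized 1 O 46 H [ 1354 398 ] "
              "L 113568 E 82643 N 4 T 112570 endobj xref 44 46 "
              "0000000016 00000 n 0000001267 00000 n")
    print(looks_like_raw_pdf(sample))
```

Running this over the characterization returned for each hit would give a
list of PDFs worth testing individually.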