Re: More PDF IFilter problems



The entire PDF is an image (an image of text, mind you), but there is no
text in it.

You need to run OCR on the image to extract the text.
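
If you want to screen files before running OCR, a rough heuristic is to look at the raw bytes: a scanned, image-only PDF usually declares image XObjects but no font resources. The sketch below is only that, a heuristic. The function names are my own, and it assumes the resource dictionaries are stored uncompressed, which should hold for a version 1.3 file like this one but not necessarily for newer PDFs that use compressed object streams.

```python
import re


def pdf_version(data: bytes):
    """Return the version from the '%PDF-x.y' header, or None if it is absent."""
    match = re.match(rb"%PDF-(\d\.\d)", data)
    return match.group(1).decode("ascii") if match else None


def looks_image_only(data: bytes) -> bool:
    """Heuristic: the file declares image XObjects but no font resources.

    Text drawn on a page needs a font, so a file with images and no
    '/Font' entries probably has nothing for an IFilter to extract.
    """
    has_fonts = b"/Font" in data
    has_images = re.search(rb"/Subtype\s*/Image", data) is not None
    return has_images and not has_fonts


# Typical use (the path is hypothetical):
#   with open("Registration_2007-2008.pdf", "rb") as handle:
#       data = handle.read()
#   print(pdf_version(data), looks_image_only(data))
```

A file that has already been through OCR will contain fonts for its hidden text layer, so it will not be flagged.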

--
RelevantNoise.com - dedicated to mining blogs for business intelligence.

Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
"LiveCycle" <livecycle@xxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:OaUtPGfAIHA.1208@xxxxxxxxxxxxxxxxxxxxxxx
Hi Hilary,

Thank you for responding. I've tried a number of different PDFs, but this
is one of the ones that did not work
(http://www.cogenix.com/Registration_2007-2008.pdf). As you'll see, it's
version 1.3 (Acrobat 4.x), and it's got quite a bit of text in it. I'm
sorry I couldn't attach this file directly to this post...

Thanks again, Jim

"Hilary Cotter" <hilary.cotter@xxxxxxxxx> wrote in message
news:ucRJ11WAIHA.1212@xxxxxxxxxxxxxxxxxxxxxxx
Can you post a problem PDF here? Sometimes PDFs contain only images and no
text. The IFilter can only understand the text. Also, what version of
PDF is it?

"LiveCycle" <livecycle@xxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:%23eQ$P6VAIHA.3848@xxxxxxxxxxxxxxxxxxxxxxx
Sorry, scratch that last one, the DOC file I was looking at was
corrupted. PDF problem remains, however...
Thanks!
"LiveCycle" <livecycle@xxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:O1Tn0fVAIHA.5184@xxxxxxxxxxxxxxxxxxxxxxx
OK, this gets stranger by the minute. I am able to successfully index
Excel files, but I am not able to index Word documents! I get this
message in my logs.

2007-09-27 15:41:04.59 spid19s Error '0x8004170c: The document
format is not recognized by the filter.' occurred during full-text
index population for table or indexed view
'[RMSTest].[Template].[Content]' (table or indexed view ID
'2142018762', database ID '12'), full-text key value 0x00000003.
Attempt will be made to reindex it.
2007-09-27 15:41:04.59 spid19s The component 'offfilt.dll' reported
error while indexing. Component path 'C:\WINDOWS\system32\offfilt.dll'.

Please, I will lose all my hair soon, any ideas are welcome.
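
One cheap check for error 0x8004170c is whether the stored bytes really start like the format the extension claims, since a damaged or mislabelled file is one plausible reason for a filter to reject a document. The sketch below is a minimal illustration with hypothetical function names. It only compares well-known file signatures (the OLE2 compound-file header used by .doc/.xls/.ppt, and the "%PDF-" header), so it will catch truncated or wrong-type files but not every kind of corruption.

```python
# Signature of an OLE2 compound file (legacy .doc, .xls, .ppt).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def sniff_format(blob: bytes) -> str:
    """Classify a document by its leading bytes."""
    if blob.startswith(OLE2_MAGIC):
        return "ole2"
    if blob.startswith(b"%PDF-"):
        return "pdf"
    return "unknown"


def extension_matches(blob: bytes, extension: str) -> bool:
    """True if the blob's signature agrees with the stored extension."""
    expected = {".doc": "ole2", ".xls": "ole2", ".ppt": "ole2", ".pdf": "pdf"}
    ext = extension.lower()
    return ext in expected and sniff_format(blob) == expected[ext]
```

Running this over the bytes and extension of each row before a population would at least flag rows whose content does not match the declared type.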



"LiveCycle" <livecycle@xxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:eyhiy9UAIHA.1168@xxxxxxxxxxxxxxxxxxxxxxx
So, I have found the log, and am receiving the following error
information:

2007-09-27 14:40:23.13 spid21s Informational: Full-text Full
population initialized for table or indexed view
'[RMSTest].[Template].[Content]' (table or indexed view ID
'2142018762', database ID '12'). Population sub-tasks: 1.
2007-09-27 14:40:37.24 spid21s Error '0x80043651: msftesql should
reprocess this document in an isolated fashion to confirm the error.'
occurred during full-text index population for table or indexed view
'[RMSTest].[Template].[Content]' (table or indexed view ID
'2142018762', database ID '12'), full-text key value 0x00000001.
Attempt will be made to reindex it.
2007-09-27 14:40:37.24 spid21s The component 'MSFTE.DLL' reported
error while indexing. Component path 'C:\Program Files\Microsoft SQL
Server\MSSQL.1\MSSQL\Binn\MSFTE.DLL'.
2007-09-27 14:40:37.24 spid21s Warning: No appropriate filter for
embedded object was found during full-text index population for table
or indexed view '[RMSTest].[Template].[Content]' (table or indexed
view ID '2142018762', database ID '12'), full-text key value
0x00000002. Some embedded objects in the row could not be indexed.
2007-09-27 14:40:37.24 spid21s Informational: Full-text Full
population completed for table or indexed view
'[RMSTest].[Template].[Content]' (table or indexed view ID
'2142018762', database ID '12'). Number of documents processed: 2.
Number of documents failed: 0. Number of documents need retry: 1.
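
To see which rows are failing without reading the whole log by eye, the error entries can be pulled out with a small script. This is a sketch under the assumption that entries follow the wording shown above. The regular expression and function name are my own, and it picks up only "Error" entries, not "Warning" ones such as the embedded-object message for key 0x00000002.

```python
import re

# Matches: Error '<code>: <message>' occurred during full-text index
# population ... full-text key value <key>
ERROR_RE = re.compile(
    r"Error '(0x[0-9A-Fa-f]{8}): (.*?)' occurred during full-text index "
    r"population.*?full-text key value (0x[0-9A-Fa-f]+)"
)


def failed_documents(log_text: str):
    """Return (key_value, error_code, message) for each error entry."""
    # Log entries wrap across lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    return [(m.group(3), m.group(1), m.group(2)) for m in ERROR_RE.finditer(flat)]
```

The key values returned can then be matched against the full-text key column of the table to find the offending documents.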

Clearly, it doesn't like my IFilter. Any ideas how I can make SQL
recognize this?

Thanks, Jim

"LiveCycle" <livecycle@xxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:%23o1jMiUAIHA.3940@xxxxxxxxxxxxxxxxxxxxxxx
Hi all,

I'm having some frustrating issues getting the PDF IFilter to work.
I've read the other posts here, and still haven't been able to
figure this out. I am running SQL Server 2005 Standard Edition (32-bit)
on Windows Server 2003 Standard Edition. I performed the
following:

1 - Installed the PDF IFilter v 6.0
2 - Ran EXEC sp_fulltext_service 'load_os_resources', 1
3 - Stopped and restarted the SQL Server service
4 - Queried sys.fulltext_document_types and verified that .pdf was indeed
a valid document type
5 - Built a new full-text catalog and added my table with PDF & other
files (stored as image data type) to the catalog
6 - Fully populated the FT index
7 - Ran my CONTAINS query against that table. I'm able to return
results against Office files, but nothing for PDF files.

So, I'm not sure what I should do at this point. I even tried
restarting the server itself. Somebody (Hilary Cotter?) mentioned
that it might be possible to look at gatherer logs somewhere, but I'm
not clear where those would be. I would appreciate any further
suggestions.

Thanks, Jim












