Re: Crawling Custom Metadata Using SPS 2003
From: rdcpro (rdcpro_at_hotmail.com)
Date: 01/12/05
- Previous message: Nick Coleman: "WSDL missing for sharepoint webservices"
- Next in thread: Peter: "Re: Crawling Custom Metadata Using SPS 2003"
- Reply: Peter: "Re: Crawling Custom Metadata Using SPS 2003"
- Messages sorted by: [ date ] [ thread ]
Date: 12 Jan 2005 13:23:01 -0800
I finally got resolution on the custom metadata and HTMLProp.dll issue.
In case the previous posts aren't available anymore, the issue is that
htmlprop.dll, found in the Windows Platform SDK, is supposed to convert
a variety of DateTime meta tags (with certain formats) to a DateTime
datatype, so that functions such as DATEADD() can be used in an SPS
query. You can't use CAST in a WHERE clause (a limitation of the
MSSQLFT language), and by default all custom metadata is returned as a
String.
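To illustrate why the String typing matters for range queries, here is a small Python sketch (not SPS code) that compares two hypothetical meta tag values first as strings and then as parsed dates. The M/D/YYYY format is an assumption chosen for the example.

```python
from datetime import datetime


def as_date(value: str) -> datetime:
    # Parse an M/D/YYYY string; this format is an assumption for the example.
    return datetime.strptime(value, "%m/%d/%Y")


september = "9/1/2004"
october = "10/1/2004"

# As strings, comparison is character by character: "9" > "1",
# so September wrongly sorts after October.
string_says_sept_is_later = september > october

# As real dates, the comparison is chronological.
date_says_sept_is_later = as_date(september) > as_date(october)

print(string_says_sept_is_later)  # True (wrong)
print(date_says_sept_is_later)    # False (correct)
```

The same thing happens with integers stored as strings ("9" sorts after "10"), which is why the post mentions integer types too.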
A similar situation exists with integer types as well. The problem is,
no matter what I did, the custom metadata was always coming back as a
string. I followed all the instructions in the Platform SDK, plus a
number of additional ones that PSS provided, to no avail.
While this isn't documented anywhere, it turns out that this key:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\SPSSearch\ContentIndexCommon\Filters\CLSID\{25336920-03F9-11CF-8FD0-00AA00686F13}
contained the GUID of nlhtml.dll ({E0CA5340-4534-11CF-B952-00AA0051FE20}),
not that of HTMLProp.dll.
The GUID {25336920-03F9-11CF-8FD0-00AA00686F13} is used to identify
file and MIME types for HTML, GIF, JPEG, plain text and similar files.
It's sort of like an extension. I had placed the GUID of htmlprop.dll
in every place the instructions listed, but not in this one, because
it wasn't mentioned.
I also found the old nlhtml.dll GUID under the following keys, neither
of which seems to be registered to any file types:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\SPSSearch\ContentIndexCommon\Filters\CLSID\{7F73B8F6-C19C-11D0-AA66-00C04FC2EDDC}
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\SPSSearch\ContentIndexCommon\Filters\CLSID\{BD70C020-2D24-11D0-9110-00004C752752}
PSS recommends replacing every occurrence of the nlhtml.dll GUID with
htmlprop.dll's GUID, since htmlprop.dll will call nlhtml.dll if it
finds a custom meta property it can't handle. However, since those two
keys don't map to anything, I just replaced the necessary one for now.
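The difference between the PSS recommendation (replace every occurrence) and what I actually did (replace only the HTML key) can be sketched in Python. This is a minimal sketch under stated assumptions: the dictionary is a hypothetical in-memory snapshot of the Filters\CLSID subtree, modeling each subkey as holding a single filter CLSID; it does not read or write the real registry, and the function name replace_filter_clsid is made up for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

# CLSIDs quoted in the post.
NLHTML_CLSID = "{E0CA5340-4534-11CF-B952-00AA0051FE20}"
HTMLPROP_CLSID = "{f4309e80-a1db-11d1-a8fb-00e098006ed3}"


def replace_filter_clsid(
    subtree: Dict[str, str],
    old: str,
    new: str,
    only: Optional[Set[str]] = None,
) -> Tuple[Dict[str, str], List[str]]:
    """Return (updated copy of subtree, list of subkeys that were changed).

    Hypothetical helper. If `only` is given, subkeys outside that set
    are left untouched.
    """
    updated = dict(subtree)  # never mutate the caller's snapshot
    changed = []
    for key, value in subtree.items():
        if only is not None and key not in only:
            continue
        # GUIDs are case-insensitive, so compare them in upper case.
        if value.upper() == old.upper():
            updated[key] = new
            changed.append(key)
    return updated, changed


# Hypothetical snapshot of ...\Filters\CLSID (subkey -> filter CLSID).
# All three subkeys held the nlhtml.dll CLSID.
clsid_subtree = {
    "{25336920-03F9-11CF-8FD0-00AA00686F13}": NLHTML_CLSID,
    "{7F73B8F6-C19C-11D0-AA66-00C04FC2EDDC}": NLHTML_CLSID,
    "{BD70C020-2D24-11D0-9110-00004C752752}": NLHTML_CLSID,
}

# PSS recommendation: replace every occurrence.
all_updated, all_changed = replace_filter_clsid(
    clsid_subtree, NLHTML_CLSID, HTMLPROP_CLSID
)

# What I did: replace only the subkey that maps to HTML file types.
html_updated, html_changed = replace_filter_clsid(
    clsid_subtree,
    NLHTML_CLSID,
    HTMLPROP_CLSID,
    only={"{25336920-03F9-11CF-8FD0-00AA00686F13}"},
)

print(len(all_changed))  # 3
print(html_changed)      # ['{25336920-03F9-11CF-8FD0-00AA00686F13}']
```

Returning the list of changed subkeys makes it easy to review exactly what would be touched before making the same edits by hand in regedit.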
In any event, once I deleted the custom properties on my SPS Manage
Custom Properties for Search page, reset *all* the content indexes,
restarted the Microsoft SharePointPS Search service, and recrawled the
content, I now find my Date meta tag is coming back as a DateTime type!
Woo Hoo!
Here's my htmlprop.ini:
[Names]
#
# These lines define datatypes for HTML meta properties.
# The HTML property filter will convert properties from strings to
# these types.
#
YEAR (DBTYPE_I8) = d1b5d3f0-c0b3-11cf-9a92-00a0c908dbf1 YEAR
MONTH (DBTYPE_I8) = d1b5d3f0-c0b3-11cf-9a92-00a0c908dbf1 MONTH
DAY (DBTYPE_I8) = d1b5d3f0-c0b3-11cf-9a92-00a0c908dbf1 DAY
Date (VT_FILETIME) = d1b5d3f0-c0b3-11cf-9a92-00a0c908dbf1 Date
# htmlprop.dll clsid is {f4309e80-a1db-11d1-a8fb-00e098006ed3}
#
# Default HTML IFilter clsid is {E0CA5340-4534-11CF-B952-00AA0051FE20}
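Because a typo in an htmlprop.ini entry fails quietly (the property just comes back as a String), it can help to sanity-check the [Names] lines before registering the filter. The Python sketch below assumes the line layout NAME (TYPE) = guid PROPNAME, inferred only from the sample file above; it is a hypothetical checker, not htmlprop.dll's own parser, which may accept other forms.

```python
import re
from typing import List, NamedTuple


class PropertyMapping(NamedTuple):
    meta_name: str      # name of the HTML <meta> tag
    data_type: str      # target type, e.g. DBTYPE_I8 or VT_FILETIME
    guid: str           # property set GUID
    property_name: str  # property name within that set


# NAME (TYPE) = guid PROPNAME  (layout inferred from the sample file)
LINE_RE = re.compile(
    r"^\s*(?P<meta>\S+)\s*\((?P<type>\w+)\)\s*=\s*"
    r"(?P<guid>[0-9A-Fa-f]{8}(?:-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12})"
    r"\s+(?P<prop>\S+)\s*$"
)


def parse_names_section(text: str) -> List[PropertyMapping]:
    """Parse the [Names] section; raise ValueError on malformed entries."""
    mappings = []
    in_names = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            in_names = line.lower() == "[names]"
            continue
        if not in_names:
            continue
        match = LINE_RE.match(line)
        if match is None:
            raise ValueError(f"unrecognized [Names] entry: {raw!r}")
        mappings.append(PropertyMapping(
            match["meta"], match["type"], match["guid"], match["prop"]))
    return mappings


SAMPLE = """[Names]
#
# These lines define datatypes for HTML meta properties.
YEAR (DBTYPE_I8) = d1b5d3f0-c0b3-11cf-9a92-00a0c908dbf1 YEAR
MONTH (DBTYPE_I8) = d1b5d3f0-c0b3-11cf-9a92-00a0c908dbf1 MONTH
DAY (DBTYPE_I8) = d1b5d3f0-c0b3-11cf-9a92-00a0c908dbf1 DAY
Date (VT_FILETIME) = d1b5d3f0-c0b3-11cf-9a92-00a0c908dbf1 Date
"""

for m in parse_names_section(SAMPLE):
    print(m.meta_name, m.data_type)
```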
How to Register HTMLProp.dll
1. Copy HTMLProp.dll to C:\Windows\System32
2. Copy HTMLProp.ini to C:\Windows\System32
3. Modify HTMLProp.ini to include the value-type properties you're
interested in. For example, add the following lines at the bottom of
the file:
YEAR (DBTYPE_I8) = d1b5d3f0-c0b3-11cf-9a92-00a0c908dbf1 YEAR
MONTH (DBTYPE_I8) = d1b5d3f0-c0b3-11cf-9a92-00a0c908dbf1 MONTH
DAY (DBTYPE_I8) = d1b5d3f0-c0b3-11cf-9a92-00a0c908dbf1 DAY
Date (VT_FILETIME) = d1b5d3f0-c0b3-11cf-9a92-00a0c908dbf1 Date
4. Important Note: Stop the Microsoft SharePointPS Search service
before proceeding. Do not attempt the steps below if scans are in
progress. Wait for scans to complete, or halt all scans.
5. Enable automatic registration by adding the following line:
C:\WINDOWS\system32\htmlprop.dll
to the registry entry:
HKLM\SYSTEM\CurrentControlSet\Control\ContentIndex\DLLsToRegister
6. At the command prompt, self-register the filter by typing:
"regsvr32.exe %windir%\System32\HtmlProp.dll".
7. Search in the registry under HKEY_CLASSES_ROOT for HTMLProp.dll.
You are looking for the CLSID of this file, which should appear as
something like this:
HKEY_CLASSES_ROOT\CLSID\{f4309e80-a1db-11d1-a8fb-00e098006ed3}
If you built the source code from the platform SDK, the GUID will be as
listed above.
8. The CLSID of htmlprop.dll needs to be registered in the
SharePoint Portal Server 2003 IFilter registration to override the
CLSID of the default html filter. The SharePoint Portal Server 2003
filters are registered in the Windows Registry under
HKLM\Software\Microsoft\SPSSearch\ContentIndexCommon\Filters\CLSID\
and
HKLM\Software\Microsoft\SPSSearch\ContentIndexCommon\Filters\Extension\
There are several places where this is done. Some are under the
Extension subkey, and there are others under the CLSID subkey. Steps 8a
and 8b detail these changes.
8a. The CLSID of the default html filter is
{E0CA5340-4534-11CF-B952-00AA0051FE20}. Replace this value with the
new value {f4309e80-a1db-11d1-a8fb-00e098006ed3} in:
HKLM\SOFTWARE\Microsoft\SPSSearch\ContentIndexCommon\Filters\Extension\.htm
HKLM\SOFTWARE\Microsoft\SPSSearch\ContentIndexCommon\Filters\Extension\.html
Do the same for any other file extensions you want handled by
htmlprop.dll.
8b. There are several CLSID GUIDs that are mapped to the default html
filter nlhtml.dll. The GUID {25336920-03F9-11CF-8FD0-00AA00686F13} is
used to identify file and MIME types for HTML, GIF, JPEG, plain text
and similar files. It can be thought of as something similar to a file
extension. The GUIDs {7F73B8F6-C19C-11D0-AA66-00C04FC2EDDC} and
{BD70C020-2D24-11D0-9110-00004C752752} might also be found, but these
may not be mapped to any file types (search the registry for
occurrences of these GUIDs to be sure). Search for occurrences of the
old GUID for nlhtml.dll {E0CA5340-4534-11CF-B952-00AA0051FE20} and
replace with {f4309e80-a1db-11d1-a8fb-00e098006ed3} in:
HKLM\SOFTWARE\Microsoft\SPSSearch\ContentIndexCommon\Filters\CLSID\
Please note this step is not documented elsewhere, but it has been
found to be necessary.
9. Go to "Manage Properties of Crawled Content", and delete all the
value-type properties in the
urn:schemas.microsoft.com:htmlinfo:metainfo namespace that you are
trying to process with htmlprop.dll. This is important because if any
content index has crawled content containing the value-type properties
you're trying to process, you will not be able to coerce the type.
10. Reset all content indexes. This is important because if any
content index contains the value-type properties you're trying to
process, you will not be able to coerce the type.
11. Restart the Microsoft SharePointPS Search service.
12. Initiate a Full Update of all content indexes that contain the
value-type properties. If this crawl would take a long time, you can
create a temporary content index with one document to crawl that
contains the value-type Meta tags you want to process with
htmlprop.dll, and crawl only that one content index.
13. Go to "Manage Properties of Crawled Content", locate the
value-type properties in the
urn:schemas.microsoft.com:htmlinfo:metainfo namespace, and check the
option "Included in the Advanced Search options", which exposes
the meta property to the SharePoint Search Service.
14. Initiate another Full Update of all content indexes.
15. After the crawl and propagation have completed, you will be able
to select the value-type properties on the Advanced Search page and
specify value-type criteria. For instance, for an integer property
you can specify >, >=, =, <, or <= criteria. You will also be able to
run queries programmatically with a WHERE clause that contains these
criteria, and you will be able to use date functions like DATEADD()
in a WHERE clause.
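As a rough illustration of the kind of query step 15 enables, here is a Python sketch that only builds MSSQLFT query strings. Several pieces are assumptions rather than anything confirmed in this thread: the SELECT columns, the Portal_Content..Scope() source, GETGMTDATE(), and the full property names (the namespace from step 9 joined to the meta tag name with a colon). Check them against the SPS 2003 SDK before use; submitting the query is not shown.

```python
# Assumed full property names: the namespace from step 9 joined to the
# meta tag name with a colon. Verify them in SPS before relying on them.
NAMESPACE = "urn:schemas.microsoft.com:htmlinfo:metainfo"

# Assumed SELECT list and scope; adjust for your portal.
SELECT_FROM = (
    'SELECT "DAV:href", "DAV:displayname" '
    "FROM ( TABLE Portal_Content..Scope() ) "
)


def prop(name: str) -> str:
    """Quote a full property name for use in the query text."""
    return f'"{NAMESPACE}:{name}"'


def dated_within_days_query(days: int) -> str:
    """Documents whose Date meta tag falls within the last `days` days."""
    if days < 0:
        raise ValueError("days must be non-negative")
    # DATEADD only works here once Date is indexed as a DateTime.
    return SELECT_FROM + (
        f"WHERE {prop('Date')} >= DATEADD(DAY, -{days}, GETGMTDATE())"
    )


def year_at_least_query(year: int) -> str:
    """Documents whose YEAR meta tag is >= year (an integer comparison)."""
    return SELECT_FROM + f"WHERE {prop('YEAR')} >= {int(year)}"


print(dated_within_days_query(30))
print(year_at_least_query(2004))
```

If the Date property is still indexed as a String, the DATEADD comparison is exactly what fails, which is a quick way to confirm whether the re-registration took effect.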
Thanks to Anant Dimri with Microsoft PSS, for catching that
undocumented registry key!
Regards,
Mike Sharp
rdcpro@hotmail.com
http://rdcpro.com