Re: Crawling Custom Metadata Using SPS 2003

From: rdcpro (rdcpro_at_hotmail.com)
Date: 01/12/05

    Date: 12 Jan 2005 13:23:01 -0800
    
    

    I finally got resolution on the custom metadata and HTMLProp.dll issue.

    In case the previous posts aren't available anymore, the issue is that
    htmlprop.dll, found in the Windows Platform SDK, is supposed to convert
    a variety of DateTime meta tags (with certain formats) to a DateTime
    datatype, so that functions such as DATEADD() can be used in an SPS
    query. You can't use CAST in a where clause (a limitation of the
    MSSQLFT language), and by default all custom metadata is returned as
    String.
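    To make the string-versus-DateTime distinction concrete, here is a
    minimal Python sketch of the kind of conversion involved. The
    htmlprop.ini later in this post declares the Date property as
    VT_FILETIME, which is a count of 100-nanosecond ticks since
    1601-01-01 UTC. The ISO-style input format and the UTC assumption are
    mine for illustration; the formats htmlprop.dll actually accepts are
    not listed here, and the function name is hypothetical.

```python
from datetime import datetime, timezone

# FILETIME counts 100-nanosecond ticks since 1601-01-01 00:00:00 UTC.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def meta_date_to_filetime(meta_value: str) -> int:
    """Convert a meta tag date string to FILETIME ticks.

    Assumption: the value looks like "2005-01-12T13:23:01" and is in UTC.
    """
    parsed = datetime.strptime(meta_value, "%Y-%m-%dT%H:%M:%S")
    parsed = parsed.replace(tzinfo=timezone.utc)
    delta = parsed - FILETIME_EPOCH
    # Integer arithmetic avoids floating-point rounding on large tick counts.
    whole_seconds = delta.days * 86400 + delta.seconds
    return whole_seconds * 10_000_000 + delta.microseconds * 10


if __name__ == "__main__":
    print(meta_date_to_filetime("2005-01-12T13:23:01"))
```

    Once the index stores a numeric value like this instead of the raw
    text, range comparisons and date arithmetic in a query become
    meaningful.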

    A similar situation exists with integer types as well. The problem is,
    no matter what I did, the custom metadata was always coming back as a
    string. I followed all the instructions posted in the platform SDK,
    plus a bunch that PSS provided, to no avail.

    While this isn't documented anywhere, it turns out that this key:
    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\SPSSearch\ContentIndexCommon\Filters\CLSID\{25336920-03F9-11CF-8FD0-00AA00686F13}
    contained the GUID of nlhtml.dll, not HTMLProp.dll.
    The GUID {25336920-03F9-11CF-8FD0-00AA00686F13} is used to identify
    file and MIME types for HTML, GIF, JPEG, plain text and similar files.
    It's sort of like an extension. I had placed the GUID of htmlprop.dll
    in all the right places but this one, as it wasn't specified.

    I also found the old nlhtml.dll GUID under the following keys, neither
    of which seems to be registered to any file types:
    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\SPSSearch\ContentIndexCommon\Filters\CLSID\{7F73B8F6-C19C-11D0-AA66-00C04FC2EDDC}
    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\SPSSearch\ContentIndexCommon\Filters\CLSID\{BD70C020-2D24-11D0-9110-00004C752752}

    PSS recommends replacing all occurrences of the nlhtml.dll GUID with
    the htmlprop.dll GUID, since htmlprop.dll will call nlhtml.dll if it
    finds a custom meta property it can't handle. However, since these two
    keys don't map to anything, I just replaced the necessary one for now.

    In any event, once I deleted the custom properties from my SPS Manage
    Custom Properties for Search, reset *all* the content indexes,
    restarted the Microsoft SharePointPS Search service and recrawled the
    content, I now find my Date meta tag is coming back as a DateTime type!
    Woo Hoo!

    Here's my htmlprop.ini:

    [Names]

    #
    # These lines define datatypes for HTML meta properties.
    # The HTML property filter will convert properties from strings to
    # these types.
    #

    YEAR (DBTYPE_I8) = d1b5d3f0-c0b3-11cf-9a92-00a0c908dbf1 YEAR
    MONTH (DBTYPE_I8) = d1b5d3f0-c0b3-11cf-9a92-00a0c908dbf1 MONTH
    DAY (DBTYPE_I8) = d1b5d3f0-c0b3-11cf-9a92-00a0c908dbf1 DAY
    Date (VT_FILETIME) = d1b5d3f0-c0b3-11cf-9a92-00a0c908dbf1 Date

    # htmlprop.dll clsid is {f4309e80-a1db-11d1-a8fb-00e098006ed3}
    #
    # Default HTML IFilter clsid is {E0CA5340-4534-11CF-B952-00AA0051FE20}
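    If you want to sanity-check an htmlprop.ini before copying it to the
    server, a small parser along these lines may help. It is only a sketch
    based on the four entries shown above: each entry is read as
    "name (type) = GUID property". Reading the GUID as the property set
    GUID is my interpretation, and the class and function names are
    hypothetical.

```python
import re
from typing import List, NamedTuple


class PropEntry(NamedTuple):
    friendly_name: str   # name on the left of the parentheses, e.g. "Date"
    data_type: str       # e.g. "DBTYPE_I8" or "VT_FILETIME"
    propset_guid: str    # GUID after the equals sign
    prop_name: str       # meta property name after the GUID


_ENTRY = re.compile(
    r"^(\S+)\s*\(\s*(\w+)\s*\)\s*=\s*"
    r"([0-9A-Fa-f]{8}(?:-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12})\s+(\S.*?)\s*$"
)


def parse_htmlprop_ini(text: str) -> List[PropEntry]:
    """Return the typed property entries, skipping comments and headers."""
    entries = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#") or stripped.startswith("["):
            continue
        match = _ENTRY.match(stripped)
        if match:
            entries.append(PropEntry(*match.groups()))
    return entries


if __name__ == "__main__":
    sample = """[Names]
    # comment line
    YEAR (DBTYPE_I8) = d1b5d3f0-c0b3-11cf-9a92-00a0c908dbf1 YEAR
    Date (VT_FILETIME) = d1b5d3f0-c0b3-11cf-9a92-00a0c908dbf1 Date
    """
    for entry in parse_htmlprop_ini(sample):
        print(entry)
```

    A line that does not match the pattern is silently skipped, so
    comparing the number of returned entries with the number you expected
    is a quick way to spot a typo.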

    How to Register HTMLProp.dll

    1. Copy HTMLProp.dll to C:\Windows\System32
    2. Copy HTMLProp.ini to C:\Windows\System32
    3. Modify HTMLProp.ini to include the value-type properties you're
    interested in. For example, add the following lines at the bottom of
    the file:
    YEAR (DBTYPE_I8) = d1b5d3f0-c0b3-11cf-9a92-00a0c908dbf1 YEAR
    MONTH (DBTYPE_I8) = d1b5d3f0-c0b3-11cf-9a92-00a0c908dbf1 MONTH
    DAY (DBTYPE_I8) = d1b5d3f0-c0b3-11cf-9a92-00a0c908dbf1 DAY
    Date (VT_FILETIME) = d1b5d3f0-c0b3-11cf-9a92-00a0c908dbf1 Date

    4. Important Note: Stop the Microsoft SharePointPS Search service
    before proceeding. Do not attempt the steps below if scans are in
    progress. Wait for scans to complete, or halt all scans.

    5. Enable automatic registration by adding the following line:
    C:\WINDOWS\system32\htmlprop.dll
    To the registry key:
    HKLM\SYSTEM\CurrentControlSet\Control\ContentIndex\DLLsToRegister
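    Step 5 could be scripted roughly as follows. This sketch assumes that
    DLLsToRegister is a multi-string (REG_MULTI_SZ) value under the
    ContentIndex key, which is how I read the step above; check it in
    regedit before relying on this. The function names are hypothetical,
    and the part that touches the registry runs only on Windows with
    administrative rights.

```python
CONTENT_INDEX_KEY = r"SYSTEM\CurrentControlSet\Control\ContentIndex"
VALUE_NAME = "DLLsToRegister"  # assumed to be a REG_MULTI_SZ value


def with_dll_added(entries, dll_path):
    """Return a copy of the entry list that contains dll_path exactly once."""
    if any(entry.lower() == dll_path.lower() for entry in entries):
        return list(entries)
    return list(entries) + [dll_path]


def register_dll_on_windows(dll_path):
    """Append dll_path to DLLsToRegister (Windows only)."""
    import winreg  # imported here so with_dll_added works on any platform

    access = winreg.KEY_READ | winreg.KEY_SET_VALUE
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, CONTENT_INDEX_KEY, 0, access) as key:
        current, _value_type = winreg.QueryValueEx(key, VALUE_NAME)
        updated = with_dll_added(current, dll_path)
        winreg.SetValueEx(key, VALUE_NAME, 0, winreg.REG_MULTI_SZ, updated)


if __name__ == "__main__":
    print(with_dll_added([], r"C:\WINDOWS\system32\htmlprop.dll"))
```

    Keeping the list manipulation separate from the registry call makes
    it easy to confirm that running the script twice will not add a
    duplicate line.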

    6. At the command-line prompt, self register the filter by typing:
    "regsvr32.exe %windir%\System32\HtmlProp.dll".

    7. Search the registry under HKEY_CLASSES_ROOT for HTMLProp.dll.
    You are looking for the CLSID of this file, which should appear
    under a key like this:
    HKEY_CLASSES_ROOT\CLSID\{f4309e80-a1db-11d1-a8fb-00e098006ed3}
    If you built the source code from the Platform SDK, the GUID will be as
    listed above.

    8. The CLSID of the htmlprop.dll needs to be registered in the
    SharePoint Portal Server 2003 IFilter registration to override the
    CLSID of the default html filter. The SharePoint Portal Server 2003
    filters are registered in the Windows Registry under
    HKLM\Software\Microsoft\SPSSearch\ContentIndexCommon\Filters\CLSID\
    and
    HKLM\Software\Microsoft\SPSSearch\ContentIndexCommon\Filters\Extension\
    There are several places where this is done. Some are under the
    Extension subkey, and there are others under the CLSID subkey. Steps 8a
    and 8b detail these changes.

    8a. The CLSID of the default html filter is
    {E0CA5340-4534-11CF-B952-00AA0051FE20}. Replace this value with the
    new value {f4309e80-a1db-11d1-a8fb-00e098006ed3} in:
    HKLM\SOFTWARE\Microsoft\SPSSearch\ContentIndexCommon\Filters\Extension\.htm
    HKLM\SOFTWARE\Microsoft\SPSSearch\ContentIndexCommon\Filters\Extension\.html
    As well as any other file extensions you want handled by htmlprop.dll.
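    Rather than editing each Extension key by hand, the change in step 8a
    can be written out as a .reg file and reviewed before importing. The
    sketch below assumes the filter CLSID is stored in each key's default
    value (the "@" entry), which you should confirm against an export of
    your own server first. The function name is hypothetical.

```python
FILTERS_ROOT = (
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\SPSSearch"
    r"\ContentIndexCommon\Filters"
)
HTMLPROP_CLSID = "{f4309e80-a1db-11d1-a8fb-00e098006ed3}"


def build_extension_reg(extensions, clsid=HTMLPROP_CLSID):
    """Build .reg file text that points each extension at the given CLSID.

    Assumption: the CLSID lives in the default value of each Extension key.
    """
    lines = ["Windows Registry Editor Version 5.00", ""]
    for ext in extensions:
        lines.append("[" + FILTERS_ROOT + "\\Extension\\" + ext + "]")
        lines.append('@="' + clsid + '"')
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_extension_reg([".htm", ".html"]))
```

    Export the existing Extension keys before importing anything, so the
    original nlhtml.dll mapping can be restored if needed.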

    8b. There are several CLSID GUIDs that are mapped to the default html
    filter nlhtml.dll. The GUID {25336920-03F9-11CF-8FD0-00AA00686F13} is
    used to identify file and MIME types for HTML, GIF, JPEG, plain text
    and similar files. It can be thought of as something similar to a file
    extension. The GUIDs {7F73B8F6-C19C-11D0-AA66-00C04FC2EDDC} and
    {BD70C020-2D24-11D0-9110-00004C752752} might also be found, but these
    may not be mapped to any file types (search the registry for
    occurrences of these GUIDs to be sure). Search for occurrences of the
    old GUID for nlhtml.dll {E0CA5340-4534-11CF-B952-00AA0051FE20} and
    replace with {f4309e80-a1db-11d1-a8fb-00e098006ed3} in:
    HKLM\SOFTWARE\Microsoft\SPSSearch\ContentIndexCommon\Filters\CLSID\
    Please note this step is not documented elsewhere, but it has been
    found to be necessary.
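    Since the key in step 8b was the one that got missed, it may be worth
    scanning an export of the Filters branch for any leftover references
    to the nlhtml.dll GUID. The following sketch works on the text of a
    .reg export and lists every value that still contains a given GUID.
    The function name is hypothetical, and the parsing covers only simple
    single-line entries.

```python
OLD_NLHTML_CLSID = "{E0CA5340-4534-11CF-B952-00AA0051FE20}"


def find_guid_in_reg_export(reg_text, guid=OLD_NLHTML_CLSID):
    """Return (key, value_name, value) tuples whose value contains guid."""
    hits = []
    current_key = None
    needle = guid.lower()
    for raw_line in reg_text.splitlines():
        line = raw_line.strip()
        if line.startswith("[") and line.endswith("]"):
            current_key = line[1:-1]          # section header names the key
        elif current_key and "=" in line:
            name, _, value = line.partition("=")
            if needle in value.lower():
                hits.append((current_key, name.strip().strip('"'),
                             value.strip().strip('"')))
    return hits


if __name__ == "__main__":
    sample = (
        "Windows Registry Editor Version 5.00\n\n"
        "[HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\SPSSearch\\ContentIndexCommon"
        "\\Filters\\CLSID\\{25336920-03F9-11CF-8FD0-00AA00686F13}]\n"
        '@="{E0CA5340-4534-11CF-B952-00AA0051FE20}"\n'
    )
    for hit in find_guid_in_reg_export(sample):
        print(hit)
```

    Files exported by regedit are normally UTF-16 encoded, so open them
    with encoding="utf-16" before passing the text to this function.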

    9. Go to "Manage Properties of Crawled Content", and delete all the
    value-type properties in the
    urn:schemas.microsoft.com:htmlinfo:metainfo namespace that you are
    trying to process with htmlprop.dll. This is important because if any
    content index has crawled content containing the value-type properties
    you're trying to process, you will not be able to coerce the type.

    10. Reset all content indexes. This is important because if any
    content index contains the value-type properties you're trying to
    process, you will not be able to coerce the type.

    11. Restart the Microsoft SharePointPS Search service.

    12. Initiate a Full Update of all content indexes that contain the
    value-type properties. If this crawl would take a long time, you can
    create a temporary content index with one document to crawl that
    contains the value-type Meta tags you want to process with
    htmlprop.dll, and crawl only that one content index.

    13. Go to "Manage Properties of Crawled Content", locate the
    value-type properties in the
    urn:schemas.microsoft.com:htmlinfo:metainfo namespace, and check the
    option "Included in the Advanced Search options", which exposes
    the meta property to the SharePoint Search Service.

    14. Initiate another Full Update of all content indexes.

    15. After the crawl and propagation have completed, you will be able to
    select the value-type properties on the Advanced Search page and
    specify value-type criteria. For instance, for an integer property
    you can specify >, >=, =, <, or <= criteria. You will also be able to
    run queries programmatically with a WHERE clause that contains
    these criteria, and you will be able to use date functions like
    DATEADD() in a WHERE clause.
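    As an illustration of the kind of query this enables, the sketch
    below builds a query string that selects documents whose Date meta
    property falls within the last N days. Several things here are
    assumptions rather than something confirmed in this post: the full
    property name (namespace plus ":Date"), the Portal_Content..SCOPE()
    source, and the DATEADD(DAY, ..., GETGMTDATE()) form. Adjust them to
    match your portal. The function name is hypothetical.

```python
META_NAMESPACE = "urn:schemas.microsoft.com:htmlinfo:metainfo"


def build_recent_query(prop_name: str, days_back: int) -> str:
    """Build a search query for items whose date property is recent.

    Assumptions: the property is addressed as namespace:name, and the
    catalog is queried through Portal_Content..SCOPE().
    """
    if days_back < 0:
        raise ValueError("days_back must be zero or positive")
    prop = '"' + META_NAMESPACE + ":" + prop_name + '"'
    return (
        'SELECT "DAV:href", ' + prop + " "
        "FROM Portal_Content..SCOPE() "
        "WHERE " + prop + " > DATEADD(DAY, -" + str(days_back) + ", GETGMTDATE())"
    )


if __name__ == "__main__":
    print(build_recent_query("Date", 7))
```

    A comparison like this only behaves as a date comparison once the
    property is indexed as a DateTime; against a String property it
    would not work as intended.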

    Thanks to Anant Dimri with Microsoft PSS, for catching that
    undocumented registry key!
    Regards,
    Mike Sharp

    rdcpro@hotmail.com
    http://rdcpro.com

