I was playing around with custom Search Indexing Handlers for SDL Tridion 2011 (GA). I got something working using the very helpful information provided by Arjen, but I am not sure my implementation is the best approach.
The requirement is to be able to search for pages in the CMS by URL (e.g. www.example.com/news/index.html). To do this I have created a class that implements the ISearchIndexingHandler interface (code below). I am indexing the URL in the ContentText field of the item, but I am not sure whether this field would normally contain something else for a page (I think a page only has metadata, so this should be OK). The advantage of using this over a custom field is that I can simply type the URL in the search box without having to use <url> IN <fieldname> or something like that.
So my question is: is there any reason not to use ContentText for Pages, and is there any advantage in using a custom field? Bonus marks go to anyone with good ideas on how to handle BluePrinting (if I create a page in a parent publication, I want the local URLs to be indexed in the child publications as well), and the case where a Structure Group path is altered (I guess I could somehow trigger a re-index of the child page items from within my indexing handler).
The code:
using System;
using Tridion.ContentManager.Search;
using Tridion.ContentManager.Search.Indexing.Handling;
using Tridion.ContentManager.Search.Indexing.Service;
using Tridion.ContentManager.Search.Indexing;
using Tridion.ContentManager.Search.Fields;

namespace ExampleSearchIndexHandler
{
    public class PageUrlHandler : ISearchIndexingHandler
    {
        public void Configure(SearchIndexingHandlerSettings settings)
        {
            // No configuration needed yet.
        }

        public void ExtractIndexFields(IdentifiableObjectData subjectData, Item item, CoreServiceProxy serviceProxy)
        {
            // Only pages are handled; other item types are left untouched.
            PageData data = subjectData as PageData;
            if (data != null)
            {
                // The "as" cast yields null if LocationInfo is not a PublishLocationInfo.
                PublishLocationInfo info = data.LocationInfo as PublishLocationInfo;
                if (info != null)
                {
                    string url = GetUrlPrefix(data) + info.PublishLocationUrl;
                    item.ContentText = url;
                }
            }
        }

        private string GetUrlPrefix(PageData page)
        {
            // Hardcoded for now, but will be read from publication metadata.
            return "www.example.com";
        }
    }
}
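
A side note on the string concatenation in ExtractIndexFields: once the prefix is read from publication metadata, it may arrive with or without a trailing slash. The following is only a minimal sketch of a defensive join. PageUrlBuilder and Combine are hypothetical names chosen for illustration and are not part of the Tridion API, and the sketch assumes the publish path may or may not start with a slash.

```csharp
using System;

// Hypothetical helper, not part of the Tridion API.
public static class PageUrlBuilder
{
    // Joins a site prefix and a publish path with exactly one slash between them.
    public static string Combine(string prefix, string publishPath)
    {
        // With no prefix, fall back to the path alone.
        if (string.IsNullOrEmpty(prefix))
        {
            return publishPath ?? string.Empty;
        }

        // With no path, the prefix is all there is to index.
        if (string.IsNullOrEmpty(publishPath))
        {
            return prefix;
        }

        // Strip slashes on both sides of the seam, then add a single one back.
        return prefix.TrimEnd('/') + "/" + publishPath.TrimStart('/');
    }
}
```

With this in place, the line in the handler could become string url = PageUrlBuilder.Combine(GetUrlPrefix(data), info.PublishLocationUrl); so that both "www.example.com" and "www.example.com/" produce the same indexed value.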