How to index sub-content in Sitecore with Lucene?

Posted 2019-08-09 14:05

Question:

I'm using Sitecore 7.2 with MVC and a component approach to page building. This means that pages are largely empty and the content comes from the various renderings placed on the page, so the text a visitor searches for lives on the renderings' data source items rather than on the pages themselves. However, I would like the search results to return the main pages, not the individual content pieces.

Here is the basic code I have so far:

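using System.Collections.Generic;
using System.Linq;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.Linq;
using Sitecore.ContentSearch.SearchTypes;
using Sitecore.ContentSearch.Security;
using Sitecore.ContentSearch.Utilities;
using Sitecore.Data;
using Sitecore.Data.Items;

public IEnumerable<Item> GetItemsByKeywords(string[] keywords)
{
    var index = ContentSearchManager.GetIndex("sitecore_master_index");

    // Only Page templates should be returned
    var allowedTemplates = new List<ID>
    {
        new ID("{842FAE42-802A-41F5-96DA-82FD038A9EB0}")
    };

    using (var context = index.CreateSearchContext(SearchSecurityOptions.EnableSecurityCheck))
    {
        // AND-ed conditions start from True; OR-ed conditions must start from False,
        // otherwise "true OR ..." matches every document and the template filter does nothing
        var keywordsPredicate = PredicateBuilder.True<SearchResultItem>();
        var templatePredicate = PredicateBuilder.False<SearchResultItem>();

        // Only return results from allowed templates
        templatePredicate = allowedTemplates.Aggregate(templatePredicate, (current, t) => current.Or(p => p.TemplateId == t));

        // Every keyword must appear in the full-text "_content" field
        foreach (string keyword in keywords)
        {
            keywordsPredicate = keywordsPredicate.And(p => p.Content.Contains(keyword));
        }

        SearchResults<SearchResultItem> results = context.GetQueryable<SearchResultItem>()
            .Where(keywordsPredicate)
            .Filter(templatePredicate)
            .GetResults();

        // Materialize the list before the search context is disposed;
        // GetItem() returns null if the item no longer exists, so skip those hits
        return results.Hits
            .Select(hit => hit.Document.GetItem())
            .Where(item => item != null)
            .ToList();
    }
}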

Answer 1:

You could create a computed field in the index which looks at the renderings on the page and resolves each rendering's data source item. Once you have each of those items you can index their fields and concatenate all of this data together.

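One option is to add this text to the built-in "content" field ("_content" in the index), which is the field full-text search uses natively; it is also the field that SearchResultItem.Content, used in the question's query, maps to.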
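A minimal sketch of such a computed field, assuming the Sitecore 7.2 ContentSearch API. The class name RenderingDataSourceContent, the namespace, the "Default" device and the list of text field types are illustrative assumptions, not part of the original answer. The usual way to wire it up is to register the class in the index configuration's <fields hint="raw:AddComputedIndexField"> section with fieldName="_content", so its text is added to the full-text field; check the exact location against your version's default index configuration file. Because the question's query already filters on the Page template and searches p.Content, pages should then match on the text of their renderings.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.ComputedFields;
using Sitecore.Data.Fields;
using Sitecore.Data.Items;

namespace MySite.Search // hypothetical namespace
{
    // Hypothetical computed field: collects the text of every rendering data source on a page.
    public class RenderingDataSourceContent : IComputedIndexField
    {
        public string FieldName { get; set; }
        public string ReturnType { get; set; }

        // Field types treated as searchable text (an assumption; adjust for your templates).
        private static readonly HashSet<string> TextFieldTypes = new HashSet<string>
        {
            "single-line text", "multi-line text", "rich text"
        };

        public object ComputeFieldValue(IIndexable indexable)
        {
            var indexableItem = indexable as SitecoreIndexableItem;
            if (indexableItem == null) return null;

            Item page = indexableItem.Item;

            // Assumes the renderings you care about are on the "Default" device.
            DeviceItem device = page.Database.Resources.Devices.GetAll()
                .FirstOrDefault(d => d.Name == "Default");
            if (device == null) return null;

            var text = new StringBuilder();
            var seen = new HashSet<string>();

            foreach (var rendering in page.Visualization.GetRenderings(device, false))
            {
                // Skip renderings without a data source, and data sources already processed.
                string dataSource = rendering.Settings.DataSource;
                if (string.IsNullOrEmpty(dataSource) || !seen.Add(dataSource)) continue;

                // Resolve the data source (an ID or a path) in the page's language.
                Item dataSourceItem = page.Database.GetItem(dataSource, page.Language);
                if (dataSourceItem == null) continue;

                dataSourceItem.Fields.ReadAll();
                foreach (Field field in dataSourceItem.Fields)
                {
                    // Ignore standard fields ("__...") and non-text field types.
                    if (field.Name.StartsWith("__") || !TextFieldTypes.Contains(field.TypeKey)) continue;
                    if (string.IsNullOrEmpty(field.Value)) continue;

                    // Rich text holds markup, so replace tags with spaces before indexing.
                    text.Append(Regex.Replace(field.Value, "<[^>]+>", " ")).Append(' ');
                }
            }

            return text.Length > 0 ? text.ToString() : null;
        }
    }
}
```

One design consequence to keep in mind: by default, editing a data source item reindexes only that item, not the pages that use it, so a page's indexed text can lag behind until the page itself is reindexed.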



Answer 2:

An alternative solution is to make an HTTP request back to your published site and scrape the rendered HTML. Because you index the page as a visitor sees it, every rendering on the page is included in the index.

You probably will not want to index common parts such as the menu and footer, so use HtmlAgilityPack or FizzlerEx to return only the contents of a particular parent container. You could get more clever and remove inner containers if you need to. Just remember to strip out the HTML tags as well :)

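using HtmlAgilityPack;
using Fizzler.Systems.HtmlAgilityPack;

// Get the page from the published site
var web = new HtmlWeb();
var document = web.Load("http://localsite/url-to-page");
var page = document.DocumentNode;

// Keep only the main content container, skipping shared parts such as the menu and footer
var content = page.QuerySelector("div.main-content");

// InnerText drops the HTML tags; DeEntitize decodes entities such as &amp;
var text = content != null ? HtmlEntity.DeEntitize(content.InnerText) : string.Empty;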
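To get the scraped text into the index, one option is to run the scrape inside a computed field, so the text is stored against the page item itself. A rough sketch, assuming Sitecore 7.2 plus HtmlAgilityPack and Fizzler: the class name ScrapedPageContent, the SiteBaseUrl constant, the "website" site name and the div.main-content selector are placeholders, not part of the original answer. Note that it fetches the published page, so the indexed text reflects the published site even if the index is built from master, and every indexed page costs one HTTP request.

```csharp
using System.Net;
using Fizzler.Systems.HtmlAgilityPack;
using HtmlAgilityPack;
using Sitecore;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.ComputedFields;
using Sitecore.Data.Items;
using Sitecore.Links;
using Sitecore.Sites;

namespace MySite.Search // hypothetical namespace
{
    // Hypothetical computed field: indexes the main content area of the published page.
    public class ScrapedPageContent : IComputedIndexField
    {
        public string FieldName { get; set; }
        public string ReturnType { get; set; }

        // Placeholder: the public base URL of the published site.
        private const string SiteBaseUrl = "http://localsite";

        public object ComputeFieldValue(IIndexable indexable)
        {
            var indexableItem = indexable as SitecoreIndexableItem;
            if (indexableItem == null) return null;

            Item page = indexableItem.Item;

            // Skip items without layout details; they are not pages and have no URL to request.
            if (string.IsNullOrEmpty(page[FieldIDs.LayoutField])) return null;

            // Build the page path in the context of the public site (assumed to be "website").
            UrlOptions options = LinkManager.GetDefaultUrlOptions();
            options.Site = SiteContextFactory.GetSiteContext("website");
            options.AlwaysIncludeServerUrl = false;
            string url = SiteBaseUrl + LinkManager.GetItemUrl(page, options);

            try
            {
                HtmlDocument document = new HtmlWeb().Load(url);

                // Keep only the main content container, not the menu or footer.
                HtmlNode content = document.DocumentNode.QuerySelector("div.main-content");

                // InnerText drops the tags; DeEntitize decodes entities such as &amp;
                return content == null ? null : HtmlEntity.DeEntitize(content.InnerText);
            }
            catch (WebException)
            {
                // The request failed: store nothing for this field instead of failing the item.
                return null;
            }
        }
    }
}
```

As with the data source approach, the field would be registered under the index configuration's AddComputedIndexField section, and its name (for example "_content") decides which queries will see the scraped text.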