As part of a search engine, I have developed an inverted index.
I have a list that contains elements of the following type:
```csharp
public struct ForwardBarrelRecord
{
    public string DocId;
    public int hits { get; set; }
    public List<int> hitLocation;
}
```
Each record belongs to a single word. hitLocation contains the positions at which that word occurs in the document.
What I want is to measure how close the elements of one List<int> hitLocation are to the elements of another List<int> hitLocation
and, if elements of the two lists are adjacent, to increase the weight of both records.
The problem I am having is finding a suitable algorithm for this. Any help is appreciated.
This is easiest if the hitLocation lists are in sorted order, so start by sorting them. That said, if you're doing this for a search engine, you'll probably want those lists to be stored pre-sorted in your inverted index.
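A minimal sketch of the sorting step, assuming the ForwardBarrelRecord struct from the question. The names HitSorting and SortHitLocations are hypothetical. List<int>.Sort() sorts in place in ascending order; because List<int> is a reference type, sorting through the struct copy in the foreach loop still updates the list the record points to.

```csharp
using System;
using System.Collections.Generic;

public struct ForwardBarrelRecord
{
    public string DocId;
    public int hits { get; set; }
    public List<int> hitLocation;
}

public static class HitSorting
{
    // Hypothetical helper: sorts every record's hit positions ascending, in place.
    public static void SortHitLocations(IEnumerable<ForwardBarrelRecord> records)
    {
        foreach (var record in records)
        {
            // record is a copy of the struct, but hitLocation is a shared List<int>
            // reference, so Sort() mutates the original list.
            record.hitLocation?.Sort();
        }
    }
}
```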
In any case, once you have the lists sorted, finding matches is pretty easy.
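One way to do the matching is a merge-scan that walks both sorted lists with one index each. The sketch below is an illustration under the assumption that positions are ascending integers and that "adjacent" means word2 appears at exactly word1's position + 1; AdjacencyScan and FindFollowedBy are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class AdjacencyScan
{
    // Merge-scans two ascending position lists and returns every pair (p, p + 1)
    // where word1 occurs at p and word2 occurs at p + 1.
    public static List<(int First, int Second)> FindFollowedBy(List<int> word1Hits, List<int> word2Hits)
    {
        var matches = new List<(int First, int Second)>();
        int i = 0, j = 0;
        while (i < word1Hits.Count && j < word2Hits.Count)
        {
            int a = word1Hits[i];
            int b = word2Hits[j];
            if (b == a + 1)
            {
                // Adjacent: word2 sits directly after word1.
                matches.Add((a, b));
                i++;
                j++;
            }
            else if (b <= a)
            {
                // b is too early to follow this or any later word1 hit.
                j++;
            }
            else
            {
                // b > a + 1: nothing in word2Hits can sit at a + 1 any more.
                i++;
            }
        }
        return matches;
    }
}
```

For example, word1 hits [1, 5, 9] and word2 hits [2, 7, 10] yield the pairs (1, 2) and (9, 10). Each step advances at least one index, so the scan runs in O(n + m) time.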
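A merge-scan like that locates occurrences of word1 immediately followed by word2, where word1 and word2 are the two words whose hitLocation lists you are comparing. If you also want word2 followed by word1, you can put a similar check in the
else
branch of the comparison. Either way, the approach is a single merge-scan of the two sorted lists that outputs all adjacent hit locations.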
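A sketch that counts adjacency in both orders and turns it into a weight bonus, assuming sorted lists of distinct positions. The question's record type has no weight field, so ProximityBonus and its perPair factor are hypothetical: you would add the returned value to whatever score you keep for both records. Unlike the one-direction scan, this version always advances the index holding the smaller position, which ensures each pair of neighbouring positions is compared exactly once.

```csharp
using System;
using System.Collections.Generic;

public static class Proximity
{
    // Counts pairs of positions (one from each sorted list) that differ by exactly 1,
    // covering both word1-then-word2 and word2-then-word1.
    public static int CountAdjacentPairs(List<int> word1Hits, List<int> word2Hits)
    {
        int count = 0, i = 0, j = 0;
        while (i < word1Hits.Count && j < word2Hits.Count)
        {
            int a = word1Hits[i];
            int b = word2Hits[j];
            if (b == a + 1) count++;        // word1 followed by word2
            else if (a == b + 1) count++;   // word2 followed by word1
            // Advance the smaller position, as in a standard merge.
            if (a <= b) i++; else j++;
        }
        return count;
    }

    // Hypothetical weighting: a fixed bonus per adjacent pair, to be added to both records.
    public static double ProximityBonus(List<int> word1Hits, List<int> word2Hits, double perPair = 1.0)
        => CountAdjacentPairs(word1Hits, word2Hits) * perPair;
}
```

For instance, word1 at [5] and word2 at [4, 6] gives a count of 2: one pair in each order.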