I have perhaps trillions of string sequences. I'm looking for a fast substring search.
I've created an index. When I query it with a StartsWith filter (x => x.StartsWith), it takes about 2 seconds on a database of 3 million objects.
How long might it take on 500 million objects?
Is there a way to make RavenDB search faster?
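// Index every KeyPhraseInfo document by its Key, tokenized with Lucene's SimpleAnalyzer
// (query syntax below requires: using System.Linq;)
store.DatabaseCommands.PutIndex("KeyPhraseInfoByWord", new Raven.Client.Indexes.IndexDefinitionBuilder<KeyPhraseInfo>
{
    // The query must range over the lambda's own parameter (the documents being indexed)
    Map = keyPhraseInfos => from keyPhraseInfo in keyPhraseInfos
                            select new { keyPhraseInfo.Key },
    Analyzers =
    {
        { x => x.Key, "SimpleAnalyzer" }
    }
});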
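For context, here is a rough, self-contained C# sketch of why a StartsWith query against this index cannot find arbitrary substrings. It assumes that RavenDB turns StartsWith into a Lucene prefix query over the indexed terms, and that SimpleAnalyzer splits text on non-letter characters and lower-cases it (which is how Lucene describes it). SimpleAnalyzerSketch, Tokenize and PrefixMatch are hypothetical names; nothing here calls RavenDB or Lucene.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical emulation of Lucene's SimpleAnalyzer plus a prefix query.
public static class SimpleAnalyzerSketch
{
    // Split on any non-letter character and lower-case each token.
    public static List<string> Tokenize(string text)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();
        foreach (char c in text)
        {
            if (char.IsLetter(c))
            {
                current.Append(char.ToLowerInvariant(c));
            }
            else if (current.Length > 0)
            {
                tokens.Add(current.ToString());
                current.Clear();
            }
        }
        if (current.Length > 0) tokens.Add(current.ToString());
        return tokens;
    }

    // A prefix query matches indexed terms that *begin* with the prefix,
    // so text sitting in the middle of a token is never found.
    public static bool PrefixMatch(string text, string prefix) =>
        Tokenize(text).Any(t => t.StartsWith(prefix.ToLowerInvariant(), StringComparison.Ordinal));

    public static void Main()
    {
        const string key = "RavenDB substring search";
        Console.WriteLine(string.Join("|", Tokenize(key))); // ravendb|substring|search
        Console.WriteLine(PrefixMatch(key, "sub"));          // True
        Console.WriteLine(PrefixMatch(key, "string"));       // False: "string" is inside the token "substring"
    }
}
```

This is the gap an n-gram analyzer closes: it indexes the pieces inside each word as terms of their own.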
Nier0,
Yes, you can do really fast NGram search with RavenDB.
See: https://gist.github.com/1669767
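The principle behind n-gram search can be sketched in plain C# with no RavenDB or Lucene dependency: at indexing time every value is broken into overlapping substrings of length n, so a substring query becomes a lookup of the query's own n-grams followed by a check of the few candidates that contain all of them. NGramSearchSketch, NGrams, BuildIndex and Search are hypothetical names, and the gram size of 3 is an assumption; the analyzer in the gist works inside Lucene's tokenizer pipeline and may differ in detail.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration of n-gram substring search (not the gist's code).
public static class NGramSearchSketch
{
    // Every lower-cased substring of length n, e.g. "Raven", n = 3 -> rav, ave, ven.
    public static IEnumerable<string> NGrams(string text, int n)
    {
        var s = text.ToLowerInvariant();
        for (int i = 0; i + n <= s.Length; i++)
            yield return s.Substring(i, n);
    }

    // Inverted index: n-gram -> ids of the documents that contain it.
    public static Dictionary<string, HashSet<int>> BuildIndex(IList<string> docs, int n)
    {
        var index = new Dictionary<string, HashSet<int>>();
        for (int id = 0; id < docs.Count; id++)
        {
            foreach (var gram in NGrams(docs[id], n))
            {
                if (!index.TryGetValue(gram, out var ids))
                    index[gram] = ids = new HashSet<int>();
                ids.Add(id);
            }
        }
        return index;
    }

    // Intersect the posting sets of the query's n-grams, then verify the
    // candidates, because sharing all n-grams does not guarantee a match.
    public static List<int> Search(IList<string> docs, Dictionary<string, HashSet<int>> index, string query, int n)
    {
        var q = query.ToLowerInvariant();
        var grams = NGrams(q, n).Distinct().ToList();
        if (grams.Count == 0) // query shorter than n: fall back to a full scan
            return Enumerable.Range(0, docs.Count)
                .Where(i => docs[i].ToLowerInvariant().Contains(q)).ToList();

        HashSet<int> candidates = null;
        foreach (var gram in grams)
        {
            if (!index.TryGetValue(gram, out var ids)) return new List<int>();
            if (candidates == null) candidates = new HashSet<int>(ids);
            else candidates.IntersectWith(ids);
        }
        return candidates.Where(i => docs[i].ToLowerInvariant().Contains(q)).OrderBy(i => i).ToList();
    }

    public static void Main()
    {
        var docs = new List<string> { "RavenDB substring search", "Lucene analyzer", "n-gram tokens" };
        var index = BuildIndex(docs, 3);
        Console.WriteLine(string.Join(",", Search(docs, index, "string", 3))); // 0
        Console.WriteLine(string.Join(",", Search(docs, index, "nalyz", 3)));  // 1
    }
}
```

The trade-off is index size: each value of length L produces about L - n + 1 terms, in exchange for substring lookups that no longer scan every stored value.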
Ayende's excellent NGram analyzer seems to have been written for an older version of Lucene than the one RavenDB uses now, so I made an updated version of it for anyone as confused as I was. See: http://pastebin.com/a78XzGDk. All credit goes to Ayende for this one.
To use it, put it in a class library, build it, and drop the resulting DLL into the Analyzers folder under Server in the RavenDB directory. Then create an index like this:
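using System.Linq;
using Raven.Client.Indexes;

// Index Posts by Name using the NGram analyzer built above.
// typeof(NGramAnalyzer) requires this project to reference that library as well.
public class PostByNameIndex : AbstractIndexCreationTask<Posts>
{
    public PostByNameIndex()
    {
        Map = posts => posts.Select(x => new { x.Name });

        // The analyzer is referenced by its assembly-qualified type name;
        // the server resolves it from the DLL in its Analyzers folder.
        Analyze(x => x.Name, typeof(NGramAnalyzer).AssemblyQualifiedName);
    }
}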
But as I said, all credit and thanks to Ayende for creating this.