Custom full-text index stored in Cassandra

Posted 2019-05-26 07:15

Question:

I'm using Cassandra as my database and I need full-text search capability. I'm aware of Apache Solr, Apache Cassandra, and DSE Search.

However, I do not want to use costly, proprietary software (DSE Search). I also do not want to use Apache Solr, because I don't want to manage HA, sharding, and redundancy for it separately. Cassandra already handles HA, sharding, and redundancy well, so I would like to store my full-text index in the existing Cassandra DB.

So what I'm looking for is something that will break down a string into its indexable parts. For example:

String input = "I like apples and bananas.";

String[] tokens = makeTokenIndex(input);

// tokens = {"I", "like", "apples", "bananas", "apple", "banana"}

Obviously I could split strings on spaces and use the words as index keys, but I'm looking for something smarter than that: something that can handle plurals, find the root of a word, and so on.
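To make the requirement concrete, here is a minimal sketch of what such a function might do, using only the Java standard library. The class name `TokenIndexSketch`, the tiny stop-word list, and the `naiveStem` helper are all hypothetical. The plural stripping is deliberately crude and is not a real stemming algorithm; a proper analyzer (for example, one from Lucene) would do this far more carefully.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

final class TokenIndexSketch {
    // Tiny illustrative stop-word list (an assumption); real analyzers ship larger ones.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "or", "the"));

    // Hypothetical implementation of the makeTokenIndex call from the question.
    static String[] makeTokenIndex(String input) {
        // LinkedHashSet keeps first-seen order and drops duplicate tokens.
        Set<String> tokens = new LinkedHashSet<>();
        // Split on any run of characters that are neither letters nor digits.
        for (String raw : input.split("[^\\p{L}\\p{Nd}]+")) {
            if (raw.isEmpty()) {
                continue; // leading punctuation produces an empty first element
            }
            String word = raw.toLowerCase(Locale.ROOT);
            if (STOP_WORDS.contains(word)) {
                continue;
            }
            tokens.add(word);            // the surface form
            tokens.add(naiveStem(word)); // the (crudely) reduced form
        }
        return tokens.toArray(new String[0]);
    }

    // Crude English plural stripping. NOT a real stemmer: it will mangle
    // words such as "this" and does not handle irregular plurals.
    static String naiveStem(String word) {
        if (word.length() > 4 && word.endsWith("ies")) {
            return word.substring(0, word.length() - 3) + "y";
        }
        if (word.length() > 3 && word.endsWith("s") && !word.endsWith("ss")) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }
}
```

Unlike the hand-written example above, this sketch lowercases every token and emits each stem right after its surface form, so the order differs. Lowercasing at index time only works if queries are normalized the same way.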

Would modifying Apache Lucene be the best solution for this, or is there another option?

Answer 1:

I've not used Cassandra, but I think you're talking about using a Cassandra implementation of Lucene's Directory interface. Lucene uses a Directory to interact with a storage mechanism.

I found a couple of projects that might help:

  • lucene-on-cassandra
  • Solandra

I can't speak with experience about either one, though.
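For reference, the "words as index keys" idea from the question amounts to an inverted index: each term maps to the set of documents that contain it. The sketch below keeps that mapping in memory purely to illustrate the shape of the data. The class `InvertedIndexSketch` and its methods are hypothetical, and the suggested Cassandra layout in the comments (term as partition key, document ID as clustering column) is an assumption rather than something taken from either project listed above.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class InvertedIndexSketch {
    // term -> IDs of the documents containing that term.
    // Assumption: in Cassandra this could become one partition per term,
    // with the document ID as a clustering column.
    private final Map<String, Set<String>> postings = new TreeMap<>();

    // Record that the given document contains each of the given tokens.
    void index(String docId, String[] tokens) {
        for (String token : tokens) {
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
        }
    }

    // Return the IDs of all documents containing the term (empty if none).
    Set<String> lookup(String term) {
        return postings.getOrDefault(term, Collections.<String>emptySet());
    }
}
```

A query for a single term is then one key lookup, which is the access pattern Cassandra handles well. Multi-term queries, ranking, and phrase search are what Lucene adds on top, and they are not covered by this sketch.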