There are a few threads floating around on the topic, but I think my use-case is somewhat different.
What I want to do:
- Full text search component for my GAE/J app
- The index size is small: 25-50MB or so
- I do not need live updates to the index, a periodic re-indexing is fine
- This is for auto-complete and the like, so it needs to be extremely fast (I get the impression that implementing an inverted index in Datastore introduces considerable latency)
My strategy so far (just planning, haven't tried implementing anything yet):
- Use Lucene with RAMDirectory
- A periodic cron job creates the index, serializes it to the Datastore, and stores an update id (or timestamp)
- Search servlet loads the index on startup and creates the RAMDirectory
- On each request the servlet checks the current update id and reloads the index as necessary
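Roughly what I have in mind for the last two steps, as a rough sketch in plain Java (8+). `ReloadingIndex` and the two callbacks are names I made up: the first callback stands in for "read the current update id from the Datastore", the second for "deserialize the index with that id into a RAMDirectory". No Lucene or GAE calls are shown.

```java
import java.util.function.LongFunction;
import java.util.function.LongSupplier;

/** Hypothetical holder: keeps one in-memory index per instance and reloads it when the update id changes. */
final class ReloadingIndex<T> {
    private final LongSupplier currentUpdateId; // assumption: reads the id written by the cron job
    private final LongFunction<T> loader;       // assumption: loads the index stored under that id

    // Both fields are only written inside the synchronized block below.
    private volatile long loadedId;
    private volatile T index; // null until the first load

    ReloadingIndex(LongSupplier currentUpdateId, LongFunction<T> loader) {
        this.currentUpdateId = currentUpdateId;
        this.loader = loader;
    }

    /** Returns the index for the latest update id, reloading at most once per id change. */
    T get() {
        long latest = currentUpdateId.getAsLong();
        synchronized (this) {
            if (index == null || loadedId != latest) {
                index = loader.apply(latest);
                loadedId = latest;
            }
            return index;
        }
    }
}
```

Each instance would hold its own `ReloadingIndex` and pick up a new index the next time it sees a changed id, so nothing is pushed between instances.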
The main thing I'm fuzzy on is how to synchronize in-memory data between instances - will this work, or am I missing something?
Also, how far can I push it before I start having problems with memory use? I couldn't find anything on RAM quotas for GAE. (This index is small, but I can think of more stuff I'd like to add.)
And, of course, any thoughts on better approaches?
For autocomplete, perhaps you could store the top N matches for each prefix (basically what you'd put in the drop-down menu) in memcache? The memcache entries could be backed by entities in the datastore and reloaded if needed.
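A minimal sketch of the precomputation, assuming your terms are already sorted best-first (by popularity or whatever ranking you use). `PrefixSuggestions` is a made-up name, and a plain `HashMap` stands in for memcache here; each map entry is roughly what you would store under one memcache key. Needs Java 8+.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: precomputes the top-N completions for every prefix. */
final class PrefixSuggestions {
    // prefix (lower-cased) -> up to topN terms, best-ranked first
    private final Map<String, List<String>> byPrefix = new HashMap<>();

    /**
     * @param termsByRank     terms sorted best-first (assumption: ranking is done elsewhere)
     * @param maxPrefixLength longest prefix to precompute
     * @param topN            how many suggestions to keep per prefix
     */
    PrefixSuggestions(List<String> termsByRank, int maxPrefixLength, int topN) {
        for (String term : termsByRank) {
            String key = term.toLowerCase(Locale.ROOT);
            int limit = Math.min(key.length(), maxPrefixLength);
            for (int len = 1; len <= limit; len++) {
                List<String> bucket =
                        byPrefix.computeIfAbsent(key.substring(0, len), p -> new ArrayList<>());
                // Terms arrive best-first, so the first topN seen are the ones to keep.
                if (bucket.size() < topN) {
                    bucket.add(term);
                }
            }
        }
    }

    /** Returns the stored suggestions for what the user has typed, or an empty list. */
    List<String> suggest(String typed) {
        List<String> hits = byPrefix.get(typed.toLowerCase(Locale.ROOT));
        return hits == null
                ? Collections.<String>emptyList()
                : Collections.unmodifiableList(hits);
    }
}
```

A lookup is then a single key fetch per keystroke, which is what makes this attractive for a drop-down. Prefixes longer than `maxPrefixLength` return nothing in this sketch, so you would need a fallback for those.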
If you're okay with periodic rebuilds, and your index is small, your current approach sounds mostly okay. Instead of building the index online and serializing it to the datastore, though, why not build it offline, and upload it with the app? Then, you can instantiate it directly from the disk store, and to push an update, you deploy a new version of your app.
App Engine now includes a full-text search API (Experimental): https://developers.google.com/appengine/docs/java/search/
Well, as of GAE 1.5.0, it looks like resident Backends can be used to create a search service.
Of course, there's no free quota for these.
GAE recently added a "text search" service. Take a look at GAE Java Text Search.