Full text search on Google App Engine (Java)

Posted 2019-04-26 18:42

There are a few threads floating around on the topic, but I think my use-case is somewhat different.

What I want to do:

  • Full text search component for my GAE/J app
  • The index size is small: 25-50MB or so
  • I do not need live updates to the index, a periodic re-indexing is fine
  • This is for auto-complete and the like, so it needs to be extremely fast (I get the impression that implementing an inverted index in Datastore introduces considerable latency)

My strategy so far (just planning, haven't tried implementing anything yet):

  • Use Lucene with RAMDirectory
  • A periodic cron job creates the index, serializes it to the Datastore, stores an update id (or timestamp)
  • Search servlet loads the index on startup and creates the RAMDirectory
  • On each request the servlet checks the current update id and reloads the index as necessary
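
The last two steps above could be sketched roughly as follows. This is only an illustration in plain Java, not Lucene or App Engine code: `ReloadingIndexHolder`, `currentUpdateId` and `loader` are hypothetical names. In a real app the supplier would be a cheap Datastore or memcache read, and the loader would deserialize the stored index into a RAMDirectory. It also assumes update ids only ever increase.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;
import java.util.function.Supplier;

/** Holds one index snapshot per instance and swaps it when the stored update id moves forward. */
final class ReloadingIndexHolder<T> {

    /** An index paired with the update id it was loaded for. */
    private static final class Snapshot<T> {
        final long updateId;
        final T index;

        Snapshot(long updateId, T index) {
            this.updateId = updateId;
            this.index = index;
        }
    }

    // Hypothetical hooks: how the id is read and how the index is loaded is up to the app.
    private final Supplier<Long> currentUpdateId;
    private final Function<Long, T> loader;
    private final AtomicReference<Snapshot<T>> snapshot = new AtomicReference<>();

    ReloadingIndexHolder(Supplier<Long> currentUpdateId, Function<Long, T> loader) {
        this.currentUpdateId = currentUpdateId;
        this.loader = loader;
    }

    /** Returns the index for the latest update id, reloading at most once per id. */
    T get() {
        long latest = currentUpdateId.get();
        Snapshot<T> seen = snapshot.get();
        if (seen != null && seen.updateId >= latest) {
            return seen.index; // fast path: nothing newer has been published
        }
        synchronized (this) {
            // Re-check under the lock so concurrent requests do not all reload.
            seen = snapshot.get();
            if (seen == null || seen.updateId < latest) {
                seen = new Snapshot<>(latest, loader.apply(latest));
                snapshot.set(seen);
            }
            return seen.index;
        }
    }
}
```

Each instance would keep its own holder, so instances never talk to each other. They converge independently as each one notices the new update id on its next request.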

The main thing I'm fuzzy on is how to synchronize in-memory data between instances - will this work, or am I missing something?

Also, how far can I push it before I start having problems with memory use? I couldn't find anything on RAM quotas for GAE. (This index is small, but I can think of more stuff I'd like to add)

And, of course, any thoughts on better approaches?

5 answers
在下西门庆
#2 · 2019-04-26 18:51

For autocomplete, perhaps you could store the top N matches for each prefix (basically what you'd put in the drop-down menu) in memcache? The memcache entries could be backed by entities in the datastore and reloaded if needed.
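
A minimal sketch of that prefix table, in plain Java with an in-memory map standing in for memcache. The class name `PrefixSuggestions`, the integer scores and the `maxPrefixLength` cap are assumptions made for illustration. In an app, each map entry could be written to memcache under its prefix as the key.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class PrefixSuggestions {

    /** Builds prefix -> top-N terms, ranked by descending score, ties broken alphabetically. */
    static Map<String, List<String>> build(Map<String, Integer> termScores, int topN, int maxPrefixLength) {
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>(termScores.entrySet());
        ranked.sort((a, b) -> {
            int byScore = Integer.compare(b.getValue(), a.getValue());
            return byScore != 0 ? byScore : a.getKey().compareTo(b.getKey());
        });

        Map<String, List<String>> byPrefix = new HashMap<>();
        for (Map.Entry<String, Integer> entry : ranked) {
            String term = entry.getKey();
            String key = term.toLowerCase(Locale.ROOT);
            int limit = Math.min(key.length(), maxPrefixLength);
            for (int len = 1; len <= limit; len++) {
                List<String> bucket = byPrefix.computeIfAbsent(key.substring(0, len), p -> new ArrayList<>());
                // Terms arrive best-first, so the first topN per prefix are the ones to keep.
                if (bucket.size() < topN) {
                    bucket.add(term);
                }
            }
        }
        return byPrefix;
    }

    /** Looks up suggestions for what the user has typed so far; empty if the prefix is unknown. */
    static List<String> suggest(Map<String, List<String>> byPrefix, String typed) {
        return byPrefix.getOrDefault(typed.toLowerCase(Locale.ROOT), Collections.<String>emptyList());
    }
}
```

A lookup is then a single key fetch per keystroke, which is why this fits autocomplete well. The cost is that the table grows with the number of terms times the prefix cap.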

Animai°情兽
#3 · 2019-04-26 18:53

If you're okay with periodic rebuilds, and your index is small, your current approach sounds mostly okay. Instead of building the index online and serializing it to the datastore, though, why not build it offline, and upload it with the app? Then, you can instantiate it directly from the disk store, and to push an update, you deploy a new version of your app.

祖国的老花朵
#4 · 2019-04-26 18:56

App Engine now includes a full-text search API (Experimental): https://developers.google.com/appengine/docs/java/search/

别忘想泡老子
#5 · 2019-04-26 18:59

As of GAE 1.5.0, it looks like resident Backends can be used to create a search service.

Of course, there's no free quota for these.

老娘就宠你
#6 · 2019-04-26 19:04

GAE recently added a "text search" service. Take a look at GAE Java Text Search.
