Is SQLite on Android built with the ICU tokenizer

Posted 2019-02-17 01:04

Like the title says: can we use ...USING fts3(tokenize=icu th_TH, ...)? If we can, does anyone know which locales are supported, and whether that varies by platform version?

3 answers
Melony?
#2 · 2019-02-17 01:26

No, only the porter tokenizer (tokenize=porter) is available.

When I specify tokenize=icu, I get "android.database.sqlite.SQLiteException: unknown tokenizer: icu".

Also, this thread suggests that if Android's SQLite build doesn't compile the tokenizer in by default, it will not be available: http://sqlite.phxsoftware.com/forums/t/2349.aspx
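
Rather than relying on API-level assumptions, you can probe a device at runtime. The sketch below tries to create a throwaway FTS3 table in the temp schema with each candidate tokenizer and returns the first one SQLite accepts. The class name, the probe table name, and passing the executor as a Consumer are my own illustrative choices; on Android you would pass db::execSQL, since SQLiteDatabase.execSQL throws SQLiteException (a RuntimeException) on errors such as "unknown tokenizer: icu". It only tells you whether a tokenizer is compiled in, not how well it handles a given locale.

```java
import java.util.List;
import java.util.function.Consumer;

// Sketch: find out which FTS3 tokenizers this SQLite build accepts by trying to
// create a throwaway virtual table with each candidate.
final class TokenizerProbe {
    private TokenizerProbe() {}

    /**
     * Returns the first spec SQLite accepts, or null if none are accepted.
     *
     * @param execSql runs one SQL statement and throws a RuntimeException on error
     *                (on Android, db::execSQL throws SQLiteException, which is one).
     * @param specs   hard-coded tokenizer specs such as "icu th_TH" or "porter";
     *                they are concatenated into SQL, so never pass user input.
     */
    static String firstSupported(Consumer<String> execSql, List<String> specs) {
        for (String spec : specs) {
            try {
                execSql.accept("CREATE VIRTUAL TABLE temp.tokenizer_probe USING fts3(tokenize=" + spec + ")");
            } catch (RuntimeException e) {
                // e.g. "unknown tokenizer: icu"; any other failure is treated the same way.
                continue;
            }
            // The tokenizer was accepted: clean up and report it.
            execSql.accept("DROP TABLE temp.tokenizer_probe");
            return spec;
        }
        return null;
    }
}
```

A possible call on Android: TokenizerProbe.firstSupported(db::execSQL, Arrays.asList("icu th_TH", "porter", "simple")), where "simple" is SQLite's built-in default tokenizer.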

Animai°情兽
#3 · 2019-02-17 01:30

On API level 21 and higher, I tested and found that the ICU tokenizer is already available.

However, to support 90%+ of devices, you need a workaround. I have an idea for one, which I also described in another question of mine: Work around of Android SQLite full-text search for Asian text

You can port the ICU tokenizer's functionality to Java, or to a native Android module, as a standalone component that isn't hooked into SQLite. Then use an "external content table" to link your data to the FTS virtual table (external content tables require FTS4).
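
A minimal sketch of the schema this implies, assuming hypothetical names (notes, _id, body, notes_fts). The FTS4 option content="notes" tells FTS4 to read column values from the ordinary notes table by rowid, so _id is declared INTEGER PRIMARY KEY to make it the rowid alias that lines up with the FTS docid. The FTS table keeps SQLite's default simple tokenizer, which splits on ASCII spaces and punctuation and treats characters at or above codepoint 128 as token characters, so the artificial spaces become the word boundaries.

```java
// Sketch of the workaround schema: an ordinary table holds the original text,
// and an FTS4 table uses it as external content. All names are hypothetical.
final class NotesSchema {
    private NotesSchema() {}

    // _id is the rowid alias; FTS4 looks up content rows by this value.
    static final String CREATE_CONTENT =
        "CREATE TABLE notes (_id INTEGER PRIMARY KEY, body TEXT)";

    // content="notes" makes this an external content FTS4 table; the column
    // name must match the content table's column. Default (simple) tokenizer.
    static final String CREATE_FTS =
        "CREATE VIRTUAL TABLE notes_fts USING fts4(content=\"notes\", body)";

    // The search string must be segmented the same way (spaces at word
    // boundaries) before being bound to ?, or an unsegmented Thai or Chinese
    // phrase will be treated as a single token and not match.
    static final String SEARCH =
        "SELECT docid FROM notes_fts WHERE notes_fts MATCH ?";
}
```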

When inserting a row, add the original content to the external content table, but run the standalone tokenizer to insert artificial spaces at word boundaries before adding the text to the FTS virtual table.
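
The insert step could look roughly like this, reusing hypothetical table names (notes, notes_fts). SqlExecutor is a made-up interface so the sketch runs outside Android; SQLiteDatabase::execSQL(String, Object[]) has the matching shape. The segmenter is whatever standalone tokenizer you ported, passed in as a function that returns space-separated text. In a real app you would wrap both statements in one transaction.

```java
import java.util.function.UnaryOperator;

// Hypothetical executor: runs one statement with bind arguments.
// On Android, SQLiteDatabase::execSQL(String, Object[]) fits this shape.
interface SqlExecutor {
    void exec(String sql, Object[] args);
}

final class NoteWriter {
    private NoteWriter() {}

    /** segmenter: standalone tokenizer returning text with spaces at word boundaries. */
    static void insert(SqlExecutor db, UnaryOperator<String> segmenter, long id, String body) {
        // The original, unmodified text goes into the ordinary content table.
        db.exec("INSERT INTO notes (_id, body) VALUES (?, ?)", new Object[] {id, body});
        // The FTS4 index receives the space-separated text under the same docid.
        db.exec("INSERT INTO notes_fts (docid, body) VALUES (?, ?)",
                new Object[] {id, segmenter.apply(body)});
    }
}
```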

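When deleting a row, run the tokenizer again and overwrite the content table row with the space-separated text, then delete the row from the virtual table, then delete the row from the content table. The update is needed because FTS4 reads the deleted row's values from the content table to work out which index entries to remove, so those values must match what was indexed.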
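
The three-step delete might be sketched as follows, again with hypothetical names (notes, notes_fts, SqlExecutor) and assuming the caller has already read the stored original text for the row. The order of the statements is the point: the content row must hold the segmented text at the moment the FTS row is deleted.

```java
import java.util.function.UnaryOperator;

// Hypothetical executor: runs one statement with bind arguments.
// On Android, SQLiteDatabase::execSQL(String, Object[]) fits this shape.
interface SqlExecutor {
    void exec(String sql, Object[] args);
}

final class NoteDeleter {
    private NoteDeleter() {}

    /** body: the original text currently stored for this row (read by the caller). */
    static void delete(SqlExecutor db, UnaryOperator<String> segmenter, long id, String body) {
        // 1. Make the content row match what was indexed, because FTS4 reads
        //    the old values from the content table to find the index entries.
        db.exec("UPDATE notes SET body = ? WHERE _id = ?",
                new Object[] {segmenter.apply(body), id});
        // 2. Remove the row from the FTS4 index.
        db.exec("DELETE FROM notes_fts WHERE docid = ?", new Object[] {id});
        // 3. Remove the content row itself.
        db.exec("DELETE FROM notes WHERE _id = ?", new Object[] {id});
    }
}
```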

This is a little tricky, but compared with the alternative of recompiling a full SQLite build, it is much less effort.

For details on external content tables and how they work, see https://www.sqlite.org/fts3.html#section_6_2_2

An ICU word tokenizer is actually already available in the Android SDK: use BreakIterator.getWordInstance(). It even appears to support dictionary-based segmentation for languages such as Chinese. http://developer.android.com/reference/java/text/BreakIterator.html
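
Here is a minimal sketch of a standalone segmenter built on java.text.BreakIterator.getWordInstance, producing the space-separated text the workaround needs. Dropping segments that contain no letter or digit (punctuation, runs of whitespace) is my own choice. How well Thai or Chinese text is split depends on the segmentation data of the runtime; a desktop JDK may not segment Chinese at all, so test on real devices.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Sketch: join the words found by a word BreakIterator with single spaces,
// dropping punctuation and whitespace segments.
final class WordSegmenter {
    private WordSegmenter() {}

    static String spaceOut(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        StringBuilder out = new StringBuilder();
        int start = it.first();
        // Walk consecutive boundary pairs [start, end).
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end);
            if (containsLetterOrDigit(segment)) {
                if (out.length() > 0) out.append(' ');
                out.append(segment);
            }
        }
        return out.toString();
    }

    // True if any code point in s is a letter or digit (handles surrogate pairs).
    private static boolean containsLetterOrDigit(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (Character.isLetterOrDigit(cp)) return true;
            i += Character.charCount(cp);
        }
        return false;
    }
}
```

As the segmenter for the insert and delete steps you could pass something like s -> WordSegmenter.spaceOut(s, Locale.forLanguageTag("th-TH")), and run search strings through the same function before binding them to MATCH.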

Bombasti
#4 · 2019-02-17 01:36

I have some Android code that uses tokenization at the link below; maybe it will be of some help:

https://github.com/gast-lib/gast-lib/blob/master/app/src/root/gast/playground/speech/food/db/FtsIndexedFoodDatabase.java
