-->

How to deal with accented characters in iOS SQLite

2019-02-08 16:51发布

问题:

I need to perform a SELECT queries that are insensitive to case and accents. For demo purposes, I create a table like that:

create table table
(
  column text collate nocase
);

insert into table values ('A');
insert into table values ('a');
insert into table values ('Á');
insert into table values ('á');

create index table_cloumn_Index
  on table (column collate nocase);

Then, I get those results when executing the following queries:

SELECT * FROM table WHERE column LIKE 'a';
> A
> a

SELECT * FROM table WHERE column LIKE 'á';
> á

SELECT * FROM table WHERE column LIKE 'Á';
> Á

How can I fix that so the results for any of the following queries be like that:

> A
> a
> Á
> á

The sqlite is running on iOS, by the way.

Thanks in advance,

回答1:

Two basic approaches:

  1. You can create a second column in the table which contains the string without the international characters. Furthermore, before doing a search against this secondary search column, you should also remove international characters from the string being search for, too (that way you are comparing non-international to non-international).

    This is the routine I use to convert the international characters:

    NSData *data = [string dataUsingEncoding:NSASCIIStringEncoding allowLossyConversion:YES];
    string = [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding];
    

    You could also replace the accented characters with:

    NSMutableString *mutableString = [string mutableCopy];
    CFStringTransform((__bridge CFMutableStringRef)mutableString, NULL, kCFStringTransformStripCombiningMarks, NO);
    

    By the way, if you need to sort your results, you can also sort upon this secondary search field instead of the main field, which will avoid problems stemming from SQLite's inability to sort the international characters, either.

  2. You can alternatively create your own "unaccented" C function (define this C function outside the @implementation for your class):

    void unaccented(sqlite3_context *context, int argc, sqlite3_value **argv)
    {
        if (argc != 1 || sqlite3_value_type(argv[0]) != SQLITE_TEXT) {
            sqlite3_result_null(context);
            return;
        }
    
        @autoreleasepool {
            NSMutableString *string = [NSMutableString stringWithUTF8String:(const char *)sqlite3_value_text(argv[0])];
            CFStringTransform((__bridge CFMutableStringRef)string, NULL, kCFStringTransformStripCombiningMarks, NO);
            sqlite3_result_text(context, [string UTF8String], -1, SQLITE_TRANSIENT);
        }
    }
    

    You can then define a SQLite function that will call this C-function (call this method after you open the database, which will be effective until you close that database):

    - (void)createUnaccentedFunction
    {
        if (sqlite3_create_function_v2(database, "unaccented", 1, SQLITE_ANY, NULL, &unaccented, NULL, NULL, NULL) != SQLITE_OK)
            NSLog(@"%s: sqlite3_create_function_v2 error: %s", __FUNCTION__, sqlite3_errmsg(database));
    }
    

    Having done that, you can now use this new unaccented function in SQL, e.g.:

    if (sqlite3_prepare_v2(database, "select a from table where unaccented(column) like 'a'", -1, &statement, NULL) != SQLITE_OK)
        NSLog(@"%s: insert 1: %s", __FUNCTION__, sqlite3_errmsg(database));
    


回答2:

You will need to either create some user function, or override (i.e. replace) the default implementation of the like() functions. The reason is that the LIKE operator in sqlite doesn't support non-ASCII case-insensitiveness:

SQLite only understands upper/lower case for ASCII characters by default. The LIKE operator is case sensitive by default for unicode characters that are beyond the ASCII range. For example, the expression 'a' LIKE 'A' is TRUE but 'æ' LIKE 'Æ' is FALSE.

This makes sense otherwise sqlite would need to support different cultures since case varies from one to the other. An example is the capital i in Turkey which is not I but a dotted İ, and the lower-case of I is a dot-less ı. Embedding all this culture information in sqlite would be very burdensome (i.e. it would increase sqlite object code).



回答3:

Here is my solution of LIKE problem

static void myLow(sqlite3_context *context, int argc, sqlite3_value **argv)
{
    NSString* str = [[NSString alloc] initWithUTF8String:
                            (const char *)sqlite3_value_text(argv[0])];
    const char* s = [[str lowercaseString] UTF8String];
    sqlite3_result_text(context, s, strlen(s), NULL);
    [str release];
}

// call it once after opening db
sqlite3_create_function(_db, "myLow", 1, SQLITE_UTF8,NULL, &myLow, NULL, NULL);

And then instead of query

SELECT * FROM table WHERE column LIKE 'a'

you should use

SELECT * FROM table WHERE myLow(column) LIKE 'a'