可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

The title sums it up pretty well. I'm looking for a regular expression matching Unicode uppercase character for the Postgres ~ operator. The obvious way doesn't work:

=> select 'A' ~ '[[:upper:]]';
 ?column? 
----------
 t
(1 row)

=> select 'Ó' ~ '[[:upper:]]';
 ?column? 
----------
 t
(1 row)

=> select 'Ą' ~ '[[:upper:]]';
 ?column? 
----------
 f
(1 row)

I'm using Postgresql 9.1 and my locale is set to pl_PL.UTF-8. The ordering works fine.

=> show LC_CTYPE;
  lc_ctype   
-------------
 pl_PL.UTF-8
(1 row)

回答1:

The regexp engine of PG 9.1 and older versions does not correctly classify characters whose codepoint doesn't fit it one byte. The codepoint of 'Ó' being 211 it gets it right, but the codepoint of 'Ą' is 260, beyond 255.

PG 9.2 is better at this, though still not 100% right for all alphabets. See this commit in PostgreSQL source code, and particularly these parts of the comment:

remove the hard-wired limitation to not consider wctype.h results for character codes above 255

and

Still, we can push it up to U+7FF (which I chose as the limit of 2-byte UTF8 characters), which will at least make Eastern Europeans happy pending a better solution

Unfortunately this was not backported to 9.1

回答2:

I've found that perl regular expressions handles Unicode perfectly.

create extension plperl;

create function is_letter_upper(text) returns boolean
immutable strict language plperl
as $$
    use feature 'unicode_strings';
    return $_[0] =~ /^\p{IsUpper}$/ ? "true" : "false";
$$;

Tested on postgres 9.2 with perl 5.16.2.