regexp_replace Unicode in PostgreSQL

2020-03-30 02:14发布

How to regexp_replace for Unicode in PostgreSQL

i read this http://www.regular-expressions.info/unicode.html

select regexp_replace('s4y8sds', '\\p{Number}', '')

select regexp_replace('s4y8sds', '\\p{N}', '')

but not work

i have this following code work in PHP

preg_replace( "/[^\p{Ll}|\p{Lm}|\p{Lo}|\p{Lt}|\p{Lu}|\p{Zs}]/u", "", "string1212.," );

Please help me

标签： regex postgresql unicode

1条回答

叼着烟拽天下

2楼-- · 2020-03-30 03:04

For ordinary numbers use digit character class as [[:digit:]] or shorthand \d:

SELECT regexp_replace('s4y8sds', $$\d+$$, '', 'g');

Result:

 regexp_replace
----------------
 sysds
(1 row)

For other numbers (for example ¼) is not that simple, more precisely as documentation says it's ctype (locale) dependent:

Within a bracket expression, the name of a character class enclosed in [: and :] stands for the list of all characters belonging to that class. Standard character class names are: alnum, alpha, blank, cntrl, digit, graph, lower, print, punct, space, upper, xdigit. These stand for the character classes defined in ctype. A locale can provide others.

However you could use internal PL/Perl procedural language and write server-side function with wanted Unicode characters classes \p{}:

CREATE OR REPLACE FUNCTION removeNumbersUnicode(text)
RETURNS text AS $$
    $s = $_[0];
    $s =~ s/\p{N}//g;
    return $s;
$$ LANGUAGE plperl;

Check Chapter 41 from doc for more info how to write such functions.

0人赞添加讨论(0) 举报

regexp_replace Unicode in PostgreSQL

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间