We have seen 4byte UTF-8 characters stored in oracle. We need to detect how many rows have such characters in a particular column.
Tried the recommendation given here - UTF-16 supplementary characters but that didn't work for me. It throws an error - ORA-12728: invalid range in regular expression. Tried the other answers too. None of them work.
Oracle documented the possible hex code ranges for UTF-8 in table B-2 in this page Unicode Character Set. I am trying to detect any character that falls in row 5 or 6 of that table ( "Supplementary characters: Additional Chinese, Japanese, and Korean characters; historic characters; musical symbols; mathematical symbols" OR "Private Use Area #2" )
Is there a regular expression I can use to detect such rows?
select count(*) from mytable where regexp_like(mycolumn, expression);
May be you did something wrong with regexp building: There is short example.
It works on Oracle 11.2.0.4 and 12.2.0.1
Maybe it is easier to look for surrogates. High surrogate is from
U+D800
-U+DBFF
, low surrogate is fromU+DC00
-U+DFFF
.This one should also work, is looks for the low surrogate: