I want to remove Unicode characters in a given range, e.g.:
echo "abcＡＢＣ123" | sed 's/[\uff21-\uff3b]//g'
I expect "abc123", but I get:
sed: -e expression #1, char 20: Invalid range end
Or, if I use:
echo "abcＡＢＣ123" | sed 's/[Ａ-Ｚ]//g'
I get:
sed: -e expression #1, char 14: Invalid collation character
Unicode support in sed is not well defined and depends on the implementation and the locale. You may be better off using Perl on the command line:
echo "abcＡＢＣ123" | perl -CS -pe 's/[\x{FF21}-\x{FF3B}]+//g'
abc123
It is important to use the -CS flag here so that Perl treats standard input, output, and error as UTF-8.
I'm not sure why sed is not working, but you can use tr instead:
$ echo 'abcABC123' | tr -d 'A-Z'
abc123
From man tr:
tr - translate or delete characters
-d, --delete
delete characters in SET1, do not translate