I am getting an "illegal byte sequence" error while trying to extract non-English characters from a large file in the macOS bash shell. This is the script that I am trying to use:
sed 's/[][a-z,0-9,A-Z,!@#\$%^&*(){}":/_-|. -][\;''=?]*//g' < $1 >Abhineet_extract1.txt;
sed 's/\(.\)/\1\
/g' <Abhineet_extract1.txt | sort | uniq |tr -d '\n' >&1;
rm Abhineet_extract1.txt;
and here is the error that I am getting:
uniq: stdin: Illegal byte sequence
'+?
It seems that a UTF-8 locale is causing the "Illegal byte sequence" error. Instead say:
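A minimal sketch of that, assuming the intended fix is to force the C locale through LC_ALL so that the tools compare raw bytes instead of validating UTF-8. The function name unique_lines_bytewise is hypothetical and only there for illustration; it covers the sort | uniq stage where the error is reported.

```shell
# Hypothetical helper (not from the original script): print the distinct
# lines of the file given as $1, treating its contents as raw bytes.
unique_lines_bytewise() {
    # LC_ALL=C is set only for these two commands. It overrides LANG and
    # the other LC_* variables, so sort and uniq do plain byte comparison
    # and do not reject bytes that are not valid UTF-8.
    LC_ALL=C sort < "$1" | LC_ALL=C uniq
}
```

The same prefix can be put in front of the sed commands, but note that under the C locale the `.` in a sed pattern matches a single byte, so the step that puts each character on its own line would split multibyte characters into separate bytes.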
man locale says: