I'm seeing BASH bracket ranges (e.g. [A-Z]) behave in an unexpected way.
Is there an explanation for this behavior, or is it a bug?
Let's say I have a variable, from which I want to strip all uppercase letters:
$ var='ABCDabcd0123'
$ echo "${var//[A-Z]/}"
The result I get is this:
a0123
If I do it with sed, I get the expected result:
$ echo "${var}" | sed 's/[A-Z]//g'
abcd0123
The same seems to be the case for BASH built-in regex match:
$ [[ a =~ [A-Z] ]] ; echo $?
1
$ [[ b =~ [A-Z] ]] ; echo $?
0
If I check all lowercase letters from 'a' to 'z', it seems that only 'a' is an exception:
$ for l in {a..z}; do [[ $l =~ [A-Z] ]] || echo $l; done
a
I do not have case-insensitive matching enabled, and even if I did, it should not make letter 'a' behave differently:
$ shopt -p nocasematch
shopt -u nocasematch
For reference, I'm using Cygwin, and I don't see this behavior on any other machine:
$ uname
CYGWIN_NT-6.3
$ bash --version | head -1
GNU bash, version 4.3.46(7)-release (x86_64-unknown-cygwin)
$ locale
LANG=en_GB.UTF-8
LC_CTYPE="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_ALL=
EDIT:
I've found the exact same issue reported here:
https://bugs.launchpad.net/ubuntu/+source/bash/+bug/120687
So, I guess it's a bug(?) in the "en_GB.UTF-8" collation, not in BASH itself.
Setting LC_COLLATE=C indeed solves this.
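It certainly has to do with your locale setting. The GNU bash man page, under Pattern Matching, explains that many locales sort characters in dictionary order, so in those locales a range such as [a-dx-z] is typically not equivalent to [abcdxyz]; it might be equivalent to [aBbCcDdxXyYz], for example. With a collation order like aAbBcC...zZ, the range [A-Z] covers every lower-case letter except 'a', which is exactly what you are seeing.
Use the POSIX character classes, [[:upper:]] in this case, or change your locale setting LC_ALL or LC_COLLATE to C as mentioned above.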
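As a minimal sketch of the locale fix, assuming the variable from the question: the collation can be forced to C for a single operation by running it in a subshell, so the rest of the session keeps its locale. The helper name strip_upper_c is made up for illustration.

```bash
#!/usr/bin/env bash
# strip_upper_c is a hypothetical helper, not part of bash.
# The function body is a subshell, so the LC_ALL override does not leak
# into the calling shell.
strip_upper_c() (
  LC_ALL=C                       # byte-order collation: [A-Z] is exactly A..Z
  printf '%s\n' "${1//[A-Z]/}"
)

var='ABCDabcd0123'
strip_upper_c "$var"             # prints abcd0123
```

Setting LC_COLLATE=C instead of LC_ALL=C is enough when LC_ALL is unset, as in the locale output above; LC_ALL is used here only because it overrides every other LC_* variable.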
as mentioned above.Also, your negative test to do upper-case check will fail for all the lower case letters when setting this locale hence printing the letters,
Also, under the above locale setting
but will be true for all lower-case ranges,
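The checks described here can be sketched as follows, assuming the C locale is forced inside a subshell. The helper c_locale_misses is hypothetical; it collects the lower-case letters that do not match a given bracket expression.

```bash
#!/usr/bin/env bash
# c_locale_misses is a hypothetical helper: it lists the lower-case letters
# that do NOT match the given regex when the C locale is in effect.
c_locale_misses() (
  LC_ALL=C
  re=$1
  out=''
  for l in {a..z}; do
    if ! [[ $l =~ $re ]]; then
      out+=$l
    fi
  done
  printf '%s\n' "$out"
)

c_locale_misses '[A-Z]'   # prints abcdefghijklmnopqrstuvwxyz
c_locale_misses '[a-z]'   # prints an empty line: every letter matches
```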
That said, all of this can be avoided by using the POSIX-specified character classes, which give the expected result in a new shell without any locale override, both for the substitution and for the regex test.
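A short sketch of the character-class approach, reusing the variable from the question. It is meant to be run without touching LC_ALL or LC_COLLATE, and it assumes nocasematch is off, as shown in the question.

```bash
#!/usr/bin/env bash
var='ABCDabcd0123'

# Substitution with a character class instead of a range
stripped="${var//[[:upper:]]/}"
printf '%s\n' "$stripped"        # prints abcd0123

# Regex test with the same class: collect any lower-case letters that match
matched=''
for l in {a..z}; do
  if [[ $l =~ [[:upper:]] ]]; then
    matched+=$l
  fi
done
# Expected to stay empty, since no lower-case letter is in [[:upper:]]
printf 'lower-case letters matching [[:upper:]]: "%s"\n' "$matched"
```

Unlike a range, a class such as [[:upper:]] is defined by character type (LC_CTYPE) rather than by collation order, so the sort order of the locale does not change what it matches.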