I'm trying to create a regex to verify that a given string only has alpha characters a-z or A-Z. The string can be up to 25 letters long. (I'm not sure if regex can check length of strings)
Examples:
1. "abcdef" = true;
2. "a2bdef" = false;
3. "333" = false;
4. "j" = true;
5. "aaaaaaaaaaaaaaaaaaaaaaaaaa" = false;
//26 letters
Here is what I have so far... can't figure out what's wrong with it though
Regex alphaPattern = new Regex("[^a-z]|[^A-Z]");
I would think that would mean that the string could contain only upper or lower case letters from a-z, but when I match it to a string with all letters it returns false...
Also, any suggestions regarding efficiency of using regex vs. other verifying methods would be greatly appreciated.
I'm trying to create a regex to verify that a given string only has alpha characters a-z or A-Z.
This is easily done, as many of the others have indicated, using what are known as "character classes". Essentially, these allow us to specify a range of values to use for matching:
(NOTE: for simplification, I am assuming implicit ^ and $ anchors, which are explained later in this post)
[a-z] Match any single lower-case letter.
ex: a matches, 8 doesn't match
[A-Z] Match any single upper-case letter.
ex: A matches, a doesn't match
[0-9] Match any single digit zero to nine
ex: 8 matches, a doesn't match
[aeiou] Match only on a or e or i or o or u.
ex: o matches, z doesn't match
[a-zA-Z] Match any single lower-case OR upper-case letter.
ex: A matches, a matches, 3 doesn't match
These can, naturally, be negated as well:
[^a-z] Match anything that is NOT a lower-case letter
ex: 5 matches, A matches, a doesn't match
[^A-Z] Match anything that is NOT an upper-case letter
ex: 5 matches, A doesn't match, a matches
[^0-9] Match anything that is NOT a digit
ex: 5 doesn't match, A matches, a matches
[^Aa69] Match anything as long as it is not A or a or 6 or 9
ex: 5 matches, A doesn't match, a doesn't match, 3 matches
To see some common character classes, go to:
http://www.regular-expressions.info/reference.html
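As a rough illustration of the character classes listed above, here is a minimal C# sketch. It assumes C# 9 or later (top-level statements), and the variable names are made up for this example. The anchors are written out explicitly because `Regex.IsMatch` does not add them for you.

```csharp
using System;
using System.Text.RegularExpressions;

// The anchors ^ and $ are written out so each pattern must cover the whole input.
bool lowerMatchesA  = Regex.IsMatch("a", "^[a-z]$");    // a is a lower-case letter
bool lowerMatches8  = Regex.IsMatch("8", "^[a-z]$");    // 8 is not a letter
bool eitherMatchesA = Regex.IsMatch("A", "^[a-zA-Z]$"); // either case is accepted
bool notLowerUpperA = Regex.IsMatch("A", "^[^a-z]$");   // negated class accepts A
bool notLowerLowerA = Regex.IsMatch("a", "^[^a-z]$");   // negated class rejects a
bool notListed3     = Regex.IsMatch("3", "^[^Aa69]$");  // 3 is not in the excluded set
bool notListed6     = Regex.IsMatch("6", "^[^Aa69]$");  // 6 is in the excluded set

Console.WriteLine($"{lowerMatchesA} {lowerMatches8} {eitherMatchesA}");
Console.WriteLine($"{notLowerUpperA} {notLowerLowerA} {notListed3} {notListed6}");
```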
The string can be up to 25 letters long. (I'm not sure if regex can check length of strings)
You can absolutely check "length", but not in the way you might imagine. Strictly speaking, we measure repetition, NOT length, using {}:
a{2} Match two a's together.
ex: a doesn't match, aa matches, aca doesn't match
4{3} Match three 4's together.
ex: 4 doesn't match, 44 doesn't match, 444 matches, 4434 doesn't match
Repetition has values we can set to have lower and upper limits:
a{2,} Match on two or more a's together.
ex: a doesn't match, aa matches, aaa matches, aba doesn't match, aaaaaaaaa matches
a{2,5} Match on two to five a's together.
ex: a doesn't match, aa matches, aaa matches, aba doesn't match, aaaaaaaaa doesn't match
Repetition extends to character classes, so:
[a-z]{5} Match any five lower-case characters together.
ex: bubba matches, Bubba doesn't match, BUBBA doesn't match, asdjo matches
[A-Z]{2,5} Match two to five upper-case characters together.
ex: bubba doesn't match, Bubba doesn't match, BUBBA matches, BUBBETTE doesn't match
[0-9]{4,8} Match four to eight digits together.
ex: bubba doesn't match, 15835 matches, 44 doesn't match, 3456876353456 doesn't match
[a3g]{2} Match any two characters together, each of which is an a OR 3 OR g.
ex: aa matches, ba doesn't match, 33 matches, 38 doesn't match, a3 matches (the two characters do not have to be the same)
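The repetition rules can be tried out with a short C# sketch like the one below. `FullMatch` is a hypothetical helper written for this example (it is not part of .NET); it wraps a pattern in ^ and $ to mimic the implicit anchors assumed in these examples. The code assumes C# 9 or later top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: true only when the pattern covers the entire input.
bool FullMatch(string input, string pattern) =>
    Regex.IsMatch(input, "^(?:" + pattern + ")$");

Console.WriteLine(FullMatch("aa", "a{2}"));              // True: exactly two a's
Console.WriteLine(FullMatch("aca", "a{2}"));             // False: the a's are not together
Console.WriteLine(FullMatch("aaaaaaaaa", "a{2,}"));      // True: no upper limit
Console.WriteLine(FullMatch("aaaaaaaaa", "a{2,5}"));     // False: nine a's exceed five
Console.WriteLine(FullMatch("BUBBA", "[A-Z]{2,5}"));     // True: five upper-case letters
Console.WriteLine(FullMatch("BUBBETTE", "[A-Z]{2,5}"));  // False: eight letters
Console.WriteLine(FullMatch("38", "[a3g]{2}"));          // False: 8 is not in the class
Console.WriteLine(FullMatch("a3", "[a3g]{2}"));          // True: each position is checked on its own
```

The last line is worth noting: a quantifier repeats the class, not the character that happened to match first, so mixed pairs are accepted.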
Now let's look at your regex:
[^a-z]|[^A-Z]
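Translation: Match any single character that is NOT a lowercase letter, OR any single character that is NOT an uppercase letter. No character is both, so every character satisfies one side or the other.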
To fix it so it meets your needs, we would rewrite it like this:
Step 1: Remove the negation
[a-z]|[A-Z]
Translation: Find any lowercase letter OR uppercase letter.
Step 2: While not strictly needed, let's clean up the OR logic a bit
[a-zA-Z]
Translation: Find any lowercase letter OR uppercase letter. Same as above but now using only a single set of [].
Step 3: Now let's indicate "length"
[a-zA-Z]{1,25}
Translation: Find any lowercase letter OR uppercase letter repeated one to twenty-five times.
This is where things get funky. You might think you were done here and you may well be depending on the technology you are using.
Strictly speaking the regex [a-zA-Z]{1,25} will match one to twenty-five upper or lower-case letters ANYWHERE on a line:
[a-zA-Z]{1,25}
a matches, aZgD matches, BUBBA matches, 243242hello242552 MATCHES
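To see the "anywhere on a line" behavior in .NET, here is a small sketch using `Regex.Match`, which reports what was actually matched. Variable names are illustrative only, and C# 9 or later top-level statements are assumed.

```csharp
using System;
using System.Text.RegularExpressions;

var unanchored = new Regex("[a-zA-Z]{1,25}");

// Without anchors, Match returns the first run of letters it can find.
Match found = unanchored.Match("243242hello242552");
Console.WriteLine($"{found.Success}: '{found.Value}' at index {found.Index}");

// A 26-letter input also "matches": the regex simply stops after 25 letters.
Match tooLong = unanchored.Match(new string('a', 26));
Console.WriteLine($"{tooLong.Success}: matched {tooLong.Length} of 26 letters");
```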
In fact, every example I have given so far will do the same. If that is what you want then you are in good shape but based on your question, I'm guessing you ONLY want one to twenty-five upper or lower-case letters on the entire line. For that we turn to anchors. Anchors allow us to specify those pesky details:
^ beginning of a line
(I know, we just used this for negation earlier, don't get me started)
$ end of a line
We can use them like this:
^a{3} From the beginning of the line match a three times together
ex: aaa matches, 123aaa doesn't match, aaa123 matches
a{3}$ Match a three times together at the end of a line
ex: aaa matches, 123aaa matches, aaa123 doesn't match
^a{3}$ Match a three times together for the ENTIRE line
ex: aaa matches, 123aaa doesn't match, aaa123 doesn't match
Notice that aaa matches in all cases because it has three a's at the beginning and end of the line technically speaking.
So the final, technically correct solution for finding a "word" that is "up to twenty-five characters long" on a line would be:
^[a-zA-Z]{1,25}$
The funky part is that some technologies implicitly put anchors in the regex for you and some don't. You just have to test your regex or read the docs to see if you have implicit anchors.
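For .NET specifically, `Regex.IsMatch` does not add anchors, so they have to be written out. The sketch below runs the five examples from the question against the anchored pattern; it assumes C# 9 or later top-level statements, and the variable names are only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

var alphaPattern = new Regex("^[a-zA-Z]{1,25}$");

// The five examples from the question, in order.
string[] samples = { "abcdef", "a2bdef", "333", "j", new string('a', 26) };
foreach (string sample in samples)
{
    Console.WriteLine($"\"{sample}\" = {alphaPattern.IsMatch(sample)}");
}

// In .NET, $ also matches just before a final newline, so "abc\n" passes.
// \z matches only at the very end of the string.
bool dollarAcceptsNewline = Regex.IsMatch("abc\n", "^[a-zA-Z]{1,25}$");
bool strictRejectsNewline = !Regex.IsMatch("abc\n", @"^[a-zA-Z]{1,25}\z");
Console.WriteLine($"{dollarAcceptsNewline} {strictRejectsNewline}");
```

If a trailing newline must be rejected, `\z` in place of `$` is one way to do it.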
The string can be up to 25 letters long. (I'm not sure if regex can check length of strings)
Regexes certainly can check the length of a string, as can be seen from the answers posted by others.
However, when you are validating a user input (say, a username), I would advise doing that check separately.
The problem is that a regex can only tell you whether a string matched or not. It won't tell you why it didn't match. Was the text too long, or did it contain disallowed characters? You can't tell. It's far from friendly when a program says: "The supplied username contained invalid characters or was too long". Instead, you should provide separate error messages for the different situations.
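One possible way to separate the checks is sketched below. `DescribeProblem` and `MaxLength` are hypothetical names invented for this example, the message texts are placeholders, and treating an empty name as an error is an assumption. C# 9 or later top-level statements are assumed.

```csharp
using System;
using System.Text.RegularExpressions;

const int MaxLength = 25; // limit taken from the question

// Hypothetical helper: returns an empty string for a valid name,
// otherwise a message naming the specific rule that was broken.
string DescribeProblem(string name)
{
    if (name.Length == 0)
        return "The username must not be empty.";
    if (name.Length > MaxLength)
        return $"The username is too long (at most {MaxLength} letters).";
    if (!Regex.IsMatch(name, "^[a-zA-Z]+$"))
        return "The username may only contain the letters a-z and A-Z.";
    return "";
}

Console.WriteLine(DescribeProblem("a2bdef"));
Console.WriteLine(DescribeProblem(new string('a', 26)));
```

The length is checked with `string.Length` and the regex is left to check only the characters. A name that breaks both rules reports the length problem first, because that check runs first.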
The regular expression you are using is an alternation of [^a-z] and [^A-Z]. Expressions of the form [^…] match any character other than those listed in the character set.
So overall, your expression matches any single character that is either not in a-z or not in A-Z.
What you need instead is a regular expression that matches a-zA-Z only:
[a-zA-Z]
And to specify the length of that, anchor the expression with the start (^) and end ($) of the string and describe the length with the {n,m} quantifier, meaning at least n but not more than m repetitions:
^[a-zA-Z]{0,25}$
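One detail to be aware of: with a lower bound of 0, the empty string also matches. Whether that is wanted is not stated in the question, so the following C# sketch (illustrative variable names, C# 9 or later top-level statements assumed) only shows the difference between {0,25} and {1,25}.

```csharp
using System;
using System.Text.RegularExpressions;

// With {0,25}, zero repetitions are enough, so an empty string is accepted.
bool emptyAllowed = Regex.IsMatch("", "^[a-zA-Z]{0,25}$");

// With {1,25}, at least one letter is required, so an empty string is rejected.
bool emptyRejected = !Regex.IsMatch("", "^[a-zA-Z]{1,25}$");

Console.WriteLine($"{emptyAllowed} {emptyRejected}");
```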
Do I understand correctly that it can only contain either uppercase or lowercase letters?
new Regex("^([a-z]{1,25}|[A-Z]{1,25})$")
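If that stricter reading is the intended one, the pattern above behaves differently from [a-zA-Z]{1,25} only on mixed-case input. A short comparison, as a sketch with made-up variable names and assuming C# 9 or later top-level statements:

```csharp
using System;
using System.Text.RegularExpressions;

var singleCase = new Regex("^([a-z]{1,25}|[A-Z]{1,25})$"); // all lower OR all upper
var anyCase = new Regex("^[a-zA-Z]{1,25}$");               // cases may be mixed

foreach (string sample in new[] { "hello", "HELLO", "Hello" })
{
    Console.WriteLine(
        $"{sample}: single-case={singleCase.IsMatch(sample)}, any-case={anyCase.IsMatch(sample)}");
}
```

Only "Hello" is treated differently: the single-case pattern rejects it, while the any-case pattern accepts it.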
A regular expression seems to be the right thing to use for this case.
By the way, the caret ("^") at the first place inside a character class means "not", so your "[^a-z]|[^A-Z]" would mean "not any lowercase letter, or not any uppercase letter" (disregarding that a-z are not all letters).
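A quick way to see the effect of that pattern is to run it through `Regex.IsMatch`. How the asker invoked the regex is not shown, so this sketch (illustrative names, C# 9 or later top-level statements assumed) only demonstrates the pattern itself: it succeeds on any non-empty input, which makes it useless as a validator.

```csharp
using System;
using System.Text.RegularExpressions;

var original = new Regex("[^a-z]|[^A-Z]");

// Every character is either "not lower-case" or "not upper-case",
// so the first character of any non-empty string already satisfies the pattern.
foreach (string sample in new[] { "abcdef", "ABCDEF", "a2bdef", "333" })
{
    Console.WriteLine($"{sample}: {original.IsMatch(sample)}");
}

// Only the empty string fails, because there is no character to match at all.
bool emptyMatches = original.IsMatch("");
Console.WriteLine($"empty: {emptyMatches}");
```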