I'm working on a form in which one of the custom validators should only accept Persian characters. I used the following code:
var myregex = new Regex(@"^[\u0600-\u06FF]+$");
if (myregex.IsMatch(mytextBox.Text))
{
    args.IsValid = true;
}
else
{
    args.IsValid = false;
}
But it seems to work only for Arabic characters and doesn't cover all Persian characters (it lacks these four: گ, چ, پ, ژ). Is there a way to solve this problem?
TL;DR
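The character sets that Farsi actually needs are as follows:

Letters: use
^[آابپتثجچحخدذرزژسشصضطظعغفقکگلمنوهی]+$
or the equivalent codepoints, written in whatever notation your regex flavor supports (not all engines support \uXXXX):
^[\u0622\u0627\u0628\u067E\u062A-\u062C\u0686\u062D-\u0632\u0698\u0633-\u063A\u0641\u0642\u06A9\u06AF\u0644-\u0648\u06CC]+$

Digits: use
^[۰۱۲۳۴۵۶۷۸۹]+$
or, depending on your regex flavor:
^[\u06F0-\u06F9]+$

Vowels: use
[ ٌ ًّ َ ِ ُ ْ ]
(spaced out here only so the marks are visible; a space inside the brackets would also match a space character) or, depending on your regex flavor:
[\u064B\u064C\u064E-\u0652]

You can also combine these sets. You may want to add other Arabic letters, such as Hamza (ء), to your character set as well.

Why are [\u0600-\u06FF] and [آ-ی] both wrong?

Although \u0600-\u06FF does include گ (codepoint 06AF), چ (codepoint 0686), پ (codepoint 067E) and ژ (codepoint 0698), so those four letters are not missing from that range, all answers that suggest [\u0600-\u06FF] or [آ-ی] are simply WRONG: both ranges also accept many characters that are not Persian.

Whole story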
This answer exists to correct a common misconception. Codepoints 0600 through 06FF do not denote the Persian / Farsi alphabet (and neither does [آ-ی]):

The Arabic block (0600–06FF) contains 255 characters, while the Farsi alphabet has 32 letters; adding the Farsi forms of the ten digits brings that to 42. If we also add the vowels (originally Arabic, and rarely written in Farsi), leaving out Tanvin (ً, ٍ, ٌ) and Tashdid (ّ) because both are Arabic diacritics rather than Farsi ones, the remaining four vowel marks bring us to 46 characters. This means \u0600-\u06FF contains 209 more characters than you need!

۷ (codepoint 06F7) is the Farsi representation of the number 7, and ٧ (codepoint 0667) is the Arabic representation of the same number. Likewise, ۶ is the Farsi representation of 6 and ٦ is its Arabic representation. All of these reside in the 0600 through 06FF range. You can find a number of other characters there that don't exist in Farsi / Persian either, and nobody wants to accept them when validating a first name or surname.
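To see this concretely, here is a minimal C# sketch (C# because the question's validator is C#; it assumes .NET's System.Text.RegularExpressions, and the class and member names are made up for illustration). It checks that گ, چ, پ and ژ really are inside the question's \u0600-\u06FF range, and that the same range also lets through Arabic-only characters such as the Arabic digit ٧, which a Persian-digit class rejects.

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo class (illustrative names, not part of the answer) comparing the
// question's Arabic-block pattern with a Persian-digits-only class.
public static class ArabicBlockDemo
{
    // The question's pattern: one or more code points from the Arabic block, U+0600..U+06FF.
    public static readonly Regex ArabicBlock = new Regex(@"^[\u0600-\u06FF]+$");

    // Persian (Extended Arabic-Indic) digits only: U+06F0..U+06F9, i.e. ۰ through ۹.
    public static readonly Regex PersianDigits = new Regex(@"^[\u06F0-\u06F9]+$");

    // Formats a UTF-16 char as "U+XXXX" so look-alike glyphs such as ۷ and ٧ can be told apart.
    public static string CodePoint(char c) => $"U+{(int)c:X4}";
}
```

For example, ArabicBlock.IsMatch("\u0667") is true (the Arabic ٧ gets through) while PersianDigits.IsMatch("\u0667") is false, and CodePoint('\u06F7') versus CodePoint('\u0667') shows that the two sevens are different code points.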
[آ-ی] covers 171 characters (codepoints 0622 through 06CC), which is also far more than anyone needs for validation. You can see them all using Unicode CLDR.
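Putting the TL;DR letter class back into the question's validator could look like the C# sketch below. It is a minimal sketch assuming server-side .NET validation; PersianNameValidation, IsPersianName and RangeSize are hypothetical names, and it accepts letters only, so digits, vowels or Hamza would have to be added to the class if your field needs them. RangeSize is included to count how many code points a range such as [آ-ی] (0622 through 06CC) actually covers.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper (names are illustrative, not from the answer) that applies
// the TL;DR letter class to the question's validation check.
public static class PersianNameValidation
{
    // آ plus the 32 Persian letters, written as codepoints (same set as the literal class).
    private const string PersianLetterSet =
        @"\u0622\u0627\u0628\u067E\u062A-\u062C\u0686\u062D-\u0632\u0698\u0633-\u063A\u0641\u0642\u06A9\u06AF\u0644-\u0648\u06CC";

    public static readonly Regex PersianLetters = new Regex("^[" + PersianLetterSet + "]+$");

    // Returns true only for a non-empty string made entirely of Persian letters.
    // Regex.IsMatch throws on null, so null is rejected up front.
    public static bool IsPersianName(string text) =>
        text != null && PersianLetters.IsMatch(text);

    // Inclusive number of UTF-16 code units between two chars; for BMP ranges
    // such as [آ-ی] this equals the number of code points the range covers.
    public static int RangeSize(char first, char last) => last - first + 1;
}
```

In the question's handler the if/else then collapses to args.IsValid = PersianNameValidation.IsPersianName(mytextBox.Text);. Unlike \u0600-\u06FF or [آ-ی], this class rejects Arabic-only letters such as ك (0643) and ي (064A), both of which lie inside those ranges.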