I'm working on a form in which one of its custom validators should accept only Persian characters. I used the following code:
var myregex = new Regex(@"^[\u0600-\u06FF]+$");
if (myregex.IsMatch(mytextBox.Text))
{
    args.IsValid = true;
}
else
{
    args.IsValid = false;
}
But it seems it only works for checking Arabic characters and doesn't cover all Persian characters (it lacks these four: گ, چ, پ, ژ). Is there a way to solve this problem?
In addition to the accepted answer (https://stackoverflow.com/a/22565376/790811), we should also consider the zero-width non-joiner (نیم فاصله in Persian). Unfortunately, two symbols are in use for it: one is standard, and the other is non-standard but widely used:
So the final regex can be:
If you also want to allow spaces, you can use this:
You can test it in JavaScript like this:
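A minimal sketch of such a test, assuming only the standard zero-width non-joiner (U+200C) is added to the question's Arabic range. The non-standard second symbol is not identified here, so it is left out, and the names `persianWithZwnj`, `persianWithZwnjAndSpace` and `isPersianText` are hypothetical:

```javascript
// Question's range U+0600-U+06FF plus the standard ZWNJ (U+200C).
// The second, non-standard symbol is deliberately not included.
const persianWithZwnj = /^[\u0600-\u06FF\u200C]+$/;

// Same character class, additionally allowing the ordinary space.
const persianWithZwnjAndSpace = /^[\u0600-\u06FF\u200C ]+$/;

// Hypothetical helper: pick the pattern depending on whether spaces are allowed.
function isPersianText(input, allowSpace = false) {
  const pattern = allowSpace ? persianWithZwnjAndSpace : persianWithZwnj;
  return pattern.test(input);
}

// A word containing a ZWNJ between its two parts.
console.log(isPersianText("\u0645\u06CC\u200C\u0631\u0648\u0645")); // true
// Two words separated by a space, with spaces allowed.
console.log(isPersianText("\u0633\u0644\u0627\u0645 \u062F\u0646\u06CC\u0627", true)); // true
```

Any other accepted character, such as the second symbol mentioned above, would need to be appended to the character class in the same way.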
What you currently have in your regex is the standard Arabic symbols range. For additional characters, you need to add them to the regex separately. Here are their codes:
So, all in all, you should have:
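As a hedged illustration of adding letters explicitly, the sketch below appends the Unicode code points of the four letters named in the question (پ U+067E, چ U+0686, ژ U+0698, گ U+06AF) to the original range. It is written in JavaScript rather than C#, and the names `EXTRA_PERSIAN_LETTERS` and `arabicPlusPersian` are hypothetical:

```javascript
// Code points of the four letters the question reports as missing:
// peh (U+067E), tcheh (U+0686), jeh (U+0698), gaf (U+06AF).
const EXTRA_PERSIAN_LETTERS = "\u067E\u0686\u0698\u06AF";

// Original Arabic range with the four letters listed explicitly.
const arabicPlusPersian = new RegExp(
  "^[\u0600-\u06FF" + EXTRA_PERSIAN_LETTERS + "]+$"
);

console.log(arabicPlusPersian.test("\u067E\u0686\u0698\u06AF")); // true
console.log(arabicPlusPersian.test("abc")); // false
```

Note that these four code points already lie numerically inside U+0600-U+06FF, so listing them is harmless but redundant; if the original pattern rejects them, the cause may lie elsewhere, for example in how the input is encoded.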
Attention: persianRex is written in JavaScript; however, you can use the source code and copy and paste the characters.
Detecting Persian characters is a tricky task due to the variety of keyboard layouts and operating systems. I faced the same challenge some time ago and decided to write an open-source library to fix this issue.
You can fix your issue like this: persianRex.text.test(yourInput); // returns true or false
Here is the full documentation: http://imanmh.github.io/persianRex/
Farsi, Dari and Tajik are out of my bailiwick, but a little rummaging through the Unicode code charts tells me that Arabic covers 5 Unicode code blocks:
You can get at them (at least some of them) in regular expressions using named blocks instead of explicit code point ranges:
\p{IsArabicPresentationForms-A}
will give you the 4th Unicode block in the preceding list. You might also read Persian Computing in Unicode: http://behdad.org/download/Publications/persiancomputing/a007.pdf
I'm not sure regex is the way to do this. The problem is not specific to Persian or Arabic text; it applies equally to Chinese or Russian. So perhaps you could check whether each character exists in your code page; if it is not in the code page, I doubt the user can enter it with an input device.
The test performs a round trip: the input should equal the result of converting the string to bytes and back. The link lists the supported code pages.
The named blocks, e.g. \p{Arabic}, cover the entire Arabic script, not just the Persian characters.
The presentation forms (U+FB50-U+FDFF) should not be used in text, and should be converted to the standard range (U+0600-U+06FF).
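To illustrate how broad a script-level class is, here is a small sketch using JavaScript's Unicode property escapes. This is a swapped-in equivalent, not the .NET syntax: JavaScript has no named blocks, so `\p{Script=Arabic}` with the `u` flag is used instead, and the name `arabicScript` is hypothetical:

```javascript
// Matches one or more characters whose Unicode Script property is Arabic.
// The "u" flag is required for \p{...} property escapes.
const arabicScript = /^\p{Script=Arabic}+$/u;

// Arabic kaf (U+0643) is not used in standard Persian spelling, yet it matches.
console.log(arabicScript.test("\u0643")); // true
// Persian gaf (U+06AF) matches as well.
console.log(arabicScript.test("\u06AF")); // true
// Latin text does not.
console.log(arabicScript.test("a")); // false
```

Because Arabic-only letters such as U+0643 are accepted, a script-level class alone cannot tell Persian text apart from other languages written in the Arabic script.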
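One possible way to perform that conversion, sketched in JavaScript, is Unicode compatibility normalization (NFKC), which maps presentation forms to their standard letters. This is an assumption about a suitable technique rather than something the answer prescribes, and NFKC also rewrites other compatibility characters (for example full-width Latin letters). The name `foldPresentationForms` is hypothetical:

```javascript
// Hypothetical helper: fold Arabic presentation forms into the standard
// U+0600-U+06FF letters using compatibility normalization (NFKC).
// Caveat: NFKC affects other compatibility characters too, not only Arabic.
function foldPresentationForms(text) {
  return text.normalize("NFKC");
}

// U+FE8D (alef, isolated form) folds to U+0627 (alef).
console.log(foldPresentationForms("\uFE8D") === "\u0627"); // true
// U+FE8F (beh, isolated form) folds to U+0628 (beh).
console.log(foldPresentationForms("\uFE8F") === "\u0628"); // true
```

Normalizing the input before validation means the regular expression only has to deal with the standard range.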
In order to only cover Persian we need the following:
So, the resulting regexp would be:
See also the exemplar characters for Persian listed here:
http://unicode.org/cldr/trac/browser/trunk/common/main/fa.xml
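As a rough illustration of a Persian-only character class, the sketch below lists the 32 letters of the Persian alphabet plus alef with madda by code point. This letter set is an assumption made for the example, not the answer's own list: it omits digits, diacritics, hamza forms, punctuation and the zero-width non-joiner, all of which real input may need. The names `PERSIAN_LETTERS` and `persianOnly` are hypothetical:

```javascript
// Assumed letter set: 32 Persian letters plus alef with madda (U+0622).
const PERSIAN_LETTERS =
  "\u0622\u0627\u0628\u067E" + // alef madda, alef, beh, peh
  "\u062A-\u062C\u0686" +      // teh, theh, jeem, tcheh
  "\u062D-\u0632\u0698" +      // hah, khah, dal, thal, reh, zain, jeh
  "\u0633-\u063A" +            // seen, sheen, sad, dad, tah, zah, ain, ghain
  "\u0641\u0642\u06A9\u06AF" + // feh, qaf, keheh, gaf
  "\u0644-\u0648\u06CC";       // lam, meem, noon, heh, waw, Farsi yeh

const persianOnly = new RegExp("^[" + PERSIAN_LETTERS + "]+$");

// A word spelled with seen, lam, alef, meem is accepted.
console.log(persianOnly.test("\u0633\u0644\u0627\u0645")); // true
// Arabic kaf (U+0643) and Arabic yeh (U+064A) are rejected.
console.log(persianOnly.test("\u0643")); // false
console.log(persianOnly.test("\u064A")); // false
```

The exemplar characters in the CLDR data linked above are a better authority for deciding which additional characters to allow.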