I have been trying to write a regular expression that matches all Unicode word characters, something like:
/[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF\w]/gi
But this fails completely and doesn't match anything. I have tried a variety of expressions, and it seems that as soon as I specify a range, the pattern fails. Has anyone had better luck than me?
I wish ActionScript offered something like \p{L}, but if there is anything like that, I couldn't find it in the documentation.
You can use String.fromCharCode to turn the Unicode code points into actual characters and build the pattern string from them; the ranges then work correctly in the regular expression. Here is an example using your original pattern:
var exp:RegExp = new RegExp("[" +
    generateRangeForUnicodeVariables(0x00A0, 0xD7FF) +
    generateRangeForUnicodeVariables(0xF900, 0xFDCF) +
    generateRangeForUnicodeVariables(0xFDF0, 0xFFEF) +
    "\\w]", "gi"); // "\\w" so the pattern string contains \w, not just w
// Returns "X-Y", where X and Y are the characters with the given code points
private function generateRangeForUnicodeVariables(from:uint, to:uint):String
{
    return String.fromCharCode(from) + "-" + String.fromCharCode(to);
}
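For anyone who wants to try the same idea outside Flash, here is a rough TypeScript port. The sample string is my own. In a JavaScript engine the \u ranges also work in a plain regex literal, so this illustrates the fromCharCode technique rather than reproducing the Flash bug.

```typescript
// TypeScript port of generateRangeForUnicodeVariables: turns two code points
// into a "from-to" fragment made of the actual characters.
function generateRangeForUnicodeVariables(from: number, to: number): string {
  return String.fromCharCode(from) + "-" + String.fromCharCode(to);
}

// Same three ranges as the question, plus \w (note the doubled backslash,
// so the pattern string contains a backslash followed by w).
const exp: RegExp = new RegExp(
  "[" +
    generateRangeForUnicodeVariables(0x00a0, 0xd7ff) +
    generateRangeForUnicodeVariables(0xf900, 0xfdcf) +
    generateRangeForUnicodeVariables(0xfdf0, 0xffef) +
    "\\w]",
  "gi"
);

// ASCII letters match via \w; U+00E9 and U+4E2D match via the first range.
// The space and "!" are outside every range and are skipped.
const sample: string = "abc \u00E9\u4E2D !";
console.log(sample.match(exp));
```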
This has been a problem for some time, and I couldn't find anything indicating that it has been fixed. It was previously asked in:
Restrict input to a specified language
and
How to specify a unicode range in a RegExp?
I know this is a hack, but it does work in JavaScript, so you could use ExternalInterface to hand the test off to the page's JavaScript and pass the result back.
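A minimal sketch of the page-side half of that hack, written in TypeScript. It assumes the SWF calls something like ExternalInterface.call("isUnicodeWord", text); the name isUnicodeWord is hypothetical. Declared at the top level of a classic (non-module) script, the function becomes a global that the Flash Player can find by name.

```typescript
// Hypothetical page-side helper for the ExternalInterface approach.
// The character class is the one from the question; no "g" flag, so
// test() does not carry lastIndex state between calls.
const UNICODE_WORD: RegExp = /^[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF\w]+$/;

// True when text is non-empty and every UTF-16 code unit in it
// falls in one of the ranges above or in \w.
function isUnicodeWord(text: string): boolean {
  return UNICODE_WORD.test(text);
}
```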
Hmm. It looks like the problem is not ranges as such but multi-byte characters.
This works:
var exp:RegExp = new RegExp("[\u00A0-\u0FCF]", "gi");
var str:String = "\u00A1 \u00A2 \u00A3 \u00A3";
trace("subject:", str);
trace("match:", str.match(exp));
And this does not:
var exp:RegExp = new RegExp("[\u00A0-\u0FD0]", "gi");
var str:String = "\u00A1 \u00A2 \u00A3 \u00A3";
trace("subject:", str);
trace("match:", str.match(exp));
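For comparison, here are the same two patterns run through a JavaScript engine (TypeScript here). Both ranges match all four characters there. This only shows that the patterns themselves are valid; it says nothing about why Flash Player rejects the second one.

```typescript
// Same subject string as the two ActionScript snippets above.
const subject: string = "\u00A1 \u00A2 \u00A3 \u00A3";

// The range that worked in ActionScript and the one that did not.
const working: RegExp = new RegExp("[\u00A0-\u0FCF]", "gi");
const failing: RegExp = new RegExp("[\u00A0-\u0FD0]", "gi");

console.log("subject:", subject);
console.log("match:", subject.match(working));
console.log("match:", subject.match(failing));
```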
In any case, you can use the RegExp constructor, which compiles a pattern from a string.
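With the constructor, the pattern goes through two parsers: first the string literal, then the regex engine. The sketch below shows what that means for backslashes, in TypeScript. I am assuming ActionScript string literals behave the same way, but I have not verified that.

```typescript
// "\\w" in a string literal is the two characters backslash + w,
// which the RegExp constructor then reads as the \w class.
const wordPattern: string = "[\\w]+";
const wordRe: RegExp = new RegExp(wordPattern, "g");

// "\\u00E9" leaves the escape for the regex parser to decode, while
// "\u00E9" is decoded by the string literal itself; both describe U+00E9.
const viaRegexEscape: RegExp = new RegExp("^\\u00E9$");
const viaStringEscape: RegExp = new RegExp("^\u00E9$");

console.log(wordPattern.length); // 5
console.log("foo bar".match(wordRe));
console.log(viaRegexEscape.test("\u00E9"), viaStringEscape.test("\u00E9")); // true true
```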