Without looping over the entire range of Unicode characters, how can I get a list of characters that have a given property? In particular, I want a list of all characters that are digits (i.e. those that match /\d/). I have looked at Unicode::UCD, and it is useful for determining the properties of a given character, but there doesn't seem to be a way to get a list of characters that have a property out of it.
There is no way to do that without iterating through all the characters. (Even if you build a huge string containing all of them and run a regexp over it, you still have to loop at least once to create the string.)
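A minimal sketch of that brute-force loop, assuming a Perl whose regex engine applies Unicode rules to characters above U+00FF. The variable name @digits and the choice to skip the surrogate range are illustrative assumptions, not part of the original answer:

```perl
use strict;
use warnings;
no warnings 'utf8';    # older perls warn when chr() yields a noncharacter

# Walk every code point once and keep the ones whose character matches \d.
# Surrogates (U+D800..U+DFFF) are skipped because they are not characters.
my @digits = grep { chr($_) =~ /\d/ }
             grep { $_ < 0xD800 || $_ > 0xDFFF }
             0 .. 0x10FFFF;

printf "%d code points match \\d\n", scalar @digits;
printf "U+%04X\n", $_ for @digits[0 .. 9];
```

The total depends on the Unicode version bundled with your Perl, so the count will differ between installations.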
The list of Unicode characters for each class is generated from the Unicode spec when you compile Perl, and is typically stored in /usr/lib/perl-YOURPERLVERSION/unicore/lib/gc_sc/.
For example, the list of Unicode character ranges that match IsDigit (a.k.a. \d) is stored in the file /usr/lib/perl-YOURPERLVERSION/unicore/lib/gc_sc/Digit.pl.
Which characters /\d/ matches depends entirely on your regexp implementation (although the standard 0-9 are guaranteed). In Perl's case, the locale in use defines which characters are considered alphabetic and which are considered digits.
Even better than unicore/lib/gc_sc/Digit.pl is unicore/To/Digit.pl. It is a direct mapping of Unicode digit characters (well, really their offsets) to their numeric values. This means instead of:
I can say:
Or even better: