What are the unicode ranges for Hindi accented cha

Posted 2020-02-01 08:35

I'm trying to gather a Unicode list of all the 'o'-like shapes in the Hindi character set. In fact, a list of the characters in any language that use a separate character to indicate an accent would be even better.

I intend to use this unicode-list in a RegExp.

I've been trying to edit a list of character ranges by outputting them in an input TextField, but editing this text causes weird issues (the keyboard cursor isn't placed on the correct character, selections suddenly disappear or wrap incorrectly... in other words... HINDI HELL!)

I've tried this with Notepad++ too, but although it was more responsive, it eventually crapped out on me like it did in the Flash Player textfield. This seems to occur especially while removing the [] block (nulls?) characters. Some of them trigger odd behaviors.

Anyway, all I want is a list of the accents. A few examples are in the image below (but I would need ALL accents):

enter image description here

Thanks!

3 answers
狗以群分
#2 · 2020-02-01 09:12

You can find PDFs containing lists of Unicode ranges, grouped by script, here: http://unicode.org/charts/

For Hindi, you probably want Devanagari or Devanagari Extended.
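The block boundaries printed on those charts translate directly into a regex range. Here is a minimal Python sketch, assuming the Devanagari block is U+0900-U+097F and Devanagari Extended is U+A8E0-U+A8FF. The names DEVANAGARI_ANY and devanagari_chars are made up for illustration. Note that this matches every character in the two blocks (letters and digits included), not only the accent-like marks.

```python
import re

# Assumed block boundaries, as printed on the code charts:
#   Devanagari          U+0900-U+097F
#   Devanagari Extended U+A8E0-U+A8FF
DEVANAGARI_ANY = re.compile("[\u0900-\u097F\uA8E0-\uA8FF]")


def devanagari_chars(text):
    """Return the characters of text that fall inside either block."""
    return DEVANAGARI_ANY.findall(text)
```

Narrowing this down to just the marks still means picking individual code points out of the chart.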

戒情不戒烟
#3 · 2020-02-01 09:14

If you want the complete set (for all languages), you can do it programmatically. Start from the Unicode data file at ftp://ftp.unicode.org/Public/6.1.0/ucd/UnicodeData.txt, described by TR-44 (http://unicode.org/reports/tr44/#Property_Definitions).

You can use the Canonical_Combining_Class field (see http://unicode.org/reports/tr44/#Canonical_Combining_Class_Values) to filter the exact characters you want. I can't be more precise, because "accent" is a bit vague :-) You might also have to look at General_Category to get the filter right (and exclude certain marks, symbols, or punctuation).

And a script doing this would definitely be better than trying to mess with text editors. One of the characteristics of combining characters is that they combine :-) So you might get all kinds of puzzling results (like this: http://www.siao2.com/2006/02/17/533929.aspx :-)
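As a rough illustration of such a script, here is a Python sketch. It makes two assumptions: it reads properties from Python's built-in unicodedata module (which carries whatever Unicode version ships with the interpreter) instead of parsing UnicodeData.txt 6.1.0, and it treats "accent" as "General_Category is Mn, Mc or Me". The helper names (combining_marks, to_char_class) are made up for this example.

```python
import re
import unicodedata

MARK_CATEGORIES = ("Mn", "Mc", "Me")  # nonspacing, spacing combining, enclosing


def combining_marks(start, end):
    """Code points in [start, end] whose General_Category is a mark."""
    return [
        cp
        for cp in range(start, end + 1)
        if unicodedata.category(chr(cp)) in MARK_CATEGORIES
    ]


def _escape(cp):
    """Format one code point as a regex escape sequence."""
    return f"\\u{cp:04X}" if cp <= 0xFFFF else f"\\U{cp:08X}"


def _run(first, last):
    """Format a run of consecutive code points as a single escape or a range."""
    return _escape(first) if first == last else f"{_escape(first)}-{_escape(last)}"


def to_char_class(code_points):
    """Collapse code points into a regex character class, merging runs into ranges."""
    parts = []
    run_start = prev = None
    for cp in sorted(set(code_points)):
        if prev is not None and cp == prev + 1:
            prev = cp  # still inside the current run
            continue
        if run_start is not None:
            parts.append(_run(run_start, prev))
        run_start = prev = cp
    if run_start is not None:
        parts.append(_run(run_start, prev))
    return "[" + "".join(parts) + "]"


# Example: marks in the basic Devanagari block (U+0900-U+097F).
devanagari = combining_marks(0x0900, 0x097F)
pattern = re.compile(to_char_class(devanagari))
```

Filtering on Canonical_Combining_Class alone (unicodedata.combining(ch) != 0) would miss the Devanagari vowel signs, since a sign such as U+093E has class 0. That is why this sketch filters on General_Category, and the exact filter is a judgment call that depends on what you count as an accent.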

beautiful°
#4 · 2020-02-01 09:18

Here is the character class for Devanagari combining marks:

[\u0901-\u0903\u093C\u093E-\u094D\u0951-\u0954\u0962\u0963]

This is only the basic Devanagari block (not Devanagari Extended).
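For a quick check of the class, here is a small Python sketch that uses the same code points written as ranges. The helper names (strip_marks, find_marks) are made up for this example, and Python's re module stands in for the ActionScript RegExp mentioned in the question.

```python
import re

# Same code points as the class above, from the basic Devanagari block.
DEVANAGARI_MARKS = re.compile(
    "[\u0901-\u0903\u093C\u093E-\u094D\u0951-\u0954\u0962\u0963]"
)


def strip_marks(text):
    """Remove every character matched by DEVANAGARI_MARKS."""
    return DEVANAGARI_MARKS.sub("", text)


def find_marks(text):
    """Return the marks in text as U+XXXX labels, in order of appearance."""
    return [f"U+{ord(ch):04X}" for ch in DEVANAGARI_MARKS.findall(text)]


# The word "Hindi" in Devanagari: HA, vowel sign I, anusvara, DA, vowel sign II.
word = "\u0939\u093F\u0902\u0926\u0940"
print(find_marks(word))   # ['U+093F', 'U+0902', 'U+0940']
print(strip_marks(word) == "\u0939\u0926")  # True
```

Writing the escapes with four hex digits matters: most regex flavours that support \u expect exactly four digits after it.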
