Does someone know an easy way to find characters in Unicode that are similar to ASCII characters? An example is the "CYRILLIC SMALL LETTER DZE (ѕ)". I'd like to do a search and replace for similar characters. By similar I mean similar to a human reader: you can't see a difference by looking at it.
As noted by other commenters, Unicode normalisation ("compatibility characters") isn't going to help you here, as you aren't looking for official equivalences but for similarities in glyphs (letter shapes). (The linked Unicode Technical Report is still worth reading, though, as it is extremely well written.)
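To make that concrete, here is a small sketch using Python's standard unicodedata module (the choice of Python and the helper name nfkc are my assumptions, not something from the question). Compatibility normalisation folds a ligature into plain letters, but it leaves the Cyrillic DZE untouched because there is no official equivalence to the Latin s.

```python
import unicodedata


def nfkc(text):
    """Apply compatibility normalisation (NFKC) to a string."""
    return unicodedata.normalize("NFKC", text)


# U+FB01 LATIN SMALL LIGATURE FI has a compatibility mapping, so it is folded.
print(ascii(nfkc("\ufb01")))
# U+0455 CYRILLIC SMALL LETTER DZE has no such mapping, so it is left alone.
print(ascii(nfkc("\u0455")))
```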
If I were you, to spare you the tedious work of assembling a list of characters yourself, I'd search for resources on homograph attacks. This is a method of maliciously misleading web users by displaying URLs containing domain names in which some letters have been replaced with visually similar letters. Another Unicode Technical Report, on security, contains a section on the problem. There is also, and that may be what you need most, a "confusables" table. Here's another article covering mainly punctuation marks, some of which are ASCII, that have visually similar counterparts in the non-ASCII code tables.
What I do hope is that you aren't asking the question to construct such an attack.
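For the search and replace itself, something along these lines might work once you have the confusables table. This is only a sketch: it assumes the file uses semicolon-separated fields (source code points, then target code points) with # comments, and the two sample lines are written by hand for illustration rather than copied from the real table. The function names are made up.

```python
def parse_confusables(lines):
    """Build a {lookalike_char: ascii_replacement} map from confusables-style lines.

    Assumed line format: "SOURCE ; TARGET ; TYPE # comment", where SOURCE and
    TARGET are space-separated hexadecimal code points.
    """
    table = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = [field.strip() for field in line.split(";")]
        if len(fields) < 2:
            continue
        source = "".join(chr(int(cp, 16)) for cp in fields[0].split())
        target = "".join(chr(int(cp, 16)) for cp in fields[1].split())
        # Keep only single characters whose lookalike is plain ASCII.
        if len(source) == 1 and target.isascii():
            table[source] = target
    return table


def replace_lookalikes(text, table):
    """Replace every character found in the table with its ASCII lookalike."""
    return "".join(table.get(ch, ch) for ch in text)


# Hand-written sample lines in the assumed format (not the real file).
SAMPLE = [
    "# illustrative entries only",
    "0455 ; 0073 ; MA # CYRILLIC SMALL LETTER DZE -> LATIN SMALL LETTER S",
    "0430 ; 0061 ; MA # CYRILLIC SMALL LETTER A -> LATIN SMALL LETTER A",
]

table = parse_confusables(SAMPLE)
print(replace_lookalikes("p\u0430\u0455\u0455word", table))  # prints "password"
```

With the real table you would pass the opened file instead of SAMPLE. Characters that aren't in the table pass through unchanged, so ordinary non-ASCII text is not touched.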
See the Unicode Database: http://www.unicode.org/Public/UNIDATA/UnicodeData.txt.
Each line describes one Unicode character as a list of semicolon-separated fields. Take the entry for LATIN SMALL LETTER A WITH RIGHT HALF RING as an example.
If there are any similar (compatible) characters for that symbol, they appear in the decomposition field of the entry, tagged <compat>. In this example, 0061 (ASCII a) is compatible with the LATIN SMALL LETTER A WITH RIGHT HALF RING Unicode character.
As for your character, CYRILLIC SMALL LETTER DZE, its entry does not specify a compatibility character.
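If you'd rather not read UnicodeData.txt by hand, the same field can be queried programmatically. Here is a minimal sketch with Python's standard unicodedata module, which exposes that decomposition field (the helper name describe is just for illustration):

```python
import unicodedata


def describe(ch):
    """Return (character name, decomposition field) for one character."""
    return unicodedata.name(ch), unicodedata.decomposition(ch)


# U+1E9A is the "a with right half ring" example; U+0455 is the Cyrillic DZE.
for ch in ("\u1e9a", "\u0455"):
    name, decomposition = describe(ch)
    print(f"U+{ord(ch):04X} {name}: {decomposition or '(no decomposition)'}")
```

An empty string for the decomposition means the database lists no compatible character, which is the case for the Cyrillic DZE.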