What is a list of every Unicode bracket-like character (including, for example: {}[]()<>)? What is a good way to search for Unicode characters?
Recent Unicode releases have added a property, Bidi_Paired_Bracket, that gives what Unicode thinks is the answer to this question. This set is different from the set of mirrored characters. There are 60 bracket pairs as of Unicode 8.0. The following table maps each to its mate. The first column gives a code point; the second gives the Unicode version in which it was introduced; the third shows the mapping; and the final column gives the mapping by character name.

Also, for looking at the Unicode Character Database, Perl 5 is packaged with the module Unicode::UCD, which has many functions for inspecting things, including new ones in Perl v5.22 that will output the value of all properties for a given code point. Unicode::Tussle on CPAN offers similar and other functionality.
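For those not using Perl, here is a minimal Python sketch of reading the Bidi_Paired_Bracket mapping. It assumes the data comes from the UCD file BidiBrackets.txt, whose data lines have the form `code; mate; type # name`, with type `o` for open and `c` for close. The function name and the embedded sample lines are illustrative only; in practice you would feed it the lines of the real file.

```python
def parse_bidi_brackets(lines):
    """Map each opening bracket to its closing mate.

    Assumes the BidiBrackets.txt layout: "code; mate; type # comment".
    """
    pairs = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop comments
        if not data:
            continue  # blank or comment-only line
        code, mate, kind = (field.strip() for field in data.split(";"))
        if kind == "o":  # keep only the open -> close direction
            pairs[chr(int(code, 16))] = chr(int(mate, 16))
    return pairs


# A few sample lines in the assumed format (not the full file).
SAMPLE = """\
# Bidi_Paired_Bracket; Bidi_Paired_Bracket_Type
0028; 0029; o # LEFT PARENTHESIS
0029; 0028; c # RIGHT PARENTHESIS
005B; 005D; o # LEFT SQUARE BRACKET
005D; 005B; c # RIGHT SQUARE BRACKET
""".splitlines()

pairs = parse_bidi_brackets(SAMPLE)
for opener, closer in pairs.items():
    print(f"U+{ord(opener):04X} -> U+{ord(closer):04X}")
```

To run it against the real data, pass an open file object for a local copy of BidiBrackets.txt instead of SAMPLE.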
If you want to pick up characters like < and > that are not formally considered grouping symbols, you could take a look at http://www.unicode.org/Public/UNIDATA/BidiMirroring.txt as suggested by @roeland here. That file lists all pairs of characters which should be mirror images of each other.

Here's the full list:
The idea of “bracket-like” characters might more or less be identified with the General Category (gc) property values Ps (Punctuation, open) and Pe (Punctuation, close). These categories contain a few dozen paired punctuation marks, mostly excluding quotation marks (categories Pi and Pf).
In programming, many languages have tools for testing the General Category of a character, e.g. \p{Ps} in Perl.

If you just need some lists, you could use the Unicode Character Categories information at fileformat.info.
Generally, the way to search for Unicode characters depends on what you are looking for and on your criteria. General Category is a good starting point in many cases.
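As a rough illustration of using General Category as a starting point, here is a small Python sketch built on the standard unicodedata module. The helper name is made up for this example, and the results depend on the Unicode version bundled with your Python build.

```python
import sys
import unicodedata


def chars_in_categories(categories):
    """Return every character whose General Category is in `categories`."""
    wanted = set(categories)
    return [
        chr(cp)
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) in wanted
    ]


openers = chars_in_categories({"Ps"})  # Punctuation, open
closers = chars_in_categories({"Pe"})  # Punctuation, close

print(len(openers), "Ps characters;", len(closers), "Pe characters")
for ch in openers[:5]:
    # Print code points and names so the output stays ASCII-only.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Note that < and > do not show up here, because their General Category is Sm (math symbol); add "Pi" and "Pf" to the set if you also want quotation marks.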
There is a plain-text database of information about every Unicode character available from the Unicode Consortium; the format is described in Unicode Annex #44. The primary information is contained in UnicodeData.txt. Open and close punctuation characters are denoted with Ps (punctuation start) and Pe (punctuation end) in the General_Category field (the third field, delimited by ;). Look for those characters, and you'll find what you're looking for.

Note that not all characters that you consider brackets may be listed; for instance, quotation marks (including "«»") are indicated with Pi and Pf (initial and final punctuation), so you might want to include those as well. And some characters, like < and >, are used as brackets in some contexts (such as HTML/XML), while they are considered math symbols (Sm) in UnicodeData.txt. Those you are going to have to find by hand; there is no predetermined listing of those.

Here's a quick Bash script to get this information, and its output. I've included both brackets and quotes. (Note: it looks like Bash's UTF-8 printing has a bug that causes it not to print U+00AB "«" and U+00BB "»" properly; that's why those show up as ?.)
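The same filtering on the third field can also be sketched in Python. This is only an illustration: the function name is hypothetical, and the embedded lines are a small sample in the UnicodeData.txt format rather than the full file.

```python
def punctuation_by_category(lines, categories=("Ps", "Pe", "Pi", "Pf")):
    """Group (code point, name) pairs by General_Category.

    Expects semicolon-delimited lines in the UnicodeData.txt format,
    where the third field is the General_Category.
    """
    result = {cat: [] for cat in categories}
    for line in lines:
        fields = line.rstrip("\n").split(";")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        code, name, gc = fields[0], fields[1], fields[2]
        if gc in result:
            result[gc].append((int(code, 16), name))
    return result


# Sample lines only; pass an open file of UnicodeData.txt for real use.
SAMPLE = """\
0028;LEFT PARENTHESIS;Ps;0;ON;;;;;Y;OPENING PARENTHESIS;;;;
0029;RIGHT PARENTHESIS;Pe;0;ON;;;;;Y;CLOSING PARENTHESIS;;;;
003C;LESS-THAN SIGN;Sm;0;ON;;;;;Y;;;;;
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
00AB;LEFT-POINTING DOUBLE ANGLE QUOTATION MARK;Pi;0;ON;;;;;Y;LEFT POINTING GUILLEMET;;;;
""".splitlines()

found = punctuation_by_category(SAMPLE)
for gc, entries in found.items():
    for cp, name in entries:
        print(f"{gc} U+{cp:04X} {name}")
```

Printing code points and names instead of the characters themselves sidesteps terminal encoding problems like the one noted above.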
There's no canonical list of this in Unicode; you'll have to define your own list. You could start by using Python's unicodedata module to explore the Unicode database. Note that this won't find things like <> that are used as braces even though they have other official meanings, namely as less-than and greater-than signs.

Output:
http://xahlee.info/comp/unicode_matching_brackets.html
This is an excellent and very comprehensive website (for brackets and everything else too), and it looks like they display them all using Arial, sans-serif, so if you can see the character, then it should work with good browser support.